Problem Definition


The Abelian hidden subgroup problem is the
problem of finding generators for a subgroup K
of an Abelian group G, where this subgroup is
defined implicitly by a function $f : G \to X$,
for some finite set X. In particular, f has the
property that f(v) = f(w) if and only if the
cosets v + K and w + K are equal (we are assuming
additive notation for the group operation here).
In other words, f is constant on each coset of the
subgroup K and takes distinct values on distinct
cosets.

It is assumed that the group G is finitely 
generated and that the elements of G and X have 
unique binary encodings. The binary assumption 
is only for convenience, but it is important to 
have unique encodings (e.g., in [22] Watrous 
uses a quantum state as the unique encoding of 
group elements). When using variables g and h 
(possibly with subscripts), multiplicative notation 
is used for the group operations. Variables x and 
y (possibly with subscripts) will denote integers 
with addition. The boldface versions x and y will 
denote tuples of integers or binary strings. 

By assumption, there is a computational means
of computing the function f, typically a circuit or
“black box” that maps the encoding of a value g
to the encoding of f(g). The theory of reversible
computation implies that one can turn a circuit
for computing f(g) into a reversible circuit for
computing f(g) with a modest increase in the
size of the circuit. Thus, it will be assumed that
there is a reversible circuit or black box that
maps $(g, z) \mapsto (g, z \oplus f(g))$, where
$\oplus$ denotes the bit-wise XOR (sum modulo 2),
and z is any binary string of the same length as
the encoding of f(g).

Quantum mechanics implies that any
reversible gate can be extended linearly to a
unitary operation that can be implemented in
the model of quantum computation. Thus, it
is assumed that there is a quantum circuit or
black box that implements the unitary map
$U_f : |g\rangle |z\rangle \mapsto |g\rangle |z \oplus f(g)\rangle$.
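
The reversible map described above can be sketched classically. The
following is only an illustration under stated assumptions: the group
G = Z_8, the function f(g) = g mod 4 (hiding K = {0, 4}), and the helper
name make_oracle are hypothetical choices, not part of the original text.
The sketch checks that (g, z) -> (g, z XOR f(g)) permutes its inputs and
is its own inverse, which is what allows it to be extended to a unitary.

```python
from itertools import product

def make_oracle(f, out_bits):
    """Return the reversible map (g, z) -> (g, z XOR f(g)).

    f must return integers that fit in out_bits bits.
    """
    mask = (1 << out_bits) - 1

    def oracle(g, z):
        return g, (z ^ f(g)) & mask

    return oracle

# Hypothetical instance: G = Z_8 with hidden subgroup K = {0, 4}.
f = lambda g: g % 4
U_f = make_oracle(f, out_bits=2)

inputs = list(product(range(8), range(4)))
outputs = [U_f(g, z) for g, z in inputs]

# A reversible map is a permutation of its input set.
is_permutation = sorted(outputs) == sorted(inputs)
# XOR-ing the same value twice restores z, so the map is an involution.
is_involution = all(U_f(*U_f(g, z)) == (g, z) for g, z in inputs)
```

Because the map is a permutation of basis labels, its linear extension
is a permutation matrix and hence unitary.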

Although special cases of this problem have 
been considered in classical computer science, 
the general formulation as the hidden subgroup 
problem seems to have appeared in the context of 
quantum computing, since it neatly encapsulates 
a family of black-box problems for which quan- 
tum algorithms offer an exponential speedup (in 
terms of query complexity) over classical algo- 
rithms. For some explicit problems (i.e., where 
the black box is replaced with a specific function, 
such as exponentiation modulo N), there is a
conjectured exponential speedup. 


Abelian Hidden Subgroup Problem: 


Input: Elements $g_1, g_2, \ldots, g_n \in G$ that
generate the Abelian group G. A black box that
implements
$U_f : |m_1, m_2, \ldots, m_n\rangle |y\rangle \mapsto |m_1, m_2, \ldots, m_n\rangle |f(g) \oplus y\rangle$,
where $g = g_1^{m_1} g_2^{m_2} \cdots g_n^{m_n}$ and K is
the hidden subgroup corresponding to f.

Output: Elements $h_1, h_2, \ldots, h_l \in G$ that
generate K.


Here we use multiplicative notation for the 
group G in order to be consistent with Kitaev’s 
formulation of the Abelian stabilizer problem. 
Most of the applications of interest typically use 
additive notation for the group G. 

It is hard to trace the precise origin of this 
general formulation of the problem, which simul- 
taneously generalizes “Simon’s problem” [20], 
the order-finding problem (which is the quantum 
part of the quantum factoring algorithm [18]), and 
the discrete logarithm problem. 

One of the earliest generalizations of Simon’s 
problem, order-finding problem, and discrete log- 
arithm problem, which captures the essence of the 
Abelian hidden subgroup problem, is the Abelian 
stabilizer problem which was solved by Kitaev 
using a quantum algorithm in his 1995 paper [14] 
(and also appears in [15, 16]). 

Let G be a group acting on a finite set X.
That is, each element of G acts as a map from
X to X in such a way that for any two elements
$g, h \in G$, g(h(z)) = (gh)(z) for all $z \in X$. For
a particular element $z \in X$, the set of elements
that fix z (i.e., the elements $g \in G$ such that
g(z) = z) form a subgroup. This subgroup is
called the stabilizer of z in G, denoted $St_G(z)$.


Abelian Stabilizer Problem: 


Input: Elements $g_1, g_2, \ldots, g_n \in G$ that
generate the group G. An element $z \in X$.
A black box that implements
$U_{(G,X)} : |m_1, m_2, \ldots, m_n\rangle |z\rangle \mapsto |m_1, m_2, \ldots, m_n\rangle |g(z)\rangle$,
where $g = g_1^{m_1} g_2^{m_2} \cdots g_n^{m_n}$.

Output: Elements $h_1, h_2, \ldots, h_l \in G$ that
generate $St_G(z)$.


Let $f_z$ denote the function from G to X that
maps $g \in G$ to g(z). One can implement $U_{f_z}$
using $U_{(G,X)}$. The hidden subgroup corresponding
to $f_z$ is $St_G(z)$. Thus, the Abelian stabilizer
problem is a special case of the Abelian hidden
subgroup problem.
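
As a small sanity check of this reduction, the sketch below takes a
hypothetical action (G = Z_12 acting on X = {0, 1, 2, 3} by
g(z) = z + g mod 4; this example is an assumption made for illustration)
and verifies by brute force that $f_z$ takes equal values exactly on
elements that differ by a member of the stabilizer.

```python
# Hypothetical example: G = Z_12 acts on X = Z_4 by translation.
N, M = 12, 4

def act(g, z):
    """Action of g in Z_12 on z in X = {0, 1, 2, 3}."""
    return (z + g) % M

z0 = 1  # the distinguished element z of X

# Stabilizer of z0: all group elements that fix z0.
stabilizer = {g for g in range(N) if act(g, z0) == z0}

def f_z(g):
    """The function g -> g(z0) whose hidden subgroup is the stabilizer."""
    return act(g, z0)

# f_z(g) == f_z(h) exactly when g - h lies in the stabilizer.
coset_property = all(
    (f_z(g) == f_z(h)) == ((g - h) % N in stabilizer)
    for g in range(N)
    for h in range(N)
)
```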

One of the subtle differences (discussed in
Appendix 6 of [12]) between the above formulation
of the Abelian stabilizer problem and
the Abelian hidden subgroup problem is that
Kitaev’s formulation gives a black box that for
any $g, h \in G$ maps
$|m_1, \ldots, m_n\rangle |f_z(h)\rangle \mapsto |m_1, \ldots, m_n\rangle |f_z(hg)\rangle$,
where $g = g_1^{m_1} g_2^{m_2} \cdots g_n^{m_n}$.
The algorithm given by Kitaev is essentially
one that estimates eigenvalues of shift operations
of the form $|f_z(h)\rangle \mapsto |f_z(hg)\rangle$. In general,
these shift operators are not explicitly needed,
and it suffices to be able to compute a map of the
form $|y\rangle \mapsto |f_z(h) \oplus y\rangle$ for any binary string y.

Generalizations of this form have been known
since shortly after Shor presented his factoring
and discrete logarithm algorithms. For example, [23]
presents the hidden subgroup problem for a large
class of finite Abelian groups, and [11] does so
more generally for finite Abelian groups presented
as a product of finite cyclic groups. In [17] the
natural Abelian hidden subgroup algorithm is
related to eigenvalue estimation.

Other problems which can be formulated in 
this way include: 


Deutsch’s Problem: 


Input: A black box that implements
$U_f : |x\rangle |b\rangle \mapsto |x\rangle |b \oplus f(x)\rangle$, for some function
f that maps $\mathbb{Z}_2 = \{0, 1\}$ to {0, 1}.
Output: “constant” if f(0) = f(1) and “balanced”
if $f(0) \neq f(1)$.


Note that f(x) = f(y) if and only if $x - y \in K$,
where K is either {0} or $\mathbb{Z}_2 = \{0, 1\}$. If K =
{0}, then f is 1-1 or “balanced,” and if $K = \mathbb{Z}_2$,
then f is constant [5,6].
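
The following is a minimal state-vector sketch of the standard one-query
quantum circuit for Deutsch's problem (Hadamard gates, one call to
$U_f$, a final Hadamard on the input qubit). The entry itself does not
spell out this circuit, so the helper names hadamard, apply_oracle, and
deutsch are assumptions made for illustration. Amplitudes are stored in
a dictionary keyed by the basis labels (x, b).

```python
import math

def hadamard(state, qubit):
    """Apply a Hadamard gate to qubit 0 (input x) or qubit 1 (target b)."""
    s = 1 / math.sqrt(2)
    new = {}
    for other in (0, 1):
        key0 = (0, other) if qubit == 0 else (other, 0)
        key1 = (1, other) if qubit == 0 else (other, 1)
        a0, a1 = state[key0], state[key1]
        new[key0] = s * (a0 + a1)
        new[key1] = s * (a0 - a1)
    return new

def apply_oracle(state, f):
    """U_f: |x>|b> -> |x>|b XOR f(x)>, a permutation of basis states."""
    return {(x, b ^ f(x)): amp for (x, b), amp in state.items()}

def deutsch(f):
    """Decide 'constant' vs 'balanced' with a single oracle application."""
    state = {(x, b): 0.0 for x in (0, 1) for b in (0, 1)}
    state[(0, 1)] = 1.0                      # start in |0>|1>
    state = hadamard(hadamard(state, 0), 1)  # Hadamard on both qubits
    state = apply_oracle(state, f)           # the only query to f
    state = hadamard(state, 0)               # interfere the two branches
    # Probability of measuring 1 on the input qubit.
    p_one = sum(abs(a) ** 2 for (x, _), a in state.items() if x == 1)
    return "balanced" if p_one > 0.5 else "constant"
```

In this sketch the measured bit is 0 when K = $\mathbb{Z}_2$ (f constant)
and 1 when K = {0} (f balanced).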


Simon’s Problem: 


Input: A black box that implements
$U_f : |x\rangle |b\rangle \mapsto |x\rangle |b \oplus f(x)\rangle$ for some function
f from $\mathbb{Z}_2^n$ to some set X (which is assumed to
consist of binary strings of some fixed length)
with the property that f(x) = f(y) if and
only if $x - y \in K = \{0, s\}$ for some $s \in \mathbb{Z}_2^n$.
Output: The “hidden” string s.


The decision version allows K = {0} and 
asks whether K is trivial. Simon [20] presents 
an efficient algorithm for solving this problem 
and an exponential lower bound on the query 
complexity. The solution to the Abelian hidden 
subgroup problem is a generalization of Simon’s 
algorithm. 
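
The classical post-processing in Simon's algorithm can be sketched as
follows. It is a well-known property of the algorithm that each quantum
run yields a uniformly random string y with y·s = 0 (mod 2); the
function sample_orthogonal below is only a classical stand-in for that
quantum step, and all names here are hypothetical. The hidden string is
then recovered by Gaussian elimination over GF(2), with bit strings
stored as Python integers.

```python
import random

def dot2(a, b):
    """Inner product of two bit strings (as ints) modulo 2."""
    return bin(a & b).count("1") % 2

def sample_orthogonal(s, n, rng):
    """Stand-in for the quantum step: uniform y with y.s = 0 (mod 2)."""
    while True:
        y = rng.randrange(1 << n)
        if dot2(y, s) == 0:
            return y

def solve_hidden_string(samples, n):
    """Return the nonzero s orthogonal to all samples, or None if the
    samples do not yet have rank n - 1 over GF(2)."""
    pivots = {}  # pivot bit -> fully reduced row
    for y in samples:
        for bit, row in pivots.items():
            if (y >> bit) & 1:
                y ^= row
        if y:
            bit = y.bit_length() - 1
            for b in list(pivots):       # keep earlier rows reduced
                if (pivots[b] >> bit) & 1:
                    pivots[b] ^= y
            pivots[bit] = y
    if len(pivots) != n - 1:
        return None
    free = next(b for b in range(n) if b not in pivots)
    s = 1 << free
    for bit, row in pivots.items():
        if (row >> free) & 1:
            s |= 1 << bit
    return s

n, s = 5, 0b10110
rng = random.Random(0)
samples, recovered = [], None
while recovered is None:
    samples.append(sample_orthogonal(s, n, rng))
    recovered = solve_hidden_string(samples, n)
```

Once the samples span an (n - 1)-dimensional space, the only nonzero
string orthogonal to all of them is s.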


Key Results 


Theorem (ASP) There exists a quantum algorithm
that, given an instance of the Abelian stabilizer
problem, makes n + O(1) queries to $U_{(G,X)}$
and uses poly(n) other elementary quantum and
classical operations, and with probability at least 2/3
outputs values $h_1, h_2, \ldots, h_l$ such that
$St_G(z) = \langle h_1 \rangle \oplus \langle h_2 \rangle \oplus \cdots \oplus \langle h_l \rangle$.

Kitaev first solved this problem (with a 
slightly higher query complexity, because 
his eigenvalue estimation procedure was not 
optimal). An eigenvalue estimation procedure 
based on the Quantum Fourier Transform 
achieves the n + O(1) query complexity [5]. 


Theorem (AHSP) There exists a quantum algorithm
that, given an instance of the Abelian hidden
subgroup problem, makes n + O(1) queries to
$U_f$ and uses poly(n) other elementary quantum
and classical operations, and with probability at least
2/3 outputs values $h_1, h_2, \ldots, h_l$ such that
$K = \langle h_1 \rangle \oplus \langle h_2 \rangle \oplus \cdots \oplus \langle h_l \rangle$.

In some cases, the success probability can
be made 1 with the same complexity, and in
general the success probability can be made
$1 - \epsilon$ using $n + O(\log(1/\epsilon))$ queries and
$\mathrm{poly}(n, \log(1/\epsilon))$ other elementary quantum and
classical operations.


Applications 


Most of these applications in fact were known 
before the Abelian stabilizer problem or hidden 
subgroup problem were formulated. 


Finding the order of an element in a group:
Let a be an element of a group H (which does
not need to be Abelian), and let r be the smallest
positive integer so that $a^r = 1$.

Consider the function f from $G = \mathbb{Z}$ to the
group H where $f(x) = a^x$. Then f(x) = f(y)
if and only if $x - y \in r\mathbb{Z}$.
The hidden subgroup is $K = r\mathbb{Z}$ and a generator
for K gives the order r of a.


Finding the period of a periodic function:
Consider a function f from $G = \mathbb{Z}$ to a set X
with the property that for some positive integer r,
we have f(x) = f(y) if and only if $x - y \in r\mathbb{Z}$.
The hidden subgroup of f is $K = r\mathbb{Z}$ and a
generator for K gives the period r.

Order finding is a special case of period find- 
ing and was also solved by Shor’s algorithm 
[18]. 
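
To make the hidden subgroup in order finding concrete, here is a
classical brute-force sketch. The instance (a = 7 in the multiplicative
group modulo N = 15) and the helper name order are hypothetical
illustration choices; the quantum algorithm finds r without this
exhaustive loop.

```python
from math import gcd

def order(a, N):
    """Smallest positive r with a^r = 1 (mod N), found by brute force."""
    assert N > 1 and gcd(a, N) == 1
    r, y = 1, a % N
    while y != 1:
        y = (y * a) % N
        r += 1
    return r

a, N = 7, 15          # hypothetical instance
r = order(a, N)

def f(x):
    """f(x) = a^x in the group of units modulo N."""
    return pow(a, x, N)

# f(x) == f(y) exactly when x - y is a multiple of r, i.e. K = rZ.
hidden_subgroup_ok = all(
    (f(x) == f(y)) == ((x - y) % r == 0)
    for x in range(20)
    for y in range(20)
)
```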


Discrete Logarithms: Let a be an element of a
group H (which does not need to be Abelian),
with $a^r = 1$, and suppose $b = a^k$ for
some unknown k. The integer k is called the
discrete logarithm of b to the base a. Consider
the function f from $G = \mathbb{Z}_r \times \mathbb{Z}_r$ to H
satisfying $f(x_1, x_2) = a^{x_1} b^{x_2}$. Then
$f(x_1, x_2) = f(y_1, y_2)$ if and only if
$(x_1, x_2) - (y_1, y_2) \in \{(tk, -t) : t = 0, 1, \ldots, r - 1\}$,
which is the subgroup $\langle (k, -1) \rangle$ of
$\mathbb{Z}_r \times \mathbb{Z}_r$ (indeed, $a^{tk} b^{-t} = a^{tk - kt} = 1$).
Thus, finding a generator for the hidden subgroup K
will give the discrete logarithm k. Note that this
algorithm works for H equal to the multiplicative
group of a finite field, or the additive group of
points on an elliptic curve, which are groups that
are used in public-key cryptography.
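
The structure of this hidden subgroup can be checked by brute force on a
toy instance. The parameters below (H the multiplicative group modulo
p = 11, a = 2 of order r = 10, secret exponent 7) are hypothetical. For
$f(x_1, x_2) = a^{x_1} b^{x_2}$ with $b = a^k$, the sketch confirms that
the pairs mapped to the identity are exactly the multiples (tk, -t) of
(k, -1) in $\mathbb{Z}_r \times \mathbb{Z}_r$, and reads k off the
element whose second coordinate is -1.

```python
# Hypothetical toy instance: H = (Z/11Z)^*, a = 2 has order r = 10.
p, a, r = 11, 2, 10
k_secret = 7
b = pow(a, k_secret, p)

def f(x1, x2):
    """f(x1, x2) = a^x1 * b^x2 in H."""
    return (pow(a, x1, p) * pow(b, x2, p)) % p

# Hidden subgroup: all pairs in Z_r x Z_r mapped to the identity.
kernel = {(x1, x2) for x1 in range(r) for x2 in range(r) if f(x1, x2) == 1}

# The cyclic subgroup generated by (k, -1).
generated = {((t * k_secret) % r, (-t) % r) for t in range(r)}

# The kernel element with second coordinate -1 (= r - 1) reveals k.
k_recovered = next(x1 for (x1, x2) in kernel if x2 == r - 1)

# f is constant exactly on cosets of the kernel.
coset_property = all(
    (f(x1, x2) == f(y1, y2)) == (((x1 - y1) % r, (x2 - y2) % r) in kernel)
    for x1 in range(r) for x2 in range(r)
    for y1 in range(r) for y2 in range(r)
)
```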

Recently, Childs and Ivanyos [3] presented 
an efficient quantum algorithm for finding dis- 
crete logarithms in semigroups. Their algo- 
rithm makes use of the quantum algorithms for 
period finding and discrete logarithms as subrou- 
tines. 


Hidden Linear Functions: Let $\sigma$ be some
permutation of $\mathbb{Z}_N$ for some integer N. Let h be a
function from $G = \mathbb{Z} \times \mathbb{Z}$ to $\mathbb{Z}_N$,
h(x, y) = x + ay mod N. Let $f = \sigma \circ h$. The hidden
subgroup of f is $\langle (-a, 1) \rangle$. Boneh and Lipton
[1] showed that even if the linear structure of h
is hidden (by $\sigma$), one can efficiently recover the
parameter a with a quantum algorithm.
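
A brute-force illustration on a toy instance (N = 7, a = 3, and a
pseudo-random permutation; all of these are hypothetical choices) shows
that composing h with a permutation does not change which inputs
collide, so the multiples of (-a, 1) modulo N are still exactly the
inputs sharing a value with (0, 0).

```python
import random

N, a = 7, 3                      # hypothetical toy parameters
rng = random.Random(0)
sigma = list(range(N))
rng.shuffle(sigma)               # an arbitrary permutation of Z_N

def h(x, y):
    return (x + a * y) % N

def f(x, y):
    """f = sigma o h: the linear structure of h is hidden by sigma."""
    return sigma[h(x, y)]

# Inputs (reduced mod N) that collide with (0, 0) under f.
kernel_mod_N = {(x, y) for x in range(N) for y in range(N)
                if f(x, y) == f(0, 0)}

# Multiples of (-a, 1), reduced mod N.
generated = {((-a * t) % N, t % N) for t in range(N)}

# The kernel element with y = 1 is (-a mod N, 1), which reveals a.
a_recovered = next((-x) % N for (x, y) in kernel_mod_N if y == 1)
```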


Self-Shift-Equivalent Polynomials: Given a
polynomial P in l variables $X_1, X_2, \ldots, X_l$ over
$\mathbb{F}_q$, the function f that maps
$(a_1, a_2, \ldots, a_l) \in \mathbb{F}_q^l$ to
$P(X_1 - a_1, X_2 - a_2, \ldots, X_l - a_l)$ is
constant on cosets of a subgroup K of $\mathbb{F}_q^l$. This
subgroup K is the set of shift-self-equivalences
of the polynomial P. Grigoriev [10] showed how
to compute this subgroup.


Decomposition of a Finitely Generated
Abelian Group: Let G be a group with a unique
binary representation for each element of G, and
assume that the group operation, and recognizing
if a binary string represents an element of G or
not, can be done efficiently.

Given a set of generators $g_1, g_2, \ldots, g_n$ for a
group G, output a set of elements $h_1, h_2, \ldots, h_l$,
$l \le n$, from the group G such that
$G = \langle h_1 \rangle \oplus \langle h_2 \rangle \oplus \cdots \oplus \langle h_l \rangle$.
Such a generating set can be found efficiently [2]
from generators of the hidden subgroup of the
function that maps
$(m_1, m_2, \ldots, m_n) \mapsto g_1^{m_1} g_2^{m_2} \cdots g_n^{m_n}$.

This simple algorithm directly leads to an 
algorithm for computing the class group and 
class number of a quadratic number field, as 
pointed out by Watrous [22] in his paper that 
shows how to compute the order of solvable 
groups. Computing the class group of a more 
general number field is a much more difficult
task: this and related problems have been
successfully tackled in a series of elegant works
summarized in the entry Quantum Algorithms for
Class Group of a Number Field.

Such a decomposition of Abelian groups was 
also applied by Friedl, Ivanyos, and Santha [9] to 
test if a finite set with a binary operation is an 
Abelian group, by Kedlaya [13] to compute the 
zeta function of a genus g curve over a finite field 
$\mathbb{F}_q$ in time polynomial in g and q, and by Childs,
Jao, and Soukharev [4] in order to construct 
elliptic curve isogenies in subexponential time. 


Discussion: What About Non-Abelian 
Groups? 

The great success of quantum algorithms for
solving the Abelian hidden subgroup problem
leads to the natural question of whether they
can also solve the hidden subgroup problem for
non-Abelian groups. It has been shown that
a polynomial number of queries suffices [8];
however, in general there is no bound on 
the overall computational complexity (which 
includes other elementary quantum or classical 
operations). 

This question has been studied by many re- 
searchers, and efficient quantum algorithms can 
be found for some non-Abelian groups. However, 
at present, there is no efficient algorithm for 
most non-Abelian groups. For example, solving 
the HSP for the symmetric group would directly 
solve the graph automorphism problem. 
Problem Definition


Concrete Voronoi diagrams are usually defined
for a set S of sites p that exert influence over
the points z of a surrounding space M. Often,
influence is measured by distance functions $d_p(z)$
that are associated with the sites. For each p, its
Voronoi region is given by

$VR(p, S) = \{ z \in M ;\ d_p(z) < d_q(z) \text{ for all } q \in S \setminus \{p\} \}$,

and the Voronoi diagram V(S) of S is the
decomposition of M into Voronoi regions; compare the
entry Voronoi Diagrams and Delaunay
Triangulations of this Encyclopedia.
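
The definition of a Voronoi region translates directly into a
point-location test. The sketch below is a brute-force illustration, not
an efficient algorithm; the function names and the sample sites are
hypothetical. It shows that the same point can fall into different
regions under the Euclidean and the Manhattan distance.

```python
def euclidean(p, z):
    return ((p[0] - z[0]) ** 2 + (p[1] - z[1]) ** 2) ** 0.5

def manhattan(p, z):
    return abs(p[0] - z[0]) + abs(p[1] - z[1])

def voronoi_region_of(z, sites, dist):
    """Return the site p with z in VR(p, S), or None if the two smallest
    distances tie, i.e. z lies on a bisector and in no open region."""
    ranked = sorted(sites, key=lambda p: dist(p, z))
    if len(ranked) > 1 and dist(ranked[0], z) == dist(ranked[1], z):
        return None
    return ranked[0]

# Hypothetical sites: one diagonal neighbor and one axis-aligned neighbor.
sites = [(2.0, 2.0), (3.0, 0.0)]
z = (0.0, 0.0)
nearest_l2 = voronoi_region_of(z, sites, euclidean)  # distances 2.83 vs 3
nearest_l1 = voronoi_region_of(z, sites, manhattan)  # distances 4 vs 3
```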

Quite different Voronoi diagrams result depending
on the particular choices of space, sites,
and distance measures; see Fig. 1. A great number
of other types of Voronoi diagrams can be
found in the monographs [1] and [14]. In each
case, one wants to quickly compute the Voronoi
diagram, because it contains a lot of distance
information about the sites. However, the classical
algorithms for the standard case of point sites in
the Euclidean plane do not apply to more general
situations.

Abstract Voronoi Diagrams, Fig. 1 Voronoi
diagrams of points in the Euclidean and Manhattan
metric and of disks (or additively weighted points)
in the Euclidean plane

To free us from designing individual algorithms
for each and every special case, we would
like to find a unifying concept that provides
structural results and efficient algorithms for
generalized Voronoi diagrams. One possible approach
studied in [5,6] is to construct the lower envelope
of the 3-dimensional graphs of the distance
functions $d_p(z)$, whose projection to the XY-plane
equals the Voronoi diagram.


Key Results 


A different approach is given by abstract Voronoi 
diagrams that are not based on the notions of sites 
and distance measures (as their definitions vary 
anyway). Instead, AVDs are built from bisecting 
curves as primary objects [7]. 

Let S = {p, q, r, ...} be a set of n indices, and
for $p \neq q \in S$, let J(p, q) = J(q, p) denote an
unbounded curve that bisects the plane into two
unbounded open domains D(p, q) and D(q, p).
We require that each J(p, q) is mapped to a
closed Jordan curve through the north pole, under
stereographic projection to the sphere. Now we
define Voronoi regions by

$VR(p, S) = \bigcap_{q \in S \setminus \{p\}} D(p, q)$

and the abstract Voronoi diagram by

$V(S) := \mathbb{R}^2 \setminus \bigcup_{p \in S} VR(p, S)$.

The system J of the curves J(p,q) is called 
admissible if the following axioms are fulfilled 
for every subset T of S of size three. 


A1. Each Voronoi region VR(p, T) is pathwise
connected.

A2. Each point of $\mathbb{R}^2$ lies in the closure of a
Voronoi region VR(p, T).


These combinatorial properties should not be 
too hard to check in a concrete situation because 
only triplets of sites need to be inspected. Yet, 
they form a strong foundation, as was shown 
in [8]. The following fact is crucial for the proof 
of Theorem 1. It also shows that AVDs can be
seen as lower envelopes of surfaces in dimen- 
sion 3. 


Lemma 1 For all p, q, r in S, we have
$D(p, q) \cap D(q, r) \subseteq D(p, r)$. Consequently,
for each point $z \in \mathbb{R}^2$ not contained in any curve
of J, the relation

$p <_z q \ :\Longleftrightarrow\ z \in D(p, q)$

is an ordering of the sites in S at z.
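
In the concrete setting where the domains D(p, q) come from distance
functions (here squared Euclidean distances to three hypothetical point
sites), the transitivity in Lemma 1 and the resulting ordering at a
point z can be checked by brute force. This only illustrates the
statement in one special case; it is not the proof for abstract bisector
systems.

```python
from itertools import permutations

# Hypothetical point sites standing in for the indices p, q, r.
sites = {"p": (0.0, 0.0), "q": (4.0, 1.0), "r": (1.0, 5.0)}

def d2(a, z):
    """Squared Euclidean distance (avoids square roots)."""
    return (a[0] - z[0]) ** 2 + (a[1] - z[1]) ** 2

def in_D(p, q, z):
    """z lies in D(p, q): z is strictly closer to p than to q."""
    return d2(sites[p], z) < d2(sites[q], z)

def order_at(z):
    """Sites sorted by p <_z q :<=> z in D(p, q); None if z lies on a
    bisecting curve, where the relation is not a strict ordering."""
    names = sorted(sites, key=lambda s: d2(sites[s], z))
    if any(not in_D(a, b, z) for a, b in zip(names, names[1:])):
        return None
    return names

grid = [(x * 0.5, y * 0.5) for x in range(-4, 13) for y in range(-4, 13)]

# D(p, q) intersected with D(q, r) is contained in D(p, r) on the grid.
transitive = all(
    in_D(p, r, z)
    for z in grid
    for p, q, r in permutations(sites, 3)
    if in_D(p, q, z) and in_D(q, r, z)
)
```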


Theorem 1 If J is admissible, then axioms A1
and A2 hold for all subsets T of S. Moreover, the
abstract Voronoi diagram V(S) is a planar graph
of size O(n).




Of the classical algorithms for constructing
Voronoi diagrams, the randomized incremental 
construction method works best for abstract 
Voronoi diagrams [8, 10]. 


Theorem 2 If J is admissible, then V(S) can be
constructed in an expected number of O(n log n)
steps and in expected linear space.


Here basic operations like computing an intersec- 
tion of two bisecting curves are counted as one 
step. 


Applications 


To show that a concrete type of Voronoi diagram 
is under the roof of abstract Voronoi diagrams, 
one needs to prove that its bisector system is 
admissible. 

Let d be a metric in the plane that enjoys 
the following properties. Each d-disk contains a 
Euclidean disk and vice versa; for any two points 
a, b, there exists a point c different from a and b 
such that d(a,b) = d(a,c) + d(c,b) holds; for 
any two points a, b, their metric bisector 


$B_d(a, b) = \{ z \in \mathbb{R}^2 ;\ d(a, z) = d(b, z) \}$


is itself a curve that maps to a closed Jordan curve
through the north pole by stereographic projection
to the sphere, or, in case $B_d(a, b)$ contains
2-dimensional pieces, its boundary consists of two
such curves.

The first two properties ensure that any two
points can be connected by a d-straight path
along which d-distances add up. The third
condition ensures that we can choose from
$B_d(p, q)$ suitable bisecting curves. Let us call
a metric d very nice if also a fourth condition is
fulfilled. Given three points $a, p_0, p_1$, there exist
d-straight paths from a to $p_0$ and from a to
$p_1$ that have only point a in common, or each
d-straight path from a to $p_i$ contains $p_{1-i}$, for
i = 0 or i = 1. All convex distance functions
(gauges) are very nice.


Theorem 3 Very nice metrics have admissible 
point bisector curves. 


Other applications of AVDs include points with 
additive weights, both the regular and the Haus- 
dorff Voronoi diagram of disjoint convex sites 
with respect to a convex distance function, and 
some types of city Voronoi diagrams; see [1] for 
further details. 


Generalizations 


How to dynamize abstract Voronoi diagrams 
has been studied in [12]. Special cases of 3- 
dimensional abstract Voronoi diagrams have been 
discussed in [11]; they include all convex distance 
functions whose unit spheres are ellipsoids. 
It is well known that for the vertices of a 
convex polygon, the Voronoi diagram can be 
constructed in linear time. This result has been 
generalized to AVDs in [9] and [4]. In [3] the 
path-connectedness of abstract Voronoi regions 
(axiom A1) has been relaxed. If a region of three 
sites can have up to s connected components, the 
abstract Voronoi diagram can still be constructed 
in expected time $O(s^2 n \sum_{j=3}^{n} m_j / j)$, where $m_j$
denotes the average number of faces per region
in any subdiagram of j sites from S.

In an order-k Voronoi diagram, all points of 
space M are placed in one region that shares the 
same k nearest sites in S. For k = n - 1, this concept
has been generalized to furthest site abstract
Voronoi diagrams in [13]. Here the furthest (or
inverse) region of $p \in S$ is the intersection of all
domains D(q, p), where $q \in S \setminus \{p\}$. If all regular
Voronoi regions are nonempty, then the furthest 
site AVD is a tree of size O(n), even though some 
regions may be disconnected. 

General order-k abstract Voronoi diagrams 
have been studied in [2]. If all regular Voronoi 
regions are nonempty and if bisecting curves 
are in general position, a tight upper complexity 
bound of 2k(n - k) can be shown. Fortunately, the
nonemptiness of the regular regions need only be
tested for all subsets of S of size 4.
Problem Definition


Most classic machine learning methods depend 
on the assumption that humans can annotate all 
the data available for training. However, many 
modern machine learning applications (including 
image and video classification, protein sequence 
classification, and speech processing) have mas- 
sive amounts of unannotated or unlabeled data. 
As a consequence, there has been tremendous in- 
terest both in machine learning and its application 
areas in designing algorithms that most efficiently 
utilize the available data while minimizing the 
need for human intervention. An extensively used 
and studied technique is active learning, where 
the algorithm is presented with a large pool of 
unlabeled examples (such as all images available 
on the web) and can interactively ask for the 
labels of examples of its own choosing from the
pool, with the goal to drastically reduce labeling 
effort. 


Formal Setup 

We consider classification problems (such 
as classifying images by who is in them or 
classifying emails as spam or not), where the goal 
is to predict a label y based on its corresponding 
input vector x. In the standard machine learning 
formulation, we assume that the data points 
(x,y) are drawn from an unknown underlying 
distribution $D_{XY}$ over $X \times Y$; X is called the
feature (instance) space and Y = {0,1} is the
label space. The goal is to output a hypothesis
function h of small error (or small 0/1 loss),
where $\mathrm{err}(h) = \Pr_{(x,y) \sim D_{XY}}[h(x) \neq y]$.

In the passive learning setting, the learning
algorithm is given a set of labeled examples
$(x_1, y_1), \ldots, (x_m, y_m)$ drawn i.i.d. from $D_{XY}$
and the goal is to output a hypothesis of small 
error by using only a polynomial number of 
labeled examples. In the realizable case [10] 
(PAC learning), we assume that the true label of 
any example is determined by a deterministic 
function of the features (the so-called target 
function) that belongs to a known concept class 
C (e.g., the class of linear separators, decision 
trees, etc.). In the agnostic case [10, 13], we do 
not make the assumption that there is a perfect 
classifier in C, but instead we aim to compete 
with the best function in C (i.e., we aim to 
identify a classifier whose error is not much 
worse than opt(C), the error of the best classifier 
in C). Both in the realizable and agnostic settings, 
there is a well-developed theory of Sample 
Complexity [13], quantifying in terms of the so- 
called VC-dimension (a measure of complexity of 
a concept class) how many training examples we 
need in order to be confident that a rule that does 
well on training data is a good rule for future data 
as well. 

In the active learning setting, a set of labeled
examples $(x_1, y_1), \ldots, (x_m, y_m)$ is also drawn
i.i.d. from $D_{XY}$; the learning algorithm is
permitted direct access to the sequence of $x_i$ values
(unlabeled data points), but has to make a label
request to obtain the label $y_i$ of example $x_i$.


The hope is that we can output a classifier of 
small error by using many fewer label requests 
than in passive learning by actively directing the 
queries to informative examples (while keeping 
the number of unlabeled examples polynomial). 

It has been long known that, in the realizable
case, active learning can sometimes provide
an exponential improvement in label complexity
over passive learning. The canonical example [6]
is learning threshold classifiers (X = [0,1] and
$C = \{ 1_{[0,a]} \mid a \in [0, 1] \}$). Here we can actively
learn with only $O(\log(1/\epsilon))$ label requests by
using a simple binary search-like algorithm as
follows: we first draw $N = O((1/\epsilon) \log(1/\delta))$
unlabeled examples, then do binary search to
find the transition from label 1 to label 0, and
with only O(log(N)) queries we can correctly
infer the labels of all our examples; we finally
output a classifier from C consistent with all
the inferred labels. By standard VC-dimension
based bounds for supervised learning [13], we are
guaranteed to output an $\epsilon$-accurate classifier. On
the other hand, for passive learning, we provably
need $\Omega(1/\epsilon)$ labels to output a classifier of error
at most $\epsilon$ with constant probability, yielding the
exponential reduction in label complexity.
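
The binary search procedure just described can be sketched as follows
for noise-free labels. The function name active_learn_threshold, the
label oracle, and the demo parameters are hypothetical. The sketch
counts label requests so the logarithmic query cost can be observed
directly.

```python
import random

def active_learn_threshold(xs, query_label):
    """Binary search for the 1 -> 0 transition among sorted unlabeled
    points. Returns (estimated threshold, number of label requests).
    Assumes realizable labels of the form 1 if x <= a else 0."""
    xs = sorted(xs)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 1, xs[hi:] are 0
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(xs[mid]) == 1:
            lo = mid + 1
        else:
            hi = mid
    # The largest point labeled 1 is a threshold consistent with all
    # inferred labels (0.0 if every point is labeled 0).
    a_hat = xs[lo - 1] if lo > 0 else 0.0
    return a_hat, queries

rng = random.Random(1)
a_true = 0.37                               # hypothetical target threshold
xs = [rng.random() for _ in range(1000)]    # the unlabeled pool
oracle = lambda x: 1 if x <= a_true else 0  # answers label requests
a_hat, queries = active_learn_threshold(xs, oracle)
```

With 1000 unlabeled points the loop halves the candidate range at each
step, so it issues at most 10 label requests, whereas labeling the whole
pool would need 1000.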


Key Results 


While in the simple threshold concept class 
described above active learning always provides 
huge improvements over passive learning, things 
are more delicate in more general scenarios. 
In particular, both in the realizable and in 
the agnostic case, it has been shown that for 
more general concept spaces, in the worst case 
over all data-generating distributions, the label 
complexity of active learning equals that of 
passive learning. Thus, much of the literature was 
focused on identifying non-worst case, natural 
conditions about the relationship between the 
data distribution and the target, under which 
active learning provides improvements over 
passive. Below, we discuss three approaches, 
under which active learning has been shown to 
reduce the label complexity: disagreement-based 
techniques, margin-based techniques, and
cluster-based techniques.


Disagreement-Based Active Learning
Disagreement-based active learning was the
first method to demonstrate the feasibility of
agnostic active learning for general concept
classes. The general algorithmic framework
of disagreement-based active learning in the
presence of noise was introduced with the $A^2$
algorithm by Balcan et al. [2]. Subsequently,
several researchers have proposed related
disagreement-based algorithms with improved
sample complexity, e.g., [5, 8, 11].

At a high level, $A^2$ operates in rounds. It
maintains a set of candidate classifiers from the
concept class C and in each round queries labels
aiming to efficiently reduce this set to only a few
high-quality candidates. More precisely, in round
i, $A^2$ considers the set of surviving classifiers
$C_i \subseteq C$, and asks for the labels of a few random
points that fall in the region of disagreement of
$C_i$. Formally, the region of disagreement of a set
of classifiers $C_i$ is
$DIS(C_i) = \{ x \in X \mid \exists f, g \in C_i : f(x) \neq g(x) \}$.
Based on these queried labels from $DIS(C_i)$, to
obtain $C_{i+1}$, the algorithm
then throws out hypotheses that are suboptimal.
The key ingredient is that $A^2$ only throws out
hypotheses for which it is statistically confident
that they are suboptimal.
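
For a finite set of threshold classifiers, membership in the region of
disagreement is easy to test. The sketch below illustrates only the
definition of DIS; it is not the $A^2$ algorithm, and the candidate set
is a hypothetical example.

```python
def predict(a, x):
    """Threshold classifier 1_[0,a]: label 1 iff x <= a."""
    return 1 if x <= a else 0

def in_disagreement_region(x, thresholds):
    """x is in DIS(C_i) iff two surviving classifiers label it
    differently."""
    return len({predict(a, x) for a in thresholds}) > 1

# Hypothetical surviving set C_i, each classifier given by its threshold.
C_i = [0.2, 0.35, 0.5]

# For thresholds, DIS(C_i) is the interval (min a, max a] = (0.2, 0.5].
probe_points = [0.1, 0.2, 0.3, 0.5, 0.6]
membership = [in_disagreement_region(x, C_i) for x in probe_points]
```

Only points in (0.2, 0.5] are worth querying here; labels of points
outside this interval are already agreed upon by every surviving
classifier.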

Balcan et al. [2] show that $A^2$ provides
exponential improvements in the label sample
complexity in terms of the $1/\epsilon$-parameter
when the noise rate $\eta$ is sufficiently small,
both for learning thresholds and for learning
homogeneous linear separators in $\mathbb{R}^d$, one
of the most widely used and studied classes
in machine learning. Following up on this,
Hanneke [9] provided a generic analysis of the
$A^2$ algorithm that applies to any concept class.
This analysis quantifies the label complexity
of $A^2$ in terms of the so-called disagreement
coefficient of the class C. The disagreement
coefficient is a distribution-dependent sample
complexity measure that quantifies how fast the
region of disagreement of the set of classifiers
at distance r of the optimal classifier collapses
as a function of r. In particular, [9] showed that
the label complexity of the $A^2$ algorithm is
$O\left(\theta^2 \left(\frac{\nu^2}{\epsilon^2} + 1\right) (d \log(1/\epsilon) + \log(1/\delta)) \log(1/\epsilon)\right)$,
where $\nu$ is the best error rate of a classifier
in C, d is the VC-dimension of C, and $\theta$ is
the disagreement coefficient. As an example,
for homogeneous linear separators, we have
$\theta = \Theta(\sqrt{d})$ under uniform marginal over the
unit ball. Here, the disagreement-based analysis
yields a label complexity of
$O\left(d^2 \frac{\nu^2}{\epsilon^2} \log(1/\epsilon)\right)$
in the agnostic case and $O(d^{3/2} \log(1/\epsilon))$ in the
realizable case.


Margin-Based Active Learning 

While the disagreement-based active learning 
line of work provided the first general 
understanding of the sample complexity benefits 
with active learning for arbitrary concept classes, 
it suffers from two main drawbacks: (1) methods 
and analyses developed in this context are often 
suboptimal in terms of label complexity, since 
they take a conservative approach and query even 
points on which there is only a small amount of 
uncertainty, (2) the methods are computationally 
inefficient. Margin-based active learning is 
a technique that overcomes both the above 
drawbacks for learning homogeneous linear 
separators under log-concave distributions. The 
technique was first introduced by Balcan et al. [3]
and further developed by Balcan and Long [4] and
Awasthi et al. [1].

At a high level, like disagreement-based meth- 
ods, the margin-based active learning algorithm 
operates in rounds, in which a number of labels 
are queried in some subspace of the domain 
and a set of candidate classifiers for the next 
round is identified. The crucial idea to reduce 
the label complexity is to design a more ag- 
gressive querying strategy by carefully choosing 
where to query instead of querying in all of 
the current disagreement region. Concretely, in
round k the algorithm has a current hypothesis
$w_k$, and the set of candidate classifiers for the
next round consists of all homogeneous halfspaces
that lie in a ball of radius $r_k$ around $w_k$
(in terms of their angle with $w_k$). The algorithm
then queries points for labels near the decision
boundary of $w_k$; that is, it only queries points
that are within a margin $\gamma_k$ of $w_k$; see Fig. 1. To
obtain $w_{k+1}$, the algorithm finds a loss minimizer
among the current set of candidates with respect
to the queried examples of round k. In the realizable
case, this is done by 0/1-loss minimization.
In the presence of noise, to obtain a computationally
efficient procedure, the margin-based
technique minimizes a convex surrogate loss.

Active Learning - Modern Learning Theory, Fig. 1
The margin-based active learning algorithm after iteration
k. The algorithm samples points within margin $\gamma_k$ of the
current weight vector $w_k$ and then minimizes the hinge
loss over this sample subject to the constraint that the new
weight vector $w_{k+1}$ is within distance $r_k$ from $w_k$

Balcan et al. [3] and Balcan and Long [4]
showed that by localizing aggressively, namely
by setting the margin parameter to $\gamma_k = \Theta(1/2^k)$,
one can actively learn with only $O(d \log(1/\epsilon))$
label requests in the realizable case, when the
underlying distribution is isotropic log-concave.
A key idea of their analysis is to decompose,
in round k, the error of a candidate classifier
w as its error outside margin $\gamma_k$ of the current
separator plus its error inside margin $\gamma_k$, and
to prove that for the above parameters, a small
constant error inside the margin suffices to reduce
the overall error by a constant factor. For
the constant error inside the margin only $\tilde{O}(d)$
labels need to be queried, and since in each
round the overall error gets reduced by a constant
factor, $O(\log(1/\epsilon))$ rounds suffice to reduce
the error to $\epsilon$, yielding the label complexity of
$O(d \log(1/\epsilon))$. Passive learning here provably
requires $\Omega(d/\epsilon)$ labeled examples. Thus, the
dependence on $1/\epsilon$ is exponentially improved,
but without increasing the dependence on d (as
in the disagreement-based method for this case,
see above).
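
The localization step, querying only points within margin $\gamma_k$ of
the current separator, can be sketched as a simple filter. This is only
the selection rule, not the full margin-based algorithm; the constant c
in $\gamma_k = c / 2^k$, the function names, and the sample points are
hypothetical.

```python
def margin(w, x):
    """Absolute value of the inner product w . x, i.e. the distance of x
    from the decision boundary of w when w has unit length."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)))

def points_to_query(w_k, unlabeled, k, c=1.0):
    """Unlabeled points within margin gamma_k = c / 2^k of w_k.
    The constant c is a hypothetical tuning parameter."""
    gamma_k = c / 2 ** k
    return [x for x in unlabeled if margin(w_k, x) <= gamma_k]

w = (1.0, 0.0)                                  # current unit-length hypothesis
pool = [(0.05, 1.0), (0.5, 0.0), (-0.2, 0.3)]   # hypothetical unlabeled points
round_2 = points_to_query(w, pool, k=2)         # gamma_2 = 0.25
round_4 = points_to_query(w, pool, k=4)         # gamma_4 = 0.0625
```

As k grows the band around the decision boundary shrinks geometrically,
so label requests concentrate on the points the current hypothesis is
least certain about.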

Building on this work, [1] gave the first
polynomial-time active learning algorithm for
learning linear separators to error $\epsilon$ in the
presence of agnostic noise (of rate $O(\epsilon)$) when
the underlying distribution is an isotropic
log-concave distribution in $\mathbb{R}^d$. They proposed to
use a normalized hinge loss minimization (with
normalization factor $\tau_k$) for selecting the next
classifier $w_{k+1}$ in round k. Awasthi et al. [1]
show that by setting the parameters appropriately
(namely, $\tau_k = \Theta(1/2^k)$ and $r_k = \Theta(1/2^k)$),
the algorithm again achieves error $\epsilon$ using only
$O(\log(1/\epsilon))$ rounds, with $O(d^2)$ label requests
per round. This yields a query complexity of
$\mathrm{poly}(d, \log(1/\epsilon))$. The key ingredient for the
analysis of this computationally efficient version
in the noisy setting is proving that by constraining
the search for $w_{k+1}$ to vectors within a ball of
radius $r_k$ around $w_k$, the hinge loss acts as a
sufficiently faithful proxy for the 0/1-loss.

A recent work [14] proposes an elegant gener- 
alization of [3,4] to more general concept spaces 
and shows an analysis that is always tighter than 
disagreement-based active learning (though their 
results are not computationally efficient). 


Cluster-Based Active Learning 
The methods described above (disagreement-based
and margin-based active learning) use
active label queries to efficiently identify a classifier
from the concept class C with low error. An
alternative approach to agnostic active learning is
to design active querying methods that efficiently
find an (approximately) correct labeling of the
unlabeled input sample. Here, “correct labeling”
refers to the hidden labels $y_i$ in the sample
$(x_1, y_1), \ldots, (x_m, y_m)$ from the distribution
$D_{XY}$ (as defined in the formal setup section).
The sample labeled in this way can then be used as input to
a passive learning algorithm to learn an arbitrary
concept class.

Cluster-based active learning is a method 
for the latter approach and was introduced 
by Dasgupta and Hsu [7]. The idea is to use 
a hierarchical clustering (cluster tree) of the
unlabeled data, and check the clusters for label 
homogeneity by starting at the root of the tree (the 
whole data set) and working towards the leaves 
(single data points). The label homogeneity of a 
cluster is estimated by choosing data points for 
label query uniformly at random from the cluster. 
If a cluster is considered label homogeneous 
(with sufficiently high confidence), all remaining 
unlabeled points in that cluster are labeled with 
the majority label. If a cluster is detected to be 
label heterogeneous, it is split into its children 
in the cluster tree and processed later. The key 
insight in [7] is that since the cluster tree is fixed 
before any labels were seen, the induced labeled 
subsample of a child cluster can be considered 
a sample that was chosen uniformly at random 
from the points in that child-cluster. Thus, the 
algorithm can reuse labels from the parent cluster 
without introducing any sampling bias. The label 
efficiency of this paradigm crucially depends 
on the quality of input hierarchical clustering. 
Intuitively, if the cluster tree has a small pruning
with label homogeneous clusters, the procedure
will make only a few label queries.
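
The top-down procedure described above can be sketched as follows. This is a
simplified illustration, not the algorithm of [7]: the `Cluster` class, the
`query_label` oracle, and the `samples_per_cluster` parameter are
hypothetical names, and the homogeneity test used here (all sampled labels
agree) stands in for the confidence-based test of the actual method.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass
class Cluster:
    """A node of a fixed cluster tree; `points` holds data-point indices."""
    points: List[int]
    children: List["Cluster"] = field(default_factory=list)


def label_with_cluster_tree(
    root: Cluster,
    query_label: Callable[[int], Hashable],  # hypothetical label oracle
    samples_per_cluster: int = 3,
    seed: int = 0,
) -> Tuple[Dict[int, Hashable], int]:
    """Return (assigned labels, number of label queries made)."""
    rng = random.Random(seed)
    queried: Dict[int, Hashable] = {}   # labels actually requested
    assigned: Dict[int, Hashable] = {}  # final labeling of all points
    pending = [root]
    while pending:
        cluster = pending.pop()
        if not cluster.points:
            continue
        # Labels queried in ancestors are reused for this cluster.
        labeled = [i for i in cluster.points if i in queried]
        unlabeled = [i for i in cluster.points if i not in queried]
        rng.shuffle(unlabeled)
        target = min(samples_per_cluster, len(cluster.points))
        while len(labeled) < target:
            i = unlabeled.pop()          # uniformly random unlabeled point
            queried[i] = query_label(i)
            labeled.append(i)
        sample = [queried[i] for i in labeled]
        looks_homogeneous = len(set(sample)) == 1
        if looks_homogeneous or not cluster.children:
            # Label the rest of the cluster with the majority label.
            majority = max(set(sample), key=sample.count)
            for i in cluster.points:
                assigned[i] = queried.get(i, majority)
        else:
            # Heterogeneous: split into children and process them later.
            pending.extend(cluster.children)
    return assigned, len(queried)
```

On a data set whose root cluster is already label homogeneous, this sketch
stops after `samples_per_cluster` queries, which mirrors the remark that a
tree with a small homogeneous pruning needs only a few label queries.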

Urner et al. [12] proved label complexity reductions
with this paradigm under a distributional
assumption. They analyze a version (PLAL) of
the above paradigm that uses hierarchical clusterings
induced by spatial trees on the domain
$[0, 1]^d$ and provide label query bounds in terms of
the Probabilistic Lipschitzness of the underlying
data-generating distribution. Probabilistic Lipschitzness
quantifies a marginal-label relatedness
in the sense of close points being likely to have
the same label. For a distribution with deterministic
labels ($\Pr[Y = 1 \mid X = x] \in \{0, 1\}$ for all
$x$), the Probabilistic Lipschitzness is a function $\phi$
that bounds, as a function of $\lambda$, the mass of points
$x$ for which both labels 0 and 1 occur in the ball
$B_\lambda(x)$.
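
Written as a formula, the verbal definition above reads as follows. This is a
sketch under assumed notation: $D_X$ denotes the marginal distribution over
the domain and $\ell$ the deterministic labeling function, neither of which
is named in the text.

$$
\Pr_{x \sim D_X}\bigl[\exists\, x' \in B_\lambda(x) : \ell(x') \ne \ell(x)\bigr] \le \phi(\lambda)
\quad \text{for all } \lambda > 0 .
$$

A smaller $\phi(\lambda)$ means that fewer points have a differently labeled
point within distance $\lambda$, which is the sense in which close points are
likely to share a label.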

Urner et al. [12] show that, independently of
any data assumptions, (with probability $1 - \delta$)
PLAL labels a $(1 - \epsilon)$-fraction of the input points
correctly. They further show that using PLAL as
a preprocedure, if the data-generating distribution
has deterministic labels and its Probabilistic Lipschitzness
is bounded by $\phi(\lambda) = \lambda^n$ for some
$n \in \mathbb{N}$, then classes C of bounded VC-dimension
on domain $X = [0, 1]^d$ can be learned with
$\tilde{O}\bigl((1/\epsilon)^{(n+2d)/(n+d)}\bigr)$ many labels, while any passive
proper learner (i.e., a passive learner that outputs
a function from C) requires $\Omega(1/\epsilon^2)$ many
labels. Further, [12] show that PLAL can be used 
to reduce the number of labels needed for nearest 
neighbor classification (i.e., labeling a test point 
by the label of its nearest point in the sample) 


= 2 
from 2 (ir) to O (Gyr), 
Problem Definition


In the theory of molecular-scale self-assembly, 
large numbers of simple interacting components 
are designed to come together to build 
complicated shapes and patterns. Many models 
of self-assembly, such as the abstract Tile 
Assembly Model [6], are cellular automata-like 
crystal growth models. Indeed such models have
given rise to a rich theory of self-assembly as 
described elsewhere in this encyclopedia. In 
biological organisms we frequently see much 
more sophisticated growth processes, where self- 
assembly is combined with active molecular 
components that change internal state and even 
molecular motors that have the ability to push 
and pull large structures around. Molecular 
engineers are now beginning to design and build 
molecular-scale DNA motors and active self- 
assembly systems [2]. We wish to understand, 
at a high level of abstraction, the ultimate 
computational capabilities and limitations of 
such molecular-scale rearrangement and growth. 
The nubot model, put forward in [8], is akin 
to an asynchronous nondeterministic cellular 
automaton augmented with nonlocal rigid-body 
movement. Unit-sized monomers are placed 
on a 2D triangular grid. Monomers undergo 
state changes, appear, and disappear using local 
rules, as shown in Fig. 1. However, there is 
also a nonlocal aspect to the model: rigid-body 
movement that comes in two forms, movement 
rules and random agitations. 

A movement rule r, consisting of a pair of 
monomer states A, B and two unit vectors, is a 
programmatic way to specify unit-distance trans- 
lation of a set of monomers in one step. See Fig. 2 
for an example. If A and B are in a prescribed 
orientation, one is nondeterministically chosen 
to move unit distance in a prescribed direction. 
The rule r is applied in a rigid-body fashion: 
roughly speaking, if A is to move right, it pushes 
anything immediately to its right and pulls any 
monomers that are bound to its left which in turn 
push and pull other monomers, all in one step. 
The rule may not be applicable if it is blocked 
(i.e., if movement of A would force B to also 
move), which is analogous to the fact that an 
arm cannot push its own shoulder. The other, 
somewhat related, form of movement is called 
agitation: at every point in time, every monomer 
on the grid may move unit distance in any of the 
six directions, at unit rate for each (monomer, 
direction) pair. An agitating monomer will push
or pull any monomers that it is adjacent to, in a
way that preserves rigid-body structure and all in
one step, as shown in Fig. 3. Unlike movement,
agitations are never blocked. Rules are applied
asynchronously and in parallel. Taking its time
model from stochastic chemical kinetics, a nubot
system evolves as a continuous time Markov
process.

Active Self-Assembly and Molecular Robotics with
Nubots, Fig. 1 Overview of the nubot model. (a) A
nubot configuration showing a single nubot monomer
on the triangular grid. (b) Examples of nubot monomer
rules. Rules r1-r6 are local cellular automaton-like rules,
whereas r7 effects a nonlocal movement that may translate
other monomers as shown in Fig. 2. Monomers continuously
undergo agitation, as shown in Fig. 3. A flexible
bond is depicted as an empty red circle and a rigid bond is
depicted as a solid red disk (from [8])

Active Self-Assembly and Molecular Robotics with
Nubots, Fig. 2 Movement rule. (a) Initial configuration.
(b) Movement rule with one of two results depending on
the choice of arm or base. (c) Result if the monomer with
state 2 is the arm or (d) monomer with state 1 is the arm.
The shaded monomers are the movable set. The effect on
rigid (filled red disks), flexible (hollow red circles), and
null bonds is shown. (e) A configuration for which the
movement rule is blocked: movement of 1 or 2 would
force the other to move; hence the rule is not applicable
(from [3])

Active Self-Assembly and Molecular Robotics with
Nubots, Fig. 3 Example agitations. Starting from the
centre configuration, there are 48 possible agitations (8
monomers, 6 directions each) each with equal probability.
The right configuration is the result of agitation of the
monomer in state 2 in direction →, the left is the result
of the agitation of the monomer in state 1 in direction ←.
The shaded monomers are the agitation set: the monomers
that are moved by the agitation (from [4])

For intuition, we describe motion in terms of
pushing and pulling. However, movement and agitation
are actually intended to model a nanoscale
environment with diffusion, Brownian motion,
convection, turbulent flow, cytoplasmic streaming,
and other uncontrolled inputs of energy that
interact with monomers in all directions, moving large
molecular assemblies in a random fashion (i.e.,
agitation) and allowing motors to simply latch
and unlatch large assemblies into position (i.e.,
the movement rule).


Key Results


Assembling simple structures, namely, lines and
squares, has proven to be a fruitful way to explore
the power of the nubot model for a few reasons.
Firstly, it helps us develop a number of techniques
and intuitions for the model. Secondly,
lines and squares get used again and again in
more general results that show the full power of
the model. Thirdly, the efficiency of assembling
simple shapes has been a de facto benchmark
problem for a number of self-assembly models
(although this benchmark often does not give the
full story). In a variety of models, such as the
abstract Tile Assembly Model, cellular automata,
and some robotics models, it takes time $\Omega(n)$ to
assemble a length $n$ line. In the nubot model this
is achieved in merely $O(\log n)$ expected time and
$O(\log n)$ states.


Theorem 1 ([8]) For each $n \in \mathbb{N}$, there is a
set of nubot rules $\mathcal{N}_n^{\mathrm{line}}$ such that, starting from
a single monomer, $\mathcal{N}_n^{\mathrm{line}}$ assembles a length $n$
line in $O(\log n)$ expected time, $n \times 2$ space, and
$O(\log n)$ states.
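
For intuition about why logarithmic expected time is plausible under a
continuous-time Markov model, the following toy calculation may help. It is
not the construction of [8]: it assumes a simplified rate model in which
every monomer already in the line inserts one new monomer at unit rate, and
compares it with a sequential model in which only one end of the line grows.
The function names are hypothetical.

```python
import math


def expected_time_parallel(n: int) -> float:
    """Expected time to grow from 1 to n monomers when each of the k
    current monomers inserts a new one at rate 1 (total rate k), so the
    expected wait for the next insertion is 1/k."""
    return sum(1.0 / k for k in range(1, n))


def expected_time_sequential(n: int) -> float:
    """Expected time when only one growth site is active at rate 1."""
    return float(n - 1)


if __name__ == "__main__":
    for n in (16, 1024, 2 ** 20):
        print(n, expected_time_parallel(n), expected_time_sequential(n),
              math.log(n))
```

In this toy model the parallel time is a harmonic sum, which grows like
$\ln n$, whereas the sequential time grows linearly in $n$.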


One can trade time for states by giving a slightly 
slower method with fewer states: 


Theorem 2 ([3]) There is a set of nubot rules
$\mathcal{N}^{\mathrm{line}}$ such that for each $n \in \mathbb{N}$, from a line of
$O(\log n)$ “binary” monomers (each in state 0 or
1), $\mathcal{N}^{\mathrm{line}}$ assembles a length $n$ line in $O(\log^2 n)$
expected time, $n \times O(1)$ space, and $O(1)$
states.


An $n \times n$ square can be built by growing a
horizontal line and then $n$ vertical lines, showing
that assembly of squares with nubots is exponentially
faster than the $\Theta(n)$ expected time seen in
the abstract Tile Assembly Model [1]:


Theorem 3 ([8]) For each $n \in \mathbb{N}$, there is a
set of nubot rules $\mathcal{N}_n^{\mathrm{square}}$ such that, starting from
a single monomer, $\mathcal{N}_n^{\mathrm{square}}$ assembles an $n \times n$
square in $O(\log n)$ expected time, $n \times n$ space,
and $O(\log n)$ states.


The results above, and all of those in [3, 8], 
crucially make use of the rigid-body movement 
rule: the ability for a single monomer to control 
the movement of large objects quickly and at 
a time and place of the programmer’s choos- 
ing. However, in a molecular-scale environment, 
molecular motion is happening in a largely un- 
controlled and fundamentally random manner, 
all of the time. The agitation nubot model does 
not have the movement rule, but instead permits
such uncontrolled random agitation (movement).
Although this form of movement is challenging
to control in a precise manner, the following
result shows we can use it to achieve sublinear
expected time growth of a length $n$ line in only
$O(n)$ space:


Theorem 4 ([4]) There is a set of nubot rules
$\mathcal{N}_{\mathrm{line}}$ such that $\forall n \in \mathbb{N}$, starting from a line of
$\lfloor \log_2 n \rfloor + 1$ monomers, each in state 0 or 1, $\mathcal{N}_{\mathrm{line}}$
in the agitation nubot model assembles an $n \times 1$
line in $O(n^{1/3} \log n)$ expected time, $n \times 5$ space,
and $O(1)$ monomer states.


For a square we can do much better, achieving 
polylogarithmic expected time: 


Theorem 5 ([4]) There is a set of nubot rules
$\mathcal{N}_{\mathrm{square}}$ such that $\forall n \in \mathbb{N}$, starting from
a line of $\lfloor \log_2 n \rfloor + 1$ monomers, each in
state 0 or 1, $\mathcal{N}_{\mathrm{square}}$ in the agitation nubot
model assembles an $n \times n$ square in $O(\log^2 n)$
expected time, $n \times n$ space, and $O(1)$ monomer
states.


This section concludes with three results
on general-purpose computation and shape
construction with the nubot model. First we
have a computability-theoretic result: any finite 
computable connected shape can be quickly self- 
assembled. 


Theorem 6 ([8]) An arbitrary connected computable
2D shape of size $\le \sqrt{n} \times \sqrt{n}$ can be
assembled in expected time $O(\log^2 n + t(|n|))$
using $O(s + \log n)$ states. Here, $t(|n|)$ is the time
required for a program-size $s$ Turing machine to
compute, given a pixel index as a binary string
of length $|n| = \lfloor \log_2 n \rfloor + 1$, whether or not the
pixel is present in the shape.


For complicated computable shapes the 
construction for Theorem 6 necessarily requires 
computation workspace outside of the shape’s 
bounding box. The next result is of a more 
resource-bounded style and, roughly speaking, 
states that 2D patterns with efficiently com- 
putable pixel colors can be assembled using 
nubots in merely polylogarithmic expected time 
while staying inside the pattern’s bounding box. 
Theorem 7 ([8]) An arbitrary finite computable
2D pattern of size $\le n \times n$, where $n = 2^p$, $p \in \mathbb{N}$,
with pixels whose color is computable on
a polynomial time $O(|n|^\ell)$ (inputs are binary
strings of length $|n| = O(\log n)$), linear space
$O(|n|)$, program-size $s$ Turing machine, can be
assembled in expected time $O(\log^{\ell+1} n)$, with
$O(s + \log n)$ monomer states and without growing
outside the pattern borders.


The results cited so far can be used to compare 
the nubot model to other models of self-assembly 
and tell us that nubots build shapes and patterns 
in a fast parallel manner. The next result quan- 
tifies this parallelism in terms of a well-known 
parallel model from computational complexity 
theory: NC is the class of problems solved by 
uniform polylogarithmic depth and polynomial- 
size Boolean circuits. 


Theorem 8 ([3]) For each language $L \in \mathrm{NC}$,
there is a set of nubot rules $\mathcal{N}_L$ that decides $L$ in
polylogarithmic expected time, constant number
of monomer states, and polynomial space in the
length of the input string of binary monomers
(in state 0 or 1). The output is a single binary
monomer.


This result stands in contrast to sequential ma- 
chines like Turing machines, that cannot read all 
of an n-bit input string in polylogarithmic time, 
and “somewhat parallel” models like cellular au- 
tomata and the abstract Tile Assembly Model, 
that cannot have all of n bits influence a single 
output bit decision in polylogarithmic time [5]. 
Thus, adding the nubot rigid-body movement 
primitive to an asynchronous nondeterministic 
cellular automaton drastically increases its paral- 
lel processing abilities. 


Open Problems 


Some future research directions are discussed 
here and in [3, 4, 8]. It remains as future work 
to look at other topics such as fault tolerance, 
self-healing, dynamical tasks, or systems that 
continuously respond to the environment. 
The Complexity of Assembling Lines

Theorem 1 states that a line can be grown in
expected time $O(\log n)$, space $O(n) \times O(1)$,
and $O(\log n)$ states, and Theorem 2 trades time
for states to get expected time $O(\log^2 n)$, space
$O(n) \times O(1)$, and $O(1)$ states. What is the complexity
(expected time × states) of assembling a
line in the nubot model? Is it possible to meet
the lower bound of expected time × states =
$\Omega(\log n)$? In this problem, the input should be a
set of monomers with space × states = $O(\log n)$.
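
To make the gap concrete, the product of expected time and states for the two
theorems can be computed directly from their stated bounds:

$$
\text{Theorem 1: } O(\log n) \cdot O(\log n) = O(\log^2 n),
\qquad
\text{Theorem 2: } O(\log^2 n) \cdot O(1) = O(\log^2 n).
$$

Both constructions therefore give the same product, $O(\log^2 n)$, which is a
factor of $\log n$ above the $\Omega(\log n)$ lower bound mentioned in the
question.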


Computational Power 

Theorem 8 gives a lower bound on the compu- 
tational complexity of the nubot model. What is 
the exact power of polylogarithmic expected time 
nubots? The answer may differ on whether we 
begin from a small collection of monomers (as 
in Theorem 8) or a large prebuilt structure. One 
challenge, for the upper bound, involves finding 
better Turing machine space, or circuit depth, 
bounds on computing multiple applications of the 
movable set on a large nubots grid. 


Synchronization and Composition of 

Nubot Algorithms 

Synchronization is a method to quickly send 
signals using nonlocal rigid-body motion [3, 8]. 
The nubot model is asynchronous, but synchro- 
nization can be used to set discrete stages, or 
checkpoints, during a complicated construction. 
This in turn facilitates composition of nubot al- 
gorithms (run algorithm 1, synchronize, run al- 
gorithm 2, synchronize, etc.) and many of the 
results cited here use it for exactly that reason. 
However, synchronization-less constructions of- 
ten exhibit a kind of independence where growth 
proceeds everywhere in parallel, without waiting 
on signals from distant components. Such sys- 
tems are highly distributed, easy to analyze, and 
perhaps more amenable to laboratory implemen- 
tation. Intuitively, this seems like the right way 
to program molecules. The proof of Theorem 7 
does not use synchronization which shows that 
without it a very general class of (efficiently) 
computable patterns can be grown and indeed 
the proof gives methods to compose nubot al- 
gorithms without resorting to synchronization. 
It remains as future work to formalize both this
notion of synchronization-less “independence”
and what we mean by “composition” of nubot
algorithms. What conditions are necessary and
sufficient for composition of nubot algorithms?
What classes of shapes and patterns can be assembled
without synchronization or other
forms of rapid long-range communication?


Agitation Versus the Movement Rule 

Is it possible to simulate the movement rule using 
agitation? More formally, is it the case that for 
each nubot program $\mathcal{N}$, there is an agitation
nubot program $A_{\mathcal{N}}$ that acts just like $\mathcal{N}$ but with
some $m \times m$ scale-up in space, and a $k$ factor
slowdown in time, where $m$ and $k$ are constants
independent of $\mathcal{N}$ and its input? As motivation,
note that every self-assembled molecular-scale 
structure was made under conditions where ran- 
dom jiggling of monomers is a dominant source 
of movement! Our question asks if we can pro- 
grammably exploit this random molecular motion 
to build structures quicker than without it. 


Intrinsic Universality and Simulation 

Is the nubot model intrinsically universal? Specif- 
ically, does there exist a set of monomer rules U, 
such that any nubot system $\mathcal{N}$ can be simulated
by “seeding” U with a suitable initial configuration?
Here the simulation should have a spatial
scale factor $m$ that is a function of the number of
states in the simulated system $\mathcal{N}$. Is the agitation
nubot model intrinsically universal? Our hope 
would be that simulation could be used to tease 
apart the power of different notions of movement 
(e.g., to understand if nubot-style movement is 
weaker or stronger than other notions of robotic 
movement), in the way it has been used to char- 
acterize and separate the power of other self- 
assembly models [7]. 


Brownian Nubots 

With nubots, under agitation, or multiple parallel 
movement rules, larger objects move faster. This 
is intended to model an environment with uncon- 
trolled and rapid fluid flows. But in Brownian 
motion, larger objects move slower: what is the 
power of nubots with such a rate model, for 
example, with rate equal to object size? Although
assembly in such a model may be slower than 
with the usual model, many of the same program- 
ming principles should apply, and indeed it will 
still be possible to assemble objects in a parallel 
distributed fashion. 
Problem Definition


Adaptive partition is one of the major techniques
to design polynomial-time approximation 
algorithms, especially polynomial-time approx- 
imation schemes for geometric optimization 
problems. The framework of this technique 
is to put the input data into a rectangle and 
partition this rectangle into smaller rectangles 
by a sequence of cuts so that the problem is also 
partitioned into smaller ones. Associated with 
each adaptive partition, a feasible solution can be 
constructed recursively from solutions in smallest 
rectangles to bigger rectangles. With dynamic 
programming, an optimal adaptive partition is 
computed in polynomial time. 


Historical Note 

The adaptive partition was first introduced to 
the design of an approximation algorithm by Du 
et al. [4] with a guillotine cut while they studied 
the minimum edge-length rectangular partition 
(MELRP) problem. They found that if the par- 
tition is performed by a sequence of guillotine 
cuts, then an optimal solution can be computed 
in polynomial time with dynamic programming. 
Moreover, this optimal solution can be used as a 
pretty good approximation solution for the origi- 
nal rectangular partition problem. Both Arora [1] 
and Mitchell et al. [12, 15] found that the cut does 
not need to be completely guillotine. In other 
words, the dynamic programming can still run 
in polynomial time if subproblems have some 
relations but the number of relations is small. 
As the number of relations goes up, the approxi- 
mation solution obtained approaches the optimal 
one, while the run time, of course, goes up. They 
also found that this technique can be applied to 
many geometric optimization problems to obtain 
polynomial-time approximation schemes. 


Key Results 


The MELRP was proposed by Lingas et al. [10] 
as follows: Given a rectilinear polygon possibly 
with some rectangular holes, partition it into 
rectangles with minimum total edge length. Each
hole may be degenerated into a line segment or a 
point. 

There are several applications mentioned in 
[10] for the background of the problem: process 
control (stock cutting), automatic layout systems 
for integrated circuit (channel definition), and 
architecture (internal partitioning into offices). 
The minimum edge-length partition is a natural 
goal for these problems since there is a cer- 
tain amount of waste (e.g., sawdust) or expense 
incurred (e.g., for dividing walls in the office) 
which is proportional to the sum of edge lengths 
drawn. For very large-scale integration (VLSI)
design, this criterion is used in the MIT place- 
ment and interconnect (PI) system to divide the 
routing region up into channels — one finds that 
this produces large “natural-looking” channels 
with a minimum of channel-to-channel interac- 
tion to consider. 

They showed that while the MELRP in general 
is nondeterministic polynomial-time (NP)-hard, 
it can be solved in time $O(n^4)$ in the hole-free
case, where $n$ is the number of vertices
in the input rectilinear polygon. The polynomial 
algorithm is essentially a dynamic programming 
based on the fact that there always exists an opti- 
mal solution satisfying the property that every cut 
line passes through a vertex of the input polygon 
or holes (namely, every maximal cut segment is 
incident to a vertex of input or holes). 

A naive idea to design an approximation al- 
gorithm for the general case is to use a forest 
connecting all holes to the boundary and then to 
solve the resulting hole-free case in $O(n^4)$ time.
With this idea, Lingas [9] gave the first constant- 
bounded approximation; its performance ratio 
is 41. 

Motivated by a work of Du et al. [6] on 
application of dynamic programming to opti- 
mal routing trees, Du et al. [4] initiated an idea 
of adaptive partition. They used a sequence of 
guillotine cuts to do rectangular partition; each 
guillotine cut breaks a connected area into at least 
two parts. With dynamic programming, they were 
able to show that a minimum-length guillotine 
rectangular partition (i.e., one with minimum 
total length among all guillotine partitions) can 
be computed in $O(n^5)$ time. Therefore, they
suggested using the minimum-length guillotine 
rectangular partition to approximate the MELRP 
and tried to analyze the performance ratio. Un- 
fortunately, they failed to get a constant ratio in 
general and only obtained an upper bound of 2 for 
the performance ratio in an NP-hard special case 
[7]. In this special case, the input is a rectangle 
with some points inside. Those points are holes. 
The following is a simple version of the proof 
obtained by Du et al. [5]. 


Theorem 1 The minimum-length guillotine rect- 
angular partition is an approximation with per- 
formance ratio 2 for the MELRP. 


Proof Consider a rectangular partition $P$. Let
$\mathrm{proj}_x(P)$ denote the total length of segments on
a horizontal line covered by the vertical projection of
the partition $P$.

A rectangular partition is said to be covered
by a guillotine partition if each segment in the
rectangular partition is covered by a guillotine
cut of the latter. Let $\mathrm{guil}(P)$ denote the minimum
length of a guillotine partition covering $P$ and
$\mathrm{length}(P)$ denote the total length of the rectangular
partition $P$. It will be proved by induction on the
number $k$ of segments in $P$ that

$$\mathrm{guil}(P) \le 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).$$

For $k = 1$, one has $\mathrm{guil}(P) = \mathrm{length}(P)$. If the
segment is horizontal, then one has $\mathrm{proj}_x(P) =
\mathrm{length}(P)$ and hence

$$\mathrm{guil}(P) = 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).$$

If the segment is vertical, then $\mathrm{proj}_x(P) = 0$ and
hence

$$\mathrm{guil}(P) \le 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).$$


Now, consider $k \ge 2$. Suppose that the initial
rectangle has each vertical edge of length $a$ and
each horizontal edge of length $b$. Consider two
cases.

Case 1. There exists a vertical segment $s$ having
length greater than or equal to $0.5a$. Apply a
guillotine cut along this segment $s$. Then the
remainder of $P$ is divided into two parts, $P_1$
and $P_2$, which form rectangular partitions of the two
resulting small rectangles, respectively. By the induction
hypothesis,

$$\mathrm{guil}(P_i) \le 2 \cdot \mathrm{length}(P_i) - \mathrm{proj}_x(P_i)$$

for $i = 1, 2$. Note that

$$\mathrm{guil}(P) \le \mathrm{guil}(P_1) + \mathrm{guil}(P_2) + a,$$
$$\mathrm{length}(P) = \mathrm{length}(P_1) + \mathrm{length}(P_2) + \mathrm{length}(s),$$
$$\mathrm{proj}_x(P) = \mathrm{proj}_x(P_1) + \mathrm{proj}_x(P_2).$$

Therefore,

$$\mathrm{guil}(P) \le 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).$$
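
The step marked "Therefore" in Case 1 can be spelled out by chaining the
three relations with the induction hypothesis; the only additional fact used
is $\mathrm{length}(s) \ge 0.5a$:

$$
\begin{aligned}
\mathrm{guil}(P)
&\le \mathrm{guil}(P_1) + \mathrm{guil}(P_2) + a \\
&\le 2\bigl(\mathrm{length}(P_1) + \mathrm{length}(P_2)\bigr)
   - \bigl(\mathrm{proj}_x(P_1) + \mathrm{proj}_x(P_2)\bigr) + a \\
&= 2 \cdot \mathrm{length}(P) - 2 \cdot \mathrm{length}(s) + a - \mathrm{proj}_x(P) \\
&\le 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).
\end{aligned}
$$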


Case 2. No vertical segment in $P$ has length
greater than or equal to $0.5a$. Choose a horizontal
guillotine cut which partitions the rectangle into
two equal parts. Let $P_1$ and $P_2$ denote the rectangular
partitions of the two parts, obtained from $P$. By the
induction hypothesis,

$$\mathrm{guil}(P_i) \le 2 \cdot \mathrm{length}(P_i) - \mathrm{proj}_x(P_i)$$

for $i = 1, 2$. Note that

$$\mathrm{guil}(P) \le \mathrm{guil}(P_1) + \mathrm{guil}(P_2) + b,$$
$$\mathrm{length}(P) \ge \mathrm{length}(P_1) + \mathrm{length}(P_2),$$
$$\mathrm{proj}_x(P) = \mathrm{proj}_x(P_1) = \mathrm{proj}_x(P_2) = b.$$

Therefore,

$$\mathrm{guil}(P) \le 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).$$
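
As in Case 1, the final inequality of Case 2 follows by substituting the
induction hypothesis and the three relations above:

$$
\begin{aligned}
\mathrm{guil}(P)
&\le \mathrm{guil}(P_1) + \mathrm{guil}(P_2) + b \\
&\le 2\bigl(\mathrm{length}(P_1) + \mathrm{length}(P_2)\bigr) - 2b + b \\
&\le 2 \cdot \mathrm{length}(P) - b
 = 2 \cdot \mathrm{length}(P) - \mathrm{proj}_x(P).
\end{aligned}
$$

Applying the resulting bound to an optimal rectangular partition $P^*$ gives
$\mathrm{guil}(P^*) \le 2 \cdot \mathrm{length}(P^*)$, which is the
performance ratio 2 claimed in Theorem 1.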


Gonzalez and Zheng [8] improved this upper
bound to 1.75 and conjectured that the performance
ratio in this case is 1.5. $\square$
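
The dynamic program behind the minimum-length guillotine partition can be
sketched for the special case discussed above, a rectangle with point holes.
This is an illustrative, unoptimized sketch rather than the algorithm of [4]:
it assumes integer coordinates, points strictly inside the rectangle, that
every point must lie on some cut, and that it suffices to try cuts passing
through a point. The function name is hypothetical.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def min_guillotine_length(
    width: int, height: int, points: Sequence[Tuple[int, int]]
) -> int:
    """Minimum total cut length of a guillotine partition of the
    rectangle [0, width] x [0, height] that covers every point."""
    points = tuple(points)
    # Candidate cut coordinates: rectangle sides and point coordinates.
    xs = sorted({0, width, *(x for x, _ in points)})
    ys = sorted({0, height, *(y for _, y in points)})

    @lru_cache(maxsize=None)
    def best(i0: int, i1: int, j0: int, j1: int) -> int:
        x0, x1, y0, y1 = xs[i0], xs[i1], ys[j0], ys[j1]
        inside = [(x, y) for x, y in points if x0 < x < x1 and y0 < y < y1]
        if not inside:
            return 0  # no point left in the interior: no cut needed
        candidates = []
        for i in range(i0 + 1, i1):
            # Vertical guillotine cut through at least one interior point.
            if any(x == xs[i] for x, _ in inside):
                candidates.append(
                    (y1 - y0) + best(i0, i, j0, j1) + best(i, i1, j0, j1)
                )
        for j in range(j0 + 1, j1):
            # Horizontal guillotine cut through at least one interior point.
            if any(y == ys[j] for _, y in inside):
                candidates.append(
                    (x1 - x0) + best(i0, i1, j0, j) + best(i0, i1, j, j1)
                )
        return min(candidates)

    return best(0, len(xs) - 1, 0, len(ys) - 1)
```

The subproblems are the rectangles spanned by pairs of candidate coordinates,
and each is solved by trying every admissible first cut, which is the
recursive structure that makes guillotine partitions amenable to dynamic
programming.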
Applications


In 1996, Arora [1] and Mitchell et al. [12, 14, 15] 
found that the cut does not necessarily have 
to be completely guillotine in order to have a 
polynomial-time computable optimal solution for 
such a sequence of cuts. Of course, the number 
of connections left by an incomplete guillotine 
cut should be limited. While Mitchell et al. de- 
veloped the m-guillotine subdivision technique, 
Arora employed a “portal” technique. They also 
found that their techniques can be used not
only for the MELRP but also for many geometric
optimization problems [1-3, 12-15].


Open Problems 


At the current submicron stage of technology
evolution in electronics, interconnects have
become the dominating factor in determining
VLSI performance and reliability. Historically,
the problem of interconnect design in VLSI has
been very tightly intertwined with the classi- 
cal problem in computational geometry: Steiner 
minimum tree generation. Some essential char- 
acteristics of VLSI are roughly proportional to 
the length of the interconnects. Such character- 
istics include chip area, yield, power consump- 
tion, reliability, and timing. For example, the 
area occupied by interconnects is proportional 
to their combined length and directly impacts 
the chip size. Larger chip size results in re- 
duction of yield and increase in manufacturing 
cost. The costs of other components required for 
manufacturing also increase with the increase of 
the wire length. From the performance angle, 
longer interconnects cause an increase in power 
dissipation, degradation of timing, and other un- 
desirable consequences. That is why finding the 
minimum length of interconnects consistent with 
other goals and constraints is such an important 
problem at this stage of VLSI technology. 

The combined length of the interconnects on 
a chip is the sum of the lengths of individual 
signal nets. Each signal net is a set of electrically 
connected terminals, where one terminal acts 
as a driver and other terminals are receivers of 
electrical signals. Historically, for the purpose
of finding an optimal configuration of intercon- 
nects, terminals were considered as points on the 
plane, and a routing problem for individual nets 
was formulated as a classical Steiner minimum 
tree problem. For a variety of reasons, VLSI 
technology implements only rectilinear wiring 
on the set of parallel planes, and, consequently, 
with few exceptions, only a rectilinear version of 
the Steiner tree is being considered in the VLSI 
domain. This problem is known as the RSMT. 

Further progress in VLSI technology resulted 
in more factors than just length of interconnects 
gaining importance in selection of routing topolo- 
gies. For example, the presence of obstacles led 
to reexamination of techniques used in studies of 
the rectilinear Steiner tree, since many classical 
techniques do not work in this new environment. 
To clarify the statement made above, we will 
consider the construction of a rectilinear Steiner 
minimum tree in the presence of obstacles. 

Let us start with a rectilinear plane with ob- 
stacles defined as rectilinear polygons. Given n 
points on the plane, the objective is to find the 
shortest rectilinear Steiner tree that interconnects 
them. One already knows that a polynomial-time 
approximation scheme for RSMT without obsta- 
cles exists and can be constructed by adaptive 
partition with application of either the portal or 
the m-guillotine subdivision technique. However, 
both the m-guillotine cut and the portal tech- 
niques do not work in the case that obstacles ex- 
ist. The portal technique is not applicable because 
obstacles may block the movement of the line that 
crosses the cut at a portal. The m-guillotine cut 
could not be constructed either, because obstacles 
may break down the cut segment that makes the 
Steiner tree connected. 

In spite of the facts stated above, the RSMT 
with obstacles may still have polynomial-time 
approximation schemes. Strong evidence was 
given by Min et al. [11]. They constructed a 
polynomial-time approximation scheme for the 
problem with obstacles under the condition that 
the ratio of the longest edge and the shortest edge 
of the minimum spanning tree is bounded by a 
constant. This design is based on the classical
nonadaptive partition approach. All of the above
make us believe that a new adaptive technique
can be found for the case with obstacles.


Problem Definition


A spanner is a subgraph of a given graph that
faithfully preserves the pairwise distances of that
graph. Formally, an $(\alpha, \beta)$-spanner of a graph
$G = (V, E)$ is a subgraph $H$ of $G$ such that
for any pair of nodes $x, y$, $\mathrm{dist}(x, y, H) \le \alpha \cdot
\mathrm{dist}(x, y, G) + \beta$, where $\mathrm{dist}(x, y, H')$ for a
subgraph $H'$ is the length of the shortest path
from $x$ to $y$ in $H'$. We say that the spanner is
additive if $\alpha = 1$, and if in addition $\beta = O(1)$,
we say that the spanner is purely additive. If
$\beta = 0$, we say that the spanner is multiplicative;
otherwise, we say that the spanner is mixed.
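
The definition can be checked directly on small unweighted graphs. The following is a minimal sketch, assuming graphs are given as node and edge lists; the helper names `bfs_dist`, `to_adj`, and `is_spanner` are illustrative and not taken from the cited works, and the check does not verify that H is a subgraph of G.

```python
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph given as adjacency sets."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def to_adj(nodes, edges):
    """Build undirected adjacency sets from an edge list."""
    adj = {v: set() for v in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def is_spanner(nodes, g_edges, h_edges, alpha, beta):
    """Check dist(x, y, H) <= alpha * dist(x, y, G) + beta for all pairs connected in G."""
    g, h = to_adj(nodes, g_edges), to_adj(nodes, h_edges)
    for x in nodes:
        dist_g, dist_h = bfs_dist(g, x), bfs_dist(h, x)
        for y, d in dist_g.items():
            if dist_h.get(y, float("inf")) > alpha * d + beta:
                return False
    return True


# A 4-cycle G and the path H obtained by dropping the edge (3, 0).
nodes = [0, 1, 2, 3]
g_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
h_edges = [(0, 1), (1, 2), (2, 3)]
print(is_spanner(nodes, g_edges, h_edges, 1, 2))  # H is a (1, 2)-spanner of G
print(is_spanner(nodes, g_edges, h_edges, 1, 1))  # but not a (1, 1)-spanner
```

In this example the only stretched pair is (0, 3), whose distance grows from 1 to 3, so the path is also a multiplicative (3, 0)-spanner of the cycle.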
Key Results


This section presents a survey on spanners
with a special focus on additive spanners.

Graph spanners were first introduced in 
[12,13] in the late 1980s and have been 
extensively studied since then. 

Spanners are used as a key ingredient in many 
distributed applications, e.g., synchronizers [13], 
compact routing schemes [6, 14, 17], broadcast- 
ing [11], etc. 

Much of the work on spanners considers multiplicative
spanners. A well-known theorem on
multiplicative spanners is that one can efficiently
construct a $(2k - 1, 0)$-spanner with $O(n^{1+1/k})$
edges [2]. Based on the girth conjecture of Erdős
[10], this size-stretch trade-off is conjectured to be
optimal.

The problem of additive spanners was also
extensively studied, yet several key questions
remain open. The girth conjecture does not contradict
the existence of $(1, 2k - 2)$-spanners of size
$O(n^{1+1/k})$, or in fact any $(\alpha, \beta)$-spanners of size
$O(n^{1+1/k})$ such that $\alpha + \beta = 2k - 1$ with $\alpha \ge 1$
and $\beta \ge 0$.

The first construction for purely additive
spanners was introduced by Aingworth et al. [1].
It was shown in [1] how to efficiently construct a
$(1, 2)$-spanner, or a 2-additive spanner for short,
with $O(n^{3/2})$ edges (see also [8, 9, 16, 19] for
further follow-up). Later, Baswana et al. [3, 4]
presented an efficient construction for 6-additive
spanners with $O(n^{4/3})$ edges. Woodruff [21]
further presented another construction for 6-additive
spanners with $O(n^{4/3})$ edges with
improved construction time. Chechik [7] recently
presented a new algorithm for $(1, 4)$-additive
spanners with $O(n^{7/5})$ edges. These are the only
three constructions known for purely additive
spanners. Interestingly, Woodruff [20] presented
lower bounds for additive spanners that match the
girth conjecture bounds. These lower bounds do
not rely on the correctness of the conjecture.
More precisely, Woodruff [20] showed the
existence of graphs for which any spanner of size
$O(k^{-1} n^{1+1/k})$ must have an additive stretch of
at least $2k - 1$.

In the absence of additional purely additive
spanners, attempts were made to seek sparser
spanners with nonconstant additive stretch. Bollobás
et al. [5] showed how to efficiently construct
a $(1, n^{1-2\delta})$-spanner with $O(2^{1/\delta} n^{1+\delta})$
edges for any $\delta > 0$. Later, Baswana et al. in [3, 4]
improved this additive stretch to $(1, n^{1-3\delta})$, and
in addition, Pettie [15] improved the stretch to
$(1, n^{9/16 - 7\delta/8})$ (the latter is better than the former
for every $\delta < 7/34$).

Chechik [7] recently further improved the
stretch for a specific range of $\delta$. More specifically,
Chechik presented a construction for additive
spanners with $O(n^{1+\delta})$ edges and $O(n^{1/2 - 3\delta/2})$
additive stretch for any $3/17 \le \delta < 1/3$. Namely,
[7] decreased the stretch for this range to the root
of the best previously known additive stretch.

Sublinear additive spanners, that is, spanners
with additive stretch that is sublinear in the
distances, were also studied. Thorup and Zwick
[19] showed a construction of an $O(kn^{1+1/k})$
size spanner such that for every pair of nodes $s$
and $t$, the additive stretch is $O(d^{1-1/k} + 2^k)$,
where $d = \mathrm{dist}(s, t)$ is the distance between
$s$ and $t$. This was later improved by Pettie
[15], who presented an efficient spanner
construction with $O\left(kn^{1 + \frac{(3/4)^{k-2}}{7 - 2(3/4)^{k-2}}}\right)$ size
and $O(kd^{1-1/k} + k^k)$ additive stretch, where
$d = \mathrm{dist}(s, t)$. Specifically, for $k = 2$, the size of
the spanner is $O(n^{6/5})$ and the additive stretch
is $O(\sqrt{d})$.

Chechik [7] slightly improved the size of
Pettie’s [15] sublinear additive spanner with
additive stretch $O(\sqrt{d})$ from $O(n^{1+1/5})$ to
$\tilde{O}(n^{1+3/17})$.



Open Problems 


A major open problem in the area of additive
spanners concerns the existence of purely additive
spanners with $O(n^{1+\delta})$ edges for any $\delta > 0$. In
particular, proving or disproving the existence of
a spanner of size $O(n^{4/3 - \epsilon})$ for some constant
$\epsilon$ with constant or even polylog additive stretch
would be a major breakthrough.


Problem Definition


The model studied here is the same as that
first presented in [10] by Varian. For some
keyword, $\mathcal{N} = \{1, 2, \ldots, N\}$ advertisers bid for
$\mathcal{K} = \{1, 2, \ldots, K\}$ advertisement slots ($K < N$),
which will be displayed on the search result page
from top to bottom. The higher the advertisement
is positioned, the more conspicuous it is and the
more clicks it receives. Thus for any two slots
$k_1, k_2 \in \mathcal{K}$, if $k_1 < k_2$, then slot $k_1$’s click-through
rate (CTR) $c_{k_1}$ is larger than $c_{k_2}$. That
is, $c_1 > c_2 > \cdots > c_K$, from top to bottom,
respectively. Moreover, each bidder $i \in \mathcal{N}$ has
privately known information, $v^i$, which represents
the expected return per click to bidder $i$.

According to each bidder $i$’s submitted bid $b^i$,
the auctioneer then decides how to distribute the
advertisement slots among the bidders and how
much they should pay per click. In particular,
the auctioneer first sorts the bidders in decreasing
order according to their submitted bids. Then the
highest slot is allocated to the first bidder, the
second highest slot is allocated to the second
bidder, and so on. The last $N - K$ bidders
lose and get nothing. Finally, each winner is
charged, on a per-click basis, the next bid in
the descending bid queue. The losers pay
nothing.

Let $b_k$ denote the $k$th highest bid in the descending
bid queue and $v_k$ the true value of the
$k$th bidder in the descending queue. Thus if bidder
$i$ got slot $k$, $i$’s payment would be $b_{k+1} \cdot c_k$.
Otherwise, his/her payment would be zero. Hence,
for any bidder $i \in \mathcal{N}$, if $i$ were on slot $k \in \mathcal{K}$,
his/her utility (payoff) could be represented as


$$u_k^i = (v^i - b_{k+1}) \cdot c_k$$
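
The allocation and payment rule can be made concrete with a short sketch. It assumes bids are given as a dictionary and CTRs as a decreasing list; the names `gsp_outcome` and `utility` are illustrative, ties are broken by input order, and a winner with no bidder below pays zero (a case the model excludes since K < N).

```python
def gsp_outcome(bids, ctrs):
    """Rank bidders by bid; bidder in slot k pays b_{k+1} * c_k.

    bids: dict bidder -> bid; ctrs: list [c_1, ..., c_K] with c_1 > ... > c_K.
    Returns dict bidder -> (slot or None, payment).
    """
    order = sorted(bids, key=lambda i: bids[i], reverse=True)
    outcome = {}
    for pos, i in enumerate(order):
        if pos < len(ctrs):
            # Assumption: if nobody bids below, the next bid is taken as 0.
            next_bid = bids[order[pos + 1]] if pos + 1 < len(order) else 0.0
            outcome[i] = (pos + 1, next_bid * ctrs[pos])
        else:
            outcome[i] = (None, 0.0)  # losers get no slot and pay nothing
    return outcome


def utility(value, slot, payment, ctrs):
    """u_k^i = (v^i - b_{k+1}) * c_k, written as v^i * c_k minus the payment."""
    return 0.0 if slot is None else value * ctrs[slot - 1] - payment


bids = {"a": 5.0, "b": 8.0, "c": 3.0}
ctrs = [1.0, 0.5]
outcome = gsp_outcome(bids, ctrs)
print(outcome)
print(utility(10.0, *outcome["b"], ctrs))
```

Here bidder b takes the top slot and is charged bidder a's bid per click, a takes the second slot and is charged c's bid per click, and c loses.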


Unlike one-round sealed-bid auctions where 
each bidder has only one chance to bid, the 
adword auction allows bidders to change their 
bids any time. Once bids are changed, the 
system refreshes the ranking automatically 
and instantaneously. Accordingly, all bidders’ 
payment and utility are also recalculated. As a 
result, other bidders could then have an incentive 
to change their bids to increase their utility 
and so on. 


Definition 1 (Adword Pricing) 


INPUT: The CTR for each slot and each bidder’s
expected return per click on his/her advertising

OUTPUT: The stable states of this auction and
whether any of these stable states can be
reached from any initial state


Key Results 


Let $\mathbf{b}$ represent the bid vector $(b^1, b^2, \ldots, b^N)$.
For all $i \in \mathcal{N}$, $O^i(\mathbf{b})$ denotes bidder $i$’s place
in the descending bid queue. Let $\mathbf{b}^{-i} =
(b^1, \ldots, b^{i-1}, b^{i+1}, \ldots, b^N)$ denote the bids
of all other bidders except $i$. $M^i(\mathbf{b}^{-i})$ returns a
set defined as


$$M^i(\mathbf{b}^{-i}) = \arg\max_{b^i \in [0, v^i]} \left\{ u^i_{O^i(b^i, \mathbf{b}^{-i})} \right\} \quad (1)$$


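Definition 2 (Forward-Looking Best-Response
Function) Given $\mathbf{b}^{-i}$, suppose $O^i(M^i(\mathbf{b}^{-i}),
\mathbf{b}^{-i}) = k$; then bidder $i$’s forward-looking
response function $F^i(\mathbf{b}^{-i})$ is defined as


$$F^i(\mathbf{b}^{-i}) = \begin{cases} v^i - \dfrac{c_k}{c_{k-1}} (v^i - b_{k+1}) & 2 \le k \le K \\ v^i & k = 1 \text{ or } k > K \end{cases} \quad (2)$$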
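
As an illustration of Eq. 2, the sketch below first picks the slot k that maximizes the bidder's utility against the other bids and then evaluates the forward-looking response. The function name `forward_looking_bid` and the tie-breaking are assumptions made here: among equally good slots the highest one is taken, and a slot is only taken if it gives strictly positive utility.

```python
def forward_looking_bid(value, other_bids, ctrs):
    """Forward-looking response of a bidder with per-click value `value`.

    other_bids: bids of all other bidders; ctrs: [c_1, ..., c_K], decreasing.
    """
    others = sorted(other_bids, reverse=True)
    num_slots = len(ctrs)

    def next_bid(k):
        # Bid ranked just below slot k (1-indexed) once this bidder sits in slot k;
        # assumed to be 0 when no other bidder remains below.
        return others[k - 1] if k - 1 < len(others) else 0.0

    # Find the utility-maximizing slot; k = num_slots + 1 stands for "no slot".
    best_k, best_u = num_slots + 1, 0.0
    for k in range(1, num_slots + 1):
        if k - 1 > len(others):
            break  # slot k needs k - 1 bidders ranked above
        u = (value - next_bid(k)) * ctrs[k - 1]
        if u > best_u:
            best_k, best_u = k, u

    if best_k == 1 or best_k > num_slots:
        return value
    ratio = ctrs[best_k - 1] / ctrs[best_k - 2]  # c_k / c_{k-1}
    return value - ratio * (value - next_bid(best_k))


print(forward_looking_bid(10.0, [8.0, 4.0], [1.0, 0.5]))
print(forward_looking_bid(10.0, [3.0], [1.0, 0.5]))
```

In the first call the second slot is the better target (utility 3 versus 2 for the top slot), and the resulting bid of 7 lies between the neighbouring bids 4 and 8, so the bidder indeed ends up in slot 2. In the second call the top slot is best and the bidder bids his/her value.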


Definition 3 (Forward-Looking Equilibrium)
A forward-looking best-response function-based
equilibrium is a strategy profile $\mathbf{b}$ such that


$$\forall i \in \mathcal{N}, \; b^i \in F^i(\mathbf{b}^{-i})$$


Definition 4 (Output Truthful (Kao et al.,
2006, Output truthful versus input truthful:
a new concept for algorithmic mechanism
design, unpublished) [7]) For any instance
of adword auction and the corresponding
equilibrium set $\mathcal{E}$, if $\forall e \in \mathcal{E}$ and $\forall i \in \mathcal{N}$,
$O^i(e) = O^i(v^1, \ldots, v^N)$, then the adword
auction is output truthful on $\mathcal{E}$.


Theorem 1 An adword auction is output truthful
on $\mathcal{E}_{\text{forward-looking}}$.


Corollary 1 An Adword auction has a unique 
forward-looking Nash equilibrium. 


Corollary 2 Any bidder’s payment under the 
forward-looking Nash equilibrium is equal to 
his/her payment under the VCG mechanism for 
the auction. 


Corollary 3 For adword auctions, the auction- 
eer’s revenue in a forward-looking Nash equilib- 
rium is equal to his/her revenue under the VCG 
mechanism for the auction. 


Definition 5 (Simultaneous Readjustment
Scheme) In a simultaneous readjustment
scheme, all bidders participating in the auction
use the forward-looking best-response function
$F$ to update their current bids simultaneously,
which turns the current stage into a new stage.
Then based on the new stage, all bidders may
update their bids again.


Theorem 2 An adword auction may not always 
converge to a forward-looking Nash equilibrium 
under the simultaneous readjustment scheme 
even when the number of slots is 3. But the 
protocol converges when the number of slots is 2.


Definition 6 (Round-Robin Readjustment
Scheme) In the round-robin readjustment
scheme, bidders update their bids one after
the other, according to the order of the bidder’s
number or the order of the slots.


Theorem 3 An adword auction may not always 
converge to a forward-looking Nash equilibrium 
under the round-robin readjustment scheme even 
when the number of slots is 4. But the protocol 
converges when the number of slots is 2 or 3. 


Algorithm 1 Readjustment Scheme: Lowest-First($K$, $j$, $b_1, b_2, \ldots, b_N$)

    if ($j = 0$) then
        exit
    end if
    Let $i$ be the ID of the bidder whose current bid is $b_j$ (and equivalently, $b^i$).
    Let $h = O^i(M^i(\mathbf{b}^{-i}), \mathbf{b}^{-i})$.
    Let $F^i(\mathbf{b}^{-i})$ be the best response function value for Bidder $i$.
    Re-sort the bid sequence. (So $h$ is the slot of the new bid $F^i(\mathbf{b}^{-i})$ of Bidder $i$.)
    if ($h < j$) then
        call Lowest-First($K$, $j$, $b_1, b_2, \ldots, b_N$)
    else
        call Lowest-First($K$, $h - 1$, $b_1, b_2, \ldots, b_N$)
    end if


Theorem 4 Adword auctions converge to a 
forward-looking Nash equilibrium in finite steps 
with a lowest-first adjustment scheme. 


Theorem 5 Adword auctions converge to
a forward-looking Nash equilibrium with
probability 1 under a randomized readjustment
scheme.


Applications 


Online adword auctions are the fastest growing 
form of advertising. Many search engine compa- 
nies such as Google and Yahoo! make huge prof- 
its on this kind of auction. Because advertisers 
can change their bids anytime, such auctions can 
reduce the advertisers’ risk. Further, because the 
advertisement is only displayed to those people 
who are really interested in it, such auctions can 
reduce the advertisers’ investment and increase 
their return on investment. 


For the same model, Varian [10] focuses on 
a subset of Nash equilibria, called Symmetric 
Nash Equilibria, which can be formulated nicely 
and dealt with easily. Edelman et al. [8] study 
locally envy-free equilibria, where no player can 
improve his/her payoff by exchanging bids with 
the player ranked one position above him/her. 
Coincidentally, locally envy-free equilibrium is
equivalent to the symmetric Nash equilibrium proposed
in [10]. Further, the revenue under the forward-looking
Nash equilibrium is the same as the
lower bound under Varian’s symmetric Nash
equilibria and the lower bound under Edelman
et al.’s locally envy-free equilibria. In [6], Cary
et al. also study the dynamic model’s equilibria
and convergence based on the balanced bidding
strategy, which is actually the same as the
forward-looking best-response function in [4].
Cary et al. explore the convergence properties
under two models: a synchronous model, which
is the same as the simultaneous readjustment scheme
in [4], and an asynchronous model, which is
the same as the randomized readjustment scheme
in [4].

In addition, there are other models for adword 
auctions. Abrams [1] and Bu et al. [5] study 
the model under which each bidder could submit 
his/her daily budget, even the maximum number 
of clicks per day, in addition to the price per 
click. Both [9] and [3] study bidders’ behavior 
of bidding on several keywords. Aggarwal et al.
[2] study the model where the advertiser not
only submits a bid but additionally specifies which
positions he/she is going to bid for.


Open Problems 


The speed of convergence still remains open.
Does the dynamic model converge in polynomial
time under the randomized readjustment scheme?
Furthermore, are there other readjustment schemes
that converge in polynomial time?


Problem Definition


In the k-Server Problem, one wishes to schedule
the movement of $k$ servers in a metric space $M$,
in response to a sequence $\varrho = r_1, r_2, \ldots, r_n$ of
requests, where $r_i \in M$ for each $i$. Initially, all
the servers are located at some initial configuration
$X_0 \subseteq M$ of $k$ points. After each request
$r_i$ is issued, one of the $k$ servers must move
to $r_i$. A schedule specifies which server moves
to each request. The cost of a schedule is the
total distance traveled by the servers, and our
objective is to find a schedule with minimum
cost.

In the online version of the k-Server Problem,
the decision as to which server to move to each
request $r_i$ must be made before the next request
$r_{i+1}$ is issued. In other words, the choice of this
server is a function of requests $r_1, r_2, \ldots, r_i$. It
is quite easy to see that in this online scenario, it
is not possible to compute an optimal schedule
for each request sequence, raising the question
of how to measure the accuracy of such online
algorithms. A standard approach to doing this
is based on competitive analysis. If $A$ is an
online k-server algorithm, denote by $\mathrm{cost}_A(\varrho)$
the cost of the schedule produced by $A$ on a
request sequence $\varrho$, and by $\mathrm{opt}(\varrho)$ the cost of
the optimal schedule. $A$ is called $R$-competitive
if $\mathrm{cost}_A(\varrho) \le R \cdot \mathrm{opt}(\varrho) + B$, where $B$ is
a constant that may depend on $M$ and $X_0$. The
smallest such $R$ is called the competitive ratio
of $A$.

The k-Server Problem was introduced by 
Manasse, McGeoch, and Sleator [7, 8], who 
proved that there is no online R-competitive 
algorithm for R < k, for any metric space 
with at least k + 1 points. They also gave a 2- 
competitive algorithm for k = 2 and formulated 
what is now known as the k-Server Conjecture, 
which postulates that there exists a k-competitive 
online algorithm for all k. Koutsoupias and 
Papadimitriou [5, 6] proved that the so-called 
Work-Function Algorithm has competitive ratio 
at most 2k — 1, which to date remains the best 
upper bound known. 

Efforts to prove the k-Server Conjecture led to 
discoveries of k-competitive algorithms for some 
restricted classes of metric spaces, including Al- 
gorithm DC-TREE for trees [3] presented in this 
entry. (See [1, 2,4] for other examples.) A tree 
is a metric space defined by a connected acyclic 
graph whose edges are treated as line segments 
of arbitrary positive lengths. This metric space 
includes both the tree’s vertices and the points on 
the edges, and the distances are measured along 
the (unique) shortest paths.


Key Results


Let $T$ be a tree, as defined above. Given the
current server configuration $S = \{s_1, \ldots, s_k\}$,
where $s_j$ denotes the location of server $j$, and a
request point $r$, the algorithm will move several
servers, with one of them ending up on $r$. For
two points $x, y \in T$, let $[x, y]$ be the unique path
from $x$ to $y$ in $T$. A server $j$ is called active if
there is no other server in $[s_j, r] - \{s_j\}$ and $j$ is
the minimum-index server located on $s_j$ (the last
condition is needed only to break ties).


Algorithm DC-TREE. On a request r, move all 
active servers, continuously and with the same 
speed, towards r, until one of them reaches the 
request. Note that during this process some active 
servers may become inactive, in which case they 
halt. Clearly, the server that will arrive at r is the 
one that was closest to r at the time when r was 
issued. 

More formally, denoting by $s_j$ the variable
representing the current position of server $j$, the
algorithm serves $r$ as follows:


while $s_j \ne r$ for all $j$ do
    let $\delta = \frac{1}{2} \min_{i<j} \{ d(s_i, s_j) + d(s_i, r) - d(s_j, r) \}$
    move each active server $s_j$ by distance
    $\delta$ towards $r$


The example below shows how DC-TREE 
serves a request r (Fig. 1). 
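
As an illustration, the rule can be sketched for the simplest tree, a path (the real line). There, at most two servers are active (the nearest one on each side of the request) and neither can block the other, so a single move suffices. The function name `dc_line` and the list-based representation are assumptions made for this sketch; it is not the general tree procedure.

```python
def dc_line(servers, r):
    """Serve request r by the DC-TREE rule specialized to the real line.

    servers: list of positions (index = server id). Returns (new_positions, cost).
    """
    s = list(servers)
    left = [j for j in range(len(s)) if s[j] <= r]
    right = [j for j in range(len(s)) if s[j] >= r]

    # Active servers: nearest to r on each side, lowest index on ties.
    active = []
    if left:
        active.append(min(left, key=lambda j: (r - s[j], j)))
    if right:
        j = min(right, key=lambda j: (s[j] - r, j))
        if j not in active:
            active.append(j)

    # All active servers move at the same speed until the closest one reaches r.
    step = min(abs(s[j] - r) for j in active)
    cost = 0.0
    for j in active:
        s[j] += step if s[j] < r else -step
        cost += step
    return s, cost


positions, cost1 = dc_line([0.0, 10.0], 4.0)   # request between the two servers
print(positions, cost1)
positions, cost2 = dc_line(positions, 20.0)    # request to the right of both
print(positions, cost2)
```

For a request between the two servers both of them move by the distance of the closer one, which is why the algorithm may pay up to twice the distance actually needed to serve a single request.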

The competitive analysis of Algorithm DC- 
TREE is based on a potential argument. The cost 
of Algorithm DC-TREE is compared to that of 
an adversary who serves the requests with her 
own servers. Denoting by $A$ the configuration of
the adversary servers at a given step, define the
potential by $\Phi = k \cdot D(S, A) + \sum_{i<j} d(s_i, s_j)$,
where $D(S, A)$ is the cost of the minimum matching
between $S$ and $A$. At each step, the adversary
first moves one of her servers to $r$. In this sub-step
the potential increases by at most $k$ times the
increase of the adversary’s cost. Then, Algorithm
DC-TREE serves the request. One can show that
then the sum of $\Phi$ and DC-TREE’s cost does not
increase. These two facts, by amortization over
the whole request sequence, imply the following
result [3]:


Theorem 1 ([3]) Algorithm DC-TREE is k-
competitive on trees.


Algorithm DC-Tree for k-Servers on Trees, Fig. 1
Algorithm DC-TREE serving a request on r. The configuration
before r is issued is on the left; the configuration
after the service is completed is on the right. At first, all
servers are active. When server 3 reaches point x, server 1
becomes inactive. When server 3 reaches point y, server
2 becomes inactive


Applications


The k-Server Problem is an abstraction of various
scheduling problems, including emergency crew
scheduling, caching in multilevel memory systems,
or scheduling head movement in 2-headed
disks. Nevertheless, due to its abstract nature, the
k-server problem is mainly of theoretical interest.

Algorithm DC-TREE can be applied to other
spaces by “embedding” them into trees. For example,
a uniform metric space (with all distances
equal 1) can be represented by a star with arms
of length 1/2, and thus Algorithm DC-TREE
can be applied to those spaces. This also immediately
gives a k-competitive algorithm for
the caching problem, where the objective is to
manage a two-level memory system consisting
of a large main memory and a cache that can
store up to $k$ memory items. If an item is in the
cache, it can be accessed at cost 0, otherwise it
costs 1 to read it from the main memory. This
caching problem can be thought of as the k-server
problem in a uniform metric space where the
server positions represent the items residing in
the cache. This idea can be extended further to the
weighted caching [4], which is a generalization of
the caching problem where different items may
have different costs. In fact, if one can embed
a metric space $M$ into a tree with distortion
bounded by $\delta$, then Algorithm DC-TREE yields
a $\delta k$-competitive algorithm for $M$.


Open Problems 


The k-Server Conjecture — whether there is a k- 
competitive algorithm for k-servers in any metric 
space — remains open. It would be of interest 
to prove it for some natural special cases, for 
example the plane, either with the Euclidean or 
Manhattan metric. (A k-competitive algorithm
for the Manhattan plane for $k = 2, 3$ servers is
known [1], but not for $k \ge 4$.)

Very little is known about online randomized 
algorithms for k-servers. In fact, even for k = 2 
it is not known if there is a randomized algorithm 
with competitive ratio smaller than 2.


Problem Definition


The fusion of concepts taken from the fields 
of quantum computation, data compression, and 
thermodynamics has recently yielded novel algo- 
rithms that resolve problems in nuclear magnetic 
resonance and potentially in other areas as well, 
algorithms that “cool down” physical systems. 


• A leading candidate technology for the construction
of quantum computers is nuclear
magnetic resonance (NMR). This technology
has the advantage of being well established
for other purposes, such as chemistry and
medicine. Hence, it does not require new and
exotic equipment, in contrast to ion traps and
optical lattices, to name a few. However, when
using standard NMR techniques (not only for
quantum computing purposes), one has to live
with the fact that the state can only be initialized
in a very noisy manner: the particles’
spins point in mostly random directions, with
only a tiny bias towards the desired state.

The key idea of Schulman and Vazi- 
rani [27] is to combine the tools of both data 
compression and quantum computation, to 
suggest a scalable state initialization process, 
a “molecular-scale heat engine.” Based on 
Schulman and Vazirani’s method, Boykin, 
Mor, Roychowdhury, Vatan, and Vrijen [4] 
then developed a new process, “heat-bath 
algorithmic cooling,” to significantly improve 
the state initialization process, by opening 
the system to the environment. Strikingly, 
this offered a way to put to good use 
the phenomenon of decoherence, which 
is usually considered to be the villain in 
quantum computation. These two methods are 
now sometimes called “closed-system” (or
“reversible”) algorithmic cooling and “open-system”
algorithmic cooling, respectively.
The far-reaching consequence of this research 
lies in the possibility of reaching beyond 
the potential implementation of remote- 
future quantum computing devices. An 
efficient technique to generate ensembles of 
spins that are highly polarized by external 
magnetic fields is considered to be a Holy 
Grail in NMR spectroscopy. Spin-half
nuclei have steady-state polarization biases 
that increase inversely with temperature; 
therefore, spins exhibiting polarization biases 
above their thermal-equilibrium biases are 
considered cool. Such cooled spins present an 
improved signal-to-noise ratio if used in NMR 
spectroscopy or imaging. 

Existing spin-cooling techniques are 
limited in their efficiency and usefulness. 
Algorithmic cooling is a promising new 
spin-cooling approach that employs data 
compression methods in open systems.
It reduces the entropy of spins to a point
far beyond Shannon’s entropy bound on 
reversible entropy manipulations, thus 
increasing their polarization biases. As a 
result, it is conceivable that the open-system 
algorithmic cooling technique could be 
harnessed to improve on current uses of NMR
in areas such as chemistry, material science, 
and even medicine, since NMR is at the basis 
of MRI — magnetic resonance imaging. 


Basic Concepts 


Loss-Less In-Place Data Compression 

Given a bit string of length $n$, such that the
probability distribution is known and far enough
from the uniform distribution, one can use data
compression to generate a shorter string, say
of $m$ bits, such that the entropy of each bit
is much closer to one. As a simple example,
consider a four-bit string which is distributed as
follows: $p_{0001} = p_{0010} = p_{0100} = p_{1000} =
1/4$, with $p_i$ the probability of the string $i$. The
probability of any other string value is exactly
zero, so the probabilities sum up to one. Then,
the bit string can be compressed, via a loss-less
compression algorithm, into a 2-bit string that
holds the binary description of the location of “1”
in the above four strings. One can also envision
a similar process that generates an output which
is of the same length $n$ as the input, but such
that the entropy is compressed via a loss-less, in-place
data compression into the last two bits. For
instance, logical gates that operate on the bits can
perform the permutation 0001 → 0000, 0010 →
0001, 0100 → 0010, and 1000 → 0011, while
the other input strings transform to output strings
in which the two most significant bits are not
zero; for instance, 1100 → 1010. One can easily
see that the entropy is now fully concentrated on
the two least significant bits, which are useful in
data compression, while the two most significant
bits have zero entropy.
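
The four-bit example can be verified numerically. The sketch below applies the part of the permutation that the text specifies (the four strings with nonzero probability) and computes the entropy of the two most significant and the two least significant bits of the output; the helper names `entropy` and `marginal` are illustrative.

```python
from collections import Counter
from math import log2

# The part of the permutation given in the text; the remaining twelve
# strings have probability zero and do not affect the entropies.
perm = {0b0001: 0b0000, 0b0010: 0b0001, 0b0100: 0b0010, 0b1000: 0b0011}
dist = {s: 0.25 for s in perm}  # input distribution


def entropy(p):
    """Shannon entropy, in bits, of a distribution given as value -> probability."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def marginal(p, mask):
    """Distribution of the bits selected by mask."""
    out = Counter()
    for s, q in p.items():
        out[s & mask] += q
    return dict(out)


out_dist = {perm[s]: q for s, q in dist.items()}
high = marginal(out_dist, 0b1100)  # two most significant bits
low = marginal(out_dist, 0b0011)   # two least significant bits
print(entropy(dist), entropy(high), entropy(low))
```

The total entropy of two bits is unchanged by the permutation; it is merely moved into the two least significant bits, leaving the other two in the known state 00.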

In order to gain some intuition about the de- 
sign of logical gates that perform entropy manip- 
ulations, one can look at a closely related scenario 
which was first considered by von Neumann.
He showed a method to extract fair coin flips,
given a biased coin; he suggested taking a pair of
biased coin flips, with results $a$ and $b$, and using
the value of $a$ conditioned on $a \ne b$. A simple
calculation shows that $a = 0$ and $a = 1$ are now
obtained with equal probabilities, and therefore,
the entropy of coin $a$ is increased in this case to 1.
The opposite case, the probability distribution of
$a$ given that $a = b$, results in a highly determined
coin flip, namely, a (conditioned) coin flip with
a higher bias or lower entropy. A gate that flips
the value of $b$ if (and only if) $a = 1$ is called a
controlled NOT gate. If after applying such a gate
$b = 1$ is obtained, this means that $a \ne b$ prior to
the gate operation; thus, now the entropy of $a$ is
1. If, on the other hand, after applying such a gate
$b = 0$ is obtained, this means that $a = b$ prior to
the gate operation; thus, the entropy of $a$ is now
lower than its initial value.
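
The "simple calculation" can be carried out exactly by enumerating the four outcomes of two independent flips of the same biased coin. In the sketch below, the function name `cnot_conditionals` and the use of exact fractions are choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def cnot_conditionals(p1):
    """For independent bits a, b with P(bit = 1) = p1, apply a controlled NOT
    (b' = b XOR a) and return (P(a = 1 | b' = 1), P(a = 1 | b' = 0))."""
    joint = {}
    for a, b in product((0, 1), repeat=2):
        pa = p1 if a else 1 - p1
        pb = p1 if b else 1 - p1
        joint[(a, b ^ a)] = pa * pb  # (a, b') is a one-to-one relabeling of (a, b)

    def prob_a_is_one(b_out):
        total = joint[(0, b_out)] + joint[(1, b_out)]
        return joint[(1, b_out)] / total

    return prob_a_is_one(1), prob_a_is_one(0)


fair, biased = cnot_conditionals(Fraction(1, 4))
print(fair, biased)
```

With P(1) = 1/4 for each coin, conditioning on b' = 1 leaves a perfectly fair bit a, while conditioning on b' = 0 gives P(a = 1) = 1/10, that is, a bit more strongly biased towards 0 than the original coin.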


Spin Temperature, Polarization Bias, and 
Effective Cooling 

In physics, two-level systems, namely, systems
that possess only binary values, are useful in
many ways. Often it is important to initialize such
systems to a pure state “0” or to a probability
distribution which is as close as possible to a pure
state “0.” In these physical two-level systems,
a data compression process that brings some of
them closer to a pure state can be considered as
“cooling.” For quantum two-level systems, there
is a simple connection between temperature, entropy,
and population probability. The population
difference between these two levels is known as
the polarization bias, $\epsilon$. Consider a single spin-half
particle (for instance, a hydrogen nucleus)
in a constant magnetic field. At equilibrium
with a thermal heat-bath, the probability of this
spin to be up or down (i.e., parallel or antiparallel
to the field direction) is given by $p_\uparrow = \frac{1+\epsilon}{2}$
and $p_\downarrow = \frac{1-\epsilon}{2}$. The entropy $H$ of the
spin is $H(\text{single-bit}) = H(1/2 + \epsilon/2)$ with
$H(P) = -P \log_2 P - (1 - P) \log_2 (1 - P)$
measured in bits. The two pure states of a spin-half
nucleus are commonly written as $|\uparrow\rangle$ = “0”
and $|\downarrow\rangle$ = “1”; the $|\cdot\rangle$ notation is clarified
elsewhere (see the Quantum Computing entries in this
encyclopedia). The polarization bias of the spin at
thermal equilibrium is given by $\epsilon = p_\uparrow - p_\downarrow$. For
such a physical system, the bias is obtained via
a quantum statistical mechanics argument,


$$\epsilon = \tanh\left(\frac{\hbar \gamma B}{2 K_B T}\right),$$


where $\hbar$ is Planck’s constant, $B$
is the magnetic field, $\gamma$ is the particle-dependent
gyromagnetic constant, $K_B$ is Boltzmann’s coefficient, and
$T$ is the thermal heat-bath temperature. (The constant $\gamma$ is thus
responsible for the difference in equilibrium polarization
bias; e.g., a hydrogen nucleus is 4
times more polarized than a carbon isotope $^{13}$C
nucleus, but about $10^3$ times less polarized than an
electron spin.) For high
temperatures or small biases, $\epsilon \approx \frac{\hbar \gamma B}{2 K_B T}$; thus, the
bias is inversely proportional to the temperature.
Typical values of $\epsilon$ for spin-half nuclei at room
temperature (and magnetic field of ~10 T) are
$10^{-5}$ to $10^{-6}$, and therefore, most of the analysis
here is done under the assumption that $\epsilon \ll 1$.
The spin temperature at equilibrium is thus
$T = \frac{\text{Const}}{\epsilon}$, and its (Shannon) entropy is
$H = 1 - (\epsilon^2 / \ln 4)$.
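
The small-bias approximations can be checked numerically. The sketch below compares the exact single-spin entropy $H(1/2 + \epsilon/2)$ with $1 - \epsilon^2/\ln 4$ and the bias $\tanh(x)$ with $x$, where $x$ stands for the dimensionless ratio $\hbar \gamma B / (2 K_B T)$; no physical constants are plugged in, and the function names are illustrative.

```python
from math import log, log2, tanh


def binary_entropy(p):
    """H(P) = -P log2 P - (1 - P) log2 (1 - P), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bias(x):
    """Polarization bias tanh(x), with x standing for hbar*gamma*B / (2*K_B*T)."""
    return tanh(x)


def spin_entropy(eps):
    """Exact single-spin entropy H(1/2 + eps/2)."""
    return binary_entropy(0.5 + eps / 2)


def spin_entropy_small_bias(eps):
    """Leading-order approximation 1 - eps^2 / ln 4."""
    return 1 - eps ** 2 / log(4)


for eps in (0.1, 0.01, 0.001):
    print(eps, spin_entropy(eps), spin_entropy_small_bias(eps))
```

The two entropy values agree to leading order in $\epsilon^2$, and the gap shrinks rapidly as the bias decreases, which is why the approximation is adequate for room-temperature biases.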

A spin temperature out of thermal equilibrium 
is still defined via the same formulas. Therefore, 
when a system is moved away from thermal 
equilibrium, achieving a greater polarization bias 
is equivalent to cooling the spins, without cool- 
ing the system, and to decreasing their entropy. 
The process of increasing the bias (reducing the
entropy) without decreasing the temperature of
the thermal bath is known as “effective cooling.”
After a typical period of time, termed the ther- 
malization time or relaxation time, the bias will 
gradually revert to its thermal equilibrium value; 
yet during this process, typically in the order 
of seconds, the effectively cooled spin may be 
used for various purposes as described in section 
“Applications.” 

Consider a molecule that contains n adjacent 
spin-half nuclei arranged in a line; these form 
the bits of the string. These spins are initially 
at thermal equilibrium due to their interaction 
with the environment. At room temperature, the 
bits at thermal equilibrium are not correlated to 
their neighbors on the same string: more pre- 
cisely, the correlation is very small and can be 
ignored. Furthermore, in a liquid state one can 
also neglect the interaction between strings (between
molecules). It is convenient to write the
probability distribution of a single spin at thermal
equilibrium using the “density-matrix” notation


$$\rho_\epsilon = \begin{pmatrix} p_\uparrow & 0 \\ 0 & p_\downarrow \end{pmatrix} = \begin{pmatrix} (1+\epsilon)/2 & 0 \\ 0 & (1-\epsilon)/2 \end{pmatrix}, \quad (1)$$


since these two-level systems are of a quantum
nature (namely, these are quantum bits, or qubits)
and, in general, can also have states other than
just a classical probability distribution over “0”
and “1.” The classical case will now be considered,
where $\rho$ contains only diagonal elements,
and these describe a conventional probability
distribution. At thermal equilibrium, the state of
$n = 2$ uncorrelated qubits that have the same polarization
bias is described by the density matrix
$\rho_\epsilon^{\{n=2\}} = \rho_\epsilon \otimes \rho_\epsilon$, where $\otimes$ means tensor product.
The probability of the state “00,” for instance, is
then $(1 + \epsilon)/2 \times (1 + \epsilon)/2 = (1 + \epsilon)^2/4$, etc.
Similarly, the initial state of an $n$-qubit system of
this type, at thermal equilibrium, is


$$\rho_\epsilon^{\{n\}} = \rho_\epsilon \otimes \rho_\epsilon \otimes \cdots \otimes \rho_\epsilon. \quad (2)$$


This state represents a thermal probability distribution,
such that the probability of the classical
state “000...0” is $P_{000\ldots0} = (1 + \epsilon_0)^n / 2^n$, etc.,
with $\epsilon_0$ the initial bias.
In reality, the initial bias is not the same on each
qubit (furthermore, individual addressing of each
spin during the algorithm requires a slightly different
bias for each), but as long as the differences
between these biases are small (e.g., all qubits
are of the same nucleus), these differences can be
ignored in a discussion of an idealized scenario.
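
In the classical (diagonal) case, the tensor product in Eq. 2 is simply a product distribution over n-bit strings. A minimal sketch, assuming every spin has the same bias; the function name `thermal_distribution` is illustrative, and exact fractions are used only to make the check exact.

```python
from fractions import Fraction
from itertools import product


def thermal_distribution(n, eps):
    """Diagonal of the n-spin thermal state: probability of each n-bit string
    when every bit is independently "0" with probability (1 + eps) / 2."""
    p = {"0": (1 + eps) / 2, "1": (1 - eps) / 2}
    dist = {}
    for bits in product("01", repeat=n):
        prob = 1
        for b in bits:
            prob *= p[b]
        dist["".join(bits)] = prob
    return dist


eps = Fraction(1, 10)
two_spins = thermal_distribution(2, eps)
print(two_spins["00"], (1 + eps) ** 2 / 4)  # both are (1 + eps)^2 / 4
print(sum(two_spins.values()))              # probabilities sum to one
```

The same function with n = 3 gives $(1 + \epsilon)^3 / 2^3$ for the string “000,” matching the general expression for the all-zero state.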


Key Results 


Molecular-Scale Heat Engines 

Schulman and Vazirani (SV) [27] identified the 
importance of in-place loss-less data compression 
and of the low-entropy bits created in that pro- 
cess: physical two-level systems (e.g., spin-half 
nuclei) may be similarly cooled by data com- 
pression algorithms. SV analyzed the cooling 
of such a system using various tools of data 
compression. A loss-less compression of an $n$-bit
binary string distributed according to the thermal
equilibrium distribution, Eq. 2, is readily
analyzed using information-theoretical tools: In an
ideal compression scheme (not necessarily
realizable), with sufficiently large $n$, all randomness,
and hence all the entropy, of the bit string
is transferred to $n - m$ bits; the remaining $m$
bits are thus left, with extremely high probability,
at a known deterministic state, say the string
“000...0.” The entropy $H$ of the entire system
is $H(\text{system}) = nH(\text{single-bit}) = nH(1/2 + \varepsilon/2)$.
No compression scheme can decrease
this entropy; hence, Shannon’s source coding
entropy bound yields $m \le n[1 - H(1/2 + \varepsilon/2)]$.
A simple leading-order calculation shows that $m$
is bounded by (approximately) $\varepsilon^2 n/(2\ln 2)$ for small
values of the initial bias $\varepsilon$. Therefore, with typical
$\varepsilon \sim 10^{-5}$, molecules containing an order of
magnitude of $10^{10}$ spins are required to cool a
single spin close to zero temperature.
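The entropy bound above can be checked numerically. The sketch below, with function names invented here for illustration, evaluates the exact bound $n[1 - H(1/2 + \varepsilon/2)]$, its leading-order approximation $\varepsilon^2 n/(2\ln 2)$, and the number of spins at which the approximate bound first allows one cold bit.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary variable with P(0) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cold_bits_bound(n, eps):
    """Exact entropy bound on the number m of (nearly) pure bits."""
    return n * (1 - binary_entropy(0.5 + eps / 2))


def cold_bits_leading_order(n, eps):
    """Leading-order approximation of the bound, valid for small eps."""
    return n * eps ** 2 / (2 * math.log(2))


def spins_for_one_cold_bit(eps):
    """Number of spins at which the leading-order bound reaches m = 1."""
    return 2 * math.log(2) / eps ** 2


print(cold_bits_bound(10 ** 6, 0.01), cold_bits_leading_order(10 ** 6, 0.01))
print(spins_for_one_cold_bit(1e-5))
```

For a bias of $10^{-5}$ the last line gives a value slightly above $10^{10}$, which matches the order of magnitude quoted in the text.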
Conventional methods for NMR quantum
computing are based on unscalable state
initialization schemes [7, 14] (e.g., the
“pseudo-pure-state” approach) in which the
signal-to-noise ratio falls exponentially with $n$, the number
of spins. Consequently, these methods are
deemed inappropriate for future NMR quantum
computers. SV [27] were the first to employ tools
of information theory to address the scaling 
problem; they presented a compression scheme 
in which the number of cooled spins scales 
well (namely, a constant times n). SV also 
demonstrated a scheme approaching Shannon’s 
entropy bound, for very large n. They provided 
detailed analyses of three cooling algorithms, 
each useful for a different regime of € values. 
Some ideas of SV were already explored a 
few years earlier by Sørensen [29], a physical
chemist who analyzed effective cooling of spins. 
He considered the entropy of several spin systems 
and the limits imposed on cooling these systems 
by polarization transfer and more general polar- 
ization manipulations. Furthermore, he consid- 
ered spin-cooling processes in which only unitary 
operations were used, wherein unitary matrices 
are applied to the density matrices; such oper- 
ations are realizable, at least from a conceptual 
point of view. Sørensen derived a stricter bound
on unitary cooling, which today bears his name.
Yet, unlike SV, he did not infer the connection
to data compression or advocate compression 
algorithms. 

SV named their concept “molecular-scale heat 
engine.” When combined with conventional po- 
larization transfer (which is partially similar to 
a SWAP gate between two qubits), the term 
“reversible polarization compression (RPC)” is 
more descriptive. 


Heat-Bath Algorithmic Cooling 

The next significant development came when
Boykin, Mor, Roychowdhury, Vatan, and Vrijen
(hereinafter referred to as BMRVV) invented a
new spin-cooling technique, which they named
algorithmic cooling [4], or more specifically
heat-bath algorithmic cooling, in which the use of
controlled interactions with a heat bath enhances
the cooling techniques much further. Algorithmic
cooling (AC) expands the effective cooling
techniques by exploiting entropy manipulations
in open systems. It combines RPC steps with fast
relaxation (namely, thermalization) of the hotter
spins, as a way of pumping entropy outside the
system and cooling the system much beyond
Shannon’s entropy bound. (When the entire process
is RPC, namely, any of the processes that follow SV
ideas, one can refer to it as reversible AC or
closed-system AC, rather than as RPC.)
In order to pump entropy out of the system, AC 
employs regular spins (here called computation 
spins) together with rapidly relaxing spins. The 
latter are auxiliary spins that return to their ther- 
mal equilibrium state very rapidly. These spins 
have been termed “reset spins,” or, equivalently, 
reset bits. The controlled interactions with the 
heat bath are generated by polarization transfer 
or by standard algorithmic techniques (of data 
compression) that transfer the entropy onto the 
reset spins which then lose this excess entropy 
into the environment. 

The ratio $R_{\text{relax-times}}$ between the relaxation
time of the computation spins and the relaxation
time of the reset spins must satisfy $R_{\text{relax-times}} \gg 1$.
This condition is vital if one wishes to perform
many cooling steps on the system to obtain
significant cooling.

From a pure information-theoretical point of
view, it is legitimate to assume that the only
restriction on ideal RPC steps is Shannon’s
entropy bound; then the equivalent of Shannon’s
entropy bound, when an ideal open-system AC is
used, is that all computation spins can be cooled
down to zero temperature, that is, to $\varepsilon = 1$.
Proof: repeat the following till the entropy of 
all computation spins is exactly zero: (i) push 
entropy from computation spins into reset spins 
and (ii) let the reset spins cool back to room 
temperature. Clearly, each application of step 
(i), except the last one, pushes the same amount 
of entropy onto the reset spins, and then this 
entropy is removed from the system in step (ii). 
Of course, a realistic scenario must take other 
parameters into account such as finite relaxation- 
time ratios, realistic environment, and physical 
operations on the spins. Once this is done, cooling 
to zero temperature is no longer attainable. While 
finite relaxation times and a realistic environment 
are system dependent, the constraint of using 
physical operations is conceptual. 

BMRVV therefore pursued an algorithm that
follows some physical rules: it is performed by
unitary operations and reset steps and still bypasses
Shannon’s entropy bound by far. The BMRVV
cooling algorithm obtains significant cooling be- 
yond that entropy bound by making use of very 
long molecules bearing hundreds or even thou- 
sands of spins, because its analysis relies on the 
law of large numbers. 


Practicable Algorithmic Cooling 

The concept of algorithmic cooling then led 
to practicable algorithms [13] for cooling 
small molecules. In order to see the impact of 
practicable algorithmic cooling, it is best to use a 
different variant of the entropy bound. Consider 
a system containing n spin-half particles with 
total entropy higher than n — 1, so that there 
is no way to cool even one spin to zero 
temperature. In this case, the entropy bound 
is a result of the compression of the entropy 
into n — 1 fully random spins, so that the 
remaining entropy on the last spin is minimal. 
The entropy of the remaining single spin satisfies
$H(\text{single}) \ge 1 - n\varepsilon^2/\ln 4$; thus, at most, its
polarization can be improved to


$$\varepsilon_{\text{final}} \le \varepsilon\sqrt{n}. \qquad (3)$$


The practicable algorithmic cooling (PAC),
suggested by Fernandez, Lloyd, Mor, and
Roychowdhury in [13], indicated potential for
a near-future application to NMR spectroscopy.
In particular, it presented an algorithm named
PAC2 which uses any (odd) number of spins $n$,
such that one of them is a reset spin, and $(n - 1)$
are computation spins. PAC2 cools the spins such
that the coldest one can (approximately) reach
a bias amplification by a factor of $(3/2)^{(n-1)/2}$.
The approximation is valid as long as the final
bias $(3/2)^{(n-1)/2}\varepsilon$ is much smaller than 1.
Otherwise, a more precise treatment is
needed. This proves an exponential advantage
of AC over the best possible reversible AC,
as these reversible cooling techniques, e.g.,
of [27, 29], are limited to improving the bias by
no more than a factor of $\sqrt{n}$. PAC can be applied
for small $n$ (e.g., in the range of 10-20), and
therefore, it is potentially suitable for
near-future applications [9, 13, 19] in chemical and
biomedical usages of NMR spectroscopy.

It is important to note that in typical scenarios, 
the initial polarization bias of a reset spin is 
higher than that of a computation spin. In this 
case, the bias amplification factor of $(3/2)^{(n-1)/2}$
is relative to the larger bias, that of the reset 
spin. 
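To get a feeling for the two amplification factors quoted above, the following sketch tabulates the PAC2 factor $(3/2)^{(n-1)/2}$ next to the reversible limit $\sqrt{n}$ for a few odd values of $n$. The function names are invented here, and both formulas are only the approximate small-bias expressions from the text, not an implementation of PAC2 itself.

```python
import math


def pac2_factor(n):
    """Approximate PAC2 bias amplification for an odd number n of spins."""
    if n < 3 or n % 2 == 0:
        raise ValueError("PAC2 is described for odd n >= 3")
    return 1.5 ** ((n - 1) / 2)


def reversible_factor(n):
    """Upper limit sqrt(n) on the amplification of reversible cooling."""
    return math.sqrt(n)


for n in (3, 5, 11, 21):
    print(n, round(pac2_factor(n), 3), round(reversible_factor(n), 3))
```

For very small $n$ the two factors are comparable; the exponential growth of the PAC2 factor becomes visible from roughly $n = 11$ onward, as long as the resulting bias stays much smaller than 1.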


Exhaustive Algorithmic Cooling 

Next, AC was analyzed, wherein the cooling 
steps (reset and RPC) are repeated an arbitrary 
number of times. This is actually an idealization 
where an unbounded number of reset and logic 
steps can be applied without error or decoher- 
ence, while the computation qubits do not lose 
their polarization biases. Fernandez [12] consid- 
ered two computation spins and a single reset 
spin (the least significant bit, namely, the qubit 
at the right in the tensor-product density-matrix 
notation) and analyzed optimal cooling of this 
system. By repeating the reset and compression
exhaustively, he realized that the bound on the
final biases of the three spins is approximately
$\{2, 1, 1\}$ in units of $\varepsilon$, the polarization bias of the
reset spin.

Mor and Weinstein generalized this analysis 
further and found that n — 1 computation spins 
and a single reset spin can be cooled (approx- 
imately) to biases according to the Fibonacci 
series: $\{\ldots, 34, 21, 13, 8, 5, 3, 2, 1, 1\}$. The
computation spin that is furthest away from the reset
spin can be cooled up to the relevant Fibonacci
number $F_n$. That approximation is valid as long
as the largest term times $\varepsilon$ is still much smaller
than 1. Schulman then suggested the “partner
pairing algorithm” (PPA) and proved the optimal- 
ity of the PPA among all classical and quantum 
algorithms. These two algorithms, the Fibonacci 
AC and the PPA, led to two joint papers [25, 26], 
where upper and lower bounds on AC were also 
obtained. The PPA is defined as follows: repeat 
these two steps until cooling sufficiently close to 
the limit: (a) RESET, applied to a reset spin in a 
system containing n — 1 computation spins and 
a single (the LSB) reset spin, and (b) SORT, a 
permutation that sorts the $2^n$ diagonal elements of
the density matrix by decreasing order, so that the 
MSB spin becomes the coldest. Two important 
theorems proven in [26] are: 


(1) Lower bound: When $\varepsilon 2^n > 1$ (namely, for
long enough molecules), Theorem 3 in [26]
promises that $n - \log(1/\varepsilon)$ cold qubits can be
extracted. This case is relevant for scalable
NMR quantum computing.

(2) Upper bound: Section 4.2 in [26] proves
the following theorem: No algorithmic
cooling method can increase the probability of
any basis state to above $\min\{2^{-n} e^{2^n \varepsilon}, 1\}$,
wherein the initial configuration is the
completely mixed state (the same is true if the
initial state is a thermal state).

More recently, Elias, Fernandez, Mor, and
Weinstein [9] analyzed more closely the case
of $n < 15$ (at room temperature), where
the coldest spin (at all stages) still has a
polarization bias much smaller than 1. This
case is most relevant for near-future applications
in NMR spectroscopy. They generalized the
Fibonacci-AC to algorithms yielding
higher-term Fibonacci series, such as the tribonacci
(also known as 3-term Fibonacci series),
$\{\ldots, 81, 44, 24, 13, 7, 4, 2, 1, 1\}$, etc. The ultimate
limit of these multi-term Fibonacci series
is obtained when each term in the series is
the sum of all previous terms. The resulting
series is precisely the exponential series
$\{\ldots, 128, 64, 32, 16, 8, 4, 2, 1, 1\}$, so the coldest
spin is cooled by a factor of $2^{n-2}$. Furthermore,
a leading-order analysis of the upper bound
mentioned above (Section 4.2 in Ref. [26]) shows
that no spin can be cooled beyond a factor of
$2^{n-1}$; see Corollary 1 in [9].
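The series mentioned above are easy to generate. The sketch below, with a function name chosen here for illustration, builds a $k$-term Fibonacci-like series starting from $1, 1$; passing no value for `terms` sums all previous terms. It only reproduces the number series, not the cooling algorithms that achieve them.

```python
def multi_term_series(length, terms=None):
    """First `length` entries of a k-term Fibonacci-like series.

    Each new entry is the sum of the previous `terms` entries; with
    terms=None every entry is the sum of all previous entries.
    """
    series = [1, 1]
    while len(series) < length:
        window = series if terms is None else series[-terms:]
        series.append(sum(window))
    return series[:length]


print(multi_term_series(9, terms=2))  # Fibonacci
print(multi_term_series(9, terms=3))  # tribonacci
print(multi_term_series(9))           # sum of all previous terms
```

Read from right to left, the three outputs correspond to the Fibonacci, tribonacci, and exponential series listed in the text.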


Other Results 

For several other theoretical results dealing with 
relevant algorithms and with the connection to 
thermodynamics, see [11, 15, 17, 21]. For several
popular “News and Views” discussions of AC in 
Nature, see [18, 22, 24]. 


Applications 


The two major far-future and near-future appli- 
cations are already described in section “Prob- 
lem Definition.” It is important to add here that 
although the specific algorithms analyzed so far 
for AC are usually classical, their practical im- 
plementation via an NMR spectrometer must be 
done through analysis of universal quantum com- 
putation, using the specific gates allowed in such 
systems. Therefore, AC could yield the first near- 
future application of quantum computing devices. 

AC may also be useful for cooling various
other physical systems, since state initialization
is a common problem in physics in general and in
quantum computation in particular; for several
examples (theoretical and experimental), see
[2, 16, 28, 30, 31].


Open Problems 


A main open problem in practical AC is
technological: can the ratio of relaxation
times be increased so that many cooling steps
may be applied to relevant NMR systems?
Other methods, for instance, a spin-diffusion 
mechanism [3, 23], may also be useful for various 
applications. 

Another interesting open problem is whether 
the ideas developed during the design of AC can 
also lead to applications in classical information 
theory. 

Last but not least, in the context of building 
scalable quantum computers, it is interesting to 
study if AC can become a practical tool for ad- 
vancing the non-conventional model of quantum 
computing called the one pure qubit (or one clean 
qubit) model as suggested in [1, 8] and to study
if AC can be useful for designing fault-tolerant 
quantum computers as suggested in [20]. 


Experimental Results 


Various ideas of AC have already led to several
experiments using 3-4 qubit quantum computing
devices in NMR (AC used in other systems was
mentioned earlier in section “Applications”):


(1) An experiment [6] that implemented a single
RPC step.

(2) Two experiments [5, 10] in which
entropy-conservation bounds (which apply in any
closed system) were bypassed. The second
one [10] was done on bio-molecules (amino
acids).

(3) A full AC experiment [3] that includes the
initialization of three carbon nuclei to the bias
of a hydrogen spin, followed by a single
compression step on these three carbons. This
work was later extended also to
multi-cycle AC [23].

Cross-References 


Quantum computing entries such as ▶ Quantum
Algorithm for Factoring, ▶ Quantum Algorithm


Problem Definition


Mechanism design is a sub-field of economics 
and game theory that studies the construction 
of social mechanisms in the presence of selfish 
agents. The nature of the agents dictates a basic 
contrast between the social planner, who aims
to reach a socially desirable outcome, and the
agents, who care only about their own private
utility. The underlying question is how to incen- 
tivize the agents to cooperate, in order to reach 
the desirable social outcomes. 

In the Internet era, where computers act and 
interact on behalf of selfish entities, the connec- 
tion of the above to algorithmic design suggests 
itself: suppose that the input to an algorithm is 
kept by selfish agents, who aim to maximize their 
own utility. How can one design the algorithm so 
that the agents will find it in their best interest 
to cooperate, and a close-to-optimal outcome
will be output? This is different from classic
distributed computing models, where agents
are either “good” (meaning obedient) or “bad” 
(meaning faulty, or malicious, depending on the 
context). Here, no such partition is possible. It is 
simply assumed that all agents are utility maxi- 
mizers. To illustrate this, let us describe a moti- 
vating example: 


A Motivating Example: Shortest Paths 
Given a weighted graph, the goal is to find
a shortest path (with respect to the edge weights)
between given source and target nodes. Each
edge is controlled by a selfish entity, and the
weight of the edge, $w_e$, is private information
of that edge. If an edge is chosen by the
algorithm to be included in the shortest path,
it will incur a cost equal to its weight (the cost
of communication), so its value for being
selected is minus its weight. Payments to the
edges are allowed, and the total utility of an edge
that participates in the shortest path and gets
a payment $p_e$ is assumed to be $u_e = p_e - w_e$.
Notice that the shortest path is with respect to the 
true weights of the agents, although these are not 
known to the designer. 

Assuming that each edge will act in order 
to maximize its utility, how can one choose the
path and the payments? One option is to ignore
the strategic issue altogether, ask the edges
to simply report their weights, and compute the 
shortest path. In this case, however, an edge 
dislikes being selected, and will therefore prefer 
to report a very high weight (much higher than 
its true weight) in order to decrease the chances 
of being selected. Another option is to pay each 
selected edge its reported weight, or its reported 
weight plus a small fixed “bonus”. However, in
such a case all edges will report lower weights,
as being selected will imply a positive gain. 

Although this example is written in an 
algorithmic language, it is actually a mechanism 
design problem, and the solution, which is now 
a classic, was suggested in the 1970s. The
chapter continues as follows: First, the abstract 
formulation for such problems is given, the 
classic solution from economics is described, and 
its advantages and disadvantages for algorithmic 
purposes are discussed. The next section then 
describes the new results that algorithmic 
mechanism design offers. 


Abstract Formulation 
The framework consists of a set A of alternatives, 
or outcomes, and n players, or agents. Each 
player $i$ has a valuation function $v_i : A \to \mathbb{R}$
that assigns a value to each possible alternative.
This valuation function belongs to a domain
$V_i$ of all possible valuation functions. Let
$V = V_1 \times \cdots \times V_n$, and $V_{-i} = \prod_{j \ne i} V_j$.
Observe that this generalizes the shortest path
example above: $A$ is the set of all possible $s$-$t$
paths in the given graph, and $v_e(a)$ for some path
$a \in A$ is either $-w_e$ (if $e \in a$) or zero.

A social choice function $f : V \to A$ assigns
a socially desirable alternative to any given
profile of players’ valuations. This parallels the
notion of an algorithm. A mechanism is a tuple
$M = (f, p_1, \ldots, p_n)$, where $f$ is a social choice
function, and $p_i : V \to \mathbb{R}$ (for $i = 1, \ldots, n$) is the
price charged to player $i$. The interpretation is
that the social planner asks the players to reveal
their true valuations, chooses the alternative
according to $f$ as if the players have indeed
acted truthfully, and in addition rewards/punishes
the players with the prices. These prices should
induce “truthfulness” in the following strong
sense: no matter what the other players declare,
it is always in the best interest of player $i$ to
reveal her true valuation, as this will maximize
her utility. Formally, this translates to:


Definition 1 (Truthfulness) $M$ is “truthful” (in
dominant strategies) if, for any player $i$, any
profile of valuations of the other players $v_{-i} \in V_{-i}$,
and any two valuations of player $i$, $v_i, v_i' \in V_i$,


$$v_i(a) - p_i(v_i, v_{-i}) \ge v_i(b) - p_i(v_i', v_{-i}),$$
where $f(v_i, v_{-i}) = a$ and $f(v_i', v_{-i}) = b$.


Truthfulness is quite strong: a player need not
know anything about the other players, not even
that they are rational, and can still determine the best
strategy for her. Quite remarkably, there exists
a truthful mechanism, even under the current 
level of abstraction. This mechanism suits all 
problem domains, where the social goal is to 
maximize the “social welfare”: 


Definition 2 (Social welfare maximization)
A social choice function $f : V \to A$
maximizes the social welfare if
$f(v) \in \arg\max_{a \in A} \sum_i v_i(a)$, for any $v \in V$.


Notice that the social goal in the shortest path 
domain is indeed welfare maximization, and, in 
general, this is a natural and important economic 
goal. Quite remarkably, there exists a general 
technique to construct truthful mechanisms that 
implement this goal: 


Theorem 1 (Vickrey–Clarke–Groves (VCG))
Fix any alternatives set $A$ and any domain $V$,
and suppose that $f : V \to A$ maximizes the social
welfare. Then there exist prices $p$ such that the
mechanism $(f, p)$ is truthful.


This gives “for free” a solution to the shortest 
path problem, and to many other algorithmic 
problems. The great advantage of the VCG 
scheme is its generality: it suits all problem 
domains. The disadvantage, however, is that 
the method is tailored to social welfare 
maximization. This turns out to be restrictive,
especially for algorithmic and computational
settings, due to several reasons: (i) different
algorithmic goals: the algorithmic literature
considers a variety of goals, including many
that cannot be translated to welfare
maximization. VCG does not help us in such cases.
(ii) computational complexity: even if the goal is
welfare maximization, in many settings achieving
exactly the optimum is computationally hard.
The CS discipline usually overcomes this by
using approximation algorithms, but VCG will
not work with such algorithms, since reaching exact
optimality is a necessary requirement of VCG.
(iii) different algorithmic models: common CS
models change “the basic setup”, hence cause
unexpected difficulties when one tries to use
VCG (for example, an online model, where the
input is revealed over time; this is common
in CS, but changes the implicit setting that
VCG requires). This is true even if welfare
maximization is still the goal.
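For the shortest path example, one standard way to instantiate VCG prices is the Clarke pivot rule, which is an assumption made here since the theorem above only asserts that suitable prices exist. Under that rule each selected edge is paid the length of the best path that avoids it, minus the length of the chosen path without its own reported weight. The sketch below uses a directed graph given as a dictionary from edge pairs to reported weights; all names are invented for illustration, and payments are expressed as amounts received by edges (that is, negative prices).

```python
import heapq
import math


def shortest_path(edges, source, target):
    """Dijkstra on a directed graph {(u, v): weight}; returns (length, edge list)."""
    adjacency = {}
    for (u, v), w in edges.items():
        adjacency.setdefault(u, []).append((v, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        if u == target:
            break
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return math.inf, []
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    return dist[target], path[::-1]


def vcg_shortest_path(reported, source, target):
    """Choose a shortest path and pay each selected edge by the Clarke pivot rule."""
    length, path = shortest_path(reported, source, target)
    payments = {}
    for edge in path:
        others = {e: w for e, w in reported.items() if e != edge}
        length_without, _ = shortest_path(others, source, target)
        # Payment is infinite if the edge is a monopoly (no alternative path).
        payments[edge] = length_without - (length - reported[edge])
    return path, payments


reported = {("s", "a"): 1, ("a", "t"): 1, ("s", "t"): 3}
print(vcg_shortest_path(reported, "s", "t"))
```

In this small example each selected edge receives 2 while its cost is 1, and its payment does not depend on its own report as long as it remains on the chosen path, which is the intuition behind truthfulness.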

Answering any one of these difficulties re- 
quires the design of a non-VCG mechanism. 
What analysis tools should be used for this pur- 
pose? In economics and classic mechanism de- 
sign, average-case analysis, that relies on the 
knowledge of the underlying distribution, is the 
standard. Computer science, on the other hand, 
usually prefers to avoid strong distributional
assumptions, and to use worst-case analysis. This
difference is another reason for the uniqueness
of the answers provided by algorithmic
mechanism design. Some of the new results that have
emerged as a consequence of this integration
between Computer Science and Economics are
described next. Many other research topics that
use the tools of algorithmic mechanism design 
are described in the entries on Adword Pric- 
ing, Competitive Auctions, False Name Proof 
Auctions, Generalized Vickrey Auction, Incen- 
tive Compatible Ranking, Mechanism for One 
Parameter Agents Single Buyer/Seller, Multiple 
Item Auctions, Position Auctions, and Truthful 
Multicast. 

There are two different but closely related 
research topics that should be mentioned in the 
context of this entry. The first is the line of works 
that studies the “price of anarchy” of a given
system. These works analyze existing systems,
trying to quantify the loss of social efficiency 
due to the selfish nature of the participants, while 
the approach of algorithmic mechanism design 
is to understand how new systems should be 
designed. For more details on this topic the reader 
is referred to the entry on Price of Anarchy. 
The second topic concerns the algorithmic study
of the computation of various equilibria. These works
bring computational aspects into economics and 
game theory, as they ask what equilibria notions 
are reasonable to assume, if one requires com- 
putational efficiency, while the works described 
here bring game theory and economics into com- 
puter science and algorithmic theory, as they ask 
what algorithms are reasonable to design, if one 
requires the resilience to selfish behavior. For 
more details on this topic the reader is referred 
(for example) to the entry on Algorithms for 
Nash Equilibrium and to the entry on General 
Equilibrium. 


Key Results 


Problem Domain 1: Job Scheduling 

Job scheduling is a classic algorithmic setting: $n$
jobs are to be assigned to $m$ machines, where job
$j$ requires processing time $p_{ij}$ on machine $i$. In the
game-theoretic setting, it is assumed that each
machine $i$ is a selfish entity that incurs a cost $p_{ij}$
from processing job $j$. Note that the payments
in this setting (and in general) may be negative,
offsetting such costs. A popular algorithmic goal
is to assign jobs to machines in order to minimize
the “makespan”: $\max_i \sum_{j \text{ is assigned to } i} p_{ij}$. This
is different from welfare maximization, which
translates in this setting to the minimization of
$\sum_i \sum_{j \text{ is assigned to } i} p_{ij}$, further illustrating the
problem of different algorithmic goals. Thus the
VCG scheme cannot be used, and new methods
must be developed.
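The gap between the two objectives can be seen on a tiny instance. The following brute-force sketch, whose names and numbers are chosen here purely for illustration, enumerates all assignments of jobs to machines and compares the assignment that minimizes the total cost (welfare maximization) with one that minimizes the makespan.

```python
from itertools import product


def machine_loads(p, assignment):
    """Load of each machine; p[i][j] is the time of job j on machine i."""
    loads = [0] * len(p)
    for job, machine in enumerate(assignment):
        loads[machine] += p[machine][job]
    return loads


def makespan(p, assignment):
    return max(machine_loads(p, assignment))


def total_cost(p, assignment):
    return sum(machine_loads(p, assignment))


def best_assignment(p, objective):
    """Exhaustive search over all m**n assignments (tiny instances only)."""
    m, n = len(p), len(p[0])
    return min(product(range(m), repeat=n), key=lambda a: objective(p, a))


# Two machines, two jobs: machine 0 is faster on both jobs.
p = [[2, 2], [3, 3]]
welfare_opt = best_assignment(p, total_cost)
makespan_opt = best_assignment(p, makespan)
print(welfare_opt, total_cost(p, welfare_opt), makespan(p, welfare_opt))
print(makespan_opt, total_cost(p, makespan_opt), makespan(p, makespan_opt))
```

Here the cost-minimizing assignment places both jobs on the faster machine, whereas a makespan-minimizing assignment splits them, at a higher total cost.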

Results for this problem domain depend on the 
specific assumptions about the structure of the 
processing time vectors. In the related machines 
case, $p_{ij} = p_j/s_i$ for any $i, j$, where the $p_j$’s are
public knowledge, and the only secret parameter
of player $i$ is its speed, $s_i$.

Theorem 2 ([3, 22]) For job scheduling
on related machines, there exists a truthful
exponential-time mechanism that obtains the
optimal makespan, and a truthful
polynomial-time mechanism that obtains a 3-approximation
to the optimal makespan.


More details on this result are given in the entry 
on Mechanism for One Parameter Agents Sin- 
gle Buyer. The bottom line conclusion is that, 
although the social goal is different from welfare
maximization, there still exists a truthful mech- 
anism for this goal. A non-trivial approximation 
guarantee is achieved, even under the additional 
requirement of computational efficiency. How- 
ever, this guarantee does not match the best pos- 
sible without the truthfulness requirement, since 
in this case a PTAS is known. 


Open Question 1 Is there a truthful PTAS for
makespan minimization in related machines?


If the number of machines is fixed then [2] give 
such a truthful PTAS. 

The above picture completely changes in the 
move to the more general case of unrelated
machines, where the $p_{ij}$’s are allowed to be arbitrary:


Theorem 3 ([13, 30]) Any truthful scheduling
mechanism for unrelated machines cannot
approximate the optimal makespan by a factor
better than $1 + \sqrt{2}$ (for deterministic mechanisms)
and $2 - 1/m$ (for randomized mechanisms).


Note that this holds regardless of computational 
considerations. In this case, switching from 
welfare maximization to makespan minimization 
results in a strong impossibility. On the 
possibilities side, virtually nothing (!) is known. 
The VCG mechanism (which minimizes the total 
social cost) is an m-approximation of the optimal 
makespan [32], and, in fact, nothing better is 
currently known: 


Open Question 2 What is the best possible ap- 
proximation for truthful makespan minimization 
in unrelated machines? 


What caused the switch from “mostly
possibilities” to “mostly impossibilities”? Related
machines is a single-dimensional domain
(players hold only one secret number), for
which truthfulness is characterized by a simple
monotonicity condition that leaves ample
flexibility for algorithmic design. Unrelated
machines, on the other hand, are a
multi-dimensional domain, and the algorithmic
conditions implied by truthfulness in such a case
are harder to work with. It is still unclear
whether these conditions imply real mathematical
impossibilities, or perhaps just pose harder
obstacles that can in principle be overcome. One
multi-dimensional scheduling domain for which
possibility results are known is the case where
$p_{ij} \in \{L_j, H_j\}$, where the “low” $L_j$’s and “high”
$H_j$’s are fixed and known. This case generalizes
the classic multi-dimensional model of restricted
machines ($p_{ij} \in \{p_j, \infty\}$), and admits a truthful
3-approximation [27].


Problem Domain 2: Digital Goods and Revenue Maximization

In the E-commerce era, a new kind of “digital
goods” has evolved: goods with no marginal
production cost, or, in other words, goods with 
unlimited supply. One example is songs being 
sold on the Internet. There is a sunk cost of 
producing the song, but after that, additional 
electronic copies incur no additional cost. How 
should such items be sold? One possibility is 
to conduct an auction. An auction is a one- 
sided market, where a monopolistic entity (the 
auctioneer) wishes to sell one or more items to 
a set of buyers. 

In this setting, each buyer has a privately 
known value for obtaining one copy of the good. 
Welfare maximization simply implies the allo- 
cation of one good to every buyer, but a more 
interesting question is the question of revenue 
maximization. How should the auctioneer design 
the auction in order to maximize his profit?
Standard tools from the study of revenue-maximizing
auctions suggest simply declaring a price-per-buyer,
determined by the probability distribution of the
buyer’s value, and making a take-it-or-leave-it offer.
(This model was not explicitly studied
in classic auction theory, but standard results
from there can be easily adjusted to this setting.)
However, such a mechanism needs to know the
underlying distribution. Algorithmic mechanism 
design suggests an alternative, worst-case result, 
in the spirit of CS-type models and analysis. 

Suppose that the auctioneer is required to
sell all items at the same price, as is the case
for many “real-life” monopolists, and denote by
$F(v)$ the maximal revenue from a fixed-price sale
to bidders with values $v = v_1, \ldots, v_n$, assuming
that all values are known. Reordering indexes so
that $v_1 \ge v_2 \ge \cdots \ge v_n$, let $F(v) = \max_i i \cdot v_i$.
The problem is, of course, that in fact nothing
about the values is known. Therefore, a truthful
auction that extracts the players’ values is in
order. Can such an auction obtain a profit that is
a constant fraction of $F(v)$, for any $v$ (i.e., in the
worst case)? Unfortunately, the answer is prov- 
ably no [17]. The proof makes use of situations 
where the entire profit comes from the highest 
bidder. Since there is no potential for competition 
among bidders, a truthful auction cannot force 
this single bidder to reveal her value. 

Luckily, a small relaxation in the optimality
criteria significantly helps. Specifically, define
$F^{(2)}(v) = \max_{i \ge 2} i \cdot v_i$ (i.e., the benchmark
is the auction that sells to at least two buyers).
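Both benchmarks are simple to compute when the values are known. The sketch below, with a function name and sample values chosen here for illustration, evaluates the best fixed-price revenue with at least a given number of winners; one winner gives $F(v)$ and two winners give $F^{(2)}(v)$.

```python
def fixed_price_benchmark(values, min_winners=1):
    """Best revenue of a single-price sale that serves at least min_winners buyers.

    After sorting values in decreasing order, selling to the top i buyers
    at price v_i yields revenue i * v_i.
    """
    ordered = sorted(values, reverse=True)
    revenues = (i * v for i, v in enumerate(ordered, start=1) if i >= min_winners)
    return max(revenues, default=0)


values = [100, 1, 1]
print(fixed_price_benchmark(values))                 # F(v)
print(fixed_price_benchmark(values, min_winners=2))  # F^(2)(v)
```

The sample values show the situation used in the impossibility argument: almost all of $F(v)$ comes from the single highest bidder, while $F^{(2)}(v)$ ignores that possibility.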


Theorem 4 ([17, 20]) There exists a truthful
randomized auction that obtains an expected
revenue of at least $F^{(2)}/3.25$, even in the
worst case. On the other hand, no truthful auction
can approximate $F^{(2)}$ within a factor better than
2.42.


Several interesting formats of distribution-free 
revenue-maximizing auctions have been consid- 
ered in the literature. The common building block 
in all of them is the random partitioning of the
set of buyers into random subsets, analyzing each
set separately, and using the results on the other
sets. Each auction utilizes a different analysis on
the two subsets, which yields slightly different
approximation guarantees. Aggarwal et al. [1]
describe an elegant method to derandomize this
type of auction, while losing another factor of
4 in the approximation. More details on this
problem domain can be found in the entry on 
Competitive Auctions. 




Problem Domain 3: Combinatorial Auctions

Combinatorial auctions (CAs) are a central
model with theoretical importance and practical
relevance. They generalize many theoretical
algorithmic settings, like job scheduling and
network routing, and are evident in many real-life
situations. This new model has various pure
computational aspects, and, additionally, exhibits
interesting game-theoretic challenges. While each
aspect is important on its own, obviously only
the integration of the two provides an acceptable
solution.

A combinatorial auction is a multi-item auction
in which players are interested in bundles
of items. Such a valuation structure can represent
substitutabilities among items, complementarities
among items, or a combination of both.
More formally, m items ($\Omega$) are to be allocated
to n players. Players value subsets of items,
and $v_i(S)$ denotes i’s value of a bundle $S \subseteq \Omega$.
Valuations additionally satisfy: (i) monotonicity,
i.e., $v_i(S) \le v_i(T)$ for $S \subseteq T$, and (ii) normalization,
i.e., $v_i(\emptyset) = 0$. The literature has mostly
considered the goal of maximizing the social
welfare: find an allocation $(S_1, \ldots, S_n)$ that
maximizes $\sum_i v_i(S_i)$.
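
To make the objective concrete, the following sketch finds a welfare-maximizing allocation by exhaustive search. It is only an illustration of the definition (it takes time exponential in m); the function name and the two sample valuations are assumptions made for the example.

```python
from itertools import product


def max_welfare_allocation(items, valuations):
    """Try every assignment of each item to one player or to nobody.

    valuations: list of functions mapping a frozenset of items to a number.
    Returns (best welfare, list of bundles S_1..S_n).
    """
    n = len(valuations)
    best_welfare, best_alloc = float("-inf"), None
    for owners in product(range(-1, n), repeat=len(items)):  # -1 means unallocated
        bundles = [frozenset(x for x, o in zip(items, owners) if o == i)
                   for i in range(n)]
        welfare = sum(v(S) for v, S in zip(valuations, bundles))
        if welfare > best_welfare:
            best_welfare, best_alloc = welfare, bundles
    return best_welfare, best_alloc


# Player 0 sees items a and b as complements; player 1 is additive.
v0 = lambda S: 10 if {"a", "b"} <= S else 0
v1 = lambda S: 4 * len(S)
welfare, alloc = max_welfare_allocation(["a", "b", "c"], [v0, v1])
```

Both sample valuations are monotone and normalized, as required above.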

Since a general valuation has size exponential
in n and m, the representation issue must be taken
into account. Two models are usually considered
(see [11] for more details). In the bidding languages
model, the bid of a player represents his
valuation in a concise way. For this model it is
NP-hard to approximate the social welfare within
a ratio of $\Omega(m^{1/2-\epsilon})$, for any $\epsilon > 0$ (if
“single-minded” bids are allowed; the exact definition
is given below). In the query access model, the
mechanism iteratively queries the players in the
course of computation. For this model, any algorithm
with polynomial communication cannot
obtain an approximation ratio of $\Omega(m^{1/2-\epsilon})$ for
any $\epsilon > 0$. These bounds are tight, as there exists
a deterministic $\sqrt{m}$-approximation with polynomial
computation and communication. Thus, for
the general valuation structure, the computational
status by itself is well-understood.

The basic incentives issue is again well-understood:
VCG obtains truthfulness. Since
VCG requires the exact optimum, which is NP-hard
to compute, the two considerations
clash when attempting to use classic techniques.
Algorithmic mechanism design aims to develop
new techniques that integrate these two desirable
aspects.

The first positive result for this integration
challenge was given by [29], for the special case
of “single-minded bidders”: each bidder, i, is
interested in a specific bundle $S_i$, for a value $v_i$
(any bundle that contains $S_i$ is worth $v_i$, and other
bundles have zero value). Both $v_i$, $S_i$ are private
to the player i.


Theorem 5 ([29]) There exists a truthful and
polynomial-time deterministic combinatorial
auction for single-minded bidders, which obtains
a $\sqrt{m}$-approximation to the optimal social
welfare.
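
A greedy rule is a natural way to obtain such a mechanism. The sketch below is an illustration under the assumption that bids are ranked by $v_i / \sqrt{|S_i|}$ and that each winner pays her critical value; the text above does not spell out the construction of [29], so the function name, the tie-breaking by index, and the data layout are hypothetical.

```python
from math import sqrt


def greedy_single_minded(bids):
    """bids: list of (bundle, value) with non-empty frozenset bundles.

    Returns dict winner index -> payment.
    """
    def score(i):
        bundle, value = bids[i]
        return value / sqrt(len(bundle))

    # Highest score first; ties are broken by index so the rule is deterministic.
    order = sorted(range(len(bids)), key=lambda i: (-score(i), i))

    def run(excluded=None):
        taken, winners = set(), []
        for i in order:
            if i != excluded and not (bids[i][0] & taken):
                winners.append(i)
                taken |= bids[i][0]
        return winners

    payments = {}
    for i in run():
        # Critical value: the first bidder accepted in i's absence whose bundle
        # conflicts with S_i sets the score that i has to beat.
        blocker = next((j for j in run(excluded=i) if bids[j][0] & bids[i][0]), None)
        payments[i] = 0.0 if blocker is None else score(blocker) * sqrt(len(bids[i][0]))
    return payments
```

The allocation rule is monotone in the declared value (raising $v_i$ only moves bidder i earlier in the order), which is what makes critical-value payments well defined.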


A possible generalization of the basic model 
is to assume that each item has B copies, and 
each player still desires at most one copy from 
each item. This is termed “multi-unit CA”. As B 
grows, the integrality constraint of the problem 
reduces, and so one could hope for better solu- 
tions. Indeed, the next result exploits this idea: 


Theorem 6 ([7]) There exists a truthful and
polynomial-time deterministic multi-unit CA,
for $B \ge 3$ copies of each item, that obtains an
$O(B \cdot m^{1/(B-2)})$-approximation to the optimal
social welfare.


This auction copes with the representation issue
(since general valuations are assumed) by accessing
the valuations through a “demand oracle”:
given per-item prices $\{p_x\}_{x \in \Omega}$, specify a bundle
S that maximizes $v_i(S) - \sum_{x \in S} p_x$.
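
A demand oracle can be pictured as follows. This brute-force sketch enumerates all bundles, so it only illustrates the query semantics; in the query access model the oracle is answered by the player herself. The function name and sample prices are assumptions.

```python
from itertools import combinations


def demand_oracle(valuation, prices):
    """Return a bundle S maximizing valuation(S) - sum of prices[x] for x in S.

    valuation: function from a frozenset of items to a number.
    prices: dict item -> per-item price.
    """
    items = sorted(prices)
    best_bundle, best_utility = frozenset(), valuation(frozenset())
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            bundle = frozenset(combo)
            utility = valuation(bundle) - sum(prices[x] for x in bundle)
            if utility > best_utility:
                best_bundle, best_utility = bundle, utility
    return best_bundle


wants_pair = lambda S: 10 if {"a", "b"} <= S else 0   # hypothetical valuation
cheap = demand_oracle(wants_pair, {"a": 3, "b": 4, "c": 1})
pricey = demand_oracle(wants_pair, {"a": 6, "b": 6, "c": 1})
```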

Two main drawbacks of this auction motivate
further research on the issue. First, as B gets
larger it is reasonable to expect the approximation
to approach 1 (indeed polynomial-time
algorithms with such an approximation guarantee
do exist). However, here the approximation ratio
does not decrease below O(log m) (this ratio is
achieved for B = O(log m)). Second, this auction
does not provide a solution to the original
setting, where B = 1, and, in general, for small
values of B the approximation factor is rather high. One
way to cope with these problems is to introduce
randomness:


Theorem 7 ([26]) There exists a truthful-in-expectation
and polynomial-time randomized
multi-unit CA, for any $B \ge 1$ copies of each item,
that obtains an $O(m^{1/(B+1)})$-approximation to the
optimal social welfare.


Thus, by allowing randomness, the gap from
the standard computational status is completely
closed. The definition of truthfulness-in-expectation
is the natural extension of truthfulness
to a randomized environment: the expected
utility of a player is maximized by being truthful.

However, this notion is strictly weaker than
the deterministic notion, as it implicitly assumes
that players care only about the expectation of
their utility (and not, for example, about the
variance). This is termed the “risk-neutrality”
assumption in the economics literature. An intermediate
notion for randomized mechanisms is
that of “universal truthfulness”: the mechanism
is truthful given any fixed result of the coin toss.
Here, risk-neutrality is no longer needed. Dobzinski
et al. [15] give a universally truthful CA for
B = 1 that obtains an $O(\sqrt{m})$-approximation.
Universally truthful mechanisms are still weaker
than deterministic truthful mechanisms, for
two reasons: (i) It is not clear how to actually create
the correct and exact probability distribution
with a deterministic computer. The situation here
is different from “regular” algorithmic settings,
where various derandomization techniques can
be employed, since these in general do not
preserve the truthfulness property. (ii) Even
if a natural randomness source exists, one cannot
improve the quality of the actual output by repeating
the computation several times (using the
law of large numbers). Such a repetition will
again destroy truthfulness. Thus, exactly because
the game-theoretic issues are being considered
in parallel to the computational ones, the importance
of determinism increases.


Open Question 3 What is the best possible
approximation ratio that deterministic and
truthful combinatorial auctions can obtain, in
polynomial time?


There are many valuation classes that restrict the
possible valuations to some reasonable format
(see [28] for more details). For example, sub-additive
valuations are such that, for any two
bundles $S, T \subseteq \Omega$, $v(S \cup T) \le v(S) + v(T)$.
Such classes exhibit much better approximation
guarantees, e.g., for sub-additive
valuations a polynomial-time 2-approximation
is known [16]. However, no polynomial-time
truthful mechanism (be it randomized or
deterministic) with a constant approximation
ratio is known for any of these classes.
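
Membership in the sub-additive class can be checked directly from the inequality above when the item set is small. The following sketch does so by enumerating all pairs of bundles; the helper names and example valuations are assumptions for illustration.

```python
from itertools import combinations


def all_bundles(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_subadditive(valuation, items):
    """Check v(S | T) <= v(S) + v(T) for every pair of bundles S, T."""
    bundles = all_bundles(items)
    return all(valuation(S | T) <= valuation(S) + valuation(T)
               for S in bundles for T in bundles)


additive = lambda S: 2 * len(S)                         # sub-additive
unit_demand = lambda S: 5 if S else 0                   # sub-additive
complements = lambda S: 10 if {"a", "b"} <= S else 0    # violates the inequality
```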


Open Question 4 Does there exist polynomial- 
time truthful constant-factor approximations for 
special cases of CAs that are NP-hard? 


Revenue maximization in CAs is of course
another important goal. This topic is still
mostly unexplored, with few exceptions. The
mechanism of [7] obtains the same guarantees
with respect to the optimal revenue. Improved
approximations exist for multi-unit auctions
(where all items are identical) with
budget-constrained players [12], and for
unlimited-supply CAs with single-minded bidders [6].

The topic of Combinatorial Auctions is 
discussed also in the entry on Multiple Item 
Auctions. 


Problem Domain 4: Online Auctions 

In the classic CS setting of “online computa- 
tion”, the input to an algorithm is not revealed 
all at once, before the computation begins, but 
gradually, over time (for a detailed discussion 
see the many entries on online problems in this 
book). This structure suits the auction world, 
especially in the new electronic environments. 
What happens when players arrive over time, and 
the auctioneer must make decisions facing only 
a subset of the players at any given time? 

The integration of online settings, worst-case
analysis, and auction theory was suggested
by [24]. They considered the case where players
arrive one at a time, and the auctioneer must
provide an answer to each player as it arrives,
without knowing the future bids. There are
k identical items, and each bidder may have
a distinct value for every possible quantity of the
item. These values are assumed to be marginally
decreasing, where each marginal value lies in
the interval $[\underline{v}, \bar{v}]$. The private information of
a bidder includes both her valuation function
and her arrival time, and so a truthful auction
needs to incentivize the players to arrive on time
(and not later on), and to reveal their true values.
The most interesting result in this setting is for
a large k, so that in fact there is a continuum of
items:


Theorem 8 ([24]) There exists a truthful online
auction that simultaneously approximates,
within a factor of $O(\log(\bar{v}/\underline{v}))$, the optimal offline
welfare, and the offline revenue of VCG. Furthermore,
no truthful online auction can obtain
a better approximation ratio to either one of these
criteria (separately).


This auction has the interesting property of being 
a “posted price” auction. Each bidder is not re- 
quired to reveal his valuation function, but, rather, 
he is given a price for each possible quantity, and 
then simply reports the desired quantity under 
these prices. 

Ideas from this construction were later used 
by [10] to construct two-sided online auction 
markets, where multiple sellers and buyers arrive 
online. 

This approximation ratio can be dramatically
improved, to a constant, 4, if one assumes
that (i) there is only one item, and (ii) player
values are i.i.d. from some fixed distribution.
No a priori knowledge of this distribution
is needed, as neither the mechanism nor the
players are required to make any use of it.
The work [19] analyzes this setting by making an
interesting connection to the class of “secretary
problems”.

A general method to convert online algorithms
to online mechanisms is given by [4]. This is
done for one-item auctions, and, more generally,
for one-parameter domains. This method is competitive
both with respect to the welfare and the
revenue.

The revenue that the online auction of Theorem 8
manages to raise is competitive only
with respect to VCG’s revenue, which may be
far from optimal. A parallel line of work is
concerned with revenue-maximizing auctions. To
achieve good results, two assumptions need to be
made: (i) there exists an unlimited supply of items
(and recall from section “Problem Domain 2:
Digital Goods and Revenue Maximization” that
F(v) is the offline optimal monopolistic fixed-price
revenue), and (ii) players cannot lie about
their arrival time, only about their value. This
last assumption is very strong, but apparently
needed. Such auctions are termed here
“value-truthful”, indicating that “time-truthfulness” is
missing.


Theorem 9 ([9]) For any $\epsilon > 0$, there exists
a value-truthful online auction, for the unlimited
supply case, with expected revenue of at least


$F(v)/(1 + \epsilon) - O(h/\epsilon^2)$.


The construction exploits principles from 
learning theory in an elegant way. Posted 
price auctions for this case are also possible, 
in which case the additive loss increases to 
$O(h \log\log h)$. Hajiaghayi et al. [19] consider
fully-truthful online auctions for revenue 
maximization, but manage to obtain only 
very high (although fixed) competitive ratios. 
Constructing fully-truthful online auctions with 
a close-to-optimal revenue remains an open 
question. Another interesting open question 
involves multi-dimensional valuations. The 
work [24] remains the only work for players 
that may demand multiple items. However 
their competitive guarantees are quite high, 
and achieving better approximation guarantees 
(especially with respect to the revenue) is 
a challenging task. 


Advanced Issues 


Monotonicity 

What is the general way for designing a truthful
mechanism? The straightforward way is to
check, for a given social choice function f,
whether truthful prices exist. If not, try to “fix”
f. It turns out, however, that there exists a more
structured way: an algorithmic condition that
implies the existence of truthful prices. Such
a condition shifts the designer back to the familiar
territory of algorithmic design. Luckily, such
a condition does exist, and is best described in the
abstract social choice setting of section “Problem
Definition”:


Definition 3 ([8, 23]) A social choice function
$f: V \to A$ is “weakly monotone” (W-MON) if
for any i, $v_{-i} \in V_{-i}$, and any $v_i, v'_i \in V_i$, the
following holds. Suppose that $f(v_i, v_{-i}) = a$, and
$f(v'_i, v_{-i}) = b$. Then $v'_i(b) - v_i(b) \ge v'_i(a) - v_i(a)$.


In words, this condition states the following.
Suppose that player i changes her declaration
from $v_i$ to $v'_i$, and this causes the social choice
to change from a to b. Then it must be the case
that i’s value for b has increased in the transition
from $v_i$ to $v'_i$ no less than i’s value for a.
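
For finite sets of types, the W-MON inequality can be verified exhaustively. The sketch below does this for one player; the interface (types as dictionaries from alternatives to values, a list of profiles of the other players) is an assumption chosen for the example.

```python
def is_weakly_monotone(f, types, others):
    """Check W-MON for a single player over finite sets.

    f(v_i, v_minus_i) returns the chosen alternative. Each type v_i is a dict
    mapping alternatives to values; `others` lists the profiles v_{-i} to test.
    """
    for v_minus_i in others:
        for v in types:
            for v_new in types:
                a, b = f(v, v_minus_i), f(v_new, v_minus_i)
                # W-MON requires v'(b) - v(b) >= v'(a) - v(a).
                if v_new[b] - v[b] < v_new[a] - v[a]:
                    return False
    return True


# Single-item example: the player wins iff her value beats a competing bid.
types = [{"win": x, "lose": 0} for x in (1, 3, 5)]
higher_wins = lambda v, other: "win" if v["win"] > other else "lose"
lower_wins = lambda v, other: "win" if v["win"] < other else "lose"
```

Here `higher_wins` satisfies the condition, while `lower_wins`, which rewards lower declarations, violates it.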


Theorem 10 ([35]) Fix a social choice function
$f: V \to A$, where V is convex, and A is finite.
Then there exist prices p such that M = (f, p) is
truthful if and only if f is weakly monotone.


Furthermore, given a weakly monotone f, there 
exists an explicit way to determine the appropri- 
ate prices p (see [18] for details). 

Thus, the designer should aim for weakly
monotone algorithms, and need not worry about
actual prices. But how difficult is this? For
single-dimensional domains, it turns out that W-MON
leaves ample flexibility for the algorithm designer.
Consider for example the case where every
alternative has a value of either 0 (the player
“loses”) or some $v_i \in \mathbb{R}$ (the player “wins” and
obtains a value $v_i$). In such a case, it is not hard
to show that W-MON reduces to the following
monotonicity condition: if a player wins with $v_i$,
and increases her value to $v'_i > v_i$ (while $v_{-i}$
remains fixed), then she must win with $v'_i$ as
well. Furthermore, in such a case, the price of
a winning player must be set to the infimum over
all winning values.
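
Because the winning rule is monotone in the player's own value, the infimum over winning values can be located by bisection. The following is a minimal numerical sketch; the function name, the search interval, and the tolerance are assumptions.

```python
def critical_value(wins, others, lo=0.0, hi=1e6, tol=1e-6):
    """Approximate inf{v : wins(v, others)} for a monotone winning rule.

    wins(v, others) must be monotone: once it is True for some v, it stays
    True for all larger v (with `others` fixed).
    """
    if not wins(hi, others):
        return None            # the player cannot win with any value up to hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if wins(mid, others):
            hi = mid
        else:
            lo = mid
    return hi


# Single item: the highest bid wins, so the threshold is the best competing bid.
second_price = lambda v, others: v > max(others)
# Two identical items: the threshold is the second-highest competing bid.
top_two = lambda v, others: v > sorted(others, reverse=True)[1]
```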


Impossibilities of truthful design

It is fairly simple to construct algorithms 
that satisfy W-MON for single-dimensional 
domains, and a variety of positive results were 
obtained for such domains, in classic mechanism 
design, as well as in algorithmic mechanism 
design. But how hard is it to satisfy W-MON
for multi-dimensional domains? The answer
is still unclear, and seems to be one of the
challenges of algorithmic mechanism design.
The contrast between single-dimensionality and 
multi-dimensionality appears in all problem 
domains that were surveyed here, and seems to 
reflect some inherent difficulty that is not exactly 
understood yet. Given a social choice function 
f, call f implementable (in dominant strategies) 
if there exist prices p such that M = (f, p) is 
truthful. The basic question is then what forms of 
social choice functions are implementable. 

As detailed in the beginning, the welfare-maximizing
social choice function is implementable.
This specific function can be slightly generalized
to allow weights, in the following way: fix some
non-negative real constants $\{w_i\}_{i=1}^n$ (not all
zero) and $\{\gamma_a\}_{a \in A}$, and choose an alternative
that maximizes the weighted social welfare, i.e.,
$f(v) \in \arg\max_{a \in A} \sum_i w_i v_i(a) + \gamma_a$. This class
of functions is sometimes termed “affine maximizers”.
It turns out that these functions are also
implementable, with prices similar in spirit to
VCG. In the context of the above characterization
question, one sharp result stands out:
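
The choice rule of an affine maximizer is a one-line computation. The sketch below only covers the social choice function (not the accompanying prices); the function name and the sample data are assumptions.

```python
def affine_maximizer(alternatives, valuations, weights, offsets):
    """Pick an alternative maximizing sum_i w_i * v_i(a) + gamma_a.

    valuations: list of dicts alternative -> value, one per player.
    weights: non-negative w_i, not all zero. offsets: dict alternative -> gamma_a.
    """
    def objective(a):
        return sum(w * v[a] for w, v in zip(weights, valuations)) + offsets[a]
    return max(alternatives, key=objective)


alts = ["x", "y"]
vals = [{"x": 3, "y": 0}, {"x": 0, "y": 4}]
no_offset = {"x": 0, "y": 0}
plain = affine_maximizer(alts, vals, [1, 1], no_offset)            # plain welfare
weighted = affine_maximizer(alts, vals, [2, 1], no_offset)         # player 0 counts double
shifted = affine_maximizer(alts, vals, [2, 1], {"x": 0, "y": 5})   # bonus for y
```

With all weights equal to 1 and all offsets equal to 0, the rule coincides with welfare maximization.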


Theorem 11 ([34]) Fix a social choice function
$f: V \to A$, such that (i) A is finite, $|A| \ge 3$, and
f is onto A, and (ii) $V_i = \mathbb{R}^A$ for all i. Then f
is implementable (in dominant strategies) if and
only if it is an affine maximizer.


The domain V that satisfies $V_i = \mathbb{R}^A$ for all i
is termed an “unrestricted domain”. The theorem
states that, if the domain is unrestricted, at least
three alternatives are chosen, and the set A of
alternatives is finite, then nothing besides affine
maximizers can be implemented!

However, the assumption that the domain is
unrestricted is very restrictive. All the above
example domains exhibit some basic combinatorial
structure, and are therefore restricted in
some way. And as discussed above, for many
restricted domains the theorem is simply not
true. So where does the border between possibility
and impossibility lie? As mentioned above, this is an unsolved
challenge. Lavi, Mu’alem, and Nisan [23] explore
this question for Combinatorial Auctions
and similar restricted domains, and reach partial
answers. For example:


Theorem 12 ([23]) Any truthful combinatorial
auction or multi-unit auction among two players, 
that must always allocate all items, and that 
approximates the welfare by a factor better than 
2, must be an affine maximizer. 


Of course, this is far from being a complete
answer. What happens if there are more than two 
players? And what happens if it is possible to 
“throw away” part of the items? These questions, 
and the more general and abstract characteriza- 
tion question, are all still open. 


Alternative solution concepts 

In light of the conclusions of the previous section, 
a natural thought would be to re-examine the 
solution concept that is being used. Truthfulness 
relies on the strong concept of dominant strate- 
gies: for each player there is a unique strategy that 
maximizes her utility, no matter what the other 
players are doing. This is very strong, but it fits 
very well the worst-case way of thinking in CS. 
What other solution concepts can be used? As de- 
scribed above, randomization, and truthfulness- 
in-expectation, can help. A related concept, again 
for randomized mechanisms, is truthfulness with 
high probability. Another direction is to consider 
mechanisms where players cannot improve their 
utility too much by deviating from the truth- 
telling strategy [21]. 

Algorithm designers do not care so much
about actually reaching an equilibrium point, or
finding out what the players will play; the major
concern is to guarantee the optimality of the solution,
taking into account the strategic behavior
of the players. Indeed, one way of doing this is to
guarantee a good equilibrium point. But there is
no reason to rule out mechanisms where several
acceptable strategic choices for the players exist,
provided that the approximation will be achieved
in each of these choices.

As a first attempt, one is tempted to simply 
let the players try and improve the basic result 
by allowing them to lie. However, this can cause 
unexpected dynamics, as each player chooses her 
lies under some assumptions about the lies of the 
others, etc. etc. To avoid such an unpredictable 
situation, it is important to insist on using rigor- 
ous game theoretic reasoning to explain exactly 
why the outcome will be satisfactory. 

The work [31] suggests the notion of “feasibly 
dominant” strategies, where players reveal the 
possible lies they consider, and the mechanism 
takes this into account. By assuming that the 
players are computationally bounded, one can 
show that, instead of actually “lying”, the players 
will prefer to reveal their true types plus all the 
lies they might consider. In such a case, since 
the mechanism has obtained the true types of 
the players, a close-to-optimal outcome will be 
guaranteed. 

Another definition tries to capture the initial 
intuition by using the classic game-theoretic no- 
tion of undominated strategies: 


Definition 4 ([5]) A mechanism M is an “algorithmic
implementation of a c-approximation
(in undominated strategies)” if there exists a set
of strategies, D, such that (i) M obtains a
c-approximation for any combination of strategies
from D, in polynomial time, and (ii) for any
strategy not in D, there exists a strategy in D
that weakly dominates it, and this transition is
polynomial-time computable.


By the second condition, it is reasonable to assume
that a player will indeed play some strategy
in D, and, by the first condition, it does not
matter what tuple of strategies in D will actually
be chosen, as any of these will provide the approximation.
This transfers some of the burden
from the game-theoretic design to the algorithmic
design, since now a guarantee on the approximation
should be provided for a larger range of
strategies. Babaioff et al. [5] exploit this notion to
design a deterministic CA for multi-dimensional
players that achieves a close-to-optimal approximation
guarantee. A similar-in-spirit notion, although
a weaker one, is the notion of
“Set-Nash” [25].


Applications 


One of the popular examples of a “real-life” combinatorial
auction is the spectrum auction that the
US government conducts, in order to sell spec- 
trum licenses. Typical bids reflect values for dif- 
ferent spectrum ranges, to accommodate different 
geographical and physical needs, where different 
spectrum ranges may complement or substitute 
one another. The US government invests research 
efforts in order to determine the best format for 
such an auction, and auction theory is heavily 
exploited. Interestingly, the US law guides the 
authorities to allocate these spectrum ranges in 
a way that will maximize the social welfare, thus 
providing a good example for the usefulness of 
this goal. 

Adword auctions are another new and fast- 
growing application of auction theory in general, 
and of the new algorithmic auctions in particular. 
These are auctions that determine the advertise- 
ments that web-search engines place close to the 
search results they show, after the user submits 
her search keywords. The interested companies 
compete, for every given keyword, on the right to 
place their ad on the results page, and this turns
out to be the main source of income for companies
like Google. Several entries in this book
touch on this topic in more detail, including
the entries on Adwords Pricing and on Position
Auctions.

A third example of a possible application, so far
implemented only in academic
research labs, is the application of algorithmic
mechanism design to pricing and congestion control
in communication networks. The existing
fixed pricing scheme has many disadvantages,
both with respect to the need to allocate the
available resources efficiently, and with respect to
the new opportunities of the Internet companies
to raise more revenue due to specific types of
traffic. Theory suggests solutions to both of these
problems.


Cross-References

The topics presented here are detailed in the
textbook [33]. Section “Problem Definition” is
“algorithmic mechanism design”. The book [14]
covers the various aspects of combinatorial
auctions.


Problem Definition


A phylogenetic tree is a binary, rooted, unordered
tree whose leaves are distinctly labeled. A phylogenetic
network is a generalization of a phylogenetic
tree formally defined as a rooted, connected,
directed acyclic graph in which (1) each node has
outdegree at most 2; (2) each node has indegree 1
or 2, except the root node which has indegree 0;
(3) no node has both indegree 1 and outdegree 1;
and (4) all nodes with outdegree 0 are labeled
by elements from a finite set L in such a way
that no two nodes are assigned the same label.
Nodes of outdegree 0 are referred to as leaves and
are identified with their corresponding elements
in L. Nodes with indegree 2 are called reticulation
nodes. For any phylogenetic network N, let
U(N) be the undirected graph obtained from N
by replacing each directed edge by an undirected
edge. N is said to be a galled phylogenetic
network (galled network, for short) if all cycles
in U(N) are node-disjoint. Galled networks are
also known in the literature as topologies with
independent recombination events [15],
galled-trees [6], and level-1 phylogenetic networks
[2, 5, 7, 9, 10, 14].

A phylogenetic tree with exactly three leaves
is called a rooted triplet. The unique rooted triplet
on a leaf set {x, y, z} in which the lowest common
ancestor of x and y is a proper descendant
of the lowest common ancestor of x and z (or
equivalently, where the lowest common ancestor
of x and y is a proper descendant of the
lowest common ancestor of y and z) is denoted
by xy|z. For any phylogenetic network N, the
rooted triplet xy|z is said to be consistent with N
if N contains three leaves labeled by x, y, and z
as well as two internal vertices u and w such that
there are four directed paths of nonzero length
from w to x, from w to y, from u to w, and
from u to z that are vertex-disjoint except in
the vertices u and w. A set $\mathcal{T}$ of rooted triplets is
consistent with N if every rooted triplet in $\mathcal{T}$ is
consistent with N. See Fig. 1 for an example.

[Algorithms for Combining Rooted Triplets into a Galled
Phylogenetic Network, Fig. 1: A dense set $\mathcal{T}$ =
{ab|c, ab|d, cd|a, bc|d} of rooted triplets with leaf
set {a, b, c, d} and a galled phylogenetic
network that is consistent with $\mathcal{T}$. Note that this
solution is not unique.]

Denote the set of leaves in any phylogenetic
network N by $\Lambda(N)$, and for any set $\mathcal{T}$ of
rooted triplets, define $\Lambda(\mathcal{T}) = \bigcup_{t \in \mathcal{T}} \Lambda(t)$.
A set $\mathcal{T}$ of rooted triplets is dense if for each
$\{x, y, z\} \subseteq \Lambda(\mathcal{T})$, at least one of the three
possible rooted triplets xy|z, xz|y, and yz|x
belongs to $\mathcal{T}$. Observe that if $\mathcal{T}$ is dense, then
$|\mathcal{T}| = O(|\Lambda(\mathcal{T})|^3)$. Jansson and Sung introduced
the following problem in [10].
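
The density condition is easy to test. In the sketch below a rooted triplet xy|z is encoded as the tuple (x, y, z); this encoding and the function names are assumptions made for the example.

```python
from itertools import combinations


def leaf_set(triplets):
    """Union of the leaf sets of all triplets."""
    return {leaf for t in triplets for leaf in t}


def is_dense(triplets):
    """triplets: list of (x, y, z) tuples, each standing for xy|z.

    Dense means every 3-subset of the leaf set is covered by some triplet.
    """
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered
               for c in combinations(leaf_set(triplets), 3))


# The set from Fig. 1: {ab|c, ab|d, cd|a, bc|d}.
fig1 = [("a", "b", "c"), ("a", "b", "d"), ("c", "d", "a"), ("b", "c", "d")]
```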


Problem 1 Given a set $\mathcal{T}$ of rooted triplets, output
a galled network N with $\Lambda(N) = \Lambda(\mathcal{T})$ such
that N and $\mathcal{T}$ are consistent, if such a network
exists; otherwise, output null.


A natural optimization version of Problem 1
is:


Problem 2 Given a set $\mathcal{T}$ of rooted triplets, output
a galled network N with $\Lambda(N) = \Lambda(\mathcal{T})$ that
is consistent with the maximum possible number
of rooted triplets belonging to $\mathcal{T}$.


A generalization of Problem 1 studied by He 
et al. in [8] involves forbidden rooted triplets and 
is defined as follows. 




Problem 3 Given two sets $\mathcal{T}$ and $\mathcal{F}$ of rooted
triplets, output a galled network N with $\Lambda(N) =
\Lambda(\mathcal{T}) \cup \Lambda(\mathcal{F})$ such that (1) N and $\mathcal{T}$ are consistent
and (2) N is not consistent with any rooted
triplet belonging to $\mathcal{F}$; if no such network exists,
output null.


Below, we write $L = \Lambda(\mathcal{T})$ and n = |L|.


Key Results 


As shown in [11], Problem 1 can be solved in
(optimal) $O(|\mathcal{T}|) = O(n^3)$ time for dense inputs:


Theorem 1 ([11]) Given any dense set $\mathcal{T}$
of rooted triplets with leaf set L, a galled
network consistent with $\mathcal{T}$ (if one exists) can
be constructed in $O(n^3)$ time, where n = |L|.


The algorithm referred to in Theorem 1 was
extended by van Iersel and Kelk [14] as follows.


Theorem 2 ([14]) Given any dense set $\mathcal{T}$ of
rooted triplets with leaf set L, a galled network
consistent with $\mathcal{T}$ (if one exists) that contains
as few reticulation nodes as possible can be
constructed in $O(n^5)$ time, where n = |L|.


For the more general case of nondense inputs,
Problem 1 becomes harder:


Theorem 3 ([11]) The problem of determining if
there exists a galled network that is consistent
with an input nondense set $\mathcal{T}$ of rooted triplets
is NP-hard.


Since not all sets of rooted triplets are consistent
with a galled network, it is of interest to
consider Problem 2. It follows from Theorem 3
that Problem 2 is also NP-hard for nondense
inputs, and this motivates polynomial-time
approximation algorithms. Say that an algorithm
for Problem 2 is an f-approximation algorithm
if it always returns a galled network N such
that $N(\mathcal{T})/|\mathcal{T}| \ge f$, where $N(\mathcal{T})$ is the number of
rooted triplets in $\mathcal{T}$ that are consistent with N.
Define the nonlinear recurrence relation

$S(n) = \max_{1 \le a \le n} \left\{ \binom{a}{3} + 2 \binom{a}{2} (n - a) + a \binom{n-a}{2} + S(n - a) \right\}$

for n > 0 and S(0) = 0. It was shown in [4]
that $\lim_{n \to \infty} S(n) / (3\binom{n}{3}) = 2(\sqrt{3} - 1)/3 = 0.488033\ldots$
and that $S(n) / (3\binom{n}{3}) > 2(\sqrt{3} - 1)/3 \approx 0.488033\ldots$ for
all n > 2. The following theorem was proved by
Byrka et al. in [2].


Theorem 4 ([2]) There exists an
$S(n) / (3\binom{n}{3})$-approximation algorithm for
Problem 2 that runs in
$O(n^3 + n|\mathcal{T}|)$ time.


A matching negative bound is:


Theorem 5 ([11]) For any $f > \lim_{n \to \infty} S(n) / (3\binom{n}{3})$,
there exists a set $\mathcal{T}$ of rooted triplets such that
no galled network can be consistent with at least
a factor of f of the rooted triplets in $\mathcal{T}$. (Thus,
no f-approximation algorithm for Problem 2 is
possible.)
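
The recurrence S(n) used in Theorems 4 and 5 can be evaluated by simple dynamic programming, which makes the stated limit easy to inspect numerically. The sketch below is an illustration; the function names and the table size are assumptions.

```python
from math import comb, sqrt


def s_table(n_max):
    """S[0..n_max] for S(n) = max over 1 <= a <= n of
    C(a,3) + 2*C(a,2)*(n-a) + a*C(n-a,2) + S(n-a), with S(0) = 0."""
    S = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        S[n] = max(
            comb(a, 3) + 2 * comb(a, 2) * (n - a) + a * comb(n - a, 2) + S[n - a]
            for a in range(1, n + 1)
        )
    return S


def ratio(S, n):
    """The approximation factor S(n) / (3 * C(n,3)), defined for n >= 3."""
    return S[n] / (3 * comb(n, 3))


LIMIT = 2 * (sqrt(3) - 1) / 3     # 0.488033...
S = s_table(200)
```

For small n the ratio is noticeably above the limit (for example, S(3) = 2 gives 2/3), and it approaches the limit from above as n grows.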


For Problem 3, Theorem 3 immediately implies
NP-hardness by taking $\mathcal{F} = \emptyset$. The following
positive result is known for the optimization
version of Problem 3.


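Theorem 6 ([8]) There exists an $O(|L|^2 |\mathcal{T}| (|\mathcal{T}| + |\mathcal{F}|))$-time
algorithm for inferring a galled network
N that guarantees $|N(\mathcal{T})| - |N(\mathcal{F})| \ge
\frac{5}{12} \cdot (|\mathcal{T}| - |\mathcal{F}|)$, where $L = \Lambda(\mathcal{T}) \cup \Lambda(\mathcal{F})$.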


Finally, we remark that the analogous version
of Problem 1 of inferring a phylogenetic tree consistent
with all the rooted triplets in an input set
(when such a tree exists) can be solved in polynomial
time with a classical algorithm by Aho
et al. [1] from 1981. Similarly, for Problem 2, to
infer a phylogenetic tree consistent with as many
rooted triplets from an input set of rooted triplets
as possible is NP-hard and admits a polynomial-time
1/3-approximation algorithm, which is optimal
in the sense that there exist certain inputs
for which no tree can achieve a factor larger than
1/3. See, e.g., [3] for a survey of known results
about maximizing rooted triplet consistency for
trees. On the other hand, more complex network
structures such as the level-k phylogenetic networks
[5] permit a higher percentage of the input
rooted triplets to be embedded; in the extreme
case, if there are no restrictions on the reticulation
nodes at all, then a sorting network-based
construction yields a phylogenetic network that
is trivially consistent with every rooted triplet
over L [10]. A number of efficient algorithms
for combining rooted triplets into higher level
networks have been developed; see, e.g., [2, 7, 14]
for further details and references.
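
The classical tree algorithm of Aho et al. mentioned above is commonly presented as the following recursive procedure: group leaves x and y whenever some triplet xy|z applies to the current leaf set, fail if everything ends up in one group, and otherwise recurse on each group. The sketch below follows that description under stated assumptions: triplets are encoded as tuples (x, y, z) for xy|z, trees are nested tuples, and the returned tree may have nodes with more than two children.

```python
def build(leaves, triplets):
    """Return a leaf label, a tuple of subtrees, or None if no tree is
    consistent with all triplets restricted to `leaves`."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    relevant = [t for t in triplets if set(t) <= leaves]
    # Union-find: for xy|z, leaves x and y must lie below the same child of the root.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y, _ in relevant:
        parent[find(x)] = find(y)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None                      # the root cannot be split
    subtrees = []
    for comp in components.values():
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


def leaves_of(tree):
    return {tree} if not isinstance(tree, tuple) else set().union(*map(leaves_of, tree))


def consistent(tree, triplet):
    """True if xy|z holds in `tree`: lca(x, y) lies strictly below lca(x, z)."""
    x, y, z = triplet
    if not isinstance(tree, tuple):
        return False
    for child in tree:
        below = leaves_of(child)
        if {x, y, z} <= below:
            return consistent(child, triplet)
        if {x, y} <= below and z in leaves_of(tree):
            return True
    return False
```

Applied to the dense set of Fig. 1, `build` returns None, since all four leaves fall into one group; this is an input for which no tree exists although a galled network does.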


Applications 


Phylogenetic networks are used by scientists to 
describe evolutionary relationships that do not 
fit the traditional models in which evolution is 
assumed to be treelike. Evolutionary events such 
as horizontal gene transfer or hybrid speciation 
(often referred to as recombination events) which 
suggest convergence between objects cannot be 
represented in a single tree but can be modeled in 
a phylogenetic network as internal nodes having 
more than one parent (i.e., reticulation nodes). 
The phylogenetic network is a relatively new tool, 
and various fast and reliable methods for con- 
structing and comparing phylogenetic networks 
are currently being developed. 

Galled networks form an important class 
of phylogenetic networks. They have attracted 
special attention in the literature [5, 6, 15] due 
to their biological significance (see [6]) and 
their simple, almost treelike, structure. When 
the number of recombination events is limited 
and most of the recombination events have 
occurred recently, a galled network may suffice 
to accurately describe the evolutionary process 
under study [6]. The motivation behind the 
rooted triplet approach taken here is that a highly 
accurate tree for each cardinality-three subset of 
the leaf set can be obtained through maximum 
likelihood-based methods or Sibley-Ahlquist- 
style DNA-DNA hybridization experiments 
(see [13]). The algorithms mentioned above can 
be used as the merging step in a divide-and- 
conquer approach for constructing phylogenetic 
networks analogous to the quartet method 
paradigm for inferring unrooted phylogenetic 
trees [12] and other supertree methods. We 
consider dense input sets in particular because 
this case can be solved in polynomial time. 


Open Problems 


The approximation factor given in Theorem 4 
is expressed in terms of the number of rooted 
triplets in the input T, and Theorem 5 shows that 
it cannot be improved. However, if one measures 
the quality of the approximation in terms of a 
galled network N_OPT that is consistent with 
the maximum possible number of rooted triplets 
from T, Theorem 4 can be far from optimal. An 
open problem is to determine the polynomial- 
time approximability and inapproximability of 
Problem 2 when the approximation ratio is de- 
fined relative to N_OPT instead of relative to |T|. 

Another research direction is to develop fixed- 
parameter polynomial-time algorithms for Prob- 
lem 1. The level of the constructed network, the 
number of allowed reticulation nodes, or some 
measure of the density of the input set of rooted 
triplets might be suitable parameters. 


URLs to Code and Data Sets 


A Java implementation of the algorithm for 
Problem 1 referred to in Theorem 2 (coded by 
its authors [14]) is available at 
http://skelk.sdf-eu.org/marlon.html. See also 
http://skelk.sdf-eu.org/levlathan/ for a Java 
implementation of a polynomial-time heuristic 
described in [9] for Problem 2. 
Problem Definition 


Given a communications network or road net- 
work, one of the most natural algorithmic ques- 
tions is how to determine the shortest path from 
one point to another. The all pairs shortest path 
problem (APSP) is, given a directed graph G = 
(V, E, ℓ), to determine the distance and shortest 
path between every pair of vertices, where |V| = 
n, |E| = m, and ℓ : E → R is the edge length 
(or weight) function. The output is in the form 
of two n × n matrices: D(u, v) is the distance 
from u to v and S(u, v) = w if (u, w) is the 
first edge on a shortest path from u to v. The 
APSP problem is often contrasted with the point- 
to-point and single source (SSSP) shortest path 
problems. They ask for, respectively, the shortest 
path from a given source vertex to a given target 
vertex and all shortest paths from a given source 
vertex. 


Definition of Distance 

If ℓ assigns only non-negative edge lengths then 
the definition of distance is clear: D(u, v) is the 
length of the minimum length path from u to v, 
where the length of a path is the total length of 
its constituent edges. However, if ℓ can assign 
negative lengths then there are several sensible 
notions of distance that depend on how negative 
length cycles are handled. Suppose that a cycle 
C has negative length and that u, v ∈ V are 
such that C is reachable from u and v is reachable 
from C. Because C can be traversed an arbitrary 
number of times when traveling from u to v, there 
is no shortest path from u to v using a finite 
number of edges. It is sometimes assumed a priori 
that G has no negative length cycles; however, it 
is cleaner to define D(u, v) = −∞ if there is no 
finite shortest path. If D(u, v) is defined to be the 
length of the shortest simple path (no repetition of 
vertices) then the problem becomes NP-hard. (If 
all edges have length −1 then D(u, v) = −(n − 1) 
if and only if G contains a Hamiltonian path [7] 
from u to v.) One could also define distance to be 
the length of the shortest path without repetition 
of edges. 


Classic Algorithms 

The Bellman-Ford algorithm solves SSSP in 
O(mn) time, and under the assumption that edge 
lengths are non-negative, Dijkstra’s algorithm 
solves it in O(m + n log n) time. There is a 
well known O(mn)-time shortest path preserving 
transformation that replaces any length function 
with a non-negative length function. Using 
this transformation and n runs of Dijkstra’s 
algorithm gives an APSP algorithm running in 
O(mn + n^2 log n) = O(n^3) time. The Floyd- 
Warshall algorithm computes APSP in a more 
direct manner, in O(n^3) time. Refer to [4] for 
a description of these algorithms. It is known 
that APSP on complete graphs is asymptotically 
equivalent to (min, +) matrix multiplication 
[1], which can be computed by a non-uniform 
algorithm that performs O(n^2.5) numerical 
operations [6]. 
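
As an illustration of the output format described in the problem definition, the following is a minimal Python sketch of the Floyd-Warshall algorithm that fills the two matrices D and S. The function names, the dictionary-based edge input, and the use of None for "no path" are assumptions made for this example; the sketch also assumes that the graph has no negative length cycles.

```python
import math

INF = math.inf

def floyd_warshall(n, edges):
    """Return (D, S) for vertices 0..n-1.

    edges maps (u, v) to the edge length. Assumes no negative length cycles.
    D[u][v] is the distance; S[u][v] is the vertex following u on a shortest path.
    """
    D = [[0 if u == v else INF for v in range(n)] for u in range(n)]
    S = [[None] * n for _ in range(n)]
    for (u, v), length in edges.items():
        if length < D[u][v]:
            D[u][v] = length
            S[u][v] = v
    for k in range(n):              # allow k as an intermediate vertex
        for u in range(n):
            if D[u][k] == INF:
                continue
            for v in range(n):
                if D[u][k] + D[k][v] < D[u][v]:
                    D[u][v] = D[u][k] + D[k][v]
                    S[u][v] = S[u][k]   # first edge is inherited from u -> k
    return D, S

def shortest_path(S, u, v):
    """Follow first edges from u until v is reached; None if v is unreachable."""
    if u != v and S[u][v] is None:
        return None
    path = [u]
    while u != v:
        u = S[u][v]
        path.append(u)
    return path
```

Storing only the first edge keeps S at n × n entries while still allowing any shortest path to be recovered in time proportional to its number of edges.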


Integer-Weighted Graphs 

Much recent work on shortest paths assumes 
that edge lengths are integers in the range 
{-C,...,C} or {0,...,C}. One line of research 
reduces APSP to a series of standard matrix 
multiplications. These algorithms are limited in 
their applicability because their running times 
scale linearly with C. There are faster SSSP 
algorithms for both non-negative edge lengths 
and arbitrary edge lengths. The former exploit 
the power of RAMs to sort in o(n log n) time 
and the latter are based on the scaling technique. 
See Zwick [20] for a survey of shortest path 
algorithms up to 2001. 


Key Results 


Pettie’s APSP algorithm [12] adapts the hier- 
archy approach of Thorup [16] (designed for 
undirected, integer-weighted graphs) to general 
real-weighted directed graphs. Theorem 1 is the 
first improvement over the O(mn + n^2 log n) time 
bound of Dijkstra’s algorithm on arbitrary real- 
weighted graphs. 


Theorem 1 Given a real-weighted directed 
graph, all pairs shortest paths can be solved 
in O(mn + n^2 log log n) time. 


This algorithm achieves a logarithmic speedup 
through a trio of new techniques. The first is to 
exploit the necessary similarity between the SSSP 
trees emanating from nearby vertices. The second 
is a method for computing discrete approximate 
distances in real-weighted graphs. The third is 
a new hierarchy-type SSSP algorithm that runs 
in O(m + n log log n) time when given suitably 
accurate approximate distances. 
Theorem 1 should be contrasted with the time 
bounds of other hierarchy-type APSP algorithms 
[11, 14, 16]. 


Theorem 2 ([14], 2005) Given a real-weighted 
undirected graph, APSP can be solved in 
O(mn log α(m, n)) time. 


Theorem 3 ([16], 1999) Given an undirected 
graph G(V, E, ℓ), where ℓ assigns integer edge 
lengths in the range {−2^(w−1), ..., 2^(w−1) − 1}, 
APSP can be solved in O(mn) time on a RAM 
with w-bit word length. 


Theorem 4 ([13], 2002) Given a real-weighted 
directed graph, APSP can be solved in 
polynomial time by an algorithm that performs 
O(mn log α(m, n)) numerical operations, where 
α is the inverse-Ackermann function. 


A secondary result of [12, 14] is that no 
hierarchy-type shortest path algorithm can 
improve on the O(m + n log n) running time 
of Dijkstra’s algorithm. 


Theorem 5 Let G be an input graph such 
that the ratio of the maximum to minimum 
edge length is r. Any hierarchy-type SSSP 
algorithm performs Ω(m + min{n log n, n log r}) 
numerical operations if G is directed and 
Ω(m + min{n log n, n log log r}) if G is 
undirected. 


Applications 


Shortest paths appear as a subproblem in other 
graph optimization problems; the minimum 
weight perfect matching, minimum cost flow, 
and minimum mean-cycle problems are some 
examples. A well known commercial application 
of shortest path algorithms is finding efficient 
routes on road networks; see, for example, 
Google Maps, MapQuest, or Yahoo Maps. 


Open Problems 


The longest standing open shortest path problems 
are to improve the SSSP algorithms of Dijkstra 
and Bellman-Ford on real-weighted graphs. 

Problem 1 Is there an o(mn) time SSSP or point- 
to-point shortest path algorithm for arbitrarily 
weighted graphs? 


Problem 2 Is there an O(m) + o(n log n) time 
SSSP algorithm for directed, non-negatively 
weighted graphs? For undirected graphs? 


A partial answer to Problem 2 appears in 
[14], which considers undirected graphs. Perhaps 
the most surprising open problem is whether 
there is any (asymptotic) difference between the 
complexities of the all pairs, single source, and 
point-to-point shortest path problems on arbitrar- 
ily weighted graphs. 


Problem 3 Is point-to-point shortest paths eas- 
ier than all pairs shortest paths on arbitrarily 
weighted graphs? 


Problem 4 Is there a truly subcubic APSP al- 
gorithm, i.e., one running in time O(n^(3−ε))? In 
a recent breakthrough on this problem, Williams 
[19] gave a new APSP algorithm running in 
n^3 / 2^(Θ(√(log n / log log n))) time. Vassilevska Williams 
and Williams [17] proved that a truly subcubic 
algorithm for APSP would imply truly subcubic 
algorithms for other graph problems. 


Experimental Results 


See [5, 8, 15] for recent experiments on SSSP 
algorithms. On sparse graphs the best APSP al- 
gorithms use repeated application of an SSSP 
algorithm, possibly with some precomputation 
[15]. On dense graphs cache-efficiency becomes 
a major issue. See [18] for a cache conscious 
implementation of the Floyd-Warshall algorithm. 

The trend in recent years is to construct a lin- 
ear space data structure that can quickly answer 
exact or approximate point-to-point shortest path 
queries; see [2, 6, 9, 10]. 


Data Sets 


See [5] for a number of U.S. and European road 
networks. 
URL to Code 


See [5]. 
Problem Definition 


The all pairs shortest path (APSP) problem is 
to compute shortest paths between all pairs of 
vertices of a directed graph with nonnegative real 
numbers as edge costs. Focus is given to shortest 
distances between vertices, as shortest paths can 
be obtained with a slight increase of cost. Classi- 
cally, the APSP problem can be solved in cubic 
time, that is, O(n^3). The problem here is to achieve 
a sub-cubic time for a graph with small integer 
costs. 

A directed graph is given by G = (V, E), 
where V = {1, ..., n} is the set of vertices and E 
is the set of edges. The cost of edge (i, j) ∈ E is 
denoted by d_ij. The (n, n)-matrix D is one whose 
(i, j) element is d_ij. It is assumed for simplicity 
that d_ij > 0 for all i ≠ j and d_ii = 0. If 
there is no edge from i to j, let d_ij = ∞. The 
cost, or distance, of a path is the sum of costs of 
the edges in the path. The length of a path is the 
number of edges in the path. The shortest distance 
from vertex i to vertex j is the minimum cost 
over all paths from i to j, denoted by d*_ij. Let 
D* = {d*_ij}. The value of n is called the size of 
the matrices. 

Let A and B be (n, n)-matrices. The three 
products are defined using the elements of A and 
B as follows: (1) ordinary matrix product over a 
ring, C = AB; (2) Boolean matrix product, C = 
A · B; (3) distance matrix product, C = A × B, 
where 

  (1) c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, 
  (2) c_{ij} = \bigvee_{k=1}^{n} a_{ik} \wedge b_{kj}, 
  (3) c_{ij} = \min_{1 \le k \le n} \{ a_{ik} + b_{kj} \}. 

The matrix C is called a product in each case; the 
computational process is called multiplication, 
such as distance matrix multiplication. In those 
three cases, k changes through the entire set 
{1, ..., n}. A partial matrix product of A and 
B is defined by taking k in a subset I of V. 
In other words, a partial product is obtained 
by multiplying a vertically rectangular matrix, 
A(*, I), whose columns are extracted from A 
corresponding to the set I, and similarly a hor- 
izontally rectangular matrix, B(I, *), extracted 
from B with rows corresponding to I. Intuitively, 
I is the set of check points k, when going from i 
to j. 
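
The distance matrix product and its partial variant can be sketched directly from the definition. The following Python fragment is only an illustration: the function name distance_product and the optional index list I are assumptions made here, vertices are numbered from 0, and the product is evaluated naively rather than by a fast algorithm.

```python
import math

INF = math.inf

def distance_product(A, B, I=None):
    """Naive (min, +) product of A and B.

    If I is given, k ranges only over I, which yields the partial product
    of the rectangular matrices A(*, I) and B(I, *).
    """
    rows, cols = len(A), len(B[0])
    ks = range(len(B)) if I is None else list(I)
    return [
        [min((A[i][k] + B[k][j] for k in ks), default=INF) for j in range(cols)]
        for i in range(rows)
    ]
```

With I equal to the whole vertex set the partial product coincides with the full product; a smaller I restricts the intermediate "check points" and reduces the work to O(n^2 |I|) in this naive form.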

The best algorithm [11] computes (1) in 
O(n^ω) time, where ω = 2.373. This was recently 
achieved as an improvement from ω = 2.376 in [4] 
after an interval of more than two decades. We 
use ω = 2.376 to describe Zwick’s result in 
this article. Three decimal places are carried 
throughout this article. To compute (2), the Boolean 
values 0 and 1 in A and B can be regarded 
as integers; the algorithm for (1) is then applied, and 
nonzero elements in the resulting matrix are converted 
to 1. Therefore, this complexity is O(n^ω). The 
witnesses of (2) are given in the witness matrix 
W = {w_ij}, where w_ij = k for some k such that 
a_ik ∧ b_kj = 1. If there is no such k, w_ij = 0. 
The witness matrix W = {w_ij} for (3) is defined 
by w_ij = k that gives the minimum to c_ij. If 
there is an algorithm for (3) with T(n) time, 
then, ignoring a polylog factor of n, the APSP problem 
can be solved in O(T(n)) time by the repeated 
squaring method, described as the repeated use 
of D ← D × D, O(log n) times. 

The definition here of computing shortest 
paths is to give a witness matrix of size n by 
which a shortest path from i to j can be given 
in O(ℓ) time, where ℓ is the length of the path. 
More specifically, if w_ij = k in the witness 
matrix W = {w_ij}, it means that the path from 
i to j goes through k. Therefore, a recursive 
function path(i, j) is defined by (path(i, k), k, 
path(k, j)) if w_ij = k > 0 and nil if 
w_ij = 0, where a path is defined by a list 
of vertices excluding endpoints. In the following 
sections, k is recorded in w_ij whenever k is found 
such that a path from i to j is modified or newly 
set up by paths from i to k and from k to j. 
Preceding results are introduced as a framework 
for the key results. 
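
The repeated squaring method and the recursive function path(i, j) can be sketched as follows. This is an illustrative Python fragment under assumptions made here: vertices are numbered 0, ..., n − 1, so None (rather than 0) marks "no witness"; the distance product is computed naively in O(n^3) time per squaring; and edge costs are positive as assumed above.

```python
import math

INF = math.inf

def square_with_witnesses(D, W):
    """One step D <- D x D of the distance product, recording witnesses."""
    n = len(D)
    new_D = [row[:] for row in D]
    new_W = [row[:] for row in W]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if k == i or k == j:
                    continue
                cost = D[i][k] + D[k][j]
                if cost < new_D[i][j]:      # path i -> k -> j is newly set up
                    new_D[i][j] = cost
                    new_W[i][j] = k
    return new_D, new_W

def apsp_repeated_squaring(D):
    """Square O(log n) times; D holds edge costs, 0 on the diagonal, INF if no edge."""
    n = len(D)
    W = [[None] * n for _ in range(n)]
    length = 1                      # path lengths (edge counts) covered so far
    while length < n - 1:
        D, W = square_with_witnesses(D, W)
        length *= 2
    return D, W

def path(W, i, j):
    """Intermediate vertices of a shortest path from i to j, endpoints excluded."""
    k = W[i][j]
    if k is None:
        return []
    return path(W, i, k) + [k] + path(W, k, j)
```

A witness is overwritten only when the distance strictly improves, so the sub-paths referenced by the final witnesses were fixed in earlier rounds and the recursion terminates.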
Alon-Galil-Margalit Algorithm 

The algorithm by Alon, Galil, and Margalit [1] 
is reviewed. Let the costs of edges of the given 
graph be ones. Let D^(ℓ) be the ℓ-th approximate 
matrix for D*, defined by d^(ℓ)_ij = d*_ij if d*_ij ≤ ℓ 
and d^(ℓ)_ij = ∞ otherwise. Let A be the adjacency 
matrix of G, that is, a_ij = 1 if there is an edge 
(i, j), and a_ij = 0 otherwise. Let a_ii = 1 for 
all i. The algorithm consists of two phases. In the 
first phase, D^(ℓ) is computed for ℓ = 1, ..., r 
by checking the (i, j) element of A^ℓ = {a^ℓ_ij}. 
Note that if a^ℓ_ij = 1, there is a path from i 
to j of length ℓ or less. Since Boolean matrix 
multiplication can be computed in O(n^ω) time, 
the computing time of this part is O(r n^ω). 

In the second phase, the algorithm computes 
D^(ℓ) for ℓ = r, ⌈(3/2)r⌉, ⌈(3/2)⌈(3/2)r⌉⌉, ..., n' by 
repeated squaring, where n' is the smallest integer 
in this sequence of ℓ such that ℓ ≥ n. Let 
T_iα = {j | d^(ℓ)_ij = α} and I_i = T_iα such 
that |T_iα| is minimum for ⌈ℓ/2⌉ ≤ α ≤ ℓ. 
The key observation in the second phase is that 
it is only needed to check k in I_i, whose size 
is not larger than 2n/ℓ, since the correct dis- 
tances between ℓ + 1 and ⌈3ℓ/2⌉ can be obtained 
as the sum d^(ℓ)_ik + d^(ℓ)_kj for some k satisfying 
⌈ℓ/2⌉ ≤ d^(ℓ)_ik ≤ ℓ. The meaning of I_i is 
similar to I for partial products except that I 
varies for each i. Hence, the computing time of 
one squaring is O(n^3/ℓ). Thus, the time of the 
second phase is given, with N = ⌈log_{3/2}(n/r)⌉, 
by O(Σ_{s=1}^{N} n^3/((3/2)^s r)) = O(n^3/r). Bal- 
ancing the two phases with r n^ω = n^3/r yields 
O(n^((ω+3)/2)) = O(n^2.688) time for the algorithm 
with r = O(n^((3−ω)/2)). 

Witnesses can be kept in the first phase in 
time polylog of n by the method in [2]. The 
maintenance of witnesses in the second phase is 
straightforward. 

When a directed graph G whose edge costs are 
integers between 1 and M is given, where M is a 
positive integer, the graph G can be expanded to 
G' by creating up to M − 1 new vertices for each 
vertex and replacing each edge by up to M edges 
with unit cost. Obviously, the problem for G can 
be solved by applying the above algorithm to G', 
which takes O((Mn)^((ω+3)/2)) time. This time is 
sub-cubic when M < n^0.116. The maintenance 
of witnesses has an extra polylog factor in each 
case. 

For undirected graphs with unit edge costs, 
O(n^ω) time is known in Seidel [9]. 


Takaoka Algorithm 

When the edge costs are bounded by a positive in- 
teger M, a better algorithm can be designed than 
in the above as shown in Takaoka [10]. Romani’s 
algorithm [7] for distance matrix multiplication is 
reviewed briefly. 

Let A and B be (n, m) and (m, n) distance 
matrices whose elements are bounded by M or 
infinite. Let the diagonal elements be 0. A and 
B are converted into A' and B', where a'_ij = 
(m + 1)^(M − a_ij) if a_ij ≠ ∞ and 0 otherwise, and 
b'_ij = (m + 1)^(M − b_ij) if b_ij ≠ ∞ and 0 otherwise. 

Let C' = A'B' be the product by ordinary 
matrix multiplication and C = A × B be that 
by distance matrix multiplication. Then, it holds 
that 

  c'_{ij} = \sum_{k=1}^{m} (m+1)^{2M - (a_{ik} + b_{kj})}, 
  c_{ij} = 2M - \lfloor \log_{m+1} c'_{ij} \rfloor. 

This distance matrix multiplication is called 
(n, m)-Romani. In this section, the above 
multiplication is used with square matrices, that 
is, (n, n)-Romani is used. In the next section, the 
case where m < n is dealt with. 
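
The conversion used in (n, m)-Romani can be checked on small inputs with exact integer arithmetic. The Python sketch below is an illustration only: the function name is an assumption, the ordinary product C' is computed naively instead of by fast matrix multiplication, and finite entries are assumed to be integers in {0, ..., M}.

```python
import math

INF = math.inf

def romani_distance_product(A, B, M):
    """Distance product of an n x m matrix A and an m x n matrix B.

    Finite entries are integers in 0..M; INF marks a missing entry.
    Uses Python's arbitrary-precision integers for the powers of (m + 1).
    """
    n, m = len(A), len(B)
    base = m + 1
    A1 = [[base ** (M - a) if a != INF else 0 for a in row] for row in A]
    B1 = [[base ** (M - b) if b != INF else 0 for b in row] for row in B]
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry of the ordinary product C' = A'B'.
            c1 = sum(A1[i][k] * B1[k][j] for k in range(m))
            if c1 > 0:
                # floor(log_base c1), computed exactly by repeated division.
                exponent = 0
                while c1 >= base:
                    c1 //= base
                    exponent += 1
                C[i][j] = 2 * M - exponent
    return C
```

The floor of the logarithm isolates the smallest sum a_ik + b_kj because at most m < m + 1 terms share the leading power, so they cannot carry into the next power of m + 1.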

C can be computed with O(n^ω) arithmetic 
operations on integers up to (n + 1)^M. Since 
these values can be expressed by O(M log n) 
bits and Schönhage and Strassen’s algorithm 
[8] for multiplying k-bit numbers takes 
O(k log k log log k) bit operations, C can be 
computed in O(n^ω M log n log(M log n) log log 
(M log n)) time, or O(M n^ω) time if polylog 
factors are ignored. 

The first phase is replaced by the one based on 
(n,n)-Romani and the second phase is modified 
based on path lengths, not distances. 

Note that the bound M is replaced by ℓM 
in the distance matrix multiplication in the first 
phase. Ignoring polylog factors, the time for the 
first phase is given by O(n^ω r^2 M). It is as- 
sumed that M is O(n^k) for some constant k. 
Balancing this complexity with that of the second 
phase, O(n^3/r), yields the total computing time 
of O(n^((6+ω)/3) M^(1/3)) with the choice of r = 
O(n^((3−ω)/3) M^(−1/3)). The value of M can be al- 
most O(n^0.624) to keep the complexity within 
sub-cubic. 
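
The balancing step can be written out explicitly. The following derivation is a worked restatement of the bounds quoted above, with polylog factors ignored and ω = 2.376 as used in this article:

```latex
\begin{align*}
n^{\omega} r^{2} M = \frac{n^{3}}{r}
  &\;\Longrightarrow\; r^{3} = n^{3-\omega} M^{-1}
  \;\Longrightarrow\; r = n^{(3-\omega)/3} M^{-1/3}, \\
\frac{n^{3}}{r}
  &= n^{3-(3-\omega)/3} M^{1/3} = n^{(6+\omega)/3} M^{1/3}, \\
n^{(6+\omega)/3} M^{1/3} < n^{3}
  &\;\Longleftrightarrow\; M < n^{3-\omega} = n^{0.624}.
\end{align*}
```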


Key Results 


Zwick improved the Alon-Galil-Margalit algo- 
rithm in several ways. The most notable is an im- 
provement of the time for the APSP problem with 
unit edge costs from O(n^2.688) to O(n^2.575). The 
main accelerating engine in Alon-Galil-Margalit 
[1] was the fast Boolean matrix multiplication 
and that in Takaoka [10] was the fast distance ma- 
trix multiplication by Romani, both powered by 
the fast matrix multiplication of square matrices. 

In this section, the engine is the fast distance 
matrix multiplication by Romani powered by the 
fast matrix multiplication of rectangular matrices 
given by Coppersmith [3] and Huang and Pan 
[5]. Suppose the product of an (n, m) matrix and 
an (m, n) matrix can be computed with O(n^(ω(1,μ,1))) 
arithmetic operations, where m = n^μ with 0 ≤ 
μ ≤ 1. Several facts such as O(n^(ω(1,1,1))) = 
O(n^2.376) and O(n^(ω(1,0.294,1))) = O(n^2) are 
known. To compute the product of (n, n) square 
matrices, n^(1−μ) matrix multiplications are needed, 
resulting in O(n^(ω(1,μ,1)+1−μ)) time, which is re- 
formulated as O(n^(2+μ)), where μ satisfies the 
equation ω(1, μ, 1) = 2μ + 1. Also, the upper 
bound of ω(1, μ, 1) is given by 

  \omega(1,\mu,1) = 
  \begin{cases} 
    2, & 0 \le \mu \le \alpha, \\ 
    2 + (\omega - 2)(\mu - \alpha)/(1 - \alpha), & \alpha \le \mu \le 1. 
  \end{cases} 

The best known value for μ, when [12] was 
published, was μ = 0.575, derived from the 
above formulae, α > 0.294 and ω < 2.376. 
So, the time becomes O(n^2.575), which is not as 
good as O(n^2.376). Thus, we use the algorithm 
for rectangular matrices in the following. 
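
The value μ = 0.575 can be reproduced numerically from the piecewise bound. The short Python sketch below is an illustration with hypothetical function names; it solves the linear branch of the bound against ω(1, μ, 1) = 1 + 2μ − t, where t = 0 gives the equation ω(1, μ, 1) = 2μ + 1 above and t > 0 corresponds to the bounded-cost case M = n^t treated later in this section. The constants α = 0.294 and ω = 2.376 are the ones quoted in the text.

```python
def omega_rect(mu, alpha=0.294, omega=2.376):
    """Upper bound on omega(1, mu, 1) from the piecewise formula."""
    if mu <= alpha:
        return 2.0
    return 2.0 + (omega - 2.0) * (mu - alpha) / (1.0 - alpha)

def solve_mu(t=0.0, alpha=0.294, omega=2.376):
    """Solve omega(1, mu, 1) = 1 + 2*mu - t on the branch alpha <= mu <= 1.

    With slope s = (omega - 2) / (1 - alpha) the equation
    2 + s * (mu - alpha) = 1 + 2 * mu - t is linear in mu.
    """
    s = (omega - 2.0) / (1.0 - alpha)
    return (1.0 + t - s * alpha) / (2.0 - s)

mu_unit = solve_mu()               # unit edge costs
exponent_unit = 2.0 + mu_unit      # running time exponent, about 2.575
```

The constant branch ω(1, μ, 1) = 2 never applies here, since it would force μ = (1 + t)/2 ≥ 0.5, which exceeds α.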

The above algorithm for rectangular matrix 
multiplication is incorporated into (n, m)- 
Romani with m = n^μ and M = n^t, with 
the computing time of O(M n^(ω(1,μ,1))). The 
next step is how to incorporate (n, m)-Romani 
into the APSP algorithm. The first algorithm 
is a monophase algorithm based on repeated 
squaring, similar to the second phase of the 
algorithm in [1]. To take advantage of rectangular 
matrices in (n, m)-Romani, the following 
definition of the bridging set is needed, which 
plays the role of the set I in the partial distance 
matrix product in section “Problem Definition.” 

Let δ(i, j) be the shortest distance from i to j, 
and η(i, j) be the minimum length of all shortest 
paths from i to j. A subset I of V is an ℓ- 
bridging set if it satisfies the condition that if 
η(i, j) ≥ ℓ, there exists k ∈ I such that δ(i, j) = 
δ(i, k) + δ(k, j). I is a strong ℓ-bridging set if it 
satisfies the condition that if η(i, j) ≥ ℓ, there 
exists k ∈ I such that δ(i, j) = δ(i, k) + δ(k, j) 
and η(i, j) = η(i, k) + η(k, j). Note that those 
two sets are the same for a graph with unit edge 
costs. 

Note that if (2/3)ℓ ≤ η(i, j) ≤ ℓ and I is 
a strong ℓ/3-bridging set, there is a k ∈ I such 
that δ(i, j) = δ(i, k) + δ(k, j) and η(i, j) = 
η(i, k) + η(k, j). With this property of strong 
bridging sets, (n, m)-Romani can be used for the 
APSP problem in the following way. By repeated 
squaring in a similar way to Alon-Galil-Margalit, 
the algorithm computes D^(ℓ) for ℓ = 1, ⌈3/2⌉, 
⌈(3/2)⌈3/2⌉⌉, ..., n', where n' is the first value of 
ℓ that exceeds n, using various types of set I 
described below. To compute the bridging set, the 
algorithm maintains the witness matrix with an extra 
polylog factor in the complexity. In [12], there are 
three ways for selecting the set I. Let |I| = n^r 
for some r such that 0 ≤ r ≤ 1. 


1. Select 9n ln n/ℓ vertices for I from V at 
random. In this case, it can be shown that 
the algorithm solves the APSP problem with 
high probability, i.e., with probability 1 − 1/n^c 
for some constant c > 0, which can be shown to 
be 3. In other words, I is a strong ℓ/3- 
bridging set with high probability. The time T 
is dominated by (n, m)-Romani. It holds that 
T = O(ℓ M n^(ω(1,r,1))), since the magnitude 
of matrix elements can be up to ℓM. Since 
m = O(n ln n/ℓ) = n^r, it holds that ℓ = 
O(n^(1−r)), ignoring polylog factors, and thus 
T = O(M n^(1−r) n^(ω(1,r,1))). 
When M = 1, this bound on r is μ = 
0.575, and thus T = O(n^2.575). When M = 
n^t ≥ 1, the time becomes O(n^(2+μ(t))), where 
t ≤ 3 − ω = 0.624 and μ = μ(t) satisfies 
ω(1, μ, 1) = 1 + 2μ − t. It is determined from 
the best known ω(1, μ, 1) and the value of t. 
As the result is correct with high probability, 
this is a randomized algorithm. 


2. Consider the case of unit edge costs here. 
In (1), the computation of witnesses is an 
extra thing, i.e., not necessary if only shortest 
distances are needed. To achieve the same 
complexity in the sense of an exact algorithm, 
not a randomized one, the computation of 
witnesses is essential. As mentioned earlier, 
maintenance of witnesses, that is, matrix W, 
can be done with an extra polylog factor, 
meaning the analysis can be focused on Ro- 
mani within the O-notation. Specifically, I 
is selected as an ℓ/3-bridging set, which is 
strong with unit edge costs. To compute I 
as an O(ℓ)-bridging set, obtain the vertices 
on the shortest path from i to j for each i 
and j using the witness matrix W in O(ℓ) 
time. After obtaining those n^2 sets spending 
O(ℓ n^2) time, it is shown in [12] how to obtain 
an O(ℓ)-bridging set of O(n ln n/ℓ) size within 
the same time complexity. The process of 
obtaining the bridging set must stop at ℓ = 
n^(1/2) as the process is too expensive beyond 
this point, and thus, the same bridging set is 
used beyond this point. The time before this 
point is the same as that in (1) and that after 
this point is O(n^2.5). Thus, this is a two-phase 
algorithm. 


3. When edge costs are positive and bounded 
by M = n^t > 0, a similar procedure can 
be used to compute an O(ℓ)-bridging set of 
O(n ln n/ℓ) size in O(ℓ n^2) time. Using the 
bridging set, the APSP problem can be solved 
in O(n^(2+μ(t))) time in a similar way to (1). 
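
The overall shape of the randomized selection in (1) can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of [12]: the rectangular product is evaluated naively instead of by (n, m)-Romani, witnesses are not maintained, and the function name, the seed parameter, and the 0-based vertex numbering are assumptions made here.

```python
import math
import random

INF = math.inf

def apsp_random_bridging(D, seed=0):
    """Repeated squaring restricted to random sets I of about 9 n ln n / ell vertices.

    D holds edge costs (0 on the diagonal, INF if no edge). The result is
    correct with high probability; when the sample size reaches n the step
    is an ordinary, deterministic squaring.
    """
    rng = random.Random(seed)
    n = len(D)
    D = [row[:] for row in D]
    ell = 1.0
    while ell < n:
        ell *= 1.5
        size = min(n, math.ceil(9 * n * math.log(n) / ell))
        I = rng.sample(range(n), size)          # candidate bridging set
        new_D = [row[:] for row in D]
        for i in range(n):
            for j in range(n):
                for k in I:                     # only check points in I
                    cost = D[i][k] + D[k][j]
                    if cost < new_D[i][j]:
                        new_D[i][j] = cost
        D = new_D
    return D
```

For small n the bound 9 n ln n/ℓ exceeds n throughout, so every vertex is sampled; the saving from a small set I only appears for large n and large ℓ.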




The result can be generalized into the case 
where edge costs are between —M and M 
within the same time complexity by modifying 
the procedure for computing an ℓ-bridging set, 
provided there is no negative cycle. The details 
are shown in [12]. 


Applications 


The eccentricity of a vertex v of a graph is the 
greatest distance from v to any other vertex. The 
diameter of a graph is the greatest eccentricity of 
any vertex. In other words, the diameter is the 
greatest distance between any pair of vertices. If 
the corresponding APSP problem is solved, the 
maximum element of the resulting matrix is the 
diameter. 
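
As a small illustration, assuming the APSP result is available as a matrix D of shortest distances (the function names are hypothetical), eccentricities and the diameter can be read off as follows:

```python
def eccentricities(D):
    """Greatest distance from each vertex to any other vertex."""
    n = len(D)
    return [max((D[v][u] for u in range(n) if u != v), default=0) for v in range(n)]

def diameter(D):
    """Greatest eccentricity, i.e., the maximum entry of the distance matrix."""
    return max(eccentricities(D))
```

If some vertex cannot reach another, the corresponding entries of D are infinite and so is the diameter.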


Open Problems 


Recently, Le Gall [6] discovered an algorithm 
for multiplying rectangular matrices with 
ω(1, 0.530, 1) < 2.060, which gives the upper 
bound μ < 0.530. This improves the complexity 
of APSP with unit edge costs from O(n^2.575) 
by Zwick to O(n^2.530) in the same framework 
as that of Zwick in this article. Two major 
challenges are stated here among others. The 
first is to improve the complexity of O(n^2.530) 
for the APSP with unit edge costs. 

The other is to improve the bound of M < 
O(n^0.624) for the complexity of the APSP with 
integer costs up to M to be sub-cubic. 
Problem Definition 


All-distances sketches (The term least element 
lists was used in [3]; the terms MV/D lists and 
Neighborhood summaries were used in [6].) are 
randomized summary structures of the distance 
relations of nodes in a graph. The graph can 
be directed or undirected, and edges can have 
uniform or general nonnegative weights. 


Preprocessing Cost 

A set of sketches, ADS(v) for each node v, 
can be computed efficiently, using a near-linear 
number of edge traversals. The sketch sizes are 
well concentrated, with logarithmic dependence 
on the total number of nodes. 


Supported Queries 
The sketches support approximate distance-based 
queries, which include: 


• Distance distribution: The query specifies a 
node v and value d ≥ 0 and returns the 
cardinality |N_d(v)| of the d-neighborhood of 
v, N_d(v) = {u | d_vu ≤ d}, where d_vu 
is the shortest path distance from v to u. 
We are interested in estimating |N_d(v)| from 
ADS(v). 

A related property is the effective diameter 
of the graph, which is a quantile of the dis- 
tance distribution of all node pairs; we are in- 
terested in computing an estimate efficiently. 

• Closeness centrality (distance-decaying) 
is defined for a node v, a monotone 
nonincreasing function α(x) ≥ 0 (where 
α(+∞) = 0), and a nonnegative function 
β(u) ≥ 0: 

  C_{\alpha,\beta}(v) = \sum_{u} \alpha(d_{vu}) \beta(u).   (1) 

The function α specifies the decay of rele- 
vance with distance and the function β weighs 
nodes based on metadata to focus on a topic or 
property of relevance. Neighborhood cardinal- 
ity is a special case obtained using β ≡ 1 and 
α(x) = 1 if x ≤ d and α(x) = 0 otherwise. 
The number of reachable nodes from v is 
obtained using α(x) = 1. Also studied were 
exponential decay α(x) = 2^(−x) with distance 
[10], the (inverse) harmonic mean of distances 
α(x) = 1/x [1, 14], and general decay func- 
tions [4, 6]. We would like to estimate C_{α,β}(v) 
from ADS(v). 

• Closeness similarity [8] relates a pair of nodes 
based on the similarity of their distance rela- 
tions to all other nodes: 

  \mathrm{SIM}_{\alpha,\beta}(v, u) = 
    \frac{\sum_{j} \alpha(\max\{d_{uj}, d_{vj}\}) \beta(j)} 
         {\sum_{j} \alpha(\min\{d_{uj}, d_{vj}\}) \beta(j)}.   (2) 

We would like to estimate SIM_{α,β}(v, u) ∈ 
[0, 1] from ADS(v) and ADS(u). 

• Timed influence of a seed set S of nodes 
depends on the set of distances from S to other 
nodes. Intuitively, when edge lengths model 
transition times, the distance is the “elapsed 
time” needed to reach the node from S. Influ- 
ence is higher when distances are shorter: 

  \mathrm{INF}_{\alpha,\beta}(S) = \sum_{j} \alpha(\min_{i \in S} d_{ij}) \beta(j).   (3) 

We would like to estimate INF_{α,β}(S) from the 
sketches {ADS(v) | v ∈ S}. 

• Approximate distance oracles: For two nodes 
v, u, use ADS(u) and ADS(v) to estimate d_uv. 
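
To make the query definitions concrete, the following Python sketch evaluates the exact closeness centrality of Eq. (1) from a full table of distances; this exact value is the quantity that the sketches are meant to estimate. The function name and the dictionary input (node to d_vu, with infinity for unreachable nodes) are assumptions made for this example.

```python
import math

def closeness_centrality(dist_from_v, alpha, beta=lambda u: 1.0):
    """Exact C_{alpha,beta}(v): sum over u of alpha(d_vu) * beta(u)."""
    return sum(alpha(d) * beta(u) for u, d in dist_from_v.items())

dist = {"v": 0, "a": 1, "b": 3, "c": math.inf}

# Neighborhood cardinality |N_2(v)|: alpha is a threshold at d = 2, beta = 1.
n2 = closeness_centrality(dist, lambda x: 1.0 if x <= 2 else 0.0)

# Number of reachable nodes: alpha = 1 on finite distances, alpha(+inf) = 0.
reach = closeness_centrality(dist, lambda x: 1.0 if x < math.inf else 0.0)

# Exponential decay alpha(x) = 2^(-x).
decay = closeness_centrality(dist, lambda x: 2.0 ** (-x))
```

Computing these values exactly for every node requires all-pairs distances, which is what motivates estimating them from the much smaller ADS(v).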


Key Results 


We provide a precise definition of ADSs, 
overview algorithms for scalable computation, 
and finally discuss estimators. 


Definition 
The ADS of a node v, ADS(v), is a set of node ID 
and distance pairs (u, d_vu). The included nodes 
are a sample of the nodes reachable from v. 
The sampling is such that the inclusion proba- 
bility of a node is inversely proportional to its 
Dijkstra rank (nearest neighbor rank). That is, the 
probability that the i-th closest node is sampled is 
proportional to 1/i. 

The ADSs are defined with respect to random 
mappings/permutations r of the set of all nodes 
and come in three flavors: bottom-k, k-mins, and 
k-partition. The integer parameter k determines 
a tradeoff between the sketch size and computa- 
tion time on one hand and the information and 
estimate quality on the other. For simplicity, we 
assume here that distances d_vu are unique for dif- 
ferent u (using tie breaking). We use the notation 
Φ_{<u}(v) for the set of nodes that are closer to 
node v than node u and π_vu = 1 + |Φ_{<u}(v)| 
for the Dijkstra rank of u with respect to v (u is 
the π_vu-th closest node from v). For a set N and 
a numeric function r on N, the function k_r^th(N) 
returns the kth smallest value in the range of r on 
N. If |N| < k, then we define k_r^th(N) to be the 
supremum of the range of r. 


A bottom-k ADS [3, 7] is defined with respect to 
a single random permutation r. ADS(v) includes 
a node u if and only if the rank r(u) is one of the 
k smallest ranks among nodes that are at least as 
close to v: 

  u \in \mathrm{ADS}(v) \iff r(u) < k_r^{\mathrm{th}}(\Phi_{<u}(v)).   (4) 

A k-partition ADS (implicit in [2]) is defined 
with respect to a random partition BUCKET : 
V → [k] of the nodes to k subsets and a random 
permutation r. ADS(v) includes u if and only if 
u has the smallest rank among nodes in its bucket 
that are at least as close to v: 

  u \in \mathrm{ADS}(v) \iff r(u) < \min\{\, r(h) \mid 
    \mathrm{BUCKET}(h) = \mathrm{BUCKET}(u) \wedge h \in \Phi_{<u}(v) \,\}. 


A k-mins ADS [3,15] is k bottom-1 ADSs, de- 
fined with respect to k independent permutations. 
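
The bottom-k definition in Eq. (4) can be evaluated directly when all distances from v are known. The Python sketch below is a brute-force illustration, not a scalable method; the function name and the dictionary inputs are assumptions, and distances are assumed to be distinct as in the text.

```python
import math

def bottom_k_ads(dist_from_v, rank, k):
    """Brute-force bottom-k ADS(v) as a list of (node, distance) pairs.

    dist_from_v maps each node u to d_vu (math.inf if unreachable);
    rank maps each node u to r(u).
    """
    ads = []
    closer_ranks = []          # ranks of the nodes in Phi_{<u}(v)
    for u, d in sorted(dist_from_v.items(), key=lambda item: item[1]):
        if d == math.inf:
            break              # unreachable nodes are never sampled
        # k-th smallest rank among closer nodes, or the supremum if fewer than k.
        kth = sorted(closer_ranks)[k - 1] if len(closer_ranks) >= k else math.inf
        if rank[u] < kth:
            ads.append((u, d))
        closer_ranks.append(rank[u])
    return ads
```

Scanning nodes by increasing distance makes the set of closer nodes available incrementally, which mirrors the Dijkstra rank ordering used in the definition.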

It is often convenient to specify the ranks 
r(j) and (for k-partition ADSs) the bucket 
BUCKET(j) using random hash functions, so they 
are readily available from the node ID. The same 
randomization is used for all nodes, which results 
in the sketches being coordinated. This means 
that a node sampled in one sketch is more likely 
to be included in other sketches. Coordination is 
an artifact of scalable computation of the sketches 
but also facilitates more accurate similarity and 
influence queries. 


Relation to MINHASH Sketches 

All-distances sketches are related to MINHASH 
sketches: ADS(v) is the union of the MINHASH 
sketches of the neighborhoods N_d(v), for all 
possible values of d. We explain how a MIN- 
HASH sketch of a neighborhood N_d(v) can be 
obtained from ADS(v). From a k-mins ADS, 
we obtain a k-mins MINHASH sketch of N_d(v), 
which includes for each of the k permutations r 
the value x ← min_{u ∈ N_d(v)} r(u). Note that x is the 
minimum rank of a node of distance at most d in 
the respective bottom-1 ADS defined for r. The 
k minimum rank values x^(t), t ∈ [k], we obtain 
from the different permutations are the k-mins 
MINHASH sketch of N_d(v). We now consider 
obtaining a bottom-k MINHASH sketch of N_d(v) 
from a bottom-k ADS(v). The MINHASH sketch 
of N_d(v) includes the k nodes of minimum rank 
in N_d(v), which are also the k nodes of mini- 
mum rank in ADS(v) within distance at most d. 
Finally, a k-partition MINHASH sketch of N_d(v) 
is similarly obtained from a k-partition ADS by 
taking, for each bucket i ∈ [k], the smallest rank 
value of a node in bucket i that is in N_d(v). This 
is also the smallest value in ADS(v) over nodes 
in bucket i that have distance at most d from v. 


Direction 

For directed graphs, influence, centrality, and
closeness similarity queries can be defined with
respect to either forward or reversed distances.
Accordingly, we can separately consider for each
node v the forward ADS and the backward ADS,
which are defined using forward or reverse paths
from v, respectively.


Node Weights 

All the ADS flavors can be extended to be defined with
respect to specified node weights $\beta(v) \ge 0$ [3, 4].
This makes queries formulated with respect to the
weights more efficient.


Algorithms 
There are two meta-approaches for scalable
computation of an ADS set. The first approach,
PRUNEDDIJKSTRA (Algorithm 1), performs
pruned applications of Dijkstra's single-source
shortest paths algorithm (BFS when
unweighted) [3, 7]. The second approach, DP
(implicit in [2, 15]), applies to unweighted
graphs and is based on dynamic programming
or Bellman-Ford shortest paths computation.
LOCALUPDATES (Algorithm 2) [4] extends DP
to weighted graphs. LOCALUPDATES is
node-centric and is appropriate for MapReduce or
similar platforms, but can incur more overhead
than PRUNEDDIJKSTRA. The algorithms are
presented for bottom-k sketches, but both
approaches can be easily adapted to work with
all three ADS flavors.


Algorithm 1: ADS set for G via PRUNEDDIJKSTRA
for u by increasing r(u) do
    Perform Dijkstra from u on $G^T$ (the transpose graph)
    foreach scanned node v do
        if $|\{(x, y) \in \mathrm{ADS}(v) \mid y < d_{vu}\}| = k$ then
            prune Dijkstra at v
        else
            $\mathrm{ADS}(v) \leftarrow \mathrm{ADS}(v) \cup \{(r(u), d_{vu})\}$
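
A minimal Python sketch of Algorithm 1 may help make the pruning rule concrete. It is written under assumptions that are not in the text: the graph is given as a dictionary of weighted out-arcs, every node appears as a key of `rank`, ranks are distinct, node IDs are mutually comparable (they are used to break ties in the heap), and the function name is hypothetical.

```python
import heapq


def pruned_dijkstra_ads(edges, rank, k):
    """Bottom-k ADS for every node via pruned Dijkstra searches.

    edges: dict node -> list of (neighbor, weight) arcs node -> neighbor.
    rank:  dict node -> distinct rank; must contain every node.
    Returns dict node -> list of (rank, distance) entries.
    """
    nodes = set(rank)
    # A search from u on the transpose graph yields d_vu,
    # the distance from v to u in the original graph.
    transpose = {v: [] for v in nodes}
    for v, arcs in edges.items():
        for x, w in arcs:
            transpose[x].append((v, w))

    ads = {v: [] for v in nodes}
    for u in sorted(nodes, key=lambda x: rank[x]):  # increasing rank
        dist = {u: 0}
        heap = [(0, u)]
        scanned = set()
        while heap:
            d, v = heapq.heappop(heap)
            if v in scanned:
                continue
            scanned.add(v)
            # Entries already in ADS(v) all have lower rank than u.
            closer = sum(1 for _, y in ads[v] if y < d)
            if closer >= k:
                continue  # prune: do not expand v in this search
            ads[v].append((rank[u], d))
            for x, w in transpose[v]:
                nd = d + w
                if x not in scanned and nd < dist.get(x, float("inf")):
                    dist[x] = nd
                    heapq.heappush(heap, (nd, x))
    return ads
```

Because sources are processed by increasing rank, every entry already stored at a node has a lower rank than the current source, so counting the stored entries at smaller distance is enough to decide whether to prune.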


Both PRUNEDDIJKSTRA and DP can be
performed in $O(km \log n)$ time (on unweighted
graphs) on a single processor in main memory,
where n and m are the number of nodes and
edges in the graph. These algorithms maintain a
partial ADS for each node, as entries of node
ID and distance pairs. ADS(v) is initialized
with the pair (v, 0). The basic operation we
use is edge relaxation: when relaxing (v, u),
ADS(v) is updated using ADS(u). For bottom-k,
the relaxation modifies ADS(v) when ADS(u)
contains a node i such that r(i) is smaller than
the kth smallest rank among nodes in ADS(v)
with distance at most $d_{ui} + w_{vu}$ from v. Both
PRUNEDDIJKSTRA and DP perform relaxations
in an order which guarantees that inserted entries
are part of the final ADS, that is, there are no
other nodes that are both closer and have lower
rank: PRUNEDDIJKSTRA iterates over all nodes
in increasing rank, runs Dijkstra's algorithm from
the node on the transpose graph, and prunes at
nodes when the ADS is not updated. DP performs
iterations, where in each iteration, all edges
(v, u), such that ADS(v) was updated in the
previous step, are relaxed. Therefore, entries are
inserted by increasing distance.


Algorithm 2: ADS set for G via LOCALUPDATES
// Initialization
for u do
    $\mathrm{ADS}(u) \leftarrow \{(r(u), 0)\}$

// Propagate updates (r, d) at node u
if (r, d) is added to ADS(u) then
    foreach $y \mid (u, y) \in G$ do
        send (r, d + w(u, y)) to y

// Process update (r, d) received at u
// ($k^{\text{th}}_x$ denotes the kth smallest x value in the set)
if node u receives (r, d) then
    if $r < k^{\text{th}}_x \{(x, y) \in \mathrm{ADS}(u) \mid y < d\}$ then
        $\mathrm{ADS}(u) \leftarrow \mathrm{ADS}(u) \cup \{(r, d)\}$
        // Clean-up ADS(u)
        for entries $(x, y) \in \mathrm{ADS}(u) \mid y > d$ by increasing y do
            if $x > k^{\text{th}}_h \{(h, z) \in \mathrm{ADS}(u) \mid z < y\}$ then
                $\mathrm{ADS}(u) \leftarrow \mathrm{ADS}(u) \setminus (x, y)$

Estimation 


Distance Distribution 

The neighborhood cardinality $|N_d(v)|$ for a node v
and $d \ge 0$ can be estimated with a small
relative error from ADS(v). The generic estimator
extracts a MINHASH sketch of the neighborhood
$N_d(v)$ from ADS(v) and applies a MINHASH
cardinality estimator to this sketch. This
approach was used in [3, 7, 15]. A nearly optimal
estimator, the Historic Inverse Probability (HIP)
estimator [4], has a factor 2 improvement in
variance by using all information in the ADS
instead of just the MINHASH sketch. HIP works
by considering, for each entry (u, d) in the sketch,
the HIP threshold probability, which is the
probability, under a randomly drawn rank for the node
u but with the ranks of all other nodes fixed, that the
entry is included in the sketch. The entry then
obtains an adjusted weight that is the inverse
of the HIP threshold probability. Neighborhood
cardinality can be estimated by the sum of the
adjusted weights of ADS entries that fall in the
neighborhood.
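
As a rough illustration of the HIP idea for the bottom-k flavor, the sketch below assumes (these are assumptions, not statements from the text) that ranks are drawn uniformly from (0, 1), that distances within an ADS are distinct, and that an ADS is a list of (rank, distance) pairs. Under these assumptions the threshold probability of an entry is the kth smallest rank among the closer entries, or 1 if there are fewer than k of them. The function names are hypothetical.

```python
def hip_adjusted_weights(ads, k):
    """Return (distance, adjusted_weight) pairs for a bottom-k ADS.

    ads: list of (rank, distance) entries with ranks in (0, 1).
    """
    weighted = []
    closer_ranks = []  # ranks of entries at smaller distance
    for rank, dist in sorted(ads, key=lambda e: e[1]):
        if len(closer_ranks) < k:
            tau = 1.0  # the entry is included whatever its rank
        else:
            tau = sorted(closer_ranks)[k - 1]  # kth smallest closer rank
        weighted.append((dist, 1.0 / tau))
        closer_ranks.append(rank)
    return weighted


def estimate_neighborhood_size(ads, k, d):
    """HIP-style estimate of |N_d(v)|: sum of adjusted weights within d."""
    return sum(w for dist, w in hip_adjusted_weights(ads, k) if dist <= d)
```

The first k entries always receive weight 1, so small neighborhoods are counted exactly; later entries receive weights larger than 1 to account for the nodes that were not sampled.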


Closeness Centrality (Distance-Decaying) 

$C_{\alpha,\beta}(v)$ can be estimated from ADS(v) with a
small relative error when the set of ADSs is
computed with respect to $\beta$. Estimators using
MINHASH sketches were given in [6]. The tighter
HIP estimator in [4] simply sums, over entries
$(u, d) \in \mathrm{ADS}(v)$, the product of the adjusted
weight of the entry and $\alpha(d)\beta(u)$.


Closeness Similarity and Influence 

When the sketches are computed with respect to
$\beta$, the closeness similarity of two nodes u and
v can be estimated from ADS(u) and ADS(v)
to within a small additive error $\epsilon$ [8]. The influence of a
set of nodes S can be estimated from
$\{\mathrm{ADS}(v) \mid v \in S\}$ to within a small relative error [9]. These
estimators are instances of the L* estimator [5]
applied with the HIP inclusion probabilities [4].


Approximate Distance Oracles 

An upper bound on the distance between two nodes u and v
can be computed from ADS(u) and ADS(v) [8].
This is done by taking the minimum of
$d_{uh} + d_{hv}$ over nodes h that are in the intersection
$\mathrm{ADS}(u) \cap \mathrm{ADS}(v)$. When the graph is
undirected, the oracle has worst-case quality
guarantees that match the distance oracle of [16] (oracle
time can be improved by looking only at nodes in
the few ADS entries that correspond to k = 1).
We note that observed quality in practice (using
the full oracle) tends to have a small relative
error [8].
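
The oracle query itself is short. The sketch below assumes, purely for illustration, that each ADS is stored as a dictionary from node ID to distance; the function name is hypothetical.

```python
import math


def distance_upper_bound(ads_u, ads_v):
    """Upper bound on the u-v distance from two ADSs.

    ads_u, ads_v: dict node -> distance (assumed representation).
    Returns the minimum of d_uh + d_hv over common nodes h,
    or infinity when the two sketches share no node.
    """
    common = ads_u.keys() & ads_v.keys()
    if not common:
        return math.inf
    return min(ads_u[h] + ads_v[h] for h in common)
```

Any common node h witnesses a path of length $d_{uh} + d_{hv}$, which is why the result is always an upper bound on the true distance in an undirected graph.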


Applications 
Massive graphs, with billions of edges, are prevalent
and model web graphs and social networks.
Centralities, similarities, influence, and distances
are basic data analysis tasks on these graphs.
ADSs are a powerful tool for scalable analysis of
very large graphs.


Extensions 
An ADS can be viewed as a MINHASH sketch 
constructed from a stream, where all updates are 
recorded. This means that the HIP estimator [4] 
can be applied for distinct counting on streams, 
obtaining improved performance over estimators 
applied to the MINHASH sketch alone [11, 12]. 
In a graph context, ADS(v) is a recording of 
all updates to a MINHASH sketch obtained by 
sweeping through nodes in increasing distance 
from v. More generally, we can construct ADS
for other settings and apply the same estimation
machinery. One example is Euclidean distances
[6, 13]. Another example is constructing a combined
ADS of multiple graphs for the application
of timed influence oracles [9].


Problem Definition


A parameterized problem is a language
$L \subseteq \Sigma^* \times \mathbb{N}$. Such a problem is said to be
fixed-parameter tractable if there is an algorithm that
decides if $(x, k) \in L$ in time $f(k)\,|x|^{O(1)}$
for some computable function f.
For attacking an intractable problem within the 
multivariate algorithmic framework, a necessary 
first step is to identify some reasonable param- 
eters. The relevance of an FPT algorithm will 
depend on the quality of the choice of parameters. 
The first objective is of a practical concern: the 
choice of parameter should not “cheat,” that is, 
it should be a choice that leads to tractability in 
the context of instances that are relevant to real- 
world applications. On the other hand, the param- 
eter should also lend a perspective that is useful 
to the algorithm designer, usually by provid- 
ing additional structural insights, thereby making 
an otherwise unwieldy problem manageable. Fi- 
nally, the parameter itself should be accessible, 
in the sense that it should either typically accom- 
pany the input or be easy to compute from the 
input. 

For a combinatorial optimization problem, the 
size of the desired solution is a natural param- 
eter. For a minimization problem, it is usually 
reasonable to assume that this parameter is also 
small in practice. For a maximization problem, 
the dual parameter, which is the difference from 
the best possible upper bound on the optimum, 
is also a natural choice. For example, consider 
the problem of satisfying at least k clauses of 
a CNF formula. Here, the standard parameter 
would be k, while the dual parameter would be
$m - k$, where m is the number of clauses:
in other words, can we satisfy all but k
clauses in the formula? In the rest of this section,
we broadly describe the other possibilities for 
parameters. 


Structural Parameterizations 

Structural parameters are a considered attempt at
acknowledging that various aspects of an instance
influence its complexity. A classic example is
ML-type checking [11], which is an NP-complete
problem but can be resolved in time $O(2^k n)$,
where k is the maximum nesting depth of the
input program. Fortunately, nesting depths of
most programs are no more than four or five, so the
algorithm proposed is entirely adequate for
real-world instances.

Since every problem context is inherently 
suggestive of several possible parameters, we 
are only able to describe a few illustrative 
examples. In the context of graph problems, 
width parameters such as treewidth, cliquewidth, 
and rankwidth have enjoyed immense success. 
The notion of treewidth is particularly popular 
because of a number of real-world instances 
that are known to exhibit small treewidth, and 
on the other hand, the theoretical foundations 
of algorithms on graphs of bounded treewidth 
are extremely well established and actively 
developed (see, e.g., [3, Chapter 7]). An
analogous notion of treewidth for directed
graphs remains elusive, although several
proposals with varying merits exist in the
literature [7].

Special graph classes, such as interval graphs, 
chordal graphs, planar graphs, and so on, 
have been extensively studied, and most hard 
problems turn out to be tractable on these 
classes. For an arbitrary graph, one might hope 
that the tractability carries over if the graph 
is “close enough” to being, say, a chordal 
graph. An increasingly popular program involves 
considering distance-to-C parameterizations, 
where C is a class of graphs on which the 
problem of interest is easily solvable. For 
instance, we might let k be the size of a smallest 
subset of vertices whose removal makes the 
input graph a member of C. Other measures 
of closeness, using operations like addition, 
removal, or contraction of edges, are also 
frequently considered. 

In the context of satisfiability and constraint 
satisfaction also, the notion of distance from 
tractable subclasses has garnered much attention 
in recent times. This is formalized by the notion 
of backdoors, which are subsets of variables 
whose “removal” makes the formula tractable 
(for instance, one of the Schaefer classes). Much 
work has been done with backdoors as parameters
for determining satisfiability, and we refer the
reader to [9].


Above or Below Guarantee Parameterizations

Consider the standard parameterization of VERTEX
COVER: given a graph G = (V, E) on n
vertices, decide whether G has a vertex cover
of size at most k. The best-known algorithm for
vertex cover is due to Chen et al. [1] and runs
in time $O(1.2852^k + kn)$. Observe that if G
has a matching of size $\mu$, then any vertex cover
has size at least $\mu$. In particular, if G has a
perfect matching, then all vertex covers of G have
size $\Omega(n)$, and even the FPT algorithms for the
standard parameter will be obliged to spend time
that is exponential in n.
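
To make the standard parameterization and the matching lower bound concrete, here is a minimal Python sketch. It is not the algorithm of Chen et al. [1]; it is the textbook bounded search tree that branches on the two endpoints of an uncovered edge, giving roughly $2^k$ branches, together with a greedy maximal matching whose size is a lower bound on any vertex cover. The edge-list representation and the function names are assumptions made for illustration.

```python
def has_vertex_cover(edges, k):
    """Decide whether the graph has a vertex cover of size at most k.

    edges: list of (u, v) pairs. Branches on the endpoints of the
    first uncovered edge, so the search tree has at most 2^k leaves.
    """
    if not edges:
        return True
    if k == 0:
        return False
    u, v = edges[0]
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        if has_vertex_cover(remaining, k - 1):
            return True
    return False


def greedy_matching_size(edges):
    """Size of a greedily built maximal matching (a lower bound on
    the size of every vertex cover)."""
    matched = set()
    size = 0
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            size += 1
    return size
```

On a graph with a perfect matching the lower bound is n/2, so the branching depth k is forced to grow linearly with n, which is the concern raised above.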

Mahajan and Raman [13] consider the following
alternative parameterization: does G have a
vertex cover of size at most $\mu + k$, where $\mu$ is
the size of a maximum matching of G? Note that the
parameter k here is the size of the vertex cover
above the matching size. Since the matching size
is a guaranteed lower bound on the vertex cover
size, this problem is referred to as the ABOVE
GUARANTEE VERTEX COVER problem. Just as
one can parameterize above guaranteed values,
one can consider parameterizations below
guaranteed values. A classic example is the following
variant of VERTEX COVER: given a planar
graph G = (V, E) on n vertices and an integer
parameter k, does G admit a vertex cover of size
at most $\lfloor 3n/4 \rfloor - k$?


Key Results 


One of the earliest attempts at parameterizing by 
the size of the vertex cover was made in [4]. 
Various graph layout problems were considered, 
where it turned out that a small vertex cover led 
to a very convenient structure for formulating 
a linear program. For many of these problems, 
it is not known if they are FPT parameterized 
by treewidth, and some are hard even on graphs 
of bounded treewidth (indeed, BANDWIDTH is 
NP-hard even on trees). This justifies the need 
for a stronger structural parameter, and in these 
examples, vertex cover turned out to be a very 
fruitful parameterization. 

These examples led to a broader theme,
namely, the “complexity ecology of parameters”
program, which was proposed in [5]. The
theoretical foundations of this program were
further established and surveyed in [6]. An
immediate concern is that of how one formalizes
the structural parameterization in question. Along
the lines of [6], we distinguish the following
possible objectives from the formalization:


1. The complexity of verifying and then exploit- 
ing a bounded parameter value 

2. The complexity of exploiting structure that 
is guaranteed, but not given explicitly (a 
“promise” problem) 

3. The complexity of exploiting structure that is 
explicitly provided along with the input, as a 
“witness” 


The first setting is the most general but also 
the most computationally restrictive, as it puts 
the burden of discovering the structure also on 
the algorithm. This definition makes the study 
of parameters like bandwidth and cliquewidth 
prohibitive, as these are hard to determine even in 
the parameterized framework. On the other hand,
while the other two notions are increasingly
relaxed, the premise of a promise or the availability
of witnesses in real-world instances remains a
concern. We point the reader to [6] for a detailed
discussion of the precise formalisms and their
respective merits and trade-offs.

One of the major theoretical themes with al- 
ternate parameterizations is the exercise of iden- 
tifying meta-theorems that explain the influence 
of the parameter over a large class of prob- 
lems, usually specified in an appropriate logic. 
A cornerstone result of this kind is Courcelle’s 
theorem, establishing that a problem express- 
ible in Monadic Second Order Logic is FPT 
when parameterized by treewidth and the size 
of the formula [2]. Several generalizations of 
Courcelle’s theorem have since been proposed, 
and many of them are surveyed in [10]. More 
recent work also establishes a similar result in 
the context of kernelization and parameters like 
vertex cover [8]. 

There is a rich literature that evidences the
growing consideration of alternate
parameterizations for optimization problems in varied
contexts. As a concluding example, we turn
our attention to [14], which serves to illustrate
the scale at which it is possible to execute
an exercise in understanding a question from
several perspectives. Given two graphs H and
G, the SUBGRAPH ISOMORPHISM problem
asks if H is isomorphic to a subgraph of G.
In [14], a framework is developed involving
ten relevant parameters for each of H and G
(such as treewidth, maximum degree, number of
components, and so on). The generic question
addressed in this work is if the problem admits
an algorithm with running time:


$$f_1(p_1, p_2, \ldots, p_\ell) \cdot n^{f_2(p_{\ell+1}, \ldots, p_k)}$$


where each of $p_1, \ldots, p_k$ is one of the ten
parameters depending only on H or G. We refer
the reader to Figure 1 in [14] for a concise
tabulation of the results. Notably, all combinations
of questions (the number of which runs into the
billions) are answered by a set of 28 positive
and negative results.

There are many examples of problems that 
are parameterized away from guaranteed bounds. 
We note that parameterizing vertex cover above 
the LP optimum has attracted considerable inter- 
est because a number of fundamental problems 
including Above Guarantee Vertex Cover, Odd 
Cycle Transversal, Split Vertex Deletion, and 
Almost 2-SAT reduce to this problem. Indeed, for 
many of these problems, the fastest algorithms at 
the time of this writing are obtained by reducing 
these problems to vertex cover parameterized 
above the LP optimum [12]. 


Open Problems 


We direct the reader to the excellent survey [6] for 
several open problems concerning specific com- 
binations of parameters for particular problems. 
In an applied context, an interesting possibility is
to investigate if parameters can be learned from
large samples of data.


Problem Definition


While the competitive ratio [19] is the most
common metric in online algorithm analysis and
has led to a vast amount of knowledge in the
field, there are numerous known applications in
which the competitive ratio produces
unsatisfactory results. Far too often, it leads to
unrealistically pessimistic measures, including the failure to
distinguish between algorithms that have vastly
differing performance under any practical
characterization. Because of this, there
has been extensive research into alternatives to the
competitive ratio, with a renewed effort in the
period from 2005 to the present date.

The competitive ratio metric can be derived 
from the observation that an online algorithm, in 
essence, computes a partial solution to a prob- 
lem using incomplete information. Then, it is 
only natural to quantify the performance drop 
due to this absence of information. That is, we 
compare the quality of the solution obtained by 
the online algorithm with the one computed in 
the presence of full information, namely, that of 
the offline optimal OPT, in the worst case. More 
formally, 


Definition 1 An online algorithm A is said to
have (asymptotic) competitive ratio c if
$A(\sigma) \le c \cdot \mathrm{OPT}(\sigma) + b$ for all input sequences $\sigma$ and fixed
constants b and c.


The early literature considered only algorithms 
with constant competitive ratio, and all others 
are termed as algorithms with unbounded com- 
petitive ratio. However, it is easy to extend this 
definition to a C(n)-competitive algorithm as 
follows: 


Definition 2 An online algorithm A is said
to have (asymptotic) competitive ratio C(n) if
$A(\sigma) \le C(n) \cdot \mathrm{OPT}(\sigma) + b$ for all $\sigma$ and a fixed
constant b. When b = 0, C(n) is termed the
absolute competitive ratio.


A natural expectation would be that the
performance of OPT reflects both knowledge of the
future and the inherent structure of the specific
instance being solved, and hence, an online
algorithm with optimal competitive ratio must handle
most if not all instances in an efficient manner.
Unfortunately, for most problems, the worst-case
nature of the competitive ratio leads to algorithms
of varying degrees of sophistication having the
same, equally bad competitive ratio. As a
consequence, the competitive ratio leads to
“equivalence” for online algorithms with vastly differing
performance in practice.
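
A small paging experiment illustrates this effect. The sketch below simulates two paging algorithms with the same competitive ratio, LRU and FWF (flush-when-full), and compares their fault counts with those of an optimal offline algorithm that evicts the page whose next request is furthest in the future. The function names and the request sequence are chosen only for illustration.

```python
def lru_faults(seq, k):
    """Number of page faults of LRU with a cache of size k."""
    cache = []  # least recently used page first
    faults = 0
    for page in seq:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)  # evict the least recently used page
        cache.append(page)
    return faults


def fwf_faults(seq, k):
    """Number of page faults of FWF, which empties a full cache on a fault."""
    cache = set()
    faults = 0
    for page in seq:
        if page not in cache:
            faults += 1
            if len(cache) == k:
                cache.clear()
            cache.add(page)
    return faults


def opt_faults(seq, k):
    """Number of page faults of the offline furthest-in-future rule."""
    cache = set()
    faults = 0
    for i, page in enumerate(seq):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                for j in range(i + 1, len(seq)):
                    if seq[j] == q:
                        return j
                return len(seq)  # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults
```

On the sequence 1, 2, 3, 2, 1, 2, 3, 2 with a cache of size 2, LRU incurs 5 faults, the same as the offline rule, while FWF faults on all 8 requests, even though worst-case analysis assigns both online algorithms the same ratio.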

In the next sections we discuss the main al- 
ternatives to and refinements of the competitive 
ratio and highlight their relative benefits and 
drawbacks. 


Key Results 


Relative Worst-Order Ratio 

The relative worst-order ratio [8, 10, 11] com- 
bines some desirable properties of two earlier 
measures, namely, the max/max ratio [6] and 
the random order ratio [15]. Using this measure 
we can directly compare two online algorithms. 
Informally, for a given sequence, it considers the
worst-case ordering of that sequence for each
algorithm and compares their behavior on these
orderings as a ratio. It then takes, among all
sequences (not just reorderings), the one that
maximizes this ratio of worst-case performances.

Let A and B be online algorithms for an
online minimization problem and let A(I)
be the cost of A on an input sequence
$I = (i_1, i_2, \ldots, i_n)$. Denote by $I_\sigma$ the sequence
obtained by applying a permutation $\sigma$ to I,
i.e., $I_\sigma = (i_{\sigma(1)}, \ldots, i_{\sigma(n)})$. Define
$A_W(I) = \max_\sigma A(I_\sigma)$, the cost of A on a worst
ordering of I.


Definition 3 ([11]) Let $S_1(c)$ and $S_2(c)$ be the
statements about algorithms A and B defined in
the following way:


$S_1(c)$: There exists a constant b such that
$A_W(I) \le c \cdot B_W(I) + b$ for all I.


$S_2(c)$: There exists a constant b such that
$A_W(I) \ge c \cdot B_W(I) - b$ for all I.


The relative worst-order ratio $WR_{A,B}$ of an
online algorithm A to algorithm B is defined if
$S_1(1)$ or $S_2(1)$ holds. In this case A and B
are said to be comparable. If $S_1(1)$ holds, then
$WR_{A,B} = \sup\{r \mid S_2(r)\}$, and if $S_2(1)$ holds, then
$WR_{A,B} = \inf\{r \mid S_1(r)\}$.


$WR_{A,B}$ can be used to compare the qualities
of A and B. If $WR_{A,B} = 1$, then these two
algorithms have the same quality with respect
to this measure. The magnitude of the difference
between $WR_{A,B}$ and 1 reflects the difference
between the behavior of the two algorithms. For
a minimization problem, A is better than B with
respect to this measure if $WR_{A,B} < 1$ and
vice versa. Boyar and Favrholdt showed that the
relative worst-order ratio is transitive [8].
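
For very short sequences the worst-ordering cost of an algorithm can be computed by brute force, which may help in reading the definition. The sketch below does this for two paging algorithms, LRU and FWF (flush-when-full). It takes the worst ordering to be the one of maximum cost, as is natural for a minimization problem, enumerates all orderings, and is therefore only usable on tiny inputs; the function names are hypothetical.

```python
from itertools import permutations


def lru_faults(seq, k):
    """Number of page faults of LRU with a cache of size k."""
    cache = []  # least recently used page first
    faults = 0
    for page in seq:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)
        cache.append(page)
    return faults


def fwf_faults(seq, k):
    """Number of page faults of FWF, which empties a full cache on a fault."""
    cache = set()
    faults = 0
    for page in seq:
        if page not in cache:
            faults += 1
            if len(cache) == k:
                cache.clear()
            cache.add(page)
    return faults


def worst_order_cost(alg, seq, k):
    """Cost of alg on a worst (maximum-cost) reordering of seq."""
    return max(alg(order, k) for order in set(permutations(seq)))
```

For the requests 1, 2, 3, 2 and a cache of size 2, LRU incurs 3 faults in the given order but 4 on the reordering 2, 1, 3, 2, so its worst-ordering cost is 4. The ratio itself is then defined through bounds that must hold over all sequences I, which this sketch does not attempt to verify.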

Note that we can also compare the online
algorithm A to an optimal offline algorithm OPT.
The worst-order ratio of A is defined as
$WR_A = WR_{A,\mathrm{OPT}}$. For some problems, OPT is the same
for all orderings of the requests in a given input
sequence, and hence, the worst-order ratio is the
same as the competitive ratio. However, for other
problems such as paging, the order does matter for
OPT.

In [10], three online algorithms (FIRST-FIT,
BEST-FIT, and WORST-FIT) for two variants of
the seat reservation problem [9] are compared
using the relative worst-order ratio. When applied
to paging algorithms, the relative worst-order
ratio differentiates LRU from FWF: LRU
is strictly better than FWF with respect to the
worst-order ratio, while they have the same
competitive ratio [11]. Similarly, [11] proposes a new
paging algorithm, retrospective LRU (RLRU),
and shows that it is better than LRU under this
measure, though not under the competitive ratio.


Loose Competitiveness 

Loose competitiveness was first proposed in [22]
and later modified in [25]. It attempts to obtain
a more realistic measure by observing that, first,
in many real online problems, we can ignore
those input sequences on which the online
algorithm incurs a cost less than a certain
threshold and, second, many online problems have a
second resource parameter (e.g., size of cache,
number of servers) and the input sequences are
independent of these parameters. In contrast, in
competitive analysis, the adversary can select
sequences tailored against those parameters. For
example, for caching, the worst-case input with
competitive ratio k can only be constructed by
the adversary if it is aware of the size k of
the cache. However, in practice the competitive
ratios of many online paging algorithms have
been observed to be constant [25], i.e.,
independent of k.

In loose competitiveness we consider an
adversary that is oblivious to the parameter by
requiring it to give a sequence that is bad for most
values of the parameter rather than just a specific
bad value of the parameter. Let $A_k(I)$ denote the
cost of an algorithm A on an input sequence I,
when the parameter of the problem is k.


Definition 4 ([25]) An algorithm A is
$(\epsilon, \delta)$-loosely c-competitive if, for any input sequence
I and for any n, at least $(1 - \delta)n$ of the
values $k \in \{1, 2, \ldots, n\}$ satisfy
$A_k(I) \le \max\{c \cdot \mathrm{OPT}_k(I), \epsilon |I|\}$.


Therefore, we ignore the input sequences I which
cost less than $\epsilon |I|$. Also we require the algorithm
to be good for at least a $(1 - \delta)$ fraction of the
possible parameters. For each online problem, we
can select the appropriate constants $\epsilon$ and $\delta$. The
following result shows that by this modification
of the competitive analysis, we can obtain paging
algorithms with constant performance ratios.


Theorem 1 ([25]) Every k-competitive paging
algorithm is $(\epsilon, \delta)$-loosely c-competitive for any
$0 < \epsilon, \delta < 1$ and $c = (e/\delta) \ln(e/\epsilon)$, where e is
the base of the natural logarithm.
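
The condition in Definition 4 can be checked mechanically for one sequence and one value of n. The sketch below does so for LRU, using an offline furthest-in-future rule as OPT, and evaluates the constant c of Theorem 1. It tests a single sequence only, so it can refute but never establish loose competitiveness; all function names are hypothetical.

```python
import math


def lru_faults(seq, k):
    """Number of page faults of LRU with a cache of size k."""
    cache = []  # least recently used page first
    faults = 0
    for page in seq:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)
        cache.append(page)
    return faults


def opt_faults(seq, k):
    """Number of page faults of the offline furthest-in-future rule."""
    cache = set()
    faults = 0
    for i, page in enumerate(seq):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                for j in range(i + 1, len(seq)):
                    if seq[j] == q:
                        return j
                return len(seq)  # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults


def theorem_constant(eps, delta):
    """The constant c = (e / delta) * ln(e / eps) from Theorem 1."""
    return (math.e / delta) * math.log(math.e / eps)


def satisfies_loose_condition(alg, seq, n, c, eps, delta):
    """Check Definition 4 for one sequence and one n."""
    good = sum(
        1
        for k in range(1, n + 1)
        if alg(seq, k) <= max(c * opt_faults(seq, k), eps * len(seq))
    )
    return good >= (1 - delta) * n
```

For example, with $\epsilon = \delta = 0.5$ the constant is $2e(1 + \ln 2)$, which is independent of the cache size k.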


Diffuse Adversary Model 

The diffuse adversary model [16] tries to refine
the competitive ratio by restricting the set of legal
input sequences. In the diffuse adversary model,
the input is generated according to a distribution
that belongs to a class $\Delta$ of distributions.


Definition 5 Let A be an online algorithm for
a minimization problem and let $\Delta$ be a class of
distributions for the input sequences. Then A is
c-competitive against $\Delta$ if there exists a constant
b such that
$\mathbb{E}_{I \in D}[A(I)] \le c \cdot \mathbb{E}_{I \in D}[\mathrm{OPT}(I)] + b$ for
every distribution $D \in \Delta$, where A(I) denotes
the cost of A on the input sequence I and the
expectations are taken over sequences that are
picked according to D.


In other words, for a given algorithm A, the
adversary selects the distribution D in $\Delta$ that
leads to its worst-case performance in that family.
If $\Delta$ is highly restrictive, then A knows more
about the distribution of input sequences and the
power of the adversary is more constrained. When
$\Delta$ contains all possible distributions, then the
competitive analysis against $\Delta$ is the same as the
standard competitive ratio.

Computing the actual competitive ratio of both
deterministic and randomized paging algorithms
against $\Delta_\epsilon$ (the class of distributions in which,
at every step, each page is requested with
probability at most $\epsilon$) is studied in [23, 24]. An estimation
of the optimal competitive ratio for several
algorithms (such as LRU and FIFO) within a factor
of 2 is given. Also it is observed that around
the threshold $\epsilon \approx 1/k$, the best competitive
ratios against $\Delta_\epsilon$ are $O(\ln k)$. The competitive
ratios rapidly become constant for values of $\epsilon$
less than the threshold. For $\epsilon = \omega(1/k)$, i.e.,
values greater than the threshold, the competitive
ratio rapidly tends to $O(k)$ for deterministic
algorithms while it remains unchanged for
randomized algorithms.

Note that we can also model locality of refer- 
ence using the diffuse adversary model by consid- 
ering only those distributions that are consistent 
with distributions obeying a locality of reference 
principle. In particular Dorrigiv et al. showed 
that for the list update problem MTF is optimal 
in expected cost under any probability distribu- 
tion that has locality of reference monotonicity, 
i.e., a recently accessed item has equal or larger 
probability of being accessed than a less recently 
accessed item [14]. 


Bijective Analysis 

Bijective analysis and average analysis [3] build
upon the framework of locality of reference by
[1]. These models directly compare two online
algorithms without appealing to the concept of
the offline “optimal” cost. In addition, these
measures do not evaluate the performance of the
algorithm on a single “worst-case” request sequence, but
instead use the cost that the algorithm incurs
on each and all request sequences. Informally,
bijective analysis aims to pair input sequences
for two algorithms A and B using a bijection
in such a way that the cost of A on input $\sigma$
is no more than the cost of B on the image
of $\sigma$, for all request sequences $\sigma$ of the same
length. In this case, intuitively, A is no worse
than B. On the other hand, average analysis
compares the average cost of the two algorithms
over all request sequences of the same length.
For an online algorithm A and an input sequence
$\sigma$, let $A(\sigma)$ be the cost incurred by A on $\sigma$.
Denote by $\mathcal{I}_n$ the set of all input sequences of
length n.

We say that an online algorithm A is no worse
than an online algorithm B according to bijective
analysis if there exists an integer $n_0 \ge 1$ so that
for each $n \ge n_0$, there is a bijection
$b : \mathcal{I}_n \leftrightarrow \mathcal{I}_n$
satisfying $A(\sigma) \le B(b(\sigma))$ for each $\sigma \in \mathcal{I}_n$. We
denote this by $A \preceq_b B$.

We say that an online algorithm A is no worse
than an online algorithm B according to average
analysis if there exists an integer $n_0 \ge 1$ so that
for each $n \ge n_0$,
$\sum_{I \in \mathcal{I}_n} A(I) \le \sum_{I \in \mathcal{I}_n} B(I)$.
We denote this by $A \preceq_a B$.
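
For a fixed length n and a small page universe, both relations can be tested by enumeration. The sketch below relies on a standard observation: a bijection b with $A(\sigma) \le B(b(\sigma))$ for every $\sigma$ exists exactly when the sorted list of costs of A is elementwise no larger than the sorted list of costs of B. It compares two paging algorithms, LRU and FWF (flush-when-full), and checks a single n only, whereas the definitions quantify over all $n \ge n_0$. The function names are hypothetical.

```python
from itertools import product


def lru_faults(seq, k):
    """Number of page faults of LRU with a cache of size k."""
    cache = []  # least recently used page first
    faults = 0
    for page in seq:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)
        cache.append(page)
    return faults


def fwf_faults(seq, k):
    """Number of page faults of FWF, which empties a full cache on a fault."""
    cache = set()
    faults = 0
    for page in seq:
        if page not in cache:
            faults += 1
            if len(cache) == k:
                cache.clear()
            cache.add(page)
    return faults


def all_costs(alg, pages, n, k):
    """Costs of alg on every request sequence of length n over pages."""
    return [alg(seq, k) for seq in product(pages, repeat=n)]


def no_worse_bijective(costs_a, costs_b):
    """True if some bijection pairs each cost of A with a cost of B
    that is at least as large (checked via sorted order)."""
    return all(a <= b for a, b in zip(sorted(costs_a), sorted(costs_b)))


def no_worse_average(costs_a, costs_b):
    """True if the total (hence average) cost of A is at most that of B."""
    return sum(costs_a) <= sum(costs_b)
```

With three pages, sequences of length 4, and a cache of size 2, LRU is no worse than FWF under both tests, while the converse fails under both.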

Under bijective analysis and average
analysis alone, all lazy algorithms (including
LRU and FIFO, but not FWF) are in fact strongly
equivalent. This is evidence of an inherent
difficulty in separating these algorithms in any
general unrestricted setting. Their superiority
is seemingly derived from the well-known
observation that input sequences for paging
and several other problems show locality of
reference [12, 13]. This means that when a page is
requested, it is more likely to be requested again in the
near future. Therefore, several models for paging
with locality of reference have been proposed.

Hence, there is a need to combine bijective analysis
with a model of locality of reference,
such as concave analysis. In this model a request
sequence has high locality of reference if the
number of distinct pages in a window of size n
is small.

Using this measure Angelopoulos et al. [3]
show that LRU is never outperformed in any
possible subpartition of the request sequence space
induced by concave analysis, while it always
outperforms any other paging algorithm in at
least one subpartition of the sequence space. This
result proves separation between LRU and all
other algorithms and provides theoretical
backing to the observation that LRU is preferable in
practice. This is the first deterministic theoretical
model to provide full separation between LRU
and all other algorithms. Recently this result was
strengthened by Angelopoulos and Schweitzer
[2], who showed that the separation also
holds under the stricter bijective analysis (as
opposed to average analysis) using the concave
analysis framework.


Smoothed Competitiveness 

Some algorithms that have very bad worst-case 
performance behave very well in practice. One of 
the most famous examples is the simplex method. 
This algorithm has a very good performance in 
practice but it has exponential worst-case running 
time. Average case analysis of algorithms can 
somehow explain this behavior, but sometimes 
there is no basis to the assumption that the inputs 
to an algorithm are random. 

Smoothed analysis of algorithms [21] tries to 
explain this intriguing behavior without assum- 
ing anything about the distribution of the input 
instances. In this model, we randomly perturb
(smoothen) the input instances according to a
probability distribution f and then analyze the
behavior of the algorithm on these perturbed
(smoothed) instances. For each input instance I,
we compute the neighborhood N(I) of I, which
contains the set of all perturbed instances that
can be obtained from I. Then we compute the
expected running time of the algorithm over all 
perturbed instances in this neighborhood. The 
smoothed complexity of the algorithm is the 
maximum of this expected running time over all 
the input instances. Intuitively, an algorithm with 
a bad worst-case performance can have a good 
smoothed performance if its worst-case instances 
are isolated. Spielman and Teng show [21] that 
the simplex algorithm has polynomial smoothed 
complexity. Several other results are known about
the smoothed complexity of other algorithms [4, 7,
18, 20].

Becchetti et al. [5] introduced smoothed 
competitive analysis which mirrors competitive 
analysis except that we consider the cost 
of the algorithm on randomly perturbed 
adversarial sequences. As in the analysis of 
the randomized online algorithms, we can 
have either an oblivious adversary or an 
adaptive adversary. The smoothed competitive
ratio of an online algorithm A for a mini- 
mization problem can be formally defined as 
follows. 


Definition 6 ([5]) The smoothed competitive ratio
of an algorithm A is defined as

\[
c = \sup_{\check{I}} \; \mathbb{E}_{I \xleftarrow{f} N(\check{I})}
    \left[ \frac{A(I)}{\mathrm{OPT}(I)} \right],
\]

where the supremum is taken over all input
instances $\check{I}$ and the expectation is taken over all
instances $I$ that are obtainable by smoothening
the input instance $\check{I}$ according to f in the
neighborhood $N(\check{I})$.


In [5], the authors use the smoothed competitive ratio
to analyze the MULTI-LEVEL FEEDBACK (MLF)
algorithm for processor scheduling in a
time-sharing multitasking operating system. This
algorithm has very good practical performance,
but its competitive ratio is very bad; under
smoothed competitive analysis it obtains strictly
better ratios than under standard competitive analysis.


Search Ratio 

The search ratio belongs to the family of mea- 
sures in which the offline OPT is weakened. It is 
defined only for the specific case of geometric 
searches in an unknown terrain for a target of 
unknown position. Recall that the competitive 
ratio compares against an all-knowing OPT; in- 
deed, for geometric searches in the competitive 
ratio framework, the OPT is simply a shortest 
path algorithm, while the online search algorithm 
has intricate methods for searching. The search 
ratio instead considers the case where OPT knows 
the terrain but not the position of the target. 
That is, the search ratio compares two search 
algorithms, albeit one more powerful than the 
other. By comparing two instances of like objects, 
the search ratio can be argued to be a more mean- 
ingful measure of the quality of an online search 
algorithm. Koutsoupias et al. show that searching
in trees results in the same large competitive ratio
regardless of the search strategy, yet under the
search ratio framework, certain algorithms are far
superior to others [17].
Problem Definition


Let A be an enumeration algorithm. Suppose that 
A is a recursive type algorithm, i.e., composed 
of a subroutine that recursively calls itself several 
times (or none). Thus, the recursion structure of 
the algorithm forms a tree. We call the subroutine 
or the execution of the subroutine an iteration. 
We here assume that an iteration does not include 
the computation done in the recursive calls gen- 
erated by itself. We regard a series of subroutines 
of different types as an iteration if they form a 
nested recursion. We denote the set of all
iterations of an execution of A by $\mathcal{X}$.

When an iteration X recursively calls an iter- 
ation Y, X is called the parent of Y, and Y is 
called a child of X. The root iteration is that with 
no parent. For non-root iteration X, its parent is 
unique and is denoted by P(X). The set of the 
children of X is denoted by C(X). The parent- 
child relation between iterations forms a tree 
structure called a recursion tree, or an enumer- 
ation tree. An iteration is called a leaf iteration if 
it has no child and an inner iteration otherwise. 

For iteration X, an upper bound of the execution
time (the number of operations) of X is
denoted by T(X). Here we exclude the computation
for the output process from the computation
time. Recall that T(X) is the local execution
time and thus does not include the
computation time in the recursive calls generated
by X. For example, when $T(X) = O(n^2)$, T(X)
is written as $cn^2$ for some constant c. T* is
the maximum T(X) among all leaf iterations X.
Here, T* can be either constant or a polynomial
of the input size. If X is an inner iteration, let

\[ \overline{T}(X) = \sum_{Y \in C(X)} T(Y). \]


Key Results 


We explain methods to amortize the computation
time of iterations that require only a local
condition and give simple algorithms that
achieve nontrivial time complexity. For
enumeration algorithms, it is very hard to grasp
the global structure of the computation and of the
recursion tree; this comes from the hardness
of estimating the number of iterations in a
branch. Instead, we use local amortization
between a parent and its children. When
we go deep in a recursion tree, the number of
iterations tends to increase exponentially, while the
size of the input of each iteration often decreases.
Motivated by this observation,
we amortize the computation time by moving
the computation time of each iteration to its
children from top to bottom, so that the long
computation time on upper levels is diffused.


Amortization by Children 

Suppose that each iteration X takes O((|C(X)| + 1)T)
time. Note that this implies that a leaf iteration
takes O(T) time. Then, the total computation
time of the algorithm is

\[
O\Bigl(T \sum_{X \in \mathcal{X}} (|C(X)| + 1)\Bigr)
  = O\Bigl(T \bigl(|\mathcal{X}| + \sum_{X \in \mathcal{X}} |C(X)|\bigr)\Bigr)
  = O(T |\mathcal{X}|),
\]

where $\mathcal{X}$ is the set of all iterations; the last
equality holds since any iteration is a child of at
most one iteration. Hence, an iteration takes O(T) time
on average. Let us see an example on the following
algorithm for enumerating all subsets of
{1,...,n}.

Algorithm EnumSubset (S, x):

output S
for i := x + 1 to n: call EnumSubset (S ∪ {i}, i)


We can confirm that the algorithm correctly
enumerates all subsets without duplications, and
an iteration X takes O(|C(X)| + 1) time, except
for the output process. Without amortization, the
time complexity is O(n) for each iteration, but
the above amortization reduces it to O(1). Note
that the output process is shortened by outputting
each subset by the difference from the previously
output subset, and by this the accumulated
computation time for the output process is also bounded
by O(1) for each subset. This amortization
technique is common in many algorithms. Further,
in the enumeration of spanning trees, the time
complexity is amortized by not only the children
but also the grandchildren [3]. More
sophisticated amortization is used in [1,2] for
enumerating paths connecting two given vertices and
subtrees of size k.
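
The amortization by children can be checked on a small run. The following is a minimal Python sketch of EnumSubset, assuming the recursion starts from the empty set with x = 0; the function name `enum_subsets` and the two counters are illustrative and not part of the original algorithm. It counts iterations and loop steps so that the total loop work can be compared with the number of iterations.

```python
def enum_subsets(n):
    """Run EnumSubset on {1, ..., n}.

    Returns (subsets, iterations, loop_steps), where loop_steps counts the
    total number of loop executions over all iterations.
    """
    subsets = []
    counters = {"iterations": 0, "loop_steps": 0}

    def rec(current, x):
        counters["iterations"] += 1
        subsets.append(tuple(current))      # "output S"
        for i in range(x + 1, n + 1):       # one loop step per child
            counters["loop_steps"] += 1
            current.append(i)
            rec(current, i)
            current.pop()                   # undo, so S is shared across calls

    rec([], 0)
    return subsets, counters["iterations"], counters["loop_steps"]
```

Every loop step creates exactly one child, so the total loop work is the number of iterations minus one (the root), which is the O(1)-per-iteration bound in miniature.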


Push-Out Amortization 

When the computation time of an iteration X
is not proportional to |C(X)|, the above
amortization does not work. In such cases, push-out
amortization [4-6] can work. We amortize the
computation time by charging the computation
time of iterations near the root of the recursion
to those in the bottom levels, by recursively moving
the computation time from an iteration to its
children from top to bottom. The move is done by
the following push-out rule, in which α > 1 and
β > 0 are the constants of the push-out condition.

Push-out rule (PO rule): Suppose that
iteration X receives a computation time of
S(X) from its parent; thus X has computation
time of S(X) + T(X) in total. Then, we fix
$\frac{\beta}{\alpha-1}(|C(X)|+1)T^*$ of the computation time to X
and charge (push out) the remaining computation
time of quantity
$S(X) + T(X) - \frac{\beta}{\alpha-1}(|C(X)|+1)T^*$
to its children. Each child Z of X receives
computation time proportional to T(Z), that is,

\[
S(Z) = \Bigl(S(X) + T(X) - \frac{\beta}{\alpha-1}(|C(X)|+1)T^*\Bigr)
       \frac{T(Z)}{\overline{T}(X)}.
\]

After the moves by this rule from the top to
the bottom of the recursion tree, each inner
iteration has O((|C(X)| + 1)T*) computation time,
thus O(T*) time per iteration. Moreover, when
the following push-out condition holds for any
non-leaf iteration X, each leaf iteration receives
computation time of O(T*) from its parent; thus
the computation time per iteration is bounded by
O(T*). Suppose that α > 1 and β > 0 are two
constants.


Push-Out Condition (PO Condition)

\[ \overline{T}(X) \ge \alpha T(X) - \beta(|C(X)|+1)T^* \]

Intuitively, this means that $\overline{T}(X) \ge \alpha T(X)$ holds
after the assignment of the computation time of
$\alpha\beta(|C(X)|+1)T^*$ to children and the remaining
to itself. Thus, the computation
time of one level of recursion intuitively increases
with the depth, unless there are not so many leaf
iterations. This suggests that the total computation
time spent by middle-level iterations is relatively
short compared to that by leaf iterations.


Theorem 1 If any inner iteration of an
enumeration algorithm satisfies PO condition, the
amortized computation time of an iteration is
O(T*).


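Proof We show by induction that when we charge
computation time with PO rule, from the root
iteration to the leaf iterations, each iteration X
satisfies $S(X) \le T(X)/(\alpha - 1)$. The root
iteration receives nothing and thus satisfies this
condition. Suppose that an
iteration X satisfies it. Then, for any child Z of
X, we have

\begin{align*}
S(Z) &= \Bigl(S(X) + T(X) - \frac{\beta}{\alpha-1}(|C(X)|+1)T^*\Bigr)
        \frac{T(Z)}{\overline{T}(X)} \\
     &\le \Bigl(\frac{T(X)}{\alpha-1} + T(X) - \frac{\beta}{\alpha-1}(|C(X)|+1)T^*\Bigr)
        \frac{T(Z)}{\overline{T}(X)} \\
     &= \frac{\alpha T(X) - \beta(|C(X)|+1)T^*}{\overline{T}(X)}
        \cdot \frac{T(Z)}{\alpha-1}.
\end{align*}

Since PO condition is satisfied,
$\overline{T}(X) \ge \alpha T(X) - \beta(|C(X)|+1)T^*$. Thus,

\[
S(Z) \le \frac{\alpha T(X) - \beta(|C(X)|+1)T^*}{\overline{T}(X)}
         \cdot \frac{T(Z)}{\alpha-1}
     \le \frac{T(Z)}{\alpha-1}.
\]

Therefore, any leaf iteration receives O(T*) time
from its parent, and the statement holds.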
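
The push-out rule can be exercised numerically on a toy recursion tree. The sketch below is illustrative only: it assumes a recursion in which an iteration of size m has local time T(X) = m and two children of size m - 1, with leaves at m = 1 (so T* = 1), and it assumes the constants α = 1.5 and β = 1, for which the PO condition holds at every inner iteration. The amount fixed to an iteration is taken to be β/(α − 1) · (|C(X)| + 1) · T*, the pushed amount is clamped at zero when an iteration holds less than that, and all names (`push_out`, `run`, `stats`) are hypothetical.

```python
import math

ALPHA, BETA = 1.5, 1.0   # assumed constants with alpha > 1 and beta > 0
T_STAR = 1.0             # local time of a leaf iteration in this toy tree


def local_time(m):
    """T(X) for an iteration of size m (toy choice: linear in m)."""
    return float(m)


def push_out(m, received, stats):
    """Apply the PO rule below an iteration of size m with S(X) = received."""
    t = local_time(m)
    stats["max_s_over_t"] = max(stats["max_s_over_t"], received / t)
    total = received + t
    if m == 1:                      # leaf iteration: keeps everything it holds
        stats["kept"] += total
        stats["max_leaf_kept"] = max(stats["max_leaf_kept"], total)
        return
    children = [m - 1, m - 1]       # toy recursion: two children of size m - 1
    child_sum = sum(local_time(c) for c in children)   # \bar{T}(X)
    # PO condition for this inner iteration
    assert child_sum >= ALPHA * t - BETA * (len(children) + 1) * T_STAR
    fixed = BETA / (ALPHA - 1) * (len(children) + 1) * T_STAR
    pushed = max(0.0, total - fixed)    # clamp: never push a negative amount
    stats["kept"] += total - pushed
    stats["max_inner_kept"] = max(stats["max_inner_kept"], total - pushed)
    for c in children:              # each child receives a share proportional to T(Z)
        push_out(c, pushed * local_time(c) / child_sum, stats)


def run(root_size):
    stats = {"kept": 0.0, "max_s_over_t": 0.0,
             "max_leaf_kept": 0.0, "max_inner_kept": 0.0}
    push_out(root_size, 0.0, stats)
    return stats
```

Calling `run(12)` walks a tree of 4095 iterations. The recorded maxima can then be compared with the bounds in the proof: S(X)/T(X) never exceeds 1/(α − 1), and no computation time is lost or created by the redistribution.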


Matching Enumeration 

Let us see an example of designing algorithms
so that push-out amortization works. The
problem is the enumeration of matchings in an
undirected graph G = (V, E). A matching is an
edge set M ⊆ E such that any vertex is incident
to at most one edge in M. A straightforward
way to enumerate all matchings is to choose an
edge e and enumerate matchings including e and
enumerate matchings not including e, recursively.
This algorithm yields the time complexity of
O(|V|) for each matching.

We here consider another way for the enumeration.
We choose a vertex v of the maximum degree
and partition the problem into enumeration
of matchings including $e_1$, matchings including
$e_2$, ..., matchings including $e_k$, and matchings
including none of $e_1, \ldots, e_k$. Here $e_1, \ldots, e_k$ are
the edges incident to v. Since any matching has
at most one edge incident to v, this algorithm
is complete and makes no duplication. The
algorithm is described as follows. Note that G \ {v}
denotes the graph obtained by removing vertex v
and edges incident to v from G.


Algorithm EnumMatching (G = (V, E), M):
1 if E = ∅ then output M; return
2 choose a vertex v having the maximum degree in G
3 call EnumMatching (G \ {v}, M)
4 for each edge e = (v,u), call EnumMatching (G \ {u, v}, M ∪ {e})


G \ {u, v} is obtained from G \ {u', v} in
O(d(u) + d(u')) time, where d(u) and d(u') are
the degrees of u and u', respectively. From this,
the computation time in step 4 is bounded by the
sum of degrees of all vertices adjacent to v. Here
T(X) = c|E| for some c, except for the output
process. Note that |E| is the number of edges in
the graph given to X.

The input graph of the child generated in step
3 has |E| − d(v) edges and that in step 4 has
|E| − d(v) − d(u) + 1 edges. Thus, when d(v) <
|E|/4, we have, using d(u) ≤ d(v),
$\overline{T}(X) \ge c((|E| - d(v)) + (|E| - d(v) - d(u) + 1)) \ge 1.25c|E|$.
When d(v) ≥ |E|/4, |C(X)| ≥ |E|/4. Thus, PO condition
holds by setting α = 1.25 and choosing a certain
β. The output process can be shortened as in the subset
enumeration.


Theorem 2 Matchings of a graph can be enu- 
merated in O(1) time for each matching. 
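
The branching structure of EnumMatching can be sketched in Python as follows. This is a minimal illustration of the partition into "v unmatched" and "v matched by edge e", not an O(1)-per-matching implementation: it rebuilds the edge set at each call instead of updating the graph incrementally, and the representation (edges as pairs of vertices, no self-loops) and the name `enum_matchings` are assumptions made for the example.

```python
def enum_matchings(edges, matching=frozenset()):
    """Yield every matching of the graph given by `edges` (pairs of vertices)."""
    edges = set(edges)
    if not edges:                       # step 1: no edge left, output M
        yield matching
        return
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    v = max(degree, key=degree.get)     # step 2: a maximum-degree vertex
    incident = [e for e in edges if v in e]
    # step 3: matchings in which v is unmatched (recurse on G \ {v})
    yield from enum_matchings(edges - set(incident), matching)
    # step 4: matchings containing the edge e = (v, u) (recurse on G \ {u, v})
    for e in incident:
        u = e[0] if e[1] == v else e[1]
        rest = {f for f in edges if v not in f and u not in f}
        yield from enum_matchings(rest, matching | {e})
```

For example, `list(enum_matchings([(1, 2), (2, 3)]))` yields the empty matching and the two single-edge matchings of a path on three vertices.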
Elimination Ordering

An elimination ordering is a sequence of
elements obtained by iteratively removing an
element from an object G while keeping a property
satisfied, until the object is empty. Examples
are perfect elimination ordering and perfect 
sequence. The former is the removal sequence 
of simplicial vertices from a chordal graph, and 
the latter is the removal sequence of cliques from 
a connected chordal graph. Elimination orderings 
can be enumerated by a simple algorithm as 
follows. 


Algorithm EnumElim (G, S):

if G = ∅ then output S; return
for each element e of G that can be removed, call EnumElim (G \ {e}, S ∪ {e})


Here we assume that T(X) =  poly(|G|) 
except for output process. The decision problem 
of removing an element from G is naturally 
considered to be solved in O(poly(|G|)) time; 
thus this assumption is natural. 


Theorem 3 If any G of size larger than some
constant c has at least two removable elements,
elimination orderings are enumerated in O(1)
time for each.


Proof The statement means that each iteration
has at least two children, if its computation time is
not constant. For sufficiently large constant δ, we
always have poly(|G|) < 2poly(|G| − 1) for any
|G| > δ. This implies that PO condition always
holds for these iterations. □
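
EnumElim can be sketched generically in Python by passing the removability test as a parameter. The code below is an illustration under stated assumptions: the object is represented as a frozenset of elements, the ordering is built as a tuple, and `make_simplicial_test` is a hypothetical helper that instantiates the test for perfect elimination orderings (a vertex is removable when its remaining neighbors form a clique), with the graph given as an adjacency dictionary.

```python
def enum_elim(elements, removable, prefix=()):
    """Yield every elimination ordering of `elements`.

    `removable(e, remaining)` decides whether e may be removed next.
    """
    if not elements:                 # G is empty: output the ordering S
        yield prefix
        return
    for e in sorted(elements):
        if removable(e, elements):
            yield from enum_elim(elements - {e}, removable, prefix + (e,))


def make_simplicial_test(adjacency):
    """Removability test for perfect elimination orderings (illustrative)."""
    def is_simplicial(v, remaining):
        nbrs = [u for u in adjacency[v] if u in remaining]
        return all(b in adjacency[a]
                   for i, a in enumerate(nbrs) for b in nbrs[i + 1:])
    return is_simplicial
```

On the path a-b-c, the middle vertex is not simplicial at first, so only the four orderings that start with an endpoint are produced.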
Problem Definition


Streaming algorithms aim to summarize a large 
volume of data into a compact summary, by main- 
taining a data structure that can be incrementally 
modified as updates are observed. They allow the 
approximation of particular quantities. The AMS 
sketch is focused on approximating the sum of 
squared entries of a vector defined by a stream of 
updates. This quantity is naturally related to the 
Euclidean norm of the vector and so has many 
applications in high-dimensional geometry and in 
data mining and machine learning settings that 
use vector representations of data. 

The data structure maintains a linear projec- 
tion of the stream (modeled as a vector) with 
a number of randomly chosen vectors. These 
random vectors are defined implicitly by sim- 
ple hash functions, and so do not have to be 
stored explicitly. Varying the size of the sketch 
changes the accuracy guarantees on the resulting
estimation. The fact that the summary is a linear 
projection means that it can be updated flexibly, 
and sketches can be combined by addition or 
subtraction, yielding sketches corresponding to 
the addition and subtraction of the underlying 
vectors. 


Key Results 


The AMS sketch was first proposed by Alon, 
Matias, and Szegedy in 1996 [1]. Several re- 
finements or variants have subsequently appeared 
in the literature, for example, in the work of 
Thorup and Zhang [4]. The version presented 
here works by using hashing to map each update 
to one of t counters rather than taking the average
of t repetitions of an “atomic” sketch, as was
originally proposed. This hash-based variation is 
often referred to as the “fast AMS” summary. 


Data Structure Description 

The AMS summary maintains an array of counts 
which are updated with each arriving item. It 
gives an estimate of the $\ell_2$-norm of the vector
v that is induced by the sequence of updates.
The estimate is formed by computing the norm of
each row and taking the median of all rows. Given
parameters $\epsilon$ and $\delta$, the summary uses space
$O((1/\epsilon^2) \log (1/\delta))$ and guarantees with probability
of at least $1 - \delta$ that its estimate is within relative
$\epsilon$-error of the true $\ell_2$-norm, $\|v\|_2$.

Initially, v is taken to be the zero vector. A
stream of updates modifies v by specifying an
index i to which an update w is applied, setting
$v_i \leftarrow v_i + w$. The update weights w can be
positive or negative.

The AMS summary is represented as a compact
array C of d × t counters, arranged as d
rows of length t. In each row j, a hash function
$h_j$ maps the input domain U uniformly to
{1, 2, ..., t}. A second hash function $g_j$ maps
elements from U uniformly onto {−1, +1}. For
the analysis to hold, we require that $g_j$ is
four-wise independent. That is, over the random choice
of $g_j$ from the set of all possible hash functions,
the joint distribution of the values in $\{-1,+1\}^4$
to which any four distinct items from the domain
are mapped is uniform: each of the 16 possible
outcomes is equally likely. This can be achieved by using
polynomial hash functions of the form
$g_j(x) = 2((ax^3 + bx^2 + cx + d \bmod p) \bmod 2) - 1$,
with parameters a, b, c, d chosen uniformly from
the prime field $\mathbb{Z}_p$ (here d denotes the constant
coefficient, not the number of rows).

The sketch is initialized by picking the hash 
functions to be used and initializing the array of 
counters to all zeros. For each update operation 
to index i with weight w (which can be either 
positive or negative), the item is mapped to an 
entry in each row based on the hash functions 
h and the update applied to the corresponding 
counter, multiplied by the corresponding value 
of g. That is, for each $1 \le j \le d$, $h_j(i)$ is
computed, and the quantity $w g_j(i)$ is added to
entry $C[j, h_j(i)]$ in the sketch array. Processing
each update therefore takes time O(d), since 
each hash function evaluation takes constant 
time. 

The sketch allows an estimate of $\|v\|_2^2$, the
squared Euclidean norm of v, to be obtained.
This is found by taking the sums of the squares
of the rows of the sketch and in turn finding
the median of these sums. That is, for row j,
it computes $\sum_{k=1}^{t} C[j, k]^2$ as an estimate and
takes the median of these d estimates. The query
time is linear in the size of the sketch, O(td), as
is the time to initialize a new sketch. Meanwhile,
update operations take time O(d).
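
The update and query procedures described above can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the class name `FastAMS`, the prime `PRIME`, the seeding scheme, and the choice of a linear hash for the bucket functions $h_j$ are assumptions, buckets are numbered from 0 rather than 1, item indices are assumed to be non-negative integers smaller than `PRIME`, and the constant coefficient of the sign polynomial is called `e` to avoid clashing with the number of rows d.

```python
import random
from statistics import median

PRIME = 2**31 - 1  # a Mersenne prime, assumed larger than any item index


class FastAMS:
    """Hash-based ("fast") AMS sketch with d rows of t counters."""

    def __init__(self, d, t, seed=0):
        rng = random.Random(seed)
        self.d, self.t, self.seed = d, t, seed
        self.C = [[0] * t for _ in range(d)]
        # h_j: linear bucket hash (assumed); g_j: degree-3 polynomial sign hash
        self.h = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(d)]
        self.g = [tuple(rng.randrange(PRIME) for _ in range(4)) for _ in range(d)]

    def _bucket(self, j, i):
        a, b = self.h[j]
        return ((a * i + b) % PRIME) % self.t

    def _sign(self, j, i):
        a, b, c, e = self.g[j]
        return 2 * (((a * i**3 + b * i**2 + c * i + e) % PRIME) % 2) - 1

    def update(self, i, w):
        """Apply v_i <- v_i + w by touching one counter per row."""
        for j in range(self.d):
            self.C[j][self._bucket(j, i)] += w * self._sign(j, i)

    def estimate_f2(self):
        """Median over rows of the sum of squared counters."""
        return median(sum(c * c for c in row) for row in self.C)

    def combine(self, other, sign=1):
        """Sketch of v + sign * u; both sketches must share d, t and seed."""
        assert (self.d, self.t, self.seed) == (other.d, other.t, other.seed)
        out = FastAMS(self.d, self.t, self.seed)
        out.C = [[x + sign * y for x, y in zip(r, s)]
                 for r, s in zip(self.C, other.C)]
        return out
```

Because the sketch is linear, `s.combine(u, -1)` is a sketch of the difference of the two underlying vectors, so its estimate approximates their squared Euclidean distance. For a vector with a single nonzero entry the estimate is exact, whatever the hash functions are.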

The analysis of the algorithm follows by con- 
sidering the produced estimate as a random vari- 
able. The random variable can be shown to be 
correct in expectation: its expectation is the
desired quantity, $\|v\|_2^2$. This can be seen by
expanding the expression of the estimator. The resulting
expression has terms $\sum_i v_i^2$ but also terms of the
form $v_i v_j$ for $i \ne j$. However, these “unwanted
terms” are multiplied by either +1 or −1 with
equal probability, depending on the choice of the
hash function g. Therefore, their expectation is
zero, leaving only $\|v\|_2^2$. To show that the estimate
is likely to fall close to its expectation, we also analyze
the variance of the estimator and use Chebyshev’s
inequality to argue that with constant probability,
each estimate is close to the desired value.
Then, taking the median of sufficient repetitions
amplifies this constant probability to be close to
certainty.
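
The expectation argument can be written out for a single row j. The derivation below is a sketch that assumes $g_j$ is chosen independently of $h_j$; it uses only the pairwise independence of the signs, which is implied by four-wise independence.

```latex
\begin{align*}
C[j,k] &= \sum_{i \,:\, h_j(i) = k} g_j(i)\, v_i, \\
\sum_{k=1}^{t} C[j,k]^2
  &= \sum_{i} g_j(i)^2\, v_i^2
   + \sum_{\substack{i \ne i' \\ h_j(i) = h_j(i')}} g_j(i)\, g_j(i')\, v_i\, v_{i'}, \\
\mathbb{E}\Bigl[\sum_{k=1}^{t} C[j,k]^2\Bigr]
  &= \sum_{i} v_i^2 + 0 = \|v\|_2^2,
\end{align*}
```

since $g_j(i)^2 = 1$ and $\mathbb{E}[g_j(i)\, g_j(i')] = 0$ for $i \ne i'$.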

This analysis shows that the estimate is
between $(1 - \epsilon)\|v\|_2^2$ and $(1 + \epsilon)\|v\|_2^2$. Taking the
square root of the estimate gives a result that
is between $(1 - \epsilon)^{1/2}\|v\|_2$ and $(1 + \epsilon)^{1/2}\|v\|_2$,
which, to first order in $\epsilon$, means it is between
$(1 - \epsilon/2)\|v\|_2$ and $(1 + \epsilon/2)\|v\|_2$.

Note that since the updates to the AMS sketch 
can be positive or negative, it can be used to mea- 
sure the Euclidean distance between two vectors 
v and u: we can build an AMS sketch of v and 
one of —u and merge them together by adding the 
sketches. Note also that a sketch of —u can be 
obtained from a sketch of u by negating all the 
counter values. 


Applications 


The sketch can also be applied to estimate the 
inner product between a pair of vectors. A similar 
analysis shows that the inner product of cor- 
responding rows of two sketches (formed with 
the same parameters and using the same hash 
functions) is an unbiased estimator for the inner 
product of the vectors. This use of the sum- 
mary to estimate the inner product of vectors 
was described in a follow-up work by Alon, 
Matias, Gibbons, and Szegedy [2], and the anal- 
ysis was similarly generalized to the fast version 
by Cormode and Garofalakis [3]. The ability to 
capture norms and inner products in Euclidean 
space means that these sketches have found many 
applications in settings where there are high- 
dimensional vectors, such as machine learning 
and data mining. 


URLs to Code and Data Sets 


Sample implementations are widely available in 
a variety of languages. 


C code is given by the MassDAL code 
bank: http://www.cs.rutgers.edu/~muthu/ 
massdalcode-index.html. 
C++ code given by Marios Hadjieleftheriou
is available at http://hadjieleftheriou.com/ 
sketches/index.html. 
Problem Definition


Multicore processors are commonly equipped 
with one or more levels of cache memory, some 
of which are shared among two or more cores. 
Multiple cores compete for the use of shared 
caches for fast access to their program’s data, 
with the cache usage patterns of a program
running on one core possibly affecting the cache
performance of programs running on other cores.


Paging 
The management of data across the various levels 
of the memory hierarchy of modern computers is 
abstracted by the paging problem. Paging models 
a two-level memory system with a small and fast 
memory — known as cache — and a large and 
slow memory. Data is transferred between the 
two levels of memory in units known as pages. 
The input to the problem is a sequence of page 
requests that must be made available in cache 
as they are requested. If the currently requested 
page is already present in the cache, then this 
is known as a hit. Otherwise a fault occurs, and 
the requested page must be brought from slow 
memory to cache, possibly requiring the eviction 
of a page currently residing in the cache. An 
algorithm for this problem must decide, upon 
each request that results in a fault with a full 
cache, which page to evict in order to minimize 
the number of faults. Since the decision of which 
page to evict must be taken without information 
of future requests, paging is an online problem. 

The most popular framework to analyze the 
performance of online algorithms is competitive 
analysis [10]: an algorithm A for a minimization 
problem is said to be c-competitive if its cost 
is at most c times that of an optimal algorithm 
that knows the input in advance. Formally, let 
A(r) and OPT(r) denote the costs of A and the 
optimal algorithm OPT on an input r. Then A 
is c-competitive if for all inputs r, A(r) ≤ c ·
OPT(r) + β, where β is a constant that does not
depend on r. The infimum of all such values c is 
known as A’s competitive ratio. 

Traditional paging algorithms, like least
recently used (LRU), which evicts the page currently
in cache that was least recently accessed, or
first-in-first-out (FIFO), which evicts the page currently
in the cache that was brought in the earliest,
have an optimal competitive ratio equal to the
cache size. Other optimal eviction policies are
flush-when-full (FWF) and Clock (see [2] for
definitions).
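
The two eviction policies just described can be sketched in Python as fault counters for the classic single-sequence paging problem. The function names `lru_faults` and `fifo_faults` are illustrative; each takes a request sequence and a cache size k and returns the number of faults.

```python
from collections import OrderedDict, deque


def lru_faults(requests, k):
    """Count faults of LRU on `requests` with a cache of k pages."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)       # hit: refresh recency
            continue
        faults += 1
        if len(cache) == k:
            cache.popitem(last=False)     # evict the least recently used page
        cache[page] = True
    return faults


def fifo_faults(requests, k):
    """Count faults of FIFO on `requests` with a cache of k pages."""
    cache, order = set(), deque()
    faults = 0
    for page in requests:
        if page in cache:                 # hit: FIFO order is not changed
            continue
        faults += 1
        if len(cache) == k:
            cache.discard(order.popleft())  # evict the page brought in earliest
        cache.add(page)
        order.append(page)
    return faults
```

On a cyclic sequence over k + 1 pages, such as 1, 2, 3, 4, 1, 2, 3, 4 with k = 3, both policies fault on every request, which is the kind of input behind their competitive ratio of k.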


Paging in Multicore Caches 

The paging problem described above can be ex- 
tended to model several programs running simul- 
taneously with a shared cache. For a multicore 
system with p cores sharing one cache, the mul- 
ticore paging problem consists of a set r of p 
request sequences $r_1, \ldots, r_p$ to be served with one
shared cache of size k pages. At any timestep, 
at most p requests from different processors can 
arrive and must be served in parallel. A paging 
algorithm must decide which pages to evict when 
a fault occurs on a full cache. 

The general model we consider for this prob- 
lem was proposed by Hassidim [6]. This model 
defines the fetching time τ of a page as the
ratio between the cost of a cache miss and that
of a cache hit. A
sequence of requests that suffers a page fault
must wait τ timesteps for the page to be fetched
into the cache, while other sequences that in- 
cur hits can continue to be served. In addition, 
paging algorithms can decide on the schedule of 
request sequences, choosing to serve a subset of 
the sequences and delay others. In this problem, 
the goal of a paging algorithm is to minimize 
the makespan. López-Ortiz and Salinger [8] proposed
a slightly different model in which paging
algorithms are not allowed to make schedul- 
ing decisions and must serve requests as they 
arrive. Furthermore, instead of minimizing the 
makespan, they propose two different goals: min- 
imize the number of faults and decide if each 
of the sequences can be served with a num- 
ber of faults below a given threshold. We con- 
sider both these settings here and the following 
problems: 


Definition 1 (Min-Makespan) Given a set r of
request sequences $r_1, \ldots, r_p$ to be served with a
cache of size k, minimize the timestep at which
the last request among all sequences is served.


Definition 2 (Min-Faults) Given a set r of
requests $r_1, \ldots, r_p$ to be served with a cache of
size k, minimize the total number of faults when
serving r.


Definition 3 (Partial-Individual-Faults) Given
a set r of requests $r_1, \ldots, r_p$ to be served with a
cache of size k, a timestep t, and a bound $b_i$ for
each sequence, decide whether r can be served
such that at time t the number of faults on $r_i$ is at
most $b_i$ for all $1 \le i \le p$.


Key Results 


Online Paging
For both the models of Hassidim and López-Ortiz
and Salinger, no online algorithm has been shown
to be competitive, while traditional algorithms
that are competitive in the classic paging setting
are not competitive in the multicore setting.
Hassidim shows that LRU and FIFO have a
competitive ratio in the Min-Makespan problem of
Θ(τ), which is the worst possible for any online
algorithm in this problem.

In the following, k is the size of the shared 
cache of an online algorithm, and h is the size of 
the shared cache of the optimal offline. 


Theorem 1 ([6]) For any α > 1, the competitive
ratio of LRU (or FIFO) is Θ(τ/α), when h =
k/α. In particular, if we give LRU a constant
factor resource augmentation, the ratio is Θ(τ).
There is a setting with this ratio with just ⌈α⌉ + 1
cores.


The bad competitive ratio stems from the abil- 
ity of the offline algorithm to schedule sequences 
one after the other so that each sequence can
use the entire cache. Meanwhile, LRU or FIFO 
will try to serve all sequences simultaneously, not 
having enough cache to satisfy the demands of 
any sequence. A similar result is shown in [8] for 
the Min-Faults problem, even in the case in which 
the optimal offline cannot explicitly schedule the 
input sequences. In this case, given a set of 
request sequences that alternate periods of high 
and low cache demand, the optimal offline algo- 
rithm can delay some sequences through faults in 
order to align periods of high demands of some
sequences with periods of low demands of others 
and with a total cache demand below capacity. 
As in the previous lower bound, traditional on- 
line algorithms will strive to serve all sequences 
simultaneously, incurring only faults in periods of 
high demand. 


Theorem 2 ([8]) Let A be any of LRU, FIFO,
Clock, or FWF, let p = 4, let n be the total
length of request sequences, and assume τ > 1.
The competitive ratio of A is at least $\Omega(\sqrt{n\tau/k})$
when the optimal offline’s cache is $h \ge k/2 + 3p/2$.
If A has no resource augmentation, the
competitive ratio is at least $\Omega(\sqrt{n\tau p/k})$.


These results shed light on the characteristics
required by online policies to achieve better
competitive ratios. López-Ortiz and Salinger
analyzed paging algorithms for Min-Faults,
separating the cache partition and the eviction policy
aspects. They defined partitioned strategies as
those that give a portion of the cache to each
core and serve the request sequences with a given
eviction policy exclusively with the given part of
the cache. The partition can be static or dynamic.
They also define shared strategies as those in
which all requests are served with one eviction
policy using a global cache. The policies
considered in Theorems 1 and 2 above are examples of
shared strategies.

If a cache partition is determined externally by 
a scheduler or operating system, then traditional 
eviction policies can achieve a good performance 
when compared to the optimal eviction policy 
with the same partition. More formally, 


Theorem 3 Let A be any marking or conservative
paging algorithm and B be any dynamically
conservative algorithm [9] (these classes include
LRU, FIFO, and Clock). Let S and D be any
static and dynamic partition functions and let
$\mathrm{OPT}_S$ and $\mathrm{OPT}_D$ denote the optimal eviction
policies given S and D, respectively. Then, for
all inputs r, $A(r) \le k \cdot \mathrm{OPT}_S(r)$ and
$B(r) \le pk \cdot \mathrm{OPT}_D(r)$.


The result above relies on a result by Peserico 
[9] which states that dynamically conservative 
policies are k-competitive when the size of the 
cache varies throughout the execution of the
cache instance. 

When considering a strategy as a partition plus 
eviction policy, it should not be a surprise that 
a strategy involving a static partition cannot be 
competitive. In fact, even a dynamic partition that 
does not change the sizes of the parts assigned 
to its cores often enough cannot be competitive. 
There are sequences for which the optimal static 
partition with the optimal paging policy in each 
part can incur a number of faults that is arbitrarily 
large compared to an online shared strategy using 
LRU. A similar result applies to dynamic parti- 
tions that change a sublinear number of times. 
These results suggest that in order to be com- 
petitive, an online strategy needs to be either 
shared or partitioned with a partition that changes 
often. 


Offline Paging 

We now consider the offline multicore paging 
problem. Hassidim shows that Min-Makespan is
NP-hard for k = p/3, and [7] extends this to
arbitrary k and p. In the model without scheduling
of [8], Partial-Individual-Faults, a variant of the
fault minimization problem, is also shown to be
NP-hard. It is not known, however, whether
Min-Faults is NP-hard as well. Interestingly,
Partial-Individual-Faults remains NP-hard when τ = 1
(and hence a fault does not delay the affected 
sequence with respect to other sequences). In 
contrast, in this case, Min-Faults can be solved 
simply by evicting the page that will be requested 
furthest in the future, as in classic paging. On 
the positive side, the following property holds for 
both Min-Makespan and Min-Faults (on disjoint 
sequences). 


Theorem 4 There exist optimal algorithms for 
Min-Makespan and Min-Faults that, upon a fault, 
evict the page that is the furthest in the future for 
some sequence. 
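
For reference, the furthest-in-the-future rule for classic single-sequence paging can be sketched in Python as below. This illustrates only the classic offline policy; it does not solve the multicore problems, where the choice of the sequence whose page is evicted remains the hard part. The function name and the linear scan for the next use are illustrative choices that favor clarity over speed.

```python
def furthest_in_future_faults(requests, k):
    """Count faults of the offline furthest-in-the-future policy."""
    cache, faults = set(), 0
    for pos, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                # position of the next request to q, or infinity if none
                for later in range(pos + 1, len(requests)):
                    if requests[later] == q:
                        return later
                return float("inf")
            cache.remove(max(cache, key=next_use))  # evict the furthest page
        cache.add(page)
    return faults
```

On the cyclic sequence 1, 2, 3, 4, 1, 2, 3, 4 with k = 3, this policy incurs 5 faults, whereas an online policy such as LRU faults on all 8 requests.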


This result implies that multicore paging re- 
duces to determining the optimal dynamic par- 
tition of the cache: upon a fault, the part of the 
cache of one sequence is reduced (unless this 
sequence is the same as the one which incurred 
the fault), and the page whose next request is
furthest in the future in this sequence should be
evicted.

Finally, in the special case of a constant number
of processors p and constant delay τ,
Min-Makespan admits a polynomial time
approximation scheme (PTAS), while Min-Faults and
Partial-Individual-Faults admit exact polynomial 
time algorithms. 


Theorem 5 ([6]) There exists an algorithm that,
given an instance of Min-Makespan with optimal
makespan m, returns a solution with makespan
(1 + ε)m. The running time is exponential in p,
τ, and 1/ε.


Theorem 6 ([8]) An instance of Min-Faults with
p requests of total length n, with p = O(1) and
τ = O(1), can be solved in time $O(n^{k+p}\tau^{p})$.


Theorem 7 ([8]) An instance of
Partial-Individual-Faults with p requests of total length
n, with p = O(1) and τ = O(1), can be solved
in time $O(n^{k+2p+1}\tau^{p+1})$.


Other Models 

Paging with multiple sequences with a shared 
cache has also been studied in other models 
[1,3-5], even prior to multicores. In these models,
request sequences may be interleaved; however, 
only one request is served at a time and all 
sequences must wait upon a fault affecting one 
sequence. 

In the application-controlled model of Cao 
et al. [3], each process has full knowledge of its 
request sequence, while the offline algorithm also 
knows the interleaving of requests. As opposed to 
the models in [6, 8], the interleaving is fixed and 
does not depend on the decisions of algorithms. 
It has been shown that for p sequences and a 
cache of size k, no online deterministic algorithm 
can have a competitive ratio better than p + 1 
in the case where sequences are disjoint [1] and 
B log (“4) otherwise [7]. On the other hand, 
there exist algorithms with competitive ratios 
2(p + 1) [1,3] and max{10, p + 1} [7] for the 
disjoint case, and 2p(In(ek/p) + 1) [7] for the 
shared case. 

Open Problems


Open problems in multicore paging are finding 
competitive online algorithms, determining 
the exact complexity of Min-Faults, obtaining 
approximation algorithms for Min-Makespan for 
a wider range of parameters, and obtaining faster 
exact offline algorithms for Min-Faults and 
Partial-Individual-Faults. Another challenge in 
multicore paging is concerned with modeling 
the right features of the multicore architecture 
while enabling the development of meaningful 
algorithms. Factors to consider are cache 
coherence, limited parallelism in other shared 
resources (such as bus bandwidth), different 
cache associativities, and others. 

Problem Definition


The problem considered here is multiple
sequence access via cache memory. Consider
the following pattern of memory accesses. k
sequences of data, which are stored in disjoint
arrays and have a total length of N, are accessed
as follows:

    for t := 1 to N do
        select a sequence s_t ∈ {1, …, k}
        work on the current element of sequence s_t
        advance sequence s_t to the next element.

The aim is to obtain exact (not just asymp- 
totic) closed form upper and lower bounds for 
this problem. Concurrent accesses to multiple 
sequences of data are ubiquitous in algorithms. 
Some examples of algorithms which use this 
paradigm are distribution sorting, k-way merg- 
ing, priority queues, permuting, and FFT. This 
entry summarizes the analyses of this problem in 
[5, 8]. 


Analyzing Cache Misses 


Caches, Models, and Cache Analysis 

Modern computers have hierarchical memory 
which consists of registers, one or more levels 
of caches, main memory, and external memory 
devices such as disks and tapes. Memory size 
increases, but the speed decreases with distance 
from the CPU. Hierarchical memory is designed 
to improve the performance of algorithms by 
exploiting temporal and spatial locality in data 
accesses. 

Caches are modeled as follows. A cache has 
m blocks each of which holds B data elements. 
The capacity of the cache is M = mB. Data is 
transferred between one level of cache and the 
next larger and slower memory in blocks of B 
elements. A cache is organized as s = m/a sets 
where each set consists of a blocks. Memory at 
address xB, referred to as memory block x, can 
only be placed in a block in set x mod s. If 
a = 1, the cache is said to be direct mapped, and
if a = m, it is said to be fully associative.

If memory block x is accessed and it is not in 
cache, then a cache miss occurs, and the data in 
memory block x is brought into cache, incurring 
a performance penalty. In order to accommodate 
block x, it is assumed that the least recently used 
(LRU) or the first used (FIFO) block from the 
cache set x mod s is evicted, and this is referred 
to as the replacement strategy. Note that a block 
may be evicted from a set, even though there may 
be unoccupied blocks in other sets. 

Cache analysis is performed for the number of
cache misses for a problem with N data elements.
To read or write N data elements, an algorithm
must incur Ω(N/B) cache misses. These are the
compulsory or first reference misses. In the multiple
sequence access via cache memory problem,
for given values of M and B, one aim is to find
the largest k such that there are O(N/B) cache
misses for the N data accesses. It is interesting
to analyze cache misses for the important case of
direct mapped cache and for the general case of
set-associative caches.

A large number of algorithms have been 
designed on the external memory model [11], 
and these algorithms optimize the number of 
data transfers between main memory and disk. 
It seems natural to exploit these algorithms 
to minimize cache misses, but due to the
limited associativity of caches, this is not
straightforward. In the external memory model,
data transfers are under programmer control,
and the multiple sequence access problem has
a trivial solution. The algorithm simply chooses
k ≤ M_e/B_e, where B_e is the block size and
M_e is the capacity of the main memory in the
external memory model. For k ≤ M_e/B_e, there
are O(N/B_e) accesses to external memory. Since
caches are hardware controlled, the problem 
becomes nontrivial. For example, consider the 
case where the starting addresses of k > a equal 
length sequences map to the ith element of the 
same set, and the sequences are accessed in a 
round-robin fashion. On a cache with an LRU or 
FIFO replacement strategy, all sequence accesses 
will result in a cache miss. Such pathological 
cases can be overcome by randomizing the 
starting addresses of the sequences. 
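
The pathological case described above can be reproduced with a small simulation. The sketch below models an a-way set-associative cache with LRU replacement exactly as defined earlier (memory block x may only be placed in set x mod s); the function names, cache parameters, and starting addresses are illustrative assumptions, and the shifted starting addresses are hand-picked rather than randomized.

```python
from collections import OrderedDict


def count_misses(addresses, m, a, B):
    """Count misses of an a-way set-associative LRU cache with m blocks
    of B elements each, for a stream of element addresses."""
    s = m // a                                # number of cache sets
    sets = [OrderedDict() for _ in range(s)]
    misses = 0
    for addr in addresses:
        block = addr // B                     # memory block x
        lru = sets[block % s]                 # x may only live in set x mod s
        if block in lru:
            lru.move_to_end(block)            # mark as most recently used
        else:
            misses += 1
            if len(lru) >= a:
                lru.popitem(last=False)       # evict the least recently used
            lru[block] = True
    return misses


def round_robin(starts, length):
    """Yield addresses when equal-length sequences are scanned round-robin."""
    for offset in range(length):
        for start in starts:
            yield start + offset


if __name__ == "__main__":
    m, a, B = 8, 2, 4                         # s = 4 sets
    # k = 3 > a sequences whose starts all map to the same set: every access misses.
    print(count_misses(round_robin([0, 16, 32], 8), m, a, B))
    # Shifting the starts spreads the sequences over different sets.
    print(count_misses(round_robin([0, 20, 40], 8), m, a, B))
```

With the aligned starting addresses all 24 accesses miss, whereas the shifted ones incur only the 24/B = 6 compulsory misses.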


Related Problems
A very closely related problem is where accesses
to the sequences are interleaved with accesses
to a small working array. This occurs in
applications such as distribution sorting or matrix
multiplication.

Caches can emulate external memory with an
optimal replacement policy [3, 10]; however, this
requires some constant factor more memory.
Since the emulation techniques are software
controlled and require modification to the
algorithm, rather than selection of parameters,
they work well for fairly simple algorithms [6].


Key Results


Theorem 1 ([5]) Given an a-way set-associative
cache with m cache blocks, s = m/a cache
sets, cache block size B, and LRU or FIFO
replacement strategy. Let U_a denote the expected
number of cache misses in any schedule of
N sequential accesses to k sequences with
starting addresses that are at least (a + 1)-wise
independent:

U_1 \le k + \frac{N}{B}\left(1 + (B-1)\frac{k}{m}\right),   (1)

where α = α(a) = a/(a!)^{1/a},
P_{tail}(n, p, a) = \sum_{i \ge a} \binom{n}{i} p^i (1-p)^{n-i}
is the cumulative binomial probability, and
β := 1 + α(⌈ax⌉) where x = x(a) =
inf{0 < z < 1 : z + z/α(⌈az⌉) = 1}.


Here 1 ≤ α < e and β(1) = 2, β(∞) =
1 + e ≈ 3.71. This analysis assumes that an
adversary schedules the accesses to the sequences.
For the lower bound, the adversary initially
advances sequence s_i, for i = 1, …, k, by X_i
elements, where the X_i are chosen uniformly and
independently from {0, …, M − 1}. The adversary
then accesses the sequences in a round-robin
manner.
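
The range of the constant α can be checked numerically. The sketch below evaluates α(a) = a/(a!)^{1/a} as defined in Theorem 1; the function name is an assumption, and the log-gamma function is used only to avoid overflow for large a.

```python
import math


def alpha(a):
    """alpha(a) = a / (a!)^(1/a), computed via lgamma(a + 1) = ln(a!)."""
    return a / math.exp(math.lgamma(a + 1) / a)


if __name__ == "__main__":
    for a in (1, 2, 4, 8, 64, 10**6):
        print(a, alpha(a))
```

The values start at α(1) = 1 for a direct-mapped cache and increase toward e as the associativity grows, by Stirling's approximation.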

The k in the upper bound accounts for a
possible extra block that may be accessed due
to randomization of the starting addresses. The
−kM term in the lower bound accounts for the
fact that cache misses cannot be counted when
the adversary initially winds the sequences
forward.

The bounds are of the form pN + c, where c
does not depend on N and p is called the cache
miss probability. Letting r = k/m, the ratio
between the number of sequences and the number
of cache blocks, the bounds for the cache miss
probabilities in Theorem 1 become [5]

p_1 \le \frac{1}{B}\bigl(1 + (B-1)r\bigr),   (7)

p_1 \ge \frac{1}{B}\Bigl(1 + (B-1)\frac{r}{1+r}\Bigr),   (8)

p_a \le \frac{1}{B}\bigl(1 + (B-1)(r\alpha)^a + r\alpha + ar\bigr),   (9)

p_a \le \frac{1}{B}\bigl(1 + (B-1)(r\beta)^a + r\beta\bigr) \quad \text{for } r \le \frac{1}{2\beta},   (10)

p_a \ge \frac{1}{B}\Bigl(1 + (B-1)(r\alpha)^a \bigl(1 - \tfrac{1}{s}\bigr)^k\Bigr).   (11)



The 1/B term accounts for the compulsory or
first reference miss, which must be incurred in
order to read a block of data from a sequence.
The remaining terms account for conflict misses,
which occur when a block of data is evicted from
cache before all its elements have been scanned.
Conflict misses can be reduced by restricting
the number of sequences. As r approaches zero,
the cache miss probabilities approach 1/B. In
general, inequality (4) states that the number of
cache misses is O(N/B) if r ≤ 1/(2β) and
(B − 1)(rβ)^a = O(1). Both of these conditions
are satisfied if k ≤ m/max(B^{1/a}, 2β). So,
there are O(N/B) cache misses provided k =
O(m/B^{1/a}).
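
To illustrate how such a bound guides the choice of k, the sketch below uses the direct-mapped upper bound p_1 ≤ (1/B)(1 + (B − 1)r) with r = k/m and finds the largest k for which the bound stays within a constant factor c of the compulsory rate 1/B. The helper names, the factor c, and the cache parameters are assumptions made for illustration only.

```python
from fractions import Fraction


def p1_upper(k, m, B):
    """Upper bound (1/B)(1 + (B - 1) r) on the miss probability, r = k/m."""
    r = Fraction(k, m)
    return (1 + (B - 1) * r) / B


def max_sequences(m, B, c=2):
    """Largest k in 1..m whose bound is at most c/B (0 if there is none)."""
    best = 0
    for k in range(1, m + 1):
        if p1_upper(k, m, B) <= Fraction(c, B):
            best = k
    return best


if __name__ == "__main__":
    # Hypothetical direct-mapped cache: m = 1024 blocks of B = 8 elements.
    print(max_sequences(1024, 8))
```

For c = 2 the condition reduces to r ≤ 1/(B − 1), so the admissible number of sequences shrinks as the block size grows.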

The analysis shows that for a direct-mapped
cache, where a = 1, the upper bound is a factor
of r + 1 above the lower bound. For a ≥ 2,
the upper bounds and lower bounds are close if
(1 − 1/s)^k ≈ 1 and (α + a)r ≪ 1, and both these
conditions are satisfied if k ≪ s.

Rahman and Raman [8] obtain closer up- 
per and lower bounds for average case cache 
misses assuming the sequences are accessed uni- 
formly randomly on a direct-mapped cache. Sen 
and Chatterjee [10] also obtain upper and lower 
bounds assuming the sequences are randomly 
accessed. Ladner, Fix, and LaMarca have ana- 
lyzed the problem on direct-mapped caches on 
the independent reference model [4]. 


Multiple Sequence Access with Additional 
Working Set 

As stated earlier, in many applications, accesses
to sequences are interleaved with accesses to an
additional data structure, a working set, which
determines how a sequence element is to be
treated. Assuming that the working set has size
at most sB and is stored in contiguous memory
locations, the following is an upper bound on the
number of cache misses:


Theorem 2 ([5]) Let U_a denote the bound on the
number of cache misses in Theorem 1 and define
U_0 = N. With the working set occupying w
conflict-free memory blocks, the expected number
of cache misses arising in the N accesses to the
sequence data, and any number of accesses to the
working set, is bounded by w + (1 − w/s)U_a +
2(w/s)U_{a−1}.




On a direct-mapped cache, for i = 1, …, k, if
sequence i is accessed with probability p_i
independently of all previous accesses and is followed
by an access to element i of the working set, then
the following are upper and lower bounds for the
number of cache misses:


Theorem 3 ([8]) In a direct-mapped cache with
m cache blocks, each of B elements, if sequence
i, for i = 1, …, k, is accessed with probability
p_i and block j of the working set, for j =
1, …, k/B, is accessed with probability P_j, then
the expected number of cache misses in N
sequence accesses is at most N(p_s + p_w) + k(1 +
1/B), where


Theorem 4 ([8]) In a direct-mapped cache with
m cache blocks, each of B elements, if sequence
i, for i = 1, …, k, is accessed with probability
p_i = 1/m, then the expected number of cache
misses in N sequence accesses is at least N p_s +
k, where


1 kQm—k k(k —3m 1 k B(k —m) + 2m — 3k 
Ps = =F ( 2 c 2 _ — 2 ae 2 3 3 2h 
B 2m 2Bm 2Bm 2B*m Bm =e Di + Pj 
k k ok 
(B-1)? pil =p;) B=1 Pi _—B 
+ - = O(e ). 
B3m? apy (Di + 2 a2 rarer 


The lower bound ignores the interaction with 
the working set, since this can only increase the 
number of cache misses. 


In Theorems 3 and 4, p_s is the probability
of a cache miss for a sequence access, and in
Theorem 3, p_w is the probability of a cache miss
for an access to the working set.

If the sequences are accessed uniformly randomly,
then using Theorems 3 and 4, the ratio
between the upper and lower bound is 3/(3 − r),
where r = k/m. So for uniformly random data,
the lower bound is within a factor of about 3/2 of
the upper bound when k ≤ m and is much closer
when k ≪ m.


Applications 


Numerous algorithms have been developed on the
external memory model which access multiple
sequences of data, such as merge sort, distribution
sort, priority queues, and radix sorting. These
analyses are important as they allow initial
parameter choices to be made for cache memory
algorithms.


Open Problems 


The analyses assume that the starting addresses 
of the sequences are randomized, and current ap- 
proaches to allocating random starting addresses 
waste a lot of virtual address space [5]. An open 
problem is to find a good online scheme to ran- 
domize the starting addresses of arbitrary length 
sequences. 


Experimental Results 


The cache model is a powerful abstraction of real 
caches; however, modern computer architectures 
have complex internal memory hierarchies, with 
registers, multiple levels of caches, and translation
lookaside buffers (TLB). Cache miss penalties
are not of the same magnitude as the cost
of disk accesses, so an algorithm may perform 
better by allowing conflict misses to increase in 
order to reduce computation costs and compul- 
sory misses, by reducing the number of passes 
over the data. This means that in practice, cache 
analysis is used to choose an initial value of k 
which is then fine-tuned for the platform and 
algorithm [1,2,6,7,9, 12, 13]. 

For distribution sorting, in [6], a heuristic was 
considered for selecting k, and equations for 
approximate cache misses were obtained. These 
equations were shown to be very accurate in 
practice. 


Cross-References 


Cache-Oblivious Model 
Cache-Oblivious Sorting 
External Sorting and Permuting 
I/O-Model


Recommended Reading 


1. Bertasi P, Bressan M, Peserico E (2011) Psort, yet 
another fast stable sorting software. ACM J Exp 
Algorithmics 16:Article 2.4 

2. Bingmann T, Sanders P (2013) Parallel string sample 
sort. In: Proceedings of the 21st European symposium 
on algorithms (ESA’13), Sophia Antipolis. Springer,
pp 169-180 

3. Frigo M, Leiserson CE, Prokop H, Ramachandran S 
(1999) Cache-oblivious algorithms. In: Proceedings 
of the 40th annual symposium on foundations of com- 
puter science (FOCS’99), New York. IEEE Computer 
Society, Washington, DC, pp 285-297 

4. Ladner RE, Fix JD, LaMarca A (1999) Cache perfor- 
mance analysis of traversals and random accesses. In: 
Proceedings of the 10th annual ACM-SIAM sympo- 
sium on discrete algorithms (SODA’99), Baltimore. 
Society for Industrial and Applied Mathematics, 
Philadelphia, pp 613-622 

5. Mehlhorn K, Sanders P (2003) Scanning multiple 
sequences via cache memory. Algorithmica 35:75-93

6. Rahman N, Raman R (2000) Analysing cache effects 
in distribution sorting. ACM J Exp Algorithmics 
5:Article 14 

7. Rahman N, Raman R (2001) Adapting radix sort 
to the memory hierarchy. ACM J Exp Algorithmics 
6:Article 7 




8. Rahman N, Raman R (2007) Cache analysis of
non-uniform distribution sorting algorithms.
http://www.citebase.org/abstract?id=oai:arXiv.org:0706.2839.
Accessed 13 Aug 2007. Preliminary version in:
Proceedings of the 8th annual European symposium
on algorithms (ESA’00), Saarbrücken. Lecture
notes in computer science, vol 1879. Springer,
Berlin/Heidelberg, pp 380-391 (2000)

9. Sanders P (2000) Fast priority queues for cached 
memory. ACM J Exp Algorithmics 5:Article 7 

10. Sen S, Chatterjee S (2000) Towards a theory of cache- 
efficient algorithms. In: Proceedings of the 11th an- 
nual ACM-SIAM symposium on discrete algorithms 
(SODA’00), San Francisco. Society for Industrial and
Applied Mathematics, pp 829-838 

11. Vitter JS (2001) External memory algorithms and 
data structures: dealing with massive data. ACM 
Comput Surv 33:209-271

12. Wassenberg J, Sanders P (2011) Engineering a
multicore radix sort. In: Proceedings of the 17th
international conference, Euro-Par (2) 2011, Bordeaux.
Springer, pp 160-169

13. Wickremesinghe R, Arge L, Chase JS, Vitter JS 
(2002) Efficient sorting using registers and caches. 
ACM J Exp Algorithmics 7:9 


Applications of Geometric Spanner 
Networks 


Joachim Gudmundsson¹,², Giri Narasimhan³,⁴,
and Michiel Smid⁵

¹DMiST, National ICT Australia Ltd,
Alexandria, Australia
²School of Information Technologies, University
of Sydney, Sydney, NSW, Australia
³Department of Computer Science, Florida
International University, Miami, FL, USA
⁴School of Computing and Information
Sciences, Florida International University,
Miami, FL, USA
⁵School of Computer Science, Carleton
University, Ottawa, ON, Canada


Keywords 
Approximation algorithms; Cluster graphs;
Dilation; Distance oracles; Shortest paths;
Spanners




Years and Authors of Summarized 
Original Work 


2002; Gudmundsson, Levcopoulos, Narasimhan, 
Smid 

2005; Gudmundsson, Narasimhan, Smid 

2008; Gudmundsson, Levcopoulos, Narasimhan, 
Smid 


Problem Definition 


Given a geometric graph in d-dimensional space, 
it is useful to preprocess it so that distance 
queries, exact or approximate, can be answered 
efficiently. Algorithms that can report distance 
queries in constant time are also referred to as 
“distance oracles.” With unlimited preprocessing 
time and space, it is clear that exact distance 
oracles can be easily designed. This entry sheds 
light on the design of approximate distance 
oracles with limited preprocessing time and space 
for the family of geometric graphs with constant 
dilation. 


Notation and Definitions 

If p and q are points in R^d, then the notation |pq|
is used to denote the Euclidean distance between
p and q; the notation δ_G(p, q) is used to denote
the Euclidean length of a shortest path between
p and q in a geometric network G. Given a
constant t > 1, a graph G with vertex set S is
a t-spanner for S if δ_G(p, q) ≤ t|pq| for any
two points p and q of S. A t-spanner network
is said to have dilation (or stretch factor) t. A
(1 + ε)-approximate shortest path between p and
q is defined to be any path in G between p and
q having length Δ, where δ_G(p, q) ≤ Δ ≤
(1 + ε)δ_G(p, q). For a comprehensive overview of
geometric spanners, see the book by Narasimhan
and Smid [14].
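
The definition of dilation can be made concrete with a brute-force computation. The sketch below is only an illustration under simple assumptions (distinct points, a connected graph, edges given as index pairs); it uses all-pairs shortest paths in cubic time and is unrelated to the efficient algorithms summarized in this entry. The function name and the example graph are hypothetical.

```python
import math


def dilation(points, edges):
    """Smallest t such that the graph is a t-spanner for the given points:
    the maximum over all pairs of (shortest-path length) / (Euclidean distance)."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v in edges:
        w = math.dist(points[u], points[v])   # edge weight = Euclidean length
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in range(n):                        # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return max(dist[i][j] / math.dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(dilation(square, cycle))            # diagonal pairs are the worst case
```

For the unit square connected as a 4-cycle, opposite corners are at Euclidean distance √2 but graph distance 2, so the dilation is √2.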

All networks considered in this entry are sim- 
ple and undirected. The model of computation 
used is the traditional algebraic computation tree 
model with the added power of indirect address- 
ing. In particular, the algorithms presented here 
do not use the non-algebraic floor function as a 
unit-time operation. The problem is formalized
below. 


Problem 1 (Distance Oracle) Given an arbitrary
real constant ε > 0, and a geometric
graph G in d-dimensional Euclidean space with
constant dilation t, design a data structure that
answers (1 + ε)-approximate shortest path length
queries in constant time.


The data structure can also be applied to 
solve several other problems. These include (a) 
the problem of reporting approximate distance 
queries between vertices in a planar polygonal 
domain with “rounded” obstacles, (b) query 
versions of closest pair problems, and (c) 
the efficient computation of the approximate 
dilations of geometric graphs. 


Survey of Related Research 

The design of efficient data structures for 
answering distance queries for general (non- 
geometric) networks was considered by Thorup 
and Zwick [17] (unweighted general graphs), 
Baswana and Sen [3] (weighted general graphs,
i.e., arbitrary metrics), and Arikati et al. [2] and 
Thorup [16] (weighted planar graphs). 

For the geometric case, variants of the problem 
have been considered in a number of papers (for 
a recent paper, see, e.g., Chen et al. [5]). Work 
on the approximate version of these variants can 
also be found in many articles (for a recent 
paper, see, e.g., Agarwal et al. [1]). The focus 
of this entry is the results reported in the work 
of Gudmundsson et al. [10-13]. Similar results 
on distance oracles were proved subsequently for 
unit disk graphs [7]. Practical implementations of 
distance oracles in geometric networks have also 
been investigated [15]. 


Key Results 


The main result of this entry is the existence 
of approximate distance oracle data structures 
for geometric networks with constant dilation 
(see Theorem 4 below). As preprocessing, the 
network is “pruned” so that it only has a linear 
number of edges. The data structure consists of
a series of “cluster graphs” of increasing coarseness,
each of which helps answer approximate
queries for pairs of points with interpoint dis- 
tances of different scales. In order to pinpoint 
the appropriate cluster graph to search in for a 
given query, the data structure uses the bucketing 
tool described below. The idea of using cluster 
graphs to speed up geometric algorithms was first 
introduced by Das and Narasimhan [6] and later 
used by Gudmundsson et al. [9] to design an 
efficient algorithm to compute (1 + ε)-spanners.
Similar ideas were explored by Gao et al. [8] for 
applications to the design of mobile networks. 


Pruning 

If the input geometric network has a superlinear 
number of edges, then the preprocessing step 
for the distance oracle data structure involves 
efficiently “pruning” the network so that it has 
only a linear number of edges. The pruning may 
result in a small increase of the dilation of the 
spanner. The following theorem was proved by 
Gudmundsson et al. [12]. 


Theorem 1 Let t > 1 and ε′ > 0 be real
constants. Let S be a set of n points in R^d,
and let G = (S, E) be a t-spanner for S with
m edges. There exists an algorithm to compute
in O(m + n log n) time, a (1 + ε′)-spanner
of G having O(n) edges and whose weight is
O(wt(MST(S))).


The pruning step requires the following technical 
theorem proved by Gudmundsson et al. [12]. 


Theorem 2 Let S be a set of n points in R^d, and
let c ≥ 7 be an integer constant. In O(n log n)
time, it is possible to compute a data structure
D(S) consisting of:

1. A sequence L_1, L_2, …, L_ℓ of real numbers,
where ℓ = O(n), and

2. A sequence S_1, S_2, …, S_ℓ of subsets of S such
that Σ_{i=1}^{ℓ} |S_i| = O(n),

such that the following holds. For any two distinct
points p and q of S, it is possible to compute in
O(1) time an index i with 1 ≤ i ≤ ℓ and two
points x and y in S_i such that (a) L_i/n^{c+1} ≤
|xy| < L_i and (b) both |px| and |qy| are less
than |xy|/n^{c−2}.


Despite its technical nature, the above theorem 
is of fundamental importance to this work. In 
particular, it helps to deal with networks where 
the interpoint distances are not confined to a 
polynomial range, i.e., there are pairs of points 
that are very close to each other and very far from 
each other. 


Bucketing 

Since the model of computation assumed here 
does not allow the use of floor functions, an 
important component of the algorithm is a “buck- 
eting tool” that allows (after appropriate prepro- 
cessing) constant-time computation of a quantity 
referred to as BINDEX, which is defined to be the 
floor of the logarithm of the interpoint distance 
between any pair of input points. 


Theorem 3 Let S be a set of n points in R^d that
are contained in the hypercube (0, n^k)^d, for some
positive integer constant k, and let ε be a positive
real constant. The set S can be preprocessed
in O(n log n) time into a data structure of size
O(n), such that for any two points p and q of
S, with |pq| ≥ 1, it is possible to compute
in constant time the quantity BINDEX_ε(p, q) =
⌊log_{1+ε} |pq|⌋.


The constant-time computation mentioned in 
Theorem 3 is achieved by reducing the prob- 
lem to one of answering least common ancestor 
queries for pairs of nodes in a tree, a problem for 
which constant-time solutions were devised most 
recently by Bender and Farach-Colton [4]. 
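
For orientation, the quantity BINDEX_ε(p, q) = ⌊log_{1+ε} |pq|⌋ is easy to state directly when floor and logarithm are available. The sketch below does exactly that and therefore does not respect the algebraic model assumed in this entry; it only shows which value the data structure of Theorem 3 returns. The function name and sample points are hypothetical.

```python
import math


def bindex(p, q, eps):
    """Floor of the base-(1 + eps) logarithm of the distance |pq| (|pq| >= 1)."""
    return math.floor(math.log(math.dist(p, q)) / math.log(1 + eps))


if __name__ == "__main__":
    print(bindex((0, 0), (10, 0), 1.0))   # distance 10, base 2
    print(bindex((0, 0), (3, 4), 0.5))    # distance 5, base 1.5
```

Two pairs of points receive the same index exactly when their distances lie in the same interval [(1 + ε)^j, (1 + ε)^{j+1}), which is what allows a query to be routed to the cluster graph of the appropriate scale.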


Main Results 

Using the bucketing and the pruning tools, and 
using the algorithms described by Gudmundsson 
et al. [13], the following theorem can be proved. 


Theorem 4 Let t > 1 and ε > 0 be real
constants. Let S be a set of n points in R^d, and let
G = (S, E) be a t-spanner for S with m edges.
The graph G can be preprocessed into a data
structure of size O(n log n) in time O(mn log n),
such that for any pair of query points p, q ∈ S,
it is possible to compute a (1 + ε)-approximation
of the shortest path distance in G between p and
q in O(1) time. Note that all the big-Oh notations
hide constants that depend on d, t, and ε.


Additionally, if the traditional algebraic model 
of computation (without indirect addressing) is 
assumed, the following weaker result can be 
proved. 


Theorem 5 Let S be a set of n points in R^d,
and let G = (S, E) be a t-spanner for S, for
some real constant t > 1, having m edges.
Assuming the algebraic model of computation,
in O(m log log n + n log² n) time, it is possible
to preprocess G into a data structure of size
O(n log n), such that for any two points p and q
in S, a (1 + ε)-approximation of the shortest-path
distance in G between p and q can be computed
in O(log log n) time.


Applications 


As mentioned earlier, the data structure described 
above can be applied to several other problems. 
The first application deals with reporting distance 
queries for a planar domain with polygonal ob- 
stacles. The domain is further constrained to be 
t-rounded, which means that the length of the
shortest obstacle-avoiding path between any two
points in the input point set is at most t times the
Euclidean distance between them. In other words,
the visibility graph is required to be a t-spanner
for the input point set.


Theorem 6 Let F be a t-rounded collection of
polygonal obstacles in the plane of total
complexity n, where t is a positive constant. One can
preprocess F in O(n log n) time into a data
structure of size O(n log n) that can answer
obstacle-avoiding (1 + ε)-approximate shortest path length
queries in time O(log n). If the query points are
vertices of F, then the queries can be answered
in O(1) time.


The next application of the distance oracle data
structure includes query versions of closest pair
problems, where the queries are confined to
specified subset(s) of the input set.


Theorem 7 Let G = (S, E) be a geometric
graph on n points and m edges, such that G is
a t-spanner for S, for some constant t > 1. One
can preprocess G in time O(m + n log n) into a
data structure of size O(n log n) such that given
a query subset S′ of S, a (1 + ε)-approximate
closest pair in S′ (where distances are measured
in G) can be computed in time O(|S′| log |S′|).


Theorem 8 Let G = (S, E) be a geometric
graph on n points and m edges, such that G is
a t-spanner for S, for some constant t > 1. One
can preprocess G in time O(m + n log n) into a
data structure of size O(n log n) such that given
two disjoint query subsets X and Y of S, a
(1 + ε)-approximate bichromatic closest pair (where
distances are measured in G) can be computed in
time O((|X| + |Y|) log(|X| + |Y|)).


The last application of the distance oracle data 
structure includes the efficient computation of the 
approximate dilations of geometric graphs. 


Theorem 9 Given a geometric graph G on n
vertices with m edges, and given a constant C that
is an upper bound on the dilation t of G, it is
possible to compute a (1 + ε)-approximation to t
in time O(m + n log n).


Open Problems 
Two open problems remain unanswered: 


1. Improve the space utilization of the distance 
oracle data structure from O(n logn) to O(n). 

2. Extend the approximate distance oracle data 
structure to report not only the approximate 
distance but also the approximate shortest path 
between the given query points. 

Problem Definition


The Problem and the Model 

A static data structure problem consists of a set
of data D, a set of queries Q, a set of answers A,
and a function f : D × Q → A. The goal is to
store the data succinctly, so that any query can be
answered with only a few probes to the data
structure. Static membership is a well-studied problem
in data structure design [2, 6, 9, 10, 16, 17, 23].


Definition 1 (Static Membership) In the static 
membership problem, one is given a subset 
S of at most n keys from a universe U = 
{1,2,...,m}. The task is to store S so that 
queries of the form “Is u in S?” can be answered 
by making few accesses to the memory. 


Approximate Dictionaries 


A natural and general model for studying any 
data structure problem is the cell probe model 
proposed by Yao [23]. 


Definition 2 (Cell Probe Model) An (s, w, t)
cell probe scheme for a static data structure
problem f : D × Q → A has two components: a
storage scheme and a query scheme. The storage
scheme stores the data d ∈ D as a table T[d] of
s cells, each cell of word size w bits. The storage
scheme is deterministic. Given a query q ∈ Q,
the query scheme computes f(d, q) by making
at most t probes to T[d], where each probe reads
one cell at a time, and the probes can be adaptive.
In a deterministic cell probe scheme, the query
scheme is deterministic. In a randomized cell
probe scheme, the query scheme is randomized
and is allowed to err with a small probability.


Buhrman et al. [3] study the complexity of 
the static membership problem in the bitprobe 
model. The bitprobe model is a variant of the 
cell probe model in which each cell holds just a 
single bit. In other words, the word size w is 1. 
Thus, in this model, the query algorithm is given 
bitwise access to the data structure. The study of 
the membership problem in the bitprobe model 
was initiated by Minsky and Papert in their book 
Perceptrons [16]. However, they were interested 
in average-case upper bounds for this problem, 
while this work studies worst-case bounds for the 
membership problem. 
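
The simplest scheme in the bitprobe model is the characteristic bit vector, which stores one bit per universe element and answers every query deterministically with a single probe. The sketch below illustrates it as an (s, w, t) = (m, 1, 1) scheme; the class name and the probe counter are assumptions added for illustration.

```python
class BitVectorScheme:
    """Characteristic vector of S over the universe {1, ..., m}: m one-bit cells."""

    def __init__(self, S, m):
        self.table = [0] * m          # storage scheme: s = m cells, w = 1 bit
        for u in S:
            self.table[u - 1] = 1
        self.probes = 0               # number of bitprobes made so far

    def query(self, u):
        """Answer "Is u in S?" with exactly one bitprobe."""
        self.probes += 1
        return self.table[u - 1] == 1


if __name__ == "__main__":
    scheme = BitVectorScheme({2, 5}, 8)
    print(scheme.query(5), scheme.query(3), scheme.probes)
```

The cost of this simplicity is space: the table has m bits regardless of how small the set is.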

Observe that if a scheme is required to
store sets of size at most n, then it must use
at least ⌈log Σ_{i≤n} C(m, i)⌉ bits, where C(m, i)
denotes the binomial coefficient. If
n ≤ m^{1−Ω(1)}, this implies that the scheme must
store Ω(n log m) bits (and therefore use Ω(n)
cells). The goal in [3] is to obtain a scheme that
answers queries using only a constant number of
bitprobes and at the same time uses a table of
O(n log m) bits.
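
The information-theoretic bound can be evaluated exactly for small parameters. The sketch below counts the subsets of size at most n of a universe of size m and returns the number of bits needed to distinguish them, taking the logarithm to base 2; the function name is an assumption.

```python
import math


def min_bits(n, m):
    """ceil(log2(number of subsets of {1..m} with at most n elements))."""
    count = sum(math.comb(m, i) for i in range(n + 1))
    return (count - 1).bit_length()   # equals ceil(log2(count)) for count >= 1


if __name__ == "__main__":
    for n, m in [(1, 8), (8, 8), (10, 10**6)]:
        print(n, m, min_bits(n, m))
```

When n is much smaller than m, the value is far below the m bits of a characteristic bit vector, which is the gap the schemes discussed in this entry aim to exploit.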


Related Work 

The static membership problem has been well 
studied in the cell probe model, where each cell 
is capable of holding one element of the universe. 
That is, w = O(logm). In a seminal paper, 
Fredman, Komlós, and Szemerédi [10] proposed
a scheme for the static membership problem in
the cell probe model with word size O(log m)
that used a constant number of probes and a table
of size O(n). This scheme will be referred to as
the FKS scheme. Thus, up to constant factors,
the FKS scheme uses optimal space and number
of cell probes. In fact, Fiat et al. [9], Brodnik
and Munro [2], and Pagh [17] obtain schemes
that use space (in bits) that is within a small
additive term of ⌈log Σ_{i≤n} C(m, i)⌉ and yet answer
queries by reading at most a constant number
of cells. Despite all these fundamental results 
for the membership problem in the cell probe 
model, very little was known about the bitprobe 
complexity of static membership prior to the 
work in [3]. 


Key Results 


Buhrman et al. investigate the complexity of the 
static membership problem in the bitprobe model. 
They study 


• Two-sided error randomized schemes that are
allowed to err on positive instances as well as
negative instances (i.e., these schemes can say
“No” with a small probability when the query
element u is in the set S and “Yes” when it is
not);

• One-sided error randomized schemes where
the errors are restricted to negative instances
alone (i.e., these schemes never say “No”
when the query element u is in the set S);

• Deterministic schemes in which no errors are
allowed.


The main techniques used in [3] are based 
on 2-colorings of special set systems that are 
related to r-cover-free family of sets considered 
in [5, 7, 11]. The reader is referred to [3] for 
further details. 


Randomized Schemes with Two-Sided Error

The main result in [3] shows that there are
randomized schemes that use just one bitprobe and
yet use space close to the information theoretic
lower bound of Ω(n log m) bits.


Theorem 1 For any 0 < ε ≤ 1/4, there is a
scheme for storing subsets S of size at most n
of a universe of size m using O((n/ε²) log m) bits
so that any membership query “Is u ∈ S?” can
be answered with error probability at most ε by a
randomized algorithm which probes the memory
at just one location determined by its coin tosses
and the query element u.


Note that randomization is allowed only in the
query algorithm. It is still the case that for each
set S, there is exactly one associated data
structure T(S). It can be shown that deterministic
schemes that answer queries using a single
bitprobe need m bits of storage (see the remarks
following Theorem 4). Theorem 1 shows that, by
allowing randomization, this bound (for constant
ε) can be reduced to O(n log m) bits. This space
is within a constant factor of the information
theoretic bound for n sufficiently small. Yet,
the randomized scheme answers queries using a
single bitprobe.

Unfortunately, the construction above does not
permit us to have sub-constant error probability
and still use optimal space. Is it possible to
improve the result of Theorem 1 further and design
such a scheme? Buhrman et al. [3] show that this
is not possible: if ε is made sub-constant, then the
scheme must use more than n log m space.


Theorem 2 Suppose n/m^{1/3} ≤ ε ≤ 1/4. Then,
any two-sided ε-error randomized scheme which
answers queries using one bitprobe must use
space Ω((n / (ε log(1/ε))) log m).


Randomized Schemes with One-Sided Error

Is it possible to have any savings in space if the 
query scheme is expected to make only one-sided 
errors? The following result shows it is possible 
if the error is allowed only on negative instances. 


Theorem 3 For any 0 < ε ≤ 1/4, there is a
scheme for storing subsets S of size at most n of
a universe of size m using O((n/ε)² log m) bits
so that any membership query “Is u ∈ S?” can
be answered with error probability at most ε by
a randomized algorithm which makes a single
bitprobe to the data structure. Furthermore, if
u ∈ S, the probability of error is 0.


Though this scheme does not operate with
optimal space, it still uses significantly less space
than a bitvector. However, the dependence on n is
quadratic, unlike in the two-sided scheme where
it was linear. Buhrman et al. [3] show that this
scheme is essentially optimal: there is necessarily
a quadratic dependence on n for any scheme with
one-sided error.


Theorem 4 Suppose n/m^{1/3} ≤ ε ≤ 1/4. Consider
the static membership problem for sets S of
size at most n from a universe of size m. Then,
any scheme with one-sided error ε that answers
queries using at most one bitprobe must use
Ω((n² / (ε² log(n/ε))) log m) bits of storage.

Remark 1 One could also consider one-probe
one-sided error schemes that only make errors on
positive instances. That is, no error is made for
query elements not in the set S. In this case, [3]
shows that randomness does not help at all: such
a scheme must use m bits of storage.


The following result shows that the space 
requirement can be reduced further in one-sided 
error schemes if more probes are allowed. 


Theorem 5 Suppose 0 < δ < 1. There is
a randomized scheme with one-sided error n^{−δ}
that solves the static membership problem using
O(n^{1+δ} log m) bits of storage and O(1/δ)
bitprobes.


Deterministic Schemes 

In contrast to randomized schemes, Buhrman 
et al. show that deterministic schemes exhibit a 
time-space tradeoff behavior. 


Theorem 6 Suppose a deterministic scheme
stores subsets of size n from a universe of
size m using s bits of storage and answers
membership queries with t bitprobes to memory.
Then, \binom{m}{n} ≤ max_{i ≤ nt} \binom{2s}{i}.
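
The inequality of Theorem 6 can be explored numerically. The sketch
below, under the assumption that one simply wants the smallest s that the
inequality permits for given m, n, and t, evaluates both sides exactly and
binary-searches for that s. The helper names are hypothetical; the result
is only the necessary condition from the theorem, not the space of any
actual scheme.

```python
from math import comb

def tradeoff_allows(m: int, n: int, s: int, t: int) -> bool:
    """Check the condition binom(m, n) <= max over i <= n*t of binom(2s, i)."""
    lhs = comb(m, n)
    rhs = max(comb(2 * s, i) for i in range(n * t + 1))
    return lhs <= rhs

def min_space_bits(m: int, n: int, t: int) -> int:
    """Smallest s (between 1 and m) not ruled out by the inequality.

    The right-hand side is nondecreasing in s, and s = m always satisfies
    the inequality when 1 <= n <= m and t >= 1, so binary search applies.
    """
    lo, hi = 1, m
    while lo < hi:
        mid = (lo + hi) // 2
        if tradeoff_allows(m, n, mid, t):
            hi = mid
        else:
            lo = mid + 1
    return lo

if __name__ == "__main__":
    m, n = 4096, 8  # illustrative parameters
    for t in (1, 2, 4, 8):
        print(f"t = {t}: s must be at least {min_space_bits(m, n, t)} bits")
```

Running it for increasing t shows how the permitted space shrinks as more
bitprobes are allowed.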


This tradeoff result has an interesting
consequence. Recall that the FKS hashing scheme is
a data structure for storing sets of size at most
n from a universe of size m using O(n log m)
bits, so that membership queries can be answered
using O(log m) bitprobes. As a corollary of the
tradeoff result, [3] shows that the FKS scheme
makes an optimal number of bitprobes, within a
constant factor, for this amount of space.


Corollary 1 Let ε > 0, c ≥ 1 be any constants.
There is a constant δ > 0 so that the following
holds. Let n ≤ m^{1−ε} and let a scheme for storing
sets of size at most n of a universe of size m
as data structures of at most cn log m bits be
given. Then, any deterministic algorithm
answering membership queries using this structure must
make at least δ log m bitprobes in the worst case.


From Theorem 6, it also follows that any
deterministic scheme that answers queries using t
bitprobes must use space at least
Ω(t m^{1/t} n^{1−1/t}) in the worst case. The final
result shows the existence of schemes which almost
match the lower bound.


Theorem 7 1. There is a nonadaptive scheme
that stores sets of size at most n from a
universe of size m using O(nt m^{2/(t+1)}) bits and
answers queries using 2t + 1 bitprobes. This
scheme is non-explicit.

2. There is an explicit adaptive scheme that
stores sets of size at most n from a universe
of size m using O(m^{1/t} n log m) bits and
answers queries using O(log n + log log m) + t
bitprobes.


Power of Few Bitprobes 

In this section, we highlight some of the recent
results for this problem subsequent to [3] and
encourage the reader to consult the corresponding
references for more details. Most of these results
focus on the power of deterministic schemes with
a small number of bitprobes.

Let S(m, n, t) denote the minimum number of
bits of storage needed by a deterministic scheme
that answers queries using t (adaptive) bitprobes.
In [3], it was shown that S(m, n, 1) = m and
S(m, n, 5) = o(m) for n = o(m^{1/3})
(Theorem 7, Part 1). This leads us to a natural question:




Is S(m,n,t) = o(m) for t =2, 3, and 4 and under 
what conditions on n? 

Initial progress for the case t = 2 was made
in [18], which considered the simplest case: n = 2.
It was shown there that S(m, 2, 2) = O(m^{2/3}).
It was later shown in [19] that S(m, 2, 2) =
Ω(m^{4/7}). The upper bound result of [18] was
improved upon by the authors of [1], who showed
that S(m, n, 2) = o(m) if n = o(log m).
Interestingly, a matching lower bound was shown
recently in [12]: S(m, n, 2) = o(m) only if
n = o(log m).

Strong upper bounds were obtained in [1] for
the cases t = 3 and t = 4. It was shown that
S(m, n, 3) = o(m) whenever n = o(m), and also
that S(m, n, 4) = o(m) for n = o(m) even if
the four bitprobes are nonadaptive.
Recently, it was shown in [14] that S(m, 2, 3) ≤
7m^{2/5}. This work focuses on explicit schemes for
n = 2 and t ≥ 3.

Finally, we end with two remarks. Our
problem for the case n = Θ(m) has been studied by
Viola [22]. A recent result of Chen, Grigorescu,
and de Wolf [4] studies our problem in the
presence of adversarial noise.


Applications 


The results in [3] have interesting connections to
questions in coding theory and communication
complexity. In the framework of coding theory,
the results in [3] can be viewed as constructing
locally decodable source codes, analogous to the
locally decodable channel codes of [13].
Theorems 1-4 can also be viewed as giving tight
bounds for the following communication
complexity problem (as pointed out in [15]): Alice
gets u ∈ {1, ..., m}, Bob gets S ⊆ {1, ..., m} of
size at most n, and Alice sends a single message
to Bob after which Bob announces whether u ∈
S. See [3] for further details.
Problem Definition


This problem is concerned with obtaining a
compact data structure capable of efficiently answering
approximate shortest path distance queries in
a given undirected edge-weighted graph G =
(V, E). If the query time is independent (or nearly
independent) of the size of G, we refer to the
data structure as an approximate distance oracle
for G. For vertices u and v in G, we denote by
d_G(u, v) the shortest path distance between u and
v in G. For a given stretch parameter δ ≥ 1, we
call the oracle δ-approximate if for all vertices u
and v in G, d_G(u, v) ≤ \tilde{d}_G(u, v) ≤ δ d_G(u, v),
where \tilde{d}_G(u, v) is the output of the query for u
and v. Hence, we allow estimates to be stretched
by a factor up to δ but not shrunk.


Approximate Distance Oracles with Improved Query Time 


Key Results 


A major result in the area of distance oracles is
due to Thorup and Zwick [4]. They gave, for
every integer k ≥ 1, a (2k − 1)-approximate
distance oracle of size O(kn^{1+1/k}) and query
time O(k), where n is the number of vertices of
the graph. This is constant query time when k
is constant. Corresponding approximate shortest
paths can be reported in time proportional to
their length. Mendel and Naor [3] asked the
question of whether query time can be improved
to a universal constant (independent also of k)
while keeping both size and stretch small. They
obtained O(n^{1+1/k}) size and O(1) query time
at the cost of a constant-factor increase in stretch
to 128k. Unlike the oracle of Thorup and Zwick,
Mendel and Naor’s oracle is not path reporting.

In [5], it is shown how to improve the query
time of Thorup-Zwick to O(log k) without
increasing space or stretch. This is done while
keeping essentially the same data structure but
applying binary instead of linear search in
so-called bunch structures that were introduced by
Thorup and Zwick [4]; the formal definition
will be given below. Furthermore, it is shown
in [5] how to improve the stretch of the oracle of
Mendel and Naor to (2 + ε)k for an arbitrarily
small constant ε > 0 while keeping query time
constant (bounded by 1/ε). This improvement is
obtained without an increase in space except for
large values of k close to log n (only values of k
less than log n are interesting since the
Mendel-Naor oracle has optimal O(n) space and O(1)
query time for larger values). Below, we sketch
the main ideas in the improvement of
Thorup-Zwick and of Mendel-Naor, respectively.


Oracle with O(log k) Query Time 

The oracle of Thorup and Zwick keeps a
hierarchy of sets of sampled vertices V = A_0 ⊇
A_1 ⊇ A_2 ⊇ ... ⊇ A_k = ∅, where for i =
1, ..., k − 1, A_i is obtained by picking each
element of A_{i−1} independently with probability
n^{−1/k}. Define p_i(u) as the vertex in A_i closest
to u. The oracle precomputes and stores for each
vertex u ∈ V the bunch B_u, defined as

    B_u = ⋃_{i=0}^{k−1} { v ∈ A_i \ A_{i+1} | d_G(u, v) < d_G(u, p_{i+1}(u)) }.


See Fig. 1 for an illustration of a bunch. The
distance d_G(u, v) for each v ∈ B_u is precomputed
as well.

Now, to answer a query for a vertex pair
(u, v), the oracle performs a linear search through
bunches B_u and B_v. Pseudocode is given in
Fig. 2. It is clear that query time is O(k), and it
can be shown that the estimate output in line 6
has stretch 2k − 1.


Approximate Distance Oracles with Improved Query
Time, Fig. 1 A bunch B_u in a complete Euclidean graph
with k = 3. Black vertices belong to A_0, grey vertices to
A_1, and white vertices to A_2. Line segments connect u to
vertices of B_u


Algorithm dist_k(u, v)

1. w ← p_0(u); j ← 0
2. while w ∉ B_v
3.     j ← j + 1
4.     (u, v) ← (v, u)
5.     w ← p_j(u)
6. return d_G(w, u) + d_G(w, v)


Approximate Distance Oracles with Improved Query 
Time, Fig. 2 Answering a distance query, starting at 
sample level i 
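
As an illustration of the sampling hierarchy, the bunches, and the linear
query of Fig. 2, here is a small Python sketch. It is not the
implementation of [4] or [5]: it assumes a connected undirected graph with
positive integer weights given as an adjacency dictionary, computes
bunches by brute force (one Dijkstra run per vertex, so preprocessing is
far from the efficient construction), and all function names are
hypothetical.

```python
import heapq
import random

INF = float("inf")

def dijkstra(adj, src):
    """Exact distances from src; adj[x] is a list of (neighbor, weight) pairs."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, w in adj[x]:
            if d + w < dist.get(y, INF):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist

def build_oracle(adj, k, seed=0):
    """Sample A_0 ⊇ ... ⊇ A_k = ∅ and store pivots p_i(u) and bunches B_u."""
    rng = random.Random(seed)
    vertices = list(adj)
    prob = len(vertices) ** (-1.0 / k)
    while True:  # resample until A_{k-1} is non-empty (assumption of this sketch)
        levels = [set(vertices)]
        for _ in range(1, k):
            levels.append({v for v in levels[-1] if rng.random() < prob})
        if levels[-1]:
            break
    levels.append(set())  # A_k is the empty set
    pivots, pivot_dist, bunches = {}, {}, {}
    for u in vertices:
        dist = dijkstra(adj, u)  # brute force; assumes the graph is connected
        pivots[u] = [min(levels[i], key=dist.__getitem__) for i in range(k)]
        pivot_dist[u] = [dist[p] for p in pivots[u]]
        limits = pivot_dist[u][1:] + [INF]  # d(u, p_{i+1}(u)), infinite for i = k-1
        bunches[u] = {v: dist[v]
                      for i in range(k)
                      for v in levels[i] - levels[i + 1]
                      if dist[v] < limits[i]}
    return pivots, pivot_dist, bunches

def query(oracle, u, v):
    """Linear search of Fig. 2: alternate between u and v until w lies in B_v."""
    pivots, pivot_dist, bunches = oracle
    w, j = u, 0
    while w not in bunches[v]:
        j += 1
        u, v = v, u
        w = pivots[u][j]
    return pivot_dist[u][j] + bunches[v][w]
```

The loop always stops by level k − 1, because every vertex of A_{k−1}
belongs to every bunch. The returned value is never below the true
distance (triangle inequality) and is at most 2k − 1 times it.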
We can improve query time to O(log k)
by instead doing binary search in the bunch
structures. A crucial property of the
Thorup-Zwick oracle is that every time the test in line
2 succeeds, d_G(u, p_j(u)) increases by at most
d_G(u, v), and this is sufficient to prove 2k − 1
stretch. In particular, if the test succeeds two
times in a row, d_G(u, p_{j+2}(u)) − d_G(u, p_j(u)) ≤
2 d_G(u, v), where j is even. If we can check that
d_G(u, p_{j'+2}(u)) − d_G(u, p_{j'}(u)) ≤ 2 d_G(u, v)
for all smaller even indices j', we may start the
query algorithm at index j instead of index 0.
Since we would like to apply binary search, pick
j to be (roughly) k/2. It suffices to check only
one inequality, namely, the one with the largest
value d_G(u, p_{j'+2}(u)) − d_G(u, p_{j'}(u)). Note
that this value depends only on u and k, so we
can precompute the index j' with this largest
value. In the query phase, we can check in O(1)
time whether d_G(u, p_{j'+2}(u)) − d_G(u, p_{j'}(u)) ≤
2 d_G(u, v). If the test succeeds, we can start the
query at j, and hence, we can recurse on indices
between j and k − 1. Conversely, if the test fails,
it means that the test in line 2 fails for either j' or
j' + 1. Hence, the query algorithm of Thorup and
Zwick terminates no later than at index j' + 1,
and we can recurse on indices between 0 and
j' + 1. In both cases, the number of remaining
indices is reduced by a factor of at least 2. Since
each recursive call takes O(1) time, we thus
achieve O(log k) query time.

Since the improved oracle is very similar to the 
Thorup-Zwick oracle, it is path reporting, i.e., it 
can report approximate paths in time proportional 
to their length. 


Oracle with Constant Query Time 

The second oracle in [5] can be viewed as a
hybrid between the oracles of Thorup-Zwick and
of Mendel-Naor. An initial estimate is obtained
by querying the Mendel-Naor oracle. This
estimate has stretch at most 128k, and it is refined
in subsequent iterations until the desired stretch
(2 + ε)k is obtained. In each iteration, the current
estimate is reduced by a small constant factor
greater than 1 (depending on ε). Note that after
a constant number of iterations, the estimate will
be below the desired stretch, but it needs to be
ensured that it is not below the shortest path
distance.

In each iteration, the hybrid algorithm at- 
tempts to start the Thorup-Zwick query algorithm 
at a step corresponding to this estimate. If this can 
be achieved, only a constant number of steps of 
this query algorithm need to be executed before 
the desired stretch is obtained. Conversely, if 
the hybrid algorithm fails to access the Thorup- 
Zwick oracle in any iteration, then by a prop- 
erty of the bunch structures, it is shown that 
the current estimate is not below the shortest 
path distance. Hence, the desired stretch is again 
obtained. 

An important property of the Mendel-Naor
oracle needed above is that the set d_MN of all
different values the oracle can output has size
bounded by O(n^{1+1/k}). This is used in the
hybrid algorithm as follows. In a preprocessing
step, values from d_MN are ordered in a list L
together with additional values corresponding to
the intermediate estimates that the hybrid
algorithm can consider in an iteration. Updating the
estimate in each iteration then corresponds to a
linear traversal of part of L. Next, each vertex
p_i of each bunch structure B_u of the
Thorup-Zwick oracle is associated with the value in the
list closest to d_G(u, p_i). For each element of
L, a hash table is kept for the bunch vertices
associated with that element. It can be shown that
this way of linking the oracles of Thorup-Zwick
and Mendel-Naor achieves the desired bounds.


Applications 


The practical need for efficient algorithms to
answer shortest path (distance) queries in
graphs has increased significantly over the years,
in large part due to emerging GPS navigation
technology and other route planning software.
Classical algorithms like Dijkstra's do not scale
well as they may need to explore the entire graph
just to answer a single query. As road maps are
typically of considerable size, obtaining compact
distance oracles has received a great deal of
attention from the research community.
Open Problems


A widely believed girth conjecture of Erdős [2]
implies that an oracle with stretch 2k − 1, size
O(n^{1+1/k}), and query time O(1) would be
optimal. Obtaining such an oracle (preferably one
that is path reporting) is a main open problem in
the area. Some progress has recently been made:
Chechik [1] gives an oracle (not path reporting)
with stretch 2k − 1, size O(kn^{1+1/k}), and O(1)
query time.
Problem Definition


In graph theory, a matching in a graph is a
set of edges without common vertices, while a
perfect matching is one in which all vertices
are associated with matching edges. In a graph
G = (V, E, w), where V is the set of vertices,
E is the set of edges, and w : E → ℝ is
the edge weight function, the maximum matching
problem asks for the matching M in G which
maximizes w(M) = Σ_{e∈M} w(e). Note that a
maximum matching is not necessarily perfect.
The maximum cardinality matching (MCM)
problem is the maximum matching problem with
w(e) = 1 for all edges. Otherwise, it is called the
maximum weight matching (MWM) problem.


Algorithms for Exact MWM 

Although the maximum matching problem has
been studied for decades, the computational
complexity of finding an optimal matching remains
quite open. Most algorithms for graph matchings
use the concept of augmenting paths. An
alternating path (or cycle) is one whose edges alternate
between M and E \ M. An alternating path P is
augmenting if P begins and ends at free vertices,
that is, M ⊕ P := (M \ P) ∪ (P \ M) is a matching
with cardinality |M ⊕ P| = |M| + 1. Therefore,
the basic algorithm finds the maximum
cardinality matching by finding an augmenting path in the
graph and using it to augment the matching each
time, until no more augmenting paths exist. The
running time for the basic algorithm is O(mn), where
m = |E| and n = |V|. The major improvement
over this for bipartite graphs is the Hopcroft-Karp
algorithm [10]. It finds a maximal set of
vertex-disjoint shortest augmenting paths in each step,
and it can be shown that the length of the shortest
augmenting paths increases each time. The running time
of the Hopcroft-Karp algorithm is O(m√n). Its
corresponding algorithm for general graphs is
given by Micali and Vazirani [14].
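
The basic O(mn) augmenting path algorithm described above can be
sketched as follows for the bipartite case. This is a minimal
illustration, assuming the graph is given as a list adj in which adj[u]
holds the right-side neighbors of left vertex u; the function name is
hypothetical, and the recursion is only suitable for small inputs.

```python
def max_bipartite_matching(adj, n_right):
    """Repeatedly search for an augmenting path from each free left vertex.

    Returns (size, match_r), where match_r[v] is the left partner of right
    vertex v, or -1 if v is free.
    """
    match_r = [-1] * n_right

    def try_augment(u, seen):
        # Depth-first search for an alternating path from u to a free right vertex.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_r[v] == -1 or try_augment(match_r[v], seen):
                match_r[v] = u  # flip the edges along the path (M ⊕ P)
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_r

if __name__ == "__main__":
    # Left vertices 0..2, right vertices 0..2.
    print(max_bipartite_matching([[0, 1], [0], [1, 2]], 3))
```

Each successful search increases the matching size by exactly one, which
is the step |M ⊕ P| = |M| + 1 from the text.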

For the maximum weight matching (MWM)
and maximum weight perfect matching (MWPM),
the most classical algorithms are the Hungarian
algorithm [12] for bipartite graphs and the
Edmonds algorithm for general graphs [6, 7].
For fast implementations, Gabow and Tarjan [8]
gave bit-scaling algorithms for MWM in
bipartite graphs running in O(m√n log(nN))
time, where the edge weights are integers
in [−N, ..., N]. Then, they also gave the
corresponding algorithm for general graphs [9].
Extending [15], Sankowski [18] gave an O(Nn^ω)
MWM algorithm for bipartite graphs (here, ω ≤
2.373 denotes the exponent of the complexity
of fast matrix multiplication (FMM) [2, 20]),
while Huang and Kavitha [11] obtained a similar
time bound for general graphs. We can see these
time complexities are still far from linear, which
shows the importance of fast approximation
algorithms.


Approximate Matching 

Let a δ-MWM be a matching whose weight is
at least a δ fraction of the maximum weight
matching, where 0 < δ < 1, and let δ-MCM be
defined analogously.

It is well known that the greedy algorithm, which
iteratively chooses the maximum weight edge not
incident to previously chosen edges, produces
a 1/2-MWM. A straightforward implementation of
this algorithm takes O(m log n) time. Preis [3, 17]
gave a 1/2-MWM algorithm running in linear time.
Vinkemeier and Hougardy [19] and Pettie and
Sanders [16] proposed several (2/3 − ε)-MWM
algorithms (see also [13]) running in O(m log ε^{−1})
time; each is based on iteratively improving a
matching by identifying sets of short
weight-augmenting paths and cycles.
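
The greedy 1/2-MWM algorithm mentioned above is short enough to sketch
directly. The following Python version is the straightforward
sort-based implementation, assuming edges are given as (u, v, weight)
triples; the function name is hypothetical.

```python
def greedy_matching(edges):
    """Scan edges by decreasing weight and keep each edge whose endpoints are both free."""
    matched = set()
    chosen = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            chosen.append((u, v, w))
            matched.update((u, v))
    return chosen

if __name__ == "__main__":
    # Path a - b - c - d with weights 2, 3, 2.
    path = [("a", "b", 2), ("b", "c", 3), ("c", "d", 2)]
    print(greedy_matching(path))
```

On this path the greedy choice is the middle edge of weight 3, while the
optimum takes the two outer edges for a total of 4, so the result is
within the factor 1/2 but not optimal.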


Key Results 


Approximate Maximum Cardinality Matching

In fact, the Hopcroft-Karp algorithm [10] for
bipartite graphs and the Micali-Vazirani algorithm [14]
for general graphs both imply a (1 − ε)-MCM
algorithm running in O(ε^{−1} m) time. We can search for a
maximal set of vertex-disjoint shortest
augmenting paths for k steps, and the matching obtained
is a (1 − 1/(k+1))-MCM.




Theorem 1 ([10, 14]) In a general graph G, a
(1 − ε)-MCM can be found in time
O(ε^{−1} m).
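
For the bipartite case, the idea of stopping after k phases can be
sketched in Python as below. This is a textbook-style Hopcroft-Karp
phase loop, truncated after k phases, and is offered as an illustration
under stated assumptions rather than as the algorithm of [10] verbatim:
adj[u] lists the right-side neighbors of left vertex u, NIL is a dummy
vertex standing for "unmatched", and the function name is hypothetical.
Choosing k of about 1/ε gives the (1 − ε) guarantee.

```python
from collections import deque

INF = float("inf")

def approx_mcm(adj, n_right, k):
    """Run at most k phases; each augments along a maximal set of shortest paths."""
    n_left = len(adj)
    NIL = n_left                  # dummy left vertex paired with every free right vertex
    pair_l = [-1] * n_left        # right partner of each left vertex, or -1
    pair_r = [NIL] * n_right      # left partner of each right vertex, or NIL

    def bfs():
        # Layer the graph from all free left vertices; dist[NIL] is the
        # layer at which a free right vertex is first reached.
        dist = [INF] * (n_left + 1)
        queue = deque()
        for u in range(n_left):
            if pair_l[u] == -1:
                dist[u] = 0
                queue.append(u)
        while queue:
            u = queue.popleft()
            if dist[u] < dist[NIL]:
                for v in adj[u]:
                    w = pair_r[v]
                    if dist[w] == INF:
                        dist[w] = dist[u] + 1
                        if w != NIL:
                            queue.append(w)
        return dist

    def dfs(u, dist):
        # Follow only edges that advance one layer, so paths are shortest.
        if u == NIL:
            return True
        for v in adj[u]:
            w = pair_r[v]
            if dist[w] == dist[u] + 1 and dfs(w, dist):
                pair_l[u], pair_r[v] = v, u
                return True
        dist[u] = INF  # dead end: do not revisit u in this phase
        return False

    for _ in range(k):
        dist = bfs()
        if dist[NIL] == INF:      # no augmenting path is left
            break
        for u in range(n_left):
            if pair_l[u] == -1:
                dfs(u, dist)
    return pair_l
```

On the two-by-two example adj = [[0, 1], [0]], one phase matches only
left vertex 0, and a second phase reroutes it to reach the perfect
matching.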


Approximate Maximum Weighted Matching

In 2014, Duan and Pettie [5] gave the first (1 − ε)-MWM
algorithm for arbitrary weighted graphs
whose running time is linear. In particular, we
show that such a matching can be found in
O(mε^{−1} log ε^{−1}) time, improving a preliminary
result of O(mε^{−2} log^3 n) running time by the
authors in 2010 [4]. This result leaves little room
for improvement. The main results are given in
the following two theorems:


Theorem 2 ([5]) In a general graph G with
integer edge weights between [0, N], a (1 − ε)-MWM
can be computed in time O(mε^{−1} log N).


Theorem 3 ([5]) In a general graph G with real
edge weights, a (1 − ε)-MWM can be computed
in time O(mε^{−1} log ε^{−1}).


Unlike previous algorithms with approximation
ratios of 1/2 [3, 17] or 2/3 [16, 19], the new
algorithm does not find weight-augmenting paths
and cycles directly, but follows a primal-dual
relaxation on the linear programming formulation
of MWM. This relaxed complementary
slackness approach relaxes the constraint of the dual
variables by a small amount, so that the
iterative process of the dual problem will converge
to an approximate solution much more quickly.
While it takes O(√n) iterations of augmenting
to achieve a perfect matching, we proved that we
only need O(log N/ε) iterations to achieve a
(1 − ε)-approximation. Also, we make the relaxation
“dynamic” by shrinking the relaxation when the
dual variables decrease by one half, so that finally
the relaxation is at most ε times the edge weight
on each matching edge and very small on each
nonmatching edge, which gives an approximate
solution.


Applications 


Graph matching is a fundamental combinatorial
problem that has a wide range of applications in
many fields, and it also serves as a building block
of other algorithms, such as the Christofides
algorithm [1] for the approximate traveling salesman
problem. The approximate algorithm for
maximum weight matching described above has linear
running time, much faster than the Hungarian
algorithm [12] and the Edmonds algorithm [6, 7]. It is
also much simpler than the Gabow-Tarjan scaling
algorithms [8, 9] of O(m√n) running time. Thus,
it has a great impact both in theory and in
real-world applications.
Problem Definition


Given a text string T = t_1 t_2 ... t_n and a regular
expression R of length m denoting a language
L(R) over an alphabet Σ of size σ, and given a
distance function among strings d and a threshold
k, the approximate regular expression matching
(AREM) problem is to find all the text positions
that finish a so-called approximate occurrence of
R in T, that is, compute the set { j, ∃ i, 1 ≤ i ≤
j, ∃ P ∈ L(R), d(P, t_i ... t_j) ≤ k }. T, R,
and k are given together, whereas the algorithm
can be tailored for a specific d.

This entry focuses on the so-called weighted
edit distance, which is the minimum sum of
weights of a sequence of operations converting
one string into the other. The operations are
insertions, deletions, and substitutions of characters.
The weights are positive real values associated
to each operation and characters involved. The
weight of deleting a character c is written
w(c → ε), that of inserting c is written w(ε → c),
and that of substituting c by c' ≠ c is written
w(c → c'). It is assumed that w(c → c) = 0 for
all c ∈ Σ ∪ {ε} and that the triangle inequality holds,
that is, w(x → y) + w(y → z) ≥ w(x → z) for
any x, y, z ∈ Σ ∪ {ε}. As the distance may be
asymmetric, it is also fixed that d(A, B) is the
cost of converting A into B. For simplicity and
practicality, m = o(n) is assumed in this entry.


Key Results 


The most versatile solution to the problem [3] is
based on a graph model of the distance
computation process. Assume the regular expression R is
converted into a nondeterministic finite
automaton (NFA) with O(m) states and transitions using
Thompson’s method [8]. Take this automaton
as a directed graph G(V, E) where edges are
labeled by elements in Σ ∪ {ε}. A directed and
weighted graph 𝒢 is built to solve the AREM
problem. 𝒢 is formed by putting n + 1 copies
of G, namely G_0, G_1, ..., G_n, and connecting them
with weights so that the distance computation reduces
to finding shortest paths in 𝒢.

More formally, the nodes of 𝒢 are {v_i, v ∈ V,
0 ≤ i ≤ n}, so that v_i is the copy of node v ∈ V in
graph G_i. For each edge u → v in E labeled c,
c ∈ Σ ∪ {ε}, the following edges are added to graph 𝒢:

    u_i → v_i,      with weight w(c → ε),        0 ≤ i ≤ n.
    u_i → u_{i+1},  with weight w(ε → t_{i+1}),  0 ≤ i < n.
    u_i → v_{i+1},  with weight w(c → t_{i+1}),  0 ≤ i < n.

Assume for simplicity that G has initial state s
and a unique final state f (this can always be
arranged). As defined, the shortest path in 𝒢 from
s_0 to f_n gives the smallest distance between T
and a string in L(R). In order to adapt the graph
to the AREM problem, the weights of the edges
between s_i and s_{i+1} are modified to be zero.

Then, the AREM problem is reduced to
computing shortest paths. It is not hard to see that 𝒢
can be topologically sorted so that all the paths
to nodes in G_i are computed before all those to
G_{i+1}. This way, it is not hard to solve this
shortest path problem in O(mn log m) time and O(m)
space. Actually, if one restricts the problem to the
particular case of network expressions, which are
regular expressions without Kleene closure, then
G has no loops and the shortest path computation
can be done in O(mn) time, and even better on
average [2].
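
The layer-by-layer shortest path computation can be sketched in Python
as follows. This is an illustration of the simple O(mn log m) approach
(Dijkstra inside each copy G_i), not the O(mn) algorithm of [3]. It
assumes the NFA is supplied directly as a list of (source, label, target)
triples rather than built by Thompson's method, uses the empty string
for ε, and defaults to unit costs; all names are hypothetical.

```python
import heapq

INF = float("inf")
EPS = ""  # the empty string stands for ε

def unit_cost(a, b):
    """w(a → b) for unit-cost edit distance; w(c → c) = 0, also for ε."""
    return 0 if a == b else 1

def settle_layer(dist, edges_from, w):
    """Dijkstra inside one copy G_i: edge u → v labeled c costs w(c → ε)."""
    heap = [(d, u) for u, d in enumerate(dist) if d != INF]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for c, v in edges_from[u]:
            nd = d + w(c, EPS)
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))

def arem_end_positions(num_states, edges, start, final, text, k, w=unit_cost):
    """Return (j, cost) for each 1-based end position j with cost at most k."""
    edges_from = [[] for _ in range(num_states)]
    for u, c, v in edges:
        edges_from[u].append((c, v))
    dist = [INF] * num_states
    dist[start] = 0
    settle_layer(dist, edges_from, w)              # layer G_0
    hits = []
    for j, t in enumerate(text, start=1):
        nxt = [d + w(EPS, t) for d in dist]        # u_i → u_{i+1}: insert t
        for u in range(num_states):
            if dist[u] == INF:
                continue
            for c, v in edges_from[u]:             # u_i → v_{i+1}: c becomes t
                nxt[v] = min(nxt[v], dist[u] + w(c, t))
        nxt[start] = 0                             # zero-weight s_i → s_{i+1}
        settle_layer(nxt, edges_from, w)
        dist = nxt
        if dist[final] <= k:
            hits.append((j, dist[final]))
    return hits

if __name__ == "__main__":
    # Hand-built NFA for ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2.
    nfa = [(0, "a", 1), (1, "b", 1), (1, "c", 2)]
    print(arem_end_positions(3, nfa, 0, 2, "xacxabbcx", 0))
```

Only the distances of the current layer are kept, which matches the
O(m) space remark in the text.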

The most delicate part in achieving O(mn)
time for general regular expressions [3] is to
prove that, given the types of loops that arise in
the NFAs of regular expressions, it is possible to
compute the distances correctly within each G_i
by (a) computing them in a topological order of
G_i without considering the back edges introduced
by Kleene closures, (b) updating path costs by
using the back edges once, and (c) updating path
costs once more in topological order ignoring
back edges again.


Theorem 1 (Myers and Miller [3]) There exists 
an O(mn) worst-case time solution to the AREM 
problem under weighted edit distance. 


It is possible to do better when the weights
are integer-valued, by exploiting the unit-cost
RAM model through a four-Russians technique
[10]. The idea is as follows. Take a small
subexpression of R, which produces an NFA that will
translate into a small subgraph of each G_i. At
the time of propagating path costs within this
automaton, there will be a counter associated to
each node (telling the current shortest path from
s_0). This counter can be reduced to a number in
[0, k + 1], where k + 1 means “more than k.”
If the small NFA has r states, r⌈log_2(k + 2)⌉
bits are needed to fully describe the counters
of the corresponding subgraph of G_i. Moreover,
given an initial set of values for the counters,
it is possible to precompute all the propagation
that will occur within the same subgraph of G_i,
in a table having 2^{r⌈log_2(k+2)⌉} entries, one per
possible configuration of counters. It is sufficient
that r < α log_{k+2} n for some α < 1 to
make the construction and storage cost of those
tables o(n). With the help of those tables, all the
propagation within the subgraph can be carried
out in constant time. Similarly, the propagation
of costs to the same subgraph at G_{i+1} can also
be precomputed in tables, as it depends only on
the current counters in G_i and on text character
t_{i+1}, for which there are only σ alternatives.

Now, take all the subtrees of R of maximum
size not exceeding r and preprocess them with
the technique above. Convert each such subtree
into a leaf in R labeled by a special character
a_A, associated to the corresponding small NFA
A. Unless there are consecutive Kleene closures
in R, which can be simplified as R** = R*,
the size of R after this transformation is O(m/r).
Call R' the transformed regular expression. One
essentially applies the technique of Theorem 1 to
R', taking care of how to deal with the special
leaves that correspond to small NFAs. Those
leaves are converted by Thompson’s construction
into two nodes linked by an edge labeled a_A.
When the path cost propagation process reaches
the source node of an edge labeled a_A with cost
c, one must update the counter of the initial state
of NFA A to c (or k + 1 if c > k). One then
uses the four-Russians table to do all the cost
propagation within A in constant time and finally
obtain, at the counter of the final state of A, the
new value for the target node of the edge labeled
a_A in the top-level NFA. Therefore, all the edges
(normal and special) of the top-level NFA can be
traversed in constant time, so the costs at G_i can
be obtained in O(mn/r) time using Theorem 1.
Now one propagates the costs to G_{i+1}, using the
four-Russians tables to obtain the current counter
values of each subgraph A in G_{i+1}.


Theorem 2 (Wu et al. [10]) There exists an
O(n + mn/log_{k+2} n) worst-case time solution to
the AREM problem under weighted edit distance
if the weights are integer numbers.


Applications 


The problem has applications in computational
biology, to find certain types of motifs in DNA
and protein sequences. See [1] for a more
detailed discussion. In particular, PROSITE
patterns are limited regular expressions that are rather
popular for searching protein sequences. PROSITE
patterns can be searched for with faster algorithms in
practice [7]. The same occurs with other classes
of complex patterns [6] and network expressions [2].


Open Problems 


The worst-case complexity of the AREM
problem is not fully understood. It is of
course Ω(n), which has been achieved for
m log(k + 2) = O(log n), but it is not known
how much this can be improved.


Experimental Results 


Some experiments are reported in [5]. For small
m and k, and assuming all the weights are 1
(except w(c → c) = 0), bit-parallel algorithms
of worst-case complexity O(kn(m/log n)^2) [4, 9]
are the fastest (the second is able to skip some text
characters, depending on R). For arbitrary
integer weights, the best choice is a more complex
bit-parallel algorithm [5] or the four-Russians
based one [10] for larger m and k. The original
algorithm [3] is slower, but it is the only one
supporting arbitrary weights.


URL to Code 


A recent and powerful software package
implementing AREM is TRE (http://laurikari.net/tre),
which supports edit distance with
different costs for each type of operation.
Older packages offering efficient AREM are
agrep [9] (https://github.com/Wikinaut/agrep) for
simplified weight choices and nrgrep [4]
(http://www.dcc.uchile.cl/~gnavarro/software).


Cross-References 


Approximate String Matching is a simplifica- 
tion of this problem, and the relation between 
graph G here and matrix C there should be 
apparent. 

Regular Expression Matching is the simplified
case where exact matching with strings in L(R)


Problem Definition


Given a text string $T = t_1 t_2 \ldots t_n$ and a pattern
string $P = p_1 p_2 \ldots p_m$, both being sequences
over an alphabet $\Sigma$ of size $\sigma$, and given a distance
function among strings d and a threshold k, the
approximate string matching (ASM) problem is to
find all the text positions that finish a so-called
approximate occurrence of P in T, that is, to compute
the set $\{ j : \exists i,\ 1 \le i \le j,\ d(P, t_i \ldots t_j) \le k \}$.
In the sequential version of the problem, T, P,
and k are given together, whereas the algorithm
can be tailored for a specific d.

The solutions to the problem vary widely de- 
pending on the distance d used. This entry fo- 
cuses on a very popular one, called Levenshtein 
distance or edit distance, defined as the minimum 
number of character insertions, deletions, and 
substitutions necessary to convert one string into 
the other. It will also pay some attention to other 
common variants such as indel distance, where 
only insertions and deletions are permitted and
which is the dual of the longest common subsequence
lcs ($d(A, B) = |A| + |B| - 2 \cdot \mathrm{lcs}(A, B)$), and
Hamming distance, where only substitutions are 
permitted. 
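
As an illustration of the three unit-cost distances just defined, the following Python sketch computes each of them by textbook dynamic programming. The function names are hypothetical and chosen only for this example; the indel distance is obtained through the lcs identity stated above.

```python
def hamming(a, b):
    """Hamming distance: substitutions only, so lengths must agree."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def lcs_length(a, b):
    """Length of a longest common subsequence, row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel(a, b):
    """Indel distance via d(A, B) = |A| + |B| - 2 * lcs(A, B)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def levenshtein(a, b):
    """Edit distance: insertions, deletions, and substitutions, each of cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]
```

For instance, "survey" and "surgery" are at edit distance 2 but at indel distance 3, since a substitution must be simulated by a deletion plus an insertion.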

A popular generalization of all the above is 
the weighted edit distance, where the operations 
are given positive real-valued weights and the 
distance is the minimum sum of weights of a 
sequence of operations converting one string into 
the other. The weight of deleting a character c
is written $w(c \to \varepsilon)$, that of inserting c is
written $w(\varepsilon \to c)$, and that of substituting c
by $c' \ne c$ is written $w(c \to c')$. It is assumed that
$w(c \to c) = 0$ and that the triangle inequality holds, that
is, $w(x \to y) + w(y \to z) \ge w(x \to z)$ for any
$x, y, z \in \Sigma \cup \{\varepsilon\}$. As the distance may now be
asymmetric, it is fixed that d(A, B) is the cost 
of converting A into B. Of course, any result for 
weighted edit distance applies to edit, Hamming, 
and indel distances (collectively termed unit-cost 
edit distances) as well, but other reductions are 
not immediate. 

Both worst- and average-case complexity are 
considered. For the latter, one assumes that pat- 
tern and text are randomly generated by choosing 
each character uniformly and independently from 
$\Sigma$. For simplicity and practicality, m = o(n) is
assumed in this entry. 


Key Results 


The oldest and most versatile solution to
the problem [13] builds on the process of
computing weighted edit distance. Let
$A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$ be two strings.
Let $C[0 \ldots m, 0 \ldots n]$ be a matrix such that
$C[i, j] = d(a_1 \ldots a_i, b_1 \ldots b_j)$. Then, it holds
$C[0, 0] = 0$ and


\[
C[i, j] = \min\bigl( C[i-1, j] + w(a_i \to \varepsilon),\;
                     C[i, j-1] + w(\varepsilon \to b_j),\;
                     C[i-1, j-1] + w(a_i \to b_j) \bigr),
\]


where $C[i, -1] = C[-1, j] = \infty$ is assumed.
This matrix is computed in O(mn) time and
$d(A, B) = C[m, n]$. In order to solve the approximate
string matching problem, one takes A = P
and B = T and sets $C[0, j] = 0$ for all j, so that
the above formula is used only for i > 0.


Theorem 1 (Sellers 1980 [13]) There exists an 
O(mn) worst-case time solution to the ASM 
problem under weighted edit distance. 


The space is O(m) if one realizes that C can
be computed column-wise and only column $j - 1$
is necessary to compute column j. As explained,
this immediately implies that searching under 
unit-cost edit distances can be done in O(mn) 
time as well. In those cases, it is quite easy to 
compute only part of matrix C so as to achieve 
O(kn) average-time algorithms [14]. 
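
The column-wise computation can be sketched in Python as follows. This is a minimal illustration, not the code of [13]: the function name is hypothetical, and for brevity it assumes one constant weight per operation type (w_del, w_ins, w_sub) instead of a full weight function over character pairs.

```python
def sellers_search(pattern, text, k, w_del=1, w_ins=1, w_sub=1):
    """Return the 1-based text positions j such that some substring
    ending at j is within distance k of the pattern.

    Rows i follow the pattern, columns j follow the text; only the
    previous column is kept, so the space is O(m).
    """
    m = len(pattern)
    # Column j = 0: turning p_1..p_i into the empty string deletes i characters.
    col = [i * w_del for i in range(m + 1)]
    hits = []
    for j, tc in enumerate(text, 1):
        new = [0]  # C[0, j] = 0: an occurrence may start at any text position
        for i in range(1, m + 1):
            sub = 0 if pattern[i - 1] == tc else w_sub
            new.append(min(new[i - 1] + w_del,   # C[i-1, j]   + w(a_i -> eps)
                           col[i] + w_ins,       # C[i, j-1]   + w(eps -> b_j)
                           col[i - 1] + sub))    # C[i-1, j-1] + w(a_i -> b_j)
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

With unit weights, searching for "abc" in "xxabcxx" with k = 1 reports positions 4, 5, and 6, which end the approximate occurrences "ab", "abc", and "abcx".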

Yet, there exist algorithms with lower worst-case
complexity for weighted edit distance. By
applying a Ziv-Lempel parsing to P and T,
it is possible to identify regions of matrix C
corresponding to substrings of P and T that
can be computed from other previous regions
corresponding to similar substrings of P and T [5].


Theorem 2 (Crochemore et al. 2003 [5])
There exists an $O(n + mn/\log_\sigma n)$ worst-case
time solution to the ASM problem under
weighted edit distance. Moreover, the time is
$O(n + mnh/\log n)$, where $0 \le h \le \log \sigma$ is the
entropy of T.


This result is very general, also holding for 
computing weighted edit distance and local sim- 
ilarity (see section on “Applications”). For the 
case of edit distance and exploiting the unit-cost 
RAM model, it is possible to do better. On one 
hand, one can apply a four-Russians technique:
All the possible blocks (submatrices of C) of
size $t \times t$, for $t = O(\log_\sigma n)$, are precomputed,
and matrix C is computed block-wise [9]. On
the other hand, one can represent each cell in
matrix C using a constant number of bits (as it
can differ from neighboring cells by $\pm 1$) so as to
store and process several cells at once in a single 
machine word [10]. This latter technique is called 
bit-parallelism and assumes a machine word of 
O(log n) bits. 
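
The bit-parallel idea can be sketched in Python as follows, in the spirit of [10] and following a widely used textbook formulation of that algorithm. This is an illustration under stated assumptions: the function and variable names are hypothetical, and Python's unbounded integers, masked to m bits, stand in for the machine word, so the sketch shows the logic rather than the actual speedup. The words vp and vn hold the positions where a column of C increases or decreases by 1 from one row to the next.

```python
def myers_search(pattern, text, k):
    """Return 1-based end positions of occurrences of pattern in text
    with at most k unit-cost edit errors, updating a whole column of
    the matrix C with a constant number of word operations."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1      # emulates a machine word of m bits
    top = 1 << (m - 1)       # bit of the last pattern row
    peq = {}                 # peq[c]: bit i set iff pattern[i] == c
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    vp, vn = mask, 0         # column 0 is 0, 1, ..., m: all vertical deltas +1
    err = m                  # current value of C[m, j]
    hits = []
    for j, c in enumerate(text, 1):
        x = peq.get(c, 0) | vn
        d0 = (((vp + (x & vp)) ^ vp) | x) & mask   # diagonal delta is 0
        hn = vp & d0                               # horizontal delta is -1
        hp = (vn | ~(vp | d0)) & mask              # horizontal delta is +1
        if hp & top:
            err += 1
        elif hn & top:
            err -= 1
        x = (hp << 1) & mask   # row 0 has horizontal delta 0 when searching
        vn = x & d0
        vp = ((hn << 1) | ~(x | d0)) & mask
        if err <= k:
            hits.append(j)
    return hits
```

Each text character costs a constant number of word operations when m fits in one word; longer patterns would be split over several words, which this sketch does not attempt.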


Theorem 3 (Masek and Paterson 1980
[9]; Myers 1999 [10]) There exist
$O(n + mn/(\log_\sigma n)^2)$ and $O(n + mn/\log n)$ worst-case
time solutions to the ASM problem under
edit distance.


Both complexities are retained for indel dis- 
tance, yet not for Hamming distance. 

For unit-cost edit distances, the complexity 
can depend on k rather than on m, as k < m for 
the problem to be nontrivial, and usually k is a 
small fraction of m (or even k = o(m)). A classic 
technique [8] computes matrix C by processing 
in constant time diagonals $C[i + d, j + d]$, $0 \le d \le s$,
along which cell values do not change.
This is possible by preprocessing the suffix trees 
of T and P for lowest common ancestor queries. 


Theorem 4 (Landau and Vishkin 1989 [8]) 
There exists an O(kn) worst-case time solution 
to the ASM problem under unit-cost edit 
distances. 


Other solutions exist which are better for small
k, achieving time $O(n(1 + k^4/m))$ [4]. For
the case of Hamming distance, one can achieve
improved results using convolutions [1].


Theorem 5 (Amir et al. 2004 [1]) There exist
$O(n \sqrt{k \log k})$ and $O(n(1 + k^3/m) \log k)$ worst-case
time solutions to the ASM problem under
Hamming distance.


The last result for edit distance [4] achieves
O(n) time if k is small enough ($k = O(m^{1/4})$). It
is also possible to achieve O(n) time on unit-cost
edit distances at the expense of an exponential
additive term on m or k: The number of different
columns in C is independent of n, so the transition
from every possible column to the next can
be precomputed as a finite-state machine.


Theorem 6 (Ukkonen 1985 [14]) There exists
an $O(n + m \min(3^m, m(2m\sigma)^k))$ worst-case time
solution to the ASM problem under edit distance.


Similar results apply for Hamming and in- 
del distance, where the exponential term reduces 
slightly according to the particularities of the 
distances. 

The worst-case complexity of the ASM problem
is of course $\Omega(n)$, but it is not known if this
can be attained for any m and k. Yet, the average-case
complexity of the problem is known.


Theorem 7 (Chang and Marr 1994 [3]) The
average-case complexity of the ASM problem is
$\Theta(n(k + \log_\sigma m)/m)$ under unit-cost edit
distances.


It is not hard to prove the lower bound as
an extension to Yao's bound for exact string
matching [15]. The lower bound was reached in
the same paper [3], for $k/m < 1/3 - O(1/\sqrt{\sigma})$.
This was improved later to $k/m < 1/2 - O(1/\sqrt{\sigma})$
[6] using a slightly different idea. The
approach is to precompute the minimum distance
to match every possible text substring (block) of
length $O(\log_\sigma m)$ inside P. Then, a text window
is scanned backwards, block-wise, adding up
those minimum precomputed distances. If they
exceed k before scanning all the window, then
no occurrence of P with k errors can contain the
scanned blocks, and the window can be safely slid
over the scanned blocks, advancing in T. This
is an example of a filtration algorithm, which
discards most text areas and applies an ASM
algorithm only over those areas that cannot be
discarded.


Theorem 8 (Fredriksson and Navarro 2004
[6]) There exists an optimal-on-average solution
to the ASM problem under edit distance, for any
$k/m < 1/2 - O(1/\sqrt{\sigma})$.


The result applies verbatim to indel distance.
The same complexity is achieved for Hamming
distance, yet the limit on k/m improves to
$1 - 1/\sigma$. Note that, when the limit k/m is reached,
the average complexity is already $\Omega(n)$. It is not
clear up to which k/m limit one could achieve
linear time on average.


Applications


The problem has many applications in compu- 
tational biology (to compare DNA and protein 
sequences, recovering from experimental errors, 
so as to spot mutations or predict similarity of 
structure or function), text retrieval (to recover 
from spelling, typing, or automatic recognition 
errors), signal processing (to recover from trans- 
mission and distortion errors), and several others. 
See a survey [11] for a more detailed discussion. 

Many extensions of the ASM problem exist, 
particularly in computational biology. For exam- 
ple, it is possible to substitute whole substrings 
by others (called generalized edit distance), swap 
characters in the strings (string matching with 
swaps or transpositions), reverse substrings (re- 
versal distance), have variable costs for inser- 
tions/deletions when they are grouped (similarity 
with gap penalties), and look for any pair of 
substrings of both strings that are sufficiently 
similar (local similarity). See, for example, Gus- 
field’s book [7], where many related problems are 
discussed. 


Open Problems 


The worst-case complexity of the problem is not
fully understood. For unit-cost edit distances, it is
O(n) if $m = O(\min(\log n, (\log_\sigma n)^2))$ or
$k = O(\min(m^{1/4}, \log_{m\sigma} n))$. For weighted edit
distance, the complexity is O(n) if $m = O(\log_\sigma n)$.
It is also unknown up to which k/m value one can
achieve O(n) average time; up to now this has
been achieved up to $k/m = 1/2 - O(1/\sqrt{\sigma})$.


Experimental Results 


A thorough survey on the subject [11] presents 
extensive experiments. Nowadays, the fastest al- 
gorithms for edit distance are in practice filtra- 
tion algorithms [6, 12] combined with bit-parallel 
algorithms to verify the candidate areas [2, 10]. 
Those filtration algorithms work well for small
enough k/m; otherwise, the bit-parallel algorithms
should be used stand-alone. Filtration
algorithms are easily extended to handle multiple
patterns searched simultaneously.


URL to Code 


Well-known packages offering efficient ASM
are agrep (https://github.com/Wikinaut/agrep)
and nrgrep
(http://www.dcc.uchile.cl/~gnavarro/software).


Problem Definition


Identification of periodic structures in words 
(variants of which are known as tandem repeats, 
repetitions, powers, or runs) is a fundamental 
algorithmic task (see entry » Squares and 
Repetitions). In many practical applications, such 
as DNA sequence analysis, considered repetitions 
admit a certain variation between copies of the
repeated pattern. In other words, the repetitions of
interest are approximate tandem repeats and not
necessarily exact repeats only.

The simplest instance of an approximate tan- 
dem repeat is an approximate square. An approx- 
imate square in a word w is a subword uv, where 
u and v are within a given distance k accord- 
ing to some distance measure between words, 
such as Hamming distance or edit (also called 
Levenshtein) distance. There are several ways 
to define approximate tandem repeats as succes- 
sions of approximate squares, i.e., to generalize 
to the approximate case the notion of arbitrary 
periodicity (see entry » Squares and Repetitions). 
In this entry, we discuss three different definitions 
of approximate tandem repeats. The first two are 
built upon the Hamming distance measure, and 
the third one is built upon the edit distance. 

Let $h(\cdot, \cdot)$ denote the Hamming distance
between two words of equal length.


Definition 1 A word $r[1 \ldots n]$ is called a K-repetition
of period p, $p \le n/2$, iff
$h(r[1 \ldots n-p], r[p+1 \ldots n]) \le K$.


Equivalently, a word $r[1 \ldots n]$ is a K-repetition
of period p, if the number of mismatches, i.e., the
number of i such that $r[i] \ne r[i + p]$, is at most
K. For example, ataa atta ctta ct is a 2-repetition
of period 4. atc atc atc atg atg atg atg atg is a
1-repetition of period 3, but atc atc atc att atc atc
atc att is not.
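
The mismatch-counting form of Definition 1 translates directly into a short check. The Python sketch below uses hypothetical function names and 0-based indexing; spaces in the examples above are only visual separators and are removed before testing.

```python
def mismatches_at_period(r, p):
    """Number of positions i with r[i] != r[i + p] (0-based indexing)."""
    return sum(1 for i in range(len(r) - p) if r[i] != r[i + p])


def is_k_repetition(r, p, K):
    """Definition 1: r is a K-repetition of period p (with p <= n/2)
    iff it has at most K mismatches at distance p."""
    if p < 1 or 2 * p > len(r):
        return False
    return mismatches_at_period(r, p) <= K
```

On the examples of the text, "ataaattacttact" has exactly 2 mismatches at period 4, while "atcatcatcattatcatcatcatt" has 3 mismatches at period 3 and is therefore not a 1-repetition.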


Definition 2 A word $r[1 \ldots n]$ is called a K-run
of period p, $p \le n/2$, iff for every $i \in [1 \ldots n-2p+1]$,
we have $h(r[i \ldots i+p-1], r[i+p \ldots i+2p-1]) \le K$.


A K-run can be seen as a sequence of approximate
squares uv such that |u| = |v| = p and u
and v differ by at most K mismatches. The total
number of mismatches in a K-run is not bounded.
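
Definition 2 can be checked naively by sliding a window of length 2p over the word. The following Python sketch (hypothetical names, 0-based indexing, quadratic time in the worst case) is meant only to make the definition concrete; it is not one of the efficient algorithms discussed under Key Results.

```python
def hamming(u, v):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(u, v))


def is_k_run(r, p, K):
    """Definition 2: every window uv of r with |u| = |v| = p must have
    at most K mismatches between u and v."""
    n = len(r)
    if p < 1 or 2 * p > n:
        return False
    return all(hamming(r[i:i + p], r[i + p:i + 2 * p]) <= K
               for i in range(n - 2 * p + 1))
```

For example, the word 000100 repeated four times is a 1-run of period 3 although it has 7 mismatches in total, which illustrates that the total number of mismatches in a K-run is not bounded by K.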

Let $ed(\cdot, \cdot)$ denote the edit distance between
two strings.


Definition 3 A word r is a K-edit repeat if it can
be partitioned into consecutive subwords,
$r = v' w_1 w_2 \ldots w_\ell v''$, $\ell \ge 2$, such that


\[
ed(v', w_1') + \sum_{i=1}^{\ell-1} ed(w_i, w_{i+1}) + ed(w_\ell'', v'') \le K,
\]


where $w_1'$ is some suffix of $w_1$ and $w_\ell''$ is some
prefix of $w_\ell$.


A K-edit repeat is a sequence of “evolving” 
copies of a pattern such that there are at most 
K insertions, deletions, and mismatches, overall, 
between all consecutive copies of the repeat. For 
example, the word r = caagct cagct ccgct is a 2-edit
repeat.
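
To make Definition 3 concrete, the Python sketch below evaluates the cost of one given partition of r. It is an illustration only: the function names are hypothetical, the partition is supplied by the caller instead of being searched for, and the empty suffix and prefix are assumed to be admissible choices for the boundary terms.

```python
def edit_distance(a, b):
    """Unit-cost edit distance by the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_repeat_cost(copies, left="", right=""):
    """Cost of the partition r = left + copies[0] + ... + copies[-1] + right.

    left is compared with the best suffix of the first copy and right
    with the best prefix of the last copy (assumption: both may be empty).
    """
    if len(copies) < 2:
        raise ValueError("at least two copies are required")
    first, last = copies[0], copies[-1]
    head = min(edit_distance(left, first[s:]) for s in range(len(first) + 1))
    tail = min(edit_distance(last[:e], right) for e in range(len(last) + 1))
    inner = sum(edit_distance(u, v) for u, v in zip(copies, copies[1:]))
    return head + inner + tail
```

For the example above, the partition caagct, cagct, ccgct has cost 2 (one deletion, then one substitution), so the word is a 2-edit repeat.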

When looking for tandem repeats occurring in 
a word, it is natural to consider maximal repeats. 
Those are the repeats extended to the right and 
left as much as possible provided that the corre- 
sponding definition is still verified. Note that the 
notion of maximality applies to K-repetitions, to 
K-runs, and to K-edit repeats. 

Under the Hamming distance, K-runs 
provide the weakest “reasonable” definition of 
approximate tandem repeats, since it requires 
that every square it contains cannot contain 
more than K mismatch errors, which seems 
to be a minimal reasonable requirement. On 
the other hand, K-repetition is the strongest 
such notion as it limits by K the total number 
of mismatches. This provides an additional 
justification that finding these two types 
of repeats is important as they “embrace” 
other intermediate types of repeats. Several 
intermediate definitions have been discussed in 
[9, Section 5]. 

In general, each K-repetition is a part of a 
K-run of the same period, and every K-run is 
the union of all K-repetitions it contains. Ob- 
serve that a K-run can contain as many as a 
linear number of K-repetitions with the same 
period. For example, the word $(000\,100)^n$ of
length 6n is a 1-run of period 3, which contains
$(2n - 1)$ 1-repetitions. In general, a K-run r
contains $(s - K + 1)$ K-repetitions of the same
period, where s is the number of mismatches
in r.


Example 1 The following Fibonacci word contains
three 3-runs of period 6. They are shown below
the word, in positions aligned with their
occurrences. Two of them are identical and contain
four 3-repetitions each, listed directly below the
first run only. The third run is a 3-repetition in
itself.

010010 100100 101001 010010 010100 1001

 10010 100100 101001
 10010 100100 10
  0010 100100 101
    10 100100 10100
     0 100100 101001

                1001 010010 010100 1

                         10 010100 1001


Key Results 


Given a word w of length n and an integer K, it 
is possible to find all K-runs, K-repetitions, and 
K-edit repeats within w in the following time and 
space bounds: 


K-runs can be found in time O(nK log K + S)
(S is the output size) and working space O(n)
[9].

K-repetitions can be found in time O(nK
log K + S) and working space O(n) [9].

K-edit repeats can be found in time
O(nK log K log(n/K) + S) and working
space $O(n + K^2)$ [14, 19].


All three algorithms are based on similar 
algorithmic tools that generalize corresponding 
techniques for the exact case [4, 15, 16] (see 
[10] for a systematic presentation). The first 
basic tool is a generalization of the longest 
extension functions [16] that, in the case of 
Hamming distance, can be exemplified as 
follows. Given a word w, we want to compute, 
for each position p and each $k \le K$, the
quantity $\max\{ j \mid h(w[1 \ldots j], w[p \ldots p+j-1]) \le k \}$.
Computing all those values can
be done in time O(nK) using a method 
based on the suffix tree and the computation 
of the lowest common ancestor described 
in [7]. 
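
The quantity being computed can be illustrated with a naive Python sketch. It is not the O(nK) suffix-tree method of [7]: it scans each position directly and may take quadratic time, and the function name and 0-based indexing are choices made for this example only.

```python
def longest_extensions(w, K):
    """table[p][k] = largest j such that w[0:j] and w[p:p+j] differ in
    at most k positions, for every start p and every k = 0..K."""
    n = len(w)
    table = []
    for p in range(n):
        row = []
        for j in range(n - p):
            if w[j] != w[p + j]:
                # The prefix of length j contains exactly len(row) mismatches,
                # so j is the longest extension with at most len(row) of them.
                row.append(j)
                if len(row) == K + 1:
                    break
        # With fewer than k + 1 mismatches in total, the extension
        # reaches the end of the word.
        row += [n - p] * (K + 1 - len(row))
        table.append(row)
    return table
```

For w = "abcabd" and K = 1, the suffix starting at the second "a" matches the prefix for 2 characters exactly and for 3 characters with one mismatch.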

The second tool is the Lempel-Ziv factoriza- 
tion used in the well-known compression method. 
Different variants of the Lempel-Ziv factorization 
of a word can be computed in linear time [7, 18]. 

The algorithm for computing K-repetitions 
from [9] can be seen as a direct generalization of 
the algorithm for computing maximal repetitions
(runs) in the exact case [8, 15]. Although based 
on the same basic tools and ideas, the algorithm 
[9] for computing K-runs is much more involved 
and uses a complex “bootstrapping” technique for 
assembling runs from smaller parts. 

The algorithm for finding the K-edit repeats 
uses both the recursive framework and the idea 
of the longest extension functions of [16]. The 
longest common extensions, in this case, allow 
up to K edit operations. Efficient methods for 
computing these extensions are based upon a 
combination of the results of [12] and [13]. The 
K-edit repeats are derived by combining the 
longest common extensions computed in the for- 
ward direction with those computed in the reverse 
direction. 


Applications 


Tandemly repeated patterns in DNA sequences 
are involved in various biological functions and 
are used in different practical applications. 

Tandem repeats are known to be involved in 
regulatory mechanisms, e.g., to act as binding 
sites for regulatory proteins. Tandem repeats have 
been shown to be associated with recombina- 
tion hotspots in higher organisms. In bacteria, 
a correlation has been observed between certain 
tandem repeats and virulence and pathogenicity 
genes. 

Tandem repeats are responsible for a number 
of inherited diseases, especially those involving 
the central nervous system. Fragile X syndrome, 
Kennedy disease, myotonic dystrophy, and Hunt- 
ington’s disease are among the diseases that have 
been associated with triplet repeats. 

Examples of different genetic studies illustrating
the abovementioned biological roles of tandem
repeats can be found in the introductory sections
of [1, 6, 11]. Beyond being genomic elements
associated with various biological functions,
tandem repeats have been established to be
a fundamental mutational mechanism in genome
evolution [17].

A major practical application of short tandem 
repeats is based on the interindividual variability 
in copy number of certain repeats occurring at a
single locus. This feature makes tandem repeats a 
convenient tool for genetic profiling of individ- 
uals. The latter, in turn, is applied to pedigree 
analysis and establishing phylogenetic relation- 
ships between species, as well as to forensic 
medicine [3]. 


Open Problems 


The definition of K-edit repeats is similar to 
that of K-repetitions (for the Hamming distance 
case). It would be interesting to consider other 
definitions of maximal repeats over the edit dis- 
tance. For example, a definition similar to the K- 
run would allow up to K edits between each pair 
of neighboring periods in the repeat. Other possi- 
ble definitions would allow K errors between any 
pair of copies of a repeat, or between all pairs 
of copies, or between some consensus and each 
copy. 

In general, a weighted edit distance scheme is 
necessary for biological applications. Known al- 
gorithms for tandem repeats based on a weighted 
edit distance scheme are not feasible, and thus, 
only heuristics are currently used. 


URL to Code 


The algorithms described in this entry have 
been implemented for DNA sequences and 
are publicly available. The Hamming distance 
algorithms (K-runs and K-repetitions) are part 
of the mreps software package, available at 
http://mreps.univ-mlv.fr/ [11]. The K-edit repeat 
software, TRED, is available at
http://tandem.sci.brooklyn.cuny.edu/ [19]. The
implementations of the algorithms are coupled
with postprocessing filters, necessary due to the
nature of biological sequences.

In practice, software based on heuristic and
statistical methods is widely used. Among these,
TRF (http://tandem.bu.edu/trf/trf.html) [1] is the
most popular program used by the bioinformatics
community. Other programs include ATRHunter
(http://bioinfo.cs.technion.ac.il/atrhunter/) [20]
and TandemSWAN
(http://favorov.bioinfolab.net/swan/) [2]. STAR
(http://atgc.lirmm.fr/star/) [5] is another program,
based on an information-theoretic approach, for
computing approximate tandem repeats of a
prespecified pattern.


Problem Definition


Population and evolutionary dynamics have been 
extensively studied, usually with the assumption 
that the evolving population has no spatial struc- 
ture. One of the main models in this area is 
the Moran process [17]. The initial population
contains a single “mutant” with fitness r > 0,
with all other individuals having fitness 1. At 
each step of the process, an individual is chosen 
at random, with probability proportional to its 
fitness. This individual reproduces, replacing a 
second individual, chosen uniformly at random, 
with a copy of itself. 

Lieberman, Hauert, and Nowak introduced a 
generalization of the Moran process, where the 
members of the population are placed on the 
vertices of a connected graph which is, in gen- 
eral, directed [13, 19]. In this model, the initial 
population again consists of a single mutant of 
fitness r > 0 placed on a vertex chosen uniformly 
at random, with each other vertex occupied by a 
nonmutant with fitness 1. The individual that will 
reproduce is chosen as before, but now one of its 
neighbors is randomly selected for replacement, 
either uniformly or according to a weighting of 
the edges. The original Moran process can be re- 
covered by taking the graph to be an unweighted 
clique. 

Several similar models describing particle in- 
teractions have been studied previously, including 
the SIR and SIS epidemic models [8, Chapter 21], 
the voter model, the antivoter model, and the 
exclusion process [1,7, 14]. Related models, such 
as the decreasing cascade model [12, 18], have 
been studied in the context of influence propa- 
gation in social networks and other models have 
been considered for dynamic monopolies [2]. 
However, these models do not consider different 
fitnesses for the individuals. 

In general, the Moran process on a finite, con- 
nected, directed graph may end with all vertices 
occupied by mutants or with no vertex occupied 
by a mutant — these cases are referred to as fixa- 
tion and extinction, respectively — or the process 
may continue forever. However, for undirected 
graphs and strongly connected digraphs, the pro- 
cess terminates almost surely, either at fixation 
or extinction. At the other extreme, in a directed 
graph with two sources, neither fixation nor ex- 
tinction is possible. In this work we consider 
finite undirected graphs. The fixation probability 
for a mutant of fitness r in a graph G is the 
probability that fixation is reached and is denoted
$f_{G,r}$.


Key Results


The fixation probability can be determined by 
standard Markov chain techniques. However, do- 
ing so for a general graph on n vertices requires 
solving a set of $2^n$ linear equations, which is not
computationally feasible, even numerically. As 
a result, most prior work on computing fixation 
probabilities in the generalized Moran process 
has either been restricted to small graphs [6] or 
graph classes where a high degree of symme- 
try reduces the size of the set of equations — 
for example, paths, cycles, stars, and complete 
graphs [3-5] — or has concentrated on finding 
graph classes that either encourage or suppress 
the spread of the mutants [13, 16]. 

Because of the apparent intractability of exact 
computation, we turn to approximation. Using 
a potential function argument, we show that, 
with high probability, the Moran process on an 
undirected graph of order n reaches absorption 
(either fixation or extinction) within $O(n^6)$ steps
if r = 1 and $O(n^4)$ and $O(n^3)$ steps when r > 1
and r < 1, respectively. Taylor et al. [20] studied
absorption times for variants of the generalized 
Moran process, but, in our setting, their results 
only apply to the process on regular graphs, 
where it is equivalent to a biased random walk 
on a line with absorbing barriers. The absorption 
time analysis of Broom et al. [3] is also restricted 
to cliques, cycles, and stars. In contrast to this 
earlier work, our results apply to all connected 
undirected graphs. 

Our bound on the absorption time, along with 
polynomial upper and lower bounds for the fix- 
ation probability, allows the estimation of the 
fixation and extinction probabilities by Monte 
Carlo techniques. Specifically, we give a fully 
polynomial randomized approximation scheme 
(FPRAS) for these quantities. An FPRAS for a 
function f(X) is a polynomial-time randomized 
algorithm g that, given input X and an error
bound $\varepsilon$, satisfies
$(1 - \varepsilon) f(X) \le g(X) \le (1 + \varepsilon) f(X)$
with probability at least 3/4 and runs in
time polynomial in the length of X and $1/\varepsilon$ [11].

For the case r < 1, there is no polynomial 
lower bound on the fixation probability so only 
the extinction probability can be approximated 
by this technique. Note that, when f < 1,
computing $1 - f$ to within a factor of $1 + \varepsilon$ does
not imply computing f to within the same factor.


Bounding the Fixation Probability 

In the next two lemmas, we provide polynomial
upper and lower bounds for the fixation probability
of an arbitrary undirected graph G. Note
that the lower bound of Lemma 1 holds only
for $r \ge 1$. Indeed, for example, the fixation
probability of the complete graph $K_n$ is given by
$f_{K_n,r} = (1 - \frac{1}{r}) / (1 - \frac{1}{r^n})$ [13, 19], which is
exponentially small for any r < 1.
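
The closed form for the complete graph is easy to evaluate numerically. The Python sketch below uses a hypothetical function name and treats the neutral case r = 1 separately, where the formula is read as its limit 1/n.

```python
def fixation_probability_complete(n, r):
    """Fixation probability of a single mutant of fitness r on K_n,
    using (1 - 1/r) / (1 - 1/r**n); the limit as r -> 1 is 1/n."""
    if n < 1 or r <= 0:
        raise ValueError("need n >= 1 and r > 0")
    if r == 1:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))
```

For r = 1/2 and n = 10 the value is 1/1023, which shows how quickly the probability decays with n for a disadvantageous mutant, while for any r of at least 1 it stays at least 1/n.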


Lemma 1 Let G = (V, E) be an undirected
graph with n vertices. Then $f_{G,r} \ge \frac{1}{n}$ for any
$r \ge 1$.


Lemma 2 Let G = (V, E) be an undirected
graph with n vertices. Then $f_{G,r} \le 1 - \frac{1}{n+r}$ for
any r > 0.


Bounding the Absorption Time 

In this section, we show that the Moran process
on a connected graph G of order n is expected
to reach absorption in a polynomial number of
steps. To do this, we use the potential function
given by


\[
\phi(S) = \sum_{x \in S} \frac{1}{\deg x}
\]


for any state $S \subseteq V(G)$ and we write $\phi(G)$ for
$\phi(V(G))$. Note that $1 < \phi(G) \le n$ and that
$\phi(\{x\}) = 1/\deg x \le 1$ for any vertex $x \in V$.

First, we show that the potential strictly
increases in expectation when r > 1 and strictly
decreases in expectation when r < 1.


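Lemma 3 Let $(X_i)_{i \ge 0}$ be a Moran process on a
graph G = (V, E) and let $\emptyset \subset S \subset V$. If $r \ge 1$,
then


\[
E[\phi(X_{i+1}) - \phi(X_i) \mid X_i = S] \ge \left(1 - \frac{1}{r}\right) \cdot \frac{1}{n^3},
\]


with equality if and only if r = 1. For r < 1,


\[
E[\phi(X_{i+1}) - \phi(X_i) \mid X_i = S] < \frac{r - 1}{n^3}.
\]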


To bound the expected absorption time, we 
use martingale techniques. It is well known how 
to bound the expected absorption time using a 
potential function that decreases in expectation 
until absorption. This has been made explicit by 
Hajek [9] and we use the following formulation 
based on that of He and Yao [10]. The proof is 
essentially theirs but is modified to give a slightly 
stronger result. 


Theorem 1 Let $(Y_i)_{i \ge 0}$ be a Markov chain with
state space $\Omega$, where $Y_0$ is chosen from some set
$I \subseteq \Omega$. If there are constants $k_1, k_2 > 0$ and a
nonnegative function $\psi : \Omega \to \mathbb{R}$ such that


• $\psi(S) = 0$ for some $S \in \Omega$,

• $\psi(S) \le k_1$ for all $S \in I$ and

• $E[\psi(Y_i) - \psi(Y_{i+1}) \mid Y_i = S] \ge k_2$ for all
$i \ge 0$ and all S with $\psi(S) > 0$,


then $E[\tau] \le k_1/k_2$, where $\tau = \min\{ i : \psi(Y_i) = 0 \}$.


Using Theorem 1, we can prove the following
upper bounds for the absorption time $\tau$ in the
cases where r < 1 and r > 1, respectively.


Theorem 2 Let G = (V, E) be a graph of order
n. For r < 1 and any $S \subseteq V$, the absorption time
$\tau$ of the Moran process on G satisfies


\[
E[\tau \mid X_0 = S] \le \frac{1}{1 - r}\, n^3 \phi(S).
\]


Theorem 3 Let G = (V, E) be a graph of order
n. For r > 1 and any $S \subseteq V$, the absorption time
$\tau$ of the Moran process on G satisfies


\[
E[\tau \mid X_0 = S] \le \frac{r}{r - 1}\, n^3 \bigl(\phi(G) - \phi(S)\bigr) \le \frac{r}{r - 1}\, n^4.
\]


The case r = 1 is more complicated as
Lemma 3 shows that the expectation is constant.
However, this allows us to use standard martingale
techniques and the proof of the following
is partly adapted from the proof of Lemma 3.4
in [15].


Theorem 4 The expected absorption time for the
Moran process $(X_i)_{i \ge 0}$ with r = 1 on a graph
G = (V, E) is at most $n^4(\phi(G)^2 - E[\phi(X_0)^2])$.


Approximation Algorithms 

We now have all the components needed 
to present our fully polynomial randomized 
approximation schemes (FPRAS) for the problem 
of computing the fixation probability of a graph, 
where $r \ge 1$, and for computing the extinction
probability for all r > 0. In the following two
theorems, we give algorithms whose running
times are polynomial in n, r, and $1/\varepsilon$. For the
algorithms to run in time polynomial in the
length of the input and thus meet the definition of
FPRAS, r must be encoded in unary.


Theorem 5 There is an FPRAS for MORAN FIXATION,
for $r \ge 1$.


Proof (sketch) The algorithm is as follows. If
r = 1 then we return $\frac{1}{n}$. Otherwise, we simulate
the Moran process on G for
$T = \lceil \frac{8r}{r-1} N n^4 \rceil$ steps,
$N = \lceil \frac{1}{2} \varepsilon^{-2} n^2 \ln 16 \rceil$ times and compute
the proportion of simulations that reached fixation.
If any simulation has not reached absorption
(fixation or extinction) after T steps, we abort and
immediately return an error value.

Note that each transition of the Moran process 
can be simulated in O(1) time. Maintaining ar- 
rays of the mutant and nonmutant vertices allows 
the reproducing vertex to be chosen in constant 
time, and storing a list of each vertex’s neigh- 
bors allows the same for the vertex where the 
offspring is sent. Therefore, the total running time 
is O(NT) steps, which is polynomial in n and $1/\varepsilon$,
as required.

For $i \in \{1, \ldots, N\}$, let $X_i = 1$ if the ith
simulation of the Moran process reaches fixation
and $X_i = 0$ otherwise. Assuming all simulation
runs reach absorption, the output of the algorithm
is $p = \frac{1}{N} \sum_{i=1}^{N} X_i$. □
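
The simulation underlying this proof sketch can be illustrated in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the graph is given as adjacency lists over vertices 0..n-1 with uniform choice among neighbors, and the number of runs and the step limit are left to the caller instead of being set to the values N and T of the proof.

```python
import random


def simulate_moran(adj, r, max_steps, rng):
    """Run one Moran process from a uniformly random single mutant.
    Returns 'fixation', 'extinction', or None if not absorbed in time."""
    n = len(adj)
    start = rng.randrange(n)
    mutants = [start]
    others = [v for v in range(n) if v != start]
    pos = [0] * n                       # index of each vertex in its array
    for i, v in enumerate(others):
        pos[v] = i
    is_mutant = [False] * n
    is_mutant[start] = True

    def move(v, src, dst):
        """Move v from array src to array dst in constant time."""
        i, last = pos[v], src[-1]
        src[i] = last
        pos[last] = i
        src.pop()
        pos[v] = len(dst)
        dst.append(v)

    for _ in range(max_steps):
        if not mutants:
            return "extinction"
        if not others:
            return "fixation"
        weight = r * len(mutants)       # total fitness of the mutants
        if rng.random() * (weight + len(others)) < weight:
            parent = rng.choice(mutants)
        else:
            parent = rng.choice(others)
        child = rng.choice(adj[parent])
        if is_mutant[child] != is_mutant[parent]:
            if is_mutant[parent]:
                move(child, others, mutants)
            else:
                move(child, mutants, others)
            is_mutant[child] = is_mutant[parent]
    if not mutants:
        return "extinction"
    if not others:
        return "fixation"
    return None


def estimate_fixation(adj, r, runs, max_steps, seed=0):
    """Proportion of runs reaching fixation; None plays the role of the
    error value when some run is not absorbed within max_steps."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(runs):
        outcome = simulate_moran(adj, r, max_steps, rng)
        if outcome is None:
            return None
        fixed += outcome == "fixation"
    return fixed / runs
```

On the complete graph with 4 vertices and r = 2, the estimate should be close to the exact value (1 - 1/2)/(1 - 1/16) = 8/15.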

Note that this technique fails for disadvantageous
mutants (r < 1) because there is no
analogue of Lemma 1 giving a polynomial lower
bound on $f_{G,r}$. As such, an exponential number
of simulations may be required to achieve the
desired error probability. However, we can give
an FPRAS for the extinction probability for all
r > 0. Although the extinction probability is just
$1 - f_{G,r}$, there is no contradiction because a small
relative error in $1 - f_{G,r}$ does not translate into a
small relative error in $f_{G,r}$ when $f_{G,r}$ is, itself,
small.


Theorem 6 There is an FPRAS for MORAN EXTINCTION
for all r > 0.


Proof (sketch) The algorithm and its correctness
proof are essentially as in the previous theorem. If
r = 1, we return $1 - \frac{1}{n}$. Otherwise, we run
$N = \lceil \frac{1}{2} \varepsilon^{-2} (r + n)^2 \ln 16 \rceil$ simulations of the Moran
process on G for T(r) steps each, where


\[
T(r) =
\begin{cases}
\left\lceil \frac{8r}{r-1} N n^4 \right\rceil & \text{if } r > 1, \\
\left\lceil \frac{8}{1-r} N n^3 \right\rceil & \text{if } r < 1.
\end{cases}
\]


If any simulation has not reached absorption
within T(r) steps, we return an error value;
otherwise, we return the proportion p of simulations
that reached extinction. □


It remains open whether other techniques 
could lead to an FPRAS for MORAN FIXATION 
when r < 1. 
Problem Definition


This problem is to construct a random tree metric 
that probabilistically approximates a given arbi- 
trary metric well. A solution to this problem is 
useful as the first step for numerous approxima- 
tion algorithms because usually solving problems 
on trees is easier than on general graphs. It 
also finds applications in on-line and distributed 
computation. 

It is known that tree metrics approximate
general metrics badly, e.g., given a cycle $C_n$
with n nodes, any tree metric approximating this
graph metric has distortion $\Omega(n)$ [17]. However,
Karp [15] noticed that a random spanning tree
of $C_n$ approximates the distances between any
two nodes in $C_n$ well in expectation. Alon,
Karp, Peleg, and West [1] then proved a bound
of $\exp(O(\sqrt{\log n \log\log n}))$ on the average
distortion for approximating any graph metric
with its spanning tree.
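
Karp's observation about the cycle can be checked by direct computation. The sketch below assumes that the random spanning tree of $C_n$ is obtained by deleting one edge chosen uniformly at random (every spanning tree of a cycle has this form); the function names are hypothetical. It computes, with exact rational arithmetic, the worst expected stretch over all pairs of nodes and, for contrast, the worst stretch of any single spanning tree.

```python
from fractions import Fraction
from itertools import combinations

def cycle_distance(n, u, v):
    """Shortest-path distance between u and v on the cycle C_n."""
    k = abs(u - v)
    return min(k, n - k)

def path_distance(n, e, u, v):
    """Distance between u and v after deleting edge (e, (e + 1) % n) from C_n."""
    pos_u = (u - e - 1) % n      # position of u on the resulting path
    pos_v = (v - e - 1) % n
    return abs(pos_u - pos_v)

def expected_stretch(n):
    """Max over pairs of E[d_T(u, v)] / d(u, v), deleted edge uniform at random."""
    worst = Fraction(0)
    for u, v in combinations(range(n), 2):
        expected = Fraction(sum(path_distance(n, e, u, v) for e in range(n)), n)
        worst = max(worst, expected / cycle_distance(n, u, v))
    return worst

def worst_single_tree_stretch(n):
    """Max over spanning trees and pairs of d_T(u, v) / d(u, v)."""
    return max(Fraction(path_distance(n, e, u, v), cycle_distance(n, u, v))
               for e in range(n) for u, v in combinations(range(n), 2))
```

For a pair at cycle distance k, the expected tree distance works out to 2k(n - k)/n, so the expected stretch stays below 2 for every n, whereas any fixed spanning tree stretches the endpoints of its deleted edge by n - 1.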

Bartal [2] formally defined the notion of prob- 
abilistic approximation. 


Notations 

A graph G = (V, E) with an assignment of nonnegative
weights to the edges of G defines a metric
space $(V, d_G)$ where for each pair $u, v \in V$,
$d_G(u, v)$ is the shortest path distance between
u and v in G. A metric (V, d) is a tree metric
if there exists some tree T = (V', E') such that
$V \subseteq V'$ and for all $u, v \in V$, $d_T(u, v) = d(u, v)$.
The metric (V, d) is also called a metric induced
by T.

Given a metric (V, d), a distribution D
over tree metrics over V $\alpha$-probabilistically
approximates d if every tree metric $d_T \in D$ satisfies
$d_T(u, v) \ge d(u, v)$ and $E_{d_T \in D}[d_T(u, v)] \le \alpha \cdot d(u, v)$,
for every $u, v \in V$. The quantity $\alpha$ is
referred to as the distortion of the approximation.
Although the definition of probabilistic approximation
uses a distribution D over tree metrics,
one is interested in a procedure that constructs
a random tree metric distributed according
to D, i.e., an algorithm that produces a random
tree metric that probabilistically approximates
a given metric. The problem can be formally
stated as follows.


Problem (APPROX-TREE) 

INPUT: a metric (V, d) 

OUTPUT: a tree metric $(V, d_T)$ sampled from
a distribution D over tree metrics that
$\alpha$-probabilistically approximates (V, d).


Bartal then defined a class of tree metrics, called 
hierarchically well-separated trees (HST), as fol- 
lows. A k-hierarchically well-separated tree (k- 
HST) is a rooted weighted tree satisfying two 
properties: the edge weight from any node to 
each of its children is the same, and the edge 
weights along any path from the root to a leaf 
are decreasing by a factor of at least k. These 
properties are important to many approximation 
algorithms. 

Bartal showed that any metric on n points
can be probabilistically approximated by a set
of k-HSTs with $O(\log^2 n)$ distortion, an
improvement from $\exp(O(\sqrt{\log n \log\log n}))$
in [1]. Later Bartal [3], following the same
approach as in Seymour's analysis on the
Feedback Arc Set problem [18], improved the
distortion down to $O(\log n \log\log n)$. Using
a rounding procedure of Calinescu, Karloff, and
Rabani [5], Fakcharoenphol, Rao, and Talwar [9]
devised an algorithm that, in expectation,
produces a tree with $O(\log n)$ distortion. This
bound is tight up to a constant factor.


Key Results 


A tree metric is closely related to graph decom- 
position. The randomized rounding procedure 
of Calinescu, Karloff, and Rabani [5] for the 
0-extension problem decomposes a graph into 
pieces with bounded diameter, cutting each edge 
with probability proportional to its length and 
a ratio between the numbers of nodes at certain
distances. Fakcharoenphol, Rao, and Talwar [9] 
used the CKR rounding procedure to decompose 
the graph recursively and obtained the following 
theorem. 


Theorem 1 Given an n-point metric (V, d), there 
exists a randomized algorithm, which runs in 
time $O(n^2)$, that samples a tree metric from the
distribution D over tree metrics that O(log n)- 
probabilistically approximates (V, d). The tree is 
also a 2-HST. 


The bound in Theorem 1 is tight, as Alon et al. [1]
proved a lower bound of $\Omega(\log n)$ on the distortion when
(V, d) is induced by a grid graph. Also note that it
is known (as folklore) that even embedding a line
metric onto a 2-HST requires distortion $\Omega(\log n)$.

If the tree is required to be a k-HST, one 
can apply the result of Bartal, Charikar, and 
Raz [4] which states that any 2-HST can be 
O(k/ log k)-probabilistically approximated by 
k-HST, to obtain an expected distortion of 
O(k logn/logk). 

Finding a distribution of tree metrics that
probabilistically approximates a given metric
has a dual problem, which is to find a single
tree T with small average weighted stretch.
More specifically, given weights $c_{uv}$ on edges,
find a tree metric $d_T$ such that for all $u, v \in V$,
$d_T(u, v) \ge d(u, v)$ and
$\sum_{u,v \in V} c_{uv} \cdot d_T(u, v) \le \alpha \sum_{u,v \in V} c_{uv} \cdot d(u, v)$.

Charikar, Chekuri, Goel, Guha, and Plotkin [6]
showed how to find a distribution of $O(n \log n)$
tree metrics that $\alpha$-probabilistically approximates
a given metric, provided that one can solve the
dual problem. The algorithm in Theorem 1 can
be derandomized by the method of conditional
expectation to find the required tree metric
with $\alpha = O(\log n)$. Another algorithm based on
modified region growing techniques is presented
in [9], and independently by Bartal.


Theorem 2 Given an n-point metric (V, d), there 
exists a polynomial-time deterministic algorithm 
that finds a distribution D over O(n logn) tree 
metrics that O(logn)-probabilistically approxi- 
mates (V, d). 
Note that the tree output by the algorithm contains
Steiner nodes; however, Gupta [10] showed
how to find another tree metric without Steiner
nodes while preserving all distances within a
constant factor.


Applications 


Metric approximation by random trees has ap- 
plications in on-line and distributed computation, 
since randomization works well against oblivious 
adversaries, and trees are easy to work with and 
maintain. Alon et al. [1] first used tree embedding 
to give a competitive algorithm for the k-server 
problem. Bartal [3] noted a few problems in his 
paper: metrical task system, distributed paging, 
distributed k-server problem, distributed queuing, 
and mobile user. 

After the paper by Bartal in 1996, numerous 
applications in approximation algorithms have 
been found. Many approximation algorithms 
work for problems on tree metrics or HST 
metrics. By approximating general metrics with 
these metrics, one can turn them into algorithms 
for general metrics, while, usually, losing only 
a factor of $O(\log n)$ in the approximation factors.
Sample problems are metric labeling, buy-at-bulk 
network design, and group Steiner trees. Recent 
applications include an approximation algorithm 
to the Unique Games [12], information network 
design [13], and oblivious network design [11]. 

The SIGACT News article [8] is a review of 
the metric approximation by tree metrics with 
more detailed discussion on developments and 
techniques. See also [3, 9], for other applications. 


Open Problems 


Given a metric induced by a graph, some applications,
e.g., solving a certain class of linear systems,
require not just a tree metric but a tree
metric induced by a spanning tree of the graph.
Elkin, Emek, Spielman, and Teng [7] gave an
algorithm for finding a spanning tree with average
distortion of $O(\log^2 n \log\log n)$. It remains open
if this bound is tight.
Problem Definition


The diameter of a graph is the largest distance be- 
tween its vertices. Closely related to the diameter 
is the radius of the graph. The center of a graph is 
a vertex that minimizes the maximum distance to 
all other nodes, and the radius is the distance from
the center to the node furthest from it. Being able 
to compute the diameter, center, and radius of a 
graph efficiently has become an increasingly im- 
portant problem in the analysis of large networks 
[11]. For general weighted graphs the only known 
way to compute the exact diameter and radius 
is by solving the all-pairs shortest paths problem 
(APSP). Therefore, a natural question is whether 
it is possible to get faster diameter and radius 
algorithms by settling for an approximation. For 
a graph G with diameter D, a c-approximation
of D is a value $\hat{D}$ such that $\hat{D} \in [D/c, D]$.
The question is whether a c-approximation can
be computed in sub-cubic time.


Key Results 


For sparse directed or undirected unweighted 
graphs, the best-known algorithm (ignoring poly- 
logarithmic factors) for APSP, diameter, and ra- 
dius does breadth-first search (BFS) from every 
node and hence runs in O(mn) time, where m 
is the number of edges in the graph. For dense 
directed unweighted graphs, it is possible to com- 
pute both the diameter and the radius using fast 
matrix multiplication (this is folklore; for a recent
simple algorithm, see [5]), thus obtaining $\tilde{O}(n^{\omega})$
time algorithms, where $\omega < 2.38$ is the matrix
multiplication exponent [4, 9, 10] and n is the
number of nodes in the graph.

A 2-approximation for both the diameter and 
the radius of an undirected graph can be obtained 
in O(m + n) time using BFS from an arbitrary 
node. For APSP, Dor et al. [6] show that any
$(2 - \varepsilon)$-approximation algorithm in unweighted
undirected graphs running in T(n) time would
imply an O(T(n)) time algorithm for Boolean
matrix multiplication (BMM). Hence a priori it
could be that $(2 - \varepsilon)$-approximating the diameter
and radius of a graph may also require solving
BMM.
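
The linear-time 2-approximation mentioned above is short enough to sketch. Assuming a connected, undirected, unweighted graph given as an adjacency-list dictionary (the function names are hypothetical), the eccentricity e of an arbitrary node satisfies D/2 <= e <= D by the triangle inequality, and the same value is at most twice the radius.

```python
from collections import deque

def bfs_eccentricity(adj, source):
    """Largest BFS distance from source; the graph is assumed connected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph must be connected")
    return max(dist.values())

def exact_diameter(adj):
    """Baseline for comparison: BFS from every node, O(mn) time."""
    return max(bfs_eccentricity(adj, s) for s in adj)
```

On a path with five nodes, a BFS from the middle node returns 2 while the diameter is 4, which shows that the factor of 2 can actually be attained.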

Aingworth et al. [1] showed that this is not
the case by presenting a sub-cubic $(2 - \varepsilon)$-approximation
algorithm for the diameter in both
directed and undirected graphs that does not
use fast matrix multiplication. Their algorithm
computes in $\tilde{O}(m\sqrt{n} + n^2)$ time an estimate
$\hat{D}$ such that $\hat{D} \in [\lfloor 2D/3 \rfloor, D]$. Berman and
Kasiviswanathan [2] showed that for the radius
problem the approach of Aingworth et al. can be
used to obtain in $\tilde{O}(m\sqrt{n} + n^2)$ time an estimate
$\hat{r}$ that satisfies $\hat{r} \in [r, 3r/2]$, where r is the radius
of the graph. For weighted graphs the algorithm
of Aingworth et al. [1] guarantees that the
estimate $\hat{D}$ satisfies $\hat{D} \in [\lfloor \frac{2}{3} D \rfloor - (M - 1), D]$,
where M is the maximum edge weight in the
graph.

Roditty and Vassilevska Williams [8] gave
a Las Vegas algorithm running in expected
$\tilde{O}(m\sqrt{n})$ time that has the same approximation
guarantee as Aingworth et al. for the diameter
and the radius. They also showed that obtaining
a $(3/2 - \varepsilon)$-approximation algorithm running
in $O(n^{2-\delta})$ time in sparse undirected and
unweighted graphs for constant $\varepsilon, \delta > 0$ would
be difficult, as it would imply a fast algorithm
for CNF Satisfiability, violating the widely
believed Strong Exponential Time Hypothesis
of Impagliazzo, Paturi, and Zane [7].

Chechik et al. [3] showed that it is possible
to remove the additive error while still keeping
the running time (in terms of n) subquadratic for
sparse graphs. They present two deterministic
algorithms with 3/2-approximation for the diameter,
one running in $\tilde{O}(m^{3/2})$ time and one running in
$\tilde{O}(mn^{2/3})$ time.


Open Problems 


The main open problem is to understand the 
relation between the diameter computation and 
the APSP problem. Is there a truly sub-cubic time 
algorithm for computing the exact diameter or 
can we show sub-cubic equivalence between the 
exact diameter computation and APSP problem? 

Another important open problem is to find an
algorithm that distinguishes graphs of
diameter two from graphs of diameter three in
sub-cubic time. Alternatively, can we show that this
problem is sub-cubic equivalent to the problem of exact
diameter?
Problem Definition


Spin systems are well-studied objects in statisti- 
cal physics and applied probability. An instance 
of a spin system is an undirected graph G = 
(V, E) of n vertices. A configuration of a two-state
spin system, or simply two-spin system,
on G is an assignment $\sigma : V \to \{0, 1\}$ of two
spin states “0” and “1” (sometimes called “−”
and “+” or seen as two colors) to the vertices
of G. Let

$$A = \begin{bmatrix} A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1} \end{bmatrix}$$

be a nonnegative symmetric matrix which specifies the local
interactions between adjacent vertices and $b = (b_0, b_1)$ be
a nonnegative vector which specifies preferences
of individual vertices over the two spin states. For
each configuration $\sigma \in \{0, 1\}^V$, its weight is then
given by the following product:

$$w(\sigma) = \prod_{\{u,v\} \in E} A_{\sigma(u),\sigma(v)} \prod_{v \in V} b_{\sigma(v)}.$$


The partition function $Z_{A,b}(G)$ of a two-spin system
on G is defined to be the following exponential
summation over all possible configurations:

$$Z_{A,b}(G) = \sum_{\sigma \in \{0,1\}^V} w(\sigma).$$


Up to normalization, A and b can be described
by three parameters, so that one can assume that
$A = \begin{bmatrix} \beta & 1 \\ 1 & \gamma \end{bmatrix}$ and $b = \begin{bmatrix} \lambda \\ 1 \end{bmatrix}$, where $\beta, \gamma \ge 0$
are the edge activities and $\lambda > 0$ is the external
field. Since the roles of the two spin states are
symmetric, it can be further assumed that $\beta \le \gamma$
without loss of generality. Therefore, a two-spin
system is completely specified by the three
parameters $(\beta, \gamma, \lambda)$ where it holds that $0 \le \beta \le \gamma$
and $\lambda > 0$. The resulting partition function is
written as $Z_{(\beta,\gamma,\lambda)}(G) = Z_{A,b}(G)$ and as Z(G)
for short if the parameters are clear from the
context.

The two-spin systems are classified according
to their parameters into two families with distinct
physical and computational properties: the
ferromagnetic two-spin systems ($\beta\gamma > 1$) in
which neighbors favor agreeing spin states and
the antiferromagnetic two-spin systems ($\beta\gamma < 1$)
in which neighbors favor disagreeing spin states.
Two-spin systems with $\beta\gamma = 1$ are trivial in
both physical and computational senses and thus
are usually not considered. The model of two-spin
systems covers some of the most extensively
studied statistical physics models as special
cases and is also accepted in computer
science as a framework for counting problems.
For example:


• When $\beta = 0$, $\gamma = 1$, and $\lambda = 1$,
  $Z_{(\beta,\gamma,\lambda)}(G)$ gives the number of independent
  sets (or vertex covers) of G.

• When $\beta = 0$ and $\gamma = 1$, $Z_{(\beta,\gamma,\lambda)}(G)$ is the
  partition function of the hardcore model with
  fugacity $\lambda$ on G.

• When $\beta = \gamma$, $Z_{(\beta,\gamma,\lambda)}(G)$ is the partition
  function of the Ising model with edge activity
  $\beta$ and external field $\lambda$ on G.
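
The definitions above can be made concrete with a brute-force evaluation of the partition function. This is only an illustrative sketch with hypothetical names: it sums over all $2^n$ configurations, so it is usable on tiny graphs and is not one of the approximation algorithms discussed in this entry. It uses the normalized form with interaction matrix rows $(\beta, 1)$ and $(1, \gamma)$ and vertex weights $(\lambda, 1)$.

```python
from itertools import product

def partition_function(vertices, edges, beta, gamma, lam):
    """Brute-force Z_(beta,gamma,lam)(G): sum of w(sigma) over all configurations."""
    A = [[beta, 1.0], [1.0, gamma]]   # edge interaction matrix
    b = [lam, 1.0]                    # vertex weights: spin 0 carries lam
    total = 0.0
    for spins in product((0, 1), repeat=len(vertices)):
        sigma = dict(zip(vertices, spins))
        weight = 1.0
        for u, v in edges:
            weight *= A[sigma[u]][sigma[v]]
        for v in vertices:
            weight *= b[sigma[v]]
        total += weight
    return total
```

With $\beta = 0$, $\gamma = 1$, $\lambda = 1$, a configuration has weight 1 exactly when no edge joins two vertices of spin 0, so the sum counts independent sets: 5 on a path with three vertices and 7 on a 4-cycle.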


Given a set of parameters $(\beta, \gamma, \lambda)$, the
computational problem TWO-SPIN$(\beta, \gamma, \lambda)$ is
the problem of computing the value of the
partition function $Z_{(\beta,\gamma,\lambda)}(G)$ when the graph
G is given as input. This problem is known to
be #P-hard except for the trivial cases where
$\beta\gamma = 1$ or $\beta = \gamma = 0$ [1]. Therefore, the
main focus here is on efficient approximation
algorithms for TWO-SPIN$(\beta, \gamma, \lambda)$. Formally,
a fully polynomial-time approximation scheme
(FPTAS) is an algorithm which takes G and
any $\varepsilon > 0$ as input and outputs a number $\hat{Z}$
satisfying $Z(G) \exp(-\varepsilon) \le \hat{Z} \le Z(G) \exp(\varepsilon)$
within time polynomial in n and $1/\varepsilon$; and a fully
polynomial-time randomized approximation
scheme (FPRAS) is its randomized relaxation
in which randomness is allowed and the above
accuracy of approximation is required to be
satisfied with high probability.

For many important two-spin systems (e.g.,
independent sets, antiferromagnetic Ising model), it
is NP-hard to approximate the partition function
on graphs of unbounded degrees. In these cases,
the problem is further refined to consider the
approximation algorithms for TWO-SPIN$(\beta, \gamma, \lambda)$
on graphs with bounded maximum degree. In
addition, in order to study the approximation
algorithms on graphs which have bounded average
degree or on special classes of lattice graphs, the
approximation of the partition function is studied on
classes of graphs with bounded connective
constant, a natural and well-studied notion of average
degree originating from statistical physics.

Therefore, the main problem of interest is to
characterize the regimes of parameters $(\beta, \gamma, \lambda)$
for which there exist efficient approximation
algorithms for TWO-SPIN$(\beta, \gamma, \lambda)$ on classes of
graphs with bounded maximum degree $\Delta_{\max}$, or
on classes of graphs with bounded connective
constant $\Delta$, or on all graphs.


Key Results 


Given a two-spin system on graph G = (V, E), a
natural probability distribution $\mu$ over all
configurations $\sigma \in \{0, 1\}^V$, called the Gibbs measure,
can be defined by $\mu(\sigma) = \frac{w(\sigma)}{Z(G)}$, where
$w(\sigma) = \prod_{\{u,v\} \in E} A_{\sigma(u),\sigma(v)} \prod_{v \in V} b_{\sigma(v)}$ is the weight of $\sigma$
and the normalizing factor Z(G) is the partition
function.

The Gibbs measure defines a marginal distribution
at each vertex. Suppose that a configuration
$\sigma$ is sampled according to the Gibbs measure
$\mu$. Let $p_v$ denote the probability of vertex v
having spin state “0” in $\sigma$; and for a fixed
configuration $\tau_\Lambda \in \{0, 1\}^\Lambda$ partially specified over
vertices in $\Lambda \subseteq V$, let $p_v^{\tau_\Lambda}$ denote the probability
of vertex v having spin state “0” conditioned on
the configuration of vertices in $\Lambda$ in $\sigma$ being as
specified by $\tau_\Lambda$.
The marginal probability plays a key role in
computing the partition function. Indeed, the
marginal probability $p_v^{\tau_\Lambda}$ itself is a quantity
of main interest in many applications such as
probabilistic inference. In addition, due to the
standard procedure of self-reduction, an FPTAS
for the partition function Z(G) can be obtained if
the value of $p_v^{\tau_\Lambda}$ can be approximately computed
with an additive error $\varepsilon$ in time polynomial
in both n and $1/\varepsilon$. This reduces the problem
of approximating the partition function (with
multiplicative errors) to approximating the
marginal probability (with additive errors), which
is achieved either by rapidly mixing random
walks or by recursions exhibiting a decay of
correlation.


Ferromagnetic Two-Spin Systems 

For the ferromagnetic case, the problem
TWO-SPIN$(\beta, \gamma, \lambda)$ is considered for $\beta\gamma > 1$
and without loss of generality for $\beta \le \gamma$.

In a seminal work [3], Jerrum and Sinclair
gave an FPRAS for approximately computing
the partition function of the ferromagnetic Ising
model, which is the TWO-SPIN$(\beta, \gamma, \lambda)$ problem
with $\beta = \gamma > 1$.

The algorithm uses the Markov chain Monte
Carlo (MCMC) method; interestingly, however,
it does not directly apply a random walk
over configurations of the two-spin system, since such
a random walk might mix slowly.
Instead, it first transforms the two-spin system
into configurations of the so-called “subgraphs
world”: each such configuration is a subgraph of
G. A random walk over the subgraph configurations
is applied and proved to be rapidly mixing
for computing the new partition function defined
over subgraphs, which is shown to be equal to the
partition function Z(G) of the two-spin system.
This equivalence holds because the transformation
between the “spins world” and the “subgraphs
world” is actually a holographic transformation,
which is guaranteed to preserve the value of the
partition function.

The result of [3] can be stated as the following 
theorem. 
Theorem 1 If $\beta = \gamma > 1$ and $\lambda > 0$, then there
is an FPRAS for TWO-SPIN$(\beta, \gamma, \lambda)$.


The algorithm actually works for a stronger 
setting where the external fields are local (vertices 
have different external fields) as long as the ex- 
ternal fields are homogeneous (all have the same 
preference over spin states). 

For the two-spin system with general $\beta$ and
$\gamma$, one can translate it to the Ising model where
$\beta = \gamma$ by delegating the effect of the general $\beta, \gamma$
to the degree-dependent effective external fields.
This extends the FPRAS for the ferromagnetic
Ising model to a certain regime of ferromagnetic
two-spin systems, stated as follows.


Theorem 2 ([2, 6]) If $\beta < \gamma$, $\beta\gamma > 1$,
and $\lambda < \gamma/\beta$, then there is an FPRAS for
TWO-SPIN$(\beta, \gamma, \lambda)$.


If one is restricted to the deterministic algo- 
rithms for approximating the partition function, 
then a deterministic FPTAS is known for a strictly 
smaller regime, implicitly stated in the following 
theorem. 


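Theorem 3 ([7]) There is a continuous monotonically
increasing function $\Gamma(\gamma)$ defined
on $[1, +\infty)$ satisfying (1) $\Gamma(1) = 1$, (2)
$1 < \Gamma(\gamma) < \gamma$ for all $\gamma > 1$, and (3)
$\lim_{\gamma \to +\infty} \Gamma(\gamma)/\gamma = 1$, such that there is an FPTAS
for TWO-SPIN$(\beta, \gamma, \lambda)$ if $\beta\gamma > 1$, $\beta < \Gamma(\gamma)$,
and $\lambda < 1$.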


This deterministic FPTAS uses the same holo- 
graphic transformation from two-spin systems 
to the “subgraphs world” as in [3], and it ap- 
proximately computes the marginal probability 
defined in the subgraphs world by a recursion. 
The accuracy of the approximation is guaranteed 
by the decay of correlation. This technique is 
more extensively and successfully used for the 
antiferromagnetic two-spin systems. 

On the other hand, under certain complexity
assumptions, it is unlikely that the partition
function of every ferromagnetic two-spin system is
easy to approximate.


Theorem 4 ([6]) For any $\beta < \gamma$ with $\beta\gamma > 1$,
there is a $\lambda_0$ such that TWO-SPIN$(\beta, \gamma, \lambda)$ is
#BIS-hard for all $\lambda > \lambda_0$.
Antiferromagnetic Two-Spin Systems
For the antiferromagnetic case, the problem
TWO-SPIN$(\beta, \gamma, \lambda)$ is considered for $\beta\gamma < 1$
and without loss of generality for $\beta \le \gamma$.

In [2], a heatbath random walk over spin
configurations is applied to obtain an FPRAS for
TWO-SPIN$(\beta, \gamma, \lambda)$ for a regime of antiferromagnetic
two-spin systems.

The regime of antiferromagnetic two-spin sys- 
tems whose partition function is efficiently ap- 
proximable is characterized by the uniqueness 
condition. 

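Given parameters $(\beta, \gamma, \lambda)$ and $d \ge 1$, the tree
recursion f(x) is given by

$$f(x) = \lambda \left( \frac{\beta x + 1}{x + \gamma} \right)^{d}. \qquad (1)$$

For antiferromagnetic $(\beta, \gamma, \lambda)$, the function f(x)
is decreasing in x; thus, there is a unique positive
fixed point $\hat{x}$ satisfying $\hat{x} = f(\hat{x})$. Consider the
absolute derivative of f(x) at the fixed point:

$$|f'(\hat{x})| = \frac{d (1 - \beta\gamma) \hat{x}}{(\beta \hat{x} + 1)(\hat{x} + \gamma)}.$$

Definition 1 Let $0 \le \beta \le \gamma$, $\beta\gamma < 1$,
and $d \ge 1$. The uniqueness condition
UNIQUE$(\beta, \gamma, \lambda, d)$ is satisfied if $|f'(\hat{x})| < 1$;
and the condition NON-UNIQUE$(\beta, \gamma, \lambda, d)$ is
satisfied if $|f'(\hat{x})| > 1$.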


The condition UNIQUE$(\beta, \gamma, \lambda, d)$ holds if
and only if the dynamical system (1) converges
to its unique fixed point $\hat{x}$ at an exponential rate.
The name uniqueness condition comes from the fact that
UNIQUE$(\beta, \gamma, \lambda, d)$ implies the uniqueness of the
Gibbs measure of the two-spin system of parameters
$(\beta, \gamma, \lambda)$ on the Bethe lattice (i.e., the infinite
d-regular tree) and NON-UNIQUE$(\beta, \gamma, \lambda, d)$
implies that there is more than one such
measure (Fig. 1).
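
The uniqueness condition is easy to test numerically. The following sketch, with hypothetical function names, locates the fixed point of the tree recursion (1) by bisection and evaluates the absolute derivative there. It assumes antiferromagnetic parameters with $\gamma > 0$, so that f is decreasing and the fixed point lies in the interval [0, f(0)].

```python
def tree_recursion(x, beta, gamma, lam, d):
    """The tree recursion (1): f(x) = lam * ((beta*x + 1) / (x + gamma))**d."""
    return lam * ((beta * x + 1.0) / (x + gamma)) ** d

def fixed_point(beta, gamma, lam, d, iterations=200):
    """Unique positive fixed point of f, found by bisection on f(x) - x."""
    lo, hi = 0.0, tree_recursion(0.0, beta, gamma, lam, d)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if tree_recursion(mid, beta, gamma, lam, d) > mid:
            lo = mid      # f is decreasing, so the fixed point is to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

def abs_derivative_at_fixed_point(beta, gamma, lam, d):
    """|f'(x_hat)| = d (1 - beta*gamma) x_hat / ((beta*x_hat + 1)(x_hat + gamma))."""
    x = fixed_point(beta, gamma, lam, d)
    return d * (1.0 - beta * gamma) * x / ((beta * x + 1.0) * (x + gamma))

def is_unique(beta, gamma, lam, d):
    """True if UNIQUE(beta, gamma, lam, d) holds."""
    return abs_derivative_at_fixed_point(beta, gamma, lam, d) < 1.0
```

For the hardcore model ($\beta = 0$, $\gamma = 1$) with d = 2, the derivative reduces to $2\hat{x}/(1 + \hat{x})$, so the condition holds exactly when $\hat{x} < 1$, that is, when $\lambda < 4$; the sketch accordingly reports uniqueness at $\lambda = 1$ and non-uniqueness at $\lambda = 5$.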

Efficient approximation algorithms for
TWO-SPIN$(\beta, \gamma, \lambda)$ were discovered for special
cases of antiferromagnetic two-spin systems
within the uniqueness regime, including the
hardcore model [12], the antiferromagnetic Ising
model [8], and the antiferromagnetic two-spin
systems without external field [4], and finally for
all antiferromagnetic two-spin systems within the
uniqueness regime [5].

Fig. 1 The regime of $(\beta, \gamma)$ for which the uniqueness
condition UNIQUE$(\beta, \gamma, \lambda, d)$ holds for $\lambda = 1$ and for all
integer $d \ge 1$. (Curves shown: the uniqueness threshold and the
threshold achieved by the heatbath random walk.)


Theorem 5 ([5]) For $0 \le \beta \le \gamma$ and $\beta\gamma < 1$,
there is an FPTAS for TWO-SPIN$(\beta, \gamma, \lambda)$
on graphs of maximum degree at most $\Delta_{\max}$ if
UNIQUE$(\beta, \gamma, \lambda, d)$ holds for all integer
$1 \le d \le \Delta_{\max} - 1$.


This algorithmic result for graphs of bounded 
maximum degree can be extended to graphs of 
unbounded degrees. 


Theorem 6 ([4, 5]) For $0 \le \beta \le \gamma$ and $\beta\gamma < 1$,
there is an FPTAS for TWO-SPIN$(\beta, \gamma, \lambda)$ if
UNIQUE$(\beta, \gamma, \lambda, d)$ holds for all integer $d \ge 1$.


All these algorithms follow the framework
introduced by Weitz in his seminal work [12].
In this framework, the marginal probability $p_v^{\tau_\Lambda}$
is computed by applying the tree recursion (1)
on the tree of self-avoiding walks, which enumerates all
paths originating from vertex v. (In fact, (1)
is the recursion for the ratio $p_v^{\tau_\Lambda} / (1 - p_v^{\tau_\Lambda})$ of
marginal probabilities.) Then, a decay
of correlation, also called the spatial mixing
property, is verified, so that a truncated recursion
tree of polynomial size is sufficient to provide
the required accuracy for the estimation of the
marginal probability. For graphs of unbounded
degrees, a stronger notion of decay of correlation,
called the computationally efficient correlation
decay [4], is verified to enforce the same cost and
accuracy even when the branching number of the
recursion tree is unbounded.

On the other hand, for antiferromagnetic two- 
spin systems in the nonuniqueness regime, the 
partition function is hard to approximate. 


Theorem 7 ([11]) Let $0 \le \beta \le \gamma$ and $\beta\gamma < 1$.
For any $\Delta_{\max} \ge 3$, unless NP = RP, there
does not exist an FPRAS for TWO-SPIN$(\beta, \gamma, \lambda)$
on graphs of maximum degree at most $\Delta_{\max}$ if
NON-UNIQUE$(\beta, \gamma, \lambda, d)$ holds for some integer
$1 \le d \le \Delta_{\max} - 1$.

Altogether, this gives a complete classification
of the approximability of the partition function of
antiferromagnetic two-spin systems except at
the uniqueness threshold.


Algorithms for Graphs with Bounded 
Connective Constant 
The connective constant is a natural and well- 
studied notion of the average degree of a graph, 
which, roughly speaking, measures the growth 
rate of the number of self-avoiding walks in 
the graph as their length grows. As a quantity 
originated from statistical physics, the connec- 
tive constant has been especially well studied 
for various infinite regular lattices. In order to 
suit the algorithmic applications, the definition 
of connective constant was extended in [9] to 
families of finite graphs. 

Given a vertex v in a graph G, let $N(v, \ell)$ denote
the number of self-avoiding walks of length
$\ell$ in G which start at v.


Definition 2 ([9]) Let $\mathcal{G}$ be a family of finite
graphs. The connective constant of $\mathcal{G}$ is at most
$\Delta$ if there exist constants a and c such that for
any graph G = (V, E) in $\mathcal{G}$ and any vertex v
in G, it holds that $\sum_{i=1}^{\ell} N(v, i) \le c \Delta^{\ell}$ for all
$\ell \ge a \log |V|$.
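
The quantity $N(v, \ell)$ can be computed on small graphs by depth-first enumeration. The sketch below uses a hypothetical function name and takes exponential time in general; it is meant only to make the definition concrete and plays no role in the algorithms of this entry.

```python
def count_saws(adj, v, length):
    """N(v, length): number of self-avoiding walks with `length` edges starting at v."""
    def extend(u, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for w in adj[u]:
            if w not in visited:          # a self-avoiding walk never revisits a vertex
                visited.add(w)
                total += extend(w, visited, remaining - 1)
                visited.remove(w)
        return total
    return extend(v, {v}, length)
```

On the 4-cycle there are two self-avoiding walks of each length 1, 2, and 3 from any vertex and none of length 4, consistent with the cycle behaving like a tree of arity 1 from the point of view of the connective constant.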


The connective constant has a natural interpre- 
tation as the “average arity” of the tree of self- 
avoiding walks. 

For certain antiferromagnetic two-spin systems,
it is possible to establish the desirable decay
of correlation on the tree of self-avoiding walks
with bounded average arity instead of maximum
arity, and hence the arity d in the uniqueness
condition UNIQUE$(\beta, \gamma, \lambda, d)$ can be replaced
with the connective constant $\Delta$. The algorithmic
implication of this is stated as the following
theorem.


Theorem 8 ([10]) For the following two cases:

• (The hardcore model) $\beta = 0$ and $\gamma = 1$;
• (The antiferromagnetic Ising model with zero
  field) $\beta = \gamma < 1$ and $\lambda = 1$;

there exists an FPTAS for TWO-SPIN$(\beta, \gamma, \lambda)$
on graphs of connective constant at most $\Delta$ if
UNIQUE$(\beta, \gamma, \lambda, \Delta)$ holds.


For the two-spin systems considered by this theorem,
it holds that UNIQUE$(\beta, \gamma, \lambda, \Delta)$ implies
UNIQUE$(\beta, \gamma, \lambda, d)$ for all $1 \le d \le \Delta$.

The connective constant of a graph of maximum
degree $\Delta_{\max}$ is at most $\Delta_{\max} - 1$, but
the connective constant of a family of graphs
can be much smaller than this crude bound. For
example, though the maximum degree of a graph
drawn from the Erdős-Rényi model G(n, d/n)
is $O(\log n / \log\log n)$ with high probability, the
connective constant of such a graph is at most
$d(1 + \varepsilon)$ with high probability for any fixed
$\varepsilon > 0$. Therefore, for the considered two-spin
systems, the algorithm in Theorem 8 works on
strictly more general families of graphs than that
of Theorem 5.
Problem Definition


In the bin-packing problem, the input consists 
of a collection of items specified by their sizes. 
There are also identical bins, which without loss 
of generality can be assumed to be of size 1, and 
the goal is to pack these items using the minimum 
possible number of bins. 

Bin packing is a classic optimization problem, 
and hundreds of its variants have been defined 
and studied under various settings such as av- 
erage case analysis, worst-case off-line analysis, 
and worst-case online analysis. This note considers
the most basic variant mentioned above under
the off-line model where all the items are given
in advance. The problem is easily seen to be NP- 
hard by a reduction from the partition problem. 
In fact, this reduction implies that unless P = NP, 
it is impossible to determine in polynomial time 
whether the items can be packed into two bins or 
whether they need three bins. 


Notations 
The input to the bin-packing problem is a set of n
items I specified by their sizes $s_1, \dots, s_n$, where
each $s_i$ is a real number in the range (0, 1]. A
subset of items $S \subseteq I$ can be packed feasibly in a
bin if the total size of items in S is at most 1. The
goal is to pack all items in I into the minimum
number of bins. Let OPT(I) denote the value of
the optimum solution and Size(I) the total size of
all items in I. Clearly, $\mathrm{OPT}(I) \ge \lceil \mathrm{Size}(I) \rceil$.
Strictly speaking, the problem does not admit
a polynomial-time algorithm with an approximation
guarantee better than 3/2 unless P = NP, since such
an algorithm would distinguish instances that fit into
two bins from those that need three. Interestingly,
however, this does not rule out an algorithm that
requires, say, OPT(I) + 1 bins (unlike other
optimization problems, making several copies of
a small hard instance to obtain a larger hard
instance does not work for bin packing). It is more
meaningful to consider approximation guarantees
in an asymptotic sense. An algorithm is called an
asymptotic $\rho$-approximation if the number of bins
required by it is $\rho \cdot \mathrm{OPT}(I) + O(1)$.


Key Results 


During the 1960s and 1970s, several algorithms 
with constant factor asymptotic and absolute ap- 
proximation guarantees and very efficient run- 
ning times were designed (see [1] for a survey). A 
breakthrough was achieved in 1981 by de la Vega 
and Lueker [3], who gave the first polynomial- 
time asymptotic approximation scheme. 


Theorem 1 ([3]) Given any arbitrary parameter
$\varepsilon > 0$, there is an algorithm that uses
$(1 + \varepsilon)\mathrm{OPT}(I) + O(1)$ bins to pack I. The running
time of this algorithm is $O(n \log n) + (1/\varepsilon)^{O(1/\varepsilon)}$.


The main insight of de la Vega and Lueker
[3] was to give a technique for approximating
the original instance by a simpler instance where
large items have only O(1) distinct sizes. Their
idea was simple. First, it suffices to restrict
attention to large items, say, with size greater than
ε. Call the set of these items I_b. Given an (almost)
optimum packing of I_b, consider the solution
obtained by greedily filling up the bins with the
remaining small items, opening new bins only
if needed. Indeed, if no new bins are needed,
then the solution is still almost optimum since the
packing for I_b was almost optimum. If additional
bins are needed, then each bin, except possibly
one, must be filled to an extent of at least 1 − ε,
which gives a packing using Size(I)/(1 − ε) + 1 ≤
OPT(I)/(1 − ε) + 1 bins. So it suffices to focus
on solving I_b almost optimally. To do this, the
authors show how to obtain another instance
I′ with the following properties. First, I′ has
only O(1/ε²) distinct sizes, and second, I′ is an
approximation of I_b in the sense that OPT(I_b) ≥
OPT(I′), and moreover, any solution of I′ implies
another solution of I_b using O(ε · OPT(I))
additional bins. As I′ has only O(1/ε²) distinct item
sizes, and any bin can contain at most 1/ε such
items, there are at most O(1/ε²)^{1/ε} ways to
pack a bin. Thus, I′ can be solved optimally by
exhaustive enumeration (or more efficiently using
an integer programming formulation described
below).
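The greedy step that adds the small items back can be sketched as follows. This is only an illustration under stated assumptions: the function name `fill_small_items`, the first-fit order, and the floating-point tolerance are choices made here and are not taken from [3].

```python
from typing import List


def fill_small_items(large_bins: List[List[float]],
                     small_items: List[float],
                     capacity: float = 1.0) -> List[List[float]]:
    """Add small items to an existing packing of the large items.

    Each small item goes into the first bin with enough room; a new bin is
    opened only when no existing bin can take the item.
    """
    tol = 1e-12  # tolerance for floating-point sums (an implementation choice)
    packing = [list(b) for b in large_bins]
    loads = [sum(b) for b in packing]
    for item in small_items:
        for k in range(len(packing)):
            if loads[k] + item <= capacity + tol:
                packing[k].append(item)
                loads[k] += item
                break
        else:
            # No existing bin has room, so every bin is filled above
            # capacity - item at this point.
            packing.append([item])
            loads.append(item)
    return packing
```

Whenever the sketch opens a new bin for an item of size at most ε, all existing bins are filled beyond 1 − ε, which is the fact used in the bound above.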

Later, Karmarkar and Karp [4] proved a
substantially stronger guarantee.


Theorem 2 ([4]) Given an instance I, there is
an algorithm that produces a packing of I using
OPT(I) + O(log² OPT(I)) bins. The running
time of this algorithm is O(n^8).


Observe that this guarantee is significantly
stronger than that of [3] as the additive term
is O(log² OPT) as opposed to O(ε · OPT). Their
algorithm also uses the ideas of reducing the 
number of distinct item sizes and ignoring small 
items, but in a much more refined way. In par- 
ticular, instead of obtaining a rounded instance 




in a single step, their algorithm consists of a 
logarithmic number of steps where in each step 
they round the instance “mildly” and then solve it 
partially. 

The starting point is an exponentially large 
linear programming (LP) relaxation of the prob- 
lem commonly referred to as the configuration 
LP. Here, there is a variable x_S corresponding
to each subset of items S that can be packed
feasibly in a bin. The objective is to minimize
Σ_S x_S subject to the constraint that for each item
i, the sum of x_S over all subsets S that contain
i is at least 1. Clearly, this is a relaxation, as
setting x_S = 1 for each set S corresponding to a
bin in the optimum solution is a feasible integral
solution to the LP. Even though this formulation
has exponential size, the separation problem for
the dual is a knapsack problem, and hence the LP
can be solved in polynomial time to any accuracy
(in particular within an accuracy of 1) using the
ellipsoid method. Such a solution is called a
fractional packing. Observe that if there are n_i
items each of size exactly s_i, then the constraints
corresponding to these items can be “combined” to
obtain the following LP:


    min   Σ_S x_S
    s.t.  Σ_S a_{S,i} x_S ≥ n_i    for all item sizes i
          x_S ≥ 0                  for all feasible sets S.

Here, a_{S,i} is the number of items of size s_i in the
feasible set S. Let q(I) denote the number of distinct
sizes in I. The number of nontrivial constraints
in the LP is equal to q(I), which implies that there is
a basic optimal solution to this LP that has only
q(I) variables set nonintegrally. Karmarkar and
Karp exploit this observation in a very clever way.
The following lemma describes the main idea.


Lemma 1 Given any instance I, suppose there
is an algorithmic rounding procedure to obtain
another instance I′ such that I′ has Size(I)/2
distinct item sizes and I and I′ are related in the
following sense: any fractional packing of
I using ℓ bins gives a fractional packing of I′
with at most ℓ bins, and any packing of I′
using ℓ′ bins gives a packing of I using ℓ′ + c
bins, where c is some fixed parameter. Then, I
can be packed using OPT(I) + c · log(OPT(I))
bins.


Proof Let I_0 = I and let I_1 be the instance
obtained by applying the rounding procedure to
I_0. By the property of the rounding procedure,
OPT(I) ≤ OPT(I_1) + c and LP(I_1) ≤ LP(I). As
I_1 has Size(I_0)/2 distinct sizes, the LP solution
for I_1 has at most Size(I_0)/2 fractionally set
variables. Remove the items packed integrally in the
LP solution, and consider the residual instance
I_1′. Note that Size(I_1′) ≤ Size(I_0)/2. Now, again
apply the rounding procedure to I_1′ to obtain I_2
and solve the LP for I_2. Again, this solution has
at most Size(I_1′)/2 ≤ Size(I_0)/4 fractionally
set variables, and OPT(I_1′) ≤ OPT(I_2) + c and
LP(I_2) ≤ LP(I_1′). The above process is repeated
for a few steps. At each step, the size of the
residual instance decreases by a factor of at least
two, and the number of bins required to pack I_0
increases by an additive c. After log(Size(I_0)) (≈
log(OPT(I))) steps, the residual instance has size
O(1) and can be packed into O(1) additional
bins. □


It remains to describe the rounding procedure.
Consider the items in nonincreasing order s_1 ≥
s_2 ≥ ... ≥ s_n and group them as follows.
Add items to the current group until its size first
exceeds 2. At this point, close the group and start
a new group. Let G_1, ..., G_k denote the groups
formed and let n_i = |G_i|, setting n_0 = 0 for
convenience. Define I′ as the instance obtained
by rounding the size of the n_{i−1} largest items in G_i
to the size of the largest item in G_i for i =
1, ..., k. The procedure satisfies the properties of
Lemma 1 with c = O(log n_k) (left as an exercise
to the reader). To prove Theorem 2, it suffices to
show that n_k = O(Size(I)). This is done easily
by ignoring all items smaller than 1/Size(I) and
filling them in only in the end (as in the algorithm
of de la Vega and Lueker).
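The grouping and rounding steps can be sketched as follows. This is a minimal illustration, not the implementation of [4]: the names `group_items` and `round_groups` are hypothetical, and two details that the description above leaves open are treated as assumptions, namely that a final group that never exceeds the threshold is kept as a group, and that the items of a group that are not rounded are simply returned separately.

```python
from typing import List, Tuple


def group_items(sizes: List[float], threshold: float = 2.0) -> List[List[float]]:
    """Scan items from largest to smallest and close a group as soon as
    its total size exceeds the threshold."""
    groups: List[List[float]] = []
    current: List[float] = []
    total = 0.0
    for s in sorted(sizes, reverse=True):
        current.append(s)
        total += s
        if total > threshold:
            groups.append(current)
            current, total = [], 0.0
    if current:
        # Assumption: a last group that never exceeds the threshold is kept.
        groups.append(current)
    return groups


def round_groups(groups: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Round the n_{i-1} largest items of G_i up to the largest size in G_i.

    Returns (rounded sizes, sizes of items that were not rounded).
    """
    rounded: List[float] = []
    leftover: List[float] = []
    prev_count = 0  # n_0 = 0
    for group in groups:
        k = min(prev_count, len(group))
        rounded.extend([group[0]] * k)  # group[0] is the largest item of the group
        leftover.extend(group[k:])      # assumption: returned separately
        prev_count = len(group)
    return rounded, leftover
```

In this sketch the rounded instance has at most one distinct size per group, which is the property the lemma relies on.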

In the case when the item sizes are not too 
small, the following corollary is obtained. 


Corollary 1 If all the item sizes are at least
δ, it is easily seen that c = O(log(1/δ)),
and the above algorithm implies a guarantee
of OPT + O(log(1/δ) · log OPT), which is
OPT + O(log OPT) if δ is a constant.


Recently, Rothvoss gave the first improvement
to the result of Karmarkar and Karp and
improved their additive guarantee from O(log² OPT)
to O(log OPT · log log OPT). His algorithm also
uses the configuration LP solution and is based on 
several new ideas and recent developments. First 
is the connection of bin packing to a problem in 
discrepancy theory known as the k-permutation 
problem. Second are the recently developed al- 
gorithmic approaches for addressing discrepancy 
minimization problems. 

In addition to these, a key idea in Rothvoss’ 
algorithm is to glue several small items contained 
in a configuration into a new large item. For more 
details, we refer the reader to [5]. 


Applications 


The bin-packing problem is directly motivated 
from practice and has many natural applications 
such as packing items into boxes subject to 
weight constraints, packing files into CDs, 
packing television commercials into station 
breaks, and so on. It is widely studied in 
operations research and computer science. Other 
applications include the so-called cutting-stock 
problems where some material such as cloth or 
lumber is given in blocks of standard size from 
which items of certain specified size must be 
cut. Several variations of bin packing, such as 
generalizations to higher dimensions, imposing 
additional constraints on the algorithm and 
different optimization criteria, have also been 
extensively studied. The reader is referred to 
[1,2] for excellent surveys. 


Open Problems 


Except for the NP-hardness, no other hardness 
results are known, and it is possible that a 
polynomial-time algorithm with guarantee OPT 
+ 1 exists for the problem. Resolving this is a key 
open question. A promising approach seems to 
be via the configuration LP (considered above). 
In fact, no instance is known for which the
additive gap between the optimum configuration 
LP solution and the optimum integral solution 
is more than 1. It would be very interesting to 
design an instance that has an additive integrality 
gap of two or more. 
Problem Definition


Geometric network optimization is the problem 
of computing a network in a geometric space 
(e.g., the Euclidean plane), based on an 
input of geometric data (e.g., points, disks, 
polygons/polyhedra) that is optimal according 
to an objective function that typically involves 
geometric measures, such as Euclidean length, 
perhaps in addition to combinatorial metrics, 
such as the number of edges in the network. 
The desired network is required to have certain 
properties, such as being connected (or k- 
connected), having a specific topology (e.g., 
forming a path/cycle), spanning at least a certain 
number of input objects, etc. 

One of the most widely studied optimization 
problems is the traveling salesperson problem 
(TSP): given a set S of n sites (e.g., cities), and 
distances between each pair of sites, determine a 
route or tour of minimum length that visits every 
member of S. The (symmetric) TSP is often for- 
mulated in terms of a graph optimization problem 
on an edge-weighted complete graph K_n, and the
goal is to determine a Hamiltonian cycle (a cycle 
visiting each vertex exactly once), or a tour, of 
minimum total weight. In geometric settings, the 
sites are often points in the plane with distances 
measured according to the Euclidean metric. 

The TSP is known to be NP-complete in 
graphs and NP-hard in the Euclidean plane. Many 
methods of combinatorial optimization, as well 
as heuristics, have been developed and applied 
successfully to solving to optimality instances of 
TSP; see Cook [7]. Our focus here is on provable 
approximation algorithms. 


Approximation Schemes for Geometric Network Optimization Problems 


In the context of the TSP, a minimization problem,
a c-approximation algorithm is an algorithm
guaranteed to yield a solution whose objective
function value (length) is at most
c times that of an optimal solution. A polynomial-time
approximation scheme (PTAS) is a family of
c-approximation algorithms, with c = 1 + ε, that
runs in polynomial (in input size) time for any
fixed ε > 0. A quasi-polynomial-time approximation
scheme (QPTAS) is an approximation
scheme, with factor c = 1 + ε for any fixed
ε > 0, whose running time is quasi-polynomial,
2^{O((log n)^C)} for some C.

In the Euclidean Steiner minimum spanning 
tree (SMST) problem, the objective is to compute 
a minimum total length tree that spans all of the 
input points S, allowing nodes of the tree to be 
at points of the Euclidean space other than S 
(such points are known as Steiner points). The 
Euclidean SMST is known to be NP-hard, even 
in the plane. 


Key Results 


A simple 2-approximation algorithm for TSP 
follows from a “doubling” of a minimum 
spanning tree, assuming that the distances 
obey the triangle inequality. By augmenting the 
minimum spanning tree with a minimum-weight 
matching on the odd-degree nodes of the tree, 
Christofides obtained a 1.5-approximation for 
TSP with triangle inequality. This is the currently 
best-known approximation for general metric 
spaces; an outstanding open conjecture is that a 
4/3-approximation (or better) may be possible. It 
is known that the TSP in a general metric space 
is APX-complete, implying that, unless P = NP, 
no PTAS exists, in general. 
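The "doubling" argument can be sketched for points in Euclidean space as follows: build a minimum spanning tree and visit its vertices in depth-first preorder, which corresponds to shortcutting the doubled tree. The function names, the quadratic-time version of Prim's algorithm, and the use of Euclidean points are illustrative assumptions, not part of the cited results.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mst_doubling_tour(points: Sequence[Point]) -> List[int]:
    """Return a tour (list of point indices) obtained from an MST preorder walk."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known connection to the tree
    parent = [-1] * n
    children: List[List[int]] = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):             # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    tour: List[int] = []
    stack = [0]
    while stack:                   # preorder walk = shortcut of the doubled tree
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_length(points: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour visiting the points in the given order."""
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))
```

By the triangle inequality the shortcut tour is no longer than twice the tree, and the tree is no longer than an optimal tour.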

Research has shown that “geometry helps” in 
network optimization problems. Geometric struc- 
ture has played a key role in solving combinato- 
rial optimization problems. There are problems 
that are NP-hard in their abstract generality, yet 
are solvable exactly in polynomial time in geo- 
metric settings (e.g., maximum TSP in polyhe- 
dral metrics), and there are problems for which 
we have substantially better, or more efficient, 
approximation algorithms in geometric settings
(e.g., TSP). 

As shown by Arora [1] and Mitchell [10] in
papers originally appearing in 1996, geometric
instances of TSP and SMST have special
structure that allows for the existence of a
PTAS. Arora [1] gives a randomized algorithm
that, with probability 1/2, yields a (1 + ε)-approximate
tour in time n(log n)^{(O(√d/ε))^{d−1}}
in Euclidean d-space. Rao and Smith [14]
obtain a deterministic algorithm with running
time 2^{(d/ε)^{O(d)}} n + (d/ε)^{O(d)} n log n. This
O(n log n) bound (for fixed d, ε) matches the
Ω(n log n) lower bound in the decision tree
model. In the real RAM model, with atomic
floor or mod function, Bartal and Gottlieb [3]
give a randomized linear-time PTAS that,
with probability 1 − e^{−O(n^{1/(3d)})}, computes a
(1 + ε)-approximation to an optimal tour in time
2^{(d/ε)^{O(d)}} n. The exponential dependence on d
in the PTAS bounds is essentially best possible,
since Trevisan has shown that if d ≥ log n, it is
NP-hard to obtain a (1 + ε)-approximation.

A key insight of Rao and Smith is the application
of the concept of “spanners” to the approximation
schemes. A connected subgraph G of
the complete Euclidean graph, joining every pair
of points in S (within Euclidean d-dimensional
space), is said to be a t-spanner for S if all points
of S are nodes of G and, for any points u, v ∈ S,
the length of a shortest path in G from u to v is
at most t times the Euclidean distance, d_2(u, v).
It is known that for any point set S and t > 1,
t-spanners exist and can be calculated in time
O(n log n), with the property that the t-spanner
is lightweight, meaning that the sum of its edge
lengths is at most a constant factor (depending on
d and t) greater than the Euclidean length of a
minimum spanning tree on S.
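To make the definition concrete, the following sketch computes the smallest t for which a given geometric graph is a t-spanner of its vertex set, using all-pairs shortest paths. It only checks the definition and does not construct a lightweight spanner; the function name `stretch_factor` and the cubic-time Floyd-Warshall computation are choices made here for illustration.

```python
import math
from typing import List, Sequence, Tuple


def stretch_factor(points: Sequence[Tuple[float, float]],
                   edges: Sequence[Tuple[int, int]]) -> float:
    """Largest ratio (graph distance) / (Euclidean distance) over all point pairs.

    Returns math.inf if the graph is disconnected.
    """
    n = len(points)
    dist: List[List[float]] = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v in edges:
        w = math.dist(points[u], points[v])  # edge weight = Euclidean length
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in range(n):                        # Floyd-Warshall, O(n^3)
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    worst = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            straight = math.dist(points[i], points[j])
            if straight > 0:
                worst = max(worst, dist[i][j] / straight)
    return worst
```

For the four corners of a unit square, the boundary cycle has stretch √2 (attained by diagonally opposite corners), while the path obtained by deleting one side has stretch 3.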


Overview of Methods 

The PTAS techniques are based on structure theorems
showing that an optimal solution can be
“rounded” to a “nearby” solution, of length at
most a factor (1 + ε) longer, that falls within a
special class of recursively “nice” solutions for
which optimization via dynamic programming
can be done efficiently, because the interface
between adjacent subproblems is “small”
combinatorially. Arora’s algorithm [1] is randomized,
as is that of Rao and Smith [14]; both
can be derandomized. The m-guillotine method
(Mitchell [10]) is directly a deterministic method;
however, the proof of its structure theorem is
effectively an averaging argument.


Arora’s Dissection Method 

Arora [1, 2] gives a method based on geometric
dissection using a quadtree (or its octree
analogue in d dimensions). On the boundary
of each quadtree square are m equally spaced
points (“portals”); a portal-respecting tour is one
that crosses the boundaries of squares only at
portals. Using an averaging argument based on
a randomly shifted quadtree that contains the
bounding square of S, Arora proves structure
theorems, the simplest of which shows that, when
m ≥ (log n)/ε, the expected length of a shortest
portal-respecting tour, T, is at most (1 + ε) times
the length of an optimal tour. Within a quadtree
square, T consists of at most m disjoint paths that
together visit all sites within the square. Since
the interface data specifying a subproblem has
size 2^{O(m)}, dynamic programming computes a
shortest portal-respecting tour in time 2^{O(m)} per
quadtree square, for overall time 2^{O((log n)/ε)} =
n^{O(1/ε)}. An improved, near-linear (randomized)
running time is obtained via a stronger structure
theorem, based on “(m, k)-light” tours, which
are portal respecting and enter/leave each square
at most k times (with k = O(1/ε)). Rao and
Smith’s improvement uses the observation that it
suffices to restrict the algorithm to use edges of a
(1 + ε)-spanner.


The m-Guillotine Method 

The m-guillotine method of Mitchell [10] is 
based on the notion of an m-guillotine structure. 
A geometric graph G in the plane has the
m-guillotine structure if the following holds: either
(1) G has O(m) edges or (2) there exists a cut by
an axis-parallel line ℓ such that the intersection
of ℓ with the edge set of G has O(m) connected
components and the subgraphs of G on each side
of the cut recursively also have the m-guillotine
structure. The m-guillotine structure is defined in
dimensions d > 2 as well, using hyperplane cuts
orthogonal to the coordinate axes.

The m-guillotine structure theorem in 2 dimensions
states that, for any positive integer m,
a set E of straight line segments in the plane is
either m-guillotine already or is “close” to being
m-guillotine, in that there exists a superset, E_m ⊇
E, that has m-guillotine structure, where E_m is
obtained from E by adding a set of axis-parallel
segments (bridges, or m-spans) of total length at
most O(ε|E|) when m = O(1/ε). The proof uses a
simple charging scheme.

The m-guillotine method originally (1996)
yielded a PTAS for TSP and related problems in
the plane, with running time n^{O(1/ε)}; this was
improved (1997) to n^{O(1)}. With the injection
of the idea of Rao and Smith [14] to employ
spanners, the m-guillotine method yields a
simple, deterministic O(n log n) time PTAS for
TSP and related problems in fixed dimension
d ≥ 2. The steps are the following: (a) construct
(in O(n log n) time) a spanner, T; (b) compute
its m-guillotine superset, T_m, by standard sweep
techniques (in time O(n log n)); and (c) use
dynamic programming (time O(n)) applied to
the recursive tree associated with T_m, to optimize
over spanning subgraphs of T_m.


Generalizations to Other Metrics 
The PTAS techniques described above have 
been significantly extended to variants of the 
Euclidean TSP. While we do not expect that a 
PTAS exists for general metric spaces (because 
of APX-hardness), the methods can be extended 
to a very broad class of “geometric” metrics 
known as doubling metrics, or metric spaces of 
bounded doubling dimension. A metric space X
is said to have doubling constant c_d if any ball of
radius r can be covered by c_d balls of radius r/2;
the logarithm of c_d is the doubling dimension of
X. Euclidean d-space has doubling dimension
O(d). Bartal, Gottlieb, and Krauthgamer [4] 
have given a PTAS for TSP in doubling metrics, 
improving on a prior QPTAS. 

For the discrete metric space induced by an 
edge-weighted planar graph, the TSP has a linear- 
time PTAS. The subset TSP for edge-weighted 




planar graphs, in which there is a subset S ⊆ V
of the vertex set V that must be visited, also has
an efficient (O(n log n) time) PTAS; this implies
a PTAS for the geodesic metric for TSP on a set 
S of sites in a polygonal domain in the plane, 
with distances given by the (Euclidean) lengths 
of geodesic shortest paths between pairs of 
sites. 


Applications to Network Optimization 

The approximation schemes we describe above 
have been applied to numerous geometric net- 
work optimization problems, including the list 
below. We do not give references for most of the 
results summarized below; see the surveys [2, 11, 
12] and Har-Peled [9]. 


1. A PTAS for the Euclidean Steiner minimum 
spanning tree (SMST) problem. 

2. A PTAS for the Euclidean minimum Steiner 
forest problem, in which one is to compute 
a minimum-weight forest whose connected 
components (Steiner trees) span given (disjoint)
subsets S_1, ..., S_k ⊆ S of the sites,
allowing Steiner points.

3. A PTAS for computing a minimum-weight 
k-connected spanning graph of S in Eu- 
clidean d-space. 

4. A PTAS for the k-median problem, in which 
one is to determine k centers, among S, 
in order to minimize the sum of the dis- 
tances from the sites S to their nearest center 
points. 

5. A PTAS for the minimum latency problem 
(MLP), also known as the deliveryman prob- 
lem or the traveling repairman problem, in 
which one is to compute a tour on S that 
minimizes the sum of the “latencies” of all 
points, where the latency of a point p is the
length of the tour from a given starting point
to the point p. The PTAS of Sitters [15]
runs in time n^{O(1/ε)}, improving the prior
QPTAS.

6. A PTAS for the k-TSP (and k-MST), in 
which one is to compute a shortest tour (tree) 
spanning at least k of the n sites of S. 

7. A QPTAS for degree-bounded spanning trees 
in the plane. 
8. A QPTAS for the capacitated vehicle routing
problem (VRP) [8], in which one is to com- 
pute a minimum-length collection of tours, 
each visiting at most k sites of S. A PTAS is 
known for some values of k. 

9. A PTAS for the orienteering problem, in 
which the goal is to maximize the number 
of sites visited by a length-bounded tour 
[6]. 

10. A PTAS for TSP with Neighborhoods 
(TSPN), in which each site of the input set S 
is a connected region of d-space (rather than 
a point), and the goal is to compute a tour that 
visits each region. The TSPN has a PTAS for 
regions in the plane that are “fat” and are 
weakly disjoint (no point lies in more than 
a constant number of regions) [13]. Chan 
and Elbassioni [5] give a QPTAS for fat, 
weakly disjoint regions in doubling metrics. 
For TSPN with disconnected regions, the 
problem is that of the “group TSP” (also 
known as “generalized TSP” or “one-of-a-set
TSP”), which, in general metrics, is much
harder than TSP; even in the Euclidean plane, 
the problem is NP-hard to approximate to 
within any constant factor for finite point 
sets and is NP-hard to approximate better 
than a fixed constant for visiting point 
pairs. 

11. A PTAS for the milling and lawnmowing 
problems, in which one is to compute a 
shortest path/tour for a specified cutter so 
that all points of a given region R in the
plane are swept over by the cutter head
while keeping the cutter fully within the 
region R (milling), or allowing the cutter 
to sweep over points outside of region R 
(lawnmowing). 

12. A PTAS for computing a minimum-length 
cycle that separates a given set of “red” 
points from a given set of “blue” points in 
the Euclidean plane. 

13. A QPTAS for the minimum-weight trian- 
gulation (MWT) problem of computing 
a triangulation of the planar point set S
in order to minimize the sum of the edge 
lengths. The MWT has been shown to be 
NP-hard. 
14. A PTAS for the minimum-weight Steiner
convex partition problem in the plane, in 
which one is to compute an embedded planar 
straight-line graph with convex faces whose 
vertex set contains the input set S. 


Open Problems 


A prominent open problem in approximation al- 
gorithms for network optimization is to deter- 
mine if approximations better than factor 3/2 
can be achieved for the TSP in general metric 
spaces. 

Specific open problems for geometric network 
optimization problems include: 


1. Is there a PTAS for minimum-weight trian- 
gulation (MWT) in the plane? (A QPTAS is 
known.) 

2. Is there a PTAS for capacitated vehicle rout- 
ing, for all k? 

3. Is there a PTAS for Euclidean minimum span- 
ning trees of bounded degree (3 or 4)? (A 
QPTAS is known for degree-3 trees.) 

4. Is there a PTAS for TSP with Neighborhoods 
(TSPN) for connected disjoint regions in the 
plane? 

5. Is there a PTAS for computing a minimum- 
weight t-spanner of a set of points in a Eu- 
clidean space? 


Finally, can PTAS techniques be implemented 
to be competitive with other practical methods 
for TSP or related network optimization 
problems? 
Problem Definition


Non-preemptive makespan minimization on m
uniformly related machines is defined as follows.
We are given a set M = {1, 2, ..., m} of m
machines where each machine i has a speed s_i
such that s_i > 0. In addition we are given a set
of jobs J = {1, 2, ..., n}, where each job j has
a positive size p_j and all jobs are available for
processing at time 0. The jobs need to be partitioned
into m subsets S_1, ..., S_m, with S_i being
the subset of jobs assigned to machine i, and each
such (ordered) partition is a feasible solution to
the problem. Processing job j on machine i takes
p_j/s_i time units. For such a solution (also known
as a schedule), we let L_i = (Σ_{j∈S_i} p_j)/s_i be
the completion time or load of machine i. The
work of machine i is W_i = Σ_{j∈S_i} p_j = L_i · s_i,
that is, the total size of the jobs assigned to
i. The makespan of the schedule is defined as
max{L_1, L_2, ..., L_m}, and the goal is to find a
schedule that minimizes the makespan. We also
consider the problem on identical machines, that
is, the special case of the above problem in which
s_i = 1 for all i (in this special case, the work
and the load of a given machine are always the
same).
Key Results


A PTAS (polynomial-time approximation
scheme) is a family of polynomial-time
algorithms such that for every ε > 0, the family
contains an algorithm that, for every instance of
the makespan minimization problem, returns
a feasible solution whose makespan is at most
1 + ε times the makespan of an optimal solution
to the same instance. Without loss of generality,
we can assume that ε is smaller than a fixed small
constant.


The Dual Approximation Framework and 
Common Preprocessing Steps 

Using a guessing step of the optimal makespan,
and scaling the sizes of all jobs by the value of
the optimal makespan, we can assume that the
optimal makespan is in the interval [1, 1 + ε) and
it suffices to construct a feasible solution whose
makespan is at most 1 + cε for a constant c (then
scaling ε before applying the algorithm will give
the claimed result). This assumption can be made
since we can find in polynomial time two values
LB and UB such that the optimal makespan is
in the interval [LB, UB] and UB/LB is at most some
constant (or even at most an exponential function
of the length of the binary encoding of the input).
Then, using a constant (or polynomial) number
of iterations, we can find the minimum integer
power of 1 + ε for which the algorithm below
succeeds in finding a schedule with makespan at
most 1 + cε times the optimal makespan. This
approach is referred to as the dual approximation
method [7, 8].

From now on, we assume that the optimal
makespan is in the interval [1, 1 + ε). The next
step is to round up the size of each job to the
next integer power of 1 + ε and to round down
the speed of each machine to the next integer
power of 1 + ε. That is, the rounded size of job
j is p′_j = (1 + ε)^⌈log_{1+ε} p_j⌉ and the rounded
speed of machine i is s′_i = (1 + ε)^⌊log_{1+ε} s_i⌋.
Note that this rounding does not decrease the
makespan of any feasible solution and increases
the optimal makespan by a multiplicative factor
of at most (1 + ε)². Thus, in the new instance
that we call the rounded instance, the makespan
of an optimal solution is in the interval [1, (1 +
ε)³). We observe that if the original instance of
the makespan minimization problem was for the
special case of identical machines, then so is the
rounded instance. The next steps differ between
the PTAS for identical machines and its
generalization for related machines.
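The rounding step can be sketched as follows. The function names are hypothetical, and the two correction loops are an implementation detail added here to guard against floating-point error in the logarithm; they are not part of the scheme itself.

```python
import math


def round_up_to_power(value: float, eps: float) -> float:
    """Smallest integer power of (1 + eps) that is >= value (for value > 0)."""
    base = 1.0 + eps
    k = math.ceil(math.log(value) / math.log(base))
    while base ** (k - 1) >= value:   # correct an exponent that came out too large
        k -= 1
    while base ** k < value:          # correct an exponent that came out too small
        k += 1
    return base ** k


def round_down_to_power(value: float, eps: float) -> float:
    """Largest integer power of (1 + eps) that is <= value (for value > 0)."""
    base = 1.0 + eps
    k = math.floor(math.log(value) / math.log(base))
    while base ** (k + 1) <= value:
        k += 1
    while base ** k > value:
        k -= 1
    return base ** k
```

Job sizes would be passed through `round_up_to_power` and machine speeds through `round_down_to_power`, so each processing time p_j/s_i grows by a factor smaller than (1 + ε)².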


The Case of Identical Machines 

We define a job to be small if its rounded size
is at most ε, and otherwise it is large. The large
jobs instance is the instance we obtain from the
rounded instance by removing all small jobs. The
first observation is that it is sufficient to design
an algorithm for finding a feasible solution to the
large jobs instance whose makespan is at most
1 + cε, where c ≥ 5 is some constant. This is
so because we can apply this algorithm to the
large jobs instance and obtain a schedule of these
large jobs. Later, we add the small jobs to the
schedule one by one using the list scheduling
heuristic [5]. In the analysis, there are two cases.
In the first one, adding the small jobs did not increase
the makespan of the resulting schedule, and in
this case our claim regarding the makespan of
the output of the algorithm clearly holds. In the
second case, the makespan increased by adding
the small jobs, and we consider the last iteration
in which such an increase happened. In that last
iteration, the load of one machine was increased
by the size of the job assigned in this iteration,
that is, by at most ε, and before this iteration
its load was at most (Σ_j p′_j)/m ≤ (1 + ε)²,
where the inequality holds because the makespan
of a feasible solution cannot be smaller than the
average load of the machines (for identical machines
only the job sizes are rounded, so the optimal
makespan of the rounded instance is below (1 + ε)²).
The claim now follows using (1 + ε)² + ε ≤ 1 + 5ε,
which holds for every ε ≤ 2 and in particular under
the assumed upper bound on ε.
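The step of adding the small jobs can be sketched as follows: each small job is assigned, in the given order, to a machine whose current load is smallest. The function name `list_schedule` and the use of a binary heap are choices made for this illustration; the loads passed in are those of the schedule of the large jobs.

```python
import heapq
from typing import List, Tuple


def list_schedule(initial_loads: List[float],
                  jobs: List[float]) -> Tuple[List[int], List[float]]:
    """Assign each job to a currently least-loaded identical machine.

    Returns (machine index chosen for each job, final machine loads).
    """
    loads = list(initial_loads)
    heap = [(load, i) for i, load in enumerate(loads)]
    heapq.heapify(heap)
    assignment: List[int] = []
    for p in jobs:
        load, i = heapq.heappop(heap)      # least-loaded machine (ties: lowest index)
        loads[i] = load + p
        assignment.append(i)
        heapq.heappush(heap, (loads[i], i))
    return assignment, loads
```

For example, with loads [1.0, 0.5] after scheduling the large jobs and three small jobs of size 0.25, the first two go to the second machine and the third goes to the first machine.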

The large jobs instance has a compact
representation. There are m identical machines
and jobs of at most O(log_{1+ε}(1/ε)) distinct sizes.
Note that each machine has at most 2/ε large jobs
assigned to it (in any solution with makespan
smaller than 2), and thus, for fixed ε, there is a
constant number of different schedules of one machine
when we consider jobs of the same size as
identical jobs. A schedule of one machine in
a solution to the large jobs instance is called the
configuration of the machine. Now, we can either
run a dynamic programming algorithm that
assigns large jobs to one machine after the other
while recalling in each step the number of jobs
of each size that still need to be assigned (as
done in [7]) or use an integer program of fixed
dimension [9] to solve the problem of assigning
all large jobs to configurations of machines while
having at most m machines in the solution and
allowing only configurations corresponding to
machines with load at most (1 + ε)², as suggested
by Shmoys (see [6]).
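The set of configurations of a single machine can be enumerated directly when the number of distinct large sizes is small. The sketch below is only an illustration of that enumeration, with a hypothetical function name; it is neither the dynamic program of [7] nor the integer program of [9], and it assumes all sizes are positive.

```python
from typing import List, Sequence, Tuple


def enumerate_configurations(sizes: Sequence[float],
                             max_load: float) -> List[Tuple[int, ...]]:
    """List all count vectors (one count per distinct size, largest size
    first) whose total size is at most max_load. Sizes must be positive."""
    distinct = sorted(set(sizes), reverse=True)
    configs: List[Tuple[int, ...]] = []

    def extend(index: int, remaining: float, counts: List[int]) -> None:
        if index == len(distinct):
            configs.append(tuple(counts))
            return
        count = 0
        while count * distinct[index] <= remaining:
            counts.append(count)
            extend(index + 1, remaining - count * distinct[index], counts)
            counts.pop()
            count += 1

    extend(0, max_load, [])
    return configs
```

With sizes 3 and 2 and a load bound of 6, there are seven configurations, including the empty one; the count vector (1, 2), of total size 7, is excluded.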

We refer to [1, 2], and [10] for PTASs for 
other load balancing problems on_ identical 
machines. 


The Case of Related Machines 

Here, we still would like to consider separately
the large jobs and the small jobs; however, a given
job j can be large for one machine and small
for another machine (it may even be too large
for other machines, that is, processing it on such
a machine may take a period of time that is longer
than the makespan of an optimal solution). Thus,
for a given job j, we say that it is huge for
machine i if p′_j/s′_i > (1 + ε)³, it is large for machine
i if ε < p′_j/s′_i ≤ (1 + ε)³, and otherwise it is
small for machine i. A configuration of machine
i is the number of large jobs of each rounded
size that are assigned to machine i (observe
that, similarly to the case of identical machines,
the number of sizes of large jobs for a given
machine is a constant) as well as an approximate
value of the total size of small jobs assigned to
machine i, that is, a value Δ_i such that the total
size of small jobs assigned to machine i is in


the interval ((4i —e- zi Ajeé- +]. Note that 
the vector of configurations of machines defines 
some information about the schedule, but it does 
not give a one-to-one assignment of small jobs to 
the machines. 

Once again, [8] suggested to use dynamic 
programming for assigning jobs to the machines 
by traversing the machines from the slowest one 
to the fastest one and, for each machine, deciding
the number of large jobs of each size as well as
an approximate value of the total size of small
jobs (for that machine) assigned to it. That is, the
dynamic programming decides the configuration
of each machine. To do so, it needs to recall the 
number of large jobs (with respect to the current 
machine) that are still not assigned, as well as 
the total size of small jobs (with respect to the 
current machine) that are still not assigned (this 
total size is a rounded value). At a postprocessing 
step, the jobs assigned as large jobs by the solu- 
tion for the dynamic programming are scheduled 
accordingly, while the other jobs are assigned as 
small jobs as follows. 

We assign the small jobs to the machines while 
traversing the machines from slowest to fastest 
and assigning the small jobs from the smallest 
to largest. At each time we consider the current 
machine i and the prefix of unassigned small jobs 
that are small with respect to the current machine. 
Denote by A; the value of this parameter accord- 
ing to the solution of the dynamic programming. 
Due to the successive rounding of the total size of 
unassigned small jobs, we will allow assignments 
of a slightly larger total size of small jobs to 
machine i. So we will assign the small jobs one
by one as long as their total size is at most 
Bitie If at some point there are no further 
unassigned small jobs that are small for machine 
i, we move to the next machine, otherwise we 
assign machine i small jobs (for machine i) of
total size of at least ee This suffices to 
guarantee the feasibility of the resulting solution 
(i.e., all jobs are assigned) while increasing the 
makespan only by a small amount. 

We refer to [3] and [4] for PTASs for other 
load balancing problems on related machines. 


Problem Definition


Many NP-hard graph problems become easier to 
approximate on planar graphs and their general- 
izations. (A graph is planar if it can be drawn 
in the plane (or the sphere) without crossings. 
For definitions of other related graph classes, see 
the entry on » Bidimensionality (2004; Demaine, 
Fomin, Hajiaghayi, Thilikos).) For example, the
maximum independent set problem asks for a maximum
subset of vertices in a graph that induces
no edges. This problem is inapproximable in
general graphs within a factor of $n^{1-\epsilon}$ for any
$\epsilon > 0$ unless NP = ZPP (and inapproximable
within $n^{1/2-\epsilon}$ unless P = NP), while for planar
graphs, there is a 4-approximation (or simple 5-
approximation) by taking the largest color class
in a vertex 4-coloring (or 5-coloring). Another
example is minimum dominating set, where the goal is
to find a minimum subset of vertices such that 
every vertex is either in or adjacent to the subset. 
This problem is inapproximable in general graphs 
within $\epsilon \log n$ for some $\epsilon > 0$ unless P = NP,
but as we will see, for planar graphs, the problem
admits a polynomial-time approximation scheme
(PTAS): a collection of $(1 + \epsilon)$-approximation
algorithms for all $\epsilon > 0$.

There are two main general approaches to 
designing PTASs for problems on planar graphs 
and their generalizations: the separator approach 
and the Baker approach. 

Lipton and Tarjan [15, 16] introduced the first 
approach, which is based on planar separators. 
The first step in this approach is to find a separator 
of $O(\sqrt{n})$ vertices or edges, where n is the size
of the graph, whose removal splits the graph into 
two or more pieces each of which is a constant 
fraction smaller than the original graph. Then, 
recurse in each piece, building a recursion tree of 
separators, and stop when the pieces have some 
constant size such as $1/\epsilon$. The problem can be
solved on these pieces by brute force, and then it 
remains to combine the solutions up the recursion 
tree. The induced error can often be bounded in 
terms of the total size of all separators, which 
in turn can be bounded by $\epsilon n$. If the optimal
solution is at least some constant factor times n, 
this approach often leads to a PTAS. 

There are two limitations to this planar-
separator approach. First, it requires that the 
optimal solution be at least some constant factor 
times n; otherwise, the cost incurred by the 
separators can be far larger than the desired 
optimal solution. Such a bound is possible in 
some problems after some graph pruning (linear 
kernelization), e.g., independent set, vertex cover, 
and forms of the traveling salesman problem. 
But, for example, Grohe [12] states that the 
dominating set is a problem “to which the 
technique based on the separator theorem does 
not apply.” Second, the approximation algorithms 
resulting from planar separators are often 
impractical because of large constant factors. For 
example, to achieve an approximation ratio of 
just 2, the base case requires exhaustive solution 
of graphs of up to $2^{2^{400}}$ vertices.

Baker [1] introduced her approach to address 
the second limitation, but it also addresses the 
first limitation to a certain extent. This approach 
is based on decomposition into overlapping sub- 
graphs of bounded outerplanarity, as described in 
the next section. 


Key Results 


Baker’s original result [1] is a PTAS for a 
maximum independent set (as defined above) 
on planar graphs, as well as the following list 
of problems on planar graphs: maximum tile 
salvage, partition into triangles, maximum H- 
matching, minimum vertex cover, minimum 
dominating set, and minimum edge-dominating 
set. 

Baker’s approach starts with a planar embed- 
ding of the planar graph. Then it divides vertices 
into layers by iteratively removing vertices on 
the outer face of the graph: layer j consists of 
the vertices removed at the jth iteration. If one 
now removes the layers congruent to i modulo 
k, for any choice of i, the graph separates into 
connected components each with at most k con- 
secutive layers, and hence the graph becomes k- 
outerplanar. Many NP-complete problems can be 
solved on k-outerplanar graphs for fixed k using 
dynamic programming (in particular, such graphs 
have bounded treewidth). Baker’s approximation
algorithm computes these optimal solutions for
each choice i of the congruence class of layers
to remove and returns the best solution among
these k solutions. The key argument for maxi-
mization problems considers the optimal solution
to the full graph and argues that the removal of
one of the k congruence classes of layers must
remove at most a 1/k fraction of the optimal
solution, so the returned solution must be within
a 1 + 1/k factor of optimal. A more delicate
argument handles minimization problems as well.
For many problems, such as maximum indepen-
dent set, minimum dominating set, and minimum
vertex cover, Baker’s approach obtains a $(1 + \epsilon)$-
approximation algorithm with a running time of
$2^{O(1/\epsilon)} n^{O(1)}$ on planar graphs.
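
The averaging step behind the key argument for maximization problems can be written out as follows. This is a sketch under stated assumptions: OPT denotes an optimal solution viewed as a vertex set, $L_i$ (a symbol introduced here for illustration) denotes the union of the layers congruent to i modulo k, and the problem is assumed to be hereditary (as maximum independent set is), so that OPT minus a deleted class remains feasible in the remaining graph.

```latex
% L_0, ..., L_{k-1}: the k congruence classes of layers (notation assumed here)
\begin{align*}
\sum_{i=0}^{k-1} |\mathrm{OPT} \cap L_i| &= |\mathrm{OPT}|
  && \text{(the classes } L_i \text{ partition the vertices)} \\
\min_{0 \le i < k} |\mathrm{OPT} \cap L_i| &\le \frac{1}{k}\,|\mathrm{OPT}|
  && \text{(averaging over the } k \text{ classes)} \\
|\mathrm{OPT} \setminus L_{i^*}| &\ge \Bigl(1 - \frac{1}{k}\Bigr)\,|\mathrm{OPT}|
  && \text{(for a minimizing class } i^*\text{)}
\end{align*}
```

Since the algorithm solves the problem optimally after deleting each class in turn, the best of its k solutions is at least as large as the feasible solution $\mathrm{OPT} \setminus L_{i^*}$.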

Eppstein [10] generalized Baker’s approach 
to a broader class of graphs called graphs of 
bounded local treewidth, i.e., where the treewidth 
of the subgraph induced by the set of vertices at a 
distance of at most r from any vertex is bounded 
above by some function f(r) independent of n. 
The main differences in Eppstein’s approach are 
replacing the concept of bounded outerplanarity 
with the concept of bounded treewidth, where 
dynamic programming can still solve many prob- 
lems, and labeling layers according to a sim- 
ple breadth-first search. This approach has led 
to PTASs for hereditary maximization problems 
such as maximum independent set and maximum 
clique, maximum triangle matching, maximum 
H-matching, maximum tile salvage, minimum
vertex cover, minimum dominating set, mini- 
mum edge-dominating set, minimum color sum, 
and subgraph isomorphism for a fixed pattern 
[6, 8, 10]. Frick and Grohe [11] also developed a 
general framework for deciding any property ex- 
pressible in first-order logic in graphs of bounded 
local treewidth. 
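
The layering step with breadth-first search labels can be sketched as follows. This is a minimal illustration, not the implementation from the cited works: the graph is assumed to be a connected adjacency dictionary, and the function names `bfs_layers` and `pieces_after_removal` are hypothetical. The dynamic programming that would then be run on each piece is omitted.

```python
from collections import deque


def bfs_layers(adj, root):
    """Return {vertex: BFS distance from root} for the component of root."""
    layer = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer


def pieces_after_removal(adj, root, k, i):
    """Delete every vertex whose layer is congruent to i modulo k.

    Returns (removed, components), where components are the connected
    pieces of the remaining graph; each piece lies within fewer than k
    consecutive layers.
    """
    layer = bfs_layers(adj, root)
    removed = {v for v, d in layer.items() if d % k == i}
    kept = set(layer) - removed
    components, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v in kept and v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return removed, components


# Example: a path 0 - 1 - ... - 9, layered from vertex 0.
path = {v: [w for w in (v - 1, v + 1) if 0 <= w <= 9] for v in range(10)}
removed, components = pieces_after_removal(path, root=0, k=3, i=2)
```

In an approximation scheme one would call `pieces_after_removal` for every i in `range(k)`, solve each piece exactly, and keep the best of the k combined solutions.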

The foundation of these results is Eppstein’s 
characterization of minor-closed families of 
graphs with bounded local treewidth [10]. 
Specifically, he showed that a minor-closed 
family has bounded local treewidth if and only 
if it excludes some apex graph, a graph with 
a vertex whose removal leaves a planar graph. 
Unfortunately, the initial proof of this result 
brought Eppstein’s approach back to the realm
of impracticality, because his bound on local
treewidth in a general apex-minor-free graph is
doubly exponential in r: $2^{2^{O(r)}}$. Fortunately,
this bound could be improved to $2^{O(r)}$ [3] and
even the optimal $O(r)$ [4]. The latter bound
restores Baker’s $2^{O(1/\epsilon)} n^{O(1)}$ running time for
$(1 + \epsilon)$-approximation algorithms, now for all
apex-minor-free graphs.

Another way to view the necessary decom- 
position of Baker’s and Eppstein’s approaches 
is that the vertices or edges of the graph can 
be split into any number k of pieces such that 
deleting any one of the pieces results in a graph of 
bounded treewidth (where the bound depends on 
k). Such decompositions in fact exist for arbitrary 
graphs excluding any fixed minor H [9], and 
they can be found in polynomial time [6]. This 
approach generalizes the Baker-Eppstein PTASs 
described above to handle general H-minor-free
graphs. 

This decomposition approach is effectively 
limited to deletion-closed problems, whose 
optimal solution only improves when deleting 
edges or vertices from the graph. Another 
decomposition approach targets contraction- 
closed problems, whose optimal solution 
only improves when contracting edges. These 
problems include classic problems such as 
dominating set and its variations, the traveling 
salesman problem, subset TSP, minimum Steiner 
tree, and minimum-weight c-edge-connected 
submultigraph. PTASs have been obtained for 
these problems in planar graphs [2, 13, 14] and 
in bounded-genus graphs [7] by showing that the 
edges can be decomposed into any number k of 
pieces such that contracting any one piece results 
in a bounded-treewidth graph (where the bound 
depends on k). 


Applications 


Most applications of Baker’s approach have been 
limited to optimization problems arising from 
“local” properties (such as those definable in first- 
order logic). Intuitively, such local properties can 
be decided by locally checking every constant- 
size neighborhood. In [5], Baker’s approach
is generalized to obtain PTASs for nonlocal 
problems, in particular, connected dominating 
set. This generalization requires the use of 
two different techniques. The first technique 
is to use an $\epsilon$-fraction of a constant-factor (or
even logarithmic-factor) approximation to the 
problem as a “backbone” for achieving the 
needed nonlocal property. The second technique 
is to use subproblems that overlap by O(log n) 
layers instead of the usual $\Theta(1)$ in Baker’s
approach. 

Despite this advance in applying Baker’s 
approach to more general problems, the planar- 
separator approach can still handle some different 
problems. Recall, though, that the planar- 
separator approach was limited to problems 
in which the optimal solution is at least some 
constant factor times n. This limitation has 
been overcome for a wide range of problems 
[5], in particular obtaining a PTAS for feedback 
vertex set, to which neither Baker’s approach nor 
the planar-separator approach could previously 
apply. This result is based on evenly dividing the 
optimum solution instead of the whole graph, 
using a relation between treewidth and the 
optimal solution value to bound the treewidth 
of the graph, thus obtaining an $O(\sqrt{\mathrm{OPT}})$
separator instead of an $O(\sqrt{n})$ separator. The
$O(\sqrt{\mathrm{OPT}})$ bound on treewidth follows from
the bidimensionality theory described in the 
entry on » Bidimensionality (2004; Demaine, 
Fomin, Hajiaghayi, Thilikos). We can divide 
the optimum solution into roughly even pieces, 
without knowing the optimum solution, by using 
existing constant-factor (or even logarithmic- 
factor) approximations for the problem. At the 
base of the recursion, pieces no longer have 
bounded size but do have bounded treewidth, so 
fast fixed-parameter algorithms can be used to 
construct optimal solutions. 


Open Problems 


An intriguing direction for future research is 
to build a general theory for PTASs of subset 
problems. Although PTASs for subset TSP and 
Steiner tree have recently been obtained for 
planar graphs [2, 14], there remain several open
problems of this kind, such as subset feedback 
vertex set, group Steiner tree, and directed Steiner 
tree. 

Another instructive problem is to understand 
the extent to which Baker’s approach can be 
applied to nonlocal problems. Again there is an 
example of how to modify the approach to handle 
the nonlocal problem of connected dominating 
set [5], but, for example, the only known PTAS 
for feedback vertex set in planar graphs follows 
the separator approach. 


Problem Definition


Nash [15] introduced the concept of Nash 
equilibria in noncooperative games and proved 
that any game possesses at least one such 
equilibrium. A well-known algorithm for 
computing a Nash equilibrium of a 2-player 
game is the Lemke-Howson algorithm [13]; 
however, it has exponential worst-case running 
time in the number of available pure strategies 
[18]. 

Daskalakis et al. [5] showed that the problem 
of computing a Nash equilibrium in a game with 
4 or more players is PPAD-complete; this result 
was later extended to games with 3 players [8]. 
Eventually, Chen and Deng [3] proved that the 
problem is PPAD-complete for 2-player games as 
well. 

This fact motivated the computation of approx-
imate Nash equilibria. There are several versions
of approximate Nash equilibria that have been
defined in the literature; however, the focus of
this entry is on the notions of $\epsilon$-Nash equilibrium
and $\epsilon$-well-supported Nash equilibrium. An $\epsilon$-
Nash equilibrium is a strategy profile such that
no deviating player could achieve a payoff higher
than the one that the specific profile gives her,
plus $\epsilon$. A stronger notion of approximate Nash
equilibria is the $\epsilon$-well-supported Nash equilib-
ria; these are strategy profiles such that each
player plays only approximately best-response 
pure strategies with nonzero probability. These 
are additive notions of approximate equilibria; 
the problem of computing approximate equilibria 
of bimatrix games using a relative notion of 
approximation is known to be PPAD-hard even 
for constant approximations. 


Notation 

For an $n \times 1$ vector $x$, denote by $x_1, \ldots, x_n$ the
components of $x$ and by $x^T$ the transpose of $x$.
Denote by $e_i$ the column vector with 1 at the
$i$th coordinate and 0 elsewhere. For an $n \times m$
matrix $A$, denote by $a_{ij}$ the element in the $i$-th row
and $j$-th column of $A$. Let $\mathbb{P}^n$ be the set of
all probability vectors in $n$ dimensions:

$$\mathbb{P}^n = \left\{ z \in \mathbb{R}^n_{\geq 0} : \sum_{i=1}^{n} z_i = 1 \right\}.$$


Bimatrix Games 

Bimatrix games [16] are a special case of 2- 
player games such that the payoff functions can 
be described by two real n x m matrices A and 
B. The n rows of A, B represent the action set 
of the first player (the row player), and the m 
columns represent the action set of the second 
player (the column player). Then, when the row 
player chooses action i and the column player 
chooses action j, the former gets payoff $a_{ij}$,
while the latter gets payoff $b_{ij}$. Based on this,
bimatrix games are denoted by $\Gamma = (A, B)$.

A strategy for a player is any probability 
distribution on his/her set of actions. Therefore, 
a strategy for the row player can be expressed 
as a probability vector $x \in \mathbb{P}^n$, while a strategy
for the column player can be expressed as a
probability vector $y \in \mathbb{P}^m$. Each extreme point
$e_i \in \mathbb{P}^n$ ($e_j \in \mathbb{P}^m$) that corresponds to the
strategy assigning probability 1 to the i-th row
(j-th column) is called a pure strategy for the
row (column) player. A strategy profile (x, y) is
a combination of (mixed in general) strategies,
one for each player. In a given strategy profile
(x, y), the players get expected payoffs $x^T A y$
(row player) and $x^T B y$ (column player).

If both payoff matrices belong to $[0, 1]^{n \times m}$,
then the game is called a [0,1]-bimatrix (or else,
positively normalized) game. The special case of
bimatrix games in which all elements of the ma-
trices belong to {0, 1} is called a {0, 1}-bimatrix
(or else, win-lose) game. A bimatrix game (A, B)
is called zero sum if B = −A.


Approximate Nash Equilibria 

Definition 1 ($\epsilon$-Nash equilibrium) For any $\epsilon > 0$,
a strategy profile (x, y) is an $\epsilon$-Nash equilib-
rium for the $n \times m$ bimatrix game $\Gamma = (A, B)$
if


1. For all pure strategies $i \in \{1, \ldots, n\}$ of the
row player, $e_i^T A y \leq x^T A y + \epsilon$.
2. For all pure strategies $j \in \{1, \ldots, m\}$ of the
column player, $x^T B e_j \leq x^T B y + \epsilon$.


Definition 2 ($\epsilon$-well-supported Nash equilib-
rium) For any $\epsilon > 0$, a strategy profile (x, y)
is an $\epsilon$-well-supported Nash equilibrium for the
$n \times m$ bimatrix game $\Gamma = (A, B)$ if


1. For all pure strategies $i \in \{1, \ldots, n\}$ of the
row player,


$x_i > 0 \Rightarrow e_i^T A y \geq e_k^T A y - \epsilon \quad \forall k \in \{1, \ldots, n\}$


2. For all pure strategies $j \in \{1, \ldots, m\}$ of the
column player,


$y_j > 0 \Rightarrow x^T B e_j \geq x^T B e_k - \epsilon \quad \forall k \in \{1, \ldots, m\}$.


Note that both notions of approximate equilibria 
are defined with respect to an additive error term 
$\epsilon$. Although (exact) Nash equilibria are known
not to be affected by any positive scaling, it is 
important to mention that approximate notions of 
Nash equilibria are indeed affected. Therefore, 
the commonly used assumption in the literature 
when referring to approximate Nash equilibria is 
that the bimatrix game is positively normalized, 
and this assumption is adopted in the present 
entry. 
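
The two definitions can be checked numerically. The sketch below computes, for a given strategy profile, the smallest additive error for which the profile is an approximate Nash equilibrium and the smallest error for which it is well supported. It assumes payoff matrices given as lists of rows and strategies given as lists of probabilities; the function names are hypothetical and chosen for illustration.

```python
def _row_values(A, y):
    """Payoff of each pure row against the column strategy y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]


def _col_values(B, x):
    """Payoff of each pure column against the row strategy x."""
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]


def nash_epsilon(A, B, x, y):
    """Smallest eps such that (x, y) satisfies Definition 1."""
    rows, cols = _row_values(A, y), _col_values(B, x)
    row_payoff = sum(p * v for p, v in zip(x, rows))   # x^T A y
    col_payoff = sum(q * v for q, v in zip(y, cols))   # x^T B y
    return max(max(rows) - row_payoff, max(cols) - col_payoff)


def well_supported_epsilon(A, B, x, y):
    """Smallest eps such that (x, y) satisfies Definition 2."""
    rows, cols = _row_values(A, y), _col_values(B, x)
    row_gap = max(max(rows) - rows[i] for i in range(len(x)) if x[i] > 0)
    col_gap = max(max(cols) - cols[j] for j in range(len(y)) if y[j] > 0)
    return max(row_gap, col_gap)


# Example: a positively normalized matching-pennies game.
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
eps_nash = nash_epsilon(A, B, [0.5, 0.5], [1, 0])            # 0.5
eps_ws = well_supported_epsilon(A, B, [0.5, 0.5], [1, 0])    # 1.0
```

In the example, the row player mixes uniformly while the column player plays the first column. The profile is a 1/2-Nash equilibrium but only a 1-well-supported one, which illustrates why the well-supported notion is the more demanding of the two.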


Key Results 


The work of Althöfer [1] shows that, for any
probability vector $p$, there exists a probability
vector $\hat{p}$ with logarithmic support, so that for a
fixed matrix $C$, $\max_j |p^T C e_j - \hat{p}^T C e_j| \leq \epsilon$, for
any constant $\epsilon > 0$. Exploiting this fact, the work
of Lipton, Markakis, and Mehta [14] shows that,
for any bimatrix game and for any constant $\epsilon > 0$,
there exists an $\epsilon$-Nash equilibrium with only
logarithmic support (in the number n of available
pure strategies). Consider a bimatrix game
$\Gamma = (A, B)$, and let (x, y) be a Nash equilibrium for
$\Gamma$. Fix a positive integer k and form a multiset $S_1$
by sampling k times from the set of pure strate-
gies of the row player, independently at random
according to the distribution x. Similarly, form a
multiset $S_2$ by sampling k times from the set of pure
strategies of the column player according to y.
Let $\hat{x}$ be the mixed strategy for the row player that
assigns probability 1/k to each member of $S_1$
and 0 to all other pure strategies, and let $\hat{y}$ be the
mixed strategy for the column player that assigns
probability 1/k to each member of $S_2$ and 0 to
all other pure strategies. Then, $\hat{x}$ and $\hat{y}$ are called
k-uniform [14], and the following holds:


Theorem 1 ([14]) For any Nash equilibrium
(x, y) of an $n \times n$ bimatrix game and for every
$\epsilon > 0$, there exists, for every $k \geq (12 \ln n)/\epsilon^2$, a
pair of k-uniform strategies $\hat{x}, \hat{y}$ such that $(\hat{x}, \hat{y})$
is an $\epsilon$-Nash equilibrium.
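
The sampling construction that precedes Theorem 1 can be sketched as follows. This is an illustration under stated assumptions: the function names `k_uniform` and `support_bound` are hypothetical, and the theorem only asserts that a suitable pair of k-uniform strategies exists, so a single random draw is not guaranteed to be an approximate equilibrium.

```python
import math
import random
from collections import Counter


def k_uniform(strategy, k, rng):
    """Sample k pure strategies independently from `strategy` and return
    the empirical distribution, which assigns probability (count / k)."""
    draws = rng.choices(range(len(strategy)), weights=strategy, k=k)
    counts = Counter(draws)
    return [counts[i] / k for i in range(len(strategy))]


def support_bound(n, eps):
    """Smallest integer k with k >= (12 ln n) / eps^2, as in Theorem 1."""
    return math.ceil(12 * math.log(n) / eps ** 2)


rng = random.Random(0)            # fixed seed, for reproducibility only
x = [0.5, 0.5, 0.0]               # an assumed equilibrium strategy
x_hat = k_uniform(x, k=8, rng=rng)
```

Every entry of `x_hat` is a multiple of 1/k, and pure strategies outside the support of `x` never receive positive probability. The quasi-polynomial algorithm mentioned in the text would instead enumerate all pairs of k-uniform strategies for k given by `support_bound` and test each pair against Definition 1.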


This result directly yields a quasi-polynomial
($n^{O(\ln n)}$) algorithm for computing such an
approximate equilibrium. Moreover, as pointed
out in [1], no algorithm that examines supports
smaller than about $\ln n$ can achieve an
approximation better than 1/4.


Theorem 2 ([4]) The problem of computing a
$1/n^{\Theta(1)}$-Nash equilibrium of an $n \times n$ bimatrix
game is PPAD-complete.


Theorem 2 asserts that, unless PPAD $\subseteq$ P,
there exists no fully polynomial time approxima-
tion scheme for computing equilibria in bimatrix
games. However, this does not rule out the ex-
istence of a polynomial approximation scheme
for computing an $\epsilon$-Nash equilibrium when $\epsilon$ is
an absolute constant, or even when
$\epsilon = \Theta(1/\mathrm{poly}(\ln n))$. Furthermore, as observed in [4], if
the problem of finding an $\epsilon$-Nash equilibrium
were PPAD-complete when $\epsilon$ is an absolute con-
stant, then, due to Theorem 1, all PPAD problems
would be solved in quasi-polynomial time, which 
is unlikely to be the case. 

Two concurrent and independent works [6, 
11] were the first to make progress in provid-
ing $\epsilon$-Nash equilibria and $\epsilon$-well-supported Nash
equilibria for bimatrix games and some constant
$0 < \epsilon < 1$. In particular, the work of Kontogian-
nis, Panagopoulou, and Spirakis [11] proposes
a simple linear-time algorithm for computing a 
3/4-Nash equilibrium for any bimatrix game: 

Theorem 3 ([11]) Consider any $n \times m$ bimatrix
game $\Gamma = (A, B)$, and let $a_{i_1, j_1} = \max_{i,j} a_{ij}$
and $b_{i_2, j_2} = \max_{i,j} b_{ij}$. Then the pair of strate-
gies $(\hat{x}, \hat{y})$ where $\hat{x}_{i_1} = \hat{x}_{i_2} = \hat{y}_{j_1} = \hat{y}_{j_2} = 1/2$
is a 3/4-Nash equilibrium for $\Gamma$.
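
The construction in Theorem 3 takes a single pass over both matrices, as the following sketch shows. The function name is hypothetical, ties between maximum entries are broken by first occurrence, and it is an assumption of this sketch that when the two selected rows (or columns) coincide, that row (or column) simply receives probability 1.

```python
def three_quarters_profile(A, B):
    """Build the strategy pair of Theorem 3 from the largest entries of A and B."""
    n, m = len(A), len(A[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    i1, j1 = max(cells, key=lambda c: A[c[0]][c[1]])   # largest entry of A
    i2, j2 = max(cells, key=lambda c: B[c[0]][c[1]])   # largest entry of B
    x = [0.0] * n
    y = [0.0] * m
    # Probability 1/2 on each selected row and column (merged if they coincide).
    x[i1] += 0.5
    x[i2] += 0.5
    y[j1] += 0.5
    y[j2] += 0.5
    return x, y


A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
x_hat, y_hat = three_quarters_profile(A, B)   # ([1.0, 0.0], [0.5, 0.5])
```

The running time is linear in the size of the input matrices, since each entry is inspected a constant number of times.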


The above technique can be extended so as to 
obtain a parametrized, stronger approximation: 


Theorem 4 ([11]) Consider an $n \times m$ bimatrix
game $\Gamma = (A, B)$. Let $\lambda_1^*$ ($\lambda_2^*$) be the mini-
mum, among all Nash equilibria of $\Gamma$, expected
payoff for the row (column) player, and let
$\lambda = \max\{\lambda_1^*, \lambda_2^*\}$. Then, there exists a $(2 + \lambda)/4$-
Nash equilibrium that can be computed in time
polynomial in n and m.


The work of Daskalakis, Mehta, and Papadim- 
itriou [6] provides a simple algorithm for com- 
puting a 1/2-Nash equilibrium: Pick an arbitrary 
row for the row player, say row i. Let $j = \arg\max_{j'} b_{ij'}$.
Let $k = \arg\max_{k'} a_{k'j}$. Thus, j is
a best-response column for the column player to
the row i, and k is a best-response row for the row
player to the column j. Let $\hat{x} = \frac{1}{2} e_i + \frac{1}{2} e_k$
and $\hat{y} = e_j$, i.e., the row player plays row i
or row k with probability 1/2 each, while the
column player plays column j with probability
1. Then:


Theorem 5 ([6]) The strategy profile $(\hat{x}, \hat{y})$ is a
1/2-Nash equilibrium. 
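
The three steps of this algorithm translate almost directly into code. The sketch below assumes matrices given as lists of rows; the function name and the default choice of the starting row are hypothetical, and ties are broken by first occurrence.

```python
def half_nash_profile(A, B, i=0):
    """Construct the strategy pair of Theorem 5, starting from row i."""
    n, m = len(A), len(A[0])
    j = max(range(m), key=lambda col: B[i][col])   # best-response column to row i
    k = max(range(n), key=lambda row: A[row][j])   # best-response row to column j
    x = [0.0] * n
    x[i] += 0.5                                    # rows i and k, probability 1/2 each
    x[k] += 0.5
    y = [0.0] * m
    y[j] = 1.0                                     # column j with probability 1
    return x, y


A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
x_hat, y_hat = half_nash_profile(A, B)   # ([0.5, 0.5], [0.0, 1.0])
```

In the example, the row player could gain exactly 1/2 by switching to the second row, while the column player cannot gain anything, so the bound of Theorem 5 is attained.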


A polynomial construction (based on linear pro- 
gramming) of a 0.38-Nash equilibrium is pre- 
sented in [7]. 

For the more demanding notion of well-
supported approximate Nash equilibrium,
Daskalakis, Mehta, and Papadimitriou [6]
propose an algorithm, which, under a quite
interesting and plausible graph-theoretic
conjecture, constructs in polynomial time a 5/6-
well-supported Nash equilibrium. However, the
status of this conjecture is still unknown. In [6],
it is also shown how to transform a [0,1]-bimatrix
game to a {0, 1}-bimatrix game of the same size,
so that each $\epsilon$-well-supported Nash equilibrium
of the resulting game is a $(1 + \epsilon)/2$-well-supported
Nash equilibrium of the original game.

An algorithm given by Kontogiannis and
Spirakis computes a 2/3-well-supported Nash 
equilibrium in polynomial time [12]. Their 
methodology for attacking the problem is based 
on the solvability of zero-sum bimatrix games 
(via its connection to linear programming) and 
provides a 0.5-well-supported Nash equilibrium 
for win-lose games and a 2/3-well-supported 
Nash equilibrium for normalized games. In [9], a 
polynomial-time algorithm computes an $\epsilon$-well-
supported Nash equilibrium with $\epsilon < 2/3$, by
extending the 2/3-algorithm of Kontogiannis 
and Spirakis. In particular, it is shown that either 
the strategies generated by their algorithm can 
be tweaked to improve the approximation or 
that we can find a sub-game that resembles 
matching pennies, which again leads to a better 
approximation. This allows the construction of a
$(2/3 - 0.004735)$-well-supported Nash equilibrium in
polynomial time. 

Two new results improved the approximation 
status of $\epsilon$-Nash equilibria:


Theorem 6 ([2]) There is a polynomial time al-
gorithm, based on linear programming, that pro-
vides a 0.36392-Nash equilibrium.


The second result below, due to Tsaknakis 
and Spirakis, is the best to date. Based on local
search, it establishes that any local minimum of 
a very natural map in the space of pairs of mixed 
strategies or its dual point in a certain minimax 
problem used for finding the local minimum 
constitutes a 0.3393-Nash equilibrium. 


Theorem 7 ([19]) There exists a polynomial
time algorithm, based on the stationary points of
a natural optimization problem, that provides a
0.3393-Nash equilibrium.


In [20], it is shown that the problem of com- 
puting a Nash equilibrium for 2-person games 
can be polynomially reduced to an indefinite 
quadratic programming problem involving the 
spectrum of the adjacency matrix of a strongly 
connected directed graph on n vertices, where n
is the total number of players’ strategies. Based 
on that, a new method is presented for com- 
puting approximate equilibria, and it is shown 
that its complexity is a function of the average 
spectral energy of the underlying graph. The
implications of the strong connectedness prop- 
erties on the energy and on the complexity of 
the method are discussed, and certain classes of 
graphs are described for which the method is a 
polynomial time approximation scheme (PTAS). 
The worst-case complexity is bounded by a
subexponential function in the total number of 
strategies n. 

Kannan and Theobald [10] investigate a 
hierarchy of bimatrix games (A,B) which 
results from restricting the rank of the matrix 
A + B to be of fixed rank at most k. They 
propose a new model of $\epsilon$-approximation
for games of rank k and, using results from 
quadratic optimization, show that approximate 
Nash equilibria of constant rank games 
can be computed deterministically in time 
polynomial in $1/\epsilon$. Moreover, [10] provides a
randomized approximation algorithm for certain 
quadratic optimization problems, which yields 
a randomized approximation algorithm for the 
Nash equilibrium problem. This randomized 
algorithm has time complexity similar to that of the
deterministic one, but it has the possibility
of finding an exact solution in polynomial 
time if a conjecture is valid. Finally, they 
present a polynomial time algorithm for relative 
approximation (with respect to the payoffs in an 
equilibrium) provided that the matrix A + B has 
a nonnegative decomposition. 


Applications 

Noncooperative game theory and its main 
solution concept, i.e., the Nash equilibrium, 
have been extensively used to understand the 
phenomena observed when decision-makers 
interact and have been applied in many diverse 
academic fields, such as biology, economics, 
sociology, and artificial intelligence. Since,
however, the computation of a Nash equilibrium
is in general PPAD-complete, it is important to
provide efficient algorithms for approximating
a Nash equilibrium; the algorithms discussed
in this entry are a first step in this
direction.


Problem Definition


Arbitrage is the simultaneous purchase and sale of the same
securities, commodities, or foreign exchange in 
order to profit from a differential in the price. 
This usually takes place on different exchanges 
or marketplaces and is also known as a “riskless 
profit.” 

Arbitrage is, arguably, the most fundamental 
concept in finance. It is a state of the variables 
of financial instruments such that a riskless profit 
can be made, a state generally believed not
to exist. The economist’s argument for its
nonexistence is that active investment agents will 
exploit any arbitrage opportunity in a financial 
market and thus will deplete it as soon as it 
may arise. Naturally, the speed at which such
an arbitrage opportunity can be located and
exploited is important for profit-seeking
investors, and studying it falls in the realm
of analysis of algorithms and computational
complexity.

The identification of arbitrage states in a fric-
tionless foreign exchange market (a theoretical
trading environment where all costs and restraints
associated with transactions are nonexistent) is not
difficult at all and can be reduced to the existence
of arbitrage on three currencies (see [11]). In
reality, friction does exist. Because of friction,
arbitrage opportunities may exist in the market
yet be difficult to find and to exploit, and hence
to eliminate. Experimental results in
foreign exchange markets showed that arbitrage 
does exist in reality. Examination of data from 10 
markets over a 12-day period by Mavrides [11] 
revealed that a significant arbitrage opportunity 
exists. Some opportunities were observed to be 
persistent for a long time. The problem becomes
worse in forward and futures markets (in which
futures contracts in commodities are traded) cou-
pled with covered interest rates, as observed by
Abeysekera and Turtle [1] and Clinton [4]. An 
obvious interpretation is that the arbitrage oppor- 
tunity was not immediately identified because of 
information asymmetry in the market. However, 
that is not the only factor. Both the time necessary 
to collect the market information (so that an 
arbitrage opportunity would be identified) and the 
time people (or computer programs) need to find
the arbitrage transactions are important factors 
for eliminating arbitrage opportunities. 

The computational complexity of identifying
arbitrage, that is, the level of difficulty measured in
arithmetic operations, differs across
models of exchange systems. Therefore, to
approximate an ideal exchange market, models 
with lower complexities should be preferred to 
those with higher complexities. 

To model an exchange system, consider n 
foreign currencies: $N = \{1, 2, \ldots, n\}$. For each
ordered pair $(i, j)$, one may change one unit of
currency i to $r_{ij}$ units of currency j. Rate $r_{ij}$
is the exchange rate from i to j. In an ideal 
market, the exchange rate holds for any amount 
that is exchanged. An arbitrage opportunity is a 
set of exchanges between pairs of currencies such 
that the net balance for each involved currency 
is nonnegative and there is at least one currency 
for which the net balance is positive. Under ideal 
market conditions, there is no arbitrage if and 
only if there is no arbitrage among any three 
currencies (see [11]). 
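
In the ideal (frictionless) setting, the three-currency criterion suggests a direct check over all ordered triples. The sketch below is an illustration only: it assumes the rates are given as a square matrix `r` with `r[i][j]` the number of units of currency j obtained for one unit of currency i, the function name is hypothetical, and a small tolerance is used to absorb floating-point rounding.

```python
from itertools import permutations


def find_triangular_arbitrage(r, tol=1e-12):
    """Return a triple (i, j, k) whose exchange cycle i -> j -> k -> i
    multiplies one unit of currency i into more than one unit, or None."""
    n = len(r)
    for i, j, k in permutations(range(n), 3):
        if r[i][j] * r[j][k] * r[k][i] > 1 + tol:
            return (i, j, k)
    return None


# Consistent rates derived from assumed currency values 1, 2, and 4.
values = [1.0, 2.0, 4.0]
rates = [[vi / vj for vj in values] for vi in values]
no_cycle = find_triangular_arbitrage(rates)      # None

rates[0][1] = 0.6                                # mispriced: should be 0.5
cycle = find_triangular_arbitrage(rates)         # (0, 1, 2)
```

The check uses on the order of n^3 multiplications, which is why identification is considered easy in the frictionless model. It does not apply once bounds, integrality, or multiple rates per currency pair are introduced.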

Various types of friction can be easily mod- 
eled in such a system. Bid-offer spread may 
be expressed in the present mathematical for-
mat as $r_{ij} r_{ji} < 1$ for some $i, j \in N$. In
addition, usually the traded amount is required
to be in multiples of a fixed integer amount,
hundreds, thousands, or millions. Moreover, dif-
ferent traders may bid or offer at different rates,
and each for a limited amount. A more general
model to describe these market imperfections will
include, for pairs $i \neq j \in N$, $l_{ij}$ different rates
$r_{ij}^k$ of exchange from currency i to j up to $b_{ij}^k$
units of currency i, $k = 1, \ldots, l_{ij}$, where $l_{ij}$
is the number of different exchange rates from
currency i to j.

A currency exchange market can be repre-
sented by a digraph $G = (V, E)$ with vertex set
V and arc set E such that each vertex $i \in V$
represents currency i and each arc $a_{ij}^k \in E$
represents the currency exchange relation from i
to j with rate $r_{ij}^k$ and bound $b_{ij}^k$. Note that parallel
arcs may occur for different exchange rates. Such
a digraph is called an exchange digraph. Let
$x = (x_{ij}^k)$ denote a currency exchange vector (Fig. 1).

Arbitrage in Frictional Foreign Exchange Market,
Fig. 1 Digraph $G_1$


Problem 1 The existence of arbitrage in a fric- 
tional exchange market can be formulated as 
follows: 
at least one strict inequality holds
$x_{ij}^k$ is integer, $1 \leq k \leq l_{ij}$, $1 \leq i \neq j \leq n$.

Note that the first term in the right-hand side
of (1) is the revenue at currency i by selling other
currencies and the second term is the expense at
currency i by buying other currencies.

The corresponding optimization problem is 


Problem 2 The maximum arbitrage problem in 
a frictional foreign exchange market with bid-ask 
spreads, bound, and integrality constraints is the 
following integer linear programming (P): 
where $w_i \geq 0$ is a given weight for currency
i, $i = 1, 2, \ldots, n$, with at least one $w_i > 0$.


Finally, consider another 


Problem 3 In order to eliminate arbitrage, how 
many transactions and arcs in an exchange digraph
have to be used for the currency exchange sys- 
tem? 


Key Results 


A decision problem is called nondeterministic 
polynomial (NP for short) if its solution (if one
exists) can be guessed and verified in polynomial 
time; nondeterministic means that no particular 
rule is followed to make the guess. If a problem 
is NP and all other NP problems are polynomial- 
time reducible to it, the problem is NP-complete. 
And a problem is called NP-hard if every other 
problem in NP is polynomial-time reducible to it 
(Fig. 2). 


Theorem 1 It is NP-complete to determine whether there exists arbitrage in a frictional foreign exchange market with bid-ask spreads, bound, and integrality constraints even if all $l_{ij} = 1$.


Then, a further inapproximability result is ob- 
tained. 


Theorem 2 There exists fixed $\epsilon > 0$ such that approximating (P) within a factor of $n^\epsilon$ is NP-hard even for any of the following two special cases:


(P1) all $l_{ij} = 1$ and $w_i = 1$.
(P2) all $l_{ij} = 1$ and all but one $w_i = 0$.

Arbitrage in Frictional Foreign Exchange Market, Fig. 2 Digraph G2


Now, consider two polynomially solvable spe- 
cial cases when the number of currencies is con- 
stant or the exchange digraph is star shaped (a 
digraph is star shaped if all arcs have a common 
vertex). 


Theorem 3 There are polynomial time algorithms for (P) when the number of currencies is constant.


Theorem 4 It is polynomially solvable to find 
the maximum revenue at the center currency of 
arbitrage in a frictional foreign exchange mar- 
ket with bid-ask spread, bound, and integrality 
constraints when the exchange digraph is star 
shaped. 


However, if the exchange digraph is the coa- 
lescence of a star-shaped exchange digraph and 
its copy, shown by Digraph G1, then the problem 
becomes NP-complete. 


Theorem 5 It is NP-complete to decide whether 
there exists arbitrage in a frictional foreign ex- 
change market with bid-ask spreads, bound, and 
integrality constraints even if its exchange di- 
graph is coalescent. 


Finally, an answer to Problem 3 is as follows 


Theorem 6 There is an exchange digraph of order $n$ such that at least $\lfloor n/2 \rfloor \lceil n/2 \rceil - 1$ transactions and at least $n^2/4 + n - 3$ arcs are needed to bring the system back to non-arbitrage states.


For instance, consider the currency exchange market corresponding to digraph $G_2 = (V, E)$, where the number of currencies is $n = |V|$, $p = \lfloor n/2 \rfloor$, and $K = n^2$.
Set 


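$$C = \{a_{ij} \in E \mid 1 \le i \le p,\ p+1 \le j \le n\} \cup \{a_{(p+1)1}\} \setminus \{a_{1(p+1)}\} \cup \{a_{i(i-1)} \mid 2 \le i \le p\} \cup \{a_{i(i+1)} \mid p+1 \le i \le n-1\}.$$


Then $|C| = \lfloor n/2 \rfloor \lceil n/2 \rceil + n - 2 = |E| > n^2/4 + n - 3$. It follows easily from the rates and bounds that each arc in $C$ has to be used to eliminate arbitrage. And $\lfloor n/2 \rfloor \lceil n/2 \rceil - 1$ transactions corresponding to $\{a_{ij} \in E \mid 1 \le i \le p,\ p+1 \le j \le n\} \setminus \{a_{1(p+1)}\}$ are needed to bring the system back to non-arbitrage states.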


Applications 


The present results show that different foreign exchange systems exhibit quite different computational complexities. They may shed new light on how monetary system models are adopted and evolved in reality. In addition, they provide a computational complexity point of view for understanding the now fast-growing Internet electronic exchange markets.

Open Problems


The dynamic models involving both spot markets (in which goods are sold for cash and delivered immediately) and futures markets are the most interesting ones. To develop good approximation algorithms for such general models would be important. In addition, it is also important to identify special market models for which polynomial time algorithms are possible even with futures markets. Another interesting paradox in this line of study is why friction constraints that make arbitrage difficult are not always eliminated in reality.

Problem Definition


Often it is desirable to encode a sequence of 
data efficiently to minimize the number of bits 
required to transmit or store the sequence. The 
sequence may be a file or message consisting of 
symbols (or letters or characters) taken from a 
fixed input alphabet, but more generally the se- 
quence can be thought of as consisting of events, 
each taken from its own input set. Statistical data 
compression is concerned with encoding the data 
in a way that makes use of probability estimates 
of the events. Lossless compression has the prop- 
erty that the input sequence can be reconstructed 
exactly from the encoded sequence. Arithmetic coding is a nearly optimal statistical coding technique that can produce a lossless encoding.

Problem (statistical data compression) INPUT: A sequence of $m$ events $a_1, a_2, \ldots, a_m$. The $i$th event $a_i$ is taken from a set of $n$ distinct possible events $e_{i,1}, e_{i,2}, \ldots, e_{i,n}$, with an accurate assessment of the probability distribution $P_i$ of the events. The distributions $P_i$ need not be the same for each event $a_i$.

OUTPUT: A succinct encoding of the events that can be decoded to recover exactly the original sequence of events.

The goal is to achieve optimal or near-optimal encoding length. Shannon [10] proved that the smallest possible expected number of bits needed to encode the $i$th event is the entropy of $P_i$, denoted by

$$H(P_i) = \sum_{k=1}^{n} -p_{i,k} \log_2 p_{i,k},$$

where $p_{i,k}$ is the probability that $e_{i,k}$ occurs as the $i$th event. An optimal code outputs $-\log_2 p$ bits to encode an event whose probability of occurrence is $p$.

The well-known Huffman codes [6] are optimal only among prefix (or instantaneous) codes, that is, those in which the encoding of one event can be decoded before encoding has begun for the next event. Hu-Tucker codes are prefix codes similar to Huffman codes and are derived using a similar algorithm, with the added constraint that coded messages preserve the ordering of original messages.

When an instantaneous code is not needed, as is often the case, arithmetic coding provides a number of benefits, primarily by relaxing the constraint that the code lengths must be integers: (1) The code length is optimal ($-\log_2 p$ bits for an event with probability $p$), even when probabilities are not integer powers of 1/2. (2) There is no loss of coding efficiency even for events with probability close to 1. (3) It is trivial to handle probability distributions that change from event to event. (4) The input message to output message ordering correspondence of Hu-Tucker coding can be obtained with minimal extra effort.

Arithmetic Coding for Data Compression, Table 1 Comparison of codes for Huffman coding, Hu-Tucker coding, and arithmetic coding for a sample 5-symbol alphabet

Symbol  Prob.  -log2 p_k   Huffman          Hu-Tucker        Arithmetic
k       p_k                Code    Length   Code    Length   Length
a       0.04   4.644       1111    4        000     3        4.644
b       0.18   2.474       110     3        001     3        2.474
c       0.43   1.218       0       1        01      2        1.218
d       0.15   2.737       1110    4        10      2        2.737
e       0.20   2.322       10      2        11      2        2.322

As an example, consider a 5-symbol input 
alphabet. Symbol probabilities, codes, and code 
lengths are given in Table 1. 

The average code length is 2.13 bits per in- 
put symbol for the Huffman code, 2.22 bits per 
symbol for the Hu-Tucker code, and 2.03 bits per 
symbol for arithmetic coding. 
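
The averages quoted above can be recomputed from the probabilities and codewords of Table 1. The following is a small sketch; the names `TABLE`, `entropy`, and `average_length` are introduced here for illustration only.

```python
from math import log2

# symbol: (probability, Huffman codeword, Hu-Tucker codeword), as in Table 1
TABLE = {
    "a": (0.04, "1111", "000"),
    "b": (0.18, "110", "001"),
    "c": (0.43, "0", "01"),
    "d": (0.15, "1110", "10"),
    "e": (0.20, "10", "11"),
}

def entropy(probabilities):
    """Shannon entropy in bits: sum of -p * log2(p)."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

def average_length(table, column):
    """Expected codeword length; column 1 is Huffman, column 2 is Hu-Tucker."""
    return sum(row[0] * len(row[column]) for row in table.values())

huffman_avg = average_length(TABLE, 1)
hu_tucker_avg = average_length(TABLE, 2)
arithmetic_avg = entropy(row[0] for row in TABLE.values())
```

The arithmetic coding figure is simply the entropy of the distribution, since each symbol costs $-\log_2 p$ bits; the two prefix codes pay for rounding each length up to an integer.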


Key Results 


In theory, arithmetic codes assign one “code- 
word” to each possible input sequence. The code- 
words consist of half-open subintervals of the 
half-open unit interval [0,1) and are expressed by 
specifying enough bits to distinguish the subinter- 
val corresponding to the actual sequence from all 
other possible subintervals. Shorter codes corre- 
spond to larger subintervals and thus more prob- 
able input sequences. In practice, the subinterval 
is refined incrementally using the probabilities of 
the individual events, with bits being output as 
soon as they are known. Arithmetic codes almost 
always give better compression than prefix codes, 
but they lack the direct correspondence between the events in the input sequence and bits or groups of bits in the coded output file.

The algorithm for encoding a file using arith- 
metic coding works conceptually as follows: 


1. The “current interval” $[L, H)$ is initialized to $[0, 1)$.

2. For each event in the file, two steps are performed:

   (a) Subdivide the current interval into subintervals, one for each possible event. The size of an event’s subinterval is proportional to the estimated probability that the event will be the next event in the file, according to the model of the input.

   (b) Select the subinterval corresponding to the event that actually occurs next and make it the new current interval.

3. Output enough bits to distinguish the final current interval from all other possible final intervals.


The length of the final subinterval is clearly equal to the product of the probabilities of the individual events, which is the probability $p$ of the particular overall sequence of events. It can be shown that $\lfloor -\log_2 p \rfloor + 2$ bits are enough to distinguish the file from all other possible files.
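
The conceptual algorithm above can be sketched with exact rational arithmetic. This is an illustration rather than a practical coder: it assumes a static model (the Table 1 probabilities), passes the message length to the decoder instead of coding an end-of-file event, and all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

# Static model: Table 1 probabilities, kept exact to avoid rounding issues.
PROBS = {"a": Fraction(4, 100), "b": Fraction(18, 100), "c": Fraction(43, 100),
         "d": Fraction(15, 100), "e": Fraction(20, 100)}

def subintervals(low, high, probs):
    """Step 2(a): yield (symbol, sub_low, sub_high) for each possible event."""
    width, cum = high - low, Fraction(0)
    for symbol, p in probs.items():
        yield symbol, low + cum * width, low + (cum + p) * width
        cum += p

def final_interval(message, probs):
    low, high = Fraction(0), Fraction(1)                  # step 1
    for event in message:                                 # step 2(b)
        low, high = next((lo, hi) for s, lo, hi in
                         subintervals(low, high, probs) if s == event)
    return low, high

def encode(message, probs):
    low, high = final_interval(message, probs)
    p = high - low                                        # probability of the message
    k = 0
    while p.numerator * 2 ** (k + 1) <= p.denominator:
        k += 1                                            # k = floor(-log2 p)
    nbits = k + 2
    j = ceil(low * 2 ** nbits)    # [j, j+1) / 2^nbits lies inside [low, high)
    return format(j, f"0{nbits}b")                        # step 3

def decode(bits, length, probs):
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, high, out = Fraction(0), Fraction(1), []
    for _ in range(length):
        for symbol, lo, hi in subintervals(low, high, probs):
            if lo <= value < hi:
                out.append(symbol)
                low, high = lo, hi
                break
    return "".join(out)
```

For example, `decode(encode("cabbed", PROBS), 6, PROBS)` recovers the message. The exact fractions grow with the message length, which is the high-precision problem that practical implementations must avoid.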

For finite-length files, it is necessary to in- 
dicate the end of the file. In arithmetic coding, 
this can be done easily by introducing a special 
low-probability event that can be injected into 
the input stream at any point. This adds only 
O(log m) bits to the encoded length of an m- 
symbol file. 

In step 2, one needs to compute only the subinterval corresponding to the event $a_i$ that actually occurs. To do this, it is convenient to use two “cumulative” probabilities: the cumulative probability $P_C = \sum_{k=1}^{i-1} p_k$ and the next-cumulative probability $P_N = P_C + p_i = \sum_{k=1}^{i} p_k$. The new subinterval is $[L + P_C(H - L),\ L + P_N(H - L))$. The need to maintain and supply cumulative probabilities requires the model to have a sophisticated data structure, such as that of Moffat [7], especially when many more than two events are possible.
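
As an illustration of what such a structure must support, the sketch below maintains cumulative counts with a binary indexed (Fenwick) tree. This is a swapped-in, simpler technique and not the structure of Moffat [7]; the class and method names are hypothetical. Both updating a count and reading a cumulative count take $O(\log n)$ time.

```python
class CumulativeCounts:
    """Adaptive symbol counts with logarithmic-time cumulative queries."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)     # 1-indexed Fenwick tree

    def add(self, symbol, delta=1):
        """Increase the count of `symbol` (0-based) by `delta`."""
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, symbol):
        """Total count of all symbols strictly below `symbol`."""
        total, i = 0, symbol
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def interval(self, symbol):
        """Numerators of P_C and P_N for `symbol`, plus the common denominator."""
        return self.prefix(symbol), self.prefix(symbol + 1), self.prefix(self.n)
```

After the counts `[1, 1, 3, 0, 1]` have been accumulated, `interval(2)` returns `(2, 5, 6)`, that is, $P_C = 2/6$ and $P_N = 5/6$.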


Modeling 

The goal of modeling for statistical data 
compression is to provide probability infor- 
mation to the coder. The modeling process 
consists of structural and probability estimation 
components; each may be adaptive (starting 
from a neutral model, gradually build up the 
structure and probabilities based on the events 
encountered), semi-adaptive (specify an initial 
model that describes the events to be encountered 
in the data and then modify the model during 
coding so that it describes only the events yet to 
be coded), or static (specify an initial model and 
use it without modification during coding). 

In addition there are two strategies for prob- 
ability estimation. The first is to estimate each 
event’s probability individually based on its fre- 
quency within the input sequence. The second is 
to estimate the probabilities collectively, assum- 
ing a probability distribution of a particular form 
and estimating the parameters of the distribution, 
either directly or indirectly. For direct estimation, 
the data can yield an estimate of the parameter 
(the variance, for instance). For indirect esti- 
mation [4], one can start with a small number 
of possible distributions and compute the code 
length that would be obtained with each; the one 
with the smallest code length is selected. This 
method is very general and can be used even 
for distributions from different families, without 
common parameters. 

Arithmetic coding is often applied to text com- 
pression. The events are the symbols in the text 
file, and the model consists of the probabilities 
of the symbols considered in some context. The 
simplest model uses the overall frequencies of 
the symbols in the file as the probabilities; this 
is a zero-order Markov model, and its entropy is denoted $H_0$. The probabilities can be estimated adaptively starting with counts of 1 for all symbols and incrementing after each symbol is coded, or the symbol counts can be coded before coding the file itself and either modified during coding (a decrementing semi-adaptive code) or left unchanged (a static code). In all cases, the code length is independent of the order of the symbols in the file.


Theorem 1 For all input files, the code length $L_A$ of an adaptive code with initial 1-weights is the same as the code length $L_{SD}$ of the semi-adaptive decrementing code plus the code length $L_M$ of the input model encoded assuming that all symbol distributions are equally likely. This code length is less than $L_S = mH_0 + L_M$, the code length of a static code with the same input model. In other words, $L_A = L_{SD} + L_M < mH_0 + L_M = L_S$.
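
The identity in Theorem 1 can be checked numerically on a small input. The sketch below rests on one reading of "all symbol distributions are equally likely": the model cost $L_M$ is taken to be $\log_2$ of the number of ways to split $m$ occurrences among $n$ symbols, $\binom{m+n-1}{n-1}$. That reading, the function name `code_lengths`, and the sample text are assumptions made for illustration; ideal (fractional) code lengths are used throughout.

```python
from collections import Counter
from math import comb, log2

def code_lengths(text, alphabet):
    """Return (L_A, L_SD, L_M, L_S) in bits for a zero-order model."""
    m, n = len(text), len(alphabet)
    counts = Counter(text)

    # Adaptive code: every count starts at 1 and is incremented after coding.
    seen, total, l_adaptive = {s: 1 for s in alphabet}, n, 0.0
    for s in text:
        l_adaptive -= log2(seen[s] / total)
        seen[s] += 1
        total += 1

    # Semi-adaptive decrementing code: exact counts, decremented after coding.
    remaining, left, l_semi = dict(counts), m, 0.0
    for s in text:
        l_semi -= log2(remaining[s] / left)
        remaining[s] -= 1
        left -= 1

    # Model cost under the uniform assumption, and the static code length.
    l_model = log2(comb(m + n - 1, n - 1))
    l_static = l_model + sum(-c * log2(c / m) for c in counts.values())
    return l_adaptive, l_semi, l_model, l_static

L_A, L_SD, L_M, L_S = code_lengths("abracadabra", "abcdr")
```

For this sample, `L_A` agrees with `L_SD + L_M` up to floating-point error and is smaller than `L_S`, as the theorem states.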


It is possible to obtain considerably better 
text compression by using higher-order Markov 
models. Cleary and Witten [2] were the first to 
do this with their PPM method. PPM requires 
adaptive modeling and coding of probabilities close to 1 and makes heavy use of arithmetic coding.


Implementation Issues 


Incremental Output 
The basic implementation of arithmetic coding 
described above has two major difficulties: the 
shrinking current interval requires the use of 
high-precision arithmetic, and no output is pro- 
duced until the entire file has been read. The 
most straightforward solution to both of these 
problems is to output each leading bit as soon 
as it is known and then to double the length 
of the current interval so that it reflects only 
the unknown part of the final interval. Witten, Neal, and Cleary [11] add a clever mechanism for preventing the current interval from shrinking too much when the endpoints are close to 1/2 but straddle 1/2. In that case, one does not yet know the next output bit, but whatever it is, the following bit will have the opposite value; one can merely keep track of that fact and expand the current interval symmetrically about 1/2. This follow-on procedure may be repeated any number of times, so the current interval size is always strictly longer than 1/4.

Before [11], other mechanisms for incremental transmission and fixed precision arithmetic were developed through the years by a number of researchers beginning with Pasco [8]. The bit-stuffing idea of Langdon and others at IBM [9] that limits the propagation of carries in the additions serves a function similar to that of the follow-on procedure described above.


Use of Integer Arithmetic 

In practice, the arithmetic can be done by storing 
the endpoints of the current interval as suffi- 
ciently large integers rather than in floating point 
or exact rational numbers. Instead of starting 
with the real interval [0,1), start with the integer 
interval [0,N), N invariably being a power of 2. 
The subdivision process involves selecting non- 
overlapping integer intervals (of length at least 
1) with lengths approximately proportional to the 
counts. 
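
Incremental output, the follow-on procedure, and integer endpoints can be combined in a compact coder. The sketch below follows the general style of Witten, Neal, and Cleary [11] but is not their code: it assumes a static table of positive symbol counts whose total is at most `QUARTER`, symbols numbered from 0, and a decoder that is told the message length instead of reading an end-of-file event. All names are introduced here for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

BITS = 32
FULL, HALF, QUARTER = 1 << BITS, 1 << (BITS - 1), 1 << (BITS - 2)

def _narrow(low, high, sym, cum):
    """Integer subinterval [low, high] (inclusive) selected by `sym`."""
    span, total = high - low + 1, cum[-1]
    return (low + span * cum[sym] // total,
            low + span * cum[sym + 1] // total - 1)

def encode(symbols, counts):
    cum = [0, *accumulate(counts)]
    low, high, pending, out = 0, FULL - 1, 0, []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)   # deferred follow-on bits
        pending = 0

    for sym in symbols:
        low, high = _narrow(low, high, sym, cum)
        while True:
            if high < HALF:                                # leading bit is 0
                emit(0)
            elif low >= HALF:                              # leading bit is 1
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:    # straddles 1/2
                pending += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1              # double the interval
    pending += 1                                           # flush: pick a quarter
    emit(0 if low < QUARTER else 1)
    return out

def decode(bits, length, counts):
    cum = [0, *accumulate(counts)]
    stream = iter(bits)
    low, high, value, out = 0, FULL - 1, 0, []
    for _ in range(BITS):
        value = 2 * value + next(stream, 0)                # pad with zeros
    for _ in range(length):
        span = high - low + 1
        target = ((value - low + 1) * cum[-1] - 1) // span
        sym = bisect_right(cum, target) - 1
        out.append(sym)
        low, high = _narrow(low, high, sym, cum)
        while True:
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high, value = 2 * low, 2 * high + 1, 2 * value + next(stream, 0)
    return out
```

A usage example: with `counts = [5, 2, 1, 1, 2]` and a message given as a list of symbol indices, `decode(encode(msg, counts), len(msg), counts)` returns `msg`. Python integers do not overflow, so the 32-bit width here only bounds the state; a fixed-width implementation would need to check the products in `_narrow`.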


Limited-Precision Arithmetic Coding 

Arithmetic coding as it is usually implemented is 
slow because of the multiplications (and in some 
implementations, divisions) required in subdivid- 
ing the current interval according to the prob- 
ability information. Since small errors in prob- 
ability estimates cause very small increases in 
code length, introducing approximations into the 
arithmetic coding process in a controlled way 
can improve coding speed without significantly 
degrading compression performance. In the Q- 
Coder work at IBM [9], the time-consuming mul- 
tiplications are replaced by additions and shifts, 
and low-order bits are ignored. 

Howard and Vitter [3] describe a different 
approach to approximate arithmetic coding. The 
fractional bits characteristic of arithmetic coding 
are stored as state information in the coder. The 
idea, called quasi-arithmetic coding, is to reduce 
the number of possible states and replace arith- 
metic operations by table lookups; the lookup 
tables can be precomputed. 

The number of possible states (after applying the interval expansion procedure) of an arithmetic coder using the integer interval $[0, N)$ is $3N^2/16$. The obvious way to reduce the number of states in order to make lookup tables practicable is to reduce $N$. Binary quasi-arithmetic coding causes an insignificant increase in the code length compared with pure arithmetic coding.

Theorem 2 In a quasi-arithmetic coder based on full interval $[0, N)$, using correct probability estimates, and excluding very large and very small probabilities, the number of bits per input event by which the average code length obtained by the quasi-arithmetic coder exceeds that of an exact arithmetic coder is at most

$$\frac{4}{\ln 2} \left( \log_2 \frac{2}{e \ln 2} \right) \frac{1}{N} + O\left( \frac{1}{N^2} \right),$$

and the fraction by which the average code length obtained by the quasi-arithmetic coder exceeds that of an exact arithmetic coder is at most

$$\left( \log_2 \frac{2}{e \ln 2} \right) \frac{1}{\log_2 N} + O\left( \frac{1}{(\log N)^2} \right) \approx \frac{0.0861}{\log_2 N} + O\left( \frac{1}{(\log N)^2} \right).$$


General-purpose algorithms for parallel encod- 
ing and decoding using both Huffman and quasi- 
arithmetic coding are given in [5]. 


Applications 


Arithmetic coding can be used in most applica- 
tions of data compression. Its main usefulness is 
in obtaining maximum compression in conjunc- 
tion with an adaptive model or when the probabil- 
ity of one event is close to 1. Arithmetic coding 
has been used heavily in text compression. It has 
also been used in image compression in the JPEG 
international standards for image compression 
and is an essential part of the JBIG international 
standards for bilevel image compression. Many 
fast implementations of arithmetic coding, espe- 
cially for a two-symbol alphabet, are covered by 
patents; considerable effort has been expended in 
adjusting the basic algorithm to avoid infringing 
those patents.

Open Problems


The technical problems with arithmetic coding it- 
self have been completely solved. The remaining 
unresolved issues are concerned with modeling, 
in which the issue is how to decompose an input 
data set into a sequence of events, so that the set 
of events possible at each point in the data set can 
be described by a probability distribution suitable 
for input into the coder. The modeling issues are 
entirely application-specific. 


Experimental Results 


Some experimental results for the Calgary and 
Canterbury corpora are summarized in a report 
by Arnold and Bell [1]. 


Data Sets 


Among the most widely used data sets suitable for research in arithmetic coding are the Calgary Corpus and Canterbury Corpus (corpus.canterbury.ac.nz) and the Pizza&Chili Corpus (pizzachili.dcc.uchile.cl or http://pizzachili.di.unipi.it).


URL to Code 


A number of implementations of arithmetic coding are available on The Data Compression Resource on the Internet, www.data-compression.info/Algorithms/AC/.


Cross-References


Burrows-Wheeler Transform
Huffman Coding


Recommended Reading 


1. Arnold R, Bell T (1997) A corpus for the evaluation of lossless compression algorithms. In: Proceedings of the IEEE data compression conference, Snowbird, Mar 1997, pp 201-210

2. Cleary JG, Witten IH (1984) Data compression using adaptive coding and partial string matching. IEEE Trans Commun COM-32:396-402

3. Howard PG, Vitter JS (1992) Practical implementations of arithmetic coding. In: Storer JA (ed) Images and text compression. Kluwer Academic, Norwell

4. Howard PG, Vitter JS (1993) Fast and efficient lossless image compression. In: Proceedings of the IEEE data compression conference, Snowbird, Mar 1993, pp 351-360

5. Howard PG, Vitter JS (1996) Parallel lossless image compression using Huffman and arithmetic coding. Inf Process Lett 59:65-73

6. Huffman DA (1952) A method for the construction of minimum redundancy codes. Proc Inst Radio Eng 40:1098-1101

7. Moffat A (1999) An improved data structure for cumulative probability tables. Softw Pract Exp 29:647-659

8. Pasco R (1976) Source coding algorithms for fast data compression. Ph.D. thesis, Stanford University

9. Pennebaker WB, Mitchell JL, Langdon GG, Arps RB (1988) An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM J Res Dev 32:717-726

10. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:398-403

11. Witten IH, Neal RM, Cleary JG (1987) Arithmetic coding for data compression. Commun ACM 30:520-540


Assignment Problem 

Samir Khuller 

Computer Science Department, University of 
Maryland, College Park, MD, USA 
Keywords 


Weighted bipartite matching 


Years and Authors of Summarized 
Original Work 


1955; Kuhn 

1957; Munkres 

Problem Definition 

Assume that a complete bipartite graph, $G(X, Y, X \times Y)$, with weights $w(x, y)$ assigned to every edge $(x, y)$ is given. A matching $M$ is a subset of edges so that no two edges in $M$ have a common vertex. A perfect matching is one in which all the nodes are matched. Assume that $|X| = |Y| = n$. The weighted matching problem is to find a matching with the greatest total weight, where $w(M) = \sum_{e \in M} w(e)$. Since $G$ is a complete bipartite graph, it has a perfect matching. An algorithm that solves the weighted matching problem is due to Kuhn [4] and Munkres [6]. Assume that all edge weights are non-negative.


Key Results 


Define a feasible vertex labeling $\ell$ as a mapping from the set of vertices in $G$ to the reals, where

$$\ell(x) + \ell(y) \ge w(x, y).$$

Call $\ell(x)$ the label of vertex $x$. It is easy to compute a feasible vertex labeling as follows:

$$\forall y \in Y,\ \ell(y) = 0$$

and

$$\forall x \in X,\ \ell(x) = \max_{y \in Y} w(x, y).$$

Define the equality subgraph, $G_\ell$, to be the spanning subgraph of $G$, which includes all vertices of $G$ but only those edges $(x, y)$ that have weights such that

$$w(x, y) = \ell(x) + \ell(y).$$
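
A small sketch of these two definitions follows. It assumes the weights are given as a square list of lists `w[x][y]`; the function names and the sample matrix are illustrative only.

```python
def initial_labeling(w):
    """Feasible labeling: zero on Y, row maximum on X."""
    lx = [max(row) for row in w]
    ly = [0] * len(w[0])
    return lx, ly

def is_feasible(w, lx, ly):
    """Check that label sums dominate the weight on every edge."""
    return all(lx[x] + ly[y] >= wt
               for x, row in enumerate(w) for y, wt in enumerate(row))

def equality_subgraph(w, lx, ly):
    """Edges (x, y) whose weight equals the sum of the endpoint labels."""
    return [(x, y) for x, row in enumerate(w) for y, wt in enumerate(row)
            if lx[x] + ly[y] == wt]

W = [[3, 1], [2, 2]]
LX, LY = initial_labeling(W)
```

For this matrix the initial equality subgraph contains the edges `(0, 0)`, `(1, 0)`, and `(1, 1)`; it already has a perfect matching, `{(0, 0), (1, 1)}`, of weight 5, which equals the sum of the labels.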


The connection between equality subgraphs and 
maximum-weighted matchings is provided by the 
following theorem: 


Theorem 1 If the equality subgraph, $G_\ell$, has a perfect matching, $M^*$, then $M^*$ is a maximum-weighted matching in $G$.


In fact, note that the sum of the labels is an upper bound on the weight of the maximum-weighted perfect matching. The algorithm eventually finds a matching and a feasible labeling such that the weight of the matching is equal to the sum of all the labels.


High-Level Description 

The above theorem is the basis of an algorithm 
for finding a maximum-weighted matching in a 
complete bipartite graph. Starting with a feasible 
labeling, compute the equality subgraph, and then 
find a maximum matching in this subgraph (here, 
one can ignore weights on edges). If the matching 
found is perfect, the process is done. If it is not 
perfect, more edges are added to the equality 
subgraph by revising the vertex labels. After 
adding edges to the equality subgraph, either the 
size of the matching goes up (an augmenting path 
is found) or the Hungarian tree continues to grow. 
(This is the structure of explored edges when one 
starts BFS simultaneously from all free nodes 
in S. When one reaches a matched node in T, 
one only explores the matched edge; however, all 
edges incident to nodes in S are explored.) In 
the former case, the phase terminates, and a new 
phase starts (since the matching size has gone 
up). In the latter case, the Hungarian tree grows by adding new nodes to it, and clearly, this cannot happen more than $n$ times.

Let S be the set of free nodes in X. Grow 
Hungarian trees from each node in S. Let T be 
the nodes in Y encountered in the search for an 
augmenting path from nodes in S. Add all nodes 
from X that are encountered in the search to S. 

Note the following about this algorithm: 


$$\bar{S} = X \setminus S$$
$$\bar{T} = Y \setminus T$$
$$|S| > |T|$$


There are no edges from $S$ to $\bar{T}$ in $G_\ell$ since this would imply that one did not grow the Hungarian trees completely. As the Hungarian trees are grown in $G_\ell$, alternate nodes in the search are placed into $S$ and $T$. To revise the labels, take the labels in $S$, and start decreasing them uniformly (say, by $\Delta$), and at the same time, increase the labels in $T$ by $\Delta$. This ensures that the edges from $S$ to $T$ do not leave the equality subgraph (Fig. 1).

Assignment Problem, Fig. 1 Sets S and T as maintained by the algorithm. Only edges in $G_\ell$ are shown


As the labels in $S$ are decreased, edges (in $G$) from $S$ to $\bar{T}$ will potentially enter the equality subgraph, $G_\ell$. As we increase $\Delta$, at some point in time, an edge enters the equality subgraph. This is when one stops and updates the Hungarian tree. If the node from $\bar{T}$ added to $T$ is matched to a node in $\bar{S}$, both these nodes are moved to $S$ and $T$, which yields a larger Hungarian tree. If the node from $\bar{T}$ is free, an augmenting path is found, and the phase is complete. One phase consists of those steps taken between increases in the size of the matching. There are at most $n$ phases, where $n$ is the number of vertices in $G$ (since in each phase the size of the matching increases by 1). Within each phase, the size of the Hungarian tree is increased at most $n$ times. It is clear that in $O(n^2)$ time, one can figure out which edge from $S$ to $\bar{T}$ is the first to enter the equality subgraph (one simply scans all the edges). This yields an $O(n^4)$ bound on the total running time. How to implement it in $O(n^3)$ time is now shown.

More Efficient Implementation
Define the slack of an edge as follows:

$$\mathrm{slack}(x, y) = \ell(x) + \ell(y) - w(x, y).$$

Then,

$$\Delta = \min_{x \in S,\ y \in \bar{T}} \mathrm{slack}(x, y).$$

Naively, the calculation of $\Delta$ requires $O(n^2)$ time. For every vertex $y \in \bar{T}$, keep track of the edge with the smallest slack, i.e.,

$$\mathrm{slack}[y] = \min_{x \in S} \mathrm{slack}(x, y).$$

The computation of slack[y] (for all $y \in \bar{T}$) requires $O(n^2)$ time at the start of a phase. As the phase progresses, it is easy to update all the slack values in $O(n)$ time since all of them change by the same amount (the labels of the vertices in $S$ are going down uniformly). Whenever a node $u$ is moved from $\bar{S}$ to $S$, one must recompute the slacks of the nodes in $\bar{T}$, requiring $O(n)$ time. But a node can be moved from $\bar{S}$ to $S$ at most $n$ times.

Thus, each phase can be implemented in $O(n^2)$ time. Since there are $n$ phases, this gives a running time of $O(n^3)$. For sparse graphs, there is a way to implement the algorithm in $O(n(m + n \log n))$ time using min cost flows [1], where $m$ is the number of edges.
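
The $O(n^3)$ scheme can be sketched as follows. This is one common way to organize the computation and not necessarily the exact variant described above: each phase grows a single tree from one free vertex of $X$ rather than from all free vertices at once, and a dummy column 0 is used to start the search. The array `slack` plays the role of slack[y], `in_T` marks the set $T$, and all names are illustrative. Weights are assumed to form a square matrix.

```python
def max_weight_assignment(w):
    """Return (assignment, total, lx, ly) for a square weight matrix w.

    assignment[x] is the column matched to row x.
    """
    n = len(w)
    inf = float("inf")
    lx = [0] + [max(row) for row in w]   # labels on X (1-indexed)
    ly = [0] * (n + 1)                   # labels on Y; index 0 is a dummy
    match_y = [0] * (n + 1)              # match_y[j] = row matched to column j
    way = [0] * (n + 1)                  # previous column on the alternating path

    for i in range(1, n + 1):            # one phase per free vertex of X
        match_y[0] = i
        j0 = 0
        slack = [inf] * (n + 1)
        in_T = [False] * (n + 1)
        while True:
            in_T[j0] = True
            i0 = match_y[j0]             # row newly added to S
            delta, j1 = inf, 0
            for j in range(1, n + 1):
                if not in_T[j]:
                    cur = lx[i0] + ly[j] - w[i0 - 1][j - 1]
                    if cur < slack[j]:
                        slack[j], way[j] = cur, j0
                    if slack[j] < delta:
                        delta, j1 = slack[j], j
            for j in range(n + 1):       # revise labels by delta
                if in_T[j]:
                    lx[match_y[j]] -= delta
                    ly[j] += delta
                else:
                    slack[j] -= delta
            j0 = j1
            if match_y[j0] == 0:         # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            match_y[j0] = match_y[j1]
            j0 = j1

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match_y[j] - 1] = j - 1
    total = sum(w[x][y] for x, y in enumerate(assignment))
    return assignment, total, lx[1:], ly[1:]
```

For the matrix `[[7, 5, 11], [5, 4, 1], [9, 3, 2]]` the function returns the assignment `[2, 1, 0]` of weight 24, and the final labels sum to 24 as well, in line with Theorem 1.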


Applications 


There are numerous applications of bipartite matching, for example, scheduling unit-length jobs with integer release times and deadlines even with time-dependent penalties.


Open Problems 


Obtaining a linear, or close to linear, time algorithm.

Problem Definition


Consider a distributed system consisting of a set 
of processes that communicate by sending and 
receiving messages. The network is a multiset of 
messages, where each message is addressed to 
some process. A process is a state machine that 
can take three kinds of steps.

• In a send step, a process places a message in the network.

• In a receive step, a process A either reads and removes from the network a message addressed to A, or it reads a distinguished null value, leaving the network unchanged. If a message addressed to A is placed in the network, and if A subsequently performs an infinite number of receive steps, then A will eventually receive that message.

• In a computation step, a process changes state without communicating with any other process.


Processes are asynchronous: there is no bound on 
their relative speeds. Processes can crash: they 
can simply halt and take no more steps. This 
article considers executions in which at most one 
process crashes. 

In the consensus problem, each process starts 
with a private input value, communicates with the 
others, and then halts with a decision value. These 
values must satisfy the following properties: 


• Agreement: all processes’ decision values must agree.

• Validity: every decision value must be some process’ input.

• Termination: every non-faulty process must decide in a finite number of steps.


Fischer, Lynch, and Paterson showed that there 
is no protocol that solves consensus in any asyn- 
chronous message-passing system where even 
a single process can fail. This result is one of 
the most influential results in Distributed Com- 
puting, laying the foundations for a number of 
subsequent research efforts. 


Terminology 

Without loss of generality, one can restrict attention to binary consensus, where the inputs are 0 or 1. A protocol state consists of the states of the processes and the multiset of messages in transit in the network. An initial state is a protocol state before any process has moved, and a final state is a protocol state after all processes have finished. The decision value of any final state is the value decided by all processes in that state.

Any terminating protocol’s set of possible 
states forms a tree, where each node represents 
a possible protocol state, and each edge 
represents a possible step by some process. 
Because the protocol must terminate, the tree is 
finite. Each leaf node represents a final protocol 
state with decision value either 0 or 1. 

A bivalent protocol state is one in which the eventual decision value is not yet fixed. From any bivalent state, there is an execution in which the eventual decision value is 0, and another in which it is 1. A univalent protocol state is one in which the outcome is fixed. Every execution starting from a univalent state decides the same value. A 1-valent protocol state is univalent with eventual decision value 1, and similarly for a 0-valent state.

A protocol state is critical if


• it is bivalent, and
• if any process takes a step, the protocol state becomes univalent.
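
These definitions can be made concrete on a toy execution tree. In the sketch below, a leaf is the integer decision value (0 or 1) and an internal state is a dictionary mapping the name of the process that moves next to the resulting state. The encoding, the function names, and the tree `ROOT` are hypothetical; the tree does not describe a real consensus protocol and only illustrates the terminology.

```python
def outcomes(state):
    """Set of decision values reachable from `state`."""
    if isinstance(state, int):          # leaf: a final state with its decision
        return {state}
    reachable = set()
    for child in state.values():
        reachable |= outcomes(child)
    return reachable

def is_bivalent(state):
    return outcomes(state) == {0, 1}

def is_univalent(state):
    return len(outcomes(state)) == 1

def is_critical(state):
    """Bivalent, and every single step leads to a univalent state."""
    return is_bivalent(state) and all(is_univalent(c) for c in state.values())

# Toy tree: a step by A at the root fixes the outcome to 0,
# while a step by B leaves the outcome open.
ROOT = {
    "A": {"A": 0, "B": 0},
    "B": {"A": {"A": 1, "B": 0}, "B": 1},
}
```

Here `ROOT` is bivalent but not critical, because the state reached by a step of B is still bivalent; following bivalent successors leads to `ROOT["B"]["A"]`, which is critical. Lemma 2 below applies the same descent to an arbitrary terminating protocol.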


Key Results 
Lemma 1 Every consensus protocol has a bivalent initial state.


Proof Assume, by way of contradiction, that there exists a consensus protocol for $n + 1$ processes $A_0, \ldots, A_n$ in which every initial state is univalent. Let $s_i$ be the initial state where processes $A_i, \ldots, A_n$ have input 0 and $A_0, \ldots, A_{i-1}$ have input 1. Clearly, $s_0$ is 0-valent: all processes have input 0, so all must decide 0 by the validity condition. If $s_i$ is 0-valent, so is $s_{i+1}$. These states differ only in the input to process $A_i$: 0 in $s_i$ and 1 in $s_{i+1}$. Any execution starting from $s_i$ in which $A_i$ halts before taking any steps is indistinguishable from an execution starting from $s_{i+1}$ in which $A_i$ halts before taking any steps. Since processes must decide 0 in the first execution, they must decide 0 in the second. Since there is one execution starting from $s_{i+1}$ that decides 0, and since $s_{i+1}$ is univalent by hypothesis, $s_{i+1}$ is 0-valent. It follows that the state $s_{n+1}$, in which all processes start with input 1, is 0-valent, a contradiction. □


Lemma 2 Every consensus protocol has a criti- 
cal state. 


Proof by contradiction. By Lemma 1, the proto- 
col has a bivalent initial state. Start the protocol 
in this state. Repeatedly choose a process whose 
next step leaves the protocol in a bivalent state, 
and let that process take a step. Either the protocol 
runs forever, violating the termination condition, 
or the protocol eventually enters a critical state. □


Theorem 1 There is no consensus protocol for 
an asynchronous message-passing system where 
a single process can crash. 


Proof Assume by way of contradiction that such 
a protocol exists. Run the protocol until it reaches 
a critical state s. There must be two processes A 
and B such that A’s next step carries the protocol 
to a 0-valent state, and B’s next step carries the 
protocol to a 1-valent state. 

Starting from s, let $s_A$ be the state reached if
A takes the first step, $s_B$ if B takes the first
step, $s_{AB}$ if A takes a step followed by B, and
so on. States $s_A$ and $s_{AB}$ are 0-valent, while
$s_B$ and $s_{BA}$ are 1-valent. The rest is a case
analysis.

Of all the possible pairs of steps A and B could
be about to execute, most of them commute: states
$s_{AB}$ and $s_{BA}$ are identical, which is a
contradiction because they have different valences.

The only pair of steps that do not commute
occurs when A is about to send a message to B
(or vice versa). Let $s_{AB}$ be the state resulting
if A sends a message to B and B then receives it,
and let $s_{BA}$ be the state resulting if B receives
a different message (or null) and then A sends
its message to B. Note that every process other
than B has the same local state in $s_{AB}$ and
$s_{BA}$. Consider an execution starting from $s_{AB}$
in which every process other than B takes steps in
round-robin order. Because $s_{AB}$ is 0-valent, they
will eventually decide 0. Next, consider an
execution starting from $s_{BA}$ in which every
process other than B takes steps in round-robin
order. Because $s_{BA}$ is 1-valent, they will
eventually decide 1. But all processes other than B
have the same local states at the end of each
execution, so they cannot decide different values,
a contradiction. □


In the proof of this theorem, and in the proofs 
of the preceding lemmas, we construct scenarios 
where at most a single process is delayed. As a re- 
sult, this impossibility result holds for any system 
where a single process can fail undetectably. 


Applications 


The consensus problem is a key tool for un- 
derstanding the power of various asynchronous 
models of computation. 


Open Problems 


There are many open problems concerning the 
solvability of consensus in other models, or with 
restrictions on inputs. 


Related Work 


The original paper by Fischer, Lynch, and Pater- 
son [8] is still a model of clarity. 

Many researchers have examined alternative
models of computation in which consensus can
be solved. Dolev, Dwork, and Stockmeyer [5]
examine a variety of alternative message-passing
models, identifying the precise assumptions
needed to make consensus possible. Dwork,
Lynch, and Stockmeyer [6] derive upper and
lower bounds for a semi-synchronous model
where there is an upper and lower bound on
message delivery time. Ben-Or [1] showed that
introducing randomization makes consensus
possible in an asynchronous message-passing
system. Chandra and Toueg [3] showed that
consensus becomes possible in the presence
of an oracle that can (unreliably) detect when
a process has crashed. Each of the papers cited
here has inspired many follow-up papers. A good
place to start is the excellent survey by Fich and
Ruppert [7].

A protocol is wait-free if it tolerates failures 
by all but one of the participants. A concur- 
rent object implementation is linearizable if each 
method call seems to take effect instantaneously 
at some point between the method’s invocation 
and response. Herlihy [9] showed that shared- 
memory objects can each be assigned a consensus 
number, which is the maximum number of pro- 
cesses for which there exists a wait-free consen- 
sus protocol using a combination of read-write 
memory and the objects in question. Consensus 
numbers induce an infinite hierarchy on objects, 
where (simplifying somewhat) higher objects are 
more powerful than lower objects. In a system of 
n or more concurrent processes, it is impossible 
to construct a lock-free implementation of an 
object with consensus number n from an object 
with a lower consensus number. On the other 
hand, any object with consensus number n is 
universal in a system of n or fewer processes: it 
can be used to construct a wait-free linearizable 
implementation of any object. 

In 1990, Chaudhuri [4] introduced the k-set
agreement problem (sometimes called k-set
consensus), which generalizes consensus by allowing
k or fewer distinct decision values to be chosen.
In particular, 1-set agreement is consensus. The
question whether k-set agreement can be solved
in asynchronous message-passing models was
open for several years, until three independent
groups [2, 10, 11] showed that no protocol exists.


Problem Definition


The problem is concerned with allowing a set 
of processes to concurrently broadcast messages 
while ensuring that all destinations consistently 
deliver them in the exact same sequence, in spite 
of the possible presence of a number of faulty 
processes. 

The work of Cristian, Aghili, Strong, and 
Dolev [7] considers the problem of atomic broad- 
cast in a system with approximately synchronized 
clocks and bounded transmission and processing 
delays. They present successive extensions of an 
algorithm to tolerate a bounded number of omis- 
sion, timing, or Byzantine failures, respectively. 


Related Work 

The work presented in this entry originally ap- 
peared as a widely distributed conference contri- 
bution [6], over a decade before being published 
in a journal [7], at which time the work was well- 
known in the research community. Since there 
was no significant change in the algorithms, the 
historical context considered here is hence with 
respect to the earlier version. 

Lamport [11] proposed one of the first published
algorithms to solve the problem of ordering
broadcast messages in a distributed system.
That algorithm, presented as the core of a mutual
exclusion algorithm, operates in a fully
asynchronous system (i.e., a system in which there
are no bounds on processor speed or communication
delays), but does not tolerate failures.
Although the algorithms presented here rely on
physical clocks rather than Lamport’s logical
clocks, the principle used for ordering messages
is essentially the same: messages carry a timestamp
of their sending time; messages are delivered
in increasing order of the timestamp, using
the sending processor name to order messages with
equal timestamps.

At roughly the same period as the initial
publication of the work of Cristian et al. [6],
Chang and Maxemchuk [3] proposed an atomic broadcast
protocol based on a token passing protocol, and
tolerant to crash failures of processors. Also,
Carr [1] proposed the Tandem global update
protocol, tolerant to crash failures of processors.

Cristian [5] later proposed an extension to
the omission-tolerant algorithm presented here,
under the assumption that the communication
system consists of f + 1 independent broadcast
channels (where f is the maximal number
of faulty processors). Compared with the more
general protocol presented here, his extension
generates considerably fewer messages.

Since the work of Cristian, Aghili, Strong, 
and Dolev [7], much has been published on the 
problem of atomic broadcast (and its numerous 
variants). For further reading, Défago, Schiper, 
and Urban [8] surveyed more than sixty different 
algorithms to solve the problem, classifying them 
into five different classes and twelve variants. 
That survey also reviews many alternative defi- 
nitions and references about two hundred articles 
related to this subject. This is still a very active 
research area, with many new results being pub- 
lished each year. 

Hadzilacos and Toueg [10] provide a system- 
atic classification of specifications for variants of 
atomic broadcast as well as other broadcast prob- 
lems, such as reliable broadcast, FIFO broadcast, 
or causal broadcast. 

Chandra and Toueg [2] proved the equivalence
between atomic broadcast and the consensus
problem. Thus, any application solved
by consensus can also be solved by atomic
broadcast and vice versa. Similarly, impossibility
results apply equally to both problems. For
instance, it is well-known that consensus, and thus
atomic broadcast, cannot be solved deterministically
in an asynchronous system in the presence
of even a single faulty process [9].


Notations and Assumptions 

The system G consists of n distributed processors
and m point-to-point communication links. A link
does not necessarily exist between every pair
of processors, but it is assumed that the
communication network remains connected even in
the face of faults (whether processors or links).
All processors have distinct names and there
exists a total order on them (e.g., lexicographic
order).

A component (link or processor) is said to
be correct if its behavior is consistent with
its specification, and faulty otherwise. The
paper considers three classes of component
failures, namely, omission, timing, and Byzantine
failures.


• An omission failure occurs when the faulty
component fails to provide the specified output
(e.g., loss of a message).

• A timing failure occurs when the faulty component
omits a specified output, or provides it
either too early or too late.

• A Byzantine failure [12] occurs when the component
does not behave according to its specification,
for instance, by providing output
different from the one specified. In particular,
the paper considers authentication-detectable
Byzantine failures, that is, ones that are
detectable using a message authentication protocol,
such as error correction codes or digital
signatures.


Each processor p has access to a local clock $C_p$
with the properties that (1) two separate clock
readings yield different values, and (2) clocks are
$\varepsilon$-synchronized, meaning that, at any real
time t, the deviation in readings of the clocks of
any two processors p and q is at most $\varepsilon$.

In addition, transmission and processing delays,
as measured on the clock of a correct processor,
are bounded by a known constant $\delta$. This
bound accounts not only for delays in transmission
and processing, but also for delays due to
scheduling, overload, clock drift or adjustments.
This is called a synchronous system model.

The diffusion time $d\delta$ is the time necessary
to propagate information to all correct processes,
in a surviving network of diameter d in the
presence of at most $\pi$ processor failures and
$\lambda$ link failures.


Problem Definition 

The problem of atomic broadcast is defined in 
a synchronous system model as a broadcast prim- 
itive which satisfies the following three proper- 
ties: atomicity, order, and termination.

Problem 1 (Atomic broadcast)

INPUT: A stream of messages broadcast by
n concurrent processors, some of which may be
faulty.

OUTPUT: The messages delivered in sequence,
with the following properties: 


1. Atomicity: if any correct processor delivers 
an update at time U on its clock, then that 
update was initiated by some processor and is 
delivered by each correct processor at time U 
on its clock. 

2. Order: all updates delivered by correct proces- 
sors are delivered in the same order by each 
correct processor. 

3. Termination: every update whose broadcast is
initiated by a correct processor at time T on its
clock is delivered at all correct processors at
time $T + \Delta$ on their clock.


Nowadays, problem definitions for atomic 
broadcast that do not explicitly refer to physical 
time are often preferred. Many variants of time- 
free definitions are reviewed by Hadzilacos and 
Toueg [10] and Défago et al. [8]. One such 
alternate definition is presented below, with 
the terminology adapted to the context of this 
entry. 


Problem 2 (Total order broadcast) 

INPUT: A stream of messages broadcast by n con- 
current processors, some of which may be faulty. 
OUTPUT: The messages delivered in sequence, 
with the following properties: 


1. Validity: if a correct processor broadcasts 
a message m, then it eventually delivers m. 

2. Uniform agreement: if a processor delivers 
a message m, then all correct processors even- 
tually deliver m. 

3. Uniform integrity: for any message m, every 
processor delivers m at most once, and only 
if m was previously broadcast by its sending 
processor. 

4. Gap-free uniform total order: if some pro- 
cessor delivers message m’ after message m, 
then a processor delivers m’ only after it has 
delivered m.


Key Results


The paper presents three algorithms for solving 
the problem of atomic broadcast, each under an 
increasingly demanding failure model, namely, 
omission, timing, and Byzantine failures. Each 
protocol is actually an extension of the previous 
one. 

All three protocols are based on a clas- 
sical flooding, or information diffusion, 
algorithm [14]. Every message carries its
initiation timestamp T, the name of the initiating
processor s, and an update $\sigma$. A message is then
uniquely identified by (s, T). Then, the basic
protocol is simple. Each processor logs every 
message it receives until it is delivered. When it 
receives a message that was never seen before, 
it forwards that message to all other neighbor 
processors. 
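
The diffusion step described above can be sketched as follows. This is only an illustration under simplifying assumptions that are not in the source: the network is a hypothetical adjacency dictionary, every hop takes one time unit, and no component fails, so the flood reduces to a breadth-first traversal. The function name `flood` is likewise an assumption.

```python
from collections import deque
from typing import Dict, List


def flood(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Return, for each processor reached, the hop count at which it first
    sees the message. A processor forwards a message to its neighbors only
    the first time it receives it; later copies are ignored."""
    first_seen = {source: 0}
    pending = deque([source])
    while pending:
        p = pending.popleft()
        for q in adjacency[p]:
            if q not in first_seen:          # never seen before: log and forward
                first_seen[q] = first_seen[p] + 1
                pending.append(q)
    return first_seen


# Hypothetical line network a - b - c.
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(flood(network, "a"))
```

The largest hop count returned is bounded by the network diameter d, which is why d appears in the termination times of the protocols.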


Atomic Broadcast for Omission Failures

The first atomic broadcast protocol, supporting
omission failures, considers a termination
time $\Delta_o$ as follows.

$$\Delta_o = \pi\delta + d\delta + \varepsilon. \tag{1}$$

The delivery deadline $T + \Delta_o$ is the time by
which a processor can be sure that it has received
copies of every message with timestamp T (or
earlier) that could have been received by some
correct process.

The protocol then works as follows. When
a processor initiates an atomic broadcast, it
propagates that message, similar to the diffusion
algorithm described above. The main exception is
that every message received after the local clock
exceeds the delivery deadline of that message
is discarded. Then, at local time $T + \Delta_o$, a
processor delivers all messages timestamped with T,
in order of the name of the sending processor.
Finally, it discards all copies of the messages
from its logs.
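
The logging and delivery rules of the omission-tolerant protocol can be sketched as below. This is a minimal illustration, not the authors' implementation: the class `OmissionTolerantLog`, its method names, and the polling style of `deliver_due` are assumptions, and forwarding to neighbors is left out. The termination time follows (1), that is, the sum of $\pi\delta$, $d\delta$, and $\varepsilon$.

```python
from typing import Dict, List, Tuple


def termination_time_omission(pi: int, d: int, delta: float, eps: float) -> float:
    """Termination time (1): pi*delta + d*delta + eps."""
    return pi * delta + d * delta + eps


class OmissionTolerantLog:
    """Hypothetical per-processor message log."""

    def __init__(self, big_delta: float) -> None:
        self.big_delta = big_delta
        self.log: Dict[Tuple[str, float], str] = {}   # (sender, T) -> update

    def receive(self, sender: str, timestamp: float, update: str,
                local_time: float) -> bool:
        """Log a message; return True if it is new and should be forwarded."""
        if local_time > timestamp + self.big_delta:
            return False                      # past its delivery deadline: discard
        if (sender, timestamp) in self.log:
            return False                      # duplicate copy
        self.log[(sender, timestamp)] = update
        return True

    def deliver_due(self, local_time: float) -> List[Tuple[float, str, str]]:
        """Deliver messages whose deadline has passed, ordered by timestamp
        and then by sender name, and remove them from the log."""
        due = sorted((t, s) for (s, t) in self.log
                     if t + self.big_delta <= local_time)
        return [(t, s, self.log.pop((s, t))) for (t, s) in due]


big_delta = termination_time_omission(pi=1, d=2, delta=10, eps=1)
node = OmissionTolerantLog(big_delta)
node.receive("q", 100, "u2", local_time=105)
node.receive("p", 100, "u1", local_time=110)
print(node.deliver_due(local_time=100 + big_delta))
```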


Atomic Broadcast for Timing Failures 

The second protocol extends the first one by
introducing a hop count (i.e., a counter incremented
each time a message is relayed) to the messages.
With this information, each relaying processor
can determine when a message is timely, that is,
if a message timestamped T with hop count h is
received at time U then the following condition
must hold.

$$T - h\varepsilon < U < T + h(\delta + \varepsilon). \tag{2}$$

Before relaying a message, each processor checks
the acceptance test above and discards the message
if it does not satisfy it. The termination time
$\Delta_t$ of the protocol for timing failures is as
follows.

$$\Delta_t = \pi(\delta + \varepsilon) + d\delta + \varepsilon. \tag{3}$$

The authors point out that discarding early
messages is not necessary for correctness, but
ensures that correct processors keep messages in
their log for a bounded amount of time.
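
The acceptance test (2) and the termination time (3) translate directly into code. The sketch below is illustrative only; the function names `is_timely` and `termination_time_timing` are assumptions, and all times are taken to be readings of the receiving processor's local clock.

```python
def is_timely(T: float, h: int, U: float, delta: float, eps: float) -> bool:
    """Acceptance test (2): a message timestamped T with hop count h,
    received at local time U, is timely iff
    T - h*eps < U < T + h*(delta + eps)."""
    return T - h * eps < U < T + h * (delta + eps)


def termination_time_timing(pi: int, d: int, delta: float, eps: float) -> float:
    """Termination time (3): pi*(delta + eps) + d*delta + eps."""
    return pi * (delta + eps) + d * delta + eps


# With delta = 10 and eps = 1, a message with T = 100 and h = 2 is accepted
# only if it is received strictly between local times 98 and 122.
print(is_timely(T=100, h=2, U=110, delta=10, eps=1))
print(termination_time_timing(pi=1, d=2, delta=10, eps=1))
```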


Atomic Broadcast for Byzantine Failures 
Given some text, every processor is assumed to
be able to generate a signature for it that cannot
be faked by other processors. Furthermore, every
processor knows the name of every other processor
in the network, and has the ability to verify
the authenticity of their signatures.

Under the above assumptions, the third protocol
extends the second one by adding signatures
to the messages. To prevent a Byzantine
processor (or link) from tampering with the hop
count, a message is co-signed by every processor
that relays it. For instance, a message signed by
k processors $p_1, \ldots, p_k$ is as follows.

$$(\text{relayed}, \ldots (\text{relayed}, (\text{first}, T, \sigma, p_1, s_1), p_2, s_2), \ldots, p_k, s_k)$$

where $\sigma$ is the update, T the timestamp, $p_1$ the
message source, and $s_i$ the signature generated
by processor $p_i$. Any message for which one of
the signatures cannot be authenticated is simply
discarded. Also, if several updates initiated by
the same processor p carry the same timestamp,
this indicates that p is faulty and the
corresponding updates are discarded. The remainder
of the protocol is the same as the second one, where
the number of hops is given by the number of
signatures. The termination time $\Delta_b$ is as
follows.

$$\Delta_b = \pi(\delta + \varepsilon) + d\delta + \varepsilon. \tag{4}$$

The authors insist however that, in this case, the
transmission time $\delta$ must be considerably larger
than in the previous case, since it must account
for the time spent in generating and verifying the
digital signatures, usually a costly operation.
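
The nested co-signing structure can be sketched as follows. This is an illustration under a strong simplifying assumption: HMAC tags computed with per-processor secret keys stand in for digital signatures, which means the verifier must know every key, something a real unforgeable signature scheme avoids. The key table `KEYS` and the function names `sign`, `first`, `relay`, and `verify` are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key table; HMAC is only a stand-in for real signatures.
KEYS = {"p1": b"key-1", "p2": b"key-2", "p3": b"key-3"}


def sign(name: str, body: tuple) -> str:
    return hmac.new(KEYS[name], repr(body).encode(), hashlib.sha256).hexdigest()


def first(T: int, update: str, name: str) -> tuple:
    """Message as created by its source: (first, T, update, p1, s1)."""
    body = ("first", T, update, name)
    return body + (sign(name, body),)


def relay(message: tuple, name: str) -> tuple:
    """Wrap and co-sign a message: (relayed, message, p, s)."""
    body = ("relayed", message, name)
    return body + (sign(name, body),)


def verify(message: tuple) -> Optional[int]:
    """Return the hop count (number of valid signatures), or None if any
    signature in the chain fails to authenticate."""
    *fields, signature = message
    body = tuple(fields)
    name = body[-1]
    if name not in KEYS or not hmac.compare_digest(signature, sign(name, body)):
        return None
    if body[0] == "first":
        return 1
    if body[0] == "relayed":
        inner = verify(body[1])
        return None if inner is None else inner + 1
    return None


message = relay(relay(first(5, "update", "p1"), "p2"), "p3")
print(verify(message))
```

Because each relaying processor signs the whole message it received, a relay cannot lower the hop count without invalidating an inner signature.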


Bounds 

In addition to the three protocols presented above 
and their correctness, Cristian et al. [7] prove the 
following two lower bounds on the termination 
time of atomic broadcast protocols. 


Theorem 1 If the communication network G
requires x steps, then any atomic broadcast
protocol tolerant of up to $\pi$ processor and
$\lambda$ link omission failures has a termination
time of at least $x\delta + \varepsilon$.


Theorem 2 Any atomic broadcast protocol for
a Hamiltonian network with n processors that
tolerates n − 2 authentication-detectable Byzantine
processor failures cannot have a termination time
smaller than $(n - 1)(\delta + \varepsilon)$.


Applications 


The main motivation for considering this problem 
is its use as the cornerstone for ensuring fault- 
tolerance through process replication. In particu- 
lar, the authors consider a synchronous replicated 
storage, which they define as a distributed and 
resilient storage system that displays the same 
content at every correct physical processor at any 
clock time. Using atomic broadcast to deliver 
updates ensures that all updates are applied at 
all correct processors in the same order. Thus, 
provided that the replicas are initially consis- 
tent, they will remain consistent. This technique, 
called state-machine replication [11, 13] or also
active replication, is widely used in practice as
a means of supporting fault-tolerance in
distributed systems.

In contrast, Cristian et al. [7] consider atomic 
broadcast in a synchronous system with bounded 
transmission and processing delays. Their work 
was motivated by the implementation of a highly- 
available replicated storage system, with tightly 
coupled processors running a real-time operating 
system. 

Atomic broadcast has been used as a support 
for the replication of running processes in real- 
time systems or, with the problem reformulated 
to isolate explicit timing requirements, has also 
been used as a support for fault-tolerance and 
replication in many group communication toolk- 
its (see survey of Chockler et al. [4]). 

In addition, atomic broadcast has been 
used for the replication of database systems, 
as a means to reduce the synchronization 
between the replicas. Wiesmann and Schiper [15] 
have compared different database replication 
and transaction processing approaches based 
on atomic broadcast, showing interesting 
performance gains.


Problem Definition


Given here is a basic formulation using the online 
mistake-bound model, which was used by Little- 
stone [9] in his seminal work. 

Fix a class C of Boolean functions over n
variables. To start a learning scenario, a target
function $f_* \in C$ is chosen but not revealed to the
learning algorithm. Learning then proceeds in a
sequence of trials. At trial t, an input
$x_t \in \{0, 1\}^n$ is first given to the learning
algorithm. The learning algorithm then produces its
prediction $\hat{y}_t$, which is its guess as to the
unknown value $f_*(x_t)$. The correct value
$y_t = f_*(x_t)$ is then revealed to the learner. If
$y_t \neq \hat{y}_t$, the learning algorithm made a
mistake. The learning algorithm learns C with
mistake-bound m, if the number of mistakes
never exceeds m, no matter how many trials are
made and how $f_*$ and $x_1, x_2, \ldots$ are chosen.

Variable (or attribute) $X_i$ is relevant
for function $f : \{0, 1\}^n \to \{0, 1\}$ if
$f(x_1, \ldots, x_i, \ldots, x_n) \neq f(x_1, \ldots, 1 - x_i, \ldots, x_n)$
holds for some $x \in \{0, 1\}^n$. Suppose
now that for some k < n, every function $f \in C$
has at most k relevant variables. It is said that
a learning algorithm learns class C
attribute-efficiently, if it learns C with a
mistake-bound polynomial in k and log n. Additionally,
the computation time for each trial is usually
required to be polynomial in n.
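
The definition of a relevant variable can be checked by brute force for small n. The sketch below is an illustration only; the function name `relevant_variables` is an assumption, variables are indexed from 0 rather than 1, and the enumeration takes time exponential in n.

```python
from itertools import product
from typing import Callable, Set, Tuple


def relevant_variables(f: Callable[[Tuple[int, ...]], int], n: int) -> Set[int]:
    """Indices i such that flipping x[i] changes f(x) for some input x."""
    relevant = set()
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                relevant.add(i)
    return relevant


# A disjunction of two of the four variables: only those two are relevant.
def disjunction(x: Tuple[int, ...]) -> int:
    return x[0] | x[2]


print(relevant_variables(disjunction, 4))
```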


Key Results 


The main part of current research of
attribute-efficient learning stems from Littlestone’s
Winnow algorithm [9]. The basic version of
Winnow maintains a weight vector
$w_t = (w_{t,1}, \ldots, w_{t,n}) \in \mathbb{R}^n$.
The prediction for input $x_t \in \{0, 1\}^n$ is given by

$$\hat{y}_t = \operatorname{sign}\Bigl(\sum_{i=1}^{n} w_{t,i} x_{t,i} - \theta\Bigr),$$

where $\theta$ is a parameter of the algorithm. Initially
$w_1 = (1, \ldots, 1)$, and after trial t, each
component $w_{t,i}$ is updated according to

$$w_{t+1,i} = \begin{cases}
\alpha w_{t,i} & \text{if } y_t = 1,\ \hat{y}_t = 0 \text{ and } x_{t,i} = 1 \\
w_{t,i}/\alpha & \text{if } y_t = 0,\ \hat{y}_t = 1 \text{ and } x_{t,i} = 1 \\
w_{t,i} & \text{otherwise}
\end{cases} \tag{1}$$

where $\alpha > 1$ is a learning rate parameter.

Littlestone’s basic result is that with a suitable
choice of $\theta$ and $\alpha$, Winnow learns the class
of monotone k-literal disjunctions with
mistake-bound O(k log n). Since the algorithm changes
its weights only when a mistake occurs, this
bound also guarantees that the weights remain
small enough for computation times to remain
polynomial in n. With simple transformations,
Winnow also yields attribute-efficient learning
algorithms for general disjunctions and
conjunctions. Various subclasses of DNF formulas and
decision lists [8] can be learned, too.
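
The prediction rule and the multiplicative update (1) can be sketched as below. This is a minimal illustration rather than Littlestone's exact formulation: the function names are assumptions, and the convention that the prediction is 1 exactly when the weighted sum is at least the threshold is an assumption about how ties are broken.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Predict 1 if the weighted sum reaches theta (assumed tie rule), else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def update(w: Sequence[float], x: Sequence[int], y: int, y_hat: int,
           alpha: float) -> List[float]:
    """Update rule (1): weights change only on a mistake, and only for
    components with x_i = 1."""
    if y == 1 and y_hat == 0:                       # promote active weights
        return [wi * alpha if xi == 1 else wi for wi, xi in zip(w, x)]
    if y == 0 and y_hat == 1:                       # demote active weights
        return [wi / alpha if xi == 1 else wi for wi, xi in zip(w, x)]
    return list(w)


def run_winnow(trials: Sequence[Tuple[Sequence[int], int]], n: int,
               theta: float, alpha: float) -> Tuple[int, List[float]]:
    """Run Winnow over (x, y) trials; return (number of mistakes, weights)."""
    w = [1.0] * n
    mistakes = 0
    for x, y in trials:
        y_hat = predict(w, x, theta)
        if y_hat != y:
            mistakes += 1
        w = update(w, x, y, y_hat, alpha)
    return mistakes, w


# Target: the single-literal disjunction x_1 over n = 3 variables.
trials = [((1, 0, 0), 1), ((1, 0, 0), 1), ((1, 0, 0), 1),
          ((0, 1, 1), 0), ((1, 1, 1), 1)]
print(run_winnow(trials, n=3, theta=3, alpha=2))
```

In this run the weight of the relevant variable doubles on each of the two mistakes until it alone exceeds the threshold, while the irrelevant weights never change.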

Winnow is quite robust against noise, i.e.,
errors in input data. This is extremely important
for practical applications. Remove now the
assumption about a target function $f_* \in C$
satisfying $y_t = f_*(x_t)$ for all t. Define attribute
error of a pair (x, y) with respect to a function
f as the minimum Hamming distance between
x and x′ such that f(x′) = y. The attribute
error of a sequence of trials with respect to f
is the sum of attribute errors of the individual
pairs $(x_t, y_t)$. Assuming the sequence of trials
has attribute error at most A with respect to some
k-literal disjunction, Auer and Warmuth [1] show
that Winnow makes O(A + k log n) mistakes. The
noisy scenario can also be analyzed in terms of
hinge loss [5].

The update rule (1) has served as a model 
for a whole family of multiplicative update al- 
gorithms. For example, Kivinen and Warmuth 
[7] introduce the exponentiated gradient algo- 
rithm, which is essentially Winnow modified for 
continuous-valued prediction, and show how it 
can be motivated by a relative entropy minimiza- 
tion principle. 

Consider a function class C where each function
can be encoded using O(p(k) log n) bits
for some polynomial p. An example would be
Boolean formulas with k relevant variables, when
the size of the formula is restricted to p(k)
ignoring the size taken by the variables. The
cardinality of C is then $|C| = 2^{O(p(k) \log n)}$. The
classical halving algorithm (see [9] for discussion
and references) learns any class consisting of
m Boolean functions with mistake-bound $\log_2 m$
and would thus provide an attribute-efficient
algorithm for such a class C. However, the running
time would not be polynomial. Another serious
drawback would be that the halving algorithm
does not tolerate any noise. Interestingly, a
multiplicative update similar to (1) has been used
in Littlestone and Warmuth’s weighted majority
algorithm [10], and also Vovk’s aggregating
algorithm [14], to produce a noise-tolerant
generalization of the halving algorithm.
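
The halving algorithm mentioned above can be sketched in a few lines. This is an illustration under the noise-free assumption that the target function belongs to the class; the function name `halving_run` and the rule of predicting 1 on a tied vote are assumptions.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[Tuple[int, ...]], int]


def halving_run(hypotheses: Sequence[Hypothesis],
                trials: Sequence[Tuple[Tuple[int, ...], int]]
                ) -> Tuple[int, List[Hypothesis]]:
    """Predict by majority vote over the hypotheses still consistent with
    all past trials, then drop the inconsistent ones. Each mistake removes
    at least half of the remaining hypotheses."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in trials:
        votes_for_one = sum(h(x) for h in version_space)
        y_hat = 1 if 2 * votes_for_one >= len(version_space) else 0
        if y_hat != y:
            mistakes += 1
        version_space = [h for h in version_space if h(x) == y]
    return mistakes, version_space


# Class of m = 4 single-variable functions over n = 4 variables.
hypotheses = [lambda x, i=i: x[i] for i in range(4)]
target = hypotheses[2]
trials = [(x, target(x)) for x in product((0, 1), repeat=4)]
mistakes, remaining = halving_run(hypotheses, trials)
print(mistakes, len(remaining))
```

Storing the version space explicitly is what makes the running time depend on the size of the class rather than on k and log n.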

Attribute-efficient learning has also been stud- 
ied in other learning models than the mistake- 
bound model, such as Probably Approximately 
Correct learning [4], learning with uniform dis- 
tribution [12], and learning with membership 
queries [3]. The idea has been further developed 
into learning with a potentially infinite number of 
attributes [2]. 


Applications 


Attribute-efficient algorithms for simple function
classes have a potentially interesting application
as a component in learning more complex function
classes. For example, any monotone k-term
DNF formula over variables $x_1, \ldots, x_n$ can be
represented as a monotone k-literal disjunction
over $2^n$ variables $z_A$, where $z_A$ is defined as
$z_A = \prod_{i \in A} x_i$ for $A \subseteq \{1, \ldots, n\}$.
Running Winnow with the transformed inputs
$z \in \{0, 1\}^{2^n}$ would give a mistake-bound
$O(k \log 2^n) = O(kn)$. Unfortunately the running
time would be linear in $2^n$, at least for a naive
implementation. Khardon et al. [6] provide
discouraging computational hardness results for
this potential application.
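
The transformation to the variables $z_A$ can be written out explicitly for small n. The sketch below is illustrative; the function name `expand` is an assumption, subsets are represented as sorted tuples of 0-based indices, and the output has $2^n$ entries, which is exactly the blow-up that makes the naive approach impractical.

```python
from itertools import combinations
from typing import Dict, Tuple


def expand(x: Tuple[int, ...]) -> Dict[Tuple[int, ...], int]:
    """Map x to the 2^n features z_A = product of x_i for i in A.
    The empty product (A empty) equals 1."""
    n = len(x)
    z = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            z[subset] = int(all(x[i] for i in subset))
    return z


# The monotone 2-term DNF (x_0 AND x_2) OR x_1 becomes the 2-literal
# disjunction z_{0,2} OR z_{1} over the expanded variables.
x = (1, 0, 1)
z = expand(x)
print(len(z), z[(0, 2)] | z[(1,)])
```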

Online learning algorithms have a natural
application domain in signal processing. In this
setting, the sender emits a true signal $y_t$ at time
t, for t = 1, 2, 3, .... At some later time (t + d),
a receiver receives a signal $z_t$, which is a sum
of the original signal $y_t$ and various echoes of
earlier signals $y_{t'}$, t′ < t, all distorted by random
noise. The task is to recover the true signal $y_t$
based on received signals $z_t, z_{t-1}, \ldots, z_{t-l}$ over
some time window l. Currently attribute-efficient
algorithms are not used for such tasks, but see
[11] for preliminary results.

Attribute-efficient learning algorithms are 
similar in spirit to statistical methods that find 
sparse models. In particular, statistical algorithms
that use $L_1$ regularization are closely related
to multiplicative algorithms such as Winnow
and exponentiated gradient. In contrast, more
classical $L_2$ regularization leads to algorithms
that are not attribute-efficient [13].


Problem Definition


This problem is concerned with the automated 
development and analysis of search tree algo- 
rithms. Search tree algorithms are a popular way 
to find optimal solutions to NP-complete prob- 
lems. (For ease of presentation, only decision 
problems are considered; adaption to optimiza- 
tion problems is straightforward.) The idea is 
to recursively solve several smaller instances in 
such a way that at least one branch is a
yes-instance if and only if the original instance is.
Typically, this is done by trying all possibilities to 
contribute to a solution certificate for a small part 
of the input, yielding a small local modification 
of the instance in each branch. 

For example, consider the NP-complete 
CLUSTER EDITING problem: can a given graph 
be modified by adding or deleting up to k edges 
such that the resulting graph is a cluster graph, 
that is, a graph that is a disjoint union of cliques? 
To give a search tree algorithm for CLUSTER 
EDITING, one can use the fact that cluster graphs 
are exactly the graphs that do not contain a P3 
(a path of 3 vertices) as an induced subgraph. One 
can thus solve CLUSTER EDITING by finding 
a P3 and splitting it into 3 branches: delete the 
first edge, delete the second edge, or add the 
missing edge. By this characterization, whenever 
there is no P3 found, one already has a cluster 
graph. The original instance has a solution with 
k modifications if and only if at least one of the 
branches has a solution with k − 1 modifications.
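
The simple search tree algorithm for CLUSTER EDITING described above can be sketched as follows. This is an illustration, not an optimized implementation: the function names `find_p3` and `cluster_editing` and the representation of the graph as a set of two-element frozensets are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Optional, Sequence, Set, Tuple

Edge = FrozenSet[Hashable]


def find_p3(vertices: Sequence[Hashable],
            edges: Set[Edge]) -> Optional[Tuple[Hashable, Hashable, Hashable]]:
    """Return (u, v, w) with edges uv and vw present and uw absent, if any."""
    for v in vertices:
        neighbors = [u for u in vertices if u != v and frozenset((u, v)) in edges]
        for u, w in combinations(neighbors, 2):
            if frozenset((u, w)) not in edges:
                return u, v, w
    return None


def cluster_editing(vertices: Sequence[Hashable], edges: Set[Edge], k: int) -> bool:
    """Decide whether at most k edge insertions or deletions yield a
    disjoint union of cliques, branching three ways on an induced P3."""
    p3 = find_p3(vertices, edges)
    if p3 is None:
        return True                       # already a cluster graph
    if k == 0:
        return False                      # a P3 remains but no budget is left
    u, v, w = p3
    return (cluster_editing(vertices, edges - {frozenset((u, v))}, k - 1)
            or cluster_editing(vertices, edges - {frozenset((v, w))}, k - 1)
            or cluster_editing(vertices, edges | {frozenset((u, w))}, k - 1))


# A star with center 0 and leaves 1, 2, 3 needs two modifications.
star = {frozenset((0, 1)), frozenset((0, 2)), frozenset((0, 3))}
print(cluster_editing([0, 1, 2, 3], star, 1), cluster_editing([0, 1, 2, 3], star, 2))
```

Each call makes at most three recursive calls with the budget reduced by one, so the search tree has at most $3^k$ leaves.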


Analysis 

For NP-complete problems, the running time of
a search tree algorithm only depends on the size
of the search tree up to a polynomial factor,
which depends on the number of branches and the
reduction in size of each branch. If the algorithm
solves a problem of size s and calls itself
recursively for problems of sizes
$s - d_1, \ldots, s - d_i$, then $(d_1, \ldots, d_i)$ is
called the branching vector of this recursion. It is
known that the size of the search tree is then
$O(\alpha^s)$, where the branching number $\alpha$ is the
only positive real root of the characteristic
polynomial

$$z^d - z^{d-d_1} - \cdots - z^{d-d_i}, \tag{1}$$

where $d = \max\{d_1, \ldots, d_i\}$. For the simple
CLUSTER EDITING search tree algorithm and the
size measure k, the branching vector is (1, 1, 1)
and the branching number is 3, meaning that the
running time is up to a polynomial factor $O(3^k)$.
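
The branching number of a given branching vector is easy to compute numerically. The sketch below is illustrative: the function name `branching_number` is an assumption, and it uses the standard equivalent characterization of the branching number as the unique $\alpha > 1$ with $\sum_j \alpha^{-d_j} = 1$, found here by bisection for vectors with at least two positive entries.

```python
from typing import Sequence


def branching_number(vector: Sequence[int], tol: float = 1e-12) -> float:
    """Unique root alpha > 1 of sum(alpha**(-d) for d in vector) == 1."""
    if len(vector) < 2 or min(vector) <= 0:
        raise ValueError("need at least two positive entries")

    def excess(alpha: float) -> float:
        return sum(alpha ** -d for d in vector) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:          # excess is decreasing in alpha; find a bracket
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(branching_number((1, 1, 1)), 2))        # simple three-way branching
print(round(branching_number((1, 2, 3, 3, 2)), 2))  # a refined branching vector
```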


Case Distinction 
Often, one can obtain better running times by
distinguishing a number of cases of instances,
and giving a specialized branching for each case.
The overall running time is then determined by
the branching number of the worst case. Several
publications obtain such algorithms by hand (e.g.,
a search tree of size $O(2.27^k)$ for CLUSTER
EDITING [4]); the topic of this work is how to
automate this. That is, the problem is the following:


Problem 1 (Fast Search Tree Algorithm)
INPUT: An NP-hard problem P and a size measure
s(I) of an instance I of P where instances I
with s(I) = 0 can be solved in polynomial time.
OUTPUT: A partition of the instance set of P into
cases, and for each case a branching such that the
maximum branching number over all branchings
is as small as possible.


Note that this problem definition is somewhat 
vague; in particular, to be useful, the case an 
instance belongs to must be recognizable quickly. 
It is also not clear whether an optimal search 
tree algorithm exists; conceivably, the branching 
number can be continuously reduced by increas- 
ingly complicated case distinctions. 


Key Results 


Gramm et al. [3] describe a method to obtain fast 
search tree algorithms for CLUSTER EDITING 
and related problems, where the size measure is 
the number of editing operations k. To get a case 
distinction, a number of subgraphs are enumer- 
ated such that each instance is known to contain at 
least one of these subgraphs. It is next described 
how to obtain a branching for a particular case. 
A standard way of systematically obtaining 
specialized branchings for instance cases is to use 
a combination of basic branching and data re- 
duction rules. Basic branching is typically a very 
simple branching technique, and data reduction 
rules replace an instance with a smaller, solution- 
equivalent instance in polynomial time. Applying 
this to CLUSTER EDITING first requires a small 
modification of the problem: one considers an 
annotated version, where an edge can be marked 
as permanent and a non-edge can be marked as 
forbidden. Any such annotated vertex pair cannot
be edited anymore. For a pair of vertices, the
basic branching then branches into two cases:
permanent or forbidden (one of these options will
require an editing operation). The reduction rules
are: if two permanent edges are adjacent, the third
edge of the triangle they induce must also be
permanent; and if a permanent and a forbidden
edge are adjacent, the third edge of the triangle
they induce must be forbidden.

Automated Search Tree Generation, Fig. 1
Branching for a CLUSTER EDITING case using only
basic branching on vertex pairs (double circles), and
applications of the reduction rules (asterisks).
Permanent edges are marked bold, forbidden
edges dashed. The numbers next to the subgraphs state
the change of the problem size k. The branching
vector is (1, 2, 3, 3, 2), corresponding to a search
tree size of $O(2.27^k)$

Figure 1 shows an example branching derived
in this way.

Using a refined method to search the space of
all possible cases and to distinguish
all branchings for a case, Gramm et al. [3] derive
a number of search tree algorithms for graph
modification problems.


Applications 


Gramm et al. [3] apply the automated generation
of search tree algorithms to several graph
modification problems (see also Table 1). Further,
Hüffner [5] demonstrates an application to
DOMINATING SET on graphs with maximum
degree 4, where the size measure is the size of
the dominating set.

Fedin and Kulikov [2] examine variants of 
SAT; however, their framework is limited in that
it only proves upper bounds for a fixed algorithm
instead of generating algorithms.

Skjernaa [6] also presents results on variants 
of SAT. His framework does not require user- 
provided data reduction rules, but determines 
reductions automatically. 


Open Problems 


The analysis of search tree algorithms can be
much improved by describing the “size” of an
instance by more than one variable, resulting in
multivariate recurrences [1]. Integrating this
technique into an automation framework remains
open.

It has frequently been reported that better 
running time bounds obtained by distinguish- 
ing a large number of cases do not necessarily 
speed up, but in fact can slow down, a program. 
A careful investigation of the tradeoffs involved
and a corresponding adaptation of the automation
frameworks is an open task.


Experimental Results 


Gramm et al. [3] and Hüffner [5] report search
tree sizes for several NP-complete problems. Further,
Fedin and Kulikov [2] and Skjernaa [6]
report on variants of satisfiability. Table 1
summarizes the results.

Automated Search Tree Generation, Table 1
Summary of search tree sizes where automation gave
improvements. “Known” is the size of the best previously
published “hand-made” search tree. For the satisfiability
problems, m is the number of clauses and l is the length
of the formula

Problem                          Trivial  Known   New
CLUSTER EDITING                  3        2.27    1.92 [3]
CLUSTER DELETION                 2        1.77    1.53 [3]
CLUSTER VERTEX DELETION          3        2.27    2.26 [3]
BOUNDED DEGREE DOMINATING SET    4                3.71 [5]
X3SAT, size measure m            3        1.1939  1.1586 [6]
(n, 3)-MAXSAT, size measure m    2        1.341   1.2366 [2]
(n, 3)-MAXSAT, size measure l    2        1.1058  1.0983 [2]


Cross-References


Problem Definition


In the satisfiability problem (SAT), the input is 
a Boolean formula in conjunctive normal form 
(CNF), and the question is whether the formula 
is Satisfiable, that is, whether there exists an 
assignment of truth values to the variables such 
that the formula evaluates to true. For example, 
the formula 


(x ∨ ¬y) ∧ (¬x ∨ y ∨ z) ∧ (x ∨ ¬z) ∧ (¬x ∨ y ∨ z)

is satisfiable since it evaluates to true if we set x,
y, and z to true.

Several classes of CNF formulas are known 
for which SAT can be solved in polynomial 
time — so-called islands of tractability. For a given 
island of tractability C, a C-backdoor is a set of 
variables of a formula such that assigning truth 
values to these variables gives a formula in C. 

Formally, let C be a class of formulas for
which the recognition problem and the satisfiability
problem can be solved in polynomial time.
For a subset of variables X ⊆ var(F) of a
CNF formula F, and an assignment a : X →
{true, false} of truth values to these variables, the
reduced formula F[a] is obtained from F by
removing all the clauses containing a true literal
under a and removing all false literals from the
remaining clauses. The notion of backdoors was
introduced by Williams et al. [15], and they come
in two variants:


Definition 1 ([15]) A weak C-backdoor of a
CNF formula F is a set of variables X ⊆ var(F)
such that there exists an assignment a to X such
that F[a] ∈ C and F[a] is satisfiable.


Definition 2 ([15]) A strong C-backdoor of a
CNF formula F is a set of variables X ⊆ var(F)
such that for each assignment a to X, we have
that F[a] ∈ C.
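
The reduced formula F[a] and the two backdoor notions can be illustrated with a small brute-force sketch. It assumes a DIMACS-like encoding (a clause is a frozenset of nonzero integers, with -v standing for the negation of variable v) and uses HORN as the island C; all function names are hypothetical and chosen for this illustration, and the checks enumerate all assignments, so they are only meant for tiny examples.

```python
from itertools import product

# Assumed encoding: a clause is a frozenset of nonzero ints (3 means x3,
# -3 means NOT x3); a formula is a set of clauses.

def reduce_formula(formula, assignment):
    """Return F[a]: drop clauses with a true literal, delete false literals."""
    reduced = set()
    for clause in formula:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause contains a true literal under the assignment
        reduced.add(frozenset(l for l in clause if abs(l) not in assignment))
    return reduced

def assignments(variables):
    """Yield every truth assignment to the given variables as a dict."""
    variables = sorted(variables)
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def is_horn(formula):
    """Island HORN: every clause has at most one positive literal."""
    return all(sum(1 for l in c if l > 0) <= 1 for c in formula)

def brute_force_sat(formula):
    """Exponential satisfiability test, used only to check weak backdoors."""
    variables = {abs(l) for c in formula for l in c}
    return any(not reduce_formula(formula, a) for a in assignments(variables))

def is_strong_backdoor(formula, X, in_class=is_horn):
    return all(in_class(reduce_formula(formula, a)) for a in assignments(X))

def is_weak_backdoor(formula, X, in_class=is_horn):
    return any(in_class(r) and brute_force_sat(r)
               for r in (reduce_formula(formula, a) for a in assignments(X)))

if __name__ == "__main__":
    F = {frozenset([1, 2]), frozenset([-1, 2, 3]), frozenset([-2, -3])}
    print(is_strong_backdoor(F, {2}), is_strong_backdoor(F, {1}), is_weak_backdoor(F, {1}))
```

In this example {x2} is a strong HORN-backdoor, while {x1} is only a weak one: setting x1 to true leaves the non-Horn clause (x2 ∨ x3).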


There are two main computational problems
associated with backdoors. In the detection problem,
the input is a CNF formula F and an integer
k, and the question is whether F has a
weak/strong C-backdoor of size k. In the evaluation
problem, the input is a CNF formula F and
a weak/strong C-backdoor X, and the question
is whether F is satisfiable. (In the case of weak
C-backdoors, one usually asks for a satisfying
assignment, since every formula that has a
weak C-backdoor is satisfiable.)

The size of a smallest weak/strong C- 
backdoor of a CNF formula F naturally defines 
the distance of F to C. The size of the backdoor 
then becomes a very relevant parameter in 
studying the parameterized complexity [1] of 
backdoor detection and backdoor evaluation. 

For a base class C where #SAT (determine the 
number of satisfying assignments) or Max-SAT 
(find an assignment that maximizes the number 
or weight of satisfied clauses) can be solved in 
polynomial time, strong C-backdoors can also be 
used to solve these generalizations of SAT. 


Key Results 


While backdoor evaluation problems are fixed-
parameter tractable for SAT, the parameterized
complexity of backdoor detection depends on the
particular island of tractability C that is con- 
sidered. If weak (resp., strong) C-backdoor de- 
tection is fixed-parameter tractable parameter- 
ized by backdoor size, SAT is fixed-parameter 
tractable parameterized by the size of the small- 
est weak (resp., strong) C-backdoor. A sample 
of results for the parameterized complexity of 
backdoor detection is presented in Table 1. The 
considered islands of tractability are defined in 
Table 2. 
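
Why backdoor evaluation is fixed-parameter tractable can be seen from a short sketch: given a strong backdoor X, one tries all 2^|X| assignments to X and solves each reduced formula with the polynomial-time algorithm of the island. The sketch below assumes the island HORN, a DIMACS-like clause encoding (frozensets of nonzero integers), and hypothetical function names; it is an illustration, not an implementation from the cited works.

```python
from itertools import product

def reduce_formula(formula, assignment):
    """Return F[a]: drop clauses with a true literal, delete false literals."""
    reduced = set()
    for clause in formula:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue
        reduced.add(frozenset(l for l in clause if abs(l) not in assignment))
    return reduced

def horn_sat(formula):
    """Decide satisfiability of a Horn formula by unit propagation."""
    formula = set(formula)
    while True:
        if frozenset() in formula:
            return False              # empty clause derived
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            # Every remaining Horn clause has at least two literals, hence a
            # negative one, so setting all remaining variables to false works.
            return True
        (lit,) = unit
        formula = reduce_formula(formula, {abs(lit): lit > 0})

def sat_via_strong_backdoor(formula, X, solve_base=horn_sat):
    """Evaluate satisfiability using 2^|X| calls to the base-class solver."""
    X = sorted(X)
    for values in product([False, True], repeat=len(X)):
        if solve_base(reduce_formula(formula, dict(zip(X, values)))):
            return True
    return False

if __name__ == "__main__":
    F = {frozenset([1, 2]), frozenset([-1, 2, 3]), frozenset([-2, -3])}
    print(sat_via_strong_backdoor(F, {2}))
```

The running time is 2^|X| times a polynomial, which is exactly the shape required for fixed-parameter tractability in the backdoor size.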

It can be observed that restricting the input 
formulas to have bounded clause length can make 
backdoor detection more tractable. Also, weak 
backdoor detection is often no more tractable 
than strong backdoor detection; the outlier here 
is FOREST-backdoor detection for general CNF
formulas, where the weak version is known to be
W[2]-hard but the parameterized complexity of
the strong variant is still open. A CNF formula
belongs to the island of tractability FOREST if its
incidence graph is acyclic. Here, the incidence
graph of a CNF formula F is the bipartite graph
on the variables and clauses of F where a clause
is incident to the variables it contains.


Backdoors to SAT, Table 1 The parameterized complexity of finding weak and strong backdoors of CNF formulas
and r-CNF formulas, where r ≥ 3 is a fixed integer

          Weak                          Strong
Island    CNF            r-CNF          CNF            r-CNF
HORN      W[2]-h [10]    FPT [7]        FPT [10]       FPT [10]
2CNF      W[2]-h [10]    FPT [7]        FPT [10]       FPT [10]
UP        W[P]-c [14]    W[P]-c [14]    W[P]-c [14]    W[P]-c [14]
FOREST    W[2]-h [6]     FPT [6]        Open           Open
RHORN     W[2]-h [7]     W[2]-h [7]     W[2]-h [7]     Open
CLU       W[2]-h [11]    FPT [7]        W[2]-h [11]    FPT [11]


Backdoors to SAT, Table 2 Some islands of tractability

Island    Description
HORN      Horn formulas, i.e., CNF formulas where each clause contains at most one positive literal
2CNF      Krom formulas, i.e., CNF formulas where each clause contains at most two literals
UP        CNF formulas from which the empty formula or an empty clause can be derived by unit propagation
          (setting the literals in unit length clauses to true)
FOREST    Acyclic formulas, i.e., CNF formulas whose incidence graphs are forests
RHORN     Renamable Horn formulas, i.e., CNF formulas that can be made Horn by flipping literals
CLU       Cluster formulas, i.e., CNF formulas that are variable disjoint unions of hitting formulas. A formula
          is hitting if every two of its clauses have at least one variable occurring positively in one clause and
          negatively in the other


The width of graph decompositions constitutes
another measure for the tractability of
CNF formulas that is orthogonal to backdoors.
For example, Fischer et al. [2] and Samer and
Szeider [12] give linear-time algorithms solving
#SAT for CNF formulas in $W_{\le t}$.


Definition 3. For every integer t ≥ 0, $W_{\le t}$ is the
class of CNF formulas whose incidence graph has
treewidth at most t.


Combining backdoor and graph decomposition
methods, let us now consider backdoors
to $W_{\le t}$. Since FOREST = $W_{\le 1}$, weak $W_{\le t}$-
backdoor detection is already W[2]-hard for
t = 1. Fomin et al. [3] give parameterized
algorithms for weak $W_{\le t}$-backdoor detection
when the input formula has bounded clause
length. Concerning strong $W_{\le t}$-backdoor
detection for formulas with bounded clause
length, Fomin et al. [3] sidestep the issue
of computing a backdoor by giving a fixed-
parameter algorithm, where the parameter is the
size of the smallest $W_{\le t}$-backdoor, that directly
solves r-SAT. The parameterized complexity of
strong $W_{\le t}$-backdoor detection remains open,
even for t = 1. However, a fixed-parameter
approximation algorithm was designed by
Gaspers and Szeider.


Theorem 1 ([8]) There is a cubic-time algorithm
that, given a CNF formula F and two
constants k, t ≥ 0, either finds a strong $W_{\le t}$-
backdoor of size at most $2^k$ or concludes that F
has no strong $W_{\le t}$-backdoor set of size at most k.


Using one of the #SAT algorithms for $W_{\le t}$ [2,
12], one can use Theorem 1 to obtain a fixed-
parameter algorithm for #SAT parameterized by
the size of the smallest strong backdoor to $W_{\le t}$.


Corollary 1 ([8]) There is a cubic-time algorithm
that, given a CNF formula F, computes
the number of satisfying assignments of F or
concludes that F has no strong $W_{\le t}$-backdoor
of size k for any pair of constants k, t ≥ 0.


In general, a fixed-parameter approximation
algorithm for weak/strong C-backdoor detection
is sufficient to make SAT fixed-parameter
tractable parameterized by the size of a smallest
weak/strong C-backdoor.

Backdoors for SAT have been considered
for combinations of base classes [4, 9], and 
the notion of backdoors has been extended to 
other computational reasoning problems such 
as constraint satisfaction, quantified Boolean 
formulas, planning, abstract argumentation, and 
nonmonotonic reasoning; see [7]. Other variants 
of the notion of backdoors include deletion 
backdoors where variables are deleted instead 
of instantiated, backdoors that are sensitive to 
clause-learning, pseudo-backdoors that relax the 
requirement that the satisfiability problem for 
an island of tractability be solved in polynomial 
time, and backdoor trees; see [5]. 


Applications 


SAT is an NP-complete problem, but modern 
SAT solvers perform extremely well, especially 
on structured and industrial instances [13]. 

The study of backdoors, and especially the 
parameterized complexity of backdoor detection 
problems, is one nascent approach to try and 
explain the empirically observed running times of 
SAT solvers. 


Open Problems 


Major open problems in the area include determining
whether


• strong FOREST-backdoor detection is fixed-
parameter tractable, and whether

• strong RHORN-backdoor detection is fixed-
parameter tractable for 3-CNF formulas.


Experimental Results 


Experimental results evaluate running times 
of algorithms to find backdoors in benchmark 
instances, evaluate the size of backdoors of 
known SAT benchmark instances, compare 
backdoor sizes for various islands of tractability, 
compare backdoor sizes for various notions of 
backdoors, evaluate what effect preprocessing 
has on backdoor size, compare backdoor
sizes of random instances with backdoor
sizes of real-world industrial instances, and
evaluate how SAT solver running times change 
if we force the solver to branch only on the 
variables of a given backdoor. The main messages 
are that the islands of tractability with the 
smallest backdoors are also those for which 
the backdoor detection problems are the most 
intractable and that existing SAT solvers can 
be significantly sped up on many real-world 
SAT instances if we feed them small backdoors. 
The issue is, of course, to compute these 
backdoors, and knowledge of the application 
domain, or specific SAT translations might help 
significantly with this task in practice. See [5] for
a survey.


Problem Definition


Determination of the complexity of k-CNF sat- 
isfiability is a celebrated open problem: given a 
Boolean formula in conjunctive normal form with 
at most k literals per clause, find an assignment 
to the variables that satisfies each of the clauses 
or declare none exists. It is well known that the
decision problem of k-CNF satisfiability is NP-
complete for k ≥ 3. This entry is concerned with
algorithms that significantly improve the worst-
case running time of the naive exhaustive search
algorithm, which is $\mathrm{poly}(n)\,2^n$ for a formula on
n variables. Monien and Speckenmeyer [8] gave
the first real improvement by giving a simple
algorithm whose running time is $O(2^{(1-\varepsilon_k)n})$, with
$\varepsilon_k > 0$ for all k. In a sequence of results
[1, 3, 5-7, 9-12], algorithms with increasingly
better running times (larger values of $\varepsilon_k$) have
been proposed and analyzed.

These algorithms usually follow one of two 
lines of attack to find a satisfying solution. Back- 
track search algorithms make up one class of al- 
gorithms. These algorithms were originally pro- 
posed by Davis, Logemann, and Loveland [4] and 
are sometimes called Davis-Putnam procedures. 
Such algorithms search for a satisfying assign- 
ment by assigning values to variables one by one 
(in some order), backtracking if a clause is made 
false. The other class of algorithms is based on
local searches (the first guaranteed performance
results were obtained by Schöning [12]). One
starts with a randomly (or strategically) selected 
assignment and searches locally for a satisfying 
assignment guided by the unsatisfied clauses. 

This entry presents ResolveSat, a random- 
ized algorithm for k-CNF satisfiability which 
achieves some of the best known upper bounds. 
ResolveSat is based on an earlier algorithm of 
Paturi, Pudlak, and Zane [10], which is essen- 
tially a backtrack search algorithm where the 
variables are examined in a randomly chosen 
order. An analysis of the algorithm is based on
the observation that as long as the formula has
a satisfying assignment which is isolated from 
other satisfying assignments, a third of the vari- 
ables are expected to occur as unit clauses as the 
variables are assigned in a random order. Thus, 
the algorithm needs to correctly guess the values 
of at most 2/3 of the variables. This analysis is 
extended to the general case by observing that ei- 
ther there exists an isolated satisfying assignment 
or there are many solutions, so the probability of 
guessing one correctly is sufficiently high. 

ResolveSat combines these ideas with resolution
to obtain significantly improved bounds
[9]. In fact, ResolveSat obtains the best known
upper bounds for k-CNF satisfiability for all k ≥
5. For k = 3 and 4, Iwama and Takami [6]
obtained the best known upper bound with their
randomized algorithm which combines the ideas
from Schöning’s local search algorithm and
ResolveSat. Furthermore, for the promise problem
of unique k-CNF satisfiability whose instances
are conjectured to be among the hardest instances
of k-CNF satisfiability [2], ResolveSat holds the
best record for all k ≥ 3. Bounds obtained by
ResolveSat for unique k-SAT and k-SAT for k =
3, 4, 5, 6 are shown in Table 1. Here, these bounds
are compared with those of Schöning [12],
subsequently improved results based on local search
[1, 5, 11], and the most recent improvements due
to Iwama and Takami [6]. The upper bounds
obtained by these algorithms are expressed in
the form $2^{cn-o(n)}$ and the numbers in the table
represent the exponent c. This comparison focuses
only on the best bounds irrespective of
the type of the algorithm (randomized versus
deterministic).


Backtracking Based k-SAT Algorithms, Table 1 This
table shows the exponent c in the bound $2^{cn-o(n)}$ for the
unique k-SAT and k-SAT from the ResolveSat algorithm,
the bounds for k-SAT from Schöning’s algorithm [12],
its improved versions for 3-SAT [1, 5, 11], and the hybrid
version of [6]

k   unique k-SAT [9]   k-SAT [9]   k-SAT [12]   k-SAT [1, 5, 11]   k-SAT [6]
3   0.386...           0.521...    0.415...     0.409...           0.404...
4   0.554...           0.562...    0.584...                        0.559...
5   0.650...                       0.678...
6   0.711...                       0.736...


Notation

In this entry, a CNF Boolean formula
$F(x_1, x_2, \ldots, x_n)$ is viewed as both a Boolean
function and a set of clauses. A Boolean formula
F is a k-CNF if all the clauses have size at
most k. For a clause C, write var(C) for the
set of variables appearing in C. If v ∈ var(C),
the orientation of v is positive if the literal v
is in C and is negative if $\bar{v}$ is in C. Recall that
if F is a CNF Boolean formula on variables
$(x_1, x_2, \ldots, x_n)$ and a is a partial assignment
of the variables, the restriction of F by a is
defined to be the formula $F' = F|_a$ on the set
of variables that are not set by a, obtained by
treating each clause C of F as follows: if C is set
to 1 by a, then delete C and otherwise replace C
by the clause C' obtained by deleting any literals
of C that are set to 0 by a. Finally, a unit clause
is a clause that contains exactly one literal.


Key Results 


ResolveSat Algorithm 

The ResolveSat algorithm is very simple. Given 
a k-CNF formula, it first generates clauses that 
can be obtained by resolution without exceeding 
a certain clause length. Then it takes a random 
order of variables and gradually assigns values 
to them in this order. If the currently considered 
variable occurs in a unit clause, it is assigned
the only value that satisfies the clause. If it
occurs in contradictory unit clauses, the algo- 
rithm starts over. At each step, the algorithm 
also checks if the formula is satisfied. If the 
formula is satisfied, then the input is accepted. 
This subroutine is repeated until either a satisfy- 
ing assignment is found or a given time limit is 
exceeded. 

The ResolveSat algorithm uses the following
subroutine, which takes an arbitrary assignment
y, a CNF formula F, and a permutation
π as input, and produces an assignment u. The
assignment u is obtained by considering the variables
of y in the order given by π and modifying
their values in an attempt to satisfy F.

Function Modify(CNF formula $G(x_1, x_2, \ldots, x_n)$, permutation π of {1, 2, ..., n}, assignment y) → (assignment u)
    $G_0 = G$.
    for i = 1 to n
        if $G_{i-1}$ contains the unit clause $x_{\pi(i)}$ then
            $u_{\pi(i)} = 1$
        else if $G_{i-1}$ contains the unit clause $\bar{x}_{\pi(i)}$ then
            $u_{\pi(i)} = 0$
        else
            $u_{\pi(i)} = y_{\pi(i)}$
        $G_i = G_{i-1}|_{x_{\pi(i)} = u_{\pi(i)}}$
    end /* end for loop */
    return u;

The algorithm Search is obtained by running
Modify(G, π, y) on many pairs (π, y), where
π is a random permutation and y is a random
assignment.

Search(CNF-formula F, integer I)
    repeat I times
        π = uniformly random permutation of 1, ..., n
        y = uniformly random vector ∈ $\{0, 1\}^n$
        u = Modify(F, π, y);
        if u satisfies F then
            output(u); exit;
    end /* end repeat loop */
    output(‘Unsatisfiable’);
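
The two routines Modify and Search can be sketched in Python as follows. This is a minimal illustration under assumptions not made in the entry: clauses are encoded as frozensets of nonzero integers (k for x_k, -k for its negation), assignments are dictionaries, and the helper names `restrict` and `satisfies` are hypothetical. Returning `None` plays the role of the output ‘Unsatisfiable’.

```python
import random

def restrict(formula, var, value):
    """Restrict the formula by x_var = value (drop satisfied clauses, shrink the rest)."""
    true_lit = var if value else -var
    return {c - {-true_lit} for c in formula if true_lit not in c}

def modify(formula, order, y):
    """One pass of Modify: unit clauses force a value, otherwise copy y."""
    g = set(formula)
    u = {}
    for v in order:
        if frozenset([v]) in g:
            u[v] = True
        elif frozenset([-v]) in g:
            u[v] = False
        else:
            u[v] = y[v]
        g = restrict(g, v, u[v])
    return u

def satisfies(formula, u):
    return all(any(u[abs(l)] == (l > 0) for l in c) for c in formula)

def search(formula, n, iterations, rng=random):
    """Run Modify on random (permutation, assignment) pairs; None means no success."""
    variables = list(range(1, n + 1))
    for _ in range(iterations):
        order = variables[:]
        rng.shuffle(order)
        y = {v: rng.random() < 0.5 for v in variables}
        u = modify(formula, order, y)
        if satisfies(formula, u):
            return u
    return None

if __name__ == "__main__":
    F = {frozenset([1, 2]), frozenset([-1, 3]), frozenset([-2, -3])}
    print(search(F, 3, 100, random.Random(0)))
```

Because the search only ever reports assignments that were verified against F, an answer other than `None` is always correct; the error is one-sided, as in the analysis of Search.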

The ResolveSat algorithm is obtained by combining
Search with a preprocessing step consisting
of bounded resolution. Clauses $C_1$ and
$C_2$ conflict on variable v if one of
them contains v and the other contains $\bar{v}$. $C_1$ and
$C_2$ are a resolvable pair if they conflict on exactly
one variable v. For such a pair, their resolvent,
denoted $R(C_1, C_2)$, is the clause $C = D_1 \lor D_2$
where $D_1$ and $D_2$ are obtained by deleting v
and $\bar{v}$ from $C_1$ and $C_2$. It is easy to see that any
assignment satisfying $C_1$ and $C_2$ also satisfies C.
Hence, if F is a satisfiable CNF formula containing
the resolvable pair $C_1, C_2$, then the formula
$F' = F \land R(C_1, C_2)$ has the same satisfying
assignments as F. The resolvable pair $C_1, C_2$ is
s-bounded if $|C_1|, |C_2| \le s$ and $|R(C_1, C_2)| \le s$.
The following subroutine extends a formula F
to a formula $F_s$ by applying as many steps of s-
bounded resolution as possible.

Resolve(CNF Formula F, integer s)
    $F_s = F$
    while $F_s$ has an s-bounded resolvable pair $C_1, C_2$ with $R(C_1, C_2) \notin F_s$
        $F_s = F_s \land R(C_1, C_2)$.
    return ($F_s$).

The algorithm for k-SAT is the following 
simple combination of Resolve and Search: 

ResolveSat(CNF-formula F, integer s, positive integer I)
    $F_s$ = Resolve(F, s).
    Search($F_s$, I).
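
The bounded resolution step can be sketched in the same style. The code below is a naive illustration, assuming clauses are frozensets of nonzero integers (k for x_k, -k for its negation); the function names are chosen here for illustration, and no attempt is made to match the running time bounds discussed in the analysis.

```python
from itertools import combinations

def resolvent(c1, c2):
    """Return R(c1, c2) if the clauses conflict on exactly one variable, else None."""
    conflicts = [l for l in c1 if -l in c2]
    if len(conflicts) != 1:
        return None
    l = conflicts[0]
    return (c1 - {l}) | (c2 - {-l})

def resolve(formula, s):
    """Add resolvents of s-bounded resolvable pairs until nothing new appears."""
    fs = set(formula)
    changed = True
    while changed:
        changed = False
        # list(fs) is a snapshot, so adding to fs inside the loop is safe.
        for c1, c2 in combinations(list(fs), 2):
            if len(c1) > s or len(c2) > s:
                continue
            r = resolvent(c1, c2)
            if r is not None and len(r) <= s and r not in fs:
                fs.add(r)
                changed = True
    return fs

if __name__ == "__main__":
    F = {frozenset([1, 2]), frozenset([-1, 3])}
    print(sorted(sorted(c) for c in resolve(F, 2)))
```

The loop terminates because there are only finitely many clauses of size at most s over the given variables, and each round adds at least one new clause or stops.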


Analysis of ResolveSat 

The running time of ResolveSat(F, s, I) can
be bounded as follows. Resolve(F, s) adds at
most $O(n^s)$ clauses to F by comparing pairs
of clauses, so a naive implementation runs in
time $n^{3s}\,\mathrm{poly}(n)$ (this time bound can be improved,
but this will not affect the asymptotics
of the main results). Search($F_s$, I) runs in time
$I(|F| + n^s)\,\mathrm{poly}(n)$. Hence, the overall running
time of ResolveSat(F, s, I) is crudely bounded
from above by $(n^{3s} + I(|F| + n^s))\,\mathrm{poly}(n)$. If
s = o(n/log n), the overall running time can
be bounded by $I|F|2^{o(n)}$ since $n^s = 2^{o(n)}$. It
will be sufficient to choose s either to be some
large constant or to be a slowly growing function
of n. That is, s(n) tends to infinity with n but is
o(log n).

The algorithm Search(F, I) always answers
“unsatisfiable” if F is unsatisfiable. Thus, the
only problem is to place an upper bound on the
error probability in the case that F is satisfiable.
Define τ(F) to be the probability that Modify
(F, π, y) finds some satisfying assignment. Then
for a satisfiable F, the error probability of Search
(F, I) is equal to $(1 - \tau(F))^I \le e^{-I\tau(F)}$, which
is at most $e^{-n}$ provided that $I \ge n/\tau(F)$. Hence,
it suffices to give good lower bounds on τ(F).

Complexity analysis of ResolveSat requires
certain constants $\mu_k$ for k ≥ 2:

$$\mu_k = \sum_{j=1}^{\infty} \frac{1}{j\left(j + \frac{1}{k-1}\right)}.$$

It is straightforward to show that $\mu_3 = 4 - 4\ln 2 > 1.226$
using Taylor’s series expansion of
ln 2. Using standard facts, it is easy to show that
$\mu_k$ is an increasing function of k with the limit

$$\sum_{j=1}^{\infty} \frac{1}{j^2} = \frac{\pi^2}{6} = 1.644\ldots$$
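
The constants can be evaluated numerically as a sanity check. The sketch below assumes the definition $\mu_k = \sum_{j \ge 1} 1/(j(j + 1/(k-1)))$, which is consistent with the stated value $\mu_3 = 4 - 4\ln 2$ and with the limit $\pi^2/6$; the function names and the tail estimate are choices made for this illustration. The quantity 1 - μ_k/(k-1) is the exponent that appears in the running time bounds for ResolveSat.

```python
import math

def mu(k, terms=100_000):
    """Approximate mu_k = sum_{j>=1} 1 / (j * (j + 1/(k-1))) for k >= 2."""
    a = 1.0 / (k - 1)
    partial = sum(1.0 / (j * (j + a)) for j in range(1, terms + 1))
    # Tail estimate: integral of 1/(x(x+a)) from terms + 1/2 to infinity,
    # which equals ln((start + a) / start) / a.
    start = terms + 0.5
    tail = math.log((start + a) / start) / a
    return partial + tail

def resolvesat_exponent(k):
    """Exponent c = 1 - mu_k/(k-1) of the bound 2^(cn) for unique k-SAT."""
    return 1.0 - mu(k) / (k - 1)

if __name__ == "__main__":
    for k in (3, 4, 5, 6):
        print(k, round(mu(k), 4), round(resolvesat_exponent(k), 4))
```

For k = 3 the exponent simplifies to 2 ln 2 - 1, which matches the entry 0.386... for unique 3-SAT in Table 1.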
The results on the algorithm ResolveSat are summarized
in the following three theorems.

Theorem 1

(i) Let k ≥ 5, and let s(n) be a function going
to infinity. Then for any satisfiable k-CNF
formula F on n variables,

$$\tau(F_s) \ge 2^{-\left(1 - \frac{\mu_k}{k-1}\right)n - o(n)}.$$

Hence, ResolveSat(F, s, I) with $I = 2^{(1 - \mu_k/(k-1))n + o(n)}$
has error probability o(1) and running time $2^{(1 - \mu_k/(k-1))n + o(n)}$
on any satisfiable k-CNF formula, provided
that s(n) goes to infinity sufficiently slowly.
(ii) For k ≥ 3, the same bounds are obtained
provided that F is uniquely satisfiable.


Theorem 1 is proved by first considering the
uniquely satisfiable case and then relating the
general case to the uniquely satisfiable case.
When k ≥ 5, the analysis reveals that the
asymptotics of the general case is no worse than
that of the uniquely satisfiable case. When k = 3
or k = 4, it gives somewhat worse bounds for the
general case than for the uniquely satisfiable case.


Theorem 2 Let s = s(n) be a slowly growing
function. For any satisfiable n-variable 3-CNF
formula, $\tau(F_s) \ge 2^{-0.521n}$, and so ResolveSat
(F, s, I) with $I = n2^{0.521n}$ has error probability
o(1) and running time $2^{0.521n + o(n)}$.


Theorem 3 Let s = s(n) be a slowly growing
function. For any satisfiable n-variable 4-CNF
formula, $\tau(F_s) \ge 2^{-0.5625n}$, and so ResolveSat
(F, s, I) with $I = n2^{0.5625n}$ has error probability
o(1) and running time $2^{0.5625n + o(n)}$.


Applications 


Various heuristics have been employed to 
produce implementations of 3-CNF satisfiability 
algorithms which are considerably more 
efficient than exhaustive search algorithms. The 
ResolveSat algorithm and its analysis provide 
a rigorous explanation for this efficiency and 
identify the structural parameters (e.g., the 
width of clauses and the number of solutions), 
influencing the complexity. 


Open Problems 


The gap between the bounds for the general
case and the uniquely satisfiable case when k ∈
{3, 4} is due to a weakness in analysis, and it is
conjectured that the asymptotic bounds for
the uniquely satisfiable case hold in general for
all k. If true, the conjecture would imply that
ResolveSat is also faster than any other known
algorithm in the k = 3 case.

Another interesting problem is to better understand
the connection between the number of satisfying
assignments and the complexity of finding
a satisfying assignment [2]. A strong conjecture
is that satisfiability for formulas with many
satisfying assignments is strictly easier than for
formulas with fewer solutions.

Finally, an important open problem is to design
an improved k-SAT algorithm which runs
faster than the bounds presented here for the
unique k-SAT case.
Bargaining Networks


Problem Definition


A network bargaining game can be represented
by a graph G = (V, E) along with a set of node
capacities $\{c_i \mid i \in V\}$ and a set of edge weights
$\{w_{ij} \mid (i, j) \in E\}$, where V is a set of n agents, E
is the set of all possible contracts, each agent i ∈
V has a capacity $c_i$ which is the maximum number
of contracts in which agent i may participate, and
each edge (i, j) ∈ E has a weight $w_{ij}$ which represents
the surplus of a possible contract between
agent i and agent j which should be divided
between agents i and j upon an agreement. The
main goal is to find the outcome of bargaining
among agents, which is a set of contracts M ⊆ E
and the division of surplus $\{z_{ij}\}$ for all contracts
in M.


Problem 1 (Computing the Final Outcome) 


INPUT: A network bargaining game G =
(V, E) along with capacities $\{c_i \mid i \in V\}$ and
weights $\{w_{ij} \mid (i, j) \in E\}$.

OUTPUT: The final outcome of bargaining 
among agents. 


Solution Concept 


Feasible Solution 

The final outcome of the bargaining process
might have many properties. The main one is
feasibility. A solution $(M, \{z_{ij}\})$ is feasible if and
only if it has the following properties:


• The degree of each node i should be at most
$c_i$ in set M.

• For each edge (i, j) ∈ M, we should have
$z_{ij} + z_{ji} = w_{ij}$. This means if there is a contract
between agents i and j, the surplus should be
divided between these two agents.

• For each edge (i, j) ∉ M, we should have
$z_{ij} = z_{ji} = 0$.


Outside Option 

Given a feasible solution $(M, \{z_{ij}\})$, the outside
option of agent i is the best deal she can make
outside of set M. For each edge (i, k) ∈ E − M,
agent i has an outside option by offering agent
k her current worst offer. In particular, if k has
fewer than $c_k$ contracts in M, agent i can offer
agent k exactly 0, and thus the outside option
of agent i regarding agent k would be $w_{ik}$. On
the other hand, if k has exactly $c_k$ contracts in
M, agent i may offer agent k the minimum of
$z_{kj}$ for all (k, j) ∈ M, and thus the outside
option of agent i regarding agent k would be
$w_{ik} - \min_{(k,j) \in M} \{z_{kj}\}$. Therefore, the outside
option $\alpha_i$ of agent i regarding solution $(M, \{z_{ij}\})$
is defined as $\alpha_i = \max_{(i,k) \in E - M} \{w_{ik} - \eta_{ik}\}$,
where

$$\eta_{ik} = \begin{cases} 0 & \text{if } k \text{ has fewer than } c_k \text{ contracts in } M, \\ \min_{(k,j) \in M} \{z_{kj}\} & \text{if } k \text{ has exactly } c_k \text{ contracts in } M. \end{cases}$$


Stable Solution 

A solution $(M, \{z_{ij}\})$ is stable if for each contract
(i, j) ∈ M, we have $z_{ij} \ge \alpha_i$ and for each agent
i with fewer than $c_i$ contracts in M, we have $\alpha_i$ =
0. Otherwise, agent i has an incentive to deviate
from M and make a contract with agent k such
that (i, k) ∉ M.


Balanced Solution 

John Nash [6] proposed a solution for the outcome
of the bargaining process between two agents.
In his solution, known as the Nash bargaining
solution, both agents will enjoy their outside options
and then divide the surplus equally. One can
leverage the intuition behind the Nash bargaining
solution and define the balanced solution in the
network bargaining game. A feasible solution
$(M, \{z_{ij}\})$ is balanced if for each contract (i, j) in
M, the participants divide the net surplus equally,
i.e., $z_{ij} = \alpha_i + \frac{w_{ij} - \alpha_i - \alpha_j}{2}$.
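
The definitions of outside option, stability, and balance can be checked mechanically on small instances. The sketch below is an illustration under several assumptions that are not part of the entry: an undirected edge is a frozenset {i, j}, `z[(i, j)]` is agent i's share of contract {i, j}, all capacities are at least 1, the outside option of an agent with no edge outside M is taken to be 0, and the balanced condition is read as z_ij = α_i + (w_ij − α_i − α_j)/2. All function names are hypothetical.

```python
def degree(agent, M):
    return sum(1 for e in M if agent in e)

def worst_share(agent, M, z):
    """Smallest share that `agent` receives from any of its contracts in M."""
    return min(z[(agent, j)] for e in M if agent in e for j in e - {agent})

def outside_option(i, w, cap, M, z):
    """alpha_i: best deal agent i can get on an edge outside M."""
    offers = []
    for e in w:
        if i in e and e not in M:
            (k,) = e - {i}
            eta = 0.0 if degree(k, M) < cap[k] else worst_share(k, M, z)
            offers.append(w[e] - eta)
    return max(offers, default=0.0)  # assumption: 0 if i has no edge outside M

def is_stable(w, cap, M, z, tol=1e-9):
    alpha = {i: outside_option(i, w, cap, M, z) for i in cap}
    for e in M:
        i, j = tuple(e)
        if z[(i, j)] < alpha[i] - tol or z[(j, i)] < alpha[j] - tol:
            return False
    return all(abs(alpha[i]) <= tol for i in cap if degree(i, M) < cap[i])

def is_balanced(w, cap, M, z, tol=1e-9):
    alpha = {i: outside_option(i, w, cap, M, z) for i in cap}
    for e in M:
        i, j = tuple(e)
        net = w[e] - alpha[i] - alpha[j]
        if abs(z[(i, j)] - (alpha[i] + net / 2)) > tol:
            return False
        if abs(z[(j, i)] - (alpha[j] + net / 2)) > tol:
            return False
    return True

if __name__ == "__main__":
    # Path a - b - c with unit weights and unit capacities; b contracts with a.
    ab, bc = frozenset("ab"), frozenset("bc")
    w, cap, M = {ab: 1.0, bc: 1.0}, {"a": 1, "b": 1, "c": 1}, {ab}
    z = {("a", "b"): 0.0, ("b", "a"): 1.0}
    print(is_stable(w, cap, M, z), is_balanced(w, cap, M, z))
```

On the path a - b - c, the middle agent b can play a and c against each other, so the only stable division gives b the entire surplus; an even split is neither stable nor balanced.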


Problem 2 (Computing a Stable Solution) 


INPUT: A network bargaining game G =
(V, E) along with capacities $\{c_i \mid i \in V\}$ and
weights $\{w_{ij} \mid (i, j) \in E\}$.

OUTPUT: A stable solution. 


Problem 3 (Computing a Balanced Solution) 


INPUT: A network bargaining game G =
(V, E) along with capacities $\{c_i \mid i \in V\}$ and
weights $\{w_{ij} \mid (i, j) \in E\}$.

OUTPUT: A balanced solution.


Key Results


The main goal of studying the network bargaining
games is to find the right outcome of the game.
Stable and balanced solutions are known to be
good candidates. However, the set of such solutions
might be too large, and moreover, some network
bargaining games do not have stable and balanced
solutions.


Existence of Stable and Balanced Solutions 

It has been proved that a network bargaining
game G = (V, E) with set of weights
$\{w_{ij} \mid (i, j) \in E\}$ has at least one stable solution
if and only if the following linear program for
the corresponding maximum weighted matching
problem has an integral optimum solution [4, 5]:

$$\begin{aligned}
\text{maximize} \quad & \sum_{(i,j) \in E} x_{ij} w_{ij} \\
\text{subject to} \quad & \sum_{(i,j) \in E} x_{ij} \le c_i, && \forall i \in V \\
& x_{ij} \le 1, && \forall (i,j) \in E
\end{aligned} \tag{LP1}$$

Kleinberg and Tardos [5] also study network
bargaining games with unit capacities, i.e., $c_i$ =
1 for each agent i, and show these games have
at least one balanced solution if and only if
they have a stable solution. Farczadi et al. [3]
generalize this result and prove the same result
for network bargaining games with general capacities.
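
As a small illustration of the characterization via (LP1), consider a triangle with unit weights and unit capacities. This example is chosen here for illustration and is not taken from the cited works. The fractional optimum is larger than the value of every integral solution, so no stable solution exists, and this can also be confirmed directly from the definition of stability:

```latex
% Triangle on agents 1, 2, 3 with w_{12} = w_{13} = w_{23} = 1 and c_1 = c_2 = c_3 = 1.
\begin{align*}
&\text{Summing the three capacity constraints gives } 2(x_{12} + x_{13} + x_{23}) \le 3,
  \text{ so the LP value is at most } 3/2. \\
&\text{The point } x_{12} = x_{13} = x_{23} = 1/2 \text{ is feasible and attains } 3/2. \\
&\text{Any integral feasible solution selects at most one edge, so its value is at most } 1. \\
&\text{Directly: for } M = \{(1,2)\}, \text{ agent 3 is unmatched, and }
  \alpha_3 = \max\{1 - z_{12},\, 1 - z_{21}\} = 0 \\
&\quad\text{would force } z_{12} = z_{21} = 1, \text{ contradicting } z_{12} + z_{21} = w_{12} = 1.
\end{align*}
```

By symmetry the same argument rules out the other two single-edge matchings, and for the empty matching every agent has outside option 1 rather than 0.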


Cooperative Game Theory Perspective 

One can study network bargaining games from
a cooperative game theory perspective. A cooperative
game is defined by a set of agents V and
a value function $v : 2^V \to \mathbb{R}$, where v(S)
represents the surplus that all agents in S alone
can generate. In order to consider our bargaining
game as a cooperative game, we should first
define a value function for our bargaining game.
The value function v(S) can be defined as the weight
of the maximum weighted c-matching of S.


Core 

An outcome $\{x_i \mid i \in V\}$ is in the core if for each
subset of agents S, we have $\sum_{i \in S} x_i \ge v(S)$ and
for set V we have $\sum_{i \in V} x_i = v(V)$. This means
the agents should divide the total surplus v(V)
such that each subset of agents earns at least as
much as they alone can generate.
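
For very small games, core membership can be verified by brute force. The sketch below assumes that v(S) is the weight of a maximum weighted c-matching among the agents in S, that edges are encoded as frozensets {i, j}, and that exhaustive enumeration of edge subsets and coalitions is acceptable; the function names are hypothetical and the code is exponential in the size of the instance.

```python
from itertools import combinations

def value(S, w, cap):
    """v(S): maximum weight of a c-matching using only edges inside S."""
    edges = [e for e in w if e <= S]
    best = 0.0
    for r in range(len(edges) + 1):
        for chosen in combinations(edges, r):
            # Keep the edge set only if every agent respects its capacity.
            if all(sum(1 for e in chosen if a in e) <= cap[a] for a in S):
                best = max(best, sum(w[e] for e in chosen))
    return best

def in_core(x, w, cap, tol=1e-9):
    agents = sorted(x)
    if abs(sum(x.values()) - value(frozenset(agents), w, cap)) > tol:
        return False                      # total surplus v(V) must be divided exactly
    for r in range(1, len(agents)):
        for S in combinations(agents, r):
            if sum(x[a] for a in S) < value(frozenset(S), w, cap) - tol:
                return False              # coalition S could do better on its own
    return True

if __name__ == "__main__":
    edge = {frozenset([1, 2]): 1.0}
    print(in_core({1: 0.5, 2: 0.5}, edge, {1: 1, 2: 1}))
```

For a single edge every split of the weight is in the core, whereas for a triangle with unit weights and unit capacities the core is empty, since each pair of agents can generate 1 on its own but v(V) is only 1.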


Prekernel 

Consider an outcome $\{x_i \mid i \in V\}$. The power
of agent i over agent j regarding outcome
$\{x_i \mid i \in V\}$ is defined as $s_{ij}(x) = \max\{v(S) - \sum_{l \in S} x_l \mid S \subseteq V, i \in S, j \in V - S\}$.
An outcome $\{x_i \mid i \in V\}$ is in the prekernel if for every two agents
i and j, we have $s_{ij}(x) = s_{ji}(x)$.


Nucleolus 

Consider an outcome $\{x_i \mid i \in V\}$. The excess of
set S is defined as $e(S) = \sum_{i \in S} x_i - v(S)$.
Let ε be the vector of all possible $2^{|V|}$ excesses
which are sorted in nondecreasing order. The
nucleolus is the outcome which lexicographically
maximizes vector ε.

There is a nice connection between stable and
balanced solutions in network bargaining games
and core and prekernel outcomes in cooperative
games [1, 2]. Bateni et al. [2] prove that in a bipartite
network where all nodes on one side have unit
capacity, the set of stable solutions and the core
coincide. Moreover, they map the set of balanced
solutions to the prekernel in the same setting.
Note that this equivalence has been shown not to
extend to a general bipartite network where
nodes on both sides have general capacities [2, 3].

The sets of stable and balanced solutions are
quite large for many instances and thus may
not be used for predicting the outcome of the
game. Both Azar et al. [1] and Bateni et al. [2]
leverage the connection between network bargaining
games and cooperative games and suggest
the nucleolus as a symmetric and unique
solution for the outcome of a network bargaining
game. Bateni et al. [2] also propose a
polynomial-time algorithm for finding the nucleolus
in bipartite networks with unit capacities on one
side.


Finding Stable and Balanced Solutions 

Designing a polynomial-time algorithm for finding stable and balanced solutions of a network bargaining game is a well-known problem. Kleinberg and Tardos [5] were the first to study this problem and proposed a polynomial-time algorithm that characterizes stable and balanced solutions when all agents have unit capacities. Their solution draws a connection to the structure of matchings and the Edmonds-Gallai decomposition. Bateni et al. [2] generalize this result and design a polynomial-time algorithm for bipartite graphs where all agents on one side have general capacities and the others have unit capacities. They leverage the correspondence between the set of balanced solutions and the intersection of the core and prekernel and use known algorithms for finding a point in the prekernel to solve the problem. Finally, Farczadi et al. [3] propose an algorithm for computing a balanced solution for general capacities. The main idea of their solution is to reduce an instance with general capacities to a network bargaining game with unit capacities.


Open Problems 


- What is the right outcome of a network bargain- 
ing game on a general graph? 

- How can we compute a proper outcome of a
network bargaining game on a general graph
in polynomial time?
Problem Definition


A drawing of a graph G = (V, E) maps each vertex v ∈ V to a distinct point of the plane and each edge e ∈ E to a simple open Jordan curve joining its end vertices. A drawing is planar if the edges do not intersect. A graph is planar if it admits a planar drawing. A planar drawing of a graph partitions the plane into connected regions called faces. The unbounded face is called the external face. Two drawings of G are equivalent if they induce the same circular order of the edges incident to the vertices. A planar embedding of G is an equivalence class of such drawings. A plane graph is a planar graph together with a planar embedding and the specification of the external face.

A drawing of a graph is orthogonal if each 
edge is a sequence of alternate horizontal and ver- 
tical segments. Only planar graphs of maximum 
degree four admit planar orthogonal drawings. 


Bend Minimization for Orthogonal Drawings of Plane Graphs, Fig. 1 (a) An orthogonal drawing of a graph. (b) An orthogonal drawing of the same graph with the minimum number of bends


The points in common between two subsequent 
segments of the same edge are called bends. 
Figure 1 shows two orthogonal drawings of the
same plane graph with seven bends and one bend, 
respectively. 


Bend Minimization Problem 
Formally, the main research problem can be de- 
fined as follows. 


INPUT: A plane graph G = (V, E) of maximum 
degree four. 

OUTPUT: An orthogonal drawing of G with the 
minimum number of bends. 


Since, given the shape of the faces, an or- 
thogonal drawing of G with integer coordinates 
for vertices and bends can be computed in linear 
time, this problem may be alternatively viewed 
as that of embedding a 4-plane graph in the 
orthogonal grid with the minimum number of 
bends. Observe that if the planar embedding of 
the graph is not fixed, the problem of finding a 
minimum-bend orthogonal drawing is known to 
be NP-complete [9], unless the input graph has 
maximum degree three [5]. 


Key Results 


The bend minimization problem can be solved in 
polynomial time by reducing it to that of finding a 
minimum-cost integer flow of a suitable network. 
Here, rather than describing the original model 
of [11], we describe the more intuitive model 
of [6]. 
Any orthogonal drawing of a maximum degree
four plane graph G = (V, E) corresponds to an
integer flow in a network N(G) with value 4 × n,
with n = |V|, where:


1. For each vertex v ∈ V, N(G) has a node $n_v$, which is a source of 4 units of flow.

2. For each face f of G, N(G) has a node $n_f$, which is a sink of 2deg(f) − 4 units if f is an internal face, or 2deg(f) + 4 otherwise, where deg(f) is the number of vertices encountered while traversing the boundary of face f (the same vertex may be counted multiple times).

3. For each edge e ∈ E, with adjacent faces f and g, N(G) has two arcs $(n_f, n_g)$ and $(n_g, n_f)$, both with cost 1 and lower bound 0.

4. For each vertex v ∈ V incident to a face f of G, N(G) has an arc $(n_v, n_f)$ with cost 0 and lower bound 1. Multiple incidences of the same vertex to the same face yield multiple arcs of N(G).
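
The four rules above can be turned into a small construction routine. The following is a minimal sketch, assuming the plane graph is simple and is given as a hypothetical dictionary mapping each face to its boundary walk (a list of vertices); the function name and the tuple-based node labels are illustrative, not part of the original model description.

```python
from collections import defaultdict

def build_flow_network(faces, external):
    """Build the sources, sinks, and arcs of N(G) from the face boundary walks.

    faces: dict face id -> list of vertices met along the boundary walk
           (a vertex may repeat); external: id of the external face.
    Assumes a simple graph, so every edge lies on exactly two boundary slots.
    """
    vertices = {v for walk in faces.values() for v in walk}
    supply = {("v", v): 4 for v in vertices}            # rule 1
    demand = {}
    for f, walk in faces.items():                       # rule 2
        deg = len(walk)
        demand[("f", f)] = 2 * deg + 4 if f == external else 2 * deg - 4
    arcs = []  # entries: (tail, head, cost, lower_bound)
    edge_faces = defaultdict(list)
    for f, walk in faces.items():
        for a, b in zip(walk, walk[1:] + walk[:1]):
            edge_faces[frozenset((a, b))].append(f)
        for v in walk:                                  # rule 4: one arc per incidence
            arcs.append((("v", v), ("f", f), 0, 1))
    for f, g in edge_faces.values():                    # rule 3: two arcs per edge
        arcs.append((("f", f), ("f", g), 1, 0))
        arcs.append((("f", g), ("f", f), 1, 0))
    return supply, demand, arcs

# A 4-cycle: one internal face and the external face.
supply, demand, arcs = build_flow_network(
    {"in": [1, 2, 3, 4], "out": [4, 3, 2, 1]}, external="out")
print(sum(supply.values()), sum(demand.values()))
```

For the 4-cycle the total supply 4n = 16 matches the total demand (4 for the internal face plus 12 for the external face), as Euler's formula guarantees in general.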


Figure 2 shows the two flows of cost 7 and 1, respectively, corresponding to the orthogonal drawings of Fig. 1. Intuitively, a flow of N(G) describes how 90° angles are distributed in the orthogonal drawing of G. Namely, each vertex has four 90° angles around it, hence “producing” four units of flow. The number of 90° angles needed to close a face f is given by the formula in [12], that is, 2deg(f) − 4 units if f is an internal face, and 2deg(f) + 4 otherwise. Finally, the flows traversing the edges account for their bends, where each bend allows a face to “lose” a 90° angle and the adjacent face to “gain” it. More formally, we have the following theorem.

Bend Minimization for Orthogonal Drawings of Plane Graphs, Fig. 2 (a) The flow associated with the drawing of Fig. 1a has cost 7. (b) The flow associated with the drawing of Fig. 1b has cost 1


Theorem 1 Let G = (V, E) be a four-plane
graph. For each orthogonal drawing of G with b
bends, there exists an integer flow in N(G) whose
value is 4 × |V| and whose cost is b.


Although several orthogonal drawings of G
(e.g., with the order of the bends along edges
permuted) may correspond to the same flow of
N(G), starting from any flow, one such drawing
may be obtained in linear time. Namely, once
the orthogonal shape of each face is fixed, it is 
possible to greedily add as many dummy edges 
and nodes as are needed to split the face into 
rectangular faces (the external face may require 
the addition of dummy vertices in the corners). 
Integer edge lengths can be consistently assigned 
to the sides of these rectangular faces, obtaining 
a grid embedding (a linear-time algorithm for 
doing this is described in [6]). The removal of 
dummy nodes and edges yields the desired or- 
thogonal drawing. Hence, we have the following 
theorem. 


Theorem 2 Let G = (V, E) be a four-plane
graph. Given an integer flow in N(G) whose
value is 4 × |V| and whose cost is b, an orthogonal
drawing of G with b bends can be found in linear
time.


Since each bend of the drawing corresponds 
to a unit of cost for the flow, when the total cost 
of the flow is minimum, any orthogonal drawing 
that can be obtained from it has the minimum 
number of bends [11]. 

Hence, given a plane graph G = (V, E) of maximum degree four, an orthogonal drawing of G with the minimum number of bends can be computed with the same asymptotic complexity as finding a minimum-cost integer flow of N(G). The solution to this problem proposed in [11] is based on an iterative augmentation algorithm. Namely, starting from the initial zero flow, the final 4 × n flow is computed by augmenting the flow at each of the O(n) steps along a minimum-cost path. Such a path can be found with the O(n log n) implementation of Dijkstra’s algorithm that exploits a priority queue. The overall $O(n^2 \log n)$ time complexity was lowered first to $O(n^{7/4} \sqrt{\log n})$ [8] and then to $O(n^{3/2})$, exploiting the planarity of the flow network [4]. However, the latter time bound is increased by an additional logarithmic factor if some edges have constraints on the number of allowed bends [4] or if Dijkstra’s algorithm for the shortest path computation is preferred to the rather theoretical algorithm in [10].
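
The iterative augmentation scheme can be illustrated with a generic successive-shortest-path routine. This is a sketch, not the implementation of [11]: it assumes a single source and sink, non-negative arc costs, no self-loops, and no lower bounds (in N(G), the lower bounds of 1 would have to be handled first, e.g., by pre-routing one unit on each vertex-to-face arc). All names are illustrative.

```python
import heapq

def min_cost_flow(num_nodes, arcs, source, sink, target_flow):
    """Send target_flow units from source to sink at minimum cost.

    arcs: list of (u, v, capacity, cost) with cost >= 0 and u != v.
    Each augmentation follows a cheapest residual path found by Dijkstra's
    algorithm on reduced costs (node potentials keep them non-negative).
    """
    graph = [[] for _ in range(num_nodes)]  # entries: [head, capacity, cost, reverse index]
    for u, v, cap, cost in arcs:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    potential = [0] * num_nodes
    flow = total_cost = 0
    while flow < target_flow:
        dist = [float("inf")] * num_nodes
        dist[source] = 0
        prev = [None] * num_nodes
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale queue entry
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                nd = d + cost + potential[u] - potential[v]
                if cap > 0 and nd < dist[v]:
                    dist[v], prev[v] = nd, (u, i)
                    heapq.heappush(heap, (nd, v))
        if dist[sink] == float("inf"):
            raise ValueError("target flow cannot be routed")
        for v in range(num_nodes):
            if dist[v] < float("inf"):
                potential[v] += dist[v]
        # Bottleneck capacity along the path, then augment.
        push, v = target_flow - flow, sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            total_cost += push * graph[u][i][2]
            v = u
        flow += push
    return total_cost

demo_arcs = [(0, 1, 1, 1), (0, 2, 1, 2), (1, 3, 1, 1), (2, 3, 1, 1), (1, 2, 1, 0)]
print(min_cost_flow(4, demo_arcs, 0, 3, 2))
```

In the bend minimization setting, the returned cost would be the number of bends, since only face-to-face arcs carry cost.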
Bend Minimization for Orthogonal Drawings of Plane Graphs, Fig. 3 (a) A drawing on the hexagonal grid. (b) A drawing of the same graph in the Kandinsky model

Applications 


Orthogonal drawings with the minimum number 
of bends are of interest to VLSI circuit design, ar- 
chitectural floor plan layout, and aesthetic layout 
of diagrams used in information systems design. 
In particular, the orthogonal drawing standard is 
adopted for a wide range of diagrams, including 
entity-relationship diagrams, relational schemes, 
data-flow diagrams, flow charts, UML class dia- 
grams, etc. 


Open Problems 


Several generalizations of the model have been 
proposed in order to deal with graphs of degree 
greater than four. The hexagonal grid, for exam- 
ple, would allow for vertices of maximum degree 
six (see Fig. 3a). Although the bend minimization
problem is polynomial on such a grid [11], decid- 
ing edge lengths becomes NP-hard [2]. 

One of the most popular generalizations is 
the Kandinsky orthogonal drawing standard [7] 
where vertices of arbitrary degree are represented 
as small squares or circles of the same dimen- 
sions, while the first segments of the edges that 
leave a vertex in the same direction run very 
close together (see, e.g., Fig. 3b). Although the 
bend minimization problem in the Kandinsky 
orthogonal drawing standard has been shown to 
be NP-hard [3], this model is of great interest 
for applications. An extension of the flow model 
that makes it possible to solve this problem in 
polynomial time for a meaningful subfamily of 
Kandinsky orthogonal drawings has been pro- 
posed in [1]. Namely, in addition to the drawing 
conventions of the Kandinsky model, each vertex
with degree greater than four has at least one 
incident edge on each side, and each edge leaving 
a vertex either has no bend or has its first bend on 
the right. 
Problem Definition


A setting is assumed in which n selfish users compete for routing their loads in a network. The network is an s-t directed graph with a single source vertex s and a single destination vertex t. The users are ordered sequentially. It is assumed that each user plays after the user before her in the ordering, and the desired end result is a Pure Nash Equilibrium (PNE for short). It is assumed that, when a user plays (i.e., when she selects an s-t path to route her load), the play is a best response (i.e., minimum delay), given the paths and loads of users currently in the net. The problem then is to find the class of directed graphs for which such an ordering exists so that the implied sequence of best responses indeed leads to a Pure Nash Equilibrium.


The Model 
A network congestion game is a tuple $((w_i)_{i \in N}, G, (d_e)_{e \in E})$, where $N = \{1, \dots, n\}$ is the set of users and user $i$ controls $w_i$ units of traffic demand. In unweighted congestion games, $w_i = 1$ for $i = 1, \dots, n$. $G(V, E)$ is a directed graph representing the communications network, and $d_e$ is the latency function associated with edge $e \in E$. It is assumed that the $d_e$’s are non-negative and non-decreasing functions of the edge loads. The edges are called identical if $d_e(x) = x$ for all $e \in E$. The model is further restricted to single-commodity network congestion games, where $G$ has a single source $s$ and destination $t$ and the set of users’ strategies is the set of $s$-$t$ paths, denoted $\mathcal{P}$. Without loss of generality, it is assumed that $G$ is connected and that every vertex of $G$ lies on a directed $s$-$t$ path.

A vector $P = (p_1, \dots, p_n)$ consisting of an $s$-$t$ path $p_i$ for each user $i$ is a pure strategies profile. Let $\ell_e(P) = \sum_{i : e \in p_i} w_i$ be the load of edge $e$ in $P$. The authors define the cost $\lambda^i_p(P)$ for user $i$ routing her demand on path $p$ in the profile $P$ to be

$$\lambda^i_p(P) = \sum_{e \in p \cap p_i} d_e(\ell_e(P)) + \sum_{e \in p \setminus p_i} d_e(\ell_e(P) + w_i).$$

The cost $\lambda^i(P)$ of user $i$ in $P$ is just $\lambda^i_{p_i}(P)$, i.e., the total delay along her path.

A pure strategies profile P is a Pure Nash Equilibrium (PNE) iff no user can reduce her total delay by unilaterally deviating, i.e., by selecting another s-t path for her load while all other users keep their paths.


Best Response 

Let $p_i$ be the path of user $i$ and $P^i = (p_1, \dots, p_i)$ be the pure strategies profile for users $1, \dots, i$. Then the best response of user $i + 1$ is a path $p_{i+1}$ such that

$$p_{i+1} = \arg\min_{p \in \mathcal{P}} \sum_{e \in p} d_e(\ell_e(P^i) + w_{i+1}).$$


Flows and Common Best Response 
A (feasible) flow on the set $\mathcal{P}$ of $s$-$t$ paths of $G$ is a function $f : \mathcal{P} \to \mathbb{R}_{\ge 0}$ such that

$$\sum_{p \in \mathcal{P}} f_p = \sum_{i=1}^{n} w_i.$$

The single-commodity network congestion game $((w_i)_{i \in N}, G, (d_e)_{e \in E})$ has the Common Best Response property if for every initial flow $f$ (not necessarily feasible), all users have the same set of best responses with respect to $f$. That is, if a path $p$ is a best response with respect to $f$ for some user, then for all users $j$ and all paths $p'$

$$\sum_{e \in p'} d_e(f_e + w_j) \ge \sum_{e \in p} d_e(f_e + w_j).$$

Furthermore, every segment $\pi$ of a best response path $p$ is a best response for routing the demand of any user between $\pi$’s endpoints. It is allowed here that some users may already have contributed to the initial flow $f$.


Layered and Series-Parallel Graphs 
A directed (multi)graph $G(V, E)$ with a distinguished source $s$ and destination $t$ is layered iff all directed $s$-$t$ paths have exactly the same length and each vertex lies on some directed $s$-$t$ path.

A multigraph is series-parallel with terminals $(s, t)$ if

1. it is a single edge $(s, t)$ or

2. it is obtained from two series-parallel graphs $G_1$, $G_2$ with terminals $(s_1, t_1)$ and $(s_2, t_2)$ by connecting them either in series or in parallel. In a series connection, $t_1$ is identified with $s_2$, $s_1$ becomes $s$, and $t_2$ becomes $t$. In a parallel connection, $s_1 = s_2 = s$ and $t_1 = t_2 = t$.


Key Results 


The Greedy Best Response Algorithm (GBR) 
GBR considers the users one by one in non-increasing order of weight (i.e., $w_1 \ge w_2 \ge \dots \ge w_n$). Each user adopts her best response strategy given the set of best responses (already adopted in the net) of previous users. No user can change her strategy in the future. Formally, GBR succeeds if the eventual profile $P$ is a Pure Nash Equilibrium (PNE).
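
The GBR procedure is easiest to see on the simplest series-parallel network, a set of parallel links between s and t. The sketch below assumes exactly that setting (each path is a single link) and represents latency functions as hypothetical Python callables; the function names are illustrative and not taken from [3].

```python
def greedy_best_response(weights, latencies):
    """Run GBR on parallel links.

    weights: list of user demands; latencies: one non-decreasing function
    d_e(load) per link. Returns (choice, loads), where choice[i] is the
    link selected by user i.
    """
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    loads = [0] * len(latencies)
    choice = [None] * len(weights)
    for i in order:  # heaviest user first
        best = min(range(len(latencies)),
                   key=lambda e: latencies[e](loads[e] + weights[i]))
        choice[i] = best
        loads[best] += weights[i]
    return choice, loads

def is_pure_nash(weights, latencies, choice, loads):
    """No user may strictly lower her delay by moving alone to another link."""
    for i, e in enumerate(choice):
        current = latencies[e](loads[e])
        for g in range(len(latencies)):
            if g != e and latencies[g](loads[g] + weights[i]) < current:
                return False
    return True

identical = [lambda x: x, lambda x: x]  # two identical links, d_e(x) = x
choice, loads = greedy_best_response([3, 2, 2], identical)
print(choice, loads, is_pure_nash([3, 2, 2], identical, choice, loads))
```

On general series-parallel graphs the best response would be a shortest s-t path under the load-dependent delays rather than a single link; that step is omitted here.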


The Characterization 
In [3] it is shown: 


Theorem 1 If G is an (s-t) series-parallel graph and the game $((w_i)_{i \in N}, G, (d_e)_{e \in E})$ has the common best response property, then GBR succeeds.


Theorem 2 A weighted single-commodity net- 
work congestion game in a layered network with 
identical edges has the common best response 
property for any set of user weights. 


Theorem 3 For any single-commodity network 
congestion game in series-parallel networks, 
GBR succeeds if 


1. The users are identical (i.e., $w_i = 1$ for all $i$)
and the edge-delays are arbitrary but non-
decreasing or

2. The graph is layered and the edges are identi- 
cal (for arbitrary user weights) 


Theorem 4 If the network consists of bunches
of parallel links connected in series, then
a PNE is obtained by applying GBR to each
bunch.

Theorem 5


1. If the network is not series-parallel then there 
exist games where GBR fails, even for 2 iden- 
tical users and identical edges. 

2. If the network does not have the common best 
response property (and is not a sequence of 
parallel links graphs connected in series) then 
there exist games where GBR fails, even for 2- 
layered series-parallel graphs. 


Examples of such games are provided in [3]. 


Applications 


GBR has a natural distributed implementation 
based on a leader election algorithm. Each player 
is now represented by a process. It is assumed 
that processes know the network and the edge 
latency functions. The existence of a message 
passing subsystem and an underlying synchro- 
nization mechanism (e.g., logical timestamps) is 
assumed, that allows a distributed protocol to 
proceed in logical rounds. 

Initially all processes are active. In each round 
they run a leader election algorithm and deter- 
mine the process of largest weight (among the 
active ones). This process routes its demand on 
its best response path, announces its strategy to 
all active processes, and becomes passive. Notice 
that each process can compute its best response 
locally. 


Open Problems 


What is the class of networks where (identical) 
users can achieve a PNE by a k-round repetition 
of a best responses sequence? What happens to 
weighted users? In general, how does the network
topology affect best response sequences? Such
open problems are a subject of current research. 
Problem Definition


Recent developments in phylogenetics have provided evidence that evolutionary histories cannot always be represented as a single tree; thus, more sophisticated representations are needed. Phylogenetic networks are natural extensions of phylogenetic trees that have recently gained general consensus in the literature. Let Λ be a finite set of labels, representing a set of extant species (taxa). A rooted phylogenetic network N over Λ (or, simply, phylogenetic network or network) is a directed acyclic connected graph N = (V(N), A(N)) containing a unique vertex with no incoming arcs, called the root of N, and a labeling function from the set L(N) of leaves of N to the set of labels Λ. The set of labels associated with the leaves of N is denoted by Λ(N), and phylogenetic networks whose leaves are in bijection with the set of labels are called uniquely labeled. The undirected edges underlying the set A(N) are denoted by E(N).

We will discuss two important families of 
problems where phylogenetic networks have 
been introduced: consensus network computation 
and tree reconciliation. Other models (and the 
related problems) for representing and recon- 
structing non-treelike evolutionary scenarios are 
presented in [7]. The family of the consensus 
network computation problems asks for a single 
phylogenetic network (called consensus network) 
that best summarizes all the information provided 
by a collection of “structures” representing the 
evolutionary relationships among the set of taxa 
Λ. The family of tree reconciliation problems,
instead, analyzes the evolution of a gene in order 
to either reconstruct the evolution of a set of 
species or infer the evolutionary scenario of the 
considered gene. 

Observe that it is possible to topologically sort 
the vertices of a phylogenetic network, so that 
each vertex always appears after all its predeces- 
sors; hence, it is possible to define the children, 
parents, ancestors, and descendants of a given 
vertex, as usual for trees. Furthermore, as for 
trees, we can define the least common ancestor 
of a set of nodes. Given a subset A of nodes
of a phylogenetic network N, the least
common ancestor (or, shortly, lca) of A in N
is a node x of N that is an ancestor of each
node in A and that is the furthest such node from
the root.
Consensus Network Reconstruction

The aim of consensus network reconstruction 
problems is computing a unique phylogenetic 
network (called consensus network) that best 
summarizes all the information provided by 
a collection of “structures” representing the 
evolutionary relationships among the set of taxa 
Λ. Different specific computational problems
have been defined in the literature depending (i)
on the kind of input structures considered and (ii)
on the definition of the optimality criterion used
to choose the best consensus network. Simple 
formulations (such as maximum agreement 
subtree (MAST), maximum compatible tree 
(MCT), maximum agreement supertree (MASP)) 
compute a consensus network which is actually 
a phylogenetic tree. However, trees are not 
always sufficient for describing conflicting 
information and real evolutionary scenarios; 
hence, formulations which attempt to reconstruct 
phylogenetic networks have been proposed. 

In terms of optimality criterion, two aims can 
be pursued: either finding the largest set of taxa 
that “share” (as defined below) a common sub- 
structure or finding the “simplest” network that 
represents all taxa. For measuring the complexity 
of a network, two natural parameters are consid- 
ered: the reticulation number and the level of the 
network. Reticulation (or hybrid) nodes are nodes 
of the network with more than one parent. The 
reticulation number of a network N is defined as 
|E(N)|—|V(N)|+1 and represents how “far” the 
network is from a phylogenetic tree (which has 
reticulation number 0). If all reticulation nodes 
have indegree 2 (which is often assumed in the 
literature), then the reticulation number is equal 
to the number of reticulation nodes. The level of
a network N is defined as the maximum number
of reticulation nodes in a biconnected component
of (V(N), E(N)) (i.e., the undirected graph obtained
from the network) [6]. Phylogenetic trees
are level-0 networks, while level-1 networks are 
often called galled trees [13]. 
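
The reticulation number and the set of reticulation nodes are straightforward to compute from an arc list. The following sketch assumes the network is given as a hypothetical list of (parent, child) pairs without parallel arcs, so that the number of underlying undirected edges equals the number of arcs; the function name is illustrative.

```python
def reticulation_stats(arcs):
    """Return (reticulation number, set of reticulation nodes).

    arcs: list of (parent, child) pairs of a rooted phylogenetic network,
    assumed to contain no parallel arcs.
    """
    vertices = {u for arc in arcs for u in arc}
    indegree = {v: 0 for v in vertices}
    for _, child in arcs:
        indegree[child] += 1
    # Reticulation (hybrid) nodes have more than one parent.
    reticulation_nodes = {v for v, d in indegree.items() if d > 1}
    # |E(N)| - |V(N)| + 1, which is 0 for a tree.
    reticulation_number = len(arcs) - len(vertices) + 1
    return reticulation_number, reticulation_nodes

# A small network with one hybrid node h whose parents are a and b.
network = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
           ("h", "x"), ("a", "y"), ("b", "z")]
print(reticulation_stats(network))
```

Since the single reticulation node in this example has indegree 2, the reticulation number coincides with the number of reticulation nodes, as stated above.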

In terms of kinds of input structures, several 
options have been studied, and, among them, 
the most important ones are phylogenetic net- 
works, triplets/quartets, and clusters. When input 
structures are phylogenetic networks, the usual 
aim is to reconstruct their maximum agreement 
subnetwork (MASN) [6], that is, a level-k phylogenetic network N' (for some fixed k) uniquely labeled with a set Λ' ⊆ Λ of maximum cardinality such that N' is a subgraph of the topological restriction of each input network w.r.t. Λ'. The topological restriction of a uniquely labeled network N to a subset Λ' ⊆ Λ of labels is defined as the network obtained by first deleting all nodes which are not on any directed path from the root to one of the leaves labeled in Λ', along with their incident edges, and then, for every node with outdegree 1 and indegree less than 2, contracting its outgoing edge. (Notice that the MAST problem is a special case of the MASN problem when k = 0.)

Triplets are rooted binary phylogenetic trees 
on exactly three species/leaves. They are gen- 
erally represented as xy|z, indicating that the 
parent of x and y is a child of the parent of z. 
The consensus network reconstruction problem 
from triplets is the problem of finding, if possible, 
a phylogenetic network N consistent with each 
triplet given as input (or with the maximum num- 
ber of them). A phylogenetic network N is said 
to be consistent with a triplet xy|z if N contains 
two distinct vertices u, v and the four pairwise
internally vertex-disjoint paths u → x, u → y,
v → u, v → z. The resulting phylogenetic
network is required to either have minimum level 
or have fixed level (possibly with minimum retic- 
ulation number) [18, 26,27]. The related problem 
on quartets (i.e., unrooted phylogenetic trees on 
four species) has been also proposed [8]. 

Clusters are (strict) subsets of Λ. The consensus network reconstruction problem from clusters is the problem of finding a phylogenetic network N that represents all clusters given as input. There exist two main definitions for “represents,” commonly referred to as the “hardwired” and the “softwired” definitions. A network N represents a cluster Λ' ⊂ Λ in the hardwired sense if there exists an arc (u, v) ∈ A(N) such that Λ' is exactly the set of labels associated with the leaves of the subnetwork rooted in v [16]. Instead, a network N represents a cluster Λ' ⊂ Λ in the softwired sense if there exists an arc (u, v) ∈ A(N) such that Λ' is the set of labels associated with the leaves of a subtree rooted in v obtained by removing, for each reticulation node, all edges but one directed to that node [15]. In both cases, the computational problems focus on reconstructing networks with minimum level [29].


Reconciliation of Gene Trees and Species 
Trees 

The evolution of a family of homologous genes 
in a given set of species is usually represented 
as a phylogenetic tree, called a gene tree, where 
each label can be associated with more than one 
leaf, while the evolution of the considered set 
of species is called a species tree, which is a 
uniquely labeled tree. Due to different evolution- 
ary events that affect gene evolution (duplica- 
tions, losses, lateral gene transfer), the evolution 
represented by a gene tree and a species tree (or 
by two different gene trees) can be different. 

Two fundamental combinatorial problems 
have been studied in this field: the reconstruc- 
tion of the species tree associated with the 
homologous genes considered [5, 12, 21] and 
the reconciliation of a gene tree with a given 
species tree [3, 4, 9, 21, 23], whose goal is the 
inference of the evolutionary events that occurred 
in the evolution of the genes. Here, we consider only
the latter problem; hence, we assume that a gene 
tree (or a set of gene trees) and a (correct) species 
tree are given. 

Given a set S of taxa, a species tree $T_S$ and a gene tree $T_G$ are two rooted binary trees, leaf-labeled by S, with $\Lambda(T_G) \subseteq \Lambda(T_S)$. Two nodes of a tree T are comparable when one is an ancestor of the other. $T_G$ and $T_S$ are compared by introducing a mapping $\lambda : V(T_G) \to V(T_S)$, which usually corresponds to the least common ancestor mapping. Three biological events are considered for the evolution of gene families: duplications, losses, and lateral gene transfers.

A duplication is a copy of a given gene, after which the two copies evolve independently. A duplication occurs in an internal node $g$ of $T_G$ if and only if $\lambda(f(g)) = \lambda(g)$ for a child $f(g)$ of $g$. A loss of a gene in some species consists of a copy of a gene disappearing during the evolution of a given gene family. The losses can be computed from the mapping $\lambda$ between $T_G$ and $T_S$ [21].
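
The least common ancestor mapping and the duplication test can be sketched on small trees. The code below assumes, purely for illustration, that trees are nested Python tuples whose leaves are label strings, that the species tree is uniquely labeled, and that each gene leaf maps to the species leaf with the same label; the function names are hypothetical.

```python
def index_species_tree(tree, parent=None, depth=0, info=None):
    """Record (parent, depth) for every node of the species tree."""
    info = {} if info is None else info
    info[tree] = (parent, depth)
    if isinstance(tree, tuple):
        for child in tree:
            index_species_tree(child, tree, depth + 1, info)
    return info

def lca(u, v, info):
    """Least common ancestor of two species-tree nodes: climb from the deeper one."""
    while u != v:
        if info[u][1] >= info[v][1]:
            u = info[u][0]
        else:
            v = info[v][0]
    return u

def lca_map(gene_tree, info, duplications):
    """Return the image of gene_tree's root under the lca mapping.

    Internal gene nodes mapped to the same species node as one of their
    children are appended to `duplications`.
    """
    if not isinstance(gene_tree, tuple):
        return gene_tree  # leaf: mapped to the species leaf with the same label
    left, right = gene_tree
    image_left = lca_map(left, info, duplications)
    image_right = lca_map(right, info, duplications)
    image = lca(image_left, image_right, info)
    if image == image_left or image == image_right:
        duplications.append(gene_tree)
    return image

species = (("a", "b"), "c")
gene = ((("a", "b"), "a"), "c")  # the gene family has two copies in species a
found = []
root_image = lca_map(gene, index_species_tree(species), found)
print(root_image, found)
```

In this example only the gene node with children ("a", "b") and "a" is reported as a duplication, since it and its first child are both mapped to the species node ("a", "b").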
When duplications and losses are the only
evolutionary events considered, a gene tree and a
species tree are compared with a reconciled tree
$R(T_G, T_S)$ [2,5,9-11,21,23]. The reconciled tree
is a binary tree that represents an embedding of
a gene tree inside a species tree, and it makes it
possible to identify when duplications and losses occur.
However, when considering also lateral gene 
transfers (also called horizontal gene transfers), 
the scenario changes, and the evolutionary history 
of a gene family must be represented by a phy- 
logenetic network. A lateral gene transfer occurs 
when some genetic material is transferred from a 
taxon to another taxon which is not a descendant 
of the first taxon. 

In order to represent a reconciliation of a gene tree $T_G$ and a species tree $T_S$ that considers duplications, losses, and lateral gene transfers as evolutionary events, the definition of duplication-transfer-loss scenario (DTL-scenario) has been introduced [25]. Notice that other models of reconciliation have been proposed, notably [10].


Definition 1 A DTL-scenario is a tuple $S = (T_S, T_G, \sigma, \lambda^*, \Sigma, \Delta, \Gamma, \Theta)$, where $T_S$ is a species tree; $T_G$ is a gene tree; $\sigma$ maps each leaf of $T_G$ to the corresponding leaf of $T_S$; $\lambda^*$ maps each node of $T_G$ to a node of $T_S$ ($\lambda^*$ can be considered a generalization of the least common ancestor mapping $\lambda$); $\{\Sigma, \Delta, \Gamma\}$ is a tripartition of the internal nodes of $T_G$ into speciation nodes, duplication nodes, and transfer nodes, respectively; $\Theta$ is a subset of the edges of $T_G$; and the following properties hold:


1. For each leaf $u$ of $T_G$, $\lambda^*(u) = \sigma(u)$.

2. Consider a node $x$ with children $x_l$ and $x_r$; then $\lambda^*(x)$ is not a proper descendant of $\lambda^*(x_l)$ or $\lambda^*(x_r)$, and at least one of $\lambda^*(x_l)$, $\lambda^*(x_r)$ is a descendant of $\lambda^*(x)$.

3. Given an edge $(x, y)$ of $T_G$, $(x, y) \in \Theta$ if and only if $\lambda^*(x)$ is not comparable with $\lambda^*(y)$.

4. Given a node $x$ of $T_G$ with children $x_l$ and $x_r$:

(a) $x \in \Gamma$ if and only if $(x, x_l) \in \Theta$ or $(x, x_r) \in \Theta$.

(b) $x \in \Sigma$ only if $\lambda^*(x)$ is the least common ancestor of $\lambda^*(x_l)$ and $\lambda^*(x_r)$, and $\lambda^*(x_l)$ and $\lambda^*(x_r)$ are not comparable.

(c) $x \in \Delta$ only if $\lambda^*(x)$ is an ancestor of the least common ancestor of $\lambda^*(x_l)$ and $\lambda^*(x_r)$, and $\lambda^*(x_l)$ and $\lambda^*(x_r)$ are comparable.

5. Consider two edges $(x, x') \in \Theta$ and $(y, y') \in \Theta$, with $x'$ an ancestor of $y'$; then $\lambda^*(x')$ is an ancestor of $\lambda^*(y)$.


The number of losses is directly inferred from a given scenario S [1,25].

Now, we briefly discuss the biological motivations for the conditions introduced. Condition 1 guarantees the correspondence of each leaf of $T_G$ with the corresponding species (leaf) of $T_S$. Condition 2 guarantees that the order on the nodes of $T_G$ is preserved by the mapping $\lambda^*$. Condition 3 defines the edges associated with a lateral gene transfer. Condition 4 establishes that the nodes of $T_G$ can be associated with a lateral gene transfer (condition 4.a), with a speciation (condition 4.b, in which case each node x and its two children must be mapped to different nodes of $T_S$), or with a duplication (condition 4.c, in which case each node x and at least one of its two children must be mapped to the same node of $T_S$). The last condition (condition 5) is introduced to ensure that different lateral gene transfers are biologically meaningful, that is, that those events relate coexisting species and that, if (x, x') is a lateral gene transfer, then there is no lateral gene transfer (y, y') where y is a proper ancestor of x and y' is a proper descendant of x'.


Key Results 


Consensus Networks 
The maximum agreement subnetwork (MASN) 
problem is NP-hard even if the input is com- 
posed of a binary tree and an unbounded-level 
network [17]. If the input is composed of two 
level-1 networks, the problem can be solved in
time $O(n^2)$, and it is fixed-parameter tractable if
the input is composed of two level-k networks 
(where k is the parameter) [6]. 

Given a set of triplets, constructing a level-k phylogenetic network consistent with all of them is NP-hard for all k ≥ 1, while it is NP-hard for all k ≥ 0 if we want to construct a level-k network consistent with the maximum number of them [28]. If the input set T of triplets is dense (i.e., it contains a triplet for each three-element subset of taxa), then a level-k network consistent with all triplets can be found (if one exists) in time $O(|T|)$ for k = 1 [18], in time $O(|T|^{8/3})$ for k = 2 [27], and in polynomial time for any fixed k [24]. Recently, these results have been extended in order to minimize the reticulation number of the computed network [14, 26].

The reconstruction of a consensus network from a set of clusters in the hardwired sense has been tackled in [16], where an algorithm for reconstructing a phylogenetic network that represents a set C of clusters (and only C) with O(|C|) nodes and O(|C|²) edges is presented. An algorithm for reconstructing a level-1 network from clusters in the softwired sense has been shown in [15], which has later been extended in order to compute in polynomial time a level-k network or a network with reticulation number k for every fixed k [19]. An efficient algorithm for computing a level-k network (with k ∈ {1, 2}) representing a set of clusters in the softwired sense which also attempts to minimize the reticulation number has also been presented [29].


Reconciliation 

The main combinatorial problem related to rec- 
onciliation is, given a species tree and a gene tree, 
the computation of a DTL-scenario of minimum 
cost, for some function that assigns positive cost 
to duplications, losses, and lateral gene transfers. 
The problem is known to be NP-hard [22, 25]. 
However, two tractable variants of the problem 
have been considered. 

A first variant, called cyclic DTL-scenario, 
does not consider condition 5 of Definition 1. 
Computing a cyclic DTL-scenario of minimum 
cost is polynomial time solvable. First, an algorithm
for a cost function that assigns positive cost
only to duplications and lateral gene transfers was
presented [25]. Later a linear time algorithm for
a general cost function (hence losses are assigned
a positive cost) was given [1].

A second variant considered is the dated version,
where nodes of the species tree are associated
with labels that represent the divergence time and
a lateral gene transfer is possible only between
coexisting species. In this case, computing an
acyclic DTL-scenario of minimum cost is polynomial
time solvable [20].


Open Problems 


The fixed-parameter tractability of computing a
minimum reticulation-number phylogenetic network
that combines a set of nonbinary trees has
not yet been assessed, and only a few attempts
focus on approximate solutions. Answers to both
questions could be of interest.

An interesting open problem related to the 
computation of a DTL-scenario of minimum cost 
is the investigation of the parameterized complex- 
ity for acyclic DTL-scenarios, when parameter- 
ized by the cost of the solution or by the num- 
ber of lateral gene transfers. Another interesting 
direction for this problem is to investigate its 
approximation complexity.


Problem Definition


This problem concerns hypergraph dualization 
and generalization to poset dualization. 

A hypergraph $\mathcal{H} = (V, \mathcal{E})$ consists of a finite
collection $\mathcal{E}$ of sets over a finite set $V$, i.e.,
$\mathcal{E} \subseteq \mathcal{P}(V)$ (the powerset of $V$). The elements of
$\mathcal{E}$ are called hyperedges, or simply edges. A hypergraph
is said to be simple if none of its edges is contained
within another. A transversal (or hitting set) of
$\mathcal{H}$ is a set $T \subseteq V$ that intersects every edge
of $\mathcal{E}$. A transversal is minimal if it does not
contain any other transversal as a subset. The
set of all minimal transversals of $\mathcal{H}$ is denoted
by $Tr(\mathcal{H})$. The hypergraph $(V, Tr(\mathcal{H}))$ is called
the transversal hypergraph of $\mathcal{H}$. Given a simple
hypergraph $\mathcal{H}$, the hypergraph dualization
problem (TRANS-ENUM for short) concerns the
enumeration without repetitions of $Tr(\mathcal{H})$.
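
To make the definition of $Tr(\mathcal{H})$ concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms discussed in this entry (it takes time exponential in $|V|$), and the function name `minimal_transversals` is an illustrative assumption.

```python
from itertools import combinations

def minimal_transversals(vertices, edges):
    """Enumerate Tr(H) by brute force, scanning subsets by increasing size."""
    vertices = sorted(vertices)
    edges = [frozenset(e) for e in edges]
    found = []
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            t = frozenset(cand)
            # A transversal must intersect every edge.
            if not all(t & e for e in edges):
                continue
            # Any non-minimal transversal contains a smaller minimal one,
            # which was already recorded at an earlier size.
            if any(m <= t for m in found):
                continue
            found.append(t)
    return found

# Example: the path hypergraph with edges {1,2}, {2,3}, {3,4}.
tr = minimal_transversals({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}])
```

Each minimal transversal is produced exactly once, which matches the "without repetitions" requirement of TRANS-ENUM on small instances.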

The TRANS-ENUM problem can also be formulated
as a dualization problem in posets. Let
$(P, \le)$ be a poset (i.e., $\le$ is a reflexive,
antisymmetric, and transitive relation on the set $P$).
For $A \subseteq P$, $\downarrow A$ (resp. $\uparrow A$) is the downward
(resp. upward) closure of $A$ under the relation
$\le$ (i.e., $\downarrow A$ is an ideal and $\uparrow A$ a filter of
$(P, \le)$). Two antichains $(B^+, B^-)$ of $P$ are said
to be dual if $\downarrow B^+ \cup \uparrow B^- = P$ and
$\downarrow B^+ \cap \uparrow B^- = \emptyset$. Given an implicit description
of a poset $P$ and an antichain $B^+$ (resp. $B^-$) of
$P$, the poset dualization problem (DUAL-ENUM
for short) enumerates the set $B^-$ (resp. $B^+$),
denoted by $Dual(B^+) = B^-$ (resp. $Dual(B^-) = B^+$).
Notice that the function $Dual$ is an involution,
i.e., $Dual(Dual(B)) = B$.

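TRANS-ENUM is a particular case of DUAL-ENUM.
Indeed, consider $P$ the poset $(\mathcal{P}(V), \subseteq)$
for some set $V$. Then for every dual pair $(B^+, B^-)$
of $P$, we have $B^- = Tr(\overline{B^+}) = Dual(B^+)$, or
equivalently $B^+ = \overline{Tr(B^-)} = Dual(B^-)$, with
$\overline{\mathcal{E}} = \{V \setminus E \mid E \in \mathcal{E}\}$ where $\mathcal{E} \subseteq \mathcal{P}(V)$.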
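
The correspondence between dualization in the Boolean lattice and minimal transversals of the complemented hypergraph can be checked on small instances. The sketch below is a brute-force illustration only; the helper names (`powerset`, `dual_of_positive`, `transversals_of_complements`) are assumptions made for this example.

```python
from itertools import combinations

def powerset(vertices):
    vs = sorted(vertices)
    return [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]

def dual_of_positive(vertices, b_plus):
    """Dual(B+): minimal subsets of V that lie outside the down-set of B+."""
    b_plus = [frozenset(b) for b in b_plus]
    outside = [s for s in powerset(vertices) if not any(s <= b for b in b_plus)]
    return {s for s in outside if not any(o < s for o in outside)}

def transversals_of_complements(vertices, b_plus):
    """Minimal transversals of the hypergraph of complements of B+."""
    v = frozenset(vertices)
    comp = [v - frozenset(b) for b in b_plus]
    hitting = [s for s in powerset(vertices) if all(s & e for e in comp)]
    return {s for s in hitting if not any(o < s for o in hitting)}

V = {1, 2, 3}
B_plus = [{1, 2}, {3}]
lhs = dual_of_positive(V, B_plus)
rhs = transversals_of_complements(V, B_plus)
```

On this instance both computations return the antichain {{1, 3}, {2, 3}}: a set avoids the down-set of $B^+$ exactly when it meets the complement of every member of $B^+$.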

Now we ask the following question: For which
posets can DUAL-ENUM be reduced to TRANS-ENUM?
To answer it, we introduce the notions
of duality gap, convex embedding, and poset
reflection.

Let $(P, \le_P)$ and $(Q, \le_Q)$ be two posets and
$f : P \to Q$ an injective reflection, i.e., for
all $x, y \in P$, $f(x) \le_Q f(y)$ implies $x \le_P y$.
Notice that the reflection $f$ preserves
incomparability, i.e., if $x$ and $y$ are incomparable
in $P$, then $f(x)$ and $f(y)$ are incomparable
in $Q$. Therefore, for every dual pair $(B^+, B^-)$
of $P$, $Dual(f(B^+))$ contains $f(B^-)$. The
difference between the size of $Dual(f(B^+))$ and
the size of $f(B^-)$ is a nonnegative integer, called
the duality gap. We speak about weak duality
when the gap is strictly positive, strong duality
otherwise.

Duality gaps are important in enumeration 
problems because they provide an upper bound 
on the difference between the number of enumer- 
ated solutions and the number of solutions of the 
original problem.


Key Results


TRANS-ENUM has been intensively studied in the
last two decades, and several results show that
it is equivalent to many problems in computer
science (see the paper by Eiter and Gottlob [3]).
The question whether TRANS-ENUM
admits an output-polynomial time algorithm is
still open. In fact, despite the number of papers
on TRANS-ENUM, the best known algorithm is
the one by Fredman and Khachiyan [8], which
runs in time $O(n^{\log n})$ where $n$ is the size of the
hypergraph plus the number of minimal transversals.
Other results on complexity can be found in
[5, 6, 11, 12, 14]. For general posets, it is shown in
[7] that the dualization over the products of some
posets can be done with the same complexity as
TRANS-ENUM. Recently, Nourine and Petit [16]
have investigated dualization problems in general
posets for which the duality gap is bounded by a
polynomial.


Strong Duality 

The following characterization theorem of the 
zero gap is a reformulation of a known result in 
[10, 15], where the poset Q is the powerset for 
some set. 


Theorem 1 Let $(P, \le_P)$ and $(Q, \le_Q)$ be two
posets. Then the duality gap is zero iff there exists
a map $f : P \to Q$ such that $f$ is a bijective
embedding, i.e., for all $x, y \in P$, $f(x) \le_Q f(y)$
iff $x \le_P y$.


Many problem instances have this property,
for example, frequent itemsets, monotone
Boolean functions, minimal keys, inclusion
dependencies, or minimal dominating sets [10, 13,
15]. Nevertheless, a bijective embedding between
two posets does not always exist. In the
following we give a relaxation of the bijective
embedding in order to capture some polynomial
reductions between enumeration problems.


Weak Duality 

Let $(P, \le_P)$ and $(Q, \le_Q)$ be posets. A function
$f : P \to Q$ is a convex embedding if for all
$x, y \in P$ and $z \in Q$, $x \le_P y$ iff $f(x) \le_Q f(y)$,
and $f(x) \le_Q z \le_Q f(y)$ implies there exists
$t \in P$ such that $f(t) = z$.

The following result can be seen as a
relaxation of the bijective embedding given in
Theorem 1.


Proposition 1 Let $(P, \le_P)$ and $(Q, \le_Q)$ be
two posets and $f : P \to Q$ a convex
embedding. Then there exist two antichains $B_0^+$,
$B_0^-$ of $Q$ such that $P \setminus \{\bot_P\}$ is isomorphic to
$Q \setminus (\downarrow B_0^+ \cup \uparrow B_0^-)$, where $\bot_P$ is the bottom of
$P$ if it exists. Furthermore, the duality gap is
bounded by $|B_0^+| + |B_0^-|$.


Complexity 

For strong duality, [10, 15] point out how
the result of Fredman and Khachiyan [8] can
be reused to devise an incremental quasi-polynomial
time algorithm, called Dualize
and Advance, for some pattern mining
problems. For weak duality, whenever the duality
gap remains polynomial in the size of the problem
and $(Q, \le_Q)$ is isomorphic to $(\mathcal{P}(E), \subseteq)$ for some
set $E$, the Dualize and Advance algorithm
can be reused with the same complexity if the
following assumptions hold:


1. The reflection $f$ of $(P, \le)$ to $(\mathcal{P}(E), \subseteq)$ and
its inverse are computable in polynomial time.

2. Given two elements $x, y \in P$, checking $x \le y$
takes polynomial time.


Applications 


The hypergraph dualization is a crucial step in 
many applications in logics, databases, artificial 
intelligence, and pattern mining [3, 4, 8, 11, 15], 
especially for hypergraphs, i.e., Boolean lattices. 
The main application domain concerns pattern 
mining problems, i.e., the identification of max- 
imal interesting patterns in database by asking 
membership queries (predicate) to a database. In 
the rest of this section, we give two examples of 
pattern mining problems related to DUAL-ENUM 
and weak duality. 


Frequent Conjunctive Queries 

We consider the problem statement defined in [9].
Let $R = \{R_1, \ldots, R_n\}$ be a database schema,
$D$ the domain of $R$ and $sch(R) = \{R_i.A \mid R_i \in R, A \in R_i\}$.
A (simple) conjunctive query $Q$
over $R$ is of the form $\pi_X(\sigma_F(R_1 \times \cdots \times R_n))$
($\pi_X(\sigma_F)$ for short) where $X \subseteq sch(R)$ and $F$ is
a conjunction of equalities of the form $R_i.A = R_j.B$
or $R_i.A = c$ with $R_i.A, R_j.B \in sch(R)$
and $c \in D$. Let $Q_R$ be the set of all possible
conjunctive queries over $R$. For a given database
$d$ over $R$, we denote by $Adom(d) \subseteq D$ the active
domain of $d$ and by $Q(d)$ the result of the evaluation
of $Q$ against $d$. We denote by $\mathcal{F}$ the finite set of all
possible selection formulas over $R$ and $Adom(d)$,
i.e., $\mathcal{F} = \{\{A, B\} \mid A \ne B, A \in R, B \in R \cup Adom(d)\}$.

Let $Q_1$, $Q_2$ be two conjunctive queries over
$R$. $Q_1$ is contained in $Q_2$, denoted by $Q_1 \subseteq Q_2$,
if for every database $d$ over $R$, $Q_1(d) \subseteq Q_2(d)$.
$Q_1$ is diagonally contained in $Q_2$, denoted
$Q_1 \subseteq^{\Delta} Q_2$, if $Q_1$ is contained in a projection
of $Q_2$, i.e., $Q_1 \subseteq \pi_X(Q_2)$. The frequency
of $\pi_X(\sigma_F)$ in $d$ is defined by $|\pi_X(\sigma_F)(d)|$.
A query $\pi_X(\sigma_F)$ is frequent in $d$ with respect
to a given threshold $\epsilon$ if $|\pi_X(\sigma_F)(d)| \ge \epsilon$.
The frequency is anti-monotonic with respect
to $\subseteq^{\Delta}$ [9].


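Proposition 2 Let $Q_1 = \pi_{X_1}(\sigma_{F_1})$ and $Q_2 = \pi_{X_2}(\sigma_{F_2})$
be two queries of $Q_R$. Then $Q_1 \subseteq^{\Delta} Q_2$
iff $X_1 \subseteq X_2$ and $F_2 \subseteq F_1$. Equivalently, $Q_1 \subseteq^{\Delta} Q_2$
iff $X_1 \cup (\mathcal{F} \setminus F_1) \subseteq X_2 \cup (\mathcal{F} \setminus F_2)$.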


From Proposition 2, $f : Q_R \to \mathcal{P}(R \cup \mathcal{F})$
with $f(\pi_X(\sigma_F)) = X \cup (\mathcal{F} \setminus F)$ is a bijective
embedding. Thus $Q_R$ ordered under $\subseteq^{\Delta}$ is a
Boolean lattice and Theorem 1 can be applied.
It is interesting to consider the subclass of $Q_R$
restricted to consistent queries, i.e., queries for
which there exists at least one database on
which their evaluation returns a nonempty result.
For instance, $\sigma(B = 1 \wedge B = 2)$ and
$\sigma(A = B \wedge A = 1 \wedge B = 2)$ are not consistent.
Let us consider the set $Q_C \subseteq Q_R$ of all consistent
queries.


Lemma 1 Let $Q_1 = \pi_{X_1}(\sigma_{F_1})$ and $Q_2 = \pi_{X_2}(\sigma_{F_2})$
be two queries of $Q_R$ such that $Q_1 \subseteq^{\Delta} Q_2$.
Then if $Q_1$ is consistent, it implies $Q_2$ is
consistent.


Notice that the restriction of $f$ to $Q_C$ is still a
convex embedding, but no longer bijective. More
interestingly, the associated duality gap is not
polynomial. Indeed, $B_0^- = \emptyset$ but $B_0^+$ has a size
exponential in the size of $R \cup Adom(d)$ since
the number of selections of the form
$\sigma(A_1 = A_2 \wedge \cdots \wedge A_{n-1} = A_n \wedge A_1 = v \wedge A_n = v')$ is
exponential in the number of attributes.


Rigid Sequences 

Let us consider sequences with or without wildcard
(denoted $\star$); see, e.g., [1]. Let $\Sigma$ be an
alphabet and $\star \notin \Sigma$. A rigid sequence $s[n]$ is a
word of size $n$ of $(\Sigma \cup \{\star\})^*$ such that $s[1] \ne \star$
and $s[n] \ne \star$. The set of all rigid sequences of
size at most $n$ is denoted by $\Sigma_R^n$, and the empty
sequence by $\epsilon$. Let $s[l], t[k] \in \Sigma_R^n$. We consider
the following classical (prefix and factor) partial
orders on rigid sequences:


• $s \sqsubseteq_f t$, if there exists $j \in [1 \ldots k]$ such that
for every $i \in [1 \ldots l]$, either $s[i] = t[j + i - 1]$
or $s[i] = \star$ (factor).

• $s \sqsubseteq_p t$, if for every $i \in [1 \ldots l]$, either $s[i] = t[i]$
or $s[i] = \star$ (prefix).
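
The two orders can be sketched directly from their definitions. The following Python snippet is only an illustration: it represents sequences as strings, uses the character `*` for the wildcard, and assumes that the shorter sequence must fit entirely inside the longer one (so $l \le k$); the function names are hypothetical.

```python
WILDCARD = "*"

def is_rigid(s):
    """Empty, or neither the first nor the last symbol is the wildcard."""
    return len(s) == 0 or (s[0] != WILDCARD and s[-1] != WILDCARD)

def prefix_leq(s, t):
    """Prefix order: s matches t position by position from the start."""
    if len(s) > len(t):
        return False
    return all(a == WILDCARD or a == b for a, b in zip(s, t))

def factor_leq(s, t):
    """Factor order: s matches t at some offset j (0-based here)."""
    if len(s) > len(t):
        return False
    return any(
        all(s[i] == WILDCARD or s[i] == t[j + i] for i in range(len(s)))
        for j in range(len(t) - len(s) + 1)
    )
```

For instance, `b*d` is below `abcd` in the factor order (offset 1) but not in the prefix order, whereas `a*c` is below `abcd` in both.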


The following theorem shows that the duality
gap between the dualization in prefix posets of
rigid sequences and TRANS-ENUM is bounded by
a polynomial in $n$ and $|\Sigma|$.


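Theorem 2 ([16]) Let $f : (\Sigma_R^n \setminus \{\epsilon\}, \sqsubseteq_p) \to (\mathcal{P}(\{1, \ldots, n\} \times \Sigma), \subseteq)$
be a function defined by
$f(s) = \{(i, s[i]) \mid s[i] \ne \star, i \le n\}$. Then $f$ is
a convex embedding with
$B_0^+ = \{\{(i, x) \mid x \in \Sigma, i \in [2 \ldots n]\}\}$ and
$B_0^- = \{\{(1, x), (1, y)\} \mid x, y \in \Sigma, x \ne y\} \cup \{\{(1, x), (i, y), (i, z)\} \mid x, y, z \in \Sigma, y \ne z, i \in [2 \ldots n]\}$.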


Proposition 3 ([16]) There is a poset reflection
$f : (\Sigma_R^n, \sqsubseteq_f) \to (\Sigma_R^n, \sqsubseteq_p)$ with a duality gap
bounded by a polynomial in $n$.


Using Theorem 2 and Proposition 3, we conclude
that the duality gap between the dualization
in factor posets of rigid sequences and TRANS-ENUM
is bounded by a polynomial in $|\Sigma|$
and $n$ [16].


Open Problems 


1. The challenging question is to find an output-polynomial
time algorithm for TRANS-ENUM.

2. Lattices are a particular class of posets. For example,
the dualization over products of chains
can be done with the same complexity as
TRANS-ENUM, which is equivalent to dualization
in Boolean lattices. For the class of distributive
lattices, which contains Boolean lattices and
products of chains, the complexity of dualization is open.

3. Many connections remain to be made between
TRANS-ENUM and graph theory problems,
such as minimal dominating sets [13].

4. Many problems in data mining can be formu- 
lated as dualization in posets, e.g., frequent 
subgraphs or frequent subtrees. An interesting 
direction is to identify posets for which the 
dualization is equivalent to TRANS-ENUM. 


URLs to Code and Data Sets 


Program codes and instances for hypergraph
dualization can be found on Takeaki
Uno's webpage at http://research.nii.ac.jp/~uno/dualization.html.
Some pattern mining problems,
reducible to TRANS-ENUM with strong duality,
can be found on the iZi webpage at http://liris.cnrs.fr/izi/.


Problem Definition


Over the last few years, differential privacy [5,6] 
has emerged as one of the most accepted no- 
tions of statistical data privacy. At a high level 
differential privacy ensures that from the output 
of an algorithm executed on a data set of po- 
tentially sensitive records, an adversary learns 
“almost” the same thing about an individual irre- 
spective of his presence or absence in the data set. 
Formally, differential privacy is defined below 
(Definition 1). Setting the privacy parameters
$\epsilon < 1$ and $\delta \ll 1/n$ ensures semantically
meaningful privacy guarantees. For a detailed
survey on the semantics of differential privacy,
see [2, 3, 8, 9].


Definition 1 (($\epsilon, \delta$)-differential privacy [5, 6])
We call two data sets $D$ and $D'$ (with $n$ records
from a fixed domain $\tau$) neighboring if they differ
in exactly one entry, i.e., $|D \triangle D'| = 2$. An
algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if, for
all neighboring data sets $D$ and $D'$ and for all
measurable events $S$ in the range space of $\mathcal{A}$, we
have


$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{A}(D') \in S] + \delta.$$


Initial efforts towards designing differentially
private algorithms have concentrated on settings
where the algorithms enjoy the same utility guarantees
for any data set from the domain $\tau^n$. (See
[3] for a survey on these efforts.) However, due to
the pessimistic nature of these algorithms, some
perform poorly in non-worst-case scenarios. With
the seminal work of [11], followed by a
series of results [4, 7, 10, 12, 13], the community
started focusing on designing differentially
private algorithms which are more useful in non-worst-case
settings, but in pessimistic scenarios
may perform only as poorly as the worst-case
algorithms. In this entry, we provide an overview
of some of the recent efforts in this line of
research.


Computing the Median: A Motivating 
Example 

To provide a flavor of the nature of these algorithms,
we start with the following simple
example: Given a data set $D = \{d_1, \ldots, d_n\}$ of
$n$ real numbers in $[0, R]$ (with $R \in \mathbb{R}^+$ and $n$
being odd), the task is to compute the median of
$D$ while preserving differential privacy. Notice
that in the worst case, changing one entry in $D$
can change the median by $R$. So intuitively, any
algorithm that does not distinguish between worst-case
and non-worst-case scenarios will introduce
an error $\Omega(R)$ in the output.

Without loss of generality, assume that
$d_1 \le \cdots \le d_n$, and let $m = \frac{n+1}{2}$. Now it is not
hard to observe that by changing one data entry
in $D$, the median $d_m$ can change by at most
$\max\{d_m - d_{m-1}, d_{m+1} - d_m\}$, which can be
potentially much smaller than $R$. An algorithm
that takes advantage of this observation can be
much more useful in non-worst-case settings as
compared to an algorithm that always introduces
an error of $\Omega(R)$. With this example in mind,
in the following section, we define some of the
notions in the differential privacy literature that
capture the non-worst-case change in the output
of a given computation task on neighboring data
sets.
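
As a small illustration of this observation, the sketch below computes the bound $\max\{d_m - d_{m-1}, d_{m+1} - d_m\}$ for a concrete data set. The function name `median_local_change` is an assumption made for this example.

```python
def median_local_change(data):
    """Bound on how far the median can move when one entry changes.

    Returns max{d_m - d_{m-1}, d_{m+1} - d_m} on the sorted data,
    for an odd number (at least 3) of entries.
    """
    d = sorted(data)
    n = len(d)
    if n % 2 == 0 or n < 3:
        raise ValueError("expects an odd number (at least 3) of entries")
    m = n // 2  # 0-based index of the median
    return max(d[m] - d[m - 1], d[m + 1] - d[m])

# Entries lie in [0, 100], so the worst-case change is R = 100,
# yet this particular data set only allows a change of 1.
tight = median_local_change([0, 49, 50, 51, 100])
loose = median_local_change([0, 0, 50, 100, 100])
```

The first data set is concentrated around its median, so the bound is far below $R$; the second is not, and the bound is $R/2$.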

Although the median might seem to be a very
simple example, this intuition
of capturing non-worst-case change extends to
a large class of problems. Especially in many
machine learning settings, where even for the
non-private algorithms the error guarantees are
over distributional assumptions on the data [1],
this intuition is very helpful in designing effective
differentially private variants.


N.B. Notice that in the case of computing the
average, there is no distinction between worst-case
and non-worst-case change. Hence it is not a good
example for the scenarios we address in this entry.


Notions of Sensitivity


We describe some of the concepts which help
us capture non-worst-case changes in the output
of a given function $f : \tau^n \to \mathbb{R}$ on pairs of
neighboring data sets $D$ and $D'$. Later we use
them to design differentially private algorithms
which capture non-worst-case behavior of the
data sets.


Global Sensitivity [6] 

This notion of sensitivity refers to the maximum
change the function $f$ can have on any pair of
neighboring data sets from the domain. Formally,


$$GS(f) = \max_{D, D' \in \tau^n,\ |D \triangle D'| = 2} |f(D) - f(D')|. \qquad (1)$$


The following algorithm in (2) is $(\epsilon, 0)$-differentially
private. In the literature this is also
called the Laplace mechanism [6]. Here $\mathrm{Lap}(\lambda)$
refers to the Laplace distribution with standard
deviation $\sqrt{2}\lambda$.


$$\text{Output: } f(D) + \mathrm{Lap}\left(\frac{GS(f)}{\epsilon}\right). \qquad (2)$$


Notice that in (2) the distribution on the
noise that is added is the same for all data
sets. In general this style of algorithm, which
introduces data-independent randomness, has
weaker utility guarantees in non-worst-case
scenarios.
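
A minimal sketch of the Laplace mechanism, adding noise of scale $GS(f)/\epsilon$ to $f(D)$, is given below. It uses only the Python standard library and samples Laplace noise as the difference of two independent exponential variables; the function names are assumptions for illustration, and floating-point noise of this kind is not suitable for production privacy guarantees.

```python
import random

def sample_laplace(scale, rng):
    """Draw from Lap(scale), i.e., density exp(-|z|/scale) / (2*scale)."""
    if scale == 0:
        return 0.0
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(value, global_sensitivity, epsilon, rng):
    """Release value + Lap(GS(f) / epsilon); the noise ignores the data set."""
    return value + sample_laplace(global_sensitivity / epsilon, rng)

rng = random.Random(0)
# Median of data in [0, R] with R = 100: GS = R, so the noise scale is R/epsilon.
noisy_median = laplace_mechanism(50.0, global_sensitivity=100.0, epsilon=0.5, rng=rng)
```

Here the noise scale is 200 regardless of how concentrated the data are around the median, which is exactly the weakness the following notions address.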


Local Sensitivity 

While global sensitivity captures the maximum
change in the output of $f$ for any pair of neighboring
data sets, local sensitivity relaxes this
notion to capture the maximum change in the
output of $f$ for any neighboring data set of a
given data set $D$. Formally,


$$LS(f, D) = \max_{D' \in \tau^n,\ |D \triangle D'| = 2} |f(D) - f(D')|. \qquad (3)$$


Beyond Worst Case Sensitivity in Private Data Analysis


With the similarity between the definitions
of local sensitivity and global sensitivity, it
might be tempting to use the same algorithm
as (2), with GS replaced by LS. A careful
observation indicates that this algorithm cannot
be $(\epsilon, \delta)$-differentially private for any non-trivial
choices of $\epsilon$ and $\delta$. Consider the following setting
where the data domain is $\{0, 1\}^n$, and the function
$f$ is the median value of $D$. Let $D$ be a data set
with $\lfloor n/2 \rfloor - 1$ entries as zero, and the rest as one.
When $n$ is odd, clearly $LS(f, D)$ equals zero. But
for a data set $D'$ formed by changing one of the
ones in $D$ to zero, $LS(f, D')$ equals one. So, if
we replace GS by LS in (2), then for $D$ there
will be zero noise added, and for $D'$ the noise
will be $\Omega(1/\epsilon)$. Differential privacy prohibits
this.
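
The counterexample can be checked by brute force on a small instance. The sketch below assumes binary data and $n = 7$; it builds $D$ with $\lfloor n/2 \rfloor - 1$ zeros and a neighbor $D'$ that contains one more zero, and computes the local sensitivity of the median for both. The function name is hypothetical.

```python
from statistics import median

def local_sensitivity_binary_median(data):
    """max |median(D) - median(D')| over all D' that differ from D in one entry."""
    base = median(data)
    worst = 0
    for i in range(len(data)):
        neighbor = list(data)
        neighbor[i] = 1 - neighbor[i]  # the only other value in {0, 1}
        worst = max(worst, abs(median(neighbor) - base))
    return worst

n = 7
zeros = n // 2 - 1
d = [0] * zeros + [1] * (n - zeros)   # floor(n/2) - 1 zeros, the rest ones
d_prime = list(d)
d_prime[d_prime.index(1)] = 0          # neighbor of D with one more zero

ls_d = local_sensitivity_binary_median(d)
ls_d_prime = local_sensitivity_binary_median(d_prime)
```

The two neighboring data sets have local sensitivity 0 and 1, respectively, so noise calibrated to local sensitivity would make them trivially distinguishable.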

The counterexample above might give an im- 
pression that local sensitivity may not be a useful 
concept. However, in the following and in Al- 
gorithm 2, we show that one can use local sen- 
sitivity to obtain effective differentially private 
algorithms which are more useful in non-worst- 
case scenarios. 


Smooth Sensitivity [11] 

In the above example, we noticed that a direct use 
of local sensitivity in noise addition can result in 
trouble. However, using a related notion called 
smooth sensitivity, one can obtain a variant of 
the algorithm in (2) which is both differentially 
private and respects local properties of a given 
data set. At a high level, smooth sensitivity is an 
envelope over the local sensitivity which helps 
avoid abrupt change in the variance of the noise 
on neighboring data sets. Formally, 


$$SS(f, D, \beta) = \max_{D' \in \tau^n} LS(f, D')\, e^{-\beta \cdot \mathrm{dist}(D, D')}. \qquad (4)$$

Here $\mathrm{dist}(D, D')$ is the number of entries in which
the two data sets $D$ and $D'$ differ (half the size of their
symmetric difference), and $\beta > 0$ is the
smoothness parameter. The following observations
on smooth sensitivity will be useful for designing
differentially private algorithms.


1. Observation 1 (Envelope on LS): $\forall D, \beta > 0$,
$SS(f, D, \beta) \ge LS(f, D)$.

2. Observation 2 (Smaller than GS): $\forall \beta > 0$,
$D \in \tau^n$, $SS(f, D, \beta) \le GS(f)$.

3. Observation 3 (Smoothness): For all neighbors
$D, D'$, $SS(f, D, \beta) \le e^{\beta} SS(f, D', \beta)$.


Key Results


Using the concepts of local sensitivity and 
smooth sensitivity defined in the previous section, 
we provide two differentially private algorithmic 
frameworks which respect local (non-worst- 
case) properties of a given data set. Later in 
Applications 1 and 2, we instantiate them with 
specific problems. 


Algorithm 1: Smooth Sensitivity Based 

In order to use the notion of smooth sensitivity
to design a differentially private algorithm analogous
to (2), we need the following properties
from the noise distribution to be added to $f(D)$.
Let us define the following notation: For a subset
$S$ of $\mathbb{R}$, the set $S + \Delta$ denotes $\{z + \Delta : z \in S\}$,
and the set $e^{\lambda} \cdot S$ denotes $\{e^{\lambda} \cdot z : z \in S\}$.


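Definition 2 (Admissible noise distribution
[11]) A probability distribution $h$ on $\mathbb{R}$ is $(\alpha, \beta)$-admissible
if, for $\alpha = \alpha(\epsilon, \delta)$, $\beta = \beta(\epsilon, \delta)$, the
following conditions hold for all $|\Delta| \le \alpha$ and
$|\lambda| \le \beta$, and for all subsets $S \subseteq \mathbb{R}$.


1. Sliding property: $\Pr_{Z \sim h}[Z \in S] \le e^{\epsilon/2} \Pr_{Z \sim h}[Z \in S + \Delta] + \frac{\delta}{2}$.
2. Dilation property: $\Pr_{Z \sim h}[Z \in S] \le e^{\epsilon/2} \Pr_{Z \sim h}[Z \in e^{\lambda} \cdot S] + \frac{\delta}{2}$.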

With Definition 2 in hand, now we can define an
algorithm which is analogous to (2). Let $h$ be an
$(\alpha, \beta)$-admissible noise distribution and let $Z$ be
an independent sample from $h$. For a given data
set $D$, the algorithm is as follows:


$$\text{Output: } f(D) + \frac{SS(f, D, \beta)}{\alpha} \cdot Z. \qquad (5)$$

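One can show that the above algorithm is
$(\epsilon, \delta)$-differentially private [11]. An immediate
question that arises: Which natural distributions
satisfy this property? [11] showed that the Laplace
distribution $\frac{1}{2} e^{-|z|}$ is $\left(\epsilon/2, \frac{\epsilon}{2 \ln(1/\delta)}\right)$-admissible,
and $\mathcal{N}(0, 1)$ is $\left(\epsilon/\sqrt{\ln(1/\delta)}, \frac{\epsilon}{2 \ln(1/\delta)}\right)$-admissible.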
Later we will see a concrete instantiation of (5)
for the median problem.


Algorithm 2: Propose-Test-Release (PTR)
Framework

In the previous section we saw a “noise-addition”-based
algorithm that exploited the
smooth upper bound on the local sensitivity
to ensure differential privacy. In this section,
instead of obtaining a smooth bound on the local
sensitivity, we seek an answer to the following
question: Given a proposed upper bound $\Delta$ on the
local sensitivity of $f(D)$, how many data points
($k$) in $D$ need to be changed to increase the local
sensitivity beyond $\Delta$? If $k$ is sufficiently large,
then the algorithm uses the proposed bound $\Delta$ in
(2) instead of $GS(f)$; otherwise the algorithm
outputs $\bot$ and fails. Once formalized, this
algorithmic paradigm can be shown to be $(\epsilon, \delta)$-differentially
private. A major component in the
design of algorithms using this paradigm is to
come up with tight upper bounds on the local
sensitivity. In Applications 1 and 2 we state two
approaches for getting such bounds.

In the following, we formally introduce the
propose-test-release framework. The version in
Algorithm 1 is a variant of the ones that appeared in
[4] and [13].


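Algorithm 1 Propose-test-release (PTR) framework

Input: Data set: $D \in \tau^n$, function $f : \tau^n \to \mathbb{R}$, proposed
local sensitivity bound: $\Delta$, privacy parameters: $(\epsilon, \delta)$.

1: Distance to instability: $\mathrm{dist} \leftarrow$ minimum $k \in [n]$
for which $\max_{D' : |D \triangle D'| = 2k} LS(f, D') > \Delta$.
2: Noisy distance to instability: $\widehat{\mathrm{dist}} \leftarrow \mathrm{dist} + \mathrm{Lap}\left(\frac{1}{\epsilon}\right)$.
3: Test and release: If $\widehat{\mathrm{dist}} > \frac{\log(1/\delta)}{\epsilon}$, then output
$\Delta$, else output $\bot$.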
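
The test-and-release steps can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: it assumes the exact distance to instability `dist` has already been computed (Line 1), assumes the noise scale $1/\epsilon$ and the threshold $\log(1/\delta)/\epsilon$, and uses `None` to stand in for the failure symbol. All function names are hypothetical.

```python
import math
import random

def propose_test_release(dist, proposed_bound, epsilon, delta, rng):
    """Lines 2-3 of PTR: perturb the distance, then test it against a threshold."""
    # Difference of two exponentials with rate epsilon is Lap(1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    noisy_dist = dist + noise
    if noisy_dist > math.log(1.0 / delta) / epsilon:
        return proposed_bound
    return None  # stands in for the failure output

def ptr_release(value, dist, proposed_bound, epsilon, delta, rng):
    """Add Lap(proposed_bound / epsilon) noise to f(D) only if the test passes."""
    if propose_test_release(dist, proposed_bound, epsilon, delta, rng) is None:
        return None
    if proposed_bound == 0:
        return value
    scale = proposed_bound / epsilon
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

rng = random.Random(1)
stable = propose_test_release(10**6, 0.5, 1.0, 1e-6, rng)
unstable = propose_test_release(-10**6, 0.5, 1.0, 1e-6, rng)
```

When the data set is far from any unstable data set the proposed bound is released; when it is close, the procedure fails with high probability.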


One can show that Algorithm 1 is $(\epsilon, \delta)$-differentially
private. (See [13] for more details.)
Additionally, by the tail properties of the Laplace
distribution, it is not hard to show that if
$\mathrm{dist} > \frac{2 \log(1/\delta)}{\epsilon}$, then with probability at least
$1 - \delta$, the algorithm outputs $\Delta$. In (6) we
fit Algorithm 1 with the Laplace mechanism
from (2) to obtain a differentially private
estimate of $f(D)$. By the composition property
of differential privacy [4, 5], the algorithm
(PTR+Laplace mechanism) in (6) is $(2\epsilon, \delta)$-differentially
private.


$$\text{If } PTR(f, D, \Delta, \epsilon, \delta) \ne \bot, \text{ then output } f(D) + \mathrm{Lap}\left(\frac{\Delta}{\epsilon}\right), \text{ else fail.} \qquad (6)$$


One can notice that if the proposed bound
$\Delta$ is much smaller than $GS(f)$, then whenever
the algorithm succeeds, it adds much less
noise to $f(D)$ as compared to (2). In Application
1 we compare the global
sensitivity-based, the smooth sensitivity-based,
and the PTR-based algorithms for the problem of
computing the median.


Application 1: Computing the Median 


With the smooth sensitivity-based and the PTR-based
algorithmic frameworks from the previous
section in hand, we revisit the problem of
computing the median. Let the data set $D = \{d_1, \ldots, d_n\} \in [0, R]^n$
with $n$ being odd and
$R \in \mathbb{R}$ being the range. W.l.o.g., assume that the
entries in $D$ are sorted in ascending order.


Smooth sensitivity-based algorithm for median
computation. In order to use (5), we need
to be able to efficiently compute the smooth
sensitivity (4) of the median function for $D$ with
a given smoothness parameter $\beta$. The following
theorem implies an $O(n \log n)$ algorithm for
computing the smooth sensitivity.


Theorem 1 ([11]) Let $m = \frac{n+1}{2}$. The smooth
sensitivity of the median function with the
smoothness parameter $\beta$ is given by the
following.


$$SS(\mathrm{Median}, D, \beta) = \max_{k = 0, \ldots, n} \left( e^{-k\beta} \cdot \max_{t = 0, \ldots, k+1} (d_{m+t} - d_{m+t-k-1}) \right),$$

where $d_i = 0$ for $i \le 0$ and $d_i = R$ for $i > n$.


It can be computed in time $O(n \log n)$.

Once the smooth sensitivity bound is obtained,
one can use it in (5) to obtain a differentially
private approximation to the median of $D$. If the
Laplace distribution ($\frac{1}{2} e^{-|z|}$) is used as the noise,
then set the admissible parameters $\alpha = \epsilon/2$
and $\beta = \frac{\epsilon}{2 \ln(1/\delta)}$. An immediate question that
arises is how the noise added by the smooth
sensitivity-based algorithm compares to that of the global
sensitivity-based algorithm in (2). First notice
that since global sensitivity is always an upper
bound on smooth sensitivity, the noise added via
smooth sensitivity can never be more than that via
global sensitivity. Next we present a setting of the
data set $D$ where in fact the smooth sensitivity-based
algorithm adds much less noise (and
is hence more accurate).
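
The formula for the smooth sensitivity of the median can be evaluated directly. The sketch below is a straightforward $O(n^2)$ evaluation, not the $O(n \log n)$ algorithm referred to above, and it assumes the padding convention $d_i = 0$ for $i \le 0$ and $d_i = R$ for $i > n$. The function name and parameters are illustrative.

```python
import math

def smooth_sensitivity_median(data, upper, beta):
    """Evaluate max_k exp(-k*beta) * max_t (d_{m+t} - d_{m+t-k-1}) for odd n."""
    d = sorted(data)
    n = len(d)
    m = (n + 1) // 2  # 1-based index of the median

    def at(i):
        # 1-based access, padded with 0 below and `upper` (= R) above.
        if i <= 0:
            return 0.0
        if i > n:
            return float(upper)
        return d[i - 1]

    best = 0.0
    for k in range(n + 1):
        a_k = max(at(m + t) - at(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-k * beta) * a_k)
    return best

# Evenly spaced data d_i = i with n = R = 101 and beta = 0.1.
ss_even = smooth_sensitivity_median(list(range(1, 102)), upper=101, beta=0.1)
```

For the evenly spaced data the maximum is attained at $k = 1/\beta - 1 = 9$, giving a value of about 4.07, well below the global sensitivity $R = 101$. Taking $\beta = 0$ recovers the global sensitivity, and a very large $\beta$ recovers the local sensitivity.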

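Consider the data set $D$ where each $d_i = \frac{iR}{n}$
for all $i \in [n]$. In this case the term
$A_k = \max_{t = 0, \ldots, k+1} (d_{m+t} - d_{m+t-k-1})$ is $\frac{(k+1)R}{n}$
as long as the indices stay within $[1, n]$, and
the term $e^{-k\beta} A_k$ is maximized when $k = \frac{1}{\beta} - 1$.
Assuming $\beta \le 1$, the smooth sensitivity
$SS(\mathrm{Median}, D, \beta)$ is bounded by $\frac{R}{\beta n}$. If we
use the Laplace distribution in (5) to ensure $(\epsilon, \delta)$-differential
privacy, then the noise that gets added
to $\mathrm{Median}(D)$ is $O\left(\frac{R \log(1/\delta)}{\epsilon^2 n}\right)$. In comparison,
the global sensitivity-based algorithm in (2) will
add noise $O\left(\frac{R}{\epsilon}\right)$, which is much higher.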

One might argue that the global sensitivity-based
algorithm guarantees stronger differential
privacy ($(\epsilon, 0)$ as opposed to $(\epsilon, \delta)$) and hence it
is not a fair comparison. Even when one uses a
more concentrated noise distribution like the Gaussian
distribution (which ensures $(\epsilon, \delta)$-differential
privacy) instead of the Laplace distribution in (2), the
error still remains the same.


PTR-based algorithm for computing the me- 
dian. We now instantiate the PTR-based algo- 
rithm for the same problem of computing the 
median. In order to do so, we first partition the 
real line R into bins of width h = fs (or any 
width ay for any y > 0). Call this set of bins 
B. Additionally consider the set $B_{(+h/2)}$, which
is the set of bins shifted by $h/2$.

In Algorithm 1 we set the proposed sensitivity
bound $\Delta = h$. We compute the distance to
instability in Line 1 of Algorithm 1 by the following
technique. Let $k_1$ be the minimum number
of entries in $D$ that need to be changed to
move the median out of its bin in the set $B$, and
let $k_2$ be the corresponding minimum number
for the set $B_{(+h/2)}$. The distance to instability
is $\mathrm{dist} \leftarrow \max\{k_1, k_2\}$. Now the rest of
the algorithm follows as described for the PTR
framework. The two sets of shifted bins $B$ and
$B_{(+h/2)}$ are needed because the median might
fall at the partition boundary of the bins. Notice
that computing dist takes $O(n)$ time.

In terms of the utility guarantee for this algo- 
rithm, we have the following: 


Theorem 2 ([4]) Let the data set $D$ be drawn
i.i.d. from some fixed distribution $P$, where the
cumulative distribution function of $P$ is differentiable
with positive derivative at the median.
Assuming the privacy parameter $\delta = 1/\mathrm{poly}(n)$,
we have the following utility guarantees for the
PTR-based median computation.


$$\Pr[PTR(D) = \bot] = O(e^{-\epsilon \log n})$$


and $PTR(D)$ converges in probability to the
median of $P$ as $n \to \infty$.


Application 2: Selection from 
a Discrete Set 


In this section we will see another application of
the PTR framework. Although the exposition is
fairly abstract, we will see that this tool is useful
for a variety of machine learning problems, where
we assume “very little” about the underlying
learning algorithm. Examples include
sparse estimation, parameter tuning, and non-convex
learning.

Given a data set $D \in \tau^n$, and a choice function
$f : \tau^n \to \{S_1, \ldots, S_k\}$, the objective is to
compute a differentially private approximation
to $f(D)$. Here $\{S_1, \ldots, S_k\}$ form a discrete set
of choices. In order to design the private algorithm,
we instantiate the PTR framework in
Algorithm 1, with $\Delta = 0$ and $f$ being the choice
function. (Notice that $\Delta = 0$ means that the
output of the function does not change at all
by changing any one entry in the data set.) If
the output of the PTR framework is not equal
to $\bot$, then output $f(D)$ exactly, and output $\bot$
otherwise. From the privacy property of the PTR
framework, it follows that the above algorithm is
$(\epsilon, \delta)$-differentially private. In terms of utility one
can show the following.


Theorem 3 ([13]) If the distance to instability of
the choice function (Line 1 of Algorithm 1) is
at least $\frac{2 \log(1/\delta)}{\epsilon}$, then with probability at least
$1 - \delta$, the above PTR instantiation outputs $f(D)$
exactly.


At a high level, Theorem 3 says that if one
needs to change a sufficient number of entries
($\frac{2 \log(1/\delta)}{\epsilon}$) in the data set $D$ to change $f(D)$,
then with high probability the PTR framework
will output $f(D)$ exactly. One issue with the
current instantiation of the PTR framework is
that it is not clear a priori how to efficiently
compute the distance to instability in Line 1 of
Algorithm 1. In the following we circumvent
this problem by instantiating the PTR framework
with a proxy function $\hat{f}$ instead of $f$, for which
the distance to instability is always efficiently
computable. Moreover, if on a (sufficiently large)
random subset $D_{sub}$ of $D$, with probability at
least 3/4 one can guarantee $f(D_{sub}) = f(D)$,
then the PTR framework outputs $f(D)$ exactly
with high probability.


Subsample and aggregate framework. The
basic idea of the subsample and aggregate framework
first appeared in [11] and the current version
is from [13]. Here we use a variant of that
framework for instantiating the proxy function
$\hat{f}$ corresponding to $f$.

Let $q = \frac{\epsilon}{32 \log(1/\delta)}$ and $m = \frac{\log(n/\delta)}{q^2}$. Sample
data sets $D_1, \ldots, D_m$, where each $D_i$ is generated
from $D$ by sampling each entry in $D$ with
probability $q$, and the $D_i$'s are i.i.d. Let $S_{first}$ be the
choice which appears the maximum number of times
in $F = \{f(D_1), \ldots, f(D_m)\}$, and let $S_{second}$ be
the corresponding second. Let the proxy function
$\hat{f}(D)$ equal $S_{first}$. Let $\mathrm{count}(S)$ be the number
of times the choice $S$ appears in $F$. One can
show that with probability at least $1 - \delta$, the
distance to instability of $\hat{f}(D)$ equals

$$\mathrm{dist} \leftarrow \frac{\mathrm{count}(S_{first}) - \mathrm{count}(S_{second})}{4mq} - 1.$$

From this, one can
conclude that using $\hat{f}$ as the proxy for the choice
function $f$ in the PTR framework ensures $(\epsilon, 2\delta)$-differential
privacy. In terms of utility, for this
“proxy” instantiation of the PTR one can show
the following. Notice that in both Theorems 3
and 4, there is no dependence on the number of
possible choices ($k$).


Theorem 4 ([13]) If for each D_i (defined above)
f(D_i) = f(D) with probability at least 3/4,
then with probability at least 1 − 2δ, the above
instantiation of the PTR framework outputs f(D)
exactly.
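
The construction of the proxy function can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of [13] in full: it only builds the vote set F, picks S_first, and evaluates the dist expression given above. The noisy threshold test of the PTR framework is deliberately omitted, and the names proxy_choice, choice_fn, and rng, as well as the values of q and m in the usage note, are assumptions made for the example.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def proxy_choice(
    data: Sequence,
    choice_fn: Callable[[List], Hashable],
    q: float,
    m: int,
    rng: random.Random,
) -> Tuple[Hashable, float]:
    """Return (S_first, dist) for the subsample-and-aggregate proxy."""
    votes: Counter = Counter()
    for _ in range(m):
        # Each entry of D is kept independently with probability q.
        subsample = [x for x in data if rng.random() < q]
        votes[choice_fn(subsample)] += 1
    ranked = votes.most_common(2)
    s_first, count_first = ranked[0]
    # If only one choice ever appears, the runner-up count is zero.
    count_second = ranked[1][1] if len(ranked) > 1 else 0
    dist = (count_first - count_second) / (4 * m * q) - 1
    return s_first, dist
```

For instance, with a choice function that returns the same answer on every subsample, all m votes agree and dist reduces to 1/(4q) − 1, so q = 0.05 (an arbitrary illustrative value) gives dist = 4.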


One of the classic settings where the above
algorithms can be used is model or feature
selection in machine learning. A specific example
is the LASSO estimator, where the PTR-based
algorithm achieves the optimal sample complexity
even under the constraint of differential privacy.
(See [13] for details.) Another example is
finding the best regularization parameter for a
given regression problem (Dwork and Thakurta,
Differentially private parameter tuning using
subsample and aggregate framework. Personal
communication, 2014). Let Λ = {λ_1, ..., λ_k} be a
candidate set of regularization parameters, with
each λ_i ∈ R. The idea is to estimate the best
regularization parameter from the set Λ for each
of the sampled data sets D_1, ..., D_m, and use the
estimation algorithm itself as the choice function
f in the PTR framework.

Notice that we assumed almost nothing about
the regularization parameter selection algorithm,
apart from the fact that on random subsamples
of the original data set D, the algorithm selects
the same regularization parameter most of the
time. The subsampling-based algorithm can also
be used in the context of learning non-convex
models while preserving differential privacy
(Bilenko et al., Private and robust non-convex
learning. Personal communication, 2014). For
the purpose of brevity, we defer the exposition of
differentially private learning with non-convex
models.


Reference Notes 


The global sensitivity-based and the smooth
sensitivity-based algorithmic frameworks are due
to [6] and [11], respectively. The propose-test-release
(PTR) framework is initially due to [4],
but the exposition in this note is from [4, 13].
The smooth sensitivity-based private median
algorithm is due to [11], and the one based on

Problem Definition


How can one effectively translate an algorithm from
one distributed system model to another?
Distributed systems come in diverse settings
that are modeled by different assumptions (1)
on the way processes communicate, e.g., using
shared memory or messages, (2) on the fault
model, (3) on synchrony, etc. Each
of these parameters has a dramatic impact on the
computing power of the model, and in practice,
an algorithm or an impossibility result is usually
tailored to a particular model and cannot be
directly reused in another model.

This wide variety of models has given rise to 
many different impossibility theorems and nu- 
merous algorithms for many of the possible com- 
binations of parameters that characterize them. 
Thus, a crucial question is the following: are there 
bridges between some models, i.e., is it possible 
to transfer an impossibility result or an algorithm 
from one model to another? 

The Borowsky-Gafni simulation algorithm, or
BG simulation, is one of the first steps toward
direct translations of algorithms or impossibility
results from one model to another. The BG
simulation considers distributed systems made
of asynchronous processes that communicate
using a shared memory array. In a nutshell, this
simulation allows a set of t + 1 asynchronous
sequential processes, where up to t of them can
stop during their execution, to simulate any set
of n ≥ t + 1 processes executing an algorithm
that is designed to tolerate up to t fail-stop
failures.

The BG simulation has been used to prove 
solvability and unsolvability results for crash- 
prone asynchronous shared memory systems, 
paving the way for a more generic formal theory 
of reduction between problems in different 
models of distributed computing. 

The BG-simulation algorithm is named after
its authors, Elizabeth Borowsky and Eli Gafni,
who introduced it as a side tool [3] in order to
generalize the impossibility result for solving a
weakened version of consensus, namely, k-set
agreement [6]. It was later formalized
and proven correct [4, 18] using the I/O automata
formalism [17].


System Model 


Processes 

The simulation considers a system made of up to
n asynchronous sequential processes that execute
a distributed algorithm to solve a given colorless
decision task, as defined below.

Failure Model

Processes may fail by stopping (crash failure).
The simulation assumes that up to t processes
can stop during the execution; t < n is known
before the execution, but the identity of processes
that may crash is unknown to the simulation. This
model of computation is referred to as the
t-resilient model. A corner case of this model is the
wait-free model where t + 1 processes execute
concurrently and at most t of them may crash.


Communication 

Processes communicate and coordinate using a
reliable shared memory composed of n multiple-reader
single-writer registers. Each process has
exclusive write access to one of these n
registers, and processes can read all entries by
invoking a snapshot operation, with the semantics
that write and snapshot operations appear
as if they are executed atomically. While using
the snapshot abstraction eases the presentation
of the algorithm, it has no impact on the power
of the underlying computing model, since the
snapshot/write model can be implemented wait-free
using read/write registers [1].


Tasks 

A colorless task is a distributed coordination
problem in which every process p_i starts with a
value, communicates with other processes, and
must eventually decide on an output value.
Colorless tasks, or convergence tasks [12], are a
restricted version of tasks in which a deciding
process may adopt the decision value of any
process, i.e., two participating processes may decide
the same value. For more formal definitions of
tasks using tools from algebraic topology, the
reader should refer to [11].


Simulation 

The simulation proceeds by executing concurrently,
using t + 1 simulators s_1, ..., s_{t+1}, the
code of n > t processes that collaboratively
solve a distributed colorless task. Hence, each
simulator s_i is given the code of all simulated
processes and handles the execution of n threads.

Key Results


Simulation of Memory 

Each one of the t + 1 simulators s_i executes
the sequential code of the n simulated processes
p_j in parallel. By assumption, every simulated
code is a sequence of instructions that are either
(1) local processing, (2) a write operation into
memory, or (3) a snapshot of the shared memory.

Every simulator s_i maintains its local view of
the simulated memory for all simulated threads.
These local views are synchronized between simulators
by writing and reading (using snapshots)
in a shared memory matrix MEM that has
one column per simulated thread and one row per
snapshot instance.

To ensure global consistency between simulators
that concurrently simulate all threads,
operations on the memory must be coordinated
between different simulators. This is achieved by
ensuring that, for a given simulated thread, the
sequence of snapshots of the memory as computed
by all simulators is identical. As consensus
cannot be implemented wait-free, the simulation
coordinates snapshots using a weaker form of
agreement, safe agreement.


The Safe-Agreement Object 

Safe agreement is the most important building 
block of the simulation. First introduced as the 
non-blocking busy-wait agreement protocol [3], 
it has been further refined as safe agreement, 
with several blocking or non-blocking/wait-free 
implementations [2, 11, 14]. 

This weak form of agreement provides two
methods to processes: propose(v) and resolve().
A participating process that proposes a value v
first calls propose(v) once and is then allowed to
make calls to resolve(), which return either ⊥, if safe
agreement is not resolved yet, or a value. In the
latter case, safe agreement is said to be resolved,
and the value returned is the value decided by the
process. Formally, safe agreement is defined by
three properties:


Termination: If no process crashes during the
execution of propose(), then all processes decide,
i.e., eventually all calls to resolve() return
a non-⊥ value.

Validity: All processes that decide must decide a
proposed value.

Agreement: All processes that decide must decide
the same value.


The specification is almost identical to that
of consensus, apart from the weakened termination
property. Safe agreement is wait-free
solvable and thus solvable in t-resilient systems.

The crucial point of the BG simulation lies in 
the termination property of safe agreement: if a 
safe-agreement protocol cannot be resolved, i.e., 
if no process decides, then at least one process 
crashed during the call to propose(). Thus, a 
given safe-agreement instance can “capture” a 
calling process that crashed during the propose 
invocation. 
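
To make the propose/resolve interface concrete, the following Python sketch models one classical way to build safe agreement on top of an atomic snapshot, using per-process levels 0, 1, and 2. It is an illustration under stated assumptions and not necessarily the construction used in the cited references: the class and method names are hypothetical, atomicity is obtained trivially because the sketch is single-threaded, ⊥ is represented by None, and a crash inside propose() is modeled by calling propose_begin without ever calling propose_end.

```python
from typing import Any, List, Optional


class SafeAgreement:
    """Single-threaded model of a level-based safe-agreement object."""

    def __init__(self, n: int) -> None:
        self.value: List[Any] = [None] * n
        self.level: List[int] = [0] * n  # 0 = idle/withdrawn, 1 = unsafe, 2 = committed

    def propose_begin(self, i: int, v: Any) -> None:
        # Enter the "unsafe" zone: publish the value at level 1.
        self.value[i] = v
        self.level[i] = 1

    def propose_end(self, i: int) -> None:
        # Take a snapshot; withdraw if someone already committed, else commit.
        snapshot = list(self.level)
        self.level[i] = 0 if 2 in snapshot else 2

    def propose(self, i: int, v: Any) -> None:
        self.propose_begin(i, v)
        self.propose_end(i)

    def resolve(self) -> Optional[Any]:
        levels = list(self.level)
        if 1 in levels:
            return None  # some proposer is still (or stuck) in the unsafe zone
        committed = [j for j, lv in enumerate(levels) if lv == 2]
        # Decide the value of the committed process with the smallest index.
        return self.value[committed[0]] if committed else None
```

In this model, a process that stops between propose_begin and propose_end leaves its level at 1 forever, so every later resolve() returns None: this is exactly the way a single crash "captures" one safe-agreement instance.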


Overview of the Simulation 

The current state of the simulation and its
history are thus represented by two twin data
structures: (1) the shared memory matrix
MEM that contains the consecutive memory
status of all simulated threads, as seen by
simulators, and (2) a matrix of safe-agreement
objects SafeAgreement[0...][1...n] with n
columns, each column representing the execution
advancement of one of the simulated processes,
as shown in Fig. 1.

In this view, the entry at row ℓ and column
i corresponds to the state of the ℓth snapshot
for simulated process p_i. Hence, the “program
counter” of a simulated thread p_i is the greatest
row of column i that is either unresolved or
resolved. In this example, simulations of threads
p_2, p_4, and p_6 are stopped with unresolved safe
agreements that are due to (at least) one simulator
stuck in the associated propose() methods. The
program counter of all other threads is 9.

Each simulator s_i is given the code of the n
threads it has to simulate, as well as an input value
of one of the threads. Conceptually, the algorithm
run by simulator s_i is given in Algorithm 1.

In the simulation, each snapshot invocation
is mediated through a SafeAgreement object
(lines 6 and 14). The only situation that could block the
simulation of a given thread p_i is when the call to
resolve (line 6) always returns ⊥. By definition
of the safe-agreement object, this situation can
happen only when a simulator crashed during
the call to propose() on the same safe-agreement
instance: the crash of a simulator can block the
simulation of at most one simulated thread.


Applications 


The BG-simulation algorithm has been primarily 
used to reduce t-resilient solvability to wait-free 


Algorithm 1 BG-simulation: code for a simulator s_i starting with input v

 1: procedure BG-SIMULATION(v)
 2:   ∀ i = 1...n, SafeAgreement[0][i].propose(v)      ▷ Initialization
 3:   loop
 4:     for i ← 1, n do                                 ▷ Simulate threads in round-robin
 5:       ℓ ← current program counter of p_i
 6:       snap ← SafeAgreement[ℓ][i].resolve()
 7:       if snap ≠ ⊥ then                              ▷ safe agreement is resolved
 8:         perform local computation using snap, write operations in local memory
 9:         execute write on behalf of p_i in MEM[ℓ][i]
10:         if thread p_i is terminated then
11:           return value and stop its simulation
12:         else if at least (n − t) threads have program counter ≥ ℓ then
13:           snap ← snapshot(MEM[ℓ])
14:           SafeAgreement[ℓ + 1][i].propose(snap)
15:         end if
16:       end if
17:     end for
18:   end loop
19: end procedure

BG Distributed Simulation Algorithm, Fig. 1 Conceptual view of advancement for snapshots of all simulated
processes with n = 8 and t = 3


solvability for colorless tasks, that is, tasks that
are agnostic to process identities. The initial
application was to the k-set agreement
problem, in which all processes have to agree on
a final set of values of size at most k. If k-set
agreement were solvable in a k-resilient system of
n > k + 1 processes, then the BG simulation
of this algorithm with k + 1 simulators would
produce a wait-free solution to k-set agreement.
Since k-set agreement is not wait-free solvable
for k + 1 processes [13, 19], this yields a
contradiction.

The BG simulation presented here only applies
to colorless tasks. Gafni [8] extended it further
to more general classes of tasks and provided
a general characterization of t-resilient solvable
tasks, similar to the Herlihy-Shavit conditions
for wait-free computability [13]. This extension
has also been studied in [14, 16].

In order to study the relationship between 
wait-freedom and t-resilience, [5] uses objects 
of type S in addition to read/write registers 
and shows that for any t < k, t-resilient k- 
process consensus can be implemented with 
objects of type S and registers if and only if 
wait-free (t + 1)-process consensus can be
implemented with objects of type S and registers. 


Imbs and Raynal [15] consider models equipped 
with registers and consensus objects and 
extend the results provided by BG simulation, 
showing equivalences between models based 
on the ratio between the maximum number of 
failures and the consensus number of consensus 
objects. 

Chaudhuri and Reiners [7] use BG simulation
to provide a characterization of the set consensus
partial order, a refinement of Herlihy’s
consensus-based wait-free hierarchy [10]; a formal
definition of set consensus number and a
study of the associated computing power
were later provided in [9].

Problem Definition


The theory of bidimensionality provides 
general techniques for designing efficient 
fixed-parameter algorithms and approximation 
algorithms for a broad range of NP-hard graph 
problems in a broad range of graphs. This 
theory applies to graph problems that are 
“bidimensional” in the sense that (1) the solution 
value for the k × k grid graph and similar graphs
grows with k, typically as Ω(k²), and (2) the
solution value goes down when contracting edges 
and optionally when deleting edges in the graph. 
Many problems are bidimensional; a few classic 
examples are vertex cover, dominating set, and 
feedback vertex set. 


Graph Classes 

Results about bidimensional problems have been 
developed for increasingly general families of 
graphs, all generalizing planar graphs. 

The first two classes of graphs relate to em- 
beddings on surfaces. A graph is planar if it can 
be drawn in the plane (or the sphere) without 
crossings. A graph has (Euler) genus at most g 
if it can be drawn in a surface of Euler charac- 
teristic g. A class of graphs has bounded genus if 
every graph in the class has genus at most g for a 
fixed g. 

The next three classes of graphs relate to
excluding minors. Given an edge e = {v, w} in a
graph G, the contraction of e in G is the result of
identifying vertices v and w in G and removing all
loops and duplicate edges. A graph H obtained
by a sequence of such edge contractions starting
from G is said to be a contraction of G. A
graph H is a minor of G if H is a subgraph of
some contraction of G. A graph class C is minor
closed if any minor of any graph in C is also
a member of C. A minor-closed graph class C
is H-minor-free if H ∉ C. More generally, the
term “H-minor-free” refers to any minor-closed
graph class that excludes some fixed graph H. A
single-crossing graph is a minor of a graph that
can be drawn in the plane with at most one pair
of edges crossing. A minor-closed graph class is
single-crossing-minor-free if it excludes a fixed
single-crossing graph. An apex graph is a graph
in which the removal of some vertex leaves a
planar graph. A graph class is apex-minor-free if
it excludes some fixed apex graph.
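
The edge-contraction operation defined above can be illustrated with a short Python sketch. It assumes a simple undirected graph stored as a dictionary of neighbor sets; the type alias Graph and the function name contract_edge are hypothetical choices for this example. The merged vertex keeps the name v, and the use of sets removes duplicate edges automatically.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def contract_edge(graph: Graph, v: Hashable, w: Hashable) -> Graph:
    """Return a new graph in which the edge {v, w} is contracted into v."""
    if w not in graph.get(v, set()):
        raise ValueError("v and w must be adjacent")
    result: Graph = {}
    for u, nbrs in graph.items():
        if u == w:
            continue  # w disappears; its role is taken over by v
        # Redirect every edge that pointed to w so that it points to v.
        result[u] = {v if x == w else x for x in nbrs}
    result[v] |= graph[w]  # v inherits the neighbors of w
    result[v].discard(v)   # remove the loop created by the identification
    result[v].discard(w)
    return result
```

For example, contracting one edge of a triangle yields a single edge, and contracting one edge of a 4-cycle yields a triangle.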


Bidimensional Parameters 
Although implicitly hinted at in [2,5, 10,11], the 
first use of the term “bidimensional” was in [3]. 

First, “parameters” are an alternative view 
on optimization problems. A parameter P is a 
function mapping graphs to nonnegative integers. 
The decision problem associated with P asks, 
for a given graph G and nonnegative integer k, 
whether P(G) ≤ k. Many optimization problems
can be phrased as such a decision problem about 
a graph parameter P. 

Now, a parameter is g(r)-bidimensional (or
just bidimensional) if it is at least g(r) in an
r × r “grid-like graph” and if the parameter does
not increase when taking either minors
(g(r)-minor-bidimensional) or contractions
(g(r)-contraction-bidimensional). The exact definition
of “grid-like graph” depends on the class of
graphs allowed and whether one considers
minor or contraction bidimensionality. For minor
bidimensionality and for any H-minor-free graph
class, the notion of a “grid-like graph” is defined
to be the r × r grid, i.e., the planar graph with r²
vertices arranged on a square grid and with edges
connecting horizontally and vertically adjacent
vertices. For contraction bidimensionality, the
notion of a “grid-like graph” is as follows:


1. For planar graphs and single-crossing-minor-free
graphs, a “grid-like graph” is an r × r grid
partially triangulated by additional edges that
preserve planarity.

2. For bounded-genus graphs, a “grid-like graph”
is such a partially triangulated r × r grid with
up to genus(G) additional edges (“handles”).

3. For apex-minor-free graphs, a “grid-like
graph” is an r × r grid augmented with
additional edges such that each vertex is
incident to O(1) edges to nonboundary
vertices of the grid. (Here O(1) depends on
the excluded apex graph.)


Bidimensionality 


Contraction bidimensionality is so far undefined
for H-minor-free graphs (or general graphs).

Examples of bidimensional parameters
include the number of vertices, the diameter, and
the size of various structures such as feedback
vertex set, vertex cover, minimum maximal
matching, face cover, a series of vertex-removal
parameters, dominating set, edge dominating
set, R-dominating set, connected dominating set,
connected edge dominating set, connected
R-dominating set, unweighted TSP tour (a walk
in the graph visiting all vertices), and chordal
completion (fill-in). For example, feedback
vertex set is Ω(r²)-minor-bidimensional (and
thus also contraction-bidimensional) because (1)
deleting or contracting an edge preserves existing
feedback vertex sets and (2) any vertex in the
feedback vertex set destroys at most four squares
in the r × r grid, and there are (r − 1)² squares, so
any feedback vertex set must have Ω(r²) vertices.
See [1, 3] for arguments of either contraction
or minor bidimensionality for the other
parameters.
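
The counting argument for feedback vertex set gives the explicit lower bound (r − 1)²/4 on the r × r grid, since each of the (r − 1)² unit squares is a cycle and one vertex lies on at most four of them. The Python sketch below checks this bound by exhaustive search on very small grids. It is only an illustration: the function names are hypothetical, and the brute force is exponential in r², so it is usable only for tiny r.

```python
from itertools import combinations
from typing import List, Set, Tuple

Vertex = Tuple[int, int]
Edge = Tuple[Vertex, Vertex]


def grid_edges(r: int) -> List[Edge]:
    """Edges of the r x r grid between horizontally/vertically adjacent vertices."""
    edges: List[Edge] = []
    for i in range(r):
        for j in range(r):
            if i + 1 < r:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < r:
                edges.append(((i, j), (i, j + 1)))
    return edges


def is_forest(vertices: Set[Vertex], edges: List[Edge]) -> bool:
    """Union-find cycle test on the subgraph induced by `vertices`."""
    parent = {v: v for v in vertices}

    def find(v: Vertex) -> Vertex:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        if a in vertices and b in vertices:
            ra, rb = find(a), find(b)
            if ra == rb:
                return False  # this edge would close a cycle
            parent[ra] = rb
    return True


def min_feedback_vertex_set(r: int) -> int:
    """Size of a smallest vertex set whose removal makes the grid acyclic."""
    vertices = [(i, j) for i in range(r) for j in range(r)]
    edges = grid_edges(r)
    for size in range(len(vertices) + 1):
        for removed in combinations(vertices, size):
            if is_forest(set(vertices) - set(removed), edges):
                return size
    raise AssertionError("removing all vertices always leaves a forest")
```

On the 3 × 3 grid the search returns 2 (for example, the center together with one corner), which is consistent with the bound (3 − 1)²/4 = 1.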
Key Results 


Bidimensionality builds on the seminal graph 
minor theory of Robertson and Seymour, by ex- 
tending some mathematical results and building 
new algorithmic tools. The foundation for sev- 
eral results in bidimensionality is the following 
two combinatorial results. The first relates any 
bidimensional parameter to treewidth, while the 
second relates treewidth to grid minors. 


Theorem 1 ([1, 8]) If the parameter P is g(r)-bidimensional,
then for every graph G in the family
associated with the parameter P, tw(G) =
O(g⁻¹(P(G))). In particular, if g(r) = Θ(r²),
then the bound becomes tw(G) = O(√P(G)).


Theorem 2 ([8]) For any fixed graph H, every
H-minor-free graph of treewidth w has an
Ω(w) × Ω(w) grid as a minor.


The two major algorithmic results in
bidimensionality are general subexponential
fixed-parameter algorithms and general
polynomial-time approximation schemes (PTASs).


Theorem 3 ([1, 8]) Consider a g(r)-bidimensional
parameter P that can be computed on a
graph G in h(w) n^{O(1)} time given a tree decomposition
of G of width at most w. Then there is
an algorithm computing P on any graph G in
P’s corresponding graph class, with running
time [h(O(g⁻¹(k))) + 2^{O(g⁻¹(k))}] n^{O(1)}. In
particular, if g(r) = Θ(r²) and h(w) = 2^{o(w²)},
then this running time is subexponential in k.
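
The "in particular" clause of Theorem 3 can be unpacked by a short calculation, given here as a sketch under the stated hypotheses g(r) = Θ(r²) and h(w) = 2^{o(w²)}:

```latex
\begin{align*}
g(r) = \Theta(r^2) &\implies g^{-1}(k) = \Theta(\sqrt{k}),\\
h(w) = 2^{o(w^2)} &\implies h\bigl(O(\sqrt{k})\bigr) = 2^{o(k)},\\
\bigl[h(O(g^{-1}(k))) + 2^{O(g^{-1}(k))}\bigr]\, n^{O(1)}
  &= \bigl[2^{o(k)} + 2^{O(\sqrt{k})}\bigr]\, n^{O(1)}
   = 2^{o(k)}\, n^{O(1)}.
\end{align*}
```

Under the stronger (and here purely illustrative) assumption h(w) = 2^{O(w)}, the same substitution gives the sharper running time 2^{O(√k)} n^{O(1)}.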


Theorem 4 ([7]) Consider a bidimensional
problem satisfying the “separation property”
defined in [4, 7].
Suppose that the problem can be solved
on a graph G with n vertices in f(n, tw(G))
time. Suppose also that the problem can be
approximated within a factor of α in g(n)
time. For contraction-bidimensional problems,
suppose further that both of these algorithms
also apply to the “generalized form” of the
problem defined in [4, 7]. Then there is a
(1 + ε)-approximation algorithm whose running
time is O(n f(n, O(α²/ε)) + n³ g(n)) for the
corresponding graph class of the bidimensional
problem.


Applications 


The theorems above have many combinatorial 
and algorithmic applications. 

Applying the parameter-treewidth bound of
Theorem 1 to the parameter of the number of
vertices in the graph proves that every H-minor-free
graph on n vertices has treewidth O(√n),
thus (re)proving the separator theorem for
H-minor-free graphs. Applying the parameter-treewidth
bound of Theorem 1 to the parameter
of the diameter of the graph proves a stronger
form of Eppstein’s diameter-treewidth relation
for apex-minor-free graphs. (Further work
shows how to strengthen the diameter-treewidth
relation to linear [6].) The treewidth-grid
relation of Theorem 2 can be used to bound
the gap between half-integral multicommodity
flow and fractional multicommodity flow in
H-minor-free graphs. It also yields an
O(1)-approximation for treewidth in H-minor-free
graphs. The subexponential fixed-parameter
algorithms of Theorem 3 subsume or strengthen
all previous such results. These results can also be
generalized to obtain fixed-parameter algorithms
in arbitrary graphs. The PTASs of Theorem 4 in
particular establish the first PTASs for connected
dominating set and feedback vertex set even for
planar graphs. For details of all of these results,
see [4].


Open Problems 


Several combinatorial and algorithmic open prob- 
lems remain in the theory of bidimensionality and 
related concepts. 

Can the grid-minor theorem for H-minor-free
graphs, Theorem 2, be generalized to arbitrary 
graphs with a polynomial relation between 
treewidth and the largest grid minor? (The best 
relation so far is exponential.) Such polynomial 
generalizations have been obtained for the 
cases of “map graphs” and “power graphs” [9]. 
Good grid-treewidth bounds have applications to 
minor-bidimensional problems. 

Can the algorithmic results (Theorems 3 
and 4) be generalized to solve contraction- 
bidimensional problems beyond apex-minor-free 
graphs? It is known that the basis for these results, 
Theorem 1, does not generalize [1]. Nonetheless, 
Theorem 3 has been generalized for one specific 
contraction-bidimensional problem, dominating
set [3].

Can the polynomial-time approximation
schemes of Theorem 4 be generalized to
more general algorithmic problems that do not
correspond directly to bidimensional parameters? 
One general family of such problems arises when 
adding weights to vertices and/or edges, and 
the goal is, e.g., to find the minimum-weight 
dominating set. Another family of such problems 
arises when placing constraints (e.g., on coverage 
or domination) only on subsets of vertices and/or 
edges. Examples of such problems include 
Steiner tree and subset feedback vertex set. 

For additional open problems and details
about the problems above, see [4].

Problem Definition


In the one-dimensional bin packing problem, one
is given a list L = (a_1, a_2, ..., a_n) of items, each
item a_i having a size s(a_i) ∈ (0, 1]. The goal is
to pack the items into a minimum number of
unit-capacity bins, that is, to partition the items
into a minimum number of sets, each having total
size of at most 1. This problem is NP-hard, and
so much of the research on it has concerned the
design and analysis of approximation algorithms,
which will be the subject of this article.

Although bin packing has many applications,
it is perhaps most important for the role it has
played as a proving ground for new algorithmic
and analytical techniques. Some of the first worst-
and average-case results for approximation algorithms
were proved in this domain, as well as the
first lower bounds on the competitive ratios of
online algorithms. Readers interested in a more
detailed coverage than is possible here are directed
to two relatively recent surveys [4, 11].


Key Results 
Worst-Case Behavior 


Asymptotic Worst-Case Ratios 

For most minimization problems, the standard
worst-case metric for an approximation algorithm
A is the maximum, over all instances I, of the
ratio A(I)/OPT(I), where A(I) is the value of the
solution generated by A and OPT(I) is the optimal
solution value. In the case of bin packing, however,
there are limitations to this “absolute worst-case
ratio” metric. Here it is already NP-hard
to determine whether OPT(I) = 2, and hence
no polynomial-time approximation algorithm can
have an absolute worst-case ratio better than 1.5
unless P = NP. To better understand the behavior
of bin packing algorithms in the typical situation
where the given list L requires a large number of
bins, researchers thus use a more refined metric
for bin packing, the asymptotic worst-case ratio
R_A^∞. This is defined in two steps as follows.


R_A^n = max { A(L)/OPT(L) :
L is a list with OPT(L) = n }


R_A^∞ = lim sup_{n→∞} R_A^n


The first algorithm whose behavior was analyzed
in these terms was First Fit (FF). This
algorithm envisions an infinite sequence of empty
bins B_1, B_2, ... and, starting with the first item in
the input list L, places each item in turn into the
first bin which still has room for it. In a technical
report from 1971, which was one of the very first
papers in which worst-case performance ratios
were studied, Ullman [22] proved the following.


Theorem 1 ([22]) R_FF^∞ = 17/10.


In addition to FF, five other simple heuristics
received early study and have served as the
inspiration for later research. Best Fit (BF) is the
variant of FF in which each item is placed in
the bin into which it will fit with the least space
left over, with ties broken in favor of the earliest
such bin. Both FF and BF can be implemented
to run in time O(n log n) [12]. Next Fit (NF)
is a still simpler and linear-time algorithm in 
which the first item is placed in the first bin, 
and thereafter each item is placed in the last 
nonempty bin if it will fit, otherwise a new bin 
is started. First Fit Decreasing (FFD) and Best 
Fit Decreasing (BFD) are the variants of those 
algorithms in which the input list is first sorted 
into nonincreasing order by size and then the 
corresponding packing rule is applied. The results 
for these algorithms are as follows. 


Theorem 2 ([12]) R_NF^∞ = 2.

Theorem 3 ([13]) R_BF^∞ = 17/10.

Theorem 4 ([12, 13]) R_FFD^∞ = R_BFD^∞ = 11/9
= 1.222...
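
The packing rules described above are simple enough to state directly in code. The following Python sketch is a straightforward illustration rather than the O(n log n) implementations of [12]: First Fit and Best Fit are written with a linear scan over the open bins, so they take quadratic time. Item sizes are assumed to be exact fractions in (0, 1] to avoid rounding issues, and the function names are choices made for this example.

```python
from fractions import Fraction
from typing import Callable, Iterable, List


def next_fit(sizes: Iterable[Fraction]) -> int:
    """Pack each item into the last bin if it fits, else open a new bin."""
    bins = 0
    room = Fraction(0)  # remaining room in the last bin (none exists yet)
    for s in sizes:
        if s > room:
            bins += 1
            room = Fraction(1)
        room -= s
    return bins


def first_fit(sizes: Iterable[Fraction]) -> int:
    """Pack each item into the earliest bin that still has room for it."""
    loads: List[Fraction] = []
    for s in sizes:
        for i, load in enumerate(loads):
            if load + s <= 1:
                loads[i] += s
                break
        else:
            loads.append(s)  # no existing bin fits: start a new one
    return len(loads)


def best_fit(sizes: Iterable[Fraction]) -> int:
    """Pack each item into the feasible bin leaving the least space."""
    loads: List[Fraction] = []
    for s in sizes:
        feasible = [i for i, load in enumerate(loads) if load + s <= 1]
        if feasible:
            # max() returns the first maximal element, i.e., the earliest bin on ties.
            best = max(feasible, key=lambda i: loads[i])
            loads[best] += s
        else:
            loads.append(s)
    return len(loads)


def decreasing(rule: Callable[[List[Fraction]], int], sizes: Iterable[Fraction]) -> int:
    """FFD/BFD: sort into nonincreasing order, then apply the given rule."""
    return rule(sorted(sizes, reverse=True))
```

On the list (1/2, 2/3, 1/3, 1/2), whose optimum is 2 bins, NF and FF use 3 bins, while BF, FFD (decreasing(first_fit, ...)), and BFD use 2.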


The above mentioned algorithms are relatively 
simple and intuitive. If one is willing to con- 
sider more complicated algorithms, one can do 
substantially better. The current best polynomial- 
time bin packing algorithm is very good indeed. 
This is the 1982 algorithm of Karmarkar and 
Karp [15], denoted here as “KK.” It exploits the 
ellipsoid algorithm, approximation algorithms for 
the knapsack problem, and a clever rounding 
scheme to obtain the following guarantees. 


Theorem 5 ([15]) R_KK^∞ = 1, and there is a constant
c such that for all lists L,


KK(L) ≤ OPT(L) + c log²(OPT(L)).


Unfortunately, the running time for KK appears
to be worse than O(n^8), and BFD and FFD remain
much more practical alternatives.


Online Algorithms 

Three of the abovementioned algorithms (FF, BF,
and NF) are online algorithms, in that they pack
items in the order given, without reference to the
sizes or number of later items. As was subsequently
observed in many contexts, the online
restriction can seriously limit the ability of an
algorithm to produce good solutions. Perhaps the
first limitation of this type to be proved was Yao’s
theorem [24] that no online algorithm A for bin
packing can have R_A^∞ < 1.5. The bound has since
been improved to the following.


Theorem 6 ([23]) If A is an online algorithm for
bin packing, then R_A^∞ ≥ 1.540....


Here the exact value of the lower bound is the 
solution to a complicated linear program. 

Yao’s paper also presented an online
algorithm, Revised First Fit (RFF), that had
R_RFF^∞ = 5/3 = 1.666... and hence got closer
to this lower bound than FF and BF. This
algorithm worked by dividing the items into four
classes based on size and index, and then using
different packing rules (and packings) for each
class. Subsequent algorithms improved on this
by going to more and more classes. The current
champion is the online Harmonic++ algorithm
(H++) of [21]:


Theorem 7 ([21]) R_{H++}^∞ ≤ 1.58889.


Bounded-Space Algorithms 

The NF algorithm, in addition to being online, 
has another property worth noting: no more than 
a constant number of partially filled bins remain 
open to receive additional items at any given time. 
In the case of NF, the constant is 1: only the last
partially filled bin can receive additional items.
Bounding the number of open bins may be necessary
in some applications, such as packing trucks
on loading docks. The bounded-space constraint
imposes additional limits on algorithmic behavior,
however.


Theorem 8 ([17]) For any online bounded-space
algorithm A, R_A^∞ ≥ 1.691....


The constant 1.691... arises in many other bin 
packing contexts. It is commonly denoted by hgg 
and equals }°°° ,(1/t;), where t; = 1 and, for 
i>, =t-1(4-14+ 1). 

The lower bound in Theorem 8 is tight, owing to the existence of the
Harmonic$_k$ algorithms ($H_k$) of [17]. $H_k$ is a class-based
algorithm in which the items are divided into classes $C_h$,
$1 \le h \le k$, with $C_k$ consisting of all items with size $1/k$ or
smaller, and $C_h$, $1 \le h < k$, consisting of all $a_i$ with
$1/(h+1) < s(a_i) \le 1/h$. The items in each class are then packed by
NF into a separate packing devoted just to that class. Thus, at most
k bins are open at any time. In [17] it was shown that
$\lim_{k\to\infty} R^\infty_{H_k} = h_\infty = 1.691\ldots$.
This is even better than the asymptotic worst-case ratio of 1.7 for
the unbounded-space algorithms FF and BF, although it should be noted
that the bounded-space variant of BF in which all but the two
most-full bins are closed also has an asymptotic worst-case ratio of
1.7 [8].
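
The class-based scheme can be sketched in a few lines. This is an illustrative reading of the description above, not the implementation from [17]: it assumes item sizes in (0, 1], classifies an item of size s into class min(k, floor(1/s)), and runs Next Fit separately inside each class. The name `harmonic_k` is hypothetical.

```python
from fractions import Fraction


def harmonic_k(sizes, k):
    """Pack items online, class by class, and return the number of bins used."""
    level = [0] * (k + 1)        # level[h]: fill of the open bin of class h
    is_open = [False] * (k + 1)  # whether class h currently has an open bin
    bins = 0
    for s in sizes:
        # Class h < k holds sizes with 1/(h+1) < s <= 1/h; class k holds s <= 1/k.
        h = min(k, int(1 // s))
        if not is_open[h] or level[h] + s > 1:
            bins += 1            # close the old bin of this class, open a new one
            level[h] = 0
            is_open[h] = True
        level[h] += s
    return bins


items = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 2), Fraction(1, 3),
         Fraction(1, 2), Fraction(1, 3), Fraction(1, 2)]
print(harmonic_k(items, 3))
```

Because each class keeps a single open bin, at most k bins are open at any time, which is the bounded-space property discussed above. Exact fractions are used in the example so that sizes such as 1/3 fall in the intended class.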


Average-Case Behavior 


Continuous Distributions 

Bin packing also served as an early test bed for studying the
average-case behavior of approximation algorithms. Suppose F is a
distribution on (0, 1] and $L_n$ is a list of n items with item sizes
chosen independently according to F. For any list L, let s(L) denote
the lower bound on OPT(L) obtained by summing the sizes of all the
items in L. Then define


$$ER_A^n(F) = E[A(L_n)/OPT(L_n)],$$
$$ER_A^\infty(F) = \limsup_{n\to\infty} ER_A^n(F),$$
$$EW_A^n(F) = E[A(L_n) - s(L_n)].$$


The last definition is included since $ER_A^\infty(F) = 1$ occurs
frequently enough that finer distinctions are meaningful. For example,
in the early 1980s, it was observed that for the distribution
F = U(0, 1] in which item sizes are uniformly distributed on the
interval (0, 1], $ER^\infty_{FFD}(F) = ER^\infty_{BFD}(F) = 1$, as a
consequence of the following more-detailed results.


Theorem 9 ([16, 20]) For $A \in \{FFD, BFD, OPT\}$,
$EW_A^n(U(0, 1]) = \Theta(\sqrt{n})$.


Somewhat surprisingly, it was later discovered that the online FF and
BF algorithms also had sublinear expected waste, and hence
$ER_A^\infty(U(0, 1]) = 1$.


Theorem 10 ([5, 19])


$$EW^n_{FF}(U(0, 1]) = \Theta(n^{2/3})$$
$$EW^n_{BF}(U(0, 1]) = \Theta(n^{1/2} \log^{3/4} n)$$


This good behavior does not, however, extend to the bounded-space
algorithms NF and $H_k$:


Theorem 11 ([6, 18])


$$ER^\infty_{NF}(U(0, 1]) = 4/3 = 1.333\ldots$$
$$\lim_{k\to\infty} ER^\infty_{H_k}(U(0, 1]) = \pi^2/3 - 2 = 1.2899\ldots$$


All the above results except the last two exploit the fact that the
distribution U(0, 1] is symmetric about 1/2, and hence an optimal
packing consists primarily of two-item bins, with items of size
s > 1/2 matched with smaller items of size very close to 1 - s. The
proofs essentially show that the algorithms in question do good jobs
of constructing such matchings. In practice, however, there will
clearly be situations where more than matching is required. To model
such situations, researchers first turned to the distributions
U(0, b], 0 < b < 1, where item sizes are chosen uniformly from the
interval (0, b]. Simulations suggest that such distributions make
things worse for the online algorithms FF and BF, which appear to have
$ER_A^\infty(U(0, b]) > 1$ for all $b \in (0, 1)$. Surprisingly, they
make things better for FFD and BFD (and the optimal packing).


Theorem 12 ([2, 14])


1. For $0 < b \le 1/2$ and $A \in \{FFD, BFD\}$,
$EW_A^n(U(0, b]) = O(1)$.

2. For $1/2 < b < 1$ and $A \in \{FFD, BFD\}$,
$EW_A^n(U(0, b]) = \Theta(n^{1/3})$.

3. For $0 < b < 1$, $EW^n_{OPT}(U(0, b]) = O(1)$.
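
The expected waste $EW_A^n$ is straightforward to estimate by simulation, which is how several of these results were first suspected. The sketch below is a hedged illustration rather than the experimental setup of the cited papers: it implements a plain quadratic-time First Fit Decreasing and averages A(L_n) - s(L_n) over a few random lists drawn from U(0, b]. The function names, the number of trials, and the seed are all arbitrary choices.

```python
import random


def first_fit_decreasing(sizes):
    """Return the number of bins FFD uses (simple O(n^2) implementation)."""
    levels = []
    for s in sorted(sizes, reverse=True):
        for i, level in enumerate(levels):
            if level + s <= 1.0:
                levels[i] = level + s
                break
        else:
            levels.append(s)  # no open bin fits: start a new one
    return len(levels)


def estimate_waste(n, b, trials=20, seed=0):
    """Monte Carlo estimate of E[FFD(L_n) - s(L_n)] for sizes uniform on (0, b]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
        sizes = [b * (1.0 - rng.random()) for _ in range(n)]
        total += first_fit_decreasing(sizes) - sum(sizes)
    return total / trials


print(estimate_waste(500, 0.5), estimate_waste(500, 0.8))
```

Running the estimate for growing n and different values of b gives a rough feel for the bounded versus growing waste regimes, although small experiments like this are only suggestive.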


Discrete Distributions

In many applications, the item sizes come from a finite set, rather
than a continuous distribution like those discussed above. Thus,
recently the study of average-case behavior for bin packing has turned
to discrete distributions. Such a distribution is specified by a
finite list $s_1, s_2, \ldots, s_d$ of rational sizes and for each
$s_i$ a corresponding rational probability $p_i$. A remarkable result
of Courcoubetis and Weber [7] says the following.


Theorem 13 ([7]) For any discrete distribution F, $EW^n_{OPT}(F)$ is
either $\Theta(n)$, $\Theta(\sqrt{n})$, or $O(1)$.


The discrete analogue of the continuous distribution U(0, b] is the
distribution U{j, k}, where the sizes are 1/k, 2/k, ..., j/k and all
the probabilities equal 1/j. Simulations suggest that the behavior of
FF and BF in the discrete case is qualitatively similar to the
behavior in the continuous case, whereas the behavior of FFD and BFD
is considerably more bizarre [3]. Of particular note is the
distribution F = U{6, 13}, for which $ER^\infty_{FFD}(F)$ is strictly
greater than $ER^\infty_{FF}(F)$, in contrast to all the previously
implied comparisons between the two algorithms.

For discrete distributions, however, the standard algorithms are all
dominated by a new online algorithm called the Sum-of-Squares (SS)
algorithm. Note that since the item sizes are all rational, they can
be scaled so that they (and the bin size B) are all integral. Then at
any given point in the operation of an online algorithm, the current
packing can be summarized by giving, for each h, $1 \le h \le B$, the
number $n_h$ of bins containing items of total size h. In SS, one
packs each item so as to minimize


$$\sum_{h=1}^{B-1} n_h^2.$$
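
The rule is easy to state in code once sizes are integers. The following is a minimal sketch under stated assumptions: sizes and the bin capacity B are positive integers with each size at most B, ties are broken in favor of the first candidate examined (the text does not fix a tie-breaking rule), and only the change in the sum of squares is computed for each candidate placement. The name `sum_of_squares_pack` is illustrative.

```python
def sum_of_squares_pack(sizes, B):
    """Pack integer sizes online with the Sum-of-Squares rule; return bins used."""
    n = [0] * (B + 1)  # n[h] = number of bins whose current level is h
    bins = 0
    for s in sizes:
        # Candidate 0: open a new bin. A bin of level B is full and not counted.
        best_delta = 2 * n[s] + 1 if s < B else 0
        best_h = 0
        for h in range(1, B - s + 1):
            if n[h] == 0:
                continue
            # Moving one bin from level h to level h + s changes the sum by:
            delta = -2 * n[h] + 1
            if h + s < B:
                delta += 2 * n[h + s] + 1
            if delta < best_delta:
                best_delta, best_h = delta, h
        if best_h == 0:
            bins += 1
        else:
            n[best_h] -= 1
        if best_h + s < B:
            n[best_h + s] += 1
    return bins


print(sum_of_squares_pack([6, 4, 6, 4], 10))
```

Each item costs O(B) time here, since only the counts $n_h$ are tracked rather than the individual bins.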


Theorem 14 ([9]) For any discrete distribution F, the following hold.


1. If $EW^n_{OPT}(F) = \Theta(\sqrt{n})$, then
$EW^n_{SS}(F) = \Theta(\sqrt{n})$.

2. If $EW^n_{OPT}(F) = O(1)$, then
$EW^n_{SS}(F) \in \{O(1), \Theta(\log n)\}$.




In addition, a simple modification to SS can eliminate the
$\Theta(\log n)$ case of condition 2.


Applications 


There are many potential applications of one- 
dimensional bin packing, from packing band- 
width requests into fixed-capacity channels to 
packing commercials into station breaks. In prac- 
tice, simple heuristics like FFD and BFD are 
commonly used. 


Open Problems 


Perhaps the most fundamental open problem related to bin packing is
the following. As observed above, there is a polynomial-time algorithm
(KK) whose packings are within $O(\log^2(OPT))$ bins of optimal. Is
it possible to do better? As far as is currently known, there could
still be a polynomial-time algorithm that always gets within one bin
of optimal, even if P $\ne$ NP.


Experimental Results 


Bin packing has been a fertile ground for experimental analysis, and
many of the theorems mentioned above were first conjectured on the
basis of experimental results. For example, the experiments reported
in [1] inspired Theorems 10 and 12, and the experiments in [10]
inspired Theorem 14.


Problem Definition


In the bin packing problem, one is given a sequence of items, each of
size in the range (0, 1], and an infinite number of bins. The goal is
to pack each item into some bin using as few bins as possible, under
the constraint that the sum of sizes of items in each bin is at most
one. In the bin packing problem with cardinality constraints, an
additional constraint is imposed that each bin can contain at most k
items.

This problem for k = 2 is solvable in polynomial time by reducing it
to the cardinality matching problem. Nevertheless, this problem for
$k \ge 3$ is NP-hard, since one can reduce 3-PARTITION to it.
Therefore, much work has been done on approximation algorithms. We
remark that, in particular, it has also been of interest to design
online algorithms that pack each item upon its arrival.

The standard performance measure of an approximation algorithm for
this problem is the asymptotic performance ratio. For a sequence of
items L and an approximation algorithm A, let A(L) denote the value of
the solution generated by A for L, and let OPT(L) denote the value of
the optimal solution for L. The asymptotic performance ratio of A is
defined as


$$R_A^\infty = \limsup_{n\to\infty} \sup_L \left\{ \frac{A(L)}{OPT(L)} \;\middle|\; OPT(L) = n \right\}.$$


The bin packing problem with cardinality con- 
straints is formally defined as follows: 


Problem 1 (Bin Packing with Cardinality Constraints)

Input: A sequence $L = (a_1, a_2, \ldots, a_n) \in (0, 1]^n$ and an
integer $k \ge 2$.
Output: An integer $m \ge 1$ and a partition of $\{1, 2, \ldots, n\}$
into disjoint subsets $S_1, S_2, \ldots, S_m$ such that (1) m is
minimum, (2) $\sum_{i \in S_j} a_i \le 1$ for all $1 \le j \le m$,
and (3) $|S_j| \le k$ for all $1 \le j \le m$.


Bin Packing with Cardinality Constraints 


Key Results 


Approximation Algorithms 

Krause et al. [9, 10] gave approximation algorithms whose asymptotic
performance ratios are all two. Kellerer and Pferschy [8] presented an
improved approximation algorithm with asymptotic performance ratio
3/2. Caprara et al. [3] provided an APTAS (asymptotic polynomial-time
approximation scheme): a collection of approximation algorithms that,
for any parameter $\varepsilon > 0$, guarantees an asymptotic
performance ratio of $1 + \varepsilon$. Finally, a better
polynomial-time scheme was developed:


Theorem 1 ([6]) There exists an AFPTAS (asymptotic fully
polynomial-time approximation scheme) for the bin packing problem with
cardinality constraints, that is, an APTAS whose running time is
polynomial in the input size and $1/\varepsilon$.


Online Algorithms 

An online algorithm is an approximation algo- 
rithm which, for each i = 1,2,...,n, decides 
into which bin to place the ith item without 
information on the sizes of later items or the 
value of n. The First-Fit, Best-Fit, and Next- 
Fit algorithms may be the most common online 
algorithms for the bin packing problem without 
cardinality constraints. 

Krause et al. [9, 10] adapted the First-Fit algorithm to the problem
with cardinality constraints and showed that its asymptotic
performance ratio is at most $2.7 - \frac{12}{5k}$. The result was
later improved. Some work was done for individual values of k. We thus
summarize best known upper and lower bounds on the asymptotic
performance ratio for each $2 \le k \le 6$ in Table 1. We say here
that u is an upper bound on the asymptotic performance ratio if there
exists an online algorithm A such that $R_A^\infty = u$. On the other
hand, we say that l is a lower bound on the asymptotic performance
ratio if $R_A^\infty \ge l$ holds for any online algorithm A.
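
One natural way to adapt First-Fit is to treat a bin as unavailable once it holds k items. The sketch below illustrates that idea only; it is an assumption about the adaptation, not a transcription of the algorithm analyzed in [9, 10], and the name `first_fit_cardinality` is hypothetical.

```python
from fractions import Fraction


def first_fit_cardinality(sizes, k):
    """Online First-Fit where a bin accepts an item only if it has fewer than k items."""
    bins = []  # each bin is [total_size, item_count]
    for s in sizes:
        for b in bins:
            if b[1] < k and b[0] + s <= 1:
                b[0] += s
                b[1] += 1
                break
        else:
            bins.append([s, 1])  # no existing bin can take the item
    return len(bins)


tiny = [Fraction(1, 10)] * 6
print(first_fit_cardinality(tiny, 2), first_fit_cardinality(tiny, 6))
```

The example shows the effect of the constraint: six items of size 1/10 fit in one bin by size, but need three bins when k = 2.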

Babel et al. [1] designed an online algorithm, denoted here by BCKK,
which guarantees an asymptotic performance ratio regardless of the
value of k. For $k \ge 7$, BCKK is the best so far.


Theorem 2 ([1]) For any k, $R^\infty_{BCKK} = 2$.


Recently, Dósa and Epstein [4] showed a lower bound on the asymptotic
performance ratio for each $7 \le k \le 11$. Fujiwara and Kobayashi
[7] established a lower bound for each $12 \le k \le 41$. Some results
on the bin packing problem without cardinality constraints can be
interpreted as lower bounds on the asymptotic performance ratio for
large k: a lower bound of $\frac{217}{141}$ ($\approx 1.53900$) for
$42 \le k \le 293$ [11] and $\frac{10633}{6903}$ ($\approx 1.54034$)
for $294 \le k \le 2{,}057$ [2].


Bounded-Space Online Algorithms 

A bounded-space online algorithm is an online 
algorithm which has only a constant number of 
bins available to accept given items at any time 
point. For example, the Next-Fit algorithm is 
a bounded-space online algorithm for the bin 
packing problem without cardinality constraints, 
since for the arrival of each new item, the 
algorithm always keeps a single bin which 
contains some item(s). All algorithms that 
appeared in the previous section, except Next- 
Fit, do not satisfy this property; such algorithms 
are sometimes called unbounded-space online 
algorithms. 

Bin Packing with Cardinality Constraints, Table 1
Best known upper and lower bounds on the asymptotic
performance ratio of online algorithms for $2 \le k \le 6$

k    Upper bound                                  Lower bound
2    $1 + \sqrt{5}/5$ ($\approx 1.44721$) [1]     1.42764 [7]
3    1.75 [5]                                     1.5 [1]
4    $71/38$ ($\approx 1.86843$) [5]              1.5 [7]
5    $771/398$ ($\approx 1.93719$) [5]            1.5 [4]
6    $287/144$ ($\approx 1.99306$) [5]            1.5 [12]


For the bin packing problem with cardinality constraints, a
bounded-space online algorithm called CCH$_k$ [5], which is based on
the Harmonic algorithm, is known to be optimal. Its asymptotic
performance ratio is
$R_k = \sum_{i=1}^{k} \max\left\{ \frac{1}{t_i - 1}, \frac{1}{k} \right\}$,
where $t_i$ is the sequence defined by $t_1 = 2$,
$t_{i+1} = t_i(t_i - 1) + 1$ for $i \ge 1$. For example, we have
$R_2 = 3/2 = 1.5$, $R_3 = 11/6 \approx 1.83333$, $R_4 = 2$,
$R_5 = 2.1$, and $R_6 = 13/6 \approx 2.16667$. The value of $R_k$
increases as k grows and approaches
$1 + \sum_{i=1}^{\infty} \frac{1}{t_i - 1} \approx 2.69103$.


Theorem 3 ([5]) For every k, $R^\infty_{CCH_k} = R_k$. Besides,
$R_A^\infty \ge R_k$ holds for any bounded-space online algorithm A.
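
The example values of the ratio can be reproduced with exact arithmetic. This is a small sketch that assumes the closed form $R_k = \sum_{i=1}^{k} \max\{1/(t_i - 1), 1/k\}$ with $t_1 = 2$ and $t_{i+1} = t_i(t_i - 1) + 1$; the function name `bounded_space_ratio` is illustrative.

```python
from fractions import Fraction


def bounded_space_ratio(k):
    """Evaluate R_k = sum_{i=1..k} max(1/(t_i - 1), 1/k) exactly."""
    total = Fraction(0)
    t = 2  # t_1
    for _ in range(k):
        total += max(Fraction(1, t - 1), Fraction(1, k))
        t = t * (t - 1) + 1  # t_{i+1}
    return total


for k in range(2, 7):
    print(k, bounded_space_ratio(k), float(bounded_space_ratio(k)))
```

For large k the terms with $1/(t_i - 1) < 1/k$ each contribute $1/k$, which is where the extra 1 in the limiting value comes from.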


Applications 


In the paper by Krause et al. [9, 10], the aim was to analyze task
scheduling algorithms for multiprocessor systems. More generally, a
constraint on the number of objects in a container is important in
applications, such as a limit on the number of files on a hard disk
drive or a limit on the number of requests assigned to each node in a
distributed system.


Open Problems 


Many problems concerning (unbounded-space) 
online algorithms remain open. Even for small 
values of k, an optimal online algorithm has yet 
to be found. It is also interesting whether, for 
general k, there is an online algorithm whose 
asymptotic performance ratio is strictly smaller 
than two. 


Problem Definition


The well-known bin packing problem [3, 8] has numerous variants [4].
Here, we consider one natural variant, called the bin packing problem
with general cost structures (GCBP) [1, 2, 6]. In this problem, the
action of an algorithm remains as in standard bin packing. We are
given n items of rational sizes in (0, 1]. These items are to be
assigned into unit size bins. Each bin may contain items of total size
at most 1. While in the standard problem the goal is to minimize the
number of used bins, the goal in GCBP is different; the cost of a bin
is not 1, but it depends on the number of items actually packed into
this bin. This last function is a concave function of the number of
packed items, where the cost of an empty bin is zero. More precisely,
the input consists of n items $I = \{1, 2, \ldots, n\}$ with sizes
$1 \ge s_1 \ge s_2 \ge \cdots \ge s_n > 0$ and a function
$f : \{0, 1, 2, \ldots, n\} \to \mathbb{R}_0^+$, where f is a
monotonically nondecreasing concave function, satisfying f(0) = 0.
The goal is to partition I into some number of sets
$S_1, \ldots, S_\mu$, called bins, such that
$\sum_{j \in S_i} s_j \le 1$ for any $1 \le i \le \mu$, and so that
$\sum_{i=1}^{\mu} f(|S_i|)$ is minimized (where $|S_i|$ denotes the
cardinality of the set $S_i$). An instance of GCBP is defined not only
by its input item sizes but also using the function f. It can be
assumed that f(1) = 1 (by possible scaling of the cost function f).
The problem is strongly NP-hard for multiple functions f, and as
standard bin packing, it was studied using the asymptotic
approximation ratio.


Key Results 


There are two kinds of results for the problem. 
The first kind of results is algorithms that do not 
take f into account. The second kind is those that 
base their action on the values of f. 

A class of (concave and monotonically nondecreasing) functions
$\{f_q\}_{q \in \mathbb{N}}$, which was considered in [1], is the
following. These are functions that grow linearly (with a slope of 1)
up to an integer point q, and then, they are constant (starting from
that point). Specifically, $f_q(t) = t$ for $t \le q$ and
$f_q(t) = q$ for $t > q$. It was shown in [1] that focusing on such
functions is sufficient when computing upper bounds on algorithms that
act independently of the cost function. Note that $f_1 \equiv 1$ on
nonempty bins, and thus GCBP with the cost function $f_1$ is
equivalent to standard bin packing.

Before describing the results, we present a simple example showing the
crucial differences between GCBP and standard bin packing. Consider
the function $f = f_3$ (where f(1) = 1, f(2) = 2, and f(k) = 3 for
$k \ge 3$). Given an integer $N \ge 1$, consider an input consisting
of 3N items, each of size 2/3, called large items, and 6N items, each
of size 1/6, called small items. An optimal solution for this input
with respect to standard bin packing uses 3N bins, each containing one
large item and two small items. This is the unique optimal solution
(up to swapping positions of identical items). The cost of this
solution for GCBP with the function $f = f_3$ is 9N. Consider a
solution that uses 4N bins: the first N bins receive six small items
each, and each additional bin receives one large item. This solution
is not optimal for standard bin packing, but its cost for GCBP with
$f = f_3$ is 6N.
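
The two costs in this example can be verified mechanically. The sketch below assumes large items of size 2/3 and small items of size 1/6 (sizes consistent with the packings described, where one large and two small items fill a bin and six small items fill a bin) and the cost function $f_q(t) = \min(t, q)$; the helper names are illustrative.

```python
from fractions import Fraction


def f_q(q, t):
    """Cost of a bin holding t items: linear up to q, constant afterwards."""
    return min(t, q)


def gcbp_cost(bins, q):
    """Total GCBP cost of a packing, after checking that every bin is feasible."""
    assert all(sum(b) <= 1 for b in bins)
    return sum(f_q(q, len(b)) for b in bins)


N = 5
large, small = Fraction(2, 3), Fraction(1, 6)

# Packing that is optimal for standard bin packing: 3N bins of one large + two small.
mixed = [[large, small, small] for _ in range(3 * N)]
# Alternative packing: N bins of six small items, then 3N bins of one large item.
separated = [[small] * 6 for _ in range(N)] + [[large] for _ in range(3 * N)]

print(len(mixed), gcbp_cost(mixed, 3))          # 3N bins, cost 9N
print(len(separated), gcbp_cost(separated, 3))  # 4N bins, cost 6N
```

The second packing uses more bins but is cheaper, because a bin's cost stops growing once it holds three items.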

Anily, Bramel, and Simchi-Levi [1] analyzed the worst-case performance
of some natural bin packing heuristics [8], when they are applied to
GCBP. They showed that many common heuristics for bin packing, such as
First Fit, Best Fit, and Next Fit, do not have a finite asymptotic
approximation ratio. Moreover, running the modifications of the first
two heuristics after sorting the lists of items (in a nonincreasing
order), i.e., applying the algorithms First Fit Decreasing and Best
Fit Decreasing, leads to similar results. However, Next Fit Decreasing
was shown to have an asymptotic approximation ratio of exactly 2. The
algorithm Next Fit packs items into its last bin as long as this is
possible and opens a new bin when necessary. Sorting the items in
nondecreasing order gives a better asymptotic approximation ratio of
approximately 1.691 (in this case, the three algorithms, First Fit
Increasing, Best Fit Increasing, and Next Fit Increasing, are the same
algorithm). It is stated in [1] that any heuristic that is independent
of f has an asymptotic approximation ratio of at least 4/3. An
improved approximation algorithm, called MatchHalf (MH), was developed
in [6]. The asymptotic approximation ratio of this algorithm does not
exceed 1.5. The idea of MH is to create bins containing pairs of
items. The candidate items to be packed into those bins are half of
the items of size above 1/2 (large items), but they can only be packed
with smaller items. Naturally, the smallest large items are selected,
and the algorithm tries to match them with smaller items. The
remaining items and unmatched items are packed using Next Fit
Increasing. Interestingly, it was shown [6] that matching a larger
fraction of large items can harm the asymptotic approximation ratio.

A fully polynomial approximation scheme (asymptotic FPTAS or AFPTAS)
for GCBP was given in [6]. This is a family of approximation
algorithms that contains, for any $\varepsilon > 0$, an approximation
algorithm whose asymptotic approximation ratio is at most
$1 + \varepsilon$. The running time must be polynomial in the input
and in $1/\varepsilon$. An AFPTAS for GCBP must use the function f in
its calculations (this can be shown using the example above and
similar examples and can also be deduced from the lower bound of 4/3
on the asymptotic approximation ratio of an algorithm that is
oblivious of f [1]). One difficulty in designing such a scheme is that
the nature of packing of small items is important, unlike
approximation schemes for standard bin packing, where small items can
be added greedily [7, 9]. While in our problem we can impose
cardinality constraints on bins (upper bounds on numbers of packed
items) as in [5], still the cost function introduces major
difficulties. Another ingredient of the scheme is preprocessing where
some very small items are packed into relatively full bins. It is
impossible to do this for all very small items as bins consisting of
only such items will have a relatively large cost (as each such bin
will contain a very large number of items). This AFPTAS and those of
[5] require column generation as in [9] but require fairly complicated
configuration linear programs.


Problem Definition


Boolean Functions 

The concept of a Boolean function, a function whose domain is
$\{0, 1\}^n$ and range is $\{0, 1\}$, is central to computing. Boolean
functions are used in foundational studies of complexity [7, 9] as
well as the design and analysis of logic circuits [4, 13]. A Boolean
function can be represented using a truth table: an enumeration of
the values taken by the function on each element of $\{0, 1\}^n$.
Since the truth table representation requires memory exponential in
n, it is impractical for most applications. Consequently, there is a
need for data structures and associated algorithms for efficiently
representing and manipulating Boolean functions.


Boolean Circuits 

Boolean functions can be represented in many ways. One natural
representation is a Boolean combinational circuit, or circuit for
short [6, Chapter 34]. A circuit consists of Boolean combinational
elements connected by wires. The Boolean combinational elements are
gates and primary inputs. Gates come in three types: NOT, AND, and OR.
The NOT gate functions as follows: it takes a single Boolean-valued
input and produces a single Boolean-valued output which takes value 0
if the input is 1, and 1 if the input is 0. The AND gate takes two
Boolean-valued inputs and produces a single output; the output is 1 if
both inputs are 1, and 0 otherwise. The OR gate is similar to AND,
except that its output is 1 if one or both inputs are 1, and 0
otherwise.

Circuits are required to be acyclic. The absence of cycles implies
that a Boolean assignment to the primary inputs can be unambiguously
propagated through the gates in topological order. It follows that a
circuit on n ordered primary inputs with a designated gate called the
primary output corresponds to a Boolean function on $\{0, 1\}^n$.
Every Boolean function can be represented by a circuit, e.g., by
building a circuit that mimics the truth table.
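
Propagating an assignment through the gates in topological order is simple to sketch. The representation below (a list of `(name, op, args)` triples already sorted topologically) and the function names are assumptions made for illustration; the example circuit computes XOR of two inputs using only NOT, AND, and OR gates.

```python
from itertools import product


def eval_circuit(gates, output, inputs):
    """Evaluate a circuit given as (name, op, args) triples in topological order."""
    val = dict(inputs)  # wire name -> 0/1, seeded with the primary inputs
    for name, op, args in gates:
        if op == "NOT":
            val[name] = 1 - val[args[0]]
        elif op == "AND":
            val[name] = val[args[0]] & val[args[1]]
        elif op == "OR":
            val[name] = val[args[0]] | val[args[1]]
        else:
            raise ValueError(f"unknown gate type: {op}")
    return val[output]


# XOR(a, b) = (a AND NOT b) OR (NOT a AND b)
xor_gates = [
    ("na", "NOT", ["a"]),
    ("nb", "NOT", ["b"]),
    ("t1", "AND", ["a", "nb"]),
    ("t2", "AND", ["na", "b"]),
    ("out", "OR", ["t1", "t2"]),
]

truth_table = [eval_circuit(xor_gates, "out", {"a": a, "b": b})
               for a, b in product((0, 1), repeat=2)]
print(truth_table)
```

Enumerating all inputs, as done here to build the truth table, takes time exponential in the number of primary inputs, which is the cost the later representations try to avoid.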

The circuit representation is very general: any decision problem that
is computable in polynomial time on a Turing machine can be computed
by a family of circuits of size polynomial in the instance size, and
the circuits can be constructed efficiently from the Turing machine
program [15]. However, the key analysis problems on circuits, namely,
satisfiability and equivalence, are NP-hard [7].


Boolean Formulas 

A Boolean formula is defined recursively: a Boolean variable $x_i$ is
a Boolean formula, and if $\phi$ and $\psi$ are Boolean formulas, then
so are $(\neg\phi)$, $(\phi \wedge \psi)$, $(\phi \vee \psi)$,
$(\phi \rightarrow \psi)$, and $(\phi \leftrightarrow \psi)$. The
operators $\neg, \vee, \wedge, \rightarrow, \leftrightarrow$ are
referred to as connectives; parentheses are often dropped for
notational convenience. Boolean formulas also can be used to represent
arbitrary Boolean functions; however, formula satisfiability and
equivalence are also NP-hard. Boolean formulas are not as succinct as
Boolean circuits: for example, the parity function has linear sized
circuits, but formula representations of parity are super-polynomial.
More precisely, $XOR_n : \{0, 1\}^n \to \{0, 1\}$ is defined to take
the value 1 on exactly those elements of $\{0, 1\}^n$ which contain an
odd number of 1s. Define the size of a formula to be the number of
connectives appearing in it. Then for any sequence of formulas
$\theta_1, \theta_2, \ldots$ such that $\theta_k$ represents $XOR_k$,
the size of $\theta_k$ is $\Omega(k^c)$ for all $c \in \mathbb{Z}^+$
[14, Chapters 11, 12].

A disjunct is a Boolean formula in which $\wedge$ and $\neg$ are the
only connectives, and $\neg$ is applied only to variables; for
example, $x_1 \wedge \neg x_3 \wedge \neg x_5$ is a disjunct. A
Boolean formula is said to be in Disjunctive Normal Form (DNF) if it
is of the form $D_0 \vee D_1 \vee \cdots \vee D_{k-1}$, where each
$D_i$ is a disjunct. DNF formulas can represent arbitrary Boolean
functions, e.g., by identifying each input on which the formula takes
the value 1 with a disjunct. DNF formulas are useful in logic design,
because they can be translated directly into a PLA implementation [4].
While satisfiability of DNF formulas is trivial, equivalence is
NP-hard. In addition, given DNF formulas $\phi$ and $\psi$, the
formulas $\neg\phi$ and $\phi \wedge \psi$ are not DNF formulas, and
the translation of these formulas to DNF formulas representing the
same function can lead to exponential growth in the size of the
formula.


Shannon Trees 

Let f be a Boolean function on domain $\{0, 1\}^n$. Associate the
dimensions with variables $x_0, \ldots, x_{n-1}$. Then the positive
cofactor of f with respect to $x_i$, denoted by $f_{x_i}$, is the
function on domain $\{0, 1\}^n$, which is defined by


$$f_{x_i}(a_0, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_{n-1}) = f(a_0, \ldots, a_{i-1}, 1, a_{i+1}, \ldots, a_{n-1}).$$


The negative cofactor of f with respect to $x_i$, denoted by
$f_{x_i'}$, is defined similarly, with 0 taking the place of 1 in the
right-hand side.

Every Boolean function can be decomposed using Shannon’s expansion
theorem:


$$f(x_0, \ldots, x_{n-1}) = x_i \cdot f_{x_i} + x_i' \cdot f_{x_i'}.$$


This observation can be used to represent f by a Shannon tree: a full
binary tree [6, Appendix B.5] of height n, where each path to a leaf
node defines a complete assignment to the n variables that f is
defined over, and the leaf node holds a 0 or a 1, based on the value f
takes for the assignment.
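
Cofactors and the expansion identity can be checked by brute force on small functions. The sketch below represents a Boolean function as a Python callable on 0/1 arguments, which is an assumption made purely for illustration; `cofactor` and `shannon_holds` are hypothetical helper names.

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with its i-th argument fixed to `value` (still takes n arguments)."""
    def g(*a):
        a = list(a)
        a[i] = value
        return f(*a)
    return g


def shannon_holds(f, n, i):
    """Check f = x_i * f_{x_i} + x_i' * f_{x_i'} on all 2^n assignments."""
    pos, neg = cofactor(f, i, 1), cofactor(f, i, 0)
    return all(
        f(*a) == ((a[i] & pos(*a)) | ((1 - a[i]) & neg(*a)))
        for a in product((0, 1), repeat=n)
    )


def majority(a, b, c):
    return int(a + b + c >= 2)


print([shannon_holds(majority, 3, i) for i in range(3)])
```

Applying the expansion recursively on $x_0$, then $x_1$, and so on, yields exactly the Shannon tree described above.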

The Shannon tree is not a particularly useful representation, since
the height of the tree representing every Boolean function on
$\{0, 1\}^n$ is n, and the tree has $2^n$ leaves. The Shannon tree can
be made smaller by merging isomorphic subtrees and bypassing nodes
which have identical children. At first glance the reduced Shannon
tree representation is not particularly useful, since it entails
creating the full binary tree in the first place. Furthermore, it is
not clear how to efficiently perform computations on the reduced
Shannon tree representation, such as equivalence checking or computing
the conjunction of functions presented as reduced Shannon trees.

Bryant [5] recognized that adding the restriction that variables
appear in fixed order from root to leaves greatly reduced the
complexity of manipulating reduced Shannon trees. He referred to this
representation as a binary decision diagram (BDD).


Key Results 


Definitions 

Technically, a BDD is a directed acyclic graph (DAG), with a
designated root, and at most two sinks: one labeled 0, the other
labeled 1. Nonsink nodes are labeled with a variable. Each nonsink
node has two outgoing edges: one labeled 1, leading to the 1-child,
the other labeled 0, leading to the 0-child. Variables must be
ordered; that is, if the variable label $x_i$ appears before the label
$x_j$ on some path from the root to a sink, then the label $x_j$ is
precluded from appearing before $x_i$ on any path from the root to a
sink. Two nodes are isomorphic if both are equi-labeled sinks, or they
are both nonsink nodes, with the same variable label, and their 0- and
1-children are isomorphic. For the DAG to be a valid BDD, it is
required that there are no two distinct isomorphic nodes, and that no
node has identical 0- and 1-children.

A key result in the theory of BDDs is that given a fixed variable
ordering, the representation is unique up to isomorphism, i.e., if F
and G are both BDDs representing $f : \{0, 1\}^n \to \{0, 1\}$ under
the variable ordering $x_1 < x_2 < \cdots < x_n$, then F and G are
isomorphic.

The definition of isomorphism directly yields a recursive algorithm
for checking isomorphism. However, the resulting complexity is
exponential in the number of nodes; this is illustrated, for example,
by checking the isomorphism of the BDD for the parity function against
itself. On inspection, the exponential complexity arises from repeated
checking of isomorphism between pairs of nodes, which naturally
suggests dynamic programming. Caching isomorphism checks reduces the
complexity of isomorphism checking to $O(|F| \cdot |G|)$, where $|B|$
denotes the number of nodes in the BDD B.


BDD Operations

Many logical operations can be implemented in polynomial time using
BDDs. Examples include bdd_and, which computes a BDD representing the
logical AND of the functions represented by two BDDs; bdd_or and
bdd_not, which are defined similarly; and bdd_compose, which takes a
BDD representing a function f, a variable v, and a BDD representing a
function g and returns the BDD for f where v is substituted by g.

The example of bdd_and is instructive. It is based on the identity
$f \cdot g = x \cdot (f_x \cdot g_x) + x' \cdot (f_{x'} \cdot g_{x'})$.
The recursion can be implemented directly: the base cases are when
either f or g are 0 and when one or both are 1. The recursion chooses
the variable v labeling either the root of the BDD for f or g,
depending on which is earlier in the variable ordering, and
recursively computes BDDs for $f_v \cdot g_v$ and
$f_{v'} \cdot g_{v'}$; these are merged if isomorphic. Given a BDD F
for f, if v is the variable labeling the root of F, the BDDs for
$f_{v'}$ and $f_v$, respectively, are simply the 0-child and 1-child
of F's root.

The implementation of bdd_and as described has exponential complexity
because of repeated subproblems arising. Dynamic programming again
provides a solution: caching the intermediate results of bdd_and
reduces the complexity to $O(|F| \cdot |G|)$.
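
The recursion with caching can be sketched compactly. The code below is an illustrative toy, not the interface of any existing BDD package: nodes are integer ids, variables are ordered by index, a hash table of `(var, lo, hi)` triples guarantees that isomorphic nodes are never created twice, and a second table caches results of the AND recursion. All class and method names are assumptions.

```python
from itertools import product


class BDD:
    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {0: None, 1: None}  # id -> (var, lo, hi); ids 0 and 1 are sinks
        self.unique = {}                 # (var, lo, hi) -> id
        self.and_cache = {}              # (f, g) -> id of the BDD for f AND g

    def mk(self, var, lo, hi):
        """Return the node (var, lo, hi), reusing existing nodes where possible."""
        if lo == hi:                     # identical children: bypass the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def _top(self, u):
        return self.nodes[u][0] if u > 1 else self.num_vars

    def _cofactors(self, u, v):
        """Return (negative, positive) cofactors of u with respect to variable v."""
        if u > 1 and self.nodes[u][0] == v:
            _, lo, hi = self.nodes[u]
            return lo, hi
        return u, u                      # u does not depend on v at its root

    def bdd_and(self, f, g):
        if f == 0 or g == 0:
            return 0
        if f == 1:
            return g
        if g == 1 or f == g:
            return f
        key = (min(f, g), max(f, g))
        if key not in self.and_cache:
            v = min(self._top(f), self._top(g))  # earliest variable in the ordering
            f0, f1 = self._cofactors(f, v)
            g0, g1 = self._cofactors(g, v)
            self.and_cache[key] = self.mk(v, self.bdd_and(f0, g0), self.bdd_and(f1, g1))
        return self.and_cache[key]

    def evaluate(self, u, assignment):
        while u > 1:
            var, lo, hi = self.nodes[u]
            u = hi if assignment[var] else lo
        return u


bdd = BDD(3)
x0, x1, x2 = bdd.var(0), bdd.var(1), bdd.var(2)
left = bdd.bdd_and(bdd.bdd_and(x0, x1), bdd.bdd_and(x1, x2))
right = bdd.bdd_and(x0, bdd.bdd_and(x1, x2))
print(left == right)  # equal ids mean the two BDDs are the same node
```

Because nodes are shared through the `unique` table, equivalence of two functions built in the same manager reduces to comparing two integers.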


Variable Ordering 

All symmetric functions on $\{0, 1\}^n$ have a BDD that is polynomial
in n, independent of the variable ordering. Other useful functions
such as comparators, multiplexers, adders, and subtracters can also be
efficiently represented, if the variable ordering is selected
correctly. Heuristics for ordering selection are presented in
[1, 2, 11]. There are functions which do not have a polynomial-sized
BDD under any variable ordering; the function representing the n-th
bit of the output of a multiplier taking two n-bit unsigned integer
inputs is an example [5]. Wegener [17] presents many more examples of
the impact of variable ordering.


Applications


BDDs have been most commonly applied in the 
context of formal verification of digital hardware 
[8]. Digital hardware extends the notion of circuit 
described above by adding state elements which 
hold a Boolean value between updates and are 
updated on a clock signal. 

The gates comprising a design are often up- 
dated based on performance requirements; these 
changes typically are not supposed to change 
the logical functionality of the design. BDD- 
based approaches have been used for checking 
the equivalence of digital hardware designs [10]. 

BDDs have also been used for checking properties of digital hardware.
A typical formulation is that a set of “good” states and a set of
“initial” states are specified using Boolean formulas over the state
elements; the property holds iff there is no sequence of inputs which
leads from a state in the set of initial states to a state not in the
set of good states. Given a design with n registers, a set of states A
in the design can be characterized by a formula $\phi_A$ over n
Boolean variables: $\phi_A$ evaluates to true on an assignment to the
variables iff the corresponding state is in A. The formula $\phi_A$
represents a Boolean function, and so BDDs can be used to represent
sets of states. The key operation of computing the image of a set of
states A, i.e., the set of states that can be reached on application
of a single input from states in A, can also be implemented using
BDDs [12].

BDDs have been used for test generation. 
One approach to test generation is to specify 
legal inputs using constraints, in essence Boolean 
formulas over the primary input and state vari- 
ables. Yuan et al. [18] have demonstrated that 
BDDs can be used to solve these constraints very 
efficiently. 

Logic synthesis is the discipline of realizing hardware designs
specified as logic equations using gates. Mapping equations to gates
is straightforward; however, in practice a direct mapping leads to
implementations that are not acceptable from a performance
perspective, where performance is measured by gate area or timing
delay. Manipulating logic equations in order to reduce area (e.g.,
through constant propagation, identifying common sub-expressions,
etc.), and delay (e.g., through propagating late arriving signals
closer to the outputs), is conveniently done using BDDs.


Experimental Results 


Bryant reported results on verifying two qualitatively distinct
circuits for addition. He was able to verify on a VAX 11/780 (a 1 MIPS
machine) that two 64-bit adders were equivalent in 95.8 min. He used
an ordering that he derived manually.

Normalizing for technology, modern BDD 
packages are two orders of magnitude faster 
than Bryant’s original implementation. A large
part of the improvement comes from the use
of the strong canonical form, wherein a global 
database of BDD nodes is maintained, and no 
new node is added without checking to see if a 
node with the same label and 0- and 1-children 
exists in the database [3]. (For this approach 
to work, it is also required that the children of 
any node being added be in strong canonical 
form.) Other improvements stem from the use of 
complement pointers (if a pointer has its least- 
significant bit set, it refers to the complement 
of the function), better memory management 
(garbage collection based on reference counts, 
keeping nodes that are commonly accessed 
together close in memory), better hash functions, 
and better organization of the computed table 
(which keeps track of subproblems that have 
already been encountered) [16]. 
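
The strong canonical form and the computed table described above can be illustrated with a minimal sketch. This is not the code of any actual BDD package: the class name `BDD` and the methods `mk`, `var`, and `apply` are hypothetical, complement pointers and garbage collection are omitted, and variables are assumed to be ordered by their integer index.

```python
import operator


class BDD:
    """Tiny reduced ordered BDD manager (illustrative sketch, hypothetical API)."""

    def __init__(self):
        # Ids 0 and 1 are the terminal nodes; internal nodes get ids >= 2.
        self.nodes = [None, None]   # id -> (var, low child, high child)
        self.unique = {}            # (var, low, high) -> id: the global node database

    def mk(self, var, low, high):
        """Return the node (var, low, high), creating it only if it is new."""
        if low == high:             # both children equal: the test on var is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # look up the database before adding a node
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        """BDD of the single variable with index i."""
        return self.mk(i, 0, 1)

    def _cofactors(self, f, v):
        """Children of f with respect to variable v (f itself if f does not test v)."""
        if f >= 2 and self.nodes[f][0] == v:
            return self.nodes[f][1], self.nodes[f][2]
        return f, f

    def apply(self, op, f, g, computed=None):
        """Combine f and g with a binary Boolean operator on 0/1 values."""
        if computed is None:
            computed = {}           # computed table: subproblems already solved
        if f < 2 and g < 2:
            return int(op(f, g))
        if (f, g) in computed:
            return computed[(f, g)]
        v = min(self.nodes[h][0] for h in (f, g) if h >= 2)
        f0, f1 = self._cofactors(f, v)
        g0, g1 = self._cofactors(g, v)
        r = self.mk(v, self.apply(op, f0, g0, computed),
                    self.apply(op, f1, g1, computed))
        computed[(f, g)] = r
        return r


b = BDD()
x, y = b.var(0), b.var(1)
f = b.apply(operator.and_, x, y)
g = b.apply(operator.and_, y, x)   # same function, built in a different way
```

Because every node is created through `mk`, two BDDs represent the same function exactly when their node ids are equal, so the equivalence check in this sketch reduces to an integer comparison.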


Data Sets 


The SIS (http://embedded.eecs.berkeley.edu/pubs/downloads/sis/)
system from UC Berkeley
is used for logic synthesis. It comes with a 
number of combinational and sequential circuits 
that have been used for benchmarking BDD 
packages. 

The VIS (http://embedded.eecs.berkeley.edu/pubs/downloads/vis)
system from UC Berkeley
and UC Boulder is used for design verification; 
it uses BDDs to perform checks. The distribution 
includes a large collection of verification prob- 
lems, ranging from simple hardware circuits to 
complex multiprocessor cache systems. 


URL to Code 


A number of BDD packages exist today, but the 
package of choice is CUDD (http://vlsi.colorado.edu/~fabio/CUDD/).
CUDD implements all the
core features for manipulating BDDs, as well as 
variants. It is written in C++ and has extensive 
user and programmer documentation. 


Problem Definition


The binary space partition (for short, BSP) is
a scheme for subdividing the ambient space $\mathbb{R}^d$
into open convex sets (called cells) by hyperplanes
in a recursive fashion. Each subdivision
step for a cell results in two cells, in which
the process may continue, independently of other
cells, until a stopping criterion is met. The binary
recursion tree, also called BSP-tree, is traditionally
used as a data structure in computer graphics
for efficient rendering of polyhedral scenes. Each
node v of the BSP-tree, except for the leaves,
corresponds to a cell $C_v \subseteq \mathbb{R}^d$ and a partitioning
hyperplane $H_v$. The cell of the root r is $C_r = \mathbb{R}^d$,
and the two children of a node v correspond
to $C_v \cap H_v^-$ and $C_v \cap H_v^+$, where $H_v^-$ and
$H_v^+$ denote the open half-spaces bounded by $H_v$.
Refer to Fig. 1.

Binary Space Partitions, Fig. 1 Three 2-dimensional convex objects and a line segment (left), a binary space partition
with five partition lines $H_1, \ldots, H_5$ (center), and the corresponding BSP tree (right)

A binary space partition for a set of n pairwise
disjoint (typically polyhedral) objects in $\mathbb{R}^d$ is a
BSP where the space is recursively partitioned
until each cell intersects at most one object. When
the BSP-tree is used as a data structure, every
leaf v stores the fragment of at most one object
clipped in the cell $C_v$, and every interior node
v stores the fragments of any lower-dimensional
objects that lie in $C_v \cap H_v$.

A BSP for a set of objects has two parameters
of interest: the size and the height of the
corresponding BSP-tree. Ideally, a BSP partitions
space so that each object lies entirely in a
single cell or in a cutting hyperplane, yielding
a so-called perfect BSP [4]. However, in most
cases this is impossible, and the hyperplanes $H_v$
partition some of the input objects into fragments.
Assuming that the input objects are k-dimensional,
for some $k \le d$, the BSP typically
stores only k-dimensional fragments, i.e., object
parts clipped in leaf cells $C_v$ or in $C_v \cap H_v$ at
interior nodes.

The size of the BSP-tree is typically propor- 
tional to the number of k-dimensional fragments 
that the input objects are partitioned into, or the 
number of nodes in the tree. Given a set S of
objects in $\mathbb{R}^d$, one would like to find a BSP for
S with small size and/or height. The partition
complexity of a set of objects S is defined as the 
minimum size of a BSP for S. 


Glossary 


• Autopartition: a class of BSPs obtained by
  imposing the constraint that each cut is along
  a hyperplane containing a facet of one of the
  input objects.

• Axis-aligned BSP: a class of BSPs obtained
  by imposing the constraint that each cut is
  orthogonal to a coordinate axis.

• Round-robin BSP: an axis-aligned BSP in
  $\mathbb{R}^d$ where any d consecutive recursive cuts
  are along hyperplanes orthogonal to the d
  coordinate axes.

• Tiling in $\mathbb{R}^d$: a set of interior-disjoint polyhedra
  that partition $\mathbb{R}^d$.

• Axis-aligned tiling: a set of full-dimensional
  boxes that partition $\mathbb{R}^d$.

• d-dimensional box: the cross product of d
  real-valued intervals.


Key Results 


The theoretical study of BSPs was initiated by 
Paterson and Yao [10, 11]. 

Line Segments in the Plane

A classical result of Paterson and Yao [10] is a 
simple and elegant randomized algorithm, which, 
given n disjoint segments, produces a BSP whose 
expected size is $O(n \log n)$; see also [3, Ch. 12].
It was widely believed for decades that every set
of disjoint line segments in the plane admits
a BSP of size $O(n)$; see, e.g., [10, p. 502]. This
belief held until Tóth proved a tight super-linear bound
for this problem, first by constructing a set of
segments for which any BSP must have size
$\Omega(n \log n / \log\log n)$ and later by matching this
bound algorithmically:


Theorem 1 ([12, 15]) Every set of n disjoint
line segments in the plane admits a BSP of
size $O(n \log n / \log\log n)$. This bound is the best
possible, and a BSP of this size can be computed
in $O(n \log^2 n)$ time.
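
The randomized construction of Paterson and Yao mentioned above can be sketched as a random autopartition: the segments are shuffled, and each cell is cut along the supporting line of the earliest segment (in the shuffled order) that has a fragment in the cell. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input segments are assumed to be pairwise disjoint and non-degenerate with integer (or rational) coordinates, and the tree is stored as nested dictionaries. It does not implement the algorithm of Theorem 1.

```python
import random
from fractions import Fraction


def cross(a, b, p):
    """Cross product (b - a) x (p - a): positive if p is left of the line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def split(seg, a, b):
    """Split seg by the line through a and b; return (left, on, right), None if absent."""
    p, q = seg
    dp, dq = cross(a, b, p), cross(a, b, q)
    if dp == 0 and dq == 0:
        return None, seg, None          # the fragment lies on the cutting line
    if dp >= 0 and dq >= 0:
        return seg, None, None
    if dp <= 0 and dq <= 0:
        return None, None, seg
    t = Fraction(dp, dp - dq)           # exact parameter of the crossing point
    m = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    return ((p, m), None, (m, q)) if dp > 0 else ((m, q), None, (p, m))


def build(frags):
    """frags: list of (priority, segment). Cut along the lowest-priority fragment."""
    if len(frags) <= 1:
        return {"leaf": [s for _, s in frags]}
    _, (a, b) = min(frags, key=lambda f: f[0])
    left, on, right = [], [], []
    for prio, seg in frags:
        l, o, r = split(seg, a, b)
        if l:
            left.append((prio, l))
        if o:
            on.append(o)
        if r:
            right.append((prio, r))
    return {"on": on, "left": build(left), "right": build(right)}


def size(tree):
    """Number of fragments stored in the tree."""
    if "leaf" in tree:
        return len(tree["leaf"])
    return len(tree["on"]) + size(tree["left"]) + size(tree["right"])


def random_autopartition(segments, seed=None):
    order = list(range(len(segments)))
    random.Random(seed).shuffle(order)  # random priorities
    return build([(order[i], s) for i, s in enumerate(segments)])
```

For two segments, one horizontal and one vertical above its midpoint, the size is 2 if the horizontal segment is chosen first and 3 if the vertical one is, which is the kind of order dependence the randomization averages over.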


Simplices in $\mathbb{R}^d$

The randomized partition technique of Paterson
and Yao generalizes to higher dimensions, yielding
the following.


Theorem 2 ([10]) Every set of n (d − 1)-dimensional
simplices in $\mathbb{R}^d$, where $d \ge 3$, admits
a BSP of size $O(n^{d-1})$.


While there exist n disjoint triangles in $\mathbb{R}^3$ that
require a BSP of size $\Omega(n^2)$, no super-quadratic
lower bound is known in any dimension d. Near-linear
upper bounds are known for “realistic”
input models in $\mathbb{R}^3$ such as uncluttered scenes [5]
or fat axis-aligned rectangles [14].


Axis-Parallel Segments, Rectangles, and 
Hyperrectangles 

Theorem 3 ([1, 7, 10]) Every set of n pairwise
disjoint axis-parallel line segments in the plane
admits an auto-partition of size at most $2n - 1$.
Such a BSP can be computed using $O(n \log n)$
time and space and has the additional property
that no input segment is cut more than once. The
upper bound on the size is the best possible apart
from lower-order terms.


Theorem 4 ([7, 11]) Let $\Gamma$ be a collection of n
line segments in $\mathbb{R}^d$, where $d \ge 3$, consisting
of $n_i$ segments parallel to the $x_i$-axis, for
$i = 1, \ldots, d$. Then $\Gamma$ admits a BSP of size at most


Theorem 5 ([7]) For constants $1 \le k \le d - 1$,
every set of n axis-parallel k-rectangles in
d-space admits an axis-aligned BSP of size
$O(n^{d/(d-k)})$. This bound is the best possible
for $k < d/2$ apart from the constant factor.


For $k \ge d/2$, the best known upper and lower
bounds do not match. No super-quadratic lower
bound is known in any dimension d. In $\mathbb{R}^4$,
Dumitrescu et al. [7] constructed n 2-dimensional
disjoint rectangles whose partition complexity is
$\Omega(n^{5/3})$.


Tilings 

Already in the plane, the worst-case partition 
complexity of axis-aligned tilings is smaller 
than that for disjoint boxes. Berman, DasGupta, 
and Muthukrishnan [6] showed that every axis- 
aligned tiling of size n admits an axis-aligned 
BSP of size at most 2n; apart from lower-order 
terms, this bound is the best possible. For higher 
dimensions, Hershberger, Suri, and Tóth obtained
the following result. 


Theorem 6 ([9]) Every axis-aligned tiling of
size n in $\mathbb{R}^d$, where $d \ge 2$, admits a round-robin
BSP of size $O(n^{(d+1)/3})$. On the other hand,
there exist tilings of size n in $\mathbb{R}^d$ for which every
BSP has size $\Omega(n^{\beta(d)})$, where $\beta(3) = 4/3$, and
$\lim_{d \to \infty} \beta(d) = (1 + \sqrt{5})/2 \approx 1.618$.


In dimension d = 3, the partition complexity of
axis-aligned tilings of size n is $O(n^{4/3})$, which
is tight by a construction of Hershberger and
Suri [8].


Applications 


The initial and most prominent applications are 
in computer graphics: BSPs support fast hidden- 
surface removal and ray tracing for moving view- 
points [10]. Rendering is used for visualizing 
spatial opaque surfaces on the screen. A com- 
mon and efficient rendering technique is the so- 
called painter’s algorithm. Every object is drawn 
sequentially according to the back-to-front order, 
starting with the deepest object and continuing 
with the objects closer to the viewpoint. When 
all the objects have been drawn, every pixel represents
the color of the object closest to the viewpoint.
Further computer graphics applications include
constructive solid geometry and shadow
generation. Other applications of BSP trees include
range counting, point location, collision
detection, robotics, graph drawing, and network
design; see, for instance, [13] and the references
therein.
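
The back-to-front order used by the painter's algorithm can be read off a BSP-tree by visiting, at every interior node, the subtree on the far side of the cutting line first, then the fragments on the line, then the near subtree. The following is a minimal two-dimensional sketch under assumptions: the dictionary layout (`line`, `on`, `left`, `right`, `leaf`) and the string labels standing in for fragments are hypothetical and chosen only for illustration.

```python
def cross(a, b, p):
    """Cross product (b - a) x (p - a): positive if p is left of the line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def back_to_front(node, eye, out):
    """Append the fragments of the BSP-tree to out, farthest from eye first."""
    if "leaf" in node:
        out.extend(node["leaf"])
        return out
    a, b = node["line"]
    if cross(a, b, eye) > 0:
        far, near = node["right"], node["left"]   # eye is on the left side
    else:
        far, near = node["left"], node["right"]   # eye is on the right side (or on the line)
    back_to_front(far, eye, out)
    out.extend(node["on"])                        # fragments lying on the cutting line
    back_to_front(near, eye, out)
    return out


# Cutting line x = 0, directed upward, so the left side is x < 0.
tree = {
    "line": ((0, 0), (0, 1)),
    "on": ["wall"],
    "left": {"leaf": ["A"]},     # object A lies in the half-plane x < 0
    "right": {"leaf": ["B"]},    # object B lies in the half-plane x > 0
}
order_from_right = back_to_front(tree, (5, 0), [])
order_from_left = back_to_front(tree, (-5, 0), [])
```

The tree is built once and the traversal is repeated for each new viewpoint, which is why the structure suits moving viewpoints.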

In the original setting, the input objects of 
the BSP were assumed to be static. Recent re- 
search on BSPs for moving objects can be seen 
in the context of kinetic data structures (KDS) 
of Basch, Guibas, and Hershberger [2]. In this 
model, objects move continuously along a given 
trajectory (flight plan), typically along a line or a 
low-degree algebraic curve. The splitting hyper- 
planes are defined by faces of the input objects, 
and so they move continuously, too. The BSP is 
updated only at discrete events, though, when the 
combinatorial structure of the BSP changes. 


Open Problems 


• What is the maximum partition complexity of
  n disjoint (d − 1)-dimensional simplices in $\mathbb{R}^d$
  for d > 3?

• What is the maximum partition complexity
  of n disjoint (axis-aligned) boxes in $\mathbb{R}^d$ for
  d > 3?

• What is the maximum (axis-aligned) partition
  complexity of a tiling of n axis-aligned boxes
  in $\mathbb{R}^d$ for d > 4?

• Are there families of n disjoint objects in $\mathbb{R}^d$
  whose partition complexity is super-quadratic
  in n?

• How many combinatorial changes can occur
  in the kinetic BSP of n points moving with
  constant velocities in the plane?


In all five open problems, the dimension $d \in \mathbb{N}$
of the ambient space $\mathbb{R}^d$ is constant, and asymptotically
tight bounds in terms of n are sought.


Problem Definition


Floorplanning is an early stage of the very-large- 
scale integration (VLSI) design process in which 
a coarse layout of a set of rectangular circuit 
blocks is determined. A floorplan enables de- 
signers to quickly estimate circuit performance, 
routing congestion, etc., of the circuit. In modern 
VLSI design flow, a fixed outline of the floorplan- 
ning region is given. On the other hand, many 
circuit blocks do not have a fixed shape during 
floorplanning as their internal circuitries have not 
yet been laid out. Those blocks are called soft 
blocks. Other blocks with predetermined shapes
are called hard blocks. Given the geometric (i.e., 
left-right and above-below) relationship among 
the blocks, the block shaping problem is to de- 
termine the shapes of the soft blocks such that 
all blocks can be packed without overlap into the 
fixed-outline region. 

To handle the block shaping problem in fixed- 
outline floorplanning, Yan and Chu [1,2] provide 
a problem formulation in which the floorplan 
height is minimized, while the width is upper 
bounded. The formulation is described below. 
Let W be the upper bound on the width of the 
floorplanning region. Given a set of n blocks, 
each block i has area $A_i$, width $w_i$, and height $h_i$.
$A_i$ is fixed, while $w_i$ and $h_i$ may vary as long as
they satisfy $w_i \times h_i = A_i$, $W_i^{\min} \le w_i \le W_i^{\max}$,
and $H_i^{\min} \le h_i \le H_i^{\max}$. If $W_i^{\min} = W_i^{\max}$ and
$H_i^{\min} = H_i^{\max}$, then block i is a hard block. Two
constraint graphs $G_h$ and $G_v$ [3, Chapter 10]
are given to specify the geometric relationship
among the blocks. $G_h$ and $G_v$ consist of n + 2
vertices. Vertices 1 to n represent the n blocks. In
addition, dummy vertices 0 (called source) and
n + 1 (called sink) are added. In $G_h$, vertices 0
and n + 1 represent the leftmost and rightmost
boundaries of the floorplanning region, respectively.
In $G_v$, vertices 0 and n + 1 represent
the bottommost and topmost boundaries of the
floorplanning region, respectively. $A_0$, $w_0$, $h_0$,
$A_{n+1}$, $w_{n+1}$, and $h_{n+1}$ are all set to 0. If block
i is on the left of block j, $(i, j) \in G_h$. If block i
is below block j, $(i, j) \in G_v$.

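Let $x_i$ and $y_i$ be the x- and y-coordinates of
the bottom-left corner of block i in the floorplan.
Then, the block shaping problem formulation in
[1,2] can be written as the following geometric
program:

$$
\begin{array}{lll}
\text{Minimize} & y_{n+1} & \\
\text{subject to} & x_{n+1} \le W & \\
 & x_i + w_i \le x_j & \forall (i, j) \in G_h \\
 & y_i + h_i \le y_j & \forall (i, j) \in G_v \\
 & w_i \times h_i = A_i & 1 \le i \le n \\
 & W_i^{\min} \le w_i \le W_i^{\max} & 1 \le i \le n \\
 & H_i^{\min} \le h_i \le H_i^{\max} & 1 \le i \le n \\
 & x_0 = y_0 = 0 &
\end{array}
$$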

To solve the original problem of packing all 
blocks into a fixed-outline region, we can take 
any feasible solution of the geometric program in 
which $y_{n+1}$ is less than or equal to the height of
the region. 


Key Results 


Almost all previous works target the classical 
floorplanning formulation, which minimizes 
the floorplan area. Such a formulation is not 
compatible with modern design methodologies 
[4], but those works may be modified to help 
fixed-outline floorplanning to various extents. 
For the special case of slicing floorplan [5], 
the block shaping problem can be solved by 
the elegant shape curve idea [6]. For a general 
floorplan which may not have a slicing structure, 
various heuristics have been proposed [7-9]. Moh
et al. [10] formulated the shaping problem as a 
geometric program and optimally solved it using 
standard convex optimization. Young et al. [11] 
solved the geometric program formulation by 
Lagrangian relaxation. Lin et al. [12] minimized 
the floorplan area indirectly by minimizing its 
perimeter optimally using min-cost flow and 
trust region method. For previous works which 
directly tackled the block shaping problem in 
fixed-outline floorplanning, Adya and Markov
[13] proposed a simple greedy heuristic, and 
Lin and Hung [14] used second-order cone 
programming. Previous works are either non- 
optimal or time-consuming. 

Yan and Chu [1, 2] presented a simple and 
optimal algorithm called slack-driven shaping 
(SDS). SDS iteratively shapes the soft blocks to 
reduce the floorplan height while not exceeding 
the floorplan width bound. We first present a sim- 
plified version called basic slack-driven shaping 
(basic SDS), which almost always produces an 
optimal solution. Then, we present its extension 
to the SDS algorithm. 

Given some initial block shapes, the blocks
can be packed to the four boundaries of the
floorplanning region. For block i ($1 \le i \le n$),
let $\Delta_{x_i}$ be the difference in $x_i$ between the two
layouts generated by packing all blocks to x =
W and to x = 0, respectively. Similarly, let $\Delta_{y_i}$
be the difference in $y_i$ between the two layouts
generated by packing all blocks to $y = y_{n+1}$ and
to y = 0, respectively. The horizontal slack $s_i^H$
and vertical slack $s_i^V$ are defined as follows:

$$s_i^H = \max(0, \Delta_{x_i}), \qquad s_i^V = \max(0, \Delta_{y_i}).$$
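
As an illustration of the definition, the horizontal slacks can be computed by packing the blocks once to x = 0 and once to x = W with longest-path computations on the horizontal constraint graph. The sketch below rests on assumptions: the function name `horizontal_slacks` and the edge-list input format are hypothetical, the graph is assumed to be acyclic with source 0 and sink n + 1 of width 0, and Python 3.9 or later is assumed for `graphlib`. Vertical slacks would follow in the same way from the vertical constraint graph and the heights.

```python
from graphlib import TopologicalSorter


def horizontal_slacks(widths, edges, W):
    """widths[v] for vertices 0..n+1; edges (i, j) mean block i is left of block j.

    Returns the list of horizontal slacks max(0, x_right[v] - x_left[v]).
    """
    n2 = len(widths)
    preds = {v: set() for v in range(n2)}
    succs = {v: set() for v in range(n2)}
    for i, j in edges:
        preds[j].add(i)
        succs[i].add(j)
    order = list(TopologicalSorter(preds).static_order())

    # Pack to x = 0: each block is pushed as far left as its predecessors allow.
    x_left = [0] * n2
    for j in order:
        for i in preds[j]:
            x_left[j] = max(x_left[j], x_left[i] + widths[i])

    # Pack to x = W: each block is pushed as far right as its successors allow.
    x_right = [W - widths[v] for v in range(n2)]
    for i in reversed(order):
        for j in succs[i]:
            x_right[i] = min(x_right[i], x_right[j] - widths[i])

    return [max(0, x_right[v] - x_left[v]) for v in range(n2)]


# Two parallel blocks of widths 4 and 1 between source 0 and sink 3, with W = 4:
# the wide block has no room to move, the narrow one can shift by 3.
slacks = horizontal_slacks([0, 4, 1, 0], [(0, 1), (0, 2), (1, 3), (2, 3)], 4)
```

Both passes visit every edge once, which matches the linear-time packing the entry relies on.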


Horizontal critical path (HCP) is defined as a path
in $G_h$ from source to sink such that all blocks
along the path have zero horizontal slack. Vertical
critical path (VCP) is similarly defined. We also
define two subsets of blocks:

$$SH = \{i \text{ is soft}\} \cap \{s_i^H > 0,\ s_i^V = 0\} \cap \{w_i < W_i^{\max}\}$$

$$SV = \{i \text{ is soft}\} \cap \{s_i^H = 0,\ s_i^V > 0\} \cap \{h_i < H_i^{\max}\}$$

Note that $y_{n+1}$ can be reduced by decreasing the
height (i.e., increasing the width) of the blocks in
SH, and $x_{n+1}$ can be reduced by decreasing the
width (i.e., increasing the height) of the blocks
in SV. We call the blocks in the sets SH and SV
target soft blocks. In each iteration of basic SDS,
we would like to increase the width $w_i$ of each
block $i \in SH$ by $\delta_i^H$ and the height $h_i$ of each
block $i \in SV$ by $\delta_i^V$. The basic SDS algorithm is
shown below:


Basic Slack-Driven Shaping Algorithm

Input: A set of n blocks, upper-bound width W, $G_h$ and $G_v$.
Output: Optimized $y_{n+1}$, $w_i$ and $h_i$ for all i.
Begin
1. Set $w_i$ to $W_i^{\min}$ for all i.
2. Pack blocks to x = 0 and compute $x_{n+1}$.
3. If $x_{n+1} > W$,
4.     Return no feasible solution.
5. Else,
6.     Repeat
7.         Pack blocks to y = 0, $y = y_{n+1}$, x = 0, and x = W.
8.         Calculate $s_i^H$ and $s_i^V$ for all i.
9.         Identify target soft blocks in SH and SV.
10.        For all $i \in SH$, increase $w_i$ by
           $\delta_i^H = \frac{(W_i^{\max} - w_i)\, s_i^H}{\mathrm{MAX}_{p \in P_i^H} \left( \sum_{k \in p} (W_k^{\max} - w_k) \right)}$,
           where $P_i^H$ is the set of paths in $G_h$ passing through block i.
11.        For all $i \in SV$, increase $h_i$ by
           $\delta_i^V = \beta \times \frac{(H_i^{\max} - h_i)\, s_i^V}{\mathrm{MAX}_{p \in P_i^V} \left( \sum_{k \in p} (H_k^{\max} - h_k) \right)}$,
           where $P_i^V$ is the set of paths in $G_v$ passing through block i.
12.    Until there is no target soft block.
End

Note that all $\delta_i^H$ and $\delta_i^V$ in Lines 10 and 11
can be computed using dynamic programming in
linear time. Packing of blocks can also be done
in linear time by longest path algorithm on a
directed acyclic graph. Hence, each iteration of
basic SDS takes linear time. The way $\delta_i^H$ and
$\delta_i^V$ are set in Lines 10 and 11 is the key to the
convergence of the algorithm.
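
One way to carry out the dynamic programming mentioned above is sketched here. It reads the denominator in Lines 10 and 11 as the maximum, over all source-to-sink paths through block i, of the total remaining budget along the path (the sum of $W_k^{\max} - w_k$ for widths), which is an interpretation of the algorithm listing rather than a quotation of it. The function names and the edge-list format are hypothetical, every block is assumed to lie on some source-to-sink path, and Python 3.9 or later is assumed for `graphlib`.

```python
from graphlib import TopologicalSorter


def max_path_sums(c, edges):
    """For each vertex v, the largest sum of c over a source-to-sink path through v."""
    n2 = len(c)
    preds = {v: [] for v in range(n2)}
    succs = {v: [] for v in range(n2)}
    for i, j in edges:
        preds[j].append(i)
        succs[i].append(j)
    order = list(TopologicalSorter(preds).static_order())
    fwd = [0] * n2   # best sum over paths ending at v (c[v] included)
    bwd = [0] * n2   # best sum over paths starting at v (c[v] included)
    for v in order:
        fwd[v] = c[v] + max((fwd[u] for u in preds[v]), default=0)
    for v in reversed(order):
        bwd[v] = c[v] + max((bwd[u] for u in succs[v]), default=0)
    return [fwd[v] + bwd[v] - c[v] for v in range(n2)]   # c[v] was counted twice


def width_increments(w, w_max, slack, edges, targets):
    """Width increase for each target block: remaining budget times slack over denominator."""
    c = [w_max[k] - w[k] for k in range(len(w))]
    denom = max_path_sums(c, edges)
    return {i: c[i] * slack[i] / denom[i] for i in targets}


# Chain source -> 1 -> 2 -> sink, both blocks soft with horizontal slack 5.
chain = [(0, 1), (1, 2), (2, 3)]
delta = width_increments([0, 1, 1, 0], [0, 3, 4, 0], [0, 5, 5, 0], chain, [1, 2])
```

In this example the increments are 2 and 3, so their sum along the path equals the slack of 5 and the width bound is not exceeded, in line with Lemma 1.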


Lemma 1 For any path p from source to sink in
$G_h$, we have $\sum_{i \in p} \delta_i^H \le s^{H,p}_{\max}$, where $s^{H,p}_{\max}$
is the maximum horizontal slack over all blocks
along p.


Basically, $s^{H,p}_{\max}$ gives us a budget on the total
amount of increase in the block width along path
p. Hence, Lemma 1 implies that the width of the
floorplan will not be more than W after shaping
of the blocks in SH at Line 10. The shaping of
blocks in SV is done similarly, but a factor $\beta$ is
introduced in Line 11.


Lemma 2 For any path p from source to sink
in $G_v$, we have $\sum_{i \in p} \delta_i^V \le \beta \times s^{V,p}_{\max}$, where
$s^{V,p}_{\max}$ is the maximum vertical slack over all
blocks along p.


Lemma 2 guarantees that by setting $\beta \le 1$,
$y_{n+1}$ will not increase after each iteration.
In other words, the height of the floorplan will
monotonically decrease during the whole shaping
process. $\beta$ is almost always set to 1. However, if
$\beta = 1$, it is possible that the floorplan height may
remain the same after one iteration even when the
solution is not yet optimal. To avoid getting stuck
at a local minimum, if the floorplan height does
not decrease for two consecutive iterations, $\beta$ is
set to 0.9 for the next iteration.

Consider a shaping solution L generated by 
basic SDS (i.e., without any target soft block). 
Blocks at the intersection of some HCP and some 
VCP are called intersection blocks. The follow- 
ing optimality conditions were derived in [1, 2]. 


Lemma 3 If L contains one VCP in which
all intersection blocks are hard, then L is
optimal.


Lemma 4 If L contains at most one HCP in
which some intersection blocks are soft, then
L is optimal.


Lemma 5 If L contains at most one VCP in
which some intersection blocks are soft, then
L is optimal.


In practice, it is very rare for a shaping solu- 
tion generated by basic SDS to satisfy none of 
the three optimality conditions. According to the 
experiments in [1,2], all solutions by basic SDS 
satisfy at least one of the optimality conditions, 
i.e., are optimal. However, [1,2] showed that it 
is possible for basic SDS to converge to non- 
optimal solutions. If a non-optimal solution is 
produced by basic SDS, it can be used as a 
starting solution to the geometric program above 
and then be improved by a single step of any
descent-based optimization technique (e.g., steepest
descent). This perturbed and improved solution
can be fed to basic SDS again to be further
improved. The resulting SDS algorithm, which is 
guaranteed optimal, is shown below: 


Slack-Driven Shaping Algorithm

Input: A set of n blocks, upper-bound width W, $G_h$ and $G_v$.
Output: Optimal $y_{n+1}$, $w_i$ and $h_i$ for all i.
Begin
1. Run basic SDS to generate shaping solution L.
2. If Lemma 3 or Lemma 4 or Lemma 5 is satisfied,
3.     L is optimal. Exit.
4. Else,
5.     Improve L by a single step of geometric programming.
6.     Go to Line 1.
End


Applications


Floorplanning is a very important step in modern 
VLSI design. It enables designers to explore dif- 
ferent alternatives in the design space and make 
critical decisions early in the design process. 
Typically, a huge number of alternatives need 
to be evaluated during the floorplanning stage. 
Hence, an efficient block shaping algorithm is a 
crucial component of a floorplanning tool. SDS 
is tens to hundreds of times faster than previous 
algorithms in practice. It also directly handles 
a fixed-outline floorplanning formulation, which 
is the standard in modern design methodologies. 
Hence, SDS should be able to improve the quality
of VLSI circuits while also reducing their design
time.


Open Problems 


An interesting open problem is to derive a the- 
oretical bound on the number of iterations for 
SDS to converge to an optimal solution. Although 
experimental results have shown that the number 
of iterations is small in practice, no theoretical 
bound is known. 

Another interesting problem is to design an 
algorithm to achieve optimal block shaping en- 
tirely by simple slack-driven operations without 
resorting to geometric programming. 

Besides, because of the similarity of the con- 
cept of slack in floorplanning and in circuit tim- 
ing analysis, it would be interesting to see if a 
slack-driven approach similar to that in SDS can 
be applied to buffer and wire sizing for timing 
optimization. 


Problem Definition


Informally, a boosting technique is a method that, 
when applied to a particular class of algorithms, 
yields improved algorithms. The improvement 
must be provable and well defined in terms of 
one or more of the parameters characterizing the 
algorithmic performance. Examples of boosters 
can be found in the context of randomized algo- 
rithms (here, a booster allows one to turn a BPP 
algorithm into an RP one [6]) and computational 
learning theory (here, a booster allows one to 
improve the prediction accuracy of a weak learn- 
ing algorithm [10]). The problem of compression 
boosting consists of designing a technique that 
improves the compression performance of a wide 
class of algorithms. In particular, the results of 
Ferragina et al. provide a general technique for 
turning a compressor that uses no context infor- 
mation into one that always uses the best possible 
context. 

The classic Huffman and arithmetic coding 
algorithms [1] are examples of statistical com- 
pressors which typically encode an input symbol 
according to its overall frequency in the data to
be compressed. (In their dynamic versions these 
algorithms consider the frequency of a symbol 
in the already scanned portion of the input.) 
This approach is efficient and easy to implement 
but achieves poor compression. The compres- 
sion performance of statistical compressors can 
be improved by adopting higher-order models 
that obtain better estimates for the frequencies 
of the input symbols. The PPM compressor [9] 
implements this idea by collecting (the frequency 
of) all symbols which follow any k-long context 
and by compressing them via arithmetic cod- 
ing. The length k of the context is a parameter 
of the algorithm that depends on the data to 
be compressed: it is different if one is com- 
pressing English text, a DNA sequence, or an 
XML document. There exist other examples of 
sophisticated compressors that use context infor- 
mation in an implicit way, such as Lempel-Ziv 
and Burrows-Wheeler compressors [9]. All these 
context-aware algorithms are effective in terms of 
compression performance, but are usually rather 
complex to implement and difficult to analyze. 

Applying the boosting technique of Ferragina 
et al. to Huffman or arithmetic coding yields a 
new compression algorithm with the following 
features: (i) the new algorithm uses the boosted 
compressor as a black box; (ii) the new algorithm 
compresses in a PPM-like style, automatically 
choosing the optimal value of k; and (iii) the new 
algorithm has essentially the same time/space 
asymptotic performance of the boosted compressor.
The following sections give a precise and
formal treatment of the three properties (i)-(iii)
outlined above.


Key Results 


Notation: The Empirical Entropy 

Let s be a string over the alphabet $\Sigma =
\{a_1, \ldots, a_h\}$, and for each $a_i \in \Sigma$, let $n_i$ be
the number of occurrences of $a_i$ in s. The 0th
order empirical entropy of the string s is defined
as

$$H_0(s) = -\sum_{i=1}^{h} \frac{n_i}{|s|} \log \frac{n_i}{|s|},$$

where it is assumed that all logarithms are taken to the
base 2 and $0 \log 0 = 0$. It is well known that $H_0$
is the maximum compression one can achieve 
using a uniquely decodable code in which a 
fixed codeword is assigned to each alphabet 
symbol. Greater compression is achievable if 
the codeword of a symbol depends on the k 
symbols following it (namely, its context). (In 
data compression it is customary to define 
the context looking at the symbols preceding 
the one to be encoded. The present entry 
uses the nonstandard “forward” contexts to 
simplify the notation of the following sections. 
Note that working with “forward” contexts 
is equivalent to working with the traditional 
“backward” contexts on the string s reversed 
(see [3] for details).) Let us define $w_s$ as the
string of single symbols immediately preceding
the occurrences of w in s. For example, for
s = bcabcabdca, it is $ca_s$ = bbd. The
value

$$H_k(s) = \frac{1}{|s|} \sum_{w \in \Sigma^k} |w_s| \, H_0(w_s) \qquad (1)$$

is the k-th order empirical entropy of s and is a 
lower bound to the compression one can achieve 
using codewords which only depend on the k 
symbols immediately following the one to be 
encoded. 


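Example 1 Let s = mississippi. For k = 1,
it is $i_s$ = mssp, $s_s$ = isis, $p_s$ = ip. Hence,

$$H_1(s) = \frac{4}{11} H_0(\mathtt{mssp}) + \frac{4}{11} H_0(\mathtt{isis}) + \frac{2}{11} H_0(\mathtt{ip}) = \frac{6}{11} + \frac{4}{11} + \frac{2}{11} = \frac{12}{11}.$$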

Note that the empirical entropy is defined for 
any string and can be used to measure the per- 
formance of compression algorithms without any 
assumption on the input source. Unfortunately, 
for some (highly compressible) strings, the em- 
pirical entropy provides a lower bound that is 
too conservative. For example, for $s = a^n$, it is
$|s| H_k(s) = 0$ for any $k \ge 0$. To better deal with
highly compressible strings, [7] introduced the
notion of 0th order modified empirical entropy
$H_0^*(s)$ whose property is that $|s| H_0^*(s)$ is at
least equal to the number of bits needed to write
down the length of s in binary. The kth order
modified empirical entropy $H_k^*$ is then defined in
terms of $H_0^*$ as the maximum compression one
can achieve by looking at no more than k symbols
following the one to be encoded.


The Burrows-Wheeler Transform 

Given a string s, the Burrows-Wheeler transform
[2] (bwt) consists of three basic steps: (1) append
to the end of s a special symbol $ smaller than
any other symbol in $\Sigma$; (2) form a conceptual
matrix M whose rows are the cyclic shifts of
the string s$, sorted in lexicographic order; and
(3) construct the transformed text $\hat{s}$ = bwt(s)
by taking the last column of M (see Fig. 1).
In [2] Burrows and Wheeler proved that $\hat{s}$ is a
permutation of s$, and that from $\hat{s}$ it is possible to
recover s in O(|s|) time.
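
The three steps can be written down directly as a naive sketch. It sorts all cyclic shifts explicitly and inverts the transform by repeated sorting, so it is far from the linear-time procedures referred to above and is meant only to make the definition concrete. It assumes that $ does not occur in s and that every symbol of s compares greater than $ (true for ASCII letters); the function names are chosen here for illustration.

```python
def bwt(s):
    """Burrows-Wheeler transform of s: last column of the sorted cyclic-shift matrix of s$."""
    t = s + "$"                                              # step 1: append the end marker
    rows = sorted(t[i:] + t[:i] for i in range(len(t)))      # step 2: sorted cyclic shifts
    return "".join(row[-1] for row in rows)                  # step 3: last column


def inverse_bwt(last):
    """Recover s from bwt(s) by rebuilding the sorted matrix one column at a time."""
    rows = [""] * len(last)
    for _ in range(len(last)):
        # Prepending the last column to the sorted rows and re-sorting
        # reconstructs one more column of the matrix.
        rows = sorted(c + r for c, r in zip(last, rows))
    original = next(r for r in rows if r.endswith("$"))
    return original[:-1]


transformed = bwt("mississippi")
```

Here `transformed` is the string ipssm$pissii shown in Fig. 1, and `inverse_bwt(transformed)` gives back mississippi.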

To see the power of the bwt, the reader should
reason in terms of empirical entropy. Fix a positive
integer k. The first k columns of the bwt matrix
contain, lexicographically ordered, all length-k
substrings of s (and k substrings containing
the symbol $). For any length-k substring w
of s, the symbols immediately preceding every
occurrence of w in s are grouped together in
a set of consecutive positions of $\hat{s}$ since they
are the last symbols of the rows of M prefixed
by w. Using the notation introduced for
defining $H_k$, it is possible to rephrase this property
by saying that the symbols of $w_s$ are consecutive
within $\hat{s}$ or, equivalently, that $\hat{s}$ contains,
as a substring, a permutation $\pi_w(w_s)$ of the
string $w_s$.


Example 2 Let s = mississippi and k =
1. Figure 1 shows that $\hat{s}[1, 4]$ = pssm is a
permutation of $i_s$ = mssp. In addition, $\hat{s}[6, 7]$ =
pi is a permutation of $p_s$ = ip, and $\hat{s}[8, 11]$ =
ssii is a permutation of $s_s$ = isis.

Boosting Textual Compression, Fig. 1 The bwt matrix (left) and the suffix tree (right) for the string s =
mississippi. The output of the bwt is the last column of the bwt matrix, i.e., $\hat{s}$ = bwt(s) = ipssm$pissii

Since permuting a string does not change its
(modified) 0th order empirical entropy (that is,
$H_0(\pi_w(w_s)) = H_0(w_s)$), the Burrows-Wheeler
transform can be seen as a tool for reducing the
problem of compressing s up to its kth order
entropy to the problem of compressing distinct
portions of $\hat{s}$ up to their 0th order entropy. To see
this, assume partitioning of $\hat{s}$ into the substrings
$\pi_w(w_s)$ by varying w over $\Sigma^k$. It follows that

$$\hat{s} = \bigsqcup_{w \in \Sigma^k} \pi_w(w_s),$$

where $\bigsqcup$ denotes the concatenation
operator among strings. (In addition to
$\bigsqcup_{w \in \Sigma^k} \pi_w(w_s)$, the string $\hat{s}$ also contains the last
k symbols of s (which do not belong to any $w_s$)
and the special symbol $. For simplicity these
symbols will be ignored in the following part of
the entry.) By (1) it follows that

$$\sum_{w \in \Sigma^k} |\pi_w(w_s)| \, H_0(\pi_w(w_s)) = \sum_{w \in \Sigma^k} |w_s| \, H_0(w_s) = |s| \, H_k(s).$$

Hence, to compress s up to $|s| H_k(s)$, it suffices
to compress each substring $\pi_w(w_s)$ up to its 0th
order empirical entropy. Note, however, that in
the above scheme the parameter k must be chosen
in advance. Moreover, a similar scheme cannot
be applied to $H_k^*$, which is defined in terms
of contexts of length at most k. As a result,
no efficient procedure is known for computing
the partition of $\hat{s}$ corresponding to $H_k^*(s)$. The
compression booster [3] is a natural complement
to the bwt and allows one to compress any string
s up to $H_k(s)$ (or $H_k^*(s)$) simultaneously for all
$k \ge 0$.
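
For a fixed k, the partition of the transformed text by length-k contexts can be computed naively from the sorted cyclic shifts, and the total cost of compressing each block to its 0th order entropy can be compared with $|s| H_k(s)$. The sketch below only illustrates this fixed-k scheme and is not the compression booster of [3]; the function names are hypothetical, rows whose length-k prefix contains $ are skipped, and the symbol $ itself is dropped, in line with the simplification adopted in the text.

```python
from collections import Counter
from itertools import groupby
from math import log2


def h0(x):
    """0th order empirical entropy of x in bits per symbol (0 for the empty string)."""
    n = len(x)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(x).values())


def context_blocks(s, k):
    """Blocks of bwt(s) grouped by the length-k prefix w of the sorted rows of s$."""
    t = s + "$"
    rows = sorted(t[i:] + t[:i] for i in range(len(t)))
    blocks = []
    for w, group in groupby(rows, key=lambda row: row[:k]):
        if "$" in w:
            continue                      # prefix wraps around the end of s: ignored
        block = "".join(row[-1] for row in group if row[-1] != "$")
        if block:
            blocks.append((w, block))     # block is a permutation of w_s
    return blocks


def partition_cost(s, k):
    """Sum of |block| * H0(block) over the blocks of the context partition."""
    return sum(len(b) * h0(b) for _, b in context_blocks(s, k))


blocks = context_blocks("mississippi", 1)
cost = partition_cost("mississippi", 1)
```

For s = mississippi and k = 1 the blocks are pssm, pi, and ssii, as in Example 2, and the cost is 12 bits, which equals $|s| H_1(s) = 11 \cdot 12/11$.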


The Compression Boosting Algorithm 

A crucial ingredient of compression boosting is
the relationship between the bwt matrix and the
suffix tree data structure. Let $\mathcal{T}$ denote the suffix
tree of the string s$. $\mathcal{T}$ has |s| + 1 leaves, one per
suffix of s$, and edges labeled with substrings of
s$ (see Fig. 1). Any node u of $\mathcal{T}$ has implicitly
associated a substring of s$, given by the concatenation
of the edge labels on the downward
path from the root of $\mathcal{T}$ to u. In that implicit
association, the leaves of $\mathcal{T}$ correspond to the
suffixes of s$. Assume that the suffix tree edges
are sorted lexicographically. Since each row of
the bwt matrix is prefixed by one suffix of s$
and rows are lexicographically sorted, the ith
leaf (counting from the left) of the suffix tree
corresponds to the ith row of the bwt matrix.
Associate to the ith leaf of $\mathcal{T}$ the ith symbol
of $\hat{s}$ = bwt(s). In Fig. 1 these symbols are
represented inside circles.

For any suffix tree node u, let $\hat{s}(u)$ denote the
substring of $\hat{s}$ obtained by concatenating, from
left to right, the symbols associated to the leaves
descending from node u. Of course $\hat{s}(\mathrm{root}(\mathcal{T})) =
\hat{s}$. A subset $\mathcal{L}$ of $\mathcal{T}$’s nodes is called a leaf
cover if every leaf of the suffix tree has a unique
ancestor in $\mathcal{L}$. Any leaf cover $\mathcal{L} = \{u_1, \ldots, u_p\}$
naturally induces a partition of the leaves of $\mathcal{T}$.
Because of the relationship between $\mathcal{T}$ and the
bwt matrix, this is also a partition of $\hat{s}$, namely,
$\{\hat{s}(u_1), \ldots, \hat{s}(u_p)\}$.


Example 3 Consider the suffix tree in Fig. 1. A
leaf cover consists of all nodes of depth one.
The partition of ŝ induced by this leaf cover is
{i, pssm, $, pi, ssii}.
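
The partition in Example 3 can be checked mechanically. The Python sketch below assumes that the example string is s = mississippi (an assumption: Fig. 1 is not reproduced here, but this string yields exactly the partition above). It builds the bwt by naively sorting the suffixes of s$, and it uses the fact that the depth-one nodes of the suffix tree correspond to the distinct first symbols of the suffixes. The function names are illustrative only.

```python
from itertools import groupby


def bwt_rows(s):
    """Return (first symbol of suffix, preceding symbol) per sorted suffix of s$."""
    t = s + "$"  # '$' sorts before every letter in ASCII
    order = sorted(range(len(t)), key=lambda i: t[i:])
    # t[i - 1] is the symbol preceding suffix i (for i == 0 it wraps to '$').
    return [(t[i], t[i - 1]) for i in order]


def depth_one_partition(s):
    """Group the bwt symbols by the first symbol of their suffix."""
    rows = bwt_rows(s)
    return ["".join(sym for _, sym in grp)
            for _, grp in groupby(rows, key=lambda r: r[0])]


if __name__ == "__main__":
    print("".join(sym for _, sym in bwt_rows("mississippi")))
    print(depth_one_partition("mississippi"))
```

Naive suffix sorting takes quadratic time or worse, so this is only meant for small illustrative inputs.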

Let C denote a function that associates to
every string x over Σ ∪ {$} a positive real value
C(x). For any leaf cover L, define its cost as
C(L) = ∑_{u ∈ L} C(ŝ⟨u⟩). In other words, the cost of
the leaf cover L is equal to the sum of the costs
of the strings in the partition induced by L. A leaf
cover L_min is called optimal with respect to C if
C(L_min) ≤ C(L), for any leaf cover L.

Let A be a compressor such that, for any string
x, its output size is bounded by |x| H_0(x) + η|x| +
μ bits, where η and μ are constants. Define the
cost function C_A(x) = |x| H_0(x) + η|x| + μ. In
[3] Ferragina et al. exhibit a linear-time greedy
algorithm that computes the optimal leaf cover
L_min with respect to C_A. The authors of [3] also
show that, for any k ≥ 0, there exists a leaf cover
L_k of cost C_A(L_k) = |s| H_k(s) + η|s| + O(|Σ|^k).
These two crucial observations show that, if one
uses A to compress each substring in the partition
induced by the optimal leaf cover L_min, the total
output size is bounded in terms of |s| H_k(s), for
any k ≥ 0. In fact,

\[
\sum_{u \in L_{\min}} C_A(\hat{s}\langle u \rangle)
  = C_A(L_{\min}) \le C_A(L_k)
  = |s| \, H_k(s) + \eta |s| + O(|\Sigma|^k).
\]


In summary, boosting the compressor A over the 
string s consists of three main steps: 


1. Compute ŝ = bwt(s).

2. Compute the optimal leaf cover L_min with
respect to C_A and partition ŝ according to L_min.

3. Compress each substring of the partition using
the algorithm A.
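
The first two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the linear-time algorithm of [3]: suffixes are sorted naively, the suffix tree is walked implicitly as ranges of sorted suffixes, and the leaf cover is chosen by a bottom-up rule that keeps a node whenever its own cost is no larger than the best total cost of its children. The constants `eta` and `mu` and all function names are hypothetical.

```python
import math
from collections import Counter


def bwt_with_order(s):
    """Naively sort the suffixes of s$ and return (s$, suffix order, bwt)."""
    t = s + "$"
    order = sorted(range(len(t)), key=lambda i: t[i:])
    return t, order, "".join(t[i - 1] for i in order)


def cost(x, eta=0.1, mu=2.0):
    """C_A(x) = |x| H_0(x) + eta |x| + mu, with illustrative constants."""
    n = len(x)
    h0_bits = sum(c * math.log2(n / c) for c in Counter(x).values())
    return h0_bits + eta * n + mu


def optimal_leaf_cover(t, order, bwt, lo, hi, depth=0):
    """Best (cost, pieces) for the sorted suffixes lo..hi-1 sharing `depth` symbols."""
    whole = cost(bwt[lo:hi])
    if hi - lo == 1:  # a single suffix: a leaf of the suffix tree
        return whole, [bwt[lo:hi]]
    # Split the range by the next symbol; each group is a child subtree.
    groups, start = [], lo
    for j in range(lo + 1, hi + 1):
        if j == hi or t[order[j] + depth] != t[order[start] + depth]:
            groups.append((start, j))
            start = j
    if len(groups) == 1:  # unary step inside a compacted edge
        return optimal_leaf_cover(t, order, bwt, lo, hi, depth + 1)
    total, pieces = 0.0, []
    for a, b in groups:
        c, p = optimal_leaf_cover(t, order, bwt, a, b, depth + 1)
        total += c
        pieces += p
    # Keep this node in the cover if it is no worse than covering below it.
    return (whole, [bwt[lo:hi]]) if whole <= total else (total, pieces)


if __name__ == "__main__":
    t, order, b = bwt_with_order("mississippi")
    best, pieces = optimal_leaf_cover(t, order, b, 0, len(t))
    print(b, pieces, round(best, 3))
```

Step 3 would then feed each string in `pieces` to the 0th order compressor A. The recursion is correct for any cost function because every leaf cover of a subtree is either its root alone or a union of leaf covers of the children's subtrees.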


So the boosting paradigm reduces the design of
effective compressors that use context information
to the (usually easier) design of 0th order
compressors. The performance of this paradigm
is summarized by the following theorem.


Theorem 1 ([3]) Let A be a compressor that
squeezes any string x in at most |x| H_0(x) +
η|x| + μ bits. The compression booster applied
to A produces an output whose size is bounded
by |s| H_k(s) + log|s| + η|s| + O(|Σ|^k) bits
simultaneously for all k ≥ 0. With respect to
A, the booster introduces a space overhead of
O(|s| log|s|) bits and no asymptotic time
overhead in the compression process.


A similar result holds for the modified entropy
H_k^* as well (but it is much harder to prove):
given a compressor A that squeezes any string x
in at most λ|x| H_0^*(x) + μ bits, the compression
booster produces an output whose size is bounded
by λ|s| H_k^*(s) + log|s| + O(|Σ|^k) bits,
simultaneously for all k ≥ 0. In [3] the authors also
show that no compression algorithm, satisfying
some mild assumptions on its inner working,
can achieve a similar bound in which both the
multiplicative factor λ and the additive
logarithmic term are dropped simultaneously.
Furthermore, [3] proposes an instantiation of the
booster which compresses any string s in at most
2.5|s| H_k^*(s) + log|s| + O(|Σ|^k) bits. This bound is
analytically superior to the bounds proven for the
best existing compressors including Lempel-Ziv,
Burrows-Wheeler, and PPM compressors.


Applications 


Apart from the natural application in data com- 
pression, compressor boosting has been used also 
to design compressed full-text indexes [8]. 


Open Problems 


The boosting paradigm may be generalized as 
follows: given a compressor A, find a permu- 
tation P for the symbols of the string s and 
a partitioning strategy such that the boosting 
approach, applied to them, minimizes the output 
size. These pages have provided convincing
evidence that the Burrows-Wheeler transform is an
elegant and efficient permutation P. Surprisingly
enough, other classic data compression problems 
fall into this framework: shortest common super- 
string (which is MAX-SNP hard), run length en- 
coding for a set of strings (which is polynomially 
solvable), LZ77, and minimum number of phrases 
(which is MAX-SNP hard). Therefore, the boost- 
ing approach is general enough to deserve further 
theoretical and practical attention [5]. 


Experimental Results 


An investigation of several compression algo- 
rithms based on boosting and a comparison with 
other state-of-the-art compressors are presented 
in [4]. The experiments show that the boosting
technique is more robust than other bwt-based
approaches and works well even with less effective
0th order compressors. However, these positive
features are achieved using more (time and space)
resources.


Data Sets 


The data sets used in [4] are available from
http://people.unipmn.it/manzini/boosting. Other data
sets for compression and indexing are available
at the Pizza&Chili site http://pizzachili.di.unipi.it/.


URL to Code 


The compression boosting page
(http://people.unipmn.it/manzini/boosting) contains the
source code of all the algorithms tested in [4]. The
code is organized in a highly modular library that can
be used to boost any compressor even without
knowing the bwt or the boosting procedure.


Problem Definition


Branchwidth, along with its better-known
counterpart, treewidth, is a measure of the “global
connectivity” of a graph.


Definition 

Let G be a graph on n vertices. A branch
decomposition of G is a pair (T, τ), where T is a tree
with vertices of degree 1 or 3 and τ is a bijection
from the set of leaves of T to the edges of G.
The order of an edge e in T, denoted α(e), is
the number of vertices v of G such that
there are leaves t_1, t_2 in T in different components
of T(V(T), E(T) − e) with τ(t_1) and τ(t_2) both
containing v as an endpoint.

The width of (T, τ) is equal to max_{e ∈ E(T)}
{α(e)}, i.e., is the maximum order over all edges
of T. The branchwidth of G is the minimum
width over all the branch decompositions of G
(in the case where |E(G)| ≤ 1, then we define
the branchwidth to be 0; if |E(G)| = 0, then
G has no branch decomposition; if |E(G)| = 1,
then G has a branch decomposition consisting of
a tree with one vertex; the width of this branch
decomposition is considered to be 0).
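
The definition of order and width translates directly into code. The Python sketch below assumes a simple encoding that is not part of the entry: the tree T is an adjacency dictionary, and the bijection τ is a dictionary from leaves of T to edges of G given as vertex pairs. For each tree edge it splits the leaves into the two sides and counts the vertices of G that appear on both.

```python
def decomposition_width(tree_adj, leaf_edge):
    """Width of a branch decomposition (T, tau).

    tree_adj:  dict mapping each tree node to a list of neighbouring nodes.
    leaf_edge: dict mapping each leaf of T to an edge (u, v) of G (this is tau).
    """
    tree_edges = {frozenset((a, b)) for a in tree_adj for b in tree_adj[a]}
    width = 0
    for e in tree_edges:
        a, b = tuple(e)
        # Collect the tree nodes on a's side once the tree edge {a, b} is removed.
        side, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if {x, y} != {a, b} and y not in side:
                    side.add(y)
                    stack.append(y)
        ends_a, ends_b = set(), set()
        for leaf, (u, v) in leaf_edge.items():
            (ends_a if leaf in side else ends_b).update((u, v))
        # Order of e: vertices of G with incident edges on both sides.
        width = max(width, len(ends_a & ends_b))
    return width


if __name__ == "__main__":
    # The triangle K_3: the only possible tree is a star with three leaves.
    tree = {"c": ["l1", "l2", "l3"], "l1": ["c"], "l2": ["c"], "l3": ["c"]}
    tau = {"l1": (1, 2), "l2": (2, 3), "l3": (1, 3)}
    print(decomposition_width(tree, tau))
```

The sketch only evaluates a given decomposition; minimizing over all decompositions is the hard part of the problem.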

The above definition can be directly extended
to hypergraphs where τ is a bijection from the
leaves of T to the hyperedges of G. The same
definition can easily be extended to matroids.

Branchwidth was first defined by Robertson 
and Seymour in [25] and served as a main tool 
for their proof of Wagner’s Conjecture in their 
Graph Minors series of papers. There, branch- 
width was used as an alternative to the parameter 
of treewidth as it appeared easier to handle for 
the purposes of the proof. The relation between 
branchwidth and treewidth is given by the follow- 
ing result. 


Theorem 1 ([25]) If G is a graph, then
branchwidth(G) ≤ treewidth(G) + 1 ≤ ⌊3/2 ·
branchwidth(G)⌋.

The algorithmic problems related to branchwidth
are of two kinds: first find fast algorithms com- 
puting its value and, second, use it in order to 
design fast dynamic programming algorithms for 
other problems. 


Key Results 


Algorithms for Branchwidth 

Computing branchwidth is an NP-hard problem 
([29]). Moreover, the problem remains NP-hard 
even if we restrict its input graphs to the class of 
split graphs or bipartite graphs [20]. 

On the positive side, branchwidth is
computable in polynomial time on interval
graphs [20, 24] and circular arc graphs [21].
Perhaps the most celebrated positive result on
branchwidth is an O(n^2) algorithm for the
branchwidth of planar graphs, given by Seymour
and Thomas in [29]. In the same paper they also
give an O(n^4) algorithm to compute an optimal
branch decomposition. (The running time of this
algorithm has been improved to O(n^3) in [18].)
The algorithm in [29] is basically an algorithm
for a parameter called carving width, related to
telephone routing, and the result for branchwidth
follows from the fact that the branchwidth of
a planar graph is half of the carving width of its
medial graph.

The algorithm for planar graphs [29] can be
used to construct an approximation algorithm
for branchwidth of some non-planar graphs. On
graph classes excluding a single-crossing graph as
a minor, branchwidth can be approximated within
a factor of 2.25 [7] (a graph H is a minor of
a graph G if H can be obtained from a subgraph
of G after applying edge contractions). Finally,
it follows from [13] that for every minor closed
graph class, branchwidth can be approximated by
a constant factor.

Branchwidth cannot increase when applying
edge contractions or removals. According to
the Graph Minors theory, this implies that, for
any fixed k, there is a finite number of minor
minimal graphs of branchwidth more than k and
we denote this set of graphs by B_k. Checking
whether a graph G contains a fixed graph as
a minor can be done in polynomial time [27].

Branchwidth of Graphs, Fig. 1 Example of a graph and its branch decomposition of width 3


Therefore, the knowledge of B_k implies the
construction of a polynomial time algorithm
for deciding whether branchwidth(G) ≤ k,
for any fixed k. Unfortunately, B_k is known
only for small values of k. In particular,
B_0 = {P_2}, B_1 = {P_4, K_3}, B_2 = {K_4}, and
B_3 = {K_5, V_8, M_6, Q_3} (here K_r is a clique on r
vertices, P_r is a path on r edges, V_8 is the graph
obtained from a cycle on 8 vertices by connecting
all pairs of vertices with cyclic distance 4, M_6
is the octahedron, and Q_3 is the 3-dimensional
cube). However, for any fixed k, one can construct
an algorithm, linear in n = |V(G)|, that decides
whether an input graph G has branchwidth ≤ k
and, if so, outputs the corresponding branch
decomposition (see [3]). In technical terms,
this implies that the problem of asking, for
a given graph G, whether branchwidth(G) ≤ k,
parameterized by k, is fixed parameter tractable
(i.e., belongs to the parameterized complexity
class FPT). (See [12] for further references on
parameterized algorithms and complexity.) The
algorithm in [3] is complicated and uses the
technique of characteristic sequences, which was
also used in order to prove the analogous result
for treewidth. For the particular cases where
k ≤ 3, simpler algorithms exist that use the
“reduction rule” technique (see [4]). We stress
that B_4 remains unknown, while several elements
of it have been detected so far (including the
dodecahedron and the icosahedron graphs).
There are a number of algorithms that, for a given
k, in time 2^{O(k)} · n^{O(1)} either decide that the
branchwidth of a given graph is at least k or
construct a branch decomposition of width O(k)
(see [26]). These results can be generalized to
compute the branchwidth of matroids and even
more general parameters.

An exact algorithm for branchwidth appeared
in [14]. Its complexity is O((2·√3)^n · n^{O(1)}).
The algorithm exploits special properties of
branchwidth (see also [24]).

In contrast to treewidth, edge maximal graphs 
of given branchwidth are not so easy to charac- 
terize (for treewidth there are just k-trees, i.e., 
chordal graphs with all maximal cliques of size 
k + 1). An algorithm for generating such graphs 
has been given in [23] and reveals several struc- 
tural issues on this parameter. 

It is known that a large number of graph 
theoretical problems can be solved in linear 
time when their inputs are restricted to graphs 
of small (i.e., fixed) treewidth or branchwidth 
(see [2]). 

Branchwidth appeared to be a useful tool in
the design of exact subexponential algorithms on
planar graphs and their generalizations. The basic
idea behind this approach is very simple: Let P be
a problem on graphs and 𝒢 be a class of graphs
such that

• for every graph G ∈ 𝒢 of branchwidth at
most ℓ, the problem P can be solved in time
2^{c·ℓ} · n^{O(1)}, where c is a constant, and

• for every graph G ∈ 𝒢 on n vertices, a branch
decomposition (not necessarily optimal) of G
of width at most h(n) can be constructed in
polynomial time, where h(n) is a function.


Then for every graph G ∈ 𝒢, the problem
P can be solved in time 2^{c·h(n)} · n^{O(1)}. Thus,
everything boils down to computations of
constants c and functions h(n). These computations
can be quite involved. For example, as was shown
in [17], for every planar graph G on n vertices, the
branchwidth of G is at most √(4.5n) ≤ 2.1214√n.
For extensions of this bound to graphs
embeddable on a surface of genus g, see [15].

Dorn [9] used fast matrix multiplication
in dynamic programming to estimate the
constants c for a number of problems. For
example, for the MAXIMUM INDEPENDENT
SET problem, c ≤ ω/2, where ω < 2.376
is the matrix product exponent over a ring,
which implies that the INDEPENDENT SET
problem on planar graphs is solvable in time
O(2^{2.52√n}). For the MINIMUM DOMINATING
SET problem, c ≤ 4, thus implying that the
branch decomposition method runs in time
O(2^{3.99√n}). It appears that algorithms of
running time 2^{O(√n)} can be designed even
for some of the “non-local” problems, such
as the HAMILTONIAN CYCLE, CONNECTED
DOMINATING SET, and STEINER TREE, for
which no time 2^{O(ℓ)} · n^{O(1)} algorithm on general
graphs of branchwidth ℓ is known [11]. Here
one needs special properties of some optimal
planar branch decompositions, roughly speaking
that every edge of T corresponds to a disk on
a plane such that all edges of G corresponding
to one component of T − e are inside the disk
and all other edges are outside. Some of the
subexponential algorithms on planar graphs
can be generalized for graphs embedded on
surfaces [10] and, more generally, to graph
classes that are closed under taking of minors [8].
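
As a quick arithmetic check of how the two ingredients combine, the Python sketch below multiplies the dynamic programming constant c by the coefficient of √n in the planar branchwidth bound √(4.5n). It reproduces the exponent quoted for MAXIMUM INDEPENDENT SET only; the helper name is hypothetical, and the value ω = 2.376 is simply the upper bound stated in the text.

```python
import math


def exponent_coefficient(c, width_coeff):
    """Coefficient of sqrt(n) in the exponent of 2^(c * h(n)) when h(n) = width_coeff * sqrt(n)."""
    return c * width_coeff


if __name__ == "__main__":
    omega = 2.376                    # bound on the matrix product exponent used in the text
    planar_coeff = math.sqrt(4.5)    # branchwidth of a planar graph is at most sqrt(4.5 n)
    coeff = exponent_coefficient(omega / 2, planar_coeff)
    print(round(planar_coeff, 4), round(coeff, 2))
```

Any improvement of either factor, a smaller c or a smaller bound on h(n), lowers the exponent multiplicatively.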

A similar approach can be used for
parameterized problems on planar graphs. For example,
a parameterized algorithm that finds a dominating
set of size ≤ k (or reports that no such set exists)
in time 2^{O(√k)} n^{O(1)} can be obtained based on
the following observation: there is a constant c
such that every planar graph of branchwidth at
least c√k does not contain a dominating set of
size at most k. Then for a given k the algorithm
computes an optimal branch decomposition of
a planar graph G and, if its width is more than
c√k, concludes that G has no dominating set of
size k. Otherwise, it finds an optimal dominating
set by performing dynamic programming in time
2^{O(√k)} n^{O(1)}. There are several ways of bounding
a parameter of a planar graph in terms of its
branchwidth or treewidth, including techniques
similar to Baker’s approach from approximation
algorithms [1], the use of separators, or some
combinatorial arguments, as shown in [16].
Another general approach to bounding the
branchwidth of a planar graph by parameters is based
on the results of Robertson et al. [28] regarding
quickly excluding a planar graph. This brings us
to the notion of bidimensionality [6].
Parameterized algorithms based on branch decompositions
can be generalized from planar graphs to graphs
embedded on surfaces and to graphs excluding
a fixed graph as a minor.


Applications 


See [5] for using branchwidth for solving TSP. 


Open Problems 


1. It is known that any planar graph G has
branchwidth at most √4.5 · √|V(G)| (or at
most (3/2) · √|E(G)| + 2) [17]. Is it possible
to improve this upper bound? Any possible
improvement would accelerate many of the
known exact or parameterized algorithms on
planar graphs that use dynamic programming
on branch decompositions.

2. In contrast to treewidth, very few graph
classes are known where branchwidth is
computable in polynomial time. Find graph
classes where branchwidth can be computed
or approximated in polynomial time.


3. Find B_k for values of k bigger than 3. The only
structural result on B_k is that its planar
elements will be either self-dual or pairwise-dual.
This follows from the fact that dual planar
graphs have the same branchwidth [29, 16].

4. Find an exact algorithm for branchwidth of
complexity O*(2^n) (the notation O*() assumes
that we drop the non-exponential terms
in the classic O() notation).

5. The dependence on k of the linear time
algorithm for branchwidth in [3] is huge. Find a
2^{O(k)} · n^{O(1)} step algorithm deciding whether
the branchwidth of an n-vertex input graph is
at most k.


Problem Definition


In this entry, we consider the classical broadcast
scheduling problem and discuss some recent
advances on this problem. The problem is
formalized as follows: there is a server which has a
collection of unit-sized pages P = {1, ..., n}.
The server can broadcast pages in integer time
slots in response to requests, which are given
as the following sequence: at time t, the server
receives w_p(t) ∈ Z_{≥0} requests for each page
p ∈ P. We say that a request ρ for page p
that arrives at time t is satisfied at time c_p(t) if
c_p(t) is the first time after t by which the server
has completely transmitted page p. The response
time of the request ρ is defined to be c_p(t) − t,
i.e., the time that elapses from its arrival till
the time it is satisfied. Notice that by definition,
the response time for any request is at least 1.
The goal is to find a schedule for broadcasting
pages to minimize the average response time, i.e.,

\[
\Big( \sum_{t,\, p \in P} w_p(t) \, (c_p(t) - t) \Big) \Big/ \sum_{t,\, p \in P} w_p(t).
\]

Recall that
the problem we discuss here is an offline problem,
where the entire request sequence is specified
as part of the input. There has also been much
research on the online version of the problem, and
we briefly discuss this toward the end of the entry.
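
The objective can be made concrete with a few lines of Python. The sketch below assumes a hypothetical encoding that is not part of the entry: requests are a dictionary from (page, arrival time) to the number of requests w_p(t), and a schedule is a dictionary from time slot to the page broadcast in that slot.

```python
def average_response_time(requests, schedule):
    """Average response time of an integral broadcast schedule.

    requests: dict mapping (page, arrival time t) to the count w_p(t).
    schedule: dict mapping a time slot to the page broadcast in that slot.
    Raises ValueError if some requested page is never broadcast after its arrival.
    """
    total_wait = 0
    total_weight = 0
    for (page, t), w in requests.items():
        # c_p(t): first slot strictly after t in which `page` is broadcast.
        done = min(slot for slot, q in schedule.items() if q == page and slot > t)
        total_wait += w * (done - t)
        total_weight += w
    return total_wait / total_weight


if __name__ == "__main__":
    requests = {(1, 0): 2, (2, 0): 1}   # two requests for page 1, one for page 2, all at time 0
    print(average_response_time(requests, {1: 1, 2: 2}))  # page 1 first
    print(average_response_time(requests, {1: 2, 2: 1}))  # page 2 first
```

In this toy instance, serving the more heavily requested page first gives the smaller average, which illustrates why the order of broadcasts matters.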


Key Results 


Erlebach and Hall [5] were the first to show
complexity theoretic hardness for this problem by
showing that it is NP-complete. The techniques
we describe below were introduced in [1, 3]. By
fine-tuning these ideas, [2] shows the following
result on the approximability of the offline
problem, which is the main result we will build
toward in this entry.


Theorem 1 ([2]) Let γ > 0 be any arbitrary
parameter. There is a polynomial time algorithm
that finds a schedule with average response time
(2 + γ) · OPT + O(√(log_{1+γ} n · log log n) · log n),
where OPT denotes the value of the average
response time in the optimum solution.


By setting γ = O(log n) above, we can get
an approximation guarantee of O(log^{1.5} n). Also
note that the O(log^{1.5} n) term in Theorem 1 is
additive. As a result, for instance, where OPT is
large (say Ω(log^{1.5+ε} n) for some ε > 0), we
can set γ arbitrarily small to get an approximation
ratio arbitrarily close to 2.


Linear Programming Formulation 


All of the algorithmic results in [1-3] are based
on rounding the following natural LP relaxation
for the problem. For each page p ∈ [n] and each
time t', there is a variable y_{pt'} which indicates
whether page p was transmitted at time t'. We
have another set of variables x_{ptt'}, with t' > t,
which indicate whether a request for page p
which arrives at time t is satisfied at t'. Let w_p(t)
denote the total weight of requests for page p that
arrive at time t.


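\[
\begin{aligned}
\min \quad & \sum_{p,\, t,\, t' > t} (t' - t) \cdot w_p(t) \cdot x_{ptt'} && && (1) \\
\text{s.t.} \quad & \sum_{p} y_{pt} \le 1 && \forall t && (2) \\
& \sum_{t' > t} x_{ptt'} = 1 && \forall p, t && (3) \\
& x_{ptt'} \le y_{pt'} && \forall p, t, t' > t && (4) \\
& x_{ptt'},\, y_{pt'} \in [0, 1] && \forall p, t, t' && (5)
\end{aligned}
\]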


Constraint (2) ensures that only one page is
transmitted in each time slot, (3) ensures that each
request must be satisfied, and (4) ensures that
a request for page p can be satisfied at time
t' only if p is transmitted at time t'. Finally,
a request arriving at time t that is satisfied at
time t' contributes (t' − t) to the objective. Now
consider the linear program obtained by relaxing
the integrality constraints on x_{ptt'} and y_{pt'}.


Rounding Techniques 


The following points illustrate the main ideas 
that form the building blocks of the rounding 
algorithms in [1-3].


The Half-Integrality Assumption

In what follows, we discuss the techniques in the
special case that the LP solution is half-integral,
i.e., where all the x_{ptt'} ∈ {0, 1/2}. The general case
essentially builds upon this intuition, and all main
technical ingredients are contained in this special
case.


Viewing the LP Solution as a Convex 
Combination of Blocks 

In half-integral solutions, note that every request
is satisfied by the two earliest half broadcasts
of the corresponding page. For any page p, let
τ_p = {t_{p,1}, t_{p,2}, ...} denote the times when the
fractional solution broadcasts 1/2 units of page
p. Notice that the fractional solution can be
entirely characterized by these sets τ_p for all
pages p. The main intuition now is to view
the fractional broadcast of each page as a
convex combination of two different solutions, one
which broadcasts the page p integrally at the odd
times {t_{p,1}, t_{p,3}, ...} and another which
broadcasts the page p integrally at the even times
{t_{p,2}, t_{p,4}, ...}. We call these the odd schedule
and even schedule for page p.


Rounding the Solution to Minimize Backlog: Attempt 1 [1]

Our first and most natural rounding idea is to
round the convex combination for each page
into one of the odd or even schedules, each
with probability 1/2. Let us call this the
tentative schedule. Note that on average, the tentative
schedule broadcasts one page per time slot, and
moreover, the expected response time of any
request is equal to its fractional response time.
The only issue, however, is that different pages
may broadcast at the same time slots. Indeed,
there could be some time interval [t_1, t_2) where the
tentative schedule makes many more than t_2 −
t_1 broadcasts! A natural manner to resolve this
issue is to broadcast conflicting pages in a
first-come first-serve manner, breaking ties arbitrarily.
Now, the typical request waits for at most its
fractional cost (on average), plus the backlog due
to conflicting broadcasts. Formally, the backlog is
defined as max_{t_1 < t_2} N_A(t_1, t_2) − (t_2 − t_1), where
N_A(t_1, t_2) is the number of broadcasts made in
the interval [t_1, t_2) by the tentative schedule. For
this simple randomized algorithm, note that the
backlog of any interval [t_1, t_2) is at most O(√n)
w.h.p. by a standard concentration bound. This
can be formalized to give us the O(√n)
approximation algorithm of [1].
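
The tentative schedule and its backlog can be sketched in Python as follows. The encoding is an assumption made for illustration: `tau` maps each page to the sorted list of times at which the half-integral solution broadcasts half a unit of it, and the backlog is evaluated by brute force over all intervals [t_1, t_2). The first-come first-serve conflict resolution step is not modeled.

```python
import random


def tentative_schedule(tau, rng):
    """Pick the odd or the even schedule of each page with probability 1/2."""
    chosen = []
    for page, times in tau.items():
        parity = rng.randrange(2)  # 0: odd schedule (t_1, t_3, ...), 1: even schedule
        chosen += [(t, page) for t in sorted(times)[parity::2]]
    return sorted(chosen)


def backlog(chosen, horizon):
    """max over t1 < t2 of (number of broadcasts in [t1, t2)) - (t2 - t1)."""
    prefix = [0] * (horizon + 2)
    for t, _ in chosen:
        prefix[t + 1] += 1
    for i in range(1, horizon + 2):
        prefix[i] += prefix[i - 1]  # prefix[t] = number of broadcasts before time t
    return max(prefix[t2] - prefix[t1] - (t2 - t1)
               for t1 in range(horizon + 1)
               for t2 in range(t1 + 1, horizon + 2))


if __name__ == "__main__":
    # Two pages, each broadcast in half units at every time slot 1..4.
    tau = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]}
    sched = tentative_schedule(tau, random.Random(0))
    print(sched, backlog(sched, horizon=4))
```

In this toy instance the backlog is 0 when the two pages pick different parities and 1 when they collide, which is the kind of conflict the later attempts try to control.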


Rounding the Solution to Minimize Backlog: Attempt 2 [3]

Our next attempt involves controlling the
backlog by explicitly enforcing constraints
which periodically reset the backlog to 0. For
this, we write an auxiliary LP as follows:
we divide the set τ_p for each page p into
blocks of size B = O(log n) units each and
have a variable for choosing the odd schedule
or even schedule within each block. (If the
first block chooses an odd schedule and the
second block chooses an even schedule, then
the requests which arrived at the boundary
may incur greater costs, but [3] argues that
these can be bounded for B ≥ Ω(log n).)
Since each block has B/2 units of fractional
transmission (recall that the LP is half-integral),
the total number of blocks is at most 2T/B,
where T is the time horizon of all broadcasts
made by the LP solution. Therefore, the total
number of variables is at most 4T/B (each
block chooses either an odd schedule or even
schedule). Now, instead of asking for the LP to
choose schedules such that each time has at most
one transmission, suppose we group the time
slots into intervals of size B and ask for the LP
to choose schedules such that each interval has
at most B transmissions. Now, there are T/B
such constraints, and there are 2T/B constraints
which enforce that we pick one schedule for each
block.

Therefore, in this relaxed LP, we start with a
solution that has 4T/B variables each set to 1/2
and a total of 3T/B constraints. But now, we can
convert this into an optimal basic feasible
solution (where the number of nonzero variables is at
most the number of constraints). This implies that
at least a constant fraction of the blocks chooses
either the odd or even schedules integrally. It
is easy to see that the backlog incurred in any
time interval [t_1, t_2) is at most O(B) since we
explicitly enforce 0 backlog for consecutive
intervals of size B. Therefore, by repeating this
process O(log T) times, we get a fully integral
schedule with backlog O(log T · B) = O(log^2 n).
This then gives us the 2 · OPT + O(log^2 n)
approximation guarantee of [3].


Rounding the Solution to Minimize Backlog: Attempt 3 [2]

Our final attempt involves combining the ideas
of attempts 1 and 2. Indeed, the main issue
with approach 2 is that, when we solve for a
basic feasible solution, we lose all control over
how the solution looks! Therefore, we would
ideally like a rounding which enforces
the constraints on time intervals of size B
but still randomly selects the schedules within
each block. This way, we'll be able to argue
that within each time interval of size B, the
maximum backlog is O(√B). Moreover, if
we look at a larger time interval I, we can
decompose this into intervals of size B for
which we have constraints in the LP, and a
prefix and suffix of size at most B. Therefore,
the backlog is constrained to be 0 by the LP
for all intermediate intervals except the prefix
and suffix, which can have a backlog of O(√B).
This will immediately give us the O(log^{1.5} n)
approximation of [2].

Indeed, the main tool which lets us achieve
this comes from a recent rounding technique of
Lovett and Meka [7]. They prove the following
result, which they used as a subroutine for
minimizing discrepancy of set systems, but it turns out
to be a general result applicable in our setting as
well [2].


Theorem 2 (Constructive partial coloring
theorem [7]) Let y ∈ [0, 1]^m be any starting
point, δ > 0 be an arbitrary error parameter,
v_1, ..., v_n ∈ R^m vectors, and λ_1, ..., λ_n ≥ 0
parameters with

\[
\sum_{i=1}^{n} e^{-\lambda_i^2 / 16} \le \frac{m}{16}.
\]

Then, there is a randomized O((m + n)^3 / δ^2)-time
algorithm to compute a vector z ∈ [0, 1]^m with

(i) z_j ∈ [0, δ] ∪ [1 − δ, 1] for at least m/32 of
the indices j ∈ [m].
(ii) |v_i · z − v_i · y| ≤ λ_i ||v_i||_2, for each i ∈ [n].


Hardness Results 


The authors [2] also complement the above algo- 
rithmic result with the following negative results. 


Theorem 3 The natural LP relaxation for the
broadcast problem has an integrality gap of
Ω(log n).


Interestingly, Theorem 3 is based on
establishing a new connection with the problem
of minimizing the discrepancy of 3 permutations.
In the 3-permutation problem, we are given
3 permutations π_1, π_2, π_3 of [n]. The goal
of the problem is to find a coloring χ that
minimizes the discrepancy. The discrepancy of
Π = (π_1, π_2, π_3) w.r.t. a ±1 coloring χ is the
worst-case discrepancy over all prefixes. That is,
the goal is to attain

\[
\min_{\chi} \; \max_{i=1}^{3} \; \max_{k=1}^{n} \; \Big| \sum_{j=1}^{k} \chi(\pi_{i,j}) \Big|,
\]

where π_{i,j} is
the jth element in π_i. Newman and Nikolov
[8] showed a tight Θ(log n) lower bound on
the discrepancy of 3 permutations, resolving a
long-standing conjecture. The authors [2] note
that this can be used to give an integrality gap
for the broadcast scheduling problem as well.
Then, by generalizing the connection to the
discrepancy of ℓ permutations, [2] shows the
following hardness results (prior to this, only NP
hardness was known).
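
For very small instances, the prefix discrepancy of a set of permutations can be computed by brute force over all ±1 colorings. The Python sketch below does exactly that; it is an exponential-time illustration of the definition, with elements numbered 0 to n − 1 and function names chosen here for convenience, and it is unrelated to the constructions of [2] or [8].

```python
from itertools import product


def prefix_discrepancy(perms, coloring):
    """Largest absolute prefix sum of the coloring over all given permutations."""
    worst = 0
    for perm in perms:
        running = 0
        for element in perm:
            running += coloring[element]
            worst = max(worst, abs(running))
    return worst


def min_discrepancy(perms):
    """Minimize the prefix discrepancy over all +1/-1 colorings of 0..n-1."""
    n = len(perms[0])
    return min(prefix_discrepancy(perms, signs)
               for signs in product((-1, 1), repeat=n))


if __name__ == "__main__":
    # Three cyclic shifts of (0, 1, 2): every pair of elements opens some permutation.
    print(min_discrepancy([(0, 1, 2), (1, 2, 0), (2, 0, 1)]))
    # A single order repeated three times: an alternating coloring suffices.
    print(min_discrepancy([(0, 1, 2, 3)] * 3))
```

In the first instance each of the three pairs forms a length-two prefix, so the two elements of every pair would need opposite colors, which is impossible with two colors on three elements.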


Theorem 4 There is no O(log^{1/2−ε} n)
approximation algorithm for the problem of minimizing
average response time, for any ε > 0, unless
NP ⊆ ∪_{t>0} BPTIME(2^{log^t n}). Moreover, for any
sufficiently large ℓ, there is no O(ℓ^{1/2})
approximation algorithm for the ℓ-permutation problem,
unless NP = RP.


Online Broadcast Scheduling 


The broadcast scheduling problem has also been
studied in the online scheduling model, where the
algorithm is made aware of requests only when
they arrive, and it has to make the broadcast
choices without knowledge of the future requests.
Naturally, the performance of our algorithms
degrades when compared to the offline model, but
remarkably, we can get nontrivial algorithms even
in the online model! The only additional
assumption we need in the online model is that our
scheduling algorithm may broadcast two pages
(instead of one) every 1/ε time slots, in order to
get approximation ratios that depend on 1/ε. In
particular, several (1 + ε)-speed, O(poly(1/ε))-competitive
algorithms are now known [4, 6], and it is also known
that extra speed is necessary to obtain
n^{o(1)}-competitive algorithms.


Problem Definition


The Model Overview 

Consider a set of stations (nodes) modeled as
points in the plane, labeled by natural numbers,
and equipped with transmitting and receiving
capabilities. Every node u has a range r_u, depending
on the power of its transmitter, and it can reach
all nodes at distance at most r_u from it. The
collection of nodes equipped with ranges
determines a directed graph on the set of nodes, called
a geometric radio network (GRN), in which a
directed edge (u, v) exists if node v can be reached
from u. In this case u is called a neighbor of v. If
the power of all transmitters is the same, then all
ranges are equal and the corresponding GRN is
symmetric.

Nodes send messages in synchronous rounds.
In every round, every node acts either as a
transmitter or as a receiver. A node gets a message in
a given round if and only if it acts as a receiver
and exactly one of its neighbors transmits in this
round. The message received in this case is the
one that was transmitted. If at least two neighbors
of a receiving node u transmit simultaneously in
a given round, none of the messages is received
by u in this round. In this case, it is said that a
collision occurred at u.
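
The model can be illustrated with a short Python sketch. The encoding is an assumption made here: a node is given by its label, its position, and its range; `build_grn` computes which nodes each node reaches, and `run_round` applies the reception rule (exactly one transmitting neighbor) to every receiver. Both function names are hypothetical.

```python
import math


def build_grn(nodes):
    """nodes: dict label -> ((x, y), range). Returns dict label -> set of nodes it reaches."""
    reach = {u: set() for u in nodes}
    for u, (pos_u, range_u) in nodes.items():
        for v, (pos_v, _) in nodes.items():
            if u != v and math.dist(pos_u, pos_v) <= range_u:
                reach[u].add(v)  # directed edge (u, v)
    return reach


def run_round(reach, transmitters, message_of):
    """Return the messages received in one round, as a dict receiver -> message."""
    received = {}
    for v in reach:
        if v in transmitters:
            continue  # a transmitting node does not receive in this round
        senders = [u for u in transmitters if v in reach[u]]
        if len(senders) == 1:  # zero senders: silence; two or more: collision
            received[v] = message_of[senders[0]]
    return received


if __name__ == "__main__":
    # Three nodes on a line, all with range 1 (a symmetric GRN).
    nodes = {1: ((0, 0), 1), 2: ((1, 0), 1), 3: ((2, 0), 1)}
    reach = build_grn(nodes)
    print(run_round(reach, {1}, {1: "m"}))             # node 2 hears node 1
    print(run_round(reach, {1, 3}, {1: "m", 3: "m"}))  # collision at node 2
```

Note that in this model a receiver cannot distinguish a collision from silence unless such a capability is assumed separately, so the sketch simply reports no message in both cases.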


The Problem 

Broadcasting is one of the fundamental network 
communication primitives. One node of the net- 
work, called the source, has to transmit a message 
to all other nodes. Remote nodes are informed via 
intermediate nodes, along directed paths in the 
network. One of the basic performance measures 
of a broadcasting scheme is the total time, i.e., the 
number of rounds it uses to inform all the nodes 
of the network. 

For a fixed real s > 0, called the knowledge 
radius, it is assumed that each node knows the 
part of the network within the circle of radius s 
centered at it, i.e., it knows the positions, labels, 
and ranges of all nodes at distance at most s. The 
following problem is considered: 

How does the size of the knowledge radius in- 
fluence deterministic broadcasting time in GRN? 


Terminology and Notation 

Fix a finite set R = {r_1, ..., r_p} of positive reals 
such that r_1 < ... < r_p. Reals r_i are called 
ranges. A node v is a triple [l, (x, y), r_i], where l 
is a binary sequence called the label of v; (x, y) 
are coordinates of a point in the plane, called the 
position of v; and r_i ∈ R is called the range 
of v. It is assumed that labels are consecutive 
integers 1 to n, where n is the number of nodes, 
but all the results hold if labels are integers in the 
set {1, ..., M}, where M ∈ O(n). Moreover, it 
is assumed that all nodes know an upper bound 
Γ on n, where Γ is polynomial in n. One of 
the nodes is distinguished and called the source. 
Any set of nodes C with a distinguished source, 
such that positions and labels of distinct nodes are 
different, is called a configuration. 

With any configuration C, the following 
directed graph G(C) is associated. Nodes of the 
graph are nodes of the configuration, and a 
directed edge (u, v) exists in the graph if and only 
if the distance between u and v does not exceed 
the range of u. (The word “distance” always 
means the geometric distance in the plane and 
not the distance in a graph.) In this case u is 
called a neighbor of v. Graphs of the form G(C) 
for some configuration C are called geometric 
radio networks (GRN). In what follows, only 
configurations C such that in G(C) there exists 
a directed path from the source to any other node 
are considered. If the size of the set R of ranges is 
p, a resulting configuration and the corresponding 
GRN are called a p-configuration and p-GRN, 
respectively. Clearly, all 1-GRN are symmetric 
graphs. D denotes the eccentricity of the source 
in a GRN, i.e., the maximum length of all shortest 
paths in this graph from the source to all other 
nodes. D is of order of the diameter if the graph 
is symmetric but may be much smaller in general. 
Ω(D) is an obvious lower bound on broadcasting 
time. 
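
As an illustration of these definitions, the following sketch builds G(C) from a configuration and computes the source eccentricity D by breadth-first search. The names build_grn and eccentricity, and the encoding of a configuration as a dictionary from labels to (x, y, range) triples, are assumptions of this sketch.

```python
import math
from collections import deque

def build_grn(nodes):
    """nodes: dict label -> (x, y, range). Returns out-adjacency lists of G(C)."""
    adj = {u: [] for u in nodes}
    for u, (ux, uy, ur) in nodes.items():
        for v, (vx, vy, _) in nodes.items():
            # edge (u, v) exists iff the geometric distance is at most the range of u
            if u != v and math.hypot(ux - vx, uy - vy) <= ur:
                adj[u].append(v)
    return adj

def eccentricity(adj, source):
    """Maximum length of a shortest directed path from source to any node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("some node is not reachable from the source")
    return max(dist.values())
```

With two distinct ranges the resulting graph need not be symmetric: a node with a large range can reach a node that cannot reach it back.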

Given any configuration, fix a nonnegative real 
s, called the knowledge radius, and assume that 
every node of C has initial input consisting of all 
nodes whose positions are at distance at most s 
from its own. Thus, it is assumed that every node 
knows a priori labels, positions, and ranges of all 
nodes within a circle of radius s centered at it. All 
nodes also know the set R of available ranges. 

It is not assumed that nodes know any global 
parameters of the network, such as its size or 
diameter. The only global information that nodes 
have about the network is a polynomial upper 
bound on its size. Consequently, the broadcast 
process may be finished but no node needs to be 
aware of this fact. Hence, the adopted definition 
of broadcasting time is the same as in [3]. An 
algorithm accomplishes broadcasting in t rounds, 
if all nodes know the source message after round 
t, and no messages are sent after round t. 

Only deterministic algorithms are considered. 
Nodes can transmit messages even before getting 
the source message, which enables preprocessing 
in some cases. The algorithms are adaptive, i.e., 
nodes can schedule their actions based on their 
local history. A node can obviously gain knowl- 
edge from previously obtained messages. There 
is, however, another potential way of acquiring 
information during the communication process. 
The availability of this method depends on what 
happens during a collision, i.e., when u acts as a 
receiver and two or more neighbors of u transmit 
simultaneously. As mentioned above, u does not 
get any of the messages in this case. However, 
two scenarios are possible. Node u may either 
hear nothing (except for the background noise), 
or it may receive interference noise different from 
any message received properly but also different 
from background noise. In the first case, it is said 
that there is no collision detection, and in the 
second case — that collision detection is available 
(cf., e.g., [1]). A discussion justifying both sce- 
narios can be found in [1,7]. 


Related Work 

Broadcasting in geometric radio networks and 
some of their variations was considered, e.g., 
in [6, 8, 9, 11, 12]. In [12] the authors proved 
that scheduling optimal broadcasting is NP-hard 
even when restricted to such graphs and gave 
an O(n log n) algorithm to schedule an optimal 
broadcast when nodes are situated on a line. In 
[11] broadcasting was considered in networks 
with nodes randomly placed on a line. In [9] 
the authors discussed fault-tolerant broadcasting 
in radio networks arising from regular locations 
of nodes on the line and in the plane, with 
reachability regions being squares and hexagons, 
rather than circles. Finally, in [6] broadcasting 
with restricted knowledge was considered but the 
authors studied only the special case of nodes 
situated on the line. 


Key Results 


The results summarized below are based on the 
paper [5], of which [4] is a preliminary version. 


Arbitrary GRN in the Model Without Collision Detection 

Clearly all upper bounds and algorithms are valid 
in the model with collision detection as well. 
Large Knowledge Radius 

Theorem 1 The minimum time to perform 
broadcasting in an arbitrary GRN with source 
eccentricity D and knowledge radius s > r_p (or 
with global knowledge of the network) is Θ(D). 


This result yields a centralized O(D) broad- 
casting algorithm when global knowledge of the 
GRN is available. This is in sharp contrast with 
broadcasting in arbitrary graphs, as witnessed by 
the graph from [10] which has bounded diameter 
but requires time Ω(log n) for broadcasting. 


Knowledge Radius Zero 

Next consider the case when knowledge radius 
s = 0, i.e., when every node knows only its 
own label, position, and range. In this case, it is 
possible to broadcast in time O(n) for arbitrary 
GRN. It should be stressed that this upper bound 
is valid for arbitrary GRN, not only symmetric, 
unlike the algorithm from [3] designed for arbi- 
trary symmetric graphs. 


Theorem 2 It is possible to broadcast in arbi- 
trary n-node GRN with knowledge radius zero in 
time O(n). 


The above upper bound for GRN should be 
contrasted with the lower bound from [2,3] show- 
ing that some graphs require broadcasting time 
Ω(n log n). Indeed, the graphs constructed in 
[2,3] and witnessing this lower bound are not 
GRN. Surprisingly, this sharper lower bound does 
not require very unusual graphs. While coun- 
terexamples from [2,3] are not GRN, it turns 
out that the reason for a longer broadcasting 
time is really not the topology of the graph but 
the difference in knowledge available to nodes. 
Recall that in GRN with knowledge radius 0, 
it is assumed that each node knows its own 
position (apart from its label and range): the up- 
per bound O(n) uses this geometric information 
extensively. 

If this knowledge is not available to nodes (i.e., 
each node knows only its label and range), then 
there exists a family of GRN requiring 
broadcasting time Ω(n log n). Moreover, it is possible 
to show such GRN resulting from configurations 
with only 2 distinct ranges. (Obviously, for 
1-configurations this lower bound does not hold, as 
these configurations yield symmetric GRN, and 
in [3], the authors showed an O(n) algorithm 
working for arbitrary symmetric graphs.) 


Theorem 3 If every node knows only its own 
label and range (and does not know its position), 
then there exist n-node GRN requiring 
broadcasting time Ω(n log n). 


Symmetric GRN 


The Model with Collision Detection 

In the model with collision detection and 
knowledge radius zero, optimal broadcast time is 
established by the following pair of results. 


Theorem 4 In the model with collision detec- 
tion and knowledge radius zero, it is possible 
to broadcast in any n-node symmetric GRN of 
diameter D in time O(D + log n). 


The next result is the lower bound Ω(log n) 
for broadcasting time, holding for some GRN 
of diameter 2. Together with the obvious bound 
Ω(D), this matches the upper bound from 
Theorem 4. 


Theorem 5 For any broadcasting algorithm 
with collision detection and knowledge radius 
zero, there exist n-node symmetric GRN of 
diameter 2 for which this algorithm requires 
time Ω(log n). 


The Model Without Collision Detection 

For the model without collision detection, it is 
possible to maintain complexity O(D + log n) 
of broadcasting. However, we need a stronger 
assumption concerning knowledge radius: it is no 
longer 0, but positive, although arbitrarily small. 


Theorem 6 In the model without collision detec- 
tion, it is possible to broadcast in any n-node 
symmetric GRN of diameter D in time 
O(D + log n), for any positive knowledge radius. 


Applications 


The radio network model is applicable to wireless 
networks using a single frequency. The specific 
model of geometric radio networks described 
in section “Problem Definition” is applicable to 
wireless networks where stations are located in a 
relatively flat region without large obstacles (nat- 
ural or human made), e.g., in the sea or a desert, 
as opposed to a large city or a mountain region. In 
such a terrain, the signal of a transmitter reaches 
receivers at the same distance in all directions, 
i.e., the set of potential receivers of a transmitter 
is a disc. 


Open Problems 


1. Is it possible to broadcast in time o(n) in 
arbitrary n-node GRN with eccentricity D 
sublinear in n for knowledge radius zero? 
Note: In view of Theorem 2, it is possible to 
broadcast in time O(n). 

2. Is it possible to broadcast in time O(D + log n) 
in all symmetric n-node GRN with 
eccentricity D, without collision detection, when 
knowledge radius is zero? 
Note: In view of Theorems 4 and 6, the answer 
is positive if either collision detection or a 
positive (even arbitrarily small) knowledge 
radius is assumed. 
Problem Definition 


This problem is concerned with storing a linearly 
ordered set of elements such that the 
DICTIONARY operations FIND, INSERT, and DELETE 
can be performed efficiently. 

In 1972, Bayer and McCreight introduced the 
class of B-trees as a possible way of implement- 
ing an “index for a dynamically changing random 
access file” [7, p. 173]. B-trees have received 
considerable attention both in the database and in 
the algorithms community ever since; a promi- 
nent witness to their immediate and widespread 
acceptance is the fact that the authoritative survey 
on B-trees authored by Comer [10] appeared as 
soon as 1979 and, already at that time, referred 
to the B-tree data structure as the “ubiquitous 
B-tree.” 


Notations 

A B-tree is a multiway search tree defined as fol- 
lows (the definition of Bayer and McCreight [7] 
is restated according to Knuth [19, Sec. 6.2.4] and 
Cormen et al. [11, Ch. 18.1]): 


Definition 1 Let m ≥ 3 be a positive integer. A 
tree T is a B-tree of degree m if it is either empty 
or fulfills the following properties: 


1. All leaves of T appear on the same level of T. 

2. Every node of T has at most m children. 

3. Every node of T, except for the root and the 
leaves, has at least ⌈m/2⌉ children. 

4. The root of T is either a leaf or has at least two 
children. 

5. An internal node with k children c_1[v], ..., 
c_k[v] stores k − 1 keys, and a leaf stores 
between ⌈m/2⌉ − 1 and m − 1 keys. The keys 
key_i[v], 1 ≤ i ≤ k − 1, of a node v ∈ T 
are maintained in sorted order, i.e., key_1[v] ≤ 
... ≤ key_{k−1}[v]. 

6. If v is an internal node of T with k 
children c_1[v], ..., c_k[v], the k − 1 keys 
key_1[v], ..., key_{k−1}[v] of v separate the range 
of keys stored in the subtrees rooted at the 
children of v. If x_i is any key stored in the 
subtree rooted at c_i[v], the following holds: 


x_1 ≤ key_1[v] ≤ x_2 ≤ key_2[v] ≤ ... ≤ x_{k−1} ≤ key_{k−1}[v] ≤ x_k 

To search a B-tree for a given key x, the 
algorithm starts with the root of the tree being 
the current node. If x matches one of the current 
node’s keys, the search terminates successfully. 
Otherwise, if the current node is a leaf, the search 
terminates unsuccessfully. If the current node’s 
keys do not contain x and if the current node 
is not a leaf, the algorithm identifies the unique 
subtree rooted at the child of the current node that 
may contain x and recurses on this subtree. Since 
the keys of a node guide the search process, they 
are also referred to as routing elements. 
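
The search procedure just described can be sketched as follows. The Node class and the function find are hypothetical names introduced for this illustration; the sketch keeps all keys of a node in a sorted Python list and uses binary search inside a node.

```python
from bisect import bisect_left

class Node:
    """Hypothetical in-memory B-tree node used only for this sketch."""
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys (routing elements)
        self.children = children or []  # empty list for a leaf

def find(node, x):
    """Return True iff key x is stored in the (possibly empty) tree."""
    while node is not None:
        i = bisect_left(node.keys, x)   # first position with keys[i] >= x
        if i < len(node.keys) and node.keys[i] == x:
            return True                 # x matches a key of the current node
        if not node.children:
            return False                # reached a leaf without finding x
        node = node.children[i]         # the unique subtree that may contain x
    return False
```

For example, with root keys [10, 20] and leaves [2, 5], [12, 17], and [25, 30], a search for 17 descends into the middle child, while a search for 11 ends unsuccessfully in that same leaf.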


Variants and Extensions 

Knuth [19] defines a B*-tree to be a B-tree where 
Property 3 in Definition 1 is modified such that 
every node (except for the root) contains at least 
2m/3 keys. 

A B+-tree is a leaf-oriented B-tree, i.e., a 
B-tree that stores the keys in the leaves only. 
Additionally, the leaves are linked in left-to-right 
order to allow for fast sequential traversal of the 
keys stored in the tree. In a leaf-oriented tree, 
the routing elements usually are copies of certain 
keys stored in the leaves (key_i[v] can be set to 
be the largest key stored in the subtree rooted at 
c_i[v]), but any set of routing elements that fulfills 
Properties 5 and 6 of Definition 1 can do as well. 

Huddleston and Mehlhorn [16] extended 
Definition 1 to describe a more general class of 
multiway search trees that includes the class of 
B-trees as a special case. Their class of so-called 
(a, b)-trees is parameterized by two integers a 
and b with a ≥ 2 and 2a − 1 ≤ b. Property 2 
of Definition 1 is modified to allow each node to 
have up to b children, and Property 3 is modified 
to require that, except for the root and the leaves, 
every node of an (a, b)-tree has at least a 
children. All other properties of Definition 1 remain 
unchanged for (a, b)-trees. Usually, (a, b)-trees 
are implemented as leaf-oriented trees. 

By the above definitions, a B-tree is a 
(b/2, b)-tree (if b is even) or an (a, 2a − 1)-tree 
(if b is odd). The subtle difference between even 
and odd maximum degree becomes relevant in an 
important amortization argument of Huddleston 
and Mehlhorn (see below) where the inequality 
b ≥ 2a is required to hold. This amortization 
argument actually caused (a, b)-trees with 
b ≥ 2a to be given a special name: weak 
B-trees [16]. 


Update Operations 
An INSERT operation on an (a, b)-tree first tries 
to locate the key x to be inserted. After an 
unsuccessful search that stops at some leaf ℓ, 
x is inserted into ℓ’s set of keys. If ℓ becomes 
too full, i.e., contains more than b elements, two 
approaches are possible to resolve this overflow 
situation: (1) the node ℓ can be split around its 
median key into two nodes with at least a keys 
each, or (2) the node ℓ can have some of its keys 
be distributed to its left or right sibling (if this 
sibling has enough space to accommodate the 
new keys). In the first case, a new routing element 
separating the keys in the two new subtrees of ℓ’s 
parent μ has to be inserted into the key set of 
μ, and in the second case, the routing element 
in μ separating the keys in the subtree rooted 
at ℓ from the keys rooted at ℓ’s relevant sibling 
needs to be updated. If ℓ was split, the node μ 
needs to be checked for a potential overflow due 
to the insertion of a new routing element, and 
the split may propagate all the way up to the 
root. 
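
The first overflow strategy (splitting around the median) can be sketched for a single leaf as follows. The function insert_into_leaf is a hypothetical helper; it assumes a leaf-oriented tree in which the routing element is a copy of the largest key of the left part, and it leaves the propagation of the split to the parent out of scope.

```python
from bisect import insort

def insert_into_leaf(keys, x, b):
    """Insert x into the sorted key list of a leaf (modified in place).

    Returns (left, routing, right) if the leaf overflowed and was split,
    otherwise (keys, None, None). A leaf overflows when it holds more
    than b keys.
    """
    insort(keys, x)
    if len(keys) <= b:
        return keys, None, None
    mid = len(keys) // 2             # split around the median key
    left, right = keys[:mid], keys[mid:]
    routing = left[-1]               # assumption: largest key of the left part
    return left, routing, right
```

Since an overflowing leaf holds b + 1 ≥ 2a keys, both halves contain at least a keys, as required in the text.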

A DELETE operation also first locates the key 
x to be deleted. If (in a non-leaf-oriented tree) 
x resides in an internal node, x is replaced by 
the largest key in the left subtree of x (or the 
smallest key in the right subtree of x) which 
resides in a leaf and is deleted from there. In a 
leaf-oriented tree, keys are deleted from leaves 
only (the correctness of a routing element on 
higher levels is not affected by this deletion). 
In any case, a DELETE operation may result in 
a leaf node ℓ containing fewer than a elements. 
Again, there are two approaches to resolve this 
underflow situation: (1) the node ℓ is merged 
with its left or right sibling node or (2) keys 
from ℓ’s left or right sibling node are moved to 
ℓ (unless the sibling node would underflow as a 
result of this). Both underflow handling strategies 
require updating the routing information stored 
in the parent of ℓ which (in the case of merging) 
may underflow itself. As with overflow handling, 
this process may propagate up to the root of the 
tree. 

Note that the root of the tree can be split as 
a result of an INSERT operation and that it may 
disappear if the only two children of the root are 
merged to form the new root. This implies that 
B-trees grow and shrink at the top, and thus all 
leaves are guaranteed to appear on the same level 
of the tree (Property 1 of Definition 1). 


Key Results 


Since B-trees are a premier index structure for 
external storage, the results given in this section 
are stated not only in the RAM-model of compu- 
tation but also in the I/O-model of computation 
introduced by Aggarwal and Vitter [2]. In the 
I/O-model, not only the number N of elements 
in the problem instance but also the number M 
of elements that simultaneously can be kept in 
main memory and the number B of elements 
that fit into one disk block are (nonconstant) 
parameters, and the complexity measure is the 
number of I/O-operations needed to solve a given 
problem instance. If B-trees are used in an 
external memory setting, the degree m of the 
B-tree is usually chosen such that one node fits 
into one disk block, i.e., m ∈ Θ(B), and this is 
assumed implicitly whenever the I/O-complexity 
of B-trees is discussed. 


Theorem 1 The height of an N-key B-tree of 
degree m ≥ 3 is bounded by log_{⌈m/2⌉}((N + 1)/2). 
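
To get a feeling for this bound, the following sketch evaluates it numerically. The function name btree_height_bound is an assumption, and the formula is read as the logarithm of (N + 1)/2 to the base ⌈m/2⌉, where the height counts edges on a root-to-leaf path.

```python
import math

def btree_height_bound(n_keys, m):
    """Evaluate log_{ceil(m/2)}((N + 1) / 2) for an N-key B-tree of degree m."""
    base = math.ceil(m / 2)          # minimum number of children of an inner node
    return math.log((n_keys + 1) / 2, base)
```

For instance, with degree m = 512 and N = 10^9 keys, the bound evaluates to a value between 3 and 4, which illustrates why very few node accesses suffice to reach any key in an index of that size.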


Theorem 2 ([22]) The storage utilization for 
B-trees of high order under random insertions and 
deletions is approximately ln 2 ≈ 69 %. 


Theorem 3 A B-tree may be used to implement 
the abstract data type DICTIONARY such that 
the operations FIND, INSERT, and DELETE on 
a set of N elements from a linearly ordered 
domain can be performed in O(log N) time (with 
O(log_B N) I/O-operations) in the worst case. 


Remark 1 By threading the nodes of a B-tree, 
i.e., by linking the nodes according to their 
in-order traversal number, the operations PREV and 
NEXT can be performed in constant time (with a 
constant number of I/O-operations). 


A (one-dimensional) range query asks for all 
keys that fall within a given query range (inter- 
val). 


Lemma 1 A B-tree supports (one-dimensional) 
range queries with O(log N + K) time 
complexity (O(log_B N + K/B) I/O-complexity) in the 
worst case where K is the number of keys 
reported. 
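
One way to answer such a range query is a pruned in-order traversal that descends only into subtrees that may intersect the query interval. The sketch below assumes a hypothetical Node class with sorted keys and a closed query interval [lo, hi]; it illustrates the idea and is not an I/O-efficient implementation.

```python
from bisect import bisect_left, bisect_right

class Node:
    """Hypothetical in-memory B-tree node used only for this sketch."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def range_query(node, lo, hi, out=None):
    """Append all keys k with lo <= k <= hi to out, in sorted order."""
    if out is None:
        out = []
    if node is None:
        return out
    i = bisect_left(node.keys, lo)    # first key >= lo
    j = bisect_right(node.keys, hi)   # first key > hi
    for p in range(i, j):
        if node.children:
            range_query(node.children[p], lo, hi, out)
        out.append(node.keys[p])
    if node.children:
        range_query(node.children[j], lo, hi, out)
    return out
```

Only the two boundary paths visit nodes that contribute no reported key, which matches the additive logarithmic term in the lemma.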


Under the convention that each update to a 
B-tree results in a new “version” of the B-tree, 
a multiversion B-tree is a B-tree that allows for 
updates of the current version but also supports 
queries in earlier versions. 


Theorem 4 ([9]) A multiversion B-tree can be 
constructed from a B-tree such that it is optimal 
with respect to the worst-case complexity of the 
FIND, INSERT, and DELETE operations as well 
as to the worst-case complexity of answering 
range queries. 


Applications 


Databases 

One of the main reasons for the success of the B- 
tree lies in its close connection to databases: any 
implementation of Codd’s relational data model 
(introduced incidentally in the same year as B- 
trees were invented) requires an efficient indexing 
mechanism to search and traverse relations that 
are kept on secondary storage. If this index is 
realized as a B+-tree, all keys are stored in a 
linked list of leaves which is indexed by the 
top levels of the B+-tree, and thus both efficient 
logarithmic searching and sequential scanning of 
the set of keys are possible. 

Due to the importance of this indexing 
mechanism, a large number of results on how to 
incorporate B-trees and their variants into database 
systems and how to formulate algorithms 
using these structures have been published in the 
database community. Comer [10] and Graefe [14] 
summarize early and recent results, but due to 
the bulk of results, even these summaries cannot 
be fully comprehensive. Also, B-trees have been 
shown to work well in the presence of concurrent 
operations [8], and Mehlhorn [20, p. 212] notes 
that they perform especially well if a top-down 
splitting approach is used. The details of this 
splitting approach may be found, e.g., in the 
textbook of Cormen et al. [11, Ch. 18.2]. 


Priority Queues 

A B-tree may be used to serve as an implemen- 
tation of the abstract data type PRIORITYQUEUE 
since the smallest key always resides in the first 
slot of the leftmost leaf. 


Lemma 2 An implementation of a priority queue 
that uses a B-tree supports the MIN operation 
in O(1) time (with O(1) I/O-operations). All 
other operations (including DECREASEKEY) 
have a time complexity of O(log N) (an 
I/O-complexity of O(log_B N)) in the worst 
case. 


Mehlhorn [20, Sec. III, 5.3.1] examined 
B-trees (and, more generally, (a, b)-trees with a ≥ 2 
and b ≥ 2a − 1) in the context of mergeable 
priority queues. Mergeable priority queues are 
priority queues that additionally allow for 
concatenating and splitting priority queues. 
Concatenating priority queues for a set S_1 ≠ ∅ and a set 
S_2 ≠ ∅ is only defined if max{x | x ∈ S_1} < 
min{x | x ∈ S_2} and results in a single priority 
queue for S_1 ∪ S_2. Splitting a priority queue for 
a set S_3 ≠ ∅ according to some y ∈ dom(S_3) 
results in a priority queue for the set S_4 := {x ∈ 
S_3 | x ≤ y} and a priority queue for the set 
S_5 := {x ∈ S_3 | x > y} (one of these sets 
may be empty). Mehlhorn’s result restated in the 
context of B-trees is as follows: 


Theorem 5 (Theorem 6 in [20, Sec. III, 
5.3.1]) If sets S_1 ≠ ∅ and S_2 ≠ ∅ 
are represented by a B-tree each, then 
operation CONCATENATE(S_1, S_2) takes time 
O(log max{|S_1|, |S_2|}) (has an I/O-complexity 
of O(log_B max{|S_1|, |S_2|})) and operation 
SPLIT(S_1, y) takes time O(log |S_1|) (has an 
I/O-complexity of O(log_B |S_1|)). All bounds hold 
in the worst case. 
Buffered Data Structures 

Many applications (including sorting) that in- 
volve massive data sets allow for batched data 
processing. A variant of B-trees that exploits this 
relaxed problem setting is the so-called buffer tree 
proposed by Arge [4]. A buffer tree is a B-tree of 
degree m ∈ Θ(M/B) (instead of m ∈ Θ(B)) 
where each node is assigned a buffer of size 
Θ(M). These buffers are used to collect updates 
and query requests that are passed further down 
the tree only if the buffer gets full enough to allow 
for cost amortization. 


Theorem 6 (Theorem 1 in [4]) The total cost 
of an arbitrary sequence of N intermixed 
INSERT and DELETE operations on an initially 
empty buffer tree is O((N/B) log_{M/B}(N/B)) 
I/O-operations, that is, the amortized I/O-cost of an 
operation is O((1/B) log_{M/B}(N/B)). 


As a consequence, N elements can be sorted 
spending an optimal number of O((N/B) log_{M/B}(N/B)) 
I/O-operations by inserting them into a 
(leaf-oriented) buffer tree in a batched manner 
and then traversing the leaves. By the preceding 
discussion, buffer trees can also be used to 
implement (batched) priority queues in the external 
memory setting. Arge [4] extended his analysis 
of buffer trees to show that they also support 
DELETEMIN operations with an amortized 
I/O-cost of O((1/B) log_{M/B}(N/B)). 

Since the degree of a buffer tree is too large 
to allow for efficient single-shot, i.e., 
non-batched, operations, Arge et al. [6] discussed 
how buffers can be attached to (and later 
detached from) a multiway tree while at 
the same time keeping the degree of the 
base structure in Θ(B). Their discussion 
uses the R-tree index structure as a running 
example; the techniques presented, however, 
carry over to the B-tree. The resulting data 
structure is accessed through standard methods 
and additionally allows for batched update 
operations, e.g., bulk loading, and queries. The 
amortized I/O-complexity of all operations is 
analogous to the complexity of the buffer tree 
operations. 

Using this buffering technique along with 
weight balancing [5], Achakeev and Seeger [1] 
showed how to efficiently bulk load and bulk 
update partially persistent data structures such as 
the multiversion B-tree. 

Variants of the B-tree base structure that sup- 
port modern architectures such as many-core pro- 
cessors and that can be updated efficiently have 
also been proposed by Sewall et al. [21], Graefe 
et al. [15], and Erb et al. [12]. 


B-trees as Base Structures 

Several external memory data structures are de- 
rived from B-trees or use a B-tree as their base 
structure — see the survey by Arge [3] for a 
detailed discussion. One of these structures, the 
so-called weight-balanced B-tree is particularly 
useful as a base tree for building dynamic exter- 
nal data structures that have secondary structures 
attached to all (or some) of their nodes. The 
weight-balanced B-tree, developed by Arge and 
Vitter [5], is a variant of the B-tree that requires 
all subtrees of a node to have approximately, i.e., 
up to a small constant factor, the same number of 
leaves. Weight-balanced B-trees can be shown to 
have the following property: 


Theorem 7 ([5]) In a weight-balanced B-tree, 
rebalancing after an update operation is 
performed by splitting or merging nodes. When a 
rebalancing operation involves a node v that 
is the root of a subtree with w(v) leaves, at 
least Ω(w(v)) update operations involving leaves 
below v have to be performed before v itself has 
to be rebalanced again. 


Using the above theorem, amortized bounds 
for maintaining secondary data structures 
attached to nodes of the base tree can be obtained 
— as long as each such structure can be updated 
with an I/O-complexity linear in the number of 
elements stored below the node it is attached 
to [3,5]. 


Amortized Analysis 

Most of the amortization arguments used for 
(a, b)-trees, buffer trees, and their relatives are 
based upon a theorem due to Huddleston and 
Mehlhorn [16, Theorem 3]. This theorem states 
that the total number of rebalancing operations in 
any sequence of N intermixed insert and delete 
operations performed on an initially empty weak 
B-tree, i.e., an (a, b)-tree with b ≥ 2a, is at 
most linear in N. This result carries over to buffer 
trees since they are (M/(4B), M/B)-trees. Since 
B-trees are (a, b)-trees with b = 2a − 1 (if b is 
odd), the result in its full generality is not valid for 
B-trees, and Huddleston and Mehlhorn present a 
simple counterexample for (2, 3)-trees. 

A crucial fact used in the proof of the above 
amortization argument is that the sequence of 
operations to be analyzed is performed on an 
initially empty data structure. Jacobsen et al. [17] 
proved the existence of non-extreme (a, b)-trees, 
i.e., (a,b)-trees where only few nodes have a 
degree of a or b. Based upon this, they re- 
established the above result that the rebalancing 
cost in a sequence of operations is amortized 
constant (and thus the related result for buffer 
trees) also for operations on initially nonempty 
data structures. 

In connection with concurrent operations in 
database systems, it should be noted that the 
analysis of Huddleston and Mehlhorn actually 
requires b ≥ 2a + 2 if a top-down splitting 
approach is used. It can be shown, though, that 
even in the general case, few node splits (in an 
amortized sense) happen close to the root. 


URLs to Code and Data Sets 


There is a variety of (commercial and free) 
implementations of B-trees and (a, b)-trees available 
for download. Representatives are the C++-based 
implementations that are part of the 
LEDA-library (http://www.algorithmic-solutions.com), 
the STXXL-library (http://stxxl.sourceforge.net), 
and the TPIE-library (http://www.madalgo.au.dk/tpie/) 
as well as the Java-based implementation 
that is part of the javaxxl-library 
(http://xxl.googlecode.com). Furthermore, (pseudo-code) 
implementations can be found in almost every 
textbook on database systems or on algorithms 
and data structures (see, e.g., [11, 13]). Since 
textbooks almost always leave developing the 
implementation details of the DELETE operation 
as an exercise to the reader, the discussion by 
Jannink [18] is especially helpful. 
Problem Definition 


The Burrows-Wheeler transform is a technique 
used for the lossless compression of data. It is the 
algorithmic core of the tool bzip2 which has be- 
come a standard for the creation and distribution 
of compressed archives. 

Before the introduction of the Burrows- 
Wheeler transform, the field of lossless data 
compression was dominated by two approaches 
(see [2,21] for comprehensive surveys). The first 
approach dates back to the pioneering works of 
Shannon and Huffman, and it is based on the idea 
of using shorter codewords for the more frequent 
symbols. This idea has originated the techniques 
of Huffman and arithmetic coding and, more 
recently, the PPM (prediction by partial 
matching) family of compression algorithms. 
The second approach originated from the works 
of Lempel and Ziv and is based on the idea of 
adaptively building a dictionary and representing 
the input string as a concatenation of dictionary 
words. The best-known compressors based on 
this approach form the so-called ZIP-family; 
they have been the standard for several years 
and are available on essentially any computing 
platform (e.g., gzip, zip, winzip, just to cite 
a few). 

The Burrows-Wheeler transform introduced 
a completely new approach to lossless data 
compression based on the idea of transforming 
the input to make it easier to compress. In the 
authors’ words: “(this) technique [...] works by 
applying a reversible transformation to a block 
of text to make redundancy in the input more ac- 
cessible to simple coding schemes” [5, Sect. 7]. 
Not only has this technique produced some state- 
of-the-art compressors, but it also originated 
the field of compressed indexes [20] and it has 
been successfully extended to compress (and 
index) structured data such as XML files [11] and 
tables [22]. 


Key Results 


Notation 

Let s be a string of length n drawn from an 
alphabet Σ. For i = 0, ..., n − 1, s[i] denotes 
the i-th character of s and s[i, n − 1] denotes 
the suffix of s starting at position i (i.e., starting 
with the character s[i]). Given two strings s and 
t, the notation s ≺ t is used to denote that s 
lexicographically precedes t. 


The Burrows-Wheeler Transform 
In [5] Burrows and Wheeler introduced a new 
compression algorithm based on a reversible 
transformation, now called the 
Burrows-Wheeler transform (bwt). Given a string s, the 
computation of bwt(s) consists of three basic 
steps (see Fig. 1): 



1. Append to the end of s a special symbol $ 
smaller than any other symbol in Σ. 

2. Form a conceptual matrix M whose rows are 
the cyclic shifts of the string s$ sorted in 
lexicographic order. 

3. Construct the transformed text ŝ = bwt(s) by 
taking the last column of M. 
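
The three steps above can be sketched directly, if inefficiently, by materializing and sorting all cyclic shifts. The function name bwt and the default sentinel are assumptions of this sketch; the code relies on "$" comparing smaller than every character of the input, which holds for lowercase ASCII text. This naive construction is for illustration only and does not achieve linear running time.

```python
def bwt(s, sentinel="$"):
    """Naive Burrows-Wheeler transform via explicitly sorted cyclic shifts."""
    assert sentinel not in s, "the sentinel must not occur in the input"
    t = s + sentinel                                    # step 1: append $
    rows = sorted(t[i:] + t[:i] for i in range(len(t))) # step 2: sorted matrix M
    return "".join(row[-1] for row in rows)             # step 3: last column
```

For s = mississippi, the first column of the sorted matrix is $iiiimppssss and the last column is ipssm$pissii.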


Notice that every column of M, hence also the 
transformed text ŝ, is a permutation of s$. As an 
example, F, the first column of the bwt matrix 
M, consists of all characters of s$ alphabetically 
sorted. In Fig. 1 it is F = $iiiimppssss. 

Although it is not obvious from its definition, 
the bwt is an invertible transformation, and both 
the bwt and its inverse can be computed in O(n) 
optimal time. To be consistent with the more 
recent literature, the following notation and proof 
techniques will be slightly different from the ones 
in [5]. 


Definition 1 For 1 ≤ i ≤ n, let s[k_i, n - 1] denote
the suffix of s prefixing row i of M, and define
Ψ(i) as the index of the row prefixed by
s[k_i + 1, n - 1].


For example, in Fig. 1 it is Ψ(2) = 7 since row
2 of M is prefixed by ippi and row 7 is prefixed
by ppi. Note that Ψ(i) is not defined for i = 0
since row 0 is not prefixed by a proper suffix of s.
(In [5] instead of Ψ the authors make use of a
map which is essentially the inverse of Ψ. The
use of Ψ has been introduced in the literature of
compressed indexes where Ψ and its inverse play
an important role (see [20]).)


Lemma 1 For i = 1, ..., n, it is F[i] = ŝ[Ψ(i)].


Proof Since each row contains a cyclic shift of
s$, the last character of the row prefixed by
s[k_i + 1, n - 1] is s[k_i]. Definition 1 then implies
ŝ[Ψ(i)] = s[k_i] = F[i] as claimed. □


Lemma 2 If 1 ≤ i < j ≤ n and F[i] = F[j],
then Ψ(i) < Ψ(j).


Proof Let s[k_i, n - 1] (resp. s[k_j, n - 1]) denote
the suffix of s prefixing row i (resp. row j).
The hypothesis i < j implies that s[k_i, n - 1] ≺
s[k_j, n - 1]. The hypothesis F[i] = F[j] implies
s[k_i] = s[k_j]; hence, it must be
s[k_i + 1, n - 1] ≺ s[k_j + 1, n - 1]. The thesis
follows since by construction Ψ(i) (resp. Ψ(j))
is the lexicographic position of the row prefixed
by s[k_i + 1, n - 1] (resp. s[k_j + 1, n - 1]). □


Lemma 3 For any character c ∈ Σ, if F[j] is
the ℓ-th occurrence of c in F, then ŝ[Ψ(j)] is the
ℓ-th occurrence of c in ŝ.


Proof Take an index h such that h < j and
F[h] = F[j] = c (the case h > j is symmetric).
Lemma 2 implies Ψ(h) < Ψ(j) and Lemma 1
implies ŝ[Ψ(h)] = ŝ[Ψ(j)] = c. Consequently,
the number of c's preceding (resp. following)
F[j] in F coincides with the number of c's
preceding (resp. following) ŝ[Ψ(j)] in ŝ and the
lemma follows. □


    mississippi$        $ mississipp i
    ississippi$m        i $mississip p
    ssissippi$mi        i ppi$missis s
    sissippi$mis        i ssippi$mis s
    issippi$miss        i ssissippi$ m
    ssippi$missi        m ississippi $
    sippi$missis        p i$mississi p
    ippi$mississ        p pi$mississ i
    ppi$mississi        s ippi$missi s
    pi$mississip        s issippi$mi s
    i$mississipp        s sippi$miss i
    $mississippi        s sissippi$m i

Burrows-Wheeler Transform, Fig. 1 Example of
Burrows-Wheeler transform for the string
s = mississippi. The matrix on the right has the rows
sorted in lexicographic order. The output of the
bwt is the last column of the sorted matrix; in this
example, the output is ŝ = bwt(s) = ipssm$pissii


In Fig. 1 it is Ψ(2) = 7 and both F[2] and
ŝ[7] are the second i in their respective strings.
This property is usually expressed by saying
that corresponding characters maintain the same
relative order in both strings F and ŝ.


Lemma 4 For any i, Ψ(i) can be computed from
ŝ = bwt(s).


Proof Retrieve F simply by sorting alphabetically
the symbols of ŝ. Then compute Ψ(i) as
follows: (1) set c = F[i], (2) compute ℓ such
that F[i] is the ℓ-th occurrence of c in F, and
(3) return the index of the ℓ-th occurrence of c
in ŝ. □


Referring again to Fig. 1 (rows and positions
numbered from 0), to compute Ψ(9) it
suffices to set c = F[9] = s and observe that
F[9] is the second s in F. Then it suffices to
locate the index j of the second s in ŝ, namely,
j = 3. Hence, Ψ(9) = 3, and in fact row 9
is prefixed by sissippi and row 3 is prefixed by
issippi.
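
The three steps in the proof of Lemma 4 translate almost literally into code. The following Python sketch is illustrative only: the name psi_from_bwt is hypothetical, rows and positions are numbered from 0 as in Definition 1, and $ is assumed to compare smaller than every other symbol. Each call re-sorts the input, so it is far from the one-pass computation used for efficient inversion.

```python
def psi_from_bwt(bwt: str, i: int) -> int:
    """Compute the map of Definition 1 for row i, following Lemma 4."""
    f = sorted(bwt)                  # column F: the symbols of bwt, sorted
    c = f[i]                         # (1) c = F[i]
    if c == "$":
        raise ValueError("the map is not defined for row 0")
    ell = f[: i + 1].count(c)        # (2) F[i] is the ell-th occurrence of c in F
    seen = 0
    for j, ch in enumerate(bwt):     # (3) find the ell-th occurrence of c in bwt
        if ch == c:
            seen += 1
            if seen == ell:
                return j
    raise AssertionError("unreachable: F is a permutation of bwt")


print(psi_from_bwt("ipssm$pissii", 2))  # 7: row 2 (ippi...) maps to row 7 (ppi...)
```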

Theorem 1 The original string s can be recov- 
ered from bwt(s). 


Proof Lemma 4 implies that the column F and
the map Ψ can be retrieved from bwt(s). Let j_0
denote the index of the special character $ in ŝ.
By construction, the row j_0 of the bwt matrix is
prefixed by s[0, n - 1]; hence, s[0] = F[j_0]. Let
j_1 = Ψ(j_0). By Definition 1 row j_1 is prefixed
by s[1, n - 1]; hence, s[1] = F[j_1]. Continuing
in this way, it is straightforward to prove by
induction that s[i] = F[Ψ^i(j_0)], for i = 1, ...,
n - 1. □
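
The proof of Theorem 1 suggests a simple inversion routine: compute Ψ for every row, then follow it starting from j_0. A minimal Python sketch is given here under the same assumptions as the text (rows numbered from 0, $ smaller than every other symbol). The function names are illustrative, and the counting dictionary stands in for the column F, which is never built explicitly.

```python
from collections import Counter


def bwt_to_psi(bwt: str):
    """Compute the map for all rows in one scan; also return j0."""
    freq = Counter(bwt)
    count, total = {}, 0
    for c in sorted(freq):           # count[c] = first row of M prefixed by c
        count[c] = total
        total += freq[c]
    psi = [None] * len(bwt)          # psi[0] stays undefined, as in Definition 1
    j0 = None
    for i, c in enumerate(bwt):
        if c == "$":
            j0 = i                   # row j0 is prefixed by the whole string s
        else:
            psi[count[c]] = i        # the l-th c in F maps to the l-th c in bwt
            count[c] += 1
    return psi, j0


def psi_to_text(bwt: str, psi, j0: int) -> str:
    """Follow the map from j0, emitting s[0], s[1], ..., s[n - 1]."""
    out, k = [], j0
    for _ in range(len(bwt) - 1):
        nxt = psi[k]
        out.append(bwt[nxt])         # Lemma 1: bwt[psi[k]] equals F[k]
        k = nxt
    return "".join(out)


psi, j0 = bwt_to_psi("ipssm$pissii")
print(psi_to_text("ipssm$pissii", psi, j0))  # mississippi
```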


Algorithmic Issues 

A remarkable property of the bwt is that both the
direct and the inverse transform admit efficient
algorithms that are extremely simple and elegant.


Theorem 2 Let s[1,n] be a string over a constant
size alphabet Σ. String ŝ = bwt(s) can be
computed in O(n) time using O(n log n) bits of
working space.


Proof The suffix array of s can be computed in
O(n) time and O(n log n) bits of working space
by using, for example, the algorithm in [17].
The suffix array is an array of integers sa[1, n]
such that for i = 1, ..., n, s[sa[i], n - 1] is
the i-th suffix of s in the lexicographic order.
Since each row of M is prefixed by a unique
suffix of s followed by the special symbol $, the
suffix array provides the ordering of the rows in
M. Consequently, bwt(s) can be computed from
sa in linear time using the procedure sa2bwt of
Fig. 2. □
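
As a rough illustration of this proof, the sketch below derives bwt(s) from a suffix array. It is not the procedure of Fig. 2 verbatim: indices are 0-based, the suffix array is taken over s$ (so its first entry is the suffix consisting of $ alone), and the suffix array is built by plain sorting rather than by the linear-time algorithm of [17]. The function names are hypothetical.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of the suffixes of s + '$' in lexicographic order.

    Naive construction by sorting suffixes; for illustration only.
    """
    t = s + "$"
    return sorted(range(len(t)), key=lambda k: t[k:])


def sa_to_bwt(s: str, sa: list[int]) -> str:
    """Last column of M: the symbol that cyclically precedes each sorted suffix."""
    t = s + "$"
    return "".join(t[k - 1] for k in sa)   # k == 0 wraps around to '$'


sa = suffix_array("mississippi")
print(sa_to_bwt("mississippi", sa))  # ipssm$pissii
```

Given the suffix array, the second function is a single linear scan, which is the point of the proof; all the real work is in building sa.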


Theorem 3 Let s[1,n] be a string over a constant
size alphabet Σ. Given bwt(s), the string s
can be retrieved in O(n) time using O(n log n)
bits of working space.


Proof The algorithm for retrieving s follows almost
verbatim the procedure outlined in the proof
of Theorem 1. The only difference is that, for
efficiency reasons, all the values of the map Ψ
are computed in one shot. This is done by the
procedure bwt2psi in Fig. 2. Instead of working
with the column F, bwt2psi uses the array
count, which is a “compact” representation of
F. At the beginning of the procedure, for any
character c ∈ Σ, count[c] provides the index of
the first row of M prefixed by c. For example,
in Fig. 1 count[i] = 1, count[m] = 5, and so
on. In the main for loop of bwt2psi, the array bwt
is scanned and count[c] is increased every time
an occurrence of character c is encountered (line
6). Line 6 also assigns to h the index of the ℓ-th
occurrence of c in F. By Lemma 3, line 7 stores
correctly in psi[h] the value i = Ψ(h). After
the computation of array psi, s is retrieved by
using the procedure psi2text of Fig. 2, whose
correctness immediately follows from Theorem 1.
Clearly, the procedures bwt2psi and psi2text in
Fig. 2 run in O(n) time. Their working space is
dominated by the cost of storing the array psi
which takes O(n log n) bits.


Procedure sa2bwt
1. bwt[0]=s[n-1];
2. for(i=1;i<=n;i++)
3.   if(sa[i] == 1)
4.     bwt[i]='$';
5.   else
6.     bwt[i]=s[sa[i]-1];

Procedure bwt2psi
1. for(i=0;i<=n;i++)
2.   c = bwt[i];
3.   if(c == '$')
4.     j0 = i;
5.   else
6.     h = count[c]++;
7.     psi[h]=i;

Procedure psi2text
1. k = j0; i=0;
2. do
3.   k = psi[k];
4.   s[i++] = bwt[k];
5. while(i<n);


Burrows-Wheeler Transform, Fig. 2 Algorithms for
computing and inverting the Burrows-Wheeler transform.
Procedure sa2bwt computes bwt(s) given s and its suffix
array sa. Procedure bwt2psi takes bwt(s) as input
and computes the Ψ map storing it in the array psi.
bwt2psi also stores in j0 the index of the row prefixed
by s[0, n - 1]. bwt2psi uses the auxiliary array count[1,
|Σ|] which initially contains in count[i] the number of
occurrences in bwt(s) of the symbols 1, ..., i - 1. Finally,
procedure psi2text recovers the string s given bwt(s), the
array psi, and the value j0


The Burrows-Wheeler Compression
Algorithm

The rationale for using the bwt for data compression
is the following. Consider a string w that
appears k times within s. In the bwt matrix of s,
there will be k consecutive rows prefixed by w,
say rows r_w + 1, r_w + 2, ..., r_w + k. Hence, the
positions r_w + 1, ..., r_w + k of ŝ = bwt(s) will
contain precisely the symbols that immediately
precede w in s. If in s certain patterns are more
frequent than others, then for many substrings w,
the corresponding positions r_w + 1, ..., r_w + k
of ŝ will contain only a few distinct symbols.
For example, if s is an English text and w is
the string his, the corresponding portion of ŝ will
likely contain many t's and blanks and only a
few other symbols. Hence, ŝ is a permutation
of s that is usually locally homogeneous, in that
its “short” substrings usually contain only a few
distinct symbols. (Obviously this is true only if s
has some regularity: if s is a random string ŝ will
be random as well!)

To take advantage of this property, Burrows
and Wheeler proposed to process the string ŝ using
move-to-front encoding [4] (mtf). mtf encodes
each symbol with the number of distinct symbols
encountered since its previous occurrence. To this
end, mtf maintains a list of the symbols ordered
by recency of occurrence; when the next symbol
arrives, the encoder outputs its current rank and
moves it to the front of the list. Note that mtf
produces a string which has the same length as
ŝ and, if ŝ is locally homogeneous, the string
mtf(ŝ) will mainly consist of small integers.
(If s is an English text, mtf(ŝ) usually contains
more than 50 % zeroes.) Given this skewed
distribution, mtf(ŝ) can be easily compressed:
Burrows and Wheeler proposed to compress it
using Huffman or Arithmetic coding, possibly
preceded by the run-length encoding of runs of
equal integers.
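
The move-to-front step described above is short enough to sketch in full. The Python code below is an illustration, not the implementation of [4] or [5]: the function names are hypothetical, and the initial order of the recency list (here, the order in which the alphabet is passed in) is a convention that encoder and decoder are assumed to share.

```python
def mtf_encode(text, alphabet):
    """Replace each symbol by its current rank in a recency-ordered list."""
    recency = list(alphabet)
    ranks = []
    for ch in text:
        r = recency.index(ch)              # rank = distinct symbols seen since ch
        ranks.append(r)
        recency.insert(0, recency.pop(r))  # move ch to the front
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    recency = list(alphabet)
    out = []
    for r in ranks:
        ch = recency.pop(r)
        out.append(ch)
        recency.insert(0, ch)
    return "".join(out)


print(mtf_encode("aaabbb", "ab"))  # [0, 0, 0, 1, 0, 0]
```

Runs of equal symbols become runs of zeroes, which is why a locally homogeneous input yields mostly small integers.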

Burrows and Wheeler were mainly interested 
in proposing an algorithm with good practical 
performance. Indeed their simple implementation 
outperformed, in terms of compression ratio, the 
tool gzip that was the current standard for lossless 
compression. A few years after the introduction
of the bwt, [14, 18] have shown that the compression
ratio of the Burrows-Wheeler compression
algorithm can be bounded in terms of the k-th
order empirical entropy of the input string for any
k ≥ 0. For example, Kaplan et al. [14] showed
that for any input string s and real μ > 1, the
length of the compressed string is bounded by
μ n H_k(s) + n log(ζ(μ)) + μ g_k + O(log n) bits,
where ζ(μ) is the standard Zeta function and g_k
is a function depending only on k and the size
of Σ. This bound holds pointwise for any string
s, simultaneously for any k ≥ 0 and μ > 1,
and it is remarkable since similar bounds have
not been proven for any other known compressor.
The theoretical study on the performance of bwt-based
compressors is an active area of research.
For more recent results, see [6, 12].


Applications


After the seminal paper of Burrows and Wheeler, 
many researchers have proposed compression 
algorithms based on the bwt (see [8, 9] and 
references therein). Of particular theoretical 
interest are the results in [10] showing that 
the bwt can be used to design a “compression
booster,” that is, a tool for improving the
performance of other compressors in a well- 
defined and measurable way. 

Today the main area of application of the 
bwt is the design of Compressed Full-text 
Indexes [20]. These indexes take advantage of 
the relationship between the bwt and the suffix 
array to provide a compressed representation 
of a string supporting the efficient search and 
retrieval of the occurrences of an arbitrary 
pattern. 


Open Problems 


In addition to the investigation on the performance
of bwt-based compressors, an open problem
of great practical significance is the space-efficient
computation of the bwt. Given a string s
of length n over an alphabet Σ, both s and ŝ =
bwt(s) take O(n log |Σ|) bits. Unfortunately,
the linear time algorithms shown in Fig. 2 make
use of auxiliary arrays (i.e., sa and Ψ) whose
storage takes Θ(n log n) bits. This poses a serious
limitation to the size of the largest bwt that
can be computed in main memory. The problem
of space- and time-efficient computation of the
bwt is still open, even if interesting results are
reported in [1, 3, 7, 13, 15, 19]. The problem of
designing space-efficient algorithms for inverting
the bwt is also open; see [7, 16, 20] and references
therein for further details.


Experimental Results 


An experimental study of the performance of
several compression algorithms based on the bwt
and a comparison with other state-of-the-art
compressors is presented in [8].


Data Sets 


The data sets used in [8] are available from
http://people.unipmn.it/manzini/boosting. Other data
sets relevant for compression and compressed
indexing are available at the Pizza&Chili site
http://pizzachili.di.unipi.it/.


URL to Code 


The compression boosting page
(http://people.unipmn.it/manzini/boosting) contains the source
code of the algorithms tested in [8]. An
extremely efficient code for the computation
of the suffix array and the bwt (without
compression) is available at
http://code.google.com/p/libdivsufsort. The code of bzip2 is
available at http://www.bzip.org.


Problem Definition


The study of Pease, Shostak and Lamport was 
among the first to consider the problem of achiev- 
ing a coordinated behavior between processors 
of a distributed system in the presence of fail- 
ures [21]. Since the paper was published, this 
subject has grown into an extensive research 
area. Below is a presentation of the main find- 
ings regarding the specific questions addressed 
in their paper. In some cases this entry uses the 
currently accepted terminology in this subject, 
rather than the original terminology used by the 
authors. 


System Model 

A distributed system is considered to have n
independent processors, p_1, ..., p_n, each modeled
as a (possibly infinite) state machine. The processors
are linked by a communication network that
supports direct communication between every
pair of processors. The processors can communicate
only by exchanging messages, where the
sender of every message can be identified by the
receiver. While the processors may fail, it is assumed
that the communication subsystem is fail-safe.
It is not known in advance which processors
will not fail (remain correct) and which ones will
fail. The types of processor failures are classified
according to the following hierarchy.


Crash failure A crash failure means that the 
processor no longer operates (ad infinitum, 
starting from the failure point). In particular, 
other processors will not receive messages 
from a faulty processor after it crashes. 

Omission failure A processor fails to send and 
receive an arbitrary subset of its messages. 

Byzantine failure A faulty processor behaves 
arbitrarily. 


The Byzantine failure is further subdivided 
into two cases, according to the ability of the 
processors to create unfalsifiable signatures for 
their messages. In the authenticated Byzantine 
failure model it is assumed that each message is 
signed by its sender and that no other processor 
can fake a signature of a correct processor. Thus, 
even if such a message is forwarded by other 
processors, its authenticity can be verified. If the 
processors represent malevolent (human) users of 
a distributed system, a Public Key Infrastructure 
(PKI) is typically used to sign the messages 
(which involves cryptography related issues [17], 
not discussed here). Practically, in systems where 
processors are just “processors”, a simple sig- 
nature, such as CRC (cyclic redundancy check), 
might be sufficient [13]. In the unauthenticated 
Byzantine failure model there are no message 
signatures. 


Definition of the Byzantine Agreement 
Problem 

In the beginning, each processor p_i has an externally
provided input value v_i, from some set V (of
at least size 2). In the Byzantine Agreement (BA)
problem, every correct processor p_i is required to
decide on an output value d_i ∈ V such that the
following conditions hold:


Termination Eventually, p_i decides, i.e., the
algorithm cannot run indefinitely.

Validity If the input value of all the processors 
is v, then the correct processors decide v. 

Agreement All the correct processors decide on
the same value.


For crash failures and omission failures there
exists a stronger agreement condition:


Uniform Agreement No two processors (either 
correct or faulty) decide differently. 


The termination condition has the following 
stronger version. 


Simultaneous Termination All the correct pro- 
cessors decide in the same round (see defini- 
tion below). 


Timing Model 

The BA problem was originally defined for
synchronous distributed systems [18, 21]. In this
timing model the processors are assumed to operate
in lockstep, which allows the execution of a
protocol to be partitioned into rounds. Each round consists
of a send phase, during which a processor can 
send a (different) message to each processor 
directly connected to it, followed by a receive 
phase, in which it receives messages sent by these 
processors in the current round. Unlimited local 
computations (state transitions) are allowed in 
both phases, which models the typical situation 
in real distributed systems, where computation 
steps are faster than the communication steps by 
several orders of magnitude. 


Overview 
This entry deals only with deterministic algo- 
rithms for the BA problem in the synchronous 
model. For algorithms involving randomization 
see the » Optimal Probabilistic Synchronous 
Byzantine Agreement entry in this volume. For 
results on BA in other models of synchrony, 
see » Asynchronous Consensus Impossibility, 
Failure Detectors, » Consensus with Partial 
Synchrony entries in this volume. 


Key Results 


The maximum possible number of faulty processors
is assumed to be bounded by an a priori
specified number t (e.g., estimated from the
failure probability of an individual processor and
the requirements on the failure probability of the
system as a whole). The number of processors
that actually become faulty in a given execution
is denoted by f, where f ≤ t.

The complexity of synchronous distributed 
algorithms is measured by three complementary 
parameters. The first is the round complexity, 
which measures the number of rounds required 
by the algorithm. The second is the message 
complexity, i.e., the total number of messages 
(and sometimes also their size in bits) sent by all 
the processors (in case of Byzantine failures, only 
messages sent by correct processors are counted). 
The third complexity parameter measures the 
number of local operations, as in sequential al- 
gorithms. 

All the algorithms presented below are efficient,
i.e., the number of rounds, the number
of messages and their size, and the local oper- 
ations performed by each processor are polyno- 
mial in n. In most of the algorithms, both the 
exchanged messages and the local computations 
involve only the basic data structures (e.g., arrays, 
lists, queues). Thus, the discussion is restricted 
only to the round and the message complexities 
of the algorithms. 

The network is assumed to be fully connected, 
unless explicitly stated otherwise. 


Crash Failures 

A simple BA algorithm which runs in t + 1
rounds and sends O(n²) messages, together with
a proof that this number of rounds is optimal,
can be found in textbooks on distributed computing
[19]. Algorithms for deciding in f + 1
rounds, which is the best possible, are presented
in [7, 23] (one additional round is necessary before
the processors can stop [11]). Simultaneous
termination requires t + 1 rounds, even if no failures
actually occur [11]; however, there exists an
algorithm that in any given execution stops in the
earliest possible round [14]. For uniform agreement,
decision can be made in min(f + 2, t + 1)
rounds, which is tight [7].

In case of crash failures it is possible to
solve the BA problem with O(n) messages,
which is also the lower bound. However, all
known message-optimal BA algorithms require
superlinear time. An algorithm that runs in
O(f + 1) rounds and uses only O(n polylog n)
messages is presented in [8], along with
an overview of other results on BA message
complexity.
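
To make the role of the t + 1 rounds concrete, here is a small Python simulation of a flooding protocol in the style of the simple textbook algorithm: in every round each live processor sends all values it knows to everyone, and after t + 1 rounds it decides on the minimum value it knows. This is a sketch under stated assumptions, not the algorithm of any cited paper: the function name, the format of the crash schedule, the 0-based processor indices, and the decision rule are illustrative choices, and no attempt is made to meet the message bounds discussed above.

```python
def floodset(inputs, t, crashes=None):
    """Simulate t + 1 synchronous rounds of a flooding protocol.

    inputs  -- input value of each processor (indices 0 .. n-1)
    t       -- assumed upper bound on the number of crash failures
    crashes -- hypothetical schedule {pid: (round, receivers)}: pid crashes
               in that round after reaching only the listed receivers
    Returns {pid: decision} for the processors that never crashed.
    """
    crashes = crashes or {}
    n = len(inputs)
    known = [{v} for v in inputs]        # values each processor has seen
    alive = set(range(n))
    for rnd in range(1, t + 2):          # rounds 1 .. t + 1
        inbox = [set() for _ in range(n)]
        for p in sorted(alive):          # send phase
            if p in crashes and crashes[p][0] == rnd:
                receivers = crashes[p][1]    # partial send, then crash
            else:
                receivers = range(n)
            for q in receivers:
                inbox[q] |= known[p]
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for q in alive:                  # receive phase
            known[q] |= inbox[q]
    return {p: min(known[p]) for p in alive}


# Processor 0 crashes in round 1 after reaching only processor 1.
print(floodset([0, 1, 1, 1], t=1, crashes={0: (1, [1])}))
```

With t = 1 the second round lets processor 1 relay the value 0, so all surviving processors decide the same value. Running the same schedule with t = 0 (which violates the assumption f ≤ t) stops after one round and leaves the survivors in disagreement.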


Omission Failures 

The basic algorithm used to solve the crash
failure BA problem works for omission failures
as well, so the problem can be solved
in t + 1 rounds [23]. An algorithm which
terminates in min(f + 2, t + 1) rounds was
presented in [22]. Uniform agreement is
impossible for t ≥ n/2 [23]. For t < n/2, there
is an algorithm that achieves uniform agreement
in min(f + 2, t + 1) rounds (and O(n² f)
message complexity) [20].


Byzantine Failures with Authentication 

A (t + 1)-round BA algorithm is presented
in [12]. An algorithm which terminates in
min(f + 2, t + 1) rounds can be found in [24].
The message complexity of the problem is
analyzed in [10], where it is shown that
the number of signatures and the number of
messages in any authenticated BA algorithm
are Ω(nt) and Ω(n + t²), respectively. In
addition, it is shown that Ω(nt) is the
bound on the number of messages for the
unauthenticated BA.


Byzantine Failures Without Authentication 

In the unauthenticated case, the BA problem
can be solved if and only if n > 3t. The
proof can be found in [1, 19]. An algorithm
that decides in min(f + 3, t + 1) rounds (it
might require two additional rounds to stop) is
presented in [16]. Unfortunately, this algorithm
is complicated. Simpler algorithms, that run in
min(2f + 4, 2t + 1) and 3 min(f + 2, t + 1)
rounds, are presented in [24] and [5], respectively.
In these algorithms the number of sent
messages is O(n³); moreover, in the latter
algorithm the messages are of constant size (2
bits). Both algorithms assume V = {0,1}. To
solve the BA problem for a larger V, several
instances of a binary algorithm can be run in
parallel. Alternatively, there exists a simple 2-round
protocol that reduces a BA problem with
arbitrary initial values to the binary case, e.g., see
Sect. 6.3.3 in [19]. For algorithms with optimal
O(nt) message complexity and t + o(t) round
complexity see [4, 9].


Arbitrary Network Topologies 

When the network is not fully connected, BA can
be solved for crash, omission and authenticated
Byzantine failures if and only if it is (t + 1)-connected
[12]. In case of Byzantine failures
without authentication, BA has a solution if and
only if the network is (2t + 1)-connected and
n > 3t [19]. In both cases the BA problem can
be solved by simulating the algorithms for the 
fully connected network, using the fact that the 
number of disjoint communication paths between 
any pair of non-adjacent processors exceeds the 
number of faulty nodes by an amount that is 
sufficient for reliable communication. 
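
The solvability conditions quoted in this entry reduce to simple arithmetic, which the following Python helpers spell out. They only encode the stated conditions (n > 3t and the connectivity thresholds); the function names and the boolean flag are hypothetical, and the vertex connectivity of the network is assumed to be known and passed in.

```python
def ba_solvable(n: int, t: int, connectivity: int, unauthenticated: bool) -> bool:
    """Check the conditions stated in the text for a given network.

    connectivity    -- vertex connectivity (n - 1 for a fully connected network)
    unauthenticated -- True for Byzantine failures without signatures;
                       False for crash, omission, or authenticated Byzantine
    """
    if unauthenticated:
        return n > 3 * t and connectivity >= 2 * t + 1
    return connectivity >= t + 1


def max_unauthenticated_faults(n: int) -> int:
    """Largest t satisfying n > 3t."""
    return (n - 1) // 3


print(max_unauthenticated_faults(4), max_unauthenticated_faults(7))  # 1 2
```

For example, four fully connected processors tolerate one unauthenticated Byzantine fault, while three do not.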


Interactive Consistency and Byzantine 
Generals 

The BA (consensus) problem can be stated in 
several similar ways. Two widely used variants 
are the Byzantine Generals (BG) problem and 
the Interactive Consistency (IC) problem. In the 
BG case there is a designated processor, say p_1,
which is the only one to have an input value.
The termination and agreement requirements of
the BG problem are exactly as in BA, while the
validity condition requires that if the input value
of p_1 is v and p_1 is correct, then the correct processors
decide v. The IC problem is an extension
of BG, where every processor is “designated”, so
that each processor has to decide on a vector of
values, where the conditions for the i-th entry are
as in BG, with p_i as the designated processor. For
deterministic synchronous algorithms BA, BG 
and IC problems are essentially equivalent, e.g., 
see the discussion in [15]. 


Firing Squad 

The above algorithms assume that the processors 
share a “global time”, i.e., all the processors start 
in the same (first) round, so that their round 
counters are equal throughout the execution of 
the algorithm. However, there are cases in which
the processors run in a synchronous network, yet
each processor has its own notion of time (e.g.,
when each processor starts on its own, the round
counter values are distinct among the processors).
In these cases, it is desirable to have a protocol
that allows the processors to agree on some
specific round, thus creating a common round
which synchronizes all the correct processors.
This synchronization task, known as the Byzantine
firing squad problem [6], is closely related
to BA.


General Translation Techniques 

One particular direction that was pursued as part 
of the research on the BA problem is the devel- 
opment of methods that automatically translate 
any protocol that tolerates a more benign failure 
type into one which tolerates more severe fail- 
ures [24]. Efficient translations spanning the en- 
tire failure hierarchy, starting from crash failures 
all the way to unauthenticated Byzantine failures, 
can be found in [3] and in Ch. 12 of [1]. 


Applications 


Due to the very tight synchronization assump- 
tions made in the algorithms presented above, 
they are used mainly in real-time, safety-critical 
systems, e.g., aircraft control [13]. In fact, the 
original interest of Pease, Shostak and Lamport 
in this problem was raised by such an appli- 
cation [21]. In addition, BA protocols for the 
Byzantine failure case serve as a basic building 
block in many cryptographic protocols, e.g., se- 
cure multi-party computation [17], by providing 
a broadcast channel on top of pairwise
communication channels.


Problem Definition


Computers contain a hierarchy of memory levels, 
with vastly differing access times. Hence, the 
time for a memory access depends strongly on 
what is the innermost level containing the data 
accessed. In analysis of algorithms, the standard 
RAM (or von Neumann) model cannot capture 
this effect, and external memory models were 
introduced to better model the situation. The most
widely used of these models is the two-level
I/O-model [4], also called the external memory
model or the disk access model. The I/O-model
approximates the memory hierarchy by modeling 
two levels, with the inner level having size M, 
the outer level having infinite size, and transfers 
between the levels taking place in blocks of 
B consecutive elements. The cost of an algorithm 
is the number of such transfers it makes. 

The cache-oblivious model, introduced by 
Frigo et al. [26], elegantly generalizes the I/O- 
model to a multilevel memory model by a simple 
measure: the algorithm is not allowed to know 
the value of B and M. More precisely, a cache- 
oblivious algorithm is an algorithm formulated 
in the RAM model, but analyzed in the I/O- 
model, with an analysis valid for any value of 
B and M. Cache replacement is assumed to 
take place automatically by an optimal off-line 
cache replacement strategy. Since the analysis 
holds for any B and M, it holds for all levels 
simultaneously (for a detailed version of this 
statement, see [26]). 

The subject of the present chapter is that 
of efficient dictionary structures for the cache- 
oblivious model. 


Key Results 


The first cache-oblivious dictionary was given
by Prokop [32], who showed how to lay
out a static binary tree in memory such that
searches take O(log_B n) memory transfers.
This layout, often called the van Emde Boas
layout because it is reminiscent of the classic
van Emde Boas data structure, also ensures that
range searches take O(log_B n + k/B) memory
transfers [8], where k is the size of the output.
Both bounds are optimal for comparison-based
searching.
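
The text does not spell out the layout itself. As it is commonly described, a complete binary tree of height h is cut at roughly half its height; the top subtree is laid out recursively, followed by each bottom subtree, also laid out recursively. The Python sketch below returns the resulting memory order of the nodes, identified by their breadth-first (heap) indices. The function name and the choice of rounding h/2 down for the top part are assumptions; presentations differ on the rounding.

```python
def veb_order(height: int, root: int = 1) -> list[int]:
    """Heap indices of a complete binary tree, in van Emde Boas layout order.

    height -- number of levels of the (sub)tree rooted at `root`
    root   -- heap index of the subtree root (children of v are 2v and 2v + 1)
    """
    if height == 1:
        return [root]
    top_h = height // 2                 # levels in the top subtree
    bottom_h = height - top_h           # levels in each bottom subtree
    order = veb_order(top_h, root)      # top subtree first
    first = root << top_h               # leftmost descendant top_h levels down
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(bottom_h, r))   # then every bottom subtree
    return order


print(veb_order(4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each recursive piece occupies a contiguous stretch of memory, so a root-to-leaf path crosses only a few pieces of any given size. That property is what yields the O(log_B n) search bound without knowing B.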

The first dynamic, cache-oblivious dictionary 
was given by Bender et al. [13]. Making use of a 
variant of the van Emde Boas layout, a density 
maintenance algorithm of the type invented by 
Itai et al. [28], and weight-balanced B-trees [5], 
they arrived at the following results: 


Theorem 1 ([13]) There is a cache-oblivious
dictionary structure supporting searches in
O(log_B n) memory transfers and insertions
and deletions in amortized O(log_B n) memory
transfers.


Theorem 2 ([13]) There is a cache-oblivious
dictionary structure supporting searches in
O(log_B n) memory transfers, insertions and
deletions in amortized O(log_B n + (log² n)/B)
memory transfers, and range searches in
O(log_B n + k/B) memory transfers, where k
is the size of the output.


Later, Bender et al. [10] developed a cache- 
oblivious structure for maintaining linked lists 
which supports insertion and deletion of elements 
in O(1) memory transfers and scanning of k con- 
secutive elements in amortized O(k/B) mem- 
ory transfers. Combining this structure with the 
structure of the first theorem above, the following 
result can be achieved. 


Theorem 3 ([10, 13]) There is a cache-oblivious
dictionary structure supporting searches in
O(log_B n) memory transfers, insertions and
deletions in amortized O(log_B n) memory
transfers, and range searches in amortized
O(log_B n + k/B) memory transfers, where k
is the size of the output.


A long list of extensions of these basic cache- 
oblivious dictionary results has been given. We 
now survey these. 

Bender et al. [12] and Brodal et al. [20] gave
very similar proposals for reproducing the result
of Theorem 2 but with simpler structures (avoiding
the use of weight-balanced B-trees). Based
on exponential trees, Bender et al. [11] gave a
proposal with O(log_B n) worst-case queries and
updates. They also gave a solution with partial
persistence, where searches (in all versions of
the structure) and updates (in the latest version of
the structure) require amortized O(log_B(m + n))
memory transfers, where m is the number of
versions and n is the number of elements in the
version operated on. Bender et al. [14] extended
the cache-oblivious model to a concurrent setting
and gave three proposals for cache-oblivious
B-trees in this setting. Bender et al. [16] presented
cache-oblivious dictionary structures exploring
trade-offs between faster insertion costs
and slower search cost, and Brodal et al. [21]
later gave improved structures meeting lower
bounds. Franceschini and Grossi [25] showed
how to achieve O(log_B n) worst-case queries and
updates while using O(1) space besides the space
for the n elements stored. Brodal and Kejlberg-Rasmussen
[19] extended this to structures adaptive
to the working set bound and allowing predecessor
queries. Cache-oblivious dictionaries for
other data types such as strings [15, 18, 22-24,
27] and geometric data [1, 2, 6, 7, 9] have been
given. The expected number of I/Os for hashing
was studied in the cache-oblivious model
in [31].

It has been shown [17] that the best possible
multiplicative constant in the O(log_B n) search
bound for comparison-based searching is different
in the I/O-model and in the cache-oblivious
model. It has also been shown [1, 3] that for
three-sided range reporting in 2D, the best possible
space bound for structures with worst-case opti- 
mal query times is different in the two models. 
The latter result implies that linear space cache- 
oblivious persistent B-trees with optimal worst- 
case bounds for (1D) range reporting are not 
possible. 


Applications 


Dictionaries solve a fundamental data structuring 
problem which is part of solutions for a very high 
number of computational problems. Dictionaries 
for external memory are useful in settings 
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where memory accesses are dominating the 
running time, and cache-oblivious dictionaries 
in particular stand out by their ability to 
optimize the access to all levels of an unknown 
memory hierarchy. This is an asset, e.g., when 
developing programs to be run on diverse or 
unknown architectures (such as software libraries 
or programs for heterogeneous distributed 
computing like grid computing and projects 
such as SETI@home). Even on a single, known 
architecture, the memory parameters available to 
a computational process may be nonconstant if 
several processes compete for the same memory 
resources. Since cache-oblivious algorithms are 
optimized for all parameter values, they have the 
potential to adapt more gracefully to these changes
and also to varying input sizes forcing different 
memory levels to be in use. 


Open Problems 


It is an open problem to find a data structure 
achieving worst-case versions of all of the bounds 
in Theorem 3. 


Experimental Results 


Cache-oblivious dictionaries have been studied 
empirically in [12, 15, 20, 29, 30, 33]. The overall
conclusion of these investigations is that
cache-oblivious methods can easily outperform RAM
algorithms, although sometimes not as much as 
algorithms tuned to the specific memory hierar- 
chy and problem size at hand. On the other hand, 
cache-oblivious algorithms seem to perform well 
on all levels of the memory hierarchy and to be 
more robust to changing problem sizes. 


Problem Definition


The memory system of contemporary computers
consists of a hierarchy of memory levels, with
each level acting as a cache for the next; a typical
hierarchy consists of registers, level 1 cache,
level 2 cache, level 3 cache, main memory, and
disk (Fig. 1). One characteristic of the hierarchy
is that the memory levels get larger and slower the
further they get from the processor, with the
access time increasing most dramatically between
RAM and disk. Another characteristic
is that data is moved between levels in blocks of
consecutive elements.

Cache-Oblivious Model, Fig. 1 The memory hierarchy

As a consequence of the differences in 
access time between the levels, the cost of a
memory access depends heavily on the lowest
memory level currently holding the accessed
element. Hence, the memory access pattern of an
algorithm has a major influence on its practical 
running time. Unfortunately, the RAM model 
(Fig. 2) traditionally used to design and analyze 
algorithms is not capable of capturing this, as 
it assumes that all memory accesses take equal 
time. 

To better account for the effects of the 
memory hierarchy, a number of computational 
models have been proposed. The simplest and 
most successful is the two-level I/O-model 
introduced by Aggarwal and Vitter [3] (Fig. 3). 
In this model a two-level memory hierarchy is 
assumed, consisting of a fast memory of size 
M and a slower memory of infinite size, with 
data transferred between the levels in blocks
of B consecutive elements. Computation can 
only be performed on data in the fast memory, 
and algorithms are assumed to have complete 
control over transfers of blocks between the 
two levels. Such a block transfer is denoted a 
memory transfer. The complexity measure is the 
number of memory transfers performed. The 
strength of the I/O-model is that it captures 
part of the memory hierarchy while being 
sufficiently simple to make design and analysis 
of algorithms feasible. Over the last two decades, 
a large body of results for the I/O-model 
has been produced, covering most areas of 
algorithmics. For an overview, see the surveys 
[5, 32, 34-36]. 


Cache-Oblivious Model, Fig. 3 The I/O-model

Cache-Oblivious Model, Fig. 4 The cache-oblivious model


More elaborate models of multilevel memory 
have been proposed (see e.g., [34] for an 
overview) but these models have been less 
successful than the I/O-model, mainly because 
of their complexity which makes analysis of 
algorithms harder. All these models, including 
the I/O-model, assume that the characteristics of 
the memory hierarchy (the level and block sizes) 
are known. 

In 1999 the cache-oblivious model (Fig. 4) was 
introduced by Frigo et al. [30]. In short, a cache- 
oblivious algorithm is an algorithm formulated in 
the RAM model but analyzed in the I/O-model, 
with the analysis required to hold for any block 
size B and memory size M. Memory transfers 
are assumed to take place automatically by an 
optimal off-line cache replacement strategy. 

The crux of the cache-oblivious model is that 
because the I/O-model analysis holds for any 
block and memory size, it holds for all levels of
a multilevel memory hierarchy (see [30] for a de- 
tailed version of this statement). Put differently, 
by optimizing an algorithm to one unknown level 
of the memory hierarchy, it is optimized to all 
levels simultaneously. Thus, the cache-oblivious 
model elegantly generalizes the I/O-model to a 
multilevel memory model by one simple mea- 
sure: the algorithm is not allowed to know the 
value of B and M. The challenge, of course, is to 
develop algorithms having good memory transfer 
analyses under these conditions. 

Besides capturing the entire memory hierar- 
chy in a conceptually simple way, the cache- 
oblivious model has other benefits: Algorithms 
developed in the model do not rely on know- 
ing the parameters of the memory hierarchy, 
which is an asset when developing programs 
to be run on diverse or unknown architectures 
(e.g., software libraries or programs for
heterogeneous distributed computing such as grid
computing and projects like SETI@home). Even on a
single, known architecture, the memory parame- 
ters available to a computational process may be 
nonconstant if several processes compete for the 
same memory resources. Since cache-oblivious 
algorithms are optimized for all parameter values, 
they have the potential to adapt more gracefully to
these changes. Also, the same code will adapt 
to varying input sizes forcing different memory 
levels to be in use. Finally, cache-oblivious
algorithms automatically optimize the use
of the translation lookaside buffer of the CPU (a
cache holding recently accessed parts of the page
table used for virtual memory), which may be seen
as a second memory hierarchy parallel to the one
mentioned in the introduction.

Possible weak points of the cache-oblivious 
model are the assumption of optimal off-line 
cache replacement and the lack of modeling of 
the limited associativity of many of the levels 
of the hierarchy. The first point is mitigated by 
the fact that normally, the provided analysis of 
a proposed cache-oblivious algorithm will work 
just as well assuming a least recently used cache 
replacement policy, which is closer to actual 
replacement strategies of computers. The second 
point is also a weak point of most other memory 
models. 


Key Results 


This section surveys a number of the known re- 
sults in the cache-oblivious model. Other surveys 
available include [6, 15, 26, 32]. 

First of all, note that scanning an array of N 
elements takes O(N/B) memory transfers for 
any values of B and M and hence is an optimal 
cache-oblivious algorithm. Thus, standard RAM
algorithms based on scanning may already possess
good analyses in the cache-oblivious model; for
instance, the classic deterministic linear time se- 
lection algorithm has complexity O(N/B) [26]. 
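
The scanning bound can be checked with a small simulation. The following
Python sketch is not part of the original entry: it counts memory transfers
for an access sequence under least recently used replacement (used here in
place of the optimal off-line strategy), and count_transfers as well as the
chosen sizes are illustrative assumptions. Note that the scan itself never
refers to M or B.

```python
from collections import OrderedDict

def count_transfers(accesses, M, B):
    """Count memory transfers for a sequence of element indices.

    M is the cache size and B the block size, both in elements.
    Replacement is LRU (an assumption; the model itself uses optimal
    off-line replacement).
    """
    capacity = max(1, M // B)      # number of blocks the cache can hold
    cache = OrderedDict()          # block id -> None, least recent first
    transfers = 0
    for i in accesses:
        block = i // B
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recent
        else:
            transfers += 1                  # miss: one block is loaded
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return transfers

# The same scan is analyzed for several (M, B) pairs it knows nothing about.
N = 4096
results = {(M, B): count_transfers(range(N), M, B)
           for (M, B) in [(64, 8), (256, 16), (1024, 64)]}
```

For every block size B that divides N, the scan incurs exactly N/B
transfers, one per block, regardless of M.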

For sorting, a fundamental fact in the I/O-model
is that comparison-based sorting of N
elements takes Θ(Sort(N)) memory transfers [3],
where Sort(N) = (N/B) log_{M/B} (N/B). Also in the
cache-oblivious model, sorting can be carried out
in Θ(Sort(N)) memory transfers, if one makes
the so-called tall cache assumption M ≥ B^{1+ε}
[16, 30]. Such an assumption has been shown to
be necessary [18], which proves a separation in 
power between cache-oblivious algorithms and 
algorithms in the I/O-model (where this assump- 
tion is not needed for the sorting bound). 

For searching, B-trees have cost O(log_B N),
which is optimal in the I/O-model for comparison- 
based searching. This cost is also attainable in the 
cache-oblivious model, as shown for the static 
case in [33] and for the dynamic case in [12]. 
Also for searching, a separation between cache- 
oblivious algorithms and algorithms in the I/O- 
model has been shown [13] in the sense that the 
constants attainable in the O(log_B N) bound are
provably different. 

By now, a large number of cache-oblivious 
algorithms and data structures in many areas 
have been given. These include priority queues 
[7, 17]; many dictionaries for standard data, string 
data, and geometric data (see survey in section 
on cache-oblivious B-trees); and algorithms for 
other problems in computational geometry [8, 
16, 22], for graph problems [4, 7, 19, 23, 31], for 
scanning dynamic sets [9], for layout of static 
trees [11], for search problems on multi-sets [28], 
for dynamic programming [14, 24], for adaptive 
sorting [20], for inplace sorting [29], for sorting 
of strings [27], for partial persistence [10], for 
matrix operations [30], and for the fast Fourier 
transform [30]. 

In the negative direction, a few further sep- 
arations in power between cache-oblivious al- 
gorithms and algorithms in the I/O-model are 
known. Permuting in the I/O-model has complexity
O(min{Sort(N), N}), assuming that elements
are indivisible [3]. It has been proven [18] that 
this asymptotic complexity cannot be attained in 
the cache-oblivious model. A separation with re- 
spect to space complexity has been proven [2] for 
three-sided range reporting in 2D where the best 
possible space bound for structures with worst- 
case optimal query times is different in the two 
models. This result also implies that linear space 
cache-oblivious persistent B-trees with optimal 
worst-case bounds for (1D) range reporting are
not possible. 


Applications 


The cache-oblivious model is a means for design 
and analysis of algorithms that use the memory 
hierarchy of computers efficiently. 


Experimental Results 


Cache-oblivious algorithms have been evaluated 
empirically in a number of areas, including 
sorting [21], searching (see survey in section 
on cache-oblivious B-trees), matrix algorithms 
[1, 30, 37], and dynamic programming [24, 25].

The overall conclusion of these investigations 
is that cache-oblivious methods often outperform
RAM algorithms, though not always by as much
as algorithms tuned to the specific memory
hierarchy and problem size. On the other hand,
cache-oblivious algorithms seem to perform well 
on all levels of the memory hierarchy and to 
be more robust to changing problem sizes than 
cache-aware algorithms. 


Problem Definition


Sorting a set of elements is one of the most 
well-studied computational problems. In the 
cache-oblivious setting, the first study of sorting 
was presented in 1999 in the seminal paper 
by Frigo et al. [8] that introduced the cache- 
oblivious framework for developing algorithms 
aimed at machines with (unknown) hierarchical
memory. 


Model 

In the cache-oblivious setting, the computational 
model is a machine with two levels of memory: 
a cache of limited capacity and a secondary 
memory of infinite capacity. The capacity of the 
cache is assumed to be M elements, and data 
is moved between the two levels of memory in 
blocks of B consecutive elements. Computations 
can only be performed on elements stored in 
cache, i.e., elements from secondary memory 
need to be moved to the cache before operations 
can access the elements. Programs are written 
as acting directly on one unbounded memory, 
i.e., programs are like standard RAM programs. 
The necessary block transfers between cache and 
secondary memory are handled automatically by 
the model, assuming an optimal offline cache 
replacement strategy. The core assumption of 
the cache-oblivious model is that M and B are 
unknown to the algorithm, whereas in the related 
I/O model introduced by Aggarwal and Vitter [1], 
the algorithms know M and B, and the algo- 
rithms perform the block transfers explicitly. A 
thorough discussion of the cache-oblivious model 
and its relation to multilevel memory hierarchies 
is given in [8]. 


Sorting 

For the sorting problem, the input is an array of 
N elements residing in secondary memory, and 
the output is required to be an array in secondary 
memory, storing the input elements in sorted 
order. 


Key Results 


In the I/O model, tight upper and lower bounds
were proved for the sorting problem and
the problem of permuting an array [1]. In
particular it was proved that sorting requires
Ω((N/B) log_{M/B} (N/B)) block transfers and permuting
an array requires Ω(min{N, (N/B) log_{M/B} (N/B)})
block transfers. Since lower bounds for the
I/O model also hold for the cache-oblivious
model, the lower bounds from [1] immediately
give a lower bound of Ω((N/B) log_{M/B} (N/B))
block transfers for cache-oblivious sorting and
Ω(min{N, (N/B) log_{M/B} (N/B)}) block transfers for
cache-oblivious permuting. The upper bounds
from [1] cannot be applied to the cache-oblivious
setting since these algorithms make explicit use
of B and M.

Binary mergesort performs O(N log_2 N)
comparisons, but analyzed in the cache-oblivious
model, it performs O((N/B) log_2 (N/M)) block transfers,
which is a factor O(log (M/B)) from the lower
bound (assuming a recursive implementation
of binary mergesort, in order to get M in the
denominator in the log (N/M) part of the bound
on the block transfers). Another comparison-based
sorting algorithm is the classical quicksort
sorting algorithm from 1962 by Hoare [9] that
performs expected O(N log_2 N) comparisons
and expected O((N/B) log_2 (N/M)) block transfers.
Both these algorithms achieve their relatively
good performance for the number of block
transfers from the fact that they are based
on repeated scanning of arrays, a property
not shared with, e.g., heapsort [10], which has
a very poor performance of Θ(N log_2 (N/M))
block transfers. In the I/O model, the optimal
performance of O((N/B) log_{M/B} (N/B)) is achieved
by generalizing binary mergesort to Θ(M/B)-way
mergesort [1]. 
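
To get a feeling for the gap between these bounds, the expressions can be
evaluated for concrete sizes. The Python sketch below is illustrative only:
constants hidden by the asymptotic notation are dropped, the function names
are made up for this example, and the sizes N, M, and B are hypothetical
values, not measurements.

```python
import math

def binary_mergesort_transfers(N, M, B):
    """(N/B) * log2(N/M): bound quoted for recursive binary mergesort."""
    return (N / B) * math.log2(N / M)

def optimal_sort_transfers(N, M, B):
    """(N/B) * log_{M/B}(N/B): the I/O-model sorting bound."""
    return (N / B) * math.log(N / B) / math.log(M / B)

def heapsort_transfers(N, M, B):
    """N * log2(N/M): bound quoted for heapsort."""
    return N * math.log2(N / M)

N, M, B = 2**30, 2**20, 2**10   # hypothetical sizes, in elements
merge = binary_mergesort_transfers(N, M, B)
best = optimal_sort_transfers(N, M, B)
heap = heapsort_transfers(N, M, B)
```

For these sizes the binary mergesort expression is 5 times the optimal one,
which is within a factor log_2 (M/B) = 10, and the heapsort expression is a
further factor B larger.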

Frigo et al. in [8] presented two cache-oblivious
sorting algorithms (which can also
be used to permute an array of elements).
The first algorithm [8, Section 4] is denoted
as funnelsort and is reminiscent of classical
binary mergesort, whereas the second algorithm
[8, Section 5] is a distribution-based sorting
algorithm. Both algorithms perform optimal
O((N/B) log_{M/B} (N/B)) block transfers, provided a
tall cache assumption M = Ω(B^2) is satisfied.


Funnelsort 

Cache-Oblivious Sorting, Fig. 1 The overall recursion of funnelsort (left) and a 16-merger (right)

The basic idea of funnelsort is to rearrange the
sorting process performed by binary mergesort,
such that the processed data is stored “locally.”
This is achieved by two basic ideas: (1) a
top-level recursion that partitions the input into
N^{1/3} sequences of size N^{2/3}, funnelsorts these
sequences recursively, and merges the resulting
sorted subsequences using an N^{1/3}-merger. (2)
A k-merger is recursively defined to perform
binary merging of k input sequences in a clever
schedule with an appropriate recursive layout of
data in memory using buffers to hold suspended
merging processes (see Fig. 1). Subsequently two
simplifications were made, without sacrificing 
the asymptotic number of block transfers 
performed. In [3], it was proved that the binary 
merging could be performed lazily, simplifying 
the scheduling of merging. In [5], it was further 
observed that the recursive layout of k-mergers 
is not necessary. It is sufficient that a k-merger is 
stored in a consecutive array, i.e., the buffers can 
be laid out in an arbitrary order which simplifies 
the construction algorithm for the k-merger. 
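
The top-level recursion in (1) can be sketched in a few lines of Python.
This is only an illustration of the recursion pattern, not funnelsort
itself: the k-merger with its buffers and memory layout is replaced by the
standard library's heapq.merge, so the sketch sorts correctly but carries
none of the block transfer guarantees. The function name and the base-case
threshold are assumptions.

```python
import heapq

def funnelsort_sketch(a, base=8):
    """Sort list a following the top-level recursion of funnelsort."""
    n = len(a)
    if n <= max(base, 1):
        return sorted(a)                      # small inputs: sort directly
    size = max(1, int(n ** (2 / 3)))          # segments of about N^(2/3) elements
    runs = [funnelsort_sketch(a[i:i + size], base)
            for i in range(0, n, size)]       # about N^(1/3) sorted runs
    return list(heapq.merge(*runs))           # stand-in for the N^(1/3)-merger
```

A call such as funnelsort_sketch(data) returns a new sorted list and leaves
its argument unchanged.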


Implicit Cache-Oblivious Sorting 

Franceschini in [7] showed how to perform opti- 
mal cache-oblivious sorting implicitly using only 
O(1) space, i.e., all data is stored in the input 
array except for O(1) additional words of infor- 
mation. In particular the output array is just a 
permutation of the input array. 


The Role of the Tall Cache Assumption 

The role of the tall cache assumption on
cache-oblivious sorting was studied by Brodal and
Fagerberg in [4]. If no tall cache assumption is
made, they proved the following theorem:


Theorem 1 ([4], Corollary 3) Let B_1 = 1
and B_2 = M/2. For any cache-oblivious
comparison-based sorting algorithm, let t_1 and t_2
be upper bounds on the number of I/Os performed
for block sizes B_1 and B_2. If for a real number
d > 0, it is satisfied that t_2 = d · (N/B_2) log_{M/B_2} (N/B_2),
then t_1 > 1/8 · N log_2 (N/M).


The theorem shows that cache-oblivious 
comparison-based sorting without a tall cache 
assumption cannot match the performance of 
algorithms in the I/O model where M and B 
are known to the algorithm. It also has the 
natural interpretation that if a cache-oblivious 
algorithm is required to be I/O optimal for the 
case B = M/2, then binary mergesort is best 
possible: any other algorithm will be the same
factor of O(log M) worse than the optimal block
transfer bound for the case M >> B.

For the related problem of permuting an array,
the following theorem states that for all possible
tall cache assumptions B ≤ M^δ, no cache-oblivious
permuting algorithm exists with a block
transfer bound (even only in the average case
sense) matching the worst case bound in the I/O
model.


Theorem 2 ([4], Theorem 2) For all δ > 0,
there exists no cache-oblivious algorithm for
permuting that for all M ≥ 2B and 1 ≤
B ≤ M^δ achieves O(min{N, (N/B) log_{M/B} (N/B)})
I/Os averaged over all possible permutations of
size N.


Applications 


Many problems can be reduced to cache- 
oblivious sorting. In particular Arge et al. [2] 
developed a cache-oblivious priority queue based 
on a reduction to sorting. They furthermore 
showed how a cache-oblivious priority queue can 
be applied to solve a sequence of graph problems, 
including list ranking, BFS, DFS, and minimum 
spanning trees. 

Brodal and Fagerberg in [3] showed how to 
modify the cache-oblivious lazy funnelsort al- 
gorithm to solve several problems within com- 
putational geometry, including orthogonal line 
segment intersection reporting, all the nearest 
neighbors, 3D maxima problem, and batched 
orthogonal range queries. All these problems can 
be solved by a computation process very similar
to binary mergesort with an additional problem- 
dependent twist. This general framework to solve 
computational geometry problems is denoted as 
distribution sweeping. 


Open Problems 


Since the seminal paper by Frigo et al. [8] intro- 
ducing the cache-oblivious framework, there has 
been a lot of work on developing algorithms with 
a good theoretical performance, but only a limited 
amount of work has been done on implementing 
these algorithms. An important issue for future 
work is to get further experimental results con- 
solidating the cache-oblivious model as a relevant 
model for dealing efficiently with hierarchical 
memory. 


Experimental Results 


A detailed experimental study of the cache- 
oblivious sorting algorithm funnelsort was 
performed in [5]. The main result of [5] is
that a carefully implemented cache-oblivious 
sorting algorithm can be faster than a tuned 
implementation of quicksort already for input 
sizes well within the limits of RAM. The 
implementation is also at least as fast as the 
recent cache-aware implementations included 
in the test. On disk, the difference is even 
more pronounced regarding quicksort and the 
cache-aware algorithms, whereas the algorithm is 
slower than a careful implementation of multiway 
mergesort optimized for external memory such 
as in TPIE [6]. 


URL to Code 


http://kristoffer.vinther.name/projects/funnelsort/


Background


In scientific computing and related fields, math- 
ematical functions are often approximated on 
meshes where each mesh cell contains a local 
approximation (e.g., using polynomials) of the 
represented quantity (density functions, physical 
quantities such as temperature or pressure, etc.). 
The grid cells may be adaptively refined within areas
of high interest or where the applied numerical
algorithms demand improved resolution.
The resolution may even change dynamically
throughout the computation.

In this context, we consider tree-structured 
adaptive meshes, i.e., meshes that result from a 
recursive subdivision of grid cells. They can be 
represented via trees — quadtrees or octrees being 
the most prominent examples. In typical problem 
settings, quantities are stored on entities (vertices, 
edges, faces, cells) of the grid. The computation 
of these variables is usually characterized by 
local interaction rules and involves variables of 
adjacent grid cells only. Hence, efficient algo- 
rithms are required for the (parallel) traversal 
of such tree-structured grids and their associated 
variables. 


Problem Definition 


Consider a hierarchical mesh of grid cells (trian- 
gles, squares, tetrahedra, cubes, etc.), in which 
all grid cells result from recursively splitting 
an existing grid cell into a fixed number k of 
geometrically similar child cells. The resulting 
grid is equivalent to a tree with uniform degree k. 
We refer to it as a spacetree. Special cases are 
quadtrees (based upon squares, i.e., dimension 
d = 2, and k = 4) and octrees (cubes, d = 3, 
and k = 8). Depending on the problem, only the 
mesh defined by the leaves of the tree may be of 
interest, or a multilevel grid may be considered. 
The latter also includes all cells corresponding 
to interior nodes of the tree. Also, meshes re- 
sulting from a collection of spacetrees may be 
considered. Such a generalized data structure is 
called a forest of spacetrees. A mathematical 
function shall be defined on such a mesh via 
coefficients that are associated with entities (ver- 
tices, edges, faces, cells) of the grid cells. Each 
coefficient contributes to the representation of the 
mathematical function in the grid cells adjacent 
to its entity. For typical computations, we then 
require efficient algorithms for mesh traversals 
processing all unknowns on entities. 


Mesh traversal (definition): Run through all
leaf, i.e., unrefined, grid cells and process all
function coefficients associated with each cell or
with entities adjacent to it.

Multiscale traversal (definition): Perform one 
mesh traversal including all (or a certain 
subset of the) grid cells (tree-interior and leaf 
cells), thus processing the coarse-grid cells of 
the grid hierarchy as well. 
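
The two definitions can be made concrete with a minimal pointer-based
spacetree. The Python sketch below is an illustration under assumptions:
the class and function names are invented for this example, cells carry no
geometry or coefficients, and children are visited in list order rather
than in any particular space-filling-curve order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class Cell:
    level: int
    children: List["Cell"] = field(default_factory=list)

    def refine(self, k: int = 4) -> None:
        """Split this cell into k child cells (k = 4 gives a quadtree)."""
        self.children = [Cell(self.level + 1) for _ in range(k)]

    def is_leaf(self) -> bool:
        return not self.children

def mesh_traversal(cell: Cell) -> Iterator[Cell]:
    """Visit only the leaf (unrefined) cells, depth first."""
    if cell.is_leaf():
        yield cell
    else:
        for child in cell.children:
            yield from mesh_traversal(child)

def multiscale_traversal(cell: Cell) -> Iterator[Cell]:
    """Visit interior cells as well as leaf cells, depth first."""
    yield cell
    for child in cell.children:
        yield from multiscale_traversal(child)

# A quadtree whose root is refined once and whose third child is refined again.
root = Cell(0)
root.refine()
root.children[2].refine()
leaf_count = len(list(mesh_traversal(root)))
all_count = len(list(multiscale_traversal(root)))
```

For this small tree the mesh traversal visits 7 leaf cells, while the
multiscale traversal visits all 9 cells of the hierarchy.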


Mesh traversals may be used to define a se- 
quential order (linearization) on the mesh cells 
or coefficients. Sequential orders that preserve 
locality may be used to define partitions for 
parallel processing and load balancing. Of special 
interest are algorithms that minimize the memory 
accesses during traversals as well as the memory 
required to store the tree-structured grids. 


Key Results 


Space-Filling Curve Orders on 
Tree-Structured Grids 

Space-filling curves (SFCs) [1, 5] are continuous
surjective mappings from a one-dimensional interval
to a higher-dimensional target domain (typically
squares, cubes, etc.). They are constructed
via an “infinite” recursion process analogous to
the generation of tree-structured meshes. Space-filling
curves thus induce a sequential order on a
corresponding tree-structured grid (an example is
given in Fig. 1).

Cache-Oblivious Spacetree Traversals, Fig. 1
Recursively structured triangular mesh and corresponding
binary tree with bitstream encoding. The illustrated
iteration of a Sierpinski SFC defines a sequential order
on the leaf cells (equivalent to a depth-first traversal of the
tree) and classifies vertices into groups left and right
of the curve. Vertices A, B, C, and D are visited in
last-in-first-out order during the traversal

The construction of the curve may be described
via a grammar, in which the nonterminals
reflect the local orientation of the curve within
a grid cell (e.g., Fig. 2). Terminals are used to
indicate transfers between grid cells or levels.

Cache-Oblivious Spacetree Traversals, Fig. 2
… H_{o/n}, K_{o/n}, V_{o/n}; illustration for V_{o/n} is skipped) to
construct the Sierpinski curve. The nonterminals classify
for each vertex and edge of a cell, whether it is located
left or right of the curve. The old/new labels
indicate whether the grid cell adjacent to this edge occurs
earlier/later in the Sierpinski order

Together with this grammar, a bitstream
encoding of the refinement information (as in
Fig. 1) provides a minimal-memory data structure
to encode a given tree-structured grid. Using
Hilbert or Lebesgue (Morton order) SFCs, for
example, respective algorithms can be formulated
for quadtrees and octrees. Peano curves lead to
traversals for (hyper)cube-based spacetrees with
3-refinement along each dimension.


Space-Filling-Curve Traversals

Depth-first traversals of the SFC/bitstream-encoded
tree visit all leaf cells of the tree-structured
grid in space-filling order (SFC
traversal) sequentially. Cell-local data can be
held in a stream. All other entities (in 2D: vertices 
and edges) are to be processed by all adjacent 
grid cells, i.e., are processed multiple times. For 
them, a storage scheme for intermediate values 
or repeated access is required. Figure 1 illustrates
that the SFC induces a left/right classification 
of these entities. During SFC traversals, these 
entities are accessed in a LIFO fashion, such that 
intermediate values can be stored on two stacks 
(left vs. right). Local access rules may be inferred
from an augmented grammar (as in Fig. 2). 
While the left/right classification determines 
the involved stack, the old/new classification 
determines whether entities are accessed for the 
first time during traversal (first touch) or have 
been processed by all adjacent cells (last touch).
First and last touch trigger loading and storing 
the respective variables from/onto data streams. 
These stack properties hold for several space- 
filling curves. 
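
How a depth-first traversal can run directly on the refinement bitstream,
without building a pointer-based tree, is sketched below in Python. The bit
convention (1 for a refined cell, 0 for a leaf, in depth-first order), the
function name, and the uniform number k of children are assumptions made
for this example; the left/right stacks for vertex and edge data are not
modeled.

```python
def leaf_levels(bits, k=2):
    """Depth-first traversal driven only by the refinement bitstream.

    bits: iterable of 0/1 in depth-first order; 1 means the cell is
    refined into k children, 0 means the cell is a leaf (assumed
    convention). Returns the refinement level of each leaf in
    traversal order.
    """
    stream = iter(bits)
    levels = []
    pending = [0]                    # levels of cells still to be visited
    while pending:
        level = pending.pop()
        if next(stream) == 1:
            pending.extend([level + 1] * k)   # children are visited next
        else:
            levels.append(level)              # leaf cell reached
    return levels

# Root refined, first child refined again, second child left as a leaf.
example = leaf_levels([1, 1, 0, 0, 0], k=2)
```

The only state besides the sequentially read stream is a stack of pending
levels, which matches the stack and stream access pattern described above.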


SFC Traversals in the Cache-Oblivious 

Model 

Memory access in SFC traversals is restricted 
to stack and stream accesses. Random access to 
memory is entirely avoided. Thus, the number 
of cache and memory accesses can be accurately 
described by the I/O or cache-oblivious model 
(see Cross-References). For the 2D case, assume 
that for a subtree with K grid cells, the number of 
edges on the boundary of the respective subgrid is 
O(√K), which is always satisfied by regularly
refined meshes. It can be shown that if we choose 
K such that all boundary elements fit into cache 
(size: M words), only boundary edges will cause 
non-compulsory cache misses [2]. For an entire 
SFC traversal, the number of cache misses is 
O (43z)- which is asymptotically optimal (B is 
the number of words per cache line). For adap- 
tively refined meshes, it is an open question what 
kind of restrictions are posed on the mesh by the 
O(√K) criterion. While it is easy to construct
degenerate grids that violate the condition, it is 
interesting whether grids that result from useful 
refinement processes (with physically motivated 
refinement criteria) tend to satisfy the O(√K)
requirement. 
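
For a regularly refined mesh the O(√K) boundary condition is easy to verify
by counting. The Python sketch below assumes square cells and a uniformly
refined quadtree subtree (both assumptions of this example, as are the
function names and the choice of one word per boundary edge): a subtree of
the given depth covers a square of 2^depth by 2^depth cells.

```python
import math

def regular_subgrid(depth):
    """Cells and boundary edges of a uniformly refined quadtree subtree."""
    side = 2 ** depth            # cells along one side of the square subgrid
    cells = side * side          # K = 4^depth leaf cells
    boundary_edges = 4 * side    # cell edges lying on the subgrid boundary
    return cells, boundary_edges

def largest_cached_depth(M):
    """Largest depth whose boundary edges fit into M words (assumes M >= 4)."""
    depth = 0
    while 4 * 2 ** (depth + 1) <= M:
        depth += 1
    return depth

cells, boundary = regular_subgrid(largest_cached_depth(1024))
```

In this setting the number of boundary edges is exactly 4√K, so the subgrid
size K that still fits its boundary into cache grows quadratically with M.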


Multiscale Depth-First and Breadth-First
Traversals 

Multiscale traversals find applications in mul- 
tiphysics simulations, where different physical 
models are used on the different levels, as well as 
in (additive) multigrid methods or data analysis, 
where the different grid levels hold data in different
resolutions. If algorithms compute results on
more than one level, it is typically sufficient to
have two levels available at any one time
throughout the traversal. Multiscale algorithms then can be
constructed recursively. 

As variables exist on multiple levels, their 
(intermediate) access scheme is more elaborate 
than for a pure SFC traversal. A stack-based 
management is trivial if we apply one set of 
stacks per resolution level. Statements on cache 
obliviousness then have to be weakened, as the
maximum number of stacks is no longer resolution
independent: it depends on the number
of refinement levels. For depth-first traversal and the
Peano curve, 2d + 2 stacks have been proven to be
sufficient (d being the spatial dimension). Such a multiscale
scheme remains cache oblivious independent of
the refinement. It is unknown, though doubtful,
whether schemes for other curves and depth-first
traversal exist that allow accesses to unknowns 
using a resolution-independent number of stacks 
for arbitrary d. 


Toward Parallel Tree Traversals 
Data decomposition is the predominant paral- 
lelization paradigm in scientific computing: op- 
erations are executed on different sets of data 
in parallel. For distinct sets, the parallelization 
does not require any synchronization mechanism. 
For spacetree data structures, distinct pieces of 
data are given by spacetree cells that do not 
share a vertex. A parallel data traversal then can 
be rewritten as a mesh traversal where (in the 
parallel traversal phases) succeeding cells along 
the traversal do not share grid entities — a con- 
tradiction to connected space-filling curves. For 
three-partitioning in 2D, such a reordering allows 
a maximum concurrency level of four (see Fig. 3). 

Cache-Oblivious Spacetree Traversals, Fig. 3 Peano space-filling curve with numbering on a level n + 1 (left). The
ordering then is rearranged to have a concurrency level of four (right) illustrated via different shades of gray

For breadth-first traversals, a reordering is
trivial if we drop the space-filling curve ordering
and instead reorder all leaves of one level to obtain
the highest concurrency level. For depth-first
traversals, in contrast, the maximum concurrency
level is strictly bounded, even if we drop the SFC
paradigm. It remains one for all bipartitioning
schemes, is at most 2^d for three-partitioning, and
is at most ⌈k/2⌉^d for k-partitioning.
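
One way to see where a count of ⌈k/2⌉^d comes from is to look at a single
refined cell with k children along each of its d dimensions. The Python
sketch below is an illustration under assumptions (axis-aligned hypercube
cells indexed by integer coordinates; all names are invented for this
example): it selects the children with only even indices and checks that no
two of them share a vertex.

```python
from itertools import combinations, product

def independent_cells(k, d):
    """Children of a k^d refinement patch whose indices are all even."""
    return [c for c in product(range(k), repeat=d)
            if all(i % 2 == 0 for i in c)]

def share_vertex(a, b):
    """Unit cells touch (share at least a vertex) iff every index differs by at most 1."""
    return all(abs(x - y) <= 1 for x, y in zip(a, b))

def max_concurrency(k, d):
    """The bound ceil(k/2)^d for k-partitioning in d dimensions."""
    return ((k + 1) // 2) ** d

cells_2d = independent_cells(3, 2)   # three-partitioning in 2D
```

For three-partitioning in 2D this yields the four corner children of the
3 x 3 patch, which agrees with the concurrency level of four stated for
Fig. 3; for bipartitioning only a single child remains.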


Recursion Unrolling and Parallelism 

As concurrency on the cell level is important for 
many applications, recursion unrolling becomes 
an important technique: (Regular) Subtrees or, 
more generally, fragments along the space-filling
curve’s depth-first ordering are identified and 
locally replaced by a breadth-first traversal. This 
can be done without modifying any data ac- 
cess order on the surfaces of the cut-out curve 
fragment if the data load and store sequence 
is preserved along the fragment throughout the 
unrolling while only computations are reordered. 
Recursion unrolling then has an impact on the ex- 
ecution overhead as it eliminates recursive func- 
tion calls and it improves the concurrency of 
the computations. It can be controlled by an on- 
the-fly analysis of the tree structure and thus 
seamlessly integrates into changing grids. 


Other Tree-Structured Meshes and 
Space-Filling Curves 

Traversals and stacks can team up only for certain 
space-filling curves and dimensions: 


• In 2D, the stack property is apparently satisfied
by all connected space-filling curves (for
connected SFCs, two contiguous subdomains
share an edge). SFC traversals are induced by
a grammar that allows a left/right classification.
However, no formal proof for this claim
has been given yet.

• In 3D and all higher dimensions, the Peano
curve is the only connected curve that has been
found to satisfy all required properties [6].

• For octree meshes (in 3D), it is an open problem
whether SFC traversals exist that can
exploit stack properties. Hilbert curves and
Lebesgue curves yield data access patterns
with spatial and temporal locality but do not
provide a stack property.


Applications 


Practical applications comprise (parallel) numer- 
ical simulations on spacetree meshes that require 
adaptive refinement and coarsening in each time 
step or after each iteration [3,4,6]. SFC traversals 
on spacetrees induce sequential orders that may 
be exploited to create balanced partitions with 
favorable quality (due to SFCs being Hölder continuous).
Using spacetrees as helper data structures,
respective SFC orders can be defined also
for entirely unstructured meshes or particle sets.
Space-filling curves are thus a frequently used
tool to define parallel partitions.


Canonical Orders and Schnyder Realizers


Problem Definition


Every planar graph has a crossings-free drawing 
in the plane. Formally, a straight-line drawing 
of a planar graph G is one where each vertex is 
placed at a point in the plane and each edge is rep- 
resented by a straight-line segment between the 
two corresponding points such that no two edges 
cross each other, except possibly at their common 
end points. A straight-line grid drawing of G is 
a straight-line drawing of G where each vertex 
of G is placed on an integer grid point. The area 
for such a drawing is defined by the minimum- 
area axis-aligned rectangle, or bounding box, that 
contains the drawing. 

Wagner in 1936 [12], Fáry in 1948 [5], and
Stein in 1951 [10] proved independently that 
every planar graph has a straight-line drawing. It 
was not until 1990 that the first algorithms for 
drawing a planar graph on a grid of polynomial 
area were developed. The concepts of canoni- 
cal orders [4] and Schnyder realizers [9] were 
independently introduced for the purpose of effi- 
ciently computing straight-line grid drawings on 
the O(n) × O(n) grid. These two seemingly very
different combinatorial structures turn out to be 
closely related and have since been used in many 
different problems and in many applications. 


Key Results 


We first describe canonical orders for planar 
graphs and a linear time procedure to construct 
them. Then we describe Schnyder realizers and a 
linear time procedure to compute them. Finally, 
we show how they can be used to compute 
straight-line grid drawings for planar graphs.


Canonical Order 
A planar graph G along with a planar embedding
(a cyclic order of the neighbors for each vertex) is
called a plane graph. Given a graph G, testing for
planarity and computing a planar embedding can
be done in linear time [6]. Let G be a maximal
plane graph with outer vertices u, v, w in counterclockwise
order. Then a canonical order or
shelling order of G is a total order of the vertices
v_1 = u, v_2 = v, v_3, ..., v_n = w that meets the
following criteria for every 4 ≤ i ≤ n:


(a) The subgraph G_{i-1} ⊆ G induced by
v_1, v_2, ..., v_{i-1} is 2-connected, and the
boundary of its outerface is a cycle C_{i-1}
containing the edge (v_1, v_2).

(b) The vertex v_i is in the outerface of G_{i-1},
and its neighbors in G_{i-1} form a subinterval
of the path C_{i-1} − (u, v) with at least two
vertices; see Fig. 1a–1b.


Every maximal plane graph G admits a canonical
order; see Fig. 1a–1b. Moreover, computing
such an order can be done in O(n) time where
n is the number of vertices in G. Before proving
these claims, we need a simple lemma.


Lemma 1 Let G be a maximal plane graph with
canonical order v_1 = u, v_2 = v, v_3, ..., v_n = w. Then for
i ∈ {3, ..., n − 1}, any separating pair {x, y} of
G_i is a chord of C_i.


The proof of the lemma is simple. Recall
that a chord of a cycle C is an edge between
nonadjacent vertices of C. Since G is a maximal
plane graph and since each vertex v_j, j ∈
{n, ..., i + 1}, is in the outerface of G_i, all
the internal faces of G_i are triangles. Adding
a dummy vertex d along with edges from d
to each vertex on the outerface of G_i yields a
maximal plane graph G’. Then G’ is 3-connected,
and for each separating pair {x, y} of G_i, the set
T = {x, y, d} is a separating set of G’ since
G_i = G’ \ d. The set T is a separating triangle in
G’ [1], and therefore, the edge (x, y) is a chord
of C_i.


Theorem 1 A canonical order of a maximal 
plane graph G can be computed in linear time. 


This is also easy to prove. If the number of
vertices n in G is 3, then the canonical ordering
of G is trivially defined. Let n > 3 and choose
the vertices v_n = w, v_{n-1}, ..., v_3 in this order
so that conditions (a)–(b) of the definition are
satisfied. Since G is a maximal plane graph,
G is 3-connected, and hence, G_{n-1} = G \ w
is 2-connected. Furthermore, the set of vertices
adjacent to v_n = w forms a cycle C_{n-1}, which
is the boundary of the outerface of G_{n-1}. Thus,
conditions (a)–(b) hold for i = n.

Assume by induction hypothesis that the vertices
v_n, v_{n-1}, ..., v_{i+1}, i ≥ 3, have been appropriately
chosen. We now find the next vertex v_i.
If we can find a vertex x on C_i, which is not
an end vertex of a chord, then we can choose
v_i = x. Indeed, if deleting x from G_i violated
the 2-connectivity, then the cut vertex y of G_i − x,
together with x, would form a separating pair for
G_i, and hence, (x, y) would be a chord in G_i
(from the lemma above). We now show that we
can find a vertex v_i on C_i which is not an end
vertex of a chord.


Canonical Orders and Schnyder Realizers, Fig. 1 (a) A canonical order of vertices for a maximal plane graph G,
(b) insertion of v_8 in G_7, (c) a Schnyder realizer for G

If there is no chord of C_i, then we can choose
any vertex of C_i other than u and v as v_i.
Otherwise, label the vertices of C_i − {(u, v)} by
p_1 = u, p_2, ..., p_t = v consecutively from u to
v. By definition, any chord (p_k, p_l), k < l, must
have k < l − 1. We say that a chord (p_k, p_l), k <
l, includes another chord (p_k', p_l'), k' < l', if
k ≤ k' < l' ≤ l. Then take an inclusion-minimal
chord (p_k, p_l); any vertex p_j with k < j < l
can be chosen as v_i. Since v_i is not an end vertex
of a chord of C_i, G_{i-1} = G_i \ v_i remains
2-connected. Furthermore, due to the maximal
planarity of G, the neighborhood of v_i on C_{i-1}
forms a subinterval of C_{i-1} − (u, v).

The algorithm, implicit in the above argument,
can be implemented to run in linear time by
keeping a variable for each vertex x on C_i,
counting the number of chords x is incident to.
After each vertex v_i is chosen, the variables for
all its neighbors can be updated in O(deg(v_i))
time. Summing over all vertices in the graph leads
to an overall linear running time [2], and this
concludes the proof of the theorem.


Schnyder Realizer 

Let G be a maximal plane graph. A Schnyder
realizer S of G is a partition of the internal edges
of G into three sets T_1, T_2, and T_3 of directed
edges, so that for each interior vertex v, the
following conditions hold:


(a) v has outdegree exactly one in each of T_1, T_2,
and T_3.

(b) The clockwise order of edges incident to v
is outgoing T_1, incoming T_2, outgoing T_3,
incoming T_1, outgoing T_2, and incoming T_3;
see Fig. 1c.


Since a maximal plane graph has exactly n −
3 internal vertices and exactly 3n − 9 internal
edges, the three outgoing edges for each internal
vertex imply that all the edges incident to the
outer vertices are incoming. In fact, these two
conditions imply that for each outer vertex r_i, i =
1, 2, 3, the incident edges belong to the same set,
say T_i, where r_1, r_2, r_3 are in counterclockwise
order around the outerface and each set of edges
T_i forms a directed tree, spanning all the internal
vertices and one external vertex r_i, oriented towards
r_i [3]; see Fig. 1c. Call r_i the root of T_i for
i = 1, 2, 3.

Note that the existence of a decomposition 
of a maximal planar graph G into three trees 
was proved earlier by Nash-Williams [8] and by 
Tutte [11]. Kampen [7] showed that these three
trees can be oriented towards any three specified
root vertices r_1, r_2, r_3 of G so that each vertex
other than these roots has exactly one outgoing
edge in each tree. Schnyder [9] proved the existence
of the special decomposition defined above,
along with a linear time algorithm to compute 
it. Before we describe the algorithm, we need to 
define the operation of edge contraction. Let G 
be a graph and e = (x, y) be an edge of G. Then 
we denote by G/e, the simple graph obtained by 
deleting x, y and all their incident edges from G, 
adding a new vertex z and inserting an edge (z, v) 
for each vertex v that is adjacent to either x or 
y in G. Note that for a maximal plane graph G, 
contracting an edge e = (x, y) yields a maximal 
plane graph if and only if there are exactly two 
common neighbors of x and y. Two end vertices 
of an edge e = (x, y) have exactly two common 
neighbors if and only if the edge e is not on the 
boundary of a separating triangle. 
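
The contraction operation and the two-common-neighbors test can be sketched as follows. This is only an illustration on adjacency sets: the function names `common_neighbors`, `contract`, and `edge_count`, the vertex labels, and the five-vertex example graph `g5` are assumptions made for this sketch and do not come from the entry.

```python
def common_neighbors(adj, x, y):
    """Vertices adjacent to both x and y."""
    return adj[x] & adj[y]


def contract(adj, x, y, z):
    """Return the simple graph G/e for e = (x, y); z names the merged vertex."""
    if y not in adj[x]:
        raise ValueError("(x, y) is not an edge")
    merged = (adj[x] | adj[y]) - {x, y}
    # Drop x and y with their incident edges, then attach z to every former neighbor.
    out = {v: set(nb) - {x, y} for v, nb in adj.items() if v not in (x, y)}
    out[z] = set(merged)
    for v in merged:
        out[v].add(z)
    return out


def edge_count(adj):
    return sum(len(nb) for nb in adj.values()) // 2


# K4: every edge has exactly two common neighbors, so G/e is again maximal (a triangle).
k4 = {v: {w for w in "abcd" if w != v} for v in "abcd"}
triangle = contract(k4, "a", "b", "ab")

# Hypothetical example: triangle a, b, c with d inside it and e inside face (a, b, d).
# Edge (a, b) lies on the separating triangle (a, b, d) and has three common neighbors.
g5 = {"a": set("bcde"), "b": set("acde"), "c": set("abd"),
      "d": set("abce"), "e": set("abd")}
```

Contracting (a, b) in `g5` leaves 5 edges on 4 vertices, one short of the 3n − 6 = 6 edges of a maximal planar graph, which matches the condition stated above.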


Lemma 2 Let G be a maximal plane graph with 
at least 4 vertices, where u is an outer vertex of G. 
Then there exists an internal vertex v of G such 
that (u,v) is an edge in G and vertices u and v 
have exactly two common neighbors. 


This is easy to prove. If G has exactly 4 vertices,
then it is K4, and u and the internal vertex of G have
exactly two common neighbors. Consider graph
G with more than 4 vertices. If u is not on the
boundary of any separating triangle, then taking
any internal neighbor of u as v is sufficient. Else, if u is
on the boundary of a separating triangle Δ, we
can find a desired vertex v by induction on the
subgraph of G inside Δ.


Theorem 2 A Schnyder realizer of a maximal 
plane graph G can be computed in linear time. 


The proof of the theorem is by induction. If
G has exactly 3 vertices, its Schnyder realizer
is computed trivially. Consider graph G with
more than 3 vertices. Let r_1, r_2, and r_3 be the
three outer vertices in counterclockwise order.
Then by the above lemma, there is an internal
vertex x in G and edge e = (r_3, x) so that
r_3 and x have exactly two common neighbors.
Let G’ = G/e. Then by the induction hypothesis,
G’ has a Schnyder realizer with the
three trees T_1, T_2, and T_3, rooted at r_1, r_2, and
r_3. We now modify this to find a Schnyder realizer
for G. The orientation and partitioning
of all the edges not incident to x remain unchanged
from G/e. Among the edges incident
to x, we add e to T_3, oriented towards r_3. We
add the two edges that are just counterclockwise
of e and just clockwise of e in the ordering
around x, to T_1 and T_2, respectively, both
oriented away from x. Finally we put all the
remaining edges in T_3, oriented towards x; see
Fig. 2. It is now straightforward to check that
this assignment of edges to the trees satisfies
the two conditions. The algorithm implicit in the
proof can be implemented in linear time, given
the edge contraction sequence. The edge contraction
sequence itself can be computed in linear
time by taking the reverse order of a canonical
order of the vertices and in every step contracting
the edge between r_3 and the current vertex in
this order.

Canonical Orders and Schnyder Realizers, Fig. 2 Computing a Schnyder realizer of a maximal plane graph G from
that of G/e


Drawing Planar Graphs 

We now show how canonical orders and Schny- 
der realizers can be used to compute straight-line 
grid drawings of maximal plane graphs. 


Theorem 3 Let G be a maximal plane graph
with n vertices. A straight-line grid drawing of
G on the (2n − 4) × (n − 2) grid can be computed
in linear time.


This is a constructive proof. Let O = v_1, ..., v_n
be a canonical order of G, G_i the subgraph of
G induced by the vertices v_1, ..., v_i, and C_i the
boundary of the outerface of G_i, i = 3, 4, ..., n.
We incrementally obtain a straight-line drawing Γ_i
of G_i for i = 3, 4, ..., n. We also maintain the
following invariants for Γ_i:


(i) The x-coordinates of the vertices on the path
C_i \ {(v_1, v_2)} are monotonically increasing
as we go from v_1 to v_2.

(ii) Each edge of the path C_i − {(v_1, v_2)} is drawn
with slope 1 or −1.


We begin with G_3, giving v_1, v_2, and v_3 coordinates
(0, 0), (2, 0), and (1, 1); the drawing Γ_3
satisfies conditions (i)–(ii); see Fig. 3a. Suppose
the drawing Γ_{i-1} for some i > 3 has already
been computed; we now show how to obtain Γ_i.
We need to add vertex v_i and its incident edges
in G_i to Γ_{i-1}. Let w_1 = v_1, ..., w_t = v_2 be the
vertices on C_{i-1} \ {(v_1, v_2)} in this order from
v_1 to v_2. By the property of canonical orders,
v_i is adjacent to a subinterval of this path. Let
w_l, ..., w_r, 1 ≤ l < r ≤ t, be the vertices
adjacent to v_i, in this order. We want to place v_i at
the intersection point p between the straight line
from w_l with slope 1 and the straight line from
w_r with slope −1. Note that by condition (ii),
the two vertices w_l and w_r are at even Manhattan
distance in Γ_{i-1}, and hence, point p is a grid point.
However, if we place v_i at p, then the edges v_i w_l
and v_i w_r might overlap with the edges w_l w_{l+1}
and w_{r-1} w_r, since they are drawn with slopes 1
or −1. We thus shift all the vertices to the left of
w_l in Γ_{i-1} (including w_l) one unit to the left and all
the vertices to the right of w_r in Γ_{i-1} (including
w_r) one unit to the right; see Fig. 3.

Canonical Orders and Schnyder Realizers, Fig. 3 Illustration for the straight-line drawing algorithm using canonical
order

Consider a rooted tree T, spanning all the internal
vertices of G along with one external vertex
v_n, where v_n is the root of T and for each internal
vertex x, the parent of x is the highest numbered
successor in G. (Later we see that T can be one
of the three trees in a Schnyder realizer of G.)


For any internal vertex x of G, denote by U(x)
the set of vertices that are in the subtree of T
rooted at x (including x itself). Then the shifting
of the vertices above can be obtained by shifting
the vertices in U(w_j), j = 1, ..., l, one unit to
the left and the vertices in U(w_j), j = r, ..., t,
one unit to the right. After these shifts, v_i can be
safely placed at the intersection of the line with
slope 1 from w_l and the line with slope −1 from
w_r. Note that this drawing satisfies conditions (i)–
(ii). This algorithm can be implemented in linear
time, even though the efficient vertex shifting
requires careful relative offset computation [2].
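
The incremental construction can be sketched in a few lines of Python. This is a naive, quadratic-time illustration rather than the linear-time method with relative offsets, and it uses a common variant of the shift: instead of moving the left part one unit left and the right part one unit right, it moves everything strictly right of w_l by one unit and everything from w_r on by one more, which gives the same drawing up to translation. The function name `shift_drawing`, the set `under[v]` (playing the role of U(v)), and the input format are assumptions of this sketch.

```python
def shift_drawing(order, adj):
    """Naive shift-method drawing of a maximal plane graph.

    order: canonical order [v1, v2, ..., vn]; adj: dict vertex -> set of neighbors.
    Returns dict vertex -> (x, y) with 0 <= x <= 2n - 4 and 0 <= y <= n - 2.
    """
    v1, v2, v3 = order[:3]
    pos = {v1: [0, 0], v2: [2, 0], v3: [1, 1]}
    under = {v: {v} for v in order}   # vertices that must move together with v
    contour = [v1, v3, v2]            # outer path from v1 to v2
    for v in order[3:]:
        idx = [i for i, w in enumerate(contour) if w in adj[v]]
        if len(idx) < 2 or idx != list(range(idx[0], idx[-1] + 1)):
            raise ValueError("neighbors do not form a contour subinterval")
        l, r = idx[0], idx[-1]
        # Make room: +1 for contour vertices strictly between w_l and w_r, +2 from w_r on.
        for i in range(l + 1, len(contour)):
            for u in under[contour[i]]:
                pos[u][0] += 1 if i < r else 2
        (xl, yl), (xr, yr) = pos[contour[l]], pos[contour[r]]
        # Intersection of the slope +1 line through w_l and the slope -1 line through w_r.
        pos[v] = [(xl + xr + yr - yl) // 2, (xr - xl + yr + yl) // 2]
        for w in contour[l + 1:r]:
            under[v] |= under[w]      # covered vertices now move with v
        contour[l + 1:r] = [v]
    return {v: tuple(p) for v, p in pos.items()}


# K4 with canonical order 1, 2, 3, 4.
k4 = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
k4_pos = shift_drawing([1, 2, 3, 4], k4)

# A five-vertex example: v4 sees v1 and v3, v5 sees the whole contour.
g5 = {1: {2, 3, 4, 5}, 2: {1, 3, 5}, 3: {1, 2, 4, 5}, 4: {1, 3, 5}, 5: {1, 2, 3, 4}}
g5_pos = shift_drawing([1, 2, 3, 4, 5], g5)
```

For the five-vertex example the drawing has width 6 = 2n − 4 and height 3 = n − 2, in line with Theorem 3.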
The next theorem shows how Schnyder realizers
can be used to compute a straight-line grid
drawing of a plane graph. Let T_1, T_2, and T_3 be
the trees in a Schnyder realizer of G, rooted at
outer vertices r_1, r_2, and r_3. Since each internal
vertex v of G has exactly one outgoing edge
in each of the trees, there is a directed path
P_i(v) in each of the three trees T_i from v to
r_i. These three paths P_1(v), P_2(v), and P_3(v)
are vertex disjoint except for v, and they define
three regions R_1(v), R_2(v), and R_3(v) for v.
Here R_i(v) is the region between the two paths
P_{i-1}(v) and P_{i+1}(v), where the addition and
subtraction are modulo 3. Let η_i(v) denote the
number of vertices in R_i(v) \ P_{i-1}(v), where i =
1, 2, 3 and the subtraction is modulo 3. Extend
these definitions to the outer vertices as follows:
η_i(r_i) = n − 2, η_{i+1}(r_i) = 1, η_{i-1}(r_i) = 0.

Canonical Orders and Schnyder Realizers, Fig. 4 (a) A straight-line drawing for the graph in Fig. 1 using Schnyder
realizer, (b)–(c) computation of a canonical order from a Schnyder realizer


Theorem 4 The coordinates (η_1(v), η_2(v)) for
each vertex v in G give a straight-line drawing Γ
of G on a grid of size (n − 2) × (n − 2).


Place each vertex v at the point with coordinates
(η_1(v), η_2(v), η_3(v)). Since η(v) =
η_1(v) + η_2(v) + η_3(v) counts the number of
vertices in all the three regions of v, each vertex
of G except v is counted exactly once in η(v).
Thus, η_1(v) + η_2(v) + η_3(v) = n − 1 for
each vertex v. Thus, the drawing Γ’ obtained
by these coordinates (η_1(v), η_2(v), η_3(v)) lies on
the plane x + y + z = n − 1. Furthermore,
the drawing does not induce any edge crossings;
see [9]. Thus, Γ’ is a straight-line drawing of G
on the plane x + y + z = n − 1. Then Γ is just
a projection of Γ’ on the plane z = 0 and hence
is planar. Since each coordinate in the drawing is
bounded between 0 and n − 2, the area is at most
(n − 2) × (n − 2); see Fig. 4a.


Equivalency of Canonical Orders and 
Schnyder Realizers 

Here we show that canonical orders and Schnyder 
realizers are in fact equivalent in the sense that 
a canonical order of a graph defines a Schnyder 
realizer and vice versa [3]. 


Lemma 3 A canonical order for a maximal 
plane graph G defines a unique Schnyder realizer 
where the three parents for each vertex v of 
G are its leftmost predecessor, its rightmost 
predecessor, and its highest-labeled successor. 


See Fig. 1a, 1c for a canonical order O and the
corresponding Schnyder realizer S defined by O
for a maximal plane graph. One can easily verify
that this definition of S satisfies the two conditions
for each internal vertex for a maximal plane
graph. A canonical order O and the Schnyder
realizer S obtained from O for a maximal plane
graph are said to be compatible.
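
The construction of Lemma 3 can be sketched by replaying the canonical order while maintaining the outer path from v_1 to v_2. When v_i is added, the first and last of its neighbors on that path are its leftmost and rightmost predecessors, and every path vertex strictly between them is covered by v_i, so v_i is its highest-labeled successor. The function name `realizer_from_order`, the input format, and the labeling of the trees (T_1 towards v_1, T_2 towards v_2, T_3 towards v_n) are assumptions of this sketch.

```python
def realizer_from_order(order, adj):
    """Return (T1, T2, T3) as sets of directed edges (child, parent).

    order: canonical order [v1, ..., vn]; adj: dict vertex -> set of neighbors.
    """
    v1, v2 = order[:2]
    last = order[-1]
    contour = [v1, v2]                 # outer path from v1 to v2
    t1, t2, t3 = set(), set(), set()
    for v in order[2:]:
        idx = [i for i, w in enumerate(contour) if w in adj[v]]
        if len(idx) < 2 or idx != list(range(idx[0], idx[-1] + 1)):
            raise ValueError("neighbors do not form a contour subinterval")
        l, r = idx[0], idx[-1]
        if v != last:                  # edges at the outer vertex v_n are not internal
            t1.add((v, contour[l]))    # leftmost predecessor
            t2.add((v, contour[r]))    # rightmost predecessor
        for w in contour[l + 1:r]:
            t3.add((w, v))             # v is the highest-labeled successor of w
        contour[l + 1:r] = [v]
    return t1, t2, t3


# Five-vertex example: v4 sees v1 and v3, v5 sees the whole outer path.
g5 = {1: {2, 3, 4, 5}, 2: {1, 3, 5}, 3: {1, 2, 4, 5}, 4: {1, 3, 5}, 5: {1, 2, 3, 4}}
T1, T2, T3 = realizer_from_order([1, 2, 3, 4, 5], g5)
```

In this example the three sets together contain 3n − 9 = 6 internal edges, and each of the two internal vertices has exactly one outgoing edge in each set.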

We now describe two ways to obtain a canon- 
ical order from a Schnyder realizer S. In both 
cases, we obtain a canonical order which is com- 
patible with S. 


Lemma 4 Let G be a maximal plane graph with
outer vertices r_1, r_2, and r_3 in counterclockwise
order and let T_1, T_2, and T_3 be the three trees in
a Schnyder realizer of G rooted at r_1, r_2, and r_3.
Then a compatible canonical order of G can be
obtained as follows:

1. By taking the counterclockwise depth-first
traversal order of the vertices in the graph
T_1 ∪ {(r_2, r_1), (r_3, r_1)}; see Fig. 4b.

2. By taking the topological order of the directed
acyclic graph T_1^{-1} ∪ T_2^{-1} ∪ T_3, where T_i^{-1}, i =
1, 2, is the Schnyder tree T_i with reversed edge
directions; see Fig. 4c.


It is not difficult to show that the directed
graph T_1^{-1} ∪ T_2^{-1} ∪ T_3 is in fact acyclic [3, 9].
Then it is easy to verify that the canonical orders
obtained from a Schnyder realizer S are compatible
with S; i.e., defining a Schnyder realizer from
them by Lemma 3 produces the original Schnyder
realizer S.


Problem Definition


This entry covers several related problems.
The first problem is concerned with
maintaining the causal relationship between 
events in a distributed system. The motivation 
is to allow distributed systems to reason about 
time with no explicit access to a physical clock. 
Lamport [5] defines a notion of logical clocks 
that can be used to generate timestamps that 
are consistent with causal relationships (in 
a conservative sense). He illustrates logical 
clocks (also called Lamport clocks) with 
a distributed mutual exclusion algorithm. The 
algorithm turns out to be an illustration of state- 
machine replication. Basically, the algorithm 
generates a total ordering of the events that is 
consistent across processes. With all processes 
starting in the same state, they evolve consistently 
with no need for further synchronization. 


System Model 
The system consists of a collection of processes. 
Each process consists of a sequence of events.
Processes have no shared memory and communicate
only by exchanging messages. The exact
definition of an event depends on the system 
actually considered and the abstraction level at 
which it is considered. One distinguishes between 
three kinds of events: internal (affects only the 
process executing it), send, and receive events. 


Causal Order 
Causal order is concerned with the problem that 
the occurrence of some events may affect other 
events in the future, while other events may not 
influence each other. With processes that do not 
measure time, the notion of simultaneity must be 
redefined in such a way that simultaneous events 
are those that cannot possibly affect each other. 
For this reason, it is necessary to define what 
it means for an event to happen before another 
event. 

The following “happened before” relation is 
defined as an irreflexive partial ordering on the 
set of all events in the system [5]. 


Definition 1 The relation “→” on the set of
events of a system is the smallest relation satisfying
the following three conditions:


1. If a and b are events in the same process, and
a comes before b, then a → b.

2. If a is the sending of a message by one process
and b is the receipt of the same message by
another process, then a → b.

3. If a → b and b → c, then a → c.


Definition 2 Two distinct events a and b are said
to be concurrent if a ↛ b and b ↛ a.
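
As a small illustration of Definitions 1 and 2, the relation can be computed for a finite execution as the transitive closure of the local-order and message edges. The representation (a dict of per-process event lists plus a list of send/receive pairs), the event names, and the function names `happened_before` and `concurrent` are assumptions made for this sketch.

```python
def happened_before(processes, messages):
    """Return a dict mapping each event a to the set of events b with a -> b.

    processes: dict process name -> list of events in local order.
    messages:  list of (send_event, receive_event) pairs.
    """
    succ = {e: set() for events in processes.values() for e in events}
    for events in processes.values():
        for a, b in zip(events, events[1:]):
            succ[a].add(b)             # condition 1 (consecutive events suffice)
    for send, recv in messages:
        succ[send].add(recv)           # condition 2

    def reachable(a):                  # condition 3: transitive closure
        seen, stack = set(), [a]
        while stack:
            for b in succ[stack.pop()]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    return {a: reachable(a) for a in succ}


def concurrent(hb, a, b):
    """Definition 2: distinct events with neither a -> b nor b -> a."""
    return a != b and b not in hb[a] and a not in hb[b]


# Process p sends a message at p1 that process q receives at q2.
hb = happened_before({"p": ["p1", "p2", "p3"], "q": ["q1", "q2"]},
                     [("p1", "q2")])
```

Here p1 → q2 holds, whereas q1 and p1 (and likewise p2 and q2) are concurrent.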


Logical Clocks 
Lamport also defines clocks in a generic way, as 
follows. 


Definition 3 A clock C_i for a process p_i is
a function which assigns a number C_i(a) to
any event a on that process. The entire system
of clocks is represented by the function C which
assigns to any event b the number C(b), where
C(b) = C_i(b) if b is an event in process p_i. The
system of clocks must meet the following clock
condition.

• For any events a and b, if a → b then
C(a) < C(b).


Assuming that there is some arbitrary total
ordering < of the processes (e.g., unique names
ordered lexicographically), Lamport extends the
“happened before” relation and defines a relation
“⇒” as a total ordering on the set of all
events in the system.


Definition 4 The total order relation ⇒ is defined
as follows. If a is an event in process p_i
and b is an event in process p_j, then a ⇒ b if and
only if either one of the following conditions is
satisfied.


1. C_i(a) < C_j(b)

2. C_i(a) = C_j(b) and p_i < p_j.
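
Definition 4 is a lexicographic comparison of (timestamp, process) pairs. A minimal sketch, assuming events are represented as such pairs and processes are ordered by name (the function name `precedes` and the sample data are illustrative only):

```python
def precedes(a, b):
    """True iff a => b, for events given as (timestamp, process_name) pairs."""
    (ta, pa), (tb, pb) = a, b
    return ta < tb or (ta == tb and pa < pb)


events = [(2, "p2"), (1, "p3"), (2, "p1"), (1, "p1")]
# Python compares tuples lexicographically, which coincides with the rule above.
ordered = sorted(events)
```

Because ties on the timestamp are broken by the process order, any two distinct events are comparable.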

In fact, Lamport [5] also discusses an adap- 
tation of these conditions to physical clocks, 
and provides a simple clock synchronization al- 
gorithm. This is however not discussed further 
here. 


State Machine Replication 

The problem of state-machine replication was 
originally presented by Lamport [4, 5]. In a later 
review of the problem, Schneider [8] defines the 
problem as follows (formulation adapted to the 
context of the entry). 


Problem 1 (State-machine replication) 

INPUT: A set of concurrent requests. 

OUTPUT: A sequence of the requests processed 
at each process, such that: 


1. Replica coordination: all replicas receive and 
process the same sequence of requests. 

2. Agreement: every non-faulty state-machine 
replica receives every request. 

3. Order: every non-faulty state-machine replica 
processes the requests it receives in the same 
relative order. 


In his paper on logical time [5], which is discussed in
this entry, Lamport does not consider failures. He
does, however, consider them in another paper on
state-machine replication for fault-tolerance [4],
which he published the same year.


Key Results 


Lamport [5] proposed many key results related to 
the problems described above. 


Logical Clocks 

Lamport [5] defines an elegant system of logical
clocks that meets the clock condition presented
in Definition 3. The clock of a process p_i is
represented by a register C_i, such that C_i(a)
is the value held by C_i when a occurs. Each
message m carries a timestamp T_m, which equals
the time at which m was sent. The clock system
can be described in terms of the following
rules.


1. Each process p_i increments C_i between any
two successive events.

2. If event a is the sending of a message m
by process p_i, then the message m contains
a timestamp T_m = C_i(a).

3. Upon receiving a message m, process p_j sets
C_j to max(C_j, T_m + 1) (before actually executing
the receive event).
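
The three rules translate almost directly into code. The sketch below is one possible reading, not Lamport's own formulation: the class name `LamportClock` and its method names are assumptions, and on receipt it applies rule 1 first and rule 3 second, so that the receive event is stamped with max(C_j, T_m) + 1.

```python
class LamportClock:
    """Logical clock register of a single process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: advance the clock for the next local event."""
        self.time += 1
        return self.time

    def send(self):
        """Rule 2: a send is an event; its clock value is the timestamp T_m."""
        return self.tick()

    def receive(self, t_m):
        """Rules 1 and 3: advance, then move past the message timestamp."""
        self.time += 1
        self.time = max(self.time, t_m + 1)
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                    # internal event on p, clock value 1
t_m = p.send()              # send event on p, clock value 2
t_recv = q.receive(t_m)     # receive event on q, clock value 3
t_next = q.tick()           # later event on q, clock value 4
```

The send is stamped before the receive, and events on the same process receive strictly increasing values, as the clock condition requires.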


State Machine Replication 

As an illustration for the use of logical clocks, 
Lamport [5] describes a mutual exclusion algo- 
rithm. He also mentions that the approach is 
more general and discusses the concept of state- 
machine replication that he refines in a different 
paper [4]. 

The mutual exclusion algorithm is based on 
the idea that every process maintains a copy of 
a request queue, and the algorithm ensures that 
the copies remain consistent across the processes. 
This is done by generating a total ordering of 
the request messages, according to timestamps 
obtained from the logical clocks of the sending 
processes. 

The algorithm described works under the following
simplifying assumptions:


• Every message that is sent is eventually received.

• For any processes p_i and p_j, messages from p_i
to p_j are received in the same order as they are
sent.

• A process can send messages directly to every
other process.


The algorithm requires that each process
maintains its own request queue, and ensures that
the request queues of different processes always
remain consistent. Initially, request queues
contain a single message (T_0, p_0, request), where
p_0 is the process that holds the resource and the
timestamp T_0 is smaller than the initial value
of every clock. Then, the algorithm works as
follows.


1. When a process p_i requests the resource, it
sends a request message (T_m, p_i, request) to
all other processes and puts the message in its
request queue.

2. When a process p_j receives a message
(T_m, p_i, request), it puts that message in its
request queue and sends an acknowledgment
(T_m, p_j, ack) to p_i.

3. When a process p_i releases the resource,
it removes all instances of messages
(−, p_i, request) from its queue, and sends
a message (T_m', p_i, release) to all other
processes.

4. When a process p_j receives a release message
from process p_i, it removes all instances of
messages (−, p_i, request) from its queue, and
sends a timestamped acknowledgment to p_i.

5. Messages in the queue are sorted according to
the total order relation ⇒ of Definition 4.
A process p_i can use the resource when
(a) a message (T_m, p_i, request) appears first
in the queue, and (b) p_i has received from all
other processes a message with a timestamp
greater than T_m (or equal from any process p_j
where p_i < p_j).
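
The five steps can be exercised in a small single-threaded simulation. Everything below is a sketch under stated assumptions rather than a faithful transcription: processes are numbered 0..n−1 and ordered by number, each acknowledgment carries a fresh timestamp from the sender's clock, the initial message is (−1, p_0) with all clocks starting at 0, `last_seen` starts at −1 for every peer, and the helper `deliver_all` drains per-sender outboxes in order to model FIFO channels. The names `Process`, `may_enter`, and `deliver_all` are made up for this example.

```python
class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = 0
        self.queue = [(-1, 0)]     # (T_0, p_0): process 0 holds the resource
        self.last_seen = {q: -1 for q in range(n) if q != pid}
        self.outbox = []           # pending (destination, message) pairs

    def _stamp(self):
        self.clock += 1
        return self.clock

    def _broadcast(self, kind):
        t = self._stamp()
        for q in self.last_seen:
            self.outbox.append((q, (t, self.pid, kind)))
        return t

    def request(self):             # step 1
        self.queue.append((self._broadcast("request"), self.pid))

    def release(self):             # step 3
        self.queue = [m for m in self.queue if m[1] != self.pid]
        self._broadcast("release")

    def receive(self, msg):        # steps 2 and 4
        t, sender, kind = msg
        self.clock = max(self.clock, t) + 1
        self.last_seen[sender] = t
        if kind == "request":
            self.queue.append((t, sender))
        elif kind == "release":
            self.queue = [m for m in self.queue if m[1] != sender]
        if kind != "ack":
            self.outbox.append((sender, (self._stamp(), self.pid, "ack")))

    def may_enter(self):           # step 5, with tuple order as the relation =>
        if not self.queue:
            return False
        head = min(self.queue)
        return head[1] == self.pid and all(
            (t, q) > head for q, t in self.last_seen.items())


def deliver_all(procs):
    """Deliver pending messages until none remain, in per-sender FIFO order."""
    pending = True
    while pending:
        pending = False
        for p in procs:
            out, p.outbox = p.outbox, []
            for dest, msg in out:
                procs[dest].receive(msg)
                pending = True


procs = [Process(i, 3) for i in range(3)]
procs[1].request()
procs[2].request()
deliver_all(procs)
holders_before = [p.may_enter() for p in procs]   # process 0 still holds it
procs[0].release()
deliver_all(procs)
holders_after = [p.may_enter() for p in procs]    # process 1 wins the tie
```

Both requests carry timestamp 1, so the tie is broken by the process order and process 1 is granted the resource before process 2.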


Applications 


A brief overview of some applications of the concepts
presented in this entry is provided here.
First of all, the notion of causality in
distributed systems (or lack thereof) leads
to a famous problem in which a user may
potentially see an answer before she can see
the relevant question. The time-independent
characterization of causality of Lamport led
to the development of efficient solutions to
enforce causal order in communication. In his 
later work, Lamport [3] gives a more general 
definition to the “happened before” relation, so 
that a system can be characterized at various 
abstraction levels. 

About a decade after Lamport’s work on
logical clocks, Fidge [2] and Mattern [6]
developed the notion of vector clocks, with
the advantage of a complete characterization of
causal order. Indeed, the clock condition enforced
by Lamport’s logical clocks is only a one-way
implication (see Definition 3). In contrast,
vector clocks extend Lamport clocks by ensuring
that, for any events a and b, if C(a) < C(b),
then a → b. This is for instance useful for
choosing a set of checkpoints after recovery of
a distributed system, for distributed debugging,
or for deadlock detection. Other extensions of
logical time have been proposed and are
surveyed by Raynal and Singhal [7].
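
A minimal sketch of vector clocks follows, assuming n processes numbered 0..n−1, one counter per process, and the usual componentwise comparison of timestamps. The class name `VectorClock` and the helper `before` are illustrative and are not taken from [2] or [6].

```python
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def event(self):
        """Internal or send event: advance the local component."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def receive(self, stamp):
        """Merge the sender's timestamp componentwise, then count the receive event."""
        self.v = [max(a, b) for a, b in zip(self.v, stamp)]
        return self.event()


def before(u, v):
    """u < v: every component of u is at most that of v, and u differs from v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
a = p0.event()       # send event on process 0
b = p1.event()       # independent event on process 1
c = p1.receive(a)    # receipt of the message sent at a
```

Here a and b are incomparable, which reflects that they are concurrent, while both precede c.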

State-machine replication also has many
applications. In particular, it is often used for
replicating a distributed service over several processors,
so that the service can continue to operate
in spite of the failure of some of
the processors. State-machine replication ensures
that the different replicas remain consistent.

The mutual exclusion algorithm proposed by
Lamport [5] and described in this entry is actually
one of the first known solutions to the atomic
broadcast problem (see relevant entry). Briefly,
in a system with several processes that broadcast
messages concurrently, the problem requires that
all processes deliver (and process) all messages
in the same order. Nowadays, there exist several
approaches to solving the problem. Surveying
many algorithms, Défago et al. [1] have classified
Lamport’s algorithm as a communication history
algorithm, because of the way the ordering is
generated.


Problem Definition


This problem concerns the query complexity 
of proper learning in a widely studied learning 
model: exact learning with membership and 
equivalence queries. Hellerstein et al. [10] 
showed that the number of (polynomially 
sized) queries required to learn a concept 
class in this model is closely related to the 
size of certain certificates associated with that 
class. This relationship gives a combinatorial 
characterization of the concept classes that can 
be learned with polynomial query complexity. 
Similar results were shown by Hegedüs based on
the work of Moshkov [8, 13].


The Exact Learning Model 

Concepts are functions f : X → {0, 1} where
X is an arbitrary domain. In exact learning, there
is a hidden concept f from a known class of
concepts C, and the problem is to exactly identify
the concept f.

Algorithms in the exact learning model ob- 
tain information about f, the target concept, 
by querying two oracles, a membership oracle 
and an equivalence oracle. A membership oracle 
for f answers membership queries (i.e., point 
evaluation queries), which are of the form “What 
is f(x)?” where x ∈ X. The membership oracle
responds with the value f(x). An equivalence
oracle for f answers equivalence queries, which
are of the form “Is h = f?” where h is a
representation of a concept defined on the domain
X. Representation h is called a hypothesis. The
equivalence oracle responds “yes” if h(x) =
f(x) for all x ∈ X. Otherwise, it returns a
counterexample, a value x ∈ X such that f(x) ≠
h(x).

The exact learning model is due to Angluin
[2]. Angluin viewed the combination of membership
and equivalence oracles as constituting
a “minimally adequate teacher.” Equivalence
queries can be simulated both in Valiant’s well- 
known PAC model, and in the online mistake- 
bound learning model. 
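
To make the two oracles concrete, the sketch below implements them for a finite domain and pairs them with a deliberately naive learner that patches a lookup-table hypothesis with each counterexample. None of this comes from the cited work: the function names, the lookup-table representation, and the target concept x_1 ∧ x_2 over {0, 1}^3 are assumptions chosen for illustration, and the learner makes no claim to efficiency.

```python
from itertools import product


def membership_oracle(f):
    """Answers "What is f(x)?"."""
    return lambda x: f(x)


def equivalence_oracle(f, domain):
    """Answers "Is h = f?": None for yes, otherwise a counterexample x."""
    def ask(h):
        for x in domain:
            if h(x) != f(x):
                return x
        return None
    return ask


def learn_by_counterexamples(eq, mq):
    """Start from the all-zero hypothesis and correct one point per counterexample."""
    table = {}

    def h(x):
        return table.get(x, 0)

    equivalence_queries = 0
    while True:
        equivalence_queries += 1
        x = eq(h)
        if x is None:
            return table, equivalence_queries
        table[x] = mq(x)       # one membership query per counterexample


domain = list(product([0, 1], repeat=3))
target = lambda x: x[0] & x[1]            # hidden concept f
table, n_eq = learn_by_counterexamples(
    equivalence_oracle(target, domain), membership_oracle(target))
```

The target has two positive points, so the learner needs two counterexamples and a third equivalence query that is answered “yes”.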

Let R be a set of representations of concepts,
and let C_R be the associated set of concepts. For
example, if R were a set of DNF formulas, then
C_R would be the set of Boolean functions (concepts)
represented by those formulas. An exact
learning algorithm is said to learn R if, given
access to membership and equivalence oracles for
any f in C_R, it ends by outputting a hypothesis h
that is a representation of f.


Query Complexity of Exact Learning 

There are two aspects to the complexity of exact 
learning: query complexity and computational 
complexity. The results of Hellerstein et al. con- 
cern query complexity. 

The query complexity of an exact learning 
algorithm measures the number of queries it asks 
and the size of the hypotheses it uses in those 
queries (and as the final output). We assume that 
each representation class R has an associated size 
function that assigns a nonnegative number to 
each r ∈ R. The size of a concept c with respect
to R, denoted by |c|_R, is the size of the smallest
representation of c in R; if c ∉ C_R, |c|_R =
∞. Ideally, the query complexity of an exact
learning algorithm will be polynomial in the size 
of the target and other relevant parameters of the 
problem. 

Many exact learning results concern learning 
classes of representations of Boolean functions. 
Algorithms for learning such classes R are said to 
have polynomial query complexity if the number
of hypotheses used, and the size of those hypotheses,
are bounded by some polynomial p(m, n),
where n is the number of variables on which the
target f is defined, and m = |f|_R. We assume
that algorithms for learning Boolean representation
classes are given the value of n as input.

Since the number and size of queries used
by an algorithm are a lower bound on the time 
taken by that algorithm, query complexity lower 
bounds imply computational complexity lower 
bounds. 


Improper Learning and the Halving 
Algorithm 

An algorithm for learning a representation class 
R is said to be proper if all hypotheses used in its 
equivalence queries are from R, and it outputs a 
representation from R. Otherwise, the algorithm 
is said to be improper. 

When C_R is a finite concept class, defined
on a finite domain X, a simple, generic algorithm
called the halving algorithm can be used
to exactly learn R using log_2 |C_R| equivalence
queries and no membership queries. The halving
algorithm is based on the following idea. For any
V ⊆ C_R, define the majority hypothesis MAJ_V
to be the concept defined on X such that for all
x ∈ X, MAJ_V(x) = 1 if g(x) = 1 for more
than half the concepts g in V, and MAJ_V(x) =
0 otherwise. The halving algorithm begins by
setting V = C_R. It then repeats the following:


1. Ask an equivalence query with the hypothesis
MAJ_V.

2. If the answer is yes, then output MAJ_V.

3. Otherwise, the answer is a counterexample
x. Remove from V all g such that g(x) =
MAJ_V(x).


Each counterexample eliminates the majority of 
the elements currently in V, so the size of V 
is reduced by a factor of at least 2 with each 
equivalence query. It follows that the algorithm 
cannot ask more than log_2 |C_R| queries.
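
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not part of the source: concepts are stored as dictionaries from domain points to 0/1, the concept class is the set of all 16 Boolean functions on two variables, and the equivalence oracle is simulated by comparing the hypothesis with a known target and returning the first disagreement it finds. The names `majority`, `equivalence_oracle`, and `halving` are hypothetical.

```python
from itertools import product

def majority(version_space, domain):
    """MAJ_V: x -> 1 iff more than half of the concepts in V label x with 1."""
    return {x: int(2 * sum(g[x] for g in version_space) > len(version_space))
            for x in domain}

def equivalence_oracle(target, hypothesis, domain):
    """Simulated oracle: None means 'yes'; otherwise a counterexample x."""
    for x in domain:
        if hypothesis[x] != target[x]:
            return x
    return None

def halving(concepts, target, domain):
    """Run the halving algorithm; return (final hypothesis, number of queries)."""
    version_space = list(concepts)
    queries = 0
    while True:
        hypothesis = majority(version_space, domain)
        queries += 1
        x = equivalence_oracle(target, hypothesis, domain)
        if x is None:
            return hypothesis, queries
        # Step 3: discard every concept that agrees with the wrong majority vote on x.
        version_space = [g for g in version_space if g[x] != hypothesis[x]]

# Toy class (an assumption for illustration): all Boolean functions on 2 variables.
domain = list(product([0, 1], repeat=2))
concepts = [dict(zip(domain, values)) for values in product([0, 1], repeat=4)]
target = concepts[6]  # the function with truth table (0, 1, 1, 0), i.e., XOR
learned, queries = halving(concepts, target, domain)
print(learned == target, queries)
```

On this toy class the version space shrinks from 16 to 8 to 4 concepts before the majority hypothesis coincides with the target. Note that the majority hypotheses used along the way are built pointwise, which is exactly why the algorithm is improper in general.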

The halving algorithm cannot necessarily be
implemented as a proper algorithm, since the
majority hypotheses may not be representable
in R. Even when they are representable in R,
the representations may be exponentially larger
than the target concept.


Proper Learning and Certificates 

In the exact model, the query complexity of
proper learning is closely related to the size of
certain certificates.

For any concept f defined on a domain X, a
certificate that f has property P is a subset S ⊆
X such that for all concepts g defined on X, if
g(x) = f(x) for all x ∈ S, then g has property
P. The size of the certificate S is |S|, the number
of elements in it.

We are interested in properties of the form “g
is not a member of the concept class C.” To take
a simple example, let D be the class of constant-valued
n-variable Boolean functions, i.e., D consists
of the two functions f_1(x_1, ..., x_n) = 1 and
f_0(x_1, ..., x_n) = 0. Then if g is an n-variable
Boolean function that is not a member of D, a
certificate that g is not in D could be just a pair
a ∈ {0,1}^n and b ∈ {0,1}^n such that g(a) = 1
and g(b) = 0.

For C a class of concepts defined on X, define
the exclusion dimension of C to be the maximum,
over all concepts g not in C, of the size
of the smallest certificate that g is not in C. Let
XD(C) denote the exclusion dimension of C. In
the above example, XD(D) = 2.
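
For very small domains, the exclusion dimension can be checked by brute force. The sketch below is only an illustration of the definitions, not an efficient method: it enumerates every Boolean function on the domain and every subset of the domain, so it is exponential in the domain size. It assumes n = 2 and the class of constant functions from the example; the function names are hypothetical.

```python
from itertools import combinations, product

def is_certificate(g, subset, concept_class):
    """S certifies 'g is not in C' iff no concept in C agrees with g on all of S."""
    return not any(all(c[x] == g[x] for x in subset) for c in concept_class)

def smallest_certificate_size(g, domain, concept_class):
    """Size of the smallest certificate that g lies outside the class."""
    for size in range(len(domain) + 1):
        if any(is_certificate(g, s, concept_class)
               for s in combinations(domain, size)):
            return size
    raise ValueError("g belongs to the class, so no certificate exists")

def exclusion_dimension(domain, concept_class):
    """Maximum, over all g outside the class, of the smallest certificate size."""
    all_functions = [dict(zip(domain, values))
                     for values in product([0, 1], repeat=len(domain))]
    outside = [g for g in all_functions if g not in concept_class]
    return max(smallest_certificate_size(g, domain, concept_class) for g in outside)

n = 2  # assumed number of variables, kept tiny so enumeration is feasible
domain = list(product([0, 1], repeat=n))
constant_class = [{x: 0 for x in domain}, {x: 1 for x in domain}]
xd = exclusion_dimension(domain, constant_class)
print(xd)
```

For the constant class, a single point never suffices because one of the two constant functions always agrees with g there, while one point labeled 1 together with one point labeled 0 rules out both.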


Key Results 


Theorem 1 Let R be a finite class of representations,
and let C = C_R. Then there exists a proper learning
algorithm in the exact model that learns R using
at most XD(C) log_2 |C| queries. Further, any
such algorithm for R must make at least XD(C)
queries.


Independently, Hegedűs proved a theorem that
is essentially identical to the above theorem. The
algorithm in the theorem is a variant of the ordinary
halving algorithm. As noted by Hegedűs,
a similar result to Theorem 1 was proved earlier
by Moshkov, and Moshkov’s techniques can be
used to improve the upper bound by a factor of
2 / log_2 XD(C).

An extension of the above result character- 
izes the representation classes that have polyno- 
mial query complexity. The following theorem 
presents the extended result as it applies to rep- 
resentation classes of Boolean functions. 


Theorem 2 Let R be a class of representations
of Boolean functions. Then there exists a proper
learning algorithm in the exact model that learns
R with polynomial query complexity iff there
exists a polynomial p(m, n) such that for all
m, n > 0, and all n-variable Boolean functions
g, if |g|_R > m, then there exists a certificate of
size at most p(m, n) proving that |g|_R > m.


A concept class having certificates of the type 
specified in this theorem is said to have polyno- 
mial certificates. 

The algorithm in the above theorem does not
run in polynomial time. Hellerstein et al. give a
more complex algorithm that runs in polynomial
time using a Σ_4^p oracle, provided R satisfies
certain technical conditions. Köbler and Lindner
subsequently gave an algorithm using a Σ_2^p
oracle [12].

Theorem 2 and its generalization give a tech- 
nique for proving bounds on proper learning in 
the exact model. Proving upper bounds on the 
size of the appropriate certificates yields up- 
per bounds on query complexity. Proving lower 
bounds on the size of appropriate certificates 
yields lower bounds on query complexity and 
hence also on time complexity. Moreover, unlike 
many computational hardness results in learning, 
computational hardness results achieved in this 
way do not rely on any unproven complexity 
theoretic or cryptographic hardness assumptions. 

One of the most widely studied problems in 
computational learning theory has been the ques- 
tion of whether DNF formulas can be learned 
in polynomial time in common learning models. 
The following result on learning DNF formulas 
was proved using Theorem 2, by bounding the 
size of the relevant certificates. 


Theorem 3 There is a proper algorithm that 
learns DNF formulas in the exact model with 
query complexity bounded above by a polynomial 
p(m,r,n), where m is the size of the smallest 
DNF representing the target function f, n is the 
number of variables on which f is defined, and r 
is the size of the smallest CNF representing f. 


The size of a DNF is the number of its terms;
the size of a CNF is the number of its clauses. The
above theorem does not imply polynomial-time
learnability of arbitrary DNF formulas, since the
running time of the algorithm depends not just
on the size of the smallest DNF representing the
target but also on the size of the smallest CNF.

Building on results of Alekhnovich et al.,
Feldman showed that if NP ≠ RP, DNF formulas
cannot be properly learned in polynomial time
in the PAC model augmented with membership
queries. The same negative result then follows
immediately for the exact model [1, 7]. Hellerstein
and Raghavan used certificate size lower
bounds and Theorem 1 to prove that DNF formulas
cannot be learned by a proper exact algorithm
with polynomial query complexity, if the
algorithm is restricted to using DNF hypotheses
that are only slightly larger than the target [9].

The main results of Hellerstein et al. apply
to learning with membership and equivalence 
queries. Hellerstein et al. also considered the 
model of exact learning with membership queries 
alone and showed that in this model, a projection- 
closed Boolean function class is polynomial 
query learnable iff it has polynomial teaching 
dimension. Teaching dimension was previously 
defined by Goldman and Kearns. Hegedűs
defined the extended teaching dimension and
showed that all classes are polynomially query
learnable with membership queries alone iff they
have polynomial extended teaching dimension.

Balcázar et al. introduced the general dimension,
which generalizes the combinatorial dimensions
discussed above [5]. It can be used to
characterize polynomial query learnability for 
a wide range of different queries. Balcan and 
Hanneke have investigated related combinatorial 
dimensions in the active learning setting [4]. 


Open Problems 


It remains open whether DNF formulas can be 
learned in polynomial time in the exact model, 
using hypotheses that are not DNF formulas. 
Feldman’s results show the computational 
hardness of proper learning of DNF in the 
exact learning model based on complexity- 
theoretic assumptions. However, it is unclear 
whether query complexity is also a barrier to 
efficient learning of DNF formulas. It is still
open whether the class of DNF formulas has
polynomial certificates; showing they do not have
polynomial certificates would give a hardness 
result for proper learning of DNF based only on 
query complexity, with no complexity-theoretic 
assumptions (and without the hypothesis-size 
restrictions used by Hellerstein and Raghavan). 
DNF formulas do have certain sub-exponential 
certificates [11]. 

It is open whether decision trees have polyno- 
mial certificates. 

Certificate techniques are used to prove lower 
bounds on learning when we restrict the type 
of hypotheses used by the learning algorithm. 
These types of results are called representation 
dependent, since they depend on the restriction 
of the representations used as hypotheses. 
Although there are some techniques for proving
representation-independent hardness results,
there is a need for more powerful techniques.


Problem Definition


One of the major problems facing wireless networks
is the capacity reduction due to interference
among multiple simultaneous transmissions.
In wireless mesh networks, providing mesh
routers with multiple radios can greatly alleviate
this problem. With multiple radios, nodes
can transmit and receive simultaneously or can
transmit on multiple channels simultaneously.
However, due to the limited number of channels
available, the interference cannot be completely
eliminated, and careful channel assignment
must be carried out to mitigate the effects
of interference. Channel assignment and routing
are interdependent: channel assignments
have an impact on link bandwidths and
the extent to which link transmissions interfere,
which in turn affects the routing used to satisfy
traffic demands. In the same way, traffic routing
determines the traffic flows for each link, which
affects channel assignments. Channel
assignments need to be done in a way such that
the communication requirements for the links
can be met. Thus, the problem of throughput
maximization of wireless mesh networks must be
solved through channel assignment, routing, and
scheduling.

Formally, a wireless mesh backbone
network is modeled as a graph (V, E). The node
t ∈ V represents the wired network. An edge
e = (u, v) exists in E iff u and v are within
communication range R_T. The set V_G ⊆ V
represents the set of gateway nodes. The system
has a total of K channels. Each node u ∈ V
has I(u) network interface cards and has an
aggregated demand l(u) from its associated users.
For each edge e, the set I(e) ⊆ E denotes the
set of edges that it interferes with. A pair of
nodes that use the same channel and are within
interference range R_I may interfere with each
other’s communication, even if they cannot
directly communicate. Node pairs using different
channels can transmit packets simultaneously
without interference. The problem is to maximize
λ where at least λ l(u) amount of throughput
can be routed from each node u to the Internet
(represented by the node t). The λ l(u) throughput
for each node u is achieved by computing
(1) a network flow that associates with each
edge e = (u, v) values f(e(i)), 1 ≤ i ≤ K,
where f(e(i)) is the rate at which traffic is
transmitted by node u for node v on channel i;
(2) a feasible channel assignment F(u) (F(u)
is an ordered set where the ith interface of u
operates on the ith channel in F(u)) such that,
whenever f(e(i)) > 0, i ∈ F(u) ∩ F(v); and (3)
a feasible schedule S that decides the set of
edge-channel pairs (e, i) (edge e using channel i,
i.e., f(e(i)) > 0) scheduled at time slot τ, for
τ = 1, 2, ..., T, where T is the period of the
schedule. A schedule is feasible if no two
edge-channel pairs (e_1, i), (e_2, i) scheduled in the
same time slot for a common channel i interfere
with each other (e_1 ∉ I(e_2) and e_2 ∉ I(e_1)).
Thus, a feasible schedule is also referred to
as an interference free edge schedule. An
indicator variable X_{e,i,τ}, e ∈ E, i ∈ F(e), τ ≥ 1,
is used. It is assigned 1 if and only if
link e is active in slot τ on channel i.
Note that

\[ \frac{1}{T} \sum_{1 \le \tau \le T} X_{e,i,\tau}\, c(e) = f(e(i)), \]

where c(e) denotes the capacity of link e.
This is because communication at rate c(e)
happens in every slot that link e is active on
channel i, and f(e(i)) is the average rate
attained on link e for channel i. This implies

\[ \frac{1}{T} \sum_{1 \le \tau \le T} X_{e,i,\tau} = \frac{f(e(i))}{c(e)}. \]
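
The relation between a periodic schedule and the average per-channel flow can be checked numerically. The following sketch uses made-up numbers (a capacity of 54 and a period of 6 slots are assumptions for illustration only) and a hypothetical helper `average_rate`.

```python
def average_rate(schedule, capacity):
    """Average rate on one (edge, channel) pair.

    schedule: list of 0/1 indicator values, one per slot of the period T.
    capacity: the link capacity c(e).
    """
    period = len(schedule)
    return capacity * sum(schedule) / period

# Hypothetical values: c(e) = 54, T = 6, link active in slots 1, 2, and 5.
x = [1, 1, 0, 0, 1, 0]
flow = average_rate(x, capacity=54.0)
active_fraction = sum(x) / len(x)
print(flow, active_fraction)
```

Here the link is active in half of the slots, so the attained flow is half of the capacity, and dividing the flow by the capacity recovers the fraction of active slots.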


Joint Routing, Channel Assignment, 
and Link Scheduling Algorithm 


Even the interference free edge scheduling sub- 
problem given the edge flows is NP-hard [5]. An 
approximation algorithm called RCL for the joint 
routing, channel assignment, and link scheduling 
problem has been developed. The algorithm 
performs the following steps in the given 
order: 


1. Solve LP: First optimally solve an LP relaxation
of the problem. This results in a flow
on the flow graph along with a not necessarily
feasible channel assignment for the node
radios. Specifically, a node may be assigned
more channels than the number of its radios.
However, this channel assignment is “optimal”
in terms of ensuring that the interference
for each channel is minimum. This step also
yields an upper bound on the achievable λ value, which
is used in establishing the worst case performance
guarantee of the overall algorithm.

2. Channel Assignment: This step presents
a channel assignment algorithm which is 
used to adjust the flow on the flow graph 
(routing changes) to ensure a feasible channel 
assignment. This flow adjustment also strives 
to keep the increase in interference for each 
channel to a minimum. 

3. Interference Free Link Scheduling: This 
step obtains an interference free link schedule 
for the edge flows corresponding to the flow 
on the flow graph. 


Each of these steps is described in the following 
subsections. 


A Linear Programming-Based Routing 
Algorithm 

A linear program LP (1) to find a flow that
maximizes λ is given below:


max λ    (1)
Subject to


The first two constraints are flow constraints.
The first one is the flow conservation constraint;
the second one ensures no link capacity is
violated. The third constraint is the node radio
constraint. Recall that a WMN node v ∈ V has
I(v) radios and hence can be assigned at most I(v)
channels from 1 ≤ i ≤ K. One way to model this
constraint is to observe that due to interference
constraints v can be involved in at most I(v)
simultaneous communications (with different
one hop neighbors). In other words, this constraint
follows from

\[ \sum_{1 \le i \le K} \sum_{e=(u,v) \in E} X_{e,i,\tau} + \sum_{1 \le i \le K} \sum_{e=(v,u) \in E} X_{e,i,\tau} \le I(v). \]

The fourth constraint is the link congestion constraint,
which is discussed in detail in section “Link
Flow Scheduling”. Note that all the constraints
listed above are necessary conditions for any
feasible solution. However, these constraints are
not necessarily sufficient. Hence, if a solution is
found that satisfies these constraints, it may not be
a feasible solution. The approach is to start with
a “good” but not necessarily feasible solution
that satisfies all of these constraints and use it to
construct a feasible solution without impacting
the quality of the solution.

A solution to this LP can be viewed as a flow
on a flow graph H = (V, E^H) where E^H =
{e(i) | ∀e ∈ E, 1 ≤ i ≤ K}. Although the optimal
solution to this LP yields the best possible λ
(say λ*), from a practical point of view more
improvements may be possible:


• The flow may have directed cycles. This may
be the case since the LP does not try to minimize
the amount of interference directly. By
removing the flow on the directed cycle (an equal
amount off each edge), flow conservation is
maintained, and since there are
fewer transmissions, the amount of interference
is reduced.

• The flow may be using a long path when
shorter paths are available. Note that longer
paths imply more link transmissions. In this
case, moving the flow to shorter paths
may often reduce system interference.


The above arguments suggest that it would be
practical to find among all solutions that attain the
optimal value λ* the one for which the
following quantity is minimized:

\[ \sum_{1 \le i \le K} \sum_{e=(v,u) \in E} \frac{f(e(i))}{c(e)} \]

The LP is then re-solved with this objective
function and with λ fixed at λ*.


Channel Assignment 

The solution to the LP (1) is a set of flow values
f(e(i)) for edge e and channel i that maximize
the value λ. Let λ* denote the optimal value of λ.
The flow f(e(i)) implies a channel assignment
where the two end nodes of edge e are both
assigned channel i if and only if f(e(i)) > 0.
Note that for the flow f(e(i)) the implied channel
assignment may not be feasible (it may require
more than I(v) channels at node v). The channel
assignment algorithm transforms the given flow
to fix this infeasibility. Below only a sketch of
the algorithm is given. More details can be found
in [1].

First observe that in an ideal scenario, where all
nodes v have the same number of interfaces I (i.e.,
I = I(v)) and where the number of available
channels K is also I, the channel assignment
implied by the LP (1) is feasible. This is because
even the trivial channel assignment where all
nodes are assigned all the channels 1 to I is
feasible. The main idea behind the algorithm is to
first transform the LP (1) solution to a new flow
in which every edge e has flow f(e(i)) > 0 only
for the channels 1 ≤ i ≤ I. The basic operation
that the algorithm uses for this is to equally
distribute, for every edge e, the flow f(e(i)), for
I < i ≤ K, to the edges e(j), for 1 ≤ j ≤ I. This
ensures that all f(e(i)) = 0, for I < i ≤ K, after
the operation. This operation is called Phase I of
the algorithm. Note that the Phase I operation
does not violate the flow conservation constraints
or the node radio constraints (5) in the LP (1).
It can be shown that in the resulting solution the
flow f(e(i)) may exceed the capacity of edge e
by at most a factor φ = K/I. This is called the
“inflation factor” of Phase I. Likewise, in the new
flow, the link congestion constraints (5) may also
be violated for edge e and channel i by no more
than the inflation factor φ. In other words, in the
resulting flow

\[ \frac{f(e(i))}{c(e)} + \sum_{e' \in I(e)} \frac{f(e'(i))}{c(e')} \le \phi\, c(q). \]

This implies that if the new flow is scaled by
a fraction 1/φ, then it is feasible for the LP (1).
Note that the implied channel assignment (assign
channels 1 to I to every node) is also feasible.
Thus, the above algorithm finds a feasible
channel assignment with a λ value of at least
λ*/φ.
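
The Phase I redistribution can be sketched as follows. This is one plausible reading of "equally distribute" (each of the first I channels of an edge receives an equal share of the flow that edge carried on channels I+1 to K), not the authors' implementation. The data layout, a dictionary from edges to lists of per-channel rates, and the name `phase_one` are assumptions.

```python
def phase_one(flow, num_radios):
    """Move, per edge, all flow on channels I+1..K in equal shares onto channels 1..I.

    flow: dict mapping an edge to a list [f(e(1)), ..., f(e(K))].
    num_radios: the common number of interfaces I.
    """
    result = {}
    for edge, rates in flow.items():
        spill = sum(rates[num_radios:])          # flow on channels I+1..K
        share = spill / num_radios               # equal share per kept channel
        kept = [rate + share for rate in rates[:num_radios]]
        result[edge] = kept + [0.0] * (len(rates) - num_radios)
    return result

# Hypothetical instance with K = 4 channels and I = 2 radios per node.
lp_flow = {
    ("a", "b"): [1.0, 0.0, 2.0, 1.0],
    ("b", "t"): [0.5, 0.5, 0.0, 3.0],
}
new_flow = phase_one(lp_flow, num_radios=2)
print(new_flow)
```

The total flow on each edge is unchanged, which is why flow conservation is preserved, while the per-channel flow on the first I channels can grow, which is the source of the inflation factor discussed above.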

One shortcoming of the channel assignment 
algorithm (Phase I) described so far is that it only 
uses I of the K available channels. By using more 
channels the interference may be further reduced 
thus allowing for more flow to be pushed in the 
system. The channel assignment algorithm uses 
an additional heuristic for this improvement. This 
is called Phase II of the algorithm. 

Now define an operation called “channel 
switch operation.” Let A be a maximal connected 
component (the vertices in A are not connected 
to vertices outside A) in the graph formed by 
the edges e for a given channel i for which
f(e(i)) > 0. The main observation to use is that
for a given channel j, the operation of completely 
moving flow f(e(i)) to flow f(e(j)) for every 
edge e in A, does not impact the feasibility of 
the implied channel assignment. This is because 
there is no increase in the number of channels 
assigned per node after the flow transformation: 
the end nodes of edges e in A which were earlier 
assigned channel i are now assigned channel j 
instead. Thus, the transformation is equivalent to 
switching the channel assignment of nodes in A 
so that channel i is discarded and channel j is 
gained if not already assigned. 
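
A channel switch can be expressed as a small flow transformation. The sketch below assumes the connected component A has already been identified and is passed in as a list of edges; channels are 0-based list indices, and the name `channel_switch` is hypothetical.

```python
def channel_switch(flow, component_edges, src, dst):
    """Move all flow on channel src to channel dst for every edge in the component.

    flow: dict mapping an edge to a list of per-channel rates (0-based channels).
    component_edges: edges of a maximal connected component for channel src.
    Returns a new flow; the input is left unmodified.
    """
    result = {edge: list(rates) for edge, rates in flow.items()}
    for edge in component_edges:
        result[edge][dst] += result[edge][src]
        result[edge][src] = 0.0
    return result

# Hypothetical flow on two channels; e1 and e2 form a component on channel 0.
flow = {"e1": [2.0, 0.0], "e2": [1.0, 0.5], "e3": [4.0, 0.0]}
switched = channel_switch(flow, ["e1", "e2"], src=0, dst=1)
print(switched)
```

After the switch, the end nodes of e1 and e2 no longer need channel 0 for these edges, so the number of channels required per node does not increase.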

The Phase II heuristic attempts to re-transform
the unscaled Phase I flows f(e(i)) so that
there are multiple connected components in the
graphs G(e, i) formed by the edges e for each
channel 1 ≤ i ≤ I. This re-transformation is
done so that the LP constraints are kept satisfied
with an inflation factor of at most φ, as is the
case for the unscaled flow after Phase I of the
algorithm.

Next, in Phase III of the algorithm, the connected
components within each graph G(e, i)
are grouped such that there are as close to K
(but no more than K) groups overall and such that
the maximum interference within each group is
minimized. Next, the nodes within the lth group
are assigned channel l, by using the channel
switch operation to do the corresponding flow
transformation. It can be shown that the channel
assignment implied by the flow in Phase III is
feasible. In addition, the underlying flows f(e(i))
satisfy the LP (1) constraints with an inflation
factor of at most φ = K/I.

Next, the algorithm scales the flow by the
largest possible fraction (at least 1/φ) such
that the resulting flow is a feasible solution
to the LP (1) and also implies a feasible
channel assignment. Thus, the overall algorithm finds
a feasible channel assignment (not necessarily
restricted to channels 1 to I only) with a λ value
of at least λ*/φ.


Link Flow Scheduling 

The results in this section are obtained by extending
those of [4] for the single channel case
and for the Protocol Model of interference [2].
Recall that the time slotted schedule S is assumed
to be periodic (with period T), where the indicator
variable X_{e,i,τ}, e ∈ E, i ∈ F(e), τ ≥ 1, is 1 if and
only if link e is active in slot τ on channel i, and i is
a channel in common among the set of channels
assigned to the end-nodes of edge e.

Directly applying the result (Claim 2) in [4],
it follows that a necessary condition for interference
free link scheduling is that for every e ∈
E, i ∈ F(e), and τ ≥ 1,

\[ X_{e,i,\tau} + \sum_{e' \in I(e)} X_{e',i,\tau} \le c(q). \]

Here c(q) is a constant that only depends
on the interference model. In the interference
model this constant is a function of the fixed
value q, the ratio of the interference range R_I to
the transmission range R_T, and an intuition for its
derivation for a particular value q = 2 is given
below.


Lemma 1 c(q) = 8 for q = 2.


Proof Recall that an edge e′ ∈ I(e) if there exist
two nodes x, y ∈ V which are at most 2R_T apart
and such that edge e is incident on node x and
edge e′ is incident on node y. Let e = (u, v). Note
that u and v are at most R_T apart. Consider the
region C formed by the union of two circles C_u
and C_v, of radius 2R_T each, centered at node u and
node v, respectively. Then e′ = (u′, v′) ∈ I(e) if
and only if at least one of the two nodes u′, v′
is in C. Denote such a node by C(e′). Given
two edges e_1, e_2 ∈ I(e) that do not interfere with
each other, it must be the case that the nodes
C(e_1) and C(e_2) are at least 2R_T apart. Thus,
an upper bound on how many edges in I(e) do
not pair-wise interfere with each other can be
obtained by computing how many nodes can be
put in C that are pair-wise at least 2R_T apart. It
can be shown [1] that this number is at most 8.
Thus, in schedule S in a given slot only one of the
two possibilities exists: either edge e is scheduled
or an “independent” set of edges in I(e) of size
at most 8 is scheduled, implying the claimed
bound. □


A necessary condition: (Link Congestion Constraint)
Recall that \( \frac{1}{T} \sum_{1 \le \tau \le T} X_{e,i,\tau} = f(e(i))/c(e) \).
Thus, any valid “interference free” edge flows
must satisfy for every link e and every channel i
the Link Congestion Constraint:

\[ \frac{f(e(i))}{c(e)} + \sum_{e' \in I(e)} \frac{f(e'(i))}{c(e')} \le c(q). \qquad (6) \]


A matching sufficient condition can also be
established [1].

A sufficient condition: (Link Schedulability
Constraint) If the edge flows satisfy for every
link e and every channel i the following Link
Schedulability Constraint, then an interference
free edge communication schedule can be found
using an algorithm given in [1].

\[ \frac{f(e(i))}{c(e)} + \sum_{e' \in I(e)} \frac{f(e'(i))}{c(e')} \le 1. \qquad (7) \]
The above implies that if a flow f(e(i)) sat- 
isfies the Link Congestion Constraint then by 
scaling the flow by a fraction 1/c(q) it can be 
scheduled free of interference. 
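
The scaling argument can be illustrated numerically. The sketch below evaluates the left-hand side of the congestion constraint on a tiny, made-up instance (three links, one channel, every link saturated; all values are assumptions) and confirms that dividing the flow by c(q) = 8 brings it under the schedulability threshold of 1. The helper names are hypothetical.

```python
def congestion(flow, capacity, interferes, edge, channel):
    """Left-hand side of the link congestion constraint for one (edge, channel)."""
    total = flow[edge][channel] / capacity[edge]
    for other in interferes[edge]:
        total += flow[other][channel] / capacity[other]
    return total

def scale(flow, factor):
    """Multiply every per-channel rate by the same factor."""
    return {edge: [rate * factor for rate in rates] for edge, rates in flow.items()}

C_Q = 8  # Lemma 1: c(q) = 8 when q = 2

# Hypothetical instance: e1 interferes with e2 and e3, which do not interfere
# with each other.
capacity = {"e1": 10.0, "e2": 10.0, "e3": 20.0}
interferes = {"e1": ["e2", "e3"], "e2": ["e1"], "e3": ["e1"]}
flow = {"e1": [10.0], "e2": [10.0], "e3": [20.0]}

worst = max(congestion(flow, capacity, interferes, e, 0) for e in flow)
scaled = scale(flow, 1 / C_Q)
worst_scaled = max(congestion(scaled, capacity, interferes, e, 0) for e in scaled)
print(worst, worst_scaled)
```

The unscaled flow satisfies the necessary condition (6) but not the sufficient condition (7); after scaling by 1/c(q), condition (7) holds for every link.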


Key Results


Theorem The RCL algorithm is a Kc(q)/I ap- 
proximation algorithm for the Joint Routing and 
Channel Assignment with Interference Free Edge 
Scheduling problem. 


Proof Note that the flow f(e(i)) returned by the
channel assignment algorithm in Sect. “Channel
Assignment” satisfies the Link Congestion Constraint.
Thus, from the result of Sect. “Link Flow
Scheduling” it follows that by scaling the flow by
an additional factor of 1/c(q) the flow can be realized
by an interference free link schedule. This
implies a feasible solution to the joint routing,
channel assignment, and scheduling problem with
a λ value of at least λ*/(φ c(q)). Thus, the RCL
algorithm is a φ c(q) = K c(q)/I approximation
algorithm. □
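
As a purely illustrative instance of this bound (the values K = 12 and I = 3 are assumptions chosen here, not taken from the source), take q = 2 so that c(q) = 8 by Lemma 1. The guarantee then works out as:

```latex
\[
\phi = \frac{K}{I} = \frac{12}{3} = 4, \qquad
\phi\, c(q) = 4 \cdot 8 = 32, \qquad
\lambda_{\mathrm{RCL}} \;\ge\; \frac{\lambda^{*}}{32}.
\]
```

Here λ_RCL is shorthand, introduced only for this example, for the λ value achieved by the RCL algorithm.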


Applications 


Infrastructure mesh networks are increasingly
being deployed for commercial use and law
enforcement. These deployment settings place
stringent requirements on the performance of
the underlying IWMNs. Bandwidth guarantee
is one of the most important requirements of
applications in these settings. For these IWMNs,
topology change is infrequent and the variability 
of aggregate traffic demand from each mesh 
router (client traffic aggregation point) is small. 
These characteristics admit periodic optimization 
of the network which may be done by a system 
management software based on traffic demand 
estimation. This work can be directly applied to 
IWMNs. It can also be used as a benchmark to 
compare against heuristic algorithms in multi- 
hop wireless networks. 


Open Problems 


For future work, it will be interesting to investigate
the problem when routing solutions can
be enforced by changing link weights of a distributed
routing protocol such as OSPF. Also, can
the worst case bounds of the algorithm be improved
(e.g., a constant factor independent of K
and I)?


Problem Definition


Circuit partitioning is a fundamental problem 
in many areas of VLSI layout and design. 
Min-cut balanced bipartition is the problem 
of partitioning a circuit into two disjoint 
components with equal weights such that the 
number of nets connecting the two components 
is minimized. The min-cut balanced bipartition 
problem was shown to be NP-complete [5]. 
The problem has been solved by heuristic 
algorithms, e.g., Kernighan and Lin type 
(K&L) iterative improvement methods [4, 11], 
simulated annealing approaches [10], and 
analytical methods for the ratio-cut objective 
[2, 7, 13, 15]. Although it is a natural method
for finding a min-cut, the network max-flow 
min-cut technique [6, 8] has been overlooked 
as a viable approach for circuit partitioning. 
In [16], a method was proposed for exactly
modeling a circuit netlist (or, equivalently, a
hypergraph) by a flow network, and an algorithm
for balanced bipartition based on repeated
applications of the max-flow min-cut technique
was proposed as well. Our algorithm has the same
asymptotic time complexity as one max-flow
computation.

A circuit netlist is defined as a digraph N =
(V, E), where V is a set of nodes representing
logic gates and registers and E is a set of edges
representing wires between gates and registers.
Each node v ∈ V has a weight w(v) ∈ R⁺. The
total weight of a subset U ⊆ V is denoted by
w(U) = Σ_{v∈U} w(v). W = w(V) denotes the total
weight of the circuit. A net n = (v; v_1, ..., v_l)
is a set of outgoing edges from node v in N.
Given two nodes s and t in N, an s-t cut (or
cut for short) (X, X̄) of N is a bipartition of the
nodes in V such that s ∈ X and t ∈ X̄. The
net-cut net(X, X̄) of the cut is the set of nets in
N that are incident to nodes in both X and X̄.
A cut (X, X̄) is a min-net-cut if |net(X, X̄)| is
minimum among all s-t cuts of N. In Fig. 1, net
a = (r1; g1, g2), net-cuts net(X, X̄) = {b, e}
and net(Y, Ȳ) = {c, a, b, e}, and (X, X̄) is a
min-net-cut.

Formally, given an aspect ratio r and a deviation
factor ε, min-cut r-balanced bipartition is
the problem of finding a bipartition (X, X̄) of the
netlist N such that (1) (1 − ε)rW ≤ w(X) ≤
(1 + ε)rW and (2) the size of the cut net(X, X̄)
is minimum among all bipartitions satisfying (1).
When r = 1/2, this becomes a min-cut balanced-bipartition
problem.
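
The two definitions above, the net-cut of a bipartition and the r-balance condition, translate directly into code. The sketch below uses a small made-up netlist (not the one in Fig. 1) with unit node weights; each net is stored as the list of nodes it touches, and the names `net_cut` and `is_r_balanced` are hypothetical.

```python
def net_cut(nets, x_side):
    """Names of nets incident to nodes on both sides of the bipartition."""
    return {name for name, nodes in nets.items()
            if any(v in x_side for v in nodes)
            and any(v not in x_side for v in nodes)}

def is_r_balanced(weights, x_side, r, eps):
    """Check (1 - eps) * r * W <= w(X) <= (1 + eps) * r * W."""
    total = sum(weights.values())
    w_x = sum(weights[v] for v in x_side)
    return (1 - eps) * r * total <= w_x <= (1 + eps) * r * total

# Hypothetical netlist: each net lists its driving node first, then its sinks.
nets = {
    "a": ["r1", "g1", "g2"],
    "b": ["g1", "g3"],
    "c": ["g2", "g3"],
    "d": ["g3", "r2"],
}
weights = {"r1": 1, "g1": 1, "g2": 1, "g3": 1, "r2": 1}
x_side = {"r1", "g1", "g2"}

cut = net_cut(nets, x_side)
balanced = is_r_balanced(weights, x_side, r=0.5, eps=0.25)
print(sorted(cut), balanced)
```

With five unit-weight nodes, r = 1/2 and ε = 0.25, the admissible range for w(X) is 1.875 to 3.125, so a side of weight 3 is acceptable, and the cut consists of the two nets that straddle the partition.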


Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Fig. 1 A circuit netlist with two net-cuts

Fig. 2 Modeling a net in N in the flow network N′ (left: a net n in circuit N; right: the nodes and edges corresponding to net n in N′, including a bridging edge with unit capacity)

Fig. 3 The flow network for Fig. 1


Key Results 


Optimal-Network-Flow-Based Min-Net-Cut 
Bipartition 

The problem of finding a min-net-cut in N =
(V, E) is reduced to the problem of finding a cut
of minimum capacity. Then the latter problem is
solved using the max-flow min-cut technique. A
flow network N′ = (V′, E′) is constructed from
N = (V, E) as follows (see Figs. 2 and 3):


1. V′ contains all nodes in V.

2. For each net n = (v; v_1, ..., v_l) in N, add
two nodes n_1 and n_2 in V′ and a bridging edge
bridge(n) = (n_1, n_2) in E′.

3. For each node u ∈ {v, v_1, ..., v_l} incident on
net n, add two edges (u, n_1) and (n_2, u) in E′.

4. Let s be the source of N′ and t the sink of N′.

5. Assign unit capacity to all bridging edges and
infinite capacity to all other edges in E′.

6. For a node v ∈ V′ corresponding to a node in
V, w(v) is the weight of v in N. For a node
u ∈ V′ split from a net, w(u) = 0.



Note that all nodes incident on net $n$ are connected to $n_1$ and are connected from $n_2$ in $N'$. Hence, the flow network construction is symmetric with respect to all nodes incident on a net. This construction also works when the netlist is represented as a hypergraph.

It is clear that N’ is a strongly connected 
digraph. This property is the key to reduc- 
ing the bidirectional min-net-cut problem 
to a minimum-capacity cut problem that 
counts the capacity of the forward edges 
only. 


Theorem 1 $N$ has a cut of net-cut size at most $C$ if and only if $N'$ has a cut of capacity at most $C$.


Corollary 1 Let $(X', \bar{X}')$ be a cut of minimum capacity $C$ in $N'$. Let $N_{cut} = \{n \mid bridge(n) \in (X', \bar{X}')\}$. Then $N_{cut} = net(X, \bar{X})$ is a min-net-cut in $N$ and $|N_{cut}| = C$.


Corollary 2 A min-net-cut in a circuit N = 
(V, E) can be found in O(|V ||E|) time. 
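The construction of N' and the resulting min-net-cut computation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes nets are given as lists of node names (none of which is a tuple), uses a plain breadth-first augmenting-path max-flow rather than any particular algorithm from the literature, and the helper names `build_flow_network`, `max_flow`, and `min_net_cut` are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def build_flow_network(nets):
    """Residual capacities cap[u][v] of N' for nets given as {name: [nodes]}."""
    cap = defaultdict(lambda: defaultdict(float))
    for name, nodes in nets.items():
        n1, n2 = (name, 1), (name, 2)
        cap[n1][n2] = 1.0      # bridging edge with unit capacity
        cap[n2][n1] += 0.0     # residual reverse arc
        for u in nodes:
            cap[u][n1] = INF   # ordinary edge (u, n1)
            cap[n1][u] += 0.0
            cap[n2][u] = INF   # ordinary edge (n2, u)
            cap[u][n2] += 0.0
    return cap


def max_flow(cap, s, t):
    """Augment along BFS paths; return (flow value, nodes reachable from s at the end)."""
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def min_net_cut(nets, s, t):
    """Return (cut size, set of cut nets, circuit nodes on the s side)."""
    cap = build_flow_network(nets)
    value, reachable = max_flow(cap, s, t)
    cut = {n for n in nets if (n, 1) in reachable and (n, 2) not in reachable}
    X = {v for v in reachable if not isinstance(v, tuple)}
    return value, cut, X


# Hypothetical netlist: net "a" is the only net touching s.
nets = {"a": ["s", "u", "v"], "b": ["u", "t"], "c": ["v", "t"], "d": ["u", "v"]}
value, cut, X = min_net_cut(nets, "s", "t")
```

The cut nets are read off exactly as in Corollary 1: a net is cut when the tail of its bridging edge is reachable from s in the residual network and the head is not.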
Min-Cut Balanced-Bipartition Heuristic


First, a repeated max-flow min-cut heuristic al- 
gorithm, flow-balanced bipartition (FBB), is de- 
veloped for finding an r-balanced bipartition that 
minimizes the number of crossing nets. Then, an 
efficient implementation of FBB is developed that 
has the same asymptotic time complexity as one 
max-flow computation. For ease of presentation, 
the FBB algorithm is described on the original 
circuit rather than the flow network constructed 
from the circuit. The heuristic algorithm is de- 
scribed in Fig. 4. Figure 5 shows an example. 
Table 1 compares the best bipartition net- 
cut sizes of FBB with those produced by the 
analytical-method-based partitioners EIG1 [7] 
and PARABOLI (PB) [13]. The results produced 
by PARABOLI were the best previously known 
results reported on the benchmark circuits. The 
results for FBB were the best of ten runs. 
On average, FBB outperformed EIG1 and PARABOLI by 58.1 and 11.3%, respectively. For circuit s38417, the suboptimal result from FBB can be improved by (1) running more times and (2) applying clustering techniques to the circuit based on connectivity before partitioning.


Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Fig. 4 FBB algorithm


In the FBB algorithm, the node-collapsing 
method is chosen instead of a more gradual 
method (e.g., [9]) to ensure that the capacity 
of a cut always reflects the real net-cut size. To 
pick a node at steps 4.2 and 5.2, a threshold R is 
given for the number of nodes in the uncollapsed 
subcircuit. A node is randomly picked if the 
number of nodes is larger than R. Otherwise, 
all nodes adjacent to C are tried and the one 
whose collapse induces a min-net-cut with the 
smallest size is picked. A naive implementation 
of step 2 by computing the max-flow from the 
zero flow would incur a high time complexity. 
Instead, the flow value in the flow network 
is retained, and additional flow is explored to 
saturate the bridging edges of the min-net-cut 
from one iteration to the next. The procedure is 
shown in Fig. 6. Initially, the flow network retains 
the flow function computed in the previous 
iteration. Since the max-flow computation using 
the augmenting-path method is insensitive to 
the initial flow values in the flow network and 
the order in which the augmenting paths are 
found, the above procedure correctly finds a 
max-flow with the same flow value as a max-flow 
computed in the collapsed flow network from the 
zero flow. 


Algorithm: Flow-Balanced-Bipartition (FBB)

1. Pick a pair of nodes $s$ and $t$ in $N$;
2. Find a min-net-cut $C$ in $N$;
   Let $X$ be the subcircuit reachable from $s$ through augmenting paths in the flow network, and $\bar{X}$ the rest;
3. if $(1 - \epsilon)rW \le w(X) \le (1 + \epsilon)rW$
   return $C$ as the answer;
4. if $w(X) < (1 - \epsilon)rW$
   4.1. Collapse all nodes in $X$ to $s$;
   4.2. Pick a node $v \in \bar{X}$ adjacent to $C$ and collapse it to $s$;
   4.3. Goto 1;
5. if $w(X) > (1 + \epsilon)rW$
   5.1. Collapse all nodes in $\bar{X}$ to $t$;
   5.2. Pick a node $v \in X$ adjacent to $C$ and collapse it to $t$;
   5.3. Goto 1;
Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Fig. 5 FBB on the example in Fig. 3 for $r = 1/2$, $\epsilon = 0.15$ and unit weight for each node. The algorithm terminates after finding cut $(X_2, \bar{X}_2)$. A small solid node indicates that the bridging edge corresponding to the net is saturated with flow (legend: an unsaturated net; a saturated net; a node to be collapsed to s or t)


Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Table 1 Comparison of EIG1, PB, and FBB ($r = 1/2$, $\epsilon = 0.1$). All allow <10 % deviation

Circuit                                        Best net-cut size     Improve. % over   FBB
Name     Gates and latches  Nets    Avg. deg   EIG1   PB    FBB      EIG1    PB        elaps. sec.
s1423    731                743     2.7        23     16    13       43.5    18.8      1.7
s9234    5,808              5,805   2.4        227    74    70       69.2    5.4       55.7
s13207   8,696              8,606   2.4        241    91    74       69.3    18.9      100.0
s15850   10,310             10,310  2.4        215    91    67       68.8    26.4      96.5
s35932   18,081             17,796  2.7        105    62    49       53.3    21.0      2,808
s38584   20,859             20,593  2          76     55    47       38.2    14.5      1,130
s38417   24,033             23,955  2.4        121    49    58       52.1    -18.4     2,736
Average                                                              58.5    11.3


Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Fig. 6 Incremental max-flow computation

Procedure: Incremental Flow Computation

1. while $\exists$ an additional augmenting path from $s$ to $t$
   increase flow value along the augmenting path;
2. Mark all nodes $u$ s.t. $\exists$ an augmenting path from $s$ to $u$;
3. Let $C'$ be the set of bridging edges whose starting nodes are marked and ending nodes are not marked;
4. Return the nets corresponding to the bridging edges in $C'$ as the min-net-cut $C$, and the marked nodes as $X$.
Theorem 2 FBB has time complexity $O(|V||E|)$ for a connected circuit $N = (V, E)$.


Theorem 3 The number of iterations and the final net-cut size are nonincreasing functions of $\epsilon$.


In practice, FBB terminates much faster than this 
worst-case time complexity as shown in the sec- 
tion “Experimental Results.” Theorem 3 allows 
us to improve the efficiency of FBB and the 
partition quality for a larger €. This is not true for 
other partitioning approaches such as the K&L 
heuristics. 


Applications 


Circuit partitioning is a fundamental problem 
in many areas of VLSI layout and design 
automation. The FBB algorithm provides the 
first efficient predictable solution to the min- 
cut balanced-circuit-partitioning problem. It 
directly relates the efficiency and the quality 
of the solution produced by the algorithm to 
the deviation factor €. The algorithm can be 
easily extended to handle nets with different 
weights by simply assigning the weight of a 
net to its bridging edge in the flow network. 
K-way min-cut partitioning for K > 2 can be 
accomplished by recursively applying FBB or 
by setting r = 1/K and then using FBB to find 
one partition at a time. A flow-based method 
for directly solving the problem can be found in 
[12]. Prepartitioning circuit clustering according 
to the connectivity or the timing information of 
the circuit can be easily incorporated into FBB
by treating a cluster as a node. Heuristic solutions
based on K&L heuristics or simulated annealing 
with low temperature can be used to further fine- 
tune the solution. 


Experimental Results 


The FBB algorithm was implemented in SIS/MISII [1] and tested on a set of large ISCAS and MCNC benchmark circuits on a SPARC 10 workstation with a 36-MHz CPU and 32 MB of memory.

Table 2 compares the average bipartition re- 
sults of FBB with those reported by Dasdan and 
Aykanat in [3]. SN is based on the K&L heuristic 
algorithm in Sanchis [14]. PFM3 is based on 
the K&L heuristic with free moves as described 
in [3]. For each circuit, SN was run 20 times 
and PFM3 10 times from different randomly 
generated initial partitions. FBB was run 10 times 
from different randomly selected $s$ and $t$. With
only one exception, FBB outperformed both SN 
and PFM3 on the five circuits. On average, FBB 
found a bipartition with 24.5 and 19.0% fewer 
crossing nets than SN and PFM3, respectively. 
The runtimes of SN, PFM3, and FBB were not 
compared since they were run on different work- 
stations. 
Circuit Partitioning: A Network-Flow-Based Balanced Min-Cut Approach, Table 2 Comparison of SN, PFM3, and FBB ($r = 1/2$, $\epsilon = 0.1$)

Circuit                                       Avg. net-cut size      FBB bipart.   Improve. %
Name    Gates and latches  Nets    Avg. deg   SN     PFM3   FBB      ratio         Over SN   Over PFM3
C1355   514                523     3.0        38.9   29.1   26.0     1:1.08        33.2      10.7
C2670   1,161              1,254   2.6        51.9   46.0   37.1     1:1.15        28.5      19.3
C3540   1,667              1,695   2.7        90.3   71.0   79.8     1:1.11        11.6      -12.4
C7552   3,466              3,565   2.7        44.3   81.8   42.9     1:1.08        3.2       47.6
s838    478                511     2.6        27.1   21.0   14.7     1:1.04        45.8      30.0
Avg.                                                                 1:1.10        24.5      19.0


Recommended Reading


1. Brayton RK, Rudell R, Sangiovanni-Vincentelli AL (1987) MIS: a multiple-level logic optimization. IEEE Trans CAD 6(6):1061-1081

2. Cong J, Hagen L, Kahng A (1992) Net partitions yield better module partitions. In: Proceedings of the 29th ACM/IEEE design automation conference, Anaheim, pp 47-52

3. Dasdan A, Aykanat C (1994) Improved multiple-way circuit partitioning algorithms. In: International ACM/SIGDA workshop on field programmable gate arrays, Berkeley

4. Fiduccia CM, Mattheyses RM (1982) A linear time heuristic for improving network partitions. In: Proceedings of the ACM/IEEE design automation conference, Las Vegas, pp 175-181

5. Garey M, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, Gordonsville

6. Goldberg AW, Tarjan RE (1988) A new approach to the maximum flow problem. J ACM 35:921-940

7. Hagen L, Kahng AB (1991) Fast spectral methods for ratio cut partitioning and clustering. In: Proceedings of the IEEE international conference on computer-aided design, Santa Clara, pp 10-13

8. Hu TC, Moerder K (1985) Multiterminal flows in a hypergraph. In: Hu TC, Kuh ES (eds) VLSI circuit layout: theory and design. IEEE, New York, pp 87-93

9. Iman S, Pedram M, Fabian C, Cong J (1993) Finding uni-directional cuts based on physical partitioning and logic restructuring. In: 4th ACM/SIGDA physical design workshop, Lake Arrowhead

10. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671-680

11. Kernighan B, Lin S (1970) An efficient heuristic procedure for partitioning of electrical circuits. Bell Syst Tech J 49:291-307

12. Liu H, Wong DF (1998) Network-flow-based multiway partitioning with area and pin constraints. IEEE Trans CAD Integr Circuits Syst 17(1):50-59

13. Riess BM, Doll K, Frank MJ (1994) Partitioning very large circuits using analytical placement techniques. In: Proceedings of the 31st ACM/IEEE design automation conference, San Diego, pp 646-651

14. Sanchis LA (1989) Multiway network partitioning. IEEE Trans Comput 38(1):62-81

15. Wei YC, Cheng CK (1989) Towards efficient hierarchical designs by ratio cut partitioning. In: Proceedings of the IEEE international conference on computer-aided design, Santa Clara, pp 298-301

16. Yang H, Wong DF (1994) Efficient network flow based min-cut balanced partitioning. In: Proceedings of the IEEE international conference on computer-aided design, San Jose, pp 50-55


Circuit Placement 


Andrew A. Kennings^1 and Igor L. Markov^2
^1 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
^2 Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA


Keywords 


Algorithm; Circuit; Combinatorial optimization; 
Hypergraph; Large-scale optimization; Lin- 
ear programming; Network flow; Nonlinear 
optimization; Partitioning; Physical design; 
Placement; VLSI CAD 


Synonyms 


Analytical placement; EDA; Layout; Mathemat- 
ical programming; Min-cost max-flow; Min-cut 
placement; Netlist 


Years and Authors of Summarized 
Original Work 


2000; Caldwell, Kahng, Markov 
2006; Kennings, Vorwerk 
2012; Kim, Lee, Markov 


Problem Definition 


This problem is concerned with determining con- 
strained positions of objects while minimizing 
a measure of interconnect between the objects, 
as in physical layout of integrated circuits, com- 
monly done in 2 dimensions. While most formu- 
lations are NP-hard, modern circuits are so large 
that practical placement algorithms must have 
near-linear run time and memory requirements, 
but not necessarily produce optimal solutions. 
Research in placement algorithms has identified 
scalable techniques which are now being adopted 
in the electronic design automation industry. 
One models a circuit by a hypergraph $G_h(V_h, E_h)$ with (i) vertices $V_h = \{v_1, \ldots, v_n\}$ representing logic gates, standard cells, larger modules, or fixed I/O pads and (ii) hyperedges $E_h = \{e_1, \ldots, e_m\}$ representing connections between modules. Vertices and hyperedges connect through pins for a total of $P$ pins in the hypergraph. Each vertex $v_i \in V_h$ has width $w_i$, height $h_i$, and area $A_i$. Hyperedges may also be weighted. Circuit placement seeks center positions $(x_i, y_i)$ for vertices that optimize a hypergraph-based objective subject to constraints (see below). A placement is captured by $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$.


Objective: Let $C_k$ be the index set of the hypergraph vertices incident to hyperedge $e_k$. The total half-perimeter wire length (HPWL) of the circuit hypergraph is given by

$$HPWL(G_h) = \sum_{e_k \in E_h} HPWL(e_k) = \sum_{e_k \in E_h} \left[ \max_{i,j \in C_k} |x_i - x_j| + \max_{i,j \in C_k} |y_i - y_j| \right].$$

HPWL is piecewise linear, separable in the x and y directions, convex, but not strictly convex. Among many objectives for circuit placement, it is the simplest and most common.
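As a small illustration of the objective, the following sketch evaluates HPWL for a placement. It assumes each hyperedge is given as a list of vertex indices (the index set C_k) and that positions are stored in two lists; the function name `hpwl` and the example data are hypothetical.

```python
def hpwl(nets, x, y):
    """Total half-perimeter wire length over all hyperedges.

    nets: list of index lists, one per hyperedge (the sets C_k)
    x, y: center coordinates of the vertices
    """
    total = 0.0
    for pins in nets:
        xs = [x[i] for i in pins]
        ys = [y[i] for i in pins]
        # The largest pairwise distance in each direction is max minus min.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical example: one 3-pin net and one 2-pin net.
nets = [[0, 1, 2], [1, 3]]
x = [0, 4, 1, 4]
y = [0, 0, 3, 2]
total = hpwl(nets, x, y)
```

The first net contributes 4 + 3 and the second 0 + 2, so the total is 9.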


Constraints: 


1. No overlap. The area occupied by any two vertices cannot overlap; i.e., either $|x_i - x_j| \ge \frac{1}{2}(w_i + w_j)$ or $|y_i - y_j| \ge \frac{1}{2}(h_i + h_j)$, $\forall v_i, v_j \in V_h$.

2. Fixed outline. Each vertex $v_i \in V_h$ must be placed entirely within a specified rectangular region bounded by $x_{min}$ ($y_{min}$) and $x_{max}$ ($y_{max}$), which denote the left (bottom) and right (top) boundaries of the specified region.

3. Discrete slots. There is only a finite num- 
ber of discrete positions, typically on a grid. 
However, in large-scale circuit layout, slot 
constraints are often ignored during global 
placement and enforced only during legaliza- 
tion and detail placement. 


Other constraints may include alignment, mini- 
mum and maximum spacing, etc. Many place- 
ment techniques temporarily relax overlap con- 
straints into density constraints to avoid vertices 
clustered in small regions. An $m \times n$ regular bin structure $B$ is superimposed over the fixed outline and vertex area is assigned to bins based on the positions of vertices. Let $D_{ij}$ denote the density of bin $B_{ij} \in B$, defined as the total cell area assigned to bin $B_{ij}$ divided by its capacity. Vertex overlap is limited implicitly by satisfying $D_{ij} \le K$, $\forall B_{ij} \in B$, for some $K \le 1$ (density target).
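The bin densities can be computed as in the following sketch. It rests on assumptions the text does not fix: the grid is taken as m columns by n rows, each vertex contributes to a bin exactly the area of its rectangle that overlaps the bin, and bin capacity is the bin area. The function name `bin_densities` and the example cells are hypothetical.

```python
def bin_densities(cells, x_min, y_min, x_max, y_max, m, n):
    """Return D[i][j] for an m-column by n-row grid over the fixed outline.

    cells: list of (center_x, center_y, width, height)
    """
    bw = (x_max - x_min) / m   # bin width
    bh = (y_max - y_min) / n   # bin height
    area = [[0.0] * n for _ in range(m)]
    for cx, cy, w, h in cells:
        left, right = cx - w / 2, cx + w / 2
        bottom, top = cy - h / 2, cy + h / 2
        for i in range(m):
            ox = min(right, x_min + (i + 1) * bw) - max(left, x_min + i * bw)
            if ox <= 0:
                continue
            for j in range(n):
                oy = min(top, y_min + (j + 1) * bh) - max(bottom, y_min + j * bh)
                if oy > 0:
                    area[i][j] += ox * oy
    capacity = bw * bh
    return [[a / capacity for a in column] for column in area]


# Hypothetical example: a 4 x 2 outline split into two 2 x 2 bins.
cells = [(1, 1, 2, 2), (2, 1, 2, 1)]
D = bin_densities(cells, 0, 0, 4, 2, m=2, n=1)
K = 1.0
overfull = [(i, j) for i in range(2) for j in range(1) if D[i][j] > K]
```

In this example the left bin receives area 5 against a capacity of 4, so it violates a density target of K = 1.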


Problem 1 (Circuit Placement) INPUT: Circuit hypergraph $G_h(V_h, E_h)$ and a fixed outline for the placement area.

OUTPUT: Positions for each vertex $v_i \in V_h$ such that (1) wire length is minimized and (2) the area-density constraints $D_{ij} \le K$ are satisfied for all $B_{ij} \in B$.


Key Results 


An unconstrained optimal position of a single 
placeable vertex connected to fixed vertices can 
be found in linear time as the median of adja- 
cent positions [7]. Unconstrained HPWL mini- 
mization for multiple placeable vertices can be 
formulated as a linear program [6, 11]. For each $e_k \in E_h$, upper and lower bound variables $U_k$ and $L_k$ are added. The cost of $e_k$ (x-direction only) is the difference between $U_k$ and $L_k$. Each $U_k$ ($L_k$) comes with $p_k$ inequality constraints that restrict its value to be larger (smaller) than the position of every vertex $i \in C_k$.
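The single-vertex case mentioned at the start of this section can be sketched directly. The sketch assumes that the placeable vertex is connected to each fixed vertex by a 2-pin edge, so the cost is a sum of Manhattan distances and the x and y directions separate. It uses `statistics.median`, which sorts its input and is therefore O(n log n) rather than the linear-time selection the text refers to; the helper names are hypothetical.

```python
from statistics import median


def optimal_position(neighbors):
    """Median of the fixed neighbor positions, taken independently in x and y."""
    xs = [p[0] for p in neighbors]
    ys = [p[1] for p in neighbors]
    return median(xs), median(ys)


def wirelength(pos, neighbors):
    """Sum of Manhattan distances from pos to every fixed neighbor."""
    return sum(abs(pos[0] - nx) + abs(pos[1] - ny) for nx, ny in neighbors)


# Hypothetical fixed neighbors of one movable vertex.
fixed = [(0, 0), (4, 1), (10, 5)]
best = optimal_position(fixed)
best_cost = wirelength(best, fixed)
```

Moving the vertex away from the median, for example to (5, 2), can only increase the total distance.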

Linear programming has poor scalability and 
integrating constraint-tracking into optimization 
is difficult. Other approaches include nonlinear 
optimization and partitioning-based methods. 


Combinatorial Techniques for Wire Length 
Minimization 

The no-overlap constraints are not convex and 
cannot be directly added to the linear program 
for HPWL minimization. Vertices often cluster 
in small regions of high density. One can lower 
bound the distance between closely placed ver- 
tices with a single linear constraint that depends 
on the relative placement of these vertices [11]. 
The resulting optimization problem is incremen- 
tally resolved, and the process repeats until the 
desired density is achieved. 
The min-cut placement technique is based 
on balanced min-cut partitioning of hypergraphs 
and is more focused on density constraints [12]. 
Vertices of the initial hypergraph are first parti- 
tioned in two similar-sized groups. One of them 
is assigned to the left half of the placement 
region, and the other one to the right half. Parti- 
tioning is performed by the Multilevel Fiduccia- 
Mattheyses (MLFM) heuristic [10] to minimize 
connections between the two groups of vertices 
(the net-cut objective). Each half is partitioned 
again but takes into account the connections to 
the other half [12]. At the large scale, ensuring 
the similar sizes of bipartitions corresponds to 
density constraints, and cut minimization corre- 
sponds to HPWL minimization. When regions 
become small and contain <10 vertices, optimal 
positions can be found with respect to discrete 
slot constraints by branch-and-bound [2]. Balanced hypergraph partitioning is NP-hard [4], but the MLFM heuristic takes $O((V + E) \log V)$ time. The entire min-cut placement procedure takes $O((V + E)(\log V)^2)$ time and can process hypergraphs with millions of vertices in several hours.

A special case of interest is that of one- 
dimensional placement. When all vertices 
have identical width and none of them are 
fixed, one obtains the NP-hard MINIMUM 
LINEAR ARRANGEMENT problem [4] which 
can be approximated in polynomial time within 
O(log V) and solved exactly for trees in O(V +) 
time as shown by Yannakakis. The min-cut 
technique described above also works well for 
the related NP-hard MINIMUM-CUT LINEAR 
ARRANGEMENT problem [4]. 


Quadratic and Nonlinear Wire Length 
Approximations 

Quadratic and generic nonlinear optimization 
may be faster than linear programming 
while reasonably approximating the original 
formulation. 


Quadratic, Linearized Quadratic, and 
Bound-to-Bound Placement 

The hypergraph is represented by a weighted graph where $w_{ij}$ represents the weight on the 2-pin edge connecting vertices $v_i$ and $v_j$ in the weighted graph. When an edge is absent, $w_{ij} = 0$, and in general $w_{ii} = -\sum_{j \ne i} w_{ij}$. A quadratic placement (x-direction only) is given by

$$\Phi(x) = \sum_{i,j} w_{ij} (x_i - x_j)^2 = \frac{1}{2} x^T Q x + c^T x + \text{const}. \qquad (1)$$

The global minimum of $\Phi(x)$ is found by solving $Qx + c = 0$, which is a sparse, symmetric positive definite system of linear equations (assuming $\ge 1$ fixed vertex), efficiently solved using any number of iterative solvers. Quadratic placement may have different optima depending on the model (clique or star) used to represent hyperedges. However, for a $k$-pin hyperedge, if $w_{ij} = W_e$ in a clique model and $w_{ij} = kW_e$ in a star model, then the models are equivalent in quadratic placement [6].
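The following sketch builds Q and c for a one-dimensional instance and solves Qx + c = 0 with a textbook conjugate gradient iteration. It is an illustration under stated assumptions: every edge is counted once in the sum, edges are either movable-to-movable or movable-to-fixed, and dense Python lists stand in for the sparse matrices a real placer would use. The names `build_system` and `conjugate_gradient` are hypothetical.

```python
def build_system(n, movable_edges, fixed_edges):
    """Q and c such that sum_edges w (xi - xj)^2 = 1/2 x^T Q x + c^T x + const.

    movable_edges: (i, j, w) between movable vertices i and j
    fixed_edges:   (i, p, w) between movable vertex i and a fixed position p
    """
    Q = [[0.0] * n for _ in range(n)]
    c = [0.0] * n
    for i, j, w in movable_edges:
        Q[i][i] += 2 * w
        Q[j][j] += 2 * w
        Q[i][j] -= 2 * w
        Q[j][i] -= 2 * w
    for i, p, w in fixed_edges:
        Q[i][i] += 2 * w
        c[i] -= 2 * w * p
    return Q, c


def conjugate_gradient(Q, b, tol=1e-12, max_iter=1000):
    """Solve Q x = b for a symmetric positive definite Q."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # residual b - Q x for the starting point x = 0
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Qp = [sum(Q[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Qp[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Qp[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x


# Hypothetical chain: pad at 0 - v0 - v1 - pad at 3, all weights 1.
Q, c = build_system(2, movable_edges=[(0, 1, 1.0)],
                    fixed_edges=[(0, 0.0, 1.0), (1, 3.0, 1.0)])
x = conjugate_gradient(Q, [-ci for ci in c])   # Qx + c = 0  <=>  Qx = -c
```

For this chain the minimizer spaces the two movable vertices evenly between the pads, at positions 1 and 2.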

Quadratic placement can produce lower quality placements. To approximate a linear objective, one can iteratively solve Eq. 1 with $w_{ij} = 1/|x_i - x_j|$ computed at every iteration. Alternatively, one can solve a single $\beta$-regularized optimization problem given by $\Phi^{\beta}(x) = \min_x \sum_{i,j} w_{ij} \sqrt{(x_i - x_j)^2 + \beta}$, $\beta > 0$, e.g., using a Primal-Dual Newton method [1].

In bound-to-bound placement, instead of a 
clique or star model, hyperedges are decomposed 
based on the relative placement of vertices. For 
a $k$-pin hyperedge, the extreme vertices (min and max) are connected to each other and to each internal vertex with weights $w_{ij} = 1/((k - 1)|x_i - x_j|)$. With these weights, the quadratic
objective captures HPWL exactly, but only for the 
given placement. As placement changes, updates 
to the quadratic placement objective are required 
to reduce discrepancies [8]. 
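The exactness claim can be checked numerically with a short sketch. It assumes that each bound-to-bound edge is counted once in the quadratic sum and that pin positions are distinct; a small guard value avoids division by zero for coincident pins, at the price of a slight inexactness in that case. The function names are hypothetical.

```python
def b2b_edges(xs, guard=1e-9):
    """Bound-to-bound edges (i, j, weight) for one k-pin hyperedge with pin coordinates xs."""
    k = len(xs)
    lo = min(range(k), key=lambda i: xs[i])
    hi = max(range(k), key=lambda i: xs[i])
    edges = []

    def add(i, j):
        dist = max(abs(xs[i] - xs[j]), guard)
        edges.append((i, j, 1.0 / ((k - 1) * dist)))

    add(lo, hi)                      # the two extreme vertices
    for mid in range(k):
        if mid != lo and mid != hi:  # each internal vertex to both extremes
            add(lo, mid)
            add(hi, mid)
    return edges


def quadratic_cost(xs, edges):
    return sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in edges)


# Hypothetical 4-pin hyperedge; its HPWL in x is 7 - 1 = 6.
xs = [1.0, 4.0, 2.0, 7.0]
edges = b2b_edges(xs)
cost = quadratic_cost(xs, edges)
```

Each of the 2k - 3 edges contributes its length divided by k - 1, and the contributions add up to the span max - min, which is the HPWL of the hyperedge in this direction.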


Half-Perimeter Wire Length Placement

HPWL can be provably approximated by strictly convex and differentiable functions. For 2-pin hyperedges, $\beta$-regularization can be used [1]. For a $k$-pin hyperedge ($k \ge 3$), one can rewrite HPWL as the maximum ($l_\infty$-norm) of all $k(k - 1)/2$ pairwise distances $|x_i - x_j|$ and approximate the $l_\infty$-norm by the $l_p$-norm. This removes all non-differentiabilities except at 0, which is then removed with $\beta$-regularization. The resulting HPWL approximation is given by

$$HPWL_{REG}(G_h) = \sum_{e_k \in E_h} \left( \sum_{i,j \in C_k} |x_i - x_j|^p + \beta \right)^{1/p} \qquad (2)$$

which overestimates HPWL with arbitrarily small relative error as $p \to \infty$ and $\beta \to 0$ [6].

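Alternatively, HPWL can be approximated via the log-sum-exp (LSE) formula given (x-direction only) by

$$HPWL_{LSE}(G_h) = \alpha \sum_{e_k \in E_h} \left[ \ln \sum_{v_i \in C_k} \exp\left(\frac{x_i}{\alpha}\right) + \ln \sum_{v_i \in C_k} \exp\left(\frac{-x_i}{\alpha}\right) \right] \qquad (3)$$

where $\alpha > 0$ is a smoothing parameter [5]. Both approximations can be optimized using conjugate gradient methods. Other convex and differentiable HPWL approximations exist.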
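A numerical sketch of the log-sum-exp approximation in the x-direction follows. It assumes the two-term form (one term smoothing the maximum, one smoothing the minimum) and shifts the exponents by the extreme coordinate before exponentiating, a standard stability trick that does not change the value. The function name `lse_wirelength_x` and the example data are hypothetical.

```python
import math


def lse_wirelength_x(nets, x, alpha):
    """Log-sum-exp smoothing of sum over nets of (max x - min x)."""
    total = 0.0
    for pins in nets:
        xs = [x[i] for i in pins]
        hi, lo = max(xs), min(xs)
        # alpha * ln(sum exp(x/alpha)), computed as hi + alpha * ln(sum exp((x - hi)/alpha))
        smooth_max = hi + alpha * math.log(sum(math.exp((v - hi) / alpha) for v in xs))
        # alpha * ln(sum exp(-x/alpha)), computed as -lo + alpha * ln(sum exp((lo - x)/alpha))
        smooth_neg_min = -lo + alpha * math.log(sum(math.exp((lo - v) / alpha) for v in xs))
        total += smooth_max + smooth_neg_min
    return total


# Hypothetical 3-pin net with exact x-span 4.
nets = [[0, 1, 2]]
x = [0.0, 4.0, 1.0]
tight = lse_wirelength_x(nets, x, alpha=0.01)
loose = lse_wirelength_x(nets, x, alpha=1.0)
```

The approximation never underestimates the span, and for a k-pin net it exceeds it by at most 2α ln k, so a smaller α gives a tighter but less smooth objective.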


Analytic Techniques for Target Density
Constraints

The target density constraints are non-differentiable and are typically handled by approximation.


Force-Based Spreading 

The key idea is to add constant forces $f$ that pull vertices away from overlaps and to recompute the forces over multiple iterations to reflect changes in vertex distribution. For quadratic placement, the new optimality conditions are $Qx + c + f = 0$ [7]. The constant force can perturb a placement in any number of ways to satisfy the target density constraints. The force $f$ is computed using a discrete version of Poisson's equation.


Fixed-Point Spreading

A fixed point $f$ is a pseudo-vertex with zero area, fixed at $(x_f, y_f)$, and connected to one vertex $H(f)$ in the hypergraph through the use of a pseudo-edge with weight $w_{f,H(f)}$. Each fixed point introduces a single quadratic term into the objective function; quadratic placement with fixed points is given by $\Phi(x) = \sum_{i,j} w_{ij} (x_i - x_j)^2 + \sum_f w_{f,H(f)} (x_{H(f)} - x_f)^2$. By manipulating the positions of fixed points, one can perturb a placement to satisfy the target density constraints. Fixed points improve the controllability and stability of placement iterations, in particular by improving the condition number of the resulting numerical problem instances. A particularly effective approach to find fixed points is through the use of fast LookAhead Legalization (LAL) [8, 9]. Given locations found by quadratic placement, LAL gradually modifies them into a relatively overlap-free placement that satisfies density constraints and seeks to preserve the ordering of x and y positions, while avoiding unnecessary movement. The resulting locations are used as fixed target points. LAL can be performed by top-down geometric partitioning with nonlinear scaling between partitions. As described in [8, 9], this approach is particularly effective at handling rectilinear obstacles. Subsequent work developed extensions to account for routing congestion and other considerations arising in global placement. At the most recent (ISPD 2014) placement contest, the contestants ranked in the top three used the framework outlined in [8].


Generalized Force-Directed Spreading

The Helmholtz equation models a diffusion process, which makes it well suited to spreading vertices [3]. The Helmholtz equation is given by

$$\frac{\partial^2 \phi(x, y)}{\partial x^2} + \frac{\partial^2 \phi(x, y)}{\partial y^2} - \epsilon \phi(x, y) = D(x, y), \quad (x, y) \in R$$

$$\frac{\partial \phi}{\partial v} = 0, \quad (x, y) \text{ on the boundary of } R \qquad (4)$$
where $\epsilon > 0$, $v$ is an outer unit normal, $R$ represents the fixed outline, and $D(x, y)$ represents the continuous density function. The boundary conditions, $\partial \phi / \partial v = 0$, specify that forces pointing outside of the fixed outline be set to zero. This is a key difference from the Poisson method, which assumes that forces become zero at infinity. The value $\phi_{ij}$ at the center of each bin $B_{ij}$ is found by discretization of Eq. 4 using finite differences. The density constraints are replaced by $\phi_{ij} = \hat{K}$, $\forall B_{ij} \in B$, where $\hat{K}$ is a scaled representative of the density target $K$. Wire length minimization subject to the smoothed density constraints can be solved via Uzawa's algorithm. For quadratic wire length, this algorithm is a generalization of force-based spreading.


Potential Function Spreading 

Target density constraints can also be satisfied via a penalty function. The area assigned to bin $B_{ij}$ by vertex $v_i$ is represented by $Potential(v_i, B_{ij})$, which is a bell-shaped function. The use of piecewise quadratic functions makes the potential function non-convex but smooth and differentiable [5]. The wire length approximation can be combined with a penalty term given by

$$Penalty = \sum_{B_{ij} \in B} \left( \sum_{v_i \in V_h} Potential(v_i, B_{ij}) - K \right)^2$$

to arrive at an unconstrained optimization problem, which is solved using a conjugate gradient method [5].


Applications 


Practical applications involve more sophisticated 
interconnect objectives, such as circuit delay, 
routing congestion, power dissipation, power 
density, and maximum thermal gradient. The 
above techniques are adapted to handle multiob- 
jective optimization. Many such extensions are 
based on heuristic assignment of net weights that 
encourage the shortening of some (e.g., timing 
critical and frequently switching) connections at 
the expense of other connections. To moderate 
routing congestion, predictive congestion maps 
are used to decrease the maximal density 
constraint for placement in congested regions. 




Another application is in physical synthesis, 
where incremental placement is used to evaluate 
changes in circuit topology. 


Experimental Results and Data Sets 


Circuit placement has been actively studied for 
the past 30 years, and a wealth of experimental 
results have been reported. A 2003 result showed 
that placement tools could produce results as 
much as 1.41x to 2.09x known optimal wire 
lengths on average. In a 2006 placement con- 
test, academic software for placement produced 
results that differed by as much as 1.39x on 
average when the objective was the simultaneous 
minimization of wire length, routability, and run 
time. Placement run times for instances with 
2M movable objects ranged into hours. More 
recently, the gap in wire length between dif- 
ferent tools has decreased, and run times have 
improved, in part due to the use of multicore 
CPUs and vectorized arithmetics. Over the last 
10 years, wire length has improved by 20-25 % 
and run time by 15-20 times [8, 9]. More recent 
work in circuit placement has focused on other 
objectives such as routability in addition to wire 
length minimization. 

Modern benchmark suites include the ISPD05, ISPD06, ISPD11, and ISPD14 suites (http://www.ispd.cc). Additional benchmark suites include the ICCAD12 (http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2012), ICCAD13 (http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013), and ICCAD14 (http://cad_contest.ee.ncu.edu.tw/CAD-contest-at-ICCAD2014) suites. Instances in these benchmark suites contain between several hundred thousand and several million placeable objects. Additional benchmark suites also exist.
Problem Definition 


Circuit retiming is one of the most effective 
structural optimization techniques for sequential 
circuits. It moves the registers within a circuit 
without changing its function. Besides reducing the clock period, retiming can be used to minimize the number of registers in the circuit; this is called the minimum area retiming problem. Leiserson and Saxe [3] started the research on retiming and proposed algorithms for both minimum period and minimum area retiming. Both their algorithms for minimum area and minimum period will be presented here.

The problems can be formally described as follows. Given a directed graph $G = (V, E)$ representing a circuit (each node $v \in V$ represents a gate and each edge $e \in E$ represents a signal passing from one gate to another) with gate delays $d : V \to \mathbb{R}^+$ and register numbers $w : E \to \mathbb{N}$, the minimum area problem asks for a relocation of registers $w' : E \to \mathbb{N}$ such that the number of registers in the circuit is minimum under a given clock period $\phi$. The minimum period problem asks for a solution with the minimum clock period.


Notations 


To guarantee that the new registers are actually a relocation of the old ones, a label $r : V \to \mathbb{Z}$ is used to represent how many registers are moved from the outgoing edges to the incoming edges of each node. Using this notation, the new number of registers on an edge $(u, v)$ can be computed as

$$w'[u, v] = w[u, v] + r[v] - r[u].$$

The same notation can be extended from edges to paths. However, between any two nodes $u$ and $v$, there may be more than one path. Among these paths, the ones with the minimum number of registers will decide how many registers can be moved outside of $u$ and $v$. The number is denoted by $W[u, v]$ for any $u, v \in V$, that is,

$$W[u, v] \triangleq \min_{p:\, u \leadsto v} \sum_{(x, y) \in p} w[x, y]$$

The maximal delay among all the paths from $u$ to $v$ with the minimum number of registers is denoted by $D[u, v]$, that is,

$$D[u, v] \triangleq \max_{p:\, u \leadsto v,\; w[p] = W[u, v]} \sum_{x \in p} d[x]$$


Constraints 


Based on the notations, a valid retiming $r$ should not have any negative number of registers on any edge. Such a validity condition is given as

$$P0(r) \triangleq \forall (u, v) \in E : w[u, v] + r[v] - r[u] \ge 0$$

On the other hand, given a retiming $r$, the minimum number of registers between any two nodes $u$ and $v$ is $W[u, v] - r[u] + r[v]$. This number will not be negative because of the previous constraint. However, when it is zero, there will be a path of delay $D[u, v]$ without any register on it. Therefore, to have a retimed circuit working for clock period $\phi$, the following constraint must be satisfied.

$$P1(r) \triangleq \forall u, v \in V : D[u, v] > \phi \Rightarrow W[u, v] + r[v] - r[u] \ge 1$$
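The relocation formula and the validity condition P0 can be illustrated with a short sketch. It assumes edges are stored as a dictionary from node pairs to register counts; the function names `retime` and `satisfies_p0` and the three-gate ring are hypothetical.

```python
def retime(w, r):
    """New register counts w'[u, v] = w[u, v] + r[v] - r[u] for every edge."""
    return {(u, v): w[(u, v)] + r[v] - r[u] for (u, v) in w}


def satisfies_p0(w, r):
    """P0: no edge may end up with a negative number of registers."""
    return all(count >= 0 for count in retime(w, r).values())


# Hypothetical ring a -> b -> c -> a with registers on a->b and c->a.
w = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
good = {"a": 0, "b": -1, "c": 0}   # moves the register on a->b across b to b->c
bad = {"a": 0, "b": 1, "c": 0}     # would need a register that b->c does not have
w_new = retime(w, good)
```

Because the labels cancel around any cycle, the number of registers on the ring stays at 2 under every valid retiming.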


Key Results 


The objective of the minimum area retiming is to minimize the total number of registers in the circuit, which is given by $\sum_{(u, v) \in E} w'[u, v]$. Expressing $w'[u, v]$ in terms of $r$, the objective becomes

$$\sum_{v \in V} (in[v] - out[v]) \cdot r[v] + \sum_{(u, v) \in E} w[u, v]$$

where $in[v]$ is the in-degree and $out[v]$ is the out-degree of node $v$. Since the second term is a constant, the problem can be formulated as the following integer linear program.

Minimize $\sum_{v \in V} (in[v] - out[v]) \cdot r[v]$

s.t. $w[u, v] + r[v] - r[u] \ge 0 \quad \forall (u, v) \in E$

$W[u, v] + r[v] - r[u] \ge 1 \quad \forall u, v \in V : D[u, v] > \phi$

$r[u] \in \mathbb{Z} \quad \forall u \in V$


Since the constraints have only difference in- 
equalities with integer-constant terms, solving 
the relaxed linear program (without the integer 
constraint) will only give integer solutions. Even 
better, it can be shown that the problem is the dual 
of a minimum cost network flow problem and, 
thus, can be solved efficiently. 


Theorem 1 The integer linear program for the 
minimum area retiming problem is the dual of the 
following minimum cost network flow problem. 


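\[
\begin{aligned}
\text{Minimize}\quad & \sum_{(u,v)\in E} w[u,v]\, f[u,v] \;+\; \sum_{D[u,v] > \phi} (W[u,v]-1)\, f[u,v] \\
\text{s.t.}\quad & in[v] + \sum_{(v,w)\in E \,\vee\, D[v,w] > \phi} f[v,w]
 \;=\; out[v] + \sum_{(u,v)\in E \,\vee\, D[u,v] > \phi} f[u,v] && \forall v\in V \\
& f[u,v] \ge 0 && \forall (u,v)\in E \,\vee\, D[u,v] > \phi
\end{aligned}
\]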


From the theorem, it can be seen that the network
graph is a dense graph where a new edge (u, v)
needs to be introduced for any node pair u, v
such that D[u, v] > φ. There may be redundant
constraints in the system.

For example, if W[u, w] = W[u, v] + w[v, w]
and D[u, v] > φ, then the constraint W[u, w] +
r[w] − r[u] ≥ 1 is redundant, since there are
already W[u, v] + r[v] − r[u] ≥ 1 and w[v, w] +
r[w] − r[v] ≥ 0. However, it may not be easy
to check and remove all redundancy in the constraints.
In order to build the minimum cost flow network,
both matrices W and D must first be computed.
Since W[u, v] is the weight of a shortest path from
u to v in terms of w, W can be computed
by an all-pairs shortest paths algorithm such
as the Floyd-Warshall algorithm [1]. Furthermore,
if the ordered pair (w[x, y], −d[x]) is used as
the edge weight for each (x, y) ∈ E, an all-pairs
shortest paths algorithm can also be used
to compute both W and D. The algorithm
adds weights by component-wise addition and
compares weights by lexicographic ordering.
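
The pair-weight computation described above can be sketched as follows. This is only an illustration under stated assumptions, not code from the cited work: the function name compute_W_D and the dictionary-based inputs are hypothetical, every cycle is assumed to carry at least one register (so no lexicographically negative cycle exists), and the trivial path is taken to give W[u, u] = 0 and D[u, u] = d[u].

```python
import math

def compute_W_D(nodes, d, w):
    """Return (W, D) given gate delays d[v] and edge register counts w[(u, v)].

    Hypothetical helper: Floyd-Warshall over pairs (registers, -delay),
    added component-wise and compared lexicographically.
    """
    INF = (math.inf, math.inf)
    dist = {u: {v: INF for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = (0, 0)                      # trivial path (assumed convention)
    for (x, y), regs in w.items():
        dist[x][y] = min(dist[x][y], (regs, -d[x]))
    for k in nodes:
        for i in nodes:
            if dist[i][k] == INF:
                continue
            for j in nodes:
                if dist[k][j] == INF:
                    continue
                cand = (dist[i][k][0] + dist[k][j][0],
                        dist[i][k][1] + dist[k][j][1])
                if cand < dist[i][j]:            # tuple comparison is lexicographic
                    dist[i][j] = cand
    W, D = {}, {}
    for u in nodes:
        for v in nodes:
            if dist[u][v] != INF:
                W[u, v] = dist[u][v][0]
                # The pair accumulates -d[x] for every node except the last one.
                D[u, v] = d[v] - dist[u][v][1]
    return W, D
```

Only reachable pairs appear in the returned dictionaries; the cubic running time matches the Floyd-Warshall bound mentioned in the text.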

Leiserson and Saxe’s [3] first algorithm for
minimum period retiming was also based on the
matrices W and D. The idea was that the constraints
in the integer linear program for minimum
area retiming can be checked efficiently
by the Bellman-Ford shortest paths algorithm [1],
since they are just difference inequalities. This
gives a feasibility check for any given clock
period φ. The optimal clock period can then be
found by a binary search on a range of possible
periods. The feasibility check can be done
in O(|V|³) time; thus, the runtime of such an
algorithm is O(|V|³ log |V|).
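
A minimal sketch of this scheme is given below, assuming W and D are already available as dictionaries keyed by node pairs. The names feasible and min_period are hypothetical, and searching over the distinct values of D is an assumption about what the "range of possible periods" is. Each constraint r[u] − r[v] ≤ c is relaxed in Bellman-Ford style starting from all zeros (an implicit virtual source).

```python
def feasible(nodes, w, W, D, phi):
    """Return a retiming r satisfying the constraints for period phi, or None."""
    # Each triple (u, v, c) encodes the difference constraint r[u] - r[v] <= c.
    cons = [(u, v, c) for (u, v), c in w.items()]
    cons += [(u, v, W[u, v] - 1) for (u, v) in D if D[u, v] > phi]
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in cons:
            if r[v] + c < r[u]:
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None   # still changing after |V| passes: negative cycle, infeasible

def min_period(nodes, w, W, D):
    """Binary search over the sorted distinct D values (assumed candidate set)."""
    cands = sorted(set(D.values()))
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(nodes, w, W, D, cands[mid]) is not None:
            hi = mid
        else:
            lo = mid + 1
    return cands[lo], feasible(nodes, w, W, D, cands[lo])
```

The largest candidate is always feasible (r = 0 satisfies the edge constraints), so the search is well defined.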

Their second algorithm avoided the construction
of the matrices W and D. It still used
a clock period feasibility check within a binary
search. However, the feasibility check
was done by incremental retiming. It works as
follows: Starting with r = 0, the algorithm
computes the arrival time of each node by a
longest paths computation on a DAG (Directed
Acyclic Graph). For each node v with an arrival
time larger than the given period φ, r[v]
is increased by one. The process of
arrival time computation and r increase is
repeated |V| − 1 times. After that, if there is still
an arrival time larger than φ, then the period is
infeasible. Since the feasibility check is done
in O(|V||E|) time, the runtime for minimum
period retiming is O(|V||E| log |V|).
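
The incremental feasibility check just described can be sketched as follows. The function names arrival_times and feasible_incremental are hypothetical, and the sketch assumes every cycle of the circuit carries at least one register, so that the zero-register edges always form a DAG.

```python
from collections import deque

def arrival_times(nodes, d, w, r):
    """Longest combinational path delay ending at each node under retiming r."""
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for (u, v), regs in w.items():
        if regs + r[v] - r[u] == 0:          # register-free edge after retiming
            succ[u].append(v)
            indeg[v] += 1
    t = {v: d[v] for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:                             # topological-order relaxation
        u = queue.popleft()
        for v in succ[u]:
            t[v] = max(t[v], t[u] + d[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return t

def feasible_incremental(nodes, d, w, phi):
    """Return a retiming achieving period phi, or None if none is found."""
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes) - 1):
        t = arrival_times(nodes, d, w, r)
        for v in nodes:
            if t[v] > phi:
                r[v] += 1
    t = arrival_times(nodes, d, w, r)
    return r if max(t.values()) <= phi else None
```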


Applications 


Shenoy and Rudell [7] implemented Leiserson
and Saxe’s minimum period and minimum area
retiming algorithms with some efficiency improvements.
For minimum period retiming, they
implemented the second algorithm and, in order
to detect infeasibility earlier, they introduced a
pointer from one node to another whenever at least
one register is required between them. A cycle
formed by the pointers indicates the infeasibility
of the given period. For minimum area retiming,
they removed some of the redundancy in the
constraints and used the cost-scaling algorithm
of Goldberg and Tarjan [2] for the minimum cost
flow computation.

As can be seen from the second minimum
period retiming algorithm here and Zhou’s algorithm
[9] in another entry (see “Circuit Retiming:
An Incremental Approach”), incremental computation
of the longest combinational paths (i.e.,
those without register on them) is more efficient 
than constructing the dense graph (via matri- 
ces W and D). However, the minimum area 
retiming algorithm is still based on a minimum 
cost network flow on the dense graph. A more 
efficient algorithm based on incremental retiming 
has recently been designed for the minimum area 
problem by Wang and Zhou [8]. 


Experimental Results 


Sapatnekar and Deokar [6] and Pan [5] proposed 
continuous retiming as an efficient approximation 
for minimum period retiming and reported the ex- 
perimental results. Maheshwari and Sapatnekar 
[4] also proposed some efficiency improvements 
to the minimum area retiming algorithm and 
reported their experimental results. 
Problem Definition


Circuit retiming is one of the most effective 
structural optimization techniques for sequential 
circuits. It moves the registers within a circuit
without changing its function. The minimal pe- 
riod retiming problem needs to minimize the 
longest delay between any two consecutive reg- 
isters, which decides the clock period. 

The problem can be formally described as
follows. A circuit is represented by a directed graph
G = (V, E), where each node v ∈ V represents
a gate and each edge e ∈ E represents
a signal passing from one gate to another. Given
gate delays d : V → R⁺ and register numbers
w : E → N, the problem asks for a relocation of registers
w′ : E → N such that the maximal delay between
two consecutive registers is minimized.


Notations To guarantee that the new registers
are actually a relocation of the old ones, a label
r : V → Z is used to represent how many
registers are moved from the outgoing edges to
the incoming edges of each node. Using this
notation, the new number of registers on an edge
(u, v) can be computed as


\[ w'[u,v] = w[u,v] + r[v] - r[u]. \]


Furthermore, to avoid explicitly enumerating the
paths in finding the longest path, another label
t : V → R⁺ is introduced to represent the output
arrival time of each gate, that is, the maximal
delay of a gate from any preceding register. The
condition for t to be at least the combinational
delays is


\[ \forall (u,v)\in E:\ w'[u,v] = 0 \;\Rightarrow\; t[v] \ge t[u] + d[v]. \]


Constraints and Objective Based on the notations,
a valid retiming r should not have a
negative number of registers on any edge. This
validity condition is given as


\[ P0(r) \triangleq \forall (u,v)\in E:\ w[u,v] + r[v] - r[u] \ge 0. \]


As already stated, the conditions for t to be
a valid arrival time are given by the following two
predicates:

\[
\begin{aligned}
P1(t) &\triangleq \forall v\in V:\ t[v] \ge d[v], \\
P2(r,t) &\triangleq \forall (u,v)\in E:\ r[u] - r[v] = w[u,v] \;\Rightarrow\; t[v] - t[u] \ge d[v].
\end{aligned}
\]


A predicate P is used to denote the conjunction 
of the above conditions: 


\[ P(r,t) \triangleq P0(r) \wedge P1(t) \wedge P2(r,t). \]


A minimal period retiming is a solution r, t satisfying
the following optimality condition:


\[ P3 \triangleq \forall r', t':\ P(r',t') \;\Rightarrow\; \max(t) \le \max(t'), \]


where

\[ \max(t) \triangleq \max_{v\in V} t[v]. \]


Since only a valid retiming (r’, t’) will be dis- 
cussed in the sequel, to simplify the presentation, 
the range condition P(r’, t’) will often be omit- 
ted; the meaning shall be clear from the context. 
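
As a small illustration of the predicates P0, P1, and P2, the following sketch checks whether a given pair (r, t) is a valid retiming with valid arrival times. The function name is_valid and the dictionary representation of d, w, r, and t are assumptions made for this example only.

```python
def is_valid(d, w, r, t):
    """Check P(r, t) = P0(r) and P1(t) and P2(r, t) for delays d and registers w."""
    # P0: no edge ends up with a negative number of registers.
    p0 = all(w[u, v] + r[v] - r[u] >= 0 for (u, v) in w)
    # P1: every arrival time is at least the gate's own delay.
    p1 = all(t[v] >= d[v] for v in d)
    # P2: across every register-free edge, arrival times accumulate the delay.
    p2 = all(t[v] - t[u] >= d[v]
             for (u, v) in w if r[u] - r[v] == w[u, v])
    return p0 and p1 and p2
```

For a two-gate loop with d = {a: 2, b: 3}, w[a, b] = 0 and w[b, a] = 2, the pair r = {a: 0, b: 1}, t = {a: 2, b: 3} passes the check, while r = 0 with the same t fails P2 because the edge (a, b) is register-free.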


Key Results 


This section will show how an efficient algorithm 
is designed for the minimal period retiming prob- 
lem. Contrary to the usual way of only presenting 
the final product, i.e., the algorithm, but not the 
ideas on its design, a step-by-step design process 
will be shown to finally arrive at the algorithm. 

To design an algorithm is to construct a proce- 
dure such that it will terminate in finite steps and 
will satisfy a given predicate when it terminates. 
In the minimal period retiming problem, the predicate
to be satisfied is P0 ∧ P1 ∧ P2 ∧ P3. The
predicate is also called the post-condition. It can 
be argued that any nontrivial algorithm will have 
at least one loop; otherwise, the processing length 
is only proportional to the text length. Therefore, 
some part of the post-condition will be iteratively 
satisfied by the loop, while the remaining part 
will be initially satisfied by an initialization and 
made invariant during the loop. 

The first decision to be made is how to partition
the post-condition into a possible invariant
and a loop goal. Among the four conjuncts, the
predicate P3 gives the optimality condition and
is the most complex one. Therefore, it will be
used as a loop goal. On the other hand, the
predicates P0 and P1 can be easily satisfied by
the following simple initialization:

r, t := 0, d.

Based on these, the plan is to design an algorithm 
with the following scheme: 


r, t := 0, d;
do {P0 ∧ P1}
    ¬P2 → update t
    ¬P3 → update r
od {P0 ∧ P1 ∧ P2 ∧ P3}.


The first command in the loop can be refined as 


∃(u, v) ∈ E : r[u] − r[v] = w[u, v] ∧ t[v] − t[u] < d[v]
    → t[v] := t[u] + d[v].


This is simply the Bellman-Ford relaxations for 
computing the longest paths. 

The second command is more difficult to refine.
If ¬P3, that is, there exists another valid
retiming r′, t′ such that max(t) > max(t′), then
any node v such that t[v] = max(t) must
have t′[v] < t[v]. One property known on these
nodes is


\[ \forall v\in V:\ t'[v] < t[v] \;\Rightarrow\; (\exists u\in V:\ r[u] - r[v] > r'[u] - r'[v]), \]


which means that if the arrival time of v is smaller
in another retiming r′, t′, then there must be a
node u such that r′ gives more registers between
u and v. In fact, one such u is the starting node
of the longest combinational path to v that gives
the delay of t[v].

To reduce the clock period, the variable r
needs to be updated to make it closer to r′. It
should be noted that it is not the absolute values
of r but their differences that are relevant in
the retiming. If r, t is a solution to a retiming
problem, then r + c, t, where c ∈ Z is an arbitrary
constant, is also a solution. Therefore r can be
made “closer” to r′ by allocating more registers
between u and v, that is, by either decreasing r[u]
or increasing r[v]. Notice that v can be easily
identified by t[v] = max(t). No matter whether
r[v] or r[u] is selected to change, the amount of
change should be only one since r should not
be overadjusted. Thus, after the adjustment, it
is still true that r[v] − r[u] ≤ r′[v] − r′[u] or,
equivalently, r[v] − r′[v] ≤ r[u] − r′[u]. Since
v is easy to identify, r[v] is selected to increase.
The arrival time t[v] can be immediately reduced
to d[v]. This gives a refinement of the second
command:


¬P3 ∧ P2 ∧ ∃v ∈ V : t[v] = max(t)
    → r[v], t[v] := r[v] + 1, d[v].


Since registers are moved in the above operation,
the predicate P2 may be violated. However,
the first command will take care of it. That
command will increase t on some nodes;
some may even become larger than max(t)
before the register move. The same reasoning
using r′, t′ shows that their r values shall
be increased, too. Therefore, to implement
this as-soon-as-possible (ASAP) increase of
r, a snapshot of max(t) needs to be taken
when P2 is valid. Physically, such a snapshot
records one feasible clock period φ and can be
implemented by adding one more command in
the loop:


P2 ∧ φ > max(t) → φ := max(t).


However, such an ASAP operation may increase
r[u] even when w[u, v] − r[u] + r[v] = 0 for
an edge (u, v). This means that P0 may no longer
be an invariant. But moving P0 from invariant
to loop goal will not cause a problem, since one
more command can be added in the loop to take
care of it:


∃(u, v) ∈ E : r[u] − r[v] > w[u, v]
    → r[v] := r[u] − w[u, v].
Putting all things together, the algorithm now has
the following form:


r, t, φ := 0, d, ∞;
do {P1}
    ∃(u, v) ∈ E : r[u] − r[v] = w[u, v]
        ∧ t[v] − t[u] < d[v] → t[v] := t[u] + d[v]
    ¬P3 ∧ ∃v ∈ V : t[v] ≥ φ
        → r[v], t[v] := r[v] + 1, d[v]
    P0 ∧ P2 ∧ φ > max(t) → φ := max(t)
    ∃(u, v) ∈ E : r[u] − r[v] > w[u, v]
        → r[v] := r[u] − w[u, v]
od {P0 ∧ P1 ∧ P2 ∧ P3}.


The remaining task to complete the algorithm
is to check ¬P3. From the previous discussion,
it is already known that ¬P3 implies that there
is a node u such that r[u] − r′[u] ≥ r[v] − r′[v]
every time after r[v] is increased. This means that
the maximum of r[v] − r′[v] over v ∈ V will not increase. In other
words, there is at least one node v whose r[v]
will not change. Before r[v] is increased, it also
holds that w_{u⇝v} − r[u] + r[v] ≤ 0, where w_{u⇝v} ≥ 0 is
the original number of registers on one path from
u to v, which gives r[v] − r[u] ≤ 1 even after the
increase of r[v]. This implies that there will be at
least i + 1 nodes whose r is at most i for 0 ≤
i < |V|. In other words, the algorithm can keep
increasing r, and when any r reaches
|V|, it shows that P3 is satisfied. Therefore, the
complete algorithm will have the following form:


r, t, φ := 0, d, ∞;
do {P1}
    ∃(u, v) ∈ E : r[u] − r[v] = w[u, v]
        ∧ t[v] − t[u] < d[v] → t[v] := t[u] + d[v]
    (∀v ∈ V : r[v] < |V|)
        ∧ ∃v ∈ V : t[v] ≥ φ → r[v], t[v] := r[v] + 1, d[v]
    (∃v ∈ V : r[v] ≥ |V|)
        ∧ ∃v ∈ V : t[v] > φ → r[v], t[v] := r[v] + 1, d[v]
    P0 ∧ P2 ∧ φ > max(t) → φ := max(t)
    ∃(u, v) ∈ E : r[u] − r[v] > w[u, v]
        → r[v] := r[u] − w[u, v]
od {P0 ∧ P1 ∧ P2 ∧ P3}.
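
The guarded-command loop can be simulated directly, as in the sketch below. This is an unoptimized illustration and not the author's implementation: the function name incremental_retiming is hypothetical, the fixed order in which guards are tried is an arbitrary choice (the loop itself is nondeterministic), and the comparisons t[v] ≥ φ before and t[v] > φ after some r[v] reaches |V| are assumptions about the intended guards.

```python
import math

def incremental_retiming(nodes, d, w):
    """Simulate the loop; returns (r, phi) once no guard is enabled."""
    n = len(nodes)
    r = {v: 0 for v in nodes}
    t = dict(d)
    phi = math.inf
    while True:
        # Command 1: restore P2 by relaxing a register-free edge.
        e = next(((u, v) for (u, v) in w
                  if r[u] - r[v] == w[u, v] and t[v] - t[u] < d[v]), None)
        if e is not None:
            u, v = e
            t[v] = t[u] + d[v]
            continue
        # Commands 2 and 3: move a register across a late node.
        reached = any(r[v] >= n for v in nodes)   # stands in for P3
        late = next((v for v in nodes
                     if (t[v] > phi if reached else t[v] >= phi)), None)
        if late is not None:
            r[late] += 1
            t[late] = d[late]
            continue
        # Command 4: snapshot a feasible period (P2 holds here because
        # command 1 is not enabled).
        p0 = all(r[u] - r[v] <= w[u, v] for (u, v) in w)
        if p0 and phi > max(t.values()):
            phi = max(t.values())
            continue
        # Command 5: repair an edge with a negative register count.
        e = next(((u, v) for (u, v) in w if r[u] - r[v] > w[u, v]), None)
        if e is not None:
            u, v = e
            r[v] = r[u] - w[u, v]
            continue
        return r, phi
```

For the two-gate loop with d = {a: 2, b: 3}, w[a, b] = 0 and w[b, a] = 2, the simulation ends with φ = 3 and a retiming that places one register on each edge.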


The correctness of the algorithm can be proved
easily by showing that the invariant P1 is maintained
and the negation of the guards implies
P0 ∧ P2 ∧ P3. The termination is guaranteed
by the monotonic increase of r and an upper
bound on it. In fact, the following theorem gives
its worst-case runtime.


Theorem 1 The worst-case running time of the
given retiming algorithm is upper bounded by
O(|V|²|E|).


The runtime bound of the retiming algorithm
is obtained under the worst-case assumption that each
increase of r will trigger a timing propagation on
the whole circuit (|E| edges). This is only true
when the r increase moves all registers in the
circuit. However, in such a case, r is upper
bounded by 1; thus, the running time is not larger
than O(|V||E|). On the other hand, when the r
value is large, the circuit is partitioned by the
registers into many small parts; thus, the timing
propagation triggered by one r increase is limited
to a small tree.


Applications 


In the basic algorithm, the optimality P3 is verified
by an r[v] ≥ |V|. However, in most cases,
the optimality condition can be discovered much
earlier. Each time r[v] is increased, there
must be a “safeguard” node u such that r[u] −
r′[u] ≥ r[v] − r′[v] after the operation. Therefore,
if a pointer is introduced from v to u when r[v]
is increased, the pointers cannot form a cycle
under ¬P3. In fact, the pointers will form a forest
where the roots have r = 0 and a child can have
an r at most one larger than its parent. Using
a cycle formed by the pointers as an indication of P3,
instead of an r[v] ≥ |V|, the algorithm can have
much better practical performance.

Retiming is usually used to optimize either the 
clock period or the number of registers in the 
circuit. The discussed algorithm solves only the 
minimal period retiming problem. The retiming 
problem for minimizing the number of registers 
under a given period has been solved by Leiser- 
son and Saxe [1] and is presented in another entry 
in this encyclopedia. Their algorithm reduces the
problem to the dual of a minimal cost network
flow problem on a denser graph. An efficient iterative
algorithm similar to Zhou’s algorithm has
recently been designed for the minimal register
problem [3].


Experimental Results 


Experimental results are reported by Zhou [4] 
which compared the runtime of the algorithm 
with an efficient heuristic called ASTRA [2]. The 
results on the ISCAS89 benchmarks are repro- 
duced here in Table 1 from [4], where columns 
A and B are the running time of the two stages in 
ASTRA. 
Name       #gates   Clock period      Σr       #updates   Time(s)   ASTRA
                    Before   After                                   A(s)    B(s)
s1423         490     166     127       808       7,619     0.02     0.03    0.02
s1494         558      89      88       628       7,765     0.02     0.01    0.01
s9234       2,027      89      81     2,215      76,943     0.12     0.11    0.09
s9234.1     2,027      89      81     2,164      77,644     0.16     0.11    0.10
s13207      2,573     143      82     4,086      28,395     0.12     0.38    0.12
s15850      3,448     186      77    12,038      99,314     0.36     0.43    0.17
s35932     12,204     109     100    16,373     108,459     0.28     0.24    0.65
s38417      8,709     110      56     9,834     155,489     0.58     0.89    0.64
s38584     11,448     191     163    19,692     155,637     0.41     0.50    0.67
s38584.1   11,448     191     183     9,416     114,940     0.48     0.55    0.78
Problem Definition


We discuss a simple undirected and connected
graph G = (V, E) with a finite set V of vertices
and a finite set E ⊆ V × V of edges. A pair of
vertices v and w is said to be adjacent if (v, w) ∈
E. For a subset R ⊆ V of vertices, G(R) =
(R, E ∩ (R × R)) is an induced subgraph. An
induced subgraph G(Q) is said to be a clique
if (v, w) ∈ E for all v, w ∈ Q ⊆ V with
v ≠ w. In this case, we may simply state that
Q is a clique. In particular, a clique that is not
properly contained in any other clique is called
maximal. An induced subgraph G(S) is said to
be an independent set if (v, w) ∉ E for all v, w ∈
S ⊆ V. For a vertex v ∈ V, let Γ(v) = {w ∈
V | (v, w) ∈ E}. We call |Γ(v)| the degree of v.
The problem is to enumerate all maximal
cliques of the given graph G = (V, E). It is
equivalent to enumerating all maximal independent
sets of the complementary graph Ḡ = (V, Ē),
where Ē = {(v, w) ∈ V × V | (v, w) ∉ E, v ≠ w}.


Key Results 


Efficient Algorithms for Clique Enumeration

Efficient algorithms to solve the problem can be 
found in the following approaches (1) and (2). 


(1) Clique Enumeration by Depth-First 
Search with Pivoting Strategy 


The basis of the first approach is a simple depth-first
search. It begins with a clique of size 0 and
continues with finding all of the progressively
larger cliques until they can be verified as maximal.
Formally, this approach maintains a global
variable Q = {p1, p2, ..., pd} that consists of
the vertices of the current clique found so far. Let


\[ SUBG = V \cap \Gamma(p_1) \cap \Gamma(p_2) \cap \cdots \cap \Gamma(p_d). \]


We begin the algorithm by letting Q := ∅
and SUBG := V (the set of all vertices). We
select a certain vertex p from SUBG and add
p to Q (Q := Q ∪ {p}). Then we compute
SUBG_p := SUBG ∩ Γ(p) as the new set of vertices
in question. In particular, the first selected
vertex u ∈ SUBG is called a pivot. This procedure
(EXPAND()) is applied recursively while
SUBG_p ≠ ∅.

When SUBG_p = ∅ is reached, Q constitutes
a maximal clique. We then backtrack by removing
the most recently inserted vertex from Q and SUBG.
We select a new vertex p from the resulting
SUBG and continue the same procedure until
SUBG = ∅. This process can be represented
by a depth-first search forest. See Fig. 2b as an
example of an essential part of a search forest.
It clearly generates all maximal cliques.


procedure CLIQUES(G)
begin                                      /* Q := ∅ */
 1:  EXPAND(V, V)
end of CLIQUES

procedure EXPAND(SUBG, CAND)
begin
 2:  if SUBG = ∅
 3:    then print(“clique,”)               /* Q is a maximal clique */
 4:    else u := a vertex u in SUBG that maximizes |CAND ∩ Γ(u)|;   /* pivot */
 5:      while CAND − Γ(u) ≠ ∅
 6:        do q := a vertex in (CAND − Γ(u));
 7:          print(q, “,”);                /* Q := Q ∪ {q} */
 8:          SUBG_q := SUBG ∩ Γ(q);
 9:          CAND_q := CAND ∩ Γ(q);
10:          EXPAND(SUBG_q, CAND_q);
11:          CAND := CAND − {q};
12:          print(“back,”)                /* Q := Q − {q} */
         od
     fi
end of EXPAND


Clique Enumeration, Fig. 1 Algorithm CLIQUES


The above-generated maximal cliques, how- 
ever, could contain duplications or nonmaximal 
ones, so we prune unnecessary parts of the search
forest as in the Bron-Kerbosch algorithm [3]. 

First, let FINI be a subset of vertices of SUBG
that have already been processed by the algorithm.
(FINI is short for finished.) Then we denote
by CAND the set of remaining candidates for expansion:
CAND := SUBG − FINI, where for two
sets X and Y, X − Y = {v | v ∈ X and v ∉ Y}. At
the beginning, FINI := ∅ and CAND := SUBG.
In the subgraph G(SUBG_q) with SUBG_q :=
SUBG ∩ Γ(q), let


\[ FINI_q := SUBG_q \cap FINI, \qquad CAND_q := SUBG_q - FINI_q. \]


Then only the vertices in CAND_q can be candidates
for expanding the clique Q ∪ {q} to find
new larger cliques.

Second, for the first selected pivot u in SUBG,
any maximal clique R in G(SUBG ∩ Γ(u)) is
not maximal in G(SUBG), since R ∪ {u} is a
larger clique in G(SUBG). Therefore, searching
for maximal cliques from SUBG ∩ Γ(u) should
be excluded.




When the previously described pruning
method is also taken into consideration, we find
that the only search subtrees to be expanded are
from the vertices in (SUBG − SUBG ∩ Γ(u)) −
FINI = CAND − Γ(u). Here, in order to minimize
|CAND − Γ(u)|, we choose the pivot u ∈ SUBG
to be the one that maximizes |CAND ∩ Γ(u)|. This
is crucial to establish the optimality of the worst-case
time complexity of the algorithm. This kind
of pivoting strategy was proposed by Tomita et al.
[11]. (Recommended Reading [11] was reviewed
by Pardalos and Xue [10] and Bomze et al. [2].)
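
The depth-first search with pivoting described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name maximal_cliques and the adjacency-dictionary input are hypothetical, and the sketch collects each clique explicitly in a list rather than emitting the compact tree-form output of the published algorithm, so it does not claim the same output-size behavior.

```python
def maximal_cliques(adj):
    """adj maps each vertex to the set of its neighbours (Γ in the text)."""
    cliques = []

    def expand(subg, cand, q):
        if not subg:
            cliques.append(set(q))           # Q is a maximal clique
            return
        # Pivot: a vertex of SUBG maximizing |CAND ∩ Γ(u)|.
        u = max(subg, key=lambda x: len(cand & adj[x]))
        for v in list(cand - adj[u]):        # only CAND − Γ(u) is expanded
            q.append(v)
            expand(subg & adj[v], cand & adj[v], q)
            q.pop()
            cand = cand - {v}                # v moves from CAND to FINI

    vertices = set(adj)
    expand(vertices, vertices, [])
    return cliques
```

For example, on a triangle {1, 2, 3} with a pendant edge (3, 4), the sketch reports the two maximal cliques {1, 2, 3} and {3, 4}.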

The algorithm CLIQUES by Tomita et al. [12]
is shown in Fig. 1. It enumerates all maximal
cliques based upon the above approach, but all
maximal cliques enumerated are presented in a
tree-like form. Here, if Q is a maximal clique that
is found at statement 2, then the algorithm only
prints out a string of characters clique instead of
Q itself at statement 3. Otherwise, it is impossible
to achieve the optimal worst-case running
time. Instead, in addition to printing clique at
statement 3, we print out q followed by a comma
at statement 7 every time q is picked out as a
new element of a larger clique, and we print out
a string of characters back at statement 12 after
q is moved from CAND to FINI at statement 11.
We can easily obtain a tree representation of all
the maximal cliques from the sequence printed
by statements 3, 7, and 12. The output in a tree-like
format is also important practically since it
saves space in the output file. An example run of
CLIQUES on the graph of Fig. 2a is shown in Fig. 2b, c with
appropriate indentations.


Clique Enumeration, Fig. 2 An example run of CLIQUES [12]. (a) A graph G. (b) A search forest. (c) The output
in tree-like form (cl and bk are short for clique and back, respectively)


The worst-case time complexity of CLIQUES
was proved to be O(3^{n/3}) for an n-vertex graph
[11,12]. This is optimal as a function of n since
there exist up to 3^{n/3} maximal cliques in an n-vertex graph
[9].

Eppstein et al. [5] used this approach and proposed
an algorithm for enumerating all maximal
cliques that runs in time O(dn3^{d/3}) for an n-vertex
graph G, where d is the degeneracy of G,
defined to be the smallest number such that
every subgraph of G contains a vertex of degree
at most d. If graph G is sparse, d can be much
smaller than n and hence O(dn3^{d/3}) can be much
smaller than O(3^{n/3}).
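
To make the definition of degeneracy concrete, the sketch below computes it by repeatedly deleting a vertex of minimum remaining degree. The function name degeneracy and the adjacency-dictionary input are assumptions for this example; the simple quadratic loop is chosen for clarity and is not the linear-time bucket method usually used in practice.

```python
def degeneracy(adj):
    """Return (d, order): the degeneracy and one min-degree removal order."""
    remaining = {v: set(nb) for v, nb in adj.items()}
    d = 0
    order = []
    while remaining:
        # Remove a vertex of minimum degree in the remaining subgraph.
        v = min(remaining, key=lambda x: len(remaining[x]))
        d = max(d, len(remaining[v]))
        order.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return d, order
```

A triangle with a pendant vertex has degeneracy 2, while any path or tree with at least one edge has degeneracy 1.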


(2) Clique Enumeration by Reverse Search 


The second approach is based
upon the reverse search that was introduced by
Avis and Fukuda [1] to solve enumeration problems
efficiently.


Given the graph G = (V, E) with V =
{v1, v2, ..., vn} where n = |V|, let Vi =
{v1, v2, ..., vi}. Then {v1} is simply a maximal
clique in G(V1). All the maximal cliques in
G(Vi) are enumerated based on those in G(Vi−1)
step by step for i = 2, 3, ..., n. The process
forms an enumeration tree whose root is {v1},
where the root is considered at depth 1 of the
enumeration tree for the sake of simplicity.
The children at depth i are all the maximal
cliques in G(Vi) for i = 2, 3, ..., n. For two
subsets X, Y ⊆ V, we say that X precedes Y in
lexicographic order if for vi ∈ (X − Y) ∪ (Y − X)
with the minimum index i it holds that vi ∈ X.

Let Q be a maximal clique in G(Vi−1). If vi is
adjacent to all vertices in Q, then Q ∪ {vi} is the
only child of Q at depth i. Otherwise, Q itself
is the first child of Q at depth i. In addition, if
(Q ∩ Γ(vi)) ∪ {vi} is a maximal clique in G(Vi),
then it is a candidate for the second child of Q at
depth i. The unique parent of (Q ∩ Γ(vi)) ∪ {vi} is
defined to be the lexicographically first maximal
clique in G(Vi−1) that contains Q ∩ Γ(vi). (In
general, there exist multiple distinct
maximal cliques that contain Q ∩ Γ(vi) at depth
i − 1.)

The algorithm of Tsukiyama et al. [14] traverses
the above enumeration tree in a depth-first
way. Such a traversal is considered to be reverse
search [8]. To be more precise, the algorithm MIS
in [14] enumerates all maximal independent
sets, and we are concerned here with its complementary
algorithm in [8] that enumerates all
maximal cliques, which we call here $\overline{\text{MIS}}$. An
example run of $\overline{\text{MIS}}$ on the graph of Fig. 2a is shown in Fig. 3a.
Algorithms MIS and $\overline{\text{MIS}}$ run in time O(m′n) and
O(mn) per maximal clique, respectively, where
n = |V|, m = |E|, and m′ = |Ē| [8,14].


Clique Enumeration, Fig. 3 Enumeration trees for Fig. 2a in reverse search. (a) By $\overline{\text{MIS}}$ [14,8]. (b) By AMC [8]


Chiba and Nishizeki [4] reduced the time complexity
of $\overline{\text{MIS}}$ to O(a(G)m) per maximal clique,
where a(G) is the arboricity of G with m/(n −
1) ≤ a(G) ≤ O(m^{1/2}) for a connected graph
G. Johnson et al. [7] presented an algorithm that
enumerates all maximal cliques in lexicographic
order in time O(mn) per maximal clique [7, 8].

Makino and Uno [8] proposed new algorithms
that are based on the algorithm of Tsukiyama et
al. [14]. Let C(Q) denote the lexicographically
first maximal clique containing Q in graph G.
The root of their enumeration tree is the lexicographically
first maximal clique Q0 in G. For a
maximal clique Q ≠ Q0 in the enumeration tree,
define the parent of Q to be C(Q ∩ Vi), where i
is the maximum index such that C(Q ∩ Vi) ≠ Q.
Such a parent uniquely exists for every Q ≠ Q0.
In the enumeration tree, Q′ = C((Q ∩ Vi ∩
Γ(vi)) ∪ {vi}) is a child of Q if and only if Q is a
parent of Q′. (In general, a parent has at most |V|
children.) This concludes the description of the
enumeration tree of ALLMAXCLIQUES (AMC
for short) in [8]. An example run on the graph of Fig. 2a is
shown in Fig. 3b, where the bold-faced vertex is
the minimum i such that Q ∩ Vi = Q. Algorithm
AMC runs in time O(M(n)) per maximal
clique, where M(n) denotes the time required to
multiply two n × n matrices. Another algorithm
in [8] runs in time O(Δ⁴) per maximal clique,
where Δ is the maximum degree of G. Here, if
G is sparse, then Δ can be small. In addition,
they presented an algorithm that enumerates all
maximal bipartite cliques in a bipartite graph in
time O(Δ³) per maximal bipartite clique.
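
The operator C(Q) used above can be computed greedily, as in the following sketch. The function name lex_first_maximal_clique and its inputs are hypothetical; order is assumed to list the vertices as v1, v2, ..., vn, and Q is assumed to be a clique already.

```python
def lex_first_maximal_clique(adj, order, q=()):
    """Return C(Q): the lexicographically first maximal clique containing Q.

    Scans vertices in index order and adds each one that is adjacent to
    every vertex collected so far.
    """
    clique = set(q)
    for v in order:
        if v not in clique and all(v in adj[u] for u in clique):
            clique.add(v)
    return clique
```

With Q = ∅ this yields the root Q0 of the enumeration tree. A vertex skipped during the scan stays non-adjacent to some member of the result, which is why the result is maximal.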


Applications 


Clique enumeration has diverse applications in 
clustering, data mining, information retrieval, 
bioinformatics, computer vision, wireless 
networking, computational topology, and many 
other areas. Here, one of Makino and Uno’s 
algorithms [8] was successfully applied for 
enumerating frequent closed itemsets [16]. See 
Recommended Reading [2, 5, 6, 8, 10, 12, 13, 16] 
for details. For practical applications, enumer- 
ation of pseudo cliques is sometimes more 
important [15]. 


Experimental Results 


Experimental results are shown in Recommended
Reading [12, 6, 14, 8]. Tomita et al.’s
algorithm CLIQUES [12] is fast especially for 
graphs with high and medium density. Eppstein 
et al.’s algorithm [5] is effective for very large and 
sparse graphs [6]. Makino and Uno’s algorithms 
[8] can be fast for sparse graphs especially when 
they have a small number of maximal cliques. 
14. Tsukiyama S, Ide M, Ariyoshi H, Shirakawa I (1977)
A new algorithm for generating all the maximal
independent sets. SIAM J Comput 6:505-517

15. Uno T (2010) An efficient algorithm for solving
pseudo clique enumeration problem. Algorithmica
56:3-16


Problem Definition


Background and Overview

Coordinating processors located in different
places is one of the fundamental problems in
distributed computing. In his seminal work,
Lamport [4, 5] studied the model where the
only source of coordination is message exchange
between the processors; the time that elapses
between successive steps at the same processor,
as well as the time spent by a message in
transit, may be arbitrarily large or small.
Lamport observed that in this model, called the
asynchronous model, temporal concepts such
as “past” and “future” are derivatives of causal
dependence, a notion with a simple algorithmic
interpretation. The work of Patt-Shamir and
Rajsbaum [10] can be viewed as extending
Lamport’s qualitative treatment with quantitative
concepts. For example, a statement like “event
a happened before event b” may be refined to
a statement like “event a happened at least 2 time
units and at most 5 time units before event b”.
This is in contrast to most previous theoretical
work, which focused on the linear-programming
aspects of clock synchronization (see below).

The basic idea in [10] is as follows. First,
the framework is extended to allow for upper
and lower bounds on the time that elapses
between pairs of events, using the system’s
real-time specification. The notion of real-time
specification is a very natural one. For example,
most processors have local clocks, whose rate
of progress is typically bounded with respect to
real time (these bounds are usually referred to
as the clock’s “drift bounds”). Another example
is send and receive events of a given message:
It is always true that the receive event does not occur
before the send event, and in many cases, tighter
lower and upper bounds are available. Having
defined real-time specification, [10] proceeds to
show how to combine these local bounds into global
bounds in the best possible way using simple
graph-theoretic concepts. This allows one to
derive optimal protocols that say, for example,
what the current reading of a remote clock is.
If that remote clock is the standard clock, then
the result is optimal clock synchronization in the
common sense (this concept is called “external
synchronization” below).


Formal Model 

The system consists of a fixed set of intercon- 
nected processors. Each processor has a local 
clock. An execution of the system is a sequence 
of events, where each event is either a send 
event, a receive event, or an internal event. Re- 
garding communication, it is only assumed that 
each receive event of a message m has a unique 
corresponding send event of m. This means that 
messages may be arbitrarily lost, duplicated or 
reordered, but not corrupted. Each event e oc- 
curs at a single specified processor, and has two 
real numbers associated with it: its local time, 
denoted LT(e), and its real time, denoted RT(e). 
The local time of an event models the reading 
of the local clock when that event occurs, and 
the local processor may use this value, e.g., for 
calculations, or by sending it over to another 
processor. By contrast, the real time of an event 
is not observable by processors: it is an abstract 
concept that exists only in the analysis. 

Finally, the real-time properties of the system 
are modeled by a pair of functions that map 
each pair of events to ℝ ∪ {−∞, ∞}: given
two events e and e’, L(e, e’) = ℓ means that
RT(e’) − RT(e) ≥ ℓ, and H(e, e’) = h means
that RT(e’) − RT(e) ≤ h, i.e., that the number
of (real) time units since the occurrence of event
e until the occurrence of e’ is at least ℓ and at
most h. Without loss of generality, it is assumed
that L(e, e’) = −H(e’, e) for all events e, e’ (just
use the smaller of them). Henceforth, only the 
upper bounds function H is used to represent the 
real-time specification. 

Some special cases of real time properties 
are particularly important. In a completely asyn- 
chronous system, H(e’,e) = 0 if either e occurs 
before e’ in the same processor, or if e and e’ are 
the send and receive events, respectively, of the 
same message. (For simplicity, it is assumed that 
two ordered events may have the same real time 
of occurrence.) In all other cases H(e, e’) = ∞.
On the other extreme of the model spectrum, 
there is the drift-free clocks model, where all 
local clocks run at exactly the rate of real time. 
Formally, in this case H(e, e’) = LT(e’) — LT(e) 
for any two events e and e’ occurring at the 
same processor. Obviously, it may be the case 
that only some of the clocks in the system are 
drift-free. 


Algorithms 

In this work, message generation and delivery is 
completely decoupled from message information. 
Formally, messages are assumed to be generated 
by some “send module”, and delivered by the 
“communication system’’. The task of algorithms 
is to add contents in messages and state variables 
in each node. (The idea of decoupling synchro- 
nization information from message generation 
was introduced in [1].) The algorithm only has 
local information, i.e., contents of the local state 
variables and the local clock, as well as the con- 
tents of the incoming message, if we are dealing 
with a receive event. It is also assumed that the 
real time specification is known to the algorithm. 
The events of an execution, together with their
local times (but not their real times), are called
the view of the given execution. Algorithms,
therefore, can only use as input the view of an 
execution and its real time specification. 


Clock Synchronization 


Problem Statement 

The simplest variant of clock synchronization 
is external synchronization, where one of the 
processors, called the source, has a drift-free 
clock, and the task of all processors is to maintain 
the tightest possible estimate on the current 
reading of the source clock. This formulation 
corresponds to the Newtonian model, where 
the processors reside in a well-defined time 
coordinate system, and the source clock is 
reading the standard time. Formally, in external 
synchronization each processor v has two output
variables Δ_v and ε_v; the estimate of v of the
source time at a given state is LT_v + Δ_v, where
LT_v is the current local time at v. The algorithm is
required to guarantee that the difference between
the source time and its estimate is at most ε_v (note
that Δ_v, as well as ε_v, may change dynamically
during the execution). The performance of the
algorithm is judged by the value of the ε_v
variables: the smaller, the better. 

In another variant of the problem, called in- 
ternal synchronization, there is no distinguished 
processor, and the requirement is essentially that 
all clocks will have values which are close to 
each other. Defining this variant is not as straight- 
forward, because trivial solutions (e.g., “set all 
clocks to 0 all the time’’) must be disqualified. 


Key Results 


The key construct used in [10] is the synchroniza- 
tion graph of an execution, defined by combining 
the concepts of local times and real-time specifi- 
cation as follows. 


Definition 1 Let β be a view of an execution
of the system, and let H be a real time specifi-
cation for β. The synchronization graph gener-
ated by β and H is a directed weighted graph
Γ_βH = (V, E, w), where V is the set of events
in β, and for each ordered pair of events p, q
in β such that H(p, q) < ∞, there is a directed
edge (p, q) ∈ E. The weight of an edge (p, q) is


w(p, q) = H(p, q) − LT(p) + LT(q).

The natural concept of distance from an event
p to an event q in a synchronization graph Γ,
denoted d_Γ(p, q), is defined as the weight of a
minimum-weight path from p to q, or infinity if
q is not reachable from p. Since weights may
be negative, one has to prove that the concept
is well defined: indeed, it is shown that if Γ_βH
is derived from an execution with view β that
satisfies real time specification H, then Γ_βH
does not contain directed cycles of negative
weight. 
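
As a minimal sketch of the construction above, the following Python code builds the edge weights from local times and a finite upper-bound function, then computes all pairwise distances. The weight formula is taken as printed in Definition 1; the function names, the dictionary-based input format, and the choice of Floyd-Warshall are illustrative assumptions, not the method of [10].

```python
import math

def sync_graph_weights(lt, H):
    """Edge weights w(p, q) = H(p, q) - LT(p) + LT(q), as printed in Definition 1.

    lt: dict mapping each event to its local time.
    H:  dict mapping ordered event pairs (p, q) to an upper bound;
        pairs with an infinite bound produce no edge.
    """
    return {(p, q): h - lt[p] + lt[q]
            for (p, q), h in H.items() if h < math.inf}

def all_distances(events, w):
    """All-pairs minimum path weights (Floyd-Warshall).

    Raises ValueError if a negative-weight cycle exists, which means the
    view cannot come from an execution satisfying the specification.
    """
    d = {}
    for p in events:
        for q in events:
            d[p, q] = w.get((p, q), math.inf)
        d[p, p] = min(0, d[p, p])
    for k in events:
        for p in events:
            for q in events:
                if d[p, k] + d[k, q] < d[p, q]:
                    d[p, q] = d[p, k] + d[k, q]
    if any(d[p, p] < 0 for p in events):
        raise ValueError("negative-weight cycle in synchronization graph")
    return d
```

The cubic running time is acceptable only for small views; it is used here because it handles negative weights and exposes negative cycles on the diagonal.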

The main algorithmic result concerning syn- 
chronization graphs is summarized in the follow- 
ing theorem. 


Theorem 1 Let α be an execution with view
β. Then α satisfies the real time specification H
if and only if RT(p) − RT(q) ≤ d_Γ(p, q) +
LT(p) − LT(q) for any two events p and q in Γ_βH.


Note that all quantities in the r.h.s. of the inequal- 
ity are available to the synchronization algorithm, 
which can therefore determine upper bounds on 
the real time that elapses between events. More- 
over, these bounds are the best possible, as im- 
plied by the next theorem. 


Theorem 2 Let Γ_βH = (V, E, w) be a synchro-
nization graph obtained from a view β satisfying
real time specification H. Then for any given
event p_0 ∈ V, and for any finite number N > 0,
there exist executions α_0 and α_1 with view β, both
satisfying H, and such that the following real time
assignments hold.


• In α_0, for all q ∈ V with d_Γ(q, p_0) <
∞, RT_α0(q) = LT(q) + d_Γ(q, p_0), and for
all q ∈ V with d_Γ(q, p_0) = ∞, RT_α0(q) >
LT(q) + N.

• In α_1, for all q ∈ V with d_Γ(p_0, q) <
∞, RT_α1(q) = LT(q) − d_Γ(p_0, q), and for
all q ∈ V with d_Γ(p_0, q) = ∞, RT_α1(q) <
LT(q) − N.


From the algorithmic viewpoint, one important
drawback of the results of Theorems 1 and 2 is that
they depend on the view of an execution, which
may grow without bound. Is it really necessary?
The last general result in [10] answers this ques-
tion in the affirmative. Specifically, it is shown
that in some variant of the branching program 
computational model, the space complexity 
of any synchronization algorithm that works 
with arbitrary real time specifications cannot 
be bounded by a function of the system size. The 
result is proved by considering multiple scenarios 
on a simple system of four processors on a line. 


Later Developments 

Based on the concept of synchronization graph, 
Ostrovsky and Patt-Shamir present a refined gen- 
eral optimal algorithm for clock synchroniza- 
tion [9]. The idea in [9] is to discard parts of 
the synchronization graphs that are no longer 
relevant. Roughly speaking, the complexity of 
the algorithm is bounded by a polynomial in the 
system size and the ratio of processors speeds. 

Much theoretical work was invested in the 
internal synchronization variant of the problem. 
For example, Lundelius and Lynch [7] proved 
that in a system of n processors with full con- 
nectivity, if message delays can take arbitrary 
values in [0, 1] and local clocks are drift-free, then 
the best synchronization that can be guaranteed 
is 1 − 1/n. Halpern et al. [3] extended their re-
sult to general graphs using linear-programming
techniques. This work, in turn, was extended by 
Attiya et al. [1] to analyze any given execution 
(rather than only the worst case for a given topol- 
ogy), but the analysis is performed off-line and 
in a centralized fashion. The work of Patt-Shamir 
and Rajsbaum [10] extended the “per execution” 
viewpoint to on-line distributed algorithms, and 
shifted the focus of the problem to external syn- 
chronization. 

Recently, Fan and Lynch [2] proved that in 
a line of n processors whose clocks may drift, 
no algorithm can guarantee that the difference be- 
tween the clock readings of all pairs of neighbors 
is o(log n / log log n).

Clock synchronization is very useful in 
practice. See, for example, Liskov [6] for 
some motivation. It is worth noting that the 
Internet provides a protocol for external clock 
synchronization called NTP [8]. 


Applications


Theorem 1 immediately gives rise to an algo-
rithm for clock synchronization: every processor
maintains a representation of the synchronization 
graph portion known to it. This can be done 
using a full information protocol: In each out- 
going message this graph is sent, and whenever 
a message arrives, the graph is extended to in- 
clude the new information from the graph in the 
arriving message. By Theorem 2, the synchro- 
nization graph obtained this way represents at 
any point in time all available information re-
quired for optimal synchronization. For example,
consider external synchronization. Directly from 
definitions it follows that all events associated 
with a drift-free clock (such as events in the 
source node) are at distance 0 from each other in 
the synchronization graph, and can therefore be 
considered, for distance computations, as a single 
node s. Now, assuming that the source clock 
actually shows real time, it is easy to see that for 
any event p, 


RT(p) ∈ [LT(p) − d(s, p), LT(p) + d(p, s)],


and furthermore, no better bounds can be ob- 
tained by any correct algorithm. 
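
The interval above translates directly into the two output variables of external synchronization. The sketch below is illustrative only: the function names are hypothetical, and choosing the midpoint of the interval as the estimate is an assumption made here (it minimizes the worst-case error over the interval), not a rule stated in the text.

```python
import math

def real_time_interval(lt_p, d_s_p, d_p_s):
    """Bounds [LT(p) - d(s, p), LT(p) + d(p, s)] on RT(p)."""
    return lt_p - d_s_p, lt_p + d_p_s

def correction_and_error(lt_p, d_s_p, d_p_s):
    """Return (delta, eps) such that LT(p) + delta estimates the source
    time with error at most eps, or None if a bound is infinite."""
    lo, hi = real_time_interval(lt_p, d_s_p, d_p_s)
    if math.isinf(lo) or math.isinf(hi):
        return None
    delta = (lo + hi) / 2 - lt_p   # offset added to the local clock
    eps = (hi - lo) / 2            # half-width of the interval
    return delta, eps
```

For instance, with LT(p) = 10, d(s, p) = 2 and d(p, s) = 4 the interval is [8, 14], so the midpoint estimate is 11 with error bound 3.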

The general algorithm described above 
(maintaining the complete synchronization 
graph) can be used also to obtain optimal 
results for internal synchronization; details are 
omitted. 

An interesting special case is where all clocks 
are drift free. In this case, the size of the 
synchronization graph remains fixed: similarly 
to a source node in external synchronization, 
all events occurring at the same processor can 
be mapped to a single node; parallel edges 
can be replaced by a single new edge whose 
weight is minimal among all old edges. This 
way one can obtain a particularly efficient 
distributed algorithm solving external clock 
synchronization, based on the distributed
Bellman—Ford algorithm for distance compu- 
tation. 
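
The collapsing step for drift-free clocks can be sketched as follows. This is a centralized illustration under stated assumptions: the edge-list input format and the function names are hypothetical, and the Bellman-Ford loop below is the sequential version, standing in for the distributed variant mentioned above.

```python
import math

def collapse(edges, proc):
    """Map every event to its processor and keep, for each ordered pair of
    distinct processors, only the minimum-weight edge.

    edges: iterable of (p, q, weight) triples between events.
    proc:  dict mapping each event to the processor where it occurs.
    """
    out = {}
    for p, q, weight in edges:
        u, v = proc[p], proc[q]
        if u != v:  # events of one drift-free processor are at distance 0
            out[u, v] = min(weight, out.get((u, v), math.inf))
    return out

def bellman_ford(nodes, w, src):
    """Minimum path weights from src; assumes no negative-weight cycle."""
    dist = {v: math.inf for v in nodes}
    dist[src] = 0
    for _ in range(len(nodes) - 1):
        for (u, v), c in w.items():
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
    return dist
```

After collapsing, the graph has one node per processor, so its size no longer depends on the length of the execution.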


Finally, note that the asynchronous model may
also be viewed as a special case of this general
theory, where an event p “happens before” an
event q if and only if d(p,q) ≤ 0.


Open Problems


One central issue in clock synchronization is
faulty executions, where the real time specifica-
tion is violated. Synchronization graphs detect
any detectable error: views which do not have
an execution that conforms with the real time
specification will result in synchronization graphs
with negative cycles. However, it is desirable to
overcome such faults, say by removing from the
synchronization graph some edges so as to break
all negative-weight cycles. The natural objective
in this case is to remove the least number of
edges. This problem is APX-hard as it generalizes
the Feedback Arc Set problem. Unfortunately, no
non-trivial approximation algorithms for it are
known.


Closest String and Substring Problems


Problem Definition


The problem of finding a center string that is 
“close” to every given string arises and has 
applications in computational molecular biol- 
ogy [4,5, 9-11, 18, 19] and coding theory [1,6,7]. 

This problem has two versions: The first prob- 
lem comes from coding theory when we are 
looking for a code not too far away from a given 
set of codes. 


Problem 1 (The closest string problem) Input:
a set of strings S = {s_1, s_2, ..., s_n}, each of
length m.

Output: the smallest d and a string s of length
m which is within Hamming distance d to each
s_i ∈ S.


The second problem is much more elusive 
than the closest string problem. The problem 
is formulated from applications in finding con- 
served regions, genetic drug target identification, 
and genetic probes in molecular biology. 


Problem 2 (The closest substring problem)
Input: an integer L and a set of strings S =
{s_1, s_2, ..., s_n}, each of length m.

Output: the smallest d and a string s, of length
L, which is within Hamming distance d away
from a length L substring t_i of s_i for i =
1,2,...,n. 
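
To make the two problem statements concrete, here is a small Python sketch. The helper names are hypothetical. `radius` evaluates a candidate center for Problem 2 (and for Problem 1 when the center has the same length as the inputs). `closest_string_bruteforce` solves Problem 1 exactly by exhaustive search, restricted to characters that occur in each column, which is safe because swapping in a character that occurs in the column never increases any distance. It takes exponential time and is meant only for tiny instances.

```python
from itertools import product

def hamming(a, b):
    """Number of positions where equal-length sequences a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def substring_distance(s, t):
    """Smallest Hamming distance between s and a length-len(s) substring of t.

    Assumes len(t) >= len(s); otherwise there is no such substring.
    """
    L = len(s)
    return min(hamming(s, t[i:i + L]) for i in range(len(t) - L + 1))

def radius(s, strings):
    """The distance d achieved by candidate center s over all input strings."""
    return max(substring_distance(s, t) for t in strings)

def closest_string_bruteforce(strings):
    """Exact answer (d, s) to Problem 1 for equal-length input strings."""
    columns = [sorted(set(col)) for col in zip(*strings)]
    candidates = ("".join(c) for c in product(*columns))
    return min((radius(c, strings), c) for c in candidates)
```

Ties between optimal centers are broken lexicographically because the search minimizes over (distance, string) pairs.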


The following results on approximation algo- 
rithms are from [12-15]. 


Theorem 1 There is a polynomial time approxi- 
mation scheme for the closest string problem. 


Theorem 2 There is a polynomial time approxi- 
mation scheme for the closest substring problem. 


A faster approximation algorithm for the clos- 
est string problem was given in [16]. 

Many results have been obtained in terms of
parameterized complexity and fixed-parameter
algorithms. In 2005, Marx showed that the
closest substring problem is W[1]-hard even
if both d and n are parameters [17]. Two
algorithms for the closest substring problem
have been developed [17] for the cases where
d and n are small. The running times for
the two algorithms are f(d) · m^{O(log d)} and
g(d, n) · n^{O(log log n)} for some functions f
and g, respectively. The first fixed-parameter
algorithm for the closest string problem has a
running time complexity O(n d^{d+1}) [8]. Ma
and Sun designed a fixed-parameter algorithm
with running time O(nm + nd · (16|Σ|)^d)
for the closest string problem [16]. Extending
the algorithm for the closest string problem,
an O(nL + nd · 2^{4d} |Σ|^d · m^{⌈log d⌉+1}) time
algorithm was given for the closest substring
problem [16].

Since then, a series of improved algorithms 
have been obtained. Wang and Zhu gave an 
O(nL + nd · (2^{3.25}(|Σ| − 1))^d) algorithm
for the closest string problem [20]. Chen and
Wang gave an algorithm with running times
O(nL + nd · 47.21^d) for protein with |Σ| = 20
and O(nL + nd · 13.92^d) for DNA with |Σ| = 4,
respectively [2]. They also developed a software
package for the (L,d) motif model. Currently
the fastest fixed-parameter algorithm for the
closest string problem was given by Chen,
Ma, and Wang. They developed a three-string
approach and the running time of the algorithm
is O(nL + nd^3 · 6.731^d) for binary strings [3].

Results for other measures with applications 
in computational biology can be found in [5, 9, 
18, 19]. 


Applications 


Many problems in molecular biology involve 
finding similar regions common to each sequence 
in a given set of DNA, RNA, or protein 
sequences. These problems find applications 
in locating binding sites and finding conserved 
regions in unaligned sequences [5, 9, 18, 19], 
genetic drug target identification [10], designing 
genetic probes [10], universal PCR primer de- 
sign [5, 10], and, outside computational biology, 
in coding theory [1, 6,7]. Such problems may 
be considered to be various generalizations of 
the common substring problem, allowing errors. 
Many measures have been proposed for finding 
such regions common to every given string. 
A popular and one of the most fundamental 
measures is the Hamming distance. Moreover,
two popular objective functions are used in these 
areas. One is the total sum of distances between 
the center string (common substring) and each 
of the given strings. The other is the maximum 
distance between the center string and a given 
string. For more details, see [10]. 
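
The two objective functions can disagree about which center is best. The following sketch (function names are illustrative) computes both for equal-length strings.

```python
def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def max_cost(center, strings):
    """Maximum distance between the center and any given string."""
    return max(hamming(center, s) for s in strings)

def sum_cost(center, strings):
    """Total sum of distances between the center and the given strings."""
    return sum(hamming(center, s) for s in strings)
```

For the strings "aaaa", "aaaa", "aaaa", "bbbb", the center "aaaa" has sum 4 and maximum 4, while the center "aabb" has sum 8 and maximum 2, so each objective prefers a different center.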


A More General Problem 


The distinguishing substring selection problem 
has as input two sets of strings, B and G. It is
required to find a substring of unspecified length
(denoted by L) such that it is, informally, close to
a substring of every string in B and far away from
every length L substring of strings in G. However, 
we can go through all the possible length L 
substrings of strings in G, and we may assume 
that every string in G has the same length L since 
G can be reconstructed to contain all substrings 
of length L in each of the good strings. 

The problem is formally defined as follows: 
Given a set B = {s_1, s_2, ..., s_{n_1}} of n_1 (bad)
strings of length at least L, and a set G =
{g_1, g_2, ..., g_{n_2}} of n_2 (good) strings of length
exactly L, as well as two integers d_b and d_g
(d_b ≤ d_g), the distinguishing substring selection
problem (DSSP) is to find a string s such that
for each string s_i ∈ B, there exists a length L substring
t_i of s_i with d(s, t_i) ≤ d_b, and for any string
g_i ∈ G, d(s, g_i) ≥ d_g. Here d(·,·) represents
the Hamming distance between two strings. If 
all strings in B are also of the same length L, 
the problem is called the distinguishing string 
problem (DSP). 
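
The DSSP conditions can be verified for a given candidate with a few lines of Python. This is only a checker, sketched under the definition above with hypothetical names; it does not search for a solution.

```python
def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def windows(t, L):
    """All length-L substrings of t."""
    return [t[i:i + L] for i in range(len(t) - L + 1)]

def is_dssp_solution(s, bad, good, d_b, d_g):
    """True if s is within distance d_b of some length-len(s) substring of
    every bad string and at distance at least d_g from every good string.

    Assumes every bad string has length >= len(s) and every good string
    has length exactly len(s).
    """
    L = len(s)
    close_to_bad = all(
        min(hamming(s, w) for w in windows(t, L)) <= d_b for t in bad
    )
    far_from_good = all(hamming(s, g) >= d_g for g in good)
    return close_to_bad and far_from_good
```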

The distinguishing string problem was first 
proposed in [10] for generic drug target design. 
The following results are from [4]. 


Theorem 3 There is a polynomial time
approximation scheme for the distinguishing
substring selection problem. That is, for
any constant ε > 0, the algorithm finds
a string s of length L such that for every
s_i ∈ B, there is a length L substring t_i of
s_i with d(t_i, s) ≤ (1 + ε)d_b and for every
substring u_j of length L of every g_i ∈ G,
d(u_j, s) ≥ (1 − ε)d_g, if a solution to
the original pair (d_b ≤ d_g) exists. Since
there are a polynomial number of such pairs
(d_b, d_g), we can exhaust all the possibilities in
polynomial time to find a good approximation 
required by the corresponding application 
problems. 


Open Problems 


The PTASs designed here use linear program- 
ming and randomized rounding technique to 
solve some cases for the problem. Thus, the 
running time complexity of the algorithms for 
both the closest string and closest substring is 
very high. An interesting open problem is to 
design more efficient PTASs for both problems. 


Problem Definition


CLOSEST SUBSTRING is a core problem in the 
field of consensus string analysis with, in par- 
ticular, applications in computational biology. Its 
decision version is defined as follows. 


CLOSEST SUBSTRING 

Input: k strings s_1, s_2, ..., s_k over alphabet Σ
and non-negative integers d and L.

Question: Is there a string s of length L and, for
all i = 1, ..., k, a length-L substring s_i’ of s_i such
that d_H(s, s_i’) ≤ d?

Here d_H(s, s_i’) denotes the Hamming distance
between s and s_i’, i.e., the number of positions in
which s and s_i’ differ. Following the notation used
in [7], m is used to denote the average length of
the input strings and n to denote the total size of
the problem input.

The optimization version of CLOSEST SUB- 
STRING asks for the minimum value of the dis- 
tance parameter d for which the input strings still 
allow a solution. 


Key Results 


The classical complexity of CLOSEST SUB- 
STRING is given by 


Theorem 1 ([4, 5]) CLOSEST SUBSTRING is 
NP-complete, and remains so for the special case 
of the CLOSEST STRING problem, where the 
requested solution string s has to be of same 
length as the input strings. CLOSEST STRING is 
NP-complete even for the further restriction to 
a binary alphabet. 


The following theorem gives the central state- 
ment concerning the problem’s approximability: 


Theorem 2 ([6]) CLOSEST SUBSTRING (as well 
as CLOSEST STRING) admit polynomial time 
approximation schemes (PTAS’s), where the ob- 
jective function is the minimum Hamming dis- 
tance d. 


In its randomized version, the PTAS cited by
Theorem 2 computes, with high probability,
a solution with Hamming distance (1 + ε)d_opt
for an optimum value d_opt in (k^2 m)^{O(log |Σ| / ε^4)}
running time. With additional overhead, this ran-
domized PTAS can be derandomized. A straight-
forward and efficient factor-2 approximation
for CLOSEST SUBSTRING is obtained by trying
all length-L substrings of one of the input
strings. 
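
The factor-2 approximation just described can be sketched as follows; the function names are illustrative. The guarantee follows from the triangle inequality: the window of the first input string that an optimal center matches is within d_opt of that center, hence within 2·d_opt of the matching window of every other string.

```python
def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def radius(s, strings):
    """Max over input strings of the distance from s to its best window."""
    L = len(s)
    return max(
        min(hamming(s, t[i:i + L]) for i in range(len(t) - L + 1))
        for t in strings
    )

def two_approx_closest_substring(strings, L):
    """Try every length-L substring of the first input string as the center
    and return the best (distance, center) pair found."""
    first = strings[0]
    candidates = {first[i:i + L] for i in range(len(first) - L + 1)}
    return min((radius(c, strings), c) for c in candidates)
```

On the inputs "aa" and "bb" with L = 2, the only candidate is "aa" with distance 2, whereas the optimal center "ab" achieves distance 1, so the factor of 2 can be attained.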

The following two statements address the 
problem’s parametrized complexity, with respect 
to both obvious problem parameters d and k: 


Theorem 3 ([3]) CLOSEST SUBSTRING is
W[1]-hard with respect to the parameter k, even 
for binary alphabet. 


Theorem 4 ([7]) CLOSEST SUBSTRING is 
W[1]-hard with respect to the parameter d, even 
for binary alphabet. 


For non-binary alphabet the statement of Theo- 
rem 3 has been shown independently by Evans 
et al. [2]. Theorem 3 and Theorem 4 show that an 
exact algorithm for CLOSEST SUBSTRING with 
polynomial running time is unlikely for a con- 
stant value of d as well as for a constant value 
of k, i.e., such an algorithm does not exist unless 
3-SAT can be solved in subexponential time. 
Theorem 4 also allows additional insights into 
the problem’s approximability: In the PTAS for 
CLOSEST SUBSTRING, the exponent of the poly- 
nomial bounding the running time depends on the 
approximation factor. These are not “efficient”
PTAS’s (EPTAS’s), i.e., PTAS’s with an f(ε) · n^c
running time for some function f and some con-
stant c, and therefore are probably not useful in
practice. Theorem 4 implies that most likely the
PTAS with the n^{O(1/ε^4)} running time presented
in [6] cannot be improved to an EPTAS. More
precisely, there is no f(ε) · n^{o(log(1/ε))} time PTAS
for CLOSEST SUBSTRING unless 3-SAT can be
solved in subexponential time. Moreover, the
proof of Theorem 4 also yields the following.


Theorem 5 ([7]) There are no f(d, k) · n^{o(log d)}
time and no g(d, k) · n^{o(log log k)} exact algorithms
solving CLOSEST SUBSTRING for some functions
f and g unless 3-SAT can be solved in subexpo-
nential time.

For unbounded alphabet the bounds have
been strengthened by showing that CLOSEST
SUBSTRING has no PTAS with running time
f(ε) · n^{o(1/ε)} for any function f unless 3-SAT
can be solved in subexponential time [10]. The
following statements provide exact algorithms 
for CLOSEST SUBSTRING with small fixed 
values of d and k, matching the bounds given 
in Theorem 5: 


Theorem 6 ([7]) CLOSEST SUBSTRING can be
solved in time f(d) · n^{O(log d)} for some function
f, where, more precisely, f(d) = |Σ|^{d(log d + 2)}.


Theorem 7 ([7]) CLOSEST SUBSTRING
can be solved in time g(d, k) · n^{O(log log k)}
for some function g, where, more precisely,
g(d, k) = (|Σ|d)^{O(kd)}.


With regard to problem parameter L, CLOS- 
EST SUBSTRING can be trivially solved in 
O(|Σ|^L · n) time by trying all possible strings
over alphabet Σ.


Applications 


An application of CLOSEST SUBSTRING lies in 
the analysis of biological sequences. In motif 
discovery, a goal is to search “signals” common 
to a set of selected strings representing DNA or 
protein sequences. One way to represent these 
signals is as approximately preserved substrings
occurring in each of the input strings. Employing 
Hamming distance as a biologically meaningful 
distance measure results in the problem formula- 
tion of CLOSEST SUBSTRING. 

For example, Sagot [9] studies motif 
discovery by solving CLOSEST SUBSTRING 
(and generalizations thereof) using suffix
trees; this approach has a worst-case running
time of O(k^2 m · L^d · |Σ|^d). In the context
of motif discovery, also heuristics applicable 
to CLOSEST SUBSTRING were proposed, 
e.g., Pevzner and Sze [8] present an algo- 
rithm called WINNOWER and Buhler and 
Tompa [1] use a technique called random 
projections. 


Open Problems


It is open [7] whether the n^{O(1/ε^4)} running time
of the approximation scheme presented in [6] can
be improved to n^{O(log(1/ε))}, matching the bound
derived from Theorem 4.


Cross-References 


The following problems are close relatives of 
CLOSEST SUBSTRING: 


• Closest String and Substring Problems is
the special case of CLOSEST SUBSTRING,
where the requested solution string s has to
be of same length as the input strings.

• Distinguishing Substring Selection is the gen-
eralization of CLOSEST SUBSTRING, where
a second set of input strings and an addi-
tional integer d’ are given and where the
requested solution string s has, in addition
to the requirements posed by CLOSEST SUB-
STRING, Hamming distance at least d’ with
every length-L substring from the second set
of strings.

• Consensus Patterns is the problem obtained
by replacing, in the definition of CLOSEST
SUBSTRING, the maximum of Hamming dis-
tances by the sum of Hamming distances. The
resulting modified question of CONSENSUS
PATTERNS is: Is there a string s of length L
with


∑_{i=1,...,k} d_H(s, s_i’) ≤ d ?


CONSENSUS PATTERNS is the special case 
of SUBSTRING PARSIMONY in which the 
phylogenetic tree provided in the definition of 
SUBSTRING PARSIMONY is a Star phylogeny. 


Problem Definition


A clustered graph C(G,T) consists of a graph 
G, called underlying graph, and of a rooted tree 
T, called inclusion tree. The leaves of T are 
the vertices of G; each internal node μ of T
represents a cluster, that is, the set of vertices of
G that are the leaves of the subtree of T rooted
at μ. Figure 1 shows a clustered graph and its
inclusion tree.

Clustered Graph Drawing, Fig. 1 A clustered graph C(G, T) (left) and its inclusion tree (right)

Clustered Graph Drawing, Fig. 2 Types of crossings in a drawing of a clustered graph

Clustered graphs are widely used in applica- 
tions where it is needed at the same time to rep- 
resent relationships between entities and to group 
entities with semantic affinities. For example, in 
the Internet network, the routers and the links 
between them are the vertices and edges of a 
graph, respectively; geographically close routers 
are grouped into areas that are hence associ- 
ated with clusters of vertices. In turn, areas are 
grouped into autonomous systems that are hence 
associated with clusters of vertices. 

Visualizing clustered graphs is a difficult prob- 
lem, due to the simultaneous need for a readable 
drawing of the underlying structure and of the 
clustering relationship. As for the visualization 
of graphs, the most important aesthetic criterion 
for the readability of a drawing of a clustered 
graph is the planarity, whose definition needs a 
refinement in the context of clustered graphs, in 
order to deal with the clustering structure. 

In a drawing of a clustered graph C(G,T), 
vertices and edges of G are drawn as points and 
open curves, respectively, and each cluster μ is
represented by a simple closed region R_μ con-
taining all and only the vertices of μ. A drawing
of C can have three types of crossings. Edge-edge
crossings are crossings between edges of G (see
Fig. 2, left). Consider an edge e of G and a cluster
μ in T. If e intersects the boundary of R_μ more
than once, we have an edge-region crossing (see
Fig. 2, middle). Finally, consider two clusters μ
and ν in T; if the boundary of R_μ intersects the
boundary of R_ν, we have a region-region crossing
(see Fig. 2, right). A drawing of a clustered graph 
is c-planar (short for clustered planar) if it does 
not have any edge-edge, edge-region, or region- 
region crossing. A clustered graph is c-planar 
if it admits a c-planar drawing. A drawing of a 
clustered graph is straight line if each edge is 
represented by a straight-line segment; also, it is 
convex if each cluster is represented by a convex 
region. 

The notion of c-planarity was first introduced 
by Feng, Cohen, and Eades in 1995 [10, 11]. 
The graph drawing community has subsequently 
adopted this definition as a standard, and the 
topological and geometric properties of c-planar 
drawings of clustered graphs have been investi- 
gated in tens of papers. The two main questions 
raised by Feng, Cohen, and Eades were the fol- 
lowing. 


Problem 1 (C-Planarity Testing) 


QUESTION: What’s the time complexity of test- 
ing the c-planarity of a clustered graph? 


Problem 2 (Straight-Line Convex C-Planar
Drawability)


QUESTION: Does every c-planar clustered
graph admit a straight-line convex c-planar 
drawing? 


Key Results 


Almost 20 years after the publication of the 
seminal papers by Feng et al. [10, 11], a solution 
for Problem 1 remains an elusive goal, arguably
the most intriguing and well-studied algorithmic
problem in the graph drawing research area (see,
e.g., [3, 4, 7, 12, 13, 15-17]).

Polynomial-time algorithms have been pre- 
sented to test the c-planarity of a large number of 
classes of clustered graphs. Particular attention
has been devoted to c-connected clustered graphs,
that is, clustered graphs C(G, T) such that each
cluster μ ∈ T induces a connected subgraph
G_μ of G. The following theorem reveals the
importance of c-connected clustered graphs. 


Theorem 1 (Feng, Cohen, and Eades [11]) A 
clustered graph is c-planar if and only if it is 
a subgraph of a c-planar c-connected clustered 
graph. 
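
The c-connectedness condition used in Theorem 1 is easy to check directly from the definitions. The sketch below assumes a hypothetical representation: the inclusion tree is a dictionary from each internal node to its list of children (leaves are the vertices of G), and G is an adjacency dictionary. It tests c-connectedness only, not c-planarity.

```python
from collections import deque

def cluster_vertices(children, node):
    """Vertices of G that are leaves of the subtree of T rooted at node."""
    kids = children.get(node)
    if not kids:                      # a leaf of T is a vertex of G
        return {node}
    result = set()
    for child in kids:
        result |= cluster_vertices(children, child)
    return result

def induces_connected(adj, vertices):
    """True if the subgraph of G induced by `vertices` is connected (BFS)."""
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in vertices and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == vertices

def is_c_connected(adj, children):
    """True if every cluster (internal node of T) induces a connected subgraph."""
    return all(
        induces_connected(adj, cluster_vertices(children, mu))
        for mu in children
    )
```

For the path 1-2-3-4, grouping {1, 2} and {3, 4} yields a c-connected clustered graph, while grouping {1, 3} and {2, 4} does not.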


Feng, Cohen, and Eades provided in [11] a 
nice and simple quadratic-time testing algorithm, 
which is described in the following. 


Theorem 2 (Feng, Cohen, and Eades [11]) 
The c-planarity of an n-vertex c-connected 
clustered graph can be tested in O(n^2) time.


The starting point of the result of Feng et al. is a
characterization of c-planar drawings. 


Theorem 3 (Feng, Cohen, and Eades [11])
A drawing of a c-connected clustered graph
C(G, T) is c-planar if and only if it is planar,
and, for each cluster μ, all the vertices and edges
of G − G_μ are in the outer face of the drawing of
G_μ.


The algorithm of Feng et al. [11] performs a 
bottom-up traversal of T. 

When a node μ ∈ T is considered, the
algorithm tests whether a drawing Γ_μ of G_μ exists
such that (P1) for each descendant ν of μ, all
the vertices and edges of G_μ − G_ν are in the
outer face of the drawing of G_ν in Γ_μ, and
(P2) all the vertices of G_μ having neighbors in
G − G_μ are incident to the outer face of G_μ in
Γ_μ. Feng et al. show how a PQ-tree P_μ [2] can
be used to efficiently represent all the (possibly
exponentially many) orderings in which the edges
incident to μ can cross the boundary of R_μ in any
planar drawing Γ_μ of G_μ satisfying properties P1
and P2 (see Fig. 3, left and middle).

PQ-tree P_μ can be easily computed for each
leaf μ ∈ T. Consider an internal node μ ∈
T and assume that PQ-trees P_{μ_1}, ..., P_{μ_k} have
been associated to the children μ_1, ..., μ_k of
μ. Representative graphs H_{μ_1}, ..., H_{μ_k} are con-
structed from P_{μ_1}, ..., P_{μ_k}; the embeddings of
H_{μ_i} are in bijection with the embeddings of
G_{μ_i} satisfying properties P1 and P2 (see Fig. 3,
left and right). Then, a graph G'_μ is constructed
composed of H_{μ_1}, ..., H_{μ_k}, of a dummy vertex
v_μ, and of length-2 paths connecting v_μ with
every vertex of H_{μ_1}, ..., H_{μ_k} that has a neighbor
in G − G_μ. Feng et al. argue that the embeddings
of G_μ satisfying properties P1 and P2 are in
bijection with the embeddings of G'_μ in which v_μ
is incident to the outer face. Hence, a planarity
test for G'_μ is performed. This allows one to deter-
mine P_μ, thus allowing the visit of T to go on.

If no planarity test fails, the algorithm com- 
pletes the visit of T. Top-down traversing T and 
fixing an embedding for the PQ-tree associated to 
each node of T determines a c-planar drawing of 
C(G,T). 

Involved linear-time algorithms to test the 
c-planarity of c-connected clustered graphs are 
known nowadays [5, 6]. The algorithm in [5] 
relies on a structural characterization of the 
c-planarity of a c-connected clustered graph 
C(G, T) based on the decomposition of G into 
triconnected components. The characterization 
allows one to test in linear time the c-planarity 
of C(G,T) via a bottom-up visit of the SPQR- 
tree [8] of G, which is a data structure efficiently 
representing the planar embeddings of G. 

Problem 1 is fundamental for the graph drawing
research area. However, no less importance
has to be attributed to the task of designing
algorithms for constructing geometric representations
of clustered graphs. The milestones in
this research direction have been established by
Feng et al., who provided in [10] a full answer to
Problem 2.

Clustered Graph Drawing, Fig. 3 Left: Graph $G_\mu$ and
the edges incident to $\mu$. Middle: The PQ-tree representing
the possible orderings in which the edges incident to $\mu$
can cross the boundary of $R_\mu$ in a planar drawing of $G_\mu$
satisfying properties P1 and P2. Right: The representative
graph $H_\mu$ for $G_\mu$


Theorem 4 (Feng, Cohen, and Eades [10]) Every
c-planar clustered graph admits a straight-line
convex c-planar drawing.


The proof of Theorem 4 relies on a positive answer
to the following question: Does every planar
hierarchical graph admit a planar straight-line
hierarchical drawing? A hierarchical graph
is a graph with an assignment of its vertices
to k layers $l_1, \ldots, l_k$. A hierarchical drawing
maps each vertex assigned to layer $l_i$ to a point
on the horizontal line $y = i$ and each edge
to a y-monotone curve between the corresponding
endpoints. A hierarchical graph is planar
if it admits a planar hierarchical drawing. Feng
et al. [10] showed an algorithm to construct a planar
straight-line hierarchical drawing of any planar
hierarchical graph H. Their algorithm splits
H into some subgraphs, inductively constructs
planar straight-line hierarchical drawings of such
subgraphs, and glues these drawings together to
obtain a planar straight-line hierarchical drawing
of H.


Feng et al. also showed how the result on 
hierarchical graphs leads to a proof of Theorem 4, 
namely: 


1. Starting from any c-planar clustered graph 
C(G,T), construct a hierarchical graph H 
by assigning the vertices of G to n layers, 
in the same order as in an st-numbering of 
G in which vertices of the same cluster are 
numbered consecutively. 

2. Construct a planar straight-line hierarchical
drawing $\Gamma_H$ of H.

3. Construct a straight-line convex c-planar
drawing of C(G,T) starting from $\Gamma_H$ by
drawing each cluster as a convex region
slightly surrounding the convex hull of its
vertices.


Angelini, Frati, and Kaufmann [1] recently 
strengthened Theorem 4 by proving that every 
c-planar clustered graph admits a straight-line c- 
planar drawing in which each cluster is repre- 
sented by a scaled copy of an arbitrary convex 
shape. 

Hong and Nagamochi [14] studied straight-line
convex c-planar drawings in which the faces
of the underlying graph are delimited by convex
polygons. They proved that a c-connected
clustered graph admits such a drawing if and
only if it is c-planar, completely connected (i.e.,
for every cluster $\mu$, both $G_\mu$ and $G - G_\mu$ are
connected), and internally triconnected (i.e., for
every separation pair {u,v}, vertices u and v are
incident to the outer face of G and each connected
component of G - {u,v} contains vertices incident
to the outer face of G).

The drawings constructed by the algorithm 
in [10] use real coordinates. Hence, when dis- 
playing these drawings on a screen with a fi- 
nite resolution rule, exponential area might be 
required for the visualization. This drawback is 
however unavoidable. Namely, Feng et al. proved 
in [10] that there exist clustered graphs requiring 
exponential area in any straight-line convex c- 
planar drawing with a finite resolution rule. This 
harshly differentiates c-planar clustered graphs 
from planar graphs, as straight-line convex planar 
drawings of planar graphs can be constructed in 
polynomial area. 


Theorem 5 (Feng, Cohen, and Eades [10]) 
There exist n-vertex c-planar clustered graphs
requiring $2^{\Omega(n)}$ area in any straight-line convex
c-planar drawing.


The proof of Theorem 5 adapts techniques 
introduced by Di Battista et al. [9] to prove area 
lower bounds for straight-line upward planar 
drawings of directed graphs. 


Open Problems 


Almost 20 years after it was first posed by
Feng, Cohen, and Eades, Problem 1 still represents
a terrific challenge for researchers working
in graph drawing.

A key result of Feng, Cohen, and Eades [11] 
shows that testing the c-planarity of a clustered 
graph C(G,T) is a polynomial-time solvable 
problem if C(G,T) is c-connected — see Theo- 
rem 2. Moreover, a clustered graph is c-planar 
if and only if it is a subgraph of a c-planar 
c-connected clustered graph — see Theorem 1. 
Hence, the core of testing the c-planarity of a
non-c-connected clustered graph C(G,T) is an
augmentation problem, asking whether C(G,T)
can be augmented to a c-connected c-planar clustered
graph C'(G',T) by inserting edges in G.
This augmentation problem seems far from
being solved. Particular attention [3, 4, 7, 15, 16]
has been devoted to the case in which a planar
embedding for G is prescribed as part of the input.
In this case, edges might only be inserted inside
faces of G, in order to guarantee the planarity
of G'. Thus, the problem becomes equivalent to
the one of selecting a set S of edges from a set M of
topological embedded multigraphs, where each
cluster $\mu$ defines a multigraph $M_\mu$ in M consisting
of all the edges that can be inserted inside
faces of G in order to connect distinct connected
components of $G_\mu$. Then the edges in S are those
that are selected to augment G to G'; hence,
no two edges in S are allowed to cross. Even
in this prescribed-embedding version, only partial
results are known. For example, polynomial-time
algorithms to test the c-planarity of C(G,T) are
known if the faces of G have at most five incident
vertices [7], or if each cluster induces at most
two connected components [15], or if each cluster
has at most two incident vertices on each face of
G [3].


Problem Definition


The problem of clustering consists of partitioning 
a set of objects such as images, text documents, 
etc. into groups of related items. The information 
available to the clustering algorithm consists of 
pairwise similarity information between objects. 
One of the most popular approaches to clustering 
is to map the objects into data points in a metric 
space, define an objective function over the data 
points, and find a partitioning which achieves the 
optimal solution, or an approximately optimal 
solution to the given objective function. In this 
entry, we will focus on two of the most widely 
studied objective functions for clustering: the k- 
median objective and the k-means objective. 

In k-median clustering, the input is a set P of
n points in a metric space (X, d), where $d(\cdot,\cdot)$ is
the distance function. The objective is to find k
center points $c_1, c_2, \ldots, c_k$. The clustering is then
formed by assigning each data point to the closest
center point. If a point x is assigned to center
c(x), then the cost incurred is d(x, c(x)). The
goal is to find center points and a partitioning
of the data so as to minimize the total cost
$\Phi = \min_{c_1, c_2, \ldots, c_k} \sum_{x} \min_i d(x, c_i)$. This objective
is closely related to the well-studied facility
location problem [1, 9] in theoretical computer
science.

Similarly, for the k-means objective, the goal
is to find k center points. However, the cost
incurred by a point x is the distance squared
to its center. Hence, the goal is to minimize
$\Phi = \min_{c_1, c_2, \ldots, c_k} \sum_{x} \min_i d^2(x, c_i)$. A special
case of the k-means objective which is of
particular interest is the Euclidean k-means
problem where the points are in $\mathbb{R}^d$ and the
distance function is the squared Euclidean
distance. Again, the goal is to choose k
center points and assign each point to the
closest center while minimizing the total cost.
However, unlike k-median and k-means in metric
spaces, the center points do not necessarily
have to belong to the data set P and can be
arbitrarily chosen from $\mathbb{R}^d$. Unfortunately,
optimizing both these objectives turns out to
be NP-hard. Hence, a lot of the work in the
theoretical computer science community focuses
on designing good approximation algorithms
for these problems [1, 8-10, 12] with formal
guarantees on worst-case instances.
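As a small illustration of the two objectives, the following sketch evaluates the k-median and k-means cost of a given set of centers by assigning every point to its closest center. The function names and the use of Euclidean distance as the default metric are assumptions made for this example; any metric d can be passed in instead.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance, used here only as an example metric d."""
    return math.dist(x, y)


def clustering_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    dist: Callable[[Point, Point], float] = euclidean,
    power: int = 1,
) -> float:
    """Sum over all points of (distance to the closest center) ** power."""
    return sum(min(dist(x, c) for c in centers) ** power for x in points)


def k_median_cost(points, centers, dist=euclidean) -> float:
    # k-median: each point pays its distance to the closest center.
    return clustering_cost(points, centers, dist, power=1)


def k_means_cost(points, centers, dist=euclidean) -> float:
    # k-means: each point pays the squared distance to the closest center.
    return clustering_cost(points, centers, dist, power=2)
```

Evaluating a fixed set of centers is the easy part; the hardness discussed above lies in choosing the centers that minimize these costs.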

However, in most practical scenarios, the 
clustering instances which one encounters are 
not worst case but instead have additional 
structure/stability associated with them. In such 
cases, it is natural to ask if one can abstract 
out this structure in the form of a stability 
notion, formally study it, and exploit this 
additional structure in order to obtain optimal 
or nearly optimal solutions and bypass NP- 
hardness which only applies to worst-case 
instances. This modern take on clustering 
research has, in recent years, produced new 
insights and deeper understanding of what we 
know about clustering. In this entry, we will 
survey some key results on clustering under 
stability assumptions. 


Clustering under Stability Assumptions 


Key Results 


ε-separability:


This notion of stability was proposed by Ostrovsky
et al. [15]. The motivation comes from the
fact that in practice, when solving a clustering
instance, one typically has to decide how many
clusters to partition the data into, i.e., the value
of k. If the k-means objective is the underlying
criterion being used to judge the quality of a clustering,
and the optimal (k - 1)-means clustering
is comparable to the optimal k-means clustering,
then one can in principle also use (k - 1) clusters
to describe the data set. Hence, the particular
clustering instance is not well behaved or not
stable. In fact this particular method is a very
popular heuristic to find out the number of hidden
clusters in the data set, suggesting that real-world
instances have this property.


Definition 1 (ε-Separability) Given an instance
of Euclidean k-means clustering, let OPT(k) denote
the cost of the optimal k-means solution.
We can also decompose OPT(k) as $\mathrm{OPT} = \sum_{i=1}^{k} \mathrm{OPT}_i$,
where $\mathrm{OPT}_i$ denotes the 1-means
cost of cluster $C_i$, i.e., $\sum_{x \in C_i} d(x, c_i)^2$. Such
an instance is called ε-separable if it satisfies
$\mathrm{OPT}(k-1) > \frac{1}{\epsilon^2}\,\mathrm{OPT}(k)$.
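The separability condition can be checked directly on very small instances. The sketch below reads the condition of Definition 1 as OPT(k - 1) > OPT(k) / ε² (this reading of the inequality is an assumption), and computes OPT(k) exactly by brute force over all assignments of points to k labels, which is exponential in the number of points. All function names are hypothetical and chosen for the example.

```python
from itertools import product
from typing import List, Sequence

Point = Sequence[float]


def one_means_cost(cluster: List[Point]) -> float:
    """1-means cost: sum of squared Euclidean distances to the centroid."""
    if not cluster:
        return 0.0
    dim = len(cluster[0])
    centroid = [sum(p[j] for p in cluster) / len(cluster) for j in range(dim)]
    return sum(
        sum((p[j] - centroid[j]) ** 2 for j in range(dim)) for p in cluster
    )


def opt_k_means(points: Sequence[Point], k: int) -> float:
    """Exact optimal k-means cost by exhaustive search (tiny inputs only)."""
    best = float("inf")
    for labels in product(range(k), repeat=len(points)):
        clusters = [
            [p for p, lab in zip(points, labels) if lab == i] for i in range(k)
        ]
        best = min(best, sum(one_means_cost(c) for c in clusters))
    return best


def is_eps_separable(points: Sequence[Point], k: int, eps: float) -> bool:
    """Check OPT(k-1) > OPT(k) / eps^2 for k >= 2 (assumed reading)."""
    return opt_k_means(points, k - 1) > opt_k_means(points, k) / eps**2
```

For example, four points on a line forming two tight, far-apart pairs are ε-separable for k = 2 with a fairly small ε, since merging the two pairs into one cluster increases the cost by several orders of magnitude.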


It was shown by Ostrovsky et al. [15] that one can
design much better approximation algorithms for
such instances. The algorithm is based on oversampling
O(k) candidate centers using a distance-weighted
sampling scheme, followed by a greedy
deletion strategy to reduce the number of centers
to k without incurring too much increase in the
k-means cost.


Theorem 1 ([15]) There is a polynomial time
algorithm which, given any ε-separable Euclidean
k-means instance, outputs a clustering of cost at
most $\frac{\mathrm{OPT}}{1-\rho}$ with probability $1 - O(\rho^{1/4})$, where
$\rho = O(\epsilon^2)$.


(1 + α, ε)-Approximation-Stability:


Balcan et al. [5] introduced and analyzed a class
of approximation stable instances for which one
can find near optimal clusterings in polynomial
time. The motivation comes from the fact that for
many problems of interest to machine learning,
there is an unknown underlying correct "target"
clustering. In such cases, the implicit hope when
pursuing an objective-based clustering approach
(k-means or k-median) is that approximately
optimizing the objective function will in fact
produce a clustering of low clustering error, i.e.,
a clustering that is pointwise close to the target
clustering. Balcan et al. showed that by making
this implicit assumption explicit, one can efficiently
compute a low-error clustering even in
cases when the approximation problem of the
objective function is NP-hard!


Definition 2 ((1 + α, ε)-approximation-stability)
Let P be a set of n points residing
in a metric space (M, d). Given an objective
function $\Phi$ (such as k-median, k-means), we
say that instance (M, P) satisfies (1 + α, ε)-
approximation-stability for $\Phi$ if all clusterings C
with $\Phi(C) \le (1 + \alpha) \cdot \mathrm{OPT}$ are pointwise
ε-close to the target clustering T for (M, P).


Here, the term "target" clustering refers to the
ground truth clustering of P which one is trying
to approximate. The distance between any two
k-clusterings C and C* of n points is measured
as $\mathrm{dist}(C, C^*) = \min_{\sigma \in S_k} \frac{1}{n} \sum_{i=1}^{k} |C_i \setminus C^*_{\sigma(i)}|$.
Interestingly, this approximation stability condition
implies a lot of structure about the problem
instance which could be exploited algorithmically.
For example, one can show the following:


Theorem 2 ([5]) If the given instance (M, P)
satisfies (1 + α, ε)-approximation-stability for the
k-median or the k-means objective, then we can
efficiently produce a clustering that is
O(ε + ε/α)-close to the target clustering T.
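The notion of closeness between clusterings used in Definition 2 and Theorem 2 can be made concrete with a short sketch. It assumes that the distance between two k-clusterings is the minimum, over all ways of matching the clusters of one to the clusters of the other, of the fraction of points that fall outside their matched cluster. The function name and the representation of a clustering as a list of sets are assumptions for this example, and the brute-force search over all k! matchings is only meant for small k.

```python
from itertools import permutations
from typing import Hashable, List, Set


def clustering_distance(
    c: List[Set[Hashable]], c_star: List[Set[Hashable]], n: int
) -> float:
    """Fraction of the n points misclassified under the best cluster matching."""
    k = len(c)
    if len(c_star) != k:
        raise ValueError("both clusterings must have the same number of clusters")
    best = min(
        # Points of C_i that are not in the cluster of C* matched to it.
        sum(len(c[i] - c_star[sigma[i]]) for i in range(k))
        for sigma in permutations(range(k))
    )
    return best / n
```

A clustering C is then ε-close to the target T when `clustering_distance(C, T, n)` is at most ε.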


As mentioned above, this theorem is valid even
for values of α for which getting a (1 + α)-
approximation to k-median and k-means is
NP-hard! The algorithm first creates a graph over data
points by connecting points which are within a
certain distance threshold. The next step involves
iteratively peeling off connected components
in the graph and simultaneously de-noising the
instance.


Related Notions 

The notions of ε-separability and (1 + α, ε)-
approximation-stability are related to each other.
For example, Theorem 5.1 in [15] shows that
ε-separability implies that any near-optimal
solution to k-means is $O(\epsilon^2)$-close to the
k-means optimal clustering. However, the
converse is not necessarily the case; an instance
could satisfy approximation-stability without
being ε-separated. In [6], Balcan et al. present
a specific example of points in Euclidean space
with α = 1. In fact, when k is much larger than
1/ε, the difference between the two properties
can be more substantial.

The notions of separability and approximation
stability were generalized in [2], where the authors
study a notion of stability called α-weak deletion
stability. A clustering instance is stable under
this notion if in the optimal clustering merging
any two clusters into one increases the cost by a
multiplicative factor of (1 + α). This definition
captures both ε-separability and approximation
stability in the case of large cluster sizes.
Remarkably, [2] show that for such instances of
k-median and Euclidean k-means, one can design a
(1 + ε) approximation algorithm for any constant
ε > 0. This leads to immediate improvements
over the works of [5] (for the case of large
clusters) and of [15]. However, the run time of
the resulting algorithm depends polynomially on
n and k and exponentially on the parameters 1/α
and 1/ε, so the simpler algorithms of [2] and [5]
are more suitable for scenarios where one expects
the stronger properties to hold.

Kumar and Kannan [11] study a separation 
condition motivated by the k-means objective and 
the kind of instances produced by Gaussian and 
related mixture models. They consider the setting 
of points in Euclidean space and show that if the 
projection of any data point onto the line joining 
the mean of its cluster in the target clustering to 
the mean of any other cluster of the target is $\Omega(k)$
standard deviations closer to its own mean than 
the other mean, then they can recover the target 
clusters in polynomial time. This condition was 
further analyzed and reduced by work of [3]. 

Bilu and Linial [7] study clustering instances
which are perturbation resilient. An instance is
c-perturbation resilient if it has the property that
the optimal solution to the objective remains
optimal even after bounded perturbations (up to
factor c) to the input weight matrix. They give
an algorithm for maxcut (which can be viewed
as a 2-clustering problem) under the assumption
that the optimal solution is stable to (roughly)
$O(n^{2/3})$-factor multiplicative perturbations to
the edge weights. This was improved by [14].
Awasthi et al. [3] study perturbation resilience
for center-based clustering objectives such as
k-median and k-means and give an algorithm
that finds the optimal solution when the input is
stable to only factor-3 perturbations. This factor
is improved to $1 + \sqrt{2}$ by [4], who also design
algorithms under a relaxed (c, ε)-stability to
perturbation condition in which the optimal solution
need not be identical on the c-perturbed instance,
but may change on an ε fraction of the points (in
this case, the algorithms require $c = 2 + \sqrt{3}$).

For the k-median objective, (c, ε)-approximation-stability
with respect to C* implies
(c, ε)-stability to perturbations because an
optimal solution in a c-perturbed instance is
guaranteed to be a c-approximation on the
original instance. Similarly, for k-means, (c, ε)-
stability to perturbations is implied by ($c^2$, ε)-
approximation-stability. However, as noted
above, the values of c known to lead to efficient
clustering in the case of stability to perturbations
are larger than for approximation-stability, where
any constant c > 1 suffices.


Open Problems 


The algorithm proposed in [15] for ε-separability
is a variant of the popular Lloyd’s heuristic for k- 
means [13]. Hence, the result can also be viewed 
as a characterization of when such heuristics 
work in practice. It would be interesting to 
establish weaker sufficient conditions for such 
heuristics. For instance, is it possible that weak- 
deletion stability is sufficient for a version of 
the Lloyd’s heuristic to converge to the optimal 
clustering? Another open direction concerns the 
notion of perturbation resilience. Can one reduce 
the perturbation factor c needed for efficient 
clustering? Alternatively, if one cannot find 
the optimal clustering for small values of c, 
can one still find a near-optimal clustering, of 
approximation ratio better than what is possible 
on worst-case instances?

In a different direction, one can also consider
relaxations of the perturbation-resilience
condition. For example, Balcan et al. [4] also
consider instances that are "mostly resilient" to
c-perturbations; under any c-perturbation of the
underlying metric, no more than an ε-fraction
of the points gets mislabeled under the optimal
solution. For sufficiently large constant c and
sufficiently small constant ε, they present algorithms
that get good approximations to the objective under
this condition. A different kind of relaxation
would be to consider a notion of resilience to 
perturbations on average: a clustering instance 
whose optimal clustering is likely not to change, 
assuming the perturbation is random from a 
suitable distribution. Can this weaker notion be 
used to still achieve positive guarantees? 

Finally, the notion of stability can also shed 
light on practically interesting instances of many 
other important problems. Can stability assump- 
tions, preferably ones of a mild nature, allow us 
to bypass NP-hardness results of other problems? 
One particularly intriguing direction is the prob- 
lem of Sparsest-Cut, for which no PTAS or 
constant-approximation algorithm is known, yet
powerful heuristics based on spectral techniques
work remarkably well in practice.


Problem Definition


Color coding [2] is a novel method used for 
solving, in polynomial time, various subcases 
of the generally NP-Hard subgraph isomorphism 
problem. The input for the subgraph isomorphism 
problem is an ordered pair of (possibly 
directed) graphs (G,H). The output is either 
a mapping showing that H is isomorphic to 
a (possibly induced) subgraph of G, or false 
if no such subgraph exists. The subgraph 
isomorphism problem includes, as special
cases, the HAMILTON-PATH, CLIQUE, and 
INDEPENDENT SET problems, as well as 
many others. The problem is also interesting 
when H is fixed. The goal, in this case, is 
to design algorithms whose running times are 
significantly better than the running time of the 
naive algorithm. 


Method Description 

The color coding method is a randomized
method. The vertices of the graph G = (V, E) in
which a subgraph isomorphic to $H = (V_H, E_H)$
is sought are randomly colored by $k = |V_H|$
colors. If $|V_H| = O(\log |V|)$, then with a small
probability, but only polynomially small (i.e., one
over a polynomial), all the vertices of a subgraph
of G which is isomorphic to H, if there is such
a subgraph, will be colored by distinct colors.
Such a subgraph is called color coded. The color 
coding method exploits the fact that, in many 
cases, it is easier to detect color coded subgraphs 
than uncolored ones. 
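The "only polynomially small" success probability can be made explicit with a short calculation. The sketch below assumes that each vertex receives one of the k colors independently and uniformly at random, and that $k \le c \ln |V|$ for some constant c; both the constant c and the uniform-independent coloring are assumptions made for this illustration.

```latex
\Pr\bigl[\text{a fixed set of $k$ vertices receives $k$ distinct colors}\bigr]
  \;=\; \frac{k!}{k^{k}} \;>\; e^{-k},
\qquad\text{and for } k \le c \ln |V| :\quad e^{-k} \;\ge\; |V|^{-c}.
```

The inequality $k!/k^k > e^{-k}$ follows from $e^{k} = \sum_{j \ge 0} k^{j}/j! > k^{k}/k!$. Repeating the random coloring on the order of $e^{k}$ times therefore finds a colorful copy, if a copy exists, with constant probability.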

Perhaps the simplest interesting subcases
of the subgraph isomorphism problem are the
following: Given a directed or undirected graph
G = (V, E) and a number k, does G contain
a simple (directed) path of length k? Does
G contain a simple (directed) cycle of length
exactly k? The following describes a $2^{O(k)} \cdot |E|$
time algorithm that receives as input the graph
G = (V, E), a coloring $c : V \to \{1, \ldots, k\}$ and
a vertex $s \in V$, and finds a colorful path of
length k - 1 that starts at s, if one exists. To
find a colorful path of length k - 1 in G that
starts somewhere, just add a new vertex s'
to V, color it with a new color 0 and connect
it with edges to all the vertices of V. Now
look for a colorful path of length k that starts
at s'.

A colorful path of length k - 1 that starts at
some specified vertex s is found using a dynamic
programming approach. Suppose one is already
given, for each vertex $v \in V$, the possible sets of
colors on colorful paths of length i that connect s
and v. Note that there is no need to record
all colorful paths connecting s and v. Instead,
record the color sets appearing on such paths.
For each vertex v there is a collection of at most
$\binom{k}{i}$ color sets. Now, inspect every subset C that
belongs to the collection of v, and every edge
$(v, u) \in E$. If $c(u) \notin C$, add the set $C \cup \{c(u)\}$
to the collection of u that corresponds to colorful
paths of length i + 1. The graph G contains
a colorful path of length k - 1 with respect to
the coloring c if and only if the final collection,
that corresponding to paths of length k - 1, of
at least one vertex is non-empty. The number of
operations performed by the algorithm outlined
is at most $O\bigl(\sum_{i=0}^{k} i \binom{k}{i} \cdot |E|\bigr)$, which is clearly
$O(k 2^{k} \cdot |E|)$.
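The dynamic program just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is given as an adjacency dictionary `adj` (out-neighbors in the directed case), the coloring as a dictionary `color`, and the function name is hypothetical. For every vertex it stores only the color sets of colorful paths from s, never the paths themselves.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set

Vertex = Hashable


def has_colorful_path_from(
    adj: Dict[Vertex, Iterable[Vertex]],
    color: Dict[Vertex, int],
    s: Vertex,
    k: int,
) -> bool:
    """Return True if a colorful path of length k - 1 (k vertices) starts at s."""
    # Level 0: the only path of length 0 from s is s itself.
    current: Dict[Vertex, Set[FrozenSet[int]]] = {s: {frozenset([color[s]])}}
    for _ in range(k - 1):
        nxt: Dict[Vertex, Set[FrozenSet[int]]] = {}
        for v, color_sets in current.items():
            for u in adj.get(v, ()):
                cu = color[u]
                for cs in color_sets:
                    # Extend only with a color not yet used on the path.
                    if cu not in cs:
                        nxt.setdefault(u, set()).add(cs | {cu})
        current = nxt
    # Some vertex has a non-empty collection for paths of length k - 1.
    return any(current.values())
```

Because every vertex has a single color, a walk whose colors are all distinct cannot repeat a vertex, so any walk found this way is automatically a simple path.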


Derandomization 

The randomized algorithms obtained using the 
color coding method are derandomized with only 
a small loss in efficiency. All that is needed 
to derandomize them is a family of colorings 
of G=(V,E) so that every subset of k ver- 
tices of G is assigned distinct colors by at least 
one of these colorings. Such a family is also 
called a family of perfect hash functions from 
{1,2,...,|V|} to {1,2,...,k}. Such a family is
explicitly constructed by combining the meth- 
ods of [1, 9, 12, 16]. For a derandomization 
technique yielding a constant factor improvement 
see [5]. 


Color Coding 


Key Results 


Lemma 1 Let G = (V, E) be a directed or
undirected graph and let $c : V \to \{1, \ldots, k\}$ be
a coloring of its vertices with k colors. A colorful
path of length k - 1 in G, if one exists, can be
found in $2^{O(k)} \cdot |E|$ worst-case time.


Lemma 2 Let G = (V, E) be a directed or
undirected graph and let $c : V \to \{1, \ldots, k\}$
be a coloring of its vertices with k colors.
All pairs of vertices connected by colorful
paths of length k - 1 in G can be found in
either $2^{O(k)} \cdot |V||E|$ or $2^{O(k)} \cdot |V|^{\omega}$ worst-case
time (here $\omega < 2.376$ denotes the matrix
multiplication exponent).


Using the above lemmata the following results 
are obtained. 


Theorem 3 A simple directed or undirected path
of length k - 1 in a (directed or undirected)
graph G = (V, E) that contains such a path
can be found in $2^{O(k)} \cdot |V|$ expected time in the
undirected case and in $2^{O(k)} \cdot |E|$ expected time
in the directed case.


Theorem 4 A simple directed or undirected
cycle of size k in a (directed or undirected)
graph G = (V, E) that contains such a cycle can
be found in either $2^{O(k)} \cdot |V||E|$ or $2^{O(k)} \cdot |V|^{\omega}$
expected time.


A cycle of length k in minor-closed families of 
graphs can be found, using color coding, even 
faster (for planar graphs, a slightly faster algo- 
rithm appears in [6]). 


Theorem 5 Let C be a non-trivial minor-closed 
family of graphs and let k > 3 be a fixed integer. 
Then, there exists a randomized algorithm that 
given a graph G = (V, E) from C, finds a $C_k$
(a simple cycle of size k) in G, if one exists, in
O(|V|) expected time. 


As mentioned above, all these theorems can be 
derandomized at the price of a log |V| factor. The 
algorithms are also easy to parallelize.


Applications 


The initial goal was to obtain efficient algorithms 
for finding simple paths and cycles in graphs. 
The color coding method turned out, however, 
to have a much wider range of applicability. 
The linear time (i.e., $2^{O(k)} \cdot |E|$ for directed
graphs and $2^{O(k)} \cdot |V|$ for undirected graphs)
bounds for simple paths apply in fact to any
forest on k vertices. The $2^{O(k)} \cdot |V|^{\omega}$ bound
for simple cycles applies in fact to any series-parallel
graph on k vertices. More generally,
if G = (V, E) contains a subgraph isomorphic
to a graph $H = (V_H, E_H)$ whose tree-width
is at most t, then such a subgraph can be
found in $2^{O(k)} \cdot |V|^{t+1}$ expected time, where
$k = |V_H|$. This improves an algorithm of
Plehn and Voigt [14] that has a running time
of $k^{O(k)} \cdot |V|^{t+1}$. As a very special case,
it follows that the LOG PATH problem is
in P. This resolves in the affirmative a conjecture
of Papadimitriou and Yannakakis [13].
The exponential dependence on k in the 
above bounds is probably unavoidable as the 
problem is NP-complete if k is part of the 
input. 

The color coding method has been a fruitful 
method in the study of parametrized algorithms 
and parametrized complexity [7, 8]. Recently, 
the method has found interesting applications in 
computational biology, specifically in detecting 
signaling pathways within protein interaction net- 
works, see [10, 17, 18, 19]. 


Open Problems 


Several problems, listed below, remain open. 


• Is there a polynomial time (deterministic or
randomized) algorithm for deciding if a given
graph G = (V, E) contains a path of length,
say, $\log^2 |V|$? (This is unlikely, as it will imply
the existence of an algorithm that decides
in time $2^{O(\sqrt{n})}$ whether a given graph on n
vertices is Hamiltonian.)

• Can the log |V| factor appearing in the
derandomization be omitted?

• Is the problem of deciding whether a given
graph G = (V, E) contains a triangle as difficult
as the Boolean multiplication of two
|V| x |V| matrices?


Experimental Results 


Results of running the basic algorithm on biological
data have been reported in [17, 19].


Problem Definition


A proper coloring of a graph G = (V, E) is an
assignment of colors to all vertices in V in such a
way that no two adjacent vertices have the same
color. A k-coloring of G is a coloring that uses
k colors. The minimum number of colors that
can be used to properly color G is the (vertex)
chromatic number of G and is denoted by $\chi(G)$.
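These definitions translate directly into code. The sketch below checks whether a coloring is proper and computes the chromatic number of a very small simple graph by exhaustive search. The function names and the edge-list representation are assumptions for this example, and the exhaustive search is exponential, in line with the hardness results for this problem.

```python
from itertools import product
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def is_proper_coloring(edges: Iterable[Edge], coloring: Dict[Vertex, int]) -> bool:
    """A coloring is proper if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices: Sequence[Vertex], edges: Sequence[Edge]) -> int:
    """Smallest k admitting a proper k-coloring (simple graphs, tiny inputs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        # Try every assignment of k colors to the vertices.
        for assignment in product(range(k), repeat=len(vertices)):
            if is_proper_coloring(edges, dict(zip(vertices, assignment))):
                return k
    return 0  # the graph has no vertices
```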

Deciding whether a given graph admits a
k-coloring for a given $k \ge 3$ is well known to be
NP-complete. In particular, it is NP-hard to compute
the chromatic number [5]. The best known
approximation algorithm computes a coloring of
size within a factor $O\!\left(\frac{n (\log\log n)^2}{(\log n)^3}\right)$ of
the chromatic number [6]. Furthermore, for any
constant ε > 0, it is NP-hard to approximate the
chromatic number within a factor $n^{1-\epsilon}$ [14].

The intractability of the vertex coloring prob- 
lem for arbitrary graphs leads researchers to the 
study of the problem for appropriately generated 
random graphs. In the current entry, we con- 
sider coloring random instances of the random 
intersection graphs model, which is defined as 
follows: 


Definition 1 (Random Intersection Graph -
$G_{n,m,p}$ [9, 13]) Consider a universe $M =
\{1, 2, \ldots, m\}$ of elements and a set of n vertices
V. Assign independently to each vertex $v \in V$ a
subset $S_v$ of M, choosing each element $i \in M$
independently with probability p, and draw an
edge between two vertices $v \ne u$ if and only if
$S_v \cap S_u \ne \emptyset$. The resulting graph is an instance
$G_{n,m,p}$ of the random intersection graphs model.
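Definition 1 is constructive, so an instance can be sampled directly. The following is a minimal sketch; the function name, the use of labels 0, ..., m - 1 instead of 1, ..., m, and the optional `seed` parameter are assumptions made for the example.

```python
import random
from itertools import combinations
from typing import List, Optional, Set, Tuple


def sample_random_intersection_graph(
    n: int, m: int, p: float, seed: Optional[int] = None
) -> Tuple[List[Set[int]], Set[Tuple[int, int]]]:
    """Sample label sets S_v and the edge set of an instance of G_{n,m,p}."""
    rng = random.Random(seed)
    # Each vertex picks each of the m labels independently with probability p.
    label_sets = [{i for i in range(m) if rng.random() < p} for _ in range(n)]
    # Two distinct vertices are adjacent iff their label sets intersect.
    edges = {
        (u, v)
        for u, v in combinations(range(n), 2)
        if label_sets[u] & label_sets[v]
    }
    return label_sets, edges
```

Since two vertices share a given label with probability $p^2$, independently over the m labels, a fixed pair is adjacent with probability $1 - (1 - p^2)^m$; the edges themselves are not independent, because they are all derived from the same label sets.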


We will say that a property holds in $G_{n,m,p}$
with high probability (whp) if the probability that
a random instance of the $G_{n,m,p}$ model has the
property is at least 1 - o(1).

In this model, we will refer to the elements in
the universe M as labels. We also denote by $L_i$
the set of vertices that have chosen label $i \in M$.
Given $G_{n,m,p}$, we will refer to $\{L_i, i \in M\}$ as its
label representation. Consider the bipartite graph
with vertex set $V \cup M$ and edge set $\{(v, i) :
i \in S_v\} = \{(v, i) : v \in L_i\}$. We will refer to
this graph as the bipartite random graph $B_{n,m,p}$
associated to $G_{n,m,p}$. Notice that the associated
bipartite graph is uniquely defined by the label
representation.

It follows from the definition of the model
that the edges in $G_{n,m,p}$ are not independent.
This dependence becomes stronger as the number
of labels decreases. In fact, the authors in [3]
prove the equivalence (measured in terms of total
variation distance) of the random intersection
graphs model $G_{n,m,p}$ and the Erdős-Rényi random
graphs model $G_{n,\hat{p}}$, for $\hat{p} = 1 - (1 -
p^2)^m$, when $m = n^{\alpha}$, $\alpha > 6$. This bound on
the number of labels was improved in [12], by
showing equivalence of sharp threshold functions
among the two models for $\alpha \ge 3$. We note that
$1 - (1 - p^2)^m$ is in fact the (unconditioned)
probability that a specific edge exists in $G_{n,m,p}$.
In view of this equivalence, in this entry, we
consider the interesting range of values $m =
n^{\alpha}$, $\alpha < 1$, where random intersection graphs
seem to differ the most from Erdős-Rényi random
graphs.

In [1] the authors propose algorithms that whp
color sparse instances of $G_{n,m,p}$. In
particular, for $m = n^{\alpha}$, $\alpha > 0$ and
$p = o\!\left(\sqrt{\frac{1}{nm}}\right)$, they show that $G_{n,m,p}$ can be
colored optimally. Also, in the case where $m =
n^{\alpha}$, $\alpha < 1$ and $p = o\!\left(\frac{1}{m \ln n}\right)$, they show that
$\chi(G_{n,m,p}) \sim np$ whp. To do this, they prove
that $G_{n,m,p}$ is chordal whp (or equivalently, the
associated bipartite graph does not contain cycles),
and so a perfect elimination scheme can
be used to find a coloring in polynomial time.
The range of values we consider here is different
from the one needed for the algorithms in [1]
to work. In particular, we study coloring $G_{n,m,p}$
for the wider range $mp \le (1 - \alpha) \ln n$, as well
as the denser range $mp \ge \ln^2 n$. We have to
note also that the proof techniques used in [1]
cannot be used in the range we consider, since the
properties that they examine do not hold in our
case. Therefore, a completely different approach
is required.


Key Results


In this entry, we initially considered the problem
of properly coloring almost all vertices in $G_{n,m,p}$.
In particular, we proved the following:


Theorem 1 Let $m = n^{\alpha}$, $\alpha < 1$ and $mp \le
\beta \ln n$, for any constant $\beta < 1 - \alpha$. Then a random
instance of the random intersection graphs model
$G_{n,m,p}$ contains a subset of at least n - o(n)
vertices that can be colored using np colors, with
probability at least $1 - e^{-n^{0.99}}$.


Note that the range of values of $m, p$ considered
in the above Theorem is considerably wider than the
one studied in [1]. For the proof, we combine
ideas from [4] (see also [7]) and [10]. In
particular, we define a Doob martingale as
follows: Let $v_1, v_2, \ldots, v_n$ be an arbitrary
ordering of the vertices of $G_{n,m,p}$. For
$i = 1, 2, \ldots, n$, let $B_i$ be the subgraph of the
associated bipartite graph for $G_{n,m,p}$ (namely,
$B_{n,m,p}$) induced by $\bigcup_{j=1}^{i} v_j \cup M$. We denote
by $H_i$ the intersection graph whose bipartite
graph has vertex set $V \cup M$ and edge set that
is exactly as $B_i$ between $\bigcup_{j=1}^{i} v_j$ and $M$,
whereas every other edge (i.e., the ones between
$\bigcup_{j=i+1}^{n} v_j$ and $M$) appears independently with
probability $p$.

Let also $X$ denote the size of the largest $np$-colorable
subset of vertices in $G_{n,m,p}$, and let $X_i$
denote the expectation of the largest $np$-colorable
subset in $H_i$. Notice that $X_i$ is a random variable
depending on the overlap between $G_{n,m,p}$ and
$H_i$. Obviously, $X = X_n$ and setting $X_0 = E[X]$,
we have $|X_i - X_{i+1}| \le 1$, for all $i = 1, 2, \ldots, n$.
It is straightforward to verify that the sequence
$X_0, X_1, \ldots, X_n$ is a Doob martingale, and thus
we can use Azuma’s inequality to prove concentration
of $X_n$ around its mean value. However, the
exact value of $E[X]$ is unknown. Nevertheless,
we could prove a lower bound on $E[X]$ by
providing a lower bound on the probability that
$X$ takes sufficiently large values. In particular, we
showed that, for any positive constant $\epsilon > 0$, the
probability that $X$ takes values at least $(1 - \epsilon)n$
is larger than the upper bound given by Azuma’s
inequality, implying that $E[X] \ge n - o(n)$.
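
For reference, the form of Azuma's inequality that applies to a martingale with unit-bounded differences, such as the one above, can be written as follows. The particular choice of $t$ in the second line is only an illustrative assumption (it is not taken from the source); it shows how a deviation of size $o(n)$ can yield a failure probability of the same shape as the one in Theorem 1.

```latex
% Azuma's inequality for a martingale X_0, ..., X_n with |X_i - X_{i-1}| \le 1:
\Pr\bigl[\,|X_n - X_0| \ge t\,\bigr] \;\le\; 2\exp\!\left(-\frac{t^2}{2n}\right), \qquad t > 0.

% Illustrative choice (assumption): t = \sqrt{2}\, n^{0.995} = o(n), so that t^2/(2n) = n^{0.99}:
\Pr\bigl[\,|X - E[X]| \ge \sqrt{2}\, n^{0.995}\,\bigr] \;\le\; 2 e^{-n^{0.99}}.
```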

It is worth noting here that the proof of Theorem 1
can also be used to prove that $O(np)$
colors are enough to color $n - o(n)$ vertices,
even in the case where $mp = \beta \ln n$, for any
constant $\beta > 0$. However, finding the exact constant
multiplying $np$ is technically more difficult.
Finally, note that Theorem 1 does not provide any
direct information for the chromatic number of
$G_{n,m,p}$, because the vertices that remain uncolored
could induce a clique in $G_{n,m,p}$ in the worst
case.


An Efficient Algorithm 

Following our existential result of Theorem 1,
we also proposed and analyzed an algorithm
CliqueColor for finding a proper coloring of
a random instance of $\mathcal{G}_{n,m,p}$, for any
$mp \ge \ln^2 n$, where $m = n^\alpha, \alpha < 1$. The algorithm
uses information of the label sets assigned to the
vertices of $G_{n,m,p}$, and it runs in
$O\left(\frac{n^2 m p^2}{\ln n}\right)$
time whp (i.e., polynomial in $n$ and $m$). In the
algorithm, every vertex initially chooses independently
uniformly at random a preference in colors
from a set $\mathcal{C}$, denoted by shade($\cdot$), and every
label $l$ chooses a preference in the colors of the
vertices in $L_l$, denoted by $c_l(\cdot)$. Subsequently, the
algorithm visits every label clique and fixes the
color (according to preference lists) for as many
vertices as possible without causing collisions
with already-colored vertices. Finally, it finds
a proper coloring to the remaining uncolored
vertices, using a new set of colors $\mathcal{C}'$. Algorithm
CliqueColor is described below:

It is evident that algorithm CliqueColor
always finds a proper coloring of $G_{n,m,p}$, but
its efficiency depends on the number of colors
included in $\mathcal{C}$ and $\mathcal{C}'$. The main idea is that if
we have enough colors in the initial color set
$\mathcal{C}$, then the subgraph $H$ containing uncolored
vertices will have sufficiently small maximum
degree $\Delta(H)$, so that we can easily color it using
$\Delta(H)$ extra colors. More specifically, we prove
the following:



Theorem 2 (Efficiency) Let $m = n^\alpha, \alpha < 1$
and $mp \ge \ln^2 n$, $p = o\left(\frac{1}{\sqrt{m}}\right)$. Then algorithm
CliqueColor succeeds in finding a proper
$\Theta\left(\frac{nmp^2}{\ln n}\right)$-coloring of $G_{n,m,p}$ in polynomial
time whp.
Algorithm CliqueColor:

Input: An instance $G_{n,m,p}$ of $\mathcal{G}_{n,m,p}$ and its associated
bipartite graph $B_{n,m,p}$.
Output: A proper coloring of $G_{n,m,p}$.

1. for every $v \in V$ choose a color denoted by shade($v$)
   independently, uniformly at random among those in $\mathcal{C}$;
2. for every $l \in M$, choose a coloring of the vertices
   in $L_l$ such that, for every color in
   $\{c \in \mathcal{C} : \exists v \in L_l \text{ with shade}(v) = c\}$,
   there is exactly one vertex in the set
   $\{u \in L_l : \text{shade}(u) = c\}$ having $c_l(u) = c$,
   while the rest remain uncolored;
3. set $U = \emptyset$ and $C = \emptyset$;
4. for $l = 1$ to $m$ do {
5. color every vertex in $L_l \setminus (U \cup C)$ according to $c_l(\cdot)$
   iff there is no collision with the color of a vertex in
   $L_l \cap C$;
6. include every vertex in $L_l$ colored that way in $C$ and
   the rest in $U$; } enddo
7. let $H$ denote the (intersection) subgraph of $G_{n,m,p}$
   induced by the vertices in $U$ and let $\Delta(H)$ be its
   maximum degree;
8. give a proper $\Delta(H)$-coloring of $H$ using a new set of
   colors $\mathcal{C}'$ of cardinality $\Delta(H)$;
9. output a coloring of $G_{n,m,p}$ using $|\mathcal{C} \cup \mathcal{C}'|$ colors;
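
The steps of CliqueColor can be sketched in Python as follows. This is only an illustrative reading of the pseudocode, not the authors' implementation: the function names, the use of integer colors, and the instance generator are assumptions. Two further simplifications are made and should be read as such: the collision test in step 5 is applied conservatively against every already-colored neighbor of a vertex, and step 8 uses a plain greedy coloring, which may need up to $\Delta(H) + 1$ fresh colors.

```python
import random
from collections import defaultdict

def sample_label_sets(n, m, p, rng):
    """Label sets L_l of a random instance: vertex v picks label l with prob. p."""
    return [{v for v in range(n) if rng.random() < p} for _ in range(m)]

def neighbours(n, label_sets):
    """Adjacency of the intersection graph: u ~ v iff they share a label."""
    adj = [set() for _ in range(n)]
    for members in label_sets:
        for u in members:
            adj[u] |= members
    for v in range(n):
        adj[v].discard(v)
    return adj

def clique_color(n, label_sets, num_shades, rng):
    """Sketch of CliqueColor; returns a dict vertex -> colour (an integer)."""
    adj = neighbours(n, label_sets)
    shade = [rng.randrange(num_shades) for _ in range(n)]   # step 1
    color = {}        # its keys play the role of the set C
    deferred = set()  # the set U
    for members in label_sets:                               # steps 4-6
        by_shade = defaultdict(list)
        for u in sorted(members):
            by_shade[shade[u]].append(u)
        for c, group in by_shade.items():
            chosen = rng.choice(group)                       # step 2: c_l
            for u in group:
                if u in color or u in deferred:
                    continue                                 # already decided
                # conservative collision test (assumption, see lead-in)
                clash = any(color.get(w) == c for w in adj[u])
                if u == chosen and not clash:
                    color[u] = c
                else:
                    deferred.add(u)
    # vertices that picked no label are isolated; their shade is safe
    for v in range(n):
        if v not in color and v not in deferred:
            color[v] = shade[v]
    # steps 7-8: greedy colouring of H = G[U] with fresh colours
    for u in sorted(deferred):
        taken = {color[w] for w in adj[u] if color.get(w, -1) >= num_shades}
        extra = num_shades
        while extra in taken:
            extra += 1
        color[u] = extra
    return color

rng = random.Random(1)
labels = sample_label_sets(60, 12, 0.2, rng)
coloring = clique_color(60, labels, 8, rng)
print(len(set(coloring.values())), "colours used")
```

Because every label set induces a clique, the coloring is proper exactly when no label set contains two vertices of the same color, which is easy to check on the output.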




It is worth noting that the number of colors
used by algorithm CliqueColor in the case
$mp \ge \ln^2 n$, $p = o\left(\frac{1}{\sqrt{m}}\right)$ and $m = n^\alpha, \alpha < 1$
is of the correct order of magnitude (i.e., it is
optimal up to constant factors). Indeed, by the
concentration of the values of $|S_v|$ around $mp$ for
any vertex $v$ with high probability, we can use
the results of [11] on the independence number
of the uniform random intersection graphs model
$\mathcal{G}_{n,m,\lambda}$, with $\lambda \sim mp$, to provide a lower bound
on the chromatic number. Indeed, it can be easily
verified that the independence number of $G_{n,m,\lambda}$
for $\lambda = mp \ge \ln^2 n$ is at most $\Theta\left(\frac{\ln n}{mp^2}\right)$, which
implies that the chromatic number of $G_{n,m,\lambda}$
(hence also of $G_{n,m,p}$, because of the concentration
of the values of $|S_v|$) is at least $\Omega\left(\frac{nmp^2}{\ln n}\right)$.
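
The step from the independence number to the chromatic number is the standard counting argument, spelled out here for convenience: every color class of a proper coloring is an independent set, so it has at most $\alpha(G)$ vertices, where $\alpha(G)$ denotes the independence number.

```latex
% n vertices are covered by \chi(G) colour classes, each of size at most \alpha(G):
n \;\le\; \chi(G)\,\alpha(G)
\quad\Longrightarrow\quad
\chi(G) \;\ge\; \frac{n}{\alpha(G)} .

% With the bound \alpha(G_{n,m,\lambda}) = O\!\left(\frac{\ln n}{mp^2}\right) quoted above:
\chi(G_{n,m,\lambda}) \;\ge\; \frac{n}{O\!\left(\frac{\ln n}{mp^2}\right)}
\;=\; \Omega\!\left(\frac{nmp^2}{\ln n}\right).
```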


Coloring Random Hypergraphs 

The model of random intersection graphs $\mathcal{G}_{n,m,p}$
could also be thought of as generating random
hypergraphs. The hypergraphs generated have
vertex set $V$ and edge set $M$. There is a huge
amount of literature concerning coloring hypergraphs.
However, the question about coloring
there seems to be different from the one we considered
here. More specifically, a proper coloring
of a hypergraph is any assignment of colors to the
vertices, so that no monochromatic edge exists.
This of course implies that fewer colors than
the chromatic number (studied in this entry) are
needed in order to achieve this goal.

In $\mathcal{G}_{n,m,p}$, the problem of finding a coloring
such that no label is monochromatic seems to be
considerably easier when $p$ is not too small. The proof of
the following Theorem is based on the method of
conditional expectations (see [2,8]).


Theorem 3 Let $G_{n,m,p}$ be a random instance of
the model $\mathcal{G}_{n,m,p}$, for $p = \omega\left(\frac{\ln m}{n}\right)$ and $m = n^\alpha$,
for any fixed $\alpha > 0$. Then with high probability,
there is a polynomial time algorithm that finds a
$k$-coloring of the vertices such that no label is
monochromatic, for any fixed integer $k \ge 2$.
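
The method of conditional expectations mentioned above can be illustrated with a short Python sketch. It is a generic derandomization for hypergraph coloring, not necessarily the exact procedure behind Theorem 3; the function name and the input format (a list of vertex sets playing the role of the label sets) are assumptions. Vertices are colored one at a time, each time choosing the color that minimizes the expected number of monochromatic hyperedges if the remaining vertices were colored uniformly at random. The final count of monochromatic hyperedges is therefore at most the initial expectation $\sum_e k^{1-|e|}$, and is zero whenever that sum is below 1.

```python
def conditional_expectation_coloring(n, hyperedges, k=2):
    """Colour vertices 0..n-1 with colours 0..k-1, greedily keeping the
    conditional expectation of the number of monochromatic hyperedges
    from increasing. Hyperedges with fewer than 2 vertices are ignored,
    since they are monochromatic under every colouring."""
    edges = [frozenset(e) for e in hyperedges if len(e) >= 2]
    color = [None] * n
    incident = [[] for _ in range(n)]
    for e in edges:
        for v in e:
            incident[v].append(e)

    def prob_mono(e):
        """P[e ends up monochromatic | current partial colouring]."""
        seen = {color[v] for v in e if color[v] is not None}
        free = sum(1 for v in e if color[v] is None)
        if len(seen) > 1:
            return 0.0
        if len(seen) == 1:
            return float(k) ** (-free)
        return float(k) ** (1 - free)

    for v in range(n):
        best, best_cost = 0, None
        for c in range(k):
            color[v] = c
            cost = sum(prob_mono(e) for e in incident[v])
            if best_cost is None or cost < best_cost:
                best, best_cost = c, cost
        color[v] = best
    return color

# Initial expectation: 3 * 2**(-2) + 2**(-3) = 0.875 < 1, so no edge can end up monochromatic.
edges = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {1, 3, 5, 7}]
print(conditional_expectation_coloring(8, edges, k=2))
```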


Applications 


Graph coloring enjoys many practical applications
as well as theoretical challenges. Besides the
classical types of problems, different limitations
can also be set on the graph, or on the way a
color is assigned, or even on the color itself.
Some of the many applications of graph coloring 
include modeling scheduling problems, register 
allocation, pattern matching, etc. 

Random intersection graphs are relevant to 
and capture quite nicely social networking. In- 
deed, a social network is a structure made of 
nodes (individuals or organizations) tied by one 
or more specific types of interdependency, such 
as values, visions, financial exchange, friends, 
conflicts, web links, etc. Social network analysis 
views social relationships in terms of nodes and 
ties. Nodes are the individual actors within the 
networks and ties are the relationships between 
the actors. Other applications include oblivious 
resource sharing in a (general) distributed setting, 
efficient and secure communication in sensor net- 
works, interactions of mobile agents traversing 
the web, etc. Even epidemiological phenomena 
(like spread of disease) tend to be more accurately
captured by this “interaction-sensitive” random
graphs model. 


Open Problems 


In [1], the authors present (among other
results) an algorithm for coloring $G_{n,m,p}$ in
the case where $m = n^\alpha, \alpha < 1$ and
$mp = o\left(\frac{1}{\ln n}\right)$. In contrast, we presented algorithm
CliqueColor, which finds a proper coloring
of $G_{n,m,p}$ using $O(\chi(G_{n,m,p}))$ colors whp, in the
case $m = n^\alpha, \alpha < 1$ and $mp \ge \ln^2 n$. It
remains open whether we can construct efficient
algorithms (both in terms of the running time and
the number of colors used) for finding proper
colorings of $G_{n,m,p}$ for the range of values
$\Omega\left(\frac{1}{\ln n}\right) \le mp \le \ln^2 n$.
Problem Definition


In the field of combinatorial generation, the goal 
is to have fast elegant algorithms and code for 
exhaustively listing the elements of various com- 
binatorial classes such as permutations, combina- 
tions, partitions, trees, graphs, and so on. Often 
it is desirable that successive objects in the list- 
ing satisfy some closeness condition, particularly 
conditions where successive objects differ only 
by a constant amount. Such listings are called 
combinatorial Gray codes; thus the study of com- 
binatorial Gray codes is an important subfield of 
combinatorial generation. 
There are a variety of applications of combinatorial
objects where there is an inherent closeness
operation that takes one object to another object
and vice versa. There is a natural closeness graph
$G = (V, E)$ that is associated with this setting.
The vertices, $V$, of this graph are the combinatorial
objects and edges, $E$, are between objects
that are close. A combinatorial Gray code then
becomes a Hamilton path in $G$ and a Gray cycle
is a Hamilton cycle in $G$.

The term Combinatorial Gray Code seems to 
have first appeared in print in Joichi, White, 
and Williamson [4]. An excellent survey up 
to 1997 was provided by Savage [10], and 
many examples of combinatorial Gray codes 
may be found in Knuth [5]. There are literally 
thousands of papers and patents on Gray 
codes in general, and although fewer about 
combinatorial Gray codes, this article can 
only scratch the surface and will focus on 
fundamental concepts, generalized settings, and 
recent results. 


The Binary Reflected Gray Codes 

The binary reflected Gray code (BRGC) is a
well-known circular listing of the bit strings of a
fixed length in which successive bit strings differ
by a single bit. (The word bit string is used in
this article instead of “binary string.”) Let $B_n$
be the Gray code list of all bit strings of length
$n$. The list is defined by the following simple
recursion:


$B_0 = \epsilon$, and for $n \ge 0$, $B_{n+1} = 0B_n,\ 1B_n^R$.    (1)


In this definition, $\epsilon$ is the empty string,
the comma represents concatenation of lists,
the symbol $x$ preceding a list indicates that
an $x$ is to be prepended to every string
in the list, and the superscript $R$ indicates that
the list is reversed. For example, $B_1 = 0, 1$,
$B_2 = 00, 01, 11, 10$, and
$B_3 = 000, 001, 011, 010, 110, 111, 101, 100$.
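
Recursion (1) translates directly into a few lines of Python. The sketch below is for illustration only (the function name is an assumption), and it materializes the whole list, so it uses exponential space. It also prints the well-known closed form in which the $i$-th string of $B_n$ is the binary expansion of `i ^ (i >> 1)`.

```python
def brgc(n):
    """Binary reflected Gray code B_n as a list of bit strings, following (1)."""
    if n == 0:
        return [""]                      # B_0 = empty string
    prev = brgc(n - 1)
    # B_n = 0 B_{n-1}, 1 B_{n-1}^R
    return ["0" + s for s in prev] + ["1" + s for s in reversed(prev)]

print(brgc(3))
# Closed form for the i-th codeword: i XOR (i >> 1)
print([format(i ^ (i >> 1), "03b") for i in range(8)])
```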

Figure 1 illustrates one of the most useful
applications of the BRGC. Imagine a rotating
“shaft,” like a photocopier drum or a shutoff valve
on a pipeline, in which the amount of rotation
is to be determined by reading $n$ “bits” from
a sensor (c). If the sensor position is between
two adjacent bit strings, then the result is
indeterminate; so, for example, if the sensor fell
between 0110 and 1100, then it could return
either of these, but also 0100 or 1110 could be
returned. If the bits are encoded around the shaft
as a sequence of bit strings in lexicographic order,
as shown in (a), and the sensor falls between 0000
and 1111, then any bit string might be returned!
This problem is entirely mitigated by using a
Gray code, such as the BRGC (b), since then only
one of the two bit strings that lie under the sensor
will be returned.

Combinatorial Gray Code, Fig. 1 Applications of the binary reflected Gray code (a, b, c: see explanation in the text)

The term “Gray” comes from Frank Gray, an
engineer at Bell Labs who was issued a patent
that uses the BRGC (Pulse code communication,
March 17, 1953. U.S. patent no. 2,632,058).
However, the BRGC was known much earlier; in 
particular it occurs in solutions to the Chinese 
rings puzzle and was used to list diagrams in 
the I Ching. See Knuth [5] for further historical 
background. 

Surprisingly, new things are still being proved
about the BRGC; e.g., Williams [15] shows that
the BRGC is generated by the following iterative
“greedy” rule: Starting with the all 0s bit
string, flip the rightmost bit that yields a bit
string that has not already been generated. There
are many other useful Gray code listings of bit
strings known, for example, where the bit flips are
equally distributed over the indexing positions or
where all bit strings of density $k$ (i.e., having $k$
1s) are listed before any of those of density $k + 2$.
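
The greedy rule is easy to try out. The following Python sketch (the function name is an assumption) applies the rule literally, remembering every string generated so far; it therefore needs exponential space and is meant only as a check of the rule, not as a practical generator.

```python
def greedy_gray(n):
    """List bit strings of length n by repeatedly flipping the rightmost
    bit that leads to a string not generated before."""
    current = [0] * n
    seen = {tuple(current)}
    out = ["".join(map(str, current))]
    while True:
        for pos in range(n - 1, -1, -1):     # try the rightmost bit first
            current[pos] ^= 1
            if tuple(current) not in seen:
                break                         # keep this flip
            current[pos] ^= 1                 # undo and try the next bit
        else:
            return out                        # no flip gives a new string
        seen.add(tuple(current))
        out.append("".join(map(str, current)))

print(greedy_gray(3))
```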


Combinations as Represented by Bit Strings 
Many other simple Gray codes can be described
using rules in the style of (1). For example,
$B_{n,k} = 0B_{n-1,k},\ 1B_{n-1,k-1}^R$ gives a Gray code of
all bit strings of length $n$ with exactly $k$ 1s. It has
the property that successive bit strings differ by
the transposition of two (possibly nonadjacent)
bits. Observe also that this is the list obtained
from the BRGC by deleting all bit strings that do
not contain $k$ 1s.
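
A small Python sketch of this rule is given below. The base cases $B_{n,0} = 0^n$ and $B_{n,n} = 1^n$ are not stated in the text and are assumed here as the natural ones; the function name is likewise an assumption.

```python
def gray_combinations(n, k):
    """Bit strings of length n with exactly k ones (0 <= k <= n), listed by
    B_{n,k} = 0 B_{n-1,k}, 1 B_{n-1,k-1}^R."""
    if k == 0:
        return ["0" * n]          # assumed base case
    if k == n:
        return ["1" * n]          # assumed base case
    return (["0" + s for s in gray_combinations(n - 1, k)]
            + ["1" + s for s in reversed(gray_combinations(n - 1, k - 1))])

listing = gray_combinations(4, 2)
print(listing)
# The same strings, obtained by filtering the BRGC of length 4:
print([s for s in (format(i ^ (i >> 1), "04b") for i in range(16))
       if s.count("1") == 2])
```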

A more restrictive Gray code for combinations 
is that of Eades and McKay [1]. Here the recursive
construction ($1 < k < n$) is
$E_{n,k} = 0E_{n-1,k},\ 10E_{n-2,k-1}^R,\ 11E_{n-2,k-2}$. This Gray
code has the nice property that successive bit 
strings differ by a transposition of two bits and 
only Os lie between the two transposed bits. 
It is worth noting that there is no Gray code 
for combinations by transpositions in which the 
transposed bits are always adjacent (unless n is 
even and k is odd). 


Generating Permutations via Plain Changes
The second most well-known combinatorial Gray
code lists all $n!$ permutations of $\{1, 2, \ldots, n\}$ in
one-line notation in such a way that successive
permutations differ by the transposition of two
adjacent elements. In its algorithmic form it is
usually attributed to Johnson [3] and Trotter [13]
although it has been used for centuries by campanologists
[5], who refer to it as plain changes.
Given such a list $L_{n-1}$ for $\{1, 2, \ldots, n-1\}$, the
list $L_n$ can be created by successively sweeping
the $n$ back-and-forth through each permutation
in $L_{n-1}$. For example, if $L_3$ is 123, 132, 312,
321, 231, 213, then the first 8 permutations of
$L_4$ are 1234, 1243, 1423, 4123, 4132, 1432,
1342, 1324. The “weave” below illustrates plain
changes for $n = 5$, where the permutations occur
as columns and the leftmost column corresponds
to the permutation 12345, read top to bottom.


[Figure: weave diagram of plain changes for $n = 5$]
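
The sweeping construction can be sketched in Python as follows. This is an illustration of the recursive description only (the function name is an assumption): it builds each list $L_k$ from $L_{k-1}$ explicitly and therefore stores all permutations, unlike the efficient Johnson-Trotter style implementations.

```python
def plain_changes(n):
    """All permutations of 1..n in plain-changes order, built by sweeping
    the new largest element back and forth through the previous list."""
    perms = [[]]
    for k in range(1, n + 1):
        nxt = []
        for j, perm in enumerate(perms):
            # even-indexed permutations: sweep k from right to left;
            # odd-indexed permutations: sweep k from left to right
            if j % 2 == 0:
                slots = range(len(perm), -1, -1)
            else:
                slots = range(len(perm) + 1)
            for pos in slots:
                nxt.append(perm[:pos] + [k] + perm[pos:])
        perms = nxt
    return perms

print(["".join(map(str, p)) for p in plain_changes(3)])
print(["".join(map(str, p)) for p in plain_changes(4)[:8]])
```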


Hamiltonicity 
In our BRGC example above, the closeness op- 
eration is the flipping of a bit, and the close- 
ness graph is the hypercube. In a Gray code 
for permutations, a natural closeness operation is 
the transposing of two adjacent elements; in this 
case the closeness graph is sometimes called the 
permutohedron. 

Sometimes it happens that the closeness graph 
G has no Hamilton path. If it is not connected, 
then there is no hope of finding a Gray code, 
but if it is connected, then there are several 
approaches that have proved successful in finding 
a Gray code in a related graph; the result is a 
weaker Gray code that permits 2 or 3 applications 
of the closeness operation. One approach is to
consider the prism $G \times e$ [7]. If $G$ is bipartite
and its prism is Hamiltonian, then the Gray code consists of
every other vertex along the Hamilton cycle of the prism. As
a last resort the result of Sekanina below [12] can
be used. An implementation of Sekanina’s proof,
called prepostorder, is given by Knuth [5].


Theorem 1 (Sekanina) If $G$ is connected, then
$G^3$ is Hamiltonian.


At this point the reader may be wondering: 
“What distinguishes the study of Gray codes from 
the study of Hamiltonicity of graphs?” The un- 
derlying motivation for most combinatorial Gray 
codes is algorithmic; proving the existence of the 
Hamilton cycle is only the first step; one then tries 
to get an efficient implementation according to 
the criteria discussed in the next section (e.g., the 
recent “middle-level” result gives an existence 
proof via a complex construction — but it lacks the 
nice construction that one would strive for when 
thinking about a Gray code). Another difference 
is that the graphs studied usually have a very 
specific underlying structure; often this structure 


is recursive and is exploited in the development 
of efficient algorithms. 


The Representation Issue 

Many combinatorial structures exhibit a 
chameleonic nature. The canonical example
of this is the objects counted by the Catalan
numbers, $C_n$, of which there are hundreds. For
example, they count the well-formed parentheses
strings with $n$ left and $n$ right parentheses but
also count the binary trees with $n$ nodes. The
most natural ways of representing these are very
different; well-formed parentheses are naturally
represented by bit strings of length $2n$, whereas
binary trees are naturally represented using a
linked structure with two pointers per cell.
Furthermore, the natural closeness conditions
are also different; for parentheses, a swap of two
parentheses is natural, and for binary trees, the
classical rotation operation is most natural. When
discussing Gray codes it is imperative to know 
precisely what representation is being used (and 
then what closeness operation is being used). 


Algorithmic Issues 

For the vast majority of combinatorial Gray 
codes, space is the main enemy. The first task 
of the algorithm designer is to make sure that 
their algorithm uses an amount of space that is 
a small polynomial of the input size; algorithms 
that rely on sublists of the objects being listed are 
doomed even though many Hamiltonicity proofs 
naively lead to such algorithms. For example, 
an efficient algorithm for generating the BRGC 
cannot directly use (1) since that would require 
exponential space. 


CAT Algorithms, Loopless Algorithms 
From the global point of view, the best possible 
algorithms are those that output the objects (V) 
in time that is proportional to the number of
objects (|V|). Such algorithms are said to be CAT, 
standing for constant amortized time. 

From a more local point of view, the best pos- 
sible algorithms are those that output successive 
objects so that the amount of work between suc- 
cessive objects is constant. Such algorithms are 
said to be loopless, a term introduced by Ehrlich 
[2]. Both the BRGC and the plain changes algo- 
rithm for permutations mentioned above can be 
implemented as loopless algorithms. 
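
As an illustration of what a loopless implementation looks like, here is a Python sketch of a standard “focus pointer” scheme for the BRGC. The generator name and the string output are assumptions made for the example. The state update between two successive objects touches only a constant number of array cells and uses $O(n)$ space in total; building the output string costs $O(n)$, but as noted in the text that output time is not what is being measured.

```python
def loopless_brgc(n):
    """Generate the BRGC for length n with O(1) state change per step,
    using an array f of focus pointers."""
    a = [0] * n                 # a[j] is bit j; a[0] is the rightmost bit
    f = list(range(n + 1))      # focus pointers, f[j] = j initially
    while True:
        yield "".join(str(b) for b in reversed(a))
        j = f[0]                # position of the bit to flip next
        f[0] = 0
        if j == n:              # every string has been visited
            return
        f[j] = f[j + 1]
        f[j + 1] = j + 1
        a[j] ^= 1               # the single bit change

print(list(loopless_brgc(3)))
```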

Note that in both of these definitions we ignore 
the time that it would take to actually output the 
objects; what is being measured is the amount 
of data structure change that is occurring. This 
measure is the correct one to use because in 
many applications it is only the part of the data 
structure that changes that is really needed by the 
application (e.g., to update an objective function). 


Key Results 


Below are listed some combinatorial Gray codes, 
focusing on those that are representative, are very 
general, or are recent breakthroughs. 


Numerical Partitions [5,9] 

Objects: All numerical partitions of an integer $n$.

Representation: Sequence of positive integers
$a_1 \ge a_2 \ge \cdots$ such that $a_1 + a_2 + \cdots = n$.

Closeness operation: Two partitions $a$ and $a'$ are
close if there are two indices $i$ and $j$ such that
$a'_i = a_i + 1$ and $a'_j = a_j - 1$.

Efficiency: CAT. 

Comments: Results have been extended to the 
case where all parts are distinct and where the 
number of parts or the largest part is fixed. 


Spanning Trees 

Objects: Spanning trees of connected unlabeled 
graphs. 

Representation: List of edges in the graph. 

Closeness operation: Successive trees differ by a 
single edge replacement. 

Efficiency: CAT. 

References: University of Victoria MSc thesis of
Malcolm Smith. Knuth 4A, [5], pages 468-469.
Knuth has an implementation of the
Smith algorithm on his website mentioned
below; see the programs *SPAN.


Basic Words of Antimatroids [7] 

Objects: Let $A$ be a set system over $[n] := \{1, 2, \ldots, n\}$
that (a) is closed under union
($S \cup T \in A$ for all $S, T \in A$) and (b) is
accessible (for all $S \in A$ with $S \ne \emptyset$, there
is an $x \in S$ such that $S \setminus \{x\} \in A$). Such
a set system $A$ is an antimatroid. Repeated
application of (b) starting with the set $[n]$ and
ending with $\emptyset$ gives a permutation of $[n]$ called
a basic word.

Representation: A permutation of $[n]$ in one-line
notation.

Closeness operation: Successive permutations 
differ by the transposition of one or two 
adjacent elements. 

Efficiency: CAT if there is an O(1) “oracle” for 
determining whether the closeness operation 
applied to a basic word gives another basic 
word of A; loopless if the antimatroid is the 
set of ideals of a poset. 

Important special cases: Linear extensions 
of posets (partially ordered sets), convex 
shellings of finite point sets, and perfect elim- 
ination orderings of chordal graphs. Linear 
extensions have as special cases permutations 
(of a multiset), $k$-ary trees, standard Young
tableaux, alternating permutations, etc.

Additional notes: If G is the cover graph of an 
antimatroid A where the sets are ordered by 
set inclusion, then the prism of G is Hamil- 
tonian. Thus there is a Gray code for the 
elements of A. No CAT algorithm is known 
for this Gray code, even in the case where the 
antimatroid consists of the ideals of a poset. 


Words in a Bubble Language [8, 11] 

Objects: A language L over alphabet {0,1} is 
a bubble language if it is closed under the 
operation of changing the rightmost 01 to a 10. 

Representation: Bit strings of length n (i.e., ele- 
ments of {0, 1}”). 

Closeness operation: Two $01 \leftrightarrow 10$ swaps, or
(equivalently) one rotation of a prefix.

Important special cases: Combinations, well-formed
parentheses, necklaces, and prefix
normal words.

Efficiency: CAT if there is an O(1) oracle for 
determining membership under the operation. 


Permutations via $\sigma$ and $\tau$ [16]

Objects: Permutations of $[n]$.

Representation: One-line notation.

Closeness operation: Successive permutations
differ by the rotation $\sigma = (1\ 2\ \cdots\ n)$ or
the transposition $\tau = (1\ 2)$ as applied to the
indices of the representation.

Efficiency: CAT, but has a loopless implementation
if only the successive $\sigma$ or $\tau$ generators
are output or if the permutation is represented
using linked lists.

Comments: This is known as Wilf’s (directed)
$\sigma$-$\tau$ problem.


Middle Two Levels of the Boolean Lattice [6]

Objects: All subsets $S$ of $B_{2n+1}$ of density $n$ or
$n + 1$.

Representation: Characteristic bit strings ($b_i = 1$
if and only if $i \in S$).

Closeness operation: Bit strings differ by a trans- 
position of adjacent bits. 

Efficiency: Unknown. A good open problem. 

Comments: This is famously known as the “mid- 
dle two-level problem.” 


URLs to Code and Data Sets 


Don Knuth maintains a web page which contains
some implementations of combinatorial Gray
codes at http://www-cs-faculty.stanford.edu/~uno/programs.html.
See, in particular, the programs
GRAYSPAN, SPSPAN, GRAYSPSPAN,
KODA-RUSKEY, and SPIDERS.

Jörg Arndt maintains a website (http://www.jjj.de/fxt/)
and book with many C programs for
generating combinatorial objects, some of which
are combinatorial Gray codes. The book may be
freely downloaded. Chapter 14, entitled “Gray
codes for strings with restrictions”, is devoted to
combinatorial Gray codes, but they can also be
found in other chapters; e.g., 15.2 and 15.3 both
contain Gray codes for well-formed parentheses
strings.

The “Combinatorial Object Server” at
http://www.theory.cs.uvic.ca/~cos/ allows you to
produce small lists of combinatorial objects, often in
various Gray code orders.


Cross-References 


Entries relevant to combinatorial generation (not 
necessarily Gray codes): 


Enumeration of Non-crossing Geometric 
Graphs 

Enumeration of Paths, Cycles, and Spanning 
Trees 

Geometric Object Enumeration 

Permutation Enumeration 

Tree Enumeration 
Problem Definition


Tile Assembly Models 

Two of the most studied tile self-assembly models 
in the literature are the abstract Tile Assembly 
Model (aTAM) [7] and the Two-Handed Tile 
Assembly Model (2HAM) [4]. Both models con- 
stitute a mathematical model of self-assembly in 
which system components are four-sided Wang 
tiles with glue types assigned to each tile edge. 
Any pair of glue types is assigned some nonnegative
interaction strength denoting how strongly
the pair of glues bind. The models differ in their 
rules for growth in that the aTAM allows single- 
ton tiles to attach one at a time to a growing seed, 
whereas the 2HAM permits any two previously 
built assemblies to combine given enough affinity 
for attachment. 

In more detail, an aTAM system is an ordered
triplet $(T, \tau, \sigma)$ consisting of a set of tiles $T$, a
positive integer threshold parameter $\tau$ called the
system’s temperature, and a special tile $\sigma \in T$
denoted as the seed tile. Assembly proceeds by
attaching copies of tiles from $T$ to a growing seed
assembly whenever the placement of a tile on the
2D grid achieves a total strength of attachment
from abutting edges, determined by the sum of
pairwise glue interactions, that meets or exceeds
the temperature parameter $\tau$. An additional twist
that is often considered is the ability to specify
a relative concentration distribution on the tiles
in $T$. The growth from the initial seed then
proceeds randomly with higher concentrated tile 
types attaching more quickly than lower con- 
centrated types. Even when the final assembly 
is deterministic, adjusting concentration profiles 
may substantially alter the expected time to reach 
the unique terminal state. 
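
The attachment rule can be made concrete with a small Python sketch. Everything named here (the `Tile` record, the `grow` function, the glue labels, the strength function passed as a callable) is a hypothetical illustration rather than an established API, and the sketch makes two simplifying assumptions: tiles attach in a fixed deterministic order instead of randomly according to concentrations, and growth is capped by a step limit.

```python
from typing import NamedTuple

class Tile(NamedTuple):
    name: str
    north: str
    east: str
    south: str
    west: str

# side of the new tile -> (dx, dy, facing side of the neighbouring tile)
DIRS = {"north": (0, 1, "south"), "east": (1, 0, "west"),
        "south": (0, -1, "north"), "west": (-1, 0, "east")}

def attachment_strength(assembly, pos, tile, strength):
    """Sum of glue interactions between `tile` placed at `pos` and its neighbours."""
    total = 0
    for side, (dx, dy, facing) in DIRS.items():
        nb = assembly.get((pos[0] + dx, pos[1] + dy))
        if nb is not None:
            total += strength(getattr(tile, side), getattr(nb, facing))
    return total

def grow(tiles, seed, temperature, strength, max_steps=1000):
    """Attach one tile at a time to the seed assembly until no tile can attach."""
    assembly = {(0, 0): seed}
    for _ in range(max_steps):
        frontier = {(x + dx, y + dy) for (x, y) in assembly
                    for dx, dy, _ in DIRS.values()} - set(assembly)
        move = next(((pos, t) for pos in sorted(frontier) for t in tiles
                     if attachment_strength(assembly, pos, t, strength) >= temperature),
                    None)
        if move is None:
            return assembly               # terminal assembly reached
        assembly[move[0]] = move[1]
    return assembly

# Example: equal non-empty glues bind with strength 1; the system builds a 1 x 3 line.
glue = lambda g, h: 1 if g and g == h else 0
S = Tile("S", "", "a", "", "")
A = Tile("A", "", "b", "", "a")
B = Tile("B", "", "", "", "b")
print({pos: t.name for pos, t in grow([S, A, B], S, 1, glue).items()})
```

At temperature 2 the same tile set cannot grow beyond the seed, since every available bond has strength 1.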

The Two-Handed Tile Assembly Model
(2HAM) [4] is similar to the aTAM, but removes
the concept of a seed tile. Instead, a 2HAM
system $(T, \tau)$ produces a new assembly whenever
any two previously built (and potentially large) 
assemblies may translate together into a new 
stable assembly based on glue interactions and 
temperature. The distinction between the 2HAM 
and the aTAM is that the 2HAM allows large 
assemblies to grow independently and attach 
as large, pre-built assemblies, while the aTAM 
grows through the step-by-step attachment of 
singleton tiles to a growing seed. 

A typical goal in tile self-assembly is to design 
an efficient tile system that uniquely assembles a 
target shape. Two primary efficiency metrics are 
(1) the number of distinct tile types used to self- 
assemble the target shape and (2) the expected 
time the system takes to self-assemble the target 
shape. Toward minimizing the number of tiles 
used to build a shape, the Minimum Tile Set 
Problem is considered. Toward the goal of minimizing
assembly time, the problem of selecting
an optimal concentration distribution over the 
tiles of a given set is considered in the Tile Con- 
centration Problem. Finally, the computational 
problem of simply verifying whether a given 
system correctly and uniquely self-assembles a 
target shape is considered in the Unique Assem- 
bly Verification Problem. Formally, the problems 
are as follows: 


Problem 1 (The Minimum Tile Set Prob- 
lem [2]) Given a shape, find the tile system with 
the minimum number of tile types that uniquely 
self-assembles into this shape. 


Problem 2 (The Tile Concentration Prob- 
lem [2]) Given a shape and a tile system 
that uniquely produces the given shape, assign 
concentrations to each tile type so that the 
expected assembly time for the shape is 
minimized. 

Problem 3 (The Unique Assembly Verification 
Problem [2,4]) Given a tile system and an as- 
sembly, determine if the tile system uniquely self- 
assembles into the assembly. 


Key Results 


Minimum Tile Set Problem 

The NP-completeness of the Minimum Tile Set
Problem within the aTAM is proven in [2] by a
reduction from 3CNF-SAT. The proof is notable
in that the polynomial time reduction relies on a
polynomial time algorithm for the Minimum Tile
Set Problem on tree shapes, which the authors
also provide. The authors
also show that the Minimum Tile Set Problem
is polynomial time solvable for $n \times n$ squares
by noting that since the optimal solution has at
most $O(\log n)$ tile types [7], a brute force search
of candidate tile sets finishes in polynomial time
as long as the temperatures of the systems under
consideration are all a fixed constant. Extending
the polynomial time solution to find the minimum
tile system over any temperature is achieved
in [5].
Theorem 1 The Minimum Tile Set Problem is
NP-complete within the aTAM. For the restricted 
classes of shapes consisting of squares and trees, 
the Minimum Tile Set Problem is polynomial time 
solvable. 


Concentration Optimization 

The next result provides an approximation algo- 
rithm for the Tile Concentration Problem for a 
restricted class of aTAM tile system called partial 
order systems. Partial order systems are systems 
in which a unique assembly is constructed, and 
for any pair of adjacent tiles in the final assembly 
which have positive bonding strength, there is 
a strict order in which the two tiles are placed 
with respect to each other for all possible assembly
sequences. For such systems, an $O(\log n)$-approximation
algorithm is presented [2].


Theorem 2 For any partial order aTAM system
$(T, \tau, \sigma)$ that uniquely self-assembles a size-$n$ assembly,
there exists a polynomial time $O(\log n)$-
approximation algorithm for the Tile Concentra- 
tion Problem. 


Assembly Verification 

The next result provides an important distinction 
in verification complexity between the aTAM and 
the 2HAM. In [2] a straightforward quadratic
time algorithm for assembly verification in the aTAM
is presented. In contrast, the problem is shown to be
co-NP-complete for the 2HAM in [4] through a reduction from
3CNF-SAT. The hardness holds for a 3D generalization
of the 2HAM, but requires only 1 step into
the third dimension. To achieve this reduction,
the exponentially many candidate 3CNF-SAT so- 
lutions are engineered into the order in which 
the system might grow while maintaining that 
these candidate paths all collapse into a single 
polynomial-sized final assembly in the case that 
no satisfying solution exists. This reduction fun- 
damentally relies on the third dimension and thus 
leaves open the complexity of 2D verification in 
the 2HAM. 


Theorem 3 The Unique Assembly Verification 
Problem is co-NP-complete for the 3D 2HAM 
and solvable in polynomial time $O(|A|^2 + |A||T|)$
in the aTAM. 
Open Problems


A few open problems in this area are as fol- 
lows. The Minimum Tile Set Problem has an 
efficient solution for squares which stems from 
a logarithmic upper bound on the complexity of 
assembling such shapes. This holds more gen- 
erally for thick rectangles, but this ceases to be 
true when the width of the rectangle becomes 
sufficiently thin [3]. The complexity of the Min- 
imum Tile Set Problem is open for this class of 
simple geometric shapes. For the Tile Concen- 
tration Problem, an exact solution is conjectured 
to be #P-hard for partial order systems [2], but 
this has not been proven. More generally, little 
is known about the Tile Concentration Problem 
for non-partial order systems. Another direction 
within the scope of minimizing assembly time 
is to consider optimizing over the tiles used, as 
well as the concentration distribution over the 
tile set. Some work along these lines has been 
done with respect to the fast assembly of n ×
n squares [1] and the fast implementation of
basic arithmetic primitives in self-assembly [6]. 
In the case of the Unique Assembly Verifica- 
tion Problem, the complexity of the problem 
for the 2HAM in 2D is still unknown. For the 
aTAM, it is an open question as to whether the 
quadratic run time of verification can be im- 
proved. 
Problem Definition


Two players, Alice and Bob, are playing a
game in which their shared goal is to compute
a function f : X × Y → Z efficiently. The
game starts with Alice holding a value x ∈ X
and Bob holding y ∈ Y. They then communicate
by sending each other messages according to
a predetermined protocol, at the end of which
they must both arrive at some output z ∈ Z.
The protocol is deemed correct if z = f(x, y)
for all inputs (x, y). Each message from Alice 
(resp. Bob) is an arbitrary binary-string-valued 
function of x (resp. y) and all previous messages 
received during the protocol’s execution. The cost 
of the protocol is the maximum total length of all 
such messages, over all possible inputs, and is the 
basic measure of efficiency of the protocol. The 
central goals in communication complexity [23] 
are (i) to design protocols with low cost for 
given problems of interest, and (ii) to prove lower 
bounds on the cost that must be paid to solve a 
given problem. The minimum possible such cost
is a natural measure of complexity of the function
f and is denoted D(f).

Notably, the “message functions” in the above 
definition are not required to be efficiently 
computable. Thus, communication complexity 
focuses on certain basic information theoretic 
aspects of computation, abstracting away messier 
and potentially unmanageable lower-level details. 
Arguably, it is this aspect of communication 
complexity that has made it such a successful 
paradigm for proving lower bounds in a wide 
range of areas in computer science. 

Most work in communication complexity fo- 
cuses on randomized protocols, wherein random 
coin tosses (equivalently, a single random binary 
string) may be used to determine the messages 
sent. These coin tosses may be performed either 
in private by each player or in public: the result- 
ing protocols are called private coin and public 
coin, respectively. A randomized protocol is said
to compute f with error bounded by ε > 0 if,
for all inputs (x, y), its output on (x, y) differs
from f(x, y) with probability at most ε. With
this notion in place, one can then define the ε-
error randomized communication complexity of
f, denoted R_ε(f), analogously to the determin-
istic one. By convention, this notation assumes
private coins; the analogous public-coin variant
is denoted R^pub_ε(f). Further, when f has Boolean
output, it is convenient to put R(f) = R_{1/3}(f).
Clearly, one always has R^pub(f) ≤ R(f) ≤
D(f).

Consider a probability distribution μ on the
input domain X × Y. A protocol’s error un-
der μ is the probability that it errs when given
a random input (X, Y) ~ μ. The ε-error μ-
distributional complexity of f, denoted D^μ_ε(f),
is then the minimum cost of a deterministic pro-
tocol for f whose error, under μ, is at most
ε; an easy averaging argument shows that the
restriction of determinism incurs no loss of gen-
erality. The fundamental minimax principle of
Yao [22] says that R_ε(f) = max_μ D^μ_ε(f). In
particular, exhibiting a lower bound on D^μ_ε(f)
for a wisely chosen μ lower bounds R_ε(f);
this is a key lower-bounding technique in the
area.
Let Π be a protocol that uses a public random
string R as well as private random strings: R_A
for Alice, R_B for Bob. Let Π(x, y, R, R_A, R_B)
denote the transcript of conversation between
Alice and Bob, on input (x, y). The internal and
external information costs of Π with respect to
the distribution μ are then defined as follows:


icost^μ(Π) = I(Π(X, Y, R, R_A, R_B) : X | Y, R)
+ I(Π(X, Y, R, R_A, R_B) : Y | X, R),
icost^{μ,ext}(Π) = I(Π(X, Y, R, R_A, R_B) : X, Y | R),


where (X, Y) ~ μ and I denotes mutual in-
formation. These definitions capture the amount
of information learned by each player about the
other’s input (in the internal case) and by an
external observer about the total input (in the
external case) when Π is run on a random μ-
distributed input. It is elementary to show that
these quantities lower bound the actual com-
munication cost of the protocol. Therefore, the
corresponding information complexity [3,9] mea-
sures, denoted IC^μ_ε(f) and IC^{μ,ext}_ε(f) and defined
as the infima of these costs over all worst-case
ε-error protocols for f, naturally lower bound
R_ε(f). This is another important lower-bounding
technique.


Key Results 


In a number of basic communication problems, 
Alice’s and Bob’s inputs are n-bit strings, de- 
noted x and y, respectively, and the goal is to 
compute a Boolean function f(x, y). We shall 
denote the ith bit of x as x_i. The bound D(f) ≤
n + 1 is then trivial, because Alice can always just
send Bob her input, for a cost of n, after which Bob
announces the answer with one more bit.

The textbook by Kushilevitz and Nisan [14] 
gives a thorough introduction to the subject and 
contains detailed proofs of several of the results 
summarized below. We first present results about 
specific communication problems, then move on 
to more abstract results about general problems, 
and close with a few applications of these results 
and ideas. 
Problem-Specific Results


Equality Testing 

This problem is defined by the equality func-
tion, given by EQ_n(x, y) = 1 if x = y and
EQ_n(x, y) = 0 otherwise. This can be solved
nontrivially by a randomized protocol wherein
Alice sends Bob a fingerprint of x, which Bob
can compare with the corresponding fingerprint
of y generated using the same random seed.
Using public coins, a random n-bit string r can
be used as a seed to generate the fingerprint
⟨x, r⟩ = Σ_{i=1}^n x_i r_i mod 2. One can readily
check that this yields R^pub(EQ_n) = O(1) and,
more generally, R^pub_ε(EQ_n) = O(log(1/ε)). In
the private coin setting, one can use a different
kind of fingerprinting, e.g., by treating the bits
of x as the coefficients of a degree-n polynomial
and evaluating it at a random element of F_q (for
a large enough prime q) to obtain a fingerprint.
This idea leads to the bound R(EQ_n) = O(log n).
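
The two fingerprinting ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the bit-list representation of inputs, and the passing of an explicit random generator are choices made here, not part of the original presentation. In the private-coin variant, Alice would send both the random point and the fingerprint, for O(log q) bits in total.

```python
import random

def inner_product_fingerprint(x, r):
    """Parity of the inner product <x, r> of two equal-length bit lists."""
    return sum(xi & ri for xi, ri in zip(x, r)) % 2

def public_coin_eq(x, y, repetitions, rng):
    """Public-coin test: both players see r; Alice sends one bit per repetition.
    If x != y, each repetition detects the difference with probability 1/2."""
    n = len(x)
    for _ in range(repetitions):
        r = [rng.randrange(2) for _ in range(n)]  # shared random seed
        if inner_product_fingerprint(x, r) != inner_product_fingerprint(y, r):
            return 0  # fingerprints differ, so x != y for certain
    return 1  # equal, or unequal with probability at most 2**-repetitions

def poly_fingerprint(x, point, q):
    """Evaluate sum_i x[i] * point**i modulo q using Horner's rule."""
    acc = 0
    for coeff in reversed(x):
        acc = (acc * point + coeff) % q
    return acc

def private_coin_eq(x, y, q, rng):
    """Private-coin test: Alice picks the point herself and sends it along
    with her fingerprint. q is assumed to be a prime much larger than len(x)."""
    point = rng.randrange(q)
    return 1 if poly_fingerprint(x, point, q) == poly_fingerprint(y, point, q) else 0
```

Both tests have one-sided error: equal inputs are always accepted, while unequal inputs are wrongly accepted only when the random choice happens to hide the difference.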

Randomization is essential for the above re-
sults: it can be shown that D(EQ_n) ≥ n. The
argument relies on the fundamental rectangle
property of deterministic protocols, which states
that the set of inputs (x, y) that lead to the
same transcript must form a combinatorial rect-
angle inside the input space X × Y. This can
be proved by induction on the length of the
transcript. This rectangle property then implies
that if u ≠ v ∈ {0,1}^n, then a correct protocol
for EQ_n cannot have the same transcript on inputs
(u, u) and (v, v); otherwise, it would have the
same transcript on (u, v) as well, and therefore
err, because EQ_n(u, u) ≠ EQ_n(u, v). It follows
that the protocol must have at least 2^n distinct
transcripts, whence one of them must have length
at least n.

It can also be shown that the upper bounds
above are optimal [14]. The lower bound
R(EQ_n) = Ω(log n) is a consequence of the
more general result that D(f) = 2^{O(R(f))} for
every Boolean function f. The lower bound
R^pub_ε(EQ_n) = Ω(log(1/ε)) follows from Yao’s
minimax principle and a version of the above
rectangle-property argument.

More refined results can be obtained by con-
sidering the expected (rather than worst case) cost
of an r-round protocol, i.e., a protocol in which a
total of r messages are sent: in this case, R^{pub,(r)}_ε(EQ_n) =
O(log log ⋯ log(min{n, log(1/ε)})) with the
outer chain of logarithms iterated (r − 1) times.
This is tight [7]. Another, incomparable, result
is that EQ_n has a zero-error randomized protocol
with information cost only O(1), regardless of
the joint distribution from which the inputs are
drawn [5].


Comparison 
This problem is defined by the greater-than func-
tion, given by GT_n(x, y) = 1 if x > y and
GT_n(x, y) = 0 otherwise, where we treat x and
y as integers written in binary. Like EQ_n, it has
no nontrivial deterministic protocol, for much the
same reason. As before, this implies R(GT_n) =
Ω(log n).

In fact the tight bound R(GT_n) = Θ(log n)
holds, but the proof requires a subtle argument.
Binary search based on equality testing on sub-
strings of x and y allows one to zoom in, in
O(log n) rounds, on the most significant bit po-
sition where x differs from y. If each equality
test is allowed O(1/log n) probability of error,
a straightforward union bound gives an O(1)
overall error rate, but this uses O(log log n) com-
munication per round. The improvement to an
overall O(log n) bound is obtained by preceding
each binary search step with an extra “sanity
check” equality test on prefixes of x and y and
backtracking to the previous level of the binary
search if the check fails: this allows one to use
only O(1) communication per round.
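
The simpler of the two protocols, binary search with a union bound, can be sketched as follows. This is a minimal sketch of the O(log n · log log n) version only; it does not implement the backtracking refinement. The names, the most-significant-bit-first list representation, the bit accounting, and the optional `reps` override are assumptions made for illustration.

```python
import math
import random

def prefix_equal(x, y, length, reps, rng):
    """Public-coin equality test on the first `length` bits.
    May wrongly report equality with probability at most 2**-reps."""
    for _ in range(reps):
        r = [rng.randrange(2) for _ in range(length)]
        fx = sum(a & b for a, b in zip(x, r)) % 2
        fy = sum(a & b for a, b in zip(y, r)) % 2
        if fx != fy:
            return False
    return True

def greater_than(x, y, rng, reps=None):
    """Return (claimed GT(x, y), bits exchanged) for equal-length bit lists,
    most significant bit first."""
    n = len(x)
    rounds = max(1, math.ceil(math.log2(n + 1)))
    if reps is None:
        # Per-test error at most 1/(4 * rounds), so total error at most 1/4.
        reps = math.ceil(math.log2(rounds)) + 2
    lo, hi, bits = 0, n, 0  # invariant: the search believes x[:lo] == y[:lo]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        bits += reps + 1  # Alice's fingerprint bits plus Bob's one-bit reply
        if prefix_equal(x, y, mid, reps, rng):
            lo = mid
        else:
            hi = mid - 1
    if lo == n:
        return 0, bits  # no differing position found: x == y
    return x[lo], bits + 1  # at the first differing bit, x > y iff x has a 1
```

With the default `reps`, each round costs O(log log n) bits, matching the union-bound analysis above; passing a large `reps` trades communication for a negligible error probability.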

The bounded-round complexity of GT_n is
also fairly well understood. Replacing the binary
search above with an n^{1/r}-ary search leads to the
r-round bound R^{(r)}(GT_n) = O(n^{1/r} log n). A
lower bound of Ω(n^{1/r}/r^2) can be proven by
carefully analyzing information cost.


Indexing and Bipartite Pointer Jumping 

The indexing or index problem is defined by the
Boolean function INDEX_n(x, k) = x_k, where
x ∈ {0,1}^n as usual, but Bob’s input k ∈ [n],
where [n] = {1, 2, ..., n}. Straightforward
information-theoretic arguments show that the
one-round complexity R^{(1)}(INDEX_n) = Ω(n),
where the single message must go from
Alice to Bob. Without this restriction, clearly
R(INDEX_n) = O(log n). A more delicate
result [17] is that in a (1/3)-error protocol for
INDEX_n, for any b ∈ [1, log n], either Bob must
send b bits or Alice must send n/2^{O(b)} bits; an
easy 2-round protocol shows that this trade-off
is optimal. Even more delicate results, involving
information cost, are known, and these are useful
in certain applications (see below).
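
One natural reading of the "easy 2-round protocol" is that Bob reveals the b high-order bits of his index and Alice answers with the corresponding block of her string. The sketch below follows that reading; the function name, 0-based indexing, and the requirement that n be a multiple of 2^b are assumptions made here for simplicity.

```python
def index_two_round(x, k, b):
    """Two-round protocol for INDEX with 0-based index k.
    Returns (x[k], bits sent by Bob, bits sent by Alice).
    Assumes len(x) is a multiple of 2**b and 0 <= k < len(x)."""
    n = len(x)
    block_len = n // (2 ** b)
    block_id = k // block_len  # Bob -> Alice: b bits naming a block
    block = x[block_id * block_len:(block_id + 1) * block_len]  # Alice -> Bob
    answer = block[k % block_len]  # Bob reads x_k out of the block
    return answer, b, block_len
```

For example, with n = 8 and b = 2, Bob sends 2 bits and Alice sends 2 bits, compared with the 8 bits Alice would need in a one-round protocol.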

The indexing problem illustrates that inter-
action can improve communication cost expo-
nentially. This can be generalized to show that
r + 1 rounds can be exponentially more pow-
erful than r rounds. For this, one considers the
bipartite pointer jumping problem, where Al-
ice and Bob receive functions f, g : [n] →
[n], respectively, and Bob also receives y ∈
[n]. Their goal is to compute PJ_{r,n}(f, g, y) =
h_r(⋯h_2(h_1(y))⋯) mod 2, where h_i = f for
odd i and h_i = g for even i. Notice that
PJ_{1,n} is essentially the same as INDEX_n and
that R^{(r+1)}(PJ_{r,n}) = O(r log n). Suitably gen-
eralizing the information-theoretic arguments for
INDEX_n shows that R^{(r)}(PJ_{r,n}) = Ω(n/r^2).
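
The (r + 1)-round upper bound comes from the players simply following the pointer back and forth. A minimal sketch, assuming functions are given as 0-indexed Python lists and counting one fixed-width pointer per message (both assumptions of this illustration):

```python
def pointer_jumping(f, g, y, r):
    """Simulate the natural protocol for pointer jumping with r jumps.
    f is Alice's function, g is Bob's, both lists over range(n); y is Bob's
    starting pointer. Returns (output bit, messages sent, total bits sent)."""
    n = len(f)
    bits_per_pointer = max(1, (n - 1).bit_length())
    pointer, messages = y, 1  # Bob's first message announces y to Alice
    for i in range(1, r + 1):
        h = f if i % 2 == 1 else g  # Alice applies f on odd steps, Bob g on even
        pointer = h[pointer]
        messages += 1  # the owner of h_i sends the new pointer across
    return pointer % 2, messages, messages * bits_per_pointer
```

The message count is r + 1 and each message has about log n bits, which is the O(r log n) bound; the lower bound says that saving even one round forces nearly linear communication.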


Inner Product Parity 

The Boolean function IP_n(x, y) = ⟨x, y⟩, which
is the parity of the inner product Σ_{i=1}^n x_i y_i, is
the most basic very hard communication prob-
lem: solving it to error 1/2 − δ (for constant δ)
requires n − O(log(1/δ)) communication. This
is proved by considering the distributional com-
plexity D^μ_{1/2−δ}(IP_n), where μ is the uniform distribu-
tion and lower bounding it using the discrepancy
method. Observe that a deterministic protocol
Π with cost C induces a partition of the input
domain into 2^C combinatorial rectangles, on each
of which Π has the same transcript, hence the
same output. If Π has error at most 1/2 − δ under μ,
then the μ-discrepancies of these rectangles, i.e.,
the differences between the μ-measures of the 0s
and 1s within them, must sum up to at least 2δ.
Letting disc^μ(f) denote the maximum over all
rectangles R in X × Y of the μ-discrepancy of R,
we then obtain 2^C disc^μ(f) ≥ 2δ, which allows
us to lower bound C if we can upper bound the
discrepancy disc^μ(f).
For the function IP_n, the matrix
(IP_n(x, y))_{x ∈ {0,1}^n, y ∈ {0,1}^n} is easily seen to be
a Hadamard matrix (once its 0/1 entries are
rewritten as ±1), whose spectrum is well
understood. With a bit of matrix analysis, this
enables the discrepancy of IP_n under a uniform
μ to be computed very accurately. This in turn
yields the claimed communication complexity
lower bound.
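
For very small n the discrepancy claim can be checked by brute force. The sketch below builds the ±1 matrix of inner product parity, confirms it is Hadamard, and computes the exact maximum discrepancy over all rectangles under the uniform distribution. The function names are illustrative, and the comparison value 2^{-n/2} is the standard bound obtained from the spectral argument (often attributed to Lindsey's lemma), stated here as background rather than taken from the text above.

```python
from itertools import product

def ip(x, y):
    """Inner product parity of two equal-length bit tuples."""
    return sum(a & b for a, b in zip(x, y)) % 2

def sign_matrix(n):
    """The 2^n x 2^n matrix with entries (-1)^IP(x, y)."""
    inputs = list(product((0, 1), repeat=n))
    return [[(-1) ** ip(x, y) for y in inputs] for x in inputs]

def is_hadamard(h):
    """Check that distinct rows are orthogonal and each row has norm^2 = size."""
    size = len(h)
    return all(
        sum(h[i][k] * h[j][k] for k in range(size)) == (size if i == j else 0)
        for i in range(size)
        for j in range(size)
    )

def max_discrepancy(h):
    """Maximum over rectangles A x B of |sum of entries| / (number of inputs),
    i.e. the discrepancy under the uniform distribution."""
    size = len(h)
    best = 0
    for rows in range(1, 2 ** size):  # bitmask choosing the row set A
        col_sums = [
            sum(h[i][j] for i in range(size) if (rows >> i) & 1)
            for j in range(size)
        ]
        # For a fixed A, the best B takes all positive or all negative columns.
        positive = sum(s for s in col_sums if s > 0)
        negative = -sum(s for s in col_sums if s < 0)
        best = max(best, positive, negative)
    return best / (size * size)
```

The enumeration is exponential in 2^n, so it is only feasible for n up to about 3; its purpose is to make the definition of discrepancy concrete, not to compute it at scale.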


Set Disjointness 

The problem of determining whether Alice’s set
x ⊆ [n] is disjoint from Bob’s set y ⊆ [n], de-
noted DISJ_n(x, y), is, along with its natural gen-
eralizations, the most studied and widely useful
problem in communication complexity. It is easy
to prove, from first principles, the strong lower
bound D(DISJ_n) ≥ n − o(n). Obtaining
a similarly strong lower bound for randomized
complexity turns out to be quite a challenge, one
that has driven a number of theoretical innova-
tions.

The discrepancy method outlined above is
provably very weak at lower bounding R(DISJ_n).
Instead, one considers a refinement called the
corruption technique: it consists of showing that
“large” rectangles in the matrix for DISJ_n cannot
come close to consisting purely of 1-inputs (i.e.,
disjoint pairs (x, y)) but must be corrupted by
a “significant” fraction of 0-inputs. On the other
hand, a sufficiently low-cost communication pro-
tocol for DISJ_n would imply that at least one such
large rectangle must exist. The tension between
these two facts gives rise to a lower bound on
D^μ_ε(DISJ_n), where μ and ε figure in the quantifi-
cation of “large” and “significant” above. Follow-
ing this outline, Babai et al. [2] proved an Ω(√n)
lower bound using the uniform distribution. This
was then improved by using a certain non-product
input distribution, i.e., one where the inputs x
and y are correlated (a provably necessary
complication), to the optimal Ω(n), initially via a
complicated Kolmogorov complexity technique,
but eventually via elementary (though ingenious)
combinatorics by Razborov [20]. Subsequently,
Bar-Yossef et al. [4] re-proved the Ω(n) bound
via a novel notion of conditional information
complexity, a proof that has since been reworked
to use the more natural internal information com-
plexity [5].

Disjointness is also interesting and natural as
a multiparty problem, where each of t players
holds a subset of [n] and they wish to determine if
these are disjoint. An important result with many
applications (see below) is that under a promise
that the sets are pairwise disjoint except perhaps
at one element, this requires Ω(n/t) commu-
nication, even if the players communicate via
broadcast; this bound is essentially tight. Without
this promise, and with only private message chan-
nels between the t players, disjointness requires
Ω(tn) communication.


Gap Hamming Distance 

This problem is defined by the partial Boolean
function on {0,1}^n × {0,1}^n given by GHD_n(x, y) =
0 if ‖x − y‖_1 ≤ n/2 − √n; GHD_n(x, y) = 1 if
‖x − y‖_1 ≥ n/2 + √n; and GHD_n(x, y) = ⋆
otherwise. Correctness and error probability for
protocols for GHD_n are based only on inputs
not mapped to ⋆. After several efforts giving
special-case lower bounds, it was eventually
proved [8] that R(GHD_n) = Ω(n) and in
particular D^μ_ε(GHD_n) = Ω(n) with μ being
uniform and ε = Θ(1) being sufficiently small.
This bound provably does not follow from the
corruption method because of the presence of
large barely-corrupted rectangles in the matrix
for GHD_n; instead it was proved using the so-
called smooth corruption technique [12].
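
Because GHD_n is a partial function, it helps to see the promise spelled out. In the sketch below the undefined value ⋆ is represented by Python's `None`; that representation and the function name are choices made for this illustration.

```python
import math

def gap_hamming(x, y):
    """Return 0, 1, or None (undefined) for equal-length bit lists x and y."""
    n = len(x)
    distance = sum(a != b for a, b in zip(x, y))  # Hamming distance
    if distance <= n / 2 - math.sqrt(n):
        return 0
    if distance >= n / 2 + math.sqrt(n):
        return 1
    return None  # inside the gap: a protocol may answer arbitrarily here
```

For n = 16 the two thresholds are 4 and 12, so any pair at distance 5 through 11 falls in the gap and places no requirement on a protocol.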


General Complexity-Theoretic Results 
There is a vast literature on general results con- 
necting various notions of complexity for com- 
munication problems. As before, we survey some 
highlights. Throughout, we consider a general 
function f : {0,1}^n × {0,1}^n → {0,1}.


Determinism vs. Public vs. Private Randomness 

A private-coin protocol can be deterministically
simulated by direct estimation of the probabil-
ity of generating each possible transcript. This
leads to the relation R(f) = Ω(log D(f)).
This separation is the best possible, as witnessed
by EQ_n. Further, a public-coin protocol can be
restricted to draw its random string, no matter
how long, from a fixed set S consisting of
O(n) strings, for a constant additive increase in
error probability [18]. This implies that it can be
simulated using a private coin (which is used only
to draw a random element of S) at an additional
communication cost of log n + O(1). Therefore,
R(f) ≤ R^pub(f) + log n + O(1). Again the
EQ_n function shows that this separation is the best
possible.


The Log-Rank Conjecture and Further Matrix 
Analysis 

The rectangle property of communication
protocols readily implies that D(f) ≥
log_2 rk f, where rk f is the rank of the
matrix (f(x, y))_{x ∈ {0,1}^n, y ∈ {0,1}^n}. It has long been
conjectured that D(f) = poly(log rk f). This
famous conjecture remains wide open; the
best-known relevant upper bound is D(f) ≤
O(√(rk f) log rk f) [16].
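
The log-rank lower bound is easy to evaluate on small examples. The following sketch computes the rank of a communication matrix exactly, using rational arithmetic, and returns log_2 of it. All names are illustrative, and the rank is taken over the rationals (equivalently, the reals), which is the usual convention for this bound.

```python
import math
from fractions import Fraction
from itertools import product

def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            if factor:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

def communication_matrix(f, n):
    """The 2^n x 2^n matrix of f over all pairs of n-bit tuples."""
    inputs = list(product((0, 1), repeat=n))
    return [[f(x, y) for y in inputs] for x in inputs]

def log_rank_bound(f, n):
    """log_2 of the rank of f's communication matrix, a lower bound on D(f)."""
    return math.log2(rank(communication_matrix(f, n)))

def equality(x, y):
    return int(x == y)

def inner_product_parity(x, y):
    return sum(a & b for a, b in zip(x, y)) % 2
```

For equality the matrix is the identity, so the bound recovers D(EQ_n) ≥ n directly; for inner product parity on 2 bits the 0/1 matrix has rank 3, since its all-zero first row contributes nothing.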

Other, more sophisticated, matrix analysis 
techniques can be used to establish lower bounds 
on R(f) by going through the approximate 
rank and factorization norms. The survey by 
Lee and Shraibman [15] has a strong focus on 
such techniques and provides a comprehensive 
coverage of results. 


Direct Sum, Direct Product, and Amortization 
Does the complexity of f grow n-fold, or even
Ω(n)-fold, if we have to compute f on n
independent instances? This is called a direct
sum question. Attempts to answer it and its
variants, for general as well as specific functions
f, have spurred a number of developments. Let
f^k denote the (non-Boolean) function that,
on input ((x^(1), ..., x^(k)), (y^(1), ..., y^(k))),
outputs (f(x^(1), y^(1)), ..., f(x^(k), y^(k))). Let
R^k_ε(f^k) denote the cost of the best randomized
protocol that computes each entry of the k-
tuple of values of f^k up to error ε. This
is in contrast to the usual R_ε(f^k), which is
concerned with getting the entire k-tuple correct
except with probability ε. An alternate way
of posing the direct sum question is to ask
how the amortized randomized complexity
R̄_ε(f) = lim_{k→∞} R^k_ε(f^k)/k compares with
R_ε(f).
An early result along these lines shows that
R̄(EQ_n) = O(1) [10]. Recalling that R(EQ_n) =
Ω(log n), this shows that (in the private-coin
setting) EQ_n exhibits economy of scale; we say
that it does not satisfy the direct sum property. It
had long been conjectured that no such economy
of scale is possible in a public-coin setting; in
fact information complexity rose to prominence
as a technique precisely because in the informa-
tional setting, the direct sum property is easy to
prove [9]. Thus, IC^μ(f), for every distribution μ,
lower bounds not just R(f) but also R̄(f). More
interestingly, the opposite inequality also holds,
so that information and amortized randomized
complexities are in fact equal; the proof uses
a sophisticated interactive protocol compression
technique [6] that should be seen as an analog
of classical information theoretic results about
single-message compression, e.g., via Huffman
coding.

Thus, proving a general direct sum theorem
for randomized communication is equivalent to
compressing a protocol with information cost I
down to O(I) bits of communication. However,
the best-known compression uses 2^{O(I)} bits [3],
and this is optimal [11]. The proof of optimality,
despite showing that a fully general direct sum
theorem is impossible, is such that a slightly
weakened direct sum result, such as R^k(f^k) =
Ω(k) R(f) − O(log n), remains possible and
open. Meanwhile, fully general and strong direct
sum theorems can be proven by restricting the
model to bounded-round communication, or re-
stricting the function to those whose complexity
is captured by the smooth corruption bound.


Round Elimination 

For problems that are hard only under a limitation
on the number of rounds (e.g., GT_n discussed
above), strong bounded-round lower bounds are
proved using the round elimination technique.
Here, one shows that if an r-round protocol for
f starts with a short enough first message, then
this message can be eliminated altogether, and
the resulting (r − 1)-round protocol will solve a
“subproblem” of f. Repeating this operation r
times results in a 0-round protocol that, hopefully,
solves a nontrivial subproblem, giving a contra-
diction.

To make this useful, one then has to identify a 
reasonable notion of subproblem. It is especially 
useful to have this subproblem be a smaller in- 
stance of f itself. This does happen in several 
cases and can be illustrated by looking at GT_n:
restricting n-length strings to indices in [ℓ, n]
and forcing agreement at indices in [1, ℓ − 1]
shows that GT_n contains GT_{n−ℓ} as a subproblem.
The proof of the aforementioned lower bound
R^{(r)}(GT_n) = Ω(n^{1/r}/r^2) uses exactly this ob-
servation. The pointer jumping lower bound, also
mentioned before, proceeds along similar lines.


Applications 


Data Stream Algorithms 

Consider a data stream algorithm using s bits of 
working memory and p passes over its input. 
Splitting the stream into two pieces, giving the 
first to Alice and the second to Bob, creates a 
communication problem and a (2p − 1)-round
protocol for it, using s bits of communication per 
round. A generalization to multiplayer communi- 
cation is immediate. These observations [1] allow 
us to infer several space (or pass/space trade-off) 
lower bounds for data stream algorithms from 
suitable communication lower bounds, often after 
a suitable reduction. 
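
The reduction from streaming to communication can be made concrete with a small simulator. In this sketch a streaming algorithm is modeled by an initial state, an update function, and a finishing function; these names and this interface are assumptions made for illustration. Each message is simply the algorithm's current memory contents.

```python
def simulate_stream_protocol(init_state, update, finish, alice_items, bob_items, passes):
    """Run a multi-pass streaming algorithm as a two-party protocol.
    update(state, item, pass_index) returns the new state.
    Returns (output, number of messages exchanged)."""
    state, messages = init_state, 0
    for p in range(passes):
        for item in alice_items:  # Alice runs the algorithm on her half
            state = update(state, item, p)
        messages += 1  # Alice -> Bob: current memory contents (s bits)
        for item in bob_items:  # Bob continues on his half
            state = update(state, item, p)
        if p < passes - 1:
            messages += 1  # Bob -> Alice: memory contents for the next pass
    return finish(state), messages

def max_then_count(state, item, pass_index):
    """Toy two-pass algorithm: pass 0 finds the maximum, pass 1 counts it."""
    best, count = state
    if pass_index == 0:
        return (max(best, item), 0)
    return (best, count + (item == best))
```

Running the toy algorithm with two passes produces exactly 2p − 1 = 3 messages, so a communication lower bound for the underlying problem translates into a bound on the product of passes and memory.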

For instance, the problem of approximating 
the number of distinct items in a stream and its 
generalization to the problem of approximating 
frequency moments are almost fully understood, 
based on lower bounds for EQ_n, GHD_n, DISJ_n,
and its generalization to multiple players,
under the unique intersection restriction noted
above. A large number of graph-theoretic
problems can be shown to require Ω(n) space, n
being the number of vertices, based on lower
bounds for INDEX_n, PJ_{r,n}, and variants, and
again DISJ_n. For several other data streaming
problems — e.g., approximating £,, and cascaded 
norms and approximating a maximum cut or 
maximum matching — a reduction using an 
off-the-shelf communication lower bound is 
not known, but one can still obtain strong 
space lower bounds by considering a tailor-
made communication problem in each case and
applying the familiar lower-bounding techniques 
outlined above. 


Data Structures 

The cell-probe model of Yao is designed to cap- 
ture all conceivable data structures on a modern 
computer: it models the query/update process as 
a sequence of probes into the entries (memory 
words) of a table containing the data structure. 
Focusing on static data structures for the mo- 
ment, note that a t-probe algorithm using an s- 
word table with w-bit words directly leads to 
a 2t-round communication protocol in which 
Alice (the querier) sends (log_2 s)-bit messages
and Bob (the table holder) sends w-bit mes-
sages. Lower bounds trading off t against w
and s are therefore implied by suitable asymmet- 
ric communication lower bounds, where Alice’s 
messages need to be much shorter than Bob’s 
and Alice also has a correspondingly smaller 
input. 
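
As a concrete instance of this translation, consider membership queries answered by binary search over a sorted table. The sketch below is a hypothetical example chosen here, not one taken from the cited work: each probe becomes one address message from Alice and one word message from Bob.

```python
import math

def membership_via_probes(table, query, word_bits):
    """Alice binary-searches a sorted, nonempty table held by Bob.
    Returns (found, messages, bits sent by Alice, bits sent by Bob)."""
    address_bits = max(1, math.ceil(math.log2(len(table))))
    lo, hi = 0, len(table) - 1
    probes = 0
    found = False
    while lo <= hi and not found:
        mid = (lo + hi) // 2
        probes += 1  # Alice -> Bob: the address mid; Bob -> Alice: table[mid]
        word = table[mid]
        if word == query:
            found = True
        elif word < query:
            lo = mid + 1
        else:
            hi = mid - 1
    return found, 2 * probes, probes * address_bits, probes * word_bits
```

With t probes the protocol has 2t messages, Alice's messages are short (about log_2 s bits each), and Bob's are w bits each, which is exactly the asymmetry the lower bounds exploit.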

The study of these kinds of lower bounds was 
systematized by Miltersen et al. [17], who used 
round elimination as well as corruption-style 
techniques to obtain cell-probe lower bounds 
for set membership, predecessor search, range 
query, and further static data structure problems. 
Pătraşcu [19] derived an impressive number of
cell-probe lower bounds, for problems ranging
from union-find to dynamic stabbing and range
reporting (in low dimension) to approximate near
neighbor searching, by a tree of reductions
starting from the lopsided set disjointness
problem. This latter problem, denoted LSD_{k,n},
gives Alice a set x ⊆ [kn] with |x| ≤ k and Bob
a set y ⊆ [kn]. Using information complexity
techniques and a direct sum result for the basic
INDEX problem, one can use the Alice/Bob trade- 
off result for INDEX discussed earlier to establish 
the nearly optimal trade-off that, for each δ > 0,
solving LSD_{k,n} to 1/3 error (say) requires either
Alice to send at least δn log k bits or Bob to send
nk^{1−O(δ)} bits.
Circuit Complexity

Early work in circuit complexity had identified 
certain conjectured communication complexity 
lower bounds as a route towards strong lower 
bounds for circuit size and depth and related 
complexity measures for Boolean formulas and 
branching programs. Several of these conjectures 
remain unproven, especially ones involving the 
number-on-the-forehead (NOF) communication 
model, where the input is “written on the fore-
heads” of a large number, t, of players. The
resulting high degree of input sharing allows for 
some rather novel nontrivial protocols, making 
lower bounds very hard to prove. Nevertheless, 
the discrepancy technique has been extended to 
NOF communication, and some of the technically 
hardest work in communication complexity has 
gone towards using it effectively for concrete 
problems, such as set disjointness [21]. While 
NOF lower bounds strong enough to imply cir- 
cuit lower bounds remain elusive, certain other 
communication lower bounds, such as two-party 
bounds for computing relations, have had more 
success. In particular, monotone circuits for di- 
rected and undirected graph connectivity have 
been shown to require super-logarithmic depth, 
via the influential idea of Karchmer-Wigderson 
games [13]. 


Further Applications 

We note in passing that there are plenty more 
applications of communication complexity than 
are possible to even outline in this short article. 
These touch upon such diverse areas as proof 
complexity; extension complexity for linear and 
semidefinite programming; AT^2 lower bounds
in VLSI design; query complexity in the classi- 
cal and quantum models; and time complexity 
on classical Turing machines. Kushilevitz and 
Nisan [14] remains the best starting point for fur- 
ther reading about these and more applications. 
Problem Definition


A mobile ad hoc network is a temporary dy- 
namic interconnection network of wireless mo- 
bile nodes without any established infrastructure 
or centralized administration. A basic communi- 
cation problem, in ad hoc mobile networks, is 
to send information from a sender node, A, to 
another designated receiver node, B. If mobile 
nodes A and B come within wireless range of 
each other, then they are able to communicate. 
However, if they do not, they can communicate
if other nodes of the network are willing
to forward their packets. One way to solve this
problem is a protocol that notifies every node
that the sender A meets and provides it with all
the information, hoping that some of them will
eventually meet the receiver B.


Is there a more efficient technique (other than 
notifying every node that the sender meets, in the 
hope that some of them will then eventually meet 
the receiver) that will effectively solve the commu- 
nication establishment problem without flooding 
the network and exhausting the battery and com- 
putational power of the nodes? 


The problem of communication among mobile 
nodes is one of the most fundamental problems in 
ad hoc mobile networks and is at the core of many 
algorithms, such as for counting the number of 
nodes, electing a leader, data processing etc. 
For an exposition of several important problems 
in ad hoc mobile networks see [13]. The work 
of Chatzigiannakis, Nikoletseas and Spirakis [5] 
focuses on wireless mobile networks that are 
subject to highly dynamic structural changes cre- 
ated by mobility, channel fluctuations and de- 
vice failures. These changes affect topological 
connectivity, occur with high frequency and may 
not be predictable in advance. Therefore, the 
environment where the nodes move (in three-
dimensional space with possible obstacles) as
well as the motion that the nodes perform are
input to any distributed algorithm. 


The Motion Space 

The space of possible motions of the mobile
nodes is combinatorially abstracted by a motion-
graph, i.e., the detailed geometric characteristics
of the motion are neglected. Each mobile node
is assumed to have a transmission range repre-
sented by a sphere tr centered on itself. Any
other node inside tr can receive any message
broadcast by this node. This sphere is approxi-
mated by a cube tc with volume V(tc), where
V(tc) < V(tr). The size of tc can be chosen in
such a way that its volume V(tc) is the maximum
that preserves V(tc) < V(tr), and if a mobile
node inside tc broadcasts a message, this mes-
sage is received by any other node in tc. Given
that the mobile nodes are moving in the space
S, S is divided into consecutive cubes of volume
V(tc).


Definition 1 The motion graph G(V, E), (|V| =
n, |E| = m), which corresponds to a quantization
of S, is constructed in the following way: a vertex
u ∈ G represents a cube of volume V(tc) and an
edge (u, v) ∈ G exists if the corresponding cubes
are adjacent.


The number of vertices n actually approximates
the ratio between the volume V(S) of space S
and the space occupied by the transmission range
of a mobile node V(tr). In the extreme case
where V(S) = V(tr), the transmission range of
the nodes approximates the space where they
are moving and n = 1. Given the transmission
range tr, n depends linearly on the volume
of space S regardless of the choice of tc, and
n = O(V(S)/V(tr)). The ratio V(S)/V(tr) is
the relative motion space size and is denoted by
ρ. Since the edges of G represent neighboring
polyhedra, each vertex is connected with
a constant number of neighbors, which yields that
m = Θ(n). In this example where tc is a cube,
G has maximum degree of six and m ≤ 6n. Thus
motion graph G is (usually) a bounded degree
graph as it is derived from a regular graph of
small degree by deleting parts of it corresponding
to motion or communication obstacles. Let Δ be
the maximum vertex degree of G.
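
The construction of the motion graph can be sketched directly from the definition. The code below assumes that S is a rectangular box already divided into unit cubes addressed by integer coordinates, that obstacles are given as a set of removed cubes, and that "adjacent" means sharing a face (which is what gives maximum degree six). These are simplifying assumptions for illustration only.

```python
from itertools import product

def build_motion_graph(dims, obstacles=frozenset()):
    """Return (vertices, edges) for a dims[0] x dims[1] x dims[2] grid of cubes.
    Cubes listed in `obstacles` are deleted; edges join face-adjacent cubes."""
    vertices = [c for c in product(*(range(d) for d in dims)) if c not in obstacles]
    vertex_set = set(vertices)
    edges = []
    for (i, j, k) in vertices:
        # Looking only in the positive directions lists each edge exactly once.
        for neighbor in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
            if neighbor in vertex_set:
                edges.append(((i, j, k), neighbor))
    return vertices, edges

def max_degree(vertices, edges):
    """Maximum vertex degree of the graph (0 if there are no edges)."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0)
```

On a 2 × 2 × 2 box this yields 8 vertices and 12 edges, and removing one corner cube as an obstacle leaves 7 vertices and 9 edges, consistent with m growing linearly in n.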


The Motion of the Nodes-Adversaries 

In the general case, the motions of the nodes are 
decided by an oblivious adversary: The adversary 
determines motion patterns in any possible way 
but independently of the distributed algorithm. In 
other words, cases where some of the nodes
are deliberately trying to maliciously affect the
protocol, e.g., to avoid certain nodes, are excluded.
This is a pragmatic assumption usually followed
by applications. Such motion adversaries
are called restricted motion adversaries.

For purposes of studying efficiency of dis- 
tributed algorithms for ad hoc networks on the 
average, the motions of the nodes are modeled 
by concurrent and independent random walks. 
The assumption that the mobile nodes move ran- 
domly, either according to uniformly distributed 
changes in their directions and velocities or ac- 
cording to the random waypoint mobility model 
by picking random destinations, has been used 
extensively by other research. 


Key Results 


The key idea is to take advantage of the mobile
nodes' natural movement by exchanging information
whenever mobile nodes meet incidentally. It
is evident, however, that if the nodes are spread 
in remote areas and they do not move beyond 
these areas, there is no way for information to 
reach them, unless the protocol takes special care 
of such situations. The work of Chatzigiannakis, 
Nikoletseas and Spirakis [5] proposes the idea 
of forcing only a small subset of the deployed 
nodes to move as per the needs of the protocol; 
they call this subset of nodes the support of 
the network. Assuming the availability of such 
nodes, they are used to provide a simple, correct 
and efficient strategy for communication between 
any pair of nodes of the network that avoids 
message flooding. 

Let k nodes be a predefined set of nodes
that become the nodes of the support. These
nodes move randomly and fast enough so that
they visit in sufficiently short time the entire
motion graph. When some node of the support is
within transmission range of a sender, it notifies
the sender that it may send its message(s). The
messages are then stored “somewhere within the
support structure”. When a receiver comes within
transmission range of a node of the support, the
receiver is notified that a message is “waiting”
for him and the message is then forwarded to the
receiver.


Communication in Ad Hoc Mobile Networks Using Random Walks, Fig. 1 The original network area S (a), how
it is divided in consecutive cubes of volume V(tc) (b) and the resulting motion graph G (c)


Protocol 1 (The “Snake” Support Motion
Coordination Protocol) Let $S_0, S_1, \ldots, S_{k-1}$
be the members of the support and let $S_0$ denote
the leader node (possibly elected). The protocol
forces $S_0$ to perform a random walk on the motion
graph and each of the other nodes $S_i$ to execute
the simple protocol “move where $S_{i-1}$ was
before”. When $S_0$ is about to move, it sends a
message to $S_1$ that states the new direction of
movement. $S_1$ will change its direction as per
instructions of $S_0$ and will propagate the
message to $S_2$. In analogy, $S_i$ will follow the
orders of $S_{i-1}$ after transmitting the new
directions to $S_{i+1}$. Movement orders received
by $S_i$ are positioned in a queue $Q_i$ for
sequential processing. The very first move of
$S_i$, $\forall i \in \{1, 2, \ldots, k-1\}$, is
delayed by a δ period of time.
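
The "move where the predecessor was" rule can be sketched in Python as a discrete-time simulation. This is only an illustration under simplifying assumptions: time is slotted, all support nodes start on the same vertex (so the initial δ delay and the message queues are abstracted away), and the motion graph is a small 2D grid. The names `grid_graph` and `snake_cover_steps` are hypothetical.

```python
import random
from collections import deque

def grid_graph(w, h):
    """Adjacency lists of a w x h grid, used here as a toy motion graph."""
    return {
        (x, y): [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < w and 0 <= y + dy < h]
        for x in range(w) for y in range(h)
    }

def snake_cover_steps(graph, start, k, rng, max_steps=100_000):
    """Move a k-node snake until its head has visited every vertex.

    The head S_0 steps to a uniformly random neighbour; each S_i then takes
    the position S_{i-1} held before the step.
    Returns (steps taken, final positions of S_0 .. S_{k-1}).
    """
    body = deque([start] * k, maxlen=k)   # body[0] is the head S_0
    visited = {start}
    steps = 0
    while len(visited) < len(graph) and steps < max_steps:
        head = rng.choice(graph[body[0]])
        body.appendleft(head)             # the old tail position drops off
        visited.add(head)
        steps += 1
    return steps, list(body)

rng = random.Random(0)
graph = grid_graph(4, 4)
steps, body = snake_cover_steps(graph, (0, 0), k=3, rng=rng)
```

Because the deque has a fixed length k, pushing the new head position automatically shifts every follower onto the vertex its predecessor just left, so consecutive support nodes always occupy adjacent vertices once the snake has unrolled.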


The purpose of the random walk of the head
$S_0$ is to ensure a cover, within some finite
time, of the whole graph G without knowledge and
memory, other than local, of topology details.
This memoryless motion also ensures fairness,
low overhead and inherent robustness to
structural changes.


Consider the case where any sender or receiver 
is allowed a general, unknown motion strategy, 
but its strategy is provided by a restricted mo- 
tion adversary. This means that each node not 
in the support either (a) executes a determin- 
istic motion which either stops at a vertex or 
cycles forever after some initial part or (b) it 
executes a stochastic strategy which however is 
independent of the motion of the support. The 
authors in [5] prove the following correctness
and efficiency results. The reader can refer to
the excellent book by Aldous and Fill [1] for
a nice introduction to Markov chains and random
walks.


Theorem 1 The support and the “snake” motion
coordination protocol guarantee reliable
communication between any sender-receiver (A, B)
pair in finite time, whose expected value is
bounded only by a function of the relative motion
space size ρ and does not depend on the number of
nodes, and is also independent of how $MH_S$ and
$MH_R$ (the sender and the receiver) move,
provided that the mobile nodes not in the support
do not deliberately try to avoid the support.


Theorem 2 The expected communication time of
the support and the “snake” motion coordination
protocol is bounded above by $\Theta(\sqrt{mc})$
when the (optimal) support size is
$k = \sqrt{2mc}$ and $c = \frac{e}{e-1}u$,
u being the “separation threshold time” of the
random walk on G.


Theorem 3 By having the support’s head move
on a regular spanning subgraph of G, there is an
absolute constant γ > 0 such that the expected
meeting time of A (or B) and the support is
bounded above by $\gamma n^2/k$. Thus the protocol
guarantees a total expected communication time
of $\Theta(\rho)$, independent of the total
number of mobile nodes and their movement.


The analysis assumes that the head $S_0$ moves
according to a continuous-time random walk of
total rate 1 (rate of exit out of a node of G).
If $S_0$ moves ψ times faster than the rest of the
nodes, all the estimated times, except the
inter-support time, will be divided by ψ. Thus
the expected total communication time can be made
as small as $O(\gamma\rho/\sqrt{\psi})$, where γ
is an absolute constant. In cases where $S_0$ can
take advantage of the network topology, all the
estimated times, except the inter-support time,
are improved:


Theorem 4 When the support’s head moves on
a regular spanning subgraph of G, the expected
meeting time of A (or B) and the support cannot
be less than $(n-1)^2/2m$. Since m = O(n), the
lower bound for the expected communication
time is Θ(n). In this sense, the “snake”
protocol’s expected communication time is
optimal, for a support size which is O(n).


The “on-the-average” analysis of the time- 
efficiency of the protocol assumes that the motion 
of the mobile nodes not in the support is a random 
walk on the motion graph G. The random walk of 
each mobile node is performed independently of 
the other nodes. 


Theorem 5 The expected communication time of
the support and the “snake” motion coordination
protocol is bounded above by the formula

\[ E(T_{\mathrm{total}}) \le \frac{2}{\lambda_2(G)}\,\Theta\!\left(\frac{n}{k}\right) + \Theta(k). \]

The upper bound is minimized when
$k = \sqrt{2n/\lambda_2(G)}$, where $\lambda_2$ is
the second eigenvalue of the motion graph’s
adjacency matrix.
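
The stated minimizer can be checked with a short calculation. The sketch below assumes that the bound has the form $\frac{2}{\lambda_2(G)}\Theta(n/k) + \Theta(k)$ and, purely for illustration, replaces both $\Theta$ terms by their arguments (unit hidden constants); the function name $g$ is introduced here and is not part of [5].

```latex
\[
g(k) = \frac{2}{\lambda_2(G)} \cdot \frac{n}{k} + k,
\qquad
g'(k) = -\frac{2n}{\lambda_2(G)\,k^2} + 1 = 0
\;\Longrightarrow\;
k = \sqrt{\frac{2n}{\lambda_2(G)}},
\]
\[
g\!\left(\sqrt{\tfrac{2n}{\lambda_2(G)}}\right)
= 2\sqrt{\frac{2n}{\lambda_2(G)}} .
\]
```

Under this simplification the two terms are balanced at the optimum: a larger support shortens the wait until a node meets the support but lengthens the time a message needs to travel along it.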


The way the support nodes move and communicate
is robust, in the sense that it can tolerate
failures of the support nodes. The types of
failures of nodes considered are permanent, i.e.,
stop failures. Once such a fault happens, the
faulty support node does not participate in the
ad hoc mobile network anymore. A communication
protocol is β-faults tolerant if it still allows
the members of the network to communicate
correctly in the presence of at most β permanent
faults of the nodes in the support (β ≥ 1). In [5]
it is shown that:


Theorem 6 The support and the “snake” motion 
coordination protocol is 1-fault tolerant. 


Applications 


Ad hoc mobile networks are rapidly deployable 
and self-configuring networks that have 
important applications in many critical areas such 
as disaster relief, ambient intelligence, wide area 
sensing and surveillance. The ability to network 
anywhere, anytime enables teleconferencing, 
home networking, sensor networks, personal 
area networks, and embedded computing 
applications [13]. 


Related Work 

The most common way to establish communica- 
tion is to form paths of intermediate nodes that 
lie within one another’s transmission range and 
can directly communicate with each other. The 
mobile nodes act as hosts and routers at the same 
time in order to propagate packets along these 
paths. This approach of maintaining a global 
structure with respect to the temporary network 
is a difficult problem. Since nodes are moving, 
the underlying communication graph is changing, 
and the nodes have to adapt quickly to such 
changes and reestablish their routes. Busch and
Tirthapura [2] provide the first analysis of the
performance of some characteristic protocols
[8, 13] and show that in some cases they require
$\Omega(u^2)$ time, where u is the number of nodes,
to stabilize, i.e., to be able to provide
communication.

The work of Chatzigiannakis, Nikoletseas and 
Spirakis [5] focuses on networks where topo- 
logical connectivity is subject to frequent, un- 
predictable change and studies the problem of 
efficient data delivery in sparse networks where 
network partitions can last for a significant
period of time. In such cases, it is possible to
have a small team of fast-moving and versatile
vehicles implement the support. These vehicles
can be cars, motorcycles, helicopters or
a collection of independently controlled mobile
modules, i.e., robots. This specific approach is
inspired by the work of Walter, Welch and
Amato [14], who study the problem of motion
coordination in distributed systems consisting of
such robots, which can connect, disconnect and
move around.

The use of mobility to improve performance 
in ad hoc mobile networks has been considered 
in different contexts in [6, 9, 11, 15]. The pri- 
mary objective has been to provide intermittent 
connectivity in a disconnected ad hoc network. 
Each solution achieves certain properties of end- 
to-end connectivity, such as delay and message 
loss among the nodes of the network. Some of
them require long-range wireless transmission,
others require that all nodes move pro-actively
under the control of the protocol and collaborate
so that they meet more often. The key
idea of forcing only a subset of the nodes to
facilitate communication is used in a similar
way in [10, 15]. However, [15] focuses on cases
where only one node is available. Recently, the
application of mobility to the domain of wireless 
sensor networks has been addressed in [3, 10, 
12]. 


Open Problems 


A number of problems related to the work of
Chatzigiannakis, Nikoletseas and Spirakis [5]
remain open. It is clear that the size of the
support, k, the shape of the support and the way
the support moves affect the performance of
end-to-end connectivity. An open issue is to
investigate alternative
structures for the support, different motion coor- 
dination strategies and comparatively study the 
corresponding effects on communication times. 
To this end, the support idea is extended to
hierarchical and highly changing motion graphs
in [4]. The idea of cooperative routing based on
the existence of support nodes may also improve
security and trust.

An important issue for the case where the
network is sparsely populated or where the rate
of motion is too high is to study the performance
of path construction and maintenance protocols.
Some work has been done in this direction in [2]
that can also be used to investigate the
end-to-end communication in wireless sensor
networks.
It is still unknown if there exist impossibility 
results for distributed algorithms that attempt to 
maintain structural information of the implied 
fragile network of virtual links. 

Another open research area is to analyze the 
properties of end-to-end communication given 
certain support motion strategies. There are cases
where the mobile nodes' interactions may behave
in a similar way to the Physics paradigm of
interacting particles and their modeling. Studies 
of interaction times and propagation times in 
various graphs are reported in [7] and are still 
important to further research in this direction. 


Experimental Results 


In [5] an experimental evaluation is conducted via
simulation in order to model the different
possible situations regarding the geographical
area covered by an ad hoc mobile network. A number
of experiments were carried out for grid graphs
(2D, 3D), random graphs ($G_{n,p}$ model),
bipartite multi-stage graphs and two-level motion
graphs. All results verify the theoretical analysis
and provide useful insight on how to further
exploit the support idea. In [4] the model of
hierarchical and highly changing ad hoc networks
is investigated. The experiments indicate that
the pattern of the “snake” algorithm’s
performance remains the same even in such
networks.


URL to Code 


http://ru1.cti.gr


Problem Definition


Routing is a distributed mechanism that
allows sending messages between any pair of 
nodes of the network. As in all distributed 
algorithms, a routing scheme runs locally on 
every processor/node of the network. Each 
node/processor of the network has a routing 
daemon running on it whose responsibility is to 
forward arriving messages while utilizing local 
information that is stored at the node itself. This 
local information is usually referred to as the 
routing table of the node. 

A routing scheme involves two phases, a
preprocessing phase and a routing phase. In the
preprocessing phase, the algorithm assigns every
node of the network a routing table and a
small-size label. The label is used as the address
of the node and therefore is usually expected to
be of small size, poly-logarithmic in the size of
the network.

In the routing phase, some node of the network
wishes to send a message to some other
node of the network in a distributed manner.
During the routing phase, each node of the net- 
work may receive this message, and it has to 
decide whether this message reached its final 
destination, and if not, the node needs to decide 
to which of its neighbors this message should be 
forwarded next. In order to make these decisions, 
the node may use its own routing table and the 
header of the message that usually contains the 
label of the final destination and perhaps some 
additional information. 

The stretch of a routing scheme is defined 
as the worst case ratio between the length of 
the path obtained by the routing scheme and the 
length of the shortest path between the source and 
the destination. There are two main objectives 
in designing the routing scheme. The first is to 
minimize the stretch of the routing scheme, and 
the second is to minimize the size of the routing 
tables. Much of the work on designing routing 
schemes focuses on the trade-off between these 
two objectives. 

One extreme case is when it is allowed to
use linear-size routing tables. In this case, one
can store a complete routing table at all nodes,
i.e., for every source node s and every potential
destination node t, store at s the port number
of the neighbor of s on the shortest path from
s to t. In this case, the stretch is 1, i.e., the
algorithm can route on exact shortest paths.
However, a clear drawback is that the size of the
routing tables is large, linear in the size of the
network.

One may wish to use smaller routing tables 
at the price of a larger stretch. A routing scheme 
is considered to be compact if the size of the 
routing tables is sublinear in the number of 
nodes. 
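
The stretch-1 extreme can be illustrated with a small Python sketch. It assumes an unweighted, undirected graph given as adjacency lists and stores the next-hop neighbor itself as a stand-in for a port number; the names `full_routing_tables` and `route` are illustrative only.

```python
from collections import deque

def full_routing_tables(graph):
    """For every node s, map each destination t to the next hop on a shortest s-t path.

    `graph` maps a node to a list of neighbours (unweighted, undirected).
    Every table holds one entry per other node, i.e., it has linear size.
    """
    tables = {s: {} for s in graph}
    for t in graph:
        # BFS from the destination: the BFS parent of v is v's next hop towards t.
        seen = {t}
        queue = deque([t])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    tables[v][t] = u
                    queue.append(v)
    return tables

def route(tables, s, t):
    """Forward a message hop by hop using only the local table of the current node."""
    path = [s]
    while path[-1] != t:
        path.append(tables[path[-1]][t])
    return path

# A ring of six nodes (hypothetical example network).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
tables = full_routing_tables(ring)
path = route(tables, 0, 3)
```

Every routed path here is a shortest path, so the stretch is 1, at the cost of n − 1 entries per node; compact schemes give up some stretch to shrink these tables.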


Key Results 


This section presents a survey on compact routing
schemes and a highlight of some recent new
developments.

Many papers focus on the trade-off between
the size of the routing tables and the stretch (e.g.,
[1, 2, 4, 5, 7–9]). The first trade-off was obtained
by Peleg and Upfal [9]. Their scheme considered
unweighted graphs and achieved a bound on the
total size of the routing tables.

Later, Awerbuch et al. [1] considered weighted
graphs and achieved a routing scheme with a
guarantee on the maximum table size. Their
routing scheme uses tables of size
$\tilde{O}(n^{1/k})$ and has stretch $O(k^2 9^k)$.
A better trade-off was later achieved by Awerbuch
and Peleg [2].

Until very recently, the best-known trade-off
was due to Thorup and Zwick [10]. They presented
a routing scheme that uses routing tables
of size $\tilde{O}(n^{1/k})$, a stretch of 4k − 5,
and label size of $O(k \log^2 n)$. Moreover, they
showed that if handshaking is allowed, namely, if
the source node and the destination are allowed to
exchange information of size $O(\log^2 n)$ bits,
then the stretch can be improved to 2k − 1.
Clearly, in many cases, it would be desirable to
avoid the use of handshaking, as the overhead of
establishing a handshake can be as high as sending
the original message itself.

A natural question is, what is the best trade-off
between routing table size and stretch one can
hope for with or without a handshake? In fact,
assuming the girth conjecture of Erdős [6], one
can show that with table size of
$\tilde{O}(n^{1/k})$, the best stretch possible is
2k − 1 with or without a handshake. Hence, in the
case of a handshake, Thorup and Zwick’s scheme
[10] is essentially optimal. However, in the case
of no handshake, there is still a gap between the
lower and upper bound. A main open problem in the
area of compact routing schemes concerns the gap
between the stretch 4k − 5 and 2k − 1.

Recently, Chechik [3] gave the first evidence
that the asymptotically optimal stretch is less than
4k. Chechik [3] presented the first improvement
to the stretch-space trade-off of compact routing
schemes since the result of Thorup and Zwick
[10]. More specifically, [3] presented a compact
routing scheme for weighted general undirected
graphs that uses tables of size
$\tilde{O}(n^{1/k})$ and has stretch c · k for some
absolute constant c < 4.


Open Problems


The main question that still remains unresolved
is to prove or disprove the existence of a compact
routing scheme that utilizes tables of size
$\tilde{O}(n^{1/k})$ and has stretch of 2k without
the use of a handshake.


Problem Definition


This problem studies the one-round, sealed-bid
auction model where an auctioneer would like
to sell an idiosyncratic commodity with unlimited
copies to n bidders, and each bidder
$i \in \{1, \ldots, n\}$ will get at most one item.

First, for any i, bidder i bids a value $b_i$
representing the price he is willing to pay for the
item. The bidders submit their bids simultaneously.
After receiving the bidding vector
$b = (b_1, \ldots, b_n)$, the auctioneer computes
and outputs the allocation vector
$x = (x_1, \ldots, x_n) \in \{0, 1\}^n$ and the
price vector $p = (p_1, \ldots, p_n)$. If for any i,
$x_i = 1$, then bidder i gets the item and pays
$p_i$ for it. Otherwise, bidder i loses and pays
nothing. In the auction, the auctioneer’s revenue
is $\sum_{i=1}^{n} x_i p_i$.


Definition 1 (Optimal Single Price Omniscient
Auction F) Given a bidding vector b sorted in
decreasing order,

\[ F(b) = \max_{1 \le i \le n} i \cdot b_i . \]

Further,

\[ F^{(m)}(b) = \max_{m \le i \le n} i \cdot b_i . \]

Obviously, F maximizes the auctioneer’s rev- 
enue if only uniform price is allowed. 
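
As a small illustration of Definition 1, the following Python sketch computes F and $F^{(m)}$ for a list of bids. The function name `optimal_single_price` and the example bids are hypothetical; the function sorts the bids itself, so the input need not be sorted.

```python
def optimal_single_price(bids, m=1):
    """Return F^(m)(b): the largest value of i * b_i over i >= m.

    b_i is the i-th highest bid (1-indexed); m = 1 gives F(b).
    """
    b = sorted(bids, reverse=True)
    return max((i * b[i - 1] for i in range(m, len(b) + 1)), default=0)

bids = [10, 4, 3, 1]
f_value = optimal_single_price(bids)        # candidates: 10, 8, 9, 4
f2_value = optimal_single_price(bids, m=2)  # candidates: 8, 9, 4
```

Selling to the top i bidders at the uniform price $b_i$ yields revenue $i \cdot b_i$, so F is the best of these n choices, and $F^{(m)}$ additionally requires at least m winners.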

However, in this problem, each bidder i is
associated with a private value $v_i$ representing
the item’s value in his opinion. So if bidder i
gets the item, his payoff should be $v_i - p_i$.
Otherwise, his payoff is 0. So for any bidder i, his
payoff function can be formulated as
$(v_i - p_i)x_i$. Furthermore, free will is allowed
in the model. In other words, each bidder may bid
some $b_i$ different from his true value $v_i$ in
order to maximize his payoff.


Competitive Auction 


The objective of the problem is to design a
truthful auction which could still maximize the
auctioneer’s revenue. An auction is truthful if for
every bidder i, bidding his true value would
maximize his payoff, regardless of the bids
submitted by the other bidders [12, 13].


Definition 2 (Competitive Auctions) 


INPUT: the submitted bidding vector b. 

OUTPUT: the allocation vector x and the price 
vector p. 

CONSTRAINTS: 


(a) Truthful; 

(b) The auctioneer’s revenue is within a con- 
stant factor of the optimal single pricing for 
all inputs. 


Key Results 


Let $b_{-i} = (b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_n)$.
f is any function from $b_{-i}$ to the price.


Algorithm 1 Bid-independent auction: $A_f(b)$
for i = 1 to n do
    if $f(b_{-i}) \le b_i$ then
        $x_i = 1$ and $p_i = f(b_{-i})$
    else
        $x_i = 0$
    end if
end for
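
Algorithm 1 can be sketched in Python as follows. The threshold function used in the example (the highest competing bid) is only a hypothetical choice of f for illustration, and the name `bid_independent_auction` is not taken from [6].

```python
def bid_independent_auction(bids, f):
    """Run the bid-independent auction A_f on a list of bids.

    Bidder i wins iff f(b_{-i}) <= b_i and then pays f(b_{-i});
    the price offered to bidder i never depends on b_i itself.
    Returns (allocation vector x, price vector p).
    """
    bids = list(bids)
    allocation, prices = [], []
    for i, b_i in enumerate(bids):
        others = bids[:i] + bids[i + 1:]      # b_{-i}
        threshold = f(others)
        if threshold <= b_i:
            allocation.append(1)
            prices.append(threshold)
        else:
            allocation.append(0)
            prices.append(0)
    return allocation, prices

# Illustrative f: offer each bidder the highest bid among the others.
x, p = bid_independent_auction([5, 3, 8], f=max)
```

Since a bidder cannot influence the price he is offered, the best he can do is accept it exactly when it is at most his value, which is what bidding truthfully achieves.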


Theorem 1 ([6]) An auction is truthful if and 
only if it is equivalent to a bid-independent auc- 
tion. 


Definition 3 A truthful auction A is
β-competitive against $F^{(m)}$ if for all bidding
vectors b, the expected profit of A on b satisfies

\[ E(A(b)) \ge \frac{F^{(m)}(b)}{\beta}. \]


Definition 4 (CostShare$_C$ [11]) Given bids b,
this mechanism finds the largest k such that the
highest k bidders’ bids are at least $C/k$. Charge
each of these k bidders $C/k$.


Algorithm 2 Sampling cost-sharing auction
(SCS)
1: Partition the bidding vector b uniformly at random into
two sets b′ and b″.
2: Compute F′ = F(b′) and F″ = F(b″).
3: Run CostShare$_{F''}$ on b′ and CostShare$_{F'}$ on b″.
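
The two mechanisms can be sketched in Python under the following assumptions: CostShare with target C charges C/k to the k highest bidders for the largest feasible k, and each half of the SCS partition is run against the optimal single-price revenue of the other half. Each bid is assigned to a side by an independent fair coin, which is one way to realize the uniform random partition. All function names are illustrative.

```python
import random

def optimal_single_price(bids):
    """F(b): the largest i * b_i over the bids sorted in decreasing order."""
    b = sorted(bids, reverse=True)
    return max((i * b[i - 1] for i in range(1, len(b) + 1)), default=0)

def cost_share(bids, C):
    """CostShare_C: largest k whose k highest bids are all >= C/k; each pays C/k.

    Returns the list of payments aligned with `bids` (0.0 for losers).
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    for k in range(len(bids), 0, -1):
        if bids[order[k - 1]] >= C / k:       # k-th highest bid can afford its share
            winners = set(order[:k])
            return [C / k if i in winners else 0.0 for i in range(len(bids))]
    return [0.0] * len(bids)

def scs_revenue(b1, b2):
    """Revenue of SCS for a fixed partition (b1, b2) of the bids."""
    f1, f2 = optimal_single_price(b1), optimal_single_price(b2)
    return sum(cost_share(b1, f2)) + sum(cost_share(b2, f1))

def scs_auction(bids, rng):
    """Assign each bid to one of two sides by a fair coin, then run SCS."""
    b1, b2 = [], []
    for b in bids:
        (b1 if rng.random() < 0.5 else b2).append(b)
    return scs_revenue(b1, b2)

revenue = scs_auction([10, 4, 3, 1], random.Random(1))
```

The price offered to a bidder in one half depends only on bids from the other half together with the cost-sharing rule, which is the intuition behind the truthfulness of the scheme.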


Theorem 2 ([6]) SCS is 4-competitive against
$F^{(2)}$, and the bound is tight.


SCS could be extended for partitioning into 
k parts for any k. In fact, k = 3 is the optimal 
partition. 


Theorem 3 ([10]) The random three-partitioning
cost-sharing auction is 3.25-competitive.


Theorem 4 ([9]) Let A be any truthful randomized
auction. There exists an input bidding vector b on
which

\[ E(A(b)) \le \frac{F^{(2)}(b)}{2.42}. \]


Applications 


As the Internet becomes more popular, more and
more auctions are beginning to appear. Further,
the items on sale in the auctions range from
antiques and paintings to digital goods such as
mp3 files, licenses, and network resources.
Truthful auctions can reduce the bidders’ cost
of investigating the competitors’ strategies, since
truthful auctions encourage bidders to bid their
true values. On the other hand, competitive
auctions can also guarantee the auctioneer’s
profit. So this problem is both practical and
significant. In recent years, designing and
analyzing competitive auctions under various
auction models has become a hot topic [1–5, 7, 8].


Problem Definition


It is well known that if NP ≠ P, there is an
infinite hierarchy of complexity classes between
them [10]. However, for some broad classes of
problems, a complexity dichotomy exists: every
problem in the class is either in polynomial time
or NP-hard. Such results include Schaefer’s
theorem [13], the dichotomy of Hell and Nešetřil
for H-coloring [9], and some subclasses of the
general constraint satisfaction problem [4]. These
developments lead to the following questions:
How far can we push the envelope and show
dichotomies for even broader classes of problems?
Given a class of problems, what is the criterion
that distinguishes the tractable problems from the
intractable ones? How does it help in solving the
tractable problems efficiently? Now replacing NP
with #P [15], all the questions above can be asked
for counting problems.

One family of counting problems concerns
graph homomorphisms. Given two undirected
graphs G and H, a graph homomorphism from
G to H is a map ξ from the vertex set V(G)
to V(H) such that (ξ(u), ξ(v)) is an edge in H
whenever (u, v) is an edge in G. The
counting problem for graph homomorphism
is to compute the number of homomorphisms
from G to H. For a fixed graph H, this
problem is also known as the #H-coloring
problem. In addition to #H-coloring, a more
general family of problems that has been studied
intensively over the years is to count graph
homomorphisms with weights. Formally, we
use A to denote an m × m symmetric matrix with
entries $(A_{i,j})$, $i, j \in [m] = \{1, \ldots, m\}$.
Given any undirected graph G = (V, E), we define
the graph homomorphism function $Z_A(G)$ as
follows:

\[ Z_A(G) = \sum_{\xi : V \to [m]} \; \prod_{(u,v) \in E} A_{\xi(u),\xi(v)}. \tag{1} \]


This is also called the partition function
from statistical physics. It is clear from the
definition that $Z_A(G)$ is exactly the number of
homomorphisms from G to H, when A is the
{0, 1} adjacency matrix of H.

Graph homomorphism can express many 
natural graph properties. For example, if one 
takes H to be the graph over two vertices 
{0,1} with an edge (0,1) and a self-loop 
at 1, then the set of vertices mapped to 1 
in a graph homomorphism from G to H 
corresponds to a VERTEX COVER of G, and 
the counting problem simply counts the number 
of vertex covers. As another example, if H is 
the complete graph over k vertices (without 
self-loops), then the problem is exactly the k- 
COLORING problem for G. Many additional 
graph invariants can be expressed as $Z_A(G)$
for appropriate A. Consider the Hadamard
matrix

\[ H = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \tag{2} \]

where we index the rows and columns by {0, 1}.
In $Z_H(G)$, every product

\[ \prod_{(u,v) \in E} H_{\xi(u),\xi(v)} \in \{1, -1\} \]

and is −1 precisely when the induced subgraph
of G on $\xi^{-1}(1)$ has an odd number of edges.
Thus, $(2^n - Z_H(G))/2$ is the number of induced
subgraphs of G with an odd number of edges.
Also expressible as $Z_A(\cdot)$ are S-flows, where
S is a subset of a finite Abelian group closed under
inversion [6], and a scaled version of the Tutte
polynomial T(x, y) where (x − 1)(y − 1) is a
positive integer. In [6], Freedman, Lovász, and
Schrijver characterized the graph functions that
can be expressed as $Z_A(\cdot)$.
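
The examples above can be checked on tiny graphs with a brute-force Python sketch of the partition function. It enumerates all $m^{|V|}$ maps and is therefore exponential; it is meant only to make the definition concrete, and the name `partition_function` as well as the example graphs are illustrative.

```python
from itertools import product

def partition_function(A, num_vertices, edges):
    """Brute-force Z_A(G): sum over all maps xi: V -> [m] of the product,
    over the edges (u, v) of G, of A[xi(u)][xi(v)]. Vertices are 0..num_vertices-1."""
    m = len(A)
    total = 0
    for xi in product(range(m), repeat=num_vertices):
        weight = 1
        for u, v in edges:
            weight *= A[xi[u]][xi[v]]
        total += weight
    return total

path = [(0, 1), (1, 2)]                 # path on three vertices
triangle = [(0, 1), (1, 2), (0, 2)]

# H with an edge (0, 1) and a self-loop at 1: counts vertex covers.
vertex_cover_H = [[0, 1], [1, 1]]
num_vertex_covers = partition_function(vertex_cover_H, 3, path)

# Complete graph on three vertices without self-loops: counts proper 3-colorings.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
num_colorings = partition_function(K3, 3, triangle)

# Hadamard matrix: (2^n - Z_H(G)) / 2 counts induced subgraphs with an odd number of edges.
hadamard = [[1, 1], [1, -1]]
num_odd_subgraphs = (2 ** 3 - partition_function(hadamard, 3, triangle)) // 2
```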


Key Results 


In [5], Dyer and Greenhill first prove a complex- 
ity dichotomy theorem for all undirected graphs 
H. To state it formally, we give the following 
definition of block-rank-1 matrices: 


Definition 1 (Block-rank-1 matrices) A
nonnegative (but not necessarily symmetric)
matrix $A \in \mathbb{C}^{m \times n}$ is said to be
block-rank-1 if after separate appropriate
permutations of its rows and columns, A becomes a
block diagonal matrix and every block is of rank 1.


It is clear that a nonnegative matrix A is
block-rank-1 iff every 2 × 2 submatrix of A with at
least three positive entries is of rank 1. Here is the
dichotomy theorem of Dyer and Greenhill [5]:


Theorem 1 ([5]) Given any undirected graph
H, the #H-coloring problem is in polynomial
time if its adjacency matrix is block-rank-1 and
is #P-hard otherwise.
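
The 2 × 2 submatrix characterization gives a direct test for the block-rank-1 property. The Python sketch below assumes a matrix with exact (integer or rational) nonnegative entries, so that the determinant can be compared with zero exactly; the name `is_block_rank_1` is illustrative.

```python
from itertools import combinations

def is_block_rank_1(A):
    """Check that every 2 x 2 submatrix of the nonnegative matrix A with at
    least three positive entries has rank 1 (i.e., zero determinant)."""
    rows, cols = range(len(A)), range(len(A[0]))
    for i, j in combinations(rows, 2):
        for k, l in combinations(cols, 2):
            a, b, c, d = A[i][k], A[i][l], A[j][k], A[j][l]
            positives = sum(entry > 0 for entry in (a, b, c, d))
            if positives >= 3 and a * d != b * c:
                return False
    return True

single_edge = [[0, 1], [1, 0]]              # H is one edge: tractable
independent_sets = [[0, 1], [1, 1]]         # edge plus one self-loop: #P-hard
three_coloring = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # complete graph on 3 vertices
```

Under Theorem 1, `is_block_rank_1` returning False for the last two matrices corresponds to the #P-hardness of counting independent sets and proper 3-colorings.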


For the special case when H has two vertices,
the dichotomy above states that #H-coloring is
in polynomial time if the number of 1s in its
adjacency matrix is 0, 1, 2, or 4 and is #P-hard
otherwise. For the latter case, one of the diagonal
entries is 0 (as H is undirected), and #H-coloring
is indeed the problem of counting independent
sets [16]. However, proving a dichotomy theorem
for H of arbitrary size is much more challenging.
Besides counting independent sets, the other
starting point used in [5] is the problem of
counting proper q-colorings [12]. To show that
there is a reduction from one of these two problems
whenever H violates the block-rank-1 criterion,
Dyer and Greenhill need to define a more general
counting problem with vertex weights and
employ the technique of interpolation [14, 16], as
well as two tools often used with interpolation,
stretching and thickening.

Later in [1], Bulatov and Grohe give a sweep- 
ing complexity dichotomy theorem that general- 
izes the result of Dyer and Greenhill to nonnega- 
tive symmetric matrices: 


Theorem 2 ([1]) Given any symmetric and
nonnegative algebraic matrix A, computing
$Z_A(\cdot)$ is in polynomial time if A is
block-rank-1 and is #P-hard otherwise.


This dichotomy theorem has since played an
important role in many of the new developments
in the study of counting graph homomorphisms
as well as counting constraint satisfaction
problems because of its enormous applicability.
Many #P-hardness results are built on top of this
dichotomy. A proof of the dichotomy theorem with
a few shortcuts can also be found in [8].

Recently in a paper with both exceptional 
depth and conceptual vision, Goldberg, Jerrum, 
Grohe, and Thurley [7] proved a complexity 
dichotomy for all real-valued symmetric 
matrices: 


Theorem 3 ([7]) Given any symmetric and
real algebraic matrix A, the problem of
computing $Z_A(\cdot)$ is either in polynomial
time or #P-hard.


The exact tractability criterion in the 
dichotomy above, however, is much more 
technical and involved. Roughly speaking, the 
proof of the theorem proceeds by establishing 
a sequence of successively more stringent 
properties that a tractable A must satisfy. 
Ultimately, it arrives at a point where the
satisfaction of these properties together implies
that the computation of $Z_A(G)$ can be reduced to
the following sum:

\[ \sum_{x_1, \ldots, x_n \in \mathbb{Z}_2} (-1)^{f_G(x_1, \ldots, x_n)}, \tag{3} \]

where $f_G$ is a quadratic polynomial over
$\mathbb{Z}_2$ constructed from the input graph G
efficiently. This sum is known to be computable in
polynomial time in n, the number of variables
(e.g., see [3] and [11, Theorem 6.30]). In
particular, the latter immediately implies that
the following two Hadamard matrices

\[
H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\quad\text{and}\quad
H_4 = \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}
\]

are both tractable. This can be seen from the
following polynomial view of these two matrices.
If we index the rows and columns of $H_2$ by
$\mathbb{Z}_2$ and index the rows and columns of
$H_4$ by $(\mathbb{Z}_2)^2$, then their (x, y)th
entry and $((x_1, x_2), (y_1, y_2))$th entry are

\[ (-1)^{xy} \quad\text{and}\quad (-1)^{x_1 y_2 + x_2 y_1}, \]

respectively. From here, it is easy to reduce
$Z_{H_2}(\cdot)$ and $Z_{H_4}(\cdot)$ to (3).
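
The polynomial view can be checked mechanically. The Python sketch below builds both matrices from the exponents xy and $x_1 y_2 + x_2 y_1$ (rows and columns ordered 00, 01, 10, 11) and evaluates $Z_{H_2}(G)$ in the form (3), taking $f_G(x) = \sum_{(u,v) \in E} x_u x_v$ over $\mathbb{Z}_2$. The evaluation is a brute-force sum over all $2^n$ assignments, not the polynomial-time algorithm referred to in the text, and the function name is illustrative.

```python
from itertools import product

# H_2 indexed by Z_2: entry (x, y) is (-1)^(x*y).
H2 = [[(-1) ** (x * y) for y in (0, 1)] for x in (0, 1)]

# H_4 indexed by (Z_2)^2 in the order (0,0), (0,1), (1,0), (1,1).
idx = list(product((0, 1), repeat=2))
H4 = [[(-1) ** (x1 * y2 + x2 * y1) for (y1, y2) in idx] for (x1, x2) in idx]

def z_h2_via_quadratic(num_vertices, edges):
    """Evaluate Z_{H_2}(G) as the sum over x in (Z_2)^n of (-1)^{f_G(x)},
    where f_G(x) is the sum of x_u * x_v over the edges (u, v), taken mod 2."""
    total = 0
    for x in product((0, 1), repeat=num_vertices):
        f = sum(x[u] * x[v] for u, v in edges) % 2
        total += (-1) ** f
    return total

triangle = [(0, 1), (1, 2), (0, 2)]
z_triangle = z_h2_via_quadratic(3, triangle)
```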

Compared with the nonnegative domain [1, 5],
there are a lot more interesting tractable cases
over the real numbers, e.g., the two Hadamard
matrices above as well as their arbitrary tensor
products. It is not surprising that the potential
cancelations in the sum $Z_A(\cdot)$ may in fact be
the source of efficient algorithms for computing
$Z_A(\cdot)$ itself. This motivates Cai, Chen, and
Lu to continue to investigate the computational
complexity of $Z_A(\cdot)$ with A being a symmetric
complex matrix [2], because over the complex
domain, there is a significantly richer variety
of possible cancelations with the roots of unity,
and more interesting tractable cases are expected.
This turns out to be the case, and they prove the
following complexity dichotomy:


Theorem 4 ([2]) Given any symmetric and
algebraic complex matrix
$A \in \mathbb{C}^{m \times m}$, the problem of
computing $Z_A(\cdot)$ is either in polynomial
time or #P-hard.


Applications 


None is reported. 


Open Problems 


The efficient approximation of $Z_A(\cdot)$ remains
widely open even for small nonnegative matrices.
See the entry “Approximating the Partition
Function of Two-Spin Systems” for the current
state of the art on this. Two families of counting
problems that generalize $Z_A(\cdot)$ are counting
constraint satisfaction and Holant problems. Open
problems in these two areas can be found in
the two entries “Complexity Dichotomies for the
Counting Constraint Satisfaction Problem” and
“Holant Problems.”


Experimental Results 


None is reported. 


URLs to Code and Data Sets 


None is reported.


Problem Definition


In the middle of the last century, Nash [8] studied 
general noncooperative games and proved that 
there exists a set of mixed strategies, now 
commonly referred to as a Nash equilibrium, 
one for each player, such that no player can 
benefit if he/she changes his/her own strategy 
unilaterally. Since the development of Nash’s 
theorem, researchers have worked on how to 
compute Nash equilibria efficiently. Despite 
much effort in the last half century, no significant 
progress has been made on characterizing its 
algorithmic complexity, though both hardness 
results and algorithms have been developed for 
various modified versions. 

An exciting breakthrough, which shows that 
computing Nash equilibria is possibly hard, was 
made by Daskalakis, Goldberg, and Papadim- 
itriou [5], for games among four players or more. 
The problem was proven to be complete in PPAD 
(polynomial parity argument, directed version), a 
complexity class introduced by Papadimitriou in 
[9]. The work of [5] is based on the techniques 
developed in [6]. This hardness result was then 
improved to the three-player case by Chen and 
Deng [1] and Daskalakis and Papadimitriou [4], 
independently and with different proofs. Finally, 
Chen and Deng [2] proved that NASH, the prob- 
lem of finding a Nash equilibrium in a bimatrix 
game (or two-player game), is PPAD-complete. 

A bimatrix game is a noncooperative game 
between two players in which the players have 
m and n choices of actions (or pure strategies), 
respectively. Such a game can be specified by two 
$m \times n$ matrices $A = (a_{i,j})$ and $B = (b_{i,j})$.
If the first player chooses action i and the second
player chooses action j, then their payoffs are
$a_{i,j}$ and $b_{i,j}$, respectively. A mixed strategy of
a player is a probability distribution over his/her
choices. Let $\mathbb{P}^n$ denote the set of all probability


Complexity of Bimatrix Nash Equilibria 


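vectors in $\mathbb{R}^n$, i.e., nonnegative vectors whose
entries sum to 1. The Nash equilibrium theorem
on noncooperative games, when specialized to
bimatrix games, states that for every bimatrix
game G = (A, B), there exists a pair of mixed
strategies $(\mathbf{x}^* \in \mathbb{P}^m, \mathbf{y}^* \in \mathbb{P}^n)$, called a Nash
equilibrium, such that for all $\mathbf{x} \in \mathbb{P}^m$ and $\mathbf{y} \in \mathbb{P}^n$,
$(\mathbf{x}^*)^T A \mathbf{y}^* \ge \mathbf{x}^T A \mathbf{y}^*$ and $(\mathbf{x}^*)^T B \mathbf{y}^* \ge (\mathbf{x}^*)^T B \mathbf{y}$.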

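Computationally, one might settle for an
approximate Nash equilibrium. Let $A_i$ denote the
ith row vector of A and $B_i$ denote the ith column
vector of B. An $\epsilon$-well-supported Nash
equilibrium of game (A, B) is a pair of mixed strategies
$(\mathbf{x}^*, \mathbf{y}^*)$ such that


$A_i \mathbf{y}^* > A_j \mathbf{y}^* + \epsilon \implies x_j^* = 0, \quad \forall i, j : 1 \le i, j \le m;$


$(\mathbf{x}^*)^T B_i > (\mathbf{x}^*)^T B_j + \epsilon \implies y_j^* = 0, \quad \forall i, j : 1 \le i, j \le n.$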
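
The two implications above can be checked directly for a given pair of mixed strategies. The following is a minimal sketch, assuming payoff matrices given as lists of rows; the function name `is_well_supported` and the matching-pennies test game are illustrative choices, not part of the original entry.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def is_well_supported(A: Matrix, B: Matrix, x: Sequence[float],
                      y: Sequence[float], eps: float) -> bool:
    """Check the two implications defining an eps-well-supported Nash equilibrium."""
    m, n = len(A), len(A[0])
    # A_i y: expected payoff of the first player's pure action i against y.
    row_pay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    # x^T B_j: expected payoff of the second player's pure action j against x.
    col_pay = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    # An action that some alternative beats by more than eps must have probability 0.
    for j in range(m):
        if x[j] > 0 and max(row_pay) > row_pay[j] + eps:
            return False
    for j in range(n):
        if y[j] > 0 and max(col_pay) > col_pay[j] + eps:
            return False
    return True

# Matching pennies (hypothetical test game): uniform play is an exact equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(is_well_supported(A, B, [0.5, 0.5], [0.5, 0.5], eps=0.0))
print(is_well_supported(A, B, [1.0, 0.0], [0.5, 0.5], eps=0.1))
```

Verifying a candidate pair is easy in this way; the hardness results discussed in this entry concern finding such a pair.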


Definition 1 (2-NASH and NASH) The input
instance of problem 2-NASH is a pair $(G, 0^k)$,
where G is a bimatrix game, and the output is a
$2^{-k}$-well-supported Nash equilibrium of G. The
input of problem NASH is a bimatrix game G, and
the output is an exact Nash equilibrium of G.


Key Results 


A binary relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$ is
polynomially balanced if there exists a polynomial p
such that for all pairs $(x, y) \in R$, $|y| \le p(|x|)$.
It is a polynomial-time computable relation if for
each pair (x, y), one can decide whether or not
$(x, y) \in R$ in time polynomial in $|x| + |y|$. The
NP search problem $Q_R$ specified by R is defined
as follows: given $x \in \{0,1\}^*$, if there exists y
such that $(x, y) \in R$, return y; otherwise, return
a special string “no.”

Relation R is total if for every $x \in \{0,1\}^*$,
there exists a y such that $(x, y) \in R$. Following
[7], let TFNP denote the class of all NP
search problems specified by total relations. A
search problem $Q_{R_1} \in$ TFNP is polynomial-time
reducible to problem $Q_{R_2} \in$ TFNP if there exists
a pair of polynomial-time computable functions
(f, g) such that for every instance x of $R_1$, if y
satisfies $(f(x), y) \in R_2$, then $(x, g(y)) \in R_1$.
Furthermore, $Q_{R_1}$ and $Q_{R_2}$ are polynomial-time
equivalent if $Q_{R_2}$ is also reducible to $Q_{R_1}$.

The complexity class PPAD is a subclass of 
TFNP, containing all the search problems which 
are polynomial-time reducible to: 


Definition 2 (Problem LEAFD) The input
instance of LEAFD is a pair $(M, 0^n)$, where M
defines a polynomial-time Turing machine
satisfying:


1. For every $v \in \{0,1\}^n$, M(v) is an ordered pair
$(u_1, u_2)$ with $u_1, u_2 \in \{0,1\}^n \cup \{\text{“no”}\}$.

2. $M(0^n) = (\text{“no”}, 1^n)$ and the first component
of $M(1^n)$ is $0^n$.


This instance defines a directed graph G =
(V, E) with $V = \{0,1\}^n$. Edge $(u, v) \in E$ iff
v is the second component of M(u) and u is the
first component of M(v).

The output of problem LEAFD is a directed
leaf of G other than $0^n$. Here a vertex is called
a directed leaf if its out-degree plus in-degree 
equals one. 
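
To make the graph defined by M concrete, the sketch below follows the directed path that starts at $0^n$ until it ends at a directed leaf. It assumes M is given as a Python function that returns a (predecessor, successor) pair, with `None` standing in for “no”; the names `has_edge`, `find_leaf`, and `toy_machine` are hypothetical. Since every vertex has in-degree and out-degree at most one and $0^n$ has no incoming edge, the walk cannot cycle, but it may take a number of steps exponential in n, so this is an illustration of the problem and not an efficient algorithm.

```python
from typing import Callable, Optional, Tuple

Vertex = Optional[str]                             # an n-bit string, or None for "no"
Machine = Callable[[str], Tuple[Vertex, Vertex]]   # v -> (predecessor, successor)

def has_edge(M: Machine, u: str, v: str) -> bool:
    """(u, v) is an edge iff v is the 2nd component of M(u) and u is the 1st of M(v)."""
    return M(u)[1] == v and M(v)[0] == u

def find_leaf(M: Machine, n: int) -> str:
    """Walk along the path starting at 0^n and return the vertex where it ends."""
    u = "0" * n
    while True:
        v = M(u)[1]
        if v is None or not has_edge(M, u, v):
            return u          # u has no outgoing edge, so it is a directed leaf
        u = v

# Toy instance with n = 2 (an assumption for illustration): 00 -> 11 -> 01, 10 isolated.
TABLE = {"00": (None, "11"), "11": ("00", "01"), "01": ("11", None), "10": (None, None)}

def toy_machine(v: str) -> Tuple[Vertex, Vertex]:
    return TABLE[v]

print(find_leaf(toy_machine, 2))
```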

A search problem in PPAD is said to be 
complete in PPAD (or PPAD-complete) if there 
exists a polynomial-time reduction from LEAFD 
to it. 


Theorem ([2]) 2-NASH and NASH are PPAD-complete.


Applications 


The concept of Nash equilibria has traditionally 
been one of the most influential tools in the study 
of many disciplines involved with strategies, such 
as political science and economic theory. The rise 
of the Internet and the study of its anarchical 
environment have made the Nash equilibrium an 
indispensable part of computer science. Over the 
past decades, the computer science community 
has contributed a lot to the design of efficient 
algorithms for related problems. This sequence 
of results [1-6], for the first time, provides some 
evidence that the problem of finding a Nash 
equilibrium is possibly hard for P. These results
are very important to the emerging discipline of
algorithmic game theory.


Open Problems 


This sequence of works shows that (r + 1)-player
games are polynomial-time reducible to r-player
games for every $r \ge 2$, but the reduction is
carried out by first reducing (r + 1)-player games to
a fixed-point problem and then further to r-player
games. Is there a natural reduction that goes
directly from (r + 1)-player games to r-player
games? Such a reduction could provide a better
understanding of the behavior of multiplayer
games. Although many people believe that PPAD
is hard for P, there is no strong evidence for this
belief or intuition. The natural open problem is:
can one rigorously prove that the class PPAD is hard
under one of the generally believed assumptions
in theoretical computer science, such as “NP
is not in P” or “one-way functions exist”? Such
a result would be extremely important to both
computational complexity theory and algorithmic
game theory.


Problem Definition


The core is one of the most important solution
concepts in cooperative games. It is based on
the coalition rationality condition: no subgroup of
the players will do better if they break away from 
the joint decision of all players to form their own 
coalition. The principle behind this condition can 
be seen as an extension to that of the Nash 
Equilibrium in noncooperative games. The work 
of Fang, Zhu, Cai, and Deng [4] discusses the 
computational complexity problems related to the 
cores of some cooperative game models arising 
from combinatorial optimization problems, such 
as flow games and Steiner tree games. 

A cooperative game with side payments
is given by the pair (N, v), where N =
{1, 2, ..., n} is the player set and $v : 2^N \to \mathbb{R}$
is the characteristic function. For each coalition
$S \subseteq N$, the value v(S) is interpreted as the
profit or cost achieved by the collective action
of players in S without any assistance of players
in N\S. A game is called a profit (cost) game
if the characteristic function values measure the
profit (cost) achieved by the coalitions. Here,
the definitions are given only for profit games;
symmetric statements hold for cost games.

A vector $x = \{x_1, x_2, \ldots, x_n\}$ is called an
imputation if it satisfies $\sum_{i \in N} x_i = v(N)$ and
$\forall i \in N : x_i \ge v(\{i\})$. The core of the game
(N, v) is defined as:


$C(v) = \{x \in \mathbb{R}^n : x(N) = v(N) \text{ and } x(S) \ge v(S), \ \forall S \subseteq N\}$,


where $x(S) = \sum_{i \in S} x_i$ for $S \subseteq N$. A game
is called balanced if its core is nonempty, and
totally balanced if every subgame (i.e., the game
obtained by restricting the player set to a coalition
and the characteristic function to the power set of
that coalition) is balanced.
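
For a small profit game, the definition of the core can be checked by enumerating all coalitions. The sketch below assumes the characteristic function is available as a Python callable on frozensets of 0-indexed players; the name `in_core` and the two example games are illustrative assumptions. The loop visits all $2^n$ coalitions, which is exactly the exponential blow-up that the complexity questions in this entry are concerned with.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

def in_core(n: int, v: Callable[[FrozenSet[int]], float],
            x: Sequence[float], tol: float = 1e-9) -> bool:
    """Brute-force test of x(N) = v(N) and x(S) >= v(S) for every coalition S."""
    players = range(n)
    if abs(sum(x) - v(frozenset(players))) > tol:
        return False                       # x does not distribute exactly v(N)
    for size in range(1, n):
        for S in combinations(players, size):
            if sum(x[i] for i in S) < v(frozenset(S)) - tol:
                return False               # coalition S would do better on its own
    return True

# Hypothetical examples with three players.
def majority(S: FrozenSet[int]) -> float:
    return 1.0 if len(S) >= 2 else 0.0     # any two players can secure the whole profit

def unanimity(S: FrozenSet[int]) -> float:
    return 1.0 if len(S) == 3 else 0.0     # only the grand coalition earns anything

print(in_core(3, majority, [1 / 3, 1 / 3, 1 / 3]))
print(in_core(3, unanimity, [0.5, 0.25, 0.25]))
```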

The algorithmic study of the core is challenging
because its definition imposes an exponential
number of constraints. The following
computational complexity questions have
attracted much attention from researchers:


1. Testing balancedness: Can it be tested in poly- 
nomial time whether a given instance of the 
game has a nonempty core? 

2. Checking membership: Can it be checked in 
polynomial time whether a given imputation 
belongs to the core? 

3. Finding a core member: Is it possible to find 
an imputation in the core in polynomial time? 


In reality, however, there is an important case 
in which the characteristic function value of a 
coalition can be evaluated via a combinatorial 
optimization problem, subject to constraints of 
resources controlled by the players of this coali- 
tion. In such circumstances, the input size of a 
game is the same as that of the related
optimization problem, which is usually polynomial
in the number of players. Therefore, this class of
games, called combinatorial optimization games, 
fits well into the framework of algorithm and 
complexity analysis. Flow games and Steiner tree 
games discussed in Fang et al. [4] fall within this 
scope. 


Flow Game Let $D = (V, E; \omega; s, t)$ be a
directed flow network, where V is the vertex set,
E is the arc set, $\omega : E \to \mathbb{R}^+$ is the arc capacity
function, and s and t are the source and the sink
of the network, respectively. Assume that each
player controls one arc in the network. The value
of a maximum flow can be viewed as the profit
achieved by the players in cooperation. The flow
game $\Gamma_f = (E, v)$ associated with the network
D is defined as follows:


(i) The player set is E.

(ii) $\forall S \subseteq E$, v(S) is the value of a maximum
flow from s to t in the subnetwork of D
consisting only of arcs belonging to S.


In Kalai and Zemel [6] and Deng et al. [1], it was
shown that the flow game is totally balanced and 
finding a core member can be done in polynomial 
time. 


Problem 1 (Checking membership for flow
game) INSTANCE: A flow network D =
$(V, E; \omega; s, t)$ and $x : E \to \mathbb{R}^+$.

QUESTION: Is it true that x(E) = v(E) and
$x(S) \ge v(S)$ for all subsets $S \subseteq E$?
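
As an illustration of Problem 1, the sketch below evaluates v(S) with a textbook Edmonds-Karp maximum-flow routine and then tests every coalition of arcs. The arc-list representation, the function names `max_flow` and `in_flow_core`, and the three-arc example network are assumptions made for this example. The enumeration is exponential in |E| and is only meant to make the question concrete for tiny instances.

```python
from collections import defaultdict, deque
from itertools import combinations

def max_flow(arcs, s, t):
    """Edmonds-Karp on a list of (tail, head, capacity) arcs; assumes s != t."""
    res = defaultdict(lambda: defaultdict(float))      # residual capacities
    for u, v, c in arcs:
        res[u][v] += c
        res[v][u] += 0.0
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:               # BFS for an augmenting path
            u = queue.popleft()
            for w, c in res[u].items():
                if c > 1e-12 and w not in parent:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return flow
        path, w = [], t
        while parent[w] is not None:                   # recover the path t -> s
            path.append((parent[w], w))
            w = parent[w]
        b = min(res[u][v] for u, v in path)            # bottleneck capacity
        for u, v in path:
            res[u][v] -= b
            res[v][u] += b
        flow += b

def in_flow_core(arcs, s, t, x, tol=1e-9):
    """x[i] is the payoff of the player owning arcs[i]."""
    E = range(len(arcs))
    value = lambda S: max_flow([arcs[i] for i in S], s, t)
    if abs(sum(x) - value(E)) > tol:
        return False
    return all(sum(x[i] for i in S) >= value(S) - tol
               for k in range(1, len(arcs)) for S in combinations(E, k))

# Hypothetical network: s->a (cap 1), a->t (cap 1), s->t (cap 2); maximum flow is 3.
arcs = [("s", "a", 1.0), ("a", "t", 1.0), ("s", "t", 2.0)]
# This allocation pays the arcs of the cut {s->a, s->t} their capacities.
print(in_flow_core(arcs, "s", "t", [1.0, 0.0, 2.0]))
print(in_flow_core(arcs, "s", "t", [1.5, 1.5, 0.0]))
```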


Steiner Tree Game Let $G = (V, E; \omega)$ be an
edge-weighted graph with $V = \{v_0\} \cup N \cup M$,
where $N, M \subseteq V \setminus \{v_0\}$ are disjoint. $v_0$ represents
a central supplier, N represents the consumer set,
M represents the switch set, and $\omega(e)$ denotes the
cost of connecting the two endpoints of edge e
directly. It is required to connect all the consumers
in N to the central supplier $v_0$. The connection
is not limited to using direct links between two
consumers or a consumer and the central
supplier; it may pass through some switches in M.
The aim is to construct the cheapest connection
and distribute the connection cost among the
consumers fairly. Then, the associated Steiner
tree game $\Gamma_s = (N, \gamma)$ is defined as follows:




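(i) The player set is N.

(ii) $\forall S \subseteq N$, $\gamma(S)$ is the weight of a minimum
Steiner tree on G w.r.t. the set $S \cup \{v_0\}$, that
is, $\gamma(S) = \min \sum_{e \in E_S} \omega(e)$, where the minimum
is taken over all subtrees $T_S = (V_S, E_S)$ of G
with $V_S \supseteq S \cup \{v_0\}$.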


Different from flow games, the core of a Steiner 
tree game may be empty. An example with an 
empty core was given in Megiddo [9]. 


Problem 2 (Testing balancedness for a Steiner
tree game) INSTANCE: An edge-weighted
graph $G = (V, E; \omega)$ with $V = \{v_0\} \cup N \cup M$.

QUESTION: Does there exist a vector
$x : N \to \mathbb{R}^+$ such that $x(N) = \gamma(N)$ and
$x(S) \le \gamma(S)$ for all subsets $S \subseteq N$?


Problem 3 (Checking membership for a
Steiner tree game) INSTANCE: An
edge-weighted graph $G = (V, E; \omega)$ with
$V = \{v_0\} \cup N \cup M$ and $x : N \to \mathbb{R}^+$.

QUESTION: Is it true that $x(N) = \gamma(N)$ and
$x(S) \le \gamma(S)$ for all subsets $S \subseteq N$?


Key Results 


Theorem 1 It is NP-complete to decide, given
a flow game $\Gamma_f = (E, v)$ defined on
network $D = (V, E; \omega; s, t)$ and a vector
$x : E \to \mathbb{R}^+$ with x(E) = v(E), whether there
exists a coalition $S \subseteq E$ such that x(S) < v(S).
That is, checking membership of the core for flow
games is co-NP-complete.


The proof of Theorem 1 yields directly the
same conclusion for linear production games. In
Owen’s linear production game [10], each player
j ($j \in N$) is in possession of an individual
resource vector $b^j$. For a coalition S of players,
the profit obtained by S is the optimum value of
the following linear program:


$\max \{ c^T y : Ay \le \sum_{j \in S} b^j, \ y \ge 0 \}$.


That is, the characteristic function value is what
the coalition can achieve in the linear production
model with the resources under the control of
its players. Owen showed that one imputation
in the core can also be constructed through an
optimal dual solution to the linear program which
determines the value of N. However, in general,
there exist some imputations in the core which
cannot be obtained in this way.


Theorem 2 Checking membership of the core
for linear production games is co-NP-complete.


The problem of finding a minimum Steiner tree in
a network is NP-hard; therefore, in a Steiner tree
game, the value $\gamma(S)$ of each coalition S may not
be obtained in polynomial time. It implies that the
complement problem of checking membership of
the core for Steiner tree games may not be in NP.


Theorem 3 It is NP-hard to decide, given a
Steiner tree game $\Gamma_s = (N, \gamma)$ defined on network
$G = (V, E; \omega)$ and a vector $x : N \to \mathbb{R}^+$ with
$x(N) = \gamma(N)$, whether there exists a coalition
$S \subseteq N$ such that $x(S) > \gamma(S)$. That is, checking
membership of the core for Steiner tree games is
NP-hard.


Theorem 4 Testing balancedness for Steiner 
tree games is NP-hard. 


Given a Steiner tree game $\Gamma_s = (N, \gamma)$ defined on
network $G = (V, E; \omega)$ and a subset $S \subseteq N$, in
the subgame $(S, \gamma_S)$, the value $\gamma_S(S')$ ($S' \subseteq S$) is
the weight of a minimum Steiner tree of G w.r.t.
the subset $S' \cup \{v_0\}$, where all the vertices in N\S
are treated as switches but not consumers. It is
further proved in Fang et al. [4] that determining
whether a Steiner tree game is totally balanced is
also NP-hard. This is the first example of
NP-hardness for the totally balanced condition.


Theorem 5 Testing total balancedness for 
Steiner tree games is NP-hard. 


Applications 


The computational complexity results on the 
cores of combinatorial optimization games have 
been as diverse as the corresponding combinato- 
rial optimization problems. For example: 


Complexity of Core 


1. In matching games [2], testing balancedness, 
checking membership, and finding a core 
member can all be done in polynomial time. 

2. In both flow games and minimum-cost span- 
ning tree games [3, 4], although their cores 
are always nonempty and a core member can 
be found in polynomial time, the problem of 
checking membership is co-NP-complete.

3. In facility location games [5], the problem 
of testing balancedness is in general NP-hard;
however, given the information that the
core is nonempty, both finding a core member 
and checking membership can be solved effi- 
ciently. 

4. In a game of sum of edge weight defined on a 
graph [1], all the problems of testing balanced- 
ness, checking membership, and finding a core 
member are NP-hard.


Based on the concept of bounded rationality 
[3, 8], it is suggested that computational 
complexity be taken as an important factor in 
considering rationality and fairness of a solution 
concept. That is, the players are not willing to 
spend super-polynomial time to search for the 
most suitable solution. In the case when the 
solutions of a game do not exist or are difficult 
to compute or to check, it may not be simple to 
dismiss the problem as hopeless, especially when 
the game arises from important applications. 
Hence, various conceptual approaches are 
proposed to resolve this problem. 

When the core of a game is empty, it is natural
to look for conditions ensuring nonemptiness of
approximate cores. A natural way to approximate
the core is the least core. Let (N, v) be a profit
cooperative game. Given a real number $\epsilon$, the
$\epsilon$-core is defined to contain the imputations such
that $x(S) \ge v(S) - \epsilon$ for each nonempty proper
subset S of N. The least core is the intersection
of all nonempty $\epsilon$-cores. Let $\epsilon^*$ be the minimum
value of $\epsilon$ such that the $\epsilon$-core is nonempty; then
the least core is the same as the $\epsilon^*$-core.

The concept of the least core poses new
challenges in regard to algorithmic issues. The most
natural problem is how to efficiently compute the
value $\epsilon^*$ for a given cooperative game. The catch
is that the computation of $\epsilon^*$ requires solving
a linear program with an exponential number
of constraints. Though there are cases where this
value can be computed in polynomial time [7],
it is in general very hard. If the value of $\epsilon^*$ is
considered to represent some subsidies given by
the central authority to ensure the existence of the
cooperation, then it is significant to give an
approximate value of it even when its computation
is NP-hard.

Another possible approach is to interpret
approximation as bounded rationality. For example,
it would be interesting to know if there is
any game with the property that for any $\epsilon > 0$,
checking membership in the $\epsilon$-core can be done
in polynomial time, but it is NP-hard to tell if
an imputation is in the core. In such cases, the
restoration of cooperation would be a result of
bounded rationality. That is to say, the players
would not care about an extra gain or loss of $\epsilon$ at
the expense of another order of magnitude of
computational resources. This methodology may be
further applied to other solution concepts.


Problem Definition


We face the following problem. 


Problem 1 (Top-k document retrieval) Let
$\mathcal{D} = \{T_1, T_2, \ldots, T_D\}$ be a collection of D
documents of n characters in total, drawn from
an alphabet set $\Sigma = [\sigma]$. The relevance of a
document $T_d$ with respect to a pattern P, denoted
by w(P, d), is a function of the set of occurrences
of P in $T_d$. Our task is to index $\mathcal{D}$, such that
whenever a pattern P[1, p] and a parameter
k come as a query, the k documents with the
highest w(P, ·) values can be reported efficiently.


Compressed Document Retrieval on String Collections, Table 1 Indexes of space 2|CSA| + D log(n/D) + O(D) + o(n) bits

Source                       Report time per document
Hon et al. [3]               $O(t_{SA} \log^{3+\epsilon} n)$
Gagie et al. [2]             $O(t_{SA} \log D \log(D/k) \log^{1+\epsilon} n)$
Belazzougui et al. [1]       $O(t_{SA} \log k \log(D/k) \log^{\epsilon} n)$
Hon et al. [4]               $O(t_{SA} \log k \log^{\epsilon} n)$


Compressed Document Retrieval on String Collections, Table 2 Indexes of space |CSA| + D log(n/D) + O(D) + o(n) bits

Source                       Report time per document
Tsur [12]                    $O(t_{SA} \log k \log^{1+\epsilon} n)$
Navarro and Thankachan [8]   $O(t_{SA} \log^2 k \log^{\epsilon} n)$


Traditionally, inverted indexes are employed
for this task in Information Retrieval. However,
they are not powerful enough to handle
scenarios where the documents need to be treated
as general strings over an arbitrary alphabet set
(e.g., genome sequences in bioinformatics, text in
many East-Asian languages) [5]. Hon et al. [3]
proposed the first solution for Problem 1,
requiring O(n log n) bits of space and O(p +
k log k) query time. Later, optimal O(p + k)
query time indexes were proposed by Navarro
and Nekrich [7], and also by Shah et al. [11].
There also exist compressed/compact space
solutions, tailored to specific relevance functions
(mostly term-frequency or PageRank). In this
article, we briefly survey the compressed space
indexes for Problem 1 for the case where the
relevance function is term-frequency (i.e., w(P, d) is
the number of occurrences of P in $T_d$), which we
call the Compressed Top-k Frequent Document
Retrieval (CTFDR) problem.


Key Results


First we introduce some notation. For
convenience, we append a special character “$” to every
document. Then, $T = T_1 \circ T_2 \circ \cdots \circ T_D$ is
the concatenation of all documents. GST, SA,
and CSA are the suffix tree, suffix array, and a
compressed suffix array of T, respectively. Notice
that both GST and SA take O(n log n) bits of
space, whereas the space of CSA (|CSA| bits)
can be made close to the minimum space for
maintaining $\mathcal{D}$ (which is not more than $n \log \sigma$
bits) by choosing an appropriate version of CSA
[6]. Using CSA, the suffix range [sp, ep] of
P[1, p], as well as any SA[·], can be computed
in times search(p) and $t_{SA}$, respectively. Hon
et al. [3] gave the first solution for the CTFDR
problem, requiring roughly 2|CSA| bits of space,
whereas the first space-optimal index was given
by Tsur [12]. Various improvements on both
results have been proposed and are summarized
in Tables 1 and 2. Notice that the total query
time is search(p) plus k times the report time per
document.


Notations and Basic Framework 
The suffix tree GST of T can be considered as a 
generalized suffix tree of D, where 


• $\ell_i$ is the ith leftmost leaf in GST

• $\mathrm{doc}(\ell_i)$ is the document to which the suffix
corresponding to $\ell_i$ belongs

• Leaf(u) is the set of leaves in the subtree of
node u

• tf(u, d) is the number of leaves in Leaf(u)
with doc(·) = d

• Top(u, k) is the set of k document identifiers
with the highest tf(u, ·) values

From now onwards, we assume all solutions
consist of a fully compressed representation of
GST in |CSA| + o(n) bits [10], and a bitmap
B[1, n], where B[i] = 1 iff T[i] = “$”. We use
a D log(n/D) + O(D) + o(n)-bit representation
of B with constant time rank/select query
support [9]. Therefore, $\mathrm{doc}(\ell_i)$ can be computed
as 1 plus the number of 1’s in B[1, SA[i] - 1]
in time $t_{SA} + O(1)$. Observe that a CTFDR
query (P, k) essentially asks to return the set
$\mathrm{Top}(u_P, k)$, where $u_P$ is the locus node of P in
GST. Any superset of $\mathrm{Top}(u_P, k)$ is called a
candidate set of (P, k). The following lemma is
crucial.


Lemma 1 The set
$\mathrm{Top}(w, k) \cup \{\mathrm{doc}(\ell_i) \mid \ell_i \in \mathrm{Leaf}(u_P) \setminus \mathrm{Leaf}(w)\}$
is a candidate set of (P, k),
where w is any node in the subtree of $u_P$.
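
The computation of doc(·) from the bitmap B, described just before Lemma 1, can be sketched as follows. This is a toy model under explicit assumptions: the suffix array is built by naive sorting and the rank query is answered from a plain prefix-sum array, whereas the entry assumes a compressed suffix array and a succinct rank/select structure. The names `build`, `ones`, and `doc` are illustrative.

```python
from itertools import accumulate

def build(docs):
    """Concatenate the documents (each followed by '$'); return T, SA and rank data."""
    T = "".join(d + "$" for d in docs)
    # SA[i-1] is the 1-based starting position of the i-th smallest suffix of T.
    SA = sorted(range(1, len(T) + 1), key=lambda i: T[i - 1:])
    B = [1 if c == "$" else 0 for c in T]        # B[i] = 1 iff T[i] = '$'
    ones = [0] + list(accumulate(B))             # ones[j] = number of 1s in B[1..j]
    return T, SA, ones

def doc(SA, ones, i):
    """Document of the i-th leftmost leaf: 1 + number of 1s in B[1, SA[i] - 1]."""
    return 1 + ones[SA[i - 1] - 1]

T, SA, ones = build(["banana", "ananas", "nab"])
print([doc(SA, ones, i) for i in range(1, len(T) + 1)])
```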


All query processing algorithms consist of the
following two steps: (i) Generate a candidate set
C of size as close to k as possible. (ii) Compute
$\mathrm{tf}(u_P, d)$ for all $d \in C$ and report the k document
identifiers with the highest $\mathrm{tf}(u_P, \cdot)$ values
as output.
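
The two steps, with the candidate set taken from Lemma 1, can be illustrated on a plain document array. In this sketch a node is modeled by the range of leaves below it, DA[i] stands for the document of the ith leaf (0-based here), and Top(w, k) is recomputed by counting where an index would store it; all names and the sample array are assumptions for illustration only.

```python
from collections import Counter

def top(DA, lo, hi, k):
    """Top(u, k) for the node whose leaves are DA[lo..hi] (inclusive)."""
    return {d for d, _ in Counter(DA[lo:hi + 1]).most_common(k)}

def query(DA, sp, ep, a, b, k):
    """[sp, ep] is the leaf range of u_P; [a, b] inside it is the leaf range of w."""
    # Step (i): Lemma 1 candidate set = Top(w, k) plus documents of leaves outside w.
    cand = top(DA, a, b, k) | set(DA[sp:a]) | set(DA[b + 1:ep + 1])
    # Step (ii): exact tf(u_P, d) for the candidates only, then report the k best.
    tf = Counter(d for d in DA[sp:ep + 1] if d in cand)
    return [d for d, _ in tf.most_common(k)]

DA = [1, 2, 1, 3, 1, 2, 2, 3, 3, 3, 1, 2]
print(query(DA, sp=1, ep=10, a=4, b=7, k=2))
```

When several documents have the same frequency, the reported set is one of several valid answers; only the multiset of reported frequencies is determined.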


An Index of Size ≈ 2|CSA| Bits
Queries are categorized into O(log D) different
types.


Definition 1 A query (P, k) is of type x if
$\lceil \log k \rceil = x$.


We start with the description of a structure $\mathrm{DS}_x$
(of size $|\mathrm{DS}_x|$ bits) that along with GST and B
can generate a candidate set of size proportional
to k for any type-x query. The first step is to
identify a set $\mathrm{Mark}_g$ of nodes in GST using the
scheme described in Lemma 2 (parameter g will
be fixed later). Then maintain $\mathrm{Top}(u', 2^x)$ for all
$u' \in \mathrm{Mark}_g$.


Lemma 2 ([3]) There exists a scheme to
identify a set $\mathrm{Mark}_g$ of nodes in GST (called
marked nodes) based on a grouping factor g,
where the following conditions are satisfied:
(i) $|\mathrm{Mark}_g| = O(n/g)$, (ii) if it exists, the highest
marked node $u' \in \mathrm{Mark}_g$ in the subtree of any
node u is unique, and $|\mathrm{Leaf}(u) \setminus \mathrm{Leaf}(u')| \le 2g$.
For example, $\mathrm{Mark}_g$ can be the set of all lowest
common ancestor (LCA) nodes of $\ell_i$ and $\ell_{i+g}$,
where i is an integer multiple of g.

Using $\mathrm{DS}_x$, any type-x query (P, k) can be
processed as follows: find the highest marked node $u'_P$
in the subtree of $u_P$ and generate the following
candidate set.


$\mathrm{Top}(u'_P, 2^x) \cup \{\mathrm{doc}(\ell_i) \mid \ell_i \in \mathrm{Leaf}(u_P) \setminus \mathrm{Leaf}(u'_P)\}$


From the properties described in Lemma 2, the
cardinality of this set is $O(2^x + g) = O(k + g)$
and the size of $\mathrm{DS}_x$ is $O((n/g) 2^x \log D)$ bits.
By fixing $g = 2^x \log^{2+\epsilon} n$ [3], we can bound
the set cardinality by $O(k \log^{2+\epsilon} n)$ and $|\mathrm{DS}_x|$
by $O(n/\log^{1+\epsilon} n)$ bits. Therefore, we maintain
$\mathrm{DS}_x$ for $x = 1, 2, 3, \ldots, \log D$ in o(n) bits
overall, and whenever a query comes, generate a
candidate set of size $O(k \log^{2+\epsilon} n)$ using the
appropriate structures. The observation by Belazzougui
et al. [1] is that $g = x 2^x \log^{1+\epsilon} n$ in the above
analysis yields a candidate set of even lower
cardinality $O(k \log k \log^{1+\epsilon} n)$, without blowing
up the space.

Later, Hon et al. [4] came up with another
strategy for generating a candidate set of even
smaller size, $O(k \log k \log^{\epsilon} n)$. They associate
another structure $\mathrm{DS}^*_x$ (of space $|\mathrm{DS}^*_x|$ bits)
with each $\mathrm{DS}_x$. Essentially, $\mathrm{DS}^*_x$ maintains
$\mathrm{Top}(u'', 2^x)$ of every $u'' \in \mathrm{Mark}_h$ with
$h = x 2^x \log^{\epsilon} n$ in an encoded form. Now,
whenever a type-x query (P, k) comes, we
first find the highest node $u''_P$ in the subtree
of $u_P$ that belongs to $\mathrm{Mark}_h$ and generate the
candidate set
$\mathrm{Top}(u''_P, 2^x) \cup \{\mathrm{doc}(\ell_i) \mid \ell_i \in \mathrm{Leaf}(u_P) \setminus \mathrm{Leaf}(u''_P)\}$,
whose cardinality is
$O(2^x + h) = O(k \log k \log^{\epsilon} n)$.

We now describe the scheme for encoding
a particular $\mathrm{Top}(u'', 2^x)$. Let $u'$ be the highest
node in the subtree of $u''$ that belongs to $\mathrm{Mark}_g$.
Then,
$\mathrm{Top}(u'', 2^x) \subseteq \mathrm{Top}(u', 2^x) \cup \{\mathrm{doc}(\ell_i) \mid \ell_i \in \mathrm{Leaf}(u'') \setminus \mathrm{Leaf}(u')\}$.
Notice that $\mathrm{Top}(u', 2^x)$ is
stored in $\mathrm{DS}_x$ and any $\mathrm{doc}(\ell_i)$ can be decoded in
$O(t_{SA})$ time. Therefore, instead of explicitly
storing an entry d within $\mathrm{Top}(u'', 2^x)$ in log D bits,
we can refer to the position of d in $\mathrm{Top}(u', 2^x)$
if $d \in \mathrm{Top}(u', 2^x)$, else refer to the relative
position of a leaf node $\ell_i$ in $\mathrm{Leaf}(u'') \setminus \mathrm{Leaf}(u')$
with $\mathrm{doc}(\ell_i) = d$. Therefore, maintaining the
following two bitmaps is sufficient.

• $F[1, 2^x]$, where F[i] = 1 iff the ith entry in
$\mathrm{Top}(u', 2^x)$ is present in $\mathrm{Top}(u'', 2^x)$

• $F'[1, |\mathrm{Leaf}(u'') \setminus \mathrm{Leaf}(u')|]$, where F'[i] = 1 iff
doc(·) of the ith leaf node in $\mathrm{Leaf}(u'') \setminus \mathrm{Leaf}(u')$ is
present in $\mathrm{Top}(u'', 2^x)$, but not in $\mathrm{Top}(u', 2^x)$.


As the total length and the number of 1’s over
F and F' is $O(g + 2^x)$ and $O(2^x)$, respectively,
we can encode them in $O(2^x \log(g/2^x))$ =
$O(2^x \log\log n)$ bits. Therefore, $|\mathrm{DS}^*_x|$ =
$O((n/(x 2^x \log^{\epsilon} n)) 2^x \log\log n)$ bits and
$\sum_{x=1}^{\log D} |\mathrm{DS}^*_x| = o(n)$ bits.
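
The two-bitmap encoding can be sketched as below. The lists stand in for data the index would obtain from $\mathrm{DS}_x$ and from decoding doc(·) of the extra leaves; the function names are hypothetical. One simplifying assumption is made and should be read as such: when a document occurs at several of the extra leaves, only its first occurrence is marked in F', so that the number of 1's stays bounded by the size of the encoded set.

```python
def encode(top_outer, top_inner, extra_docs):
    """top_outer: Top(u'', 2^x) as a set.
    top_inner: Top(u', 2^x) as a list in its stored order.
    extra_docs: doc(.) of the leaves in Leaf(u'') minus Leaf(u'), in leaf order."""
    F = [1 if d in top_outer else 0 for d in top_inner]
    inner, seen, F_prime = set(top_inner), set(), []
    for d in extra_docs:
        hit = d in top_outer and d not in inner and d not in seen
        F_prime.append(1 if hit else 0)
        if hit:
            seen.add(d)              # mark only the first leaf of each document
    return F, F_prime

def decode(F, F_prime, top_inner, extra_docs):
    """Recover Top(u'', 2^x) from the bitmaps and the already available lists."""
    from_inner = {d for bit, d in zip(F, top_inner) if bit}
    from_leaves = {d for bit, d in zip(F_prime, extra_docs) if bit}
    return from_inner | from_leaves

top_inner = [5, 2, 9, 7]
extra_docs = [3, 5, 3, 8, 1]
F, F_prime = encode({5, 3, 9, 8}, top_inner, extra_docs)
print(F, F_prime, decode(F, F_prime, top_inner, extra_docs))
```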


Lemma 3 By maintaining a |CSA| + o(n) +
D log(n/D) + O(D) bits space structure (which
includes the space of CSA and B), a candidate
set of size $O(k \log k \log^{\epsilon} n)$ can be generated for
any query (P, k) in time $O(t_{SA} \cdot k \log k \log^{\epsilon} n)$.


We now turn our attention to Step 2 of the
query algorithm. Let [sp, ep] be the suffix range
of P in CSA and $[sp_d, ep_d]$ be the suffix range
of P in $\mathrm{CSA}_d$, the compressed suffix array of
$T_d$. Hon et al. [3] showed that by additionally
maintaining all $\mathrm{CSA}_d$’s (in space roughly
≈ |CSA| bits), any $[sp_d, ep_d]$ can be computed
in time $O(t_{SA} \log n)$ (and thus $\mathrm{tf}(P, d) =
ep_d - sp_d + 1$). Belazzougui et al. [1] improved
this time to $O(t_{SA} \log\log n)$ using o(n) extra
bits. Combined with Lemma 3, this gives the
following.


Theorem 1 ([4]) Using a 2|CSA| + o(n) +
D log(n/D) + O(D) bits space index,
top-k frequent document retrieval queries can be
answered in $O(\mathrm{search}(p) + k \cdot t_{SA} \log k \log^{\epsilon} n)$
time.


Space-Optimal Index 

Space-optimal indexes essentially circumvent the
need for the $\mathrm{CSA}_d$’s. We first present a simplified
version of Tsur’s index (with slightly worse query
time). To handle type-x queries, first identify the
set of nodes $\mathrm{Mark}_g$ based on a grouping factor
g (to be fixed later). Tsur proved that each node
$u' \in \mathrm{Mark}_g$ can be associated with a set $\mathrm{Set}(u')$
of $O(2^x + \sqrt{2^x g})$ document identifiers, such
that $\mathrm{Set}(u'_P)$ represents a candidate set, where
$u'_P$ is the highest node in the subtree of $u_P$ that
belongs to $\mathrm{Mark}_g$. Therefore, we can store $\mathrm{Set}(u')$
for every $u' \in \mathrm{Mark}_g$ along with $\mathrm{tf}(u', d)$ of
every $d \in \mathrm{Set}(u')$ in $O((n/g)(2^x + \sqrt{2^x g}) \log n)$
bits. Now a type-x query can be processed as
follows:


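1. Find $u'_P$, the highest node in the subtree of $u_P$
that belongs to $\mathrm{Mark}_g$.

2. Extract $\mathrm{Set}(u'_P)$ and $\mathrm{tf}(u'_P, d)$ of all
$d \in \mathrm{Set}(u'_P)$.

3. Scan the leaves in $\mathrm{Leaf}(u_P) \setminus \mathrm{Leaf}(u'_P)$,
decode the corresponding doc(·) values,
and compute $\mathrm{tf}(u_P, d) - \mathrm{tf}(u'_P, d)$ for all
$d \in \mathrm{Set}(u'_P)$.

4. Then obtain $\mathrm{tf}(u_P, d) = \mathrm{tf}(u'_P, d) +
(\mathrm{tf}(u_P, d) - \mathrm{tf}(u'_P, d))$ for all $d \in \mathrm{Set}(u'_P)$.

5. Report the k documents within $\mathrm{Set}(u'_P)$ with the
highest $\mathrm{tf}(u_P, \cdot)$ values as output.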
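
Steps 2 to 5 amount to adding, to each stored frequency, the number of occurrences found among the leaves outside the marked node. The sketch below shows only this bookkeeping. It assumes that $\mathrm{Set}(u'_P)$ and the stored $\mathrm{tf}(u'_P, \cdot)$ values are handed in as a dictionary, since the construction of Tsur's sets is not described here, and it models nodes by leaf ranges over a document array DA; the names are illustrative.

```python
from collections import Counter

def answer_query(DA, sp, ep, a, b, stored_tf, k):
    """[sp, ep]: leaf range of u_P; [a, b]: leaf range of the marked node u'_P.
    stored_tf maps every d in Set(u'_P) to tf(u'_P, d)."""
    # Step 3: scan the leaves of u_P outside u'_P and count their documents.
    extra = Counter(DA[sp:a]) + Counter(DA[b + 1:ep + 1])
    # Step 4: tf(u_P, d) = tf(u'_P, d) + (tf(u_P, d) - tf(u'_P, d)).
    tf = {d: c + extra[d] for d, c in stored_tf.items()}
    # Step 5: report the k documents of Set(u'_P) with the highest tf(u_P, .).
    best = sorted(tf, key=lambda d: tf[d], reverse=True)[:k]
    return [(d, tf[d]) for d in best]

DA = [1, 2, 1, 3, 1, 2, 2, 3, 3, 3, 1, 2]
stored = {1: 1, 2: 2, 3: 1}          # frequencies inside DA[4..7] = [1, 2, 2, 3]
print(answer_query(DA, sp=1, ep=10, a=4, b=7, stored_tf=stored, k=2))
```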


In summary, an $O((n/g)(2^x + \sqrt{2^x g}) \log n)$-bit
structure (along with GST and B) can answer
any type-x query in
$O(\mathrm{search}(p) + (|\mathrm{Set}(u'_P)| + |\mathrm{Leaf}(u_P) \setminus \mathrm{Leaf}(u'_P)|) \cdot t_{SA})$
$= O(\mathrm{search}(p) + (2^x + \sqrt{2^x g} + g) \cdot t_{SA})$
$= O(\mathrm{search}(p) + (g + 2^x) \cdot t_{SA})$ time.
By fixing $g = x^2 2^x \log^{2+\epsilon} n$, the
query time can be bounded by
$O(\mathrm{search}(p) + k \cdot t_{SA} \log^2 k \log^{2+\epsilon} n)$
and the overall space corresponding
to $x = 1, 2, 3, \ldots, \log D$ is o(n) bits.
We remark that the index originally proposed by
Tsur is even faster.

Navarro and Thankachan [8] observed that
each document identifier and the associated tf(·, ·)
value can be compressed into O(log log n) bits.
For compressing document identifiers, ideas from
Hon et al. [4] were borrowed. For compressing
tf(·, ·) values, they introduced an o(n)-bit
structure, called the sampled document array, that
can compute an approximate value of any tf(·, ·)
(denoted by tf*(·, ·)) in time O(log log n) within
an additive error of at most $\log^2 n$. This means
that, instead of storing tf(·, ·), storing tf(·, ·) -
tf*(·, ·) (in just O(log log n) bits) is sufficient.
In summary, by maintaining an
$O((n/g)(2^x + \sqrt{2^x g}) \log\log n)$-bit structure (along with GST,
B, and the sampled document array), any type-x
query can be answered in $O((g + 2^x) \cdot t_{SA})$ time.


Compressed Range Minimum Queries 


A similar analysis with $g = x^2 2^x \log^{\epsilon} n$ gives the
following result.


Theorem 2 ([8]) Top-k frequent document
retrieval queries can be answered in
$O(\mathrm{search}(p) + k \cdot t_{SA} \log^2 k \log^{\epsilon} n)$ time using a
|CSA| + o(n) + D log(n/D) + O(D)-bit index.


Problem Definition


Given a static array A of n totally ordered
objects, the range minimum query problem
(RMQ problem) is to build a data structure D on
A that allows us to answer efficiently subsequent
online queries of the form “what is the position of
a minimum element in the subarray ranging from
i to j?” (We consider the minimum; all results
hold for maximum as well.) Such queries are
denoted by $\mathrm{RMQ}_A(i, j)$ and are formally defined
by $\mathrm{RMQ}_A(i, j) = \mathrm{argmin}_{i \le k \le j} \{A[k]\}$ for an
array A[1, n] and indices $1 \le i \le j \le n$. In
the succinct or compressed setting, the goal is
to use as few bits as possible for D, hopefully
sublinear in the space needed for storing A
itself. The space for A is denoted by |A| and
is |A| = O(n log n) bits if A stores numbers
from a universe of size $n^{O(1)}$.


Indexing Versus Encoding Model 

There are two variations of the problem,
depending on whether the input array A is available
at query time (indexing model) or not (encoding
model). In the indexing model, some space for the
data structure D can in principle be saved, as the
query algorithm can substitute the “missing
information” by consulting A when answering the
queries, and this is indeed what all indexing data
structures make heavy use of. However, due to
the need to access A at query time, the total space
(which is |A| + |D| bits) will never be sublinear
in the space needed for storing the array A itself.

This is different in the encoding model, where 
the data structure D must be built in a way such 
that the query algorithm can derive its answers 
without consulting A. Such encoding data struc- 
tures are important when only the positions of 
the minima matter (and not the actual values) or 
when the access to A itself is slow. 

Any encoding data structure $D_E$ is automatically
also an indexing data structure; conversely,
an indexing data structure $D_I$ can always be
“converted” to an encoding data structure by storing
$D_I$ plus (a copy of) A. Hence, differentiating
between the two concepts only makes sense if
there are indexing data structures that use less
space than the best encoding data structures and
if there exist encoding data structures which use
space sublinear in |A|. Interestingly, for range
minimum queries, exactly this is the case.


Model of Computation 
All results assume the usual word RAM model of
computation with word size $\Omega(\log n)$ bits.


Key Results 


Table 1 summarizes the key results from [12]
by showing the sizes of data structures for range
minimum queries (left column). The first data
structure is in the indexing model, and the last
two are encoding data structures. The leading
terms (2n/c(n) bits with O(c(n)) query time
in the indexing model and 2n bits for arbitrary
query time in the encoding model) are optimal:
in the encoding model, this is rather easy to see
by establishing a bijection between the class
of binary trees and the class of arrays with
different answers for at least one RMQ [12],
and in the indexing model, Brodal et al. [4] prove
the lower bound. Particular emphasis is placed
on the additional space needed for constructing
the data structure (middle column), where it is
important to use asymptotically less space than
the final data structure.


Compressed Range Minimum Queries, Table 1 Data
structures [12] for range minimum queries, where |A|
denotes the space of the (read-only) input array A.
Construction space is in addition to the final space of the data
structure. All data structures can be constructed in O(n)
time, and query time is O(1) unless noted otherwise

Final space (bits)                              Construction space                 Comment
|A| + 2n/c(n) − Θ(n log log n / (c(n) log n))   O(log³ n)                          Query time O(c(n)) for c(n) = O(n^ε), 0 < ε < 1
2n + O(n log log n / log n)                     |A| + n + O(n log log n / log n)   Construction space improved to |A| + o(n) [8]
2n + O(n / polylog n)                           |A| + O(n)                         Using succinct data structures from [23] and [21]


Extensions


Surpassing the Lower Bound

Attempts have been made to break the lower
bound for special cases. If A is compressible
under the order-k empirical entropy measure
H_k(A), then the leading term 2n/c(n) of
the indexing data structure can also be compressed to
nH_k(A) [12]. Other results (in both the indexing
and the encoding model) exist for compressibility
measures based on the number of runs in A [2].
Davoodi et al. [8] show that random input arrays
can be encoded in expected 1.919n + o(n) bits
for RMQs. All of the above results retain constant
query times.


Top-k Queries

A natural generalization of RMQs is listing the
k smallest (or largest) values in a query range
(k needs only be specified at query time). In
the indexing model, any RMQ structure with
constant query time can be used to answer top-k
queries in O(k) time [16].

Recently, interest in encoding data structures
for top-k queries has increased.
For general k, Grossi et al. [15] show that
Ω(n log k) bits are needed for answering top-k
queries. Therefore, interesting encodings can
only exist if an upper limit κ on k is given at
construction time. This lower bound is matched
asymptotically by an encoding data structure
using O(n log k) bits and O(k) query time by
Navarro et al. [22]. For the specific case k = 2,
Davoodi et al. [8] provide a lower bound of
2.656n − O(log n) bits (using computer-assisted
search) and also give an encoding data structure
using at most 3.272n + o(n) bits supporting top-2
queries in O(1) time.
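
The indexing-model reduction from top-k queries to RMQs can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, a naive linear-scan `rmq` stands in for a constant-time RMQ structure, and a binary heap is used for simplicity, so the sketch spends O(k log k) time plus k RMQ calls rather than the O(k) bound of [16].

```python
import heapq


def rmq(A, lo, hi):
    """Stand-in for an O(1) RMQ structure: leftmost argmin of A[lo..hi]."""
    return min(range(lo, hi + 1), key=A.__getitem__)


def top_k_smallest(A, i, j, k):
    """Positions of the k smallest values in A[i..j], in increasing order
    of value (0-based, inclusive range)."""
    heap = []

    def push(lo, hi):
        # Each nonempty subrange is represented by its minimum.
        if lo <= hi:
            p = rmq(A, lo, hi)
            heapq.heappush(heap, (A[p], p, lo, hi))

    push(i, j)
    out = []
    while heap and len(out) < k:
        _, p, lo, hi = heapq.heappop(heap)
        out.append(p)
        # The next candidates lie to the left and right of position p.
        push(lo, p - 1)
        push(p + 1, hi)
    return out


if __name__ == "__main__":
    print(top_k_smallest([5, 2, 8, 1, 9, 3, 7], 1, 5, 3))
```

Each reported position splits its range into two subranges, so at most 2k + 1 RMQ calls are made in total.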


Range Selection 

Another generalization is queries asking for
the k-th smallest (or largest) value in the
query range (k is again part of the query).
This problem is harder for nonconstant k, as
Jørgensen and Larsen prove a lower bound of
Ω(log k / log log n) on the query time when using
O(n polylog n) words of space [17]. The
abovementioned space lower bounds on encoding
top-k queries also apply to range selection.
Again, Navarro et al. [22] give a matching upper
bound: O(n log κ) bits suffice to answer queries
asking for the k-th largest element in a query range
(k ≤ κ, where κ is the upper limit on k fixed at
construction time) in O(log k / log log n) time. Note that
this includes queries asking for the median in a
query range.


Higher Dimensions 

Indexing and encoding data structures for range
minima also exist for higher-dimensional arrays
(matrices) of total size N. Atallah and Yuan [25]
show an (uncompressed) indexing data structure
of size O(2^d d! N) words with O(3^d) query
time, where d is the dimension of the underlying
matrix A.

Tighter results exist in the two-dimensional
case [1], where A is an (m × n) matrix consisting
of N = m · n elements (w.l.o.g. assume m ≤
n). In the indexing model, the currently best
solution is a data structure of size |A| + O(N/c)
bits (1 ≤ c ≤ n) that answers queries in
O(c log c log² log c) time [3], still leaving a gap
to the highest lower bound of Ω(c) query
time for O(N/c) bits of space [4].

In the encoding model, a lower bound of
Ω(N log m) bits exists [4], but the best data
structure with constant query time achieves only
O(N min{m, log n}) bits of space [4], which
still leaves a gap unless m = n^Ω(1). However,
Brodal et al. [5] do show an encoding using
only O(N log m) bits, but nothing better than the
trivial O(N) can be said about its query time.
Special cases for small (constant) values of m
(e.g., optimal 5n + o(n) bits for m = 2) and also
for random input arrays are considered by Golin
et al. [14].


Further Extensions 

The indexing technique [12] has been generalized 
such that a specific minimum (e.g., the position 
of the median of the minima) can be returned if 
the minimum in the query range is not unique 
[11]. Further generalizations include functions 
other than the “minimum” on the query range, 
e.g., median [17], mode [6], etc. RMQs have 
also been generalized to edge-weighted trees [9], 
where now a query specifies two nodes v and w, 
and a minimum-weight edge on the path from v 
to w is sought. 


Applications 


Data structures for RMQs have many applica- 
tions. Most notably, the problem of preprocessing 
a tree for lowest common ancestor (LCA) queries 
is equivalent to the RMQ problem. In succinctly 
encoded trees (using balanced parentheses), 
RMQs can also be used to answer LCA queries 
in constant time; in this case, the RMQ structure 
is built on the virtual excess sequence of the 
parentheses and uses only o(n) bits in addition to
the parenthesis sequence [24]. Other applications 
of RMQs include document retrieval [20], 
succinct trees [21], compressed suffix trees [13], 
text-index construction [10], Lempel-Ziv text 
compression [e.g., 18], orthogonal range search- 
ing [19], and other kinds of range queries [7]. 
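
The connection between LCA queries and RMQs mentioned above can be illustrated with the classical Euler-tour reduction. The Python sketch below is only an illustration: the function names and the dictionary-based tree format are assumptions, and a linear scan stands in for a real RMQ structure on the depth sequence.

```python
def euler_tour(children, root):
    """Return (tour, depth, first): the Euler tour of the tree, the depth
    of each tour entry, and the first tour index of every node."""
    tour, depth, first = [root], [0], {root: 0}
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:  # step back up to the parent
                tour.append(stack[-1][0])
                depth.append(stack[-1][1])
        else:
            first[child] = len(tour)
            tour.append(child)
            depth.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, []))))
    return tour, depth, first


def lca(tour, depth, first, u, v):
    """LCA of u and v = node of minimum depth between their first visits."""
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=depth.__getitem__)  # RMQ on depths
    return tour[k]


if __name__ == "__main__":
    tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
    t, dep, fst = euler_tour(tree, "a")
    print(lca(t, dep, fst, "d", "e"), lca(t, dep, fst, "d", "f"))
```

Replacing the linear scan by a constant-time RMQ structure on the depth array gives constant-time LCA queries.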
Problem Definition


The problem is to represent a graph of a given
size and type (e.g., an n-node planar graph)
in a compressed form while still supporting
efficient navigation operations. More formally,
if G = (V, E) is a graph of a given type χ,
with n nodes and m edges, then represent G
using lg |χ| + o(lg |χ|) bits of space, and support
a set of appropriate operations on the graph in
constant time (assuming we can access O(lg n)
consecutive bits in one operation). This may not
be possible; if not, then explore what trade-offs 
are possible. Data structures that achieve this 
space bound are called succinct [13]. To simplify 
the statement of results, we assume the graph G 
in question contains no self-loops, and we also 
restrict ourselves to the static case. 


Key Results 


Outerplanar, Planar, and k-Page Graphs 

The area of succinct data structures was initiated 
by Jacobson [13], who presented a succinct rep- 
resentation of planar graphs. His approach was 
to decompose the planar graph into at most four 
one-page (or outerplanar) graphs by applying a 
theorem of Yannakakis [20]. Each one-page graph
is then represented as a sequence of balanced 
parentheses: this representation extends naturally 
to k-pages for k > 1. Using this representation, 
it is straightforward to support the following 
operations efficiently: 


* Adjacent(x, y): report whether there is an 
edge (x, y) ∈ E.

* Neighbors(x): report all vertices that are 
adjacent to vertex x. 

* Degree(x): report the degree of vertex x. 


Munro and Raman [16] improved Jacobson’s
balanced parenthesis representation, thereby
improving the constant factor in the space bound
for representing both k-page and planar graphs.
Table 1 compares a representative
selection of the various succinct data structures
for representing planar graphs. Subsequent
simplifications to the representation of balanced
parentheses have been presented; cf. [11, 17].
Barbay et al. [1] present results for larger values 
of k, as well as the case where the edges or 
vertices of the graph have labels. 

The decomposition into four one-page graphs 
is not the only approach to representing static 
planar graphs. Chuang et al. [7] presented an- 
other encoding based on canonical orderings of 
a planar graph and represented the graph using 
a multiple parentheses sequence (a sequence of 
balanced parentheses of more than one type). 
Later, Chiang et al. [6] generalized the notion 
of canonical orderings to orderly spanning trees, 
yielding improved constant factors in terms of 
n and m. Gavoille and Hanusse [10] presented 
an alternate encoding scheme for k-page graphs 
that yields a trade-off based on the number of 
isolated nodes (connected components with one 
vertex). Further improvements have been pre- 
sented by Chuang et al. [7], as well as Castelli 
Aleardi et al. [5], for the special case of planar 
triangulations. 

Blandford et al. [2] considered unlabelled
separable graphs. A separable graph is one
that admits an O(n^c) separator for c < 1.
Their structure occupies O(n) bits and performs
all three query types optimally. Subsequently,
Blelloch and Farzan [3] made the construction
of Blandford et al. [2] succinct in the sense
that, given a graph G from a separable class
of graphs χ (e.g., the class of arbitrary planar
graphs), their data structure represents G using
lg |χ| + o(m) bits. Interestingly, we need not
even know the value of lg |χ| in order to use this
representation.


Arbitrary Directed Graphs, DAGs, 

Undirected Graphs, and Posets 

We consider the problem of designing succinct
data structures for arbitrary digraphs. In a
directed graph, we refer to the set of vertices
{y : (x, y) ∈ E} as the successors of x and

Compressed Representations of Graphs, Table 1
Comparison of various succinct planar graph representations.
The second column indicates whether the representation
supports multigraphs. For the entries marked with a
†, the query cost is measured in bit accesses. Notation:
n is the number of vertices in G, m is the number of edges
in G, i is the number of isolated vertices in G, ε is an
arbitrary positive constant, τ = min{lg k / lg lg m, lg lg k},
and H is the information theoretic lower space bound for
storing a graph G drawn from a class of separable graphs

Type           Multi  Ref.  Space in bits                    Adjacent(x,y)    Neighbors(x)          Degree(x)
k-page         N      [13]  O(kn)                            O(lg n + k)†     O(deg(x) lg n + k)†   O(lg n)†
                      [10]  (2(m + i) + o(m + i)) lg k       O(τ lg k)        O(deg(x) τ)           O(1)
                      [1]   2m lg k + n + o(m lg k)          O(lg k lg lg k)  O(deg(x) lg lg k)     O(1)
                      [1]   (2 + ε) m lg k + n + o(m lg k)   O(lg k)          O(deg(x))             O(1)
                      [16]  2m + 2kn + o(kn)                 O(k)             O(deg(x) + k)         O(1)
Planar                [13]  O(n)                             O(lg n)†         O(deg(x) lg n)†       O(lg n)†
                      [10]  12n + 4i + o(n)                  O(1)             O(deg(x))             O(1)
                      [7]   3m + (5 + ε)n + o(n)             O(1)             O(deg(x))             O(1)
                      [6]   2m + 2n + o(m + n)               O(1)             O(deg(x))             O(1)
                      [16]  2m + 8n + o(n)                   O(1)             O(deg(x))             O(1)
                      [7]   2m + (5 + ε)n + o(n)             O(1)             O(deg(x))             O(1)
                      [6]   2m + 3n + o(m + n)               O(1)             O(deg(x))             O(1)
Triangulation         [7]   2m + n + o(n)                    O(1)             O(deg(x))             O(1)
                      [7]   2m + 2n + o(n)                   O(1)             O(deg(x))             O(1)
Separable             [2]   O(n)                             O(1)             O(deg(x))             O(1)
                      [3]   H + o(n)                         O(1)             O(deg(x))             O(1)



the set of vertices {y : (y, x) ∈ E} as the
predecessors of x.

It is well known that an arbitrary directed
graph can be represented using n × n bits by
storing its adjacency matrix. This representation
supports the Adjacency(x, y) operation
by probing a single bit in the table in constant
time. On the other hand, we can represent the
graph using Θ(m lg n) bits using an adjacency list
representation, such that the following operations
can be supported in constant time:


* Successor(x, i): list the ith successor of
vertex x

* Predecessor(x, i): list the ith predecessor
of vertex x


The information theoretic lower bound
dictates that essentially lg (n² choose m) bits are necessary
for representing an arbitrary digraph. By
representing each row (resp. column) of the adjacency
matrix using an indexable dictionary [19], we get
a data structure that supports Adjacency and
Successor (resp. Predecessor) queries in
constant time and occupies
lg (n² choose m) + o(lg (n² choose m))
bits of space. Note that we can only support
two of the three operations with this approach.
Farzan and Munro [9] showed that if Θ(n^ε) ≤
m ≤ Θ(n^(2−ε)) for some constant ε > 0, then
(1 + ε′) lg (n² choose m) bits are sufficient and required to
support all three operations. In a more general
setting, Golynski [12] had proven that the
difficulty in supporting both Successor and
Predecessor queries simultaneously using
succinct space relates to the fact that they have
the so-called reciprocal property. In other words,
if the graph is not extremely sparse or dense, then
it is impossible to support all three operations
succinctly. On the other hand, if m = o(n^ε)
or m = Ω(n²/lg^(1−ε) n), for some constant
ε > 0, then Farzan and Munro [9] showed that
succinctness can be achieved while supporting
the three operations.
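
To give a feeling for the three space figures discussed above, the following Python sketch compares the adjacency matrix (n² bits), a plain adjacency list that spends ⌈lg n⌉ bits per edge endpoint, and the information theoretic bound lg (n² choose m). The function name and the choice of example parameters are illustrative assumptions; pointer overheads of the adjacency list are ignored.

```python
from math import ceil, comb, log2


def digraph_space_bits(n, m):
    """Return (matrix_bits, list_bits, lower_bound_bits) for a digraph
    with n vertices and m edges."""
    matrix_bits = n * n                     # one bit per ordered pair
    list_bits = m * ceil(log2(n))           # one vertex label per edge
    lower_bound_bits = log2(comb(n * n, m)) # lg (n^2 choose m)
    return matrix_bits, list_bits, lower_bound_bits


if __name__ == "__main__":
    matrix, lists, lower = digraph_space_bits(1000, 5000)
    print(matrix, lists, round(lower))
```

For a sparse example such as n = 1000 and m = 5000, the lower bound is far below the n² bits of the matrix and somewhat below the edge-label count of the list, which is the gap that succinct representations aim to close.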

Suppose G is a directed acyclic graph instead
of an arbitrary digraph. In this case, one can
exploit the fact that the graph is acyclic by
ordering the vertices topologically. This ordering
induces an adjacency matrix which is upper
triangular. Exploiting this fact, when Θ(n^ε) ≤
m ≤ Θ(n^(2−ε)), the data structure of Farzan and
Munro can support all three operations using
(1 + ε) lg ((n choose 2) choose m) bits, for an arbitrary positive
constant ε, which is optimal. If m = o(n^ε) or
m = Ω(n²/lg^(1−ε) n), for some constant ε > 0,
then they achieve lg ((n choose 2) choose m)(1 + o(1)) bits. By
orienting the edges in an undirected graph so
that they are directed toward the vertex
with the larger label, this representation can also
be used to support the operations Adjacency
and Neighbors on an arbitrary undirected
graph.

A partial order or poset is a directed acyclic
graph with the additional transitivity constraint
on the set of edges E: if (x, y) and (y, z) are
present in E, then it is implied that (x, z) ∈ E.
Farzan and Fischer [8] showed that a poset can
be stored using 2nw(1 + o(1)) + (1 + ε)n lg n
bits of space, where w is the width of the poset
(i.e., the length of the maximum antichain)
and ε is an arbitrary positive constant. Their
data structure supports Adjacency queries in
constant time, as well as many other operations
in time proportional to w. This matches a lower
bound of Brightwell and Goodall [4] up to the
additive εn lg n term when n is sufficiently large
relative to w. For an arbitrary poset, Kleitman
and Rothschild showed that n²/4 + O(n) bits are
sufficient and necessary by a constructive
counting argument [14]. Munro and Nicholson [15, 18]
showed that there is a data structure that
occupies n²/4 + o(n²) bits, such that Adjacency,
Predecessor, and Successor queries can
be supported in constant time.
Problem Definition


Given a text string T = t_1 t_2 ... t_n over an
alphabet Σ of size σ, the suffix array A[1, n]
is a permutation of the interval [1, n] that
sorts the suffixes of T. More precisely, it
satisfies T[A[i], n] < T[A[i + 1], n] for all
1 ≤ i < n, where “<” between strings is
the lexicographical order. The suffix array is
the canonical full-text index that supports
efficient computation of basic string matching queries
on T.

The compressed suffix array (CSA) problem
asks to replace A with a space-efficient data
structure that is capable of efficiently computing
A[i].


Compressed Suffix Array 


If a CSA does not require T to operate, and is
capable of efficiently answering substring queries
on T, it is called a self-index, as it can be seen as a
replacement of T itself. Typical queries required
from such an index are the following:


* count(P): count how many times a given
pattern string P = p_1 p_2 ... p_m occurs in T.

* locate(P): return the locations where P
occurs in T.

* display(i, j): return T[i, j].


Key Results 


Ψ-Based CSAs

The first solution to the problem is by Grossi
and Vitter [8], who exploit the regularities of the
suffix array via the Ψ-function:


Definition 1 Given suffix array A[1, n], function
Ψ : [1, n] → [1, n] is defined so that, for all 1 ≤
i ≤ n, A[Ψ(i)] = A[i] + 1. The exception is
A[1] = n, in which case the requirement is that
A[Ψ(1)] = 1 so that Ψ is a permutation.


The following lemma shows that Ψ is amenable
to compression:


Lemma 1 Given a text T[1, n], its suffix array
A[1, n], and the corresponding function Ψ, it
holds Ψ(i) < Ψ(i + 1) whenever T[A[i]] =
T[A[i + 1]].
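
Definition 1 and Lemma 1 can be checked on a small example. The Python sketch below is an illustration only: the function names are hypothetical, the suffix array is built by naive sorting, and the text is assumed to end with a unique terminator `$` that is smaller than every other symbol (so that A[1] = n). Arrays are returned as Python lists whose entry at index i − 1 holds the 1-based value for position i.

```python
def suffix_array(T):
    """1-based suffix array of T, built by naive sorting."""
    return sorted(range(1, len(T) + 1), key=lambda p: T[p - 1:])


def psi(A):
    """Psi as in Definition 1: A[Psi(i)] = A[i] + 1, and A[Psi(1)] = 1."""
    n = len(A)
    inv = {a: i for i, a in enumerate(A, start=1)}  # inverse permutation
    return [inv[a + 1] if a < n else inv[1] for a in A]


if __name__ == "__main__":
    T = "abracadabra$"
    A = suffix_array(T)
    P = psi(A)
    print(A)
    print(P)
    # Lemma 1: Psi increases inside each run of equal first characters.
    print(all(P[i] < P[i + 1]
              for i in range(len(A) - 1)
              if T[A[i] - 1] == T[A[i + 1] - 1]))
```

Within each run of suffixes starting with the same character, the Ψ values form an increasing sequence, which is what makes differential encoding of Ψ effective.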


Grossi and Vitter used a hierarchical decomposition
of Ψ into h = ⌈log log n⌉ levels. The
piecewise increasing property of Ψ can be used
to represent each level of Ψ in (1/2) n log σ bits
[8]. By storing some sampled values of A in
the bottom level, any A[i] can be computed by
traversing the hierarchical structure. Other
trade-offs are possible using different numbers of levels.
The following one involves the use of a constant
number of levels:


Theorem 1 (inspired from [8]) The Compressed
Suffix Array of Grossi and Vitter
supports retrieving A[i] in O(log^ε n) time using
(1/ε) n log σ + O(n log log σ) bits of space, for any
0 < ε < 1.
As a consequence, simulating the classical
binary searches [13] to find the range of the suffix
array containing all the occurrences of a pattern
P[1, m] in T[1, n] can then be done in
O(m log^(1+ε) n) time.

Sadakane [16] shows how the above compressed
suffix array can be converted into a self-index,
and at the same time optimizes it in several
ways.

Sadakane represents both A and T using the
full function Ψ, and a few extra structures. Imagine
one wishes to compare P against T[A[i], n].
For the binary search, one needs to extract enough
characters from T[A[i], n] so that its lexicographical
relation to P is clear. Retrieving character
T[A[i]], given i, is easy. Use a bit vector
F[1, n] marking the positions i where the
first character of suffix A[i] differs from that of
suffix A[i − 1]. After
preprocessing F for rank-queries, computing
j = rank_1(F, i) tells us that T[A[i]] = c_j,
where c_j is the j-th smallest alphabet character.
Once T[A[i]] = c_j is determined this way, one
needs to obtain the next character, T[A[i] + 1].
But T[A[i] + 1] = T[A[Ψ(i)]], so one can simply
move to i′ = Ψ(i) and keep extracting characters
with the same method, as long as necessary. Note
that at most |P| = m characters suffice to decide
a comparison with P. Thus the binary search is
simulated in O(m log n) time.
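
The character extraction and the simulated binary search described above can be sketched in Python as follows. This is a toy illustration under stated assumptions: all names are hypothetical, the structures are built naively from the text, `rank1` is a linear scan instead of a constant-time rank structure, and T is assumed to end with a unique smallest terminator `$` that does not occur in the pattern.

```python
def build(T):
    """Build Psi, the bit vector F, and the sorted alphabet (all 1-based,
    index 0 is padding). The suffix array itself is discarded."""
    n = len(T)
    A = sorted(range(1, n + 1), key=lambda p: T[p - 1:])
    inv = {a: i for i, a in enumerate(A, start=1)}
    psi = [0] + [inv[a + 1] if a < n else inv[1] for a in A]
    F = [0] + [1 if i == 0 or T[A[i] - 1] != T[A[i - 1] - 1] else 0
               for i in range(n)]
    return psi, F, sorted(set(T))


def rank1(F, i):
    """Number of 1s in F[1..i] (linear scan; O(1) in a real index)."""
    return sum(F[1:i + 1])


def extract(psi, F, alphabet, i, length):
    """First `length` characters of the suffix with rank i, wrapping
    around cyclically after the terminator."""
    out = []
    for _ in range(length):
        out.append(alphabet[rank1(F, i) - 1])
        i = psi[i]
    return "".join(out)


def count(psi, F, alphabet, P):
    """Number of occurrences of P, by two binary searches over ranks."""
    n, m = len(psi) - 1, len(P)
    lo, hi = 1, n + 1
    while lo < hi:                      # first rank with prefix >= P
        mid = (lo + hi) // 2
        if extract(psi, F, alphabet, mid, m) < P:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, n + 1
    while lo < hi:                      # first rank with prefix > P
        mid = (lo + hi) // 2
        if extract(psi, F, alphabet, mid, m) <= P:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


if __name__ == "__main__":
    psi, F, alphabet = build("abracadabra$")
    print(extract(psi, F, alphabet, 4, 5), count(psi, F, alphabet, "abra"))
```

Note that neither `extract` nor `count` touches T or A after construction, which is the sense in which Ψ, F, and the alphabet replace the text.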

Up to now, the space used is n + o(n) +
σ log σ bits for F and Σ. Sadakane [16] gives an
improved representation for Ψ using O(nH_0 +
n log log σ) bits, where H_0 is the zeroth-order
entropy of T.

Sadakane also shows how A[i] can be retrieved,
by plugging in the hierarchical scheme
of Grossi and Vitter. He adds to the scheme the
retrieval of the inverse A^(−1)[j]. This is used in
order to retrieve arbitrary text substrings T[p, r], by
first applying i = A^(−1)[p] and then continuing as
before to retrieve the first r − p + 1 characters of suffix
T[A[i], n]. This capability turns the compressed
suffix array into a self-index. The following bound
is a modified version of Sadakane’s CSA taken
from [15]:


Theorem 2 The Compressed Suffix Array of
Sadakane is a self-index occupying (1/ε) nH_0 +
O(n log log σ) bits, and supporting retrieval
of values A[i] and A^(−1)[j] in O(log^ε n) time,
counting of pattern occurrences in O(m log n)
time, and displaying any substring of T of length
ℓ in O(ℓ + log^ε n) time. Here 0 < ε < 1 is an
arbitrary constant.


Grossi, Gupta, Vitter, and Foschini [6, 9] have
improved the space requirement of compressed
suffix arrays to depend on the k-th order entropy
H_k of T. The idea behind this improvement is a
more careful analysis of regularities captured by
the Ψ-function when combined with the indexing
capabilities of their new elegant data structure,
the wavelet tree. They obtain, among other results,
the following tradeoff:


Theorem 3 (Grossi, Gupta, and Vitter [9])
The Compressed Suffix Array of Grossi, Gupta,
and Vitter is a self-index of size (1/ε) nH_k +
o(n log σ) bits, that supports A[i] and A^(−1)[j] in
O(log^(1+ε) n / ε) time, count(P) in O(m log σ +
log^(2+ε) n / ε) time, and display(i, j) in O((j − i)/
log_σ n + log^(1+ε) n / ε) time. Here 0 < ε < 1 is
an arbitrary constant, k ≤ α log_σ n for some
constant 0 < α < 1.


They also obtain an interesting special case: 


Theorem 4 (Grossi, Gupta, and Vitter [9])
The space optimized Compressed Suffix Array
of Grossi, Gupta, and Vitter is a self-index
of size nH_k + o(n log σ) bits, that supports
A[i] and A^(−1)[j] in O(log² n / log log n) time,
count(P) in O(m log n log σ + log³ n / log log n)
time, and display(i, j) in O((j − i)/log_σ n +
log² n / log log n) time. Here k ≤ α log_σ n for
some constant 0 < α < 1.


In the above results, value k must be fixed
before building the indexes. Later, they notice
that a simple coding of Ψ-values yields an
nH_k-dependent bound without the need of fixing k
beforehand [6].


FM-Index 

A different solution to the problem (at least
on the surface) is obtained by exploiting the
connection between the Burrows-Wheeler Transform
(BWT) [2] and the suffix array data structure
[13]. The BWT is formed by a permutation
T^bwt of T defined as T^bwt[i] = T[A[i] −
1] for A[i] > 1 and T^bwt[i] = T[n]
for A[i] = 1. Without loss of generality,
one can assume that T ends with
T[n] = $, with $ being a distinct symbol
smaller than all other symbols in T. Then
T^bwt[1] = T[n − 1].

A property of the BWT is that symbols having
the same context (i.e., the string following them in
T) are consecutive in T^bwt. This makes it easy
to compress T^bwt, achieving space close to
high-order empirical entropies [14].

Ferragina and Manzini [3] discovered a way to 
combine the compressibility of the BWT and the 
indexing properties of the suffix array. The struc- 
ture is essentially a compressed representation of 
the BWT plus some small additional structures to 
make it searchable. 

To retrieve the whole text from the structure
(that is, to support display(1, n)), it is enough
to invert the BWT. For this purpose, let
us consider a table LF[1, n] defined such
that if T[i] is permuted to T^bwt[j] and
T[i − 1] to T^bwt[j′], then LF[j] = j′. It
is then immediate that T can be retrieved
backwards by printing $ · T^bwt[1] · T^bwt[LF[1]] ·
T^bwt[LF[LF[1]]] ...

To represent array LF space-efficiently, Ferragina
and Manzini noticed that each LF[i] can
be expressed as follows:


Lemma 2 (Ferragina and Manzini [3])
LF[i] = C(c) + rank_c(i), where c = T^bwt[i],
C(c) tells how many times symbols smaller than
c appear in T^bwt, and rank_c(i) tells how many
times symbol c appears in T^bwt[1, i].


It was later observed that LF is in fact the inverse
of Ψ.
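
The BWT definition, Lemma 2, and the backward retrieval of T can be put together in a short Python sketch. It is a toy illustration with hypothetical function names: the suffix array is built by naive sorting, LF is stored explicitly rather than computed from a compressed rank structure, and T is assumed to end with a unique smallest terminator `$`.

```python
def bwt(T):
    """BWT of T via the 1-based suffix array: T[A[i] - 1], or T[n] when
    A[i] = 1."""
    n = len(T)
    A = sorted(range(1, n + 1), key=lambda p: T[p - 1:])
    return "".join(T[a - 2] if a > 1 else T[n - 1] for a in A)


def lf_table(L):
    """LF[i] = C(c) + rank_c(i) with c = L[i] (1-based; LF[0] is padding)."""
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total               # symbols smaller than c
        total += L.count(c)
    seen, LF = {}, [0]
    for c in L:
        seen[c] = seen.get(c, 0) + 1   # rank_c up to this position
        LF.append(C[c] + seen[c])
    return LF


def invert_bwt(L):
    """Recover T backwards: $, L[1], L[LF[1]], L[LF[LF[1]]], ..."""
    LF = lf_table(L)
    out, i = ["$"], 1
    for _ in range(len(L) - 1):
        out.append(L[i - 1])
        i = LF[i]
    return "".join(reversed(out))


if __name__ == "__main__":
    L = bwt("abracadabra$")
    print(L, invert_bwt(L))
```

On this example, composing LF with the Ψ function of the same text gives the identity, in line with the observation that LF is the inverse of Ψ.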

It also happens that the very same two-part
expression of LF[i] enables efficient
count(P) queries. The idea is that if one knows
the range of the suffix array, say A[sp_i, ep_i],
such that the suffixes T[A[sp_i], n], T[A[sp_i +
1], n], ..., T[A[ep_i], n] are the only ones
containing P[i, m] as a prefix, then one can compute
the new range A[sp_(i−1), ep_(i−1)] where the suffixes
contain P[i − 1, m] as a prefix, as follows:
sp_(i−1) = C(P[i − 1]) + rank_P[i−1](sp_i − 1) + 1
and ep_(i−1) = C(P[i − 1]) + rank_P[i−1](ep_i). It
is then enough to scan the pattern backwards and
compute values C() and rank_c() 2m times to find
out the (possibly empty) range of the suffix array
where all the suffixes start with the complete P.
Returning ep_1 − sp_1 + 1 solves the count(P)
query without the need of having the suffix array
available at all.
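
The backward search can be sketched directly from the two update formulas. In the Python illustration below, the function name is hypothetical, the BWT string is given as input, and `rank` is a naive scan standing in for the compressed rank structures discussed later.

```python
def backward_count(L, P):
    """Count occurrences of P in the text whose BWT is L."""
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total                   # symbols smaller than c
        total += L.count(c)

    def rank(c, i):
        """Occurrences of c in L[1..i] (1-based, naive scan)."""
        return L[:i].count(c)

    sp, ep = 1, len(L)                 # range of the empty pattern
    for c in reversed(P):              # scan the pattern backwards
        if c not in C:
            return 0
        sp = C[c] + rank(c, sp - 1) + 1
        ep = C[c] + rank(c, ep)
        if sp > ep:                    # empty range: no occurrence
            return 0
    return ep - sp + 1


if __name__ == "__main__":
    # "ard$rcaaaabb" is the BWT of "abracadabra$".
    print(backward_count("ard$rcaaaabb", "abra"))
```

Each pattern symbol costs two rank computations, giving the 2m rank calls mentioned above.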

For locating each such occurrence A[i],
sp_1 ≤ i ≤ ep_1, one can compute the
sequence i, LF[i], LF[LF[i]], ..., until
LF^k[i] is a sampled suffix array position;
sampled positions can be marked in a bit
vector B such that B[LF^k[i]] = 1 indicates
that samples[rank_1(B, LF^k[i])] =
A[LF^k[i]], where samples is a compact
array storing the sampled suffix array values.
Then A[i] = A[LF^k[i]] + k =
samples[rank_1(B, LF^k[i])] + k. A similar
structure can be used to support display(i, j).
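
The locating mechanism can be sketched as follows. This Python illustration rests on several assumptions that are not part of the source: the names are hypothetical, text positions 1, 1 + s, 1 + 2s, ... are the ones sampled (so that position 1 is always sampled and every LF walk terminates), LF is stored explicitly, and rank on B is a naive scan.

```python
def build_sampled_index(T, s):
    """Return (LF, B, samples) for T, sampling every s-th text position.
    T must end with a unique smallest terminator '$'."""
    n = len(T)
    A = sorted(range(1, n + 1), key=lambda p: T[p - 1:])   # suffix array
    L = [T[a - 2] if a > 1 else T[n - 1] for a in A]        # BWT
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total
        total += L.count(c)
    seen, LF = {}, [0]                                      # LF[0]: padding
    for c in L:
        seen[c] = seen.get(c, 0) + 1
        LF.append(C[c] + seen[c])
    B = [0] + [1 if (a - 1) % s == 0 else 0 for a in A]     # sampled rows
    samples = [a for a in A if (a - 1) % s == 0]            # in row order
    return LF, B, samples


def locate_row(LF, B, samples, i):
    """Recover A[i] by walking LF until a sampled row is reached."""
    k = 0
    while B[i] == 0:
        i = LF[i]          # moves one text position to the left
        k += 1
    return samples[sum(B[1:i + 1]) - 1] + k


if __name__ == "__main__":
    LF, B, samples = build_sampled_index("abracadabra$", 4)
    print([locate_row(LF, B, samples, i) for i in range(1, 13)])
```

With sampling step s, each walk takes fewer than s LF steps, which is the source of the time/space trade-off controlled by the sampling rate.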

Values C() can be stored trivially in a table of
σ log n bits. T^bwt[i] can be computed in O(σ)
time by checking for which c it holds that rank_c(i) ≠
rank_c(i − 1). The suffix array sampling rate
can be chosen as s = O(log^(1+ε) n) so that the
samples require o(n) bits. The real challenge is
to preprocess the text for rank_c() queries. The
original proposal builds several small partial sum
data structures on top of the compressed BWT,
and achieves the following result:


Theorem 5 (Ferragina and Manzini [3])
The FM-Index (FMI) is a self-index of size
5nH_k + o(n log σ) bits that supports count(P)
in O(m) time, locate(P) in O(σ log^(1+ε) n) time
per occurrence, and display(i, j) in O(σ(j − i +
log^(1+ε) n)) time. Here σ = o(log n / log log n),
k ≤ log_σ(n / log n) − ω(1), and ε > 0 is an
arbitrary constant.


The original FM-Index has a severe restriction
on the alphabet size. This has been removed in
follow-up works. Conceptually, the easiest way
to achieve a more alphabet-friendly instance of
the FM-index is to build a wavelet tree [9] on
T^bwt. It allows one to simulate a single rank_c()
query or to obtain T^bwt[i] in O(log σ) time.
Some later enhancements have improved the time
requirement, so as to obtain, for example, the
following result:


Theorem 6 (Mäkinen and Navarro [11])
The CSA problem can be solved using a
so-called Succinct Suffix Array (SSA), of size
nH_0 + o(n log σ) bits that supports count(P)
in O(m(1 + log σ / log log n)) time, locate(P)
in O(log^(1+ε) n (1 + log σ / log log n)) time per
occurrence, and display(i, j) in O((j − i +
log^(1+ε) n)(1 + log σ / log log n)) time. Here
σ = o(n) and ε > 0 is an arbitrary constant.


Ferragina et al. [4] developed a technique
called compression boosting that finds an optimal
partitioning of T^bwt such that, when one
compresses each piece separately using its zero-order
model, the result is proportional to the k-th order
entropy. It was observed in [10] that a fixed block
partitioning achieves the same result.

Compression boosting can be combined with
the idea of SSA by building a wavelet tree
separately for each piece and some additional
structures in order to solve global rank_c() queries from
the individual wavelet trees:


Theorem 7 (Ferragina et al. [5]) The CSA
problem can be solved using a so-called
Alphabet-Friendly FM-Index (AF-FMI), of
size nH_k + o(n log σ) bits, with the same
time complexities and restrictions of SSA with
k ≤ α log_σ n, for any constant 0 < α < 1.


A careful analysis [12] reveals that the space of
the plain SSA is bounded by the same nH_k +
o(n log σ) bits, making the boosting approach to
achieve the same result unnecessary in theory.
By plugging in a better wavelet tree
implementation [7], the space of Theorem 7 can be improved
to nH_k + o(n) bits.

The wavelet tree is space-efficient, but it
cannot operate in time better than O(1 + log σ / log log n).
To achieve better performance, some other
techniques must be used. One example is the
following fastest FM-index with dominant term nH_k in
the space.


Theorem 8 (Belazzougui and Navarro [1])
The CSA problem can be solved using an index
of size nH_k + o(n log σ), that supports count(P)
in O(m) time, locate(P) in O(log_σ n log log n)
time per occurrence, and display(i, j) in
O((j − i) + log_σ n log log n) with k ≤ α log_σ n,
σ = O(n), and α is any constant such that
0 < α < 1.


Cross-References 


Burrows-Wheeler Transform

Rank and Select Operations on Bit Strings 
Rank and Select Operations on Sequences 
Suffix Trees and Arrays 

Wavelet Trees 


Recommended Reading 


1. Belazzougui D, Navarro G (2011) Alphabet-
independent compressed text indexing. In: ESA,
Saarbrücken, pp 748-759

2. Burrows M, Wheeler D (1994) A block sorting
lossless data compression algorithm. Technical report
124, Digital Equipment Corporation

3. Ferragina P, Manzini G (2005) Indexing compressed
texts. J ACM 52(4):552-581

4. Ferragina P, Giancarlo R, Manzini G, Sciortino M
(2005) Boosting textual compression in optimal
linear time. J ACM 52(4):688-713

5. Ferragina P, Manzini G, Mäkinen V, Navarro G
(2007) Compressed representations of sequences and
full-text indexes. ACM Trans Algorithms 3(2):20

6. Foschini L, Grossi R, Gupta A, Vitter JS (2006) When
indexing equals compression: experiments with
compressing suffix arrays and applications. ACM Trans
Algorithms 2(4):611-639

7. Golynski A, Raman R, Srinivasa Rao S (2008) On
the redundancy of succinct data structures. In: SWAT,
Gothenburg, pp 148-159

8. Grossi R, Vitter J (2006) Compressed suffix arrays
and suffix trees with applications to text indexing and
string matching. SIAM J Comput 35(2):378-407

9. Grossi R, Gupta A, Vitter J (2003) High-order
entropy-compressed text indexes. In: Proceedings of
the 14th annual ACM-SIAM symposium on discrete
algorithms (SODA), Baltimore, pp 841-850

10. Kärkkäinen J, Puglisi SJ (2011) Fixed block
compression boosting in FM-indexes. In: SPIRE, Pisa,
pp 174-184

11. Mäkinen V, Navarro G (2005) Succinct suffix arrays
based on run-length encoding. Nord J Comput
12(1):40-66

12. Mäkinen V, Navarro G (2008) Dynamic entropy-
compressed sequences and full-text indexes. ACM
Trans Algorithms 4(3):32

13. Manber U, Myers G (1993) Suffix arrays: a new
method for on-line string searches. SIAM J Comput
22(5):935-948

14. Manzini G (2001) An analysis of the Burrows-
Wheeler transform. J ACM 48(3):407-430

15. Navarro G, Mäkinen V (2007) Compressed full-text
indexes. ACM Comput Surv 39(1): Article 2

16. Sadakane K (2003) New text indexing functionalities
of the compressed suffix arrays. J Algorithms
48(2):294-313

Compressed Suffix Trees 


Luis M.S. Russo 

Departamento de Informatica, Instituto Superior 
Técnico, Universidade de Lisboa, Lisboa, 
Portugal 

INESC-ID, Lisboa, Portugal 


Keywords 


Compressed index; Data compression; Enhanced 
suffix array; Longest common prefix; Range 
minimum query; Succinct data structure; Suffix 
link 


Years and Authors of Summarized 
Original Work 


2007; Sadakane 

2009; Fischer, Mäkinen, Navarro
2010; Ohlebusch, Fischer, Gog 
2011; Russo, Navarro, Oliveira 


Problem Definition 


The problem consists in representing suffix trees 
in main memory. The representation needs to 
support operations efficiently, using a reasonable 
amount of space. 

Suffix trees were proposed by Weiner in 
1973 [16]. Donald Knuth called them the 
“Algorithm of the Year.” Their ubiquitous nature 
was quickly perceived and used to solve a myriad 
of string processing problems. The downside 
of this flexibility was the notorious amount of
space necessary to keep it in main memory. A 
direct implementation is several times larger 
than the file it is indexing. Initial research into 
this matter discovered smaller data structures, 
sometimes by sacrificing functionality, namely, 
suffix arrays [6], directed acyclic word graphs [5], 
or engineered solutions. 

A suffix tree is obtained from a sequence of 
characters T by considering all its suffixes. The 
suffixes are collated together by their common 
prefixes into a labeled tree. This means that 
suffixes that share a common prefix are united 
by that prefix and split only when the common 
prefix ends, i.e., in the first letter where they 
mismatch. A special terminator character $ is
placed at the end to force these mismatches for
the shorter suffixes of T. The string depth of a node
is the number of letters between the node and the
root. Figure 1 shows the suffix tree of the string
abbbab.

A viable representation needs to support sev- 
eral operations: tree navigation, such as finding a 
parent node, a child node, or a sibling node; label- 
ing, such as reading the letters along a branch or 
using a letter to choose a child node; and indexing 
operations, such as determining a leaf’s index or 
a node’s string depth. 


Compressed Suffix Trees, Fig. 1 Suffix tree of string
abbbab, with the leaves numbered. The arrow shows the
SLINK between nodes ab and b. Below we show the
suffix array. The portion of the tree corresponding to node
b and its leaf interval is within a dashed box


Besides the usual tree topology, suffix trees
contain suffix links. For a given node, the suffix 
link points to a second node. The string from 
the root to the second node can be obtained by 
removing the first letter of the string from the root 
to the first node. For example, the suffix link of 
node ab is node b. See Fig. 1. 

An acceptable lower bound to represent a
DNA sequence is 2n bits, i.e., n log σ bits (we
use logarithms in base 2) for a text of size n
with an alphabet of size σ. The size of the suffix
array for such a sequence, on a 32-bit machine,
is 16 times bigger than this bound. A
space-engineered suffix tree would be 40 times larger.
Succinct data structures are functional
representations whose space requirements are close to the
space needed to represent the data (i.e., close to
2n bits, in our example). If we consider the
order-k statistical compressibility of the text, then the
information-theory lower bound is even smaller,
nH_k + o(n) bits, where H_k is the order-k empirical
entropy of the text. Such representations require
data compression techniques, which need to be
made functional. The prospect of representing
suffix trees within this space was significant.


Objective 
Obtain a representation of a suffix tree that
requires n log σ + o(n) bits, or even nH_k + o(n)
bits, or close, and that supports all operations
efficiently.


Key Results 


Classical Results 

Suffix arrays [6] are a common alternative to 
suffix trees. They do not provide the same set of 
operations or the same time bounds, and still they 
require only 5 bytes per text character as opposed 
to the >10 bytes of suffix trees. 

The suffix array stores the lexicographical
order of the suffixes of T. Figure 1 shows the suffix
array SA of our running example, i.e., the suffixes
in the suffix tree are lexicographically ordered.
Suffix arrays lack node information. Still they
can represent nodes as an interval of suffixes, for
example, node b corresponds to the interval [3, 6].
This mapping is injective, i.e., no two nodes
can map to the same interval. Hence, a given
interval corresponds to no more than one node.
Some intervals do not correspond to any node, 
for example, [4, 6] does not correspond to a node 
on the suffix tree. To determine which intervals 
are invalid and speed up navigation operations, 
the suffix array can be augmented with longest 
common prefix (LCP) information. It is enough 
to store the length of the LCP between consecu- 
tive suffixes. For an arbitrary pair of suffixes, the 
LCP value can be computed as a range minimum 
query (RMQ) on the corresponding leaf interval. 
For example, LCP(3,5) can be computed as the 
minimum of 1,1,2. The suffix array enhanced 
with LCP information [2] can now be used to 
emulate several algorithms that required suffix 
tree navigation. 
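
The interval and LCP machinery above can be sketched in a few lines. This is a minimal illustration, not the construction used by any of the cited works: it builds the suffix array naively, assumes 0-based positions and the terminated text abbbab$ (which matches the interval [3, 6] given for node b), and the function names are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort the suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[i] = longest common prefix of the suffixes at sa[i - 1] and sa[i]; lcp[0] = 0."""
    def common(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def lcp_of_pair(lcp, i, j):
    """LCP of the suffixes at suffix-array positions i < j: a range minimum."""
    return min(lcp[i + 1 : j + 1])


def is_node_interval(lcp, left, right):
    """True if [left, right] (left < right) is the leaf interval of an internal node."""
    depth = lcp_of_pair(lcp, left, right)
    before = lcp[left] if left > 0 else -1          # LCP with the suffix just before
    after = lcp[right + 1] if right + 1 < len(lcp) else -1  # and just after
    return before < depth and after < depth


text = "abbbab$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
```

With these definitions, the interval [3, 6] is accepted as a node (string depth 1, node b), while [4, 6] is rejected because the suffix just before it shares the same prefix length.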

Another approach is to reduce suffix tree re- 
dundancy, by factoring repeated structures. The 
following lemma is used to build directed acyclic 
word graphs. 


Lemma 1 If the sub-trees rooted at nodes v and
v', of the suffix tree of T, have the same number
of leaves and v' is the suffix link of v, then the
sub-trees are isomorphic.


Succinct Results 

A fundamental component of compressed
suffix trees is the underlying compressed index,
which provides the suffix array information.
Additionally these indexes provide support for ψ
and LF, which are used to compute suffix links
and backward search. In fact, ψ is the
equivalent of suffix links over the suffix array, i.e.,
SA[ψ(i)] = SA[i] + 1. Moreover, LF is the
inverse of ψ, i.e., LF(ψ(i)) = ψ(LF(i)) = i.
There is a wide variety of these indexes,
depending on the underlying data compression
technique, namely, the Burrows-Wheeler transform,
δ-coding, or Lempel-Ziv. For a complete treatment
of these indexes, consult the survey by Navarro
and Mäkinen [7]. For our purposes, we consider
the following index.


Theorem 1 For a string T, over an alphabet
of size polylog n, there exists a suffix array
representation that requires nH_k + o(n) bits and
computes ψ, LF and retrieves letters T[SA[i]]
in O(1) time, while it obtains values SA[i] in
O(log n log log n) time.
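
The relations between ψ, LF, and SA can be checked on the running example. The sketch below computes both functions from a plain (uncompressed) suffix array, which a compressed index would of course avoid; it assumes 0-based positions, the terminated text abbbab$, and arithmetic modulo n so that the last suffix wraps around to the first.

```python
def suffix_array(text):
    """Naive construction: sort the suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_and_lf(sa):
    """Return (psi, lf) with SA[psi[i]] = SA[i] + 1 and SA[lf[i]] = SA[i] - 1 (mod n)."""
    n = len(sa)
    isa = [0] * n                      # inverse suffix array
    for i, p in enumerate(sa):
        isa[p] = i
    psi = [isa[(sa[i] + 1) % n] for i in range(n)]
    lf = [isa[(sa[i] - 1) % n] for i in range(n)]
    return psi, lf


sa = suffix_array("abbbab$")
psi, lf = psi_and_lf(sa)
```

On this example ψ maps the leaves 1 and 2 of node ab to the leaves 3 and 6, both inside the interval [3, 6] of node b, which is how ψ plays the role of a suffix link over the suffix array.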


Sadakane was the first to combine a com- 
pressed suffix array with a succinct tree and a 
succinct representation of string depth informa- 
tion. The combination of these three ingredients 
leads to the first succinct representation of suffix 
trees, which set the basic structure for later devel- 
opments. 


Theorem 2 ([14]) There is a compressed suffix
tree representation that requires nH_k + 6n + o(n)
bits and supports child in O(log? n log logn), 
string depth and edge letters in O(log n log log n) 
time, and the remaining operations in O(1) time. 


The 6n term is composed of 4n for the
succinct tree and 2n to store LCP values.
A tree can be represented succinctly as a
sequence of balanced parentheses. A suffix
tree contains at most 2n - 1 nodes; since
each node requires exactly 2 parentheses, this
accounts for the 4n term. For example, the
parentheses representation of the tree in Fig. 1
is ((0)((1)(2))((3)(4)((5)(6))));
the numbers correspond to the leaf indexes
and are not part of the representation; only the
parentheses are necessary. These parentheses
are encoded with bits, set to 0 or 1,
respectively.
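
As a small illustration of the parentheses encoding, the sketch below writes a tree as a balanced sequence and then as bits. The nested-list tree is an assumption: it is the suffix tree of abbbab$ as derived from its suffixes (root with children $, ab, and b, where b has a further internal child bb), and the helper name is hypothetical.

```python
def to_parentheses(node):
    """Encode a tree given as nested lists of children; a leaf is the empty list."""
    return "(" + "".join(to_parentheses(child) for child in node) + ")"


leaf = []
# root -> leaf 0, node "ab" (leaves 1, 2), node "b" (leaves 3, 4 and node "bb" with leaves 5, 6)
tree = [leaf, [leaf, leaf], [leaf, leaf, [leaf, leaf]]]

bp = to_parentheses(tree)
# Following the text, an open parenthesis is stored as 0 and a close parenthesis as 1.
bits = [0 if c == "(" else 1 for c in bp]
```

The tree has 11 nodes, so the sequence has 22 parentheses, within the 4n bound for n = 7.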

We refer to the position of T in the suffix
array as t, i.e., SA[t] = 0; in our running
example t = 2. A technique used to store the
SA values is to use the relation SA[ψ(i)] =
SA[i] + 1. This means that, if ψ is supported,
we can store the SA[ψ^{jl}(t)] values, with l =
log n log log n, and obtain the missing values in
at most l steps. The resulting SA values
require only n/log log n bits to store. To encode
the LCP of internal nodes, Sadakane used that
LCP(ψ(i), ψ(i) + 1) ≥ LCP(i, i + 1) - 1. Hence
LCP(ψ^k(i), ψ^k(i) + 1) + k forms a nondecreasing
sequence of numbers, which can be encoded in
at most 2n bits such that any element can be
accessed with a “select” operation on the bits
(find the position of the jth 1, which can be
solved in constant time with an o(n)-bit extra
index). Hence computing LCP requires
determining k and subtracting it from the number
in the sequence; this can be achieved with SA.
Subsequent research focused on eliminating the
6n term. Fischer, Mäkinen, and Navarro obtained
a smaller representation.


Theorem 3 ([4]) There is a compressed suffix
tree representation which, for any constant ε > 0,
requires nH_k(2 log(1/H_k) + (1/ε) + O(1)) +
o(n log σ) bits and supports all operations in
O(log^ε n) time, except level ancestor queries
(LAQ) which need O(log^{1+ε} n) time.


This bound is obtained by compressing the 
differential LCP values in Sadakane’s represen- 
tation and discarding the parentheses representa- 
tion. Instead it relies exclusively on range min- 
imum queries over the LCP values. The next 
smaller value (NSV) and previous smaller value 
(PSV) operations are presented to replace the 
need to find matching closing or opening
parentheses. Later, Fischer [3] further improved the
speed at which ε vanishes.

Russo, Navarro, and Oliveira [13] ob- 
tained the smallest representation by show- 
ing that LCA(SLINK(v), SLINK(v’)) = 
SLINK(LCA(v, v’)) holds for any nodes v and v’. 
LCA(v, v’) means the lowest common ancestor 
of nodes v and v’, and SLINK(v) means the 
suffix link of node v. This relation generalizes
Lemma 1.


Theorem 4 ([13]) There is a compressed suffix
tree representation that requires nH_k + o(n) bits
and supports child in O(log n (log log n)^2) and
the other operations in time O(log n log log n).


The reduced space requirements are obtained 
by storing information about a few sampled 
nodes. Information for the remaining nodes is 
computed with the property above. Only the 
sampled nodes are stored in a parentheses repre- 
sentation and moreover string depth is stored only 
for these nodes. Although Theorem 4 obtains 
optimal space, the logarithmic time is significant, 
in theory and in practice. This limitation was
recently improved by Navarro and Russo [9] with
a new approach to compare sampled nodes. They 
obtain O(polyloglog n) time for all operations 
within the same space, except for child, which 
retains the previous time bound. 


Applications 


There is a myriad of practical applications for suf- 
fix trees. An extensive description can be found 
in the book by Gusfield [5]. Nowadays, most 
applications are related to bioinformatics, due to 
the string-like nature of DNA sequences. Most of 
these problems can be solved with compressed 
suffix trees and in fact can only be computed 
in reasonable time if the ever-increasing DNA 
database can be kept in main memory with suffix 
tree functionality. 


Experimental Results 


The results we have described provided a firm 
ground for representing suffix trees efficiently 
in compressed space. Still, the goal was not purely
a theoretical endeavor, since these data structures
play a central role in the analysis of genomic
data, among others. In practice, several aspects of
computer architecture come into play, which can 
significantly impact the resulting performance. 
Different approaches can sacrifice space optimal- 
ity to be orders of magnitude faster than the 
smaller variants. 

Abeliuk, Cánovas, and Navarro [1] presented
an exhaustive experimental analysis of existing
CSTs. They obtained practical CSTs by
implementing the PSV and NSV operations
of Fischer, Mäkinen, and Navarro [4], with a
range min-max tree [10]. Their CSTs covered
a wide range in the space and time spectrum.
Roughly, they need 8-12 bits per character (bpc)
and perform the operations in microseconds.
Further, practical variants considered reducing
the 6n term of Sadakane’s representation by
using a single data structure that simultaneously
provides RMQ/PSV/NSV. Ohlebusch and Gog
[11] used only 2n + o(n) bits, obtaining around
10-12 bpc and operations in microseconds.
Ohlebusch, Fischer, and Gog [12] also used this
approach to obtain the same time performance of
Theorem 2, within 3n extra bits instead of 6n.
This yields around 16 bpc and operations running
in micro- to nanoseconds. An implementation
of Theorem 4 [13] needed around 4 bpc
but queries required milliseconds, although
better performance is expected for the new
version [9].

The proposal in [1] is also designed to adapt
efficiently to highly repetitive sequence
collections, obtaining 1-2 bpc and operations in
milliseconds. Also for repetitive texts, Navarro and
Ordóñez [8] obtained a speedup to microsecond
operations, with a slight space increase, 1-3 bpc,
by representing the parenthesis topology
explicitly but in grammar-compressed form.


URLs to Code and Data Sets 


An implementation of Sadakane’s compressed
suffix tree [14] is available from the SuDS
group at http://www.cs.helsinki.fi/group/suds/cst/.
Implementation details and engineering
decisions are described by Välimäki, Gerlach,
Dixit, and Mäkinen [15]. The compressed suffix
tree of Abeliuk, Cánovas, and Navarro [1] is
available in the libcds library at
https://github.com/fclaude/libcds.

An alternative implementation of Sadakane’s
CST is available in the Succinct Data Structure
Library at https://github.com/simongog/sdsl-lite.
The SDSL also contains an implementation of the
CST++ [12].

The Pizza and Chili site contains a large and
varied dataset to test compressed indexes,
http://pizzachili.dcc.uchile.cl.


Compressed Text Indexing


Problem Definition


Given a text string T = t_1 t_2 ... t_n over an
alphabet Σ of size σ, the compressed text
indexing (CTI) problem asks to replace T with
a space-efficient data structure capable of
efficiently answering basic string matching and
substring queries on T. Typical queries required from
such an index are the following:


* count(P): count how many times a given
pattern string P = p_1 p_2 ... p_m occurs in T.

* locate(P): return the locations where P occurs
in T.

* display(i, j): return T[i, j].


Key Results 


An elegant solution to the problem is obtained
by exploiting the connection of Burrows-Wheeler
Transform (BWT) [1] and Suffix Array data
structure [9]. The suffix array SA[1, n] of T is the
permutation of text positions (1...n) listing the
suffixes T[i, n] in lexicographic order. That is,
T[SA[i], n] is the ith smallest suffix. The BWT
is formed by (1) a permutation T^bwt of T defined
as T^bwt[i] = T[SA[i] - 1], where T[0] = T[n],
and (2) the number i* = SA^{-1}[1].

A property of the BWT is that symbols having
the same context (i.e., string following them in
T) are consecutive in T^bwt. This makes it easy
to compress T^bwt achieving space close to
high-order empirical entropies [10]. On the other hand,
the suffix array is a versatile text index, allowing
for example O(m log n) time counting queries
(using two binary searches on SA) after which one
can locate the occurrences in optimal time.

Ferragina and Manzini [3] discovered a way to 
combine the compressibility of the BWT and the 
indexing properties of the suffix array. The struc- 
ture is essentially a compressed representation of 
the BWT plus some small additional structures to 
make it searchable. 

We first focus on retrieving arbitrary
substrings from this compressed text representation,
and later consider searching capabilities. To
retrieve the whole text from the structure (that
is, to support display(1, n)), it is enough to
invert the BWT. For this purpose, let us consider
a table LF[1, n] defined such that if T[i] is
permuted to T^bwt[j] and T[i - 1] to T^bwt[j']
then LF[j] = j'. It is then immediate that T
can be retrieved backwards by printing T^bwt[i*] ·
T^bwt[LF[i*]] · T^bwt[LF[LF[i*]]] ... (by
definition T^bwt[i*] corresponds to T[n]).

To represent array LF space-efficiently,
Ferragina and Manzini noticed that each LF[i] can
be expressed as follows:


Lemma 1 (Ferragina and Manzini [3])
LF[i] = C(c) + rank_c(i), where c = T^bwt[i],
C(c) tells how many times symbols smaller than c
appear in T^bwt and rank_c(i) tells how many times
symbol c appears in T^bwt[1, i].
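
Lemma 1 and the inversion procedure can be tried out on a small text. The sketch below is only illustrative: it builds the BWT from a naively sorted suffix array, uses 0-based Python positions (so the lemma's value is shifted by one), and assumes the text ends with a unique terminator $ that is smaller than every other symbol. The function names are hypothetical.

```python
def bwt_via_suffix_array(text):
    """Return (bwt, i_star): bwt[i] = text[SA[i] - 1], i_star = row of the whole text."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[p - 1] for p in sa)   # p - 1 == -1 wraps to the last symbol
    return bwt, sa.index(0)


def lf_table(bwt):
    """LF[i] = C(c) + rank_c(i) with c = bwt[i], shifted to 0-based positions."""
    C, total = {}, 0
    for c in sorted(set(bwt)):
        C[c] = total                         # symbols smaller than c
        total += bwt.count(c)
    seen, lf = {}, []
    for c in bwt:
        seen[c] = seen.get(c, 0) + 1         # rank_c(i), counting position i itself
        lf.append(C[c] + seen[c] - 1)
    return lf


def invert_bwt(bwt, i_star):
    """Recover the text backwards: bwt[i*], bwt[LF[i*]], bwt[LF[LF[i*]]], ..."""
    lf, out, i = lf_table(bwt), [], i_star
    for _ in range(len(bwt)):
        out.append(bwt[i])
        i = lf[i]
    return "".join(reversed(out))


bwt, i_star = bwt_via_suffix_array("abbbab$")
```

Here `bwt[i_star]` is the last symbol of the text, matching the remark that T^bwt[i*] corresponds to T[n].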


General display(i, j) queries rely on a regular
sampling of the text. Every text position of the
form j'·s, s being the sampling rate, is stored
together with SA^{-1}[j'·s], the suffix array
position pointing to it. To solve display(i, j) we start
from the smallest sampled text position j'·s > j
and apply the BWT inversion procedure starting
with SA^{-1}[j'·s] instead of i*. This gives the
characters in reverse order from j'·s - 1 to i,
requiring at most j - i + s steps.

It also happens that the very same two-part
expression of LF[i] enables efficient count(P)
queries. The idea is that if one knows the
range of the suffix array, say SA[sp_i, ep_i], such
that the suffixes T[SA[sp_i], n], T[SA[sp_i + 1], n],
..., T[SA[ep_i], n] are the only ones
containing P[i, m] as a prefix, then one can compute
the new range SA[sp_{i-1}, ep_{i-1}] where the
suffixes contain P[i - 1, m] as a prefix, as follows:
sp_{i-1} = C(P[i - 1]) + rank_{P[i-1]}(sp_i - 1) + 1
and ep_{i-1} = C(P[i - 1]) + rank_{P[i-1]}(ep_i). It
is then enough to scan the pattern backwards and
compute values C() and rank_c() 2m times to find
out the (possibly empty) range of the suffix array
where all the suffixes start with the complete P.
Returning ep_1 - sp_1 + 1 solves the count(P)
query without the need of having the suffix array
available at all.
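
The backward search above can be sketched as follows. This is a didactic version under stated assumptions: rank_c is answered by scanning the BWT (a real index would use a compressed rank structure), the range is kept 1-based and inclusive as in the text, the search starts from the full range [1, n] for the empty pattern, and the text ends with a unique smallest terminator $. The function name is hypothetical.

```python
def count(text, pattern):
    """Number of occurrences of pattern in text, using only C() and rank_c() on the BWT."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[p - 1] for p in sa)          # the suffix array is not used below
    C = {c: sum(1 for x in bwt if x < c) for c in set(bwt)}

    def rank(c, i):
        """Occurrences of c in bwt[1..i] (1-based; i may be 0)."""
        return bwt[:i].count(c)

    sp, ep = 1, len(bwt)                            # range of the empty pattern
    for c in reversed(pattern):                     # scan the pattern backwards
        if c not in C:
            return 0
        sp = C[c] + rank(c, sp - 1) + 1
        ep = C[c] + rank(c, ep)
        if sp > ep:                                 # empty range: no occurrence
            return 0
    return ep - sp + 1
```

For example, on abbbab$ the pattern b yields the 1-based range [4, 7], which is the interval [3, 6] when positions are counted from 0.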

For locating each such occurrence SA[i],
sp_1 ≤ i ≤ ep_1, one can compute the
sequence i, LF[i], LF[LF[i]], ..., until
LF^k[i] is a sampled suffix array position and
thus it is explicitly stored in the sampling
structure designed for display(i, j) queries. Then
SA[i] = SA[LF^k[i]] + k. As we are virtually
moving sequentially on the text, we cannot do
more than s steps in this process.
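
A minimal sketch of this locating step, assuming 0-based positions, a text ending with a unique smallest terminator $, and a Python dictionary standing in for the sampling structure. Text position 0 is always sampled here, so the walk stops before it would wrap around. The names are hypothetical.

```python
def build_locate_index(text, s):
    """Return (lf, samples): the LF table and the SA entries at text positions divisible by s."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[p - 1] for p in sa]
    C = {c: sum(1 for x in bwt if x < c) for c in set(bwt)}
    seen, lf = {}, []
    for c in bwt:
        lf.append(C[c] + seen.get(c, 0))            # 0-based form of C(c) + rank_c(i)
        seen[c] = seen.get(c, 0) + 1
    samples = {i: p for i, p in enumerate(sa) if p % s == 0}
    return lf, samples


def locate(i, lf, samples):
    """Recover SA[i] as SA[LF^k[i]] + k, where LF^k[i] is the first sampled position."""
    k = 0
    while i not in samples:
        i = lf[i]                                   # one step back in the text
        k += 1
    return samples[i] + k


lf, samples = build_locate_index("abbbab$", 3)
```

Each call to `locate` performs fewer than s applications of LF, since every s-th text position is sampled.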

Now consider the space requirement.
Values C() can be stored trivially in a table
of σ log_2 n bits. T^bwt[i] can be computed
in O(σ) time by checking for which c it holds that
rank_c(i) ≠ rank_c(i - 1). The sampling rate
can be chosen as s = O(log^{1+ε} n) so that the
samples require o(n) bits. The only real challenge
is to preprocess the text for rank_c() queries. This
has been a subject of intensive research in recent
years and many solutions have been proposed.
The original proposal builds several small partial
sum data structures on top of the compressed
BWT, and achieves the following result:


Theorem 2 (Ferragina and Manzini [3]) The
CTI problem can be solved using a so-called
FM-Index (FMI), of size 5nH_k + o(n log σ) bits, that
supports count(P) in O(m) time, locate(P)
in O(σ log^{1+ε} n) time per occurrence, and
display(i, j) in O(σ(j - i + log^{1+ε} n)) time.
Here H_k is the kth order empirical entropy of
T, σ = o(log n / log log n), k ≤ log_σ(n / log n)
- ω(1), and ε > 0 is an arbitrary constant.


The original FM-Index has a severe restriction
on the alphabet size. This has been removed in
follow-up works. Conceptually, the easiest way
to achieve a more alphabet-friendly instance of
the FM-index is to build a wavelet tree [5] on
T^bwt. This is a binary tree on Σ such that each
node v handles a subset S(v) of the alphabet,
which is split among its children. The root
handles Σ and each leaf handles a single symbol.
Each node v encodes those positions i so that
T^bwt[i] ∈ S(v). For those positions, node v
only stores a bit vector telling which go to the
left, which to the right. The node bit vectors are
preprocessed for constant time rank_1() queries
using o(n)-bit data structures [6, 12]. Grossi
et al. [4] show that the wavelet tree built using
the encoding of [12] occupies nH_0 + o(n log σ)
bits. It is then easy to simulate a single
rank_c() query by log_2 σ rank_1() queries. With
the same cost one can obtain T^bwt[i]. Some
later enhancements have improved the time
requirement, so as to obtain, for example, the
following result:


Theorem 3 (Mäkinen and Navarro [7]) The
CTI problem can be solved using a so-called
Succinct Suffix Array (SSA), of size nH_0 + o(n log σ)
bits, that supports count(P) in O(m(1 + log σ /
log log n)) time, locate(P) in O(log^{1+ε} n log σ /
log log n) time per occurrence, and display(i, j)
in O((j - i + log^{1+ε} n) log σ / log log n) time.
Here H_0 is the zero-order entropy of T, σ = o(n),
and ε > 0 is an arbitrary constant.
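
The wavelet-tree simulation of rank_c() described before Theorem 3 can be sketched as below. This is a plain, uncompressed illustration: prefix sums of ones stand in for the o(n)-bit rank structures, the alphabet is split in halves by sorted order (one of several possible splits), and the class name is hypothetical.

```python
class WaveletTree:
    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.bits = None                       # stays None at a leaf
        if len(self.alphabet) <= 1:
            return
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # Bit 0: the symbol goes to the left child; bit 1: to the right child.
        self.bits = [0 if c in left_syms else 1 for c in seq]
        self.ones = [0]                        # ones[i] = rank_1 over the first i bits
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([c for c in seq if c in left_syms], self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c not in left_syms], self.alphabet[mid:])

    def rank(self, c, i):
        """Occurrences of symbol c among the first i symbols of the sequence."""
        if self.bits is None:                  # leaf: every symbol here is alphabet[0]
            return i if self.alphabet and self.alphabet[0] == c else 0
        mid = len(self.alphabet) // 2
        if c in self.alphabet[:mid]:
            return self.left.rank(c, i - self.ones[i])   # zeros among the first i bits
        return self.right.rank(c, self.ones[i])          # ones among the first i bits


wt = WaveletTree("bb$abba")
```

Each call descends one root-to-leaf path, so with a balanced split it performs about log_2 σ binary rank operations.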


Ferragina et al. [2] developed a technique
called compression boosting that finds an
optimal partitioning of T^bwt such that, when one
compresses each piece separately using its
zero-order model, the result is proportional to the kth
order entropy. This can be combined with the
idea of SSA by building a wavelet tree separately
for each piece and some additional structures in
order to solve global rank_c() queries from the
individual wavelet trees:


Theorem 4 (Ferragina et al. [4]) The CTI
problem can be solved using a so-called
Alphabet-Friendly FM-Index (AF-FMI), of
size nH_k + o(n log σ) bits, with the same
time complexities and restrictions of SSA with
k ≤ α log_σ n, for any constant 0 < α < 1.

A very recent analysis [8] reveals that the space
of the plain SSA is bounded by the same
nH_k + o(n log σ) bits, making the boosting
approach to achieve the same result unnecessary
in theory. In practice, implementations of [4, 7]
are superior by far to those building directly on
this simplifying idea.


Applications 


Applications include sequence analysis in
bioinformatics, search and retrieval on oriental and
agglutinating languages, multimedia streams, and
even structured and traditional database scenarios.


URL to Code and Data Sets 


The Pizza-Chili site, http://pizzachili.dcc.uchile.cl or
http://pizzachili.di.unipi.it, contains a collection
of standardized library implementations as well
as data sets and experimental comparisons.


Compressed Tree Representations


Problem Definition


The problem is, given a tree, to encode it
compactly so that basic operations on the tree are
done quickly, preferably in constant time for
static trees. Here, we consider the most basic
class of trees: rooted ordered unlabeled trees.
The information-theoretic lower bound for
representing an n-node ordered tree is 2n - o(n)
bits because there are \binom{2n-2}{n-1}/n different trees.
Therefore, the aim is to encode an ordered tree
in 2n + o(n) bits including auxiliary data
structures so that basic operations are done quickly.
We assume that the computation model is the
O(log n)-bit word RAM, that is, memory access
for consecutive O(log n) bits and arithmetic and
logical operations on two O(log n)-bit integers
are done in constant time.


Preliminaries 

Let X be a string on alphabet A. The number of
occurrences of c ∈ A in X[1...i] is denoted by
rank_c(X, i), and the position of the j-th c from the
left is denoted by select_c(j) (with select_c(0) =
0). For binary strings (|A| = 2) of length n, rank
and select are computed in constant time using
an n + o(n)-bit data structure [4]. Let us define
for simplicity prev_c(i) = select_c(rank_c(i - 1))
and next_c(i) = select_c(rank_c(i) + 1), the position
of the c preceding and following, respectively,
position i in X.


Key Results 


Basically, there are three representations of
ordered trees: LOUDS (Level-Order Unary Degree
Sequence) [11], DFUDS (Depth-First Unary
Degree Sequence) [2], and BP (Balanced
Parenthesis sequence) [16]. An example is shown in Fig. 1.
All these representations are succinct, that is, of
2n + o(n) bits. However, their functionality is
slightly different.


Compressed Tree Representations, Fig. 1 Succinct
representations of trees


LOUDS 

In LOUDS representation, the degree of each
node is encoded by a unary representation, that
is, a node with d children is encoded in d ones,
followed by a zero. Codes for the nodes are stored
in level order: the root node is encoded first, then
its children are encoded from left to right, all the
nodes at depth 2 are encoded next, and so on.
Let L be the LOUDS representation of a tree.
Then L is a 0,1-string of length 2n - 1. Tree
navigation operations are expressed by rank and
select operations. The i-th node in level order is
represented by select_0(i - 1) + 1. The operations
isleaf(i), parent(i), first_child(i), last_child(i),
next_sibling(i), prev_sibling(i), degree(i),
child(i, q), and child_rank(i) (see Table 1) are
computed in constant time using rank and select
operations, for example, degree(i) = next_0(i - 1)
- i, child(i, q) = select_0(rank_1(i - 1 + q)) + 1,
and parent(i) = prev_0(select_1(rank_0(i - 1))) + 1.
However, because nodes are stored in level
order, other operations cannot be done efficiently,
such as depth(i), subtree_size(i), lca(i, j), etc.
A merit of the LOUDS representation is that 
in practice it is fast and simple to implement 
because all the operations are done by only rank 
and select operations. 
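
The LOUDS formulas can be exercised on a small tree. The sketch below is a minimal illustration with linear-time rank and select (a succinct implementation would answer them in constant time); positions are 1-based as in the text, and the example tree, a root with children A, B, C where A has two children and C has one, is an assumption chosen for illustration.

```python
class Louds:
    def __init__(self, bits):
        self.b = [None] + [int(x) for x in bits]    # 1-based positions

    def rank(self, c, i):
        """Occurrences of bit c in positions 1..i."""
        return sum(1 for x in self.b[1 : i + 1] if x == c)

    def select(self, c, j):
        """Position of the j-th bit c, with select(c, 0) = 0."""
        if j == 0:
            return 0
        seen = 0
        for p in range(1, len(self.b)):
            if self.b[p] == c:
                seen += 1
                if seen == j:
                    return p
        raise ValueError("fewer than j occurrences")

    def prev(self, c, i):
        return self.select(c, self.rank(c, i - 1))

    def nxt(self, c, i):
        return self.select(c, self.rank(c, i) + 1)

    def node(self, k):
        """Position representing the k-th node in level order."""
        return self.select(0, k - 1) + 1

    def degree(self, i):
        return self.nxt(0, i - 1) - i

    def child(self, i, q):
        return self.select(0, self.rank(1, i - 1 + q)) + 1

    def parent(self, i):
        return self.prev(0, self.select(1, self.rank(0, i - 1))) + 1


# Level-order degrees 3, 2, 0, 1, 0, 0, 0 give the unary codes 1110 110 0 10 0 0 0.
louds = Louds("1110110010000")
```

The sequence has 13 = 2n - 1 bits for the n = 7 nodes, and node C (the fourth in level order) is represented by position 9.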
BP Representation

In BP representation, the tree is traversed in
depth-first order, appending an open parenthesis
“(” to the sequence when we reach a node and a
closing parenthesis “)” when we leave it. These
parentheses are represented with the bits 1 and 0,
respectively. The result is a sequence of 2n
parentheses that is balanced: for any open (resp. close)
parenthesis, there is a matching close (resp. open)
parenthesis to the right (resp. left), so that the
areas between two pairs of matching parentheses
either nest or are disjoint. Each node is identified
with the position of its open parenthesis.

Munro and Raman [16] showed how to
implement operations findclose(i), findopen(i), and
enclose(i) in constant time and 2n + o(n) bits in
total. Later, Geary et al. [8] considerably
simplified the solutions. With those operations and rank
and select support, many operations in Table 1 are
possible. For example, pre_rank(i) = rank_1(i),
pre_select(j) = select_1(j), isleaf(i) iff there is
a 0 at position i + 1, isancestor(i, j) if i ≤ j ≤
findclose(i), depth(i) = rank_1(i) - rank_0(i),
parent(i) = enclose(i), first_child(i) =
i + 1, next_sibling(i) = findclose(i) + 1,
subtree_size(i) = (findclose(i) - i + 1)/2, etc.
Operations lca, height(i), and deepest_node(i)
could be added with additional structures for
range minimum queries, rmqi(i, j), in o(n)
further bits [20]. Some other operations can
be also supported in constant time by adding
different additional structures. Lu and Yeh [14]
gave o(n)-bit data structures for degree(i),
child(i, q), and child_rank(i). Geary et al. [9]
gave o(n)-bit data structures for LA(i, d).
However, these extra structures are complicated
and add considerable extra space in practice.
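
The BP operations listed above can be illustrated with a naive implementation in which findclose and enclose scan the sequence (the cited structures answer them in constant time). Positions are 1-based, an open parenthesis counts as bit 1, and the example tree is the same hypothetical 7-node tree: a root with children A, B, C, where A has two children and C has one.

```python
class BP:
    def __init__(self, parens):
        self.p = [None] + [1 if c == "(" else 0 for c in parens]   # 1-based

    def rank(self, c, i):
        return sum(1 for x in self.p[1 : i + 1] if x == c)

    def select(self, c, j):
        positions = [k for k in range(1, len(self.p)) if self.p[k] == c]
        return positions[j - 1]

    def findclose(self, i):
        """Position of the close parenthesis matching the open one at i."""
        excess = 0
        for j in range(i, len(self.p)):
            excess += 1 if self.p[j] else -1
            if excess == 0:
                return j
        raise ValueError("unbalanced sequence")

    def enclose(self, i):
        """Open parenthesis of the parent of node i (0 if i is the root)."""
        excess = 0
        for j in range(i - 1, 0, -1):
            excess += 1 if self.p[j] else -1
            if excess == 1:                  # first unmatched open parenthesis to the left
                return j
        return 0

    def pre_rank(self, i): return self.rank(1, i)
    def pre_select(self, j): return self.select(1, j)
    def isleaf(self, i): return self.p[i + 1] == 0
    def isancestor(self, i, j): return i <= j <= self.findclose(i)
    def depth(self, i): return self.rank(1, i) - self.rank(0, i)
    def parent(self, i): return self.enclose(i)
    def first_child(self, i): return i + 1
    def next_sibling(self, i): return self.findclose(i) + 1
    def subtree_size(self, i): return (self.findclose(i) - i + 1) // 2


bp = BP("((()())()(()))")
```

Note that `next_sibling` returns a position holding a close parenthesis when the node has no next sibling, so a caller would check the bit at the returned position before using it as a node.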


DFUDS 

In DFUDS representation, nodes are also
encoded by a unary representation of their degrees,
but stored in depth-first order. The bits 1 and
0 are interpreted as open parenthesis “(” and
close parenthesis “)”, respectively. By adding a
dummy open parenthesis at the beginning, the
DFUDS sequence becomes balanced. Each node
is identified with the first position of its unary
description.


Compressed Tree Representations, Table 1
Operations supported by the data structure of [19]. The
time complexities are for the dynamic case; in the static
case, all operations take O(1) time. The first group, of
basic operations, is used to implement the others, but
could have other uses


Operation                          Description                                          Time complexity
                                                                                        Variant 1   Variant 2
inspect(i)                         P[i]
findclose(i) / findopen(i)         Position of parenthesis matching P[i]
enclose(i)                         Position of tightest open parent. enclosing i
rmqi(i, j) / RMQi(i, j)            Position of min/max excess value in range [i, j]
pre_rank(i) / post_rank(i)         Preorder/postorder rank of node i
pre_select(i) / post_select(i)     The node with preorder/postorder i
isleaf(i)                          Whether P[i] is a leaf
isancestor(i, j)                   Whether i is an ancestor of j
depth(i)                           Depth of node i
parent(i)                          Parent of node i
first_child(i) / last_child(i)     First/last child of node i
next_sibling(i) / prev_sibling(i)  Next/previous sibling of node i
subtree_size(i)                    Number of nodes in the subtree of node i
lca(i, j)                          The lowest common ancestor of two nodes i, j
deepest_node(i)                    The (first) deepest node in the subtree of i
height(i)                          The height of i (distance to its deepest node)
in_rank(i)                         Inorder of node i
in_select(i)                       Node with inorder i
leaf_rank(i)                       Number of leaves to the left of leaf i
leaf_select(i)                     i-th leaf
lmost_leaf(i) / rmost_leaf(i)      Leftmost/rightmost leaf of node i
LA(i, d)                           Ancestor j of i s.t. depth(j) = depth(i) - d         O(log n)
level_next(i) / level_prev(i)      Next/previous node of i in BFS order
level_lmost(d) / level_rmost(d)    Leftmost/rightmost node with depth d
degree(i)                          q = number of children of node i                                 O(log n)
child(i, q)                        q-th child of node i
child_rank(i)                      q = number of siblings to the left of node i
insert(i, j)                       Insert node given by matching parent. at i and j                 O(log n)
delete(i)                          Delete node i

A merit of using DFUDS is that it retains
many operations supported in BP, but degree(i),
child(i, q), and child_rank(i) are done in
constant time by using only the basic operations
supported by Munro and Raman [16]. For
example, degree(i) = next_0(i - 1) - i,
child(i, q) = findclose(i + degree(i) - q) + 1,
and parent(i) = prev_0(findopen(i - 1)) + 1.
Operation lca(i, j) can also be computed using
rmqi(i, j) [12].
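
The DFUDS formulas can be checked with a small sketch. One indexing convention has to be assumed for them to work out: here the dummy open parenthesis sits at position 0 and the root's description starts at position 1, so that select_0(0) = 0 points at the dummy. With that assumption, and with naive linear-time scans in place of the constant-time structures, the same hypothetical 7-node tree (preorder degrees 3, 2, 0, 0, 0, 1, 0) gives:

```python
class Dfuds:
    def __init__(self, degrees_in_preorder):
        self.b = [1]                              # dummy open parenthesis at position 0
        for d in degrees_in_preorder:
            self.b += [1] * d + [0]               # d open parentheses, then one close

    def rank0(self, i):
        return self.b[: i + 1].count(0)

    def select0(self, j):
        if j == 0:
            return 0
        zeros = [p for p, x in enumerate(self.b) if x == 0]
        return zeros[j - 1]

    def prev0(self, i):
        return self.select0(self.rank0(i - 1))

    def next0(self, i):
        return self.select0(self.rank0(i) + 1)

    def findclose(self, i):
        excess = 0
        for j in range(i, len(self.b)):
            excess += 1 if self.b[j] else -1
            if excess == 0:
                return j
        raise ValueError("unbalanced sequence")

    def findopen(self, i):
        excess = 0
        for j in range(i, -1, -1):
            excess += -1 if self.b[j] else 1
            if excess == 0:
                return j
        raise ValueError("unbalanced sequence")

    def degree(self, i):
        return self.next0(i - 1) - i

    def child(self, i, q):
        return self.findclose(i + self.degree(i) - q) + 1

    def parent(self, i):
        return self.prev0(self.findopen(i - 1)) + 1


dfuds = Dfuds([3, 2, 0, 0, 0, 1, 0])
```

In this example the root is position 1, its children A, B, C are positions 5, 10, 11, the children of A are positions 8 and 9, and the child of C is position 13.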


Another important feature of DFUDS is that
some trees can be represented in less than 2n
bits. The number of ordered trees having n_i
nodes with degree i (i = 0, 1, ..., n - 1) is

    \frac{1}{n} \binom{n}{n_0 \; n_1 \; \cdots \; n_{n-1}}

if \sum_{i \ge 0} n_i (i - 1) = -1, and 0
otherwise (i.e., there are no trees satisfying the
condition). Jansson et al. [12] proposed a
compression algorithm for DFUDS sequences that
achieves the lower bound

    \lg \left( \frac{1}{n} \binom{n}{n_0 \; n_1 \; \cdots \; n_{n-1}} \right) + o(n)

bits. For example, a full binary tree, that is,
a tree in which all internal nodes have exactly two
children, is encoded in n bits.

A demerit of DFUDS is that computing
depth(i) and LA(i, d) is complicated, though
it is possible in constant time [12].


Fully Functional BP Representation 

Navarro and Sadakane [19] proposed a data
structure for ordered trees using the BP
representation [16]. The data structure is called
range min-max tree. Let P be the BP sequence.
For each entry P[i] of the sequence, we define its
excess value as the number of open parentheses
minus the number of close parentheses in P[1, i].
If P[i] is an open parenthesis, its excess value
is equal to depth(i). Let E[i] denote the excess
value for P[i]. With the range min-max tree and
small additional data structures, they support in
constant time the operations fwd_search(i, d) =
min{{j > i | E[j] = E[i] + d} ∪ {n + 1}},
bwd_search(i, d) = max{{j < i | E[j] = E[i] + d}
∪ {0}}, and rmqi(i, j) = argmin_{i ≤ k ≤ j} E[k].
Then, the basic operations can be expressed as
findclose(i) = fwd_search(i, -1), findopen(i) =
bwd_search(i, 0) + 1, and enclose(i) =
bwd_search(i, -2) + 1. In addition to those
operations and rank/select, other operations
can be computed, such as LA(i, d) =
bwd_search(i, -d - 1) + 1. Operation child(i, q)
and related ones are also done in constant time
using small additional structures. The results are
summarized as follows:


Theorem 1 ([19]) For any ordinal tree with n
nodes, all operations in Table 1 except insert and
delete are carried out in constant time O(c) with
a data structure using 2n + O(n / log^c n) bits
of space on a O(log n)-bit word RAM, for any
constant c > 0. The data structure can be
constructed from the balanced parenthesis sequence
of the tree, in O(n) time using O(n) bits of space.
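
The reductions of findclose, findopen, enclose, and LA to excess searches, stated before Theorem 1, can be checked with a naive sketch that scans the excess array instead of using a range min-max tree. Positions are 1-based, the sentinel returned by a failed forward search is taken here to be one past the end of the sequence (an assumption about the intended bound), and the class name is hypothetical.

```python
class ExcessSearch:
    def __init__(self, parens):
        self.m = len(parens)                        # sequence length
        self.E = [0]                                # E[i] = excess of P[1..i]; E[0] = 0
        for c in parens:
            self.E.append(self.E[-1] + (1 if c == "(" else -1))

    def fwd_search(self, i, d):
        target = self.E[i] + d
        for j in range(i + 1, self.m + 1):
            if self.E[j] == target:
                return j
        return self.m + 1                           # sentinel: no such position

    def bwd_search(self, i, d):
        target = self.E[i] + d
        for j in range(i - 1, 0, -1):
            if self.E[j] == target:
                return j
        return 0                                    # sentinel: no such position

    def rmqi(self, i, j):
        """Leftmost position of the minimum excess in [i, j]."""
        return min(range(i, j + 1), key=lambda k: self.E[k])

    def findclose(self, i): return self.fwd_search(i, -1)
    def findopen(self, i): return self.bwd_search(i, 0) + 1
    def enclose(self, i): return self.bwd_search(i, -2) + 1
    def level_ancestor(self, i, d): return self.bwd_search(i, -d - 1) + 1


# Hypothetical 7-node tree: root with children A, B, C; A has two children, C has one.
es = ExcessSearch("((()())()(()))")
```

In this sketch `enclose` is meant for non-root nodes; applied to the root it simply returns position 1 again.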


Another merit of using the range min-max tree 
is that it is easy to dynamize. By using a balanced 
tree, we obtain the following: 


Theorem 2 ([19]) On a O(log n)-bit word RAM, 
all operations on a dynamic ordinal tree with 
n nodes can be carried out within the 
worst-case complexities given in Table 1, using a data 
structure that requires 2n + O(n log log n / log n) 
bits. Alternatively, all the operations of the table 
can be carried out in O(log n) time using 2n + 
O(n / log n) bits of space. 


The time complexity O(log n / log log n) is 
optimal [3]. 


Other Representations 

There are other representations of ordered trees 
and other types of trees [6,7, 10]. The idea is that 
the entire tree is partitioned into mini-trees using 
the tree cover technique [10], and mini-trees are 
again partitioned into micro-trees. 


Applications 


There are many applications of succinct ordered 
trees because trees are fundamental data struc- 
tures. A typical application is compressed suf- 
fix trees [20]. Balanced parentheses have also 
been used to represent planar and k-page graphs 
[16]. We also have other applications to encoding 
permutations and functions [17], grammar com- 
pression [15], compressing BDDs/ZDDs (binary 
decision diagrams/zero-suppressed BDDs) [5], 
etc. 


Open Problems 


An open problem is to give a dynamic data 
structure supporting all operations in the optimal 
O(log n/ log log n) time. 


Experimental Results 


Arroyuelo et al. [1] implemented LOUDS and the 
static version of the fully functional BP repre- 
sentation. LOUDS uses little space (as little as 
2.1n bits) and solves its operations in half a mi- 
crosecond or less, but its functionality is limited. 
The range min-max tree [19] requires about 2.4n 
bits and can be used to represent both BP and 
DFUDS. It solves all the operations within 1—2 
microseconds. Previous implementations [8, 18] 
require more space and are generally slower. 
Problem Definition 


Trees are a fundamental structure in computing. 
They are used in almost every aspect of mod- 
eling and representation for computations like 
searching for keys, maintaining directories, and 
representations of parsing or execution traces, 
to name just a few. One of the latest uses of 
trees is XML, the de facto format for data stor- 
age, integration, and exchange over the Internet 
(see http://www.w3.org/XML/). Explicit storage 
of trees, with one pointer per child as well as 
other auxiliary information (e.g., label), is often 
taken as given but can account for the dominant 
storage cost. Just to have an idea, a simple tree en- 
coding needs at least 16 bytes per tree node: one 
pointer to the auxiliary information (e.g., node 
label) plus three node pointers to the parent, the 
first child, and the next sibling. This large space 
occupancy may even prevent the processing of 
medium-sized trees, e.g., XML documents. This 
entry surveys the best-known storage solutions 
for unlabeled and labeled trees that are space 
efficient and support fast navigational and search 
operations over the tree structure. In the literature, 
they are referred to as succinct/compressed tree- 
indexing solutions. 


Notation and Basic Facts 

The information-theoretic storage cost for any 
item of a universe U can be derived via a 
simple counting argument: at least log |U| bits 
are needed to distinguish any two items of U. 
(Throughout the entry, all logarithms are taken 
to the base 2, and it is assumed 0 log 0 = 0.) 
Now, let T be a rooted tree of arbitrary degree 
and shape, and consider the following three main 
classes of trees: 


Ordinal trees: T is unlabeled and its children are 
left-to-right ordered. The number of ordinal 
trees on t nodes is $C_t = \binom{2t}{t}/(t + 1)$, 
which induces a lower bound of 2t - O(log t) 
bits 

Cardinal k-ary trees: T is labeled on its edges 
with symbols drawn from the alphabet Σ = 
{1,...,k}. Any node has degree at most k 
because the edges outgoing from each node 
have distinct labels. Typical examples of 
cardinal trees are the binary tree (k = 2), the 
(uncompacted) trie, and the Patricia tree. The 
number of k-ary cardinal trees on t nodes is 
$C_t^k = \binom{kt+1}{t}/(kt + 1)$, which induces 
a lower bound of t(log k + log e) - O(log kt) 
bits, when k is a slowly growing function 
of t 

(Multi-)labeled trees: T is an ordinal tree, labeled 
on its nodes with symbols drawn from the 
alphabet Σ. In the case of multi-labeled trees, 
every node has at least one symbol as its label. 
The same symbols may repeat among sibling 
nodes, so that the degree of each node is 
unbounded, and the same labeled subpath may 
occur many times in T, anchored anywhere. 
The information-theoretic lower bound on the 
storage complexity of this class of trees on t 
nodes comes easily from the decoupling of the 
tree structure and the storage of tree labels. 
For labeled trees, it is $\log C_t + t\log|\Sigma| = 
t(\log|\Sigma| + 2) - O(\log t)$ bits 
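
As a quick numeric check of these bounds, the following sketch 
evaluates the counting formula for ordinal trees exactly and 
compares it with 2t. The function names are illustrative 
assumptions, not part of the entry. 

```python
from math import comb, log2

def ordinal_tree_bits(t):
    """log2 of C_t = binom(2t, t) / (t + 1), the count used in the text."""
    return log2(comb(2 * t, t) // (t + 1))

def labeled_tree_bits(t, sigma):
    """Lower bound for labeled trees: log C_t + t log |Sigma|."""
    return ordinal_tree_bits(t) + t * log2(sigma)

# The gap 2t - log2(C_t) grows only logarithmically in t.
gap_1000 = 2 * 1000 - ordinal_tree_bits(1000)
```

For t = 1000 the gap is a handful of bits, against the 16 bytes 
per node of the pointer-based encoding mentioned earlier. 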


The following query operations should be 
supported over T: 


Basic navigational queries: They ask for the 
parent of a given node u, the ith child of u, 
and the degree of u. These operations may be 
restricted to some label c ∈ Σ, if T is labeled 

Sophisticated navigational queries: They ask for 
the jth level-ancestor of u, the depth of u, the 
subtree size of u, the lowest common ancestor 
of a pair of nodes, and the ith node according 
to some node ordering over T, possibly 
restricted to some label c ∈ Σ (if T is labeled). 
For even more operations, see [2, 14] 

Subpath query: Given a labeled subpath Π, it asks 
for the (number occ of) nodes of T that 
immediately descend from every occurrence of 
Π in T. Each subpath occurrence may be 
anchored anywhere in the tree (i.e., not 
necessarily in its root) 


The elementary solution to the tree-indexing 
problem consists of encoding the tree T via a 
mixture of pointers and arrays, thus taking a total 
of O(t log t) bits. This supports basic 
navigational queries in constant time, but it is not space 
efficient and requires visiting the whole tree to 
implement the subpath query or the more 
sophisticated navigational queries. Here, the goal is to 
design tree storage schemes that are either 
succinct, namely, “close to the information-theoretic 
lower bound” mentioned before, or compressed 
in that they achieve “entropy-bounded storage.” 
Furthermore, these storage schemes do not 
require visiting the whole tree for most 
navigational operations. Thus, succinct/compressed 
tree-indexing solutions are distinct from simply 
compressing the input and then uncompressing it 
at query time. 

In this entry, it is assumed that t ≥ |Σ|. The 
model of computation used is the random access 
machine (RAM) with word size O(log t), where 
one can perform various arithmetic and bit-wise 
Boolean operations on single words in constant 
time. 


Key Results 


The notion of succinct data structures was 
introduced by Jacobson [13] in a seminal work over 
25 years ago. He presented a storage scheme for 
ordinal trees using 2t + o(t) bits that supports 
basic navigational queries in O(log log t) time (i.e., 
parent, first child, and next sibling of a node). 
Later, Munro and Raman [16] closed the issue 
for ordinal trees on basic navigational queries 
and the subtree-size query by achieving constant 
query time and 2t + o(t) bits of storage. Their 
storage scheme is called the balanced parenthesis 
(BP) representation. (Some papers [Chiang et al., 
ACM-SIAM SODA ’01; Sadakane, ISAAC ’01; 
Munro et al., J.ALG ’01; Munro and Rao, ICALP 
’04] have extended BP to support in constant time 
other sophisticated navigational queries like LCA, 
node degree, rank/select on leaves and number 
of leaves in a subtree, level-ancestor, and 
level-successor.) Subsequently, Benoit 
et al. [3] proposed a storage scheme called 
depth-first unary degree sequence (DFUDS for short) 
that still uses 2t + o(t) bits but performs more 
navigational queries like ith child, child rank, and 
node degree in constant time. Geary et al. [10] 
gave another representation still taking optimal 
space that extends DFUDS’s operations with the 
level-ancestor query. 
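
To make the two 2t-bit encodings concrete, the sketch below 
writes out the BP and DFUDS sequences of a small ordinal tree. 
It assumes the usual conventions (BP: one open parenthesis on 
entering a node and one close on leaving it; DFUDS: an extra 
leading open parenthesis, then each node in preorder as its 
degree in unary followed by a close). The nested-list tree 
format is an assumption of this sketch, and the o(t)-bit 
navigation directories are not shown. 

```python
def bp(node):
    """Balanced parenthesis sequence: '(' on entry, ')' on exit (preorder)."""
    return "(" + "".join(bp(child) for child in node) + ")"

def dfuds(root):
    """DFUDS: leading '(' then, per node in preorder, '(' * degree + ')'."""
    out = ["("]
    stack = [root]
    while stack:
        node = stack.pop()
        out.append("(" * len(node) + ")")
        stack.extend(reversed(node))  # leftmost child is visited first
    return "".join(out)

# A node is the list of its children; a leaf is the empty list.
# Root with three children; the second child has two leaf children.
tree = [[], [[], []], []]
bp_seq = bp(tree)
dfuds_seq = dfuds(tree)
```

Both sequences use exactly two parentheses, hence 2 bits, per 
node of the 6-node example. 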

Although these three representations achieve 
the optimal space occupancy, none of them 
supports every existing operation in constant time: 
e.g., BP does not support ith child and child rank, 
and DFUDS and Geary et al.’s representation 
do not support LCA. Later, Jansson et al. [14] 
extended the DFUDS storage scheme in two 
directions: (1) they showed how to implement 
in constant time all navigational queries above, 
and (2) they showed how to compress the new tree 
storage scheme up to H*(T), which denotes the 
entropy of the distribution of node degrees in T. 
(The BP representation and the one of Geary 
et al. [10] have been recently extended to support 
further operations, like depth/height of a node, 
next node in the same level, and rank/select over 
various node orders, still in constant time and 
2t + o(t) bits; see [11] and references therein.) 


Theorem 1 (Jansson et al. [14]) For any rooted 
(unlabeled) tree T with t nodes, there exists 
a tree-indexing scheme that uses tH*(T) + 
$O(t(\log\log t)^2/\log t)$ bits and supports all 
navigational queries in constant time. 


This improves on the standard pointer-based tree 
representation, since it needs no more than 
H*(T) bits per node and does not compromise 
the performance of sophisticated navigational 
queries. Since H*(T) ≤ 2, this solution 
is also never worse than BP or DFUDS, and its 
improvement may be significant. This result can 
be extended to achieve the kth-order entropy 
of the DFUDS sequence, by adopting any 
compressed-storage scheme for strings (see, e.g., 
[7] and references therein). 
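
The degree entropy can be computed directly from the multiset 
of node degrees. The sketch below assumes the standard 
empirical-entropy form H*(T) = Σ_i (n_i/t) log(t/n_i), where 
n_i is the number of nodes of degree i; the function name and 
the list-of-degrees input are assumptions of this sketch. 

```python
from collections import Counter
from math import log2

def degree_entropy(degrees):
    """Empirical entropy (bits per node) of the node-degree distribution."""
    t = len(degrees)
    counts = Counter(degrees)
    return sum((n_i / t) * log2(t / n_i) for n_i in counts.values())

# Degrees listed in preorder for three small trees.
h_mixed = degree_entropy([3, 0, 2, 0, 0, 0])          # 6-node example tree
h_binary = degree_entropy([2, 2, 2, 0, 0, 0, 0])      # full binary tree, 7 nodes
h_path = degree_entropy([1, 1, 1, 1, 1, 1, 1, 0])     # path of 8 nodes
```

Regular shapes such as the path or the full binary tree fall 
well below the 2 bits per node of BP and DFUDS, which is where 
the compressed scheme gains. 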

Further work in the area of succinct ordinal 
tree representations came in the form of (i) a 
uniform approach to succinct tree representations 
[5] that simplified and extended the 
representation of Geary et al., (ii) a universal 
representation [6] that emulates all three representations 
mentioned above, and (iii) a fully functional 
representation [18] that obtains a simplified ordinal 
tree encoding with reduced space occupancy. 

Benoit et al. [3] extended the use of DFUDS 
to cardinal trees and proposed a tree-indexing 
scheme whose space occupancy is close to the 
information-theoretic lower bound and supports 
various navigational queries in constant time. 
Raman et al. [19] improved the space by using 
a different approach (based on storing the tree as 
a set of edges), thus proving the following: 


Theorem 2 (Raman et al. [19]) For any k-ary 
cardinal tree T with t nodes, there exists a 
tree-indexing scheme that uses $\log C_t^k + o(t) + 
O(\log\log k)$ bits and supports in constant time 
the following operations: finding the parent, the 
degree, the ordinal position among its siblings, 
the child with label c, and the ith child of a node. 


The subtree-size operation cannot be 
supported efficiently using this representation, 
so the scheme of [3] should be used when this 
operation is a primary concern. 

Despite this flurry of activity, the fundamental 
problem of indexing labeled trees succinctly 
has remained mostly unsolved. In fact, the 
succinct encoding for ordered trees mentioned 
above might be replicated |Σ| times (once for 
each symbol of Σ), and then the 
divide-and-conquer approach of [10] might be applied to 
reduce the final space occupancy. However, the 
final space bound would be 2t + t log|Σ| + 
O(t|Σ|(log log log t)/(log log t)) bits, which is 
nonetheless far from the information-theoretic 
storage bound even for moderately large Σ. On 
the other hand, if subpath queries are of primary 
concern (e.g., XML), one can use the approach of 
[15] which consists of a variant of the suffix-tree 
data structure properly designed to index all T’s 
labeled paths. Subpath queries can be supported 
in O(|Π| log|Σ| + occ) time, but the required 
space would still be O(t log t) bits (with large 
hidden constants, due to the use of suffix trees). 
Subsequently, some papers [1, 2, 8, 12] addressed 
this problem in its whole generality by either 
dealing simultaneously with subpath and basic 
navigational queries [8] or by considering 
multi-labeled trees and a larger set of navigational 
operations [1, 2, 12]. 

In particular, [8] introduced a transform of 
the labeled tree T, denoted xbw[T], which 
linearizes it into two coordinated arrays $(S_{last}, S_\alpha)$: 
the former capturing the tree structure and the 
latter keeping a permutation of the labels of T. 
xbw[T] has the optimal (up to lower-order terms) 
size of 2t + t log|Σ| bits and can be built and 
inverted in optimal linear time. In designing the 
XBW-transform, the authors were inspired by the 
elegant Burrows-Wheeler transform for strings 
[4]. The power of xbw[T] relies on the fact 
that it allows one to transform compression and 
indexing problems on labeled trees into easier 
problems over strings. Namely, the following 
two string-search primitives are key tools for 
indexing xbw[T]: $\mathrm{rank}_c(S, i)$ returns the number 
of occurrences of the symbol c in the string 
prefix S[1,i], and $\mathrm{select}_c(S, j)$ returns the 
position of the jth occurrence of the symbol c in 
string S. The literature offers many time-/space-efficient 
solutions for these primitives that could 
be used as a black box for the compressed 
indexing of xbw[T] (see, e.g., [2, 17] and references 
therein). 
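
The semantics of the two primitives can be pinned down with a 
naive reference sketch. It uses 1-based positions as in the 
text, runs in linear time, and returns 0 from select when the 
jth occurrence does not exist; that return convention and the 
argument order are assumptions of this sketch. The succinct 
solutions cited above answer the same queries in (near) 
constant time. 

```python
def rank(S, c, i):
    """Number of occurrences of symbol c in the prefix S[1, i] (1-based)."""
    return S[:i].count(c)

def select(S, c, j):
    """1-based position of the jth occurrence of c in S, or 0 if there is none."""
    pos = 0
    for _ in range(j):
        pos = S.find(c, pos) + 1  # str.find returns -1 when c is absent
        if pos == 0:
            return 0
    return pos

S = "abracadabra"
```

The two operations are inverse to each other in the sense that 
rank(S, c, select(S, c, j)) = j whenever the jth occurrence 
exists. 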


Theorem 3 (Ferragina et al. [8]) Consider a 
tree T consisting of t nodes labeled with symbols 
drawn from alphabet Σ. There exists a 
compressed tree-indexing scheme that achieves the 
following performance: 


• If |Σ| = O(polylog(t)), the index takes at most 
$tH_0(S_\alpha) + 2t + o(t)$ bits and supports 
basic navigational queries in constant time and 
(counting) subpath queries in O(|Π|) time. 

• For any alphabet Σ, the index takes less than 
$tH_k(S_\alpha) + 2t + o(t\log|\Sigma|)$ bits, but 
label-based navigational queries and (counting) 
subpath queries are slowed down by a 
factor $o((\log\log|\Sigma|)^3)$. 


Here, $H_k(s)$ is the kth-order empirical entropy of 
string s, with $H_k(s) \le H_{k-1}(s)$ for any k > 0. 
Since $H_k(S_\alpha) \le H_0(S_\alpha) \le \log|\Sigma|$, the 
indexing of xbw[T] takes at most as much space as 
its plain representation, up to lower-order terms, 
but with the additional feature of being able to 
navigate and search T efficiently. This is indeed 
a sort of pointerless representation of the labeled 
tree T with additional search functionalities (see 
[8] for details). 

If sophisticated navigational queries over 
labeled trees are a primary concern, and subpath 
queries are not necessary, then the approach 
of Barbay et al. [1, 2] should be followed. 
They proposed the novel concept of succinct 
index, which is different from the concept of 
succinct/compressed encoding implemented 
by all the above solutions. A succinct index 
does not touch the data to be indexed; it just 
accesses the data via basic operations offered 
by the underlying abstract data type (ADT) 
and requires asymptotically less space than 
the information-theoretic lower bound on the 
storage of the data itself. The authors reduce 
the problem of indexing labeled trees to the one 
of indexing ordinal trees and strings and the 
problem of indexing multi-labeled trees to the 
one of indexing ordinal trees and binary relations. 
Then, they provide succinct indexes for strings 
and binary relations. In order to present their 
result, the following definitions are needed. Let 
m be the total number of symbols in T, let $t_c$ 
be the number of nodes labeled c in T, and let 
$\rho_c$ be the maximum number of labels c in any 
rooted path of T (called the recursivity of c). 
Define $\rho$ as the average recursivity of T, namely, 


$\rho = (1/m) \sum_{c \in \Sigma} (t_c \rho_c)$. 


Theorem 4 (Barbay et al. [1]) Consider a tree 
T consisting of t nodes (multi-)labeled with 
possibly many symbols drawn from alphabet 
Σ. Let m be the total number of symbols in 
T, and assume that the underlying ADT for 
T offers basic navigational queries in constant 
time and retrieves the ith label of a node in 
time f. There is a succinct index for T using 
$m(\log\rho + o(\log|\Sigma|))$ bits that supports for 
a given node u the following operations (where 


$L = \log\log|\Sigma| \, \log\log\log|\Sigma|$): 
• Every c-descendant or c-child of u can be 
retrieved in $O(L(f + \log\log|\Sigma|))$ time. 

• The set A of c-ancestors of u can be retrieved 
in $O(L(f + \log\log|\Sigma|) + |A|(\log\log\rho_c + 
\log\log\log|\Sigma| \, (f + \log\log|\Sigma|)))$ time. 


More recently, He et al. [12] obtained new repre- 
sentations that support a much broader collection 
of operations than the ones mentioned above. 


Applications 


As trees are ubiquitous in many applications, this 
section concentrates just on two examples that, 
in their simplicity, highlight the flexibility and 
power of succinct/compressed tree indexes. 

The first example regards suffix trees, which 
are a crucial algorithmic block of many string 
processing applications — ranging from bioinfor- 
matics to data mining, from data compression 
to search engines. Standard implementations of 
suffix trees take at least 80 bits per node. The 
compressed suffix tree of a string S[1,s] consists 
of three components: the tree topology, the string 
depths stored into the internal suffix-tree nodes, 
and the suffix pointers stored in the suffix-tree 
leaves (also called suffix array of S). The succinct 
tree representation of [14] can be used to encode 
the suffix-tree topology and the string depths 
taking 4s + o(s) bits (assuming w.l.o.g. that |Σ| = 
2). The suffix array can be compressed up to the 
kth-order entropy of S via any solution surveyed 
in [17]. The overall result is never worse than 80 
bits per node, but can be significantly better for 
highly compressible strings. 

The second example refers to the XML for- 
mat which is often modeled as a labeled tree. 
The succinct/compressed indexes in [1, 2, 8] are 
theoretical in flavor but turn out to be relevant 
for practical XML processing systems. As an 
example, [9] has published some encouraging 
experimental results that highlight the impact of 
the XBW-transform on real XML datasets. The au- 
thors show that a proper adaptation of the XBW- 
transform allows one to compress XML data up 
to state-of-the-art XML-conscious compressors 
and to provide access to its content, navigate up 
and down the XML tree structure, and search 
for simple path expressions and substrings in a 
few milliseconds over MBs of XML data, by 
uncompressing only a tiny fraction of them at 
each operation. Previous solutions took several 
seconds per operation! 


Open Problems 


For recent results, open problems, and further 
directions of research in the general area of suc- 
cinct tree representation, the interested reader is 
referred to [2, 11, 14, 18] and references therein. 
Here, we describe two main problems, which 
naturally derive from the discussion above. 

Motivated by XML applications, one may like 
to extend the subpath search operation to the 
efficient search for all leaves of T whose labels 
contain a substring β and that descend from a 
given subpath Π. The term “efficient” here means 
in time proportional to |Π| and to the number of 
retrieved occurrences, but independent as much 
as possible of T’s size in the worst case. 
Currently, this search operation is possible only for 
the leaves which are immediate descendants of Π, 
and even for this setting, the solution proposed in 
[9] is not optimal. 

There are two main encodings for trees which 
lead to the results above: ordinal tree representa- 
tion (BP, DFUDS or the representation of Geary 
et al. [10]) and XBW. The former is at the base 
of solutions for sophisticated navigational opera- 
tions, and the latter is at the base of solutions for 
sophisticated subpath searches. Is it possible to 
devise one unique transform for the labeled tree 
T which combines the best of the two worlds and 
is still compressible? 


Experimental Results 


See http://mattmahoney.net/dc/text.html and 
the paper [9] for numerous experiments on XML 
datasets. 
Data Sets 


See http://mattmahoney.net/dc/text.html and the 
references in [9]. 


URL to Code 


Paper [9] contains a list of software tools for 
compression and indexing of XML data. 
Problem Definition 


An n-symbol message $M = \langle s_0, s_1, \ldots, s_{n-1} \rangle$ 
is given, where each symbol $s_i$ is an integer 
in the range $0 \le s_i < U$. If the $s_i$s are 
strictly increasing, then M identifies an n-subset 
of {0,1,...,U - 1}. 


Objective To economically encode M as a bi- 
nary string over {0,1}. 


Constraints 


1. Short messages. The message length n may 
be small relative to U. 

2. Monotonic equivalence. Message M is 
converted to a strictly increasing message M’ over 
the alphabet $U' \le Un$ by taking prefix sums, 
$s'_i = i + \sum_{j=0}^{i} s_j$, and $U' = s'_{n-1} + 1$. The 
inverse is to “take gaps,” $g_i = s_i - s_{i-1} - 1$, 
with $g_0 = s_0$. 

3. Combinatorial limit. If M is monotonic, 
then $\lceil \log_2 \binom{U}{n} \rceil \le U$ bits are required in 
the worst case. When $n \ll U$, $\log_2 \binom{U}{n} \approx 
n(\log_2(U/n) + \log_2 e)$. 
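
The monotonic equivalence and the combinatorial limit can be 
checked numerically with a short sketch. The function names 
are illustrative assumptions; the formulas are the ones stated 
in constraints 2 and 3. 

```python
from itertools import accumulate
from math import ceil, comb, e, log2

def to_monotonic(M):
    """Prefix sums s'_i = i + (s_0 + ... + s_i); the result is strictly increasing."""
    return [i + total for i, total in enumerate(accumulate(M))]

def take_gaps(Mp):
    """Inverse map: g_i = s_i - s_{i-1} - 1, with g_0 = s_0."""
    return [s - (Mp[i - 1] if i else -1) - 1 for i, s in enumerate(Mp)]

def combinatorial_limit(U, n):
    """Worst-case bits needed to identify one n-subset of {0, ..., U-1}."""
    return ceil(log2(comb(U, n)))

def approx_limit(U, n):
    """The n << U approximation n(log2(U/n) + log2 e)."""
    return n * (log2(U / n) + log2(e))

M = [3, 0, 2, 5]
Mp = to_monotonic(M)      # strictly increasing version of M
U_prime = Mp[-1] + 1      # alphabet size of the monotonic message
```

For this message, U' equals n plus the sum of the symbols, and 
taking gaps of M' recovers M exactly. 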


Key Results 


Monotonic sequences can be coded in $\min\{U, 
n(\log_2(U/n) + 2)\}$ bits. Non-monotonic 
sequences can be coded in $\min\{\sum_{i=0}^{n-1} (\log_2(1 + 
s_i) + o(\log s_i)),\ n(\log_2(U'/n) + 2)\}$ bits. 


Unary and Binary Codes 

The unary code represents symbol x as x 1-bits 
followed by a single 0-bit. The unary code for x 
is 1 + x bits long; hence, the corresponding ideal 
symbol probability distribution (for which this 
pattern of codeword lengths yields the minimal 
message length) is given by $p_x = 2^{-(x+1)}$. 
Unary is an infinite code, for which knowledge of 
U is not required. But unless M is dominated by 
small integers, unary is expensive: the 
representation of a message $M = \langle s_0 \ldots s_{n-1} \rangle$ requires 
$n + \sum_i s_i = U'$ bits. 

If $U \le 2^k$ for integer k, then $s_i$ can be 
represented in k bits using the binary code. Binary 
is finite, with an ideal probability distribution 
given by $p_x = 2^{-k}$. When $U = 2^k$, the 
ideal probability $p_x = 2^{-\log_2 U} = 1/U$. When 
$2^{k-1} < U \le 2^k$, then $2^k - U$ of the codewords 
can be shortened to k - 1 bits, in a minimal binary 
code. It is usual (but not necessary) to assign the 
short codewords to $0 \ldots 2^k - U - 1$, leaving the 
codewords for $2^k - U \ldots U - 1$ as k bits. 
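
A minimal sketch of the unary and minimal binary codes, using 
bit strings for readability. The assignment of long codewords 
(value x stored as x + (2^k - U) in k bits) is the usual 
truncated-binary convention and is an assumption here, since 
the text fixes only the codeword lengths. 

```python
def unary(x):
    """x 1-bits followed by a single 0-bit."""
    return "1" * x + "0"

def to_bits(v, width):
    """Fixed-width binary string; the empty string when width is 0."""
    return format(v, "b").zfill(width) if width > 0 else ""

def minimal_binary(x, U):
    """Minimal binary code for x in 0..U-1: 2^k - U short words, the rest k bits."""
    if not 0 <= x < U:
        raise ValueError("x must satisfy 0 <= x < U")
    k = (U - 1).bit_length()      # smallest k with U <= 2**k
    short = (1 << k) - U          # number of (k-1)-bit codewords
    if x < short:
        return to_bits(x, k - 1)
    return to_bits(x + short, k)  # shifted so that no short word is a prefix

codes_U5 = [minimal_binary(x, 5) for x in range(5)]
```

With U = 5, three values get 2-bit codewords and two get 3-bit 
codewords, and no codeword is a prefix of another. 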


Elias Codes 

Peter Elias described a suite of hybrids between 
unary and binary in work published in 1975 
[7]. This family of codes is defined recursively, 
with unary being the base method. To code a 
value x, the “predecessor” Elias code is used 
to specify $x' = \lfloor \log_2(1 + x) \rfloor$, followed by 
an x’-bit binary code for $x - (2^{x'} - 1)$. The 
second member of the Elias family is $C_\gamma$ and is 
a unary-binary code: unary for the prefix part and 
then binary to indicate the value of x within the 
range it specifies. The first few $C_\gamma$ codewords 
are 0, 10 0, 10 1, 110 00, and so on, where 
spaces are used to illustrate the split between 
the components. The $C_\gamma$ codeword for a value x 
requires $1 + \lfloor \log_2(1 + x) \rfloor$ bits for the unary part 
and a further $\lfloor \log_2(1 + x) \rfloor$ bits for the binary part. 
The ideal probability distribution is thus given 
by $p_x \approx 1/(2(1 + x)^2)$. After $C_\gamma$, the next 
member of the Elias family is $C_\delta$, in which $C_\gamma$ 
is used to store x’. The first few codewords are 
0, 100 0, 100 1, and 101 00; like unary and 
$C_\gamma$, $C_\delta$ is infinite, but now the codeword for x 
requires $1 + 2\lfloor \log_2(1 + x') \rfloor + x'$ bits, where $x' = 
\lfloor \log_2(1 + x) \rfloor$. Further members of the family 
can be generated, but for most practical purposes, 
$C_\delta$ is the last useful one. To see why, note that 
$|C_\gamma(x')| \le |C_\delta(x')|$ whenever $x' \le 30$, meaning 
that the next Elias code is shorter than $C_\delta$ only for 
values $x' \ge 31$, that is, for $x \ge 2^{31} - 1$. 
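
The recursive definition translates almost line by line into 
code. The sketch below follows the zero-based variant described 
in this entry (x ≥ 0, prefix x' = ⌊log2(1 + x)⌋) and emits bit 
strings; the function names are assumptions of this sketch. 

```python
def unary(x):
    """x 1-bits followed by a single 0-bit."""
    return "1" * x + "0"

def to_bits(v, width):
    """Fixed-width binary string; the empty string when width is 0."""
    return format(v, "b").zfill(width) if width > 0 else ""

def elias_gamma(x):
    """C_gamma: unary code for x' = floor(log2(1 + x)), then x' binary bits."""
    xp = (x + 1).bit_length() - 1
    return unary(xp) + to_bits(x - ((1 << xp) - 1), xp)

def elias_delta(x):
    """C_delta: C_gamma code for x', then x' binary bits."""
    xp = (x + 1).bit_length() - 1
    return elias_gamma(xp) + to_bits(x - ((1 << xp) - 1), xp)

first_gamma = [elias_gamma(x) for x in range(4)]
first_delta = [elias_delta(x) for x in range(4)]
```

Comparing the codeword lengths at 30 and 31 reproduces the 
crossover used in the argument for why $C_\delta$ is the last 
useful member of the family. 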
Fibonacci-Based Codes 

Another infinite code arises from the Fibonacci 
sequence described (for this purpose) as $F_0 = 1$, 
$F_1 = 2$, $F_2 = 3$, $F_3 = 5$, $F_4 = 8$, and 
in general $F_i = F_{i-1} + F_{i-2}$. The Zeckendorf 
representation of a natural number is a list of 
Fibonacci values that add up to it, such that no 
two adjacent Fibonacci numbers are used. For 
example, 9 is the sum of $1 + 8 = F_0 + F_4$. 
The Fibonacci code for x ≥ 0 is derived from the 
Zeckendorf representation of x + 1 and consists 
of a 1 bit in the ith position (counting from the 
left) if $F_i$ appears in the sum and a 0 bit if not. 
Because it is not possible for both $F_i$ and $F_{i+1}$ 
to be part of the sum, the last 2 bits must be 
01; hence, appending a further 1 bit provides 
a unique sentinel for the codeword. The code 
for x = 0 is 11, and the next few codewords 
are 011, 0011, 1011, 00011, and 10011. 
Because (for large i) $F_i \approx \phi^{i+2}/\sqrt{5}$, where 
$\phi = (1 + \sqrt{5})/2 \approx 1.62$, the codeword for 
x requires approximately $\lfloor \log_\phi((1 + x)\sqrt{5}) \rfloor 
\approx \lfloor 1.67 + 1.44 \log_2(1 + x) \rfloor$ bits and is 
longer than $C_\gamma$ only for x = 0 and x = 2. The 
Fibonacci code is also as good as, or better than, 
$C_\delta$ between x = 1 and $F_{18} - 2 = 6{,}763$. 
Higher-order variants are also possible, with increased 
minimum codeword lengths and decreased 
coefficients on the logarithmic term. Fenwick [8] 
provides coverage of Fibonacci codes. 
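
A minimal encoder sketch for the Fibonacci code as described: 
the Zeckendorf representation of x + 1 is found greedily from 
the largest Fibonacci value downward, written with position 0 
leftmost, and terminated by the sentinel 1 bit. The function 
name is an assumption of this sketch. 

```python
def fibonacci_code(x):
    """Fibonacci codeword for x >= 0, with F_0 = 1, F_1 = 2, F_2 = 3, ..."""
    v = x + 1
    fibs = [1, 2]
    while fibs[-1] <= v:
        fibs.append(fibs[-1] + fibs[-2])
    while fibs[-1] > v:           # keep only F_0 .. F_i with F_i <= v
        fibs.pop()
    bits = ["0"] * len(fibs)
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= v:          # greedy choice yields the Zeckendorf sum
            bits[i] = "1"
            v -= fibs[i]
    return "".join(bits) + "1"    # sentinel: the only place "11" can occur

first_codes = [fibonacci_code(x) for x in range(6)]
```

Because the pair 11 appears only at the end of a codeword, a 
decoder can split a concatenated bit stream at each occurrence 
of 11 without any other length information. 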


Byte-Aligned Codes 

Extracting bits from bitstrings can slow down 
decoding rates, especially if each bit is then tested 
in a loop guard. Operations on larger units tend to 
be faster. The simplest byte-aligned code is an 
interleaved 8-bit analog of the Elias $C_\gamma$ mechanism. 
The top bit in each byte is reserved for a flag that 
indicates (when 0) that this is the last byte of the 
codeword and (when 1) that the codeword 
continues; the other 7 bits in each byte are for data. A 
total of $8\lceil (\log_2 x)/7 \rceil$ bits are used, which makes 
it more effective asymptotically than the Elias 
$C_\gamma$ code or the Fibonacci code. However, the 
minimum 8 bits means that byte-aligned codes 
are expensive on messages dominated by small 
values. A further advantage of byte codes is that 
compressed sequences can be searched, using a 
search pattern converted using the same code [5]. 
The zero top bit in all final bytes means that false 
matches can be easily eliminated. 
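
A minimal sketch of this simple byte-aligned code. The flag 
convention (top bit 1 means the codeword continues, 0 means 
last byte) follows the text; the order of the 7-bit groups, 
least significant first, is an assumption of this sketch, as 
are the function names. 

```python
def vbyte_encode(x):
    """Encode a non-negative integer as 7-bit groups, low-order group first."""
    out = bytearray()
    while x >= 128:
        out.append(0x80 | (x & 0x7F))  # flag 1: more bytes follow
        x >>= 7
    out.append(x)                      # flag 0: last byte of the codeword
    return bytes(out)

def vbyte_decode_stream(data):
    """Decode a concatenation of codewords back into a list of integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values

message = [0, 127, 128, 300, 2**21]
stream = b"".join(vbyte_encode(x) for x in message)
```

The decoder tests one flag per byte rather than one condition 
per bit, which is the source of the speed advantage described 
above. 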

An improvement to the simple byte-aligned 
coding mechanism arises from the observation 
that different values for the separating value be- 
tween the stopper and continuer bytes lead to dif- 
ferent trade-offs in overall codeword lengths [3]. 
In the (S, C)-byte-aligned code, values for S + 
C = 256 are chosen, and each codeword consists 
of a sequence of zero or more continuer bytes 
with values greater than or equal to S, followed 
by a stopper byte with a value less than S. Other 
variants include methods that use bytes as the 
coding units to form Huffman codes, either using 
8-bit coding symbols or tagged 7-bit units [5] 
and methods that partially permute the alphabet, 
but avoid the need for a complete mapping [4]. 
Culpepper and Moffat [4] also describe a byte- 
aligned coding method that creates a set of byte- 
based codewords with the property that the first 
byte uniquely identifies the length of the code- 
word. Similarly, nibble codes can be designed as 
a 4-bit analog of the byte-aligned approach, with 
1 bit reserved for a stopper-continuer flag, and 
3 bits used for data. 


Golomb Codes 

In 1966 Solomon Golomb observed that the 
intervals between consecutive members of a random 
n-subset of the items 0...U-1 could be modeled 
by a geometric probability distribution $p_x = 
p(1 - p)^{x-1}$, where p = n/U [10]. This 
probability distribution implies that in a Golomb code, 
the representation for x + b should be 1 bit longer 
than the representation for x when $(1 - p)^b = 
0.5$, that is, when $b = \log 0.5/\log(1 - p) \approx 
(\ln 2)/p \approx 0.69U/n$. Assuming a monotonic 
message, each $s_i$ is converted to a gap $g_i$; then 
$g_i \operatorname{div} b$ is coded in unary; and finally $g_i \bmod b$ 
is coded in minimal binary with respect to b. 
Like unary, the Golomb code is infinite. If the 
sequence is non-monotonic, then the values $s_i$ are 
coded directly using parameter $b = 0.69U'/n$. 
Each 1-bit that is part of a unary part spans b 
elements of U, meaning that there are at most 
$\lfloor U/b \rfloor$ of them in total; and there are exactly 
n 0-bits in the unary parts. The minimal binary 
parts, one per symbol, take fewer than $n\lceil \log_2 b \rceil$ 
bits in total. Summing these components, and 
maximizing the cost over different values of n 
and U by assuming an adversary that forces 
the use of the first long minimal binary 
codeword whenever possible, yields a total Golomb 
code length for a monotonic sequence of at most 
$n(\log_2(U/n) + 2)$ bits. The variant in which 
$b = 2^k$ is used, $k = \lceil \log_2(U/n) \rceil$, is called a 
Rice code. Note that Golomb and Rice codes are 
infinite but require that a parameter be set, and 
one way of estimating the parameter is based 
on knowing a value for U. 
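
A minimal Golomb encoder sketch: quotient in unary, remainder 
in minimal binary with respect to b. Rounding 0.69U/n to the 
nearest integer (and never below 1) is an assumption of this 
sketch, as are the function names; the truncated-binary 
codeword assignment is the usual convention. 

```python
from math import log

def to_bits(v, width):
    """Fixed-width binary string; the empty string when width is 0."""
    return format(v, "b").zfill(width) if width > 0 else ""

def minimal_binary(r, b):
    """Minimal binary code for r in 0..b-1."""
    k = (b - 1).bit_length()
    short = (1 << k) - b
    return to_bits(r, k - 1) if r < short else to_bits(r + short, k)

def golomb_parameter(U, n):
    """b ~ (ln 2) U / n; the rounding rule here is an assumption."""
    return max(1, round(log(2) * U / n))

def golomb_encode(g, b):
    """Unary code for g div b, then minimal binary code for g mod b."""
    q, r = divmod(g, b)
    return "1" * q + "0" + minimal_binary(r, b)

codes_b3 = [golomb_encode(g, 3) for g in range(6)]
```

With b a power of two the remainder is always a fixed k-bit 
field, which is the Rice code special case; with b = 1 the code 
degenerates to unary. 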


Other Static Codes 

Elias codes and Golomb codes are examples 
of methods specified by a set of buckets, with 
symbol x coded in two parts: a bucket identifier, 
followed by an offset within the bucket, the 
latter usually using minimal binary. For example, 
the Elias $C_\gamma$ code employs a vector of bucket 
sizes $\langle 2^0, 2^1, 2^2, 2^3, 2^4, \ldots \rangle$. Teuhola (see Moffat 
and Turpin [14]) proposed a hybrid in which a 
parameter k is chosen, and the vector of bucket 
sizes is given by $\langle 2^k, 2^{k+1}, 2^{k+2}, 2^{k+3}, \ldots \rangle$. One 
way of setting the parameter k is the length in 
bits of the median sequence value, so that the first 
bit of each codeword approximately halves the 
range of observed symbol values. Another variant 
is described by Boldi and Vigna [2], using vector 
$\langle 2^k - 1, (2^k - 1)2^k, (2^k - 1)2^{2k}, (2^k - 1)2^{3k}, \ldots \rangle$ to 
obtain a family of codes that are analytically 
and empirically well suited to power-law 
probability distributions, especially (taking k in 
the range 2-4) those associated with web-graph 
compression. Fraenkel and Klein [9] observed 
that the sequence of symbol magnitudes (i.e., the 
sequence of values $\lfloor \log_2(1 + s_i) \rfloor$) provides a 
denser range than the message itself and that it 
can be effective to use a principled code for them. 
For example, rather than using unary for the 
prefix part, a Huffman code can be used. Moffat 
and Anh [12] consider other ways in which the 
prefix part of each codeword can be reduced; and 
Fenwick [8] provides general coverage of other 
static coding methods. 
Elias-Fano Codes 

In 1974 Elias [6] presented another coding 
method, noting that it was described 
independently in 1971 by Robert Fano. The approach is 
now known as Elias-Fano coding. For monotonic 
sequence M, parameter $k = \lfloor \log_2(U/n) \rfloor$ is 
used to again break each codeword into quotient 
and remainder, without first taking gaps, with 
codewords formed relative to a sequence of 
buckets each of width $b = 2^k$. The number 
of symbols in the buckets is stored in a bitstring 
of $\lceil U/b \rceil \le 2n$ unary codes. The n remainder 
parts $r_i = s_i \bmod b$ are stored as a sequence 
of k-bit binary codes. Each symbol contributes 
k bits as a binary part and adds 1 bit to one 
of the unary parts; plus there are at most 
$\lceil U/b \rceil$ 0-bits terminating the unary parts. Based 
on these relationships, the total length of an 
Elias-Fano code can be shown to be at most 
$n(\log_2(U/n) + 2)$ bits. Vigna [15] has deployed 
Elias-Fano codes to good effect. 
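
The construction can be sketched as follows, with the bucket 
counts and the remainders returned as two separate bit strings. 
The function names and the example sequence are assumptions of 
this sketch, and U ≥ n is assumed. 

```python
from math import log2

def elias_fano_encode(seq, U):
    """Return (unary bucket counts, k-bit remainders, k) for a monotonic seq."""
    n = len(seq)
    k = (U // n).bit_length() - 1      # floor(log2(U / n)), assuming U >= n
    b = 1 << k                         # bucket width
    counts = [0] * (-(-U // b))        # ceil(U / b) buckets
    for s in seq:
        counts[s >> k] += 1
    upper = "".join("1" * c + "0" for c in counts)
    lower = "".join(format(s & (b - 1), "b").zfill(k) if k else "" for s in seq)
    return upper, lower, k

def elias_fano_decode(upper, lower, k):
    """Rebuild the sequence from the two bit strings."""
    out, bucket, idx = [], 0, 0
    for bit in upper:
        if bit == "1":
            low = int(lower[idx * k:(idx + 1) * k], 2) if k else 0
            out.append((bucket << k) + low)
            idx += 1
        else:
            bucket += 1                # a 0-bit closes the current bucket
    return out

def elias_fano_bound(U, n):
    """The n(log2(U/n) + 2) upper bound quoted in the text."""
    return n * (log2(U / n) + 2)

seq = [3, 4, 7, 13, 14, 15, 21, 43]
upper, lower, k = elias_fano_encode(seq, 44)
```

For this example (n = 8, U = 44, k = 2) the code uses 19 + 16 = 
35 bits, just under the bound of about 35.7 bits. 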


Packed Codes 
If each n-subset of 0...U — 1 is equally likely, 
the Golomb code is effective in the average case; 
and the Elias-Fano code is effective in the worst 
case. But if the elements in the subset are clus- 
tered, then it is possible to obtain smaller rep- 
resentations, provided that groups of elements 
themselves can be employed as part of the pro- 
cess of determining the code. The word-aligned 
codes of Anh and Moffat [1] fit as many binary 
values into each output word as possible. For 
example, in their Simple-9 method, 32-bit output 
words are used, and the first 4 bits of each word 
contain a selector which specifies how to decode 
the next 28 bits: one 28-bit binary number, or two 
14-bit binary numbers, or three 9-bit numbers, 
and so on. Other variants use 64-bit words [1]. In 
these codes, clusters of low $s_i$ (or $g_i$) values can 
be represented more compactly than would occur 
with the Golomb code and an all-of-message b 
parameter; and decoding is fast because whole 
words are expanded without any need for condi- 
tionals or branching. 
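
A word-aligned code of this kind might be sketched as follows. The nine (count, width) layouts are the ones usually associated with Simple-9; the selector numbering, the bit order inside the 28-bit payload, and the function names are assumptions of this sketch rather than details taken from [1].

```python
# (number of values, bits per value) for each selector; count * bits <= 28.
SIMPLE9_LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
                   (4, 7), (3, 9), (2, 14), (1, 28)]


def simple9_encode(values):
    """Greedily pack non-negative integers (< 2^28) into 32-bit words."""
    words, i = [], 0
    while i < len(values):
        for selector, (count, bits) in enumerate(SIMPLE9_LAYOUTS):
            chunk = values[i:i + count]
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28           # selector in the top 4 bits
                for j, v in enumerate(chunk):
                    word |= v << (j * bits)     # payload in the low 28 bits
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value does not fit in 28 bits")
    return words


def simple9_decode(words):
    """Expand every word according to its selector."""
    out = []
    for word in words:
        count, bits = SIMPLE9_LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        out.extend((word >> (j * bits)) & mask for j in range(count))
    return out
```

The decoder performs one table lookup per word and then a fixed number of shifts and masks, which is where the speed of such codes comes from.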

Zukowski et al. [16] describe a different 
approach, in which blocks of z values from M 
are coded in binary using k bits each, where k is 
chosen such that $2^k$ is larger than most, but not 
necessarily all, of the z elements in the block. 
Any $s_i$ in the block that are larger than $2^k - 1$ 
are noted as exceptions and handled separately; a 
variety of methods for coding the exceptions have 
been used in different forms of the pfordelta code. 
This mechanism is again fast when decoding, due 
to the absence of conditional bit evaluations, and, 
for typical values such as z = 128, also yields 
effective compression. Lemire and Boytsov have 
carried out detailed experimentation with packed 
codes [11]. 
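
The block-plus-exceptions idea can be illustrated with the following sketch. It is not the layout of any particular pfordelta variant: the rule for choosing k (allow at most `max_exceptions` values per block to overflow), the side list of (position, value) pairs, and all names are assumptions made for illustration.

```python
def pfor_encode_block(block, max_exceptions=1):
    """Pack a block using k bits per value, listing oversized values separately."""
    widths = sorted(v.bit_length() for v in block)
    # Smallest k for which at most max_exceptions values need more than k bits.
    k = widths[max(0, len(block) - 1 - max_exceptions)]
    limit = 1 << k
    packed, exceptions = 0, []
    for pos, v in enumerate(block):
        if v < limit:
            packed |= v << (pos * k)            # regular k-bit slot
        else:
            exceptions.append((pos, v))         # slot stays zero, value kept aside
    return k, len(block), packed, exceptions


def pfor_decode_block(k, z, packed, exceptions):
    """Unpack all z slots without branching, then patch in the exceptions."""
    mask = (1 << k) - 1
    out = [(packed >> (pos * k)) & mask for pos in range(z)]
    for pos, v in exceptions:
        out[pos] = v
    return out
```

Decoding first expands every slot unconditionally and only afterwards overwrites the few exceptional positions, so the main loop contains no per-value tests.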


Context-Sensitive Codes 

If the objective is to create the smallest repre- 
sentation, rather than provide a balance between 
compression effectiveness and decoding speed, 
the nonsequential binary interpolative code of 
Moffat and Stuiver [13] can be used. As an exam- 
ple, consider message M, shown in Table 1, and 
suppose that the decoder is aware that U = 29, 
that is, that $s_i < 29$. Every item in M is greater 
than or equal to lo = 0 and less than hi = 29, 
and the mid-value in M, in this example $s_4 = 6$ 
(it doesn't matter which mid-value is chosen), can 
be transmitted to the decoder using $\lceil \log_2 29 \rceil = 5$ 
bits. Once that middle number is pinned, the 
remaining values can be coded recursively within 
more precise ranges and might require fewer than 
5 bits each. 

In fact, there are four distinct values in M that 
precede $s_4$, and another five that follow it, so a 
more restricted range for $s_4$ can be inferred: it 
must be greater than or equal to lo' = lo + 4 = 4 
and less than hi' = hi - 5 = 24. That is, 
$s_4 = 6$ can be minimal binary coded as the value 
6 - lo' = 2 within the range [0, 20) using just 4 
bits. 

It remains to transmit the left part, (0, 3, 4, 5), 
against the knowledge that every value is greater 
than or equal to lo = 0 and less than hi = 6, 
and the right part, (16, 24, 26, 27, 28), against 
the knowledge that every value is greater than 
or equal to lo = 7 and less than hi = 29. 
These two sublists are processed recursively in 
the order shown in the remainder of Table 1, 
with the tighter ranges [lo', hi') also shown. 

Compressing Integer Sequences, Table 1 
Example encodings of message M = (0, 3, 4, 5, 6, 16, 24, 26, 27, 28) using the interpolative 
code. When a minimal binary code is applied to each value in its corresponding range, a 
total of 20 bits are required. No bits are output if lo' = hi' - 1 

 i  g_i  s_i   lo   hi  lo'  hi'  s_i-lo'  hi'-lo'  Binary  MinBin 
 4    0    6    0   29    4   24        2       20   00010    0010 
 1    2    3    0    6    1    4        2        3      10      11 
 0    0    0    0    3    0    3        0        3      00       0 
 2    0    4    4    6    4    5        0        1      --      -- 
 3    0    5    5    6    5    6        0        1      --      -- 
 7    1   26    7   29    9   27       17       18   10001   11111 
 5    9   16    7   26    7   25        9       18   01001    1001 
 6    7   24   17   26   17   26        7        9    0111    1110 
 8    0   27   27   29   27   28        0        1      --      -- 
 9    0   28   28   29   28   29        0        1      --      -- 
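
The recursion traced in Table 1 can be reproduced with a short Python sketch. It is written for illustration under stated assumptions: the middle element is taken at index (length - 1) / 2 rounded down, which matches the choice of $s_4$ in the example, the minimal binary code gives the shorter codewords to the smallest values, and the function names are hypothetical.

```python
def minimal_binary(value, size):
    """Code value from [0, size); no bits are produced when size is 1."""
    if size <= 1:
        return ""
    width = (size - 1).bit_length()             # ceil(log2(size))
    short = (1 << width) - size                 # how many (width-1)-bit codewords
    if value < short:
        return format(value, "b").zfill(width - 1)
    return format(value + short, "b").zfill(width)


def interpolative_encode(seq, lo, hi):
    """Codewords for a strictly increasing seq with lo <= s < hi, in coding order."""
    if not seq:
        return []
    mid = (len(seq) - 1) // 2
    s = seq[mid]
    lo2 = lo + mid                              # mid values must fit below s
    hi2 = hi - (len(seq) - 1 - mid)             # the rest must fit above s
    code = minimal_binary(s - lo2, hi2 - lo2)
    return ([code]
            + interpolative_encode(seq[:mid], lo, s)
            + interpolative_encode(seq[mid + 1:], s + 1, hi))
```

Calling `interpolative_encode([0, 3, 4, 5, 6, 16, 24, 26, 27, 28], 0, 29)` yields the MinBin column of Table 1 in row order, 20 bits in total.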
Problem Definition 


Recall that a simple digraph T is a tournament if 
for every two vertices $u, v \in V(T)$, exactly one 
of the arcs (u, v) and (v, u) exists in T. If we 
relax this condition by allowing both these arcs 
to exist at the same time, then we obtain the def- 
inition of a semi-complete digraph. We say that a 
digraph T contains a digraph H as a topological 
minor if one can map vertices of H to different 
vertices of T, and arcs of H to directed paths 
connecting respective images of the endpoints 
that are internally vertex disjoint. By relaxing 
vertex disjointness to arc disjointness, we obtain 
the definition of an immersion. (For simplicity, 
we neglect here the difference between weak 
immersions and strong immersions, and we work 
with weak immersions only.) Finally, we say that 
T contains H as a minor if vertices of H can 
be mapped to vertex disjoint strongly connected 
subgraphs of T in such a manner that for every 
arc (u,v) of H, there exists a corresponding 
arc of T going from a vertex belonging to the 
image of u to a vertex belonging to the image 
of v. 

The topological minor, immersion, and mi- 
nor relations form fundamental containment or- 
derings on the class of digraphs. Mirroring the 
achievements of the graph minors program of 
Robertson and Seymour, it is natural to ask what 
is the complexity of testing these relations when 
the pattern graph H is assumed to be small. For 
general digraphs, even very basic problems of 
this nature are NP-complete [5]; however, the 
structure of semi-complete digraphs allows us to 
design efficient algorithms. 

On semi-complete digraphs, the considered 
containment relations are tightly connected to 
the digraph parameters cutwidth and pathwidth. For 
a digraph T and an ordering $(v_1, v_2, \ldots, v_n)$ 
of V(T), by the width of this ordering, we mean 
the maximum over $1 \le t \le n - 1$ of the 
number of arcs going from $\{v_1, v_2, \ldots, v_t\}$ to 
$\{v_{t+1}, v_{t+2}, \ldots, v_n\}$ in T. The cutwidth of a 
digraph T, denoted by ctw(T), is the minimum 
width of an ordering of V(T). A path decomposition 
of a digraph T is a sequence 
$P = (W_1, W_2, \ldots, W_r)$ of subsets of vertices, called 
bags, such that (i) $\bigcup_{1 \le i \le r} W_i = V(T)$, 
(ii) $W_j \supseteq W_i \cap W_k$ for all $1 \le i < j < k \le r$, and 
(iii) whenever (u, v) is an edge of T, then u 
and v appear together in some bag of P or all 
the bags in which u appears are placed after all 
the bags in which v appears. The width of P is 
equal to $\max_{1 \le i \le r} |W_i| - 1$. The pathwidth of T, 
denoted by pw(T), is the minimum width of a 
path decomposition of T. 
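
To make the two definitions concrete, the following sketch evaluates the width of a given ordering and checks conditions (i)-(iii) for a given sequence of bags. It is a direct transcription of the definitions for illustration, not one of the algorithms discussed in this entry; the representation (arcs as pairs, bags as sets), the function names, and the assumption that every vertex occurs in at least one arc are choices made here.

```python
def ordering_width(arcs, order):
    """Maximum, over all cuts, of the number of arcs from a prefix to its suffix."""
    pos = {v: i for i, v in enumerate(order)}
    n = len(order)
    delta = [0] * (n + 1)
    for u, v in arcs:
        if pos[u] < pos[v]:                     # forward arc: crosses cuts pos[u]+1..pos[v]
            delta[pos[u] + 1] += 1
            delta[pos[v] + 1] -= 1
    width = crossing = 0
    for t in range(1, n):                       # cut after the first t vertices
        crossing += delta[t]
        width = max(width, crossing)
    return width


def path_decomposition_width(arcs, bags):
    """Width of the decomposition if (i)-(iii) hold, otherwise None."""
    where = {}
    for idx, bag in enumerate(bags):
        for v in bag:
            where.setdefault(v, []).append(idx)
    vertices = {x for arc in arcs for x in arc}
    if not vertices <= set(where):              # (i) every vertex is covered
        return None
    for idxs in where.values():                 # (ii) bags of a vertex are consecutive
        if idxs != list(range(idxs[0], idxs[-1] + 1)):
            return None
    for u, v in arcs:                           # (iii) shared bag, or u entirely after v
        shared = set(where[u]) & set(where[v])
        if not shared and not where[u][0] > where[v][-1]:
            return None
    return max(len(bag) for bag in bags) - 1
```

For the transitive tournament with arcs a→b, a→c, b→c, the ordering (c, b, a) has width 0, while (a, b, c) has width 2.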

It appears that if a semi-complete digraph 
T excludes some digraph H as an immersion, 
then its cutwidth is bounded by a constant $c_H$ 
depending on H only. Similarly, if T excludes H 
as a minor or as a topological minor, then its pathwidth 
is bounded by a constant $p_H$ depending 
on H only. These Erdős-Pósa-style results were 
proven by Chudnovsky et al. [2] and Fradkin 
and Seymour [7], respectively. Based on this 
understanding of the links between containment 
relations and width parameters, it has been shown 
that immersion and minor relations are well- 
quasi-orderings of the class of semi-complete 
digraphs [1, 6]. 

The aforementioned theorems also give rise 
to natural algorithms for testing the containment 
relations. We try to approximate the appropriate 
width measure: If we obtain a guarantee that it 
is larger than the respective constant $c_H$ or $p_H$, 
then we can conclude that H is contained in T 
for sure. Otherwise, we can construct a decom- 
position of T of small width on which a dynamic 
programming algorithm can be employed. In fact, 
the proofs of Chudnovsky et al. [2] and Frad- 
kin and Seymour [7] can be turned into (some) 
approximation algorithms for cutwidth and path- 
width on semi-complete digraphs. Therefore, it is 
natural to look for more efficient such algorithms, 
both in terms of the running time and the approximation 
ratio. The efficiency of the approximation 
subroutine is the crucial ingredient of 
the overall running time for testing containment 
relations. 


Key Results 


As a by-product of their proof, Chudnovsky 
et al. [2] obtained an algorithm that, given 
an n-vertex semi-complete digraph T and an 
integer k, either finds an ordering of V(T) of 
width $O(k^2)$ or concludes that ctw(T) > k by 
finding an appropriate combinatorial obstacle 
embedded in T. The running time is $O(n^3)$. 
Similarly, a by-product of the proof of Fradkin 
and Seymour [7] is an algorithm that, for the 
same input, either finds a path decomposition of 
T of width $O(k^2)$ or concludes that pw(T) > k, 
again certifying this by providing an appropriate 
obstacle. Unfortunately, here the running time is 
$O(n^{O(k)})$; in other words, the exponent of the 
polynomial grows with k. 

The proofs of the Erdős-Pósa statements proceed 
as follows: One shows that if the found 
combinatorial obstacle is large enough, i.e., it 
certifies that ctw(T) > k or pw(T) > k for 
large enough k, then already inside this obstacle 
one can find an embedding of every digraph H 
of a fixed size. Of course, the final values of 
constants $c_H$ and $p_H$ depend on how efficiently 
we can extract a model of H from an obstacle 
and, more precisely, how large k must be in terms 
of |H| in order to guarantee that an embedding of 
H can be found. (We denote |H| = |V(H)| + 
|E(H)|.) Unfortunately, in the proofs of Chudnovsky 
et al. [2] and Fradkin and Seymour [7], 
this dependency is exponential (even multiple 
exponential in the case of $p_H$) and so is the overall 
dependency of constants $c_H$ and $p_H$ on |H|. 
Using the framework presented before, one can 
obtain an $f(|H|) \cdot n^3$-time algorithm for testing 
whether H can be immersed into an n-vertex 
semi-complete digraph T, as well as similar tests 
for the (topological) minor relations with running 
time $n^{g(|H|)}$. Here, f and g are some multiple- 
exponential functions. 

The running time of the immersion testing al- 
gorithm is fixed-parameter tractable (FPT), while 
the running time for (topological) minor testing is 
only XP. It is natural to ask for an FPT algorithm 
also for the latter problem. Fomin and the current 
author [3,4,8] approached the issue from a differ- 
ent angle, which resulted in reproving the previ- 
ous results with better constants, refined running 
times, and also in giving FPT algorithms for all 
the containment tests. Notably, the framework 
seems to be simpler and more uniform compared 
to the previous work. We now state the results 
of [3, 4, 8] formally, since they constitute the best 
algorithms known so far for topological problems 
in semi-complete digraphs. 
Theorem 1 ([8]) There exists an algorithm that, 
given an n-vertex semi-complete digraph T, runs 
in time $O(n^2)$ and returns an ordering of V(T) of 
width at most $O(\mathrm{ctw}(T)^2)$. 


Theorem 2 ([8, 9]) There exists an algorithm 
that, given an n-vertex semi-complete digraph T 
and an integer k, runs in time $O(kn^2)$ and either 
returns a path decomposition of T of width at 
most 6k or correctly concludes that pw(T) > k. 


Theorem 3 ([4, 8, 9]) There exist algorithms 
that, given an n-vertex semi-complete digraph T 
and an integer k, determine whether: 


• $\mathrm{ctw}(T) \le k$ in time $2^{O(\sqrt{k} \log k)} \cdot n^2$, 
• $\mathrm{pw}(T) \le k$ in time $2^{O(k \log k)} \cdot n^2$. 


Theorem 4 ([8]) There exist algorithms that, 
given a digraph H and an n-vertex semi-complete 
digraph T, determine whether: 


• H can be immersed in T in time $2^{O(|H|^2 \log |H|)} \cdot n^2$, 

• H is a topological minor of T in time 
$2^{O(|H| \log |H|)} \cdot n^2$, 

• H is a minor of T in time $2^{O(|H| \log |H|)} \cdot n^2$. 

Thus, Theorems 1 and 2 provide approximation 
algorithms for cutwidth and pathwidth, Theorem 3 
provides FPT algorithms for computing 
the exact values of these parameters, and Theorem 4 
utilizes the approximation algorithms to 
give efficient algorithms for containment testing. 
We remark that the exact algorithm for cutwidth 
(the first bullet of Theorem 3) is a combination of 
the results of [8] (which gives a $2^{O(k)} \cdot n^2$-time 
algorithm) and of [4] (which gives a $2^{O(\sqrt{k} \log k)} \cdot n^{O(1)}$-time 
algorithm). A full exposition of this 
algorithm can be found in the PhD thesis of the 
current author [9], which contains a compilation 
of [3, 4, 8]. Moreover, for Theorem 2 work [8] 
claims only a 7-approximation, which has been 
subsequently improved to a 6-approximation in 
the aforementioned PhD thesis [9]. Finally, it 
also follows that in the Erdős-Pósa results, one 
can take $c_H = O(|H|^2)$ and $p_H = O(|H|)$; 
this claim is not mentioned explicitly in [8], but 
follows easily from the results proven there. 

To conclude, let us briefly discuss the 
approach that led to these results. The key to 
the understanding is the work [8]. The main 
observation there is that a large cluster of ver- 
tices with very similar outdegrees is already an 
obstacle for admitting a path decomposition of 
small width. More precisely, if one finds 4k + 2 
vertices whose outdegrees pairwise differ by at 
most k (a so-called (4k + 2,k)-degree tangle), 
then this certifies that pw(T) > k; see Lemma 46 
of [9]. As it always holds that $\mathrm{pw}(T) \le 2\,\mathrm{ctw}(T)$, 
this conclusion also implies that ctw(T) > k/2. 
Therefore, in semi-complete digraphs of small 
pathwidth or cutwidth, the outdegrees of vertices 
must be spread evenly; there is no “knot” with a 
larger density of vertices around some value of 
the outdegree. If we then order the vertices of 
T by their outdegrees, then this ordering should 
crudely resemble an ordering with the optimal 
width, as well as the order in which the ver- 
tices appear on an optimal path decomposition 
of T. Indeed, it can be shown that any ordering 
of V(T) w.r.t. nondecreasing outdegrees has 
width at most $O(\mathrm{ctw}(T)^2)$ [8]. Hence, the algorithm 
of Theorem 1 is, in fact, trivial: We 
just sort the vertices by their outdegrees. The 
pathwidth approximation algorithm (Theorem 2) 
is obtained by performing a left-to-right scan 
through the outdegree ordering that constructs 
a path decomposition in a greedy manner. For 
the exact algorithms (Theorem 3), in the scan 
we maintain a dynamic programming table of 
size exponential in k, whose entries correspond 
to possible endings of partial decompositions for 
prefixes of the ordering. The key to improving 
the running time for cutwidth to subexponential 
in terms of k (shown in [4]) is relating the 
states of the dynamic programming algorithm to 
partition numbers, a sequence whose subexpo- 
nential asymptotics is well understood. Finally, 
the obstacles yielded by the approximation algorithms 
of Theorems 1 and 2 are more useful 
for finding embeddings of small digraphs H 
than the ones used in the previous works. This 
leads to a better dependence on |H| of the constants 
$c_H$, $p_H$ in the Erdős-Pósa results, as well as 
of the running times of the containment tests 
(Theorem 4). 
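
The two simplest ingredients of this approach, the outdegree ordering and the search for a degree tangle, can be sketched as follows. The code is illustrative and is not taken from [8] or [9]: the function names and the sliding-window search are assumptions, and the sketch makes no attempt to match the stated running times.

```python
def outdegree_ordering(vertices, arcs):
    """Vertices sorted by nondecreasing outdegree, together with the outdegrees."""
    outdeg = {v: 0 for v in vertices}
    for u, _ in arcs:
        outdeg[u] += 1
    return sorted(vertices, key=lambda v: outdeg[v]), outdeg


def find_degree_tangle(vertices, arcs, k):
    """Return 4k + 2 vertices whose outdegrees pairwise differ by at most k, or None."""
    order, outdeg = outdegree_ordering(vertices, arcs)
    size = 4 * k + 2
    for start in range(len(order) - size + 1):
        window = order[start:start + size]
        # In sorted order the extreme outdegrees sit at the ends of the window.
        if outdeg[window[-1]] - outdeg[window[0]] <= k:
            return window
    return None
```

On a regular tournament with 7 vertices (each vertex beats the next three in cyclic order), all outdegrees equal 3, so a (6, 1)-degree tangle is found; on a transitive tournament of the same size the outdegrees are 0, 1, ..., 6 and no such tangle exists.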
Problem Definition 


This problem concerns the construction of pure 
Nash equilibria (PNE) in a special class of atomic 
congestion games, known as the Parallel Links 
Game (PLG). The purpose of this note is to gather 
recent advances in the existence and tractability 
of PNE in PLG. 

THE PURE PARALLEL LINKS GAME. Let 
$N = [n]$ ($\forall k \in \mathbb{N}$, $[k] = \{1, 2, \ldots, k\}$) be a set 
of (selfish) players, each of them willing to have 
her good served by a unique shared resource 
(link) of a system. Let $E = [m]$ be the set of all 
these links. For each link $e \in E$ and each player 
$i \in N$, let $D_{i,e}(\cdot) : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be the charging 
mechanism according to which link e charges 
player i for using it. Each player $i \in [n]$ comes 
with a service requirement (e.g., traffic demand 
or processing time) $W[i, e] > 0$, if she is to be 
served by link $e \in E$. A service requirement 
$W[i, e]$ is allowed to take the value $\infty$, to denote 
the fact that player i would never want to be 
assigned to link e. The charging mechanisms 
are functions of each link's cumulative 
congestion. 

Any element $\sigma_i \in E$ is called a pure strategy 
for player i. Then, this player is assumed 
to assign her own good to link $e = \sigma_i$. A collection 
of pure strategies for all the players is called 
a pure strategies profile, or a configuration of 
the players, or a state of the game. 

The individual cost of player i wrt the profile 
$\sigma$ is: $IC_i(\sigma) = D_{i,\sigma_i}\big(\sum_{j \in [n] : \sigma_j = \sigma_i} W[j, \sigma_j]\big)$. 
Thus, the Pure Parallel Links Game (PLG) 
is the game in strategic form defined as 
$\Gamma = \langle N, (\Sigma_i = E)_{i \in N}, (IC_i)_{i \in N} \rangle$, whose acceptable 
solutions are only PNE. Clearly, an arbitrary 
instance of PLG can be described by the tuple 
$\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$. 

DEALING WITH SELFISH BEHAVIOR. The 
dominant solution concept for finite games in 
strategic form is the Nash Equilibrium [14]. The 
definition of pure Nash Equilibria for PLG is the 
following: 


Definition 1 (Pure Nash Equilibrium) For any 
instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ 
of PLG, a pure strategies profile $\sigma \in E^n$ is a Pure 
Nash Equilibrium (PNE in short) iff: $\forall i \in N$, $\forall e \in E$, 

$$IC_i(\sigma) = D_{i,\sigma_i}\Big(\sum_{j \in [n] : \sigma_j = \sigma_i} W[j, \sigma_i]\Big) \le D_{i,e}\Big(W[i, e] + \sum_{j \in [n] \setminus \{i\} : \sigma_j = e} W[j, e]\Big).$$


A refinement of PNE is the notion of k-robust PNE, for 
$n \ge k \ge 1$ [9]. These are pure profiles for which 
no subset of at most k players may concurrently 
change their strategies in such a way that the 
worst possible individual cost among the movers 
is strictly decreased. 

QUALITY OF PURE EQUILIBRIA. In order to 
determine the quality of a PNE, a social cost 
function that measures it must be specified. 
The typical assumption in the literature of 
PLG is that the social cost function is the 
worst individual cost paid by the players: 
$\forall \sigma \in E^n$, $SC(\sigma) = \max_{i \in N}\{IC_i(\sigma)\}$ and $\forall p \in (\Delta_m)^n$, 
$SC(p) = \sum_{\sigma \in E^n} \big(\prod_{i \in N} p_i(\sigma_i)\big) \cdot \max_{i \in N}\{IC_i(\sigma)\}$. 
Observe that, for mixed 
profiles, the social cost is the expectation of 
the maximum individual cost among the players. 

The quality of an instance 
of PLG wrt PNE is measured by the Pure 
Price of Anarchy (PPoA in short) [12]: 
$PPoA = \max\{SC(\sigma)/OPT : \sigma \in E^n \text{ is PNE}\}$, 
where $OPT = \min_{\sigma \in E^n}\{SC(\sigma)\}$. 

DISCRETE DYNAMICS. Crucial concepts of 
strategic games are the best and better responses. 
Given a configuration $\sigma \in E^n$, an improvement 
step, or selfish step, or better response of 
player $i \in N$ is the choice by i of a pure 
strategy $\alpha \in E \setminus \{\sigma_i\}$, so that player i would 
have a positive gain by this unilateral change 
(i.e., provided that the other players maintain the 
same strategies). That is, $IC_i(\sigma) > IC_i(\sigma \oplus_i \alpha)$ 
where $\sigma \oplus_i \alpha = (\sigma_1, \ldots, \sigma_{i-1}, \alpha, \sigma_{i+1}, \ldots, \sigma_n)$. 
A best response, or greedy selfish step of player 
i, is any change from the current link $\sigma_i$ to 
an element $\alpha^* \in \arg\min_{a \in E}\{IC_i(\sigma \oplus_i a)\}$, 
that is, to a link of minimum individual cost for i. 
An improvement path (aka a sequence of 
selfish steps [6], or an elementary step 
system [3]) is a sequence of configurations 
$\pi = \langle \sigma(1), \ldots, \sigma(k) \rangle$ such that 

$$\forall\, 2 \le r \le k,\ \exists i_r \in N,\ \exists \alpha_r \in E : [\sigma(r) = \sigma(r-1) \oplus_{i_r} \alpha_r] \wedge [IC_{i_r}(\sigma(r)) < IC_{i_r}(\sigma(r-1))].$$

A game has the Finite Improvement Property 
(FIP) iff any improvement path has finite length. 
A game has the Finite Best Response Property 
(FBRP) iff any improvement path, each step of 
which is a best response of some player, has finite 
length. 
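
The definitions of individual cost, PNE, and best response improvement paths translate directly into code. The sketch below is illustrative only: players and links are numbered from 0, `W[i][e]` and `D[i][e]` are plain Python lists standing for the service requirements and charging mechanisms, profiles are tuples, and the function names and the `max_steps` cut-off are assumptions of this sketch.

```python
def individual_cost(i, profile, W, D):
    """IC_i: player i's charge on her link, given that link's cumulative load."""
    link = profile[i]
    load = sum(W[j][link] for j, e in enumerate(profile) if e == link)
    return D[i][link](load)


def best_response(i, profile, W, D):
    """A cheapest link for player i; her current link if no move strictly improves."""
    m = len(W[i])
    costs = [individual_cost(i, profile[:i] + (e,) + profile[i + 1:], W, D)
             for e in range(m)]
    best = min(range(m), key=costs.__getitem__)
    return best if costs[best] < costs[profile[i]] else profile[i]


def is_pne(profile, W, D):
    """No player can strictly decrease her cost by a unilateral move."""
    return all(best_response(i, profile, W, D) == profile[i]
               for i in range(len(profile)))


def best_response_path(profile, W, D, max_steps=1000):
    """Follow best response improvement steps until a PNE is reached."""
    path = [profile]
    for _ in range(max_steps):
        for i in range(len(profile)):
            e = best_response(i, profile, W, D)
            if e != profile[i]:
                profile = profile[:i] + (e,) + profile[i + 1:]
                path.append(profile)
                break
        else:
            return path                         # no player wants to move
    raise RuntimeError("no PNE reached within max_steps")
```

With three players of demands 3, 2, 2 on two identical links (identity charging), starting from everybody on link 0, a single best response step of the first player already yields a PNE.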

An alternative trend is to, rather than consider 
sequential improvement paths, let the players 
conduct selfish improvement steps concurrently. 
Nevertheless, the selfish decisions are no longer 
deterministic, but rather distributions over the 
links, in order to have some notion of a priori 
Nash property that justifies these moves. The 
selfish players try to minimize their expected 
individual costs this time. Rounds of concurrent 
moves occur until a posteriori Nash Property 
is achieved. This is called a selfish rerouting 
policy [4]. 


Subclasses of PLG 

[PLG$_1$] Monotone PLG: The charging mechanism 
of each pair of a link and a player is a non-decreasing 
function of the resource's cumulative 
congestion. 

[PLG$_2$] Resource Specific Weights PLG 
(RSPLG): Each player may have a different 
service demand from every link. 

[PLG$_3$] Player Specific Delays PLG 
(PSPLG): Each link may have a different 
charging mechanism for each player. Some 
special cases of PSPLG are the following: 

[PLG$_{3.1}$] Linear Delays PSPLG: Every link 
has a (player specific) affine charging mechanism: 
$\forall i \in N, \forall e \in E$, $D_{i,e}(x) = a_{i,e} x + b_{i,e}$ 
for some $a_{i,e} > 0$ and $b_{i,e} \ge 0$. 

[PLG$_{3.1.1}$] Related Delays PSPLG: Every 
link has a (player specific) non-uniformly related 
charging mechanism: $\forall i \in N, \forall e \in E$, $W[i, e] = w_i$ 
and $D_{i,e}(x) = a_{i,e} x$ for some $a_{i,e} > 0$. 

[PLG$_4$] Resource Uniform Weights PLG 
(RUPLG): Each player has a unique service 
demand from all the resources. I.e., 
$\forall i \in N, \forall e \in E$, $W[i, e] = w_i > 0$. A special 
case of RUPLG is: 

[PLG$_{4.1}$] Unweighted PLG: All the players 
have identical demands from all the links: 
$\forall i \in N, \forall e \in E$, $W[i, e] = 1$. 

[PLG$_5$] Player Uniform Delays PLG (PUPLG): 
Each resource adopts a unique charging 
mechanism, for all the players. That is, 
$\forall i \in N, \forall e \in E$, $D_{i,e}(x) = d_e(x)$. 

[PLG$_{5.1}$] Unrelated Parallel Machines, or 
Load Balancing PLG (LBPLG): The links behave 
as parallel machines. That is, they charge 
each of the players for the cumulative load assigned 
to their hosts. One may think (wlog) that 
all the machines have as charging mechanisms 
the identity function. That is, 
$\forall i \in N, \forall e \in E$, $D_{i,e}(x) = x$. 

[PLG$_{5.1.1}$] Uniformly Related Machines 
LBPLG: Each player has the same demand 
at every link, and each link serves players at 
a fixed rate. That is: $\forall i \in N, \forall e \in E$, $W[i, e] = w_i$ 
and $D_{i,e}(x) = x / s_e$. Equivalently, service 
demands proportional to the capacities of the 
machines are allowed, but the identity function 
is required as the charging mechanism: 
$\forall i \in N, \forall e \in E$, $W[i, e] = w_i / s_e$ and $D_{i,e}(x) = x$. 

[PLG$_{5.1.1.1}$] Identical Machines LBPLG: 
Each player has the same demand at every 
link, and all the delay mechanisms are the 
identity function: $\forall i \in N, \forall e \in E$, $W[i, e] = w_i$ 
and $D_{i,e}(x) = x$. 

[PLG$_{5.1.2}$] Restricted Assignment LBPLG: 
Each traffic demand is either of unit or 
infinite size. The machines are identical. 
I.e., $\forall i \in N, \forall e \in E$, $W[i, e] \in \{1, \infty\}$ and 
$D_{i,e}(x) = x$. 
Algorithmic Questions Concerning PLG 
The following algorithmic questions are considered: 


Problem 1 (PNEExistsInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 

OUTPUT: Is there a configuration $\sigma \in E^n$ of the 
players to the links, which is a PNE? 


Problem 2 (PNEConstructionInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 

OUTPUT: An assignment $\sigma \in E^n$ of the players 
to the links, which is a PNE. 


Problem 3 (BestPNEInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 
A social cost function 
$SC : (\mathbb{R}_{\ge 0})^n \to \mathbb{R}_{\ge 0}$ that characterizes the 
quality of any configuration $\sigma \in E^n$. 

OUTPUT: An assignment $\sigma \in E^n$ of the players 
to the links, which is a PNE and minimizes the 
value of the social cost, compared to other PNE 
of PLG. 


Problem 4 (WorstPNEInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 
A social cost function 
$SC : (\mathbb{R}_{\ge 0})^n \to \mathbb{R}_{\ge 0}$ that characterizes the 
quality of any configuration $\sigma \in E^n$. 

OUTPUT: An assignment $\sigma \in E^n$ of the players 
to the links, which is a PNE and maximizes the 
value of the social cost, compared to other PNE 
of PLG. 


Problem 5 (DynamicsConvergeInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 

OUTPUT: Does FIP (or FBRP) hold? How long 
does it take then to reach a PNE? 


Problem 6 (ReroutingConvergeInPLG(E, N, W, D)) 
INPUT: An instance $\langle N, E, (W[i, e])_{i \in N, e \in E}, (D_{i,e}(\cdot))_{i \in N, e \in E} \rangle$ of PLG. 

OUTPUT: Compute (if any) a selfish rerouting 
policy that converges to a PNE. 


Status of Problem 1 

Player uniform, unweighted atomic congestion 
games always possess a PNE [15], with no assumption 
on monotonicity of the charging mechanisms. 
Thus, Problem 1 is already answered 
for all unweighted PUPLG. Nevertheless, this is 
not necessarily the case for weighted versions of 
PLG: 


Theorem 1 ([13]) There is an instance of (monotone) 
PSPLG with only three players and three 
strategies per player, possessing no PNE. On the 
other hand, any unweighted instance of monotone 
PSPLG possesses at least one PNE. 


Similar (positive) results were given for LBPLG. 
The key observation that led to these results 
is the fact that the lexicographically minimum 
vector of machine loads is always a PNE of the 
game. 


Theorem 2 There is always a PNE for 
any instance of Uniformly Related LBPLG [7], 
and actually for any instance of 
LBPLG [3]. Indeed, there is a k-robust 
PNE for any instance of LBPLG, and any 
$1 \le k \le n$ [9]. 


Status of Problems 2, 5 and 6 

Milchtaich [13] gave a constructive proof of 
existence for PNE in unweighted, monotone 
PSPLG, which implies a path of length at 
most n that leads to a PNE. Although this 
is a very efficient construction of PNE, it is 
not necessarily an improvement path when 
all players are considered to coexist all the 
time, and therefore there is no justification for 
the adoption of such a path by the players. 
Milchtaich [13] also proved that from an arbitrary 
initial configuration, and allowing only best reply 
defections, there is a best reply improvement 
path of length at most $m \cdot \binom{n+1}{2}$. Finally, [11] 
proved that unweighted, Related PSPLG 
possesses FIP. Nevertheless, the convergence 
time is poor. 
For LBPLG, the implicit connection of PNE 
construction to classical scheduling problems 
has led to quite interesting results. 


Theorem 3 ([7]) The LPT algorithm of Graham 
yields a PNE for the case of Uniformly Related 
LBPLG, in time O(m log m). 
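
The LPT (longest processing time first) rule referred to in Theorem 3 can be sketched as follows for the Uniformly Related case, where a player of weight w on a link of speed s with total load L pays L / s. The sketch is illustrative: the names, the tie-breaking by lowest link index, and the use of floating point division are assumptions, and no attempt is made to match the running time stated in the theorem.

```python
def lpt_assignment(weights, speeds):
    """Assign players in order of decreasing weight to the link that would finish them first."""
    loads = [0] * len(speeds)
    assignment = [None] * len(weights)
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        e = min(range(len(speeds)),
                key=lambda e: (loads[e] + weights[i]) / speeds[e])
        assignment[i] = e
        loads[e] += weights[i]
    return assignment, loads


def is_pne_related(assignment, weights, speeds, loads):
    """Check that no player would pay strictly less on another link."""
    for i, e in enumerate(assignment):
        current = loads[e] / speeds[e]
        for f in range(len(speeds)):
            if f != e and (loads[f] + weights[i]) / speeds[f] < current:
                return False
    return True
```

For weights (5, 4, 3, 2) and speeds (2, 1), the rule places the players of weight 5, 3, and 2 on the fast link and the player of weight 4 on the slow one, and the check confirms that this configuration is a PNE.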


The drawback of the LPT algorithm is that it is 
centralized and not selfishly motivated. An alter- 
native approach, called Nashification, is to start 
from an arbitrary initial configuration o € E” 
and then try to construct a PNE of at most 
the same maximum individual cost among the 
players. 


Theorem 4 ([6]) There is an $O(nm^2)$ time 
Nashification algorithm for any instance of 
Uniformly Related PLG. 


An alternative style of Nashification, is to let the 
players follow an arbitrary improvement path. 
Nevertheless, it is not always the case that this 
leads to a polynomial time construction of a PNE, 
as the following theorem states: 


Theorem 5 For Identical Machines LBPLG: 


• There exist best response improvement paths 
of length $\Omega\big(\max\big\{2^{\sqrt{n}}, (n/m^2)^m\big\}\big)$ [3, 6]. 

• Any best response improvement path is of 
length $O(2^n)$ [6]. 

• Any best response improvement path, which 
gives priority to players of maximum weight 
among those willing to defect in each improvement 
step, is of length at most n [3]. 

• If all the service demands are integers, then 
any improvement path which gives priority to 
unilateral improvement steps, and otherwise 
allows only selfish 2-flips (i.e., swapping 
of hosting machines between two goods), 
converges to a 2-robust PNE in at most 
$\frac{1}{2}\big(\sum_{i \in N} w_i\big)^2$ steps [9]. 


The following result concerns selfish rerouting 
policies: 


Theorem 6 ([4]) 


• For unweighted Identical Machines LBPLG, 
a simple policy (BALANCE) forcing all the 
players of overloaded links to migrate to a new 
(random) link with probability proportional 
to the load of the link, converges to a PNE 
in O(log log n + log m) rounds of concurrent 
moves. The same convergence time holds also 
for a simple Nash Rerouting Policy, in which 
each mover actually has an incentive to move. 
• For unweighted Uniformly Related LBPLG, 
BALANCE has the same convergence time, but 
the Nash Rerouting Policy may converge in 
$\Omega(\sqrt{n})$ rounds. 


Finally, a generic result of [5] is mentioned, that 
computes a PNE for arbitrary unweighted, player 
uniform symmetric network congestion games 
in polynomial time, by a nice exploitation of 
Rosenthal’s potential and the solution of a proper 
minimum cost flow problem. Therefore, for PLG 
the following result is implied: 


Theorem 7 ([5]) For unweighted, monotone PUPLG, 
a PNE can be constructed in polynomial 
time. 


Of course, this result provides no answer, e.g., 
for Restricted Assignment LBPLG, for which it 
is still not known how to efficiently compute 
PNE. 


Status of Problem 3 and 4 

The proposed LPT algorithm of [7] for 
constructing PNE in Uniformly Related LBPLG 
actually provides a solution which is at most 
1.52 < PPoA(LPT) < 1.67 times worse than 
the optimum PNE (which is indeed the allocation 
of the goods to the links that minimizes the 
makespan). The construction of the optimum, 
as well as of the worst PNE, are hard problems, 
which nevertheless admit a PTAS (in some 
cases): 


Theorem 8 For LBPLG with a social cost func- 
tion as defined in the Quality of Pure Equilibria 
paragraph: 


• For Identical Machines, constructing the optimum 
or the worst PNE is NP-hard [7]. 

• For Uniformly Related Machines, there is 
a PTAS for the optimum PNE [6]. 
• For Uniformly Related Machines, it holds 
that $PPoA = O\big(\min\{(\log m)/(\log\log m), \log(s_{\max}/s_{\min})\}\big)$ [2]. 

• For the Restricted Assignments, 
$PPoA = \Omega((\log m)/(\log\log m))$ [10]. 

• For a generalization of the Restricted 
Assignments, where the players have goods 
of any positive, otherwise infinite service 
demands from the links (and not only elements 
of $\{1, \infty\}$), it holds that $m - 1 \le PPoA < m$ 
[10]. 


It is finally mentioned that a recent result [1] for 
unweighted, single commodity network conges- 
tion games with linear delays, is translated to the 
following result for PLG: 


Theorem 9 ([1]) For unweighted PUPLG with 
linear charging mechanisms for the links, the 
worst case PNE may be a factor of PPoA = 5/2 
away from the optimum solution, wrt the social 
cost defined in the Quality of Pure Equilibria 
paragraph. 


Key Results 


None 


Applications 


Congestion games in general have attracted much 
attention from many disciplines, partly because 
they capture a large class of routing and resource 
allocation scenarios. 

PLG in particular, is the most elementary 
(non-trivial) atomic congestion game among 
a large number of players. Despite its simplicity, 
it was proved [8] that it is asymptotically the 
worst case instance wrt the maximum individual 
cost measure, for a large class of atomic congestion 
games involving the so-called layered networks. 
Therefore, PLG is considered an excellent 
starting point for studying congestion games 
in large scale networks. 

The importance of seeking PNE, rather 
than arbitrary (mixed in general) NE, is quite 
obvious in sciences like economics, ecology, 
and biology. It is also important for computer 
scientists, since it enforces deterministic costs to 
the players, and both the players and the network 
designer may feel safer in this case about what 
they will actually have to pay. 

The question of whether the Nash dynamics
converge to a PNE in a reasonable amount of time is
also quite important, since (in case of a positive
answer) it justifies the selfish, decentralized, local
dynamics that appear in large-scale communications
systems. Additionally, selfish rerouting
schemes are of great importance, since this is
what should actually be expected from selfish,
decentralized computing environments.


Open Problems 


Open Question 1 Determine the (in)existence of 
PNE for all the instances of PLG that do not 
belong in LBPLG, or in monotone PSPLG. 


Open Question 2 Determine the (in)existence of
k-robust PNE for all the instances of PLG that
do not belong in LBPLG.


Open Question 3 Is there a polynomial-time
algorithm for constructing k-robust PNE, even
for the Identical Machines LBPLG and k > 1
being a constant?


Open Question 4 Do the improvement paths of 
instances of PLG other than PSPLG and LBPLG 
converge to a PNE? 


Open Question 5 Are there selfish rerouting
policies for instances of PLG other than Identical
Machines LBPLG that converge to a PNE? How
much time would they need, in case of a positive
answer?


Problem Definition


Concurrency, Synchronization and Resource Allocation
A concurrent system is a collection of proces- 
sors that communicate by reading and writing 
from a shared memory. A distributed system is 
a collection of processors that communicate by 
sending messages over a communication net- 
work. Such systems are used for various reasons: 
to allow a large number of processors to solve 
a problem together much faster than any proces- 
sor can do alone, to allow the distribution of data 
in several locations, to allow different processors 
to share resources such as data items, printers or 
discs, or simply to enable users to send electronic 
mail. 

A process corresponds to a given computation. 
That is, given some program, its execution is 
a process. Sometimes, it is convenient to refer
to the program code itself as a process. A process
runs on a processor, which is the physical
hardware. Several processes can run on the same 
processor although in such a case only one of 
them may be active at any given time. Real 
concurrency is achieved when several processes 
are running simultaneously on several processors. 

Processes in a concurrent system often need 
to synchronize their actions. Synchronization be- 
tween processes is classified as either cooperation 
or contention. A typical example for cooperation 
is the case in which there are two sets of pro- 
cesses, called the producers and the consumers, 
where the producers produce data items which 
the consumers then consume. 

Contention arises when several processes 
compete for exclusive use of shared resources, 
such as data items, files, discs, printers, etc. 
For example, the integrity of the data may be 
destroyed if two processes update a common 
file at the same time, and as a result, deposits and 
withdrawals could be lost, confirmed reservations 
might have disappeared, etc. In such cases it is 
sometimes essential to allow at most one process 
to use a given resource at any given time. 

Resource allocation is about interactions 
between processes that involve contention. The 
problem is, how to resolve conflicts resulting 
when several processes are trying to use shared 
resources. Put another way, how to allocate 
shared resources to competing processes. 
A special case of a general resource allocation 
problem is the mutual exclusion problem where 
only a single resource is available. 


The Mutual Exclusion Problem 

The mutual exclusion problem, which was first 
introduced by Edsger W. Dijkstra in 1965, is the 
guarantee of mutually exclusive access to a single 
shared resource when there are several competing 
processes [6]. The problem arises in operating 
systems, database systems, parallel supercomput- 
ers, and computer networks, where it is neces- 
sary to resolve conflicts resulting when several 
processes are trying to use shared resources. The 
problem is of great significance, since it lies at 
the heart of many interprocess synchronization 
problems.

The problem is formally defined as follows: it
is assumed that each process is executing a
sequence of instructions in an infinite loop. The
instructions are divided into four continuous
sections of code: the remainder, entry, critical
section and exit. Thus, the structure of a mutual
exclusion solution looks as follows:


loop forever
    remainder code;
    entry code;
    critical section;
    exit code
end loop


A process starts by executing the remainder code. 
At some point the process might need to execute 
some code in its critical section. In order to access 
its critical section a process has to go through 
an entry code which guarantees that while it is 
executing its critical section, no other process is 
allowed to execute its critical section. In addition, 
once a process finishes its critical section, the 
process executes its exit code in which it notifies 
other processes that it is no longer in its critical 
section. After executing the exit code the process 
returns to the remainder. 

The mutual exclusion problem is to write the
entry code and the exit code in such
a way that the following two basic requirements
are satisfied.

Mutual exclusion: No two processes are in their 
critical sections at the same time. 
Deadlock-freedom: If a process is trying to enter 
its critical section, then some process, not neces- 
sarily the same one, eventually enters its critical 
section. 

The deadlock-freedom property guarantees 
that the system as a whole can always continue to 
make progress. However, deadlock-freedom may
still allow “starvation” of individual processes. 
That is, a process that is trying to enter its critical 
section, may never get to enter its critical section, 
and wait forever in its entry code. A stronger 
requirement, which does not allow starvation, is 
defined as follows. 

Starvation-freedom: If a process is trying to
enter its critical section, then this process must
eventually enter its critical section.

Although starvation-freedom is strictly
stronger than deadlock-freedom, it still allows
processes to execute their critical sections
arbitrarily many times before some trying process
can execute its critical section. Such behavior is
prevented by the following fairness requirement.

First-in-first-out (FIFO): No beginning process
can enter its critical section before a process that
is already waiting for its turn to enter its critical
section.

The first two properties, mutual exclusion and 
deadlock-freedom, were required in the original
statement of the problem by Dijkstra. They are 
the minimal requirements that one might want 
to impose. In solving the problem, it is assumed 
that once a process starts executing its critical 
section the process always finishes it regardless 
of the activity of the other processes. Of all 
the problems in interprocess synchronization, the 
mutual exclusion problem is the one studied most 
extensively. This is a deceptive problem, and at 
first glance it seems very simple to solve. 


Key Results 


Numerous solutions for the problem have been 
proposed since it was first introduced by Edsger 
W. Dijkstra in 1965 [6]. Because of its impor- 
tance and as a result of new hardware and soft- 
ware developments, new solutions to the problem 
are still being designed. Before the results are
discussed, a few models for interprocess
communication are mentioned.


Atomic Operations 

Most concurrent solutions to the problem
assume an architecture in which n processes
communicate asynchronously via shared
objects. All architectures support atomic
registers, which are shared objects that support
atomic read and write operations. A weaker
notion than an atomic register, called a safe
register, is also considered in the literature. In
a safe register, a read not concurrent with any
write must obtain the correct value; however,
a read that is concurrent with some write
may return an arbitrary value. Most modern
architectures also support some form of atomicity
which is stronger than simple reads and writes.
Common atomic operations have special names.
A few examples are:


• Test-and-set: takes a shared register r and
  a value val. The value val is assigned to r, and
  the old value of r is returned.

• Swap: takes a shared register r and a local
  register ℓ, and atomically exchanges their
  values.

• Fetch-and-increment: takes a register r. The
  value of r is incremented by 1, and the old
  value of r is returned.

• Compare-and-swap: takes a register r, and two
  values: new and old. If the current value of the
  register r is equal to old, then the value of r
  is set to new and the value true is returned;
  otherwise r is left unchanged and the value
  false is returned.
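
The semantics of the four operations listed above can be illustrated with a small sketch. It is only an emulation: the class name AtomicRegister is hypothetical, and an internal lock stands in for the atomicity that real hardware provides. Since Python has no by-reference local registers, swap returns the value that the caller's local register should take.

```python
import threading


class AtomicRegister:
    """Emulated shared register (hypothetical helper, not a hardware API).

    An internal lock makes each operation appear atomic.
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def test_and_set(self, val):
        """Assign val to the register and return the old value."""
        with self._guard:
            old, self._value = self._value, val
            return old

    def swap(self, local):
        """Store local in the register; return the old register value,
        which becomes the new content of the caller's local register."""
        with self._guard:
            old, self._value = self._value, local
            return old

    def fetch_and_increment(self):
        """Increment the register by 1 and return the old value."""
        with self._guard:
            old = self._value
            self._value = old + 1
            return old

    def compare_and_swap(self, old, new):
        """Set the register to new only if it currently equals old."""
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False
```

For instance, a fetch-and-increment on a register holding 1 returns 1 and leaves 2 in the register, while a compare-and-swap whose old argument does not match leaves the register unchanged.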


Modern operating systems (such as Unix 
and Windows) implement synchronization 
mechanisms, such as semaphores, that simplify 
the implementation of mutual exclusion locks 
and hence the design of concurrent applications. 
Also, modern programming languages (such as 
Modula and Java) implement the monitor concept 
which is a program module that is used to ensure 
exclusive access to resources. 


Algorithms and Lower Bounds 

There are hundreds of beautiful algorithms for
solving the problem, some of which are also very
efficient. Only a few are mentioned below. First,
algorithms that use only atomic registers, or even
safe registers, are discussed.

The Bakery Algorithm. The Bakery algorithm
is one of the best known and most elegant mutual
exclusion algorithms using only safe registers [9].
The algorithm satisfies the FIFO requirement;
however, it uses registers of unbounded size. A
modified version, called the Black-White Bakery
algorithm, satisfies FIFO and uses a bounded number
of bounded size atomic registers [14].
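
As an illustration, the following is a minimal sketch of the classical Bakery entry and exit code in its usual textbook form. It assumes CPython, where reads and writes of list elements behave like atomic registers, and it uses time.sleep(0) only to yield the processor while busy-waiting. The names BakeryLock and run_demo are hypothetical.

```python
import threading
import time


class BakeryLock:
    """Textbook Bakery algorithm for n processes numbered 0..n-1."""

    def __init__(self, n):
        self.n = n
        self.choosing = [False] * n   # True while process i picks a ticket
        self.number = [0] * n         # ticket of process i, 0 = not competing

    def acquire(self, i):
        # Entry code: take a ticket larger than every ticket currently seen.
        self.choosing[i] = True
        self.number[i] = 1 + max(self.number)
        self.choosing[i] = False
        for j in range(self.n):
            if j == i:
                continue
            while self.choosing[j]:           # wait until j has its ticket
                time.sleep(0)
            while True:
                nj = self.number[j]
                # Proceed once j is not competing or i has priority.
                if nj == 0 or (nj, j) > (self.number[i], i):
                    break
                time.sleep(0)

    def release(self, i):
        # Exit code: give up the ticket.
        self.number[i] = 0


def run_demo(n=3, rounds=50):
    """Each thread increments a shared counter non-atomically inside
    its critical section; returns the final counter value."""
    lock = BakeryLock(n)
    shared = {"counter": 0}

    def worker(i):
        for _ in range(rounds):
            lock.acquire(i)
            tmp = shared["counter"]   # critical section starts
            time.sleep(0)             # invite interleaving
            shared["counter"] = tmp + 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

If mutual exclusion holds, no increment is lost and run_demo(n, rounds) returns n * rounds. The tickets grow without bound while contention persists, which is the unbounded register size mentioned above.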

Lower bounds. A space lower bound for solving
mutual exclusion using only atomic registers
is the following: any deadlock-free mutual exclusion
algorithm for n processes must use at least n shared
registers [5]. It was also shown in [5] that this
bound is tight. A time lower bound for any mutual
exclusion algorithm using atomic registers is the
following: there is no a priori bound on the number of
steps taken by a process in its entry code until
it enters its critical section (counting steps only
when no other process is in its critical section
or exit code) [2]. Many other interesting lower
bounds exist for solving mutual exclusion.

A Fast Algorithm. A fast mutual exclusion
algorithm is an algorithm in which, in the
absence of contention, only a constant number of
accesses to the shared registers
are needed in order to enter and exit a critical
section. In [10], a fast algorithm using atomic
registers is described; however, in the presence
of contention, the winning process may have to
check the status of all other processes before it
is allowed to enter its critical section. A natural
question to ask is whether this algorithm can be
improved for the case where there is contention.

Adaptive Algorithms. Since the other contend- 
ing processes are waiting for the winner, it is 
particularly important to speed their entry to the 
critical section, by the design of an adaptive 
mutual exclusion algorithm in which the time 
complexity is independent of the total number 
of processes and is governed only by the current 
degree of contention. Several (rather complex)
adaptive algorithms using atomic registers are
known [1, 3, 14]. (Notice that the time lower
bound mentioned earlier implies that no adaptive
algorithm using only atomic registers exists when
time is measured by counting all steps.)

Local-spinning Algorithms. Many algorithms 
include busy-waiting loops. The idea is that in 
order to wait, a process spins on a flag register, 
until some other process terminates the spin with 
a single write operation. Unfortunately, under 
contention, such spinning may generate lots of 
traffic on the interconnection network between 
the process and the memory. An algorithm
satisfies local spinning if the only type of
spinning required is local spinning. Local spinning
is the situation where a process is spinning on
locally-accessible registers. Shared registers may
be locally-accessible as a result of either coherent
caching or when using distributed shared memory
where shared memory is physically distributed
among the processors.

Three local-spinning algorithms are presented 
in [4, 8, 11]. These algorithms use strong atomic 
operations (i.e., fetch-and-increment, swap, 
compare-and-swap), and are also called scalable 
algorithms since they are both local-spinning 
and adaptive. Performance studies have
shown that these algorithms scale very well as
contention increases. Local spinning algorithms
using only atomic registers are presented in 
[1, 3, 14]. 
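
To make the idea of local spinning concrete, here is a minimal sketch of an array-based queue lock built on fetch-and-increment. It is an illustration in the spirit of such locks and is not claimed to match any of the cited algorithms exactly. Each waiting process spins on its own slot of an array, and the releasing process ends that spin with a single write. The fetch-and-increment operation is emulated with a lock, time.sleep(0) merely yields while spinning, and all names are hypothetical. The lock assumes that at most n processes use it.

```python
import threading
import time


class FetchAndIncrement:
    """Emulated atomic counter (a lock stands in for the hardware)."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def fetch_and_increment(self):
        with self._guard:
            old = self._value
            self._value = old + 1
            return old


class ArrayQueueLock:
    """Array-based queue lock for at most n concurrent processes."""

    def __init__(self, n):
        self.n = n
        self.flags = [False] * n
        self.flags[0] = True          # the first ticket may enter at once
        self.tail = FetchAndIncrement()

    def acquire(self):
        # Take a ticket; the ticket determines a private slot to spin on.
        slot = self.tail.fetch_and_increment() % self.n
        while not self.flags[slot]:
            time.sleep(0)
        return slot

    def release(self, slot):
        self.flags[slot] = False                # reset own slot for reuse
        self.flags[(slot + 1) % self.n] = True  # one write wakes successor


def run_queue_lock_demo(n=3, rounds=50):
    """Threads increment a shared counter non-atomically under the lock."""
    lock = ArrayQueueLock(n)
    shared = {"counter": 0}

    def worker():
        for _ in range(rounds):
            slot = lock.acquire()
            tmp = shared["counter"]
            time.sleep(0)
            shared["counter"] = tmp + 1
            lock.release(slot)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

Because tickets are handed out in increasing order and served in that order, this sketch also grants the critical section in FIFO order of ticket acquisition.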

Only a few representative results have been
mentioned. There are dozens of other very
interesting algorithms and lower bounds. All
the results discussed above, and many more,
are described in detail in [15]. There are also many
results for solving mutual exclusion in distributed
message passing systems [13].


Applications 


Synchronization is a fundamental challenge in 
computer science. It is fast becoming a major 
performance and design issue for concurrent pro- 
gramming on modern architectures, and for the 
design of distributed and concurrent systems. 

Concurrent access to resources shared among 
several processes must be synchronized in order 
to avoid interference between conflicting operations.
Mutual exclusion locks (i.e., algorithms)
are the de facto mechanism for concurrency
control in concurrent applications: a process
accesses the resource only inside a critical section
of code, within which the process is guaranteed
exclusive access. The popularity of this approach
is largely due to the apparently simple programming
model of such locks and the availability of
implementations which are efficient and scalable.
Essentially all concurrent programs (including 
operating systems) use various types of mutual 
exclusion locks for synchronization. 

When using locks to protect access to
a resource which is a large data structure (or
a database), the granularity of synchronization
is important. Using a single lock to protect the
whole data structure, allowing only one process
at a time to access it, is an example of
coarse-grained synchronization. In contrast, fine-grained
synchronization makes it possible to lock “small pieces”
of a data structure, allowing several processes
with non-interfering operations to access it
concurrently. Coarse-grained synchronization
is easier to program but is less efficient and is not
fault-tolerant compared to fine-grained
synchronization. Using locks may degrade performance,
as it forces processes to wait for a lock to be
released. In a few cases of simple data structures,
such as queues, stacks and counters, locking may
be avoided by using lock-free data structures.


Problem Definition


Consider a graph G = (V, E). A subset C of V 
is called a dominating set if every vertex is either 
in C or adjacent to a vertex in C. If, further- 
more, the subgraph induced by C is connected, 
then C is called a connected dominating set. A 
connected dominating set with a minimum cardi- 
nality is called a minimum connected dominating 
set (MCDS). Computing an MCDS is an NP-hard
problem, and there is no polynomial-time
approximation with performance ratio ρH(Δ) for
ρ < 1 unless NP ⊆ DTIME(n^{O(ln ln n)}), where H
is the harmonic function and Δ is the maximum
degree of the input graph [11].
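
The two defining conditions can be checked mechanically. The sketch below is an illustration only: the function name and the adjacency-dictionary representation (each vertex mapped to the set of its neighbors) are assumptions made for the example. It tests domination first and then connectivity of the induced subgraph with a breadth-first search.

```python
from collections import deque


def is_connected_dominating_set(adj, candidate):
    """Return True if candidate is a connected dominating set of the
    graph given by adj (dict: vertex -> set of neighbouring vertices)."""
    chosen = set(candidate)
    if not chosen:
        return False
    # Domination: every vertex is in the set or adjacent to a member.
    for v in adj:
        if v not in chosen and not (adj[v] & chosen):
            return False
    # Connectivity: BFS restricted to the chosen vertices.
    start = next(iter(chosen))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in chosen and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == chosen
```

On the path 0-1-2-3-4, for example, the set {1, 2, 3} passes both checks, while {1, 3} dominates every vertex but induces a disconnected subgraph.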

A unit disk is a disk with radius one. A unit 
disk graph (UDG) is associated with a set of unit 
disks in the Euclidean plane. Each node is at the 
center of a unit disk. An edge exists between two
nodes u and v if and only if |uv| ≤ 1, where |uv|
is the Euclidean distance between u and v. This
means that two nodes u and v are connected with 
an edge if and only if u’s disk covers v and v’s 
disk covers u. 
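
A unit disk graph can be built directly from the coordinates of the disk centers. The following sketch assumes that two nodes are adjacent when their Euclidean distance is at most one; the function name and the representation (a list of points, with vertices identified by list index) are choices made for the example.

```python
import math
from itertools import combinations


def unit_disk_graph(points):
    """Build the adjacency sets of the unit disk graph on the given
    points (list of (x, y) pairs); vertices are the list indices."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        # Adjacent when each disk covers the other centre.
        if math.dist(points[i], points[j]) <= 1.0:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

This quadratic-time construction is sufficient for small examples; a grid of unit cells would avoid comparing far-apart pairs on larger inputs.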

Computing an MCDS in a unit disk graph 
is still NP-hard. How hard is it to construct 
a good approximation for MCDS in unit disk 
graphs? Cheng et al. [5] answered this question 
by presenting a polynomial-time approximation 
scheme. 


Historical Background 

The connected dominating set problem has been 
studied in graph theory for many years [23]. 
However, it has recently become a hot topic due to
its application in wireless networks for virtual
backbone construction [4]. Guha and Khuller
[11] gave a two-stage greedy approximation for
the minimum connected dominating set in general
graphs and showed that its performance ratio
is 3 + ln Δ, where Δ is the maximum node
degree in the graph. To design a one-step greedy
approximation to reach a similar performance
ratio, the difficulty is to find a submodular
potential function. In [22], Ruan et al. successfully
designed a one-step greedy approximation that
reaches a better performance ratio c + ln Δ for
any c > 2. Du et al. [7] showed that there
exists a polynomial-time approximation with a
performance ratio a(1 + ln Δ) for any a > 1.
The importance of those works is that the potential
functions used in their greedy algorithms are
non-submodular and they managed to complete
the theoretical performance evaluation with fresh
ideas.

Guha and Khuller [11] also gave a negative
result that there is no polynomial-time
approximation with a performance ratio ρ ln Δ
for ρ < 1 unless NP ⊆ DTIME(n^{O(ln ln n)}).
As indicated by [9], dominating sets cannot be
approximated arbitrarily well unless P is almost
equal to NP. These results move attention
from general graphs to unit disk graphs because
the unit disk graph is the model for wireless
sensor networks, and in unit disk graphs, MCDS
has a polynomial-time approximation with a constant
performance ratio. While this constant ratio
was being improved step by step [1, 2, 20, 25],
Cheng et al. [5] closed this story by showing
the existence of a polynomial-time approximation
scheme (PTAS) for the MCDS in unit disk
graphs. This means that, theoretically, the performance
ratio for polynomial-time approximation
can be as small as 1 + ε for any positive number ε.

Dubhashi et al. [8] showed that once a 
dominating set is constructed, a connected 
dominating set can be easily computed in a 
distributed fashion. Most centralized results 
for dominating sets are available at [19]. In 
particular, a simple constant approximation 
for dominating sets in unit disk graphs was 
presented in [19]. Constant-factor approximation 
for minimum-weight (connected) dominating 
sets in UDGs was studied in [3]. A PTAS for the 
minimum dominating set problem in UDGs was 
proposed in [21]. Kuhn et al. [16] proved that a
maximal independent set (MIS) (and hence also
a dominating set) can be computed in asymptotically
optimal time O(log* n) in UDGs and a large
class of bounded independence graphs. Luby [18]
reported an elegant local O(log n) algorithm for
MIS on general graphs. Jia et al. [12] proposed a
fast O(log n) distributed approximation for
dominating set in general graphs. The first
constant-time distributed algorithm for dominating sets
that achieves a nontrivial approximation ratio
for general graphs was reported in [13]. The
matching Ω(log* n) lower bound is considered to
be a classic result in distributed computing [17].
For UDGs a PTAS is achievable in a distributed 
fashion [15]. The fastest deterministic distributed 
algorithm for dominating sets in UDGs was 
reported in [14], and the fastest randomized 
distributed algorithm for dominating sets in 
UDGs was presented in [10]. 


Key Results 


The construction of PTAS for MCDS is based 
on the fact that there is a polynomial-time ap- 
proximation with a constant performance ratio. 
Actually, this fact is quite easy to see. First, note 
that a unit disk contains at most five independent 
vertices [2]. This implies that every maximal 
independent set has a size at most 1 + 4opt where 
opt is the size of an MCDS. Moreover, every 
maximal independent set is a dominating set and 
it is easy to construct a maximal independent 
set with a spanning tree of all edges with length 
two. All vertices in this spanning tree form a 
connected dominating set of a size at most 1 + 
8opt. By improving the upper bound for the size
of a maximal independent set [26] and the way to
interconnect a maximal independent set [20],
the constant ratio has been improved to 6.8 with
a distributed implementation.
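
One simple way to realize the construction just described is sketched below. It is an illustrative variant, not necessarily the construction used in the cited works: vertices are scanned in breadth-first order, a vertex joins the independent set when none of its neighbors has joined, and the BFS parent of each newly added vertex is kept as a connector. The parent is already dominated by an earlier member of the independent set, so every new member is within two hops of an earlier one and the result has at most 2|MIS| − 1 vertices. The function name and graph representation are assumptions, and the input graph is assumed to be connected.

```python
from collections import deque


def mis_based_cds(adj, root):
    """Return (mis, cds) for a connected graph adj (dict: vertex -> set
    of neighbours): a maximal independent set found in BFS order and
    the connected dominating set formed by adding connector vertices."""
    mis, connectors = set(), set()
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if not (adj[v] & mis):
            # v is not dominated yet, so it joins the independent set.
            mis.add(v)
            if parent[v] is not None:
                # The parent links v to an earlier member of the MIS.
                connectors.add(parent[v])
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return mis, mis | connectors
```

On the path 0-1-2-3-4 with root 0, the independent set is {0, 2, 4} and the connectors are 1 and 3.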

The basic techniques in this construction are 
nonadaptive partition and shifting. Its general 
picture is as follows: First, the square containing 
all vertices of the input unit disk graph is divided 
into a grid of small cells. Each small cell is further 
divided into two areas, the central area and the 
boundary area. The central area consists of points
at distance at least h from the cell boundary. The
boundary area consists of points within distance
h + 1 from the boundary. Therefore, the two areas
overlap. Then a minimum union of connected
dominating sets is computed in each cell
for connected components of the central area of
the cell. The key lemma is to prove that the union
of all such minimum unions is no larger than the
minimum connected dominating set for the whole
graph. For vertices not in central areas, just use
the part of an 8-approximation lying in boundary 
areas to dominate them. This part together with 
the above union forms a connected dominating 
set for the whole input unit disk graph. By shift- 
ing the grid around to get partitions at different 
coordinates, a partition having the boundary part 
with a very small upper bound can be obtained. 

The following details the construction. 

Consider an input connected unit disk graph G =
(V, E) residing in a square Q = {(x, y) | 0 ≤ x ≤
q, 0 ≤ y ≤ q}, where q ≤ |V|. To construct an
approximation with a performance ratio 1 + ε for
ε > 0, choose an integer m = O((1/ε) ln(1/ε)).
Let p = ⌊q/m⌋ + 1. Consider the square Q̄ =
{(x, y) | −m ≤ x ≤ mp, −m ≤ y ≤ mp}.
Partition Q̄ into a (p + 1) × (p + 1) grid so
that each cell is an m × m square excluding the
top and the right boundaries, and hence no two
cells overlap each other. This partition of
Q̄ is denoted by P(0) (Fig. 1). In general, the
partition P(a) is obtained from P(0) by shifting
the bottom-left corner of Q̄ from (−m, −m) to
(−m + a, −m + a). Note that shifting from P(0)
to P(a) for 0 ≤ a < m keeps Q covered by the
partition.
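
The shifted partition can be made concrete with a few lines of code. The sketch below assumes the reading of the construction in which the enlarged square has its bottom-left corner at (−m + a, −m + a) and each cell is half-open (it excludes its top and right boundaries); the function names are hypothetical.

```python
import math


def grid_parameter(q, m):
    """p = floor(q / m) + 1, so that the grid has (p + 1) cells per side."""
    return math.floor(q / m) + 1


def cell_index(x, y, m, a=0):
    """Column and row of the m x m cell of partition P(a) containing
    the point (x, y). Cells are half-open, so floor is the right choice."""
    col = math.floor((x + m - a) / m)
    row = math.floor((y + m - a) / m)
    return col, row
```

For a point of Q and a shift with 0 ≤ a < m, both indices lie between 0 and p, which reflects the remark that the shifted partition still covers Q.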


Connected Dominating Set, Fig. 1 Squares Q and Q̄

For each cell e (an m × m square), C_e(d)
denotes the set of points in e away from the
boundary by distance at least d, e.g., C_e(0) is the
cell e itself. Denote B_e(d) = C_e(0) − C_e(d).
Fix a positive integer h = 7 + 3⌊log_2(4m²/ε)⌋.
Call C_e(h) the central area of e and B_e(h + 1)
the boundary area of e. Hence, the boundary area
and the central area of each cell are overlapping
with width one.
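
The two areas can be expressed as simple predicates on the distance from a point to the boundary of its cell. The sketch below assumes the reading h = 7 + 3⌊log_2(4m²/ε)⌋ and the shifted grid with bottom-left corner (−m + a, −m + a); all function names are hypothetical.

```python
import math


def shift_depth(m, eps):
    """h = 7 + 3 * floor(log2(4 m^2 / eps)), as read from the text."""
    return 7 + 3 * math.floor(math.log2(4 * m * m / eps))


def distance_to_cell_boundary(x, y, m, a=0):
    """Distance from (x, y) to the boundary of its cell in P(a)."""
    u = (x + m - a) % m   # horizontal offset inside the cell
    v = (y + m - a) % m   # vertical offset inside the cell
    return min(u, m - u, v, m - v)


def in_central_area(x, y, m, h, a=0):
    """Membership in C_e(h): at distance at least h from the boundary."""
    return distance_to_cell_boundary(x, y, m, a) >= h


def in_boundary_area(x, y, m, h, a=0):
    """Membership in B_e(h + 1) = C_e(0) - C_e(h + 1)."""
    return distance_to_cell_boundary(x, y, m, a) < h + 1
```

A point whose distance to the cell boundary lies in the interval [h, h + 1) satisfies both predicates, which is the overlap of width one.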


Central Area 
Let G_e(d) denote the part of input graph G lying
in area C_e(d). In particular, G_e(h) is the part of
graph G lying in the central area of e. G_e(h) may
consist of several connected components. Let K_e
be a subset of vertices in G_e(0) with a minimum
cardinality such that for each connected
component H of G_e(h), K_e contains a connected
component dominating H. In other words, K_e is
a minimum union of connected dominating sets
in G_e(0) for the connected components of G_e(h).
Now, denote by K(a) the union of K_e for e
over all cells in partition P(a). K(a) has two
important properties:


Lemma 1 K(a) can be computed in time
n^{O(m^2)}.


Lemma 2 |K(a)| ≤ opt for 0 ≤ a ≤ m − 1.


Lemma 1 is not hard to see. Note that in
a square with edge length √2/2, all vertices
induce a complete subgraph in which any vertex
must dominate all other vertices. It follows that
the minimum dominating set for the vertices
of G_e(0) has size at most ⌈√2 m⌉². Hence,
the size of K_e is at most 3⌈√2 m⌉² because
any dominating set in a connected graph has a
spanning tree with an edge length at most three.
Suppose cell G_e(0) has n_e vertices. Then the
number of candidates for K_e is at most

\[ \sum_{k=0}^{3\lceil \sqrt{2}\,m \rceil^{2}} \binom{n_e}{k} = n_e^{O(m^2)}. \]

Hence, computing K(a) can be done in time

\[ \sum_{e} n_e^{O(m^2)} \le \Bigl(\sum_{e} n_e\Bigr)^{O(m^2)} = n^{O(m^2)}. \]

However, the proof of Lemma 2 is quite tedious.
The reader who is interested in it may find it
in [5].


Boundary Area 

Let F be a connected dominating set of G
satisfying |F| ≤ 8opt + 1. Denote by F(a) the subset
of F lying in the boundary areas B_e(h + 1) of the
cells e of P(a). Since
F is constructed in polynomial time, only the size
of F(a) needs to be studied.


Lemma 3 Suppose h = 7 + 3⌊log_2(4m²/ε)⌋
and ⌊m/(h + 1)⌋ ≥ 32/ε. Then for at least
half of i = 0, 1, ..., ⌊m/(h + 1)⌋ − 1,
|F(i(h + 1))| ≤ ε · opt.


Proof Let F_H(a) (F_V(a)) denote the subset of
vertices in F(a) each with distance < h + 1 from
the horizontal (vertical) boundary of some cell in
P(a). Then F(a) = F_H(a) ∪ F_V(a). Moreover,
all F_H(i(h + 1)) for i = 0, 1, ..., ⌊m/(h + 1)⌋ − 1
are disjoint. Hence,

\[ \sum_{i=0}^{\lfloor m/(h+1) \rfloor - 1} |F_H(i(h+1))| \le |F| \le 8\,\mathrm{opt}. \]

Similarly, all F_V(i(h + 1)) for i = 0, 1, ...,
⌊m/(h + 1)⌋ − 1 are disjoint and

\[ \sum_{i=0}^{\lfloor m/(h+1) \rfloor - 1} |F_V(i(h+1))| \le |F| \le 8\,\mathrm{opt}. \]

Thus,

\[ \sum_{i=0}^{\lfloor m/(h+1) \rfloor - 1} |F(i(h+1))| \le \sum_{i=0}^{\lfloor m/(h+1) \rfloor - 1} \bigl( |F_H(i(h+1))| + |F_V(i(h+1))| \bigr) \le 16\,\mathrm{opt}. \]

That is,

\[ \frac{1}{\lfloor m/(h+1) \rfloor} \sum_{i=0}^{\lfloor m/(h+1) \rfloor - 1} |F(i(h+1))| \le (\varepsilon/2)\,\mathrm{opt}. \]

This means that at least half of the sets F(i(h +
1)) for i = 0, 1, ..., ⌊m/(h + 1)⌋ − 1 satisfy


|F(i(h + 1))| ≤ ε · opt.


Putting Together 

Now put K(a) and F(a) together. By Lemmas 2 and 3,
there exists a ∈ {0, h + 1, ..., (⌊m/(h + 1)⌋ −
1)(h + 1)} such that


|K(a) ∪ F(a)| ≤ (1 + ε)opt.


Lemma 4 For 0 ≤ a ≤ m − 1, K(a) ∪ F(a)
is a connected dominating set for the input connected
graph G.


Proof K(a) ∪ F(a) is clearly a dominating set
for input graph G. Its connectivity can be shown
as follows. Note that the central area and the
boundary area are overlapping with an area of
width one. Thus, for any connected component
H of the subgraph G_e(h), F(a) has a vertex in
H. Hence, F(a) must connect to any connected
dominating set for H, in particular the one, D_H,
in K(a). This means that D_H is making up the
connections of F lost from cutting a part in
H. Therefore, the connectivity of K(a) ∪ F(a)
follows from the connectivity of F. □


By summarizing the above results, the follow- 
ing result is obtained: 


Theorem 1 There is a (1 + ε)-approximation for
MCDS in connected unit disk graphs, running in
time n^{O(((1/ε) log(1/ε))^2)}.


Applications 


An important application of connected dominating
sets is to construct virtual backbones for wireless
networks, especially wireless sensor networks [4].
The topology of a wireless sensor
network is often a unit disk graph.


Open Problems 


In general, the topology of a wireless network is 
a disk graph, that is, each vertex is associated 
with a disk. Different disks may have different 
sizes. There is an edge from vertex u to vertex 
v if and only if the disk at u covers v. A virtual 
backbone in disk graphs is a subset of vertices, 
which induces a strongly connected subgraph, 
such that every vertex not in the subset has an in- 
edge coming from a vertex in the subset and also 
has an out-edge going into a vertex in the subset. 
Such a virtual backbone can be considered as a
connected dominating set in a disk graph. Is there
a polynomial-time approximation with a constant
performance ratio? It is still open right now [6].
Thai et al. [24] have made some effort in this
direction.


Problem Definition

Given a collection C of subsets of a finite set
X, find a minimum subcollection C′ of C such
that every element of X appears in some subset
in C′. This problem is called the minimum set-cover
problem. Every feasible solution, i.e., a
subcollection C′ satisfying the required condition,
is called a set-cover. The minimum set-cover
problem is NP-hard, and the complexity
of approximation for it is well understood. It is well
known that (1) the minimum set-cover problem
has a polynomial-time (1 + ln n)-approximation
where n = |X| [2, 7, 8], and moreover (2) if the
minimum set-cover problem has a polynomial-time
(ρ ln n)-approximation for any 0 < ρ < 1,
then NP ⊆ DTIME(n^{O(log log n)}) [4].
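
A logarithmic approximation of this kind is classically obtained by the greedy heuristic that repeatedly picks the subset covering the most still-uncovered elements. The sketch below illustrates that heuristic; whether the cited references present it in exactly this form has not been checked here. The function name and the representation (a list of Python sets, with the answer returned as a list of indices) are assumptions.

```python
def greedy_set_cover(universe, collection):
    """Greedy heuristic for minimum set-cover.

    universe: iterable of elements; collection: list of sets.
    Returns the indices of the chosen subsets in order of selection.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most uncovered elements.
        best = max(range(len(collection)),
                   key=lambda i: len(collection[i] & uncovered))
        if not (collection[best] & uncovered):
            raise ValueError("collection does not cover the universe")
        chosen.append(best)
        uncovered -= collection[best]
    return chosen
```

Ties are broken in favor of the subset with the smallest index, because max returns the first maximal candidate.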

The minimum connected set-cover problem is
closely related to the minimum set-cover problem
and can be described as follows: Given
a collection C of subsets of a finite set X and
a graph G with vertex set C, find a minimum
set-cover C′ ⊆ C such that the subgraph
induced by C′ is connected. Whether
the minimum connected set-cover problem has a
polynomial-time O(log n)-approximation
[1, 9, 11] was open for several years.


Key Results 


Zhang et al. [12] solved this problem by discov- 
ering a relationship between the minimum con- 
nected set-cover problem and the group Steiner 
tree problem. 

Given a graph G = (V, E) with nonnegative
edge weights c : E → R⁺ and k subsets
(called groups) of vertices, V_1, ..., V_k, find the
minimum edge-weight tree interconnecting those
k vertex subsets, i.e., containing at least one
vertex from each subset. This is called the group
Steiner tree problem. It has another formulation
as follows: Given a graph G = (V, E) with
nonnegative edge weights c : E → R⁺, a special
vertex r, and k subsets of vertices, V_1, ..., V_k,
find the minimum edge-weight tree with root r,
interconnecting those k vertex subsets.

These two formulations are equivalent in
the sense that if one has a polynomial-time
ρ-approximation, then so does the other.
Indeed, consider vertex r as a group with only
one member. Then it is immediately known that
if the first formulation has a polynomial-time
ρ-approximation, so does the second formulation.
Next, assume the second formulation has
a polynomial-time ρ-approximation. In the
first formulation, fix a group V_1; for each
vertex v ∈ V_1, apply the polynomial-time
ρ-approximation algorithm for the second
formulation to the root r = v and the k − 1
groups V_2, ..., V_k. Choose the shortest one
from the |V_1| obtained trees, which would be a
polynomial-time ρ-approximation for the first
formulation.
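
The second direction of this argument is a simple wrapper around any algorithm for the rooted formulation. The sketch below illustrates it under stated assumptions: rooted_solver is a hypothetical callable that takes a root and a list of groups and returns a pair (cost, tree), and path_rooted_solver is a toy exact solver for a path graph with vertices 0, ..., n − 1 and unit edge weights, included only so that the example can be run.

```python
def unrooted_via_rooted(groups, rooted_solver):
    """Solve the unrooted formulation with a solver for the rooted one:
    try every vertex of the first group as the root and keep the
    cheapest tree. Returns a pair (cost, tree)."""
    first, rest = groups[0], groups[1:]
    best = None
    for v in first:
        cost, tree = rooted_solver(v, rest)
        if best is None or cost < best[0]:
            best = (cost, tree)
    return best


def path_rooted_solver(root, groups, n=10):
    """Toy exact rooted solver on the path 0 - 1 - ... - (n - 1) with
    unit weights. A tree containing root is an interval [lo, hi]; its
    cost is hi - lo. Brute force over all such intervals."""
    best = None
    for lo in range(0, root + 1):
        for hi in range(root, n):
            hits_all = all(any(lo <= v <= hi for v in g) for g in groups)
            if hits_all and (best is None or hi - lo < best[0]):
                best = (hi - lo, (lo, hi))
    return best
```

The wrapper calls the rooted solver once per vertex of the first group, so a polynomial-time rooted algorithm yields a polynomial-time unrooted one with the same performance ratio.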

The following are well-known results for the
group Steiner tree problem.


Theorem 1 (Halperin and Krauthgamer
[6]) The group Steiner tree problem has no
polynomial-time O(log^{2−ε} n)-approximation for
any ε > 0 unless NP has quasi-polynomial
Las Vegas algorithms.


Theorem 2 (Garg, Konjevod, Ravi [5]) The
group Steiner tree problem has a polynomial-time
random O(log² n log k)-approximation, where n
is the number of nodes in the input graph and k
is the number of groups.


Zhang et al. [12] showed that if the minimum
connected set-cover problem has a polynomial-time
ρ-approximation, then for any ε > 0, there
is a polynomial-time (ρ + ε)-approximation for
the group Steiner tree problem. Therefore, by
Theorem 1 they obtained the following result.


Theorem 3 (Zhang et al. [12]) The connected
set-cover problem has no polynomial-time
O(log^{2−ε} n)-approximation for any ε > 0 unless
NP has quasi-polynomial Las Vegas algorithms.


To obtain a good approximation for the minimum
connected set-cover problem, Wu et al. [10]
showed that if the group Steiner tree problem has
a polynomial-time ρ-approximation, so does the
minimum connected set-cover problem. Therefore,
they obtained the following theorem.


Theorem 4 (Wu et al. [10]) The connected
set-cover problem has a polynomial-time randomized
O(log² n log k)-approximation, where n = |C|
and k = |X|.

Combining the results of Zhang et al. [12] and
Wu et al. [10] yields the following relation.


Theorem 5 The connected set-cover problem
has a polynomial-time (ρ + ε)-approximation
for any ε > 0 if and only if the group Steiner
tree problem has a polynomial-time
(ρ + ε)-approximation.


This equivalence was also independently
discovered in [3]. In fact, it is similar
to the equivalence between the minimum set-cover
problem and the minimum hitting set problem.

For each element x ∈ X, define a collection of
subsets:


C_x = {S | x ∈ S ∈ C}.


Then, the minimum set-cover problem becomes
the minimum hitting set problem as follows:
Given a finite set C and a collection of subsets
of C, {C_x | x ∈ X}, find the minimum hitting set,
i.e., a minimum-size subset C′ of C such that for
every x ∈ X, C′ ∩ C_x ≠ ∅.
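The transformation from set cover to hitting set can be illustrated with a small Python sketch. The function names and the dictionary representation of the collection C (subset name mapped to its elements) are assumptions made for this example only.

```python
def hitting_set_instance(universe, collection):
    """For each element x, build C_x: the names of all subsets in C containing x."""
    return {x: {name for name, subset in collection.items() if x in subset}
            for x in universe}


def is_set_cover(universe, collection, chosen):
    """True if the chosen subsets together contain every element of the universe."""
    covered = set().union(*(collection[name] for name in chosen))
    return covered >= set(universe)


def is_hitting_set(instance, chosen):
    """True if the chosen names intersect every C_x."""
    chosen = set(chosen)
    return all(instance[x] & chosen for x in instance)
```

A family of subsets covers X exactly when the same family, viewed as a subset C′ of C, hits every C_x, so the two predicates agree on every choice.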

Similarly, the minimum connected set-cover
problem becomes the equivalent connected hitting
set problem as follows: Given a finite set
C, a graph G with vertex set C, and a collection
of subsets of C, {C_x | x ∈ X}, find the minimum
connected hitting set, where a connected hitting
set is a hitting set C′ such that C′ induces a
connected subgraph of G.

To see the equivalence between the minimum 
connected hitting set problem and the group 
Steiner tree problem, it is sufficient to note the 
following two facts. 

First, the existence of a connected hitting set
C′ is equivalent to the existence of a tree with
weight |C′| − 1 interconnecting the groups C_x for
all x ∈ X when every edge of the graph G is given
unit weight. This is because the subgraph
induced by C′ has a spanning tree
with weight |C′| − 1.

Second, a graph with nonnegative integer edge 
weight can be turned into an equivalent graph 
with unit edge weight by adding some new ver- 
tices to cut each edge with weight bigger than one 
into several edges with unit weight. 
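The second fact can be sketched as follows. The sketch assumes a simple graph with positive integer weights (zero-weight edges would need to be contracted separately), and the tuple names used for the new subdivision vertices are purely illustrative.

```python
def subdivide_to_unit_weights(edges):
    """Replace each edge (u, v, w), w a positive integer, by a path of w unit edges.

    New vertices are named ("sub", u, v, i) for i = 1, ..., w - 1.
    Returns the list of unit-weight edges as (a, b) pairs.
    """
    unit_edges = []
    for u, v, w in edges:
        if not isinstance(w, int) or w < 1:
            raise ValueError("this sketch assumes positive integer weights")
        prev = u
        for i in range(1, w):
            mid = ("sub", u, v, i)  # new vertex cutting the edge
            unit_edges.append((prev, mid))
            prev = mid
        unit_edges.append((prev, v))
    return unit_edges
```

Each original edge of weight w contributes exactly w unit edges, so path lengths between original vertices are preserved.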

Open Problems


It is an open problem whether there exists a
polynomial-time approximation for the minimum
connected set-cover problem with performance
ratio O(log^α n) for 2 < α < 3.

Problem Definition


A new model of random graphs was introduced
in [10], that of random regular graphs with edge
faults (denoted hereafter by G^r_{n,p}), obtained
by selecting each edge of a random member of the
set of all regular graphs of degree r
independently with probability p. Such graphs can
represent a communication network in which the
links fail independently with probability
f = 1 − p. A formal definition of the probability
space G^r_{n,p} follows.


Definition 1 (The G^r_{n,p} Probability Space) Let
G^r_n be the probability space of all random
regular graphs with n vertices where the degree of
each vertex is r. The probability space G^r_{n,p}
of random regular graphs with edge faults is
constructed by the following two consecutive
random experiments: first, a random regular graph
is chosen from the space G^r_n and, second, each
edge is randomly and independently deleted from
this graph with probability f = 1 − p.
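The two-stage experiment of Definition 1 can be sketched in Python. This is an illustration under stated assumptions, not the construction analyzed in [10]: the regular graph is drawn by pairing vertex copies uniformly at random and rejecting any outcome with a self-loop or a parallel edge, which is practical only for small r. All function names are hypothetical.

```python
import random


def random_regular_graph(n, r, rng, max_tries=10_000):
    """Sample a simple r-regular graph on vertices 0..n-1 by random pairing
    with rejection of self-loops and parallel edges."""
    if (n * r) % 2 or not 0 <= r < n:
        raise ValueError("need n*r even and 0 <= r < n")
    for _ in range(max_tries):
        points = [v for v in range(n) for _ in range(r)]  # r copies per vertex
        rng.shuffle(points)
        edges = set()
        simple = True
        for i in range(0, len(points), 2):
            u, v = points[i], points[i + 1]
            edge = (min(u, v), max(u, v))
            if u == v or edge in edges:
                simple = False
                break
            edges.add(edge)
        if simple:
            return edges
    raise RuntimeError("no simple graph found; try a smaller r")


def sample_faulty_regular_graph(n, r, p, rng=None):
    """First experiment: draw the regular graph. Second experiment: keep each
    edge independently with probability p (delete it with f = 1 - p)."""
    rng = rng or random.Random()
    edges = random_regular_graph(n, r, rng)
    return {e for e in edges if rng.random() < p}
```

With p = 1 the result is the r-regular graph itself, and with p = 0 every edge is deleted.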


Important connectivity properties of G^r_{n,p} are
investigated in this entry by estimating the ranges
of r, f for which, with high probability, G^r_{n,p}
graphs (a) are highly connected, (b) become
disconnected, and (c) admit a giant (i.e., of Θ(n)
size) connected component of small diameter.


Notation. The terms “almost certainly” (a.c.) 
and “with high probability” (w.h.p.) will be fre- 
quently used with their standard meaning for 
random graph properties. A property defined in 
a random graph holds almost certainly when its 
probability tends to 1 as the independent variable 
(usually the number of vertices in the graph) 
tends to infinity. “With high probability” means
that the probability of a property of the random
graph (or the success probability of a randomized
algorithm) is at least 1 − n^{−α}, where α > 0 is
a constant and n is the number of vertices in the
graph.

The interested reader can further study [1] 
for an excellent exposition of the probabilistic 
method and its applications, [3] for a classic book 
on random graphs, as well as [6] for an excellent 
book on the design and analysis of randomized 
algorithms. 


Key Results 


Summary. This entry studies several important
connectivity properties of random regular graphs
with edge faults. In order to deal with the
G^r_{n,p} model, [10] first extends the notion of
configurations and the translation lemma between
configurations and random regular graphs provided
by B. Bollobás [2,3], by introducing the concept of
random configurations to account for edge faults
and by providing an extended translation
lemma between random configurations and random
regular graphs with edge faults.

For this new model of random regular graphs
with edge faults, [10] shows that:

1. For all failure probabilities f = 1 − p ≤ n^{−ε}
(ε ≥ 3/(2r) fixed) and any r ≥ 3, the biggest
part of G^r_{n,p} (i.e., the whole graph except
for O(1) vertices) remains connected, and this
connected part cannot be separated, almost
certainly, unless more than r vertices are
removed. Interestingly, the situation for
this range of f and r is very similar, despite
the faults, to that of G^r_n, which is
r-connected for r ≥ 3.

2. G^r_{n,p} is disconnected a.c. for constant f and
any r = o(log n) but is highly connected,
almost certainly, when r ≥ α log n, where
α > 0 is an appropriate constant.

3. Even when G^r_{n,p} becomes disconnected, it
still has a giant component of small diameter,
even when r = O(1). An O(n log n)-time
algorithm to construct a giant component is
provided.


Configurations and Translation Lemmata
Note that it is not as easy (from the technical
point of view) as in the G_{n,p} case to argue about
random regular graphs, because of the stochastic
dependencies among the edges due to
regularity. The following notion of configurations
was introduced by B. Bollobás [2,3] to translate
statements for random regular graphs into
statements for the corresponding configurations,
which avoid the edge dependencies due to regularity
and thus are much easier to deal with:


Definition 2 (Bollobás [2]) Let w = ⋃_{j=1}^{n} w_j
be a fixed set of 2m = Σ_{j=1}^{n} d_j labeled
vertices, where |w_j| = d_j. A configuration F is a
partition of w into m pairs of vertices, called
edges of F.


Given a configuration F, let φ(F) be the
(multi)graph with vertex set V in which (i, j)
is an edge if and only if F has a pair (edge)
with one element in w_i and the other in w_j. Note
that every regular graph G ∈ G^r_n is of the form
φ(F) for exactly (r!)^n configurations. However,
not every configuration F with d_j = r for all
j corresponds to a G ∈ G^r_n, since F may have an
edge entirely in some w_j or parallel edges joining
w_i and w_j.
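The claim that every simple r-regular graph arises from exactly (r!)^n configurations can be checked by brute force on a tiny instance. The sketch below enumerates all configurations for given n and r (feasible only for very small values); the encoding of point q as belonging to vertex q // r is an assumption of this example.

```python
from collections import Counter
from math import factorial


def perfect_matchings(points):
    """Yield every partition of the list `points` into unordered pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def phi(config, r):
    """Project a configuration onto its multigraph: point q lies in w_(q // r)."""
    return sorted(tuple(sorted((a // r, b // r))) for a, b in config)


def is_simple(edges):
    """No self-loops and no parallel edges."""
    return all(u != v for u, v in edges) and len(set(edges)) == len(edges)


def configurations_per_graph(n, r):
    """Count, for each simple r-regular graph, the configurations mapping to it."""
    counts = Counter()
    for config in perfect_matchings(list(range(n * r))):
        edges = phi(config, r)
        if is_simple(edges):
            counts[tuple(edges)] += 1
    return counts
```

For n = 4 and r = 2, only some of the 105 configurations yield simple graphs; the rest contain a self-loop or a parallel edge, which is exactly the case Lemma 1 has to control.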

Let Φ be the set of all configurations F and
let G^r_n be the set of all regular graphs. Given
a property (set) Q ⊆ G^r_n, let Q* ⊆ Φ be such
that Q* ∩ φ^{−1}(G^r_n) = φ^{−1}(Q). By estimating
the probability of possible cycles of length
one (self-loops) and two (parallel edges) among
pairs w_i, w_j in φ(F), the following important
lemma follows:


Lemma 1 (Bollobás [3]) If r ≥ 2 is fixed and
property Q* holds for a.e. configuration, then
property Q holds for a.e. r-regular graph.


The main importance of the above lemma is
that when studying random regular graphs,
instead of considering the set of all random
regular graphs, one can study the set of
configurations, which is much easier to deal with.

In order to deal with edge failures, [10]
introduces the following extension of the notion
of configurations:


Definition 3 (Random Configurations) Let
w = ⋃_{j=1}^{n} w_j be a fixed set of
2m = Σ_{j=1}^{n} d_j labeled “vertices”, where
|w_j| = d_j. Let F be any configuration of the set
Φ. For each edge of F, remove it with probability
1 − p, independently. Let Φ̂ be the new set of
objects and F̂ the outcome of the experiment. F̂ is
called a random configuration.


By introducing probability p in every edge,
an extension of the proof of Lemma 1 leads
(since in both Q̂ and Q̂* each edge is deleted
independently with the same probability, so that
the modified spaces follow the properties of Q
and Q*) to the following extension to random
configurations.


Lemma 2 (Extended Translation Lemma) Let
r ≥ 2 be fixed and Q̂ be a property for G^r_{n,p}
graphs. If Q̂* holds for a.e. random configuration,
then the corresponding property Q̂ holds for a.e.
graph in G^r_{n,p}.


Multiconnectivity Properties of G^r_{n,p}

The case of constant link failure probability f
is studied, which represents a worst case for
connectivity preservation. Still, [10] shows that
logarithmic degrees suffice to guarantee that
G^r_{n,p} remains w.h.p. highly connected, despite
these constant edge failures. More specifically:


Theorem 1 Let G be an instance of G^r_{n,p}, where
p = Θ(1) and r ≥ α log n, where α > 0 is an
appropriate constant. Then G is almost certainly
k-connected, where

k = O(log n / log log n).

The proof of the above theorem uses Chernoff
bounds to estimate the vertex degrees in G^r_{n,p}
and the “similarity” of G^r_{n,p} and G_{n,p′}
(whose properties are known) for a suitably
chosen p′.

Now the (more practical) case in which
f = 1 − p = o(1) is considered, and
it is proved that the desired connectivity
properties of random regular graphs are almost
preserved despite the link failures. More
specifically:


Theorem 2 Let r ≥ 3 and f = 1 − p = O(n^{−ε})
for ε ≥ 3/(2r). Then the biggest part of G^r_{n,p}
(i.e., the whole graph except for O(1) vertices)
remains connected, and this connected part
(excluding the vertices that were originally
neighbors of the O(1)-sized disconnected set)
cannot be separated unless more than r vertices
are removed, with probability tending to 1 as n
tends to +∞.


The proof carefully extends, to the case
of faults, a known technique for showing that
random regular graphs do not admit small
separators.


G^r_{n,p} Becomes Disconnected

Next, note that a constant link failure
probability dramatically alters the connectivity
structure of the regular graph in the case of low
degrees. In particular, by using the notion of
random configurations, [10] proves the following
theorem:


Theorem 3 When 2 ≤ r ≤ … and p = Θ(1), then
G^r_{n,p} has at least one isolated node
with probability at least 1 − …


The regime for disconnection is in fact larger,
since [10] shows that G^r_{n,p} is a.c. disconnected
even for any r = o(log n) and constant f. The
proof of this last claim is complicated by the fact
that, due to the range for r, one has to avoid using
the extended translation lemma.
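A rough first-moment calculation gives some intuition for why low degrees lead to isolated nodes. This is only a heuristic and not the argument of [10]; it assumes the underlying r-regular graph is simple, so that each vertex v has exactly r incident edges, each deleted independently with probability f:

```latex
\Pr[v \text{ is isolated}] = f^{r},
\qquad
\mathbb{E}\bigl[\#\{\text{isolated vertices}\}\bigr]
  = n f^{r}
  = n^{\,1 - r \log(1/f)/\log n}.
```

For constant f and r = o(log n) the exponent tends to 1, so the expected number of isolated vertices grows almost linearly in n. Turning this expectation into an almost-certain statement requires handling the dependencies between vertices, which the heuristic ignores.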


Existence of a Giant Component in G^r_{n,p}

Since G^r_{n,p} is a.c. disconnected for
r = o(log n) and 1 − p = f = Θ(1), it would be
interesting to know whether at least a large part
of the network represented by G^r_{n,p} is still
connected, i.e., whether the biggest connected
component of G^r_{n,p} is large. In particular,
[10] shows that:


Theorem 4 When f < 1 − 32/r, then G^r_{n,p} admits
a giant (i.e., Θ(n)-sized) connected component
for any r ≥ 64 with probability at least
1 − O(log² n / n^{α/3}), where α > 0 is a constant
that can be selected.


In fact, the proof of the existence of the 
component includes first proving the existence 
(w.h.p.) of a sufficiently long (of logarithmic size) 
path as a basis for a BFS process starting from the 
vertices of that path that creates the component. 
The proof is quite complex: occupancy 
arguments are used (bins correspond to the 
vertices of the graphs while balls correspond to its 
edges); however, the random variables involved 
are not independent, and in order to use Chernoff- 
Hoeffding bounds for concentration one must 
prove that these random variables, although 
not independent, are negatively associated. 
Furthermore, the evaluation of the success of the 
BFS process uses a careful, detailed average case 
analysis. 

The path construction and the BFS process 
can be viewed as an algorithm that (in case of 
no failures) actually reveals a giant connected 
component. This algorithm is very efficient, as 
shown by the following result: 


Theorem 5 A giant component of G^r_{n,p} can be
constructed in O(n log n) time, with probability
at least 1 − O(log² n / n^{α/3}), where α > 0 is a
constant that can be selected.
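For experimentation, the largest connected component of a sampled faulty graph can be extracted with a plain breadth-first search. The sketch below is a generic O(n + m) routine and not the path-plus-BFS construction of [10]; the function name and the edge-list input format are assumptions.

```python
from collections import deque


def largest_component(n, edges):
    """Return the vertices of a largest connected component of the graph
    on vertices 0..n-1 with the given undirected edges."""
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        while queue:  # standard BFS from the unvisited start vertex
            u = queue.popleft()
            for w in adjacency[u]:
                if not seen[w]:
                    seen[w] = True
                    component.append(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best
```

Comparing `len(largest_component(n, edges))` with n for growing n gives an empirical view of whether a linear-sized component is present.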


Applications 


In recent years the development and use of 
distributed systems and communication networks 
has increased dramatically. In addition, state-of- 
the-art multiprocessor architectures compute over 
structured, regular interconnection networks. 
In such environments, several applications 
may share the same network while executing 
concurrently. This may lead to unavailability
of certain network resources (e.g., links) for
certain applications. Similarly, faults may cause 
unavailability of links or nodes. The aspect of 
reliable distributed computing (which means 
computing with the available resources and 
resisting faults) adds value to applications 
developed in such environments. 

When computing in the presence of faults, 
one cannot assume that the actual structure of 
the computing environment is known. Faults may
happen even at execution time. In addition, what
is a “faulty” or “unavailable” link for one
application may in fact be the de-allocation of that
link because it is assigned (e.g., by the network
operating system) to another application. The
problem of analyzing allocated computation or 
communication in a network over a randomly 
assigned subnetwork and in the presence of faults 
has a nature different from fault analysis of spe- 
cial, well-structured networks (e.g., hypercube), 
which does not deal with network aspects. The 
work presented in this entry addresses this inter- 
esting issue, i.e., analyzing the average case taken 
over a set of possible topologies and focuses on 
multiconnectivity and existence of giant compo- 
nent properties required for reliable distributed 
computing in such randomly allocated unreliable 
environments. 

The following important application of this 
work should be noted: multitasking in distributed 
memory multiprocessors is usually performed by 
assigning an arbitrary subnetwork (of the in- 
terconnection network) to each task (called the 
computation graph). Each parallel program may 
then be expressed as communicating processors 
over the computation graph. Note that a multi- 
connectivity value k of the computation graph 
means also that the execution of the application 
can tolerate up to k — 1 online additional faults. 


Open Problems 


The ideas presented in [10] have already inspired
further interesting research. Andreas Goerdt [4]
continued the work presented in a preliminary
version [8] of [10] and showed the following
results: if the degree r is fixed, then
p = 1/(r − 1) is a threshold probability for the
existence of a linear-sized component in the
faulty version of almost all random regular
graphs. In fact, he further shows that if each
edge of an arbitrary graph G with maximum degree
bounded above by r is present with probability
p = λ/(r − 1), where λ < 1, then the faulty
version of G has only components whose size is at
most logarithmic in the number of nodes, with high
probability. His result implies some kind of
optimality of random regular graphs with edge
faults. Furthermore, [5,7] investigate important
expansion properties of random regular graphs
with edge faults, and [9] does so in the case of
fat trees, a common type of interconnection
network. It would also be interesting to further
pursue this line of research by investigating
other combinatorial properties of (and providing
efficient algorithms for) random regular graphs
with edge faults.

Problem Definition


Reaching agreement is one of the central issues in
fault-tolerant distributed computing. One version
of this problem, called Consensus, is defined over
a fixed set Π = {p_1, ..., p_n} of n processes
that communicate by exchanging messages along
channels. Messages are correctly transmitted (no
duplication, no corruption), but some of them
may be lost. Processes may fail by prematurely
stopping (crash), may omit to send or receive
some messages (omission), or may compute
erroneous values (Byzantine faults). Such processes
are said to be faulty. Every process p ∈ Π has
an initial value v_p, and non-faulty processes must
decide irrevocably on a common value v. Moreover,
if the initial values are all equal to the same
value v, then the common decision value is v. The
properties that define Consensus can be split into
safety properties (processes decide on the same
value; the decision value must be consistent with
initial values) and a liveness property (processes
must eventually decide).
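The three properties can be stated as a small checker over the outcome of one run. This is a sketch for illustration: the function name, the dictionary inputs, and the use of `None` for "not yet decided" are assumptions, and validity is encoded exactly as stated above (it only constrains runs in which all initial values are equal).

```python
def check_consensus(initial, decisions, faulty=frozenset()):
    """Evaluate agreement, validity, and termination for one finished run.

    initial:   dict mapping each process to its initial value
    decisions: dict mapping processes to their decision (None or missing = undecided)
    faulty:    set of faulty processes, which are exempt from the requirements
    """
    correct = [p for p in initial if p not in faulty]
    decided = {decisions.get(p) for p in correct if decisions.get(p) is not None}
    initial_values = set(initial.values())
    unanimous = len(initial_values) == 1
    return {
        # Safety: no two non-faulty processes decide differently.
        "agreement": len(decided) <= 1,
        # Safety: if all initial values equal v, only v may be decided.
        "validity": (not unanimous) or decided <= initial_values,
        # Liveness: every non-faulty process has decided.
        "termination": all(decisions.get(p) is not None for p in correct),
    }
```

Because liveness concerns infinite runs, the termination entry is meaningful only when the checker is applied to a run that has been observed long enough.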

Various Consensus algorithms have been described
[6, 12] to cope with any type of process
failures if there is a known bound on the
transmission delay of messages (communication is
synchronous) and a known bound on process relative
speeds (processes are synchronous). Intuitively,
“known bound” means that the bound can be “built
into” the algorithm; a formal definition is given
in the next section. In completely asynchronous
systems, where there exists no bound
on transmission delays and no bound on process
relative speeds, Fischer, Lynch, and Paterson [8]
have proved that there is no Consensus algorithm
resilient to even one crash failure. The paper
by Dwork, Lynch, and Stockmeyer [7] introduces
the concept of partial synchrony, in the
sense that it lies between the completely
synchronous and completely asynchronous cases, and
shows that partial synchrony makes it possible to
solve Consensus in the presence of process
failures, whatever the type of failure.

For this purpose, the paper examines the quite 
realistic case of asynchronous systems that be- 
have synchronously during some “good” periods 
of time. Consensus algorithms designed for syn- 
chronous systems do not work in such systems 
since they may violate the safety properties of 
Consensus during a bad period, that is when 
the system behaves asynchronously. This leads 
to the following question: is it possible to de- 
sign a Consensus algorithm that never violates 
safety conditions in an asynchronous system, 
while ensuring the liveness condition when some 
additional conditions are met? 


Key Results 


The paper was the first to provide a positive
and comprehensive answer to the above question.
More precisely, the paper (1) defines various
types of partial synchrony and introduces
a new round-based computational model
for partially synchronous systems, (2) gives
various Consensus algorithms according to the
severity of failures (crash, omission, Byzantine
faults with or without authentication), and
(3) shows how to implement the round-based
computational model in each type of partial
synchrony.


Partial Synchrony

Partial synchrony applies both to communications
and to processes. Two definitions for
partially synchronous communications are given:
(1) for each run, there exists an upper bound Δ
on communication delays, but Δ is unknown in
the sense that it depends on the run; (2) there
exists an upper bound Δ on communication delays
that is common for all runs (Δ is known), but holds
only after some time T, called the Global
Stabilization Time (GST), that may depend on the run
(GST is unknown). Similarly, partially synchronous
processes are defined by replacing “transmission
delay of messages” by “relative process speeds”
in (1) and (2) above. That is, the upper bound Φ
on relative process speeds is unknown, or Φ is
known but holds only after some unknown time.


Basic Round Model 

The paper considers a round based model: com- 
putation is divided into rounds of message ex- 
change. Each round consists of a send step, a re- 
ceive step, and then a computation step. In a send 
step, each process sends messages to any subset 
of processes. In a receive step, some subset of 
the messages sent to the process during the send 
step at the same round is received. In a computa- 
tion step, each process executes a state transition 
based on its current state and the set of messages 
just received. 

Some of the messages that are sent may not
be received, i.e., some can be lost. However, the
basic round model assumes that there is some
round GSR such that all messages sent from
non-faulty processes to non-faulty processes at
round GSR or afterward are received.


Consensus Algorithm for Benign Faults
(Requires f < n/2)

In the paper, the algorithm is only described
informally (textual form). A formal expression is
given by Algorithm 1: the code of each process
is given round by round, and each round is
specified by the send and the computation steps
(the receive step is implicit). The constant f
denotes the maximum number of processes that may be
faulty (crash or omission). The algorithm requires
f < n/2.

Algorithm 1 Consensus algorithm in the basic round model for benign faults (f < n/2)

 1: Initialization:
 2:   Acceptable_p := {v_p}    {v_p is the initial value of p}
 3:   Proper_p := {v_p}    {All the lines for maintaining Proper_p are trivial to write, and so are omitted}
 4:   vote_p := ⊥
 5:   Lock_p := ∅

 6: Round r = 4k − 3:
 7:   Send:
 8:     send ⟨Acceptable_p⟩ to coord_k
 9:   Compute:
10:     if p = coord_k and p receives at least n − f messages containing a common value then
11:       vote_p := select one of these common acceptable values

12: Round r = 4k − 2:
13:   Send:
14:     if p = coord_k and vote_p ≠ ⊥ then
15:       send ⟨vote_p⟩ to all processes
16:   Compute:
17:     if received ⟨v⟩ from coord_k then
18:       Lock_p := Lock_p \ {(v, −)}; Lock_p := Lock_p ∪ {(v, k)};

19: Round r = 4k − 1:
20:   Send:
21:     if ∃v s.t. (v, k) ∈ Lock_p then
22:       send ⟨ack⟩ to coord_k
23:   Compute:
24:     if p = coord_k then
25:       if received at least f + 1 ack messages then
26:         DECIDE(vote_p);
27:       vote_p := ⊥

28: Round r = 4k:
29:   Send:
30:     send ⟨Lock_p⟩ to all processes
31:   Compute:
32:     for all (v, θ) ∈ Lock_p do
33:       if received (w, θ′) s.t. w ≠ v and θ′ ≥ θ then    {release lock on v}
34:         Lock_p := Lock_p ∪ {(w, θ′)} \ {(v, θ)};
35:     if |Lock_p| = 1 then
36:       Acceptable_p := {v} where (v, −) ∈ Lock_p
37:     else
38:       if Lock_p = ∅ then Acceptable_p := Proper_p else Acceptable_p := ∅



Rounds are grouped into phases, where each
phase consists of four consecutive rounds. The
algorithm uses the rotating coordinator
strategy: each phase k is led by a unique
coordinator, denoted by coord_k, defined as process
p_i for phase k ≡ i (mod n). Each process p
maintains a set Proper_p of values that p has heard
of (proper values), initialized to {v_p}, where v_p
is p’s initial value. Process p attaches Proper_p
to each message it sends.
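The round-to-phase bookkeeping and the rotating coordinator rule can be written down directly. The helper names below are hypothetical; processes are indexed 1..n and rounds and phases start at 1, as in the text.

```python
def coordinator(k, n):
    """Index i in 1..n of the coordinator of phase k, i.e., k ≡ i (mod n)."""
    return (k - 1) % n + 1


def phase_of_round(r):
    """Map a round number r >= 1 to (phase k, step within the phase).

    Phase k consists of rounds 4k-3, 4k-2, 4k-1, 4k, numbered as steps 1..4.
    """
    return (r + 3) // 4, (r - 1) % 4 + 1
```

With n = 3, phases 1, 2, 3, 4 are led by p_1, p_2, p_3, p_1, so every process coordinates infinitely many phases.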

Process p may lock value v when p thinks that 
some process might decide v. Thus value v is an 
acceptable value to p if (1) v is a proper value to 
p, and (2) p does not have a lock on any value 
except possibly v (lines 35-38). 

At the first round of phase k (round 4k − 3),
each process sends the list of its acceptable values
to coord_k. If coord_k receives at least n − f sets
of acceptable values that all contain some value
v, then coord_k votes for v (line 11) and sends
its vote to all at the second round 4k − 2. Upon
receiving a vote for v, any process locks v in the
current phase (line 18), releases any earlier lock
on v, and sends an acknowledgment to coord_k
at the next round 4k − 1. If the latter process
receives acknowledgments from at least f + 1
processes, then it decides (line 26). Finally, locks
are released at round 4k (for any value v, only
the lock from the most recent phase is kept, see
line 34), and the set of values acceptable to p is
updated (lines 35-38).
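The phase structure described above can be exercised with a small simulation. The following Python sketch is a simplified reading of Algorithm 1 and not the authors' code. It assumes crash-free processes and comparable values, models message loss through a hypothetical `deliver(round, src, dst)` predicate, breaks ties among common values with `min`, maintains Proper_p as the union of all Proper sets received, and lets only the coordinator decide.

```python
class Process:
    def __init__(self, pid, value):
        self.pid = pid
        self.acceptable = {value}   # Acceptable_p
        self.proper = {value}       # Proper_p
        self.vote = None            # None plays the role of ⊥
        self.lock = {}              # Lock_p as a map: value -> phase of the lock
        self.decision = None


def run_phase(k, procs, f, deliver=lambda rnd, src, dst: True):
    n = len(procs)
    coord = procs[(k - 1) % n]      # rotating coordinator coord_k

    # Round 4k-3: send Acceptable_p (with Proper_p attached) to coord_k.
    rnd = 4 * k - 3
    inbox = [(set(p.acceptable), set(p.proper))
             for p in procs if deliver(rnd, p.pid, coord.pid)]
    counts = {}
    for acceptable, proper in inbox:
        coord.proper |= proper
        for v in acceptable:
            counts[v] = counts.get(v, 0) + 1
    common = [v for v, c in counts.items() if c >= n - f]
    if common:
        coord.vote = min(common)    # tie-break is an assumption of this sketch

    # Round 4k-2: coord_k sends its vote; receivers lock the value in phase k.
    rnd = 4 * k - 2
    if coord.vote is not None:
        for p in procs:
            if deliver(rnd, coord.pid, p.pid):
                p.proper |= coord.proper
                p.lock[coord.vote] = k   # overwrites an earlier lock on this value

    # Round 4k-1: processes holding a phase-k lock acknowledge to coord_k.
    rnd = 4 * k - 1
    acks = [p for p in procs
            if k in p.lock.values() and deliver(rnd, p.pid, coord.pid)]
    for p in acks:
        coord.proper |= p.proper
    if len(acks) >= f + 1 and coord.decision is None:
        coord.decision = coord.vote
    coord.vote = None

    # Round 4k: exchange locks, release superseded ones, update Acceptable_p.
    rnd = 4 * k
    sent = [(p.pid, dict(p.lock), set(p.proper)) for p in procs]
    for p in procs:
        heard = []
        for src, lock, proper in sent:
            if deliver(rnd, src, p.pid):
                p.proper |= proper
                heard.extend(lock.items())
        for v, theta in list(p.lock.items()):
            newer = [(w, t) for w, t in heard if w != v and t >= theta]
            if newer:
                w, t = max(newer, key=lambda wt: wt[1])
                del p.lock[v]
                p.lock[w] = max(t, p.lock.get(w, 0))
        if len(p.lock) == 1:
            p.acceptable = set(p.lock)
        elif not p.lock:
            p.acceptable = set(p.proper)
        else:
            p.acceptable = set()


def run(initial, f, max_phases=10, deliver=lambda rnd, src, dst: True):
    """Run phases until some coordinator decides; return (phase, value) or None."""
    procs = [Process(i + 1, v) for i, v in enumerate(initial)]
    for k in range(1, max_phases + 1):
        run_phase(k, procs, f, deliver)
        decided = [p.decision for p in procs if p.decision is not None]
        if decided:
            return k, decided[0]
    return None
```

Passing a `deliver` predicate that drops every message up to some round imitates a bad period before GSR: no decision is taken while messages are lost, and a coordinator decides once a full phase runs without loss.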


Consensus Algorithm for Byzantine Faults
(Requires f < n/3)

Two algorithms for Byzantine faults are given.
The first algorithm assumes signed messages,
which means that any process can verify the
origin of all messages. This fault model is
called Byzantine faults with authentication.
The algorithm has the same phase structure as
Algorithm 1. The difference is that (1) messages
are signed, and (2) “proofs” are carried by
some messages. A proof carried by message
m sent by some process p_i in phase k consists of
a set of signed messages sgn_j(m′, k), proving
that p_i received message (m′, k) in phase k
from p_j before sending m. A proof is carried
by the messages sent at line 15 and line 30
(Algorithm 1). Any process receiving a message
carrying a proof accepts the message and behaves
accordingly if and only if the proof is found
valid. The algorithm requires f < n/3 (less than
a third of the processes are faulty).

The second algorithm does not assume 
a mechanism for signing messages. Compared 
to Algorithm 1, the structure of a phase is slightly 
changed. The problem is related to the vote sent 
by the coordinator (line 15). Can a Byzantine 
coordinator fool other processes by not sending 
the right vote? With signed messages, such
behavior can be detected thanks to the “proofs”
carried by messages. A different mechanism is
needed in the absence of signatures.

The mechanism is a small variation of the
Consistent Broadcast primitive introduced by
Srikanth and Toueg [15]. The broadcast primitive
ensures that (1) if a non-faulty process broadcasts
m, then every non-faulty process delivers m, and
(2) if some non-faulty process delivers m, then all
non-faulty processes also eventually deliver m.
The implementation of this broadcast primitive
requires two rounds, which define a superround.
A phase of the algorithm now consists of three
superrounds. The superrounds 3k − 2, 3k − 1, and
3k mimic rounds 4k − 3, 4k − 2, and 4k − 1 of
Algorithm 1, respectively. Lock-release of phase
k occurs at the end of superround 3k, i.e., it does
not require an additional round, as it does in the
two previous algorithms. The algorithm also
requires f < n/3.


The Special Case of Synchronous
Communication

By strengthening the round-based computational
model, the authors show that synchronous
communication allows higher resiliency. More
precisely, the paper introduces the model called
the basic round model with signals, in which upon
receiving a signal at round r, every process knows
that all the non-faulty processes have received
the messages that it has sent during round r. At
each round after GSR, each non-faulty process
is guaranteed to receive a signal. In this
computational model, the authors present three new
algorithms tolerating less than n benign faults,
n/2 Byzantine faults with authentication, and n/3
Byzantine faults, respectively.


Implementation of the Basic Round Model
The last part of the paper consists of algorithms
that simulate the basic round model under various
synchrony assumptions, for crash faults and
Byzantine faults: first with partially synchronous
communication and synchronous processes (case
1), second with partially synchronous
communication and processes (case 2), and finally
with partially synchronous processes and
synchronous communication (case 3).

In case 1, the paper first assumes the basic
case Φ = 1, i.e., all non-faulty processes progress
at exactly the same speed, which means that they
have a common notion of time. Simulating the
basic round model is simple in this case. In case
2, processes do not have a common notion of
time. The authors handle this case by designing
an algorithm for clock synchronization. Then
each process uses its private clock to determine
its current round. So processes alternate between
steps of the clock synchronization algorithm
and steps simulating rounds of the basic round
model. With synchronous communication (case
3), the authors show that for any type of faults,
the so-called basic round model with signals is
implementable.

Note that, from the very definition of partial 
synchrony, the six algorithms share the funda- 
mental property of tolerating message losses, 
provided they occur during a finite period of time. 


Upper Bound for Resiliency 

In parallel, the authors exhibit upper bounds for
the resiliency degree of Consensus algorithms in
each partially synchronous model, according to
the type of faults. They show that their Consensus
algorithms achieve these upper bounds, and so
are optimal with respect to their resiliency degree.
These results are summarized in Table 1.


Consensus with Partial Synchrony, Table 1 Tight resiliency upper bounds (P stands for “process”, C for “communication”;
0 means “asynchronous”, 1/2 means “partially synchronous”, and 1 means “synchronous”)

                         P=0 C=0   P=1/2 C=1/2   P=1 C=1/2     P=1/2 C=1     P=1 C=1
Benign                   0         ⌊(n − 1)/2⌋   ⌊(n − 1)/2⌋   n − 1         n − 1
Authenticated Byzantine  0         ⌊(n − 1)/3⌋   ⌊(n − 1)/3⌋   ⌊(n − 1)/2⌋   n − 1
Byzantine                0         ⌊(n − 1)/3⌋   ⌊(n − 1)/3⌋   ⌊(n − 1)/3⌋   ⌊(n − 1)/3⌋


Applications


Availability is one of the key features of critical
systems, and is defined as the ratio of the time
the system is operational over the total elapsed
time. Availability of a system can be increased
by replicating its critical components. Two main
classes of replication techniques have been
considered: active replication and passive
replication. The Consensus problem is at the heart
of the implementation of these replication
techniques. For example, active replication, also
called state machine replication [10, 14], can be
implemented using the group communication primitive
called Atomic Broadcast, which can be reduced to
Consensus [3].

Agreement also needs to be reached in the
context of distributed transactions. Indeed, all
participants of a distributed transaction need to
agree on the outcome, commit or abort, of the
transaction. This agreement problem, called Atomic
Commitment, differs from Consensus in the
validity property that connects decision values
(commit or abort) to the initial values (favorable
to commit, or demanding abort) [9]. In case
decisions are required in all executions, the
problem can be reduced to Consensus if the abort
decision is acceptable although all processes were
favorable to commit, in some restricted failure
cases.


Open Problems


A slight modification to each of the algorithms
given in the paper is to force a process to
repeatedly broadcast the message “Decide v” after it
decides v. Then the resulting algorithms share the
property that all non-faulty processes definitely
make a decision within O(f) rounds after GSR,
and the constant factor varies between 4 (benign
faults) and 12 (Byzantine faults). A question
raised by the authors at the end of the paper
is whether this constant can be reduced.
Interestingly, a positive answer was given later,
in the case of benign faults and f < n/3, with
a constant factor of 2 instead of 4. This can be
achieved with deterministic algorithms, see [4],
based on the communication schema of the Rabin
randomized Consensus algorithm [13].

The second problem left open is the gener- 
alization of this algorithmic approach — namely, 
the design of algorithms that are always safe 
and that terminate when a sufficiently long good 
period occurs — to other fault tolerant distributed 
problems in partially synchronous systems. The 
latter point has been addressed for the Atomic 
Commitment and Atomic Broadcast problems 
(see section “Applications”). 
Problem Definition


A convex drawing of a planar graph G is a planar 
drawing of G where every vertex is drawn as 
a point, every edge is drawn as a straight line 
segment, and every face is drawn as a convex 
polygon. Not every planar graph has a convex 
drawing. The planar graph in Fig. 1a has a convex 
drawing as shown in Fig. 1b whereas the planar 
graph in Fig. 1d has no convex drawing. Tutte 
[11] showed that every 3-connected planar graph 
has a convex drawing, and obtained a necessary 
and sufficient condition for a planar graph to 
have a convex drawing with a prescribed outer 
polygon. Furthermore, he gave a “barycentric
mapping” method for finding a convex drawing,
which requires solving a system of O(n) linear
equations [12] and leads to an O(n^1.5) time
convex drawing algorithm for a planar graph with
a fixed embedding. The development of faster
algorithms for determining whether a planar graph
(where the embedding is not fixed) has a convex
drawing, and for finding such a drawing if it exists,
is addressed in the paper of Chiba, Yamanouchi,
and Nishizeki [2].
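The barycentric mapping can be illustrated with a small Python sketch: the outer vertices are fixed on a convex polygon, and every inner vertex is repeatedly moved to the average of its neighbors. The function name `barycentric_drawing`, the prism-like example graph, and the use of Gauss-Seidel sweeps instead of a direct solve of the linear system are assumptions made for illustration only; they are not taken from [11] or [12].

```python
def barycentric_drawing(adj, outer_pos, sweeps=200):
    """Return positions in which each inner vertex is the average of its neighbors.

    adj: dict vertex -> list of neighbors (hypothetical input format).
    outer_pos: dict outer vertex -> (x, y), fixed on a convex polygon.
    """
    pos = dict(outer_pos)
    for v in adj:
        pos.setdefault(v, (0.0, 0.0))          # arbitrary start for inner vertices
    inner = [v for v in adj if v not in outer_pos]
    for _ in range(sweeps):                    # Gauss-Seidel sweeps (assumed solver)
        for v in inner:
            xs = [pos[w][0] for w in adj[v]]
            ys = [pos[w][1] for w in adj[v]]
            pos[v] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return pos


# Example: outer triangle a, b, c and inner triangle d, e, f (a 3-connected prism).
adj = {
    "a": ["b", "c", "d"], "b": ["a", "c", "e"], "c": ["a", "b", "f"],
    "d": ["a", "e", "f"], "e": ["b", "d", "f"], "f": ["c", "d", "e"],
}
outer = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (0.0, 4.0)}
positions = barycentric_drawing(adj, outer)
print({v: (round(x, 6), round(y, 6)) for v, (x, y) in positions.items()})
```

For this example the fixed point of the averaging equations places d, e, and f at (1, 1), (2, 1), and (1, 2), so all faces are drawn as convex polygons.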


A Characterization for Convex Drawing

A plane graph is a planar graph with a fixed
embedding. In a convex drawing of a plane graph G,
the outer cycle C_o(G) is also drawn as a convex
polygon. The polygonal drawing C_o* of C_o(G),
called an outer convex polygon, plays a crucial
role in finding a convex drawing of G. The plane
graph G in Fig. 1a admits a convex drawing if an
outer convex polygon C_o* has all vertices 1, 2, 3,
4, and 5 of C_o(G) as the apices (i.e., geometric
vertices) of C_o*, as illustrated in Fig. 1b. However,
if C_o* has only apices 1, 2, 3, and 4, then G
does not admit a convex drawing, as depicted in
Fig. 1c. We say that an outer convex polygon C_o*
is extendible if there exists a convex drawing of
G in which C_o(G) is drawn as C_o*. Thus, the
outer convex polygon drawn by thick lines in
Fig. 1b is extendible, while that in Fig. 1c is not.
If the outer facial cycle C_o has an extendible outer
convex polygon, we say that the facial cycle C_o is
extendible.

Tutte established a necessary and sufficient
condition for an outer convex polygon to be
extendible [11]. The following theorem obtained
by Thomassen [9] is slightly more general than
the result of Tutte.

Convex Graph Drawing, Fig. 1 Plane graphs and drawings

Convex Graph Drawing, Fig. 2 G and C_o* violating Conditions (i)-(iii) in Theorem 1


Theorem 1 Let G be a 2-connected plane
graph, and let C_o* be an outer convex polygon
of G. Let C_o* be a k-gon, k ≥ 3, and let
P_1, P_2, ..., P_k be the paths in C_o(G), each
corresponding to a side of the polygon C_o*, as
illustrated in Fig. 2a. Then, C_o* is extendible if and
only if the following Conditions (i)-(iii) hold.


(i) For each inner vertex v with d(v) ≥ 3, there
exist three paths disjoint except v, each
joining v and an outer vertex.

(ii) G − V(C_o(G)) has no connected component
H such that all the outer vertices adjacent
to vertices in H lie on a single path P_i, and
no two outer vertices in each path P_i are
joined by an inner edge.

(iii) Any cycle containing no outer edge has at
least three vertices of degree ≥ 3.


Figures 2a-c violate Conditions (i)-(iii) of
Theorem 1, respectively, where each of the
faces marked by × cannot be drawn as a convex
polygon.


Key Results 


Two linear-time algorithms for convex drawings
are the key contribution of the paper of
Chiba, Yamanouchi, and Nishizeki [2]. One
algorithm finds a convex drawing of a plane
graph if it exists, and the other tests whether
there is a planar embedding of a given planar
graph which has a convex drawing. Thus, the
main result of the paper can be stated as in
the following theorem.


Theorem 2 Let G be a 2-connected planar 
graph. Then, one can determine whether G has 
a convex drawing in linear time and find such a 
drawing in linear time if it exists. 


Convex Drawing Algorithm 
In this section, we describe the drawing algorithm
of Chiba, Yamanouchi, and Nishizeki [2], which is
based on Thomassen’s short proof of Theorem 1.
Suppose that a 2-connected plane graph G and
an outer convex polygon C_o* satisfy the conditions
in Theorem 1. The convex drawing algorithm
extends C_o* into a convex drawing of G in linear
time. For simplicity, it is assumed that every inner
vertex has degree three or more in G. Otherwise,
replace each maximal induced path not on C_o(G)
by a single edge joining its ends (the resulting
simple graph G' satisfies Conditions (i)-(iii) of
Theorem 1); then, find a convex drawing of G';
and finally, subdivide each edge substituting a
maximal induced path.

They reduce the convex drawing of G to those
of several subgraphs of G as follows: delete from
G an arbitrary apex v of the outer convex polygon
C_o* together with the edges incident to v; divide
the resulting graph G' = G − v into blocks
B_1, B_2, ..., B_p, p ≥ 1, as illustrated in Fig. 3;
determine an outer convex polygon C_i* of each
block B_i so that B_i with C_i* satisfies Conditions
(i)-(iii) of Theorem 1; and recursively apply the
algorithm to each block B_i with C_i* to determine
the positions of the inner vertices of B_i.

The recursive algorithm described above can 
be implemented in linear time by ensuring that 
only the edges which newly appear on the outer 
face are traversed in each recursive step. 


Convex Testing Algorithm 
In this section, we describe the convex testing
algorithm of Chiba, Yamanouchi, and Nishizeki [2],
which implies a constructive proof of Theorem 2.
They have modified the conditions in Theorem 1
into a form suitable for convex testing, which
is expressed in terms of 3-connected components.
Using this form, they have shown that the
convex testing of a planar graph G can be reduced
to the planarity testing of a certain graph obtained
from G.

Convex Graph Drawing, Fig. 3 Reduction of the convex drawing of G into subproblems

To describe the convex testing algorithm, we
need some definitions. A pair {x, y} of vertices
of a 2-connected graph G = (V, E) is called
a separation pair if there exist two subgraphs
G_1' = (V_1, E_1') and G_2' = (V_2, E_2') satisfying
the following conditions (a) and (b): (a) V =
V_1 ∪ V_2, V_1 ∩ V_2 = {x, y}; and (b) E =
E_1' ∪ E_2', E_1' ∩ E_2' = ∅, |E_1'| ≥ 2, |E_2'| ≥ 2.
For a separation pair {x, y} of G, G_1 =
(V_1, E_1' + (x, y)) and G_2 = (V_2, E_2' + (x, y))
are called the split graphs of G. The new edges
(x, y) added to G_1 and G_2 are called the virtual
edges. Dividing a graph G into two split graphs
G_1 and G_2 is called splitting. Reassembling the
two split graphs G_1 and G_2 into G is called
merging. Suppose that a graph G is split, the split
graphs are split, and so on, until no more splits
are possible. The graphs constructed in this way
are called the split components of G. The split
components are of three types: triple bonds (i.e.,
a set of three multiple edges), triangles, and
3-connected graphs. The 3-connected components
of G are obtained from the split components of G
by merging triple bonds into a bond and triangles
into a ring, as far as possible, where a bond is a
set of multiple edges and a ring is a cycle. Note
that the split components of G are not necessarily
unique, but the 3-connected components of G are
unique [5].
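To make the notion of a separation pair concrete, the following Python sketch lists the separation pairs of a small simple graph by brute force. It relies on an assumption that should be stated loudly: for a simple 2-connected graph, {x, y} satisfies conditions (a) and (b) exactly when deleting x and y disconnects the graph, and the sketch uses that test. The names `is_connected_without` and `separation_pairs` are hypothetical, and this quadratic enumeration is not the linear-time decomposition algorithm referred to in [5].

```python
from itertools import combinations


def is_connected_without(adj, removed):
    """Check whether the graph stays connected after deleting the vertices in `removed`."""
    remaining = [v for v in adj if v not in removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:                               # depth-first search on the remaining graph
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(remaining)


def separation_pairs(adj):
    """All pairs {x, y} whose deletion disconnects a simple graph (brute force)."""
    return [(x, y) for x, y in combinations(sorted(adj), 2)
            if not is_connected_without(adj, {x, y})]


# A 4-cycle a-b-c-d has the two "opposite" pairs as separation pairs.
cycle4 = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
# The complete graph K4 is 3-connected and has none.
k4 = {v: [w for w in "abcd" if w != v] for v in "abcd"}
print(separation_pairs(cycle4), separation_pairs(k4))
```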

A separation pair {x,y} is prime if x and y 
are the end vertices of a virtual edge contained in 
a 3-connected component. Suppose that {x, y} is 
a prime separation pair of a graph G and that G 
is split at {x, y}, the split graphs are split, and so 
on, until no more splits are possible at {x, y}. A 
graph constructed in this way is called an {x, y}- 
split component of G if it has at least one real 
(i.e., non-virtual) edge. 

In some cases, the {x, y}-split components for a
single separation pair {x, y} already show that a
graph G has no convex drawing. A prime separation
pair {x, y} of G is called a forbidden separation
pair if there are either (a) at least four {x, y}-split
components or (b) exactly three {x, y}-split
components, each of which is neither a ring nor a
bond. Note that an {x, y}-split component
corresponds to an edge (x, y) if it is a bond and to
a subdivision of an edge (x, y) if it is a ring. One
can easily see that if a planar graph G has a
forbidden separation pair, then no plane embedding
of G has a convex drawing, that is, G has no
extendible facial cycle. The converse, however, is
not true. A prime separation pair {x, y}
is called a critical separation pair if there are
either (i) exactly three {x, y}-split components
including a bond or a ring or (ii) exactly two
{x, y}-split components each of which is neither
a bond nor a ring. When a planar graph G has
no forbidden separation pair, two cases occur: if
G has no critical separation pair either, then G
is a subdivision of a 3-connected graph, and so
every facial cycle of G is extendible; otherwise,
that is, if G has critical separation pairs, then a
facial cycle F of G may or may not be extendible,
depending on the interaction of F and the critical
separation pairs.
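The two definitions can be restated as a small case analysis. The Python sketch below classifies a prime separation pair from the types of its {x, y}-split components; the function name `classify_prime_pair` and the labels "bond", "ring", and "other" are hypothetical conventions chosen for this illustration, and computing the split components themselves is assumed to be done elsewhere.

```python
def classify_prime_pair(kinds):
    """Classify a prime separation pair from its {x, y}-split components.

    kinds: one label per {x, y}-split component, each "bond", "ring", or "other"
    (hypothetical labels; "other" means neither a bond nor a ring).
    """
    count = len(kinds)
    bonds_or_rings = sum(1 for k in kinds if k in ("bond", "ring"))
    # Forbidden: (a) at least four components, or
    # (b) exactly three, none of which is a ring or a bond.
    if count >= 4 or (count == 3 and bonds_or_rings == 0):
        return "forbidden"
    # Critical: (i) exactly three including a bond or a ring, or
    # (ii) exactly two, each neither a bond nor a ring.
    if (count == 3 and bonds_or_rings >= 1) or (count == 2 and bonds_or_rings == 0):
        return "critical"
    return "neither"


print(classify_prime_pair(["other", "other", "ring"]))
```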

Using the concepts of forbidden separation
pairs and critical separation pairs, Chiba et al.
gave the condition in Theorem 3 below, which
is suitable for the testing algorithm. They proved
that the condition in Theorem 3 is equivalent to
the condition in Theorem 1 under the restriction
that the outer convex polygon C_o* is strict, that
is, every vertex of C_o(G) is an apex of C_o* [2].


Theorem 3 Let G = (V, E) be a 2-connected
plane graph with the outer facial cycle F =
C_o(G), and let C_o* be an outer strict convex
polygon of G. Then, C_o* is extendible if and only
if G and F satisfy the following conditions.


(a) G has no forbidden separation pair.

(b) For each critical separation pair {x, y} of G,
there is at most one {x, y}-split component
having no edge of F, and, if any, it is either a
bond if (x, y) ∈ E or a ring otherwise.


The convex testing condition in Theorem 3
is given for a plane graph. Note that Condition
(a) does not depend on a plane embedding. Thus,
to test whether a planar graph G has a convex
drawing, one needs to test whether G satisfies
Condition (a) and, if it does, whether G has a
plane embedding such that its outer face F
satisfies Condition (b) in Theorem 3. A simple
observation shows that every graph G having no
forbidden separation pair has an embedding such
that the outer face satisfies Condition (b) if G
has at most one critical separation pair. Hence,
every planar graph with no forbidden separation
pair and at most one critical separation pair has a
convex drawing.

The convex testing problem of G for the case
where G has no forbidden separation pair and has
two or more critical separation pairs is reduced to
the planarity testing problem of a certain graph
obtained from G. If G has a plane embedding
which has a convex drawing, the outer face F
of the embedding must satisfy Condition (b) of
Theorem 3. Then, F contains every vertex of
critical separation pairs, and any split component
which is neither a bond nor a ring must have an
edge on the outer face. Observe that if a critical
separation pair {x, y} has exactly three
{x, y}-split components, then two of them can have
edges on F and one cannot have an edge on F;
the {x, y}-split component which will not have an
edge on F must be either a bond or a ring. Thus,
to test whether G has such an embedding,
a new graph is constructed from G as follows.
For each critical separation pair {x, y}, if (x, y)
is an edge of G, then delete the edge (x, y) from
G. If (x, y) is not an edge of G and exactly one
{x, y}-split component is a ring, then delete the
x-y path in the component from G. Let G_1 be the
resulting graph, as illustrated in Fig. 4b. Let G_2
be the graph obtained from G_1 by adding a new
vertex v and joining v to all vertices of critical
separation pairs of G, as illustrated in Fig. 4c.
If G_2 has a planar embedding Γ_2 such that v is
embedded on the outer face of Γ_2, as illustrated
in Fig. 4d, we get a planar embedding Γ_1 of G_1
from Γ_2 by deleting v from the embedding, as
illustrated in Fig. 4e. The outer facial cycle of
Γ_1 will be the outer facial cycle F of a planar
embedding Γ of G, as illustrated in Fig. 4f, which
satisfies Condition (b) of Theorem 3. Thus,
the strict convex polygon F* of F is extendible.
Hence, G has a convex drawing if G_2 has a planar
embedding with v on the outer face. It is not
difficult to show the converse implication. Hence,
Theorem 2 holds.

Convex Graph Drawing, Fig. 4 Illustration for convex testing; (a) G, (b) G_1, (c) G_2, (d) Γ_2, (e) Γ_1, and (f) Γ

Observe that F may not be the only extendible
facial cycle of G, that is, Γ may not be the only
planar embedding of G which has a convex
drawing. Chiba et al. [2] also gave a linear algorithm
which finds all extendible facial cycles of G.


Applications 


Thomassen [10] showed applications of convex
representations in proving a conjecture of
Grünbaum and Shephard on convex deformation of
convex graphs and in giving a short proof of
the result of Mani-Levitska, Guigas, and Klee
on convex representation of infinite doubly
periodic 3-connected planar graphs. The research on
convex drawing of planar graphs was motivated
by the desire to find aesthetic drawings of
graphs [3]. Arkin et al. [1] showed that there is a
monotone path in some direction between every
pair of vertices in any strictly convex drawing of
a planar graph.


Open Problems 


A convex drawing is called a convex grid drawing
if each vertex is drawn at a grid point of an
integer grid. Using canonical ordering and the shift
method, Chrobak and Kant [4] showed that every
3-connected plane graph has a convex grid
drawing on an (n − 2) × (n − 2) grid and that such a
grid drawing can be found in linear time. However,
the question of whether every planar graph which
has a convex drawing admits a convex grid drawing
on a grid of polynomial size remained an open
problem. Several works concentrate
on this direction [6, 7, 13]. For example, Zhou
and Nishizeki showed that every internally
triconnected plane graph G whose decomposition tree
T(G) has exactly four leaves has a convex grid
drawing on a 2n × 4n = O(n^2) grid and presented
a linear algorithm to find such a drawing [13].
Problem Definition


The convex hull of a set P of n points in R^d is
the intersection of all convex regions that contain 
P. While convex hulls are defined for arbitrary 
d, the focus here is on d = 2 (and d = 3). For 
a more general overview, we recommend reading 
[7,9] as well as [3]. 

A frequently used visual description for a 
convex hull in 2D is a rubber band: when we 
imagine the points in the plane to be nails and 
put a rubber band around them, the convex hull is 
exactly the structure we obtain by a tight rubber 
band; see Fig. 1. 

The above definition, though intuitive, is hard 
to use for algorithms to compute the convex hull 
— one would have to intersect all convex supersets 
of P. However, one can show that there is an 
alternative definition of the convex hull of P: it 
is the set of all convex combinations of P. 


Notation
For a point set P = {p_1, ..., p_n}, a convex
combination is of the form

\[ \lambda_1 p_1 + \lambda_2 p_2 + \dots + \lambda_n p_n \quad \text{with } \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1. \tag{1} \]

The convex hull, CH(P), of P is the polygon
that consists of all convex combinations of P.
The ordered convex hull gives the ordered
sequence of vertices on the boundary of CH(P),
instead of only the set of vertices that constitute
the hull.


Key Results 


In the following, we present algorithms that
compute the ordered convex hull of a given point set
P in the plane. We start with a short proof for a
lower bound of Ω(n log n).


Lower Bound

Theorem 1 Let P be a set of n points in the
plane. Any algorithm that computes the ordered
convex hull requires Ω(n log n) time.


Convex Hulls 


Convex Hulls, Fig. 1 The convex hull of a set of points in R^2

Convex Hulls, Fig. 2 Set of numbers X in gray, the point set P in black and the convex hull CH(P) in blue


Proof Given an unsorted list of numbers X =
{x_1, x_2, ..., x_n}, we can lift these to a parabola
as depicted in Fig. 2. Computing the convex hull
CH(P) of the resulting set P = {(x_i, x_i^2) | x_i ∈
X} allows us to output the sorted numbers by
reading off the x-values of the vertices on the lower
chain of CH(P) in O(n) time. Thus, a computation
of CH(P) in o(n log n) time would contradict
the lower bound Ω(n log n) for sorting.


Divide and Conquer by Preparata and Hong [8]

In the first step, the elements of P are sorted,
and then the algorithm recursively divides P into
subsets A and B of similar size. This is done
until at most three points are left in each set,
for which the convex hull is trivially computed.
The sorting ensures that the computed convex
hulls are disjoint. Thus, in each step of the merge
phase, the algorithm is given two ordered convex
hulls CH(A) and CH(B), which are separated by
a vertical line. To compute CH(A ∪ B), it needs
to find the two tangents supporting CH(A) and
CH(B) from above and below, respectively. The
procedure, which is often referred to as a “wobbly
stick”, is exemplified in Fig. 3.

It is easy to see that each level of the merge
phase requires O(n) time, resulting in a total
running time of O(n log n). A similar idea is also
applicable in 3D.


Graham Scan [4] 

The Graham Scan starts with a known point on
the hull as an anchor, the bottommost (rightmost)
point p_1. The remaining n − 1 points are sorted
according to the angles they form with the
negative x-axis through p_1, from the largest angle to
the smallest angle. The points p_i are processed
in this angular order. For the next point p_i,
the algorithm determines whether the last
constructed hull edge xy and p_i form a left or a right
turn. In case of a right turn, y is not part of the
hull and is discarded. The discarding is continued
as long as the last three points form a right turn.
Once xy and p_i form a left turn, the next point
in the angular order is considered. Because of the
initial sorting step, the total running time of the
Graham Scan is O(n log n).
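The scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation of [4]: the names `cross` and `graham_scan` are hypothetical, points are tuples of numbers, the angular order is obtained with `math.atan2` measured from the positive x-axis in increasing order (which gives the same counterclockwise order as decreasing angles to the negative x-axis), and collinear points are dropped from the hull.

```python
from math import atan2


def cross(o, a, b):
    """> 0 if o -> a -> b is a left turn, < 0 for a right turn, 0 if collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Return the hull vertices in counterclockwise order, starting at the anchor."""
    pts = list(set(points))                    # ignore duplicate points
    if len(pts) < 3:
        return sorted(pts)
    # Anchor: bottommost point, rightmost among ties.
    anchor = min(pts, key=lambda p: (p[1], -p[0]))
    rest = [p for p in pts if p != anchor]
    # Sort by polar angle around the anchor; break ties by distance (nearest first).
    rest.sort(key=lambda p: (atan2(p[1] - anchor[1], p[0] - anchor[0]),
                             (p[0] - anchor[0]) ** 2 + (p[1] - anchor[1]) ** 2))
    hull = [anchor]
    for p in rest:
        # Discard the last hull point while it does not form a left turn with p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


print(graham_scan([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Each point is pushed once and popped at most once, so the scan itself is linear and the sorting step dominates the running time.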


Jarvis’s March or Gift Wrapping [5] 
Jarvis’s March is an output-sensitive algorithm, 
i.e., the running time does depend on the size of 
the output. Its total running time is O(nh), where 
h is the number of points on CH(P). 
Convex Hulls, Fig. 3 Finding the tangent supporting CH(A) and CH(B) from below: considering CH(A) and CH(B) as obstacles, the algorithm iteratively tries to increment either α or β in clockwise (cw) and counterclockwise (ccw) order, respectively, while maintaining visibility of α and β. That is, it stops as soon as α does not see the ccw neighbor of β and β does not see the cw neighbor of α

Convex Hulls, Fig. 4 The first three steps of the gift wrapping algorithm. Starting at p_1, the algorithm acts as if it was wrapping a string around the point set

The algorithm starts with a known point on
the hull, i.e., the bottommost (rightmost) point
p_1. Just like wrapping a string around the point
set, it then computes the next point on CH(P)
in counterclockwise order: compare angles to
all other points and choose the point with the
largest angle to the negative x-axis. In general,
the wrapping step is as follows: let p_{k−1} and
p_k be the two previously computed hull vertices;
then p_{k+1} is set to the point p ∈ P, p ≠ p_k, that
maximizes the angle ∠p_{k−1} p_k p_{k+1}; see Fig. 4.
Each step takes O(n) time and finds one point
on CH(P). Thus, the total running time is O(nh).
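A compact Python sketch of the wrapping step follows. As an assumption for illustration, the angle comparison is replaced by an equivalent orientation test: the next hull vertex is the point such that no other point lies to the right of the directed line from the current vertex to it. The names `cross` and `jarvis_march` are hypothetical, and collinear points on a hull edge are skipped in favor of the farthest one.

```python
def cross(o, a, b):
    """> 0 if o -> a -> b is a left turn, < 0 for a right turn, 0 if collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared Euclidean distance between a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def jarvis_march(points):
    """Return the hull vertices in counterclockwise order, starting at the anchor."""
    pts = list(set(points))                    # ignore duplicate points
    if len(pts) < 3:
        return sorted(pts)
    start = min(pts, key=lambda p: (p[1], -p[0]))   # bottommost, then rightmost
    hull = []
    current = start
    while True:
        hull.append(current)
        candidate = None
        for p in pts:                          # one O(n) wrapping step
            if p == current:
                continue
            if candidate is None:
                candidate = p
                continue
            turn = cross(current, candidate, p)
            # p is preferred if it lies to the right of current -> candidate,
            # or if it is collinear and farther away.
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:                   # wrapped all the way around
            return hull


print(jarvis_march([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The outer loop runs once per hull vertex and the inner loop scans all points, which matches the O(nh) bound stated above.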


Chan’s Algorithm [2] 

In 1996, Chan presented an output-sensitive
algorithm with a worst-case optimal running time
of O(n log h). This does not contradict the lower
bound presented above, as the lower-bound
construction features n points on the hull. Let us for
now assume that h, the number of points on the
final convex hull, is known. The
algorithm runs in two phases.


Phase 1 splits P into ⌈n/h⌉ groups of size at
most h. Computing the convex hull for each
set using, e.g., Graham Scan takes O(h log h).
This results in ⌈n/h⌉ (potentially overlapping)
convex hulls and takes O(⌈n/h⌉ · h log h) =
O(n log h); see Fig. 5.

Phase 2 essentially applies Jarvis’s March.
Starting at the lowest leftmost point, it wraps
a string around the set of convex hulls, i.e., for
each hull it computes the proper tangent to the
current point and chooses the tangent with the
best angle in order to obtain the next point on
the final convex hull. Computing the tangent
for a hull of size h takes O(log h), which
must be done for each of the ⌈n/h⌉ hulls in
each of the h rounds. Thus, the running time
is O(h · ⌈n/h⌉ log h) = O(n log h).


Because h is not known, the algorithm does
several such rounds for increasing values of h,
until h is determined. In the initial round, it
starts with a very small h_0 = 4 = 2^{2^1} and
continues with h_t = 2^{2^t} in round t. As long
as h_t < h, phase 1 is very quick. The second
phase stops with an incomplete hull, knowing
that h_t is still too small. That is, round t costs
at most n log(h_t) = n 2^t. The algorithm
terminates as soon as h_t ≥ h. Thus, in total we
obtain ⌈log log h⌉ rounds. Therefore, the total
cost is:

\[ O\Bigl(\sum_{t=1}^{\lceil \log\log h \rceil} n 2^{t}\Bigr) = O\bigl(n 2^{\lceil \log\log h \rceil + 1}\bigr) = O(n \log h). \tag{2} \]

Convex Hulls, Fig. 5 Initial step of Jarvis’s March in the first round of Chan’s algorithm (h_0 = 4). Starting at p_1, the algorithm computes tangents to each convex hull (indicated in different colors) and selects the first tangent in counterclockwise order
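The geometric growth of the round costs can be checked numerically. The Python sketch below lists the guesses 2^(2^t) for t = 1, 2, ... up to the first guess that reaches h, following the indexing of the sum in (2), and adds up the per-point costs 2^t. The function names `chan_round_schedule` and `total_cost_per_point` are hypothetical, logarithms are taken to base 2, and the sketch only models the cost bound; it does not compute any hull.

```python
from math import log2


def chan_round_schedule(h):
    """Guesses 2^(2^t) for t = 1, 2, ..., up to the first guess that is >= h."""
    guesses = []
    t = 1
    while True:
        guess = 2 ** (2 ** t)
        guesses.append(guess)
        if guess >= h:
            return guesses
        t += 1


def total_cost_per_point(h):
    """Sum of log2(guess) = 2^t over all rounds, i.e., the cost bound divided by n."""
    rounds = len(chan_round_schedule(h))
    return sum(2 ** t for t in range(1, rounds + 1))


for h in (4, 5, 100, 1000):
    # The total stays within a constant factor (here 4) of log2(h).
    print(h, chan_round_schedule(h), total_cost_per_point(h), round(4 * log2(h), 2))
```

Because the costs double from round to round, the last round dominates, which is why the sum in (2) collapses to O(n log h).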
Implementation 


Like many geometric algorithms, the computa- 
tion of the convex hull can be very sensitive to 
inconsistencies, due to rounding errors [6]. A 
well-maintained collection of exact implementa- 
tions that eliminates problems due to rounding 
errors can be found in CGAL, the Computational 
Geometry Algorithms Library [1]. 
Problem Definition


Data is often sampled as a means of addressing 
resource constraints on storage, bandwidth, or 
processing — even when we have the resources to 
store the full data set, processing queries exactly 
over the data can be very expensive, and we 
therefore may opt for approximate fast answers 
obtained from the much smaller sample. 

Our focus here is on data sets that have the
form of a set of keys from some universe and
multiple instances, which are assignments of
nonnegative values to keys. We denote by v_hi the
value of key h in instance i. Examples of data
sets with this form include measurements of a set
of parameters; snapshots of a state of a system;
logs of requests, transactions, or activity; IP flow
records in different time periods; and occurrences
of terms in a set of documents. Typically, this
matrix is very sparse: the vast majority of entries
are v_hi = 0.

The sampling algorithms we apply can scan
the full data set but retain the values v_hi of only
a subset of the entries (h, i). From the sample,
we would like to estimate functions (or statistics)
specified over the original data. In particular,
many common queries can be expressed as (or
as a function of) the sum Σ_{h∈H} f(v_h·) over
selected keys h ∈ H of a basic function f
applied to the values of the key h in one or more
instances. These include domain (subset) queries
Σ_{h∈H} v_hi, which are the total weight of keys
H in instance i; L_p distances between instances
i, j, which use f(v_h·) = |v_hi − v_hj|^p;
one-sided distances, which use f(v_h·) =
max{0, v_hi − v_hj}^p; and sums of quantiles (say
maximum or median) of the tuple (v_h1, ..., v_hr).

The objective is to design a family of sampling
schemes that is suitable for one or more query
types. When the sampling scheme is specified, we
are interested in designing estimators that use the
information in the sample in the “best” way. The
estimators are functions f̂ that are applied to the
sample and return an approximate answer to the
query.


Coordinated Sampling 


When sampling a single instance i and aiming
for approximation quality on domain queries and
on sparse or skewed data, we use a weighted
sample, meaning that the inclusion probability of
an entry (h, i) in the sample depends on (and
usually increases with) its weight v_hi. In particular,
zero entries are never sampled. Two popular weighted
sampling schemes are Poisson sampling, where
entries are sampled independently, and bottom-k
(order) sampling [12, 35, 36]. It is convenient for
our purposes to specify these sampling schemes
through a rank function, r : [0, 1] × V → R,
which maps seed-value pairs to a number
r(u, v) that is nonincreasing with u and
nondecreasing with v. For each item h, we draw a
seed u(h) ~ U[0, 1] uniformly at random and
compute the rank value r(u(h), v_hi). A Poisson
sample includes a key h if and only if
r(u(h), v_hi) ≥ T_hi, where T_hi are fixed
thresholds. A bottom-k sample includes the k items
with the highest ranks. (The term bottom-k is due
to equivalently using the inverse rank function and
lowest k ranks [12-14, 35, 36].)

Specifically, Poisson probability proportional
to size (PPS) [28] samples include each key
with probability proportional to its value. They
are specified using the rank function r(u, v) =
v/u and a fixed T_i = T_hi across all keys in
the instance. Priority (sequential Poisson)
samples [22, 33, 38] are bottom-k samples utilizing
the PPS ranks r(u, v) = v/u, and successive
weighted samplings without replacement [12, 23,
35] are bottom-k samples with the rank function
r(u, v) = −v/ln(1 − u). All these sampling
algorithms can be easily implemented when the
data is streamed or distributed.
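The two PPS-based schemes can be sketched directly from the rank function r(u, v) = v/u. The Python code below is an illustration under stated assumptions: the function names `poisson_pps_sample` and `priority_sample`, the dictionary-based input format, and the example values and seeds are all hypothetical, and seeds are assumed to lie in (0, 1] so that the division is well defined.

```python
def poisson_pps_sample(values, seeds, threshold):
    """Poisson PPS: keep key h iff r(u(h), v_h) = v_h / u(h) >= T, i.e., u(h) * T <= v_h."""
    return {h for h, v in values.items() if v > 0 and seeds[h] * threshold <= v}


def priority_sample(values, seeds, k):
    """Priority sample: bottom-k with PPS ranks, i.e., the k keys with the highest v/u."""
    positive = [h for h, v in values.items() if v > 0]   # zero entries are never sampled
    positive.sort(key=lambda h: values[h] / seeds[h], reverse=True)
    return set(positive[:k])


# Hypothetical instance with 8 keys and fixed (not random) seeds for reproducibility.
values = {1: 1, 2: 3, 3: 2, 4: 0, 5: 1, 6: 4, 7: 0, 8: 1}
seeds = {1: 0.9, 2: 0.5, 3: 0.6, 4: 0.1, 5: 0.2, 6: 0.99, 7: 0.3, 8: 0.4}

print(sorted(poisson_pps_sample(values, seeds, threshold=4)))
print(sorted(priority_sample(values, seeds, k=3)))
```

With a threshold, the sample size is random and only its expectation is controlled, whereas the bottom-k variant always returns exactly k keys when at least k values are positive.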

Queries over a single instance can be
estimated using inverse probabilities [29]. For
each sampled key h, we can compute the
probability q(h) that it is included in the sample.
With Poisson sampling, this probability is that
of u ~ U[0, 1] satisfying r(u, v_hi) ≥ T_hi.
With bottom-k sampling, we use a conditioned
version [13, 22, 38]: The probability q(h) is that
of r(u(h), v_hi) being larger than the kth largest
value among r(u(y), v_yi), where y are all keys
other than h and u(y) is fixed to be as in the
current sample. Note that this threshold value
is available to us from the sample, and hence
q(h) can be computed for each sampled key. We
then estimate Σ_{h∈H} f(v_hi) using the sum over
sampled keys that are in H of f(v_hi)/q(h). This
estimator is a sum estimator in that it is applied
separately for each key: f̂(h) = 0 when the
key is not sampled and f̂(h) = f(v_hi)/q(h)
otherwise. When the sampling scheme is such
that q(h) > 0 whenever f(v_hi) > 0, this
estimator is unbiased. Moreover, this is also the
optimal sum estimator, in terms of minimizing
variance. We note that tighter estimators can be
obtained when the total weight of the instance is
available to the estimator [13].

keys:        1     2     3     4     5     6     7     8
Instance 1:  1     3     2     0     1     4     0     1
Instance 2:  0     1     3     2     0     1     2     3

PPS sampling probabilities for T = 4 (sample of expected size 3):

Instance 1:  0.25  0.75  0.50  0.00  0.25  1.00  0.00  0.25
Instance 2:  0.00  0.25  0.75  0.50  0.00  0.25  0.50  0.75

Coordinated Sampling, Fig. 1 Two instances with 8 keys and respective PPS sampling probabilities for threshold value 4, so a key with value v is sampled with probability min{1, v/4}. To obtain two coordinated PPS samples of the instances, we associate an independent u(i) ~ U[0, 1] with each key i ∈ [8]. We then sample i ∈ [8] in instance h ∈ [2] if and only if u(i) ≤ v_hi/4, where v_hi is the value of i in instance h. When coordinating the samples this way, we make them as similar as possible. In the example, key 8 will always (for any drawing of seeds) be sampled in instance 2 if it is sampled in instance 1 and vice versa for key 2


What Is Sample Coordination?

When the data has multiple instances, we
distinguish between a data source that is dispersed,
meaning that different entries of each key occur
at different times or locations, and one that is
co-located, meaning that all entries occur together.
These scenarios [17] impose different constraints
on the sampling algorithm. In particular, with
co-located data, it is easy to include the complete
tuple v_h· of each key h that is sampled in at least
one instance, whereas with dispersed data, we would
want the sampling of one entry v_hi not to depend
on the values v_hj in other instances j.

When sampling, we can redraw a fresh set
of random seed values u(h) for each instance,
which results in the samples being independent.
Samples of different instances are coordinated
when the set of seeds u(h) is common in all
instances. Scalable sharing of seeds when data
is dispersed is facilitated through random hash
functions u(h), where the only requirement
for our purposes is uniformity and pairwise
independence.

Coordinated sampling has the property that the 
samples of different instances are more similar 
when the instances are more similar, a property 
also known as Locality Sensitive Hashing (LSH) 
[26, 30, 31]. Figure 1 contains an example data
set of two instances and the PPS sampling proba- 
bilities of each item in each instance. When the 
samples are coordinated, a key sampled in one 
instance is always sampled in the instance with 
a higher inclusion probability. 
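The nesting property of coordinated PPS samples can be checked with a short Python sketch. The function name `coordinated_pps_samples` and the two example instances are assumptions chosen for illustration (eight keys, threshold T = 4); the essential point is only that one seed per key is drawn and then reused for every instance.

```python
import random


def coordinated_pps_samples(instances, threshold, rng):
    """Draw one seed per key and reuse it in every instance (coordination).

    instances: list of dicts key -> nonnegative value (hypothetical format).
    Returns one PPS sample (a set of keys) per instance.
    """
    keys = sorted(set().union(*instances))
    seeds = {h: 1.0 - rng.random() for h in keys}       # shared seeds in (0, 1]
    return [{h for h, v in inst.items() if v > 0 and seeds[h] * threshold <= v}
            for inst in instances]


instance1 = {1: 1, 2: 3, 3: 2, 4: 0, 5: 1, 6: 4, 7: 0, 8: 1}
instance2 = {1: 0, 2: 1, 3: 3, 4: 2, 5: 0, 6: 1, 7: 2, 8: 3}

sample1, sample2 = coordinated_pps_samples([instance1, instance2], 4, random.Random(0))
print(sorted(sample1), sorted(sample2))

# With shared seeds, a key sampled where its value is lower is also sampled
# where its value is higher, for every possible drawing of the seeds.
for h in instance1:
    low, high = (sample1, sample2) if instance1[h] <= instance2[h] else (sample2, sample1)
    assert h not in low or h in high
```

Drawing a fresh seed per key and per instance instead would make the two samples independent, and the nesting assertion could then fail.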


Why Use Coordination? 


Co-located Data 

With co-located data, coordination allows us to 
minimize the sample size, which is the (expected) 
total number of included keys, while ensuring 
that the sample “contains” a desired Poisson or 
bottom-& sample of each instance. 

For a key h, we can consider its respective 
inclusion probabilities in each of the instances 
(for bottom-k, inclusion may be conditioned on 
seeds of other keys). With coordination, the 
inclusion probability in the combined sample, q(h), 
is always the maximum of these probabilities. 
Estimation with these samples is easy: We estimate 
$\sum_{h \in H} f(v_{h\cdot})$ using the Horvitz-Thompson 
estimator (inverse probabilities) [29]. The estimate is 
the sum of $f(v_{h\cdot})/q(h)$ over sampled keys that are 
in H. Since the complete tuple $v_{h\cdot}$ is available for 
each sampled key, we can compute $f(v_{h\cdot})$, q(h), 
and thus the estimate. 
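
The estimator above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the names `pps_probabilities` and `ht_estimate` are hypothetical, the PPS threshold `tau = 4` and the values `V1`, `V2` are chosen to reproduce the example probabilities, and the seeds are passed in explicitly.

```python
import random

# Hypothetical key values; with tau = 4 they give the example probabilities.
V1 = [1, 3, 2, 0, 1, 4, 0, 1]
V2 = [0, 1, 3, 2, 0, 1, 2, 3]


def pps_probabilities(values, tau):
    """PPS inclusion probability min(1, v / tau) of every key."""
    return [min(1.0, v / tau) for v in values]


def ht_estimate(instances, tau, f, seeds, domain=None):
    """Horvitz-Thompson estimate of the sum of f(v_h) over keys h in H.

    instances: one list of key values per instance (same key order).
    seeds: shared seeds u(h), one per key.
    domain: optional set of key indices H (default: all keys).
    """
    probs = [pps_probabilities(values, tau) for values in instances]
    total = 0.0
    for h, u in enumerate(seeds):
        if domain is not None and h not in domain:
            continue
        # Inclusion probability of key h in the combined sample.
        q = max(p[h] for p in probs)
        if u < q:  # key h is sampled in at least one instance
            tuple_h = [values[h] for values in instances]
            total += f(tuple_h) / q
    return total


seeds = [random.Random(1).random() for _ in range(len(V1))]
estimate_of_max_sum = ht_estimate([V1, V2], 4, max, seeds)
```

Here `f = max` estimates the sum of key-wise maxima; any function of the complete tuple can be substituted, since the tuple is fully known whenever the key is in the combined sample.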

When the query involves entries from a single 
instance i, the variance of this estimate is at most 
that obtained from a respective sample of i. This 
is because the inclusion probability q(h) of each 
key is at least as high as in the respective 
single-instance sample. Therefore, by coordinating the 
samples, we minimize the total number of keys 
included while ensuring estimation quality with 
respect to each instance. 


Dispersed Data 

With dispersed data, coordination is useful when 
we are interested both in domain queries over a 
single instance at a time and in queries that 
involve more complex relations, such as similarity 
queries between multiple instances. Estimation of more 
complex relations, however, can be much more 
involved. Intuitively, this is because the sample 
can provide us with partial information on $f(v_{h\cdot})$ 
that is short of the exact value but still lower 
bounds it by a positive amount. Therefore, in this 
case, inverse probability estimators may not be 
optimal or even well defined: there could be 
zero probability of knowing the exact value, and yet 
the function f can have a nonnegative unbiased 
estimator. In the sequel, we overview estimators 
that are able to optimally use the available 
information. 


Implicit Data 

Another setting where coordination arises is 
when the input is not explicit, for example, 
expressed as relations in a graph, and coordinated 
samples can be obtained much more efficiently 
than independent samples. In this case, we work 
with coordinated samples even when we are 
interested in queries that are approximated well 
with independent samples. 


Key Results 


We now overview results on estimators that are 
applied to coordinated samples of dispersed data. 

We first observe that coordinated PPS and 
bottom-k samples are mergeable, meaning that a 
respective sample of the union or, more generally, 
of a new instance in which the weight of each key is 
the coordinate-wise maximum over several instances 
can be computed from the individual samples 
of each instance. This makes some estimation 
problems very simple. Even in these cases, 
however, better estimators of cardinalities of unions 
and intersections (key-wise maxima or minima) 
can be computed when we consider all sampled 
keys in the two sets rather than just the sample 
of the union. Such estimators for the 0/1 case 
are presented in [14] and for general nonnegative 
weights in [17]. 

The general question is to derive estimators for 
an arbitrary function $f \ge 0$ with respect to a 
coordinated sampling scheme. This problem can 
be formalized as a Monotone Estimation Problem 
(MEP) [9]: The smaller the seed $u(h) \sim U[0, 1]$ 
is, the more information we have on the values 
$v_{h\cdot}$ and therefore on $f(v_{h\cdot})$. We are interested 
in deriving estimators $\hat{f}$ that are nonnegative; 
this is because we are interested in nonnegative 
functions and estimates should be from the same 
range. We also seek unbiasedness, because we are 
ultimately estimating a sum over many keys, and 
bias accumulates even when the sampling of different 
keys is independent. Other desirable properties 
are finite variance (for any $v_{h\cdot}$ in the domain) or 
bounded estimates. A complete characterization 
of functions f for which estimators with subsets 
of these properties exist is given in [15]. 

We are also interested in deriving estimators 
that are admissible. Admissibility is Pareto op- 
timality, meaning that any other estimator with 
lower variance on some data would have higher 
variance on another. Derivations of admissible 
estimators (for any MEP for which an unbiased 
nonnegative estimator exists) are provided in [9]. 
Of particular interest is the L* estimator, which 
is the unique admissible monotone estimator. By 
monotone, we mean that when there is more 
information, that is, when u(h) is lower, the 
estimate $\hat{f}$ is at least as high. 

A definition of variance competitiveness of 
MEP estimators is provided in [15]. The 
competitive ratio of an estimator $\hat{f}$ is the maximum, over 
all possible inputs (data values) $v_{h\cdot}$, of the ratio 
of the integral of the square of $\hat{f}$ to the minimum 
possible by a nonnegative unbiased estimator. It 
turns out that the L* estimator has a ratio of at most 4 on any 
MEP for which a nonnegative unbiased estimator 
with finite variances exists [9]. Two interesting 
remaining open problems, partially addressed in 
[10], are to design an estimator with minimum 
ratio for a given MEP and also to bound the 
maximum minimum ratio over all MEPs that 
admit an unbiased nonnegative estimator with 
finite variance. 


Applications 


We briefly discuss the history and some applica- 
tions of coordinated sampling. 

Sample coordination was proposed in 1972 by 
Brewer, Early, and Joyce [2] as a method to 
maximize overlap and therefore minimize overhead 
in repeated surveys [34, 36, 37]: The values of 
keys change, and therefore, there is a new set of 
PPS sampling probabilities. With coordination, 
the sample of the new instance is as similar as 
possible to the previous sample, and therefore, 
the number of keys that need to be surveyed 
again is minimized. The term permanent random 
numbers (PRN) is used in the statistics literature 
for sample coordination. 

Coordination was subsequently used by com- 
puter scientists to facilitate efficient processing 
of large data sets, as estimates obtained over 
coordinated samples are much more accurate 
than possible with independent samples [1, 3- 
6,8, 13, 14, 16,1721, 24, 25,27, 32]. 

In some applications, the representation of 
the data is not explicit, and coordinated samples 
are much easier to compute than independent 
samples. One such example is computing all- 
distance sketches, which are coordinated samples 
of (all) d-neighborhoods of all nodes in a graph 
[6,7, 11-13,32]. These sketches support centrality 
and similarity and influence queries useful in the 
analysis of massive graph data sets such as social 
networks or Web graphs [18-20]. 


Problem Definition 


We consider a framework for problems that 
enumerate all the subsets of a given graph that 
satisfy a given constraint, for 
example, enumerating all Hamilton cycles, all 
spanning trees, all paths between two vertices, 
all independent sets of vertices, etc. When we 
assume a graph G = (V, E) with the vertex set 
$V = \{v_1, v_2, \ldots, v_n\}$ and the edge set 
$E = \{e_1, e_2, \ldots, e_m\}$, a graph enumeration problem is 
to compute a subset of the power set $2^E$ (or $2^V$), 
each element of which satisfies a given constraint. 
In this model, we can consider that each solution 
is a combination of edges (or vertices), and the 
problem is how to represent the set of solutions 
and how to generate it efficiently. 


Constraints 


Any kind of constraint on the graph edges and 
vertices can be considered. For example, consider 
enumerating all the simple (self-avoiding) 
paths connecting the two vertices s and t on the 
graph shown in Fig. 1. The constraint is described 
as: 


1. At each terminal vertex (s and t), exactly one 
selected edge is incident to the vertex. 

2. At every other vertex, either zero or exactly two 
selected edges are incident to the vertex. 

3. The set of selected edges forms a connected 
component. 


In this example, the set of solutions can be 
represented as a set of combinations of the edges, 
$\{e_1 e_3 e_5,\ e_1 e_4,\ e_2 e_3 e_4,\ e_2 e_5\}$. 
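
The three constraints can be checked directly by brute force over all edge subsets. The sketch below is only an illustration: the graph of Fig. 1 is not reproduced here, so `EDGES` is a hypothetical five-edge graph chosen to be consistent with the four solutions listed above, and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical graph consistent with the listed solutions (Fig. 1 itself
# is not available in the text): s-a, s-b, a-b, a-t, b-t.
EDGES = {
    "e1": ("s", "a"),
    "e2": ("s", "b"),
    "e3": ("a", "b"),
    "e4": ("a", "t"),
    "e5": ("b", "t"),
}


def is_st_path(selected, edges, s="s", t="t"):
    """Check constraints 1-3 for a tuple of selected edge names."""
    degree = {}
    adjacency = {}
    for name in selected:
        u, v = edges[name]
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Constraint 1: exactly one selected edge at each terminal.
    if degree.get(s, 0) != 1 or degree.get(t, 0) != 1:
        return False
    # Constraint 2: zero or two selected edges at every other vertex
    # (vertices with zero selected edges do not appear in `degree`).
    if any(d != 2 for v, d in degree.items() if v not in (s, t)):
        return False
    # Constraint 3: the selected edges form one connected component.
    seen, stack = {s}, [s]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(degree)


def enumerate_st_paths(edges):
    """Return every edge subset that satisfies the three constraints."""
    names = sorted(edges)
    return [
        set(subset)
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
        if is_st_path(subset, edges)
    ]
```

On this assumed graph, `enumerate_st_paths(EDGES)` returns the four edge combinations given above. The exhaustive scan over $2^m$ subsets is exactly what the decision-diagram methods described next are designed to avoid.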


Key Results 


A binary decision diagram (BDD) is a representa- 
tion of a Boolean function, one of the most basic 
models of discrete structures. After the 
epoch-making paper [1] by Bryant in 1986, BDD-based 
methods have attracted a great deal of attention. 
BDD was originally invented for the efficient 
Boolean function manipulation required in VLSI 
logic design, but Boolean functions are also used 
for modeling many kinds of combinatorial 
problems. Zero-suppressed BDD (ZDD) [8] is a 
variant of BDD, customized for manipulating “sets 
of combinations.” ZDDs have been successfully 
applied not only for VLSI design but also for 
solving various combinatorial problems, such as 
constraint satisfaction, frequent pattern mining, 
and graph enumeration. 

Recently, D. E. Knuth presented a surprisingly 
fast algorithm “Simpath” [7] to construct a ZDD 
which represents all the paths connecting two 
points in a given graph structure. This work is 
important because many kinds of practical prob- 
lems are efficiently solved by some variations 
of this algorithm. We generically call such ZDD 
construction methods “frontier-based methods.” 


BDDs/ZDDs for Graph Enumeration 


A binary decision diagram (BDD) is a graph 
representation for a Boolean function, developed 
for VLSI design. A BDD is derived by reducing a 
binary decision tree, which represents a decision- 
making process by the input variables. If we fix 
the order of input variables and apply the follow- 
ing two reduction rules, then we have a compact 
canonical form for a given Boolean function: 


1. Delete all redundant nodes whose two edges 
have the same destination. 

2. Share all equivalent nodes having the same 
child nodes and the same variable. 


The compression ratio achieved by using a BDD 
instead of a decision tree depends on the properties 
of the Boolean function being represented, but it can 
be a factor of 10-100 in some practical cases. 
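
The two reduction rules can be sketched in a few lines. This is an illustrative construction, not an efficient BDD package: it expands the full decision tree (exponential in the number of variables) and applies the rules bottom-up; the name `build_bdd` and the node-table layout are assumptions.

```python
def build_bdd(f, n):
    """Build a reduced ordered BDD for f: tuple of n bits -> truth value.

    Returns (root, table), where table maps (variable, low, high) to a
    node id. Ids 0 and 1 denote the 0-terminal and the 1-terminal.
    """
    table = {}

    def make_node(var, low, high):
        if low == high:          # rule 1: both edges have the same destination
            return low
        key = (var, low, high)
        if key not in table:     # rule 2: share equivalent nodes
            table[key] = len(table) + 2
        return table[key]

    def build(var, assignment):
        if var == n:
            return int(bool(f(assignment)))
        low = build(var + 1, assignment + (0,))
        high = build(var + 1, assignment + (1,))
        return make_node(var, low, high)

    return build(0, ()), table


# The parity of three variables needs 5 BDD nodes, while the full
# decision tree has 7 internal nodes.
root, table = build_bdd(lambda x: sum(x) % 2 == 1, 3)
```

Fixing the variable order in `build` is what makes the result canonical: two functions are equal exactly when this construction yields the same node table and root.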

A zero-suppressed BDD (ZDD) is a variant of 
BDD, customized for manipulating sets of com- 
binations. This data structure was first introduced 
by Minato [8]. ZDDs are based on a special 
reduction rule that differs from the ordinary one, as 
follows: 

1. Delete all nodes whose 1-edge directly points 
to the 0-terminal node, but do not delete the 
nodes which were deleted in ordinary BDDs. 


This new reduction rule is extremely effective 
when it is applied to a set of sparse combinations. 
If each item appears in 1% of combinations 
on average, ZDDs are possibly more compact 
than ordinary BDDs, by up to 100 times. Such 
situations often appear in real-life problems, for 
example, in a supermarket, the number of items 
in a customer’s basket is usually much less than 
all the items displayed there. Because of such an 
advantage, ZDDs are now widely recognized as 
the most important variant of BDD. 

ZDDs can be utilized for enumerating and 
indexing the solutions of a graph problem. For 
example, Fig. 2 shows the ZDD enumerating all 
the simple paths of the same graph as in Fig. 1. 
The ZDD has four paths from the root node to the 
1-terminal node, and each path corresponds to a 
solution of the problem, where $e_i = 1$ means to 
use the edge $e_i$ and $e_i = 0$ means not to use $e_i$. 
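
A ZDD for an explicit family of sets can be built with the zero-suppression rule as sketched below. The function names are hypothetical, and the construction takes the whole family as input, so it only illustrates the data structure (a frontier-based method avoids listing the family first). Items 0..4 stand for the edges $e_1, \ldots, e_5$.

```python
def build_zdd(family, n):
    """Build a ZDD for a family of subsets of the items 0..n-1.

    Returns (root, nodes), where nodes maps (item, low, high) to a node
    id. Ids 0 and 1 denote the 0-terminal and the 1-terminal.
    """
    nodes = {}

    def make_node(item, low, high):
        if high == 0:            # zero-suppression: 1-edge points to 0-terminal
            return low
        key = (item, low, high)
        if key not in nodes:     # share equivalent nodes
            nodes[key] = len(nodes) + 2
        return nodes[key]

    def build(item, fam):
        if not fam:
            return 0             # the empty family
        if item == n:
            return 1             # the family containing only the empty set
        low = build(item + 1, frozenset(s for s in fam if item not in s))
        high = build(item + 1, frozenset(s - {item} for s in fam if item in s))
        return make_node(item, low, high)

    return build(0, frozenset(frozenset(s) for s in family)), nodes


def count_sets(root, nodes):
    """Count root-to-1-terminal paths, i.e., the sets represented."""
    by_id = {node_id: key for key, node_id in nodes.items()}

    def count(node_id):
        if node_id < 2:
            return node_id
        _, low, high = by_id[node_id]
        return count(low) + count(high)

    return count(root)


# The four s-t paths of the example, with edge e_i encoded as item i-1.
PATHS = [{0, 2, 4}, {0, 3}, {1, 2, 3}, {1, 4}]
root, nodes = build_zdd(PATHS, 5)
```

Note that `make_node` deliberately does not apply the BDD rule for nodes whose two edges coincide; items that appear in no set simply leave no node behind, which is why sparse families stay small.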


Frontier-Based Method 


In 2009, Knuth published the surprisingly fast 
algorithm “Simpath” [7] (Vol. 4, Fascicle 1, 
p. 121, or p. 254 of Vol. 4A) to construct a 
ZDD which represents all the simple (or 
self-avoiding) paths connecting two points s and t in a 
given graph (not necessarily the shortest ones but 
ones not passing through the same point twice). 
This work is important because many kinds of 
practical problems can be efficiently solved by 
some variations of this algorithm. Knuth provides 
his own C source code on his web page for 
public access, and the program is surprisingly 
fast. For example, in a 14 × 14 grid graph (420 
edges in total), the number of self-avoiding 
paths between opposite corners is exactly 
227449714676812739631826459327989863387613323440 
(about $2.27 \times 10^{47}$). By applying 
the Simpath algorithm, the set of paths can be 
compressed into a ZDD with only 144759636 
nodes, and the computation time is only a few 
minutes. 

The Simpath algorithm is described in detail 
in Knuth’s book, and his source code is also 
provided, but it is not easy to read. The survey 
paper [9] will be helpful for understanding the 
basic mechanism of the Simpath algorithm. 

The Simpath algorithm is a dynamic programming 
method that scans the given 
graph from left to right like a moving frontier 
line. If the frontier grows larger in the 
computation process, more intermediate states appear 
and more computation time is required. Thus, 
it is important to keep the frontier small. The 
maximum size of the frontier depends on the 
given graph structure and the order of the edges. 
Planar and narrow graphs tend to have a small 
frontier. 
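
The frontier idea can be sketched as a counting procedure. The code below is a simplified illustration and not Knuth's Simpath program: it only counts paths instead of building a ZDD, it keeps the states in a dictionary, and it uses a pre-selected virtual edge between s and t so that an s-t path corresponds to a closed cycle. The function name and the row-major edge order are assumptions.

```python
def count_st_paths(n):
    """Count simple paths between opposite corners of a grid with
    n x n vertices (n >= 2), processing the edges one at a time."""
    edges = []
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                edges.append((v, v + 1))
            if r + 1 < n:
                edges.append((v, v + n))
    s, t = 0, n * n - 1
    last = {}
    for i, (u, v) in enumerate(edges):
        last[u] = last[v] = i  # index of the last edge touching a vertex

    # mate[v] == v: v is untouched; mate[v] == w: v and w are the two ends
    # of a path fragment; mate[v] == INNER: v is inside a fragment.
    INNER = -1
    states = {((s, t), (t, s)): 1}  # virtual edge s-t is pre-selected
    total = 0
    for i, (u, v) in enumerate(edges):
        next_states = {}
        for key, count in states.items():
            for take in (False, True):
                mate = dict(key)
                mate.setdefault(u, u)
                mate.setdefault(v, v)
                if take:
                    a, b = mate[u], mate[v]
                    if a == INNER or b == INNER:
                        continue  # a vertex would get three edges
                    if a == v:  # the edge closes a cycle
                        mate[u] = mate[v] = INNER
                        if all(m in (INNER, w) for w, m in mate.items()):
                            total += count  # cycle through the virtual edge
                        continue
                    mate[u] = mate[v] = INNER
                    mate[a], mate[b] = b, a  # ends of the merged fragment
                # Vertices whose last edge was processed leave the frontier.
                valid = True
                for w in (u, v):
                    if last[w] == i:
                        if mate[w] not in (INNER, w):
                            valid = False  # a dangling path end
                        del mate[w]
                if valid:
                    new_key = tuple(sorted(mate.items()))
                    next_states[new_key] = next_states.get(new_key, 0) + count
        states = next_states
    return total
```

For example, `count_st_paths(3)` returns 12. Partial edge selections that agree on the mate values of the frontier vertices are merged into one state, which is the source of the speedup; the number of distinct states grows with the frontier width.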

Knuth described in his book [7] that the 
Simpath algorithm can easily be modified to generate 
not only s-t paths but also Hamilton paths, 
directed paths, some kinds of cycles, and many 
other structures by slightly changing the mate 
data structure. We generically call such ZDD 
construction methods “frontier-based methods.” 


Applications 


Here we list graph problems which can be enu- 
merated and indexed by a ZDD using a frontier- 
based method. 


All s-t paths, s-t paths with length k, 
k-pairs of s-t paths, all cycles, cycles with 
length k, Hamilton paths/cycles, directed 
paths/cycles, all connected components, 
spanning trees/forests, Steiner trees, all cutsets, 
k-partitioning, calculating probability of 
connectivity, all cliques, all independent sets, 
graph colorings, tilings, and perfect/imperfect 
matching. 


These problems are strongly related to many 
kinds of real-life problems. For example, path 
enumeration is of course important in geographic 
information systems and is also used for 
dependency analysis of a process flow chart, 
fault analysis of industrial systems, etc. Recently, 
Inoue et al. [5] discussed the design of electric 
power distribution systems. Such civil 
engineering systems are usually close to planar graphs, 
so the frontier-based method is very effective 
in many cases. They succeeded in generating 
a ZDD to enumerate all the possible switching 
patterns in a realistic benchmark of an electric 
power distribution system with 468 switches. 
The obtained ZDD represents as many as 10°° 
of the valid switching patterns, but the actual 
ZDD size is less than 100 MB, and the computation 
time is around 30 min. After generating the 
ZDD, all valid switching patterns are compactly 
represented, and we can efficiently discover the 
switching patterns with maximum, minimum, 
and average cost. We can also efficiently apply 
additional constraints to the current solutions. In 
this way, frontier-based methods can be utilized 
for many kinds of real-life problems. 


Open Problems 


The frontier-based method is a general algorithmic 
framework, and a particular algorithm must still be 
developed for enumerating the graphs that satisfy a 
given constraint. This is sometimes time consuming, 
and it is not clearly understood which kinds of 
graphs can be generated easily and which are hard 
or impossible. 


Experimental Results 


An interesting problem is how large n can be made 
when counting the number of simple paths 
in an n × n grid graph with s and t at the opposite 
corners. We have worked on this problem and 
succeeded in counting the total number of 
self-avoiding s-t paths for the 26 × 26 grid graph. 
The number is exactly: 


173699315862792729311754404212364989003 
7222958828814060466370372091034241327 
613476278921819349800610708229622314338 
0491348290026721931129627708738890853 
908108906396. 


This is the current world record and was officially 
registered in the On-Line Encyclopedia of Integer 
Sequences [10] in November 2013. The detailed 
techniques for solving larger problems are 
presented in the report by Iwashita et al. [6]. 


A Related YouTube Video 


In 2012, Minato supervised a short educational 
animation video (Fig. 3). The video is mainly 
designed for junior high school to college students, 
to show the power of combinatorial explosion and 
the state-of-the-art techniques for solving such 
hard problems. This video uses the simple path 
enumeration problem for n × n grid graphs. The 
story is that the teacher counts the total number 
of paths for children starting from n = 1, but 
she is soon faced with a difficult situation, since 
the number grows unbelievably fast. She would 
spend 250,000 years to count the paths for the 
10 × 10 grid graph by using a supercomputer if she 
used a naive method. The story ends by telling 
that a state-of-the-art algorithm can finish the 
same problem in a few seconds. The video is now 
available on YouTube [2] and has received more than 
1.5 million views, which is extraordinary 
for scientific educational content. We hear 
that Knuth also enjoyed this video and shared it 
with several of his friends. 


Graphillion: Open Software Library 

The above techniques of data structures and 
algorithms have been implemented and published 
as an open software library, named “Graphillion” 
[3,4]. Graphillion is a library for manipulating 
very large sets of graphs, based on ZDDs and 
the frontier-based method. Traditional graph libraries 
maintain each graph individually, which causes 
poor scalability, while Graphillion handles a set 
of graphs collectively without considering 
individual graphs. Graphillion is implemented as a 
Python extension in C++, to encourage easy 
development of its applications without introducing 
significant performance overhead. 

Counting by ZDD, Fig. 3 
Screenshots of the 
animation video [2] 


URLs to Code and Data Sets 


The open software library “Graphillion” can 
be found on the web page at http://graphillion.org, 
the YouTube video at 
http://www.youtube.com/watch?v=Q4gTV4r0zRs, 
and the On-Line Encyclopedia of Integer Sequences 
(OEIS) entry on the self-avoiding path enumeration 
problem at https://oeis.org/A007764. 


Problem Definition 


A graph stream is a sequence of unordered pairs 
of elements from a set V implicitly describing an 
underlying graph G on vertex set V. The unordered 
pairs represent edges of the graph. A triangle is 
a triple of vertices u, v, w ∈ V which form a 
3-clique, that is, every unordered pair of vertices of 
the set {u, v, w} is connected by an edge. In this 
article, we investigate the problem of counting the 
number of triangles in an input graph G given 
as a graph stream. Furthermore, we restrict our 
attention to algorithms which are severely limited 
in total space (in particular, they cannot store the 
entire stream) and are allowed only a single scan 
over the stream of edges. 

Next we describe the streaming setting 
more formally. Consider a sequence of 
distinct unordered pairs or, equivalently, edges 
$e_1, e_2, \ldots, e_m$ on the set V. Let $G_t$ be the graph 
formed by the first t edges of the stream, where 
$t \in \{1, \ldots, m\}$. Denoting the empty graph by 
$G_0$, we see that graph $G_t$ is obtained from $G_{t-1}$ 
by inserting edge $e_t$ for all $t \in \{1, \ldots, m\}$. An 
edge {u, v} of the stream implicitly introduces 
vertex labels u and v. New vertices are therefore 
implicitly added as new labels. In this article, 
we do not consider edge or vertex deletions, nor 
do we allow repeated edges. The problem of 
counting triangles even in this simple setting has 
received a lot of attention [1, 2, 6, 12-15, 17]. 

A streaming algorithm has a small working 
memory M and gets to scan the input stream 
in a sequential fashion at most a few times. In 
this article, we only consider algorithms which 
make a single pass over the input stream. Thus, 
the algorithm proceeds by sequentially reading 
each edge $e_t$ from the input graph stream and 
updating its data structures in M using $e_t$. The 
algorithm cannot reread an edge that has already 
passed. (It may remember it in M.) Since 
the size of M is much smaller than m, the 
algorithm must work with only a “sketch” of the 
input graph stored in M. The streaming 
algorithm has access to random coins and typically 
maintains a random sketch of the input graph 
(e.g., a random subsample of the input edge 
stream). 

The aim is to output an accurate estimate of 
the number of triangles in graph $G_m$. In fact, 
we require that the algorithm output a running 
estimate of the number of triangles $T_t$ in graph 
$G_t$ seen so far as it is reading the edge stream. 
It is also of interest to output an estimate of 
another quantity, related to the number of 
triangles, called transitivity. The transitivity of a 
graph, denoted $\kappa$, is the fraction of length-2 paths 
which form triangles. A path of length 2 is also 
called a wedge. A wedge {{u, v}, {u, w}} is open 
(respectively, closed) if the edge {v, w} is absent 
(respectively, present) in the graph. Every closed 
wedge is part of a triangle, and every triangle has 
exactly three closed wedges. This immediately 
gives a formula for transitivity: $\kappa = 3T/W$, 
where T and W are the total number of triangles 
and wedges in the graph, respectively. We use 
the subscript t, as in $T_t$, $W_t$, and $\kappa_t$, to denote 
corresponding quantities for graph $G_t$. The key 
result described in this article is a small space 
single-pass streaming algorithm for maintaining 
a running estimate for each of $T_t$, $W_t$, and $\kappa_t$. 
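
For reference, the exact quantities T, W, and $\kappa$ can be computed offline as in the following sketch (the function name `triangle_stats` is hypothetical, and the whole graph is held in memory, which a streaming algorithm cannot afford).

```python
from itertools import combinations


def triangle_stats(edges):
    """Return (T, W, kappa) for a simple undirected graph given as a
    list of edges (u, v)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # A wedge is a pair of edges sharing a vertex: pick 2 neighbors of it.
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adjacency.values())
    closed_wedges = sum(
        1
        for nbrs in adjacency.values()
        for a, b in combinations(sorted(nbrs), 2)
        if b in adjacency[a]
    )
    triangles = closed_wedges // 3  # each triangle has three closed wedges
    kappa = 3 * triangles / wedges if wedges else 0.0
    return triangles, wedges, kappa


# A triangle {1, 2, 3} with a pendant edge {3, 4}: T = 1, W = 5, kappa = 0.6.
stats = triangle_stats([(1, 2), (2, 3), (1, 3), (3, 4)])
```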


Outline of the Rest of the Article 

The bulk of this article is devoted to the de- 
scription of results and algorithms of [12]. This 
is complemented by a section on related work 
where we briefly describe some of the other algo- 
rithms for triangle counting. This is followed by 
applications of triangle counting, an open prob- 
lem, a section describing experimental results, 
and, finally, references used in this article. 


Key Results 


The main result (from [12]) presented in this 
article is a single-pass, $O(m/\sqrt{T})$-space 
streaming algorithm to estimate the transitivity and the 
number of triangles of the input graph with small 
additive error. 


The Algorithm of [12] and Its Analysis 


The starting point of the algorithm is the wedge 
sampling idea from [19]. The transitivity of a 
graph is precisely the probability that a uniformly 
random wedge from the graph is closed. Thus, 
estimating transitivity amounts to approximating 
the bias of a coin flip simulated by the 
following probabilistic experiment: sample a wedge 
{{u, v}, {u, w}} uniformly from the set of wedges 
and output “Heads” if it is closed and “Tails” 
otherwise. One can check whether a wedge is 
closed by checking if {v, w} is an edge in the 
graph. If, in addition, we have an accurate 
estimate of W, the triangle count can be estimated 
using $T = (\kappa/3) \cdot W$. 

If we adopt the described strategy, we need 
a way to sample a wedge uniformly from the 
graph stream. While this task by itself appears 
rather challenging, we note that sampling an edge 
uniformly from the graph stream can be done 
easily via an adaptation of reservoir sampling. 
(See below for details.) Can we use an edge 
sampling primitive to sample wedges uniformly 
from a graph stream? This is exactly what [12] 
achieves. Before we describe the algorithm of 
[12], we present a key primitive which is also 
used in other works on counting triangles in graph 
streams. 


A Key Algorithmic Tool: A Reservoir of 
Uniform Edges 

This algorithmic tool allows one to maintain a 
set $R_t$ of k edges while reading the edge stream 
with the following guarantee for every t: each of 
the k edges in $R_t$ is selected uniformly from the 
edges of $G_t$, and all edges are mutually 
independent. The idea is to adapt the classic reservoir 
sampling [21]. More precisely, at the beginning 
of the stream, $R_0$ consists of k empty slots. While 
reading the edge stream, on observing edge $e_t$, do 
the following probabilistic experiment 
independently for each of the k slots of $R_{t-1}$ to yield $R_t$: 
(i) sample a random number x uniformly from 
[0, 1], and (ii) if $x < 1/t$, replace the slot with $e_t$. 
Otherwise, keep the slot unchanged. 
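
The experiment just described translates almost literally into code. The sketch below is illustrative; the function name `edge_reservoir` and the explicit random generator argument are assumptions.

```python
import random


def edge_reservoir(stream, k, rng):
    """Maintain k independent uniform samples of the edges seen so far.

    stream: iterable of edges; rng: a random.Random instance.
    Returns the k slots after the whole stream has been read.
    """
    slots = [None] * k
    for t, edge in enumerate(stream, start=1):
        for j in range(k):
            # Each slot is replaced independently with probability 1/t.
            # For t = 1 this always succeeds, so no slot stays empty.
            if rng.random() < 1.0 / t:
                slots[j] = edge
    return slots


slots = edge_reservoir([(1, 2), (2, 3), (1, 3), (3, 4)], 8, random.Random(0))
```

A slot holds $e_s$ at time t with probability $(1/s) \prod_{r=s+1}^{t} (1 - 1/r) = 1/t$, which is the uniformity guarantee stated above. Since the slots are independent, the same edge may occupy several slots.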

How does a reservoir of edges R; help in sam- 
pling wedges from the graph stream? When the 
edge reservoir R; is large enough, there are many 
pairs of edges in R; sharing a common vertex, 
yielding many wedges. Further, if the transitivity 
of the input graph is high, many of these wedges 
will in fact be closed wedges, that is, parts of triangles. This 
is the idea behind the algorithm of [12]. The 
key theoretical insight is that there is a birthday 
paradox-like situation here: many uniform edges 
result in sufficiently many “collisions” to form 
wedges. Recall (e.g., from Chap.II.3 of [10]) that 
the classic birthday paradox states that if we 
choose 23 random people, the probability that 2 
of them share a birthday is at least 1/2. In our 
setting, edges correspond to people and “sharing 
a birthday” corresponds to sharing a common 
vertex. 


A Key Analytical Tool: The Birthday Paradox 

Suppose $R_1, \ldots, R_k$ are i.i.d. samples from the 
uniform distribution on the edges of G. Then for 
distinct $i, j \in [k]$, $\Pr(\{R_i, R_j\} \text{ forms a wedge}) = 
2W/m^2$. By linearity of expectation, the expected 
number of wedges is $\binom{k}{2}(2W/m^2)$. In 
particular, setting k to be c times a large constant 
multiple of $m/\sqrt{W}$ results in an expectation of at 
least $c^2$ wedges. Even better is the fact that these 
wedges are uniformly (but not independently) 
distributed in the set of all wedges. Similarly, 
one can show that when k is $\Omega(m/\sqrt{T})$, one 
obtains many closed wedges. This is the heart of 
the argument in the analysis of the algorithm of 
[12]. 
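
The probability $2W/m^2$ can be verified exactly on a small graph by enumerating all ordered pairs of edges. The function names below are hypothetical, and the check assumes a simple graph, so that two distinct edges share at most one vertex.

```python
from fractions import Fraction
from itertools import product


def wedge_probability(edges):
    """Exact probability that two independent uniform edges form a wedge,
    i.e., share exactly one vertex."""
    m = len(edges)
    hits = sum(
        1 for e, f in product(edges, repeat=2) if len(set(e) & set(f)) == 1
    )
    return Fraction(hits, m * m)


def expected_wedges(edges, k):
    """Expected number of wedges among k i.i.d. uniform edges,
    obtained by linearity of expectation over the k(k-1)/2 pairs."""
    return Fraction(k * (k - 1), 2) * wedge_probability(edges)


# Triangle {1, 2, 3} plus pendant edge {3, 4}: m = 4 and W = 5,
# so the probability is 2 * 5 / 16 = 5/8.
GRAPH = [(1, 2), (2, 3), (1, 3), (3, 4)]
probability = wedge_probability(GRAPH)
```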

While we do not wish to present all the 
technical details of the algorithm of [12] here, we 
do provide some high-level ideas from [12]. The 
algorithm maintains a reservoir of edges $R_t$ as 
explained above. Further, it maintains a set $C_t$ 
of wedges $\{e_a, e_b\}$ with $e_a, e_b \in R_t$ such that 
(i) $\{e_a, e_b\}$ is a closed wedge in $G_t$ and (ii) the 
closing edge appears after edges $e_a$ and $e_b$ in the 
stream. In other words, $C_t$ are the wedges in $R_t$ 
which can be detected to be closed. The algorithm 
outputs a random bit $b_t$ whose expectation is 
close to $\kappa_t/3$. 


For each edge $e_t$ of the graph stream: 


(a) Update reservoir $R_t$ of k edges as described 
above. 

(b) Let $\mathcal{W}_t$ be the set of wedges in $R_t$. Let $N_t$ 
be the set of wedges in $R_t$ which have $e_t$ as 
their closing edge. 

(c) Update $C_t$ to $(C_{t-1} \cap \mathcal{W}_t) \cup N_t$. 

(d) If there are no wedges in $R_t$, output $b_t = 0$. 
Otherwise, sample a uniform wedge in $R_t$ 
and output $b_t = 1$ if this wedge is in $C_t$ and 
$b_t = 0$ otherwise. 
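
Steps (a)-(d) can be sketched as follows. This is a deliberately naive illustration and not the implementation of [12]: it recomputes all wedges among the k slots for every edge (quadratic in k per edge), identifies a wedge by its two slot indices together with the edges they hold, and the names `closing_edge`, `streaming_triangles`, and `wedge_estimate` are assumptions.

```python
import random
from itertools import combinations


def closing_edge(e, f):
    """Edge that would close the wedge {e, f}, or None if e and f do not
    share exactly one vertex."""
    shared = set(e) & set(f)
    if len(shared) != 1:
        return None
    (a,) = set(e) - shared
    (b,) = set(f) - shared
    return (a, b) if a <= b else (b, a)


def streaming_triangles(stream, k, rng):
    """Return the bits b_t and the wedge counts |W_t| for every t."""
    slots = [None] * k
    closed = set()  # C_t: wedges detected to be closed
    bits, wedge_counts = [], []
    for t, (u, v) in enumerate(stream, start=1):
        e_t = (u, v) if u <= v else (v, u)
        # (a) update the reservoir of k independent uniform edges
        for j in range(k):
            if rng.random() < 1.0 / t:
                slots[j] = e_t
        # (b) wedges among the slots, mapped to their closing edge
        wedges = {}
        for i, j in combinations(range(k), 2):
            c = closing_edge(slots[i], slots[j])
            if c is not None:
                wedges[(i, j, slots[i], slots[j])] = c
        newly_closed = {w for w, c in wedges.items() if c == e_t}
        # (c) keep surviving closed wedges and add the newly closed ones
        closed = {w for w in closed if w in wedges} | newly_closed
        # (d) output a bit from a uniformly chosen wedge
        if wedges:
            bits.append(1 if rng.choice(list(wedges)) in closed else 0)
        else:
            bits.append(0)
        wedge_counts.append(len(wedges))
    return bits, wedge_counts


def wedge_estimate(m, num_wedges, k):
    """Estimate of W from the number of wedges among the k slots."""
    return m * m * num_wedges / (k * (k - 1))
```

Including the slot contents in the wedge key ensures that a wedge disappears from the closed set as soon as one of its slots is overwritten, which is the role of the intersection in step (c).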


The next theorem shows that the expectation 
of $b_m$ is close to $\kappa/3$. Moreover, it shows that 
$|\mathcal{W}_m|$ can be used to estimate W. Getting a good 
estimate on $E[b_m]$ and W allows one to estimate 
T by multiplying the two estimates together. We 
note that while the theorem is stated for the final 
index t = m, it holds for any large enough t. 


Theorem 1 ([12]) Assume $W \ge m$ and fix $\beta \in 
(0, 1)$. Suppose $k = |R_m|$ is $\Omega(m/(\beta^2 \sqrt{T}))$. Set 
$\hat{W} = m^2 |\mathcal{W}_m| / (k(k-1))$. Then $|\kappa/3 - E[b_m]| < 
\beta$, and $\Pr[|\hat{W} - W| < \beta W]$ is at least $1 - \beta$. 


Related Work 


The problem of counting triangles in graphs has 
been studied extensively in a variety of different 
settings: exact, sampling, streaming, and MapRe- 
duce. We refer the reader to references in [12] 
for a comprehensive list of these works. Here we 
focus on a narrow topic: single-pass streaming al- 
gorithms for estimating triangle counts. In partic- 
ular, we do not discuss streaming algorithms that 
make multiple passes (e.g., [11,13]) or algorithms 
that compute triangles incident on every vertex 
(e.g., Becchetti et al. [4]). 

Bar-Yossef et al. [3] were the first to study 
the problem of triangle counting in the streaming 
setting. Since then there has been a long line of 
work improving the guarantees of the algorithm 
in various ways [1, 2, 6, 12-15, 17]. Specifically, 
Buriol et al. [6] gave an O(mn/T) space 
algorithm. The algorithm maintains samples of the 
form ((u, v), w) where (u, v) is a uniform edge 
in the stream and w is a uniform node label other 
than u and v. The algorithm checks for the presence 
of edges (u, w) and (v, w) to detect a triangle. They 
also give an implementation of their algorithm 
which is practical, but the relative error in triangle 
estimates can be high. The $O(m/\sqrt{T})$ algorithm 
described in this article is from Jha et al. [12]. 
In parallel with [12], Pavan et al. [17] 
independently gave an $O(m\Delta/T)$ space algorithm 
where $\Delta$ is the maximum degree of the graph. 
Their algorithm is based on a sampling 
technique they introduced called neighborhood 
sampling. Neighborhood sampling maintains edge 
pair samples of the form $(r_1, r_2)$. In each pair, 
edge $r_1$ is sampled uniformly from the edges 
observed so far. Edge $r_2$ is sampled uniformly 
from the set $N(r_1)$, where $N(r_1)$ consists of the 
edges adjacent to edge $r_1$ that appear after edge 
$r_1$ in the stream. When $r_2$ is nonempty, the pair 
$(r_1, r_2)$ forms a wedge which can be monitored to 
see if it forms a triangle. Observe that a triangle 
formed by edges $e_{t_1}$, $e_{t_2}$, and $e_{t_3}$ appearing in 
this order is detected as a closed triangle with 
probability $1/(|N(e_{t_1})| \cdot m)$. Accounting for this 
bias by keeping track of the quantity $|N(r_1)|$ for 
each sample $(r_1, r_2)$, one gets an unbiased 
estimator for triangle counts. In a recent work, 
Ahmed et al. [1] give practical algorithms for 
triangle counting which seem to empirically 
improve on some of the previous results. In this 
work, the authors present a generic technique 
called graph sample and hold and use it for 
estimating triangle counts. At a high level, graph 
sample and hold associates a nonzero probability 
$p_{e_t}$ to each edge $e_t$, which corresponds to the 
probability with which edge $e_t$ is independently 
sampled. Importantly, the probability $p_{e_t}$ may 
depend on the graph sampled so far. But the 
actual probability with which the edge is sampled 
is recorded. Now estimates about the original 
graph can be obtained from the sampled graph 
using the selection estimator. For example, the 
number of triangles is estimated by summing 
$1/(p_{e_{t_1}} \cdot p_{e_{t_2}} \cdot p_{e_{t_3}})$ over all sampled triangles 
$\{e_{t_1}, e_{t_2}, e_{t_3}\}$. This can be seen as a generalization 
of neighborhood sampling. 
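
A single neighborhood-sampling estimator, as described above, can be sketched as follows. This is an illustration reconstructed from the description in this section, not the code of [17]; the function name is hypothetical, and in practice many independent copies are averaged to reduce the variance.

```python
import random


def neighborhood_sampling_estimate(stream, rng):
    """One unbiased (high-variance) estimate of the number of triangles."""
    r1 = r2 = None   # the sampled edge pair (r1, r2)
    c = 0            # |N(r1)|: later edges adjacent to r1 seen so far
    closed = False   # whether the wedge (r1, r2) has been closed
    m = 0
    for u, v in stream:
        edge = (u, v) if u <= v else (v, u)
        m += 1
        if rng.random() < 1.0 / m:
            # r1 is a uniform sample of the edges observed so far.
            r1, r2, c, closed = edge, None, 0, False
        elif len(set(edge) & set(r1)) == 1:
            c += 1
            if rng.random() < 1.0 / c:
                # r2 is a uniform sample of N(r1).
                r2, closed = edge, False
            elif r2 is not None and set(edge) == set(r1) ^ set(r2):
                closed = True  # the edge closes the wedge (r1, r2)
    # A triangle is detected with probability 1 / (m * |N(r1)|),
    # so the detection indicator is scaled by m * |N(r1)|.
    return m * c if closed else 0


estimates = [
    neighborhood_sampling_estimate(
        [(1, 2), (2, 3), (1, 3), (3, 4)], random.Random(seed)
    )
    for seed in range(100)
]
```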

On the lower bound side, Braverman et al. [5] 
show that any single-pass streaming algorithm 
which gives a good multiplicative approximation 
of triangle counts must use $\Omega(m)$ bits of storage 
even if the input graph has (2(7) triangles. This 
improves lower bounds from [3, 13]. For 
algorithms making a constant number c of passes, for 
every constant c, the lower bound is shown to be 
$\Omega(m/T)$ in the same work. 


Applications 


The number of triangles in a graph and the 
transitivity of a graph are important measures 
used widely in the study of networks across many 
different domains. For example, these measures 
appear in social science research [7, 8, 18, 22], in 
data mining applications such as spam detection 
and finding common topics on the World Wide 
Web [4, 9], and in bioinformatics for motif 
counting [16]. For a more detailed list of applications, 
we point the reader to the introductory sections of 
the references in the related work. 

Counting Triangles in Graph Streams, Fig. 1 Output of the STREAMING-TRIANGLES algorithm of [12] on a variety of real datasets while storing at most 40K edges. (a) Estimated transitivity values alongside their exact values. 


Open Problems 


Give a tight lower bound on the space required by 
a single-pass streaming algorithm for estimating 
the number of triangles in a graph stream with 
additive error. The lower bound for multiplicative 
approximation is known to be $\Omega(m)$ [3, 5]. 

Counting Triangles in Graph Streams, Fig. 1 (continued) (b) Relative error of the STREAMING-TRIANGLES estimates of the number of triangles $T$. Observe that the relative error for $T$ is mostly below 8 % and often lower. 


Experimental Results 


Figures 1 and 2 give a glimpse of the experimental 
results from [12] on the performance of the algorithm 
STREAMING-TRIANGLES. Specifically, Fig. 1 
shows the result of running the algorithm on 
a variety of graph datasets obtained from [20]. 
These include large graphs such as the Orkut 
social network, which consists of 200M edges. The 
relative errors on $\kappa$ and $T$ are mostly less than 
5 % (except for graphs with tiny $\kappa$). The storage 
used by the algorithm, stated in terms of the number 
of edges, is at most 40K. (The storage roughly 
corresponds to the size of the edge reservoir used in 
the algorithm of [12] described in this article.) 

An important aspect of the algorithm presented 
in [12] is that it can track the quantities 
$\kappa_t$ and $T_t$ for all values of $t$ in real time. This is 
exhibited in Fig. 2. 


Problem Definition 


The problem of sketching a large mathematical 
object is to produce a compact data structure 
that approximately represents it. The Count-Min 
(CM) sketch is an example of a sketch that allows 
a number of related quantities to be estimated 
with accuracy guarantees, including point queries 
and dot product queries. Such queries are at the 
core of many computations, so the structure can 
be used in order to answer a variety of other 
queries, such as frequent items (heavy hitters), 
quantile finding, join size estimation, and more. 
Since the sketch can process updates in the form 
of additions or subtractions to dimensions of 
the vector (which may correspond to insertions 
or deletions or other transactions), it is capa- 
ble of working over streams of updates, at high 
rates. 

The data structure maintains the linear pro- 
jection of the vector with a number of other 
random vectors. These vectors are defined im- 
plicitly by simple hash functions. Increasing the 
range of the hash functions increases the accuracy 
of the summary, and increasing the number of 
hash functions decreases the probability of a bad 
estimate. These trade-offs are quantified precisely 
below. Because of this linearity, CM sketches can 
be scaled, added, and subtracted, to produce sum- 
maries of the corresponding scaled and combined 
vectors. 


Key Results 


The Count-Min sketch was first proposed 
in 2003 [4], following several other sketch 
techniques, such as the Count sketch [2] and 
the AMS sketch [1]. The sketch is similar 
to a counting Bloom filter or multistage 
filter [7]. 

Count-Min Sketch, Fig. 1 Each item $i$ is mapped to one cell in each row of the array of counts: when an update of $c_t$ to item $i_t$ arrives, $c_t$ is added to each of these cells 


Data Structure Description 

The CM sketch is simply an array of counters 
of width w and depth d, CM[1, 1]...CM[d, w]. 
Each entry of the array is initially zero. Addition- 
ally, d hash functions 


$$h_1, \ldots, h_d : \{1 \ldots n\} \to \{1 \ldots w\}$$


are chosen uniformly at random from a pairwise- 
independent family. Once w and d are chosen, 
the space required is fixed: the data structure is 
represented by wd counters and d hash functions 
(which can each be represented in O(1) machine 
words [12]). 


Update Procedure 

A vector $a$ of dimension $n$ is described incrementally. 
Initially, $a(0)$ is the zero vector, $0$, so 
$a_i(0)$ is 0 for all $i$. Its state at time $t$ is denoted 
$a(t) = [a_1(t), \ldots, a_i(t), \ldots, a_n(t)]$. Updates to 
individual entries of the vector are presented as a 
stream of pairs. The $t$th update is $(i_t, c_t)$, meaning 
that 


$$a_{i_t}(t) = a_{i_t}(t-1) + c_t$$
$$a_{i'}(t) = a_{i'}(t-1) \quad \forall i' \neq i_t.$$

For convenience, the subscript $t$ is dropped, and 
the current state of the vector is simply referred to 
as $a$. For simplicity of description, it is assumed 
here that although values of $a_i$ increase and 
decrease with updates, each $a_i \geq 0$. However, 
the sketch can also be applied to the case where 
the $a_i$ can be less than zero, with some increase in 
costs [4]. 

When an update $(i_t, c_t)$ arrives, $c_t$ is added 
to one count in each row of the Count-Min 
sketch; the counter is determined by $h_j$. Formally, 
given $(i_t, c_t)$, the following modifications 
are performed: 


$$\forall\, 1 \leq j \leq d : CM[j, h_j(i_t)] \leftarrow CM[j, h_j(i_t)] + c_t.$$


This procedure is illustrated in Fig. 1. Because 
computing each hash function takes constant 
time, the total time to perform an update is O(d), 
independent of w. Since d is typically small 
in practice (often less than 10), updates can be 
processed at high speed. 
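
The update procedure can be sketched as follows. This is a minimal illustration, not the reference implementation: the class and method names, the use of 0-based indices, and the hash construction $h_j(i) = ((a \cdot i + b) \bmod p) \bmod w$ with a fixed Mersenne prime $p$ are all assumptions made here as one common way to obtain hash functions of the required kind.

```python
import random


class CountMinSketch:
    """Minimal Count-Min sketch supporting updates only (illustrative)."""

    PRIME = (1 << 61) - 1  # a Mersenne prime; item indices must be below it

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.counts = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row defines h_j(i) = ((a*i + b) mod PRIME) mod width.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]

    def cell(self, j, i):
        """Column that item i hashes to in row j (0-based)."""
        a, b = self.hashes[j]
        return ((a * i + b) % self.PRIME) % self.width

    def update(self, i, c=1):
        """Apply the update (i, c): add c to one counter in each row."""
        for j in range(self.depth):
            self.counts[j][self.cell(j, i)] += c


sketch = CountMinSketch(width=16, depth=4)
for item, change in [(7, 3), (7, -1), (9, 5)]:
    sketch.update(item, change)
```

Each update touches exactly one counter per row, so every row of the array sums to the total of all update values, regardless of collisions.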


Point Queries 

A point query is to estimate the value of an 
entry $a_i$ in the vector. Given a query point $i$, 
an estimate is found from the sketch as 
$\hat{a}_i = \min_{1 \leq j \leq d} CM[j, h_j(i)]$. The approximation 
guarantee is that if $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$, the 
estimate $\hat{a}_i$ obeys $a_i \leq \hat{a}_i$; and, with probability 
at least $1 - \delta$, 


$$\hat{a}_i \leq a_i + \varepsilon \|a\|_1.$$


Here, $\|a\|_1$ is the $L_1$ norm of $a$, i.e., the sum of 
the (absolute) values. The proof follows by using 
the Markov inequality to limit the error in each 
row and then using the independence of the hash 
functions to amplify the success probability. 
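
A hedged sketch of the point query, with the dimensions derived from the accuracy parameters, is given below. The constructor signature, the 0-based indexing, and the hash construction are assumptions made for illustration; only the rules $w = \lceil e/\varepsilon \rceil$, $d = \lceil \ln(1/\delta) \rceil$ and the row-wise minimum come from the description above.

```python
import math
import random


class CountMinSketch:
    """Count-Min sketch sized from (eps, delta), with point queries."""

    PRIME = (1 << 61) - 1  # assumed modulus for the illustrative hash family

    def __init__(self, eps, delta, seed=0):
        self.width = math.ceil(math.e / eps)                     # w = ceil(e / eps)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))    # d = ceil(ln(1 / delta))
        rng = random.Random(seed)
        self.counts = [[0] * self.width for _ in range(self.depth)]
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.depth)]

    def _cell(self, j, i):
        a, b = self.hashes[j]
        return ((a * i + b) % self.PRIME) % self.width

    def update(self, i, c=1):
        for j in range(self.depth):
            self.counts[j][self._cell(j, i)] += c

    def query(self, i):
        """Point query: the smallest counter that item i maps to."""
        return min(self.counts[j][self._cell(j, i)] for j in range(self.depth))


cm = CountMinSketch(eps=0.01, delta=0.01)
truth = {i: i + 1 for i in range(100)}
for i, c in truth.items():
    cm.update(i, c)
```

With non-negative entries, every counter that an item maps to includes that item's full value, so the minimum never underestimates; the upper bound on the error holds only with probability at least $1 - \delta$.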

This analysis makes no assumption about the 
distribution of values in $a$. In many applications, 
item frequencies follow Zipfian, or power law, 
distributions: the (relative) frequency 
of the $i$th most frequent item is proportional to 
$i^{-z}$, for some parameter $z$, where $z$ is typically in 
the range 1 to 3. The skew in the distribution 
can be used to show a stronger space/accuracy 
trade-off: for a Zipf distribution with parameter 
$z$, the space required to answer point queries with 
error $\varepsilon \|a\|_1$ with probability at least $1 - \delta$ is given 
by $O(\varepsilon^{-\min\{1, 1/z\}} \ln 1/\delta)$ [5]. 


Range, Heavy Hitter, and Quantile Queries 

A range query is to estimate $\sum_{i=l}^{r} a_i$ for a range 
$[l \ldots r]$. For small ranges, the range sum can be 
estimated as a sum of point queries; however, 
as the range grows, the error in this approach 
also grows linearly. Instead, $\log n$ sketches can be 
kept, each of which summarizes a derived vector 
$a^k$ where 


$$a^k[j] = \sum_{i = j 2^k}^{(j+1) 2^k - 1} a_i$$


for $k = 1 \ldots \log n$. A range of the form 
$j 2^k \ldots (j+1) 2^k - 1$ is called a dyadic range, and 
any arbitrary range $[l \ldots r]$ can be partitioned into 
at most $2 \log n$ dyadic ranges. With appropriate 
rescaling of accuracy bounds, it follows that 
Count-Min sketches can be used to find an 
estimate $\hat{r}$ for a range query on $l \ldots r$ such that 


$$\hat{r} - \varepsilon \|a\|_1 \leq \sum_{i=l}^{r} a_i \leq \hat{r}.$$


The right inequality holds with certainty, and the 
left inequality holds with probability at least $1 - \delta$. 
The total space required is $O\left(\frac{\log^2 n}{\varepsilon} \log \frac{1}{\delta}\right)$ [4]. 
The closely related $\phi$-quantile query is to find a 
point $j$ such that 


$$\sum_{i=1}^{j} a_i \leq \phi \|a\|_1 \leq \sum_{i=1}^{j+1} a_i.$$


Range queries can be used to (binary) search for a 
$j$ which satisfies this requirement approximately 
(i.e., tolerates up to $\varepsilon \|a\|_1$ error in the above 
expression) given $\phi$. The overall cost is space 
that depends on $1/\varepsilon$, with further log factors 
for the rescaling necessary to give the overall 
guarantee [4]. The time for each insert or delete 
operation and the time to find any quantile are 
logarithmic in $n$, the size of the domain. 
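
The dyadic decomposition behind range queries can be illustrated as follows. This is a sketch under assumptions: indices are 0-based, level $k = 0$ (single items) is included for completeness, and the per-level aggregates are computed exactly from the vector, whereas the sketch-based scheme would answer each of them with a point query on the Count-Min sketch kept for that level. The function names are hypothetical.

```python
def dyadic_cover(l, r):
    """Partition [l, r] into dyadic ranges.

    Returns (k, j) pairs, each standing for the range
    [j * 2**k, (j + 1) * 2**k - 1].
    """
    pieces = []
    while l <= r:
        k = 0
        # Grow the block while it stays aligned at l and inside [l, r].
        while l % (1 << (k + 1)) == 0 and l + (1 << (k + 1)) - 1 <= r:
            k += 1
        pieces.append((k, l >> k))
        l += 1 << k
    return pieces


def range_sum(a, l, r):
    """Sum a[l..r] by adding one aggregate per dyadic piece.

    Each aggregate is computed exactly here; with sketches it would be a
    point query for entry j of the derived vector at level k.
    """
    total = 0
    for k, j in dyadic_cover(l, r):
        total += sum(a[j << k:(j + 1) << k])
    return total


a = list(range(16))
cover = dyadic_cover(3, 10)
```

For the range 3 to 10 the cover consists of the pieces 3, 4 to 7, 8 to 9, and 10, so four lookups replace eight point queries.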

Heavy hitters are those points $i$ such that 
$a_i \geq \phi \|a\|_1$ for some specified $\phi$. The range query 
primitive based on Count-Min sketches can again 
be used to find heavy hitters, by recursively splitting 
dyadic ranges into two and querying each 
half to see if the range is still heavy, until a range 
of a single, heavy item is found. The cost of this is 
similar to that for quantiles, with space dependent 
on $1/\varepsilon$ and $\log n$. The time to update the data 
structure and to find approximate heavy hitters 
is also logarithmic in $n$. The guarantee is that 
every item with frequency at least $(\phi + \varepsilon) \|a\|_1$ is 
output, and with probability $1 - \delta$ no item whose 
frequency is less than $\phi \|a\|_1$ is output. 


Inner Product Queries 

The Count-Min sketch can also be used to estimate 
the inner product between two vectors. The 
inner product $a \cdot b$ can be estimated by treating the 
Count-Min sketch as a collection of $d$ vectors of 
length $w$ and finding the minimum inner product 
between corresponding rows of sketches of the 
two vectors. With probability $1 - \delta$, this estimate 
is at most an additive quantity $\varepsilon \|a\|_1 \|b\|_1$ above 
the true value of $a \cdot b$. This is to be compared 
with AMS sketches, which guarantee $\varepsilon \|a\|_2 \|b\|_2$ 
additive error but require space proportional to $1/\varepsilon^2$ 
to make this guarantee. 
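
The row-wise inner product estimate can be sketched as below. The helper names, the 0-based indexing, and the hash construction are assumptions made for illustration; the essential point, that both vectors must be sketched with the same hash functions and that the minimum over rows is returned, follows the description above.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus for the illustrative hash family


def make_hashes(depth, seed=0):
    """Draw one (a, b) pair per row; both sketches must share these."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(depth)]


def build_sketch(vector, width, hashes):
    """Sketch a whole vector: entry i is added to one cell in each row."""
    rows = [[0] * width for _ in hashes]
    for i, value in enumerate(vector):
        for row, (a, b) in zip(rows, hashes):
            row[((a * i + b) % PRIME) % width] += value
    return rows


def inner_product_estimate(sketch_a, sketch_b):
    """Minimum over rows of the inner product of corresponding rows."""
    return min(sum(x * y for x, y in zip(row_a, row_b))
               for row_a, row_b in zip(sketch_a, sketch_b))


hashes = make_hashes(depth=5)
a = [1, 0, 2, 3]
b = [4, 1, 0, 2]
estimate = inner_product_estimate(build_sketch(a, 8, hashes),
                                  build_sketch(b, 8, hashes))
```

For non-negative vectors each row's inner product contains every true term plus collision terms, so the estimate is never below $a \cdot b$ (here 10) and never above $\|a\|_1 \|b\|_1$ (here 42).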


Conservative Update 

If only positive updates arrive, then the “conservative 
update” process (due to Estan and Varghese [7]) 
can be used. For an update $(i, c)$, $\hat{a}_i$ 
is computed, and the counts are modified according 
to $\forall\, 1 \leq j \leq d : CM[j, h_j(i)] \leftarrow \max(CM[j, h_j(i)], \hat{a}_i + c)$. This procedure still 
ensures for point queries that $a_i \leq \hat{a}_i$, and that 
the error is no worse than in the normal update 
procedure; it has been observed that conservative 
update can improve accuracy “up to an order of 
magnitude” [7]. However, deletions or negative 
updates can no longer be processed, and the new 
update procedure is slower than the original one. 
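
The difference between the two update rules can be seen in a small side-by-side sketch. The class layout, 0-based indexing, hash construction, and the deliberately narrow width (chosen to force collisions) are assumptions made for illustration only.

```python
import random


class CountMinSketch:
    """Count-Min sketch with both the normal and the conservative update."""

    PRIME = (1 << 61) - 1  # assumed modulus for the illustrative hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.counts = [[0] * width for _ in range(depth)]
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]

    def _cell(self, j, i):
        a, b = self.hashes[j]
        return ((a * i + b) % self.PRIME) % self.width

    def query(self, i):
        return min(self.counts[j][self._cell(j, i)] for j in range(self.depth))

    def update(self, i, c):
        """Normal update: add c to the counter in every row."""
        for j in range(self.depth):
            self.counts[j][self._cell(j, i)] += c

    def conservative_update(self, i, c):
        """Conservative update (c > 0): raise counters only up to estimate + c."""
        target = self.query(i) + c
        for j in range(self.depth):
            col = self._cell(j, i)
            self.counts[j][col] = max(self.counts[j][col], target)


stream = [(i % 20, 1 + i % 3) for i in range(200)]  # positive updates only
plain = CountMinSketch(width=4, depth=3, seed=1)
conservative = CountMinSketch(width=4, depth=3, seed=1)
truth = {}
for i, c in stream:
    plain.update(i, c)
    conservative.conservative_update(i, c)
    truth[i] = truth.get(i, 0) + c
```

Because both sketches share the same hash functions, the conservative estimate for every item lies between the true value and the estimate from the normal update.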


Applications 


The Count-Min sketch has found a number of 
applications. 
• Indyk [9] used the Count-Min sketch to estimate 
  the residual mass after removing a set of 
  items, that is, given a (small) set of indices $I$, 
  to estimate $\sum_{i \notin I} a_i$. This supports clustering 
  over streaming data. 

• The entropy of a data stream is a function 
  of the relative frequencies of each item or 
  character within the stream. Using Count-Min 
  sketches within a larger data structure based 
  on additional hashing techniques, B. Lakshminath 
  and Ganguly [8] showed how to estimate 
  this entropy to within relative error. 

• Sarlós et al. [14] gave approximate algorithms 
  for personalized page rank computations 
  which make use of Count-Min sketches to 
  compactly represent web-sized graphs. 

• In describing a system for building selectivity 
  estimates for complex queries, Spiegel and 
  Polyzotis [15] use Count-Min sketches 
  in order to allow clustering over a high-dimensional 
  space. 

• Sketches that reduce the amount of information 
  stored seem like a natural candidate 
  to preserve privacy of information. However, 
  proving privacy requires more care. Roughan 
  and Zhang use the Count-Min sketch to allow 
  private computation of a sketch of a vector [13]. 
  Dwork et al. show that the Count-Min 
  sketch can be made pan-private, meaning that 
  information about individuals contributing to 
  the data structure is held private. 


Experimental Results 


There have been a number of experimental stud- 
ies of COUNT-MIN and related algorithms for a 
variety of computing models. These have shown 
that the algorithm is accurate and fast to ex- 
ecute [3, 11]. Implementations on desktop ma- 
chines achieve many millions of updates per sec- 
ond, primarily limited by IO throughput. Other 
implementations have incorporated Count-Min 
sketch into high-speed streaming systems such 
as Gigascope [6] and tuned it to process packet 
streams of multi-gigabit speeds. Lai and Byrd re- 
port on an implementation of Count-Min sketches 
on a low-power stream processor [10] capable 
of processing 40-byte packets at a throughput 
rate of up to 13 Gbps. This is equivalent to about 
44 million updates per second. 


URLs to Code and Data Sets 


Sample implementations are widely available in 
a variety of languages. 


C code is given by the MassDal code bank: 
http://www.cs.rutgers.edu/~muthu/massdal-code-index.html. 

C++ code by Marios Hadjieleftheriou is 
available from http://hadjieleftheriou.com/sketches/index.html. 

The MADlib project has SQL implementations 
for Postgres/Greenplum: http://madlib.net/. 

An OCaml implementation is available via 
https://github.com/ezyang/ocaml-cminsketch. 


Cross-Reference 


AMS Sketch 


Recommended Reading 


1. Alon N, Matias Y, Szegedy M (1996) The space com- 
plexity of approximating the frequency moments. In: 
ACM symposium on theory of computing, Philadel- 
phia, pp 20-29 

2. Charikar M, Chen K, Farach-Colton M (2002) Finding 
frequent items in data streams. In: Proceedings of 
the international colloquium on automata, languages 
and programming (ICALP), Malaga 

3. Cormode G, Hadjieleftheriou M (2009) Finding the 
frequent items in streams of data. Commun ACM 
52(10):97-105 

4. Cormode G, Muthukrishnan S (2005) An improved 
data stream summary: the Count-Min sketch and its 
applications. J Algorithms 55(1):58-75 

5. Cormode G, Muthukrishnan S (2005) Summarizing 
and mining skewed data streams. In: SIAM confer- 
ence on data mining, Newport Beach 

6. Cormode G, Korn F, Muthukrishnan S, Johnson T, 
Spatscheck O, Srivastava D (2004) Holistic UDAFs 
at streaming speeds. In: ACM SIGMOD international 
conference on management of data, Paris, pp 35-46 

7. Estan C, Varghese G (2002) New directions in traffic 
measurement and accounting. In: Proceedings of 
ACM SIGCOMM, computer communication review, 
vol 32, 4, Pittsburgh, PA, pp 323-338 

8. Ganguly S, Lakshminath B (2006) Estimating entropy 
over data streams. In: European symposium on 
algorithms (ESA), Zurich 

9. Indyk P (2003) Better algorithms for high- 
dimensional proximity problems via asymmetric 
embeddings. In: ACM-SIAM symposium on discrete 
algorithms, Baltimore 

10. Lai YK, Byrd GT (2006) High-throughput sketch 
update on a low-power stream processor. In: Pro- 
ceedings of the ACM/IEEE symposium on architec- 
ture for networking and communications systems, 
San Jose 

11. Manerikar N, Palpanas T (2009) Frequent items 
in streaming data: an experimental evaluation 
of the state-of-the-art. Data Knowl Eng 68(4): 
415-430 


12. Motwani R, Raghavan P (1995) Randomized 
algorithms. Cambridge University Press, 
Cambridge/New York 


13. Roughan M, Zhang Y (2006) Secure distributed data 
mining and its application in large-scale network 
measurements. In: ACM SIGCOMM computer com- 
munication review (CCR), Pisa 

14. Sarlós T, Benczúr A, Csalogány K, Fogaras D, Rácz 
B (2006) To randomize or not to randomize: space 
optimal summaries for hyperlink analysis. In: International 
conference on World Wide Web (WWW), 
Edinburgh 

15. Spiegel J, Polyzotis N (2006) Graph-based synopses 
for relational selectivity estimation. In: ACM SIG- 
MOD international conference on management of 
data, Chicago 


CPU Time Pricing 
Li-Sha Huang 


Department of Computer Science and 
Technology, Tsinghua University, Beijing, China 


Keywords 


Competitive auction; Market equilibrium; Re- 
source scheduling 


Years and Authors of Summarized 
Original Work 


2005; Deng, Huang, Li 


Problem Definition 


This problem is concerned with a Walrasian equilibrium 
model for determining the prices of CPU 
time. In a market model of a CPU job scheduling 
problem, the owner of the CPU processing time 
sells time slots to customers, and the price of 
each time slot depends on the seller’s strategy 
and the customers’ bids (valuation functions). 
In a Walrasian equilibrium, the market clears 
and each customer is most satisfied according 
to its valuation function and the current prices. The 
work of Deng, Huang, and Li [1] establishes 
conditions for the existence of a Walrasian equilibrium 
and obtains complexity results for determining the 
existence of an equilibrium. It also discusses the 
issues of excessive supply of CPU time and 
price dynamics. 


Notations 
Consider a combinatorial auction $(\Omega, I, V)$: 


• Commodities: The seller sells $m$ kinds of 
  indivisible commodities. Let 
  $\Omega = \{\omega_1 \times \delta_1, \ldots, \omega_m \times \delta_m\}$ denote the set of commodities, 
  where $\delta_j$ is the available quantity of the item 
  $\omega_j$. 

• Agents: There are $n$ agents in the market acting 
  as buyers, denoted by $I = \{1, 2, \ldots, n\}$. 

• Valuation functions: Each buyer $i \in I$ has 
  a valuation function $v_i : 2^{\Omega} \to \mathbb{R}^+$ that states 
  the maximum amount of money he is willing 
  to pay for a certain bundle of items. Let 
  $V = \{v_1, v_2, \ldots, v_n\}$. 


An XOR combination of two valuation functions 
$v_1$ and $v_2$ is defined by: 


$$(v_1 \ \mathrm{XOR} \ v_2)(S) = \max\{v_1(S), v_2(S)\}$$


An atomic bid is a valuation function $v$ 
denoted by a pair $(S, q)$, where $S \subseteq \Omega$ and 
$q \in \mathbb{R}^+$: 


$$v(T) = \begin{cases} q, & \text{if } S \subseteq T \\ 0, & \text{otherwise} \end{cases}$$


Any valuation function $v_i$ can be expressed by an 
XOR combination of atomic bids, 


$$v_i = (S_{i1}, q_{i1}) \ \mathrm{XOR} \ (S_{i2}, q_{i2}) \ldots \ \mathrm{XOR} \ (S_{in}, q_{in})$$


Given $(\Omega, I, V)$ as the input, the seller will 
determine an allocation and a price vector as 
the output: 


• An allocation $X = \{X_0, X_1, X_2, \ldots, X_n\}$ is 
  a partition of $\Omega$, in which $X_i$ is the bundle of 
  commodities assigned to buyer $i$ and $X_0$ is the 
  set of unallocated commodities. 

• A price vector $p$ is a non-negative vector in 
  $\mathbb{R}^m$, whose $j$th entry is the price of good 
  $\omega_j \in \Omega$. 


For any subset $T = \{\omega_1 \times \sigma_1, \ldots, \omega_m \times \sigma_m\} \subseteq \Omega$, 
define $p(T)$ by $p(T) = \sum_{j=1}^{m} \sigma_j p_j$. If buyer $i$ 
is assigned to a bundle $X_i$, his utility is 
$u_i(X_i, p) = v_i(X_i) - p(X_i)$. 


Definition A Walrasian equilibrium for a combinatorial 
auction $(\Omega, I, V)$ is a tuple $(X, p)$, where 
$X = \{X_0, X_1, \ldots, X_n\}$ is an allocation and $p$ is 
a price vector, satisfying that: 


(1) $p(X_0) = 0$; 


(2) $u_i(X_i, p) \geq u_i(B, p), \quad \forall B \subseteq \Omega, \ \forall\, 1 \leq i \leq n$. 


Such a price vector is also called a market 
clearing price, or Walrasian price, or equilibrium 
price. 
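
The two equilibrium conditions can be checked by brute force on small instances, as in the sketch below. It is an illustration under simplifying assumptions: every commodity has unit quantity, valuations are given as lists of atomic bids combined by XOR, all bundles are enumerated explicitly (exponential in the number of goods), and the function names and the example bids and prices are hypothetical.

```python
from itertools import combinations


def xor_value(bids, bundle):
    """Value of `bundle` under an XOR of atomic bids (S, q)."""
    return max((q for s, q in bids if s <= bundle), default=0)


def price(p, bundle):
    """Total price p(T) of a bundle of unit-quantity goods."""
    return sum(p[g] for g in bundle)


def is_walrasian(goods, bids_by_buyer, allocation, p):
    """allocation[i] is buyer i's bundle; goods in no bundle form X_0."""
    assigned = [g for x in allocation for g in x]
    if len(assigned) != len(set(assigned)) or not set(assigned) <= goods:
        return False  # bundles overlap or use unknown goods
    unallocated = goods - set(assigned)
    if price(p, unallocated) != 0:  # condition (1)
        return False
    bundles = [frozenset(c) for r in range(len(goods) + 1)
               for c in combinations(sorted(goods), r)]
    for bids, x in zip(bids_by_buyer, allocation):
        utility = xor_value(bids, x) - price(p, x)
        if any(xor_value(bids, b) - price(p, b) > utility for b in bundles):
            return False  # condition (2) fails for this buyer
    return True


# Two time slots and two single-slot jobs that both prefer the earlier slot.
goods = frozenset({1, 2})
bids = [
    [(frozenset({1}), 5), (frozenset({2}), 3)],
    [(frozenset({1}), 4), (frozenset({2}), 3)],
]
allocation = [frozenset({1}), frozenset({2})]
```

With prices 2 and 1 for the two slots, neither buyer can gain by switching bundles, whereas at zero prices the second buyer would prefer the first slot and the allocation is not an equilibrium.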


The CPU Job-Scheduling Problem 

There are two types of players in a market-driven 
CPU resource allocation model: a resource 
provider and $n$ consumers. The provider sells 
CPU time slots to the consumers. Each consumer 
has a job that requires a fixed number of 
CPU time units, and its valuation function depends on 
the time slots assigned to the job, usually the 
last assigned CPU time slot. Assume that all jobs 
are released at time $t = 0$ and the $i$th job needs 
$s_i$ time units. The jobs are interruptible without 
preemption cost, as is often modeled for CPU 
jobs. 

Translating into the language of combinatorial 
auctions, there are $m$ commodities (time units), 
$\Omega = \{\omega_1, \ldots, \omega_m\}$, and $n$ buyers (jobs), 
$I = \{1, 2, \ldots, n\}$, in the market. Each buyer has 
a valuation function $v_i$, which only depends on 
the completion time. Moreover, if not explicitly 
mentioned, every job’s valuation function is non-increasing 
w.r.t. the completion time. 


Key Results 


Consider the following linear programming prob- 
lem: 


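$$\max \sum_{i=1}^{n} \sum_{j=1}^{k_i} q_{ij} x_{ij}$$

s.t. 

$$\sum_{i, j \,\mid\, \omega_k \in S_{ij}} x_{ij} \leq \delta_k, \quad \forall \omega_k \in \Omega$$

$$\sum_{j=1}^{k_i} x_{ij} \leq 1, \quad \forall\, 1 \leq i \leq n$$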


Denote the problem by LPR and its inte- 
ger restriction by IP. The following theorem 
shows that a non-zero gap between the integer 
programming problem IP and its linear relax- 
ation implies the non-existence of the Walrasian 
equilibrium. 


Theorem 1 In a combinatorial auction, the Walrasian 
equilibrium exists if and only if the optimum 
of IP equals the optimum of LPR. The size 
of the LP problem is linear in the total number of 
XOR bids. 

Theorem 2 Determination of the existence of 
Walrasian equilibrium in a CPU job scheduling 
problem is strongly NP-hard. 


Now consider a job scheduling problem in which 
the customers’ valuation functions are all linear. 
Assume $n$ jobs are released at time $t = 0$ 
for a single machine; the $i$th job’s time span is 
$s_i \in \mathbb{N}^+$ and its weight is $w_i > 0$. The goal of the 
scheduling is to minimize the weighted completion 
time $\sum_{i=1}^{n} w_i t_i$, where $t_i$ is the completion 
time of job $i$. Such a problem is called an MWCT 
(Minimal Weighted Completion Time) problem. 
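
As a small worked example of the MWCT objective, the sketch below evaluates the weighted completion time of a given processing order. It assumes, purely for illustration, that jobs run one after another without interruption on a single machine; the function name and the example jobs are hypothetical.

```python
def weighted_completion_time(jobs, order):
    """Sum of w_i * t_i for jobs processed back to back in `order`.

    jobs[i] = (s_i, w_i): time span and weight of job i.
    """
    clock = 0
    total = 0
    for i in order:
        span, weight = jobs[i]
        clock += span              # job i completes at time t_i = clock
        total += weight * clock
    return total


jobs = [(2, 1), (1, 3)]  # (time span, weight)
cost_first = weighted_completion_time(jobs, [0, 1])
cost_second = weighted_completion_time(jobs, [1, 0])
```

Running the short, heavily weighted job first gives an objective of 6 instead of 11, which shows how the order of the jobs drives the objective.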


Theorem 3 In a single-machine MWCT job 
scheduling problem, Walrasian equilibrium always 
exists when $m \geq EM + \Delta$, where $m$ is the 
total number of processor time units, $EM = \sum_{i=1}^{n} s_i$, 
and $\Delta = \max_k \{s_k\}$. The equilibrium can be 
computed in polynomial time. 


The following theorem shows the existence of 
a non-increasing price sequence if Walrasian 
equilibrium exists. 


Theorem 4 If there exists a Walrasian equilibrium 
in a job scheduling problem, it can be 
adjusted to an equilibrium with consecutive allocation 
and a non-increasing equilibrium price 
vector. 


Applications 


Information technology has changed people’s 
lifestyles with the creation of many digital 
goods, such as word processing software, 
computer games, search engines, and online 
communities. Such a new economy has already 
demanded many theoretical tools (new and old, 
of economics and other related disciplines) be 
applied to their development and production, 
marketing, and pricing. The lack of a full 
understanding of the new economy is mainly 
due to the fact that digital goods can often 
be re-produced at no additional cost, though 
multi-fold other factors could also be part of the 
difficulty. The work of Deng, Huang, and Li [1] 
focuses on CPU time as a product for sale in the 
market, through the Walrasian pricing model in 
economics. CPU time as a commercial product is 
extensively studied in grid computing. Singling 
out CPU time pricing will help us to set aside 
other complicated issues caused by secondary 
factors, and a complete understanding of this 
special digital product (or service) may shed 
some light on the study of other goods in the 
digital economy. 

The utilization of CPU time by multiple customers 
has been a crucial issue in the development 
of operating system concepts. The rise of 
grid computing aims to fully utilize computational 
resources, e.g., CPU time, disk space, and 
bandwidth. Market-oriented schemes have been 
proposed for the efficient allocation of computational 
grid resources [2, 5]. Later, various practical 
and simulation systems emerged in grid 
resource management. Besides resource allocation 
in grids, an economic mechanism has 
also been introduced for TCP congestion control 
problems; see Kelly [4]. 


Problem Definition 


Given a point set $V$, the graph on the vertex set $V$ 
in which two vertices have an edge if and only if 
the distance between them is at most $r$, for some 
positive real number $r$, is called an $r$-disk graph 
over the vertex set $V$ and denoted by $G_r(V)$. If 
$r_1 \leq r_2$, obviously $G_{r_1}(V) \subseteq G_{r_2}(V)$. A graph 
property is monotonic (increasing) if, whenever a graph 
has the property, every supergraph with 
the same vertex set also has the property. The 
critical-range problem (or critical-radius problem) 
is concerned with the minimal range $r$ such 
that $G_r(V)$ has some monotonic property. For 
example, graph connectivity is monotonic and 
crucial to many applications. It is interesting to 
know whether $G_r(V)$ is connected or not. Let 
$\rho_{con}(V)$ denote the minimal range $r$ such that 
$G_r(V)$ is connected. Then, $G_r(V)$ is connected 
if $r \geq \rho_{con}(V)$, and otherwise not connected. 
Here $\rho_{con}(V)$ is called the critical range for 
connectivity of $V$. Formally, the critical-range 
problem is defined as follows. 


Definition 1 The critical range for a monotonic 
graph property $\pi$ over a point set $V$, denoted by 
$\rho_\pi(V)$, is the smallest range $r$ such that $G_r(V)$ 
has property $\pi$. 


From another perspective, a given geometric property 
usually comes with a corresponding embedded geometric 
structure. In many cases, the critical-range 
problem for graph properties is related or equivalent 
to the longest-edge problem of the corresponding 
geometric structures. For example, if $G_r(V)$ 
is connected, it contains a Euclidean minimal 
spanning tree (EMST), and $\rho_{con}(V)$ is equal to 
the largest edge length of the EMST. So the 
critical range for connectivity problem is equivalent 
to the longest edge of the EMST problem, 
and the critical range for connectivity is 
the smallest $r$ such that $G_r(V)$ contains the 
EMST. 
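
The equivalence between the critical range for connectivity and the longest EMST edge can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than a method from the cited literature: it uses a simple quadratic-time version of Prim's algorithm, points given as coordinate tuples, and hypothetical function names.

```python
import math


def critical_range_connectivity(points):
    """Longest edge of the Euclidean MST, found with Prim's algorithm in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each point
    if n:
        best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return longest


def is_connected(points, r):
    """Check connectivity of the r-disk graph by a depth-first search."""
    if not points:
        return True
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in range(len(points)):
            if v not in seen and math.dist(points[u], points[v]) <= r:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(points)


points = [(0, 0), (1, 0), (3, 0), (3, 1)]
rho_con = critical_range_connectivity(points)
```

For these four points the EMST has edges of length 1, 2, and 1, so the disk graph is connected at range 2 and disconnected for any smaller range.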

In most cases, given an instance, the critical 
range can be calculated by polynomial time algo- 
rithms. So it is not a hard problem to decide the 
critical range. Researchers are interested in the 
probabilistic analysis of the critical range, espe- 
cially asymptotic behaviors of r-disk graphs over 
random point sets. Random geometric graphs [8] 
is a general term for the theory about r-disk 
graphs over random point sets. 


Key Results 


In the following, problems are discussed in a 2D 
plane. Let $X_1, X_2, \ldots$ be independent and uniformly 
distributed random points on a bounded 
region $A$. Given a positive integer $n$, the point 
process $\{X_1, X_2, \ldots, X_n\}$ is referred to as the 
uniform $n$-point process on $A$, and is denoted by 
$\mathcal{X}_n(A)$. Given a positive number $\lambda$, let $Po(\lambda)$ 
be a Poisson random variable with parameter 
$\lambda$, independent of $\{X_1, X_2, \ldots\}$. Then the point 
process $\{X_1, X_2, \ldots, X_{Po(n)}\}$ is referred to as 
the Poisson point process with mean $n$ on $A$, and 
is denoted by $\mathcal{P}_n(A)$. $A$ is called a deployment 
region. An event is said to be asymptotically almost 
sure if it occurs with a probability that converges 
to 1 as $n \to \infty$. 

In a graph, a node is “isolated” if it has no 
neighbor. If a graph is connected, there exists 
no isolated node in the graph. The asymptotic 
distribution of the number of isolated nodes is 
given by the following theorem [2, 6, 14]. 


Theorem 1 Let $r_n = \sqrt{\frac{\ln n + \xi}{\pi n}}$ and $\Omega$ be a unit-area 
disk or square. The number of isolated nodes 
in $G_{r_n}(\mathcal{X}_n(\Omega))$ or $G_{r_n}(\mathcal{P}_n(\Omega))$ is asymptotically 
Poisson with mean $e^{-\xi}$. 


According to the theorem, the probability of 
the event that there is no isolated node is asymptotically 
equal to $\exp(-e^{-\xi})$. In the theory of 
random geometric graphs, if a graph has no 
isolated node, it is almost surely connected. Thus, 
the next theorem follows [6, 8, 9]. 

Theorem 2 Let $r_n = \sqrt{\frac{\ln n + \xi}{\pi n}}$ and $\Omega$ be a unit-area 
disk or square. Then, 

$$\Pr[G_{r_n}(\mathcal{X}_n(\Omega)) \text{ is connected}] \to \exp(-e^{-\xi}), \text{ and}$$

$$\Pr[G_{r_n}(\mathcal{P}_n(\Omega)) \text{ is connected}] \to \exp(-e^{-\xi}).$$
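
To give a feel for the magnitudes involved, the following sketch evaluates the radius and the limiting connectivity probability numerically. It assumes the radius has the form $r_n = \sqrt{(\ln n + \xi)/(\pi n)}$ and the limit has the form $\exp(-e^{-\xi})$; the function names are hypothetical, and the limit is an asymptotic statement, so the value for any finite $n$ is only indicative.

```python
import math


def connectivity_radius(n, xi):
    """Radius r_n = sqrt((ln n + xi) / (pi * n)) for n points in a unit-area region."""
    return math.sqrt((math.log(n) + xi) / (math.pi * n))


def limiting_connectivity_probability(xi):
    """Limiting probability exp(-exp(-xi)) that the graph is connected."""
    return math.exp(-math.exp(-xi))


radius = connectivity_radius(1000, 0.0)
probability = limiting_connectivity_probability(0.0)
```

With 1000 points and $\xi = 0$ the radius is roughly 0.047 and the limiting probability is $e^{-1}$, about 0.37; increasing $\xi$ pushes the limit toward 1.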


In wireless sensor networks, the deployment region 
is $k$-covered if every point in the deployment 
region is within the coverage ranges of at least 
$k$ sensors (vertices). Assume the coverage ranges 
are disks of radius $r$ centered at the vertices. Let 
$k$ be a fixed non-negative integer, and $\Omega$ be the 
unit-area square or disk centered at the origin 
$o$. For any real number $t$, let $t\Omega$ denote the set 
$\{tx : x \in \Omega\}$, i.e., the square or disk of area $t^2$ 
centered at the origin. Let $C_{n,r}$ (respectively, 
$C'_{n,r}$) denote the event that $\Omega$ is $(k+1)$-covered 
by the (open or closed) disks of radius $r$ centered 
at the points in $\mathcal{P}_n(\Omega)$ (respectively, $\mathcal{X}_n(\Omega)$). 
Let $K_{s,n}$ (respectively, $K'_{s,n}$) denote the event 
that $\sqrt{s}\Omega$ is $(k+1)$-covered by the unit-area 
(closed or open) disks centered at the points in 
$\mathcal{P}_n(\sqrt{s}\Omega)$ (respectively, $\mathcal{X}_n(\sqrt{s}\Omega)$). To simplify 
the presentation, let $\eta$ denote the perimeter of 
$\Omega$, which is equal to 4 (respectively, $2\sqrt{\pi}$) if $\Omega$ 
is a square (respectively, disk). For any $\xi \in \mathbb{R}$, 
let 


2 
(ff) 


a (é) = 16(2van+e"*) 7 


Nir 


Jn § 
FoR ayI © 
and 

4e 4.2 (Vz+ 2) nes, ifk =0: 
BOA) cy 

Soe nen, ifk>1. 


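The asymptotics of $\Pr[C_{n,r}]$ and $\Pr[C'_{n,r}]$ 
as $n$ approaches infinity, and the asymptotics 
of $\Pr[K_{s,n}]$ and $\Pr[K'_{s,n}]$ as $s$ approaches 
infinity are given in the following two 
theorems [4, 10, 16]. 


Theorem 3 Let $r_n = \sqrt{\frac{\ln n + (2k+1) \ln \ln n + \xi_n}{\pi n}}$. If 
$\lim_{n \to \infty} \xi_n = \xi$ for some $\xi \in \mathbb{R}$, then 

$$1 - \beta(\xi) \leq \lim_{n \to \infty} \Pr[C_{n,r_n}] \leq \frac{1}{1 + \alpha(\xi)}, \text{ and}$$

$$1 - \beta(\xi) \leq \lim_{n \to \infty} \Pr[C'_{n,r_n}] \leq \frac{1}{1 + \alpha(\xi)}.$$

If $\lim_{n \to \infty} \xi_n = \infty$, then 

$$\lim_{n \to \infty} \Pr[C_{n,r_n}] = \lim_{n \to \infty} \Pr[C'_{n,r_n}] = 1.$$

If $\lim_{n \to \infty} \xi_n = -\infty$, then 

$$\lim_{n \to \infty} \Pr[C_{n,r_n}] = \lim_{n \to \infty} \Pr[C'_{n,r_n}] = 0.$$


Theorem 4 Let $\mu(s) = \ln s + 2(k+1) \ln \ln s + \xi(s)$. 
If $\lim_{s \to \infty} \xi(s) = \xi$ for some $\xi \in \mathbb{R}$, then 

$$1 - \beta(\xi) \leq \lim_{s \to \infty} \Pr[K_{s,\mu(s)s}] \leq \frac{1}{1 + \alpha(\xi)}, \text{ and}$$

$$1 - \beta(\xi) \leq \lim_{s \to \infty} \Pr[K'_{s,\mu(s)s}] \leq \frac{1}{1 + \alpha(\xi)}.$$

If $\lim_{s \to \infty} \xi(s) = \infty$, then 

$$\lim_{s \to \infty} \Pr[K_{s,\mu(s)s}] = \lim_{s \to \infty} \Pr[K'_{s,\mu(s)s}] = 1.$$

If $\lim_{s \to \infty} \xi(s) = -\infty$, then 

$$\lim_{s \to \infty} \Pr[K_{s,\mu(s)s}] = \lim_{s \to \infty} \Pr[K'_{s,\mu(s)s}] = 0.$$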

In Gabriel graphs (GG), two nodes have an edge 
if and only if there is no other node in the 
disk using the segment between these two nodes as its 
diameter. If $V$ is a point set and $l$ is a positive 
real number, we use $\rho_{GG}(V)$ to denote the largest 
edge length of the GG over $V$, and $N(V, l)$ 
denotes the number of GG edges over $V$ whose 
length is at least $l$. Wan and Yi (2007) [11] gave 
the following theorem. 


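Theorem 5 Let $\Omega$ be a unit-area disk. For any 
constant $\xi$, $N\left(\mathcal{P}_n(\Omega), 2\sqrt{\frac{\ln n + \xi}{\pi n}}\right)$ is asymptotically 
Poisson with mean $2e^{-\xi}$, and 

$$\lim_{n \to \infty} \Pr\left[\rho_{GG}(\mathcal{P}_n(\Omega)) < 2\sqrt{\frac{\ln n + \xi}{\pi n}}\right] = \exp(-2e^{-\xi}).$$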

Let $\rho_{Del}(V)$ denote the largest edge length of the 
Delaunay triangulation over a point set $V$. The 
following theorem is given by Kozma et al. [3]. 


Theorem 6 Let $\Omega$ be a unit-area disk. Then, 

$$\rho_{Del}(\mathcal{X}_n(\Omega)) = O\left(\sqrt[3]{\frac{\log n}{n}}\right).$$


In wireless networks with greedy forward routing 
(GFR), each node discards a packet if none of its 
neighbors is closer to the destination of the packet 
than itself, or otherwise forwards the packet to 
the neighbor that is the closest to the destination. 
Since each node only needs to maintain the lo- 
cations of its one-hop neighbors and each packet 
should contain the location of the destination 
node, GFR can be implemented in a localized and 
memoryless manner. Because of the existence 
of local minima where none of the neighbors is 
closer to the destination than the current node, 
a packet may be discarded before it reaches its 
destination. To ensure that every packet can reach 
its destination, all nodes should have sufficiently 
large transmission radii to avoid the existence 
of local minima. Applying the r-disk model, we 
assume every node has the same transmission 
radius r, and each pair of nodes with distance at 
most r has a link. For a point set V, the critical 
transmission radius for GFR is given by 


$$\rho_{GFR}(V) = \max_{(u,v) \in V^2,\, u \neq v} \left( \min_{\|w - v\| < \|u - v\|} \|w - u\| \right).$$


In the definition, $(u, v)$ is a source–destination 
pair and $w$ is a node that is closer to $v$ than $u$. If 
every node has a transmission radius not less 
than $\rho_{GFR}(V)$, GFR can guarantee the deliverability 
between any source–destination pair [12]. 
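
The critical transmission radius for GFR can be computed directly from its max-min definition on small point sets. The sketch below assumes the definition takes, for each source–destination pair $(u, v)$, the distance from $u$ to its nearest node that is strictly closer to $v$, and then maximizes over all pairs. The function name is hypothetical, the points are assumed distinct, and the cubic running time is acceptable only for illustration.

```python
import math


def gfr_critical_radius(points):
    """Max over ordered pairs (u, v) of the distance from u to its nearest
    node that is strictly closer to v than u is (v itself always qualifies)."""
    worst = 0.0
    for u in points:
        for v in points:
            if u == v:
                continue
            d_uv = math.dist(u, v)
            nearest = min(math.dist(w, u) for w in points
                          if math.dist(w, v) < d_uv)
            worst = max(worst, nearest)
    return worst


points = [(0, 0), (1, 0), (3, 0)]
rho_gfr = gfr_critical_radius(points)
```

For these three collinear points the bottleneck is the gap of length 2 between the second and third point: no node lies between them, so a packet can only make progress across that gap in a single hop.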


Theorem 7 Let $\Omega$ be a unit-area convex 
compact region with bounded curvature, and 
$\beta_0 = 1/(2/3 - \sqrt{3}/(2\pi)) \approx 1.6^2$. Suppose that 
$n \pi r_n^2 = (\beta + o(1)) \ln n$ for some $\beta > 0$. Then, 


1. If $\beta > \beta_0$, then $\rho_{GFR}(\mathcal{P}_n(\Omega)) \leq r_n$ is asymptotically 
almost sure. 

2. If $\beta < \beta_0$, then $\rho_{GFR}(\mathcal{P}_n(\Omega)) > r_n$ is asymptotically 
almost sure. 


Applications 


In the literature, r-disk graphs (or unit disk graphs 
by proper scaling) are widely used to model 
homogeneous wireless networks in which each 
node is equipped with an omnidirectional an- 
tenna. According to the path loss of radio fre- 
quency, the transmission ranges (radii) of wire- 
less devices depend on transmission powers. For 
simplicity, the power assignment problem usu- 
ally is modeled by a corresponding transmission 
range assignment problem. Recently, wireless
ad-hoc networks have attracted considerable
research attention because of their many possible
applications. In many of these applications,
wireless devices are powered by batteries, so
transmission range assignment has become one
of the most important tools for prolonging system
lifetime. The theory of critical ranges shows that
a randomly deployed wireless ad-hoc network
has good properties with high probability if
the transmission range is larger than some critical
value.

One application of critical ranges is to the
connectivity of networks. A network is
k-vertex-connected if there exist k node-disjoint paths
between any pair of nodes. With such a
property, at least k distinct communication paths exist
between any pair of nodes, and the network
remains connected even if k − 1 nodes fail. Thus, with
a higher degree of connectivity, a network may
have larger bandwidth and higher fault
tolerance. In addition, networks with node or link
failures were considered in [9, 14, 15].

Another application is in topology control.
To operate wireless ad-hoc networks efficiently,
subgraphs of the network topology are constructed
and maintained; this task is called
topology control. A spanner is a subgraph of the
network topology in which the minimal total cost of
a path between any pair of nodes, e.g., distance or
energy consumption, is only a constant factor larger
than the minimal total cost in the original network
topology. Hence spanners are good candidates for
virtual backbones. Geometric structures, includ- 
ing Euclidean minimal spanning trees, relative 
neighbor graphs, Gabriel graphs, Delaunay tri- 
angulations, Yao’s graphs, etc., are widely used 
ingredients to construct spanners [1, 5, 13]. By 
applying the knowledge of critical ranges, the 
complexity of algorithm design can be reduced, 
e.g., [3, 11]. 


Open Problems 


A number of problems related to critical ranges
remain open. Most problems discussed here
are set in the 2-D plane; that is, the
point set lies in the plane. The first direction for
future work is to study these problems in
higher-dimensional spaces. Another open research area
is the longest-edge problem for other geometric
structures, e.g., relative neighbor graphs and
Yao’s graphs. A third direction for future work
involves relations between graph
properties. A well-known result for random
geometric graphs is that the disappearance of isolated
nodes asymptotically implies connectivity of
the network. For wireless networks with
unreliable links, however, whether this property
holds is still open. In addition, in wireless sensor
networks, the relations between connectivity and
coverage are also of interest.
Problem Definition


This entry deals with proving negative results 
for distribution-free PAC learning. The crux of 
the problem is proving that a polynomial-time 
algorithm for learning various concept classes in 
the PAC model implies that several well-known 
cryptosystems are insecure. Thus, if we assume 
a particular cryptosystem is secure, we can con- 
clude that it is impossible to efficiently learn a 
corresponding set of concept classes. 


PAC Learning 

We recall here the PAC learning model. Let C be
a concept class (a set of functions over n
variables), and let D be a distribution over the input
space $\{0,1\}^n$. With C we associate a size
function size that measures the complexity of each
c ∈ C. For example, if C is a class of Boolean
circuits, then size(c) is equal to the number of
gates in c. Let A be a randomized algorithm that
has access to an oracle which returns labeled
examples (x, c(x)) for some unknown c ∈ C; the
examples x are drawn according to D. Algorithm
A PAC learns concept class C by hypothesis class
H if for any c ∈ C, for any distribution D
over the input space, and any ε, δ > 0, A runs
in time poly(n, 1/ε, 1/δ, size(c)) and produces a
hypothesis h ∈ H such that with probability
at least 1 − δ, $\Pr_D[c(x) \neq h(x)] < \varepsilon$. This
probability is taken over the random coin tosses
of A as well as over the random labeled examples
seen from distribution D. When H = C (the
hypothesis must be some concept in C), A is
a proper PAC learning algorithm. In this entry it
is not assumed that H = C, i.e., hardness results for
representation-independent learning algorithms
are discussed. The only assumption made on H
is that each h ∈ H can be evaluated in
polynomial time for every input of length n.


Cryptographic Primitives 

Also required is knowledge of various crypto- 
graphic primitives such as public-key cryptosys- 
tems, one-way functions, and one-way trapdoor 
functions. For a formal treatment of these primi- 
tives, refer to Goldreich [3]. 
Informally, a function f is one way if, after
choosing a random x of length n and giving
an adversary A only f(x), it is computationally
intractable for A to find y such that f(y) =
f(x). Furthermore, given x, f(x) can be
evaluated in polynomial time. That is, f is easy to
compute one way, but there is no polynomial-time
algorithm for finding pre-images of f on
randomly chosen inputs. Say a function f is
trapdoor if f is one way, but if an adversary A
is given access to a secret “trapdoor” d, then A
can efficiently find random pre-images of f.

Trapdoor functions that are permutations are 
closely related to public-key cryptosystems: imag- 
ine a person Alice who wants to allow others 
to secretly communicate with her. She publishes 
a one-way trapdoor permutation f so that it 
is publicly available to everyone, but keeps the 
“trapdoor” d to herself. Then Bob can send Alice 
a secret message x by sending her f(x). Only Al- 
ice is able to invert f (recall f is a permutation) 
and recover x because only she knows d. 


Key Results 


The main insight in Kearns and Valiant’s work
is the following: if f is a trapdoor one-way
function, and C is a circuit class containing the
set of functions capable of inverting f given
access to the trapdoor, then C is not efficiently
PAC learnable. That is, assuming the difficulty of
inverting the trapdoor function f, there is a
distribution on $\{0,1\}^n$ under which no learning
algorithm can succeed in learning f’s associated
decryption function.

The following theorem is stated in the (closely 
related) language of public-key cryptosystems: 


Theorem 1 (Cryptography and learning; cf.
Kearns and Valiant [4]) Consider a public-key
cryptosystem for encrypting individual
bits into n-bit strings. Let C be a concept
class that contains all the decryption functions
$\{0,1\}^n \to \{0,1\}$ of the cryptosystem. If C is
PAC learnable in polynomial time, then there
is a polynomial-time distinguisher between the
encryptions of 0 and 1.
The intuition behind the proof is as follows:
fix an encryption function f, associated secret 
key d, and let C be a class of functions such 
that the problem of inverting f(x) given d can 
be computed by an element c of C; notice that 
knowledge of d is not necessary to generate a 
polynomial-size sample of (x, f(x)) pairs. 

If C is PAC learnable, then given a relatively 
small number of encrypted messages (x, f(x)), 
a learning algorithm A can find a hypothesis 
h that will approximate c and thus have a 
non-negligible advantage for decrypting future 
randomly encrypted messages. This violates the 
security properties of the cryptosystem. 
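The shape of this reduction can be sketched in code. The example below is only an illustration under loud assumptions: `toy_encrypt` is a deliberately insecure, made-up bit encryption (the plaintext bit is copied into one fixed coordinate), `coordinate_learner` is a made-up learner for it, and all function names are hypothetical. The point is the data flow: labeled examples are generated with public information only, and the learned hypothesis is then used to guess encrypted bits.

```python
import random

def learn_decryption(encrypt, learner, num_examples, rng):
    """Build labeled examples (ciphertext, bit) without any secret key,
    then hand them to the learner and return its hypothesis."""
    examples = []
    for _ in range(num_examples):
        bit = rng.randrange(2)
        examples.append((encrypt(bit, rng), bit))
    return learner(examples)

def advantage(hypothesis, encrypt, trials, rng):
    """Empirical advantage over random guessing when predicting the bit."""
    correct = 0
    for _ in range(trials):
        bit = rng.randrange(2)
        correct += hypothesis(encrypt(bit, rng)) == bit
    return correct / trials - 0.5

# Toy, insecure scheme (an assumption of this sketch): coordinate 3 holds the bit.
def toy_encrypt(bit, rng, n=8, pos=3):
    cipher = [rng.randrange(2) for _ in range(n)]
    cipher[pos] = bit
    return tuple(cipher)

# Toy learner: pick a coordinate that agrees with the label on every example.
def coordinate_learner(examples):
    n = len(examples[0][0])
    j = next(j for j in range(n) if all(c[j] == b for c, b in examples))
    return lambda cipher: cipher[j]

h = learn_decryption(toy_encrypt, coordinate_learner, 50, random.Random(1))
print(advantage(h, toy_encrypt, 200, random.Random(2)))
```

For a secure cryptosystem no efficient learner can reach a non-negligible advantage, which is exactly the contrapositive used in the hardness results.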

A natural question follows: “what is the 
simplest concept class that can compute the 
decryption function for secure public-key 
cryptosystems?” For example, if a public-key
cryptosystem is proven to be secure, and
encrypted messages can be decrypted (given the 
secret key) by polynomial-size DNF formulas, 
then, by Theorem 1, one could conclude that 
polynomial-size DNF formulas cannot be learned 
in the PAC model. 

Kearns and Valiant do not obtain such a hard- 
ness result for learning DNF formulas (it is still 
an outstanding open problem), but they do obtain 
a variety of hardness results assuming the secu- 
rity of various well-known public-key cryptosys- 
tems based on the hardness of number-theoretic 
problems such as factoring. 

The following list summarizes their main re- 
sults: 


• Let C be the class of polynomial-size Boolean
formulas (not necessarily DNF formulas)
or polynomial-size circuits of logarithmic
depth. Assuming that the RSA cryptosystem
is secure, or recognizing quadratic residues
is intractable, or factoring Blum integers is
intractable, C is not PAC learnable.

• Let C be the class of polynomial-size
deterministic finite automata. Under the same
assumptions as above, C is not PAC learnable.

• Let C be the class of constant-depth threshold
circuits of polynomial size. Under the same
assumptions as above, C is not PAC learnable.
The depth of the circuit class is not specified,
but it can be seen to be at most 4.


Kearns and Valiant also prove the intractability 
of finding optimal solutions related to coloring 
problems assuming the security of the above 
cryptographic primitives (e.g., breaking RSA). 


Relationship to Hardness Results for Proper Learning

The key results above should not be confused 
with the extensive literature regarding hardness 
results for properly PAC learning concept classes. 
For example, it is known [1] that, unless RP 
= NP, it is impossible to properly PAC learn 
polynomial-size DNF formulas (i.e., require the 
learner to learn DNF formulas by outputting a 
DNF formula as its hypothesis). Such results 
are incomparable to the work of Kearns and 
Valiant, as they require something much stronger 
from the learner but take a much weaker
assumption (RP ≠ NP is a weaker assumption than the
assumption that RSA is secure).


Applications and Related Work 


Valiant [10] was the first to observe that the 
existence of a particular cryptographic primi- 
tive (pseudorandom functions) implies hardness 
results for PAC learning concept classes. The 
work of Kearns and Valiant has subsequently 
found many applications. Klivans and Sherstov 
have recently shown [7] that the problem of 
PAC learning intersections of halfspaces (a very 
simple depth-2 threshold circuit) is intractable 
unless certain lattice-based cryptosystems due to 
Regev [9] are not secure. Their result makes use 
of the Kearns and Valiant approach. Angluin and 
Kharitonov [2] have extended the Kearns and 
Valiant paradigm to give cryptographic hardness 
results for learning concept classes even if the 
learner has query access to the unknown concept. 
Kharitonov [6] has given hardness results for
learning polynomial-size, constant-depth circuits
that assume the existence of secure
pseudorandom generators rather than the existence of
public-key cryptosystems.


Open Problems 


The major open problem in this line of research 
is to prove a cryptographic hardness result for 
PAC learning polynomial-size DNF formulas. 
Currently, polynomial-size DNF formulas seem 
far too weak to compute cryptographic primitives 
such as the decryption function for a well-known 
cryptosystem. The fastest known algorithm
for PAC learning DNF formulas runs in time
$2^{\tilde{O}(n^{1/3})}$.
Problem Definition


A dictionary (also known as an associative array) 
is an abstract data structure capable of storing a 
set S of elements, referred to as keys, and infor- 
mation associated with each key. The operations 
supported by a dictionary are insertion of a key 
(and associated information), deletion of a key, 
and lookup of a key (retrieving the associated 
information). In case a lookup is made on a key 
that is not in S, this must be reported by the data 
structure. 

The hash table is a class of data structures
used to implement dictionaries in the RAM model
of computation. Open addressing hash tables are
a particularly simple type of hash table, where
the data structure is an array such that each
entry either contains a key of S or is marked
“empty.” Cuckoo hashing addresses the problem
of implementing an open addressing hash table
with worst-case constant lookup time.
Specifically, a constant number of entries in the hash
table should be associated with each key x, such
that x is present in one of these entries if x ∈ S.

In the following it is assumed that a key and 
the information associated with a key are single 
machine words. This is essentially without loss 
of generality: If more associated data is wanted, 
it can be referred to using a pointer. If keys 
are longer than one machine word, they can be 
mapped down to a single (or a few) machine 
words using universal hashing [4], and the
described method can be used on the hash values (which
are unique to each key with high probability).
The original key must then be stored as associated
data. Let n denote an upper bound on the size of
S. To allow the size of the set to grow beyond n,
global rebuilding can be used.


Key Results 


Prehistory 

It has been known since the advent of universal
hashing [4] that if the hash table has $r \ge n^2$
entries, a lookup can be implemented by
retrieving just a single entry in the hash table. This is
done by storing a key x in entry h(x) of the
hash table, where h is a function from the set of
machine words to {1,...,r}. If h is chosen at
random from a universal family of hash functions
[4], then with probability at least 1/2 every key in
S is assigned a unique entry. The same behavior
would be seen if h were a random function, but in
contrast to random functions, there are universal
families that allow efficient storage and
evaluation (a constant number of machine words and
constant evaluation time).
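The probability bound of 1/2 follows from a union bound over pairs of keys. A short derivation, assuming the usual definition of a universal family (any two distinct keys collide with probability at most 1/r) and a set S of n keys:

```latex
\Pr\bigl[\exists\, x \neq y \in S : h(x) = h(y)\bigr]
  \;\le\; \binom{n}{2} \cdot \frac{1}{r}
  \;<\; \frac{n^2}{2r}
  \;\le\; \frac{1}{2}
  \qquad \text{whenever } r \ge n^2 .
```

Hence with probability more than 1/2 no two keys of S share an entry, and a failed attempt can be repaired by drawing a new h.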

This overview concentrates on the case where 
the space used by the open-addressing hash table
is linear, r = O(n). It was shown by Azar et al.
[1] that it is possible to combine linear space
with worst-case constant lookup time, although
how to construct the data structure was not
considered. Since randomization is used, all schemes
discussed have a probability of error. However, 
this probability is small, O(1/n) or less for 
all schemes, and an error can be handled by 
rehashing (changing the hash functions and re- 
building the hash table). The result of [1] was 
shown under the assumption that the algorithm 
is given free access to a number of truly random 
hash functions. In many of the subsequent papers, 
it is shown how to achieve the bounds using 
explicitly defined hash functions. However, no 
attempt is made here to cover these constructions. 

In the following, let ε denote an arbitrary
positive constant. Pagh [11] showed that retrieving
two entries from the hash table suffices when
r ≥ (2 + ε)n. Specifically, lookup of a key x can
be done by retrieving entries $h_1(x)$ and $h_2(x)$ of
the hash table, where $h_1$ and $h_2$ are random hash
functions mapping machine words to {1,...,r}.
The same result holds if $h_1$ has range {1,...,r/2}
and $h_2$ has range {r/2 + 1,...,r}, that is, if the
two lookups are done in disjoint parts of memory.

It follows from [11] that it is not possible to
perform lookup by retrieving a single entry in the
worst case unless the hash table is of size $n^{2-o(1)}$.
Pagh and Rodler [12] showed how to maintain
the data structure of Pagh [11] under insertions.
They considered the variant in which the lookups
are done in disjoint parts of the hash table. It
will be convenient to think of these as separate
arrays, $T_1$ and $T_2$. Let ⊥ denote the contents of
an empty hash table entry, and let x ↔ y express
that the values of variables x and y are swapped.
The proposed dynamic algorithm, called cuckoo
hashing, performs insertions by the following
procedure:


procedure insert(x)
    i := 1;
    repeat
        x ↔ T_i[h_i(x)]; i := 3 − i;
    until x = ⊥
end


At any time the variable x holds a key that
needs to be inserted in the table, or ⊥. The value
of i alternates between 1 and 2 in each iteration,
so the algorithm is alternately exchanging the
contents of x with a key from Table 1 and Table 2.
Conceptually, what happens is that the algorithm 
moves a sequence of zero or more keys from 
one table to the other to make room for the new 
key. This is done in a greedy fashion, by kicking 
out any key that may be present in the location 
where a key is being moved. The similarity of the 
insertion procedure and the nesting habits of the 
European cuckoo is the reason for the name of the 
algorithm. 

The pseudocode above is slightly simplified.
In general the algorithm needs to make sure
not to insert the same key twice and to handle the
possibility that the insertion may not succeed (by
rehashing if the loop takes too long).


Theorem 1 Assuming that r ≥ (2 + ε)n, the
expected time for the cuckoo hashing insertion
procedure is O(1).
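The insertion procedure, together with the duplicate check and rehashing mentioned above, can be sketched as follows. This is a minimal illustration rather than the implementation of [12]: the class name `CuckooHashTable`, the salted use of Python's built-in `hash` as a stand-in for random hash functions, the loop bound, and the choice ε = 1 (r ≥ 3n) are all assumptions of this sketch, and `None` is reserved to mark empty entries.

```python
import random

class CuckooHashTable:
    """Sketch of cuckoo hashing with two arrays and one candidate slot in each."""

    def __init__(self, slots_per_table=8, seed=0):
        self._rng = random.Random(seed)
        self._m = slots_per_table      # each array has m slots, so r = 2m
        self._n = 0                    # number of stored keys
        self._reset()

    def _reset(self):
        # Fresh salts play the role of newly drawn hash functions h_1, h_2.
        self._salts = [self._rng.getrandbits(64) for _ in range(2)]
        self._tables = [[None] * self._m for _ in range(2)]

    def _h(self, i, key):
        return hash((self._salts[i], key)) % self._m

    def _keys(self):
        return [k for table in self._tables for k in table if k is not None]

    def lookup(self, key):
        # Worst-case constant time: exactly two probes.
        return any(self._tables[i][self._h(i, key)] == key for i in range(2))

    def _try_place(self, x):
        """Run the cuckoo loop; return None on success, else the key left over."""
        i = 0
        for _ in range(6 * self._m.bit_length() + 6):
            slot = self._h(i, x)
            # Swap x with the table entry (x <-> T_i[h_i(x)]).
            x, self._tables[i][slot] = self._tables[i][slot], x
            if x is None:
                return None
            i = 1 - i
        return x

    def insert(self, key):
        if self.lookup(key):           # never store the same key twice
            return
        todo = [key]
        if 2 * self._m < 3 * (self._n + 1):   # keep r >= 3n
            self._m *= 2                       # global rebuilding
            todo += self._keys()
            self._reset()
        while todo:
            leftover = self._try_place(todo.pop())
            if leftover is not None:   # loop took too long: rehash everything
                todo += self._keys() + [leftover]
                self._reset()
        self._n += 1
```

A lookup inspects exactly one entry in each array, so its cost does not depend on how long the preceding insertions took.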


Generalizations of Cuckoo Hashing 

Cuckoo hashing has been generalized in several 
directions. Kirsch et al. [8] showed that keeping
a small stash of memory locations that are in- 
spected at every lookup can significantly reduce 
the error probability of cuckoo hashing. 

More generally the case of k > 2 hash func- 
tions has been considered. Also, the hash table 
may be divided into “buckets” of size b, such that 
the lookup procedure searches an entire bucket 
for each hash function. Let (k, b)-cuckoo denote 
a scheme with k hash functions and buckets of 
size b. What was described above is a (2,1)- 
cuckoo scheme. Already in 1999, (4,1)-cuckoo 
was described in a patent application by David 
A. Brown (US patent 6,775,281). Fotakis et al. 
described and analyzed a (k, 1)-cuckoo scheme 
in [7], and a (2,b)-cuckoo scheme was described 
and analyzed by Dietzfelbinger and Weidling [5]. 
In both cases, it was shown that space utilization 
arbitrarily close to 100% is possible and that 
the necessary fraction of unused space decreases 
exponentially with k and b. The insertion proce- 
dure considered in [5,7] is a breadth-first search 
for the shortest sequence of key moves that can 
be made to accommodate the new key. Pani- 
grahy [13] studied (2,2)-cuckoo schemes in de- 
tail, showing that a space utilization of 83 % can 
be achieved dynamically, still supporting con- 
stant time insertions using breadth-first search. 
In a static setting with no updates, thresholds for 
general (k,b)-cuckoo hashing have been estab- 
lished (see LeLarge [10] and its references). 


Applications 


Dictionaries (sometimes referred to as key-value
stores) have a wide range of uses in computer
science and engineering. For example,
dictionaries arise in many applications in string algorithms
and data structures, database systems, data
compression, and various information retrieval
applications. Also, cuckoo hashing has been used
in oblivious RAM simulations and other
cryptographic constructions [2, 14].


Open Problems 


The results above provide a good understanding 
of the properties of open-addressing schemes 
with worst-case constant lookup time. How- 
ever, several aspects are still not understood 
satisfactorily. 

First of all, there is no practical class of hash
functions for which the above results can be
shown. The only explicit classes of hash
functions that are known to make the methods work
either have evaluation time O(log n) or use
space $n^{\Omega(1)}$. It is an intriguing open problem to
construct a class having constant evaluation time
and space usage.

For the generalizations of cuckoo hashing, the 
use of breadth-first search is not so attractive 
in practice, due to the associated overhead in 
storage. A simpler approach that does not require 
any storage is to perform a random walk where 
keys are moved to a random, alternative position. 
(This generalizes the cuckoo hashing insertion 
procedure, where there is only one alternative 
position to choose.) Panigrahy [13] showed that 
this works for (2,2)-cuckoo when the space uti- 
lization is low. However, it is unknown whether 
this approach works well as the space utilization 
approaches 100 %. 

Finally, many of the analyses that have been 
given are not tight. In contrast, most classical 
open addressing schemes have been analyzed 
very precisely. It seems likely that precise anal- 
ysis of cuckoo hashing and its generalizations 
is possible using techniques from analysis of 
algorithms and tools from the theory of random 
graphs. In particular, the relationship between 
space utilization and insertion time is not well 
understood. A precise analysis of the probabil- 
ity that cuckoo hashing fails has been given by 
Kutzelnigg [9]. 
Experimental Results


All experiments on cuckoo hashing and its gener- 
alizations so far presented in the literature have 
been done using simple, heuristic hash func- 
tions. Pagh and Rodler [12] presented experi- 
ments showing that, for space utilization 1/3, 
cuckoo hashing is competitive with open ad- 
dressing schemes that do not give a worst-case 
guarantee. Zukowski et al. [15] showed how to 
implement cuckoo hashing such that it runs very 
efficiently on pipelined processors with the capa- 
bility of processing several instructions in par- 
allel. For hash tables that are small enough to 
fit in cache, cuckoo hashing was 2 to 4 times 
faster than chained hashing in their experiments. 
Erlingsson et al. [6] considered (k, b)-cuckoo 
schemes for various combinations of small val- 
ues of k and b, showing that very high space 
utilization is possible even for modestly small 
values of k and b. For example, a space utilization
of 99.9% is possible for k = b = 4. It was
further found that the resulting algorithms were
very robust. Experiments in [7] indicate that the 
random walk insertion procedure performs as 
well as one could hope for. 
Problem Definition


In the online bin packing problem, a sequence of 
items with sizes in the interval (0, 1] arrive one by 
one and need to be packed into bins, so that each 
bin contains items of total size at most 1. Each 
item must be irrevocably assigned to a bin before 
the next item becomes available. The algorithm 
has no knowledge about future items. There is an 
unlimited supply of bins available, and the goal is 
to minimize the total number of used bins (bins 
that receive at least one item). 

The most common performance measure for 
online bin packing algorithms is the asymptotic 
performance ratio, or asymptotic competitive ra- 
tio, which is defined as 


$$R_{\mathrm{asy}}(A) := \limsup_{n \to \infty} \max_{L} \left\{ \frac{A(L)}{n} \,\middle|\, \mathrm{OPT}(L) = n \right\}. \qquad (1)$$


Hence, for any input L, the number of bins used 
by an online algorithm A is compared to the 
optimal number of bins needed to pack the same 
input. Note that calculating the optimal num- 
ber of bins might take exponential time; more- 
over, it requires that the entire input is known in 
advance. 


Key Results 


This paper presents a new framework for analyz- 
ing online bin packing algorithms. It can be used 
to analyze all known versions of the well-known 
Harmonic algorithm, including a new version 
introduced in this paper. 

The Harmonic algorithm [4] partitions the
input into types depending on item size and packs
each type separately. Harmonic-k has k types,
and type i consists of items of size in the interval
(1/(i + 1), 1/i]. Harmonic has k open bins at
all times, one for each type, and packs i items of
type i in one bin. Thus it achieves an asymptotic
performance ratio of 1.691.
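The packing rule can be made concrete with a short sketch. It assumes the usual convention that the last type k collects all items of size at most 1/k and is packed greedily into one open bin at a time; the function name `harmonic_k` is hypothetical, and exact fractions are used in the example to avoid rounding issues at interval boundaries.

```python
from fractions import Fraction

def harmonic_k(items, k):
    """Return the number of bins Harmonic-k opens for the given item sequence."""
    bins = 0
    open_count = [0] * k     # open_count[i]: items in the open bin of type i, 1 <= i < k
    small_level = None       # fill level of the open bin for type k (None: no bin open)
    for size in items:
        i = min(k, int(1 // size))   # size in (1/(i+1), 1/i] gives type i
        if i < k:
            if open_count[i] == 0:
                bins += 1            # open a fresh bin for type i
            open_count[i] = (open_count[i] + 1) % i   # the bin closes after i items
        else:
            if small_level is None or small_level + size > 1:
                bins += 1            # current small-item bin is full: open a new one
                small_level = 0
            small_level += size
    return bins

# Four items just above 1/2 need one bin each; four items just above 1/3 share two bins.
items = [Fraction(51, 100)] * 4 + [Fraction(103, 300)] * 4
print(harmonic_k(items, 6))  # prints 6
```

Each item is placed using only its own size and the state of the k open bins, which is what makes the algorithm online.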

For some inputs, this algorithm wastes a lot
of space in some bins, for instance, if many
items of size 1/2 + ε arrive for some small
ε > 0. Several authors improved on the basic
Harmonic algorithm by combining some items 
of different types together into bins. Typically 
this is done by partitioning the intervals (1/2, 1] 
and (1/3,1/2] further, guaranteeing that items 
can be combined. As a simple example, by 
introducing intervals (1/2,0.6] and (1/3, 0.4], 
we can guarantee that items of these two 
new types can always be packed together in a 
single bin. Furthermore, the remaining intervals 
(0.6, 1] and (0.4,0.5] now give better area 
guarantees than before: bins with items of these 
types will be at least 0.6 and at least 0.8 full, 
respectively. 
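The claims in this example follow from short arithmetic. A worked check, where x, y, y_1, and y_2 denote item sizes (symbols introduced here only for illustration):

```latex
x \in (1/2,\, 0.6],\ y \in (1/3,\, 0.4] \;\Longrightarrow\; x + y \le 0.6 + 0.4 = 1, \\
x \in (0.6,\, 1] \;\Longrightarrow\; \text{one item per bin, fill level } > 0.6, \\
y_1, y_2 \in (0.4,\, 0.5] \;\Longrightarrow\; 0.8 < y_1 + y_2 \le 1 .
```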

Seiden builds on this idea and gives a
new algorithm, Harmonic++, which beats
all previously known algorithms and is
still the best algorithm known. He used a
computer-assisted search to set the many 
parameters of this algorithm. The algorithm 
partitions the intervals (1/2, 1] and (1/3, 1/2] 
in ten matching subintervals (in the sense 
described above) and also partitions intervals 
of several smaller types, using no less than 
70 intervals in total. Also using a computer 
search, he proves that the asymptotic per- 
formance ratio of this algorithm is at most 
1.58889. 

Seiden also showed that the asymptotic perfor- 
mance ratio of a similar algorithm presented ear- 
lier, Harmonic+1, is at least 1.5972, disproving a 
claim by Richey [6] that Harmonic+1 is 1.58872- 
competitive. 

The framework introduced by Seiden was used 
later in other contexts, for instance, for two- 
dimensional bin packing, where a set of rectan- 
gles needs to be packed into square bins. Han 
et al. [3] presented an algorithm with asymp- 
totic performance ratio 2.5545. For the special 
case of packing squares, Han et al. [2] presented 
an algorithm with asymptotic performance ratio 
2.1187. 


Curve Reconstruction 


Open Problems 


The algorithm is very close to optimal for 
Harmonic-type algorithms, for which Ramanan 
et al. [5] showed a lower bound of 1.58333.... 
However, the general lower bound for this 
problem is only 1.54037 [1]. It is very difficult 
to see how either result can be improved, 
and this remains a challenging open problem 
which will require new ideas. There has been 
essentially no improvement in this area in over a 
decade. 
Problem Definition


Given a set S of sample points from a
collection Γ of simple (nonintersecting) curves in
the Euclidean plane, curve reconstruction is the
problem of computing the graph G(S, Γ), called
the correct reconstruction, whose vertex set is S
and that has an edge between two vertices if and
only if the respective samples are adjacent on a
curve in Γ; see Fig. 1.

Obviously, it is not possible to correctly
reconstruct a given collection of curves from
an arbitrary sample set from it. Therefore,
some restriction on the sample set S, a
so-called sampling condition, is required which
specifies how dense a sampling has to be to
guarantee a correct output of an algorithm. The
difficulty for an algorithm to solve the curve
reconstruction problem (and to come up with
a suitable sampling condition) varies with the
classes of allowed curves in Γ and with whether the
set S is actually sampled from the curves or is
noisy.


Key Results


If the curves are closed, smooth, and uniformly
sampled, that is, with a uniform maximum
distance between adjacent sample points, several
methods for the curve reconstruction problem
are known to work, ranging over minimum
spanning trees, α-shapes, β-skeletons [KR85], and
r-regular shapes; see the survey by Edelsbrunner
[7]. The focus of this section is on approaches
that can deal with nonuniform sampling
conditions, that is, conditions which allow sparser
sampling in areas of low detail and require denser
sampling only in areas of high detail.


Closed Smooth Curves 

Amenta, Bern, and Eppstein [2] introduced the
concept of the local feature size lfs(p) of a point
p ∈ Γ, which is defined as the distance from p to the
medial axis of Γ. The medial axis of a collection
of curves Γ is defined as the set of points in the
plane which have more than one closest point on
a curve in Γ; see Fig. 2. Roughly speaking, a
neighborhood of a point of size equal to its local
feature size is intersected by the curves in a single
piece that turns through only a small angle.

The introduction of the local feature size
allowed for a very elegant sampling condition:
A sample set S is called an ε-sampling for a
collection of curves Γ if for every p ∈ Γ there
exists s ∈ S with |ps| ≤ ε lfs(p). This condition naturally
captures the intuition that “complicated” areas of
the curve require higher sampling density than
areas of low detail.
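To make the sampling condition concrete, the sketch below tests it on a finely discretized curve. It is only an approximation of the definition, since the condition is checked at finitely many curve points, and it assumes the local feature size is supplied as a function; the names `is_epsilon_sampling` and `circle` are hypothetical. For the unit circle the medial axis is the center, so lfs(p) = 1 at every point.

```python
import math

def is_epsilon_sampling(curve_points, lfs, samples, eps):
    """Check |ps| <= eps * lfs(p) for every given curve point p and its nearest sample s."""
    return all(
        min(math.dist(p, s) for s in samples) <= eps * lfs(p)
        for p in curve_points
    )

def circle(count):
    """Equally spaced points on the unit circle."""
    return [
        (math.cos(2 * math.pi * j / count), math.sin(2 * math.pi * j / count))
        for j in range(count)
    ]

curve = circle(3600)      # dense stand-in for the curve itself
samples = circle(12)      # twelve equally spaced sample points

def unit_lfs(p):
    return 1.0            # distance from any point of the unit circle to its center

# The farthest curve point lies 2*sin(pi/24), about 0.261, from its nearest sample.
print(is_epsilon_sampling(curve, unit_lfs, samples, 0.3))
print(is_epsilon_sampling(curve, unit_lfs, samples, 0.25))
```

With twelve samples the set therefore passes the check for ε = 0.3 but not for ε = 0.25.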


Curve Reconstruction, Fig. 1 A collection of curves Γ, a sample S from Γ, and the correct reconstruction G(S, Γ)

Curve Reconstruction, Fig. 2 The light curves are the
medial axis of the heavy curves (Courtesy of N. Amenta,
M. Bern, and D. Eppstein)


For small enough ε, the Voronoi vertices of the
Voronoi diagram VD(S) of an ε-sampling S for
Γ approximate the medial axis of Γ. Based on
that intuition, the CRUST algorithm in [1] outputs
as the reconstruction those edges of the
Delaunay triangulation of S that have a circumscribing
ball empty of Voronoi vertices of VD(S).
For ε < 0.252, CRUST provably outputs the
correct reconstruction of an ε-sampling S with
respect to a collection of closed smooth curves
Γ.

In the same paper, the authors also showed
that a known algorithm, the β-skeleton, for a
suitable choice of β also correctly reconstructs a
collection of closed smooth curves for ε < 0.297.

Later, Dey and Kumar [4] presented an
extremely simple and straightforward algorithm
that essentially connects nearest neighbors on
opposite sides. They could prove this algorithm to
be correct under the local feature size sampling
condition for ε < 1/3. What is particularly
interesting about this algorithm is the fact that
decisions about which points to connect are made
based on a very local neighborhood of the
respective points. This idea later translated nicely
to algorithms for the 3- and higher-dimensional
manifold reconstruction problem.


Open Smooth Curves 

When considering the larger class of open and
closed smooth curves, there is a small caveat.
While one can guarantee for sufficiently dense
samplings, i.e., small enough values of ε, that
all edges of the correct reconstruction are present
in the output of a reconstruction algorithm, one
cannot always avoid the inclusion of additional
edges in the output. Essentially,
the problem is that a sample set S might
be an ε-sampling for two collections of curves
Γ and Γ′ with different correct reconstructions
G(S, Γ) and G(S, Γ′), no matter how small
ε is chosen.

Dey, Mehlhorn, and Ramos [5] introduced
the concept of a witness curve Γ*, proving the
following guarantee for their CONSERVATIVE
CRUST algorithm: If S is an ε-sampling for a
collection of open and closed smooth curves Γ,
their algorithm returns a reconstruction H such
that H ⊇ G(S, Γ), that is, H contains all edges
of the correct reconstruction. Furthermore, the
algorithm outputs a curve Γ* such that S is an
ε′ » ε-sampling for Γ* and H = G(S, Γ*).
Their algorithm is similar to the CRUST
algorithm in that it identifies a subcomplex of the
Delaunay triangulation.


Closed Curves with Corners 

Another natural extension of the allowed classes
of curves in Γ is the inclusion of curves with
corners, that is, points where the left and right
tangents do not coincide. Unfortunately, in this case, one
cannot use the sampling condition based on the local
feature size since the medial axis actually touches
the corners, requiring an infinitely dense
sampling near corners. The first algorithms to deal
with a single closed curve possibly with corners
were by Giesen [9] and Althaus/Mehlhorn [2],
based on the construction of a travelling salesman
tour. In their sampling condition, areas of the
curve near a corner were exempt from the
ε-sampling condition. The authors of [2] could even prove that
the respective TSP instance can be solved in
polynomial time for sufficiently dense sample
sets. Dey and Wenger [6] proposed a
non-TSP-based approach for collections of closed curves
with corners.


Open and Closed Curves with Corners 

Finally, Funke and Ramos [8] considered the case of
collections of open and closed curves with corners. While
their algorithm also comes with a guarantee for
some variant of an ε-sampling condition (with a
special condition near corners, as in [9] and [2]),
they also propose a sampling condition which is
expressed with respect to the correct reconstruction
G(S, Γ). Not being based on a travelling
salesman tour computation, their algorithm can
also handle collections containing several open
curves. As in [5], the algorithm also produces a
collection of witness curves Γ*.


Noisy Sample Sets 

A generalization in a different direction is the
consideration of sample sets S which do not consist
of points exactly on the curves in Γ but, e.g.,
due to measurement errors, lie only “nearby.”
In [3] the authors considered such noisy sample
sets from a collection of disjoint smooth closed
curves and could prove for a perturbed locally
uniform sample set that their algorithm computes
as output a set of polygonal curves converging to
Γ as the sampling density increases.

Problem Definition


The problem is motivated by the need to manage 
data on a set of storage devices to handle dynami- 
cally changing demand. To maximize utilization, 
the data layout (i.e., a mapping that specifies the 
subset of data items stored on each disk) needs to 
be computed based on disk capacities as well as 
the demand for data. Over time as the demand for 
data changes, the system needs to create a new data
layout. The data migration problem is to compute
an efficient schedule for the set of disks to convert 
an initial layout to a target layout. 

The problem is defined as follows. Suppose
that there are N disks and Δ data items, and
an initial layout and a target layout are given
(see Fig. 1a for an example). For each item i,
the source disks S_i are defined to be the subset of disks
which have item i in the initial layout. The destination
disks D_i are the subset of disks that want to receive
item i. In other words, disks in D_i have to store
item i in the target layout but do not have to store
it in the initial layout. Figure 1b shows the corresponding
S_i and D_i. It is assumed that S_i ≠ ∅
and D_i ≠ ∅ for each item i. Data migration is the
transfer of data so that all disks in D_i receive data item i,
which initially resides on the disks in S_i, and the goal is to minimize
the total amount of time required for the transfers.

Assume that the underlying network is fully 
connected and the data items are all the same 
size. In other words, it takes the same amount of 
time to migrate an item from one disk to another. 
Therefore, migrations are performed in rounds. 
Consider the half-duplex model, where in each round
each disk can participate in the transfer of only one item,
either as a sender or as a receiver. The objective is
to find a migration schedule using the minimum
number of rounds. No bypass nodes can be used, and therefore
all data items are sent only to disks that desire
them. (A bypass node is a node that is not the target
of a move operation but is used as an intermediate
holding point for a data item.)


Key Results 
Khuller et al. [11] developed a 9.5-approximation
for the data migration problem, which was later
improved to 6.5 + o(1). In the next subsection,
the lower bounds of the problem are first
examined.

[Data Migration, Fig. 1: (a) an example of an initial and a target layout on disks 1 to 3; (b) the corresponding S_i's and D_i's]


Notations and Lower Bounds 

1. Maximum in-degree (β): Let β_j be the
number of data items that a disk j has to
receive. In other words, β_j = |{i | j ∈ D_i}|.
Then β = max_j β_j is a lower bound on the
optimal solution, as a disk can receive only one data
item in one round.

2. Maximum number of items that a disk may
be a source or destination for (α): For each
item i, at least one disk in S_i should be used
as a source for the item, and this disk is
called a primary source. A unique primary
source s_i ∈ S_i for each item i that minimizes
α = max_{j=1,...,N} (|{i | j = s_i}| + β_j) can be
found using a network flow. Note that α ≥ β,
and α is also a lower bound on the optimal
solution.

3. Minimum time required for cloning (M):
Let a disk j make a copy of item i at the kth
round. At the end of the mth round, the number
of copies that can be created from this copy
is at most 2^{m-k}, as in each round the number
of copies can only be doubled. Also note that
each disk can make a copy of only one item
in one round. Since at least |D_i| copies of
item i need to be created, the minimum m that
satisfies the following linear program gives
a lower bound on the optimal solution: L(m):

\sum_{j} \sum_{k=1}^{m} 2^{m-k} x_{ijk} \ge |D_i|   for all i      (1)

\sum_{i} x_{ijk} \le 1   for all j, k      (2)

0 \le x_{ijk} \le 1      (3)
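
As a rough illustration of the first two bounds, the following Python sketch computes β directly from the destination sets and finds α by brute force over all choices of primary sources. The brute-force search is only a stand-in for the network-flow computation mentioned above and is feasible only for tiny instances. The function name `lower_bounds` and the toy layout are assumptions made for this example, not part of the cited work.

```python
from itertools import product


def lower_bounds(sources, dests):
    """Return (beta, alpha) for item -> set-of-disks maps S_i and D_i.

    beta  = max over disks j of the number of items j must receive.
    alpha = min over primary-source choices of
            max_j (items for which j is primary source + beta_j).
    Brute force over all choices: illustrative only (exponential time).
    """
    disks = set().union(*sources.values(), *dests.values())
    beta_j = {j: sum(1 for i in dests if j in dests[i]) for j in disks}
    beta = max(beta_j.values())

    items = sorted(sources)
    alpha = None
    # Try every way of picking one primary source s_i from each S_i.
    for choice in product(*(sorted(sources[i]) for i in items)):
        load = dict(beta_j)
        for s in choice:
            load[s] += 1
        worst = max(load.values())
        alpha = worst if alpha is None else min(alpha, worst)
    return beta, alpha


# Hypothetical toy instance with three disks and three items.
sources = {1: {2, 3}, 2: {1, 2}, 3: {1}}
dests = {1: {1}, 2: {3}, 3: {3}}
beta, alpha = lower_bounds(sources, dests)
```

In this toy instance disk 3 must receive two items, so β = 2, and choosing disk 2 as the primary source for items 1 and 2 keeps every disk's load at 2, so α = 2 as well.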


Data Migration Algorithm 

A 9.5-approximation can be obtained as follows. 
The algorithm first computes representative sets 
for each item and sends the item to the represen- 
tative sets first, which in turn send the item to the 
remaining set. Representative sets are computed 
differently depending on the size of D_i.


Representatives for Big Sets 

For sets with size at least β, a disjoint collection
of representative sets R_i, i = 1, ..., Δ, has to
satisfy the following properties: Each R_i should
be a subset of D_i and |R_i| = ⌊|D_i|/β⌋. The
representative sets can be found using a network
flow.


Representatives for Small Sets 

For each item i, let γ_i = |D_i| mod β. A secondary
representative r_i in D_i for the items with
γ_i ≠ 0 needs to be computed. A disk j can be
a secondary representative r_i for several items as
long as Σ_{i ∈ I_j} γ_i ≤ 2β − 1, where I_j is the set of
items for which j is a secondary representative.
This can be done by applying the Shmoys-Tardos
algorithm [17] for the generalized assignment
problem.


Scheduling Migrations 
Given representatives for all data items, migra- 
tions can be done in three steps as follows: 


1. Migration to R_i: Each item i is first sent to
the set R_i. By converting a fractional solution
given in L(M), one can find a migration
schedule from s_i to R_i, and it requires at most
2M + α rounds.

2. Migration to r_i: Item i is sent from the primary
source s_i to r_i. The migrations can be done
in 1.5α rounds, using an algorithm for edge
coloring [16].


3. Migration to the remaining disks: A transfer
graph from representatives to the remaining
disks can now be created as follows. For each
item i, add directed edges from disks in R_i to
(β − 1)⌊|D_i|/β⌋ disks in D_i \ R_i such that the
out-degree of each node in R_i is at most β − 1
and the in-degree of each node in D_i \ R_i
from R_i is 1. A directed edge is also added
from the secondary representative r_i of item i
to the remaining disks in D_i which do not have
an edge coming from R_i. It has been shown
that the maximum degree of the transfer graph
is at most 4β − 5 and the multiplicity is β + 2.
Therefore, migration for the transfer graph can
be done in 5β − 3 rounds using an algorithm
for multigraph edge coloring [18].


Analysis 

Note that the total number of rounds required
in the algorithm described in “Data Migration
Algorithm” is at most 2M + 2.5α + 5β − 3. As
α, β, and M are lower bounds on the optimal
number of rounds, the abovementioned algorithm
gives a 9.5-approximation.
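
The arithmetic behind this factor can be spelled out as follows. Here OPT denotes the optimal number of rounds (a symbol introduced only for this illustration), and the three terms are the round counts of the three scheduling steps.

```latex
\begin{align*}
\text{rounds}
  &\le \underbrace{(2M + \alpha)}_{\text{migration to } R_i}
     + \underbrace{1.5\,\alpha}_{\text{migration to } r_i}
     + \underbrace{(5\beta - 3)}_{\text{remaining disks}} \\
  &= 2M + 2.5\,\alpha + 5\beta - 3 \\
  &\le (2 + 2.5 + 5)\,\mathrm{OPT} - 3
   \;\le\; 9.5\,\mathrm{OPT},
\end{align*}
```

where the last line uses M ≤ OPT, α ≤ OPT, and β ≤ OPT.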


Theorem 1 ([11]) There is a 9.5-approximation
algorithm for the data migration problem. 


Khuller et al. [10] later improved the algorithm 
and obtained a (6.5 + o(1))-approximation. 


Theorem 2 ([10]) There is a (6.5 + o(1))-approximation
algorithm for the data migration
problem.


Applications 


Data Migration in Storage Systems 

Typically, a large storage server consists of sev- 
eral disks connected using a dedicated network, 
called a storage area network. To handle high de- 
mand, especially for multimedia data, a common 
approach is to replicate data objects within the 
storage system. Disks typically have constraints 
on storage as well as the number of clients that 
can access data from a single disk simultaneously. 
Approximation algorithms have been developed
to map known demand for data to a specific
data layout pattern to maximize utilization
[4, 8, 14, 15], where the utilization is the total
number of clients that can be assigned to a disk
that contains the data they want. These algorithms compute
not only how many copies of each item need to 
be created, but also a layout pattern that specifies 
the precise subset of items on each disk. The 
problem is NP-hard, but there are polynomial- 
time approximation schemes [4, 8, 14]. Given the 
relative demand for data, the algorithm computes 
an almost optimal layout. 

Over time as the demand for data changes, 
the system needs to create new data layouts. To 
handle high demand for popular objects, new 
copies may have to be dynamically created and 
stored on different disks. The data migration 
problem is to compute a specific schedule for the 
set of disks to convert an initial layout to a target 
layout. Migration should be done as quickly as 
possible since the performance of the system will 
be suboptimal during migration. 


Gossiping and Broadcasting 

The data migration problem can be considered as 
a generalization of gossiping and broadcasting. 
The problems of gossiping and broadcasting play 
an important role in the design of communication 
protocols in various kinds of networks and have 
been extensively studied (see for example [6, 7] 
and the references therein). The gossip problem 
is defined as follows. There are n individuals, and
each individual has an item of gossip that he or she
wishes to communicate to everyone else. Communication
is typically done in rounds, where in
each round an individual may communicate with 
at most one other individual. Some communication
models allow for the full exchange of all
items of gossip known to each individual in a single
round. In addition, there may be a communication
graph whose edges indicate which pairs of
individuals are allowed to communicate directly
in each round. In the broadcast problem, one
individual needs to convey an item of gossip to
every other individual. The data migration problem
generalizes gossiping and broadcasting
in three ways: (1) each item of gossip needs to
be communicated to only a subset of individuals;
(2) several items of gossip may be known to an
individual; (3) a single item of gossip can initially
be shared by several individuals.
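
To make the round structure concrete, here is a minimal Python sketch of the plain broadcast problem. It assumes that every pair of individuals may communicate and that each individual takes part in at most one call per round, so the number of informed individuals can at most double in each round. The function name `broadcast_rounds` is hypothetical.

```python
def broadcast_rounds(n):
    """Rounds needed for one informed individual to inform all n.

    Assumes a complete communication graph and at most one call per
    individual per round, so the informed set at most doubles.
    """
    informed, rounds = 1, 0
    while informed < n:
        # Every informed individual calls one uninformed individual.
        informed = min(n, 2 * informed)
        rounds += 1
    return rounds


rounds_for_five = broadcast_rounds(5)  # 1 -> 2 -> 4 -> 5 informed
```

The same doubling argument underlies the cloning lower bound M for data migration.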


Open Problems 


The data migration problem is NP-hard by re- 
duction from the edge coloring problem. How- 
ever, no inapproximability results are known for 
the problem. As the current best approxima- 
tion factor is relatively high (6.5 + o(1)), it is 
an interesting open problem to narrow the gap 
between the approximation guarantee and the 
inapproximability. 

Another open problem is to combine data 
placement and migration problems. This question 
was studied by Khuller et al. [9]. Given the initial 
layout and the new demand pattern, their goal 
was to find a set of data migrations that can be 
performed in a specific number of rounds and 
gives the best possible layout to the current de- 
mand pattern. They showed that even one-round 
migration is NP-hard and presented a heuris- 
tic algorithm for the one-round migration prob- 
lem. The experiments showed that performing 
a few rounds of one-round migration consecu- 
tively works well in practice. Obtaining nontrivial 
approximation algorithms for this problem would 
be interesting future work. 

Data migration in a heterogeneous storage 
system is another interesting direction for future 
research. Most research on data migration has 
focused mainly on homogeneous storage sys- 
tems, assuming that disks have the same fixed 
capabilities and the network connections are of 
the same fixed bandwidth. In practice, however, 
large-scale storage systems may be heterogeneous.
For instance, disks tend to have heterogeneous 
capabilities as they are added over time ow- 
ing to increasing demand for storage capacity. 
Lu et al. [13] studied the case when disks have 
variable bandwidth owing to the loads on dif- 
ferent disks. They used a control-theoretic ap- 
proach to generate adaptive rates of data mi- 
grations which minimize the degradation of the 
quality of the service. The algorithm reduces 
the latency experienced by clients significantly 
compared with the previous schemes. However, 
no theoretical bounds on the efficiency of data 
migrations were provided. Coffman et al. [2]
studied the case when each disk i can handle p_i
transfers simultaneously and provided approximation
algorithms. Some papers [2, 12] considered
the case when the lengths of data items are
heterogeneous (but the system is homogeneous)
and presented approximation algorithms for the
problem.


Experimental Results 


Golubchik et al. [3] conducted an extensive study 
of the performance of data migration algorithms 
under different changes in user-access patterns. 
They compared the 9.5-approximation [11] and 
several other heuristic algorithms. Some of these 
heuristic algorithms cannot provide constant 
approximation guarantees, while for some of 
the algorithms no approximation guarantees are 
known. Although the worst-case performance of 
the algorithm by Khuller et al. [11] is 9.5, in the 
experiments the number of rounds required was 
less than 3.25 times the lower bound. 

They also introduced the correspondence 
problem, in which a matching between disks in 
the initial layout with disks in the target layout is 
computed so as to minimize changes. A good 
solution to the correspondence problem can 
improve the performance of the data migration 
algorithms by a factor of 4.4 in their experiments, 
relative to a bad solution. 


URL to Code 
http://www.cs.umd.edu/projects/smart/data-migration/

They assumed that a data transfer graph is given,
in which a node corresponds to each disk and 
a directed edge corresponds to each move oper- 
ation that is specified (the creation of new copies 
of data items is not allowed). Computing a data 
movement schedule is exactly the problem of 
edge-coloring the transfer graph. Algorithms for 
edge-coloring multigraphs can now be applied to 
produce a migration schedule since each color 
class represents a matching in the graph that can 
be scheduled simultaneously. Computing a solu- 
tion with the minimum number of rounds is NP- 
hard, but several good approximation algorithms 
are available for edge coloring. With space con- 
straints on the disk, the problem becomes more 
challenging. Hall et al. [5] showed that with the 
assumption that each disk has one spare unit 
of storage, very good constant factor approximations
can be developed. The algorithms use
at most 4⌈Δ/4⌉ colors with at most n/3 bypass
nodes, or at most 6⌈Δ/4⌉ colors without bypass
nodes.
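
The correspondence between rounds and color classes can be sketched in Python as follows. This is a simple greedy edge coloring, not one of the approximation algorithms cited above: each transfer gets the earliest round in which both of its endpoints are still free. A greedy coloring of this kind never needs more than 2d − 1 rounds when every disk takes part in at most d transfers. The function name `greedy_schedule` and the sample transfer list are assumptions for illustration, and space constraints are ignored.

```python
def greedy_schedule(transfers):
    """Assign a round to each (sender, receiver) transfer.

    Each disk takes part in at most one transfer per round, so the
    transfers of one round form a matching in the transfer graph.
    """
    busy = {}      # disk -> set of rounds in which the disk is used
    rounds = []
    for sender, receiver in transfers:
        r = 0
        while r in busy.setdefault(sender, set()) or \
                r in busy.setdefault(receiver, set()):
            r += 1
        busy[sender].add(r)
        busy[receiver].add(r)
        rounds.append(r)
    return rounds


# Hypothetical transfer graph on five disks.
transfers = [(1, 2), (1, 3), (2, 3), (4, 5)]
schedule = greedy_schedule(transfers)
```

Here the triangle on disks 1, 2, and 3 forces three rounds, while the transfer from disk 4 to disk 5 shares the first round.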

Most of the results on the data migration prob- 
lem deal with the half-duplex model. Another in- 
teresting communication model is the full-duplex 
model where each disk can act as a sender and 
a receiver in each round for a single item. There
is a (4 + o(1))-approximation algorithm for the
full-duplex model [10].

Problem Definition


The NP-complete DOMINATING SET problem is 
a notoriously hard problem: 


Problem 1 (Dominating Set) 

INPUT: An undirected graph G = (V, E) and an
integer k ≥ 0.

QUESTION: Is there an S ⊆ V with |S| ≤ k such
that every vertex v ∈ V is contained in S or has at
least one neighbor in S?


For instance, for an n-vertex graph its optimization
version is known to be polynomial-time
approximable only up to a factor of O(log n)
unless some standard complexity-theoretic
assumptions fail [9]. In terms of parameterized
complexity, the problem is shown to be W[2]-complete [8].
Although still NP-complete when
restricted to planar graphs, the situation improves
considerably here. In her seminal work, Baker
showed that there is an efficient polynomial- 
time approximation scheme (PTAS) [6], and 
the problem also becomes fixed-parameter 
tractable [2, 4] when restricted to planar graphs. 
In particular, the problem becomes accessible 
to fairly effective data reduction rules and 
a kernelization result (see [16] for a general 
description of data reduction and kernelization) 
can be proven. This is the subject of this entry. 


Key Results 


The key idea behind the data reduction is preprocessing
based on locally acting simplification
rules. As an example, we describe a rule where
the local neighborhood of each graph vertex is
considered. To this end, we need the following
definitions.

We partition the neighborhood N(v) of an arbitrary
vertex v ∈ V in the input graph into three
disjoint sets N_1(v), N_2(v), and N_3(v) depending
on local neighborhood structure. More specifically,
we define


• N_1(v) to contain all neighbors of v that have
edges to vertices that are not neighbors of v;

• N_2(v) to contain all vertices from N(v) \ N_1(v)
that have edges to at least one vertex
from N_1(v);

• N_3(v) to contain all neighbors of v that are
neither in N_1(v) nor in N_2(v).


An example which illustrates such a partitioning
is given in Fig. 1 (left-hand side). A helpful
and intuitive interpretation of the partition is to
see vertices in N_1(v) as exits because they have
direct connections to the world outside the closed
neighborhood of v, vertices in N_2(v) as guards
because they have direct connections to exits, and
vertices in N_3(v) as prisoners because they do not
see the world outside {v} ∪ N(v).

Now consider a vertex w ∈ N_3(v). Such a vertex
only has neighbors in {v} ∪ N_2(v) ∪ N_3(v).
Hence, to dominate w, at least one vertex
of {v} ∪ N_2(v) ∪ N_3(v) must be contained in
a dominating set for the input graph. Since v can
dominate all vertices that would be dominated
by choosing a vertex from N_2(v) ∪ N_3(v) into
the dominating set, we obtain the following data
reduction rule.


If N_3(v) ≠ ∅ for some vertex v, then remove
N_2(v) and N_3(v) from G and add a new vertex v′
with the edge {v, v′} to G.


Note that the new vertex v′ can be considered as
a “gadget vertex” that forces v to be chosen
into the dominating set. It is not hard to verify the
correctness of this rule, that is, the original graph 
has a dominating set of size k iff the reduced 
graph has a dominating set of size k. Clearly, 
the data reduction can be executed in polynomial 
time [5]. Note, however, that there are particular 
“diamond” structures that are not amenable to 
this reduction rule. Hence, a second, somewhat 
more complicated rule based on considering the 
joint neighborhood of two vertices has been in- 
troduced [5]. 
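
The single-vertex rule can be sketched in Python as follows. The graph is represented as a dictionary mapping each vertex to the set of its neighbors. This representation and the function names `partition_neighborhood` and `apply_rule` are assumptions made for this example. "Outside" is read as outside the closed neighborhood {v} ∪ N(v), in line with the exits/guards/prisoners interpretation above.

```python
def partition_neighborhood(adj, v):
    """Split N(v) into exits N1, guards N2 and prisoners N3."""
    nv = adj[v]
    closed = nv | {v}
    n1 = {u for u in nv if adj[u] - closed}        # see outside N[v]
    n2 = {u for u in nv - n1 if adj[u] & n1}       # adjacent to an exit
    n3 = nv - n1 - n2                              # everything else
    return n1, n2, n3


def apply_rule(adj, v, new_vertex):
    """Apply the rule at v in place; return True if the graph changed."""
    n1, n2, n3 = partition_neighborhood(adj, v)
    if not n3:
        return False
    for u in n2 | n3:                  # delete guards and prisoners
        for w in adj.pop(u):
            if w in adj:
                adj[w].discard(u)
    adj[new_vertex] = {v}              # gadget vertex attached to v
    adj[v].add(new_vertex)
    return True


# Hypothetical example: vertex 1 is an exit (it sees 4), vertex 2 is
# a guard (adjacent to 1), and vertex 3 is a prisoner.
adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
parts = partition_neighborhood(adj, 0)
changed = apply_rule(adj, 0, "v'")
```

After the call, vertices 2 and 3 are gone and the pendant vertex "v'" is attached to vertex 0. A full kernelization would apply this rule and the two-vertex rule exhaustively, which this sketch does not attempt.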


[Figure 1 legend: N_1(v), N_2(v), N_3(v)]


Data Reduction for Domination in Graphs, Fig. 1 The
left-hand side shows the partitioning of the neighborhood 
of a single vertex v. The right-hand side shows the 


Altogether, the following core result could be 
shown [5]. 


Theorem 1 A planar graph G = (V, E) can
be reduced in polynomial time to a planar
graph G′ = (V′, E′) such that G has a dominating
set of size k iff G′ has a dominating set of
size k and |V′| = O(k).


In other words, the theorem states that DOMINATING
SET in planar graphs has a linear-size
problem kernel. The upper bound on |V′| was
first shown to be 335k [5] and was then further
improved to 67k [7]. Moreover, the results can be
extended to graphs of bounded genus [10]. In addition,
similar results (linear kernelization) have
been recently obtained for the FULL-DEGREE
SPANNING TREE problem in planar graphs [13].
Very recently, these results have been generalized
into a methodological framework [12].


Applications 


DOMINATING SET is considered to be one of 
the most central graph problems [14, 15]. Its 
applications range from facility location to bioin- 
formatics. 


Open Problems 


The best lower bound for the size of a problem 
kernel for DOMINATING SET in planar graphs 
is 2k [7]. Thus, there is quite a gap between 
known upper and lower bounds. In addition, 
there have been some considerations concerning 
a generalization of the above-discussed data
reduction rules [3]. To what extent such
extensions are of practical use remains to be
explored. Finally, a study of deeper connections
between Baker’s PTAS results [6] and linear
kernelization results for DOMINATING SET in
planar graphs seems to be worthwhile for future
research. Links concerning the class of problems
amenable to both approaches have been detected
recently [12]. The research field of data reduction
and problem kernelization as a whole together
with its challenges is discussed in a recent
survey [11].

[Fig. 1 caption, continued] result of applying the presented data reduction rule to this
particular (sub)graph


Experimental Results 


The above-described theoretical work has been 
accompanied by experimental investigations on 
synthetic as well as real-world data [1]. The re- 
sults have been encouraging in general. However, 
note that grid structures seem to be a hard case 
where the data reduction rules remained largely 
ineffective. 

Problem Definition


The problem is concerned with the following 
setting. A computationally limited client wants 
to compute some property of a massive input, 
but lacks the resources to store even a small 
fraction of the input, and hence cannot perform 
the desired computation locally. The client there- 
fore accesses a powerful but untrusted service 
provider (e.g., a commercial cloud computing 
service), who not only performs the requested 
computation but also proves that the answer is 
correct. An array of closely related models have 
been introduced to capture this scenario. The 
following section provides a unified presentation 
of these models, emphasizing their common fea- 
tures before delineating their differences. 


Streaming Verification Model 

Let σ = (a_1, a_2, ..., a_m) be a data stream, where
each a_i comes from a data universe U of size n,
and let F be a function mapping data streams to
a finite range R. A stream verification protocol
for F involves two parties: a prover P and a 
(randomized) verifier V. The protocol consists of 
two stages: a stream observation stage and a proof 
verification stage. 

In the stream observation stage, V processes
the stream σ, subject to the standard constraints
of the data-stream model, i.e., sequential access
to σ and limited memory. In the proof verification
stage, V and P exchange a sequence of one or
more messages, and afterward V outputs a value
b. V is allowed to output a special symbol ⊥
indicating a rejection of P’s claims. Formally, V
constitutes a stream verification protocol if the
following two properties are satisfied:

• Completeness: There is some prover strategy
P such that, for all streams σ, the probability
that V outputs F(σ) after interacting with P is
at least 2/3.

• Soundness: For all streams σ and all prover
strategies P, the probability that V outputs a
value not in {F(σ), ⊥} after interacting with
P is at most ε ≤ 1/3.


Here, the probabilities are taken over V’s internal
randomness. The constants 2/3 and 1/3 are
not essential and are chosen by convention. The
parameter ε is referred to as the soundness error
of the protocol.


Costs 

There are five primary costs in any stream ver- 
ification protocol: (1) V’s space usage, (2) the 
total communication cost, (3) V’s runtime, (4) 
P’s runtime, and (5) the number of messages 
exchanged. 


Differences Between Models 

There are three primary differences between 
the various models of stream verification that 
have been put forth in the literature. The first 
is whether the soundness condition is required 
to hold against all cheating provers (such 
protocols are called information-theoretically 
or statistically sound), or only against cheating 
provers that run in polynomial time (such 
protocols are called computationally sound). 
The second is the amount and format of the 
interaction allowed between P and V. The third 
is the temporal relationship between the stream
observation and proof verification stages. In
particular, several models permit P and V to
exchange messages before and during the stream
observation stage and sometimes permit the
prover’s messages to depend on parts of the
data stream that V has not yet seen. In general,
more permissive models allow a larger class of 
problems to be solved efficiently, but may yield 
protocols that are less realistic. 


Summary of Models 
The annotated data streaming (ADS) model [3]
is noninteractive: P is permitted to send a single
message to V, with no communication allowed
in the reverse direction. Technically, this model
permits the contents of P’s message to be inter- 
leaved with the stream, in which case each bit 
of P’s message may be viewed as an “annota- 
tion” associated with a particular stream update. 
However, for most ADS protocols that have been 
developed, P’s message can be sent after the 
stream observation phase. There are two kinds of 
ADS protocols: prescient protocols, in which the 
annotation sent at any given time can depend on 
parts of the data stream that V has not yet seen, 
and online protocols, which disallow this kind of 
dependence. 

Streaming interactive proofs (SIPs) extend the 
ADS model to allow the prover and verifier to 
exchange many messages [6]. The Arthur—Merlin 
streaming model [10] is equivalent to a restricted 
class of SIPs, in which V is only allowed to
send a single message to P (which must con- 
sist entirely of random coin tosses, in analogy 
with the classical complexity class AM), before 
receiving P’s reply. The streaming delegation 
model [5] corresponds to SIPs that only satisfy 
computational, rather than information-theoretic, 
soundness. 


Key Results 


Obtaining exact answers even for basic problems
in the standard data streaming model is impossible
using o(n) space. In contrast, stream verification
protocols with o(m) space and communication
costs have been developed for (exactly
solving) a wide variety of problems. Many of
these protocols have adapted powerful algebraic 
techniques originally developed in the classical 
literature on interactive proofs, particularly the 
sum-check protocol of Lund et al. [14]. All of the 
protocols described here apply even to streams in 
the strict turnstile update model, where universe 
items can be deleted as well as inserted. 


Annotated Data Streams 

Chakrabarti et al. [3] showed that prescient ADS
protocols can be exponentially more powerful
than online ones for some problems. For example,
there is a prescient ADS protocol with logarithmic
space and communication costs for computing
the median of a sequence of numbers:
P sends V the claimed median t at the start
of the stream, and while observing the stream,
V checks that |{j : a_j < t}| ≤ m/2 and
|{j : a_j > t}| ≤ m/2, which can be done
using an O(log m)-bit counter. Meanwhile, [3]
proved that any online protocol for MEDIAN with
communication cost h and space cost v requires
h · v = Ω(n) and gave an online ADS protocol
achieving these communication-space trade-offs
up to logarithmic factors.
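
The verifier's side of the prescient median protocol can be sketched in a few lines of Python. The sketch assumes that the two conditions are the non-strict inequalities |{j : a_j < t}| ≤ m/2 and |{j : a_j > t}| ≤ m/2, and it uses `None` to stand for the rejection symbol. It performs only these two counting checks and nothing more. The function name `verify_median` is hypothetical.

```python
def verify_median(stream, claimed):
    """One-pass check of a claimed median received before the stream.

    Keeps only three counters, each of O(log m) bits. Returns the
    claimed value if both counting conditions hold, else None.
    """
    below = above = m = 0
    for a in stream:
        m += 1
        if a < claimed:
            below += 1
        elif a > claimed:
            above += 1
    # Compare 2 * count with m to avoid fractional arithmetic.
    if 2 * below <= m and 2 * above <= m:
        return claimed
    return None


accepted = verify_median([5, 1, 3, 2, 4], 3)   # true median
rejected = verify_median([5, 1, 3, 2, 4], 4)   # three items lie below 4
```

The prover needs to know the whole stream in advance to announce t at the start, which is exactly what makes the protocol prescient rather than online.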

Chakrabarti et al. [3] also gave online
ADS protocols achieving identical trade-offs
between space and communication costs for
problems including FREQUENCY MOMENTS
and FREQUENT ITEMS and used a lower
bound due to Klauck [11] on the Merlin-Arthur
communication complexity of the SET-DISJOINTNESS
function to show that these
trade-offs are optimal for these problems even
among prescient protocols. Subsequent work
gave similarly optimal online ADS protocols 
for several more problems, including maximum 
matching and counting triangles in graphs and 
matrix-vector multiplication [8, 18]. Chakrabarti 
et al. [2] gave optimized protocols for streams 
whose length m is much smaller than the universe 
size n. 


Streaming Interactive Proofs 

Cormode, Thaler, and Yi [6] showed that several 
general protocols from the classical literature 
on interactive proofs can be simulated in the 
SIP model. In particular, this includes a power- 
ful, general-purpose protocol due to Goldwasser, 
Kalai, and Rothblum [9] (henceforth, the GKR 
protocol). Given any problem computed by an 
arithmetic or Boolean circuit of polynomial size 
and polylogarithmic depth, the GKR protocol
requires only polylogarithmic space and communication
while using polylogarithmic rounds
of verifier-prover interaction. This yields SIPs
for exactly solving many basic streaming problems
with polylogarithmic space and communication
costs, including FREQUENCY MOMENTS,
FREQUENT ITEMS, and GRAPH CONNECTIVITY.
Optimized protocols for specific problems, including
FREQUENCY MOMENTS (see the detailed
example below), were also presented.
Chakrabarti et al. [4] give constant-round SIPs 
with logarithmic space and communication costs 
for many problems, including INDEX, RANGE- 
COUNTING, and NEAREST-NEIGHBOR SEARCH. 
Gur and Raz [10] gave an Arthur-Merlin streaming
protocol for the DISTINCT ELEMENTS problem
with communication cost O(h) and space cost
O(v) for any h, v satisfying h · v ≥ n. Klauck
and Prakash [13] extended this protocol to give an
SIP for DISTINCT ELEMENTS with polylogarithmic
space and communication costs and logarithmically
many rounds of prover-verifier interaction.


Computationally Sound Protocols 
Computationally sound protocols may achieve 
properties that are unattainable in the information- 
theoretic setting. For example, they typically 
achieve reusability, allowing the verifier to use 
the same randomness to answer many queries. 
In contrast, most SIPs only support “one-shot” 
queries, because they require the verifier to reveal 
secret randomness to the prover over the course of 
the protocol. Chung et al. [5] combined the GKR 
protocol with fully homomorphic encryption 
(FHE) to give reusable two-message protocols 
with polylogarithmic space and communication 
costs for any problem in the complexity class NC. 
They also gave reusable four-message protocols 
with polylogarithmic space and communication
costs for any problem in the complexity class P. 
Papamanthou et al. [15] gave improved protocols 
for a class of low-complexity queries including 
point queries and range search: these protocols 
avoid the use of FHE and allow the prover to 
answer such queries in polylogarithmic time. (In 
contrast, protocols based on the GKR protocol 
[5,6] require the prover to spend time quasilinear 
in the size of the data stream after receiving a 
query, even if the answer itself can be computed 
in sublinear time.) 


Implementations 

Implementations of the GKR protocol were 
provided in [7, 17]. Cormode, Mitzenmacher, 
and Thaler [7] also provided optimized
implementations of several ADS protocols from
[3,8]. Thaler et al. [19] provided parallelized
implementations using Graphics Processing Units.


Detailed Example 

The sum-check protocol can be directly applied
to give an SIP for the kth frequency moment
problem with $\log n$ rounds of prover-verifier
interaction and $O(\log^2 n)$ space and communication
costs. The sum-check protocol is described in
Fig. 1.


Properties and Costs of the Sum-Check Protocol

The sum-check protocol satisfies perfect completeness
and has soundness error $\varepsilon \le \deg(g)/|F|$,
where deg(g) denotes the total degree of
g [14]. There is one round of prover-verifier
interaction in the sum-check protocol for each of
the v variables of g, and the total communication
is O(deg(g)) field elements.

Note that as described in Fig. 1, the sum-check
protocol assumes that the verifier has oracle access
to g. However, this will not be the case in
applications, as g will ultimately be a polynomial
that depends on the input data stream.
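
The rounds of the sum-check protocol can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [7, 17]: the field is assumed to be the integers modulo the prime 2^61 - 1, the example polynomial `g` and the helper names (`lagrange_eval`, `prover_message`, `sum_check`) are hypothetical, the verifier is given oracle access to `g`, and the prover sends each univariate polynomial as its evaluations at 0, 1, ..., d, which enforces the degree bound implicitly.

```python
import itertools
import random

P = 2**61 - 1  # assumed prime modulus; the field F is GF(P)


def g(x):
    # Hypothetical example polynomial in v = 3 variables, degree 1 in each.
    x1, x2, x3 = x
    return (x1 * x2 + 2 * x3 + x1 * x2 * x3) % P


def lagrange_eval(ys, r):
    # Evaluate at r the unique polynomial of degree <= len(ys) - 1
    # that takes the value ys[i] at the point i, for i = 0..len(ys)-1.
    d = len(ys) - 1
    total = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(d + 1):
            if j != i:
                num = num * (r - j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def prover_message(g, v, prefix, deg):
    # Honest prover: evaluations of g_j at 0..deg, where
    # g_j(X) = sum over Boolean tails of g(prefix, X, tail).
    j = len(prefix)
    evals = []
    for t in range(deg + 1):
        s = 0
        for tail in itertools.product((0, 1), repeat=v - j - 1):
            s += g(tuple(prefix) + (t,) + tail)
        evals.append(s % P)
    return evals


def sum_check(g, v, deg, H, rng, prover=prover_message):
    # deg is an upper bound (>= 1) on the degree of g in every variable.
    claim = H % P
    rs = []
    for _ in range(v):
        evals = prover(g, v, rs, deg)
        if len(evals) != deg + 1:
            return False  # message does not describe a degree-<=deg polynomial
        if (evals[0] + evals[1]) % P != claim:
            return False  # g_j(0) + g_j(1) must match the running claim
        r = rng.randrange(P)  # verifier's random challenge r_j
        claim = lagrange_eval(evals, r)  # new claim is g_j(r_j)
        rs.append(r)
    # Final check: a single oracle query to g at the random point.
    return g(tuple(rs)) % P == claim
```

With an honest prover the verifier accepts the true sum and rejects any other claimed value in the first round; a cheating prover would have to send a wrong polynomial, which is caught with high probability by the final oracle query.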


The SIP for Frequency Moments 

In the kth frequency moment problem, the goal
is to output $\sum_{i \in U} f_i^k$, where $f_i$ is the number
of times item i appears in the data stream $\sigma$.
For a vector $\mathbf{i} = (i_1, \ldots, i_{\log n}) \in \{0,1\}^{\log n}$,
let $\chi_{\mathbf{i}}(x_1, \ldots, x_{\log n}) = \prod_{k=1}^{\log n} \chi_{i_k}(x_k)$, where
$\chi_0(x_k) = 1 - x_k$ and $\chi_1(x_k) = x_k$. $\chi_{\mathbf{i}}$ is
the unique multilinear polynomial that maps $\mathbf{i} \in \{0,1\}^{\log n}$
to 1 and all other values in $\{0,1\}^{\log n}$ to
0, and it is referred to as the multilinear extension
of $\mathbf{i}$.

For each $i \in U$, associate i with a vector
$\mathbf{i} \in \{0,1\}^{\log n}$ in the natural way, and let F be a
finite field with $n^k \le |F| \le 4 \cdot n^k$. Define the
polynomial $\hat{f} : F^{\log n} \to F$ via


$$\hat{f} = \sum_{\mathbf{i} \in \{0,1\}^{\log n}} f_i \, \chi_{\mathbf{i}}. \qquad (1)$$


Input: V is given oracle access to a v-variate polynomial g over finite field F and an $H \in F$.
Goal: Determine whether $H = \sum_{(x_1, \ldots, x_v) \in \{0,1\}^v} g(x_1, \ldots, x_v)$.

• In the first round, P computes the univariate polynomial

  $$g_1(X_1) := \sum_{(x_2, \ldots, x_v) \in \{0,1\}^{v-1}} g(X_1, x_2, \ldots, x_v),$$

  and sends $g_1$ to V. V checks that $g_1$ is a univariate polynomial of degree at most $\deg_1(g)$,
  and that $H = g_1(0) + g_1(1)$, rejecting if not.
• V chooses a random element $r_1 \in F$, and sends $r_1$ to P.
• In the jth round, for $1 < j < v$, P sends to V the univariate polynomial

  $$g_j(X_j) = \sum_{(x_{j+1}, \ldots, x_v) \in \{0,1\}^{v-j}} g(r_1, \ldots, r_{j-1}, X_j, x_{j+1}, \ldots, x_v).$$

  V checks that $g_j$ is a univariate polynomial of degree at most $\deg_j(g)$, and that
  $g_{j-1}(r_{j-1}) = g_j(0) + g_j(1)$, rejecting if not.
• V chooses a random element $r_j \in F$, and sends $r_j$ to P.
• In round v, P sends to V the univariate polynomial

  $$g_v(X_v) = g(r_1, \ldots, r_{v-1}, X_v).$$

  V checks that $g_v$ is a univariate polynomial of degree at most $\deg_v(g)$, and that
  $g_{v-1}(r_{v-1}) = g_v(0) + g_v(1)$, rejecting if not.
• V chooses a random element $r_v \in F$ and evaluates $g(r_1, \ldots, r_v)$ with a single oracle query
  to g. V checks that $g_v(r_v) = g(r_1, \ldots, r_v)$, rejecting if not.
• If V has not yet rejected, V halts and accepts.

Data Stream Verification, Fig. 1 Description of the sum-check protocol. $\deg_i(g)$ denotes the degree of g in the ith variable


Note that $\hat{f}$ is the unique multilinear polynomial
satisfying the property that $\hat{f}(\mathbf{i}) = f_i$ for all
$\mathbf{i} \in \{0,1\}^{\log n}$.

The kth frequency moment of $\sigma$ is equal to

$$\sum_{\mathbf{i} \in \{0,1\}^{\log n}} f_i^k = \sum_{\mathbf{i} \in \{0,1\}^{\log n}} (\hat{f})^k(\mathbf{i}).$$

Hence, in order to compute the kth frequency
moment of $\sigma$, it suffices to apply the sum-check
protocol to the polynomial $g = (\hat{f})^k$. This requires
$\log n$ rounds of prover-verifier interaction, and
since the total degree of $(\hat{f})^k$ is $k \log n$, the total
communication cost is $O(k \log n)$ field elements,
which require $O(k^2 \log^2 n)$ total bits to specify.

At the end of the sum-check protocol, V must
compute

$$g(r_1, \ldots, r_{\log n}) = (\hat{f})^k(r_1, \ldots, r_{\log n})$$

for randomly chosen $(r_1, \ldots, r_{\log n}) \in F^{\log n}$. It
suffices for V to evaluate $z := \hat{f}(r_1, \ldots, r_{\log n})$,
since $(\hat{f})^k(r_1, \ldots, r_{\log n}) = z^k$. The following
lemma establishes that V can evaluate z with a
single pass over $\sigma$, while storing $O(\log n)$ field
elements.


Lemma 1 V can compute $z = \hat{f}(r_1, \ldots, r_{\log n})$
with a single streaming pass over $\sigma$, while storing
$O(\log n)$ field elements.


Proof Given any stream update $a_i \in U$, let
$\mathbf{a}_i$ denote the binary vector associated with $a_i$.
It follows from Eq. (1) that $\hat{f}(r_1, \ldots, r_{\log n}) =
\sum_i \chi_{\mathbf{a}_i}(r_1, \ldots, r_{\log n})$. Thus, V can compute
$\hat{f}(r_1, \ldots, r_{\log n})$ incrementally from the raw
stream by initializing $\hat{f}(r_1, \ldots, r_{\log n}) \leftarrow 0$ and
processing each update $a_i$ via

$$\hat{f}(r_1, \ldots, r_{\log n}) \leftarrow \hat{f}(r_1, \ldots, r_{\log n}) + \chi_{\mathbf{a}_i}(r_1, \ldots, r_{\log n}).$$

V only needs to store $(r_1, \ldots, r_{\log n})$ and
$\hat{f}(r_1, \ldots, r_{\log n})$, which is $O(\log n)$ field
elements in total.
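
The incremental update in the proof of Lemma 1 can be sketched as follows. This is an illustrative sketch only: the prime modulus 2^61 - 1 stands in for the field F, items are assumed to be integers whose binary digits give the associated vector, and the function names `chi`, `to_bits`, and `stream_eval` are hypothetical.

```python
P = 2**61 - 1  # assumed prime modulus standing in for the field F


def to_bits(item, m):
    # Binary vector (least significant bit first) associated with an item.
    return [(item >> k) & 1 for k in range(m)]


def chi(bits, r):
    # chi_i(r) = prod_k chi_{i_k}(r_k), with chi_0(x) = 1 - x and chi_1(x) = x.
    out = 1
    for b, rk in zip(bits, r):
        out = out * (rk if b else (1 - rk)) % P
    return out


def stream_eval(stream, r):
    # Single pass over the stream; only r and the running value z are stored.
    m = len(r)
    z = 0
    for a in stream:
        z = (z + chi(to_bits(a, m), r)) % P
    return z
```

At a Boolean point the running value is simply the frequency of the corresponding item, so summing its kth power over the Boolean cube recovers the kth frequency moment; at a random point of F it yields the value z that the verifier needs for its final check.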



Open Problems 


• For several functions $F : \{0,1\}^n \to \{0,1\}$, it
  is known that any online ADS protocol for F
  with communication cost h and space cost v
  requires $h \cdot v = \Omega(n)$. This lower bound is
  tight in many cases, such as for the INDEX
  function [3]. However, it is open to exhibit
  a function that cannot be computed by any
  online ADS protocol with communication and
  space costs both bounded above by h, for
  some $h = \omega(n^{1/2})$.

• Two-message online SIP protocols with logarithmic
  space and communication costs are
  known for several functions, including the
  INDEX function [4]. It is also known that existing
  techniques cannot yield 2- or 3-message
  online SIPs of polylogarithmic cost for the
  MEDIAN or FREQUENCY MOMENT problems.
  However, it is open to exhibit a function
  $F : \{0,1\}^n \to \{0,1\}$ that cannot be computed by
  any online two-message SIP with communication
  and space costs both bounded above by h,
  for some $h = \omega(\log n)$.


Problem Definition


A three-dimensional domain with piecewise lin- 
ear boundary elements can be represented as a 
piecewise linear complex (PLC) of linear cells 
— vertices, edges, polygons, and polyhedra — 
that satisfy the following properties [4]. First, no 
vertex lies in the interior of an edge and every two 
edges are interior-disjoint. Second, the boundary 
of a polygon or polyhedron is a union of cells in
the PLC. Third, if two cells f and g intersect, the 
intersection is a union of cells in the PLC with 
dimensions lower than f or g. A triangulation 
of an input PLC is conforming if every edge 
and polygon appear as a union of segments and 
triangles in the triangulation. Additional Steiner 
vertices are often necessary. The 3D conforming 
Delaunay triangulation problem is to construct 
a triangulation of an input PLC that is both 
conforming and Delaunay. Figure 1a, b shows
an input PLC and its conforming Delaunay triangulation.
In many applications, it is often desired
that the triangulation is not unnecessarily
dense and the resulting tetrahedra are of bounded
aspect ratio. Another popular shape measure is
the radius-edge ratio, which is the ratio of the
circumradius of the tetrahedron to its shortest
edge length. Tetrahedra with bounded radius-edge
ratio may still have negligible volume, and
they are known as slivers. Figure 1c shows a
sliver.


3D Conforming Delaunay Triangulation, Fig. 1 A PLC and its conforming Delaunay triangulation in (a) and (b).
A sliver with negligible volume and edge lengths similar to its circumradius in (c).


Key Results 


Since the Delaunayhood of edges and triangles
is guaranteed by the emptiness of their circumspheres,
one would imagine that a conforming
Delaunay triangulation can be obtained by sprin- 
kling Steiner vertices on the input edges and 
polygons. Indeed, Murphy, Mount, and Gable [6] 
showed a way to do this, but the resulting trian- 
gulation is very dense, and no shape guarantee is 
offered. 

Shewchuk [9] gave the first algorithm that
offers a shape guarantee for PLCs in which two
adjoining elements do not make an acute angle.
(The exact requirement is more general and is
called the projection condition [4, 9].) The algorithm
is a generalization of Ruppert’s Delaunay
refinement algorithm in the plane [8]. An
initial Delaunay triangulation is formed using
the input vertices, and then Steiner vertices are
added incrementally. Boundary conformity takes
precedence. Therefore, whenever a segment on an
edge or a triangle on a polygon has a nonempty
diametric ball (the ball enclosed by the smallest
circumsphere), that segment or triangle is split by
inserting the center of its diametric ball. A Delaunay
tetrahedron with radius-edge ratio larger than
a prescribed constant $\rho > 2$ is split by inserting
its circumcenter. However, if this circumcenter
lies inside the diametric ball of a segment or a
triangle, then the insertion is aborted and that
segment or triangle is split instead. Similarly, the
insertion of a triangle’s diametric ball center is
aborted if it lies in the diametric ball of a segment,
and that segment is split instead. The following
theorem states the main result.


Theorem 1 ([4,9]) Let $\rho$ be a constant greater
than 2. Let P be a PLC with no acute input
angle. A conforming Delaunay triangulation of
P can be obtained by Delaunay refinement and
all tetrahedra obtained have radius-edge ratio at
most $\rho$.
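
The two geometric tests that drive the refinement, the radius-edge ratio of a tetrahedron and the check whether a point lies inside the diametric ball of a segment, can be sketched as below. This is an illustrative sketch using floating-point arithmetic and hypothetical function names (`circumradius`, `radius_edge_ratio`, `encroaches`); a robust implementation would need exact or adaptive-precision predicates, which are not shown.

```python
import itertools
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def circumradius(p0, p1, p2, p3):
    # Circumcenter offset from p0:
    # (|a|^2 (b x c) + |b|^2 (c x a) + |c|^2 (a x b)) / (2 a.(b x c)).
    a, b, c = _sub(p1, p0), _sub(p2, p0), _sub(p3, p0)
    det = _dot(a, _cross(b, c))  # six times the signed volume
    if abs(det) < 1e-12:
        raise ValueError("degenerate (flat) tetrahedron")
    bc, ca, ab = _cross(b, c), _cross(c, a), _cross(a, b)
    offset = tuple(
        (_dot(a, a) * bc[i] + _dot(b, b) * ca[i] + _dot(c, c) * ab[i]) / (2.0 * det)
        for i in range(3)
    )
    return math.sqrt(_dot(offset, offset))


def radius_edge_ratio(tet):
    # Ratio of the circumradius to the shortest edge length.
    shortest = min(math.dist(p, q) for p, q in itertools.combinations(tet, 2))
    return circumradius(*tet) / shortest


def encroaches(point, seg_a, seg_b):
    # point is strictly inside the diametric ball of segment (seg_a, seg_b)
    # exactly when it sees the segment under an obtuse angle.
    return _dot(_sub(seg_a, point), _sub(seg_b, point)) < 0
```

A regular tetrahedron has ratio sqrt(6)/4, roughly 0.61. Four points spaced almost evenly around a great circle of a sphere form a sliver: the ratio stays near 0.71 even though the volume is close to zero, which is why a bound on the radius-edge ratio alone does not exclude slivers.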


In the presence of acute angles, the splitting 
of segments and triangles may lead to an infinite 
loop as illustrated in Fig. 2. Notice that the Steiner 
vertices inserted are approaching the input vertex 
in Fig. 2. In the plane, Ruppert [8] proposed a fix: 
place some protecting circles centered at the input 
vertices, disallow the insertion of Steiner vertices 
inside these protecting circles, and triangulate the 
inside of these protecting circles using a separate 
mechanism. A key change is that if a Steiner 
vertex to be inserted is too close to an arc on a 
protecting circle, then the insertion of the Steiner 
vertex is aborted and the circular arc is split by 
inserting its midpoint. This is analogous to the 
splitting of segments. 

Cohen-Steiner, de Verdière, and Yvinec [5]
generalized this idea partly to three dimensions.
They proposed to place protecting balls centered
at the input vertices as well as at some appropriate
points in the interior of input edges.
These protecting balls cover all input vertices and
edges. The intersection between a protecting ball
boundary and an input polygon is analogous to
a protecting circle in 2D. Therefore, when we
want to insert Steiner vertices in a polygon f
to recover the Delaunay triangles on f, if such
a Steiner vertex v is too close to an arc $\alpha$ at
the intersection of f and some protecting ball
boundary, the insertion of v is aborted and $\alpha$ is
split instead. The portions of polygons inside the
protecting balls are triangulated using a separate
mechanism. If a tetrahedron t has large radius-edge
ratio but its circumcenter lies inside some
protecting ball, then t is just kept in the triangulation.
As a result, no shape guarantee is offered.


3D Conforming Delaunay Triangulation, Fig. 2 The
midpoint of the segment with the largest diametric ball
triggers the splitting of the segment with the second largest
diametric ball, which in turn triggers the splitting of the
segment with the smallest diametric ball. This may go on
indefinitely


Theorem 2 ([5]) There is a Delaunay refine- 
ment algorithm that constructs a conforming 
Delaunay triangulation of any input PLC. 


Cheng and Poon [2] extended Delaunay re- 
finement by observing that segments, circular 
arcs, triangles, and spherical triangles can all be 
handled in a uniform way. 

Let $\mathcal{B}$ be the union of protecting balls with
centers at the input vertices and interior of input
edges. Let B be a protecting ball. Let $\partial$ denote
the boundary operator. For every input polygon
f, the Steiner vertices on $f \cap \partial\mathcal{B} \cap \partial B$ divide
$f \cap \partial\mathcal{B} \cap \partial B$ into circular arcs. For every protecting
ball B, the projection of the convex hull of
the Steiner vertices on B onto $\partial B$ divides $\partial B$ into
some spherical triangles. The diametric ball of a
segment or triangle can be viewed as the circumscribing
ball whose boundary intersects the affine
hull of the segment or triangle at a right angle.
Analogously, the “diametric ball” of a circular arc
in $\partial B$ or a spherical triangle with vertices in $\partial B$
can be defined as the circumscribing ball whose
boundary intersects B at a right angle. If a Steiner
vertex to be inserted lies inside this “diametric
ball,” the insertion is aborted and the circular arc
or spherical triangle is split instead. A circular
arc is split by inserting its midpoint. A spherical
triangle is split by inserting the intersection point
between $\partial B$ and the line segment joining the
centers of B and the “diametric ball.” The last
ingredient is that for every pair of protecting balls
whose centers are adjacent on an input edge, their
boundaries should intersect at a right angle. That
is, the protecting ball $B'$ adjacent to B serves as
the “diametric ball” of the spherical triangles that
have vertices on the circle $\partial B \cap \partial B'$.


Theorem 3 ([2]) There exist a constant $\rho > 2$
and a Delaunay refinement algorithm that constructs
a conforming Delaunay triangulation of
any input PLC such that all resulting tetrahedra
have radius-edge ratio at most $\rho$.


In fact, all tetrahedra inside the union of pro- 
tecting balls have aspect ratios that depend only 
on the smallest angle in the input PLC. Subse- 
quently, different simplifications and algorithms 
with less expensive primitives have been pro- 
posed [3,7]. 

Placing Steiner vertices on the protecting balls 
and constructing the convex hull of the Steiner 
vertices on a protecting ball are fairly expensive. 
Even checking whether a Steiner vertex to be 
inserted lies inside any protecting ball is a burden. 
These expensive computations can be bypassed 
by switching to the weighted Delaunay trian- 
gulation — a more general variant of Delaunay 
triangulation. 

Let $B_x$ and $B_y$ be two balls with centers x
and y and radii $r_x \ge 0$ and $r_y \ge 0$. The power
distance between $B_x$ and $B_y$ is defined to be

$$\pi(B_x, B_y) = d(x, y)^2 - r_x^2 - r_y^2.$$

This definition allows $B_x$ or $B_y$ to degenerate
to a single point. As in the Euclidean case, the
bisector between $B_x$ and $B_y$ is also a plane
perpendicular to the line through x and y; however,
the bisector plane may not pass through
the midpoint of xy. Using the power distance,
one can define a weighted version of the Voronoi
diagram called the power diagram. The dual of
the power diagram is known as the weighted Delaunay
triangulation. For each segment, triangle,
or tetrahedron $\sigma$ in the triangulation, there is a
point z at equal and minimum power distances
D from the vertices of $\sigma$. This point z is known
as the orthocenter of $\sigma$. The ball centered at
z with radius $\sqrt{D}$ is called the orthoball of
$\sigma$, which is at zero power distances from the
vertices of $\sigma$.
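
For a single weighted segment, the orthocenter and orthoball can be computed in closed form, which makes the definitions concrete. The sketch below is illustrative only: the function names `power_distance` and `segment_orthoball` are hypothetical, and it handles just the one-dimensional case of a segment between two weighted points.

```python
import math


def power_distance(x, rx, y, ry):
    # d(x, y)^2 - rx^2 - ry^2 for balls centered at x and y.
    return math.dist(x, y) ** 2 - rx ** 2 - ry ** 2


def segment_orthoball(x, rx, y, ry):
    # Orthocenter z = x + t (y - x) on the line through x and y, chosen so
    # that the power distances from z to both balls are equal:
    #   t^2 L^2 - rx^2 = (1 - t)^2 L^2 - ry^2,  L = d(x, y).
    L2 = math.dist(x, y) ** 2
    t = (L2 + rx ** 2 - ry ** 2) / (2.0 * L2)
    z = tuple(xi + t * (yi - xi) for xi, yi in zip(x, y))
    D = t * t * L2 - rx ** 2  # common power distance; orthoradius is sqrt(D)
    return z, D
```

For balls of radii 1 and 2 centered at (0,0,0) and (4,0,0), this gives z = (1.625, 0, 0) rather than the midpoint (2, 0, 0), with D = 1.640625, and the ball of radius sqrt(D) at z is at zero power distance from both weighted endpoints.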

The key idea is to use a weighted Delaunay
triangulation after placing the protecting balls.
The Delaunay refinement strategy is then modified
to insert orthocenters instead of centers of
diametric balls. If the protecting balls are not too
large, every triangle or tetrahedron $\sigma$ in the initial
weighted Delaunay triangulation involves a pair of
nonoverlapping protecting balls, which must be a
positive power distance apart. It follows that the
orthocenter of $\sigma$ lies outside all protecting balls.
As the refinement progresses, an edge, triangle,
or tetrahedron $\sigma$ may involve Steiner vertices,
which can be viewed as balls of zero radii. Such a
Steiner vertex must be at positive power distances
from the other vertices of $\sigma$, so the orthocenter of
$\sigma$ also lies outside all protecting balls. In summary,
the indefinite insertions of Steiner vertices
at decreasing distances from the input vertices
and edges as shown in Fig. 2 cannot happen. The
following theorem summarizes the result.


Theorem 4 ([4]) Let P be a PLC. Let $\rho$ be
any constant at least 2. There is an algorithm
that constructs a conforming weighted Delaunay
triangulation of P in which no tetrahedron has
an orthoradius-edge ratio greater than $\rho$. Therefore,
tetrahedra with no weighted vertices have
circumradius-edge ratio at most $\rho$.


There are known methods for eliminating slivers
from a conforming weighted Delaunay triangulation
of a PLC [1,4].


Problem Definition


In order to ensure the integrity of data in the 
presence of errors, an error-correcting code is 
used to encode data into a redundant form (called 
a codeword). It is natural to view both the original
data (or message) as well as the associated
codeword as strings over a finite alphabet $\Sigma$. Therefore,
an error-correcting code C is defined by an
injective encoding map $E : \Sigma^k \to \Sigma^n$, where k is
called the message length, and n the block length.
The codeword, being a redundant form of the 
message, will be longer than the message. The 
rate of an error-correcting code is defined as the 
ratio k/n of the length of the message to the length 
of the codeword. The rate is a quantity in the 
interval (0, 1], and is a measure of the redundancy 
introduced by the code. Let R(C) denote the rate 
of a code C. 

The redundancy built into a codeword enables 
detection and hopefully also correction of any 
errors introduced, since only a small fraction of 
all possible strings will be legitimate codewords. 
Ideally, the codewords encoding different messages
should be “far-off” from each other, so
that one can recover the original codeword even
when it is distorted by moderate levels of noise.
A natural measure of distance between strings is
the Hamming distance. The Hamming distance
between strings $x, y \in \Sigma^n$ of the same length,
denoted dist(x, y), is defined to be the number
of positions i for which $x_i \ne y_i$.

The minimum distance, or simply distance, 
of an error-correcting code C, denoted d(C), is 
defined to be the smallest Hamming distance 
between the encodings of two distinct messages. 
The relative distance of a code C of block length
n, denoted $\delta(C)$, is the ratio between its distance
and n. Note that arbitrary corruption of any
$\lfloor (d(C) - 1)/2 \rfloor$ locations of a codeword of C
cannot take it closer (in Hamming distance) to
any other codeword of C. Thus in principle (i.e.,
efficiency considerations apart) error patterns of
at most $\lfloor (d(C) - 1)/2 \rfloor$ errors can be corrected.
This task is called unique decoding or decoding
up to half-the-distance. Of course, it is also
possible, and will often be the case, that error
patterns with more than d(C)/2 errors can also
be corrected by decoding the string to the closest
codeword in Hamming distance. The latter task is
called Nearest-Codeword decoding or Maximum
Likelihood Decoding (MLD).

One of the fundamental trade-offs in the the- 
ory of error-correcting codes, and in fact one 
could say all of combinatorics, is the one be- 
tween rate R(C) and distance d(C) of a code. 
Naturally, as one increases the rate and thus 
number of codewords in a code, some two code- 
words must come closer together thereby low- 
ering the distance. More qualitatively, this rep- 
resents the tension between the redundancy of 
a code and its error-resilience. To correct more er- 
rors requires greater redundancy, and thus lower 
rate. 

A code defined by encoding map $E : \Sigma^k \to \Sigma^n$
with minimum distance d is said to be an (n, k, d)
code. Since there are $|\Sigma|^k$ codewords and only
$|\Sigma|^{k-1}$ possible projections onto the first $k - 1$
coordinates, some two codewords must agree
on the first $k - 1$ positions, implying that the
distance d of the code must obey $d \le n - k + 1$
(this is called the Singleton bound). Quite
surprisingly, over large alphabets $\Sigma$ there are
well-known codes called Reed-Solomon codes
which meet this bound exactly and have the
optimal distance $d = n - k + 1$ for any given
rate k/n. (In contrast, for small alphabets, such
as $\Sigma = \{0, 1\}$, the optimal trade-off between rate
and relative distance for an asymptotic family of
codes is unknown and is a major open question
in combinatorics.)

This article will describe the best known
algorithmic results for error-correction of Reed-Solomon
codes. These are of central theoretical
and practical interest given the above-mentioned
optimal trade-off achieved by Reed-Solomon
codes, and their ubiquitous use in our everyday
lives ranging from compact disc players to deep-space
communication.


Reed-Solomon Codes


Definition 1 A Reed-Solomon code (or RS
code), $\mathrm{RS}_{F,S}[n,k]$, is parametrized by integers
n, k satisfying $1 \le k \le n$, a finite field F of size
at least n, and a tuple $S = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ of n
distinct elements from F. The code is described
as a subset of $F^n$ as:

$$\mathrm{RS}_{F,S}[n,k] = \{ (p(\alpha_1), p(\alpha_2), \ldots, p(\alpha_n))
\mid p(X) \in F[X] \text{ is a polynomial of degree} \le k - 1 \}.$$

In other words, the message is viewed as a polynomial,
and it is encoded by evaluating the polynomial
at n distinct field elements $\alpha_1, \ldots, \alpha_n$.
The resulting code is linear of dimension k, and
its minimum distance equals $n - k + 1$, which
matches the Singleton bound.


The distance property of RS codes follows from 
the fact that the evaluations of two distinct poly- 
nomials of degree less than k can agree on at most 
$k - 1$ field elements. Note that in the absence
of errors, given a codeword $y \in F^n$, one can
recover its corresponding message by polynomial 
interpolation on any k out of the n codeword 
positions. In fact, this also gives an erasure de- 
coding algorithm when all but the information- 
theoretically bare minimum necessary k symbols 
are erased from the codeword (but the receiver 
knows which symbols have been erased and the 
correct values of the rest of the symbols). The RS 
decoding problem, therefore, amounts to a noisy 
polynomial interpolation problem when some of 
the evaluation values are incorrect. 
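
Encoding by evaluation and recovery by interpolation from any k error-free positions can be sketched as follows. The sketch is illustrative only: it assumes the prime field GF(13) (real deployments typically use fields of characteristic 2), messages are coefficient lists of length k, and the function names are hypothetical. The brute-force `min_distance` helper relies on linearity of the code and is meant only for tiny parameters.

```python
import itertools

P = 13  # assumed prime field F = GF(13)


def poly_eval(coeffs, x):
    # Horner evaluation of sum_i coeffs[i] * x^i modulo P.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def rs_encode(message, alphas):
    # Codeword (p(alpha_1), ..., p(alpha_n)) for the message polynomial p.
    return [poly_eval(message, a) for a in alphas]


def poly_mul_linear(poly, root):
    # Multiply a coefficient list by (X - root) modulo P.
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def interpolate(points):
    # Lagrange interpolation through k points with distinct x-coordinates;
    # returns the k coefficients of the unique polynomial of degree < k.
    k = len(points)
    result = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul_linear(basis, xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % P
    return result


def min_distance(k, alphas):
    # The code is linear, so the minimum distance equals the minimum
    # Hamming weight of a nonzero codeword (brute force over all messages).
    return min(
        sum(1 for s in rs_encode(list(m), alphas) if s != 0)
        for m in itertools.product(range(P), repeat=k)
        if any(m)
    )
```

For example, with k = 3 and eight evaluation points, interpolating through any three surviving symbols of a codeword returns the original message, which is exactly the erasure-decoding scenario described above.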

The holy grail in decoding RS codes would
be to find the polynomial p(X) whose RS encoding
is closest in Hamming distance to a noisy
string $y \in F^n$. One could then decode y to this
message p(X) as the maximum likelihood choice.
No efficient algorithm for such nearest-codeword
decoding is known for RS codes (or for that
matter any family of “good” or non-trivial codes),
and it is believed that the problem is NP-hard.
Guruswami and Vardy [6] proved the problem
to be NP-hard over exponentially large fields, but
this is a weak negative result since normally one
considers Reed-Solomon codes over fields of
size at most O(n).


Decoding Reed-Solomon Codes 


Given the intractability of nearest-codeword
decoding in its extreme generality, a lot of attention
has been devoted to the bounded distance decoding
problem, where one assumes that the string
$y \in F^n$ to be decoded has at most e errors, and
the goal is to find the Reed-Solomon codeword(s)
within Hamming distance e from y.

When $e \le (n - k)/2$, this corresponds to decoding
up to half the distance. This is a classical
problem for which a polynomial time algorithm
was first given by Peterson [8]. (It is notable
that this was even before the notion of polynomial
time was put forth as the metric of theoretical
efficiency.) The focus of this article is on a list
decoding algorithm for Reed-Solomon codes due
to Guruswami and Sudan [5] that decodes beyond
half the minimum distance. The formal problem
and the key results are stated next.


Key Results 


In this section, the main result of focus concerning
decoding Reed-Solomon codes is stated.
Given the target of decoding errors beyond half- 
the-minimum distance, one needs to deal with 
inputs where there may be more than one code- 
word within the radius e specified in the bounded 
distance decoding problem. This is achieved by 
a relaxation of decoding called list decoding 
where the decoder outputs a list of all code- 
words (or the corresponding messages) within 
Hamming distance e from the received word. If 
one wishes, one can choose the closest codeword 
among the list as the “most likely” answer, but 
there are many applications of Reed-Solomon
decoding, for example to decoding concatenated 
codes and several applications in complexity the- 
ory and cryptography, where having the entire list 
of codewords adds to the power of the decoding 
primitive. The main result of Guruswami and 
Sudan [5], building upon the work of Sudan [9], 
is the following: 


Theorem 1 ([5]) Let $C = \mathrm{RS}_{F,S}[n,k]$ be
a Reed-Solomon code over a field F of size
$q \ge n$ with $S = (\alpha_1, \alpha_2, \ldots, \alpha_n)$. There is
a deterministic algorithm running in time
polynomial in q that on input $y \in F^n$ outputs
a list of all polynomials $p(X) \in F[X]$ of degree
less than k for which $p(\alpha_i) \ne y_i$ for less
than $n - \sqrt{(k-1)n}$ positions $i \in \{1, 2, \ldots, n\}$.
Further, at most $O(n^2)$ polynomials will be output
by the algorithm in the worst case.


Alternatively, one can correct an RS code
of block length n and rate $R = k/n$ up to
$n - \sqrt{(k-1)n}$ errors, or equivalently a fraction
$1 - \sqrt{R}$ of errors.
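
To get a feel for the gain over unique decoding, the two error bounds can be compared numerically. The sketch below is a small illustration with hypothetical function names; it uses integer arithmetic to find the largest number of errors e for which the agreement n - e strictly exceeds $\sqrt{(k-1)n}$.

```python
import math


def unique_decoding_radius(n, k):
    # Half the minimum distance: floor((n - k) / 2) errors.
    return (n - k) // 2


def list_decoding_radius(n, k):
    # Largest e such that the agreement t = n - e satisfies t > sqrt((k-1) n).
    t_min = math.isqrt((k - 1) * n) + 1
    return n - t_min
```

For n = 32 and k = 8 this gives 12 errors for unique decoding against 17 for list decoding, and for the low-rate choice n = 100, k = 5 it gives 47 against 79; for high rates the gain shrinks (n = 255, k = 223 gives 16 against 17).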

The Reed-Solomon decoding algorithm is
based on the solution to the following more
general polynomial reconstruction problem
which seems like a natural algebraic question
in itself. (The problem is more general than RS
decoding since the $\alpha_i$’s need not be distinct.)


Problem 1 (Polynomial Reconstruction)

Input: Integers $k, t \le n$ and n distinct pairs
$\{(\alpha_i, y_i)\}_{i=1}^{n}$ where $\alpha_i, y_i \in F$.

Output: A list of all polynomials $p(X) \in F[X]$ of
degree less than k which satisfy $p(\alpha_i) = y_i$ for at
least t values of $i \in [n]$.
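
The input/output behavior of Problem 1 can be made concrete with an exhaustive search over a tiny field. To be clear, this is not the Guruswami-Sudan algorithm: it enumerates all $|F|^k$ candidate polynomials and is exponential in k. The prime field GF(7) and the function names are assumptions made for illustration.

```python
import itertools

P = 7  # assumed tiny prime field F = GF(7)


def poly_eval(coeffs, x):
    # Horner evaluation of sum_i coeffs[i] * x^i modulo P.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def reconstruct_brute_force(pairs, k, t):
    # Return every polynomial of degree < k (as a coefficient list)
    # that agrees with at least t of the given (alpha, y) pairs.
    out = []
    for coeffs in itertools.product(range(P), repeat=k):
        agree = sum(1 for a, y in pairs if poly_eval(coeffs, a) == y)
        if agree >= t:
            out.append(list(coeffs))
    return out
```

Take the line p(X) = 1 + 2X evaluated at 0, ..., 6 and overwrite the first three values with 3. With t = 5 the search returns only p, while with t = 3 (just above $\sqrt{(k-1)n} \approx 2.65$) the list also contains the constant polynomial 3, which shows why the output must be a list once t drops low enough.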


Theorem 2 The polynomial reconstruction
problem can be solved in time polynomial in
n, |F|, provided $t > \sqrt{(k-1)n}$.


The reader is referred to the original papers [5,
9], or a recent survey [1], for details on the above
algorithm. A quick, high level peek into the
main ideas is given below. The first step in the
algorithm consists of an interpolation step where
a nonzero bivariate polynomial Q(X,Y) is “fit”
through the n pairs $(\alpha_i, y_i)$, so that $Q(\alpha_i, y_i) = 0$
for every i. The key is to do this with relatively
low degree; in particular one can find such
a Q(X,Y) with so-called $(1, k-1)$-weighted
degree at most $D = \sqrt{2(k-1)n}$. This degree
budget on Q implies that for any polynomial
p(X) of degree less than k, Q(X, p(X)) will have
degree at most D. Now whenever $p(\alpha_i) = y_i$,
$Q(\alpha_i, p(\alpha_i)) = Q(\alpha_i, y_i) = 0$. Therefore, if
a polynomial p(X) satisfies $p(\alpha_i) = y_i$ for at
least t values of i, then Q(X, p(X)) has at
least t roots. On the other hand the polynomial
Q(X, p(X)) has degree at most D. Therefore,
if $t > D$, one must have $Q(X, p(X)) \equiv 0$, or
in other words $Y - p(X)$ is a factor of Q(X,Y).
The second step of the algorithm factorizes the
polynomial Q(X,Y), and all polynomials p(X) that
must be output will be found as factors $Y - p(X)$
of Q(X,Y).
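
The existence of a nonzero Q in the interpolation step is a counting argument: the n conditions $Q(\alpha_i, y_i) = 0$ are homogeneous linear equations in the coefficients of Q, so a nonzero solution exists as soon as Q has more than n coefficients. The sketch below checks this count numerically; the function names are hypothetical, D is rounded up to an integer (an assumption, since the text states D without rounding), and k is assumed to be at least 2.

```python
import math


def num_monomials(D, k):
    # Number of monomials X^i Y^j with i + (k - 1) * j <= D, for k >= 2.
    w = k - 1
    return sum(D - w * j + 1 for j in range(D // w + 1))


def interpolation_degree(n, k):
    # Smallest integer D with D^2 >= 2 (k - 1) n.
    target = 2 * (k - 1) * n
    D = math.isqrt(target)
    if D * D < target:
        D += 1
    return D
```

For n = 32 and k = 8 the degree budget is D = 22, which allows 50 monomials, comfortably more than the 32 interpolation constraints.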

Note that since $D = \sqrt{2(k-1)n}$ this gives
an algorithm for polynomial reconstruction
provided the agreement parameter t satisfies
$t > \sqrt{2(k-1)n}$ [9]. To get an algorithm for
$t > \sqrt{(k-1)n}$, and thus decode beyond half the
minimum distance $(n-k)/2$ for all parameter
choices for k, n, Guruswami and Sudan [5]
use the crucial idea of allowing “multiple
roots” in the interpolation step. Specifically, the
polynomial Q is required to have $r \ge 1$ roots at
each pair $(\alpha_i, y_i)$ for some integer multiplicity
parameter r (the notion needs to be formalized
properly, see [5] for details). This necessitates
an increase in the $(1, k-1)$-weighted degree
by a factor of about $r/\sqrt{2}$, but the gain is
that one gets a factor r more roots for the
polynomial Q(X, p(X)). These facts together
lead to an algorithm that works as long as
$t > \sqrt{(k-1)n}$.

There is an additional significant benefit offered
by the multiplicity based decoder. The multiplicities
of the interpolation points need not
all be equal and they can be picked in proportion
to the reliability of different received symbols.
This gives a powerful way to exploit “soft” in- 
formation in the decoding stage, leading to im- 
pressive coding gains in practice. The reader is 
referred to the paper by Koetter and Vardy [7] for 
further details on using multiplicities to encode 
symbol level reliability information from the 
channel. 


Applications 


Reed-Solomon codes have been extensively 
studied and are widely used in practice. The 
above decoding algorithm corrects more errors 
beyond the traditional half the distance limit 
and therefore directly advances the state of 
the art on this important algorithmic task. The 
RS list decoding algorithm has also been the 
backbone for many further developments in 
algorithmic coding theory. In particular, using 
this algorithm in concatenation schemes leads to 
good binary list-decodable codes. A variant of
RS codes called folded RS codes has been used
to achieve the optimal trade-off between error-correction
radius and rate [3] (see the companion
encyclopedia entry by Rudra on folded RS
codes).

The RS list decoding algorithm has also 
found many surprising applications beyond 
coding theory. In particular, it plays a key 
role in several results in cryptography and 
complexity theory (such as constructions of 
randomness extractors and pseudorandom 
generators, hardness amplification, constructions
of hardcore predicates, traitor tracing, reductions
connecting worst-case hardness to average-case 
hardness, etc.); more information can be found, 
for instance, in [10] or Chap. 12 in [2]. 


Open Problems 


The most natural open question is whether one
can improve the algorithm further and correct
more than a fraction $1 - \sqrt{R}$ of errors for RS
codes of rate R. It is important to note that there
is a combinatorial limitation to the number of
errors one can list decode from. One can only
list decode in polynomial time from a fraction
$\rho$ of errors if for every received word y the
number of RS codewords within distance $e = \rho n$
of y is bounded by a polynomial function of
the block length n. The largest $\rho$ for which this
holds as a function of the rate R is called the
list decoding radius $\rho_{LD} = \rho_{LD}(R)$ of RS codes.
The RS list decoding algorithm discussed here
implies that $\rho_{LD}(R) \ge 1 - \sqrt{R}$, and it is trivial
to see that $\rho_{LD}(R) \le 1 - R$. Are there RS codes
(perhaps based on specially structured evaluation
points) for which $\rho_{LD}(R) > 1 - \sqrt{R}$? Are there
RS codes for which the $1 - \sqrt{R}$ radius (the so-called
“Johnson bound”) is actually tight for list
decoding? For the more general polynomial reconstruction
problem the $\sqrt{(k-1)n}$ agreement
cannot be improved upon [4], but this is not
known for RS list decoding.

Improving the NP-hardness result of [6] to
hold for RS codes over polynomial sized fields
and for smaller decoding radii remains an important
challenge.


Problem Definition


A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property 
P quickly, and perform update operations faster 
than recomputing from scratch, as carried out by 
the fastest static algorithm. An algorithm is fully 
dynamic if it can handle both edge insertions and 
edge deletions. A partially dynamic algorithm 
can handle either edge insertions or edge 
deletions, but not both: it is incremental if it 
supports insertions only, and decremental if it 
supports deletions only. 

This entry addresses the decremental version 
of the all-pairs shortest paths problem (APSP), 
which consists of maintaining a directed graph 
with real-valued edge weights under an inter- 
mixed sequence of the following operations: 


delete(u, v): delete edge (u, v) from the graph. 

distance(x, y): return the distance from ver- 
tex x to vertex y. 

path(x, y): report a shortest path from vertex x 
to vertex y, if any. 


A natural variant of this problem supports a gen- 
eralized delete operation that removes a vertex 
and all edges incident to it. The algorithms ad- 
dressed in this entry can deal with this general- 
ized operation within the same bounds. 


History of the Problem 

A simple-minded solution to this problem would 
be to rebuild shortest paths from scratch after 
each deletion using the best static APSP algo- 
rithm so that distance and path queries can be 
reported in optimal time. The fastest known static 
APSP algorithm for arbitrary real weights has 
a running time of $O(mn + n^2 \log\log n)$, where m 
is the number of edges and n is the number of 
vertices in the graph [13]. This is $\Omega(n^3)$ in the 
worst case. Fredman [6] and later Takaoka [19] 
showed how to break this cubic barrier: the best 
asymptotic bound is by Takaoka, who showed 
how to solve APSP in $O(n^3 \sqrt{\log\log n / \log n})$ 
time. 

Another simple-minded solution would be to 
answer queries by running a point-to-point short- 
est paths computation, without the need to update 
shortest paths at each deletion. This can be done 
with Dijkstra’s algorithm [3] in O(m + n logn) 
time using the Fibonacci heaps of Fredman and 
Tarjan [5]. With this approach, queries are an- 
swered in O(m + nlogn) worst-case time and 
updates require optimal time. 
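
The second baseline can be sketched as follows. This is a minimal illustration, not the implementation of any cited paper: the class name `LazyDecrementalAPSP` and its layout are assumptions made here, and it uses Python's binary heap (`heapq`) rather than Fibonacci heaps, so a query costs O(m log n) instead of the O(m + n log n) quoted above. Edge weights are assumed to be non-negative.

```python
import heapq
from math import inf


class LazyDecrementalAPSP:
    """Baseline: constant-time deletions, one Dijkstra run per query."""

    def __init__(self, edges):
        # adj[u][v] = weight of edge (u, v); weights assumed non-negative.
        self.adj = {}
        for u, v, w in edges:
            self.adj.setdefault(u, {})[v] = w
            self.adj.setdefault(v, {})

    def delete(self, u, v):
        # Drop the edge and do nothing else: no shortest paths are updated.
        self.adj[u].pop(v, None)

    def _dijkstra(self, source, target):
        dist = {source: 0.0}
        parent = {source: None}
        heap = [(0.0, source)]
        done = set()
        while heap:
            d, x = heapq.heappop(heap)
            if x in done:
                continue  # stale heap entry
            done.add(x)
            if x == target:
                break  # the target's distance is final once it is popped
            for y, w in self.adj[x].items():
                nd = d + w
                if nd < dist.get(y, inf):
                    dist[y] = nd
                    parent[y] = x
                    heapq.heappush(heap, (nd, y))
        return dist, parent

    def distance(self, x, y):
        dist, _ = self._dijkstra(x, y)
        return dist.get(y, inf)

    def path(self, x, y):
        dist, parent = self._dijkstra(x, y)
        if y not in dist:
            return None  # y is unreachable from x
        out = []
        while y is not None:
            out.append(y)
            y = parent[y]
        return out[::-1]
```

All work is deferred to query time, which is the trade-off this baseline illustrates: updates are as cheap as possible, while every query pays for a full single-source computation.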

The dynamic maintenance of shortest paths 
has a long history, and the first papers date back to 
1967 [11, 12, 17]. In 1985 Even and Gazit [4] pre- 
sented algorithms for maintaining shortest paths 
on directed graphs with arbitrary real weights. 
The worst-case bounds of their algorithm for 
edge deletions were comparable to recomput- 
ing APSP from scratch. Also Ramalingam and 
Reps [15, 16] and Frigioni et al. [7, 8] con- 
sidered dynamic shortest path algorithms with 
real weights, but in a different model. Namely, 
the running time of their algorithm is analyzed 
in terms of the output change rather than the 
input size (output bounded complexity). Again, 
in the worst case the running times of output- 
bounded dynamic algorithms are comparable to 
recomputing APSP from scratch. 

The first decremental algorithm that was 
provably faster than recomputing from scratch 
was devised by King for the special case of 
graphs with integer edge weights less than C: her 
algorithm can update shortest paths in a graph 
subject to a sequence of $\Omega(n^2)$ deletions in 
$O(C \cdot n^2)$ amortized time per deletion [9]. Later, 
Demetrescu and Italiano showed how to deal 
with graphs with real non-negative edge weights 
in $O(n^2 \log n)$ amortized time per deletion [2] 
in a sequence of $\Omega(m/n)$ operations. Both 
algorithms work in the more general case where 
edges are not deleted from the graph, but their 
weight is increased at each update. Moreover, 
since they update shortest paths explicitly 
after each deletion, queries are answered in 
optimal time at any time during a sequence of 
operations. 


Key Results 


The decremental APSP algorithm by Demetrescu 
and Italiano hinges upon the notion of locally 
shortest paths [2]. 


Definition 1 A path is locally shortest in a graph 
if all of its proper subpaths are shortest paths. 
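
Definition 1 can be checked mechanically once all-pairs distances are known. The sketch below is illustrative only: the helper names are assumptions made here, distances come from a plain Floyd-Warshall pass, and `weight` is assumed to map each edge (u, v) to its weight. It relies on the fact that every proper subpath lies inside the path with its first or its last vertex removed, so testing those two subpaths suffices.

```python
from math import inf, isclose


def all_pairs_distances(n, edges):
    """Floyd-Warshall on vertices 0..n-1; edges are (u, v, weight)."""
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def path_weight(path, weight):
    return sum(weight[(a, b)] for a, b in zip(path, path[1:]))


def is_shortest(path, weight, d):
    return isclose(path_weight(path, weight), d[path[0]][path[-1]])


def is_locally_shortest(path, weight, d):
    # Subpaths of shortest paths are shortest, so it is enough to test
    # the two maximal proper subpaths.
    if len(path) <= 2:
        return True  # single vertices and single edges qualify trivially
    return is_shortest(path[1:], weight, d) and is_shortest(path[:-1], weight, d)
```

On the edge set {(0,1), (1,2), (0,2), (2,3)} with unit weights, the path 0, 1, 2 is locally shortest without being shortest, while 0, 1, 2, 3 is not locally shortest because its subpath 0, 1, 2 is not a shortest path.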


Notice that by the optimal-substructure prop- 
erty, a shortest path is locally shortest. The main 
idea of the algorithm is to keep information 
about locally shortest paths in a graph subject 
to edge deletions. The following theorem derived 
from [2] bounds the number of changes in the set 
of locally shortest paths due to an edge deletion: 


Theorem 1 If shortest paths are unique in the 
graph, then the number of paths that start or 
stop being shortest at each deletion is $O(n^2)$ 
amortized over $\Omega(m/n)$ update operations. 


The result of Theorem 1 is purely combinatorial 
and assumes that shortest paths are unique in the 
graph. The latter can be easily achieved using any 
consistent tie-breaking strategy (see, e.g., [2]). 
It is possible to design a deletions-only algorithm 
that pays only $O(\log n)$ time per change in 
the set of locally shortest paths, using a simple 
modification of Dijkstra’s algorithm [3]. Since 
by Theorem 1 the amortized number of changes 
is bounded by $O(n^2)$, this yields the following 
result: 


Theorem 2 Consider a graph with n vertices 
and an initial number of m edges subject to 
a sequence of $\Omega(m/n)$ edge deletions. If shortest 
paths are unique and edge weights are non- 
negative, it is possible to support each delete 
operation in $O(n^2 \log n)$ amortized time, each 
distance query in O(1) worst-case time, and each 
path query in $O(\ell)$ worst-case time, where $\ell$ is the 
number of vertices in the reported shortest path. 
The space used is O(mn). 
Applications 


Application scenarios of dynamic shortest paths 
include network optimization [1], document 
formatting [10], routing in communication 
systems, robotics, incremental compilation, 
traffic information systems [18], and dataflow 
analysis. A comprehensive review of real-world 
applications of dynamic shortest path problems 
appears in [14]. 


URL to Code 


An efficient C language implementation of the 
decremental algorithm addressed in section “Key 
Results” is available at the URL: 
http://www.dis.uniroma1.it/~demetres/experim/dsp. 
Problem Definition 


A dynamic graph algorithm maintains informa- 
tion about a graph that is changing over time. 
Given a property P of the graph (e.g., minimum 
spanning tree), the algorithm must support an 
online sequence of query and update operations, 
where an update operation changes the underly- 
ing graph, while a query operation asks for the 
state of P in the current graph. In the typical 
model studied, each update only affects a single 
edge. In a fully dynamic setting, an update can 
insert or delete an edge or change the weight of an 
existing edge; in a decremental setting an update 
can only delete an edge or increase a weight; in an 
incremental setting an update can insert an edge 
or decrease a weight. 

This entry addresses the decremental 
$(1 + \epsilon)$-approximate all-pairs shortest path problem 
(APSP) in weighted directed graphs. The goal 
is to maintain a directed graph G with real- 
valued nonnegative edge weights under an 
online intermixed sequence of the following 
operations: 


• delete(u, v) (update): remove edge (u, v) from 
G. 

• increase-weight(u, v, w) (update): increase 
the weight of edge (u, v) to w. 

• distance(u, v) (query): return a 
$(1 + \epsilon)$-approximation to the shortest u–v distance 
in G. 

• path(u, v) (query): return a $(1 + \epsilon)$-approximate 
shortest path from u to v. 


A History of Decremental APSP 

The naive approach to the decremental APSP 
problem is to recompute shortest paths from 
scratch after every update, allowing queries to 
be answered in optimal time. Letting n be the 
number of vertices and m the number of edges, 
computing APSP requires $O(mn + n^2 \log\log n)$ 
time in sparse graphs [11] or slightly less than $n^3$ 
in dense graphs (see [13, 14]). Another simple- 
minded approach would be to not perform 
any computation during the updates and to 
simply compute the shortest u–v path from 
scratch when a query arrives. This would lead 
to a constant update time and a query time of 
$O(m + n \log n)$ using Dijkstra’s algorithm with 
Fibonacci heaps [6]. 

One can improve significantly upon both 
the above approaches by reusing information 
between updates. Decremental shortest path 
algorithms have a long history, with the current 
state of the art for the general case of directed 
graphs with real-valued weights being an 
algorithm of Demetrescu and Italiano which 
achieves constant query time and update time 
$O(n^2 \log n)$ [5]. Later papers improved upon 
$O(n^2)$ update time in restricted types of graph. 
In unweighted directed graphs, Baswana et al. 
achieve an amortized update time of $O(n^3 \log^2(n)/m)$ 
for exact distances and $\tilde{O}(\epsilon^{-1} n^2/\sqrt{m})$ for 
$(1 + \epsilon)$-approximate distances [1]. (We say that 
$f(n) = \tilde{O}(g(n))$ if $f(n) = O(g(n)\,\mathrm{polylog}(n))$.) 
Keeping the $(1 + \epsilon)$ approximation, Roditty 
and Zwick further reduced the amortized update 
time to $O(n/\epsilon)$ [12]. 

An amortized update time of O(n) forms a 
natural barrier for decremental APSP because if 
edges are deleted from the graph one at a time, 
an O(n) update time allows us to maintain APSP 
over the entire sequence of deletions in a total 
of O(mn) time; excepting fast matrix multipli- 
cation in dense graphs, this O(mn) matches the 
best known bound for the much simpler problem 
of computing APSP a single time in the static 
setting. Roditty and Zwick achieve this desired 
total update time of O(mn) only for undirected, 
unweighted graphs; this entry focuses on a result 
of Bernstein that achieves the same O(mn) for 
directed graphs with weights polynomial in n [3]. 

There have recently been several results on 
breaking through the O(n) amortized update time 
barrier in undirected graphs by allowing a larger 
than $(1 + \epsilon)$ approximation (see [4, 7, 9]), as 
well as in directed graphs for single-source shortest 
paths [8]. 
Key Results 


Bernstein’s result shows that in a directed graph 
with weights polynomial in n, we can maintain 
$(1 + \epsilon)$-approximate decremental APSP with 
constant query time and a total update time of 
O(mn) over the entire sequence of deletions and 
weight increases (see Theorem 2 below). At a 
high level, Bernstein’s result uses the framework 
from his earlier paper on fully dynamic APSP 
in undirected graphs [2], but all the details and 
techniques change significantly in the shift to 
directed graphs. 


From Weighted Distances to Hop Distances 

The majority of dynamic APSP algorithms use 
as a building block an algorithm of King for 
maintaining a single-source shortest path tree un- 
der deletions [10]. King’s algorithm maintains a 
shortest path tree up to distance d (assuming inte- 
gral weights) in the total update time O(md) over 
all deletions (amortized O(d) per update); hence, 
O(mn) is the total update time in unweighted 
graphs where d < n. This makes it an extremely 
efficient building block for small distances, but 
with two main drawbacks: it is inefficient at han- 
dling vertices that are far apart and it completely 
fails in graphs with large weights where d can 
be very big. Bernstein’s algorithm overcomes the 
second of these problems by showing that if we 
allow a $(1 + \epsilon)$ approximation, then with a simple 
scaling approach, we can shift the dependency 
from the weighted distance d between two ver- 
tices to the unweighted hop distance between 
them. 


Definition 1 The hop distance between two vertices 
x and y is the number of edges on the 
shortest x–y path. The $(1 + \epsilon)$-approximate x–y 
hop distance, denoted h(x, y), is the minimum 
number of edges on any $(1 + \epsilon)$-approximate 
x–y path. 
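
The approximate hop distance of Definition 1 can be computed directly on a static graph with a hop-bounded Bellman-Ford pass. The function below is a sketch under assumptions made here (vertices numbered 0..n-1, non-negative weights, the hypothetical name `approx_hop_distance`); it is meant to illustrate the definition and is not part of the decremental algorithm.

```python
from math import inf


def approx_hop_distance(n, edges, x, y, eps):
    """Fewest edges on any x-y path of weight <= (1 + eps) * delta(x, y).

    Vertices are 0..n-1 and edges are (u, v, weight) with weight >= 0.
    Returns inf if y is unreachable from x.
    """
    # best[k][v] = minimum weight of an x-v walk using at most k edges.
    best = [[inf] * n]
    best[0][x] = 0.0
    for _ in range(n - 1):
        prev = best[-1]
        cur = prev[:]
        for u, v, w in edges:
            if prev[u] + w < cur[v]:
                cur[v] = prev[u] + w
        best.append(cur)
    delta = best[-1][y]  # exact distance: n - 1 edges always suffice
    if delta == inf:
        return inf
    for k, row in enumerate(best):
        if row[y] <= (1 + eps) * delta:
            return k
```

For a chain 0, 1, 2, 3 of unit-weight edges plus a direct edge (0, 3) of weight 3.2, the exact hop distance from 0 to 3 is 3, but allowing a 10% error makes the direct edge admissible and the approximate hop distance drops to 1.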


Theorem 1 Given a directed graph G with non- 
negative real weights and a source s and letting 
R be the ratio of the heaviest to the lightest 
nonzero edge weight in the graph, we can for any 
hop distance h decrementally maintain $(1 + \epsilon)$- 
shortest paths from s to all vertices v for which 
$h(s, v) \le h$. The total update time over the whole 
sequence of deletions and weight increases is 
$O(nh \log(R)/\epsilon)$, which is O(nh) if weights are 
polynomial in n. 


We refer to the above decremental SSSP algorithm 
as h-SSSP. In short, Theorem 1 tells us 
that with a $(1 + \epsilon)$ approximation, we can 
decrementally maintain a shortest path tree in time 
proportional not to the maximum distance of the 
tree (O(nd)) but to the maximum hop distance 
(O(nh)) of the tree. This is a big improvement 
in weighted graphs where $h \ll d$, but still 
inadequate as h can be $\Omega(n)$. Bernstein’s key 
idea is that regardless of whether the original 
graph is weighted or not, we can add weighted 
edges that reduce hop distances in the graph 
and hence allow h-SSSP to run extremely effi- 
ciently. 


Shortcut Edges 

The algorithm of Bernstein works by adding 
many different (weighted) shortcut edges (x, y) 
to the original graph G, which are defined as 
edges that do not exist in G itself and have 
weight w(x, y) satisfying 
$\delta(x, y) \le w(x, y) \le (1 + \epsilon)\,\delta(x, y)$, 
where $\delta(x, y)$ is the shortest x–y 
distance. Note that as the graph changes, $\delta(x, y)$ 
will increase, and so the algorithm will have to 
increase w(x, y) for the shortcut edge to remain 
valid; a shortcut edge (x, y) is not simply com- 
puted once, but must be maintained over the 
whole sequence of edge deletions and weight 
increases. 

It is clear that because the weight of a shortcut 
edge (x, y) is tethered to $\delta(x, y)$, the shortcut 
edges do not change shortest distances in the 
graph. But they do drastically reduce hop 
distances. In an unweighted graph with $\delta(x, y)$ = 
1,000, a single x–y edge of weight 1,000 (or 
slightly larger) decreases h(x, y) from 1,000 to 
1. Moreover, any path that goes through x and y 
can also use the (x, y) shortcut edge to reduce its 
hop distance by 999. 
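
The effect of a single shortcut edge can be reproduced on a small static example. The sketch below is an illustration with made-up data (a directed unit-weight chain of 1,200 edges and a shortcut from vertex 100 to vertex 1100); `dist_and_hops` is a name introduced here. It runs Dijkstra on the pair (weight, number of edges), so that among shortest paths the one with the fewest edges is reported.

```python
import heapq
from math import inf


def dist_and_hops(adj, source):
    """Dijkstra on (weight, hops): shortest distance, then fewest edges."""
    best = {source: (0.0, 0)}
    heap = [(0.0, 0, source)]
    while heap:
        d, h, u = heapq.heappop(heap)
        if (d, h) > best[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            cand = (d + w, h + 1)
            if cand < best.get(v, (inf, inf)):
                best[v] = cand
                heapq.heappush(heap, (cand[0], cand[1], v))
    return best


# Directed chain 0 -> 1 -> ... -> 1200 with unit weights (illustrative data).
N, x, y = 1200, 100, 1100
adj = {i: [(i + 1, 1.0)] for i in range(N)}
before = dist_and_hops(adj, 0)[N]

# Shortcut edge (x, y) whose weight equals the current distance delta(x, y).
delta_xy = dist_and_hops(adj, x)[y][0]
adj[x].append((y, delta_xy))
after = dist_and_hops(adj, 0)[N]
print(before, after)
```

The distance from vertex 0 to vertex 1200 stays at 1,200 in both runs, while the hop count falls from 1,200 to 201, a reduction of 999 for a path that merely passes through x and y.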
Bernstein’s algorithm runs in phases, each 
of which adds more shortcut edges to successively 
decrease all hop distances in the graph 
by a factor of 2. It starts by defining a small 
set of pairs $S_1$ for which it maintains approximate 
shortest paths over the entire sequence of 
updates; this first step can easily be done in 
the desired O(mn log(R)) total update time as 
instead of maintaining all-pairs shortest paths, 
the algorithm only has to maintain $|S_1|$ pairs. 
Now that the algorithm maintains $\delta(x, y)$ for 
every pair (x, y) in $S_1$, it can add shortcut edges 
(x, y) to the graph. These shortcut edges decrease 
hop distances in the graph, thus increasing 
the efficiency of the h-SSSP building block 
and allowing it to maintain approximate shortest 
distances within a slightly larger set of pairs 
$S_2$ in the same O(mn log(R)) total update time. 
Since knowing the shortest distance between two 
vertices allows us to maintain a corresponding 
shortcut edge, maintaining the larger set of 
distances $S_2$ directly leads to a larger set of shortcut 
edges, which further reduce hop distances in 
the graph, allowing h-SSSP to efficiently maintain 
shortest distances for an even larger set of 
pairs $S_3$; this in turn leads to more shortcut 
edges, thus further reducing hop distances and 
allowing h-SSSP to maintain a larger distance 
set $S_4$, and so on. After log(m) such layers, 
there are enough shortcut edges to ensure that 
all hop distances are constant, so by Theorem 1 
h-SSSP can decrementally maintain a shortest 
path tree in the graph in a total update time of 
only $O(n \log(R)/\epsilon)$; doing this from every vertex 
yields the desired bound of $O(mn \log(R)/\epsilon)$ for 
decremental APSP. 


Theorem 2 Let G be a graph with nonnegative 
real-valued edge weights, n vertices, and 
m initial edges subject to an arbitrary sequence 
of $\Delta$ edge deletions and weight increases. Let 
R be the ratio of the heaviest weight that ever 
appears in G to the lightest nonzero weight. 
It is possible to support the whole sequence of 
updates in a total time $O(mn \log(R)/\epsilon) + O(\Delta)$ 
while answering queries with a single O(1) time 
lookup. 
Problem Definition 


An important requirement of wireless ad hoc 
networks is that they should be self-organizing, 
and transmission ranges and data paths may need 
to be dynamically restructured with changing 
topology. Energy conservation and network per- 
formance are probably the most critical issues 
in wireless ad hoc networks, because wireless 
devices are usually powered by batteries only 
and have limited computing capability and mem- 
ory. Hence, in such a dynamic and resource- 
limited environment, each wireless node needs 
to locally select communication neighbors and 
adjust its transmission power accordingly, such 
that all nodes together self-form a topology that 
is energy efficient for both unicast and broadcast 
communications. 

To support energy-efficient unicast, the literature 
prefers a topology with the following 
features: 


1. POWER SPANNER: [1, 9, 13, 16, 17] Formally 
speaking, a subgraph H is called a power 
spanner of a graph G if there is a positive real 
constant $\rho$ such that for any two nodes, the 
power consumption of the shortest path in H 
is at most $\rho$ times the power consumption 
of the shortest path in G. Here $\rho$ is called the 
power stretch factor or spanning ratio. 

2. DEGREE BOUNDED: [1, 9, 11, 13, 16, 17] It 
is also desirable that the logical node degree 
in the constructed topology is bounded from 
above by a small constant. Bounded logical 
degree structures find applications in Blue- 
tooth wireless networks since a master node 
can have only seven active slaves simulta- 
neously. A structure with small logical node 
degree will save the cost of updating the rout- 
ing table when nodes are mobile. A structure 
with a small degree and using shorter links 
could improve the overall network throughput 
[6]. 

3. PLANAR: [1, 4, 13, 14, 16] A network topol- 
ogy is also preferred to be planar (no two 
edges crossing each other in the graph) to 
enable some localized routing algorithms to 
work correctly and efficiently, such as Greedy 
Face Routing (GFG) [2], Greedy Perimeter 
Stateless Routing (GPSR) [5], Adaptive Face 
Routing (AFR) [7], and Greedy Other Adap- 
tive Face Routing (GOAFR) [8]. Notice that 
with planar network topology as the under- 
lying routing structure, these localized rout- 
ing protocols guarantee the message delivery 
without using a routing table: each intermedi- 
ate node can decide which logical neighboring 
node to forward the packet to using only local 
information and the position of the source and 
the destination. 


To support energy-efficient broadcast [15], the 
locally constructed topology is preferred to be 
low-weighted [10, 12]: the total link length of 
the final topology is within a constant factor of 
that of the Euclidean minimum spanning tree 
(EMST). Recently, several localized algorithms 
[10, 12] have been proposed to construct 
low-weighted structures, which indeed approxi- 
mate the energy efficiency of EMST as the net- 
work density increases. However, none of them 
is power efficient for unicast routing. 

Before this work, no known topology control 
algorithm could support power-efficient unicast 
and broadcast in the same structure. It is 
indeed challenging to design a unified topology, 
especially due to the trade-off between the spanner 
and low-weight properties. The main contribution 
of this algorithm is to address this issue. 


Key Results 


This algorithm is the first localized topology con- 
trol algorithm for all nodes to maintain a unified 
energy-efficient topology for unicast and broad- 
cast in wireless ad hoc/sensor networks. In one 
single structure, the following network properties 
are guaranteed: 


1. Power efficient unicast: given any two nodes, 
there is a path connecting them in the structure 
with total power cost no more than $2\rho + 1$ 
times the power cost of any path connecting 
them in the original network. Here $\rho > 1$ is 
some constant that will be specified later in 
this algorithm. It assumes that each node u can 
adjust its power sufficiently to cover its next- 
hop v on any selected path for unicast. 

2. Power efficient broadcast: the power 
consumption for broadcast is within a constant 
factor of the optimum among all locally 
constructed structures. As proved in [10], 
showing this is equivalent to showing that the 
structure is low-weighted. Here a structure is 
called low-weighted if its total edge 
length is within a constant factor of the total 
length of the Euclidean Minimum Spanning 
Tree (EMST). For broadcast or generally 
multicast, it assumes that each node u can 
adjust its power sufficiently to cover its 
farthest down-stream node on any selected 
structure (typically a tree) for multicast. 

3. Bounded logical node degree: each node has 
to communicate with at most $k - 1$ logical 
neighbors, where $k \ge 9$ is an adjustable 
parameter. 

4. Bounded average physical node degree: 
the expected average physical node degree 
is at most a small constant. Here the physical 
degree of a node u in a structure H is defined as 
the number of nodes inside the disk centered 
at u with radius $\max_{uv \in H} \|uv\|$. 

5. Planar: there are no edges crossing each other. 
This enables several localized routing algorithms, 
such as [2, 5, 7, 8], to be performed on 
top of this structure and guarantee the packet 
delivery without using the routing table. 

6. Neighbors $\theta$-separated: the directions between 
any two logical neighbors of any node 
are separated by at least an angle $\theta$, which 
reduces the communication interferences. 


Algorithm 1 SΘGG: Power-Efficient Unicast Topology 

1: First, each node self-constructs the Gabriel graph GG locally. The algorithm to construct GG locally is well known, 
and a possible implementation is given in [13]. Initially, all nodes mark themselves WHITE, i.e., unprocessed. 

2: Once a WHITE node u has the smallest ID among all its WHITE neighbors in N(u), it uses the following strategy 
to select neighbors: 

   1. Node u first sorts all its BLACK neighbors (if available) in N(u) in distance-increasing order, then sorts 
   all its WHITE neighbors (if available) in N(u) similarly. The sorted results are then stored back in N(u), by first 
   writing the sorted list of BLACK neighbors and then appending the sorted list of WHITE neighbors. 

   2. Node u scans the sorted list N(u) from left to right. In each step, it keeps the current neighbor w in the 
   list and deletes every conflicted node v in the remainder of the list. Here a node v is conflicted with w if 
   node v is in the $\theta$-dominating region of node w, where $\theta = 2\pi/k$ ($k \ge 9$) is an adjustable parameter. 

   Node u then marks itself BLACK, i.e., processed, and notifies each deleted neighboring node v in N(u) by 
   broadcasting a message UPDATEN. 

3: Once a node v receives the message UPDATEN from a neighbor u in N(v), it checks whether it is itself in the set 
of nodes to be deleted: if so, it deletes the sending node u from the list N(v); otherwise, it marks u as BLACK in N(v). 

4: When all nodes are processed, all selected links $\{uv \mid v \in N(u), \forall v \in GG\}$ form the final network topology, 
denoted by SΘGG. Each node can shrink its transmission range as long as it still reaches its farthest 
neighbor in the final topology. 


It is the first known localized topology con- 
trol strategy for all nodes together to maintain 
such a single structure with these desired prop- 
erties. Previously, only a centralized algorithm 
was reported in [1]. The construction proceeds in 
two steps: Algorithm 1 builds a power-efficient 
topology for unicast, and Algorithm 2 extends it 
so that it also supports power-efficient 
broadcast. 


Definition 1 ($\theta$-Dominating Region) For each 
neighbor node v of a node u, the $\theta$-dominating 
region of v is the $2\theta$-cone emanating from u, with 
the edge uv as its axis. 
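
The cone test of Definition 1 and the greedy scan that uses it can be sketched as follows. This is a simplified illustration under assumptions made here: nodes are (x, y) tuples, the angle is taken as 2π/k, points exactly on the cone boundary count as dominated, and the BLACK/WHITE ordering and message exchange of the distributed algorithm are omitted. The function names are hypothetical.

```python
import math


def in_dominating_region(u, w, v, theta):
    """True if v lies in the theta-dominating region of neighbor w of u.

    The region is the cone with apex u, axis u->w and half-angle theta
    (total opening 2 * theta). Points are (x, y) tuples.
    """
    ax, ay = w[0] - u[0], w[1] - u[1]
    bx, by = v[0] - u[0], v[1] - u[1]
    # atan2(cross, dot) gives the signed angle between the two rays.
    angle = abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angle <= theta  # closed boundary: an assumption of this sketch


def select_neighbors(u, candidates, k=9):
    """Greedy scan by increasing distance: keep a node, drop conflicting ones."""
    theta = 2 * math.pi / k
    ordered = sorted(candidates, key=lambda p: math.dist(u, p))
    kept = []
    for p in ordered:
        if not any(in_dominating_region(u, q, p, theta) for q in kept):
            kept.append(p)
    return kept
```

Because any two kept neighbors are separated by an angle larger than 2π/k, at most k - 1 of them fit around u, which is the degree bound the construction aims for.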


Let $N_{UDG}(u)$ be the set of neighbors of node u in 
the unit disk graph (UDG), and let N(u) be the set 
of neighbors of node u in the final topology, which 
is initialized as the set of neighbor nodes in GG. 

Algorithm 1 constructs a degree-$(k - 1)$ planar 
power spanner. 


Lemma 1 Graph SΘGG is connected if the 
underlying graph GG is connected. Furthermore, 
given any two nodes u and v, there exists a path 
$\{u, t_1, \ldots, t_r, v\}$ connecting them such that all 
edges have length less than $\sqrt{2}\,\|uv\|$. 


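Theorem 1 The structure SΘGG has node degree 
at most $k - 1$ and is a planar power spanner 
with neighbors $\theta$-separated. Its power stretch 
factor is at most 
$\rho = \frac{\sqrt{2}^{\beta}}{1 - (2\sqrt{2}\sin\frac{\pi}{k})^{\beta}}$, 
where $k \ge 9$ is an adjustable parameter and 
$\beta$ is the path-loss exponent of the power model. 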


Obviously, the construction is consistent for two 
endpoints of each edge: if an edge uv is kept 
by node u, then it is also kept by node v. It is 
worth mentioning that the number 3 in the criterion 
$\|xy\| > \max(\|uv\|, 3\|ux\|, 3\|vy\|)$ is carefully 
selected. 


Theorem 2 The structure LSΘGG is a degree- 
bounded planar spanner. It has a constant power 
spanning ratio $2\rho + 1$, where $\rho$ is the power 
spanning ratio of SΘGG. The node degree is 
bounded by $k - 1$, where $k \ge 9$ is a customizable 
parameter in SΘGG. 


Degree-Bounded Planar Spanner with Low Weight 


Algorithm 2 Construct LSΘGG: Planar Spanner with Bounded Degree and Low Weight 

1: All nodes together construct the graph SΘGG in a localized manner, as described in Algorithm 1. Then, each 
node marks its incident edges in SΘGG unprocessed. 

2: Each node u locally broadcasts its incident edges in SΘGG to its one-hop neighbors and listens to its neighbors. 
Then, each node x can learn the existence of the set of 2-hop links $E_2(x)$, which is defined as follows: 
$E_2(x) = \{uv \in \text{S}\Theta\text{GG} \mid u \text{ or } v \in N_{UDG}(x)\}$. In other words, $E_2(x)$ represents the set of edges in SΘGG with at least one 
endpoint in the transmission range of node x. 

3: Once a node x learns that its unprocessed incident edge xy has the smallest ID among all unprocessed links in 
$E_2(x)$, it will delete edge xy if there exists an edge $uv \in E_2(x)$ (here both u and v are different from x and y), 
such that $\|xy\| > \max(\|uv\|, 3\|ux\|, 3\|vy\|)$; otherwise it simply marks edge xy processed. Here assume that 
uvyx is the convex hull of u, v, x and y. Then the link status is broadcast to all neighbors through a message 
UPDATESTATUS(XY). 

4: Once a node u receives a message UPDATESTATUS(XY), it records the status of link xy at $E_2(u)$. 

5: Each node repeats the above two steps until all edges have been processed. Let LSΘGG be the final structure 
formed by all remaining edges in SΘGG. 


Theorem 3 The structure LSΘGG is low-weighted. 


Theorem 4 Assuming that both the ID and the 
geometry position can be represented by log n 
bits each, the total number of messages during 
constructing the structure LSΘGG is in the 
range of [5n, 13n], where each message has at 
most O(log n) bits. 


Compared with previously known low-weighted 
structures [10, 12], LSΘGG not only achieves 
more desirable properties, but also costs far 
fewer messages during construction. To construct 
LSΘGG, each node only needs to collect the 
information $E_2(x)$, which costs at most 6n 
messages for n nodes. Algorithm 2 can 
be generally applied to any known degree- 
bounded planar spanner to make it low-weighted 
while keeping all its previous properties, except 
increasing the spanning ratio from $\rho$ to $2\rho + 1$ 
theoretically. 

In addition, the expected average node 
interference in the structure is bounded by a small 
constant. This is significant on its own due to the 
following reasons: it has been taken for granted 
that “a network topology with small logical node 
degree will guarantee a small interference” and 
recently Burkhart et al. [3] showed that this 
is not true generally. This work also shows 
that, although generally a small logical node 
degree cannot guarantee a small interference, the 
expected average interference is indeed small if 
the logical communication neighbors are chosen 
carefully. 


Theorem 5 For a set of nodes produced by 
a Poisson point process with density n, the 
expected maximum node interferences of EMST, 
GG, RNG, and Yao are at least O(log n). 


Theorem 6 For a set of nodes produced by 
a Poisson point process with density n, the 
expected average node interferences of EMST 
are bounded from above by a constant. 


This result also holds for nodes deployed with 
uniform random distribution. 


Applications 


Localized topology control in wireless ad hoc 
networks is a critical mechanism to maintain 
network connectivity and provide feedback to 
communication protocols. Most traffic in 
networks is unicast communication. There 
is a compelling need to conserve energy and 
improve network performance by maintaining 
an energy-efficient topology in localized ways. 
This algorithm achieves this by choosing 
relatively smaller power levels and size of 
communication neighbors for each node (e.g., 
reducing interference). Also, broadcasting is 
often necessary in MANET routing protocols. 
For example, many unicast routing protocols 
such as Dynamic Source Routing (DSR), Ad 
Hoc On Demand Distance Vector (AODV), Zone 
Routing Protocol (ZRP), and Location Aided 
Routing (LAR) use broadcasting or a derivation 
of it to establish routes. It is highly important 
to use power-efficient broadcast algorithms for 
such networks since wireless devices are often 
powered by batteries only. 
Problem Definition 


The problem is to construct a spanning tree of 
small degree for a connected undirected graph 
G = (V, E). In the Steiner version of the problem, 
a set of distinguished vertices $D \subseteq V$ is 
given along with the input graph G. A Steiner 
tree is a tree in G which spans at least the 
set D. 

As finding a spanning or Steiner tree of the 
smallest possible degree $\Delta^*$ is NP-hard, one is 
interested in approximating this minimization 
problem. For many such combinatorial optimization 
problems, the goal is to find in polynomial time 
an approximation within a constant or larger factor. For 
the spanning and Steiner tree problems, the iterative 
polynomial time approximation algorithms 
of Fürer and Raghavachari [8] (see also [14]) 
find much better solutions. The degree $\Delta$ of the 
solution tree is at most $\Delta^* + 1$. 

There are very few natural NP-hard optimiza- 
tion problems for which the optimum can be 
achieved up to an additive term of 1. One such 
problem is coloring a planar graph, where col- 
oring with four colors can be done in poly- 
nomial time. On the other hand, 3-coloring is 
NP-complete even for planar graphs. Another 
such problem is edge coloring a graph of degree 
$\Delta$. While coloring with $\Delta + 1$ colors is always 
possible in polynomial time, $\Delta$-edge coloring is 
NP-complete. 

Chvátal [3] has defined the toughness t(G) 
of a graph as the minimum ratio |X|/c(X) such 
that the subgraph of G induced by V \ X has 
$c(X) \ge 2$ connected components. The inequality 
$1/t(G) \le \Delta^*$ immediately follows. Win [17] has 
shown that $\Delta^* < 1/t(G) + 3$; i.e., the inverse of 
the toughness is actually a good approximation 
of $\Delta^*$. 

A set X, such that the ratio |X|/c(X) is the 
toughness t(G), can be viewed as witnessing the 
upper bound |X|/c(X) on t(G) and therefore the 
lower bound c(X)/|X| on Δ*. Strengthening this 
notion, Fürer and Raghavachari [8] define X to 
be a witness set for Δ* ≥ d if d is the smallest 
integer greater than or equal to (|X| + c(X) − 1)/|X|. 
Their algorithm not only outputs a spanning tree, 
but also a witness set X, proving that its degree is 
at most Δ* + 1. 
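
The lower bound certified by a witness set can be evaluated directly. The following is a minimal sketch, not the algorithm of [8]: it only computes d = ⌈(|X| + c(X) − 1)/|X|⌉ for a given set X. The adjacency-list representation and the function names are assumptions made for illustration.

```python
from collections import deque


def components_after_removal(adj, removed):
    """Count the connected components of the graph induced by V \\ removed."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search inside one component
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return count


def witness_degree_bound(adj, X):
    """Smallest integer d with d >= (|X| + c(X) - 1) / |X|."""
    x = len(set(X))
    c = components_after_removal(adj, X)
    numerator = x + c - 1
    return (numerator + x - 1) // x  # integer ceiling


# A star with five leaves: removing the center leaves five components,
# so every spanning tree has degree at least 5.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(witness_degree_bound(star, {0}))
```

The bound reflects that any spanning tree needs at least |X| + c(X) − 1 edges incident to X in order to connect X with the c(X) components.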


Key Results 


The minimum degree spanning tree and Steiner 
tree problems are easily seen to be NP-hard, 
as they contain the Hamiltonian path problem. 
Hence, we cannot expect a polynomial time 
algorithm to find a solution of minimal possible 
degree Δ*. The same argument also shows that 
an approximation by a factor less than 3/2 is 
impossible in polynomial time unless P = NP. 

Initial approximation algorithms obtained 
solutions of degree O(Δ* log n) [6], where n = |V| 
is the number of vertices. The optimal result for 
the spanning tree case has been obtained by Fürer 
and Raghavachari [7, 8]. 


Theorem 1 Let Δ* be the degree of an 
unknown minimum degree spanning tree of an input 
graph G = (V, E). There is a polynomial time 
approximation algorithm for the minimum degree 
spanning tree problem that finds a spanning tree 
of degree at most Δ* + 1. 


Later this result has been extended to the Steiner 
tree case [8]. 


Theorem 2 Assume a Steiner tree problem is 
defined by a graph G = (V, E) and an arbitrary 
subset D of vertices V. Let Δ* be the degree of 
an unknown minimum degree Steiner tree of G 
spanning at least the set D. There is a polynomial 
time approximation algorithm for the minimum 
degree Steiner tree problem that finds a Steiner 
tree of degree at most Δ* + 1. 


Both approximation algorithms run in time 
O(mn α(m, n) log n), where m is the number 
of edges and α is the inverse Ackermann 
function. 
Applications 


Some possible direct applications are in networks 
for noncritical broadcasting, where it might be 
desirable to bound the load per node, and in 
designing power grids, where the cost of splitting 
increases with the degree. Another major benefit 
of a small degree network is limiting the effect of 
node failure. 

Furthermore, the main results on approximat- 
ing the minimum degree spanning and Steiner 
tree problems have been the basis for approximat- 
ing various network design problems, sometimes 
involving additional parameters. 

Klein, Krishnan, Raghavachari, and Ravi [11] 
find 2-connected subgraphs of approximately 
minimal degree in 2-connected graphs, as well 
as approximately minimal degree spanning trees 
(branchings) in directed graphs. Their algorithms 
run in quasi-polynomial time, and approximate 
the degree Δ* by (1 + ε)Δ* + O(log_{1+ε} n). 

Often the goal is to find a spanning tree that 
simultaneously has a small degree and a small 
weight. For a graph having a minimum weight 
spanning tree (MST) of degree Δ* and weight 
w, Fischer [5] finds a spanning tree with degree 
O(Δ* + log n) and weight w (i.e., an MST of 
small degree) in polynomial time. 

Könemann and Ravi [12, 13] provide a 
bicriteria approximation. For a given B* ≥ Δ*, 
let w be the minimum weight of any spanning 
tree of degree at most B*. The polynomial 
time algorithm finds a spanning tree of degree 
O(B* + log n) and weight O(w). In the second 
paper, the algorithm adapts to the case of 
a different degree bound on each vertex. 
Chaudhuri et al. [2] further improved this result 
to approximate both the degree B* and the weight 
w by a constant factor. 

In another extension of the minimum degree 
spanning tree problem, Ravi and Singh [15] have 
obtained a strict generalization of the Δ* + 1 
spanning tree approximation [8]. Their polynomial 
time algorithm finds an MST of degree 
Δ* + k for the case of a graph with k distinct 
weights on the edges. 

Recently, there have been some drastic 
improvements. Again, let w be the minimum cost of 
a spanning tree of given degree B*. Goemans [9] 
obtains a spanning tree of cost w and degree 
B* + 2. Finally, Singh and Lau [16] decrease 
the degree to B* + 1 and also handle individual 
degree bounds B_v for each vertex v in the same 
way. 

Interesting approximation algorithms are also 
known for the 2-dimensional Euclidean minimum 
weight bounded degree spanning tree problem, 
where the vertices are points in the plane and 
edge weights are the Euclidean distances. Khuller, 
Raghavachari, and Young [10] show factor 1.5 
and 1.25 approximations for degree bounds 3 
and 4, respectively. These bounds have later been 
improved slightly by Chan [1]. Slightly weaker 
results are obtained by Fekete et al. [4], using 
flow-based methods, for the more general case 
where the weight function just satisfies the 
triangle inequality. 


Open Problems 


The time complexity of the minimum degree 
spanning and Steiner tree algorithms [8] is 
O(mna(m,n)logn). Can it be improved to 
O(mn)? In particular, what can be gained by 
initially selecting a reasonable Steiner tree with 
some greedy technique instead of starting the 
iteration with an arbitrary Steiner tree? 

Is there an efficient parallel algorithm that 
can obtain a Δ* + 1 approximation in 
polylogarithmic time? Fürer and Raghavachari [6] 
have obtained such an NC-algorithm, but only 
with a factor O(log n) approximation of the 
degree. 
Problem Definition 


The Delaunay triangulation and the Voronoi 
diagram are two classic geometric structures in the 
field of computational geometry. Their success 
can perhaps be attributed to two main reasons: 
Firstly, there exist practical, efficient algorithms 
to construct them; and secondly, they have an 
enormous number of useful applications ranging 
from meshing and 3D-reconstruction to 
interpolation. 


Given a set S of n sites in some space E, we 
define the Voronoi region V_S(p) of p ∈ S to be 
the set of points in E whose nearest neighbor in 
S is p (for some distance δ): 

$$V_S(p) = \{x \in E : \forall q \in S \setminus \{p\},\ \delta(x, p) < \delta(x, q)\}.$$

It is easily seen that these regions form a partition 
of E into convex regions which we refer to as 
cells. These concepts may be extended into more 
exotic spaces such as periodic and hyperbolic 
spaces or metric spaces using convex distances, 
though we restrict ourselves to the case where E 
is the Euclidean space E = R^d and the distance 
δ is the L_2 norm. 

The Voronoi diagram V(S) may now be 
defined as the boundary between the different 
Voronoi cells: 

$$V(S) = E \setminus \bigcup_{p \in S} V_S(p).$$


The Delaunay triangulation D(S) is the 
geometric dual of V(S). More formally, D(S) is a 
simplicial complex defined by 

$$\sigma \in D(S) \iff \bigcap_{p \in \sigma} \overline{V_S(p)} \neq \emptyset,$$

where $\overline{V_S(p)}$ is the closure of the Voronoi cell 
V_S(p) (see Fig. 1). 

Voronoi diagrams and Delaunay triangulations 
have received a lot of attention in the literature 
with several surveys, books, or book chapters 
(e.g., [4, 14]) and hundreds of papers. In this 
article, we will focus on randomized construction 
algorithms for the Delaunay triangulation. Such 
algorithms use randomness to speed up their 
running time but do not assume any randomness 
in the data distribution. 

Delaunay Triangulation and Randomized Constructions, 
Fig. 1 The Voronoi diagram of a set S of 15 points 
and its dual Delaunay triangulation 


Key Results 
Delaunay Properties 


Empty Ball Property 

One crucial property of the Delaunay triangula- 
tion, which is the basis of many algorithms, is 
the empty ball property, which guarantees that a 
triangle is a Delaunay triangle of S if and only if 
the interior of its circumball does not contain any 
point of S. 


Size of the Triangulation 

In the plane, the combinatorial properties of a 
triangulation (not necessarily Delaunay) are 
completely fixed by the Euler relation. In particular, 
given n vertices, h of which are on the convex hull, 
every triangulation must have 2n − 2 − h triangles 
and 3n − 3 − h edges. In dimension d, the 
Dehn-Sommerville relations yield a linear dependence 
for the number of simplices of all dimensions 
on the number of simplices of dimensions k 
for k < ⌈d/2⌉; this gives an O(n^{⌈d/2⌉}) upper 
bound for the number of simplices of all 
dimensions. For both Delaunay and more general 
triangulations, these bounds are tight in the worst 
case. 

These bounds can be tightened given some 
assumptions on the distribution of the input sites. If 
the points are uniformly distributed in a compact 
convex set of fixed volume, then the triangulation 
size (its total number of simplices) is O(n), with 
a constant exponential in d [9]. In 3D, and for 
reconstruction purposes, it is convenient to assume 
that the points lie on a surface. It is known that 
the Delaunay triangulation of points uniformly 
distributed on a convex polyhedron has size O(n) 
(for a constant depending on the polyhedron 
complexity). For points uniformly distributed on 
a (nonconvex) polyhedron, the triangulation’s 
size is between Ω(n) and O(n log n) [12]. If, 
instead of making a probabilistic assumption, we 
assume that the points are a “good sampling” 
of the surface such that every small ball 
centered on the surface contains between 1 and κ 
points (where κ is a constant), then the size of 
the Delaunay triangulation is Θ(n) for a 
polyhedron, O(n log n) for a generic smooth 
surface [3], and Ω(n√n) for a nongeneric surface 
(e.g., a cylinder). In the case of the cylinder, 
a uniformly distributed point set has a 
triangulation of size O(n log n). In dimension d, a 
p-dimensional polyhedron whose faces have a 
“good sampling” has a Delaunay triangulation of 
size O(n^{(d−k+1)/p}), where k = ⌈(d+1)/(p+1)⌉. 


First Algorithms 
Many classical techniques in algorithmic and 
computational geometry have been used to at- 
tack the problem of constructing the Delaunay 
triangulation and the Voronoi Diagram. The gift 
wrapping and the incremental approaches were 
introduced in the 1970s [11], followed by some 
worst case optimal algorithms in 2D, based on 
divide-and-conquer [13] and sweep line tech- 
niques [10]. In higher dimensions, the optimal 
worst case construction of Delaunay triangulation 
and convex hulls was solved in the 1990s. 

In the remainder of the entry, we will describe 
some further algorithmic techniques that may be 
used to construct the Delaunay triangulation. 


Randomized Construction 
One popular and efficient method, applied 
to the Delaunay triangulation at the end of 
the 1980s [5], is Randomized Incremental 
Construction (RIC). The idea is to exploit the 
simplicity of an incremental algorithm while 
avoiding its worst case behavior by simply 
adjusting the order of insertion of the points. 


Conflict Graph 

Recalling that D(S) is the set of triangles with 
vertices in S whose circumballs are empty, the 
idea is to maintain for a sequence ∅ = S_0 ⊂ 
S_1 ⊂ S_2 ⊂ ... ⊂ S_n = S, where |S_i| = i, a 
sequence of triangulations D(S_i) with associated 
conflict graphs. We define the conflict graph to 
be a bipartite graph that links a point p of S \ S_i 
to a simplex σ in D(S_i) if the circumball of σ 
contains p (p and σ are then said to be in conflict). 
The information contained in the conflict graph 
simplifies the construction of D(S_{i+1}) from D(S_i) 
since it gives directly the simplices in D(S_i) \ 
D(S_{i+1}). 
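
In the plane, the conflict relation can be tested with the classical in-circle determinant. The sketch below is a minimal illustration under stated assumptions: points are (x, y) tuples, triangles are triples of points, and the names in_conflict and conflicts_of are hypothetical helpers, not part of any library. It uses plain floating-point arithmetic, whereas robust implementations rely on exact predicates.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_conflict(triangle, p):
    """True if p lies strictly inside the circumcircle of the triangle."""
    a, b, c = triangle
    if orientation(a, b, c) < 0:  # make the triangle counterclockwise
        b, c = c, b
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    return det > 0


def conflicts_of(triangles, p):
    """Triangles whose circumcircle contains p: the neighbors of p in the conflict graph."""
    return [t for t in triangles if in_conflict(t, p)]


triangle = ((0, 0), (1, 0), (0, 1))
print(in_conflict(triangle, (0.5, 0.5)), in_conflict(triangle, (2, 2)))
```

Inserting p destroys exactly the triangles returned by conflicts_of, which is why storing this relation avoids a separate point location step.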

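The key point comes from an analysis based 
on random sampling [6]; let us assume that S_i is 
a random sample of size i of S. We say that a 
simplex has width j if it has j points in conflict 
in S; a Delaunay simplex is then a 
simplex of width 0. Denote by Λ_j the number of 
simplices of width j and let Λ_{≤k} = Σ_{j≤k} Λ_j. 
We first bound Λ_{≤k} using the following remark: 
a simplex σ of width j is a Delaunay simplex 
of a random sample R of size n/k of S with 
probability 

$$p_j \simeq \Bigl(\frac{1}{k}\Bigr)^{d+1} \Bigl(1 - \frac{1}{k}\Bigr)^{j}$$

(vertices of σ must be chosen in R and points in 
conflict must not). Notice that for j ∈ [2 ... k], we have 
(1 − 1/k)^j ≥ (1 − 1/k)^k ≥ 1/4 since (1 − 1/x)^x is an 
increasing function of value 1/4 for x = 2. For 
k ≥ 2, we have (using $\mathbb{P}$ to denote the probability 
measure and $\mathbb{E}$ the expectation) 

$$\mathbb{E}\bigl[|D(R)|\bigr] = \sum_{\sigma} \mathbb{P}\bigl(\sigma \in D(R)\bigr) = \sum_{j \le n} \Lambda_j\, p_j \ \ge\ \sum_{j \le k} \Lambda_j\, p_j \ \ge\ \frac{1}{4\,k^{d+1}}\, \Lambda_{\le k},$$

and hence 

$$\Lambda_{\le k} \le O\bigl(k^{d+1}\, \mathbb{E}[|D(R)|]\bigr) = O\Bigl(k^{d+1} \Bigl(\frac{n}{k}\Bigr)^{\lceil d/2 \rceil}\Bigr) = O\bigl(n^{\lceil d/2 \rceil}\, k^{\lfloor d/2 \rfloor + 1}\bigr).$$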
We can now analyze the incremental construction 
of D(S). The probability that a triangle σ of width 
j appears during the construction is 

$$p'_j = \frac{(d+1)!\; j!}{(j+d+1)!}$$

(the number of permutations that look at the 
vertices of σ before the points in conflict divided 
by the total number of permutations). Then the 
cost of the algorithm is given by the total number 
of conflicts occurring during the construction: 

$$\sum_{\sigma} \operatorname{width}(\sigma)\, \mathbb{P}(\sigma \text{ appears}) = \sum_{j} j\, \Lambda_j\, p'_j = \sum_{j} \bigl(\Lambda_{\le j} - \Lambda_{\le j-1}\bigr)\, j\, p'_j$$

$$\le \sum_{j < n} \Lambda_{\le j} \bigl(j\, p'_j - (j+1)\, p'_{j+1}\bigr) + O(n) = \sum_{j < n} \Lambda_{\le j}\; O\bigl(j^{-d-1}\bigr) + O(n) \le O\Bigl(n^{\lceil d/2 \rceil} \sum_{j} j^{-\lceil d/2 \rceil}\Bigr),$$

which gives O(n log n) for d = 2 and O(n^{⌈d/2⌉}) 
for higher d. 
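
The permutation-counting argument behind this probability can be checked by brute force on small cases. This is a minimal sketch under the assumption that only the relative order of the d + 1 vertices and the j conflicting points matters; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def appearance_probability_formula(d, j):
    """(d+1)! j! / (j+d+1)!: the d+1 vertices precede all j conflicting points."""
    return Fraction(factorial(d + 1) * factorial(j), factorial(j + d + 1))


def appearance_probability_bruteforce(d, j):
    """Enumerate all insertion orders of the d+1 vertices and j conflicting points."""
    vertices = set(range(d + 1))           # labels 0..d are the simplex vertices
    total = favorable = 0
    for order in permutations(range(d + 1 + j)):
        total += 1
        # The simplex appears iff the first d+1 inserted points are its vertices.
        if set(order[: d + 1]) == vertices:
            favorable += 1
    return Fraction(favorable, total)


print(appearance_probability_formula(2, 2))
```

For d = 2 the formula equals 6 / ((j+1)(j+2)(j+3)), which decays like j^{-3}.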


Backward Analysis 

A simpler way of analyzing RIC is backward 
analysis [15], and we will sketch it in 2D. The 
idea is quite simple and consists in asking: what 
is the cost of the last step? The answer is that 
the cost of modifying the triangulation during the 
last insertion is clearly proportional to the degree 
(number of incident simplices) of the last point 
inserted into the triangulation. Since the last point 
is a random point, its expected degree is 

$$\mathbb{E}\bigl(\mathrm{degree}(\text{last point})\bigr) = \mathbb{E}\bigl(\mathrm{degree}(\text{random point})\bigr) = \frac{1}{n} \sum_{p \in S} \mathrm{degree}(p) < \frac{6n}{n} = 6,$$

and summing over all insertion steps gives a 
linear cost for updating the triangulation. It 
remains to count the cost of updating the conflict 
graph. We remark that there is a conflict between 
the last point p_n and a triangle created by the 
insertion of the jth point p_j if and only if the 
edge p_j p_n exists in D(S_j ∪ {p_n}). Since p_j 
and p_n are both random points of S_j ∪ {p_n}, 
this happens with probability O(1/j); the expected 
number of conflicts for p_n is thus O(Σ_j 1/j) = 
O(log n), and the total number of conflicts is 
O(Σ_k log k) = O(n log n). 


Delaunay Hierarchy 
The conflict graph approach assumes complete 
knowledge of S to initialize the conflict graph. 
Using a lazy approach and postponing the 
conflict determination, it is possible to obtain online 
algorithms [5]. 

Among the online schemes to construct 
the Delaunay triangulation, the Delaunay 
hierarchy [7] gives good results both in theory 
and in practice. The Delaunay hierarchy 
constructs a sequence of random samples 
S = S_0 ⊇ S_1 ⊇ ... ⊇ S_h such that 
P(p ∈ S_i | p ∈ S_{i−1}) = α. Then the Delaunay 
triangulations D(S_i), 0 ≤ i ≤ h, are 
maintained under point insertions. Pointers 
from a vertex of D(S_i) to the vertices at the 
same position in D(S_{i−1}) (if i > 0) and in 
D(S_{i+1}) (if it actually belongs to S_{i+1}) are also 
computed. 

When a new point p needs to be inserted, it is 
located by walking in D(S_h) (using neighborhood 
relations) to reach the closest vertex w_h of p 
in D(S_h). Then the hierarchy is descended, 
walking in D(S_i) from w_{i+1} to find w_i, the 
closest neighbor in sample S_i. Using these 
neighbors, it is easy to insert p in D(S_0) and 
in the triangulation of other samples that the 
random process assigns to p. 

In 2D, the expected cost of the walk at any 
level is O(α^{-1}) and the expected value for h is 
O(log n / log α^{-1}). Thus the theoretical complexity of 
the algorithm is O(n log n). The value of α can 
be optimized depending on the input distribution: 
for random points, a small constant α gives good timings and 
a very low memory requirement in addition to the 
one needed for D(S). 


A Less Randomized Construction 

Constructing the Delaunay triangulation by in- 
serting the points in a random order presents a 
drawback with respect to memory management. 
Since the inserted point is random in S, there is 
very little chance that the triangles needed are 
present in the cache memory. So, an idea is to 
sort the points using a space-filling curve (see 
Fig. 2-left) to ensure locality of the insertions. 
Unfortunately, when inserting the points in such 
an order, the randomized complexity results no 
longer apply and the number of created and 
destroyed triangles during the construction may 
explode on certain data sets. 

A smart solution has been proposed: it is 
possible to use an insertion order random enough 
to apply randomized complexity results and allow 
some locality to benefit from cache memory. 
BRIO (Biased Randomized Insertion Order) [1] 
proposes to partition S into a set of random samples 
S = ∪_{0≤i≤h} S_i such that |S_i| = α|S_{i+1}| for 
a small constant α < 1 (e.g., α = 1/2), and 
to insert the samples by increasing size, each 
sample being sorted along a space-filling curve 
(see Fig. 2-right). 
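
The insertion order itself is simple to generate. The following is a rough sketch rather than the CGAL implementation: it assumes points in the unit square, a halving probability of 1/2 per round, and it sorts each round along a Z-order (Morton) curve instead of the Hilbert curve, purely to keep the code short. The names morton_key and brio_order are illustrative.

```python
import random


def morton_key(point, bits=16):
    """Z-order key of a point in [0, 1)^2: interleave the bits of its grid coordinates."""
    scale = 1 << bits
    x = min(int(point[0] * scale), scale - 1)
    y = min(int(point[1] * scale), scale - 1)
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def brio_order(points, seed=0):
    """Biased randomized insertion order: random rounds, each sorted spatially."""
    rng = random.Random(seed)
    remaining = list(points)
    rounds = []
    while len(remaining) > 1:
        late, early = [], []
        for p in remaining:
            # Each point joins the current (later, larger) round with probability 1/2.
            (late if rng.random() < 0.5 else early).append(p)
        rounds.append(late)
        remaining = early
    rounds.append(remaining)
    rounds.reverse()                       # smallest sample is inserted first
    order = []
    for sample in rounds:
        order.extend(sorted(sample, key=morton_key))
    return order


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(1000)]
print(len(brio_order(pts)))
```

Points that are consecutive within a round are close in space, so the triangles touched by consecutive insertions tend to be already in cache.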

In the random setting, we have seen that 
the probability for a triangle of width j to 
appear in the conflict graph algorithm was 
(d+1)! j! / (j+d+1)!, which is Θ(j^{-3}) for d = 2. 
Using BRIO, this probability is a bit more 
intricate to compute, but 
it can be bounded in terms of α and it can be 
shown that it is still Θ(j^{-3}) and thus randomized 
complexity results still apply. 


Experimental Results 


On a 16 GB, 2.3 GHz desktop, CGAL currently 
computes the Delaunay triangulation of up to 
200M points in 2D and 50M points in 3D [8]. 

Static timings are almost constant with respect 
to the total number of points and are about 1 μs 
per point in 2D and 8 μs per point in 3D. In the 
dynamic setting, one million points are processed 
in 6 s in 2D and 25 s in 3D. 


URLs to Code and Data Sets 


CGAL, among a big collection of computational 
geometry algorithms, provides implementations 
for Delaunay triangulations in 2D, 3D, and gen- 
eral dimension. It computes the Delaunay tri- 
angulation in 2D and 3D using the Delaunay 
hierarchy in a dynamic setting and using BRIO 
for static computation (http://cgal.org). 


Delaunay Triangulation and Randomized Constructions, Fig. 2 Right: the Hilbert space-filling curve. Left: 
random points sorted with BRIO 
Problem Definition 


Satisfiability is the central NP-complete problem. 
Given a Boolean formula in conjunctive normal 
form, for example, (x ∨ y ∨ z̄) ∧ (x̄ ∨ z̄) ∧ ..., 
decide whether there is a satisfying assignment. 


Derandomization of k-SAT Algorithm 


An important subclass is k-SAT, where the input 
is restricted to k-CNF formulas: CNF formulas in 
which every clause has at most k literals. In 1999, 
Uwe Schöning [6] gave an extremely simple 
randomized algorithm for k-SAT of running time 

$$\Bigl(\frac{2(k-1)}{k}\Bigr)^{n} \mathrm{poly}(n).$$


In particular, this solves 3-SAT in time 
O*(1.334^n), 4-SAT in time O*(1.5^n), 
and so on (we use O* to suppress polynomial 
factors in n). Several authors have attempted to 
derandomize Schöning’s algorithm, albeit at the 
cost of a greater running time: an algorithm of 
Dantsin, Goerdt, Hirsch, Kannan, Kleinberg, 
Papadimitriou, Raghavan, and Schöning [2] 
runs in time O*((2k/(k + 1))^n), which for 
k = 3 is O*(1.5^n). For k = 3, Brueggemann 
and Kern [1] achieve O(1.473^n); Scheder [5] 
achieves O(1.465^n); Kutzkov and Scheder [3] 
reduced this to O(1.439^n). All improvements 
suffer from three drawbacks: they fall short 
of achieving the running time of Schöning’s 
randomized algorithm; most of them are tailored 
to k = 3; finally, they are all fairly complicated. 


Key Results 


We describe a simple deterministic algorithm due 
to Moser and Scheder [4] with a running time that 
matches that of Schöning’s up to subexponential 
overhead. That is, we prove the following 
theorem: 


Theorem 1 There is a deterministic algorithm 
deciding satisfiability of k-CNF formulas over n 
variables in time (2(k − 1)/k)^{n+o(n)}. 


Notation 


For a CNF formula F, we denote by vbl(F) the 
set of variables appearing in F. Usually n = 
|vbl(F)| denotes the number of variables in a 
formula. By {0,1}^{vbl(F)} (or {0,1}^n), we denote 
the set of all truth assignments to these variables. 
We make frequent use of the notation F^{[u=b]}, 
which is the CNF formula created from F by 
replacing every occurrence of the literal u by b 
and of ū by 1 − b. For a literal u and a truth 
assignment α, we denote by α[u = b] the truth 
assignment that sets u to b and agrees with α 
otherwise. 


A Promise Version of k-SAT 


The heart of the proof will be an algorithm 
solving a promise version of k-SAT: 


Theorem 2 Let F be a k-CNF formula over n 
variables, α ∈ {0,1}^n a (not necessarily 
satisfying) assignment, and r ∈ N_0. Let d_H denote 
the Hamming distance. There is a deterministic 
algorithm sb-fast with the following 
properties: 


1. If F is unsatisfiable, sb-fast(F, α, r) returns 
unsatisfiable. 

2. If F has a satisfying assignment α* with 
d_H(α, α*) ≤ r, then sb-fast(F, α, r) 
returns satisfiable. 

3. Otherwise (i.e., if F is satisfiable but all 
satisfying assignments of F are too far from 
α), then sb-fast(F, α, r) might return 
unsatisfiable or satisfiable. 


Furthermore, sb-fast runs in time 
(k − 1)^{r+o(n)} poly(n). 

The “inner random walk” in Schöning’s 
algorithm has all properties stated in Theorem 2, 
except that it is randomized (with a small error 
probability). Combining Theorem 2 with the 
covering code machinery of Dantsin, Goerdt, 
Hirsch, Kannan, Kleinberg, Papadimitriou, 
Raghavan, and Schöning [2] yields our main 
result, Theorem 1. 


Theorem 3 Suppose there is an algorithm A 
which satisfies properties 1-3 of Theorem 2 and 
runs in time c^r poly(n). Then there is an 
algorithm B solving k-SAT in time (2c/(c + 1))^{n+o(n)}. 
Furthermore, B is deterministic if A is. 


Plugging in c = k − 1 yields Theorem 1. 
Preliminaries: A Slower Algorithm 


Dantsin et al. [2] give a deterministic algorithm, 
henceforth called sb-slow, satisfying Points 1-3 
of Theorem 2 with a running time of k^r poly(n). 
We will start by explaining and analyzing this 
algorithm because our algorithm sb-fast uses 
it as a subroutine in case the input formula F is 
“well behaved.” 


Algorithm sb-slow(F, α, r). F is 
a k-CNF formula over n variables, α ∈ 
{0,1}^{vbl(F)} a truth assignment, and r ∈ 
N_0. 


1. If α satisfies F, return satisfiable. 
2. Else if r = 0, return unsatisfiable. 
3. Else: 

(a) Pick some clause C = (u_1 ∨ ... ∨ u_ℓ) 
unsatisfied by α. Note that ℓ ≤ k holds 
in any case, but ℓ < k is possible. 

(b) Set F_i := F^{[u_i=1]} for all 1 ≤ i ≤ ℓ. 

(c) Call sb-slow(F_i, α, r − 1) for all 
1 ≤ i ≤ ℓ. 

(d) If some of these ℓ recursive calls 
returns satisfiable, return 
satisfiable, otherwise return 
unsatisfiable. 
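
The recursion is short enough to write out in full. This is a minimal sketch under stated assumptions: clauses are lists of nonzero integers in the DIMACS style (v for a variable, -v for its negation), the assignment is a dict from variable to bool, and the function returns True for "satisfiable" and False for "unsatisfiable". The helper names are illustrative.

```python
def literal_true(lit, alpha):
    """True if the literal evaluates to true under the assignment alpha."""
    return (lit > 0) == alpha[abs(lit)]


def restrict(formula, lit):
    """F^[lit=1]: drop clauses containing lit, delete the complementary literal elsewhere."""
    return [[l for l in clause if l != -lit]
            for clause in formula if lit not in clause]


def sb_slow(formula, alpha, r):
    """Search the Hamming ball of radius r around alpha for a satisfying assignment."""
    unsatisfied = next(
        (c for c in formula if not any(literal_true(l, alpha) for l in c)), None)
    if unsatisfied is None:
        return True                      # step 1: alpha satisfies F
    if r == 0:
        return False                     # step 2: search radius exhausted
    # Step 3: branch on each literal of the chosen unsatisfied clause.
    # An empty clause yields no branches, so the result is False.
    return any(sb_slow(restrict(formula, u), alpha, r - 1) for u in unsatisfied)


F = [[1, 2], [-1, 2], [1, -2]]           # only x1 = x2 = true satisfies F
alpha = {1: False, 2: False}
print(sb_slow(F, alpha, 1), sb_slow(F, alpha, 2))
```

Each call branches at most k ways and the radius drops by one per level, which is where the k^r factor in the running time comes from.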


It is obvious that sb-slow runs in time 
k^r poly(n). For correctness, note that sb-slow 
returns unsatisfiable if F is unsatisfiable. 
If F has a satisfying assignment α* with 
d_H(α, α*) ≤ r, let (u_1 ∨ ... ∨ u_ℓ) be the clause 
picked in step (3a). Now α* satisfies some literal 
u_i in that clause, and thus the formula 
F_i := F^{[u_i=1]} is satisfiable, 
as well. Since neither u_i nor ū_i appears in F_i, 
we see that α*[u_i = 0] satisfies F_i. Since 
d_H(α*[u_i = 0], α) = d_H(α*, α) − 1 ≤ r − 1, 
the call sb-slow(F_i, α, r − 1) will return 
satisfiable. 


Speeding Up the Algorithm 


Let k ∈ N_0 be fixed from now on. 


Definition 1 Let F be a k-CNF formula and α ∈ 
{0,1}^{vbl(F)}. We say that F is good (with respect 
to α) if α satisfies all k-clauses of F, that is, all 
clauses with exactly k literals (it might still 
violate smaller clauses). 


Observe that if F is good, then F^{[u=1]} is good 
for every literal u. If F is good with respect to α, 
then sb-slow(F, α, r) picks a clause of size at 
most k − 1 in step (3a) and causes at most k − 1 
recursive calls, each again with a good formula: 


Lemma 1 Suppose F is a k-CNF that is good 
with respect to α. Then sb-slow(F, α, r) runs 
in time (k − 1)^r poly(n). 


Great! The only thing that is left is to do 
something smart for formulas that are not good, 
i.e., have some unsatisfied clauses of size k. We 
will now describe how sb-fast proceeds and 
give a precise pseudocode description later. The 
algorithm sb-fast greedily finds a maximal set 
of pairwise disjoint k-clauses that are unsatisfied 
by α: C_1, ..., C_m, all over disjoint sets of 
variables. This can be done in polynomial time. Let 
β be an assignment to these variables. 


Proposition 1 F^{[β]} is good with respect to α. 


This is easy to see: consider a k-clause C in F 
that is not satisfied by α. Then C might or might 
not be among C_1, ..., C_m, but by maximality it 
shares at least one variable with some C_i. This 
variable disappears in F^{[β]}, and C is either satisfied 
or shrinks to something smaller than k. 

Fix t = ⌊log_k log_2 n⌋. If m < t, sb-fast can 
simply iterate over all 2^{km} ≤ 2^{kt} ≤ (log_2 n)^k 
different assignments β to the variables in 
C_1, ..., C_m: each F^{[β]} is good, and thus 
sb-slow(F^{[β]}, α, r) runs in time (k − 1)^r poly(n). 
Also, if F is unsatisfiable then all F^{[β]} are. 
If F has a satisfying assignment α* with 
d_H(α*, α) ≤ r, then at least one F^{[β]} does, 
too. So this step is correct. We are left with 
the case that m ≥ t. In this case we define 
G := (C_1, ..., C_t). 


k-ary Covering Codes 


The set [k]^t is endowed with a Hamming 
distance, just as the Hamming cube is: for 
w, w' ∈ [k]^t, we define d_H(w, w') := 
|{i ∈ [t] | w_i ≠ w'_i}|. The k-ary Hamming ball 
around w of radius s is the set B_s^{(k)}(w) := 
{w' ∈ [k]^t | d_H(w, w') ≤ s}. The number of 
elements in such a ball is independent of w. 
We define and observe 

$$\mathrm{vol}^{(k)}(t, s) := \bigl|B_s^{(k)}(w)\bigr| = \sum_{i=0}^{s} \binom{t}{i} (k-1)^{i}.$$


If C ⊆ [k]^t and ∪_{w∈C} B_s^{(k)}(w) = [k]^t, we call 
C a k-ary code of length t and covering radius s. 
Using the probabilistic method it is easy to show 
the following result: 


Lemma 2 Let t, k ∈ N, and s ∈ N_0. There exists 
a k-ary code of length t and covering radius s 
with at most 

$$\left\lceil \frac{t \ln(k)\, k^{t}}{\mathrm{vol}^{(k)}(t, s)} \right\rceil$$

elements. 

Observe that k^t ≤ log_2 n and thus there are at 
most 2^{log_2 n} = n subsets C ⊆ [k]^t. We iterate 
through all of them and find a smallest code of 
covering radius s. 
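
For very small parameters, the exhaustive search over subsets of [k]^t can be carried out literally. The sketch below is illustrative only: it enumerates candidate codes by increasing size (so the first hit is a smallest code), uses the alphabet {0, ..., k−1} instead of {1, ..., k}, and the function names are assumptions.

```python
from itertools import combinations, product
from math import comb


def hamming(u, v):
    """Number of positions in which the words u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def ball_volume(k, t, s):
    """vol(t, s) = sum_{i <= s} C(t, i) (k-1)^i, the size of a k-ary Hamming ball."""
    return sum(comb(t, i) * (k - 1) ** i for i in range(s + 1))


def smallest_covering_code(k, t, s):
    """Brute-force a smallest code C in [k]^t whose radius-s balls cover [k]^t."""
    words = list(product(range(k), repeat=t))
    for size in range(1, len(words) + 1):
        for code in combinations(words, size):
            if all(any(hamming(w, c) <= s for c in code) for w in words):
                return list(code)
    return words  # not reached: the full space always covers itself


code = smallest_covering_code(2, 3, 1)
print(len(code), ball_volume(2, 3, 1))
```

The search is exponential in k^t, which is affordable in the algorithm only because t is chosen so small that k^t ≤ log_2 n.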

Consider G = (C_1, ..., C_t), our maximal set 
of pairwise disjoint k-clauses not satisfied by α. 
Any satisfying assignment α* of F must satisfy 
at least one literal in each C_i. Since they are 
pairwise disjoint, this implies d_H(α, α*) ≥ t. 
There are exactly k^t assignments that satisfy G 
and have distance exactly t from α. Each such 
assignment can be represented by a w ∈ [k]^t in 
the obvious way. To be more precise, for w ∈ [k]^t 
we define α[G, w] to be the assignment which we 
obtain from α by flipping the w_i-th literal in C_i, for 
1 ≤ i ≤ t. If G is understood from the context, 
we write α[w] instead of α[G, w]. 


Example 1 Consider G = ((x_1, y_1, z_1), (x_2, 
y_2, z_2), (x_3, y_3, z_3)), α = (0, ..., 0), and t = 
3. Let w = (2, 3, 3). Then α[w] is the assignment 
that sets y_1, z_2, and z_3 to 1 and all other variables 
to 0. 
Proposition 2 We observe the following facts 
about α[w]: 


1. d_H(α, α[w]) = t for every w ∈ [k]^t. 

2. If α* satisfies F, then for some w* ∈ [k]^t we 
have d_H(α[w*], α*) = d_H(α, α*) − t. 

3. Let w, w' ∈ [k]^t. Then d_H(α[w], α[w']) = 
2 d_H(w, w'). 
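
The construction of the assignment indexed by a word w, together with the first and third facts, can be checked on Example 1. This is a small sketch under the assumption that each clause of G is given as a tuple of variable names (all literals are unsatisfied by the base assignment, so flipping a literal means flipping its variable); the names flip_by_word and hamming_assignments are illustrative.

```python
def flip_by_word(alpha, G, w):
    """Return the assignment obtained by flipping the w_i-th variable of clause C_i."""
    result = dict(alpha)
    for clause, index in zip(G, w):
        var = clause[index - 1]          # words over [k] are 1-based
        result[var] = 1 - result[var]
    return result


def hamming_assignments(a, b):
    """Number of variables on which two assignments differ."""
    return sum(a[v] != b[v] for v in a)


G = [("x1", "y1", "z1"), ("x2", "y2", "z2"), ("x3", "y3", "z3")]
alpha = {v: 0 for clause in G for v in clause}

a_w = flip_by_word(alpha, G, (2, 3, 3))
print(sorted(v for v, bit in a_w.items() if bit == 1))
```

Because the clauses are pairwise disjoint, two words that differ in one position yield assignments that differ in exactly two variables, which explains the factor 2 in the third fact.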


Lemma 3 Let t and G be defined as above, and 
let C ⊆ [k]^t be a k-ary code of covering radius s. 
If α* is a satisfying assignment of F, then there 
is some w ∈ C such that d_H(α[w], α*) ≤ 
d_H(α, α*) − t + 2s. 


In particular, if B_r(α) contains a satisfying 
assignment, then there is some w ∈ C such that 
B_{r−t+2s}(α[w]) contains it, too. 


Proof (of Lemma 3) By Proposition 2, there is 
some w* ∈ [k]^t such that d_H(α[w*], α*) = 
d_H(α, α*) − t ≤ r − t. Since C has covering 
radius s, there is some w ∈ C such that d_H(w, w*) ≤ 
s, and by Proposition 2, d_H(α[w], α[w*]) ≤ 
2s. The lemma now follows from the triangle 
inequality. The proof is illustrated in Fig. 1. 


We now state sb-fast formally. We 
compute an optimal k-ary code C of length t and 
covering radius s = t/k that is fixed throughout 
the run of the algorithm. 


Algorithm sb-fast(F, α, r). 


1. If α satisfies F, return satisfiable. 

2. Else if r = 0, return unsatisfiable. 

3. Else let G be a maximal set of pairwise 
disjoint k-clauses of F unsatisfied by α. 

4. If |G| < t := ⌊log_k log_2 n⌋: Call 
sb-slow(F^{[β]}, α, r) for every β ∈ 
{0,1}^{vbl(G)} and return satisfiable iff 
at least one call returns satisfiable. 

5. Else, if |G| ≥ t, set 
G := (C_1, ..., C_t) and call 
sb-fast(F, α[G, w], r − t + 2t/k) 
for every w ∈ C and return 
satisfiable iff at least one call 
returns satisfiable. 
Derandomization of k-SAT Algorithm, Fig. 1 
The distance from α* to α[w] is at most the distance 
from α* to α[w*] plus 2s 


Correctness of sb-fast follows from the above 
discussion: If there is some α* ∈ B_r(α) that 
satisfies F, then for at least one w ∈ C it holds 
that d_H(α[w], α*) ≤ r − t + 2t/k, thus the 
corresponding recursive call to sb-fast will be 
successful. 

What about the running time? If |G| < t, then 
every call to sb-slow(F^{[β]}, α, r) runs in time 
O((k − 1)^r poly(n)). Otherwise, the procedure 
sb-fast calls itself recursively for each w ∈ C. 
Every level further into the recursion, the 
parameter r decreases by t − 2t/k. The running time 
is therefore |C|^{r/(t−2t/k)} poly(n). To evaluate this 
we have to estimate the size of C. Recall that 
s = t/k. 


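$$|C| \le \frac{k^{t}\,\mathrm{poly}(t)}{\mathrm{vol}^{(k)}(t, s)} \le \frac{k^{t}\,\mathrm{poly}(t)}{\binom{t}{s}(k-1)^{s}} \le \frac{k^{t}\,\mathrm{poly}(t)}{\left(\frac{t}{s}\right)^{s}\left(\frac{t}{t-s}\right)^{t-s}(k-1)^{s}} = \frac{k^{t}\,\mathrm{poly}(t)}{k^{s}\left(\frac{k}{k-1}\right)^{t-s}(k-1)^{s}} = (k-1)^{t-2s}\,\mathrm{poly}(t).$$

Therefore, 

$$|C|^{r/(t-2s)} \le \bigl((k-1)^{t-2s}\,\mathrm{poly}(t)\bigr)^{r/(t-2s)} = (k-1)^{r}\,\mathrm{poly}(t)^{r/(t-2s)}.$$

Since t is a growing function in n, the term 
poly(t)^{1/(t−2s)} converges to 1 as n grows, and the 
running time is at most (k − 1)^{r+o(n)} poly(n). This 
completes the proof of Theorem 2. □ 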
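
The polynomial overhead in the estimate of the code size can be verified numerically. The sketch below makes one assumption beyond the text: the poly(t) factor lost in lower-bounding the binomial coefficient is taken to be at most t + 1, which follows because the largest term of a binomial distribution with t trials has probability at least 1/(t + 1). The function name overhead is illustrative.

```python
from fractions import Fraction
from math import comb


def overhead(k, t):
    """Ratio of k^t / (C(t, s) (k-1)^s) to (k-1)^(t-2s), for s = t/k."""
    assert t % k == 0, "the sketch assumes that k divides t"
    s = t // k
    bound = Fraction(k ** t, comb(t, s) * (k - 1) ** s)
    return bound / (k - 1) ** (t - 2 * s)


for k in (3, 4, 5):
    for t in range(k, 61, k):
        ratio = overhead(k, t)
        # The ratio equals 1 / Pr[Binomial(t, 1/k) = t/k], hence lies in [1, t + 1].
        assert 1 <= ratio <= t + 1

print(float(overhead(3, 30)))
```

Raising a factor of size at most t + 1 to the power r/(t − 2s) contributes only a subexponential term, consistent with the o(n) in the exponent.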
Problem Definition 


One of the most fundamental communication 
problems in wired as well as wireless networks 
is broadcasting, where one distinguished source 
node has a message that needs to be sent to all 
other nodes in the network. 

The radio network abstraction captures the 
features of distributed communication networks 
with multi-access channels, with minimal as- 
sumptions on the channel model and processors’ 
knowledge. Directed edges model unidirectional 
links, including situations in which one of two 
adjacent transmitters is more powerful than the 
other. In particular, there is no feedback mecha- 
nism (see, for example, [13]). In some applica- 
tions, collisions may be difficult to distinguish 
from the noise that is normally present on the 
channel, justifying the need for protocols that 
do not depend on the reliability of the collision 
detection mechanism (see [9, 10]). Some network 
configurations are subject to frequent changes. 
In other networks, topologies could be unstable 
or dynamic; for example, when mobile users 
are present. In such situations, algorithms that 
do not assume any specific topology are more 
desirable. 

More formally, a radio network is a directed 
graph where by n we denote the number of nodes 
in this graph. If there is an edge from u to v, 
then we say that v is an out-neighbor of u and 
u is an in-neighbor of v. Each node is assigned a 
unique identifier from the set {1, 2, ..., n}. In the 
broadcast problem, one node, for example, node 
1, is distinguished as the source node. Initially, 
the nodes do not possess any other 
information. In particular, they do not know the network 
topology. 

The time is divided into discrete time steps. 
All nodes start simultaneously, have access to 
a common clock, and work synchronously. A 
broadcasting algorithm is a protocol that, for each
identifier id, given all past messages received by
id, specifies, for each time step t, whether id will
transmit a message at time t, and if so, it also
specifies the message. A message M transmitted
at time t from a node u is sent instantly to all its
out-neighbors. An out-neighbor v of u receives
M at time step t only if no collision occurred, that
is, if the other in-neighbors of v do not transmit
at time t at all. Further, collisions cannot be
distinguished from background noise. If v does
not receive any message at time t, it knows that
either none of its in-neighbors transmitted at time
t or that at least two did, but it does not know
which of these two events occurred. The running
time of a broadcasting algorithm is the smallest
t such that for any network topology, and any
assignment of identifiers to the nodes, all nodes
receive the source message no later than at step t.
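
The reception rule can be made concrete with a small simulation of a single time step. The following is a minimal sketch, not part of the original entry: the function name `radio_round`, the dictionary-based graph representation, and the 4-node example network are all assumptions chosen for illustration.

```python
def radio_round(in_neighbors, transmitting):
    """Simulate one synchronous step of the radio model.

    in_neighbors: dict mapping each node v to the set of its in-neighbors.
    transmitting: dict mapping each transmitting node u to its message.
    Returns a dict mapping every node that receives a message to that message.
    """
    received = {}
    for v, preds in in_neighbors.items():
        senders = [u for u in preds if u in transmitting]
        # v hears a message only if exactly one in-neighbor transmits.
        # Silence (no sender) and a collision (two or more senders)
        # are indistinguishable to v, so both yield "nothing received".
        if len(senders) == 1:
            received[v] = transmitting[senders[0]]
    return received


# Hypothetical 4-node network with edges 1->2, 1->3, 2->4, 3->4.
IN_NEIGHBORS = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
```

In this example, if nodes 2 and 3 transmit in the same step, node 4 receives nothing, which is exactly the situation that selectors are designed to resolve.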

All efficient radio broadcasting algorithms are 
based on the following purely combinatorial con- 
cept of selectors. 

Selectors Consider subsets of $\{1, \ldots, n\}$. We
say that a set S hits a set X iff $|S \cap X| = 1$,
and that S avoids Y iff $S \cap Y = \emptyset$. A family $\mathcal{S}$
of sets is a w-selector if it satisfies the following
property:

(*) For any two disjoint sets X, Y with $w/2 \le |X| \le w$
and $|Y| \le w$, there is a set in $\mathcal{S}$ which
hits X and avoids Y.
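
For very small parameters, property (*) can be checked by exhaustive enumeration. The sketch below is illustrative only: the names `hits`, `avoids`, and `is_selector` are hypothetical, the size constraints are read as w/2 <= |X| <= w and |Y| <= w (an assumption about the intended inequalities), and the running time is exponential, so it is useful only as a sanity check.

```python
from itertools import combinations


def hits(S, X):
    # S hits X iff they share exactly one element.
    return len(S & X) == 1


def avoids(S, Y):
    # S avoids Y iff they are disjoint.
    return not (S & Y)


def is_selector(family, n, w):
    """Brute-force test of property (*) over subsets of {1, ..., n}.

    Assumes the constraints w/2 <= |X| <= w and |Y| <= w.
    """
    universe = range(1, n + 1)
    for x_size in range(1, w + 1):
        if 2 * x_size < w:          # enforce w/2 <= |X|
            continue
        for X in map(frozenset, combinations(universe, x_size)):
            rest = [v for v in universe if v not in X]
            for y_size in range(0, w + 1):
                for Y in map(frozenset, combinations(rest, y_size)):
                    if not any(hits(S, X) and avoids(S, Y) for S in family):
                        return False
    return True
```

The family of all singletons is trivially a w-selector with n sets; the interest of Theorem 1 is that O(w log n) sets suffice.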


A complete layered network is a graph consist-
ing of layers $L_0, \ldots, L_{m-1}$, in which each node
in layer $L_i$ is directly connected to every node in
layer $L_{i+1}$, for all $i = 0, \ldots, m-1$. The layer
$L_0$ contains only the source node s.

Key Results


Theorem 1 ([5]) For all positive integers w and
n such that $w \le n$, there exists a w-selector $\mathcal{S}$ with
$O(w \log n)$ sets.


Theorem 2 ([5]) There exists a deterministic
$O(n \log^2 n)$-time algorithm for broadcasting in
radio networks with arbitrary topology.


Theorem 3 ([5]) There exists a deterministic
$O(n \log n)$-time algorithm for broadcasting in
complete layered radio networks.


Applications 


Prior to this work, Bruschi and Del Pinto showed
in [1] that radio broadcasting requires time
$\Omega(n \log D)$ in the worst case. In [4], Chlebus
et al. presented a broadcasting algorithm with
time complexity $O(n^{11/6})$, the first subquadratic
upper bound. This upper bound was later
improved to $O(n^{5/3} \log^{3} n)$ by De Marco and
Pelc [8] and by Chlebus et al. [3] to $O(n^{3/2})$ by
application of finite geometries.

Recently, Kowalski and Pelc in [12] proposed
a faster $O(n \log n \log D)$-time radio broadcast-
ing algorithm, where D is the eccentricity of the
network. Later, Czumaj and Rytter showed in [6]
how to reduce this bound to $O(n \log^2 D)$. The re-
sults presented in [5] (see Theorems 1-3, as well
as further improvements in [6, 12]) are existential
(non-constructive). The proofs are based on the 
probabilistic method. A discussion on efficient 
explicit construction of selectors was initiated by 
Indyk in [11] and then continued by Chlebus and 
Kowalski in [2]. 

More careful analysis and further discussion
on selectors in the context of combinatorial group
testing can be found in [7], where De Bonis et al.
proved that the size of selectors is $\Theta(w \log(n/w))$.


Open Problems

The exact complexity of radio broadcasting re-
mains an open problem, although the gap be-
tween the lower and upper bounds $\Omega(n \log D)$
and $O(n \log^2 D)$ is now only a factor of $\log D$.
Another promising direction for further studies is
improvement of efficient explicit construction of
selectors.


Problem Definition


In the Linear Search Problem (LSP), we seek effi- 
cient strategies for locating an immobile target on 
the infinite line. More formally, the search envi- 
ronment consists of the infinite (i.e., unbounded) 
line, with a point O designated as a specific start 
point. A mobile searcher is initially located at O, 
whereas the target may be hidden at any point 
on the line. The searcher’s strategy S defines the 
movement of the searcher on the line; on the 
other hand, the hider’s strategy H is defined as 
the precise placement of the target on the line, 
and we denote by |H| the distance of the target
from the start point. Given strategies S, H, the
cost of locating the target, denoted by c(S, H),
is the total distance traversed by the searcher at
the first time the target is located. The normalized
cost of the strategies is defined as the quantity
$\bar{c}(S, H) = c(S, H) / |H|$.

The objective of the linear search problem
is to determine a strategy S for the searcher
that minimizes the worst-case normalized cost,
namely, the quantity $\sup_H \bar{c}(S, H)$; the latter is
often referred to as the competitive ratio of the
strategy S, due to similarities of this setup with
the competitive analysis of online algorithms. In
game-theoretic terms, the problem can be de- 
scribed as a zero-sum game between the searcher 
and the hider in which we seek the minimax 
strategy of the game. 


Extensions 

A natural extension of the linear search problem 
is the m-ray search problem, also known as the 
star search problem. Here, the search environ- 
ment consists of m infinite rays, with the start 
point O being their common intersection point. 
Clearly, the linear search problem is precisely the 
2-ray search problem. 


Constraints 

It must be noted that if |H| is arbitrarily small,
no strategy of constant competitive ratio exists.
Hence, a frequent and natural assumption in the
field is that $|H| \ge 1$, i.e., that the
target is hidden at least at some minimum allowed
distance from the start point. A different assump-
tion that can be made in order to circumvent
this complication is that the search strategy must
incorporate an infinite sequence of infinitesimal
steps (i.e., depths of exploration). In this entry we
assume the former, namely, that $|H| \ge 1$.

In addition, we assume only deterministic (i.e., 
pure) strategies for both the searcher and the 
hider. We note that a substantial amount of pre- 
vious work has addressed mixed strategies under 
given probability distributions on the placement 
of the target. We refer the reader to the textbook 
of Alpern and Gal [1]. 


Key Results 


We consider two variants of the problem. In the 
first variant, the searcher lacks any information 
concerning the hidden target. In the second vari- 
ant, the searcher knows that the target is within 
distance $h \ge |H|$ from the start point O.

Note that for the linear search problem, the
searcher’s strategy is completely determined by
the sequence of search depths $\{x_i\}_{i \ge 1}$, where $x_i$
denotes the total distance from the start point
to which the line is searched during the i-th
exploration.


Searching with No Information 
It has long been known that a doubling strategy,
namely, the strategy $\{2^i\}_{i \ge 1}$, attains an optimal
competitive ratio equal to 9 for this variant. The
result is due to Beck and Newman [4] and was redis-
covered by Baeza-Yates et al. [2].

Calculating an upper bound on the competitive
ratio is easy: if the target is at distance h from O,
the doubling strategy will discover it at traversal
$2k + 1$, where $2^{2k-1} < h$ and $2^{2k+1} \ge h$. The
total distance traversed by the searcher is equal to
$2 \sum_{i=1}^{2k} 2^i + h = 4(2^{2k} - 1) + h$. The competitive
ratio is maximized when $h \to \infty$ and converges
(from below) to 9.
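
The upper bound can also be observed numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the i-th exploration is assumed to go to depth 2^i on the half-line with index i mod 2, and the target is assumed to satisfy |H| >= 1.

```python
def doubling_cost(target, side):
    """Distance walked by the doubling strategy until the target is found.

    target: distance |H| >= 1 of the target from the start point O.
    side:   0 or 1, the half-line on which the target lies.
    The i-th exploration (i = 1, 2, ...) reaches depth 2**i on side i % 2.
    """
    travelled = 0.0
    i = 1
    while True:
        depth = 2.0 ** i
        if i % 2 == side and depth >= target:
            return travelled + target   # target reached on the way out
        travelled += 2 * depth          # walk out to the depth and back to O
        i += 1


def competitive_ratio(target, side):
    # Normalized cost: total distance divided by the target distance.
    return doubling_cost(target, side) / target
```

Placing the target just beyond a previously explored depth, e.g. at distance slightly larger than 2^15 on the matching side, gives a ratio close to, but below, 9.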

An elegant approach for proving the tightness
of this bound is based on lower bounds on cer-
tain functionals over positive sequences [8]. Let
$\{x_i\}_{i \ge 1}$ be an optimal search strategy; then it is
easy to see that its competitive ratio is at least
equal to

$$\sup_k \left( 1 + 2 \, \frac{\sum_{i=1}^{k+1} x_i}{x_k} \right).$$

In addition, it can
be readily seen that an optimal search strategy
$\{x_i\}_{i \ge 1}$ must be monotone, i.e., $x_i \ge x_j$ for
$i > j$. Given the above, Gal shows that there
exists $a > 1$ such that

$$\sup_k \left( 1 + 2 \, \frac{\sum_{i=1}^{k+1} x_i}{x_k} \right) \ge \lim_{k \to \infty} \left( 1 + 2 \, \frac{\sum_{i=1}^{k+1} a^i}{a^k} \right),$$

which in turn is at least equal to 9, for all $a > 1$.

Informally, the above argument shows that
geometric strategies of the form $\{a^i\}_{i \ge 1}$ comprise
the space of optimal strategies, and by choosing
a = 2 one obtains the best strategy.
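
The choice a = 2 can be verified with a short calculation. The following is a sketch of the standard computation for a geometric strategy $x_i = a^i$ with $a > 1$, substituted into the expression $1 + 2 \sum_{i=1}^{k+1} x_i / x_k$; it illustrates the final step only and is not Gal's full argument.

```latex
\begin{align*}
1 + 2\,\frac{\sum_{i=1}^{k+1} a^i}{a^k}
  &= 1 + 2\,\frac{a^{k+2} - a}{(a-1)\,a^k}
  \;\xrightarrow[k \to \infty]{}\; 1 + \frac{2a^2}{a-1}, \\
\frac{d}{da}\,\frac{a^2}{a-1}
  &= \frac{a(a-2)}{(a-1)^2} = 0 \iff a = 2 \quad (a > 1), \\
\left. 1 + \frac{2a^2}{a-1} \right|_{a=2} &= 9 .
\end{align*}
```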


Searching with an Upper Bound on the 
Target Distance 

In the setting in which the searcher has an upper
bound h on the distance of the target from the
start point, it is possible to obtain improved com-
petitive ratios. Jaillet and Stafford [9] approach
this problem by solving the following “dual”
problem: given a target competitive ratio r, and
the upper bound h, what is the largest “extent”
(i.e., the furthest one can go in both directions)
that can be searched while guaranteeing a com-
petitive ratio at most r? A solution to this problem
implies a solution to the (primal) search problem:
it suffices to find the smallest r such that $e(r) \ge h$,
where e(r) is the best-possible extent.

The dual problem of finding e(r) is addressed 
by means of a series of linear-program formu- 
lations. The solution to this series of linear pro- 
grams defines a search strategy in which the 
search depths {x;};>1 are determined by an ap- 
propriate linear recurrence relation. As a last 
step, e(r) is obtained as a particular element 
of the sequence that is generated by the linear 
recurrence in question. It should be noted that 
although the strategy is optimal, this technique 
does not yield a closed-form expression of the 
optimal competitive ratio (given h).

A similar approach leads to a solution for
m-ray searching with an upper bound on the
target distance. The crucial difficulty here, as 
opposed to the case m = 2, is in showing 
that an optimal strategy can be found in the 
class of cyclic strategies: these are strategies in 
which the searcher always visits the rays in some 
fixed round-robin order. This seemingly intuitive 
property is surprisingly hard to establish
formally. To bypass this obstacle, one must first
show that the property holds when the search 
depths form a nondecreasing sequence. Then one 
can argue that, once a searcher is at the start point, 
it will always choose to explore the ray that has 
been explored the least up to the current point. As 
noted in [9], this “least-extended-so-far discipline 
is the link between non-decreasing depths and 
the cyclic property that is sought.” Once the 
optimality of cyclic strategies is established, a 
similar approach can be applied as in the case 
m = 2; namely, the search depths are determined 
by a (more complicated) linear recurrence. 


Applications 


The problem has obvious applications in the 
context of robotic navigation in an unknown envi- 
ronment. Strategies based on doubling are used in 
searching more complicated environments, e.g., 
a graph [11]. The linear search problem and its
generalization have connections with the design
of black-box strategies for obtaining interruptible 
algorithms. The latter class consists of algorithms 
with the property that they return efficient solu- 
tions even if interrupted during their execution. 
Such algorithms are very desirable in the context 
of real-time and anytime applications in artificial 
intelligence [5]. 


Cross-References 


Randomized Searching on Rays or the Line 


Recommended Reading 


1. Alpern S, Gal S (2003) The theory of search games
and rendezvous. Kluwer Academic, Boston
2. Baeza-Yates R, Culberson J, Rawlins G (1993)
Searching in the plane. Inf Comput 106(2):234-252
3. Beck A (1964) On the linear search problem. Naval
Res Logist 2:221-228
4. Beck A, Newman DJ (1970) Yet more on the linear
search problem. Isr J Math 8:419-429
5. Bernstein DS, Finkelstein L, Zilberstein S (2003)
Contract algorithms and robots on rays: unifying two
scheduling problems. In: Proceedings of the 18th
international joint conference on artificial intelligence
(IJCAI), Acapulco, pp 1211-1217
6. Bose P, De Carufel J, Durocher S (2013) Revisiting
the problem of searching on a line. In: Proceedings of
the 21st European symposium on algorithms (ESA),
Sophia Antipolis, pp 205-216
7. Gal S (1972) A general search game. Isr J Math
12:34-45
8. Gal S (1974) Minimax solutions for linear search
problems. SIAM J Appl Math 27:17-30
9. Jaillet P, Stafford M (2001) Online searching. Oper
Res 49:501-515
10. Kirkpatrick DG (2009) Hyperbolic dovetailing. In:
Proceedings of the 17th annual European sympo-
sium on algorithms (ESA), Copenhagen, pp 616-627
11. Koutsoupias E, Papadimitriou C, Yannakakis M
(1996) Searching a fixed graph. In: Proceedings
of the 23rd international colloquium on automata,
languages, and programming (ICALP), Paderborn,
pp 280-289
12. López-Ortiz A, Schuierer S (2001) The ultimate
strategy to search on m rays. Theor Comput Sci
261(2):267-295
13. Schuierer S (2001) Lower bounds in online geomet-
ric searching. Comput Geom Theory Appl 18(1):37-53




Dictionary Matching 


Moshe Lewenstein 
Department of Computer Science, Bar-Ilan 
University, Ramat-Gan, Israel 


Keywords 


Approximate dictionary matching; Approximate 
text indexing 


Years and Authors of Summarized 
Original Work 


2004; Cole, Gottlieb, Lewenstein 


Problem Definition 


Indexing and dictionary matching are generalized 
models of pattern matching. These models have 
attained importance with the explosive growth of 
multimedia, digital libraries, and the Internet. 


1. Text Indexing: In text indexing one desires
to preprocess a text t, of length n, and to
answer where subsequent queries p, of length
m, appear in the text t.

2. Dictionary Matching: In dictionary match-
ing one is given a dictionary D of strings
$p_1, \ldots, p_d$ to be preprocessed. Subsequent
queries provide a query string t, of length n,
and ask for each location in t at which patterns
of the dictionary appear.


Key Results 


Text Indexing 

The indexing problem assumes a large text that 
is to be preprocessed in a way that will allow 
the following efficient future queries. Given a 
query pattern, one wants to find all text locations 
that match the pattern in time proportional to the 
pattern length and to the number of occurrences.

To solve the indexing problem, Weiner [14]
invented the suffix tree data structure (originally
called a position tree), which can be constructed
in linear time, and subsequent queries of length
m are answered in time $O(m \log|\Sigma| + tocc)$,
where tocc is the number of pattern occurrences
in the text.

Weiner’s suffix tree in effect solved the 
indexing problem for exact matching of fixed 
texts. The construction was simplified by the 
algorithms of McCreight and, later, Chen 
and Seiferas. Ukkonen presented an online 
construction of the suffix tree. Farach presented 
a linear time construction for large alphabets 
(specifically, when the alphabet is $\{1, \ldots, n^c\}$,
where n is the text size and c is some fixed
constant). All results, besides the latter, work by
handling one suffix at a time. The latter algorithm
uses a divide-and-conquer approach, dividing the
suffixes to be sorted into even-position suffixes and
odd-position suffixes. See the entry on Suffix
Tree Construction for full details. The standard
query time for finding a pattern p in a suffix tree
is $O(m \log|\Sigma|)$. By slightly adjusting the suffix
tree, one can obtain a query time of $O(m + \log n)$;
see [12].

Another popular data structure for indexing is 
suffix arrays. Suffix arrays were introduced by 
Manber and Myers. Others proposed linear time 
constructions for linearly bounded alphabets. All 
three extend the divide and conquer approach 
presented by Farach. The construction in [11] 
is especially elegant and significantly simplifies 
the divide-and-conquer approach, by dividing the 
suffix set into three groups instead of two. See 
the entry on Suffix Array Construction for full
details. The query time for suffix arrays is
$O(m + \log n)$, achievable by embedding additional lcp
(longest common prefix) information into the
data structure. See [11] for reference to other
solutions. Suffix trays were introduced in [6] as a
merge between suffix trees and suffix arrays. The
construction time of suffix trays is the same as
for suffix trees and suffix arrays. The query time
is $O(m + \log|\Sigma|)$.
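
To illustrate how a suffix array answers indexing queries, here is a minimal sketch. It is deliberately naive: the construction sorts suffixes directly (not linear time), and the search is a plain binary search costing O(m log n) comparisons rather than the O(m + log n) bound obtained with lcp information. The function names are assumptions made for this example.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text, in increasing order."""
    m = len(pattern)

    def first_index(strict):
        # Smallest index whose length-m suffix prefix is >= pattern
        # (or > pattern when strict is True).
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, end = first_index(False), first_index(True)
    return sorted(sa[start:end])
```

All suffixes that begin with the pattern occupy a contiguous range of the suffix array, which is why two binary searches suffice.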

Solutions for the indexing problem in dynamic 
texts, where insertions and deletions (of single 
characters or entire substrings) are allowed,
appear in several papers; see [2] and references
therein. 


Dictionary Matching 

Dictionary matching is, in some sense, the “in- 
verse” of text indexing. The large body to be 
preprocessed is a set of patterns, called the dic- 
tionary. The queries are texts whose length is 
typically significantly smaller than the dictionary 
size. It is desired to find all (exact) occurrences of 
dictionary patterns in the text in time proportional 
to the text length and to the number of occur- 
rences. 

Aho and Corasick [1] suggested an automaton- 
based algorithm that preprocesses the dictionary 
in time O(d) and answers a query in time 
O(n + docc), where docc is the number of 
occurrences of patterns within the text. Another 
approach to solving this problem is to use a 
generalized suffix tree. A generalized suffix tree is 
a suffix tree for a collection of strings. Dictionary 
matching is done for the dictionary of patterns. 
Specifically, a suffix tree is created for the
generalized string $p_1 \$_1 p_2 \$_2 \cdots p_d \$_d$, where
the $\$_i$’s are not in the alphabet. A randomized
solution using a fingerprint scheme was proposed
in [3]. In [7] a parallel work-optimal algorithm
for dictionary matching was presented. Ferragina
and Luccio [8] considered the problem in the
external memory model and suggested a solution
based upon the String B-tree data structure along
with the notion of a certificate for dictionary
matching. Two-dimensional dictionary matching
is another fascinating topic which appears
as a separate entry. See also the entry on
Multidimensional String Matching.
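
The automaton-based approach of Aho and Corasick can be sketched compactly. The code below is an illustrative textbook-style implementation, not the presentation of [1]: the names `build_automaton` and `search` and the list-of-dicts representation are assumptions for this example.

```python
from collections import deque


def build_automaton(patterns):
    goto = [{}]   # goto[state][char] -> next state (the trie of patterns)
    fail = [0]    # failure link of each state
    out = [[]]    # indices of patterns recognized when entering each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first traversal: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(text, patterns):
    """Return (start position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

Each text character advances the automaton by one goto transition after a number of failure transitions that is amortized constant, which is the source of the O(n + docc) query bound.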


Dynamic Dictionary Matching 

Here one allows insertion and deletion of patterns 
from the dictionary D. The first solution to the 
problem was a suffix tree-based method for solv- 
ing the dynamic dictionary matching problem. 
Idury and Schaffer [10] showed that the failure 
function (function mapping from one longest 
matching prefix to the next longest matching 
prefix; see [1]) approach and basic scanning loop 
of the Aho-Corasick algorithm can be adapted to 
dynamic dictionary matching for improved initial
dictionary preprocessing time. They also showed
that faster search time can be achieved at the 
expense of slower dictionary update time. 

A further improvement was later achieved 
by reducing the problem to maintaining a se- 
quence of well-balanced parentheses under cer- 
tain operations. In [13] an optimal method was 
achieved based on a labeling paradigm, where 
labels are given to, sometimes overlapping, sub- 
strings of different lengths. The running times are 
O(|D|) preprocessing time, O(m) update time, 
and O(n + docc) time for search. See [13] for 
other references. 


Text Indexing and Dictionary Matching
with Errors

In most real-life systems, there is a need to allow 
errors. With the maturity of the solutions for exact 
indexing and exact dictionary matching, the quest 
for approximate solutions began. Two of the 
classical measures for approximating closeness 
of strings, Hamming distance and Edit distance, 
were the first natural measures to be considered. 


Approximate Text Indexing 

For approximate text indexing, given a distance 
k, one preprocesses a specified text t. The goal 
is to find all locations $\ell$ of t within distance k of
the query p, i.e., for the Hamming distance all
locations $\ell$ such that the length m substring of
t beginning at that location can be made equal
to p with at most k character substitutions. (An
analogous statement applies for the edit distance.)
For k = 1 [4] one can preprocess in time
$O(n \log^2 n)$ and answer subsequent queries p
in time $O(m \sqrt{\log n} \log\log n + occ)$. For small
$k \ge 2$, the following naive solutions can be
achieved. The first possible solution is to traverse
a suffix tree checking all possible configurations
of k, or fewer, mismatches in the pattern. However,
while the preprocessing needed to build a suffix
tree is cheap, the search is expensive, namely,
$O(m^{k+1} |\Sigma|^k + occ)$. Another possible solution,
for the Hamming distance measure only, leads to
data structures of size approximately $O(n^{k+1})$
embedding all mismatch possibilities into the
tree. This can be slightly improved by using the
data structures for k = 1, which reduce the size
to approximately $O(n^k)$.


Approximate Dictionary Matching 

The goal is to preprocess the dictionary along 
with a threshold parameter k in order to sup- 
port the following subsequent queries: Given a 
query text, seek all pairs of patterns (from the 
dictionary) and text locations which match within 
distance k. Here once again there are several 
algorithms for the case where k = 1 [4, 9]. 
The best solution for this problem has query 
time $O(m \log\log n + occ)$; the data structure
uses space $O(n \log n)$ and can be built in time
$O(n \log n)$.

The solutions for k = 1 in both problems
(Approximate Text Indexing and Approximate
Dictionary Matching) are based on the following
elegant idea, presented in indexing terminology.
Say a pattern p matches a text t at location i
with one error at location j of p (and at location
$i + j - 1$ of t). Obviously, the $(j-1)$-length prefix
of p matches the aligned substring of t and so
does the $(m-j)$-length suffix. If t and p are
reversed, then the $(j-1)$-length prefix of p
becomes a $(j-1)$-length suffix of $p^R$ (that is,
p reversed). Notice that there is a match with at
most one error if (1) the suffix of p starting at
location $j + 1$ matches the (prefix of the) suffix
of t starting at location $i + j$ and (2) the suffix
of $p^R$ starting at location $m - j + 2$ (the reverse
of the $(j-1)$-length prefix of p) matches the
(prefix of the) suffix of $t^R$ starting at location
$n - i - j + 3$. So, the problem now becomes a
search for locations j which satisfy the above. To
do so, the abovementioned solutions, naturally,
use two suffix trees, one for the text and one for
its reverse (with additional data structure tricks
to answer the query fast). In dictionary matching
the suffix trees are defined on the dictionary. The
problem is that this solution does not carry over
for $k \ge 2$. See the introduction of [5] for a full
list of references.


Text Indexing and Dictionary Matching 
Within (Small) Distance k 

Cole et al. [5] proposed a new method that yields 
a unified solution for approximate text indexing,
approximate dictionary matching, and other re-
lated problems. However, since the solution is
somewhat involved, it will be simpler to explain
the ideas on the following problem. The desire
is to index a text t to allow fast searching for
all occurrences of a pattern containing, at most, 
k don’t cares (don’t cares are special characters 
which match all characters). 

Once again, there are two possible, relatively 
straightforward, solutions to be elaborated. The 
first is to use a suffix tree, which is cheap to 
preprocess, but causes the search to be expensive, 
namely, $O(m |\Sigma|^k + occ)$ (if considering k mis-
matches, this would increase to $O(m^{k+1} |\Sigma|^k + occ)$).
To be more specific, imagine traversing a
path in a suffix tree. Consider the point where
a don’t care is reached. If this happens in the middle
of an edge, the only text suffixes (representing sub-
strings) that can match the pattern with this don’t
care must also go through this edge, so simply
continue traversing. However, if at a node, then
all the paths leaving this node must be explored.
This explains the mentioned time bound.
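
The branching behavior can be seen in a small sketch. For simplicity the code below uses an uncompressed suffix trie (quadratic size) instead of a suffix tree, so every character position is a node and every don't care branches; the wildcard symbol `?`, the node layout, and the function names are all assumptions made for illustration.

```python
WILDCARD = "?"   # hypothetical symbol standing for a don't care


def build_suffix_trie(text):
    # Each node records the start positions of the suffixes passing through it.
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def match_with_dont_cares(root, pattern):
    """Return start positions where pattern (with don't cares) matches."""
    frontier = [root]
    for ch in pattern:
        nxt = []
        for node in frontier:
            if ch == WILDCARD:
                # A don't care forces exploration of every outgoing branch.
                nxt.extend(node["children"].values())
            elif ch in node["children"]:
                nxt.append(node["children"][ch])
        frontier = nxt
    return sorted(s for node in frontier for s in node["starts"])
```

With k don't cares the frontier can grow by a factor of up to the alphabet size at each one, which mirrors the exponential term in the time bound.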

The second solution is to create a tree that 
contains all strings that are at Hamming distance 
k from a suffix. This allows fast search but 
leads to trees of size exponential in k, namely, 
$O(n^{k+1})$ size trees. To elaborate, the tree, called
a k-error trie, is constructed as follows. First,
consider the case for one don’t care, i.e., a 1-error
trie, and then extend it. At any node v a don’t
care may need to be evaluated. Therefore, create
a special subtree branching off this node that
represents a don’t care at this node. To understand
this subtree, note that the subtree (of the suffix
tree) rooted at v is actually a compressed trie
of (some of the) suffixes of the text. Denote the
collection of suffixes $S_v$. The first character of
all these suffixes has to be removed (or, perhaps
better imagined as a replacement with a don’t
care character). Each will be a new suffix of the
text. Denote the new collection as $S'_v$. Now, create
a new compressed trie of suffixes for $S'_v$, calling
this new subtree an error tree. Do so for every v.
The suffix tree along with its error trees is a 1-
error trie. Turning to queries in the 1-error trie,
when traversing the 1-error trie, do so with the
suffix tree up till the don’t care at node v. Move
into the error tree at node v and continue the
traversal of the pattern.

To create a 2-error trie, simply take each error 
tree and construct an error tree for each node 
within. A (k + 1)-error trie is created recursively 
from a k-error trie. Clearly the 1-error trie is of
size $O(n^2)$, since any node u in the original suffix
tree will appear in all the new subtrees of the 1-
error trie created for each of the nodes v which
are ancestors of u. Likewise, the k-error trie is of
size $O(n^{k+1})$.

The method introduced in Cole et al. [5] uses 
the idea of the error trees to form a new data 
structure, which is called a k-errata trie. The k-
errata trie will be much smaller than $O(n^{k+1})$.
However, it comes at the cost of a somewhat
slower search time. To understand the k-errata
tries, it is useful to first consider the 1-errata tries
and to extend. The 1-errata trie is constructed
as follows. The suffix tree is first decomposed
with a centroid path decomposition (which is a
decomposition of the nodes into paths, where all
nodes along a path have their subtree sizes within
a range $2^r$ and $2^{r+1}$, for some integer r). Then,
as before, error trees are created for each node v
of the suffix tree with the following difference.
Namely, consider the subtree, $T_v$, at node v and
consider the edge (v, x) going from v to child x
on the centroid path. $T_v$ can be partitioned into
two subtrees, $T_x \cup (v, x)$, and $T'_v$, all the rest of
$T_v$. An error tree is created for the suffixes in
$T'_v$. The 1-errata trie is the suffix tree with all
of its error trees. Likewise, a $(k+1)$-errata trie
is created recursively from a k-errata trie. The 
contents of a k-errata trie should be viewed as 
a collection of error trees, k levels deep, where 
error trees at each level are constructed on the 
error trees of the previous level (at level 0 there 
is the original suffix tree). The following lemma 
helps in obtaining a bound on the size of the k- 
errata trie. 


Lemma 1 Let C be a centroid decomposition of
a tree T. Let u be an arbitrary node of T and $\pi$
be the path from the root to u. There are at most
$\log n$ nodes v on $\pi$ for which v and v’s parent on
$\pi$ are on different centroid paths.

The implication is that every node u in the
original suffix tree will only appear in $\log n$ error
trees of the 1-errata trie because each ancestor v
of u is on the path $\pi$ from the root to u and only
$\log n$ such nodes are on different centroid paths
than their children (on $\pi$). Hence, u appears in
only $\log^k n$ error trees in the k-errata trie. There-
fore, the size of the k-errata trie is $O(n \log^k n)$.
The k-errata tries can be created in $O(n \log^{k+1} n)$
time. To answer queries on a k-errata trie,
given the pattern with (at most) k don’t cares,
the 0th level of the k-errata trie, i.e., the suffix
tree, needs to be traversed. This is to be done until
the first don’t care, at location j, in the pattern is
reached. If at node v in the 0th level of the k-
errata trie, enter the (1st level) error tree hanging
off of v and traverse this error tree from location
$j + 2$ of the pattern (until the next don’t care is
met). However, the error tree hanging off of node
v does not contain the subtree hanging off of v
that is along the centroid path. Hence, continue
traversing the pattern in the 0th level of the k-
errata trie, starting along the edge on the centroid
path leaving v (until the next don’t care is met).
The search is done recursively for k don’t cares
and, hence, yields an $O(2^k m)$ time search.
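
The counting argument behind Lemma 1 can be checked on small trees. The sketch below assumes the decomposition described above, in which two adjacent nodes share a centroid path exactly when their subtree sizes lie in the same range between $2^r$ and $2^{r+1}$; the function names and the child-list representation of the tree are hypothetical.

```python
def subtree_sizes(children, root):
    # Iterative preorder; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return size


def path_changes(children, root):
    """For every node u, count the nodes v on the root-to-u path whose
    parent lies on a different centroid path."""
    size = subtree_sizes(children, root)
    # rank = floor(log2(size)); equal rank means same size range.
    rank = {v: s.bit_length() - 1 for v, s in size.items()}
    changes = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            changes[c] = changes[v] + (rank[c] != rank[v])
            stack.append(c)
    return changes
```

Since the rank never increases along a root-to-leaf path and lies between 0 and log n, the count returned for any node is at most log n, as the lemma states.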

Recall that a solution for indexing text that 
supports queries of a pattern with k don’t cares 
has been described. Unfortunately, when index- 
ing to support k mismatch queries, not to mention 
k edit operation queries, the traversal down a 
k-errata trie can be very time consuming as 
frequent branching is required since an error 
may occur at any location of the pattern. To 
circumvent this problem, search many error trees 
in parallel. In order to do so, the error trees have 
to be grouped together. This needs to be done 
carefully; see [5] for the full details. Moreover, 
edit distance needs even more careful handling. 
The time and space of the algorithms achieved in 
[5] are as follows: 


Approximate Text Indexing: The data
structure for mismatches uses space $O(n \log^k n)$,
takes time $O(n \log^{k+1} n)$ to build, and answers
queries in time $O((\log^k n) \log\log n + m + occ)$.
For edit distance, the query time becomes
$O((\log^k n) \log\log n + m + 3^k \cdot occ)$. It must
be pointed out that this result is mostly effective
for constant k.


Approximate Dictionary Matching: For k
mismatches the data structure uses space
$O(n + d \log^k d)$, is built in time $O(n + d \log^{k+1} d)$, and
has a query time of $O((m + \log^k d) \cdot \log\log n + occ)$.
The bounds for edit distance are modified
as in the indexing problem.


Applications 


Approximate Indexing has a wide array of 
applications in signal processing, computational 
biology, and text retrieval, among others. 
Approximate Dictionary Matching is important 
in digital libraries and text retrieval systems.


Problem Definition


The problem of lossless data compression is the 
problem of compactly representing data in a for- 
mat that admits the faithful recovery of the orig- 
inal information. Lossless data compression is 
achieved by taking advantage of the redundancy 
which is often present in the data generated by 
either humans or machines. 

Dictionary-based data compression has been 
“the solution” to the problem of lossless data 
compression for nearly 15 years. This technique 
originated in two theoretical papers of Ziv 
and Lempel [15, 16] and gained popularity 
in the “1980s” with the introduction of the 
Unix tool compress (1986) and of the gif 
image format (1987). Although today there 
are alternative solutions to the problem of 
lossless data compression (e.g., Burrows- 
Wheeler compression and Prediction by Partial 
Matching), dictionary-based compression is still 
widely used in everyday applications: consider 
for example the zip utility and its variants, 
the modem compression standards V.42bis 
and V.44, and the transparent compression 
of pdf documents. The main reason for the 
success of dictionary-based compression is its 
unique combination of compression power and 
compression/decompression speed. The reader 
should refer to [13] for a review of several 
dictionary-based compression algorithms and 
of their main features. 


Key Results 


Let $T$ be a string drawn from an alphabet $\Sigma$.
Dictionary-based compression algorithms work
by parsing the input into a sequence of substrings
(also called words) $T_1, T_2, \ldots, T_d$ and by encoding
a compact representation of these substrings.
The parsing is usually done incrementally and
on-line with the following iterative procedure.
Assume the encoder has already parsed the substrings
$T_1, T_2, \ldots, T_{i-1}$. To proceed, the encoder
maintains a dictionary of potential candidates
for the next word $T_i$ and associates a unique
codeword with each of them. Then, it looks at
the incoming data, selects one of the candidates,
and emits the corresponding codeword. Different
algorithms use different strategies for establishing
which words are in the dictionary and for
choosing the next word $T_i$. A larger dictionary
implies a greater flexibility for the choice of the 
next word, but also longer codewords. Note that 
for efficiency reasons the dictionary is usually not 
built explicitly: the whole process is carried out 
implicitly using appropriate data structures. 

Dictionary-based algorithms are usually clas- 
sified into two families whose respective ances- 
tors are two parsing strategies, both proposed by 
Ziv and Lempel and today universally known as 
LZ78 [16] and LZ77 [15]. 


The LZ78 Algorithm 

Assume the encoder has already parsed the words
$T_1, T_2, \ldots, T_{i-1}$, that is, $T = T_1 T_2 \cdots T_{i-1} \hat{T}_i$
for some text suffix $\hat{T}_i$. The LZ78 dictionary
is defined as the set of strings obtained by
adding a single character to one of the words
$T_1, \ldots, T_{i-1}$ or to the empty word. The next
word $T_i$ is defined as the longest prefix of $\hat{T}_i$
which is a dictionary word. For example, for
$T$ = aabbaaabaabaabba the LZ78 parsing is:
a, ab, b, aa, aba, abaa, bb, a. It is easy to see
that all words in the parsing are distinct, with
the possible exception of the last one (in the
example the word a). Let $T_0$ denote the empty
word. If $T_i = T_j \alpha$, with $0 \le j < i$ and $\alpha \in \Sigma$,
the codeword emitted by LZ78 for $T_i$ will be
the pair $(j, \alpha)$. Thus, if LZ78 parses the string
$T$ into $t$ words, its output will be bounded by
$t \log t + t \log |\Sigma| + O(t)$ bits.
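
The parsing rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `lz78_parse` and the plain dictionary lookup are assumptions made for readability (practical implementations keep the dictionary as a trie), and the handling of a repeated final word follows the remark about the last word of the parsing.

```python
def lz78_parse(text):
    """Return (words, codewords) for the LZ78 parsing of `text`.

    codewords[i] is the pair (j, ch) meaning "previously parsed word number j,
    followed by the character ch"; word number 0 is the empty word.
    """
    index_of = {"": 0}            # parsed word -> its index
    words, codewords = [], []
    i, n = 0, len(text)
    while i < n:
        w = ""
        # Extend w while it is still a previously parsed word.
        while i < n and w + text[i] in index_of:
            w += text[i]
            i += 1
        if i < n:
            # Usual case: w is a parsed word and text[i] is the added character.
            j, ch = index_of[w], text[i]
            word = w + ch
            i += 1
        else:
            # End of input: the last word repeats an earlier word.
            j, ch = index_of[w[:-1]], w[-1]
            word = w
        index_of.setdefault(word, len(words) + 1)
        words.append(word)
        codewords.append((j, ch))
    return words, codewords


words, codewords = lz78_parse("aabbaaabaabaabba")
print(words)
print(codewords)
```

The repeated string concatenation makes this sketch quadratic in the worst case; it is meant to mirror the definition, not to be efficient.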


The LZ77 Algorithm 

Assume the encoder has already parsed the words
$T_1, T_2, \ldots, T_{i-1}$, that is, $T = T_1 T_2 \cdots T_{i-1} \hat{T}_i$
for some text suffix $\hat{T}_i$. The LZ77 dictionary
is defined as the set of strings of the form $w\alpha$
where $\alpha \in \Sigma$ and $w$ is a substring of $T$ starting
in the already parsed portion of $T$. The next
word $T_i$ is defined as the longest prefix of $\hat{T}_i$
which is a dictionary word. For example, for
$T$ = aabbaaabaabaabba the LZ77 parsing is:
a, ab, ba, aaba, abaabb, a. Note that, in some
sense, $T_5$ = abaabb is defined in terms of itself:
it is a copy of the dictionary word $w\alpha$ with $w$
starting at the second a of $T_4$ and extending into
$T_5$! It is easy to see that all words in the parsing
are distinct, with the possible exception of the
last one (in the example the word a), and that
the number of words in the LZ77 parsing is
smaller than in the LZ78 parsing. If $T_i = w\alpha$
with $\alpha \in \Sigma$, the codeword for $T_i$ is the triplet
$(s_i, \ell_i, \alpha)$ where $s_i$ is the distance from the start
of $T_i$ to the last occurrence of $w$ in $T_1 T_2 \cdots T_{i-1}$,
and $\ell_i = |w|$.
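
A brute-force Python sketch of this parsing rule follows. The names `lz77_parse` and `lz77_decode` are hypothetical, the search is deliberately naive, and two conventions are assumptions of the sketch: an empty $w$ is encoded with distance 0, and ties between equally long matches go to the rightmost starting position. The decoder copies one character at a time, which is what makes a self-overlapping word such as abaabb well defined.

```python
def lz77_parse(text):
    """Return (words, codewords) for the LZ77 parsing of `text`.

    Each codeword is a triplet (s, l, ch): copy l characters starting
    s positions before the current word, then append the character ch.
    """
    n = len(text)
    words, codewords = [], []
    i = 0
    while i < n:
        best_len, best_start = 0, i
        for start in range(i):          # w must start in the parsed portion
            length = 0
            # The copy may run past position i (overlap with the new word);
            # one character is always left over to serve as ch.
            while i + length < n - 1 and text[start + length] == text[i + length]:
                length += 1
            if length > 0 and length >= best_len:
                best_len, best_start = length, start
        ch = text[i + best_len]
        words.append(text[i:i + best_len + 1])
        codewords.append((i - best_start if best_len else 0, best_len, ch))
        i += best_len + 1
    return words, codewords


def lz77_decode(codewords):
    """Invert lz77_parse by copying character by character."""
    out = []
    for s, l, ch in codewords:
        start = len(out) - s
        for k in range(l):
            out.append(out[start + k])   # may read characters written in this loop
        out.append(ch)
    return "".join(out)


words, codewords = lz77_parse("aabbaaabaabaabba")
print(words)
print(lz77_decode(codewords))
```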


Entropy Bounds 

The performance of dictionary-based com- 
pressors has been extensively investigated 
since their introduction. In [15] it is shown 
that LZ77 is optimal for a certain family of 
sources, and in [16] it is shown that LZ78 
achieves asymptotically the best compression 
ratio attainable by a finite-state compressor. This 
implies that, when the input string is generated 
by an ergodic source, the compression ratio 
achieved by LZ78 approaches the entropy of 
the source. More recent work has established 
similar results for other Ziv-Lempel compressors 
and has investigated the rate of convergence of 
the compression ratio to the entropy of the source 
(see [14] and references therein). 

It is possible to prove compression bounds 
without probabilistic assumptions on the input, 
using the notion of empirical entropy. For any 
string $T$, the order $k$ empirical entropy $H_k(T)$ is
the maximum compression one can achieve using 
a uniquely decodable code in which the codeword 
for each character may depend on the k charac- 
ters immediately preceding it [6]. The following 
lemma is a useful tool for establishing upper 
bounds on the compression ratio of dictionary- 
based algorithms which hold pointwise on every 
string T. 
Lemma 1 ([6, Lemma 2.3]) Let $T = T_1 T_2 \cdots T_d$
be a parsing of $T$ such that each word $T_i$ appears
at most $M$ times. Then, for any $k \ge 0$,

$$d \log d \le |T| H_k(T) + d \log(|T|/d) + d \log M + O(kd + d),$$

where $H_k(T)$ is the k-th order empirical
entropy of $T$.


Consider, for example, the algorithm LZ78. It 
parses the input $T$ into $t$ distinct words (ignoring
the last word in the parsing) and produces an
output bounded by $t \log t + t \log |\Sigma| + O(t)$
bits. Using Lemma 1 and the fact that
$t = O(|T| / \log |T|)$, one can prove that LZ78’s
output is at most $|T| H_k(T) + o(|T|)$ bits. Note
that the bound holds for any $k \ge 0$: this means
that LZ78 is essentially “as powerful” as any 
compressor that encodes the next character on 
the basis of a finite context. 
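
To make the quantity $|T| H_k(T)$ concrete, the following Python sketch computes the empirical entropy from character counts. It assumes the standard formulation in which $H_0$ is the Shannon entropy of the character frequencies and $H_k$ averages the $H_0$ of the characters that follow each length-$k$ context; the function names `h0` and `hk` are hypothetical, and logarithms are taken in base 2.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(s).values())


def hk(text, k):
    """k-th order empirical entropy of text, in bits per character."""
    if k == 0:
        return h0(text)
    followers = defaultdict(list)
    for i in range(k, len(text)):
        # Group each character by the k characters immediately preceding it.
        followers[text[i - k:i]].append(text[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(text)


T = "aabbaaabaabaabba"
for k in range(3):
    print(k, len(T) * hk(T, k))   # the term |T| H_k(T) for k = 0, 1, 2
```

On very short strings the lower-order terms of the bound dominate, so the comparison with the LZ78 output size only becomes meaningful for long inputs.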


Algorithmic Issues 

One of the reasons for the popularity of 
dictionary-based compressors is that they admit 
linear-time, space-efficient implementations. 
These implementations sometimes require non- 
trivial data structures: the reader is referred 
to [12] and references therein for further reading 
on this topic. 


Greedy vs. Non-Greedy Parsing 

Both LZ78 and LZ77 use a greedy parsing strat- 
egy in the sense that, at each step, they select the 
longest prefix of the unparsed portion which is 
in the dictionary. It is easy to see that for LZ77 
the greedy strategy yields an optimal parsing; 
that is, a parsing with the minimum number of 
words. Conversely, greedy parsing is not optimal 
for LZ78: for any sufficiently large integer m 
there exists a string that can be parsed to $O(m)$
words and that the greedy strategy parses in
$\Omega(m^{3/2})$ words. In [9] the authors describe an efficient
algorithm for computing an optimal parsing
for the LZ78 dictionary and, indeed, for any
dictionary with the prefix-completeness property 
(a dictionary is prefix-complete if any prefix 
of a dictionary word is also in the dictionary). 
Interestingly, the algorithm in [9] is a one-step
lookahead greedy algorithm: rather than choosing 
the longest possible prefix of the unparsed portion 
of the text, it chooses the prefix that results in the 
longest advancement in the next iteration. 


Applications 


The natural application field of dictionary-based 
compressors is lossless data compression (see, for 
example [13]). However, because of their deep 
mathematical properties, the Ziv-Lempel parsing 
rules have also found applications in other algo- 
rithmic domains. 


Prefetching 

Krishnan and Vitter [7] considered the problem 
of prefetching pages from disk into memory to 
anticipate users’ requests. They combined LZ78 
with a pre-existing prefetcher $P_1$ that is asymptotically
at least as good as the best memoryless
prefetcher, to obtain a new algorithm P that is 
asymptotically at least as good as the best finite- 
state prefetcher. LZ78’s dictionary can be viewed 
as a trie: parsing a string means starting at the 
root, descending one level for each character in 
the parsed string and, finally, adding a new leaf. 
Algorithm P runs LZ78 on the string of page 
requests as it receives them, and keeps a copy of 
the simple prefetcher $P_1$ for each node in the trie;
at each step, $P$ prefetches the page requested by
the copy of $P_1$ associated with the node LZ78 is
currently visiting. 


String Alignment 

Crochemore, Landau and Ziv-Ukelson [4] 
applied LZ78 to the problem of sequence 
alignment, i.e., finding the cheapest sequence of 
character insertions, deletions and substitutions 
that transforms one string $T$ into another $T'$
(the cost of an operation may depend on the
character or characters involved). Assume, for
simplicity, that $|T| = |T'| = n$. In 1980 Masek
and Paterson proposed an $O(n^2 / \log n)$-time
algorithm with the restriction that the costs be
rational; Crochemore et al.’s algorithm allows
real-valued costs, has the same asymptotic cost
in the worst case, and is asymptotically faster for
compressible texts.

The idea behind both algorithms is to break 
into blocks the matrix $A[1 \ldots n, 1 \ldots n]$ used by
the obvious $O(n^2)$-time dynamic programming
algorithm. Masek and Paterson break it into
uniform-sized blocks, whereas Crochemore
et al. break it according to the LZ78 parsing
of $T$ and $T'$. The rationale is that, by the
nature of LZ78 parsing, whenever they come
to solve a block $A[i \ldots i', j \ldots j']$, they can
solve it in $O(i' - i + j' - j)$ time because
they have already solved blocks identical to
$A[i \ldots i'-1, j \ldots j']$ and $A[i \ldots i', j \ldots j'-1]$
[8]. Lifshits, Mozes, Weimann and Ziv- 
Ukelson [8] recently used a similar approach 
to speed up the decoding and training of hidden 
Markov models. 
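
For reference, the quadratic-time dynamic program that both block decompositions accelerate can be sketched as follows. This is only the baseline algorithm, not Masek and Paterson's or Crochemore et al.'s method; the function name `alignment_cost` and the default unit costs are assumptions of the sketch, and the matrix has one extra row and column for the empty prefixes.

```python
def alignment_cost(s, t,
                   sub=lambda a, b: 0 if a == b else 1,
                   indel=lambda a: 1):
    """Cheapest cost of transforming s into t by insertions, deletions
    and substitutions; `sub` and `indel` give per-character costs."""
    n, m = len(s), len(t)
    # A[i][j] = cost of aligning the prefix s[:i] with the prefix t[:j].
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        A[i][0] = A[i - 1][0] + indel(s[i - 1])
    for j in range(1, m + 1):
        A[0][j] = A[0][j - 1] + indel(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = min(
                A[i - 1][j] + indel(s[i - 1]),              # delete s[i-1]
                A[i][j - 1] + indel(t[j - 1]),              # insert t[j-1]
                A[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # substitute or match
            )
    return A[n][m]


print(alignment_cost("kitten", "sitting"))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is why a block can be solved from its boundary once identical sub-blocks have been solved before.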


Compressed Full-Text Indexing 

Given a text T, the problem of compressed full- 
text indexing is defined as the task of building an 
index for $T$ that takes space proportional to the entropy
of $T$ and that supports the efficient retrieval
of the occurrences of any pattern P in T. In [10] 
Navarro proposed a compressed full-text index 
based on the LZ78 dictionary. The basic idea is 
to keep two copies of the dictionary as tries: one 
storing the dictionary words, the other storing 
their reversal. The rationale behind this scheme 
is the following. Since any non-empty prefix of 
a dictionary word is also in the dictionary, if the 
sought pattern P occurs within a dictionary word, 
then P is a suffix of some word and easy to find 
in the second dictionary. If P overlaps two words, 
then some prefix of P is a suffix of the first word— 
and easy to find in the second dictionary—and the 
remainder of P is a prefix of the second word—and 
easy to find in the first dictionary. The case when 
P overlaps three or more words is a generalization 
of the case with two words. Recently, Arroyuelo 
et al. [1] improved the original data structure 
in [10]. For any text $T$, the improved index uses
$(2 + \epsilon)|T| H_k(T) + o(|T| \log |\Sigma|)$ bits of space,
where $H_k(T)$ is the k-th order empirical entropy
of $T$, and reports all occ occurrences of $P$ in $T$ in
$O(|P|^2 \log |P| + (|P| + occ) \log |T|)$ time.
Independently of [10], in [5] the LZ78 parsing
was used together with the Burrows-Wheeler
compression algorithm to design the first full-text
index that uses $o(|T| \log |T|)$ bits of space
and reports the occ occurrences of $P$ in $T$ in
$O(|P| + occ)$ time. If $T = T_1 T_2 \cdots T_d$ is the
LZ78 parsing of $T$, in [5] the authors consider the
string $T_\$ = T_1 \$ T_2 \$ \cdots \$ T_d \$$ where \$ is a new
character not belonging to $\Sigma$. The string $T_\$$ is
then compressed using the Burrows-Wheeler
transform. The $’s play the role of anchor points: 
their positions in $T_\$$ are stored explicitly so that,
to determine the position in T of any occurrence 
of P, it suffices to determine the position with 
respect to any of the $’s. The properties of 
the LZ78 parsing ensure that the overhead of 
introducing the $’s is small, but at the same time 
the way they are distributed within $T_\$$ guarantees
the efficient location of the pattern occurrences. 

Related to the problem of compressed full-text 
indexing is the compressed matching problem 
in which text and pattern are given together (so 
the former cannot be preprocessed). Here the 
task consists in performing string matching in 
a compressed text without decompressing it. For 
dictionary-based compressors this problem was 
first raised in 1994 by A. Amir, G. Benson, 
and M. Farach, and has received considerable 
attention since then. The reader is referred to [11] 
for a recent review of the many theoretical and 
practical results obtained on this topic. 


Substring Compression Problems 

Substring compression problems involve 
preprocessing T to be able to efficiently answer 
queries about compressing substrings: e.g., how 
compressible is a given substring $s$ in $T$? what
is $s$’s compressed representation? or, what is the
least compressible substring of a given length $\ell$?
These are important problems in bioinformatics 
because the compressibility of a DNA sequence 
may give hints as to its function, and because 
some clustering algorithms use compressibility 
to measure similarity. The solutions to these 
problems are often trivial for simple compressors, 
such as Huffman coding or run-length encoding, 
but they are open for more powerful algorithms, 
such as dictionary-based compressors, BWT 
compressors, and PPM compressors. Recently,
Cormode and Muthukrishnan [3] gave some
preliminary solutions for LZ77. For any string $s$,
let $C(s)$ denote the number of words in the LZ77-parsing
of $s$, and let LZ77($s$) denote the LZ77-compressed
representation of $s$. In [3] the authors
show that, with $O(|T| \, \mathrm{polylog}(|T|))$ time preprocessing,
for any substring $s$ of $T$ they can: a)
compute LZ77($s$) in $O(C(s) \log |T| \log\log |T|)$
time, b) compute an approximation of $C(s)$ within
a factor $O(\log |T| \log^* |T|)$ in $O(1)$ time, c) find
a substring of length $\ell$ that is close to being
the least compressible in $O(|T| \ell / \log \ell)$ time.
These bounds also apply to general versions 
of these problems, in which queries specify 
another substring f in T as context and ask about 
compressing substrings when LZ77 starts with 
a dictionary already containing the words in the 
LZ77 parsing of f. 


Grammar Generation 

Charikar et al. [2] considered LZ78 as an ap- 
proximation algorithm for the NP-hard problem 
of finding the smallest context-free grammar that 
generates only the string T. The LZ78 parsing of 
T can be viewed as a context-free grammar in 
which for each dictionary word $T_i = T_j \alpha$ there is
a production $X_i \to X_j \alpha$. For example, for $T$ =
aabbaaabaabaabba the LZ78 parsing is: a, ab,
b, aa, aba, abaa, bb, a, and the corresponding
grammar is: $S \to X_1 \cdots X_7 X_1$, $X_1 \to a$, $X_2 \to X_1 b$,
$X_3 \to b$, $X_4 \to X_1 a$, $X_5 \to X_2 a$,
$X_6 \to X_5 a$, $X_7 \to X_3 b$. Charikar et al. showed LZ78’s
approximation ratio is in
$O((|T| / \log |T|)^{2/3}) \cap \Omega(|T|^{2/3} / \log |T|)$;
i.e., the grammar it produces
has size at most $f(|T|) \cdot m^*$, where $f(|T|)$ is
a function in this intersection and $m^*$ is the size
of the smallest grammar. They also showed $m^*$ is
at least the number of words output by LZ77 on
$T$, and used LZ77 as the basis of a new algorithm
with approximation ratio $O(\log(|T| / m^*))$.
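
The correspondence between the LZ78 parsing and a grammar can be checked mechanically. The Python sketch below builds the productions and expands the start symbol back into the text; the names `lz78_grammar` and `expand` are hypothetical, nonterminals are represented as the strings "S" and "X1", "X2", and so on, and the sketch assumes the text itself does not contain the character "S".

```python
def lz78_grammar(text):
    """Build the grammar induced by the LZ78 parsing of `text`.

    Returns a dict mapping each nonterminal to its right-hand side
    (a list of nonterminals and terminal characters).
    """
    index_of = {"": 0}        # parsed word -> index of its nonterminal
    rules, start = {}, []
    i, n = 0, len(text)
    while i < n:
        w = ""
        while i < n and w + text[i] in index_of:
            w += text[i]
            i += 1
        if i < n:
            word = w + text[i]
            i += 1
        else:
            word = w          # final word repeating an earlier one
        if word not in index_of:
            k = len(index_of)
            index_of[word] = k
            j = index_of[word[:-1]]
            # X_k -> X_j ch, or X_k -> ch when the prefix is the empty word.
            rules[f"X{k}"] = ([f"X{j}"] if j else []) + [word[-1]]
        start.append(f"X{index_of[word]}")
    rules["S"] = start
    return rules


def expand(rules, symbol="S"):
    """Expand a symbol into the terminal string it derives."""
    if symbol not in rules:
        return symbol         # terminal character
    return "".join(expand(rules, s) for s in rules[symbol])


g = lz78_grammar("aabbaaabaabaabba")
for lhs, rhs in g.items():
    print(lhs, "->", " ".join(rhs))
```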


URL to Code 


The source code of the gzip tool (based on LZ77) 
is available at the page http://www.gzip.org/. An 
LZ77-based compression library zlib is available 
from http://www.zlib.net/. A more recent, and
more efficient, dictionary-based compressor is
LZMA (Lempel-Ziv Markov chain Algorithm),
whose source code is available from
http://www.7-zip.org/sdk.html.
Problem Definition


Many datasets can be represented by graphs, 
where nodes correspond to individuals and 
edges capture relationships between them. On 
one hand, such datasets contain potentially 
sensitive information about individuals; on the 
other hand, there are significant public benefits 
from allowing access to aggregate information 
about the data. Thus, analysts working with
such graphs are faced with two conflicting 
goals: protecting privacy of individuals and 
publishing accurate aggregate statistics. This 
article describes algorithms for releasing accurate 
graph statistics while preserving a rigorous 
notion of privacy, called differential privacy. 

Differential privacy was introduced by Dwork 
et al. [6]. It puts a restriction on the algorithm that 
processes sensitive data and publishes the output. 
Intuitively, differential privacy requires that, for 
every individual, the output distribution of the 
algorithm is roughly the same whether or not this 
individual’s data is present in the dataset. Next, 
we give a formal definition of differential privacy, 
specialized to datasets represented by graphs. 

Two graphs are called neighbors if one can be 
obtained from the other by removing a node and 
its adjacent edges. Given a parameter $\epsilon > 0$, an
algorithm $A$ is $\epsilon$-node differentially private if for
all neighbor graphs $G$ and $G'$ and for all sets $S$ of
possible outputs produced by $A$:


$$\Pr[A(G) \in S] \le e^{\epsilon} \cdot \Pr[A(G') \in S].$$


This variant of differential privacy is called 
node-differential privacy because neighbor 
graphs are defined with respect to node removals. 
Analogously, we can define edge differential 
privacy by letting graphs be neighbors if 
they differ in exactly one edge. Intuitively, 
edge differential privacy protects edges (which 
represent connections between people), while 
node-differential privacy protects nodes together 
with their adjacent edges (i.e., all information 
pertaining to individuals). Node-differential 
privacy is a stronger privacy definition, but it 
is much harder to attain because it requires the 
output distribution of the algorithm to hide much 
larger differences in the input graph. 
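
The gap between the two definitions shows up already for the simplest statistic, the number of edges. The Python sketch below uses the Laplace mechanism, a standard technique that this entry does not describe and that is brought in here purely as an illustration: under edge differential privacy, neighboring graphs differ in one edge, so the edge count changes by at most 1 and Laplace noise of scale $1/\epsilon$ suffices. The function names are hypothetical.

```python
import random


def edge_count(edges):
    """Number of distinct undirected edges."""
    return len({frozenset(e) for e in edges})


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_edge_count(edges, epsilon, rng=None):
    """Edge count plus Laplace noise of scale 1/epsilon.

    Assumption: this targets edge differential privacy only, where
    adding or removing one edge changes the count by at most 1.
    """
    rng = rng or random.Random()
    return edge_count(edges) + laplace_noise(1.0 / epsilon, rng)


triangle = [(1, 2), (2, 3), (3, 1)]
print(private_edge_count(triangle, epsilon=0.5))
```

Under node differential privacy the same recipe breaks down: removing one node from an $n$-node graph can remove up to $n - 1$ edges, so noise calibrated to that worst case would swamp the edge count of a sparse graph.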

We would like to design differentially pri- 
vate algorithms (preferably, node-differentially 
private) that compute accurate graph statistics 
on a large family of realistic graphs. Typically, 
graphs that contain sensitive information, such 
as friendships, sexual relationships, and com- 
munication patterns, are sparse. Some examples 
of graph statistics we would like to compute 
on these graphs are the number of edges, small
subgraph counts, and the degree distribution. 

Most work on the topic considers an analyst 
who wants to evaluate a real-valued function f 
on the private input graph G (e.g., the number 
of triangles or the number of connected com- 
ponents in G). The goal is to release as good 
an approximation as possible to the true value 
$f(G)$. Differentially private algorithms must be
randomized, so we try to minimize the expectation
of the random variable $\mathrm{Err}_A(G) = |A(G) - f(G)|$.
We will also discuss work on algorithms
that release higher-dimensional summaries (i.e., 
output a real vector). 


Bibliographical Notes 

Edge privacy was first studied by Nissim 
et al. [16], and the distinction between node and 
edge privacy was laid out by Hay et al. [9]. Edge 
differentially private algorithms for a variety of 
tasks have been widely investigated. Examples 
include subgraph counts, degree distributions, 
and parameters of generative statistical models. 
Gehrke et al. [7] investigated a notion whose 
strength lies between edge and node privacy: 
node privacy for bounded-degree graphs. (The 
focus of their work is a generalization of 
differential privacy, called zero-knowledge 
privacy.) 

Until recently, no node-differentially private 
algorithms (where privacy guarantees hold with 
respect to all graphs) were known that compute 
accurate graph statistics on realistic (namely, 
sparse) graphs. The first such algorithms were 
designed independently by Blocki et al. [3], 
Kasiviswanathan et al. [11], and Chen and 
Zhou [5]. Those algorithms look at releasing 
one real-valued statistic at a time. Two more 
recent works focus on higher-dimensional node- 
private releases: Raskhodnikova and Smith [17] 
and Borgs et al. [4]. 

This encyclopedia entry focuses on node- 
differentially private algorithms, since these 
offer the strongest privacy guarantees. Progress, 
however, continues on edge-private algorithms; 
see Lin and Kifer [13], Karwa and Slavkovic [10], 
Lu and Miklau [14], and Zhang et al. [18] for 
recent results. 
Key Results


The main difficulty in the design of node-private 
algorithms is that techniques based on local 
sensitivity of a function (which are the basis of 
the best edge-private algorithms) yield node- 
private algorithms whose error on “typical” 
inputs swamps the statistic that one wants to 
release. The local sensitivity of a function f 
is a discrete analogue of the derivative of f 
— it measures how much the value of f can 
change when the input graph is replaced with its 
neighbor. On sparse graphs, the local sensitivity 
can be larger than the value of the function. Any 
method whose error is proportional to the local 
sensitivity will have large relative error. 


Focus on a “Preferred Subset” 

To get around the challenge of high local sensi- 
tivity, two works [3, 11] independently designed 
algorithms that are given a set S of “nice” graphs 
that hopefully contains G (e.g., graphs with an 
upper bound on the maximum degree). These 
algorithms are private on all graphs and return an 
accurate answer on graphs in S. What makes this 
approach work is that S is selected so that the 
sensitivity of f is small when restricted to inputs 
in S. 

Let $\mathcal{G}$ denote the set of all labeled, undirected
graphs. We will call $S \subseteq \mathcal{G}$ the “preferred”
subset. Define the Lipschitz constant (also called
the restricted sensitivity) of $f$ on $S$ to be


$$\Delta_f(S) = \sup_{G, G' \in S} \frac{\| f(G') - f(G) \|_1}{d_{node}(G, G')},$$


where $d_{node}$ is the node distance between two
graphs, that is, the number of vertex insertions and
deletions needed to go from $G$ to $G'$. Blocki
et al. [3] and Kasiviswanathan et al. [11] give 
methods for adding noise proportional to the 
Lipschitz constant of f on S. 


Theorem 1 ([3, 11]) For every $S \subseteq \mathcal{G}$, function
$f : S \to \mathbb{R}$ and $\epsilon > 0$, there exists an algorithm
$A_S$ that is $\epsilon$-differentially private (for all inputs)
and such that, for all $G \in S$,

$$\mathbb{E}|A_S(G) - f(G)| = O(\Delta_f(S) / \epsilon^2).$$


Moreover, for $S = \mathcal{G}_D$ (the set of D-bounded
graphs), the running time of $A_S$ is the running time
for one evaluation of $f$ plus a fixed polynomial in
the size of $G$.


The same works [3, 11] also give generic re- 
ductions showing that given any algorithm that is 
$\epsilon$-differentially private when restricted to graphs
in $S$, one can design an algorithm $A$ that has similar
behavior on graphs in $S$ but is $\epsilon'$-differentially
private for all inputs, for $\epsilon'$ not too much larger
than $\epsilon$.


“Down” Sensitivity 

Rather than focusing on a single “nice” subset, 
some works [5, 17] sought to add noise propor- 
tional to a quantity related to, but usually much 
smaller than, the local sensitivity. 

Define the down sensitivity (called empirical 
global sensitivity when first defined by Chen 
and Zhou [5]) of f at a graph G to be the 
Lipschitz constant of f when restricted to the set 
of induced subgraphs of G. Specifically, we write 
$G \preceq H$ to denote that $G$ is an induced subgraph
of $H$ (i.e., $G$ can be obtained by deleting a set of
vertices from $H$) and define the down sensitivity
to be


$$DS_f(G) = \max_{H, H' \text{ neighbors},\; H \preceq H' \preceq G} |f(H) - f(H')|.$$


By carefully (and privately) selecting the “preferred”
subset based on the input, one can add
noise essentially proportional to the down
sensitivity.
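
The definition of down sensitivity can be evaluated directly on tiny graphs. The following brute-force Python sketch enumerates every induced subgraph and every single-node removal inside it; it takes exponential time and is meant only to make the definition concrete. The function names and the choice of the edge count as the statistic $f$ are assumptions of the sketch. For the edge count, the value returned equals the maximum degree of the graph, since removing a node changes the count by that node's degree in the induced subgraph.

```python
from itertools import combinations


def induced_edge_count(edges, nodes):
    """Number of edges of the subgraph induced by `nodes`."""
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def down_sensitivity(nodes, edges, f=induced_edge_count):
    """Brute-force down sensitivity of f at the graph (nodes, edges).

    Maximizes |f(H') - f(H)| over all induced subgraphs H' of the graph
    and all H obtained from H' by removing a single node.
    """
    nodes = list(nodes)
    best = 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            big = set(subset)
            f_big = f(edges, big)
            for v in subset:
                best = max(best, abs(f_big - f(edges, big - {v})))
    return best


star = [(0, 1), (0, 2), (0, 3)]          # center 0 with three leaves
print(down_sensitivity(range(4), star))
```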


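Theorem 2 ([17]) For every monotone function
$f : \mathcal{G} \to \mathbb{R}$ and $\epsilon > 0$, there is an algorithm $A_f$
that is $\epsilon$-differentially private and such that, for
all $G \in \mathcal{G}$,


$$\mathbb{E}|A_f(G) - f(G)| = O\left( \frac{DS_f(G)}{\epsilon^2} \cdot \log\log\Big( \max_{G'} DS_f(G') \Big) \right).$$
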
Moreover, $A_f$ can be made efficient when $f$ is
a generalized linear query (a class that includes 
counting occurrences of a fixed subgraph). 


The down sensitivity is low for many commonly
studied statistics in graphs that satisfy $\alpha$-decay,
a condition on the degree distribution that
is satisfied by known generative models (including
those that generate “scale-free” graphs). (See [11] for
a definition of $\alpha$-decay.)


Lipschitz Extensions and 
Higher-Dimensional Releases 
The main technical tool in the down-sensitivity- 
based results [5, 17] is the construction of 
efficient (i.e., polynomial time computable) 
Lipschitz extensions of the function f from 
subsets S of graphs to the space of all graphs. 
Kasiviswanathan et al. [11] and Chen and 
Zhou [5] give efficient Lipschitz extensions of 
several useful functions (including graph counts) 
that return a single real value. Raskhodnikova and 
Smith [17] give efficient Lipschitz extensions of 
higher-dimensional functions, namely, the degree 
distribution and adjacency matrix of a graph. 
Borgs et al. [4] use the Lipschitz extension 
technique together with the exponential 
mechanism to provide the first node-differentially 
private algorithms for fitting high-dimensional 
statistical models to a given graph (specifically, 
they consider stochastic block models and 
generalizations thereof). 


Applications 


The algorithms discussed above address a real 
problem: datasets containing sensitive informa- 
tion about relationships among a collection of 
individuals are often valuable sources of informa- 
tion, but publishing useful summaries about such 
data without leaking individual information is 
difficult. Even when the graphs are “anonymized” 
by removing all obviously identifying informa- 
tion, such as names, addresses, birthdays, and zip 
codes, they present a privacy risk. For example, 
[1, 15] give de-anonymization attacks based only 
on unlabeled links. Node-differentially private 
algorithms offer a principled method for releasing
information about a network while providing rig- 
orous privacy guarantees (though some authors 
argue that even stronger notions may be needed 
[7, 12]). 


Open Problems 


Gupta et al. [8] and Blocki et al. [2] give edge 
differentially private algorithms for releasing a 
data structure that approximates the sizes of all 
cuts in the input graph in the following sense: for 
any cut, with high probability, the estimated cut 
size is accurate (the first reference gives weaker 
approximation guarantees with a stronger quanti- 
fier order: with high probability, all cut sizes are 
accurate). It is open whether a node-differentially 
private algorithm can obtain similar results. 

For datasets that do not contain information 
about relationships, but only contain personal attributes
that come from a relatively small set, differentially
private algorithms can output a large
number of statistics at once (see Query Release
via Online Learning and Geometric Approaches
to Answering Queries, cross-referenced
below). It is open how to achieve similar
results for graph statistics, even with edge differential
privacy.

Finally, all algorithms we discussed release 
numerical graph statistics. The subject of differ- 
entially private synthetic graphs is largely unex- 
plored. See [10, 13] for initial results. 
Problem Definition


Notations 

Let $G = (V, E)$ be a plane geometric network,
whose vertex set $V$ is a finite set of point sites in
$\mathbb{R}^2$, connected by an edge set $E$ of non-crossing
straight line segments with endpoints in $V$. For
two points $p \ne q \in V$, let $\xi_G(p, q)$ denote a
shortest path from $p$ to $q$ in $G$. Then


$$\delta(p, q) := \frac{|\xi_G(p, q)|}{|pq|} \qquad (1)$$

is the detour one encounters when using network
$G$, in order to get from $p$ to $q$, instead of walking
straight. Here, $|\cdot|$ denotes the Euclidean length.
The dilation of $G$ is defined by

$$\delta(G) := \max_{p \ne q \in V} \delta(p, q). \qquad (2)$$
This value is also known as the spanning ratio 
or the stretch factor of G. It should, however, 
not be confused with the geometric dilation of a 
network, where the points on the edges are also 
being considered, in addition to the vertices. 
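
Definitions (1) and (2) translate directly into a small computation: all-pairs shortest paths along the edges, divided by straight-line distances. The Python sketch below uses Floyd-Warshall for the shortest paths; the function name `dilation` and the input convention (a list of points plus index pairs for edges) are assumptions, and the sketch does not check that the edges are non-crossing.

```python
from math import dist, inf, sqrt


def dilation(points, edges):
    """Dilation (stretch factor) of the network with the given vertices
    and straight-line edges; edges are pairs of indices into `points`."""
    n = len(points)
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for i, j in edges:
        d[i][j] = d[j][i] = dist(points[i], points[j])
    # Floyd-Warshall: shortest path lengths within the network.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # Maximum detour over all pairs of distinct vertices.
    return max(d[i][j] / dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sides = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(dilation(square, sides), sqrt(2))
```

For the unit square with only its four sides, opposite corners are at distance $\sqrt{2}$ but the shortest network path has length 2, so the dilation is $\sqrt{2}$. Adding the center as a fifth vertex connected to all four corners brings the dilation down to 1.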
Given a finite set S of points in the plane, 
one would like to find a plane geometric network 
$G = (V, E)$ whose dilation $\delta(G)$ is as small
as possible, such that $S$ is contained in $V$. The
value of


$$\Delta(S) := \inf\{ \delta(G);\ G = (V, E) \text{ finite plane geometric network where } S \subseteq V \}$$


is called the dilation of point set $S$. The problem
is to compute, or bound, $\Delta(S)$ for a given
set $S$.


Related Work 


If edge crossings were allowed, one could use 
spanners whose stretch can be made arbitrarily 
close to 1; see the monographs by Eppstein [6] 
or Narasimhan and Smid [12]. Different types 
of triangulations of S are known to have their 
stretch factors bounded from above by small constants,
among them the Delaunay triangulation
of stretch $\le 2.42$; see Dobkin et al. [3], Keil and
Gutwin [10], and Das and Joseph [2]. Eppstein
[5] has characterized all triangulations $T$ of dilation
$\delta(T) = 1$; these triangulations are shown
in Fig. 1. Trivially, $\Delta(S) = 1$ holds for each
point set $S$ contained in the vertex set of such a
triangulation $T$.


Key Results 


The previous remark’s converse also turns out to 
be true. 
Dilation of Geometric Networks, Fig. 1 The
triangulations of dilation 1


Theorem 1 ([11]) If $S$ is not contained in one of
the vertex sets depicted in Fig. 1, then $\Delta(S) > 1$.


That is, if a point set $S$ is not one of these
special sets, then each plane network including
$S$ in its vertex set has a dilation larger than some
lower bound $1 + \eta(S)$. The proof of Theorem 1
uses the following density result. Suppose one
connects each pair of points of $S$ with a straight
line segment. Let $S'$ be the union of $S$ and the
resulting crossing points. Now the same construction
is applied to $S'$ and repeated. For the limit
point set $S^\infty$, the following theorem holds. It
generalizes work by Hillar and Rhea [8] and by
Ismailescu and Radoičić [9] on the intersections
of lines.


Theorem 2 ([11]) If $S$ is not contained in one of
the vertex sets depicted in Fig. 1, then $S^\infty$ lies
dense in some polygonal part of the plane.


For certain infinite structures, concrete lower
bounds can be proven.


Theorem 3 ([4]) Let $N$ be an infinite plane network
all of whose faces have a diameter bounded
from above by some constant. Then
$\delta(N) > 1.00156$ holds.


Theorem 4 ([4]) Let $C$ denote the (infinite) set
of all points on a closed convex curve. Then
$\Delta(C) > 1.00157$ holds.


Theorem 5 ([4]) Given $n$ families $F_i$, $2 \le i \le n$,
each consisting of infinitely many equidistant
parallel lines. Suppose that these families are in
general position. Then their intersection graph $G$
is of dilation at least $2/\sqrt{3}$.

The proof of Theorem 5 makes use of Kronecker’s
theorem on simultaneous approximation.
The bound is attained by the packing of
equiangular triangles.

Dilation of Geometric Networks, Fig. 2 A network of
dilation ~1.1247

Finally, there is a general upper bound to the 
dilation of finite point sets. 


Theorem 6 ([4]) Each finite point set $S$ is of
dilation $\Delta(S) < 1.1247$.


To prove this upper bound, one can embed 
any given finite point set S in the vertex set of 
a scaled, and slightly deformed, finite part of 
the network depicted in Fig. 2. It results from a 
packing of equilateral triangles by replacing each 
vertex with a small triangle and by connecting 
neighboring triangles as indicated. 


Applications 


A typical university campus contains facilities 
like lecture halls, dorms, library, mensa, and 
supermarkets, which are connected by some path
system. Students in a hurry are tempted to walk 
straight across the lawn, if the shortcut seems 
worth it. After a while, this causes new paths to 
appear. Since their intersections are frequented by 
many people, they attract coffee shops or other 
new facilities. Now, people will walk across the 
lawn to get quickly to a coffee shop, and so on. 

D. Eppstein [5] has asked what happens to the 
lawn if this process continues. The above results 
show that (1) part of the lawn will be completely 
destroyed, and (2) the temptation to walk across 
the lawn cannot, in general, be made arbitrarily 
small by a clever path design. 


Open Problems 


For practical applications, upper bounds to the
weight (= total edge length) of a geometric
network would be valuable, in addition to upper
dilation bounds. Some theoretical questions
require further investigation, too. Is $\Delta(S)$
always attained by a finite network? How to compute,
or approximate, $\Delta(S)$ for a given finite set S?
Even for a set as simple as $S_5$, the corners of
a regular 5-gon, the dilation is unknown. The
smallest dilation value known, for a triangulation
containing $S_5$ among its vertices, equals 1.0204;
see Fig. 3. Finally, what is the precise value of
$\sup\{\Delta(S) ; S \text{ finite}\}$?


Cross-References 
Dilation of Geometric Networks, Fig. 3 The best
known embedding for $S_5$

Problem Definition


The performance of a communication network 
is affected by the packet collisions which occur 
when two or more packets appear simultaneously 
in the same network node (router) and all these 
packets wish to follow the same outgoing link 
from the node. Since network links have limited 
available bandwidth, the collided packets wait on 
buffers until the collisions are resolved. Colli- 
sions cause delays in the packet delivery time 
and also contribute to the network performance 
degradation. 

Direct routing is a packet delivery method 
which avoids packet collisions in the network. 
In direct routing, after a packet is injected into 
the network it follows a path to its destination 
without colliding with other packets, and thus 
without delays due to buffering, until the packet 
is absorbed at its destination node. The only delay 
that a packet experiences is at the source node 
while it waits to be injected into the network. 

In order to formulate the direct routing prob- 
lem, the network is modeled as a graph where all 
the network nodes are synchronized with a com- 
mon time clock. Network links are bidirectional, 
and at each time step any link can be crossed by 
at most two packets, one packet in each direction. 
Given a set of packets, the routing time is defined 
to be the time duration between the first packet 
injection and the last packet absorption.

Consider a set of N packets, where each packet 
has its own source and destination node. In the 
direct routing problem, the goal is first to find 
a set of paths for the packets in the network, 
and second, to find appropriate injection times 
for the packets, so that if the packets are in- 
jected at the prescribed times and follow their 
paths they will be delivered to their destinations 
without collisions. The direct scheduling problem 
is a variation of the above problem, where the 
paths for the packets are given a priori, and the
only task is to compute the injection times for 
the packets. 

A direct routing algorithm solves the direct 
routing problem (similarly, a direct scheduling 
algorithm solves the direct scheduling problem). 
The objective of any direct algorithm is to mini- 
mize the routing time for the packets. Typically, 
direct algorithms are offline, that is, the paths 
and the injection schedule are computed ahead 
of time, before the packets are injected into the 
network, since the involved computation requires 
knowledge about all packets in order to guarantee 
the absence of collisions between them. 


Key Results 


Busch, Magdon-Ismail, Mavronicolas, and Spirakis
present in [6] a comprehensive study of
direct algorithms. They study several aspects of 
direct routing such as the computational com- 
plexity of direct problems and also the design of 
efficient direct algorithms. The main results of 
their work are described below. 


Hardness of Direct Routing 

It is shown in [Sect. 4 in 6] that the optimal 
direct scheduling problem, where the paths are 
given and the objective is to compute an optimal 
injection schedule (that minimizes the routing 
time) is an NP-complete problem. This result 
is obtained with a reduction from vertex color- 
ing, where vertex coloring problems are trans- 
formed to appropriate direct scheduling problems 
in a 2-dimensional grid. In addition, it is shown 
in [6] that approximations to the direct scheduling 
problem are as hard to obtain as approximations 
to vertex coloring. A natural question is what 
kinds of approximations can be obtained in poly- 
nomial time. This question is explored in [6] for 
general and specific kinds of graphs, as described 
below. 


Direct Routing in General Graphs 

A direct algorithm is given in [Section 3 
in 6] that solves approximately the optimal 
direct scheduling problem in general network 
topologies. Suppose that a set of packets and
respective paths are given. The injection schedule 
is computed in polynomial time with respect to 
the size of the graph and the number of packets. 
The routing time is measured with respect to the 
congestion C of the packet paths (the maximum 
number of paths that use an edge), and the 
dilation D (the maximum length of any path). 
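
The two path parameters can be computed directly from
their definitions. The following is a minimal sketch,
assuming that each path is given as a list of nodes and
that an edge is counted without regard to direction; the
function name is illustrative and not taken from [6].

```python
from collections import Counter

def congestion_and_dilation(paths):
    """Return (C, D) for a list of paths, each a list of nodes."""
    edge_use = Counter()
    for path in paths:
        # Count every undirected edge at most once per path (assumption).
        edges = {frozenset((u, v)) for u, v in zip(path, path[1:])}
        edge_use.update(edges)
    C = max(edge_use.values(), default=0)           # busiest edge
    D = max((len(p) - 1 for p in paths), default=0)  # longest path, in edges
    return C, D
```

For the paths a-b-c, c-b-d, and a-b, the edges {a, b} and
{b, c} are each used by two paths, so C = 2, and the
longest path has two edges, so D = 2.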

The result in [6] establishes the existence of
a simple greedy direct scheduling algorithm with
routing time $rt = O(C \cdot D)$. In this algorithm,
the packets are processed in an arbitrary order
and each packet is assigned the smallest available
injection time. The resulting routing time is
worst-case optimal, since there exist instances of
direct scheduling problems for which no direct
scheduling algorithm can achieve a better routing
time. A trivial lower bound on the routing time
of any direct scheduling problem is $\Omega(C + D)$,
since no algorithm can deliver the packets faster
than the congestion or dilation of the paths. Thus,
in the general case, the algorithm in [6] has
routing time $rt = O((rt^*)^2)$, where $rt^*$ is the
optimal routing time.
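
The greedy rule can be illustrated by a short sketch. It
assumes the model of the problem definition (a packet
injected at time t crosses the i-th link of its path
during time step t + i, and a link carries at most one
packet per direction and step). The function names and
the data layout are illustrative choices, not the
formulation used in [6].

```python
def greedy_direct_schedule(paths):
    """Give each packet the smallest injection time that avoids collisions.

    paths[p] is the node sequence of packet p; packets are processed in
    the order in which they are listed.
    """
    occupied = set()   # reserved (directed link, time step) pairs
    injection = []
    for path in paths:
        links = list(zip(path, path[1:]))
        t = 0
        # Delay the injection until no link of the path is taken.
        while any((link, t + i) in occupied for i, link in enumerate(links)):
            t += 1
        occupied.update((link, t + i) for i, link in enumerate(links))
        injection.append(t)
    return injection

def routing_time(paths, injection):
    """Time between the first injection and the last absorption."""
    last = max(t + len(p) - 1 for p, t in zip(paths, injection))
    return last - min(injection)
```

Each packet conflicts with at most C - 1 other packets on
each of its at most D links, which is the intuition
behind the $O(C \cdot D)$ bound stated above.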


Direct Routing in Specific Graphs 

Several direct algorithms are presented in [6] for
specialized network topologies. The algorithms
solve the direct routing problem where first good
paths are constructed and then an efficient
injection schedule is computed. Given a set of packets,
let $C^*$ and $D^*$ denote the optimal congestion and
dilation, respectively, for all possible sets of paths
for the packets. Clearly, the optimal routing time
is $rt^* = \Omega(C^* + D^*)$. The upper bounds in the
direct algorithms in [6] are expressed in terms of
this lower bound. All the algorithms run in time
polynomial in the size of the input.


Tree 

The graph G is an arbitrary tree. A direct routing 
algorithm is given in [Section 3.1 in 6], where 
each packet follows the shortest path from its 
source to the destination. The injection sched- 
ule is obtained using the greedy algorithm with 
a particular ordering of the packets. The routing 
time of the algorithm is asymptotically optimal: 
$rt \le 2C^* + D^* - 2 < 3 \cdot rt^*$.


Mesh

The graph G is a d-dimensional mesh (grid)
with n nodes [10]. A direct routing algorithm
is proposed in [Section 3.2 in 6], which first
constructs efficient paths for the packets with
congestion $C = O(d \log n \cdot C^*)$ and dilation
$D = O(d^2 \cdot D^*)$ (the congestion is guaranteed
with high probability). Then, using these paths
the injection schedule is computed giving a direct
algorithm with the routing time:


$$rt = O(d^2 \log^2 n \cdot C^* + d^2 \cdot D^*) = O(d^2 \log^2 n \cdot rt^*).$$


This result follows from a more general result
which is shown in [6], that says that if the paths
contain at most b "bends", i.e., at most b dimension
changes, then there is a direct scheduling
algorithm with routing time $O(b \cdot C + D)$. The
result follows because the constructed paths have
$b = O(d \log n)$ bends.


Butterfly 

The graph G is a butterfly network with n input 
and n output nodes [10]. In [Section 3.3 in 6] the 
authors examine permutation routing problems 
in the butterfly, where each input (output) node 
is the source (destination) of exactly one packet. 
An efficient direct routing algorithm is presented 
in [6] which first computes good paths for the 
packets using Valiant’s method [14, 15]: two 
butterflies are connected back to back, and 
each path is formed by choosing a random 
intermediate node in the output of the first 
butterfly. The chosen paths have congestion
$C = O(\lg n)$ (with high probability) and dilation
$D = 2 \lg n = O(D^*)$. Given the paths, there is
a direct schedule with routing time very close to
optimal: $rt \le 5 \lg n = O(rt^*)$.


Hypercube 

The graph G is a hypercube with n nodes [10]. 
A direct routing algorithm is given in [Section 3.4 
in 6] for permutation routing problems. The algo- 
rithm first computes good paths for the packets 
by selecting a single random intermediate node 
for each packet. Then an appropriate injection 
schedule gives routing time $rt < 14 \lg n$, which is
worst-case optimal since there exist permutations
for which $D^* = \Omega(\lg n)$.


Lower Bound for Buffering 

In [Section 5 in 6] an additional problem has
been studied about the amount of buffering
required to provide small routing times. It is
shown in [6] that there is a direct scheduling
problem for which every direct algorithm
requires routing time $\Omega(C \cdot D)$; at the same
time, $C + D = \Theta(\sqrt{C \cdot D}) = o(C \cdot D)$. If
buffering of packets is allowed, then it is
well known that there exist packet scheduling
algorithms [11, 12] with routing time very
close to the optimal $O(C + D)$. In [6] it is
shown that for the particular packet problem,
in order to convert a direct injection schedule of
routing time $O(C \cdot D)$ to a packet schedule with
routing time $O(C + D)$, it is necessary to buffer
packets in the network nodes in total $\Omega(N^{4/3})$
times, where a packet buffering corresponds
to keeping a packet in an intermediate node
buffer for a time step, and N is the number of
packets.


Related Work 

The only previous work which specifically ad- 
dresses direct routing is for permutation problems 
on trees [3, 13]. In these papers, the resulting 
routing time is O(n) for any tree with n nodes. 
This is worst-case optimal, while the result in [6] 
is asymptotically optimal for all routing problems 
in trees. 

Cypher et al. [7] study an online version of 
direct routing in which a worm (packet of length 
L) can be re-transmitted if it is dropped (they also 
allow the links to have bandwidth B > 1). Adler 
et al. [1] study time constrained direct routing, 
where the task is to schedule as many packets 
as possible within a given time frame. They show
that the time constrained version of the problem 
is NP-complete, and also study approximation 
algorithms on trees and meshes. Further, they 
discuss how much buffering could help in this 
setting. 

Other models of bufferless routing are matching
routing [2], where packets move to their destinations
by swapping packets in adjacent nodes,
and hot-potato routing [4, 5, 8, 9] in which 
packets follow links that bring them closer to the 
destination, and if they cannot move closer (due 
to collisions) they are deflected toward alternative 
directions. 


Applications 


Direct routing represents collision-free communication
protocols, in which packets spend the
smallest possible amount of time in the network
once they are injected. This type of routing
is appealing in power or resource constrained 
environments, such as optical networks, where 
packet buffering is expensive, or sensor networks 
where energy resources are limited. Direct rout- 
ing is also important for providing quality of ser- 
vice in networks. There exist applications where 
it is desirable to provide guarantees on the de- 
livery time of the packets after they are injected 
into the network, for example in streaming audio 
and video. Direct routing is suitable for such 
applications. 
Problem Definition


Let $S = \{s_1, s_2, \ldots, s_n\}$ be a set of elements
called objects and let $C = \{c_1, c_2, \ldots, c_m\}$
be a set of functions from S to {0, 1} called
characters. For each object $s_i \in S$ and character
$c_j \in C$, we say that $s_i$ has $c_j$ if $c_j(s_i) = 1$ or that
$s_i$ does not have $c_j$ if $c_j(s_i) = 0$, respectively
(in this sense, characters are binary). Then the
set S and its relation to C can be naturally
represented by a matrix M of size $(n \times m)$ satisfying
$M[i, j] = c_j(s_i)$ for every $i \in \{1, 2, \ldots, n\}$ and
$j \in \{1, 2, \ldots, m\}$. Such a matrix M is called a
binary character state matrix.

Next, for each $s_i \in S$, define the set $C_{s_i} =
\{c_j \in C : s_i \text{ has } c_j\}$. A phylogeny for S is a tree
whose leaves are bijectively labeled by S, and
a directed perfect phylogeny for (S, C) (if one
exists) is a rooted phylogeny T for S in which
each $c_j \in C$ is associated with exactly one edge
of T in such a way that for any $s_i \in S$, the set
of all characters associated with the edges on the
path in T from the root to leaf $s_i$ is equal to $C_{s_i}$.
See Figs. 1 and 2 for two examples.

Now, define the following problem. 


Problem 1 (The Directed Perfect Phylogeny 
Problem for Binary Characters) 


INPUT: An (n x m)-binary character state ma- 
trix M for some S and C. 

OUTPUT: A directed perfect phylogeny for
(S, C), if one exists; otherwise, null.


Key Results 


In the presentation below, define a set $S_{c_j}$ for
each $c_j \in C$ by $S_{c_j} = \{s_i \in S : s_i \text{ has } c_j\}$.
The next lemma is the key to solving the Directed
Perfect Phylogeny Problem for Binary Characters
efficiently. It is also known in the literature as the
pairwise compatibility theorem [5].


M  | c1 c2 c3 c4 c5 c6 c7 c8
s1 |  0  0  1  1  1  0  1  0
s2 |  0  1  1  1  0  0  0  0
s3 |  1  0  0  0  0  1  0  1
s4 |  0  0  1  1  0  0  1  0
s5 |  1  0  0  0  0  0  0  0

Directed Perfect Phylogeny (Binary Characters), Fig. 1 (a) A (5 x 8)-binary character state matrix M. (b) A
directed perfect phylogeny for (S, C)


M  | c1 c2
s1 |  1  0
s2 |  1  1
s3 |  0  1

Directed Perfect Phylogeny (Binary Characters),
Fig. 2 This binary character state matrix admits no
directed perfect phylogeny


Lemma 1 There exists a directed perfect phylogeny
for (S, C) if and only if for each pair
$c_j, c_k \in C$, it holds that $S_{c_j} \cap S_{c_k} = \emptyset$,
$S_{c_j} \subseteq S_{c_k}$, or $S_{c_k} \subseteq S_{c_j}$.


Short constructive proofs of the lemma can be 
found in, e.g., [8] and [14]. An algebraic proof 
of a slightly more general version of the lemma 
was given earlier by Estabrook, Johnson, and 
McMorris [3, 4]. 
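
The condition of Lemma 1 can be tested pair by pair. The
sketch below assumes that M is given as a nonempty list
of rows of 0s and 1s; the function names are
illustrative. It is the straightforward quadratic check,
not the faster algorithm of Gusfield.

```python
from itertools import combinations

def character_sets(M):
    """Column j of M -> set of row indices i with M[i][j] == 1."""
    n, m = len(M), len(M[0])
    return [{i for i in range(n) if M[i][j] == 1} for j in range(m)]

def admits_directed_perfect_phylogeny(M):
    """Pairwise compatibility: every two sets are disjoint or nested."""
    sets = character_sets(M)
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

# The matrix of Fig. 2: the two character sets overlap without nesting.
fig2 = [[1, 0],
        [1, 1],
        [0, 1]]
print(admits_directed_perfect_phylogeny(fig2))
```

Building the m sets takes O(nm) time and each of the
O(m^2) pairs is compared in O(n) time.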

Using Lemma 1, it is trivial to construct a
top-down algorithm for the problem that runs
in $O(nm^2)$ time. As one might expect, a faster
algorithm is possible. Gusfield [7] observed that
after sorting the columns of M in nonincreasing
lexicographic order, all duplicate copies of a
column appear in a consecutive block of columns
and column j is to the right of column k if $S_{c_j}$ is
a proper subset of $S_{c_k}$, and then exploited these
two facts together with Lemma 1 to obtain the
following result:


Theorem 1 ([7]) The Directed Perfect Phy- 
logeny Problem for Binary Characters can be 
solved in O(nm) time. 


For a description of the original algorithm and 
a proof of its correctness, see [7] or [14]. A 
conceptually simplified version of the algorithm 
based on keyword trees can be found in Chap- 
ter 17.3.4 in [8]. Gusfield [7] also gave an ad- 
versary argument to prove a corresponding lower 
bound of $\Omega(nm)$ on the running time, showing
that his algorithm is time optimal: 


Theorem 2 ([7]) Any algorithm that decides if 
a given binary character state matrix M admits 
a directed perfect phylogeny must, in the worst 
case, examine all entries of M. 
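
The keyword-tree idea mentioned above can be sketched as
follows. This is a simplified illustration rather than
Gusfield's original algorithm: it orders the characters
by decreasing number of 1s with a comparison sort (so it
does not reach the O(nm) bound), ignores all-zero
columns, and returns the tree as nested dictionaries. All
names are hypothetical. Objects whose path ends at an
internal node would still need a leaf attached there.

```python
def build_directed_perfect_phylogeny(M):
    """Return (children, end_node) or None if no phylogeny exists.

    children[v] maps a character index to the child reached by the edge
    labeled with that character; node 0 is the root. end_node[i] is the
    node at which the path of object i ends.
    """
    n, m = len(M), len(M[0])
    ones = [sum(M[i][j] for i in range(n)) for j in range(m)]
    # Characters shared by more objects must lie closer to the root.
    order = sorted((j for j in range(m) if ones[j] > 0),
                   key=lambda j: -ones[j])
    children = {0: {}}
    edge_of = {}          # character -> node entered by its unique edge
    end_node = []
    for i in range(n):
        node = 0
        for j in order:
            if M[i][j] != 1:
                continue
            if j not in children[node]:
                if j in edge_of:
                    return None   # character would label a second edge
                new = len(children)
                children[new] = {}
                children[node][j] = new
                edge_of[j] = new
            node = children[node][j]
        end_node.append(node)
    return children, end_node
```

The test rests on Lemma 1: when the character sets are
pairwise disjoint or nested, all objects that have a
character share the same sequence of larger characters
before it, so every character is inserted on exactly one
edge.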


Agarwala, Fernández-Baca, and Slutzki [1]
noted that the input binary character state matrix
is often sparse, i.e., in general, most of the objects
will not have most of the characters. In addition,
they noted that for the sparse case, it is more
efficient to represent the input (S, C) by all the
sets $S_{c_j}$ for $j \in \{1, 2, \ldots, m\}$, where each
set $S_{c_j}$ is defined as above and is specified
as a linked list, than by using a binary character
state matrix. Agarwala et al. [1] proved that
with this alternative representation of S and C,
the algorithm of Gusfield can be modified to run
in time proportional to the total number of 1s in
the corresponding binary character state matrix:


Theorem 3 ([1]) The variant of the Directed
Perfect Phylogeny Problem for Binary
Characters in which the input is given as
linked lists representing all the sets $S_{c_j}$ for
$j \in \{1, 2, \ldots, m\}$ can be solved in O(h) time,
where $h = \sum_{j=1}^{m} |S_{c_j}|$.


For a description of the algorithm, refer to [1] 
or [6]. Observe that Theorem 3 does not contra- 
dict Theorem 2; in fact, Gusfield’s lower bound 
argument for proving Theorem 2 considers an 
input matrix consisting mostly of 1s.

When only a portion of an $(n \times m)$-binary
character state matrix is available, an O(nm)-
time algorithm by Pe'er et al. [13] can fill in the
missing entries with 0s and 1s so that the resulting
matrix admits a directed perfect phylogeny, if
possible. A ZDD-based algorithm for enumerating
all such solutions was recently developed by
Kiyomi et al. [11].


Theorem 4 ([13]) The variant of the Directed
Perfect Phylogeny Problem for Binary Charac- 
ters in which the input consists of an incomplete 
binary character state matrix can be solved in 
O(nm) time. 


Applications 


Directed perfect phylogenies for binary charac- 
ters are used to describe the evolutionary history 
for a set of objects (e.g., biological species) 
that share some observable traits and that have 
evolved from a “blank” ancestral object which 
has none of the traits. Intuitively, the root of a di- 
rected perfect phylogeny corresponds to the blank 
ancestral object, and each directed edge e = 
(u,v) corresponds to an evolutionary event in 
which the hypothesized ancestor represented by u 
gains the characters associated with e, transform- 
ing it into the hypothesized ancestor or object rep- 
resented by v. For simplicity, it may be assumed 
that each character can emerge once only during 
the evolutionary history and is never lost after it 
has been gained, so that a leaf $s_i$ is a descendant
of the edge associated with a character $c_j$ if
and only if $s_i$ has $c_j$. When this requirement
is too strict, one can relax it to permit errors,
for example, by letting each character be associated
with more than one edge in the phylogeny
(i.e., allow each character to emerge many times)
while minimizing the total number of such asso- 
ciations (Camin-Sokal optimization) or by keep- 
ing the requirement that each character emerges 
only once but allowing it to be lost multiple 
times (Dollo parsimony) [5,6]. Such relaxations 
generally increase the computational complexity 
of the underlying computational problems; see, 
e.g., [2] and [15]. 

Binary characters are commonly used by biol- 
ogists and linguists. Traditionally, morphological 
traits or directly observable features of species 
were employed by biologists as binary characters, 
and recently, binary characters based on genomic 
information such as substrings in DNA or protein 
sequences, SNP markers, protein regulation data, 
and shared gaps in a given multiple alignment 
have become more and more prevalent. Chap- 
ter 17.3.2 in [8] mentions several examples where 
phylogenetic trees have been successfully con- 
structed based on such types of binary character 
data. In the context of reconstructing the evo- 
lutionary history of natural languages, linguists 
often use phonological and morphological char- 
acters with just two states [10]. 

The Directed Perfect Phylogeny Problem for 
Binary Characters is closely related to the Perfect 
Phylogeny Problem, a fundamental problem in 
computational evolutionary biology and phylo- 
genetic reconstruction [5, 6, 14]. This problem 
(also described in more detail in the Encyclopedia
entry Perfect Phylogeny (Bounded Number of
States)) introduces nonbinary characters so that
each character $c_j \in C$ has a set of allowed
states $\{0, 1, \ldots, r_j - 1\}$ for some integer $r_j$,
and for each $s_i \in S$, character $c_j$ is in one
of its allowed states. Generalizing the notation
used above, define the set $S_{c_j,\alpha}$ for every
$\alpha \in \{0, 1, \ldots, r_j - 1\}$ by $S_{c_j,\alpha} = \{s_i \in S :
\text{the state of } s_i \text{ on } c_j \text{ is } \alpha\}$. Then, the objective of
the Perfect Phylogeny Problem is to construct (if
possible) an unrooted phylogeny T for S such
that the following holds: for each $c_j \in C$ and
distinct states $\alpha, \beta$ of $c_j$, the minimal subtree
of T that connects $S_{c_j,\alpha}$ and the minimal subtree
of T that connects $S_{c_j,\beta}$ are vertex-disjoint.
McMorris [12] showed that the special case with
$r_j = 2$ for all $c_j \in C$ can be reduced to the
Directed Perfect Phylogeny Problem for Binary
Characters in O(nm) time: for each $c_j \in C$, if
the number of 1s in column j of M is greater
than the number of 0s, then set entry M[i, j] to
1 - M[i, j] for all $i \in \{1, 2, \ldots, n\}$. Therefore,
another application of Gusfield's algorithm [7] is
as a subroutine for solving the Perfect Phylogeny
Problem in O(nm) time when $r_j = 2$ for all $c_j \in
C$. Even more generally, the Perfect Phylogeny
Problem for directed as well as undirected cladis- 
tic characters can be solved in polynomial time by 
a similar reduction to the Directed Perfect Phy- 
logeny Problem for Binary Characters (see [6]). 
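
The reduction of McMorris for the binary case amounts to
complementing every column in which 1s form the
majority. A minimal sketch, assuming M is a nonempty list
of rows of 0s and 1s (the function name is illustrative):

```python
def to_directed_instance(M):
    """Flip each column of M that contains more 1s than 0s."""
    n, m = len(M), len(M[0])
    out = [row[:] for row in M]          # leave the input untouched
    for j in range(m):
        ones = sum(M[i][j] for i in range(n))
        if ones > n - ones:
            for i in range(n):
                out[i][j] = 1 - M[i][j]
    return out
```

The resulting matrix can then be passed to any algorithm
for the directed problem; both passes over M take O(nm)
time.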

In addition to the above, it is possible to 
apply Gusfield’s algorithm to determine whether 
two given trees describe compatible evolutionary 
history, and if so, merge them into a single tree 
so that no branching information is lost (see [7] 
for details). Finally, Gusfield’s algorithm has also 
been used by Hanisch, Zimmer, and Lengauer [9] 
to implement a particular operation on docu- 
ments defined in their Protein Markup Language 
(ProML) specification. 
Problem Definition


The problem concerns computing virtual
coordinates for greedy routing in a wireless
ad hoc network. Consider a set of wireless
nodes S densely deployed inside a geometric
domain $R \subset \mathbb{R}^2$. Nodes within communication
range can directly communicate with each other. 
We ask whether one can compute a set of vir- 
tual coordinates for S such that greedy routing 
has guaranteed delivery. In particular, each node 
forwards the message to the neighbor whose 
distance to the destination, computed under the 
virtual coordinates and some metric function d, 
is the smallest. If such a neighbor can always be 
found, greedy routing successfully delivers the 
message to the destination. The problem can be 
phrased as finding a greedy embedding of S in 
some geometric space, such that greedy routing 
always succeeds. 

In the setting of this entry, we assume that the
nodes are a dense sample of the domain R such
that the communication graph on S contains a
triangulated mesh $\Sigma$ as a discrete approximation
of R.


Key Results 


The key result is a family of distributed
algorithms for computing the greedy embedding
using discrete Ricci flow. Given a triangular mesh
$\Sigma$ with vertex set V, edge set E, and face set
F, we can define a piecewise linear metric by
the edge lengths on $\Sigma$, $l : E \to \mathbb{R}^+$, that
satisfy the triangle inequality for each triangle
face. The piecewise linear metric determines the
corner angles of the triangles on $\Sigma$ by the cosine
law. The discrete curvature $K_i$ at a vertex $v_i$ is
defined as the angle deficit on the mesh. If $v_i$
is an interior vertex, $K_i = 2\pi - \sum_j \theta_j$, where
the $\theta_j$'s are the corner angles at $v_i$. If $v_i$ is a vertex
on the boundary, $K_i = \pi - \sum_j \theta_j$, where the $\theta_j$'s
are the corner angles at $v_i$. Thus, the curvature at
an interior vertex $v_i$ is 0 if the surface is flat at
$v_i$. The curvature at a boundary vertex $v_i$ is 0 if
the boundary is locally a straight line at $v_i$ (see
Fig. 1). The famous Gauss-Bonnet theorem states
that the total curvature is a topological invariant:
$\sum_{v_i \in V} K_i = 2\pi\chi(\Sigma)$, where $\chi(\Sigma)$ is the Euler
characteristic number of $\Sigma$. (The Euler characteristic
number of a surface is 2 - 2g - h, where g is
the genus or the number of handles and h is the
number of holes.) Ricci flow is a process
that deforms the surface metric to meet any target
curvature that is admissible by the Gauss-Bonnet
theorem.
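
The angle-deficit definition translates directly into
code. The sketch below assumes that the mesh is given as
a list of vertex triples, that edge lengths are supplied
by a caller-provided function, and that the set of
boundary vertices is known; all names are illustrative.

```python
import math

def corner_angle(a, b, c):
    """Angle enclosed by sides a and b, opposite side c (cosine law)."""
    return math.acos((a * a + b * b - c * c) / (2 * a * b))

def discrete_curvature(faces, length, boundary):
    """Map each vertex to 2*pi (interior) or pi (boundary) minus its angle sum."""
    angle_sum = {}
    for tri in faces:
        for r in range(3):
            i, j, k = tri[r], tri[(r + 1) % 3], tri[(r + 2) % 3]
            theta = corner_angle(length(i, j), length(i, k), length(j, k))
            angle_sum[i] = angle_sum.get(i, 0.0) + theta
    return {v: (math.pi if v in boundary else 2 * math.pi) - s
            for v, s in angle_sum.items()}

# Six unit equilateral triangles around vertex 0 (a flat hexagon).
faces = [(0, i, i % 6 + 1) for i in range(1, 7)]
K = discrete_curvature(faces, lambda a, b: 1.0, boundary=set(range(1, 7)))
```

Here the interior vertex is flat, each boundary vertex
has curvature $\pi/3$, and the total is $2\pi$, in
agreement with the Gauss-Bonnet theorem for a disk
(Euler characteristic 1).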

A conformal map in the continuous surface
preserves the intersection angle of any two
curves. In the discrete case, the "intersection
angle" is defined using the circle packing
metric [10, 11]. We place a circle at each vertex
$v_i$ with radius $\gamma_i$ such that for each edge $e_{ij}$,
the circles at $v_i, v_j$ intersect or are tangent to
each other. The intersection angle is denoted
by $\phi(e_{ij})$. The pair of vertex radii and the
intersection angles on a mesh $\Sigma$, $(\Gamma, \Phi)$, are
called a circle packing metric of $\Sigma$ (see Fig. 1).
Two circle packing metrics $(\Gamma_1, \Phi_1)$ and $(\Gamma_2, \Phi_2)$
on the same mesh are conformally equivalent if
$\Phi_1 = \Phi_2$. Therefore, a conformal deformation of
a circle packing metric only modifies the vertex
radii $\gamma_i$ and preserves the intersection angles.
Note that the circle packing metric and the edge
lengths (the piecewise linear metric) on one mesh
can be converted to each other by using the cosine
law.

Now we are ready to introduce the discrete
Ricci flow algorithm. Let $u_i$ be $\log \gamma_i$ for each
vertex. Then the discrete Ricci flow, introduced
in the work of [2], is defined as follows:
$\frac{du_i(t)}{dt} = \bar{K}_i - K_i$, where $K_i$, $\bar{K}_i$ are the current
and target curvature at vertex $v_i$, respectively.
Discrete Ricci flow can be formulated in the
variational setting, namely, it is a negative gradient
flow of some special energy form:
$f(u) = \int_{u_0}^{u} \sum_i (\bar{K}_i - K_i)\, du_i$, where $u_0$ is an
arbitrary initial metric and $\bar{K}$ is the prescribed
target curvature. The integration above is well
defined and called the Ricci energy. The discrete
Ricci flow is the negative gradient flow of the
discrete Ricci energy. The discrete metric which
induces $\bar{K}$ is the minimizer of the energy.
Computing the desired circle packing metric with
prescribed curvature $\bar{K}$ is equivalent to minimizing
the discrete Ricci energy. The discrete Ricci energy
is strictly convex (namely, its Hessian is positive
definite after a normalization). The global
minimum uniquely exists, corresponding to the
metric u that induces $\bar{K}$. The discrete Ricci
flow converges to this global minimum and the
convergence is exponentially fast [2], i.e.,
$|\bar{K}_i - K_i(t)| < c_1 e^{-c_2 t}$, where $c_1, c_2$ are two
positive constants. This represents a centralized
algorithm for computing the discrete Ricci flow on
$\Sigma$. In the following, we describe the distributed
algorithm for different types of greedy routing
scenarios.

Discrete Ricci Flow for Geometric Routing, Fig. 1
The circle packing metric


Discrete Ricci Flow Algorithm 

To apply discrete Ricci flow for greedy routing,
we take a triangular mesh $\Sigma$ as a subgraph from
the communication graph. All non-triangular
faces are considered as network holes that will be
mapped to circular holes in the embedding. All
nodes not on hole boundaries have zero curvature
under the mapping. Thus, the embedding is
denoted as a circular domain. With the virtual
coordinates and Euclidean distance metric,
greedy routing guarantees delivery. (For a node
in the interior of the triangulation, if the corner
angle is greater than $2\pi/3$, we will adopt greedy
routing on an edge that has provably guaranteed
delivery.)

We set all edge lengths to be initially 1, which
determines the initial curvature at each node.
In particular, we choose the circle
packing metric by placing a circle of initial radius
1/2 on each node. The circles at adjacent nodes
are tangent to each other. Thus, the intersection
angle is kept at 0. We now set the target curvature
at interior nodes to be zero and at hole boundary
nodes to be $2\pi/k$ with k as the number of nodes
on the hole boundary. The algorithms run in a
gossip style. In each round, each node exchanges
its radius with neighbors and computes its own
Gaussian curvature. The algorithm stops when
the current curvature is within error $\varepsilon$ from the
specified target curvature.

At each gossip round, node $v_i$ is associated
with a disk with radius $e^{u_i}$, where $u_i$ is a scalar
value. The length of the edge connecting $v_i$ and
$v_j$ equals $l_{ij} = e^{u_i} + e^{u_j}$. The corner angles of
each triangle can be estimated using the cosine law
by each node locally. That is, the angle $\theta_i^{jk}$ in
triangle $[v_i, v_j, v_k]$ is


$$\theta_i^{jk} = \cos^{-1} \frac{l_{ij}^2 + l_{ki}^2 - l_{jk}^2}{2\, l_{ij}\, l_{ki}}.$$


The curvature $k_i$ at $v_i$ is


$$k_i = \begin{cases} 2\pi - \sum_{jk} \theta_i^{jk}, & v_i \notin \partial M \\ \pi - \sum_{jk} \theta_i^{jk}, & v_i \in \partial M \end{cases}$$


When the target curvature is not met, $u_i$ is
modified proportionally to the difference between the
target curvature $\bar{k}_i$ and the current curvature $k_i$:


$$u_i = u_i + \delta(\bar{k}_i - k_i)$$


Discrete Ricci Flow for Geometric Routing, Fig. 2 (a) A network of 7,000 nodes with many holes; (b) virtual
coordinates
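
The update rule can be simulated centrally on a small
mesh. The sketch below is an illustration under several
assumptions: circles are tangent (edge length equal to
the sum of the two radii), every node starts with radius
1/2, the step size `delta` and tolerance `eps` are
arbitrary choices, and the gossip exchange is replaced by
a global loop. The function names are not taken from the
cited work.

```python
import math

def curvatures(faces, boundary, u):
    """Discrete curvature of every vertex for tangent circles of radius e^u."""
    angle_sum = {v: 0.0 for v in u}
    for tri in faces:
        for r in range(3):
            i, j, k = tri[r], tri[(r + 1) % 3], tri[(r + 2) % 3]
            l_ij = math.exp(u[i]) + math.exp(u[j])
            l_ki = math.exp(u[k]) + math.exp(u[i])
            l_jk = math.exp(u[j]) + math.exp(u[k])
            cos_theta = (l_ij**2 + l_ki**2 - l_jk**2) / (2 * l_ij * l_ki)
            angle_sum[i] += math.acos(cos_theta)
    return {v: (math.pi if v in boundary else 2 * math.pi) - angle_sum[v]
            for v in u}

def ricci_flow(faces, boundary, target, delta=0.1, eps=1e-6, max_rounds=10000):
    """Repeat u_i <- u_i + delta * (target_i - k_i) until all errors are < eps."""
    u = {v: math.log(0.5) for f in faces for v in f}   # radius 1/2 everywhere
    k = curvatures(faces, boundary, u)
    for _ in range(max_rounds):
        if max(abs(target[v] - k[v]) for v in u) < eps:
            break
        for v in u:
            u[v] += delta * (target[v] - k[v])
        k = curvatures(faces, boundary, u)
    return u, k

# Five triangles around vertex 0; flatten the center, round the boundary.
faces = [(0, i, i % 5 + 1) for i in range(1, 6)]
boundary = {1, 2, 3, 4, 5}
target = {0: 0.0, **{v: 2 * math.pi / 5 for v in boundary}}
u, k = ricci_flow(faces, boundary, target)
```

The chosen target is admissible because it sums to
$2\pi$, the total curvature of a disk. Since the
differences between target and current curvature always
sum to zero, the sum of the $u_i$ stays constant during
the iteration, which fixes the overall scale.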


Once the curvatures are computed, the
triangulation is then flattened out by a simple flooding
from a triangle root. Given three edge lengths of
the root triangle $[v_0, v_1, v_2]$, the node coordinates
can be constructed directly. Then the neighboring
triangle of the root, e.g., $[v_1, v_0, v_i]$, can be
flattened; the virtual coordinates of $v_i$ are the
intersection of two circles, one centered at $v_0$
with radius $l_{0i}$ and the other centered at $v_1$
with radius $l_{1i}$. In a similar way, the neighbors
of the newly flattened triangles can be further
embedded. The virtual coordinates of the whole
network are thus computed (Fig. 2).
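
The basic flattening step, placing a third vertex from
two already embedded neighbors, is a circle-circle
intersection. A minimal sketch with illustrative names
follows; the `left` flag, which selects one of the two
intersection points, is an assumption about how the
orientation of the triangle is tracked.

```python
import math

def third_vertex(p0, p1, l0, l1, left=True):
    """Point at distance l0 from p0 and l1 from p1.

    With left=True the point lies to the left of the directed line
    from p0 to p1, otherwise to the right.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    d = math.hypot(dx, dy)
    a = (l0 * l0 - l1 * l1 + d * d) / (2 * d)   # offset of the foot point from p0
    h = math.sqrt(max(l0 * l0 - a * a, 0.0))    # height above the line p0 p1
    fx, fy = p0[0] + a * dx / d, p0[1] + a * dy / d
    sign = 1.0 if left else -1.0
    return (fx - sign * h * dy / d, fy + sign * h * dx / d)

# Root edge on the x-axis, both remaining edge lengths equal to 1.
apex = third_vertex((0.0, 0.0), (1.0, 0.0), 1.0, 1.0)
```

Repeating this step along a flooding order embeds one new
vertex per newly reached triangle.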


Discrete Hyperbolic Ricci Flow 

The key result in conformal geometry says that 
any surface with a Riemannian metric admits a 
Riemannian metric of constant Gaussian curva- 
ture, which is conformal to the original metric. 
Such a metric is called the uniformization metric.
Thus, depending on the surface topology, the 
uniformization metric has either positive con- 
stant, zero, or negative constant curvature every- 
where. Simply connected surfaces with constant 
curvature are only of three canonical types: the 
sphere (constant positive curvature everywhere), 
the Euclidean plane (zero curvature everywhere), 
and the hyperbolic plane (negative curvature ev- 
erywhere). Discrete Ricci flow is a powerful tool 
to compute the uniformization metric. 


In our setting, when the triangulation $\Sigma$ has
two or more holes, it has negative total curvature.
Thus, its uniformization metric is hyperbolic. To 
actually embed the surface and realize the uni- 
formization metric, the holes in the network are 
cut open to get a simply connected triangulation 
T. Using discrete hyperbolic Ricci flow, we em- 
bed T in a convex region S in hyperbolic space. 
Each node is given a hyperbolic coordinate. Each 
edge uv has a length d(u,v) as the geodesic 
between u, v in the hyperbolic space. In this way, 
greedy routing with the hyperbolic metric (i.e., 
send the message to the neighbor closer to the 
destination measured by hyperbolic distance) has 
guaranteed delivery. 
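
As a hedged illustration of the forwarding rule, the sketch below assumes that the hyperbolic coordinates are given in the Poincaré disk model and uses the standard disk distance formula. The names `poincare_dist` and `greedy_route`, and the dictionary-based network representation, are assumptions made for this example only.

```python
import math

def poincare_dist(p, q):
    """Hyperbolic distance between two points of the open unit disk."""
    dp = 1.0 - (p[0] ** 2 + p[1] ** 2)
    dq = 1.0 - (q[0] ** 2 + q[1] ** 2)
    diff = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.acosh(1.0 + 2.0 * diff / (dp * dq))

def greedy_route(coords, adj, src, dst):
    """Forward to the neighbor closest to dst in hyperbolic distance.

    Returns (path, delivered). The walk stops early if no neighbor is
    strictly closer to the destination than the current node.
    """
    path, cur = [src], src
    while cur != dst:
        nxt = min(adj[cur], key=lambda w: poincare_dist(coords[w], coords[dst]))
        if poincare_dist(coords[nxt], coords[dst]) >= poincare_dist(coords[cur], coords[dst]):
            return path, False  # stuck at a local minimum
        path.append(nxt)
        cur = nxt
    return path, True
```

On an embedding produced as described above, the early exit should never trigger; it is kept so that the sketch also behaves sensibly on arbitrary coordinates.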

The hyperbolic Ricci flow is very similar to the
Euclidean version with a few modifications. First,
all metrics are hyperbolic. The edge length $l_{ij}$ of
$e_{ij}$ is determined by the hyperbolic cosine law:

$$\cosh l_{ij} = \cosh\gamma_i \cosh\gamma_j + \sinh\gamma_i \sinh\gamma_j \cos\phi_{ij}. \tag{1}$$

Let $u_i = \log\tanh\frac{\gamma_i}{2}$; the discrete Ricci flow is
defined as

$$\frac{du_i(t)}{dt} = -K_i, \tag{2}$$

where $K_i$ is the discrete Gaussian curvature at
$v_i$. Once the hyperbolic metric is computed, we
can embed the triangulation isometrically onto
the Poincaré disk.

Generalized Discrete Surface Ricci Flow 

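There are many schemes for discrete surface
Ricci flow [14], including tangential circle packing,
Thurston’s circle packing, inversive distance
circle packing, Yamabe flow, virtual radius circle
packing, and mixed type schemes. All of them
can be unified as follows. The combinatorial
structure of the triangulation is $\Sigma$; it is equipped
with one of three background geometries: Euclidean
$\mathbb{E}^2$, hyperbolic $\mathbb{H}^2$, and spherical $\mathbb{S}^2$. Each vertex is
associated with a circle; the vertex radius function
is $\gamma : V \to \mathbb{R}^{+}$. Each vertex is also associated
with a constant $\varepsilon$, which indicates the scheme.
Each edge has a conformal structure coefficient
$\eta : E \to \mathbb{R}$. So a circle packing metric is given
by $(\Sigma, \gamma, \eta, \varepsilon)$. The discrete conformal factor is
given by

$$u_i = \begin{cases} \log\gamma_i, & \mathbb{E}^2 \\ \log\tanh\frac{\gamma_i}{2}, & \mathbb{H}^2 \\ \log\tan\frac{\gamma_i}{2}, & \mathbb{S}^2 \end{cases}$$

The length $l_{ij}$ of $[v_i, v_j]$ is given by

$$l_{ij}^2 = 2\eta_{ij} e^{u_i+u_j} + \varepsilon_i e^{2u_i} + \varepsilon_j e^{2u_j}, \qquad \mathbb{E}^2$$

$$\cosh l_{ij} = \frac{4\eta_{ij} e^{u_i+u_j} + (1+\varepsilon_i e^{2u_i})(1+\varepsilon_j e^{2u_j})}{(1-\varepsilon_i e^{2u_i})(1-\varepsilon_j e^{2u_j})}, \qquad \mathbb{H}^2$$

$$\cos l_{ij} = \frac{-4\eta_{ij} e^{u_i+u_j} + (1-\varepsilon_i e^{2u_i})(1-\varepsilon_j e^{2u_j})}{(1+\varepsilon_i e^{2u_i})(1+\varepsilon_j e^{2u_j})}, \qquad \mathbb{S}^2$$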
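
The unified edge-length formulas can be evaluated directly. The sketch below is an illustration under stated assumptions: the function names `conformal_factor` and `edge_length` and the one-letter geometry tags `"E"`, `"H"`, `"S"` are hypothetical, and no check is made that the inputs define a valid triangle.

```python
import math

def conformal_factor(gamma, geometry):
    """Discrete conformal factor u for a vertex radius gamma."""
    if geometry == "E":
        return math.log(gamma)
    if geometry == "H":
        return math.log(math.tanh(gamma / 2.0))
    if geometry == "S":
        return math.log(math.tan(gamma / 2.0))
    raise ValueError("geometry must be 'E', 'H' or 'S'")

def edge_length(ui, uj, eta, eps_i, eps_j, geometry):
    """Edge length from conformal factors, eta, and scheme constants."""
    a = eps_i * math.exp(2.0 * ui)
    b = eps_j * math.exp(2.0 * uj)
    cross = eta * math.exp(ui + uj)
    if geometry == "E":
        return math.sqrt(2.0 * cross + a + b)
    if geometry == "H":
        return math.acosh((4.0 * cross + (1 + a) * (1 + b)) / ((1 - a) * (1 - b)))
    if geometry == "S":
        return math.acos((-4.0 * cross + (1 - a) * (1 - b)) / ((1 + a) * (1 + b)))
    raise ValueError("geometry must be 'E', 'H' or 'S'")
```

A quick sanity check: with $\varepsilon_i = \varepsilon_j = 1$ and $\eta_{ij} = 1$ the two circles are tangent, so in all three geometries the edge length should reduce to $\gamma_i + \gamma_j$.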
The discrete Ricci flow is given by

$$\frac{du_i(t)}{dt} = \bar{K}_i - K_i(t),$$

where $\bar{K} : V \to \mathbb{R}$ is the prescribed target
curvature. This flow is the negative gradient flow of
the discrete Ricci energy

$$E(\mathbf{u}) = \int^{\mathbf{u}} \sum_i (\bar{K}_i - K_i)\, du_i.$$



For discrete surfaces with Euclidean background
geometry, the Ricci energy is convex on
the space $\sum_i u_i = 0$. For those with hyperbolic
background geometry, the energy is convex. For
the spherical case, the energy is indefinite.
For the Yamabe scheme (where $\varepsilon = 0$), the
combinatorial structure $\Sigma$ is Delaunay if, for each
edge $[v_i, v_j]$ shared by two faces $[v_i, v_j, v_k]$ and
$[v_j, v_i, v_l]$, the two angles opposite the edge satisfy
$\alpha_k + \alpha_l \le \pi$. If during the Yamabe
flow the combinatorial structure can be updated
to ensure the Delaunay condition, then for any
$\bar{K} : V \to (-\infty, 2\pi)$ satisfying the Gauss-Bonnet
constraint $\sum_{v \in V} \bar{K}(v) = 2\pi\chi(\Sigma)$, the Yamabe
flow with surgery leads to the discrete metric
that realizes the target curvature; the convergence
is exponentially fast. This theorem implies
the discrete uniformization theorem: any closed
polyhedral surface admits a polyhedral metric
discretely conformal to the original one, which
induces constant Gaussian curvature everywhere
[4,5] (Fig. 3).


Applications 


The presented Ricci flow algorithms can be
applied to a variety of routing primitives
for large-scale wireless sensor networks
with nonuniform node distribution. Besides
guaranteed delivery [8], we can also achieve
multiple additional desirable routing objectives,
all derived from the unique properties of a
conformal mapping. For example, greedy routing
on a circular domain may accumulate high
traffic load on the interior hole boundaries. To
alleviate that, we can reflect the network along
a hole boundary using a Möbius transformation
and map a copy of the network to cover the
interior of the hole, recursively [9] (see Fig. 4).
Routing on this covering space makes traffic load
more balanced as hole boundaries essentially
“disappear.” In another case, when there are
sudden link or node failures, we can apply a
Möbius transformation to generate a different
circular domain, with the sizes and positions of
the holes rearranged, on which greedy routing
generates a different path [6]. Thus, quick
recovery from a spontaneous failure is possible.
The hyperbolic Ricci flow can be used to map
the domain with the holes cut open to a convex
polygon that can tile the entire hyperbolic
plane. This mapping supports greedy routing with
specified “homotopy types,” i.e., routes that go
around holes in different ways [13] (see Fig. 5).
Hyperbolic embedding can be generalized to
3D sensor networks with complex topology,
as in the case of monitoring underground
tunnels [12]. Additional applications include
generation of “space filling curves” for
arbitrary domains [1], support for greedy routing
in mobile networks [7], and load-balanced
routing [3].

Discrete Ricci Flow for Geometric Routing, Fig. 4 Three-level circular reflections and a routing path


Open Problems 


Given a smooth surface $S$ with a Riemannian
metric $g$, the smooth Ricci flow leads to the
uniformization metric $e^{2\lambda} g$, where $\lambda$ is the smooth
conformal factor. If the surface is tessellated to
get a discrete surface $M_0$ and discrete Ricci flow
is performed on $M_0$, one obtains the discrete
conformal factor function $u_0$. When $M_0$ is subdivided
$n$ times, the discrete conformal factor is $u_n$.
The open question is whether $\lim_{n \to \infty} u_n = \lambda$.


Discrete Ricci Flow for Geometric Routing, Fig. 5 
Computing the shortest paths using the hyperbolic em- 
bedding of a 3-connected domain with 1,286 nodes. Two 
different paths are generated using greedy routing toward 


Experimental Results 


The convergence rate, i.e., the number of iterations,
is $O(\log(1/\varepsilon)/\delta)$, where $\delta$ is
the step size in the Ricci flow algorithm and
$\varepsilon$ is the error bound on the curvature. In our
experiments we take $\varepsilon$ to be 1e-6. Routing
with the virtual coordinates has a 100% delivery
rate, and the average path stretch (compared to
the shortest path in the network) is no greater
than 2.


URLs to Code and Data Sets 


http://www.cs.sunysb.edu/~gu/tutorial/RicciFlow.html
Problem Definition 


Let $G = (V, E)$ be a weighted undirected graph
with $n$ vertices and $m$ edges. A distance oracle
is a data structure capable of representing almost
shortest paths efficiently, both in terms of
space requirement and query time. Thorup and
Zwick [7] showed that for any integer $k \ge 1$ it
is possible to preprocess the graph in $O(mn^{1/k})$
time and generate a compact data structure of size
$O(kn^{1+1/k})$ that answers approximate distance
queries with $2k - 1$ multiplicative stretch in
$O(k)$ time. This means that for every $u, v \in V$,
it is possible to retrieve an estimate $\hat{d}(u, v)$
of the distance $d(u, v)$ in $O(k)$ time, such that
$d(u, v) \le \hat{d}(u, v) \le (2k - 1)\, d(u, v)$. Recently,
[8] showed, using a clever query algorithm,
that the query time of Thorup and Zwick
can be reduced from $O(k)$ to $O(\log k)$. Even
more recently, [1] showed that the query time
of Thorup and Zwick can be reduced to $O(1)$.
Thorup and Zwick [7] showed, based on the girth
conjecture of [2], that there are dense enough
graphs which cannot be represented by a data
structure of size less than $n^{1+1/k}$ without
increasing the stretch above $2k - 1$ for any integer
$k$. Therefore, for dense graphs their distance
oracle is optimal assuming the girth conjecture
holds.
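
To make the stretch guarantee concrete, the following sketch builds a stretch-3 oracle in the spirit of the Thorup-Zwick construction for $k = 2$. It is an illustration only, under several assumptions: the class name `Stretch3Oracle` and helper `dijkstra` are hypothetical, vertices are assumed to be mutually comparable (e.g., integers), and the preprocessing simply runs Dijkstra from every vertex, so it does not achieve the preprocessing time quoted above.

```python
import heapq
import math
import random

def dijkstra(adj, src):
    """Exact distances from src; adj[u] is a list of (v, weight) pairs."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

class Stretch3Oracle:
    def __init__(self, adj, seed=0):
        rng = random.Random(seed)
        prob = len(adj) ** -0.5            # sample about sqrt(n) centers
        self.centers = [v for v in adj if rng.random() < prob]
        self.from_center = {a: dijkstra(adj, a) for a in self.centers}
        self.pivot, self.bunch = {}, {}
        for v in adj:
            dist = dijkstra(adj, v)        # illustration only, not efficient
            p = min(self.centers, key=lambda a: dist.get(a, math.inf), default=None)
            radius = dist.get(p, math.inf) if p is not None else math.inf
            self.pivot[v] = (p, radius)
            # Keep only vertices strictly closer than the nearest center.
            self.bunch[v] = {w: d for w, d in dist.items() if d < radius}

    def query(self, u, v):
        if v in self.bunch[u]:
            return self.bunch[u][v]        # exact distance
        p, radius = self.pivot[u]
        if p is None:
            return math.inf
        return radius + self.from_center[p].get(v, math.inf)
```

If $v$ is not in the bunch of $u$, then $d(u, v)$ is at least the distance from $u$ to its pivot $p$, so by the triangle inequality the returned value $d(u, p) + d(p, v)$ is at most $3\, d(u, v)$, which matches $2k - 1$ for $k = 2$.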

This suggests that the distance oracle of Thorup
and Zwick can be improved only for sparse
graphs and, in particular, graphs with fewer than
$n^{1+1/k}$ edges. Alternatively, it might be possible
to get below the $2k - 1$ multiplicative stretch by
allowing an additive stretch as well.

Notice, however, that we cannot gain from
also introducing additive stretch without getting
improved multiplicative stretch distance oracles
for sparse graphs (i.e., $m = O(n)$). A data
structure with size $S(m, n)$ and stretch $(\alpha, \beta)$,
where $\alpha$ is the multiplicative stretch and $\beta$ is the
additive stretch, implies a data structure with size
$S((\beta + 1)m, n + \beta m)$ and multiplicative stretch $\alpha$:
if we divide every edge into $\beta + 1$ edges,
then all distances become multiples of $\beta + 1$
and an additive stretch of $\beta$ is useless. For graphs
with $m = O(n)$, the size of the data structure is
asymptotically the same.
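
The subdivision used in this reduction can be sketched as follows, assuming an unweighted graph given as an edge list over vertices `0..n-1`. The function name `subdivide` is hypothetical; the example only shows how the vertex and edge counts grow to $n + \beta m$ and $(\beta + 1)m$.

```python
def subdivide(n, edges, beta):
    """Replace every edge by a path of beta + 1 edges.

    Returns (new_vertex_count, new_edge_list). New vertices receive the
    ids n, n + 1, ... in the order in which they are created.
    """
    new_edges = []
    next_id = n
    for u, v in edges:
        prev = u
        for _ in range(beta):
            new_edges.append((prev, next_id))
            prev = next_id
            next_id += 1
        new_edges.append((prev, v))
    return next_id, new_edges
```

For example, a path on three vertices with $\beta = 2$ becomes a path with 7 vertices and 6 edges.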


Key Results 


Pătraşcu and Roditty [4] obtained a distance oracle
for sparse unweighted graphs with $m = O(n)$
of size $O(m^{5/3})$ that can supply in $O(1)$ time
an estimate of the distance with multiplicative
stretch 2. For dense graphs, the distance oracle
has size $O(n^{5/3})$ and stretch (2, 1).

Pătraşcu et al. [5] extended this result to
weighted graphs and generalized it. In particular,
they show that for any fixed positive integers
$k$ and $\ell$, there is a distance oracle with stretch
$\alpha = 2k + 1 \pm \frac{2}{\ell} = 2k + 1 - \frac{2}{\ell},\ 2k + 1 + \frac{2}{\ell}$
that uses $O(m^{1+2/(\alpha+1)})$ space. The query time
is $O(k + \ell)$.
Sommer et al. [6] proved a three-way trade-off
between space, stretch, and query time of
approximate distance oracles. They show that
any distance oracle that can give stretch $\alpha$
answers to distance queries in time $O(t)$ must
use $n^{1+\Omega(1/(t\alpha))} / \log n$ space. Their result is
obtained by a reduction from lopsided set
disjointness to distance oracles, using the framework
introduced by [3]. Any improvement to this lower
bound requires a major breakthrough in lower
bound techniques. In particular, it does not imply
anything even for slightly non-constant query
time such as $\Omega(\log n)$ or slightly non-linear space
such as $n^{1.01}$.

Pătraşcu and Roditty [4] also showed a
conditional lower bound for distance oracles that
is based on a conjecture on the hardness of
the set intersection problem. They showed that
a distance oracle for unweighted graphs with
$O(n)$ edges, which can distinguish between
distances of 2 and 4 in constant time (as
multiplicative stretch strictly less than 2 implies),
requires $\Omega(n^2)$ space, assuming the conjecture
holds. Thus, non-constant query time is essential
to get stretch smaller than 2.

Pătraşcu et al. [5] showed, also based on a
conjecture on the hardness of the set intersection
problem, for any fixed positive integer $\ell$,
that there are graphs with $m$ edges such that
a distance oracle with constant query time and
stretch below $3 - 2/(\ell + 1)$ must use space
$\Omega(m^{1+1/(2-1/\ell)})$.


Open Problems 


The conditional lower bounds of [5] for sparse
graphs do not say anything on stretch 3.
The best space upper bound for stretch 3 in
sparse graphs and dense graphs is $O(n^{1.5})$.
While in dense graphs this is tight due to the
existence of graphs with $\Omega(n^{1.5})$ edges and
girth 6, for sparse graphs nothing is known.
Therefore, we have the following two open
problems:

Can we get $o(n^{1.5})$ space for stretch 3 in
sparse graphs? Can we get stretch less than 3 for
space $O(n^{1.5})$ in sparse graphs?
Problem Definition 


Introduction 

From a mathematical point of view, a phylogeny 
defines a probability space for random sequences 
observed at the leaves of a binary tree T. 
The tree T represents the unknown hierarchy 
of common ancestors to the sequences. It is 
assumed that (unobserved) ancestral sequences 
are associated with the inner nodes. The tree 
along with the associated sequences models the 
evolution of a molecular sequence, such as the 
protein sequence of a gene. In the conceptually 
simplest case, each tree node corresponds to 
a species, and the gene evolves within the 
organismal lineages by vertical descent. 

Phylogeny reconstruction consists of finding T 
from observed sequences. The possibility of such 
reconstruction is implied by fundamental princi- 
ples of molecular evolution, namely, that random 
mutations within individuals at the genetic level 
spreading to an entire mating population are not 
uncommon, since often they hardly influence 
evolutionary fitness [15]. Such mutations 
slowly accumulate, and, thus, differences 
between sequences indicate their evolutionary 
relatedness. 

The reconstruction is theoretically feasible in 
several known situations. In some cases, dis- 
tances can be computed between the sequences, 
and used in a distance-based algorithm. Such an 
algorithm is fast-converging if it almost surely 
recovers T, using sequences that are polynomially 
long in the size of T. Fast-converging algorithms 
exploit statistical concentration properties of dis- 
tance estimation. 


Formal Definitions 

An evolutionary topology U(X) is an unrooted 
binary tree in which leaves are bijectively 
mapped to a set of species X. A rooted topology T 
is obtained by rooting a topology U on one 
of the edges uv: a new node p is added (the 
root), the edge uv is replaced by two edges pv 
and pu, and the edges are directed outwards 
on paths from p to the leaves. The edges, 
vertices, and leaves of a rooted or unrooted
topology T are denoted by E(T), V(T), and L(T),
respectively.

The edges of an unrooted topology $U$ may be
equipped with a positive edge length function
$d : E(U) \to (0, \infty)$. Edge lengths induce a tree
metric $d : V(U) \times V(U) \to [0, \infty)$ by the extension
$d(u, v) = \sum_{e \in u \leadsto v} d(e)$, where $u \leadsto v$
denotes the unique path from $u$ to $v$. The value
$d(u, v)$ is called the distance between $u$ and $v$. The
pairwise distances between leaves form a
distance matrix.
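
As a small illustration of how edge lengths induce the distance matrix, the sketch below sums edge lengths along tree paths. The function name `tree_distances` and the dictionary input format (pairs of node names mapped to lengths) are assumptions made for this example.

```python
def tree_distances(edge_lengths):
    """Leaf-to-leaf distance matrix of a tree.

    edge_lengths maps (u, v) pairs to positive lengths. The result is a
    nested dict D with D[s][t] equal to the path length between leaves.
    """
    adj = {}
    for (u, v), w in edge_lengths.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = [x for x in adj if len(adj[x]) == 1]
    D = {}
    for s in leaves:
        dist = {s: 0.0}
        stack = [s]
        while stack:                      # depth-first walk from leaf s
            x = stack.pop()
            for y, w in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + w
                    stack.append(y)
        D[s] = {t: dist[t] for t in leaves}
    return D
```

Because paths in a tree are unique, a single traversal from each leaf is enough; no shortest-path algorithm is needed.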

An additive tree metric is a function $\delta : X \times X \to [0, \infty)$
that is equivalent to the distance
matrix induced by some topology U(X) and edge 
lengths. In certain random models, it is possible 
to define an additive tree metric that can be 
estimated from dissimilarities between sequences 
observed at the leaves. 

In a Markov model of character evolution over
a rooted topology $T$, each node $u$ has an associated
state, which is a random variable $\xi(u)$ taking
values over a fixed alphabet $\mathcal{A} = \{1, 2, \ldots, r\}$.
The vector of leaf states constitutes the
character $\xi = (\xi(u) : u \in L(T))$. The states
form a first-order Markov chain along every
path. The joint distribution of the node states
is specified by the marginal distribution of the
root state, and the conditional probabilities
$P\{\xi(v) = b \mid \xi(u) = a\} = p_e(a \to b)$ on each
edge $e$, called edge transition probabilities.

A sample of length $\ell$ consists of independent
and identically distributed characters
$\Xi = (\xi_i : i = 1, \ldots, \ell)$. The random sequence
associated with the leaf $u$ is the vector
$\Xi(u) = (\xi_i(u) : i = 1, \ldots, \ell)$.

A phylogeny reconstruction algorithm is
a function $F$ mapping samples to unrooted
topologies. The success probability is the
probability that $F(\Xi)$ equals the true topology.


Popular Random Models 


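Neyman Model [14]
The edge transition probabilities are

$$p_e(a \to b) = \begin{cases} 1 - \mu_e & \text{if } a = b; \\ \dfrac{\mu_e}{r - 1} & \text{if } a \ne b \end{cases}$$

with some edge-specific mutation probability
$0 < \mu_e < 1 - 1/r$. The root state is uniformly
distributed. A distance is usually defined by

$$d(u, v) = -\frac{r - 1}{r} \ln\Bigl(1 - \frac{r}{r - 1}\, P\{\xi(u) \ne \xi(v)\}\Bigr).$$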
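
In practice the probability of observing different states at two leaves is not known and is replaced by the observed fraction of mismatching positions. The sketch below illustrates this plug-in estimate; the function name `neyman_distance` and the convention of returning infinity for saturated sequence pairs are assumptions made for this example.

```python
import math

def neyman_distance(seq_u, seq_v, r):
    """Plug-in estimate of the Neyman-model distance between two leaves.

    seq_u, seq_v: equal-length sequences over an alphabet of size r.
    """
    if len(seq_u) != len(seq_v) or len(seq_u) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_u, seq_v))
    p = mismatches / len(seq_u)          # empirical P{xi(u) != xi(v)}
    arg = 1.0 - r / (r - 1) * p
    if arg <= 0.0:
        return math.inf                  # saturated: the estimate is undefined
    return -(r - 1) / r * math.log(arg)
```

For $r = 4$ this is the familiar Jukes-Cantor style correction for nucleotide sequences.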
General Markov Model

There are no restrictions on the edge transition
probabilities in the general Markov model.
For identifiability [1, 16], however, it is usually
assumed that $0 < \det P_e < 1$, where $P_e$ is
the stochastic matrix of edge transition
probabilities. Possible distances in this model
include the paralinear distance [1, 12] and the
LogDet distance [13, 16]. The latter is defined
by $d(u, v) = -\ln\det J_{uv}$, where $J_{uv}$ is the matrix
of joint probabilities for $\xi(u)$ and $\xi(v)$.

It is often assumed in practice that sequence
evolution is effected by a continuous-time
Markov process operating on the edges.
Accordingly, the edge length directly measures
time. In particular, $P_e = e^{Q\, d(e)}$ on every edge $e$,
where $Q$ is the instantaneous rate matrix of the
underlying process.


Key Results 


It turns out that the hardness of reconstructing
an unrooted topology $U$ from distances is
determined by its edge depth $\rho(U)$. Edge depth is
defined as the smallest integer $k$ for which the
following holds. From each endpoint of every
edge $e \in E(U)$, there is a path leading to a leaf,
which does not include $e$ and has at most $k$
edges.
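
The definition translates directly into a breadth-first search from each endpoint of each edge. The following sketch assumes the topology is given as an adjacency dictionary of an unrooted tree; the function name `edge_depth` and the convention that a leaf endpoint contributes a path of zero edges are assumptions made for this example.

```python
from collections import deque

def edge_depth(adj):
    """Edge depth of an unrooted tree given as {node: [neighbors]}."""

    def nearest_leaf(start, banned):
        # BFS from start that never crosses the edge {start, banned}.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if len(adj[node]) == 1:       # reached a leaf
                return dist
            for nb in adj[node]:
                if node == start and nb == banned:
                    continue
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, dist + 1))
        raise ValueError("no leaf reachable without the banned edge")

    # Each edge {u, v} is visited from both endpoints.
    return max(nearest_leaf(u, v) for u in adj for v in adj[u])
```

On a quartet tree every internal node is adjacent to a leaf, so the edge depth is 1; on a balanced tree with eight leaves the central edge forces an edge depth of 2.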


Theorem 1 (Erdős, Steel, Székely, Warnow [6])
If $U$ has $n$ leaves, then $\rho(U) \le 1 + \log_2(n - 1)$.
Moreover, for almost all random $n$-leaf topologies
under the uniform or Yule-Harding distributions,
$\rho(U) \in O(\log\log n)$.


Theorem 2 (Erdős, Steel, Székely, Warnow [6])
For the Neyman model, there exists a polynomial-time
algorithm that has a success probability
$(1 - \delta)$ for random samples of length

$$\ell = O\!\left(\frac{\log n + \log\frac{1}{\delta}}{f^2 (1 - 2g)^{4\rho + 6}}\right) \tag{1}$$

where $0 < f = \min_e \mu_e$ and $g = \max_e \mu_e < 1/2$
are extremal edge mutation probabilities,
and $\rho$ is the edge depth of the true topology.
Theorem 2 can be extended to the general
Markov model with analogous success rates for
LogDet distances [7], as well as to a number of
other Markov models [2].

Equation (1) shows that phylogenies can
be reconstructed with high probability from
polynomially long sequences. Algorithms with
such sample size requirements were dubbed
fast-converging [9]. Fast convergence was proven for
the short quartet methods of Erdős et al. [6, 7],
and for certain variants [11] of the so-called
disk-covering methods introduced by Huson
et al. [9]. All these algorithms run in $\Omega(n^5)$
time. Csűrös and Kao [3] initiated the study
of computationally efficient fast-converging
algorithms, with a cubic-time solution. Csűrös [2]
gave a quadratic-time algorithm. King et al. [10]
designed an algorithm with an optimal running
time of $O(n \log n)$ for producing a phylogeny
from a matrix of estimated distances.

The short quartet methods were revisited re- 
cently: [4] described an O(n*)-time method that 
aims at succeeding even if only a short sample is 
available. In such a case, the algorithm constructs 
a forest of “trustworthy” edges that match the true 
topology with high probability. 

All known fast-converging distance-based al- 
gorithms have essentially the same sample bound 
as in (1), but Daskalakis et al. [5] recently gave 
a twist to the notion of fast convergence. They 
described a polynomial-time algorithm, which 
outputs the true topology almost surely from 
a sample of size $O(\log n)$, given that edge lengths
are not too large. Such a bound is asymptotically 
optimal [6]. Interestingly, the sample size bound 
does not involve exponential dependence on the 
edge depth: the algorithm does not rely on a dis- 
tance matrix. 


Applications 


Phylogenies are often constructed in molecular 
evolution studies, from aligned DNA or protein 
sequences. Fast-converging algorithms have 
mostly a theoretical appeal at this point. Fast 
convergence promises a way to handle the 
increasingly important issue of constructing 
large-scale phylogenies: see, for example, the 
CIPRES project (http://www.phylo.org/). 
Problem Definition 


A phylogeny is an evolutionary tree tracing the 
shared history, including common ancestors, of 
a set of extant species or “taxa.” Phylogenies 
are increasingly reconstructed on the basis of 
molecular data (DNA and protein sequences) 
using statistical techniques such as likelihood 
and Bayesian methods. Algorithmically, these 
techniques suffer from the discrete nature of 
tree topology space. Since the number of tree 
topologies increases exponentially as a function 
of the number of taxa, and each topology requires 
a separate likelihood calculation, it is important 
to restrict the search space and to design efficient 
heuristics. Distance methods for phylogeny re- 
construction serve this purpose by inferring trees 
in a fraction of the time required for the more 
statistically rigorous methods. Distance methods 
also provide fairly accurate starting trees to be 
further refined by more sophisticated methods. 
Moreover, the input to a distance method is the 
matrix of pairwise evolutionary distances among 
taxa, which are estimated by maximum likeli- 
hood, so that distance methods also have sound 
statistical justifications. 

Mathematically, a phylogenetic tree is a triple
$T = (V, E, l)$ where $V$ is the set of nodes
(extant taxa correspond to leaves, ancestors to
internal nodes), $E$ is the set of edges (branches)
representing relations of descent, and $l$ is a function
that assigns positive lengths to each edge in
$E$, representing a measure of evolutionary divergence,
for example, in terms of time, or amount
of change between DNA and protein sequences.
Any phylogenetic tree $T$ defines a metric $D_T$ on
its leaf set $L$: let $P_T(u, v)$ define the unique path
through $T$ from $u$ to $v$; then the distance from $u$
to $v$ is set to

$$D_T(u, v) = \sum_{e \in P_T(u, v)} l(e).$$

Distance methods for phylogeny reconstruction
rely on the fundamental result [22] that the
map $T \mapsto D_T$ is reversible; i.e., a tree $T$ can be
reconstructed from its tree metric, a problem that
can be solved in $O(n \log n)$ time [14]. However,
in practice $D_T$ is not known, and one must use
molecular sequence data to estimate a distance
matrix $D$ that approximates $D_T$ [9]. As the
amount of sequence data increases, $D$ can be
assumed to converge to $D_T$. A minimal requirement
for any distance method is consistency: for
any tree $T$, and for distance matrices $D$ “close
enough” to $D_T$, the algorithm should output a
tree with the same topology as $T$ (i.e., with
the same underlying graph $(V, E)$). The present
chapter deals with the question of when any
distance algorithm for phylogeny reconstruction
can be guaranteed to output the correct phylogeny
as a function of the divergence between $D$ and
$D_T$. Atteson [1] demonstrated that this question
can be precisely answered for neighbor joining
(NJ) [18], one of the most cited algorithms in
computational biology (with more than 35,000
citations up to 2014), and a number of NJ’s
variants.


The Neighbor Joining (NJ) Algorithm of 
Saitou and Nei [18] 

NJ is agglomerative: it works by using the input
matrix $D$ to identify a pair of taxa $x, y \in L$
that are neighbors in $T$, i.e., there exists a node
$u \in V$ such that $\{(u, x), (u, y)\} \subseteq E$. Then, the
algorithm creates a node $c$ that is connected to $x$
and $y$, extends the distance matrix to $c$, and solves
the reduced problem on $L \cup \{c\} \setminus \{x, y\}$. The pair
$(x, y)$ is chosen to minimize the following sum:

$$S_D(x, y) = (|L| - 2) \cdot D(x, y) - \sum_{z \in L} \bigl(D(z, x) + D(z, y)\bigr).$$

The soundness of NJ is based on the observation
that, if $D = D_T$ for a tree $T$, the value $S_D(x, y)$
will be minimized for a pair $(x, y)$ that are
neighbors in $T$.
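
The agglomeration loop can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the input is assumed to be a symmetric nested dictionary with a zero diagonal, only the topology is returned (branch lengths are not estimated), and the rule used to extend the matrix to the new node $c$, $D(c, z) = \frac{1}{2}(D(x, z) + D(y, z) - D(x, y))$, is the customary NJ reduction, which the text above does not spell out.

```python
from itertools import combinations

def nj_criterion(D, taxa, x, y):
    """S_D(x, y) = (|L| - 2) D(x, y) - sum over z in L of (D(z, x) + D(z, y))."""
    return (len(taxa) - 2) * D[x][y] - sum(D[z][x] + D[z][y] for z in taxa)

def neighbor_joining(D):
    """Return the edge list of the tree topology built by NJ."""
    D = {a: dict(row) for a, row in D.items()}   # work on a copy
    taxa = list(D)
    edges, counter = [], 0
    while len(taxa) > 3:
        # Pick the pair minimizing the selection criterion.
        x, y = min(combinations(taxa, 2),
                   key=lambda pair: nj_criterion(D, taxa, *pair))
        c = ("internal", counter)
        counter += 1
        edges += [(c, x), (c, y)]
        rest = [z for z in taxa if z != x and z != y]
        D[c] = {c: 0.0}
        for z in rest:                           # extend the matrix to c
            D[c][z] = D[z][c] = 0.5 * (D[x][z] + D[y][z] - D[x][y])
        taxa = rest + [c]
    if len(taxa) == 3:                           # join the last three nodes
        c = ("internal", counter)
        edges += [(c, z) for z in taxa]
    elif len(taxa) == 2:
        edges.append((taxa[0], taxa[1]))
    return edges
```

Each round evaluates the criterion for all remaining pairs, and each evaluation here recomputes a sum over all taxa, so this sketch is slower than standard implementations that maintain row sums incrementally.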


Balanced Minimum Evolution and 
Algorithms Inspired by It 

A number of papers (reviewed in [11]) have been
dedicated to the various interpretations and
properties of the $S_D$ criterion. One of these
interpretations consists of observing that agglomerating the
pair of nodes that minimizes $S_D$ is equivalent to
choosing, among all the trees that can be obtained
in this way, the one that minimizes a simple linear
formula [16] to calculate the length of a tree
from the distances between its leaves [11], thus
connecting distance and parsimony methods [9].
As the optimization principle seeking the tree that
minimizes this formula has been named balanced
minimum evolution (BME) [6], NJ can then be
seen as a greedy algorithm for BME.

This remarkable connection between NJ and
BME naturally spurred the proposal of alternative
algorithms for BME. One of these, GreedyBME,
consists of iteratively adding taxa to a tree so
that, at each step, the resulting tree is the one
that minimizes BME among all the binary trees
that can be obtained in this way [6]. More
involved algorithms can be obtained by combining
a simple tree construction algorithm such as NJ
or GreedyBME, with a local search based on the
traditional tree rearrangements used in
phylogenetics [9], such as nearest-neighbor interchange
(NNI) or subtree pruning and regrafting (SPR).


The Fast Neighbor Joining (FNJ) Algorithm 
of Elias and Lagergren [7] 
Standard implementations of NJ require $O(n^3)$
computations, where $n$ is the number of taxa in
the data set. Since a distance matrix only has
$n^2$ entries, many attempts have been made to
construct a distance algorithm that would only
require $O(n^2)$ computations while retaining the
accuracy of NJ. To this end, one of the most
interesting results is the fast neighbor joining
(FNJ) algorithm of Elias and Lagergren [7].

Most of the computation of NJ is used in the
recalculations of the sums $S_D(x, y)$ after each
agglomeration step. Although each recalculation
can be performed in constant time, and although
it is not necessary to consider all pairs of taxa
$(x, y)$ in order to find the one that minimizes
this sum [20], the number of pairs to consider
remains, in the worst case, $O(k^2)$ when $k$ nodes
are left to agglomerate. Thus, summing over $k$,
$O(n^3)$ computations are required in all.

Elias and Lagergren take a related approach
to agglomeration, which does not exhaustively
seek the minimum value of $S_D(x, y)$ at each
step, but instead uses a heuristic to maintain a
list of candidate “visible pairs” $(x, y)$ for
agglomeration. At the $(n - k)$th step, when two
neighbors are agglomerated from a $k$-taxa tree to
form a $(k - 1)$-taxa tree, FNJ has a list of $O(k)$
visible pairs for which $S_D(x, y)$ is calculated.
The pair joined is selected from this list. By
trimming the number of pairs considered, Elias
and Lagergren achieved an algorithm which
requires only $O(n^2)$ computations. Other similar
improvements to neighbor joining have also been
proposed in recent years [8, 13, 20].


Safety Radius Performance Analysis 

(Atteson [1]) 

In order to provide accuracy guarantees for
distance-based algorithms, Atteson [1] tackled
the following question: if $D$ is a distance matrix
that approximates a tree metric $D_T$, can one have
some confidence in the algorithm’s ability to
reconstruct $T$, or parts of it, given $D$, based
on some measure of the distance between
$D$ and $D_T$? For two matrices, $D_1$ and $D_2$,
the $L_\infty$ distance between them is defined by
$\|D_1 - D_2\|_\infty = \max_{i,j} |D_1(i, j) - D_2(i, j)|$.
Moreover, let $\mu(T)$ denote the length of the
shortest internal edge of a tree $T$. This is
an important quantity, as short branches in a
phylogeny are difficult to resolve, because of
the relatively few (if any) molecular changes
occurring on a short branch.

The safety radius of an algorithm $A$ is then the
greatest value of $r$ with the property that given
any phylogeny $T$, and any distance matrix $D$
satisfying $\|D - D_T\|_\infty < r \cdot \mu(T)$, $A$ will return
a tree $\hat{T}$ with the same topology as $T$. Similarly,
the edge radius of $A$ is the greatest value of $r$
for which the presence in $\hat{T}$ of an edge $e \in E$
is guaranteed whenever $\|D - D_T\|_\infty < r \cdot l(e)$.
As an easy consequence of these definitions, the
safety radius is always at least as large as the edge
radius. Moreover, both the safety radius and the
edge radius can also be attributed to an optimization
principle, assuming an exact optimization
algorithm.
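
Checking whether an estimated matrix lies within a given radius of a tree metric is a one-line computation once the $L_\infty$ distance is available. The sketch below assumes both matrices are nested dictionaries over the same taxa; the function names `linf_distance` and `within_radius` are hypothetical.

```python
def linf_distance(D1, D2):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(D1[i][j] - D2[i][j]) for i in D1 for j in D1[i])

def within_radius(D, D_T, shortest_internal_edge, r=0.5):
    """True if ||D - D_T||_inf < r * mu(T)."""
    return linf_distance(D, D_T) < r * shortest_internal_edge
```

Of course, in a real analysis the true metric is unknown; the check is useful mainly in simulations, where the tree that generated the data is available.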
Key Results


Atteson [1] proved the following theorems:

Theorem 1 The safety radius of NJ is 1/2.


Theorem 2 The largest possible safety radius
for any algorithm is 1/2.


Indeed, given any $\mu$, one can find two different
trees $T_1$, $T_2$ and a distance matrix $D$ such that
$\mu = \mu(T_1) = \mu(T_2)$ and $\|D - D_{T_1}\|_\infty = \mu/2 =
\|D - D_{T_2}\|_\infty$. Since $D$ is equidistant from two
distinct tree metrics, no algorithm could assign it
to the “closest” tree.

In their presentation of FNJ, Elias and Lagergren
updated Atteson’s results for their algorithm.
They showed:


Theorem 3 The safety radius of FNJ is 1/2.


An insight into the above results on
neighbor-joining-type algorithms is provided by the fact
that the optimization principle they are linked to,
BME, itself has safety radius 1/2 [15]. A simple
consequence of this [15] is the fact that
GreedyBME also has safety radius 1/2, a result first
proven by Shigezumi [19]. Finally, performing a
local search guided by BME and based on SPR
leads to an algorithm with safety radius greater
than or equal to 1/3, regardless of the method used to
construct the initial tree [2].

The edge radius of a number of algorithms 
has also been studied. As conjectured by Atte- 
son [1] and formally proven by Dai et al. [5], 
the edge radius of NJ is 1/4. Interestingly, other 
heuristics, related to NJ via the principle they 
seek to optimize (BME), perform better than NJ 
in terms of edge radius: GreedyBME has edge 
radius 1/3 [3]; moreover, building an initial tree 
with GreedyBME and then performing a local 
search guided by BME and based on NNI or 
SPR operations constitute an algorithm with edge 
radius 1/3 [3]. 

Finally, we note that the safety radius framework
has also been applied to the ultrametric
setting where the correct tree $T$ is rooted and all
tree leaves are at the same distance from the root
[10]. These trees are called “molecular clock”
trees in phylogenetics and “indexed hierarchies”
in data analysis. In this setting, the optimal safety
radius is equal to 1 (instead of 1/2), and a number
of standard algorithms (e.g., UPGMA, with time
complexity in $O(n^2)$) have a safety radius of 1.


Open Problems 


With increasing amounts of sequence data be- 
coming available for an increasing number of 
species, distance algorithms such as NJ should 
be useful for quite some time. Currently, the 
bottleneck in the process of building phyloge- 
nies is estimating distances, rather than exploring 
tree topologies. Two algorithms were recently 
developed to reconstruct trees from incomplete 
distance matrices. These algorithms use character 
information as well as distances and hence cannot 
be categorized as pure distance methods. 

FastTree [17] is an NJ-like heuristic that
avoids computing the full distance matrix. For
each taxon, FastTree computes the distances
to $O(\sqrt{n})$ close neighbors. FastTree also uses
sequence profiles to approximate $S_D(x, y)$
values in constant time. The overall algorithm
takes $O(san\sqrt{n} \log n)$ time and $O(san + n\sqrt{n})$
memory, where $s$ is the length of the input
sequences and $a$ is their alphabet size. FastTree
has been shown to be highly accurate with
simulated data [17], but no formal guarantee
has yet been shown for this algorithm.

The only known $o(n^2)$ algorithm with theoretical
guarantees is LSHTree [4]. It uses locality-
sensitive hashing to rapidly find candidate pairs 
of close sequences for merging. After each 
merge, LSHTree reconstructs ancestral sequences 
at new internal nodes to ensure that a close pair 
of sequences can be found at each iteration. 
LSHTree is guaranteed to reconstruct the correct 
tree from sequences of logarithmic length under a 
Markov model of sequence evolution. The exact 
running time of LSHTree depends on the branch 
lengths. 

As we have shown, a number of distance- 
based tree building algorithms have been ana- 
lyzed in the safety radius framework. However, 
computer simulations (e.g., [6, 7]) have shown 
that not all algorithms with optimal safety radius 
achieve the same accuracy: for example, NJ is 
slightly more accurate than FNJ (both having 
safety radius = 1/2), but is beaten by heuristics 
based on NNI or SPR moves (with demonstrated 
safety radius >= 1/3, but possibly = 1/2). More- 
over, some well-established methods (e.g., based 
on least squares [10,21]) have safety radius con- 
verging to 0 when the number of taxa increases, 
which contradicts the common practice. These 
experimental observations indicate that the safety 
radius approach should be sharpened to provide 
better theoretical analysis of method performance 
(see [12] for a work in this direction). In par- 
ticular, the choice of the Loo norm to measure 
the error in a distance matrix seems to have little 
statistical or biological justification. 

An alternative analysis framework, strictly 
linked to the one presented here, is the one 
seeking to estimate the minimum sequence length 
required for accurate reconstruction of the correct 
tree. It is discussed in a separate entry of this 
encyclopedia [A].


Problem Definition


Consider a communication network, modeled by 
an undirected weighted graph G = (V, E), where 
|V| =n, |E| = m. Each vertex of V represents a 
processor of unlimited computational power; the 
processors have unique identity numbers (ids), 
and they communicate via the edges of E by
sending messages to each other. Also, each edge
e ∈ E has an associated weight w(e), known to the
processors at the endpoints of e. Thus, a proces- 
sor knows which edges are incident to it and their 
weights, but it does not know any other infor- 
mation about G. The network is asynchronous: 
each processor runs at an arbitrary speed, which 
is independent of the speed of other processors. 
A processor may wake up spontaneously or when 
it receives a message from another processor. 
There are no failures in the network. Each mes- 
sage sent arrives at its destination within a fi- 
nite but arbitrary delay. A distributed algorithm 
A for G is a set of local algorithms, one for 
each processor of G, that include instructions for 
sending and receiving messages along the edges 
of the network. Assuming that A terminates (i.e.,
all the local algorithms eventually terminate),
its message complexity is the total number of
messages sent over any execution of the algorithm,
in the worst case. Its time complexity is the
worst-case execution time, assuming processor
steps take negligible time, and message delays are
normalized to be at most 1 unit.

A minimum spanning tree (MST) of G is a
subset E' of E such that the graph T = (V, E') is
a tree (connected and acyclic) and its total weight,
w(E') = Σ_{e ∈ E'} w(e), is as small as possible. The
computation of an MST is a central problem in
combinatorial optimization, with a rich history
dating back to 1926 [2]; the book [12] collects
properties, classical results, applications, and
recent research developments.
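
As a point of reference for the definition above, the following is a minimal sequential sketch (Kruskal's classical algorithm, not the distributed algorithm discussed in this entry) that computes the edge set E' and its total weight w(E'). The function name `mst_kruskal`, the `(weight, u, v)` edge encoding, and the toy graph are illustrative assumptions.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v), vertices numbered 0..n-1


def mst_kruskal(n: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges E' of a minimum spanning tree of a connected graph."""
    parent = list(range(n))  # union-find forest over the vertices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Edge] = []
    for w, u, v in sorted(edges):      # scan edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding (u, v) keeps T acyclic
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


# Hypothetical 4-vertex example graph.
edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
tree = mst_kruskal(4, edges)
total_weight = sum(w for w, _, _ in tree)
```

In the distributed setting no processor sees the whole edge list, which is exactly what makes the problem harder than this sequential version.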

In the distributed MST problem, the goal is to 
design a distributed algorithm A that terminates 
always and computes an MST T of G. At the 
end of an execution, each processor knows which 
of its incident edges belong to the tree T and 
which do not (i.e., the processor writes in a local 
output register the corresponding incident edges). 
It is remarkable that in the distributed version of 
the MST problem, a communication network is 
solving a problem where the input is the network 
itself. This is one of the fundamental starting 
points of network algorithms. 

It is not hard to see that if all edge weights 
are different, the MST is unique. Due to the 
assumption that processors have unique ids, 
it is possible to assume that all edge weights 
are different: whenever two edge weights are 
equal, ties are broken using the processor ids 
of the edge endpoints. Having a unique MST 
facilitates the design of distributed algorithms, as 
processors can locally select edges that belong 
to the unique MST. Notice that if processors do 
not have unique ids and edge weights are not 
different, there is no deterministic MST (nor any 
spanning tree) distributed algorithm, because it 
may be impossible to break the symmetry of the 
graph, for example, in the case it is a cycle with 
all edge weights equal. 
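
The tie-breaking rule can be sketched as follows, assuming (as one possible convention, not necessarily the one used in the cited papers) that equal weights are compared lexicographically by the smaller and then the larger endpoint id. The helper name `edge_key` is hypothetical.

```python
def edge_key(weight, u_id, v_id):
    """Total order on edges: weight first, then the two endpoint ids.

    Both endpoints compute the same key locally, because each knows the
    weight of the edge, its own id, and (by assumption) its neighbor's id.
    """
    lo, hi = sorted((u_id, v_id))
    return (weight, lo, hi)


# A cycle on processors 1..4 in which all edge weights are equal.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
keys = [edge_key(1, u, v) for u, v in cycle]
heaviest = max(keys)  # the edge excluded from the now-unique MST
```

With unique ids, the keys are pairwise distinct even though all weights coincide, so the symmetric cycle mentioned above is no longer an obstacle.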


Key Results 


The distributed MST problem has been studied
since 1977, and dozens of papers have been
written on the subject. In 1983, the fundamental
distributed GHS algorithm in [5] was published,
the first to solve the MST problem with
O(m + n log n) message complexity. The paper
has had a very significant impact on research in
distributed computing and won the 2004 Edsger
W. Dijkstra Prize in Distributed Computing.

It is not hard to see that any distributed MST
algorithm must have Ω(m) message complexity
(intuitively, at least one message must traverse
each edge). Also, results in [3, 4] imply an
Ω(n log n) message complexity lower bound for
the problem. Thus, the GHS algorithm is optimal
in terms of message complexity.

The Ω(m + n log n) message complexity
lower bound for the construction of an MST
applies also to the problem of finding an arbitrary
spanning tree of the graph. However, for specific
graph topologies, it may be easier to find an
arbitrary spanning tree than to find an MST. In
the case of a complete graph, Ω(n²) messages
are necessary to construct an MST [8], while
an arbitrary spanning tree can be constructed in
O(n log n) messages [7].

The time complexity of the GHS algorithm is
O(n log n). In [1] it is described how to improve
its time complexity to O(n) while keeping the
optimal O(m + n log n) message complexity.
It is clear that Ω(D) time is necessary for the
construction of a spanning tree, where D is the
diameter of the graph. In the case of an MST,
the time complexity may depend on other
parameters of the graph. For example, information
may need to flow among the processors residing
on a common cycle, because in an MST construction
at least one edge of the cycle must be excluded
from the MST. If messages of unbounded size
are allowed, an MST can easily be constructed in
O(D) time by collecting the graph topology and
edge weights in a root processor. The problem
becomes interesting in the more realistic model
where messages are of size O(log n) and an edge
weight can be sent in a single message. When
the number of messages is not important, one can
assume without loss of generality that the model
is synchronous. For near time-optimal algorithms
and lower bounds, see [10] and references therein.


Applications


The distributed MST problem is important to 
solve, both theoretically and practically, as an 
MST can be used to save on communication, 
in various tasks such as broadcast and leader 
election, by sending the messages of such appli- 
cations over the edges of the MST. 

Also, research on the MST problem, and in 
particular the MST algorithm of [5], has mo- 
tivated a lot of work. Most notably, the algo- 
rithm of [5] introduced various techniques that 
have been in widespread use for multicasting, 
query and reply, cluster coordination and routing, 
protocols for handshake, synchronization, and 
distributed phases. Although the algorithm is in- 
tuitive and is easy to comprehend, it is sufficiently 
complicated and interesting that it has become a 
challenge problem for formal verification meth- 
ods, e.g., [11]. 


Open Problems 


There are many open problems in this area, and
only a few significant ones are mentioned. As far
as message complexity is concerned, although the
asymptotically tight bound of O(m + n log n) for
the MST problem in general graphs is known,
finding the actual constants remains an open
problem. Smaller constants are known for general
spanning trees than for MST, though [6].

As mentioned above, near time-optimal
algorithms and lower bounds appear in [10] and
references therein. The optimal time complexity
remains an open problem. Also, in a synchronous
model for overlay networks, where all processors
are directly connected to each other, an MST can
be constructed in sublogarithmic time, namely,
O(log log n) communication rounds [9], and no
corresponding lower bound is known.


Problem Definition


This entry considers enumeration of combinatorial
problems, which can be formulated as follows.
Given a large search space C and a predicate
of interest P : C → {true, false}, the goal is
to enumerate the solutions S ⊆ C such that
∀s ∈ S, P(s) = true. In most settings, C
is the complete set of combinations of an initial
set G; hence, |C| = 2^|G| and the problem is
NP-hard. There are also cases where the elements to
enumerate are not sets but other combinatorial
structures such as sequences or graphs.

We restrict ourselves to the case where C can 
be organized as an enumeration tree: 


• There exists a distinguished element of C
called the root

• There exists (at least) a parent function parent:
C \ {root} → C
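
To make the two conditions concrete, here is a minimal sketch in which C is the set of all subsets of a small ground set G, the root is the empty set, and the parent of a nonempty subset is obtained by removing its largest element. This particular parent function, and the names `parent`, `children`, and `enumerate_solutions`, are illustrative assumptions rather than the choice made by any specific algorithm.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, List

Node = FrozenSet[int]
ROOT: Node = frozenset()  # distinguished root of the enumeration tree


def parent(node: Node) -> Node:
    """Parent function on C \\ {root}: drop the largest element."""
    return node - {max(node)}


def children(node: Node, ground: Iterable[int]) -> List[Node]:
    """All nodes whose parent is `node` (extensions by a larger element)."""
    top = max(node) if node else float("-inf")
    return [node | {e} for e in sorted(ground) if e > top]


def enumerate_solutions(ground: Iterable[int],
                        predicate: Callable[[Node], bool]) -> Iterator[Node]:
    """Depth-first traversal of the enumeration tree, yielding solutions."""
    stack = [ROOT]
    while stack:
        node = stack.pop()
        if predicate(node):
            yield node
        stack.extend(children(node, ground))


ground = {0, 1, 2}
all_nodes = list(enumerate_solutions(ground, lambda s: True))
even_sum = list(enumerate_solutions(ground, lambda s: sum(s) % 2 == 0))
```

Because every element of C other than the root has exactly one parent, each element is visited exactly once and the subtrees can be explored independently.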


Finding S requires, in the worst case, enumerating
all elements of C. A large body of research
has exploited properties of P together with parent
to avoid enumerating some parts of C, as
described in the "Reverse Search; Enumeration
Algorithms" entry. The focus of the current entry
is to exploit parallel computing devices in order
to speed up the enumeration process. Due to the
prevalence of multicore processors nowadays,
they will be the main focus of this entry. However,
many solutions presented also apply to a
cluster setting.


Key Results 


The main challenges of distributed enumeration, 
as well as existing solutions, are presented below. 


Synchronization 


For most distributed algorithms, one important 
challenge is to ensure the sharing of information 
between parallel processes, either by message 
passing in a cluster setting or by access to shared 
memory locations in a multicore setting. 

This entry considers a tree-shaped enumeration,
where all branches of the enumeration
tree are independent. In such a setting, complex
synchronization mechanisms are not needed. The
main difficulty is thus to guarantee that such a
tree-shaped enumeration can take place, i.e., to
find (at least) one parent function (cf. the
"Reverse Search; Enumeration Algorithms" entry).

Note that in some parallel SAT solvers [6] using
portfolio-based approaches, each computing
resource exploits a different strategy (and thus a
different parent function) to explore the search
space. Ideally these strategies are orthogonal:
they explore different parts of the space, and they
exchange information to help mutual pruning.


Load Balancing 


Most recent algorithms explore the enumeration 
tree with Depth-First Search (DFS). The simplest 
way to perform distributed enumeration is to 
partition the enumeration tree and assign each 
subtree to a parallel process. A simple example 
is shown in Fig. 1. The figure shows the enumer- 
ation tree. Each subtree of the root is assigned to 
a different computing resource (either to a node 
in a cluster or to a core of a multicore processor). 

However, depending on the choice of parent
function and of pruning strategies, each subtree
is likely to be of a different size, resulting in
large running time differences for the parallel
processes. Such a phenomenon is called load
unbalance. On the right of Fig. 1, the execution
time is represented for both computing resources,
assuming that computing for any node of the
enumeration tree takes exactly one time unit.
Computing resource T1 computes for four time
units, while computing resource T2 computes for
eight time units. This means that T1 has been
idle for four time units: for half of the
execution time, the execution has only exploited
one of the two available computing resources,
resulting in a longer execution time (in an ideal
case, as there are 12 nodes to explore, the
execution time should be 6 time units). The solution
to this problem is to use dynamic load balancing
strategies such as work sharing or work stealing.

[Fig. 1 labels: Load unbalance; Execution time]
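
The arithmetic of this example can be checked with a few lines. The dictionary below simply restates the subtree sizes given in the text (one time unit per node); the variable names are illustrative.

```python
# Number of enumeration-tree nodes statically assigned to each resource.
subtree_sizes = {"T1": 4, "T2": 8}

# With a static partition, execution ends when the slowest resource ends.
makespan = max(subtree_sizes.values())

# With perfect balance, the 12 nodes would be split evenly.
ideal = sum(subtree_sizes.values()) / len(subtree_sizes)

# Time each resource spends idle while waiting for the slowest one.
idle = {name: makespan - size for name, size in subtree_sizes.items()}
```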

Work sharing is based on a simple pro- 
ducer/consumer principle: tasks (here nodes of 
the enumeration tree) are not assigned statically 
but made available in a single pool. Each idle 
computing resource can request one or more tasks 
from this pool. By reducing the granularity of the 
work each computing resource has to perform, 
this technique effectively reduces load unbalance. 
A disadvantage of this technique is that if compu- 
tation for a single node of the enumeration tree is 
short, the computing resource will make frequent 
requests to the central work pool, with an 
increased risk of synchronization overheads over 
accesses to the pool. This can be limited by pass- 
ing more than one enumeration tree node to each 
computing resource [1]; however, if the computa- 
tion time of each of these nodes is unpredictable, 
a new load unbalance situation may arise. 
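
A minimal sketch of work sharing with threads is given below, assuming a single shared pool implemented with Python's `queue.Queue`; the function name `work_sharing` and the toy tree of binary strings are hypothetical. It illustrates the principle only and makes no claim about the performance of any cited implementation.

```python
import queue
import threading


def work_sharing(root, children, num_workers=4):
    """Explore a tree with a single shared task pool (producer/consumer)."""
    pool = queue.Queue()      # central pool of enumeration-tree nodes
    pool.put(root)
    visited = []
    lock = threading.Lock()   # protects the shared result list

    def worker():
        while True:
            node = pool.get()           # idle resource requests a task
            if node is None:            # sentinel: no more work
                pool.task_done()
                return
            with lock:
                visited.append(node)
            for child in children(node):
                pool.put(child)         # new tasks go back to the pool
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pool.join()                         # wait until every task is processed
    for _ in threads:
        pool.put(None)                  # release the workers
    for t in threads:
        t.join()
    return visited


def binary_children(s):
    """Toy enumeration tree: binary strings of length at most 3."""
    return [s + "0", s + "1"] if len(s) < 3 else []


visited = work_sharing("", binary_children, num_workers=4)
```

Every access to the pool is synchronized, which is the overhead the paragraph above warns about when the work per node is small.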

Work stealing [2] is an “optimistic” improve- 
ment over work sharing in cases where the com- 
putation time of each task is unpredictable (which 
is often the case in distributed enumeration). The 
idea is that each computing resource enqueues a 
list of tasks to perform, which can either come 
from a central pool (as in work sharing) or from 
a partition of the search space (as in a static parti- 
tioning). When a computing resource finishes its 
tasks and becomes idle, it will query the work 
queues of the other resources and “steal” part 
of a nonempty queue (usually from the biggest 
queue). 
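
The following is a deterministic, single-threaded simulation sketch of work stealing, not a concurrent implementation. It assumes one time unit per node, that an idle resource steals half (rounded up) of the biggest queue, and that resources act one after the other within a round; the names `simulate_work_stealing` and `TREE` are hypothetical, and the toy tree only mimics the 4-node and 8-node subtrees of the example above.

```python
from collections import deque


def simulate_work_stealing(initial_queues, children):
    """Round-based simulation; returns (rounds, nodes processed per resource)."""
    queues = [deque(q) for q in initial_queues]
    processed = [0] * len(queues)
    rounds = 0
    while any(queues):
        rounds += 1
        for i, q in enumerate(queues):
            if not q:
                # Idle resource: steal from the biggest queue.
                victim = max(queues, key=len)
                for _ in range((len(victim) + 1) // 2):
                    q.append(victim.popleft())   # take the oldest tasks
            if q:
                node = q.pop()                   # depth-first on own queue
                processed[i] += 1
                q.extend(children(node))
    return rounds, processed


# Toy tree: subtree "a" has 4 nodes, subtree "b" has 8 nodes.
TREE = {
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2"],
    "b1": ["b11", "b12"],
    "b2": ["b21", "b22", "b23"],
}

rounds, processed = simulate_work_stealing(
    [["a"], ["b"]], lambda node: TREE.get(node, [])
)
```

In this toy run the static partition would need eight rounds, whereas stealing lets the first resource take over part of the larger subtree once its own queue is empty.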

Recent work by Hanusse et al. [3] has
proposed an enumeration approach that mixes
DFS and BFS. The parallelism that takes place
in the BFS steps naturally leads to good load
balancing without needing work-stealing
mechanisms. Their approach shows promising results.


Data Locality 


In some classes of applications, such as pattern
mining, testing the predicate P requires access to
a large dataset D. The enumeration tree is
organized such that each branch explores a subset
of the dataset, which makes it possible to benefit
from data locality effects and to exploit the cache
efficiently in the case of multicore processors.

Techniques to limit load unbalance presented 
above tend to dispatch nodes from the same 
branch to different computing resources, which 
destroys data locality. This can lead to a 
vastly increased bandwidth usage (memory 
bus bandwidth for multicores and network 
bandwidth in clusters), resulting in a severe loss 
of performance. 

Several solutions have been proposed by the 
pattern mining community in order to combine 
good load balance and good data locality. These 
solutions are designed for multicore processors. 
In the case of work stealing, Buehrer et al. [4] 
propose a method which dynamically decides, 
for a given node of the enumeration tree, if it 
has to be mined by the same thread as its parent 
(preserving locality) or if it can be enqueued and 
possibly stolen by another thread. This method 
takes into account the system load, which is a 
function of the size of the queues for the threads. 

In the case of work sharing, Négrevergne
et al. [5] divide the task pool into one queue per
thread. Task assignment to threads prioritizes
data locality.


Applications 


The main applications of distributed enumeration
are pattern mining (a field of data mining) and
SAT problem solving [6]. Pattern mining consists
in finding regularities in data, whereas solving
SAT consists in deciding whether there exists a
satisfying truth assignment for the variables of a
propositional formula. Both problems explore a
huge search space and have numerous applications;
they thus need to exploit the available parallel
computing power as much as possible.


Open Problems 


Achieving optimal parallel scaling for
applications using distributed enumeration is still an
open problem. The techniques presented in this
entry make it possible to design algorithms with
satisfactory results on existing multicore computers
with tens of cores. However, as the number of cores
grows towards many-core processors (hundreds
of cores), the current algorithms are unlikely
to exhibit good parallel scalability. Novel
approaches with fewer overheads for handling the
parallelism will be required.


Problem Definition


Broadcasting is a fundamental problem in com- 
munication networks, where one distinguished 
node, called the source, holds a piece of informa- 
tion, and the goal is to disseminate this message 
to all other nodes in the network. 

The signal-to-interference-and-noise-ratio
model, SINR for short, generalizes the abstract
radio networks model (RN) in the following way:
nodes located in a metric space communicate by 
transmitting a signal to the wireless medium, and 
the quantitative accumulation of interference and 
signal attenuation are taken into account when 
deciding which nodes successfully receive the 
signal. 

In more detail, a wireless network consists of
n nodes, deployed into the Euclidean plane; each
node v has its transmission power P_v, which is a
positive real number. A network is uniform when
transmission powers P_v are equal or nonuniform
otherwise. In the following, the uniform networks
are considered.

Nodes work synchronously in rounds; each 
node can either act as a transmitter or as a receiver 
during a round. 

Interferences and collisions are determined by
three fixed model parameters: path loss α > 2,
threshold β > 1, and ambient noise N > 0. The
SINR(v, u, T) ratio, for given nodes u, v and a set
of transmitting nodes T, is defined as

$$\mathrm{SINR}(v, u, T) = \frac{P_v \,\mathrm{dist}(v, u)^{-\alpha}}{N + \sum_{w \in T \setminus \{v\}} P_w \,\mathrm{dist}(w, u)^{-\alpha}}$$

where dist(·,·) is the distance function on the
plane. A node u successfully receives a message
from a node v in a round if v ∈ T, u ∉ T, and

$$\mathrm{SINR}(v, u, T) \geq \beta, \qquad (1)$$

where T is the set of nodes transmitting at that
round.
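
The reception rule of Eq. (1) can be evaluated directly. The sketch below assumes nodes are given as points in the plane and that powers are stored per node; the function names `sinr` and `receives`, as well as the numeric parameter values, are illustrative assumptions.

```python
import math


def sinr(v, u, transmitters, positions, power, alpha, noise):
    """SINR(v, u, T): signal of v at u over noise plus interference."""
    signal = power[v] * math.dist(positions[v], positions[u]) ** (-alpha)
    interference = sum(
        power[w] * math.dist(positions[w], positions[u]) ** (-alpha)
        for w in transmitters
        if w != v
    )
    return signal / (noise + interference)


def receives(v, u, transmitters, positions, power, alpha, noise, beta):
    """Eq. (1): u receives from v iff v transmits, u listens, SINR >= beta."""
    return (
        v in transmitters
        and u not in transmitters
        and sinr(v, u, transmitters, positions, power, alpha, noise) >= beta
    )


# Hypothetical uniform network on a line, with alpha = 3 and beta = 2.
positions = {"v": (0.0, 0.0), "u": (1.0, 0.0), "w": (2.0, 0.0)}
power = {"v": 1.0, "u": 1.0, "w": 1.0}
params = dict(positions=positions, power=power, alpha=3, noise=0.1, beta=2)

alone = receives("v", "u", {"v"}, **params)            # no interference
with_w = receives("v", "u", {"v", "w"}, **params)      # w interferes at u
```

In this toy configuration the second transmitter sits as close to u as v does, so its interference pushes the ratio below the threshold.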

A single message sent in an execution of any 
algorithm can carry the source message and at 
most logarithmic, in the size of the network, num- 
ber of control bits. A node other than the source 
starts executing the broadcast protocol after the 
first successful receipt of the source message; it is 
often called a non-spontaneous wake-up model. 

In an ad hoc network, there is no central 
knowledge of network topology. A node v par- 
ticipating in an execution of a protocol knows 
the size of a network n or only a linear upper 
bound on the size. In the case that a network is 
deployed in the Euclidean space, one can distin- 
guish between the case that each node knows its 
coordinates and the case that it does not know 
them. 

Assuming that the transmission power of
nodes is equal to P, the largest distance from the
transmitter in which a message can be received is
equal to r = (P/(Nβ))^(1/α), provided only one
node is transmitting in the whole network.


Sensitivity. Due to physical constraints, it is
often assumed that the actual distance on which
a message can be received is smaller than r =
(P/(Nβ))^(1/α). This assumption is expressed by
the sensitivity parameter 0 ≤ ε_s < 1 such that a
message transmitted by v is received at a node
u in a round with the set of transmitters T if
SINR(v, u, T) ≥ β and dist(v, u) ≤ (1 − ε_s)r.

The setting with ε_s = 0 is called the model
with strong sensitivity, and ε_s > 0 defines the
model with weak sensitivity.


Connectivity. In order to determine which
nodes are connected in a network, the notion
of a communication graph is introduced. To this
aim, the connectivity parameter ε_c is introduced
such that 1 > ε_c ≥ ε_s ≥ 0. An edge
(u, v) appears in the communication graph iff
dist(u, v) ≤ (1 − ε_c)r. The setting with ε_c = ε_s
is called the model with weak connectivity, and
the inequality ε_s < ε_c defines the model with
strong connectivity.
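
As a small illustration of the range r and of the communication graph, the sketch below computes r = (P/(Nβ))^(1/α) and the edges with dist(u, v) ≤ (1 − ε_c)r. The function names and the numeric values are illustrative assumptions.

```python
import math


def transmission_range(P, N, beta, alpha):
    """Largest reception distance when a single node transmits."""
    return (P / (N * beta)) ** (1 / alpha)


def communication_graph(positions, r, eps_c):
    """Edges (u, v) with dist(u, v) <= (1 - eps_c) * r."""
    limit = (1 - eps_c) * r
    nodes = sorted(positions)
    return {
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if math.dist(positions[u], positions[v]) <= limit
    }


# Hypothetical parameters: P = 16, N = 1, beta = 2, alpha = 3 give r = 2.
r = transmission_range(P=16, N=1, beta=2, alpha=3)
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.8, 0.0)}

graph_eps0 = communication_graph(positions, r, eps_c=0.0)
graph_eps25 = communication_graph(positions, r, eps_c=0.25)
```

Raising ε_c removes the link between b and c, whose length is close to r, which is the effect that the strong connectivity assumption relies on.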


Distributed Randomized Broadcasting in Wireless Networks under the SINR Model 


Time complexity of a randomized broadcasting
algorithm is the number of rounds after which,
for all communication networks defined by given
SINR parameters α, β, and N, and the parameters
ε_s, ε_c, the source message is delivered to
all nodes accessible from the source node in
the communication graph with high probability
(whp), i.e., with the probability at least 1 − 1/n,
where n is the number of nodes in the network.


Key Results 


Complexity of the broadcasting problem signif- 
icantly differs in various models obtained by 
constraints imposed on the sensitivity and con- 
nectivity parameters. 


The Model with Strong Sensitivity and Strong Connectivity

In this setting reception of messages is
determined merely by Eq. (1), and the communication
graph does not contain links of distance close to
r due to ε_c > 0.


Theorem 1 ([3]) The broadcasting problem can
be solved in time O(D log n + log² n) with high
probability in the model with strong connectivity
and strong sensitivity for networks in Euclidean
two-dimensional space with known coordinates.


For the setting in which nodes do not know their
coordinates, Daum et al. [2] provided a
broadcasting algorithm which relies on a parameter
R_s equal to the maximum ratio among distances
between pairs of nodes connected by an edge
in the communication graph. That is, R_s =
max{dist(u, v)/dist(x, y) | (u, v), (x, y) ∈ E},
where G(V, E) is the communication graph of a
network.


Theorem 2 ([2]) The broadcasting problem can
be solved in time O(D log n log^(α+1) R_s) with
high probability in the model with strong
connectivity and strong sensitivity.


As R_s might be even exponential with respect to
n, the solution from Theorem 2 is inefficient for
some networks. The following theorem gives a
solution independent of geometric properties of
a network.


Theorem 3 ([6]) The broadcasting problem can
be solved in time O(D log² n) with high
probability in the model with strong sensitivity and
strong connectivity.


The Model with Strong Sensitivity and Weak Connectivity

In this setting reception of messages is
determined merely by Eq. (1), and an edge
(u, v) belongs to the communication graph iff
SINR(u, v, ∅) ≥ β.


Theorem 4 ([2]) There exist families of
networks with diameter 2 in the model with strong
sensitivity and weak connectivity in which each
broadcasting algorithm requires Ω(n) rounds to
accomplish broadcast.


Theorem 5 ([2]) The broadcasting problem can
be solved in time O(n log² n) with high
probability in the model with strong sensitivity and weak
connectivity.


The Model with Weak Sensitivity and Strong Connectivity

In this setting, transmissions on unreliable links
(i.e., on distance very close to r) are "filtered out."
Moreover, the communication graph connects
only nodes in distance at most (1 − ε_c)r, which is
strictly smaller than r.


Theorem 6 ([7]) The broadcasting problem can
be solved in time O(D + log² n) with high
probability in the model with weak sensitivity and
strong connectivity for networks deployed on the
Euclidean plane, provided 0 < ε_s < ε_c =
2/3. This solution works in the model with
spontaneous wake-up and requires a power control
mechanism.


When applied directly to the case of
non-spontaneous wake-up, the algorithm from [7]
gives time bound O(D log² n).


Corollary 1 The broadcasting problem can be
solved in time O(D log² n) with high probability
in the model with weak sensitivity and strong
connectivity for networks deployed on the Euclidean
plane, provided 0 < ε_s < ε_c = 2/3. This
solution works in the model with non-spontaneous
wake-up and requires a power control mechanism.


The Model with Weak Sensitivity and Weak Connectivity

In this setting, the maximal distances for a
successful transmission and for connecting nodes
by an edge are equal and both smaller than the
largest theoretically possible range r following
from Eq. (1).


Theorem 7 ([5]) There exist (i) an infinite
family of n-node networks requiring Ω(n log n)
rounds to accomplish broadcast whp and, (ii) for
every D, Δ = O(n), an infinite family of n-node
networks of diameter D and maximum degree of
the communication graph Δ requiring Ω(DΔ)
rounds to accomplish broadcast whp in the model
with weak sensitivity and weak connectivity (i.e.,
0 < ε_s = ε_c < 1) for networks on the plane.


Using appropriate combinatorial structures, 
deterministic broadcasting algorithms were 
obtained with complexities close to the above 
lower bounds, provided nodes know their 
coordinates. 


Theorem 8 ([5]) The broadcasting problem
can be solved deterministically in time
O(min{DΔ log² N, n log N}) in the model with
weak sensitivity and weak connectivity (i.e.,
0 < ε_s = ε_c < 1) for networks in Euclidean
two-dimensional space with known coordinates,
with IDs in the range [1, N].


The above result translates to a randomized
algorithm with complexity O(min{DΔ log² n,
n log n}), since nodes can choose unique IDs in
the polynomial range with high probability.

Recently, Chlebus et al. [1] provided a ran- 
domized algorithm for the setting without knowl- 
edge of coordinates. 


Theorem 9 ([1]) The broadcasting problem can
be solved in time O(n log² n) with high
probability in the model with weak sensitivity and weak
connectivity.

Distributed Randomized Broadcasting in Wireless
Networks under the SINR Model, Table 1 Complexity
of randomized broadcasting with non-spontaneous
wake-up for various sensitivity and connectivity
settings. The result from [7] marked with * requires
power control mechanism and ε_c = 2/3. The positive
results requiring knowledge of coordinates apply only
to the Euclidean space

Sensitivity                 | Coordinates | Strong connectivity: ε_c > ε_s         | Weak connectivity: ε_c = ε_s
Strong sensitivity: ε_s = 0 | Known       | O(D log n + log² n)                    | Ω(n)
                            | Unknown     | O(D log² n), O(D log n log^(α+1) R_s)  | O(n log² n)
Weak sensitivity: ε_s > 0   | Known       |                                        | Ω(min{DΔ, n}), O(min{DΔ log² n, n log n})
                            | Unknown     | O(D log² n) *                          | O(n log² n)


Applications


Using similar techniques to those in [5], an ef- 
ficient deterministic broadcasting algorithm was 
obtained in the model with strong connectivity 
and strong sensitivity. 


Theorem 10 ([4]) The broadcasting problem
can be solved deterministically in time
O(D log² N) in the model with strong connectivity and
strong sensitivity for networks in Euclidean
two-dimensional space with known coordinates, with
IDs in the range [1, N].


The solution in [7] applies to a more general 
problem of multi-broadcast and was further gen- 
eralized in [8] to the setting in which nodes wake 
up in various time steps. 

Positive results from [1, 2,6] work also in a 
more general setting when nodes are deployed in 
a bounded-growth metric space. 


Open Problems 


In all considered models, there is a gap of at
least log n between the established lower and upper
bounds for the complexity of broadcasting.
A natural open problem is to tighten these 
bounds. 

As seen in Table 1, it is not known whether the 
complexity of broadcasting depends on sensitiv- 
ity for each connectivity setting. 

Another interesting research direction is to
explore the impact of additional features such as
power control or carrier sensing on the
complexity of the problem.


Preliminary Remark


The presentation of this entry of the Encyclopedia
follows Chapter 6 of [15], from which it borrows
the presentation style and all figures. The reader
interested in this topic will find further
developments in [15].


The Notion of a Global State 


Modeling the Execution of a Process: The Event Point of View

A distributed computation involves n asynchronous
sequential processes $p_1, \ldots, p_n$,
communicating by directed channels (hence, a
bidirectional channel can be represented by two
directed channels). Channels can be FIFO (first
in first out) or non-FIFO.

A distributed computation can be modeled by
a (reflexive) partial order on the events produced
by the processes. An event corresponds to the
sending of a message, the reception of a message,
or a nonempty sequence of operations which does
not involve the sending or the reception of a
message. This partial order, due to Lamport and
called happened before relation [12], is defined as
follows. Let $e_1$ and $e_2$ be two events;
$e_1 \xrightarrow{ev} e_2$ is the smallest order
relation such that:

• Process order: $e_1$ and $e_2$ are the same event or
have been produced by the same process, and
$e_1$ was produced before $e_2$.

• Message order: $e_1$ is the sending of a message
m and $e_2$ is its reception.

• Transitive closure: There is an event e such
that $e_1 \xrightarrow{ev} e$ and $e \xrightarrow{ev} e_2$.
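
The three rules can be turned into a small reachability computation. The sketch below assumes events are encoded as pairs (process index, position of the event in that process) and messages as pairs (send event, receive event); the function name `happened_before` and the two-process example are hypothetical and do not reproduce Fig. 1.

```python
def happened_before(num_events, messages):
    """Map each event e to the set of events e' such that e happened before e'.

    num_events[i] is the number of events of process i; messages is a list
    of (send_event, receive_event) pairs. The relation is reflexive.
    """
    events = [(i, x) for i, k in enumerate(num_events) for x in range(1, k + 1)]
    succ = {e: set() for e in events}
    for i, x in events:
        if x < num_events[i]:
            succ[(i, x)].add((i, x + 1))   # process order
    for send, recv in messages:
        succ[send].add(recv)               # message order

    def reachable(start):
        seen, stack = {start}, [start]     # start is related to itself
        while stack:                       # transitive closure by DFS
            current = stack.pop()
            for nxt in succ[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {e: reachable(e) for e in events}


# Two processes with three events each; the first event of process 0
# sends a message that is received as the second event of process 1.
hb = happened_before([3, 3], [((0, 1), (1, 2))])
```

Two events that are not related in either direction, such as the first events of the two processes here, are concurrent.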


Modeling the Execution of a Process: The Local State Point of View

Let us consider a process $p_i$, which starts in
the initial state $\sigma_i^0$. Let $e_i^x$ be its xth event.
The transition function $\delta_i()$ associated with $p_i$ is
consequently such that
$\sigma_i^x = \delta_i(\sigma_i^{x-1}, e_i^x)$, where
$x \geq 1$.

While a distributed computation can be modeled
by the partial order $\xrightarrow{ev}$ ("action" point of
view), it follows from the previous definition that
it can also be modeled by a partial order on its
local states ("state" point of view). This partial
order, denoted $\xrightarrow{\sigma}$, is defined as follows:
$\sigma_i^x \xrightarrow{\sigma} \sigma_j^y \;\stackrel{def}{=}\; e_i^{x+1} \xrightarrow{ev} e_j^y$.

A two-process distributed execution is described
in Fig. 1. The relation $\xrightarrow{ev}$ on events and
the relation $\xrightarrow{\sigma}$ on local states can be easily

extracted from it. As an example, we have el a 


oO 
e3 and Ge — re Two local states o and o’ 


which are not related by $\xrightarrow{\sigma}$ are independent.
This is denoted $\sigma \| \sigma'$.


Orphan and In-Transit Messages 

Let us consider an ordered pair of local states
$(\sigma_i, \sigma_j)$ from different processes $p_i$ and $p_j$ and
a message m sent by $p_i$ to $p_j$.


Distributed Snapshots, Fig. 1 A two-process
distributed execution

• If m is sent by $p_i$ after $\sigma_i$ and received by $p_j$
before $\sigma_j$, this message is orphan with respect
to $(\sigma_i, \sigma_j)$. This means that m is received
and not sent with respect to the ordered pair
$(\sigma_i, \sigma_j)$.

• If m is sent by $p_i$ before $\sigma_i$ and received by $p_j$
after $\sigma_j$, this message is in-transit with respect
to $(\sigma_i, \sigma_j)$. This means that m is sent and not
yet received with respect to the ordered pair
$(\sigma_i, \sigma_j)$.
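
The two definitions can be expressed as a small classification function. The sketch assumes that the local state of a process is identified by the number of events the process has executed so far, so that a message sent as the s-th event of the sender is "sent before" a sender state x iff s ≤ x, and similarly for the receiver. The function name `classify` and this integer encoding are illustrative assumptions.

```python
def classify(send_index, recv_index, x, y):
    """Classify a message with respect to the ordered pair of states (x, y).

    send_index: position of the send event in the sender's execution.
    recv_index: position of the receive event in the receiver's execution.
    x, y: number of events executed by sender and receiver, respectively.
    """
    sent = send_index <= x        # sent before the sender's state
    received = recv_index <= y    # received before the receiver's state
    if received and not sent:
        return "orphan"           # received and not sent
    if sent and not received:
        return "in-transit"       # sent and not yet received
    return "neither"


orphan_case = classify(send_index=2, recv_index=1, x=1, y=1)
transit_case = classify(send_index=1, recv_index=3, x=1, y=2)
delivered_case = classify(send_index=1, recv_index=2, x=2, y=2)
```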


As an example, the message m, is orphan 
with respect to the ordered pair (o9,0/'), while 
the message mz is in-transit with respect to the 


ordered pair (07,04). 


Consistent Global State 

A global state is a vector of n local states (one
per process), plus a set of channel states (one
per directed channel). The state of a channel is
a sequence of messages if the channel is FIFO or
a set of messages if the channel is non-FIFO.

A consistent global state (also called snapshot)
is a global state in which the computation
has passed or could have passed. More formally,
a global state is a pair $(\Sigma, M)$ where the vector
of local states $\Sigma = [\sigma_1, \ldots, \sigma_n]$ and the set of
channel states $M = \bigcup_{(i,j)} cs(i, j)$ are such that
for any directed pair (i, j) we have:


• $\sigma_i \| \sigma_j$. (This means that there is no orphan
message with respect to the ordered pair
$(\sigma_i, \sigma_j)$.)

• cs(i, j) contains all the messages which are in
transit with respect to the ordered pair $(\sigma_i, \sigma_j)$
and only them. (This means that cs(i, j) contains
all the messages (and only them) sent by
$p_i$ before $\sigma_i$ and not yet received by $p_j$ when
it enters $\sigma_j$.)


As an example, when 
([o?,03],{c9(1,2) = mz, cs(2,1) = 9}) 
is a consistent global state, while both 
(lot.o$|,4és(1,2) = .:., €9(2,1) = <..}) and 
([o?, oF], {es(1, 2) = 8, cs(2, 1) = O}) are not 
consistent (the first because, as the message ™ is 
orphan with respect to the ordered pair (o}, 09), 


looking at Fig. 1, 
we do not have o}'||o; the second because the
message , does not belong to the channel state 
cs(2, 1).). 


The Lattice of Global States 

Let us consider a vector of local states $\Sigma =
[\sigma_1, \ldots, \sigma_n]$ belonging to a consistent global
state. The consistent global state whose vector is $\Sigma' =
[\sigma'_1, \ldots, \sigma'_n]$ is directly reachable from $\Sigma$ if there
is a process $p_i$ whose next event is $e_i$, and we have
$\forall j \ne i : \sigma'_j = \sigma_j$ and $\sigma'_i = \delta_i(\sigma_i, e_i)$. This is
denoted $\Sigma \xrightarrow{GS} \Sigma'$. By definition, $\Sigma \xrightarrow{GS} \Sigma$. More
generally, $\Sigma \xrightarrow{GS} \Sigma_a \xrightarrow{GS} \Sigma_b \cdots \Sigma_y \xrightarrow{GS} \Sigma_z$ is
denoted $\Sigma \xrightarrow{GS*} \Sigma_z$.

It can be shown that the set of all the vectors
$\Sigma$ associated with the consistent global states
produced by a distributed computation is a lattice [3,
16]. The lattice obtained from the computation
of Fig. 1 is described in Fig. 2. In this lattice, the
notation $\Sigma = [a, b]$ means $\Sigma = [\sigma_1^a, \sigma_2^b]$.
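
To make the lattice concrete, the following Python sketch enumerates the vectors of local states that contain no orphan message and tests direct reachability between two of them. It is a brute-force illustration, not an algorithm from the entry. The encoding (a local state is the number of events executed, a message is a tuple `(sender, send_rank, receiver, recv_rank)`) and the small execution used in the example are assumptions; the execution is hypothetical and is not the one of Fig. 1.

```python
from itertools import product


def consistent_vectors(num_events, messages):
    """Return all vectors [x_1, ..., x_n] without orphan messages.

    num_events[i] is the number of events of process i; a message
    (i, s, j, r) is sent by the s-th event of process i and received by
    the r-th event of process j.
    """
    vectors = []
    for vec in product(*(range(k + 1) for k in num_events)):
        orphan = any(s > vec[i] and r <= vec[j] for (i, s, j, r) in messages)
        if not orphan:
            vectors.append(vec)
    return vectors


def directly_reachable(a, b):
    """True if b is obtained from a by one more event of exactly one process."""
    diff = sorted(y - x for x, y in zip(a, b))
    return diff == [0] * (len(a) - 1) + [1]


# Hypothetical execution: process 0 has 2 events, process 1 has 3 events,
# and the 1st event of process 0 sends a message received by the 2nd event
# of process 1.
lattice = consistent_vectors((2, 3), [(0, 1, 1, 2)])
```

In this example, the vectors (0, 2) and (0, 3) are excluded because the message would be received and not sent.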


Problem Definition 


Specification of the Computation of a 
Consistent Global State 

The problem of determining on-the-fly a consistent
global state (in short CGS) was introduced,
precisely defined, and solved by Chandy and
Lamport in 1985 [2]. It can be defined by the
following properties. The first is a liveness property,
while the last two are safety properties.


• Termination. If one or more processes launch
the computation of a consistent global state,
then this global state computation terminates.

• Consistency. If a global state is computed,
then it is consistent.

• Validity. Let $\Sigma_{start}$ be the global state of the
computation when CGS starts, $\Sigma_{end}$ be its
global state when CGS terminates, and $\Sigma$ be
the global state returned by CGS. We have
$\Sigma_{start} \xrightarrow{GS*} \Sigma$ and $\Sigma \xrightarrow{GS*} \Sigma_{end}$.

The validity property states that the global
state which is returned depends on the time at
which its computation is launched. Without this
property, always returning the initial global state
would be a correct solution.

Distributed Snapshots, Fig. 2 The lattice associated with the computation of Fig. 1 ($\Sigma_{init} = [0, 0]$, $\Sigma_{final} = [2, 3]$)


Principles of CGS Algorithms 

To compute a consistent global state, each process
$p_i$ is in charge of (a) recording a copy
of its local state $\sigma_i$ (sometimes called its local
snapshot) and (b) recording the states of its input (or
output) channels. For the computed global
state to satisfy the safety properties, all CGS
algorithms have, in one way or another, two things
to do.

• Synchronization. In order to ensure that there
are no orphan messages with respect to each
ordered pair of local states $(\sigma_i, \sigma_j)$, such that
there is a directed channel from $p_i$ to $p_j$,
the processes must synchronize the recording
of their local states, which will define the
consistent global state that is computed.

• Message recording. Each process has to
record all the messages it receives (or the
messages it sends) which are in transit with
respect to the computed global state.


Key Result 1: Chandy-Lamport’s 
Algorithm 


Chandy and Lamport’s algorithm is denoted
CL85 in the following.


Assumption 

CL85 considers a failure-free asynchronous system.
Asynchronous means that each process proceeds
at its own speed, which can vary arbitrarily
with time and always remains unknown to the
other processes. Message transfer delays are also
arbitrary (but finite).

CL85 assumes that the processes are
connected by a directed communication graph,
which is strongly connected (there is a directed
path from any process to any other process).
Consequently, each process has a nonempty set
of input channels and a nonempty set of output
channels. Moreover, each directed channel is a
FIFO channel.


The Algorithm in Two Rules 

CL85 requires that each process computes the
state of its input channels. At a high abstraction
level, it can be formulated with two rules.

• “Local state recording” rule. When a process
$p_i$ records its local state $\sigma_i$, it sends a special
control message (called marker) on each of its
outgoing channels.

It is important to notice that, as channels are
FIFO, a marker partitions the messages sent
on a channel into two sets: the messages sent
before the marker and the messages sent after
the marker.

• “Input channel state recording” rule. When a
process $p_i$ receives a marker on one of its
input channels $c(j, i)$, there are two cases.

- If not yet done, it records its local state (i.e.,
it applies the first rule) and defines $cs(j, i)$
(the state of the input channel $c(j, i)$) as
the empty sequence.

- If it has already recorded its local state (i.e.,
executed the first rule), $p_i$ defines $cs(j, i)$
as the sequence of application messages
received on $c(j, i)$ between the recording
of its local state and the reception of the
marker on this input channel.
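
The two rules can be exercised on a toy example. The Python sketch below is a minimal single-threaded simulation written for illustration, not the authors' presentation of CL85. It assumes a complete directed graph (which is strongly connected), FIFO channels modeled as queues, and an application in which processes transfer integer amounts to each other; the class name `CL85Simulation` and all method names are hypothetical. In the code, `cs[i][j]` stores the recorded state of the input channel c(j, i) of process i.

```python
from collections import deque

MARKER = object()  # special control message


class CL85Simulation:
    """Toy simulation of the two CL85 rules on a complete directed graph."""

    def __init__(self, balances):
        self.n = len(balances)
        self.balance = list(balances)              # application state of each process
        self.chan = {(i, j): deque()               # FIFO channel c(i, j)
                     for i in range(self.n) for j in range(self.n) if i != j}
        self.recorded = [None] * self.n            # recorded local states
        self.open = [{} for _ in range(self.n)]    # open[i][j]: messages seen so far on c(j, i)
        self.cs = [{} for _ in range(self.n)]      # cs[i][j]: final recorded state of c(j, i)

    def send(self, i, j, amount):
        """Application event: process i transfers `amount` to process j."""
        self.balance[i] -= amount
        self.chan[(i, j)].append(amount)

    def record(self, i):
        """Rule 1: record the local state, then send a marker on each output channel."""
        self.recorded[i] = self.balance[i]
        self.open[i] = {j: [] for j in range(self.n) if j != i}
        for j in range(self.n):
            if j != i:
                self.chan[(i, j)].append(MARKER)

    def deliver(self, j, i):
        """Process i receives the message at the head of c(j, i)."""
        msg = self.chan[(j, i)].popleft()
        if msg is MARKER:
            if self.recorded[i] is None:
                self.record(i)                     # rule 2, first case
            self.cs[i][j] = self.open[i].pop(j)    # empty sequence in the first case
        else:
            self.balance[i] += msg
            if j in self.open[i]:                  # recording in progress on c(j, i)
                self.open[i][j].append(msg)

    def snapshot_total(self):
        """Sum of recorded local states plus recorded in-transit amounts."""
        in_transit = sum(sum(msgs) for per_proc in self.cs for msgs in per_proc.values())
        return sum(self.recorded) + in_transit


system = CL85Simulation([10, 20])
system.send(0, 1, 3)
system.send(1, 0, 5)
system.record(0)        # process 0 launches the computation
system.deliver(1, 0)    # 5 reaches process 0 after it recorded: part of cs(1, 0)
system.deliver(0, 1)    # 3 reaches process 1 before it recorded
system.deliver(0, 1)    # marker: process 1 records, cs(0, 1) is empty
system.deliver(1, 0)    # marker: process 0 closes cs(1, 0)
```

In this run the recorded local states plus the recorded channel contents add up to the initial total of 30, which is what one expects from a consistent global state of a money-transfer application.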


Properties of the Computed Global State 

If one or more processes execute the first rule,
it follows from this rule, and from the fact that the
communication graph is strongly connected, that all
the processes will execute this rule. Hence, a
marker is sent on each directed channel, and each
process records the state of its input channels.
This proves the liveness property.

The consistency property follows from the
following simple observation. Let us color processes
and messages as follows. A process is initially
green and becomes red when it sends a marker on
each of its output channels. Moreover, a message
has the color of its sender at the time the message
is sent.

It is easy to see that the previous two rules
guarantee that a green process turns red before
receiving a red message (hence, there are no
orphan messages). Moreover, the green messages
received on a channel $c(j, i)$ by a red process $p_i$
are the messages that are in transit with respect to
the ordered pair $(\sigma_j, \sigma_i)$. Hence, all the in-transit
messages, and only them, are recorded.


The Inherent Uncertainty on the 
Computed Global State 
The proof of the validity property is a little
bit more involved. The interested reader can
consult [2, 15]. When looking at the lattice of
Fig. 2, let us consider that the CL85 algorithm
is launched when the observed computation is
in the global state $\Sigma_{start} = [0, 1]$ and terminates
when it is in the global state $\Sigma_{end} = [2, 2]$. Together,
the consistency and validity properties state that the global state $\Sigma$
which is returned is one of the following global
states: [0, 1], [1, 1], [0, 2], [2, 1], [1, 2], or [2, 2].

This uncertainty on the computed global state
is intrinsic to the nature of distributed computing.
(Eliminating it would require freezing the execution
of the application we are observing, which in
some sense forces it to execute sequentially.)

The main property of the consistent global
state $\Sigma$ that is computed is that the application has
passed through it or could have passed through
it. While an external omniscient observer can
know whether the application passed through
$\Sigma$ or not, no process can know it. This noteworthy
feature characterizes the relativistic nature of the
observation of distributed computations.


Message-Passing Snapshot Versus Shared 
Memory Snapshot 

The notion of a shared memory snapshot has
been introduced in [1]. A snapshot object is an
object that consists of an array of atomic
multi-reader/single-writer registers, one per process
(a process can read any register but can
write only the register it is associated with). A
snapshot object provides the processes with two
operations denoted update() and snapshot().
The update() operation allows the invoking process
to store a new value in its register, while
the snapshot() operation allows it to obtain the
values of all the atomic read/write registers as
if that operation was executed instantaneously.
More precisely, the invocations of the operations
update() and snapshot() are linearizable [8].

Differently from the snapshot values returned
in a message-passing system (whose global
structure is a lattice), the arrays of values
returned by the snapshot() operations of a
shared memory system can be totally ordered.
This is a fundamental difference, which is related
to the communication medium. In one case,
the underlying shared memory is a centralized
component, while in the other case, the
underlying message-passing system is inherently
distributed, making it impossible to totally order all
the message-passing snapshots.
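
The interface of a snapshot object, and the fact that its snapshots are totally ordered, can be illustrated with a few lines of Python. This sketch is an assumption-laden stand-in: it obtains linearizability trivially with a single lock, whereas the constructions studied in [1] build the object from read/write registers without locks. The class name `SnapshotObject`, the per-register version counters, and the helper `precedes` are hypothetical and only serve to make the total order visible.

```python
import threading


class SnapshotObject:
    """Lock-based stand-in for a snapshot object with n registers."""

    def __init__(self, n, initial=None):
        self._values = [initial] * n
        self._versions = [0] * n          # number of updates applied to each register
        self._lock = threading.Lock()

    def update(self, i, value):
        """Process i stores a new value in its own register."""
        with self._lock:
            self._values[i] = value
            self._versions[i] += 1

    def snapshot(self):
        """Return the values of all registers as of a single instant."""
        with self._lock:
            return list(self._values), tuple(self._versions)


def precedes(versions_a, versions_b):
    """True if snapshot a is componentwise not more recent than snapshot b."""
    return all(a <= b for a, b in zip(versions_a, versions_b))


obj = SnapshotObject(3, initial=0)
_, v1 = obj.snapshot()
obj.update(1, 5)
values, v2 = obj.snapshot()
```

Because every operation takes effect at a single instant, any two version vectors returned by snapshot() are comparable with `precedes`, in one direction or the other; this is the total order mentioned above.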


Other Assumptions and Algorithms 
Algorithms that compute consistent global states 
in systems equipped with non-FIFO channels 
have been designed. Such algorithms are de- 
scribed in [11, 13, 15]. 

A communication-induced (CI) algorithm is a 
distributed algorithm that does not use additional 
control messages (such as markers). In these 
algorithms, control information (if needed) has to 
be carried by application messages. CI algorithms 
that compute consistent global states have been 
investigated in [6]. 

Global states computation in large-scale dis- 
tributed systems is addressed in [9]. 


Key Result 2: A Necessary and 
Sufficient Condition 


The Issue 

An important question is the following one:
Given a set of $x$, $1 \le x \le n$, local states from
different processes, do these local states belong to
a consistent global state?

If $x = n$, the answer is easy: any ordered
pair of local states $(\sigma_i, \sigma_j)$ has to be such that
$\sigma_i \parallel \sigma_j$ (none of them causally depends on the
other). Hence, the question is interesting when
$1 \le x < n$. This problem was addressed and
solved by Netzer and Xu [14] and generalized
in [7].

As a simple example, let us consider the exe- 
cution in Fig.3, where there are three processes 
$p_i$, $p_j$, and $p_k$ that have recorded the local states
of and o®, where $x \in \{i, j, k\}$. These local states
produced by the computation are the only local 
states which have been recorded. Said another 
way, these recorded local states can be seen as 
local checkpoints. 

An instance of the previous question is the 
following one: can the set {o7", oP } be extended 
by the addition of a recorded local state oj of p; 
(Le.,o; = oF oro; = a?) such that the resulting 
global state & = [of,0;, er ] is a consistent? 

It is easy to see that, despite the fact that the 
local states of and oP are independent (a |lo? ), 
neither [o7', 07, op ] nor [of, of ; oF ] is consistent. 
More precisely, [o7, 07, ay ] is not consistent be- 
cause, due the message mx, the local states of 
and op are not independent, while [o7, of ; oF ] is 
not consistent because, due the message mj, ;, o/’ 
and of are not independent. 


The Result 

The notion of a zigzag path has been introduced 
by Netzer and Xu in [14]. An example of a 
simple zigzag path is the sequence of messages
$\langle m_{i,j}, m_{j,k} \rangle$ of Fig. 3, where we see that the
local states o7 and oP are related by this zigzag 
path. A zigzag path captures hidden dependencies 
linking recorded local states. These dependencies 
are hidden in the sense that not all of them can
be captured by the relation $\xrightarrow{\sigma}$ defined on local
states (as shown in the figure).

The main result due to Netzer and Xu [14] is
the following: a set of $x$ local states, with $1 \le x \le n$
and at most one local state per process, can be
extended to a consistent global state if and only if
no two of them are related by a zigzag path. This
result has been extended in [7].
Applications: Global Snapshots in
Action 


Distributed snapshots are a key element to un- 
derstand and master the uncertainty created by 
asynchrony. From a practical point of view, they 
are important in distributed checkpointing, in the 
detection of stable properties defined on the set of 
global states, and in the debugging of distributed 
programs. 


Detection of Stable Properties 
A stable property is a property that, once true,
remains true forever. In the distributed context,
examples of distributed stable properties are
deadlock (once deadlocked, an application
remains forever deadlocked), termination (once
terminated, an application remains forever
terminated) [4, 5], object inaccessibility, etc.
Algorithms that compute consistent global
states satisfying the liveness, consistency, and
validity properties previously stated can be used
to detect stable properties. This follows from the
observation that if the computed global state $\Sigma$
satisfies a stable property $P$, then the global state
$\Sigma_{end}$ also satisfies $P$.


Checkpointing 

A checkpoint is a global state from which a com- 
putation can be resumed. Trivially, checkpoint- 
ing and consistent global states computation are 
problems which are very close [7]. The interested 
reader can consult [10, 15] for more details. 
Problem Definition


The vertex coloring problem takes as input an
undirected graph $G := (V, E)$ and computes
a vertex coloring, i.e., a function $c : V \to [k]$
for some positive integer $k$ such that adjacent
vertices are assigned different colors (that is,
$c(u) \ne c(v)$ for all $(u, v) \in E$). In the $(\Delta + 1)$
vertex coloring problem, $k$ is set equal to $\Delta + 1$
where $\Delta$ is the maximum degree of the input
graph $G$. In general, $\Delta + 1$ colors could be
necessary, as the example of a clique shows.
However, if the graph satisfies certain properties,
it may be possible to find colorings with far fewer
colors. Finding the minimum number of colors
possible is a computationally hard problem:
the corresponding decision problems are
NP-complete [5]. In Brooks-Vizing colorings, the
goal is to try to find colorings that are near
optimal.

In this paper, the model of computation used is
the synchronous, message passing framework as
used in standard distributed computing [11]. The
goal is then to describe very simple algorithms
that can be implemented easily in this distributed
model and that are simultaneously efficient, as
measured by the number of rounds required, and of
good performance quality, as measured by the
number of colors used. For efficiency, the number
of rounds is required to be poly-logarithmic in
$n$, the number of vertices in the graph, and for
performance quality, the number of colors used
should be near-optimal.


Key Results 


Key theoretical results related to distributed
$(\Delta + 1)$-vertex coloring are due to Luby [9]
and Johansson [7]. Both show how to compute
a $(\Delta + 1)$-coloring in $O(\log n)$ rounds with
high probability. For Brooks-Vizing colorings,
Kim [8] showed that if the graph is square or
triangle free, then it is possible to color it with
$O(\Delta / \log \Delta)$ colors. If, moreover, the graph is
regular of sufficiently high degree ($\Delta > \log n$),
then Grable and Panconesi [6] show how to
color it with $O(\Delta / \log \Delta)$ colors in $O(\log n)$
rounds. See [10] for a comprehensive discussion
of probabilistic techniques to achieve such
colorings.

The present paper makes a comprehensive ex- 
perimental analysis of distributed vertex coloring 
algorithms of the kind analyzed in these papers on 
various classes of graphs. The results are reported 
in section “Experimental Results” below and the 
data sets used are described in section “Data 
Sets.” 


Applications 


Vertex coloring is a basic primitive in many 
applications: classical applications are schedul- 
ing problems involving a number of pairwise 
restrictions on which jobs can be done simulta- 
neously. For instance, in attempting to schedule 
classes at a university, two courses taught by the 
same faculty member cannot be scheduled for 
the same time slot. Similarly, two courses that
are required by the same group of students also 
should not conflict. The problem of determining 
the minimum number of time slots needed sub- 
ject to these restrictions can be cast as a vertex 
coloring problem. One very active application 
for vertex coloring is register allocation. The 
register allocation problem is to assign variables 
to a limited number of hardware registers during 
program execution. Variables in registers can be 
accessed much quicker than those not in registers. 
Typically, however, there are far more variables 
than registers so it is necessary to assign multiple 
variables to registers. Variables conflict with each 
other if one is used both before and after the other 
within a short period of time (for instance, within 
a subroutine). The goal is to assign variables 
that do not conflict so as to minimize the use of 
non-register memory. A simple approach to this 
is to create a graph where the nodes represent 
variables and an edge represents a conflict between
its nodes. A coloring is then a conflict-free
assignment. If the number of colors used is less
than the number of registers then a conflict-free 
register assignment is possible. Modern appli- 
cations include assigning frequencies to mobile 
radios and other users of the electro-magnetic 
spectrum. In the simplest case, two customers 
that are sufficiently close must be assigned dif- 
ferent frequencies, while those that are distant 
can share frequencies. The problem of minimiz- 
ing the number of frequencies is then a vertex 
coloring problem. For more applications and ref- 
erences, see Michael Trick’s coloring page [12]. 


Open Problems 


The experimental analysis shows convincingly 
and rather surprisingly that the simplest, trivial, 
version of the algorithm actually performs best 
uniformly! In particular, it significantly outper- 
forms the algorithms which have been analyzed 
rigorously. The authors give some heuristic recur- 
rences that describe the performance of the trivial 
algorithm. It is a challenging and interesting open 
problem to give a rigorous justification of these 
recurrences. Alternatively, and less appealing, 
a rigorous argument that shows that the trivial 
algorithm dominates the ones analyzed by Luby 
and Johansson is called for. Other issues about
how the local structure of the graph impacts the
performance of such algorithms (which is hinted
at in the paper) are worth subjecting to further
experimental and theoretical analysis.


Experimental Results 


All the algorithms analyzed start by assigning an
initial palette of colors to each vertex, and then
repeat the following simple iteration round:


1. Wake up!: Each vertex independently of the 
others wakes up with a certain probability to 
participate in the coloring in this round. 

2. Try!: Each vertex independently of the others, 
selects a tentative color from its palette of 
colors at this round. 
3. Resolve conflicts!: If no neighbor of a vertex
selects the same tentative color, then this color 
becomes final. Such a vertex exits the algo- 
rithm, and the remaining vertices update their 
palettes accordingly. If there is a conflict, then 
it is resolved in one of two ways: Either all 
conflicting vertices are deemed unsuccessful 
and proceed to the next round, or an inde- 
pendent set is computed, using the so-called 
Hungarian heuristic, amongst all the vertices 
that chose the same color. The vertices in the 
independent set receive their final colors and 
exit. The Hungarian heuristic for independent 
set is to consider the vertices in random or- 
der, deleting all neighbors of an encountered 
vertex which itself is added to the independent 
set, see [1, p. 91] for a cute analysis of this 
heuristic to prove Turan’s Theorem. 

4. Feed the Hungry!: If a vertex runs out of colors 
in its palette, then fresh new colors are given 
to it. 


Several parameters can be varied in this basic 
scheme: the wake up probability, the conflict 
resolution and the size of the initial palette are 
the most important ones. 
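
The iteration round can be simulated sequentially. The following Python sketch is an illustration under stated assumptions, not the code used in the experiments: it fixes the wake-up probability to 1, uses the local palette {1, ..., d(v) + 1}, and treats all conflicting vertices as unsuccessful (no Hungarian heuristic). With this palette a vertex never runs out of colors, so the "Feed the Hungry!" step is not needed. The function name `simple_coloring`, the `max_rounds` guard, and the example graph are hypothetical.

```python
import random


def simple_coloring(adj, seed=0, max_rounds=10_000):
    """Simulate the simplest variant of the round-based coloring scheme.

    adj maps each vertex to the collection of its neighbors (symmetric).
    Returns the final colors and the number of rounds used.
    """
    rng = random.Random(seed)
    # Local setting: the initial palette of v is {1, ..., d(v) + 1}.
    palette = {v: set(range(1, len(adj[v]) + 2)) for v in adj}
    color = {}
    rounds = 0
    while len(color) < len(adj) and rounds < max_rounds:
        rounds += 1
        # Try!: every uncolored vertex picks a tentative color.
        tentative = {v: rng.choice(sorted(palette[v]))
                     for v in adj if v not in color}
        # Resolve conflicts!: keep the color if no neighbor picked the same one.
        winners = [v for v, c in tentative.items()
                   if all(tentative.get(u) != c for u in adj[v])]
        for v in winners:
            color[v] = tentative[v]
        # The remaining vertices update their palettes.
        for v in winners:
            for u in adj[v]:
                if u not in color:
                    palette[u].discard(color[v])
    return color, rounds


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
coloring, rounds_used = simple_coloring(graph)
```

The returned coloring uses at most d(v) + 1 as the color of each vertex v, hence at most $\Delta + 1$ colors overall.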

In $(\Delta + 1)$-coloring, the initial palette for
a vertex $v$ is set to $[\Delta + 1] := \{1, \ldots, \Delta + 1\}$ (global
setting) or $[d(v) + 1]$ (where $d(v)$ is the degree of
vertex $v$) (local setting). The experimental results
indicate that (a) the best wake-up probability 
is 1, (b) the local palette version is as good as 
the global one in running time, but can achieve 
significant color savings and (c) the Hungarian 
heuristic can be used with vertex identities rather 
than random numbers giving good results. 

In the Brooks—Vizing colorings, the initial 
palette is set to [d(v)/s] where s is a shrink- 
ing factor. The experimental results indicate that 
uniformly, the best algorithm is the one where 
the wake-up probability is 1, and conflicts are 
resolved by the Hungarian heuristic. This is both 
with respect to the running time, as well as the 
number of colors used. Realistically useful values
of $s$ are between 4 and 6, resulting in $\Delta/s$-colorings.
The running time performance is excellent,
with even graphs with a thousand vertices
colored within 20-30 rounds. When compared to
the best sequential algorithms, these algorithms
use between two and three times as many colors, but
are much faster.


Data Sets 


Test data was generated synthetically using
various random graph models; benchmark
real-life test sets from the second DIMACS
implementation challenge [3] and Joe Culberson’s
web-site [2] were also used.
Problem Definition


Indexing data so that it can be easily searched is 
one of the most fundamental problems in com- 
puter science. Especially in the fields of databases 
and information retrieval, indexing is at the heart 
of query processing. One of the most popular 
indexes, used by all search engines, is the inverted 
index. However, in many cases like bioinformat- 
ics, eastern language texts, and phrase queries 
for Web, one may not be able to assume word 
demarcations. In such cases, these documents are 
to be seen as a string of characters. Thus, more so- 
phisticated solutions are required for these string 
documents. 

Formally, we are given a collection of $D$ documents
$\mathcal{D} = \{d_1, d_2, d_3, \ldots, d_D\}$. Each document
$d_i$ is a string drawn from the character set $\Sigma$ of
size $\sigma$ and the total number of characters across
all the documents is $n$. Our task is to preprocess
this collection and build a data structure so that
queries can be answered as quickly as possible.
The query consists of a pattern string $P$, of length
$p$, drawn from $\Sigma$. As the answer to the query,
we are supposed to output all the documents $d_i$ in
which this pattern $P$ occurs as a substring. This
is called the document listing problem. In a more
advanced top-k version, the query consists of
a tuple $(P, k)$ where $k$ is an integer. Now, we
are supposed to output only the $k$ most relevant
documents. This is called the top-k document
retrieval problem.

The notion of relevance is captured by a score 
function. The function score(P,d) denotes the 
score of the document d with respect to the 
pattern P. It can be the number of times P occurs 
in d, known as term frequency, or the distance 
between two closest occurrences of P in d, or 
any other function. Here, we will assume that 
score(P,d) is solely dependent on the set of 
occurrences of P in d and is known at the time 
of construction of the data structure. 


Key Results 


The first formal study of this problem was initi- 
ated by Muthukrishnan [4]. He took the approach 
of augmenting the generalized suffix tree with ad- 
ditional information. All subsequent works have 
used generalized suffix trees as their starting 
point. A generalized suffix tree GST is a compact 
trie of all the lexicographically sorted suffixes 
of all the documents. Thus, n total suffixes are 
stored and there are n leaves in this trie. Each 
edge in GST is labeled with a string and each 
root to leaf path (labels concatenated) represents 
some suffix of some document. The overall num- 
ber of nodes in GST is O(n). With each leaf, 
we associate a document id, which indicates the 
document to which that particular suffix belongs. 

When the pattern P comes as a query, we 
first traverse from the root downward and find a 
vertex, which is known as locus(P). This can be 
done in O(p) time. This is the first vertex below 
the edge where P finished in the root to leaf 
traversal. If v is locus(P) then all the leaves in the 
subtree of v represent the suffixes whose prefix 
is P. For any vertex v, let path(v) be the string 
obtained by concatenating all the labels from the 
root until v. 


Document Listing Problem 
Let us first see how the document listing problem
is solved. One easy solution is to reach the locus
$v$ of $P$ and then visit all the leaves in the subtree
of $v$. But this is costly, as the number of leaves
$occ$ may be much larger than the number of unique
document labels $ndoc$ among these leaves.
Optimally, we want to achieve $O(p + ndoc)$ time.

To overcome this issue, Muthukrishnan first
proposed to use a document array $D_A$. To
construct this, he traverses all the leaves in GST
from left to right and takes the corresponding
document id. Thus, $D_A[i]$ = document id of the $i$th
lexicographically smallest suffix in GST. It is
easy to find boundary points $sp$ and $ep$ such that
the subtree of locus $v$ corresponds to the entries
$D_A[sp, \ldots, ep]$. To uniquely find documents in
$D_A[sp, \ldots, ep]$, we must not traverse the entire
subarray, as this would cost us $O(occ)$. To avoid
this, we construct another array $C$ called a chain
array: $C[i] = j$, where $j < i$ is the largest
index for which $D_A[i] = D_A[j]$. If no such $j$
exists, then $C[i] = -1$. Thus, every document
entry $D_A[i]$ links to the previous entry with the
same document id. Now, to solve the document
listing problem, one needs to get all the $i$’s
such that $sp \le i \le ep$ and $C[i] < sp$. The
second constraint guarantees that every document
is output only once. Muthukrishnan shows how to
repeatedly apply range minimum queries (RMQ)
to achieve constant time per output id.
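
The interplay between the document array, the chain array, and the range minimum queries can be sketched in Python. This is a small illustration under simplifying assumptions and not the structure of [4]: the suffixes are sorted naively instead of being stored in a generalized suffix tree, the range [sp, ep] is found by binary search on the sorted suffixes, and the RMQ is a sparse table with $O(n \log n)$ space rather than a linear-space structure. All function names are hypothetical, and the documents are assumed not to contain the character U+10FFFF, which is used as a sentinel.

```python
import bisect


def build_rmq(C):
    """Sparse table storing, for each power-of-two window, the index of a minimum of C."""
    n = len(C)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if C[a] <= C[b] else b)
        table.append(row)
        j += 1
    return table


def rmq(C, table, lo, hi):
    """Index of a minimum of C[lo..hi] (inclusive), in constant time."""
    j = (hi - lo + 1).bit_length() - 1
    a, b = table[j][lo], table[j][hi - (1 << j) + 1]
    return a if C[a] <= C[b] else b


def build_index(docs):
    """Sorted suffixes, document array DA, chain array C, and the RMQ table on C."""
    suffixes = sorted((d[k:], i) for i, d in enumerate(docs) for k in range(len(d)))
    texts = [s for s, _ in suffixes]
    DA = [i for _, i in suffixes]
    C, last = [], {}
    for pos, doc in enumerate(DA):
        C.append(last.get(doc, -1))   # previous entry with the same document id
        last[doc] = pos
    return texts, DA, C, build_rmq(C)


def list_documents(index, P):
    """Report each document containing P exactly once."""
    texts, DA, C, table = index
    sp = bisect.bisect_left(texts, P)
    ep = bisect.bisect_left(texts, P + "\U0010FFFF") - 1
    out, stack = [], [(sp, ep)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        i = rmq(C, table, lo, hi)
        if C[i] >= sp:                # no first occurrence of a document in this range
            continue
        out.append(DA[i])             # C[i] < sp: first occurrence of DA[i] in [sp, ep]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return out


index = build_index(["banana", "ananas", "bandana"])
hits = sorted(list_documents(index, "ana"))
```

Each range minimum query either reports a new document or discards its whole subrange, so the number of queries is proportional to the number of reported documents.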


Theorem 1 ([4]) Given a collection of docu- 
ments of total length n, we can construct a data 
structure taking O(n) space, such that document 
listing queries for pattern P can be answered in 
optimal O(p + ndoc) time. 


Top-k Document Retrieval 

Hon et al. [3] brought in an additional constraint 
of score function which can capture various no- 
tions of relevance like frequency and proximity. 
Instead of reporting all the documents, we only 
care to output the k highest scoring documents, 
as these would be the most relevant. Thus, O(p + 
ndoc) time is not optimal. We briefly describe 
their solution. 

Let $ST_d$ denote a suffix tree only for suffixes
of document $d$. They augment the GST
with links. A link $L$ is a 4-tuple: origin_node $o$,
target_node $t$, document_id $d$, score_value $s$.
Essentially, $(L.o, L.t, L.d, L.s)$ is a link if
and only if $(t', o')$ is an edge in the suffix tree
$ST_d$ of the document $d$. Here, $t'$ is the node in
$ST_d$ for which path($t'$) = path($L.t$). Similarly,
path($o'$) = path($L.o$). The score value of the link is
$L.s$ = score(path($o'$), $d$). For $L.o$ and $L.t$, we
use the preorder-id of those nodes in the GST.
The total number of link entries is the same as the
sum of the number of edges in each individual suffix
tree of all the documents, which is $O(n)$. They
store these links in an array $\mathcal{L}$, sorted by target. In
case of a tie among targets, the links are sorted further
by their origin values.

When they execute the query $(P, k)$, they first
find the locus node $v$ in GST. Now, the task is to
find the top-k highest scoring documents within
the subtree of $v$. Because the links are essentially
edges in individual suffix trees, for any document
$d_i$, there is at most one link whose origin $o$ is
within the subtree of $v$ and whose target $t$ is
outside the subtree. Moreover, note that if the
target $t$ is outside the subtree then it must be
one of the ancestors of $v$. The score of this link
is exactly score($P$, $d_i$). Then the query is:
among all the links whose origin lies within
the subtree of locus $v$ and whose target is outside
the subtree of $v$, find the top-k highest scoring
links. The documents are to be output in sorted
order of score values. Let $f_v$ be the preorder value
of $v$ and $l_v$ be the preorder value of the last node
in the subtree of $v$; then any qualifying link $L$
satisfies $f_v \le L.o \le l_v$ and $L.t < f_v$.
Among all such links, only the $k$ with the
highest scores are to be retrieved.

Their main idea is that one needs to look for at
most $p$ different target values, one for each
ancestor of $v$. In the sorted array $\mathcal{L}$, these links come
as at most $p$ different subarray chunks. Moreover,
within every target value the links originating
from the subtree of $v$ also come contiguously.
Let $(l_1, r_1), (l_2, r_2), \ldots, (l_g, r_g)$ with $g \le p$ be
the intervals of the array $\mathcal{L}$ in which any link $L$
satisfies $f_v \le L.o \le l_v$ and $L.t < f_v$. We
skip here the description of how these intervals
are found in $O(p)$ time. Now, the task is to get the
top-k highest ranking links from these intervals.
For this, they construct a Range Maximum Query
(RMQ) structure on the score values of $\mathcal{L}$. They
apply RMQs over each interval and put these values
in a heap. Then they repeatedly extract the maximum from the
heap, which maintains at most $O(k)$ elements.
If they output an element from an interval $(l_a, r_a)$, the next
greatest element from the same interval is put
in the heap. They stop when the heap outputs $k$
links. This takes $O(p + k \log k)$ time.
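
The heap-based selection over the intervals can be sketched as follows. This Python fragment is an illustration of the idea and not the data structure of [3]: the range maximum is computed by a linear scan where a constant-time RMQ structure would be used, the "next greatest element of the same interval" is obtained by splitting the interval around the reported position, and Python's min-heap is used with negated scores. The function name `top_k_from_intervals` and the sample scores are hypothetical.

```python
import heapq


def top_k_from_intervals(scores, intervals, k):
    """Return up to k (index, score) pairs with the highest scores among the
    positions covered by the given disjoint inclusive intervals."""

    def argmax(lo, hi):
        # Placeholder for a constant-time range maximum query.
        return max(range(lo, hi + 1), key=scores.__getitem__)

    heap = []
    for lo, hi in intervals:
        if lo <= hi:
            i = argmax(lo, hi)
            heapq.heappush(heap, (-scores[i], i, lo, hi))

    out = []
    while heap and len(out) < k:
        neg_score, i, lo, hi = heapq.heappop(heap)
        out.append((i, -neg_score))
        # The next candidates of this interval lie to the left or right of i.
        for a, b in ((lo, i - 1), (i + 1, hi)):
            if a <= b:
                j = argmax(a, b)
                heapq.heappush(heap, (-scores[j], j, a, b))
    return out


link_scores = [5, 1, 9, 3, 7, 2, 8]
best = top_k_from_intervals(link_scores, [(0, 2), (4, 6)], 3)
```

The links are produced in non-increasing order of score, which matches the requirement that documents be output in sorted order.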


Theorem 2 ([3]) Given a collection of docu- 
ments of total length n, we can construct a 
data structure taking O(n) space, such that 
top-k document retrieval queries (P,k) can be 
answered in O(p + k logk) time. 


Navarro and Nekrich [5] further improved the 
time to optimal O(p + k). To achieve this, they 
first change the target attribute of the link to 
target_depth td. They model the links as two 
dimensional points (x;, yj) with weights w; as 
the score. They maintain a global array of these 
points sorted by their x-coordinates, which are 
the preorders of origins of the links, while y 
stands for target_depth. If h is the depth of locus 
v and v spans the preorders [a,b], then their 
query is to obtain points in [a,b] x [0,/] with 
top-k highest weights. First, they make a basic 
unit structure for m <n points and answer these 
queries in O(m/ + k) time. Here,0 < f < 1 
is a constant. This is done by partitioning points 
by weights. Within each partition, the weights 
are disregarded. Then they start executing query 
Q = [a,b] x [0,h] from the highest partition. 
If less than k’ < k points qualify, then they 
output these points, change k to k — k’ and go 
to the next partition and so on. At some stage, 
in some partition more than k points will qualify. 
One cannot output all of them, and must get only 
the highest weighted points from that partition. 
For this, the partitions are further recursively 
divided into next level partitions. The depth of 
this recursion is constant and there are at most 
O(m/‘) partitions to be queried and each point 
is output in constant time. This gives O(m/ + 
k) time. For sorted reporting, they show how 
Radix sort can be applied in a constant number of 
rounds. 

Next, with this as a basic unit, they show how 
to create a layerwise data structure, so that we 
choose the appropriate layer according to h when 
the query comes. The parameters of that layer
ensure that we get O(h + k) time for the query. 


Theorem 3 ([5]) Given a collection of documents
of total length n, we can construct a data
structure taking O(n) space, such that top-k
document retrieval queries can be answered in
O(p + k) time.


External Memory Document Retrieval 

Shah et al. [6] obtained the first external memory
results for this problem. In the external memory
model, we take B as the block size and we count
I/O (input/output) operations as the performance
measure of the algorithm. They achieved the optimal
$O(p/B + \log_B n + k/B)$ I/Os for unsorted
retrieval. Their structure takes $O(n \log^* n)$ space, which
is slightly superlinear. We briefly describe the
structure here. They first make ranked components
of the GST. The rank of a node v with subtree
size $s_v$ is $\lfloor \log \lceil s_v/B \rceil \rfloor$. Ranked components are
the contiguous sets of vertices of the same rank.
Apart from rank 0 vertices, all other components
form downward paths (this is very similar
to heavy path decomposition). Now, for links,
instead of the global array CL, they keep the set of
links associated with each component. Basically,
every link belongs to the component where its
target is. They maintain two structures: one is a
3-sided structure in 2D [2] and the other is a 3D
dominance structure [1].
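
As a small illustration of the rank definition, the sketch below computes subtree sizes and ranks for a rooted tree. It assumes that the subtree size counts nodes and that the logarithm is base 2; the source fixes neither, and the helper names are hypothetical.

```python
from typing import Dict, Hashable, List


def subtree_sizes(children: Dict[Hashable, List[Hashable]], root: Hashable) -> Dict[Hashable, int]:
    """Number of nodes in each subtree, computed with an explicit stack."""
    sizes: Dict[Hashable, int] = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if expanded:
            sizes[node] = 1 + sum(sizes[c] for c in kids)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
    return sizes


def rank(size: int, block_size: int) -> int:
    """floor(log2(ceil(size / B))) using integer arithmetic only."""
    blocks = -(-size // block_size)   # ceil(size / B), at least 1 when size >= 1
    return blocks.bit_length() - 1    # floor(log2(blocks))


def node_ranks(children, root, block_size):
    """Rank of every node; equal-rank neighbours form the ranked components."""
    return {v: rank(s, block_size) for v, s in subtree_sizes(children, root).items()}
```

On a path, the ranks never increase from the root downwards, which matches the description of components as downward paths.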

The query processing first finds the locus v.
Also, the query parameter k is converted into a
score threshold $\tau$ using sketching structures. We
are interested in links L such that $\ell_v \le L.o \le r_v$,
$L.t < \ell_v$, and $L.s \ge \tau$. These are four
constraints. In external memory, three constraints
are manageable, but not four. So they decompose
the query into queries with three constraints.
They categorize the answer set into two kinds of
links: (i) the links whose targets are in the same
component as v, and (ii) the links whose targets
are in components ranked higher than v. By the
property of rank components there are at most
$\log(n/B)$ such higher components. For the second
kind of links, we query all the higher-ranked
3-sided (2D) structures [2] with $\ell_v$, $r_v$, $\tau$ as the
parameters. As long as the links are coming from
the subtree of v, the target values of links need not
be checked. For the first kind of links, one cannot
drop the target condition, i.e., $L.t < \ell_v$ must be
satisfied. However, they show that a slight renumbering
of pre-orders based on visiting the child in
its own rank component allows the condition
$L.o \le r_v$ to be dropped. Such queries are answered by
3D dominance structures. Using this, they obtain
$O(p/B + \log^2(n/B) + k/B)$ query I/Os with a
linear space structure. They further bootstrap this
to remove the middle term $\log^2(n/B)$ by doubling
the space requirements. This recursively leads
to the following result.


Theorem 4 ([6]) Given a document collection of
size n, we can construct an $O(n \log^* n)$ space
structure in external memory, which can answer
top-k document retrieval queries (P, k) in
$O(p/B + \log_B n + k/B)$ I/Os. The output is
unsorted.


As a side effect of this result, they also obtain
internal memory sorted top-k retrieval in O(p + k)
time like [5], and better, just O(k) time if
locus(P) is given. This is because the answers
come from at most log n different components.
For dominance and 3-sided queries, one can get
sorted outputs. Atomic heaps are then used
to merge at most log n sorted streams. Since
atomic heaps only hold O(log n) elements at a
time, they can generate each output in O(1) time.
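
The merging step can be sketched as a k-way merge of score streams that are already sorted in decreasing order. Note the substitution: Python's `heapq` is an ordinary binary heap, used here as a stand-in for atomic heaps, so each output costs logarithmic time in the number of streams rather than O(1). The function name `merge_top_k` is hypothetical.

```python
import heapq
from typing import Iterable, List


def merge_top_k(streams: List[Iterable[float]], k: int) -> List[float]:
    """Return the k largest scores from streams that are each sorted descending."""
    iterators = [iter(s) for s in streams]
    heap = []  # entries are (-score, stream index); one entry per live stream
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (-first, idx))
    out: List[float] = []
    while heap and len(out) < k:
        neg_score, idx = heapq.heappop(heap)
        out.append(-neg_score)
        following = next(iterators[idx], None)  # refill from the same stream
        if following is not None:
            heapq.heappush(heap, (-following, idx))
    return out
```

The heap never holds more than one element per stream, mirroring the observation that only O(log n) elements are kept at a time.
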
Problem Definition


This problem deals with the design and analysis
of a novel approximation algorithm for the minimum
weight connected dominating set problem
(MWCDS) under the unit disk graph (UDG) model.
MWCDS was proved NP-hard in the 1970s,
but for a long period researchers could not find
a constant-factor approximation until 2006, when
Ambühl et al. first introduced an approximation
with ratio 89 under the UDG model. Inspired
by their subroutines, we proposed a $(10 + \varepsilon)$-approximation
based on a double partition technique,
which greatly improved the approximation
ratio.
Given a homogeneous wireless ad hoc network,
represented as an undirected graph $G = (V, E)$,
where each vertex $v_i \in V$ in the network
has the same communication range, we denote this
range as the unit distance 1. Two nodes $v_i, v_j$ can
communicate with each other when their Euclidean
distance is at most 1, and correspondingly,
the edge set is $E = \{(v_i, v_j) \mid dist(v_i, v_j) \le 1\}$. If
$v_i$ and $v_j$ are connected, then we say that $v_i$ is a
neighbor of $v_j$ (and vice versa). Such a communication
model is called a unit disk graph (UDG).
Additionally, each vertex $v_i$ has a weight $w_i$.
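
A minimal sketch of this model: given node positions, connect every pair at Euclidean distance at most the common range. The quadratic pair scan and the name `unit_disk_graph` are choices made for illustration only.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def unit_disk_graph(points: List[Tuple[float, float]], radius: float = 1.0) -> Dict[int, Set[int]]:
    """Adjacency sets of the unit disk graph on the given positions."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

For example, three collinear nodes at x = 0, 0.5, and 2 yield a single edge between the first two.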


Objective 

We hope to find a connected dominating subset
$U \subseteq V$ of the given graph with minimum
weight, that is, a subset such that each vertex $v_i \in V$ is either in
U or has a neighbor in U and the induced graph
G[U] is connected. In addition, the weight of U
(defined as the sum of the weights of the elements of U,
i.e., $W(U) = \sum_{v_i \in U} w_i$) is the minimum among
all connected dominating subsets satisfying the
above requirements.
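
The objective can be made concrete with a checker and an exhaustive search. The search below is exponential and only meant for tiny instances; it is not the approximation algorithm of this entry, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def is_connected_dominating_set(adj: Dict[Hashable, Set[Hashable]], subset: Iterable[Hashable]) -> bool:
    """True if every vertex is in U or adjacent to U, and G[U] is connected."""
    U = set(subset)
    if not U:
        return False
    for v in adj:
        if v not in U and not (adj[v] & U):
            return False  # v is not dominated
    start = next(iter(U))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to U
        u = stack.pop()
        for w in adj[u] & U:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == U


def brute_force_mwcds(adj, weights) -> Optional[Tuple[float, Set[Hashable]]]:
    """Minimum weight connected dominating set by trying every subset."""
    nodes = list(adj)
    best = None
    for r in range(1, len(nodes) + 1):
        for cand in combinations(nodes, r):
            if is_connected_dominating_set(adj, cand):
                w = sum(weights[v] for v in cand)
                if best is None or w < best[0]:
                    best = (w, set(cand))
    return best
```

On the path 0-1-2-3 with weights 1, 5, 5, 1, the cheap endpoints dominate every vertex but are not connected, so the optimum is the heavier middle pair.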


Constraints 


1. Unit disk graph: We restrict our discussion
to the two-dimensional plane, where each vertex
has the same communication range, and edges
between vertices are constructed according to
the distance constraint.

2. Weight minimization: We focus on the
weighted version of the minimum connected
dominating set problem, which is much
harder than the cardinality version, so that
providing a constant-factor approximation
is more difficult.


Problem 1 (Minimum Weight Connected
Dominating Set in Unit Disk Graph)
INPUT: A unit disk graph $G = (V, E)$ and a
weight assigned to each vertex
OUTPUT: A minimum weight connected dominating
vertex subset $U \subseteq V$ such that (1)
every vertex of V is either in U or has a
neighbor in U and (2) the induced graph
G[U] is connected
Key Results


The minimum weight connected dominating set 
problem (MWCDS) can be divided into two 
parts: selecting a minimum weight dominating 
set (MWDS) and connecting the dominating set 
into a connected dominating set. In this chapter, 
we will focus on the former part, while the latter 
part is equivalent to solving a node-weighted 
Steiner tree problem in unit disk graphs. 

The first constant-factor approximation algorithm
for MWCDS under UDG was proposed
by Ambühl C. et al. in 2006 [1], which is a
polynomial-time 89-approximation. Later, Gao
X. et al. [2] introduced a better approximation
scheme with an approximation ratio of $(6 + \varepsilon)$ for
MWDS, and Huang Y. et al. [3] further extended
this idea to MWCDS with an approximation ratio
of $(10 + \varepsilon)$. The main idea of their methods involves
a double partition technique and a shifting
strategy to reduce the redundant vertices selected
by the algorithms.

In recent years, the approximation ratio for MWDS
in UDG has been further improved from $(6 + \varepsilon)$
to 5 by Dai and Yu [4], to 4 by Erlebach T.
et al. [5] and Zou F. et al. [6] independently,
and to 3.63 by Willson J. et al. [7]. Meanwhile,
to connect the dominating set in UDG, Ambühl
C. et al. [1] gave a 12-approximation, Huang Y.
et al. [3] provided a 4-approximation, and Zou F.
et al. [8] constructed a $2.5\rho$-approximation given
a known $\rho$-approximation algorithm for the minimum
network Steiner tree problem. The best known
approximation ratio for the network Steiner tree
problem is currently 1.39 [9],
so the best approximation ratio for the MWCDS
problem in UDG is $3.63 + 2.5 \times 1.39 = 7.105$ up to now.


Double Partition Technique 


Consider a UDG G containing n disks in the plane.
Let $\mu < \frac{\sqrt{2}}{2}$ be a real number which is sufficiently
close to $\frac{\sqrt{2}}{2}$, say, $\mu = 0.7$. Partition
the area into squares with side length $\mu$. If the
whole area has boundary $P(n) \times Q(n)$, where
P(n) and Q(n) are two polynomial functions of
n, then given an even integer constant K and
letting $K \times K$ squares form a block, our partition
will have at most $\left(\left\lceil \frac{P(n)}{K\mu} \right\rceil + 1\right) \times \left(\left\lceil \frac{Q(n)}{K\mu} \right\rceil + 1\right)$
blocks. We first discuss an algorithm to compute
an MWDS for each block and then combine
the solutions.
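
The two levels of the partition can be sketched as follows. The helper names and the `shift` parameter (the number of squares by which the block grid is moved) are assumptions for illustration; the default $\mu = 0.7$ is the value suggested in the text.

```python
import math
from typing import Tuple


def square_of(point: Tuple[float, float], mu: float = 0.7) -> Tuple[int, int]:
    """Index (i, j) of the mu x mu square containing the point."""
    x, y = point
    return (math.floor(x / mu), math.floor(y / mu))


def block_of(square: Tuple[int, int], K: int, shift: int = 0) -> Tuple[int, int]:
    """Index of the K x K block containing a square, with the grid moved by `shift` squares."""
    i, j = square
    return ((i - shift) // K, (j - shift) // K)


def max_blocks(P: float, Q: float, K: int, mu: float = 0.7) -> int:
    """Upper bound (ceil(P/(K mu)) + 1) * (ceil(Q/(K mu)) + 1) on the number of blocks."""
    side = K * mu
    return (math.ceil(P / side) + 1) * (math.ceil(Q / side) + 1)


def square_diagonal(mu: float = 0.7) -> float:
    """Largest distance between two points of one square; below 1 when mu < sqrt(2)/2."""
    return mu * math.sqrt(2)
```

The diagonal check shows why the side length is kept below $\sqrt{2}/2$: any two disk centers in the same square are then within distance 1 of each other.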


MWDS in $K \times K$ Squares

Assume each block B has $K^2$ squares $S_{ij}$, for
$i, j \in \{0, 1, \ldots, K-1\}$. Let $V_{ij}$ be the set of disks
in $S_{ij}$. If we have a dominating set D for this
block, then for each square $S_{ij}$, its corresponding
dominating set is (1) either a disk from inside $S_{ij}$
(since $dist(d, d') < 1$ for any two disks within
this square) or (2) a group of disks from the neighbor
region around $S_{ij}$, the union of which covers
all disk centers inside the square. Thus, if we want
to select a minimum weight dominating set, for
each square we have two choices. However,
instead of selecting dominating sets square by
square, we hope to select them strip by strip to
avoid repeated computation for some disks. For
this purpose, we have the following lemmas.


Lemma 1 ([1]) Let P be a set of points located
in a strip between the lines $y = y_1$ and $y = y_2$ for
some $y_1 < y_2$. Let D be a set of weighted disks
with uniform radius whose centers are above
the line $y = y_2$ or below the line $y = y_1$.
Furthermore, assume that the union of the disks
in D covers all points in P. Then a minimum
weight subset of D that covers all points in P
can be computed in polynomial time.


The proof of Lemma 1 is in fact constructive.
It gives a polynomial-time algorithm based on
dynamic programming. It says that as long as
the set of centers P in a horizontal strip can be
dominated by a set of centers D above and/or
below the strip, an optimal subset of D
dominating P can be found in polynomial time.

Our next task is to select some disks for
each square within a strip so that those disks
can be covered by disks from the upper and
lower strips. To better illustrate the strategy, we
divide the neighbor parts of $S_{ij}$ into eight regions
UL, UM, UR, CL, CR, LL, LM, and LR as
shown in Fig. 1. The four lines forming $S_{ij}$
are $x = x_1$, $x = x_2$, $y = y_1$, and $y = y_2$.

Double Partition, Fig. 1 $S_{ij}$ and its neighbor regions

Denote $Left = UL \cup CL \cup LL$, $Right = UR \cup CR \cup LR$,
$Up = UL \cup UM \cup UR$, and
$Down = LL \cup LM \cup LR$. With these definitions, we
have Lemma 2.
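
The eight neighbor regions can be expressed as a simple classification of a point relative to the square. Since Fig. 1 is not reproduced here, the sketch assumes the natural reading of the labels: the first letter is the row (U upper, C center, L lower) and the second the column (L left, M middle, R right). The function and dictionary names are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify(point: Tuple[float, float], x1: float, x2: float, y1: float, y2: float) -> str:
    """Region label of a point relative to the square [x1, x2] x [y1, y2]; 'S' means inside."""
    x, y = point
    col = "L" if x < x1 else ("R" if x > x2 else "M")
    row = "L" if y < y1 else ("U" if y > y2 else "C")
    if row == "C" and col == "M":
        return "S"
    return row + col


# Unions of regions as defined in the text.
GROUPS: Dict[str, Set[str]] = {
    "Left": {"UL", "CL", "LL"},
    "Right": {"UR", "CR", "LR"},
    "Up": {"UL", "UM", "UR"},
    "Down": {"LL", "LM", "LR"},
}
```

Corner regions belong to two unions at once; for example, UL is part of both Left and Up.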


Lemma 2 Suppose $p \in V_{ij}$ is a disk in $S_{ij}$
which can be dominated by a disk $d \in LM$. We
draw two lines $p_l$ and $p_r$ through p, which intersect
$y = y_1$ at angles $\frac{\pi}{4}$ and $\frac{3\pi}{4}$. Then the shadow $P_{LM}$
surrounded by $x = x_1$, $x = x_2$, $y = y_1$, $p_l$,
and $p_r$ (shown in Fig. 2) can also be dominated
by d. Similar results hold for the shadows
$P_{UM}$, $P_{CL}$, and $P_{CR}$, which can be defined by
a rotation.


Proof We split the shadow $P_{LM}$ into two halves with
the vertical line $x = x_p$, where $x_p$ is the x-coordinate
of disk p. Then we prove that the right half of
$P_{LM}$ can be covered by d. The left half can be
proved symmetrically. Let o be the intersection point
of $x = x_p$ and $y = y_1$, a that of $p_r$ and
$x = x_2$ (or $p_r$ and $y = y_1$), and b that of $x = x_2$
and $y = y_1$. Intuitively, the right half can be
either a quadrangle pabo or a triangle pao. We
prove both cases as follows:


Quadrangle case: Draw the perpendicular bisector
of the line segment pa, namely, $p_m$. When
d is under $p_m$, as in Fig. 3a, we have
$dist(d, a) \le dist(p, d) \le 1$. Moreover, it is
trivial that $dist(d, o)$ and $dist(d, b)$ are both $\le 1$.
Thus, d can cover the whole quadrangle. When
d is above the line $p_m$ as in Fig. 3b, we draw an
auxiliary line $y = y_d$ parallel to $y = y_1$ and the line
$x = x_a$ intersecting $y = y_d$ at point c. Since d
lies above $p_m$, $\angle cad < \pi/4$, and thus

$$dist(d, a) = \frac{dist(c, a)}{\cos \angle cad} < \frac{\sqrt{2}/2}{\cos \frac{\pi}{4}} = 1.$$

Note that both $dist(d, o)$ and $dist(d, b)$ are less
than 1, so d can cover the whole quadrangle.


Triangle case: Similarly, draw $p_m$ as described
above. The proof remains the same when d is under $p_m$
(see Fig. 3c). When d is above $p_m$ as in Fig. 3d,
we draw the auxiliary line $x = x_d$ intersecting
$y = y_1$ at c. Then we get the same conclusion.


With the help of Lemma 2, we can select
a region from $S_{ij}$, where the disks inside this
region can be covered by disks from the Up and
Down neighbor areas. We name this region the
“sandglass,” with the formal definition as follows:


Definition 1 (Sandglass) If D is a dominating
set for square $S_{ij}$ and $D \cap V_{ij} = \emptyset$, then there
exists a subset $V_M \subseteq V_{ij}$ which can only be
covered by disks from UM and LM (we can
set $V_M = \emptyset$ if there are no such disks). Choose
$V_{LM} \subseteq V_M$, the disks that can be covered by
disks from LM, and draw the lines $p_l$ and $p_r$ for each
$p \in V_{LM}$. Choose the leftmost $p_l$ and the rightmost
$p_r$ and form a shadow similar to that in Lemma 2.
Symmetrically, choose $V_{UM}$ and form a shadow
with the leftmost and rightmost lines. The union
of the two shadows forms a “sandglass” region
$Sand_{ij}$ of $S_{ij}$ (see Fig. 4a, where solid circles
represent $V_{LM}$, while hollow circles represent
$V_{UM}$). Fig. 4b–4d give other possible shapes of
$Sand_{ij}$.


Lemma 3 Suppose D is a dominating set for $S_{ij}$
and the regions $Sand_{ij}$ are chosen in the above way. Then
any disk in $Sand_{ij}$ can be dominated by disks
only from the neighbor region $Up \cup Down$, and disks
from $S_{ij} \setminus Sand_{ij}$ can be dominated by disks only
from the neighbor region $Left \cup Right$.


Proof Suppose to the contrary that there exists a disk
$d \in Sand_{ij}$ which cannot be dominated by disks
from $Up \cup Down$. Since D is a dominating set,
there must be a $d' \in CL \cup CR$ which dominates
d. Without loss of generality, assume d belongs
to the lower half of the sandglass, which is formed by
$p_1$ and $p_2$, and let $d' \in CL$ (see Fig. 5). Based on
our assumption, d cannot lie in $p_1$'s triangle
shadow to the Down region (otherwise, since $p_1$ can
be dominated by a disk from LM, d can also
be dominated by this disk). We then draw $d_l$
and $d_r$ to the CL region and form a shadow to CL.
Then by Lemma 2 every disk from this shadow
can be dominated by $d'$. Obviously, $p_1$ belongs
to this region, but $p_1$ is a disk which cannot be
dominated by disks from CL, a contradiction.

Double Partition, Fig. 2 Different shapes for the shadow (panels a–d; each shows a point p with the lines $p_l$ and $p_r$ at angles $\pi/4$ and $3\pi/4$)

Double Partition, Fig. 3 Shape of shadow and location of d. (a) d to left of $p_m$. (b) d to left of $p_m$. (c) d to left of $p_m$. (d) d to left of $p_m$

Double Partition, Fig. 4 Sandglass $Sand_{ij}$ for $S_{ij}$. (a) Form of sandglass. (b) Sandglass without intersection. (c) Sandglass with single disk. (d) Sandglass with half side

Double Partition, Fig. 5 Proof for sandglass (region labels: Up Region, CR, Down Region)


So far, we have found the “sandglass” region,
in which disks can be dominated by disks only
from the Up and Down regions. In our algorithm, for
each square $S_{ij}$, we first decide whether to
choose a disk inside this square as its dominating set
or to choose a dominating set from its neighbor
region. In the latter case, the algorithm
will randomly select 4 disks $d_1$, $d_2$, $d_3$, and $d_4$
from $S_{ij}$ and make the corresponding sandglass (we
can also choose fewer than 4 disks to form the
sandglass). By enumeration of all possible sandglasses,
including the case of choosing one disk
inside the square, for all squares within the $K \times K$
area, there are at most $\left[\sum_{i=0}^{4} \binom{n}{i}\right]^{K^2}$ choices (n
is the number of disks), which can be enumerated
within polynomial time. Moreover, when considering
choosing a dominating set from neighbor
Double Partition, Fig. 6 Block selection


Algorithm 1 Calculate MWDS in $K \times K$ squares

Input: $K \times K$ squares with inner disks
Output: A local MWDS

1: For each $S_{ij}$, choose its sandglass or randomly select
a disk $d \in S_{ij}$.
2: If $d \in S_{ij}$ is selected, then remove d and all disks
dominated by d.
3: For each strip $\bigcup_{j} S_{ij}$, for $i = 1$ to K, calculate
a dominating set for the union of disks in the sandglasses.
4: For each strip $\bigcup_{i} S_{ij}$, for $j = 1$ to K, calculate
a dominating set for the remaining disks not covered
by Step 3.
Return the union of disks chosen in the above steps
for the $K \times K$ squares.


regions, we should also include the regions around
this $K \times K$ area so that we will not miss disks
outside the region. Therefore, we should consider
a $(K + 4) \times (K + 4)$ area, where the inner region is
our selected block and the surrounding four strips
are the assistance (as shown in Fig. 6).

In all, Algorithm 1 has four
steps to calculate an MWDS for $K \times K$ squares.
We enumerate all possible cases for each $S_{ij}$ and
choose the solution with minimum weight, which
forms an MWDS for $S_{ij}$.


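MWDS for the Whole Region

As discussed above, if our plane has size $P(n) \times Q(n)$,
then there are at most $\left(\left\lceil \frac{P(n)}{K\mu} \right\rceil + 1\right) \times \left(\left\lceil \frac{Q(n)}{K\mu} \right\rceil + 1\right)$
blocks in the plane. We name each
block $B^{xy}$, where $0 \le x < \left\lceil \frac{P(n)}{K\mu} \right\rceil + 1$ and
$0 \le y < \left\lceil \frac{Q(n)}{K\mu} \right\rceil + 1$. Then, using Algorithm 1
to calculate a dominating set for each block and
combining them, we obtain a dominating
set for our original partition.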
Double Partition, Fig. 7 Move blocks


Algorithm 2 Calculate MWDS for the whole plane

Input: G in region $P(n) \times Q(n)$
Output: A global MWDS

1: For a certain partition, calculate an MWDS for each
block $B^{xy}$, sum the weights of the MWDS for each block,
and form a solution.
2: Move each block two squares to the right and two
squares to the top of the original block.
3: Repeat Step 1 for the new partition, and get a new
solution.
4: Repeat Step 2 $K/2$ times, and choose the minimum
solution among those steps.
Return the solution from Step 4 as our final result.


Next, we move our blocks to different positions
by the shifting policy. Move every block two
squares right and two squares up from its original
position, as can be seen in Fig. 7. Then
calculate a dominating set for each block again, and
combine the solutions. We repeat this process
$K/2$ times and choose the minimum solution as our
final result. The whole process is shown as
Algorithm 2.
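
The shifting loop can be sketched as below. Everything here is a simplification under stated assumptions: `solve_block` is a hypothetical callback standing in for Algorithm 1 (it receives only the indices of the disks in one block, without the surrounding assisting strips), blocks are moved by two squares per round, and the loop runs K/2 rounds.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple


def best_shifted_solution(
    points: Sequence[Tuple[float, float]],
    weights: Sequence[float],
    K: int,
    mu: float,
    solve_block: Callable[[List[int]], Iterable[int]],
) -> Optional[Tuple[float, Set[int], int]]:
    """Try K/2 shifted partitions and keep the lightest combined solution."""
    best = None
    for step in range(K // 2):
        shift = 2 * step  # blocks move two squares right and two squares up
        blocks = defaultdict(list)
        for idx, (x, y) in enumerate(points):
            i, j = math.floor(x / mu), math.floor(y / mu)
            blocks[((i - shift) // K, (j - shift) // K)].append(idx)
        chosen: Set[int] = set()
        for members in blocks.values():
            chosen.update(solve_block(members))  # placeholder for Algorithm 1
        total = sum(weights[v] for v in chosen)
        if best is None or total < best[0]:
            best = (total, chosen, shift)
    return best
```

With a toy solver that picks one disk per block, two nearby disks separated by a block boundary cost two picks in the unshifted partition but only one after the shift, which is the effect the shifting strategy exploits.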


Performance Ratio 


In the following, we extend our terminology
“dominate” to points (a point is a location which
is not necessarily a disk). A point p is dominated
by a set of disks if the distance between p and
at least one center of the disks is not more than
1. We say an area is dominated by a set of disks
if every point in this area is dominated by the
set of disks. Let OPT be an optimal solution for
our problem and w(OPT) the weight of the optimal
solution.

Double Partition, Fig. 8 An example for disk cover region


Theorem 1 Algorithm 2 always outputs a dominating
set with weight within $6 + \varepsilon$ times that of the
optimum one.


Proof Our proof has two phases. The first
phase shows that Algorithm 1 gives a
6-approximation for disks in $K \times K$ squares. The
second phase proves that the result of Algorithm 2
has weight less than $(6 + \varepsilon) \cdot w(OPT)$.


Phase 1: If a disk has radius 2 and our partition
has side length $\mu < \frac{\sqrt{2}}{2}$, then a disk may
dominate disks from at most 16 squares, as
shown in Fig. 8. Simply, if a disk in OPT
is used to dominate the square it belongs to,
then we will remove this disk before calculating
the MWDS for strips. Therefore, it will be used only
once. If a disk is not used to dominate the square
containing it, then it may be used 3 times in
calculating its 3 horizontal neighbor strips ($H_1$,
$H_2$, and $H_3$ as shown in Fig. 8) and another 3
times in calculating its 3 vertical neighbor strips
($V_1$, $V_2$, and $V_3$ in Fig. 8). Therefore, Algorithm 1
is a 6-approximation for each block.


Phase 2: Now we consider the disks in the side
strips of a block. As discussed above, when
calculating an MWDS for a strip, we may use disks
within $(K + 2) \times (K + 2)$ squares. Therefore,
we can divide a block $B^{xy}$ into three kinds of
squares, as shown in Fig. 9 ($0 \le x \le \left\lceil \frac{P(n)}{K\mu} \right\rceil$
and $0 \le y \le \left\lceil \frac{Q(n)}{K\mu} \right\rceil$). If a disk belongs to the inner
part A of $B^{xy}$, it will be used at most 6 times
during the calculation. We name those disks
$d_{in}$. If a disk belongs to the side part B of $B^{xy}$,
it may be used at most 5 times for calculating
$B^{xy}$, and it may be used at most 4 times when
calculating $B^{xy}$'s neighbor block. We name
those disks $d_{side}$. If a disk belongs to the corner
squares C of $B^{xy}$, it may be used at most 4 times
for calculating $B^{xy}$ and at most 8 times for
neighbor blocks. We name those disks $d_{corner}$.
In addition, we know that during the shifting process
a node can stay at most 4 times in a side or corner
square. If we let l denote the l-th shifting, then our
final solution satisfies the following inequality:


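$$\begin{aligned}
W(\text{Solution}) &= \min_{l} \Bigl\{ \sum \bigl[ 6w(d_{in}^{l}) + 9w(d_{side}^{l}) + 12w(d_{corner}^{l}) \bigr] \Bigr\} \\
&\le \frac{1}{K/2} \sum_{l=0}^{K/2-1} \bigl\{ 6w(d_{in}^{l}) + 12w(d_{side}^{l} + d_{corner}^{l}) \bigr\} \\
&\le 6w(OPT) + \frac{42}{K/2}\, w(OPT) \\
&\le (6 + \varepsilon)\, w(OPT),
\end{aligned}$$

where $\varepsilon = 42/(K/2)$ can be arbitrarily small
when K is sufficiently large.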


Applications 


The dominating set problem is widely used in
network-related applications. For instance,
in mobile and wireless ad hoc networks, it
is used for communication virtual
backbone selection to improve routing efficiency,
for the sensor coverage problem to extend network
lifetime, and for clustering and data gathering
problems to avoid flooding and energy waste.
In optical networks and data center networks, it
is used for network management and switch-centric
routing protocols. In social network
applications, it is used for many cluster-related
problems such as positive influence, effective leader
group, etc. A weighted dominating set is a
generalized heterogeneous network model to
describe real-world applications, which is more
realistic and practical.

Double Partition, Fig. 9 Divide block $B^{xy}$ into 3 parts


Open Problems 


There are two open problems for the minimum 
weighted connected dominating set (MWCDS) 
problem under unit disk graph (UDG). Firstly, 
there is another grid partition design to con- 
struct a constant approximation for domatic par- 
tition problem in unit disk graph [10]. Could this 
new technique result in an improvement in run- 
ning time or performance ratio? Secondly, does 
MWCDS problem in UDG have a polynomial- 
time approximation scheme (PTAS)? Currently, 
no one has answered the above questions. 
Problem Definition


Given an undirected, unweighted graph with
n nodes and m edges that is modified by a
sequence of edge insertions and deletions, the
problem is to maintain a data structure that
quickly answers queries that ask for the length
d(u, v) of the shortest path between two arbitrary
nodes u and v in the graph, called the distance of
u and v. The fastest exact algorithm for this
problem is randomized and takes amortized
$O(n^2 (\log n + \log^2((m + n)/n)))$ time per
update and constant query time [6, 11]. In the
decremental case, i.e., if only edge deletions are
allowed, there exists a deterministic algorithm
with amortized time $O(n^2)$ per deletion [7]. More
precisely, its total update time for a sequence of
up to m deletions is $O(mn^2)$. Additionally, there
is a randomized algorithm with $O(n^3 \log^2 n)$
total update time and constant query time [1].
However, in the decremental case, when only
$\alpha$-approximate answers are required, i.e., when
it suffices to output an estimate $\delta(u, v)$ such that
$d(u, v) \le \delta(u, v) \le \alpha d(u, v)$ for all nodes u
and v, the total update time can be significantly
improved: Let $\varepsilon > 0$ be a small constant. The
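fastest prior work was a class of randomized
algorithms with total update time $O(mn)$ for
$\alpha = 1 + \varepsilon$ [10], $\tilde{O}(n^{5/2 + O(1/\sqrt{\log n})})$ for
$\alpha = 3 + \varepsilon$, and $\tilde{O}(n^{2 + 1/k + O(1/\sqrt{\log n})})$ for
$\alpha = 2k - 1 + \varepsilon$ [4].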

This leads to the question whether for $\alpha = 1 + \varepsilon$
(a) a total update time of o(nm) is possible and
(b) a deterministic algorithm with total update
time O(nm) exists.

As pointed out in [3] and several other places, 
a deterministic algorithm is interesting due to 
the fact that deterministic algorithms can deal 
with an adaptive offline adversary (the strongest 
adversary model in online computation [2, 5]), 
while the randomized algorithms developed so 
far assume an oblivious adversary (the weakest
adversary model) where the order of edge dele- 
tions must be fixed before an algorithm makes 
random choices. 


Key Results 


The paper of Henzinger, Krinninger, and
Nanongkai [8] presents two algorithms for
$\alpha = 1 + \varepsilon$. The first one is a deterministic
algorithm with total update time O(mn).
The second one studies a slightly relaxed
version of the problem: Given a constant
$\beta$, let $\delta(u, v)$ be an $(\alpha, \beta)$-approximation if
$d(u, v) \le \delta(u, v) \le \alpha d(u, v) + \beta$ for all nodes
u and v. The second algorithm is a randomized
algorithm with total update time $O(n^{5/2})$ that
can guarantee both a $(1 + \varepsilon, 2)$ and a $(2 + \varepsilon, 0)$
approximation.

The results build on two prior techniques,
namely, an exact decremental single-source
shortest path data structure [7], called ES-tree,
and the $(1 + \varepsilon, 0)$-approximation algorithm
of [10], called RZ-algorithm. The RZ-algorithm
chooses, for every integer i with $1 \le i \le \log n$,
$O(n/(\varepsilon 2^i))$ random nodes as centers and
maintains an ES-tree up to distance $2^{i+2}$ for
each center. For correctness, it exploits the fact
that the random choice of centers guarantees
the following invariant (I): For every pair of
nodes u and v with distance d(u, v), there
exists with high probability a center c such
that $d(u, c) \le \varepsilon d(u, v)$ and $d(c, v) \le d(u, v)$.
The total update time per center is $O(m 2^i)$,
resulting in a total update time of O(mn). The
deterministic algorithm of [8] derandomizes
this algorithm by initially choosing centers
fulfilling invariant (I) and, after each update, (a)
greedily generating new centers to guarantee
that (I) continues to hold and (b) moving the
root of the existing ES-trees. To achieve a
running time of O(mn), the algorithm is not
allowed to create more than $O(n/(\varepsilon 2^i))$
centers for each i. This condition is fulfilled by
dynamically assigning each center a set of $\Omega(2^i)$
vertices such that no vertex is assigned to two
centers.
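
The level structure of the randomized center selection can be sketched as follows. This only illustrates the counts and depth bounds per level: the constant and logarithmic factors hidden in the asymptotic bound are assumed to be 1, the logarithm is taken base 2, no ES-tree is actually built, and the function name is hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Tuple


def sample_centers(
    nodes: List[Hashable], eps: float, rng: random.Random
) -> Dict[int, Tuple[List[Hashable], int]]:
    """For each level i, pick about n / (eps * 2^i) random centers and the depth bound 2^(i+2)."""
    n = len(nodes)
    levels: Dict[int, Tuple[List[Hashable], int]] = {}
    top = max(1, math.floor(math.log2(n)))
    for i in range(1, top + 1):
        count = min(n, math.ceil(n / (eps * 2 ** i)))  # hidden factors assumed to be 1
        depth = 2 ** (i + 2)  # each center would maintain an ES-tree up to this distance
        levels[i] = (rng.sample(nodes, count), depth)
    return levels
```

Higher levels use fewer centers but deeper trees, so the product of the number of centers and the depth bound stays roughly proportional to n at every level.
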
The improved randomized algorithm uses the
idea of an emulator, a sparser weighted graph 
that approximates the distances of the original 
graph. Emulators were used for dynamic shortest- 
path algorithms before [4]. The challenge when 
using an emulator is that edge deletions in the 
original graph might lead to edge deletions, edge 
insertions, or weight increases in the emulator, 
requiring in principle the use of a fully dynamic 
shortest-path algorithm on the emulator. Bern- 
stein and Roditty [4] deal with this challenge 
by using an emulator where the number of dis- 
tance changes between any two nodes can be 
bounded. However, the RZ-algorithm requires 
that the number of times that the distance between 
any two nodes changes is at most R before that 
distance exceeds R for any integer R with
$1 \le R \le n$. As the emulator used by Bernstein
and Roditty does not fulfill this property, they 
cannot run the RZ-algorithm on it. The new algo- 
rithm does not construct such an emulator either. 
Instead, it builds an emulator where the error 
introduced by edge insertions is limited and runs 
the RZ-algorithm with modified ES-trees, called 
monotone ES-trees, on this emulator. The analy- 
sis exploits the fact that the distance between any 
two nodes in the original graph can only increase 
after an edge deletion. Thus, even if an edge 
deletion leads to changes in the emulator that 
decrease their distance in the emulator, the corre- 
sponding ES-trees do not have to be updated, i.e., 
the distance of a vertex to its root in the ES-tree 
never decreases. The analysis shows that the error 
introduced through the use of monotone ES-trees 
in the RZ-algorithm is small so that the claimed 
approximation ratio is achieved. Moreover, since
the ES-trees are run on the sparse emulator, the
overall running time is o(mn).


Open Problems 


The main open problem is to find a similarly 
efficient algorithm in the fully dynamic setting, 
where both edge insertions and deletions are 
allowed. A further open problem is to extend the 
derandomization technique to the exact algorithm 
of [1]. 
Another challenge is to obtain similar results
for weighted, directed graphs. We recently 
extended some of the above techniques to 
weighted, directed graphs and presented a 
randomized algorithm with O (mn°-?8°) total 
update time for $(1 + \varepsilon)$-approximate
single-source shortest paths [9].
Problem Definition


A dynamic graph algorithm maintains informa- 
tion about a graph that is changing over time. 
Given a property P of the graph (e.g., maxi- 
mum matching), the algorithm must support an 
online sequence of query and update operations, 
where an update operation changes the underly- 
ing graph, while a query operation asks for the 
state of P in the current graph. In the typical 
model studied, each update affects a single edge, 
in which case the most general setting is the fully 
dynamic one, where an update can either insert 
an edge, delete an edge, or change the weight 
of an edge. Common restrictions of this include 
the decremental setting, where an update can 
only delete an edge or increase a weight, and the 
incremental setting where an update can insert an 
edge or decrease a weight. 

This entry addresses the problem of
maintaining $\alpha$-approximate all-pairs shortest
paths (APSP) in the fully dynamic setting in a
weighted, undirected graph (the approximation
factor $\alpha$ depends on the algorithm); the goal is to
maintain an undirected graph G with real-valued
nonnegative edge weights under an online intermixed
sequence of the following operations:


• delete(u, v) (update): remove edge (u, v) from
G.

• insert(u, v) (update): insert an edge (u, v) into
G.

• change weight(u, v, w) (update): change the
weight of edge (u, v) to w.

• distance(u, v) (query): return an $\alpha$-approximation
to the shortest u–v distance in
G.

• path(u, v) (query): return an $\alpha$-approximate
shortest path from u to v.


Approaches 
The naive approach to the fully dynamic APSP 
problem is to recompute shortest paths from 
scratch after every update, allowing queries to 
be answered in optimal time. Letting n be the 
number of vertices and m the number of edges, 
computing APSP requires O(mn + n7loglog(n)) 
time in sparse graphs [8] or slightly less 
than n> in dense graphs [9, 13]. If we allow 
approximation, a slightly better approach would 
be to construct an approximate distance oracle 
after each update, i.e., a static data structure 
for answering approximate distance queries 
quickly; an oracle for returning k-approximate 
distances (k > 3) can be constructed in time 
O(min{n? log(n),kmn'/*%) [1, 11]. Another 
simple-minded approach would be to not perform 
any work during the updates and to simply 
compute the shortest u—v path from scratch when 
a query arrived; using Dijkstra’s algorithm with 
Fibonacci heaps [7], this would lead to a constant 
update time and a query time of O(m +n log(n)). 
The goal of a dynamic algorithm is to improve 
upon the above approaches by taking advantage 
of the fact that each update only affects a single 
edge, so one can reuse information between up- 
dates and thus avoid recomputing from scratch. 
In a breakthrough result, Demetrescu and Italiano 
showed that in the most general case of a di- 
rected graph with arbitrary real weights, one can 
answer updates in amortized time O(n^2 log^3(n))
while maintaining optimal O(1) time for distance
queries [6]; Thorup improved the update time
slightly to O(n^2 (log(n) + log^2((m+n)/n))) [10].
This entry addresses a recent result of Bernstein
[3] which shows that in undirected graphs, one
can significantly improve upon this n^2 update
time by settling for approximate distances. 
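
As a point of reference, the lazy baseline described above (constant-time updates, one Dijkstra run per query) might look like the following sketch. The class name LazyDynamicGraph and the explicit weight argument to insert are assumptions made for illustration. Python's heapq is a binary heap, so a query here costs O(m log n) rather than the O(m + n log n) obtained with Fibonacci heaps.

```python
import heapq
from collections import defaultdict


class LazyDynamicGraph:
    """Undirected weighted graph: O(1) updates, one Dijkstra run per query."""

    def __init__(self):
        # adj[u][v] holds the weight of edge (u, v); stored in both directions.
        self.adj = defaultdict(dict)

    def insert(self, u, v, w):
        self.adj[u][v] = w
        self.adj[v][u] = w

    def delete(self, u, v):
        self.adj[u].pop(v, None)
        self.adj[v].pop(u, None)

    def change_weight(self, u, v, w):
        if v not in self.adj[u]:
            raise KeyError("edge is not present")
        self.insert(u, v, w)

    def distance(self, s, t):
        """Exact s-t distance, recomputed from scratch (inf if disconnected)."""
        dist = {s: 0.0}
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == t:
                return d
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            for v, w in self.adj[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return float("inf")


g = LazyDynamicGraph()
g.insert(0, 1, 1.0)
g.insert(1, 2, 2.0)
g.insert(0, 2, 5.0)
print(g.distance(0, 2))  # path 0-1-2
g.delete(1, 2)
print(g.distance(0, 2))  # only the direct edge remains
```

A dynamic algorithm aims to beat this trade-off by reusing work across updates instead of paying a full shortest-path computation on every query.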
Key Results


Bernstein’s paper starts by showing that assum- 
ing integral weights, there is an algorithm for 
maintaining (2 + ε)-approximate APSP up to a
bounded distance d with amortized update time 
O(md). This is efficient for small distances, but 
in the general case d can be very large, especially 
in weighted graphs. Bernstein overcomes this by 
introducing a high-level framework for extending 
shortest path-related algorithms that are efficient 
for small distances to ones that are efficient for 
general graphs; he later applied this approach to 
two other results [4, 5]. 

The first step of the approach is to show 
that with simple scaling techniques, an algorithm 
that is efficient for small (weighted) distances 
can be extended to an algorithm that is efficient 
for shortest paths with few edges (regardless of 
weight). Applying this technique to the above 
algorithm yields the following result: 


Definition 1 Let the hop distance of a path be
the number of edges it contains. A graph G is
said to have approximate hop diameter h if for
every pair of vertices (x, y), there is a
(1 + ε)-approximate shortest x-y path with hop
distance ≤ h.


Theorem 1 ([3]) Let G be an undirected graph
with nonnegative real edge weights, and let R be
the ratio of the heaviest to the lightest nonzero
weight. One can maintain (2 + ε)-APSP in the
fully dynamic setting with amortized update time
O(mh log(nR)), where h is the approximate hop
diameter of the graph.


Shortcut Edges 

Theorem 1 provides an efficient algorithm for
small h, but on its own a result that is efficient for
small hop diameter is not particularly powerful as
even in unweighted graphs h can be Ω(n). The
second step of Bernstein’s approach is to show
that regardless of whether the original graph is
weighted, one can add weighted edges to reduce
the hop diameter. A shortcut edge (x, y) is a new
edge constructed by the algorithm that has weight
w(x, y) with δ(x, y) ≤ w(x, y) ≤ (1+ε)δ(x, y),
where δ(x, y) is the shortest x-y distance. It is
clear that because shortcut weights are tethered to
shortest distances, they do not change (weighted)
distances in the graph. But a shortcut edge can
greatly reduce hop distances; for example, in an
unweighted graph where δ(x, y) = 1000, adding
a single shortcut edge (x, y) of weight 1000
decreases the x-y hop distance to 1 while also
decreasing the hop distance of paths that go through
x and y. Bernstein adapts techniques from span- 
ner and emulator theory (see in particular Thorup 
and Zwick’s result on graph sparsification [12]) to 
show that in fact a small number of shortcut edges 
suffice to greatly reduce the hop diameter of a 
graph. 
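
The effect of a single shortcut edge can be checked on the unweighted example above. The sketch below is an illustration only, not part of Bernstein's algorithm: the helper dist_and_hops is a hypothetical name, and it runs Dijkstra on (distance, hops) pairs compared lexicographically, so it reports the fewest edges among all exact shortest paths.

```python
import heapq


def dist_and_hops(adj, s, t):
    """Return (shortest s-t distance, fewest edges on any shortest s-t path)."""
    best = {s: (0, 0)}
    heap = [(0, 0, s)]
    while heap:
        d, h, u = heapq.heappop(heap)
        if (d, h) > best[u]:
            continue  # stale heap entry
        if u == t:
            return d, h
        for v, w in adj[u]:
            cand = (d + w, h + 1)
            if cand < best.get(v, (float("inf"), 0)):
                best[v] = cand
                heapq.heappush(heap, (cand[0], cand[1], v))
    return float("inf"), None


# Unweighted path 0 - 1 - ... - 1000, so the distance from 0 to 1000 is 1000.
n = 1000
adj = {i: [] for i in range(n + 1)}
for i in range(n):
    adj[i].append((i + 1, 1))
    adj[i + 1].append((i, 1))
before = dist_and_hops(adj, 0, n)

# Add one shortcut edge (0, 1000) whose weight equals the shortest distance.
adj[0].append((n, 1000))
adj[n].append((0, 1000))
after = dist_and_hops(adj, 0, n)

print(before, after)  # the distance is unchanged; the hop count drops to 1
```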


Theorem 2 ([3]) Let G be an undirected graph
with nonnegative real edge weights, and let R be
the ratio of the heaviest to the lightest edge weight
in the graph. There exists an algorithm that in
time O(m · n^{O(1/√log n)} · log(nR)) constructs a
set S of O(n^{1+O(1/√log n)} · log(nR)) shortcut
edges such that adding S to the edges of the
graph reduces the approximate hop diameter to
n^{O(1/√log n)}.


Theorems 1 and 2 combined encapsulate the
approach of Bernstein’s algorithm: take an 
algorithm that works well for small distances, 
use scaling to transform it into an algorithm that 
is efficient for graphs of small hop diameter, and 
then add shortcut edges to the original graph 
to ensure a small hop diameter. For dynamic 
APSP, there are additional complications that 
arise from edges being inserted and deleted 
over time, but the basic approach remains 
the same. 


Theorem 3 ([3]) Let G be an undirected graph
with real nonnegative edge weights, and let R
be the ratio of the maximum edge weight
appearing in the graph during any point in the
update sequence to the minimum nonzero edge
weight. There is an algorithm that maintains fully
dynamic (2 + ε)-approximate APSP in amortized
update time O(m · n^{O(1/√log n)} · log(nR)) and
can answer distance queries in worst-case time
O(log log log(n)).
Open Problems


The main open problem for fully dynamic
approximate APSP is to develop an efficient
algorithm for maintaining (1 + ε)-approximate
distances, possibly with a small additive error in the
unweighted case. Another interesting problem
would be to achieve o(n^2) update times for dense
graphs; this can already be done to some extent
by combining the result of Bernstein discussed 
here with the fully dynamic spanner of Baswana 
et al. [2], but only for unweighted graphs and at 
the cost of a much worse approximation ratio. 
Other open problems include removing the de- 
pendence on log(R) and developing an efficient 
deterministic algorithm for the problem. 
Problem Definition


The dynamic tree problem is that of maintaining
an arbitrary n-vertex forest that changes over
time through edge insertions (links) and deletions
(cuts). Depending on the application, one
associates information with vertices, edges, or both.
Queries and updates can deal with individual
vertices or edges, but more commonly they refer
to entire paths or trees. Typical operations include
finding the minimum-cost edge along a path,
determining the minimum-cost vertex in a tree, or
adding a constant value to the cost of each edge
on a path (or of each vertex of a tree). Each of
these operations, as well as links and cuts, can be
performed in O(log n) time with appropriate data
structures.


Key Results 


The obvious solution to the dynamic tree problem 
is to represent the forest explicitly. This, however, 
is inefficient for queries dealing with entire paths 
or trees, since it would require actually traversing 
them. Achieving O(log n) time per operation
requires mapping each (possibly unbalanced) input
tree into a balanced tree, which is better suited 
to maintaining information about paths or trees 
implicitly. There are three main approaches to 
perform the mapping: path decomposition, tree 
contraction, and linearization. 
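
For comparison, the explicit representation mentioned at the start of this section can be sketched as follows. The class name NaiveForest and its method names are assumptions for illustration; each vertex stores a parent pointer and the cost of the edge to its parent, so path operations walk to the root and take time linear in the path length.

```python
class NaiveForest:
    """Explicit rooted forest: parent pointers plus a cost per parent edge."""

    def __init__(self, n):
        self.parent = [None] * n
        self.cost = [None] * n  # cost[v] is the cost of edge (v, parent[v])

    def find_root(self, v):
        while self.parent[v] is not None:
            v = self.parent[v]
        return v

    def link(self, v, w, c):
        """Make the root v a child of w through an edge of cost c."""
        if self.parent[v] is not None:
            raise ValueError("v must be a root")
        if self.find_root(w) == v:
            raise ValueError("v and w are already in the same tree")
        self.parent[v] = w
        self.cost[v] = c

    def cut(self, v):
        """Remove the edge between v and its parent."""
        self.parent[v] = None
        self.cost[v] = None

    def path_min(self, v):
        """Minimum edge cost on the path from v to its root (None if v is a root)."""
        best = None
        while self.parent[v] is not None:
            if best is None or self.cost[v] < best:
                best = self.cost[v]
            v = self.parent[v]
        return best

    def path_add(self, v, delta):
        """Add delta to the cost of every edge on the path from v to its root."""
        while self.parent[v] is not None:
            self.cost[v] += delta
            v = self.parent[v]


f = NaiveForest(5)
f.link(1, 0, 4)
f.link(2, 1, 7)
f.link(3, 2, 2)
print(f.find_root(3), f.path_min(3))
```

The balanced representations discussed next exist precisely to avoid the linear walks in path_min and path_add.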


Path Decomposition 
The first efficient dynamic tree data structure was 
Sleator and Tarjan’s ST-trees [13, 14], also known 
as link-cut trees or simply dynamic trees. They 
are meant to represent rooted trees, but the user 
can change the root with the evert operation. 
The data structure partitions each input tree into 
vertex-disjoint paths, and each path is represented 
as a binary search tree in which vertices appear 
in symmetric order. The binary trees are then 
connected according to how the paths are related 
in the forest. More precisely, the root of a binary 
tree becomes a middle child (in the data structure) 
of the parent (in the forest) of the topmost vertex 
of the corresponding path. Although a node has 
no more than two children (left and right) within 
its own binary tree, it may have arbitrarily many 
middle children. See Fig. 1. The path containing 
the root (glifcba in the example) is said to be 
exposed, and is represented as the topmost binary 
tree. All path-related queries will refer to this 
path. The expose operation can be used to make 
any vertex part of the exposed path. 

With standard balanced binary search trees
(such as red-black trees), ST-trees support each
dynamic tree operation in O(log^2 n) amortized
time. This bound can be improved to O(log n)
amortized with locally biased search trees, and
to O(log n) in the worst case with globally
biased search trees. Biased search trees (described
in [5]), however, are notoriously complicated.
A more practical implementation of ST-trees uses
splay trees, a self-adjusting type of binary search
trees, to support all dynamic tree operations in
O(log n) amortized time [14].

Dynamic Trees, Fig. 1 An ST-tree (Adapted from [14]).
On the left, the original tree, rooted at a and already
partitioned into paths; on the right, the actual data structure.
Solid edges connect nodes on the same path; dashed edges
connect different paths


Tree Contraction

Unlike ST-trees, which represent the input trees
directly, Frederickson’s topology trees [6, 7, 8]
represent a contraction of each tree. The original
vertices constitute level 0 of the contraction.
Level 1 represents a partition of these vertices
into clusters: a degree-one vertex can be
combined with its only neighbor; vertices of degree
two that are adjacent to each other can be
clustered together; other vertices are kept as
singletons. The end result will be a smaller tree, whose
own partition into clusters yields level 2. The
process is repeated until a single cluster remains.
The topology tree is a representation of the
contraction, with each cluster having as children its
constituent clusters on the level below. See Fig. 2.


With appropriate pieces of information 
stored in each cluster, the data structure can 
be used to answer queries about the entire tree 
or individual paths. After a link or cut, the 
affected topology trees can be rebuilt in O(log n)
time. 

The notion of tree contraction was developed 
independently by Miller and Reif [11] in the 
context of parallel algorithms. They propose two 
basic operations, rake (which eliminates vertices 
of degree one) and compress (which eliminates 
vertices of degree two). They show that O(log n)
rounds of these operations are sufficient to con- 
tract any tree to a single cluster. Acar et al. trans- 
lated a variant of their algorithm into a dynamic 
tree data structure, RC-trees [1], which can also 
be seen as a randomized (and simpler) version of 
topology trees. 

A drawback of topology trees and RC-trees is
that they require the underlying forest to have
vertices with bounded (constant) degree in order to
ensure O(log n) time per operation. Similarly,
although ST-trees do not have this limitation when
aggregating information over paths, they require
bounded degrees to aggregate over trees. Degree
restrictions can be addressed by “ternarizing”
the input forest (replacing high-degree vertices
with a series of low-degree ones [9]), but this
introduces a host of special cases.

Dynamic Trees, Fig. 2 A topology tree (Adapted from [7]).
On the left, the original tree and its multilevel partition;
on the right, a corresponding topology tree

Dynamic Trees, Fig. 3 The rake and compress
operations, as used by top trees (From [16])


Alstrup et al.’s top trees [3, 4] have no such 
limitation, which makes them more generic than 
all data structures previously discussed. Although 
also based on tree contraction, their clusters be- 
have not like vertices, but like edges. A compress 
cluster combines two edges that share a degree- 
two vertex, while a rake cluster combines an edge 
with a degree-one endpoint with a second edge 
adjacent to its other endpoint. See Fig. 3. 

Top trees are designed so as to completely 
hide from the user the inner workings of the data 
structure. The user only specifies what pieces of 
information to store in each cluster, and (through 
call-back functions) how to update them after 
a cluster is created or destroyed when the tree 
changes. As long as the operations are properly 
defined, applications that use top trees are com- 
pletely independent of how the data structure is 
actually implemented, i.e., of the order in which 
rakes and compresses are performed. 

In fact, top trees were not even proposed 
as stand-alone data structures, but rather as an 
interface on top of topology trees. For efficiency 
reasons, however, one would rather have a more 
direct implementation. Holm, Tarjan, Thorup
and Werneck have presented a conceptually
simple stand-alone algorithm to update a top 
tree after a link or cut in O(logn) time in 
the worst case [17]. Tarjan and Werneck [16] 
have also introduced self-adjusting top trees, 
a more efficient implementation of top trees 
based on path decomposition: it partitions the 
input forest into edge-disjoint paths, represents 
these paths as splay trees, and connects these 
trees appropriately. Internally, the data structure 
is very similar to ST-trees, but the paths are 
edge-disjoint (instead of vertex-disjoint) and the 
ternarization step is incorporated into the data 
structure itself. All the user sees, however, are 
the rakes and compresses that characterize tree 
contraction. 


Linearization 

ET-trees, originally proposed by Henzinger and
King [10] and later slightly simplified by
Tarjan [15], use yet another approach to represent
dynamic trees: linearization. They maintain an
Euler tour of each input tree, i.e., a closed
path that traverses each edge twice, once in each
direction. The tour induces a linear order among
the vertices and arcs, and therefore can be
represented as a balanced binary search tree. Linking
and cutting edges from the forest corresponds
to joining and splitting the affected binary trees,
which can be done in O(log n) time. While
linearization is arguably the simplest of the three
approaches, it has a crucial drawback: because
each edge appears twice, the data structure can
only aggregate information over trees, not paths.
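
To make the linearization concrete, the sketch below builds the Euler tour of a small tree as a plain Python list of arcs and shows that linking two trees amounts to concatenating their tours around the two new arcs. The function names euler_tour and link_tours are assumptions for illustration; an actual ET-tree would keep each tour in a balanced binary search tree so that the concatenation becomes a logarithmic-time join.

```python
def euler_tour(adj, root):
    """Return the arcs of a closed Euler tour of the tree containing root."""
    tour = []
    stack = [(root, None, iter(adj[root]))]
    while stack:
        v, parent, neighbors = stack[-1]
        for w in neighbors:
            if w != parent:
                tour.append((v, w))            # arc going down to a child
                stack.append((w, v, iter(adj[w])))
                break
        else:
            stack.pop()
            if parent is not None:
                tour.append((v, parent))       # arc returning to the parent
    return tour


def link_tours(tour_u, u, tour_v, v):
    """Tour after adding edge (u, v); tour_u starts at u and tour_v starts at v."""
    return tour_u + [(u, v)] + tour_v + [(v, u)]


tree_a = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
tree_b = {4: [5], 5: [4]}
tour_a = euler_tour(tree_a, 0)
tour_b = euler_tour(tree_b, 4)
linked = link_tours(tour_a, 0, tour_b, 4)
print(tour_a)
print(linked)
```

Every edge contributes exactly two arcs to the tour, which is the property that prevents path aggregation.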


Lower Bounds 

Dynamic tree data structures are capable of 
solving the dynamic connectivity problem on 
acyclic graphs: given two vertices v and w, decide 
whether they belong to the same tree or not. 
Pătraşcu and Demaine [12] have proven a lower
bound of Ω(log n) for this problem, which is
matched by the data structures presented here. 


Applications 


Sleator and Tarjan’s original application for
dynamic trees was Dinic’s blocking flow
algorithm [13]. Dynamic trees are used to
maintain a forest of arcs with positive residual
capacity. As soon as the source s and the sink
t become part of the same tree, the algorithm
sends as much flow as possible along the
s-t path; this reduces to zero the residual
capacity of at least one arc, which is then
cut from the tree. Several maximum flow and
minimum-cost flow algorithms incorporating
dynamic trees have been proposed ever since
(some examples are [9, 15]). Dynamic tree
data structures, especially those based on tree
contraction, are also commonly used within
dynamic graph algorithms, such as the dynamic
versions of minimum spanning trees [6, 10],
connectivity [10], biconnectivity [6], and
bipartiteness [10]. Other applications include
the evaluation of dynamic expression trees [8]
and standard graph algorithms [13].


Experimental Results 


Several studies have compared the performance 
of different dynamic-tree data structures; in most 
cases, ST-trees implemented with splay trees are 
the fastest alternative. Frederickson, for example,
found that topology trees take almost 50% more
time than splay-based ST-trees when executing 
dynamic tree operations within a maximum flow 
algorithm [8]. Acar et al. [2] have shown that RC- 
trees are significantly slower than splay-based 
ST-trees when most operations are links and cuts 
(such as in network flow algorithms), but faster 
when queries and value updates are dominant. 
The reason is that splaying changes the structure 
of ST-trees even during queries, while RC-trees 
remain unchanged. 

Tarjan and Werneck [17] have presented an 
experimental comparison of several dynamic tree 
data structures. For random sequences of links 
and cuts, splay-based ST-trees are the fastest al- 
ternative, followed by splay-based ET-trees, self- 
adjusting top trees, worst-case top trees, and 
RC-trees. Similar relative performance was ob- 
served in more realistic sequences of operations, 
except when queries far outnumber structural 
operations; in this case, the self-adjusting data 
structures are slower than RC-trees and worst- 
case top trees. The same experimental study also 
considered the “obvious” implementation of
ST-trees, which represents the forest explicitly and
requires linear time per operation in the worst
case. Its simplicity makes it significantly faster
than the O(log n)-time data structures for
path-related queries and updates, unless paths are
hundreds of nodes long. The sophisticated solutions
are more useful when the underlying forest has 
high diameter or there is a need to aggregate 
information over trees (and not only paths). 
Problem Definition


Given two strings S = s_1 s_2 ... s_n and R =
r_1 r_2 ... r_m (wlog let n ≥ m) over an alphabet
σ = {σ_1, σ_2, ..., σ_ℓ}, the standard edit distance
between S and R, denoted ED(S, R), is the
minimum number of single character edits,
specifically insertions, deletions and replacements, to
transform S into R (equivalently R into S).

If the input strings S and R are permutations
of the alphabet σ (so that |S| = |R| = |σ|) then
an analogous permutation edit distance between
S and R, denoted PED(S, R), can be defined as the
minimum number of single character moves to
transform S into R (or vice versa).

A generalization of the standard edit distance
is edit distance with moves, which, for input
strings S and R, is denoted EDM(S, R) and is
defined as the minimum number of character edits
and substring (block) moves to transform one of
the strings into the other. A move of block s[j, k]
to position h transforms S = s_1 s_2 ... s_n into
S' = s_1 ... s_{j-1} s_{k+1} s_{k+2} ... s_{h-1} s_j ... s_k s_h ... s_n [4].

If the input strings S and R are permutations
of the alphabet σ (so that |S| = |R| = |σ|)
then EDM(S, R) is also called the transposition
distance and is denoted TED(S, R) [1].

Perhaps the most general form of the standard
edit distance that involves edit operations on
blocks/substrings is the block edit distance,
denoted BED(S, R). It is defined as the
minimum number of single character edits,
block moves, as well as block copies and
block uncopies to transform one of the strings
into the other. Copying of a block s[j, k] to
position h transforms S = s_1 s_2 ... s_n into
S' = s_1 ... s_j s_{j+1} ... s_k ... s_{h-1} s_j ... s_k s_h ... s_n.
A block uncopy is the inverse of a block
copy: it deletes a block s[j, k] provided there
exists s[j', k'] = s[j, k] which does not
overlap with s[j, k], and transforms S into
S' = s_1 ... s_{j-1} s_{k+1} ... s_n.

Throughout this discussion all edit operations
have unit cost and they may overlap; i.e., a
character can be edited multiple times.
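
The three block operations can be written out directly on Python strings. This is a sketch under stated assumptions: indices j, k, h are 1-based and inclusive as in the definitions, the function names are hypothetical, and the handling of a move to a position h ≤ j is an assumption, since the definition above only spells out the case h > k.

```python
def move_block(s, j, k, h):
    """Move block s[j..k] so that it ends up just before the original s[h]."""
    block = s[j - 1:k]
    if h > k:
        return s[:j - 1] + s[k:h - 1] + block + s[h - 1:]
    if h <= j:  # assumed symmetric case: move the block to the left
        return s[:h - 1] + block + s[h - 1:j - 1] + s[k:]
    raise ValueError("target position lies inside the block")


def copy_block(s, j, k, h):
    """Insert a copy of block s[j..k] just before s[h]."""
    return s[:h - 1] + s[j - 1:k] + s[h - 1:]


def uncopy_block(s, j, k):
    """Delete block s[j..k] if an identical, non-overlapping block exists."""
    block = s[j - 1:k]
    size = len(block)
    for p in range(len(s) - size + 1):
        disjoint = p + size <= j - 1 or p >= k
        if disjoint and s[p:p + size] == block:
            return s[:j - 1] + s[k:]
    raise ValueError("no non-overlapping copy of the block exists")


s = "abcdefgh"
moved = move_block(s, 2, 3, 6)    # block "bc" lands just before "f"
copied = copy_block(s, 2, 3, 6)   # a second "bc" appears before "f"
restored = uncopy_block(copied, 6, 7)
print(moved, copied, restored)
```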
Key Results


There are exact and approximate solutions to 
computing the edit distances described above 
with varying performance guarantees. As can be 
expected, the best available running times as well 
as the approximation factors for computing these 
edit distances vary considerably with the edit 
operations allowed. 


Exact Computation of the Standard and 
Permutation Edit Distance 

The fastest algorithms for exactly computing the 
standard edit distance have been available for 
more than 25 years. 


Theorem 1 (Levenshtein [9]) The standard edit
distance ED(S, R) can be computed exactly in
time O(n · m) via dynamic programming.
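
The quadratic-time dynamic program is the textbook recurrence over prefixes; a minimal sketch follows, keeping only two rows of the table. The function name edit_distance is an assumption for illustration.

```python
def edit_distance(s, r):
    """Standard edit distance with unit-cost insert, delete, and replace."""
    # prev[j] is the distance between the current prefix of s and r[:j].
    prev = list(range(len(r) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(r, 1):
            cur.append(min(
                prev[j] + 1,             # delete a from s
                cur[j - 1] + 1,          # insert b into s
                prev[j - 1] + (a != b),  # replace a by b, or match for free
            ))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```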


Theorem 2 (Masek-Paterson [11]) The
standard edit distance ED(S, R) can be computed
exactly in time O(n + n · m / log_{|σ|} n) via the
“four-Russians trick”.


Theorem 3 (Landau-Vishkin [8]) It is possible
to compute ED(S, R) in time O(n · ED(S, R)).


Finally, note that if S and R are permutations
of the alphabet σ, PED(S, R) can be computed
much faster than the standard edit distance for
general strings: Observe that PED(S, R) =
n − LCS(S, R), where LCS(S, R) represents the
longest common subsequence of S and R. For
permutations S, R, LCS(S, R) can be computed in
time O(n · log log n) [3].
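
For permutations, the longest common subsequence reduces to a longest increasing subsequence once S is rewritten in terms of positions in R. The sketch below uses that reduction with binary search, which gives O(n log n) time; it is not the O(n log log n) method of [3], which relies on a faster predecessor structure. The function name is an assumption for illustration.

```python
from bisect import bisect_left


def permutation_edit_distance(s, r):
    """PED(S, R) = n - LCS(S, R) for two permutations of the same alphabet."""
    if len(set(s)) != len(s) or sorted(s) != sorted(r):
        raise ValueError("inputs must be permutations of the same alphabet")
    position_in_r = {c: i for i, c in enumerate(r)}
    seq = [position_in_r[c] for c in s]
    # tails[k] is the smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(s) - len(tails)


print(permutation_edit_distance("abcde", "bcdea"))  # one move: relocate "a"
```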


Approximate Computation 

of the Standard Edit Distance 

If some approximation can be tolerated, it is
possible to considerably improve the Õ(n · m)
time (Õ notation hides polylogarithmic factors)
available by the techniques above. The fastest
algorithm that approximately computes the
standard edit distance works by embedding strings S
and R from alphabet σ into shorter strings S’ and
R’ from a larger alphabet σ’ [2]. The embedding
is achieved by applying a general version of the
Locally Consistent Parsing [13, 14] to partition
the strings R and S into consistent blocks of
size c to 2c − 1; the partitioning is consistent
in the sense that identical (long) substrings are
partitioned identically. Each block is then
replaced with a label such that identical blocks are
identically labeled. The resulting strings S’ and
R’ preserve the edit distance between S and R
approximately as stated below.


Theorem 4 (Batu-Ergun-Sahinalp [2])
ED(S, R) can be computed in time Õ(n^{1+ε})
within an approximation factor of
min{n^{(1−ε)/3 + o(1)}, (ED(S, R)/n^ε)^{1/2 + o(1)}}.


For the case of ε = 0, the above result
provides an Õ(n) time algorithm for
approximating ED(S, R) within a factor of
min{n^{1/3 + o(1)}, ED(S, R)^{1/2 + o(1)}}.


Approximate Computation 

of Edit Distances Involving Block Edits 

For all edit distance variants described above 
which involve blocks, there are no known poly- 
nomial time algorithms; in fact it is NP-hard to 
compute TED(S, R) [1], EDM(S, R) and BED(S, 
R) [10]. However, in case S and R are permutations
of σ, there are polynomial time algorithms
that approximate transposition distance within 
a constant factor: 


Theorem 5 (Bafna-Pevzner [1]) TED(S, R) can
be approximated within a factor of 1.5 in O(n^2)
time.


Furthermore, even if S and R are arbitrary strings
over σ, it is possible to approximately compute
both BED(S, R) and EDM(S, R) in near linear
time. More specifically, one can obtain an embedding of
S and R to binary vectors f(S) and f(R) such that:


Theorem 6 (Muthukrishnan-Sahinalp [12])
||f(S) − f(R)||_1 / log* n ≤ BED(S, R) ≤ ||f(S) − f(R)||_1 · log n.


In other words, the Hamming distance between
f(S) and f(R) approximates BED(S, R) within
a factor of log n · log* n. Similarly for EDM(S, R),
it is possible to embed S and R to integer valued
vectors F(S) and F(R) such that:

Theorem 7 (Cormode-Muthukrishnan [4])
||F(S) − F(R)||_1 / log* n ≤ EDM(S, R) ≤ ||F(S) − F(R)||_1 · log n.

In other words, the L_1 distance between F(S) and
F(R) approximates EDM(S, R) within a factor of
log n · log* n.

The embedding of strings S and R into binary
vectors f(S) and f(R) is introduced in [5] and
is based on the Locally Consistent Parsing
described above. To obtain the embedding, one
needs to hierarchically partition S and R into
growing size core blocks. Given an alphabet
σ, Locally Consistent Parsing can identify
only a limited number of substrings as core
blocks. Consider the lexicographic ordering
of these core blocks. Each dimension i of the
embedding f(S) simply indicates (by setting
f(S)[i] = 1) whether S includes the ith
core block corresponding to the alphabet σ as
a substring. Note that if a core block exists in S
as a substring, Locally Consistent Parsing will
identify it.

Although the embedding above is exponential 
in size, the resulting binary vector f(S) is very 
sparse. A simple representation of f(S) and f(R), 
exploiting their sparseness can be computed in 
time O(n log* n) and the Hamming distance between
f(S) and f(R) can be computed in linear
time by the use of this representation [12]. 

The embedding of S and R into integer valued
vectors F(S) and F(R) is based on similar
techniques. Again, the total time needed to
approximate EDM(S, R) within a factor of log n · log* n
is O(n log* n).


Applications 


Edit distances have important uses in compu- 
tational evolutionary biology, in estimating the 
evolutionary distance between pairs of genome 
sequences under various edit operations. There 
are also several applications to the document ex- 
change problem or document reconciliation prob- 
lem where two copies of a text string S have been 
subject to edit operations (both single character 
and block edits) by two parties resulting in two 
versions S and R, and the parties communicate
to reconcile the differences between the two
versions. An information theoretic lower bound on
the number of bits to communicate between the
two parties is then Ω(BED(S, R)) · log n. The
embedding of S and R to binary strings f(S) and
f(R) provides a simple protocol [5] which gives
a near-optimal tradeoff between the number of 
rounds of communication and the total number of 
bits exchanged and works with high probability. 

Another important application is to the
Sequence Nearest Neighbors (SNN) problem,
which asks to preprocess a set of strings
S_1, ..., S_k so that given an on-line query
string R, the string S_i which has the lowest
distance of choice to R can be computed in time
polynomial in |R| and polylogarithmic in
Σ_{j=1}^{k} |S_j|. There are no known exact solutions
for the SNN problem under any edit distance
considered here. However, in [12], the embedding
of strings S_i into binary vectors f(S_i), combined
with the Approximate Nearest Neighbors results
given in [6] for Hamming Distance, provides an
approximate solution to the SNN problem under
block edit distance as follows.


Theorem 8 (Muthukrishnan-Sahinalp [12])
It is possible to preprocess a set of strings
S_1, ..., S_k from a given alphabet σ in
O(poly(Σ_{j=1}^{k} |S_j|)) time such that for any
on-line query string R from σ one can compute
a string S_i in time O(polylog(Σ_{j=1}^{k} |S_j|) ·
poly(|R|)) which guarantees that for all h ∈
[1, k], BED(S_i, R) ≤ BED(S_h, R) · log(max_j
|S_j|) · log*(max_j |S_j|).


Open Problems 


It is interesting to note that when dealing with
permutations of the alphabet σ the problem of
computing both character edit distances and
block edit distances becomes much easier; one can
compute PED(S, R) exactly and TED(S, R) within
an approximation factor of 1.5 in O(n) time. For
arbitrary strings, it is an open question whether
one can approximate TED(S, R) or BED(S, R)
within a factor of o(log n) in polynomial time.
One recent result in this direction shows that
it is not possible to obtain a polylogarithmic
approximation to TED(S, R) via a greedy
strategy [7]. Furthermore, although there is
a lower bound of Ω(n^{1/3}) on the approximation
factor that can be achieved for computing the
standard edit distance in Õ(n) time by the use of
string embeddings, there is no general lower
bound on how closely one can approximate
ED(S, R) in near linear time.
Problem Definition


The basic group testing problem is to identify 
the unknown set of positive items from a large 
population of items using as few tests as possi- 
ble. A test is a subset of items. A test returns 
positive if there is a positive item in the subset. 
The semantics of “positives,” “items,” and “tests” 
depend on the application. 

In the original context [3], group testing was 
invented to solve the problem of identifying 
syphilis-infected blood samples from a large 
collection of WWII draftees’ blood samples. 
In this case, items are blood samples, which 
are positive if they are infected. A test is 
a pool (group) of blood samples. Testing a 
group of samples at a time will save resources 
if the test outcome is negative. On the other 
hand, if the test outcome is positive, then all
we know is that at least one sample in the 
pool is positive, but we do not know which 
one(s). 

In nonadaptive combinatorial group testing 
(NACGT), we assume that the number of posi- 
tives is at most d for some fixed integer d and 
that all tests have to be specified in advance 
before any test outcome is known. The NACGT 
paradigm has found numerous applications in 
many areas of mathematics, computer science, 
and computational biology [4, 9, 10]. 

A NACGT strategy with t tests on a universe
of N items is represented by a t × N binary matrix
M = (m_ij), where m_ij = 1 iff item j belongs to
test i. Let M_i and M^j denote row i and column
j of M, respectively. Abusing notation, we will
also use M_i (respectively, M^j) to denote the set
of columns (respectively, rows) corresponding to
the 1-entries of row i (respectively, column j). In
other words, M_i is the ith pool, and M^j is the set
of pools that item j belongs to.

Let D ⊆ [N] be the unknown subset of
positive items, where |D| ≤ d. Let y = (y_i)_{i=1}^{t} ∈
{0, 1}^t denote the test outcome vector, i.e., y_i =
1 iff the ith test is positive. Then, the test outcome
vector is precisely the (Boolean) union of the
positive columns: y = ⋃_{j∈D} M^j. The task of
identifying the unknown subset D from the test
outcome vector y is called decoding.


The main problem. In many modern
applications of NACGT, there are two key requirements
for an NACGT scheme:


1. Small number of tests. “Tests” are computa- 
tionally expensive in many applications. 

2. Efficient decoding. As the item universe size
N can be extremely large, it would be ideal for
the decoding algorithm to run in time
sublinear in N and more precisely in poly(d, log N)
time.


Key Results 


To be able to uniquely identify an arbitrary subset
D of at most d positives, it is necessary and
sufficient for the test outcome vectors y to be different
for distinct subsets D of at most d positives. An
NACGT matrix with the above property is called
d-separable. However, in general such matrices
only admit the brute force Ω(N^d)-time decoding
algorithm. A very natural decoding algorithm
called the naive decoding algorithm runs much
faster, in time O(tN).


Definition 1 (Naive decoding algorithm) 
Eliminate all items that participate in negative 
tests; return the remaining items. 


This algorithm does not work for arbitrary d- 
separable matrices. However, if the test matrix 
M satisfies a slightly stronger property called d- 
disjunct, then the naive decoding algorithm is 
guaranteed to work correctly. 
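
The naive decoding algorithm is short enough to write out in full. In the sketch below the test matrix is a list of 0/1 rows, the function names outcome_vector and naive_decode are assumptions for illustration, and the example matrix (whose columns are all 2-element subsets of 4 tests) is chosen because it tolerates one positive item but not two.

```python
def outcome_vector(M, positives):
    """y[i] = 1 iff test i contains at least one positive item."""
    return [int(any(row[j] for j in positives)) for row in M]


def naive_decode(M, y):
    """Eliminate every item that appears in a negative test; O(tN) time."""
    n_items = len(M[0])
    candidate = [True] * n_items
    for row, outcome in zip(M, y):
        if outcome == 0:
            for j, bit in enumerate(row):
                if bit:
                    candidate[j] = False
    return {j for j in range(n_items) if candidate[j]}


# 4 tests, 6 items; column j is the set of tests containing item j.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
one_positive = naive_decode(M, outcome_vector(M, {3}))
two_positives = naive_decode(M, outcome_vector(M, {0, 5}))
print(one_positive)   # the single positive is recovered
print(two_positives)  # with two positives every test is positive, so nothing is eliminated
```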


Definition 2 (Disjunct matrix) A t × N binary
matrix M is said to be d-disjunct iff
M^j \ ⋃_{k∈S} M^k ≠ ∅ for any set S of d columns and
any j ∉ S. (See Fig. 1.)
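
The disjunctness condition says that no column is covered by the union of any d other columns, and it can be checked by brute force on small matrices. The sketch below is for illustration only: the function name is_d_disjunct is an assumption, and the running time is exponential in d, so it is not a practical construction or verification method.

```python
from itertools import combinations


def is_d_disjunct(M, d):
    """Check that no column is contained in the union of d other columns."""
    t, n = len(M), len(M[0])
    # cols[j] is the set of tests (rows) that contain item j.
    cols = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for S in combinations(range(n), d):
        covered = set().union(*(cols[k] for k in S))
        for j in range(n):
            if j not in S and cols[j] <= covered:
                return False
    return True


# Columns are all 2-element subsets of 4 tests: 1-disjunct but not 2-disjunct.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_d_disjunct(M, 1), is_d_disjunct(M, 2), is_d_disjunct(identity, 2))
```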


Minimize Number of Tests 


It is remarkable that d-disjunct matrices not only 
allow for linear time decoding, which is a vast 
improvement over the brute-force algorithm for 
separable matrices, but also have asymptotically 
the same number of tests as d-separable matrices 
[4]. Let t(d, N) denote the minimum number
of rows of an N-column d-disjunct matrix. It
has been known for about 40 years [5] that
t(Ω(√N), N) = Θ(N), and for d = O(√N)
we have

    Ω(d^2 log N / log d) ≤ t(d, N) ≤ O(d^2 log N).    (1)

A t × N d-disjunct matrix with t =
O(d^2 log N) rows can be constructed randomly
or even deterministically (see [11]). However,
the decoding time O(tN) of the naive
decoding algorithm is still too slow for modern
applications, where in most cases d ≪ N and
thus t ≪ N.
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Efficient Decodable 
Group Testing, Fig.1 A 
d-disjunct matrix has the 
following property: for any 
subset S of d (not 
necessarily contiguous) 


columns, and any column a ‘t 00000000000000 
J that is not present in S, 
there exists a row 7 that has 
a 1 incolumn / and all 
zeros in S 
j S 
Efficient Decoding


An ideal decoding time would be in the order
of poly(d, log N), which is sublinear in N for
practical ranges of d. Ngo, Porat, and Rudra [10]
showed how to achieve this goal using a couple of
ideas: (a) two-layer test matrix construction and
(b) code concatenation using a list recoverable
code.


(a) Two-layer test matrix construction The
idea is to construct M by stacking on top of
one another two matrices: a “filtering” matrix F
and an “identification” matrix D. (See Fig. 2.)
The filtering matrix is used to quickly identify
a “small” set of L candidate items including
all the positives. Then, the identification matrix
is used to pinpoint precisely the positives. For
example, let D be any d-disjunct matrix, and suppose that
from the tests corresponding to the rows of F
we can produce a set S of L = poly(d, log N)
candidate items in time poly(d, log N). Then,
by running the naive decoding algorithm on
S using test results corresponding to the rows
of D, we can identify all the positives in time
poly(d, log N). To formalize the notion of
“filtering matrix,” we borrow a concept from
coding theory, where producing a small list
of candidate codewords is the list decoding
problem [6].


Definition 3 (List-disjunct matrix) Let $d + \ell \le N$
be positive integers. A matrix F is $(d, \ell)$-list
disjunct if and only if $\bigcup_{j \in T} F^j \setminus \bigcup_{k \in S} F^k \neq \emptyset$
for any two disjoint sets S and T of columns of
F with $|S| = d$ and $|T| = \ell$. (See Fig. 3.)


Note that a matrix is d-disjunct iff it
is (d, 1)-list disjunct. However, the relaxation
to $\ell = \Theta(d)$ allows the existence (and
construction) of $(d, O(d))$-list-disjunct matrices
with $O(d \log(N/d))$ rows. The existence of such
small list-disjunct matrices is crucially used in
the second idea below.
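
The second stage of the two-layer idea can be sketched as follows. The sketch assumes that the filtering stage has already produced the candidate list (how F is decoded is not modeled here), and the name `identify_positives` is illustrative. Only the L candidate columns of D are inspected, so the work is proportional to L times the number of rows of D rather than to N.

```python
def identify_positives(candidates, D, outcomes_D):
    """Naive decoding on the rows of D, restricted to the candidate items."""
    surviving = set(candidates)
    for row, outcome in zip(D, outcomes_D):
        if outcome == 0:
            # A negative test rules out every candidate that took part in it.
            surviving -= {j for j in surviving if row[j]}
    return sorted(surviving)
```

If D is d-disjunct and the candidate list contains all positives, the surviving items are exactly the positives, by the same argument as for the naive decoder on the full item set.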


(b) Code concatenation with list recoverable
codes A t × N $(d, \ell)$-list-disjunct matrix
admits O(tN) decoding time using the naive
decoding algorithm. However, to achieve
poly(d, log N) decoding time overall, we
need to construct list-disjunct matrices that allow
for a poly(d, log N) decoding time. In particular,
to use such a matrix as a filtering matrix, it
is necessary that $\ell = \mathrm{poly}(d)$. To construct
efficiently decodable list-disjunct matrices, we
need other ideas. Ngo, Porat, and Rudra [10]
used a connection to list recoverable codes [6]
to construct such matrices. This connection
was used to construct $(d, O(d^{3/2}))$-list-disjunct
matrices with $t = o(d^2 \log_d N)$ rows that can
be decoded in poly(t) time. This along with
the construction in Fig. 2 implies the following
result:


Theorem 1 ([10]) Given any d-disjunct matrix,
it can be converted into another matrix with 1 + 
o(1) times as many rows that is also efficiently 
decodable (even if the original matrix was not). 


Other constructions of list-disjunct matrices 
with worse parameters were obtained earlier by 
Indyk, Ngo and Rudra [7], and Cheraghchi [1] 
using connections to expanders and randomness 
extractors.

Efficient Decodable Group Testing, Fig. 2 The vector
x denotes the characteristic vector of the d positives
(illustrated by the orange box). The final matrix is the
stacking of F, which is a $(d, \ell)$-list-disjunct matrix, and
D, which is a d-disjunct matrix. The result vector is
naturally divided into $y_F$ (the part corresponding to F and
denoted by the red vector) and $y_D$ (the part corresponding
to D and denoted by the blue vector). The decoder first
uses $y_F$ to compute a superset of the set of positives
(denoted by green box), which is then used with $y_D$ to
compute the final set of positives. The first step of the
decoding is represented by the red-dotted box, while the
second step (naive decoder) is denoted by the blue-dotted
box


Efficient Decodable Group Testing, Fig. 3 A
$(d, \ell)$-list-disjunct matrix satisfies the following
property: for any subset S of size d and any disjoint
subset T of size $\ell$, there exists a row i that has a 1
in at least one column in T and all zeros in S


Applications 


Heavy hitter is one of the most fundamental 
problems in data streaming [8]. Cormode and 
Muthukrishnan [2] showed that an NACGT 
scheme that is efficiently decodable and is also 
explicit solves a natural version of the heavy 
hitter problem. An explicit construction means
one needs an algorithm that outputs a column
or a specific entry of M instead of storing the
entire matrix M, which can be extremely space
consuming. This is possible with Theorem 1 by
picking the filtering and decoding matrices to be
explicit.

Another important generalization of NACGT
matrices is to those that can handle errors in the
test outcomes. Again this is possible with the
construction of Fig. 2 if the filtering and decoding
matrices are also error tolerant. The list-disjunct
matrices constructed by Cheraghchi are also error
tolerant [1].


Open Problems 


The outstanding open problem in group testing
theory is to close the gap in (1). An explicit
construction of (d,d)-list-disjunct matrices is 
not known; solving this problem will lead to 
a scheme that is (near-)optimal in all desired 
objectives.

Problem Definition


For a hypergraph $H = (V, \mathcal{E})$, a subset of edges
$\mathcal{E}' \subseteq \mathcal{E}$ is an exact cover of H, if every vertex
of V is contained in exactly one hyperedge of
$\mathcal{E}'$, that is, for all $e, f \in \mathcal{E}'$ with $e \neq f$,
$e \cap f = \emptyset$ and $\bigcup \mathcal{E}' = V$. The EXACT COVER
(XC) problem asks for the existence of an exact
cover in a given hypergraph H. Exact Cover is in
Karp’s famous list of 21 NP-complete problems;
it is NP-complete even for 3-element hyperedges
(problem X3C [SP2] in [14]).

Let G be a finite simple undirected graph with
vertex set V and edge set E. A vertex dominates
itself and all its neighbors, i.e., every vertex $v \in V$
dominates its closed neighborhood
$N[v] = \{u \mid u = v \text{ or } uv \in E\}$. A vertex subset D
of G is an efficient dominating (e.d.) set, if, for
every vertex $v \in V$, there is exactly one $d \in D$
dominating v [1,2]. An edge subset M of G is
an efficient edge dominating (e.e.d.) set, if it is an 
efficient dominating set in the line graph L(G) of 
G [15]. Efficient dominating sets are sometimes 
also called independent perfect dominating sets, 
and efficient edge dominating sets are also known 
as dominating induced matchings. 
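
The definition of an e.d. set translates directly into a check. The sketch below assumes the graph is given as an adjacency dictionary mapping each vertex to its neighbors; this representation and the function names are choices made for the illustration.

```python
def closed_neighborhood(adj, v):
    """N[v]: the vertex v together with all of its neighbors."""
    return {v} | set(adj[v])


def is_efficient_dominating_set(adj, D):
    """True iff every vertex is dominated by exactly one vertex of D."""
    D = set(D)
    return all(len(closed_neighborhood(adj, v) & D) == 1 for v in adj)
```

On the path with vertices 0, 1, 2, 3, the set {0, 3} is an e.d. set, while {1} is not because vertex 3 is left undominated.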

The EFFICIENT DOMINATION (ED) problem 
for a graph G asks for the existence of an e.d. 
set in G. The EFFICIENT EDGE DOMINATION 
(EED) problem asks for the existence of an e.d. 
set in the line graph L(G). 

For a graph G, let $\mathcal{N}(G)$ denote its closed
neighborhood hypergraph, that is, for every vertex
$v \in V$, the closed neighborhood N[v] is a
hyperedge in $\mathcal{N}(G)$; note that this is a multiset
since distinct vertices may have the same closed
neighborhood. For a graph G, the square $G^2$ has
the same vertex set as G, and two vertices x and
y are adjacent in $G^2$ if and only if their distance
in G is at most 2. Note that $G^2$ is isomorphic to
$L(\mathcal{N}(G))$.
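
The isomorphism can be made concrete by labelling each hyperedge N[v] with its vertex v: two closed neighborhoods intersect exactly when the two vertices are adjacent or share a neighbor. The sketch below, with illustrative function names and an adjacency-dictionary input, builds both edge sets under this labelling so that they can be compared directly.

```python
from itertools import combinations


def square_edges(adj):
    """Edges of the square: distinct vertices at distance at most 2."""
    edges = set()
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u] or set(adj[u]) & set(adj[v]):
            edges.add((u, v))
    return edges


def neighborhood_line_graph_edges(adj):
    """Edges of the line graph of the closed neighborhood hypergraph,
    where the hyperedge N[v] is labelled by v."""
    closed = {v: {v} | set(adj[v]) for v in adj}
    return {
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if closed[u] & closed[v]  # the two hyperedges intersect
    }
```

For any adjacency dictionary with mutually comparable vertex labels, the two functions return the same set of pairs.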

By definition, the ED problem on a graph G is
the same as the Exact Cover problem on its closed
neighborhood hypergraph $\mathcal{N}(G)$, and the EED
problem is the same as the Exact Cover problem
on $\mathcal{N}(L(G))$.


Key Results 


ED and EED are NP-complete; their complexity 
on special graph classes was studied in various 
papers — see, e.g., [2, 3, 12, 16-18, 20, 22, 24, 25] 
for ED and [5, 7, 11, 15, 19, 21] for EED. In 
particular, ED remains NP-complete for chordal 
graphs as well as for (very restricted) bipartite 
graphs such as chordal bipartite graphs, and EED 
is NP-complete for bipartite graphs but solvable 
in linear time for chordal graphs.

ED for Graphs

A key tool in [8] is a reduction of ED for G
to the maximum-weight independent set problem
for $G^2$, which is based on the following
observation:

For a hypergraph $H = (V, \mathcal{E})$ and $e \in \mathcal{E}$,
let $w(e) := |e|$ be an edge weight function. For
the line graph L(H), let $\alpha_w(L(H))$ denote the
maximum weight of an independent vertex set
in L(H). The weight of any independent vertex
set in L(H) is at most |V|, and H has an exact
cover if and only if $\alpha_w(L(H)) = |V|$. Using the
fact that $G^2$ is isomorphic to $L(\mathcal{N}(G))$ and ED
on G corresponds to Exact Cover on $\mathcal{N}(G)$, this
means that ED on G can be reduced to the
maximum-weight independent set problem on $G^2$,
and similarly for EED. This unified approach helps
to answer some open questions on ED and EED
for graph classes; one example is ED for strongly
chordal graphs: Since for a dually chordal graph
G, its square $G^2$ is chordal, ED is solvable in
polynomial time for dually chordal graphs and
thus for strongly chordal graphs [8] (recall that
ED is NP-complete for chordal graphs). Similar
properties of powers lead to polynomial time
for ED on AT-free graphs using known results
[8]. For $P_5$-free graphs having an e.d. set, $G^2$ is
$P_4$-free [9].
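
The observation can be illustrated with an exhaustive search on small graphs. The sketch below is not one of the polynomial-time algorithms from the cited papers; it enumerates all vertex subsets, gives each vertex v the weight |N[v]|, and uses the fact that independence in the square of G is the same as pairwise disjoint closed neighborhoods. The function name and the adjacency-dictionary input are assumptions of the example.

```python
from itertools import combinations


def has_efficient_dominating_set(adj):
    """Decide ED by brute force via the weighted independent set view."""
    vertices = sorted(adj)
    closed = {v: {v} | set(adj[v]) for v in vertices}
    weight = {v: len(closed[v]) for v in vertices}
    best = 0
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            # Independent in the square <=> closed neighborhoods disjoint.
            if all(not (closed[u] & closed[v])
                   for u, v in combinations(subset, 2)):
                best = max(best, sum(weight[v] for v in subset))
    # An e.d. set exists iff the maximum weight reaches |V|.
    return best == len(vertices)
```

The path on four vertices passes the test (the two end vertices have total weight 4), whereas the cycle on four vertices does not, since any two of its vertices are within distance 2 and a single vertex only has weight 3.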

ED is NP-complete for planar bipartite graphs 
of maximum degree 3 [9]. In [23], this is sharp- 
ened by adding a girth condition: ED is NP- 
complete for planar bipartite graphs of maximum 
degree 3 and girth at least g, for every fixed g. 

From the known results, it follows that ED
is NP-complete for F-free graphs whenever F
contains a cycle or a claw. Thus, F can be
assumed to be cycle- and claw-free (see, e.g.,
[9]); such graphs F are called linear forests.
For $(P_3 + P_3)$-free graphs and thus for $P_7$-free
graphs, ED is NP-complete. ED is robustly
solvable in time O(nm) for $P_5$-free graphs and
for $(P_4 + P_2)$-free graphs [9, 23]. For every
fixed k > 1, ED is solvable in polynomial
time for $(P_5 + kP_2)$-free graphs [4]. For $P_6$-free
graphs, the complexity of ED is an open problem,
and correspondingly for $(P_6 + kP_2)$-free
graphs; these are the only open cases for F-free
graphs.

EED for Graphs

The fact that graphs having an e.e.d. are K4-free 
leads to a simple linear time algorithm for EED 
on chordal graphs. More generally, EED is solv- 
able in polynomial time for hole-free graphs and 
thus for weakly chordal graphs and for chordal 
bipartite graphs [7]. This also follows from the
fact that, for a weakly chordal graph G, $L(G)^2$
is weakly chordal [10] and from the reduction
of EED for G to the maximum-weight independent
set problem for $L(G)^2$. In [23], this is
improved to a robust O(nm) time algorithm for 
EED on hole-free graphs. In [8], we show that 
EED is solvable in linear time for dually chordal 
graphs. 

One of the open problems for EED was its
complexity on $P_k$-free graphs. In [5], we show
that EED is solvable in linear time for $P_7$-free
graphs. The complexity of EED remains open for
$P_k$-free graphs, $k \ge 8$. In [11], EED is solved in
polynomial time on claw-free graphs. EED is NP- 
complete for planar bipartite graphs of maximum 
degree 3 [7]. In [23], it is shown that EED is 
NP-complete for planar bipartite graphs of max- 
imum degree 3 and girth at least g, for every 
fixed g. 


XC, ED, and EED for Hypergraphs 

The notion of $\alpha$-acyclicity [13] is one of the
most important and most frequently studied
hypergraph notions. Among the many equivalent
conditions describing $\alpha$-acyclic hypergraphs, we
take the following: For a hypergraph $H = (V, \mathcal{E})$,
a tree T with node set $\mathcal{E}$ and edge set $E_T$ is a join
tree of H, if, for all vertices $v \in V$, the set of
hyperedges $\mathcal{E}_v := \{e \in \mathcal{E} \mid v \in e\}$ containing
v induces a subtree of T. H is $\alpha$-acyclic, if it
has a join tree. Let $H^* := (\mathcal{E}, \{\mathcal{E}_v \mid v \in V\})$
be the dual hypergraph of H. The hypergraph
$H = (V, \mathcal{E})$ is a hypertree, if there is a tree T
with vertex set V such that, for all $e \in \mathcal{E}$, T[e] is
connected. Obviously, H is $\alpha$-acyclic, if and only
if its dual $H^*$ is a hypertree.

By a result of Duchet, Flament, and Slater
(see, e.g., [6]), it is known that H is a hypertree,
if and only if H has the Helly property and its
line graph L(H) is chordal. In its dual version,
it says that H is $\alpha$-acyclic, if and only if H is
conformal and its 2-section graph is chordal. In
[8], we show:


(i) ED and XC are NP-complete for $\alpha$-acyclic
hypergraphs but solvable in polynomial time
for hypertrees.

(ii) EED is NP-complete for hypertrees but solvable
in polynomial time for $\alpha$-acyclic hypergraphs.

Problem Definition


Multiple sequence alignment is an important 
problem in computational biology. Applications 
include finding highly conserved subregions in 
a given set of biological sequences and inferring 
the evolutionary history of a set of taxa from their 
associated biological sequences (e.g., see [9]). 
A number of measures have been proposed for
evaluating the goodness of a multiple alignment,
but prior to this work, no efficient methods were
known for computing the optimal alignment for
any of these measures. The work of Gusfield
[7] gives two computationally efficient multiple
alignment approximation algorithms for two
of the measures with approximation ratio of
less than 2. For one of the measures, they also
derived a randomized algorithm, which is much
faster and, with high probability, reports a
multiple alignment with small error bounds.
To the best knowledge of the entry authors,
this work is the first to provide approximation
algorithms (with guaranteed error bounds) for this
problem.

Notations and Definitions


Let X and Y be two strings of alphabet $\Sigma$. The
pairwise alignment of X and Y maps X and Y
into strings X' and Y' that may contain spaces,
denoted by ‘_’, where (1) $|X'| = |Y'| = \ell$ and
(2) removing spaces from X' and Y' returns X
and Y, respectively. The score of the alignment
is defined as $d(X', Y') = \sum_{i=1}^{\ell} s(X'(i), Y'(i))$,
where X'(i) (and Y'(i)) denotes the ith character
in X' (and Y') and s(a, b) with $a, b \in \Sigma \cup \{\_\}$ is
the distance-based scoring scheme that satisfies
the following assumptions:


1. s(‘_’, ‘_’) = 0;
2. Triangular inequality: for any three characters
x, y, z, $s(x, z) \le s(x, y) + s(y, z)$.
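
The minimum score over all pairwise alignments of X and Y can be computed by the standard quadratic dynamic program. The sketch below is a generic textbook formulation rather than code from the cited work; the function names are illustrative, and `example_score` is one scheme that meets both assumptions (0 for equal characters, 1 if exactly one character is a space, 2 otherwise).

```python
def example_score(a, b):
    """A scoring scheme satisfying assumptions 1 and 2."""
    if a == b:
        return 0
    if a == "_" or b == "_":
        return 1
    return 2


def optimal_alignment_score(x, y, s, gap="_"):
    """Minimum of d(X', Y') over all pairwise alignments of x and y."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # x[:i] against the empty string
        dp[i][0] = dp[i - 1][0] + s(x[i - 1], gap)
    for j in range(1, m + 1):          # empty string against y[:j]
        dp[0][j] = dp[0][j - 1] + s(gap, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # align two characters
                dp[i - 1][j] + s(x[i - 1], gap),           # x[i-1] against a space
                dp[i][j - 1] + s(gap, y[j - 1]),           # y[j-1] against a space
            )
    return dp[n][m]
```

The table has (|X| + 1)(|Y| + 1) cells and each is filled in constant time, which is the quadratic cost per pairwise alignment assumed in the running-time bounds for strings of similar length.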


Let $\chi = \{X_1, X_2, \ldots, X_k\}$ be a set of $k \ge 2$ strings
of alphabet $\Sigma$. A multiple alignment A of these k
strings maps $X_1, X_2, \ldots, X_k$ to $X_1', X_2', \ldots, X_k'$
that may contain spaces such that (1) $|X_1'| = |X_2'| = \cdots = |X_k'| = \ell$
and (2) removing spaces from $X_i'$ returns $X_i$ for all $1 \le i \le k$.
The multiple alignment A can be represented as a
$k \times \ell$ matrix.


The Sum of Pairs (SP) Measure 


The score of a multiple alignment A, denoted
by SP(A), is defined as the sum
of the scores of pairwise alignments induced
by A, that is,
$\sum_{i<j} d(X_i', X_j') = \sum_{i<j} \sum_{p=1}^{\ell} s(X_i'[p], X_j'[p])$, where $1 \le i < j \le k$.


Problem 1 (Multiple Sequence Alignment
with Minimum SP Score)


INPUT: A set of k strings, a scoring scheme s.
OUTPUT: A multiple alignment A of these k
strings with minimum SP(A).
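
Evaluating the SP objective for a given alignment is straightforward. In the sketch below the alignment is a list of k equal-length strings (the rows of the matrix) with ‘_’ for spaces; the function names and the concrete scoring scheme are choices made for the example.

```python
from itertools import combinations


def example_score(a, b):
    """0 for equal characters, 1 if exactly one is a space, 2 otherwise."""
    if a == b:
        return 0
    if a == "_" or b == "_":
        return 1
    return 2


def sp_score(alignment, s):
    """SP(A): sum over all row pairs of the induced pairwise alignment score."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of a multiple alignment have the same length")
    return sum(
        s(a[p], b[p])
        for a, b in combinations(alignment, 2)
        for p in range(length)
    )
```

For the three rows `AC_`, `A_G`, `ACG`, the induced pairwise scores are 2, 1, and 1, so the SP score is 4.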


The Tree Alignment (TA) Measure

In this measure, the multiple alignment is derived
from an evolutionary tree. For a given set $\chi$ of k
strings, let $\chi' \supseteq \chi$. An evolutionary tree $T_\chi$ for
$\chi$ is a tree with at least k nodes, where there is
a one-to-one correspondence between the nodes
and the strings in $\chi'$. Let $X_u' \in \chi'$ be the string
for node u. The score of $T_\chi$, denoted by $TA(T_\chi)$,
is defined as $\sum_{e=(u,v)} D(X_u', X_v')$, where e is an
edge in $T_\chi$ and $D(X_u', X_v')$ denotes the score
of the optimal pairwise alignment for $X_u'$ and
$X_v'$. Analogously, the multiple alignment of $\chi$
under the TA measure can also be represented
by a $|\chi'| \times \ell$ matrix, where $|\chi'| \ge k$, with a
score defined as $\sum_{e=(u,v)} d(X_u', X_v')$ (e is an
edge in $T_\chi$), similar to the multiple alignment
under the SP measure in which the score is the
summation of the alignment scores of all pairs of
strings. Under the TA measure, it is always
possible to construct the $|\chi'| \times \ell$ matrix such that
$d(X_u', X_v') = D(X_u', X_v')$ for all e = (u, v) in
$T_\chi$, and we are usually interested in finding the
multiple alignment with the minimum TA value,
so $D(X_u', X_v')$ is used instead of $d(X_u', X_v')$ in
the definition of $TA(T_\chi)$.


Problem 2 (Multiple Sequence Alignment
with Minimum TA Score)


INPUT: A set of k strings, a scoring scheme s.
OUTPUT: An evolutionary tree T for these k
strings with minimum TA(T).


Key Results 


Theorem 1 Let A* be the optimal multiple
alignment of the given k strings with minimum SP
score. They provide an approximation algorithm
(the center star method) that gives a multiple
alignment A such that $\frac{SP(A)}{SP(A^*)} \le \frac{2(k-1)}{k} = 2 - \frac{2}{k}$.

The center star method derives a multiple
alignment which is consistent with the optimal
pairwise alignments of a center string with all
the other strings. The bound is derived based on
the triangular inequality of the score function.
The time complexity of this method is $O(k^2 \ell^2)$,
where $\ell^2$ is the time to solve the pairwise alignment
by dynamic programming and $k^2$ is needed
to find the center string, $X_c$, which gives the
minimum value of $\sum_{i \neq c} D(X_c, X_i)$.

Theorem 2 Let A* be the optimal multiple
alignment of the given k strings with minimum
SP score. They provide a randomized algorithm
that gives a multiple alignment A such that
$\frac{SP(A)}{SP(A^*)} \le 2 + \frac{1}{r-1}$ with probability at least
$1 - \left(\frac{r-1}{r}\right)^p$ for any $r > 1$ and $p \ge 1$. Instead
of computing $\binom{k}{2}$ optimal pairwise alignments
to find the best center string, the randomized
algorithm only considers p randomly selected
strings to be candidates for the best center
string; thus, this method needs to compute
only $(k - 1)p$ optimal pairwise alignments in
$O(kp\ell^2)$ time, where $1 \le p \le k$.
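
The difference between the deterministic and the randomized choice of the center string can be sketched as follows. The sketch assumes a caller-supplied function `pair_score(i, j)` that returns the optimal pairwise alignment score of strings i and j; this function and all names below are hypothetical. Building the multiple alignment around the chosen center is not shown.

```python
import random


def center_cost(c, k, pair_score):
    """Sum of optimal pairwise scores between string c and all other strings."""
    return sum(pair_score(c, i) for i in range(k) if i != c)


def choose_center(k, pair_score, candidates=None):
    """Index of the candidate with the smallest total distance to the rest."""
    if candidates is None:
        candidates = range(k)  # deterministic method: try every string
    return min(candidates, key=lambda c: center_cost(c, k, pair_score))


def choose_center_randomized(k, pair_score, p, rng=random):
    """Evaluate only p randomly sampled candidates, with 1 <= p <= k."""
    return choose_center(k, pair_score, rng.sample(range(k), p))
```

Each candidate costs k - 1 calls to `pair_score`, so the randomized variant makes (k - 1)p calls in total, compared with the all-pairs computation of the deterministic method.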


Theorem 3 Let T* be the optimal evolutionary
tree of the given k strings with minimum TA score.
They provide an approximation algorithm that
gives an evolutionary tree T such that
$\frac{TA(T)}{TA(T^*)} \le \frac{2(k-1)}{k} = 2 - \frac{2}{k}$.


In the algorithm, they first compute all the $\binom{k}{2}$
optimal pairwise alignments to construct a graph
with every node representing a distinct string
$X_i$ and the weight of each edge $(X_i, X_j)$ as
$D(X_i, X_j)$. This step determines the overall time
complexity $O(k^2 \ell^2)$. Then, they find a minimum
spanning tree from the graph. The multiple
alignment has to be consistent with the optimal
pairwise alignments represented by the edges of
this minimum spanning tree.
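
The spanning tree step can be sketched with a simple version of Prim's algorithm, assuming the pairwise scores have already been collected in a symmetric k × k matrix `dist` (a hypothetical name). The unoptimized loop below takes cubic time in k, which is enough for an illustration.

```python
def minimum_spanning_tree(dist):
    """Edges (u, v) of a minimum spanning tree of the complete graph on
    range(len(dist)) with edge weights dist[u][v]."""
    k = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < k:
        # Cheapest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in in_tree for b in range(k) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


def tree_score(dist, edges):
    """TA score of the tree: sum of the weights of its edges."""
    return sum(dist[u][v] for u, v in edges)
```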


Applications 


Multiple sequence alignment is a fundamental 
problem in computational biology. In particular, 
multiple sequence alignment is useful in identify- 
ing those common structures, which may only be 
weakly reflected in the sequence and not easily 
revealed by pairwise alignment. These common 
structures may carry important information for 
their evolutionary history, critical conserved mo- 
tifs, and common 3D molecular structure, as well 
as biological functions. 

More recently, multiple sequence alignment has
also been used in revealing noncoding RNAs (ncRNAs)
[2]. In this type of multiple alignment, we
not only align the underlying sequences but
also the secondary structures of the RNAs. Researchers
believe that ncRNAs that belong to the
same family should have common components
giving a similar secondary structure. The multiple
alignment can help to locate and identify these
common components.


Open Problems 


A number of problems related to the work
of Gusfield remain open. For the SP measure,
the center star method can be extended to the
q-star method ($q > 2$) with approximation ratio of
$2 - q/k$ ([1, 10], Sect. 7.5 of [11]). Whether there
exists an approximation algorithm with better
approximation ratio or with better time complexity
is still unknown. For the TA measure, to
the best knowledge of the entry authors, the
approximation ratio in Theorem 3 is currently the
best result.

Another interesting direction related to this 
problem is the constrained multiple sequence 
alignment problem [12] which requires the mul- 
tiple alignment to contain certain aligned charac- 
ters with respect to a given constrained sequence. 
The best known result [6] is an approximation
algorithm (which also follows the idea of the center star
method) that gives an alignment with approximation
ratio of $2 - 2/k$ for k strings.

For the complexity of the problem, Wang and 
Jiang [13] were the first to prove the NP-hardness 
of the problem with SP score under a nonmetric 
distance measure over a 4-symbol alphabet. More 
recently, in [5], the multiple alignment problem 
with SP score, star alignment, and TA score have 
been proved to be NP-hard for all binary or larger 
alphabets under any metric. Developing efficient 
approximation algorithms with good bounds for 
any of these measures is desirable. 


Experimental Results 


Two experiments have been reported in the paper
showing that the worst-case error bounds in
Theorems 1 and 2 (for the SP measure) are pessimistic
compared to the typical situation arising
in practice.

The scoring scheme used in the experiments
is s(a, b) = 0 if a = b; s(a, b) = 1 if either
a or b is a space; otherwise s(a, b) = 2.
Since computing the optimal multiple alignment
with minimum SP score has been shown to
be NP-hard, they evaluate the performance
of their algorithms using the lower bound of
$\sum_{i<j} D(X_i, X_j)$ (recall that $D(X_i, X_j)$ is the
score of the optimal pairwise alignment of $X_i$
and $X_j$). They have aligned 19 similar amino
acid sequences with average length of 60 of
homeoboxes from different species. The ratio of
the scores of reported alignment by the center star
method to the lower bound is only 1.018, which
is far from the worst-case error bound given in
Theorem 1. They also aligned 10 not-so-similar
sequences near the homeoboxes, and the ratio
of the reported alignment to the lower bound
is 1.162. Results also show that the alignment
obtained by the randomized algorithm is usually
not far away from the lower bound.


Data Sets 


The exact sequences used in the experiments are
not provided.

Problem Definition


We consider the following fundamental problem
in scheduling theory. Suppose that there is a set $\mathcal{J}$
of n independent jobs $J_j$ with processing time $p_j$
and a set $\mathcal{P}$ of m nonidentical processors $P_i$ that
run at different speeds $s_i$. If job $J_j$ is executed on
processor $P_i$, then processor $P_i$ needs $p_j / s_i$ time
units to complete the job. The goal is to find an
assignment $a : \mathcal{J} \to \mathcal{P}$ for the jobs to the processors
that minimizes the total length of the schedule
$\max_{i=1,\ldots,m} \sum_{J_j : a(J_j) = P_i} p_j / s_i$. This is the
minimum time needed to complete all jobs on the
processors. The problem is denoted $Q||C_{\max}$ and
it is also called the minimum makespan problem
on uniform parallel processors. For simplicity we
may assume that the number m of processors is
bounded by the number of jobs; otherwise select
only the fastest n machines in O(m) time.
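
The objective can be evaluated for a given assignment as in the following sketch, where `assignment[j]` is the index of the processor that runs job j. The list-based encoding and the function name are choices made for the illustration.

```python
def makespan(processing_times, speeds, assignment):
    """Length of the schedule: the largest completion time over all processors."""
    loads = [0.0] * len(speeds)
    for job, machine in enumerate(assignment):
        loads[machine] += processing_times[job]
    # A processor with speed s finishes a total load of p in p / s time units.
    return max(load / speed for load, speed in zip(loads, speeds))
```

For processing times 4, 2, 2, speeds 2 and 1, and the assignment that puts jobs 0 and 2 on the fast processor, the completion times are 3 and 2, so the makespan is 3.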


Key Results 


The scheduling problem on uniform and also
identical processors is NP-hard [7], and the existence
of a polynomial time algorithm for it would
imply P = NP. Hochbaum and Shmoys [9, 10]
presented a family of polynomial time approximation
algorithms $\{A_\varepsilon \mid \varepsilon > 0\}$ for both scheduling
problems, where each algorithm $A_\varepsilon$ generates
a schedule of length at most $(1 + \varepsilon)\,OPT(I)$ for each
instance I and has running time polynomial in
the input size |I|. Such a family of algorithms is
called a polynomial time approximation scheme
(PTAS). It is allowed that the running time of
each algorithm $A_\varepsilon$ is exponential in $1/\varepsilon$. The
running time of the PTAS for uniform processors
by Hochbaum and Shmoys [10] is $(n/\varepsilon)^{O(1/\varepsilon^2)}$.
Two restricted classes of approximation
schemes were defined to classify faster
approximation schemes. An efficient polynomial
time approximation scheme (EPTAS) is a PTAS
with running time $f(1/\varepsilon)\,\mathrm{poly}(|I|)$ for some
function f, while a fully polynomial time
approximation scheme (FPTAS) runs in time
$\mathrm{poly}(1/\varepsilon, |I|)$, i.e., polynomial in $1/\varepsilon$ and the size
|I| of the instance. Since the scheduling problem
on identical and also uniform processors is
NP-hard in the strong sense (it contains bin
packing as special case), we cannot hope for
an FPTAS. For identical processors, Hochbaum
and Shmoys (see [8]) and Alon et al. [1] gave an
EPTAS with running time $f(1/\varepsilon) + O(n)$, where
f is doubly exponential in $1/\varepsilon$.


Known Techniques 

Hochbaum and Shmoys [9] introduced the 
dual approximation approach for identical and 
uniform processors and used the relationship 
between these scheduling problems and the bin 
packing problem. This relationship between 
scheduling on identical processors and bin 
packing problem had been exploited already by 
Coffman et al. [3]. Using the dual approximation 
approach, Hochbaum and Shmoys [9] proposed a
PTAS for scheduling on identical processors with
running time $(n/\varepsilon)^{O(1/\varepsilon^2)}$.

The main idea in the approach is to guess the
length of the schedule by using binary search
and to consider the corresponding bin packing
instance with scaled identical bin size equal to 1.
Then they distinguish between large items with
size $> \varepsilon$ and small items with size $\le \varepsilon$. For
the large items they use a dynamic programming
approach to calculate the minimum number of
bins needed to pack them all. Afterward, they
pack the remaining small items in a greedy way
in enlarged bins of size $1 + \varepsilon$ (i.e., they pack into
any bin that currently contains items of total size
at most 1; and if no such bin exists, then they open
a new bin).
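
The greedy step for the small items can be sketched as below. The sketch assumes that the large items have already been packed and that `bin_loads` holds the resulting load of each bin; the dynamic program for the large items and the binary search over the schedule length are not shown, and the function name is illustrative.

```python
def pack_small_items(bin_loads, small_items):
    """Greedy packing of small items into bins enlarged to size 1 + eps.

    A bin accepts a further item as long as its current load is at most 1.
    Because every small item has size at most eps, no load ever exceeds
    1 + eps.
    """
    loads = list(bin_loads)
    for size in small_items:
        for i, load in enumerate(loads):
            if load <= 1:
                loads[i] = load + size
                break
        else:
            loads.append(size)  # every bin is already above 1: open a new bin
    return loads
```

A new bin is opened only when every existing bin has load above 1. In that case the total item size exceeds the number of open bins, so no packing into that many bins of size 1 exists.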

Furthermore, Hochbaum and Shmoys (see [8])
and Alon et al. [1] achieved an improvement to
linear time by using an integer linear program
for the cutting stock formulation of bin packing
for the large items and a result on integer linear
programming with a fixed number of variables by
Lenstra [15]. This gives an EPTAS for identical
processors with running time $f(1/\varepsilon) + O(n)$,
where f is doubly exponential in $1/\varepsilon$.

For uniform processors, the decision problem
for the scheduling problem with makespan at
most T can be viewed as a bin packing problem
with different bin sizes. Using an $\varepsilon$-relaxed version
of this bin packing problem, Hochbaum and
Shmoys [10] were also able to obtain a PTAS for
scheduling on uniform processors with running
time $(n/\varepsilon)^{O(1/\varepsilon^2)}$. The main underlying idea in
their algorithm is a clever rounding technique and
a nontrivial dynamic programming approach over
the different bins ordered by their sizes.


New Results 
Recently, Jansen [11] proposed an EPTAS for 
scheduling jobs on uniform machines: 


Theorem 1 ([11]) There is an EPTAS (a family
of algorithms $\{A_\varepsilon \mid \varepsilon > 0\}$) which, given an instance
I of $Q||C_{\max}$ with n jobs and m processors
with different speeds and a positive number $\varepsilon > 0$,
produces a schedule for the jobs of length
$A_\varepsilon(I) \le (1 + \varepsilon)\,OPT(I)$. The running time of
$A_\varepsilon$ is
$2^{O(1/\varepsilon^2 \log^3(1/\varepsilon))} + \mathrm{poly}(n)$.


Interestingly, the running time of the EPTAS
is only single exponential in $1/\varepsilon$.


Integer Linear Programming and Grouping Techniques

The new algorithm uses the dual approximation
method by Hochbaum and Shmoys [10] to transform
the scheduling problem into a bin packing
problem with different bin sizes. Next, the input
is structured by rounding bin sizes and processing
times to values of the form $(1 + \delta)^i$ and $\delta(1 + \delta)^i$
with $i \in \mathbb{Z}$, where $\delta$ depends on $\varepsilon$. After sorting
the bins according to their sizes, $c_1 \ge \ldots \ge c_m$,
three groups of bins are built: $\mathcal{B}_1$ with the
largest K bins (where K is constant). Let G be
the smallest index such that capacity $c_{K+G+1} < \gamma c_K$,
where $\gamma < 1$ depends on $\varepsilon$; such an index G
exists for $c_m < \gamma c_K$. In this case $\mathcal{B}_2$ is the set of
the next G largest bins, where the maximum size
$c_{\max}(\mathcal{B}_2) = c_{K+1}$ divided by the minimum size
$c_{\min}(\mathcal{B}_2) = c_{K+G}$ is bounded by a constant $1/\gamma$,
and $\mathcal{B}_3$ is the set with the remaining smaller bins
of size smaller than $\gamma c_K$. This generates a gap
of constant size between the capacities of bins in
$\mathcal{B}_1$ and $\mathcal{B}_3$. If the ratio $c_m / c_K$, where $c_m$ is the
smallest bin size, is larger than the constant $\gamma$, then a
simpler instance is obtained with only two groups
$\mathcal{B}_1$ and $\mathcal{B}_2$ of bins.
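
The grouping of the bins can be sketched as follows. The constants K and $\gamma$ are taken as given inputs here, because their dependence on $\varepsilon$ is not spelled out in this section; the function name is illustrative. Since the capacities are sorted, splitting the bins after the first K at the threshold $\gamma c_K$ is the same as choosing the smallest index G described above.

```python
def group_bins(capacities, K, gamma):
    """Split bin capacities into the three groups B1, B2, B3.

    B1: the K largest bins.
    B2: the following bins with capacity at least gamma * c_K.
    B3: the remaining bins, all with capacity below gamma * c_K.
    Assumes 1 <= K <= len(capacities) and 0 < gamma < 1.
    """
    c = sorted(capacities, reverse=True)
    B1 = c[:K]
    threshold = gamma * c[K - 1]  # c_K in the 1-based notation of the text
    rest = c[K:]
    B2 = [x for x in rest if x >= threshold]
    B3 = [x for x in rest if x < threshold]
    return B1, B2, B3
```

When no capacity falls below the threshold, the third group is empty, which corresponds to the simpler two-group case.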

For $\mathcal{B}_1$ all packings for the very large items
are computed (those which only fit there). If there
is a feasible packing, then a mixed integer linear
program (MILP) or an integer linear program
(ILP) in the simpler case is used to place the
other items into the bins. The placement of the
large items into the second group $\mathcal{B}_2$ is done
via integral configuration variables, similar to the
ILP formulation for bin packing by Fernandez de
la Vega and Lueker [6]. Fractional configuration
variables are used for the placement of large
items into $\mathcal{B}_3$. Furthermore, additional fractional
variables are taken to place small items into
$\mathcal{B}_1$, $\mathcal{B}_2$, and $\mathcal{B}_3$. The MILP has only a constant
number of integral variables and, therefore, can
be solved via the algorithm by Lenstra or Kannan
[14,15].

To avoid a running time that is
doubly exponential in $1/\varepsilon$, a recent result by
Eisenbrand and Shmonin [5] about integer cones
is used. To apply their result, a system of equalities
for the integral configuration variables is
considered and the corresponding coefficients are
rounded. Then each feasible solution of the modified
MILP contains at most $O(1/\delta \log^2(1/\delta))$
integral variables with values larger than zero. By
choosing the strictly positive integral variables
in the MILP, the number of integral configuration
variables is reduced from $2^{O(1/\delta \log(1/\delta))}$
to $O(1/\delta \log^2(1/\delta))$. The number of choices is
bounded by $2^{O(1/\delta^2 \log^3(1/\delta))}$.

Afterward, the fractional variables in the
MILP solution are rounded to integral values
using ideas from scheduling job shops [13]
and scheduling on unrelated machines [16]. The
effect of the rounding is that most of the items
can be placed directly into the bins. Only a few
of them cannot be placed this way, and here is
where the K largest bins and the gap between
$\mathcal{B}_1$ and $\mathcal{B}_3$ come into play. It can be proved that
these items can be moved to the K largest bins
by increasing their sizes only slightly.

Algorithm Avoiding the MILP

Recently an EPTAS for scheduling on uniform 
machines is presented by Jansen and Robenek 
[12] that avoids the use of an MILP or ILP solver. 
In the new approach instead of solving (M)ILPs, 
an LP-relaxation and structural information about 
the “closest” ILP solution is used. 

In the following the main techniques are described
for identical processors. For a given LP
solution x, the distance to the closest ILP solution
y in the infinity norm is studied, i.e., ||x - y||_∞.
For the constraint matrix A_δ of the considered LP,
this distance is defined by

max-gap(A_δ) := max{ min{ ||y* - x*||_∞ : y* solution of ILP } : x* solution of LP }.


Let C(A_δ) denote an upper bound for max-gap(A_δ).
The running time of the algorithm is
2^{O((1/ε) log(1/ε) log(C(A_δ)))} + poly(n). The algorithm
for uniform processors is more
complex, but we obtain a similar running time
2^{O((1/ε) log(1/ε) log(C(Ã_δ)))} + poly(n), where the
constraint matrix Ã_δ is slightly different. For the
details we refer to [12].

It can be proved using a result by Cook
et al. [4] that C(A_δ), C(Ã_δ) ≤ 2^{O((1/ε) log^2(1/ε))}.
Consequently, the algorithm has a running time
at most 2^{O((1/ε^2) log^3(1/ε))} + poly(n), the same as
in [11]. But, to our best knowledge, no instance
is known to take on the value 2^{O((1/ε) log^2(1/ε))}
for max-gap(A_δ). We conjecture C(A_δ) ≤
poly(1/ε). If that holds, the running time of the
algorithm would be 2^{O((1/ε) log^2(1/ε))} + poly(n)
and thus improve the result in [11].
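
The exponent arithmetic behind these two running-time claims can be checked by hand. The following is only a sketch: it assumes the running time has the form 2^{O((1/ε) log(1/ε) log C)} + poly(n) with C an upper bound on the max-gap, and it ignores the constants hidden in the O-notation.

```latex
\begin{align*}
\text{If } C \le 2^{O\left(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon}\right)}:\quad
\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\cdot\log C
  &= O\!\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}
      \cdot\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon}\right)
   = O\!\left(\frac{1}{\varepsilon^{2}}\log^{3}\frac{1}{\varepsilon}\right),\\[4pt]
\text{If } C \le \mathrm{poly}\!\left(\frac{1}{\varepsilon}\right):\quad
\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\cdot\log C
  &= O\!\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}
      \cdot\log\frac{1}{\varepsilon}\right)
   = O\!\left(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon}\right).
\end{align*}
```

The first line reproduces the bound obtained via the result of Cook et al.; the second shows why a polynomial bound on the max-gap would save a factor of (1/ε) log(1/ε) in the exponent.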


Lower Bounds 

Recently, Chen, Jansen, and Zhang [2] proved the
following lower bound on the running time: For
scheduling on an arbitrary number of identical
machines, denoted by P||Cmax, a polynomial-time
approximation scheme (PTAS) of running
time 2^{O((1/ε)^{1-δ})} poly(n) for any δ > 0 would
imply that the exponential time hypothesis (ETH)
for 3-SAT fails.




Open Problems 


The main open question is whether there is an EPTAS
for scheduling jobs on identical and uniform
machines with a running time 2^{O((1/ε) log^c(1/ε))} poly(n).


Experimental Results 


None is reported.


Problem Definition


In the 50 years since the discovery of the
structure of DNA, and with new techniques for
sequencing the entire genome of organisms,
biology is rapidly moving towards a data-intensive,
computational science. Many of
the newly faced challenges require high-performance
computing, either due to the
massive parallelism required by the problem,
or the difficult optimization problems that are
often combinatorial and NP-hard. Unlike the
traditional uses of supercomputers for regular,
numerical computing, many problems in biology
are irregular in structure, significantly more
challenging to parallelize, and integer-based
using abstract data structures.

Biologists are in search of biomolecular
sequence data, for its comparison with other
genomes, and because its structure determines
function and leads to the understanding of bio- 
chemical pathways, disease prevention and cure, 
and the mechanisms of life itself. Computational 
biology has been aided by recent advances in 
both technology and algorithms; for instance, 
the ability to sequence short contiguous strings 
of DNA and from these reconstruct the whole 
genome and the proliferation of high-speed 
microarray, gene, and protein chips for the study 
of gene expression and function determination. 
These high-throughput techniques have led to an 
exponential growth of available genomic data. 

Algorithms for solving problems from
computational biology often require parallel
processing techniques due to the data- and
compute-intensive nature of the computations.
Many problems use polynomial time algorithms
(e.g., all-to-all comparisons) but have long
running times due to the large number of items
in the input; for example, the assembly of
an entire genome or the all-to-all comparison
of gene sequence data. Other problems
are compute-intensive due to their inherent
algorithmic complexity, such as protein folding
and reconstructing evolutionary histories from
molecular data, that are known to be NP-hard (or
harder) and often require approximations that are
also complex.


Key Results 


None 


Applications 


Phylogeny Reconstruction 

A phylogeny is a representation of the evolu- 
tionary history of a collection of organisms or 
genes (known as taxa). The basic assumption about
the underlying process, necessary for phylogenetic
reconstruction, is repeated divergence within species or genes.
A phylogenetic reconstruction is usually depicted
as a tree, in which modern taxa are depicted at the
leaves and ancestral taxa occupy internal nodes,
with the edges of the tree denoting evolutionary
relationships among the taxa. Reconstructing
phylogenies is a major component of modern
research programs in biology and medicine (as
well as linguistics). Naturally, scientists are in- 
terested in phylogenies for the sake of knowl- 
edge, but such analyses also have many uses in 
applied research and in the commercial arena. 
Existing phylogenetic reconstruction techniques 
suffer from serious problems of running time (or, 
when fast, of accuracy). The problem is particularly
serious for large data sets: even though data
sets consisting of sequence data from a single gene
continue to pose challenges (e.g., some analyses 
are still running after 2 years of computation 
on medium-sized clusters), using whole-genome 
data (such as gene content and gene order) gives 
rise to even more formidable computational prob- 
lems, particularly in data sets with large numbers 
of genes and highly-rearranged genomes. 

To date, almost every model of speciation and 
genomic evolution used in phylogenetic recon- 
struction has given rise to NP-hard optimiza- 
tion problems. Three major classes of methods 
are in common use. Heuristics (a natural conse- 
quence of the NP-hardness of the problems) run 
quickly, but may offer no quality guarantees and 
may not even have a well-defined optimization 
criterion, such as the popular neighbor-joining 
heuristic [9]. Optimization based on the crite- 
rion of maximum parsimony (MP) [4] seeks the 
phylogeny with the least total amount of change 
needed to explain modern data. Finally, optimiza- 
tion based on the criterion of maximum likelihood 
(ML) [5] seeks the phylogeny that is the most 
likely to have given rise to the modern data. 
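
To make the parsimony criterion concrete, the sketch below scores one fixed tree for a single character using Fitch's classical counting rule. It is an illustration only: the nested-tuple tree encoding and the function names are assumptions made here, and it does not search over tree topologies, which is the NP-hard part of the problem.

```python
def fitch_score(tree):
    """Return (candidate_states, changes) for one character on a rooted binary tree.

    A leaf is a one-character string (the observed state); an internal node
    is a 2-tuple (left_subtree, right_subtree). This encoding is hypothetical.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch_score(tree[0])
    right_states, right_changes = fitch_score(tree[1])
    common = left_states & right_states
    if common:
        # The children can agree on a state: no change is needed at this node.
        return common, left_changes + right_changes
    # The children disagree: one change is charged at this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree):
    """Minimum number of state changes needed to explain the leaves of `tree`."""
    return fitch_score(tree)[1]


# Two candidate topologies for the same four observed states A, A, C, C.
grouped = (("A", "A"), ("C", "C"))
mixed = (("A", "C"), ("A", "C"))
print(parsimony_score(grouped), parsimony_score(mixed))
```

Under the MP criterion the topology `grouped` would be preferred, since it explains the observed states with fewer changes than `mixed`.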

Heuristics are fast and often rival the opti- 
mization methods in terms of accuracy, at least 
on datasets of moderate size. Parsimony-based 
methods may take exponential time, but, at least 
for DNA and amino acid data, can often be 
run to completion on datasets of moderate size. 
Methods based on maximum likelihood are very 
slow (the point estimation problem alone ap- 
pears intractable) and thus restricted to very small 
instances, and also require many more assump- 
tions than parsimony-based methods, but appear 
capable of outperforming the others in terms of 
the quality of solutions when these assumptions 
are met. Both MP- and ML-based analyses are 
often run with various heuristics to ensure timely
termination of the computation, with mostly unquantified
effects on the quality of the answers
returned. 

Thus there is ample scope for the application 
of high-performance algorithm engineering in 
the area. As in all scientific computing areas, 
biologists want to study a particular dataset and 
are willing to spend months and even years in 
the process: accurate branch prediction is the 
main goal. However, since all exact algorithms 
scale exponentially (or worse, in the case of 
ML approaches) with the number of taxa, speed 
remains a crucial parameter — otherwise few 
datasets of more than a few dozen taxa could ever 
be analyzed. 


Experimental Results 


As an illustration, this entry briefly describes 
a high-performance software suite, GRAPPA 
(Genome Rearrangement Analysis through 
Parsimony and other Phylogenetic Algorithms) 
developed by Bader et al. GRAPPA extends 
Sankoff and Blanchette’s breakpoint phylogeny 
algorithm [10] into the more biologically- 
meaningful inversion phylogeny and provides 
a highly-optimized code that can make use of 
distributed- and shared-memory parallel systems 
(see [1, 2, 6, 7, 8, 11] for details). In [3], Bader
et al. give the first linear-time algorithm and a fast
implementation for computing inversion distance 
between two signed permutations. GRAPPA 
was run on a 512-processor IBM Linux cluster 
with Myrinet and obtained a 512-fold speed- 
up (linear speedup with respect to the number of 
processors): a complete breakpoint analysis (with 
the more demanding inversion distance used in 
lieu of breakpoint distance) for the 13 genomes 
in the Campanulaceae data set ran in less than 
1.5 h in an October 2000 run, for a million- 
fold speedup over the original implementation. 
The latest version features significantly improved 
bounds and new distance correction methods 
and, on the same dataset, exhibits a speedup 
factor of over one billion. GRAPPA achieves this 
speedup through a combination of parallelism 
and high-performance algorithm engineering.
Although such spectacular speedups will not
always be realized, many algorithmic approaches 
now in use in the biological, pharmaceutical, and 
medical communities may benefit tremendously 
from such an application of high-performance 
techniques and platforms. 
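
For readers unfamiliar with the distance measures mentioned above, the following sketch computes the simpler of the two, the breakpoint distance between two signed gene orders. It is not GRAPPA code and it is not the linear-time inversion distance algorithm of [3]; the function name and the list-of-signed-integers encoding of a linear genome are assumptions made for illustration.

```python
def breakpoint_distance(genome_a, genome_b):
    """Count adjacencies of genome_a that do not occur in genome_b.

    Each genome is a linear signed permutation of 1..n, given as a list of
    nonzero ints. Both are framed with 0 and n + 1 so that the two ends of
    the chromosome are compared as well.
    """
    n = len(genome_a)
    expected = list(range(1, n + 1))
    if sorted(map(abs, genome_a)) != expected or sorted(map(abs, genome_b)) != expected:
        raise ValueError("both genomes must be signed permutations of 1..n")

    framed_a = [0] + list(genome_a) + [n + 1]
    framed_b = [0] + list(genome_b) + [n + 1]

    adjacencies_b = set()
    for x, y in zip(framed_b, framed_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))  # the same adjacency read on the other strand

    return sum((x, y) not in adjacencies_b
               for x, y in zip(framed_a, framed_a[1:]))


identity = [1, 2, 3, 4]
one_inversion = [1, -3, -2, 4]  # genes 2..3 reversed and sign-flipped
print(breakpoint_distance(one_inversion, identity))
```

A single inversion disturbs exactly the two adjacencies at its ends, which is why breakpoint counts are a cheap but coarser proxy for the inversion distance.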

This example indicates the potential of ap- 
plying high-performance algorithm engineering 
techniques to applications in computational 
biology, especially in areas that involve complex 
optimizations: Bader’s reimplementation did 
not require new algorithms or entirely new 
techniques, yet achieved gains that turned an 
impractical approach into a usable one.


Problem Definition


Dealing effectively with applications in large
networks typically requires the efficient solution
of one or more underlying algorithmic problems.
Due to the size of the network, a considerable
effort is inevitable in order to achieve the
desired efficiency in the algorithm.

One of the primary tasks in large network 
applications is to answer queries for finding best 
routes or paths as efficiently as possible. Quite 
often, the challenge is to process a vast number of 
such queries on-line: a typical situation encoun- 
tered in several real-time applications (e.g., traffic 
information systems, public transportation sys- 
tems) concerns a query-intensive scenario, where 
a central server has to answer a huge number 
of on-line customer queries asking for their best 
routes (or optimal itineraries). The main goal in 
such an application is to reduce the (average) 
response time for a query. 

Answering a best route (or optimal itinerary)
query translates into computing a minimum-cost
(shortest) path on a suitably defined directed 
graph (digraph) with nonnegative edge costs. 
This in turn implies that the core algorithmic 
problem underlying the efficient answering of 
queries is the single-source single-target shortest 
path problem. 

Although the straightforward approach of precomputing
and storing shortest paths for all pairs
of vertices would enable the optimal answering
of shortest path queries, the quadratic space
requirements for digraphs with more than 10^5
vertices make such an approach prohibitive for
large and very large networks. For this reason, the
main goal of almost all known approaches is to 
keep the space requirements as small as possible. 
This in turn implies that one can afford a heavy 
(in time) preprocessing, which does not blow up 
space, in order to speed-up the query time. 

The most commonly used approach for an- 
swering shortest path queries employs Dijkstra’s 
algorithm and/or variants of it. Consequently, the 
main challenge is how to reduce the algorithm’s 
search-space (number of vertices visited), as this 
would immediately yield a better query time. 
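
As a baseline for what "search space" means here, the sketch below runs Dijkstra's algorithm for a single source-target query, stops as soon as the target is settled, and reports how many vertices were settled. The adjacency-dictionary encoding and the function name are assumptions made for illustration.

```python
import heapq


def shortest_path_query(graph, source, target):
    """Point-to-point Dijkstra; returns (distance, number_of_settled_vertices).

    `graph` maps each vertex to a list of (neighbor, nonnegative_cost) pairs.
    """
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        if u == target:
            return d, len(settled)  # early termination: the distance is final
        for v, cost in graph.get(u, ()):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf"), len(settled)


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
print(shortest_path_query(graph, "a", "c"))
print(shortest_path_query(graph, "a", "d"))
```

The second component of the result is the quantity that speed-up techniques try to shrink without changing the first.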


Key Results


All results discussed concern answering of 
optimal (or exact or distance-preserving) shortest 
paths under the aforementioned query-intensive 
scenario, and are all based on the following 
generic approach. A preprocessing of the input
network G = (V, E) takes place that results in
a data structure of size O(|V| + |E|) (i.e., linear
in the size of G). The data structure contains
additional information regarding certain shortest 
paths that can be used later during querying. 

Depending on the pre-computed additional 
information as well as on the way a shortest path 
query is answered, two approaches can be distin- 
guished. In the first approach, graph annotation, 
the additional information is attached to vertices 
or edges of the graph. Then, speed-up techniques
for Dijkstra’s algorithm are employed that, based
on this information, decide quickly which part 
of the graph does not need to be searched. In 
the second approach, an auxiliary graph G' 
is constructed hierarchically. A shortest path 
query is then answered by searching only a small 
part of G’, using Dijkstra’s algorithm enhanced 
with heuristics to further speed-up the query 
time. 

In the following, the key results of the 
first [3, 4, 9, 11] and the second approach [1, 2, 
5, 7, 8, 10] are discussed, as well as results 
concerning modeling issues. 


First Approach: Graph Annotation 

The first work under this approach concerns the 
study in [9] on large railway networks. In that 
paper, two new heuristics are introduced: the 
angle-restriction (that tries to reduce the search 
space by taking advantage of the geometric lay- 
out of the vertices) and the selection of sta- 
tions (a subset of vertices is selected among 
which all pairs shortest paths are pre-computed). 
These two heuristics, in combination with
the classical goal-directed or A* search, turned
out to be rather efficient. Moreover, they motivated
two important generalizations [10, 11] that
gave further improvements to shortest path query
times.

The full exploitation of geometry-based
heuristics was investigated in [11], where both 
street and railway networks are considered. In 
that paper, it is shown that the search space 
of Dijkstra’s algorithm can be significantly 
reduced (to 5-10% of the initial graph size) by
extracting geometric information from a given
layout of the graph and by encapsulating
pre-computed shortest path information in the
resulting geometric objects, called containers.
Moreover, the dynamic case of the problem was 
investigated, where edge costs are subject to 
change and the geometric containers have to be 
updated. 

A powerful modification to the classical Dijk- 
stra’s algorithm, called reach-based routing, was 
presented in [4]. Every vertex is assigned a so- 
called reach value that determines whether a par- 
ticular vertex will be considered during Dijkstra’s 
algorithm. A vertex is excluded from considera- 
tion if its reach value is small; that is, if it does 
not contribute to any path long enough to be of 
use for the current query. 

A considerable enhancement of the classical
A* search algorithm using landmarks (selected
vertices as in [9, 10]) and the triangle inequality
with respect to the shortest path distances was
shown in [3]. Landmarks and the triangle inequality
help to provide better lower bounds and hence
boost A* search.
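
The idea of landmark-based lower bounds can be sketched in a few lines. The code below is a simplified illustration, not the implementation of [3]: it assumes a connected undirected graph (so one distance table per landmark suffices), and the function names and graph encoding are hypothetical.

```python
import heapq


def undirected(edges):
    """Build an adjacency dict from (u, v, cost) triples."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    return graph


def distances_from(graph, source):
    """Plain Dijkstra from `source` to every vertex (used in preprocessing)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, cost in graph[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def alt_query(graph, tables, source, target):
    """A* search guided by landmark bounds; returns (distance, settled_count)."""
    def lower_bound(v):
        # Triangle inequality: d(v, target) >= |d(L, target) - d(L, v)|.
        return max((abs(t[target] - t[v]) for t in tables), default=0)

    dist = {source: 0}
    settled = set()
    heap = [(lower_bound(source), source)]
    while heap:
        _, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            return dist[u], len(settled)
        for v, cost in graph[u]:
            nd = dist[u] + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd + lower_bound(v), v))
    return float("inf"), len(settled)


graph = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 1),
                    ("a", "x", 1), ("x", "y", 1)])
tables = [distances_from(graph, "y")]  # a single landmark, chosen by hand
print(alt_query(graph, tables, "a", "d"))
```

In this toy instance the landmark bound steers the search away from the branch through x and y, so only the four vertices on the a-d path are settled. How landmarks are selected is a separate design question that the sketch does not address.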


Second Approach: Auxiliary Graph 

The first work under this approach concerns the
study in [10], where a new hierarchical decomposition
technique is introduced called multi-level
graph. A multi-level graph M is a digraph which
is determined by a sequence of subsets of V and
which extends E by adding multiple levels of
edges. This allows one to construct efficiently, during
querying, a subgraph of M which is substantially
smaller than G and in which the shortest path
distance between any of its vertices is equal to the
shortest path distance between the same vertices
in G. Further improvements of this approach have
been presented recently in [1]. A refinement of
the above idea was given in [5], where
multi-level overlay graphs are introduced. In such
a graph, the decomposition hierarchy is not determined
by application-specific information as
happens in [9, 10].

An alternative hierarchical decomposition 
technique, called highway hierarchies, was 
presented in [7]. The approach takes advantage 
of the inherent hierarchy possessed by real- 
world road networks and computes a hierarchy 
of coarser views of the input graph. Then, the 
shortest path query algorithm considers mainly 
the (much smaller in size) coarser views, thus 
achieving dramatic speed-ups in query time. 
A revision and improvement of this method was 
given in [8]. A powerful combination of the 
highway hierarchies with the ideas in [3] was 
reported in [2]. 


Modeling Issues 

The modeling of the original best route (or
optimal itinerary) problem on a large network
as a shortest path problem in a suitably defined
directed graph with appropriate edge costs also
plays a significant role in reducing the query time.
Modeling issues are thoroughly investigated
in [6]. In that paper, the first experimental
comparison of two important approaches (time-expanded
versus time-dependent) is carried out,
along with new extensions of them towards
realistic modeling. In addition, several new
heuristics are introduced to speed up query
time.


Applications 


Answering shortest path queries in large graphs 
has a multitude of applications, especially in 
traffic information systems under the aforemen- 
tioned scenario; that is, a central server has to 
answer, as fast as possible, a huge number of 
on-line customer queries asking for their best 
routes or itineraries. Other applications of the 
above scenario involve route planning systems 
for cars, bikes and hikers, public transport systems
for itinerary information of scheduled vehicles
(like trains or buses), answering queries
in spatial databases, and web searching. All the
above applications concern real-time systems in 
which users continuously enter their requests for 
finding their best connections or routes. Hence, 
the main goal is to reduce the (average) response 
time for answering a query. 


Open Problems 


Real-world networks increase constantly in size 
either as a result of accumulation of more and 
more information on them, or as a result of the 
digital convergence of media services, commu- 
nication networks, and devices. This scaling-up 
of networks makes the scalability of the under- 
lying algorithms questionable. As the networks 
continue to grow, there will be a constant need 
for designing faster algorithms to support core 
algorithmic problems. 


Experimental Results 


All papers discussed in section “Key Results” 
contain important experimental studies on the 
various techniques they investigate. 


Data Sets 


The data sets used in [6, 11] are available from 
http://lso-compendium.cti.gr/ under problems 26 
and 20, respectively. 

The data sets used in [1, 2] are available from
http://www.dis.uniroma1.it/~challenge9/.


URL to Code 


The code used in [9] is available from
http://doi.acm.org/10.1145/351827.384254.

The code used in [6, 11] is available from 
http://lso-compendium.cti.gr/ under problems 26 
and 20, respectively.

The code used in [3] is available from
http://www.avglab.com/andrew/soft.html.


Problem Definition


Transforming a theoretical geometric algorithm 
into an effective computer program abounds with 
hurdles. Overcoming these difficulties is the con- 
cern of engineering geometric algorithms, which 
deals, more generally, with the design and imple- 
mentation of certified and efficient solutions to 
algorithmic problems of geometric nature. Typ- 
ical problems in this family include the con- 
struction of Voronoi diagrams, triangulations, ar- 
rangements of curves and surfaces (namely, space 
subdivisions), two- or higher-dimensional search 
structures, convex hulls and more. 

Geometric algorithms strongly couple topo- 
logical/combinatorial structures (e.g., a graph de- 
scribing the triangulation of a set of points) on 
the one hand, with numerical information (e.g., 
the coordinates of the vertices of the triangula- 
tion) on the other. Slight errors in the numerical 
calculations, which in many areas of science 
and engineering can be tolerated, may lead to 
detrimental mistakes in the topological structure, 
causing the computer program to crash, to loop 
infinitely, or plainly to give wrong results. 

Straightforward implementation of geometric 
algorithms as they appear in a textbook, using
standard machine arithmetic, is most likely to
fail. This entry is concerned only with certified 
solutions, namely, solutions that are guaranteed 
to construct the exact desired structure or a good 
approximation of it; such solutions are often 
referred to as robust. 

The goal of engineering geometric algorithms 
can be restated as follows: Design and implement 
geometric algorithms that are at once robust and 
efficient in practice. 

Much of the difficulty in adapting in practice 
the existing vast algorithmic literature in compu- 
tational geometry comes from the assumptions 
that are typically made in the theoretical study
of geometric algorithms: (1) the input is in
general position, namely, degenerate input is pre- 
cluded, (2) computation is performed on an ideal 
computer that can carry out real arithmetic to in- 
finite precision (so-called real RAM), and (3) the 
cost of operating on a small number of simple 
geometric objects is “unit” time (e.g., equal cost 
is assigned to intersecting three spheres and to 
comparing two integer numbers). 

Now, in real life, geometric input is quite 
often degenerate, machine precision is limited, 
and operations on a small number of simple 
geometric objects within the same algorithm may 
differ 100-fold and more in the time they take 
to execute (when aiming for certified results). 
Just implementing an algorithm carefully may 
not suffice and often redesign is called for. 


Key Results 


Tremendous efforts have been invested in 
the design and implementation of robust 
computational-geometry software in recent years. 
Two notable large-scale efforts are the CGAL 
library [1] and the geometric part of the LEDA 
library [14]. These are jointly reviewed in the
survey by Kettner and Näher [13]. Numerous
other relevant projects, which due to space
constraints are not reviewed here, are surveyed by
Joswig [12] with extensive references to papers 
and Web sites. 

A fundamental engineering decision to
take when implementing a geometric
algorithm is what the underlying arithmetic will
be, that is, whether to opt for exact computation
or use the machine floating-point arithmetic.
(Other less commonly used options exist as well.) 
To date, the CGAL and LEDA libraries are almost 
exclusively based on exact computation. One 
of the reasons for this exclusivity is that exact 
computation emulates the ideal computer (for 
restricted problems) and makes the adaptation of 
algorithms from theory to software easier. This 
is facilitated by major headway in developing 
tools for efficient computation with rational 
or algebraic numbers (GMP [3], LEDA [14], 
CORE [2] and more). On top of these tools, clever 
techniques for reducing the amount of exact com- 
putation were developed, such as floating-point 
filters and the higher-level geometric filtering. 
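
The floating-point filter idea can be illustrated on the planar orientation predicate. The sketch below is not the CGAL or LEDA interface; the function names are hypothetical, the error bound is the one published by Shewchuk for his adaptive orientation test, and the code assumes finite double-precision inputs with no overflow or underflow in the intermediate products.

```python
from fractions import Fraction

EPS = 2.0 ** -53                      # unit roundoff of IEEE double precision
ERR_BOUND = (3.0 + 16.0 * EPS) * EPS  # Shewchuk's bound for the 2D orientation test


def sign(x):
    return (x > 0) - (x < 0)


def orientation_exact(p, q, r):
    """Sign of the orientation determinant, computed with exact rationals."""
    px, py = Fraction(p[0]), Fraction(p[1])
    qx, qy = Fraction(q[0]), Fraction(q[1])
    rx, ry = Fraction(r[0]), Fraction(r[1])
    return sign((qx - px) * (ry - py) - (qy - py) * (rx - px))


def orientation_filtered(p, q, r):
    """Try floating point first; fall back to exact arithmetic when unsure."""
    left = (q[0] - p[0]) * (r[1] - p[1])
    right = (q[1] - p[1]) * (r[0] - p[0])
    det = left - right
    if abs(det) > ERR_BOUND * (abs(left) + abs(right)):
        return sign(det)  # the rounding error cannot have flipped the sign
    return orientation_exact(p, q, r)  # too close to zero to trust


# +1: counterclockwise turn, 0: collinear, -1: clockwise turn.
print(orientation_filtered((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
print(orientation_filtered((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)))
```

For well-separated inputs only the cheap floating-point branch runs; the exact branch is reached for degenerate or nearly degenerate input, which is where naive floating-point code goes wrong.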

The alternative is to use the machine floating- 
point arithmetic, having the advantage of be- 
ing very fast. However, it is nowhere near the 
ideal infinite precision arithmetic assumed in the 
theoretical study of geometric algorithms and 
algorithms have to be carefully redesigned. See, 
for example, the discussion about imprecision in 
the manual of QHULL, the convex hull program 
by Barber et al. [5]. Over the years a variety 
of specially tailored floating-point variants of 
algorithms have been proposed, for example, the 
carefully crafted VRONI package by Held [11], 
which computes the Voronoi diagram of points 
and line segments using standard floating-point 
arithmetic, based on the topology-oriented ap- 
proach of Sugihara and Iri. While VRONI works 
very well in practice, it is not theoretically cer- 
tified. Controlled perturbation [9] emerges as 
a systematic method to produce certified ap- 
proximations of complex geometric constructs 
while using floating-point arithmetic: the input 
is perturbed such that all predicates are com- 
puted accurately even with the limited-precision 
machine arithmetic, and a method is given to 
bound the necessary magnitude of perturbation 
that will guarantee the successful completion of 
the computation. 

Another decision to take is how to represent 
the output of the algorithm, where the major issue 
is typically how to represent the coordinates of 
vertices of the output structure(s). Interestingly,
this question is crucial when using exact computation
since there the output coordinates can
be prohibitively large or simply impossible to 
finitely enumerate. (One should note though that 
many geometric algorithms are selective only, 
namely, they do not produce new geometric en- 
tities but just select and order subsets of the 
input coordinates. For example, the output of an 
algorithm for computing the convex hull of a set 
of points in the plane is an ordering of a subset 
of the input points. No new point is computed. 
The discussion in this paragraph mostly applies to 
algorithms that output new geometric constructs, 
such as the intersection point of two lines.) But 
even when using floating-point arithmetic, one 
may prefer to have a more compact bit-size rep- 
resentation than, say, machine doubles. In this 
direction there is an effective, well-studied so- 
lution for the case of polygonal objects in the 
plane, called snap rounding, where vertices and 
intersection points are snapped to grid vertices 
while retaining certain topological properties of 
the exact desired structure. Rounding with guar- 
antees is in general a very difficult problem, 
and already for polyhedral objects in 3-space the 
current attempts at generalizing snap rounding 
are very costly (increasing the complexity of 
the rounded objects to the third, or even higher, 
power). 

Then there are a variety of engineering issues 
depending on the problem at hand. Following 
are two examples of engineering studies where 
the experience in practice is different from what 
the asymptotic resource measures imply. The 
examples relate to fundamental steps in many 
geometric algorithms: decomposition and point 
location. 


Decomposition 

A basic step in many geometric algorithms is 
to decompose a (possibly complex) geometric 
object into simpler subobjects, where each 
subobject typically has constant descriptive 
complexity. A well-known example is the
triangulation of a polygon. The choice of
decomposition may have a significant effect on 
the efficiency in practice of various algorithms 
that rely on decomposition. Such is the case
when constructing Minkowski sums of polygons
in the plane. The Minkowski sum of two sets A
and B in R^d is the vector sum of the two sets
A ⊕ B = {a + b | a ∈ A, b ∈ B}. The simplest
approach to computing Minkowski sums of 
two polygons in the plane proceeds in three 
steps: triangulate each polygon, then compute 
the sum of each triangle of one polygon with 
each triangle of the other, and finally take 
the union of all the subsums. In asymptotic 
measures, the choice of triangulation (over 
alternative decompositions) has no effect. In 
practice though, triangulation is probably the 
worst choice compared with other convex 
decompositions, even fairly simple heuristic 
ones (not necessarily optimal), as shown by 
experiments on a dozen different decomposition 
methods [4]. The explanation is that triangulation
increases the overall complexity of the subsums
and in turn makes the union stage more complex,
admittedly only by a constant factor, but a noticeable
factor in practice. Similar phenomena were
observed in other situations as well. For
example, the prevalent vertical
decomposition of arrangements is often too
costly compared with sparser decompositions
(i.e., decompositions that add fewer extra
features).
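
The first two steps of the decomposition approach can be sketched as follows. This is an illustration under simplifying assumptions, not the code evaluated in [4]: the convex pieces are supplied by hand, each pairwise sum is obtained as the convex hull of all vertex sums (correct for convex pieces, though not the fastest method), and the final union step is omitted. All function names are hypothetical.

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_minkowski_sum(a, b):
    """Minkowski sum of two convex polygons given as vertex lists."""
    return convex_hull([(ax + bx, ay + by) for ax, ay in a for bx, by in b])


def pairwise_subsums(pieces_p, pieces_q):
    """One subsum per pair of convex pieces; their union is the full sum."""
    return [convex_minkowski_sum(a, b) for a in pieces_p for b in pieces_q]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangles = [[(0, 0), (1, 0), (1, 1)], [(0, 0), (1, 1), (0, 1)]]

print(convex_minkowski_sum(square, square))
print(len(pairwise_subsums([square], [square])),
      len(pairwise_subsums(triangles, triangles)))
```

Even in this tiny example, triangulating each square quadruples the number of subsums that the union stage has to merge, which is the effect described above.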


Point Location 

A recurring problem in geometric computing
is to process a given planar subdivision (planar
map), so as to efficiently answer point-location
queries: Given a point q in the plane, which
face of the map contains q? Over the years
a variety of point-location algorithms for 
planar maps were implemented in CGAL, in 
particular, a hierarchical search structure that 
guarantees logarithmic query time after expected 
O(n logn) preprocessing time of a map with 
n edges. This algorithm is referred to in CGAL 
as the RIC point-location algorithm after the 
preprocessing method which uses randomized 
incremental construction. Several simpler, easier- 
to-program algorithms for point location were 
also implemented. None of the latter beats the 
RIC algorithm in query time. However, the RIC 
is by far the slowest of all the implemented
algorithms in terms of preprocessing, which in
many scenarios renders it less effective. One 
of the simpler methods devised is a variant 
of the well-known jump-and-walk approach to 
point location. The algorithm scatters points 
(so-called landmarks) in the map and maintains 
the landmarks (together with their containing 
faces) in a nearest-neighbor search structure. 
Once a query q is issued, it finds the nearest
landmark ℓ to q, and “walks” in the map from
ℓ toward q along the straight line segment
connecting them. This landmark approach offers
query time that is only slightly more expensive 
than the RIC method while being very efficient 
in preprocessing. The full details can be found 
in [10]. This is yet another consideration when 
designing (geometric) algorithms: the cost of 
preprocessing (and storage) versus the cost of 
a query. Quite often the effective (practical) 
tradeoff between these costs needs to be deduced 
experimentally. 


Applications 


Geometric algorithms are useful in many areas. 
Triangulations and arrangements are examples 
of basic constructs that have been intensively 
studied in computational geometry, carefully im- 
plemented and experimented with, as well as used 
in diverse applications. 


Triangulations 

Triangulations in two and three dimensions 
are implemented in CGAL [7]. In fact, CGAL 
offers many variants of triangulations useful for 
different applications. Among the applications 
where CGAL triangulations are employed are 
meshing, molecular modeling, meteorology, 
photogrammetry, and geographic information 
systems (GIS). For other available triangulation 
packages, see the survey by Joswig [12]. 


Arrangements

Arrangements of curves in the plane are
supported by CGAL [15], as well as envelopes
of surfaces in three-dimensional space.
Forthcoming is support also for arrangements
of curves on surfaces. CGAL arrangements
have been used in motion planning algorithms,
computer-aided design and manufacturing, GIS,
computer graphics, and more (see Chap. 1 in [6]).


Open Problems 


In spite of the significant progress in certified im- 
plementation of effective geometric algorithms, 
the existing theoretical algorithmic solutions for 
many problems still need adaptation or redesign 
to be useful in practice. One example where 
progress can have wide repercussions is devising 
effective decompositions for curved geometric 
objects (e.g., arrangements) in the plane and for 
higher-dimensional objects. As mentioned ear- 
lier, suitable decompositions can have a signif- 
icant effect on the performance of geometric 
algorithms in practice. 

Certified fixed-precision geometric computing 
lags behind the exact computing paradigm in 
terms of available robust software, and moving 
forward in this direction is a major challenge. 
For example, creating a certified floating-point 
counterpart to CGAL is a desirable (and highly 
intricate) task. 

Another important tool that is largely missing 
is consistent and efficient rounding of geometric 
objects. As mentioned earlier, a fairly satisfactory 
solution exists for polygonal objects in the plane. 
Good techniques are missing for curved objects 
in the plane and for higher-dimensional objects 
(both linear and curved). 


URL to Code 


http://www.cgal.org 


Problem Definition 


Let P be a set of n points in the plane in general 
position, i.e., no three points are collinear. A 
geometric graph on P is a graph on the vertex set 
P whose edges are straight-line segments con- 
necting points in P. A geometric graph is called 
non-crossing (or crossing-free) if any pair of its 
edges does not have a point in common except 
possibly their endpoints. We denote by P(P) the 
set of all non-crossing geometric graphs on P 
(which are also called plane straight-line graphs 
on P). A graph class C(P) ⊆ P(P) can be 
defined by imposing additional properties such 
as connectivity, degree bound, or cycle-freeness. 
Examples of C(P) are the set of triangulations 
(i.e., inclusion-wise maximal graphs in P(P)), 
the set of non-crossing perfect matchings, the set 
of non-crossing spanning k-connected graphs, 
the set of non-crossing spanning trees, and the 
set of non-crossing spanning cycles (i.e., sim- 
ple polygons). The problem is to enumerate all 
graphs in C(P) for a given set P of n points in 
the plane. 

The following notations will be used to 
denote the cardinality of C(P): tri(P) for 
triangulations, pg(P) for plane straight-line 
graphs, st(P) for non-crossing spanning trees, 
and cg(P) for non-crossing spanning connected 
graphs. 


Key Results 


Enumeration of Triangulations 

The first efficient enumeration algorithm for tri- 
angulations was given by Avis and Fukuda [3] as 
an application of their reverse search technique. 
The algorithm relies on well-known properties of 
Delaunay triangulations. 

A triangulation T on P is called Delaunay if 
no point in P is contained in the interior of the 
circumcircle of a triangle in T. If it is assumed for 
simplicity that no four points in P lie on a circle, 
then the Delaunay triangulation on P exists and 
is unique. The Delaunay triangulation has the 
lexicographically largest angle vector among all 
triangulations on P, where the angle vector of a 
triangulation is the list of all the angles sorted in 
nondecreasing order. 

For a triangulation T, a Lawson edge is an 
edge ab which is incident to two triangles, say 
abc and abd in T, and the circumcircle of abc 
contains d in its interior. Flipping a Lawson 
edge ab (i.e., replacing ab with another diagonal 
edge cd) always creates a triangulation having a 
lexicographically larger angle vector. Moreover a 
triangulation has a Lawson edge if and only if it 
is not Delaunay. In other words, any triangulation 
can be converted to the Delaunay triangulation by 
flipping Lawson edges. 
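
The Lawson-edge test described above can be sketched in a few lines. This is only an illustrative sketch, not code from [3]: the function names and the four sample points are assumptions, and integer coordinates are used so that the determinant is evaluated exactly.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    # Translate so that d is the origin, then expand the 3x3 in-circle determinant.
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det flips with the orientation of abc, so normalize by it.
    return det * orient(a, b, c) > 0


def is_lawson_edge(a, b, c, d):
    """Edge ab, shared by triangles abc and abd, is a Lawson edge
    if the circumcircle of abc contains d in its interior."""
    return in_circumcircle(a, b, c, d)


# Hypothetical example: a rhombus whose long diagonal ab is a Lawson edge.
a, b, c, d = (0, 0), (4, 0), (2, 1), (2, -1)
lawson_before_flip = is_lawson_edge(a, b, c, d)   # diagonal ab
lawson_after_flip = is_lawson_edge(c, d, a, b)    # diagonal cd after the flip
```

In this example the first test succeeds and the second fails, matching the statement that flipping a Lawson edge moves the triangulation toward the Delaunay triangulation.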

In the algorithm by Avis and Fukuda, a rooted 
search tree on the set of triangulations is defined 
such that the root is the Delaunay triangulation 
and the parent of a non-Delaunay triangulation T 
is a triangulation obtained by flipping the smallest 
Lawson edge in T (assuming a fixed total ordering 
on edges). Since the Delaunay triangulation 
can be computed in O(n log n) time, all the triangulations 
can be enumerated by tracing the rooted 
search tree based on the reverse search technique. 
A careful implementation achieves O(n · tri(P)) 
time with O(n) space. 

An improved algorithm was given by 
Bespamyatnikh [5], which runs in O(log log n · 
tri(P)) time with O(n) space. His algorithm 
is also based on the reverse search technique, 
but the rooted search tree is defined by using 
the lexicographical ordering of edge vectors 
rather than angle vectors. This approach was 
also applied to the enumeration of pointed 
pseudo-triangulations [4]. See [6] for another 
approach. 


Enumeration of Non-crossing Geometric 
Graphs 

In [3], Avis and Fukuda also developed an enumeration 
algorithm for non-crossing spanning 
trees, whose running time is O(n^3 · st(P)). 
This was improved to O(n log n · st(P)) by 
Aichholzer et al. [1]. They also gave enumeration 
algorithms for plane straight-line graphs and 
non-crossing spanning connected graphs with running 
time O(n log n · pg(P)) and O(n log n · cg(P)), 
respectively. 

Katoh and Tanigawa [8] proposed a simple 
enumeration technique for wider classes of non- 
crossing geometric graphs. The same approach 
was independently given by Razen and Welzl [9] 
for counting the number of plane straight-line 
graphs, and the following description in terms of 
Delaunay triangulations is from [9]. 

Since each graph in C(P) is a subgraph of 
a triangulation, one can enumerate all graphs in 
C(P) by first enumerating all triangulations and 
then enumerating all graphs in C(P) in each 
triangulation. The output may contain duplicates, 
but one can avoid duplicates by enumerating only 
graphs in {G ∈ C(P) | L(T) ⊆ E(G) ⊆ E(T)} 
for each triangulation T, where L(T) denotes the 
set of the Lawson edges in T. This enumeration 
framework leads to an algorithm with time 
complexity O((t′ + log log n) · tri(P) + t · c(P)) 
and space complexity O(n + s) provided that 
graphs in {G ∈ C(P) | L(T) ⊆ E(G) ⊆ 
E(T)} can be enumerated in O(t) time per graph 
with O(t′) time preprocessing and O(s) space 
for each triangulation T. For example, in the 
case of non-crossing spanning trees, one can 
use a fast enumeration algorithm for spanning 
trees in a given undirected graph to solve each 
subproblem, and the current best implementa- 
tion gives an enumeration algorithm for non- 
crossing spanning trees with time complexity 
O(n · tri(P) + st(P)). 
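
As a small illustration of why restricting to L(T) ⊆ E(G) ⊆ E(T) avoids duplicates, the sketch below takes C(P) to be all plane straight-line graphs on four points a, b, c, d in convex position. The edge labels and the choice of ab as the single Lawson edge are assumptions made for the example. The point set has two triangulations, and the two diagonals ab and cd cross, so there are 2^4 · 3 = 48 plane graphs in total.

```python
from itertools import combinations


def graphs_between(lawson, tri_edges):
    """Yield every edge set G with lawson <= G <= tri_edges."""
    free = sorted(tri_edges - lawson)          # edges that may be kept or dropped
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            yield frozenset(lawson) | frozenset(extra)


# Hypothetical input: hull edges plus one diagonal per triangulation.
hull = {"ad", "db", "bc", "ca"}
triangulations = [
    (hull | {"ab"}, {"ab"}),   # non-Delaunay: ab is its only Lawson edge
    (hull | {"cd"}, set()),    # Delaunay: no Lawson edge
]

all_graphs = [g
              for tri_edges, lawson in triangulations
              for g in graphs_between(lawson, tri_edges)]
```

The first triangulation contributes the 16 graphs that contain ab, the second the 32 graphs that do not, so every plane graph is produced exactly once.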

For plane straight-line graphs and spanning 
connected graphs, pg(P) ≥ (√8)^n · tri(P) [9] 
and cg(P) ≥ 1.51^n · tri(P) [8] hold for any 
P in general position. Hence tri(P) is dominated 
by pg(P) and cg(P), respectively, and 
plane straight-line graphs or non-crossing spanning 
connected graphs can be enumerated in 
constant time on average with O(n) space [8]. 
The same technique can be applied to the set of 
non-crossing spanning 2-connected graphs. It is 
not known whether there is a constant c > 1 such 
that st(P) ≥ c^n · tri(P) for every P in general 
position. 

In [8] an approach that avoids enumerating all 
triangulations was also discussed. Suppose that a 
nonempty subset I of P(P) satisfies a monotone 
property, i.e., for every G, G′ ∈ P(P) with 
G ⊆ G′, G′ ∈ I implies G ∈ I, and suppose 
that C(P) is the set of all maximal elements in 
I. Then all graphs in C(P) can be enumerated 
just by enumerating all triangulations T on P 
with L(T) ∈ I, and this can be done efficiently 
based on the reverse search technique. This ap- 
proach leads to an algorithm for enumerating 
non-crossing minimally rigid graphs in O(n”) 
time per output with O(n) space, where a graph 
G = (V, E) is called minimally rigid if |E| = 
2|V| − 3 and |E′| ≤ 2|V′| − 3 for any subgraph 
G′ = (V′, E′) with |V′| ≥ 2. 


Enumeration of Non-crossing Perfect 
Matchings 

Wettstein [10] proposed a new enumeration (and 
counting) technique for non-crossing geometric 
graphs. This is motivated from a counting 
algorithm of triangulations by Alvarez and 
Seidel [2] and can be used for enumerating, 
e.g., non-crossing perfect matchings, plane 
straight-line graphs, convex subdivisions, and 
triangulations. The following is a sketch of the 
algorithm for non-crossing perfect matchings. 

A matching can be reduced to an empty graph 
by removing edges one by one. By fixing a rule 
for which edge to remove in each matching, one can 
define a rooted search tree T on the set of 
non-crossing matchings, and the set of non-crossing 
matchings can be enumerated by tracing T. To 
reduce time complexity, the first idea is to trace 
only a subgraph T′ of T induced by a subclass 
of non-crossing matchings by a clever choice of 
removing edges. Another idea is a compression 
of the search tree T′ by using an equivalence 
relation on the subclass of non-crossing matchings. 
The resulting graph G is a digraph on the set 
of equivalence classes, where there is a one-to- 
one correspondence between non-crossing per- 
fect matchings and directed paths of length n/2 
from the root. A crucial observation is that G has 
at most 2”n? edges while the number of non- 
crossing perfect matchings is known to be at least 
poly(n) · 2^n for any P in general position [7]. 
Hence non-crossing perfect matchings can be 
enumerated in polynomial time on average by 
first constructing G and then enumerating all the 
dipaths of length n/2 in G. It was also noted in 
[10] that the algorithm can be polynomial-time 
delay, but still the space complexity is exponen- 
tial in n. 


Open Problems 


A challenging open problem is to design an 
efficient enumeration algorithm for the set of 
non-crossing spanning cycles, the set of highly 
connected triangulations, or the set of degree- 
bounded triangulations or non-crossing spanning 
trees. It is also not known whether triangulations 
can be enumerated in constant time per output. 


Problem Definition 


Let G = (V,E) be a (directed or undirected) 
graph with n = |V| vertices and m = |E| 
edges. A walk of length k is a sequence of 
vertices v_0, ..., v_k ∈ V such that v_i and v_{i+1} are 
connected by an edge of E, for any 0 ≤ i < k. 
A path π of length k is a walk v_0, ..., v_k such 
that any two vertices v_i and v_j are distinct, for 
0 ≤ i < j ≤ k; this is also called an st-path, where 
s = v_0 and t = v_k. A cycle (or, equivalently, 
elementary circuit) C of length k + 1 is a path 
v_0, ..., v_k such that v_k and v_0 are connected by 
an edge of E. 

We denote by P_st(G) the set of st-paths in 
G for any two given vertices s, t ∈ V and 
by C(G) the set of cycles in G. Given a graph 
G, the problem of st-path enumeration asks for 
generating all the paths in P_st(G). The problem 
of cycle enumeration asks for generating all the 
cycles in C(G). 

We denote by S(G) the set of spanning trees 
in a connected graph G, where a spanning tree 
T ⊆ E is a set of |T| = n − 1 edges such that 
no cycles are contained in T and each vertex in 
V is incident to at least one edge of T. Given a 
connected graph G, the problem of spanning tree 
enumeration asks for generating all the spanning 
trees in S(G). 

Typical costs of enumeration algorithms are 
proportional to the output size times a polynomial 
function of the graph size. Sometimes enumera- 
tion is meant with the stronger property of listing, 
where each solution is explicitly output. In the 
latter case, we define an algorithm for a listing 
problem to be optimally output sensitive if its 
running time is O(n + m + K) where K is the 
following output cost for the enumeration problem 
at hand, namely, P_st(G), C(G), or S(G). 


• K = Σ_{π ∈ P_st(G)} |π|, where |π| is the number 
of nodes in the st-path π. 

• K = Σ_{C ∈ C(G)} |C|, where |C| is the number of 
nodes in the cycle C. 

• K = Σ_{T ∈ S(G)} |T| = |S(G)| · (n − 1) for 
spanning trees. 


Although the above is a notion of optimality 
for listing solutions explicitly, it is possible in 
some cases that the enumeration algorithm can 
efficiently encode the differences between con- 
secutive solutions in the sequence produced by 
the enumeration. This is the case of spanning 
trees, where a cost of K = |S(G)| is possible 
when they are implicitly represented during enu- 
meration. This is called CAT (constant amortized 
time) enumeration in [28]. 


Key Results 


Some possible approaches to attack the enumer- 
ation problems are listed below, where the term 
“search” is meant as an exploration of the space 
of solutions. 

Backtrack search. A backtracking algorithm 
finds the solutions for a listing problem by ex- 
ploring the search space and abandoning a partial 
solution (thus the name “backtracking”) that cannot 
be completed to a valid one. 

Binary partition search. An algorithm divides 
the search space into two parts. In the case of 
graphs, this is generally done by taking an edge 
(or a vertex) and (i) searching for all solutions that 
include that edge (resp. vertex) and (ii) searching 
for all solutions that do not include that edge 
(resp. vertex). Point (i) can sometimes be imple- 
mented by contracting the edge, i.e., merging the 
endpoints of the edge and their adjacency list. 

Differential encoding search. The space of 
solutions is encoded in such a way that consec- 
utive solutions differ by a constant number of 
modifications. Although not every enumeration 
problem has properties that allow such encoding, 
this technique leads to very efficient algorithms. 

Reverse search. This is a general technique to 
explore the space of solutions by reversing a local 
search algorithm. This approach implicitly gener- 
ates a tree of the search space that is traversed by 
the reverse search algorithm. One of the proper- 
ties of this tree is that it has bounded height, a 
useful fact for proving the time complexity of the 
algorithm. 
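
To make the binary partition idea concrete, here is a minimal sketch for spanning trees, written under simplifying assumptions: vertices are numbered 0..n−1, the function name `spanning_trees` is hypothetical, and connectivity is rechecked from scratch at every call, so the running time is far from the optimal bounds discussed in this entry. Each call either includes or excludes the next edge and abandons a branch as soon as the chosen edges plus the remaining edges no longer connect the graph.

```python
from itertools import combinations


def spanning_trees(n, edges):
    """Yield the spanning trees of a graph on vertices 0..n-1 as edge lists."""
    edges = list(edges)

    def union_find(edge_list):
        # Build a union-find over edge_list and return its find function.
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in edge_list:
            parent[find(u)] = find(v)
        return find

    def is_connected(edge_list):
        find = union_find(edge_list)
        return len({find(v) for v in range(n)}) == 1

    def search(i, chosen):
        if len(chosen) == n - 1:              # n-1 acyclic edges form a spanning tree
            yield list(chosen)
            return
        if not is_connected(chosen + edges[i:]):
            return                            # backtrack: no completion exists
        u, v = edges[i]
        find = union_find(chosen)
        if find(u) != find(v):                # (i) solutions that include edges[i]
            yield from search(i + 1, chosen + [edges[i]])
        yield from search(i + 1, chosen)      # (ii) solutions that exclude edges[i]

    yield from search(0, [])


# Usage: the complete graph on four vertices.
k4_trees = list(spanning_trees(4, combinations(range(4), 2)))
```

Because the chosen edges are always acyclic, a branch that passes the connectivity check is guaranteed to contain at least one spanning tree, which is the kind of property that output-sensitive analyses rely on.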

Although there is some literature on tech- 
niques for enumeration problems [38, 39, 41], 
many more techniques and “tricks” have been in- 
troduced when attacking particular problems. For 
a deep understanding of the topic, the reader is 
recommended to review the work of researchers 
such as David Avis, Komei Fukuda, Shin-ichi 
Nakano, and Takeaki Uno. 


Paths and Cycles 

Listing all the cycles in a graph is a classical 
problem whose efficient solutions date back to 
the early 1970s. In particular, at the turn of the 
1970s, several algorithms for enumerating all 
cycles of an undirected graph were proposed. 
There is a vast body of work, and the majority of 
the algorithms listing all the cycles can be divided 
into the following three classes (see [1, 23] for 
excellent surveys). 

Search space algorithms. Cycles are looked 
for in an appropriate search space. In the case 
of undirected graphs, the cycle vector space [6] 
turned out to be the most promising choice: from 
a basis for this space, all vectors are computed, 
and it is tested whether they are a cycle. Since the 
algorithm introduced in [43], many algorithms 
have been proposed: however, the complexity of 
these algorithms turns out to be exponential in the 
dimension of the vector space and thus in n. For 
the special case of planar graphs, the paper in [34] 
describes an algorithm listing all the cycles in 
O((|C(G)| + 1)n) time. 

Backtrack algorithms. All paths are generated 
by backtrack, and, for each path, it is tested 
whether it is a cycle. One of the first algo- 
rithms based on this approach is the one pro- 
posed in [37], which is however exponential in 
|C(G)|. By adding a simple pruning strategy, 
this algorithm was subsequently modified 
in [36]: it lists all the cycles in O(nm(|C(G)| + 
1)) time. Further improvements were proposed 
in [16], [35], and [27], leading to O((|C(G)| + 
1)(m + n)) time algorithms that work for both 
directed and undirected graphs. Apart from the 
algorithm in [37], all the algorithms based on this 
approach are polynomial-time delay, that is, the 
time elapsed between the outputting of two cycles 
is polynomial in the size of the graph (more 
precisely, O(nm) in the case of the algorithm 
of [36] and O(m) in the case of the other three 
algorithms). 

Algorithms using the powers of the adjacency 
matrix. This approach uses the so-called variable 
adjacency matrix, that is, the formal sum of edges 
joining two vertices. A nonzero element of the 
pth power of this matrix is the sum of all walks of 
length p: hence, to compute all cycles, we com- 
pute the mth power of the variable adjacency ma- 
trix. This approach is not very efficient because 
of the non-simple walks. All algorithms based on 
this approach (e.g., [26] and [45]) basically differ 
only in the way they avoid considering walks that 
are neither paths nor cycles. 

For directed graphs, the best known algorithm 
for listing cycles is Johnson’s [16]. It builds 
upon Tarjan’s backtracking search [36], where 
the search starts from the least vertex of each 
strongly connected component. After that, a new 
strongly connected component is discovered, and 
the search starts again from the least vertex in it. 
When exploring a strongly connected component 
with a recursive backtracking procedure, it uses 
an enhanced marking system to avoid visiting the 
same cycle multiple times. A vertex is marked 
each time it enters the backtracking stack. Upon 
leaving the stack, if a cycle is found, then the 
vertex is unmarked. Otherwise, it remains marked 
until another vertex involved in a cycle is popped 
from the stack, and there exists a path of marked 
vertices (not in the stack) between these two 
vertices. This strategy is implemented using a 
collection of lists B, one list per vertex containing 
its marked neighbors not in the stack. Unmarking 
is done by a recursive procedure. The complexity 
of the algorithm is O(n +m + |C(G)|m) time and 
O(n + m) space. 

For undirected graphs, Johnson’s bound can 
be improved with an optimal output-sensitive 
algorithm [2]. First of all, the cycle enumeration 
problem is reduced to the st-path enumeration by 
considering any spanning tree of the given graph 
G and its non-tree edges b_1, b_2, ..., b_r. Then, for 
i = 1, 2, ..., r, the cycles in C(G) can be listed 
as st-paths in G \ {b_1, ..., b_i}, where s and t 
are the endpoints of non-tree edge b_i. Hence, the 
subproblem to be solved with an optimal output- 
sensitive algorithm is the st-path enumeration 
problem. Binary partition search is adopted to 
avoid duplicated output, but the additional ingre- 
dient is the notion of certificate, which is a suit- 
able data structure that maintains the biconnected 
components of the residual graph and guarantees 
that each recursive call thus produces at least 
one solution. Its amortized analysis is based on 
a lower bound on the number of st-paths that can 
be listed in the residual graph, so as to absorb the 
cost of maintaining the certificate. The final cost 
is O(m + n + Σ_{π ∈ P_st(G)} |π|) time and O(n + m) 
space, which is optimal for listing. 
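
The reduction from cycles to st-paths can be sketched as follows. This is a simplified illustration only: it uses plain backtracking for the st-paths and omits the certificate, so it does not achieve the optimal bound of [2]. The function names and the vertex numbering 0..n−1 are assumptions, and the graph is assumed to be connected and simple.

```python
from itertools import combinations


def st_paths(adj, s, t):
    """Yield all simple paths from s to t by plain backtracking."""
    path, on_path = [s], {s}

    def extend(u):
        if u == t:
            yield list(path)
            return
        for v in sorted(adj[u]):
            if v not in on_path:
                path.append(v)
                on_path.add(v)
                yield from extend(v)
                path.pop()
                on_path.remove(v)

    yield from extend(s)


def cycles(n, edges):
    """Yield each cycle of a connected undirected simple graph once, as a vertex list."""
    edges = list(edges)
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Spanning tree by graph search from vertex 0; the other edges are b_1, ..., b_r.
    seen, tree, stack = {0}, set(), [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                stack.append(v)
    non_tree = [e for e in edges if frozenset(e) not in tree]
    for s, t in non_tree:
        # Remove b_i; the edges b_1, ..., b_{i-1} were removed in earlier rounds.
        adj[s].discard(t)
        adj[t].discard(s)
        for p in st_paths(adj, s, t):
            yield p                      # the closing edge (t, s) is b_i


# Usage: the complete graph on four vertices has 4 triangles and 3 four-cycles.
k4_cycles = list(cycles(4, combinations(range(4), 2)))
```

Every cycle contains at least one non-tree edge and is reported in the round of the first such edge, which is why no cycle appears twice.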


Spanning Trees 

Listing combinatorial structures in graphs has 
been a long-time problem of interest. In his 1970 
book [25], Moon remarks that “many papers have 
been written giving algorithms, of varying de- 
grees of usefulness, for listing the spanning trees 
of a graph” (citation taken from [28]). Among 
others, he cites [7,9, 10, 13,42] — some of these 
early papers date back to the beginning of the 
twentieth century. More recently, in the 1960s, 
Minty proposed an algorithm to list all spanning 
trees [24]. 

The first algorithmic solutions appeared in the 
1960s [24] and the combinatorial papers even 
much earlier [25]. Other results from Welch, 
Tiernan, and Tarjan for this and other problems 
soon followed [36, 37,43] and used backtracking 
search. Read and Tarjan presented an algorithm 
taking O(m + n + |S(G)| · m) time and O(m + 
n) space [27]. Gabow and Myers proposed the 
first algorithm [11] which is optimal when the 
spanning trees are explicitly listed, taking O(m + 
n + |S(G)| · n) time and O(m + n) space. 

When the spanning trees are implicitly 
enumerated, Kapoor and Ramesh [17] showed 
that an elegant incremental representation is 
possible by storing just the O(1) information 
needed to reconstruct a spanning tree from 
the previously enumerated one, requiring a 
total of O(m +n + |S(G)|) time and O(mn) 
space [17], later reduced to O(m) space by 
Shioura et al. [32]. These methods use the reverse 
search where the elements are the spanning trees. 
The rule for moving along these elements and 
for their differential encoding is based upon 
the observation that adding a non-tree edge 
and removing a tree edge of the cycle thus 
formed produces another spanning tree from 
the current one. Some machinery is needed to 
avoid duplicated spanning trees and to spend 
O(1) amortized cost per generated spanning 
tree. 
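
The exchange step itself (add a non-tree edge, then remove a tree edge of the cycle thus formed) can be illustrated with the short sketch below. It only generates the neighbors of one spanning tree and does not include the machinery from [17] or [32] for avoiding duplicates; the function name and the vertex numbering 0..n−1 are assumptions.

```python
def exchange_neighbors(n, tree, non_tree_edge):
    """Return the spanning trees obtained from `tree` by adding non_tree_edge
    and removing one tree edge on the cycle this creates."""
    u, v = non_tree_edge
    adj = {x: [] for x in range(n)}
    for a, b in tree:
        adj[a].append(b)
        adj[b].append(a)
    # Search the tree from u and record parents to recover the tree path u..v.
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    cycle_edges = []
    x = v
    while parent[x] is not None:
        cycle_edges.append(frozenset((x, parent[x])))
        x = parent[x]
    base = {frozenset(e) for e in tree}
    added = frozenset(non_tree_edge)
    # Each removal of one tree edge on the cycle yields a different spanning tree.
    return [(base - {e}) | {added} for e in cycle_edges]


# Usage: the path 0-1-2-3 with the extra edge (0, 3) closing a 4-cycle.
neighbors = exchange_neighbors(4, [(0, 1), (1, 2), (2, 3)], (0, 3))
```

Each neighbor differs from the current tree in exactly one added and one removed edge, which is the O(1) information that an implicit enumeration needs to output.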

A simplification of the incremental enumera- 
tion of spanning trees is based on matroids and 
presented by Uno [39]. It is a binary partition 
search giving rise to a binary enumeration tree, 
where the two children calls generated by the 
current call correspond to the fact that the current 
edge is either contracted in O(n) time or deleted 
in O(m − n) time. There is a trimming and 
balancing phase in O(n(m − n)) time: trimming 
removes the edges that do not appear in any of 
the spanning trees that will be generated by the 
current recursive call and contracts the edges that 
appear in all of these spanning trees. Balancing 
splits the recursive calls as in the divide-and- 
conquer paradigm. A crucial property proved 
in [39] is that the residual graph will generate at 
least Ω(n(m − n)) spanning trees, and thus the 
total cost per call, which is dominated by trim- 
ming and balancing, can be amortized as O(1) 
per spanning tree. The method in [39] works 
also for directed spanning trees (arborescences) 
with an amortized O(log n) time cost per directed 
spanning tree. 


Applications 


The classical problem of listing all the cycles of a 
graph has been extensively studied for its many 
applications in several fields, ranging from the 
mechanical analysis of chemical structures [33] 
to the design and analysis of reliable commu- 
nication networks and the graph isomorphism 
problem [43]. Almost 40 years after, the problem 
of efficiently listing all cycles of a graph is still 
an active area of research (e.g., [14, 15,22, 29, 30, 
44]). New application areas have emerged in the 
last decade, such as bioinformatics: for example, 
two algorithms for this problem have been pro- 
posed in [20] and [21] while studying biological 
interaction graphs, with important network prop- 
erties derived for feedback loops, signaling paths, 
and dependency matrix, to name a few. 

When considering weighted cycles, the paper 
in [19] proves that there is no polynomial total 
time algorithm (unless P = NP) to enumerate 
negative-weight (simple) cycles in directed 
weighted graphs. Uno [40] and Ferreira et al. [8] 
considered the enumeration of chordless cycles 
and paths. A chordless or induced cycle (resp., 
path) in an undirected graph is a cycle (resp., 
path) such that the subgraph induced by its ver- 
tices contains exactly the edges of the cycle 
(resp., path). Both chordless cycles and paths are 
very natural structures in undirected graphs with 
an important history, appearing in many papers 
in graph theory related to chordal graphs, perfect 
graphs, and co-graphs (e.g., [4,5,31]), as well 
as many NP-complete problems involving them 
(e.g., [3, 12, 18]). 

As for spanning trees, we refer to the section 
“K-best enumeration” of this book. 


Problem Definition 


A priority queue is an abstract data structure 
that maintains a set Q of elements, each with an 
associated value called a key, under the following 
set of operations [5, 6]: 


insert( Q, x, k ): Inserts element x with key k 
into Q. 

find-min( Q ): Returns an element of Q with the 
minimum key but does not change Q. 

delete( Q, x, k ): Deletes element x with key 
k from Q. 


Additionally, the following operations are often 
supported: 

delete-min( Q ): Deletes an element with the 
minimum key value from Q and returns it. 

decrease-key( Q, x, k ): Decreases the current 
key k′ of x to k, assuming k < k′. 

meld( Q1, Q2 ): Given priority queues Q1 and 
Q2, returns the priority queue Q1 ∪ Q2. 


Observe that a delete-min can be implemented as 
a find-min followed by a delete, a decrease-key 
as a delete followed by an insert, and a meld as 
a series of find-min, delete and insert. However, 
more efficient implementations of decrease-key 
and meld often exist [5, 6]. 

Priority queues have many practical ap- 
plications including event-driven simulation, 
job scheduling on a shared computer, and 
computation of shortest paths, minimum 
spanning forests, minimum cost matching, 
optimum branching, etc. [5,6]. 

A priority queue can trivially be used for sort- 
ing by first inserting all keys to be sorted into the 
priority queue and then by repeatedly extracting 
the current minimum. The major contribution 
of Mikkel Thorup’s 2002 article (Full version 
published in 2007) titled “Equivalence between 
Priority Queues and Sorting” [17] is a reduction 
showing that the converse is also true. Taken 
together, these two results imply that priority 
queues are computationally equivalent to sorting, 
that is, asymptotically, the per key cost of sorting 
is the update time of a priority queue. 
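
The easy direction (sorting with a priority queue) can be sketched as below. The example uses Python's standard `heapq` module, which is an ordinary binary heap with logarithmic updates and is not the structure constructed in [17]; the function name `pq_sort` is an assumption made for the illustration.

```python
import heapq


def pq_sort(keys):
    """Sort by inserting every key and then repeatedly extracting the minimum."""
    heap = []
    for k in keys:
        heapq.heappush(heap, k)                                # insert
    return [heapq.heappop(heap) for _ in range(len(heap))]     # delete-min


sorted_keys = pq_sort([5, 1, 4, 1, 3])
```

With n keys this performs n inserts and n delete-min operations, so the per-key cost of sorting is bounded by the update time of whichever priority queue is plugged in.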

A result similar to those in the current work 
[17] was presented earlier by the same author [14] 
which resulted in monotone priority queues (i.e., 
meaning that the extracted minimums are non- 
decreasing) with amortized time bounds only. In 
contrast, the current work [17] constructs general 
priority queues with worst-case bounds. 

In addition to establishing the equivalence 
between priority queues and sorting, Thorup’s 
reductions [17] are also used to translate several 
known sorting results into new results on priority 
queues. 


Background 

Some relevant background information is 
summarized below which will be useful in 
understanding the key results in section “Key 
Results.” 

• A standard word RAM models what one 
programs in a standard imperative program- 
ming language such as C. In addition to 
direct and indirect addressing and conditional 
jumps, there are functions, such as addition 
and multiplication, operating on a constant 
number of words. The memory is divided 
into words, addressed linearly starting from 0. 
The running time of a program is the number 
of instructions executed and the space is the 
maximal address used. The word length is a 
machine-dependent parameter which is big 
enough to hold a key and at least logarithmic 
in the number of input keys so that they can 
be addressed. 

• A pointer machine is like the word RAM 
except that addresses cannot be manipulated. 

• The AC^0 complexity class consists of 
constant-depth circuits with unlimited fan-in 
[18]. Standard AC^0 operations refer to 
the operations available via C but where the 
functions on words are in AC^0. For example, 
this includes addition but not multiplication. 

• Integer keys will refer to nonnegative integers. 
However, if the input keys are signed integers, 
the correct ordering of the keys is obtained 
by flipping their sign bits and interpreting 
them as unsigned integers. Similar tricks work 
for floating point numbers and integer frac- 
tions [14]. 

• The atomic heaps of Fredman and Willard 
[7] are used in one of Thorup’s reductions 
[17]. These heaps can support updates and 
searches in sets of O(log^2 n) keys in O(1) 
worst-case time [20]. However, atomic heaps 
use multiplication operations which are not 
in AC^0. 
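
The sign-bit trick for signed integer keys mentioned in the list above can be checked with a short sketch. The word length W = 32 and the function name are assumptions made for the example; keys are taken to be W-bit two's-complement integers.

```python
W = 32  # assumed word length in bits


def to_unsigned_key(x):
    """Map a W-bit two's-complement signed integer to an unsigned key
    that sorts in the same order."""
    word = x & ((1 << W) - 1)        # the two's-complement bit pattern of x
    return word ^ (1 << (W - 1))     # flip the sign bit


signed_keys = [7, -1, 0, -2**31, 2**31 - 1, -5, 1]
by_unsigned = sorted(signed_keys, key=to_unsigned_key)
```

Sorting by the transformed key gives the same order as sorting the signed values directly, so an integer priority queue for nonnegative keys also handles signed input.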


Key Results 


The main results in this paper are two reductions 
from priority queues to sorting. The stronger 
of the two, stated in Theorem 1, is for integer 
priority queues running on a standard word RAM. 


Theorem 1 If for some nondecreasing function 
S, up to n integer keys can be sorted in S(n) 
time per key, an integer priority queue can be implemented 
supporting find-min in constant time, 
and updates, i.e., insert and delete, in O (S(n)) 
time. Here n is the current number of keys in 
the queue. The reduction uses linear space. The 
reduction runs on a standard word RAM assum- 
ing that each integer key is contained in a single 
word. 


The reduction above provides the following 
new bounds for linear space integer priority 
queues improving previous bounds given by Han 
[8] and Thorup [14], respectively: 


1. (Deterministic) O (log log n) update time us- 
ing a sorting algorithm by Han [9]. 


2. (Randomized) O(√(log log n)) expected update 
time using a sorting algorithm given by 
Han and Thorup [10]. 


The reduction in Theorem 1 employs atomic 
heaps [7] which, in addition to being very complicated, 
use non-AC^0 operations. The following slightly 
weaker recursive reduction, which does not restrict 
the domain of the keys, is completely 
combinatorial. 


Theorem 2 If for some nondecreasing function 
S, up to n keys can be sorted in S(n) time per key, 
a priority queue can be implemented supporting 
find-min in constant time, and updates in T(n) 
time where n is the current number of keys in the 
queue and T(n) satisfies the recurrence: 


T(n) = O(S(n)) + T(O(log n)) 


The reduction runs on a pointer machine in 
linear space using only standard AC^0 operations. 
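
As a rough numeric illustration of the recurrence in Theorem 2 (not part of [17]), the sketch below unrolls T(n) = S(n) + T(c · log n) for an assumed sorting cost S(n) = log log n. The constant c hidden in O(log n), the base size at which the recursion stops, and the function names are all assumptions.

```python
import math


def unrolled_cost(n, S, c=1.0, base_size=2):
    """Sum S over the arguments n, c*log2(n), c*log2(c*log2(n)), ...
    until the argument drops to base_size or below."""
    total = 0.0
    while n > base_size:
        total += S(n)
        n = c * math.log2(n)
    return total


def loglog(n):
    """Assumed per-key sorting cost S(n) = log2(log2(n)), defined for n > 2."""
    return math.log2(math.log2(n))


n = 2 ** 64
ratio = unrolled_cost(n, loglog) / loglog(n)
```

For this choice of S the later terms shrink quickly, so the total stays within a small constant factor of the first term S(n); this is consistent with the update bounds listed after the theorem, though the sketch is not a proof.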


This reduction implies the following new in- 
teger priority queue bounds not implied by The- 
orem 1, which improve previous bounds given 
by Thorup in 1998 [13] and 1997 [15], respec- 
tively: 


1. (Deterministic in AC^0) O((log log n)^(1+ε)) 
update time for any constant ε > 0 using a 
standard AC^0 sorting algorithm given by Han 
and Thorup [10]. 

2. (Randomized in AC^0) O(log log n) expected 
update time using a randomized AC^0 sorting 
algorithm given by Thorup [15]. 


The Reduction in Theorem 1 
Given a sorting routine that can sort up to n 
keys in S(n) time per key, the priority queue is 
constructed as follows. All keys are assumed to 
be distinct. 

The data structure has two major components: 
a partially sorted list of keys called a base list and 
a set of level buffers (also called update buffers). 
Most keys of the priority queue reside in the base 
list partitioned into logarithmic-sized disjoint sets 
called base sets. While the keys inside any given 
base set are not required to be sorted, each of 
those keys must be larger than every key in the 
base set (if any) appearing before it in the list. 
Keys inside each base set are stored in a doubly 
linked list allowing constant time updates. The 
first base set in the list containing the smallest 
key among all base sets is also maintained in an 
atomic heap so that the current minimum can be 
found in constant time. Each level buffer has a 
different capacity and accumulates updates (in- 
sert/delete) with key values in a different range. 
Smaller level buffers accept updates with smaller 
keys. An atomic heap is used to determine in 
constant time which level buffer collects a new 
update. When a level buffer accumulates enough 
updates, they first enter a sorting phase and then a 
merging phase. In the merging phase each update 
is applied on the proper base set in the key list, 
and invariants on base set size and ranges of level 
buffers are fixed. These phases are not executed 
immediately; instead, they are executed in fixed 
time increments over a period of time. A level 
buffer continues to accept new updates, while 
some updates accepted by it earlier are still in 
the sorting phase, and some even older updates 
are in the merging phase. Every time it accepts a 
new update, O(S(n)) time is spent on the sorting 
phase associated with it and O(1) time on its 
merging phase, including rebalancing of base sets 
and scanning. This strategy allows the sorting and 
merging phases to complete execution by the time 
the level buffer becomes full again, thus keeping 
the movement of updates through different 
phases smooth while maintaining an O(S(n)) 
worst-case time bound per update. Moreover, 
the size and capacity constraints ensure that the 
smallest key in the data structure is available in 
O (1) time. More details are given below. 


The Base List: The base list consists of base sets 
$A_1, A_2, \ldots, A_k$, where $\Phi/4 \le |A_i| \le \Phi$ for 
$i < k$, and $|A_k| \le \Phi$ for some $\Phi = \Theta(\log n)$. 
The exact value of $\Phi$ is chosen carefully 
to make sure that it conforms with the 
requirements of the delicate worst-case base 
set rebalancing protocol used by the reduction. 
The base sets are partitioned by base splitters 
$s_0, s_1, \ldots, s_{k+1}$, where $s_0 = -\infty$, 
$s_{k+1} = \infty$, and for $i = 1, \ldots, k-1$, 
$\max A_{i-1} < s_i \le \min A_i$. If a base set becomes 
too large or too small, it is split or joined with 
an adjacent set, respectively. 

Level Buffers: Among the base splitters, $\ell + 2 = \Theta(\log n)$ 
are chosen to become level splitters 
$t_0, t_1, \ldots, t_\ell, t_{\ell+1}$ with $t_0 = s_0 = -\infty$ and 
$t_{\ell+1} = s_{k+1} = \infty$, so that for $j > 0$, the 
number of keys in the base list below $t_j$ is 
around $4^{j+1}\Phi$. These splitters are placed in 
an atomic heap. As the base list changes, the 
level splitters are moved, as needed, in order 
to maintain their exponential distribution. 
Associated with each level splitter $t_j$, 
$1 \le j \le \ell$, is a level buffer $B_j$ containing keys 
in $[t_{j-1}, t_{j+2})$, where $t_{\ell+2} = \infty$. Buffer $B_j$ 
consists of an entrance buffer, a sorter, and 
a merger, each with capacity for $4^j$ keys. 
Level j works in a cycle of $4^j$ steps. The 
cycle starts with an empty entrance, at most 
$4^j$ updates in the sorter, and a sorted list of at 
most $4^j$ updates in the merger. In each step 
one may accept an update for the entrance, 
spend $S(4^j) = O(S(n))$ time in the sorter, 
and spend O(1) time in merging the sorted list 
in the merger with the $O(4^j)$ base splitters 
in $[t_{j-1}, t_{j+2})$ and scanning for a new $t_j$ 
among them. Therefore, after $4^j$ such steps, 
the sorted list is correctly merged with the 
base list, a new $t_j$ is found, and a new sorted 
list is produced. The sorter then takes the 
role of the merger, the entrance becomes the 
sorter, and the empty merger becomes the new 
entrance. 

Handling Updates: When a new update key k 
(insert/delete) is received, the atomic heap of 
level splitters is used to find in O(1) time the 
$t_j$ such that $k \in [t_{j-1}, t_j)$. If $k \in [t_0, t_1)$, 
its position is identified among the O(1) base 
splitters below $t_1$, and the corresponding base 
set is updated in O(1) time using the doubly 
linked list and the atomic heap (if one exists) over 
the keys of that set. If $k \in [t_{j-1}, t_j)$ for some 
$j > 1$, the update is placed in the entrance 
of $B_j$, performing one step of the cycle of $B_j$ 
in O(S(n)) time. Additionally, during each 
update another splitter $t_r$ is chosen in a round- 
robin fashion, and a step of a cycle of level r 
is executed in O(S(n)) time. This additional 
work ensures that after every $\ell$ updates some 
progress is made on moving each level splitter. 
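
The routing step in the update rule can be illustrated with a short sketch. It is not the construction itself: it assumes, purely for illustration, that the level splitters are kept in a sorted Python list and located with a binary search, which costs O(log l) time instead of the O(1) time of an atomic heap. The function name find_level and the sample splitter values are hypothetical.

```python
import bisect
import math


def find_level(level_splitters, key):
    """Return j such that level_splitters[j-1] <= key < level_splitters[j].

    level_splitters is the sorted list [t_0, t_1, ..., t_{l+1}] with
    t_0 = -inf and t_{l+1} = +inf. A binary search stands in for the
    atomic heap used by the reduction (an assumption of this sketch).
    """
    return bisect.bisect_right(level_splitters, key)


# Hypothetical splitters t_0, ..., t_4.
splitters = [-math.inf, 10, 40, 160, math.inf]
level_of_5 = find_level(splitters, 5)        # key in [t_0, t_1)
level_of_10 = find_level(splitters, 10)      # key in [t_1, t_2)
level_of_1000 = find_level(splitters, 1000)  # key in [t_3, t_4)
```

A key with j = 1 would be applied directly to its base set, while a key with j > 1 would be appended to the entrance of the level buffer for level j.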


A find-min returns the minimum element of 
the base list which is available in O (1) time. 


The Reduction in Theorem 2 

This reduction follows from the previous reduc- 
tion by replacing the atomic heap containing the 
level splitters with a data structure similar to a 
level buffer and the atomic heap over the keys 
of the first base set with a recursively defined 
priority queue satisfying the following recurrence 
for update time: $T(n) = O(S(n)) + T(O(\Phi))$. 
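
To see how such a recurrence unrolls, one can count how many times the problem size shrinks from n to roughly log n before it becomes constant. The sketch below assumes the recursive size is exactly log2 n (ignoring the constant hidden in the O-notation) and that S is nondecreasing; under those assumptions each level costs at most O(S(n)), so the total is O(S(n)) times the depth computed here, which is the iterated logarithm of n. The function name is illustrative.

```python
import math


def iterated_log(n):
    """Number of times log2 must be applied to n before the value is <= 1."""
    depth = 0
    while n > 1:
        n = math.log2(n)
        depth += 1
    return depth


depth_small = iterated_log(65536)      # 65536 -> 16 -> 4 -> 2 -> 1
depth_huge = iterated_log(2 ** 65536)  # one more level than 65536
```

The depth grows extremely slowly, which matches the O(log* n) update time quoted for a hypothetical linear-time sorting algorithm.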


Further Improvement 

Alstrup et al. [1] presented a general reduction 
that transforms a priority queue to support insert 
in O (1) time while keeping the other bounds 
unchanged. This reduction can be used to reduce 
the cost of insertion to a constant in Theorems 1 
and 2. 


Applications 


Thorup’s equivalence results [17] can be used to 
translate known sorting results into new results 
on priority queues for integers and strings in 
different computational models (see section “Key 
Results”). These results can also be viewed as a 
new means of proving lower bounds for sorting 
via priority queues. 

A new RAM priority queue that matches the 
bounds in Theorem 1 and also supports decrease-key 
in O(1) time is presented by Thorup [16]. 
This construction combines Andersson’s exponential 
search trees [2] with the priority queues 
implied by Theorem 1. The reduction in Theorem 1 
is also used by Pagh et al. [12] in order to 
develop an adaptive integer sorting algorithm for 
the word RAM and by Arge and Thorup [3] to 
develop a sorting algorithm that is simultaneously 
I/O efficient and internal memory efficient in the 
RAM model of computation. Cohen et al. [4] use 
a priority queue generated through this reduction 
to obtain a simple and fast amortized imple- 
mentation of a reservoir sampling scheme that 
provides variance optimal unbiased estimation of 
subset sums. Reductions from meldable priority 
queues to sorting presented by Mendelson et 
al. [11] use the reductions from non-meldable 
priority queues to sorting given in [17]. 

An external-memory version of Theorem 1 
has been proved by Wei and Yi [19]. 


Open Problems 


One major open problem is to find a general 
reduction (if one exists) that allows us to decrease 
the value of a key in constant time. Another open 
question is whether the gap between the bounds 
implied by Theorems 1 and 2 can be reduced or 
removed. For example, for a hypothetical 
linear-time sorting algorithm, Theorem 1 implies a 
priority queue with an update time of O(1), while 
Theorem 2 implies only O(log* n)-time updates. 
Problem Definition 


A graph parameter o is a real-valued function 
over graphs that is invariant under graph iso- 
morphism. For example, the average degree of 
the graph, the average distance between pairs 
of vertices, and the minimum size of a vertex 
cover are graph parameters. For a fixed graph 
parameter o and a graph G = (V, E), we would 
like to compute an estimate of o(G). To this 
end we are given query access to G and would 
like to perform this task in time that is sublinear 
in the size of the graph and with high success 
probability. In particular, this means that we do 
not read the entire graph but rather only access 
(random) parts of it (via the query mechanism). 
Our main focus here is on a very basic graph 
parameter: the average degree, denoted d(G). 
The estimation algorithm is given an approximation 
parameter $\epsilon > 0$. It should output a 
value $\hat{d}$ such that with probability at least 2/3 
(over the random choices of the algorithm) it 
holds that $d(G) \le \hat{d} \le (1+\epsilon)\cdot d(G)$. (The 
error probability can be decreased to $2^{-k}$ by 
invoking the algorithm $\Theta(k)$ times and outputting 
the median value.) For any vertex $v \in V = [n]$ of 
its choice, where $[n] = \{1, \ldots, n\}$, the estimation 
algorithm may query the degree of v, denoted 
d(v). We refer to such queries as degree queries. 
In addition, the algorithm may ask for the ith 
neighbor of v, for any $1 \le i \le d(v)$. These 
queries are referred to as neighbor queries. We 
assume for simplicity that G does not contain any 
isolated vertices (so that, in particular, $d(G) \ge 1$). 
This assumption can be removed. 


Key Results 


The problem of estimating the average degree of 
a graph in sublinear time was first studied by 
Feige [7]. He considered this problem when the 
algorithm is allowed only degree queries, so that 
the problem is a special case of estimating the 
average value of a function given query access 
to the function. For a general function $d : [n] \to [n-1]$, 
obtaining a constant-factor estimate of 
the average value of the function (with constant 
success probability) requires $\Omega(n)$ queries to the 
function. Feige showed that when d is the degree 
function of a graph, for any $\gamma \in (0, 1]$, it is 
possible to obtain an estimate of the average 
degree that is within a factor of $(2 + \gamma)$ by 
performing only $O(\sqrt{n}/\gamma)$ (uniformly selected) 
queries. He also showed that in order to go below 
a factor of 2 in the quality of the estimate, $\Omega(n)$ 
queries are necessary. 

However, given that the object in question is 
a graph, it is natural to allow the algorithm to 
query the neighborhood of vertices of its choice 
and not only their degrees; indeed, the afore- 
mentioned problem definition follows this natural 
convention. Goldreich and Ron [10] showed that 
by giving the algorithm this extra power, it is 
possible to break the factor-2 barrier. They provide 
an algorithm that, given $\epsilon > 0$, outputs a 
$(1+\epsilon)$-factor estimate of the average degree (with 
probability at least 2/3) after performing 
$O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\epsilon))$ degree and neighbor queries. In 
fact, since a degree query to vertex v can be 
replaced by $O(\log d(v)) = O(\log n)$ neighbor 
queries, which implement a binary search, degree 
queries are not necessary. Furthermore, when the 
average degree increases, the performance of the 
algorithm improves, as stated next. 
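
The replacement of a degree query by neighbor queries can be sketched as follows. The sketch assumes a hypothetical oracle neighbor(v, i) that returns the ith neighbor of v (1-indexed) and returns None when i exceeds d(v); the entry does not specify how out-of-range queries are answered, so this convention is an assumption. An exponential search followed by a binary search then uses O(log d(v)) queries.

```python
def degree_via_neighbor_queries(neighbor, v, n):
    """Recover d(v) using only neighbor queries.

    neighbor(v, i) is assumed to return the i-th neighbor of v, or None
    if i > d(v). The graph has n vertices and no isolated vertices, so
    1 <= d(v) <= n - 1.
    """
    lo, hi = 1, 2
    # Exponential search: double hi until position hi is out of range.
    while hi <= n - 1 and neighbor(v, hi) is not None:
        lo, hi = hi, 2 * hi
    hi = min(hi, n)  # position n never exists because d(v) <= n - 1
    # Invariant: position lo exists, position hi does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if neighbor(v, mid) is not None:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical example graph: a star with center 0 and five leaves.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}


def star_neighbor(v, i):
    return star[v][i - 1] if 1 <= i <= len(star[v]) else None
```

For the star above, the function recovers degree 5 for the center and degree 1 for each leaf.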


Theorem 1 There exists an algorithm that 
makes only neighbor queries to the input 
graph and satisfies the following condition. 
On input G = (V, E) and $\epsilon \in (0, 1)$, 
with probability at least 2/3, the algorithm 
halts within $O(\sqrt{n/d(G)}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$ 
steps and outputs a value in 
$[d(G), (1+\epsilon)\cdot d(G)]$. 


The running time stated in Theorem 1 is 
essentially optimal in the sense that (as shown 
in [10]) a $(1+\epsilon)$-factor estimate requires 
$\Omega(\sqrt{n/(\epsilon d(G))})$ queries, for every value 
of n, for $d(G) \in [2, o(n)]$, and for 
$\epsilon \in [\omega(n^{-1/4}), o(n/d(G))]$. 

The following is a high-level description of 
the algorithm and the ideas behind its analysis. 
For the sake of simplicity, we only show how 
to obtain a $(1+\epsilon)$-factor estimate by performing 
$O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\epsilon))$ queries (under the 
assumption that $d(G) \ge 1$). For the sake of 
the presentation, we also allow the algorithm to 
perform degree queries. We assume that $\epsilon \le 1/2$, 
or else we run the algorithm with $\epsilon = 1/2$. We 
first show how to obtain a $(2+\epsilon)$-approximation 
by performing only degree queries and then explain 
how to improve the approximation by using 
neighbor queries as well. 

Consider a partition of the graph vertices into 
buckets $B_1, \ldots, B_r$, where 

$B_i = \{v : (1+\epsilon/8)^{i-1} < d(v) \le (1+\epsilon/8)^i\}$ (1) 

and $r = O(\log n/\epsilon)$. By this definition, 

$\frac{1}{n}\sum_{i=1}^{r} |B_i| (1+\epsilon/8)^i \in [d(G), (1+\epsilon/8)\cdot d(G)]$. (2) 
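
The bucketing in Eqs. (1) and (2) can be checked on a concrete degree sequence. The sketch below is only an illustration: the function names and the sample degrees are hypothetical, and the bucket index is found by a simple loop rather than a logarithm to avoid floating-point edge cases. Note that with this definition a vertex of degree 1 lands in bucket 0.

```python
def bucket_index(degree, eps):
    """Smallest i >= 0 with degree <= (1 + eps/8)**i.

    By minimality, (1 + eps/8)**(i-1) < degree <= (1 + eps/8)**i.
    """
    base = 1 + eps / 8
    i = 0
    while base ** i < degree:
        i += 1
    return i


def bucketed_average(degrees, eps):
    """Left-hand side of Eq. (2): every vertex is rounded up to the top
    of its bucket, then the rounded degrees are averaged."""
    base = 1 + eps / 8
    return sum(base ** bucket_index(d, eps) for d in degrees) / len(degrees)


degrees = [1, 1, 2, 3, 3, 4, 10]   # hypothetical degree sequence
eps = 0.5
true_average = sum(degrees) / len(degrees)
rounded_average = bucketed_average(degrees, eps)
```

Since each degree is rounded up by a factor below 1 + eps/8, the rounded average stays within the interval of Eq. (2).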
Suppose we could obtain an estimate $b_i$ of 
the size of each bucket $B_i$ such that 
$b_i \in [(1-\epsilon/8)|B_i|, (1+\epsilon/8)|B_i|]$. Then 

$\frac{1}{n}\sum_{i=1}^{r} b_i (1+\epsilon/8)^i \in [(1-\epsilon/8)\cdot d(G), (1+3\epsilon/8)\cdot d(G)]$. (3) 

Now, for each $B_i$, if we uniformly at random 
select $\Omega\left(\frac{n}{|B_i|}\cdot\frac{\log r}{\epsilon^2}\right)$ vertices, then, by 
a multiplicative Chernoff bound, with probability 
$1 - O(1/r)$, the fraction of sampled 
vertices that belong to $B_i$ is in the interval 
$\left[(1-\epsilon/8)\frac{|B_i|}{n}, (1+\epsilon/8)\frac{|B_i|}{n}\right]$. By querying the 
degree of each sampled vertex, we can determine 
to which bucket it belongs and obtain an 
estimate of $|B_i|$. Unfortunately, if $B_i$ is much 
smaller than $\sqrt{n}$, then the sample size required 
to estimate $|B_i|$ is much larger than the desired 
$O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\epsilon))$. Let 
$L = \{i : |B_i| \ge \sqrt{\epsilon n}/8r\}$ denote the set of indices of 
large buckets. The basic observation is that if, 
for each $i \in L$, we have an estimate 
$b_i \in [(1-\epsilon/8)|B_i|, (1+\epsilon/8)|B_i|]$, then 

$\frac{1}{n}\sum_{i \in L} b_i (1+\epsilon/8)^i \in [(1/2-\epsilon/4)\cdot d(G), (1+3\epsilon/8)\cdot d(G)]$. (4) 

The reasoning is essentially as follows. Recall 
that $\sum_v d(v) = 2|E|$. Consider an edge (u, v) 
where $u \in B_j$ and $v \in B_k$. If $j, k \in L$, 
then this edge contributes twice to the sum 
in Eq. (4): once when i = j and once when 
i = k. If $j \in L$ and $k \notin L$ (or vice versa), 
then this edge contributes only once. Finally, 
if $j, k \notin L$, then the edge does not contribute 
at all, but there are at most $\epsilon n/8$ edges of this 
latter type. Since it is possible to obtain such 
estimates $b_i$ for all $i \in L$ simultaneously, 
with constant success probability, by sampling 
$O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\epsilon))$ vertices, we can get a 
$(2+\epsilon)$-factor estimate by performing this number 
of degree queries. Recall that we cannot obtain 
an approximation factor below 2 by performing 
o(n) queries if we use only degree queries. 
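
A much simplified sketch of the degree-query estimator is given below. It samples vertices uniformly, estimates the fraction of vertices in each bucket, discards buckets whose estimated fraction falls below the large-bucket threshold, and returns the resulting value of the sum in Eq. (4). The sample size, the exact threshold applied to the estimated (rather than true) bucket sizes, and all names are assumptions of this sketch and do not reproduce the analysis in [10].

```python
import math
import random


def estimate_sum_over_large_buckets(degree, n, eps, num_samples, rng):
    """Estimate (1/n) * sum over large buckets of |B_i| * (1 + eps/8)**i.

    degree(v) answers a degree query for a vertex v in range(n).
    """
    base = 1 + eps / 8
    r = math.ceil(math.log(n, base)) + 1  # upper bound on the number of buckets
    counts = {}
    for _ in range(num_samples):
        d = degree(rng.randrange(n))
        i = 0
        while base ** i < d:              # bucket index of the sampled vertex
            i += 1
        counts[i] = counts.get(i, 0) + 1
    # |B_i| >= sqrt(eps * n) / (8 r), expressed as a fraction of n.
    threshold = math.sqrt(eps / n) / (8 * r)
    total = 0.0
    for i, count in counts.items():
        fraction = count / num_samples
        if fraction >= threshold:
            total += fraction * base ** i
    return total


# On a 2-regular graph every sample falls into the same bucket,
# so the output does not depend on the random choices.
cycle_estimate = estimate_sum_over_large_buckets(
    lambda v: 2, n=100, eps=0.5, num_samples=50, rng=random.Random(0))
```

On a regular graph no edge has an endpoint in a small bucket, so the returned value already lies between d(G) and (1 + eps/8) times d(G); the factor-2 loss arises only when many edges have exactly one endpoint in a small bucket.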

In order to obtain the desired factor of $(1+\epsilon)$, 
we estimate the number of edges (u, v) such 
that $u \in B_j$ and $v \in B_k$ with $j \in L$ and 
$k \notin L$, which are counted only once in Eq. (4). 
Here is where neighbor queries come into play. 
For each $i \in L$ (more precisely, for each i 
such that $b_i$ is sufficiently large), we estimate 
$e_i = |\{(u, v) : u \in B_i, v \in B_k \text{ for } k \notin L\}|$. 
This is done by uniformly sampling neighbors 
of vertices in $B_i$, querying their degree, and 
thereby estimating the fraction of edges incident 
to vertices in $B_i$ whose other endpoint belongs 
to $B_k$, for $k \notin L$. If we denote the estimate 
of $e_i$ by $\hat{e}_i$, then by performing 
$O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\epsilon))$ neighbor queries, 
with high constant probability, the $\hat{e}_i$’s are such 
that 

$\frac{1}{n}\sum_{i \in L} \left(b_i (1+\epsilon/8)^i + \hat{e}_i\right) \in [(1-\epsilon/2)\cdot d(G), (1+\epsilon/2)\cdot d(G)]$. (5) 

By dividing the left-hand side in Eq. (5) by 
$(1-\epsilon/2)$, we obtain the $(1+\epsilon)$-factor we sought. 


Estimating the Average Distance 

Another graph parameter considered in [10] is 
the average distance between vertices. For this 
parameter, the algorithm is given access to a 
distance-query oracle. Namely, it can query the 
distance between any two vertices of its choice. 
As opposed to the average degree parameter 
where neighbor queries could be used to improve 
the quality of the estimate (and degree queries 
were not actually necessary), distance queries are 
crucial for estimating the average distance, and 
neighbor queries are not of much use. The main 
(positive) result concerning the average distance 
parameter is stated next. 


Theorem 2 There exists an algorithm that 
makes only distance queries to the input graph 
and satisfies the following condition. On input 
G = (V, E) and $\epsilon \in (0, 1)$, with probability 
at least 2/3, the algorithm halts within 
$O(\sqrt{n/D(G)}) \cdot \mathrm{poly}(1/\epsilon)$ steps and outputs a 
value in $[D(G), (1+\epsilon)\cdot D(G)]$, where D(G) 
is the average of the all-pairs distances in G. A 
corresponding algorithm exists for the average 
distance to a given vertex $s \in V$. 


Comments for the Recommended 
Reading 


The current entry falls within the scope of 
sublinear-time algorithms (see, e.g., [4]). 

Other graph parameters that have been stud- 
ied in the context of sublinear-time algorithms 
include the minimum weight of a spanning tree 
[2, 3, 5], the number of stars [11] and the number 
of triangles [6], the minimum size of a vertex 
cover [13-15, 17], the size of a maximum matching 
[14, 17], and the distance to having various 
properties [8, 13, 16]. Related problems over 
weighted graphs that represent distance metrics 
were studied in [12] and [1]. 
Problem Definition 


This entry considers NP-hard geometric optimization 
problems such as the Euclidean traveling 
salesman problem and the Euclidean Steiner tree 
problem. These problems are geometric variants 
of standard graph optimization problems, and the 
restriction of the input instances to the geometric or 
Euclidean case arises in numerous applications 
(see [1, 2]). The main focus of this entry is on the 
Euclidean traveling salesman problem. 


The Euclidean Traveling Salesman Problem (TSP) 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$, find the minimum length path that 
visits each point exactly once. The cost $\delta(x, y)$ 
of an edge connecting a pair of points $x, y \in \mathbb{R}^d$ 
is equal to the Euclidean distance between points 
x and y, that is, 

$\delta(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$, 

where $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$. More 
generally, the distance could be defined using 
other norms, such as $\ell_p$ norms for any p > 1, 

$\delta(x, y) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{1/p}$. 

For a given set S of points in Euclidean space 
$\mathbb{R}^d$, for a certain integer d, $d \ge 2$, a Euclidean 
graph (network) is a graph G = (S, E), where 
E is a set of straight-line segments connecting 
pairs of points in S. If all pairs of points in S 
are connected by edges in E, then G is called a 
complete Euclidean graph on S. The cost of the 
graph is equal to the sum of the costs of the edges 
of the graph, $\mathrm{cost}(G) = \sum_{(x,y) \in E} \delta(x, y)$. 

A polynomial-time approximation scheme 
(PTAS) is a family of algorithms $\{A_\epsilon\}$ such that, 
for each fixed $\epsilon > 0$, $A_\epsilon$ runs in polynomial time 
in the size of the input and produces a $(1+\epsilon)$- 
approximation. 
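
The cost definitions above translate directly into code. The following sketch is illustrative only; the function names and the sample points are hypothetical, and the path is given as an order in which the points are visited.

```python
def lp_distance(x, y, p=2):
    """Distance between points x and y under the l_p norm (p = 2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def path_cost(points, order, p=2):
    """Total length of the path visiting points in the given order."""
    return sum(
        lp_distance(points[order[i]], points[order[i + 1]], p)
        for i in range(len(order) - 1)
    )


points = [(0, 0), (3, 4), (3, 0)]         # hypothetical instance in the plane
cost_a = path_cost(points, [0, 1, 2])     # 5 + 4
cost_b = path_cost(points, [0, 2, 1])     # 3 + 4
```

Comparing cost_a and cost_b shows that the visiting order matters even on three points; a PTAS guarantees a path whose cost is within a factor of 1 + epsilon of the best order.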
Related Work 

The classical book by Lawler et al. [16] pro- 
vides extensive information about the TSP. Also, 
the survey exposition of Bern and Eppstein [8] 
presents the state of the art for geometric TSP 
until 1995, and the survey of Arora [2] discusses 
the research after 1995. 


Key Results 


We begin with the hardness results. The TSP in 
general graphs is well known to be NP-hard, 
and the same claim holds for the Euclidean TSP 
[14, 18]. 


Theorem 1 The Euclidean TSP is NP-hard. 


Perhaps rather surprisingly, it is still not known 
if the decision version of the problem is NP-complete 
[14]. (The decision version of the Euclidean 
TSP: given a point set in the Euclidean 
space $\mathbb{R}^d$ and a number t, verify if there is a 
simple path of length smaller than t that visits 
each point exactly once.) 

The approximability of TSP has been studied 
extensively over the last few decades. It is 
not hard to see that TSP is not approximable 
in polynomial time (unless P = NP) for 
arbitrary graphs with arbitrary edge costs. When 
the weights satisfy the triangle inequality (the 
so-called metric TSP), there is a polynomial-time 
3/2-approximation algorithm due to 
Christofides [9], and it is known that no PTAS 
exists (unless P = NP). This result has 
been strengthened by Trevisan [21] to include 
Euclidean graphs in high dimensions (the same 
result holds also for any $\ell_p$ metric). 


Theorem 2 (Trevisan [21]) If $d \ge \log n$, then 
there exists a constant $\epsilon > 0$ such that the 
Euclidean TSP in $\mathbb{R}^d$ is NP-hard to approximate 
within a factor of $1 + \epsilon$. 


In particular, this result implies that if $d \ge \log n$, 
then the Euclidean TSP in $\mathbb{R}^d$ has no PTAS 
unless P = NP. 

The same result holds also for any $\ell_p$ metric. 
Furthermore, Theorem 2 implies that Euclidean 
TSP in $\mathbb{R}^{\log n}$ is APX PB-hard under E-reductions 
and APX-complete under AP-reductions. 

It has been believed for some time that The- 
orem 2 might hold for smaller values of d, 
in particular even for d = 2, but this has 
been disproved independently by Arora [1] and 
Mitchell [17]. 


Theorem 3 (Arora [1] and Mitchell [17]) The 
Euclidean TSP on the plane has a PTAS. 


The main idea of the algorithms of Arora and 
Mitchell is rather simple, but the details of the 
analysis are quite complicated. Both algorithms 
follow the same approach. One first proves a so- 
called structure theorem, which demonstrates that 
there is a (1 + €)-approximation that has some 
local properties (in the case of the Euclidean 
TSP, there is a quadtree partition of the space 
containing all the points such that there is a 
(1 + €)-approximation in which each cell of the 
quadtree is crossed by the tour at most a constant 
number of times and only in some prespecified 
locations). Then, one uses dynamic programming 
to find an optimal (or almost optimal) solution 
that obeys the local properties specified in the 
structure theorem. 

The original algorithms presented in the first 
conference version of [1] and in the early version 
of [17] have running times of the form 
$O(n^{1/\epsilon})$ to obtain a $(1+\epsilon)$-approximation, 
but this has been subsequently improved. In 
particular, Arora’s randomized algorithm in 
[1] runs in time $O(n(\log n)^{1/\epsilon})$, and it can be 
derandomized with a slowdown of O(n). The 
result from Theorem 3 can be also extended to 
higher dimensions. Arora shows the following 
result. 


Theorem 4 (Arora [1]) For every constant d, 
the Euclidean TSP in $\mathbb{R}^d$ has a PTAS. 

For every fixed c > 1 and given any n points in 
$\mathbb{R}^d$, there is a randomized algorithm that finds a 
$(1+\frac{1}{c})$-approximation of the optimum traveling 
salesman tour in $O\left(n(\log n)^{(O(\sqrt{d}c))^{d-1}}\right)$ time. 
In particular, for any constant d and c, the 
running time is $O(n(\log n)^{O(1)})$. The algorithm 
can be derandomized by increasing the running 
time by a factor of $O(n^d)$. 
This has been later extended by Rao and 
Smith [19], who proved the following. 


Theorem 5 (Rao and Smith [19]) There is a 
deterministic algorithm that computes a $(1+\frac{1}{c})$- 
approximation of the optimum traveling salesman 
tour in $O\left(2^{(cd)^{O(d)}} n + (cd)^{O(d)} n \log n\right)$ time. 
There is a randomized algorithm that 
succeeds with probability at least 1/2 and that 
computes a $(1+\frac{1}{c})$-approximation of the 
optimum traveling salesman tour in expected 
$(c\sqrt{d})^{O(d(c\sqrt{d})^{d-1})} n + O(dn \log n)$ time. 


These results are essentially asymptotically optimal 
in the decision tree model thanks to a lower 
bound of $\Omega(n \log n)$ for any sublinear approximation 
for 1-dimensional Euclidean TSP due to 
Das et al. [12]. In the real RAM model, one can 
further improve the randomized results. 


Theorem 6 (Bartal and Gottlieb [6]) Given 
a set S of n points in the d-dimensional grid 
$\{0, \ldots, \Delta\}^d$ with $\Delta = 2^{(cd)^{O(d)}} n$, there is a 
randomized algorithm that with probability 
$1 - e^{-O_d(n^{1/3d})}$ computes a $(1+\frac{1}{c})$-approximation 
of the optimum traveling salesman tour for S in 
time $2^{(cd)^{O(d)}} n$ in the integer RAM model. 


If the data is not given in the integral form, then 
one may round the data into this form using 
the floor or mod functions, and assuming these 
functions are atomic operations, the rounding 
can be done in O(dn) total time, leading to the 
following theorem. 


Theorem 7 (Bartal and Gottlieb [6]) Given a 
set of n points in $\mathbb{R}^d$, there is a randomized 
algorithm that with probability $1 - e^{-O_d(n^{1/3d})}$ 
computes a $(1+\frac{1}{c})$-approximation of the optimum 
traveling salesman tour in time $2^{(cd)^{O(d)}} n$ 
in the real RAM model with atomic floor or mod 
operations. 


Applications 


The techniques developed by Arora [1] and 
Mitchell [17] found numerous applications in 
the design of polynomial-time approximation 
schemes for geometric optimization problems. 
Euclidean Minimum Steiner Tree 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$, find the minimum-cost network 
connecting all the points in S (where the cost of a 
network is equal to the sum of the lengths of the 
edges defining it). 


Euclidean k-median 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$ and an integer k, find k medians among 
the points in S so that the sum of the distances 
from each point in S to its closest median is 
minimized. 


Euclidean k-TSP 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$ and an integer k, find the shortest tour 
that visits at least k points in S. 


Euclidean k-MST 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$ and an integer k, find the shortest tree 
that visits at least k points in S. 


Euclidean Minimum-Cost k-Connected 
Subgraph 

For a given set S of n points in the Euclidean 
space $\mathbb{R}^d$ and an integer k, find the minimum-cost 
subgraph (of the complete graph on S) that 
is k-connected. 


Theorem 8 For every constant d, the following 
problems have a PTAS: 


• Euclidean minimum Steiner tree problem in 
$\mathbb{R}^d$ [1, 19] 

• Euclidean k-median problem in $\mathbb{R}^d$ [5] 

• Euclidean k-TSP and the Euclidean k-MST 
problems in $\mathbb{R}^d$ [1] 

• Euclidean minimum-cost k-connected 
subgraph problem in $\mathbb{R}^d$ (constant k) [10] 


The technique developed by Arora [1] and 
Mitchell [17] led also to some quasi-polynomial-time 
approximation schemes, that is, 
algorithms with a running time of $n^{O(\log n)}$. 
For example, Arora and Karakostas [4] gave a 
quasi-polynomial-time approximation scheme 
for the Euclidean minimum latency problem, 
Das and Mathieu [13] gave a quasi-polynomial-time 
approximation scheme for the Euclidean 
capacitated vehicle routing problem, and Remy 
and Steger [20] gave a quasi-polynomial-time 
approximation scheme for the minimum-weight 
triangulation problem. 

For more discussion, see the surveys by 
Arora [2] and by Czumaj and Lingas [11]. 


Extensions to Planar Graphs and Metric 
Spaces with Bounded Doubling Dimension 
The dynamic programming approach used by 
Arora [1] and Mitchell [17] is also related to 
the recent advances for a number of optimization 
problems for planar graphs and in graphs in 
metric spaces with bounded doubling dimension. 
For example, Arora et al. [3] designed a PTAS for 
the TSP in weighted planar graphs (cf. [15] for a 
linear-time PTAS), and there is a PTAS for metric 
spaces with bounded doubling dimension [7]. 


Open Problems 


An interesting open problem is if the quasi- 
polynomial-time approximation schemes men- 
tioned above (for the minimum latency, the 
capacitated vehicle routing, and the minimum- 
weight triangulation problems) can be extended 
to obtain PTAS. For more open problems, see 
Arora [2]. 


Experimental Results 


The Web page of the 8th DIMACS Imple- 
mentation Challenge, http://dimacs.rutgers.edu/ 
Challenges/TSP/, contains a lot of instances. 


URLs to Code and Data Sets 


The Web page of the 8th DIMACS Imple- 
mentation Challenge, http://dimacs.rutgers.edu/ 
Challenges/TSP/, contains a lot of instances. 
Problem Definition 


All problems in NP can be exactly solved in 
$2^{\mathrm{poly}(n)}$ time via exhaustive search, but research 
has yielded faster exponential-time algorithms 
for many NP-hard problems. However, some key 
problems have not seen improved algorithms, and 
problems with improvements seem to converge 
toward $O(C^n)$ for some unknown constant 
C > 1. 

The satisfiability problem for Boolean formu- 
las in conjunctive normal form, CNF-SAT, is 
a central problem that has resisted significant 
improvements. The complexity of CNF-SAT and 
its special case k-SAT, where each clause has 
k literals, is the canonical starting point for the 
development of NP-completeness theory. 

Similarly, in the last 20 years, two hypothe- 
ses have emerged as powerful starting points 
for understanding exponential-time complexity. 
In 1999, Impagliazzo and Paturi [5] defined the 
exponential-time hypothesis (ETH), which asserts 
that 3-SAT cannot be solved in subexponential 
time. Namely, it asserts there is an $\epsilon > 0$ such 
that 3-SAT cannot be solved in $O((1+\epsilon)^n)$ time. 
ETH has been a surprisingly useful assumption 
for ruling out subexponential-time algorithms for 
other problems [2, 6]. A stronger hypothesis has 
led to more fine-grained lower bounds, which is 
the focus of this article. Many NP-hard problems 
are solvable in $C^n$ time via exhaustive search (for 
some C > 1) but are not known to be solvable 
in $(C-\epsilon)^n$ time, for any $\epsilon > 0$. The strong 
exponential-time hypothesis (SETH) [1, 5] asserts 
that for every $\epsilon > 0$, there exists a k such that 
k-SAT cannot be solved in time $O((2-\epsilon)^n)$. SETH 
has been very useful in establishing tight (and 
exact) lower bounds for many problems. Here we 
survey some of these tight results. 


Key Results 


The following results are reductions from k-SAT 
to other problems. They can be seen either as new 
attacks on the complexity of SAT or as lower 
bounds for exact algorithms that are conditional 
on SETH. 


Lower Bounds on General Problems 

The following problems have lower bounds con- 
ditional on SETH. The reduction for the first 
problem is given to illustrate the technique. 


k-Dominating Set 

A dominating set of a graph G = (V, E) is a 
subset $S \subseteq V$ such that every vertex is either 
in S or is a neighbor of a vertex in S. The 
k-DOMINATING SET problem asks to find a dominating 
set of size k. Assuming SETH, for any 
$k \ge 3$ and $\epsilon > 0$, k-DOMINATING SET cannot be 
solved in $O(n^{k-\epsilon})$ time [8]. 

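The reduction from SAT to k-DOMINATING 
SET proceeds as follows. Fix some $k \ge 3$ and 
let F be a CNF formula on n variables and m 
clauses; we build a corresponding graph $G_F$. 
Partition the variables of F 
into k equally sized parts of n/k variables. For 
each part, take all $2^{n/k}$ partial assignments and 
make a node for each partial assignment. Make 
each of the k parts into a clique (disjoint from 
the others). Add a dummy node for each partial 
assignment clique that is connected to every node 
in that clique but has no other edges. Add m more 
nodes, one for each clause. Finally, make an edge 
from a partial assignment node to a clause node 
iff the partial assignment satisfies the clause. We 
observe that there is a k-dominating set in $G_F$ if 
and only if F is satisfiable: the dummy nodes force 
one chosen node per clique, and the chosen partial 
assignments must together dominate every clause node. 
Since $G_F$ has $k 2^{n/k} + k + m$ nodes, an algorithm 
running in $O(N^{k-\epsilon})$ time on N-node graphs would, 
for m polynomial in n, decide satisfiability in 
$2^{(1-\epsilon/k)n}$ time up to polynomial factors. 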
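
The construction can be tried out on tiny formulas. The sketch below builds the graph described above and compares a brute-force dominating-set search with a brute-force satisfiability check. The clause encoding (DIMACS-style signed integers), the node labels, and all function names are choices made for this illustration, and both brute-force routines are exponential and meant only for very small inputs.

```python
from itertools import combinations, product


def build_domset_graph(num_vars, clauses, k):
    """Adjacency sets of the reduction graph for a CNF formula.

    clauses is a list of clauses, each a list of nonzero integers
    (i stands for x_i, -i for NOT x_i). Assumes k divides num_vars.
    """
    size = num_vars // k
    adj = {("clause", c): set() for c in range(len(clauses))}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for part in range(k):
        variables = list(range(part * size + 1, (part + 1) * size + 1))
        dummy = ("dummy", part)
        nodes = [("assign", part, bits)
                 for bits in product((False, True), repeat=size)]
        for node in nodes:
            add_edge(dummy, node)  # the dummy sees only its own clique
            value = dict(zip(variables, node[2]))
            for c, clause in enumerate(clauses):
                # Edge iff the partial assignment satisfies the clause.
                if any(value.get(abs(lit)) == (lit > 0) for lit in clause):
                    add_edge(node, ("clause", c))
        for u, v in combinations(nodes, 2):  # each part is a clique
            add_edge(u, v)
    return adj


def has_dominating_set(adj, k):
    """Brute-force search over all k-subsets of nodes."""
    for subset in combinations(adj, k):
        dominated = set(subset)
        for u in subset:
            dominated |= adj[u]
        if len(dominated) == len(adj):
            return True
    return False


def is_satisfiable(num_vars, clauses):
    """Brute-force satisfiability check over all assignments."""
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in clauses)
        for bits in product((False, True), repeat=num_vars)
    )


sat_formula = [[1, 2], [-1, 3]]   # (x1 or x2) and (not x1 or x3)
unsat_formula = [[1], [-1]]       # x1 and not x1
```

For both sample formulas with three variables and k = 3, the two checks agree, as the equivalence stated above predicts.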


2Sat+2Clauses 

The 2SAT+2CLAUSES problem asks whether a 
Boolean formula is satisfiable, given that it is 
a 2-CNF with two additional clauses of arbitrary 
length. Assuming SETH, for any 
$m = n^{1+o(1)}$ and $\epsilon > 0$, 2SAT+2CLAUSES cannot 
be solved in $O(n^{2-\epsilon})$ time [8]. It is known that 
2SAT+2CLAUSES can be solved in $O(mn + n^2)$ 
time [8]. 


HornSat+kClauses 

The HORNSAT+KCLAUSES problem asks 
whether a Boolean formula is satisfiable, given 
that it is a CNF of clauses that contain at most 
one non-negated literal per clause (a Horn CNF), 
conjoined with k additional clauses of arbitrary 
length but only positive literals. Assuming SETH, 
for any $k \ge 2$ and $\epsilon > 0$, HORNSAT+KCLAUSES 
cannot be solved in $O((n+m)^{k-\epsilon})$ time [8]. It 
can be trivially solved in $O(n^k \cdot (m+n))$ time by 
guessing a variable to set to true for each of the k 
additional clauses and checking if the remaining 
Horn CNF is satisfiable in linear time. 
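
The trivial algorithm can be sketched as follows. For brevity, the Horn satisfiability check here is a simple fixed-point propagation that may take quadratic time, not the linear-time procedure the bound above relies on; the clause encoding (signed integers) and the function names are assumptions of this illustration.

```python
from itertools import product


def horn_sat(clauses):
    """Return the minimal model (set of true variables) of a Horn CNF,
    or None if it is unsatisfiable.

    Each clause is a list of nonzero integers with at most one positive
    literal. Simple propagation; not the linear-time algorithm.
    """
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            positives = [lit for lit in clause if lit > 0]
            body_true = all(-lit in true_vars for lit in clause if lit < 0)
            if body_true:
                if not positives:
                    return None  # all negated variables are forced true
                if positives[0] not in true_vars:
                    true_vars.add(positives[0])
                    changed = True
    return true_vars


def horn_plus_k_sat(horn_clauses, extra_clauses):
    """Guess one variable per additional (all-positive) clause, force it
    to true with a unit clause, and test the resulting Horn CNF."""
    for choice in product(*extra_clauses):
        forced = [[var] for var in choice]
        model = horn_sat(horn_clauses + forced)
        if model is not None:
            return model
    return None


horn = [[-1, 2], [-2, -3]]   # x1 -> x2, and not both x2 and x3
model = horn_plus_k_sat(horn, [[1, 3], [3]])
no_model = horn_plus_k_sat(horn, [[1], [3]])
```

Forcing a guessed variable to true only adds unit clauses, which are themselves Horn, so each of the at most n^k guesses reduces to one Horn satisfiability test.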


3-Party Set Disjointness 

The 3-PARTY SET DISJOINTNESS problem is a 
communication problem with three parties and 
three subsets $S_1, S_2, S_3 \subseteq [m]$, where the ith 
party has access to all sets except for $S_i$. The 
parties wish to determine if $S_1 \cap S_2 \cap S_3 = \emptyset$. 
Clearly this can be done with O(m) bits of 
communication. Assuming SETH, 3-PARTY SET 
DISJOINTNESS cannot be solved using protocols 
running in $2^{o(m)}$ time and communicating only 
o(m) bits [8]. 


k-SUM 
The k-SUM problem asks whether a set of n 
numbers contains a k-tuple that sums to zero. 
Assuming SETH, k-SUM on n numbers cannot 
be solved in $n^{o(k)}$ time for any $k < n^{0.99}$ [8]. (It is 
well known that k-SUM can be solved in 
$O(n^{\lceil k/2 \rceil})$ time.) 
All the problems below can be solved in 
$2^n n^{O(1)}$ time via exhaustive search. 


k-Hitting Set 

Given a set system $\mathcal{F} \subseteq 2^U$ in some universe U, 
a hitting set is a subset $H \subseteq U$ such that 
$H \cap S \ne \emptyset$ for every $S \in \mathcal{F}$. The k-HITTINGSET problem 
asks whether there is a hitting set of size at most t, 
given that each set $S \in \mathcal{F}$ has at most k elements. 
SETH is equivalent to the claim that for all $\epsilon > 0$, 
there is a k for which k-HITTINGSET cannot be 
solved in time $O((2-\epsilon)^n)$ [3]. 


k-Set Splitting 

Given a set system $\mathcal{F} \subseteq 2^{U}$ in some universe
$U$, a set splitting is a subset $X \subseteq U$ such that
the first element of the universe is in $X$ and for
every $S \in \mathcal{F}$, neither $S \subseteq X$ nor $S \subseteq (U \setminus X)$.
The $k$-SETSPLITTING problem asks whether
there is a set splitting, given that each set $S \in \mathcal{F}$
has at most $k$ elements. SETH is equivalent
to the claim that for all $\varepsilon > 0$, there is a $k$ for
which $k$-SETSPLITTING cannot be solved in time
$O((2 - \varepsilon)^{n})$ [3].


k-NAE-Sat 

The $k$-NAE-SAT problem asks whether a $k$-CNF
has an assignment where the first variable is set to
true and each clause has both a true literal and a
false literal. SETH is equivalent to the claim that
for all $\varepsilon > 0$, there is a $k$ for which $k$-NAE-SAT
cannot be solved in time $O((2 - \varepsilon)^{n})$ [3].


c-VSP-Circuit-SAT 

The $c$-VSP-CIRCUIT-SAT problem asks whether
a $cn$-size Valiant series-parallel circuit over $n$
variables has a satisfying assignment. SETH is
equivalent to the claim that for all $\varepsilon > 0$, there
is a $c$ for which $c$-VSP-CIRCUIT-SAT cannot be
solved in time $O((2 - \varepsilon)^{n})$ [3].


Problems Parameterized by Treewidth 

A variety of NP-complete problems have been 
shown to be much easier on graphs of bounded 
treewidth. Reductions starting from SETH given 
by Lokshtanov, Marx, and Saurabh [7] can also 
prove lower bounds that depend on the treewidth 
of an input graph, tw(G). The following are 
proven via analyzing the pathwidth of a graph, 
pw(G), and the fact that $\mathrm{tw}(G) \le \mathrm{pw}(G)$.


Independent Set 

An independent set of a graph $G = (V, E)$ is a
subset $S \subseteq V$ such that the subgraph induced
by $S$ contains no edges. The INDEPENDENT
SET problem asks to find an independent set of
maximum size. Assuming SETH, for any $\varepsilon > 0$,
INDEPENDENT SET cannot be solved in
$(2 - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$ time.


Dominating Set 

A dominating set of a graph $G = (V, E)$ is a
subset $S \subseteq V$ such that every vertex is either in
$S$ or is a neighbor of a vertex in $S$. The DOMINATING
SET problem asks to find a dominating
set of minimum size. Assuming SETH, for any
$\varepsilon > 0$, DOMINATING SET cannot be solved in
$(3 - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$ time.


Max Cut 

A cut of a graph $G = (V, E)$ is a partition of $V$
into $S$ and $V \setminus S$. The size of a cut is the number of
edges that have one endpoint in $S$ and the other in
$V \setminus S$. The MAX CUT problem asks to find a cut of
maximum size. Assuming SETH, for any $\varepsilon > 0$,
MAX CUT cannot be solved in $(2 - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$
time.


Odd Cycle Transversal 

An odd cycle transversal of a graph $G = (V, E)$
is a subset $S \subseteq V$ such that the subgraph
induced by $V \setminus S$ is bipartite. The ODD CYCLE
TRANSVERSAL problem asks to, given an integer
$k$, determine whether there is an odd cycle
transversal of size $k$. Assuming SETH, for any
$\varepsilon > 0$, ODD CYCLE TRANSVERSAL cannot be
solved in $(3 - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$ time.


Graph Coloring 

A $q$-coloring of a graph $G = (V, E)$ is a function
$\mu : V \to [q]$. A $q$-coloring is proper if for
all edges $(u, v) \in E$, $\mu(u) \neq \mu(v)$. The $q$-COLORING
problem asks to decide whether the
graph has a proper $q$-coloring. Assuming SETH,
for any $q \ge 3$ and $\varepsilon > 0$, $q$-COLORING cannot be
solved in $(q - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$ time.


Partition Into Triangles 

A graph $G = (V, E)$ can be partitioned into
triangles if there is a partition of the vertices
into $S_1, S_2, \ldots, S_{n/3}$ such that each $S_i$ induces a
triangle in $G$. The PARTITION INTO TRIANGLES
problem asks to decide whether the graph can be
partitioned into triangles. Assuming SETH, for
any $\varepsilon > 0$, PARTITION INTO TRIANGLES cannot
be solved in $(2 - \varepsilon)^{\mathrm{tw}(G)} n^{O(1)}$ time.

All of the above results are tight, in the sense 
that when € = 0, there is an algorithm for each of 
them. 


Showing Difficulty Via Set Cover 

Given a set system $\mathcal{F} \subseteq 2^{U}$ in some universe $U$, a
set cover is a subset $C \subseteq \mathcal{F}$ such that $\bigcup_{S \in C} S = U$.
The SET COVER problem asks whether there
is a set cover of size at most $t$.

Cygan et al. [3] also gave reductions from SET
COVER to several other problems, showing lower
bounds conditional on the assumption that for all
$\varepsilon > 0$, there is a $k$ such that SET COVER where
sets in $\mathcal{F}$ have size at most $k$ cannot be computed
in time $O^{*}((2 - \varepsilon)^{n})$.

It is currently unknown how SET COVER is re- 
lated to SETH; if there is a reduction from CNF- 
SAT to SET COVER, then all of these problems 
would have conditional lower bounds as well. 


Steiner Tree 

Given a graph $G = (V, E)$ and a set of terminals
$T \subseteq V$, a Steiner tree is a subset $X \subseteq V$
such that the graph induced by $X$ is connected
and $T \subseteq X$. The STEINER TREE problem asks
whether $G$ has a Steiner tree of size at most
$t$. With the above SET COVER assumption, for
all $\varepsilon > 0$, STEINER TREE cannot be solved in
$O^{*}((2 - \varepsilon)^{t})$ time.


Connected Vertex Cover 

A connected vertex cover of a graph $G = (V, E)$
is a subset $X \subseteq V$ such that the subgraph induced
by $X$ is connected and every edge contains
at least one endpoint in $X$. The CONNECTED
VERTEX COVER problem asks whether $G$ has a
connected vertex cover of size at most $t$. With
the above SET COVER assumption, for all $\varepsilon > 0$,
CONNECTED VERTEX COVER cannot be solved
in $O^{*}((2 - \varepsilon)^{t})$ time.


Set Partitioning 

Given a set system $\mathcal{F} \subseteq 2^{U}$ in some universe $U$,
a set partitioning is a set cover $C$ whose sets
are pairwise disjoint. The
SET PARTITIONING problem asks whether there
is a set partitioning of size at most $t$. With the
above SET COVER assumption, for all $\varepsilon > 0$, SET
PARTITIONING cannot be solved in $O^{*}((2 - \varepsilon)^{n})$
time.


Subset Sum 

The SUBSET SUM problem asks whether a set of
$n$ positive numbers contains a subset that sums
to a target $t$. With the above SET COVER assumption,
for all $\delta < 1$, SUBSET SUM cannot be solved
in $O^{*}(t^{\delta})$ time. Note that there is a dynamic
programming solution that runs in $O(nt)$ time.


Open Problems 


• Does ETH imply SETH?

• Does SETH imply SET COVER requires
$O^{*}((2 - \varepsilon)^{n})$ time for all $\varepsilon > 0$?

• Does SETH imply that the Traveling Salesman
Problem in its most general, weighted
form requires $O^{*}((2 - \varepsilon)^{n})$ time for all $\varepsilon > 0$?

• Given two graphs $F$ and $G$, on $k$ and $n$ nodes,
respectively, the SUBGRAPH ISOMORPHISM
problem asks whether a (noninduced) subgraph
of $G$ is isomorphic to $F$. Does SETH
imply that SUBGRAPH ISOMORPHISM cannot
be solved in $2^{O(n)}$ time?
Problem Definition


In the subset sum problem, we are given
integers $a_1, \ldots, a_n, t$ and are asked to find a
subset $X \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in X} a_i = t$.
In the Knapsack problem, we are given
$a_1, \ldots, a_n, b_1, \ldots, b_n, t, u$ and are asked to find
a subset $X \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in X} a_i \le t$
and $\sum_{i \in X} b_i \ge u$. It is well known that both
problems can be solved in $O(nt)$ time using
dynamic programming. However, as is typical for
dynamic programming, these algorithms require
a lot of working memory and are relatively hard
to execute in parallel on several processors: the
above algorithms use $O(t)$ space which may be
exponential in the input size.
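
For reference, the standard dynamic program mentioned above can be sketched as follows. This is a minimal illustration assuming positive integers; the function name is chosen here, and the table counts solutions (rather than only deciding existence) so that it can be compared with the counting algorithm discussed later.

```python
def count_subset_sums_dp(a, t):
    """Number of subsets of a (positive integers) summing to t.

    Uses a table of t + 1 entries, i.e., O(t) working memory,
    and O(n t) additions.
    """
    ways = [0] * (t + 1)
    ways[0] = 1  # the empty subset sums to 0
    for x in a:
        # go downwards so that each number is used at most once
        for s in range(t, x - 1, -1):
            ways[s] += ways[s - x]
    return ways[t]

print(count_subset_sums_dp([1, 2, 3, 4, 5], 5))  # {5}, {1,4}, {2,3}
```

The table of size $t + 1$ is exactly the working memory that the space-efficient approach aims to avoid.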

This raises the question: when can we avoid 
these disadvantages and still be (approximately) 
as fast as dynamic programming algorithms? 
It appears that by (slightly) loosening the time 
budget, space usage and parallelization can 
be significantly improved in many dynamic 
programs. 


Key Results 


A Space Efficient Algorithm for Subset Sum 

In this article, we will use $\tilde{O}(\cdot)$ to suppress factors
that are poly-logarithmic in the input size. In
what follows, we will discuss how to prove the
following theorem:


Theorem 1 (Lokshtanov and Nederlof, [7])
There is an algorithm counting the number
of solutions of a subset sum instance in
$\tilde{O}(n^{2}t(n + \log t))$ time and $O((n + \lg t)(\lg nt))$
space.


The Discrete Fourier Transform 
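We use Iverson's bracket notation: given a
Boolean predicate $b$, $[b]$ denotes 1 if $b$ is true
and 0 otherwise. Let $P(x)$ be a polynomial
of degree $N - 1$, and let $p_0, \ldots, p_{N-1}$ be
its coefficients. Thus, $P(x) = \sum_{i=0}^{N-1} p_i x^{i}$.
Let $\omega_N$ denote the $N$'th root of unity, that
is, $\omega_N = e^{2\pi\mathrm{i}/N}$. Let $k, t$ be integers in $\{0, \ldots, N-1\}$ such that
$k \neq t$. By the summation formula for geometric
progressions ($\sum_{\ell=0}^{n-1} r^{\ell} = \frac{1 - r^{n}}{1 - r}$ for $r \neq 1$), we
have:

\[
\sum_{\ell=0}^{N-1} \omega_N^{\ell(k-t)}
= \frac{1 - \omega_N^{(k-t)N}}{1 - \omega_N^{k-t}}
= \frac{1 - (\omega_N^{N})^{k-t}}{1 - \omega_N^{k-t}}
= \frac{1 - (1)^{k-t}}{1 - \omega_N^{k-t}} = 0.
\]

On the other hand, if $k = t$, then $\sum_{\ell=0}^{N-1} \omega_N^{\ell(k-t)} = \sum_{\ell=0}^{N-1} 1 = N$.
Thus, both cases can be
compactly summarized as $\sum_{\ell=0}^{N-1} \omega_N^{\ell(k-t)} = [k = t]N$.
As a consequence, we can express
a coefficient $p_t$ of $P(x)$ directly in terms of its
evaluations:

\[
p_t = \sum_{k=0}^{N-1} [k = t]\, p_k
= \frac{1}{N} \sum_{k=0}^{N-1} p_k \sum_{\ell=0}^{N-1} \omega_N^{\ell(k-t)}
= \frac{1}{N} \sum_{\ell=0}^{N-1} \omega_N^{-\ell t} P(\omega_N^{\ell}). \qquad (1)
\]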
Using the Discrete Fourier Transform for Subset Sum

Given an instance $a_1, \ldots, a_n, t$ of subset sum,
define the polynomial $P(x)$ to be $P(x) = \prod_{i=1}^{n} (1 + x^{a_i})$.
Clearly, we can discard integers
$a_i$ larger than $t$, and assume that $P(x)$ has degree
at most $N = nt$. If we expand the products in this
polynomial to get rid of the parentheses, we get
a sum of $2^{n}$ products and each of these products
is of the type $x^{k}$ and corresponds to a subset
$X \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in X} a_i = k$. Thus,
if we aggregate these products, we obtain the
normal form $P(x) = \sum_{k=0}^{N} p_k x^{k}$, where $p_k$
equals the number of subsets $X \subseteq [n]$ such that
$\sum_{i \in X} a_i = k$. Plugging this into Eq. 1, we have
that the number of subset sum solutions equals

\[
\frac{1}{N} \sum_{\ell=0}^{N-1} \omega_N^{-\ell t} \prod_{i=1}^{n} \left(1 + \omega_N^{\ell a_i}\right). \qquad (2)
\]

Given Eq. 2, the algorithm suggests itself: evaluation
of the right-hand side gives the number of
solutions of the subset sum instance. Given $\omega_N$,
this would be straightforward on the unit-cost
RAM model (recall that in this model arithmetic
instructions such as $+$, $-$, $*$ and $/$ are assumed to take
constant time): the required powering operations
are performed in $\log(N)$ arithmetic operations so
an overall upper bound would be $O(n^{2}t \log(nt))$
time.

However, the value of this algorithm is
not clear yet: for example, $\omega_N$ may be irrational,
so it is not clear how to perform the arithmetic
efficiently. This is an issue that also arises for
the folklore fast Fourier transform (see, e.g., [3]
for a nice exposition), and this issue is usually
not addressed (a nice exception is Knuth [6]).
Moreover, in our case we should also be careful
with the space budget: for example, we cannot even
store $2^{t}$ in the usual way within our space budget.
But, as we will now see, it turns out that we can
simply evaluate Eq. 2 with finite precision and
round to the nearest integer within the resource
bounds claimed in Theorem 1.
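
As a rough illustration of evaluating Eq. 2 and rounding, the following sketch uses ordinary double-precision complex arithmetic instead of the truncated fixed-point arithmetic analyzed below, so it is only meaningful for small instances. The function name is chosen here, positive integers are assumed, and $N = nt + 1$ is used (an assumption made in this sketch so that $N$ strictly exceeds the degree of $P$).

```python
import cmath

def count_subset_sums_dft(a, t):
    """Count subsets of a (positive integers) summing to t via Eq. 2.

    Only O(1) numbers are kept besides the input; floating point
    limits this sketch to small n and t.
    """
    a = [x for x in a if x <= t]  # larger numbers cannot be used
    if not a:
        return 1 if t == 0 else 0
    n = len(a)
    N = n * t + 1  # assumption of this sketch: N > degree of P

    def omega(z):
        # N'th root of unity raised to the power z (exponent reduced mod N)
        return cmath.exp(2j * cmath.pi * (z % N) / N)

    total = 0j
    for l in range(N):
        term = omega(-l * t)
        for x in a:
            term *= 1 + omega(l * x)
        total += term
    return round((total / N).real)

print(count_subset_sums_dft([1, 2, 3, 4, 5], 5))
```

Each iteration of the outer loop is independent of the others, which is also why this formulation is easy to run in parallel.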


Evaluating Equation 2 with Finite Precision 

The algorithm establishing Theorem 1 is presented
in Algorithm 1. Here, $\rho$ represents the
amount of precision the algorithm works with.
The procedure $\mathrm{tr}_\rho$ truncates to $\rho$ bits after the
decimal point. The procedure $\mathrm{apxr}_\rho(z)$ returns
an estimate of $\omega_N^{z}$. In order to do this, estimates
of $\omega_N^{z}$ with $z$ being powers of 2 are precomputed
in Lines 3-6. We omit an explicit implementation
of the right-hand side of Line 5 since this is very
standard; for example, one can use an approximation
of $\pi$ together with a binary splitting approach
Algorithm 1 Approximate evaluation of Eq. 2
Algorithm: SSS($a_1, \ldots, a_n, t$)
Require: for every $1 \le i \le n$, $a_i \le t$.
 1: $\rho \leftarrow 3n + 6 \log nt$
 2: $s \leftarrow 0$
 3: for $0 \le q \le \log N - 1$ do
 4:     //store roots of unity for powers of two
 5:     $r_q \leftarrow \mathrm{tr}_\rho(e^{2\pi\mathrm{i}\,2^{q}/N})$
 6: end for
 7: for $0 \le \ell \le N - 1$ do
 8:     $p \leftarrow \mathrm{apxr}_\rho(-\ell t \,\%\, N)$
 9:     for $1 \le i \le n$ do
10:         $p \leftarrow \mathrm{tr}_\rho(p * (1 + \mathrm{apxr}_\rho(\ell a_i \,\%\, N)))$
11:     end for
12:     $s \leftarrow s + p$
13: end for
14: return $\mathrm{rnd}(\mathrm{tr}_\rho(s/N))$ //round to nearest int

Algorithm: $\mathrm{apxr}_\rho(z)$
Require: $z < N$
15: $p' \leftarrow 1$
16: for $1 \le q \le \log N - 1$ do
17:     if $2^{q}$ divides $z$ then
18:         $p' \leftarrow \mathrm{tr}_\rho(p' * r_q)$
19:     end if
20: end for
21: return $p'$


(see [1, Section 4.9.1]) or a Taylor expansion-based
approach (see [2]). Crude upper bounds
on the time and space usage of both approaches
are $O(\rho^{2} \log N)$ time and $O(n \log nt + \log^{2} nt)$
space.

Let us proceed with verifying whether
Algorithm 1 satisfies the resource bounds of
Theorem 1. It is easy to see that all intermediate
numbers have modulus at most $2^{n}N$, so their
estimates can be represented with $O(\rho)$ bits. For
all multiplications we will use an asymptotically
fast algorithm running in $\tilde{O}(\rho)$ time (e.g., [5]).
Then, Lines 3-6 take $\tilde{O}(\rho^{2} \log N)$ time. Line 8
takes $\tilde{O}(\rho \lg N)$ time; Lines 9-11 take $\tilde{O}(n\rho \lg N)$
time, which is the bottleneck. So overall, the
algorithm uses $\tilde{O}(Nn\rho) = \tilde{O}(n^{2}t(n + \log t))$
time. The space usage is dominated by the
precomputed values, which use $O(\rho \log N) =
O((n + \log t)(\log nt))$ space.

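For the correctness of Algorithm 1, let us
first study what happens if we work with infinite
precision (i.e., $\rho = \infty$). Note that $\mathrm{apxr}_\infty(z) = \omega_N^{z}$
since it computes (reading "$2^{q}$ divides $z$" as
"$2^{q}$ occurs in the binary expansion of $z$")

\[
\prod_{q=1}^{\log N - 1} r_q^{[2^{q} \text{ divides } z]}
= \prod_{q=1}^{\log N - 1} \left(\omega_N^{2^{q}}\right)^{[2^{q} \text{ divides } z]}
= \omega_N^{\sum_{q=1}^{\log N - 1} [2^{q} \text{ divides } z]\, 2^{q}}
= \omega_N^{z}.
\]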


Moreover, note that on iteration $\ell$ of the for-loop
of Line 7, we will have on Line 12 that $p =
\omega_N^{-\ell t} P(\omega_N^{\ell})$ by the definition of $P(x)$. Then, it is easy
to see that Algorithm 1 indeed evaluates the right-hand
side of Eq. 2.

Now, let us focus on the finite precision. The
algorithm computes an $N$-sized sum of $(n +
2 \log N)$-sized products of precomputed values
(increased by one). Note that it is sufficient to
guarantee that on Line 12 in every iteration $\ell$,
$|p - \omega_N^{-\ell t} \prod_{i=1}^{n} (1 + \omega_N^{\ell a_i})| \le 0.4$, since then the total
error of $s$ on Line 14 is at most $0.4N$ and the total
error of $s/N$ is $0.4$, which guarantees rounding to
the correct integer. Recall that $p$ is the result of
an $(n + 2 \log N)$-sized product, so let us analyze
how the approximation error propagates in this
situation. If $\hat{a}, \hat{b}$ are approximations of $a, b$ and
we approximate $c = a \cdot b$ by $\hat{c} = \mathrm{tr}_\rho(\hat{a} * \hat{b})$, we have

\[
|c - \hat{c}| \le |a - \hat{a}||b| + |b - \hat{b}||a| + |a - \hat{a}||b - \hat{b}| + 2^{-\rho}.
\]


Thus, if $a$ is the result of an $(i-1)$-sized product,
and using an upper bound of 2 for the modulus
of any of the product terms in the algorithm,
we can upper bound the error $E_i$ of estimating
an $i$-length product as follows: $E_1 \le 2^{-\rho}$ and
for $i > 1$:

\[
E_i \le 2E_{i-1} + 2^{i-1} 2^{-\rho} + E_{i-1} 2^{-\rho} + 2^{-\rho}
\le 3E_{i-1} + 2^{i-\rho}.
\]

Using straightforward induction we have that
$E_i \le 6^{i} 2^{-\rho}$. So indeed, setting $\rho = 3n + 6 \log nt$
suffices.
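
The induction can be checked numerically on a concrete instance. The sketch below assumes the recurrence $E_1 = 2^{-\rho}$, $E_i = 3E_{i-1} + 2^{i-\rho}$ (taken with equality as a worst case), rounds $\log N$ up to an integer, and uses exact rational arithmetic; the function name and the sample values of $n$ and $t$ are chosen here for illustration.

```python
from fractions import Fraction
from math import ceil, log2

def error_bounds(n, t):
    """Worst-case errors E_1, ..., E_L for L = n + 2 log N, N = n t."""
    log_n_cap = ceil(log2(n * t))      # log N rounded up
    rho = 3 * n + 6 * log_n_cap        # precision used by Algorithm 1
    unit = Fraction(1, 2 ** rho)       # 2^{-rho}
    length = n + 2 * log_n_cap         # size of the longest product
    errors = [unit]                    # E_1
    for i in range(2, length + 1):
        errors.append(3 * errors[-1] + Fraction(2 ** i) * unit)
    return rho, errors

rho, errors = error_bounds(10, 100)
# compare each E_i with the claimed bound 6^i 2^{-rho}
ok = all(e <= Fraction(6 ** i, 2 ** rho) for i, e in enumerate(errors, start=1))
print(rho, ok, float(errors[-1]))
```

For this instance the final error is far below the threshold $0.4$ required for correct rounding.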


A Generic Framework 
For Theorem 1, we only used that the value to be
determined is a coefficient of a (relatively)
small degree polynomial that we can evaluate
efficiently. Whether this is the case for other
problems solved by dynamic programming can
be seen from the structure of the used recurrence:
when the recurrence can be formulated
over a polynomial ring where the polynomials
have small degree, we can evaluate it fast and
interpolate with the same technique as above
to find a required coefficient. For example, for
Knapsack, one can use the polynomial $P(x, y) =
\prod_{i=1}^{n} (1 + x^{a_i} y^{b_i})$ and look for a nonzero coefficient
of $x^{t'} y^{u'}$ where $t' \le t$ and $u' \ge u$ to obtain
a pseudo-polynomial time and polynomial space
algorithm as well.
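
To make the role of the coefficients concrete, the following sketch expands the bivariate Knapsack polynomial explicitly on a tiny instance. It only illustrates which coefficients are nonzero; it stores all monomials and is therefore not the space-efficient evaluation-and-interpolation procedure. The function names are chosen here for illustration.

```python
from collections import Counter

def knapsack_polynomial(a, b):
    """Coefficients of prod_i (1 + x^{a_i} y^{b_i}) as {(t', u'): count}."""
    coeffs = Counter({(0, 0): 1})
    for ai, bi in zip(a, b):
        new = Counter(coeffs)  # the factor 1: item i is left out
        for (w, v), c in coeffs.items():
            new[(w + ai, v + bi)] += c  # the factor x^{a_i} y^{b_i}
        coeffs = new
    return coeffs

def knapsack_feasible(a, b, t, u):
    """True iff some monomial x^{t'} y^{u'} with t' <= t, u' >= u is nonzero."""
    return any(c and w <= t and v >= u
               for (w, v), c in knapsack_polynomial(a, b).items())

print(knapsack_feasible([2, 3, 4], [3, 4, 5], t=5, u=7))
```

Here the monomial $x^{5}y^{7}$ comes from choosing the first two items, which witnesses feasibility for $t = 5$ and $u = 7$.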

Naturally, this technique does not only apply
to the polynomial ring. In general, if the ring
is $R \subseteq \mathbb{C}^{N \times N}$ equipped with matrix
addition and multiplication, we just need a matrix
that simultaneously diagonalizes all matrices of
$R$ (in the above case, $R$ consists of all circulant matrices,
which are simultaneously diagonalized by the
Fourier matrix).


Applications 


The framework applies to many dynamic pro- 
gramming algorithms. A nice additional exam- 
ple is the algorithm of Dreyfus and Wagner for 
Steiner tree [4, 7, 8]. 
Problem Definition


Given a graph $G$ with $n$ vertices, an ordering is a
bijective function $\pi : V(G) \to \{1, 2, \ldots, n\}$. The
bandwidth of $\pi$ is a maximal length of an edge,
i.e.,

\[
\mathrm{bw}(\pi) = \max_{uv \in E(G)} |\pi(u) - \pi(v)|.
\]

The bandwidth problem, given a graph $G$ and a
positive integer $b$, asks if there exists an ordering
of bandwidth at most $b$.
Key Results


An exhaustive search for the bandwidth problem
enumerates all the $n!$ orderings, trying to find
one of bandwidth at most $b$. The first single
exponential time algorithm is due to Feige and
Kilian [6], which we are going to describe now.
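
The exhaustive search baseline can be sketched as follows; it is only a reference point for the faster algorithms described next. The function names and the edge-list representation are assumptions made here.

```python
from itertools import permutations

def bandwidth(edges, pos):
    """Length of the longest edge under the ordering pos (vertex -> position)."""
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)

def ordering_with_bandwidth(vertices, edges, b):
    """Try all n! orderings; return one of bandwidth at most b, or None."""
    vertices = list(vertices)
    for perm in permutations(range(1, len(vertices) + 1)):
        pos = dict(zip(vertices, perm))
        if bandwidth(edges, pos) <= b:
            return pos
    return None

# A star with center "c" and three leaves has bandwidth exactly 2.
star = [("c", "x"), ("c", "y"), ("c", "z")]
print(ordering_with_bandwidth("cxyz", star, 1))
print(ordering_with_bandwidth("cxyz", star, 2))
```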


Bucketing 

Definition 1 For a positive integer $k$, let $\mathcal{I}_k$ be
the collection of $\lceil n/k \rceil$ sets obtained by splitting
the set $\{1, \ldots, n\}$ into equal parts (except the last
one), i.e., $\mathcal{I}_k = \{\{1, \ldots, k\}, \{k+1, \ldots, 2k\}, \ldots\}$.
A function $f : V(G) \to \mathcal{I}_k$ is called a $k$-bucket
assignment, if for every edge $uv \in E(G)$ at least
one of the following conditions is satisfied:


• $f(u) = f(v)$,
• $|\max f(u) - \min f(v)| \le b$,
• $|\min f(u) - \max f(v)| \le b$.


Clearly, if a function $f : V(G) \to \mathcal{I}_k$ is not a
$k$-bucket assignment, then there is no ordering
of bandwidth at most $b$ consistent with $f$, where
$\pi$ is consistent with $f$ iff $\pi(v) \in f(v)$ for each
$v \in V(G)$. A bucket function can be seen as a
rough assignment: instead of assigning vertices
to their final positions in the ordering, we assign
them to intervals.

The $O(10^{n} \mathrm{poly}(n))$ time algorithm of [6] is
based on two ideas, both related to the notion of
bucket assignments. For the sake of presentation,
let us assume that $n$ is divisible by $b$, whereas $b$
is a power of two. Moreover, we assume that $G$ is
connected, as otherwise it is enough to consider
each connected component of $G$ separately.

First, one needs to show that there is a family
of at most $n3^{n-1}$ $b$-bucket assignments $\mathcal{F}$, such
that any ordering of bandwidth at most $b$ is
consistent with some $b$-bucket assignment from
$\mathcal{F}$. We create $\mathcal{F}$ recursively by branching. First,
fix an arbitrary vertex $v_0$, and assign it to some interval
from $\mathcal{I}_b$ (there are at most $n$ choices here).
Next, consider any vertex $v$ without assigned
interval, which has a neighbor $u$ with already
assigned interval. By the assumption that $G$ is
connected, $v$ always exists. Note that in order
to create a valid bucket assignment, $v$ has to be
assigned either to the same interval as $u$ or to
one of its two neighboring intervals. This gives
at most three branches to be explored.
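
The branching just described can be sketched as a generator. This is an illustration under stated assumptions, not the implementation of [6]: the graph is given as an adjacency dictionary, $n$ is divisible by $b$, bucket $j$ stands for the interval $\{jb+1, \ldots, (j+1)b\}$, vertices are processed in BFS order so that each has an already assigned neighbor, and the final filter keeps only assignments in which every edge joins equal or neighboring intervals.

```python
from collections import deque

def bucket_assignments(adj, b):
    """Yield b-bucket assignments (vertex -> bucket index) of a connected graph."""
    vertices = list(adj)
    num_buckets = len(vertices) // b
    start = vertices[0]
    parent = {start: None}
    order = [start]
    queue = deque([start])
    while queue:  # BFS: every later vertex has an earlier neighbor (its parent)
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)

    def extend(i, f):
        if i == len(order):
            # keep only assignments valid for all edges, not just tree edges
            if all(abs(f[u] - f[v]) <= 1 for u in adj for v in adj[u]):
                yield dict(f)
            return
        v = order[i]
        if parent[v] is None:
            candidates = list(range(num_buckets))  # at most n choices
        else:
            p = f[parent[v]]
            candidates = [c for c in (p - 1, p, p + 1) if 0 <= c < num_buckets]
        for c in candidates:  # at most three branches
            f[v] = c
            yield from extend(i + 1, f)
        f.pop(v, None)

    yield from extend(0, {})

path3 = {0: [1], 1: [0, 2], 2: [1]}
print(len(list(bucket_assignments(path3, 1))))
```

On the three-vertex path with $b = 1$ this yields 17 assignments, within the bound $n3^{n-1} = 27$.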

In the second phase, consider some $b$-bucket
assignment $f \in \mathcal{F}$. We want to check whether
there exists some ordering of bandwidth at most
$b$ consistent with $f$. To do this, for each vertex
$v$, we branch into two choices, deciding whether
$v$ should be assigned to the left half of $f(v)$ or
to the right half of $f(v)$. This leads to at most
$2^{n}$ $b/2$-bucket assignments to be processed. The
key observation is that each of those assignments
can be naturally split into two independent subproblems.
This is because each edge within an
interval of length $b/2$ and each edge between
two neighboring intervals of length $b/2$ will be
of length at most $b - 1$. Additionally, each edge
connecting two vertices with at least two intervals
of length $b/2$ in between would lead to violating
the constraint of being a valid $b/2$-bucket assignment.
Therefore, it is enough to consider vertices
in even and odd intervals separately (see Fig. 1).
Such a routine of creating more and more refined
bucket assignments can be continued, where the
running time used for $n$ vertices satisfies

\[
T(n) = 2^{n} \cdot 2 \cdot T(n/2)
\]
Exact Algorithms for Bandwidth, Fig. 1 Thick vertical lines separate subsequent intervals from $\mathcal{I}_{b/2}$. Meaningful
edges connect vertices with exactly one interval in between
which in turn gives $T(n) = 4^{n} \mathrm{poly}(n)$.
Since we have $|\mathcal{F}| \le n3^{n-1}$, we end up
with an $O(12^{n} \mathrm{poly}(n))$ time algorithm. If instead
of generating $b$-bucket assignments one uses
$b/2$-bucket assignments (there are at most $n5^{n-1}$
of them), then the running time can be improved
to $10^{n} \mathrm{poly}(n)$.
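
For the reader's convenience, the arithmetic behind these bounds can be spelled out. The following is a sketch under the simplifying assumptions that $n$ is a power of two and $T(1) = O(1)$, with the recurrence read as $T(n) = 2^{n} \cdot 2 \cdot T(n/2)$:

```latex
\begin{align*}
T(n) &= 2^{n} \cdot 2 \cdot T(n/2)
      = 2^{n + n/2} \cdot 2^{2} \cdot T(n/4)
      = \dots
      = 2^{\,n + n/2 + n/4 + \dots + 2} \cdot 2^{\log_2 n} \cdot T(1) \\
     &\le 2^{2n} \cdot n \cdot T(1) = 4^{n}\,\mathrm{poly}(n), \\
n3^{n-1} \cdot 4^{n}\,\mathrm{poly}(n) &= O(12^{n}\,\mathrm{poly}(n)), \\
n5^{n-1} \cdot 2 \cdot T(n/2) &= n5^{n-1} \cdot 2 \cdot 4^{n/2}\,\mathrm{poly}(n)
      = O(10^{n}\,\mathrm{poly}(n)).
\end{align*}
```

The last line reflects that a $b/2$-bucket assignment can be split into its two independent subproblems directly, without first paying the $2^{n}$ factor for halving the intervals.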


Dynamic Programming 

In [2,5], Cygan and Pilipczuk have shown that
for a single $(b + 1)$-bucket assignment, one can
check in time and space $O(2^{n} \mathrm{poly}(n))$ whether
there exists an ordering of bandwidth at most
$b$ consistent with it. Since there are at most
$n3^{n-1}$ $(b + 1)$-bucket assignments, this leads to
an $O(6^{n} \mathrm{poly}(n))$ time algorithm.

The key idea is to assign vertices to their final
positions consistent with some $f \in \mathcal{F}$ in a very
specific order. Let us color the set of positions
$\{1, \ldots, n\}$ with $\mathrm{color}(i) = (i - 1) \bmod (b + 1)$.
Define a color order of positions, where
positions from $\{1, \ldots, n\}$ are sorted by their
color values, breaking ties with position values
(see Fig. 2).
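
The color order is easy to compute directly from its definition. The sketch below uses a function name chosen here and the parameters $n = 14$, $b = 3$ of Fig. 2.

```python
def color_order(n, b):
    """Positions 1..n sorted by color (i - 1) mod (b + 1), ties by position."""
    return sorted(range(1, n + 1), key=lambda i: ((i - 1) % (b + 1), i))

print(color_order(14, 3))
# [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 4, 8, 12]
```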

A lemma that proves usefulness of the
color order shows that if we assign vertices to
positions in the color order, then we can use
the standard Held-Karp dynamic programming
over subsets approach. In particular, in a state of
dynamic programming, it is enough to store the
subset $S \subseteq V(G)$ of vertices already assigned
to the first $|S|$ positions in the color order
(see Fig. 3).

Exact Algorithms for Bandwidth, Fig. 2 An index of
each position in a color order for $n = 14$ and $b = 3$


Further Improvements 

Instead of upper bounding the running time of the
algorithm for each $(b + 1)$-bucket assignment
separately, one can count the number of states
of the dynamic programming routine used by the
algorithm throughout the processing of all the
bucket assignments. As shown in [2], this leads to
$O(5^{n} \mathrm{poly}(n))$ running time, which with more insights
and more technical analysis can be further
improved to $O(4.83^{n})$ [5] and $O(4.383^{n})$ [3].
If only polynomial space is allowed, then the
best known algorithm needs $O(9.363^{n})$ running
time [4].


Related Work 


Concerning small values of $b$, Saxe [8] presented
a nontrivial $O(n^{b+1})$ time and space dynamic
programming, consequently proving the problem
to be in XP. However, Bodlaender et al. [1] have
shown that bandwidth is hard for any fixed level
of the W hierarchy.

For a related problem of minimum distortion
embedding, Fomin et al. [7] obtained an
$O(5^{n} \mathrm{poly}(n))$ time algorithm, improved by
Cygan and Pilipczuk [4] to running times
same as for the best known bandwidth
algorithms.

Exact Algorithms for Bandwidth, Fig. 3 When a vertex
$v$ is to be assigned to the next position in the color
order, then all its neighbors from the left interval cannot
be yet assigned a position, whereas all its neighbors from
the right interval have to be already assigned in order to
obtain an ordering of bandwidth at most $b$
Open Problems


Many vertex ordering problems admit $O(2^{n}
\mathrm{poly}(n))$ time and space algorithms, like Hamiltonicity,
cutwidth, pathwidth, optimal linear
arrangement, etc. In [2], Cygan and Pilipczuk
have shown that a dynamic programming routine
with such a running time is possible, provided
a $(b + 1)$-bucket assignment is given. A natural
question to ask is whether it is possible to obtain
$O(2^{n} \mathrm{poly}(n))$ without the assumption of having
a fixed assignment to be extended.
Problem Definition


The dominating set problem is a classical NP- 
hard optimization problem which fits into the 
broader class of covering problems. Hundreds of 
papers have been written on this problem that has 
a natural motivation in facility location. 


Definition 1 For a given undirected, simple
graph $G = (V, E)$, a subset of vertices $D \subseteq V$ is
called a dominating set if every vertex $u \in V \setminus D$
has a neighbor in $D$. The minimum dominating
set problem (abbr. MDS) is to find a minimum
dominating set of $G$, i.e., a dominating set of $G$
of minimum cardinality.


Problem 1 (MDS) 


INPUT: Undirected simple graph G = (V, E). 
OUTPUT: A minimum dominating set D of G. 
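
As a concrete reading of the problem statement, the following brute-force sketch tries vertex subsets in order of increasing size. The adjacency representation (a dictionary mapping each vertex to the set of its neighbors) and the function names are assumptions made here; the procedure is exponential and only meant for tiny graphs.

```python
from itertools import combinations

def is_dominating(adj, D):
    """True iff every vertex is in D or has a neighbor in D."""
    D = set(D)
    return all(v in D or D & adj[v] for v in adj)

def minimum_dominating_set(adj):
    """Return a dominating set of minimum cardinality by exhaustive search."""
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for D in combinations(vertices, size):
            if is_dominating(adj, D):
                return set(D)

# Path on five vertices 0 - 1 - 2 - 3 - 4
path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(minimum_dominating_set(path5))
```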
Various modifications of the dominating set problem
are of interest, some of them obtained by
putting additional constraints on the dominating
set as, e.g., requesting it to be an independent
set or to be connected. In graph theory, there 
is a huge literature on domination dealing with 
the problem and its many modifications. In graph 
algorithms, the MDS problem and some of its 
modifications like independent dominating set 
and connected dominating set have been studied 
as benchmark problems for attacking NP-hard 
problems under various algorithmic approaches. 


Known Results 

The algorithmic complexity of MDS and its mod- 
ifications when restricted to inputs from a par- 
ticular graph class has been studied extensively. 
Among others, it is known that MDS remains 
NP-hard on bipartite graphs, split graphs, planar 
graphs, and graphs of maximum degree 3. Poly- 
nomial time algorithms to compute a minimum 
dominating set are known, e.g., for permutation, 
interval, and $k$-polygon graphs. There is also an
$O(3^{k} n^{O(1)})$ time algorithm to solve MDS on
graphs of treewidth at most $k$.

The dominating set problem is one of the 
basic problems in parameterized complexity; it 
is W[2]-complete and thus it is unlikely that the 
problem is fixed parameter tractable. On the other 
hand, the problem is fixed parameter tractable on 
planar graphs. Concerning approximation, MDS 
is equivalent to MINIMUM SET COVER under L- 
reductions. There is an approximation algorithm 
solving MDS within a factor of $1 + \log|V|$,
and it cannot be approximated within a factor
of $(1 - \varepsilon) \ln|V|$ for any $\varepsilon > 0$, unless NP $\subseteq$
DTIME$(n^{\log \log n})$.


Exact Exponential Algorithms 
If P $\neq$ NP, then no polynomial time algorithm
can solve MDS. Even worse, it has been observed
in [5] that unless SNP $\subseteq$ SUBEXP (which is
considered to be highly unlikely), there is not
even a subexponential time algorithm solving the
dominating set problem.

The trivial $O(2^{n} (n + m))$ algorithm, which
simply checks all the $2^{n}$ vertex subsets whether
they are dominating, clearly solves MDS. Three
faster algorithms have been established in 2004.
The algorithm of Fomin et al. [5] uses a deep
graph-theoretic result due to B. Reed, stating
that every graph on $n$ vertices with minimum
degree at least three has a dominating set of size
at most $3n/8$, to establish an $O(2^{0.955n})$ time
algorithm solving MDS. The $O(2^{0.919n})$ time
algorithm of Randerath and Schiermeyer [9] uses
very nice ideas including matching techniques to
restrict the search space. Finally, Grandoni [6]
established an $O(2^{0.850n})$ time algorithm to solve
MDS.


Key Results 


Branch and Reduce and Measure 

and Conquer 

The work of Fomin, Grandoni, and Kratsch 
presents a simple and easy way to implement 
a recursive branch and reduce algorithm to solve 
MDS. It was first presented at ICALP 2005 [2] 
and later published in 2009 in [3]. The running 
time of the algorithm is significantly faster than 
the ones stated for the previous algorithms. 
This is heavily based on the analysis of the 
running time by measure and conquer, which 
is a method to analyze the worst case running 
time of (simple) branch and reduce algorithms 
based on a sophisticated choice of the measure of 
a problem instance. 


Theorem 1 There is a branch and reduce algorithm
solving MDS in time $O(2^{0.610n})$ using
polynomial space.


Theorem 2 There is an algorithm solving MDS
in time $O(2^{0.598n})$ using exponential space.


The algorithms of Theorems 1 and 2 are
simple consequences of a transformation from
MDS to MINIMUM SET COVER (abbr. MSC)
combined with new exact exponential time
algorithms for MSC.


Problem 2 (MSC) 


INPUT: Finite set $U$ and a collection $\mathcal{S}$ of subsets
$S_1, S_2, \ldots, S_t$ of $U$.
OUTPUT: A minimum set cover $\mathcal{S}'$, where $\mathcal{S}' \subseteq
\mathcal{S}$ is a set cover of $(U, \mathcal{S})$ if $\bigcup_{S_i \in \mathcal{S}'} S_i = U$.
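
A transformation from MDS to MSC can be sketched as follows, assuming the standard construction in which the universe is the vertex set and each vertex contributes its closed neighborhood as a set (so $|U| + |\mathcal{S}| = 2n$). The exhaustive set cover solver is only a stand-in for the branch and reduce algorithms discussed here; all names are chosen for illustration.

```python
from itertools import combinations

def mds_to_msc(adj):
    """Universe = vertices; one set per vertex: its closed neighborhood N[v]."""
    universe = set(adj)
    sets = {v: {v} | set(adj[v]) for v in adj}
    return universe, sets

def minimum_set_cover(universe, sets):
    """Exhaustive search for a smallest subfamily covering the universe."""
    names = list(sets)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            covered = set().union(*(sets[s] for s in chosen))
            if covered >= universe:
                return set(chosen)
    return None

# Path on five vertices 0 - 1 - 2 - 3 - 4
path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
universe, sets = mds_to_msc(path5)
print(minimum_set_cover(universe, sets))
```

The vertices whose closed neighborhoods are chosen form a dominating set of the same size, and vice versa.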


Theorem 3 There is a branch and reduce algorithm
solving MSC in time $O(2^{0.305(|U| + |\mathcal{S}|)})$
using polynomial space.


Applying memorization to the polynomial 
space algorithm of Theorem 3, the running time 
can be improved as follows. 


Theorem 4 There is an algorithm solving MSC
in time $O(2^{0.299(|\mathcal{S}| + |U|)})$ needing exponential
space.


The analysis of the worst case running time of
the simple branch and reduce algorithm solving
MSC (of Theorem 3) is done by a careful choice
of the measure of a problem instance, which makes
it possible to obtain an upper bound that is significantly
smaller than the one that could be obtained using
the standard measure. The refined analysis leads
to a collection of recurrences. Then, random local
search was used to compute the weights, used in
the definition of the measure, aiming at the best
achievable upper bound of the worst case running
time. By now various other methods to do these
time-consuming computations are available; see,
e.g., [1].


Getting Faster MDS Algorithms 

There is a lot of interest in exact exponential
algorithms for solving MDS and in improving
their best known running times. Two important
improvements on the running times of the original
algorithm stated in Theorems 1 and 2 have
been achieved. To simplify the comparison, let
us mention that in [4] those running times are
stated as $O(1.5259^{n})$ using polynomial space and
$O(1.5132^{n})$ needing exponential space.

Van Rooij and Bodlaender presented faster exact
exponential algorithms solving MDS that are
strongly based on the algorithms of Fomin et al.
and the methods of their analysis. By introducing
new reduction rules in the algorithm and a refined
analysis, they achieved running time $O(1.5134^{n})$
using polynomial space and time $O(1.5063^{n})$
needing exponential space,
presented at STACS 2008. This analysis has been
further improved in [11] to achieve a running time
of $O(1.4969^{n})$ using polynomial space, which
was published in 2011. It should be emphasized
that memorization cannot be applied to the latter
algorithm.

The currently best known algorithms solving
MDS have been obtained by Iwata [7] and presented
at IPEC 2011.


Theorem 5 There is a branch and reduce algorithm
solving MDS in time $O(1.4864^{n})$ using
polynomial space.


Theorem 6 There is an algorithm solving
MDS in time $O(1.4689^{n})$ needing exponential
space.


Iwata's polynomial space branch and reduce
algorithm is also strongly related to the algorithm
of Fomin et al. and its analysis. The improvement
in the running time is achieved by
a crucial change in the order of branchings
in the algorithm solving MSC, i.e., the algorithm
branches on the same element consecutively.
These consecutive branchings can then
be exploited by a refined analysis using global
weights called potentials. Thus, such an analysis
is dubbed "potential method." By a variant
of memorization where dynamic programming
memorizes only solutions of subproblems with
a small number of elements, an algorithm of running
time $O(1.4689^{n})$ needing exponential space
has been obtained.


Counting Dominating Sets 

A strongly related problem is #DS that asks to
determine for a given graph $G$ the number of
dominating sets of size $k$, for any $k$. In [8],
Nederlof, van Rooij, and van Dijk show how to
combine inclusion/exclusion and a branch and reduce
algorithm while using measure and conquer,
so as to obtain an algorithm (needing exponential
space) of running time $O(1.5002^{n})$. Clearly, this
also solves MDS.


Applications 


There are various other NP-hard domination-type 
problems that can be solved by exact exponential
algorithms based on an algorithm solving
MINIMUM SET COVER: any instance of the initial
problem is transformed to an instance of MSC, 
and then an algorithm solving MSC is applied and 
thus the initial problem is solved. Examples of 
such problems are TOTAL DOMINATING SET, k- 
DOMINATING SET, k-CENTER, and MDS on split 
graphs. Measure and conquer and the strongly 
related quasiconvex analysis of Eppstein [1] have 
been used to design and analyze a variety of 
exact exponential branch and reduce algorithms 
for NP-hard problems, optimization, counting, 
and enumeration problems; see [4]. 
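The transformation to MSC can be illustrated on MDS itself: the universe is the vertex set and there is one set per vertex, namely its closed neighborhood. The sketch below is only an illustration under assumptions; the names `mds_to_msc` and `min_set_cover_bruteforce` are hypothetical, and exhaustive search stands in for an actual exact MSC algorithm.

```python
from itertools import combinations


def mds_to_msc(adj):
    """Turn a graph (dict: vertex -> set of neighbours) into a set cover instance.

    The universe is the vertex set; vertex v contributes the set N[v].
    """
    universe = set(adj)
    sets = {v: {v} | set(adj[v]) for v in adj}
    return universe, sets


def min_set_cover_bruteforce(universe, sets):
    """Return a minimum list of set names covering the universe (exhaustive search)."""
    names = list(sets)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            covered = set().union(*(sets[s] for s in chosen))
            if covered >= universe:
                return list(chosen)
    return None  # the universe cannot be covered at all


# Path on five vertices: a minimum dominating set has two vertices.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
dominating = min_set_cover_bruteforce(*mds_to_msc(path))
```

A set of chosen vertices covers the universe exactly when their closed neighborhoods contain every vertex, that is, when the chosen vertices form a dominating set.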


Open Problems 


While for many algorithms it is easy to show that the worst-case
analysis is tight, this is not the case for the current time analysis
of branch and reduce algorithms. For example, the worst-case running
times of the branch and reduce algorithms of Fomin et al. [3] solving
MDS and MSC remain unknown; a lower bound of $\Omega(3^{n/4})$ for the
MDS algorithm is known. The situation is similar for many other branch
and reduce algorithms. Consequently, there is a strong need for new
and better tools to analyze the worst-case running time of branch and
reduce algorithms.


Problem Definition


The satisfiability problem (SAT) for Boolean formulas in conjunctive
normal form (CNF) is one of the first NP-complete problems [2, 13].
Since its NP-completeness currently leaves no hope for polynomial-time
algorithms, progress is made by decreasing the exponent of the running
time bound. There are several versions of this parametrized problem
that differ in the parameter used for the estimation of the running
time.


Problem 1 (SAT) 


INPUT: Formula F in CNF containing n variables, m clauses, and l
literals in total.

OUTPUT: "Yes" if F has a satisfying assignment, i.e., a substitution
of Boolean values for the variables that makes F true. "No" otherwise.


The bounds on the running time of SAT algorithms can thus be given in
the form $|F|^{O(1)} \cdot \alpha^n$, $|F|^{O(1)} \cdot \beta^m$, or
$|F|^{O(1)} \cdot \gamma^l$, where |F| is the length of a reasonable
bit representation of F (i.e., the formal input to the algorithm). In
fact, for the present algorithms, the bases $\beta$ and $\gamma$ are
constants, while $\alpha$ is a function $\alpha(n, m)$ of the formula
parameters (because no better constant than $\alpha = 2$ is known).


Notation 

A formula in conjunctive normal form is a set of clauses (understood
as the conjunction of these clauses), a clause is a set of literals
(understood as the disjunction of these literals), and a literal is
either a Boolean variable or the negation of a Boolean variable. A
truth assignment assigns Boolean values (false or true) to one or more
variables. An assignment is abbreviated as the list of literals that
are made true under this assignment (e.g., assigning false to x and
true to y is denoted by $\neg x, y$). The result of the application of
an assignment A to a formula F (denoted F[A]) is the formula obtained
by removing the clauses containing the true literals from F and
removing the falsified literals from the remaining clauses. For
example, if $F = (x \vee \neg y \vee z) \wedge (y \vee \neg z)$, then
$F[\neg x, y] = (z)$. A satisfying assignment for F is an assignment A
such that F[A] = true. If such an assignment exists, F is called
satisfiable.
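The operation F[A] is easy to state in code. In the following minimal sketch, literals are encoded as nonzero integers (v for a variable and -v for its negation); this encoding and the name `apply_assignment` are assumptions made for illustration and are not taken from the entry.

```python
def apply_assignment(formula, true_literals):
    """Return F[A] for a CNF formula given as a list of frozensets of integer literals.

    Clauses containing a true literal are removed; falsified literals are
    deleted from the remaining clauses.
    """
    true_literals = set(true_literals)
    result = []
    for clause in formula:
        if clause & true_literals:
            continue  # the clause is satisfied and disappears
        result.append(frozenset(l for l in clause if -l not in true_literals))
    return result


# F = (x or not y or z) and (y or not z), with x = 1, y = 2, z = 3.
F = [frozenset({1, -2, 3}), frozenset({2, -3})]
reduced = apply_assignment(F, {-1, 2})  # the assignment "not x, y"
```

With this encoding, the empty list plays the role of the formula "true", and a formula containing an empty clause is unsatisfiable.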


Key Results 
Bounds for $\beta$ and $\gamma$


General Approach and a Bound for $\beta$

The trivial brute-force algorithm enumerating all possible assignments
to the n variables runs in $2^n$ polynomial-time steps. Thus
$\alpha \le 2$, and for trivial reasons also $\beta, \gamma \le 2$. In
the early 1980s, Monien and Speckenmeyer noticed that $\beta$ could be
made smaller. (They and other researchers also noticed that $\alpha$
could be made smaller for a special case of the problem where the
length of each clause is bounded by a constant; the reader is referred
to another entry (Local search algorithms for k-SAT) of the
Encyclopedia for relevant references and algorithms.) Then Kullmann
and Luckhardt [12] set up a framework for divide-and-conquer
algorithms for SAT (also called DPLL algorithms, due to the papers of
Davis and Putnam [6] and Davis, Logemann, and Loveland [7]) that split
the original problem into several (yet usually a constant number of)
subproblems by substituting the values of some variables and
simplifying the obtained formulas. This line of research resulted in
the following upper bounds for $\beta$ and $\gamma$:


Theorem 1 (Hirsch [8]) SAT can be solved in time


1. $|F|^{O(1)} \cdot 2^{0.30897m}$;
2. $|F|^{O(1)} \cdot 2^{0.10299l}$.


A typical divide-and-conquer algorithm for SAT consists of two phases:
splitting of the original problem into several subproblems (e.g.,
reducing SAT(F) to SAT(F[x]) and SAT($F[\neg x]$)) and simplification
of the obtained subproblems using polynomial-time transformation rules
that do not affect the satisfiability of the subproblems (i.e., they
replace a formula by an equi-satisfiable one). The subproblems
$F_1, \ldots, F_k$ for splitting are chosen so that the corresponding
recurrent inequality using the simplified problems
$F'_1, \ldots, F'_k$,


$T(F) \le \sum_{i=1}^{k} T(F'_i) + \mathrm{const}$,


gives a desired upper bound on the number of leaves in the recurrence
tree and, hence, on the running time of the algorithm. In particular,
in order to obtain the bound $|F|^{O(1)} \cdot 2^{0.30897m}$ one takes
either two subproblems $F[x]$, $F[\neg x]$ with recurrent inequality


$t_m \le t_{m-3} + t_{m-4}$


or four subproblems $F[x, y]$, $F[x, \neg y]$, $F[\neg x, y]$,
$F[\neg x, \neg y]$ with recurrent inequality


$t_m \le 2t_{m-6} + 2t_{m-7}$,


where $t_i = \max_{m(G) \le i} T(G)$. The simplification rules used in
the $|F|^{O(1)} \cdot 2^{0.30897m}$-time and the
$|F|^{O(1)} \cdot 2^{0.10299l}$-time algorithms are as follows:
Simplification Rules 


Elimination of 1-Clauses If F contains a 1-clause (a), replace F by
F[a].


Subsumption If F contains two clauses C and D such that
$C \subseteq D$, replace F by $F \setminus \{D\}$.


Resolution with Subsumption Suppose a literal a and clauses C and D
are such that a is the only literal satisfying both conditions
$a \in C$ and $\neg a \in D$. In this case, the clause
$(C \cup D) \setminus \{a, \neg a\}$ is called the resolvent by the
literal a of the clauses C and D and denoted by R(C, D).

The rule is: if $R(C, D) \subseteq D$, replace F by
$(F \setminus \{D\}) \cup \{R(C, D)\}$.


Elimination of a Variable by Resolution [6] Given a literal a,
construct the formula $DP_a(F)$ by


1. Adding to F all resolvents by a
2. Removing from F all clauses containing a or $\neg a$

The rule is: if $DP_a(F)$ is not larger in m (resp., in l) than F,
then replace F by $DP_a(F)$.


Elimination of Blocked Clauses A clause C is blocked for a literal a
w.r.t. F if C contains the literal a, and the literal $\neg a$ occurs
only in the clauses of F that contain the negation of at least one of
the literals occurring in $C \setminus \{a\}$. For a CNF-formula F and
a literal a occurring in it, the assignment I(a, F) is defined as


$\{a\} \cup \{\text{literals } x \notin \{a, \neg a\} \mid \text{the clause } \{\neg a, x\} \text{ is blocked for } \neg a \text{ w.r.t. } F\}$.


Lemma 2 (Kullmann [11])


(1) If a clause C is blocked for a literal a w.r.t. F, then F and
    $F \setminus \{C\}$ are equi-satisfiable.

(2) Given a literal a, the formula F is satisfiable iff at least one
    of the formulas $F[\neg a]$ and $F[I(a, F)]$ is satisfiable.


The first claim of the lemma is employed as a 
simplification rule. 


Application of the Black and White Literals Principle Let P be a
binary relation between literals and formulas in CNF such that for a
variable v and a formula F, at most one of P(v, F) and
$P(\neg v, F)$ holds.


Lemma 3 Suppose that each clause of F that contains a literal w
satisfying P(w, F) contains also at least one literal b satisfying
$P(\neg b, F)$. Then F and $F[\{l \mid P(\neg l, F)\}]$ are
equi-satisfiable.
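The recurrent inequalities quoted for the bound in m can be checked numerically. A recurrence $t_m \le \sum_i t_{m-d_i}$ has solutions of the form $x^m$, where x is the root greater than 1 of $\sum_i x^{-d_i} = 1$. The sketch below finds that root by bisection; the function name `branching_number` is an assumption made for illustration, and the computation is only a sanity check, not part of the analysis in [8].

```python
import math


def branching_number(decreases):
    """Root x > 1 of sum(x**(-d) for d in decreases) == 1, found by bisection.

    Assumes at least two branches with positive decreases, so that the sum
    exceeds 1 at x = 1 and is strictly decreasing in x.
    """
    lo, hi = 1.0, 2.0
    while sum(hi ** -d for d in decreases) > 1:
        hi *= 2  # enlarge the bracket until the sum drops to 1 or below
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(mid ** -d for d in decreases) > 1:
            lo = mid
        else:
            hi = mid
    return hi


# Two-way split t_m <= t_{m-3} + t_{m-4} and four-way split
# t_m <= 2 t_{m-6} + 2 t_{m-7}, expressed as exponents of 2.
two_way = math.log2(branching_number([3, 4]))
four_way = math.log2(branching_number([6, 6, 7, 7]))
```

Under these assumptions, the four-way split is the one that determines the exponent 0.30897, while the two-way split yields a smaller exponent.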


A Bound for $\gamma$

To obtain the bound $|F|^{O(1)} \cdot 2^{0.10299l}$ it is enough to
use a pair $F[\neg a]$, $F[I(a, F)]$ of subproblems (see Lemma 2(2))
achieving the desired recurrent inequality
$t_l \le t_{l-5} + t_{l-17}$ and to switch to the
$|F|^{O(1)} \cdot 2^{0.30897m}$-time algorithm if there are none. A
recent (much more technically involved) improvement to this algorithm
[16] achieves the bound $|F|^{O(1)} \cdot 2^{0.0926l}$.


A Bound for $\alpha$

Currently, no non-trivial constant upper bound for $\alpha$ is known.
However, starting with [14] there has been interest in non-constant
bounds. A series of randomized and deterministic algorithms showing
successive improvements was developed, and at the moment the best
known bound is achieved by a deterministic divide-and-conquer
algorithm employing the following recursive procedure. The idea behind
it is a dichotomy: either each clause of the input formula can be
shortened to its first k literals (then a k-CNF algorithm can be
applied), or all these literals in one of the clauses can be assumed
false. (This clause-shortening approach can be attributed to Schuler
[15], who used it in a randomized fashion. The following version of
the deterministic algorithm achieving the best known bound both for
deterministic and randomized algorithms appears in [5].)

Procedure S


Input: a CNF formula F and a positive integer k.


1. Assume F consists of clauses $C_1, \ldots, C_m$. Change each clause
   $C_i$ to a clause $D_i$ as follows: if $|C_i| > k$, then choose any
   k literals in $C_i$ and drop the other literals; otherwise leave
   $C_i$ as is, i.e., $D_i = C_i$. Let F' denote the resulting formula.

2. Test satisfiability of F' using the
   $m \cdot \mathrm{poly}(n) \cdot (2 - 2/(k+1))^n$-time k-CNF
   algorithm defined in [4].

3. If F' is satisfiable, output "satisfiable" and halt. Otherwise, for
   each i, do the following:

   1. Convert F to $F_i$ as follows:
      1. Replace $C_j$ by $D_j$ for all j < i.
      2. Assign false to all literals in $D_i$.
   2. Recursively invoke Procedure S on $(F_i, k)$.

4. Return "unsatisfiable".
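The control flow of Procedure S can be sketched as follows. This is a sketch under stated assumptions, not the algorithm of [5]: literals are nonzero integers, clauses are tuples without complementary literals, "any k literals" is taken to mean the first k, an explicit check for an empty clause is added as the base case, and exhaustive search (`brute_force_sat`, a hypothetical helper) stands in for the k-CNF algorithm of [4].

```python
from itertools import product


def assign(clauses, true_literals):
    """F[A]: drop satisfied clauses and delete falsified literals."""
    true_literals = set(true_literals)
    return [tuple(l for l in c if -l not in true_literals)
            for c in clauses if true_literals.isdisjoint(c)]


def brute_force_sat(clauses):
    """Stand-in for the k-CNF subroutine: try every assignment."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, values))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def procedure_s(clauses, k, ksat=brute_force_sat):
    """Return True iff the CNF formula is satisfiable (k must be positive)."""
    if any(len(c) == 0 for c in clauses):
        return False  # an empty clause cannot be satisfied
    shortened = [tuple(c[:k]) for c in clauses]  # step 1: D_i from C_i
    if ksat(shortened):  # step 2: F' is a k-CNF formula
        return True      # step 3: a model of F' also satisfies F
    for i in range(len(clauses)):
        # F_i: C_j replaced by D_j for j < i, all literals of D_i set to false.
        f_i = shortened[:i] + [tuple(c) for c in clauses[i:]]
        f_i = assign(f_i, [-l for l in shortened[i]])
        if procedure_s(f_i, k, ksat):
            return True
    return False  # step 4
```

Each recursive call fixes the variables of a nonempty clause $D_i$, so the recursion terminates. If F is satisfiable while F' is not, a satisfying assignment falsifies some $D_i$; for the smallest such i, the formula $F_i$ is satisfiable.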


The algorithm just invokes Procedure S on the original formula and the
integer parameter k = k(m, n). The most accurate analysis of this
family of algorithms by Calabro, Impagliazzo, and Paturi [1] implies
that, assuming that m > n, one can obtain the following bound by
taking $k(m, n) = 2 \log(m/n) + \mathrm{const}$. (This explicit bound
is not stated in [1] and is inferred in [3].)


Theorem 4 (Dantsin, Hirsch [3]) Assuming m > n, SAT can be solved in
time


$|F|^{O(1)} \cdot 2^{n \left(1 - \frac{1}{O(\log(m/n))}\right)}$.


Applications 


While SAT has numerous applications, the pre- 
sented algorithms have no direct effect on them. 


Open Problems 


Proving a constant upper bound $\alpha < 2$ remains a major open
problem in the field, as does the hypothetical existence of
$(1 + \varepsilon)^l$-time algorithms for arbitrarily small
$\varepsilon > 0$.

It is possible to perform the analysis of a divide-and-conquer
algorithm and even to generate simplification rules automatically
[10]. However, this approach has so far led to new bounds only for the
(NP-complete) optimization version of 2-SAT [9].


Experimental Results 


Jun Wang has implemented the algorithm yielding the bound on $\beta$
and collected some statistics regarding the number of applications of
the simplification rules [17].


Problem Definition


A graph class $\Pi$ is a set of simple graphs. One can also think of
$\Pi$ as a property: $\Pi$ comprises all the graphs that satisfy a
certain condition. We say that a class (property) $\Pi$ is hereditary
if it is closed under taking induced subgraphs. More precisely,
whenever $G \in \Pi$ and H is an induced subgraph of G, then also
$H \in \Pi$.


Exact Algorithms for Induced Subgraph Problems 


We shall consider the MAXIMUM INDUCED $\Pi$-SUBGRAPH problem: given a
graph G, find the largest (in terms of the number of vertices) induced
subgraph of G that belongs to $\Pi$. Suppose now that class $\Pi$ is
polynomial-time recognizable: there exists an algorithm that decides
whether a given graph H belongs to $\Pi$ in polynomial time. Then
MAXIMUM INDUCED $\Pi$-SUBGRAPH on an n-vertex graph G can be solved by
brute force in time $O^*(2^n)$ (the $O^*(\cdot)$ notation hides
factors polynomial in the input size): we iterate through all the
induced subgraphs of G, and on each of them, we run a polynomial-time
test deciding whether it belongs to $\Pi$.
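The brute-force approach can be written down directly. In this minimal sketch the class $\Pi$ is passed as a predicate on graphs; the names `max_induced_subgraph` and `is_edgeless` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from itertools import combinations


def max_induced_subgraph(adj, in_class):
    """Largest vertex set whose induced subgraph satisfies `in_class`.

    `adj` maps each vertex to the set of its neighbours. Subsets are tried
    from largest to smallest, so at most 2^n candidates are tested.
    """
    vertices = list(adj)
    for size in range(len(vertices), -1, -1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            induced = {v: adj[v] & chosen for v in chosen}
            if in_class(induced):
                return chosen
    return None  # not even the empty graph belongs to the class


def is_edgeless(graph):
    """Polynomial-time recognition of the class of edgeless graphs."""
    return all(not nbrs for nbrs in graph.values())


# Cycle on five vertices: a maximum independent set has two vertices.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
solution = max_induced_subgraph(c5, is_edgeless)
```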

Can we do anything smarter? Of course, this very much depends on the
class $\Pi$ we are working with. MAXIMUM INDUCED $\Pi$-SUBGRAPH is a
generic problem that encompasses many other problems as special cases;
examples include CLIQUE ($\Pi$ = complete graphs), INDEPENDENT SET
($\Pi$ = edgeless graphs), or FEEDBACK VERTEX SET ($\Pi$ = forests).
It is convenient to assume that $\Pi$ is also hereditary; this
assumption is satisfied in many important examples, including the
aforementioned special cases.

So far, the MAXIMUM INDUCED $\Pi$-SUBGRAPH problem has been studied
for many graph classes $\Pi$, and in basically all the cases it turned
out that it is possible to find an algorithm with running time
$O(c^n)$ for some c < 2. Obtaining a result of this type is often
informally called breaking the $2^n$ barrier. While the algorithms
share a common general methodology, vital details differ depending on
the structural properties of the class $\Pi$. This makes each and
every algorithm of this type tailored to a particular scenario.
However, it is tempting to formulate the following general conjecture.


Conjecture 1 ([1]) For every hereditary, polynomial-time recognizable
class of graphs $\Pi$, there exists a constant $c_\Pi < 2$ for which
there is an algorithm solving MAXIMUM INDUCED $\Pi$-SUBGRAPH in time
$O(c_\Pi^n)$.


On one hand, current partial progress on this conjecture consists of
scattered results exploiting different properties of particular
classes $\Pi$, without much hope for proving more general statements.
On the other hand, finding a counterexample refuting Conjecture 1
based, e.g., on the Strong Exponential Time Hypothesis seems
problematic: the input to MAXIMUM INDUCED $\Pi$-SUBGRAPH consists only
of $\binom{n}{2}$ bits of information about adjacencies between the
vertices, and it seems difficult to model the search space of a
general k-SAT using such input under the constraint that $\Pi$ has to
be hereditary and polynomial-time recognizable.

It may be that Conjecture 1 is either false or very difficult to
prove, and therefore, one can propose investigating certain subcases
of it connected to well-studied classes of graphs. For instance, one
could assume that graphs from $\Pi$ have constant treewidth or that
$\Pi$ is a subclass of chordal or interval graphs. Another direction
is to strengthen the assumption about the description of the class
$\Pi$ by requiring that belonging to $\Pi$ can be expressed in some
formalism (e.g., some variant of logic). Finally, one can investigate
the algorithms for MAXIMUM INDUCED $\Pi$-SUBGRAPH where $\Pi$ is not
required to be hereditary; here, natural nonhereditary properties are
connectivity and regularity.


Key Results 


Table 1 presents a selection of results on the MAXIMUM INDUCED
$\Pi$-SUBGRAPH problem. Since the algorithms are usually quite
technical when it comes to details, we now present an overview of the
general methodology and most important techniques. In the following,
we assume that $\Pi$ is hereditary and polynomial-time recognizable.

Most often, the general approach is to examine the structure of the
input instance and of a fixed, unknown optimum solution. The goal is
to identify as broad a spectrum of situations as possible where the
solution can be found by examining $O((2 - \varepsilon)^n)$
candidates, for some $\varepsilon > 0$. By checking the occurrence of
each of these situations, we eventually narrow down our investigations
to the case where we have a well-defined structure of the input
instance and a number of assumptions about what the solution looks
like. Then, hopefully, a direct algorithm can be devised.

Exact Algorithms for Induced Subgraph Problems, Table 1 Known results
for MAXIMUM INDUCED $\Pi$-SUBGRAPH. The first part of the table
presents results for problems for which breaking the $2^n$ barrier
follows directly from branching on forbidden subgraphs. The second
part contains results for which breaking the barrier requires a
nontrivial insight into the structure of $\Pi$. Finally, the last part
contains results for nonhereditary classes $\Pi$. Here, $\varepsilon$
denotes a small, positive constant, and its index specifies a
parameter on which the value of this constant depends


Property             Time complexity               Reference
Edgeless             $O(1.2109^n)$                 Robson [10]
Biclique             $O(1.3642^n)$                 Gaspers et al. [6]
Cluster graph        $O(1.6181^n)$                 Fomin et al. [3]
Bipartite            $O(1.62^n)$                   Raman et al. [9]
Acyclic              $O(1.7347^n)$                 Fomin et al. [2]
Constant treewidth   $O(1.7347^n)$                 Fomin et al. [2]
Planar               $O(1.7347^n)$                 Fomin et al. [4]
d-degenerate         $O((2 - \varepsilon_d)^n)$    Pilipczuk×2 [8]
Chordal              $O((2 - \varepsilon)^n)$      Bliznets et al. [1]
Interval             $O((2 - \varepsilon)^n)$      Bliznets et al. [1]
r-regular            $O((2 - \varepsilon_r)^n)$    Gupta et al. [7]
Matching             $O(1.6957^n)$                 Gupta et al. [7]

Let us consider a very simple example of this principle, which is also
a technique used in many algorithms for breaking the $2^n$ barrier.
Suppose the input graph has n vertices and assume the optimum solution
is of size larger than $(1/2 + \delta)n$, for some $\delta > 0$. Then,
as candidates for the optimum solution, we can consider all the vertex
subsets of at least this size: there are only
$(2 - \varepsilon)^n$ of them, where $\varepsilon > 0$ depends on
$\delta$. Similarly, if the optimum solution has size smaller than
$(1/2 - \delta)n$, then we can identify this situation by iterating
through all the vertex subsets of size $(1/2 - \delta)n$ (whose number
is again $(2 - \varepsilon)^n$ for some $\varepsilon > 0$) and
verifying that none of them induces a graph belonging to $\Pi$; note
that here we use the assumption that $\Pi$ is hereditary. In this case
we can solve the problem by looking at all vertex subsets of size at
most $(1/2 - \delta)n$. All in all, we can solve the problem faster
than $O^*(2^n)$ provided that the number of vertices in the optimum
solution differs by at least $\delta n$ from n/2, for some
$\delta > 0$. More precisely, for every $\delta > 0$ we will obtain a
running time of the form $O((2 - \varepsilon)^n)$, where
$\varepsilon$ tends to 0 when $\delta$ tends to 0. Hence, we can focus
only on the situation when the number of vertices in the optimum
solution is very close to n/2.
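The dependence of the base on the deviation from n/2 can be observed numerically. The sketch below counts the vertex subsets of size at least (1/2 + d)n exactly and compares the n-th root of this count with the standard binary entropy bound; the function names and the sample values n = 200 and d = 0.1 are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def candidate_base(n, delta):
    """n-th root of the number of subsets of size at least (1/2 + delta) n."""
    threshold = math.ceil((0.5 + delta) * n)
    count = sum(math.comb(n, k) for k in range(threshold, n + 1))
    return 2 ** (math.log2(count) / n)


base = candidate_base(200, 0.1)
entropy_bound = 2 ** binary_entropy(0.6)  # upper bound on the base
```

The number of such subsets is at most $2^{H(1/2 + d) n}$, and H(p) < 1 for p different from 1/2, which is why the base stays strictly below 2 for every fixed positive d.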

We now give an overview of some other im- 
portant techniques. 


Branching on Forbidden Induced Subgraphs

Every hereditary graph class $\Pi$ can be characterized by giving a
minimal set of forbidden induced subgraphs $\mathcal{F}$: a graph
belongs to $\Pi$ if and only if it does not contain any graph from
$\mathcal{F}$ as an induced subgraph, and $\mathcal{F}$ is
inclusion-wise minimal with this property. For instance, the class of
forests is characterized by $\mathcal{F}$ being the family of all the
cycles, whereas taking $\mathcal{F}$ to be the family of all the
cycles of length at least 4 gives the class of chordal graphs. For
many important classes the family $\mathcal{F}$ is infinite, but there
are notable examples where it is finite, like cluster, trivially
perfect, or split graphs.

If $\Pi$ is characterized by a finite set of forbidden subgraphs
$\mathcal{F}$, then already a simple branching strategy yields an
algorithm working in time $O((2 - \varepsilon)^n)$, for some
$\varepsilon > 0$ depending on $\mathcal{F}$. Without going into
details, we iteratively find a forbidden induced subgraph that is not
yet removed by the previous choices and branch on the fate of all the
undecided vertices in this subgraph, omitting the branch where all of
them are included in the solution. Since this forbidden induced
subgraph is of constant size, a standard analysis shows that the
running time of this algorithm is $O((2 - \varepsilon)^n)$ for some
$\varepsilon > 0$ depending on $\max_{H \in \mathcal{F}} |V(H)|$. This
simple observation can be combined with more sophisticated techniques
in the case when $\mathcal{F}$ is infinite. Namely, we can start the
algorithm by branching on forbidden induced subgraphs that are of
constant size and, when their supply is exhausted, turn to some other
algorithms. The following lemma provides a formalization of this
concept; a graph is called $\mathcal{F}$-free if it does not contain
any graph from $\mathcal{F}$ as an induced subgraph.


Lemma 1 ([1]) Let $\mathcal{F}$ be a finite set of graphs and let
$\ell$ be the maximum number of vertices in a graph from
$\mathcal{F}$. Let $\Pi$ be a hereditary graph class that is
polynomial-time recognizable. Assume that there exists an algorithm A
that, for a given $\mathcal{F}$-free graph G on n vertices, in time
$O((2 - \varepsilon)^n)$ finds a maximum induced subgraph of G that
belongs to $\Pi$, for some $\varepsilon > 0$. Then there exists an
algorithm A' that, for a given graph G on n vertices, in time
$O((2 - \varepsilon')^n)$ finds a maximum induced subgraph of G that
is $\mathcal{F}$-free and belongs to $\Pi$, where $\varepsilon' > 0$
is a constant depending on $\varepsilon$ and $\ell$.


Thus, for the purpose of breaking the $2^n$ barrier, it is sufficient
to focus on the case when no constant-size forbidden induced subgraph
is present in the input graph.
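The simple branching strategy for a finite family of forbidden induced subgraphs can be sketched for cluster graphs, which are exactly the graphs with no induced path on three vertices. The code is an illustration under assumptions (adjacency dictionaries, the hypothetical names `find_induced_p3` and `max_induced_cluster`); it is not the $O(1.6181^n)$ algorithm of [3].

```python
from itertools import combinations, product


def find_induced_p3(adj, alive):
    """Return (u, v, w) inducing a path u-v-w within `alive`, or None."""
    for v in alive:
        nbrs = [x for x in adj[v] if x in alive]
        for u, w in combinations(nbrs, 2):
            if w not in adj[u]:
                return (u, v, w)
    return None


def max_induced_cluster(adj, alive=None, forced=frozenset()):
    """Largest subset of `alive` containing `forced` and inducing a cluster graph.

    Returns None if no such subset exists.
    """
    if alive is None:
        alive = frozenset(adj)
    p3 = find_induced_p3(adj, alive)
    if p3 is None:
        return alive  # nothing forbidden is left
    undecided = [x for x in p3 if x not in forced]
    best = None
    # Branch on the fate of the undecided vertices of the forbidden subgraph,
    # omitting the branch that keeps all of them.
    for keep in product([True, False], repeat=len(undecided)):
        if all(keep):
            continue
        removed = {x for x, k in zip(undecided, keep) if not k}
        kept = {x for x, k in zip(undecided, keep) if k}
        cand = max_induced_cluster(adj, alive - removed, forced | kept)
        if cand is not None and (best is None or len(cand) > len(best)):
            best = cand
    return best
```

A branch with t undecided vertices creates $2^t - 1$ subproblems, each with t fewer undecided vertices. In the worst case t = 3, which gives the recurrence $T(n) \le 7\,T(n-3)$ and hence a base of $7^{1/3} < 2$.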


Exploiting a Large Substructure 

Here, the general idea is to look for a large substructure in the
graph that can be leveraged to design an algorithm breaking the
barrier. Let us take as an example the MAXIMUM INDUCED CHORDAL
SUBGRAPH problem, considered by Bliznets et al. [1]. Suppose that in
the input graph G one can find a clique Q of size $\delta n$, for some
$\delta > 0$; recall that the largest clique in a graph can be found
as fast as in time $O(1.2109^n)$ [10]. Then consider the following
algorithm: guess, by considering $2^{n - |Q|}$ possibilities, the
intersection of the optimum solution with $V(G) \setminus Q$. Then
observe that, since Q is a clique, every induced cycle in G can have
at most two vertices in common with Q. Hence, the problem of optimally
extending the choice on $V(G) \setminus Q$ to Q essentially boils down
to solving a VERTEX COVER instance on |Q| vertices, which can be done
in time $O(1.2109^{|Q|})$. As Q constitutes a linear fraction of all
the vertices, the overall running time is
$O(1.2109^{|Q|} \cdot 2^{n - |Q|})$, which is
$O((2 - \varepsilon)^n)$ for some $\varepsilon > 0$ depending on
$\delta$. Thus, one can focus on the case where the largest clique in
the input graph, and hence also in any maximum-sized induced chordal
subgraph, has fewer than $\delta n$ vertices.
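The last step of this running time estimate can be made explicit. Assuming only $|Q| \ge \delta n$, the following short calculation gives one admissible value of $\varepsilon$ (not necessarily the constant used in [1]):

```latex
1.2109^{|Q|} \cdot 2^{\,n-|Q|}
  = 2^{n} \left(\frac{1.2109}{2}\right)^{|Q|}
  \le 2^{n} \cdot 0.60545^{\,\delta n}
  = \left(2 \cdot 0.60545^{\,\delta}\right)^{n},
\qquad\text{so one may take}\qquad
\varepsilon = 2 - 2 \cdot 0.60545^{\,\delta} > 0 .
```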


Potential Maximal Cliques 

A potential maximal clique (PMC) in a graph G is a subset of vertices
that becomes a clique in some inclusion-wise minimal triangulation of
G (by a triangulation of a graph we mean any chordal supergraph of
it). Fomin and Villanger [2] observed two facts. Firstly, whenever H
is an induced subgraph of G of treewidth t, then there exists a
minimal triangulation TG of G that captures H in the following sense:
every clique of TG intersects V(H) only at a subset of some bag of a
fixed width-t tree decomposition of H. Secondly, a graph G on n
vertices can have only $O(1.734601^n)$ PMCs, which can be enumerated
in time $O(1.734601^n)$. Intuitively, this means that we can
effectively search the space of treewidth-t induced subgraphs of G in
time $O(1.734601^n \cdot n^{O(t)})$ using dynamic programming.
Slightly more precisely, treewidth-t induced subgraphs of G can be
assembled in a dynamic programming manner using states of the form
$(\Omega, X)$, where $\Omega$ is a PMC in G and X is a subset of
$\Omega$ of size at most t + 1, corresponding to
$\Omega \cap V(H)$. In this manner one can obtain an algorithm with
running time $O(1.734601^n \cdot n^{O(t)})$ for finding the maximum
induced treewidth-t subgraph, which in particular implies an
$O(1.734601^n)$-time algorithm for MAXIMUM INDUCED FOREST, equivalent
to FEEDBACK VERTEX SET. Recently, Fomin et al. [5] extended this
framework to encapsulate also problems where the induced subgraph H is
in addition required to satisfy a property expressible in Monadic
Second-Order Logic.


Problem Definition


The CNF satisfiability problem is to determine, given a CNF formula F
with n variables, whether or not there exists a satisfying assignment
for F. If each clause of F contains at most k literals, then F is
called a k-CNF formula and the problem is called k-SAT, which is one
of the most fundamental NP-complete problems. The trivial algorithm is
to search all $2^n$ 0/1-assignments to the n variables. But since [6],
several algorithms which run significantly faster than this $O(2^n)$
bound have been developed. As a simple exercise, consider the
following straightforward algorithm for 3-SAT, which gives an upper
bound of $1.913^n$: choose an arbitrary clause in F, say,
$(x_1 \vee x_2 \vee x_3)$. Then generate seven new formulas by
substituting for $x_1$, $x_2$, and $x_3$ all the possible values
except $(x_1, x_2, x_3) = (0, 0, 0)$, which obviously falsifies F. Now
one can check the satisfiability of these seven formulas and conclude
that F is satisfiable iff at least one of them is satisfiable. (Let
T(n) denote the time complexity of this algorithm. Then one gets the
recurrence $T(n) \le 7 \times T(n - 3)$, and the above bound follows
since $7^{1/3} < 1.913$.)
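The seven-way branching can be sketched as follows. Literals are encoded as nonzero integers and clauses as tuples; this encoding and the names `simplify` and `branching_sat` are assumptions made for illustration. The code works for clauses of any length and branches on every assignment to the variables of the chosen clause except those falsifying it.

```python
from itertools import product


def simplify(clauses, true_literals):
    """Drop satisfied clauses and delete falsified literals."""
    true_literals = set(true_literals)
    return [tuple(l for l in c if -l not in true_literals)
            for c in clauses if true_literals.isdisjoint(c)]


def branching_sat(clauses):
    """Decide satisfiability by branching on the variables of one clause."""
    if not clauses:
        return True
    if any(len(c) == 0 for c in clauses):
        return False
    clause = clauses[0]
    variables = sorted({abs(l) for l in clause})
    for values in product([False, True], repeat=len(variables)):
        literals = [v if val else -v for v, val in zip(variables, values)]
        if set(literals).isdisjoint(clause):
            continue  # this assignment falsifies the chosen clause
        if branching_sat(simplify(clauses, literals)):
            return True
    return False
```

For a clause with three distinct variables, exactly one of the eight assignments is skipped, which gives the seven subproblems of the recurrence above.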


Key Results 


In the long history of k-SAT algorithms, the one by Schöning [11] is
an important breakthrough. It is a standard local search, and the
algorithm itself is not new (see, e.g., [7]). Suppose that y is the
current assignment (its initial value is selected uniformly at
random). If y is a satisfying assignment, then the algorithm answers
yes and terminates. Otherwise, there is at least one clause whose
three literals are all false under y. Pick an arbitrary such clause
and select one of the three literals in it at random. Then flip (true
to false and vice versa) the value of that variable, replace y with
that new assignment, and then repeat the same procedure. More
formally:


SCH(CNF formula F, integer I)
  repeat I times
    y = uniformly random vector in {0, 1}^n
    z = RandomWalk(F, y);
    if z satisfies F
      then output(z); exit;
  end
  output("Unsatisfiable");

RandomWalk(CNF formula G(x_1, x_2, ..., x_n), assignment y);
  y' = y;
  for 3n times
    if y' satisfies G
      then return y'; exit;
    C <- an arbitrary clause of G that is not satisfied by y';
    Modify y' as follows:
      select one literal of C uniformly at random and
      flip the assignment to this literal;
  end
  return y'
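The pseudocode translates almost line by line into a runnable program. In the sketch below, literals are nonzero integers, clauses are nonempty tuples, assignments are dictionaries from variables 1..n to Booleans, and the random source is passed explicitly; all of these choices, as well as the function names, are assumptions made for illustration.

```python
import random


def satisfies(clauses, y):
    """True iff assignment y (dict: variable -> bool) satisfies every clause."""
    return all(any(y[abs(l)] == (l > 0) for l in c) for c in clauses)


def random_walk(clauses, y, n, rng):
    """Perform at most 3n random flips starting from y; return the final assignment."""
    y = dict(y)
    for _ in range(3 * n):
        falsified = [c for c in clauses
                     if not any(y[abs(l)] == (l > 0) for l in c)]
        if not falsified:
            return y
        literal = rng.choice(falsified[0])   # random literal of an unsatisfied clause
        y[abs(literal)] = not y[abs(literal)]  # flip its variable
    return y


def sch(clauses, n, tries, rng=None):
    """Return a satisfying assignment found within `tries` restarts, else None."""
    rng = rng or random.Random()
    for _ in range(tries):
        y = {v: rng.random() < 0.5 for v in range(1, n + 1)}
        z = random_walk(clauses, y, n, rng)
        if satisfies(clauses, z):
            return z
    return None
```

A return value of None corresponds to the output "Unsatisfiable" of SCH. For a satisfiable formula this answer can be wrong, but only with small probability when the number of restarts is chosen large enough.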


Schöning's analysis of this algorithm is very elegant. Let d(a, b)
denote the Hamming distance between two binary vectors (assignments) a
and b. For simplicity, suppose that the formula F has only one
satisfying assignment y* and that the current assignment y is at
Hamming distance d from y*. Suppose also that the currently false
clause C includes three variables, $x_i$, $x_j$, and $x_k$. Then y and
y* must differ in at least one of these three variables. This means
that if the value of one of $x_i$, $x_j$, and $x_k$, chosen at random,
is flipped, then the new assignment gets closer to y* by Hamming
distance one with probability at least 1/3. Also, the new assignment
gets farther by Hamming distance one with probability at most 2/3. The
argument can be generalized to the case that F has multiple satisfying
assignments. Now here comes the key lemma:


Lemma 1 Let F be a satisfiable formula and y* be a satisfying
assignment for F. For each assignment y, the probability that a
satisfying assignment (that may be different from y*) is found by
RandomWalk(F, y) is at least $(1/(k-1))^{d(y, y^*)} / p(n)$, where
p(n) is a polynomial in n.


By taking the average over random initial assignments, the following
theorem follows:


Theorem 1 For any satisfiable formula F on n variables, the success
probability of RandomWalk(F, y) is at least
$(k / (2(k-1)))^n / p(n)$ for some polynomial p. Thus, by setting
$I = (2(k-1)/k)^n \cdot p(n)$, SCH finds a satisfying assignment with
high probability. When k = 3, this value of I is $O(1.334^n)$.
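The averaging step can be written out. A uniformly random initial assignment y is at Hamming distance d from y* with probability $\binom{n}{d} 2^{-n}$, so, using Lemma 1 and the binomial theorem (a short derivation added here for illustration):

```latex
\Pr[\text{success}]
  \;\ge\; \frac{1}{p(n)} \sum_{d=0}^{n} \binom{n}{d}\, 2^{-n} \left(\frac{1}{k-1}\right)^{d}
  \;=\; \frac{1}{p(n)} \cdot 2^{-n} \left(1 + \frac{1}{k-1}\right)^{n}
  \;=\; \frac{1}{p(n)} \left(\frac{k}{2(k-1)}\right)^{n}.
```

For k = 3 the base is 3/4, so on the order of $(4/3)^n \cdot p(n)$ repetitions suffice, which is $O(1.334^n)$.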


Applications 


Schöning's result has been improved by a series of papers [1, 3, 9]
based on the idea of [3]. Namely, RandomWalk is combined with the
(polynomial time) 2SAT algorithm, which makes it possible to choose
better initial assignments. For derandomization of SCH, see [2]. Iwama
and Tamaki [4] developed a nontrivial combination of SCH with another
famous, backtrack-type algorithm by [8], resulting in the then-fastest
algorithm with $O(1.324^n)$ running time. The current fastest
algorithm is due to [10], which is based on the same approach as [4]
and runs in time $O(1.32216^n)$.


Open Problems 


k-SAT is probably the most popular NP-complete problem, and numerous
researchers are competing to find its fastest algorithm. Thus,
improving its time bound is always a good research target.


Experimental Results


AI researchers have also been very active in SAT algorithms including
local search; see, e.g., [5].


Problem Definition


Let G = (V, E) be an n-node undirected, simple graph without loops. A
set $I \subseteq V$ is called an independent set of G if the nodes of
I are pairwise not adjacent. The maximum independent set (MIS) problem
asks to determine the maximum cardinality $\alpha(G)$ of an
independent set of G. MIS is one of the best studied NP-hard problems.

We will need the following notation. The (open) neighborhood of a
vertex v is $N(v) = \{u \in V : uv \in E\}$, and its closed
neighborhood is $N[v] = N(v) \cup \{v\}$. The degree deg(v) of v is
|N(v)|. For $W \subseteq V$, $G[W] = (W, E \cap \binom{W}{2})$ is the
graph induced by W. We let $G - W = G[V - W]$.


Key Results 


A very simple algorithm solves MIS (exactly) in $O^*(2^n)$ time: it is
sufficient to enumerate all the subsets of nodes, check in polynomial
time whether each subset is an independent set or not, and return the
maximum cardinality independent set.


Exact Algorithms for Maximum Independent Set


We recall that the $O^*$ notation suppresses polynomial factors in the
input size. However, much faster (though still exponential-time)
algorithms are known. In more detail, there exist algorithms that
solve MIS in worst-case time $O^*(c^n)$ for some constant
$c \in (1, 2)$. In this section, we will illustrate some of the most
relevant techniques that have been used in the design and analysis of
exact MIS algorithms. Due to space constraints, our description will
be slightly informal (please see the references for formal details).


Bounding the Size of the Search Tree 

All the nontrivial exact MIS algorithms, starting with [7], are
recursive branching algorithms. As an illustration, consider the
following simple MIS algorithm Alg1. If the graph is empty, output
$\alpha(G) = 0$ (base instance). Otherwise, choose any node v of
maximum degree, and output


$\alpha(G) = \max\{\alpha(G - \{v\}),\ 1 + \alpha(G - N[v])\}$.


Intuitively, the subgraph $G - \{v\}$ corresponds to the choice of not
including v in the independent set (v is discarded), while the
subgraph $G - N[v]$ corresponds to the choice of including v in the
independent set (v is selected). Observe that, when v is selected, the
neighbors of v have to be discarded. We will later refer to this
branching as a standard branching.
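Alg1 is short enough to be written out in full. The following sketch assumes that the graph is given as a dictionary mapping each node to the set of its neighbors; the names `remove` and `alg1` are chosen for illustration.

```python
def remove(adj, gone):
    """Return the graph G - gone as a new adjacency dictionary."""
    return {u: nbrs - gone for u, nbrs in adj.items() if u not in gone}


def alg1(adj):
    """Return alpha(G) by standard branching on a node of maximum degree."""
    if not adj:
        return 0  # base instance: the empty graph
    v = max(adj, key=lambda u: len(adj[u]))
    discard_v = alg1(remove(adj, {v}))            # v is discarded
    select_v = 1 + alg1(remove(adj, adj[v] | {v}))  # v is selected, N[v] removed
    return max(discard_v, select_v)


# Cycle on five nodes: the maximum independent set has two nodes.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
alpha_c5 = alg1(c5)
```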

The running time of the above algorithm, and 
of branching algorithms more in general, can be 
bounded as follows. The recursive calls induce a 
search tree, where the root is the input instance 
and the leaves are base instances (that can be 
solved in polynomial time). Observe that each 
branching step can be performed in polynomial 
time (excluding the time needed to solve sub- 
problems). Furthermore, the height of the search 
tree is bounded by a polynomial. Therefore, the 
running time of the algorithm is bounded by 
$O^*(L(n))$, where $L(n)$ is the maximum number 
of leaves of any search tree that can be generated 
by the considered algorithm on an input instance 
with $n$ nodes. Let us assume that $L(n) \le c^n$ for 
some constant $c > 1$. When we branch at node $v$, 
we generate two subproblems containing $n - 1$ 
and $n - |N[v]|$ nodes, respectively. Therefore, 
$c$ has to satisfy $c^n \ge c^{n-1} + c^{n-|N[v]|}$. 
Assuming pessimistically $|N[v]| = 1$, one obtains 
$c^n \ge 2c^{n-1}$ and therefore $c \ge 2$. We can 
conclude that the running time of the algorithm 
is $O^*(2^n)$. Though the running time of Alg1 
does not improve on exhaustive search, much 
faster algorithms can be obtained by branching 
in a more careful way and using a similar type 
of analysis. This will be discussed in the next 
subsections. 
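
As a minimal sketch of Alg1, the recursion above can be written directly in Python. The adjacency-set representation and the names `remove_nodes` and `alg1` are assumptions made for this illustration and are not taken from the references.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of a simple undirected graph


def remove_nodes(g: Graph, removed: Set[int]) -> Graph:
    """Return the subgraph induced by the nodes not in `removed`."""
    return {u: nbrs - removed for u, nbrs in g.items() if u not in removed}


def alg1(g: Graph) -> int:
    """Size of a maximum independent set, by standard branching."""
    if not g:  # base instance: empty graph
        return 0
    v = max(g, key=lambda u: len(g[u]))  # any node of maximum degree
    discard_v = alg1(remove_nodes(g, {v}))            # alpha(G - {v})
    select_v = 1 + alg1(remove_nodes(g, g[v] | {v}))  # 1 + alpha(G - N[v])
    return max(discard_v, select_v)


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(alg1(cycle5))  # independence number of the 5-cycle
```

The sketch copies the graph at every call, which keeps the code short; each branching step is still polynomial, in line with the analysis above.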


Refined Branching Rules 

Several refined branching rules have been 
developed for MIS. Let us start with some reduction 
rules, which reduce the problem without 
branching (alternatively, by branching on a single 
subproblem). An isolated node $v$ can be selected 
w.l.o.g.: 


$$\alpha(G) = 1 + \alpha(G - N[v]).$$


Observe that if $N[u] \subseteq N[v]$, then node $v$ can be 
discarded w.l.o.g. (dominance): 


$$\alpha(G) = \alpha(G - \{v\}).$$


This rule implies that nodes of degree 1 can 
always be selected. 

Suppose that we branch at a node v, and in 
the branch where we discard v we select exactly 
one of its neighbors, say w. Then by replacing 
w with v, we obtain a solution of the same car- 
dinality including v: this means that the branch 
where we select v has to provide the optimal 
solution. Therefore, we can assume w.l.o.g. that 
the optimal solution either contains $v$ or at least 
2 of its neighbors. This idea is exploited in the 
folding operation [1], which we next illustrate 
only in the case of degree-2 nodes. Let $N(v) = 
\{w_1, w_2\}$. Remove $N[v]$. If $w_1 w_2 \notin E$, create 
a node $v'$ and add edges between $v'$ and nodes 
in $N(w_1) \cup N(w_2) - \{v\}$. Let $G_{fold}(v)$ be the 
resulting graph. Then, one has 


$$\alpha(G) = 1 + \alpha(G_{fold}(v)).$$


Intuitively, including node $v'$ in the optimal 
solution to $G_{fold}(v)$ corresponds to selecting both 
$w_1$ and $w_2$, while discarding $v'$ corresponds to 
selecting $v$. 
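
The folding of a degree-2 node can be sketched as follows, together with an exhaustive check of the identity $\alpha(G) = 1 + \alpha(G_{fold}(v))$ on small graphs. The function names and the adjacency-set representation are assumptions of this illustration; the caller supplies a fresh label for the new node $v'$.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def fold_degree2(g: Graph, v: Hashable, new_node: Hashable) -> Graph:
    """Return G_fold(v) for a node v with exactly two neighbors."""
    w1, w2 = g[v]
    removed = {v, w1, w2}  # N[v]
    folded = {u: nbrs - removed for u, nbrs in g.items() if u not in removed}
    if w2 not in g[w1]:  # w1 w2 is not an edge: create the node v'
        outer = (g[w1] | g[w2]) - removed  # N(w1) u N(w2) - {v}
        folded[new_node] = set(outer)
        for u in outer:
            folded[u].add(new_node)
    return folded


def alpha_brute_force(g: Graph) -> int:
    """Independence number by exhaustive search (small graphs only)."""
    nodes = list(g)
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(b not in g[a] for a, b in combinations(cand, 2)):
                return size
    return 0


path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
folded = fold_degree2(path5, 2, "v'")
print(alpha_brute_force(path5), 1 + alpha_brute_force(folded))
```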

Let Alg2 be the algorithm that exhaustively 
applies the mentioned reduction rules and then 
performs a standard branching on a node of 
maximum degree. Reduction rules reduce the 
number of nodes at least by 1; hence, we have 
the constraint $c^n \ge c^{n-1}$. If we branch at node $v$, 
then $\deg(v) \ge 3$. This gives $c^n \ge c^{n-1} + c^{n-4}$, which 
is satisfied by $c \ge 1.380\ldots$. Hence, the running 
time is in $O^*(1.381^n)$. 
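
The constants in such recurrences can be checked numerically. The following sketch (the helper name `branching_number` is an assumption of this illustration) finds, by bisection, the smallest $c$ satisfying $c^n \ge \sum_i c^{n-d_i}$, where $d_i$ is the decrease of the measure in the $i$-th branch.

```python
from typing import Sequence


def branching_number(decreases: Sequence[float], tol: float = 1e-12) -> float:
    """Smallest c >= 1 with sum_i c**(-d_i) <= 1, found by bisection."""
    def excess(c: float) -> float:
        return sum(c ** (-d) for d in decreases) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:  # enlarge the bracket until hi is feasible
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


print(branching_number([1, 1]))  # Alg1: c^n >= 2 c^(n-1)
print(branching_number([1, 4]))  # Alg2: c^n >= c^(n-1) + c^(n-4), about 1.3803
```

Rounding the returned value up gives the base of the exponential in the running time bound.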

Let us next briefly sketch some other useful 
ideas that lead to refined branchings. A mirror [3] 
of a node $v$ is a node $u$ at distance 2 from $v$ such 
that $N(v) - N(u)$ induces a clique. By the above 
discussion, if we branch by discarding $v$, we can 
assume that we select at least two neighbors of $v$ 
and therefore we also have to discard the mirrors 
$M(v)$ of $v$. In other terms, we can use the refined 
branching 


$$\alpha(G) = \max\{\alpha(G - \{v\} - M(v)),\ 1 + \alpha(G - N[v])\}.$$


A satellite [5] of a node $v$ is a node $u$ at distance 2 
from $v$ such that there exists a node $u' \in N(v) \cap 
N(u)$ that satisfies $N[u'] - N[v] = \{u\}$. Observe 
that if an optimal solution discards $u$, then we can 
discard $v$ as well by dominance since $N[u'] \subseteq 
N[v]$ in $G - \{u\}$. Therefore, we can assume that 
in the branch where we select $v$, we also select its 
satellites $S(v)$. In other terms, 


$$\alpha(G) = \max\Big\{\alpha(G - \{v\}),\ 1 + |S(v)| + \alpha\Big(G - N[v] - \bigcup_{u \in S(v)} N[u]\Big)\Big\}.$$


Another useful trick [4] is to branch on nodes 
that form a small separator (of size 1 or 2 in the 
graph), hence isolating two or more connected 
components that can be solved independently 
(see also [2,5]). 


Measure and Conquer 

Above we always used the number $n$ of nodes as a 
measure of the size of subproblems. As observed 
in [3], much tighter running time bounds can 
be achieved by using smarter measures. As an 
illustration, we will present a refined bound on 
the running time of Alg2. 

Let us measure the size of subproblems with 
the number $n_3$ of nodes of degree at least 3 (large 
nodes). Observe that, when $n_3 = 0$, $G$ is a 
collection of isolated nodes, paths, and cycles. 
Therefore, in that case, Alg2 only applies reduction 
rules, hence solving the problem in polynomial 
time. In other terms, $L(n_3) = L(0) = 1$ in this 
case. If the algorithm applies any reduction rule, 
the number of large nodes cannot increase and we 
obtain the trivial inequality $c^{n_3} \ge c^{n_3}$. Suppose 
next that Alg2 performs a standard branching at 
a node $v$. Note that at this point all nodes in the 
graph are large. If $\deg(v) \ge 4$, then we obtain the 
inequality $c^{n_3} \ge c^{n_3-1} + c^{n_3-5}$, which is satisfied 
by $c \ge 1.324\ldots$. Otherwise ($\deg(v) = 3$), 
observe that the neighbors of $v$ have degree 3 
in $G$ and at most 2 in $G - \{v\}$. Therefore, the 
number of large nodes is at most $n_3 - 4$ in both 
subproblems $G - \{v\}$ and $G - N[v]$. This gives 
the inequality $c^{n_3} \ge 2c^{n_3-4}$, which is satisfied by 
$c \ge 2^{1/4} < 1.1893$. We can conclude that the 
running time of the algorithm is in $O^*(1.325^n)$. 
In [3], each node is assigned a weight which is a 
growing function of its degree, and the measure 
is the sum of node weights (a similar measure is 
used also in [2,5]). 

In [2], it is shown how to use a fast MIS 
algorithm for graphs of maximum degree A to 
derive faster MIS algorithms for graphs of maxi- 
mum degree A + 1. Here the measure used in the 
analysis is a combination of the number of nodes 
and edges. 


Memorization 

So far we described algorithms with polynomial 
space complexity. Memorization [6] is a tech- 
nique to speed up exponential-time branching 
algorithms at the cost of an exponential space 
complexity. The basic idea is to store the op- 
timal solution to subproblems in a proper (ex- 
ponential size) data structure. Each time a new 
subproblem is generated, one first checks (in 
polynomial time) whether that subproblem was 
already solved before. This way one avoids 
solving the same subproblem several times. 

In order to illustrate this technique, it is 
convenient to consider the variant Alg3 of 
Alg2 where we do not apply folding. This way, 
each subproblem corresponds to some induced 
subgraph $G[W]$ of the input graph. We will also 
use the standard measure, though memorization 
is compatible with measure and conquer. By 
adapting the analysis of Alg2, one obtains 
the constraint $c^n \ge c^{n-1} + c^{n-3}$ and hence 
a running time of $O^*(1.466^n)$. Next, consider 
the variant Alg3mem of Alg3 where we apply 
memorization. Let $L_k(n)$ be the maximum 
number of subproblems on $k$ nodes generated by 
Alg3mem starting from an instance with $n$ nodes. 
A slight adaptation of the standard analysis 
shows that $L_k(n) \le 1.466^{n-k}$. However, since 
there are at most $\binom{n}{k}$ induced subgraphs on $k$ 
nodes and we never solve the same subproblem 
twice, one also has $L_k(n) \le \binom{n}{k}$. Using 
Stirling’s formula, one obtains that the two 
upper bounds are roughly equal for $k = an$ and 
$a = 0.107\ldots$. We can conclude that the running 
time of Alg3mem is in 

$$O^*\Big(\sum_{k=0}^{n} L_k(n)\Big) = O^*\Big(\sum_{k=0}^{n} \min\{1.466^{n-k}, \tbinom{n}{k}\}\Big) = O^*\Big(\max_{0 \le k \le n} \min\{1.466^{n-k}, \tbinom{n}{k}\}\Big)$$
$$= O^*(1.466^{(1-0.107)n}) = O^*(1.408^n).$$

The analysis can be refined [6] by 
bounding the number of connected induced 
subgraphs with $k$ nodes in graphs of small maximum 
degree. 
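
A minimal sketch of the idea, assuming a simplified variant that uses only the isolated-node rule and standard branching (so it is not Alg3mem itself): since every subproblem is an induced subgraph $G[W]$, the node set $W$ can serve as the key of the table. Here the table is Python's `functools.lru_cache`, and the name `alpha_memo` is an assumption of this illustration.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Set


def alpha_memo(adj: Dict[int, Set[int]]) -> int:
    """Independence number via branching on induced subgraphs, with a table."""

    @lru_cache(maxsize=None)  # stores one optimal value per node set W
    def solve(w: FrozenSet[int]) -> int:
        if not w:
            return 0
        # Pick a node of maximum degree inside the induced subgraph G[W].
        v = max(w, key=lambda u: len(adj[u] & w))
        if not adj[v] & w:  # every node of G[W] is isolated: select v
            return 1 + solve(w - {v})
        discard_v = solve(w - {v})
        select_v = 1 + solve(w - adj[v] - {v})
        return max(discard_v, select_v)

    return solve(frozenset(adj))


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(alpha_memo(cycle5))
```

The table may hold exponentially many node sets, which is exactly the space cost discussed above.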


Problem Definition 


In the maximum 2-satisfiability problem (abbre- 
viated as MAX 2-SAT), one is given a Boolean 
formula in conjunctive normal form, such that 
each clause contains at most two literals. The task 
is to find an assignment to the variables of the 
formula such that a maximum number of clauses 
are satisfied. 

MAX 2-SAT is a classic optimization problem. 
Its decision version was proved NP-complete by 
Garey, Johnson, and Stockmeyer [7], in stark 
contrast with 2-SAT which is solvable in linear 
time [2]. To get a feeling for the difficulty of 
the problem, the NP-completeness reduction is 
sketched here. One can transform any 3-SAT 
instance $F$ into a MAX 2-SAT instance $F'$, by 
replacing each clause of $F$ such as 


$$C_i = (\ell_1 \vee \ell_2 \vee \ell_3),$$

where $\ell_1$, $\ell_2$, and $\ell_3$ are arbitrary literals, with the 
collection of 2-CNF clauses 


$$(\ell_1),\ (\ell_2),\ (\ell_3),\ (c_i),\ (\neg\ell_1 \vee \neg\ell_2),\ (\neg\ell_2 \vee \neg\ell_3),$$
$$(\neg\ell_1 \vee \neg\ell_3),\ (\ell_1 \vee \neg c_i),\ (\ell_2 \vee \neg c_i),\ (\ell_3 \vee \neg c_i),$$


where $c_i$ is a new variable. The following are 
true: 


• If an assignment satisfies $C_i$, then exactly 
seven of the ten clauses in the 2-CNF 
collection can be satisfied. 

• If an assignment does not satisfy $C_i$, then 
exactly six of the ten clauses can be satisfied. 


If $F$ is satisfiable then there is an assignment 
satisfying 7/10 of the clauses in $F'$, and if $F$ is not 
satisfiable, then every assignment satisfies fewer 
than 7/10 of the clauses in $F'$. Since 3-SAT reduces 
to MAX 2-SAT, it follows that MAX 2-SAT (as a 
decision problem) is NP-complete. 
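
The two counting claims can be checked exhaustively. The sketch below assumes the last three gadget clauses are read as $(\ell_j \vee \neg c_i)$; the literal encoding and the function names are assumptions of this illustration.

```python
from itertools import product


def gadget_clauses():
    """The ten 2-CNF clauses for one clause (l1 or l2 or l3)."""
    pos = lambda x: (x, True)    # literal x
    neg = lambda x: (x, False)   # literal not-x
    return [
        [pos("l1")], [pos("l2")], [pos("l3")], [pos("c")],
        [neg("l1"), neg("l2")], [neg("l2"), neg("l3")], [neg("l1"), neg("l3")],
        [pos("l1"), neg("c")], [pos("l2"), neg("c")], [pos("l3"), neg("c")],
    ]


def satisfied(clauses, values):
    """Number of clauses with at least one true literal under `values`."""
    return sum(any(values[x] == pol for x, pol in cl) for cl in clauses)


def best_over_c(l1, l2, l3):
    """Maximum number of gadget clauses satisfiable by choosing c freely."""
    return max(
        satisfied(gadget_clauses(), {"l1": l1, "l2": l2, "l3": l3, "c": c})
        for c in (False, True)
    )


for l1, l2, l3 in product((False, True), repeat=3):
    print((l1, l2, l3), best_over_c(l1, l2, l3))
```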


Notation 


A CNF formula is represented as a set of 
clauses. 

The letter $\omega$ denotes the smallest real number 
such that for all $\varepsilon > 0$, $n$ by $n$ matrix 
multiplication over a field can be performed in 
$O(n^{\omega+\varepsilon})$ field operations. Currently, it is 
known that $\omega < 2.373$ [4, 16]. The field matrix product 
of two matrices $A$ and $B$ is denoted by $A \times B$. 

Let $A$ and $B$ be matrices with entries from $\mathbb{R} \cup 
\{\infty\}$. The distance product of $A$ and $B$ (written in 
shorthand as $A \otimes_d B$) is the matrix $C$ defined by 
the formula 


$$C[i, j] = \min_{k} \{A[i, k] + B[k, j]\}.$$


A word on m’s and n’s: in reference to graphs, 
m and n denote the number of edges and the 
number of nodes in the graph, respectively. In 
reference to CNF formulas, m and n denote the 
number of clauses and the number of variables, 
respectively. 


Key Result 


The primary result of this entry is a procedure 
solving MAX 2-SAT in $O(m \cdot 2^{\omega n/3})$ time. The 
method can be generalized to count the number of 
solutions to any constraint optimization problem 
with at most two variables per constraint. Indeed, 
in the same running time, one can find a Boolean 
assignment that maximizes any given degree-two 
polynomial in $n$ variables [18, 19]. In this 
entry, we shall restrict attention to MAX 2-SAT, 
for simplicity. There are several other known 
exact algorithms for MAX 2-SAT that are more 
effective in special cases, such as sparse instances 
[3, 8, 9, 11-13, 15, 17]. The procedure described 
below is the only one known (to date) that runs in 
$c^n$ steps for a constant $c < 2$. 


Key Idea 


The algorithm gives a reduction from MAX 2-SAT 
to the problem MAX TRIANGLE, in which one is 
given a graph with integer weights on its nodes 
and edges, and the goal is to output a 3-cycle of 
maximum weight. At first, the existence of such 
a reduction sounds strange, as MAX TRIANGLE 
can be trivially solved in $O(n^3)$ time by trying all 
possible 3-cycles. The key is that the reduction 
exponentially increases the problem size, from 
a MAX 2-SAT instance with $m$ clauses and $n$ 
variables to a MAX TRIANGLE instance having 
$O(2^{2n/3})$ edges, $O(2^{n/3})$ nodes, and weights in 
the range $\{-m, \ldots, m\}$. 

Note that if MAX TRIANGLE required $\Theta(n^3)$ 
time to solve, then the resulting MAX 2-SAT 
algorithm would take $\Theta(2^n)$ time, rendering the 
above reduction pointless. However, it turns out 
that the brute-force search of $O(n^3)$ for MAX 
TRIANGLE is not the best one can do: using fast 
matrix multiplication, there is an algorithm for 
MAX TRIANGLE that runs in $O(W n^{\omega})$ time on 
graphs with weights in the range $\{-W, \ldots, W\}$. 


Main Algorithm 


First, a reduction from MAX 2-SAT to MAX 
TRIANGLE is described, arguing that each triangle of 
weight $K$ in the resulting graph is in one-to-one 
correspondence with an assignment that satisfies 
$K$ clauses of the MAX 2-SAT instance. Let $a, b$ 
be reals, and let $\mathbb{Z}[a, b] := [a, b] \cap \mathbb{Z}$. 


Lemma 1 If MAX TRIANGLE on graphs with $n$ 
nodes and weights in $\mathbb{Z}[-W, W]$ is solvable in 
$O(f(W) \cdot g(n))$ time, for polynomials $f$ and $g$, 
then MAX 2-SAT is solvable in $O(f(m) \cdot g(2^{n/3}))$ 
time, where $m$ is the number of clauses and $n$ is 
the number of variables. 


Proof Let $F$ be a given 2-CNF formula, viewed as 
an instance of MAX 2-SAT. Assume without loss of 
generality that $n$ is divisible by 3. Arbitrarily 
partition the $n$ variables of $F$ into three sets $P_1$, 
$P_2$, $P_3$, each having $n/3$ variables. For each $P_i$, 
make a list $L_i$ of all $2^{n/3}$ assignments to the 
variables of $P_i$. 

Define a graph $G = (V, E)$ with $V = L_1 \cup 
L_2 \cup L_3$ and $E = \{(u, v) \mid u \in L_i, v \in L_j, i \ne 
j\}$. That is, $G$ is a complete tripartite graph with 
$2^{n/3}$ nodes in each part, and each node in $G$ 
corresponds to an assignment to $n/3$ variables in 
$F$. Weights are placed on the nodes and edges 
of $G$ as follows. For a node $v$, define $w(v)$ to 
be the number of clauses that are satisfied by the 
partial assignment denoted by $v$. For each edge 
$\{u, v\}$, define $w(\{u, v\}) = -W_{uv}$, where $W_{uv}$ is 
the number of clauses that are satisfied by both $u$ 
and $v$. 

Define the weight of a triangle in $G$ to be 
the total sum of all node and edge weights in the 
triangle. 


Claim 1 There is a one-to-one correspondence 
between the triangles of weight K in G and the 
variable assignments satisfying exactly K clauses 
in F. 


Proof Let $a$ be a variable assignment. Then there 
exist unique nodes $v_1 \in L_1$, $v_2 \in L_2$, and $v_3 \in 
L_3$ such that $a$ is precisely the concatenation of 
$v_1, v_2, v_3$ as assignments. Moreover, any triple 
of nodes $v_1 \in L_1$, $v_2 \in L_2$, and $v_3 \in L_3$ 
corresponds to an assignment. Thus, there is a 
one-to-one correspondence between triangles in 
$G$ and assignments to $F$. 

The number of clauses satisfied by an 
assignment is exactly the weight of its corresponding 
triangle. To see this, let $T_a = \{v_1, v_2, v_3\}$ be 
the triangle in $G$ corresponding to assignment $a$. 
Then 


$$w(T_a) = w(v_1) + w(v_2) + w(v_3) + w(\{v_1, v_2\}) + w(\{v_2, v_3\}) + w(\{v_1, v_3\})$$
$$= \sum_{i=1}^{3} |\{c \in F \mid v_i \text{ satisfies } c\}| - \sum_{1 \le i < j \le 3} |\{c \in F \mid v_i \text{ and } v_j \text{ satisfy } c\}|$$
$$= |\{c \in F \mid a \text{ satisfies } c\}|,$$

where the last equality follows from the 
inclusion-exclusion principle (no clause is satisfied 
by all three of $v_1, v_2, v_3$, since a clause has at 
most two literals). 

Notice that the number of nodes in $G$ is 
$3 \cdot 2^{n/3}$, and the absolute value of any node and 
edge weight is at most $m$. Therefore, by running a 
MAX TRIANGLE algorithm on $G$, a solution to MAX 
2-SAT is obtained in $O(f(m) \cdot g(3 \cdot 2^{n/3}))$ time, 
which is $O(f(m) \cdot g(2^{n/3}))$ since $g$ is a polynomial. 
This completes the proof of Lemma 1. 
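
The construction in the proof can be exercised on small formulas. The sketch below builds the three lists of partial assignments, represents each node by the set of clauses it satisfies, and scans all triangles by brute force (no fast matrix multiplication is involved here). The clause encoding and all function names are assumptions of this illustration.

```python
from itertools import product
from typing import List, Tuple

Literal = Tuple[int, bool]  # (variable index, value that makes it true)
Clause = List[Literal]      # at most two literals


def sat_set(clauses, part_vars, values):
    """Indices of the clauses satisfied by a partial assignment."""
    partial = dict(zip(part_vars, values))
    return frozenset(
        idx for idx, cl in enumerate(clauses)
        if any(partial.get(x) == pol for x, pol in cl)
    )


def max2sat_via_triangles(clauses: List[Clause], n: int) -> int:
    """Maximum number of satisfied clauses, read off the heaviest triangle."""
    assert n % 3 == 0, "pad with dummy variables so that 3 divides n"
    third = n // 3
    parts = [list(range(i * third, (i + 1) * third)) for i in range(3)]
    # L_i: one node per assignment to P_i, stored as its satisfied clause set.
    lists = [
        [sat_set(clauses, p, vals)
         for vals in product((False, True), repeat=third)]
        for p in parts
    ]
    best = 0
    for s1, s2, s3 in product(*lists):  # every triangle of the tripartite graph
        node_weight = len(s1) + len(s2) + len(s3)
        edge_weight = -(len(s1 & s2) + len(s2 & s3) + len(s1 & s3))
        best = max(best, node_weight + edge_weight)
    return best


def max2sat_brute_force(clauses: List[Clause], n: int) -> int:
    """Reference answer by trying all 2^n assignments."""
    return max(
        sum(any(a[x] == pol for x, pol in cl) for cl in clauses)
        for a in product((False, True), repeat=n)
    )


formula = [[(0, True), (1, True)], [(0, False), (2, True)],
           [(1, False), (2, False)], [(0, True)], [(0, False)]]
print(max2sat_via_triangles(formula, 3), max2sat_brute_force(formula, 3))
```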


Next, a procedure is described for finding 
a maximum triangle faster than brute-force 
search, using fast matrix multiplication. Alon, 
Galil, and Margalit [1] (following Yuval [22]) 
showed that the distance product for matrices 
with entries drawn from $\mathbb{Z}[-W, W]$ can be 
computed using fast matrix multiplication as a 
subroutine. 


Theorem 1 (Alon, Galil, Margalit [1]) Let $A$ 
and $B$ be $n \times n$ matrices with entries from 
$\mathbb{Z}[-W, W] \cup \{\infty\}$. Then $A \otimes_d B$ can be computed 
in $O(W n^{\omega} \log n)$ time. 


Proof (Sketch) One can replace $\infty$ entries in $A$ 
and $B$ with $2W + 1$ in the following. Define 
matrices $A'$ and $B'$, where 


$$A'[i, j] = x^{3W - A[i, j]}, \qquad B'[i, j] = x^{3W - B[i, j]},$$


and $x$ is a variable. Let $C = A' \times B'$. Then 


$$C[i, j] = \sum_{k=1}^{n} x^{6W - A[i, k] - B[k, j]}.$$

The next step is to pick a number $x$ that makes 
it easy to determine, from the sum of arbitrary 
powers of $x$, the largest power of $x$ appearing 
in the sum; this largest power immediately gives 
the minimum $A[i, k] + B[k, j]$. Each $C[i, j]$ is 
a polynomial in $x$ with coefficients from $\mathbb{Z}[0, n]$. 
Suppose each $C[i, j]$ is evaluated at $x = n + 1$. 
Then each entry $C[i, j]$ can be seen as an 
$(n + 1)$-ary number, and the position of this 
number’s most significant digit gives the minimum 
$A[i, k] + B[k, j]$. 

In summary, $A \otimes_d B$ can be computed by 
constructing 


$$A'[i, j] = (n + 1)^{3W - A[i, j]}, \qquad B'[i, j] = (n + 1)^{3W - B[i, j]}$$


in $O(W \log n)$ time per entry, computing $C = 
A' \times B'$ in $O(n^{\omega} \cdot (W \log n))$ time (as the sizes 
of the entries are $O(W \log n)$), then extracting 
the minimum from each entry of $C$, in $O(n^2 \cdot 
W \log n)$ time. Note that if the minimum for an entry 
$C[i, j]$ is at least $2W + 1$, then the corresponding 
entry of $A \otimes_d B$ is $\infty$. 
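
The encoding can be tried out with Python's arbitrary-precision integers. The sketch below is not the algorithm of [1] as published: it uses $3W + 1$ as the finite stand-in for $\infty$ (an assumption of this sketch, chosen so that any sum involving an infinite entry exceeds every finite sum), and it multiplies the encoded matrices with the schoolbook method where a fast matrix product would be plugged in. All function names are hypothetical.

```python
import math

INF = math.inf


def distance_product_naive(a, b):
    """Reference min-plus product, straight from the definition."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def distance_product_encoded(a, b, w):
    """Min-plus product of n x n matrices with entries in Z[-w, w] or INF."""
    n = len(a)
    base = n + 1
    shift = 3 * w + 1  # finite stand-in for INF (assumption of this sketch)
    enc = lambda x: base ** (shift - (shift if x == INF else x))
    a1 = [[enc(x) for x in row] for row in a]
    b1 = [[enc(x) for x in row] for row in b]
    # Ordinary integer matrix product; a fast algorithm could replace it.
    c = [[sum(a1[i][k] * b1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            value, top = c[i][j], -1
            while value:  # position of the most significant base-(n+1) digit
                value //= base
                top += 1
            smallest = 2 * shift - top
            row.append(smallest if smallest <= 2 * w else INF)
        out.append(row)
    return out


A = [[0, 3, INF], [-2, INF, 1], [INF, 2, -3]]
B = [[1, INF, -1], [INF, 0, 2], [3, -3, INF]]
print(distance_product_encoded(A, B, 3) == distance_product_naive(A, B))
```

Because each power of $n + 1$ occurs at most $n$ times in a sum, no carries occur and the top digit position equals the largest exponent.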


Using the fast distance product algorithm, one 
can solve MAX TRIANGLE faster than brute 
force. The following is based on an algorithm by 
Itai and Rodeh [10] for detecting if an unweighted 
graph has a triangle in less than $n^3$ steps. The 
result can be generalized to counting the number 
of $k$-cliques, for arbitrary $k \ge 3$. (To keep 
the presentation simple, the counting result is 
omitted. Concerning the k-clique result, there is 
unfortunately no asymptotic runtime benefit from 
using a k-clique algorithm instead of a triangle 
algorithm, given the current best algorithms for 
these problems.) 


Theorem 2 MAX TRIANGLE can be solved in 
$O(W n^{\omega} \log n)$ time, for graphs with weights drawn 
from $\mathbb{Z}[-W, W]$. 


Proof First, it is shown that a weight function 
on nodes and edges can be converted into an 
equivalent weight function with weights on only 
edges. Let $w$ be the weight function of $G$, and 
redefine the weights to be: 

$$w'(\{u, v\}) = \frac{w(u) + w(v)}{2} + w(\{u, v\}), \qquad w'(u) = 0.$$


Note the weight of a triangle is unchanged by this 
reduction. 

The next step is to use a fast distance product 
to find a maximum weight triangle in an 
edge-weighted graph of $n$ nodes. Construe the vertex 
set of $G$ as the set $\{1, \ldots, n\}$. Define $A$ to be 
the $n \times n$ matrix such that $A[i, j] = -w(\{i, j\})$ 
if there is an edge $\{i, j\}$, and $A[i, j] = \infty$ 
otherwise. The claim is that there is a triangle 
through node $i$ of weight at least $K$ if and only 
if $(A \otimes_d A \otimes_d A)[i, i] \le -K$. This is because 
$(A \otimes_d A \otimes_d A)[i, i] \le -K$ if and only if there are 
distinct $j$ and $k$ such that $\{i, j\}$, $\{j, k\}$, $\{k, i\}$ are 
edges and $A[i, j] + A[j, k] + A[k, i] \le -K$, i.e., 
$w(\{i, j\}) + w(\{j, k\}) + w(\{k, i\}) \ge K$. 

Therefore, by finding an $i$ such that $(A \otimes_d 
A \otimes_d A)[i, i]$ is minimized, one obtains a node $i$ 
contained in a maximum triangle. To obtain the 
actual triangle, check all $m$ edges $\{j, k\}$ to see if 
$\{i, j, k\}$ is a triangle. 
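
Both steps of the proof, folding node weights into edge weights and reading the heaviest triangle off the diagonal of the triple distance product, can be sketched as follows. The distance product here is the naive one, standing in for the fast subroutine of Theorem 1; nodes are numbered from 0, and the function names are assumptions of this illustration.

```python
import math
from typing import Dict, FrozenSet, List, Optional

INF = math.inf


def distance_product(a, b):
    """Naive min-plus product; Theorem 1 would replace this subroutine."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_triangle_weight(node_w: List[float],
                        edge_w: Dict[FrozenSet[int], float]) -> Optional[float]:
    """Maximum total node-plus-edge weight of a triangle, or None if none."""
    n = len(node_w)
    a = [[INF] * n for _ in range(n)]
    for e, w in edge_w.items():
        i, j = tuple(e)
        # w'({i,j}) = (w(i) + w(j)) / 2 + w({i,j}), stored negated.
        a[i][j] = a[j][i] = -((node_w[i] + node_w[j]) / 2 + w)
    cube = distance_product(distance_product(a, a), a)
    best = min(cube[i][i] for i in range(n))
    return None if best == INF else -best


nodes = [1, 2, 3, 4]
edges = {frozenset({0, 1}): -1, frozenset({1, 2}): 0, frozenset({0, 2}): 2,
         frozenset({2, 3}): -3, frozenset({1, 3}): 2}
print(max_triangle_weight(nodes, edges))
```

Since the diagonal of `a` is infinite, any finite diagonal entry of the triple product corresponds to three distinct nodes, as argued above.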


Theorem 3 MAX 2-SAT can be solved in $O(m \cdot 
1.732^n)$ time. 


Proof Given a set of clauses $C$, apply the 
reduction from Lemma 1 to get a graph $G$ with $O(2^{n/3})$ 
nodes and weights from $\mathbb{Z}[-m, m]$. Apply the 
algorithm of Theorem 2 to output a max triangle 
in $G$ in $O(m \cdot 2^{\omega n/3} \log(2^{n/3})) = O(m \cdot 1.732^n)$ 
time, using the $O(n^{2.376})$ matrix multiplication 
of Coppersmith and Winograd [4]. 


Applications 


By modifying the graph construction, one can 
solve other problems in $O(1.732^n)$ time, such 
as Max Cut, Minimum Bisection, and Sparsest 
Cut. In general, any constraint optimization 
problem for which each constraint has at most two 
variables can be solved faster using the above 
approach. For more details, see [18] and the 
survey by Woeginger [21]. Techniques similar to 
the above algorithm have also been used by Dorn 
[6] to speed up dynamic programming for some 
problems on planar graphs (and in general, graphs 
of bounded branchwidth). 


Open Problems 


Improve the space usage of the above 
algorithm. Currently, $\Theta(2^{2n/3})$ space is needed. 
A very interesting open question is if there 
is an $O(1.99^n)$ time algorithm for MAX 2-SAT 
that uses only polynomial space. This question 
would have a positive answer if one could 
find an algorithm for solving the $k$-CLIQUE 
problem that uses polylogarithmic space and 
$n^{k-\delta}$ time for some $\delta > 0$ and $k \ge 3$. 

Find a faster-than-$2^n$ algorithm for MAX 
2-SAT that does not require fast matrix 
multiplication. The fast matrix multiplication 
algorithms have the unfortunate reputation of 
being impractical. 

Generalize the above algorithm to work for 
MAX $k$-SAT, where $k$ is any positive integer. 
The current formulation would require one 
to give an efficient algorithm for finding a 
small hyperclique in a hypergraph. However, 
no general results are known for this problem. 
It is conjectured that for all $k \ge 2$, MAX 
$k$-SAT is in $\tilde{O}(2^{n(1 - 1/(k+1))})$ time, based on 
the conjecture that matrix multiplication is in 
$n^{2+o(1)}$ time [17]. 


Exact Algorithms for Treewidth 


Problem Definition 


The treewidth parameter intuitively measures 
whether the graph has a “treelike” structure. 
Given an undirected graph $G = (V, E)$, a tree 
decomposition of $G$ is a pair $(\mathcal{X}, T)$, where 
$T = (I, F)$ is a tree and $\mathcal{X} = \{X_i \mid i \in I\}$ 
is a collection of subsets of $V$ called bags 
satisfying: 


1. $\bigcup_{i \in I} X_i = V$, 
2. For each edge $uv$ of $G$, there is a bag $X_i$ 
containing both endpoints, 
3. For all $v \in V$, the set $\{i \in I \mid v \in X_i\}$ induces 
a connected subtree of $T$. 


The width of a tree decomposition $(\mathcal{X}, T)$ is the 
size of its largest bag, minus one. The treewidth 
of $G$, denoted by $\mathrm{tw}(G)$, is the minimum width 
over all possible tree decompositions. One 
can easily observe that $n$-vertex graphs have 
treewidth at most $n - 1$ and that the graphs of 
treewidth at most one are exactly the forests. 
Given a graph $G$ and a number $k$, the 
TREEWIDTH problem consists in deciding if 
$\mathrm{tw}(G) \le k$. Arnborg, Corneil, and Proskurowski 
show that the problem is NP-hard [1]. On 
the positive side, Bodlaender [2] gives an 
algorithm solving the problem in time $2^{O(k^3)} n$. 
Bouchitté and Todinca [4,5] prove that the 
problem is polynomial on classes of graphs 
with polynomially many minimal separators, 
with an algorithm based on the notion of 
potential maximal clique. This latter technique 
is also employed by several exact, moderately 
exponential algorithms for TREEWIDTH. 
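
The three conditions of the definition translate directly into a checker. This is a sketch for illustration only: the representation (a dictionary from tree nodes to bags plus a list of tree edges, with the tree assumed to be given correctly) and the function names are assumptions, not part of the cited algorithms.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def is_tree_decomposition(vertices: Iterable[Hashable],
                          edges: Iterable[Tuple[Hashable, Hashable]],
                          bags: Dict[Hashable, Set[Hashable]],
                          tree_edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Check conditions 1-3 for bags indexed by the nodes of a tree T."""
    vertices = set(vertices)
    # 1. The bags cover the vertex set.
    if set().union(*bags.values()) != vertices:
        return False
    # 2. Every edge of G lies inside some bag.
    if not all(any({u, v} <= bag for bag in bags.values()) for u, v in edges):
        return False
    # 3. For every vertex, the tree nodes whose bags contain it are connected.
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:
        holders = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i] & holders:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holders:
            return False
    return True


def width(bags: Dict[Hashable, Set[Hashable]]) -> int:
    """Size of the largest bag, minus one."""
    return max(len(bag) for bag in bags.values()) - 1


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
bags = {"a": {0, 1, 2}, "b": {0, 2, 3}}
print(is_tree_decomposition(range(4), cycle4, bags, [("a", "b")]), width(bags))
```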


Key Results 


TREEWIDTH can be solved in $O^*(2^n)$ time 
by adapting the $O(n^{k+2})$ algorithm of Arnborg 
et al. [1] or the Held-Karp technique initially 
designed for the TRAVELING SALESMAN 
problem [12]. (We use here the $O^*$ notation 
that suppresses polynomial factors.) Fomin 
et al. [9] break this “natural” $2^n$ barrier with 
an algorithm running in time $O^*(1.8135^n)$, 
using the same space complexity. Bodlaender 
et al. [3] present a polynomial-space algorithm 
running in $O^*(2.9512^n)$ time. A major 
improvement for both results is due to Fomin and 
Villanger [8]. 


Theorem 1 ([8]) The TREEWIDTH problem can 
be solved in $O^*(1.7549^n)$ time using exponential 
space and in $O^*(2.6151^n)$ time using polynomial 
space. 


These algorithms use an alternative definition for 
treewidth. A graph $H = (V, E)$ is chordal or 
triangulated if it has no induced cycle with four 
or more vertices. It is well-known that a chordal 
graph has tree decompositions whose bags are 
exactly its maximal cliques. Given an arbitrary 
graph $G = (V, E)$, a chordal graph $H = 
(V, F)$ on the same vertex set is called a minimal 
triangulation of $G$ if $H$ contains $G$ as a subgraph 
and no proper chordal subgraph of $H$ contains $G$. 
The treewidth of $G$ can be defined as the minimum, 
over all minimal triangulations $H$ of $G$, of the 
maximum clique size of $H$ minus one. 

A vertex subset $S$ of graph $G$ is a minimal 
separator if there are two distinct components 
$G[C]$ and $G[D]$ of the graph $G[V \setminus S]$ such that 
$N_G(C) = N_G(D) = S$ ($N_G(C)$ denotes the 
neighborhood of $C$ in graph $G$). 

A vertex subset $\Omega$ of $G$ is a potential maximal 
clique if there exists some minimal triangulation 
$H$ of $G$ such that $\Omega$ induces a maximal clique in 
$H$. Potential maximal cliques are characterized 
as follows [4]: $\Omega$ is a potential maximal clique 
of $G$ if and only if (i) for each pair of vertices 
$u, v \in \Omega$, $u$ and $v$ are adjacent or see a common 
component of $G[V \setminus \Omega]$, and (ii) no component 
of $G[V \setminus \Omega]$ sees the whole set $\Omega$. As an example, 
when $G$ is a cycle, its minimal separators are 
exactly the pairs of nonadjacent vertices, and the 
potential maximal cliques are exactly the triples 
of vertices. 
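
The characterization gives a polynomial-time test, sketched below and checked against the cycle example. The adjacency-set representation and the function names are assumptions of this illustration.

```python
from itertools import combinations


def components(adj, removed):
    """Connected components of G[V - removed], as a list of vertex sets."""
    left, comps = set(adj) - set(removed), []
    while left:
        start = left.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for x in adj[u] & left:
                left.remove(x)
                comp.add(x)
                stack.append(x)
        comps.append(comp)
    return comps


def is_potential_maximal_clique(adj, omega):
    """Test conditions (i) and (ii) of the characterization."""
    omega = set(omega)
    # For each component of G[V - omega], the part of omega that it sees.
    seen = [set().union(*(adj[u] for u in comp)) & omega
            for comp in components(adj, omega)]
    if any(s == omega for s in seen):  # (ii) fails
        return False
    return all(  # (i): adjacent, or both seen by a common component
        v in adj[u] or any(u in s and v in s for s in seen)
        for u, v in combinations(omega, 2)
    )


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
pmcs = [s for r in range(6) for s in combinations(range(5), r)
        if is_potential_maximal_clique(cycle5, s)]
print(pmcs)
```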

A block is a pair $(S, C)$ such that $S$ is a minimal 
separator of $G$ and $G[C]$ is a component of $G[V \setminus 
S]$. Denote by $R_G(S, C)$ the graph obtained from 
$G[S \cup C]$ by turning $S$ into a clique, i.e., by 
adding all missing edges with both endpoints 
in $S$. The treewidth of $G$ can be obtained as 
follows: 


$$\mathrm{tw}(G) = \min_{S} \max_{C} \mathrm{tw}(R_G(S, C)) \qquad (1)$$


where the minimum is taken over all minimal 
separators $S$ and the maximum is taken over all 
connected components $G[C]$ of $G[V \setminus S]$. 

All quantities $\mathrm{tw}(R_G(S, C))$ can be computed 
by dynamic programming over blocks $(S, C)$, by 
increasing the size of $S \cup C$. We only consider 
here blocks $(S, C)$ such that $S = N_G(C)$ (see [4] 
for more details). 

$$\mathrm{tw}(R_G(S, C)) = \min_{S \subset \Omega \subseteq S \cup C} \max_{i} \big( |\Omega| - 1,\ \mathrm{tw}(R_G(S_i, C_i)) \big) \qquad (2)$$


where the minimum is taken over all potential 
maximal cliques $\Omega$ with $S \subset \Omega \subseteq S \cup C$ and the 
maximum is taken over all pairs $(S_i, C_i)$, where 
$G[C_i]$ is a component of $G[C \setminus \Omega]$ and $S_i = 
N_G(C_i)$. Let $\Pi_G$ denote the set of all potential 
maximal cliques of graph $G$. It was pointed out in [9] 
that the number of triples $(S, \Omega, C)$ like in Eq. 2 
is at most $n|\Pi_G|$, which proves that TREEWIDTH 
can be computed in $O^*(|\Pi_G|)$ time and space, if 
$\Pi_G$ is given in the input. 

Therefore, it remains to give a good upper 
bound for the number $|\Pi_G|$ of potential maximal 
cliques of $G$, together with efficient algorithms 
for listing these objects. Based on the previously 
mentioned characterization of potential maximal 
cliques, Kratsch et al. provide an algorithm 
listing them in time $O^*(1.8135^n)$. Fomin and 
Villanger [8] improve this result, thanks to the 
following combinatorial theorem: 


Theorem 2 ([8]) Let $G = (V, E)$ be an 
$n$-vertex graph, let $v$ be a vertex of $G$, and $b, f$ 
be two integers. The number of vertex subsets $B$ 
containing $v$ such that $G[B]$ is connected, $|B| = 
b + 1$, and $|N_G(B)| = f$ is at most $\binom{b+f}{b}$. 


The elegant inductive proof also leads to an 
$O^*\big(\binom{b+f}{b}\big)$ time algorithm listing all such sets 
$B$. Eventually, the potential maximal cliques of 
an input graph $G$ can be listed in $O^*(1.7549^n)$ 
time [8]. This bound was further improved to 
$O^*(1.7347^n)$ in [7]. 

In order to obtain polynomial-space 
algorithms for TREEWIDTH, Bodlaender et al. [3] 
provide a relatively simple divide-and-conquer 
algorithm, based on the Held-Karp approach, 
running in $O^*(4^n)$ time. They also observe that 
Eq. 1 can be used for recursive, polynomial-space 
algorithms, by replacing the minimal 
separators $S$ by balanced separators, in the sense 
that each component of $G[V \setminus S]$ contains at 
most $n/2$ vertices. This leads to a polynomial-space 
algorithm with $O^*(2.9512^n)$ running time. 
Fomin and Villanger [8] restrict the balanced 
separators to a subset of the potential maximal 
cliques, and based on Theorem 2 they obtain, 
still using polynomial space, a running time of 
$O^*(2.6151^n)$. 

We refer to the book of Fomin and Kratsch [6] 
for more details on the TREEWIDTH problem and 
more generally on exact algorithms. 


Applications 


Exact algorithms based on potential maximal 
cliques have been extended to many other 
problems like FEEDBACK VERTEX SET, LONGEST 
INDUCED PATH, or MAXIMUM INDUCED 
SUBGRAPH WITH A FORBIDDEN PLANAR MINOR. 
More generally, for any constant $t$ and any 
property $\mathcal{P}$ definable in counting monadic 
second-order logic, consider the problem of finding, in an 
arbitrary graph $G$, a maximum-size induced 
subgraph $G[F]$ of treewidth at most $t$ and with 
property $\mathcal{P}$. This generic problem can be solved in 
$O^*(|\Pi_G|)$ time, if $\Pi_G$ is part of the input [7, 10]. 
Therefore, there is an algorithm in $O^*(1.7347^n)$ 
time for the problem, significantly improving the 
$O^*(2^n)$ time for exhaustive search. 


Open Problems 


Currently, the best known upper bound on the 
number of potential maximal cliques in $n$-vertex 
graphs is $O^*(1.7347^n)$ and does not seem 
to be tight [7]. Simple examples show that 
any such bound is at least $3^{n/3} \approx 1.4422^n$. A 
challenging question is to find a tight upper 
bound and efficient algorithms enumerating 
all potential maximal cliques of arbitrary 
graphs. 


Experimental Results 


Several experimental results are reported in [3], 
especially on an “engineered” version of the 
$O^*(2^n)$ time and space algorithm based on the 
Held-Karp approach. This dynamic programming 
algorithm is compared with the branch and bound 
approach of Gogate and Dechter [11] on 
instances of up to 50 vertices. The results are 
relatively similar. Bodlaender et al. [3] also observe 
that the polynomial-space algorithms become too 
slow even for small instances. 


Problem Definition 


We focus on the following question: how an 
assumption on the sparsity of an input graph, such 
as bounded (average) degree, can help in design- 
ing exact (exponential-time) algorithms for NP- 
hard problems. The following classic problems 
are studied: 


Traveling Salesman Problem Find a minimum- 
length Hamiltonian cycle in an input graph 
with edge weights. 

Chromatic Number Find a minimum number k 
for which the vertices of an input graph can 
be colored with k colors such that no two 
adjacent vertices receive the same color. 

Counting Perfect Matchings Find the number 
of perfect matchings in an input graph. 


Key Results 


The classic algorithms of Bellman [1] and Held 
and Karp [10] for the traveling salesman problem 
run in $2^n n^{O(1)}$ time for $n$-vertex graphs. Using 
the inclusion-exclusion principle, the chromatic 
number of an input graph can be determined 
within the same running time bound [4]. Finally, 
as far as counting perfect matchings is 
concerned, a half-century-old $2^{n/2} n^{O(1)}$-time 
algorithm of Ryser for bipartite graphs [12] has only 
recently been transferred to arbitrary graphs by 
Björklund [2]. 

In all three aforementioned cases, it is widely 
open whether the $2^n$ or $2^{n/2}$ factor in the running 
time bound can be improved. In 2008, Björklund, 
Husfeldt, Kaski, and Koivisto [5,6] observed that 
such an improvement can be made if we restrict 
ourselves to bounded degree graphs. Further 
work of Cygan and Pilipczuk [8] and Golovnev, 
Kulikov, and Mihajlin [9] extended these results 
to graphs of bounded average degree. 


Bounded Degree Graphs 


Traveling Salesman Problem 
Let us present the approach of Björklund, 
Husfeldt, Kaski, and Koivisto on the example of the 
traveling salesman problem. Assume we are given 
an $n$-vertex edge-weighted graph $G$. The classic 
dynamic programming algorithm picks a root 
vertex $r$ and then, for every vertex $v \in V(G)$ 
and every set $X \subseteq V(G)$ containing $v$ and $r$, 
computes $T[X, v]$: the minimum possible length 
of a path in $G$ with vertex set $X$ that starts in $r$ 
and ends in $v$. The running time bound $2^n n^{O(1)}$ is 
dominated by the number of choices of the set $X$. 

The simple, but crucial, observation is as 
follows: if a set $X$ satisfies $X \cap N_G[u] = \{u\}$ for 
some $u \in V(G) \setminus \{r\}$, then the values $T[X, v]$ are 
essentially useless, as no path starting in $r$ can 
visit the vertex $u$ without visiting any neighbor 
of $u$ (here $N_G[u] = N_G(u) \cup \{u\}$ stands for 
the closed neighborhood of $u$). Let us call a set 
$X \subseteq V(G)$ useful if $X \cap N_G[u] \ne \{u\}$ for every 
$u \in V(G) \setminus \{r\}$. The argumentation so far proved 
that we may skip the computation of $T[X, v]$ for 
all sets $X$ that are not useful. The natural question 
is: how many different useful sets may exist in an 
$n$-vertex graph? 

Consider the following greedy procedure: 
initiate $A = \emptyset$ and, as long as there exists a vertex 
$u \in V(G)$ such that $N_G[u] \cap N_G[A] = \emptyset$, 
add an arbitrarily chosen such vertex $u$ to the set $A$. 
By construction, the set $A$ satisfies the following 
property: for every distinct $u_1, u_2 \in A$, we have 
$N_G[u_1] \cap N_G[u_2] = \emptyset$. An interesting fact is that 
$|A| = \Omega(n)$ for graphs of bounded degree: whenever 
we insert a vertex $u$ into the set $A$, we cannot 
later insert into $A$ any neighbor of $u$ nor any 
neighbor of a neighbor of $u$. However, if the 
maximum degree of $G$ is bounded by $d$, then 
there are at most $d$ neighbors of $u$, and every 
such neighbor has at most $d - 1$ further neighbors. 
Consequently, when we insert a vertex $u$ into $A$, 
we prohibit at most $d + d(d - 1) = d^2$ other 
vertices from being inserted into $A$, and $|A| \ge 
n/(1 + d^2)$. 

It is easy to adjust the above procedure such 
that the root vertex r does not belong to A. 
Observe that for every useful set X and every u ∈ 
A, we have X ∩ N_G[u] ≠ {u} and, furthermore, 
the sets N_G[u] for u ∈ A are pairwise disjoint. We 
can think of choosing a useful set X as follows: 
first, for every u ∈ A, we choose the intersection 
X ∩ N_G[u] (there are 2^{|N_G[u]|} − 1 choices, as the 
choice {u} is forbidden), and, second, we choose 
the set X \ N_G[A]. Hence, the number of useful 
sets is bounded by 

\[
\begin{aligned}
\prod_{u \in A} \left(2^{|N_G[u]|} - 1\right) \cdot 2^{n - |N_G[A]|}
&= 2^n \cdot \prod_{u \in A} \left(1 - 2^{-|N_G[u]|}\right)
\le 2^n \cdot \prod_{u \in A} \left(1 - 2^{-d-1}\right) \\
&= 2^n \cdot \left(1 - 2^{-d-1}\right)^{|A|}
\le 2^n \cdot \left(1 - 2^{-d-1}\right)^{n/(1+d^2)}
= (2 - \varepsilon_d)^n .
\end{aligned}
\]



Thus, for every degree bound d, there exists 
a constant ε_d > 0 such that the number of useful 
sets in an n-vertex graph of maximum degree d is 
bounded by (2 − ε_d)^n, yielding a (2 − ε_d)^n n^{O(1)}- 
time algorithm for the traveling salesman problem. A 
better dependency on d in the formula for ε_d can 
be obtained using a projection theorem of Chung, 
Frankl, Graham, and Shearer [7] (see [5]). 
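
To get a feeling for the size of the saving, the base 2 − ε_d can be evaluated numerically. The sketch below assumes the bound takes the form (2 · (1 − 2^{−d−1})^{1/(1+d^2)})^n, which is what the greedy argument with |A| ≥ n/(1 + d^2) gives; the function name is hypothetical.

```python
def useful_set_base(d):
    """Base b such that the number of useful sets is at most b**n,
    assuming b = 2 * (1 - 2**-(d+1)) ** (1 / (1 + d*d))."""
    return 2 * (1 - 2 ** -(d + 1)) ** (1 / (1 + d * d))


for d in range(1, 7):
    print(d, round(useful_set_base(d), 5))
```

The base stays strictly below 2 for every fixed d but approaches 2 quickly as d grows, which is why a better dependency on d is of interest.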


Chromatic Number 

A similar reasoning can be performed for the 
problem of determining the chromatic number 
of an input graph. Here, it is useful to rephrase 
the problem as follows: find the minimum number 
k such that the vertex set of an input graph 
can be covered by k maximal independent sets; 
note that we do not insist that the independent 
sets are disjoint. Observe that if X is a set of 
vertices covered by one or more such maximal 
independent sets, we have X ∩ N_G[u] ≠ ∅ for 
every u ∈ V(G), as otherwise the vertex u should 
have been included into one of the covering sets. 
Hence, we can call a set X ⊆ V(G) useful 
if it intersects every closed neighborhood in G, 
and we obtain again a (2 − ε_d)^n bound on the 
number of useful sets. An important contribution 
of Björklund, Husfeldt, Kaski, and Koivisto [5] 
can be summarized as follows: using the fact that 
the useful sets are upward-closed (any superset 
of a useful set is useful as well), we can trim the 
fast subset convolution algorithm of [3] to consider 
useful sets only. Consequently, we obtain a 
(2 − ε_d)^n n^{O(1)}-time algorithm for computing the 
chromatic number of an input graph of maximum 
degree bounded by d. 


Bounded Average Degree 


Generalizing Algorithms for Bounded Degree Graphs 

The above approach for the traveling salesman 
problem has been generalized to graphs of bounded 
average degree by Cygan and Pilipczuk [8] using 
the following observation. Assume a graph G 
has n vertices and average degree bounded by d. 
Then, a simple Markov-type inequality implies 
that for every c > 1 there are at most n/c vertices 
of degree larger than cd. However, this bound 
cannot be tight for all values of c at once, and 
one can prove the following: if we want at most 
n/(αc) vertices of degree larger than cd for some 
α > 1, then we can always find such a constant c 
of order roughly exponential in α. 

An appropriate choice of α and the corresponding 
value of c allows us to partition the 
vertex set of an input graph into a large part 
of bounded degree and a very small part of 
unbounded degree. The extra multiplicative gap 
of α in the size bound allows us to hide the cost of 
extensive branching on the part with unbounded 
degree in the gains obtained by considering only 
(appropriately defined) useful sets in the bounded 
degree part. 

With this line of reasoning, Cygan and 
Pilipczuk [8] showed that for every degree 
bound d, there exists a constant ε_d > 0 such 
that the traveling salesman problem in graphs of 
average degree bounded by d can be solved 
in (2 − ε_d)^n n^{O(1)} time. It should be noted 
that the constant ε_d depends here doubly 
exponentially on d, as opposed to the single- 
exponential dependency in the works for bounded 
degree graphs. 

Furthermore, Cygan and Pilipczuk showed 
how to express the problem of counting perfect 
matchings in an n-vertex graph as a specific 
variant of a problem of counting Hamiltonian 
cycles in an n/2-vertex graph. This reduction not 
only gives a simpler 2^{n/2} n^{O(1)}-time algorithm for 
counting perfect matchings, as compared to the 
original algorithm of Björklund [2], but since the 
reduction does not increase the number of edges 
in a graph, it also provides a (2 − ε_d)^{n/2} n^{O(1)}- 
time algorithm in the case of bounded average 
degree. 

In a subsequent work, Golovnev, Kulikov, 
and Mihajlin [9] showed how to use the 
aforementioned multiplicative gap of α to 
obtain a (2 − ε_d)^n n^{O(1)}-time algorithm for 
computing the chromatic number of a graph 
with average degree bounded by d. Furthermore, 
they expressed all previous algorithms as the 
task of determining one coefficient in a carefully 
chosen polynomial, obtaining polynomial space 
complexity without any significant loss in time 
complexity. 


Counting Perfect Matchings in Bipartite Graphs 

A somewhat different line of research concerns 
counting perfect matchings in bipartite graphs. 
Here, a 2^{n/2} n^{O(1)}-time algorithm has been known for 
several decades [12]. Cygan and Pilipczuk presented 
a very simple 2^{(1 − 1/(3.55d)) n/2} n^{O(1)}-time 
algorithm for this problem in graphs of average 
degree at most d, improving upon the previous 
works of Servedio and Wan [13] and Izumi and 
Wadayama [11]. Furthermore, this result general- 
izes to the problem of computing the permanent 
of a matrix over an arbitrary commutative ring 
with the number of nonzero entries linear in the 
dimension of the matrix. 
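
For orientation, the classical 2^{n/2} n^{O(1)} bound can be illustrated with a Ryser-style inclusion-exclusion formula for the permanent of the biadjacency matrix. The sketch below is an assumption-laden illustration, not the algorithm of Cygan and Pilipczuk: it assumes a bipartite graph with m = n/2 ≥ 1 vertices on each side, given as an m × m 0/1 matrix, and the function name is hypothetical.

```python
def count_perfect_matchings(biadj):
    """Permanent of a square 0/1 biadjacency matrix via
    perm(A) = sum over column subsets S of
              (-1)^(m - |S|) * prod_i sum_{j in S} A[i][j]."""
    m = len(biadj)
    total = 0
    for S in range(1, 1 << m):       # nonempty column subsets only
        prod = 1
        for row in biadj:
            prod *= sum(row[j] for j in range(m) if (S >> j) & 1)
            if prod == 0:
                break
        sign = -1 if (m - bin(S).count("1")) % 2 else 1
        total += sign * prod
    return total
```

The loop visits 2^m = 2^{n/2} column subsets, which is the source of the exponential term in the running time.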


Problem Definition 


A k-coloring of a graph G = (V, E) assigns one 
of k colors to each vertex such that neighboring 
vertices have different colors. This is sometimes 
called vertex coloring. 


The smallest integer k for which the graph 
G admits a k-coloring is denoted χ(G) and 
called the chromatic number. The number of k- 
colorings of G is denoted P(G; k) and called the 
chromatic polynomial. 


Key Results 


The central observation is that χ(G) and P(G; k) 
can be expressed by an inclusion-exclusion formula 
whose terms are determined by the number 
of independent sets of induced subgraphs of 
G. For X ⊆ V, let s(X) denote the number 
of nonempty independent vertex subsets disjoint 
from X, and let s_r(X) denote the number of ways 
to choose r nonempty independent vertex subsets 
S_1, ..., S_r (possibly overlapping and with repetitions), 
all disjoint from X, such that 
|S_1| + ··· + |S_r| = |V|. 


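Theorem 1 ([1]) Let G be a graph on n vertices. 

1. 
\[
\chi(G) = \min_{k \in \{1,\dots,n\}} \Bigl\{ k : \sum_{X \subseteq V} (-1)^{|X|} s(X)^k > 0 \Bigr\}.
\]

2. For k = 1, ..., n, 
\[
P(G; k) = \sum_{r=1}^{k} \binom{k}{r} \Bigl( \sum_{X \subseteq V} (-1)^{|X|} s_r(X) \Bigr).
\]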


The time needed to evaluate these expressions 
is dominated by the 2^n evaluations of s(X) 
and s_r(X), respectively. These values can be 
precomputed in time and space within a polynomial 
factor of 2^n because they satisfy 

\[
s(X) =
\begin{cases}
0, & \text{if } X = V, \\
s(X \cup \{v\}) + s(X \cup \{v\} \cup N(v)) + 1, & \text{for } v \notin X,
\end{cases}
\]

where N(v) are the neighbors of v in G. 
Alternatively, the values can be computed using 
exponential-time, polynomial-space algorithms 
from the literature. 
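
The recurrence for s(X) and the first formula of Theorem 1 can be combined into a short program. This is a sketch under stated assumptions: vertices are numbered 0, ..., n − 1, vertex sets are bitmasks, and the function names are hypothetical. It uses the exponential-space table and makes no attempt at the polynomial-space variants.

```python
def independent_set_counts(n, edges):
    """s[X] = number of nonempty independent sets disjoint from X."""
    nbr = [0] * n
    for u, v in edges:
        nbr[u] |= 1 << v
        nbr[v] |= 1 << u
    full = (1 << n) - 1
    s = [0] * (1 << n)                        # s[full] = 0 is the base case
    for X in range(full - 1, -1, -1):         # supersets are larger integers
        free = full & ~X
        v = (free & -free).bit_length() - 1   # lowest vertex not in X
        with_v = X | (1 << v)
        s[X] = s[with_v] + s[with_v | nbr[v]] + 1
    return s


def chromatic_number(n, edges):
    """Smallest k with sum_X (-1)^|X| s(X)^k > 0."""
    s = independent_set_counts(n, edges)
    for k in range(1, n + 1):
        total = sum((-1) ** bin(X).count("1") * s[X] ** k
                    for X in range(1 << n))
        if total > 0:
            return k
    return 0                                  # graph without vertices
```

For instance, `chromatic_number(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])` evaluates the formula on the 5-cycle.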


This leads to the following bounds: 


Theorem 2 ([3]) For a graph G on n vertices, 
χ(G) and P(G; k) can be computed in 

1. Time and space 2^n n^{O(1)}, 
2. Time O(2.2461^n) and polynomial space. 

The space requirement can be reduced to 
O(1.292^n) [4]. 


The techniques generalize to arbitrary families 
of subsets over a universe of size n, provided 
membership in the family can be decided in poly- 
nomial time [3,4], and to the Tutte polynomial 
and the Potts model [2]. 


Applications 


In addition to being a fundamental problem in 
combinatorial optimization, graph coloring also 
arises in many applications, including register 
allocation and scheduling. 


Problem Definition 


Many of the most important known quantum al- 
gorithms operate in the query complexity model. 
In the simplest variant of this model, the goal 
is to compute some Boolean function of n input 
bits by making the minimal number of queries 
to the bits. All other resources (such as time and 
space) are considered to be free. In the model of 
exact quantum query complexity, one insists that 
the algorithm succeeds with certainty on every 
allowed input. The aim is then to find quan- 
tum algorithms which satisfy this constraint and 
still outperform any possible classical algorithm. 
This can be a challenging task, as achieving a 
probability of error equal to zero requires deli- 
cate cancellations between the amplitudes in the 
quantum algorithm. Nevertheless, efficient exact 
quantum algorithms are now known for certain 
functions. 

Some basic Boolean functions which we will 
consider below are: 


• Parity^n: f(x_1, ..., x_n) = x_1 ⊕ x_2 ⊕ ··· ⊕ x_n. 

• Threshold^n_k: f(x_1, ..., x_n) = 1 if |x| ≥ k, 
and f(x) = 0 otherwise, where |x| := Σ_i x_i 
is the Hamming weight of x. The special case 
k = n/2 is called the majority function. 

• Exact^n_k: f(x_1, ..., x_n) = 1 if |x| = k, and 
f(x) = 0 otherwise. 

• NE (“not-all-equal”) on 3 bits: f(x_1, x_2, x_3) = 
0 if x_1 = x_2 = x_3, and f(x_1, x_2, x_3) = 1 
otherwise. 
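
For concreteness, the four functions can be written as classical reference implementations. This is only an illustrative sketch: inputs are assumed to be sequences of 0/1 integers, the function names are hypothetical, and the threshold is taken to mean "Hamming weight at least k".

```python
def parity(x):
    """XOR of all input bits."""
    return sum(x) % 2


def threshold(x, k):
    """1 if the Hamming weight of x is at least k, else 0."""
    return int(sum(x) >= k)


def exact(x, k):
    """1 if the Hamming weight of x is exactly k, else 0."""
    return int(sum(x) == k)


def ne(x1, x2, x3):
    """Not-all-equal on three bits."""
    return int(not (x1 == x2 == x3))
```

Each of these reads all of its input bits; the query complexity question is how few of those reads a quantum algorithm can get away with.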


Key Results 


Early Results 
One of the earliest results in quantum computation 
was that the parity of 2 bits can be computed 
with certainty using only 1 quantum query [6], 
implying that Parity^n can be computed using 
⌈n/2⌉ quantum queries. By contrast, any classical 
algorithm which computes this function 
must make n queries. The quantum algorithm for 
Parity^n can be used as a subroutine to obtain 
speedups over classical computation for other 
problems. For example, based on this algorithm 
the majority function on n bits can be computed 
exactly using n + 1 − w(n) quantum queries, 
where w(n) is the number of 1s in the binary 
expansion of n [8]; this result has recently been 
improved (see below). 

If the function to be computed is partial, i.e., 
some possible inputs are disallowed, the separation 
between exact quantum and classical query 
complexity can be exponential. For example, in 
the Deutsch-Jozsa problem we are given query 
access to an n-bit string x (with n even) such that 
either all the bits of x are equal or exactly half 
of them are equal to 1. Our task is to determine 
which is the case. Any exact classical algorithm 
must make at least n/2 + 1 queries to bits of 
x to solve this problem, but it can be solved 
with only one quantum query [7]. An exponential 
separation is even known between exact quantum 
and bounded-error classical query complexity for 
a different partial function [5]. 
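
The reason one query suffices can be seen from a single amplitude. In the standard presentation of the algorithm (assumed here, not spelled out in this entry), one prepares a uniform superposition over the n positions, applies one phase query mapping |i⟩ to (−1)^{x_i}|i⟩, and measures whether the state is still the uniform superposition. The sketch below computes that measurement probability classically; the function name is hypothetical.

```python
def probability_state_unchanged(x):
    """Probability that, after one phase query, the register is
    still found in the uniform superposition over positions."""
    n = len(x)
    amplitude = sum((-1) ** bit for bit in x) / n
    return amplitude ** 2
```

The probability is exactly 1 when all bits are equal and exactly 0 when half of them are 1, so the two promised cases are distinguished with certainty.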


Recent Developments 

For some years, the best known separation be- 
tween exact quantum and classical query com- 
plexity of a total Boolean function (i.e., a function 
f : {0,1}^n → {0,1} with all possible n-bit 
strings allowed as input) was the factor of 
2 discussed above. However, recently the first 
example has been presented of an exact quantum 
algorithm for a family of total Boolean func- 
tions which achieves a lower asymptotic query 
complexity than the best possible classical algo- 
rithm [1]. 

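The family of functions used can be summarized 
as a “not-all-equal tree of depth d.” 
It is based around the recursive use of the NE 
function. Define the function NE^0(x_1) = x_1 and 
then, for d > 0, 

\[
\mathrm{NE}^d(x_1, \dots, x_{3^d}) = \mathrm{NE}\bigl(
\mathrm{NE}^{d-1}(x_1, \dots, x_{3^{d-1}}),\
\mathrm{NE}^{d-1}(x_{3^{d-1}+1}, \dots, x_{2 \cdot 3^{d-1}}),\
\mathrm{NE}^{d-1}(x_{2 \cdot 3^{d-1}+1}, \dots, x_{3^d})\bigr).
\]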
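
The recursive definition translates directly into a classical evaluator. The sketch below assumes the input is a sequence of 3^d bits given as 0/1 integers; the function name is hypothetical.

```python
def ne_tree(bits):
    """Evaluate the not-all-equal tree on a sequence of 3**d bits."""
    n = len(bits)
    if n == 1:
        return bits[0]               # NE^0 is the identity on one bit
    if n % 3:
        raise ValueError("input length must be a power of 3")
    third = n // 3
    a, b, c = (ne_tree(bits[i * third:(i + 1) * third]) for i in range(3))
    return int(not (a == b == c))
```

For example, `ne_tree([0, 0, 0, 1, 1, 1, 0, 1, 0])` first reduces the three blocks to 0, 0, and 1 and then applies NE once more.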


Then the following separation is known: 


Theorem 1 (Ambainis [1]) There is an exact 
quantum algorithm which computes NE^d using 
O(2.593...^d) queries. Any classical algorithm 
which computes NE^d must make Ω(3^d) queries, 
even if it is allowed probability of failure 1/3. 


In addition, Theorem 1 implies the first known 
asymptotic separation between exact quantum 
and classical communication complexity for a total 
function. Improvements over the best possible 
classical algorithms are also known for the other 
basic Boolean functions previously mentioned. 


Theorem 2 (Ambainis, Iraids, and Smotrovs 
[2]) There is an exact quantum algorithm which 
computes Exact^n_k using max{k, n − k} queries 
and an exact quantum algorithm which computes 
Threshold^n_k using max{k, n − k + 1} queries. Both 
of these complexities are optimal. 


By contrast, it is easy to see that any exact 
classical algorithm for these functions must make 
n queries. An optimal exact quantum algorithm 
for the special case Exact had already been 
found prior to this, in work which also gave 
optimal exact quantum query algorithms for all 
Boolean functions on up to 3 bits [9]. 


Methods 

We briefly describe the main ingredients of the 
efficient quantum algorithm for NE^d [1]. The 
basic idea is to fix some small d_0, start with an 
exact quantum algorithm which computes NE^{d_0} 
using fewer queries than the best possible classical 
algorithm, and then amplify the separation 
by using the algorithm recursively. A difficulty 
with this approach is that the standard approach 
for using a quantum algorithm recursively in- 
curs a factor of 2 penalty in the number of 
queries with each recursive call. This factor of 2 
is required to “uncompute” information left over 
after the algorithm has completed. Therefore, a 
query complexity separation by a factor of 2 or 
less does not immediately give an asymptotic 
separation. 

This problem can be addressed by introducing 
the notion of p-computation. Let p ∈ [−1, 1]. 
A quantum algorithm A is said to p-compute 
a function f(x_1, ..., x_n) if, for some state 
|ψ_start⟩: 

• Whenever f(x_1, ..., x_n) = 0, 
A|ψ_start⟩ = |ψ_start⟩. 

• Whenever f(x_1, ..., x_n) = 1, 
A|ψ_start⟩ = p|ψ_start⟩ + √(1 − p^2) |ψ⟩ 
for some |ψ⟩, which may depend on x, 
such that ⟨ψ|ψ_start⟩ = 0. 


It can be shown that if there exists an algorithm 
which p-computes some function f for some 
p < 0, there exists an exact quantum algorithm 
which computes f using the same number of 
queries. Further, if an algorithm (−1)-computes 
some function f, the same algorithm can immediately 
be used recursively, without needing 
any additional queries at each level of recursion. 
Thus, to obtain an asymptotic quantum-classical 
separation for NE^d, it suffices to obtain an algorithm 
which (−1)-computes NE^{d_0} using strictly 
fewer than 3^{d_0} queries, for some d_0. 

The NE^d problem also behaves particularly 
well with respect to p-computation for general 
values of p: 


Lemma 1 If there is an algorithm A which p- 
computes NE^{d−1} using k queries, there is an 
algorithm A′ which p′-computes NE^d with 2k 
queries, for p′ = 1 − 4(1 − p)^2/9. 


This lemma allows algorithms for NE^{d−1} to 
be lifted to algorithms for NE^d, at the expense of 
making the value of p worse. Nevertheless, given 
that it is easy to write down an algorithm which 
(−1)-computes NE^0 using one query, the lemma 
is sufficient to obtain an exact quantum algorithm 
for NE^2 using 4 queries. This is already enough 
to prove an asymptotic quantum-classical separation, 
but this separation can be improved using 
the following lemma (a corollary of a variant of 
amplitude amplification): 


Lemma 2 If there is an algorithm A which p- 
computes NE^d using k queries, there is an 
algorithm A′ which p′-computes NE^d with 2k 
queries, for p′ = 2p^2 − 1. 


Interleaving Lemmas 1 and 2 allows one to 
derive an algorithm which (−1)-computes NE^8 
using 2,048 queries, which implies an exact quantum 
algorithm for NE^d using O(2,048^{d/8}) = 
O(2.593...^d) queries. 
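
The arithmetic behind these claims is easy to check with exact rational numbers. The sketch below only evaluates the two update rules p′ = 1 − 4(1 − p)^2/9 and p′ = 2p^2 − 1 and the growth rate 2,048^{1/8}; it does not reproduce the interleaving schedule of [1], and the function names are hypothetical.

```python
from fractions import Fraction


def lemma1(p):
    """Value p' after lifting from depth d-1 to depth d (2k queries)."""
    return 1 - Fraction(4, 9) * (1 - p) ** 2


def lemma2(p):
    """Value p' after the amplification step at fixed depth (2k queries)."""
    return 2 * p ** 2 - 1


p = Fraction(-1)                 # NE^0 is (-1)-computed with one query
history = [p]
for _ in range(2):               # two lifts: depth 1 (2 queries), depth 2 (4 queries)
    p = lemma1(p)
    history.append(p)

growth_rate = 2048 ** (1 / 8)    # queries per level implied by NE^8
print(history, growth_rate)
```

Both lifted values, −7/9 and −295/729, are negative, so 4 queries suffice for NE^2 where a classical algorithm needs 9; the rule of Lemma 2 maps p = 0 to exactly −1.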


Experimental Results 


It is a difficult task to design exact quantum query 
algorithms, even for small functions, as these 
algorithms require precise cancellations between 
amplitudes. One way to gain numerical evidence 
for what the exact quantum query complexity of 
a function should be is to use the formulation 
of quantum query complexity as a semidefinite 
programming (SDP) problem [4]. This allows 
one to estimate the optimal success probability 
of any quantum algorithm using a given num- 
ber of queries to compute a given function. If 
this success probability is very close to 1, this 
gives numerical evidence that there exists an 
exact quantum algorithm using that number of 
queries. 




This approach has been applied for all Boolean 
functions on up to 4 bits, giving strong evidence 
that the only function on 4 bits which requires 
4 quantum queries is the AND function and 
functions equivalent to it [9]. This has led to the 
conjecture that, for any n, the only function 
on n bits which requires n quantum queries 
to be computed exactly is the AND function 
and functions equivalent to it. This would be 
an interesting contrast with the classical case 
where most functions on n bits require n queries. 
This conjecture has recently been proven for 
various special cases: symmetric functions, 
monotone functions, and functions with formula 
size n [3]. 


Problem Definition 


From the earliest works on tile self-assembly, 
abstract theoretical models and experimental im- 
plementations have been linked. In 1998, in ad- 
dition to developing the abstract and kinetic Tile 
Assembly Models (aTAM and kTAM) [14], Win- 
free et al. demonstrated the use of DNA tiles 
to construct a simple, periodic lattice [16]. Pe- 
riodic lattices and “uniquely addressed” assem- 
blies, where each tile type appears once in each 
assembly, have been widely studied, with systems 
employing up to a thousand unique tiles in three 
dimensions [8, 13]. While these systems provide 
insight into the behavior of DNA tile systems, 
algorithmic tile systems of more theoretical interest 
pose specific challenges for experimental 
implementation. 

In the aTAM, abstract tiles attach individually 
to empty lattice sites if bonds of a sufficient total 
strength b (at least the abstract “temperature” τ) can 
be made and, once attached, never detach. 
Experimentally, free tiles and assemblies of bound 
tiles are in solution. Tiles have short single- 
stranded “sticky ends” regions that form bonds 
with complementary regions on other tiles. Tiles 
attach to assemblies at rates dependent only upon 
their concentrations, regardless of the strength 
of bonds that can be made. Once attached, tiles 
can detach and do so at a rate that is exponentially 
dependent upon the total strength of the 
bonds [6]. Thus, for a tile t_i with concentration 
[t_i] binding by a total abstract bond strength b, 
we have attachment and detachment rates of 

\[
r_f = k_f [t_i], \qquad r_b = k_f \, e^{\,b \Delta G^\circ_{se}/RT + \alpha} \tag{1}
\]

where k_f is an experimentally determined rate 
constant, α is a constant binding free energy 
change (e.g., from entropic considerations), ΔG°_se 
is the free energy change of a single-strength 
bond, and T is the (physical) temperature. Using 
the substitutions [t_i] = e^{−G_mc + α}, G_se = 
−ΔG°_se/RT, and k̂_f = k_f e^α, these can be 
simplified to 

\[
r_f = \hat{k}_f \, e^{-G_{mc}}, \qquad r_b = \hat{k}_f \, e^{-b G_{se}} \tag{2}
\]

where G_se is a (positive) unitless free energy for a 
single-strength bond (larger values correspond to 
stronger bonds), G_mc is a free energy analogue of 
concentration (larger values correspond to lower 
concentrations), and k̂_f is an adjusted rate constant. 
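
A small numerical sketch of these rates, assuming the simplified form in which attachment occurs at rate k̂_f e^{−G_mc} and detachment of a tile held by total bond strength b occurs at rate k̂_f e^{−b G_se}. The function names and the parameter values used in the usage note are hypothetical and chosen only for illustration.

```python
import math


def attachment_rate(k_hat_f, g_mc):
    """Attachment rate; independent of the bonds that can be made."""
    return k_hat_f * math.exp(-g_mc)


def detachment_rate(k_hat_f, g_se, b):
    """Detachment rate of a tile bound with total strength b."""
    return k_hat_f * math.exp(-b * g_se)


def attachment_is_favorable(g_mc, g_se, b):
    """True if a tile bound with strength b attaches faster than it
    detaches; the rate constant cancels out of the comparison."""
    return attachment_rate(1.0, g_mc) > detachment_rate(1.0, g_se, b)
```

With, say, G_se = 8.0 and G_mc = 15.5 (slightly below 2 G_se), `attachment_is_favorable(15.5, 8.0, 2)` holds while `attachment_is_favorable(15.5, 8.0, 1)` does not, which is the regime in which the model mimics growth at abstract temperature 2.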

These rates are the basis of the kinetic Tile 
Assembly Model (kTAM), which is widely used 
as a physical model of tile assembly [14]. Tiles 
that attach faster than they detach will tend to 
remain attached and allow further growth: for 
example, if G_mc < 2G_se, tile attachments by 
b ≥ 2 will be favorable. Tiles that detach faster 
than they attach will tend to remain detached and 
not allow further growth. Since G_mc is dependent 
upon tile concentration, and G_se is dependent 
upon physical temperature (lower temperatures 
result in larger G_se values), the attachment and 
detachment rates can be tuned such that attachment 
is slightly more favorable than detachment 
for tiles attaching by a certain total bond strength 
and less favorable for less strongly bound tiles. In 
this way, in the limit of low concentrations and 
slow growth, the kTAM approximates the aTAM 
at a given abstract temperature τ. When moving 
away from this limit and toward experimentally 
feasible conditions, however, the kTAM provides 
insight into many of the challenges faced in 
experimental implementation of algorithmic tile 
assembly: 


Growth errors: While tile assembly in the 
aTAM is error-free, tiles can attach in erroneous 
locations in experiments. Even ignoring the 
possibility of lattice defects, malformed tiles, and 
other experimental peculiarities, errors can arise 
in the kTAM via tiles that attach by less than the 
required bond strength (e.g., one single-strength 
bond for a τ = 2 system) and are then “frozen” 
in place by further attachments [4]. As the further 
growth of algorithmic systems depends on the 
tiles already present in an assembly, a single 
erroneously incorporated tile can propagate 
undesired growth via further, valid attachments. 
These errors can arise both in growth sites where 
another tile could attach correctly (“growth 
errors”) and lattice sites where no correct tile 
could attach (“facet nucleation errors”) [3, 14]. 


Seeding: Tile assembly in the aTAM is usually 
initiated from a designated “seed” tile. In solu- 
tion, however, tiles are free to attach to all other 
tiles and can form assemblies without starting 
from a seed, even if this requires several unfavor- 
able attachments to form a stable structure that 
can allow further growth. Depending upon the tile 
system, these “spuriously nucleated” structures 
can potentially form easily. For example, a τ = 2 
system with boundaries of identical tiles that 
attach by double bonds on both sides can readily 
form long strings of boundary tiles [10, 11]. 


Tile depletion: As free tiles in solution are 
incorporated into assemblies, their concentrations are 
correspondingly reduced. This depletion lowers 
the attachment rates for those tiles and in turn 
changes the favorability of growth. If different 
tile types are incorporated in different quantities, 
their attachment rates will become unequal, and 
at some point in assembly, attachment by two 
single-strength bonds may be favorable for one 
tile type and unfavorable for another. 


Tile design: While theoretical constructions 
may employ an arbitrary number of sticky 
ends types, this number is limited by tile 
designs in practice. Most tiles use short single- 
stranded DNA regions of 5–10 nucleotides (nt), 
limiting the number of possible sticky ends to 
4^5–4^10 at best. However, since partial bonds 
can form between subsequences of the sticky 
ends, sequences with sufficient orthogonality 
are required, and since DNA binding strength 
is sequence dependent, sequences with similar 
binding energies are required [5]. Both of these 
effects place considerably more stringent limits 
on the number of sticky ends and change the 
behavior of experimental systems. 


Key Results 


Winfree and Bekbolatov developed a tileset 
transformation, “uniform proofreading,” that 
reduced per-site growth error rates from 
r_err ≈ m e^{−G_se} (where m is the number of possible 
errors) to ≈ m e^{−K G_se} by scaling each tile into 
a K × K block of individually attaching tiles 
with unique internal bonds [15]. However, this 
transformation did not reduce facet nucleation 
errors. Chen and Goel later created a modified 
transformation, “snaked proofreading,” that 
reduced both growth and facet nucleation errors 
by changing the strengths of the internal bonds 
used [3]. These and other proofreading methods 
have the potential to drastically reduce error rates 
in experimental systems. 

Schulman et al. analyzed tile system nucleation 
through the consideration of “critical nuclei,” 
tile assemblies where melting and further 
growth are equally favorable, and showed that, 
by ensuring that a sufficient number of unfavorable 
attachments are required for a critical nucleus 
to form, the rate of spurious nucleation can 
be kept arbitrarily low [11]. Using this analysis, 
Schulman et al. constructed the “zigzag” ribbon 
system, which forms a ribbon where each row 
must assemble completely before the next can 
begin growth, as an example of a system where 
spurious nucleation can be made arbitrarily low 
by increasing ribbon width. To nucleate desired 
structures, this system makes use of a large, 
preformed seed structure to allow the growth of 
the first ribbon row. 

Schulman et al. also devised a “constant- 
temperature” growth technique where the 
concentrations of assemblies, controlled by the 
concentration of initial seeds in a nucleation- 
controlled system, are kept small enough in 
comparison to the concentrations of free tiles 
that growth does not significantly deplete tile 
concentrations, which thus remain approximately 
constant [12]. After growth is completed, the 
remaining free tiles are “deactivated” by adding 
an excess of DNA strands complementary to 
specific sticky ends sequences. 

In analyzing the effects of DNA sequences 
on tile assembly, Evans and Winfree showed an 
exponential increase of error rates in the kTAM 
for partial binding between different sticky ends 
sequences and for differing sequence-dependent 
binding energies and developed algorithms for 
sequence design and assignment to reduce these 
effects [5]. With reasonable design constraints, 
their algorithms suggested limits of around 80 
sticky ends types for tiles using 5 nt sticky ends 
and around 360 for tiles using 10 nt sticky ends 
before significant sequence effects begin to be- 
come unavoidable and must be incorporated into 
tile system design. 


Experimental Results 


While numerous designs exist for tile structures, 
experimental implementations have usually used 
either double-crossover (DX) tiles with 5 or 
6 nt sticky ends [16] or single-stranded tiles 
(SST) with 10 and 11 nt sticky ends [17]. SSTs 
potentially offer a significantly larger sequence 
space and have been employed in large, non- 
algorithmic systems [8, 13] but have not yet been 
used for complex algorithmic systems. 

Early experiments in algorithmic tile assembly 
using DX tiles did not employ any of the 
key results discussed above. Rothemund et al. 
implemented a simple XOR system of four 
logical tiles (eight tiles were needed owing to 
structural considerations), using DNA hairpins on 
“one-valued” tiles as labels [9] and flexible, one- 
dimensional seeds (Fig. 1a,b). While assemblies 
grew, and Sierpinski triangle patterns were 
visible, error rates were between 1 and 10% 
per tile. Barish et al. implemented more complex 
bit-copying and binary counting systems in a 
similar way, finding per-tile error rates of around 
10% [1]. 

Experimental Implementation of Tile Assembly, 
Fig. 1 Experimental results for algorithmic tile assembly. 
(a) and (b) show the Rothemund et al. XOR system’s DX 
tiles and resulting structures, with (b) illustrating the high 
error rates and seeding problems of the system [9]. (c) 
shows the Fujibayashi et al. fixed-width XOR ribbon [7], 
while (d) shows the Barish et al. binary counter ribbon 
with partial 2 × 2 proofreading [2]; the rectangular 
structures on the left of both systems are preformed DNA 
origami seeds. (e) shows an example bit-copying ribbon 
from Schulman et al. [12] 


More recently, Fujibayashi et al. used rigid 
DNA origami structures to serve as seeds for 
the growth of a fixed-width XOR ribbon system 
and, in doing so, reduced error rates to 1.4% 
per tile without incorporating proofreading [7] 
(Fig. 1c). This seeding mechanism was also used 
by Barish et al. to seed zigzag bit-copying and 
binary counting ribbon systems that implemented 
2 × 2 uniform proofreading [2]. With nucleation 
control and proofreading, these systems resulted 
in dramatically reduced error rates of 0.26% per 
proofreading block for copying and 4.1% for the 
more algorithmically complex binary counting, 
which only partially implemented uniform 
proofreading (Fig. 1d). 

A similar bit-copying ribbon was later 
implemented by Schulman et al., with the 
addition of the constant-temperature, constant- 
concentration growth method and the use of 
biotin-streptavidin labels rather than DNA 
hairpins. The result was a decrease in error 
rates by almost a factor of ten to 0.034% 
per block [12] (Fig. 1e). At this error rate, 
structures of around 2,500 error-free blocks, 
or 10,000 individual tiles, could be grown 
with reasonable yields, suggesting that with 
the incorporation of proofreading, nucleation 
control, and constant-concentration growth 
methods, low-error experimental implementations 
of increasingly complex algorithmic tile 
systems may be feasible up to sequence space 
limitations. 
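
The link between a per-block error rate and the yield of error-free structures can be made concrete with a simple estimate. The sketch assumes that errors in different blocks are independent, which is a simplification not stated in the experiments above; the function name is hypothetical.

```python
def error_free_yield(per_block_error_rate, blocks):
    """Fraction of structures with no erroneous block, assuming
    independent errors with the same rate in every block."""
    return (1.0 - per_block_error_rate) ** blocks


print(error_free_yield(0.00034, 2500))   # error rate 0.034% per block
print(error_free_yield(0.0026, 2500))    # error rate 0.26% per block
```

Under this assumption, an error rate of 0.034% leaves a substantial fraction of 2,500-block structures error-free, whereas at 0.26% almost none would be.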


Problem Definition 


Experimental analysis of algorithms describes 
not a specific algorithmic problem, but rather 
an approach to algorithm design and analysis. It 
complements, and forms a bridge between, tra- 
ditional theoretical analysis, and the application- 
driven methodology used in empirical analysis. 

The traditional theoretical approach to algo- 
rithm analysis defines algorithm efficiency in 
terms of counts of dominant operations, under 
some abstract model of computation such as 
a RAM; the input model is typically either worst- 
case or average-case. Theoretical results are usu- 
ally expressed in terms of asymptotic bounds 
on the function relating input size to number of 
dominant operations performed. 

This contrasts with the tradition of empirical 
analysis that has developed primarily in fields 
such as operations research, scientific computing, 
and artificial intelligence. In this tradition, the 
efficiency of implemented programs is typically 
evaluated according to CPU or wall-clock times; 
inputs are drawn from real-world applications or 
collections of benchmark test sets, and experi- 
mental results are usually expressed in compar- 
ative terms using tables and charts. 

Experimental analysis of algorithms spans 
these two approaches by combining the sensi- 
bilities of the theoretician with the tools of the 
empiricist. Algorithm and program performance 
can be measured experimentally according 
to a wide variety of performance indicators, 
including the dominant cost traditional to theory, 
bottleneck operations that tend to dominate 
running time, data structure updates, instruction 
counts, and memory access costs. A researcher 
in experimental analysis selects performance 
indicators most appropriate to the scale and scope 
of the specific research question at hand. (Of 
course time is not the only metric of interest in 
algorithm studies; this approach can be used to 
analyze other properties such as solution quality 
or space use.) 
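
As a small illustration of measuring a performance indicator other than running time, the sketch below counts key comparisons in insertion sort on seeded random inputs. The choice of algorithm, the input model (uniform random reals), and all names are assumptions made for the example only.

```python
import random


def insertion_sort_comparisons(items):
    """Sort a copy of items; return (number of key comparisons, result)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1         # one comparison of a[j] with key
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return comparisons, a


def mean_comparisons(n, trials=20, seed=0):
    """Average comparison count over random inputs of size n."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    total = 0
    for _ in range(trials):
        data = [rng.random() for _ in range(n)]
        total += insertion_sort_comparisons(data)[0]
    return total / trials
```

Counting comparisons makes the measurement independent of the machine and the load on it, and fixing the seed makes the experiment repeatable.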

Input instances for experimental algorithm 
analysis may be randomly generated or derived 
from application instances. In either case, they 
typically are described in terms of a small- 
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collection of controlled 
parameters. A primary goal of experimentation 
is to investigate the cause-and-effect relationship 
between input parameters and algorithm/program 
performance indicators. 

Research goals of experimental algorith- 
mics may include discovering functions (not 
necessarily asymptotic) that describe the 
relationship between input and performance, 
assessing the strengths and weaknesses of 
different algorithm/data structures/programming 
strategies, and finding best algorithmic strategies 
for different input categories. Results are 
typically presented and illustrated with graphs 
showing comparisons and trends discovered in 
the data. 
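As a minimal sketch of such an experiment (the choice of insertion sort, the input sizes, and the trial count are illustrative assumptions, not taken from the text), the following Python code uses the number of key comparisons as its performance indicator and reports the mean over random permutations for each input size n.

```python
import random

def insertion_sort_comparisons(items):
    """Sort a copy of items and return the number of key comparisons made."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of key against a[j]
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift the larger item to the right
            j -= 1
        a[j + 1] = key
    return comparisons

def mean_comparisons(n, trials, rng):
    """Average comparison count over random permutations of size n."""
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += insertion_sort_comparisons(perm)
    return total / trials

if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so the experiment is repeatable
    for n in (100, 200, 400, 800):
        mean = mean_comparisons(n, 20, rng)
        print(n, mean, mean / n ** 2) # the last column exposes the growth trend
```

Plotting the last column against n is one way to look for a function that describes the relationship between input size and cost.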

The two terms “empirical” and “experimental”
are often used interchangeably in the literature.
Sometimes the terms “old style” and
“new style” are used to describe, respectively, 
the empirical and experimental approaches to this 
type of research. The related term “algorithm en- 
gineering” refers to a systematic design process 
that takes an abstract algorithm all the way to 
an implemented program, with an emphasis on 
program efficiency. Experimental and empirical 
analysis is often used to guide the algorithm en- 
gineering process. The general term algorithmics 
can refer to both design and analysis in algorithm 
research. 




Key Results 


None 


Applications 


Experimental analysis of algorithms has been 
used to investigate research problems originating 
in theoretical computer science. One example 
arises in the average-case analysis of algorithms 
for the One-Dimensional Bin Packing problem. 
Experimental analyses have led to new theorems 
about the performance of the optimal algorithm; 
new asymptotic bounds on average-case perfor- 
mance of approximation algorithms; extensions 
of theoretical results to new models of inputs; and
new algorithms with tighter approximation
guarantees. Another example is the experimental
discovery of a type of phase-transition
behavior for random instances of the
3CNF-Satisfiability problem, which has led to new
ways to characterize the difficulty of problem
instances.

A second application of experimental algorith- 
mics is to find more realistic models of computa- 
tion, and to design new algorithms that perform 
better on these models. One example is found in 
the development of new memory-based models 
of computation that give more accurate time pre- 
dictions than traditional unit-cost models. Using 
these models, researchers have found new cache- 
efficient and I/O-efficient algorithms that exploit 
properties of the memory hierarchy to achieve 
significant reductions in running time. 

Experimental analysis is also used to design 
and select algorithms that work best in practice, 
algorithms that work best on specific categories 
of inputs, and algorithms that are most robust 
with respect to bad inputs. 


Data Sets 


Many repositories for data sets and instance gen- 
erators to support experimental research are avail- 
able on the Internet. They are usually organized 
according to specific combinatorial problems or 
classes of problems. 


URL to Code 


Many code repositories to support experimental 
research are available on the Internet. They 
are usually organized according to specific 
combinatorial problems or classes of problems. 
Skiena’s Stony Brook Algorithm Repository 
(www.cs.sunysb.edu/~algorith/) provides a com- 
prehensive collection of problem definitions and 
algorithm descriptions, with numerous links to 
implemented algorithms. 
of experimental research is much too large to
commentary on experimental methodology in the
context of algorithm research appear in the list 
below. 

The workshops and journals listed below 
are specifically intended to support research 
in experimental analysis of algorithms. Ex- 
perimental work also appears in more general 
algorithm research venues such as SODA 
(ACM/SIAM Symposium on Data Structures
Problem Definition


Given a propositional formula in conjunctive normal
form, such as $(x \vee y) \wedge (\bar{x} \vee \bar{y} \vee z) \wedge (\bar{z})$,
one wants to find an assignment of truth values
to the variables that makes the formula evaluate
to true. Here, $[x \mapsto 1, y \mapsto 0, z \mapsto 0]$ does
the job. We call such formulas CNF formulas and
such assignments satisfying assignments. SAT is
the problem of deciding whether a given CNF
formula is satisfiable. If every clause (such as
$(\bar{x} \vee \bar{y} \vee z)$ above) has at most k literals, we call
this a k-CNF formula. The above example is a
3-CNF formula. The problem of deciding whether
a given k-CNF formula is satisfiable is called
k-SAT. This is one of the most fundamental
NP-complete problems.
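The definitions can be checked on a small instance by brute force. The sketch below assumes the example formula reads (x OR y) AND (NOT x OR NOT y OR z) AND (NOT z); the clause encoding and the function names are illustrative choices, not part of the original text.

```python
from itertools import product

# Each clause is a list of (variable, is_positive) literals. This encodes the
# assumed example (x OR y) AND (NOT x OR NOT y OR z) AND (NOT z).
FORMULA = [
    [("x", True), ("y", True)],
    [("x", False), ("y", False), ("z", True)],
    [("z", False)],
]

def satisfies(formula, assignment):
    """True if every clause contains at least one literal made true."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

def satisfying_assignments(formula):
    """Enumerate all 2^n assignments and keep the satisfying ones."""
    variables = sorted({var for clause in formula for var, _ in clause})
    found = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(formula, assignment):
            found.append(assignment)
    return found
```

Exhaustive enumeration takes time proportional to 2^n, which is the trivial baseline that k-SAT algorithms try to beat.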

Several clever algorithms have been developed 
for k-SAT. In this note we are mostly concerned 
with the PPSZ algorithm [3]. This is itself an 
improved version of the older PPZ algorithm [4]. 
Another prominent SAT algorithm is Schöning’s
random walk algorithm [6], which is slower than 
PPSZ, but has the benefit that it can be turned into 
a deterministic algorithm [5]. 

Given that we currently cannot prove $P \neq NP$,
all super-polynomial lower bounds on the
running time of k-SAT algorithms must either be
conditional, that is, rest on widely believed
but as yet unproven assumptions, or apply only to a
particular family of algorithms. In this note we
sketch exponential lower bounds for the PPSZ
algorithm, which is currently the fastest algorithm
for k-SAT. We measure the running time of a SAT
algorithm in terms of n, the number of variables.
Often probabilistic algorithms for k-SAT (like
PPSZ) have polynomial running time and success
probability $p^n$ for some $p < 1$. One can turn this
into a Monte Carlo algorithm with success
probability at least 1/2 by repeating it $(1/p)^n$ times.
We prefer the formulation of PPSZ as having
polynomial running time, and we are interested
in the worst-case success probability $p^n$.


Key Results 


The worst-case behavior of PPSZ is exponential.
That is, there are satisfiable k-CNF formulas
on n variables, for which PPSZ finds a satisfying
assignment with probability at most $2^{-\Omega(n)}$.
More precisely, there is a constant C and a
sequence $\epsilon_k \le \frac{C \log^2 k}{k}$ such that the worst-case
success probability of PPSZ for k-SAT is at most
$2^{-(1-\epsilon_k)n}$. See Theorem 3 below for a formal
statement.


The PPSZ Algorithm 


The PPSZ algorithm, named after its inventors
Paturi, Pudlák, Saks, and Zane [3], is the fastest
known algorithm for k-SAT. We now give a
brief description of it: Choose a random ordering
$\sigma$ of the n variables $x_1, \dots, x_n$ of F. Choose
random truth values $b = (b_1, \dots, b_n) \in \{0,1\}^n$.
Iterate through the variables in the ordering given
by $\sigma$. When processing $x_i$, check whether it is
“obvious” what the correct value of $x_i$ should be.
If so, fix $x_i$ to that value. Otherwise, fix $x_i$ to $b_i$.
By fixing we mean replacing each occurrence of
$x_i$ in F by that value (and each occurrence of $\bar{x}_i$
by the negation of that value). After all variables
have been processed, the algorithm returns the
satisfying assignment it has found or returns
failure if it has run into a contradiction.

It remains to specify what “obvious” means:
Given a CNF formula F and a variable $x_i$, we
say that the correct value of $x_i$ is obviously b if
the statement $x_i = b$ can be derived from F
by width-w resolution, where w is some large
constant (think of w = 1,000). This can be
checked in time $O(n^w)$, which is polynomial.

Let $\mathrm{ppsz}(F, \sigma, b)$ be the return value of PPSZ.
That is, $\mathrm{ppsz}(F, \sigma, b) \in \mathrm{sat}(F) \cup \{\text{failure}\}$,
where sat(F) is the set of satisfying assignments
of F.
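The control flow of one run can be sketched as follows. Note the substitution: instead of width-w resolution, this sketch treats a value as “obvious” only when a unit clause forces it, which corresponds to the older PPZ algorithm rather than to PPSZ itself. The clause encoding (signed integers, variables numbered 1 to n) and all function names are assumptions made for illustration.

```python
import random

FAILURE = None  # sentinel returned when a contradiction is reached

def fix(clauses, var, value):
    """Replace var by value: drop satisfied clauses, delete falsified literals."""
    true_lit = var if value else -var
    reduced = []
    for clause in clauses:
        if true_lit in clause:
            continue                                  # clause is satisfied
        reduced.append([lit for lit in clause if lit != -true_lit])
    return reduced

def obvious_value(clauses, var):
    """Stand-in for the width-w resolution test: look only for unit clauses."""
    for clause in clauses:
        if clause == [var]:
            return True
        if clause == [-var]:
            return False
    return None

def ppz_style_run(clauses, n, rng):
    """One run with a random ordering sigma and random guess bits b."""
    order = list(range(1, n + 1))
    rng.shuffle(order)                                # random ordering sigma
    guesses = {v: rng.random() < 0.5 for v in order}  # random truth values b
    assignment = {}
    for var in order:
        forced = obvious_value(clauses, var)
        value = guesses[var] if forced is None else forced
        assignment[var] = value
        clauses = fix(clauses, var, value)
        if [] in clauses:                             # empty clause: contradiction
            return FAILURE
    return assignment
```

A single run succeeds only with small probability in general, so in practice the run is repeated with fresh randomness until it returns an assignment.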


A Very Brief Sketch of the Analysis of PPSZ
Let $\sigma$ be a permutation of $x_1, \dots, x_n$ and let
$b = (b_1, \dots, b_n) \in \{0,1\}^n$. Suppose we run
PPSZ on F using this permutation $\sigma$ and the truth
values b. For $1 \le i \le n$, define $Z_i$ to be 1
if PPSZ did not find it obvious what the correct
value of $x_i$ should be, and 0 otherwise. Let
$Z = Z_1 + \dots + Z_n$.
To underline the dependence on F, $\sigma$, and b, we
sometimes write $Z(F, \sigma, b)$. It is not difficult to
show the following lemma.


Lemma 1 ([3]) Let F be a satisfiable CNF formula
over n variables. Let $\sigma$ be a random permutation
of its variables and let $a \in \{0,1\}^n$ be a
satisfying assignment of F. Then

$$\Pr_{\sigma, b}[\mathrm{ppsz}(F, \sigma, b) = a] = \mathbb{E}_{\sigma}\left[2^{-Z(F, \sigma, a)}\right]. \qquad (1)$$

Since $x \mapsto 2^x$ is a convex function, Jensen’s
inequality implies that $\mathbb{E}_{\sigma}[2^{-Z}] \ge 2^{-\mathbb{E}[Z]}$, and
by linearity of expectation, it holds that
$\mathbb{E}[Z] = \sum_{i=1}^{n} \mathbb{E}[Z_i]$.
Lemma 2 ([3]) There are numbers $c_k \in [0,1]$
such that the following holds: If F is a k-CNF
formula over n variables with a unique satisfying
assignment a, then $\mathbb{E}_{\sigma}[Z_i(F, \sigma, a)] \le c_k$ for all
$1 \le i \le n$. Furthermore, for large k we have
$c_k \approx 1 - \frac{\pi^2}{6k}$, and in particular
$c_3 = 2\ln(2) - 1 \approx 0.386$.


Combining everything, Paturi, Pudlák, Saks, and
Zane obtain their main result:


Theorem 1 ([3]) Let F be a k-CNF formula
with a unique satisfying assignment a. Then
PPSZ finds this satisfying assignment with
probability at least $2^{-c_k n}$.


It takes considerable additional effort to show
that the same bound also holds if F has multiple
satisfying assignments:


Theorem 2 ([2]) Let F be a satisfiable k-CNF
formula. Then PPSZ finds a satisfying assignment
with probability at least $2^{-c_k n}$.


We sketch the intuition behind the proof of
Lemma 2. It turns out that in the worst case the
event $Z_i = 1$ can be described by the following
random experiment: Let $T = (V, E)$ be the
infinite rooted (k − 1)-ary tree. For each node
$v \in V$ choose $t(v) \in [0,1]$ randomly and
independently. Call a node v alive if $t(v) \ge t(\mathrm{root})$.
Then $\Pr[Z_i = 1]$ is (roughly) equal to
the probability that T contains an infinite path
of alive vertices, starting with the root. Call this
probability $c_k$. A simple calculation shows that
$c_3 = 2\ln(2) - 1$. For larger values of k, there is
not necessarily a closed form, but Paturi, Pudlák,
Saks, and Zane show that $c_k \approx 1 - \frac{\pi^2}{6k}$ for large k.
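The tree experiment can be evaluated numerically. The sketch below assumes that t(v) is uniform on [0,1] and that a node is alive when t(v) is at least t(root), so that, conditioned on t(root) = r, every other node is alive independently with probability p = 1 − r. The probability s that a given child is alive and starts an infinite alive path is then the largest solution of s = p(1 − (1 − s)^(k−1)). The grid size and iteration count are arbitrary choices.

```python
import math

CLOSED_FORM_C3 = 2 * math.log(2) - 1   # value quoted in the text for k = 3

def infinite_path_probability(k, grid=1000, iterations=500):
    """Estimate c_k by averaging over t(root) with the midpoint rule."""
    d = k - 1                          # branching factor of the tree
    total = 0.0
    for j in range(grid):
        p = (j + 0.5) / grid           # p = 1 - t(root): chance a node is alive
        s = 1.0                        # iterate down to the largest fixed point
        for _ in range(iterations):
            s = p * (1.0 - (1.0 - s) ** d)
        total += 1.0 - (1.0 - s) ** d  # root has at least one surviving child
    return total / grid
```

For k = 3 the fixed point is s = 2 − 1/p when p ≥ 1/2 and s = 0 otherwise, and integrating 1 − (1/p − 1)^2 over p from 1/2 to 1 gives 2 ln(2) − 1, which the numerical estimate should reproduce closely.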


Hard Instances for the PPSZ 
Algorithm 


One can construct instances on which the success 
probability of PPSZ is exponentially small. The 
construction is probabilistic and rather simple. Its 
analysis is quite technical, so we can only sketch 
it here. We start with some easy estimates. By
Lemma 1 we can write the success probability of
PPSZ as

$$\Pr_{\sigma, b}[\mathrm{ppsz}(F, \sigma, b) \in \mathrm{sat}(F)] = \sum_{a \in \mathrm{sat}(F)} \mathbb{E}_{\sigma}\left[2^{-Z(F, \sigma, a)}\right]. \qquad (2)$$


Above we used Jensen’s inequality to prove
$\mathbb{E}[2^{-Z}] \ge 2^{-\mathbb{E}[Z]}$. In this section we want to
construct hard instances, that is, instances on
which the success probability (2) is exponentially
small. Thus, we cannot use Jensen’s inequality,
as it gives a lower bound, not an upper bound. Instead,
we use the following trivial estimate:

$$\sum_{a \in \mathrm{sat}(F)} \mathbb{E}_{\sigma}\left[2^{-Z(F, \sigma, a)}\right] \le \sum_{a \in \mathrm{sat}(F)} \max_{\sigma} 2^{-Z(F, \sigma, a)} \le |\mathrm{sat}(F)| \cdot \max_{a \in \mathrm{sat}(F),\, \sigma} 2^{-Z(F, \sigma, a)}. \qquad (3)$$


We would like to construct a satisfiable k-CNF
formula F for which (i) $|\mathrm{sat}(F)|$ is small, i.e.,
F has few satisfying assignments, and (ii)
$Z(F, \sigma, a)$ is large for every permutation $\sigma$ and
every satisfying assignment a. It turns out there
are formulas satisfying both requirements:


Theorem 3 There are numbers $\epsilon_k$ converging to
0 such that the following holds: For every k,
there is a family $(F_n)_{n \ge 1}$, where each $F_n$ is a
satisfiable k-CNF formula over n variables such
that

1. $|\mathrm{sat}(F_n)| \le 2^{\epsilon_k n}$.
2. $Z(F_n, \sigma, a) \ge (1 - \epsilon_k) n$ for all $\sigma$ and all $a \in \mathrm{sat}(F_n)$.

Thus, the probability of PPSZ finding a satisfying
assignment of $F_n$ is at most $2^{-(1 - 2\epsilon_k) n}$. Furthermore,
$\epsilon_k \le \frac{C \log^2 k}{k}$ for some universal constant C.


This theorem shows that PPSZ has exponentially 
small success probability. Also, it shows that 
the strong exponential time hypothesis (SETH) 
holds for PPSZ: As k grows, the advantage 
over the trivial success probability $2^{-n}$ becomes
negligible. 
The Probabilistic Construction

Let $A \in \mathbb{F}_2^{n \times n}$. The system $Ax = 0$ defines
a Boolean function $f_A : \{0,1\}^n \to \{0,1\}$ as
follows: $f_A(x) = 1$ if and only if $A \cdot x = 0$.
Say A is k-sparse if every row of A has at most
k nonzero entries. If A is k-sparse, then $f_A$ can
be written as a k-CNF formula with n variables
and $2^{k-1} n$ clauses. Our construction will be
probabilistic. For this, we define a distribution over
k-sparse matrices in $\mathbb{F}_2^{n \times n}$. Our distribution will
have the form $\mathcal{D}^n$, where $\mathcal{D}$ is a distribution over
row vectors from $\mathbb{F}_2^n$. That is, we sample each row
of A independently from $\mathcal{D}$. Let us describe $\mathcal{D}$.
Define $e_i \in \mathbb{F}_2^n$ to be the vector with a 1 at the
i-th position and 0 elsewhere. Sample
$i_1, \dots, i_k \in \{1, \dots, n\}$ uniformly and independently and let
$X = e_{i_1} + \dots + e_{i_k}$. Clearly, $X \in \mathbb{F}_2^n$ has at
most k nonzero entries. This is our distribution
$\mathcal{D}$.
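A small sketch of this sampling procedure and of the translation of the linear system into clauses follows. It assumes A has n rows, represents each row by its support (the indices that occur an odd number of times, since adding the same unit vector twice cancels over F_2), and uses signed integers for literals with variables numbered 1 to n. Function names are illustrative.

```python
import random
from itertools import product

def sample_row(n, k, rng):
    """Support of e_{i_1} + ... + e_{i_k} over F_2 for uniform i_1, ..., i_k."""
    support = set()
    for _ in range(k):
        support ^= {rng.randrange(n)}      # adding e_i twice cancels it
    return sorted(support)

def parity_clauses(support):
    """Clauses stating that the XOR of the variables in support equals 0."""
    clauses = []
    for bits in product([0, 1], repeat=len(support)):
        if sum(bits) % 2 == 1:             # forbid each odd-parity assignment
            clauses.append([-(v + 1) if bit else (v + 1)
                            for v, bit in zip(support, bits)])
    return clauses

def sample_formula(n, k, rng):
    """Sample n rows independently and return (rows, CNF clauses)."""
    rows = [sample_row(n, k, rng) for _ in range(n)]
    clauses = [c for row in rows for c in parity_clauses(row)]
    return rows, clauses
```

A row with support size s contributes 2^(s−1) clauses of width s, so the formula has at most 2^(k−1) n clauses, each with at most k literals.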

Let A be a random matrix sampled as described,
and write $f_A$ as a k-CNF formula F.
Note that sat(F) = ker A. The challenge is
to show that F satisfies the two conditions of
Theorem 3.


Lemma 3 (A has high rank) With probability
$1 - o(1)$, $|\ker(A)| \le 2^{\epsilon_k n}$.


This shows that F satisfies the first condition of
the theorem, i.e., it has few satisfying assignments.
Lemma 3 is quite straightforward to prove,
though not trivial. The next lemma shows that
$Z(F, \sigma, a)$ is large.


Lemma 4 With probability $1 - o(1)$, it holds that
$Z(F, \sigma, a) \ge (1 - \epsilon_k) n$ for all permutations $\sigma$ and
all $a \in \mathrm{sat}(F)$.


Proving this lemma is the main technical chal- 
lenge. The proof uses ideas from proof complex- 
ity (indeed, the above construction is inspired by 
constructions in proof complexity). 


Open Problems 


Suppose the true worst-case success probability
of PPSZ on k-CNF formulas is $2^{-r_k n}$. Paturi,
Pudlák, Saks, and Zane have proved that
$r_k \le 1 - \Omega(1/k)$. Chen, Scheder, Talebanfard, and
Tang showed that $r_k \ge 1 - O\left(\frac{\log^2 k}{k}\right)$. Can one
close this gap by constructing harder instances or
maybe even improving the analysis of PPSZ?

What is the average-case success probability
of PPSZ on F when we sample A from $\mathcal{D}^n$? Note
that F is exponentially hard with probability 1 — 
o(1), but this might leave a 1/n probability that 
F is very easy for PPSZ. 

The construction of [1] is probabilistic. Can
one make it explicit? The proof of Lemma 4 uses
(implicitly in [1]) a nonstandard notion of expansion.
We do not know of an explicit construction of
those expanders.
Problem Definition


Notations The main properties of magnetic 
disks and multiple disk systems can be captured 
by the commonly used parallel disk model 
(PDM), which is summarized below in its current 
form as developed by Vitter and Shriver [22]: 


N = problem size (in units of data items);
M = internal memory size (in units of data items);
B = block transfer size (in units of data items);
D = number of independent disk drives;
P = number of CPUs,

where $M < N$ and $1 \le DB \le M/2$. The
data items are assumed to be of fixed length.
In a single I/O, each of the D disks can
simultaneously transfer a block of B contiguous
data items. (In the original 1988 article [2], the
D blocks per I/O were allowed to come from
the same disk, which is not realistic.) If $P \le D$,
each of the P processors can drive about
D/P disks; if $D < P$, each disk is shared by
about P/D processors. The internal memory
size is M/P per processor, and the P processors
are connected by an interconnection network.

It is convenient to refer to some of the above
PDM parameters in units of disk blocks rather
than in units of data items; the resulting formulas
are often simplified. We define the lowercase
notation

$$n = \frac{N}{B}, \quad m = \frac{M}{B}, \quad q = \frac{Q}{B}, \quad z = \frac{Z}{B} \qquad (1)$$

to be the problem input size, internal memory
size, query specification size, and query output
size, respectively, in units of disk blocks.

The primary measures of performance in PDM 
are: 


1. The number of I/O operations performed 

2. The amount of disk space used 

3. The internal (sequential or parallel) computa- 
tion time 


For reasons of brevity, this survey restricts its
focus to the first two measures. Most of
the algorithms run in $O(N \log N)$ CPU time with
one processor, which is optimal in the comparison
model, and in many cases are optimal for parallel
CPUs. In the word-based RAM model, sorting
can be done more quickly in $O(N \log \log N)$
CPU time. Arge and Thorup [5] provide sorting
algorithms that are theoretically optimal in
terms of both I/Os and time in the word-based
RAM model. In terms of auxiliary storage in
external memory, algorithms and data structures
should ideally use linear space, which means
$O(N/B) = O(n)$ disk blocks of storage. Vitter [20]
gives further details about the PDM
model and provides optimal algorithms and data
structures for a variety of problems. The content
of this chapter comes largely from an abbreviated
form of [19].


Problem 1 External sorting

INPUT: The input data records $R_0, R_1, R_2, \dots$
are initially “striped” across the D disks, in units
of blocks, so that record $R_i$ is in block $\lfloor i/B \rfloor$ and
block j is stored on disk j mod D.

OUTPUT: A striped representation of a permuted
ordering $R_{\sigma(0)}, R_{\sigma(1)}, R_{\sigma(2)}, \dots$ of the input
records with the property that
$\mathrm{key}(R_{\sigma(i)}) \le \mathrm{key}(R_{\sigma(i+1)})$ for all $i \ge 0$.
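The striped layout can be written down as a small address computation. In this sketch, disks and blocks are numbered from 0; the function name and the (disk, stripe, offset) return convention are illustrative assumptions.

```python
def striped_location(i, B, D):
    """Return (disk, stripe, offset) of record R_i under the striped layout."""
    block = i // B          # record R_i lies in block floor(i / B)
    disk = block % D        # block j is stored on disk j mod D
    stripe = block // D     # index of that block among the blocks on its disk
    offset = i % B          # position of the record inside its block
    return disk, stripe, offset
```

With B = 2 and D = 3, records 0 to 5 fill stripe 0 across disks 0, 1, 2, and record 6 starts stripe 1 on disk 0.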


Permuting is the special case of sorting in 
which the permutation that describes the final 
position of the records is given explicitly and 
does not have to be discovered, for example, by 
comparing keys. 


Problem 2 Permuting

INPUT: Same input assumptions as in external
sorting. In addition, a permutation $\sigma$ of the
integers $\{0, 1, 2, \dots, N - 1\}$ is specified.
OUTPUT: A striped representation of a permuted
ordering $R_{\sigma(0)}, R_{\sigma(1)}, R_{\sigma(2)}, \dots$ of the input
records.


Key Results
Theorem 1 ([2, 15]) The average-case and
worst-case number of I/Os required for sorting
$N = nB$ data items using D disks is

$$\mathrm{Sort}(N) = \Theta\left(\frac{n}{D} \log_m n\right). \qquad (2)$$


Theorem 2 ([2]) The average-case and worst-case
number of I/Os required for permuting
N data items using D disks is

$$\Theta\left(\min\left\{\frac{N}{D}, \mathrm{Sort}(N)\right\}\right). \qquad (3)$$

A more detailed lower bound is provided in (9) in
section “Lower Bounds on I/O.”


Matrix transposition is the special case of 
permuting in which the permutation can be rep- 
resented as a transposition of a matrix from row- 
major order into column-major order. 


Theorem 3 ([2]) With D disks, the number
of I/Os required to transpose a $p \times q$ matrix
from row-major order to column-major order is

$$\Theta\left(\frac{n}{D} \log_m \min\{M, p, q, n\}\right), \qquad (4)$$

where $N = pq$ and $n = N/B$.


Matrix transposition is a special case of a
more general class of permutations called
bit-permute/complement (BPC) permutations, which
in turn is a subset of the class of
bit-matrix-multiply/complement (BMMC) permutations.
BMMC permutations are defined by a $\log N \times \log N$
nonsingular 0-1 matrix A and a $(\log N)$-length
0-1 vector c. An item with binary
address x is mapped by the permutation to
the binary address given by $Ax \oplus c$, where $\oplus$
denotes bitwise exclusive-or. BPC permutations
are the special case of BMMC permutations in
which A is a permutation matrix, that is, each
row and each column of A contain a single 1.
BPC permutations include matrix transposition,
bit-reversal permutations (which arise in the
FFT), vector-reversal permutations, hypercube
permutations, and matrix re-blocking. Cormen
et al. [8] characterize the optimal number of
I/Os needed to perform any given BMMC
permutation solely as a function of the associated
matrix A, and they give an optimal algorithm for
implementing it.
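The address mapping itself is easy to state in code. The sketch below computes the target address Ax XOR c over F_2 for a matrix given as a list of 0-1 rows, and builds the permutation matrix for bit reversal as one BPC example. The bit-ordering convention (index 0 is the least significant bit) and the function names are assumptions; the sketch says nothing about how to perform the permutation I/O-efficiently.

```python
def bmmc_target(x, A, c):
    """Map address x to A x XOR c over F_2; index 0 is the least significant bit."""
    b = len(A)
    bits = [(x >> j) & 1 for j in range(b)]
    y = 0
    for i in range(b):
        bit = c[i]                       # complement vector entry
        for j in range(b):
            bit ^= A[i][j] & bits[j]     # row i of A times x, modulo 2
        y |= bit << i
    return y

def bit_reversal_matrix(b):
    """Permutation matrix sending bit j to position b - 1 - j (a BPC permutation)."""
    return [[1 if j == b - 1 - i else 0 for j in range(b)] for i in range(b)]
```

Because A is nonsingular, the mapping is a bijection on the addresses 0, ..., 2^b − 1 for any choice of c.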


Theorem 4 ([8]) With D disks, the number
of I/Os required to perform the BMMC
permutation defined by matrix A and vector c is

$$\Theta\left(\frac{n}{D}\left(1 + \frac{\operatorname{rank}(\gamma)}{\log m}\right)\right), \qquad (5)$$

where $\gamma$ is the lower-left $\log n \times \log B$ submatrix
of A.


The two main paradigms for external sorting 
are distribution and merging, which are discussed 
in the following sections for the PDM model. 
Sorting by Distribution

Distribution sort [12] is a recursive process that 
uses a set of S — 1 partitioning elements to 
partition the items into S disjoint buckets. All the 
items in one bucket precede all the items in the 
next bucket. The sort is completed by recursively 
sorting the individual buckets and concatenating 
them together to form a single fully sorted list. 

One requirement is to choose the S − 1 partitioning
elements so that the buckets are of
roughly equal size. When that is the case, the
bucket sizes decrease from one level of recursion
to the next by a relative factor of $\Theta(S)$, and thus
there are $O(\log_S n)$ levels of recursion. During
each level of recursion, the data are scanned.
As the items stream through internal memory,
they are partitioned into S buckets in an online
manner. When a buffer of size B fills for one of
the buckets, its block is written to the disks in
the next I/O, and another buffer is used to store
the next set of incoming items for the bucket.
Therefore, the maximum number of buckets (and
partitioning elements) is $S = \Theta(M/B) = \Theta(m)$,
and the resulting number of levels of recursion
is $\Theta(\log_m n)$. How to perform each level
of recursion in a linear number of I/Os is discussed
in [2, 14, 22].
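The recursive structure can be illustrated with an in-memory simulation. This sketch ignores disks and block buffers entirely and picks the S − 1 partitioning elements from a random sample, which is an assumed heuristic rather than any of the methods cited in the text; the parameter M only marks the size below which a bucket is sorted directly.

```python
import bisect
import random

def distribution_sort(items, M, S, rng):
    """Sort items with S-way distribution; lists of at most M items are sorted directly."""
    if len(items) <= M:
        return sorted(items)                      # bucket fits in "internal memory"
    # Pick S - 1 partitioning elements from a random sample (assumed heuristic).
    sample = sorted(rng.sample(items, min(len(items), 4 * S)))
    step = len(sample) / S
    pivots = sorted({sample[int(step * j)] for j in range(1, S)})
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in items:                               # one scan over the data
        buckets[bisect.bisect_right(pivots, x)].append(x)
    if max(len(b) for b in buckets) == len(items):
        return sorted(items)                      # no progress, e.g., all keys equal
    result = []
    for bucket in buckets:                        # buckets are already in key order
        result.extend(distribution_sort(bucket, M, S, rng))
    return result
```

When the buckets come out roughly equal in size, each level shrinks them by a factor of about S, which is the source of the logarithmic number of levels.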

An even better way to do distribution sort, 
and deterministically at that, is the BalanceSort 
method developed by Nodine and Vitter [14]. 
During the partitioning process, the algorithm 
keeps track of how evenly each bucket has been 
distributed so far among the disks. It maintains an 
invariant that guarantees good distribution across 
the disks for each bucket. 

The distribution sort methods mentioned 
above for parallel disks perform write operations 
in complete stripes, which makes it easy to write
parity information for use in error correction 
and recovery. But since the blocks written in 
each stripe typically belong to multiple buckets, 
the buckets themselves will not be striped on 
the disks, and thus the disks must be used 
independently during read operations. In the 
write phase, each bucket must therefore keep 
track of the last block written to each disk so 
that the blocks for the bucket can be linked 
together. 


External Sorting and Permuting 


An orthogonal approach is to stripe the con- 
tents of each bucket across the disks so that 
read operations can be done in a striped manner. 
As a result, the write operations must use disks 
independently, since during each write, multiple 
buckets will be writing to multiple stripes. Error 
correction and recovery can still be handled ef- 
ficiently by devoting to each bucket one block- 
sized buffer in internal memory. The buffer is 
continuously updated to contain the exclusive-or 
(parity) of the blocks written to the current stripe, 
and after D — 1 blocks have been written, the 
parity information in the buffer can be written to 
the final (Dth) block in the stripe. 
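The parity buffer can be sketched as follows, with blocks modeled as equal-length byte strings. The function names and the list-based stripe representation are illustrative assumptions; a real implementation would update the buffer incrementally as each block is written rather than receive all D − 1 blocks at once.

```python
def write_stripe_with_parity(blocks):
    """Given D - 1 equal-length data blocks, return them plus a final parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for pos, byte in enumerate(block):
            parity[pos] ^= byte          # buffer holds the running exclusive-or
    return list(blocks) + [bytes(parity)]

def recover_block(stripe, lost):
    """Rebuild the block at index lost as the XOR of all other blocks in the stripe."""
    survivors = [b for idx, b in enumerate(stripe) if idx != lost]
    rebuilt = bytearray(len(survivors[0]))
    for block in survivors:
        for pos, byte in enumerate(block):
            rebuilt[pos] ^= byte
    return bytes(rebuilt)
```

Since the XOR of all D blocks in a stripe is zero, any single lost block equals the XOR of the remaining D − 1 blocks.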

Under this new scenario, the basic loop of 
the distribution sort algorithm is, as before, 
to read one memory load at a time and 
partition the items into S buckets. However, 
unlike before, the blocks for each individual 
bucket will reside on the disks in contiguous 
stripes. Each block therefore has a predefined 
place where it must be written. With the 
normal round-robin ordering for the stripes 
(namely, ...,1,2,3,...,D,1,2,3,...,D,...), 
the blocks of different buckets may “collide,” 
meaning that they need to be written to the 
same disk, and subsequent blocks in those 
same buckets will also tend to collide. Vitter 
and Hutchinson [21] solve this problem by 
the technique of randomized cycling. For each 
of the S buckets, they determine the ordering 
of the disks in the stripe for that bucket via a 
random permutation of {1, 2, ..., D}. The S 
random permutations are chosen independently. 
If two blocks (from different buckets) happen to 
collide during a write to the same disk, one block 
is written to the disk and the other is kept on a 
write queue. With high probability, subsequent 
blocks in those two buckets will be written to 
different disks and thus will not collide. As long 
as there is a small pool of available buffer space to 
temporarily cache the blocks in the write queues, 
Vitter and Hutchinson [21] show that with high 
probability the writing proceeds optimally. 

The randomized cycling method or the related 
merge sort methods discussed at the end of sec- 
tion “Sorting by Merging” are the methods of 
choice for sorting with parallel disks. Distribution 
sort algorithms may have an advantage over the
merge approaches presented in section “Sorting 
by Merging” in that they typically make better 
use of lower levels of cache in the memory 
hierarchy of real systems, based upon analysis 
of distribution sort and merge sort algorithms on 
models of hierarchical memory. 


Sorting by Merging 

The merge paradigm is somewhat orthogonal to 
the distribution paradigm of the previous sec- 
tion. A typical merge sort algorithm works as 
follows [12]: In the “run formation” phase, the 
n blocks of data are scanned, one memory load 
at a time; each memory load is sorted into a 
single “run,” which is then output onto a series 
of stripes on the disks. At the end of the run 
formation phase, there are N/M = n/m (sorted) 
runs, each striped across the disks. (In actual 
implementations, “replacement selection” can be 
used to get runs of 2M data items, on the average, 
when M > B [12].) After the initial runs are 
formed, the merging phase begins. In each pass of 
the merging phase, R runs are merged at a time. 
For each merge, the R runs are scanned and their
items merged in an online manner as they stream
through internal memory. Double buffering is
used to overlap I/O and computation. At most
$R = \Theta(m)$ runs can be merged at a time, and
the resulting number of passes is $O(\log_m n)$.
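The two phases can be simulated in memory as follows. This is a sketch under simplifying assumptions: runs are plain Python lists rather than striped disk files, M counts items rather than blocks, and Python's standard heapq.merge stands in for the streaming R-way merge. The function names are illustrative.

```python
import heapq

def form_runs(items, M):
    """Run formation: sort one memory load of M items at a time."""
    return [sorted(items[i:i + M]) for i in range(0, len(items), M)]

def merge_pass(runs, R):
    """One merging pass: merge the runs in groups of R."""
    return [list(heapq.merge(*runs[i:i + R])) for i in range(0, len(runs), R)]

def external_merge_sort(items, M, R):
    """Return (sorted list, number of merge passes); requires R >= 2."""
    runs = form_runs(items, M)
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, R)
        passes += 1
    return (runs[0] if runs else []), passes
```

Starting from N/M runs, each pass divides the number of runs by about R, so the pass count grows logarithmically in the number of initial runs.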

To achieve the optimal sorting bound (2), each 
merging pass must be done in O(n/D) I/Os, 
which is easy to do for the single-disk case. In 
the more general multiple-disk case, each parallel 
read operation during the merging must on the 
average bring in the next $\Theta(D)$ blocks needed for
the merging. The challenge is to ensure that those 
blocks reside on different disks so that they can be 
read in a single I/O (or a small constant number 
of I/Os). The difficulty lies in the fact that the runs 
being merged were themselves formed during the 
previous merge pass. Their blocks were written to 
the disks in the previous pass without knowledge 
of how they would interact with other runs in later 
merges. 

The Greed Sort method of Nodine and Vit- 
ter [15] was the first optimal deterministic EM 
algorithm for sorting with multiple disks. It works 
by relaxing the merging process with a final
pass to fix the merging. Aggarwal and Plax- 
ton [1] developed an optimal deterministic merge 
sort based upon the Sharesort hypercube parallel 
sorting algorithm. To guarantee even distribu- 
tion during the merging, it employs two high- 
level merging schemes in which the schedul- 
ing is almost oblivious. Like Greed Sort, the 
Sharesort algorithm is theoretically optimal (i.e.,
within a constant factor of optimal), but the constant
factor is larger than that of the distribution sort
methods.

One of the most practical methods for sorting 
is based upon the simple randomized merge sort 
(SRM) algorithm of Barve et al. [7], referred to as 
“randomized striping” by Knuth [12]. Each run is 
striped across the disks, but with a random start- 
ing point (the only place in the algorithm where 
randomness is utilized). During the merging pro- 
cess, the next block needed from each disk is read 
into memory, and if there is not enough room, the 
least needed blocks are “flushed” (without any 
I/Os required) to free up space. 

Further improvements in merge sort are pos- 
sible by a more careful prefetching schedule 
for the runs. Barve et al. [6], Kallahalla and 
Varman [11], Shah et al. [17], and others have 
developed competitive and optimal methods for 
prefetching blocks in parallel I/O systems. 

Hutchinson et al. [10] have demonstrated a 
powerful duality between parallel writing and 
parallel prefetching, which gives an easy way to 
compute optimal prefetching and caching sched- 
ules for multiple disks. More significantly, they 
show that the same duality exists between dis- 
tribution and merging, which they exploit to get 
a provably optimal and very practical parallel 
disk merge sort. Rather than use random start- 
ing points and round-robin stripes as in SRM, 
Hutchinson et al. [10] order the stripes for each 
run independently, based upon the randomized 
cycling strategy discussed in section “Sorting 
by Distribution” for distribution sort. These approaches
have led to successively faster external
memory sorting algorithms [9]. Clever algorithm
engineering optimizations on multicore architec- 
tures have won recent big data sorting competi- 
tions [16]. 
Handling Duplicates: Bundle Sorting

For the problem of duplicate removal, in 
which there are a total of K distinct items 
among the N items, Arge et al. [4] use a
modification of merge sort to solve the problem
in $O(n \max\{1, \log_m(K/B)\})$ I/Os, which
is optimal in the comparison model. When
duplicates get grouped together during a merge, 
they are replaced by a single copy of the item and 
a count of the occurrences. The algorithm can 
be used to sort the file, assuming that a group of 
equal items can be represented by a single item 
and a count. 

A harder instance of sorting called bundle 
sorting arises when there are K distinct key 
values among the N items, but all the items 
have different secondary information that must 
be maintained, and therefore items cannot be aggregated
with a count. Matias et al. [13] develop
optimal distribution sort algorithms for bundle
sorting using

$$O\left(n \max\{1, \log_m \min\{K, n\}\}\right) \qquad (6)$$

I/Os and prove the matching lower bound. They
also show how to do bundle sorting (and sorting 
in general) in place (i.e., without extra disk 
space). 


Permuting and Transposition 

Permuting is the special case of sorting in which
the key values of the N data items form a permutation
of $\{1, 2, \dots, N\}$. The I/O bound (3) for
permuting can be realized by one of the optimal
sorting algorithms except in the extreme case
$B \log m = o(\log n)$, where it is faster to move
the data items one by one in a nonblocked way.
The one-by-one method is trivial if D = 1,
but with multiple disks, there may be bottlenecks
on individual disks; one solution for doing the
permuting in $O(N/D)$ I/Os is to apply the randomized
balancing strategies of [22].

Matrix transposition can be as hard as general
permuting when B is relatively large (say, $\frac{1}{2}M$)
and N is $O(M^2)$, but for smaller B, the special
structure of the transposition permutation makes
transposition easier. In particular, the matrix can
be broken up into square submatrices of $B^2$
elements such that each submatrix contains B
blocks of the matrix in row-major order and also
B blocks of the matrix in column-major order.
Thus, if $B^2 \le M$, the transpositions can be done
in a simple one-pass operation by transposing
the submatrices one at a time in internal memory.
Thonangi and Yang [18] discuss other types
of permutations realizable with fewer I/Os than
sorting.
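The one-pass method for small B can be sketched as a tiled transposition of a flat row-major array. The sketch assumes that B divides both p and q and treats one B x B tile as the unit loaded into internal memory; it does not model disks or block alignment, and the function name is illustrative.

```python
def transpose_blocked(flat, p, q, B):
    """Transpose a p x q row-major matrix tile by tile; assumes B divides p and q."""
    out = [None] * (p * q)
    for r0 in range(0, p, B):
        for c0 in range(0, q, B):
            # Load one B x B submatrix (B row segments of B items) into "memory".
            tile = [flat[(r0 + r) * q + c0 + c] for r in range(B) for c in range(B)]
            # Write it back transposed; the result is q x p in row-major order,
            # which is the original matrix in column-major order.
            for r in range(B):
                for c in range(B):
                    out[(c0 + c) * p + r0 + r] = tile[r * B + c]
    return out
```

Each tile is read and written exactly once, which is why a single pass over the data suffices when a tile of B^2 items fits in memory.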


Fast Fourier Transform and Permutation 
Networks 

Computing the fast Fourier transform (FFT) in 
external memory consists of a series of I/Os that 
permit each computation implied by the FFT 
directed graph (or butterfly) to be done while its 
arguments are in internal memory. A permutation 
network computation consists of an oblivious 
(fixed) pattern of I/Os such that any of the N!
possible permutations can be realized; data items 
can only be reordered when they are in internal 
memory. A permutation network can be realized 
by a series of three FFTs. 

The algorithms for FFT are faster and sim- 
pler than for sorting because the computation is 
nonadaptive in nature, and thus the communica- 
tion pattern is fixed in advance [22]. 


Lower Bounds on I/O 
The following proof of the permutation lower 
bound (3) of Theorem 2 is due to Aggarwal and 
Vitter [2]. The idea of the proof is to calculate, for 
each t > 0, the number of distinct orderings that 
are realizable by sequences of t I/Os. The value
of t for which the number of distinct orderings
first exceeds N!/2 is a lower bound on the av- 
erage number of I/Os (and hence the worst-case 
number of I/Os) needed for permuting. 
Assuming for the moment that there is only
one disk, D = 1, consider how the number
of realizable orderings can change as a result
of an I/O. In terms of increasing the number of
realizable orderings, the effect of reading a disk
block is considerably more than that of writing
a disk block, so it suffices to consider only the
effect of read operations. During a read operation,
there are at most B data items in the read block,
and they can be interspersed among the M items
in internal memory in at most $\binom{M}{B}$ ways, so the
number of realizable orderings increases by a factor
of $\binom{M}{B}$. If the block has never before resided
in internal memory, the number of realizable
orderings increases by an extra B! factor, since
the items in the block can be permuted among
themselves. (This extra contribution of B! can
only happen once for each of the N/B original
blocks.) There are at most $n + t \le N \log N$ ways
to choose which disk block is involved in the t-th
I/O (allowing an arbitrary amount of disk space).
Hence, the number of distinct orderings that can 
be realized by all possible sequences of t I/Os is
at most

$(B!)^{N/B} \left( N (\log N) \binom{M}{B} \right)^t$. (7)


Setting the expression in (7) to be at least N!/2,
and simplifying by taking the logarithm, the result is

$N \log B + t \left( \log N + B \log \frac{M}{B} \right) = \Omega(N \log N)$. (8)


Solving for t gives the matching lower bound
$\Omega(n \log_m n)$ for permuting for the case D = 1.
The general lower bound (3) of Theorem 2
follows by dividing by D.
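The step of solving the logarithmic inequality for t can be spelled out as follows. This is a sketch under two assumptions not restated in this passage: the usual notation n = N/B and m = M/B, and the typical regime $B \log m = \Omega(\log N)$. It also uses the standard estimate $\log(N!/2) = N \log N - O(N)$.

```latex
\begin{align*}
t \Bigl( \log N + B \log \tfrac{M}{B} \Bigr)
  &\ge \log \tfrac{N!}{2} - N \log B
   = N \log \tfrac{N}{B} - O(N)
   = N \log n - O(N), \\
t &\ge \frac{N \log n - O(N)}{\log N + B \log m}
   = \Omega\!\left( \frac{N \log n}{B \log m} \right)
   = \Omega\bigl( n \log_m n \bigr)
   \quad \text{when } B \log m = \Omega(\log N).
\end{align*}
```

In the opposite regime the $\log N$ term dominates the denominator, and the same inequality yields only a bound that is linear in N.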

Hutchinson et al. [10] derive an asymptotic 
lower bound (i.e., one that accounts for constant 
factors) from a more refined argument that ana- 
lyzes both input operations and output operations. 
Assuming that m = M/B is an increasing
function, the number of I/Os required to sort or
permute n indivisible items, up to lower-order
terms, is at least

$\frac{2N}{D} \cdot \frac{\log n}{B \log m + 2 \log N} \sim \begin{cases} \frac{2n}{D} \log_m n & \text{if } B \log m = \omega(\log N); \\ \frac{N}{D} & \text{if } B \log m = o(\log N). \end{cases}$ (9)

For the typical case in which $B \log m = \omega(\log N)$,
the lower bound, up to lower-order
terms, is $2n \log_m n$ I/Os. For the pathological case in
which $B \log m = o(\log N)$, the I/O lower bound
is asymptotically N/D.

Permuting is a special case of sorting, and
hence the permuting lower bound applies also
to sorting. In the unlikely case that $B \log m = o(\log n)$,
the permuting bound is only $\Omega(N/D)$,
and in that case the comparison model must be
used to get the full lower bound (2) of Theorem 1 [2].
In the typical case in which $B \log m = \Omega(\log n)$,
the comparison model is not needed to
prove the sorting lower bound; the difficulty of
sorting in that case arises not from determining
the order of the data but from permuting (or
routing) the data.

The proof used above for permuting also
works for permutation networks, in which the
communication pattern is oblivious (fixed). Since
the choice of disk block is fixed for each t,
there is no N log N term as there is in (7),
and correspondingly there is no additive log N
term in the inner expression as there is in (8).
Hence, solving for t gives the lower bound (2)
rather than (3). The lower bound follows directly
from the counting argument; unlike the sorting
derivation, it does not require the comparison
model for the case $B \log m = o(\log n)$. The
lower bound also applies directly to FFT, since
permutation networks can be formed from three
FFTs in sequence. The transposition lower bound
involves a potential argument based upon a
togetherness relation [2].

For the problem of bundle sorting, in which 
the N items have a total of K distinct key values 
(but the secondary information of each item is 
different), Matias et al. [13] derive the matching 
lower bound. 

The lower bounds mentioned above assume 
that the data items are in some sense “indivisible,” 
in that they are not split up and reassembled in 
some magic way to get the desired output. It 
is conjectured that the sorting lower bound (2) 
remains valid even if the indivisibility assump- 
tion is lifted. However, for an artificial problem 
related to transposition, removing the indivisi- 
bility assumption can lead to faster algorithms. 
Whether the conjecture is true is a challenging
theoretical open problem. 


Applications 


Sorting and sorting-like operations account for 
a significant percentage of computer use [12], 
with numerous database applications. In addition, 
sorting is an important paradigm in the design of 
efficient EM algorithms, as shown in [20], where 
several applications can be found. With some 
technical qualifications, many problems that can 
be solved easily in linear time in internal memory, 
such as permuting, list ranking, expression tree 
evaluation, and finding connected components in 
a sparse graph, require the same number of I/Os 
in PDM as does sorting. 


Open Problems 


Several interesting challenges remain. One diffi- 
cult theoretical problem is to prove lower bounds 
for permuting and sorting without the indivisibil- 
ity assumption. Another question is to determine 
the I/O cost for each individual permutation, as 
a function of some simple characterization of 
the permutation, such as number of inversions. 
A continuing goal is to develop optimal EM 
algorithms and to translate theoretical gains into 
observable improvements in practice. 

Many interesting challenges and opportuni- 
ties in algorithm design and analysis arise from 
new architectures being developed. For example, 
Arge et al. [3] propose the parallel external 
memory (PEM) model for the design of efficient 
algorithms for chip multiprocessors, in which 
each processor has a private cache and shares a 
larger main memory with the other processors. 
The paradigms described earlier form the ba- 
sis for efficient algorithms for sorting, selection, 
and prefix sums. Further architectures to explore 
include other forms of multicore architectures, 
networks of workstations, hierarchical storage 
devices, disk drives with processing capabilities, 
and storage devices based upon microelectromechanical
systems (MEMS). Active (or intelligent)
disks, in which disk drives have some processing 
capability and can filter information sent to the 
host, have been proposed to further reduce the 
I/O bottleneck, especially in large database ap- 
plications. MEMS-based nonvolatile storage has 
the potential to serve as an intermediate level 
in the memory hierarchy between DRAM and 
disks. It could ultimately provide better latency 
and bandwidth than disks, at less cost per bit than 
DRAM. 


URL to Code 


Two systems for developing external memory
algorithms are TPIE and STXXL, which can
be downloaded from http://www.madalgo.au.dk/tpie/
and http://stxxl.sourceforge.net/,
respectively. Both systems include subroutines
for sorting and permuting and facilitate
development of more advanced algorithms.
Problem Definition


Facility location problems concern situations 
where a planner needs to determine the location 
of facilities intended to serve a given set of 
clients. The objective is usually to minimize 
the sum of the cost of opening the facilities and 
the cost of serving the clients by the facilities, 
subject to various constraints, such as the number 
and the type of clients a facility can serve. There 
are many variants of the facility location problem, 
depending on the structure of the cost function
and the constraints imposed on the solution. 
Early references on facility location problems 
include Kuehn and Hamburger [35], Balinski and 
Wolfe [8], Manne [40], and Balinski [7]. Review 
works include Krarup and Pruzan [34] and 
Mirchandani and Francis [42]. It is interesting to 
notice that the algorithm that is probably one of 
the most effective ones to solve the uncapacitated 
facility location problem to optimality is the 
primal-dual algorithm combined with branch- 
and-bound due to Erlenkotter [16] dating back 
to 1978. His primal-dual scheme is similar to 
techniques used in the modern literature on 
approximation algorithms. 

More recently, extensive research into approx- 
imation algorithms for facility location problems 
has been carried out. Review articles on this 
topic include Shmoys [49, 50] and Vygen [55]. 
Besides its theoretical and practical importance, 
facility location problems provide a showcase of 
common techniques in the field of approximation 
algorithms, as many of these techniques such as 
linear programming rounding, primal-dual meth- 
ods, and local search have been applied suc- 
cessfully to this family of problems. This entry 
defines several facility location problems, gives 
a few historical pointers, and lists approxima- 
tion algorithms with an emphasis on the results 
derived in the paper by Shmoys, Tardos, and 
Aardal [51]. The techniques applied to the un- 
capacitated facility location (UFL) problem are 
discussed in some more detail. 
In the UFL problem, a set F of $n_f$ facilities
and a set C of $n_c$ clients (also known as
cities, or demand points) are given. For every
facility $i \in F$, the facility opening cost is equal
to $f_i$. Furthermore, for every facility $i \in F$ and
client $j \in C$, there is a connection cost $c_{ij}$. The
objective is to open a subset of the facilities
and connect each client to an open facility so
that the total cost is minimized. Notice that once
the set of open facilities is specified, it is optimal
to connect each client to the open facility
that yields the smallest connection cost. Therefore,
the objective is to find a set $S \subseteq F$ that minimizes
$\sum_{i \in S} f_i + \sum_{j \in C} \min_{i \in S} \{c_{ij}\}$. This definition
and the definitions of other variants of
the facility location problem in this entry assume
unit demand at each client. It is straightforward
to generalize these definitions to the case where
each client has a given demand. The UFL problem
can be formulated as the following integer
program due to Balinski [7]. Let $y_i$, $i \in F$, be
equal to 1 if facility i is open, and equal to 0
otherwise. Let $x_{ij}$, $i \in F$, $j \in C$, be the fraction
of client j assigned to facility i.


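$\min \sum_{i \in F} f_i y_i + \sum_{i \in F} \sum_{j \in C} c_{ij} x_{ij}$ (1)
subject to $\sum_{i \in F} x_{ij} = 1$, for all $j \in C$, (2)
$x_{ij} - y_i \le 0$, for all $i \in F$, $j \in C$, (3)
$x \ge 0$, $y \in \{0,1\}^{n_f}$. (4)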


In the linear programming (LP) relaxation of
UFL the constraint $y \in \{0,1\}^{n_f}$ is substituted
by the constraint $y \in [0,1]^{n_f}$. Notice that in
the uncapacitated case, it is not necessary to
require $x_{ij} \in \{0,1\}$, $i \in F$, $j \in C$, if each client
has to be serviced by precisely one facility, as
$0 \le x_{ij} \le 1$ by constraints (2) and (4). Moreover,
if $x_{ij}$ is not integer, then it is always possible
to create an integer solution with the same cost
by assigning client j completely to one of the
facilities currently servicing j.
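The UFL objective, and the observation that each client simply connects to its cheapest open facility, can be illustrated with a small exhaustive search. This is a sketch for tiny instances only (it enumerates all nonempty subsets of F and is exponential in the number of facilities); the function names and the sample data are hypothetical, with `c[i][j]` denoting the connection cost between facility i and client j.

```python
from itertools import combinations


def ufl_cost(open_set, f, c):
    """Total cost of opening `open_set`: opening costs plus, for each client,
    the cheapest connection to an open facility."""
    n_clients = len(c[0])
    opening = sum(f[i] for i in open_set)
    connection = sum(min(c[i][j] for i in open_set) for j in range(n_clients))
    return opening + connection


def ufl_brute_force(f, c):
    """Return (optimal cost, one optimal set S) by trying every nonempty S."""
    best = None
    for size in range(1, len(f) + 1):
        for s in combinations(range(len(f)), size):
            cost = ufl_cost(s, f, c)
            if best is None or cost < best[0]:
                best = (cost, set(s))
    return best


# Hypothetical instance: 3 facilities, 3 clients.
f = [3, 2, 4]
c = [[1, 5, 6],
     [4, 1, 5],
     [6, 4, 1]]
best_cost, best_set = ufl_brute_force(f, c)
```

Such an enumeration is useful mainly as a reference against which approximation algorithms can be checked on small test instances.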

A $\gamma$-approximation algorithm is a polynomial
algorithm that, in case of minimization, is guaranteed
to produce a feasible solution having value
at most $\gamma z^*$, where $z^*$ is the value of an optimal
solution, and $\gamma \ge 1$. If $\gamma = 1$ the algorithm produces
an optimal solution. In case of maximization,
the algorithm produces a solution having
value at least $\gamma z^*$, where $0 \le \gamma \le 1$.

Hochbaum [25] developed an $O(\log n_c)$-approximation
algorithm for UFL. By a straightforward
reduction from the Set Cover problem,
it can be shown that this cannot be improved
unless $NP \subseteq DTIME[n^{O(\log \log n)}]$ due
to a result by Feige [17]. However, if the
connection costs are restricted to come from
distances in a metric space, namely $c_{ij} = c_{ji} \ge 0$
for all $i \in F$, $j \in C$ (nonnegativity and
symmetry) and $c_{ij} + c_{ji'} + c_{i'j'} \ge c_{ij'}$ for
all $i, i' \in F$, $j, j' \in C$ (triangle inequality),
then constant approximation guarantees can
be obtained. In all results mentioned below,
except for the maximization objectives, it is
assumed that the costs satisfy these restrictions.
If the distances between facilities and clients
are Euclidean, then for some location problems
approximation schemes have been obtained [5].
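The metric restriction can be checked mechanically on a given cost matrix. The following is a minimal sketch, assuming the costs are stored only as a facility-by-client matrix `c[i][j]`, so that symmetry is implicit in the representation; the function name `is_metric_instance` and the sample data are hypothetical.

```python
from itertools import product


def is_metric_instance(c):
    """Check nonnegativity and the facility-client triangle inequality
    c[i][j] + c[i2][j] + c[i2][j2] >= c[i][j2] for all i, i2, j, j2."""
    n_f, n_c = len(c), len(c[0])
    if any(c[i][j] < 0 for i in range(n_f) for j in range(n_c)):
        return False
    for i, i2 in product(range(n_f), repeat=2):
        for j, j2 in product(range(n_c), repeat=2):
            if c[i][j] + c[i2][j] + c[i2][j2] < c[i][j2]:
                return False
    return True


# Facilities at positions 0 and 10 on a line, clients at 1, 4, and 9.
line_costs = [[1, 4, 9],
              [9, 6, 1]]
# A cost matrix that violates the triangle inequality (3 < 100).
bad_costs = [[1, 100],
             [1, 1]]
```

The check takes time proportional to $n_f^2 n_c^2$, which is acceptable for validating test instances.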


Variants and Related Problems 

A variant of the uncapacitated facility location
problem is obtained by considering the objective
coefficients $c_{ij}$ as the per unit profit of servicing
client j from facility i. The maximization version
of UFL, max-UFL, is obtained by maximizing
the profit minus the facility opening cost, i.e.,
$\max \sum_{i \in F} \sum_{j \in C} c_{ij} x_{ij} - \sum_{i \in F} f_i y_i$. This variant
was introduced by Cornuéjols, Fisher, and
Nemhauser [15].

In the k-median problem the facility opening
cost is removed from the objective function (1)
to obtain $\min \sum_{i \in F} \sum_{j \in C} c_{ij} x_{ij}$, and the constraint
that no more than k facilities may be
opened, $\sum_{i \in F} y_i \le k$, is added. In the k-center
problem the constraint $\sum_{i \in F} y_i \le k$ is again
included, and the objective function here is to
minimize the maximum distance used on a link
between an open facility and a client.

In the capacitated facility location problem
a capacity constraint $\sum_{j \in C} x_{ij} \le u_i y_i$ is added
for all $i \in F$. Here it is important to distinguish
between the splittable and the unsplittable
case, and also between hard capacities and soft
capacities. In the splittable case one has $x \ge 0$,
allowing for a client to be serviced by multiple
depots, and in the unsplittable case one requires
$x \in \{0,1\}^{n_f \times n_c}$. If each facility can be opened
at most once (i.e., $y_i \in \{0,1\}$), the capacities
are called hard; otherwise, if the problem allows
a facility i to be opened any number r of times to
serve $r u_i$ clients, the capacities are called soft.

In the k-level facility location problem, the
following are given: a set C of clients, k disjoint
sets $F_1, \ldots, F_k$ of facilities, an opening cost
for each facility, and connection costs between
clients and facilities. The goal is to connect each
client j through a path $i_1, \ldots, i_k$ of open facilities,
with $i_\ell \in F_\ell$. The connection cost for this
client is $c_{j i_1} + c_{i_1 i_2} + \cdots + c_{i_{k-1} i_k}$. The goal is
to minimize the sum of connection costs and
facility opening costs.

The problems mentioned above have all been 
considered by Shmoys, Tardos, and Aardal [51], 
with the exceptions of max-UFL, and the k-center 
and k-median problems. The max-UFL variant 
is included for historical reasons, and k-center 
and k-median are included since they have a rich 
history and since they are closely related to UFL. 
Results on the capacitated facility location prob- 
lem with hard capacities are mentioned as this, 
at least from the application point of view, is 
a more realistic model than the soft capacity 
version, which was treated in [51]. For k-level 
facility location, Shmoys et al. considered the 
case k = 2. Here, the problem for general k is 
considered. 

There are many other variants of the facility 
location problem that are not discussed here. 
Examples include K-facility location [33], 
universal facility location [24, 38], online 
facility location [3, 18, 41], fault tolerant 
facility location [28, 30, 54], facility location 
with outliers [12, 28], multicommodity facility 
location [48], priority facility location [37, 
48], facility location with hierarchical facility 
costs [52], stochastic facility location [23, 
37, 46], connected facility location [53], load- 
balanced facility location [22, 32, 37], concave- 
cost facility location [24], and capacitated-cable 
facility location [37, 47]. 
Key Results


Many algorithms have been proposed for location 
problems. To begin with, a brief description of the 
algorithms of Shmoys, Tardos, and Aardal [51] 
is given. Then, a quick overview of some key 
results is presented. Some of the algorithms giving
the best values of the approximation guarantee
$\gamma$ are based on solving the LP-relaxation
by a polynomial algorithm, which can actually
be quite time consuming, whereas some authors
have suggested fast combinatorial algorithms for
facility location problems with less competitive
$\gamma$-values. Due to space restrictions the focus of
this entry is on the algorithms that yield the best 
approximation guarantees. For more references 
the survey papers by Shmoys [49, 50] and by 
Vygen [55] are recommended. 


The Algorithms of Shmoys, Tardos, and Aardal

First the algorithm for UFL is described, and then 
the results that can be obtained by adaptations of 
the algorithm to other problems are mentioned. 

The algorithm solves the LP relaxation and 
then, in two stages, modifies the obtained frac- 
tional solution. The first stage is called filtering 
and it is designed to bound the connection cost 
of each client to the most distant facility fractionally
serving it. To do so, the facility opening
variables $y_i$ are scaled up by a constant and then
the connection variables $x_{ij}$ are adjusted to use the
closest possible facilities.

To describe the second stage, the notion 
of clustering, formalized later by Chudak and 
Shmoys [13] is used. Based on the fractional 
solution, the instance is cut into pieces called 
clusters. Each cluster has a distinct client called 
the cluster center. This is done by iteratively 
choosing a client, not covered by the previous 
clusters, as the next cluster center, and adding 
to this cluster the facilities that serve the cluster 
center in the fractional solution, along with other 
clients served by these facilities. This construc- 
tion of clusters guarantees that the facilities in 
each cluster are open to a total extent of one, 
and therefore after opening the facility with the 
smallest opening cost in each cluster, the total
facility opening cost that is paid does not exceed
the facility opening cost of the fractional solution. 
Moreover, by choosing clients for the cluster 
centers in a greedy fashion, the algorithm makes 
each cluster center the minimizer of a certain 
cost function among the clients in the cluster. The 
remaining clients in the cluster are also connected 
to the opened facility. The triangle inequality for 
connection costs is now used to bound the cost 
of this connection. For UFL, this filtering and 
rounding algorithm is a 4-approximation algo- 
rithm. Shmoys et al. also show that if the filtering 
step is substituted by randomized filtering, an 
approximation guarantee of 3.16 is obtained. 

In the same paper, adaptations of the algorithm,
with and without randomized filtering, were
made to yield approximation algorithms for the
soft-capacitated facility location problem, and 
for the 2-level uncapacitated problem. Here, the 
results obtained using randomized filtering are 
discussed. 

For the problem with soft capacities two
versions of the problem were considered. Both
have equal capacities, i.e., $u_i = u$ for all $i \in F$.
In the first version, a solution is “feasible” if
the y-variables either take value 0, or a value
between 1 and $\gamma' \ge 1$. Note that $\gamma'$ is not required
to be integer, so the constructed solution is
not necessarily integer. This can be interpreted
as allowing for each facility i to expand to
have capacity $\gamma' u$ at a cost of $\gamma' f_i$. A $(\gamma, \gamma')$-approximation
algorithm is a polynomial
algorithm that produces such a feasible solution
having a total cost within a factor of $\gamma$ of the true
optimal cost, i.e., with $y \in \{0,1\}^{n_f}$. Shmoys
et al. developed a (5.69, 4.24)-approximation
algorithm for the splittable case of this problem,
and a (7.62, 4.29)-approximation algorithm for
the unsplittable case.

In the second soft-capacitated model, the 
original problem is changed to allow for the 
y-variables to take nonnegative integer values, 
which can be interpreted as allowing multiple 
facilities of capacity u to be opened at each 
location. The approximation algorithms in
this case produce a solution that is feasible
with respect to this modified model. It is easy
to show that the approximation guarantees
obtained for the previous model also hold
in this case, i.e., Shmoys et al. obtained
a 5.69-approximation algorithm for splittable 
demands and a 7.62-approximation algorithm for 
unsplittable demands. This latter model is the 
one considered in most later papers, so this is the 
model that is referred to in the paragraph on soft 
capacity results below. 


UFL 

The first algorithm with constant performance 
guarantee was the 3.16-approximation algorithm 
by Shmoys, Tardos, and Aardal, see above. Since 
then numerous improvements have been made. 
Guha and Khuller [19, 20] proved a lower bound 
on approximability of 1.463, and introduced 
a greedy augmentation procedure. A series of 
approximation algorithms based on LP-rounding 
was then developed (see e.g., [10, 13]). There 
are also greedy algorithms that only use the LP- 
relaxation implicitly to obtain a lower bound 
for a primal-dual analysis. An example is the 
JMS 1.61-approximation algorithm developed by 
Jain, Mahdian, and Saberi [29]. Some algorithms 
combine several techniques, like the 1.52- 
approximation algorithm of Mahdian, Ye, and 
Zhang [39], which uses the JMS algorithm and 
the greedy augmentation procedure. Currently, 
the best known approximation guarantee is 
1.5 reported by Byrka [10]. It is obtained by 
combining a randomized LP-rounding algorithm 
with the greedy JMS algorithm. 


max-UFL 

The first constant factor approximation algorithm 
was derived in 1977 by Cornuéjols et al. [15] 
for max-UFL. They showed that opening one 
facility at a time in a greedy fashion, choosing the 
facility to open as the one with highest marginal 
profit, until no facility with positive marginal 
profit can be found, yields a $(1 - 1/e) \approx 0.632$-approximation
algorithm. The current best approximation
factor is 0.828 by Ageev and Sviridenko [2].
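The greedy rule described above (repeatedly open the facility with the highest positive marginal profit) can be sketched as follows. This is an illustrative sketch rather than the original formulation of [15]: it assumes nonnegative profits `c[i][j]`, takes the value of the empty facility set to be 0, and uses hypothetical function names and data.

```python
def max_ufl_profit(open_set, f, c):
    """Profit of `open_set`: each client earns its best profit over the open
    facilities, minus opening costs (assumption: the empty set has profit 0)."""
    if not open_set:
        return 0
    n_clients = len(c[0])
    revenue = sum(max(c[i][j] for i in open_set) for j in range(n_clients))
    return revenue - sum(f[i] for i in open_set)


def greedy_max_ufl(f, c):
    """Open facilities one at a time, always taking the largest positive
    marginal profit, and stop when no facility improves the profit."""
    opened, current = set(), 0
    while True:
        best_gain, best_i = 0, None
        for i in range(len(f)):
            if i in opened:
                continue
            gain = max_ufl_profit(opened | {i}, f, c) - current
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return opened, current
        opened.add(best_i)
        current += best_gain


# Hypothetical instance: 3 facilities, 3 clients, c[i][j] is a profit.
f = [4, 3, 5]
c = [[6, 1, 0],
     [2, 5, 1],
     [0, 2, 7]]
opened, profit = greedy_max_ufl(f, c)
```

On this instance the greedy rule first opens facility 1 (marginal profit 5), then facility 2 (marginal profit 1), and then stops because opening facility 0 would add nothing.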


k-Median, k-Center 
The first constant factor approximation algorithm
for the k-median problem is due to Charikar,
Guha, Tardos, and Shmoys [11]. This LP-rounding
algorithm has the approximation ratio
of $6\frac{2}{3}$. The currently best known approximation
ratio is $3 + \epsilon$, achieved by a local search heuristic
of Arya et al. [6] (see also the separate entry
k-median and Facility Location).

The first constant factor approximation 
algorithm for the k-center problem was given 
by Hochbaum and Shmoys [26], who developed 
a 2-approximation algorithm. This performance 
guarantee is the best possible unless P = NP. 


Capacitated Facility Location 

For the soft-capacitated problem with equal ca- 
pacities, the first constant factor approximation 
algorithms are due to Shmoys et al. [51] for 
both the splittable and unsplittable demand cases, 
see above. Recently, a 2-approximation algorithm 
for the soft capacitated facility location problem 
with unsplittable unit demands was proposed by 
Mahdian et al. [39]. The integrality gap of the LP 
relaxation for the problem is also 2. Hence, to 
improve the approximation guarantee one would 
have to develop a better lower bound on the 
optimal solution. 

In the hard capacities version it is important 
to allow for splitting the demands, as otherwise 
even the feasibility problem becomes difficult. 
Suppose demands are splittable; then we may
distinguish between the equal capacity case,
where $u_i = u$ for all $i \in F$, and the general case.
For the problem with equal capacities, a 5.83-approximation
algorithm was given by Chudak
and Williamson [14]. The first constant factor
approximation algorithm, with $\gamma = 8.53 + \epsilon$, for
general capacities was given by Pal, Tardos,
and Wexler [44]. This was later improved 
by Zhang, Chen, and Ye [57] who obtained 
a 5.83-approximation algorithm also for general 
capacities. 


k-Level Problem 

The first constant factor approximation algorithm
for k = 2 is due to Shmoys et al. [51], with
$\gamma = 3.16$. For general k, the first algorithm, having
$\gamma = 3$, was proposed by Aardal, Chudak, and
Shmoys [1]. For k = 2, Zhang [56] developed
a 1.77-approximation algorithm. He also showed
that the problem for k = 3 and k = 4 can be
approximated by $\gamma = 2.523$ (this value of $\gamma$
deviates slightly from the value 2.51 given in the
paper; the original argument contained a minor
calculation error) and $\gamma = 2.81$, respectively.


Applications 


Facility location has numerous applications in the 
field of operations research. See the book edited 
by Mirchandani and Francis [42] or the book by 
Nemhauser and Wolsey [43] for a survey and 
a description of applications of facility location 
in problems such as plant location and locating 
bank accounts. Recently, the problem has found 
new applications in network design problems 
such as placement of routers and caches [22, 
36], agglomeration of traffic or data [4, 21], and 
web server replications in a content distribution 
network [31, 45]. 


Open Problems 


A major open question is to determine the exact 
approximability threshold of UFL and close the 
gap between the upper bound of 1.5 [10] and the 
lower bound of 1.463 [20]. Another important 
question is to find better approximation algo- 
rithms for k-median. In particular, it would be 
interesting to find an LP-based 2-approximation 
algorithm for k-median. Such an algorithm would 
determine the integrality gap of the natural LP 
relaxation of this problem, as there are simple 
examples that show that this gap is at least 2. 


Experimental Results 


Jain et al. [28] published experimental results 
comparing various primal-dual algorithms. 
A more comprehensive experimental study of 
several primal-dual, local search, and heuristic 
algorithms is performed by Hoefer [27]. 
A collection of data sets for UFL and several 
other location problems can be found in the OR- 
library maintained by Beasley [9]. 
Cross-References


Assignment Problem 

Bin Packing (hardness of Capacitated Facility 
Location with unsplittable demands) 

Circuit Placement 

Greedy Set-Cover Algorithms (hardness of 
a variant of UFL, where facilities may be built 
at all locations with the same cost) 

Local Approximation of Covering and Packing 
Problems 

Local Search for K-medians and Facility Loca- 
tion 
Problem Definition


A distributed system is comprised of a collection 
of processes. The processes typically seek to 
achieve some common task by communicating 
through message passing or shared memory. 
Most interesting tasks require, at least at 
certain points of the computation, some form of 
agreement between the processes. An abstract 
form of such agreement is consensus where 
processes need to agree on a single value among 
a set of proposed values. Solving this seemingly
elementary problem is at the heart of reliable 
distributed computing and, in particular, of 
distributed database commitment, total ordering 
of messages, and emulations of many shared 
object types. 

Fischer, Lynch, and Paterson’s seminal result 
in the theory of distributed computing [13] says 
that consensus cannot be deterministically solved 
in an asynchronous distributed system that is 
prone to process failures. This impossibility holds 
consequently for all distributed computing prob- 
lems which themselves rely on consensus. 

Failures and asynchrony are fundamental in- 
gredients in the consensus impossibility. The im- 
possibility holds even if only one process fails, 
and it does so only by crashing, i.e., stopping 
its activities. Tolerating crashes is the least one 
would expect from a distributed system for the 
goal of distribution is in general to avoid single 
points of failures in centralized architectures. 
Usually, actual distributed applications exhibit 
more severe failures where processes could devi- 
ate arbitrarily from the protocol assigned to them. 

Asynchrony refers to the absence of assump- 
tions on process speeds and communication de- 
lays. This absence prevents any process from 
distinguishing a crashed process from a correct 
one and this inability is precisely what leads 
to the consensus impossibility. In practice, how- 
ever, distributed systems are not completely asyn- 
chronous: some timing assumptions can typically 
be made. In the best case, if precise lower and 
upper bounds on communication delays and pro- 
cess speeds are assumed, then it is easy to show 
that consensus and related impossibilities can be 
circumvented despite the crash of any number of 
processes [20]. 

Intuitively, the way that such timing assump- 
tions circumvent asynchronous impossibilities 
is by providing processes with information 
about failures, typically through time-out (or 
heart-beat) mechanisms, usually underlying 
actual distributed applications. Whereas certain 
information about failures can indeed be obtained 
in distributed systems, the accuracy of such
information might vary from one system to another,
depending on the underlying network, the load
of the application, and the mechanisms used to
detect failures. A crucial problem in this context
is to characterize such information, in an abstract 
and precise way. 


Key Results 


The Failure Detector Abstraction 

Chandra and Toueg [5] defined the failure de- 
tector abstraction as a simple way to capture 
failure information that is needed to circumvent 
asynchronous impossibilities, in particular the 
consensus impossibility. The model considered 
in [5] is a message passing one where processes 
can fail by crashing. Processes that crash stop 
their activities and do not recover. Processes that 
do not crash are said to be correct. At least 
one process is supposed to be correct in every 
execution of the system. 

Roughly speaking, a failure detector is an 
oracle that provides processes with information 
about failures. The oracle is accessed in each 
computation step of a process and it provides 
the process with a value conveying some failure 
information. The value is picked from some set 
of values, called the range of the failure de- 
tector. For instance, the range could be the set 
of subsets of processes in the system, and each 
subset could depict the set of processes detected 
to have crashed, or considered to be correct. 
This would correspond to the situation where the 
failure detector is implemented using a time-out: 
every process q that does not communicate within 
some time period with some process p would
be included in the subset of processes suspected of
having crashed by p.
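The time-out intuition above can be sketched in a few lines. This is only an illustration of the idea, not the formal model of [5]: the function name `suspected`, the `last_heard` bookkeeping, and the numeric time values are all hypothetical.

```python
def suspected(last_heard, now, timeout):
    """Set of processes that p has not heard from within `timeout` time units.

    `last_heard` maps each other process q to the time of the last message
    that p received from q (hypothetical local bookkeeping at p).
    """
    return {q for q, t in last_heard.items() if now - t > timeout}


last_heard = {"q1": 9, "q2": 3, "q3": 7}
first = suspected(last_heard, now=10, timeout=3)

# A late message from q2 arrives at time 11, so p stops suspecting it.
last_heard["q2"] = 11
second = suspected(last_heard, now=12, timeout=3)
```

In this example p suspects q2 at time 10 and later revises that suspicion, which illustrates why the accuracy of time-out based information depends on the actual communication delays.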

More specifically, a failure detector is a func- 
tion, D, that associates to each failure pattern, F, 
a set of failure detector histories $\{H_i\} = D(F)$.
Both the failure pattern and the failure detector 
history are themselves functions. 


• A failure pattern F is a function that associates 
to each time t the set of processes F(t) that 
have indeed crashed by time t. This notion 
assumes the existence of a global clock, outside 
the control of the processes, as well as 
a specific concept of crash event associated 
with time. A set of failure patterns is called an 
environment. 

• A failure detector history H is also a function, 
which associates to each process p and time t 
some value v from the range of failure detector 
values. (The range of a failure detector D is 
denoted R_D.) This value v is said to be output 
by the failure detector D at process p and 
time t. 


Two observations are in order. 


• By construction, the output of a failure detector 
does not depend on the computation, 
i.e., on the actual steps performed by the processes, 
on their algorithm, or on the input of that 
algorithm. The output of the failure detector 
depends solely on the failure pattern, namely 
on whether and when processes crashed. 

• A failure detector might associate several 
histories to each failure pattern. Each history 
represents a suite of possible combinations 
of outputs for the same given failure pattern. 
This captures the inherent non-determinism 
of a failure detection mechanism. Such 
a mechanism is typically itself implemented 
as a distributed algorithm and the variations in 
communication delays for instance could lead 
the same mechanism to output (even slightly) 
different information for the same failure 
pattern. 


To illustrate these concepts, consider two classi- 
cal examples of failure detectors. 


1. The perfect failure detector outputs a subset 
of processes, i.e., the range of the failure 
detector is the set of subsets of processes in the 
system. When a process q is output at some 
time t at a process p, then q is said to be 
detected (as having crashed) by p. The perfect 
failure detector guarantees the following two 
properties: 

• Every process that crashes is eventually 
permanently detected; 
• No correct process is ever detected. 

2. The eventually strong failure detector outputs 
a subset of processes: when a process q is 
output at some time t at a process p, then q is 
said to be suspected (of having crashed) by p. 
An eventually strong failure detector ensures 
the following two properties: 

• Every process that crashes is eventually 
suspected; 
• Eventually, some correct process is never 
suspected. 


The perfect failure detector is reliable: if a process 
q is detected, then q has crashed. An eventually 
strong failure detector is unreliable: there 
is never any guarantee that the information that is 
output is accurate. The use of the term suspected 
conveys that idea. The distinction between un- 
reliability and reliability was precisely captured 
in [14] for the general context where the range of 
the failure detector can be arbitrary. 
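
To make the definitions concrete, the following sketch models a failure pattern and a failure detector history as Python functions and checks the two properties of the perfect failure detector. It rests on several assumptions that are not part of the formal model: time is discrete, the check runs over a finite horizon (so "eventually permanently" is only approximated), and the names `make_failure_pattern` and `is_perfect_on` are hypothetical.

```python
def make_failure_pattern(crash_time):
    """Return F, where F(t) is the set of processes crashed by time t.

    crash_time[p] is the crash time of p, or None if p is correct.
    """
    def F(t):
        return {p for p, ct in crash_time.items() if ct is not None and ct <= t}
    return F


def is_perfect_on(history, crash_time, horizon):
    """Check a history H(p, t) against the two properties, up to the horizon."""
    processes = set(crash_time)
    correct = {p for p in processes if crash_time[p] is None}
    faulty = processes - correct

    # Accuracy: no correct process is ever detected.
    accurate = all(not (history(p, t) & correct)
                   for p in processes for t in range(horizon))

    # Completeness (finite-horizon approximation): every crashed process is
    # detected by every correct process from some time on, until the horizon.
    def permanently_detected(p, q):
        return any(all(q in history(p, t2) for t2 in range(t, horizon))
                   for t in range(horizon))

    complete = all(permanently_detected(p, q) for p in correct for q in faulty)
    return accurate and complete


crash_time = {"p1": None, "p2": 4, "p3": None}
F = make_failure_pattern(crash_time)

def delayed(p, t):
    """A history that reports crashes two time units late."""
    return F(t - 2)

def hasty(p, t):
    """A history that wrongly outputs the correct process p3 at time 1."""
    return F(t) | ({"p3"} if t == 1 else set())

print(is_perfect_on(delayed, crash_time, horizon=20))
print(is_perfect_on(hasty, crash_time, horizon=20))
```

Both histories depend only on the failure pattern, in line with the first observation above; only the first one is a possible history of a perfect failure detector.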


Consensus Algorithms 
Two important results were established in [5]. 


Theorem 1 (Chandra-Toueg [5]) There is an 
algorithm that solves consensus with a perfect 
failure detector. 


The theorem above implicitly says that if the 
distributed system provides means to implement 
perfect failure detection, then the consensus impossibility 
can be circumvented, even if all but 
one of the processes crash. In fact, the result holds for 
any failure pattern, i.e., in any environment. 

The second theorem below relates the exis- 
tence of a consensus algorithm to a resilience 
assumption. More specifically, the theorem holds 
in the majority environment, which is the set 
of failure patterns where more than half of the 
processes are correct. 


Theorem 2 (Chandra-Toueg [5]) There is an 
algorithm that implements consensus with an 
eventually strong failure detector in the majority 
environment. 


The algorithm underlying the result above is similar 
to eventually synchronous consensus algorithms [10] 
and also shares some similarities with 
the Paxos algorithm [18]. It is shown in [5] that 
no algorithm using solely the eventually strong 
failure detector can solve consensus without the 
majority assumption. (This result is generalized 
to any unreliable failure detector in [14].) This 
resilience lower bound is intuitively due to the 
possibility of partitions in a message passing 
system where at least half of the processes can 
crash and failure detection is unreliable. In shared 
memory, for example, no such possibility exists, 
and consensus can be solved with the eventually 
strong failure detector [19]. 


Failure Detector Reductions 

Failure detectors can be compared. A failure 
detector D2 is said to be weaker than a failure 
detector D1 if there is an asynchronous algorithm, 
called a reduction algorithm, which, using D1, 
can emulate D2. Three remarks are important 
here. 


• The fact that the reduction algorithm is asynchronous 
means that it does not use any other 
source of failure information besides D1. 

• Emulating failure detector D2 means implementing 
a distributed variable that mimics the 
output that could be provided by D2. 

• The existence of a reduction algorithm 
depends on the environment. Hence, strictly 
speaking, the fact that a failure detector is 
weaker than another one depends on the 
environment under consideration. 


If failure detector D1 is weaker than D2, and vice 
versa, then D1 and D2 are said to be equivalent. 
Otherwise, if D1 is weaker than D2 and D2 is not weaker 
than D1, then D1 is said to be strictly weaker 
than D2. Again, strictly speaking, these notions 
depend on the considered environment. 

The ability to compare failure detectors helps 
define a notion of weakest failure detector to 
solve a problem. Basically, a failure detector D 
is the weakest to solve a problem P if the two 
following properties are satisfied: 


• There is an algorithm that solves P using D. 

• If there is an algorithm that solves P using 
some failure detector D', then D is weaker 
than D'. 


Theorem 3 (Chandra-Hadzilacos-Toueg [4]) 
The eventually strong failure detector is the weakest 
to solve consensus in the majority environment. 


The weakest failure detector to implement con- 
sensus in any environment was later established 
in [8]. 


Applications 


A Practical Perspective 

The identification of the failure detector concept 
had an impact on the design of reliable distributed 
architectures. Basically, a failure detector can be 
viewed as a first class service of a distributed 
system, at the same level as a name service or 
a file service. Time-out and heartbeat mecha- 
nisms can thus be hidden under the failure detec- 
tor abstraction, which can then export a unified 
interface to higher level applications, including 
consensus and state machine replication algo- 
rithms [2, 11, 21]. 

Maybe more importantly, a failure detector 
service can encapsulate synchrony assumptions: 
these can be changed without impact on the 
rest of the applications. Minimal synchrony 
assumptions to devise specific failure detectors 
could be explored leading to interesting 
theoretical results [1, 7, 12]. 


A Theoretical Perspective 

A second application of the failure detector 
concept is a theory of distributed computability. 
Failure detectors make it possible to classify problems. 
A problem A is harder (resp. strictly harder) 
than problem B if the weakest failure detector 
to solve B is weaker (resp. strictly weaker) than 
the weakest failure detector to solve A. (This 
notion is of course parametrized by a specific 
environment.) 

Maybe surprisingly, the induced failure 
detection reduction between problems does not 
exactly match the classical black-box reduction 
notion. For instance, it is well known that there 
is no asynchronous distributed algorithm that can 
use a Queue abstraction to implement a Compare-Swap 
abstraction in a system of n > 2 processes 
where n − 1 can fail by crashing [15]. In this 
sense, a Compare-Swap abstraction is strictly 
more powerful (in a black-box sense) than 
a Queue abstraction. It turns out that: 


Theorem 4 (Delporte-Fauconnier-Guerraoui 
[9]) The weakest failure detector to solve the 
Queue problem is also the weakest to solve the 
Compare-Swap problem in a system of n > 2 
processes where n − 1 can fail by crashing. 


In a sense, this theorem indicates that reducibility 
as induced by the failure detector notion is differ- 
ent from the traditional black-box reduction. 


Open Problems 


Several issues underlying the failure detector 
notion are still open. One such issue consists 
in identifying the weakest failure detector to 
solve the seminal set-agreement problem [6]: 
a decision task where processes need to agree 
on up to k values, instead of a single value 
as in consensus. Three independent groups of 
researchers [3, 16, 22] proved the impossibility 
of solving this problem in an asynchronous 
system with k failures, generalizing the consensus 
impossibility [13]. Determining the weakest 
failure detector to circumvent this impossibility 
would clearly help understand the fundamentals 
of failure detection reducibility. 

Another interesting research direction is to 
relate the complexity of distributed algorithms 
to the underlying failure detector [17]. Clearly, 
failure detectors circumvent asynchronous 
impossibilities, but to what extent do they 
affect the complexity of distributed algorithms? 
One would of course expect the complexity of 
a solution to a problem to be higher if the failure 
detector is weaker. But to what extent? 


Problem Definition 


In Internet auctions, it is easy for a bidder to sub- 
mit multiple bids under multiple identifiers (e.g., 
multiple e-mail addresses). If only one item/good 
is sold, a bidder cannot make any additional profit 
by using multiple bids. However, in combinato- 
rial auctions, where multiple items/goods are sold 
simultaneously, submitting multiple bids under 
fictitious names can be profitable. A bid made 
under a fictitious name is called a false-name bid. 

Here, the same model as in the GVA section is used. 
In addition, false-name bids are modeled as 
follows. 


• Each bidder can use multiple identifiers. 

• Each identifier is unique and cannot be impersonated. 

• Nobody (except the owner) knows whether 
two identifiers belong to the same bidder or 
not. 


The goal is to design a false-name-proof protocol, 
i.e., a protocol in which using false names is useless, 
so that bidders voluntarily refrain from using 
them. 

The problems resulting from collusion have 
been discussed by many researchers. Compared 
with collusion, a false-name bid is easier to ex- 
ecute on the Internet since obtaining additional 
identifiers, such as another e-mail address, is 
cheap. False-name bids can be considered as 
a very restricted subclass of collusion. 


Key Results 


The Generalized Vickrey Auction (GVA) proto- 
col is (dominant strategy) incentive compatible, 
i.e., for each bidder, truth-telling is a dominant 
strategy (a best strategy regardless of the action of 
other bidders) if there are no false-name bids. 
However, when false-name bids are possible, 
truth-telling is no longer a dominant strategy, i.e., 
the GVA is not false-name-proof. 

Here is an example, which is identical to 
Example 1 in the GVA section. 


Example 1 Assume there are two goods a and b, 
and three bidders, bidder 1, 2, and 3, whose types 
are \(\theta_1\), \(\theta_2\), and \(\theta_3\), respectively. The evaluation 
value for a bundle \(v(B, \theta_i)\) is determined as 
follows. 

                 {a}    {b}    {a,b} 
\(\theta_1\)     $6     $0     $6 
\(\theta_2\)     $0     $0     $8 
\(\theta_3\)     $0     $5     $5 


As shown in the GVA section, good a is allocated 
to bidder 1, and b is allocated to bidder 3. Bid- 
der 1 pays $3 and bidder 3 pays $2. 

Now consider another example. 


Example 2 Assume there are only two bidders, 
bidder 1 and 2, whose types are \(\theta_1\) and \(\theta_2\), 
respectively. The evaluation value for a bundle 
\(v(B, \theta_i)\) is determined as follows. 


                 {a}    {b}    {a,b} 
\(\theta_1\)     $6     $5     $11 
\(\theta_2\)     $0     $0     $8 


In this case, bidder 1 can obtain both goods, 
but he/she is required to pay $8, since if bidder 1 
did not participate, the social surplus would 
have been $8. When bidder 1 does participate, 
bidder 1 takes everything and the social surplus 
of the bidders other than bidder 1 becomes 0. Thus, bidder 1 needs 
to pay the decreased amount of the social surplus, 
i.e., $8. 

However, bidder 1 can use another identifier, 
namely, bidder 3, and create a situation identical 
to Example 1. Then, good a is allocated to bidder 1, 
and b is allocated to bidder 3. Bidder 1 pays 
$3 and bidder 3 pays $2. Since bidder 3 is a false 
name of bidder 1, bidder 1 can obtain both goods 
by paying $3 + $2 = $5. Thus, using a false 
name is profitable for bidder 1. 
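
The two examples can be recomputed with a small brute-force sketch of the GVA payment rule (each winner pays the decrease in the other bidders' social surplus caused by his/her participation). The function names `best_allocation` and `gva`, and the dictionary encoding of evaluation values, are assumptions made for this illustration; missing bundles are assumed to have value 0.

```python
from itertools import product

GOODS = ("a", "b")

def best_allocation(values, bidders):
    """Brute-force the surplus-maximizing allocation among the given bidders."""
    best_surplus, best_alloc = -1, None
    # Each good goes to exactly one bidder or stays unsold (None).
    for owners in product(list(bidders) + [None], repeat=len(GOODS)):
        alloc = {i: frozenset(g for g, o in zip(GOODS, owners) if o == i)
                 for i in bidders}
        surplus = sum(values[i].get(alloc[i], 0) for i in bidders)
        if surplus > best_surplus:
            best_surplus, best_alloc = surplus, alloc
    return best_surplus, best_alloc

def gva(values):
    """Return (allocation, payments) under the GVA payment rule."""
    bidders = list(values)
    total, alloc = best_allocation(values, bidders)
    payments = {}
    for i in bidders:
        others = [j for j in bidders if j != i]
        surplus_without_i, _ = best_allocation(values, others)
        others_surplus_with_i = total - values[i].get(alloc[i], 0)
        payments[i] = surplus_without_i - others_surplus_with_i
    return alloc, payments

A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
EXAMPLE_1 = {1: {A: 6, AB: 6}, 2: {AB: 8}, 3: {B: 5, AB: 5}}
EXAMPLE_2 = {1: {A: 6, B: 5, AB: 11}, 2: {AB: 8}}

_, pay1 = gva(EXAMPLE_1)
_, pay2 = gva(EXAMPLE_2)
print(pay1)  # bidder 1 pays 3 and bidder 3 pays 2
print(pay2)  # bidder 1 pays 8 when bidding under a single name
```

Comparing `pay2[1]` with `pay1[1] + pay1[3]` reproduces the gain from the false name: $8 versus $5.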

The effects of false-name bids on combinato- 
rial auctions are analyzed in [4]. The obtained 
results can be summarized as follows. 


• As shown in the above example, the GVA 
protocol is not false-name-proof. 

• There exists no false-name-proof combinatorial 
auction protocol that satisfies Pareto efficiency. 

• If a surplus function of bidders satisfies a condition 
called concavity, then the GVA is guaranteed 
to be false-name-proof. 

Also, a series of protocols that are false-name-proof 
in various settings have been developed: 
combinatorial auction protocols [2, 3], multi- 
unit auction protocols [1], and double auction 
protocols [5]. 

Furthermore, in [2], a distinctive class of 
combinatorial auction protocols called a Price- 
oriented, Rationing-free (PORF) protocol is 
identified. The description of a PORF protocol 
can be used as a guideline for developing 
strategy/false-name proof protocols. 

The outline of a PORF protocol is as 
follows: 


1. For each bidder, the price of each bundle of 
goods is determined independently of his/her 
own declaration, while it depends on the declarations 
of other bidders. More specifically, 
the price of bundle (a set of goods) B for bidder 
i is determined by a function \(p(B, \Theta_X)\), 
where \(\Theta_X\) is the set of types declared by the other 
bidders X. 

2. Each bidder is allocated a bundle that maxi- 
mizes his/her utility independently of the allo- 
cations of other bidders (i.e., rationing-free). 
The prices of bundles must be determined so 
that allocation feasibility is satisfied, i.e., no 
two bidders want the same item. 


Although a PORF protocol appears to be quite 
different from traditional protocol descriptions, 
surprisingly, it is a sufficient and necessary con- 
dition for a protocol to be strategy-proof. Further- 
more, if a PORF protocol satisfies the following 
additional condition, it is guaranteed to be false- 
name-proof. 


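Definition 1 (No Super-Additive price increase 
(NSA)) For any subset of bidders \(S \subseteq N\) 
and \(N' = N \setminus S\), and for \(i \in S\), denote \(B_i\) as 
a bundle that maximizes i's utility; then 


\[ \sum_{i \in S} p\Bigl(B_i, \bigcup_{j \in S \setminus \{i\}} \{\theta_j\} \cup \Theta_{N'}\Bigr) \ge p\Bigl(\bigcup_{i \in S} B_i, \Theta_{N'}\Bigr). \]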


An intuitive description of this condition is that 
the price of buying a combination of bundles (the 
right side of the inequality) must be smaller than 
or equal to the sum of the prices for buying these 
bundles separately (the left side). This condition 
is also a necessary condition for a protocol to 
be false-name-proof, i.e., any false-name-proof 
protocol can be described as a PORF protocol that 
satisfies the NSA condition. 

Here is a simple example of a PORF protocol 
that is false-name-proof. This protocol is called 
the Max Minimal-Bundle (M-MB) protocol [2]. 
To simplify the protocol description, a concept 
called a minimal bundle is introduced. 


Definition 2 (minimal bundle) Bundle B is 
called minimal for bidder i if, for all \(B' \subseteq B\) with 
\(B' \neq B\), \(v(B', \theta_i) < v(B, \theta_i)\) holds. 


In this new protocol, the price of bundle B for 
bidder i is defined as follows: 


• \(p(B, \Theta_X) = \max_{B_j \subseteq M,\, j \in X} v(B_j, \theta_j)\), 
where \(B \cap B_j \neq \emptyset\) and \(B_j\) is minimal for 
bidder j. 


How this protocol works on Example 1 
is described here. The prices for each bidder are 
determined as follows. 


{a} {b} {a,b} 
bidder1 $8 $8 $8 
bidder2 $6 $5 $6 
bidder3 $8 $8 $8 


The minimal bundle for bidder 1 is {a}, the 
minimal bundle for bidder 2 is {a, b}, and the 
minimal bundle for bidder 3 is {b}. The price of 
bundle {a} for bidder 1 is equal to the largest 
evaluation value of conflicting bundles. In this 
case, the price is $8, i.e., the evaluation value of 
bidder 2 for bundle {a, b}. Similarly, the price of 
bidder 2 for bundle {a, b} is $6, i.e., the evaluation 
value of bidder 1 for bundle {a}. As a result, 
bundle {a, b} is allocated to bidder 2. 
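
The price table and the resulting allocation can be reproduced with the following sketch of the M-MB pricing rule. The helper names (`minimal_bundles`, `price`, `mmb_allocate`) and the data encoding are hypothetical; the empty bundle is assumed to have value 0, and a bidder is assumed to obtain a bundle only when its utility is strictly positive (a tie-breaking choice made here for simplicity).

```python
from itertools import combinations

GOODS = ("a", "b")

def nonempty_bundles(goods):
    """All nonempty subsets of goods, as frozensets."""
    return [frozenset(c) for r in range(1, len(goods) + 1)
            for c in combinations(goods, r)]

def value(v, bundle):
    return v.get(bundle, 0)

def minimal_bundles(v, goods):
    """Bundles whose every proper subset (including the empty one) is worth less."""
    result = []
    for B in nonempty_bundles(goods):
        proper = [frozenset()] + [S for S in nonempty_bundles(B) if S != B]
        if all(value(v, S) < value(v, B) for S in proper):
            result.append(B)
    return result

def price(B, i, values):
    """Largest value of a conflicting minimal bundle declared by another bidder."""
    conflicts = [value(values[j], Bj)
                 for j in values if j != i
                 for Bj in minimal_bundles(values[j], GOODS) if B & Bj]
    return max(conflicts, default=0)

def mmb_allocate(values):
    """Give each bidder a utility-maximizing bundle if its utility is positive."""
    allocation = {}
    for i, v in values.items():
        best = max(nonempty_bundles(GOODS),
                   key=lambda B: value(v, B) - price(B, i, values))
        if value(v, best) - price(best, i, values) > 0:
            allocation[i] = best
    return allocation

A, B_, AB = frozenset("a"), frozenset("b"), frozenset("ab")
EXAMPLE_1 = {1: {A: 6, AB: 6}, 2: {AB: 8}, 3: {B_: 5, AB: 5}}

for i in EXAMPLE_1:
    print(i, [price(X, i, EXAMPLE_1) for X in (A, B_, AB)])
print(mmb_allocate(EXAMPLE_1))
```

Note that bidder 2 obtains both goods at price $6, although allocating a to bidder 1 and b to bidder 3 would yield a larger social surplus; this illustrates the efficiency loss that false-name-proof protocols may incur.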

It is clear that this protocol satisfies 
allocation feasibility. For each good l, choose bidder \(j^*\) 
and bundle \(B_j\) that maximize \(v(B_j, \theta_j)\), where 
\(l \in B_j\) and \(B_j\) is minimal for bidder j. Then, 
only bidder \(j^*\) is willing to obtain a bundle that 
contains good l. For all other bidders, the price of 
a bundle that contains l is higher than (or equal 
to) his/her evaluation value. 

Furthermore, it is clear that this protocol satisfies 
the NSA condition. In this pricing scheme, 
\(p(B \cup B', \Theta_X) = \max(p(B, \Theta_X), p(B', \Theta_X))\) 
holds for all \(B\), \(B'\), and \(\Theta_X\). Therefore, the following 
formula holds: 


\[ p\Bigl(\bigcup_{i \in S} B_i, \Theta_{N'}\Bigr) = \max_{i \in S} p(B_i, \Theta_{N'}) \le \sum_{i \in S} p(B_i, \Theta_{N'}). \]


Furthermore, in this pricing scheme, prices 
increase monotonically by adding opponents, i.e., 
for all \(X' \supseteq X\), \(p(B, \Theta_{X'}) \ge p(B, \Theta_X)\) holds. 
Therefore, for each i, 
\(p(B_i, \bigcup_{j \in S \setminus \{i\}} \{\theta_j\} \cup \Theta_{N'}) \ge p(B_i, \Theta_{N'})\) holds. Therefore, the 
NSA condition, i.e., 
\(\sum_{i \in S} p(B_i, \bigcup_{j \in S \setminus \{i\}} \{\theta_j\} \cup \Theta_{N'}) \ge p(\bigcup_{i \in S} B_i, \Theta_{N'})\), holds. 


Applications 


In Internet auctions, using multiple identifiers 
(e.g., multiple e-mail addresses) is quite easy 
and identifying each participant on the Internet 
is virtually impossible. Combinatorial auctions 
have lately attracted considerable attention. 
When combinatorial auctions become widely 
used in Internet auctions, false-name-bids could 
be a serious problem. 


Open Problems 


It is shown that there exists no false-name-proof 
protocol that is Pareto efficient. Thus, giving up 
some efficiency is inevitable. 
However, the theoretical lower bound of the efficiency 
loss, i.e., the amount of efficiency 
loss that is inevitable for any false-name-proof 
protocol, has not been identified yet. Also, the efficiency 
loss of existing false-name-proof protocols can 
be quite large. More efficient false-name-proof 
protocols in various settings are needed. 


Problem Definition 


Minimal triangulation is the addition of an inclu- 
sion minimal set of edges to an arbitrary undi- 
rected graph, such that a chordal graph is ob- 
tained. A graph is chordal if every cycle of 
length at least 4 contains an edge between two 
nonconsecutive vertices of the cycle. 

More formally, let \(G = (V, E)\) be a simple 
and undirected graph, where \(n = |V|\) and 
\(m = |E|\). A graph \(H = (V, E \cup F)\), where 
\(E \cap F = \emptyset\), is a triangulation of G if H is 
chordal, and H is a minimal triangulation if there 
exists no \(F' \subset F\) such that \(H' = (V, E \cup F')\) 
is chordal. Edges in F are called fill edges, 
and a triangulation is minimal if and only if 
the removal of any single fill edge results in 
a chordless cycle of length four [10]. 
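
The characterization via single fill edges can be turned directly into a (slow) checker, sketched below. This is not one of the algorithms surveyed in this entry: chordality is tested naively by repeatedly removing a simplicial vertex (a vertex whose remaining neighbors form a clique), and the function names and the edge encoding as two-element frozensets are assumptions made for the illustration.

```python
from itertools import combinations

def is_chordal(vertices, edges):
    """Naive test: a graph is chordal iff simplicial vertices can be removed until none remain."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = next(
            (v for v in remaining
             if all(b in adj[a] for a, b in combinations(adj[v] & remaining, 2))),
            None)
        if simplicial is None:
            return False  # no simplicial vertex left, so a chordless cycle exists
        remaining.remove(simplicial)
    return True

def is_minimal_triangulation(vertices, edges, fill):
    """H = (V, E ∪ F) must be chordal, and dropping any single fill edge must break chordality."""
    h_edges = set(edges) | set(fill)
    if not is_chordal(vertices, h_edges):
        return False
    return all(not is_chordal(vertices, h_edges - {f}) for f in fill)

def edge(u, v):
    return frozenset((u, v))

V = [1, 2, 3, 4]
C4 = {edge(1, 2), edge(2, 3), edge(3, 4), edge(4, 1)}  # a chordless four-cycle

print(is_chordal(V, C4))
print(is_minimal_triangulation(V, C4, {edge(1, 3)}))
print(is_minimal_triangulation(V, C4, {edge(1, 3), edge(2, 4)}))
```

Adding one diagonal to the four-cycle gives a minimal triangulation, whereas adding both diagonals gives a triangulation that is not minimal, since either diagonal alone already suffices.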

Since minimal triangulations were first de- 
scribed in the mid-1970s, a variety of algorithms 
have been published. A complete overview of 
these along with different characterizations of 
chordal graphs and minimal triangulations can 
be found in the survey of Heggernes et al. [5] 
on minimal triangulations. Minimal triangulation 
algorithms can roughly be partitioned into algo- 
rithms that obtain the triangulation through elim- 
ination orderings, and those that obtain it through 
vertex separators. Most of these algorithms have 
an \(O(nm)\) running time, which becomes \(O(n^3)\) for 
dense graphs. Among those that use elimination 
orderings, Kratsch and Spinrad's \(O(n^{2.69})\)-time 
algorithm [8] is currently the fastest one. The 
fastest algorithm is an \(o(n^{2.376})\)-time algorithm 
by Heggernes et al. [5]. This algorithm is based 
on vertex separators, and will be discussed further 
in the next section. Both the algorithm of Kratsch 
and Spinrad [8] and the algorithm of Heggernes 
et al. [5] use the matrix multiplication algorithm 
of Coppersmith and Winograd [3] to obtain an 
\(o(n^3)\)-time algorithm. 


Key Results 


For a vertex set \(A \subseteq V\), the subgraph of 
G induced by A is \(G[A] = (A, W)\), where 
\(uv \in W\) if \(u, v \in A\) and \(uv \in E\). The closed 
neighborhood of A is \(N[A] = U\), where 
\(u, v \in U\) for every \(uv \in E\) with \(u \in A\), 
and \(N(A) = N[A] \setminus A\). A is called a clique if 
\(G[A]\) is a complete graph. A vertex set \(S \subset V\) is 
called a separator if \(G[V \setminus S]\) is disconnected, 
and S is called a minimal separator if there exists 
a pair of vertices \(a, b \in V \setminus S\) such that a, b are 
contained in different connected components of 
\(G[V \setminus S]\), and in the same connected component 
of \(G[V \setminus S']\) for any \(S' \subset S\). A vertex set 
\(\Omega \subseteq V\) is a potential maximal clique if there 
exists no connected component of \(G[V \setminus \Omega]\) 
that contains \(\Omega\) in its neighborhood, and for 
every vertex pair \(u, v \in \Omega\), uv is an edge or there 
exists a connected component of \(G[V \setminus \Omega]\) that 
contains both u and v in its neighborhood. 

From the results in [1, 7], the following 
recursive minimal triangulation algorithm is 
obtained. Find a vertex set A which is either 
a minimal separator or a potential maximal 
clique. Complete G[A] into a clique. Recursively 
for each connected component C of G[V \ A] 
where G[N[C]] is not a clique, find a minimal 
triangulation of G[N[C]]. An important property 
here is that the set of connected components 
of G[V \ A] defines independent minimal 
triangulation problems. 

The recursive algorithm just described defines 
a tree, where the given input graph G is the 
root node, and where each connected component 
of G[V \ A] becomes a child of the root node 
defined by G. Now continue recursively for each 
of the subproblems defined by these connected 
components. A node H which is actually a sub- 
problem of the algorithm is defined to be at level 
i, if the distance from H to the root in the tree is 
i. Notice that all subproblems at the same level 
can be triangulated independently. Let k be the 
number of levels. If this recursive algorithm can 
be completed for every subgraph at each level 
in \(O(f(n))\) time, then this trivially provides an 
\(O(f(n) \cdot k)\)-time algorithm. 

The algorithm in Fig. 1 uses queues to 
obtain this level-by-level approach, and matrix 
multiplication to complete all the vertex 
separators at a given level in \(O(n^\alpha)\) time, where 
\(\alpha < 2.376\) [3]. In contrast to the previously 
described recursive algorithm, the algorithm in 
Fig. 1 uses a partitioning subroutine that either 
returns a set of minimal separators or a potential 
maximal clique. 
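
The role of matrix multiplication can be illustrated as follows. If M has one row per vertex and one column per connected component C, with M(v, C) = 1 exactly when v lies in the neighborhood of C, then entry (u, v) of \(MM^T\) is nonzero exactly when u and v share such a neighborhood, so the nonzero off-diagonal entries list all edges needed to complete every neighborhood into a clique. The sketch below uses a naive cubic product purely for readability (a fast matrix multiplication routine would be substituted to obtain the stated bound), and the function name `completion_edges` is hypothetical.

```python
def completion_edges(vertices, neighborhoods):
    """Edges that turn every given vertex set into a clique, read off from M * M^T."""
    n, k = len(vertices), len(neighborhoods)
    # M[i][c] = 1 iff vertex i belongs to the c-th neighborhood.
    M = [[1 if v in nb else 0 for nb in neighborhoods] for v in vertices]
    # Naive product; entry (i, j) counts the neighborhoods containing both i and j.
    MMt = [[sum(M[i][c] * M[j][c] for c in range(k)) for j in range(n)]
           for i in range(n)]
    return {frozenset((vertices[i], vertices[j]))
            for i in range(n) for j in range(i + 1, n) if MMt[i][j]}

vertices = [1, 2, 3, 4, 5]
neighborhoods = [{1, 2, 3}, {3, 4}]
print(sorted(tuple(sorted(e)) for e in completion_edges(vertices, neighborhoods)))
```

All neighborhoods of one level are handled by a single product, which is what allows the level-by-level bound to be stated in terms of the matrix multiplication exponent.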

Even though all subproblems at the same level 
can be solved independently they may share ver- 
tices and edges, but no nonedges (i.e., pair of 
vertices that are not adjacent). Since triangulation 
involves edge addition, the number of nonedges 
will decrease for each level, and the sum of 
nonedges for all subproblems at the same level 
will never exceed \(n^2\). The partitioning algorithm 


Algorithm FMT - Fast Minimal Triangulation 

Input: An arbitrary graph G = (V, E). 
Output: A minimal triangulation G' of G. 

Let Q1, Q2 and Q3 be empty queues; Insert G into Q1; G' = G; 
repeat 
    Construct a zero matrix M with a row for each vertex in V (columns are added later); 
    while Q1 is nonempty do 
        Pop a graph H = (U, D) from Q1; 
        Call Algorithm Partition(H) which returns a vertex subset A ⊂ U; 
        Push vertex set A onto Q3; 
        for each connected component C of H[U \ A] do 
            Add a column in M such that M(v, C) = 1 for all vertices v ∈ N_H(C); 
            if there exists a non-edge uv in H[N_H[C]] with u ∈ C then 
                Push H_C = (N_H[C], D_C) onto Q2, where uv ∉ D_C if u ∈ C and uv ∉ D; 
    Compute MM^T; 
    Add to G' the edges indicated by the nonzero elements of MM^T; 
    while Q3 is nonempty do 
        Pop a vertex set A from Q3; 
        if G'[A] is not complete then Push G'[A] onto Q2; 
    Swap names of Q1 and Q2; 
until Q1 is empty 


Fast Minimal Triangulation, Fig. 1 Fast minimal triangulation algorithm 


in Fig. 2 exploits this fact and has an \(O(n^2 - m)\) 
running time, which sums up to \(O(n^2)\) for each 
level. Thus, each level in the fast minimal triangulation 
algorithm given in Fig. 1 can be completed 
in \(O(n^2 + n^\alpha)\) time, where \(O(n^\alpha)\) is the 
time needed to compute \(MM^T\). The partitioning 
algorithm in Fig. 2 actually finds a set A that 
defines a set of minimal separators, such that 
no subproblem contains more than four fifths of 
the nonedges in the input graph. As a result, the 
number of levels in the fast minimal triangulation 
algorithm is at most \(\log_{5/4}(n^2) = 2\log_{5/4}(n)\), 
and the running time \(O(n^\alpha \log n)\) is obtained. 


Applications 


The first minimal triangulation algorithms were 
motivated by the need to find good pivotal 
orderings for Gaussian elimination. Finding 
an optimal ordering is equivalent to solving 
the minimum triangulation problem, which 
is NP-hard (nondeterministic polynomial-time hard). 
Since any minimum triangulation 
is also a minimal triangulation, and minimal 
triangulations can be found in polynomial time, 
the set of minimal triangulations can be 
a good place to search for a pivotal ordering. 
Probably because of the desired goal, the 
first minimal triangulation algorithms were 
based on orderings, and produced an ordering 
called a minimal elimination ordering. The 
problem of computing a minimal triangulation 
has received increasing attention since then, and 
several new applications and characterizations 
related to the vertex separator properties have 
been published. Two of the new applications 
are computing the tree-width of a graph, and 
reconstructing evolutionary history through 
phylogenetic trees [6]. The new separator-based 
characterizations of minimal triangulations 
have increased the knowledge of minimal 
triangulations [1, 7, 9]. One result based on these 
characterizations is an algorithm that computes 


Algorithm Partition 

Input: A graph H = (U, D) (a subproblem popped from Q1). 
Output: A subset A of U such that either A = N[K] for some connected H[K] 
        or A is a potential maximal clique of H (and G'). 

Part I: defining P 
Unmark all vertices of H; k = 1; 
while there exists an unmarked vertex u do 
    if Ē_H(U \ N_H[u]) < (2/5)|Ē(H)| then Mark u as an s-vertex (stop vertex); 
    else 
        C_k = {u}; Mark u as a c-vertex (component vertex); 
        while there exists a vertex v ∈ N_H[C_k] which is unmarked or marked as an s-vertex do 
            if Ē_H(U \ N_H[C_k ∪ {v}]) ≥ (2/5)|Ē(H)| then 
                C_k = C_k ∪ {v}; Mark v as a c-vertex (component vertex); 
            else 
                Mark v as a p-vertex (potential maximal clique vertex); Associate v with C_k; 
        k = k + 1; 
P = the set of all p-vertices and s-vertices; 

Part II: defining A 
if H[U \ P] has a full component C then A = N_H[C]; 
else if there exist two non-adjacent vertices u, v such that u is an s-vertex 
        and v is an s-vertex or a p-vertex then A = N_H[u]; 
else if there exist two non-adjacent p-vertices u and v, where u is associated with C_i 
        and v is associated with C_j and u ∉ N_H(C_j) and v ∉ N_H(C_i) then A = N_H[C_i ∪ {u}]; 
else A = P; 


Fast Minimal Triangulation, Fig. 2 Partitioning algorithm. Let Ē(H) = W, where uv ∈ W if uv ∉ D, be the set 
of nonedges of H. Define Ē_H(S) to be the sum of degrees in H̄ = (U, Ē) of vertices in S ⊆ U = V(H) 


the tree-width of a graph in polynomial time if the 
number of minimal separators is polynomially 
bounded [2]. A second application is faster exact 
(exponential-time) algorithms for computing the 
tree-width of a graph [4]. 


Open Problems 


The algorithm described shows that a minimal triangulation 
can be found in \(O((n^2 + n^\alpha) \log n)\) 
time, where \(O(n^\alpha)\) is the time required to 
perform an \(n \times n\) binary matrix multiplication. 
As a result, any improved binary matrix 
multiplication algorithm will result in a faster 
algorithm for computing a minimal triangulation. 
An interesting question is whether or not this 
relation goes the other way as well. Does 
there exist an \(O((n^2 + n^\beta) f(n))\) algorithm for 
binary matrix multiplication, where \(O(n^\beta)\) is 
the time required to find a minimal triangulation 
and \(f(n) = o(n^{\alpha - 2})\) or at least \(f(n) = O(n)\)? 
A possibly simpler and related question 
previously asked in [8] is: Is it at least as hard to 
compute a minimal triangulation as to determine 
whether a graph contains at least one triangle? 
A more algorithmic question is whether there exists 
an \(O(n^2 + n^\alpha)\)-time algorithm for computing 
a minimal triangulation. 


Problem Definition 


A basic strategy to solve hard problems by dynamic 
programming is to express the partial solutions 
using a recurrence over the \(2^n\) subsets 
of an n-element set U. Our interest here is in 
recurrences that have the following structure: 


For each subset \(S \subseteq U\), in order to obtain the 
partial solution at S, we consider all possible ways 
to partition S into two disjoint parts, T and \(S \setminus T\), 
with \(T \subseteq S\). 


Fast subset convolution [1] is a technique to speed 
up the evaluation of such recurrences, assuming 
the recurrence can be reduced to a suitable alge- 
braic form. In more precise terms, let R be an 
algebraic ring, such as the integers equipped with 
the usual arithmetic operations (addition, nega- 
tion, multiplication). We seek a fast solution to: 


Problem (Subset Convolution) 


INPUT: Two functions \(f : 2^U \to R\) and 
\(g : 2^U \to R\). 

OUTPUT: The function \(f * g : 2^U \to R\), defined 
for all \(S \subseteq U\) by 


\[ (f * g)(S) = \sum_{T \subseteq S} f(T)\, g(S \setminus T). \tag{1} \]


Here, we may view the output f * g and the 
inputs f and g each as a table with \(2^n\) entries, 
where each entry is an element of R. If we 
evaluate the sum (1) directly for each \(S \subseteq U\) in 
turn, in total we will execute 
\(O\bigl(\sum_{s=0}^{n} \binom{n}{s} 2^s\bigr) = O(3^n)\) 
arithmetic operations in R to obtain f * g 
from f and g. 
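
The direct evaluation of (1) can be sketched as follows, under the assumptions that R is the ring of integers, that U = {0, ..., n−1}, and that each subset is encoded as an n-bit mask indexing a table of length \(2^n\). The function name `subset_convolution_naive` is hypothetical.

```python
def subset_convolution_naive(f, g, n):
    """Evaluate (f * g)(S) = sum over T ⊆ S of f(T) g(S \\ T) directly."""
    size = 1 << n
    h = [0] * size
    for S in range(size):
        T = S
        while True:
            # T ranges over all submasks of S; S ^ T encodes S \ T.
            h[S] += f[T] * g[S ^ T]
            if T == 0:
                break
            T = (T - 1) & S
    return h

n = 2
f = [1, 2, 3, 4]  # f(∅), f({0}), f({1}), f({0,1})
g = [5, 6, 7, 8]
print(subset_convolution_naive(f, g, n))
```

The inner loop visits exactly \(2^{|S|}\) submasks of S, which gives the \(3^n\) total stated above.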


Key Results 


We can considerably improve on the \(O(3^n)\) direct 
evaluation by taking advantage of the ring 
structure of R (i.e., the possibility to form and, 
after multiplication, cancel linear combinations 
of the entries in f and g): 


Theorem 1 (Fast Subset Convolution [1]) 
There exists an algorithm that solves SUBSET 
CONVOLUTION in \(O(2^n n^2)\) arithmetic operations 
in R. 


In what follows, we present an algorithm that 
proceeds via reduction to the union product and 
fast Möbius inversion; an alternative proof is 
possible via reduction to the symmetric differ- 
ence product and fast Walsh-Hadamard (Fourier) 
transforms. 


Fast Evaluation via the Union Product 

Let us start with a relaxed version of subset 
convolution. Namely, instead of partitioning S, 
we split S in all possible ways into a cover (A, B) 
with \(A \cup B = S\); this cover need not be disjoint 
(i.e., we need not have \(A \cap B = \emptyset\)) as would 
be required by subset convolution. For f and 
g as earlier, define the union product (covering 
product) \(f \cup g : 2^U \to R\) for all \(S \subseteq U\) by 


\[ (f \cup g)(S) = \sum_{\substack{A, B \subseteq U \\ A \cup B = S}} f(A)\, g(B). \]


The union product diagonalizes into a pointwise 
product via a pair of mutually inverse linear 
transforms. For a given \(f : 2^U \to R\), the 
zeta transform \(f\zeta : 2^U \to R\) is defined for 
all \(S \subseteq U\) by \(f\zeta(S) = \sum_{T \subseteq S} f(T)\), and 
its inverse, the Möbius transform \(f\mu : 2^U \to R\), 
is defined for all \(S \subseteq U\) by 
\(f\mu(S) = (-1)^{|S|} \sum_{T \subseteq S} (-1)^{|T|} f(T)\). Using the zeta and 
Möbius transforms to diagonalize into a pointwise 
product, the union product can be evaluated 
as 


\[ f \cup g = \bigl((f\zeta) \cdot (g\zeta)\bigr)\mu. \tag{2} \]


We can now reduce subset convolution to a
union product over a polynomial ring with
coefficients in the original ring R. Denote by R[w] the
univariate polynomial ring with indeterminate w
and coefficients in the ring R. Let $f, g : 2^U \to R$
be the given input to subset convolution. Extend
the input $f : 2^U \to R$ to the input
$f_w : 2^U \to R[w]$ defined for all $S \subseteq U$ by
$f_w(S) = f(S)\, w^{|S|}$. Extend g similarly to $g_w$.
Compute the union product $f_w \cup g_w$ using (2) over
R[w]. For all $S \subseteq U$, it now holds that the
coefficient of the monomial $w^{|S|}$ in the polynomial
$(f_w \cup g_w)(S)$ is equal to $(f * g)(S)$.

To compute (2) fast, we require algorithms
that evaluate zeta and Möbius transforms over
an arbitrary ring in $O(2^n n)$ arithmetic operations.
Let us assume that $U = \{1, 2, \ldots, n\}$. We proceed
via the following recurrence for $j = 1, 2, \ldots, n$.
Let $z_0 = f$. Suppose $z_{j-1} : 2^U \to R$ is
available. Then, we compute $z_j : 2^U \to R$
for all $S \subseteq U$ by


$$z_j(S) = \begin{cases} z_{j-1}(S) & \text{if } j \notin S; \\ z_{j-1}(S) + z_{j-1}(S \setminus \{j\}) & \text{if } j \in S. \end{cases}$$


We have $f\zeta = z_n$. The recurrence carries out
exactly $2^{n-1} n$ additions in R. To compute the
Möbius transform $f\mu$ of a given input f, first
transform the input by negating the values of all
sets that have odd size, then run the previous
recurrence with the transformed input as $z_0$, and
transform the output $z_n$ by negating the values of
all sets that have odd size. The result is $f\mu$.
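The recurrences above, and the reduction of subset convolution to a union product over R[w], can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: subsets of U = {1, ..., n} are assumed to be encoded as bitmasks (element j is bit j - 1), tables are plain lists of Python integers, and the names `zeta`, `moebius`, `subset_convolution`, and `naive_subset_convolution` are chosen here for illustration only. The polynomial f_w(S) is stored as one table per power of w.

```python
def zeta(f, n):
    """Fast zeta transform: result[S] = sum of f[T] over all T subset of S."""
    z = list(f)
    for j in range(n):                      # element j+1 of U is bit j
        bit = 1 << j
        for S in range(1 << n):
            if S & bit:                     # case "j in S" of the recurrence
                z[S] = z[S] + z[S ^ bit]
    return z


def moebius(f, n):
    """Moebius transform via sign flips on odd-size sets around a zeta transform."""
    def sign(S):
        return -1 if bin(S).count("1") % 2 else 1
    z = zeta([sign(S) * f[S] for S in range(1 << n)], n)
    return [sign(S) * z[S] for S in range(1 << n)]


def subset_convolution(f, g, n):
    """Subset convolution via the union product over R[w]."""
    N = 1 << n
    size = [bin(S).count("1") for S in range(N)]
    # fr[k][S] is the coefficient of w^k in f_w(S) = f(S) w^{|S|}.
    fr = [[f[S] if size[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    gr = [[g[S] if size[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    fz = [zeta(row, n) for row in fr]
    gz = [zeta(row, n) for row in gr]
    out = [0] * N
    for k in range(n + 1):
        # Coefficient of w^k in the pointwise product of the two polynomials.
        hk = [sum(fz[i][S] * gz[k - i][S] for i in range(k + 1)) for S in range(N)]
        hk = moebius(hk, n)
        for S in range(N):
            if size[S] == k:                # read off the coefficient of w^{|S|}
                out[S] = hk[S]
    return out


def naive_subset_convolution(f, g, n):
    """Direct evaluation of (1), used only as a cross-check."""
    out = [0] * (1 << n)
    for S in range(1 << n):
        T = S
        while True:
            out[S] += f[T] * g[S ^ T]
            if T == 0:
                break
            T = (T - 1) & S                 # next subset of S
    return out
```

The sketch performs n + 1 zeta transforms per input and n + 1 Möbius transforms, each with O(2^n n) additions, which matches the O(2^n n^2) bound of Theorem 1 up to constants.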


Remarks The fast algorithm (2) for the union 
product (in a dual form that considers intersec- 
tions instead of unions) is due to Kennes [9], who 
used the algorithm to speed up an implementa- 
tion of the Dempster-Shafer theory of evidence. 
The fast recurrences for the zeta and Möbius
transforms are special cases of an algorithm of 
Yates [12] for multiplying a vector with an iter- 
ated Kronecker product; see Knuth [10, §4.6.4]. 


Extensions and Variations 

A number of extensions and variations of the 
basic framework are possible [1]. Iterated subset 
convolution (union product) enables one to solve 
set partitioning and packing (covering) problems. 
Assuming the input is sparse, more careful control
over the space usage of the framework can
be obtained by splitting the fast zeta transform
into two parts [5]. Similarly, the running time
can be controlled by trimming [4] the transforms,
for example, to the down-closure (subset-closure)
of the desired outputs and/or the up-closure
(superset-closure) of the supports of the
inputs in $2^U$. A trimmed complementary dual to
the union product is investigated in [3].

Beyond the subset lattice $(2^U, \subseteq, \cup, \cap)$, fast
algorithms are known for the zeta and Möbius
transforms of lattices $(L, \leq, \vee, \wedge)$ with few
join-irreducible elements [6]. This implies fast
analogs of the union product (the join product)
for such lattices.


Applications 


Fast subset convolution and its variants are ap- 
plied to speed up dynamic programming algo- 
rithms that build up a solution from partitions 
of smaller solutions such that there is little or 
no interaction between the parts. Connectivity, 
partitioning, and subgraph counting problems on 
graphs are natural examples [1-3, 8, 11].

To apply fast subset convolution, it is neces- 
sary to reduce the recurrence at hand into the 
algebraic form (1). Let us briefly discuss two 
types of recurrences as examples. 


Boolean Subset Convolution 

Suppose that f and g are {0, 1}-valued, and we 
are seeking to decide whether there exists a valid 
partition of S into two parts so that one part is 
valid by f and the other part valid by g. This 
can be modeled as a Boolean (OR-AND) subset
convolution:


$$(f *_{\vee,\wedge} g)(S) = \bigvee_{T \subseteq S} f(T) \wedge g(S \setminus T).$$


Boolean subset convolutions can be efficiently 
reduced into a subset convolution (1) over the 
integers simply by replacing the OR with a sum 
and the AND with multiplication. 
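A small sketch of this reduction follows, under the same bitmask encoding of subsets as an assumption. The integer convolution is evaluated directly here purely as a stand-in; any routine computing (1) over the integers could be substituted. The function names are illustrative.

```python
def integer_subset_convolution(f, g, n):
    """Direct evaluation of (1) over the integers (stand-in for a fast routine)."""
    out = [0] * (1 << n)
    for S in range(1 << n):
        T = S
        while True:
            out[S] += f[T] * g[S ^ T]
            if T == 0:
                break
            T = (T - 1) & S
    return out


def boolean_subset_convolution(f, g, n):
    """OR-AND convolution of {0,1}-valued tables: OR becomes +, AND becomes *."""
    counts = integer_subset_convolution(f, g, n)
    # counts[S] is the number of valid ordered splits of S; it is nonzero
    # exactly when at least one valid split exists.
    return [1 if c > 0 else 0 for c in counts]


# Example with n = 2: f and g both accept exactly the singletons {1} and {2}.
f = [0, 1, 1, 0]
g = [0, 1, 1, 0]
result = boolean_subset_convolution(f, g, 2)
```

In this example only the full set {1, 2} can be split into one part accepted by f and one accepted by g.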


Min-Sum Subset Convolution 
Another common situation occurs when we are
seeking the minimum cost to partition S so that
the cost of one part is measured by f and the
other by g, where both f and g take nonnegative
integer values. This can be modeled as a min-sum
subset convolution:


$$(f *_{\min,+} g)(S) = \min_{T \subseteq S} \bigl( f(T) + g(S \setminus T) \bigr).$$


A min-sum subset convolution over nonnegative
integers can be reduced to a subset
convolution (1) over a univariate polynomial
ring $\mathbb{Z}[x]$ with integer coefficients. Extend
$f : 2^U \to \mathbb{Z}_{\geq 0}$ to $f_x : 2^U \to \mathbb{Z}[x]$ by
setting $f_x(S) = x^{f(S)}$ for all $S \subseteq U$. Extend
g similarly to $g_x$. Now observe that the degree
of the least-degree monomial with a nonzero
coefficient in the polynomial $(f_x * g_x)(S)$
equals $(f *_{\min,+} g)(S)$. This reduction requires
computation with polynomials of degree O(D)
with $D = \max\{\max_{S \subseteq U} f(S), \max_{S \subseteq U} g(S)\}$,
which may not be practical compared with the
$O(3^n)$ baseline if D is large.
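The embedding into Z[x] can be illustrated with a short sketch. It assumes bitmask-encoded subsets and stores each polynomial (f_x * g_x)(S) as a list of integer coefficients of length 2D + 1. For brevity, the convolution over Z[x] is evaluated directly from definition (1), so the sketch demonstrates only the reduction, not the fast algorithm; the function name is illustrative.

```python
def min_sum_subset_convolution(f, g, n):
    """Min-sum subset convolution via monomials x^{f(S)} and x^{g(S)}."""
    D = max(max(f), max(g))
    out = []
    for S in range(1 << n):
        coeff = [0] * (2 * D + 1)           # coefficients of (f_x * g_x)(S)
        T = S
        while True:
            # x^{f(T)} * x^{g(S \ T)} contributes 1 to the coefficient
            # of x^{f(T) + g(S \ T)}; no cancellation can occur.
            coeff[f[T] + g[S ^ T]] += 1
            if T == 0:
                break
            T = (T - 1) & S
        # The least degree with a nonzero coefficient is the min-sum value.
        out.append(next(d for d, c in enumerate(coeff) if c != 0))
    return out


# Example with n = 2 (tables indexed by bitmask).
f = [0, 3, 1, 5]
g = [0, 2, 4, 1]
values = min_sum_subset_convolution(f, g, 2)
```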

We refer to [1] for a more detailed discussion
and examples.


Problem Definition


We consider the fully dynamic graph connectivity 
problem. Here we wish to maintain a data struc- 
ture for a simple graph that changes over time. 
We assume a fixed set of n vertices and updates
consist of adding and deleting single edges. The 
data structure should support queries for vertex 
pairs (u,v) of whether u and v are connected in 
the current graph. 


Key Results 


The first nontrivial data structure is due to
Frederickson [2], who showed how to achieve
deterministic worst-case update time $O(\sqrt{m})$ and
query time $O(1)$, where m is the current number
of edges of the graph. Using a sparsification
technique, Eppstein et al. [1] obtained $O(\sqrt{n})$
update time. Much faster amortized bounds can
be achieved. Henzinger and King [3] gave a data
structure with $O(\log^3 n)$ randomized expected
amortized update time and $O(\log n/\log\log n)$
query time. Update time was improved to
$O(\log^2 n)$ by Henzinger and Thorup [4]. A
deterministic data structure with $O(\log^2 n)$
amortized update time and $O(\log n/\log\log n)$
query time was given by Holm et al. [5].
Thorup [8] achieved a randomized expected
amortized update time of $O(\log n (\log\log n)^3)$
and a query time of $O(\log n/\log\log\log n)$.
The fastest known deterministic amortized
data structure was given in [9]. Its update
time is $O(\log^2 n/\log\log n)$ and query time is
$O(\log n/\log\log n)$. Kapron et al. [6] gave a
Monte Carlo algorithm with polylogarithmic
worst-case operation time. A general cell-probe
lower bound of $\Omega(\log n)$ was shown by Pătraşcu
and Demaine [7].

In the following, we sketch the main ideas in
the data structure presented in [9].


A Simple Data Structure

We start with a simple data structure similar 
to that of Thorup [8] (which is based on the 
data structure of Holm et al. [5]) that achieves 
$O(\log^2 n)$ update time and $O(\log n)$ query time.
In the subsection below, we give the main
ideas for improving these bounds by a factor
of $\log\log n$.

In the following, denote by G = (V, E) the
current graph. The data structure maintains for
each edge $e \in E$ a level $\ell(e)$ between 0 and
$\ell_{\max} = \lfloor \log n \rfloor$. Initially, an edge has level 0
and its level can only increase over time. For
the amortization, we can think of $\ell_{\max} - \ell(e)$
as the amount of credits associated with edge e,
and every time $\ell(e)$ increases, e pays one credit
(which may correspond to more than one unit
of time). Let $G_i$ denote the subgraph of G with
vertex set V and containing the edges of level at
least i and refer to each connected component of
$G_i$ as a level i cluster. The following invariant is
maintained:


Invariant: For each i, any level i cluster contains
at most $n/2^i$ vertices.


The clusters nest and thus have a forest repre- 
sentation. More specifically, the cluster forest of 
G is a forest C of rooted trees where a node u 
at depth i corresponds to a level i cluster C(u) 
and the children of u correspond to level i + 1 
clusters contained in C(u). Note that roots of 
C correspond to components of $G_0 = G$ and
leaves correspond to vertices of G. Hence, if we 
can maintain C, we can answer a connectivity 
query (u,v) in O(logn) time by traversing the 
leaf-to-root paths from u and v, respectively, and 
checking whether the roots are distinct. 
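The query just described amounts to comparing the roots reached from the two leaves. A minimal sketch, assuming the cluster forest is given as a dictionary of parent pointers (a hypothetical representation chosen for illustration, with `None` marking a root), could look as follows; u and v are connected exactly when the two roots coincide.

```python
def root_of(parent, node):
    """Follow parent pointers from `node` up to the root of its tree."""
    while parent[node] is not None:
        node = parent[node]
    return node


def connected(parent, u, v):
    """u and v are connected iff their leaf-to-root paths end at the same root."""
    return root_of(parent, u) == root_of(parent, v)


# Hypothetical cluster forest: leaves a, b, c hang below root r1, leaf d below r2.
parent = {
    "a": "c1", "b": "c1", "c": "c2", "d": "c3",
    "c1": "r1", "c2": "r1", "c3": "r2",
    "r1": None, "r2": None,
}
```

Since each root-to-leaf path has O(log n) nodes in the forest described above, each call follows O(log n) parent pointers.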

In the following, for each node $w \in C$, denote
by n(w) the number of vertices of G contained in 
C(w); equivalently, n(w) is the number of leaves 
in the subtree of C rooted at w. 


Edge insertions: When a new edge e is inserted
into G, its level $\ell(e)$ is initialized to 0. Updating
C amounts to merging the roots corresponding
to the endpoints of e if these roots are
distinct.

Edge deletions: Handling the deletion of an
edge e = (u, v) is more involved. Let $i = \ell(e)$,
let $C(w_i)$ be the level i cluster containing
e, and let $C(u_{i+1})$ and $C(v_{i+1})$ be the level
i + 1 clusters containing u and v, respectively.
If $C(u_{i+1}) = C(v_{i+1})$, no changes occur
in C. Otherwise, consider the multigraph M
obtained from $C(w_i)$ by contracting its level
i + 1 child clusters to single vertices. We search
in parallel in M from $C(u_{i+1})$ and $C(v_{i+1})$,
respectively, using a standard search procedure
like BFS or DFS. Note that all edges visited
have level i. If a search procedure visits a vertex
a already visited by the other procedure, the
removal of e does not disconnect the level i
cluster containing e. In this case, we terminate
both search procedures. Consider the two sets
$V_u$ and $V_v$ of vertices of M visited by the
procedures from $C(u_{i+1})$ resp. $C(v_{i+1})$, where
we only include a in one of the sets. Then
$V_u \cap V_v = \emptyset$, and since $n(w_i) \leq n/2^i$ by
our invariant, either $\sum_{w \in V_u} n(w) \leq n/2^{i+1}$ or
$\sum_{w \in V_v} n(w) \leq n/2^{i+1}$. Assume w.l.o.g. the
former. Then we increase the level of all visited
edges between vertices in $V_u$ to i + 1 without
violating the invariant. These level increases pay
for the search procedure from $C(u_{i+1})$, and since
we ran the two procedures in parallel, they also
pay for the search procedure from $C(v_{i+1})$.

If the two search procedures do not meet, we 
increase edge levels on one side as above but 
now C(w;) is split into two subclusters since 
we did not manage to reconnect it with level i 
edges. In this case, we recursively try to connect 
these two subclusters in the level $i - 1$ cluster
containing them. If we are in this case at level 0,
it means that a connected component of $G_0 = G$
is split in two. 


Performance: To show how to implement the
above with $O(\log^2 n)$ update time, let us assume
for now that C is a forest of binary trees. In order
for a search procedure to visit a level i edge from
a level i + 1 cluster $C(a_{i+1})$ to a level i + 1
cluster $C(b_{i+1})$, it identifies the start point a of
this edge (a, b) in G by traversing the path in C
from $a_{i+1}$ down to leaf a. It then visits (a, b)
and traverses the path in C from leaf b up to
$b_{i+1}$. To guide the downward searches for level
i edges, we maintain for every node w of C a
bitmap whose ith bit is 1 iff there is a level i
edge incident to a leaf in the subtree of C rooted
at w. Since trees in C are binary, the start point a
of (a, b) can be identified from $a_{i+1}$ in $O(\log n)$
time using these bitmaps. Hence, each edge level
increase costs $O(\log n)$. Since edge levels can
only increase, an edge pays a total of $O(\log^2 n)$.
Hence, we achieve an amortized update time of
$O(\log^2 n)$.

Above, we assumed that trees in C are binary.
To handle the general case, we modify C to
a different forest $C_L$ by adding a simple local
tree L(u) between each non-leaf node u and its
children. Associate with each node $v \in C$ a
rank $\mathrm{rank}(v) = \lfloor \log n(v) \rfloor$. To form L(u), let
X be the set of children of u. As long as there are
two nodes in X with the same rank r, we give them
a common parent with rank r + 1 and replace
them by this parent in X. When this procedure
terminates, we have at most $\log n$ rank trees
whose roots have pairwise distinct ranks, and we
attach these roots to a rank path whose root is
u; rank tree roots with bigger rank are attached
closer to u than rank tree roots of smaller rank.
The resulting tree L(u) is binary; see the left
subtree in Fig. 1 for an illustration. Hence, the
trees in $C_L$ are binary as well, and it is easy to see
that they have height $O(\log n)$. The performance
analysis above for C then carries through to $C_L$,
and we still have an amortized update time of
$O(\log^2 n)$.

Faster Deterministic Fully-Dynamic Graph Connectivity,
Fig. 1 Hybrid local tree for u. Left subtree $T_h(u)$
is a simple local tree for the heavy children and the black
subtrees are rank trees attached to a rank path ending in
the root of $T_h(u)$. Right subtree $T_l(u)$ is a lazy local tree
for the light children; see [8, 9] for details on the structure
of this tree
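The construction of a simple local tree from rank trees and a rank path can be sketched as follows. This is an illustrative static construction only, with no support for the merges and splits the data structure needs; the input format (a dictionary mapping each child name to n(v)) and the nested-tuple tree representation are assumptions made for this sketch.

```python
def build_rank_trees(child_sizes):
    """Pair up nodes of equal rank until all remaining roots have distinct ranks.

    child_sizes maps a child name to n(v) >= 1. Returns a list of
    (rank, tree) pairs sorted by decreasing rank, where a tree is either a
    child name or a pair (left, right) of trees.
    """
    by_rank = {}
    for name, size in child_sizes.items():
        node = (size.bit_length() - 1, name)        # rank = floor(log2 n(v))
        while node[0] in by_rank:
            other = by_rank.pop(node[0])
            # Two roots of rank r get a common parent of rank r + 1.
            node = (node[0] + 1, (other[1], node[1]))
        by_rank[node[0]] = node
    return sorted(by_rank.values(), key=lambda t: -t[0])


def simple_local_tree(child_sizes):
    """Attach the rank tree roots to a rank path, bigger ranks closer to the root."""
    tree = None
    for _, sub in reversed(build_rank_trees(child_sizes)):
        tree = sub if tree is None else (sub, tree)
    return tree
```

For example, children of sizes 1, 1, 2, and 5 collapse into a single rank tree of rank 3, while children of sizes 1, 2, and 4 have distinct ranks and are simply strung along the rank path.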


A Faster Data Structure 

The above data structure has update time
$O(\log^2 n)$ and query time $O(\log n)$. We now
sketch how to speed up both bounds by a factor
of $\log\log n$. We can get the speedup for query
time by adding an upward shortcutting system
in $C_L$ where for each leaf-to-root path, we have
shortcuts each skipping $O(\log\log n)$ vertices.
Maintaining this shortcutting system can be 
done efficiently. This system also gives a factor 
log log n speedup for each of the upward searches 
performed by the search procedures described 
earlier. Speeding up downward searches can be 
done using a variant of a downward shortcutting 
system of Thorup [8]. 

These two shortcutting systems alone
do not suffice to improve update time to
$O(\log^2 n/\log\log n)$. The data structure needs
to support merges and splits of clusters, and
with the simple local trees defined above, this
costs $O(\log n)$ time per merge/split, and each
edge needs to pay a total of $O(\log^2 n)$ for this
over all its level increases. Thorup [8] considered
lazy local trees which can be maintained much
more efficiently under cluster merges/splits.
However, using these trees to form $C_L$ may
increase the height of trees in this forest to order
$\log n \log\log n$, which will slow down our upward
shortcutting system by a factor of $\log\log n$. To
handle this, consider a hybrid of the simple local
tree and Thorup’s lazy local tree. For a non-leaf
node u in C, a child v is called heavy if
$n(v) \geq n(u)/\log^{\epsilon} n$, where $\epsilon > 0$ is a constant
that we can pick arbitrarily small. A child that is not
heavy is called light. Now, the hybrid local tree
of u consists of a simple local tree $T_h(u)$ for the
heavy children and a lazy local tree $T_l(u)$ for the
light children; see Fig. 1 for an illustration. It can
be shown that trees in $C_L$ have height $O(\frac{1}{\epsilon} \log n)$
if we use hybrid local trees. Furthermore, $C_L$ can
be maintained efficiently under cluster merges
and splits. The reason is that, although the hybrid
local trees contain simple local trees which are
expensive to maintain, these simple local trees
are very small as each of them has at most
$\log^{\epsilon} n$ leaves. Hence, maintaining them is not
a bottleneck in the data structure.

Combining hybrid local trees with the
two shortcutting systems suffices to obtain
a factor $\log\log n$ speedup for updates and
queries. This gives a deterministic data structure
with $O(\log^2 n/\log\log n)$ update time and
$O(\log n/\log\log n)$ query time.


Open Problems 


Two main open problems for dynamic connectivity
are:


• Is there a data structure with $O(\log n)$ operation
  time (which would be optimal by the
  lower bound in [7])?

• Is there a data structure with worst-case
  polylogarithmic operation time which is not Monte
  Carlo?


Problem Definition


The problem of interest is to find a virtual backbone
with a certain level of fault tolerance. A virtual
backbone is a subset of nodes in charge
of routing messages among the other nodes; it is
a very effective tool for improving the communication
efficiency of various wireless networks such
as mobile ad hoc networks and wireless sensor 
networks [3]. It is known that a virtual backbone 
with smaller cardinality works more efficiently. 
Without the fault-tolerance consideration, the
problem of computing a minimum cardinality virtual
backbone can be formulated as a minimum
connected dominating set problem [1], which is
a well-known NP-hard problem [2]. To improve
the fault tolerance of a connected dominating set
C in homogeneous wireless networks, C needs to
exhibit two additional properties [4]:

• k-connectivity: C has to be k-vertex-connected
  so that the virtual backbone can
  survive even after $k - 1$ backbone nodes fail.
• m-domination: each node v has to be adjacent
  to at least m nodes in C so that v can
  still be connected even after $m - 1$ neighboring
  backbone nodes fail.


The actual values of the two integers k and m can
be determined by a network operator based on the
degree of fault tolerance desired. The majority of
the results on this topic consider homogeneous
wireless networks, that is, wireless networks
of uniform hardware functionality. In this case,
the network can be abstracted using the unit disk
graph model [6].


Mathematical Formulation 

Given a unit disk graph G = (V, E), a subset
$C \subseteq V$ is a dominating set in G if each node
$v \in V \setminus C$ has a neighboring node in C. C is an
m-dominating set in G if each node $v \in V \setminus C$
has at least m neighboring nodes in C. C is a
connected dominating set in G if C is a dominating
set in G and if G[C], the subgraph of G
induced by C, is connected. C is a k-connected
dominating set in G if C is a dominating set in
G and G[C] is k-vertex-connected. Finally, C is
a k-connected m-dominating set if (a) G[C] is
k-vertex-connected and (b) C is an m-dominating
set in G. Given G = (V, E), the minimum
k-connected m-dominating set problem is to find
a minimum cardinality subset C of V satisfying
those two requirements.
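The two requirements can be checked by brute force on small instances. The sketch below is a verifier only, not an approximation algorithm, and makes several assumptions for illustration: the graph is given as a dictionary mapping each node to the set of its neighbors, it need not be a unit disk graph, and a graph is taken to be k-vertex-connected when it has more than k nodes and stays connected after removing any fewer than k nodes.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is nonempty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def is_k_vertex_connected(adj, nodes, k):
    """Brute force: more than k nodes and no separator with fewer than k nodes."""
    nodes = set(nodes)
    if len(nodes) <= k:
        return False
    for r in range(k):
        for cut in combinations(nodes, r):
            if not is_connected(adj, nodes - set(cut)):
                return False
    return True


def is_k_connected_m_dominating(adj, C, k, m):
    """Check conditions (a) and (b) of the definition for the node set C."""
    C = set(C)
    m_dominating = all(len(adj[v] & C) >= m for v in adj if v not in C)
    return m_dominating and is_k_vertex_connected(adj, C, k)


# Example: a 4-cycle 0-1-2-3 plus a node 4 adjacent to 0 and 2.
adj = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3, 4}, 3: {0, 2}, 4: {0, 2}}
backbone = {0, 1, 2, 3}
```

Here the cycle is a 2-connected 2-dominating set, but it is neither 3-connected (removing nodes 0 and 2 separates 1 from 3) nor 3-dominating.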


Key Results 


The need for fault tolerance in virtual backbones
was first discussed by Dai and Wu [4]. Since the
minimum k-connected m-dominating set problem
is NP-hard, many efforts have been made to design
a constant factor approximation algorithm for the
problem. In [7], Wang et al. proposed a constant
factor approximation algorithm for the problem
with k = 2 and m = 1. In [8], Shang et al.
introduced a constant factor approximation
algorithm for arbitrary integer m and k = 1, 2.
Later, much effort went into constant factor
approximation algorithms for arbitrary pairs of
k and m [9-13]. However, each of them either
fails or loses the claimed constant approximation
bound in some instances when $k \geq 3$ [14, 15].

In [16], the authors introduce an O(1)-approximation
algorithm, Fault-Tolerant Connected
Dominating Sets Computation Algorithm (FT-CDS-CA),
which computes (3, m)-CDSs, that is, 3-connected
m-dominating sets, in unit disk graphs (UDGs).
The core part of the algorithm computes
a (3, 3)-CDS, and it can then be easily
adapted to compute a (3, m)-CDS for any $m \geq 1$.
The following sections introduce some key
ideas and results of this work.


Constant Approximation for 3-Connected 
m-Dominating Set 


Core Idea 
The algorithm starts from a 2-connected
3-dominating set $Y_0 := C_{2,3}$, which can be computed
by the algorithm in [8]. Then, it augments the
connectivity of the subset by adding a set of
nodes $C_0 \subseteq V \setminus Y_0$ into $Y_0$ while guaranteeing
that the number of newly added nodes $|C_0|$ is within
a constant factor of $|Y_0|$. In order to do so,
[16] introduces the concept of a good node and a
bad node. A node u in a 2-connected graph $G_2$ is
called a good node if $G_2 \setminus \{u\}$ is still 2-connected,
that is, u cannot constitute a separator with any
other node in $G_2$; otherwise it is a bad node. An
important observation is that a 2-connected graph
without bad nodes is 3-connected. Then [16]
shows that one can always convert a bad node
into a good node by adding a constant number
of nodes into $Y_0$ while not introducing new bad
nodes, and gives an efficient way to achieve
this goal. By repeatedly changing bad nodes in
$Y_0$ into good nodes until no bad node is left, $Y_0$
eventually becomes 3-connected, and its size is
guaranteed to be within a constant factor of the
optimal solution.
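The good/bad classification can be made concrete with a brute-force sketch for small graphs. It is meant only to illustrate the definition and is not the procedure used by FT-CDS-CA; the adjacency-dictionary input and the function names are assumptions of this sketch, and a graph is treated as 2-connected when it has more than two nodes, is connected, and has no cut vertex.

```python
def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is nonempty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def is_two_connected(adj, nodes):
    """More than two nodes, connected, and no single node disconnects it."""
    nodes = set(nodes)
    return (len(nodes) > 2
            and is_connected(adj, nodes)
            and all(is_connected(adj, nodes - {x}) for x in nodes))


def bad_nodes(adj):
    """Nodes u such that the graph minus u is no longer 2-connected."""
    V = set(adj)
    return {u for u in V if not is_two_connected(adj, V - {u})}


# A 4-cycle is 2-connected but every node is bad; K4 has no bad node.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

The two examples match the observation in the text: the 4-cycle has bad nodes and is not 3-connected, whereas K4 has none and is 3-connected.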


Brief Description 


A. Removing Separators If a 2-connected graph
$G_2$ is not 3-connected, then there exists a pair
of nodes u and v, called a separator of $G_2$, such
that $G_2 \setminus \{u, v\}$ splits into several parts. It can
be shown, due to the properties of UDGs, that by
adding the internal nodes of at most a constant
number of $H_3$-paths (by an $H_3$-path we mean
a path of length at most three connecting two
nodes of a subgraph H of G, the internal nodes
of which do not belong to H) into $Y_0$, {u, v} is
no longer a separator of $Y_0$, and the newly
added nodes are good nodes because $Y_0 = C_{2,3}$ is a
3-dominating set.


B. Decomposition of a Connected Graph into a
Leaf-Block Tree In graph theory, a block of a
graph is a maximal 2-connected subgraph [5].
Given a 2-connected subgraph $Y_0$ (initially, this
is a $C_{2,3}$) and the set X of bad nodes in $Y_0$, we
select $v \in X$ as a root and compute a leaf-block
tree $T_0$ of $Y_0 \setminus \{v\}$ [5]. Then, $T_0$ consists of
a set of blocks $\{B_1, B_2, \ldots, B_s\}$ and a set of cut
vertices $\{c_1, \ldots, c_t\}$. An important fact is that v
can constitute a separator only with another node
in $\{c_1, \ldots, c_t\}$.

C. Good Blocks vs Bad Blocks In the process of
decomposing a block B with root v into a
leaf-block tree, it is important to identify those blocks
$B_i$ containing internal bad nodes, that is, those
bad nodes in $B_i$ that cannot be connected with
nodes outside $B_i$ directly without going through
v (otherwise, they are called external bad nodes).
We call such a block $B_i$ with (resp. without)
an internal bad node a bad block (resp. a good
block). A key fact is that an internal bad node
in $B_i$ can only constitute a separator of $Y_0$ with
another node inside $B_i$, while this may not be true
for external bad nodes.


D. Multilevel Decomposition The purpose of the
multilevel decomposition is to find a block B
with root v such that $B \setminus \{v\}$ contains only
good blocks. We assume that $X \neq \emptyset$, since
otherwise $Y_0$ is already 3-connected. After setting
$B \leftarrow Y_0$, FT-CDS-CA first picks one $v \in X$
and starts the initial decomposition process
(say, level-0 decomposition). Then, $B \setminus \{v\}$ is
decomposed into a (level-0) leaf-block graph $T_0$,
which is a tree whose vertices consist of a set
of blocks $\{B_1, \ldots, B_s\}$ and a set of cut vertices
$\{c_1, c_2, \ldots, c_t\}$ ($s \geq 2$ and $t \geq 1$). Now,
FT-CDS-CA examines each block in $T_0$ to see if
there is a block $B_i$ having an internal bad node
in it. If all blocks are good blocks, then we are
done in this step. Otherwise, there must exist
some $B_i$ having an internal node $w \in B_i$ which
constitutes a separator {w, u} of $Y_0$ with another
node $u \in B_i \subseteq Y_0$. Now, set $v \leftarrow w$ and
$B \leftarrow B_i$, and start the next level (level-1)
decomposition. By repeating this process, we keep
making the problem smaller and eventually find a
block B with root v such that $B \setminus \{v\}$ contains
only good blocks.


E. Merging Blocks (Reconstructing the Leaf-Block
Tree with a New Root) After the multilevel
decomposition process, we obtain a series of
blocks $Y_l \subset Y_{l-1} \subset \cdots \subset Y_1 \subset Y_0$, where $Y_l = B$
is the final block with a root v such that there is
no bad block in the leaf-block tree of $Y_l \setminus \{v\}$.
In the induced subgraph $G[Y_l]$, v constitutes a
separator with any of $c_1, c_2, \ldots, c_t$, but in $Y_0$
this is not necessarily true, since there exist some
blocks $B_i$ having external nodes that are adjacent
to some nodes in $Y_0 \setminus Y_l$ (otherwise, $Y_l$ and
$Y_0 \setminus Y_l$ cannot be connected with each other). So
these blocks that can be connected directly with
$Y_0 \setminus Y_l$ without going through v should be merged
together with $Y_0 \setminus Y_l$ into a larger block. After
merging all possible blocks into one bigger block,
we obtain a modified leaf-block tree $T_l'$ in which
one bigger block VB (we call it a virtual block)
is added representing all the merged blocks and
$Y_0 \setminus Y_l$, and all the cut vertices $c_i$ which do not
constitute a separator with v have to be removed.
Moreover, we mark every remaining cut vertex of
VB as a virtual cut vertex. In essence, the above
merging process can be considered as a process
to generate a leaf-block tree directly from $Y_0 \setminus \{v\}$
with all blocks being good except possibly for the
virtual block.


F. One Bad-Node Elimination At this point,
we have a leaf-block tree $T_l'$ with
$V(T_l') = \{B_1, B_2, \ldots, B_s, VB\} \cup \{c_1, \ldots, c_t\}$, which is
obtained through the decomposition of $Y_0 \setminus \{v\}$
(or, equivalently, through the merging process),
where v is the internal bad node chosen as root
in $B = Y_l$. Note that in $T_l'$, every $B_i$ is a good block
except possibly for the virtual block VB. The key
point here is that we must have $s \geq 1$; otherwise
v would be a good node. In this step, a simple
process is employed to make either v or one of
the cut vertices in $\{c_1, c_2, \ldots, c_t\} \setminus C$ (C is the
set of cut vertices in the virtual block VB)
a good node. Consider two cases: (i) if the
leaf-block tree $T_l'$ has only virtual cut vertices (i.e.,
$T_l'$ is a star centered at VB), then the bad node v
becomes a good node by removing the separators
consisting of v and the virtual cut vertices, and
(ii) if the leaf-block tree $T_l'$ has a cut vertex which
is not a virtual cut vertex, then we can find a path
$P = (B_0, c_0, \ldots, R)$ in $T_l'$ with one endpoint $B_0$
being a leaf in the tree $T_l'$ and the other endpoint
R being a block (cut vertex) with degree larger
than two or the virtual block VB (a virtual cut
vertex c), if the former does not exist. In this
case, two consecutive blocks $B_i$ and $B_{i+1}$ can
be found which share a common cut vertex $c_i$.
Then it can be shown that at most five $H_3$-paths
are needed such that $c_i$ cannot constitute a
separator of $Y_0$ with any of the remaining cut
vertices or v. Meanwhile, it is still possible that $c_i$
may constitute separators of $Y_0$ with the external
nodes of $B_i$ and $B_{i+1}$ (clearly $c_i$ cannot constitute
separators of $Y_0$ with the internal nodes of $B_i$ and
$B_{i+1}$). It can be proved that the total number of
external nodes in $B_i$ and $B_{i+1}$ that may constitute
separators of $Y_0$ with $c_i$ is at most five. In both
cases, the number of $H_3$-paths added to change
one bad node (v or $c_i$) into a good node is at most
a constant.


Open Problems 


While Wang et al. [16] introduce a
constant factor approximation algorithm for the
minimum k-connected m-dominating set problem
in unit disk graphs with k = 3 and arbitrary
integer $m \geq 1$, it is still open to design an
approximation algorithm for the case $k \geq 4$.


Experimental Results 


Wang et al.’s work [16] presents some simulation
results. The results show that when a 2-connected
3-dominating set computed by Shang et al.’s
approach [8] is augmented to a 3-connected
3-dominating set using their algorithm, the size
of the connected dominating set increases
modestly, by roughly less than 25%. Their algorithm
is also compared with an optimal solution obtained
by exhaustive computation on small-scale
random unit disk graphs. The result shows that the
performance gap between the exact algorithm and
their algorithm is no greater than 39.27%.


Problem Definition


Fault tolerance is the study of reliable computa- 
tion using unreliable components. With a given 
noise model, can one still reliably compute? For 
example, one can run many copies of a clas- 
sical calculation in parallel, periodically using 
majority gates to catch and correct faults. Von 
Neumann showed in 1956 that if each gate fails 
independently with probability p, flipping its output
bit $0 \leftrightarrow 1$, then such a fault tolerance scheme
still allows for arbitrarily reliable computation 
provided that p is below some constant threshold 
(whose value depends on the model details) [10]. 
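The benefit of majority voting over redundant copies can be illustrated with a small calculation. The sketch below is a simplified model, not Von Neumann's construction: it assumes n independent copies, each wrong with probability p, and a perfect majority vote, whereas the threshold analysis must also account for faults in the majority gates themselves. The function name is illustrative.

```python
from math import comb


def majority_failure(p, n):
    """Probability that more than half of n independent copies are wrong."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


# With p = 0.1, three copies fail with probability 3 p^2 (1 - p) + p^3 = 0.028,
# which is already smaller than p; five copies do better still.
three_copies = majority_failure(0.1, 3)
five_copies = majority_failure(0.1, 5)
```

In this idealized model the failure probability of the vote drops below p whenever p < 1/2, and it shrinks further as n grows.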

In a quantum computer, the basic gates are
much more vulnerable to noise than classical
transistors; after all, depending on the
implementation, they are manipulating single electron
spins, photon polarizations, and similarly fragile
subatomic particles. It might not be possible
to engineer systems with noise rates less than
$10^{-2}$, or perhaps $10^{-3}$, per gate. Additionally,
the phenomenon of entanglement makes quantum
systems inherently fragile. For example, in
Schrödinger’s cat state (an equal superposition
between a living cat and a dead cat, often
idealized as $\frac{1}{\sqrt{2}}(|0^n\rangle + |1^n\rangle)$), an interaction with
just one quantum bit (“qubit”) can collapse, or
decohere, the entire system. Fault tolerance
techniques will therefore be essential for achieving
the considerable potential of quantum computers.
Practical fault tolerance techniques will need to
control high noise rates and do so with low
overhead, since qubits are expensive.

Fault-Tolerant Quantum Computation, Fig. 1 Bit-flip
X errors flip 0 and 1. In a qubit, $|0\rangle$ and $|1\rangle$ might be
represented by horizontal and vertical polarization of a
photon, respectively. Phase-flip Z errors flip the $\pm 45^\circ$
polarized states $|+\rangle$ and $|-\rangle$

Quantum systems are continuous, not discrete,
so there are many possible noise models.
However, the essential features of quantum noise for
fault tolerance results can be captured by a simple
discrete model similar to the one Von Neumann
used. The main difference is that, in addition to
bit-flip X errors which swap 0 and 1, there can
also be phase-flip Z errors which swap
$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$
(Fig. 1). A noisy gate is modeled as a perfect gate
followed by independent introduction of X, Z, or
Y (which is both X and Z) errors with respective
probabilities $p_X$, $p_Z$, $p_Y$. One popular model is
independent depolarizing noise ($p_X = p_Z = p_Y = p/3$);
a depolarized qubit is completely
randomized.
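The action of X and Z errors, and sampling from the depolarizing model, can be sketched with plain Python. The representation is an assumption made for illustration: a single-qubit state is a pair of amplitudes, a gate is a 2-by-2 tuple of rows, and `sample_pauli_error` is a hypothetical helper that returns the label of the error applied after a perfect gate.

```python
import random

X = ((0, 1), (1, 0))     # bit flip: swaps |0> and |1>
Z = ((1, 0), (0, -1))    # phase flip: swaps |+> and |->


def apply(gate, state):
    """Multiply a 2x2 gate by a single-qubit state (pair of amplitudes)."""
    return tuple(sum(gate[r][c] * state[c] for c in range(2)) for r in range(2))


def sample_pauli_error(p, rng=random):
    """Depolarizing noise: X, Y, Z each with probability p/3, otherwise no error."""
    u = rng.random()
    if u < p / 3:
        return "X"
    if u < 2 * p / 3:
        return "Y"
    if u < p:
        return "Z"
    return "I"


s = 2 ** -0.5
zero, one = (1, 0), (0, 1)
plus, minus = (s, s), (s, -s)
```

With these definitions, `apply(X, zero)` yields the amplitudes of |1> and `apply(Z, plus)` yields those of |->, matching the description of bit-flip and phase-flip errors.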

Faulty measurements and preparations of 
single-qubit states must additionally be modeled, 
and there can be memory noise on resting 
qubits. It is often assumed that measurement 
results can be fed into a classical computer 
that works perfectly and dynamically adjusts 
the quantum gates, although such control 
is not necessary. Another common, though 
unnecessary, assumption is that any pair of 
qubits in the computer can interact; this is called 
a nonlocal gate. In many proposed quantum 
computer implementations, however, qubit 
mobility is limited so gates can be applied only 
locally, between physically nearby qubits. 


Key Results 


The key result in fault tolerance is the existence 
of a noise threshold, for certain noise and compu- 
tational models. The noise threshold is a positive, 
constant noise rate (or set of model parameters) 
such that with noise below this rate, reliable 
computation is possible. That is, given an inputless
quantum circuit C of perfect gates, there
exists a “simulating” circuit FTC of faulty gates
such that with probability at least 2/3, say, the
measured output of C agrees with that of FTC.
Moreover, FTC should be only polynomially
larger than C.

A quantum circuit with N gates can a priori 
tolerate only O(1/N) error per gate, since 
a single failure might randomize the entire 
output. In 1996, Shor showed how to tolerate 
O(1/poly(log N)) error per gate by encoding 
each qubit into a poly(log N)-sized quantum 
error-correcting code and then implementing 
each gate of the desired circuit directly on the 
encoded qubits, alternating computation and 
error correction steps (similar to Von Neumann’s 
scheme) [8]. Shor’s result has two main technical 
pieces: 

1. The discovery of quantum error-correcting
codes (QECCs) was a major result. Re- 
markably, even though quantum errors can 
be continuous, codes that correct discrete 
errors suffice. (Measuring the syndrome of 
a code block projects into a discrete error 
event.) The first quantum code, discovered 
by Shor, was a nine-qubit code consisting 
of the concatenation of the three-qubit
repetition code $|0\rangle \mapsto |000\rangle$, $|1\rangle \mapsto |111\rangle$
to protect against bit-flip errors, with its
dual $|+\rangle \mapsto |{+}{+}{+}\rangle$, $|-\rangle \mapsto |{-}{-}{-}\rangle$ to
protect against phase-flip errors. Since then, 
many other QECCs have been discovered. 
Codes like the nine-qubit code that can 
correct bit- and phase-flip errors separately 
are known as Calderbank-Shor-Steane (CSS) 
codes and have quantum code words which 
are simultaneously superpositions over code 
words of classical codes in both the $|0\rangle/|1\rangle$ and
$|+\rangle/|-\rangle$ bases.


2. QECCs allow for quantum memory or for 
communicating over a noisy channel. For 
computation, however, it must be possible 
to compute on encoded states without first 
decoding. An operation is said to be fault 
tolerant if it cannot cause correlated errors 
within a code block. With the n-bit majority 
code, all classical gates can be applied 
transversely — an encoded gate can be 
implemented by applying the unencoded gate 
to bit i of each code block, $1 \le i \le n$.
This is fault tolerant because a single failure
affects at most 1 bit in each block, and thus,
failures can’t spread too quickly. For CSS 
quantum codes, the controlled-NOT gate 
CNOT, $|a,b\rangle \mapsto |a, a \oplus b\rangle$, can similarly
be applied transversely. However, the CNOT
gate by itself is not universal, so Shor also
gave a fault-tolerant implementation of the
Toffoli gate $|a,b,c\rangle \mapsto |a,b,c \oplus (a \wedge b)\rangle$.
Procedures are additionally needed for error 
correction using faulty gates and for the initial 
preparation step. The encoding of $|0\rangle$ will
be a highly entangled state and difficult to
prepare (unlike $0^n$ for the classical majority
code). 
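
The classical half of the transversal-gate argument above can be sketched in a few lines. This is only an illustration for the n-bit majority code with the classical gate (a, b) -> (a, a XOR b); the helper names are assumptions for this example, and nothing here models quantum states.

```python
def encode(bit, n):
    """Encode one bit in the n-bit majority (repetition) code."""
    return [bit] * n


def majority(block):
    """Decode a block by majority vote (n is assumed odd)."""
    return int(sum(block) > len(block) // 2)


def flip(block, i):
    """Return a copy of the block with a bit-flip fault at position i."""
    out = list(block)
    out[i] ^= 1
    return out


def transversal_xor(control, target):
    """Apply (a, b) -> (a, a XOR b) to bit i of each block, for every i."""
    return list(control), [a ^ b for a, b in zip(control, target)]


if __name__ == "__main__":
    n = 5
    faulty_control = flip(encode(1, n), 2)          # one fault before the gate
    c, t = transversal_xor(faulty_control, encode(0, n))
    # The fault spreads to the target, but only to position 2 of that block.
    print(c, t, majority(c), majority(t))
```

The single fault ends up as one wrong bit in each output block, so majority decoding of either block still returns the correct value.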

However, Shor did not prove the existence of a
constant tolerable noise rate, a noise threshold. 
Several groups — Aharonov/Ben-Or, Kitaev, and 
Knill/Laflamme/Zurek — each had the idea of 
using smaller codes and concatenating the proce- 
dure repeatedly on top of itself. Intuitively, with a 
distance-three code (i.e., code that corrects any 
one error), one expects the “effective” logical 
error rate of an encoded gate to be at most $cp^2$
for some constant c, because one error can be
corrected but two errors cannot. The effective
error rate for a twice-encoded gate should then
be at most $c(cp^2)^2$; and since the effective error
rate is dropping doubly exponentially fast in the
number of levels of concatenation, the overhead
in achieving a 1/N error rate is only poly(log
N). The threshold for improvement, $cp^2 < p$,
is $p < 1/c$. However, this rough argument is
not rigorous, because the effective error rate is ill 
defined, and logical errors need not fit the same 
model as physical errors (e.g., they will not be 
independent). 
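
The heuristic recursion in this paragraph can be tabulated directly. The sketch below only iterates the non-rigorous estimate $p \mapsto cp^2$; the constant c = 10^4 and the physical rate p = 10^-5 are hypothetical values chosen for illustration, not figures from the cited proofs.

```python
def effective_error_rate(p, c, levels):
    """Iterate the heuristic map p -> c * p**2 once per level of concatenation."""
    rate = p
    for _ in range(levels):
        rate = c * rate * rate
    return rate


def levels_needed(p, c, target):
    """Smallest number of levels whose heuristic error rate is at most `target`."""
    if p >= 1.0 / c:
        raise ValueError("no improvement expected: p is not below 1/c")
    levels, rate = 0, p
    while rate > target:
        rate = c * rate * rate
        levels += 1
    return levels


if __name__ == "__main__":
    c, p = 1e4, 1e-5  # hypothetical constants, so the heuristic threshold is 1/c = 1e-4
    for k in range(4):
        print(k, effective_error_rate(p, c, k))
    print(levels_needed(p, c, 1e-11))
```

In closed form the iteration gives $(cp)^{2^k}/c$ after k levels, which is the doubly exponential decay mentioned above.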

Aharonov and Ben-Or and Kitaev gave in- 
dependent rigorous proofs of the existence of a 
positive constant noise threshold, in 1997 [1,5]. 

Broadly, there has since been progress on two 
fronts of the fault tolerance problem: 


1. First, work has proceeded on extending the set 
of noise and computation models in which a 
fault tolerance threshold is known to exist. For 
example, correlated or even adversarial noise, 
leakage errors (where a qubit leaves the $|0\rangle$,
$|1\rangle$ subspace), and non-Markovian noise (in
which the environment has a memory) have all 
been shown to be tolerable in theory, even with 
only local gates. 


2. Threshold existence proofs establish that 
building a working quantum computer is 
possible in principle. Physicists need only 
engineer quantum systems with a low 
enough constant noise rate. But realizing 
the potential of a quantum computer will 
require practical fault tolerance schemes. 
Schemes will have to tolerate a high noise 
rate (not just some constant) and do so with 
low overhead (not just polylogarithmic). 
However, rough estimates of the noise rate
tolerated by the original existence proofs
are not promising: below $10^{-6}$ noise per
gate. If the true threshold is only $10^{-6}$,
then building a quantum computer will be
next to impossible. Therefore, second, there
has been substantial work on optimizing 
fault tolerance schemes primarily in order 
to improve the tolerable noise rate. These 
optimizations are typically evaluated with 
simulations and heuristic analytical models. 
Recently, though, Aliferis, Gottesman, and 
Preskill have developed a method to prove 
reasonably good threshold lower bounds, up 
to $2 \times 10^{-4}$, based on counting “malignant”
sets of error locations [3]. 


In a breakthrough, Knill has constructed a novel 
fault tolerance scheme based on very efficient 
distance-two codes [6]. His codes cannot correct 
any errors, and the scheme uses extensive
postselection on no detected errors, i.e., on detecting
an error, the enclosing subroutine is restarted. He 
has estimated a threshold above 3% per gate, 
an order of magnitude higher than previous es- 
timates. Reichardt has proved a threshold lower 
bound of $10^{-3}$ for a similar scheme [7], somewhat
supporting Knill’s high estimate. However,
reliance on postselection leads to an enormous 
overhead at high error rates, greatly limiting prac- 
ticality. (A classical fault tolerance scheme based 
on error detection could not be efficient, but 
quantum teleportation allows Knill’s scheme to 
be at least theoretically efficient.) There seems to 
be a tradeoff between the tolerable noise rate and
the overhead required to achieve it. 

There are several complementary approaches 
to quantum fault tolerance. For maximum 
efficiency, it is wise to exploit any known 
noise structure before switching to general fault 
tolerance procedures. Specialized techniques 
include careful quantum engineering, techniques 
from nuclear magnetic resonance (NMR) such 
as dynamical decoupling and composite pulse 
sequences, and decoherence-free subspaces. For 
very small quantum computers, such techniques 
may give sufficient noise protection. 

It is possible that an inherently reliable 
quantum-computing device will be engineered 
or discovered, like the transistor for classical
computing, and this is the goal of topological 
quantum computing [4]. 


Applications 


As quantum systems are noisy and entanglement 
fragile, fault tolerance techniques will probably 
be essential in implementing any quantum algo- 
rithms — including efficient factoring and quan- 
tum simulation. 

The quantum error-correcting codes originally 
developed for fault tolerance have many other 
applications, including quantum key distribution. 


Open Problems 


Dealing with noise may turn out to be the most 
daunting task in building a quantum computer. 
Currently, physicists’ low-end estimates of 
achievable noise rates are only slightly below 
theorists’ high-end (mostly simulation based) 
estimates of tolerable noise rates, at reasonable 
levels of overhead. However, these estimates 
are made with different noise models — most 
simulations are based on the simple independent 
depolarizing noise model, and threshold lower 
bounds for more general noise are much lower. 
Also, both communities may be too
optimistic. Unanticipated noise sources may
well appear as experiments progress. The 
probabilistic noise models used by theorists 
in simulations may not match reality closely 
enough, or the overhead/threshold tradeoff 
may be impractical. It is not clear if fault- 
tolerant quantum computing will work in 
practice, unless inefficiencies are wrung out 
of the system. Developing more efficient fault 
tolerance techniques is a major open problem. 
Quantum system engineering, with more realistic 
simulations, will be required to understand better 
various tradeoffs and strategies for working with 
gate locality restrictions. 

The gaps between threshold upper bounds, 
threshold estimates, and rigorously proven 
threshold lower bounds are closing, at least 
for simple noise models. Our understanding of 
what to expect with more realistic noise models 
is less developed, though. One current line of
research is in extending threshold proofs to more 
realistic noise models — e.g., [2]. A major open 
question here is whether a noise threshold can be 
shown to even exist where the bath Hamiltonian 
is unbounded — e.g., where system qubits are 
coupled to a non-Markovian, harmonic oscillator 
bath. Even when a threshold is known to exist, 
rigorous threshold lower bounds in more general 
noise models may still be far too conservative 
(according to arguments, mostly intuitive, known 
as “twirling”) and, since simulations of general
noise models are impractical, new ideas are 
needed for more efficient analyses. 

Theoretically, it is of interest to determine the best
asymptotic overhead of the simulating circuit
FTC versus C. Overhead can be measured in
terms of size N and depth/time T. With concatenated
coding, the size and depth of FTC are
O(N polylog N) and O(T polylog N), respectively.
For a classical circuit C, however, the depth
can be only O(T). It is not known if the quantum
depth overhead can be improved.


Experimental Results 


Fault tolerance schemes have been simulated 
for large quantum systems, in order to obtain 
threshold estimates. For example, extensive 
simulations including geometric locality 
constraints have been run by Thaker et al. [9]. 

Error correction using very small codes has 
been experimentally verified in the lab. 


URL to Code 


Andrew Cross has written and distributes code 
for giving Monte Carlo estimates of and rigorous 
lower bounds on fault tolerance thresholds: 
http://web.mit.edu/awcross/www/qasm-tools/. 
Emanuel Knill has released Mathematica code 
for estimating fault tolerance thresholds for 
certain postselection-based schemes:
http://arxiv.org/e-print/quant-ph/0404104.


Problem Definition


To subdivide an edge e in a graph G with end- 
points u and v, delete the edge from the graph and 
add a path of length two connecting the vertices 
u and v. A graph G is a subdivision of graph 
H if G can be obtained from H by repeatedly
subdividing edges. A graph H is a topological 
subgraph (or topological minor) of graph G if 
a subdivision of H is a subgraph of G. Equiva- 
lently, H is a topological subgraph of G if H can 
be obtained from G by deleting edges, deleting 
vertices, and suppressing vertices of degree 2 (to 
suppress a vertex of degree 2, delete the vertex 
and add an edge connecting its two neighbors). 
The notion of topological subgraphs appears in 
the classical result of Kuratowski in 1930 stating
that a graph is planar if and only if it does not have 
a topological subgraph isomorphic to $K_5$ or $K_{3,3}$.
This entry considers the problem of determining,
given graphs G and H, whether G contains H
as a topological minor. 
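
The two elementary operations in these definitions can be sketched on an adjacency-set representation of a simple graph. The function names and the dictionary representation are assumptions made for this example. In a simple graph, suppressing a vertex whose two neighbors are already adjacent does not create a parallel edge.

```python
def subdivide_edge(adj, u, v, new_vertex):
    """Replace the edge uv by the path u - new_vertex - v (modifies adj in place)."""
    adj[u].remove(v)
    adj[v].remove(u)
    adj[new_vertex] = {u, v}
    adj[u].add(new_vertex)
    adj[v].add(new_vertex)


def suppress_vertex(adj, w):
    """Delete the degree-2 vertex w and join its two neighbors by an edge."""
    a, b = adj.pop(w)  # raises ValueError unless w has exactly two neighbors
    adj[a].remove(w)
    adj[b].remove(w)
    adj[a].add(b)
    adj[b].add(a)


if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}   # a triangle
    subdivide_edge(g, 0, 1, 3)              # now a cycle of length four
    print(g)
    suppress_vertex(g, 3)                   # back to the triangle
    print(g)
```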


Topological Subgraph Testing 


Input: Graphs G and H 
Output: Determine if H is a topological subgraph of G


Observe that a graph G on n vertices contains
the cycle of length n as a topological subgraph if
and only if G contains a Hamiltonian cycle. Thus,
it is NP-complete to decide if H is a topological
subgraph of a graph G with no further restrictions
on G or H.


Previous Work 

The algorithmic problem of testing for topologi- 
cal subgraphs was already studied in the 1970s by 
Lapaugh and Rivest [12] (also see [7]). Fortune, 
Hopcroft, and Wyllie [6] showed that the analo- 
gous problem in directed graphs is NP-complete 
even when H is a fixed small graph. Robertson 
and Seymour, as a consequence of their seminal
work on graph minors, showed that for a
fixed graph H, there exists a polynomial time 
algorithm to check whether H is a topological 
subgraph of a graph G given in input. However, 
the running time of the Robertson-Seymour algorithm
is $|V(G)|^{O(|V(H)|)}$. Following this, Downey
and Fellows [4] (see also [5]) conjectured that the 
problem of topological subgraph testing is fixed 
parameter tractable: they conjectured that there 
exists a function f and a constant c such that 
there exists an algorithm for testing whether a 
graph H is a topological subgraph of G which 
runs in time $f(|V(H)|) \cdot |V(G)|^c$.

The problem of topological subgraph testing 
is closely related to that of minor testing and 
the k-disjoint paths problem. A graph H is a 
minor of G if H can be obtained from a subgraph 
of G by contracting edges. The k-disjoint paths 
problem instead takes as input k pairs of vertices
$(s_1,t_1), \ldots, (s_k,t_k)$ in a graph G and
asks if there exist pairwise internally vertex-disjoint
paths $P_1, \ldots, P_k$ such that the endpoints of
$P_i$ are $s_i$ and $t_i$ for all $1 \le i \le k$. Robertson and
Seymour [13] considered a model of labeled minor
containment that unites these two problems
and showed that there is an $O(|V(G)|^3)$ time
algorithm for both H-minor testing for a fixed 
graph H and the k-disjoint paths problem for a 
fixed value k. 

For every H, there exists a finite list 
$H_1, \ldots, H_t$ of graphs such that a graph G
contains H as a minor if and only if G contains
$H_i$ as a topological minor for some index
i; this follows from the definition of minor 
and topological minor. Thus, the problem of 
minor testing reduces to the harder problem 
of topological minor testing. It is not difficult 
to reduce the problem of topological subgraph 
containment for a fixed graph H to the k-disjoint 
paths problem. For each vertex v of H, guess
a vertex v′ of G, and then for each edge uv
of H, seek to find a path connecting u′
and v′ in G such that these |E(H)| paths are
pairwise internally vertex disjoint. This approach
yields the $|V(G)|^{O(|V(H)|)}$ time algorithm for
topological subgraph testing mentioned above. 
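
The guess-and-route idea of this reduction can be sketched as an exhaustive search for very small graphs. This is not the Robertson-Seymour disjoint paths algorithm: the routing step below is plain backtracking with exponential running time. The function names and the adjacency-set representation are assumptions for this example, and H is assumed to be a simple graph.

```python
from itertools import permutations


def internal_paths(adj, src, dst, blocked):
    """Yield the internal vertices of every simple src-dst path avoiding `blocked`."""
    def extend(v, internal):
        for w in adj[v]:
            if w == dst:
                yield tuple(internal)
            elif w not in blocked and w not in internal:
                internal.append(w)
                yield from extend(w, internal)
                internal.pop()
    yield from extend(src, [])


def is_topological_subgraph(h_vertices, h_edges, g_adj):
    """Brute force: guess images of V(H), then route E(H) along disjoint paths."""
    h_vertices, h_edges = list(h_vertices), list(h_edges)
    degree = {v: 0 for v in h_vertices}
    for u, v in h_edges:
        degree[u] += 1
        degree[v] += 1

    def route(i, image, blocked):
        if i == len(h_edges):
            return True
        u, v = h_edges[i]
        for internal in internal_paths(g_adj, image[u], image[v], blocked):
            blocked.update(internal)            # reserve this path's internal vertices
            if route(i + 1, image, blocked):
                return True
            blocked.difference_update(internal)  # backtrack
        return False

    for choice in permutations(g_adj, len(h_vertices)):
        image = dict(zip(h_vertices, choice))
        if any(len(g_adj[image[v]]) < degree[v] for v in h_vertices):
            continue  # a branch vertex needs at least its degree in H
        if route(0, image, set(choice)):
            return True
    return False


if __name__ == "__main__":
    cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    print(is_topological_subgraph("abc", triangle, cycle5))
```

The outer loop tries all injective images of V(H), which is the source of the $|V(G)|^{O(|V(H)|)}$ factor; the sketch replaces the polynomial-time disjoint paths subroutine by brute force.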


Key Results 


The following theorem of Grohe, Kawarabayashi, 
Marx, and Wollan [8] shows that topological 
subgraph testing is fixed parameter tractable, confirming
the conjecture of Downey and Fellows.


Theorem 1 For every fixed, undirected graph 
H, there is an $O(|V(G)|^3)$ time algorithm that
decides if H is a topological subgraph of G. 


Outline of the Proof 

The algorithm given by Theorem 1 builds on
the techniques first developed by Robertson and 
Seymour in their algorithm for minor testing and 
the k-disjoint paths problem. Fix a graph H and 
let G be a graph given in input. The algorithm 
separately considers each of the following three 
cases: 


1. The tree-width of G is bounded (by an appropriate
function of |V(H)|);

2. G has large tree-width, but the size of the
largest clique minor is bounded (again by an
appropriate function of |V(H)|);

3. G has a large clique minor. 


Note that in the third case, the existence of a 
large clique minor necessarily forces the graph 
G to have large tree-width. We do not use any 
technical aspects of the parameter tree-width here 
and direct interested readers to [1,2] for further 
discussion of this topic. 

The Robertson-Seymour algorithm for minor 
testing offers a roadmap for the proof of Theo- 
rem 1; the discussion of the proof of Theorem 1 
highlights where the proof builds on the tools 
of Robertson and Seymour and where new tech- 
niques are required. As in Robertson-Seymour’s 
algorithm for minor testing, the algorithm consid- 
ers a rooted version of the problem. 


G has Bounded Tree-Width 

Numerous problems can be efficiently solved 
when the input graph is restricted to have 
bounded tree-width (see [1, 3] for examples). 
For example, the k-disjoint paths problem can 
be solved in linear time in graphs of bounded 
tree-width [15]. Standard dynamic programming 
techniques can be used to solve the more general 
rooted version of the topological subgraph 
problem which the algorithm considers. 

G has Large Tree-Width, but no Large Clique Minor

Robertson and Seymour showed that graphs of 
large tree-width which do not contain a fixed 
clique minor must contain a large, almost planar 
subgraph [11,13]; this result is sometimes known 
as the flat-wall theorem. The proof of correctness 
for their disjoint paths algorithm hinges upon 
this theorem by showing that a vertex in the 
planar subgraph can be deleted without affecting 
the feasibility of a given disjoint paths problem. 
The proof of Theorem 1 builds on this approach.
Given graphs G and H, say a vertex $v \in V(G)$
is irrelevant for the problem of topological sub- 
graph testing if G contains H as a topological 
minor if and only if G — v contains H as a 
topological minor. If the algorithm can efficiently 
find an irrelevant vertex v, then it can proceed 
by recursing on the graph G — v. In order to 
apply a similar irrelevant vertex argument to that 
developed by Robertson and Seymour for the 
disjoint paths problem, the proof of Theorem 1 
shows that a large flat wall contains an irrelevant 
vertex for a given topological subgraph testing 
problem by first generalizing several technical 
results on rerouting systems of paths in graphs 
[10, 13] as well as deriving a stronger version of 
the flat-wall theorem. 


G has a Large Clique Minor 

In the Robertson and Seymour algorithm for 
minor testing, once the graph can be assumed 
to have a large clique minor, the algorithm triv- 
ially terminates. When considering the k-disjoint 
paths problem, again it is a relatively straight- 
forward matter to find an irrelevant vertex for a 
given disjoint paths problem assuming the ex- 
istence of a large clique minor. Instead, if we 
are considering the problem of testing topolog- 
ical subgraph containment, the presence of a 
large clique minor does not yield an easy re- 
cursion. Consider the case where we are testing 
for the existence of a topological subgraph of 
a 4-regular graph H in a graph G which con- 
tains a subcubic subgraph G’ such that G’ has 
a large clique minor. Whether or not G’ will 
prove useful in finding a topological subgraph 
of H in G will depend entirely on whether 
or not it is possible to link many vertices of
degree four (in G) to the clique minor in G’ 
and not on the size itself of the clique minor in 
G’. Similar issues arise in [9] when developing 
structure theorems for excluded topological sub- 
graphs. 

The proof of Theorem 1 proceeds by considering
separately the case when the large-degree
vertices can be separated from the clique minor 
by a bounded sized separator or not. If they can- 
not, one can find the necessary rooted topological 
minors. Alternatively, if they can, the algorithm 
recursively calculates the rooted topological mi- 
nors in subgraphs of G and replaces a portion 
of the graph with a bounded size gadget. This 
portion of the argument is substantially different 
from the approach of Robertson and Seymour 
to minor testing and comprises the major new 
development in the proof. 


Applications 


An immersion of a graph H into a graph G is 
defined like a topological embedding, except that 
the paths in G corresponding to the edges of H 
are only required to be pairwise edge disjoint 
instead of pairwise internally vertex disjoint. Formally,
an immersion of H into G is a mapping
$\nu$ that associates with each vertex $v \in V(H)$ a
distinct vertex $\nu(v) \in V(G)$ and with each edge
$e = vw \in E(H)$ a path $\nu(e)$ in G with endpoints
$\nu(v)$ and $\nu(w)$ in such a way that the paths $\nu(e)$
for $e \in E(H)$ are mutually edge disjoint. Robertson
and Seymour [14] showed that graphs are
well quasi-ordered under the immersion relation,
proving a conjecture of Nash-Williams. In [8],
the authors give a construction which implies the
following corollary of Theorem 1.


Corollary 1 For every fixed undirected graph
H, there is an $O(|V(G)|^3)$ time algorithm that
decides if there is an immersion of H into G.


Again, the algorithm is uniform in H, which implies
that the immersion problem is fixed parameter
tractable. This answers another open question
by Downey and Fellows [4,5]. Corollary 1 also
holds for the more restrictive “strong immersion”
version, where $\nu(v)$ cannot be an internal vertex
of the path $\nu(e)$ for any $v \in V(H)$ and $e \in E(H)$.


Problem Definition


In the classical bin packing (BP) problem, we are 
given a set of items with rational sizes between 0 
and 1, and we try to pack them into a minimum
number of bins of unit size so that no bin contains 
items with total size more than 1. The problem 
definition originates in the early 1970s: Johnson’s 
thesis [10] on bin packing together with Gra- 
ham’s work on scheduling [8, 9] (among other 
pioneering works) started and formed the whole 
area of approximation algorithms. The First Fit 
(FF) algorithm is one among the first algorithms 
which were proposed to solve the BP problem 
and analyzed in the early works. FF performs as 
follows: The items are first given in some list 
L and then are handled by the algorithm in this 
given order. Then, algorithm FF packs each item 
into the first bin where it fits; in case the item does 
not fit into any already opened bin, the algorithm 
opens a new bin and puts the actual item there. A 
closely related algorithm is Best Fit (BF); it also packs
the items according to a given list, but each
item is packed into the fullest bin where it fits,
and into a new bin only if it does
not fit into any open bin. If the items are ordered
in the list by decreasing sizes, the algorithms are
called FFD (first fit decreasing) and BFD (best
fit decreasing). 
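
The two rules described above translate almost directly into code. The sketch below is a straightforward quadratic-time version written for clarity; the function names are assumptions for this example, and exact fractions are used so that "fits" is not affected by rounding.

```python
from fractions import Fraction


def first_fit(items, capacity=1):
    """Pack each item into the first open bin where it fits, else open a new bin."""
    bins = []
    for size in items:
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins


def best_fit(items, capacity=1):
    """Pack each item into the fullest open bin where it fits, else open a new bin."""
    bins = []
    for size in items:
        feasible = [b for b in bins if sum(b) + size <= capacity]
        if feasible:
            max(feasible, key=sum).append(size)  # ties go to the earliest bin
        else:
            bins.append([size])
    return bins


def first_fit_decreasing(items, capacity=1):
    """FFD: run First Fit on the items sorted by decreasing size."""
    return first_fit(sorted(items, reverse=True), capacity)


if __name__ == "__main__":
    items = [Fraction(1, 2), Fraction(2, 3), Fraction(1, 3), Fraction(1, 2)]
    print(len(first_fit(items)), len(best_fit(items)), len(first_fit_decreasing(items)))
```

On this small list First Fit places the item of size 1/3 next to the first 1/2 and needs a third bin, while Best Fit and FFD both use two bins.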


Applications


There are many applications for bin packing 
(in industry, computer science, etc), and BP has 
many different versions. It is worth noting that BP 
has a strong relationship to the area of scheduling. 
So the scientific communities of packing and 
scheduling are almost the same.


Key Results 


It was immediately shown in the early works 
[6, 12, 15] that the asymptotic approximation ratio
of FF and BF bin packing is 1.7. It means that if
the optimum packing needs OPT bins, algorithm
FF never uses more than $1.7 \cdot \mathrm{OPT} + C$ bins,
where C is a fixed constant (the same holds for
the BF algorithm). It is easy to see that the mul- 
tiplicative factor, i.e., 1.7, cannot be smaller. But 
the minimum value of the C constant, for which 
the statement remains valid, is not a simple issue. 

First, Ullman in 1971 [15] showed that C can 
be chosen to be 3. But this is not the best choice. 

Soon, the additive term was decreased in [6]
to 2 and then in [7] to $\mathrm{FF} \le \lceil 1.7 \cdot \mathrm{OPT} \rceil$; since
both FF and OPT denote integer numbers, this is
the same as $\mathrm{FF} \le 1.7 \cdot \mathrm{OPT} + 0.9$.

Then, for many years, no new results were 
published regarding the possible decreasing of 
the additive term. 

Another direction is considered in the much-cited
work of Simchi-Levi [14]. He
showed that the absolute approximation ratio
of FF (and BF) is at most 1.75. It means that if
we do not use an additive term in the inequality,
then $\mathrm{FF} \le 1.75 \cdot \mathrm{OPT}$ is valid.

Now, if we are interested in the tight result, 
we have two options. One is that we can try to 
decrease the multiplicative factor in the inequal- 
ity of the absolute approximation ratio, i.e., the 
question is the following: What is the smallest 
number, say $\alpha$, that can be substituted in the place
of 1.75 such that the inequality $\mathrm{FF} \le \alpha \cdot \mathrm{OPT}$
is valid for any input? The other direction is the
following: What is the smallest possible value
of the additive constant C such that the inequality
$\mathrm{FF} \le 1.7 \cdot \mathrm{OPT} + C$ is true for every input?

The next step was made independently
in two works. Xia and Tan [17]
and Boyar et al. [1] proved that the absolute
approximation ratio of FF is not larger than
$12/7 \approx 1.7143$.

Moreover, [17] also dealt with the other direction
and decreased the value of C, proving
$\mathrm{FF} \le 1.7 \cdot \mathrm{OPT} + 0.7$.

If we are interested in how much the additive 
term (or the $\alpha$ factor) can be decreased, we must
also deal with the lower bound of the algorithm.
Regarding this, the early works give examples
for both the asymptotic and absolute ratios. For
the asymptotic bound, there exist inputs
for which FF = 17k holds whenever OPT =
10k + 1; thus, the asymptotic upper bound 1.7
is tight, see [6, 12, 15]. For the absolute ratio, an 
example is given with FF = 17 and OPT = 10, 
i.e., an instance with approximation ratio exactly 
1.7 [6, 12]. But no example was shown for large 
values of OPT. 

Thus it turned out early on that the
value of the multiplicative factor of the absolute
approximation ratio (i.e., $\alpha$) cannot be smaller
than 1.7 or, regarding the other measure, the
additive constant cannot be chosen to be smaller
than zero. But for 40 years it remained an open question
whether the smallest possible choice of
$\alpha$ is really 1.7 or, in other words, whether the smallest
possible choice of the additive term is really zero.

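Finally, the papers [3,4] answered the question.
Lower-bound instances are given with
$\mathrm{FF} = \mathrm{BF} = \lfloor 1.7 \cdot \mathrm{OPT} \rfloor$ for any value of OPT, and it is
also shown that $\mathrm{FF} \le \lfloor 1.7 \cdot \mathrm{OPT} \rfloor$ and $\mathrm{BF} \le \lfloor 1.7 \cdot \mathrm{OPT} \rfloor$ hold
for any value of OPT. So this is the tight bound
that was sought for 40 years.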


Methods 
To prove the upper bound, the main technique is 
the usage of a weighting function. Any item gets 
some weight according to its size. Then, to get 
the asymptotic ratio, it is only needed that any 
optimal bin has a weight at most 1.7 and any bin 
in the FF packing (with bounded exception) has a 
weight at least 1. 

In the recent paper [13], a nice and surprising 
idea is presented: The same weight function that 
was used traditionally in the analysis is divided 
into two parts: scaled size and bonus. Thus, the 
weight of any item a is w(a) = s(a) + b(a), 
where $s(a) = \frac{6}{5} \cdot a$ is the scaled size of the
item and the remaining part b(a) is the bonus of 
the item, which is defined as follows: 

b(a) is zero if the size of the item is below 
1/6. The bonus is just 0.1 if a is between 1/3 and 
1/2, and it is 0.4 if the size is above 1/2. Between 
1/6 and 1/3, the bonus function is continuous and 
linear. We emphasize that this is the same old 
weighting function, only in a new costume. The 
bonus function can be seen in Fig. 1. 
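
The weight function just described can be written out explicitly. In the sketch below, the linear piece 3/5 · (a − 1/6) is derived from the stated requirements that the bonus is linear on [1/6, 1/3] and continuous (0 at 1/6 and 0.1 at 1/3). The treatment of the exact boundary points 1/6, 1/3, and 1/2 is an assumption of this example, as are the function names.

```python
from fractions import Fraction as F


def bonus(a):
    """Bonus b(a) of an item of size a (boundary conventions are assumptions)."""
    if a <= F(1, 6):
        return F(0)
    if a < F(1, 3):
        return F(3, 5) * (a - F(1, 6))  # linear, rising from 0 to 1/10
    if a <= F(1, 2):
        return F(1, 10)
    return F(2, 5)


def weight(a):
    """Weight w(a) = s(a) + b(a) with scaled size s(a) = 6/5 * a."""
    return F(6, 5) * a + bonus(a)


if __name__ == "__main__":
    full_bin = [F(51, 100), F(33, 100), F(16, 100)]  # total size exactly 1
    print(sum(weight(a) for a in full_bin))          # stays below 17/10
    print(weight(F(51, 100)))                        # an item above 1/2 alone has weight >= 1
```

The example bin of total size 1 illustrates the upper bound of 1.7 on the weight of an optimal bin, and the single large item illustrates why a bin holding one item larger than 1/2 already has weight at least 1.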

By this separation, it is easy to show that the 
weight of any optimal bin is at most 1.7, and this 
implies that the weight of the whole instance is at 
most 1.7- OPT. 

The key part is to show that on average, the 
weight of each FF bin is at least 1. This property 
trivially holds if the total size of the items in the 
bin is at least 5/6. It is not hard to handle the
bins with single items: almost all of them
must be bigger than 1/2, and such an item has a
large bonus (i.e., 0.4) which, together with the scaled
size of at least 0.6, again gives weight at least 1. In
the remaining bins, the following trick is
used: The scaled size of the bin plus the bonus
of the following bin is at least 1. With this trick,
the proof is almost done, but several further
considerations are also needed to complete the
tightness result.


New Lower Bound Construction 
The lower bound construction works in the following
way. Suppose, for the sake of simplicity,
that OPT = 10k for some integer k, and let $\varepsilon > 0$
be a small value.

The input consists of OPT small items of size 
approximately 1/6, followed by OPT medium- 
sized items of size approximately 1/3, followed 
by OPT large items of size exactly $1/2 + \varepsilon$. The
optimum packs in each bin one item from each 
group. FF packs the small items into 2k bins with 
5 items with the exception of the first and last 
of these bins, which will have 6 and 4 items, 
respectively. The sizes of items differ from 1/3, 
or differ from 1/6, in both directions by a small 
amount $\delta_i$. Finally, every large item will occupy
its own bin. 

In the original construction, the choice of the
small and medium-sized items is somewhat complicated,
so one might think that the construction must be
this complicated and thus cannot be
tightened. It turns out, however, that this is not the
case. The construction can be modified so
that $\delta_i$ is exponentially decreasing but remains
greater than $\varepsilon$ for all i. This guarantees that only
the item with the largest $\delta_i$ in a bin is relevant for
its final size, and this in turn enables us to order
the items so that no additional later item fits into
these bins. Thus, the modification not only makes the
construction simpler but also makes it possible
to prove the tightness.


Open Problems 


There are many open problems regarding bin 
packing. For example, the tight absolute approximation
ratio of BFD is an open question. (For
FFD, it was recently proved that $\mathrm{FFD} \le (11/9) \cdot \mathrm{OPT} + 6/9$
and that this is the tight result; see [5].)


Problem Definition


NP-hard problems are believed to be intractable. 
This is the widely believed assumption that P ≠
NP. For all our problems, the size of their input
is denoted by n. In parameterized complexity,
the input is refined to (I, k) with k a parameter
related to the input, and the goal is to find
an exact algorithm for the problem that runs
in time $f(k) \cdot n^{O(1)}$, for some function f. In
this survey, we parameterize by the optimum
value of the instance unless stated otherwise.
In addition, the optimum is always integral. In
approximation algorithms, a $\rho$-approximation for
a minimization (maximization) problem P is a
polynomial time algorithm A, such that for any
instance I, A returns a solution of value A(I)
and $A(I)/\mathrm{OPT}(I) \le \rho$ ($\mathrm{OPT}(I)/A(I) \le \rho$),
with OPT(I) the optimum value for the instance.
In both subjects, there are intractability results.
The class FPT consists of the problems that admit an
$f(k) \cdot n^{O(1)}$ time exact solution for some function
f. The classes W[i] for every integer $i \ge 1$
satisfy FPT ⊆ W[1] ⊆ W[2] ⊆ .... It is widely
believed that all inclusions are strict. Consider
the CLIQUE problem. Given a graph G(V, E),
a subset U ⊆ V forms a clique if for every pair of distinct
u, v ∈ U, (u, v) ∈ E. The problem is


Input: A graph G and a parameter k. 
Question: Is there in G a clique U of size $|U| \ge k$?
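
For concreteness, the CLIQUE question can be decided by trying every k-subset of vertices. The sketch below takes $n^{O(k)}$ time, so it is not an FPT algorithm; it only illustrates the problem statement. The function names and the adjacency-set representation are assumptions for this example. Since every subset of a clique is a clique, checking subsets of size exactly k suffices.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """True if every two distinct vertices in `vertices` are adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def has_clique(adj, k):
    """Brute-force test for a clique of size at least k."""
    return any(is_clique(adj, subset) for subset in combinations(adj, k))


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(has_clique(g, 3), has_clique(g, 4))
```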


In [21], it is proved that CLIQUE admits no $n^{1-\epsilon}$
approximation unless P = NP. It is known
that CLIQUE is W[1]-complete. Thus it is considered
highly unlikely that CLIQUE ∈ FPT. The
SETCOVER problem is defined as follows: 


Input: A universe U and a collection $S = \{S_i\}$
of subsets of U and a parameter k.

Question: Is there a subcollection $S' \subseteq S$ containing
at most k sets so that $\bigcup_{S_i \in S'} S_i = U$?


SETCOVER is W[2]-complete. In addition, Raz
and Safra [27] show that unless P = NP,
SETCOVER admits no $c \ln n$ approximation for some
constant $c$, almost matching the simple greedy
algorithm, which has approximation ratio $\ln n + 1$.
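
The greedy algorithm mentioned above can be sketched as follows. This is a minimal illustration, not taken from the cited works: the function name `greedy_set_cover` and the representation of the collection as a list of Python sets are assumptions made here.

```python
def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly pick the set covering the most uncovered elements.

    `sets` is a list of Python sets (an assumed representation).
    Returns the indices of the chosen sets, or None if no cover exists.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the set with the largest number of still-uncovered elements.
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        gain = uncovered & sets[best]
        if not gain:
            return None  # the remaining elements appear in no set
        chosen.append(best)
        uncovered -= gain
    return chosen


cover = greedy_set_cover({1, 2, 3, 4, 5, 6},
                         [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}])
```

On this small instance the greedy choice happens to be optimal; in general it is only guaranteed to be within the logarithmic ratio stated above.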


Our Subject 

Formally, we deal with the following subject:
An algorithm for a minimization (resp., maximization)
problem P is called an (r,t)-FPT-approximation
algorithm for P with input parameter
k, if the algorithm takes as input an instance
$I$ with value OPT and an integer parameter $k$
and either computes a feasible solution to $I$ with
value at most $k \cdot r(k)$ (resp., at least $k/r(k)$ and
$k/r(k) = o(k)$) or computes a certificate that
$k < \mathrm{OPT}$ (resp., $k > \mathrm{OPT}$) in time $t(k) \cdot |I|^{O(1)}$.
The requirement that $k/r(k) = o(k)$ avoids
returning a single vertex in the clique problem
and claiming an OPT approximation.

A problem is called (r,t)-FPT-inapproximable
(or (r,t)-FPT-hard) if it does not admit an
(r,t)-FPT-approximation algorithm. An FPT
approximation is mainly interesting if the
problem is W[1]-hard and allowing running
time $f(k) \cdot n^{O(1)}$ gives an improved approximation.
We restrict our attention to this scenario. Thus,
we do not discuss many subjects such as
approximations in OPT that run in polynomial
time in $n$, and upper and lower bounds on
algorithms with subexponential time in $n$ for
several combinatorial problems.


Our Complexity Assumption 

We assume the following conjecture throughout.
Impagliazzo et al. [4] conjectured the
following:


Exponential Time Hypothesis (ETH) 

3-SAT cannot be solved in $2^{o(q)} (q + m)^{O(1)}$
time where $q$ is the number of variables and
$m$ is the number of clauses.


The following is due to [4]. 


Lemma 1 Assuming ETH, 3-SAT cannot be
solved in $2^{o(m)} (q + m)^{O(1)}$ time where $q$ is
the number of variables and $m$ is the number of
clauses.


It is known that the ETH implies that $W[1] \neq
FPT$. This implies that $W[2] \neq FPT$ as well.


Key Results 


We survey some FPT-approximability and FPT- 
inapproximability results. Our starting point is a 
survey by Marx [23], and we also discuss recent 
results. The simplest example we are aware of 
in which combining FPT running time and FPT- 
approximation algorithm gives an improved re- 
sult is for the strongly connected directed sub- 
graph (SCDS) problem. 


Input: A directed graph $G(V, E)$, a set $T =
\{t_1, t_2, \ldots, t_p\}$ of terminals, and an integer $k$.

Question: Is there a subgraph $G'(V, E')$ so that
$|E'| \le k$ and for every $t_i, t_j \in T$, there is
a directed path in $G'$ from $t_i$ to $t_j$ and vice
versa?


The problem is W[1]-hard. The best approximation
ratio known for this problem is $n^{\epsilon}$
for any constant $\epsilon$. See Charikar et al. [5].

The following is due to [7]. 


Theorem 1 The SCDS problem admits an FPT
time approximation algorithm with ratio 2.
Proof The directed Steiner tree problem is as follows: given
a directed edge-weighted graph, a root $r$, and a
set $T = \{t_1, t_2, \ldots, t_p\}$ of terminals, find a minimum
cost directed tree rooted at $r$ containing $T$.
This problem belongs to FPT. See Dreyfus and
Wagner [12]. Note that for every terminal $t_i$, any
feasible solution contains a directed tree from $t_i$
to $T$ and a reverse-directed Steiner tree from $T$
to $t_i$. These two problems can be solved optimally
in FPT time. In the second application, we reverse
the direction of edges before we find the directed
Steiner tree. Moreover, two such trees give a
feasible solution for the SCDS problem, as every
two terminals $t_j, t_k$ have a path via $t_i$. Since each of the
two trees costs at most OPT, the solution has value
at most $2 \cdot \mathrm{OPT}$, with OPT the
optimum value for the SCDS instance. The claim
follows.


Definition 1 A polynomial time approximation
scheme (PTAS) for a problem P is a $1 + \epsilon$
approximation for any constant $\epsilon$ that runs in time
$n^{f(1/\epsilon)}$. An EPTAS is such an algorithm that runs
in time $f(1/\epsilon) \cdot n^{O(1)}$.


The vertex cover problem is to select the smallest
possible subset $U$ of $V$ so that for every edge
$(u, v)$, either $u \in U$ or $v \in U$ (or both). In the
partial vertex cover problem, a graph $G(V, E)$
and an integer $k$ are given. The goal is to find a
set $U$ of $k$ vertices that is touched by the largest
number of edges. An edge $(u, v)$ is touched by a
set $U$ if $u \in U$ or $v \in U$ or both. It is known that
this problem admits no PTAS unless P = NP
(see Dinur and Safra [10]). The corresponding
minimum partial vertex cover problem requires
a set of $k$ vertices touched by the least number of
edges. This problem admits no better than 2-ratio,
under the small set expansion conjecture. See
[15]. Both problems are W[1]-hard. The
following theorem of [23] relies on a technique
called color coding [1].


Theorem 2 ([23]) For every constant $\epsilon$, the partial
vertex cover problem (and in a similar proof
the minimum partial vertex cover problem) admits
an EPTAS that runs in time $f(k, 1/\epsilon) \cdot n^{O(1)}$
with $n$ the number of vertices in the graph.


Proof Let $D = \binom{k}{2}/\epsilon$. Sort the vertices
$v_1, v_2, \ldots, v_n$ by nonincreasing degrees. If
the largest degree satisfies $d(v_1) \ge D$, the
algorithm outputs the set $\{v_1, v_2, \ldots, v_k\}$. These
$k$ vertices cover at least $\sum_{i=1}^{k} \deg(v_i) - \binom{k}{2}$
edges. Clearly, $\mathrm{OPT} \le \sum_{i=1}^{k} \deg(v_i)$. Hence,
the value of the constructed solution is at
least


$$\frac{\sum_{i=1}^{k} \deg(v_i) - \binom{k}{2}}{\sum_{i=1}^{k} \deg(v_i)} \ge 1 - \frac{\binom{k}{2}}{D} = 1 - \epsilon$$


times the optimum, which gives a $1 + \epsilon$ approximation (up to rescaling $\epsilon$). In
the other case, the optimum satisfies $\mathrm{OPT} < k \cdot D$. We
guess the correct value of the optimum by trying
all values $1, \ldots, k \cdot D$. Fix the run with
the correct OPT. Let $E^*$ be the set of OPT edges
that are touched by the optimum. An OPT labeling
is an assignment of a label in $\{1, \ldots, \mathrm{OPT}\}$ to the
edges of $E$. We show that if the labels of $E^*$
are pairwise distinct, we can solve the problem
in time $h(k, 1/\epsilon)$. Let $\{u_1, u_2, \ldots, u_k\}$ be the optimum
set. Let $L_i$ be the labels of the edges of $u_i$.
As all labels of $E^*$ are pairwise distinct, $\{L_i\}$
is a disjoint partition of all labels (as otherwise
there is a labeling with less than OPT labels). The
number of possible partitions of the labels into $k$
sets is at most $k^{\mathrm{OPT}}$. Given the correct partition
$\{L_i\}$, we need to match every $L_i$ with a vertex $u_i$
so that the labels of $u_i$ are $L_i$. This can be done
in polynomial time by a matching computation. To
get a labeling with pairwise distinct labels on
$E^*$, we draw for every edge a label between
1 and OPT, randomly and independently. The
probability that the labels of $E^*$ are pairwise distinct
is more than $1/\mathrm{OPT}^{\mathrm{OPT}}$. Repeating the random
experiment $\mathrm{OPT}^{\mathrm{OPT}}$ times implies that with
probability at least $1 - 1/e$, one of the labelings
has pairwise distinct labels for $E^*$. This result
can be derandomized [1].
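
The first case of the proof (a vertex of degree at least $D$ exists) is easy to illustrate. The following is a minimal sketch under assumptions made here: the graph is simple and given as a list of edges, and the function name `high_degree_case` is hypothetical. It returns the $k$ highest-degree vertices together with the number of edges they touch, or `None` when the color coding case of the proof would be needed instead.

```python
from math import comb


def high_degree_case(edges, k, eps):
    """Return (chosen, touched) if the maximum degree is at least D = C(k,2)/eps, else None."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    D = comb(k, 2) / eps
    by_degree = sorted(deg, key=deg.get, reverse=True)  # nonincreasing degrees
    if not by_degree or deg[by_degree[0]] < D:
        return None  # the other case of the proof applies
    chosen = set(by_degree[:k])
    # Number of edges with at least one endpoint in the chosen set.
    touched = sum(1 for u, v in edges if u in chosen or v in chosen)
    return chosen, touched


# Two stars with 10 leaves each, whose centers "a" and "b" are adjacent.
edges = [("a", f"a{i}") for i in range(10)] + [("b", f"b{i}") for i in range(10)] + [("a", "b")]
chosen, touched = high_degree_case(edges, k=2, eps=0.5)
```

Here the two centers have total degree 22 and touch 21 edges, which matches the bound $\sum_{i} \deg(v_i) - \binom{k}{2}$ used in the proof.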


We consider one example in which OPT is not the
parameter [23]. Consider a graph that contains a
set $X = \{x_1, \ldots, x_k\}$ so that $G \setminus X$ is a planar
graph. Thus the parameter here is the number
of vertices that need to be removed to make the
graph planar. Consider the minimum coloring problem
on $G$. We can determine the best coloring of $X$
in time $k^k$. Then we can color $G \setminus X$ by four
(different) colors. A simple calculation shows
that this algorithm has approximation ratio at
most 7/3.

The following simple relation exists between
EPTAS and FPT theory.


Proposition 1 If an optimization problem P admits
an EPTAS, then $P \in FPT$.


Proof We prove the proposition for minimization
problems. For maximization problems, the proof
is similar. Assume that P has a $1 + \epsilon$ approximation
that runs in time $f(1/\epsilon) \cdot n^{O(1)}$. Set
$\epsilon = 1/(2k)$. Using the EPTAS algorithm gives
an $f(2k) \cdot n^{O(1)}$ time $(1 + \epsilon)$ approximation. If the
optimum is at most $k$, we get a solution of size
at most $(1 + \epsilon)k = k + 1/2 < k + 1$. As the
solution is integral, the cost is at most $k$. If the
minimum is at least $k + 1$, the approximation will not
return a solution of size better than $k + 1$. Thus the
approximation returns cost at most $k$ if and only
if there is a solution of size at most $k$.
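
The argument can be phrased as a short decision procedure. The sketch below is illustrative only: `eptas` is an assumed callable returning the integral value of a $(1 + \epsilon)$-approximate solution, and `worst_case_stub` is a hypothetical stand-in (its "instance" is just the optimum value) that returns the largest value such an approximation is allowed to output.

```python
from fractions import Fraction
from math import floor


def decide_opt_at_most_k(eptas, instance, k):
    """Decide whether OPT(instance) <= k for an integral minimization problem (k >= 1)."""
    eps = Fraction(1, 2 * k)  # the choice epsilon = 1/(2k) from the proof
    return eptas(instance, eps) <= k


def worst_case_stub(instance, eps):
    # Hypothetical stand-in for an EPTAS: `instance` is the optimum value itself,
    # and the returned value is the worst integral value within factor (1 + eps).
    return floor((1 + eps) * instance)


answers = [decide_opt_at_most_k(worst_case_stub, opt, 5) for opt in range(1, 9)]
```

Exact rational arithmetic is used for $\epsilon$ so that the comparison with $k$ is not affected by floating point rounding.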


Thus we can rule out the possibility of an EPTAS
if a problem is W[1]-hard. For example, this
shows that the maximum independent set problem for
unit disk graphs admits no EPTAS, as it is
W[1]-hard. See many more examples in [23].
Chen, Grohe, and Grüber [6] provide an early
discussion of our topic. Lange wrote a PDF
presentation on recent FPT approximation. The
following theorem is due to Grohe and Grüber
(see [19]).


Theorem 3 If a maximization problem admits an
FPT-approximation algorithm with performance
ratio $\rho(k)$, then for some function $\rho'$, there exists
a $\rho'(k)$ polynomial time approximation algorithm
for the problem.


In the traveling salesperson problem with deadlines, the
input is a metric on $n$ points and a set $D \subseteq V$
with each $v \in D$ having a deadline $t_v$. A
feasible solution is a simple path containing all
vertices, so that for every $v \in D$, the length of
the tour until $v$ is at most $t_v$. The problem admits
no constant approximation and is not in FPT
when parameterized by $|D|$. See Böckenhauer,
Hromkovič, Kneis, and Kupke [2]. In that paper,
the authors give a 2.5 approximation that runs
in time $n^{O(1)} + |D|! \cdot |D|$.

The parameterized undirected multicut problem is as follows: given an
undirected graph, a collection $\{(s_i, t_i)\}$ of pairs,
and a parameter $k$, decide whether it is possible to remove at most
$k$ edges and disconnect all pairs. Garg, Vazirani,
and Yannakakis give an $O(\log n)$ approximation
for the problem [16]. In 2009, a ratio 2
fixed-parameter approximation algorithm was given
(Marx and Razgon). However, Marx and Razgon
[25] and Bousquet et al. [3] show that this problem
is in fact in FPT.

Fellows, Kulik, Rosamond, and Shachnai give the following tradeoff
(see [14]). The best known exact time algorithm
for the vertex cover problem has running time
$1.273^k$. The authors show that if we settle for
an approximation result, then the running time
can be improved. Specifically, they gave an $\alpha \ge 1$
approximation for vertex cover that runs in time
$1.237^{(2-\alpha)k}$.

The minimum edge dominating set
problem is as follows: given a graph and a parameter $k$, decide whether
there is a subset $E' \subseteq E$ of size at most $k$ so
that every edge in $E \setminus E'$ is adjacent to at least
one edge in $E'$. Escoffier, Monnot, Paschos, and
Xiao (see [13]) prove that the problem
admits a $1 + \epsilon$ ratio for any $0 < \epsilon < 1$ that
runs in time $2^{(2-\epsilon)k}$.

A kernel for a problem P
is a reduction from an instance $I$ to an instance
$I'$ whose size is $g(k)$, namely, a function of $k$, so
that a yes answer for $I$ implies a yes answer for $I'$
and a no answer for $I$ implies a no answer for $I'$.
If a kernel exists, it is clear that $P \in FPT$. However,
the size of the kernel may determine what is
the function of $k$ in the $f(k) \cdot n^{O(1)}$ exact solution.
The following result seems interesting because it
may not be intuitive. In the tree deletion problem,
we are given a graph $G(V, E)$ and a number $k$,
and the question is if we can delete up to $k$
vertices and get a tree. Giannopoulou,
Lokshtanov, Saurabh, and Suchý prove (see [17])
that the tree deletion problem admits a kernel
of size polynomial in $k$. However, the problem does not
admit an approximation ratio of $\mathrm{OPT}^c$ for any
constant $c$.


Other Parameters 

An independent set is a set of vertices so that no two
vertices in the set share an edge. In the parameterized
version, given $k$, the question is if there is an
independent set of size at least $k$. The
problem is W[1]-complete. Grohe [18] showed that
the maximum independent set admits an FPT-
approximation scheme if the parameter is the
genus of the graph. E. D. Demaine, M. Hajiaghayi,
and K. Kawarabayashi [9] showed that
vertex coloring has a ratio 2 approximation when
parameterized by the genus of a graph. The tree
augmentation problem is as follows: given an edge-weighted
graph and a spanning tree whose edges have cost
0, find a minimum cost collection of edges to add
to the tree, so that the resulting graph is 2-edge
connected. The problem admits several polynomial
time, ratio 2 approximation algorithms.
Breaking the 2 ratio for the problem is an important
challenge in approximation algorithms. Cohen
and Nutov parameterized the problem by the
diameter $D$ of the tree and gave an $f(D) \cdot n^{O(1)}$
time, $1 + \ln 2 < 1.7$ approximation algorithm for
the problem [8].


Fixed-Parameter Inapproximability 
The following inapproximability result is from [11].
In the additive maximum independent set problem,
we are given a graph, a parameter $k$, and a constant
$c$, and the question is whether the graph admits an
independent set of size at least $k - c$ or no
independent set of size $k$ exists.

It turns out that the problem is equivalent to
the independent set problem.


Theorem 4 Unless W[1] = FPT (hence, under 
the ETH), the independent set problem admits no 
additive c approximation. 


Proof Let $I$ be the instance. Find the smallest $d$
so that

$$\left\lceil \frac{dk - c}{d} \right\rceil \ge k.$$


Output $d$ copies of $G$ and let $k \cdot d$ be the
parameter of the new instance $I'$. We show that
the new graph has an independent set of size $dk - c$
if and only if the original instance has an
independent set of size $k$. If the original instance
has an independent set of size $k$, taking the union of
$d$ independent sets, one in each copy, we get an independent set of
size $k \cdot d$.

Now say that $I'$ has an independent set of size
$dk - c$. The average size of its restriction to
a copy of $G$ in $I'$ is then $(dk - c)/d$. Since the size
of an independent set is integral, there is a copy
that admits an independent set of size


$$\left\lceil \frac{dk - c}{d} \right\rceil = k.$$
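
The choice of $d$ in the proof can be checked numerically. The sketch below is only an illustration, with the hypothetical name `copies_needed`; it searches for the smallest $d$ with $\lceil (dk - c)/d \rceil \ge k$ using exact integer ceiling division.

```python
def copies_needed(k, c):
    """Smallest d >= 1 with ceil((d*k - c) / d) >= k, for integers k >= 1 and c >= 0."""
    d = 1
    # -(-a // d) is ceiling division of a by d for any integer a and d > 0.
    while -(-(d * k - c) // d) < k:
        d += 1
    return d


d = copies_needed(5, 3)
```

Since $(dk - c)/d = k - c/d$, the condition holds exactly when $c/d < 1$, so the search always stops at $d = c + 1$, a number of copies that depends only on the constant $c$.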


An independent set $I$ is maximal if for every $v \notin I$,
$I + v$ is not an independent set. The problem of
minimum size maximal independent set (MSDIS)
is shown to be completely inapproximable in
[11]. Namely, this problem is $(r(k), t(k))$-FPT-
hard for any $r, t$, unless FPT = W[2] (hence,
under the ETH). The problem admits no $n^{1-\epsilon}$
approximation (see [20]). In the min-WSAT problem,
a Boolean circuit is given and the task is to
find a satisfying assignment of minimum weight.
The weight of an assignment is the number of
true variables. Min-WSAT was shown to be completely
inapproximable by Chen, Grohe, and Grüber in 2006 (see
[6]).

The above two problems are not monotone. 
This implies that the above results are non- 
surprising. The most meaningful complete 
inapproximability is given by Marx [24] who 
shows that the weighted circuit satisfiability for 
monotone or antimonotone circuits is completely 
FPT inapproximable. 

Of course, if the problem has almost no gap, 
namely, the instance can have value k or k — 1, it 
is hard to get a strong hardness. 

A natural question is if we can use gap re- 
ductions from approximation algorithms theory 
to get some strong lower bounds, in particular 
for clique and setcover. It turns out that this 
is very difficult even under the ETH conjecture. 
This subject is related to almost linear PCP (see
[26]). There, Moshkovitz poses a conjecture
called the projection game conjecture (PGC).
M. Hajiaghayi, R. Khandekar, and G. Kortsarz
show the following theorem.


Theorem 5 Under the ETH and PGC conjectures,
SETCOVER is (r,t)-FPT-hard for
$r(k) = (\log k)^{\gamma}$ and $t(k) = \exp(\exp((\log k)^{\gamma})) \cdot
\mathrm{poly}(n) = \exp\left(k^{(\log k)^{f}}\right) \cdot \mathrm{poly}(n)$ for some
constant $\gamma > 1$ and $f = \gamma - 1$.


Problem Definition


The problem is concerned with efficient coding
of the constraint that defines the placement
of objects on a plane without mutual overlapping.
This has numerous motivations, especially
in the design automation of integrated semiconductor
chips, where up to hundreds of millions
of rectangular modules must be placed within
a small rectangular area (chip). Until 1994, the
only known coding efficient in computer-aided
design was Polish-Expression [1]. However, this
can only handle a limited class of placements,
those of the slicing structure. In 1994 Nakatake, Fujiyoshi,
Murata, and Kajitani [2] and Murata,
Fujiyoshi, Nakatake, and Kajitani [3] finally
succeeded in answering this long-standing problem
in two contrasting ways. Their code names are
Bounded-Sliceline-Grid (BSG) for floorplanning
and Sequence-Pair (SP) for placement.


Notations 


1. Floorplanning, placement, compaction, packing,
layout: These are often used as interchangeable
terms. However, each has its own
implication in the following context.
Floorplanning concerns the design of the
plane by restricting and partitioning a given
area on which objects can be properly
placed. Packing tries a placement with
the intention of reducing the area occupied by
the objects. Compaction supports packing by
pushing objects to the center of the placement.
The result, including other environments, is
the layout. BSG and SP are paired concepts,
the former for “floorplanning,” the latter for
“placement.”

2. ABLR-relation: The objects to be placed are
assumed to be rectangles in this entry, though they
could be more general depending on the problem.
For two objects p and q, p is said to
be above q (denoted as pAq) if the bottom
edge (boundary) of p is above the top edge
of q. The other relations, “below”
(pBq), “left-of” (pLq), and “right-of” (pRq),
are analogously defined. These four relations
are generally called ABLR-relations.

Floorplan and Placement, Fig. 1 (a) A feasible placement
whose ABLR-relations could be observed differently.
(b) Compacted placement if ABLR-relations are
(qLr), (sAp), .... Its sequence-pair is SP = (qspr,pqrs)

A placement without mutual overlapping
of objects is said to be feasible. Trivially, a
placement is feasible if and only if every pair
of objects is in one of the ABLR-relations. The
example in Fig. 1 illustrates these definitions.

It must be noted that a pair of objects may 
satisfy two ABLR-relations simultaneously, 
but not three. Furthermore, an arbitrary set 
of ABLR-relations is not necessarily consis- 
tent for any feasible placement. For example, 
any set of ABLR-relations including relations 
(pAq), (qAr), and (rAp) is not consistent. 

3. Compaction: Given a placement, its bounding- 
box is the minimum rectangle that encloses 
all the objects. A placement of objects is 
evaluated by the smallness of the bounding- 
box’s area, abbreviated as the bb-area. An 
ABLR-relation set is also evaluated by the 
minimum bb-area of all the placements that 
satisfy the set. However, given a consistent 
ABLR-relation set, the corresponding 
placement is not unique in general. Still, 
the minimum bb-area is easily obtained by 
a common technique called the “Longest-Path 
Algorithm.” (See, e.g., [4].) 

Consider the placement whose objects are
all inside the 1st quadrant of the xy-coordinate
system, without loss of generality with respect
to minimizing the bb-area. It is evident that if
a given ABLR-relation set is feasible, there is
an object that has no object left of or below it.
Place it such that its left-bottom corner is at the
origin. From the remaining objects, take one
that has no object among them left of or below it. Place it as
far leftward and downward as possible without violating any ABLR-
relation with the already fixed objects.
See Fig. 1 for the concept, where
the ABLR-relation set is the one obtained from the
placement in (a) (so that it is trivially feasible).
It is possible to obtain different ABLR-
relation sets, according to which compaction
would produce different placements.

Floorplan and Placement, Fig. 1 (continued) In (b), the
single-sequence is SS = (2413). (c) Compacted
placement for (qLr), (sRp), .... SP = (qpsr,pqrs). SS =
(2143). (d) Compacted placement if (qAr), (sAp), .... SP
= (qspr,prqs). SS = (3412)


4. Slice-line: If it is possible to draw a straight
horizontal line or vertical line to separate the
objects into two groups, the line is called a slice-
line. If each group again has a slice-line, and
so on recursively, the placement is said to be
a slicing structure. Figure 2 shows placements
of slicing and non-slicing structures.


5. Spiral: Two structures each consisting of four
line segments connected by a T-junction as
shown in Fig. 3a are spirals. Their regular
alignment in the first quadrant as shown in
(b) is the Bounded-Sliceline-Grid or BSG. A
BSG is a floorplan, or a T-junction dissection,
of the rectangular area into rectangular regions
called rooms. It is denoted as an n × m BSG
if the numbers of rows and columns of its
rooms are n and m, respectively. According
to the left-bottom room being p-type or q-
type, the BSG is said to be p-type or q-type,
respectively.


Floorplan and Placement, Fig. 2 (a) A placement with a slice-line. (b) A slicing structure since a slice-line can be
found in each hierarchy No. k (k = 1, 2, 3, 4). (c) A placement that has no slice-line


Floorplan and Placement, Fig. 3 (a) Two types of the spiral
structure (b) 5 × 5 p-type bounded-sliceline-grid (BSG)


In a BSG, take two rooms x and y. The ABLR-
relations between them are all that is defined by
the rule: If the bottom segment of x is the top
segment of y (Fig. 3), room x is above room y.
Furthermore, the Transitive Law is assumed: If “x is
above y” and “z is above x,” then “z is above y.”
Other relations are analogously defined.


Lemma 1 A room is in a unique ABLR-relation 
with every other room. 


An n × n BSG has $n^2$ rooms. A BSG-assignment
is a one-to-one mapping of n objects into the
rooms of an n × n BSG. ($n^2 - n$ rooms remain
vacant.)

After a BSG-assignment, a pair of two objects
inherits the same ABLR-relation as the
ABLR-relation defined between the corresponding
rooms. In Fig. 3, if x, y, and z are the names
of objects, the ABLR-relations among them are


{(xAy), (xRz), (yBx), (yBz), (zLx), (zAy)}.


Key Results 


The input is n objects that are rectangles of 
arbitrary sizes. The main concern is the solution 
space, the collection of distinct consistent ABLR- 
relation sets, to be generated by BSG or SP. 


Theorem 1 ([4,5]) 


1. For any feasible ABLR-relation set, there is a
BSG-assignment into an n × n BSG of any type
that generates the same ABLR-relation set.

2. The size n × n is a minimum: if the number
of rows or columns is less than n, there is a
feasible ABLR-relation set that is not obtained
by any BSG-assignment.


The proof of (1) is not trivial [5] (Appendix).
The number of solutions is ${}_{n^2}C_n$. A remarkable
feature of an n × n BSG is that any feasible ABLR-relation
set of n objects is generated by a proper BSG-
assignment. By this property, BSG is said to be
universal [11].

In contrast to the BSG-based generation of 
consistent ABLR-relation sets, SP directly im- 
poses the ABLR-relations on objects. 

A pair of permutations of object names, represented
as $(\Gamma^+, \Gamma^-)$, is called the sequence-pair,
or SP. See Fig. 1. An SP is decoded to a unique
ABLR-relation set by the rule:

Consider a pair (x, y) of names such that x is
before y in $\Gamma^-$. Then (xLy) or (xBy) if x is before
or after y in $\Gamma^+$, respectively. ABLR-relations
“A” and “R” are derived as the inverses of “B”
and “L.” Examples are given in Fig. 1.
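
The decoding can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `decode_sequence_pair` and the triple representation `(x, relation, y)` are assumptions made here, and the orientation of the relations is the one that reproduces the relations listed in the caption of Fig. 1 (for example, SP = (qspr,pqrs) gives (qLr) and (sAp)).

```python
def decode_sequence_pair(gamma_plus, gamma_minus):
    """Decode a sequence-pair into its set of ABLR-relations.

    Both arguments are sequences of distinct object names (permutations of each other).
    A triple (x, "L", y) reads "x is left-of y"; "R", "A", "B" are right-of, above, below.
    """
    pos_plus = {name: i for i, name in enumerate(gamma_plus)}
    relations = set()
    for i, x in enumerate(gamma_minus):
        for y in gamma_minus[i + 1:]:  # x is before y in gamma_minus
            if pos_plus[x] < pos_plus[y]:
                relations.add((x, "L", y))
                relations.add((y, "R", x))
            else:
                relations.add((x, "B", y))
                relations.add((y, "A", x))
    return relations


rel_b = decode_sequence_pair("qspr", "pqrs")  # Fig. 1(b)
rel_c = decode_sequence_pair("qpsr", "pqrs")  # Fig. 1(c)
rel_d = decode_sequence_pair("qspr", "prqs")  # Fig. 1(d)
```

Every pair of objects receives exactly one relation and its inverse, which is why the decoded set is always consistent.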

A remarkable feature of sequence-pair is that 
its generation and decoding are both possible 
by simple operations. The question is what the 
solution space of all SPs is. 


Theorem 2 Any feasible placement has a corre- 
sponding SP that generates an ABLR-relation set 
satisfied by the placement. On the other hand, any 
SP has a corresponding placement that satisfies 
the ABLR-relation set derived from the SP. 


Using SP, a common compaction technique mentioned
before is described in a very simple way:


Minimum Area Placement from
SP = $(\Gamma^+, \Gamma^-)$


1. Relabel the objects such that $\Gamma^- =
(1, 2, \ldots, n)$. Then $\Gamma^+ = (p_1, p_2, \ldots, p_n)$
will be a permutation of the numbers $1, 2, \ldots, n$.
It is simply a kind of normalization of SP
[6]. But Kajitani [11] considers it a concept
derived from Q-sequence [10] and studies its
implication by the name of single-sequence or
SS. In the example in Fig. 1b, p, q, r, and s are
labeled as 1, 2, 3, and 4 so that SS = (2413).

2. Take object 1 and place it at the left-bottom
corner in the 1st quadrant.

3. For k = 2, 3, ..., n, place k such that its left
edge is at the rightmost edge of the objects
that have smaller numbers than k and lie before k
in SS, and its bottom edge is at the topmost
edge of the objects that have smaller numbers than
k and lie after k in SS.
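
The three steps above can be sketched as follows. This is a straightforward quadratic-time illustration under assumptions made here: objects are already relabeled 1, ..., n, `sizes` maps each label to a (width, height) pair, and the name `place_from_single_sequence` is hypothetical.

```python
def place_from_single_sequence(ss, sizes):
    """Place rectangles 1..n from a single-sequence `ss` (a permutation of 1..n, n >= 1).

    Returns (xy, bbox): xy[k] is the left-bottom corner of object k,
    and bbox is the (width, height) of the bounding box.
    """
    pos = {k: i for i, k in enumerate(ss)}
    n = len(ss)
    xy = {}
    for k in range(1, n + 1):
        # Objects with smaller numbers lying before k in SS are left of k.
        x = max((xy[j][0] + sizes[j][0] for j in range(1, k) if pos[j] < pos[k]), default=0)
        # Objects with smaller numbers lying after k in SS are below k.
        y = max((xy[j][1] + sizes[j][1] for j in range(1, k) if pos[j] > pos[k]), default=0)
        xy[k] = (x, y)
    width = max(xy[k][0] + sizes[k][0] for k in xy)
    height = max(xy[k][1] + sizes[k][1] for k in xy)
    return xy, (width, height)


# SS = (2413) as in Fig. 1(b), first with unit squares, then with mixed sizes.
unit = {k: (1, 1) for k in range(1, 5)}
xy_unit, bbox_unit = place_from_single_sequence((2, 4, 1, 3), unit)
mixed = {1: (2, 1), 2: (1, 1), 3: (1, 2), 4: (1, 1)}
xy_mixed, bbox_mixed = place_from_single_sequence((2, 4, 1, 3), mixed)
```

Object 1 lands at the origin because it has no predecessors, so step 2 is a special case of step 3 in this formulation.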


Applications 


Many ideas followed after BSG and SP [2-5],
as seen in the references. They all applied a
common methodology of stochastic heuristic
search, called simulated annealing, to generate
feasible placements one after another based on
some evaluation (with respect to the smallness
of the bb-area) and to keep the best-so-far as the
output. This methodology has become practical
thanks to the speed achieved due to their simple data
structure. The first and naive implementation of
BSG [2] could output the layout of a sufficiently
small area placement of 500 rectangles in several
minutes. (Finding a placement with the minimum
bb-area is NP-hard [3].) Since then many ideas
followed, including currently widely used codes
such as O-tree [7], B*-tree [8], corner block
list [9], Q-sequence [10], single-sequence [11],
and others. Their common feature is in coding
the nonoverlapping constraint along horizontal
and vertical directions, which is the inherent
property of rectangles.

As long as applications are concerned with
rectangle placement in the minimum area and
do not mind mutual interconnection, the problem
can be solved practically enough by BSG, SP, and
related ideas. However, in an integrated circuit
layout problem, mutual connection is a major
concern. Objects are not restricted to rectangles;
even soft objects are used for performance. Many
efforts have been devoted to these issues with a certain degree
of success. For example, techniques concerned
with rectilinear objects, rectilinear chips, insertion
of small but numerous elements like buffers and
decoupling capacitors, replacement for design
change, symmetric placement for analog circuit
design, three-dimensional placement, etc. have
been developed. Only a few of them are cited here, but
it is recommended to look at the proceedings of
ICCAD (International Conference on Computer- 
Aided Design), DAC (Design Automation 
Conference), ASPDAC (Asia and South Pacific 
Design Automation Conference), DATE (Design 
Automation and Test in Europe), and journals 
TCAD (IEEE Trans. on Computer-Aided Design) 
and TCAS (IEEE Trans. on Circuit and Systems), 
particularly those that cover VLSI (Very Large 
Scale Integration) physical design. 


Open Problems 
BSG 


The claim of Theorem 1 that a BSG needs n rows
to provide any feasible ABLR-relation set is reasonable
if considering a placement of all objects
aligned vertically. This is due to the rectangular
framework of a BSG. However, experiments have
suggested from the beginning
[5] the question of whether we need such big BSGs. The octagonal
BSG is defined in Fig. 4. The following claim is believed
to hold, and it would bring a drastic reduction of
the solution space.

Floorplan and Placement, Fig. 4
Octagonal BSG of size n,
p-type: (a) If n is odd, it
has $(n^2 + 1)/2$ rooms. (b)
If n is even, it has
$(n^2 + 2n)/2$ rooms

Conjecture (BSG): For any feasible ABLR- 
relation set, there is an assignment of n objects 
into octagonal BSG of size n, any type, that 
generates the same ABLR-relation set. 

If this is true, then the size of the solution
space needed by a BSG reduces to ${}_{(n^2+1)/2}C_n$ or
${}_{(n^2+2n)/2}C_n$.


SP or SS 


It is possible to define the universality of SP or
SS in the same manner as defined for BSG. In
general, two sequences of arbitrary $k$ numbers
$P = (p_1, p_2, \ldots, p_k)$ and $Q = (q_1, q_2, \ldots, q_k)$
are said to be similar to each other if $ord(p_i) =
ord(q_i)$ for every $i$, where $ord(p_i) = j$ means
that $p_i$ is the $j$th smallest in the sequence. If
they are single-sequences, two similar sequences
generate the same set of ABLR-relations under
the natural one-to-one correspondence between
numbers.

An SS of length m (necessarily $\ge n$) is said to be
universal of order n if, for every sequence of
length n, SS has a subsequence (a
sequence obtained from SS by deleting some of
the numbers) that is similar to it. Since the rooms of a BSG can be considered as
$n^2$ objects, Theorem 1 implies that there is a
universal SS of order n whose length is $n^2$. The
known facts about smaller universal SS are:


1. For n = 2, 132, 231, 213, and 312 are the
shortest universal SS. Note that 123 and 321
are not universal.

2. For n = 3, SS = 41352 is the shortest
universal SS.

3. For n = 4, the shortest length of universal SS
is 10 or less.

4. The size of universal SS is $\Omega(n^2)$ (Imahori S,
Dec 2005, Private communication).
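
For small cases, universality can be checked by brute force. The sketch below is only an illustration of the definition (it does not address the open problem of efficient certification); the names `pattern` and `is_universal` are hypothetical, and sequences are assumed to consist of distinct numbers.

```python
from itertools import combinations, permutations


def pattern(seq):
    """Replace each number by its rank, so that similar sequences get the same pattern."""
    ranks = {value: r for r, value in enumerate(sorted(seq), start=1)}
    return tuple(ranks[value] for value in seq)


def is_universal(ss, n):
    """Check whether `ss` has a subsequence similar to every sequence of length n."""
    found = {pattern(sub) for sub in combinations(ss, n)}  # combinations keep the order of ss
    return all(p in found for p in permutations(range(1, n + 1)))


checks = [
    is_universal((1, 3, 2), 2),
    is_universal((1, 2, 3), 2),
    is_universal((4, 1, 3, 5, 2), 3),
]
```

The check enumerates all length-n subsequences, so it is exponential in general and only practical for short sequences such as the examples in the list above.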


Open Problem (SP) 


It is still an open problem to characterize the
universal SP. For example, give a way to (1)
certify a sequence as universal and (2) generate
a minimum universal sequence for general n.


Problem Definition


Shortest-job-first heuristics arise in sequencing 
problems, when the goal is minimizing the 
perceived latency of users of a multiuser 
or multitasking system. In this problem, the 
algorithm has to schedule a set of jobs on a pool 
of m identical machines. Each job has a release 
date and a processing time, and the goal is to 
minimize the average time spent by jobs in the 
system. This is normally considered a suitable 
measure of the quality of service provided by 
a system to interactive users. This optimization 
problem can be more formally described as 
follows: 


Input 
A set of m identical machines and a set of n jobs
1, 2, ..., n. Every job j has a release date $r_j$ and
a processing time $p_j$. In the sequel, $\mathcal{I}$ denotes the
set of feasible input instances.


Goal 

The goal is minimizing the average flow (also
known as average response) time of the jobs. Let
$C_j$ denote the time at which job j is completed by
the system. The flow time or response time $F_j$ of
job j is defined by $F_j = C_j - r_j$. The goal is thus
minimizing


$$\min \frac{1}{n} \sum_{j=1}^{n} F_j.$$

Since n is part of the input, this is equivalent to
minimizing the total flow time, i.e., $\sum_{j=1}^{n} F_j$.


Off-line versus On-line 

In the off-line setting, the algorithm has full
knowledge of the input instance. In particular,
for every j = 1, ..., n, the algorithm knows $r_j$
and $p_j$. Conversely, in the on-line setting, at any time t,
the algorithm is only aware of the set of jobs
released up to time t.

In the sequel, A and OPT denote, respectively,
the algorithm under consideration and the optimal,
off-line policy for the problem. A(I) and
OPT(I) denote the respective costs on a specific
input instance I.


Further Assumptions in the On-line Case 
Further assumptions can be made as to the algorithm’s
knowledge of processing times of jobs.
In particular, in this survey an important case is
considered, realistic in many applications, i.e.,
that $p_j$ is completely unknown to the on-line
algorithms until the job eventually completes
(non-clairvoyance) [1, 3].


Performance Metric 

In all cases, as is common in combinatorial op- 
timization, the performance of the algorithm is 
measured with respect to its optimal, off-line 
counterpart. In a minimization problem such as 
those considered in this survey, the competitive 
ratio $\rho_A$ is defined as: 

$$\rho_A = \max_{I \in \mathcal{I}} \frac{A(I)}{OPT(I)}.$$

In the off-line case, $\rho_A$ is the approximation 
ratio of the algorithm. In the on-line setting, $\rho_A$ is 
known as the competitive ratio of A. 


Preemption 

When preemption is allowed, a job that is being 
processed may be interrupted and resumed later 
after processing other jobs in the interim. As 
shown further, preemption is necessary to design 
efficient algorithms in the framework considered 
in this survey [5, 6]. 


Key Results 


Algorithms 

Consider any job j in the instance and a time t 
in A's schedule, and denote by $w_j(t)$ the amount 
of time spent by A on job j until t. Denote 
by $x_j(t) = p_j - w_j(t)$ its remaining processing 
time at t. 

The best known heuristic for minimizing the 
average flow time when preemption is allowed 
is shortest remaining processing time (SRPT). 
At any time t, SRPT executes a pending job j 
such that $x_j(t)$ is minimum. When preemption is 
not allowed, this heuristic translates to shortest 
job first (SJF): at the beginning of the schedule, 
or when a job completes, the algorithm chooses 
a pending job with the shortest processing time 
and runs it to completion. 
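
The SRPT rule can be illustrated with a small single-machine simulation. This is only a sketch under stated assumptions: the function name `srpt_total_flow_time` and the input format (a list of `(release_date, processing_time)` pairs with positive processing times) are illustrative choices, not part of the original text.

```python
import heapq

def srpt_total_flow_time(jobs):
    """Simulate preemptive SRPT on a single machine.

    jobs: list of (release_date, processing_time) pairs (hypothetical format).
    Returns the total flow time, i.e., the sum of C_j - r_j over all jobs.
    """
    pending = sorted(jobs)        # jobs ordered by release date
    heap = []                     # (remaining time, release date) of released jobs
    t, total, i, n = 0, 0, 0, len(pending)
    while i < n or heap:
        if not heap:              # machine idle: jump to the next release
            t = max(t, pending[i][0])
        while i < n and pending[i][0] <= t:
            r, p = pending[i]
            heapq.heappush(heap, (p, r))
            i += 1
        remaining, r = heapq.heappop(heap)   # job with minimum x_j(t)
        next_release = pending[i][0] if i < n else float("inf")
        run = min(remaining, next_release - t)  # run until done or next arrival
        t += run
        if run == remaining:
            total += t - r        # job completes: add its flow time
        else:
            heapq.heappush(heap, (remaining - run, r))  # preempted
    return total
```

For the instance `[(0, 3), (1, 1)]`, the long job is preempted at time 1 in favor of the short one, and the total flow time is 5, whereas running the jobs in release order without preemption would give 6.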


Complexity 

The problem under consideration is polynomially 
solvable on a single machine when preemption 
is allowed [9, 10]. When preemption is allowed, 
SRPT is optimal for the single-machine case. On 
parallel machines, the best known upper bound 
for the preemptive case is achieved by SRPT, 
which was proven to be $O(\log \min\{n/m, P\})$-approximate 
[6], P being the ratio between the 
largest and smallest processing times of the 
instance. Notice that SRPT is an on-line algorithm, 
so the previous result holds for the on-line case 
as well. The authors of [6] also prove that this 
bound is tight in the on-line case. In the off-line 
case, no non-constant lower bound is known 
when preemption is allowed. 

In the non-preemptive case, no off-line algorithm 
can be better than $\Omega(n^{1/3-\epsilon})$-approximate, 
for every $\epsilon > 0$, the best upper bound being 
$O(\sqrt{n/m}\,\log(n/m))$ [6]. The upper and lower 
bounds become $O(\sqrt{n})$ and $\Omega(n^{1/2-\epsilon})$ for the 
single-machine case [5]. 


Extensions 

Many extensions have been proposed to the sce- 
narios described above, in particular for the pre- 
emptive, on-line case. Most proposals concern 
the power of the algorithm or the knowledge of 
the input instance. For the former aspect, one 
interesting case is the one in which the algo- 
rithm is equipped with faster machines than its 
optimal counterpart. This aspect has been considered 
in [4], where the authors prove that even 
a moderate increase in speed brings the performance 
of some very simple heuristics very close to the 
optimum. 

As to the algorithm’s knowledge of the input 
instance, an interesting case in the on-line setting, 
consistent with many real applications, is the non- 
clairvoyant case described above. This aspect 
has been considered in [1, 3]. In particular, the 
authors of [1] proved that a randomized variant 
of the MLF heuristic described above achieves 
a competitive ratio that on average is at most 
a polylogarithmic factor away from the optimum. 


Applications 


The first and traditional field of application for 
scheduling policies is resource assignment to pro- 
cesses in multitasking operating systems [11]. In 
particular, the use of shortest-job-like heuristics, 
notably the MLF heuristic, is documented in 
operating systems of wide use, such as UNIX 
and WINDOWS NT [8, 11]. Their application to 
other domains, such as access to Web resources, 
has been considered more recently [2]. 


Open Problems 


Shortest-job-first-based heuristics such as those 
considered in this survey have been studied in 
depth in the recent past. Still, some questions 
remain open. One concerns the off-line, parallel- 
machine case, where no non-constant lower 
bound on the approximation is known yet. As 
to the on-line case, there still is no tight lower 
bound for the non-clairvoyant case on parallel 
machines. The current $\Omega(\log n)$ lower bound 
was achieved for the single-machine case [7], 
and there are reasons to believe that it is below 
the one for the parallel case by a logarithmic 
factor. 

Problem Definition 


Given a connected undirected graph, the problem 
is to determine a straight-line layout such that the 
structure of the graph is represented in a readable 
and unbiased way. Part of the problem is the 
definition of readable and unbiased. 

Formally, we are given a simple, undirected 
graph G = (V, E) with vertex set V and edge set 
$E \subseteq \binom{V}{2}$. Let n = |V| be the number of vertices 
and m = |E| the number of edges. The neighbors 
of a vertex v are defined as $N(v) = \{u : \{u,v\} \in E\}$, 
and deg(v) = |N(v)| is its degree. We 
assume that G is connected, for otherwise the 
connected components can be treated separately. 

A (two-dimensional) layout for G is a vector 
$p = (p_v)_{v \in V}$ of vertex positions 
$p_v = (x_v, y_v) \in \mathbb{R}^2$. Since edges are drawn as line 
segments, the drawing is completely determined 
by these vertex positions. All approaches in this 
chapter generalize to higher-dimensional layouts, 
and there are variants for various graph classes 
and desired layout features; we only discuss some 
of them briefly at the end. 

The main idea is to make use of physical 
analogies. A graph is likened to a system of 
objects (the vertices) that are subject to varying 
forces (derived from structural features). Forces 
cause the objects to move around until those 
pushing and pulling into different directions can- 
cel each other out and the graph layout reaches 
an equilibrium state. Equivalently, states might be 
described by an energy function so that forces are 
not specified directly, but derived from gradients 
of the energy function. 

As a reference model, consider the layout 
energy function 

$$A(p) = \sum_{\{u,v\} \in E} \|p_u - p_v\|^2 \qquad (1)$$

where $\|p_u - p_v\|^2 = (x_u - x_v)^2 + (y_u - y_v)^2$ is 
the squared Euclidean distance of the endpoints 
of edge {u, v}. It associates with a layout the sum 
of squared edge lengths, so that its minimization 
marks an attempt to position adjacent vertices 
close to each other. Because of its straightforward 
physical analogy, we refer to the minimization of 
Eq. (1) as the attraction model. 

Note that minimum-energy layouts of this 
pure attraction model are degenerate in that all 
vertices are placed in the same position, since 
such layouts p are exactly those for which 
A(p) = 0 for a connected graph. 

Even if it has not been the starting point of any 
of the approaches sketched in the next section, 
it is instructive to think of them as different 
solutions to the degeneracy problem inherent in 
the attraction model. 


Key Results 


We present force-directed layout methods as vari- 
ations on the attraction model. The first two 
variants retain the objective but introduce con- 
straints, whereas the other two modify the objec- 
tive (Fig. 1). 

For the constraint-based variants, it is more 
convenient to analyze the attraction model in 
matrix form. A necessary condition for a (lo- 
cal) minimum of any objective function is that 
all partial derivatives vanish. For the attraction 
model (1), this amounts to 


$$\frac{\partial}{\partial x_v} A(p) = \frac{\partial}{\partial y_v} A(p) = 0 \quad \text{for all } v \in V.$$

For any $v \in V$, 

$$\frac{\partial}{\partial x_v} A(p) = \sum_{u \in N(v)} 2(x_v - x_u) \overset{!}{=} 0,$$

and likewise for $\frac{\partial}{\partial y_v} A(p)$. The necessary 
conditions can therefore be translated into 

$$x_v = \frac{\sum_{u \in N(v)} x_u}{\deg(v)} \quad \text{and} \quad y_v = \frac{\sum_{u \in N(v)} y_u}{\deg(v)}$$

Force-Directed Graph Drawing, Fig. 1 Three different layouts of the same planar triconnected graph. (a) Barycentric. (b) Spectral. (c) Stress 


for all v € V, i.e., every vertex must lie in 
the barycenter of its neighbors. Bringing all vari- 
ables to the left-hand side, we obtain a system 
of linear equations whose coefficients form an 
eminent graph-related matrix, the Laplacian ma- 
trix L(G) = D(G) — A(G), where D(G) is 
a diagonal matrix with diagonal entries deg(v), 
v € V, and A(G) is the adjacency matrix of G. 
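The entries of $L = L(G) = (\ell_{uv})_{u,v \in V}$ are thus 

$$\ell_{uv} = \begin{cases} \deg(v) & \text{if } u = v \\ -1 & \text{if } u \neq v \text{ and } \{u,v\} \in E \\ 0 & \text{otherwise,} \end{cases}$$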


so that the optimality conditions can be written as 


$$L \cdot p = 0, \qquad (2)$$

where 0 is an n × 2 matrix of zeros. As discussed 
above, the solutions to this system of linear equa- 
tions are given by those layouts p in which all x- 
and all y-coordinates are identical. 


Fixed Boundary 

An intuitive approach to prevent attraction from 
collapsing an entire graph onto a single point is 
to grab a few of its vertices and drag them apart. 
Technically, this corresponds to constraining 
the layout by fixing select vertices to distinct 
positions. 

Let $B \subseteq V$ be a nonempty subset of boundary 
vertices for which positions $p_v^0 = (x_v^0, y_v^0)$, 
$v \in B$, are pre-specified. A layout is called 
barycentric (with respect to these constraints) if 
it satisfies 

$$p_v = \begin{cases} p_v^0 & \text{if } v \in B \\ \frac{1}{\deg(v)} \sum_{u \in N(v)} p_u & \text{otherwise.} \end{cases}$$

We next show that the solution of the attraction 
model with a proper boundary constraint is 
unique by showing that the reduced system of 
linear equations has a coefficient matrix with a 
nonzero determinant. Let $L^B$ denote the matrix 
obtained by striking out the rows and columns of 
L indexed by B. Then, a barycentric layout is a 
solution of 

$$L^B \cdot p_{V \setminus B} = \Bigl( \sum_{u \in N(v) \cap B} p_u^0 \Bigr)_{v \in V \setminus B} \qquad (3)$$

Unlike the pure attraction model, the 
barycentric model (with a nondegenerate boundary) 
has a nondegenerate solution that is uniquely 
defined. Recall that a system of linear equations 
has a unique solution if and only if the determinant 
of its matrix of coefficients is nonzero. The 
Matrix Tree Theorem [9] asserts that the determinant 
of every principal submatrix obtained from a 
Laplacian matrix by deleting one row and the 
corresponding column equals the number of spanning 
trees of its associated multigraph, and $L^B$ is such 
a submatrix of the Laplacian of the graph obtained 
from G by contracting the vertices in B. Since 
this graph has at least one spanning tree, the 
determinant of $L^B$ is positive and the solution 
of (3) is thus unique. 
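
As an illustration, a barycentric layout can be approximated by repeatedly moving every free vertex to the barycenter of its neighbors. This is a minimal sketch, not the direct solution of system (3): it substitutes a Gauss-Seidel style iteration for a linear solver, and the names `barycentric_layout`, `adj`, `fixed`, and `sweeps` are assumptions made for the example.

```python
def barycentric_layout(adj, fixed, sweeps=500):
    """Approximate a barycentric layout by iterative averaging.

    adj:   dict mapping each vertex to a list of its neighbors
           (assumed to describe a connected undirected graph)
    fixed: dict mapping boundary vertices to pre-specified (x, y) positions
    """
    pos = {v: fixed.get(v, (0.0, 0.0)) for v in adj}
    free = [v for v in adj if v not in fixed]
    for _ in range(sweeps):
        for v in free:
            # Move v to the barycenter of its neighbors' current positions.
            xs = sum(pos[u][0] for u in adj[v])
            ys = sum(pos[u][1] for u in adj[v])
            pos[v] = (xs / len(adj[v]), ys / len(adj[v]))
    return pos

# Complete graph on four vertices with a fixed triangle a, b, c.
k4 = {"a": ["b", "c", "d"], "b": ["a", "c", "d"],
      "c": ["a", "b", "d"], "d": ["a", "b", "c"]}
layout = barycentric_layout(k4, {"a": (0, 0), "b": (3, 0), "c": (0, 3)})
```

In this example the only free vertex `d` ends up at the centroid (1, 1) of the fixed triangle. The iteration converges because the reduced matrix is symmetric positive definite for a connected graph with a nonempty boundary; a direct solver would be the more usual choice for large graphs.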

The barycentric approach was introduced in 
Tutte [15]. The main result shown in this paper is, 
in fact, that barycentric layouts of a triconnected 
planar graph with one face constrained to a con- 
vex polygon are planar. For the purpose of graph 
drawing, less desirable properties are exponen- 
tially small resolution of angles and edge lengths 
as evidenced by a family of triangular graphs 
obtained by starting from a triangle and adding 
vertices adjacent to the same two initial vertices 
and the most recently added one. For 
non-triconnected graphs, degenerate subgraph layouts 
are possible because components lacking bound- 
ary vertices are mapped to a line if between a 
separation pair, or to a point if hinging on a cut 
vertex. 


Orthogonality 

Barycentric layouts are systematically biased by 
the choice of boundary. An alternative constraint 
avoiding the single-point collapse is to constrain 
the coordinate vector of each dimension to be 
orthogonal to the degenerate layout. 

Observe that the one-dimensional version of 
Eq. (2) can also be read as a special case of the 
eigenequation $Lx = \lambda x$, since $\lambda = 0$ is, in fact, 
an eigenvalue of L associated with the eigenvector 
$\mathbf{1} = (1, \ldots, 1)^T$. The Laplacian of a simple undirected 
graph is a real, symmetric, and positive semidefinite 
matrix so that the eigenvalues are real and 
nonnegative, and eigenvectors associated with 
different eigenvalues are orthogonal. 

Rearranging the eigenequation yields 
$\lambda = \frac{x^T L x}{x^T x}$, where $x^T x$ only normalizes for scale. 
Since $x^T L x = A(x)$, eigenvectors $x \perp y$ associated 
with the smallest positive eigenvalues yield 
the best layout in the attraction model subject to 
orthogonality also with the degenerate layout $\mathbf{1}$. 
Note that $\mathbf{1} \perp x$ implies that the average of all 
coordinates $x_v$ is zero, so that the layouts are 
centered on the origin. 
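
The identities used above can be checked on a tiny example. The sketch below is only illustrative: it uses a path on three vertices (an assumed example, not from the text), for which the Laplacian eigenvalues are 0, 1, and 3, and the helper names `laplacian`, `matvec`, and `attraction_energy_1d` are hypothetical.

```python
def laplacian(n, edges):
    """Laplacian L = D - A of a simple graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def attraction_energy_1d(edges, x):
    """One-dimensional attraction energy: sum of squared edge lengths."""
    return sum((x[u] - x[v]) ** 2 for u, v in edges)

edges = [(0, 1), (1, 2)]          # path 0 - 1 - 2
L = laplacian(3, edges)
ones = [1, 1, 1]
x = [1, 0, -1]                    # eigenvector for eigenvalue 1

Lx = matvec(L, x)
quadratic_form = sum(a * b for a, b in zip(x, Lx))        # x^T L x
rayleigh = quotient = quadratic_form / sum(a * a for a in x)
```

Here `matvec(L, ones)` is the zero vector (the degenerate layout), `Lx` equals `x` (eigenvalue 1), the quadratic form equals the attraction energy of `x`, and the coordinates of `x` sum to zero, in line with the orthogonality to the all-ones vector.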

Spectral drawings based on the Laplacian have 
been proposed by Hall [7], but can also be 
defined via other matrices [11]. It is interesting 
to note that Laplacian spectral layout corresponds 
to classical multidimensional scaling using the 
square root of effective resistance as a measure 
of distance between vertices. While spectral lay- 
outs display symmetries, they are highly cluttered 
and imbalanced for graphs of low algebraic con- 
nectivity as measured by their smallest positive 
eigenvalue. 


Distances 

The terms in the objective function of the at- 
traction model correspond to the potential energy 
of a spring with ideal length zero. To avoid 
collapse, one can thus replace them by springs 
of some nonzero ideal length. While this takes 
care of the adjacent pairs of vertices, vertices that 
are more than one edge away from each other 
can be connected by springs of different length, 
say proportional to their shortest-path distance. 
Down-weighting the influence of distant pairs, 
we obtain the stress-minimization model with 
objective 


$$S(p) = \sum_{u,v \in V} \frac{1}{d(u,v)^2} \bigl( \|p_u - p_v\| - d(u,v) \bigr)^2,$$


constituting another special case of MDS [12] 
with graph-theoretic distances d(u, v) as input 
and inverse quadratic weights. This instantiation 
has been proposed as a graph drawing method 
by Kamada and Kawai [8] using gradient descent 
to determine locally optimal layouts. The use of 
majorization [13] was shown to be superior by 
Gansner, Koren, and North [6]. A comprehensive 
survey of variant layout objective functions is 
given in Chen and Buja [3]. 
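
To make the stress objective concrete, the following sketch evaluates it for a given layout, using breadth-first search for the shortest-path distances d(u, v). It is an illustration under assumptions: the graph is taken to be connected and unweighted, each unordered pair of distinct vertices is counted once (which differs from a sum over ordered pairs only by a constant factor), and the names `bfs_distances` and `stress` are hypothetical.

```python
from collections import deque
from math import hypot

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def stress(adj, pos):
    """Stress of layout pos with weights 1 / d(u, v)^2, one term per pair."""
    verts = list(adj)
    total = 0.0
    for i, u in enumerate(verts):
        dist = bfs_distances(adj, u)
        for v in verts[i + 1:]:
            d = dist[v]
            eucl = hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
            total += (eucl - d) ** 2 / d ** 2
    return total

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
straight = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
collapsed = {"a": (0, 0), "b": (0, 0), "c": (0, 0)}
```

For the three-vertex path, the straight-line layout with unit spacing realizes all graph distances exactly and has stress 0, whereas the collapsed layout that minimizes the pure attraction model has stress 3 (one unit per pair).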


Repulsion 

Instead of springs with nonzero ideal length, 
a dual physical analogy motivates another ap- 
proach to counter the collapse caused by attrac- 
tion, namely, repulsion. 

The classic spring embedder of Eades [4] 
specifies forces rather than an energy function. 
While there is a logarithmic force $\log\frac{\|p_u - p_v\|}{\ell} \cdot (p_u - p_v)$ 
between adjacent vertices that is neutral 
if their distance equals a desired value $\ell$, 
nonadjacent vertices push each other apart with a 
force that declines quadratically with their 
distance, i.e., is proportional to $1/\|p_u - p_v\|^2$. Both 
forces are given up to scaling 
constants. A layout is obtained by iteratively 
evaluating the forces exerted on a vertex by all 
others and then moving it in the direction of the 
resulting force, until an approximate equilibrium 
is obtained. 
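
One iteration of such a scheme might look as follows. This is a hedged sketch in the spirit of the classic spring embedder, not a faithful reproduction of [4]: the use of unit direction vectors, the synchronous update, the constants `c_att`, `c_rep`, and `delta`, and the function name `spring_embedder_step` are all assumptions made for illustration.

```python
from math import hypot, log

def spring_embedder_step(pos, adj, ell=1.0, c_att=1.0, c_rep=1.0, delta=0.1):
    """Move every vertex a small step along the net force acting on it.

    pos: dict vertex -> (x, y); adj: dict vertex -> set of neighbors.
    Adjacent pairs attract logarithmically (neutral at distance ell);
    nonadjacent pairs repel with magnitude c_rep / distance^2.
    """
    new_pos = {}
    for v, (xv, yv) in pos.items():
        fx = fy = 0.0
        for u, (xu, yu) in pos.items():
            if u == v:
                continue
            dx, dy = xu - xv, yu - yv
            dist = hypot(dx, dy)
            if dist == 0.0:
                continue                      # coincident points: skip
            if u in adj[v]:
                mag = c_att * log(dist / ell)  # > 0 pulls v toward u
            else:
                mag = -c_rep / dist ** 2       # < 0 pushes v away from u
            fx += mag * dx / dist
            fy += mag * dy / dist
        new_pos[v] = (xv + delta * fx, yv + delta * fy)
    return new_pos

# Two adjacent vertices starting at distance 3 settle at distance ell = 1.
adj = {"u": {"v"}, "v": {"u"}}
pos = {"u": (0.0, 0.0), "v": (3.0, 0.0)}
for _ in range(200):
    pos = spring_embedder_step(pos, adj)
```

In the two-vertex example the distance shrinks monotonically toward the desired edge length; practical implementations typically add a stopping criterion and a step-size schedule instead of a fixed number of iterations.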

Many variants of the spring embedder 
have been proposed. The most widely used, from 
Fruchterman and Reingold [5], replaces the forces 
by quadratically declining repulsion between all 
pairs of vertices and additional quadratic attraction 
between adjacent pairs and also introduces 
several pragmatic improvements. Brandes and 
Pich [2] find that suitably initialized stress MDS 
yields superior results, though. 

More force-directed methods are surveyed in 
Brandes [1] and Kobourov [10], and forces have 
been used very creatively to realize different 
layout objectives such as common direction of 
edges, edge curvatures, angles between incident 
edges, preferred locations, and many more. A 
relation with graph clustering is pointed out in 
Noack [14]. 

Problem Definition 


Introduction 
A field-programmable gate array (FPGA) is a 
type of integrated circuit (IC) device that can 
be (re)programmed to implement custom logic 
functions. A majority of FPGA devices use the 
lookup table (LUT) as the basic logic element, 
where a LUT of K logic inputs (K-LUT) 
can implement any Boolean function of up 
to K variables. An FPGA also contains other 
logic elements, such as registers, programmable 
interconnect resources, dedicated logic resources 
such as memory blocks and digital signal 
processing (DSP) blocks, and input/output 
resources [6]. 

The programming of an FPGA involves the 
transformation of a logic design into a form 
suitable for implementation on the target FPGA 
device. This generally takes multiple steps. For 
LUT-based FPGAs, technology mapping transforms 
a general Boolean logic network (obtained 
from the design specification through earlier 
transformations) into a functionally equivalent 
K-LUT network that can be implemented 
by the target FPGA device. The objective of 
a technology mapping algorithm is to generate, 
among many possible solutions, an optimized 
one according to certain criteria, some of which 
are timing optimization, which is to make the 
resultant implementation operable at faster speed; 
area minimization, which is to make the resultant 
implementation compact in size; and power min- 
imization, which is to make the resultant imple- 
mentation low in power consumption. The algo- 
rithm presented here, named FlowMap [2], is for 
timing optimization; it was the first provably op- 
timal polynomial time algorithm for technology 
mapping problems on general Boolean networks, 
and the concepts and approach it introduced have 
since generated numerous useful derivations and 
applications. 


Data Representation and Preliminaries 
The input data to a technology mapping algorithm 
for LUT-based FPGAs is a general Boolean 
network, which can be modeled as a directed acyclic 
graph N = (V, E). A node $v \in V$ can either 
represent a logic signal source from outside of 
the network, in which case it has no incoming 
edge and is called a primary input (PI) node, or 
it can represent a logic gate, in which case it 
has incoming edge(s) from PIs and/or other gates, 
which are its logic input(s). If the logic output of 
the gate is also used outside of the network, its 
node is a primary output (PO), which can have 
no outgoing edge if it is only used outside. 

If edge $(u,v) \in E$, u is said to be a fanin of 
v and v a fanout of u. For a node v, input(v) 
denotes the set of its fanins; similarly, for a 
subgraph H, input(H) denotes the set of distinct 
nodes outside of H that are fanins of nodes in H. 
If there is a directed path in N from a node u to 
a node v, u is said to be a predecessor of v and 
v a successor of u. The input network of a node 
v, denoted $N_v$, is the subgraph containing v and 
all of its predecessors. A cone of a non-PI node 
v, denoted $C_v$, is a subgraph of $N_v$ containing 
v and possibly some of its non-PI predecessors, 
such that for any node $u \in C_v$, there is a path 
from u to v in $C_v$. If $|\mathrm{input}(C_v)| \le K$, $C_v$ is 
called a K-feasible cone. The network N is 
K-bounded if every non-PI node has a K-feasible 
cone. A cut of a non-PI node v is a bipartition 
(X, X') of nodes in $N_v$, such that X' is a cone 
of v; input(X') is called the cut-set of (X, X') 
and $n(X, X') = |\mathrm{input}(X')|$ the size of the cut. 
If $n(X, X') \le K$, (X, X') is a K-feasible cut. The 
volume of (X, X') is vol(X, X') = |X'|. 

A topological order of the nodes in the net- 
work N is a linear ordering of the nodes in which 
each node appears after all of its predecessors 
and before any of its successors. Such an order 
is always possible for an acyclic graph. 


Problem Formulation 

A K-cover of a given Boolean network N is a 
network $N_M = (V_M, E_M)$, where $V_M$ consists 
of the PI nodes of N and some K-feasible cones 
of nodes in N, such that for each PO node v of 
N, $V_M$ contains a cone $C_v$ of v; and if $C_v \in V_M$, 
then for each non-PI node $u \in \mathrm{input}(C_v)$, $V_M$ 
also contains a cone $C_u$ of u. Edge $(u, C_v) \in E_M$ 
if and only if PI node $u \in \mathrm{input}(C_v)$; 
edge $(C_u, C_v) \in E_M$ if and only if non-PI node 
$u \in \mathrm{input}(C_v)$. Since each K-feasible cone 
can be implemented by a K-LUT, a K-cover 
can be implemented by a network of K-LUTs. 
Therefore, the technology mapping problem for 
K-LUT based FPGA, which is to transform N 
into a network of K-LUTs, is to find a K-cover 
$N_M$ of N. 

The depth of a network is the number of edges 
in its longest path. A technology mapping solution 
$N_M$ is depth optimal if among all possible 
mapping solutions of N it has the minimum 
depth. If each level of K-LUT logic is assumed 
to contribute a constant amount of logic delay 
(known as the unit delay model), the minimum 
depth corresponds to the smallest logic propaga- 
tion delay through the mapping solution, or in 
other words, the fastest K-LUT implementation 
of the network N. The problem solved by the 
FlowMap algorithm is depth-optimal technology 
mapping for K-LUT based FPGAs. 

A Boolean network that is not K-bounded 
may not have a mapping solution as defined 
above. To make a network K-bounded, gate 
decomposition may be used to break larger 
gates into smaller ones. The FlowMap algorithm 
applies, as preprocessing, an algorithm named 
DMIG [3] that converts all gates into 2-input 
ones in a depth-optimal fashion, thus making 
the network K-bounded for $K \ge 2$. Different 
decomposition schemes may result in different 
K-bounded networks and consequently different 
mapping solutions; the optimality of FlowMap 
is with respect to a given K-bounded network. 
Fig. 1 illustrates a Boolean network, its DAG, a 
covering with 3-feasible cones, and the resultant 
3-LUT network. As illustrated, the cones in 
the covering may overlap; this is allowed and 
often beneficial. (When the mapped network is 
implemented, the overlapped portion of logic 
will be replicated into each of the K-LUTs that 
contain it.) 

FPGA Technology Mapping, Fig. 1 A Boolean network, its DAG, a 3-feasible cone covering, and a 3-LUT mapping 


Key Results 


The FlowMap algorithm takes a two-phase ap- 
proach. In the first phase, it determines for each 
non-PI node a preferred K-feasible cone as a can- 
didate for the covering; the cones are computed 
such that if used, they will yield a depth-optimal 
mapping solution. This is the central piece of 
the algorithm. In the second phase, the cones 
necessary to form a cover are chosen to generate 
a mapping solution. 


Structure of Depth-Optimal K-Covers 

Let M(v) denote a K-cover (or equivalently, 
K-LUT mapping solution) of the input network $N_v$ 
of v. If v is a PI, M(v) consists of v itself. (For 
simplicity, in the rest of the article, M(v) shall be 
referred to as a K-cover of v.) With that defined, first 
there is 


Lemma 1 If $C_v$ is the K-feasible cone of v in a 
K-cover M(v), then $M(v) = \{C_v\} + \bigcup\{M(u) : u \in \mathrm{input}(C_v)\}$ 
where M(u) is a certain K-cover 
of u. Conversely, if $C_v$ is a K-feasible cone of v, 
and for each $u \in \mathrm{input}(C_v)$, M(u) is a K-cover of 
u, then $M(v) = \{C_v\} + \bigcup\{M(u) : u \in \mathrm{input}(C_v)\}$ 
is a K-cover of v. 


In other words, a K-cover of a node consists 
of a K-feasible cone of the node and a K-cover 
of each input of the cone. Note that for 
$u_1 \in \mathrm{input}(C_v)$, $u_2 \in \mathrm{input}(C_v)$, $M(u_1)$ and $M(u_2)$ 
may overlap, and an overlapped portion may or 
may not be covered the same way; the union 
above includes all distinct cones from all parts. 
Also note that for a given $C_v$, there can be 
different K-covers of v containing $C_v$, varying 
by the choice of M(u) for each $u \in \mathrm{input}(C_v)$. 

Let d(M(v)) denote the depth of M(v). Then 


Lemma 2 For K-cover $M(v) = \{C_v\} + \bigcup\{M(u) : u \in \mathrm{input}(C_v)\}$, 
$d(M(v)) = \max\{d(M(u)) : u \in \mathrm{input}(C_v)\} + 1$. 


In particular, let $M^*(u)$ denote a K-cover 
of u with minimum depth, then 
$d(M(v)) \ge \max\{d(M^*(u)) : u \in \mathrm{input}(C_v)\} + 1$; the 
equality holds when every M(u) in M(v) is of 
minimum depth. 

Recall that $C_v$ defines a K-feasible cut 
(X, X') where $X' = C_v$, $X = N_v - C_v$. Let 
H(X, X') denote the height of the cut (X, X'), 
defined as $H(X, X') = \max\{d(M^*(u)) : u \in \mathrm{input}(X')\} + 1$. 
Clearly, H(X, X') gives the 
minimum depth of any K-cover of v containing 
$C_v = X'$. Moreover, by properly choosing 
the cut, the height H(X, X') can be minimized, 
which leads to a K-cover with minimum 
depth: 


Theorem 1 If a K-feasible cut (X, X') of v has the 
minimum height among all K-feasible cuts of v, 
then the K-cover $M^*(v) = \{X'\} + \bigcup\{M^*(u) : u \in \mathrm{input}(X')\}$ 
is of minimum depth among all 
K-covers of v. 


That is, a minimum height K-feasible cut 
defines a minimum depth K-cover. So the cen- 
tral task for depth-optimal technology mapping 
becomes the computation of a minimum height 
K-feasible cut for each PO node. 

By definition, the height of a cut depends 
on the (depths of) minimum depth K-covers of 
nodes in $N_v - \{v\}$. This suggests a dynamic 
programming procedure that follows topological 
order, so that when the minimum depth K-cover 
of v is to be determined, a minimum depth 
K-cover of each node in $N_v - \{v\}$ is already known 
and the height of a cut can be readily determined. 
This is how the first phase of the FlowMap 
algorithm is carried out. 


Minimum Height K-Feasible Cut Computation 

The first phase of FlowMap was originally called 
the labeling phase, as it involves the computation 
of a label for each node in the K-bounded graph. 
The label of a non-PI node v, denoted l(v), is 
defined as the minimum height of any K-feasible 
cut of v. For convenience, the labels of PI nodes are 
defined to be 0. 

The so-defined label has an important monotonic 
property. 


Lemma 3 Let $p = \max\{l(u) : u \in \mathrm{input}(v)\}$, 
then $p \le l(v) \le p + 1$. 


Note that this also implies that for any node 
$u \in N_v - \{v\}$, $l(u) \le p$. Based on this, in order 
to find a minimum height K-feasible cut, it is 
sufficient to check if there is one of height p; if 
not, then any K-feasible cut will be of minimum 
height (p + 1), and one always exists for a 
K-bounded graph. 

The search for a K-feasible cut of height 
p (p > 0; the case p = 0 is trivial) in FlowMap is done 
by transforming $N_v$ into a flow network $F_v$ and 
computing a network flow [5] on it (hence the 
name). The transformation is as follows. For each 
node $u \in N_v - \{v\}$ with $l(u) < p$, $F_v$ has two nodes 
$u_1$ and $u_2$, linked by a bridge edge $(u_1, u_2)$; $F_v$ 
has a single sink node t for all other nodes in 
$N_v$, and a single source node s. For each PI node 
u of $N_v$, which corresponds to a bridge edge 
$(u_1, u_2)$ in $F_v$, $F_v$ contains edge $(s, u_1)$; for each 
edge (u, w) in $N_v$, if both u and w have bridge 
edges in $F_v$, then $F_v$ contains edge $(u_2, w_1)$; if 
u has a bridge edge but w does not, $F_v$ contains 
edge $(u_2, t)$; otherwise (neither has a bridge) 
no corresponding edge is in $F_v$. The bridge 
edges have unit capacity; all others have infinite 
capacity. Noting that each edge in $F_v$ with 
finite (unit) capacity corresponds to a node 
$u \in N_v$ with $l(u) < p$ and vice versa, and according 
to the max-flow min-cut theorem [5], it can be 
shown that: 


Lemma 4 Node v has a K-feasible cut of height 
p if and only if $F_v$ has a maximum network flow 
of size no more than K. 


On the flow network $F_v$, a maximum flow 
can be computed by running the augmenting 
path algorithm [5]. Once a maximum flow is 
obtained, the sink is no longer reachable from the 
source in the residual graph of the flow network, 
and the corresponding min-cut 
(X, X') can be identified as follows: $v \in X'$; for 
$u \in N_v - \{v\}$, if it is bridged in $F_v$, and $u_1$ can 
be reached in a depth-first search of the residual 
graph from s, then $u \in X$; otherwise $u \in X'$. 

Note that as soon as the flow size exceeds K, 
the computation can stop, knowing there will not 
be a desired K-feasible cut. In this case, one can 
modify the flow network by bridging all nodes in 
$N_v - \{v\}$, allowing the inclusion of nodes u with 
l(u) = p in the cut computation, and find a 
K-feasible cut with height p + 1 the same way. 

An augmenting path is found in time linear in 
the number of edges, and there are at most K 
augmentations for each cut computation. Applying 
the algorithm to every node in topological order, 
one would have the following result. 


Theorem 2 In a K-bounded Boolean network 
of n nodes and m edges, the computation of a 
minimum height K-feasible cut for every node 
can be completed in O(Kmn) time. 
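
The labeling phase can be sketched in a few dozen lines. The code below is an illustrative reading of the construction above, not the original implementation: the input format (`fanins`, a dict mapping every node to the list of its fanins, with an empty list for PI nodes) and all function names are assumptions, the network is assumed to be K-bounded, infinite capacities are modeled by K + 1 (which suffices because the search stops once the flow exceeds K), and only labels are computed, not the cuts themselves.

```python
from collections import defaultdict, deque

def topological_order(fanins):
    """Return the nodes so that every node follows all of its fanins."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)
    for v in fanins:
        visit(v)
    return order

def input_network(v, fanins):
    """Nodes of N_v: v together with all of its predecessors."""
    nodes, stack = {v}, [v]
    while stack:
        for u in fanins[stack.pop()]:
            if u not in nodes:
                nodes.add(u)
                stack.append(u)
    return nodes

def has_cut_of_height(v, p, fanins, label, K):
    """True iff the maximum flow in F_v is at most K (Lemma 4)."""
    nodes = input_network(v, fanins)
    bridged = {u for u in nodes if u != v and label[u] < p}
    cap = defaultdict(lambda: defaultdict(int))   # residual capacities
    INF = K + 1                                   # stands in for infinity
    for u in bridged:
        cap[(u, 1)][(u, 2)] = 1                   # bridge edge, unit capacity
        if not fanins[u]:                         # PI node: edge from source
            cap["s"][(u, 1)] = INF
    for w in nodes:
        for u in fanins[w]:
            if u in bridged:
                head = (w, 1) if w in bridged else "t"
                cap[(u, 2)][head] = INF
    flow = 0
    while flow <= K:
        parent = {"s": None}                      # BFS for an augmenting path
        queue = deque(["s"])
        while queue and "t" not in parent:
            a = queue.popleft()
            for b, c in list(cap[a].items()):
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if "t" not in parent:
            return True                           # max flow reached, size <= K
        b = "t"
        while parent[b] is not None:              # push one unit along the path
            a = parent[b]
            cap[a][b] -= 1
            cap[b][a] += 1
            b = a
        flow += 1
    return False                                  # flow exceeded K

def flowmap_labels(fanins, K):
    """Label every node in topological order using Lemma 3 and Lemma 4."""
    label = {}
    for v in topological_order(fanins):
        if not fanins[v]:
            label[v] = 0                          # PI nodes
            continue
        p = max(label[u] for u in fanins[v])
        found = p > 0 and has_cut_of_height(v, p, fanins, label, K)
        label[v] = p if found else p + 1
    return label

example = {"a": [], "b": [], "c": [], "d": [],
           "g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
```

For the example network, `flowmap_labels(example, 4)` assigns label 1 to `g3` (a single 4-LUT covers all three gates), while with K = 3 the four primary inputs cannot be separated by a cut of size 3 and `g3` receives label 2.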


The cut found by the algorithm has another 
property: 


Lemma 5 The cut (X, X') computed as above is 
the unique maximum volume min-cut; moreover, 
if (Y, Y') is another min-cut, then $Y' \subseteq X'$. 


Intuitively, a cut of larger volume defines a 
larger cone which covers more logic, therefore a 
cut of larger volume is preferred. Note however 
that Lemma 5 only claims maximum among min- 
cuts; if n(X,X’) < K, there can be other cuts 
that are still K-feasible but with larger cut size 
and larger cut volume. A post-processing algorithm 
used by FlowMap tries to grow (X, X') by 
collapsing all nodes in X', plus one or more in the 
cut-set, into the sink, and repeating the flow 
computation; this will force a cut of larger volume, an 
improvement if it is still K-feasible. 


K-Cover Construction 

Once minimum height K-feasible cuts have been 
computed for all nodes, each node v has a 
K-feasible cone $C_v$ defined by its cut, which has 
minimum depth. From here, constructing the 
K-cover $N_M = (V_M, E_M)$ is straightforward. 
First, the cones of all PO nodes are included in 
$V_M$. Then, for any cone $C_v \in V_M$, cone $C_u$ for 
each non-PI node $u \in \mathrm{input}(C_v)$ is also included 
in $V_M$; so is every PI node $u \in \mathrm{input}(C_v)$. 
Similarly, an edge $(C_u, C_v) \in E_M$ for each 
non-PI node $u \in \mathrm{input}(C_v)$; an edge $(u, C_v) \in E_M$ 
for each PI node $u \in \mathrm{input}(C_v)$. 
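
The construction amounts to a traversal from the primary outputs back toward the primary inputs. The sketch below illustrates it under assumptions: the chosen cone of each non-PI node v is represented only by its input set (`cone_inputs[v]`), each cone is identified with its root node, primary outputs are assumed to be non-PI nodes, and the name `build_cover` is hypothetical.

```python
def build_cover(cone_inputs, primary_inputs, primary_outputs):
    """Select the cones needed for a K-cover.

    cone_inputs:     dict mapping each non-PI node v to the set input(C_v)
    primary_inputs:  set of PI nodes
    primary_outputs: iterable of PO nodes (assumed to be non-PI nodes)
    Returns (cone roots, PI nodes used, edges), with every cone C_v
    represented by its root v.
    """
    roots, used_pis, edges = set(), set(), set()
    stack = list(primary_outputs)
    while stack:
        v = stack.pop()
        if v in roots:
            continue
        roots.add(v)                      # cone C_v joins V_M
        for u in cone_inputs[v]:
            edges.add((u, v))             # edge (u, C_v) or (C_u, C_v)
            if u in primary_inputs:
                used_pis.add(u)
            else:
                stack.append(u)           # cone C_u is needed as well
    return roots, used_pis, edges

# g3's cone absorbs g1, so no separate LUT is generated for g1.
cones = {"g1": {"a", "b"}, "g2": {"c", "d"}, "g3": {"a", "b", "g2"}}
roots, pis, edges = build_cover(cones, {"a", "b", "c", "d"}, ["g3"])
```

In the example only the cones rooted at `g3` and `g2` are selected. Each node and each cone input is handled once per selected cone, which is consistent with the linear running time claimed for this phase.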


Lemma 6 The K-cover constructed as above is 
depth optimal. 


This is a linear time procedure, therefore 


Theorem 3 The problem of depth-optimal tech- 
nology mapping for K-LUT based FPGAs on a 
Boolean network of n nodes and m edges can be 
solved in O(Kmn) time. 


Applications 


The FlowMap algorithm has been used as a 
centerpiece or a framework for more complicated 
FPGA logic synthesis and technology mapping 
algorithms. There are many possible variations 
that can address various needs in its applications. 
Some are briefed below; details of such varia- 
tions/applications can be found in [1, 3]. 


Complicated Delay Models 

With minimal change, the algorithm can be applied 
where a non-unit delay model is used, allowing 
the delay of the nodes and/or the edges to vary, 
as long as they are static. Dynamic delay models, 
where the delay of a net is determined by its 
post-mapping structure, cannot be handled by the 
algorithm. In fact, delay-optimal mapping under 
dynamic delay models is NP-hard [3]. 


Complicated Architectures 

The algorithm can be adapted to FPGA architec- 
tures that are more sophisticated than homoge- 
neous K-LUT arrays. For example, mapping for 
FPGA with two LUT sizes can be carried out by 
computing a cone for each size and dynamically 
choosing the best one. 


Multiple Optimization Objectives 

While the algorithm is for delay minimization, 
area minimization (in terms of the number of 
cones selected) as well as other objectives can 
also be incorporated, by adapting the criteria for 
cut selection. The original algorithm considers 
area minimization by maximizing the volume of 
the cuts; substantially more area reduction can be 
achieved by considering more K-feasible cuts 
and making smart choices to, e.g., increase sharing 
among input networks, allow cuts of larger 
heights along noncritical paths, etc. [4]. Achieving 
area optimality, however, is NP-hard [3]. 


Integration with Other Optimizations 

The algorithm can be combined with other types 
of optimizations, including retiming, logic 
resynthesis, and physical synthesis. 

Problem Definition 


This entry presents results on fast algorithms that produce
approximate solutions to problems which can be formulated as
linear programs (LP) and therefore can be solved exactly, albeit
with slower running times. The general format of the family of
these problems is the following: Given a set of $m$ inequalities
on $n$ variables, and an oracle that produces the solution of an
appropriate optimization problem over a convex set
$P \subseteq \mathbb{R}^n$, find a solution $x \in P$ that
satisfies the inequalities, or detect that no such $x$ exists.
The basic idea of the algorithm will always be to start from an
infeasible solution $x$, and use the optimization oracle to find a
direction in which the violation of the inequalities can be
decreased; this is done by calculating a vector $y$ that is a dual
solution corresponding to $x$. Then, $x$ is carefully updated
towards that direction, and the process is repeated until $x$
becomes “approximately” feasible. In what follows, the particular
problems tackled, together with the corresponding optimization
oracle, as well as the different notions of “approximation” used
are defined.


Fractional Packing and Covering Problems 


• The fractional packing problem and its oracle are defined as
follows:


PACKING: Given an $m \times n$ matrix $A$, $b > 0$, and a convex
set $P$ in $\mathbb{R}^n$ such that $Ax \ge 0$, $\forall x \in P$,
is there $x \in P$ such that $Ax \le b$?

PACK_ORACLE: Given an $m$-dimensional vector $y \ge 0$ and $P$ as
above, return $\bar{x} := \arg\min\{y^T A x : x \in P\}$.


• The relaxed fractional packing problem and its oracle are
defined as follows:


RELAXED PACKING: Given $\epsilon > 0$, an $m \times n$ matrix $A$,
$b > 0$, and convex sets $P$ and $\hat{P}$ in $\mathbb{R}^n$ such
that $P \subseteq \hat{P}$ and $Ax \ge 0$,
$\forall x \in \hat{P}$, find $x \in \hat{P}$ such that
$Ax \le (1+\epsilon)b$, or show that $\nexists x \in P$ such that
$Ax \le b$.

REL_PACK_ORACLE: Given an $m$-dimensional vector $y \ge 0$ and
$P$, $\hat{P}$ as above, return $\bar{x} \in \hat{P}$ such that
$y^T A \bar{x} \le \min\{y^T A x : x \in P\}$.


• The fractional covering problem and its oracle are defined as
follows:


COVERING: Given an $m \times n$ matrix $A$, $b > 0$, and a convex
set $P$ in $\mathbb{R}^n$ such that $Ax \ge 0$, $\forall x \in P$,
is there $x \in P$ such that $Ax \ge b$?

COVER_ORACLE: Given an $m$-dimensional vector $y \ge 0$ and $P$ as
above, return $\bar{x} := \arg\max\{y^T A x : x \in P\}$.


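• The simultaneous packing and covering problem and its oracle
are defined as follows:


SIMULTANEOUS PACKING AND COVERING: Given $\hat{m} \times n$ and
$(m - \hat{m}) \times n$ matrices $\hat{A}$, $A$ respectively,
$b > 0$ and $\hat{b} > 0$, and a convex set $P$ in $\mathbb{R}^n$
such that $Ax \ge 0$ and $\hat{A}x \ge 0$, $\forall x \in P$, is
there $x \in P$ such that $Ax \le b$ and $\hat{A}x \ge \hat{b}$?

SIM_ORACLE: Given $P$ as above, a constant $\nu$, and a dual
solution $(y, \hat{y})$, return $\bar{x} \in P$ such that
$A\bar{x} \le \nu b$ and

$$y^T A \bar{x} - \sum_{i \in I(\nu,\bar{x})} \hat{y}_i \hat{a}_i \bar{x}
  = \min\Big\{ y^T A x - \sum_{i \in I(\nu,x)} \hat{y}_i \hat{a}_i x :
  x \text{ a vertex of } P \text{ such that } Ax \le \nu b \Big\},$$

where $I(\nu, x) := \{i : \hat{a}_i x \le \nu \hat{b}_i\}$.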




• The general problem and its oracle are defined as follows:


GENERAL: Given an $m \times n$ matrix $A$, an arbitrary vector
$b$, and a convex set $P$ in $\mathbb{R}^n$, is there $x \in P$
such that $Ax \le b$?

GEN_ORACLE: Given an $m$-dimensional vector $y \ge 0$ and $P$ as
above, return $\bar{x} := \arg\min\{y^T A x : x \in P\}$.


Definitions and Notation 


For an error parameter $\epsilon > 0$, a point $x \in P$ is an
$\epsilon$-approximation solution for the fractional packing (or
covering) problem if $Ax \le (1+\epsilon)b$ (or
$Ax \ge (1-\epsilon)b$). On the other hand, if $x \in P$ satisfies
$Ax \le b$ (or $Ax \ge b$), then $x$ is an exact solution. For the
GENERAL problem, given an error parameter $\epsilon > 0$ and a
positive tolerance vector $d$, $x \in P$ is an
$\epsilon$-approximation solution if $Ax \le b + \epsilon d$ and
an exact solution if $Ax \le b$. An $\epsilon$-relaxed decision
procedure for these problems either finds an
$\epsilon$-approximation solution or correctly reports that no
exact solution exists. In general, for a minimization
(maximization) problem, a $(1+\epsilon)$-approximation
($(1-\epsilon)$-approximation) algorithm returns a solution at
most $(1+\epsilon)$ (at least $(1-\epsilon)$) times the optimal.
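The approximation notions above translate directly into componentwise checks. The following is a minimal Python sketch, assuming dense matrices stored as lists of rows; the function names are illustrative and do not come from [8].

```python
def mat_vec(A, x):
    """Return the vector Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_eps_approx_packing(A, b, x, eps):
    """Check Ax <= (1 + eps) b componentwise (eps = 0 tests exactness)."""
    return all(ax <= (1 + eps) * bi for ax, bi in zip(mat_vec(A, x), b))


def is_eps_approx_covering(A, b, x, eps):
    """Check Ax >= (1 - eps) b componentwise (eps = 0 tests exactness)."""
    return all(ax >= (1 - eps) * bi for ax, bi in zip(mat_vec(A, x), b))


def is_eps_approx_general(A, b, d, x, eps):
    """Check Ax <= b + eps * d for a positive tolerance vector d."""
    return all(ax <= bi + eps * di
               for ax, bi, di in zip(mat_vec(A, x), b, d))
```

Membership of x in P is not tested here; it is assumed to be guaranteed by whatever procedure produced x.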

The algorithms developed work within time that depends
polynomially on $\epsilon^{-1}$, for any error parameter
$\epsilon > 0$. Their running time will also depend on the width
$\rho$ of the convex set $P$ relative to the set of inequalities
$Ax \le b$ or $Ax \ge b$ defining the problem at hand. More
specifically, the width $\rho$ is defined as follows for each one
of the problems considered here, where $a_i$ denotes the $i$-th
row of $A$:


• PACKING: $\rho := \max_i \max_{x \in P} \frac{a_i x}{b_i}$.

• RELAXED PACKING:
  $\hat{\rho} := \max_i \max_{x \in \hat{P}} \frac{a_i x}{b_i}$.

• COVERING: $\rho := \max_i \max_{x \in P} \frac{a_i x}{b_i}$.

• SIMULTANEOUS PACKING AND COVERING:
  $\rho := \max_{x \in P} \max\Big\{\max_i \frac{a_i x}{b_i},
  \max_i \frac{\hat{a}_i x}{\hat{b}_i}\Big\}$.

• GENERAL:
  $\rho := \max_i \max_{x \in P} \frac{|a_i x - b_i|}{d_i} + 1$,
  where $d$ is the tolerance vector defined above.
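As a small illustration of the width, the sketch below evaluates the PACKING and GENERAL definitions for a polytope $P$ given explicitly by a finite list of vertices. This representation is an assumption made only for the example (in the applications $P$ is usually accessed through the oracle). Because $a_i x$ is linear and $|a_i x - b_i|$ is convex in $x$, both maxima over $P$ are attained at a vertex.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def packing_width(A, b, vertices):
    """rho = max_i max_{x in P} a_i x / b_i, with P = conv(vertices)."""
    return max(dot(row, v) / bi
               for v in vertices
               for row, bi in zip(A, b))


def general_width(A, b, d, vertices):
    """rho = max_i max_{x in P} |a_i x - b_i| / d_i + 1."""
    return max(abs(dot(row, v) - bi) / di
               for v in vertices
               for row, bi, di in zip(A, b, d)) + 1
```

For the unit simplex in two dimensions with $A = 2I$ and $b = (1, 1)$, `packing_width` evaluates to 2.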


Key Results 


Many of the results below were presented in [8] 
by assuming a model of computation with exact 
arithmetic on real numbers and exponentiation in 
a single step. But, as the authors mention [8], 
they can be converted to run on the RAM model 
by using approximate exponentiation, a version 
of the oracle that produces a nearly optimal 
solution, and a limit on the numbers used that 
is polynomial in the input length similar to the 
size of numbers used in exact linear programming 
algorithms. However, they leave as an open problem the
construction of $\epsilon$-approximate solutions using
polylogarithmic precision for the general case of the problems
they consider (as can be done, e.g., in the multicommodity flow
case [4]).


Theorem 1 For $0 < \epsilon < 1$, there is a deterministic
$\epsilon$-relaxed decision procedure for the fractional packing
problem that uses $O(\epsilon^{-2} \rho \log(m\epsilon^{-1}))$
calls to PACK_ORACLE, plus the time to compute $Ax$ for the
current iterate $x$ between consecutive calls.
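To make the roles of the dual vector $y$ and of PACK_ORACLE concrete, here is a deliberately simplified Python sketch. It is not the procedure of [8] and does not achieve the call bound of Theorem 1: it assumes $P$ is given by a finite vertex list, uses exponential weights with a hypothetical sharpness parameter `alpha`, and moves toward the oracle's answer with a Frank-Wolfe style step $2/(t+2)$. The infeasibility certificate is exact: since $y \ge 0$, any $x$ with $Ax \le b$ satisfies $y^T A x \le y^T b$, so a minimum above $y^T b$ rules out an exact solution.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(A, x):
    return [dot(row, x) for row in A]


def pack_oracle(y, A, vertices):
    """Minimize y^T A x over P = conv(vertices); a vertex suffices."""
    return min(vertices, key=lambda v: dot(y, mat_vec(A, v)))


def relaxed_packing(A, b, vertices, eps, max_iter=100000):
    """Return ("approx", x), ("infeasible", y), or ("unknown", x)."""
    x = list(vertices[0])
    alpha = 2.0 * math.log(2 * len(b)) / eps  # assumed weight sharpness
    for t in range(max_iter):
        ratios = [ax / bi for ax, bi in zip(mat_vec(A, x), b)]
        lam = max(ratios)
        if lam <= 1 + eps:
            return "approx", x
        # Dual weights: rows close to the worst violation dominate.
        # Subtracting lam only rescales y and avoids overflow.
        y = [math.exp(alpha * (r - lam)) / bi for r, bi in zip(ratios, b)]
        x_bar = pack_oracle(y, A, vertices)
        if dot(y, mat_vec(A, x_bar)) > dot(y, b):
            return "infeasible", y  # y certifies that no exact solution exists
        step = 2.0 / (t + 2)
        x = [(1 - step) * xi + step * xb for xi, xb in zip(x, x_bar)]
    return "unknown", x
```

On the unit simplex with $A = 2I$, the call with $b = (1, 1)$ ends with an $\epsilon$-approximation solution, while $b = (0.5, 0.5)$ ends with a dual certificate.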


For the case of $P$ being written as a product of
smaller-dimension polytopes, i.e., $P = P^1 \times \dots \times P^k$,
each $P^l$ with width $\rho^l$ (obviously
$\rho \le \sum_l \rho^l$), and a separate PACK_ORACLE for each
$P^l, A^l$, randomization can be used to potentially speed up the
algorithm. By using the notation PACK_ORACLE$_l$ for the
$P^l, A^l$ oracle, the following holds:


Theorem 2 For $0 < \epsilon < 1$, there is a randomized
$\epsilon$-relaxed decision procedure for the fractional packing
problem that is expected to use

$$O\Big(\epsilon^{-2}\Big(\sum_l \rho^l\Big)\log(m\epsilon^{-1})
  + k \log(\rho\epsilon^{-1})\Big)$$

calls to PACK_ORACLE$_l$ for some $l \in \{1,\dots,k\}$ (possibly
a different $l$ in every call), plus the time to compute
$\sum_l A^l x^l$ for the current iterate
$x = (x^1, x^2, \dots, x^k)$ between consecutive calls.


Theorem 2 holds for RELAXED PACKING as well, if $\rho$ is replaced
by $\hat{\rho}$ and PACK_ORACLE by REL_PACK_ORACLE.

In fact, one needs only an approximate version of PACK_ORACLE.
Let $C_P(y)$ be the minimum cost $y^T A x$ achieved by
PACK_ORACLE for a given $y$.


Theorem 3 Let PACK_ORACLE be replaced by an oracle that, given a
vector $y \ge 0$, finds a point $\bar{x} \in P$ such that
$y^T A \bar{x} \le (1 + \epsilon/2) C_P(y) + (\epsilon/2) \lambda y^T b$,
where $\lambda$ is minimum so that $Ax \le \lambda b$ is satisfied
by the current iterate $x$. Then, Theorems 1 and 2 still hold.


Theorem 3 shows that even if no efficient implementation exists
for an oracle, as in, e.g., the case when this oracle solves an
NP-hard problem, a fully polynomial approximation scheme for it
suffices.

Similar results can be proven for the fractional covering problem
(COVER_ORACLE$_l$ is defined similarly to PACK_ORACLE$_l$ above):


Theorem 4 For $0 < \epsilon < 1$, there is a deterministic
$\epsilon$-relaxed decision procedure for the fractional covering
problem that uses
$O(m + \rho \log^2 m + \epsilon^{-2} \rho \log(m\epsilon^{-1}))$
calls to COVER_ORACLE, plus the time to compute $Ax$ for the
current iterate $x$ between consecutive calls.


Theorem 5 For $0 < \epsilon < 1$, there is a randomized
$\epsilon$-relaxed decision procedure for the fractional covering
problem that is expected to use

$$O\Big(mk + \Big(\sum_l \rho^l\Big)\log^2 m + k \log \epsilon^{-1}
  + \epsilon^{-2}\Big(\sum_l \rho^l\Big)\log(m\epsilon^{-1})\Big)$$

calls to COVER_ORACLE$_l$ for some $l \in \{1,\dots,k\}$ (possibly
a different $l$ in every call), plus the time to compute
$\sum_l A^l x^l$ for the current iterate
$x = (x^1, x^2, \dots, x^k)$ between consecutive calls.


Let $C_C(y)$ be the maximum cost $y^T A x$ achieved by
COVER_ORACLE for a given $y$.


Theorem 6 Let COVER_ORACLE be replaced by an oracle that, given a
vector $y \ge 0$, finds a point $\bar{x} \in P$ such that
$y^T A \bar{x} \ge (1 - \epsilon/2) C_C(y) - (\epsilon/2) \lambda y^T b$,
where $\lambda$ is maximum so that $Ax \ge \lambda b$ is satisfied
by the current iterate $x$. Then, Theorems 4 and 5 still hold.


For the simultaneous packing and covering 
problem, the following is proven: 


Theorem 7 For $0 < \epsilon < 1$, there is a randomized
$\epsilon$-relaxed decision procedure for the simultaneous packing
and covering problem that is expected to use
$O(m^2 (\log^2 \rho)\, \epsilon^{-2} \log(\epsilon^{-1} m \log \rho))$
calls to SIM_ORACLE, and a deterministic version that uses a
factor of $\log \rho$ more calls, plus the time to compute $Ax$
for the current iterate $x$ between consecutive calls.


For the GENERAL problem, the following is 
shown: 


Theorem 8 For $0 < \epsilon < 1$, there is a deterministic
$\epsilon$-relaxed decision procedure for the GENERAL problem that
uses $O(\epsilon^{-2} \rho^2 \log(m \rho \epsilon^{-1}))$ calls to
GEN_ORACLE, plus the time to compute $Ax$ for the current iterate
$x$ between consecutive calls.


The running times of these algorithms are proportional to the
width $\rho$, and the authors devise techniques to reduce this
width for many special cases of the problems considered. One
example of the results obtained by these techniques is the
following: If a packing problem is defined by a convex set that is
a product of $k$ smaller-dimension convex sets, i.e.,
$P = P^1 \times \dots \times P^k$, and the inequalities
$\sum_l A^l x^l \le b$, then there is a randomized
$\epsilon$-relaxed decision procedure that is expected to use
$O(\epsilon^{-2} k \log(m\epsilon^{-1}) + k \log k)$ calls to a
subroutine that finds a minimum-cost point in
$\hat{P}^l = \{x^l \in P^l : A^l x^l \le b\}$, $l = 1,\dots,k$,
and a deterministic version that uses
$O(\epsilon^{-2} k^2 \log(m\epsilon^{-1}))$ such calls, plus the
time to compute $Ax$ for the current iterate $x$ between
consecutive calls. This result can be applied to the
multicommodity flow problem, but the required subroutine is a
single-source minimum-cost flow computation, instead of a
shortest-path calculation needed for the original algorithm.




Applications 


The results presented above can be used in order to obtain fast
approximate solutions to linear programs, even if these can be
solved exactly by LP algorithms. Many approximation algorithms are
based on the rounding of the solution of such programs, and hence
one might want to solve them approximately (with the overall
approximation factor absorbing the LP solution approximation
factor), but more efficiently. Two such examples, which appear in
[8], are mentioned here.

Theorems 1 and 2 can be applied for the improvement of the running
time of the algorithm by Lenstra, Shmoys, and Tardos [5] for the
scheduling of unrelated parallel machines without preemption
($R||C_{\max}$): $N$ jobs are to be scheduled on $M$ machines,
with each job $i$ scheduled on exactly one machine $j$ with
processing time $p_{ij}$, so that the maximum total processing
time over all machines is minimized. Then, for any fixed $r > 1$,
there is a deterministic $(1+r)$-approximation algorithm that runs
in $O(M^2 N \log^2 N \log M)$ time and a randomized version that
runs in $O(MN \log M \log N)$ expected time. For the version of
the problem with preemption, there are polynomial-time
approximation schemes that run in $O(MN^2 \log^2 N)$ time and
$O(MN \log N \log M)$ expected time in the deterministic and
randomized case, respectively.

A well-known lower bound for the metric Traveling Salesman Problem
(metric TSP) on $N$ nodes is the Held-Karp bound [2], which can be
formulated as the optimum of a linear program over the subtour
elimination polytope. By using a randomized minimum-cut algorithm
by Karger and Stein [3], one can obtain a randomized approximation
scheme that computes the Held-Karp bound in $O(N^4 \log^6 N)$
expected time.


Open Problems 


The main open problem is the further reduction of the running time
for the approximate solution of the various fractional problems.
One direction would be to improve the bounds for specific
problems, as has been done very successfully for the
multicommodity flow problem in a series of papers starting with
Shahrokhi and Matula [9]. Currently, the best running times for
several versions of the multicommodity flow problems are achieved
by Madry [6]. Shahrokhi and Matula [9] also led to a series of
results by Grigoriadis and Khachiyan, developed independently of
[8], starting with [1], which presents an algorithm with a number
of calls smaller than the one in Theorem 1 by a factor of
$\log(m\epsilon^{-1}) / \log m$. Considerable effort has been
dedicated to the reduction of the dependence of the running time
on the width of the problem or the reduction of the width itself
(e.g., see [10] for sequential and parallel algorithms for mixed
packing and covering), so this can be another direction of
improvement.

A problem left open by [8] is the development of approximation
schemes for the RAM model that use precision only polylogarithmic
in the input length and work for the general case of the problems
considered.


Problem Definition


This problem is to enumerate all subgraphs appearing with
frequencies not less than a threshold value in a given graph data
set. Let $G(V, E, L, \ell)$ be a labeled graph where $V$ is a set
of vertices, $E \subseteq V \times V$ a set of edges, $L$ a set of
labels, and $\ell : V \cup E \to L$ a labeling function. A labeled
graph $g(v, e, L, \ell)$ is a subgraph of $G(V, E, L, \ell)$,
i.e., $g \subseteq G$, if and only if a mapping $f : v \to V$
exists such that $\forall u_i \in v$, $f(u_i) \in V$,
$\ell(u_i) = \ell(f(u_i))$, and $\forall (u_i, u_j) \in e$,
$(f(u_i), f(u_j)) \in E$,
$\ell(u_i, u_j) = \ell(f(u_i), f(u_j))$. Given a graph data set
$D = \{G_i \mid i = 1,\dots,n\}$, a support of $g$ in $D$ is the
set of all $G_i$ involving $g$ in $D$, i.e.,
$D(g) = \{G_i \mid g \subseteq G_i \in D\}$. Under a given
threshold frequency called a minimum support $minsup > 0$, $g$ is
said to be frequent if the size of $D(g)$, i.e., $|D(g)|$, is
greater than or equal to $minsup$. Generic frequent graph mining
is the problem of enumerating all frequent subgraphs $g$ of $D$,
while most algorithms focus on connected and undirected subgraphs.
Some focus on induced subgraphs or limit the enumeration to closed
frequent subgraphs, each of which is maximal among the frequent
subgraphs having an identical support.
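The definitions of support and frequency can be made concrete with a brute-force Python sketch for very small graphs. It assumes undirected graphs without self-loops, stored as a pair (vertex-label dict, edge-label dict keyed by `frozenset({u, v})`), and it assumes the mapping $f$ is injective, which the definition above does not state explicitly. The function names are illustrative only; the exhaustive search is exponential and is not how gSpan or Gaston work.

```python
from itertools import permutations


def is_subgraph(g, G):
    """Brute-force test for a label-preserving injective embedding of g in G."""
    g_vertices, g_edges = g
    G_vertices, G_edges = G
    nodes = list(g_vertices)
    for image in permutations(G_vertices, len(nodes)):
        f = dict(zip(nodes, image))
        if any(g_vertices[u] != G_vertices[f[u]] for u in nodes):
            continue  # some vertex label is not preserved
        ok = True
        for edge, label in g_edges.items():
            u, w = tuple(edge)
            if G_edges.get(frozenset((f[u], f[w]))) != label:
                ok = False  # edge missing in G or edge label differs
                break
        if ok:
            return True
    return False


def support(g, data_set):
    """D(g): the graphs of the data set that contain g."""
    return [G for G in data_set if is_subgraph(g, G)]


def is_frequent(g, data_set, minsup):
    return len(support(g, data_set)) >= minsup
```

For example, a single edge labeled x between vertices labeled A and B is contained both in a triangle with vertex labels A, B, C and in a two-vertex graph A, B joined by an x edge, so its support has size 2.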




Key Results 


The study of frequent graph mining was initiated in the mid-1990s,
motivated by the need to analyze complex structured data acquired
and accumulated in our society. The major issue in these studies
has been to find principles for efficiently extracting frequent
subgraphs embedded in a given graph data set. The studies
introduced many original canonical graph representations adapted
to the data-driven extraction, which are different from those
proposed in studies of efficient isomorphism checking [1] and
graph enumeration without duplications [2].

The pioneering algorithms of frequent graph mining, SUBDUE [3] and
GBI [4], did not solve the aforementioned standard problem but
greedily extracted subgraphs concisely describing the original
graph data under some measures such as minimum description length
(MDL). The earliest algorithms for deriving a complete set of the
frequent subgraphs are WARMER [5] and its extension FARMER [6].
They can flexibly focus on various types of frequent subgraphs for
the enumeration by applying inductive logic programming (ILP) from
artificial intelligence, but they do not scale well with the size
of the enumerated subgraphs.

AGM, proposed in 2000 [7, 8], was an epoch-making study in the
sense that it combined frequent item set mining [9] with graph
enumeration and enhanced the scalability for practical
applications. It introduced the technical strategies of (1)
incremental candidate enumeration based on anti-monotonicity of
the subgraph frequency, (2) canonical graph representation to
avoid duplicated candidate subgraph extractions, and (3)
data-driven pruning of the candidates by the minimum support. The
anti-monotonicity is a fundamental property of the subgraph
frequency: $|D(g_1)| \le |D(g_2)|$ for any subgraphs $g_1$ and
$g_2$ in $D$ if $g_2 \subseteq g_1$. Dozens of frequent graph
mining algorithms have been studied along this line after 2000. In
the rest of this entry, gSpan [10] and Gaston [11], considered to
be the most efficient to date, are explained.


gSpan 

gSpan derives all frequent connected subgraphs in a given data set
of connected and undirected graphs [10]. For the aforementioned
strategy (1), it applies a pattern growth technique, which is a
data-driven enumeration of candidate frequent subgraphs. It is
performed by tracing vertices and edges of each data graph $G$ in
a DFS manner. Figure 1b is an example search tree generated by
starting from the vertex labeled as Y in the graph (a). In a
search tree $T$, the vertex for the next visit is the one
reachable from the current vertex by passing through an edge not
yet traced in $G$. If the vertex for the next visit has been
visited earlier in $G$, the edge is called a backward edge;
otherwise, it is called a forward edge. They are depicted by
dashed and solid lines, respectively, in Fig. 1b. When no more
untraced edges are available from the current vertex, the search
backtracks to the nearest vertex having untraced edges. Any
subtree of $T$ represents a subgraph of $G$. We denote the sets of
the forward and the backward edges in $T$ as
$E_{f,T} = \{e \mid \forall i, j,\ i < j,\ e = (v_i, v_j) \in E\}$
and
$E_{b,T} = \{e \mid \forall i, j,\ i > j,\ e = (v_i, v_j) \in E\}$,
respectively, where $i$ and $j$ are integer indices numbered at
the vertices in their visiting order in $T$.

There exist many trees $T$ representing an identical graph $G$;
another tree of the graph (a) is shown in Fig. 1c. This ambiguity,
which causes duplicates and misses in the candidate graph
enumeration, is avoided by introducing the strategy (2). gSpan
applies the following three types of partial orders of the edges
in $T$. Given $e_1 = (v_{i_1}, v_{j_1})$ and
$e_2 = (v_{i_2}, v_{j_2})$,


$e_1 \prec_{f,T} e_2$ if and only if $j_1 < j_2$ for
$e_1, e_2 \in E_{f,T}$,

$e_1 \prec_{b,T} e_2$ if and only if (i) $i_1 < i_2$ or (ii)
$i_1 = i_2$ and $j_1 < j_2$, for $e_1, e_2 \in E_{b,T}$,

$e_1 \prec_{bf,T} e_2$ if and only if (i) $i_1 < j_2$ for
$e_1 \in E_{b,T}$, $e_2 \in E_{f,T}$ or (ii) $j_1 \le i_2$ for
$e_1 \in E_{f,T}$, $e_2 \in E_{b,T}$.


The combination of these partial orders is known to give a linear
order of the edges. We also assume a total order of the labels in
$L$ and define a representation of $T$, a DFS code, as a sequence
of 5-tuples
$((v_i, v_j), \ell(v_i), \ell(v_i, v_j), \ell(v_j))$ following the
trace order of the DFS in $T$. A DFS code is smaller if smaller
edges and smaller labels appear in earlier 5-tuples in the
sequence. Accordingly, we define the search tree $T$ having the
minimum DFS code as a canonical representation of $G$. The search
tree in Fig. 1c is canonical, since its DFS code is the minimum.
As any subtree of a canonical $T$ has its minimum DFS code, it is
also canonical for its corresponding subgraph.
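The three edge orders can be combined into a single comparator. The Python sketch below assumes an edge is represented only by its pair of DFS indices $(i, j)$, so that it is a forward edge when $i < j$ and a backward edge when $i > j$; vertex and edge labels, which a full DFS code comparison also uses, are left out. The function names are illustrative.

```python
from functools import cmp_to_key


def is_forward(edge):
    i, j = edge
    return i < j


def edge_less(e1, e2):
    """Return True if e1 precedes e2 under the combined gSpan edge order."""
    (i1, j1), (i2, j2) = e1, e2
    f1, f2 = is_forward(e1), is_forward(e2)
    if f1 and f2:                  # forward / forward
        return j1 < j2
    if not f1 and not f2:          # backward / backward
        return (i1, j1) < (i2, j2)
    if not f1 and f2:              # backward e1, forward e2
        return i1 < j2
    return j1 <= i2                # forward e1, backward e2


def sort_edges(edges):
    """Sort DFS-indexed edges of one search tree into their trace order."""
    def compare(a, b):
        if a == b:
            return 0
        return -1 if edge_less(a, b) else 1
    return sorted(edges, key=cmp_to_key(compare))
```

Applied to the index pairs of a valid DFS traversal given in arbitrary order, `sort_edges` recovers the order in which the edges were traced.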

Moreover, gSpan applies the DFS which chooses the untraced edge
having the smallest 5-tuple at the current vertex for visiting the
next vertex. This focuses on the minimum DFS code and ensures that
the canonical subtree of every subgraph is enumerated before its
other noncanonical subtrees are found. This efficiently prunes an
infrequent subgraph without matching its multiple DFS codes in
(3). In this manner, the canonical graph representation of gSpan
is fully adapted to its search algorithm.

Frequent Graph Mining, Fig. 1 A data graph and its search trees

Frequent Graph Mining, Fig. 2 Examples of paths and their backbones


Gaston 

Gaston also derives all frequent connected subgraphs in a given
data set of connected and undirected graphs [11]. It exploits the
polynomial time complexity of the enumeration of paths and free
trees. Gaston uses a canonical path representation, a backbone, in
(2). Two sequences of the labels of the vertices and edges,
starting from a center, which is a middle vertex of the path, and
running to each of the two terminals, are derived as shown in
Fig. 2a, and the reverse of the lexicographically smaller sequence
with the larger sequence appended is defined to be the backbone.
In the case of a path having an even number of vertices, two
centers, which are the two middle vertices, are used as shown in
Fig. 2b. Starting from a single vertex, Gaston extends a path by
adding a vertex to one of the terminals in the strategy (1).
Finally, it efficiently counts the frequency of the extended path
in the data set by using its backbone and prunes the infrequent
paths in (3).

Gaston further enumerates free trees involving a frequent path as
the longest path by iteratively adding vertices to the vertices in
the free trees except for the terminal vertices of the path. Since
sets of free trees whose longest paths have distinct backbones are
themselves distinct, these sets are pairwise disjoint. This
reduces the complexity of the enumeration in (1). Moreover, Gaston
derives a canonical representation of a free tree, a canonical
depth sequence, for (2) by transforming the tree to a rooted and
ordered tree where the root is the center of its longest path, and
its vertices and edges are arranged in a lexicographically
descending order of the labels. If two centers exist in the path,
the free tree is partitioned for each center, and each free tree
is represented by its canonical depth sequence. Similarly to the
DFS code of gSpan, any subtree involving the root in this rooted
and ordered tree is a canonical depth sequence. This is beneficial
for (1), since all canonical depth sequences are incrementally
obtained in the depth-first search. Gaston efficiently prunes the
infrequent free trees in the data set by using the canonical depth
sequence in (3).

Gaston further enumerates cyclic subgraphs from a frequent free
tree by iteratively adding edges bridging vertex pairs in the tree
in (1). For (2), Gaston avoids duplicated enumerations of the
cyclic subgraphs by using the Nauty algorithm for graph
isomorphism checking [1]. It prunes the infrequent cyclic
subgraphs of the data set in (3) and finally derives the frequent
subgraphs. Gaston works very efficiently for sparse data graphs,
since there are fewer candidate cyclic subgraphs in such graphs.


URLs to Code and Data Sets 


gSpan suite (http://www.cs.ucsb.edu/~xyan/software/gSpan.htm),
Gaston suite (http://www.liacs.nl/~snijssen/gaston/). Other common
suites can be found for various frequent substructure mining
(http://hms.liacs.nl/index.html).


Problem Definition


Pattern mining is a fundamental problem in data mining. The
problem is to find all the patterns that appear frequently in the
given database. For a set $E = \{1,\dots,n\}$ of items, an itemset
(also called a pattern) is a subset of $E$. Let $\mathcal{D}$ be a
given database composed of transactions $R_1,\dots,R_m$,
$R_i \subseteq E$. For an itemset $P$, an occurrence of $P$ is a
transaction $R$ of $\mathcal{D}$ such that $P \subseteq R$, and
the occurrence set $Occ(P)$ is the set of occurrences of $P$. The
frequency of $P$, also called support, is $|Occ(P)|$ and is
denoted by $frq(P)$. For a given constant $\sigma$ called minimum
support, an itemset $P$ is frequent if $frq(P) \ge \sigma$. Given
a database and a minimum support, frequent itemset mining is the
problem of enumerating all frequent itemsets in $\mathcal{D}$.


Key Results 


The enumeration can be solved in output polynomial time [1], and
the space complexity is input polynomial [5]. Many algorithms that
are very fast in practice on real-world data have been proposed
[7, 8, 10, 11]. We present the algorithm LCM, the winner of the
competition [6], and several techniques used in LCM.


Algorithms 


Many algorithms have been proposed for this problem. Itemsets
satisfy the following monotone property, which is used in almost
all existing algorithms.


Lemma 1 For any itemsets $P$ and $Q$ such that $P \supseteq Q$,
there holds $frq(P) \le frq(Q)$. In particular,
$Occ(P) \subseteq Occ(Q)$. □

Using the monotone property, we can enumerate all frequent
itemsets from $\emptyset$ by recursively adding items.


Lemma 2 Any frequent itemset $P$ of size $k$ is generated by
adding an item to a frequent itemset of size $k - 1$. □


Apriori 

The Apriori algorithm was proposed in the first paper on frequent
pattern mining, by Agrawal et al. in 1993 [1]. Computational
resources were limited at that time, and the database could not
fit in memory and was therefore stored on an HDD. Apriori is
designed to be efficient in such environments, so it scans the
database only a few times. Apriori is a breadth-first search
algorithm that iteratively generates all frequent itemsets of size
1, size 2, and so on. Apriori generates candidate itemsets by
adding an item to each frequent itemset of size $k - 1$. From the
monotone property, any frequent itemset of size $k$ is among the
candidate itemsets. Apriori then checks the inclusion relation
between a transaction and all candidates. By doing this for all
transactions, the frequencies of candidates are computed and
infrequent candidates are removed. The algorithm is written as
follows:


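Algorithm Apriori($\mathcal{D}$, $\sigma$):


1. $\mathcal{P}_0 := \{\emptyset\}$; $k := 0$;

2. while $\mathcal{P}_k \ne \emptyset$ do

3. $\mathcal{P}_{k+1} := \emptyset$;

4. for each $P \in \mathcal{P}_k$,
   $\mathcal{P}_{k+1} := \mathcal{P}_{k+1} \cup \{P \cup \{i\} \mid i \in E, i \notin P\}$;
   $frq(P') := 0$ for each added candidate $P'$

5. for each $R \in \mathcal{D}$, $frq(P) := frq(P) + 1$ for all
   $P \in \mathcal{P}_{k+1}$, $P \subseteq R$

6. remove all $P$ from $\mathcal{P}_{k+1}$ satisfying
   $frq(P) < \sigma$

7. output all $P \in \mathcal{P}_{k+1}$; $k := k + 1$

8. end while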
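The level-wise procedure can be sketched in Python as follows. This is a minimal in-memory illustration of the pseudocode, not the original disk-oriented implementation; the function name `apriori` and the use of `frozenset` for itemsets are choices made for the example.

```python
def apriori(transactions, sigma):
    """Return a dict mapping each nonempty frequent itemset to its frequency."""
    db = [frozenset(t) for t in transactions]
    items = sorted(set().union(*db)) if db else []
    frequent = {}
    level = {frozenset()}                  # P_0 = {empty set}
    while level:
        # Step 4: extend every frequent itemset of the current size by one item.
        candidates = {p | {i} for p in level for i in items if i not in p}
        counts = dict.fromkeys(candidates, 0)
        # Step 5: one scan of the database counts all candidates.
        for r in db:
            for c in candidates:
                if c <= r:
                    counts[c] += 1
        # Step 6: drop infrequent candidates; survivors form the next level.
        level = {c for c in candidates if counts[c] >= sigma}
        for c in level:
            frequent[c] = counts[c]
    return frequent
```

With the four transactions {1,2,3}, {1,2}, {2,3}, {1,2,3} and minimum support 3, the frequent itemsets are {1}, {2}, {3}, {1,2}, and {2,3}.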


The space complexity of Apriori is $O(n\mu)$, where $\mu$ is the
number of frequent itemsets of $\mathcal{D}$. The time complexity
is $O(n \|\mathcal{D}\| \mu)$, where
$\|\mathcal{D}\| = \sum_{R \in \mathcal{D}} |R|$ is the size of
$\mathcal{D}$. Hence Apriori is output polynomial time.


Backtrack Algorithm

The Backtrack algorithm is a depth-first search-based frequent
itemset mining algorithm that was first proposed by Bayardo et al.
[5] in 1998. The amount of memory in a computer was rapidly
increasing at that time, and thus databases began to fit in
memory. Depth-first search reduces the memory space needed for
storing candidate itemsets, and thus a huge number of itemsets can
be enumerated. Moreover, a technique called down project
accelerates the computation. According to the monotone property,
$P \cup \{i\}$ is included in a transaction $R$ only if
$R \in Occ(P)$ holds. By using this, down project restricts the
checks to the transactions in $Occ(P)$. In particular,
$Occ(P \cup \{i\}) = Occ(P) \cap Occ(\{i\})$. Moreover, we can
reduce the checks for duplication by using a technique called tail
extension. We denote the maximum item in $P$ by $tail(P)$. Tail
extension generates itemsets $P \cup \{i\}$ only for items
$i > tail(P)$. In this way, any frequent itemset $P$ is generated
uniquely from another frequent itemset; thus duplications are
efficiently avoided by recursively generating itemsets with tail
extensions.


Algorithm BackTrack($P$, $Occ(P)$, $\sigma$):


1. output $P$

2. for each item $i > tail(P)$ do

3. if $|Occ(P) \cap Occ(\{i\})| \ge \sigma$, then call
   BackTrack($P \cup \{i\}$, $Occ(P) \cap Occ(\{i\})$, $\sigma$)

4. end for
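A minimal Python sketch of BackTrack with down project and tail extension follows. Occurrence sets are stored as sets of transaction indices, which is one possible representation assumed here, and the wrapper name `frequent_itemsets` is illustrative. As in the pseudocode, the initial call outputs the empty itemset.

```python
def frequent_itemsets(transactions, sigma):
    """Enumerate (itemset, frequency) pairs depth-first, each itemset once."""
    db = [frozenset(t) for t in transactions]
    # Occ({i}) for every item i, as a set of transaction indices.
    occ_item = {}
    for idx, r in enumerate(db):
        for i in r:
            occ_item.setdefault(i, set()).add(idx)
    items = sorted(occ_item)
    out = []

    def backtrack(p, occ):
        out.append((p, len(occ)))            # step 1: output P
        tail = p[-1] if p else None          # p is kept in increasing order
        for i in items:                      # step 2: tail extension
            if tail is None or i > tail:
                new_occ = occ & occ_item[i]  # down project
                if len(new_occ) >= sigma:    # step 3
                    backtrack(p + (i,), new_occ)

    backtrack((), set(range(len(db))))
    return out
```

Because every itemset is reached only through the extension by its largest item, no duplicate check against previously found itemsets is needed.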


The space complexity of BackTrack is $O(\|\mathcal{D}\|)$; thereby
BackTrack is polynomial space. The time complexity is
$O(\|\mathcal{D}\| \mu)$, since step 3 is done by marking
transactions in $Occ(P)$ and checking whether each transaction of
$Occ(\{i\})$ is marked in constant time. Moreover, since the depth
of recursion is at most $n$, the delay of BackTrack is
$O(n \|\mathcal{D}\|)$. BackTrack with down project reduces
practical computation time by an order of magnitude, as seen in
the implementation competitions FIMI03 and FIMI04 [6].


Database Reduction 
The technique of database reduction was first developed in
FP-growth by Han et al. [8] and modified in LCM by Uno et al.
[11]. Database reduction drastically reduces practical computation
time, as shown in the experiments in [6].

We observe that down project removes the unnecessary transactions
from the database given to a recursive call, where unnecessary
transactions are those never used in the recursion, and this
speeds up the computation. The idea of database reduction is to
further remove unnecessary items from the database. The
unnecessary items are (1) items $i$ satisfying $i < tail(P)$ and
(2) items $i$ such that $P \cup \{i\}$ is not frequent. Items of
(1) are never used because of the rule of tail extension. Items of
(2) are unnecessary because $P \cup \{i\} \cup \{j\}$ is not
frequent for any item $j$, by the monotone property. Thus, the
removal of these items never disturbs the enumeration. The
database obtained by removing unnecessary items from each
transaction of $Occ(P)$ is called the conditional database.

Deep in the recursion, the conditional database tends to have few
items, since $tail(P)$ is large and $frq(P)$ is small. In such
cases, several transactions would be identical. The computation
for identical transactions is the same, and thus we unify these
transactions and attach their count to the unified transaction as
a representation of its multiplicity. For example, three
transactions $R_1, R_2, R_3 = \{100, 105, 110\}$ are replaced by
$R_1 = \{100, 105, 110\}$, and a mark “three” is attached to
$R_1$. By this, the computation time on the bottom levels of the
recursion is drastically shortened when $\sigma$ is large. This is
because conditional databases usually have $k$ items in the bottom
levels, where $k$ is a small constant, and thus can have at most
$2^k$ different transactions. The obtained database is called the
reduced database and is denoted by $\mathcal{D}^*(P, \sigma)$.
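The two steps of database reduction, removing unnecessary items and unifying identical transactions, can be sketched in Python as below. The sketch assumes that $Occ(P)$ is passed as a list of (transaction, multiplicity) pairs and uses a hash-based `Counter` for the unification instead of the radix sort used in LCM; the function name `reduce_database` is illustrative.

```python
from collections import Counter


def reduce_database(occ, tail, sigma):
    """Build the reduced database from Occ(P).

    occ is a list of (items, multiplicity) pairs. Items not above `tail`
    and items i whose extension P + {i} is infrequent are removed, and
    identical transactions are merged with their multiplicities summed.
    """
    # Weighted frequency of every item that tail extension may still use.
    weight = Counter()
    for items, mult in occ:
        for i in items:
            if i > tail:
                weight[i] += mult
    keep = {i for i, w in weight.items() if w >= sigma}
    # Unify identical transactions after removing unnecessary items.
    merged = Counter()
    for items, mult in occ:
        merged[frozenset(items) & keep] += mult
    return dict(merged)
```

In the example of the text, three transactions that all reduce to {100, 105, 110} become a single entry with multiplicity 3.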

The unification of identical transactions can be done by, for
example, radix sort in $O(\|\mathcal{D}^*(P, \sigma)\|)$ time
[11]. FP-growth further reduces the database by representing it as
a trie [7, 8]. However, experiments in [6] show that the overheads
of the trie are often larger than the gain; thus in many cases
FP-growth is slower than LCM. The recursion of frequent itemset
mining spreads widely as the depth increases, a structure called
bottom expanded. In such a case, the computation time on the
bottom levels dominates the total computation time [9]; thus
database reduction performs very well.


Delivery 

Delivery [10, 11] is a technique to compute $Occ(P \cup \{i\})$
for all $i > tail(P)$ at once. Down project computes
$Occ(P \cup \{i\})$ in $O(|Occ(\{i\})|)$ time and thus takes
$O(\|\mathcal{D}\|)$ time for all $i$. Delivery computes
$Occ(P \cup \{i\})$ for all $i$ at once in $O(\|Occ(P)\|)$ time.
The idea is to find all $P \cup \{i\}$ that are included in $R$,
for each transaction $R \in Occ(P)$. Actually,
$P \cup \{i\} \subseteq R$ iff $i \in R$; thus this is done by
just scanning the items $i > tail(P)$ of $R$. The algorithm is
described as follows:


1. $Occ(P \cup \{i\}) := \emptyset$ for each $i > tail(P)$

2. for each $R \in Occ(P)$ do

3. for each item $i \in R$, $i > tail(P)$, insert $R$ to
   $Occ(P \cup \{i\})$

4. end for


By using the reduced database $\mathcal{D}^*(P, \sigma)$, delivery
is done in $O(\|\mathcal{D}^*(P, \sigma)\|)$ time. Note that the
frequency is then the sum of the multiplicities of the
transactions in the reduced database.
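Delivery on a reduced database can be sketched in Python as follows, assuming the reduced database is a dict from transactions (as `frozenset`) to multiplicities; this representation and the function name `delivery` are assumptions of the example. Each transaction is scanned once, and the frequency of each extension is accumulated from the multiplicities.

```python
def delivery(reduced_db, tail):
    """Compute Occ(P + {i}) and frq(P + {i}) for all i > tail in one scan."""
    occ = {}   # item i -> list of transactions containing i
    frq = {}   # item i -> sum of multiplicities of those transactions
    for record, mult in reduced_db.items():
        for i in record:
            if i > tail:
                occ.setdefault(i, []).append(record)
                frq[i] = frq.get(i, 0) + mult
    return occ, frq
```

The total work is proportional to the number of items stored in the reduced database, independent of how many original transactions each entry stands for.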


Generalizations and Extensions 


The frequent itemset mining problem is extended by varying
patterns and databases, such as trees in XML databases, labeled
graphs in chemical compound databases, and so on. Let
$\mathcal{L}$ be a class of structures, and $\preceq$ be a binary
relation on $\mathcal{L}$. A member of $\mathcal{L}$ is called a
pattern. Suppose that we are given a database $\mathcal{D}$
composed of records $R_1,\dots,R_m$, $R_i \in \mathcal{L}$.
Itemset mining is the case that $\mathcal{L} = 2^E$ and
$a \preceq b$ holds iff $a \subseteq b$. For a pattern
$P \in \mathcal{L}$, an occurrence of $P$ is a record
$R \in \mathcal{D}$ such that $P \preceq R$, and the other
notations are defined in the same way. Given a database and a
minimum support, frequent pattern mining is the problem of
enumerating all the frequent patterns in $\mathcal{D}$.


Frequent Pattern Mining 


When ℒ is arbitrary, frequent pattern mining
is hard. Thus, we often assume that (ℒ, ⪯) is
a lattice, and that there is an element ⊥ of ℒ such
that ⊥ ⪯ P holds for any P ∈ ℒ. We then have the
monotone property.


Lemma 3 For any P, Q ∈ ℒ satisfying P ⪯ Q,
there holds frq(P) ≥ frq(Q). □


Let suc(P) (resp., prc(P)) be the set of elements
Q ∈ ℒ \ {P} such that P ⪯ Q (resp.,
Q ⪯ P) holds and no X ∈ ℒ \ {P, Q} satisfies
P ⪯ X ⪯ Q (resp., Q ⪯ X ⪯ P). Using
the monotone property, we can enumerate all
frequent patterns from ⊥ by recursively generating
all elements of suc(P).

In this general setting, Apriori needs the
assumption that (ℒ, ⪯) is modular; thus for any
P, Q such that P ⪯ Q, the length of any
maximal chain P ≺ X₁ ≺ ··· ≺ X_k ≺ Q is identical.
By this assumption, we can define the size of a
pattern P by the length of the maximal chain
from ⊥ to P. Apriori then works by replacing
{P ∪ {i} | i ∈ E} of step 4 by suc(P).

Let T be the time to generate a pattern in
suc(P) and T’ be the time to evaluate a ⪯ b.
Note that T’ may be large, for example, in the
case that ℒ is the set of graphs and T’ is the
time for graph isomorphism. Apriori generates
|suc(P)| patterns for each pattern P, and we
have to check whether each generated pattern
is already in P_{i+1} by comparing it with each
member of P_{i+1}. Thus, the total computation
time is O(s(T + T’(u + |P|))) where s is the
maximum size of suc(P).

The depth-first search algorithm needs an
alternative for tail extension. The alternative is
given by the reverse search technique proposed by
Avis and Fukuda [4]. A pattern Q can be generated
from many patterns in prc(Q), and this
causes duplications. We avoid this by defining
the parent P(Q) as one element of prc(Q) and
allowing Q to be generated only from P(Q), so that
Q is uniquely generated. For example, as in
tail extension, we define an order on prc(Q)
and define P(Q) as the minimum one in the
order.

Algorithm Backtrack2 (P, Occ(P), σ):


1. output P

2. for each Q ∈ suc(P) do

3.    compute Occ(Q) from Occ(P)

4.    if frq(Q) ≥ σ and P = P(Q) then call
      Backtrack2 (Q, Occ(Q), σ)

5. end for


The time complexity of Backtrack2 is
O(s(T + T” + T’|D|)) where T” is the time
to compute the parent of a pattern. The heaviest
part of T’|D| is usually reduced by down project.
The algorithm will be efficient if all patterns Q
satisfying P = P(Q) are efficiently enumerated.
Such examples are sequences [12], trees [3], and
motifs with wildcards [2].


Frequent Sequence Mining 

ℒ is composed of strings over an alphabet Σ, and
a, b ∈ ℒ satisfy a ⪯ b iff a is a subsequence
of b, i.e., a is obtained from b by deleting some
letters. For a pattern P, suc(P) is the set of
strings obtained by inserting a letter into P at some
position.

We define the parent of P as the string
obtained by removing the last letter from
P. Then, the children of P are generated by
appending a letter to the tail of P. Since
⪯ can be tested in linear time, Backtrack2
runs in O(|Σ| × ||D||) time for each frequent
sequence.
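
As an illustration, the following Python sketch instantiates Backtrack2 for sequences with the parent rule just described (remove the last letter). The function names, the use of Python strings as records, and the choice to report the empty pattern first are assumptions of this sketch. Because each child is built by appending a letter, its parent is the current pattern by construction, so the test P = P(Q) is implicit.

```python
def is_subsequence(a, b):
    """Return True iff a can be obtained from b by deleting letters (linear time)."""
    it = iter(b)
    return all(ch in it for ch in a)  # each 'in' consumes b up to the match

def frequent_sequences(database, alphabet, sigma):
    """List every string that is a subsequence of at least sigma records."""
    result = []

    def backtrack(pattern, occ):
        result.append(pattern)                    # step 1: output P
        for letter in alphabet:                   # step 2: children of P
            child = pattern + letter              # parent(child) == pattern
            # step 3: compute Occ(child) from Occ(pattern) only
            child_occ = [r for r in occ if is_subsequence(child, r)]
            if len(child_occ) >= sigma:           # step 4: frequency test
                backtrack(child, child_occ)

    backtrack("", list(database))
    return result
```

For each frequent sequence the loop tries |Σ| children and scans only the occurrences of the current pattern, which is where the per-pattern bound above comes from.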


Frequent Ordered Tree Mining 
ℒ is composed of rooted trees such that each
vertex has a label and an ordering of children.
Such a tree is called a labeled ordered tree. Two
trees a, b ∈ ℒ satisfy a ⪯ b iff a is a subtree of b
with a correspondence keeping the children orders and
vertex labels; a vertex of label “A” has to be
mapped to a vertex having label “A,” and children
orders do not change. For a pattern P, suc(P)
is the set of labeled ordered trees obtained by
inserting a vertex as a leaf.

The rightmost path of an ordered tree is
{v₁, ..., v_k} where v₁ is the root, and v_i is
the last child of v_{i−1}. We define the parent of
P as the tree obtained by removing the rightmost
leaf v_k. Then, the children of P are generated by
appending a vertex with a label so that the vertex
is the last child of a vertex in the rightmost path.
Since ⪯ can be tested in linear time, Backtrack2
runs in O(t|Σ| × ||D||) time for each frequent
ordered tree, where t is the maximum height of
the pattern tree.

Problem Definition


The problem is concerned with efficiently main- 
taining information about all-pairs shortest paths 
in a dynamically changing graph. This problem 
has been investigated since the 1960s [17, 18, 20], 
and plays a crucial role in many applications, 
including network optimization and routing, traf- 
fic information systems, databases, compilers, 
garbage collection, interactive verification sys- 
tems, robotics, dataflow analysis, and document 
formatting. 

A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property 
P quickly, and perform update operations faster 
than recomputing from scratch, as carried out 
by the fastest static algorithm. An algorithm is 
said to be fully dynamic if it can handle both 
edge insertions and edge deletions. A partially 
dynamic algorithm can handle either edge 
insertions or edge deletions, but not both: it is 
incremental if it supports insertions only, and 
decremental if it supports deletions only. In this 
entry, fully dynamic algorithms for maintaining 
shortest paths on general directed graphs are 
presented. 

In the fully dynamic All Pairs Shortest Path 
(APSP) problem one wishes to maintain a directed
graph G = (V, E) with real-valued edge
weights under an intermixed sequence of the 
following operations: 


Update(x, y, w): update the weight of edge (x, y)
to the real value w; this includes as a special
case both edge insertion (if the weight is set
from +∞ to w < +∞) and edge deletion (if
the weight is set to w = +∞);

Distance(x, y): output the shortest distance
from x to y.

Path(x, y): report a shortest path from x to y, if
any.


More formally, the problem can be defined as 
follows. 


Problem 1 (Fully Dynamic All-Pairs Shortest 
Paths) 

INPUT: A weighted directed graph G = (V, E),
and a sequence σ of operations as defined above.
OUTPUT: A matrix D such that entry D[x, y] stores
the distance from vertex x to vertex y throughout
the sequence σ of operations.
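
To make the interface concrete, here is a deliberately naive baseline in Python that recomputes the whole matrix D with the static Floyd-Warshall algorithm after every update. It is not the algorithm discussed in this entry; it only shows the three operations and the "recompute from scratch" cost that a dynamic algorithm is expected to beat. The class name, the integer vertex labels 0..n-1, and the assumption of non-negative weights are choices of this sketch.

```python
import math

class NaiveDynamicAPSP:
    """Baseline: rebuild all distances in Theta(n^3) time after each update."""

    def __init__(self, n):
        self.n = n
        # w[x][y] == +inf encodes "edge (x, y) is absent"
        self.w = [[math.inf] * n for _ in range(n)]
        self._recompute()

    def update(self, x, y, weight):
        """Set the weight of edge (x, y); +inf deletes the edge."""
        self.w[x][y] = weight
        self._recompute()

    def _recompute(self):
        n = self.n
        d = [row[:] for row in self.w]
        # nxt[i][j] is the vertex that follows i on a shortest i-j path
        nxt = [[j if d[i][j] < math.inf else None for j in range(n)]
               for i in range(n)]
        for i in range(n):
            d[i][i] = 0.0          # assumes non-negative weights
            nxt[i][i] = i
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
                        nxt[i][j] = nxt[i][k]
        self.d, self.nxt = d, nxt

    def distance(self, x, y):
        return self.d[x][y]

    def path(self, x, y):
        """Return a shortest path as a vertex list, or None if y is unreachable."""
        if self.nxt[x][y] is None:
            return None
        p = [x]
        while x != y:
            x = self.nxt[x][y]
            p.append(x)
        return p
```

Queries take constant time (plus the path length for Path), while each update costs cubic time, which is the gap the dynamic algorithms of this entry address.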


Throughout this entry, m and n denote, respectively,
the number of edges and vertices in G.

Demetrescu and Italiano [3] proposed a new 
approach to dynamic path problems based on 
maintaining classes of paths characterized by 
local properties, i.e., properties that hold for all 
proper subpaths, even if they may not hold for the 
entire paths. They showed that this approach can 
play a crucial role in the dynamic maintenance of 
shortest paths. 


Key Results 


Theorem 1 The fully dynamic shortest path
problem can be solved in O(n² log³ n) amortized
time per update during any intermixed sequence
of operations. The space required is O(mn).


Using the same approach, Thorup [22] has shown
how to slightly improve the running times:


Theorem 2 The fully dynamic shortest path
problem can be solved in O(n²(log n + log²(m/n)))
amortized time per update during
any intermixed sequence of operations. The space
required is O(mn).


Applications 


Dynamic shortest paths find applications in many 
areas, including network optimization and rout- 
ing, transportation networks, traffic information 
systems, databases, compilers, garbage collec- 
tion, interactive verification systems, robotics, 
dataflow analysis, and document formatting. 


Open Problems 


The recent work on dynamic shortest paths has 
raised some new and perhaps intriguing ques- 
tions. First, can one reduce the space usage for 
dynamic shortest paths to O(n²)? Second, and
perhaps more importantly, can one solve effi- 
ciently fully dynamic single-source reachabil- 
ity and shortest paths on general graphs? Fi- 
nally, are there any general techniques for making 
increase-only algorithms fully dynamic? Similar 
techniques have been widely exploited in the 
case of fully dynamic algorithms on undirected 
graphs [11-13]. 


Experimental Results 
A thorough empirical study of the algorithms 
described in this entry is carried out in [4]. 


Data Sets 


Data sets are described in [4]. 

Problem Definition


Design a data structure for an undirected graph 
with a fixed set of nodes which can process 
queries of the form “Are nodes i and j connected?”
and updates of the form “Insert edge {i, j}”;
“Delete edge {i, j}”. The goal is to minimize
update and query times, over the worst-
case sequence of queries and updates. Algorithms 
to solve this problem are called “fully dynamic” 
as opposed to “partially dynamic” since both 
insertions and deletions are allowed. 


Key Results 


Holm et al. [4] gave the first deterministic fully
dynamic graph algorithm for maintaining connectivity
in an undirected graph with polylogarithmic
amortized time per operation, specifically,
O(log² n) amortized cost per update operation
and O(log n / log log n) worst-case per
query, where n is the number of nodes. The
basic technique is extended to maintain minimum
spanning trees in O(log⁴ n) amortized cost per
update operation and 2-edge connectivity and
biconnectivity in O(log⁵ n) amortized time per
operation.

The algorithm relies on a simple novel tech- 
nique for maintaining a spanning forest in a graph 
which enables efficient search for a replacement 
edge when a tree edge is deleted. This technique 
ensures that each nontree edge is examined no 
more than log₂ n times. The algorithm relies on
previously known tree data structures, such as 
top trees or ET-trees to store and quickly retrieve 
information about the spanning trees and the 
nontree edges incident to them. 

Algorithms to achieve a query time
O(log n / log log log n) and expected amortized
update time O(log n (log log n)³) for connectivity
and O(log³ n log log n) expected amortized
update time for 2-edge and biconnectivity were
given in [6]. Lower bounds showing a continuum
of tradeoffs for connectivity between query and
update times in the cell probe model which match
the known upper bounds were proved in [5].
Specifically, if t_u and t_q are the amortized update
and query time, respectively, then t_q · log(t_u/t_q) =
Ω(log n) and t_u · log(t_q/t_u) = Ω(log n).

A previously known, somewhat different, randomized
method for computing dynamic connectivity
with O(log³ n) amortized expected update
time can be found in [2], improved to O(log² n)
in [3]. A method which minimizes worst-case
rather than amortized update time is given in [1]:
O(√n) time per update for connectivity as well
as 2-edge connectivity and bipartiteness.


Open Problems 


Can the worst-case update time be reduced to
o(n^{1/2}), with polylogarithmic query time?

Can the lower bounds on the trade-offs in [6] 
be matched for all possible query costs? 


Applications 


Dynamic connectivity has been used as a subrou- 
tine for several static graph algorithms, such as 
the maximum flow problem in a static graph [7], 
and for speeding up numerical studies of the Potts 
spin model. 


URL to Code 


See http://www.mpi-sb.mpg.de/LEDA/friends/dyngraph.html
for software which implements
the algorithm in [2] and other older methods.

Problem Definition


The problem is concerned with efficiently main- 
taining information about connectivity in a dy- 
namically changing graph. A dynamic graph al- 
gorithm maintains a given property P on a graph 
subject to dynamic changes, such as edge in- 
sertions, edge deletions and edge weight up- 
dates. A dynamic graph algorithm should pro- 
cess queries on property P quickly, and perform 
update operations faster than recomputing from 
scratch, as carried out by the fastest static algo- 
rithm. An algorithm is said to be fully dynamic if 
it can handle both edge insertions and edge dele- 
tions. A partially dynamic algorithm can handle 
either edge insertions or edge deletions, but not 
both: it is incremental if it supports insertions 
only, and decremental if it supports deletions 
only. 

In the fully dynamic connectivity problem, 
one wishes to maintain an undirected graph G = 
(V, E) under an intermixed sequence of the fol- 
lowing operations: 


Connected(u, v): Return true if vertices u and v 
are in the same connected component of the 
graph. Return false otherwise. 

Insert(x, y): Insert a new edge between the two 
vertices x and y. 

Delete(x, y): Delete the edge between the two 
vertices x and y. 


Key Results 


In this section, a high level description of the al- 
gorithm for the fully dynamic connectivity prob- 
lem in undirected graphs described in [11] is 
presented: the algorithm, due to Holm, de Licht- 
enberg and Thorup, answers connectivity queries 
in O(logn/loglogn) worst-case running time 
while supporting edge insertions and deletions in 
O(log² n) amortized time.

The algorithm maintains a spanning forest F 
of the dynamically changing graph G. Edges in F 
are referred to as tree edges. Let e be a tree 
edge of forest F, and let T be the tree of F 
containing it. When e is deleted, the two trees
T₁ and T₂ obtained from T after the deletion
of e can be reconnected if and only if there is
a non-tree edge in G with one endpoint in T₁
and the other endpoint in T₂. Such an edge is
called a replacement edge for e. In other words, if
there is a replacement edge for e, T is reconnected
via this replacement edge; otherwise, the deletion
of e creates a new connected component in G.
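
The notion of a replacement edge can be illustrated with a small Python sketch that ignores edge levels altogether and simply scans all non-tree edges. The function names and the data layout (a dictionary of neighbour sets for F and a set of frozensets for the non-tree edges of a simple graph) are assumptions made here for illustration; the scan is linear in the number of non-tree edges and is not the efficient search described in this entry.

```python
from collections import deque

def component(forest_adj, start):
    """Vertices reachable from start using tree edges only."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in forest_adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def delete_tree_edge(forest_adj, nontree_edges, u, v):
    """Delete tree edge (u, v) and reconnect with a replacement edge if any.

    forest_adj:    dict vertex -> set of neighbours in the spanning forest F.
    nontree_edges: set of frozenset({x, y}) for the edges of G not in F.
    Returns the replacement edge (x, y), or None if the tree stays split.
    """
    forest_adj[u].discard(v)
    forest_adj[v].discard(u)
    t1 = component(forest_adj, u)          # T1; everything else in T is T2
    for edge in nontree_edges:
        x, y = tuple(edge)
        if (x in t1) != (y in t1):         # exactly one endpoint in T1
            break
    else:
        return None                        # a new connected component appears
    nontree_edges.remove(edge)             # the edge becomes a tree edge
    forest_adj[x].add(y)
    forest_adj[y].add(x)
    return (x, y)
```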

To accommodate systematic search for replacement
edges, the algorithm associates to each
edge e a level ℓ(e) and, based on edge levels,
maintains a set of sub-forests of the spanning
forest F: for each level i, forest F_i is the sub-forest
induced by tree edges of level ≥ i. Denoting by L
the maximum edge level, it follows that:


F = F₀ ⊇ F₁ ⊇ F₂ ⊇ ··· ⊇ F_L.


Initially, all edges have level 0; levels are then 
progressively increased, but never decreased. The 
changes of edge levels are accomplished so as 
to maintain the following invariants, which ob- 
viously hold at the beginning. 


Invariant (1): F is a maximum spanning forest 
of G if edge levels are interpreted as weights. 

Invariant (2): The number of nodes in each tree
of F_i is at most n/2^i.


Invariant (1) should be interpreted as follows.
Let (u, v) be a non-tree edge of level ℓ(u, v) and
let u···v be the unique path between u and v
in F (such a path exists since F is a spanning
forest of G). Let e be any edge in u···v and
let ℓ(e) be its level. Due to (1), ℓ(e) ≥ ℓ(u, v).
Since this holds for each edge in the path, and
by construction F_{ℓ(u,v)} contains all the tree edges
of level ≥ ℓ(u, v), the entire path is contained in
F_{ℓ(u,v)}, i.e., u and v are connected in F_{ℓ(u,v)}.

Invariant (2) implies that the maximum number
of levels is L ≤ ⌊log₂ n⌋.

Note that when a new edge is inserted, it is
given level 0. Its level can then be increased at
most ⌊log₂ n⌋ times as a consequence of edge
deletions. When a tree edge e = (v, w) of level
ℓ(e) is deleted, the algorithm looks for a replacement
edge at the highest possible level, if any.
Due to invariant (1), such a replacement edge has
level ℓ ≤ ℓ(e). Hence, a replacement subroutine
Replace((v, w), ℓ(e)) is called with parameters
e and ℓ(e). The operations performed by this
subroutine are now sketched.


Replace((v, w), ℓ) finds a replacement edge of
the highest level ≤ ℓ, if any. If such a replacement
does not exist in level ℓ, there are two
cases: if ℓ > 0, the algorithm recurses on level
ℓ − 1; otherwise, ℓ = 0, and the deletion of (v,
w) disconnects v and w in G.


During the search at level ℓ, suitably chosen tree
and non-tree edges may be promoted to higher
levels as follows. Let T_v and T_w be the trees of
forest F_ℓ obtained after deleting (v, w) and let,
w.l.o.g., T_v be smaller than T_w. Then T_v contains
at most n/2^{ℓ+1} vertices, since T_v ∪ T_w ∪ {(v, w)}
was a tree at level ℓ and due to invariant (2). Thus,
edges in T_v of level ℓ can be promoted to level ℓ +
1 while maintaining the invariants. Non-tree edges
of level ℓ incident to T_v are finally visited one by
one: if an edge does connect T_v and T_w, a replacement
edge has been found and the search stops, otherwise its
level is increased by 1.
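
The level mechanism above can be sketched in Python as follows. This is a simplified illustration, not the data structure of [11]: trees are explored with plain breadth-first search instead of ET-trees, so the running times are far from the stated bounds, and only the promotion and replacement logic is modelled. The class name, the method names, and the integer vertex labels 0..n-1 are hypothetical choices of this sketch.

```python
from collections import deque

class LevelledConnectivity:
    """Level-based replacement search, using BFS in place of ET-trees."""

    def __init__(self, n):
        self.adj = [set() for _ in range(n)]  # all edges of G
        self.level = {}                       # frozenset({u, v}) -> level
        self.tree = set()                     # edges of the spanning forest F

    def _component(self, start, lev):
        """Vertices of the tree of F_lev that contains start."""
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in self.adj[u]:
                e = frozenset((u, v))
                if v not in seen and e in self.tree and self.level[e] >= lev:
                    seen.add(v)
                    queue.append(v)
        return seen

    def connected(self, u, v):
        return v in self._component(u, 0)

    def insert(self, u, v):
        e = frozenset((u, v))
        if u == v or e in self.level:
            return                            # ignore loops and duplicates
        if not self.connected(u, v):
            self.tree.add(e)                  # joins two trees of F
        self.level[e] = 0                     # new edges start at level 0
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        e = frozenset((u, v))
        lev = self.level.pop(e)               # KeyError if the edge is absent
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if e in self.tree:
            self.tree.remove(e)
            for cur in range(lev, -1, -1):    # highest level first
                if self._replace(u, v, cur):
                    break

    def _replace(self, u, v, lev):
        tu, tv = self._component(u, lev), self._component(v, lev)
        small = tu if len(tu) <= len(tv) else tv
        for x in small:                       # promote level-lev tree edges
            for y in self.adj[x]:
                f = frozenset((x, y))
                if f in self.tree and self.level[f] == lev:
                    self.level[f] = lev + 1
        for x in small:                       # visit level-lev non-tree edges
            for y in self.adj[x]:
                f = frozenset((x, y))
                if f in self.tree or self.level[f] != lev:
                    continue
                if y not in small:
                    self.tree.add(f)          # replacement edge found
                    return True
                self.level[f] = lev + 1       # both endpoints inside: promote
        return False
```

Choosing the smaller of the two trees is what keeps invariant (2) valid after the promotions, and it is the reason an edge level never exceeds ⌊log₂ n⌋.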

Trees of each forest are maintained so that the 
basic operations needed to implement edge inser- 
tions and deletions can be supported in O(log n) 
time. There are a few variants of basic data structures
that can accomplish this task, and one could
use Euler Tour trees (ET-trees for short), first
introduced in [17], for this purpose.

In addition to inserting and deleting edges 
from a forest, ET-trees must also support opera- 
tions such as finding the tree of a forest that con- 
tains a given vertex, computing the size of a tree, 
and, more importantly, finding tree edges of level
ℓ in T_v and non-tree edges of level ℓ incident to
T_v. This can be done by augmenting the ET-trees
with a constant amount of information per node: 
the interested reader is referred to [11] for details. 

Using an amortization argument based on
level changes, the claimed O(log² n) bound on
the update time can be proved. Namely, inserting
an edge costs O(log n), as does increasing its
level. Since the latter can happen O(log n) times, the
total amortized insertion cost, inclusive of level
increases, is O(log² n). With respect to edge
deletions, cutting and linking O(log n) forests
has a total cost of O(log² n); moreover, there are
O(log n) recursive calls to Replace, each of
cost O(log n) plus the cost amortized over level
increases. The ET-trees over F₀ = F allow the
algorithm to answer connectivity queries in O(log n)
worst-case time. As shown in [11], this can be reduced
to O(log n / log log n) by using an O(log n)-ary
version of ET-trees.


Theorem 1 A dynamic graph G with n vertices
can be maintained upon insertions and deletions
of edges using O(log² n) amortized time per
update and answering connectivity queries in
O(log n / log log n) worst-case running time.


Later on, Thorup [18] gave another data structure 
which achieves slightly different time bounds: 


Theorem 2 A dynamic graph G with n vertices
can be maintained upon insertions and deletions
of edges using O(log n · (log log n)³) amortized
time per update and answering connectivity
queries in O(log n / log log log n) time.


The bounds given in Theorems 1 and 2 are not
directly comparable, because each sacrifices the 
running time of one operation (either query or 
update) in order to improve the other. 

The best known lower bound for the dynamic 
connectivity problem holds in the bit-probe 
model of computation and is due to Pătraşcu
and Tarniţă [16]. The bit-probe model is an
instantiation of the cell-probe model with one- 
bit cells. In this model, memory is organized in 
cells, and the algorithms may read or write a cell 
in constant time. The number of cell probes is 
taken as the measure of complexity. For formal 
definitions of this model, the interested reader is 
referred to [13]. 


Theorem 3 Consider a bit-probe implementation
for dynamic connectivity, in which
updates take expected amortized time t_u, and
queries take expected time t_q. Then, in the
average case of an input distribution, t_u =
Ω(log² n / log²(t_u + t_q)). In particular

    max{t_u, t_q} = Ω((log n / log log n)²).


In the bit-probe model, the best upper bound
per operation is given by the algorithm of Theorem 2,
namely it is O(log² n / log log log n).
Consequently, the gap between upper and lower
bound appears to be limited essentially to doubly
logarithmic factors only.


Applications 


Dynamic graph connectivity appears as a basic 
subproblem of many other important problems, 
such as the dynamic maintenance of minimum 
spanning trees and dynamic edge and vertex 
connectivity problems. Furthermore, there
are several applications of dynamic graph 
connectivity in other disciplines, ranging from 
Computational Biology, where dynamic graph 
connectivity proved to be useful for the dynamic 
maintenance of protein molecular surfaces as the 
molecules undergo conformational changes [6], 
to Image Processing, when one is interested 
in maintaining the connected components of 
a bitmap image [3]. 


Open Problems 


The work on dynamic connectivity raises some 
open and perhaps intriguing questions. The first
natural open problem is whether the gap between 
upper and lower bounds can be closed. Note 
that the lower bound of Theorem 3 seems to 
imply that different trade-offs between queries 
and updates could be possible: can we design 
a data structure with o(log n) time per update
and O(poly(log n)) per query? This would be
particularly interesting in applications where the
total number of queries is substantially larger 
than the number of updates. 

Finally, is it possible to design an algorithm 
with matching O(log n) update and query bounds
for general graphs? Note that this is possible in 
the special case of plane graphs [5]. 

Experimental Results


A thorough empirical study of dynamic connec- 
tivity algorithms has been carried out in [1, 12]. 


Data Sets 


Data sets are described in [1, 12]. 

Problem Definition


The problem is concerned with efficiently main- 
taining information about edge and vertex con- 
nectivity in a dynamically changing graph. Be- 
fore defining formally the problems, a few pre- 
liminary definitions follow. 

Given an undirected graph G = (V, E), and
an integer k ≥ 2, a pair of vertices (u, v) is said
to be k-edge-connected if the removal of any
(k − 1) edges in G leaves u and v connected.
It is not difficult to see that this is an equivalence
relation: the vertices of a graph G are
partitioned by this relation into equivalence
classes called k-edge-connected components. G
is said to be k-edge-connected if the removal of
any (k − 1) edges leaves G connected. As a result
of these definitions, G is k-edge-connected if
and only if any two vertices of G are k-edge-connected.
An edge set E’ ⊆ E is an edge-cut for
vertices x and y if the removal of all the edges in
E’ disconnects G into two graphs, one containing
x and the other containing y. An edge set E’ ⊆ E
is an edge-cut for G if the removal of all the edges
in E’ disconnects G into two graphs. An edge-cut
E’ for G (for x and y, respectively) is minimal if
removing any edge from E’ reconnects G (for x
and y, respectively). The cardinality of an edge-cut
E’, denoted by |E’|, is given by the number
of edges in E’. An edge-cut E’ for G (for x and y,
respectively) is said to be a minimum cardinality
edge-cut or in short a connectivity edge-cut if
there is no other edge-cut E” for G (for x and
y, respectively) such that |E”| < |E’|. Connectivity
edge-cuts are of course minimal edge-cuts.
Note that G is k-edge-connected if and only if
a connectivity edge-cut for G contains at least k
edges, and vertices x and y are k-edge-connected
if and only if a connectivity edge-cut for x and y
contains at least k edges. A connectivity edge-cut
of cardinality 1 is called a bridge.

The following theorem due to Ford and
Fulkerson, and Elias, Feinstein and Shannon
(see [7]) gives another characterization of k-edge
connectivity.


Theorem 1 (Ford and Fulkerson, Elias, Fe- 
instein and Shannon) Given a graph G and 
two vertices x and y in G, x and y are k-edge- 
connected if and only if there are at least k edge- 
disjoint paths between x and y. 
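
Theorem 1 suggests a direct static test for whether two vertices are k-edge-connected: count edge-disjoint paths with a unit-capacity maximum flow. The Python sketch below does this with breadth-first augmenting paths; it is a static check offered for illustration, not one of the dynamic algorithms of this entry. The function names, the adjacency-matrix representation, and the integer vertex labels 0..n-1 (with x ≠ y) are assumptions of the sketch.

```python
from collections import deque

def edge_disjoint_paths(n, edges, x, y):
    """Maximum number of edge-disjoint x-y paths in an undirected graph."""
    # Each undirected edge gives residual capacity 1 in both directions.
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        parent = [-1] * n
        parent[x] = x
        queue = deque([x])
        while queue and parent[y] == -1:      # BFS in the residual graph
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[y] == -1:
            return flow                       # no augmenting path is left
        v = y
        while v != x:                         # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def k_edge_connected(n, edges, x, y, k):
    """True iff x and y are joined by at least k edge-disjoint paths."""
    return edge_disjoint_paths(n, edges, x, y) >= k
```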


In a similar fashion, a vertex set V’ ⊆ V − {x, y}
is said to be a vertex-cut for vertices x and y if
the removal of all the vertices in V’ disconnects x
and y. V’ ⊆ V is a vertex-cut for G if the
removal of all the vertices in V’ disconnects G.

The cardinality of a vertex-cut V’, denoted by 
|V’|, is given by the number of vertices in V’. 
A vertex-cut V’ for x and y is said to be a min- 
imum cardinality vertex-cut or in short a con- 
nectivity vertex-cut if there is no other vertex- 
cut V” for x and y such that |V”| < |V’|. Then 
x and y are k-vertex-connected if and only if 
a connectivity vertex-cut for x and y contains 
at least k vertices. A graph G is said to be k- 
vertex-connected if all its pairs of vertices are 
k-vertex-connected. A connectivity vertex-cut of 
cardinality 1 is called an articulation point, while 
a connectivity vertex-cut of cardinality 2 is called 
a separation pair. Note that for vertex connec- 
tivity it is no longer true that the removal of 
a connectivity vertex-cut splits G into two sets of 
vertices. 

The following theorem due to Menger 
(see [7]) gives another characterization of k- 
vertex connectivity. 


Theorem 2 (Menger) Given a graph G and 
two vertices x and y in G, x and y are k-vertex- 
connected if and only if there are at least k vertex- 
disjoint paths between x and y. 


A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property 
P quickly, and perform update operations faster 
than recomputing from scratch, as carried out by 
the fastest static algorithm. An algorithm is fully
dynamic if it can handle both edge insertions and 
edge deletions. A partially dynamic algorithm 
can handle either edge insertions or edge 
deletions, but not both: it is incremental if it 
supports insertions only, and decremental if it 
supports deletions only. 

In the fully dynamic k-edge connectivity prob- 
lem one wishes to maintain an undirected graph 
G = (V, E) under an intermixed sequence of the 
following operations: 


• k-EdgeConnected(u, v): Return true if vertices
u and v are in the same k-edge-connected
component. Return false otherwise.

• Insert(x, y): Insert a new edge between the two
vertices x and y.

• Delete(x, y): Delete the edge between the two
vertices x and y.


In the fully dynamic k-vertex connectivity
problem one wishes to maintain an undirected
graph G = (V, E) under an intermixed sequence
of the following operations:


• k-VertexConnected(u, v): Return true if vertices
u and v are k-vertex-connected. Return
false otherwise.

• Insert(x, y): Insert a new edge between the two
vertices x and y.

• Delete(x, y): Delete the edge between the two
vertices x and y.


Key Results 


To the best knowledge of the author, the most 
efficient fully dynamic algorithms for k-edge and 
k-vertex connectivity were proposed in [3, 12]. 
Their running times are characterized by the 
following theorems. 


Theorem 3 The fully dynamic k-edge connectiv- 
ity problem can be solved in: 


1. O(log⁴ n) time per update and O(log³ n) time
per query, for k = 2

2. O(n^{2/3}) time per update and query, for k = 3

3. O(nα(n)) time per update and query, for
k = 4

4. O(n log n) time per update and query, for
k ≥ 5.


Theorem 4 The fully dynamic k-vertex connec- 
tivity problem can be solved in: 


1. O(log⁴ n) time per update and O(log³ n) time
per query, for k = 2

2. O(n) time per update and query, for k = 3

3. O(nα(n)) time per update and query, for
k = 4.


Applications 


Vertex and edge connectivity problems arise of- 
ten in issues related to network reliability and 
survivability. In computer networks, the vertex 
connectivity of the underlying graph is related 
to the smallest number of nodes that might fail 
before disconnecting the whole network. Sim- 
ilarly, the edge connectivity is related to the 
smallest number of links that might fail before 
disconnecting the entire network. Analogously, if 
two nodes are k-vertex-connected then they can 
remain connected even after the failure of up 
to (k — 1) other nodes, and if they are k-edge- 
connected then they can survive the failure of up 
to (k — 1) links. It is important to investigate the 
dynamic versions of those problems in contexts 
where the networks are dynamically evolving, 
say, when links may go up and down because of 
failures and repairs. 


Open Problems 


The work of Eppstein et al. [3] and Holm
et al. [12] raises some intriguing questions. First,
while efficient dynamic algorithms for k-edge
connectivity are known for general k, no efficient
fully dynamic k-vertex connectivity algorithm is
known for k ≥ 5. To the best of the author’s knowledge,
in this case not even a static algorithm is known.
Second, fully dynamic 2-edge and 2-vertex
connectivity can be solved in polylogarithmic
time per update, while the best known update
bounds for higher edge and vertex connectivity
are polynomial: can this gap be reduced, i.e., can
one design polylogarithmic algorithms for fully
dynamic 3-edge and 3-vertex connectivity?

Problem Definition


In this entry, the problem of maintaining a dy- 
namic planar graph subject to edge insertions 
and edge deletions that preserve planarity but 
that can change the embedding is considered. 
In particular, one is concerned with efficiently
maintaining information about edge and vertex
connectivity in such a dynamically changing planar
graph. The algorithms to solve this problem must
handle insertions that keep the graph planar without
regard to any particular embedding of the graph. The
interested reader is referred to the chapter » Fully
Dynamic Planarity Testing of this encyclopedia
to learn how to check efficiently
whether a graph subject to edge insertions and
deletions remains planar (without regard to any
particular embedding).

Before defining formally the problems consid- 
ered here, a few preliminary definitions follow. 

Given an undirected graph G = (V, E), and 
an integer k > 2, a pair of vertices (u,v) is said 
to be k-edge-connected if the removal of any 
(k —1) edges in G leaves u and v connected. 
It is not difficult to see that this is an equiva- 
lence relationship: the vertices of a graph G are 
partitioned by this relationship into equivalence 
classes called k-edge-connected components. G
is said to be k-edge-connected if the removal of
any (k − 1) edges leaves G connected. As a result
of these definitions, G is k-edge-connected if
and only if any two vertices of G are k-edge-connected.
An edge set E’ ⊆ E is an edge-cut for
vertices x and y if the removal of all the edges in
E’ disconnects G into two graphs, one containing
x and the other containing y. An edge set E’ ⊆ E
is an edge-cut for G if the removal of all the edges
in E’ disconnects G into two graphs. An edge-cut
E’ for G (for x and y, respectively) is minimal if
removing any edge from E’ reconnects G (x
and y, respectively). The cardinality of an edge-cut
E’, denoted by |E’|, is given by the number
of edges in E’. An edge-cut E’ for G (for x and y,
respectively) is said to be a minimum cardinality
edge-cut or in short a connectivity edge-cut if
there is no other edge-cut E” for G (for x and
y, respectively) such that |E”| < |E’|. Connectivity
edge-cuts are of course minimal edge-cuts.
Note that G is k-edge-connected if and only if
a connectivity edge-cut for G contains at least k
edges, and vertices x and y are k-edge-connected
if and only if a connectivity edge-cut for x and y
contains at least k edges. A connectivity edge-cut
of cardinality 1 is called a bridge.
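
To make these definitions concrete, the following sketch computes the bridges of a small static graph with the standard depth-first search low-link method. It is only an illustration of the definition and is not one of the dynamic algorithms discussed in this entry; the function name `find_bridges` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict

def find_bridges(n, edges):
    """Return the bridges of an undirected graph as sorted vertex pairs.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v) pairs; parallel edges are allowed
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc = [-1] * n      # discovery time of each vertex, -1 if unvisited
    low = [0] * n        # lowest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    for root in range(n):
        if disc[root] != -1:
            continue
        disc[root] = low[root] = timer
        timer += 1
        # Iterative DFS; each frame is (vertex, edge used to enter it, iterator).
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            u, parent_edge, it = stack[-1]
            advanced = False
            for v, idx in it:
                if idx == parent_edge:
                    continue                  # do not reuse the entering edge
                if disc[v] == -1:
                    disc[v] = low[v] = timer
                    timer += 1
                    stack.append((v, idx, iter(adj[v])))
                    advanced = True
                    break
                low[u] = min(low[u], disc[v])  # back edge
            if not advanced:
                stack.pop()
                if stack:
                    p = stack[-1][0]
                    low[p] = min(low[p], low[u])
                    if low[u] > disc[p]:
                        # No edge from u's subtree climbs above p: a bridge.
                        a, b = edges[parent_edge]
                        bridges.add((min(a, b), max(a, b)))
    return bridges
```

For a triangle with one pendant edge, only the pendant edge is reported; two parallel edges between the same pair of vertices are never bridges, since removing one of them leaves the other.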

In a similar fashion, a vertex set V’ ⊆ V − {x, y}
is said to be a vertex-cut for vertices x and y if
the removal of all the vertices in V’ disconnects
x and y. V’ ⊆ V is a vertex-cut for G if
the removal of all the vertices in V’ disconnects
G.

The cardinality of a vertex-cut V’, denoted by 
|V’|, is given by the number of vertices in V’. 
A vertex-cut V’ for x and y is said to be a mini- 
mum cardinality vertex-cut or in short a connec- 
tivity vertex-cut if there is no other vertex-cut V” 
for x and y such that |V”| < |V’|. Then x and y are 
k-vertex-connected if and only if a connectivity 
vertex-cut for x and y contains at least k vertices. 
A graph G is said to be k-vertex-connected if 
all its pairs of vertices are k-vertex-connected. 
A connectivity vertex-cut of cardinality 1 is 
called an articulation point, while a connectivity 
vertex-cut of cardinality 2 is called a separation 
pair. Note that for vertex connectivity it is no 
longer true that the removal of a connectivity 
vertex-cut splits G into two sets of vertices. 

A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property P 
quickly, and perform update operations faster 
than recomputing from scratch, as carried out by 
the fastest static algorithm. An algorithm is fully 
dynamic if it can handle both edge insertions and 
edge deletions. A partially dynamic algorithm 
can handle either edge insertions or edge 
deletions, but not both: it is incremental if it 
supports insertions only, and decremental if it 
supports deletions only. 

In the fully dynamic k-edge connectivity prob- 
lem for a planar graph one wishes to maintain 
an undirected planar graph G = (V, E) under 
an intermixed sequence of edge insertions, edge 
deletions and queries about the k-edge connectivity
of the underlying planar graph. Similarly, in
the fully dynamic k-vertex connectivity problem 
for a planar graph one wishes to maintain an 
undirected planar graph G = (V, E) under an in- 
termixed sequence of edge insertions, edge dele- 
tions and queries about the k-vertex connectivity 
of the underlying planar graph. 


Key Results 


The algorithms in [2, 3] solve efficiently the 
above problems for small values of k: 


Theorem 1 One can maintain a planar graph, 
subject to insertions and deletions that preserve 
planarity, and allow queries that test the 2-edge 
connectivity of the graph, or test whether two 
vertices belong to the same 2-edge-connected 
component, in O(log n) amortized time per insertion
or query, and O(log^2 n) per deletion.


Theorem 2 One can maintain a planar graph, 
subject to insertions and deletions that preserve 
planarity, and allow testing of the 3-edge and 4-edge
connectivity of the graph in O(n^{1/2}) time
per update, or testing of whether two vertices
are 3- or 4-edge-connected, in O(n^{1/2}) time per
update or query.


Theorem 3 One can maintain a planar graph, 
subject to insertions and deletions that preserve 
planarity, and allow queries that test the 3-vertex 
connectivity of the graph, or test whether two 
vertices belong to the same 3-vertex-connected 
component, in O(n^{1/2}) amortized time per update
or query.


Note that these theorems improve on the bounds
known for the same problems on general graphs,
reported in the entry Fully Dynamic Higher
Connectivity.


Applications 


The interested reader is referred to the entry
Fully Dynamic Higher Connectivity for applications
of dynamic edge and vertex connectivity.
The case of planar graphs is especially important,
as these graphs arise frequently in applications.


Open Problems 


A number of problems related to the work of 
Eppstein et al. [2, 3] remain open. First, can 
the running times per operation be improved? 
Second, as in the case of general graphs, also 
for planar graphs fully dynamic 2-edge connec- 
tivity can be solved in polylogarithmic time per 
update, while the best known update bounds for 
higher edge and vertex connectivity are polynomial:
can this gap be reduced, i.e., can one
design polylogarithmic algorithms at least for
fully dynamic 3-edge and 3-vertex connectivity?
Third, in the special case of planar graphs can 
one solve fully dynamic k-vertex connectivity for 
general k? 
Problem Definition


Let G = (V,E) be an undirected weighted 
graph. The problem considered here is concerned 
with maintaining efficiently information about 
a minimum spanning tree of G (or minimum 
spanning forest if G is not connected), when G 
is subject to dynamic changes, such as edge in- 
sertions, edge deletions and edge weight updates. 
The dynamic algorithm is expected to perform
update operations faster than recomputing
the entire minimum spanning tree from scratch.

Throughout, an algorithm is said to be fully 
dynamic if it can handle both edge insertions and 
edge deletions. A partially dynamic algorithm 
can handle either edge insertions or edge dele- 
tions, but not both: it is incremental if it supports 
insertions only, and decremental if it supports 
deletions only. 


Key Results 


The dynamic minimum spanning forest algo- 
rithm presented in this section builds upon the 
dynamic connectivity algorithm described in the 
entry Fully Dynamic Connectivity. In particular,
a few simple changes to that algorithm are
sufficient to maintain a minimum spanning forest 
of a weighted undirected graph upon deletions 
of edges [13]. A general reduction from [11] 
can then be applied to make the deletions-only 
algorithm fully dynamic. 

This section starts by describing a decremen- 
tal algorithm for maintaining a minimum span- 
ning forest under deletions only. Throughout the 
sequence of deletions, the algorithm maintains 
a minimum spanning forest F of the dynamically 
changing graph G. The edges in F are referred 
to as tree edges and the other edges (in G — F) 
are referred to as non-tree edges. Let e be an 
edge being deleted. If e is a non-tree edge, then 
the minimum spanning forest does not need to 
change, so the interesting case is when e is a tree 
edge of forest F. Let T be the tree of F containing 
e. In this case, the deletion of e disconnects the 
tree T into two trees T_1 and T_2: to update the
minimum spanning forest, one has to look for
the minimum weight edge having one endpoint
in T_1 and the other endpoint in T_2. Such an edge
is called a replacement edge for e. 
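
The notion of a replacement edge can be illustrated with a brute-force search, sketched below under the assumption that edges are given as (u, v, weight) triples; the function name `find_replacement` is hypothetical. The sketch scans every non-tree edge after each deletion, which is exactly the cost that the level structure described next is designed to avoid.

```python
def find_replacement(tree_edges, non_tree_edges, deleted):
    """Brute-force search for a replacement edge.

    tree_edges     -- set of (u, v, w) triples forming the forest F
    non_tree_edges -- set of (u, v, w) triples in G - F
    deleted        -- the tree edge (u, v, w) being removed
    Returns the lightest non-tree edge joining T_1 and T_2, or None.
    """
    remaining = tree_edges - {deleted}
    adj = {}
    for u, v, _ in remaining:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Collect the vertices of T_1, the tree containing one endpoint of `deleted`.
    start = deleted[0]
    t1 = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in t1:
                t1.add(y)
                stack.append(y)

    # A candidate has exactly one endpoint in T_1; since F was spanning,
    # its other endpoint lies in T_2.
    crossing = [e for e in non_tree_edges if (e[0] in t1) != (e[1] in t1)]
    return min(crossing, key=lambda e: e[2], default=None)
```

On a path 0-1-2-3 with non-tree edges (0, 3), (1, 3) and (0, 2), deleting the tree edge (1, 2) returns the lightest of the non-tree edges that cross the resulting split.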

As for the dynamic connectivity algorithm, 
to search for replacement edges, the algorithm 
associates to each edge e a level ℓ(e) and, based
on edge levels, maintains a set of sub-forests of
the minimum spanning forest F: for each level i,
forest F_i is the sub-forest induced by tree edges
of level ≥ i. Denoting by L the maximum edge
level, it follows that:


F = F_0 ⊇ F_1 ⊇ F_2 ⊇ ... ⊇ F_L.


Initially, all edges have level 0; levels are then 
progressively increased, but never decreased. The 
changes of edge levels are accomplished so as 
to maintain the following invariants, which ob- 
viously hold at the beginning. 


Invariant (1): F is a maximum spanning forest 
of G if edge levels are interpreted as weights. 

Invariant (2): The number of nodes in each tree
of F_i is at most n/2^i.

Invariant (3): Every cycle C has a non-tree 
edge of maximum weight and minimum level 
among all the edges in C. 

Invariant (1) should be interpreted as follows.
Let (u, v) be a non-tree edge of level ℓ(u, v) and
let u ... v be the unique path between u and v
in F (such a path exists since F is a spanning
forest of G). Let e be any edge in u ... v and
let ℓ(e) be its level. Due to (1), ℓ(e) ≥ ℓ(u, v).
Since this holds for each edge in the path, and
by construction F_{ℓ(u,v)} contains all the tree edges
of level ≥ ℓ(u, v), the entire path is contained in
F_{ℓ(u,v)}, i.e., u and v are connected in F_{ℓ(u,v)}.

Invariant (2) implies that the maximum number
of levels is L ≤ ⌊log_2 n⌋.

Invariant (3) can be used to prove that, among 
all the replacement edges, the lightest edge is 
on the maximum level. Let e_1 and e_2 be two
replacement edges with w(e_1) < w(e_2), and
let C_i be the cycle induced by e_i in F, i = 1, 2.
Since F is a minimum spanning forest, e_i has
maximum weight among all the edges in C_i. In
particular, since by hypothesis w(e_1) < w(e_2),
e_2 is also the heaviest edge in cycle C = (C_1 ∪
C_2) \ (C_1 ∩ C_2). Thanks to Invariant (3), e_2 has
minimum level in C, proving that ℓ(e_2) ≤ ℓ(e_1).
Thus, considering non-tree edges from higher to 
lower levels is correct. 

Note that initially, an edge is given level 0.
Its level can then be increased at most ⌊log_2 n⌋
times as a consequence of edge deletions.
When a tree edge e = (v, w) of level ℓ(e) is
deleted, the algorithm looks for a replacement
edge at the highest possible level, if any. Due
to Invariant (1), such a replacement edge has
level ℓ ≤ ℓ(e). Hence, a replacement subroutine
Replace((v, w), ℓ(e)) is called with parameters
e and ℓ(e). The operations performed by this
subroutine are now sketched.


Replace((v, w), ℓ) finds a replacement edge of
the highest level ≤ ℓ, if any, considering
edges in order of increasing weight. If such
a replacement does not exist in level ℓ, there
are two cases: if ℓ > 0, the algorithm recurses
on level ℓ − 1; otherwise, ℓ = 0, and the
deletion of (v, w) disconnects v and w in G.


It is possible to show that Replace returns 
a replacement edge of minimum weight on the 
highest possible level, yielding the following 
lemma: 




Lemma 1 There exists a deletions-only mini- 
mum spanning forest algorithm that can be ini- 
tialized on a graph with n vertices and m edges 
and supports any sequence of edge deletions in 
O(m log^2 n) total time.


The description of a fully dynamic algorithm 
which performs updates in O(log^4 n) time now
follows. The reduction used to obtain a fully 
dynamic algorithm is a slight generalization of 
the construction proposed by Henzinger and 
King [11] and works as follows. 


Lemma 2 Suppose there is a deletions-only
minimum spanning tree algorithm that,
for any k and ℓ, can be initialized on
a graph with k vertices and ℓ edges and
supports any sequence of Ω(ℓ) deletions in
total time O(ℓ · t(k, ℓ)), where t is a non-decreasing
function. Then there exists a fully-dynamic
minimum spanning tree algorithm
for a graph with n nodes starting with no
edges, that, for m edges, supports updates in
time


O\left( \log^3 n + \sum_{i=1}^{3+\log_2 m} \sum_{j=1}^{i} t\left(\min\{n, 2^j\}, 2^j\right) \right)


The interested reader is referred to references
[11] and [13] for the description of
the construction that proves Lemma 2. From
Lemma 1 one gets t(k, ℓ) = O(log^2 k). Hence,
combining Lemmas 1 and 2, the claimed result
follows:


Theorem 3 There exists a fully-dynamic mini- 
mum spanning forest algorithm that, for a graph 
with n vertices, starting with no edges, maintains 
a minimum spanning forest in O(log^4 n) amortized
time per edge insertion or deletion.
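
The arithmetic behind this bound can be spelled out as follows. This is a sketch of the substitution only, under the added assumption that m is polynomial in n (so that log m = O(log n)); it uses t(k, ℓ) = O(log^2 k) from Lemma 1 and the fact that min{n, 2^j} ≤ 2^j.

```latex
\begin{align*}
\sum_{i=1}^{3+\log_2 m} \sum_{j=1}^{i} t\left(\min\{n, 2^j\}, 2^j\right)
  &= \sum_{i=1}^{3+\log_2 m} \sum_{j=1}^{i} O\left(\log^2 \min\{n, 2^j\}\right) \\
  &\le \sum_{i=1}^{3+\log_2 m} \sum_{j=1}^{i} O\left(j^2\right)
   = \sum_{i=1}^{3+\log_2 m} O\left(i^3\right) \\
  &= O\left(\log^4 m\right) = O\left(\log^4 n\right).
\end{align*}
```

The additive O(log^3 n) term of Lemma 2 is dominated by this sum, which gives the O(log^4 n) amortized bound.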


There is a lower bound of Ω(log n) for dynamic
minimum spanning tree, given by Eppstein
et al. [6], which uses the following argument. Let
A be an algorithm for maintaining a minimum
spanning tree of an arbitrary (multi)graph G. Let
A be such that change_weight(e, Δ) returns the
edge f that replaces e in the minimum spanning
tree, if e is replaced. Clearly, any dynamic spanning
tree algorithm can be modified to return
f. One can use algorithm A to sort n positive
numbers x_1, x_2, ..., x_n as follows. Construct
a multigraph G consisting of two nodes connected
by (n + 1) edges e_0, e_1, ..., e_n, such that
edge e_0 has weight 0 and edge e_i has weight x_i.
The initial spanning tree is e_0. Increase the weight
of e_0 to +∞. Whichever edge replaces e_0, say e_i,
is the edge of minimum weight. Now increase the
weight of e_i to +∞: the replacement of e_i gives
the second smallest weight. Continuing in this
fashion gives the numbers sorted in increasing
order. A similar argument applies when only edge
decreases are allowed. Since Paul and Simon [14]
have shown that any sorting algorithm needs
Ω(n log n) time to sort n numbers on a unit-cost
random access machine whose repertoire of operations
includes additions, subtractions, multiplications
and comparisons with 0, but not divisions
or bit-wise Boolean operations, the following
theorem follows.


Theorem 4 Any unit-cost random access 
algorithm that performs additions, subtractions, 
multiplications and comparisons with 0, but 
not divisions or bit-wise Boolean operations, 
requires Ω(log n) amortized time per operation
to maintain a minimum spanning tree
dynamically.
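
The sorting reduction behind this lower bound can be sketched as follows. The class `NaiveTwoNodeMST` is a hypothetical and deliberately naive stand-in for algorithm A, written only so that the reduction can be run; the point of the argument is that any algorithm exposing such a `change_weight` operation can be driven in the same way. Inputs are assumed to be positive finite numbers.

```python
import math

class NaiveTwoNodeMST:
    """Naive stand-in for algorithm A on a multigraph with two nodes.

    Every edge joins the same two nodes, so the minimum spanning tree
    is simply the single lightest edge.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        self.tree_edge = min(range(len(self.weights)),
                             key=self.weights.__getitem__)

    def change_weight(self, e, delta):
        """Add delta to the weight of edge e.

        Returns the index of the new tree edge if the tree changed,
        and None otherwise.
        """
        self.weights[e] += delta
        best = min(range(len(self.weights)), key=self.weights.__getitem__)
        if self.weights[best] < self.weights[self.tree_edge]:
            self.tree_edge = best
            return best
        return None


def sort_via_mst(xs):
    """Sort positive finite numbers using only change_weight calls."""
    # Edge 0 plays the role of e_0 (weight 0); edge i has weight x_i.
    mst = NaiveTwoNodeMST([0.0] + list(xs))
    out = []
    current = 0
    for _ in range(len(xs)):
        # Raising the tree edge to +infinity forces the lightest
        # remaining edge to replace it.
        current = mst.change_weight(current, math.inf)
        out.append(xs[current - 1])
    return out
```

Sorting n numbers takes n calls to `change_weight`, so an algorithm answering each call in o(log n) amortized time would contradict the Ω(n log n) sorting bound in the stated machine model.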


Applications 


Minimum spanning trees have applications in 
many areas, including network design, VLSI, 
and geometric optimization, and the problem of 
maintaining minimum spanning trees dynami- 
cally arises in such applications. 

Algorithms for maintaining a minimum 
spanning forest of a graph can be used also for 
maintaining information about the connected 
components of a graph. There are also other 
applications of dynamic minimum spanning tree
algorithms, which include finding the k smallest
spanning trees [3-5, 8, 9], sampling spanning
trees [7] and dynamic matroid intersection problems
[10]. Note that the first two problems are not
necessarily dynamic: however, efficient solutions 
for these problems need dynamic data structures. 


Open Problems 


The first natural open question is to ask whether 
the gap between upper and lower bounds for the 
dynamic minimum spanning tree problem can be 
closed. Note that this is possible in the special 
case of plane graphs [6]. 

Second, the techniques for dynamic minimum 
spanning trees can be extended to dynamic 2- 
edge and 2-vertex connectivity, which indeed can 
be solved in polylogarithmic time per update. Can 
one extend the same technique also to higher 
forms of connectivity? This is particularly im- 
portant, since the best known update bounds for 
higher edge and vertex connectivity are polynomial,
and it would be useful to design polylogarithmic
algorithms at least for fully dynamic
3-edge and 3-vertex connectivity.


Experimental Results 


A thorough empirical study of the performance
of dynamic minimum spanning tree
algorithms has been carried out in [1, 2].


Data Sets 


Data sets are described in [1, 2]. 
Problem Definition


In this entry, the problem of maintaining a dy- 
namic planar graph subject to edge insertions and 
edge deletions that preserve planarity but that 
can change the embedding is considered. Before
formally defining the problem, a few preliminary
definitions follow.

A graph is planar if it can be embedded in 
the plane so that no two edges intersect. In 
a dynamic framework, a planar graph that is 
committed to an embedding is called plane, 
and the general term planar is used only when 
changes in the embedding are allowed. An 
edge insertion that preserves the embedding is 
called embedding-preserving, whereas it is called 
planarity-preserving if it keeps the graph planar, 
even though its embedding can change; finally, 
an edge insertion is called arbitrary if it is not 
known to preserve planarity. Extensive work 
on dynamic graph algorithms has used ad hoc 
techniques to solve a number of problems such as 
minimum spanning forests, 2-edge-connectivity 
and planarity testing for plane graphs (with 
embedding-preserving insertions) [5-7, 9-12]:
this entry is concerned with more general 
planarity-preserving updates. 

The work of Galil et al. [8] and of Eppstein 
et al. [3] provides a general technique for dy- 
namic planar graph problems, including those 
mentioned above: in all these problems, one can 
deal with either arbitrary or planarity-preserving 
insertions and therefore allow changes of the
embedding. 

The fully dynamic planarity testing problem 
can be defined as follows. One wishes to main- 
tain a (not necessarily planar) graph subject to 
arbitrary edge insertions and deletions, and allow 
queries that test whether the graph is currently 
planar, or whether a potential new edge would 
violate planarity. 


Key Results 


Eppstein et al. [3] provided a way to apply the 
sparsification technique [2] to families of graphs 
that are already sparse, such as planar graphs. 

The new ideas behind this technique are the 
following. The notion of a certificate can be ex- 
panded to a definition for graphs in which a sub- 
set of the vertices are denoted as interesting; these 
compressed certificates may reduce the size of the 
graph by removing uninteresting vertices. Using 
this notion, one can define a type of sparsification 
based on separators, small sets of vertices the 
removal of which splits the graph into roughly 
equal size components. Recursively finding sepa- 
rators in these components gives a separator tree 
which can also be used as a sparsification tree; 
the interesting vertices in each certificate will be 
those vertices used in separators at higher levels 
of the tree. The notion of a balanced separator 
tree, which also partitions the interesting vertices 
evenly in the tree, is introduced: such a tree can be 
computed in linear time, and can be maintained 
dynamically. Using this technique, the following 
results can be achieved. 


Theorem 1 One can maintain a planar graph, 
subject to insertions and deletions that preserve 
planarity, and allow queries that test whether 
a new edge would violate planarity, in amortized
time O(n^{1/2}) per update or query.


This result can be improved, in order to allow 
arbitrary insertions or deletions, even if they 
might let the graph become nonplanar, using the 
following approach. The data structure above can 
be used to maintain a planar subgraph of the given
graph. Whenever one attempts to insert a new
edge, and the resulting graph would be nonplanar, 
the algorithm does not actually perform the inser- 
tion, but instead adds the edge to a list of non- 
planar edges. Whenever a query is performed, 
and the list of nonplanar edges is nonempty, the 
algorithm attempts once more to add those edges 
one at a time to the planar subgraph. The time 
for each successful addition can be charged to the 
insertion operation that put that edge in the list of 
nonplanar edges. As soon as the algorithm finds 
some edge in the list that can not be added, it 
stops trying to add the other edges in the list. The 
time for this failed insertion can be charged to 
the query the algorithm is currently performing. 
In this way the list of nonplanar edges will be 
empty if and only if the graph is planar, and the 
algorithm can test planarity even for updates in 
nonplanar graphs. 


Theorem 2 One can maintain a graph, subject 
to arbitrary insertions and deletions, and allow 
queries that test whether the graph is presently 
planar or whether a new edge would violate 
planarity, in amortized time O(n^{1/2}) per update
or query.
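
The bookkeeping described above (a planar subgraph plus a list of postponed nonplanar edges) can be sketched as follows. The class name `LazyPlanarityTester` and the callback `can_add` are assumptions made for this illustration: `can_add` stands in for the data structure of Theorem 1, and must report whether the maintained planar subgraph stays planar after adding a given edge.

```python
class LazyPlanarityTester:
    """Arbitrary insertions on top of a planarity-preserving structure.

    can_add(edges, e) is a hypothetical oracle: it returns True exactly
    when the planar edge set `edges` stays planar after adding edge e.
    """

    def __init__(self, can_add):
        self.can_add = can_add
        self.planar_edges = set()   # edges of the maintained planar subgraph
        self.nonplanar = []         # edges whose insertion was postponed

    def insert(self, e):
        if self.can_add(self.planar_edges, e):
            self.planar_edges.add(e)
        else:
            # Do not perform the insertion; remember the edge instead.
            self.nonplanar.append(e)

    def delete(self, e):
        if e in self.planar_edges:
            self.planar_edges.remove(e)
        elif e in self.nonplanar:
            self.nonplanar.remove(e)

    def is_planar(self):
        # Retry postponed edges one at a time; stop at the first failure.
        # Each success is charged to the insertion that postponed the edge,
        # the single failure is charged to this query.
        while self.nonplanar:
            e = self.nonplanar[0]
            if not self.can_add(self.planar_edges, e):
                break
            self.planar_edges.add(e)
            self.nonplanar.pop(0)
        return not self.nonplanar
```

Any monotone test can be plugged in for `can_add` to exercise the bookkeeping; a real implementation would replace it with the separator-based structure, which is not reproduced here.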


Applications 


Planar graphs are perhaps one of the most
important and interesting subclasses of graphs,
combining beautiful structural results with
relevance in applications. In particular, planarity
testing is a basic problem, which appears 
naturally in many applications, such as VLSI 
layout, graphics, and computer aided design. In 
all these applications, there seems to be a need 
for dealing with dynamic updates. 


Open Problems 


The O(n^{1/2}) bound for planarity testing is amortized.
Can we improve this bound or make it
worst-case?

Finally, the complexity of the algorithms presented
here, and the large constant factors involved
in some of the asymptotic time bounds,
make some of the results unsuitable for practical 
applications. Can one simplify the methods while 
retaining similar theoretical bounds? 
Problem Definition


Design a data structure for a directed graph with
a fixed set of nodes which can process queries of
the form “Is there a path from i to j?” and updates
of the form “Insert edge (i, j)” and “Delete edge
(i, j)”. The goal is to minimize update and query
times, over the worst case sequence of queries
and updates. Algorithms to solve this problem are 
called “fully dynamic” as opposed to “partially 
dynamic” since both insertions and deletions are 
allowed. 
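
As a point of reference for the interface, the following sketch is a naive baseline that supports the three operations with O(1) updates and a breadth-first search per query. It is not the algorithm of this entry, which trades the other way (expensive updates, constant-time queries); the class and method names are assumptions made for this illustration.

```python
from collections import deque

class NaiveDynamicReachability:
    """Baseline: constant-time updates, one graph search per query."""

    def __init__(self, n):
        self.n = n
        self.succ = [set() for _ in range(n)]   # out-neighbours of each node

    def insert(self, i, j):
        self.succ[i].add(j)

    def delete(self, i, j):
        self.succ[i].discard(j)

    def query(self, i, j):
        """Is there a path from i to j? (The empty path counts, so i reaches i.)"""
        if i == j:
            return True
        seen = {i}
        queue = deque([i])
        while queue:
            x = queue.popleft()
            for y in self.succ[x]:
                if y == j:
                    return True
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return False
```

Each query here costs time proportional to the size of the graph in the worst case, which is the cost that maintaining the transitive closure explicitly is meant to avoid.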


Key Results 


The work in [4] gives the first deterministic fully
dynamic graph algorithm for maintaining the
transitive closure in a directed graph. It uses
O(n^2 log n) amortized time per update and O(1)
worst case query time, where n is the number of nodes
in the graph. The basic technique is extended to
give fully dynamic algorithms for approximate 
and exact all-pairs shortest paths problems. 

The basic building block of these algorithms
is a method of maintaining all-pairs shortest paths
with insertions and deletions for distances up to d.
For each vertex v, a single-source shortest path
tree of depth d of the vertices which reach v (“In_v”)
and another tree of the vertices which are reached
by v (“Out_v”) are maintained during any sequence
of deletions. Each insert of a set of edges incident
to v results in the rebuilding of In_v and Out_v.
For each pair of vertices x, y and each length,
a count is kept of the number of v such that there
is a path from x in In_v to y in Out_v of that length.

To maintain transitive closure, log n levels of 
these trees are maintained for trees of depth 2, 
where the edges used to construct a forest on 
one level depend on the paths in the forest of the 
previous level. 

The space required was reduced from O(n^3)
to O(n^2) in [6]. A log n factor was shaved
off in [7, 10]. Other tradeoffs between update and
query time are given in [1, 7-10]. A deletions-only
randomized transitive closure algorithm running
in O(mn) time overall is given in [8], where
m is the initial number of edges in the graph.
A simple Monte Carlo transitive closure algorithm
for acyclic graphs is presented in [5]. Dynamic
single source reachability in a digraph is 
presented in [8, 9]. All-pairs shortest paths can be 
maintained with nearly the same update time [2]. 


Applications 


None 


Open Problems 


Can reachability from a single source in a di- 
rected graph be maintained in o(mn) time over 
a worst case sequence of m deletions? 

Can strongly connected components be main- 
tained in o(mn) time over a worst case sequence 
of m deletions? 
Experimental Results


Experimental results on older techniques can be 
found in [3]. 
Problem Definition


For a detailed exposition of the solution approach 
presented in this entry, please refer to [15]. As 
evidenced by the successive announcement of 
ever-faster computer systems in the past decade, 
increasing the speed of VLSI systems continues 
to be one of the major requirements for VLSI 
system designers today. Faster integrated circuits 
are making possible newer applications that were 
traditionally considered difficult to implement in 
hardware. In this scenario of increasing circuit 
complexity, reduction of circuit delay in inte- 
grated circuits is an important design objective. 
Transistor sizing is one such task that has been 
employed for speeding up circuits for quite some 
time now [6]. Given the circuit topology, the 
delay of a combinational circuit can be controlled
by varying the sizes of transistors in the circuit. 
Here, the size of a transistor is measured in terms 
of its channel width, since the channel lengths of 
MOS transistors in a digital circuit are generally 
uniform. In any case, what really matters is the 
ratio of channel width to channel length, and if 
channel lengths are not uniform, this ratio can be 
considered as the size. In coarse terms, the circuit 
delay can usually be reduced by increasing the 
sizes of certain transistors in the circuit from the 
minimum size. Hence, making the circuit faster 
usually entails the penalty of increased circuit 
area relative to a minimum-sized circuit, and the 
area-delay trade-off involved here is the problem 
of transistor size optimization. A related problem 
to transistor sizing is called gate sizing, where a 
logic gate in a circuit is modeled as an equivalent 
inverter and the sizing optimization is carried 
out on this modified circuit with equivalent in- 
verters in place of more complex gates. There 
is, therefore, a reduction in the number of size 
parameters corresponding to every gate in the cir- 
cuit. Needless to say, this is an easier problem to 
solve than the general transistor sizing problem. 
Note that gate sizing mentioned here is distinct 
from library-specific gate sizing that is a discrete 
optimization problem targeted to selecting appro- 
priate gate sizes from an underlying cell library. 
The gate sizing problem targeted here is one of 
continuous gate sizing where the gate sizes are 
allowed to vary in a continuous manner between 
a minimum and a maximum size. There has been
a large amount of work done on transistor sizing
[1-3, 5, 6, 9, 10, 12, 13], which underlines the importance
of this optimization technique. Starting
from a minimum-sized circuit, TILOS [6] uses a
greedy strategy for transistor sizing by iteratively
sizing transistors in the critical path. A sensitivity
factor is calculated for every transistor in the 
critical path to quantify the gain in circuit speed 
achieved by a unit upsizing of the transistor. The 
most sensitive transistor is then bumped up in 
size by a small constant factor to speed up the 
circuit. This process is repeated iteratively until 
the timing requirements are met. The technique is 
extremely simple to implement and has run-time 
behavior proportional to the size of the circuit. Its 
chief drawback is that it does not have guaranteed 
convergence properties and hence is not an exact 
optimization technique. 
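
The flavor of such a greedy, sensitivity-driven loop can be sketched on a toy problem. Everything in the sketch below is an assumption made for illustration: the circuit is a single chain of stages (so the whole chain is the critical path), the delay of a stage is the capacitance it drives divided by its own size, and the names `chain_delay` and `greedy_size` are hypothetical. It is not the TILOS implementation or its delay model.

```python
def chain_delay(sizes, c_load):
    """Delay of a chain under a toy model: each stage's delay is the
    capacitance it drives (the next stage's size, or the final load)
    divided by its own size."""
    loads = sizes[1:] + [c_load]
    return sum(load / x for x, load in zip(sizes, loads))


def greedy_size(n_stages, c_load, target, bump=1.1, max_iters=10_000):
    """Greedily upsize the most sensitive stage until the target is met.

    Returns (sizes, delay). Starts from a minimum-sized chain (all sizes 1).
    """
    sizes = [1.0] * n_stages
    for _ in range(max_iters):
        d = chain_delay(sizes, c_load)
        if d <= target:
            return sizes, d
        best_i, best_sens = None, 0.0
        for i in range(n_stages):
            trial = sizes.copy()
            trial[i] *= bump
            gain = d - chain_delay(trial, c_load)
            # Sensitivity: delay reduction per unit of added size.
            sens = gain / (trial[i] - sizes[i])
            if sens > best_sens:
                best_i, best_sens = i, sens
        if best_i is None:
            break                       # no single upsizing step helps
        sizes[best_i] *= bump           # bump the most sensitive stage
    return sizes, chain_delay(sizes, c_load)
```

For example, `greedy_size(3, 64.0, 20.0)` starts from a delay of 66 and upsizes stages one bump at a time until the delay is at most 20. As with the greedy strategy described above, nothing in this loop guarantees that the resulting sizes are area-optimal.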


Key Results 


The solution presented in this entry, hereafter
referred to as MINFLOTRANSIT, is a novel
way of solving the transistor sizing problem exactly
and extremely fast. Even though the entry
treats transistor sizing in the description, the
results apply as well to the less general problem
of continuous gate sizing described earlier.
The proposed approach has some
similarity in form to [2,5,8] which will be sub- 
sequently explained, but the similarity in content 
is minimal and the details of implementation are 
vastly different. 

In essence, the proposed technique and the 
techniques in [2, 5, 8] are iterative relaxation 
approaches that involve a two-step optimization 
strategy. The first step involves a delay budget- 
ing step where optimal delays are computed for 
transistors/gates. The second step involves sizing 
transistors optimally under this “constant delay” 
model to achieve these delay budgets. The two 
steps are iteratively alternated until the solution 
converges, i.e., until the delay budgets calculated 
in the first step are exactly satisfied by the tran- 
sistor sizes determined by the second step. 

The primary features of the proposed approach
are:


• It is computationally fast and is comparable to
TILOS in its run-time behavior.

• It can be used for true transistor sizing as well
as the relaxed problem of gate sizing. Additionally,
the approach can easily incorporate
wire sizing [15].

• It can be adapted for more general delay
models than the Elmore delay model [15].


The starting point for the proposed approach is 
a fast guess solution. This could be obtained, 
for example, from a circuit that has been op- 
timized using TILOS to meet the given delay 
requirements. The proposed approach, as out- 
lined earlier, is an iterative relaxation procedure 
that involves an alternating two-phase relaxed 
optimization sequence that is repeated iteratively 
until convergence is achieved. The two phases in 
the proposed approach are: 


• The D-phase where transistor sizes are
assumed fixed and transistor delays are
regarded as variable parameters. Irrespective
of the delay model employed, this phase
can be formulated as the dual of a min-cost
network flow problem. Using |V| to
denote the number of transistors and |E| the
number of wires in the circuit, this step in
our application has worst-case complexity of
O(|V||E| log(log |V|)) [7].

• The W-phase where transistor/gate delays are
assumed fixed and their sizes are regarded as
variable parameters. As long as the gate delay
can be expressed as a separable function of the
transistor sizes, this step can be solved as a
Simple Monotonic Program (SMP) [11]. The
complexity of SMP is similar to an all-pairs
shortest-path algorithm in a directed graph
[4, 11], i.e., O(|V||E|).


The objective function for the problem is the
minimization of circuit area. In the W-phase, this
objective is addressed directly, and in the D-phase
the objective is chosen to facilitate a move in the
solution space in a direction that is known to lead
to a reduction in the circuit area.


Gate Sizing, Table 1 Comparison of TILOS and MINFLOTRANSIT
on a Sun UltraSPARC 10 workstation for
ISCAS85 and MCNC91 benchmarks for 0.13 μm technology.
The delay specs are with respect to a minimum-sized
circuit. The optimization approach followed here was gate
sizing

Circuit     #Gates  Area saved over  Delay specs.  CPU time      CPU time
                    TILOS (%)        (D_min)       (TILOS) (s)   (OURS) (s)
Adder32        480  <1               0.5           2.2           5
Adder256     3,840  <1               0.5           262           608
Cm163a          65  2.1              0.55          0.13          0.32
Cm162a          71  10.4             0.5           0.23          0.96
Parity8         89  «37              0.45          0.68          2.15
Frg1           177  1.9              0.7           0.55          1.49
Population     518  6.7              0.4           57            179
Pmult8       1,431  5                0.5           637           1,476
Alu2           826  2.6              0.6           28            71
C432           160  9.4              0.4           0.5           4.8
C499           202  7.2              0.57          1.47          11.26
C880           383  4                0.4           2.7           8.2
C1355          546  9.5              0.4           29            76
C1908          880  4.6              0.4           36            84
C2670        1,193  9.1              0.4           27            69
C3540        1,669  7.7              0.4           226           651
C5315        2,307  2                0.4           90            201
C6288        2,416  16.5             0.4           1,677         4,138
C7552        3,512  3.3              0.4           320           683


Applications 


The primary application of the solution provided
here is circuit and system optimization in
automated VLSI design. It can enable electronic
design automation (EDA) tools that take a holistic
approach toward transistor sizing, which in turn
makes custom circuit design flows more realizable
in practice. The mechanics of some elements of
the solution, especially the D-phase, have been
used to address other circuit optimization
problems [14].


Open Problems 


The related problem of discrete gate sizing
optimization, which matches gate sizes to the
sizes available in a standard cell library, is a
provably hard optimization problem whose solution
could be aided by the development of efficient
heuristics and probabilistic algorithms.


Experimental Results 


A relative comparison of MINFLOTRANSIT
with TILOS is provided in Table 1 for gate
sizing of ISCAS85 and MCNC91 benchmark
circuits. As can be seen, a significant improvement
in circuit area is observed with a tolerable
increase in execution time.
Problem Definition 


This problem is concerned with the computa- 
tional complexity of finding an exchange market 
equilibrium. The exchange market model consists 
of a set of agents, each with an initial endowment 
of commodities, interacting through a market,
each trying to maximize its own utility function. The
equilibrium prices are determined by a clearance 
condition. That is, all commodities are bought, 
collectively, by all the utility maximizing agents, 
subject to their budget constraints (determined by 
the values of their initial endowments of com- 
modities at the market price). The work of Deng, 
Papadimitriou and Safra [3] studies the com- 
plexity, approximability, inapproximability, and 
communication complexity of finding equilib- 
rium prices. The work shows the NP-hardness 
of approximating the equilibrium in a market
with indivisible goods. For markets with divisible
goods and linear utility functions, it develops
a pseudo-polynomial time algorithm for computing
an \(\epsilon\)-equilibrium. It also gives a communication
complexity lower bound for computing
Pareto allocations in markets with non-strictly 
concave utility functions. 


Market Model 

In a pure exchange economy, there are m traders, 
labeled by i = 1,2,...,m, and n types of com- 
modities, labeled by j = 1,2,...,n. The com- 
modities could be divisible or indivisible. Each 
trader i comes to the market with an initial
endowment of commodities, denoted by a vector
\(w_i \in \mathbb{R}^n_+\), whose j-th entry is the amount of
commodity j held by trader i.

Associate with each trader i a consumption set
\(X_i\) that represents the set of possible commodity
bundles for him. For example, when there are
\(n_1\) divisible commodities and \((n - n_1)\) indivisible
commodities, \(X_i\) can be
\(\mathbb{R}^{n_1}_+ \times \mathbb{Z}^{n - n_1}_+\). Each
trader has a utility function \(u_i : X_i \to \mathbb{R}_+\) that
represents his utility for a bundle of commodities.
Usually, the utility function is required to be
concave and nondecreasing.

In the market, each trader acts as both a buyer
and a seller to maximize his utility. At a certain
price \(p \in \mathbb{R}^n_+\), trader i solves the
following optimization problem, under his budget
constraint:

\[ \max\, u_i(x_i) \quad \text{s.t.} \quad x_i \in X_i \ \text{and}\ \langle p, x_i \rangle \le \langle p, w_i \rangle. \]

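Definition 1 An equilibrium in a pure exchange
economy is a price vector \(\bar{p} \in \mathbb{R}^n_+\) and bundles of
commodities \(\{\bar{x}_i \in \mathbb{R}^n_+, i = 1, \dots, m\}\), such that

\[ \bar{x}_i \in \arg\max \{ u_i(x_i) \mid x_i \in X_i \ \text{and}\ \langle x_i, \bar{p} \rangle \le \langle w_i, \bar{p} \rangle \}, \quad \forall\, 1 \le i \le m, \]
\[ \sum_{i=1}^{m} \bar{x}_{ij} \le \sum_{i=1}^{m} w_{ij}, \quad \forall\, 1 \le j \le n. \]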


The concept of approximate equilibrium was in- 
troduced in [3]: 
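Definition 2 ([3]) An \(\epsilon\)-approximate equilibrium
in an exchange market is a price vector \(\bar{p} \in \mathbb{R}^n_+\)
and bundles of goods \(\{\bar{x}_i \in \mathbb{R}^n_+, i = 1, \dots, m\}\),
such that

\[ u_i(\bar{x}_i) \ge \frac{1}{1+\epsilon} \max \{ u_i(x_i) \mid x_i \in X_i,\ \langle x_i, \bar{p} \rangle \le \langle w_i, \bar{p} \rangle \}, \quad \forall i \qquad (1) \]
\[ \langle \bar{x}_i, \bar{p} \rangle \le (1+\epsilon) \langle w_i, \bar{p} \rangle, \quad \forall i \qquad (2) \]
\[ \sum_{i=1}^{m} \bar{x}_{ij} \le (1+\epsilon) \sum_{i=1}^{m} w_{ij}, \quad \forall j \qquad (3) \]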
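
The three conditions of Definition 2 can be checked mechanically once prices and bundles are given. The following is a minimal sketch, not taken from [3], under several assumptions: utilities are linear, all goods are divisible (so each consumption set is the nonnegative orthant), and prices are strictly positive. The function name `is_eps_equilibrium` and the small floating-point tolerance are illustrative choices.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def is_eps_equilibrium(utilities, endowments, bundles, prices, eps, tol=1e-9):
    """Check conditions (1)-(3) of an eps-approximate equilibrium.

    Assumes linear utilities u_i(x) = <utilities[i], x>, divisible goods,
    and strictly positive prices (all three are assumptions of this sketch).
    """
    m, n = len(utilities), len(prices)
    for i in range(m):
        budget = dot(endowments[i], prices)
        # With linear utilities, an optimal bundle spends the whole budget on
        # a good with the largest utility per unit of money.
        best = budget * max(utilities[i][j] / prices[j] for j in range(n))
        if dot(utilities[i], bundles[i]) + tol < best / (1 + eps):  # (1)
            return False
        if dot(bundles[i], prices) > (1 + eps) * budget + tol:      # (2)
            return False
    for j in range(n):                                              # (3)
        demand = sum(bundles[i][j] for i in range(m))
        supply = sum(endowments[i][j] for i in range(m))
        if demand > (1 + eps) * supply + tol:
            return False
    return True
```

For instance, with two traders who each own the good the other one wants, swapping the goods at unit prices satisfies all three conditions with eps = 0, whereas keeping the initial endowments violates condition (1).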


Key Results 


A linear market is a market in which all the agents
have linear utility functions. The deficiency of
a market is the smallest \(\epsilon \ge 0\) for which an
\(\epsilon\)-approximate equilibrium exists.


Theorem 1 The deficiency of a linear market 
with indivisible goods is NP-hard to compute, 
even if the number of agents is two. The deficiency 
is also NP-hard to approximate within 1/3. 


Theorem 2 There is a polynomial-time algo- 
rithm for finding an equilibrium in linear markets 
with bounded number of divisible goods. Ditto for 
a polynomial number of agents. 


Theorem 3 If the number of goods is bounded,
there is a polynomial-time algorithm which, for
any linear indivisible market for which a price
equilibrium exists, and for any \(\epsilon > 0\), finds an
\(\epsilon\)-approximate equilibrium.


If the utility functions are strictly concave and the
equilibrium prices are broadcast to all agents,
the equilibrium allocation can be computed
in a distributed manner without any communication,
since each agent’s basket of goods is uniquely
determined. However, if the utility functions are
not strictly concave, e.g., linear functions,
communication is needed to coordinate the agents’
behavior.
Theorem 4 Any protocol with binary domains
for computing Pareto allocations of m agents
and n divisible commodities with concave utility
functions (resp. \(\epsilon\)-Pareto allocations for
indivisible commodities, for any \(\epsilon < 1\)) must have
market communication complexity \(\Omega(m \log(m + n))\)
bits.


Applications 


This concept of market equilibrium is the out- 
come of a sequence of efforts trying to fully un- 
derstand the laws that govern human commercial 
activities, starting with the “invisible hand” of 
Adam Smith, and finally, the mathematical con- 
clusion of Arrow and Debreu [1] that there exists 
a set of prices that bring supply and demand into 
equilibrium, under quite general conditions on 
the agent utility functions and their optimization 
behavior. 

The work of Deng, Papadimitriou and 
Safra [3] explicitly called for an algorithmic 
complexity study of the problem, and developed 
interesting complexity results and approximation 
algorithms for several classes of utility functions. 
There has since been a surge of algorithmic study 
for the computation of the price equilibrium 
problem with continuous variables, discovering 
and rediscovering polynomial time algorithms 
for many classes of utility functions, see [2, 
4-9]. 

Significant progress has been made in the
above directions, but it is only a first step. New
ideas and methods have already been developed
and applied in practice, and the next significant
step is likely to come from the many active studies
of microeconomic behavior in e-commerce markets.
The algorithmic and analytic foundation laid in [3]
will remain an indispensable tool for further
development in this revived field.


Open Problems 


The most important open problem is the
computational complexity of finding the equilibrium
price whose existence is guaranteed by the
Arrow-Debreu theorem. To the best of the author’s
knowledge, only markets whose set of equilibria is
convex can be solved in polynomial time with
current techniques, and approximating equilibria in
some markets with a disconnected set of equilibria,
e.g., Leontief economies, has been shown to be
PPAD-hard. Is convexity or (weak) gross
substitutability a necessary condition for a market
to be polynomial-time solvable?

Second, the dynamic case is especially interesting
with respect to theory, mathematical modeling, and
algorithmic complexity viewed as bounded
rationality. Great progress must be made in those
directions for any theoretical work to be
meaningful in practice.

Third, incentive-compatible mechanism design
protocols for auction models have been studied
most actively in recent years, especially with
the rise of e-commerce. At this level, a proper
approximate version of the equilibrium concept
that handles price dynamics should be particularly
important.
Problem Definition 


The generalized Steiner network problem is a net- 
work design problem, where the input consists of 
a graph together with a collection of connectivity 
requirements, and the goal is to find the cheapest 
subgraph meeting these requirements. 

Formally, the input to the generalized Steiner
network problem is an undirected multigraph
\(G = (V, E)\), where each edge \(e \in E\) has a
non-negative cost \(c(e)\), and for each pair of
vertices \(i, j \in V\), there is a connectivity requirement
\(r_{i,j} \in \mathbb{Z}\). A feasible solution is a subset \(E' \subseteq E\)
of edges, such that every pair \(i, j \in V\) of vertices
is connected by at least \(r_{i,j}\) edge-disjoint paths
in the graph \(G' = (V, E')\). The generalized Steiner
network problem asks to find a solution \(E'\) of
minimum cost \(\sum_{e \in E'} c(e)\).
This problem generalizes several classical network
design problems. Some examples include
minimum spanning tree, Steiner tree and Steiner 
forest. The most general special case for which 
a 2-approximation was previously known is the 
Steiner forest problem [1, 4]. 

Williamson et al. [8] were the first to 
show a non-trivial approximation algorithm 
for the generalized Steiner network prob- 
lem, achieving a 2k-approximation, where 
\(k = \max_{i,j \in V} \{r_{i,j}\}\). This result was improved to
an O(log k)-approximation by Goemans et al. [3].


Key Results 


The main result of [6] is a factor-2 approximation 
algorithm for the generalized Steiner network 
problem. The techniques used in the design and 
the analysis of the algorithm seem to be of inde- 
pendent interest. 

The 2-approximation is achieved for a more
general problem, defined as follows. The input
is a multigraph \(G = (V, E)\) with costs \(c(\cdot)\)
on edges, and a connectivity requirement function
\(f : 2^V \to \mathbb{Z}\). Function f is weakly supermodular,
i.e., it has the following properties:


1. \(f(V) = 0\).
2. For all \(A, B \subseteq V\), at least one of the following
two conditions holds:


• \(f(A) + f(B) \le f(A \setminus B) + f(B \setminus A)\).
• \(f(A) + f(B) \le f(A \cap B) + f(A \cup B)\).


For any subset \(S \subseteq V\) of vertices, let \(\delta(S)\)
denote the set of edges with exactly one endpoint
in S. The goal is to find a minimum-cost subset of
edges \(E' \subseteq E\), such that for every subset \(S \subseteq V\)
of vertices, \(|\delta(S) \cap E'| \ge f(S)\).

This problem can be equivalently expressed as 
an integer program. For each edge e € E, let x, 
be the indicator variable of whether e belongs to 
the solution. 

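(IP) \[ \min \sum_{e \in E} c(e)\, x_e \]

subject to:

\[ \sum_{e \in \delta(S)} x_e \ge f(S) \quad \forall S \subseteq V \qquad (1) \]
\[ x_e \in \{0, 1\} \quad \forall e \in E \qquad (2) \]
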
It is easy to see that the generalized Steiner
network problem is a special case of (IP), where
for each \(S \subseteq V\), \(f(S) = \max_{i \in S,\, j \notin S} \{r_{i,j}\}\).
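
To make the cut formulation concrete, the following minimal sketch checks by brute force whether a chosen edge set satisfies the cut requirement for every vertex subset. It is only an illustration for tiny instances (it enumerates all subsets), and the names `cut_requirement` and `is_feasible`, as well as the representation of requirements as a symmetric matrix `r`, are assumptions of this sketch rather than part of [6]. By Menger's theorem, meeting every cut requirement is equivalent to having the required number of edge-disjoint paths between each pair.

```python
from itertools import combinations


def cut_requirement(S, r):
    """f(S) = max of r[i][j] over i in S and j outside S (0 if no such pair)."""
    n = len(r)
    outside = [j for j in range(n) if j not in S]
    return max((r[i][j] for i in S for j in outside), default=0)


def is_feasible(n, chosen_edges, r):
    """Check |delta(S) ∩ E'| >= f(S) for every nonempty proper subset S.

    chosen_edges is a list of (u, v) pairs; parallel edges may repeat.
    Exponential in n, so only suitable for very small graphs.
    """
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            crossing = sum(1 for (u, v) in chosen_edges if (u in S) != (v in S))
            if crossing < cut_requirement(S, r):
                return False
    return True
```

On a triangle in which every pair of vertices requires two edge-disjoint paths, all three edges are needed: dropping any edge leaves a vertex whose cut contains only one edge.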


Techniques 

The approximation algorithm uses the LP- 
rounding technique. The initial linear program 
(LP) is obtained from (IP) by replacing the 
integrality constraint (2) with: 


\[ 0 \le x_e \le 1 \quad \forall e \in E \qquad (3) \]

It is assumed that there is a separation oracle
for (LP). It is easy to see that such an oracle
exists if (LP) is obtained from the generalized
Steiner network problem. The key result used in
the design and the analysis of the algorithm is
summarized in the following theorem.


Theorem 1 In any basic solution of (LP), there
is at least one edge \(e \in E\) with \(x_e \ge 1/2\).


The approximation algorithm works by
iterative LP-rounding. Given a basic optimal
solution of (LP), let \(E^* \subseteq E\) be the subset of
edges e with \(x_e \ge 1/2\). The edges of \(E^*\) are
removed from the graph (and are eventually
added to the solution), and the problem is then
solved recursively on the residual graph, by
solving (LP) on \(G^* = (V, E \setminus E^*)\), where for
each subset \(S \subseteq V\), the new requirement is
\(f(S) - |\delta(S) \cap E^*|\). The main observation that
leads to factor-2 approximation is the following:
if \(E'\) is a 2-approximation for the residual
problem, then \(E' \cup E^*\) is a 2-approximation
for the original problem.
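
The observation can be spelled out as a short calculation. This is a sketch under one stated assumption: the guarantee for the residual problem is measured against the optimal value of the residual linear program, written here as LP_res (a name introduced only for this illustration). Let x be the basic optimal solution of (LP) and LP its value.

```latex
\begin{align*}
% Every rounded edge has x_e >= 1/2, so its cost is at most twice its LP cost.
c(E^*) &\le 2 \sum_{e \in E^*} c(e)\, x_e, \\
% x restricted to E \setminus E^* is feasible for the residual LP, because
% \sum_{e \in \delta(S) \setminus E^*} x_e \ge f(S) - |\delta(S) \cap E^*|.
\mathrm{LP}_{\mathrm{res}} &\le \sum_{e \in E \setminus E^*} c(e)\, x_e, \\
% Combine with the assumed guarantee c(E') <= 2 LP_res.
c(E' \cup E^*) &\le 2 \sum_{e \in E \setminus E^*} c(e)\, x_e
                 + 2 \sum_{e \in E^*} c(e)\, x_e
                 = 2\,\mathrm{LP} \le 2\,\mathrm{OPT}.
\end{align*}
```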

Given any solution to (LP), a set \(S \subseteq V\) is
called tight iff constraint (1) holds with equality
for S. The proof of Theorem 1 involves constructing
a large laminar family of tight sets
(a family where for every pair of sets, either
one set contains the other, or the two sets are
disjoint). After that, a clever accounting scheme
that charges edges to the sets of the laminar
family is used to show that there is at least one
edge \(e \in E\) with \(x_e \ge 1/2\).


Applications 


Generalized Steiner network is a very basic and 
natural network design problem that has many 
applications in different areas, including the de- 
sign of communication networks, VLSI design 
and vehicle routing. One example is the design 
of survivable communication networks, which 
remain functional even after the failure of some 
network components (see [5] for more details). 


Open Problems 


The 2-approximation algorithm of Jain [6] for 
generalized Steiner network is based on LP- 
rounding, and it has high running time. It 
would be interesting to design a combinatorial 
approximation algorithm for this problem. 

It is not known whether a better approximation
is possible for generalized Steiner network. Very
few hardness of approximation results are known
for this type of problem. The best current hardness
factor stands at 1.01063 [2], and this result
is valid even for the special case of Steiner tree.
Problem Definition 


In the generalized two-server problem, we are 
given two servers: one moving in a metric space 
X and one moving in a metric space Y. They are 
to serve requests \(r \in X \times Y\), which arrive one by
one. A request r = (x,y) is served by moving 
either the X-server to point x or the Y-server 
to point y. The decision as to which server to 
move to the next request is irrevocable and has 
to be taken without any knowledge about future 
requests. The objective is to minimize the total 
distance traveled by the two servers (Fig. 1). 
Generalized Two-Server Problem, Fig. 1 In this
example, both servers move in the plane and start from
the configuration \((x_0, y_0)\). The X-server moves through
requests 1 and 3, and the Y-server takes care of requests
2 and 4. The cost of this solution is the sum of the
path lengths


Online Routing Problems 

The generalized two-server problem belongs to a 
class of routing problems called metrical service 
systems [4, 10]. Such a system is defined by a 
metric space M of all possible system
configurations, an initial configuration \(C_0\), and a set R of
possible requests, where each request \(r \in R\) is
a subset of M. Given a sequence \(r_1, r_2, \dots, r_n\)
of requests, a feasible solution is a sequence
\(C_1, C_2, \dots, C_n\) of configurations such that
\(C_i \in r_i\) for all \(i \in \{1, \dots, n\}\).

When we model the generalized two-server
problem as a metrical service system we have
\(M = X \times Y\) and
\(R = \{(\{x\} \times Y) \cup (X \times \{y\}) \mid x \in X, y \in Y\}\).
In the classical two-server problem,
both servers move in the same space and receive
the same requests, that is, \(M = X \times X\) and
\(R = \{(\{x\} \times X) \cup (X \times \{x\}) \mid x \in X\}\).

The performance of algorithms for online
optimization problems is often measured using
competitive analysis. We say that an algorithm is
\(\alpha\)-competitive (\(\alpha \ge 1\)) for some minimization
problem if for every possible instance the cost
of the algorithm’s solution is at most \(\alpha\) times the
cost of an optimal solution for the instance.

A standard algorithm that performs provably
well for several elementary routing problems is
the so-called work function algorithm [2, 5, 8];
after each request, the algorithm moves to a
configuration with low cost and which is not too far
from the current configuration. More precisely,
if the system’s configuration after serving a
sequence \(\sigma\) is C and \(r \subseteq M\) is the next request, then
the work function algorithm with parameter \(\lambda \ge 1\)
moves to a configuration \(C' \in r\) that minimizes

\[ \lambda\, W_{\sigma,r}(C') + d(C, C'), \]

where \(d(C, C')\) is the distance between
configurations C and \(C'\), and \(W_{\sigma,r}(C')\) is the cost of an
optimal solution that serves all requests (in order)
in \(\sigma\) plus request r with the restriction that it ends
in configuration \(C'\).
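
The rule above can be sketched in a few lines for a toy instance. The following is a minimal illustration, not the implementation analyzed in [11], under assumptions made only for this example: both servers live on the same finite set of integer points, the distance between configurations is the sum of the distances moved by the two servers, and the work function is maintained by brute-force dynamic programming over the configurations of the latest request. The names `dist`, `request_set`, and `work_function_algorithm` are hypothetical.

```python
def dist(c1, c2):
    """Distance between configurations: total movement of both servers."""
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1])


def request_set(req, points):
    """Configurations serving req = (x, y): X-server at x or Y-server at y."""
    x, y = req
    return {(x, b) for b in points} | {(a, y) for a in points}


def work_function_algorithm(start, requests, points, lam=1.0):
    """Serve requests online; return (final configuration, online cost, offline optimum)."""
    work = {start: 0.0}          # work function on the configurations reached so far
    current, total = start, 0.0
    for req in requests:
        candidates = request_set(req, points)
        # W_{sigma,r}(C'): cheapest way to serve everything so far and end in C'.
        new_work = {
            c2: min(w + dist(c1, c2) for c1, w in work.items())
            for c2 in candidates
        }
        # Move to the candidate minimizing lam * W(C') + d(C, C').
        nxt = min(sorted(candidates),
                  key=lambda c: lam * new_work[c] + dist(current, c))
        total += dist(current, nxt)
        current, work = nxt, new_work
    return current, total, min(work.values())
```

The third returned value is the cost of an optimal offline solution, which is simply the minimum of the final work function, so the two costs can be compared directly on small examples.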


Key Results 


The main result in [11] is a sufficient condition 
for a metrical service system to have a constant- 
competitive algorithm. Additionally, the authors 
show that this condition holds for the generalized 
two-server problem. 

For a fixed metrical service system S with
metric space M, denote by \(A(C, \sigma)\) the cost of
algorithm A on input sequence \(\sigma\), starting in
configuration C. Let \(\mathrm{OPT}(C, \sigma)\) be the cost of the
corresponding optimal solution. We say that a
path T in M serves a sequence \(\sigma\) if it visits all
requests in order. Hence, a feasible path is a path
that serves the sequence and starts in the initial
configuration.

Paths \(T_1\) and \(T_2\) are said to be independent if
they are far apart in the following way:
\(|T_1| + |T_2| < d(C_1^s, C_2^t) + d(C_2^s, C_1^t)\), where \(C_i^s\) and
\(C_i^t\) are, respectively, the start and end point of
path \(T_i\) (\(i \in \{1, 2\}\)). Notice, for example, that two
intersecting paths are not independent.


Theorem 1 Let S be a metrical service system
with metric space M. Suppose there exists an
algorithm A and constants \(\alpha \ge 1\), \(\beta \ge 0\), and
\(m \ge 2\) such that for any point \(C \in M\), sequence
\(\sigma\), and pairwise independent paths \(T_1, T_2, \dots, T_m\)
that serve \(\sigma\)

\[ A(C, \sigma) \le \alpha\, \mathrm{OPT}(C, \sigma) + \beta \sum_{i=1}^{m} |T_i|. \qquad (1) \]

Then there exists an algorithm B that is constant
competitive for S.


The proof in [11] of the theorem above pro- 
vides an explicit formulation of B. This algo- 
rithm combines algorithm A with the work func- 
tion algorithm and operates in phases. In each 
phase, it applies algorithm A until its cost be- 
comes too large compared to the optimal cost. 
Then, it makes one step of the work function
algorithm and a new phase starts. In each phase,
algorithm A makes a restart, that is, it takes the final
configuration of the previous phase as the initial 
configuration, whereas the work function algo- 
rithm remembers the whole request sequence. 

For the generalized two-server problem the so- 
called balance algorithm satisfies condition (1). 
This algorithm stores the cumulative costs of the 
two servers and with each request it moves the 
server that minimizes the maximum of the two 
new values. The balance algorithm itself is not 
constant competitive but Theorem 1 says that,
if we combine it in a clever way with the work 
function algorithm, then we get an algorithm that 
is constant competitive. 
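
The balance rule described above is simple enough to state as code. The following is a minimal sketch, assuming both metric spaces are the integer line with absolute-difference distance and that ties are broken in favor of the X-server; both choices, and the function name `balance`, are assumptions made for illustration only.

```python
def balance(start, requests):
    """Balance algorithm for the generalized two-server problem on two lines.

    start is the initial configuration (x0, y0); each request is a pair (x, y).
    Returns the list of servers moved, the final configuration, and the
    cumulative cost of each server.
    """
    x, y = start
    cost_x = cost_y = 0
    moves = []
    for rx, ry in requests:
        new_cost_x = cost_x + abs(x - rx)   # cumulative X-cost if X serves
        new_cost_y = cost_y + abs(y - ry)   # cumulative Y-cost if Y serves
        # Move the server for which the larger of the two cumulative costs
        # after the move is smallest (ties go to the X-server).
        if max(new_cost_x, cost_y) <= max(cost_x, new_cost_y):
            x, cost_x = rx, new_cost_x
            moves.append("X")
        else:
            y, cost_y = ry, new_cost_y
            moves.append("Y")
    return moves, (x, y), (cost_x, cost_y)
```

Starting from (0, 0) with requests (1, 5) and then (4, 1), the X-server takes the first request (cost 1 versus 5) and the Y-server takes the second, leaving both cumulative costs at 1.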


Applications 


A set of metrical service systems can be com- 
bined to get what is called in [9] the sum system. 
A request of the sum system consists of one 
request for each system, and to serve it we need 
to serve at least one of the individual requests. 
The generalized two-server problem should be 
considered as one of the simplest sum systems 
since the two individual problems are completely 
trivial: There is one server and each request 
consists of a single point. 

Sum systems are particularly interesting to 
model systems for information storage and re- 
trieval. To increase stability or efficiency, one 
may store copies of the same information in 
multiple systems (e.g., databases, hard disks). To 
retrieve one piece of information, we may read it 
from any system. However, to read information it 
may be necessary to change the configuration of 
the system. For example, if the database is stored 
in a binary search tree, then it is efficient to make 
online changes to the structure of the tree, that is, 
to use dynamic search trees [12]. 


Open Problems


A proof that the work function algorithm is
competitive for the generalized two-server problem
(as conjectured in [9] and [11]) is still lacking.
Also, a randomized algorithm with a smaller
competitive ratio than that of [11] is not known.
No results (except for a lower bound) are known 
for the generalized problem with more than two 
servers. It is not even clear if the work function 
algorithm may be competitive here. 

There are systems for which the work function 
algorithm is not competitive. It would be inter- 
esting to have a nontrivial property that implies 
competitiveness of the work function algorithm. 
Problem Definition 


Auctions are used for allocating goods, tasks, re- 
sources, etc. Participants in an auction include an 
auctioneer (usually a seller) and bidders (usually 
buyers). An auction has well-defined rules that 
enforce an agreement between the auctioneer and 
the winning bidder. Auctions are often used when 
a seller has difficulty in estimating the value of an 
auctioned good for buyers. 

The Generalized Vickrey Auction protocol 
(GVA) [5] is an auction protocol that can 
be used for combinatorial auctions [3] in 
which multiple items/goods are sold simul- 
taneously. Although conventional auctions 
sell a single item at a time, combinatorial 
auctions sell multiple items/goods. These 
goods may have interdependent values, e.g., 
these goods are complementary/substitutable 
and bidders can bid on any combination of 
goods. In a combinatorial auction, a bidder can 
express complementary/substitutable preferences 
over multiple bids. By taking into account 
complementary/substitutable preferences, the 
participants’ utilities and the revenue of the seller 
can be increased. The GVA is one instance of 
the Clarke mechanism [2, 4]. It is also called the 
Vickrey-Clarke-Groves mechanism (VCG). As
its name suggests, it is a generalized version of
the well-known Vickrey (or second-price) auction 
protocol [6], proposed by an American economist 
W. Vickrey, a 1996 Nobel Prize winner. 

Assume there is a set of bidders \(N = \{1, 2, \dots, n\}\)
and a set of goods \(M = \{1, 2, \dots, m\}\).
Each bidder i has his/her preferences over a
bundle, i.e., a subset of goods \(B \subseteq M\). Formally, this
can be modeled by supposing that bidder i
privately observes a parameter, or signal, \(\theta_i\), which
determines his/her preferences. The parameter \(\theta_i\)
is called the type of bidder i. A bidder is assumed
to have a quasilinear, private value defined as
follows.


Definition 1 (Utility of a Bidder) The utility of
bidder i, when i obtains \(B \subseteq M\) and pays \(p_i\), is
represented as \(v(B, \theta_i) - p_i\).


Here, the valuation of a bidder is determined 
independently of other bidders’ valuations. Also, 
the utility of a bidder is linear in terms of the 
payment. Thus, this model is called a quasilinear, 
private value model. 


Definition 2 (Incentive Compatibility) An 
auction protocol is (dominant-strategy) incentive 
compatible (or strategy-proof) if declaring 
the true type/evaluation values is a dominant 
strategy for each bidder, i.e., an optimal strategy 
regardless of the actions of other bidders. 


A combination of dominant strategies of all bid- 
ders is called a dominant-strategy equilibrium. 


Definition 3 (Individual Rationality) An auc- 
tion protocol is individually rational if no par- 
ticipant suffers any loss in a dominant-strategy 
equilibrium, i.e., the payment never exceeds the 
evaluation value of the obtained goods. 


Definition 4 (Pareto Efficiency) An auction 
protocol is Pareto efficient when the sum of 
all participants’ utilities (including that of the 
auctioneer), i.e., the social surplus, is maximized 
in a dominant-strategy equilibrium. 


The goal is to design an auction protocol that is 
incentive compatible, individually rational, and 
Pareto efficient. It is clear that individual rationality
and Pareto efficiency are desirable. Regarding
the incentive compatibility, the revelation princi- 
ple states that in the design of an auction protocol, 
it is possible to restrict attention only to incentive 
compatible protocols without loss of general- 
ity [4]. In other words, if a certain property (e.g., 
Pareto efficiency) can be achieved using some 
auction protocol in a dominant-strategy equilib- 
rium, then the property can also be achieved using 
an incentive-compatible auction protocol. 


Key Results 


A feasible allocation is defined as a vector of
n bundles \(B = (B_1, \dots, B_n)\), where \(\bigcup_{j \in N} B_j \subseteq M\)
and for all \(j \ne j'\), \(B_j \cap B_{j'} = \emptyset\) hold.

The GVA protocol can be described as 
follows. 


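1. Each bidder i declares his/her type \(\hat{\theta}_i\), which
can be different from his/her true type \(\theta_i\).

2. The auctioneer chooses an optimal allocation
\(B^*\) according to the declared types. More
precisely, the auctioneer chooses \(B^*\) defined
as follows:

\[ B^* = \arg\max_{B} \sum_{j \in N} v(B_j, \hat{\theta}_j). \]

3. Each bidder i pays \(p_i\), which is defined as
follows (\(B_j^{\sim i}\) and \(B_j^*\) are the jth element of
\(B^{\sim i}\) and \(B^*\), respectively):

\[ p_i = \sum_{j \in N \setminus \{i\}} v(B_j^{\sim i}, \hat{\theta}_j) - \sum_{j \in N \setminus \{i\}} v(B_j^*, \hat{\theta}_j), \qquad (1) \]

where \(B^{\sim i} = \arg\max_{B} \sum_{j \in N \setminus \{i\}} v(B_j, \hat{\theta}_j)\).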


The first term in Eq. (1) is the social surplus when 
bidder i does not participate. The second term is 
the social surplus except bidder i when i does 
participate. In the GVA, the payment of bidder i 
can be considered as the decreased amount of 
the other bidders’ social surplus resulting from 
his/her participation. 

A description of how this protocol works is 
given below. 


Example 1 Assume there are two goods a and b,
and three bidders, 1, 2, and 3, whose types are
\(\theta_1\), \(\theta_2\), and \(\theta_3\), respectively. The evaluation value
for a bundle \(v(B, \theta_i)\) is determined as follows.


            {a}   {b}   {a, b}
\(\theta_1\)    $6    $0    $6
\(\theta_2\)    $0    $0    $8
\(\theta_3\)    $0    $5    $5

Here, bidder 1 wants good a only, and bidder 3 
wants good b only. Bidder 2’s utility is all-or- 
nothing, i.e., he/she wants both goods at the same 
time and having only one good is useless. 


Assume each bidder i declares his/her true
type \(\theta_i\). The optimal allocation is to allocate
good a to bidder 1 and b to bidder 3, i.e.,
\(B^* = (\{a\}, \{\}, \{b\})\). The payment of bidder 1
is calculated as follows. If bidder 1 does not
participate, the optimal allocation would have
been allocating both items to bidder 2, i.e.,
\(B^{\sim 1} = (\{\}, \{a, b\}, \{\})\), and the social surplus,
i.e., \(\sum_{j \in N \setminus \{1\}} v(B_j^{\sim 1}, \theta_j)\), is equal to $8. When
bidder 1 does participate, bidder 3 obtains {b},
and the social surplus except for bidder 1, i.e.,
\(\sum_{j \in N \setminus \{1\}} v(B_j^*, \theta_j)\), is $5. Therefore, bidder 1
pays the difference $8 − $5 = $3. The obtained
utility of bidder 1 is $6 − $3 = $3. The payment
of bidder 3 is calculated as $8 − $6 = $2.
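
The computation in Example 1 can be reproduced with a small brute-force sketch of the GVA. This is an illustration only, not an implementation from the cited literature: it enumerates every assignment of goods to bidders (so it is exponential in the number of goods), and the names `best_allocation` and `gva` as well as the dictionary-based representation of valuations are assumptions of the sketch.

```python
from itertools import product


def best_allocation(goods, valuations, bidders):
    """Return (max social surplus, allocation) over all assignments of goods
    to the given bidders; a good may also stay unallocated (owner None)."""
    best_value, best_alloc = float("-inf"), None
    for owners in product([None] + list(bidders), repeat=len(goods)):
        alloc = {i: frozenset(g for g, o in zip(goods, owners) if o == i)
                 for i in bidders}
        value = sum(valuations[i](alloc[i]) for i in bidders)
        if value > best_value:
            best_value, best_alloc = value, alloc
    return best_value, best_alloc


def gva(goods, valuations):
    """Compute the GVA allocation and payments from declared valuations."""
    bidders = list(valuations)
    _, alloc = best_allocation(goods, valuations, bidders)
    payments = {}
    for i in bidders:
        others = [j for j in bidders if j != i]
        # First term of Eq. (1): surplus of the others when i is absent.
        surplus_without_i, _ = best_allocation(goods, valuations, others)
        # Second term: surplus of the others under the chosen allocation.
        surplus_others = sum(valuations[j](alloc[j]) for j in others)
        payments[i] = surplus_without_i - surplus_others
    return alloc, payments


# Declared (here: true) valuations of Example 1; unlisted bundles are worth 0.
table = {
    1: {frozenset("a"): 6, frozenset("b"): 0, frozenset("ab"): 6},
    2: {frozenset("a"): 0, frozenset("b"): 0, frozenset("ab"): 8},
    3: {frozenset("a"): 0, frozenset("b"): 5, frozenset("ab"): 5},
}
valuations = {i: (lambda bundle, t=t: t.get(bundle, 0)) for i, t in table.items()}
allocation, payments = gva(["a", "b"], valuations)
```

With these inputs, bidder 1 receives {a} and pays 3, bidder 3 receives {b} and pays 2, and bidder 2 receives nothing and pays 0, matching the amounts derived in the example.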


Generalized Vickrey Auction, Fig. 1 Utilities and
payments in the GVA (figure label: Utility ($3))
The intuitive explanation of why truth telling
is the dominant strategy in the GVA is as follows.
In the GVA, goods are allocated so that the social
surplus is maximized. In general, maximizing the
utility of society as a whole does not necessarily
maximize the utility of each participant. Therefore,
each participant might have an incentive to
lie if the group decision is made so that the
social surplus is maximized.

However, the payment of each bidder in the
GVA is cleverly determined so that the utility of
each bidder is maximized when the social surplus
is maximized. Figure 1 illustrates the relationship
between the payment and utility of bidder 1 in
Example 1. The payment of bidder 1 is defined
as the difference between the social surplus when
bidder 1 does not participate (i.e., the length of
the upper shaded bar) and the social surplus
except bidder 1 when bidder 1 does participate (the
length of the lower black bar), i.e., $8 − $5 = $3.

On the other hand, the utility of bidder 1 
is the difference between the evaluation value 
of the obtained item and the payment, which 
equals $6 — $3 = $3. This amount is equal to 
the difference between the total length of the 
lower bar and the upper bar. Since the length 
of the upper bar is determined independently of 
bidder 1’s declaration, bidder 1 can maximize 
his/her utility by maximizing the length of the 
lower bar. However, the length of the lower bar 
represents the social surplus. Thus, bidder 1 can
maximize his/her utility when the social surplus 
is maximized. Therefore, bidder 1 does not have 
an incentive for lying since the group decision is 
made so that the social surplus is maximized. 


[Figure 1 labels: Social surplus when bidder 1 does not participate ($8); Payment ($3); Bidder 1's evaluation value ($6); Social surplus except bidder 1 ($5); Social surplus ($11)]
Theorem 1 The GVA is incentive compatible.


Proof Since the utility of bidder i is assumed to
be quasilinear, it can be represented as

\[ v(B_i^*, \theta_i) - p_i = v(B_i^*, \theta_i) - \Big[ \sum_{j \in N \setminus \{i\}} v(B_j^{\sim i}, \hat{\theta}_j) - \sum_{j \in N \setminus \{i\}} v(B_j^*, \hat{\theta}_j) \Big] \]
\[ = \Big[ v(B_i^*, \theta_i) + \sum_{j \in N \setminus \{i\}} v(B_j^*, \hat{\theta}_j) \Big] - \sum_{j \in N \setminus \{i\}} v(B_j^{\sim i}, \hat{\theta}_j). \qquad (2) \]


The second term in Eq. (2) is determined
independently of bidder i’s declaration. Thus,
bidder i can maximize his/her utility by maximizing
the first term. However, \(B^*\) is chosen so that
\(\sum_{j \in N} v(B_j^*, \hat{\theta}_j)\) is maximized. Therefore,
bidder i can maximize his/her utility by declaring
\(\hat{\theta}_i = \theta_i\), i.e., by declaring his/her true type. □


Theorem 2 The GVA is individually rational. 


Proof This is clear from Eq. (2), since the first
term is always larger than (or at least equal to)
the second term. □


Theorem 3 The GVA is Pareto efficient. 


Proof From Theorem 1, truth telling is
a dominant-strategy equilibrium. From the way
of choosing the allocation, the social surplus
is maximized if all bidders declare their true
types. □


Applications 


The GVA can be applied to combinatorial auc- 
tions, which have lately attracted considerable 
attention [3]. The US Federal Communications
Commission has been conducting auctions for
allocating spectrum rights. Clearly, there exist
interdependencies among the values of spectrum
rights. For example, a bidder may desire licenses
for adjoining regions simultaneously, i.e., these
licenses are complementary. Thus, spectrum
auctions are a promising application field of
combinatorial auctions and have been a major driving
force for activating the research on combinatorial
auctions.


Open Problems 


Although the GVA has these good characteris- 
tics (Pareto efficiency, incentive compatibility, 
and individual rationality), these characteristics 
cannot be guaranteed when bidders can submit 
false-name bids. Furthermore, [1] pointed out 
several other limitations such as vulnerability to 
the collusion of the auctioneer and/or losers. 

Also, to execute the GVA, the auctioneer must 
solve a complicated optimization problem. Var- 
ious studies have been conducted to introduce 
search techniques, which were developed in the 
artificial intelligence literature, for solving this 
optimization problem [3]. 
Problem Definition 


Geographic routing is a type of routing 
particularly well suited for dynamic ad hoc 
networks. Sometimes also called directional, 
geometric, location-based, or position-based 
routing, it is based on two principal assumptions. 
First, it is assumed that every node knows its own 
and its network neighbors’ positions. Second, the 
source of a message is assumed to be informed 
about the position of the destination. Geographic 
routing is defined on a Euclidean graph, that 
is a graph whose nodes are embedded in the 
Euclidean plane. Formally, geographic ad hoc 
routing algorithms can be defined as follows: 


Definition 1 (Geographic Ad Hoc Routing
Algorithm) Let \(G = (V, E)\) be a Euclidean graph.
The task of a geographic ad hoc routing algorithm
A is to transmit a message from a source \(s \in V\)
to a destination \(t \in V\) by sending packets over the
edges of G while complying with the following
conditions:


• All nodes \(v \in V\) know their geographic
positions as well as the geographic positions of all
their neighbors in G.

• The source s is informed about the position of
the destination t.

• The control information which can be stored
in a packet is limited by \(O(\log n)\) bits, that is,
only information about a constant number of
nodes is allowed.

• Except for the temporary storage of packets
before forwarding, a node is not allowed to
maintain any information.


Geographic routing is particularly interesting, as 
it operates without any routing tables whatsoever. 
Furthermore, once the position of the destination 
is known, all operations are strictly local, that is, 
every node is required to keep track only of its 
direct neighbors. These two factors — absence of 
necessity to keep routing tables up to date and 
independence of remotely occurring topology 
changes — are among the foremost reasons why 
geographic routing is exceptionally suitable for 
operation in ad hoc networks. Furthermore, in 
a sense, geographic routing can be considered 
a lean version of source routing appropriate 
for dynamic networks: While in source routing 
the complete hop-by-hop route to be followed 
by the message is specified by the source, in 
geographic routing the source simply addresses 
the message with the position of the destination. 
As the destination can generally be expected 
to move slowly compared to the frequency 
of topology changes between the source and 
the destination, it makes sense to keep track 
of the position of the destination instead of 
maintaining network topology information up 
to date; if the destination does not move too 
fast, the message is delivered regardless of 
possible topology changes among intermediate 
nodes. 

The cost bounds presented in this entry are 
achieved on unit disk graphs. A unit disk graph 
is defined as follows: 


Definition 2 (Unit Disk Graph) Let $V \subset \mathbb{R}^2$ be
a set of points in the 2-dimensional plane. The
graph with edges between all nodes with distance
at most 1 is called the unit disk graph of $V$.

Unit disk graphs are often employed to model
wireless ad hoc networks.

The routing algorithms considered in this en- 
try operate on planar graphs, graphs that contain 
no two intersecting edges. There exist strictly 
local algorithms constructing such planar graphs 
given a unit disk graph. The edges of planar 
graphs partition the Euclidean plane into contigu- 
ous areas, so-called faces. The algorithms cited in 
this entry are based on these faces. 
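
As a minimal sketch of the two notions above, the following Python snippet builds the unit disk graph of a small point set and then applies the Gabriel graph test, the planarization technique this entry names in connection with GOAFR+. The function names and the sample coordinates are illustrative assumptions, not part of the cited algorithms. For brevity the test scans all nodes as witnesses; in a unit disk graph any node inside the disk with diameter uv is within distance 1 of both u and v, so checking common neighbors would suffice, which is what makes the rule local.

```python
import math
from itertools import combinations


def unit_disk_graph(points):
    """Return the edge set {(i, j), i < j} of the unit disk graph of `points`."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= 1.0}


def gabriel_subgraph(points, edges):
    """Keep edge (u, v) unless some other node lies strictly inside
    the disk whose diameter is the segment uv."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    kept = set()
    for u, v in edges:
        d_uv = sq_dist(points[u], points[v])
        # w is strictly inside the disk iff |uw|^2 + |vw|^2 < |uv|^2
        blocked = any(
            sq_dist(points[u], points[w]) + sq_dist(points[v], points[w]) < d_uv
            for w in range(len(points)) if w != u and w != v)
        if not blocked:
            kept.add((u, v))
    return kept


# Hypothetical example: a kite whose two diagonals cross in the unit disk graph.
points = [(0.0, 0.0), (0.9, 0.0), (0.45, 0.3), (0.45, -0.3)]
udg = unit_disk_graph(points)
planar = gabriel_subgraph(points, udg)
```

In this example the long diagonal (0, 1) is dropped because nodes 2 and 3 lie inside its diameter disk, which removes the only crossing.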


Key Results 


The first geographic routing algorithm shown to 
always reach the destination was Face Routing 
introduced in [14]. 


Theorem 1 If the source and the destination are
connected, Face Routing executed on an arbitrary
planar graph always finds a path to the
destination. It thereby takes at most $O(n)$ steps,
where $n$ is the total number of nodes in the
network.


There exists, however, a geographic routing algorithm
whose cost is bounded not only with respect
to the total number of nodes, but in relation to
the shortest path between the source and the
destination: The GOAFR+ algorithm [15, 16,
18, 24] (pronounced “gopher-plus”) combines
greedy routing, where every intermediate node
relays the message to be routed to its neighbor
located nearest to the destination, with face
routing. Together with the locally computable
Gabriel Graph planarization technique, the effort
expended by the GOAFR+ algorithm is bounded
as follows:


Theorem 2 Let $c$ be the cost of an optimal path
from $s$ to $t$ in a given unit disk graph. GOAFR+
reaches $t$ with cost $O(c^2)$ if $s$ and $t$ are connected.
If $s$ and $t$ are not connected, GOAFR+ reports so
to the source.
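
The greedy component described above can be sketched as follows. This is only the greedy forwarding rule, under the assumption that each node is given as an index into a coordinate list and that `neighbors` is a hypothetical adjacency map; the face routing phase that GOAFR+ falls back to at a local minimum is not implemented here.

```python
import math


def greedy_route(points, neighbors, s, t):
    """Forward greedily from s toward t.

    Returns (path, reached). `reached` is False when a node has no
    neighbor strictly closer to t than itself (a local minimum).
    """
    path = [s]
    current = s
    while current != t:
        best = min(neighbors[current],
                   key=lambda v: math.dist(points[v], points[t]),
                   default=None)
        stuck = (best is None or
                 math.dist(points[best], points[t])
                 >= math.dist(points[current], points[t]))
        if stuck:
            # GOAFR+ would switch to face routing at this point.
            return path, False
        path.append(best)
        current = best
    return path, True


# Hypothetical examples: a line of three nodes, and a dead end.
line_pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
line_nbrs = {0: [1], 1: [0, 2], 2: [1]}
ok_path, ok = greedy_route(line_pts, line_nbrs, 0, 2)

dead_pts = [(0.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
dead_nbrs = {0: [1], 1: [0], 2: []}
dead_path, dead_ok = greedy_route(dead_pts, dead_nbrs, 0, 2)
```

The loop terminates because the distance to the destination strictly decreases with every forwarded hop.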


On the other hand, it can be shown that, on
certain worst-case graphs, no geographic routing
algorithm operating in compliance with the above
definition can perform asymptotically better than
GOAFR+:


Theorem 3 There exist graphs where any deterministic
(randomized) geographic ad hoc routing
algorithm has (expected) cost $\Omega(c^2)$.


This leads to the following conclusion: 


Theorem 4 The cost expended by GOAFR+ to
reach the destination on a unit disk graph is
asymptotically optimal.


In addition, it has been shown that the GOAFR+
algorithm is not only guaranteed to have low
worst-case cost but that it also performs well
in average-case networks with nodes randomly
placed in the plane [15, 24].


Applications 


By its strictly local nature, geographic routing is
particularly well suited for application in potentially
highly dynamic wireless ad hoc networks.
Its use in dynamic networks in general is also
conceivable.


Open Problems 


A number of problems related to geographic
routing remain open. This is true above all for
two issues: the dissemination within the network
of information about the destination position, and
the handling of node mobility and network dynamics.
Various approaches to these problems have been
described in [7] as well as in chapters 11 and 12
of [24]. More generally, taking geographic routing
one step further towards its application in practical
wireless ad hoc networks [12, 13] is a field yet
largely open. A more specific open problem is
whether geographic routing can be adapted to
networks with nodes embedded in
three-dimensional space.


Experimental Results 


First practical experience with geographic routing,
and face routing in particular, has been gathered
in real networks [12, 13]. More specifically, problems
in connection with graph planarization that can
occur in practice were observed, documented,
and tackled.


Problem Definition


The central problem of private data analysis is to 
extract meaningful information from a statistical 
database without revealing too much about any 
particular individual represented in the database. 
Here, by a statistical database, we mean a multiset
$D \in \mathcal{X}^n$ of $n$ rows from the data universe
$\mathcal{X}$. The notation $|D| \triangleq n$ denotes the size of
the database. Each row represents the information
belonging to a single individual. The universe $\mathcal{X}$
depends on the domain. A natural example to
keep in mind is $\mathcal{X} = \{0, 1\}^d$, i.e., each row of the
database gives the values of $d$ binary attributes
for some individual.

Differential privacy formalizes the notion that 
an adversary should not learn too much about any 
individual as a result of a private computation. 
The formal definition follows.

Definition 1 ([8]) A randomized algorithm $\mathcal{A}$
satisfies $(\varepsilon, \delta)$-differential privacy if for any two
databases $D$ and $D'$ that differ in at most a single
row (i.e., $|D \triangle D'| \le 1$), and any measurable
event $S$ in the range of $\mathcal{A}$,


$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D') \in S] + \delta.$$


Above, probabilities are taken over the internal
coin tosses of $\mathcal{A}$.


Differential privacy guarantees to a data owner 
that allowing her data to be used for analysis does 
not risk much more than she would if she did not 
allow her data to be used. 

In the sequel, we shall call databases $D$ and $D'$
that differ in a single row neighboring databases,
denoted $D \sim D'$. Usually, the parameter $\varepsilon$ is
set to be a small constant so that $e^{\varepsilon} \approx 1 + \varepsilon$,
and $\delta$ is set to be no bigger than $n^{-2}$ or even
$n^{-\omega(1)}$. The case of $\delta = 0$ often requires different
techniques from the case $\delta > 0$; as is common
in the literature, we shall call the two cases pure
differential privacy and approximate differential
privacy.


Query Release 

In the query release problem, we are given a set
$Q$ of queries, where each $q \in Q$ is a function
$q : \mathcal{X}^n \to \mathbb{R}$. Our goal is to design a differentially
private algorithm $\mathcal{A}$ which takes as input a
database $D$ and outputs a list of answers to the
queries in $Q$. We shall call such an algorithm a
(query answering) mechanism. Here, we treat the
important special case of query release for sets
of linear queries. A linear query $q$ is specified
by a function $q : \mathcal{X} \to [-1, 1]$, and, slightly
abusing notation, we define the value of the query
as $q(D) \triangleq \sum_{e \in D} q(e)$. When $q : \mathcal{X} \to \{0, 1\}$ is
a predicate, $q(D)$ is a counting query: it simply
counts the number of rows of $D$ that satisfy the
predicate.

It is easy to see that a differentially private
algorithm (with any reasonable choice of $\varepsilon$ and $\delta$)
cannot answer a nontrivial set of queries exactly.
For this reason, we need to have a measure of
error, and here we introduce the two most commonly
used ones: average and worst-case error.
Assume that on an input database $D$ and a set of
linear queries $Q$, the algorithm $\mathcal{A}$ gives answer
$q_{\mathcal{A}}(D)$ for query $q \in Q$. The average error of $\mathcal{A}$
on the query set $Q$ for databases of size at most $n$
is equal to


$$\mathrm{err}_{avg}(\mathcal{A}, Q, n) \triangleq \max_{D : |D| \le n} \left( \mathbb{E}\, \frac{1}{|Q|} \sum_{q \in Q} |q_{\mathcal{A}}(D) - q(D)|^2 \right)^{1/2}.$$


The worst-case error is equal to


$$\mathrm{err}_{wc}(\mathcal{A}, Q, n) \triangleq \max_{D : |D| \le n} \mathbb{E} \max_{q \in Q} |q_{\mathcal{A}}(D) - q(D)|.$$


In both definitions above, expectations are
taken over the coin throws of $\mathcal{A}$. We also
define $\mathrm{err}_{avg}(\mathcal{A}, Q) = \sup_n \mathrm{err}_{avg}(\mathcal{A}, Q, n)$ and,
respectively, $\mathrm{err}_{wc}(\mathcal{A}, Q) = \sup_n \mathrm{err}_{wc}(\mathcal{A}, Q, n)$
to be the maximum error over all database sizes.
The objective in the query release problem is to
minimize error subject to privacy constraints.


Marginal Queries 

An important class of counting queries are the
marginal queries. A $k$-way marginal query
$\mathrm{mar}_{S,a} : \{0,1\}^d \to \{0,1\}$ is specified by a
subset of attributes $S \subseteq \{1, \ldots, d\}$ of size $k$ and
a vector $a \in \{0,1\}^S$. The query evaluates to 1
on those rows that agree with $a$ on all attributes
in $S$, i.e., $\mathrm{mar}_{S,a}(\chi) = \bigwedge_{i \in S} (\chi_i = a_i)$ for any
$\chi \in \{0,1\}^d$. Recall that, using the notation we
introduced above, this implies that $\mathrm{mar}_{S,a}(D)$
counts the number of rows in the database $D$
that agree with $a$ on $S$. Marginal queries capture
contingency tables in statistics and OLAP cubes
in databases. They are widely used in the sciences
and are released by a number of official agencies.
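
To make the definition concrete, here is a small Python sketch of a marginal query evaluated as a counting query. The helper names, the 0-based attribute indices, and the toy database are assumptions made for illustration only.

```python
def marginal_query(S, a):
    """Return the predicate mar_{S,a}: 1 if a row agrees with `a`
    on every attribute index in S (0-based here), else 0."""
    def q(row):
        return int(all(row[i] == a[i] for i in S))
    return q


def count(q, database):
    """Value of a counting query: number of rows satisfying the predicate."""
    return sum(q(row) for row in database)


# Toy database with d = 3 binary attributes per row.
database = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]
# 2-way marginal on attributes 0 and 2, asking for both to equal 1.
q = marginal_query(S=(0, 2), a={0: 1, 2: 1})
answer = count(q, database)
```

Here two of the three rows agree with `a` on both selected attributes, so the query value is 2.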


Matrix Notation 

It will be convenient to encode the query release
problem for linear queries using matrix notation.
A common and very useful representation of a
database $D \in \mathcal{X}^n$ is the histogram representation:
the histogram of $D$ is a vector $x \in \mathbb{P}^{\mathcal{X}}$ ($\mathbb{P}$ is
the set of nonnegative integers) such that for any
$\chi \in \mathcal{X}$, $x_\chi$ is equal to the number of copies of
$\chi$ in $D$. Notice that $\|x\|_1 = n$ and also that if $x$ and
$x'$ are, respectively, the histograms of two neighboring
databases $D$ and $D'$, then $\|x - x'\|_1 \le 1$
(here $\|x\|_1 = \sum_{\chi \in \mathcal{X}} |x_\chi|$ is the standard $\ell_1$ norm).
Linear queries are a linear transformation of $x$.
More concretely, let us define the query matrix
$A \in [-1,1]^{Q \times \mathcal{X}}$ associated with a set of linear
queries $Q$ by $a_{q,\chi} = q(\chi)$. Then it is easy to
see that the vector $Ax$ gives the answers to the
queries $Q$ on a database $D$ with histogram $x$.
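
The matrix encoding can be checked on a tiny example. The following sketch uses plain Python lists; the universe ordering, the three sample queries, and all function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def histogram(database, universe):
    """x[chi] = number of copies of universe element chi in the database."""
    counts = Counter(database)
    return [counts[chi] for chi in universe]


def query_matrix(queries, universe):
    """Rows indexed by queries, columns by universe elements: a[q][chi] = q(chi)."""
    return [[q(chi) for chi in universe] for q in queries]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


universe = list(product((0, 1), repeat=2))   # {0,1}^2 in a fixed order
database = [(0, 1), (1, 1), (1, 1)]          # n = 3 rows
queries = [
    lambda chi: chi[0],            # first attribute is 1
    lambda chi: chi[1],            # second attribute is 1
    lambda chi: chi[0] * chi[1],   # both attributes are 1
]

x = histogram(database, universe)
A = query_matrix(queries, universe)
answers = matvec(A, x)
direct = [sum(q(row) for row in database) for q in queries]
```

The product `A x` coincides with the directly computed counts, and the histogram entries sum to the database size.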


Key Results 


A central object of study in geometric approaches 
to the query release problem is a convex body 
associated with a set of linear queries. Before 
introducing some of the main results and algo- 
rithms, we define this body. 


The Sensitivity Polytope 

Let $A$ be the query matrix for some set of queries
$Q$, and let $x$ and $x'$ be the histograms of two
neighboring databases, respectively, $D$ and $D'$.
Above, we observed that $D \sim D'$ implies that
$\|x - x'\|_1 \le 1$. Let us use the notation
$B_1^{\mathcal{X}} \triangleq \{x : \|x\|_1 \le 1\}$ for the unit ball of the $\ell_1$ norm in $\mathbb{R}^{\mathcal{X}}$.
Then, $Ax - Ax' \in K_Q$, where


$$K_Q \triangleq \{Ax : \|x\|_1 \le 1\} = A \cdot B_1^{\mathcal{X}}$$


is the sensitivity polytope associated with $Q$.
In other words, the sensitivity polytope is the
smallest convex body such that $Ax' \in Ax + K_Q$
for any histogram $x$ and any histogram $x'$ of a
neighboring database. In this sense, $K_Q$ describes
how the answers to the queries $Q$ can change
between neighboring databases, which motivates
the terminology. Informally, a differentially private
algorithm must “hide” where in $Ax + K_Q$
the true query answers are.

Another very useful property of the sensitivity
polytope is that the vector $Ax$ of query answers
for any database of size at most $n$ is contained in
$n \cdot K_Q = \{Ax : \|x\|_1 \le n\}$.

Geometrically, $K_Q$ is a convex polytope in
$\mathbb{R}^Q$, centrally symmetric around 0, i.e., $K_Q = -K_Q$.
It is the convex hull of the points $\{\pm a_\chi : \chi \in \mathcal{X}\}$,
where $a_\chi$ is the column of $A$ indexed by
the universe element $\chi$, i.e., $a_\chi = (q(\chi))_{q \in Q}$.

The sensitivity polytope was introduced by
Hardt and Talwar [12]. The name was suggested
by Li Zhang.


The Generalized Gaussian Mechanism 

We mentioned informally that a differentially
private mechanism must hide where in $Ax + K_Q$
the true query answers lie. A simple formalization
of this intuition is the Gaussian Mechanism,
which we present here in a generalized geometric
variant.

Recall that an ellipsoid in $\mathbb{R}^m$ is an affine
transformation $F \cdot B_2^m + y$ of the unit Euclidean
ball $B_2^m \triangleq \{x \in \mathbb{R}^m : \|x\|_2 \le 1\}$ ($\|x\|_2$ is the
usual Euclidean, i.e., $\ell_2$ norm). In this article, we
will only consider centrally symmetric ellipsoids,
i.e., ellipsoids of the form $E = F \cdot B_2^m$.


Algorithm 1: Generalized Gaussian Mechanism $\mathcal{A}_E$
Input: (Public) Query set $Q$; ellipsoid $E = F \cdot B_2^Q$
such that $K_Q \subseteq E$.
Input: (Private) Database $D$.
Sample a vector $g \sim N(0, c_{\varepsilon,\delta}^2)^Q$, where
$c_{\varepsilon,\delta} = \frac{0.5\sqrt{\varepsilon} + \sqrt{2 \ln(1/\delta)}}{\varepsilon}$;
Compute the query matrix $A$ and the histogram $x$ for
the database $D$;

Output: Vector of query answers $Ax + Fg$.


The generalized Gaussian mechanism $\mathcal{A}_E$
is shown as Algorithm 1. The notation
$g \sim N(0, c_{\varepsilon,\delta}^2)^Q$ means that each coordinate of $g$
is an independent Gaussian random variable
with mean 0 and variance $c_{\varepsilon,\delta}^2$. In the special
case, when the ellipsoid $E$ is just the Euclidean
ball $\Delta_2 \cdot B_2^Q$, with radius equal to the diameter
$\Delta_2 \triangleq \max_{y \in K_Q} \|y\|_2$ of $K_Q$, $\mathcal{A}_E$ is the well-known
Gaussian mechanism, whose privacy was
analyzed in [6-8]. The diameter $\Delta_2$ is also known
as the $\ell_2$-sensitivity of $Q$ and for linear queries
is always upper bounded by $\sqrt{|Q|}$. The privacy
of the generalized version is an easy corollary of
the privacy of the standard Gaussian mechanism
(see [16] for a proof).
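
A minimal Python sketch of Algorithm 1 follows. It assumes the true answers $Ax$ have already been computed and that $F$ is given as a square list of lists; the function names are hypothetical. The noise scale uses the constant $c_{\varepsilon,\delta} = (0.5\sqrt{\varepsilon} + \sqrt{2\ln(1/\delta)})/\varepsilon$ as read from the algorithm box, which should be checked against [16] before any real use. The sketch illustrates the structure of the mechanism and is not a vetted privacy implementation (floating-point Gaussian sampling has known pitfalls).

```python
import math
import random


def c_eps_delta(eps, delta):
    """Noise scale, assuming c = (0.5*sqrt(eps) + sqrt(2*ln(1/delta))) / eps."""
    return (0.5 * math.sqrt(eps) + math.sqrt(2.0 * math.log(1.0 / delta))) / eps


def generalized_gaussian(true_answers, F, eps, delta, rng=random):
    """Return Ax + F g, where g has i.i.d. N(0, c^2) coordinates.

    `true_answers` plays the role of Ax; `F` defines the ellipsoid E = F * B_2.
    """
    m = len(true_answers)
    c = c_eps_delta(eps, delta)
    g = [rng.gauss(0.0, c) for _ in range(m)]
    Fg = [sum(F[i][j] * g[j] for j in range(m)) for i in range(m)]
    return [y + noise for y, noise in zip(true_answers, Fg)]


# The standard Gaussian mechanism corresponds to F = sqrt(|Q|) * I,
# using the upper bound sqrt(|Q|) on the l2-sensitivity.
m = 2
F_standard = [[math.sqrt(m) if i == j else 0.0 for j in range(m)] for i in range(m)]
noisy = generalized_gaussian([3.0, 1.0], F_standard, eps=1.0, delta=1e-6,
                             rng=random.Random(0))
```

Passing a seeded `random.Random` instance only makes the example reproducible; a deployment would need a properly vetted noise source.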


Theorem 1 ([6-8, 16]) For any ellipsoid $E$ containing
$K_Q$, $\mathcal{A}_E$ satisfies $(\varepsilon, \delta)$-differential
privacy.


It is not hard to analyze the error of the
mechanism $\mathcal{A}_E$. Let $E = F \cdot B_2^Q$, and recall the
Hilbert-Schmidt norm $\|F\|_{HS} = \sqrt{\mathrm{tr}(F F^T)}$ and
the 1-to-2 norm $\|F^T\|_{1 \to 2}$, which is equal to the
largest $\ell_2$ norm of any row of $F$. Geometrically,
$\|F\|_{HS}$ is equal to the square root of the sum of
squared major axis lengths of $E$, and $\|F^T\|_{1 \to 2}$
is equal to the largest $\ell_\infty$ norm of any point in $E$.
We have the error bounds


$$\mathrm{err}_{avg}(\mathcal{A}_E, Q) = O(c_{\varepsilon,\delta}) \cdot \frac{1}{\sqrt{|Q|}} \|F\|_{HS},$$


$$\mathrm{err}_{wc}(\mathcal{A}_E, Q) = O\left(c_{\varepsilon,\delta} \sqrt{\log |Q|}\right) \cdot \|F^T\|_{1 \to 2}.$$


Surprisingly, for any query set $Q$, there exists
an ellipsoid $E$ such that the generalized Gaussian
noise mechanism $\mathcal{A}_E$ is nearly optimal among all
differentially private mechanisms for $Q$. In order
to formulate the result, let us define $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q)$
(respectively, $\mathrm{opt}^{\varepsilon,\delta}_{wc}(Q)$) to be the infimum of
$\mathrm{err}_{avg}(\mathcal{A}, Q)$ (respectively, $\mathrm{err}_{wc}(\mathcal{A}, Q)$) over all
$(\varepsilon, \delta)$-differentially private mechanisms $\mathcal{A}$.


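Theorem 2 ([16]) Let $E = F \cdot B_2^Q$ be the ellipsoid
that minimizes $\|F\|_{HS}$ over all ellipsoids $E$
containing $K_Q$. Then


$$\mathrm{err}_{avg}(\mathcal{A}_E, Q) = O\left(\log |Q| \sqrt{\log 1/\delta}\right) \cdot \mathrm{opt}^{\varepsilon,\delta}_{avg}(Q).$$


If $E = F \cdot B_2^Q$ minimizes $\|F^T\|_{1 \to 2}$ subject to
$K_Q \subseteq E$, then


$$\mathrm{err}_{wc}(\mathcal{A}_E, Q) = O\left((\log |Q|)^{3/2} \sqrt{\log 1/\delta}\right) \cdot \mathrm{opt}^{\varepsilon,\delta}_{wc}(Q).$$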


Minimizing $\|F\|_{HS}$ or $\|F^T\|_{1 \to 2}$ subject to
$K_Q \subseteq F \cdot B_2^Q$ is a convex minimization problem.
An optimal solution can be approximated to
within any prescribed degree of accuracy in
time polynomial in $|Q|$ and $|\mathcal{X}|$ via the ellipsoid
algorithm. In fact, more efficient solutions are
available: both problems can be formulated as
semidefinite programs and solved via interior
point methods, or one can also use the Plotkin-
Shmoys-Tardos framework [1, 17]. Algorithm
$\mathcal{A}_E$ also runs in time polynomial in $n$, $|Q|$, $|\mathcal{X}|$,
since it only needs to compute the true query
answers and sample $|Q|$ many Gaussian random
variables. Thus, Theorem 2 gives an efficient
approximation to the optimal differentially
private mechanism for any set of linear queries.

The near-optimal mechanisms of Theorem 2
are closely related to the matrix mechanism [13].
The matrix mechanism, given a set of queries
$Q$ with query matrix $A$, solves an optimization
problem to find a strategy matrix $M$, then
computes answers $\tilde{y}$ to the queries $Mx$ using
the standard Gaussian mechanism, and outputs
$AM^{-1}\tilde{y}$. The generalized Gaussian mechanism
$\mathcal{A}_E$ instantiated with ellipsoid $E = F \cdot B_2^Q$ is
equivalent to the matrix mechanism with strategy
matrix $F^{-1}A$.

The proof of optimality for the generalized
Gaussian mechanism is related to a fundamental
geometric fact: if all ellipsoids containing a convex
body are “large,” then the body itself must
be “large.” In particular, if the sum of squared
major axis lengths of any ellipsoid containing
$K_Q$ is large, then $K_Q$ must contain a simplex of
proportionally large volume. Moreover, this simplex
is the convex hull of a subset of the contact
points of $K_Q$ with the optimal ellipsoid. Since the
contact points must be vertices of $K_Q$, and all
vertices of $K_Q$ are either columns of the query
matrix $A$ or their negations, this guarantees the
existence of a submatrix of $A$ with large determinant.
Determinants of submatrices in turn bound
$\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q)$ from below (this is a consequence
of a connection between combinatorial discrepancy
and privacy [15], and the determinant lower
bound on discrepancy [14]). This phenomenon is
related to the Restricted Invertibility Principle of
Bourgain and Tzafriri [4] and was established for
the closely related minimum volume ellipsoid by
Vershynin [18].

The Gaussian noise mechanism can only provide
approximate privacy guarantees: when $\delta = 0$,
the noise variance scaling factor $c_{\varepsilon,\delta}$ is unbounded.
The case of pure privacy requires different
techniques. Nevertheless, $(\varepsilon, 0)$-differentially
private algorithms with efficiency and optimality
guarantees analogous to those in Theorem 2 are
known [2, 12]. They use a more complicated noise
distribution. In the important special case when
$K_Q$ is “well rounded” (technically, when $K_Q$ is
isotropic), the noise vector is sampled uniformly
from $r \cdot K_Q$, where $r$ is a $\Gamma$-distributed random
variable. Optimality is established conditional
on the Hyperplane Conjecture [12] or unconditionally
using Klartag’s proof of an isomorphic
version of the conjecture [2].


The Projection Mechanism 

Despite the near-optimality guarantees, the gen- 
eralized Gaussian mechanism has some draw- 
backs that can limit its applicability. One issue 
is that in some natural scenarios, the universe 
size |X| can be huge, and running time linear 
in |X| is impractical. Another is that its error 
is sometimes larger even than the database size, 
making the query answers unusable. We shall 
see that a simple modification of the Gaussian 
mechanism, based on an idea from statistics, goes 
a long way towards addressing these issues. 

It is known that there exist sets of linear
queries $Q$ for which $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q) = \Omega(\sqrt{|Q|})$
for any small enough constants $\varepsilon$ and $\delta$ [6, 9].
However, this lower bound only holds for large
databases, and algorithms with significantly
better error guarantees are known when
$n = o(|Q|)$ [3, 10, 11]. We now know that there
are $(\varepsilon, \delta)$-differentially private algorithms that
answer any set $Q$ of linear queries on any
database $D \in \mathcal{X}^n$ with average error at most


$$O\left(\frac{\sqrt{n}\,(\log 1/\delta)^{1/4} (\log |\mathcal{X}|)^{1/4}}{\sqrt{\varepsilon}}\right). \quad (1)$$


Moreover, for $k$-way marginal queries, this much
error is necessary, up to factors logarithmic in $n$
and $|Q|$ [5]. Here, we describe a simple geometric
algorithm from [16] that achieves this error bound
for any $Q$.

We know that for any set of queries $Q$ and
a database of size $n$, the true query answers are
between 0 and $n$. Therefore, it is always safe to
take the noisy answers $\tilde{y}$ output by the Gaussian
mechanism and truncate them inside the interval
$[0, n]$. However, we can do better by using knowledge
of the query set $Q$. For any database $D$
of size $n$, the true query answers $y = Ax$ lie
in $n \cdot K_Q$. This suggests a regression approach:
find the vector of answers $\hat{y} \in n \cdot K_Q$ which is
closest to the noisy output $\tilde{y}$ from the Gaussian
mechanism. This is the main insight used in the
projection mechanism (Algorithm 2).


Algorithm 2: Projection Mechanism $\mathcal{A}_{proj}$
Input: (Public) Query set $Q$;
Input: (Private) Database $D \in \mathcal{X}^n$.
Compute a noisy vector of query answers $\tilde{y}$ with $\mathcal{A}_E$
for $E = \sqrt{|Q|} \cdot B_2^Q$.
Compute a projection $\hat{y}$ of $\tilde{y}$ onto $n \cdot K_Q$:
$\hat{y} = \arg\min_{y \in n \cdot K_Q} \|\tilde{y} - y\|_2$.
Output: Vector of answers $\hat{y}$.


The fact that the projection mechanism is
$(\varepsilon, \delta)$-differentially private is immediate, because
its only interaction with the database is via the
$(\varepsilon, \delta)$-differentially private Gaussian mechanism,
and post-processing cannot break differential
privacy.

Geometric Approaches to Answering Queries, Fig. 1 The projection
mechanism $\mathcal{A}_{proj}$ on the left: the angle $\theta$ is
necessarily obtuse or right. The figure on the right
shows that the value of the support function $h_{K_Q}(g)$
is equal to $\frac{1}{2}\|g\|_2$ times the width of $K_Q$ in the
direction of $g$


The projection step in $\mathcal{A}_{proj}$ reduces the noise
significantly when $n = o(|Q|/\varepsilon)$. Intuitively, in
this case, $n \cdot K_Q$ is small enough so that projection
cancels a significant portion of the noise. Let us
sketch the analysis. Let $y = Ax$ be the true query
answers and $g = \tilde{y} - y$ the Gaussian noise vector.
A simple geometric argument shows that
$\|y - \hat{y}\|_2^2 \le 2|\langle y - \hat{y}, g \rangle|$: the main observation is that
in the triangle formed by $y$, $\tilde{y}$, and $\hat{y}$, the angle
at $\hat{y}$ is an obtuse or right angle; see Fig. 1. Since
$\hat{y} \in n \cdot K_Q$, there exists some histogram vector $\hat{x}$
with $\|\hat{x}\|_1 \le n$ such that $\hat{y} = A\hat{x}$. We can rewrite
the inner product $\langle y - \hat{y}, g \rangle$ as $\langle x - \hat{x}, A^T g \rangle$. Now,
we apply Hölder’s inequality and get


$$\mathbb{E}_g \|y - \hat{y}\|_2^2 \le 2\,\mathbb{E}_g |\langle x - \hat{x}, A^T g \rangle| \le 2\,\mathbb{E}_g \|x - \hat{x}\|_1 \|A^T g\|_\infty \le 4n\,\mathbb{E}_g \|A^T g\|_\infty. \quad (2)$$


The term $\mathbb{E}_g \|A^T g\|_\infty$ is the expected maximum
of $|\mathcal{X}|$ Gaussian random variables, each with
mean 0 and variance at most $c_{\varepsilon,\delta}^2 |Q|^2$, and standard
techniques give the bound $O(c_{\varepsilon,\delta} |Q| \sqrt{\log |\mathcal{X}|})$.
Plugging this into (2) shows that $\mathrm{err}_{avg}(\mathcal{A}_{proj}, Q)$
is always bounded by (1). It is useful to
note that $\|A^T g\|_\infty$ is equal to
$h_{K_Q}(g) \triangleq \max_{y \in K_Q} |\langle y, g \rangle|$, where $h_{K_Q}$ is the support
function of $K_Q$. Geometrically, $h_{K_Q}(g)$ is equal
to half the width of $K_Q$ in the direction of $g$,
scaled by the Euclidean length of $g$ (see Fig. 1).
Thus, the average error of $\mathcal{A}_{proj}$ scales with the
expected width of $n \cdot K_Q$ in a random direction.
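
The projection step can be sketched with a Frank-Wolfe iteration, one standard way to minimize a convex quadratic over a polytope given by its vertices. The sketch below assumes the columns $a_\chi$ of $A$ are available explicitly, so each iteration costs time linear in the number of columns; the function names and the stopping rule are illustrative choices, not taken from [16].

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_scaled_polytope(y_noisy, columns, n, iters=500):
    """Approximate the Euclidean projection of y_noisy onto n*K, where
    K is the convex hull of the points +a and -a for a in `columns`.

    Frank-Wolfe with exact line search, started at the origin (which
    lies in n*K because K is centrally symmetric).
    """
    y = [0.0] * len(y_noisy)
    for _ in range(iters):
        grad = [yi - ti for yi, ti in zip(y, y_noisy)]   # gradient of 0.5*||y - y_noisy||^2
        # Linear minimization oracle: vertex of n*K minimizing <grad, v>.
        a = max(columns, key=lambda col: abs(dot(col, grad)))
        sign = -1.0 if dot(a, grad) > 0 else 1.0
        v = [sign * n * ai for ai in a]
        d = [vi - yi for vi, yi in zip(v, y)]
        dd = dot(d, d)
        if dd == 0.0:
            break
        step = min(1.0, max(0.0, -dot(grad, d) / dd))    # exact line search on [0, 1]
        if step == 0.0:
            break   # no vertex gives a descent direction: y is optimal
        y = [yi + step * di for yi, di in zip(y, d)]
    return y


# Toy example: with columns e1, e2 and n = 1, n*K is the l1 unit ball in the plane.
cols = [(1, 0), (0, 1)]
p1 = project_onto_scaled_polytope([2.0, 2.0], cols, 1)
p2 = project_onto_scaled_polytope([2.0, 0.0], cols, 1)
```

On this toy instance the iteration stops after a few steps at the exact projections (0.5, 0.5) and (1, 0). In general Frank-Wolfe converges only approximately, so the iteration budget is a tuning assumption.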


Running in Time Sublinear in $|\mathcal{X}|$

An important example when running time linear
in $|\mathcal{X}|$ is impractical is marginal queries: the size
of the universe is $2^d$, which is prohibitive even
for a moderate number of attributes. Notice, however,
that in order to compute $\tilde{y}$ in Algorithm 2,
we only need to compute the true query answers
and add independent Gaussian noise to each. This
can be done in time $O(n|Q|)$. The computationally
expensive operation then is computing
the projection $\hat{y}$. This is a convex optimization
problem, and can be solved using the ellipsoid
algorithm, provided we have a separation oracle
for $K_Q$. (A more practical approach is to use
the Frank-Wolfe algorithm, which can be implemented
efficiently as long as we can solve arbitrary
linear programs with feasible region $K_Q$.)
For $k$-way marginals, after a linear transformation
that doesn’t significantly affect error, $K_Q$ can
be assumed to be the convex hull of
$\{\pm \chi^{\otimes k} : \chi \in \{-1, 1\}^d\}$, where $\chi^{\otimes k}$ is the $k$-fold tensor power
of $\chi$. Unfortunately, even for $k = 2$, separation
for this convex body is NP-hard. Nevertheless, a
small modification of the analysis of $\mathcal{A}_{proj}$ shows
that the algorithm achieves asymptotically the
same error bound if we project onto a convex
body $L$ such that $K_Q \subseteq L$ and
$\mathbb{E}_g h_L(g) \le O(1) \cdot \mathbb{E}_g h_{K_Q}(g)$. In other words, we need a
convex $L$ that relaxes $K_Q$ but is not too much
wider than $K_Q$ in a random direction. If we can
find such an $L$ with an efficient separation oracle,
we can implement $\mathcal{A}_{proj}$ in time polynomial in $|Q|$
and $n$ while only increasing the error by a constant
factor. For 2-way marginals, an appropriate
relaxation can be derived from Grothendieck’s
inequality and is formulated using semidefinite
programming. The sensitivity polytope $K_Q$ and
the relaxation $L$ are shown for 2-way marginals
on $\{0, 1\}^3$ in Fig. 2. Finding a relaxation $L$ for
$k$-way marginals with efficient separation and
mean width bound $\mathbb{E}_g h_L(g) \le O(1) \cdot \mathbb{E}_g h_{K_Q}(g)$
is an open problem for $k \ge 3$.


Geometric Approaches to Answering Queries, Fig. 2
The sensitivity polytope for 2-way marginals on 3
attributes (left) and a spectrahedral relaxation of the
polytope (right). A projection onto 3 of the 12 queries
restricted to the positive orthant is shown


Optimal Error for Small Databases

We can refine the optimal error $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q)$ to
a curve $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q, n)$, where $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q, n)$ is
the infimum of $\mathrm{err}_{avg}(\mathcal{A}, Q, n)$ over all $(\varepsilon, \delta)$-
differentially private algorithms $\mathcal{A}$. There exists
an algorithm that, for any database of size at
most $n$ and any query set $Q$, has an average
error only a polylogarithmic (in $|Q|$, $|\mathcal{X}|$, and
$1/\delta$) factor larger than $\mathrm{opt}^{\varepsilon,\delta}_{avg}(Q, n)$ [16]. The
algorithm is similar to $\mathcal{A}_{proj}$. However, the
noise distribution used is the optimal one from
Theorem 2. The post-processing step is also
slightly more complicated, but the key step
is again noise reduction via projection onto a
convex body. The running time is polynomial
in $n$, $|Q|$, $|\mathcal{X}|$. Giving analogous guarantees for
worst-case error remains open.


Problem Definition


Urban street systems can be modeled by plane
geometric networks $G = (V, E)$ whose edges $e \in E$
are piecewise smooth curves that connect the
vertices $v \in V \subset \mathbb{R}^2$. Edges do not intersect, except
at common endpoints in $V$. Since streets are
lined with houses, the quality of such a network
can be measured by the length of the connections
it provides between two arbitrary points $p$ and $q$
on $G$.


Geometric Dilation of Geometric Networks, Fig. 1
Minimum dilation embeddings of regular point sets
(panel labels: $\Delta(S_3) = 2/\sqrt{3}$, $\Delta(S_4) = \sqrt{2}$,
$\Delta(S_n) = \pi/2$ if $n \ge 5$)


Let $\xi_G(p, q)$ denote a shortest path from $p$ to
$q$ in $G$. Then


$$\delta(p, q) := \frac{|\xi_G(p, q)|}{|pq|} \quad (1)$$


is the detour one encounters when using network
$G$, in order to get from $p$ to $q$, instead of walking
straight. Here, $|\cdot|$ denotes the Euclidean
length. The geometric dilation of network $G$ is
defined by


$$\delta(G) := \sup_{p \ne q \in G} \delta(p, q). \quad (2)$$

This definition differs from the notion of stretch
factor (or spanning ratio) used in the context
of spanners; see the monographs by Eppstein
[6] or Narasimhan and Smid [11]. In the latter,
only the paths between the vertices $p, q \in V$
are considered, whereas the geometric dilation
involves all points on the edges as well. As a
consequence, the stretch factor of a triangle $T$
equals 1, but its geometric dilation is given by
$\delta(T) = \sqrt{2/(1 - \cos\alpha)} \ge 2$, where $\alpha \le 60^\circ$
is the most acute angle of $T$.
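
The triangle formula can be checked numerically. The sketch below estimates the geometric dilation of a closed polygon by sampling boundary points; on a closed curve the shortest path between two points is the shorter of the two boundary arcs. The function name and the sampling density are assumptions, and the estimate is in general only a lower bound on the supremum in (2).

```python
import math


def closed_polygon_dilation(vertices, samples_per_edge=30):
    """Estimate the geometric dilation of a simple closed polygon by
    comparing boundary distance and Euclidean distance over sampled pairs."""
    pts, pos = [], []      # sample points and their arc-length positions
    perimeter = 0.0
    k = len(vertices)
    for i in range(k):
        a, b = vertices[i], vertices[(i + 1) % k]
        length = math.dist(a, b)
        for j in range(samples_per_edge):
            t = j / samples_per_edge
            pts.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
            pos.append(perimeter + t * length)
        perimeter += length
    best = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            arc = pos[j] - pos[i]
            path = min(arc, perimeter - arc)   # shorter way around the boundary
            best = max(best, path / math.dist(pts[i], pts[j]))
    return best


# Equilateral triangle: most acute angle 60 degrees, so the formula gives 2.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
estimate = closed_polygon_dilation(triangle)
formula = math.sqrt(2 / (1 - math.cos(math.radians(60))))
```

For the equilateral triangle the sampled pairs include points at equal distance from a corner on its two sides, which attain the value 2 predicted by the formula.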

Presented with a finite set $S$ of points
in the plane, one would like to find a
finite geometric network containing $S$ whose
geometric dilation is as small as possible. The
value of


$$\Delta(S) := \inf\{\delta(G);\ G \text{ finite plane geometric network containing } S\}$$


is called the geometric dilation of point set $S$. The
problem is to compute, or bound, $\Delta(S)$ for
a given set $S$.


Key Results 


Theorem 1 ([4]) Let $S_n$ denote the set of
corners of a regular $n$-gon. Then, $\Delta(S_3) = 2/\sqrt{3}$,
$\Delta(S_4) = \sqrt{2}$, and $\Delta(S_n) = \pi/2$ for all $n \ge 5$.


The networks realizing these minimum values
are shown in Fig. 1. The proof of minimality
uses the following two lemmata that may be
interesting in their own right. Lemma 1 was
independently obtained by Aronov et al. [1].


Lemma 1 Let $T$ be a tree containing $S_n$. Then
$\delta(T) \ge n/\pi$.


Lemma 2 follows from a result of Gromov [7].
It can more easily be proven by applying
Cauchy’s surface area formula; see [4].


Lemma 2 Let $C$ denote a simple closed curve in
the plane. Then $\delta(C) \ge \pi/2$.


Clearly, Lemma 2 is tight for the circle. The
next lemma implies that the circle is the only
closed curve attaining the minimum geometric
dilation of $\pi/2$.


Lemma 3 ([3]) Let $C$ be a simple closed curve
of geometric dilation $< \pi/2 + \varepsilon(\delta)$. Then $C$ is
contained in an annulus of width $\delta$.


For points in general position, computing their 
geometric dilation seems quite complicated. Only 
for sets S = {A,B,C} of size three is the 
solution completely known. 


Theorem 2 ([5]) The plane geometric network
of minimum geometric dilation containing three
given points $\{A, B, C\}$ is either a line segment,
or a Steiner tree as depicted in Fig. 1, or a simple
path consisting of two line segments and one
segment of an exponential spiral; see Fig. 2.


Geometric Dilation of Geometric Networks, Fig. 2
The minimum dilation embedding of points A, B, and C


Geometric Dilation of Geometric Networks, Fig. 3 A
network of geometric dilation ≈ 1.6778


The optimum path shown in Fig. 2 contains a 
degree two Steiner vertex, P, situated at distance 
|AB| from B. The path runs straight between 
A,B and B,P. From P to C, it follows an 
exponential spiral centered at A. 

The next results provide upper and lower
bounds on $\Delta(S)$.


Theorem 3 ([4]) For each finite point set $S$, the
estimate $\Delta(S) < 1.678$ holds.


To prove this general upper bound, one can
replace each vertex of the hexagonal tiling of $\mathbb{R}^2$
with a certain closed Zindler curve (by definition,
all point pairs bisecting the perimeter of a Zindler
curve have identical distance). This results in a
network $G_F$ of geometric dilation $\approx 1.6778$; see
Fig. 3. Given a finite point set $S$, one applies a
slight deformation to a scaled version of $G_F$,
such that all points of $S$ lie on a finite part,
$G$, of the deformed net. By Dirichlet’s result on
simultaneous approximation of real numbers by
rationals, a deformation small as compared to the
cell size is sufficient, so that the dilation is not
affected. See [8] for the history and properties of
Zindler curves.


Theorem 4 ([3]) There exists a finite point set $S$
such that $\Delta(S) > (1 + 10^{-11})\pi/2$.


Theorem 4 holds for the set $S$ of $19 \times 19$
vertices of the integer grid. Roughly, if $S$ were
contained in a geometric network $G$ of dilation
close to $\pi/2$, the boundaries of the faces of $G$
would have to be contained in small annuli, by Lemma 3.
To the inner and outer circles of these annuli, one
can now apply a result by Kuperberg et al. [9]
stating that an enlargement, by a certain factor, of
a packing of disks of radius $\le 1$ cannot cover a
square of size 4.


Applications 


The geometric dilation has applications in 
the theory of knots; see, e.g., Kusner and 
Sullivan [10] and Denne and Sullivan [2]. 
With respect to urban planning, the above 
results highlight principal dilation bounds for 
connecting given sites with plane geometric 
networks. 


Open Problems 


For practical applications, one would welcome 
upper bounds on the weight (= total edge length) 
of a geometric network, in addition to upper 
bounds on its geometric dilation. Some theoretical 
questions require further investigation, too. 
Is Δ(S) always attained by a finite network? 
How can one compute, or approximate, Δ(S) for a 
given finite set S? What is the precise value of 
sup{Δ(S); S finite}? 


Problem Definition 


Enumerating objects with a given property is 
one of the basic problems in mathematics. We 
review some enumeration problems for geometric 
objects and algorithms that solve them. 

A graph is planar if it can be embedded in the 
plane so that no two edges intersect geometrically 
except at a vertex to which they are both incident. 
A plane graph is a planar graph with a fixed 
planar embedding. A plane graph divides the 
plane into connected regions called faces. The 
unbounded face is called the outer face, and other 
faces are called inner faces. 

A plane graph is a floor plan if each face 
(including the outer face) is a rectangle. A based 
floor plan is a floor plan with one designated line 
on the contour of the outer face. The designated 
line is called the base line and we always draw the 
base line as the lowermost horizontal line of the 
drawing. The 25 based floor plans having 4 inner 
faces are shown in Fig. 1. Given an integer f, the 
problem of floor plan enumeration asks to generate 
all floor plans with exactly f inner faces. 

A plane graph is a plane triangulation if 
each inner face has exactly three edges on its 
contour. A based plane triangulation is a plane 
triangulation with one designated edge on the 
contour of the outer face. The designated edge is 
called the base edge. Triangulations are an 
important model in 3D modeling. A graph is 
biconnected if removing any vertex always results 
in a connected graph. A graph is triconnected if 
removing any two vertices always results in a 
connected graph. Given two integers n and r, 
the problem of biconnected plane triangulation 
enumeration asks to generate all biconnected 
plane triangulations with exactly n vertices 
including exactly r vertices on the outer face. 
Given two integers n and r, the problem of 
triconnected plane triangulation enumeration 
asks to generate all triconnected plane 
triangulations with exactly n vertices including 
exactly r vertices on the outer face. 


Key Results 


Enumeration of All Floor Plans 

Using the reverse search method [1], one can 
enumerate all based floor plans with f inner 
faces in O(1) time each [3]. We sketch the 
method of [3]. 

Let S_f be the set of all based floor plans 
with f > 1 inner faces. Let R be a based floor 
plan in S_f and F the face of R containing the 
upper right corner of R. We have two cases. If R 
has a vertical line segment with its upper end at 
the lower left corner of F, then by continually 
shrinking F toward the uppermost horizontal line 
of R, while preserving the width of F and enlarging 
the faces below F, we obtain a based floor plan 
with one less inner face. If R has no vertical line 
segment with its upper end at the lower left corner 
of F, then R has a horizontal line segment with its 
right end at the lower left corner of F, and then by 
continually shrinking F toward the rightmost 
vertical line of R, while preserving the height of F 
and enlarging the faces to the left of F, we obtain 
a based floor plan with one less inner face. 
Repeating this results in a sequence of based 
floor plans which always ends with the based 
floor plan with one inner face. See an example in 
Fig. 2. If we merge the sequences for all R in S_f, 
then we have the tree T_f in which every R in S_f 
appears as a leaf. See Fig. 3. 

The reverse search method efficiently 
traverses the tree (without storing the tree in 
memory) and outputs each based floor plan 
in S_f at its corresponding leaf. Thus, we can 
efficiently enumerate all based floor plans in S_f. 
The algorithm enumerates all based floor plans in 
S_f in O(1) time each. 
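
The traversal pattern can be illustrated on a much simpler family than floor plans. The sketch below is only a toy stand-in, not the algorithm of [3]: the objects are binary words without two consecutive 1s, the (assumed) parent rule is "delete the last symbol", and the hypothetical helper `children` inverts that rule. As in the tree T_f, the objects of the target size appear as the leaves of the family tree, and the tree itself is never materialized; only the current frontier is kept.

```python
from typing import Callable, Iterator, List


def reverse_search(root: str,
                   children: Callable[[str], List[str]],
                   depth: int) -> Iterator[str]:
    """Walk an implicit family tree depth-first and yield the nodes at `depth`.

    The tree is defined only through `children`, which must list exactly the
    objects whose parent is the given node.
    """
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        if level == depth:
            yield node  # a leaf of the family tree: one enumerated object
            continue
        # Push in reverse so that children are visited in their listed order.
        for child in reversed(children(node)):
            stack.append((child, level + 1))


def children(word: str) -> List[str]:
    """Toy rule: the parent of a word is the word without its last symbol."""
    kids = [word + "0"]
    if not word.endswith("1"):  # appending 1 must not create "11"
        kids.append(word + "1")
    return kids


words = list(reverse_search("", children, 4))
```

Each object is produced exactly once because every object has a unique parent; the same argument underlies the floor plan enumeration, where the parent is obtained by removing the face at the upper right corner.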


Enumeration of Triangulations 
Similarly, using the reverse search method [1], 
given two integers n and r, one can enumerate all 
based biconnected triangulations having exactly 
n vertices including exactly r vertices on the 
outer face in O(1) time each [2], all based 
triconnected triangulations having exactly n 
vertices including exactly r vertices on the outer 
face in O(1) time each [3], and all triconnected 
(non-based) plane triangulations having exactly n 
vertices including exactly r vertices on the outer 
face in $O(r^2 n)$ time each [3]. Also, one can 
enumerate all based triangulations having exactly 
n vertices with exactly three vertices on the outer 
face in O(1) time each [3]. 


Problem Definition 


Finding the shortest path between a source and a 
destination is a natural optimization problem with 
many applications. Perhaps the oldest variant of 
the problem is the geometric shortest path prob- 
lem, in which the domain is physical space: the 
problem is relevant to human travelers, migrating 
animals, and even physical phenomena like wave 
propagation. The key feature that distinguishes 
the geometric shortest path problem from the cor- 
responding problem in graphs or other discrete 
spaces is the unbounded number of paths in a 
multidimensional space. To solve the problem 
efficiently, one must use the “shortness” criterion 
to limit the search. 

In computational geometry, physical space is 
modeled abstractly as the union of some number 
of constant-complexity primitive elements. The 
traditional formulation of the shortest path 
problem considers paths in a domain bounded 
by linear elements: line segments in two 
dimensions, triangles in three dimensions, and 
(d − 1)-dimensional simplices in d > 3. Canny 
and Reif showed that the three-dimensional 
shortest path problem is NP-complete [2], so 
this article will focus on the two-dimensional 
problem. 

We consider paths in a free space P bounded 
by h polygons (one outer boundary and h − 1 
obstacles) with a total of n vertices. The free 
space is closed, so paths may touch the boundary. 
The source and destination of the shortest path 
are points s and t inside or on the boundary of P. 
The goal of the shortest path problem is to find 
the shortest path from s to t inside P, denoted by 
π(s, t), as efficiently as possible, where running 
time and memory use are expressed as functions 
of n and h. The length of π(s, t) is denoted 
by dist(s, t); in some applications, it may be 
desirable to compute dist(s, t) without finding 
π(s, t) explicitly. 


Key Results 


Visibility Graph Algorithms 

Early approaches to the two-dimensional shortest 
path problem exploited the visibility graph to 
reduce the continuous shortest path problem to a 
discrete graph problem [1, 12, 18]. The visibility 
graph is a graph whose nodes are s, t, and the 
vertices of P and whose edges (u,v) connect 
vertex pairs such that the line segment uv is 
contained in P. It is convenient and customary 
to identify the edges of the abstract visibility 
graph with the line segments they represent. The 
visibility graph is important because the edges 
of the shortest path π(s, t) are a subset of the 
visibility graph edges. This is easy to understand 
intuitively, because of subpath optimality: for 
any two points a, b ∈ π(s, t), the subpath of 
π(s, t) between a and b is also the shortest path 
between a and b. In particular, if ab is contained 
in P, then π(s, t) coincides with ab between a 
and b, and the distance dist(a, b) is equal to |ab|, 
the length of the segment ab. If π(s, t) has a bend 
anywhere except at a vertex of P, an infinitesimal 
subpath near the bend can be shortened by a 
straight shortcut, implying that π(s, t) is not the 
shortest path. Hence every segment of π(s, t) is 
an edge of the visibility graph. 

This observation leads directly to an algo- 
rithm for computing shortest paths: compute the 
visibility graph of P and then run Dijkstra’s 
algorithm to find the shortest path from s to t 
in the visibility graph. The visibility graph can 
be constructed in O(n log n + m) time, where 
m is the number of edges, using an algorithm of 
Ghosh and Mount [7]. Dijkstra’s algorithm takes 
O(n log n + m) time on a graph with n nodes 
and m edges [4], so this is the running time of 
the straightforward visibility graph solution to 
the shortest path problem. This algorithm can be 
quadratic in the worst case, since m, the number 
of visibility graph edges, can be as large as 
$\Theta(n^2)$. 
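
The visibility graph approach can be sketched in a few lines. The code below is a minimal illustration under simplifying assumptions that are not part of the text: the outer boundary is ignored, every obstacle is a convex polygon given in counterclockwise order, the input is in general position (no segment passes exactly through two nonadjacent vertices of the same obstacle), and visibility is tested by brute force in cubic time rather than with the output-sensitive algorithm of Ghosh and Mount [7]. All function names are hypothetical.

```python
import heapq
from itertools import combinations
from math import dist, inf


def cross(o, a, b):
    """Signed area of the parallelogram spanned by oa and ob."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def properly_cross(p, q, a, b):
    """True if segments pq and ab meet in a point interior to both."""
    return (cross(p, q, a) * cross(p, q, b) < 0
            and cross(a, b, p) * cross(a, b, q) < 0)


def strictly_inside(pt, poly):
    """Point strictly inside a convex polygon given counterclockwise."""
    n = len(poly)
    return all(cross(poly[i], poly[(i + 1) % n], pt) > 0 for i in range(n))


def visible(p, q, obstacles):
    """Segment pq stays in free space (touching obstacle boundaries is allowed)."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    for poly in obstacles:
        n = len(poly)
        if any(properly_cross(p, q, poly[i], poly[(i + 1) % n]) for i in range(n)):
            return False
        if strictly_inside(mid, poly):  # catches diagonals of an obstacle
            return False
    return True


def shortest_path_length(s, t, obstacles):
    """dist(s, t): Dijkstra's algorithm on the brute-force visibility graph."""
    nodes = [s, t] + [v for poly in obstacles for v in poly]
    adj = {u: [] for u in nodes}
    for u, v in combinations(nodes, 2):
        if visible(u, v, obstacles):
            adj[u].append((v, dist(u, v)))
            adj[v].append((u, dist(u, v)))
    best = {u: inf for u in nodes}
    best[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale queue entry
        if u == t:
            return d
        for v, w in adj[u]:
            if d + w < best[v]:
                best[v] = d + w
                heapq.heappush(heap, (best[v], v))
    return inf


square = [(1, -1), (2, -1), (2, 1), (1, 1)]
length = shortest_path_length((0, 0), (3, 0), [square])
```

For the square obstacle in the usage lines, the path bends at two corners of the square, in line with the observation that shortest paths turn only at polygon vertices.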

The running time can be improved somewhat 
by noting that only a subset of the visibility graph 
edges can belong to a shortest path. In particular, 
any shortest path must turn toward the boundary 
of P at any path vertex. This limits the edges to 
common tangents of the polygons of P. We omit 
the details, but note that if s and t are known, the 
common tangent restriction limits the number of 
visibility graph edges that may belong to π(s, t) 
to $O(n + h^2)$ [11]. These useful edges can be 
computed in $O(n \log n + h^2)$ time [17], and so 
the shortest path can be computed in the same 
time bound by applying Dijkstra’s algorithm to 
the subgraph [11]. 

Geometric Shortest Paths in the Plane, Fig. 1 
The spreading wavefront 


Continuous Dijkstra Algorithms 

Visibility graph approaches to finding the shortest 
path run in quadratic time in the worst case, since 
h may be Θ(n). This led Mitchell to propose 
an alternative approach called the continuous 
Dijkstra method [15]. Imagine a wavefront that 
spreads at unit speed inside P, starting from s. 
The wavefront at time τ is the set of points in 
P whose geodesic (shortest path) distance from 
s is exactly τ. Said another way, the shortest path 
distance from s to a point p ∈ P is equal to the 
time at which the wavefront reaches p. 

The wavefront at time τ is a union of paths 
and cycles bounding the region whose geodesic 
distance from s is at most τ. Each path or cycle is 
a sequence of circular arc wavelets, each centered 
on a vertex v that is its root. The radius of the 
wavelet is τ − dist(s, v), with dist(s, v) < τ. 
As τ increases, the combinatorial structure of the 
wavefront changes at discrete event times when 
the wavefront hits the free space boundary, 
collides with itself, or eliminates wavelets squeezed 
between neighboring wavelets. See Fig. 1. The 
continuous Dijkstra method simulates the spread 
of this wavefront, computing the shortest path 
distance to every point of the free space in the 
process. 

Mitchell used the continuous Dijkstra method 
to compute shortest paths under the L₁ metric in 
O(n log n) time [15]. He later extended the 
approach to compute L₂ (Euclidean) shortest paths 
in $O(n^{5/3+\varepsilon})$ time, for ε arbitrarily small [16]. 
Hershberger and Suri gave an alternative 
implementation of the continuous Dijkstra scheme, 
using different data structures, that computes 
Euclidean shortest paths in O(n log n) time [9]. The 
next two subsections discuss these algorithms in 
more detail. 


Continuous Dijkstra with Sector 
Propagation Queries 

If p is a point in P, π(s, p) is a shortest path from 
s to p, and the predecessor of p is the vertex 
of π(s, p) adjacent to (immediately preceding) 
p in the path. If a point is reached by multiple 
shortest paths, it has multiple predecessors. The 
shortest path map is a linear-complexity partition 
of P into regions such that every point inside 
a region has the same predecessor. See Fig. 2. 
The root of each region is the predecessor of all 
points in the region. The edges of the shortest 
path map are polygon edges and bisectors (curves 
with two distinct predecessors, namely, the roots 
of the regions separated by the bisector). 

Geometric Shortest Paths in the Plane, Fig. 2 The 
shortest path map for the wavefront in Fig. 1 


Mitchell’s shortest path algorithm simulates 
the spread of the wavefront inside the shortest 
path map. This may seem a bit peculiar, since 
the shortest path map is not known until the 
shortest paths have been computed. The trick is 
that the algorithm builds the shortest path map 
as it runs, and it propagates a pseudo-wavefront 
inside its current model of the shortest path map 
at each step. The true wavefront is a subset of 
the pseudo-wavefront. This pseudo-wavefront is 
locally correct — each wavelet’s motion is deter- 
mined by its neighbors in the pseudo-wavefront 
and the shortest path map known so far — but 
it may overrun itself. When an overrun is de- 
tected, the algorithm revises its model of the 
shortest path map in the neighborhood of the 
overrun. 

To be more specific, each wavelet w is a 
circular arc centered at a root vertex r(w). The 
endpoints of w move along left and right tracks 
α(w) and β(w). Each track is either a straight 
line segment (a polygon edge or an extension 
of a visibility edge) or a bisector determined by 
w and its left or right neighboring wavelet L(w) or 
R(w). For example, if r = r(w) and r′ = 
r(L(w)), the left bisector is the set of points x 
such that dist(s, r) + |rx| = dist(s, r′) + |r′x|; 
consequently, the bisector is a hyperbolic arc. 
For every wavelet w, the algorithm computes a 
next event, which is the next value of τ where 
w reaches an endpoint of one of its tracks, the left 
and right tracks collide, or w hits a polygon vertex 
between its left and right tracks. (Collisions with 
polygon edges or other wavefront arcs are not 
detected.) The events for all wavelets are placed 
in a global priority queue and processed in order 
of increasing τ values. 
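
To see why a bisector between two wavelets is a hyperbolic arc, it helps to rearrange its defining equation. The short derivation below uses only the symbols introduced above, with r and r′ the roots of the two neighboring wavelets.

```latex
\[
  \operatorname{dist}(s,r) + |rx| \;=\; \operatorname{dist}(s,r') + |r'x|
  \quad\Longleftrightarrow\quad
  |rx| - |r'x| \;=\; \operatorname{dist}(s,r') - \operatorname{dist}(s,r).
\]
```

The right-hand side does not depend on x, so the bisector is the locus of points whose distances to the two fixed points r and r′ differ by a constant. That locus is one branch of a hyperbola with foci r and r′ (or a straight line in the special case where the constant is zero).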

When the algorithm processes an event, it up- 
dates the wavelets involved and their events in the 
priority queue. Processing wavelet collisions with 
a polygon vertex v is the most complicated case: 
To detect possible previous collisions with poly- 
gon edges, the algorithm performs a ray shooting 
query from r(w) toward v [8]. If the ray hits 
an edge, the algorithm traces the edge through 
the current shortest path map regions and updates 
the corresponding wavelets. If v is reached for 
the first time by w, then the algorithm updates the 
wavefront with a new wavelet rooted at v. If ver- 
tex v was previously reached by another wavelet, 
then there are previously undiscovered bisectors 
between w and the other wavelet. The algorithm 
traces these bisectors through its local shortest 
path map model and carves off portions that are 
reached by a shorter path following another route. 
Processing other events (track vertices and track 
collisions) is similar. 

Mitchell shows that even though vertices may 
be reached more than once by different wavelets, 
and portions of the shortest path map are carved 
off and discarded when they are discovered to 
be invalid, no vertex is reached more than O(1) 
times, and the total shortest path map complexity, 
even including discarded portions, is O(n). The 
most costly part of the algorithm is finding the 
first polygon vertex hit by each wavelet. All 
the rest of the algorithm — ray shooting, prior- 
ity queue, bisector tracing, and maintaining the 
shortest path map structure — can be done in 
O(n logn) total time. 

The complexity of Mitchell’s algorithm is 
dominated by wavelet dragging queries, which 
find the first obstacle vertex hit by a wavelet 
w between the left and right tracks α(w) and 
β(w). Mitchell phrases this as a ten-dimensional 
optimization problem dependent on the position 
and distance from s for the root of w and its 
neighbors in the wavefront, plus the start time τ. 
Although Euclidean distances are square roots 
of quadratics (by Pythagoras), Mitchell is able, 
by squaring, substitution, and simplification, to 
convert the distance minimization problem into a 
linear optimization range query over a constant- 
size polyhedron in ℝ⁵. (The objects in the range 
query are n 5-dimensional points, images of 
the polygon vertices.) There are O(n) such 
queries to be performed. Using known bounds 
and balancing preprocessing against query time, 
the O(n) queries can be answered in $O(n^{5/3+\varepsilon})$ 
time and space [3, 13, 14]. All other parts of the 
algorithm take near-linear time, so the total time 
for Mitchell’s algorithm to find the Euclidean 
shortest path map is $O(n^{5/3+\varepsilon})$ [16]. 

Geometric Shortest Paths in the Plane, Fig. 3 
The well-covering region for edge e is bounded by 
input(e) 


Continuous Dijkstra in a Conforming 
Subdivision 

The challenge of implementing the continuous 
Dijkstra paradigm is that detecting and processing 
wavefront events in strict temporal order is 
difficult to do efficiently, but processing events 
out of order may lead to incorrect results or 
to processing too many invalid events. Mitchell 
addresses the challenge by detecting only one 
subclass of events in temporal order (wavelet 
contacts with polygon vertices) and repairing errors 
in the shortest path map structure as they are 
discovered. Hershberger and Suri achieve a better 
time bound (optimal O(n log n)) by processing 
events in an even more relaxed order [9]. The key 
to their approach is a subdivision of the free space 
in which spatial locality is used to bound the 
temporal inaccuracy of wavefront event processing. 

As a simple example, consider a wavefront 
propagating across an obstacle-free plane that has 
been subdivided into a grid of unit squares. Each 
edge e of the grid lies at the center of a 4 × 5 
rectangle of squares. The distance from e to each 
of the 18 edges on the rectangle boundary is at 
least 2. If the wavefront source is outside the 
rectangle, then the first wavelet that reaches any 
point p ∈ e must pass through the rectangle 
boundary at least two time units before it reaches 
p. By the triangle inequality, an edge of length 
ℓ is completely covered by the wavefront within 
time ℓ of the time the wavefront first hits it. It 
follows that if the shortest path to p ∈ e passes 
through an edge f on the rectangle boundary, 
edge f is completely covered by the wavefront at 
least one time unit before the wavefront reaches 
p. See Fig. 3. 

The algorithm propagates the wavefront from 
edge to edge in the grid. For each edge e, let 
input(e) be the edges on the boundary of the 
4x5 rectangle around e. The algorithm computes 
a cover time for e, denoted cover(e), that is an 
upper bound on the time when the wavefront 
completely covers e. If fvc(e) is the time at which 
the wavefront first contacts a vertex of e, and 
|e| is the length of e, then cover(e) is defined 
to be fvc(e) + |e|. For each edge e, cover(e) is 
determined by a wavefront passing through f ∈ 
input(e), and cover(f) < cover(e). 
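
The definition of the cover time can be checked numerically in the simplest setting. The sketch below assumes an obstacle-free plane, where the wavefront reaches a point at a time equal to its Euclidean distance from the source; the function names are hypothetical, and the sampling is only a sanity check of the triangle-inequality argument, not part of the algorithm of [9].

```python
from math import dist


def cover_time(source, edge):
    """cover(e) = fvc(e) + |e| for a wavefront spreading from `source`."""
    a, b = edge
    fvc = min(dist(source, a), dist(source, b))  # first vertex contact
    return fvc + dist(a, b)


def latest_arrival(source, edge, samples=100):
    """Latest time at which the wavefront reaches a sampled point of the edge."""
    a, b = edge
    points = [(a[0] + (b[0] - a[0]) * i / samples,
               a[1] + (b[1] - a[1]) * i / samples)
              for i in range(samples + 1)]
    return max(dist(source, p) for p in points)


source = (0.3, 0.2)
edge = ((3.0, 1.0), (3.0, 2.0))
upper = cover_time(source, edge)
actual = latest_arrival(source, edge)
```

In this setting `actual` never exceeds `upper`, which is exactly the sense in which cover(e) is an upper bound on the time when the wavefront completely covers e.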

The propagation algorithm processes edges in 
order of cover time, computing the wavefront 
at e by combining the wavefronts from edges 
f ∈ input(e) with cover(f) < cover(e). The 
combination algorithm is linear in the number 
of features of the shortest path map that lie 
inside the rectangle for e. The algorithm com- 
putes a one-dimensional representation of the 
intersection of the shortest path map with each 
edge of the grid; each bisector that has an event 
(an arc endpoint) within the input region of an 
edge e is flagged in the wavefront representation 
for e. To turn the one-dimensional wavefront 
representation at edges into a two-dimensional 
representation of the shortest path map, the algo- 
rithm combines the wavefronts of the edges on 
each cell’s boundary to compute the shortest path 
map inside the cell. (The algorithm computes 
additively weighted Voronoi diagrams [6] for the 
wavelet roots whose bisectors have events (end- 
points) in the cell, plus compact representations 
of the groups of bisectors that have no endpoints 
in the cell.) 

The key feature of the grid subdivision is the 
well-covering property: each edge e is surrounded 
by a region that is the union of O(1) cells, and 
the distance from e to the region boundary is 
relatively large. In particular, if f is an edge on 
the boundary, dist(e, f) ≥ 2 · max(|e|, |f|). This 
property allows the algorithm to perform spatial 
(not temporal) wavefront propagation at discrete 
cover times. Hershberger and Suri show how to 
extend the well-covering property to a special 
conforming subdivision of free space made up 
of O(n) constant-complexity cells. The wave- 
front propagation algorithm carries over from 
the grid to the conforming subdivision of free 
space with only a few changes to handle the 
obstacle vertices. As on the grid, the number of 
propagation steps and data structure changes is 
O(n). Including the overhead of a priority queue 
and data structure updates (full persistence is 
needed [5]) increases the time and space by a 
factor of O(log n), so the overall algorithm runs 
in O(n log n) time and space. 


Extensions 

Hershberger and Suri’s algorithm supports 
multiple wavefront sources, including line-segment 
sources. Hence the algorithm can be 
used to compute geodesic Voronoi diagrams, in- 
cluding Voronoi diagrams whose sites are points, 
segments, polygons, or combinations of all 
these. 

Since the publication of Hershberger and 
Suri’s optimal-time algorithm for shortest paths 
among polygonal obstacles, their result has been 
extended to other two-dimensional domains. 
Schreiber showed how to find shortest paths 
on the surface of a convex polyhedron in 
O(n logn) time [20]. His algorithm decomposes 
the surface into cells and then propagates 
wavefronts between cell edges similarly to 
Hershberger and Suri’s algorithm. Schreiber 
extended his algorithm for polyhedra to work 
for polygonal terrains as well, assuming that 
the maximum gradient of the terrain is bounded 
by a constant [19]. More recently, Hershberger, 
Suri, and Yildiz [10] extended the algorithm for 
polygonal obstacles [9] to find shortest paths in a 
free space bounded by curved obstacle edges. The 
conforming subdivision for the free space is very 
similar to that for polygonal obstacles; the chief 
difficulty is computing the positions of bisector 
events (intersections). Bisectors for polygonal 
obstacles are hyperbolic arcs, but they are much 
more complicated curves for curved obstacles. 
The algorithm of [10] approximates the bisector 
events using primitive tangent-finding operations 
on individual obstacle curves, with the result that 
the algorithm’s running time is O(n log(n/ε)), 
where ε is the relative error of the computed path 
length. 


Problem Definition 


Consider a set S of n points in d-dimensional 
Euclidean space. A network on S can be modeled 
as an undirected graph G with vertex set S of 
size n and an edge set E where every edge (u, v) 
has a weight. A geometric (Euclidean) network 
is a network where the weight of the edge (u, v) 
is the Euclidean distance |uv| between its end 
points. Given a real number t > 1, we say that 
G is a t-spanner for S, if for each pair of points 
u, v ∈ S, there exists a path in G of weight at 
most t times the Euclidean distance between u 
and v. The minimum t such that G is a t-spanner 
for S is called the stretch factor, or dilation, of G. 
For a detailed description of many constructions 
of t-spanners, see the book by Narasimhan and 
Smid [30]. The problem considered is the 
construction of t-spanners given a set S of n points 
in ℝ^d and a positive real value t > 1, where 
d is a constant. The aim is to compute a good 
t-spanner for S with respect to the following 
quality measures: 


size: the number of edges in the graph 

degree: the maximum number of edges incident 
on a vertex 

weight: the sum of the edge weights 

spanner diameter: the smallest integer k such 
that for any pair of vertices u and v in S, 
there is a path in the graph of length at most 
t -|uv| between u and v containing at most k 
edges 

fault tolerance: the resilience of the graph to 
edge, vertex, or region failures 


Thus, good t-spanners require large fault 
tolerance and small size, degree, weight, and 
spanner diameter. Additionally, the time required 
to compute such spanners must be as small as 
possible. 
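
The stretch factor defined above can be computed directly from its definition. The following sketch assumes a small set of distinct points and uses the Floyd-Warshall all-pairs shortest path computation, which takes cubic time; it is meant as a way to check small examples, not as an efficient algorithm. The function name and the representation of edges as index pairs are assumptions made for the illustration.

```python
from math import dist, inf


def stretch_factor(points, edges):
    """Smallest t such that the geometric graph (points, edges) is a t-spanner.

    `edges` is a list of index pairs (u, v); edge weights are Euclidean.
    Returns inf if the graph is disconnected.
    """
    n = len(points)
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v in edges:
        w = dist(points[u], points[v])
        d[u][v] = d[v][u] = min(d[u][v], w)
    # Floyd-Warshall: allow intermediate vertices 0..k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(d[i][j] / dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))


corner = [(0, 0), (1, 0), (1, 1)]
t_path = stretch_factor(corner, [(0, 1), (1, 2)])
```

For the two-edge path around a right angle, the worst pair is formed by the two endpoints: the path has length 2 while the points are at distance √2, so the stretch factor is √2.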


Key Results 


This section contains descriptions of several 
known approaches for constructing a t-spanner 
of a set of points in Euclidean space. We also 
present descriptions of the construction of 
fault-tolerant spanners, spanners among polygonal 
obstacles, and, finally, a short note on dynamic 
and kinetic spanners. 


Spanners of Points in Euclidean Space 

The most well-known classes of t-spanner 
networks for points in Euclidean space include 
Θ-graphs, WSPD graphs, and greedy spanners. 
In the following sections, the main idea of each 
of these classes is given, together with the known 
bounds on the quality measures. 


The Θ-Graph 

The Θ-graph was discovered independently by 
Clarkson and Keil in the late 1980s. The general 
idea is to process each point p ∈ S independently 
as follows: partition ℝ^d into k simplicial cones 
of angular diameter at most θ and apex at p, 
where $k = O(1/\theta^{d-1})$. For each nonempty cone 
C, an edge is added between p and the point in 
C whose orthogonal projection onto some fixed 
ray in C emanating from p is closest to p; see 
Fig. 1a. The resulting graph is called the Θ-graph 
on S. The following result is due to Arya 
et al. [9]. 


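Theorem 1 The Θ-graph is a t-spanner of S for 
$t = \frac{1}{\cos\theta - \sin\theta}$ with 
$O\!\left(\frac{n}{\theta^{d-1}}\right)$ edges and can 
be computed in 
$O\!\left(\frac{n}{\theta^{d-1}} \log^{d-1} n\right)$ time using 
$O\!\left(\frac{n}{\theta^{d-1}} + n \log^{d-2} n\right)$ space. 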
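
The cone-based construction can be sketched for the planar case d = 2. The code below is a brute-force illustration that runs in quadratic time, not the algorithm behind the time bound of Theorem 1. It assumes k equal cones of angle θ = 2π/k around each point and uses the bisector of each cone as the fixed ray; the function name is hypothetical.

```python
from math import atan2, cos, hypot, pi


def theta_graph(points, k):
    """Brute-force planar Theta-graph with k cones per point.

    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    theta = 2 * pi / k
    edges = set()
    for i, (px, py) in enumerate(points):
        best = {}  # cone index -> (projection length, point index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            angle = atan2(dy, dx) % (2 * pi)
            cone = min(int(angle / theta), k - 1)  # guard against rounding
            bisector = (cone + 0.5) * theta
            # Length of the orthogonal projection onto the cone's bisector.
            proj = hypot(dx, dy) * cos(angle - bisector)
            if cone not in best or proj < best[cone][0]:
                best[cone] = (proj, j)
        for _, j in best.values():
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


collinear = theta_graph([(0, 0), (1, 0), (2, 0)], 8)
```

Each point contributes at most k edges, so the graph has at most kn edges, matching the edge bound of Theorem 1 for d = 2. The stretch bound 1/(cos θ − sin θ) is meaningful only when θ < π/4, that is, for k ≥ 9.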


The following variants of the Θ-graph also 
give bounds on the degree, spanner diameter, and 
weight. 


Skip-List Spanners 

The idea is to generalize skip lists and apply 
them to the construction of spanners. Construct a 
sequence of h subsets, $S_1, \ldots, S_h$, where $S_1 = S$ 
and $S_i$ is constructed from $S_{i-1}$ as follows 
(reminiscent of the levels in a skip list). For each 
point in $S_{i-1}$, flip a fair coin. The set $S_i$ is the 
set of all points of $S_{i-1}$ whose coin flip produced 
heads. The construction stops if $S_i = \emptyset$. For each 
subset, a Θ-graph is constructed. The union of the 
graphs is the skip-list spanner of S with dilation 
t, having $O\!\left(\frac{n}{\theta^{d-1}}\right)$ edges and 
O(log n) spanner diameter with high probability [9]. 

Geometric Spanners, Fig. 1 (a) Illustrating the Θ-graph and (b) a graph with a region fault 


Gap Greedy 

A set of directed edges is said to satisfy the gap 
property if the sources of any two distinct edges 
in the set are separated by a distance that is at 
least proportional to the length of the shorter of 
the two edges. Arya and Smid [6] proposed an 
algorithm that uses the gap property to decide 
whether or not an edge should be added to the 
t-spanner graph. Using the gap property, the 
constructed spanner can be shown to have degree 
$O(1/\theta^{d-1})$ and weight O(log n · wt(MST(S))), 
where wt(MST(S)) is the weight of the minimum 
spanning tree of S. 


The WSPD Graph 

The well-separated pair decomposition (WSPD) 
was developed by Callahan and Kosaraju [12]. 
The construction of a t-spanner using the 
well-separated pair decomposition is done by first 
constructing a WSPD of S with respect to a 
separation constant s = 4(t + 1)/(t − 1). Initially set 
the spanner graph G = (S, ∅) and add edges 
iteratively as follows. For each well-separated 
pair {A, B} in the decomposition, an edge 
(a, b) is added to the graph, where a and b 
are arbitrary points in A and B, respectively. 
The resulting graph is called the WSPD graph 
on S. 


Theorem 2 The WSPD graph is a t-spanner for 
S with $O(s^d \cdot n)$ edges and can be constructed in 
time $O(s^d n + n \log n)$, where s = 4(t + 1)/(t − 1). 
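
A compact way to experiment with the WSPD graph is sketched below. It is a simplified stand-in for the construction of Callahan and Kosaraju [12]: sets are split recursively at the midpoint of the longest side of their bounding box, two sets count as well separated when the balls circumscribing their bounding boxes, enlarged to a common radius r, are at distance at least s · r, and no effort is made to reach the running time of Theorem 2. The sketch assumes distinct points that are not pathologically close in floating point; all names are hypothetical.

```python
from math import dist


def wspd_spanner(points, t):
    """Edges of a WSPD graph for separation constant s = 4(t + 1)/(t - 1)."""
    s = 4 * (t + 1) / (t - 1)
    dim = len(points[0])
    edges = []

    def box(idx):
        lo = [min(points[i][c] for i in idx) for c in range(dim)]
        hi = [max(points[i][c] for i in idx) for c in range(dim)]
        return lo, hi

    def ball(idx):
        """Center and radius of the ball circumscribing the bounding box."""
        lo, hi = box(idx)
        return [(a + b) / 2 for a, b in zip(lo, hi)], dist(lo, hi) / 2

    def split(idx):
        """Cut the bounding box in half across its longest side."""
        lo, hi = box(idx)
        c = max(range(dim), key=lambda a: hi[a] - lo[a])
        mid = (lo[c] + hi[c]) / 2
        return ([i for i in idx if points[i][c] <= mid],
                [i for i in idx if points[i][c] > mid])

    def find_pairs(a, b):
        (ca, ra), (cb, rb) = ball(a), ball(b)
        r = max(ra, rb)
        if dist(ca, cb) - 2 * r >= s * r:  # well separated
            edges.append((a[0], b[0]))     # arbitrary representatives
        elif ra >= rb:
            left, right = split(a)
            find_pairs(left, b)
            find_pairs(right, b)
        else:
            left, right = split(b)
            find_pairs(a, left)
            find_pairs(a, right)

    def build(idx):
        if len(idx) > 1:
            left, right = split(idx)
            build(left)
            build(right)
            find_pairs(left, right)

    build(list(range(len(points))))
    return edges


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_edges = wspd_spanner(unit_square, 2.0)
```

Every pair of input points is separated by exactly one well-separated pair, so each pair of points is connected either directly or through the representatives of the pair that separates them.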


There are modifications that can be made to 
obtain bounded spanner diameter or bounded 
degree. 

Bounded spanner diameter: Arya, Mount, and 
Smid [7] showed how to modify the construction 
algorithm such that the spanner diameter of the 
graph is bounded by 2 log n. Instead of selecting 
an arbitrary point in each well-separated set, their 
algorithm carefully chooses a representative point 
for each set. 


Bounded degree: A single point v can be part 
of many well-separated pairs, and each of these 
pairs may generate an edge with an end point at v. 
Arya et al. [8] suggested an algorithm that retains 
only the shortest edge for each cone direction, 
thus combining the Θ-graph approach with the 
WSPD graph. By adding a postprocessing step 
that handles all high-degree vertices, a t-spanner 
of degree $O\!\left(\frac{1}{(t-1)^{2d-1}}\right)$ is obtained. 


The Greedy Spanner 

The greedy algorithm was first presented in 1989 
by Bern, and since then, the greedy algorithm 
has been subject to considerable research. The 
graph constructed using the greedy algorithm is 
called a Greedy spanner, and the general idea is 
that the algorithm iteratively builds a graph G. 
The edges in the complete graph are processed 
in order of increasing edge length. Testing an 
edge (u,v) entails a shortest path query in the 
partial spanner graph G. If the shortest path in 
G between u and v is at most t · |uv|, then the 
edge (u, v) is discarded; otherwise, it is added to 
the partial spanner graph G. 
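
The greedy algorithm translates almost line by line into code. The sketch below is the straightforward version that runs one Dijkstra query per candidate edge, with the search cut off once distances exceed t · |uv|; it makes no attempt at the improved time or space bounds from the literature. The function names are hypothetical.

```python
import heapq
from itertools import combinations
from math import dist, inf


def greedy_spanner(points, t):
    """Edges (u, v) of the greedy t-spanner, in the order they are added."""
    n = len(points)
    adj = [[] for _ in range(n)]
    edges = []

    def graph_distance(src, dst, limit):
        """Shortest-path length in the partial spanner, or inf if above limit."""
        best = [inf] * n
        best[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > limit:
                return inf  # everything still queued is even farther away
            if u == dst:
                return d
            if d > best[u]:
                continue  # stale queue entry
            for v, w in adj[u]:
                if d + w < best[v]:
                    best[v] = d + w
                    heapq.heappush(heap, (best[v], v))
        return inf

    # Process all point pairs in order of increasing Euclidean length.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))
    for u, v in pairs:
        length = dist(points[u], points[v])
        if graph_distance(u, v, t * length) > t * length:
            adj[u].append((v, length))
            adj[v].append((u, length))
            edges.append((u, v))
    return edges


corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
loose = greedy_spanner(corners, 1.5)
tight = greedy_spanner(corners, 1.1)
```

On the four corners of a unit square, t = 1.5 keeps only the four sides, because each diagonal of length √2 is already approximated by a path of length 2. With t = 1.1 that path is too long and both diagonals are added.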

Das, Narasimhan, and Salowe [22] proved 
that the Greedy spanner fulfills the so-called 
leapfrog property. A set of undirected edges E 
is said to satisfy the t-leapfrog property, if for 
every k ≥ 2, and for every possible sequence 
$\{(p_1, q_1), \ldots, (p_k, q_k)\}$ of pairwise distinct 
edges of E, 

$$t \cdot |p_1 q_1| < \sum_{i=2}^{k} |p_i q_i| + t \cdot \left( \sum_{i=1}^{k-1} |q_i p_{i+1}| + |p_k q_1| \right).$$


Using the leapfrog property, it has been shown 
that the total edge weight of the graph is within 
a constant factor of the weight of a minimum 
spanning tree of S. 

Using Dijkstra’s shortest-path algorithm, 
the greedy spanner can be constructed in 
$O(n^3 \log n)$ time. Bose et al. [10] improved the 
time to $O(n^2 \log n)$, while using $O(n^2)$ space. 
Alewijnse et al. [4] improved the space bound to 
O(n), while slightly increasing the time bound to 
$O(n^2 \log^2 n)$. 

Das and Narasimhan [21] observed that an 
approximation of the greedy spanner can be con- 
structed while maintaining the leapfrog property. 
This observation allowed for faster construction 
algorithms. 


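Theorem 3 ([27]) The greedy spanner is a 
t-spanner of S with 
$O\!\left(\frac{n}{(t-1)^{d}} \log\frac{1}{t-1}\right)$ edges, 
maximum degree 
$O\!\left(\frac{1}{(t-1)^{d}} \log\frac{1}{t-1}\right)$, and 
weight $O\!\left(\frac{1}{(t-1)^{2d}} \cdot wt(MST(S))\right)$ and can be 
computed in time $O\!\left(\frac{n}{(t-1)^{2d}} \log n\right)$. 


The Transformation Technique 
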
Chandra et al. [16, 17] introduced a transfor- 
mation technique for general metrics that trans- 
forms an algorithm for constructing spanners 
with small stretch factor and size into an algo- 
rithm for constructing spanners with the same 
asymptotic stretch factor and size, but with the 
additional feature of small weight. Elkin and 
Solomon [24] refined their approach to develop 
a transformation technique that achieved the fol- 
lowing: It takes an algorithm for constructing 
spanners with small stretch factor, small size, 
small degree, and small spanner diameter and 
transforms it into an algorithm for constructing 
spanners with a small increase in stretch factor, 
size, degree, and spanner diameter, but that also 
has small weight and running time. 

Using the transformation technique allowed 
Elkin and Solomon to prove the following theo- 
rem. 


Theorem 4 ([24]) For any set of n points in
Euclidean space of any constant dimension d,
any ε > 0, and any parameter ρ ≥ 2, there exists
a (1 + ε)-spanner with O(n) edges, degree O(ρ),
spanner diameter O(log_ρ n + α(ρ)), and weight
O(ρ · log_ρ n · wt(MST)), which can be constructed
in time O(n log n).


Given the lower bounds proved by Chan and
Gupta [13] and Dinitz et al. [23], these results
represent optimal tradeoffs in the entire range of
the parameter ρ.


Fault-Tolerant Spanners 

The concept of fault-tolerant spanners was first 
introduced by Levcopoulos et al. [28] in 1998: 
After one or more vertices or edges fail, the span- 
ner should retain its good properties. In particular, 
there should still be a short path between any 
two vertices in what remains of the spanner after 
the fault. Czumaj and Zhao [19] showed that a
greedy approach produces a k-vertex (or k-edge)
fault-tolerant geometric t-spanner with degree
O(k) and total weight O(k^2 · wt(MST(S)));
these bounds are asymptotically optimal. Chan
et al. [15] used a “standard net-tree with
cross-edge framework” developed in [14, 26] to design
an algorithm that produces a k-vertex (or
k-edge) fault-tolerant geometric (1 + ε)-spanner
with degree O(k^2), diameter O(log n), and
total weight O(k^2 log n · wt(MST(S))). Such a
spanner can be constructed in O(n log n + k^2 n)
time.
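
The definition of vertex fault tolerance can be checked directly on small inputs. The following brute-force sketch removes every set of at most k vertices and verifies that the remaining graph is still a t-spanner of the remaining points. It runs in exponential time and is meant only to make the definition concrete; the function names and the tolerance parameter eps are assumptions of this sketch, not part of any cited algorithm.

```python
import heapq
import itertools
import math


def graph_distances(points, edges, alive, source):
    """Dijkstra distances from source in the subgraph induced by alive."""
    adj = {v: [] for v in alive}
    for u, v in edges:
        if u in alive and v in alive:
            w = math.dist(points[u], points[v])
            adj[u].append((v, w))
            adj[v].append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def is_k_vertex_fault_tolerant(points, edges, t, k, eps=1e-9):
    """True if, after deleting any set of at most k vertices, every surviving
    pair is joined by a path of length at most t times its distance."""
    n = len(points)
    for size in range(k + 1):
        for faults in itertools.combinations(range(n), size):
            alive = set(range(n)) - set(faults)
            for s in alive:
                dist = graph_distances(points, edges, alive, s)
                for v in alive:
                    bound = t * math.dist(points[s], points[v]) + eps
                    if dist.get(v, math.inf) > bound:
                        return False
    return True
```

On the four corners of a unit square, the boundary cycle is a 1-vertex fault-tolerant 1.5-spanner, but it is not 2-vertex fault tolerant, because deleting two opposite corners disconnects the other two.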

For geometric spanners, it is natural to con- 
sider region faults, i.e., faults that destroy all 
vertices and edges intersecting some geometric 
fault region. For a fault region F, let G ⊖ F be
the part of G that remains after the points from S
inside F and all edges that intersect F have
been removed from the graph; see Fig. 1b. Abam
et al. [2] showed how to construct region-fault
tolerant t-spanners of size O(n log n) that are
fault tolerant to any convex region fault. If one 
is allowed to use Steiner points, then a linear size 
t-spanner can be achieved. 


Spanners Among Obstacles 

The visibility graph of a set of pairwise non- 
intersecting polygons is a graph of intervisible 
locations. Each polygonal vertex is a vertex in 
the graph and each edge represents a visible 
connection between them, that is, if two vertices
can see each other, an edge is drawn between 
them. This graph is useful since it contains the 
shortest obstacle avoiding path between any pair 
of vertices. 

Das [20] showed that a t-spanner of the
visibility graph of a point set in the Euclidean
plane can be constructed by using the Θ-graph
approach followed by a pruning step.
The obtained graph has linear size and constant 
degree. 


Dynamic and Kinetic Spanners 
Arya et al. [9] designed a data structure of
size O(n log^d n) that maintains the skip-list
spanner, described in section “The Θ-Graph,”
in O(log^d n log log n) expected amortized time
per insertion and deletion in the model of random
updates.

Gao et al. [26] showed how to maintain a
t-spanner of size O(n/(t-1)^d) and maximum
degree O(log α / (t-1)^d), in time
O(log α / (t-1)^d) per
insertion and deletion, where α denotes the aspect
ratio of S, i.e., the ratio of the maximum pairwise
distance to the minimum pairwise distance. The
idea is to use a hierarchical structure T with
O(log α) levels, where each level contains a set
of centers (a subset of S). Each vertex v on level
i in T is connected by an edge to all other
vertices on level i within distance O(2^i/(t-1)) of
v. The resulting graph is a t-spanner of S, and it
can be maintained as stated above. The approach
can be generalized to the kinetic case so that
the total number of events in maintaining the
spanner is O(n^2 log n) under pseudo-algebraic
motion. Each event can be updated in
O(log α / (t-1)^d) time.

The problem of maintaining a spanner under 
insertions and deletions of points was settled by 
Gottlieb and Roditty [5]: For every set of n points
in a metric space of bounded doubling dimension, 
there exists a (1 + €)-spanner whose maximum 
degree is O(1) and that can be maintained under 
insertions and deletions of points, in O(logn) 
time per operation. 
Recently, several papers have considered the
kinetic version of the spanner construction
problem. Abam et al. [1, 3] gave the first data
structures for maintaining the Θ-graph, which were
later improved by Rahmati et al. [32]. Assuming
the trajectories of the points can be described
by polynomials whose degrees are at most a
constant s, the data structure uses O(n log^d n)
space and handles O(n^2) events with a total cost
of O(n λ_{2s+2}(n) log^{d+1} n), where λ_{2s+2}(n) is
the maximum length of Davenport-Schinzel
sequences of order 2s + 2 on n symbols. The kinetic
data structure is compact, efficient, responsive (in
an amortized sense), and local.


Applications 


The construction of sparse spanners has been 
shown to have numerous application areas 
such as metric space searching [31], which 
includes query by content in multimedia 
objects, text retrieval, pattern recognition, and 
function approximation. Another example is 
broadcasting in communication networks [29]. 
Several well-known theoretical results also use
the construction of t-spanners as a building
block. For example, Rao and Smith [33] made a
breakthrough by showing an optimal O(n log n)-
time approximation scheme for the well-known
Euclidean traveling salesperson problem, using 
t-spanners (or banyans). Similarly, Czumaj and 
Lingas [18] showed approximation schemes for 
minimum-cost multi-connectivity problems in 
geometric networks. 


Open Problems 
A few open problems are mentioned below: 


1. Determine if there exists a fault-tolerant t- 
spanner of linear size for convex region 
faults. 

2. Can the k-vertex fault-tolerant spanner be 
computed in O(n logn + kn) time? 
Experimental Results


The problem of constructing spanners has
received considerable attention from a theoretical
perspective but not much attention from a
practical or experimental perspective. Navarro and
Paredes [31] presented four heuristics for point
sets in high-dimensional space (d = 20) and
showed by empirical methods that the running
time was O(n^{2.24}) and the number of edges in the
produced graphs was O(n^{1.13}). Farshi and
Gudmundsson [25] performed a thorough comparison
of the construction algorithms discussed in
section “Spanners of Points in Euclidean Space.”
The results showed that the spanner produced by
the original greedy algorithm is superior to
the graphs produced by the other approaches
discussed in that section when it comes to number
of edges, maximum degree, and weight. However,
the greedy algorithm requires O(n^2 log n)
time [10] and uses quadratic space, which
restricted experiments in [25] to instances
containing at most 13,000 points. Alewijnse et al. [4]
showed how to reduce the space usage to linear
while paying only an additional O(log n) factor in the
running time. In their experiments, they could
handle more than a million points. In a follow-up
paper, Bouts et al. [11] gave further experimental
improvements.
Problem Definition


Given a graph G in which every edge has a
nonnegative capacity, the goal of the minimum-cut
problem is to find a subset of edges of G
with minimum total capacity whose deletion
disconnects G. The closely related minimum (s, t)-cut
problem further requires two specific vertices
s and t to be separated by the deleted edges.
Minimum cuts and their generalizations play a
central role in divide-and-conquer and network
optimization algorithms.

The fastest algorithms known for computing
minimum cuts in arbitrary graphs run in
roughly O(mn) time for graphs with n vertices
and m edges. However, even faster algorithms
are known for graphs with additional topological
structure. This entry sketches algorithms to
compute minimum cuts in near-linear time when
the input graph can be drawn on a surface with
bounded genus (informally, a sphere with a
bounded number of handles).


Problem 1 (Minimum (s,t)-Cut)

INPUT: An undirected graph G = (V, E) embedded
on an orientable surface of genus g, a
nonnegative capacity function c: E → R, and
two vertices s and t.
OUTPUT: A minimum-capacity (s, t)-cut in G.


Problem 2 (Global Minimum Cut)

INPUT: An undirected graph G = (V, E) embedded
on an orientable surface of genus g and
a nonnegative capacity function c: E → R.
OUTPUT: A minimum-capacity cut in G.


Key Results 


Topological Background 

A surface is a compact space in which each point 
has a neighborhood homeomorphic to either the 
plane or a closed half plane. Points with half- 
plane neighborhoods comprise the boundary of 
the surface, which is the union of disjoint simple 
cycles. The genus is the maximum number of 
disjoint simple cycles whose deletion leaves the 
surface connected. A surface is orientable if it 
does not contain a Möbius band. An embedding
is a drawing of a graph on a surface, with vertices 
drawn as distinct points and edges as simple 
interior-disjoint paths, whose complement is a 
collection of disjoint open disks called the faces 
of the embedding. 

An even subgraph of G is a subgraph in which
every vertex has even degree; each component
of an even subgraph is Eulerian. Two even
subgraphs of an embedded graph G are
Z2-homologous, or in the same Z2-homology class,
if their symmetric difference is the boundary
of a subset of the surface. If G is embedded
on a surface of genus g with b > 0 boundary
cycles, the even subgraphs of G fall into 2^{2g+b-1}
Z2-homology classes. An even subgraph of
G is Z2-minimal if it has minimum total cost
within its Z2-homology class. Each component
of a Z2-minimal even subgraph is itself
Z2-minimal.

Every embedded graph G has a dual graph 
G*, embedded on the same surface, whose 
vertices correspond to faces of G and whose 
edges correspond to pairs of faces that share an 
edge in G. The cost of a dual edge in G* is 
the capacity of the corresponding primal edge 
in G. 

Duality maps cuts to certain sets of cycles and
vice versa. For example, the minimum-capacity
(s, t)-cut in any planar graph G is dual to the
minimum-cost cycle in G* that separates the
dual faces s* and t*. If we remove s* and t*
from the sphere, the dual of the minimum cut
is the shortest generating cycle of the resulting
annulus [11, 15]. More generally, let X denote
the set of edges that cross some minimum (s, t)-cut
in an embedded graph G, and let X* denote
the corresponding subgraph of the dual graph G*.
Then X* is a minimum-cost even subgraph of G*
that separates s* and t*. If we remove s* and
t* from the surface, X* becomes a Z2-minimal
subgraph homologous with the boundary of s*.


Crossing Sequences 

Our first algorithm [6] reduces computing a
minimum (s, t)-cut in a graph embedded on a
genus-g surface to g^{O(g)} instances of the planar
minimum-cut problem.

The algorithm begins by constructing a
collection A of 2g + 1 paths in G*, called a greedy
system of arcs, with three important properties.
First, the endpoints of each path are incident to
the boundary faces s* and t*. Second, each path
is composed of two shortest paths plus at most
one additional edge. Finally, the complement
Σ \ A of the paths is a topological open disk. A
greedy system of arcs can be computed in O(gn)
time [5, 9].

We regard each component of X* as a closed
walk, and we enumerate all possible sequences
of crossings between the components of X* with
the arcs in A. The components of any Z2-minimal
even subgraph cross any shortest path, and
therefore any arc in A, at most O(g) times. It follows
that we need to consider at most g^{O(g)} crossing
sequences, each of length at most O(g^2).
Following Kutz [13], the shortest closed walk with a
given crossing sequence is the shortest generating
cycle in an annulus obtained by gluing together
O(g^2) copies of the disk Σ \ A, which can be
computed in O(g^2 n log log n) time using the
planar minimum-cut algorithm of Italiano et al. [12].
The overall running time of this algorithm is
g^{O(g)} n log log n.

Surprisingly, a reduction from MAXCUT im- 
plies that finding the minimum-cost even sub- 
graph in an arbitrary Z2-homology class is NP- 
hard. Different reductions imply that it is NP-hard 
to find the minimum-cost closed walk [5] or 
simple cycle [2] in a given Z2-homology class. 


Z2-Homology Cover 

Our second algorithm [9] finds the minimum- 
cost closed walks in G* in every Z2-homology 
class by searching a certain covering space and 
then assembles X* from these closed walks via 
dynamic programming. 

As in our first algorithm, we first compute a
greedy system of arcs A. The homology class of
any cycle γ is determined by the parity of the
number of crossings of γ with each arc in A. Each
arc a_i ∈ A appears as two paths a_i^+ and a_i^- on
the boundary of the disk D = Σ \ A.

We then construct a new surface Σ̄, called
the Z2-homology cover of Σ, by gluing together
several copies of D as follows. We associate each
homology class h ∈ (Z2)^{2g+1} with a vector of
2g + 1 bits. Let h∧i denote the bit vector obtained
from h by flipping its ith bit. For each bit vector
h, we construct a copy D_h of D; let a_{i,h}^+ and a_{i,h}^-
denote the copies of a_i^+ and a_i^- on the boundary
of D_h. Finally, we construct Σ̄ by identifying the
paths a_{i,h}^+ and a_{i,h∧i}^- for each homology class h
and index i.

This construction also yields a graph Ḡ
embedded in Σ̄, with 2^{2g+1} vertices v_h and edges e_h
for each vertex v and edge e of G*. Each edge e_h
of Ḡ inherits the cost of the corresponding edge e
in G*. Any walk in Ḡ projects to a walk in G* by
dropping subscripts; in particular, any walk in Ḡ
from v_0 to v_h projects to a closed walk in G* with
homology class h that starts and ends at vertex v.
Conversely, the shortest closed walk in G* in any
homology class h is the projection of the shortest
path from v_0 to v_h for some vertex v.
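
The correspondence between closed walks in a given homology class and paths in the cover can be illustrated with a small sketch. It assumes a hypothetical input format in which each dual edge is given as (u, v, cost, mask), where bit i of mask is set when the edge crosses arc a_i; computing these masks from an embedding is not shown. The sketch builds the lifted graph implicitly and runs plain Dijkstra from every vertex, so it is slower than the algorithm described in this entry and is intended only to show the lifting idea.

```python
import heapq
import math


def lifted_distances(edges, source):
    """Dijkstra in the cover starting at the lifted vertex (source, 0).
    Returns a dict mapping (vertex, bit vector) to its distance."""
    adj = {}
    for u, v, cost, mask in edges:
        adj.setdefault(u, []).append((v, cost, mask))
        adj.setdefault(v, []).append((u, cost, mask))
    dist = {(source, 0): 0.0}
    heap = [(0.0, source, 0)]
    while heap:
        d, u, h = heapq.heappop(heap)
        if d > dist[(u, h)]:
            continue  # stale heap entry
        for v, cost, mask in adj.get(u, []):
            # Crossing arc i flips bit i of the homology signature.
            key = (v, h ^ mask)
            nd = d + cost
            if nd < dist.get(key, math.inf):
                dist[key] = nd
                heapq.heappush(heap, (nd, v, h ^ mask))
    return dist


def shortest_closed_walks(vertices, edges):
    """Length of the shortest closed walk for every reachable bit vector h,
    taken over all possible base vertices v."""
    best = {}
    for v in vertices:
        for (u, h), d in lifted_distances(edges, v).items():
            if u == v and d < best.get(h, math.inf):
                best[h] = d
    return best
```

For example, on a unit-cost 4-cycle a, b, c, d in which edge (d, a) crosses arc 0, plus a chord (a, c) crossing arc 1, the shortest closed walk with signature 01 is the 4-cycle itself, while signatures 10 and 11 are realized by the two triangles through the chord.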

Any cycle in any nontrivial homology class
crosses some path a_i an odd number of times,
and therefore at least once. To find all such cycles
for each index i, we slice Ḡ along the lifted path
a_{i,0} to obtain a new boundary cycle, and then
compute the shortest path from each vertex v_0 on
this cycle to every other vertex v_h in 2^{O(g)} n log n
time, using an algorithm of Chambers et al. [3].
Altogether, we find the shortest closed walk in
every Z2-homology class in 2^{O(g)} n log n time.
The dual minimum cut X* can then be built from
these Z2-minimal cycles in 2^{O(g)} additional time
via dynamic programming.


Global Minimum Cuts 

Our final result generalizes the recent
O(n log log n)-time algorithm for planar graphs by Łącki
and Sankowski [14], which in turn relies on the
O(n log log n)-time algorithm for planar
minimum (s, t)-cuts of Italiano et al. [12].

The global minimum cut X in a surface graph 
G is dual to the minimum-cost nonempty sep- 
arating subgraph of the dual graph G*. In par- 
ticular, if G is planar, X is dual to the shortest 
nonempty cycle in G*. There are two cases to 
consider: either X* is a simple contractible cycle, 
or it isn’t. We describe two algorithms, one of 
which is guaranteed to return the minimum-cost 
separating subgraph. 

To handle the contractible cycle case, we
slice the surface Σ to make it planar, first along
the shortest non-separating cycle α in G*, which
we compute in g^{O(g)} n log log n time using a
variant of our crossing sequence algorithm, and
then along a greedy system of arcs A connecting
the resulting boundary cycles. Call the resulting
planar graph D; each edge of α ∪ A appears as
two edges on the boundary of D. Let e+ and e−
be edges on the boundary of D corresponding to
some edge e of α. Using the planar algorithm of
Łącki and Sankowski [14], we find the shortest
cycle γ+ in D \ e+ and the shortest cycle γ− in
D \ e−. The shorter of these two cycles projects
to a closed walk γ in the original dual graph G*.
Results of Cabello [2] imply that if γ is a simple
cycle, it is the shortest contractible simple cycle
in G*; otherwise, X* is not a simple cycle.

Our second algorithm begins by enumerating
all 2^{O(g)} Z2-minimal even subgraphs in G*
in g^{O(g)} n log log n time, using our crossing
sequence algorithm. Our algorithm marks the faces
on either side of an arbitrary edge of each
Z2-minimal even subgraph in G*. If X* is not
a simple contractible cycle, then some pair of
marked faces must be separated by X*. In other
words, in g^{O(g)} n log log n time, we identify a set
T of 2^{O(g)} vertices of G, at least two of which
are separated by the global minimum cut. Thus,
if we fix an arbitrary source vertex s ∈ T and compute
the minimum (s, t)-cut for each vertex t ∈ T in
g^{O(g)} n log log n time, the smallest such cut is the
global minimum cut X.


Open Problems 


Extending these algorithms to directed surface
graphs remains an interesting open problem;
currently the only effective approach known
is to compute a maximum (s, t)-flow and
apply the maxflow-mincut theorem. The recent
algorithm of Borradaile and Klein [1] computes
maximum flows in directed planar graphs
in O(n log n) time. For higher-genus graphs,
Chambers et al. [7] describe maximum-flow
algorithms that run in g^{O(g)} n^{3/2} time for
arbitrary capacities and in O(g^8 n log^2 n log^2 C)
time for integer capacities that sum to C.

Another open problem is reducing the de- 
pendencies on the genus from exponential to 
polynomial. Even though there are near-quadratic 
algorithms to compute minimum cuts, the only 
known approach to achieving near-linear time for 
bounded-genus graphs with weighted edges is to 
solve an NP-hard problem. 

Finally, it is natural to ask whether minimum 
cuts can be computed quickly in other minor- 
closed families of graphs, for which embeddings 
on to bounded-genus surfaces may not exist. Such 
results already exist for one-crossing-minor-free 
families [4] and in particular, graphs of bounded 
treewidth [10]. 
Problem Definition


Global routing is a key step in VLSI physical 
design after floor planning and placement. Its 
main goal is to reduce the overall routing com- 
plexity and guide the detailed router by planning 
the approximate routing path of each net. The 
commonly used objectives during global routing 
include minimizing total wirelength, mitigating 
routing congestion, or meeting routing resource 
constraints. If timing critical paths are known, 
they can also be put in the design objectives 
during global routing, along with other metrics 
such as manufacturability and noise. 

The global routing problem can be formulated
using graph models. For a given netlist graph
G(C, N), vertices C represent pins on placed
objects such as standard cells or IP blocks, and
edges N represent nets connecting the pins. The
routing resources on a chip can be modeled in
another graph G(V, E) by dividing the entire
global routing region into a set of smaller
regions, so-called global routing cells (G-cells),
where v ∈ V represents a G-cell and e ∈ E
represents the boundary between two adjacent
G-cells with a given routing capacity (c_e).
Figure 1 shows how the chip can be abstracted into
a 2-dimensional global routing graph. Such an
abstraction can be easily extended to a 3-dimensional
global routing graph to perform layer
assignment (e.g., [15]). Since all standard cells and
IP blocks are placed before the global routing
stage (e.g., C can be mapped into V), the goal of
global routing is to find G-cell to G-cell paths
for N while trying to meet certain objectives
such as routability optimization and wirelength
minimization.

A straightforward mathematical optimization
for global routing can be formulated as a 0-1
integer linear program (ILP). Let R_i be a
set of Steiner trees on G for net n_i ∈ N, and
x_{i,j} be the binary variable to indicate whether
r_{i,j} ∈ R_i is selected as the routing solution. Then
an example ILP formulation can be written as
follows:

The above formulation minimizes the total 
routing capacity utilization under the maximum 
routing capacity constraint for each edge e € 
E. In fact, minimizing the total routing capac- 
ity is equivalent to minimizing total wirelength, 
because a unit wirelength in the global routing 
utilizes one routing resource (i.e., crossing the 
boundary between two adjacent G-cells). Other 
objectives/constraints can include timing opti- 
mization, noise reduction [10, 14], or manufac- 
turability (e.g., CMP) [8]. 
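
The selection problem described above can be made concrete with a small exhaustive sketch. It is consistent with the prose description (pick one candidate tree per net, respect each edge capacity, minimize total capacity usage) but it is not the formulation of Fig. 2 itself, and enumeration is only feasible for toy instances. The data layout (a dict of candidate routes per net, each route a tuple of edge names) and the function name are assumptions of this sketch.

```python
import itertools
from collections import Counter


def route_by_enumeration(candidates, capacity):
    """candidates: {net: [route, ...]}, each route a tuple of edge names.
    capacity: {edge name: maximum number of routes crossing that edge}.
    Returns (total usage, {net: chosen route index}) or (None, None)."""
    nets = sorted(candidates)
    best_cost, best_choice = None, None
    index_ranges = [range(len(candidates[net])) for net in nets]
    for choice in itertools.product(*index_ranges):
        # Exactly one candidate per net is selected (x_{i,j} = 1 for one j).
        usage = Counter()
        for net, j in zip(nets, choice):
            usage.update(candidates[net][j])
        # Capacity constraint on every G-cell boundary that is used.
        if any(usage[e] > capacity.get(e, 0) for e in usage):
            continue
        cost = sum(usage.values())  # total routing capacity utilization
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, dict(zip(nets, choice))
    return best_cost, best_choice
```

With four G-cells A, B, C, D arranged in a 2 x 2 grid, unit capacities, a net from A to D and a net from A to B, the only feasible selection routes the first net through C and the second directly across the A-B boundary, for a total usage of 3.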


Key Results 


The straightforward formulation using ILP, e.g.,
as in Fig. 2, is NP-complete and cannot be
solved efficiently for modern VLSI designs. One
common technique to solve an ILP is to use linear
programming relaxation, where the binary
variables are made continuous, x_{i,j} ∈ [0, 1], so
that the problem can be solved in polynomial time.
Once a linear programming solution is obtained, a
rounding technique is used to find the binary solution.
Another technique is a hierarchical divide-and-conquer
scheme to limit the complexity, which
solves many independent subproblems of similar
sizes. These approaches may suffer from a large
amount of rounding error or a lack of interaction
between subproblems, resulting in poor quality.

Global Routing, Fig. 1 Graph model for global
routing. (a) Real circuit with G-cells. (b) Global
routing graph

Global Routing, Fig. 2 An example of global routing
formulation using ILP

BoxRouter [1, 6] proposed a new approach 
to divide the entire routing region into a set 
of synergistic subregions. The key idea in 
BoxRouter is the progressive ILP based on the 
routing box expansion, which pushes congestion 
outward progressively from the highly congested 
region. Unlike conventional hierarchical divide- 
and-conquer approach, BoxRouter solves a 
sequence of ILP problems where an early 
problem is a subset of a later problem. BoxRouter 
progressively applies box expansion to build a 
sequence of ILP problems starting from the most 
congested region which is obtained through a 
very fast pre-routing stage. Figure 3 illustrates 
the concept of box expansion. 

The advantage of BoxRouter over the conventional
approach is that each problem synergically
reflects the decisions made so far by taking
the previous solutions as constraints, in order
to enhance congestion distribution and shorten
the wirelength. In that sense, the first ILP
problem has the largest flexibility, which motivates
the box expansion originating from the most
congested region. Even though the last box can
cover the whole design, the effective ILP size
remains tractable in BoxRouter, as ILP is only
performed on the wires between two subsequent
boxes.

Global Routing, Fig. 3 Box expansion for Progressive
ILP during BoxRouter

Compared with the formulation in Fig. 2,
which directly minimizes the total wirelength,
progressive ILP in BoxRouter maximizes the
completion rate (e.g., minimizing unrouted nets),
which can be more important and practical
than minimizing total wirelength, as shown in
Fig. 4. Wirelength minimization in BoxRouter
is indirectly achieved by allowing only the
minimum rectilinear Steiner trees for each binary
variable (i.e., x_{i,j}) and being augmented with
the post-maze routing step. Such a change in the
objective function provides higher computation
efficiency: it is found that the BoxRouter
formulation can be solved significantly faster
than the traditional formulations due to its simple
and well-exploited knapsack structure [7].

Global Routing, Fig. 4 Global routing formulation in
BoxRouter for minimal unrouted nets

In case some nets remain unrouted after each 
ILP problem either due to insufficient routing 
resources inside a box or a limited number 
of Steiner graphs for each net, BoxRouter 
applies adaptive maze routing which penalizes 
using routing resources outside the current 
box, in order to reserve them for subsequent
problems. Based on the new ILP techniques, 
BoxRouter has obtained much better results than 
previous state-of-the-art global routers [4,9] and 
motivated many further studies in global routing 
(e.g., [5, 11-13, 15]) and global routing contests 
at ISPD 2007 and ISPD 2008 [2, 3]. 
Problem Definition


Let G = (V, E) be an undirected graph with
|V| = n and |E| = m. The edge connectivity
of two vertices s, t ∈ V, denoted by λ(s,t),
is defined as the size of the smallest cut that
separates s and t; such a cut is called a minimum
s-t cut. Clearly, one can represent the λ(s,t)
values for all pairs of vertices s and t in a table of
size O(n^2). However, for reasons of efficiency,
one would like to represent all the λ(s,t) values
in a more succinct manner. Gomory-Hu trees
(also known as cut trees) offer one such succinct
representation of linear (i.e., O(n)) space
and constant (i.e., O(1)) lookup time. It has the
additional advantage that apart from representing
all the λ(s,t) values, it also contains structural
information from which a minimum s-t cut can
be retrieved easily for any pair of vertices s and t.
Formally, a Gomory-Hu tree T = (V, F) of an
undirected graph G = (V, E) is a weighted
undirected tree defined on the vertices of the graph
such that the following properties are satisfied:


• For any pair of vertices s, t ∈ V, λ(s,t) is
  equal to the minimum weight on an edge in
  the unique path connecting s to t in T. Call
  this edge e(s,t). If there are multiple edges
  with the minimum weight on the s to t path
  in T, any one of these edges is designated as
  e(s,t).

• For any pair of vertices s and t, the bipartition
  of vertices into components produced by
  removing e(s,t) (if there are multiple candidates
  for e(s,t), this property holds for each
  candidate edge) from T corresponds to a
  minimum s-t cut in the original graph G.


To understand this definition better, consider the
following example. Figure 1 shows an undirected
graph and a corresponding Gomory-Hu tree.
Focus on a pair of vertices, for instance, 3 and 5.
Clearly, the edge (6,5) of weight 3 is a
minimum-weight edge on the 3 to 5 path in the
Gomory-Hu tree. It is easy to see that λ(3,5) = 3
in the original graph. Moreover, removing edge
(6,5) in the Gomory-Hu tree produces the vertex
bipartition ({1,2,3,6},{4,5}), which is a cut of
size 3 in the original graph.

Gomory-Hu Trees, Fig. 1 An undirected graph (left)
and a corresponding Gomory-Hu tree (right)
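
The two defining properties translate into a simple query procedure on the tree. The sketch below walks the tree path from s to t, picks a minimum-weight edge on it, and returns both the weight and the bipartition obtained by deleting that edge. The tree used in the accompanying check is hypothetical: it is chosen to agree with the values quoted in the example (edge (6,5) of weight 3 and the bipartition ({1,2,3,6},{4,5})), but it is not claimed to be the tree drawn in Fig. 1. This traversal takes time linear in the tree size; constant-time lookups need additional preprocessing that is not shown.

```python
def gomory_hu_query(tree_edges, s, t):
    """tree_edges: list of (u, v, weight) forming a tree; requires s != t.
    Returns (lambda(s, t), side containing t, side containing s)."""
    adj = {}
    for u, v, w in tree_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    # Depth-first search from s, remembering how each vertex was reached.
    parent = {s: None}
    stack = [s]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in parent:
                parent[v] = (u, w)
                stack.append(v)
    # Walk back from t to s and keep a minimum-weight edge e(s, t).
    best = None
    x = t
    while parent[x] is not None:
        u, w = parent[x]
        if best is None or w < best[0]:
            best = (w, u, x)
        x = u
    weight, a, b = best
    # Collect the component of b after deleting the tree edge (a, b).
    side = {b}
    stack = [b]
    while stack:
        u = stack.pop()
        for v, _ in adj[u]:
            if v not in side and {u, v} != {a, b}:
                side.add(v)
                stack.append(v)
    return weight, side, set(adj) - side
```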

It is not immediate that such Gomory-Hu trees 
exist for all undirected graphs. In a classical result 
in 1961, Gomory and Hu [8] showed that not 
only do such trees exist for all undirected graphs 
but that they can also be computed using n — 1 
minimum s-t cut (or equivalently maximum s- 
t flow) computations. In fact, a graph can have 
multiple Gomory-Hu trees. 

All previous algorithms for constructing 
Gomory-Hu trees for undirected graphs used 
maximum-flow subroutines. Gomory and Hu 
gave an algorithm to compute a cut tree T using 
n — 1 maximum-flow computations and graph 
contractions. Gusfield [9] proposed an algorithm 
that does not use graph contractions; all n — 1 
maximum-flow computations are performed on 
the input graph. Goldberg and Tsioutsiouliklis 
[7] did an experimental study of the algorithms 
due to Gomory and Hu and due to Gusfield 
for the cut tree problem and described efficient 
implementations of these algorithms. Benczúr [1]
gave examples showing that cut trees need not
exist for directed graphs.
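
A contraction-free scheme in the spirit of Gusfield's approach can be sketched as follows. To the best of our understanding, this simplified loop corresponds to the flow-equivalent-tree variant: the minimum edge weight on the tree path between s and t equals λ(s,t), but the sketch does not attempt to guarantee the second (cut) property of a Gomory-Hu tree. The maximum-flow routine is a plain Edmonds-Karp implementation chosen for brevity, not one of the faster algorithms cited in this entry, and all function names are assumptions of the sketch.

```python
from collections import deque


def min_cut(n, edges, s, t):
    """Edmonds-Karp on an undirected capacitated graph with vertices 0..n-1.
    Returns (cut value, set of vertices on the s side of a minimum cut)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
        cap[v][u] += c
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: vertices reached from s form one side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def flow_equivalent_tree(n, edges):
    """n - 1 minimum-cut computations, all on the input graph.
    Returns tree edges (s, parent of s, weight)."""
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        weight[s], side = min_cut(n, edges, s, t)
        # Later vertices that fall on the s side and hang off t move under s.
        for v in range(s + 1, n):
            if v in side and parent[v] == t:
                parent[v] = s
    return [(s, parent[s], weight[s]) for s in range(1, n)]
```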

Any maximum-flow-based approach for
constructing a Gomory-Hu tree would have a running
time of (n − 1) times the time for computing
a single maximum flow. Till now, faster
algorithms for Gomory-Hu trees were by-products
of faster algorithms for computing a maximum
flow. The current fastest Õ(m + nλ(s,t)) (polylog
n factors ignored in Õ notation) maximum-flow
algorithm, due to Karger and Levine [11], yields
the current best expected running time of Õ(n^3)
for Gomory-Hu tree construction on simple
unweighted graphs with n vertices. Bhalgat et al.
[2] improved this time complexity to Õ(mn).
Note that both Karger and Levine’s algorithm
and Bhalgat et al.’s algorithm are randomized
Las Vegas algorithms. The fastest deterministic
algorithm for the Gomory-Hu tree construction
problem is a by-product of Goldberg and Rao’s
maximum-flow algorithm [6] and has a running
time of Õ(nm^{1/2} min(m, n^{3/2})).

Since the publication of the results of Bhalgat 
et al. [2], it has been observed that the maximum- 
flow subroutine of Karger and Levine [11] can 
also be used to obtain an O(mn) time Las Vegas 
algorithm for constructing the Gomory-Hu tree 
of an unweighted graph. However, this algorithm 
does not yield partial Gomory-Hu trees which are 
defined below. For planar undirected graphs, Bor- 
radaile et al. [3] gave an O(mn) time algorithm 
for constructing a Gomory-Hu tree. 

It is important to note that in spite of the
tremendous recent progress in approximate
maximum s-t flow (or approximate minimum s-t cut)
computation, this does not immediately translate
to an improved algorithm for approximate
Gomory-Hu tree construction. This is because of
two reasons: first, the property of uncrossability
of minimum s-t cuts used by Gomory and Hu in
their minimum s-t cut based cut tree construction
algorithm does not hold for approximate
minimum s-t cuts, and second, the errors introduced
in individual minimum s-t cut computations can
add up to create large errors in the Gomory-Hu
tree.


Key Results 


Bhalgat et al. [2] considered the problem of
designing an efficient algorithm for constructing
a Gomory-Hu tree on unweighted undirected
graphs. The main theorem shown in this entry is
the following.


Theorem 1 Let G = (V, E) be a simple
unweighted graph with m edges and n vertices.
Then a Gomory-Hu tree for G can be built in
expected time Õ(mn).


Their algorithm is always faster by a factor of
Ω̃(n^{2/9}) (polylog n factors ignored in Ω̃
notation) compared to the previous best algorithm.

Instead of using maximum-flow subroutines,
they use a Steiner connectivity algorithm. The
Steiner connectivity of a set of vertices S (called
the Steiner set) in an undirected graph is the
minimum size of a cut which splits S into two
parts; such a cut is called a minimum Steiner cut.
Generalizing a tree-packing algorithm given by
Gabow [5] for finding the edge connectivity of a
graph, Cole and Hariharan [4] gave an algorithm
for finding the Steiner connectivity k of a set of
vertices in either undirected or directed Eulerian
unweighted graphs in Õ(mk^2) time. (For
undirected graphs, their algorithm runs a little faster
in time Õ(m + nk^3).) Bhalgat et al. improved
this result and gave the following theorem.


Theorem 2 In an undirected or directed
Eulerian unweighted graph, the Steiner connectivity
k of a set of vertices can be determined in time
Õ(mk).


The algorithm in [4] was used by Hariharan et al. [10] to design an
algorithm with expected running time $\tilde{O}(m + nk^3)$ to compute
a partial Gomory-Hu tree for representing the $\lambda(s,t)$ values
for all pairs of vertices s, t that satisfy $\lambda(s,t) \le k$.
Replacing the algorithm in [4] by the new algorithm for computing
Steiner connectivity yields an algorithm to compute a partial
Gomory-Hu tree in expected running time $\tilde{O}(m + nk^2)$.
Bhalgat et al. showed that, using a more detailed analysis, this
result can be improved to give the following theorem.


Theorem 3 The partial Gomory-Hu tree of an undirected unweighted
graph to represent all $\lambda(s,t)$ values not exceeding k can be
constructed in expected time $\tilde{O}(mk)$.


Since $\lambda(s,t) < n$ for all vertex pairs s, t in an unweighted
(and simple) graph, setting k to n in Theorem 3 implies Theorem 1.


Applications 


Gomory-Hu trees have many applications in mul- 
titerminal network flows and are an important 
data structure in graph connectivity literature. 


Open Problems 

The problem of derandomizing the algorithm due to Bhalgat et al. [2]
to produce an $\tilde{O}(mn)$ time deterministic algorithm for
constructing Gomory-Hu trees for unweighted undirected graphs remains
open. The other main challenge is to extend the results in [2] to
weighted graphs.


Experimental Results 


Goldberg and Tsioutsiouliklis [7] did an exten- 
sive experimental study of the cut tree algorithms 
due to Gomory and Hu [8] and that due to 
Gusfield [9]. They showed how to efficiently 
implement these algorithms and also introduced 
and evaluated heuristics for speeding up the algo- 
rithms. Their general observation was that while 
Gusfield’s algorithm is faster in many situations, 
Gomory and Hu’s algorithm is more robust. For 
more detailed results of their experiments, refer 
to [7]. 

No experimental results are reported for the
algorithm due to Bhalgat et al. [2].


Problem Definition


Given an input string S, the grammar-based compression problem is to
find a small description of S that is based on a deterministic
context-free grammar that generates a language consisting only of S.
We will call such a context-free grammar a grammar that represents S.

Grammar Compression, Fig. 1 (Left) an SLP G that represents string
aabaaaaaba, where the variable $X_7$ is the start symbol. (Center)
derivation tree of G. (Right) an ordered DAG corresponding to G,
where the solid and dashed edges respectively correspond to the first
and second child of each node

Generally, grammar-based compression can be divided into two phases
[8]: the grammar transform phase, where a context-free grammar G that
represents string S is computed, and the grammar encoding phase,
where an encoding for G is computed. Kieffer and Yang [8] showed that
if a grammar transform is irreducible, namely, if the resulting
grammar that represents S satisfies the following three conditions:
(1) distinct variables derive different strings; (2) every variable
other than the start symbol is used more than once (rule utility);
and (3) all pairs of symbols have at most one nonoverlapping
occurrence in the right-hand side of production rules (di-gram
uniqueness), then the grammar-based code using a zero-order
arithmetic code for encoding the grammar is universal.

Grammar-based compression algorithms differ mostly by how they
perform the grammar transform, which can be stated as the following
problem.


Problem 1 (Smallest Grammar Problem) Given an input string S of
length N, output the smallest context-free grammar that represents S.


Here, the size of the grammar is defined as the 
total length of the right-hand side of the produc- 
tion rules in the grammar. Often, grammars are 
considered to be in the Chomsky normal form, 
in which case the grammar is called a straight 
line program (SLP) [7], i.e., the right-hand side of 
each production rule is either a terminal character 
or a pair of variables. Note that any grammar 
of size n can be converted into an SLP of size 
O(n). Figure 1 shows an example of an SLP 
that represents string aabaaaaaba. A grammar 
representing a string can be considered as an or- 
dered directed acyclic graph. Another important 
feature is that grammars allow for exponential
compression, that is, the size of a grammar that
represents a string of length N can be as small as
O(log N).
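The definitions above can be illustrated with a minimal Python sketch. The rule set below is a hypothetical SLP for aabaaaaaba, chosen for illustration; it need not coincide with the grammar drawn in Figure 1. The helper names (`expand`, `grammar_size`, `power_grammar`) are assumptions of this sketch.

```python
# Hypothetical SLP for "aabaaaaaba": each variable maps either to a terminal
# character or to a pair of variables (Chomsky normal form).
RULES = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X1"),  # aa
    "X4": ("X2", "X1"),  # ba
    "X5": ("X3", "X4"),  # aaba
    "X6": ("X5", "X3"),  # aabaaa
    "X7": ("X6", "X5"),  # aabaaaaaba (start symbol)
}


def expand(var, rules, memo=None):
    """Return the string derived by `var`, memoizing shared variables."""
    if memo is None:
        memo = {}
    if var not in memo:
        rhs = rules[var]
        if isinstance(rhs, str):
            memo[var] = rhs
        else:
            left, right = rhs
            memo[var] = expand(left, rules, memo) + expand(right, rules, memo)
    return memo[var]


def grammar_size(rules):
    """Total length of the right-hand sides of all production rules."""
    return sum(1 if isinstance(rhs, str) else len(rhs) for rhs in rules.values())


def power_grammar(k):
    """SLP with k + 1 variables deriving the string a^(2^k)."""
    rules = {"Y0": "a"}
    for i in range(1, k + 1):
        rules[f"Y{i}"] = (f"Y{i - 1}", f"Y{i - 1}")
    return rules, f"Y{k}"
```

The doubling grammar returned by `power_grammar` shows the exponential gap mentioned above: its size grows linearly in k while the derived string has length 2^k.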

Grammar-based compression is known to 
be especially suitable for compressing highly 
repetitive strings, for example, multiple whole 
genomes, where, although each individual string 
may not be easily compressible, the ensemble of 
strings is very compressible since the strings are
very similar to one another. Also, due to its ease of
manipulation, grammar-based representation of
strings is a frequently used model for compressed
string processing, where the aim is to efficiently
process compressed strings without explicit
decompression. Such an approach allows for
theoretical and practical speedups compared to a
naive decompress-then-process approach.


Key Results


Hardness 

The smallest grammar problem is known to be NP-hard [21]. The
approximation ratio of a grammar-based algorithm A is defined as
$\max_{S \in \Sigma^*} |G_S^A| / |G_S^*|$, where $|G_S^A|$ is the
size of the grammar that represents string S produced by A and
$|G_S^*|$ is the size of the smallest grammar that represents string
S. Charikar et al. [3] showed that there is no polynomial-time
algorithm for the smallest grammar problem with approximation ratio
less than $8569/8568$ unless P = NP. Furthermore, they show that for
a given set $\{k_1, \ldots, k_m\}$ of positive integers, the smallest
grammar for string $a^{k_1} b a^{k_2} b \cdots b a^{k_m}$ is within a
constant factor of the smallest number of multiplications required to
compute $x^{k_1}, \ldots, x^{k_m}$, given a real number x. This is a
well-studied problem known as the addition chain problem, whose
best-known approximation algorithm has an approximation ratio of
$O(\log N / \log\log N)$ [23]. Thus, achieving an
$o(\log N / \log\log N)$ approximation for the smallest grammar
problem may be difficult.


Algorithms for Finding Small Grammars 


Heuristics 

Below, we give brief descriptions of several 
grammar-based compression algorithms based on 
simple greedy heuristics for which approximation 
ratios have been analyzed [3] (see Table 1). 


• LZ78 [24] can be considered as constructing
a grammar. Recall that each LZ78 factor of
length at least two consists of a previous
factor and a letter and can be expressed as a
production rule of a grammar.

• SEQUITUR [15] processes the string in an online manner and adds a
new character of the string to the right-hand side of the production
rule of the start symbol, which is initially empty. For each new
character, the algorithm updates the grammar, adding or removing
production rules and replacing corresponding symbols in the grammar
so that the di-gram uniqueness and rule utility properties are
satisfied. The algorithm can be implemented to run in expected linear
time. The grammar produced by SEQUITUR is not necessarily
irreducible, and thus a revised version called SEQUENTIAL was
proposed in [8].

• RE-PAIR [11] greedily and recursively replaces the most frequent
di-gram in the string with a new symbol until no di-gram occurs more
than once. Each such replacement corresponds to a new production rule
in the final grammar. The algorithm can be implemented to run in
linear time.

• LONGEST MATCH [8] greedily and recursively replaces the longest
substring that has more than one nonoverlapping occurrence. The
algorithm can be implemented to run in linear time by carefully
maintaining a structure based on the suffix tree through the course
of the algorithm [10, 14].

• GREEDY [1] (originally called OFF-LINE; the name GREEDY was coined
in [3]) greedily and recursively replaces substrings that give the
highest compression (with several variations in its definition). The
algorithm can be implemented to run in O(N log N) time for each
production rule, utilizing a data structure called minimal augmented
suffix trees, which augments the suffix tree in order to consider the
total number of nonoverlapping occurrences of a given substring.

• BISECTION [9] recursively partitions the string S into strings L
and R of lengths $2^i$ and $N - 2^i$, where
$i = \lceil \log N \rceil - 1$, each time forming a production rule
$X_S \to X_L X_R$. A new production rule is created only for each
distinct substring, and the rule is shared for identical substrings.
The algorithm can be viewed as fixing the shape of the derivation
tree and then computing the smallest grammar whose derivation tree is
of the given shape.
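As an illustration of the greedy di-gram replacement described for RE-PAIR above, the following is a minimal Python sketch. It is a naive quadratic-time version, not the linear-time implementation of [11]; the function names, the integer encoding of new symbols, and the tie-breaking rule (first di-gram encountered among equally frequent ones) are assumptions of this sketch.

```python
from collections import Counter


def count_digrams(seq):
    """Count nonoverlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:
            continue  # overlaps the previously counted occurrence, as in "aaa"
        counts[pair] += 1
        last_start[pair] = i
    return counts


def replace_digram(seq, pair, symbol):
    """Replace nonoverlapping occurrences of `pair` by `symbol`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def re_pair(text):
    """Return (final sequence, rules); new symbols are integers, terminals are str."""
    seq, rules = list(text), {}
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no di-gram occurs more than once
        symbol = len(rules)
        rules[symbol] = pair
        seq = replace_digram(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Recover the string derived by a terminal or a new symbol."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

The final sequence plays the role of the right-hand side of the start symbol, and each entry of `rules` is one production rule of the resulting grammar.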


Grammar Compression, Table 1 Known upper and lower bounds on
approximation ratios for the simple heuristic algorithms (taken from
[3] with corrections)

Algorithm            Upper bound              Lower bound
LZ78 [24]            $O((N/\log N)^{2/3})$    $\Omega(N^{2/3}/\log N)$
RE-PAIR [11]         $O((N/\log N)^{2/3})$    $\Omega(\sqrt{\log N})$
LONGEST MATCH [8]    $O((N/\log N)^{2/3})$    $\Omega(\log\log N)$
GREEDY [1]           $O((N/\log N)^{2/3})$    $(5 \log 3)/(3 \log 5) > 1.137\ldots$
SEQUENTIAL [8]       $O((N/\log N)^{3/4})$    $\Omega(N^{1/3})$
BISECTION [9]        $O((N/\log N)^{1/2})$    $\Omega(N^{1/2}/\log N)$


Approximation Algorithms

Rytter [16] and Charikar et al. [3] independently and almost
simultaneously developed linear time algorithms which achieve the
currently best approximation ratio of $O(\log(N/|G_S^*|))$,
essentially relying on the same two key ideas: the LZ77 factorization
and balanced binary grammars. Below, we briefly describe the approach
by Rytter to obtain an O(log N) approximation algorithm.

Grammar Compression, Table 2 Approximation algorithms for the
smallest grammar problem. N is the size of the input string, and n is
the size of the output grammar

Algorithm              Approximation ratio        Working space                      Running time
Charikar et al. [3]    $O(\log(N/|G_S^*|))$       O(N)                               O(N)
Rytter [16]            $O(\log(N/|G_S^*|))$       O(N)                               O(N)
LEVELWISE-REPAIR [18]  $O(\log(N/|G_S^*|))$       O(N)                               O(N)
Jez [5]                $O(\log(N/|G_S^*|))$       O(N)                               O(N)
Jez [6]                $O(\log(N/|G_S^*|))$       O(N)                               O(N)
LCA [19]               $O((\log N)\log |G_S^*|)$  O(n)                               O(N) expected
LCA* [20]              $O((\log^* N)\log N)$      O(n)                               $O(N \log^* N)$ expected
OLCA [12]              $O(\log^2 N)$              O(n)                               O(N) expected
FOLCA [13]             $O(\log^2 N)$              $2n \log n\,(1 + o(1)) + 2n$ bits  O(N log N)

The string is processed from left to right, and the LZ77
factorization of the string helps to reuse, as much as possible, the
grammar of previously occurring substrings. For string S, let
$S = f_1 \cdots f_z$ be the LZ77 factorization of S. The algorithm
sequentially processes each LZ factor $f_j$, maintaining a grammar
$G_j$ for $f_1 \cdots f_j$. Recall that by definition, each factor
$f_j$ of length at least 2 occurs in $f_1 \cdots f_{j-1}$. Therefore,
there exists a sequence of $O(h_{j-1})$ variables of grammar
$G_{j-1}$ whose concatenation represents $f_j$, where $h_{j-1}$ is
the height of the derivation tree of $G_{j-1}$. Using this sequence
of variables, a grammar for $f_j$ is constructed, which is then
appended to $G_{j-1}$ to construct $G_j$.
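The LZ77 factorization used in Rytter's approach can be sketched as follows. This is a naive, quadratic-time Python illustration of the variant in which each factor of length at least 2 must occur entirely inside the already processed prefix; the function name `lz77_factorize` is an assumption of this sketch, and other LZ77 variants (for example, self-referencing ones) produce different factors.

```python
def lz77_factorize(s):
    """Greedy factorization s = f_1 ... f_z.

    Each factor is either a single character or the longest prefix of the
    unprocessed suffix that occurs inside the already processed prefix.
    """
    factors = []
    i = 0
    while i < len(s):
        length = 0
        # Extend the candidate factor while it still occurs inside s[:i].
        while i + length < len(s) and s[i:i + length + 1] in s[:i]:
            length += 1
        length = max(length, 1)  # a fresh character forms a factor of length 1
        factors.append(s[i:i + length])
        i += length
    return factors
```

For the example string aabaaaaaba, this variant yields the six factors a, a, b, aa, aa, aba.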


A balanced binary grammar is a grammar in which the shape of the
derivation tree resembles a balanced binary tree. Rytter proposed AVL
(height-balanced) grammars, where the heights of sibling subtrees
differ by at most one. By restricting the grammar to AVL grammars,
the height of the grammar is bounded by O(log N), and the above
operations can be performed in O(log N) time for each LZ77 factor, by
adding O(log N) new variables and using techniques resembling those
of balanced binary search trees for rebalancing the tree. The
resulting time complexity as well as the size of the grammar is
O(z log N).

Finally, an important observation is that the size of the LZ77
factorization of a string S is a lower bound on the size of any
grammar G that represents S.


Theorem 1 ([3, 16]) For string S, let $S = f_1 \cdots f_z$ be the
LZ77 factorization of S. Then, for any grammar G that represents S,
$z \le |G|$.




Thus, the total size of the grammar is $O(|G_S^*| \log N)$, achieving
an O(log N) approximation ratio. Instead of AVL grammars, Charikar et
al. use $\alpha$-balanced (length-balanced) grammars, where the ratio
between the lengths of sibling subtrees is between
$\alpha/(1-\alpha)$ and $(1-\alpha)/\alpha$ for some constant
$0 < \alpha < 1/2$, but the remaining arguments are similar.

Several other linear time algorithms that
achieve an $O(\log(N/|G_S^*|))$ approximation have
been proposed [5, 6, 18]. These algorithms
resemble RE-PAIR in that they basically 
replace di-grams in the string with a new 
symbol in a bottom-up fashion but with specific 
mechanisms to choose the di-grams so that a 
good approximation ratio is achieved. 

LCA and its variants [12, 13, 19, 20] are ap- 
proximation algorithms shown to be among the 
most scalable and practical. The approximation 
ratios are slightly weaker, but the algorithm can 
be made to run in an online manner and to use 
small space (see Table 2). Although seemingly 
proposed independently, the core idea of LCA 
is essentially the same as LCP [17] which con- 
structs a grammar based on a technique called 
locally consistent parsing. The parsing is a par- 
titioning of the string that can be computed using 
only local characteristics and guarantees that for 
any two occurrences of a given substring, the par- 
titioning in the substring will be almost identical 
with exceptions in a sufficiently short prefix and 
suffix of the substring. This allows the production 
rules of the grammar to be more or less the 
same for repeated substrings, thus bounding the 
approximation ratio. 


Decompression 

The string that a grammar represents can be recovered in linear time
by a simple depth-first left-to-right traversal on the grammar. Given
an SLP G of size n that represents a string S of length N, G can be
preprocessed in O(n) time and space so that each variable holds the
length of the string it derives. Using this information, it is
possible to access S[i] for any $1 \le i \le N$ in O(h) time, where h
is the height of the SLP, by simply traversing down the production
rules starting from the start symbol until reaching a terminal
character corresponding to S[i]. Balanced SLPs have height O(log N)
and, therefore, allow access to any position of S in O(log N) time.
Any grammar G can be preprocessed in O(n) time and space so that an
arbitrary substring of length $\ell$ of S can be obtained in
$O(\ell + \log N)$ time [2]. Also, G can be preprocessed in O(n) time
and space so that the prefix or suffix of any length $\ell$ for any
variable in G can be obtained in $O(\ell)$ time [4]. On the other
hand, it has been shown that, using any data structure of size
polynomial in n, the time for retrieving a character at an arbitrary
position is at least $(\log N)^{1-\varepsilon}$ for any constant
$\varepsilon > 0$ [22].
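The O(h)-time access to S[i] described above can be sketched in Python as follows. The rule set is a hypothetical SLP for aabaaaaaba used only for illustration, and the function names are assumptions of this sketch; the more involved structures of [2] and [4] are not reproduced here.

```python
# Hypothetical SLP for "aabaaaaaba"; X7 is the start symbol.
RULES = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X1"),
    "X4": ("X2", "X1"),
    "X5": ("X3", "X4"),
    "X6": ("X5", "X3"),
    "X7": ("X6", "X5"),
}


def compute_lengths(rules):
    """Store, for every variable, the length of the string it derives."""
    lengths = {}

    def length(var):
        if var not in lengths:
            rhs = rules[var]
            if isinstance(rhs, str):
                lengths[var] = 1
            else:
                lengths[var] = length(rhs[0]) + length(rhs[1])
        return lengths[var]

    for var in rules:
        length(var)
    return lengths


def access(rules, lengths, start, i):
    """Return S[i] (1-based) by walking down from the start symbol."""
    if not 1 <= i <= lengths[start]:
        raise IndexError("position out of range")
    var = start
    while True:
        rhs = rules[var]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i <= lengths[left]:
            var = left  # S[i] lies in the part derived by the first child
        else:
            i -= lengths[left]  # skip the part derived by the first child
            var = right
```

Each iteration of the loop in `access` descends one level of the derivation tree, so the number of iterations is bounded by the height h of the SLP.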


URLs to Code and Data Sets 


Publicly available implementations of SEQUITUR:

• http://www.sequitur.info

Publicly available implementations of RE-PAIR:

• http://www.dcc.uchile.cl/~gnavarro/software/repair.tgz
• http://www.cbrc.jp/~rwan/en/restore.html
• https://code.google.com/p/re-pair/

Publicly available implementations of GREEDY (OFF-LINE):

• http://www.cs.ucr.edu/~stelo/Offline/

Publicly available implementations of LCA variants:

• https://code.google.com/p/lcacomp/
• https://github.com/tb-yasu/olca-plus-plus


Cross-References 


Arithmetic Coding for Data Compression 
Lempel-Ziv Compression 
Pattern Matching on Compressed Text 




Recommended Reading 


1. Apostolico A, Lonardi S (2000) Off-line compression by greedy
   textual substitution. Proc IEEE 88(11):1733-1744
2. Bille P, Landau GM, Raman R, Sadakane K, Satti SR, Weimann O
   (2011) Random access to grammar-compressed strings. In:
   Proceedings of the SODA'11, San Francisco, pp 373-389
3. Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran M, Sahai A,
   Shelat A (2005) The smallest grammar problem. IEEE Trans Inf
   Theory 51(7):2554-2576
4. Gasieniec L, Kolpakov R, Potapov I, Sant P (2005) Real-time
   traversal in grammar-based compressed files. In: Proceedings of
   the DCC'05, Snowbird, p 458
5. Jez A (2013) Approximation of grammar-based compression via
   recompression. In: Proceedings of the CPM'13, Bad Herrenalb,
   pp 165-176
6. Jez A (2014) A really simple approximation of smallest grammar.
   In: Proceedings of the CPM'14, Moscow, pp 182-191
7. Karpinski M, Rytter W, Shinohara A (1997) An efficient
   pattern-matching algorithm for strings with short descriptions.
   Nord J Comput 4:172-186
8. Kieffer JC, Yang EH (2000) Grammar-based codes: a new class of
   universal lossless source codes. IEEE Trans Inf Theory
   46(3):737-754
9. Kieffer J, Yang E, Nelson G, Cosman P (2000) Universal lossless
   compression via multilevel pattern matching. IEEE Trans Inf Theory
   46(4):1227-1245
10. Lanctot JK, Li M, Yang E (2000) Estimating DNA sequence entropy.
    In: Proceedings of the SODA'00, San Francisco, pp 409-418
11. Larsson NJ, Moffat A (2000) Off-line dictionary-based
    compression. Proc IEEE 88(11):1722-1732
12. Maruyama S, Sakamoto H, Takeda M (2012) An online algorithm for
    lightweight grammar-based compression. Algorithms 5(2):214-235
13. Maruyama S, Tabei Y, Sakamoto H, Sadakane K (2013) Fully-online
    grammar compression. In: Proceedings of the SPIRE'13, Jerusalem,
    pp 218-229
14. Nakamura R, Inenaga S, Bannai H, Funamoto T, Takeda M, Shinohara
    A (2009) Linear-time text compression by longest-first
    substitution. Algorithms 2(4):1429-1448
15. Nevill-Manning CG, Witten IH (1997) Identifying hierarchical
    structure in sequences: a linear-time algorithm. J Artif Intell
    Res 7(1):67-82
16. Rytter W (2003) Application of Lempel-Ziv factorization to the
    approximation of grammar-based compression. Theor Comput Sci
    302(1-3):211-222
17. Sahinalp SC, Vishkin U (1995) Data compression using locally
    consistent parsing. Technical report, UMIACS Technical Report
18. Sakamoto H (2005) A fully linear-time approximation algorithm for
    grammar-based compression. J Discret Algorithms 3(2-4):416-430
19. Sakamoto H, Kida T, Shimozono S (2004) A space-saving linear-time
    algorithm for grammar-based compression. In: Proceedings of the
    SPIRE'04, Padova, pp 218-229
20. Sakamoto H, Maruyama S, Kida T, Shimozono S (2009) A space-saving
    approximation algorithm for grammar-based compression. IEICE
    Trans 92-D(2):158-165
21. Storer JA (1977) NP-completeness results concerning data
    compression. Technical report 234, Department of Electrical
    Engineering and Computer Science, Princeton University
22. Verbin E, Yu W (2013) Data structure lower bounds on random
    access to grammar-compressed strings. In: Proceedings of the
    CPM'13, Bad Herrenalb, pp 247-258
23. Yao ACC (1976) On the evaluation of powers. SIAM J Comput
    5(1):100-103
24. Ziv J, Lempel A (1978) Compression of individual sequences via
    variable-length coding. IEEE Trans Inf Theory 24(5):530-536


Graph Bandwidth 


James R. Lee 

Department of Computer Science and 
Engineering, University of Washington, Seattle, 
WA, USA 


Keywords 


Approximation algorithms; Graph bandwidth; 
Metric embeddings 


Years and Authors of Summarized 
Original Work 


1998; Feige 

2000; Feige 

Problem Definition 

The graph bandwidth problem concerns producing a linear ordering of
the vertices of a graph G = (V, E) so as to minimize the maximum
"stretch" of any edge in the ordering. Formally, let n = |V|, and
consider any one-to-one mapping $\pi : V \to \{1, 2, \ldots, n\}$.
The bandwidth of this ordering is
$\mathrm{bw}_\pi(G) = \max_{\{u,v\} \in E} |\pi(u) - \pi(v)|$. The
bandwidth of G is given by the bandwidth of the best possible
ordering: $\mathrm{bw}(G) = \min_\pi \mathrm{bw}_\pi(G)$.
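The two definitions above translate directly into code. The following Python sketch evaluates the bandwidth of a given ordering and, for very small graphs only, computes bw(G) by trying all n! orderings; the function names are assumptions of this sketch, and the exhaustive search is meant purely as an illustration of the definition.

```python
from itertools import permutations


def bandwidth_of_ordering(edges, pi):
    """Maximum stretch |pi(u) - pi(v)| over all edges {u, v}."""
    return max((abs(pi[u] - pi[v]) for u, v in edges), default=0)


def bandwidth(vertices, edges):
    """Exact bandwidth by exhaustive search over all orderings (tiny graphs only)."""
    best = None
    for perm in permutations(vertices):
        # Positions are numbered 1..n, as in the definition.
        pi = {v: pos for pos, v in enumerate(perm, start=1)}
        value = bandwidth_of_ordering(edges, pi)
        if best is None or value < best:
            best = value
    return best
```

For example, a path has bandwidth 1, while a star with four leaves has bandwidth 2, since the center must be adjacent to all four leaves and at most two of them can be placed next to it.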

The original motivation for this problem lies in the preprocessing of
sparse symmetric square matrices. Let A be such an $n \times n$
matrix, and consider the problem of finding a permutation matrix P
such that the non-zero entries of $P^T A P$ all lie in as narrow a
band as possible about the diagonal. This problem is equivalent to
minimizing the bandwidth of the graph G whose vertex set is
$\{1, 2, \ldots, n\}$ and which has an edge $\{u, v\}$ precisely when
$A_{u,v} \ne 0$.

Computing bw(G) exactly is NP-hard, so one instead tries to
efficiently compute a linear ordering $\pi$ for which
$\mathrm{bw}_\pi(G) \le A \cdot \mathrm{bw}(G)$, where the
approximation factor A is as small as possible. There is even
evidence that achieving any value A = O(1) is NP-hard [18]. Much of
the difficulty of the bandwidth problem is due to the objective
function being a maximum over all edges of the graph. This makes
divide-and-conquer approaches ineffective for graph bandwidth,
whereas they often succeed for related problems like Minimum Linear
Arrangement [6] (here the objective is to minimize
$\sum_{\{u,v\} \in E} |\pi(u) - \pi(v)|$). Instead, a more global
algorithm is required. To this end, a good lower bound on the value
of bw(G) is discussed first.


The Local Density 

For any pair of vertices $u, v \in V$, let d(u, v) be the shortest
path distance between u and v in the graph G. Then, define
$B(v, r) = \{u \in V : d(u, v) \le r\}$ as the ball of radius r about
a vertex $v \in V$. Finally, the local density of G is defined by
$D(G) = \max_{v \in V,\, r \ge 1} |B(v, r)|/(2r)$. It is not
difficult to see that $\mathrm{bw}(G) \ge D(G)$. Although it was
conjectured that an upper bound of the form
$\mathrm{bw}(G) \le \mathrm{poly}(\log n) \cdot D(G)$ holds, it was
not proven until the seminal work of Feige [7].


Key Results

Feige proved the following.


Theorem 1 There is an efficient algorithm that, given a graph
G = (V, E) as input, produces a linear ordering
$\pi : V \to \{1, 2, \ldots, n\}$ for which
$\mathrm{bw}_\pi(G) \le O\big((\log n)^3 \sqrt{\log n \log\log n}\big) \cdot D(G)$.
In particular, this provides a poly(log n)-approximation algorithm
for the bandwidth problem in general graphs.


Feige’s algorithmic framework can be described 
quite simply as follows. 


1. Compute a representation $f : V \to \mathbb{R}^n$ of G in
   Euclidean space.

2. Let $u_1, u_2, \ldots, u_n$ be independent N(0, 1) random
   variables (N(0, 1) denotes a standard normal random variable with
   mean 0 and variance 1), and for each vertex $v \in V$, compute
   $h(v) = \sum_{i=1}^{n} u_i f_i(v)$, where $f_i(v)$ is the ith
   coordinate of the vector f(v).

3. Sort the vertices by the value h(v), breaking ties arbitrarily,
   and output the induced linear ordering.


An equivalent characterization of steps (2) and (3) is to choose a
uniformly random vector $a \in S^{n-1}$ from the (n - 1)-dimensional
sphere $S^{n-1} \subseteq \mathbb{R}^n$ and output the linear
ordering induced by the values $h(v) = \langle a, f(v) \rangle$,
where $\langle \cdot, \cdot \rangle$ denotes the usual inner product
on $\mathbb{R}^n$. In other words, the algorithm first computes a map
$f : V \to \mathbb{R}^n$, projects the images of the vertices onto a
randomly oriented line, and then outputs the induced ordering; step
(2) is the standard way that such a random projection is implemented.
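Steps (2) and (3) of the framework can be sketched in a few lines of Python. The embedding f is taken as an input here; constructing a volume-respecting embedding, which is the substance of step (1), is not attempted, and the function name, the `seed` parameter, and the toy embedding in the usage note are assumptions of this sketch.

```python
import random


def random_projection_ordering(embedding, seed=None):
    """Order vertices by their projection onto a random Gaussian direction.

    `embedding` maps each vertex to a list of coordinates (the vector f(v)).
    Returns a dict pi with pi[v] in {1, ..., n}.
    """
    rng = random.Random(seed)
    dim = len(next(iter(embedding.values())))
    # Step (2): independent N(0, 1) coefficients and h(v) = sum_i u_i * f_i(v).
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    h = {
        v: sum(c * x for c, x in zip(coeffs, point))
        for v, point in embedding.items()
    }
    # Step (3): sort by h(v); ties are broken by the stable sort order.
    ordered = sorted(embedding, key=lambda v: h[v])
    return {v: pos for pos, v in enumerate(ordered, start=1)}
```

As a toy usage example, embedding the vertices of a path at the points (0, 0), (1, 0), (2, 0), ... places them on a line, so any projection with a non-zero first coefficient recovers the path order (possibly reversed), and the resulting ordering has bandwidth 1.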


Volume-Respecting Embeddings 

The only step left unspecified is (1); the function f has to somehow
preserve the structure of the graph G in order for the algorithm to
output a low-bandwidth ordering. The inspiration for the existence of
such an f comes from the field of low-distortion metric embeddings
(see, e.g., [2, 14]). Feige introduced a generalization of
low-distortion embeddings to mappings called volume-respecting
embeddings. Roughly, the map f should be non-expansive, in the sense
that $\|f(u) - f(v)\| \le 1$ for every edge $\{u, v\} \in E$, and
should satisfy the following property: for any set of k vertices
$v_1, \ldots, v_k$, the (k - 1)-dimensional volume of the convex hull
of the points $f(v_1), \ldots, f(v_k)$ should be as large as
possible. The proper value of k is chosen to optimize the performance
of the algorithm. Refer to [7, 10, 11] for precise definitions of
volume-respecting embeddings and a detailed discussion of their
construction. Feige showed that a modification of Bourgain's
embedding [2] yields a mapping $f : V \to \mathbb{R}^n$ which is good
enough to obtain the results of Theorem 1.

The requirement $\|f(u) - f(v)\| \le 1$ for every edge $\{u, v\}$ is
natural since f(u) and f(v) need to have similar projections onto the
random direction a; intuitively, this suggests that u and v will not
be mapped too far apart in the induced linear ordering. But even if
|h(u) - h(v)| is small, it may be that many vertices project between
h(u) and h(v), causing u and v to incur a large stretch. To prevent
this, the images of the vertices should be sufficiently "spread out,"
which corresponds to the volume requirement on the convex hull of the
images.


Applications 


As was mentioned previously, the graph band- 
width problem has applications to preprocess- 
ing sparse symmetric matrices. Minimizing the 
bandwidth of matrices helps in improving the 
efficiency of certain linear algebraic algorithms 
like Gaussian elimination; see [3, 8, 17]. Follow- 
up work has shown that Feige’s techniques can be 
applied to VLSI layout problems [19]. 


Open Problems 


We first state the bandwidth conjecture (see, e.g., [13]).

Conjecture: For any n-node graph G = (V, E), one has
$\mathrm{bw}(G) = O(\log n) \cdot D(G)$.

The conjecture is interesting and unresolved even in the special case
when G is a tree (see [9] for the best results for trees). The
best-known bound in the general case follows from [7, 10] and is of
the form $\mathrm{bw}(G) = O((\log n)^{3.5}) \cdot D(G)$. It is known
that the conjectured upper bound is best possible, even for trees
[4]. One suspects that these combinatorial studies will lead to
improved approximation algorithms.

However, the best approximation algorithms, which achieve ratio
$O((\log n)^3 (\log\log n)^{1/4})$, are not based on the local
density bound. Instead, they are a hybrid of a semidefinite
programming approach of [1, 5] with the arguments of Feige and the
volume-respecting embeddings constructed in [12, 16]. Determining the
approximability of graph bandwidth is an outstanding open problem and
likely requires improving both the upper and lower bounds.


Problem Definition


An independent set in an undirected graph G = (V, E) is a set of
vertices that induce a subgraph which does not contain any edges. The
size of the maximum independent set in G is denoted by $\alpha(G)$.
For an integer k, a k-coloring of G is a function
$\sigma : V \to [1 \ldots k]$ which assigns colors to the vertices of
G. A valid k-coloring of G is a coloring in which each color class is
an independent set. The chromatic number $\chi(G)$ of G is the
smallest k for which there exists a valid k-coloring of G. Finding
$\chi(G)$ is a fundamental NP-hard problem. Hence, when limited to
polynomial time algorithms, one turns to the question of estimating
the value of $\chi(G)$ or to the closely related problem of
approximate coloring.


Problem 1 (Approximate coloring)

INPUT: Undirected graph G = (V, E).

OUTPUT: A valid coloring of G with $r \cdot \chi(G)$
colors, for some approximation ratio $r \ge 1$.
OBJECTIVE: Minimize r.


Let G be a graph of size n. The approximate coloring of G can be
solved efficiently within an approximation ratio of
$r = O\big(n (\log\log n)^2 / (\log n)^3\big)$ [12]. This holds also
for the approximation of $\alpha(G)$ [8]. These results may seem
rather weak; however, it is NP-hard to approximate $\alpha(G)$ and
$\chi(G)$ within a ratio of $n^{1-\varepsilon}$ for any constant
$\varepsilon > 0$ [9, 14, 23]. Under stronger complexity assumptions,
there is some constant $0 < \delta < 1$ such that neither problem can
be approximated within a ratio of $n / 2^{\log^{\delta} n}$ [19, 23].
This entry will concentrate on the problem of coloring graphs G for
which $\chi(G)$ is small. As will be seen, in this case the
achievable approximation ratio improves significantly.


Vector Coloring of Graphs

The algorithms achieving the best ratios for approximate coloring
when $\chi(G)$ is small are all based on the idea of vector coloring,
introduced by Karger, Motwani, and Sudan [17]. (Vector coloring as
presented in [17] is closely related to the Lovász $\theta$ function
[21]. This connection will be discussed shortly.)


Definition 1 A vector k-coloring of a graph is an assignment of unit
vectors to its vertices, such that for every edge, the inner product
of the vectors assigned to its endpoints is at most (in the sense
that it can only be more negative) $-1/(k-1)$.


The vector chromatic number $\vec{\chi}(G)$ of G is the smallest k
for which there exists a vector k-coloring of G. The vector chromatic
number can be formulated as follows:


$\vec{\chi}(G)$   Minimize k
   subject to:  $\langle v_i, v_j \rangle \le -\frac{1}{k-1}$   $\forall (i, j) \in E$
                $\langle v_i, v_i \rangle = 1$                  $\forall i \in V$


Here, assume that $V = [1, \ldots, n]$ and that the vectors
$\{v_i\}_{i=1}^{n}$ are in $\mathbb{R}^n$. Every k-colorable graph is
also vector k-colorable. This can be seen by identifying each color
class with one vertex of a perfect (k - 1)-dimensional simplex
centered at the origin. Moreover, unlike the chromatic number, a
vector k-coloring (when it exists) can be found in polynomial time
using semidefinite programming (up to an arbitrarily small error in
the inner products).
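The simplex argument above can be checked numerically. The Python sketch below builds k unit vectors with pairwise inner product -1/(k - 1) by centering and normalizing the standard basis vectors (one standard construction, written here in k coordinates rather than k - 1), assigns one vector per color class, and verifies Definition 1. The function names and the tolerance `tol` are assumptions of this sketch; it does not solve the semidefinite program.

```python
import math


def simplex_vectors(k):
    """k unit vectors in R^k with pairwise inner product -1/(k - 1), for k >= 2."""
    norm = math.sqrt((k - 1) / k)
    return [
        [((1.0 if i == j else 0.0) - 1.0 / k) / norm for j in range(k)]
        for i in range(k)
    ]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def vector_coloring_from_coloring(coloring, k):
    """Map each vertex to the simplex vertex of its color (colors are 0..k-1)."""
    vecs = simplex_vectors(k)
    return {v: vecs[c] for v, c in coloring.items()}


def is_vector_k_coloring(vectors, edges, k, tol=1e-9):
    """Check unit norms and <v_i, v_j> <= -1/(k - 1) on every edge."""
    unit = all(abs(dot(x, x) - 1.0) <= tol for x in vectors.values())
    separated = all(
        dot(vectors[u], vectors[v]) <= -1.0 / (k - 1) + tol for u, v in edges
    )
    return unit and separated
```

For instance, a valid 3-coloring of the 5-cycle yields vectors whose inner product on every edge equals -1/2, whereas an invalid coloring places identical vectors (inner product 1) on some edge and fails the check.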


Claim 1 (Complexity of vector coloring [17])
Let $\varepsilon > 0$. If a graph G has a vector k-coloring, then a
vector $(k + \varepsilon)$-coloring of the graph can be constructed
in time polynomial in n and $\log(1/\varepsilon)$.


One can strengthen Definition 1 to obtain a different notion of
vector coloring and the vector chromatic number:


$\vec{\chi}_2(G)$   Minimize k
   subject to:  $\langle v_i, v_j \rangle = -\frac{1}{k-1}$     $\forall (i, j) \in E$
                $\langle v_i, v_i \rangle = 1$                  $\forall i \in V$


$\vec{\chi}_3(G)$   Minimize k
   subject to:  $\langle v_i, v_j \rangle = -\frac{1}{k-1}$     $\forall (i, j) \in E$
                $\langle v_i, v_j \rangle \ge -\frac{1}{k-1}$   $\forall i, j \in V$
                $\langle v_i, v_i \rangle = 1$                  $\forall i \in V$


The function $\vec{\chi}_2(G)$ is referred to as the strict vector
chromatic number of G and is equal to the Lovász $\theta$ function on
$\bar{G}$ [17, 21], where $\bar{G}$ is the complement graph of G. The
function $\vec{\chi}_3(G)$ is referred to as the strong vector
chromatic number. An analog to Claim 1 holds for both
$\vec{\chi}_2(G)$ and $\vec{\chi}_3(G)$. Let $\omega(G)$ denote the
size of the maximum clique in G; it holds that
$\omega(G) \le \vec{\chi}(G) \le \vec{\chi}_2(G) \le \vec{\chi}_3(G) \le \chi(G)$.


Key Results 


In what follows, assume that G has n vertices and maximal degree
$\Delta$. The $\tilde{O}(\cdot)$ and $\tilde{\Omega}(\cdot)$ notation
are used to suppress polylogarithmic factors. We now state the key
result of Karger, Motwani, and Sudan [17]:


Theorem 1 ([17]) If $\vec{\chi}(G) = k$, then G can be colored in
polynomial time using
$\min\{\tilde{O}(\Delta^{1-2/k}), \tilde{O}(n^{1-3/(k+1)})\}$ colors.


As mentioned above, the use of vector color- 
ing in the context of approximate coloring was 
initiated in [17]. Roughly speaking, once given a 
vector coloring of G, the heart of the algorithm 
in [17] finds a large independent set in G. In 
a nutshell, this independent set corresponds to 
a set of vectors in the vector coloring which 
are close to one another (and thus by definition 
cannot share an edge). Combining this with the 
ideas of Wigderson [22] mentioned below yields 
Theorem 1. 

We proceed to describe related work. The first 
two theorems below appeared prior to the work 
of Karger, Motwani, and Sudan [17]. 


Theorem 2 ([22]) If $\chi(G) = k$, then G can be colored in
polynomial time using $O(k n^{1-1/(k-1)})$ colors.


Theorem 3 ([1]) If $\chi(G) = 3$, then G can be colored in polynomial
time using $\tilde{O}(n^{3/8})$ colors. If $\chi(G) = k \ge 4$, then
G can be colored in polynomial time using at most
$\tilde{O}(n^{1-1/(k-3/2)})$ colors.


Combining the techniques of [17] and [1], the following results were
obtained for graphs G with $\chi(G) = 3, 4$ (these results were also
extended for higher values of $\chi(G)$).


Theorem 4 ([2]) If $\chi(G) = 3$, then G can be colored in polynomial
time using $\tilde{O}(n^{3/14})$ colors.


Theorem 5 ([13]) If $\chi(G) = 4$, then G can be colored in
polynomial time using $\tilde{O}(n^{7/19})$ colors.


The currently best known result for coloring 
a 3-colorable graph is presented in [16]. The 
algorithm of [16] combines enhanced notions of 
vector coloring presented in [5] with the combi- 
natorial coloring techniques of [15]. 


Theorem 6 ([16]) If $\chi(G) = 3$, then G can
be colored in polynomial time using $O(n^{0.19996})$
colors.


To put the above theorems in perspective, it
is NP-hard to color a 3-colorable graph G with
4 colors [11, 18] and a k-colorable graph (for
sufficiently large k) with $k^{\frac{\log k}{25}}$ colors [19]. Under
stronger complexity assumptions (related to the
unique games conjecture [20]), for any constant
k, it is hard to color a k-colorable graph with
any constant number of colors [6]. The wide gap
between these hardness results and the
approximation ratios presented in this section has been
a major motivation in the study of approximate
coloring.

Finally, the limitations of vector coloring are
addressed. Namely, are there graphs for which
$\vec{\chi}(G)$ is a poor estimate of $\chi(G)$? One would
expect the answer to be “yes” as estimating $\chi(G)$
beyond a factor of $n^{1-\varepsilon}$ is a hard problem. As
will be stated below, this is indeed the case
(even when $\vec{\chi}(G)$ is small). Some of the results
that follow are stated in terms of the size $\alpha(G)$ of the
maximum independent set in G. As $\chi(G) \ge n/\alpha(G)$,
these results imply a lower bound on $\chi(G)$.
Theorem 7 (i) states that the original analysis of
[17] is essentially tight. Theorem 7 (ii) presents
bounds for the case of $\vec{\chi}(G) = 3$. Theorem 7
(iii) and Theorem 8 present graphs G in which
there is an extremely large gap between $\chi(G)$ and
the relaxations $\vec{\chi}(G)$ and $\vec{\chi}_2(G)$.


Theorem 7 ([10]) (i) For every constant $\varepsilon > 0$
and constant $k > 2$, there are infinitely many
graphs G with $\vec{\chi}(G) = k$ and
$\alpha(G) \le n/\Delta^{1-2/k-\varepsilon}$ (here $\Delta > n^{\delta}$ for some constant
$\delta > 0$). (ii) There are infinitely many graphs G
with $\vec{\chi}(G) = 3$ and $\alpha(G) \le n^{0.843}$. (iii) For
some constant c, there are infinitely many graphs
G with $\vec{\chi}(G) = O\left(\frac{\log n}{\log \log n}\right)$ and $\alpha(G) \le \log^c n$.


Theorem 8 ([7]) For some constant c, there are
infinitely many graphs G with $\vec{\chi}_2(G) \le 2^{\sqrt{\log n}}$
and $\chi(G) \ge n/2^{c\sqrt{\log n}}$.


Vector colorings, including the Lovász $\vartheta$ function
and its variants, have been extensively studied
in the context of approximation algorithms
for problems other than Problem 1. These include
approximating $\alpha(G)$, approximating the minimum
vertex cover problem, and combinatorial
optimization in the context of random graphs.


Applications 


Besides its theoretical significance, graph color- 
ing has several concrete applications (see, e.g., 
[3,4]). 


Open Problems 


By far the major open problem in the context
of approximate coloring addresses the wide gap
between what is known to be hard and what
can be obtained in polynomial time. The case
of constant $\chi(G)$ is especially intriguing, as the
best known upper bounds (on the approximation
ratio) are polynomial while the lower bounds are
of constant nature. Regarding the vector coloring
paradigm, a majority of the results stated
in section “Key Results” use the weakest form
of vector coloring $\vec{\chi}(G)$ in their proof, while
stronger relaxations may also be considered. It
would be very interesting to improve upon the
algorithmic results stated above using stronger
relaxations, as would a matching analysis of the
limitations of these relaxations.


Graph Connectivity


Problem Definition


An undirected graph G is said to be k-connected
(specifically, k-vertex-connected) if the removal
of any set of k - 1 or fewer vertices (with their
incident edges) does not disconnect G. Analogously,
it is k-edge-connected if the removal
of any set of k - 1 or fewer edges does not disconnect
G. Menger’s theorem states that a k-vertex-connected
graph has at least k openly vertex-disjoint
paths connecting every pair of vertices.
For k-edge-connected graphs there are k edge-disjoint
paths connecting every pair of vertices.
The connectivity of a graph is the largest value
of k for which it is k-connected. The connectivity
of a graph, and k disjoint paths
between a given pair of vertices, can be found
using algorithms for maximum flow. An edge is
said to be critical in a k-connected graph if upon
its removal the graph is no longer k-connected.

The problem of finding a minimum-cardinality
k-vertex-connected (k-edge-connected)
subgraph that spans all vertices of
a given graph is called k-VCSS (k-ECSS) and
is known to be NP-hard for $k \ge 2$. We review
some results in finding approximately minimum
solutions to k-VCSS and k-ECSS. We focus primarily
on simple graphs. A simple approximation algorithm is
one that considers the edges in some order
and removes edges that are not critical. It thus
outputs a k-connected subgraph in which all
edges are critical, and it can be shown that it is a
2-approximation algorithm: it outputs
a solution with at most kn edges in an n-vertex
graph, and since each vertex must have degree
at least k, at least kn/2 edges are
necessary.
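The simple algorithm can be sketched for the edge-connectivity version as follows. This is only an illustration under stated assumptions: the function names are hypothetical, vertices are numbered 0 to n - 1 (with n at least 2), the input is a simple graph, and criticality is tested by a plain augmenting-path maximum flow rather than by any faster method from the literature.

```python
from collections import deque


def edge_disjoint_paths(n, edges, s, t, limit):
    """Count edge-disjoint s-t paths (capped at `limit`) by unit-capacity max flow."""
    cap = {}
    adj = [[] for _ in range(n)]
    for u, v in edges:
        # An undirected edge is modeled as two opposite arcs of capacity 1.
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap[(v, u)] = cap.get((v, u), 0) + 1
        adj[u].append(v)
        adj[v].append(u)
    flow = 0
    while flow < limit:
        prev = {s: None}
        queue = deque([s])
        while queue and t not in prev:
            u = queue.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            break
        v = t
        while prev[v] is not None:  # push one unit along the path found
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
    return flow


def is_k_edge_connected(n, edges, k):
    """Every minimum cut separates vertex 0 from some other vertex t."""
    return all(edge_disjoint_paths(n, edges, 0, t, k) >= k for t in range(1, n))


def prune_noncritical_edges(n, edges, k):
    """Scan the edges once and drop each edge whose removal keeps k-edge-connectivity."""
    kept = list(edges)
    for e in list(edges):
        trial = [f for f in kept if f != e]
        if is_k_edge_connected(n, trial, k):
            kept = trial
    return kept
```

Every edge that survives the scan is critical, because removing further edges can only lower the connectivity of what remains.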

Approximation algorithms that do better than 
the simple algorithm mentioned above can be 
classified into two categories: depth first search 
(DFS) based, and matching based. 


Key Results 


Lower Bounds for k-Connected Spanning 
Subgraphs 

Each node of a k-connected graph has at least k
edges incident to it. Therefore, the sum of the
degrees of all its nodes is at least kn, where n
is the number of its nodes. Since each edge is
counted twice in this degree-sum, the number
of its edges is at least kn/2. This is called
the degree lower bound. Expanding on this idea
yields a stronger lower bound on the cardinality
of a k-connected spanning subgraph of a given
graph. Let $D_k$ be a subgraph in which the degree
of each node is at least k. Unlike a k-connected
subgraph, $D_k$ has no connectivity constraints. The
counting argument above shows that any $D_k$ has
at least kn/2 edges. A minimum cardinality $D_k$
can be computed in polynomial time by reducing
the problem to matching, and its size is called the
matching lower bound.


DFS-Based Approaches 

The following natural algorithm finds a 3/2-approximation
for 2-ECSS. Pick some node r as the root
and run DFS from it. All edges of the graph are
now either tree edges or back edges. Process the
DFS tree in postorder. For each subtree, if the
removal of the edge from its root to its parent
separates the graph into two components, then
add a farthest-back edge from this subtree, whose
other end is closest to r. It can be shown that the
number of back edges added by the algorithm is
at most half the size of Opt.

This algorithm has been generalized to solve 
the 2-VCSS problem with the same approxima- 
tion ratio, by adding carefully chosen back edges 
that allow the deletion of tree edges. Wherever it 
is unable to delete a tree edge, it adds a vertex 
to an independent set J. In the final analysis, the
number of edges used is less than $n + |J|$. Since
Opt is at least $\max(n, 2|J|)$, it obtains a
3/2-approximation ratio.

The algorithm can also be extended to the 
k-ECSS problem by repeating these ideas k/2 
times, augmenting the connectivity by 2 in each 
round. It has been shown that this algorithm 
achieves a performance of about 1.61. 


Matching-Based Approaches 

Several approximation algorithms for k-ECSS
and k-VCSS problems have used a minimum
cardinality $D_k$ as a starting solution, which
is then augmented with additional edges
to satisfy the connectivity constraints. This
approach yields better ratios than the DFS-based
approaches.


1 + 1/k Algorithm for k-VCSS

Find a minimum cardinality $D_{k-1}$. Add just
enough additional edges to it to make the
subgraph k-connected. In this step, it is ensured
that the edges added are critical. It is known by
a theorem of Mader that in a k-connected graph,
a cycle of critical edges contains at least one
node of degree k. Since the edges added by the
algorithm in the second step are all critical, there
can be no cycle induced by these edges because
the degree of all the nodes on such a cycle would
be at least k + 1. Therefore, at most n - 1 edges
are added in this step. The number of edges added
in the first step, in the minimum $D_{k-1}$, is at most
Opt - n/2. The solution thus has fewer than
Opt + n/2 edges, and since Opt is at least kn/2,
the total number of edges is at most (1 + 1/k)
times the number of edges in an optimal
k-VCSS.


1 + 2/(k+1) Algorithm for k-ECSS

Mader’s theorem about cycles induced by critical
edges is valid only for vertex connectivity
and not edge connectivity. Therefore, a different
algorithm is proposed for k-ECSS in graphs
that are k-edge-connected, but not k-connected.
This algorithm finds a minimum cardinality $D_k$
and augments it with a minimal set of edges to
make the subgraph k-edge-connected. The number
of edges added in the last step is at most
$\frac{k}{k+1}(n - 1)$. Since the number of edges added in
the first step is at most Opt, the total number of
edges is at most $\left(1 + \frac{2}{k+1}\right)$ Opt.


Better Algorithms for Small k 

For $k \in \{2, 3\}$, better algorithms have been
obtained by implementing the abovementioned
algorithms carefully, deleting unnecessary
edges, and by getting better lower bounds. For
k = 2, a 4/3 approximation can be obtained by
generating a path/cycle cover from a minimum
cardinality $D_2$ and 2-connecting them one at
a time to a “core” component. Small cycles/paths
allow an edge to be deleted when they are 2- 
connected to the core, which allows a simple 
amortized analysis. This method also generalizes 
to the 3-ECSS problem, yielding a 4/3 ratio. 


Hybrid approaches have been proposed which 
use the path/cycle cover to generate a specific 
DFS tree of the original graph and then 2-connect 
the tree, trying to delete edges wherever possible. 
The best ratios achieved using this approach are 
5/4 for 2-ECSS, 9/7 for 2-VCSS, and 5/4 for 
2-VCSS in 3-connected graphs. 


Applications 


Network design is one of the main application 
areas for this work. This involves the construction 
of low-cost highly connected networks. 


Recommended Reading 


For additional information on DFS, matchings
and path/cycle covers, see [3]. Fast
2-approximation algorithms for k-ECSS and
k-VCSS were studied by Nagamochi and
Ibaraki [13]. DFS-based algorithms for
2-connectivity were introduced by Khuller and
Vishkin [11]. They obtained 3/2 for 2-ECSS, 5/3
for 2-VCSS, and 2 for weighted k-ECSS. The
ratio for 2-VCSS was improved to 3/2 by Garg
et al. [6], 4/3 by Vempala and Vetta [14], and
9/7 by Gubbala and Raghavachari [7]. Khuller
and Raghavachari [10] gave an algorithm for
k-ECSS, which was later improved by Gabow [4],
who showed that the algorithm obtains a ratio
of about 1.61. Cheriyan et al. [2] studied the
k-VCSS problem with edge weights and designed
an $O(\log k)$ approximation algorithm in graphs
with at least $6k^2$ vertices.

The matching-based algorithms were introduced
by Cheriyan and Thurimella [1]. They
proposed algorithms with ratios of $1 + \frac{1}{k}$ for
k-VCSS, $1 + \frac{2}{k+1}$ for k-ECSS, $1 + \frac{1}{k}$ for
k-VCSS in directed graphs, and $1 + \frac{4}{\sqrt{k}}$ for
k-ECSS in directed graphs. Vempala and Vetta [14]
obtained a ratio of 4/3 for 2-VCSS. The ratios
were further improved by Krysta and Kumar [12],
who introduced the hybrid approach, which was
used to derive a 5/4 algorithm by Jothi et al. [9].
A 3/2-approximation algorithm for 3-ECSS has
been proposed by Gabow [5] that works on
multigraphs, whereas the earlier algorithm of
Cheriyan and Thurimella gets the same ratio in
simple graphs only. This ratio has been improved
to 4/3 by Gubbala and Raghavachari [8].


Problem Definition


The problem of determining isomorphism of two 
combinatorial structures is a ubiquitous one, with 
applications in many areas. The paradigm case 
of concern in this chapter is isomorphism of two 
graphs. In this case, an isomorphism consists of 
a bijection between the vertex sets of the graphs 
which induces a bijection between the edge sets 
of the graphs. One can also take the second graph
to be a copy of the first, so that isomorphisms map
a graph onto itself. Such isomorphisms
are called automorphisms or, less formally, sym- 
metries. The set of all automorphisms forms 
a group under function composition called the 
automorphism group. Computing the automor- 
phism group is a problem rather similar to that 
of determining isomorphisms. 

Graph isomorphism is closely related to many 
other types of isomorphism of combinatorial 
structures. In the section entitled “Applications”, 
several examples are given. 


Formal Description 

A graph is a pair G = (V, E) of finite sets, with
E being a set of 2-tuples (v, w) of elements
of V. The elements of V are called vertices (also
points, nodes), while the elements of E are called
directed edges (also arcs). A complementary pair
(v, w), (w, v) of directed edges ($v \ne w$) will be
called an undirected edge and denoted {v, w}.
A directed edge of the form (v, v) will also
be considered an undirected edge, called a loop
(also self-loop). The word “edges” without
qualification will indicate undirected edges, directed
edges, or both.

Given two graphs $G_1 = (V_1, E_1)$ and
$G_2 = (V_2, E_2)$, an isomorphism from $G_1$ to $G_2$
is a bijection from $V_1$ to $V_2$ such that the induced
action on $E_1$ is a bijection onto $E_2$. If $G_1 = G_2$,
then the isomorphism is an automorphism of
$G_1$. The set of all automorphisms of $G_1$ is
a group under function composition, called
the automorphism group of $G_1$, and denoted
$\mathrm{Aut}(G_1)$.

In Fig. 1 two isomorphic graphs are shown, 
together with an isomorphism between them and 
the automorphism group of the first. 


Canonical Labeling 

Practical applications of graph isomorphism 
testing do not usually involve individual pairs 
of graphs. More commonly, one must decide 
whether a certain graph is isomorphic to any 
of a collection of graphs (the database lookup 
problem) or one has a collection of graphs and 
needs to identify the isomorphism classes in it 
(the graph sorting problem). Such applications 


are not well served by an algorithm that can only 
test graphs in pairs. 

An alternative is a canonical labeling algo- 
rithm. The essential idea is that in each isomor- 
phism class there is a unique, canonical graph 
which the algorithm can find, given as input any 
graph in the isomorphism class. The canonical 
graph might be, for example, the least graph 
in the isomorphism class according to some or- 
dering (such as lexicographic) of the graphs in 
the class. Practical algorithms usually compute 
a canonical form designed for efficiency rather 
than ease of description. 
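The idea of a canonical graph defined by an ordering can be illustrated by brute force. The sketch below is an assumption-laden toy, not the method used by any practical program: it tries all n! relabelings of a simple undirected graph without loops and keeps the lexicographically least sorted edge list, so it is only usable for very small n. The function name is hypothetical.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Least relabeled edge list over all permutations of the vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edge_set
        ))
        if best is None or relabeled < best:
            best = relabeled
    return best
```

Two graphs on the same number of vertices are isomorphic exactly when their canonical forms are equal, so a database lookup reduces to comparing (or hashing) these forms instead of testing graphs in pairs.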


Key Results 


The graph isomorphism problem plays a key
role in modern complexity theory. It is not
known to be solvable in polynomial time,
nor to be NP-complete, nor is it known to
be in the class co-NP. See [3, 8] for details.
Polynomial-time algorithms are known for many
special classes, notably graphs with bounded
genus, bounded degree, bounded tree-width,
and bounded eigenvalue multiplicity. The fastest
theoretical algorithm for general graphs requires
$\exp(n^{1/2+o(1)})$ time [1], but it is not known to be
practical.


Graph Isomorphism 


In this entry, the focus is on the program 
nauty, which is generally regarded as the most 
successful for practical use. McKay wrote the 
first version of nauty in 1976 and described its 
method of operation in [5]. It is known [7] to have 
exponential worst-case time, but in practice the 
worst case is rarely encountered. 

The input to nauty is a graph with colored 
vertices. Two outputs are produced. The first is 
a set of generators for the color-preserving auto- 
morphism group. Though it is rarely necessary, 
the full group can also be developed element by 
element. The second, optional, output is a canon- 
ical graph. The canonical graph has the following 
property: two input graphs with the same number 
of vertices of each color have the same canonical 
graph if and only if they are isomorphic by 
a color-preserving isomorphism. 

Two graph data structures are supported: 
a packed adjacency matrix suitable for small 
dense graphs and a linked list suitable for large 
sparse graphs. 


Applications 


As mentioned, nauty can handle graphs with 
colored vertices. In this section, it is described 
how several other types of isomorphism problems 
can be solved by mapping them onto a problem 
for vertex-colored graphs. 


Isomorphism of Edge-Colored Graphs

An isomorphism of two graphs, each with both 
vertices and edges colored, is defined in the 
obvious way. An example of such a graph appears 
at the left of Fig. 2. 


In the center of the figure the colors are identified
with the integers 1, 2, 3. At the right of
the figure an equivalent vertex-colored graph is 
shown. In this case there are two layers, each with 
its own color. Edges of color 1 are represented as 
an edge in the first (lowest) layer, edges of color 2 
are represented as an edge in the second layer, 
and edges of color 3 are represented as edges in 
both layers. It is now easy to see that the auto- 
morphism group of the new graph (specifically, 
its action on the first layer) is the automorphism 
group of the original graph. Moreover, the order 
in which a canonical labeling of the new graph 
labels the vertices of the first layer can be taken 
to be a canonical labeling of the original graph. 

More generally, if the edge colors are integers
in $\{1, 2, \ldots, 2^d - 1\}$, there are d layers, and the
binary expansion of each color number dictates
which layers contain edges. The vertical threads
(each corresponding to one vertex of the original
graph) can be connected using either paths or
cliques. If the original graph has n vertices and
k colors, the new graph has $O(n \log k)$ vertices.
This can be improved to $O(n \sqrt{\log k})$ vertices by
also using edges that are not horizontal.
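The layered construction can be sketched as follows, assuming vertices 0 to n - 1, a dictionary mapping each undirected edge to a positive integer color, and vertical threads connected as paths. The function name and the (layer, vertex) naming of the new vertices are choices made for this illustration only.

```python
def layered_graph(n, colored_edges):
    """Turn an edge-colored graph into a vertex-colored graph with d layers."""
    max_color = max(colored_edges.values(), default=1)
    d = max_color.bit_length()  # colors lie in {1, ..., 2^d - 1}
    # Every vertex in layer l receives vertex color l.
    vertex_color = {(layer, v): layer for layer in range(d) for v in range(n)}
    edges = set()
    # Vertical threads: the d copies of each original vertex form a path.
    for v in range(n):
        for layer in range(d - 1):
            edges.add(((layer, v), (layer + 1, v)))
    # Horizontal edges: bit l of the color says whether layer l gets the edge.
    for (u, v), color in colored_edges.items():
        for layer in range(d):
            if (color >> layer) & 1:
                edges.add(((layer, u), (layer, v)))
    return vertex_color, edges
```

With colors 1, 2, and 3 this yields two layers, with color 1 in the first layer, color 2 in the second, and color 3 in both, matching the example in the figure.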


Isomorphism of Hypergraphs and Designs 

A hypergraph is similar to an undirected graph 
except that the edges can be vertex sets of any 
size, not just of size 2. Such a structure is also 
called a design. 

On the left of Fig. 3 there is a hypergraph
with five vertices, two edges of size 2, and one
edge of size 3. On the right is an equivalent
vertex-colored graph. The vertices on the left,
colored with one color, represent the hypergraph
edges, while the vertices on the right, colored with a
different color, represent the hypergraph vertices.
The edges of the graph indicate the hypergraph
incidence (containment) relationship.


Graph Isomorphism, Fig. 2 Graph isomorphism with colored edges

Graph Isomorphism, Fig. 3 Hypergraph/design isomorphism as graph isomorphism

The edge-vertex incidence matrix appears 
in the center of the figure. This can be any 
binary matrix at all, which correctly suggests 
that the problem under consideration is just 
that of determining the 0-1 matrix equivalence 
under independent permutation of the rows and 
columns. By combining this idea with the previ- 
ous construction, such an equivalence relation on 
the set of matrices with arbitrary entries can be 
handled. 


Other Examples 

For several applications to equivalence operations 
such as isotopy, important for Latin squares and 
quasigroups, see [6]. 

Another important type of equivalence relates 
matrices over $\{-1, +1\}$. As well as permuting
rows and columns, it allows multiplication of 
rows and columns by —1. A method of converting 
this Hadamard equivalence problem to a graph 
isomorphism problem is given in [4]. 


Experimental Results 


Nauty gives a choice of sparse and dense data 
structures, and some special code for difficult 
graph classes. For the following timing examples, 
the best of the various options are used for a sin- 
gle CPU of a 2.4 GHz Intel Core-duo processor. 


1. Random graph with 10,000 vertices, $p = \frac{1}{2}$:
0.014 s for group only, 0.4 s for canonical
labeling as well.

2. Random cubic graph with 100,000 vertices:
8 s.

3. 1-skeleton of 20-dimensional cube (1,048,576
vertices, group size $2.5 \times 10^{24}$): 92 s.

4. 3-dimensional mesh of size 50 (125,000
vertices): 0.7 s.

5. 1027-vertex strongly regular graph from
random Steiner triple system: 0.6 s.


Examples of more difficult graphs can be found 
in the nauty documentation. 


URL to Code 


The source code of nauty is available at
http://cs.anu.edu.au/~bdm/nauty/. Another implementation
of the automorphism group portion of
nauty, highly optimized for large sparse graphs,
is available as saucy [2]. Nauty is also
incorporated into a number of general-purpose
packages, including GAP, Magma, and MuPad.


Problem Definition


The basic problem we consider is testing whether
an undirected graph G on n nodes $\{v_1, \ldots, v_n\}$
is connected. We consider this problem in the
following two related models:


1. Dynamic Graph Stream Model: The graph
G is defined by a sequence of edge insertions
and deletions; the edges of G are the set of
edges that have been inserted but not subsequently
deleted. An algorithm for analyzing G
may only read the input sequence from left to
right and has limited working memory. If the
available memory were $O(n^2)$ bits, then the
algorithm could maintain the exact set of edges
that have been inserted but not deleted. The
primary objective in designing an algorithm
in the stream model is to reduce the amount
of memory required. Ideally, the time to process
each element of the stream and the
postprocessing time should be small, but ensuring
this is typically a secondary objective.

2. Simultaneous Communication Model: We
consider the n rows of the adjacency matrix
of G to be partitioned between n players
$P_1, \ldots, P_n$, where $P_i$ receives the ith row of
the matrix. This means that the existence of
any edge is known by exactly two players.
An additional player Q wants to evaluate a
property of G, and to facilitate this, each
player $P_i$ simultaneously sends a message $m_i$
to Q such that Q may evaluate the property
given the messages $m_1, m_2, \ldots, m_n$. With
n-bit messages from each player, Q could learn
the entire graph and the problem would be
uninteresting. The objective is to minimize
the number of bits sent by each player. Note
that the $P_i$ players may not communicate with
each other and that each message $m_i$ must
be constructed given only the ith row of the
adjacency matrix and possibly a set of random
bits that is known to all the players.


If there were no edge deletions in the data 
stream setting, it would be simple to determine 
whether G was connected using $O(n \log n)$ memory
since it is possible to maintain the connected
components of the graph; whenever an edge is 
added, we merge the connected components con- 
taining the endpoints of this edge. This algo- 
rithm is optimal in terms of space [19]. Such 
an approach does not extend if edges may also 
be deleted since it is unclear how the connected 
components should be updated when an edge is 
deleted within a connected component. 
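For the insertion-only case, the component-merging idea can be sketched with a union-find structure over node indices 0 to n - 1. The class and method names are assumptions made for this illustration; the structure stores one parent pointer per node, which is where the $O(n \log n)$ bits come from.

```python
class InsertOnlyConnectivity:
    """Tracks connected components of a graph given as a stream of edge insertions."""

    def __init__(self, n):
        self.parent = list(range(n))  # one pointer of O(log n) bits per node
        self.components = n

    def find(self, v):
        # Follow parent pointers to the representative, halving the path on the way.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert(self, u, v):
        # Merge the components containing the endpoints of the new edge.
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.components -= 1

    def is_connected(self):
        return self.components == 1
```

There is no matching `delete` operation: once two components are merged, the structure no longer records which edges caused the merge, which is exactly the difficulty described above.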

To illustrate the challenge in the simultaneous
communication model, suppose G is connected
but $G \setminus \{e\}$ is disconnected for some edge e. The
player Q can only learn about the existence of the
edge $e = \{v_i, v_j\}$ from either player $P_i$ or player
$P_j$, but since both of these players have limited
knowledge of the graph, neither will realize the
important role this edge plays in determining the
connectivity of the graph.


Linear Sketches 

For both models the best known algorithms are
based on random linear projections, aka linear
sketches. If we denote the n rows of the adjacency
matrix by $x_1, \ldots, x_n \in \{0,1\}^n$, then the linear
sketches of the graph are $A_1(x_1), \ldots, A_n(x_n)$
where each $A_i \in \mathbb{R}^{d \times n}$ is a random matrix
chosen according to a specific distribution. Note
that the matrices $A_1, \ldots, A_n$ need not be chosen
independently.

In the simultaneous communication model,
the message from player $P_i$ is $m_i = A_i(x_i)$,
and, assuming that the entries of $A_i$ have
polynomial precision, each of these messages requires
$O(d\ \mathrm{polylog}\ n)$ bits. In the dynamic graph stream
model, the algorithm constructs each $A_i(x_i)$
using $O(nd\ \mathrm{polylog}\ n)$ bits of space. Note that each
$A_i(x_i)$ can be constructed incrementally using
the following update rules:


on the insertion of $\{v_i, v_j\}$: $A_i(x_i) \leftarrow A_i(x_i) + A_i(e_j)$
on the deletion of $\{v_i, v_j\}$: $A_i(x_i) \leftarrow A_i(x_i) - A_i(e_j)$
on the insertion/deletion of $\{v_j, v_k\}$ for $i \notin \{j, k\}$: $A_i(x_i) \leftarrow A_i(x_i)$


where $e_j$ is the characteristic vector of the set
$\{j\}$. Hence, we have transformed the problem
of designing an efficient algorithm into finding
the minimum d such that there exists a
distribution of matrices $A_1, \ldots, A_n \in \mathbb{R}^{d \times n}$ such
that for any graph G, we can determine (with
high probability) whether G is connected given
$A_1(x_1), \ldots, A_n(x_n)$.
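The update rules can be illustrated with a small sketch for a single node i. The class name and the choice of independent random signs for the entries of $A_i$ are assumptions made purely for illustration; the distribution actually needed for connectivity is the one developed later in the entry. Indices are 0-based here.

```python
import random


class RowSketch:
    """Maintains the projection A_i(x_i) of one adjacency row under edge updates."""

    def __init__(self, n, d, seed=0):
        rng = random.Random(seed)
        # Hypothetical distribution: a d x n matrix of independent random signs.
        self.matrix = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(d)]
        self.value = [0] * d  # A_i applied to the all-zero row

    def insert(self, j):
        # A_i(x_i) <- A_i(x_i) + A_i(e_j), i.e., add column j of A_i.
        for r, row in enumerate(self.matrix):
            self.value[r] += row[j]

    def delete(self, j):
        # A_i(x_i) <- A_i(x_i) - A_i(e_j), i.e., subtract column j of A_i.
        for r, row in enumerate(self.matrix):
            self.value[r] -= row[j]
```

Because the projection is linear, a deletion exactly cancels the matching insertion, so the stored value depends only on the current row $x_i$ and not on the history of updates.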


Key Results 


The algorithm for connectivity that we present
in this entry, and much of the subsequent work
on graph sketching, fits the following template.
First, we consider a basic “non-sketch” algorithm
for the graph problem in question. Second, we
design sketches $A_i$ such that it is possible to
emulate the steps of the basic algorithm given
only the projections $A_i(x_i) \in \mathbb{R}^d$ where
$d = O(\mathrm{polylog}\ n)$.


Connectivity 
Basic Non-sketch Algorithm 


We pick an incident edge for each node arbitrarily 
and collapse the resulting connected components 


into a set of “supernodes.” In each subsequent 
round of the algorithm, we pick an edge from 
each supernode to another supernode (if one 
exists) and collapse the connected components 
into new supernodes. It can be shown that this 
process terminates after O(log n) rounds and that 
the set of edges picked during the different rounds
includes a spanning forest of the graph. From this
we can deduce whether the graph is connected. 
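The basic non-sketch algorithm can be sketched as follows, assuming nodes 0 to n - 1 and an explicit edge list. The function name is hypothetical, and "pick an edge arbitrarily" is implemented here as "take the first edge in the list that leaves the supernode".

```python
def spanning_forest_by_rounds(n, edges):
    """Repeatedly pick one outgoing edge per supernode and collapse components."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    rounds = 0
    while True:
        picked = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                # Record one edge leaving each of the two supernodes.
                picked.setdefault(ru, (u, v))
                picked.setdefault(rv, (u, v))
        if not picked:
            break  # no supernode has an edge to another supernode
        rounds += 1
        for u, v in picked.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
    return forest, rounds
```

The graph is connected exactly when the returned forest has n - 1 edges. Every supernode that still has an outgoing edge is merged with at least one other in each round, which is why the number of rounds is logarithmic.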


Designing the Sketches 
There are two main steps required in constructing 
the sketches for the connectivity algorithm: 


An Alternative Graph Representation. Rather
than consider the rows of the adjacency matrix
$x_i$, it will be convenient to consider an alternative
representation $a_i \in \{-1, 0, 1\}^{\binom{n}{2}}$ with entries
indexed by pairs of nodes:


$$a_i[\{j,k\}] = \begin{cases} 1 & \text{if } i = j < k \text{ and } \{v_j, v_k\} \in E \\ -1 & \text{if } j < k = i \text{ and } \{v_j, v_k\} \in E \\ 0 & \text{otherwise} \end{cases}$$


These vectors have the useful property that for
any subset of nodes $\{v_i\}_{i \in S}$, the nonzero
entries of $\sum_{i \in S} a_i$ correspond exactly to the edges
across the cut $(S, V \setminus S)$.

For example, consider the graph on nodes
$\{v_1, v_2, v_3, v_4\}$ with edges $\{v_1, v_2\}$, $\{v_2, v_3\}$,
$\{v_3, v_4\}$, and $\{v_1, v_4\}$. Then


$a_1 = (\ \ 1,\ 0,\ \ 1,\ \ 0,\ 0,\ \ 0)$
$a_2 = (-1,\ 0,\ \ 0,\ \ 1,\ 0,\ \ 0)$
$a_3 = (\ \ 0,\ 0,\ \ 0,\ -1,\ 0,\ \ 1)$
$a_4 = (\ \ 0,\ 0,\ -1,\ \ 0,\ 0,\ -1)$


where the entries correspond to the pairs {1, 2},
{1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} in that order.
Note that the nonzero entries of

$a_1 + a_2 = (0,\ 0,\ 1,\ 1,\ 0,\ 0)$

correspond to {1, 4} and {2, 3}, which are exactly
the edges across the cut $(S, V \setminus S)$ for
$S = \{v_1, v_2\}$.
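The example can be checked mechanically. The sketch below builds the vectors $a_i$ for nodes numbered 1 to n and reads off the cut edges from the nonzero entries of their sum; the function names are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def node_vectors(n, edges):
    """Return the ordered list of pairs and the vector a_i for every node i."""
    pairs = list(combinations(range(1, n + 1), 2))  # {1,2}, {1,3}, ..., {n-1,n}
    index = {pair: t for t, pair in enumerate(pairs)}
    vectors = {i: [0] * len(pairs) for i in range(1, n + 1)}
    for u, v in edges:
        j, k = min(u, v), max(u, v)
        vectors[j][index[(j, k)]] = 1    # smaller endpoint gets +1
        vectors[k][index[(j, k)]] = -1   # larger endpoint gets -1
    return pairs, vectors


def cut_edges(n, edges, S):
    """Edges across (S, V \\ S), read from the nonzero entries of the sum over S."""
    pairs, vectors = node_vectors(n, edges)
    total = [sum(vectors[i][t] for i in S) for t in range(len(pairs))]
    return [pairs[t] for t, value in enumerate(total) if value != 0]
```

An edge with both endpoints in S contributes +1 and -1 to the same entry and cancels, while an edge with exactly one endpoint in S leaves a single nonzero entry.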


$\ell_0$-Sampling via Linear Sketches. $\ell_0$-sampling
is a technique that has found numerous
applications in data stream processing. We
appeal to a result by Jowhari et al. [12] that
shows the existence of a distribution over
matrices $M \in \mathbb{R}^{\mathrm{polylog}(n) \times \mathrm{poly}(n)}$ such that for
any nonzero vector $z \in \mathbb{R}^{\mathrm{poly}(n)}$, the index of
some nonzero entry of z can be reconstructed
with high probability given $M(z) \in \mathbb{R}^{\mathrm{polylog}(n)}$.
Note that we do not get to choose which entry is
reconstructed.


Emulating the Basic Algorithm via Sketches

Let $M_1, \ldots, M_r$ be $r = O(\log n)$ independent
sketch matrices for $\ell_0$-sampling. Given $M_j(a_i)$
for all $j \in [r]$ and $i \in [n]$, we can emulate the
basic algorithm as follows:


1. Given $M_1(a_1), M_1(a_2), \ldots, M_1(a_n)$, we
may emulate the first round of the algorithm
since from each $M_1(a_i)$ we may reconstruct a
nonzero entry of $a_i$, and these nonzero entries
correspond to edges incident to $v_i$.

2. To emulate round $j > 1$ of the algorithm,
suppose S is one of the connected components
already constructed. Then, given

$$\sum_{i \in S} M_j(a_i) = M_j\left(\sum_{i \in S} a_i\right)$$

we may reconstruct a nonzero entry of
$\sum_{i \in S} a_i$, which corresponds to an edge across
the cut $(S, V \setminus S)$.


Extensions and Further Work 


Subsequent work has extended the above results
significantly. If d is increased to $O(k\ \mathrm{polylog}\ n)$,
then it is possible to test whether every cut has at
least k edges [1]. With $d = O(\epsilon^{-2}\ \mathrm{polylog}\ n)$,
it is possible to construct graph sparsifiers that
can be used to estimate the size of every cut
up to a $(1 + \epsilon)$ factor [2] along with spectral
properties such as the eigenvalues of the graph
[14]. With $d = O(\epsilon^{-1} k\ \mathrm{polylog}\ n)$, it is possible
to distinguish graphs which are not k-vertex
connected from those that are at least $(1 + \epsilon)k$-vertex
connected [11]. Some of the above results
have also been extended to hypergraphs [11].
The algorithm presented in this entry can be
implemented with $O(\mathrm{polylog}\ n)$ update time in the
dynamic graph stream model, but a connectivity
query may take $\Omega(n)$ time. This was addressed in
subsequent work by Kapron et al. [15].

More generally, solving graph problems via
linear sketches has become a very active area of
research [1-8, 10, 11, 13, 14, 16, 17]. Other problems
that have been considered include approximating
the densest subgraph [6, 9, 18], maximum
matching [5, 7, 8, 16], vertex cover and hitting set
[8], correlation clustering [4], and estimating the
number of triangles [17].


Technique for analysis of greedy approximation


Problem Definition 


Consider a graph G = (V, E). A subset C of V
is called a dominating set if every vertex is either
in C or adjacent to a vertex in C. If, furthermore,
the subgraph induced by C is connected, then C
is called a connected dominating set.

Given a connected graph G, find a connected
dominating set of minimum cardinality. This
problem is denoted by MCDS and is NP-hard.
Its optimal solution is called a minimum
connected dominating set. The following is a
greedy approximation with potential function f.


Greedy Algorithm A:

    C ← ∅;
    while f(C) > 2 do
        choose a vertex x to maximize f(C) - f(C ∪ {x}) and
        C ← C ∪ {x};
    output C.


Here, f is defined as f(C) = p(C) + q(C)
where p(C) is the number of connected
components of the subgraph induced by C and
q(C) is the number of connected components
of the subgraph with vertex set V and edge
set $\{(u, v) \in E \mid u \in C \text{ or } v \in C\}$. f has an
important property that C is a connected
dominating set if and only if f(C) = 2.

If C is a connected dominating set, then
p(C) = q(C) = 1, and hence f(C) = 2.
Conversely, suppose f(C) = 2.
Since $p(C) \ge 1$ and $q(C) \ge 1$, one has
p(C) = q(C) = 1, which implies that C
is a connected dominating set. f has another
property, for G with at least three vertices, that
if f(C) > 2, then there exists $x \in V$ such
that $f(C) - f(C \cup \{x\}) > 0$. In fact, for
$C = \emptyset$, since G is a connected graph with at
least three vertices, there must exist a vertex x
with degree at least two, and for such a vertex
x, $f(C \cup \{x\}) < f(C)$. For $C \ne \emptyset$, consider
a connected component of the subgraph induced
by C. Let B denote its vertex set, which is a
subset of C. For every vertex y adjacent to
B, if y is adjacent to a vertex not adjacent to
B and not in C, then $p(C \cup \{y\}) \le p(C)$
and $q(C \cup \{y\}) < q(C)$; if y is adjacent to a
vertex in C - B, then $p(C \cup \{y\}) < p(C)$ and
$q(C \cup \{y\}) \le q(C)$.

Now, look at a possible analysis for the above
greedy algorithm: Let $x_1, \ldots, x_g$ be the vertices
chosen by the greedy algorithm, in the order of
their appearance in the algorithm. Denote $C_i =
\{x_1, \ldots, x_i\}$. Let $C^* = \{y_1, \ldots, y_{\mathrm{opt}}\}$ be a
minimum connected dominating set. Since adding
$C^*$ to $C_i$ will reduce the potential function value
from $f(C_i)$ to 2, the value of $f$ reduced by
a vertex in $C^*$ would be $(f(C_i) - 2)/\mathrm{opt}$ on
average. By the greedy rule for choosing $x_{i+1}$,
one has


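$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}}.$$

Hence,

$$f(C_{i+1}) - 2 \le (f(C_i) - 2)\left(1 - \frac{1}{\mathrm{opt}}\right) \le (f(\emptyset) - 2)\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1} = (n - 2)\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1},$$

where $n = |V|$. Note that $1 - 1/\mathrm{opt} \le e^{-1/\mathrm{opt}}$.
Hence,

$$f(C_i) - 2 \le (n - 2)\, e^{-i/\mathrm{opt}}.$$

Choose $i$ such that $f(C_i) \ge \mathrm{opt} + 2 > f(C_{i+1})$.
Then

$$\mathrm{opt} \le (n - 2)\, e^{-i/\mathrm{opt}}$$

and

$$g - i \le \mathrm{opt}.$$

Therefore,

$$g \le \mathrm{opt} + i \le \mathrm{opt}\left(1 + \ln\frac{n - 2}{\mathrm{opt}}\right).$$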


Is this analysis correct? The answer is NO. Why? 
How could one give a correct analysis? This entry 
will answer those questions and introduce a new 
general technique, analysis of greedy approxima- 
tion with nonsubmodular potential function. 


Key Results 


The Role of Submodularity 

Consider a set $X$ and a function $f$ defined on the
power set $2^X$, i.e., the family of all subsets of $X$.
$f$ is said to be submodular if for any two subsets
$A$ and $B$ in $2^X$,

$$f(A) + f(B) \ge f(A \cap B) + f(A \cup B).$$

For example, consider a connected graph $G$. Let
$X$ be the vertex set of $G$. The function $-q(C)$
defined in the last section is submodular. To
see this, first mention a property of submodular
functions.

A submodular function $f$ is normalized if
$f(\emptyset) = 0$. Every submodular function $f$ can
be normalized by setting $g(A) = f(A) - f(\emptyset)$.
A function $f$ is monotone increasing if $f(A) \le
f(B)$ for $A \subset B$. Denote $\Delta_x f(A) = f(A \cup
\{x\}) - f(A)$.


Lemma 1 A function $f : 2^X \to \mathbb{R}$ is submodular
if and only if $\Delta_x f(A) \ge \Delta_x f(B)$ for any
$x \in X - B$ and $A \subseteq B$. Moreover, $f$ is monotone
increasing if and only if $\Delta_x f(A) \ge \Delta_x f(B)$ for
any $x \in B$ and $A \subseteq B$.


Proof If $f$ is submodular, then for $x \in X - B$
and $A \subseteq B$, one has

$$f(A \cup \{x\}) + f(B) \ge f((A \cup \{x\}) \cup B) + f((A \cup \{x\}) \cap B) = f(B \cup \{x\}) + f(A),$$

that is,

$$\Delta_x f(A) \ge \Delta_x f(B). \tag{1}$$

Conversely, suppose (1) holds for any $x \in X - B$ and
$A \subseteq B$. Let $C$ and $D$ be two sets and $C \setminus D =
\{x_1, \ldots, x_k\}$. Then

$$f(C \cup D) - f(D) = \sum_{i=1}^{k} \Delta_{x_i} f(D \cup \{x_1, \ldots, x_{i-1}\}) \le \sum_{i=1}^{k} \Delta_{x_i} f((C \cap D) \cup \{x_1, \ldots, x_{i-1}\}) = f(C) - f(C \cap D).$$

If $f$ is monotone increasing, then for $A \subseteq B$,
$f(A) \le f(B)$. Hence, for $x \in B$,

$$\Delta_x f(A) \ge 0 = \Delta_x f(B).$$

Conversely, if $\Delta_x f(A) \ge \Delta_x f(B)$ for any $x \in B$
and $A \subseteq B$, then for any $x$ and $A$, $\Delta_x f(A) \ge
\Delta_x f(A \cup \{x\}) = 0$, that is, $f(A) \le f(A \cup \{x\})$.
Let $B - A = \{x_1, \ldots, x_k\}$. Then

$$f(A) \le f(A \cup \{x_1\}) \le f(A \cup \{x_1, x_2\}) \le \cdots \le f(B). \qquad \square$$

Next, the submodularity of $-q(A)$ is studied.

Lemma 2 If $A \subset B$, then $\Delta_y q(A) \le \Delta_y q(B)$.


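Proof Let $D(A)$ denote the edge set $\{(u,v) \in E \mid
u \in A \text{ or } v \in A\}$, so that $q(A)$ is the number of
connected components of the graph $(V, D(A))$.
Note that each connected component of
graph $(V, D(B))$ is constituted by one or more
connected components of graph $(V, D(A))$ since
$A \subset B$. Thus, the number of connected components
of $(V, D(B))$ dominated by $y$ is no more
than the number of connected components of
$(V, D(A))$ dominated by $y$. Therefore, the lemma
holds.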
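
Lemma 2 can be checked by brute force on a small graph. The sketch below is only an illustration: the helper names `q`, `subsets`, and `lemma2_holds` are hypothetical, and the exhaustive search is exponential in |V|, so it is suitable only for tiny examples.

```python
from itertools import combinations


def q(V, E, C):
    """Number of connected components of (V, D(C)),
    where D(C) is the set of edges with at least one endpoint in C."""
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, w in E:
        if u in C or w in C:
            parent[find(u)] = find(w)
    return len({find(v) for v in V})


def subsets(items):
    """Yield every subset of items as a set."""
    items = list(items)
    return (set(c) for r in range(len(items) + 1)
            for c in combinations(items, r))


def lemma2_holds(V, E):
    """Check that Delta_y q(A) <= Delta_y q(B) for all A ⊆ B ⊆ V and y in V."""
    for B in subsets(V):
        for A in subsets(B):
            for y in V:
                d_A = q(V, E, A | {y}) - q(V, E, A)
                d_B = q(V, E, B | {y}) - q(V, E, B)
                if d_A > d_B:
                    return False
    return True


# A 5-cycle with one chord (an arbitrary small test graph).
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
print(lemma2_holds(V, E))  # True
```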

The relationship between submodular func- 
tions and greedy algorithms has been established 
for a long time [2]. 

Let $f$ be a normalized, monotone increasing,
submodular integer function. Consider the
minimization problem

$$\min\ c(A) \quad \text{subject to } A \in \mathcal{C}_f,$$

where $c$ is a nonnegative cost function defined on
$2^X$ and $\mathcal{C}_f = \{C \mid f(C \cup \{x\}) - f(C) =
0 \text{ for all } x \in X\}$. The following is a greedy
algorithm that produces an approximate solution for
this problem.


Greedy Algorithm B

input submodular function $f$ and cost function $c$;

$A \leftarrow \emptyset$;

while there exists $x \in X$ such that $\Delta_x f(A) > 0$

do select an element $x$ that maximizes $\Delta_x f(A)/c(x)$
and set

$A \leftarrow A \cup \{x\}$;

return $A$.


The following two results are well known. 


Theorem 1 If $f$ is a normalized, monotone
increasing, submodular integer function, then
Greedy Algorithm B produces an approximation
solution within a factor of $H(\gamma)$ from optimal,
where $\gamma = \max_{x \in X} f(\{x\})$ and $H(k) = \sum_{i=1}^{k} 1/i$.

Theorem 2 Let $f$ be a normalized, monotone
increasing, submodular function and $c$ a
nonnegative cost function. If in Greedy Algorithm B,
the selected $x$ always satisfies $\Delta_x f(A_{i-1})/c(x) \ge
1$, then it produces an approximation solution
within a factor of $1 + \ln(f^*/\mathrm{opt})$ from optimal
for the above minimization problem, where $f^* =
f(A^*)$ and $\mathrm{opt} = c(A^*)$ for an optimal solution
$A^*$.


Now, come back to the analysis of Greedy
Algorithm A for the MCDS. It looks as if
the submodularity of $f$ is not used. Actually,
the submodularity was implicitly used in the
following statement:

“Since adding $C^*$ to $C_i$ will reduce the
potential function value from $f(C_i)$ to 2, the
value of $f$ reduced by a vertex in $C^*$ would be
$(f(C_i) - 2)/\mathrm{opt}$ on average. By the greedy rule for
choosing $x_{i+1}$, one has
$f(C_i) - f(C_{i+1}) \ge (f(C_i) - 2)/\mathrm{opt}$.”

To see this, write this argument more carefully.
Let $C^* = \{y_1, \ldots, y_{\mathrm{opt}}\}$ and denote $C^*_j =
\{y_1, \ldots, y_j\}$. Then

$$f(C_i) - 2 = f(C_i) - f(C_i \cup C^*) = \sum_{j=1}^{\mathrm{opt}} \left[ f(C_i \cup C^*_{j-1}) - f(C_i \cup C^*_j) \right],$$

where $C^*_0 = \emptyset$. By the greedy rule for choosing
$x_{i+1}$, one has

$$f(C_i) - f(C_{i+1}) \ge f(C_i) - f(C_i \cup \{y_j\})$$

for $j = 1, \ldots, \mathrm{opt}$. Therefore, it needs to have

$$-\Delta_{y_j} f(C_i) = f(C_i) - f(C_i \cup \{y_j\}) \ge f(C_i \cup C^*_{j-1}) - f(C_i \cup C^*_j) = -\Delta_{y_j} f(C_i \cup C^*_{j-1}) \tag{2}$$

in order to have

$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}}.$$

Equation (2) asks for the submodularity of $-f$.
Unfortunately, $-f$ is not submodular. A
counterexample can be found in [2]. This is why the
analysis of Greedy Algorithm A in section
“Problem Definition” is incorrect.


Giving Up Submodularity 


Giving up submodularity is a challenging task, and
the question stayed open for a long time. It is
possible, however, based on the following observation
on (2) by Du et al. [1]: The submodularity of $-f$ is
applied only to the increment of a vertex $y_j$ belonging
to the optimal solution $C^*$.

Since the ordering of the $y_j$’s is flexible, one may
arrange it to keep $\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1})$
under control. This is a successful idea for the
MCDS.


Lemma 3 Let the $y_j$’s be ordered in such a way that
for any $j = 1, \ldots, \mathrm{opt}$, $\{y_1, \ldots, y_j\}$ induces a
connected subgraph. Then

$$\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1}) \le 1.$$

Proof Since all $y_1, \ldots, y_{j-1}$ are connected, $y_j$
can dominate at most one additional connected
component in the subgraph induced by $C_i \cup
C^*_{j-1}$ than in the subgraph induced by $C_i$.
Hence,

$$\Delta_{y_j} p(C_i) - \Delta_{y_j} p(C_i \cup C^*_{j-1}) \le 1.$$

Moreover, since $-q$ is submodular,

$$\Delta_{y_j} q(C_i) - \Delta_{y_j} q(C_i \cup C^*_{j-1}) \le 0.$$

Therefore,

$$\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1}) \le 1.$$


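Now, one can give a correct analysis for the
greedy algorithm for the MCDS [3].
By Lemma 3,

$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}} - 1.$$

Hence,

$$f(C_{i+1}) - 2 - \mathrm{opt} \le (f(C_i) - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right) \le (f(\emptyset) - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1} = (n - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1},$$

where $n = |V|$. Note that $1 - 1/\mathrm{opt} \le e^{-1/\mathrm{opt}}$.
Hence,

$$f(C_i) - 2 - \mathrm{opt} \le (n - 2)\, e^{-i/\mathrm{opt}}.$$

Choose $i$ such that $f(C_i) \ge 2 \cdot \mathrm{opt} + 2 >
f(C_{i+1})$. Then

$$\mathrm{opt} \le (n - 2)\, e^{-i/\mathrm{opt}}$$

and

$$g - i \le 2 \cdot \mathrm{opt}.$$

Therefore,

$$g \le 2 \cdot \mathrm{opt} + i \le \mathrm{opt}\left(2 + \ln\frac{n - 2}{\mathrm{opt}}\right) \le \mathrm{opt}\,(2 + \ln \delta),$$

where $\delta$ is the maximum degree of input graph
$G$. □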
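
For readers who want the intermediate step, the way Lemma 3 yields the first inequality of the corrected analysis can be spelled out as follows. This is a reconstruction from the telescoping sum, Lemma 3, and the greedy rule stated above, using the same symbols; it is offered as a reading aid rather than as the original authors' wording.

```latex
\begin{align*}
f(C_i) - 2
  &= \sum_{j=1}^{\mathrm{opt}} \bigl[-\Delta_{y_j} f(C_i \cup C^*_{j-1})\bigr]
     && \text{(telescoping sum, } f(C_i \cup C^*) = 2\text{)} \\
  &\le \sum_{j=1}^{\mathrm{opt}} \bigl[-\Delta_{y_j} f(C_i) + 1\bigr]
     && \text{(Lemma 3)} \\
  &\le \mathrm{opt}\cdot\bigl[f(C_i) - f(C_{i+1})\bigr] + \mathrm{opt}
     && \text{(greedy rule for } x_{i+1}\text{)}.
\end{align*}
```

Dividing by opt and rearranging gives $f(C_i) - f(C_{i+1}) \ge (f(C_i) - 2)/\mathrm{opt} - 1$.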


Applications 


The technique introduced in the previous section 
has many applications, including analysis of it- 
erated 1-Steiner trees for minimum Steiner tree 
problem and analysis of greedy approximations 
for optimization problems in optical networks [3] 
and wireless networks [2]. 


Open Problems 


Can one show the performance ratio $1 + H(\delta)$
for Greedy Algorithm B for the MCDS? The
answer is unknown. More generally, it is
unknown how to get a clean generalization of
Theorem 1.


Greedy Set-Cover Algorithms

Problem Definition


Given a collection $S$ of sets over a universe $U$,
a set cover $C \subseteq S$ is a subcollection of the
sets whose union is $U$. The set-cover problem is,
given $S$, to find a minimum-cardinality set cover.
In the weighted set-cover problem, for each set
$s \in S$, a weight $w_s \ge 0$ is also specified, and the
goal is to find a set cover $C$ of minimum total
weight $\sum_{s \in C} w_s$.

Weighted set cover is a special case of minimizing
a linear function subject to a submodular
constraint, defined as follows. Given a collection
$S$ of objects, for each object $s$ a nonnegative
weight $w_s$, and a nondecreasing submodular
function $f : 2^S \to \mathbb{R}$, the goal is to find a
subcollection $C \subseteq S$ such that $f(C) = f(S)$,
minimizing $\sum_{s \in C} w_s$. (Taking $f(C) = |\bigcup_{s \in C} s|$
gives weighted set cover.)


Key Results 


The greedy algorithm for weighted set cover
builds a cover by repeatedly choosing a set $s$ that
minimizes the weight $w_s$ divided by the number
of elements in $s$ not yet covered by chosen sets. It
stops and returns the chosen sets when they form
a cover. Let $H_k$ denote $\sum_{i=1}^{k} 1/i \approx \ln k$, where
$k$ is the largest set size.


greedy-set-cover(S, w)
1. Initialize $C \leftarrow \emptyset$. Define $f(C) = |\bigcup_{s \in C} s|$.
2. Repeat until $f(C) = f(S)$:
3. Choose $s \in S$ minimizing the price per
element $w_s / [f(C \cup \{s\}) - f(C)]$.
4. Let $C \leftarrow C \cup \{s\}$.
5. Return $C$.
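
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: the function name `greedy_set_cover` and the dictionary-based input format are assumptions made here, sets that would cover no new element are skipped to avoid division by zero, and the input is assumed to admit a cover.

```python
def greedy_set_cover(sets, weights):
    """sets: dict name -> set of elements; weights: dict name -> weight.

    Returns the list of chosen set names in the order they were picked.
    """
    universe = set().union(*sets.values())
    covered, cover = set(), []
    while covered != universe:
        # price per element = weight / number of newly covered elements
        best = min(
            (s for s in sets if sets[s] - covered),
            key=lambda s: weights[s] / len(sets[s] - covered),
        )
        cover.append(best)
        covered |= sets[best]
    return cover


sets = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {3, 4}, "d": {5}}
weights = {"a": 3, "b": 1, "c": 1, "d": 1}
print(greedy_set_cover(sets, weights))  # ['b', 'c', 'd']
```

Ties are broken by dictionary order here; the analysis in the text does not depend on the tie-breaking rule.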


Theorem 1 The greedy algorithm returns a set
cover of weight at most $H_k$ times the minimum
weight of any cover.


Proof When the greedy algorithm chooses a set
$s$, imagine that it charges the price per element
for that iteration to each element newly covered
by $s$. Then, the total weight of the sets chosen
by the algorithm equals the total amount charged,
and each element is charged once.

Consider any set $s = \{x_k, x_{k-1}, \ldots, x_1\}$ in the
optimal set cover $C^*$. Without loss of generality,
suppose that the greedy algorithm covers the
elements of $s$ in the order given: $x_k, x_{k-1}, \ldots, x_1$.
At the start of the iteration in which the algorithm
covers element $x_i$ of $s$, at least $i$ elements of $s$
remain uncovered. Thus, if the greedy algorithm
were to choose $s$ in that iteration, it would pay a
cost per element of at most $w_s/i$. Thus, in this
iteration, the greedy algorithm pays at most $w_s/i$
per element covered. Thus, it charges element
$x_i$ at most $w_s/i$ to be covered. Summing over
$i$, the total amount charged to elements in $s$ is at
most $w_s H_k$. Summing over $s \in C^*$ and noting
that every element is in some set in $C^*$, the total
amount charged to elements overall is at most

$$\sum_{s \in C^*} w_s H_k = H_k\, \mathrm{OPT}. \qquad \square$$

The theorem was shown first for the
unweighted case (each $w_s = 1$) by Johnson
[5], Lovász [8], and Stein [13] and then extended
to the weighted case by Chvátal [2].

Since then a few refinements and improvements
have been shown, including the following:

Theorem 2 Let $S$ be a set system over a universe
with $n$ elements and weights $w_s \le 1$. The total
weight of the cover $C$ returned by the greedy
algorithm is at most $[1 + \ln(n/\mathrm{OPT})]\,\mathrm{OPT} + 1$
(compare to [12]).


Proof Assume without loss of generality that
the algorithm covers the elements in order
$x_n, x_{n-1}, \ldots, x_1$. At the start of the iteration
in which the algorithm covers $x_i$, there are at
least $i$ elements left to cover, and all of them
could be covered using multiple sets of total cost
OPT. Thus, there is some set that covers
not-yet-covered elements at a cost of at most $\mathrm{OPT}/i$ per
element.

Recall the charging scheme from the previous
proof. By the preceding observation, element
$x_i$ is charged at most $\mathrm{OPT}/i$. Thus, the
total charge to elements $x_n, \ldots, x_i$ is at most
$(H_n - H_{i-1})\,\mathrm{OPT}$. Using the assumption that
each $w_s \le 1$, the charge to each of the remaining
elements is at most 1 per element. Thus, the total
charge to all elements is at most $i - 1 + (H_n -
H_{i-1})\,\mathrm{OPT}$. Taking $i = 1 + \lceil \mathrm{OPT} \rceil$, the total
charge is at most $\lceil \mathrm{OPT} \rceil + (H_n - H_{\lceil \mathrm{OPT} \rceil})\,\mathrm{OPT} \le
1 + \mathrm{OPT}\,(1 + \ln(n/\mathrm{OPT}))$. □


Each of the above proofs implicitly constructs 
a linear-programming primal-dual pair to show 
the approximation ratio. The same approximation 
ratios can be shown with respect to any fractional 
optimum (solution to the fractional set-cover lin- 
ear program). 


Other Results 

The greedy algorithm has been shown to have an
approximation ratio of $\ln n - \ln \ln n + O(1)$ [11].
For the special case of set systems whose duals 
have finite Vapnik-Chervonenkis (VC) dimen- 
sion, other algorithms have substantially better 
approximation ratio [1]. Constant-factor approxi- 
mation algorithms are known for geometric vari- 
ants of the closely related k-median and facility 
location problems. 

The greedy algorithm generalizes naturally to
many problems. For example, for minimizing
a linear function subject to a submodular
constraint (defined above), the natural extension of
the greedy algorithm gives an $H_k$-approximate
solution, where $k = \max_{s \in S} f(\{s\}) - f(\emptyset)$,
assuming $f$ is integer valued [10].

The set-cover problem generalizes to allow
each element $x$ to require an arbitrary number $r_x$
of sets containing it to be in the cover. This
generalization admits a polynomial-time $O(\log n)$-
approximation algorithm [7].

The special case when each element belongs
to at most $r$ sets has a simple $r$-approximation
algorithm ([15] §15.2). When the sets have
uniform weights ($w_s = 1$), the algorithm reduces
to the following: select any maximal collection
of elements, no two of which are contained in the
same set; return all sets that contain a selected
element.

The variant “Max $k$-coverage” asks for a set
collection of total weight at most $k$ covering as
many of the elements as possible. This variant
has a $(1 - 1/e)$-approximation algorithm ([15]
Problem 2.18) (see [6] for sets with nonuniform
weights).

For a general discussion of greedy methods for
approximate combinatorial optimization, see ([4]
Ch. 4).

Finally, under likely complexity-theoretic
assumptions, the $\ln n$ approximation ratio is
essentially the best possible for any polynomial-time
algorithm [3, 9].


Applications 


Set cover and its generalizations and variants are 
fundamental problems with numerous applica- 
tions. Examples include: 


• Selecting a small number of nodes in a network
to store a file so that all nodes have a
nearby copy

• Selecting a small number of sentences to
be uttered to tune all features in a
speech-recognition model [14]

• Selecting a small number of telescope snapshots
to be taken to capture light from all
galaxies in the night sky

• Finding a short string having each string in a
given set as a contiguous substring


Problem Definition


E. Marczewski proved that every graph can be
represented by a list of sets where each vertex
corresponds to a set and the edges to nonempty
intersections of sets. It is natural to ask what sort
of graphs would be most likely to arise if the list
of sets is generated randomly.

Consider the model of random graphs where
each vertex chooses randomly from a universal
set the members of its corresponding set, each
independently of the others. The probability
space that is created is the space of random
intersection graphs, $G_{n,m,p}$, where $n$ is the number of
vertices, $m$ is the cardinality of a universal set of
elements, and $p$ is the probability for each vertex
to choose an element of the universal set. The
model of random intersection graphs was first
introduced by M. Karoński, E. Scheinerman, and
K. Singer-Cohen in [4]. A rigorous definition
of the model of random intersection graphs
follows:


Definition 1 Let $n$, $m$ be positive integers and
$0 \le p \le 1$. The random intersection graph
$G_{n,m,p}$ is a probability space over the set of
graphs on the vertex set $\{1, \ldots, n\}$ where each
vertex is assigned a random subset from a fixed
set of $m$ elements. An edge arises between two
vertices when their sets have at least one common
element. Each random subset assigned to a vertex
is determined by

$$\Pr[\text{vertex } i \text{ chooses element } j] = p$$

with these events mutually independent.
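
To make the definition concrete, one instance of the model can be sampled as in the following sketch. The function name `random_intersection_graph`, the `seed` parameter, and the choice to label the universal set 0, ..., m-1 are assumptions made for illustration only.

```python
import random
from itertools import combinations


def random_intersection_graph(n, m, p, seed=None):
    """Sample one graph from the random intersection graph model.

    Returns (chosen, edges): chosen[v] is the random subset of
    {0, ..., m-1} assigned to vertex v, and edges is the set of
    vertex pairs (u, v), u < v, whose subsets intersect.
    """
    rng = random.Random(seed)
    # each vertex picks each element independently with probability p
    chosen = {v: {j for j in range(m) if rng.random() < p}
              for v in range(1, n + 1)}
    edges = {(u, v) for u, v in combinations(range(1, n + 1), 2)
             if chosen[u] & chosen[v]}
    return chosen, edges


chosen, edges = random_intersection_graph(n=6, m=4, p=0.3, seed=1)
print(len(edges), "edges among 6 vertices")
```

Note that, unlike in the Erdős–Rényi model, the edges produced this way are not independent: two vertices that share an element with a third vertex are more likely to share one with each other.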


A common question for a graph is whether it has
a cycle, i.e., a set of edges that form a path whose
first and last vertices are the same, that visits all
the vertices of the graph exactly once. We call this
kind of cycle a Hamilton cycle, and a graph
that contains such a cycle is called a Hamiltonian
graph.


Definition 2 Consider an undirected graph 
G = (V, E) where V is the set of vertices and E 
the set of edges. This graph contains a Hamilton 
cycle if and only if there is a simple cycle that 
contains each vertex in V. 


Consider an instance of $G_{n,m,p}$ for specific values
of its parameters $n$, $m$, and $p$: what is the
probability that this instance is Hamiltonian? Taking
the parameter $p$ of the model to be a function
of $n$ and $m$, in [2], a threshold function $P(n,m)$
has been found for the graph property “Contains
a Hamilton cycle”; i.e., a function $P(n,m)$ is
derived such that

if $p(n,m) < P(n,m)$,

$$\lim_{n,m \to \infty} \Pr[G_{n,m,p} \text{ contains a Hamilton cycle}] = 0;$$

if $p(n,m) > P(n,m)$,

$$\lim_{n,m \to \infty} \Pr[G_{n,m,p} \text{ contains a Hamilton cycle}] = 1.$$

When a graph property, such as “Contains
a Hamilton cycle,” holds with probability that
tends to 1 (or 0) as $n$, $m$ tend to infinity, then it
is said that this property holds (does not hold)
“almost surely” or “almost certainly.”

If in $G_{n,m,p}$ the parameter $m$ is very small
compared to $n$, the model is not particularly
interesting, and when $m$ is exceedingly large
(compared to $n$), the behavior of $G_{n,m,p}$ is essentially
the same as the Erdős–Rényi model of random
graphs (see [3]). If one takes $m = \lceil n^{\alpha} \rceil$, for
fixed real $\alpha > 0$, then there is some deviation
from the standard models, while allowing for
a natural progression from sparse to dense graphs.
Thus, the parameter $m$ is assumed to be of the
form $m = \lceil n^{\alpha} \rceil$ for some fixed positive real $\alpha$.

The proof of existence of a Hamilton cycle
in $G_{n,m,p}$ is mainly based on the establishment
of a stochastic order relation between
the model $G_{n,m,p}$ and the Erdős–Rényi random
graph model $G_{n,\hat{p}}$.


Definition 3 Let $n$ be a positive integer,
$0 \le p \le 1$. The random graph $G(n, p)$ is
a probability space over the set of graphs on
the vertex set $\{1, \ldots, n\}$ determined by

$$\Pr[\{i, j\}] = p$$

with these events mutually independent.


The stochastic order relation between the two
models of random graphs is established in the
sense that if $\mathcal{A}$ is an increasing graph property,
then it holds that

$$\Pr[G_{n,\hat{p}} \in \mathcal{A}] \le \Pr[G_{n,m,p} \in \mathcal{A}],$$

where $\hat{p} = f(p)$. A graph property $\mathcal{A}$ is
increasing if and only if, given that $\mathcal{A}$ holds for
a graph $G(V, E)$, $\mathcal{A}$ holds for any $G(V, E')$
with $E' \supseteq E$.


Key Results 

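Theorem 1 Let $m = \lceil n^{\alpha} \rceil$, where $\alpha$ is a fixed
positive real, and let $C_1$, $C_2$ be sufficiently large
constants. If

$$p \ge C_1 \frac{\log n}{m} \quad \text{for } 0 < \alpha < 1 \qquad \text{or} \qquad p \ge C_2 \sqrt{\frac{\log n}{nm}} \quad \text{for } \alpha > 1,$$

then almost all $G_{n,m,p}$ are Hamiltonian. Our
bounds are asymptotically tight.


Note that the theorem above says nothing
when $m = n$, i.e., $\alpha = 1$.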


Applications 


The Erdős–Rényi model of random graphs, $G_{n,p}$,
is exhaustively studied in computer science
because it provides a framework for studying
practical problems such as “reliable network
computing” and because it provides a “typical
instance” of a graph and is thus used for
average-case analysis of graph algorithms. However,
the simplicity of $G_{n,p}$ means it is not able to
capture satisfactorily many practical problems
in computer science. Basically, this is because
in many problems independent
edge events are not well justified. For example,
consider a graph whose vertices represent a set
of objects that either are placed or move in a
specific geographical region, and whose edges are
radio communication links. In such a graph,
we expect that any two vertices $u$, $w$ are more
likely to be adjacent to each other than an
arbitrary pair of vertices if both are
adjacent to a third vertex $v$. Even epidemiological
phenomena (like the spread of disease) tend to
be more accurately captured by this
proximity-sensitive random intersection graph model. Other
applications may include oblivious resource
sharing in a distributed setting, interaction of
mobile agents traversing the web, etc.

The model of random intersection graphs
$G_{n,m,p}$ was first introduced by M. Karoński,
E. Scheinerman, and K. Singer-Cohen in [4],
where they explored the evolution of random
intersection graphs by studying the thresholds
for the appearance and disappearance of
small induced subgraphs. Also, J.A. Fill, E.R.
Scheinerman, and K. Singer-Cohen in [3] proved
an equivalence theorem relating the evolution of
$G_{n,m,p}$ and $G_{n,p}$; in particular, they proved that
when $m = n^{\alpha}$ where $\alpha > 6$, the total variation
distance between the graph random variables
has limit 0. S. Nikoletseas, C. Raptopoulos, and
P. Spirakis in [8] studied the existence and the
efficient algorithmic construction of close to
optimal independent sets in random intersection
graphs. D. Stark in [11] studied the degree of
the vertices of the random intersection graphs.
After [2], Spirakis and Raptopoulos,
in [10], provided algorithms that construct
Hamilton cycles in instances of $G_{n,m,p}$, for
$p$ above the Hamiltonicity threshold. Finally,
Nikoletseas et al. in [7] study the mixing time and
cover time as the parameter $p$ of the model varies.


Open Problems


As in many other random structures, e.g., $G_{n,p}$
and random formulae, properties of random
intersection graphs also appear to have threshold
behavior. So far threshold behavior has been
studied for the induced subgraph appearance and
Hamiltonicity.

Other fields of research for random intersection
graphs may include the study of the connectivity
behavior of the model, i.e., the path formation and
the formation of giant components. Additionally,
a very interesting research question is how cover
and mixing times vary with the parameter $p$ of
the model.


Problem Definition


In many diploid organisms like humans,
chromosomes come in pairs. Genetic variation
occurs in some “positions” along the chromosomes.
These genetic variations are commonly modelled
in the form of single nucleotide polymorphisms
(SNPs) [5], which are the nucleotide sites where
more than one nucleotide can occur. A
haplotype is the sequence of linked SNP genetic
markers (small segments of DNA) on a single
chromosome. However, experiments often yield
genotypes, which are a blend of the two haplotypes
of the chromosome pair. It is more useful to
have information on the haplotypes, thus giving
rise to the computational problem of inferring
haplotypes from genotypes.

The physical position of a marker on a
chromosome is called a locus and its state is called
an allele. SNPs are often biallelic, i.e., the allele
can take on two different states, corresponding
to two different nucleotides. In the language of
computer science, the allele of a biallelic SNP
can be denoted by 0 and 1, and a haplotype
with $m$ loci is represented as a length-$m$ string
in $\{0,1\}^m$ and a genotype as a length-$m$ string
in $\{0, 1, 2\}^m$. Consider a haplotype pair $(h_1, h_2)$
and a corresponding genotype $g$. For each locus,
if both haplotypes show a 0, then the genotype
must also be 0, and if both haplotypes
show a 1, the genotype must also be 1. These
loci are called homozygous. If however one of
the haplotypes shows a 0 and the other a 1,
the genotype shows a 2 and the locus is called
heterozygous. This is called SNP consistency.
For example, considering a single individual, the
genotype $g = 012212$ has four SNP-consistent
haplotype pairs: {(011111, 010010), (011110,
010011), (011011, 010110), (011010, 010111)}.
In general, if a genotype has $s$ heterozygous
loci, it can have $2^{s-1}$ SNP-consistent haplotype
solutions.
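
The SNP-consistency rule can be illustrated with a short enumeration. This sketch is not part of any of the cited algorithms; the function name `consistent_pairs` and the string encoding of genotypes are assumptions made here.

```python
from itertools import product


def consistent_pairs(genotype):
    """Enumerate all unordered SNP-consistent haplotype pairs.

    genotype: string over '0', '1', '2' ('2' marks a heterozygous locus).
    Returns a set of frozensets, each holding the two haplotypes.
    """
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b                          # one haplotype gets b ...
            h2[i] = "1" if b == "0" else "0"   # ... the other its complement
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


pairs = consistent_pairs("012212")
print(len(pairs))  # 4
```

Each unordered pair is generated twice (once per ordering) and deduplicated by the set, which is why $s$ heterozygous loci give $2^{s-1}$ rather than $2^s$ solutions.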

Haplotypes are passed down from an individual
to its descendants. Mendelian consistency
requires that, in the absence of recombinations
or mutations, each child inherits one haplotype
from one of the two haplotypes of the father
and inherits the other haplotype from the mother
similarly. This gives us more information to
infer haplotypes when we are given a pedigree.
The computational problem is therefore, given
a pedigree with $n$ individuals where each
individual is associated with a genotype of length
$m$, find an assignment of a pair of haplotypes
to each individual such that SNP consistency
and Mendelian consistency are obeyed for each
individual. In rare cases (especially for humans)
[3], the pedigree may contain mating loops: a
mating loop is formed when, for example, there
is a marriage between descendants of a common
ancestor.

As a simple example, consider the pedigree in
Fig. 1a for a family of four individuals and their
genotypes. Due to SNP consistency, mother M’s
haplotypes must be (0000, 1000) (the order does
not matter). Similarly, daughter D’s haplotypes
must be (1000, 1100). Now we apply Mendelian
consistency to deduce that D must obtain the
1000 haplotype from M since neither of father
F’s haplotypes can be 1000 (considering locus
2). Therefore, D obtains 1100 from F, and F’s
haplotypes must be (0101, 1100). With F’s and
M’s haplotypes known, the only solution for the
haplotypes of son S that is consistent with his
genotype 2202 is (0101, 1000).

Haplotype Inference on Pedigrees Without Recombinations,
Fig. 1 (a) Example of a pedigree with four
nodes. (b) The graph G with 12 vertices, 6 red edges,
and 4 brown edges. Each vector is a vertex in G. Vector
pairs enclosed by rounded rectangles belong to the same
individual


Key Results


While this kind of deduction might appear to
be enough to resolve all haplotype values, it
is not the case. As we will shortly see, there
are “long-distance” constraints that need to be
considered. These constraints can be represented
by a system of linear equations in GF(2) and
solved using Gaussian elimination. This gives an
$O(m^3 n^3)$ time algorithm [3]. Subsequent papers
try to capture or solve the constraints more
economically. The time complexity was improved
in [6] to $O(mn^2 + n^3 \log^2 n \log\log n)$ by
eliminating redundant equations and using low-stretch
spanning trees. A different approach was used


in [1], representing the constraints by the parity 
of edge labels of some auxiliary graphs and 
finding solutions of these constraints using graph 
traversal without (directly) solving a system of 
linear equations. This gives a linear O(mn) time 
algorithm, although it only works for the case 
with no mating loops and only produces one 
particular solution even when the pedigree admits 
more than one solution. Later algorithms include 
[4] which returns the full set of solutions in 
optimal time (again without mating loops) and 
[2] which can handle mating loops and runs in
$O(kmn + k^2 m)$ time where $k$ is the number of
mating loops.

In the following we sketch the idea behind 
the linear time algorithm in [1]. Each individual 
only has a pair of haplotypes, but the algorithm 
first produces a number of vector pairs for each 
individual, one vector pair for each trio (a father- 
mother-child triplet) that this individual belongs 
to. Each vector pair represents the information 
about the two haplotypes of this individual that 
can be derived by considering this trio only. 
These vector pairs will eventually be “unified” to 
become a single pair. 

For the pedigree in Fig. la, the algorithm 
first produces the graph G in Fig. 1b, which has 
two connected components for the two trios F- 
M-S and F-M-D. The rule for enforcing SNP 
consistency (Mendelian consistency) is that the 
unresolved loci values, i.e., the ? values, must 
be different (same) at opposite ends of a red 
(brown) edge. There is only one way to unify the 
vector pairs of F consistently (due to locus 4): 
2101 must correspond to 0101. We add an edge 
between these two vectors to represent the fact 
that they should be identical. Then all ? values 
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Haplotype Inference on Pedigrees Without Recombi- 
nations, Fig. 2, An example showing how constraints are 
represented by labeled edges in another graph. (a) The 


can be resolved by traversing the now-connected 
graph and applying the aforementioned rules for 
enforcing consistency. 

However, consider another pedigree in Fig. 2a. 
The previous steps can only produce Fig. 2b, 
which has four connected components and un- 
resolved loci. We need to decide for A and B 
whether A = 00? should connect to B =??0 
or its complement ??1 and similarly for B’ and 
C, etc. Observe that a path between A and C 
must go through an odd number of red edges 
since locus 1 changes from 0 to 1. To capture this 
type of long-distance constraints, we construct a 
parity constraint graph J where the edge labels 
represent the parity constraints; see Fig. 2c. In 
effect, J represents a set of linear equations in 
GF(2); in Fig. 2c, the equations are x4g-+XB’C = 
l,xpctxc’p = 0,and x4g+xB/c+Xxc’p = 0. 

Finally, we can traverse J along the unique 
path between any two nodes; the parity of this 
path tells us how to merge the vector pairs in G. 
For example, the parity between A and B should 
be 0, indicating 00? in A should connect to ??0 in 
B (so both become 000), while the parity between 
B’ and C is 1, so B’ and C should be 000 and 
111, respectively. 


pedigree. (b) The local graph G. (c) The parity constraint 
graph J. Three constraints are added 
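
For comparison with the graph-traversal approach, the three parity constraints of the Fig. 2 example can also be solved directly by Gaussian elimination over GF(2), which is the general technique used by the earlier algorithms mentioned above. The sketch below is an illustration only: the function name `solve_gf2`, the bitmask encoding, and the assignment of bit 0 to $x_{AB}$, bit 1 to $x_{B'C}$, and bit 2 to $x_{C'D}$ are assumptions made here, not the procedure of [1].

```python
def solve_gf2(equations, nvars):
    """Gaussian elimination over GF(2).

    equations: list of (mask, rhs); bit i of mask is the coefficient
    of variable i, rhs is 0 or 1.
    Returns one solution as a list of bits (free variables set to 0),
    or None if the system is inconsistent.
    """
    pivots = {}  # pivot column -> (mask, rhs), kept in reduced form
    for mask, rhs in equations:
        # reduce the new equation by the existing pivot rows
        for col, (pmask, prhs) in pivots.items():
            if (mask >> col) & 1:
                mask ^= pmask
                rhs ^= prhs
        if mask == 0:
            if rhs:
                return None  # reduced to 0 = 1
            continue
        col = mask.bit_length() - 1
        # clear the new pivot column from the earlier pivot rows
        for c, (pmask, prhs) in list(pivots.items()):
            if (pmask >> col) & 1:
                pivots[c] = (pmask ^ mask, prhs ^ rhs)
        pivots[col] = (mask, rhs)
    x = [0] * nvars
    for col, (_, rhs) in pivots.items():
        x[col] = rhs
    return x


# x_AB + x_B'C = 1,  x_B'C + x_C'D = 0,  x_AB + x_B'C + x_C'D = 0
print(solve_gf2([(0b011, 1), (0b110, 0), (0b111, 0)], 3))  # [0, 1, 1]
```

The result $x_{AB} = 0$, $x_{B'C} = 1$ agrees with the parities stated in the text for the pairs (A, B) and (B′, C).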
Problem Definition


The work of Pitt and Valiant [18] deals with
learning Boolean functions in the Probably
Approximately Correct (PAC) learning model
introduced by Valiant [19]. A learning algorithm in
Valiant’s original model is given random examples
of a function $f : \{0,1\}^n \to \{0,1\}$ from a
representation class $\mathcal{F}$ and produces a hypothesis
$h \in \mathcal{F}$ that closely approximates $f$. Here, a
representation class is a set of functions and a
language for describing the functions in the set.
The authors give examples of natural representation
classes that are NP-hard to learn in this
model, whereas they can be learned if the learning
algorithm is allowed to produce hypotheses
from a richer representation class $\mathcal{H}$. Such an
algorithm is said to learn $\mathcal{F}$ by $\mathcal{H}$; learning $\mathcal{F}$ by
$\mathcal{F}$ is called proper learning.

The results of Pitt and Valiant were the first
to demonstrate that the choice of representation
of hypotheses can have a dramatic impact on the
computational complexity of a learning problem.
Their specific reductions from NP-hard problems
are the basis of several other follow-up works on
the hardness of proper learning [1, 3, 7].


Notation 

Learning in the PAC model is based on the as- 
sumption that the unknown function (or concept) 
belongs to a certain class of concepts C. In order 
to discuss algorithms that learn and output func- 
tions, one needs to define how these functions 
are represented. Informally, a representation for 
a concept class C is a way to describe concepts 
from C that defines a procedure to evaluate a con- 
cept in C on any input. For example, one can rep- 
resent a conjunction of input variables by listing 
the variables in the conjunction. More formally, a 
representation class can be defined as follows. 


Definition 1 A representation class F is a pair 
(L,R) where 


• L is a language over some fixed finite alphabet
(e.g., {0, 1});

• R is an algorithm that, for σ ∈ L, on input
(σ, 1^n), returns a Boolean circuit over {0, 1}^n.


In the context of efficient learning, only
efficient representations are considered, that is,
representations for which R is a polynomial-time
algorithm. The concept class represented by F is
the set of functions over {0,1}^n defined by the
circuits in {R(σ, 1^n) | σ ∈ L}. For a Boolean
function f, “f ∈ F” means that f belongs to
the concept class represented by F and that there
is a σ ∈ L whose associated Boolean circuit
computes f. For most of the representations discussed
in the context of learning, it is straightforward
to construct a language L and the corresponding
translating function R, and therefore, they are not
specified explicitly.

Associated with each representation is the 
complexity of describing a Boolean function 
using this representation. More formally, for a 
Boolean function f ∈ C, F-size(f) is the
length of the shortest way to represent f using
F, or min{|σ| : σ ∈ L, R(σ, 1^n) = f}.

We consider Valiant’s PAC model of learning 
[19], as generalized by Pitt and Valiant [18]. 
In this model, for a function f and a distribution
D over X, an example oracle EX(f, D)
is an oracle that, when invoked, returns an
example (x, f(x)), where x is chosen randomly
with respect to D, independently of any previous
examples. For ε ≥ 0, we say that function
g ε-approximates a function f with respect to
distribution D if Pr_D[f(x) ≠ g(x)] ≤ ε.
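
As a rough illustration of the example oracle and of ε-approximation, the following Python sketch simulates EX(f, D) and estimates Pr_D[f(x) ≠ g(x)] by sampling. The helper names (make_example_oracle, empirical_error, uniform_bits), the toy target f, and the choice of the uniform distribution are assumptions made for illustration only; they do not come from [18] or [19].

```python
import random

def make_example_oracle(f, sample_x, rng):
    """Return a zero-argument function playing the role of EX(f, D).

    sample_x(rng) is a hypothetical helper that draws one point x from D.
    """
    def oracle():
        x = sample_x(rng)
        return x, f(x)
    return oracle

def empirical_error(g, oracle, num_samples):
    """Estimate Pr_D[f(x) != g(x)] from labeled examples (x, f(x))."""
    mistakes = 0
    for _ in range(num_samples):
        x, label = oracle()
        if g(x) != label:
            mistakes += 1
    return mistakes / num_samples

def uniform_bits(n):
    """Sampler for the uniform distribution over {0,1}^n."""
    return lambda rng: tuple(rng.randint(0, 1) for _ in range(n))

# Toy target f = x0 AND x1; the hypothesis g = x0 errs exactly
# when x0 = 1 and x1 = 0.
f = lambda x: x[0] & x[1]
g = lambda x: x[0]
oracle = make_example_oracle(f, uniform_bits(4), random.Random(0))
estimate = empirical_error(g, oracle, 20000)
```

Under the uniform distribution the true error of g here is 1/4, so g ε-approximates f exactly when ε ≥ 1/4; the sampled estimate should land close to that value.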


Definition 2 A representation class F is PAC
learnable by representation class H if there exists
an algorithm that, for every ε > 0, δ > 0, n,
f ∈ F, and distribution D over X, given ε, δ,
and access to EX(f, D), runs in time polynomial
in n, s = F-size(f), 1/ε, and 1/δ, and outputs,
with probability at least 1 − δ, a hypothesis h ∈ H
that ε-approximates f.


A DNF expression is defined as an OR of 
ANDs of literals, where a literal is a possibly 
negated input variable. We refer to the ANDs of a 
DNF formula as its terms. Let DNF(k) denote the 
representation class of k-term DNF expressions. 
Similarly, a CNF expression is an AND of ORs
of literals. Let k-CNF denote the representation
class of CNF expressions with each OR having
at most k literals.
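
The two representations are related by distributivity: picking one literal from each term of a k-term DNF gives a clause with at most k literals, and the AND of all such clauses computes the same function. The Python sketch below illustrates this standard rewriting. The encoding of a literal as a pair (index, negated) and all function names are assumptions chosen for the example.

```python
from itertools import product

# A literal is (variable_index, negated); a term or clause is a tuple of literals.

def eval_literal(literal, x):
    index, negated = literal
    return (not x[index]) if negated else bool(x[index])

def eval_dnf(terms, x):
    """OR of ANDs of literals."""
    return any(all(eval_literal(l, x) for l in term) for term in terms)

def eval_cnf(clauses, x):
    """AND of ORs of literals."""
    return all(any(eval_literal(l, x) for l in clause) for clause in clauses)

def dnf_to_cnf(terms):
    """Distribute OR over AND: one clause per choice of one literal from each term."""
    return [tuple(choice) for choice in product(*terms)]

def equivalent_on_all_inputs(terms, clauses, n):
    """Brute-force check over {0,1}^n (only sensible for small n)."""
    return all(eval_dnf(terms, x) == eval_cnf(clauses, x)
               for x in product((0, 1), repeat=n))

# (x0 AND x1) OR (NOT x2 AND x3), a 2-term DNF over four variables.
terms = [((0, False), (1, False)), ((2, True), (3, False))]
clauses = dnf_to_cnf(terms)
```

For constant k the number of clauses is polynomial in n, which is why the rewriting keeps the representation efficient.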

For a real-valued vector c ∈ R^n and θ ∈ R, a
linear threshold function (also called a halfspace)
T_{c,θ}(x) is the function that equals 1 if and only
if Σ_{i≤n} c_i x_i ≥ θ. The representation class of
Boolean threshold functions consists of all linear
threshold functions with c ∈ {0,1}^n and θ an
integer.
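
A minimal Python sketch of a linear threshold function, assuming inputs are given as tuples of numbers; the function name threshold and the majority example are illustrative choices, not part of the source.

```python
def threshold(c, theta):
    """Return T_{c,theta}: x -> 1 iff sum_i c[i] * x[i] >= theta."""
    def t(x):
        return 1 if sum(ci * xi for ci, xi in zip(c, x)) >= theta else 0
    return t

# A Boolean threshold function has c in {0,1}^n and an integer theta.
# With c = (1, 1, 1) and theta = 2 it computes the majority of three bits.
majority3 = threshold((1, 1, 1), 2)
```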


Key Results 


Theorem 1 ([18]) For every k ≥ 2, the
representation class of DNF(k) is not properly
learnable unless RP = NP.


More specifically, Pitt and Valiant show that
learning DNF(k) by DNF(ℓ) is at least as hard as
coloring a k-colorable graph using ℓ colors. For
the case k = 2, they obtain the result by reducing
from Set Splitting (see [9] for details on the
problems). Theorem 1 is in sharp contrast with
the fact that DNF(k) is learnable by k-CNF [19].

Theorem 2 ([18]) The representation class of
Boolean threshold functions is not properly
learnable unless RP = NP.


This result is obtained via a reduction from 
the NP-complete Zero-One Integer Programming 
problem (see [9], p. 245, for details on the
problem). This contrasts with the fact that
general linear thresholds are properly learnable [4].

These results show that using a specific repre- 
sentation of hypotheses forces the learning algo- 
rithm to solve a combinatorial problem that can 
be NP-hard. In most machine learning applica- 
tions it is not important which representation of 
hypotheses is used as long as the value of the 
unknown function is predicted correctly. There- 
fore, learning in the PAC model is now defined 
without any restrictions on the output hypothesis 
(other than it being efficiently evaluatable). Hard- 
ness results in this setting are usually based on 
cryptographic assumptions (cf. [15]). 

Hardness results for proper learning based on
the assumption NP ≠ RP are now known for several
other representation classes and for other variants
and extensions of the PAC learning model.
Blum and Rivest show that for any k ≥ 3,
unions of k halfspaces are not properly
learnable [3]. Hancock et al. prove that decision trees
(cf. [16] for the definition of this representation) 
are not learnable by decision trees of somewhat 
larger size [11]. This result was strengthened 
by Alekhnovich et al. who also proved that in- 
tersections of two halfspaces are not learnable by 
intersections of k halfspaces for any constant 
k, general DNF expressions are not learnable 
by unions of halfspaces (and in particular are not 
properly learnable) and k-juntas are not properly 
learnable [1]. Further, DNF expressions remain 
NP-hard to learn properly even if membership 
queries, or the ability to query the unknown func- 
tion at any point, are allowed [7]. Khot and Saket 
show that the problem of learning intersections 
of two halfspaces remains NP-hard even if a 
hypothesis with any constant error smaller than 
1/2 is required [17]. No efficient algorithms or 
hardness results are known for any of the above 
learning problems if no restriction is placed on 
the representation of hypotheses. 


Hardness of Proper Learning 


The choice of representation is important even 
in powerful learning models. Feldman proved 
that n^c-term DNF are not properly learnable for
any constant c even when the distribution of 
examples is assumed to be uniform and member- 
ship queries are available [7]. This contrasts with 
Jackson’s celebrated algorithm for learning DNF 
in this setting [13], which is not proper. 

In the agnostic learning model of Haussler 
[12] and Kearns et al. [14], even the representa- 
tion classes of conjunctions, decision lists, halfs- 
paces, and parity functions are NP-hard to learn 
properly (cf. [2, 6, 8, 10] and references therein).
Here again the status of these problems in the 
representation-independent setting is largely un- 
known. 


Applications 


A large number of practical algorithms use repre- 
sentations for which hardness results are known 
(most notably decision trees, halfspaces, and neu- 
ral networks). Hardness of learning F by H 
implies that an algorithm that uses H to represent
its hypotheses will not be able to learn F in 
the PAC sense. Therefore such hardness results 
elucidate the limitations of algorithms used in 
practice. In particular, the reduction from an NP- 
hard problem used to prove the hardness of learn- 
ing F by H can be used to generate hard instances 
of the learning problem. 


Open Problems 

A number of problems related to proper 
learning in the PAC model and its extensions 
are open. Almost all hardness of proper 
learning results are for learning with respect 
to unrestricted distributions. For most of the 
problems mentioned in section “Key Results” 
it is unknown whether the result is true if the 
distribution is restricted to belong to some 
natural class of distributions (e.g., product 
distributions). It is unknown whether decision 
trees are learnable properly in the PAC model 
or in the PAC model with membership queries.
This question is open even in the PAC model
restricted to the uniform distribution only. Note
that decision trees are learnable (non-properly)
if membership queries are available [5] and are
learnable properly in time O(n^{log s}), where s is
the number of leaves in the decision tree [1].

An even more interesting direction of research 
would be to obtain hardness results for learning 
by richer representation classes, such as AC^0
circuits, classes of neural networks and, ultimately,
unrestricted circuits. 
Problem Definition


One of the goals of the design of the harmonic 
algorithm (or class of algorithms) was to provide 
an online algorithm for the classic bin pack- 
ing problem that performs well with respect to 
the asymptotic competitive ratio, which is the
standard measure for online algorithms for bin 
packing type problems. The competitive ratio for 
a given input is the ratio between the costs of 
the algorithm and of an optimal off-line solution. 
The asymptotic competitive ratio is the worst- 
case competitive ratio of inputs for which the 
optimal cost is sufficiently large. In the online 
(standard) bin packing problem, items of rational 
sizes in (0,1] are presented one by one. The 
algorithm must pack each item into a bin before 
the following item is presented. The total size of 
items packed into a bin cannot exceed 1, and the 
goal is to use the minimum number of bins, where 
a bin is used if at least one item was packed into 
it. All items must be packed, and the supply of 
bins is unlimited. 

When an algorithm acts on an input, it can 
decide to close some of its bins and never use 
them again. A bin is called closed in such a case, 
while otherwise a used bin (which already has at 
least one item) is called open. The motivation for 
closing bins is to obtain fast running times per 
item (so that the algorithm will pack it into a 
bin selected out of a small number of options). 
Simple algorithms such as First Fit (FF), Best 
Fit (BF), and Worst Fit (WF) have worst-case 
running times of O(log N) per item, where N 
is the number of items at the time of assignment 
of the new item. On the other hand, the simple 
algorithm Next Fit (NF), which keeps at most one
open bin and closes it when a new item cannot 
be packed there (before it uses a new bin for the 
new item), has a worst-case running time of O(1) 
per item. Algorithms that keep a constant number 
of open bins are called bounded space. In many 
practical applications, this property is desirable, 
since the number of candidate bins for a new item 
is small and it does not increase with the input 
size. 
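
The behavior of Next Fit described above can be sketched in a few lines of Python. The function name next_fit and the use of exact fractions for item sizes are assumptions made for this illustration.

```python
from fractions import Fraction

def next_fit(items):
    """Pack items (sizes in (0, 1]) with Next Fit; return the list of bins.

    Only one bin is open at any time: when the next item does not fit,
    the open bin is closed for good and a new bin is opened.
    """
    bins = []
    current, load = [], Fraction(0)
    for size in items:
        if current and load + size > 1:
            bins.append(current)              # close the open bin
            current, load = [], Fraction(0)
        current.append(size)
        load += size
    if current:
        bins.append(current)                  # the bin still open at the end
    return bins

items = [Fraction(1, 2), Fraction(2, 3), Fraction(1, 3),
         Fraction(1, 2), Fraction(1, 2)]
packing = next_fit(items)
```

Each item is handled in constant time because the only candidate is the single open bin.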

Algorithm HARM_k (for an integer k ≥ 3)
was defined by Lee and Lee [7]. The fundamental
and natural idea of “harmonic-based” algorithms
is to classify each item by size first (for online
algorithms, the classification of an item must be
done immediately upon arrival) and then pack it
according to its class (instead of letting the exact
size influence packing decisions). For the
classification of items, HARM_k splits the interval (0, 1]
into subintervals. There are k − 1 subintervals
of the form (1/(i+1), 1/i] for i = 1, ..., k − 1 and
one final subinterval (0, 1/k]. Each bin will contain
only items from one subinterval (type). Every
type is packed independently into its own bins
using NF. Thus, there are at most k − 1 open
bins at each time (since for items of sizes above
1/2, two items cannot share a bin, and any bin can
be closed once it receives an item). Moreover, for
i < k, as the items of type i have sizes no larger
than 1/i but larger than 1/(i+1), every closed bin of
this type will have exactly i items. For type k, a
closed bin will contain at least k items, but it may
contain many more items. This defines a class of
algorithms (containing one algorithm for any k ≥
3). The term the harmonic algorithm (or simply
HARM) refers to HARM_k for a sufficiently large
value of k, and its asymptotic competitive ratio
is the infimum value that can be achieved as the
asymptotic competitive ratio of any algorithm of
this class.
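
A minimal Python sketch of the classification and packing rule, under the assumption that item sizes are exact fractions in (0, 1]. The names item_type and harmonic are hypothetical. For types i < k, a bin is closed as soon as it holds i items, which gives the same packing as running NF on that type.

```python
from fractions import Fraction
from math import floor

def item_type(size, k):
    """Type i < k if size is in (1/(i+1), 1/i]; type k if size <= 1/k."""
    return min(floor(1 / size), k)

def harmonic(items, k):
    """Pack items with a HARM_k-style rule; return a list of (type, bin contents)."""
    open_bins = {}          # type -> contents of the single open bin of that type
    closed = []
    for size in items:
        t = item_type(size, k)
        contents = open_bins.get(t, [])
        if t == k and contents and sum(contents) + size > 1:
            closed.append((t, contents))      # NF step for the smallest items
            contents = []
        contents.append(size)
        if t < k and len(contents) == t:
            closed.append((t, contents))      # a type-i bin holds exactly i items
            contents = []
        open_bins[t] = contents
    leftovers = [(t, c) for t, c in sorted(open_bins.items()) if c]
    return closed + leftovers

items = [Fraction(3, 5), Fraction(2, 5), Fraction(2, 5), Fraction(1, 5),
         Fraction(1, 10), Fraction(3, 10), Fraction(3, 10), Fraction(3, 10)]
bins = harmonic(items, 4)
```

Because packing decisions depend only on the type, the exact size of an item matters only for the bins of type k.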


Key Results 


It was shown in paper [7] that for k tending to 
infinity, the asymptotic ratio of HARM is the sum
of a series denoted by Π_∞ (see below), and it is
equal to approximately 1.69103. Moreover, this is 
the best possible asymptotic competitive ratio of 
any online bounded space algorithm for standard 
bin packing. 

The crucial item sizes are of the form 1/ℓ + ε,
where ε > 0 is small and ℓ is an integer. These
are items of type ℓ − 1, and bins consisting of
such items contain ℓ − 1 items (except for the last
bin used for this type that may contain a smaller
number of items). However, a bin (of an off-line
solution) that already contains an item of size
1/2 + ε_1 and an item of size 1/3 + ε_2 (for some small
ε_1, ε_2 > 0) cannot also contain an item whose
size is slightly above 1/6. The largest item of this
form would be slightly larger than 1/7. Thus, the
following sequence was defined [7]. Let π_1 = 1
and, for j > 1, π_j = π_{j−1}(π_{j−1} + 1) (note that
π_{j'} is divisible by any π_j for j < j'). It turns out
that the crucial item sizes are just above 1/(π_j + 1).

The series Σ_{j≥1} 1/π_j gives the asymptotic
competitive ratio of HARM, Π_∞. For a long time the
best lower bound on the asymptotic competitive 
ratio of (unbounded space) online algorithms was 
the one by van Vliet [8, 13], proved using this 
sequence (but the current best lower bound was 
proved using another set of inputs [1]). 
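
The sequence and the value of the series are easy to check numerically. The Python sketch below (function name pi_sequence is an assumption) computes π_1, π_2, ... from the recurrence π_1 = 1, π_j = π_{j−1}(π_{j−1} + 1), the crucial sizes 1/(π_j + 1), and a partial sum of Σ 1/π_j.

```python
from fractions import Fraction

def pi_sequence(count):
    """First `count` terms of pi_1 = 1, pi_j = pi_{j-1} * (pi_{j-1} + 1)."""
    seq = [1]
    while len(seq) < count:
        seq.append(seq[-1] * (seq[-1] + 1))
    return seq

pis = pi_sequence(6)
crucial_sizes = [Fraction(1, p + 1) for p in pis]   # items are just above these
partial_sum = float(sum(Fraction(1, p) for p in pis))
```

The terms grow doubly exponentially, so a handful of them already determines the sum to many digits.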

In order to prove the upper bound Π_∞ on the
competitive ratio, weights were used [12]. In this
case weights are defined (for a specific value of k)
quite easily such that all bins (except for the bins
that remain open when the algorithm terminates)
have total weights of at least 1. The weight of an
item of type i < k is 1/i. The bins of type k are
almost full for sufficiently large values of k (a bin
can be closed only if the total size of its items
exceeds 1 − 1/k). Assigning such an item a weight
that is k/(k − 1) times its size will allow one to show
that all bins except for a constant number of bins
(at most k − 1 bins) have total weights of at least 1.
It is possible to show that the total weight of any
packed bin is sufficiently close to Π_∞, for large
values of k. As both HARM_k and an optimal
solution pack the same items, the competitive
ratio is implied. To show the upper bound on the
total weight of any packed bin, it is required to
show that the worst-case bin contains exactly one
item of size just above 1/(π_j + 1) for π_j ≤ k − 1
(and the remaining space can only contain items 
of type k). Roughly speaking, this holds as once 
it was proved that the bin contains the largest 
such items, the largest possible additional weight 
can be obtained only by adding the next such 
item. 

Proving that no better bounded space
algorithms exist can be done as follows. Let j' be
a fixed integer. Let N be a large integer and
consider a sequence containing N items of each
size 1/(π_j + 1) + δ for a sufficiently small δ > 0,
for any j = j', j' − 1, ..., 1. If δ is chosen
appropriately, we have Σ_{j=1}^{j'} 1/(π_j + 1) + j'δ ≤ 1,
so the items can be packed (off-line) into N
bins. However, if items are presented in this order
(sorted by nondecreasing size), after all items of
one size have been presented, only a constant
number of bins can receive larger items, and
thus the items of each size are packed almost
independently.
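
The following Python sketch builds one item of each size in this construction and checks the two facts the argument relies on: one item of every size fits together in a single bin, and a bin holding only items of the j-th size holds at most π_j of them. The function names and the concrete values j' = 4 and δ = 1/10000 are assumptions picked for the example.

```python
from fractions import Fraction
from math import floor

def pi_sequence(count):
    """First `count` terms of pi_1 = 1, pi_j = pi_{j-1} * (pi_{j-1} + 1)."""
    seq = [1]
    while len(seq) < count:
        seq.append(seq[-1] * (seq[-1] + 1))
    return seq

def lower_bound_sizes(j_max, delta):
    """Sizes 1/(pi_j + 1) + delta for j = j_max, ..., 1 (nondecreasing order)."""
    pis = pi_sequence(j_max)
    return [Fraction(1, pis[j] + 1) + delta for j in reversed(range(j_max))]

j_max = 4
delta = Fraction(1, 10000)
sizes = lower_bound_sizes(j_max, delta)

# Off-line: one item of each size shares a bin, so N bins suffice.
fits_in_one_bin = sum(sizes) <= 1

# A bin containing only items of one size holds floor(1/size) of them.
items_per_bin = [floor(1 / s) for s in sizes]
```

If each size is packed almost independently, the N items of the j-th size need about N/π_j bins, so the total approaches N · Σ_{j ≤ j'} 1/π_j while the off-line cost is N.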


Related Results 


The space of a bounded space algorithm is the 
number of open bins that it can have. The space of 
NF is 1, while the space of harmonic algorithms 
increases with k. A bounded space algorithm 
with space 2 and the same asymptotic competitive 
ratio as FF and BF has been designed [3] (for
comparison, HARM_3 has an asymptotic competitive
ratio of 7/4). A modification where smaller
space is used to obtain the same competitive
ratios as harmonic algorithms (or alternatively,
smaller competitive ratios were obtained using 
the same space) was designed by Woeginger [15]. 
Thus, there exists another sequence of bounded 
space algorithms, with an increasing sequence of 
open bins, where their sequence of competitive 
ratios tends to Π_∞, such that the space required
for every competitive ratio is much smaller than 
that of [7]. 

One drawback of the model above is that an 
off-line algorithm can rearrange the items and 
does not have to process them as a sequence. The 
variant where it must process them in the same 
order as an online algorithm was studied as well 
[2]. Algorithms that are based on partitioning into 
classes and have smaller asymptotic competitive 
ratios (but they are obviously not bounded space) 
were designed [7,9, 11]. 

Generalizations have been studied too, in particular,
bounded space bin packing with cardinality
constraints (where a bin cannot receive
more than t items for a fixed integer t ≥ 2)
[5], parametric bin packing (where there is an
upper bound strictly smaller than 1 on item sizes)
[14], bin packing with rejection (where an item i
has a rejection penalty r_i associated with it, and
it can be either packed, or rejected for the cost
r_i) [6], variable-sized bin packing (where bins of
multiple sizes are available for packing) [10], and 
bin packing with resource augmentation (where 
the online algorithm can use bins of size b > 1 
for a fixed rational number b, while an off-line 
algorithm still uses bins of size 1) [4]. In this last
variant, the sequences of critical item sizes were 
redefined as a function of b, while variable-sized 
bin packing required a more careful partition into 
intervals. 
Problem Definition


The general idea of hierarchical self-assembly 
(a.k.a., multiple tile [2], polyomino [8, 10],
two-handed [3, 5, 6]) is to model self-assembly of
tiles in which attachment of two multi-tile assem- 
blies is allowed, as opposed to all attachments 
being that of a single tile onto a larger assembly. 
Several problems concern comparing hierarchical 
self-assembly to its single-tile-attachment variant 
(called the “seeded” model of self-assembly), 
so we define both models here. The model of 
hierarchical self-assembly was first defined (in 
a slightly different form that restricted the size 
of assemblies that could attach) by Aggarwal, 
Cheng, Goldwasser, Kao, Moisset de Espanes, 
and Schweller [2]. Several generalizations of the 
model exist that incorporated staged mixing of 
test tubes, “dissolvable” tiles, active signaling 
across tiles, etc., but here we restrict attention 
to the model closest to the seeded model of 
Winfree [9], different from that model only in
the absence of a seed and the ability of two large 
assemblies to attach. 


Definitions 

A tile type is a unit square with four sides, each 
consisting of a glue label (often represented as a 
finite string) and a nonnegative integer strength. 
We assume a finite set T of tile types, but an 
infinite number of copies of each tile type, each 
copy referred to as a tile. An assembly is a 
positioning of tiles on the integer lattice Z^2, i.e.,
a partial function α : Z^2 ⇢ T. We write
|α| to denote |dom α|. Write α ⊑ β to denote
that α is a subassembly of β, which means that
dom α ⊆ dom β and α(p) = β(p) for all points
p ∈ dom α. We abuse notation and take a tile
type t to be equivalent to the single-tile assembly
containing only t (at the origin if not otherwise
specified). Two adjacent tiles in an assembly
interact if the glue labels on their abutting sides are
equal and have positive strength. Each assembly
induces a binding graph, a grid graph whose
vertices are tiles, with an edge between two tiles
if they interact. The assembly is τ-stable if every
cut of its binding graph has strength at least τ,
where the weight of an edge is the strength of
the glue it represents. That is, the assembly is
stable if at least energy τ is required to separate
the assembly into two parts.
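
For small assemblies the stability condition can be checked by brute force over all cuts of the binding graph. The Python sketch below does this under some modeling assumptions that are not fixed by the text: a glue is a pair (label, strength), two abutting glues interact only if the pairs are equal with positive strength, and an assembly is a dictionary from lattice points to tiles. All names are hypothetical.

```python
from itertools import combinations

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
OFFSETS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
NULL = ("", 0)  # glue with strength 0 never interacts

def make_tile(n=NULL, e=NULL, s=NULL, w=NULL):
    return {"N": n, "E": e, "S": s, "W": w}

def binding_graph(assembly):
    """Map each interacting pair of positions to the strength of its bond."""
    edges = {}
    for (x, y), tile in assembly.items():
        for side in ("N", "E"):               # visit each adjacency once
            dx, dy = OFFSETS[side]
            q = (x + dx, y + dy)
            if q in assembly:
                glue = tile[side]
                if glue == assembly[q][OPPOSITE[side]] and glue[1] > 0:
                    edges[((x, y), q)] = glue[1]
    return edges

def min_cut_strength(assembly):
    """Minimum total strength over all cuts (exponential time; small inputs only)."""
    positions = sorted(assembly)
    edges = binding_graph(assembly)
    first, rest = positions[0], positions[1:]
    best = None
    for r in range(len(rest)):                # proper subsets containing `first`
        for extra in combinations(rest, r):
            side = {first, *extra}
            weight = sum(w for (p, q), w in edges.items()
                         if (p in side) != (q in side))
            best = weight if best is None else min(best, weight)
    return best

def is_stable(assembly, tau):
    return len(assembly) <= 1 or min_cut_strength(assembly) >= tau

a = ("a", 1)
square = {
    (0, 0): make_tile(n=a, e=a),
    (1, 0): make_tile(n=a, w=a),
    (0, 1): make_tile(s=a, e=a),
    (1, 1): make_tile(s=a, w=a),
}
chain = {p: t for p, t in square.items() if p != (1, 1)}
```

In the 2 × 2 square every tile is held by two strength-1 bonds, so the weakest cut has strength 2; removing one corner leaves a chain whose end tiles are held by a single bond.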

We now define both the seeded and hierarchical
variants of the tile assembly model. A seeded
tile system is a triple 𝒯 = (T, σ, τ), where T is
a finite set of tile types, σ : Z^2 ⇢ T is a finite,
τ-stable seed assembly, and τ is the temperature.
If 𝒯 has a single seed tile s ∈ T (i.e., σ(0,0) = s
for some s ∈ T and is undefined elsewhere),
then we write 𝒯 = (T, s, τ). Let |𝒯| denote |T|.
An assembly α is producible if either α = σ
or if β is a producible assembly and α can be
obtained from β by the stable binding of a single
tile. In this case, write β →_1 α (α is producible
from β by the attachment of one tile), and write
β → α if β →_1^* α (α is producible from β by the
attachment of zero or more tiles). An assembly is
terminal if no tile can be τ-stably attached to it.
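
For a single tile joining a stable assembly, stable binding amounts to the tile's matching glues having total strength at least the temperature. The Python sketch below checks that condition. The encoding (glues as (label, strength) pairs, assemblies as dictionaries keyed by lattice points) and the function names are assumptions made for illustration.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
OFFSETS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
NULL = ("", 0)  # glue with strength 0 never interacts

def make_tile(n=NULL, e=NULL, s=NULL, w=NULL):
    return {"N": n, "E": e, "S": s, "W": w}

def binding_strength(assembly, pos, tile):
    """Total strength of the glues of `tile` that match its neighbors at `pos`."""
    total = 0
    for side, (dx, dy) in OFFSETS.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is None:
            continue
        glue = tile[side]
        if glue == neighbor[OPPOSITE[side]] and glue[1] > 0:
            total += glue[1]
    return total

def can_attach(assembly, pos, tile, tau):
    """True if `pos` is empty and the tile binds there with strength >= tau."""
    return pos not in assembly and binding_strength(assembly, pos, tile) >= tau

# Cooperative binding at temperature 2: two strength-1 glues are both needed.
assembly = {(1, 0): make_tile(n=("a", 1)), (0, 1): make_tile(e=("b", 1))}
new_tile = make_tile(s=("a", 1), w=("b", 1))
only_one_neighbor = {(1, 0): assembly[(1, 0)]}
```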

A hierarchical tile system is a pair 𝒯 = (T, τ),
where T is a finite set of tile types and τ ∈ N
is the temperature. An assembly is producible if
either it is a single tile from T or it is the τ-stable
result of translating two producible assemblies
without overlap. Therefore, if an assembly α is
producible, then it is produced via an assembly
tree, a full binary tree whose root is labeled with
α, whose |α| leaves are labeled with tile types,
and each internal node is a producible assembly
formed by the stable attachment of its two child
assemblies. An assembly α is terminal if for
every producible assembly β, α and β cannot be
τ-stably attached. If α can grow into β by the
attachment of zero or more assemblies, then we
write α → β.

In either model, let A[𝒯] be the set of producible
assemblies of 𝒯, and let A_□[𝒯] ⊆ A[𝒯]
be the set of producible, terminal assemblies of
𝒯. A TAS 𝒯 is directed (a.k.a., deterministic,
confluent) if |A_□[𝒯]| = 1. If 𝒯 is directed with
unique producible terminal assembly α, we say
that 𝒯 uniquely produces α. It is easy to check
that in the seeded aTAM, 𝒯 uniquely produces
α if and only if every producible assembly β
satisfies β ⊑ α. In the hierarchical model, a similar
condition holds, although it is more complex since
hierarchical assemblies, unlike seeded assemblies, do
not have a “canonical translation” defined by the
seed position. 𝒯 uniquely produces α if and only
if for every producible assembly β, there is a
translation β' of β such that β' ⊑ α. In particular,
if there is a producible assembly β ≠ α such
that dom α = dom β, then α is not uniquely
produced. Since dom β = dom α, every nonzero
translation of β has some tiled position outside
of dom α, whence no such translation can be a
subassembly of α, implying α is not uniquely
produced.


Power of Hierarchical Assembly Compared to Seeded

One sense in which we can conclude that one 
model of computation M is at least as powerful as 
another model of computation M’ is to show that 
any machine defined by M’ can be “simulated 
efficiently” by a machine defined by M. In self- 
assembly, there is a natural definition of what it 
means for one tile system 𝒮 to “simulate” another
𝒯. We now discuss intuitively how to define such
a notion. There are several intricacies to the full
formal definition that are discussed in further 
detail in [3,5]. 

First, we require that there is a constant k ∈
Z^+ (the “resolution loss”) such that each tile type
t in 𝒯 is “represented” by one or more k × k
blocks β of tiles in 𝒮. In this case, we write
r(β) = t, where β : {1, ..., k}^2 ⇢ S and S
is the tile set of 𝒮. Then β represents a k × k
block of such tiles, possibly with empty positions
at points x where β(x) is undefined. We call
such a k × k block in 𝒮 a “macrotile.” We can
extend r to a function R that, given an assembly
α_S partitioned into k × k macrotiles, outputs an
assembly α_T of 𝒯 such that, for each macrotile
β of α_S, r(β) = t, where t is the tile type at the
corresponding position in α_T.
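
The extension of r to R can be sketched as follows in Python. This is only an illustration under simplifying assumptions that the informal description does not fix: macrotiles are aligned to multiples of k, positions inside a block are indexed 0, ..., k − 1 rather than 1, ..., k, and r returns None for a block that represents no tile. The names macrotile_blocks, apply_representation, and toy_r are hypothetical.

```python
def macrotile_blocks(assembly_s, k):
    """Partition an assembly (dict: (x, y) -> tile) into k x k blocks."""
    blocks = {}
    for (x, y), tile in assembly_s.items():
        block_pos = (x // k, y // k)
        blocks.setdefault(block_pos, {})[(x % k, y % k)] = tile
    return blocks

def apply_representation(assembly_s, k, r):
    """Apply r to every macrotile; blocks mapped to None are left empty."""
    assembly_t = {}
    for block_pos, block in macrotile_blocks(assembly_s, k).items():
        t = r(block)
        if t is not None:
            assembly_t[block_pos] = t
    return assembly_t

def toy_r(block):
    """Toy rule: a fully tiled 2 x 2 block represents the tile type 'X'."""
    return "X" if len(block) == 4 else None

assembly_s = {(0, 0): "a", (1, 0): "a", (0, 1): "a", (1, 1): "a",  # full block
              (2, 0): "a"}                                        # partial block
assembly_t = apply_representation(assembly_s, 2, toy_r)
```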

Given such a representation function R
indicating how to interpret assemblies of 𝒮 as
representing assemblies of 𝒯, we now define what it
means to say that 𝒮 simulates 𝒯. For each
producible assembly α_T of 𝒯, there is a producible
assembly α_S of 𝒮 such that R(α_S) = α_T, and
furthermore, for every producible assembly α_S, if
R(α_S) = α_T, then α_T is producible in 𝒯. Finally,
we require that R respects the “single attachment”
dynamics of 𝒯: there is a single tile that
can be attached to α_T to result in α'_T if and only if
there is some sequence of attachments to α_S that
results in assembly α'_S such that R(α'_S) = α'_T.

With such an idea in mind, we can ask, “Is 
the hierarchical model at least as powerful as the 
seeded model?” 


Problem 1 For every seeded tile system 𝒯, design
a hierarchical tile system 𝒮 that simulates 𝒯.


Another interpretation of a solution to Problem 1
is that, to the extent that the hierarchical
model is more realistic than the seeded model by 
incorporating the reality that tiles may aggregate 
even in the absence of a seed, such a solution 
shows how to enforce seeded growth even in 
such an unfriendly environment that permits non- 
seeded growth. 


Assembly Time 
We now define time complexity for hierarchi- 
cal systems (this definition first appeared in [4], 
where it is explained in more detail). We treat
each assembly as a single molecule. If two
assemblies α and β can attach to create an assembly
γ, then we model this as a chemical reaction
α + β → γ, in which the rate constant is assumed
to be equal for all reactions (and normalized to
1). In particular, if α and β can be attached
in two different ways, this is modeled as two
different reactions, even if both result in the same
assembly.

At an intuitive level, the model we define can 
be explained as follows. We imagine dumping 
all tiles into solution at once, and at the same 
time, we grab one particular tile and dip it into 
the solution as well, pulling it out of the solution 
when it has assembled into a terminal assem- 
bly. Under the seeded model, the tile we grab 
will be a seed, assumed to be the only copy 
in solution (thus requiring that it appears only 
once in any terminal assembly). In the seeded 
model, no reactions occur other than the attach- 
ment of individual tiles to the assembly we are 
holding. In the hierarchical model, other reac- 
tions are allowed to occur in the background 
(we model this using the standard mass-action 
model of chemical kinetics [7]), but only those 
reactions with the assembly we are holding move 
it “closer” to completion. The other background
reactions merely change concentrations of other
assemblies (although these indirectly affect the 
time it will take our chosen assembly to complete, 
by changing the rate of reactions with our chosen 
assembly). 

More formally, let 𝒯 = (T, τ) be a hierarchical
TAS, and let ρ : T → [0, 1] be a concentrations
function, giving the initial concentration of
each tile type (we require that Σ_{t∈T} ρ(t) = 1,
a condition known as the “finite density
constraint”). Let R^+ = [0, ∞), and let t ∈ R^+. For
α ∈ A[𝒯], let [α]_ρ(t) (abbreviated [α](t) when ρ
is clear from context) denote the concentration of
α at time t with respect to initial concentrations
ρ, defined as follows. Given two assemblies α and
β that can attach to form γ, we model this event
as a chemical reaction R : α + β → γ. Say that
a reaction α + β → γ is symmetric if α = β.
Define the propensity (a.k.a., reaction rate) of R
at time t ∈ R^+ to be ρ_R(t) = [α](t) · [β](t) if R
is not symmetric and ρ_R(t) = (1/2) · [α](t)^2 if R
is symmetric.

If α is consumed in reactions α + β_1 →
γ_1, ..., α + β_n → γ_n and produced in asymmetric
reactions β'_1 + γ'_1 → α, ..., β'_m + γ'_m → α
and symmetric reactions β''_1 + β''_1 → α, ..., β''_p +
β''_p → α, then the concentration [α](t) of α at
time t is described by the differential equation:

d[α](t)/dt = Σ_{i=1}^{m} [β'_i](t) · [γ'_i](t) + Σ_{i=1}^{p} (1/2) · [β''_i](t)^2 − Σ_{i=1}^{n} [α](t) · [β_i](t),   (1)


with boundary conditions [α](0) = ρ(r) if α
is an assembly consisting of a single tile r and
[α](0) = 0 otherwise. In other words, the propensities
of the various reactions involving α determine
its rate of change, negatively if α is
consumed and positively if α is produced.
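
To see these dynamics on the smallest possible case, consider a hypothetical system with a single tile type a at initial concentration 1 whose only reaction is the symmetric reaction a + a → d, with the dimer d terminal. The monomer is consumed at rate [a]^2 and the dimer is produced at rate (1/2)[a]^2, which gives the closed form [a](t) = 1/(1 + t). The Python sketch below integrates the two equations with a simple forward Euler step; the system, the function name, and the step size are all assumptions made for illustration.

```python
def simulate_dimerization(initial_monomer, t_end, dt):
    """Forward Euler for d[a]/dt = -[a]^2 and d[d]/dt = (1/2) [a]^2."""
    a, d = initial_monomer, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        da = -a * a          # consumption term -[a][beta_1] with beta_1 = a
        dd = 0.5 * a * a     # symmetric production term (1/2) [a]^2
        a, d = a + dt * da, d + dt * dd
    return a, d

a_end, d_end = simulate_dimerization(1.0, 1.0, 1e-3)
```

At time 1 the exact values are [a] = 1/2 and [d] = 1/4, and the quantity [a] + 2[d], the total number of tiles per unit volume, stays equal to 1 throughout.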

This completes the definition of the dynamic 
evolution of concentrations of producible assem- 
blies; it remains to define the time complexity 
of assembling a terminal assembly. Although we 
have distinguished between seeded and hierarchical
systems, for the purpose of defining a model
of time complexity in hierarchical systems and
comparing them to the seeded system time com- 
plexity model of [1], it is convenient to introduce 
a seedlike “timekeeper tile” into the hierarchical 
system, in order to stochastically analyze the 
growth of this tile when it reacts in a solution 
that is itself evolving according to the continu- 
ous model described above. The seed does not 
have the purpose of nucleating growth but is 
introduced merely to focus attention on a single 
molecule that has not yet assembled anything, in 
order to ask how long it will take to assemble 
into a terminal assembly. The choice of which tile
type to pick will be a parameter of the definition, 
so that a system may have different assembly 
times depending on the choice of timekeeper tile. 

Fix a copy of a tile type s to designate as
a “timekeeper seed.” The assembly of s into
some terminal assembly α is described as a
time-dependent continuous-time Markov process in
which each state represents a producible assembly
containing s, and the initial state is the size-1
assembly with only s. For each state α
representing a producible assembly with s at the
origin, and for each pair of producible assemblies
β, γ such that α + β → γ (with the translation
assumed to happen only to β so that α stays
“fixed” in position), there is a transition in the
Markov process from state α to state γ with
transition rate [β](t).

We define T_{𝒯,ρ,s} to be the random variable
representing the time taken for the copy of s
to assemble into a terminal assembly via some
sequence of reactions as defined above. We define
the time complexity of a directed hierarchical
TAS 𝒯 with concentrations ρ and timekeeper s
to be T(𝒯, ρ, s) = E[T_{𝒯,ρ,s}].

For a shape S ⊂ Z^2 (finite and connected),
define the diameter of S to be diam(S) =
max_{u,v∈S} ||u − v||_1, where ||w||_1 is the L_1 norm of w.
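
A direct Python computation of this diameter for a finite shape given as a set of lattice points; the function name diameter and the example shape are illustrative assumptions.

```python
from itertools import combinations

def diameter(shape):
    """Largest L1 distance between two points of a finite shape in Z^2."""
    points = list(shape)
    return max((abs(ux - vx) + abs(uy - vy)
                for (ux, uy), (vx, vy) in combinations(points, 2)),
               default=0)

line = {(x, 0) for x in range(5)}
l_shape = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
```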


Problem 2 Design a hierarchical tile system
𝒯 = (T, τ) such that every producible terminal
assembly α has the same shape S, and for
some s ∈ T and concentrations function
ρ : T → [0, 1], T(𝒯, ρ, s) = o(diam(S)).


It is provably impossible to achieve this with 
the seeded model [1,4], since all assemblies in 
that model require expected time at least propor- 
tional to their diameter. 


Key Results 


Power of Hierarchical Assembly Compared to Seeded

Cannon, Demaine, Demaine, Eisenstat, Patitz, 
Schweller, Summers, and Winslow [3] showed a 
solution to Problem 1. (They also showed several
other ways in which the hierarchical model
is more powerful than the seeded model, but
we restrict attention to simulation here.) For the
most part, temperature 2 seeded systems are as
powerful as those at higher temperatures, but the
simulation results of [3] hold for higher temperatures
as well. In particular, they showed that
every seeded temperature ≥ 4 tile system 𝒯 can
be simulated by a hierarchical temperature 4 tile
system (as well as showing it is possible for
temperature τ hierarchical tile systems to simulate
temperature τ seeded tile systems for τ ∈ {2, 3},
using similar logic to the higher-temperature
construction). The definition of simulation has a
simulation. In fact, the simulation described in [3] 
requires only resolution loss k = 5. 

Figure 1 shows an example of 𝒮 simulating
𝒯. The construction enforces the "simulation of
dynamics" constraint that a 5 × 5 macrotile
representing a tile can assemble in 𝒮 if and only if
that single tile can attach in 𝒯. It is critical
that each tile type in 𝒯 is represented by more
than one type of macrotile in 𝒮: each different
type of macrotile represents a different subset
of sides that can cooperate to allow the tile to
bind. To achieve this, each macrotile consists of a
central "brick" (itself a 3 × 3 block composed of 9
unique tile types held together with strength-4
glues) surrounded by "mortar" (forming a ring
around the central brick). Figure 1 shows "mortar
rectangles" but, similarly to the brick, these are
just 3 × 1 assemblies of 3 individual tile types
with strength-4 glues. The logic of the system
is such that if a brick B is designed for a subset
of cooperating sides C ⊆ {N, S, E, W}, then
B can attach only if the mortar for all sides in C
is present. Its attachment is required to fill
in the remaining mortar representing the other
sides in {N, S, E, W} \ C that may not be present.
Finally, those tiles enable the assembly of mortar
in adjacent 5 × 5 blocks, to be ready for possible
cooperation to bind bricks in those blocks.


Assembly Time 

Chen and Doty [4] showed a solution to Problem 2,
by proving that for infinitely many n ∈ N,
there is a (non-directed) hierarchical TAS
𝒯 = (T, 2) that strictly self-assembles an n × n′
rectangle S, where n′ = o(n) (hence diam(S) =
Θ(n)), such that |T| = O(log n) and there is a
tile type s ∈ T and concentrations function
ρ : T → [0, 1] such that T(𝒯, ρ, s) = O(n^{4/5} log n).

Hierarchical Self-Assembly, Fig. 1 Simulation of a
seeded tile system 𝒯 of temperature ≥ 4 by a hierarchical
tile system 𝒮 of temperature 4 (Figure taken from [3]).
Filled arrows represent glues of strength 2, and unfilled
arrows represent glues of strength 1. In the seeded tile
system, the number of dashes on the side of a tile represents
its strength. (Labels in the figure: Brick, Mortar tile,
Mortar rectangle, Macrotile, Simulated aTAM, τ = 4.)

The construction consists of m = n^{1/5} stages
shown in Fig. 2, where each stage consists of the
attachment of two "horizontal bars" to a single
"vertical bar" as shown in Fig. 3. The vertical
bar of the next stage then attaches to the right
of the two horizontal bars, which cooperate to
allow the binding because they each have a single
strength 1 glue. All vertical bars are identical
when they attach, but attachment triggers the
growth of some tiles (shown in orange in Figs. 2
and 3) that make the attachment sites on the right
side different from their locations in the previous
stage, which is how the stages "count down" from
m to 1.

The bars themselves are assembled in a "standard"
way that requires time linear in the diameter
of the bar, which is w = n^{4/5} for a horizontal
bar and mk² = n^{3/5} (where k is a parameter
that we set to be n^{1/5}) for a vertical bar. The
speedup comes from the fact that each horizontal
bar can attach to one of k different binding sites
on a vertical bar, so the expected time for this
to happen is a factor k lower than if there were
only a single binding site. The vertical "arm" on
the left of each horizontal bar has the purpose of
preventing any other horizontal bars from binding
near it. Each stage also requires filler tiles to fill
in the gap regions, but the time required for this
is negligible compared to the time for all vertical
and horizontal bars to attach.

Note that this construction is not directed:
although every producible terminal assembly has
the shape of an n × n′ rectangle, there are many
such terminal assemblies. Chen and Doty [4]
also showed that for a class of directed systems
called "partial order tile systems," no solution to
Problem 2 exists: provably any such tile system
assembling a shape of diameter d requires expected
time Ω(d).

Hierarchical Self-Assembly, Fig. 2 High-level overview of
interaction of "vertical bars" and "horizontal bars"
to create the rectangle in the solution to Problem 2
that assembles in time sublinear in its diameter.
Filler tiles fill in the empty regions. If glues overlap
two regions, then they represent a formed bond. If glues
overlap one region but not another, they are glues
from the former region but are mismatched (and thus
"covered and protected") by the latter region

Hierarchical Self-Assembly, Fig. 3 "Vertical bars" for
the construction of a fast-assembling square, and their
interaction with horizontal bars, as shown for a single
stage of Fig. 2. "Type B" horizontal bars have a longer
vertical arm than "Type A" since the glues they must block
are farther apart. (Labels in the figure: horizontal bar
type A, with k identical pairs of "group A glues" spaced
O(1) apart; horizontal bar type B, with k identical pairs of
"group B glues" (different glues from top) spaced O(k) apart;
width = w; height = O(mk²); mk identical glues spaced O(k)
apart; previous stage lower horizontal bar attached to one of
these k glues; partial vertical bar, which will complete after
binding to two horizontal bars; vertical bar as it appears
before binding; vertical bar after "post-binding processing"
to place stage-specific right-side glues.)


Open Problems 


It is known [2] that the tile complexity of assembling
an n × k rectangle in the seeded aTAM, if
k < log n / (log log n − log log log n), is asymptotically
lower bounded by Ω(n^{1/k} / k) and upper bounded by
O(n^{1/k}). For the hierarchical model, the upper
bound holds as well [2], but the strongest
known lower bound is the information-theoretic
Ω(log n / log log n).


Question 1 What is the tile complexity of assembling
an n × k rectangle in the hierarchical model,
when k < log n / (log log n − log log log n)?


Problem Definition


Many algorithmic problems on spatial data can 
be solved efficiently if a suitable decomposition 
of the ambient space is available. Two desirable 
properties of the decomposition are that its cells 
have a nice shape — convex and/or of constant 
complexity — and that each cell intersects only a 
few objects from the given data set. Another de- 
sirable property is that the decomposition is hier- 
archical, meaning that the space is partitioned in a 
recursive manner. Popular hierarchical space de- 
compositions include quadtrees and binary space 
partitions. 

When the objects in the given data set are 
nonpoint objects, they can be fragmented by 
the partitioning process. This fragmentation has 
a negative impact on the storage requirements 
of the decomposition and on the efficiency of 
algorithms operating on it. Hence, it is desirable 
to minimize fragmentation. In this chapter, we 
describe methods to construct linear-size com- 
pressed quadtrees and binary space partitions 
for so-called low-density sets. To simplify the 
presentation, we describe the constructions in the 
plane. We use S to denote the set of n objects 
for which we want to construct a space decompo- 
sition and assume for simplicity that the objects 
in S are disjoint, convex, and of nonzero area. 


Binary Space Partitions 

A binary space partition for a set S of n objects 
in the plane is a recursive decomposition of the 
plane by lines, typically such that each cell in 
the final decomposition intersects only a few 
objects from S. The tree structure modeling this 
decomposition is called a binary space partition 
tree, or BSP tree for short — see Fig.1 for an 
illustration. Thus, a BSP tree 𝒯 for S can be
defined as follows.


• If a predefined stopping criterion is met
  (often this is when |S| is sufficiently small),
  then 𝒯 consists of a single leaf where the set
  S is stored.

• Otherwise the root node ν of 𝒯 stores a
  suitably chosen splitting line ℓ. Let ℓ⁻ and
  ℓ⁺ denote the half-planes lying to the left
  and to the right of ℓ, respectively (or, if ℓ is
  horizontal, below and above ℓ).

  - The left subtree of ν is a BSP tree for
    S⁻ := {o ∩ ℓ⁻ : o ∈ S}, the set of object
    fragments lying in the half-plane ℓ⁻.

  - The right subtree of ν is a BSP tree for
    S⁺ := {o ∩ ℓ⁺ : o ∈ S}, the set of object
    fragments lying in the half-plane ℓ⁺.


The size of a BSP tree is the total number of object 
fragments stored in the tree. 


Compressed Quadtrees 

Let U = [0,1]² be the unit square. We say that
a square σ ⊆ U is a canonical square if there
is an integer k ≥ 0 such that σ is a cell of the
regular subdivision of U into 2^k × 2^k squares.
A donut is the set-theoretic difference σ_out \ σ_in
of a canonical square σ_out and a canonical square
σ_in ⊂ σ_out. A compressed quadtree 𝒯 for a set P
of points inside a canonical square σ is defined as
follows; see also Fig. 2 (middle).


• If a predefined stopping criterion is met
  (usually this is when |P| is sufficiently small),
  then 𝒯 consists of a single leaf storing the
  set P.

• If the stopping criterion is not met, then 𝒯 is
  defined as follows. Let σ_NE denote the northeast
  quadrant of σ and let P_NE := P ∩ σ_NE.
  Define σ_SE, σ_SW, σ_NW and P_SE, P_SW, P_NW
  similarly for the other three quadrants. (Here we
  should make sure that points on the boundary
  between quadrants are assigned to quadrants
  in a consistent manner.) Now 𝒯 consists of
  a root node ν with four or two children,
  depending on how many of the sets P_NE, P_SE,
  P_SW, P_NW are nonempty:

  - If at least two of the sets P_NE, P_SE, P_SW,
    P_NW are nonempty, then ν has four children
    ν_NE, ν_SE, ν_SW, ν_NW. The child ν_NE is the
    root of a compressed quadtree for the set
    P_NE inside the square σ_NE; the other three
    children are defined similarly for the point
    sets inside the other quadrants.

  - If only one of P_NE, P_SE, P_SW, P_NW is
    nonempty, then ν has two children ν_in
    and ν_out. The child ν_in is the root of a
    compressed quadtree for P inside σ_in,
    where σ_in is the smallest canonical square
    containing all points from P. The other
    child is a leaf corresponding to the donut
    σ \ σ_in.


A compressed quadtree for a set of n points has
size O(n).

Hierarchical Space Decompositions for Low-Density Scenes, Fig. 1 A binary space partition for a set of polygons
(left) and the corresponding BSP tree (right)

Hierarchical Space Decompositions for Low-Density
Scenes, Fig. 2 Construction of a compressed quadtree
for a set of disks: take the bounding-box vertices (left),
construct a compressed quadtree for the vertices (middle),
and put the disks back in (right)

Above we defined compressed quadtrees for
point sets. In this chapter, we are interested
in compressed quadtrees for nonpoint objects.
These are defined similarly: each internal node
corresponds to a canonical square, and each leaf
is a canonical square or a donut. This time donuts
need not be empty, but may intersect objects
(although not too many). The right picture in
Fig. 2 shows a compressed quadtree for a set of
disks. The size of a compressed quadtree for a set
of nonpoint objects is defined as the total number
of object fragments stored in the tree. Because
nonpoint objects may be split into fragments
during the subdivision process, a compressed
quadtree for nonpoint objects is not guaranteed
to have linear size.
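The recursive definition of a compressed quadtree for a point set can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `cell_of` and `build_compressed_quadtree`, the dictionary-based node layout, the stopping criterion |P| ≤ 1, and the use of half-open cells to assign boundary points consistently are all choices made here. Points are assumed to be distinct and to lie in [0, 1)².

```python
from fractions import Fraction


def cell_of(point, k):
    """Index (i, j) of the level-k canonical cell containing `point`.

    Cells are treated as half-open, so a point on a shared boundary
    always goes to the cell on its upper/right side.
    """
    n = 2 ** k
    return (int(point[0] * n), int(point[1] * n))


def build_compressed_quadtree(points, k=0, cell=(0, 0)):
    """Compressed quadtree for distinct points inside canonical square (k, cell)."""
    if len(points) <= 1:  # assumed stopping criterion
        return {"square": (k, cell), "points": list(points)}

    # Distribute the points over the four quadrants (level k + 1 cells).
    groups = {}
    for p in points:
        groups.setdefault(cell_of(p, k + 1), []).append(p)

    i, j = cell
    if len(groups) >= 2:
        # At least two nonempty quadrants: an ordinary node with four children.
        children = [
            build_compressed_quadtree(
                groups.get((2 * i + di, 2 * j + dj), []),
                k + 1,
                (2 * i + di, 2 * j + dj),
            )
            for di in (0, 1)
            for dj in (0, 1)
        ]
        return {"square": (k, cell), "children": children}

    # Only one nonempty quadrant: jump to the smallest canonical square
    # containing all points, and store the rest of the square as a donut leaf.
    inner_k = k + 1
    while len({cell_of(p, inner_k + 1) for p in points}) == 1:
        inner_k += 1
    inner_cell = cell_of(points[0], inner_k)
    inner = build_compressed_quadtree(points, inner_k, inner_cell)
    donut = {"donut": ((k, cell), (inner_k, inner_cell)), "points": []}
    return {"square": (k, cell), "children": [inner, donut]}
```

The jump in the last branch is what keeps the tree size linear in the number of points: a long chain of squares that each contain the same cluster is replaced by one node with a single donut leaf.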


Low-Density Scenes 

The main question we are interested in is the 
following: given a set S of n objects, can we 
construct a compressed quadtree or BSP tree 
with O(n) leaves such that each leaf region 
intersects O(1) objects? In general, the answer to 
this question is no. For compressed quadtrees,
this can be seen by considering a set S of
slanted parallel segments that are very close
together. A linear-size BSP tree cannot be
guaranteed either: there are sets of n disjoint
segments in the plane for which any BSP tree has
size Ω(n log n / log log n) [7]. In R³ the situation
is even worse: there are sets of n disjoint triangles
for which any BSP tree has size Ω(n²) [5].
(Both bounds are tight: there are algorithms that
guarantee a BSP tree of size O(n log n / log log n)
in the plane [8] and of size O(n²) in R³ [6].)
Fortunately, in practice, the objects for which we
want to construct a space decomposition are often
distributed nicely, which allows us to construct
much smaller decompositions than for the worst-case
examples mentioned above. To formalize
this, we define the concept of density of a set of
objects in R^d.


Definition 1 The density of a set S of objects in
R^d, denoted density(S), is defined as the smallest
number λ such that the following holds: any ball
b ⊂ R^d intersects at most λ objects o ∈ S such
that diam(o) ≥ diam(b), where diam(·) denotes
the diameter of an object.


As illustrated in Fig. 3(i), a set of n parallel
segments can have density n if the segments are
very close together. In most practical situations,
however, the input objects are distributed nicely
and the density will be small. For many classes
of objects, one can even prove that the density
is O(1). For example, a set of disjoint disks in
the plane has density at most 5. More generally,
any set of disjoint objects that are fat (examples
of fat objects are disks, squares, and triangles
whose minimum angle is lower bounded) has
density O(1) [3]. The main question now is: Is
low density sufficient to guarantee a hierarchical
space decomposition of linear size? The answer
is yes, and constructing the space decomposition
is surprisingly easy.

Hierarchical Space Decompositions for Low-Density
Scenes, Fig. 3 (i) The ball b intersects all n segments
and the segments have diameter larger than diam(b), so
the density of the set of segments is n. (ii) Any ball b, no
matter where it is placed or what its size is, intersects at
most three triangles with diameter at least diam(b), so the
density of the set of triangles is 3


Key Results


The construction of space decompositions for
low-density sets is based on the following lemma.
In the lemma, the square σ is considered to be
open, that is, σ does not include its boundary. Let
bb(o) denote the axis-aligned bounding box of an
object o.


Lemma 1 Let S be a set of n objects in the plane
and let B_S denote the set of 4n vertices of the
bounding boxes bb(o) of the objects o ∈ S. Let
σ be any square region in the plane. Then the
number of objects in S intersecting σ is at most
k + 4λ, where k is the number of bounding-box
vertices inside σ and λ := density(S).


With Lemma 1 in hand, it is surprisingly simple
to construct BSP trees or compressed quadtrees
of small size for any given set S whose density is
small.


Binary Space Partitions

For BSP trees we proceed as follows. Let B_S be
the set of vertices of the bounding boxes of the
objects in S. In a generic step in the recursive
construction of 𝒯, we are given a square σ and
the set of points B_S(σ) := B_S ∩ σ. Initially σ
is a square containing all points of B_S. When
B_S(σ) = ∅, then 𝒯 consists of a single leaf and the
recursion ends; otherwise we proceed as follows.
Let σ_NE, σ_SE, σ_SW, and σ_NW denote the four
quadrants of σ. We now have two cases, illustrated in
Fig. 4.

Hierarchical Space Decompositions for Low-Density Scenes, Fig. 4 Two cases in the construction of the BSP tree


Case (i): all points in B_S(σ) lie in the same
quadrant. Let σ′ be the smallest square sharing
a corner with σ and containing all points
from B_S(σ) in its interior or on its boundary.
Split σ into three regions using a vertical
and a horizontal splitting line such that σ′ is
one of those regions; see Fig. 4(i). Recursively
construct a BSP tree for the square σ′ with
respect to the set B_S(σ′) of points lying in the
interior of σ′.

Case (ii): not all points in B_S(σ) lie in the same
quadrant. Split σ into four quadrants using a
vertical and two horizontal splitting lines; see
Fig. 4(ii). Recursively construct a BSP tree for
each quadrant with respect to the points lying
in its interior.


The construction produces a subdivision of the
initial square into O(n) leaf regions, which are
squares or rectangles and which do not contain
points from B_S in their interior. Using Lemma 1,
one can argue that each leaf region intersects
O(λ) objects.
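The two cases can be sketched in code as follows. This is an illustration under stated assumptions rather than the construction from the cited work: the function name `bsp_leaf_regions` is hypothetical, only the leaf regions of the subdivision are returned (not the tree or its splitting lines), exact rational arithmetic is used so that points on a square's boundary are detected reliably, and a point lying on a quadrant boundary is treated as "not in the same quadrant" as the others.

```python
from fractions import Fraction


def bsp_leaf_regions(points, x, y, size):
    """Leaf rectangles (x, y, width, height) of the subdivision of the open
    square with lower-left corner (x, y) and side length `size`, driven by
    the given points (for example, bounding-box vertices)."""
    x, y, size = Fraction(x), Fraction(y), Fraction(size)
    inside = [(Fraction(px), Fraction(py)) for px, py in points
              if x < px < x + size and y < py < y + size]
    if not inside:
        return [(x, y, size, size)]  # no points in the interior: a leaf

    half = size / 2
    cx, cy = x + half, y + half
    # Quadrant of each point as (is_east, is_north); None if on a boundary.
    quads = {None if px == cx or py == cy else (px > cx, py > cy)
             for px, py in inside}

    if len(quads) == 1 and None not in quads:
        # Case (i): all points in one quadrant. Shrink to the smallest
        # square sigma' that shares the corresponding corner with sigma.
        (east, north), = quads
        corner_x = x + size if east else x
        corner_y = y + size if north else y
        s = max(max(abs(px - corner_x), abs(py - corner_y))
                for px, py in inside)
        sx = corner_x - s if east else x
        sy = corner_y - s if north else y
        # Vertical line: separates the strip containing sigma' from `rest`;
        # horizontal line: separates sigma' from the remainder of the strip.
        strip = (sx, y if north else sy + s, s, size - s)
        rest = (x if east else sx + s, y, size - s, size)
        return bsp_leaf_regions(inside, sx, sy, s) + [strip, rest]

    # Case (ii): split into the four quadrants and recurse on each.
    leaves = []
    for qx in (x, cx):
        for qy in (y, cy):
            leaves += bsp_leaf_regions(inside, qx, qy, half)
    return leaves
```

In both cases every recursive call sees strictly fewer interior points, because at least one point ends up on the boundary of a child square or in a different quadrant, so the recursion terminates.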


Compressed Quadtrees 

The construction of a compressed quadtree for
a low-density set S is also based on the set
B_S of bounding-box vertices: we construct a
compressed quadtree for B_S, where we stop the
recursive construction when a square contains
bounding-box vertices from at most one object in
S or when all bounding-box vertices inside the
square coincide. Figure 2 illustrates the process.
The resulting compressed quadtree has O(n) leaf
regions, which are canonical squares or donuts.
Again using Lemma 1, one can argue that each
leaf region intersects O(λ) objects.


Improvements and Generalizations 

The constructions above guarantee that each region
in the space decomposition is intersected
by O(λ) objects and that the number of regions
is O(n). Hence, the total number of object
fragments is O(λn). The main idea behind the
introduction of the density λ is that in practice
λ is often a small constant. Nevertheless,
it is (at least from a theoretical point of view)
desirable to get rid of the dependency on λ
in the number of fragments. This is possible
by reducing the number of regions in the
decomposition to O(n/λ). To this end, we allow
leaf regions to contain up to O(λ) bounding-box
vertices. Note that Lemma 1 implies that a
square with O(λ) bounding-box vertices inside
intersects O(λ) objects. If implemented correctly,
this idea leads to decompositions with O(n/λ)
regions each of which intersects O(λ) objects,
both for binary space partitions [2, Section 12.5]
and for compressed quadtrees [4]. The results can
also be generalized to higher dimensions, giving
the following theorem.


Theorem 1 Let S be a set of n objects in R^d
and let λ := density(S). There is a binary space
partition for S consisting of O(n/λ) leaf regions,
each intersecting O(λ) objects. Similarly, there is
a compressed quadtree with O(n/λ) leaf regions,
each intersecting O(λ) objects.


Problem Definition


Algorithm engineering refers to the process
required to transform a pencil-and-paper
algorithm into a robust, efficient, well-tested,
and easily usable implementation. Thus it
encompasses a number of topics, from modeling
cache behavior to the principles of good
software engineering; its main focus, however,
is experimentation. In that sense, it may be
viewed as a recent outgrowth of Experimental
Algorithmics [14], which is specifically devoted
to the development of methods, tools, and
practices for assessing and refining algorithms
through experimentation. The ACM Journal of
Experimental Algorithmics (JEA), at URL
www.jea.acm.org, is devoted to this area.

High-performance algorithm engineering [2] 
focuses on one of the many facets of algorithm 
engineering: speed. The high-performance aspect 
does not immediately imply parallelism; in fact, 
in any highly parallel task, most of the impact of 
high-performance algorithm engineering tends to 
come from refining the serial part of the code. 

The term algorithm engineering was first used
with specificity in 1997, with the organization
of the first Workshop on Algorithm Engineering
(WAE 97). Since then, this workshop has taken
place every summer in Europe. The 1998 Workshop
on Algorithms and Experiments (ALEX98)
was held in Italy and provided a discussion forum
for researchers and practitioners interested in the
design, analysis, and experimental testing of exact
and heuristic algorithms. A sibling workshop
was started in the United States in 1999, the Workshop
on Algorithm Engineering and Experiments
(ALENEX99), which has taken place every winter,
colocated with the ACM/SIAM Symposium on
Discrete Algorithms (SODA).


Key Results 


Parallel computing has two closely related main
uses. First, with more memory and storage
resources than available on a single workstation,
a parallel computer can solve correspondingly
larger instances of the same problems. This
increase in size can translate into running
higher-fidelity simulations, handling higher volumes
of information in data-intensive applications,
and answering larger numbers of queries and
datamining requests in corporate databases.
Secondly, with more processors and larger
aggregate memory subsystems than available
on a single workstation, a parallel computer
can often solve problems faster. This increase
in speed can also translate into all of the
advantages listed above, but perhaps its crucial
advantage is in turnaround time. When the
computation is part of a real-time system, such
as weather forecasting, financial investment
decision-making, or tracking and guidance
systems, turnaround time is obviously the critical
issue. A less obvious benefit of shortened
turnaround time is higher-quality work: when
a computational experiment takes less than an
hour, the researcher can afford the luxury of
exploration, running several different scenarios
in order to gain a better understanding of the
phenomena being studied.

In algorithm engineering, the aim is to present 
repeatable results through experiments that apply 
to a broader class of computers than the specific 
make of computer system used during the experi- 
ment. For sequential computing, empirical results 
are often fairly machine-independent. While ma- 
chine characteristics such as word size, cache and 
main memory sizes, and processor and bus speeds 
differ, comparisons across different uniprocessor 
machines show the same trends. In particular, 
the number of memory accesses and proces- 
sor operations remains fairly constant (or within 
a small constant factor). In high-performance al- 
gorithm engineering with parallel computers, on 
the other hand, this portability is usually absent: 
each machine and environment is its own special 
case. One obvious reason is major differences 
in hardware that affect the balance of commu- 
nication and computation costs — a true shared- 
memory machine exhibits very different behav- 
ior from that of a cluster based on commodity 
networks. 

Another reason is that the communication
libraries and parallel programming environments
(e.g., MPI [12], OpenMP [16], and High-Performance
Fortran [10]), as well as the
parallel algorithm packages (e.g., fast Fourier
transforms using FFTW [6] or parallelized
linear algebra routines in ScaLAPACK [4]),
often exhibit differing performance on different
types of parallel platforms. When multiple
library packages exist for the same task, a user
may observe different running times for each
library version even on the same platform.
Thus a running-time analysis should clearly
separate the time spent in the user code from
that spent in various library calls. Indeed, if
particular library calls contribute significantly
to the running time, the number of such calls
and running time for each call should be
recorded and used in the analysis, thereby
helping library developers focus on the most
cost-effective improvements. For example, in a simple
message-passing program, one can characterize
the work done by keeping track of sequential
work, communication volume, and number
of communications. A more general program
using the collective communication routines of
MPI could also count the number of calls to
these routines. Several packages are available to
instrument MPI codes in order to capture such
data (e.g., MPICH's nupshot [8], Pablo [17],
and Vampir [15]). The SKaMPI benchmark [18]
allows running-time predictions based on such
measurements even if the target machine is not
available for program development. SKaMPI was
designed for robustness, accuracy, portability,
and efficiency; for example, SKaMPI adaptively
controls how often measurements are repeated,
adaptively refines message-length and step-width
at "interesting" points, recovers from crashes,
and automatically generates reports.
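The bookkeeping described above, recording the number of calls and the time spent in each library routine separately from user code, can be sketched in a few lines. This is an illustrative sketch only: `instrumented`, `stats`, `library_sort`, and `user_code` are hypothetical names, the "library routine" is a stand-in, and the sketch does not use or imitate any of the MPI instrumentation packages cited in the text.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-routine totals: number of calls and accumulated wall-clock seconds.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrumented(name):
    """Decorator that records call count and elapsed time under `name`."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                entry = stats[name]
                entry["calls"] += 1
                entry["seconds"] += time.perf_counter() - start
        return wrapper
    return decorate


@instrumented("library_sort")
def library_sort(data):
    """Stand-in for an external library routine."""
    return sorted(data)


def user_code(chunks):
    """User-level work that calls the library routine once per chunk."""
    return [library_sort(chunk) for chunk in chunks]
```

After a run, `stats["library_sort"]` holds the call count and total time attributable to the library, which can be subtracted from the overall running time to estimate the time spent in user code. As the text notes for profiling in general, the instrumentation itself adds overhead to what is being measured.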


Applications 


The following are several examples of algorithm
engineering studies for high-performance and
parallel computing.


1. Bader's prior publications (see [2] and http://
www.cc.gatech.edu/~bader) contain many
empirical studies of parallel algorithms for
combinatorial problems like sorting, selection,
graph algorithms, and image processing.

2. In a recent demonstration of the power of
high-performance algorithm engineering,
a million-fold speedup was achieved through
a combination of a 2,000-fold speedup
in the serial execution of the code and
a 512-fold speedup due to parallelism
(a speedup, however, that will scale to any
number of processors) [13]. (In a further
demonstration of algorithm engineering,
additional refinements in the search and
bounding strategies have added another
speedup to the serial part of about 1,000,
for an overall speedup in excess of 2 billion.)

3. JáJá and Helman conducted empirical studies
for prefix computations, sorting, and
list-ranking, on symmetric multiprocessors. The
sorting research (see [9]) extends Vitter's
external Parallel Disk Model to the internal
memory hierarchy of SMPs and uses this new
computational model to analyze a general-purpose
sample sort that operates efficiently in
shared-memory. The performance evaluation
uses nine well-defined benchmarks. The
benchmarks include input distributions
commonly used for sorting benchmarks (such
as keys selected uniformly and at random),
but also benchmarks designed to challenge the
implementation through load imbalance and
memory contention and to circumvent
algorithmic design choices based on specific input
properties (such as data distribution, presence
of duplicate keys, pre-sorted inputs, etc.).

4. In [3] Blelloch et al. compare through analysis
and implementation three sorting algorithms
on the Thinking Machines CM-2. Despite the
use of an outdated (and no longer available)
platform, this paper is a gem and should be
required reading for every parallel algorithm
designer. In one of the first studies of its kind,
the authors estimate running times of four
of the machine's primitives, then analyze the
steps of the three sorting algorithms in terms
of these parameters. The experimental studies
of the performance are normalized to provide
clear comparison of how the algorithms
scale with input size on a 32K-processor
CM-2.

5. Vitter et al. provide the canonical theoretic
foundation for I/O-intensive experimental
algorithmics using external parallel disks (e.g.,
see [1, 19, 20]). Examples from sorting, FFT,
permuting, and matrix transposition problems
are used to demonstrate the parallel disk
model.

6. Juurlink and Wijshoff [11] give one of
the first detailed experimental accounts of the
preciseness of several parallel computation
models on five parallel platforms. The authors
discuss the predictive capabilities of the
models, compare the models to find out which
allows for the design of the most efficient
parallel algorithms, and experimentally
compare the performance of algorithms
designed with the model versus those designed
with machine-specific characteristics in mind.
The authors derive model parameters for each
platform, analyses for a variety of algorithms
(matrix multiplication, bitonic sort, sample
sort, all-pairs shortest path), and detailed
performance comparisons.

7. The LogP model of Culler et al. [5] provides 
a realistic model for designing parallel 
algorithms for message-passing platforms. Its 
use is demonstrated for a number of problems, 
including sorting. 

8. Several research groups have performed 
extensive algorithm engineering for high- 
performance numerical computing. One of the 
most prominent efforts is that led by Dongarra 
for ScaLAPACK [4], a scalable linear algebra 
library for parallel computers. ScaLAPACK 
encapsulates much of the high-performance 
algorithm engineering with significant impact 
to its users who require efficient parallel 
versions of matrix—matrix linear algebra 
routines. New approaches for automatically 
tuning the sequential library (e.g., LAPACK) 
are now available as the ATLAS package [21]. 


Open Problems 


All of the tools and techniques developed over 
the last several years for algorithm engineer- 
ing are applicable to high-performance algorithm 
engineering. However, many of these tools need 


High Performance Algorithm Engineering for Large-Scale Problems 


further refinement. For example, cache-efficient 
programming is a key to performance but it is not 
yet well understood, mainly because of complex 
machine-dependent issues like limited associativ- 
ity, virtual address translation, and increasingly 
deep hierarchies of high-performance machines. 
A key question is whether one can find simple 
models as a basis for algorithm development. 
For example, cache-oblivious algorithms [7] are
efficient at all levels of the memory hierarchy in
theory, but so far only a few work well in practice.
As another example, profiling a running program
offers serious challenges in a serial environment
(any profiling tool affects the behavior of what
is being observed), but these challenges pale
in comparison with those arising in a parallel
or distributed environment (for instance,
measuring communication bottlenecks may require
hardware assistance from the network switches
or at least reprogramming them, which is sure
to affect their behavior). Designing efficient and
portable algorithms for commodity multicore and
manycore processors is an open challenge.


Problem Definition


The framework of Holant problems is intended
to capture a class of sum-of-product computations
in a more refined way than counting CSP
problems and is inspired by Valiant's holographic
algorithms [12] (also cf. entry Holographic
Algorithms). A constraint function f, or signature,
is a mapping from [κ]^n to ℂ, representing a local
contribution to a global sum. Here, [κ] is a finite
domain set, and n is the arity of f. The range
is usually taken to be ℂ, but it can be replaced
by any commutative semiring. A Holant problem
Holant(𝓕) is parameterized by a set of constraint
functions 𝓕. We usually focus on the Boolean
domain, namely, κ = 2. For consideration of
models of computation, we restrict function values
to be complex algebraic numbers.

We allow multigraphs, namely, graphs with 
self-loops and parallel edges. A signature grid 
$\Omega = (G, \pi)$ of Holant($\mathcal{F}$) consists of a graph 
$G = (V, E)$, where $\pi$ assigns to each vertex $v \in V$ 
and its incident edges some $f_v \in \mathcal{F}$ 
and its input variables. We say $\Omega$ is a planar 
signature grid if $G$ is planar. The Holant problem 
on instance $\Omega$ is to evaluate 

$$\mathrm{Holant}(\Omega; \mathcal{F}) = \sum_{\sigma} \prod_{v \in V} f_v\left(\sigma|_{E(v)}\right),$$

a sum over all edge labelings $\sigma : E \to [\kappa]$, where 
$E(v)$ denotes the incident edges of $v$ and $\sigma|_{E(v)}$ 
denotes the restriction of $\sigma$ to $E(v)$. This is also 
known as the partition function in the statistical 
physics literature. 

Formally, a set of signatures $\mathcal{F}$ defines the 
following Holant problem: 


Name Holant($\mathcal{F}$) 
Instance A signature grid $\Omega = (G, \pi)$ 
Output Holant($\Omega; \mathcal{F}$) 


The problem Pl-Holant($\mathcal{F}$) is defined similarly 
using a planar signature grid. 

A function $f_v$ can be represented by listing 
its values in lexicographical order as in a truth 
table, which is a vector in $\mathbb{C}^{2^{\deg(v)}}$, or as a tensor 
in $(\mathbb{C}^2)^{\otimes \deg(v)}$. Special focus has been put on 
symmetric signatures, which are functions invariant 
under any permutation of the input. An 
example is the EQUALITY signature $=_n$ of arity 
$n$. A Boolean symmetric function $f$ of arity $n$ 
can be listed as $[f_0, f_1, \ldots, f_n]$, where $f_w$ is the 
function value of $f$ when the input has Hamming 
weight $w$. Using this notation, an EQUALITY signature 
is $[1, 0, \ldots, 0, 1]$. Another example is the 
EXACTONE signature $[0, 1, 0, \ldots, 0]$. Clearly, the 
Holant problem defined by this signature counts 
the number of perfect matchings. 
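
The definition can be made concrete with a brute-force evaluator. The following is a minimal sketch, assuming Boolean symmetric signatures given as lists $[f_0, \ldots, f_n]$; the function names `holant` and `exact_one` are illustrative and do not come from the literature. It enumerates all $2^{|E|}$ edge labelings, so it is only meant for tiny instances.

```python
from itertools import product


def holant(edges, signatures):
    """Brute-force Holant value of a Boolean signature grid.

    edges:      list of (u, v) pairs; parallel edges and self-loops allowed.
    signatures: dict mapping each vertex to a symmetric signature
                [f_0, ..., f_deg(v)] indexed by Hamming weight.
    """
    incident = {v: [] for v in signatures}
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)  # a self-loop is listed twice, as it should be
    total = 0
    # Sum over all edge labelings sigma : E -> {0, 1}.
    for sigma in product((0, 1), repeat=len(edges)):
        term = 1
        for v, sig in signatures.items():
            weight = sum(sigma[e] for e in incident[v])
            term *= sig[weight]
            if term == 0:
                break
        total += term
    return total


def exact_one(arity):
    """The EXACTONE signature [0, 1, 0, ..., 0] of the given arity."""
    return [0, 1] + [0] * (arity - 1)


# With EXACTONE at every vertex, the Holant value counts perfect matchings.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(holant(cycle4, {v: exact_one(2) for v in range(4)}))  # 2
print(holant(k4, {v: exact_one(3) for v in range(4)}))      # 3
```

The 4-cycle has two perfect matchings and $K_4$ has three, which is what the sketch reports.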

The set $\mathcal{F}$ is allowed to be an infinite set. For 
Holant($\mathcal{F}$) to be tractable, the problem must be 
computable in polynomial time even when the 
description of the signatures in the input $\Omega$ is 
included in the input size. In contrast, we say 
Holant($\mathcal{F}$) is #P-hard if there exists a finite subset 
of $\mathcal{F}$ for which the problem is #P-hard. 

The Holant framework is a generalization and 
refinement of both counting graph homomorphisms 
and counting constraint satisfaction problems 
(see entry Complexity Dichotomies for 
Counting Graph Homomorphisms for more details 
and results). 


Key Results 


The Holant problem was introduced by Cai, Lu, 
and Xia [3], which also contains a dichotomy of 
Holant* for symmetric Boolean complex func- 
tions. The notation Holant* means that all unary 
functions are assumed to be available. This re- 
striction is later weakened to only allow two 
constant functions that pin a variable to 0 or 1. 
This framework is called Holant$^c$. In [5], a dichotomy 
of Holant$^c$ is obtained. The need to 
assume some freely available functions is fi- 
nally avoided in [10]. In this paper, Huang and 
Lu proved a dichotomy for Holant but with the 
caveat that the functions must be real weighted. 
This result was later improved by Cai, Guo, and 
Williams [6], who proved a dichotomy for Holant 
parameterized by any set of symmetric Boolean 
complex functions. 

We will give some necessary definitions and 
then state the dichotomy from [6]. First are 
several tractable families of functions over the 
Boolean domain. 


Definition 1 A signature $f$ of arity $n$ is degenerate 
if there exist unary signatures $u_j \in \mathbb{C}^2$ 
($1 \le j \le n$) such that $f = u_1 \otimes \cdots \otimes u_n$. 
A symmetric degenerate signature has the form 
$u^{\otimes n}$. 


Definition 2 A $k$-ary function $f(x_1, \ldots, x_k)$ is 
of affine type if it has the form 

$$\lambda \cdot \chi_{AX=0} \cdot i^{\sum_{j=1}^{n} \langle \alpha_j, X \rangle},$$

where $\lambda \in \mathbb{C}$, $X = (x_1, x_2, \ldots, x_k, 1)^{\mathsf{T}}$, $A$ is a 
matrix over $\mathbb{F}_2$, $\alpha_j$ is a vector over $\mathbb{F}_2$, and $\chi$ 
is a 0-1 indicator function such that $\chi_{AX=0}$ is 1 
iff $AX = 0$. Note that the dot product $\langle \alpha_j, X \rangle$ is 
calculated over $\mathbb{F}_2$, while the summation $\sum_{j=1}^{n}$ 
on the exponent of $i = \sqrt{-1}$ is evaluated as a 
sum mod 4 of 0-1 terms. We use $\mathscr{A}$ to denote the 
set of all affine-type functions. 


An alternative but equivalent form for an 
affine-type function is $\lambda \cdot \chi_{AX=0} \cdot (\sqrt{-1})^{Q(x_1, \ldots, x_k)}$, 
where $Q(\cdot)$ is a quadratic form with integer 
coefficients that are even for every cross 
term. 


Definition 3 A function is of product type if it 
can be expressed as a product of unary functions, 
binary equality functions ([1,0,1]), and binary 
disequality functions ([0,1,0]), each applied to 
some of its variables. We use $\mathscr{P}$ to denote the set 
of product-type functions. 


Definition 4 A function $f$ is called vanishing if 
the value Holant($\Omega; \{f\}$) is 0 for every signature 
grid $\Omega$. We use $\mathscr{V}$ to denote the set of vanishing 
functions. 


For vanishing signatures, we need some more 
definitions. 


Definition 5 An arity $n$ symmetric signature of 
the form $f = [f_0, f_1, \ldots, f_n]$ is in $\mathscr{R}_t^{+}$ for 
a nonnegative integer $t \ge 0$ if $t > n$ or for 
any $0 \le k \le n - t$, $f_k, \ldots, f_{k+t}$ satisfy the 
recurrence relation 

$$\binom{t}{t} i^{t} f_{k+t} + \binom{t}{t-1} i^{t-1} f_{k+t-1} + \cdots + \binom{t}{0} i^{0} f_k = 0. \qquad (1)$$

We define $\mathscr{R}_t^{-}$ similarly but with $-i$ in place of $i$ 
in (1). 


With $\mathscr{R}_t^{\pm}$, one can define the recurrence degree of 
a function $f$. 


Definition 6 For a nonzero symmetric signature 
$f$ of arity $n$, it is of positive (resp. negative) 
recurrence degree $t \le n$, denoted by $\mathrm{rd}^{+}(f) = t$ 
(resp. $\mathrm{rd}^{-}(f) = t$), if and only if $f \in \mathscr{R}_{t+1}^{+} - \mathscr{R}_{t}^{+}$ 
(resp. $f \in \mathscr{R}_{t+1}^{-} - \mathscr{R}_{t}^{-}$). If $f$ is the all-zero 
signature, we define $\mathrm{rd}^{+}(f) = \mathrm{rd}^{-}(f) = -1$. 


In [6], it is shown that $f \in \mathscr{V}$ if and only if for 
either $\sigma = +$ or $-$, we have $2\,\mathrm{rd}^{\sigma}(f) < \mathrm{arity}(f)$. 
Accordingly, we split the set $\mathscr{V}$ of vanishing 
signatures in two. 


Definition 7 We define $\mathscr{V}^{\sigma}$ for $\sigma \in \{+, -\}$ as 

$$\mathscr{V}^{\sigma} = \{ f \mid 2\,\mathrm{rd}^{\sigma}(f) < \mathrm{arity}(f) \}.$$
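
Recurrence (1) and the recurrence degree can be checked mechanically. The sketch below is illustrative only: the helper names `in_R`, `recurrence_degree`, and `is_vanishing` are assumptions, signatures are given as Python lists of complex numbers, and the vanishing test simply applies the criterion $2\,\mathrm{rd}^{\sigma}(f) < \mathrm{arity}(f)$ quoted from [6].

```python
from math import comb

# Exact powers of i, indexed by the exponent mod 4.
I_POW = (1, 1j, -1, -1j)


def in_R(f, t, sign=+1, tol=1e-9):
    """True if f = [f_0, ..., f_n] is in R_t^+ (sign=+1) or R_t^- (sign=-1)."""
    n = len(f) - 1
    if t > n:
        return True
    for k in range(n - t + 1):
        # sum_j C(t, j) * (sign * i)^j * f_{k+j}, i.e. recurrence (1)
        s = sum(comb(t, j) * I_POW[(sign * j) % 4] * f[k + j]
                for j in range(t + 1))
        if abs(s) > tol:
            return False
    return True


def recurrence_degree(f, sign=+1):
    """rd^+(f) (sign=+1) or rd^-(f) (sign=-1); -1 for the all-zero signature."""
    if all(abs(x) == 0 for x in f):
        return -1
    t = 0
    while not in_R(f, t + 1, sign):
        t += 1
    return t


def is_vanishing(f):
    """Criterion quoted from [6]: 2 * rd^sigma(f) < arity(f) for some sigma."""
    n = len(f) - 1
    return any(2 * recurrence_degree(f, s) < n for s in (+1, -1))


f = [1, 1j, -1, -1j, 1]          # f_k = i^k, i.e. the tensor power [1, i]^{(x)4}
print(recurrence_degree(f, +1))  # 0
print(is_vanishing(f))           # True
print(is_vanishing([0, 1, 0]))   # False (binary disequality)
```

For $f_k = i^k$ the case $t = 1$ of (1) reads $i \cdot i^{k+1} + i^k = 0$, so $\mathrm{rd}^{+}(f) = 0$ and the signature is vanishing.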


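To state the dichotomy, we also need the notion 
of $\mathcal{F}$-transformable. For a matrix $T \in \mathbb{C}^{2 \times 2}$ 
and a signature set $\mathcal{F}$, define $T\mathcal{F} = \{ g \mid \exists f \in \mathcal{F}$ 
of arity $n$, $g = T^{\otimes n} f \}$. Here, we view 
the signatures as column vectors. Let $=_2$ be the 
equality function of arity 2. 


Definition 8 A signature set $\mathcal{F}'$ is $\mathcal{F}$-transformable 
if there exists a non-singular matrix $T \in \mathbb{C}^{2 \times 2}$ 
such that $\mathcal{F}' \subseteq T\mathcal{F}$ and $(=_2) T^{\otimes 2} \in \mathcal{F}$. 


If a set of functions $\mathcal{F}'$ is $\mathcal{F}$-transformable and $\mathcal{F}$ 
is a tractable set, then Holant($\mathcal{F}'$) is tractable as 
well. 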

The dichotomy of Holant problems over symmetric 
Boolean complex functions is stated as 
follows. 


Theorem 1 ([6]) Let $\mathcal{F}$ be any set of symmetric, 
complex-valued signatures in Boolean variables. 
Then, Holant($\mathcal{F}$) is #P-hard unless $\mathcal{F}$ satisfies 
one of the following conditions, in which case the 
problem is in P: 


1. All nondegenerate signatures in $\mathcal{F}$ are of arity 
at most 2; 
2. $\mathcal{F}$ is $\mathscr{A}$-transformable; 
3. $\mathcal{F}$ is $\mathscr{P}$-transformable; 
4. $\mathcal{F} \subseteq \mathscr{V}^{\sigma} \cup \{ f \in \mathscr{R}_2^{\sigma} \mid \mathrm{arity}(f) = 2 \}$ for 
some $\sigma \in \{+, -\}$; 
5. All nondegenerate signatures in $\mathcal{F}$ are in $\mathscr{R}_2^{\sigma}$ 
for some $\sigma \in \{+, -\}$. 


Theorem 1 is about Holant problems parameterized 
by symmetric Boolean complex functions 
over general graphs. Holant problems are studied 
in other settings as well. For planar graphs, [2] 
contains a dichotomy for Holant$^c$ with real symmetric 
functions. There are signature sets that are 
#P-hard over general graphs but tractable over 
planar graphs. The algorithms for such sets are 
due to Valiant’s holographic algorithms and the 
theory of matchgates [1, 12]. 

Another generalization looks at a broader 
range of functions. One may consider asymmetric 
functions as in [4], which contains a dichotomy 
for Holant* problems defined by asymmetric 
Boolean complex functions. One can also 
consider functions of larger domain size. For 
domain size 3, [7] contains a dichotomy for a 
single arity 3 symmetric complex function in 
the Holant* setting. For any constant domain 
size, [8] contains a dichotomy for a single arity 3 
complex weighted function that satisfies a strong 
symmetry property. 

One can consider constraint functions with a 
range other than $\mathbb{C}$. Replacing $\mathbb{C}$ by some finite 
field $\mathbb{F}_p$ for some prime $p$ defines counting problems 
modulo $p$. The case $p = 2$ is called parity 
Holant problems. It is of special interest because 
computing the permanent modulo 2 is tractable, 
which implies a family of tractable matchgate 
functions even over general graphs. For parity 
Holant problems, a complete dichotomy for sym- 
metric functions is obtained by Guo, Lu, and 
Valiant [9]. 


Open Problems 


Unlike the progress in the general graph setting, 
the strongest known dichotomy results for pla- 
nar Holant problems are rather limited. These 
planar dichotomies showed that newly tractable 
problems over planar graphs are captured by 
holographic algorithms with matchgates, but with 
restrictions like symmetric functions or regular 
graphs. The theory of holographic algorithms 
with matchgates can be applied to planar graphs 
and asymmetric signatures. A true test of its 
power would be to obtain an asymmetric complex 
weighted dichotomy of planar Holant problems. 
The situation is similarly limited for higher do- 
main sizes, where things seem considerably more 
complicated. A reasonable first step in this direc- 
tion would be to consider some restricted (yet still 
powerful) family of functions. 

Despite the success for $\mathbb{F}_2$, little is known 
about the complexity of Holant problems over 
other finite fields or semirings. As Valiant showed 
in [11], counting problems modulo some finite 
modulus exhibit some interesting and surprising 
phenomena, which deserve further research. 
Problem Definition 


Holographic algorithms, introduced by L. Valiant 
[11], are an algorithm design technique rather than 
a single algorithm for a particular problem. In 
essence, these algorithms are reductions to the 
FKT algorithm [7-9], which counts the number of 
perfect matchings in a planar graph in polynomial 
time. Computation in these algorithms is 
expressed and interpreted through a choice of linear 
basis vectors in an exponential “holographic” 
mix, and then it is carried out by the FKT method 
via the Holant Theorem. This methodology has 
produced polynomial time algorithms for a variety 
of problems, ranging from restrictive versions 
of satisfiability and vertex cover to other graph 
problems such as edge orientation and node/edge 
deletion. No polynomial time algorithms were 
previously known for these problems, and some minor 
variations are known to be NP-hard (or even #P-hard). 

Let $G = (V, E, W)$ be a weighted undirected 
planar graph, where $V$, $E$, and $W$ are sets of 
vertices, edges, and edge weights, respectively. 
A matchgate $\Gamma$ is a tuple $(G, X)$ where $X \subseteq V$ 
is a set of external nodes on the outer face. A 
matchgate is considered a generator or a recognizer 
matchgate when the external nodes are considered 
output or input nodes, respectively. They 
differ mainly in the way they are transformed. 
The external nodes are ordered clockwise on the 
external face. $\Gamma$ is called an odd (resp. even) 
matchgate if it has an odd (resp. even) number 
of nodes. 

Each matchgate is assigned a signature tensor. 
A generator $\Gamma$ with $m$ output nodes is assigned 
a contravariant tensor $G \in V_0^m$ of type $\binom{m}{0}$, 
where $V_0^m$ is the tensor space spanned by the $m$-fold 
tensor products of the standard basis 
$b = [b_0, b_1] = \left[ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right]$. 
The tensor $G$ under the standard basis $b$ has the form 

$$\sum G^{i_1 i_2 \ldots i_m} \, b_{i_1} \otimes b_{i_2} \otimes \cdots \otimes b_{i_m},$$

where 

$$G^{i_1 i_2 \ldots i_m} = \mathrm{PerfMatch}(G - Z).$$

Here $Z$ is the subset of the output nodes of 
$\Gamma$ having the characteristic sequence $\chi_Z = i_1 i_2 \ldots i_m \in \{0,1\}^m$, 
$\mathrm{PerfMatch}(G - Z) = \sum_{M} \prod_{(i,j) \in M} w_{ij}$ is a sum over all perfect 
matchings $M$ in the graph $G - Z$ obtained from 
$G$ by removing $Z$ and its incident edges, and 
$w_{ij}$ is the weight of the edge $(i, j)$. Similarly a 
recognizer $\Gamma'$ with underlying graph $G'$ having $m$ 
input nodes is assigned a covariant tensor $R \in V_m^0$ 
of type $\binom{0}{m}$. This tensor under the standard (dual) 
basis $b^*$ has the form 

$$\sum R_{i_1 i_2 \ldots i_m} \, b^{i_1} \otimes b^{i_2} \otimes \cdots \otimes b^{i_m},$$

where 

$$R_{i_1 i_2 \ldots i_m} = \mathrm{PerfMatch}(G' - Z),$$

and $Z$ is the subset of the input nodes of $\Gamma'$ having 
the characteristic sequence $\chi_Z = i_1 i_2 \ldots i_m$. 

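As a contravariant tensor, $G$ transforms as 
follows. Under a basis transformation $\beta_j = \sum_i b_i t^i_j$, 

$$(G')^{j_1 j_2 \ldots j_m} = \sum G^{i_1 i_2 \ldots i_m} \, \tilde{t}^{\,j_1}_{i_1} \tilde{t}^{\,j_2}_{i_2} \cdots \tilde{t}^{\,j_m}_{i_m},$$

where $(\tilde{t}^{\,j}_i)$ is the inverse matrix of $(t^i_j)$. Similarly, 
$R$ transforms as a covariant tensor, namely, 

$$(R')_{j_1 j_2 \ldots j_m} = \sum R_{i_1 i_2 \ldots i_m} \, t^{i_1}_{j_1} t^{i_2}_{j_2} \cdots t^{i_m}_{j_m}.$$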


A signature is symmetric if each entry only 
depends on the Hamming weight of the index 
$i_1 i_2 \ldots i_m$. This notion is invariant under a basis 
transformation. A symmetric signature is denoted 
by $[\sigma_0, \sigma_1, \ldots, \sigma_m]$, where $\sigma_i$ denotes the value 
of a signature entry whose index has Hamming 
weight $i$. 

A matchgrid $\Omega = (A, B, C)$ is a weighted 
planar graph consisting of a disjoint union of: a 
set of $g$ generators $A = (A_1, \ldots, A_g)$, a set of 
$r$ recognizers $B = (B_1, \ldots, B_r)$, and a set of 
$f$ connecting edges $C = (C_1, \ldots, C_f)$, where 
each $C_i$ edge has weight 1 and joins an output 
node of a generator with an input node of a 
recognizer, so that every input and output node in 
every constituent matchgate has exactly one such 
incident connecting edge. 

Let $G = \bigotimes_{i=1}^{g} G(A_i)$ be the tensor product 
of all the generator signatures, and let $R = \bigotimes_{j=1}^{r} R(B_j)$ 
be the tensor product of all the 
recognizer signatures. Then Holant$_\Omega$ is defined 
to be the contraction of the two product tensors, 
under some basis $\beta$, where the corresponding 
indices match up according to the $f$ connecting 
edges $C_k$: 

$$\mathrm{Holant}_\Omega = \langle R, G \rangle = \sum_{x \in \beta^{\otimes f}} \left\{ \Big[ \prod_{1 \le i \le g} G(A_i, x|_{A_i}) \Big] \cdot \Big[ \prod_{1 \le j \le r} R(B_j, x^{*}|_{B_j}) \Big] \right\}. \qquad (1)$$

If we write the covariant tensor $R$ as a row 
vector of dimension $2^f$ and the contravariant 
tensor $G$ as a column vector of dimension $2^f$, 
both indexed by some common ordering of the 
connecting edges, then Holant$_\Omega$ is just the dot 
product of these two vectors. Valiant’s beautiful 
Holant Theorem is as follows: 


Theorem 1 (Valiant) For any matchgrid $\Omega$ over 
any basis $\beta$, let $G$ be its underlying weighted 
graph, then 

$$\mathrm{Holant}_\Omega = \mathrm{PerfMatch}(G).$$

The FKT algorithm can compute the perfect 
matching polynomial PerfMatch($G$) for a 
planar graph in polynomial time. This gives 
a polynomial time algorithm to compute 
Holant$_\Omega$. 


Key Results 


To design a holographic algorithm for a given 
problem, the creative part is to formalize the 
given problem as a Holant problem. The theory 
of holographic algorithms is trying to answer the 
second question: given a Holant problem, can we 
find a basis transformation so that all the signa- 
tures in the Holant problem can be realized by 
some matchgates on that basis? More formally, 
we want to solve the following simultaneous 
realizability problem (SRP). 


Definition 1 Simultaneous Realizability Problem 
(SRP): 


Input: A set of constraint functions for generators 
and recognizers. 

Output: A common basis under which these 
functions can be simultaneously realized by 
matchgate signatures, if any exists; “NO” if 
they are not simultaneously realizable. 


The theory of matchgates and holographic 
algorithms provides a systematic understanding 
of which constraint functions can be realized 
by matchgates and of the structure of the bases, 
and it ultimately solves the simultaneous realizability 
problem. 


Matchgate Identities 

There is a set of algebraic identities [1,6] which 
completely characterizes signatures directly realizable 
without basis transformation by matchgates 
for any number of inputs and outputs. These 
identities are derived from Grassmann-Plücker 
identities for Pfaffians. 

Patterns $\alpha, \beta$ are $m$-bit strings, i.e., $\alpha, \beta \in \{0, 1\}^m$. 
A position vector $P = \{p_i\}$, $i \in [l]$, is 
a subsequence of $\{1, 2, \ldots, m\}$, i.e., $p_i \in [m]$ 
and $p_1 < p_2 < \cdots < p_l$. We also use $p$ to 
denote the $m$-bit string whose $(p_1, p_2, \ldots, p_l)$-th 
bits are 1 and others are 0. Let $e_i \in \{0, 1\}^m$ be 
the pattern with 1 in the $i$-th bit and 0 elsewhere. 
Let $\alpha, \beta \in \{0, 1\}^m$ be any patterns, and let $P = \{p_i\} = \alpha + \beta$, 
$i \in [l]$, be their bit-wise XOR 
as a position vector. Then, we have the following 
identity: 

$$\sum_{i=1}^{l} (-1)^{i} \, G^{\alpha + e_{p_i}} \, G^{\beta + e_{p_i}} = 0. \qquad (2)$$

A tensor $G = (G^{i_1 \ldots i_m})$ is realizable as the 
signature, without basis transformation, of some 
planar matchgate iff it satisfies the matchgate 
identities (2) for all $\alpha$ and $\beta$. 
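
Identity (2) can be checked exhaustively for small signatures. The following is a minimal sketch for symmetric signatures only, written as lists $[\sigma_0, \ldots, \sigma_m]$; the function name `satisfies_mgi` is illustrative, and the check loops over all $4^m$ pairs of patterns, so it is only practical for small arity.

```python
from itertools import product


def satisfies_mgi(sym_sig, tol=1e-9):
    """Check the matchgate identities (2) for a symmetric signature.

    sym_sig = [s_0, ..., s_m]; the entry G^x is s_{weight(x)}.
    """
    m = len(sym_sig) - 1

    def entry(bits):
        return sym_sig[sum(bits)]

    def flip(bits, p):
        # XOR the pattern with e_p (positions are 0-based here).
        return bits[:p] + (1 - bits[p],) + bits[p + 1:]

    for alpha in product((0, 1), repeat=m):
        for beta in product((0, 1), repeat=m):
            # Position vector P: where alpha and beta differ, in increasing order.
            positions = [p for p in range(m) if alpha[p] != beta[p]]
            total = sum((-1) ** i * entry(flip(alpha, p)) * entry(flip(beta, p))
                        for i, p in enumerate(positions, start=1))
            if abs(total) > tol:
                return False
    return True


print(satisfies_mgi([0, 1, 1, 0]))      # False
print(satisfies_mgi([6, 0, -2, 0]))     # True
print(satisfies_mgi([1, 0, 0, 0, 1]))   # False
print(satisfies_mgi([1, 0, 1, 0, 1]))   # True
```

For instance, with $\alpha = 1000$ and $\beta = 0111$ the EQUALITY signature $[1,0,0,0,1]$ gives the nonzero sum $-G^{0000} G^{1111} = -1$, whereas for $[1,0,1,0,1]$ the four terms cancel in pairs.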


Basis Collapse 
When we consider basis transformations for 
holographic algorithms, we mainly focus on 
invertible transformations, and these are bases 
of dimension 2. However, in a paper called 
“accidental algorithm” [10], Valiant showed 
that a basis of dimension 4 can be used to 
solve in P an interesting (restrictive SAT) 
counting problem mod 7. In a later paper [4], 
we have shown, among other things, that for 
this particular problem, this use of bases of size 
2 (that is, of dimension 4) is unnecessary. Then, 
in a sequence of two papers [2,3], we completely 
resolved the problem of the power of higher 
dimensional bases. We proved that 2-dimensional 
bases are universal for holographic algorithms 
in the Boolean domain. 


Theorem 2 (Basis Collapse Theorem) Any 
holographic algorithm on a basis of any 
dimension which employs at least one nondegen- 
erate generator can be efficiently transformed 
to a holographic algorithm in a basis of 
dimension 2. More precisely, if generators 
G1, Go,...,Gs and recognizers R,, R2,..., Rt 
are simultaneously realizable on a basis T of any 
dimension, and not all generators are degenerate, 
then all the generators and recognizers are 
simultaneously realizable in a basis $\hat{T}$ of 
dimension 2. 


From Art to Science 

Based on the characterization for matchgate signatures 
and basis transformations, we can solve 
the simultaneous realizability problem [5]. In 
order to investigate the realizability of signatures, 
it is useful to introduce a basis manifold $M$, 
which is defined to be the set of all possible 
bases modulo an equivalence relation. One can 
characterize in terms of $M$ all realizable symmetric 
signatures under basis transformations. This 
structural understanding gives: (i) a uniform account 
of all the previous successes of holographic 
algorithms using symmetric signatures [10, 11]; 
(ii) generalizations to solve other problems, when 
this is possible; and (iii) a proof when this is not 
possible. 

Fig. 1 Some matchgates used in #PL-3-NAE-ICE 


Applications 


In this section, we list a few problems which can 
be solved by holographic algorithms. 
#PL-3-NAE-ICE 


INPUT: A planar graph $G = (V, E)$ of maximum 
degree 3. 

OUTPUT: The number of orientations such that 
no node has all incident edges directed toward 
it or all incident edges directed away from it. 


Hence, #PL-3-NAE-ICE counts the number of 
no-sink-no-source orientations. A node of degree 
one will preclude such an orientation. We assume 
every node has degree 2 or 3. To solve this problem 
by a holographic algorithm with matchgates, 
we design a signature grid based on $G$ as follows: 
We attach to each node of degree 3 a generator 
with signature [0, 1, 1, 0]. This represents a 
NOT-ALL-EQUAL or NAE gate of arity 3. For any 
node of degree 2, we use a generator with the binary 
NAE (i.e., a binary DISEQUALITY) signature 
$(\neq_2) = [0, 1, 0]$. For each edge in $E$, we use 
a recognizer with signature $(\neq_2)$, which stands 
for an orientation from one node to the other. (To 
express such a problem, it is completely arbitrary 
to label one side as generators and the other side 
as recognizers.) From the given planar graph $G$, 
we obtain a signature grid $\Omega$, where the underlying 
graph $G'$ is the edge-vertex incidence graph 
of $G$. By definition, Holant$_\Omega$ is an exponential 
sum where each term is a product of appropriate 
entries of the signatures. Each term is indexed 
by a 0-1 assignment on all edges of $G'$; it has 
a value of 0 or 1, and it has a value of 1 iff 
it corresponds to an orientation of $G$ such that 
at every vertex of $G$ the local NAE constraint 
is satisfied. Therefore, Holant$_\Omega$ is precisely the 
number of valid orientations required by #PL-3-NAE-ICE. 
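
The equality between the Holant value and the number of valid orientations can be checked by brute force on tiny graphs. The sketch below is illustrative (function names are assumptions, and it is exponential in the number of edges, unlike the holographic algorithm): one routine enumerates orientations directly, and the other evaluates the exponential sum over 0-1 assignments to the edges of the incidence graph $G'$, with [0, 1, 1, 0] or [0, 1, 0] at each node and [0, 1, 0] on each edge. Every node is assumed to have degree 2 or 3.

```python
from itertools import product

NAE3 = [0, 1, 1, 0]   # NOT-ALL-EQUAL of arity 3
NEQ2 = [0, 1, 0]      # binary DISEQUALITY


def count_by_orientations(n, edges):
    """Count orientations of G in which no node is a sink or a source."""
    count = 0
    for orient in product((0, 1), repeat=len(edges)):
        indeg, outdeg = [0] * n, [0] * n
        for (u, v), o in zip(edges, orient):
            tail, head = (v, u) if o else (u, v)
            outdeg[tail] += 1
            indeg[head] += 1
        if all(indeg[x] > 0 and outdeg[x] > 0 for x in range(n)):
            count += 1
    return count


def count_by_holant(n, edges):
    """Evaluate the Holant sum on the edge-vertex incidence graph G'."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    node_sig = {2: NEQ2, 3: NAE3}
    total = 0
    # Two edges of G' per edge (u, v) of G: one toward u and one toward v.
    for sigma in product((0, 1), repeat=2 * len(edges)):
        weight = [0] * n
        term = 1
        for k, (u, v) in enumerate(edges):
            a, b = sigma[2 * k], sigma[2 * k + 1]
            term *= NEQ2[a + b]        # edge signature: exactly one end is 1
            weight[u] += a
            weight[v] += b
        for x in range(n):
            term *= node_sig[deg[x]][weight[x]]
        total += term
    return total


triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_by_orientations(3, triangle), count_by_holant(3, triangle))  # 2 2
print(count_by_orientations(4, k4), count_by_holant(4, k4))              # 24 24
```

The triangle admits only its two cyclic orientations, and of the 64 orientations of $K_4$, 40 contain a source or a sink, leaving 24.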

Note that the signature [0,1,1,0] is not the 
signature of any matchgate. A simple reason for 
this is that a matchgate signature, being defined in 
terms of perfect matchings, cannot have nonzero 
values for inputs of both odd and even Hamming 
weights. 


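However, under a holographic transformation 
using $H = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, 

$$H^{\otimes 3}[0, 1, 1, 0] = H^{\otimes 3} \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}^{\otimes 3} - \begin{pmatrix} 1 \\ 0 \end{pmatrix}^{\otimes 3} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}^{\otimes 3} \right\} = [6, 0, -2, 0],$$

$$H^{\otimes 2}[0, 1, 0] = H^{\otimes 2} \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}^{\otimes 2} - \begin{pmatrix} 1 \\ 0 \end{pmatrix}^{\otimes 2} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}^{\otimes 2} \right\} = [2, 0, -2],$$

and 

$$[0, 1, 0] (H^{-1})^{\otimes 2} = \tfrac{1}{2} [1, 0, -1].$$

These signatures are all realizable as matchgate 
signatures, as can be checked by verifying all the 
matchgate identities. More concretely, we can exhibit 
the requisite three matchgates in Fig. 1. 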
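
The three transformed signatures can be recomputed directly from the definition of a tensor power. The following is a small sketch using exact rational arithmetic; the helper names (`sym_to_vec`, `vec_to_sym`, `apply_tensor_power`) are illustrative. A row vector multiplied on the right by $(H^{-1})^{\otimes 2}$ is handled by applying the transpose of $H^{-1}$ to the corresponding column vector; since $H^{-1} = \frac{1}{2} H$ is symmetric, the transpose is $H^{-1}$ itself.

```python
from fractions import Fraction


def sym_to_vec(sig):
    """Expand a symmetric signature [s_0..s_n] to a vector of length 2^n."""
    n = len(sig) - 1
    return [sig[bin(x).count("1")] for x in range(2 ** n)]


def vec_to_sym(vec, n):
    """Compress a symmetric vector of length 2^n back to [s_0..s_n]."""
    sym = [None] * (n + 1)
    for x, val in enumerate(vec):
        w = bin(x).count("1")
        if sym[w] is None:
            sym[w] = val
        assert sym[w] == val, "vector is not symmetric"
    return sym


def apply_tensor_power(M, vec, n):
    """Apply the n-fold tensor power of the 2x2 matrix M to a column vector."""
    for axis in range(n):
        out = [0] * len(vec)
        for x, val in enumerate(vec):
            bit = (x >> axis) & 1
            for b in (0, 1):
                y = (x & ~(1 << axis)) | (b << axis)
                out[y] += M[b][bit] * val
        vec = out
    return vec


H = [[1, 1], [1, -1]]
half = Fraction(1, 2)
H_INV_T = [[half, half], [half, -half]]   # (H^{-1})^T = H / 2

gen3 = vec_to_sym(apply_tensor_power(H, sym_to_vec([0, 1, 1, 0]), 3), 3)
gen2 = vec_to_sym(apply_tensor_power(H, sym_to_vec([0, 1, 0]), 2), 2)
rec2 = vec_to_sym(apply_tensor_power(H_INV_T, sym_to_vec([0, 1, 0]), 2), 2)
print(gen3)  # [6, 0, -2, 0]
print(gen2)  # [2, 0, -2]
print(rec2 == [half, 0, -half])  # True
```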

Hence, #PL-3-NAE-ICE is precisely the following 
Holant problem on planar graphs: 

$$\mathrm{Holant}([0, 1, 0] \mid [0, 1, 0], [0, 1, 1, 0]) \equiv_T \mathrm{Holant}(\tfrac{1}{2}[1, 0, -1] \mid [2, 0, -2], [6, 0, -2, 0]).$$

Now we may replace each signature $\tfrac{1}{2}[1, 0, -1]$, 
$[2, 0, -2]$, and $[6, 0, -2, 0]$ in $\Omega$ by its corresponding 
matchgate, and then we can compute 
Holant$_\Omega$ in polynomial time by Kasteleyn’s algorithm. 

The next problem is a satisfiability problem. 
#PL-3-NAE-SAT 


INPUT: A planar formula $\Phi$ consisting of a conjunction 
of NAE clauses each of size 3. 

OUTPUT: The number of satisfying assignments 
of $\Phi$. 


This is a variant of 3SAT. A Boolean for- 
mula is planar if it can be represented by a 
planar graph where vertices represent variables 
and clauses, and there is an edge iff the vari- 
able or its negation appears in that clause. The 
SAT problem is when the gate for each clause 
is the Boolean OR. When SAT is restricted to 
planar formulae, it is still NP-complete, and its 
corresponding counting problem is #P-complete. 
Moreover, for many connectives other than NAE 
(e.g., EXACTLY ONE), the unrestricted or the 
planar decision problems are still NP-complete, 
and the corresponding counting problems are #P- 
complete. 

We design a signature grid as follows: To each 
NAE clause, we assign a generator with signature 
[0, 1, 1, 0]. To each Boolean variable, we assign 
a generator with signature $(=_k)$, where $k$ is the 
number of clauses in which the variable appears, either 
negated or unnegated. Further, if a variable occurrence 
is negated, we have a recognizer [0, 1, 0] 
along the edge that joins the variable generator 
and the NAE generator, and if the variable occurrence 
is unnegated, then we use a recognizer 
[1, 0, 1] instead. Under a holographic transformation 
using $H$, $(=_k)$ is transformed to 

$$H^{\otimes k}(=_k) = 2[1, 0, 1, 0, \ldots].$$

It can be verified that all the signatures used 
satisfy all matchgate identities and thus can be 
realized by matchgates under the holographic 
transformation. 
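
The transformed EQUALITY signature can be derived in the same way as the NAE signatures in the previous problem, by writing $(=_k)$ as a sum of two tensor powers. The entry of Hamming weight $w$ in the result is $1 + (-1)^w$:

```latex
\begin{aligned}
H^{\otimes k}(=_k)
  &= H^{\otimes k}\left\{
       \begin{pmatrix} 1 \\ 0 \end{pmatrix}^{\otimes k}
     + \begin{pmatrix} 0 \\ 1 \end{pmatrix}^{\otimes k}
     \right\} \\
  &= \begin{pmatrix} 1 \\ 1 \end{pmatrix}^{\otimes k}
   + \begin{pmatrix} 1 \\ -1 \end{pmatrix}^{\otimes k} \\
  &= [\,1 + 1,\; 1 - 1,\; 1 + 1,\; \ldots,\; 1 + (-1)^k\,]
   = 2\,[1, 0, 1, 0, \ldots].
\end{aligned}
```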
Problem Definition 


An instance $I$ of the Hospitals/Residents problem 
(HR) [6, 7, 18] involves a set $R = \{r_1, \ldots, r_n\}$ 
of residents and a set $H = \{h_1, \ldots, h_m\}$ of 
hospitals. Each hospital $h_j \in H$ has a positive 
integral capacity, denoted by $c_j$. Also, each 
resident $r_i \in R$ has a preference list in which 
he ranks in strict order a subset of $H$. A pair 
$(r_i, h_j) \in R \times H$ is said to be acceptable if 
$h_j$ appears in $r_i$’s preference list; in this case 
$r_i$ is said to find $h_j$ acceptable. Similarly each 
hospital $h_j \in H$ has a preference list in which 
it ranks in strict order those residents who find 
$h_j$ acceptable. Given any three agents $x, y, z \in R \cup H$, 
$x$ is said to prefer $y$ to $z$ if $x$ finds each 
of $y$ and $z$ acceptable, and $y$ precedes $z$ on $x$’s 
preference list. Let $C = \sum_{h_j \in H} c_j$. 

Let $A$ denote the set of acceptable pairs in $I$, 
and let $L = |A|$. An assignment $M$ is a subset 
of $A$. If $(r_i, h_j) \in M$, $r_i$ is said to be assigned 
to $h_j$, and $h_j$ is assigned $r_i$. For each $q \in R \cup H$, 
the set of assignees of $q$ in $M$ is denoted by 
$M(q)$. If $r_i \in R$ and $M(r_i) = \emptyset$, $r_i$ is said to be 
unassigned; otherwise $r_i$ is assigned. Similarly, 
any hospital $h_j \in H$ is under-subscribed, full, 
or over-subscribed according as $|M(h_j)|$ is less 
than, equal to, or greater than $c_j$, respectively. 

A matching $M$ is an assignment such that 
$|M(r_i)| \le 1$ for each $r_i \in R$ and $|M(h_j)| \le c_j$ 
for each $h_j \in H$ (i.e., no resident is assigned 
to an unacceptable hospital, each resident is assigned 
to at most one hospital, and no hospital 
is over-subscribed). For notational convenience, 
given a matching $M$ and a resident $r_i \in R$ such 
that $M(r_i) \ne \emptyset$, where there is no ambiguity, the 
notation $M(r_i)$ is also used to refer to the single 
member of $M(r_i)$. 

A pair $(r_i, h_j) \in A \setminus M$ blocks a matching 
$M$, or is a blocking pair for $M$, if the following 
conditions are satisfied relative to $M$: 


1. $r_i$ is unassigned or prefers $h_j$ to $M(r_i)$; 
2. $h_j$ is under-subscribed or prefers $r_i$ to at least 
one member of $M(h_j)$ (or both). 


A matching $M$ is said to be stable if it admits 
no blocking pair. Given an instance $I$ of HR, the 
problem is to find a stable matching in $I$. 
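
The two blocking conditions translate directly into a stability check. The following is a minimal sketch under stated assumptions: preference lists are Python lists ordered from most to least preferred, the lists are consistent (a hospital ranks exactly the residents who find it acceptable), `matching` is a valid matching given as a dictionary from assigned residents to hospitals, and the function name `blocking_pairs` is illustrative.

```python
def blocking_pairs(res_prefs, hosp_prefs, capacity, matching):
    """Return all acceptable pairs (r, h) that block the given matching."""
    assigned = {h: [] for h in hosp_prefs}          # M(h) for each hospital
    for r, h in matching.items():
        assigned[h].append(r)

    blocks = []
    for r, prefs in res_prefs.items():
        for h in prefs:
            if matching.get(r) == h:
                continue                            # (r, h) is in M
            # Condition 1: r is unassigned or prefers h to M(r).
            r_ok = (r not in matching
                    or prefs.index(h) < prefs.index(matching[r]))
            # Condition 2: h is under-subscribed or prefers r to
            # at least one member of M(h).
            rank = hosp_prefs[h].index
            h_ok = (len(assigned[h]) < capacity[h]
                    or any(rank(r) < rank(s) for s in assigned[h]))
            if r_ok and h_ok:
                blocks.append((r, h))
    return blocks


res_prefs = {"r1": ["h1", "h2"], "r2": ["h1", "h2"], "r3": ["h1"]}
hosp_prefs = {"h1": ["r2", "r1", "r3"], "h2": ["r1", "r2"]}
capacity = {"h1": 1, "h2": 1}

print(blocking_pairs(res_prefs, hosp_prefs, capacity,
                     {"r1": "h2", "r2": "h1"}))   # [] (stable)
print(blocking_pairs(res_prefs, hosp_prefs, capacity,
                     {"r1": "h1", "r2": "h2"}))   # [('r2', 'h1')]
```

In the second matching, $r_2$ prefers $h_1$ to its own hospital and $h_1$ prefers $r_2$ to its assignee $r_1$, so the pair blocks.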


Key Results 


HR was first defined by Gale and Shapley [6] 
under the name “College Admissions Problem.” 
In their seminal paper, the authors’ primary 
consideration is the classical Stable Marriage 
problem (SM; see entries Stable Marriage and 
Optimal Stable Marriage), which is a special 
case of HR in which $n = m$, $A = R \times H$, 
and $c_j = 1$ for all $h_j \in H$; in this case, 
the residents and hospitals are more commonly 
referred to as the men and women, respectively. 
Gale and Shapley showed that every instance 
$I$ of HR admits at least one stable matching. 
Their proof of this result is constructive, i.e., an 
algorithm for finding a stable matching in $I$ is 
described. This algorithm has become known as 
the Gale/Shapley algorithm. 

An extended version of the Gale/Shapley 
algorithm for HR is shown in Fig. 1. The 
algorithm involves a sequence of apply and delete 
operations. At each iteration of the while loop, 
some unassigned resident $r_i$ with a nonempty 
preference list applies to the first hospital $h_j$ 
on his list and becomes provisionally assigned 
to $h_j$ (this assignment could subsequently be 
broken). If $h_j$ becomes over-subscribed as a 
result of this assignment, then $h_j$ rejects its 
worst assigned resident $r_k$. Next, if $h_j$ is full 
(irrespective of whether $h_j$ was over-subscribed 
earlier in the same loop iteration), then for each 
resident $r_l$ that $h_j$ finds less desirable than its 
worst assigned resident $r_k$, the algorithm deletes 
the pair $(r_l, h_j)$, which comprises deleting $h_j$ 
from $r_l$’s preference list and vice versa. 

Given that the above algorithm involves resi- 
dents applying to hospitals, it has become known 
as the Resident-oriented Gale/Shapley algorithm, 
or RGS algorithm for short [7, Section 1.6.3]. The 
RGS algorithm terminates with a stable match- 
ing, given an instance of HR [6] [7, Theorem 
1.6.2]. Using a suitable choice of data structures 
(extending those described in [7, Section 1.2.3]), 
the RGS algorithm can be implemented to run in 
O(L) time. This algorithm produces the unique 
stable matching that is simultaneously best possi- 
ble for all residents [6] [7, Theorem 1.6.2]. These 
observations may be summarized as follows: 


Theorem 1 Given an instance of HR, the RGS 
algorithm constructs, in O(L) time, the unique 
stable matching in which each assigned resident 
obtains the best hospital that he could obtain 
in any stable matching, while each unassigned 
resident is unassigned in every stable matching. 


A counterpart of the RGS algorithm, known as 
the Hospital-oriented Gale/Shapley algorithm, or 
HGS algorithm for short [7, Section 1.6.2], gives 
the unique stable matching that similarly satisfies 
an optimality property for the hospitals [7, Theo- 
rem 1.6.1]. 

Although there may be many stable matchings 
for a given instance $I$ of HR, some key structural 
properties hold regarding unassigned residents 
and under-subscribed hospitals with respect to all 
stable matchings in $I$, as follows. 


Theorem 2 For a given instance of HR: 


• The same residents are assigned in all stable 
matchings; 

• Each hospital is assigned the same number of 
residents in all stable matchings; 

• Any hospital that is under-subscribed in one 
stable matching is assigned exactly the same 
set of residents in all stable matchings. 


These results are collectively known as the “Rural 
Hospitals Theorem” (see [7, Section 1.6.4] for 
further details). Furthermore, the set of stable 
matchings in $I$ forms a distributive lattice under 
a natural dominance relation [7, Section 1.6.5]. 
M := ∅; 
while (some resident r_i is unassigned and r_i has a nonempty list) { 
    h_j := first hospital on r_i’s list; 
    /* r_i applies to h_j */ 
    M := M ∪ {(r_i, h_j)}; 
    if (h_j is over-subscribed) { 
        r_k := worst resident in M(h_j) according to h_j’s list; 
        M := M \ {(r_k, h_j)}; 
    } 
    if (h_j is full) { 
        r_k := worst resident in M(h_j) according to h_j’s list; 
        for (each successor r_l of r_k on h_j’s list) 
            delete the pair (r_l, h_j); 
    } 
} 

Hospitals/Residents Problem, Fig. 1 Gale/Shapley algorithm for HR 
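
The pseudocode of Fig. 1 can be turned into a short runnable sketch. The version below is illustrative rather than the O(L) implementation referred to in the text: it uses plain Python lists, so some steps take time linear in the length of a list. It assumes consistent preference lists (each hospital ranks exactly the residents who find it acceptable); the function name and data layout are assumptions.

```python
def resident_oriented_gale_shapley(res_prefs, hosp_prefs, capacity):
    """Return a dict resident -> hospital following the RGS algorithm of Fig. 1."""
    # Work on copies so that pair deletions do not change the caller's lists.
    r_list = {r: list(p) for r, p in res_prefs.items()}
    h_list = {h: list(p) for h, p in hosp_prefs.items()}
    rank = {h: {r: i for i, r in enumerate(p)} for h, p in hosp_prefs.items()}
    assigned = {h: set() for h in hosp_prefs}   # M(h)
    match = {}                                  # M(r) for assigned residents
    free = list(res_prefs)                      # residents still to be processed

    while free:
        r = free.pop()
        if r in match or not r_list[r]:
            continue                            # assigned, or list is empty
        h = r_list[r][0]                        # r applies to the first hospital
        assigned[h].add(r)
        match[r] = h
        if len(assigned[h]) > capacity[h]:      # h is over-subscribed
            worst = max(assigned[h], key=rank[h].__getitem__)
            assigned[h].remove(worst)
            del match[worst]
            free.append(worst)
        if len(assigned[h]) == capacity[h]:     # h is full
            worst = max(assigned[h], key=rank[h].__getitem__)
            cut = h_list[h].index(worst) + 1
            for successor in h_list[h][cut:]:   # delete the pair (successor, h)
                r_list[successor].remove(h)
            del h_list[h][cut:]
    return match


res_prefs = {"r1": ["h1", "h2"], "r2": ["h1", "h2"], "r3": ["h1"]}
hosp_prefs = {"h1": ["r2", "r1", "r3"], "h2": ["r1", "r2"]}
capacity = {"h1": 1, "h2": 1}
print(resident_oriented_gale_shapley(res_prefs, hosp_prefs, capacity))
```

On this instance $r_2$ is assigned to $h_1$, $r_1$ to $h_2$, and $r_3$ remains unassigned. A rejected resident is always a successor of the hospital's new worst assignee, so the deletion step removes that hospital from the resident's list before the resident applies again.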


Applications 


Practical applications of HR are widespread, 
most notably arising in the context of centralized 
automated matching schemes that assign 
applicants to posts (e.g., medical students to 
hospitals, school leavers to universities, and 
primary school pupils to secondary schools). 
Perhaps the largest and best-known example 
of such a scheme is the National Resident 
Matching Program (NRMP) in the USA [8], 
which annually assigns around 31,000 graduating 
medical students (known as residents) to their 
first hospital posts, taking into account the 
preferences of residents over hospitals and vice 
versa and the hospital capacities. Counterparts of 
the NRMP are in existence in other countries, 
including Canada [9] and Japan [10]. These 
matching schemes essentially employ extensions 
of the RGS algorithm for HR. 

Centralized matching schemes based largely 
on HR also occur in other practical contexts, such 
as school placement in New York [1], university 
faculty recruitment in France [3], and university 
admission in Spain [16]. Further applications are 
described in [15, Section 1.3.7]. 

Indeed, the Nobel Prize in Economic Sci- 
ences was awarded in 2012 to Alvin Roth and 
Lloyd Shapley, partly for their theoretical work 
on HR and its variants [6, 18] and partly for 
their contribution to the widespread deployment 


of algorithms for HR in practical settings such as 
junior doctor allocation as noted above. 


Extensions of HR 


One key extension of HR that has considerable 
practical importance arises when an instance may 
involve a set of couples, each of which submits 
a joint preference list over pairs of hospitals 
(typically in order that the members of the cou- 
ple can be located geographically close to one 
another). The extension of HR in which couples 
may be involved is denoted by HRC; the stability 
definition in HRC is a natural extension of that in 
HR (see [15, Section 5.3] for a formal definition 
of HRC). It is known that an instance of HRC 
need not admit a stable matching (see [4]). More- 
over, the problem of deciding whether an HRC 
instance admits a stable matching is NP-complete 
[17]. 

HR may be regarded as a many-one general- 
ization of SM. A further generalization of SM 
is to a many-many stable matching problem, 
in which both residents and hospitals may be 
multiply assigned subject to capacity constraints. 
In this case, residents and hospitals are more 
commonly referred to as workers and firms, re- 
spectively. There are two basic variations of the 
many-many stable matching problem according 
to whether workers rank (i) individual acceptable 
firms in order of preference and vice versa or (ii) 
acceptable subsets of firms in order of preference 
and vice versa. Previous work relating to both 
models is surveyed in [15, Section 5.4]. 

Other variants of HR may be obtained if pref- 
erence lists include ties. This extension is again 
important from a practical perspective, since it 
may be unrealistic to expect a popular hospital to 
rank a large number of applicants in strict order, 
particularly if it is indifferent among groups of 
applicants. The extension of HR in which pref- 
erence lists may include ties is denoted by HRT. 
In this context three natural stability definitions 
arise, the so-called weak stability, strong stability, 
and super-stability (see [15, Section 1.3.5] for 
formal definitions of these concepts). Given an 
instance I of HRT, it is known that weakly 
stable matchings may have different sizes, and 
the problem of finding a maximum cardinality 
weakly stable matching is NP-hard (see entry 
Stable Marriage with Ties and Incomplete Lists 
for further details). On the other hand, in contrast 
to the case for weak stability, a super-stable 
matching in I need not exist, though there is an 
O(L) algorithm to find such a matching if one 
exists [11]. Analogous results hold in the case of 
strong stability: in this case, an O(L^2) algorithm 
[13] was improved by an O(CL) algorithm 
[14] and extended to the many-many case [5]. 
Furthermore, counterparts of the Rural Hospitals 
Theorem hold for HRT under each of the super- 
stability and strong stability criteria [11,19]. 

A further generalization of HR arises when 
each hospital may be split into several depart- 
ments, where each department has a capacity, 
and residents rank individual departments in or- 
der of preference. This variant is modeled by 
the Student-Project Allocation problem [15, Section 
5.5]. Finally, the Hospitals/Residents problem 
under Social Stability [2] is an extension 
of HR in which an instance is augmented by 
a social network graph G (a bipartite graph 
whose vertices correspond to residents and hos- 
pitals and whose edges form a subset of A) such 
that a blocking pair must additionally satisfy the 
property that it forms an edge of G. Edges in 
G correspond to resident—hospital pairs that are 
acquainted with one another and therefore more 
likely to block a matching in practice. 


Open Problems 


As noted in Section “Applications,” ties in the 
hospitals’ preference lists may arise naturally in 
practical applications. In an HRT instance, weak 
stability is the most commonly-studied stability 
criterion, due to the guaranteed existence of such 
a matching. Attempting to match as many resi- 
dents as possible motivates the search for large 
weakly stable matchings. Several approximation 
algorithms for finding a maximum cardinality 
weakly stable matching have been formulated 
(see Stable Marriage with Ties and Incomplete 
Lists and [15, Section 3.2.6] for further details). 
It remains open to find tighter upper and lower 
bounds for the approximability of this problem. 


URL to Code 


Ada implementations of the RGS and HGS 
algorithms for HR may be found via the 
following URL: 
http://www.dcs.gla.ac.uk/research/algorithms/stable 
Problem Definition 


The Hospitals/Residents (HR) problem is the 
many-to-one version of the stable marriage problem 
introduced by Gale and Shapley. In this problem, 
a bipartite graph G = (R ∪ H, E) is given. 
Each vertex in H represents a hospital and each 
vertex in R a resident. Each vertex has a preference 
order over its neighboring vertices. Each hospital 
h has an upper quota u(h) specifying the maximum 
number of residents it can take in a matching. 
The goal is to find a stable matching while 
respecting the upper quotas of the hospitals. 

The original HR has been well studied in the 
past decades. A recent trend is to assume that 
each hospital h also comes with a lower quota 
l(h). In this context, it is required (if possible) 
that a matching satisfies both the upper and the 
lower quotas of each hospital. The introduction 
of such lower quotas is to enforce some policy 
in hiring or to make the outcome more fair. It 
is well-known that hospitals in some rural areas 
suffer from the shortage of doctors. 

With the lower quotas, the definition of sta- 
bility in HR and the objective of the problem 
depend on the applications. Below we summarize 
three variants that have been considered in the 
literature. 


Minimizing the Number of Blocking Pairs 

In this variant, a matching M is feasible if, for 
each hospital h, l(h) ≤ |M(h)| ≤ u(h). Given 
a feasible matching, a resident r and a hospital 
h form a blocking pair if the following conditions 
hold: (i) (r, h) ∈ E \ M, (ii) r is unassigned in 
M or r prefers h to his assignment M(r), and 
(iii) |M(h)| < u(h) or h prefers r to one of its 
assigned residents. A matching is stable if the 
number of blocking pairs is 0. It is straightfor- 
ward to check whether a stable matching exists. 
We assume that the given instance has no stable 
matching and the objective is to find a matching 
with the minimum number of blocking pairs. We 
call this problem Min-BP HR. An alternative 
objective is to minimize the number of residents 
that are part of a blocking pair in a matching. We 
call this problem Min-BR HR. 
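
The three blocking-pair conditions can be checked mechanically. The Python sketch below is illustrative only: the function name `blocking_pairs` and the dictionary-based input format are assumptions made for this example and do not come from the cited papers. It enumerates the blocking pairs of a given matching, so the Min-BP HR objective is the length of the returned list and the Min-BR HR objective is the number of distinct residents appearing in it. It does not check feasibility with respect to the lower quotas.

```python
def blocking_pairs(edges, res_pref, hosp_pref, upper, matching):
    """Return the list of blocking pairs (r, h) of `matching`.

    edges:     set of acceptable (resident, hospital) pairs, i.e. E
    res_pref:  resident -> list of hospitals, most preferred first
    hosp_pref: hospital -> list of residents, most preferred first
    upper:     hospital -> upper quota u(h)
    matching:  resident -> assigned hospital (unassigned residents absent)
    """
    assigned = {h: [] for h in hosp_pref}
    for r, h in matching.items():
        assigned[h].append(r)

    result = []
    for r, h in sorted(edges):
        current = matching.get(r)
        if current == h:
            continue  # condition (i): the pair must lie in E \ M
        # condition (ii): r is unassigned or prefers h to M(r)
        r_wants = current is None or (
            res_pref[r].index(h) < res_pref[r].index(current))
        # condition (iii): h has a free place or prefers r to an assignee
        rank = hosp_pref[h].index
        h_wants = len(assigned[h]) < upper[h] or any(
            rank(r) < rank(x) for x in assigned[h])
        if r_wants and h_wants:
            result.append((r, h))
    return result
```

A matching is stable in the sense above exactly when this function returns an empty list.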


HR with the Option of Closing a Hospital 

The following variation of HR is motivated by 
the higher education system in Hungary. Instead 
of requiring all hospitals to have enough residents 
to meet their lower quotas, it is allowed that a 
hospital be closed as long as there is not too much 
demand for it. 

Precisely, in this variant, a matching M is feasible 
if, for each hospital h, |M(h)| = 0 or l(h) ≤ 
|M(h)| ≤ u(h). In the former case, the hospital 
is closed; in the latter case, it is opened. 
A feasible matching M is stable if 


1. There is no opened hospital h and resident r 
so that (i) (r, h) ∈ E \ M, (ii) r is unassigned 
in M or r prefers h to his assignment M(r), 
and (iii) |M(h)| < u(h) or h prefers r to one 
of its assigned residents; 
2. There is no closed hospital h and set R' ⊆ R 
of residents so that (i) |R'| ≥ l(h), (ii) for 
each r ∈ R', (r, h) ∈ E \ M, and (iii) each 
resident r ∈ R' is either unassigned or prefers 
h to his assigned hospital M(r). 


With the above definition of stability, we refer 
to the question of the existence of a stable match- 
ing as HR woCH. 


Classified HR 

Motivated by the practice in academic hiring, 
Huang introduced a more generalized variant of 
HR. In this variant, a hospital h has a classification 
C_h over its neighboring residents. Each class 
C ∈ C_h comes with an upper quota u(C) and a 
lower quota l(C). A matching M is feasible if, 
for each hospital h and for each of its classes 
C ∈ C_h, l(C) ≤ |M(h) ∩ C| ≤ u(C). A feasible 
matching M is stable if the following condition 
holds: there is no hospital h such that 


1. There exists a resident r so that (r, h) ∈ E \ M, 
and r is either unassigned in M or r prefers h 
to his assignment M(r); 

2. For every class C ∈ C_h, 
l(C) ≤ |(M(h) ∪ {r}) ∩ C| ≤ u(C), or there exists 
another resident r' ∈ M(h) so that h prefers r 
to r' and for every class C ∈ C_h, 
l(C) ≤ |((M(h) ∪ {r}) \ {r'}) ∩ C| ≤ u(C). 


With the above definition of stability, we refer 
to the question of the existence of a stable match- 
ing as CHR. 


Key Results 


For the first variant where the objective is to 
minimize the number of blocking pairs, Hamada 
et al. showed the following tight results. 


Theorem 1 ([3]) For any positive constant ε > 
0, there is no polynomial-time (|R| + |H|)^(1−ε)-
approximation algorithm for Min-BP HR unless 
P = NP. This holds true even if the given bipartite 
graph is complete and all upper quotas are 1 and 
all lower quotas are 0 or 1. 

Theorem 2 ([3]) There is a polynomial-time 
(|R| + |H|)-approximation algorithm for 
Min-BP HR. 


In the case that the objective is to minimize the 
number of residents involved in blocking pairs, 
Hamada et al. showed the following. 


Theorem 3 ([3]) Min-BR HR is NP-hard. This 
holds true even if the given bipartite graph is 
complete and all hospitals have the same prefer- 
ence over the residents. 


Theorem 4 ([3]) There is a polynomial-time 
√|R|-approximation algorithm for Min-BR HR. 


For the second variant, where a hospital is 
allowed to be closed, Bird et al. showed the 
following. 


Theorem 5 ([1]) The problem HR woCH is 
NP-complete. This holds true even if all upper quotas 
are at most 3. 


For the last variant where each hospital is 
allowed to classify the neighboring residents and 
sets the upper and lower quotas for each of its 
classes, Huang showed that if all classifications 
of the hospitals are laminar families, the problem 
is in P. Fleiner and Kamiyama later proved the 
same result by a significantly simpler matroid- 
based technique. 


Theorem 6 ([2, 4]) In CHR, if all classifications 
of the hospitals are laminar families, then one 
can find a stable matching or detect its absence in 
the given instance in O(nm) time, where n = 
|R ∪ H| and m = |E|. 
Problem Definition 


Given a directed graph G = (V, A) (with n = 
|V| and m = |A|) with a length function ℓ : A → 
R^+ and a pair of vertices s, t, a distance oracle 
returns the distance dist(s, t) from s to t. A labeling 
algorithm [18] implements distance oracles 
in two stages. The preprocessing stage computes 
a label for each vertex of the input graph. Then, 
given s and t, the query stage computes dist(s, t) 
using only the labels of s and t; the query does 
not explicitly use G and ℓ. 

Hub Labeling (2-Hop Labeling), Fig. 1 Example of a 
hub labeling. The hubs of s are circles; the hubs of t are 
crosses (Taken from [3]) 

Hub labeling (HL) (or 2-hop labeling) is a 
special kind of labeling algorithm. The label 
L(v) of a vertex v consists of two parts: the 
forward label L_f(v) is a collection of vertices w 
with their distances dist(v, w) from v, while the 
backward label L_b(v) is a collection of vertices u 
with their distances dist(u, v) to v. (If the graph is 
undirected, a single label per vertex suffices.) The 
vertices in v's label are the hubs of v. The labels 
must obey the cover property: for any two vertices 
s and t, the set L_f(s) ∩ L_b(t) must contain 
at least one hub that is on a shortest s-t path. 
Given the labels, HL queries are straightforward: 
to find dist(s, t), simply find the hub x ∈ L_f(s) ∩ 
L_b(t) that minimizes dist(s, x) + dist(x, t) (see 
Fig. 1 for an example). If the hubs in each label 
are sorted by ID, queries consist of a simple linear 
sweep over the labels, as in mergesort. 
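
The linear sweep can be sketched in a few lines of Python. This is a minimal illustration, assuming each label is stored as a list of (hub ID, distance) pairs sorted by hub ID; the function name `hl_query` and this storage layout are choices made for the example, not prescribed by the cited work.

```python
import math


def hl_query(forward_label, backward_label):
    """Return dist(s, t) from the forward label of s and backward label of t.

    forward_label:  list of (hub_id, dist(s, hub)) sorted by hub_id
    backward_label: list of (hub_id, dist(hub, t)) sorted by hub_id
    Returns math.inf if the two labels share no hub.
    """
    i = j = 0
    best = math.inf
    while i < len(forward_label) and j < len(backward_label):
        hub_s, d_s = forward_label[i]
        hub_t, d_t = backward_label[j]
        if hub_s == hub_t:
            # Common hub: candidate distance through it.
            best = min(best, d_s + d_t)
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return best
```

Each step advances at least one of the two indices, so the query touches every label entry at most once.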

The size of a forward (backward) label, 
|L_f(v)| (|L_b(v)|), is the number of hubs it contains. 
The size of a labeling L is the sum of the 
average label sizes, (|L_f(v)| + |L_b(v)|)/2, over all 
vertices. The memory footprint of the algorithm 
is proportional to the size of the labeling, while 
query times are determined by the maximum 
label size. Queries themselves are trivial; the 
hard part is an efficient implementation of a 
preprocessing algorithm that, given G and ℓ, 
computes a small hub labeling. 


Key Results 


We describe an approximation algorithm for find- 
ing labelings of size within O(logn) of the op- 
timal [9], as well as its generalization to other 
objectives, including the maximum label size [6]. 
Although polynomial, these approximation al- 
gorithms do not scale to large networks. For 
more practical alternatives, we discuss hierarchical 
hub labelings (HHLs), a subclass of HL. 
We show that HHLs are closely related to vertex 
orderings and present efficient algorithms 
for computing the minimal HHL for a given 
ordering, as well as heuristics for finding vertex 
orderings that lead to small labels. In particular, 
the RXL algorithm uses sampling to efficiently 
approximate a greedy vertex order, leading to em- 
pirically small labels. RXL can handle large prob- 
lems from several application domains. We then 
discuss representations of hub labels that allow 
various trade-offs between space and query time. 


General Hub Labelings 

The time and space efficiency of the distance ora- 
cles we discuss depend on the label size. If labels 
are big, HL is impractical. Gavoille et al. [15] 
show that there exist graphs for which general 
labelings must have size Ω(n^2). For planar graphs, 
they give an Ω(n^(4/3)) lower and O(n^(3/2)) upper 
bound. They also show that graphs with 
k-separators have hub labelings of size O(nk). 
Abraham et al. [1] show that graphs with small 
highway dimension (which they conjecture in- 
clude road networks) have small hub labelings. 

Given a particular graph, computing a labeling 
with the smallest size is NP-hard. Cohen et al. [9] 
developed an O(log n)-approximation algorithm 
for the problem. Next we discuss this general HL 
(GHL) algorithm. 

A partial labeling is a labeling that does not 
necessarily satisfy the cover property. Given a 
partial labeling L = (L_f, L_b), we say that a 
vertex pair [u, w] is covered if L_f(u) ∩ L_b(w) 
contains a vertex on a shortest path from u to 
w and uncovered otherwise. GHL maintains a 
partial labeling L (initially empty) and the 
corresponding set U of uncovered vertex pairs. Each 
iteration of the algorithm selects a vertex v and 
two subsets X', Y' ⊆ V, adds (v, dist(x, v)) to 
L_f(x) for all x ∈ X', and adds (v, dist(v, y)) 
to L_b(y) for all y ∈ Y'. Then, GHL deletes 
from U the set U(v, X', Y') of vertex pairs that 
become covered by this augmentation. Among all 
v ∈ V and X', Y' ⊆ V, the triple (v, X', Y') 
picked in each iteration is one that maximizes 
|U(v, X', Y')| / (|X'| + |Y'|), i.e., the ratio of the 
number of pairs covered to the increase in label 
size. 
Cohen et al.'s efficient implementation of 
GHL uses the notion of center graphs. Given a 
set U of vertex pairs and a vertex v, the center 
graph G_v = (X, Y, A_v) is a bipartite graph with 
X = Y = V such that there is an arc (u, w) ∈ A_v 
if [u, w] ∈ U and some shortest path from 
u to w in G goes through v. If U is the set of 
uncovered vertex pairs, then, for a fixed vertex v, 
maximizing |U(v, X', Y')| / (|X'| + |Y'|) over 
all X', Y' ⊆ V is (by definition) the same 
as finding the vertex-induced subgraph of G_v 
with maximum density (defined as its number 
of arcs divided by its number of vertices). This 
maximum density subgraph (MDS) problem can 
be solved in polynomial time using parametric 
flows (see, e.g., [14]). To maximize the ratio 
over all triples (v, X', Y'), GHL solves an 
MDS problem for each of the n center graphs G_v 
and picks the densest of the n resulting subgraphs. It 
then adds the corresponding vertex v* to the 
labels of the vertices given by the sides of the 
MDS. Arcs corresponding to newly covered 
pairs are removed from center graphs between 
iterations. 

Cohen et al. show that GHL is a special case 
of the greedy set cover algorithm [8] and thus 
gives a labeling within an O(log n) factor of optimal. 
They also show that the same guarantee holds if one uses 
a constant-factor approximation to the MDS. We 
refer to a k-approximation of MDS as a k-AMDS. 
Using a linear-time 2-AMDS algorithm 
by Kortsarz and Peleg [17], each GHL iteration is 
dominated by n AMDS computations on graphs 
with O(n^2) arcs. Since each iteration increases 
the size of the labeling, the number of iterations 
is O(n^2). The total running time of GHL 
is thus O(n^5). 

Delling et al. [11] improve the time bound 
for GHL to O(n^3 log n) using eager and lazy 
evaluation. Intuitively, eager evaluation finds an 
AMDS G' of G such that deleting G' reduces 
the MDS value of G by a constant factor. More 
precisely, given a graph G, an upper bound μ 
on the MDS value of G, and a parameter α > 
1, α-eager evaluation attempts to find a (2α)-AMDS 
G' of G such that the MDS value of G 
with the arcs of G' deleted is at most μ/α. If 
the evaluation fails to find such a G', the MDS 
value of G is at most μ/α. Lazy evaluation was 
introduced by Cohen et al. [9] to speed up their 
implementation of GHL and refined by Stengel 
et al. [20]. It is based on the observation that the 
MDS value of a center graph does not increase as 
the algorithm adds vertices to labels and removes 
arcs from center graphs. 

The eager-lazy algorithm maintains upper 
bounds μ_v on the center subgraph densities 
computed in previous iterations. These values 
are computed during initialization and updated in 
a lazy fashion as follows. In each iteration, the 
algorithm picks the maximum μ_v and applies 
α-eager evaluation to G_v. If the evaluation 
succeeds, the labels are updated. Regardless of 
whether the evaluation succeeds or not, μ_v/α is 
a valid upper bound on the density of G_v at the 
end of the iteration. This can be used to show that 
each vertex is selected in O(log n) iterations, 
each taking O(n^2) time. 

Babenko et al. [6] generalize the definition of 
a labeling size as follows. Suppose vertex IDs are 
1, 2, ..., n. Define a (2n)-dimensional vector L 
by L_(2i−1) = |L_f(i)| and L_(2i) = |L_b(i)|. The 
p-norm of L is defined as 
||L||_p = (Σ_{i=1}^{2n} L_i^p)^(1/p), 
where p is a natural number, and ||L||_∞ = 
max_i L_i. Note that ||L||_1 / 2 is the total size 
of the labeling and ||L||_∞ is the maximum 
label size. Babenko et al. [6] generalize the 
algorithm of Cohen et al. to obtain an O(log n)-
approximation algorithm for this more general 
problem in O(n^5) time. Delling et al. [11] show 
that the eager-lazy approach yields an O(log n)-
approximation algorithm running in time 
O(n^3 log n min(p, log n)). 
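
As a small worked illustration of the p-norm objective, the Python sketch below evaluates it for a given labeling. The function name `label_norm` and the representation of labels as one container per vertex are assumptions made for this example only.

```python
import math


def label_norm(forward, backward, p=1):
    """Return the p-norm of the vector of label sizes.

    forward[i] / backward[i] hold the hubs of vertex i (any sized
    container). Pass p=math.inf for the maximum label size.
    """
    sizes = [len(label) for label in forward]
    sizes += [len(label) for label in backward]
    if p == math.inf:
        return max(sizes)
    return sum(size ** p for size in sizes) ** (1.0 / p)
```

With p = 1 the result is the total number of stored hubs (twice the labeling size as defined earlier), and with p = math.inf it is the maximum label size, which governs the worst-case query time.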


Hierarchical Hub Labelings 

Even with the performance improvements men- 
tioned above, GHL requires too much time and 
space to work on large networks. To overcome 
this problem, one may use heuristics that have 
no known theoretical guarantees on the label size 
but produce small labels for large instances from 
a wide variety of domains. The most successful 
current heuristics use a restricted class of label- 
ings called hierarchical hub labeling (HHL) [4]. 
Hierarchical labels have the cover property and 
implement exact distance oracles. 



Given a labeling, let v ⪯ w if w is a hub 
of L(v). HL is hierarchical if ⪯ is a partial order. 
(Intuitively, v ⪯ w if w is “more important” 
than v.) We say that an HHL respects a given 
(total) order on the vertices if the partial order 
⪯ induced by the HHL is consistent with the 
order. 

Consider an order defined by a permutation 
rank, with rank(v) < rank(w) if v appears 
before (is less important than) w. The canonical 
labeling L for rank is defined as follows [4]. 
Vertex v belongs to L_f(u) if and only if there 
exists w such that v is the highest-ranked vertex 
that hits [u, w], i.e., lies on a shortest path from 
u to w. Similarly, v belongs to L_b(w) 
if and only if there exists u such that v is the 
highest-ranked vertex that hits [u, w]. 

Abraham et al. [4] prove that the canonical 
labeling for a given vertex order rank is the 
minimum-sized labeling that respects rank. This 
suggests a two-stage approach for finding a small 
hierarchical hub labeling: first, find a “good” 
vertex order, and then compute its corresponding 
canonical labeling. We first discuss the latter step 
and then the former. 


From Orderings to Labelings 

We first consider how, given an order rank, one 
can compute the canonical hierarchical labeling 
L that respects rank. 

The straightforward way is to just apply the 
definition: for every pair [u, w] of vertices, find 
the maximum-ranked vertex on any shortest u-w 
path, and then add it to L_f(u) and L_b(w). 
Although polynomial, this algorithm is too slow 
in practice. 
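
The definition-based approach can be written down directly. The Python sketch below is a brute-force illustration, not the method of the cited papers: it assumes vertices are numbered 0..n-1, arc lengths are positive integers (so distance equality tests are exact), and the vertex order is passed as a list `order` from most to least important. The names `canonical_labeling`, `arcs`, and `order` are hypothetical.

```python
import math


def canonical_labeling(n, arcs, order):
    """Brute-force canonical labeling for a given vertex order.

    n:     number of vertices, numbered 0..n-1
    arcs:  dict mapping (u, w) to a positive integer length
    order: list of all vertices, most important first
    Returns (forward, backward): forward[u][v] = dist(u, v) and
    backward[w][v] = dist(v, w) for every hub v of the label.
    """
    # All-pairs shortest paths by Floyd-Warshall.
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for (u, w), length in arcs.items():
        dist[u][w] = min(dist[u][w], length)
    for k in range(n):
        for u in range(n):
            for w in range(n):
                if dist[u][k] + dist[k][w] < dist[u][w]:
                    dist[u][w] = dist[u][k] + dist[k][w]

    position = {v: i for i, v in enumerate(order)}
    forward = [dict() for _ in range(n)]
    backward = [dict() for _ in range(n)]
    for u in range(n):
        for w in range(n):
            if dist[u][w] == math.inf:
                continue  # w is unreachable from u; nothing to cover
            # Vertices lying on some shortest u-w path.
            hitters = [v for v in range(n)
                       if dist[u][v] + dist[v][w] == dist[u][w]]
            top = min(hitters, key=position.__getitem__)
            forward[u][top] = dist[u][top]
            backward[w][top] = dist[top][w]
    return forward, backward
```

The cubic-time loops make the cost explicit: every one of the n^2 pairs is examined against all n candidate hubs, which is why this approach is only useful for small inputs or as a correctness reference.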

A faster (but still natural) algorithm is as 
follows [4]. Start with an empty (partial) labeling 
L, and process vertices from the most to least 
important. When processing v, for every uncovered 
pair [u, w] that v covers, add v to L_f(u) 
and L_b(w). (In other words, add v to the labels 
of all end points of arcs in the center graph G_v.) 
Abraham et al. [4] show how to implement this in 
O(mn log n) time and Θ(n^2) space, which is still 
impractical for large instances. 

When labels are not too large, a much more 
efficient solution is the pruned labeling (PL) 
algorithm by Akiba et al. [5]. Starting from empty 
labels, PL also processes vertices from the most 
to least important, with the iteration that 
processes vertex v adding v to all relevant labels. 
The crucial observation is that, when processing 
v, one only needs to look at uncovered pairs 
containing v itself; if [u, v] is not covered, PL 
adds v to L_f(u); if [v, w] is not covered, it adds v 
to L_b(w). This is enough because of the subpath 
optimality property of the shortest paths. 

To process v efficiently, PL runs two pruned 
Dijkstra searches [13] from v. The first search 
works on the forward graph (out of v) as 
follows. Before scanning a vertex w (with 
distance label d(w) within the Dijkstra search), 
it computes a v-w distance estimate q by 
performing an HL query with the current 
partial labels. (If the labels do not intersect, 
set q = ∞.) If q ≤ d(w), the [v, w] pair is 
already covered by previous hubs, so PL prunes 
the search (ignores w). Otherwise (if q > d(w)), 
PL adds (v, dist(v, w)) to L_b(w) and scans w 
as usual. The second Dijkstra search uses 
the reverse graph and is pruned similarly; it 
adds (v, dist(w, v)) to L_f(w) for all scanned 
vertices w. Note that the number of Dijkstra 
scans equals the size of the labeling. Since 
each visited vertex requires an HL query 
using partial labels, the running time can 
be quadratic in the average label size. It 
is easy to see that PL produces canonical 
labelings. 
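
The pruned searches can be sketched as follows. This Python illustration follows the description above under some assumptions made for the example: vertices are numbered 0..n-1, arc lengths are positive, labels are stored as dictionaries (hub to distance) rather than sorted arrays, and `order` lists the vertices from most to least important. The names `pruned_labeling`, `arcs`, and `order` are hypothetical.

```python
import heapq
import math


def pruned_labeling(n, arcs, order):
    """Pruned labeling sketch; arcs maps (u, w) to a positive length."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for (u, w), length in arcs.items():
        out_adj[u].append((w, length))
        in_adj[w].append((u, length))
    forward = [dict() for _ in range(n)]   # forward[u][hub] = dist(u, hub)
    backward = [dict() for _ in range(n)]  # backward[w][hub] = dist(hub, w)

    def query(s, t):
        # HL query on the current partial labels (infinity if disjoint).
        small, large = forward[s], backward[t]
        if len(small) > len(large):
            small, large = large, small
        return min((d + large[h] for h, d in small.items() if h in large),
                   default=math.inf)

    def pruned_dijkstra(v, adj, is_forward):
        tentative = {v: 0}
        heap = [(0, v)]
        done = set()
        while heap:
            d_w, w = heapq.heappop(heap)
            if w in done:
                continue  # stale heap entry
            done.add(w)
            q = query(v, w) if is_forward else query(w, v)
            if q <= d_w:
                continue  # pair already covered by earlier hubs: prune
            if is_forward:
                backward[w][v] = d_w  # d_w = dist(v, w)
            else:
                forward[w][v] = d_w   # d_w = dist(w, v)
            for x, length in adj[w]:
                if d_w + length < tentative.get(x, math.inf):
                    tentative[x] = d_w + length
                    heapq.heappush(heap, (d_w + length, x))

    for v in order:
        pruned_dijkstra(v, out_adj, True)
        pruned_dijkstra(v, in_adj, False)
    return forward, backward
```

Because a pruned vertex is never scanned, each search only explores the region where v is actually needed as a hub, which is what keeps the method practical when labels are small.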

The final algorithm we discuss, due to Abra- 
ham et al. [4], computes a hierarchical hub la- 
beling from a vertex ordering recursively. Its 
basic building block is the shortcut operation (see 
e.g., [16]). To shortcut a vertex v, the operation 
deletes v from the graph and adds arcs to ensure 
that the distances between the remaining vertices 
remain unchanged. For every pair consisting of 
an incoming arc (u, v) and an outgoing arc (v, w), 
the algorithm checks if (u, v) · (v, w) is the only 
shortest u-w path (by running a partial Dijkstra 
search from u or w) and, if so, adds a new arc 
(u, w) with length ℓ(u, w) = ℓ(u, v) + ℓ(v, w). 
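
A minimal Python sketch of the shortcut operation follows. It assumes the graph is stored as a dictionary of dictionaries with positive arc lengths and that every vertex appears as a key; the names `shortcut` and `witness_distance` are hypothetical. The witness search is a Dijkstra search from u in the graph without v that stops once distances exceed the length of the path through v.

```python
import heapq
import math


def witness_distance(adj, source, target, limit):
    """Distance from source to target in adj, or inf if it exceeds limit."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, math.inf):
            continue  # stale heap entry
        if x == target:
            return d
        if d > limit:
            break  # anything farther cannot beat the path through v
        for y, length in adj.get(x, {}).items():
            if d + length < dist.get(y, math.inf):
                dist[y] = d + length
                heapq.heappush(heap, (d + length, y))
    return math.inf


def shortcut(adj, v):
    """Remove v from adj (dict u -> {w: length}) in place.

    Adds arcs so that distances among the remaining vertices are
    unchanged and returns the list of added (u, w, length) shortcuts.
    """
    in_arcs = [(u, nbrs[v]) for u, nbrs in adj.items()
               if v in nbrs and u != v]
    out_arcs = [(w, length) for w, length in adj[v].items() if w != v]
    del adj[v]
    for u, _ in in_arcs:
        del adj[u][v]
    added = []
    for u, l_uv in in_arcs:
        for w, l_vw in out_arcs:
            if u == w:
                continue
            via = l_uv + l_vw
            # A shortcut is needed only if no other path is as short.
            if witness_distance(adj, u, w, via) > via:
                adj[u][w] = via
                added.append((u, w, via))
    return added
```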

The recursive algorithm computes one label at 
a time, from the bottom up (from the least to the 
most important vertex). It starts by shortcutting 
the least important vertex v from G to get a graph 
G' (same as G, but without v and its incident arcs 
and with the added shortcuts). It then recursively 
finds a labeling for G', which gives correct 
distances (in G) for all pairs of vertices not 
containing v. Then, the algorithm computes the label of v 
from the labels of its neighbors. We describe how 
to compute L_f(v); L_b(v) is computed similarly. 
The crucial observation is that any nontrivial 
shortest path starting at v must go through one of 
its neighbors. Accordingly, we initialize L_f(v) 
with entry (v, 0) (to cover the trivial path from v 
to itself), and then, for every neighbor w of v in 
G and every entry (x, dist(w, x)) ∈ L_f(w), add 
(x, ℓ(v, w) + dist(w, x)) to L_f(v). If x already 
is a hub of v, we only keep the smallest entry 
for x. Finally, we prune from L_f(v) the entries 
(x, ℓ(v, w) + dist(w, x)) for which ℓ(v, w) + 
dist(w, x) > dist(v, x). (This can happen if 
the shortest path from v to x through another 
neighbor w' of v is shorter than the one through 
w.) Note that dist(v, x) can be computed using 
the labels of v and x. In general, the shortcut 
operation can make the graph dense, limiting the 
efficiency of the bottom-up approach. On some 
network classes, such as road networks, the graph 
remains sparse and the approach scales to large 
problems. 


Vertex Ordering Heuristics 

As mentioned above, the size of the labeling is 
determined by the ordering. The most natural 
approach to capture the notion of importance is 
attributed to Abraham et al. [4], whose greedy 
ordering algorithm obtains good orderings on a 
wide class of problems. It orders vertices from the 
most to least important using a greedy selection 
rule. In each iteration, it selects as the next most 
important hub the vertex v that hits the most 
vertex pairs not covered by previously selected 
vertices. 

When the shortest paths are unique, this can 
be implemented relatively efficiently. The algo- 
rithm maintains (initially full) the shortest-path 
trees from each vertex in the graph. The tree T_s 
rooted at s implicitly represents all shortest paths 
starting at s. The total number of descendants of a 
vertex v (in aggregate over all trees) is exactly the 
number of paths it covers. Once such a vertex v 
is picked as the next hub, we restore this invariant 
for the remaining paths by removing all of v's 
descendants (including v itself) from all trees. 
Abraham et al. [4] show how the entire greedy 
order can be found in O(nm log n) time. An 
alternative algorithm (in the same spirit) works 
even if the shortest paths are not unique, but takes 
O(n^3) time [12]. 

The weighted greedy ordering algorithm is 
similar but selects v so as to maximize the ratio 
of the number of uncovered paths that v covers 
to the increase in the label size if v is selected 
next. This gives slightly better results and can 
be implemented in the same time bounds as 
the greedy ordering algorithm [4, 12]. Although 
faster than GHL, none of these greedy variants 
scale to large graphs. 

To cope with this problem, Delling et al. [12] 
developed RXL (Robust eXact Labeling), which 
can be seen as a sampling version of the greedy 
ordering algorithm. In each iteration, RXL finds 
a vertex v that approximately maximizes the 
number of pairs covered. Rather than maintaining 
n shortest-path trees, RXL maintains shortest- 
path trees from a small number of roots picked 
uniformly at random. It estimates the coverage 
of v based on how many descendants it has in 
these trees. To reduce the bias in this estimation, 
the algorithm discards outliers before taking the 
average number of descendants. Moreover, as the 
original trees shrink (because some of their subtrees 
become covered), new trees (from other roots) 
are added. These new trees are not full, however; 
they are pruned from the start (using PL), ensur- 
ing the total space (and time) usage remains under 
control. 

For certain graph classes, simpler ordering 
techniques can be used. Akiba et al. [5] show that 
ordering by degree works well on a subclass of 
complex networks. Abraham et al. [2,4] show that 
the order induced by the contraction hierarchies 
(CH) algorithm [16] works well on road networks 
and other sparse inputs. CH orders vertices from 
the bottom up: using only local information, it 
determines the least important vertex, shortcuts 
it, and repeats the process in the remaining graph. 
The most relevant signals to estimate the 
importance of v are the arc difference (the difference 
between the number of arcs added and the number 
removed if v were shortcut) 
and how many neighbors of v have already been 
shortcut. 


Label Representation and Queries 

Given a source s and a target t, one can compute 
the minimum of dist(s, v) + dist(v, t) over all 
v ∈ L_f(s) ∩ L_b(t) in O(|L_f(s)| + |L_b(t)|) time. 
If vertex labels are represented as arrays sorted 
by hub IDs, one can compute L_f(s) ∩ L_b(t) by 
a coordinated sweep of the corresponding arrays, 
as in mergesort. This is very cache efficient and 
works well when the two labels have similar 
sizes. 

In some applications, label sizes can be very 
different. Assuming (without loss of generality) 
that |L_f(s)| ≪ |L_b(t)|, one can compute 
L_f(s) ∩ L_b(t) in time O(|L_f(s)| log |L_b(t)|) 
by performing a binary search for each hub v ∈ 
L_f(s) to determine if v is in L_b(t). In fact, 
this set intersection problem can be solved even 
faster, in O(min(|L_f(s)|, |L_b(t)|)) time [19]. 
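
The binary-search variant for labels of very different sizes can be sketched as below. This is an illustration only: the function name `hl_query_unbalanced` is hypothetical, and the larger label is assumed to be stored as two parallel arrays (hub IDs sorted in increasing order, and the matching distances).

```python
import math
from bisect import bisect_left


def hl_query_unbalanced(small_label, large_hubs, large_dists):
    """Query when one label is much smaller than the other.

    small_label: list of (hub_id, distance) pairs, in any order
    large_hubs:  hub IDs of the larger label, sorted increasingly
    large_dists: large_dists[k] is the distance stored for large_hubs[k]
    """
    best = math.inf
    for hub, d in small_label:
        k = bisect_left(large_hubs, hub)  # binary search in the large label
        if k < len(large_hubs) and large_hubs[k] == hub:
            best = min(best, d + large_dists[k])
    return best
```

Only the smaller label is scanned in full, so the larger label is touched in a logarithmic number of positions per hub.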

As each label can be stored in a contiguous 
memory block, HL queries are well suited for 
external-memory (or even distributed) implementations, 
including relational databases [3] or key-value 
stores. In such cases, query times depend 
on the time to fetch two blocks of data. 

For in-memory implementations of HL, stor- 
age may be a bottleneck. One can trade space for 
time using label compression, which interprets 
each label as a tree and stores common subtrees 
only once; this reduces space consumption by an 
order of magnitude, but queries become much 
less cache efficient [10,12]. Another technique to 
reduce the space consumption is to store vertices 
and a constant number of their neighbors as 
superhubs in the labels [5]; on unweighted and 
undirected graphs, distances from a vertex v to 
all elements of a superhub can be represented 
compactly in difference form. This works well on 
some social and communication networks [5]. 

HL has efficient extensions to problems 
beyond point-to-point shortest paths, including 
one-to-many and via-point queries. These are 
important for applications in road networks, such 
as finding the closest points of interest, ride 
sharing, and path prediction [3]. 
Experimental Results 


Even for very small (constant) sample sizes, the 
labels produced by RXL are typically no more 
than about 10% bigger [12] than those pro- 
duced by the full greedy hierarchical algorithms, 
which in turn are not much worse than those 
produced by GHL [11]. Scalability differs 
considerably, however. In a few hours on a modern CPU, 
GHL can only handle graphs with about 10,000 
vertices [11]; for the greedy hierarchical algo- 
rithms, the practical limit is about 100,000 [4]. 
In contrast, as long as labels remain small, RXL 
scales to problems with millions of vertices [12] 
from a wide variety of graph classes, including 
meshes, grids, random geometric graphs (sensor 
networks), road networks, social networks, col- 
laboration networks, and web graphs. For exam- 
ple, for a web graph with 18.5 million vertices 
and almost 300 million arcs, one can find labels 
with fewer than 300 hubs on average in about half 
a day [12]; queries then take less than 2 μs. 

For some graph classes, other methods have 
faster preprocessing. For continental road net- 
works with tens of millions of vertices, a hybrid 
approach combining weighted greedy (for the 
top few thousand vertices) with the CH order 
(for all other vertices) provides the best trade-off 
between preprocessing times and label size [2,4]. 
On a benchmark data set representing Western 
Europe (about 18 million vertices, 42.5 million 
arcs), it takes roughly an hour to compute la- 
bels with about 70 hubs on average, leading 
to average query times of about 0.5 μs, roughly 
the time of ten random memory accesses. With 
additional improvements, one can further reduce 
query times (but not the label sizes) by half [2], 
making it the fastest algorithm for this applica- 
tion [7]. For some unweighted and undirected 
complex (social, communication, and collabora- 
tion) networks, simply sorting vertices by de- 
gree [5] produces labels that are not much bigger 
than those computed by a more sophisticated 
ordering technique. 

Overall, RXL is the most robust method. For 
all instances tested in the literature, its preprocessing
is never much slower than that of any other
method (and is often much faster), and query times
are similar. In particular, CH-based ordering is 
too costly for large complex networks (as con- 
traction tends to create dense graphs), and the 
degree-based order leads to prohibitively large 
labels for road networks and web graphs. 
Problem Definition 


A sequence of $n$ positive weights or frequencies
is given, $\langle w_i > 0 \mid 0 \le i < n \rangle$, together with an
output radix $r$, with $r = 2$ in the case of binary
output strings.


Objective To determine a sequence of integral
codeword lengths $\langle \ell_i \mid 0 \le i < n \rangle$ such that:
(a) $\sum_{i=0}^{n-1} r^{-\ell_i} \le 1$, and (b) $C = \sum_{i=0}^{n-1} \ell_i w_i$
is minimized. Any sequence of codeword lengths
$\langle \ell_i \rangle$ that satisfies these two properties describes a
minimum-redundancy code for the weights $\langle w_i \rangle$.
Once a set of minimum-redundancy codeword
lengths $\langle \ell_i \rangle$ has been identified, a prefix-free
$r$-ary code in which symbol $i$ is assigned a codeword
of length $\ell_i$ can always be constructed.


Constraints 


1. Long messages. In one application, each
weight $w_i$ is the frequency of symbol $i$ in a
message $M$ of length $m = |M| = \sum_{i=0}^{n-1} w_i$,
and $C$ is the number of symbols required by
a compressed representation of $M$. In this
application it is usual to assume that $m \gg n$.

2. Entropy-based limit. Define $W = \sum_{i=0}^{n-1} w_i$
to be the sum of the weights and $p_i = w_i / W$
to be the corresponding probability of symbol
$i$. Define $H_0 = -\sum_{i=0}^{n-1} p_i \log_2 p_i$ to be the
zero-order entropy of the distribution. Then
when $r = 2$, $\lceil W H_0 \rceil \le C \le W \lceil \log_2 n \rceil$.


Key Results 


A minimum-redundancy code can be identified in
$O(n)$ time if the weights $w_i$ are nondecreasing
and in $O(n \log n)$ time if the weights must be
sorted first.


Example Weights 
The $n = 10$ weights $\langle 1, 1, 1, 1, 3, 4, 4, 7, 9, 9 \rangle$
with $W = 40$ are used as an example.
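
As a quick numerical check, the entropy-based limits from the Constraints section can be evaluated for these example weights. The sketch below assumes the bounds are read as $\lceil W H_0 \rceil \le C \le W \lceil \log_2 n \rceil$, a reading that agrees with the 116-bit lower bound quoted for this example later in the entry; the variable names are illustrative only.

```python
import math

# Example weights from the text; W is their sum.
weights = [1, 1, 1, 1, 3, 4, 4, 7, 9, 9]
W = sum(weights)

# Zero-order entropy H0 = -sum(p_i * log2(p_i)) with p_i = w_i / W.
H0 = -sum((w / W) * math.log2(w / W) for w in weights)

# Assumed reading of the binary-case bounds: ceil(W*H0) <= C <= W*ceil(log2 n).
lower = math.ceil(W * H0)
upper = W * math.ceil(math.log2(len(weights)))

print(f"W = {W}, H0 = {H0:.4f}, bounds: {lower} <= C <= {upper}")
```

Any minimum-redundancy code for these weights must therefore have a total cost between the two printed values.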


Huffman’s Algorithm 
In 1952 David Huffman [3] described a process 
for calculating minimum-redundancy codes,
developed in response to a term-paper challenge 
set the year before by his MIT class instructor, 
Robert Fano, a problem that Fano and his col- 
laborator Claude Shannon had already tackled 
unsuccessfully [7]. In his solution Huffman cre- 
ated a classic algorithm that is taught to most 
undergraduate computing students as part of al- 
gorithms classes. Initially the sequence of input 
weights $\langle w_i \rangle$ is regarded as being the leaves of 
a tree, with no internal nodes, and each leaf 
the root of its own subtree. The two subtrees 
(whether singleton leaves or internal nodes) with 
the smallest root nodes are then combined by 
making both of them children of a new parent, 
with an assigned weight calculated as the sum 
of the two original nodes. The pool of subtrees 
decreases by one at each cycle of this process; 
after $n - 1$ iterations a total of $n - 1$ internal nodes 
has been added, and all of the original nodes must 
be leaves in a single tree and descendants of that 
tree’s root node. 
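
The merging process just described can be sketched with a binary heap. This is a minimal illustration rather than the implementation from any of the cited papers: the function name `huffman_code_lengths`, the use of integer node identifiers as tie-breakers, and the dictionary of parent pointers are all assumptions made for the example.

```python
import heapq

def huffman_code_lengths(weights):
    """Return the depth of each leaf in a Huffman tree built with a heap."""
    n = len(weights)
    # Heap entries are (weight, node id); leaves are 0..n-1, internal nodes follow.
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    parent = {}
    next_id = n
    while len(heap) > 1:
        # Combine the two least-weight subtrees under a new parent.
        w1, a = heapq.heappop(heap)
        w2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (w1 + w2, next_id))
        next_id += 1
    # Parents always have larger ids than their children, so one
    # descending pass assigns every depth after its parent's depth.
    root = next_id - 1
    depth = {root: 0}
    for node in range(root - 1, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return [depth[i] for i in range(n)]

weights = [1, 1, 1, 1, 3, 4, 4, 7, 9, 9]
lengths = huffman_code_lengths(weights)
cost = sum(l * w for l, w in zip(lengths, weights))
print(lengths, cost)
```

Because ties are broken here by node identifier, the individual lengths may differ from those produced by other tie-breaking rules, but the total cost is the same for every Huffman code on a given set of weights.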

Figure 1 shows an example of codeword 
length computation, with the original weights 
across the top. Each iteration takes the two least- 
weight elements (leaf or internal) and combines 
them to make a new internal node; note that 
the internal nodes are created in nondecreasing 
weight order. Once the Huffman tree has been 
constructed, the sequence $\langle \ell_i \rangle$ can be read from
it, by computing the depth of each corresponding
leaf node. In Fig. 1, for example, one of the
elements of weight 4 is at depth three in the tree,
and one is at depth four from the root, hence
$\ell_5 = 4$ and $\ell_6 = 3$. A set of codewords can
be assigned at the same time as the depths are
being computed; one possible assignment of
codewords that satisfies the computed sequence
$\langle \ell_i \rangle$ is shown in the second row in the lower
box. Decoding throughput is considerably faster 
if codewords are assigned systematically based 
on codeword length in the manner shown, rather 
than by strictly following the edge labeling of the 
Huffman tree from which the codeword lengths 
were extracted [6]. 
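
One systematic way to assign codewords from lengths alone is the so-called canonical assignment, sketched below for binary codes. It is an illustration of the idea, not necessarily the exact assignment drawn in Fig. 1; the function name `canonical_codewords` is hypothetical, and the sketch assumes a nonempty list of lengths that are all at least 1 and satisfy the Kraft inequality.

```python
def canonical_codewords(lengths):
    """Assign binary codewords in order of nondecreasing codeword length."""
    # Process symbols by (length, index) so equal-length codewords are consecutive.
    order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
    codes = [None] * len(lengths)
    code = 0
    prev_len = lengths[order[0]]
    for i in order:
        # Moving to a longer length appends zero bits to the running counter.
        code <<= lengths[i] - prev_len
        codes[i] = format(code, "0{}b".format(lengths[i]))
        code += 1
        prev_len = lengths[i]
    return codes

codes = canonical_codewords([5, 5, 5, 5, 4, 4, 3, 3, 2, 2])
for symbol, word in enumerate(codes):
    print(symbol, word)
```

With this scheme a decoder needs only the first codeword value and the symbol count for each length, which is one reason length-ordered assignments decode quickly.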

Because ties can occur and can be broken
arbitrarily, different codes are also possible. The
sequence $\langle \ell_i \rangle = \langle 6, 6, 6, 6, 4, 3, 3, 3, 2, 2 \rangle$ has the
same cost of $C = 117$ bits as the one shown
in the figure. For the example weights, $H_0 =
2.8853$ bits per symbol, providing a lower bound
of $\lceil 115.41 \rceil = 116$ bits on the total cost $C$ for
the input weights. In this case, the minimum-redundancy
codes listed are just 1 bit inferior to
the entropy-based lower limit.

Huffman Coding, Fig. 1 Example of (binary) codeword
lengths calculated using Huffman’s algorithm, showing
the order in which internal nodes are formed, and their
weights. The input weights in the top section are used
to compute the corresponding codeword lengths in the
bottom box. A valid assignment of prefix-free codewords
is also shown


Implementing Huffman’s Algorithm
Huffman’s algorithm is often used in algorithms
textbooks as an example of a process that requires
a dynamic priority queue. If a heap is used, for
example, the $n$ initial and $n - 2$ subsequent insert
operations take a total of $O(n \log n)$ time, as do
the $2(n - 1)$ extract-min operations.

A simpler approach is to first sort the $n$
weights into increasing order and then apply an
$O(n)$-time algorithm due to van Leeuwen [10].
Two sorted lists are maintained: a static one of
original weights, representing the leaves of the
Huffman tree, and a dynamic queue of internal
nodes that is initially empty, to which new
internal nodes are appended as they are created.
Each iteration compares front-of-list elements
from the two lists and combines the two that have
the least weight and then adds the new internal
node at the tail of the queue. The algorithm stops
when the queue contains only one node; it is the
last item that was added and is the root of the
Huffman tree.
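
The two-list idea can be sketched in a few lines. The version below only accumulates the total cost $C$, using the fact that $C$ equals the sum of the weights of all internal nodes; the function name `huffman_cost_two_queues` and the choice to prefer a leaf on ties are assumptions of this sketch, not details taken from van Leeuwen's paper.

```python
from collections import deque

def huffman_cost_two_queues(sorted_weights):
    """Total cost of a Huffman code, given weights in nondecreasing order."""
    leaves = deque(sorted_weights)   # static sorted list of leaf weights
    internal = deque()               # queue of internal nodes, created in order

    def pop_min():
        # Compare the fronts of the two lists; prefer a leaf on ties.
        if not internal or (leaves and leaves[0] <= internal[0]):
            return leaves.popleft()
        return internal.popleft()

    cost = 0
    while len(leaves) + len(internal) > 1:
        merged = pop_min() + pop_min()
        internal.append(merged)      # new nodes never weigh less than older ones
        cost += merged               # each merge adds one bit to every leaf below it
    return cost

print(huffman_cost_two_queues([1, 1, 1, 1, 3, 4, 4, 7, 9, 9]))
```

No priority queue is needed because internal nodes are produced in nondecreasing weight order, so appending at the tail keeps the second list sorted.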

If the input weights are provided in an array
$\langle w_i = A[i] \mid 0 \le i < n \rangle$ of sorted integers,
that array can be processed in situ into an output
array $\langle \ell_i = A[i] \rangle$ in $O(n)$ time by van Leeuwen’s
technique using an implementation described by
Moffat and Katajainen [5]. Each array element
takes on values that are, variously, input weight,
internal node weight, parent pointer, and then,
finally, codeword length. Algorithm 1 is taken
from Moffat and Katajainen [5] and describes
this process in detail. There are three phases
of operation. In the first phase, in steps 2-10,
leaf weights in $A[\mathit{leaf} \ldots n-1]$ are combined
with a queue of internal node weights in
$A[\mathit{root} \ldots \mathit{next}-1]$ to form a list of parent
pointers in $A[0 \ldots \mathit{root}-1]$. At the end of this
phase, $A[0 \ldots n-3]$ is a list of parents, $A[n-2]$
is the sum of the weights, and $A[n-1]$ is unused.

In phase 2 (steps 12-15), the set of parent
pointers of internal nodes is converted to a set
of internal node depths. This mapping is done by
processing the tree from the root down, making
the depth of each node one greater than the depth
of its parent.
Algorithm 1 Compute Huffman codeword lengths

 0: function calc_huff_lens(A, n)                  ▷ Input: A[i - 1] ≤ A[i] for 0 < i < n
 1:   // Phase 1
 2:   set leaf ← 0 and root ← 0
 3:   for next ← 0 to n - 2 do
 4:     if leaf ≥ n or (root < next and A[root] < A[leaf]) then
 5:       set A[next] ← A[root] and A[root] ← next and root ← root + 1      ▷ Use internal node
 6:     else
 7:       set A[next] ← A[leaf] and leaf ← leaf + 1                         ▷ Use leaf node
 8:     end if
 9:     repeat steps 4-8, but adding to A[next] rather than assigning to it ▷ Find second child
10:   end for
11:   // Phase 2
12:   set A[n - 2] ← 0
13:   for next ← n - 3 downto 0 do
14:     set A[next] ← A[A[next]] + 1               ▷ Compute depths of internal nodes
15:   end for
16:   // Phase 3
17:   set avail ← 1 and used ← 0 and depth ← 0 and root ← n - 2 and next ← n - 1
18:   while avail > 0 do
19:     while root ≥ 0 and A[root] = depth do      ▷ Count internal nodes used at depth depth
20:       set used ← used + 1 and root ← root - 1
21:     end while
22:     while avail > used do                      ▷ Assign as leaves any nodes that are not internal
23:       set A[next] ← depth and next ← next - 1 and avail ← avail - 1
24:     end while
25:     set avail ← 2 · used and depth ← depth + 1 and used ← 0             ▷ Move to next depth
26:   end while
27:   return A                                     ▷ Output: A[i] is the length ℓ_i of the i th codeword
28: end function



Phase 3 (steps 17-26) then processes those 
internal node depths and converts them to a list 
of leaf depths. At each depth, some total number 
avail of nodes exist, being twice the number of 
internal nodes at the previous depth. Some num- 
ber used of those are internal nodes; the balance 
must thus be leaf nodes at this depth and can be 
assigned as codeword lengths. Initially there is 
one node available at depth = 0, representing the 
root of the whole Huffman tree. Table 1 shows 
several snapshots of the Moffat and Katajainen 
code construction process when applied to the 
example sequence of weights. 
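
The three phases can be transcribed into Python roughly as follows. This is a sketch that follows the structure of Algorithm 1 as described in the text, not the authors' published code: the variable `nxt` stands in for `next`, the second child is accumulated in a local `total` before being written to the array, and the handling of arrays with fewer than two elements is an assumption added for the example.

```python
def calc_huff_lens(A):
    """Replace nondecreasing weights in A by codeword lengths, in place."""
    n = len(A)
    if n == 0:
        return A
    if n == 1:
        A[0] = 0          # assumption: a lone symbol gets a zero-length codeword
        return A

    # Phase 1: merge leaves and internal nodes; A[0..root-1] become parent pointers.
    leaf = root = 0
    for nxt in range(n - 1):
        total = 0
        for _ in range(2):                      # pick two children for node nxt
            if leaf >= n or (root < nxt and A[root] < A[leaf]):
                total += A[root]                # use internal node
                A[root] = nxt                   # record its parent
                root += 1
            else:
                total += A[leaf]                # use leaf node
                leaf += 1
        A[nxt] = total                          # weight of the new internal node

    # Phase 2: convert parent pointers into internal node depths.
    A[n - 2] = 0
    for nxt in range(n - 3, -1, -1):
        A[nxt] = A[A[nxt]] + 1

    # Phase 3: convert internal node depths into leaf depths.
    avail, used, depth = 1, 0, 0
    root, nxt = n - 2, n - 1
    while avail > 0:
        while root >= 0 and A[root] == depth:   # internal nodes at this depth
            used += 1
            root -= 1
        while avail > used:                     # remaining nodes are leaves
            A[nxt] = depth
            nxt -= 1
            avail -= 1
        avail, depth, used = 2 * used, depth + 1, 0
    return A

print(calc_huff_lens([1, 1, 1, 1, 3, 4, 4, 7, 9, 9]))
```

Only a constant number of variables is used beyond the array itself, which is the point of the in-place formulation.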


Nonbinary Output Alphabets 

The example Huffman tree developed in Fig. 1
and the process shown in Algorithm 1 assume
that the output alphabet is binary. Huffman noted
in his original paper that for $r$-ary alphabets
all that is required is to add additional dummy
symbols of weight zero, so as to bring the total
number of symbols to be one more than a multiple
of $(r - 1)$. Each merging step then combines
$r$ leaf or internal nodes to form a new root node
and decreases the number of items by $r - 1$.
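
The padding rule can be made concrete with a small sketch. The helper names `dummy_count` and `rary_code_lengths` are hypothetical, and the heap-based merging is simply one convenient way to illustrate the $r$-way combination step.

```python
import heapq

def dummy_count(n, r):
    """Zero-weight symbols needed so that n + d is one more than a multiple of r - 1."""
    return (1 - n) % (r - 1)

def rary_code_lengths(weights, r):
    """Codeword lengths of an r-ary Huffman code for the given weights."""
    n = len(weights)
    d = dummy_count(n, r)
    # Dummy symbols of weight zero occupy ids n..n+d-1.
    heap = [(w, i) for i, w in enumerate(list(weights) + [0] * d)]
    heapq.heapify(heap)
    parent = {}
    next_id = n + d
    while len(heap) > 1:
        total = 0
        for _ in range(r):                 # combine r least-weight nodes
            w, node = heapq.heappop(heap)
            parent[node] = next_id
            total += w
        heapq.heappush(heap, (total, next_id))
        next_id += 1
    depth = {next_id - 1: 0}
    for node in range(next_id - 2, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return [depth[i] for i in range(n)]    # dummy symbols are dropped

ternary = rary_code_lengths([1, 1, 1, 1, 3, 4, 4, 7, 9, 9], 3)
print(dummy_count(10, 3), ternary)
```

Because the padded symbol count is congruent to 1 modulo $r - 1$, every merging step finds exactly $r$ nodes available and the process ends with a single root.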


Dynamic Huffman Coding 

Another assumption made by the processes 
described so far is that the symbol weights 
are known in advance and that the code that is 
computed can be static. This assumption can 
be satisfied, for example, by making a first 
pass over the message that is to be encoded. 
In a dynamic coding system, symbols must be 
coded on the fly, as soon as they are received 
by the encoder. To achieve this, the code must 
be adaptive, so that it can be altered after each 
symbol. Vitter [11] summarizes earlier work 
by Gallager [2], Knuth [4], and Cormack and 
Horspool [1] and describes a mechanism in 
which the total encoding cost, including the 
cost of keeping the code tree up to date, is 
O(1) per output bit. Turpin and Moffat [9]
describe an alternative approximate algorithm
that reduces the time required by a constant
factor, by collecting the frequency updates into
batches and allowing controlled inefficiency in
the length of the coded output sequence. Their
“GEO” Coding method is faster than dynamic
Huffman Coding and also faster than dynamic
Arithmetic Coding, which is comparable in
speed to dynamic Huffman Coding, but uses less
space for the dynamic frequency-counting data
structure.


Applications


Minimum-redundancy codes have widespread
use in data compression systems. The sequences
of weights are usually conditioned according
to a model, rather than taken as plain symbol
frequency counts in the source message. The
use of multiple conditioning contexts, and hence
multiple codes, one per context, allows improved
compression when symbols are not independent
in the message, as is the case in natural
language data. However, when the contexts are
sufficiently specific that highly biased probability
distributions arise, Arithmetic Coding will yield
superior compression effectiveness.

Turpin and Moffat [8] consider several ancillary
components of Huffman Coding, including
methods for transmitting the description of the
code to the decoder.


Huffman Coding, Table 1 Sequence of values computed
by Algorithm 1 for the example weights. The first
row shows the initial state of the array, with $A[i] = w_i$.
Values “-2-” indicate parent pointers of internal nodes
that have already been merged; italic values “*7*” indicate
weights of internal nodes before being merged; values
“(4)” indicate depths of internal nodes; bold values “**5**”
indicate depths of leaves; and values “-” are unused

i                                           0      1      2      3      4      5      6      7      8      9
Initial arrangement, A[i] = w_i             1      1      1      1      3      4      4      7      9      9
Phase 1, root = 3, next = 5, leaf = 7      -2-    -2-    -4-    *7*    *8*     -      -      7      9      9
Phase 1, finished, root = 8                -2-    -2-    -4-    -5-    -6-    -7-    -8-    -8-   *40*     -
Phase 2, next = 4                          -2-    -2-    -4-    -5-    -6-    (2)    (1)    (1)    (0)     -
Phase 2, finished                          (4)    (4)    (3)    (3)    (2)    (2)    (1)    (1)    (0)     -
Phase 3, next = 5, avail = 4               (4)    (4)    (3)    (3)    (2)    (2)   **3**  **3**  **2**  **2**
Final arrangement, A[i] = ℓ_i             **5**  **5**  **5**  **5**  **4**  **4**  **3**  **3**  **2**  **2**
Definition 


The input/output model (I/O model) [1] views the 
computer as consisting of a processor, internal 
memory (RAM), and external memory (disk). See 
Fig. 1. The internal memory is of limited size, 
large enough to hold M data items. The external 
memory is of conceptually unlimited size and is 
divided into blocks of B consecutive data items. 
All computation has to happen on data in internal 
memory. Data is brought into internal memory 
and written back to external memory using I/O
operations (I/Os), which are performed explicitly 
by the algorithm. Each such operation reads or 
writes one block of data from or to external 
memory. The complexity of an algorithm in this 
model is the number of I/Os it performs. 

The parallel disk model (PDM) [15] is an 
extension of the I/O model that allows the ex- 
ternal memory to consist of D > 1 parallel 
disks. See Fig.2. In this model, a single I/O 
operation is capable of reading or writing up to 
D independent blocks, as long as each of them is 
stored on a different disk. 

The parallel external memory (PEM) [5] 
model is a simple multiprocessor extension of the 
I/O model. See Fig. 3. It consists of P processing 
units, each having a private cache of size M. Data 
exchange between the processors takes place via 
a shared main memory of conceptually unlimited 
size: in a parallel I/O operation, each processor 
can transfer one block of size B between its 
private cache and the shared memory. 

The relationship between the PEM model and 
the very popular MapReduce framework is dis- 
cussed in [8]. A survey of realistic computer 
models can be found in [2]. 


Key Results 


A few complexity bounds are of importance to 
virtually every I/O-efficient algorithm or data 
structure. The searching bound of $O(\log_B n)$
I/Os, which can be achieved using a B-tree [6], is
the cost of searching for an element in an ordered
collection of $n$ elements using comparisons only.
It is thus the equivalent of the $O(\log n)$ searching
bound in internal memory.

I/O-Model, Fig. 1 The I/O model

I/O-Model, Fig. 2 The parallel disk model

Scanning a list of $n$ consecutive data items
obviously takes $\lceil n/B \rceil$ I/Os. This scanning bound
is usually referred to as a “linear number of I/Os”
because it is the equivalent of the $O(n)$ time
bound required to do the same in internal memory.
The respective PDM and PEM bounds are
$\lceil n/DB \rceil$ and $\lceil n/PB \rceil$.

The sorting bound of $\mathrm{sort}(n) = \Theta((n/B) \log_{M/B}(n/B))$
I/Os denotes the cost of sorting
$n$ elements using comparisons only. It is
thus the equivalent of the $\Theta(n \log n)$ sorting
bound in internal memory. In the PDM and
PEM model, the sorting bound becomes
$\Theta((n/DB) \log_{M/B}(n/B))$ and $\Theta((n/PB) \log_{M/B}(n/B))$,
respectively. The sorting bound can be
achieved using a range of sorting algorithms,
including external merge sort [1, 5, 10] and
distribution sort [1, 5, 9].
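
To get a feel for the magnitudes involved, the scanning and sorting bounds can be evaluated for concrete parameters. The sketch below drops the constant factors hidden by the $\Theta$ notation, so the numbers are order-of-magnitude estimates only; the function names and the sample values of $n$, $M$, and $B$ are assumptions chosen for illustration. Passing a value for `D` gives the PDM form, and the PEM form is obtained by passing the number of processors $P$ in its place.

```python
import math

def scan_ios(n, B, D=1):
    """ceil(n / (D*B)): I/Os needed to scan n consecutive items."""
    return math.ceil(n / (D * B))

def sort_ios(n, M, B, D=1):
    """(n / (D*B)) * log_{M/B}(n/B), with constant factors omitted."""
    return (n / (D * B)) * (math.log2(n / B) / math.log2(M / B))

# Hypothetical machine: 2^30 items, memory for 2^20 items, blocks of 2^10 items.
n, M, B = 2 ** 30, 2 ** 20, 2 ** 10
print("scan:", scan_ios(n, B))
print("sort:", sort_ios(n, M, B))
print("sort on 4 disks:", sort_ios(n, M, B, D=4))
```

For these parameters the logarithmic factor is only 2, which illustrates why sorting is often regarded as nearly as cheap as scanning in external memory.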

Arguably, the most interesting bound is 
the permutation bound, that is, the cost of 
rearranging n elements in a given order, which 
is $\Theta(\min(\mathrm{sort}(n), n))$ [1] or, in the PDM,
$\Theta(\min(\mathrm{sort}(n), n/D))$ [15]. For all practical
purposes, this is the same as the sorting 
bound. Note the contrast to internal memory 
where, up to constant factors, permuting has 
the same cost as a linear scan. Since almost 
all nontrivial algorithmic problems include a 
permutation problem, this implies that only 
exceptionally simple problems can be solved 
in $O(\mathrm{scan}(n))$ I/Os; most problems have an
$\Omega(\mathrm{perm}(n))$, that is, essentially an $\Omega(\mathrm{sort}(n))$
lower bound. Therefore, while internal-memory 
algorithms aiming for linear time have to 
carefully avoid the use of sorting as a tool, 
external-memory algorithms can sort without 
fear of significantly exceeding the lower bound. 
This makes the design of I/O-optimal algorithms 
potentially easier than the design of optimal 
internal-memory algorithms. It is, however, 
counterbalanced by the fact that, unlike in 
internal memory, the sorting bound is not equal 
to n times the searching bound, which implies 
that algorithms based on querying a tree-based 
search structure O(n) times usually do not 
translate into I/O-efficient algorithms. Buffer 
trees [4] achieve an amortized search bound of 
$O((1/B) \log_{M/B}(n/B))$ I/Os but can be used 
only if the entire update and query sequence is 
known in advance and thus provide only a limited 
solution to this problem. 
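
The claim that the permutation bound coincides with the sorting bound for all practical purposes can be illustrated numerically. The sketch below again ignores constant factors, and the function names and parameter values are assumptions made for the example: the minimum is attained by $n$ only when $\log_{M/B}(n/B)$ exceeds $B$, which requires unrealistically small blocks.

```python
import math

def sort_ios(n, M, B):
    """(n/B) * log_{M/B}(n/B), with constant factors omitted."""
    return (n / B) * (math.log2(n / B) / math.log2(M / B))

def perm_ios(n, M, B):
    """min(sort(n), n): either sort, or move each item with its own I/O."""
    return min(sort_ios(n, M, B), n)

# Realistic block size: sorting is far cheaper than one I/O per item.
print(perm_ios(2 ** 30, 2 ** 20, 2 ** 10))
# Tiny blocks (B = 2, M = 4): moving items one at a time wins.
print(perm_ios(2 ** 10, 4, 2))
```

In the first case the result equals the sorting bound; in the second, contrived case it equals $n$.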

Apart from these fundamental results, there 
exist a wide range of interesting techniques, 
particularly for solving geometric and graph 
problems. For surveys, refer to [3, 14]. Also, 
many I/O-efficient algorithms have been derived 
from fast and work-efficient parallel algorithms; 
see the book entry on external-memory list 
ranking for a well-known example of this 
technique. 
I/O-Model, Fig. 3 The PEM model


Applications 


Modern computers are equipped with memory 
hierarchies consisting of several levels of cache 
memory, main memory (RAM), and disk(s). Ac- 
cess latencies increase with the distance from the 
processor, as do the sizes of the memory levels. 
To amortize these increasing access latencies, 
data are transferred between different levels of 
cache in blocks of consecutive data items. As a 
result, the cost of a memory access depends on 
the level in the memory hierarchy currently holding
the data item (the difference in access latency
between L1 cache and disk is about $10^6$) and the
cost of a sequence of accesses to data items stored 
at the same level depends on the number of blocks 
over which these items are distributed. 
Traditionally, algorithms were designed to 
minimize the number of computation steps; the 
access locality necessary to solve a problem using 
few data transfers between memory levels was 
largely ignored. Hence, the designed algorithms 
work well on data sets of moderate size but do 
not take noticeable advantage of cache memory 
and usually break down completely in out-of- 
core computations. Since the difference in access 
latencies is largest between main memory and 
disk, the I/O model focuses on minimizing 
this I/O bottleneck. This two-level view of the 
memory hierarchy keeps the model simple and 
useful for analyzing sophisticated algorithms 
while providing a good prediction of their 
practical performance. The picture is slightly 
more complex for flash memory-based solid state 
disks, which have recently become quite popular 
(also due to their energy efficiency [7]): not only 
do they internally use different block sizes for 
reading and writing, but their (reading) latency is 
also significantly smaller compared to traditional 
hard disks. Nevertheless, the latency gap of solid 
state disks compared to main memory remains 
large, and optimized device controllers or 
translation layers manage to hide the read/write 
discrepancy in most practical settings. Thus, the 
I/O model still provides reasonable estimates on 
flash memory, but extended models with different 
block sizes and access costs for reading and 
writing are more accurate. 

Much effort has been made already to translate 
provably I/O-efficient algorithms into highly ef- 
ficient implementations. Examples include TPIE 
[12] and STXXL [11], two libraries that aim to 
provide highly optimized and powerful primi- 
tives for the implementation of I/O-efficient al- 
gorithms. In particular, TPIE has been used to 
realize a number of geometric and GIS applica- 
tions, whereas STXXL has served as a basis for 
the implementation of various graph algorithms. 
In spite of these efforts, a significant gap between 
the theory and practice of I/O-efficient algorithms 
remains (see next section). 


Open Problems 


There are a substantial number of open prob- 
lems in the area of I/O-efficient algorithms. The 
most important ones concern graph and geomet- 
ric problems. 

Traditional graph algorithms usually apply a 
well-organized graph traversal such as depth-first 
search or breadth-first search to gain information 
about the structure of the graph and then use this 
information to solve the problem at hand. For 
massive sparse graphs, no I/O-efficient depth-first 
search algorithm is known, and for breadth-first 
search and shortest paths, only limited progress 
has been made on undirected graphs. Some re- 
cent results concern dynamic and approximation 
variants or all-pairs shortest paths problems. For 
directed graphs, even such simple problems as 
deciding whether there exists a directed path 
between two vertices are currently still open. 
The main research focus in this area is therefore 
to either develop (or disprove the existence of) 
I/O-efficient general traversal algorithms or to 
continue the current strategy of devising graph 
algorithms that depart from traditional traversal- 
based approaches. 

Techniques for solving geometric problems 
I/O efficiently are much better understood than 
is the case for graph algorithms, at least in two 
dimensions. Nevertheless, there are a few impor- 
tant frontiers that remain. Despite new results 
on some range reporting problems in three and 
higher dimensions, arguably the most important 
frontier is the development of I/O-efficient algo- 
rithms and data structures for higher-dimensional 
geometric problems. Motivated by database ap- 
plications, results on specialized range search- 
ing variants (such as coloured and top-K range 
searching) have begun to appear in the literature. 
Little work has been done in the past on solving 
proximity problems, which pose another frontier 
currently being explored. Motivated by the need 
for such structures in a range of application ar- 
eas and in particular in geographic information 
systems, there has been some recent focus on 
the development of multifunctional data struc- 
tures, that is, structures that can answer different 
types of queries efficiently. This is in contrast 
to most existing structures, which are carefully 
tuned to efficiently support one particular type of 
query. 

We also face a significant lack of external- 
memory lower bounds. Classic results concern 
permuting and sorting (see [14] for an overview), 
and more recent results concentrate on I/O- 
efficient data structure problems such as dynamic 
membership [13]. The optimality of many 
basic external-memory algorithms, however, 
is completely open. For instance, it is unclear 
whether sparse graph traversal (and hence 
probably a large number of advanced graph 
problems) will ever be solvable in an I/O-efficient 
manner. 

For both I/O-efficient graph algorithms and 
computational geometry, there is still a substan- 
tial gap between the obtained theoretical results 
and what is known to be practical, even though 
quite some algorithm engineering work has been 
done during the last decade. Thus, if I/O-efficient 
algorithms in these areas are to have more 
practical impact, increased efforts are needed 
to bridge this gap by developing practically 
I/O-efficient algorithms that are still provably 
efficient. 
Problem Definition 


DIMACS Implementation Challenges
(http://dimacs.rutgers.edu/Challenges/) are scientific
events devoted to assessing the practical perfor- 
mance of algorithms in experimental settings, 
fostering effective technology transfer and 
establishing common benchmarks for fundamen- 
tal computing problems. They are organized by 
DIMACS, the Center for Discrete Mathematics 
and Theoretical Computer Science. One of 
the main goals of DIMACS Implementation 
Challenges is to address questions of determining 
realistic algorithm performance where worst case 
analysis is overly pessimistic and probabilistic 
models are too unrealistic: experimentation can 
provide guides to realistic algorithm performance 
where analysis fails. Experimentation also brings 
algorithmic questions closer to the original 
problems that motivated theoretical work. It also 
tests many assumptions about implementation 
methods and data structures. It provides an 
opportunity to develop and test problem 
instances, instance generators, and other methods 
of testing and comparing performance of 
algorithms. And it is a step in technology transfer 
by providing leading edge implementations of 
algorithms for others to adapt. 

The first Challenge was held in 1990-1991
and was devoted to Network Flows
and Matching. Other addressed problems
included: Maximum Clique, Graph Coloring, and
Satisfiability (1992-1993), Parallel Algorithms
for Combinatorial Problems (1993-1994), Fragment
Assembly and Genome Rearrangements
(1994-1995), Priority Queues, Dictionaries,
and Multi-Dimensional Point Sets (1995-1996),
Near Neighbor Searches (1998-1999),
Semidefinite and Related Optimization Problems
(1999-2000), and The Traveling Salesman
Problem (2000-2001).

This entry addresses the goals and the results 
of the 9th DIMACS Implementation Challenge, 
held in 2005-2006 and focused on Shortest Path 
problems. 


The 9th DIMACS Implementation 
Challenge: The Shortest Path Problem 
Shortest path problems are among the most fun- 
damental combinatorial optimization problems 
with many applications, both direct and as sub- 
routines in other combinatorial optimization al- 
gorithms. Algorithms for these problems have 
been studied since the 1950s and still remain an 
active area of research. 

One goal of this Challenge was to create 
a reproducible picture of the state of the art in 
the area of shortest path algorithms, identifying 
a standard set of benchmark instances and gen- 
erators, as well as benchmark implementations 
of well-known shortest path algorithms. Another 
goal was to enable current researchers to compare 
their codes with each other, in hopes of identify- 
ing the more effective of the recent algorithmic 
innovations that have been proposed. 

Challenge participants studied the following 
variants of the shortest paths problem: 
• Point-to-point shortest paths [4, 5, 6, 9, 10, 11,
  14]: the problem consists of answering multiple
  online queries about the shortest paths
  between pairs of vertices and/or their lengths.
  The most efficient solutions for this problem
  preprocess the graph to create a data structure
  that facilitates answering queries quickly.

• External-memory shortest paths [2]: the
  problem consists of finding shortest paths in
  a graph whose size is too large to fit in internal
  memory. The problem actually addressed in
  the Challenge was single-source shortest paths
  in undirected graphs with unit edge weights.

• Parallel shortest paths [8, 12]: the problem
  consists of computing shortest paths using
  multiple processors, with the goal of achieving
  good speedups over traditional sequential
  implementations. The problem actually
  addressed in the Challenge was single-source
  shortest paths.

• K-shortest paths [13, 15]: the problem
  consists of ranking paths between a pair
  of vertices by nondecreasing order of their
  length.

• Regular-language constrained shortest
  paths [3]: the problem consists of a generalization
  of shortest path problems where paths
  must satisfy certain constraints specified by
  a regular language. The problems studied in
  the context of the Challenge were single-source
  and point-to-point shortest paths,
  with applications ranging from transportation
  science to databases.


The Challenge culminated in a Workshop held at 
the DIMACS Center at Rutgers University, Pis- 
cataway, New Jersey on November 13-14, 2006. 
Papers presented at the conference are available
at the URL: http://www.dis.uniroma1.it/~challenge9/papers.shtml.
Selected contributions
are expected to appear in a book published by the 
American Mathematical Society in the DIMACS 
Book Series. 


Key Results 


The main results of the 9th DIMACS Implemen- 
tation Challenge include: 
• Definition of common file formats for
  several variants of the shortest path problem,
  both static and dynamic. These include an
  extension of the famous DIMACS graph
  file format used by several algorithmic
  software libraries. Formats are described
  at the URL: http://www.dis.uniroma1.it/~challenge9/formats.shtml.

• Definition of a common set of core input
  instances for evaluating shortest path
  algorithms.

• Definition of benchmark codes for shortest
  path problems.

• Experimental evaluation of state-of-the-art
  implementations of shortest path codes on the
  core input families.

• A discussion of directions for further research
  in the area of shortest paths, identifying
  problems critical in real-world applications
  for which efficient solutions still remain
  unknown.


The chief information venue about the 9th DIMACS
Implementation Challenge is the website
http://www.dis.uniroma1.it/~challenge9.


Applications 


Shortest path problems arise naturally in 
a remarkable number of applications. A limited 
list includes transportation planning, network 
optimization, packet routing, image segmenta- 
tion, speech recognition, document formatting, 
robotics, compilers, traffic information systems, 
and dataflow analysis. They also appear as
subproblems of several other combinatorial
optimization problems such as network flows.
A comprehensive discussion of applications of 
shortest path problems appears in [1]. 


Open Problems 


There are several open questions related to short- 
est path problems, both theoretical and practical. 
One of the most prominent discussed at the 9th 
DIMACS Challenge Workshop is modeling traf- 
fic fluctuations in point-to-point shortest paths. 
The current fastest implementations preprocess 
the input graph to answer point-to-point queries 
efficiently, and this operation may take hours on 
graphs arising in large-scale road map naviga- 
tion systems. A change in the traffic conditions 
may require rescanning the whole graph several 
times. Currently, no efficient technique is known
for updating the preprocessing information without
rebuilding it from scratch. Such a technique would have
a major impact on the performance of routing
software.


Data Sets 


The collection of benchmark inputs of the 9th 
DIMACS Implementation Challenge includes 
both synthetic and real-world data. All graphs are 
strongly connected. Synthetic graphs include 
random graphs, grids, graphs embedded on 
a torus, and graphs with small-world properties. 
Real-world inputs consist of graphs representing 
the road networks of Europe and the USA. Europe 
graphs are provided by courtesy of the PTV 
company, Karlsruhe, Germany, subject to signing 
a (no-cost) license agreement. They include the 
road networks of 17 European countries: AUT, 
BEL, CHE, CZE, DEU, DNK, ESP, FIN, FRA, 
GBR, IRL, ITA, LUX, NDL, NOR, PRT, SWE, 
with a total of about 19 million nodes and 23 
million edges. USA graphs are derived from the 
UA Census 2000 TIGER/Line Files produced 
by the Geography Division of the US Census 
Bureau, Washington, DC. The TIGER/Line 
collection is available at: http://www.census. 
gov/geo/www/tiger/tigerua/ua_tgr2k.html. The 
Challenge USA core family contains a graph 
representing the full USA road system with about 
24 million nodes and 58 million edges, plus 11 
subgraphs obtained by cutting it along different 
bounding boxes as shown in Table |. Graphs in 
the collection include also node coordinates and 
are given in DIMACS format. 

Implementation Challenge for Shortest Paths, Table 1
USA road networks derived from the TIGER/Line collection

Name   Description                  Nodes
USA    Full USA                23 947 347
CTR    Central USA             14 081 816
W      Western USA              6 262 104
E      Eastern USA              3 598 623
LKS    Great Lakes              2 758 119
CAL    California and Nevada    1 890 815
NE     Northeast USA            1 524 453
NW     Northwest USA            1 207 945
FLA    Florida                  1 070 376
COL    Colorado                   435 666
BAY    Bay Area                   321 270
NY     New York City              264 346

The benchmark input package also features
query generators for the single-source and
point-to-point shortest path problems. For the
single-source version, sources are randomly chosen.
For the point-to-point problem, both random and
local queries are considered. Local queries of the
form (s, t) are generated by randomly picking t
among the nodes with rank in [2^i, 2^(i+1)) in the
ordering in which nodes are scanned by Dijkstra’s
algorithm with source s, for any parameter i.
Clearly, the smaller i is, the closer nodes s and
t are in the graph. Local queries are important to
test how the algorithms’ performance is affected
by the distance between query endpoints.
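
The local query rule can be sketched in a few lines of Python. This is an illustration only, not the Challenge's actual generator: the function names, the adjacency-list representation, and the convention that the source itself has rank 0 are assumptions made for the example.

```python
import heapq
import random


def dijkstra_scan_order(graph, source):
    """Return the nodes reachable from `source` in the order Dijkstra scans them."""
    dist = {source: 0}
    scanned = []
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        scanned.append(u)
        for v, length in graph.get(u, ()):
            nd = d + length
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return scanned


def local_query(graph, s, i, rng=random):
    """Pick t uniformly among nodes whose rank from s lies in [2^i, 2^(i+1)).

    Assumption: ranks are 0-based positions in the scan order, so s has rank 0.
    """
    order = dijkstra_scan_order(graph, s)
    lo, hi = 2 ** i, min(2 ** (i + 1), len(order))
    if lo >= hi:
        raise ValueError("no node has a rank in the requested range")
    return s, order[rng.randrange(lo, hi)]


# Toy graph: a directed path 0 -> 1 -> ... -> 7 with unit arc lengths.
path = {u: [(u + 1, 1)] for u in range(7)}
s, t = local_query(path, 0, 2, random.Random(0))  # t is one of 4, 5, 6, 7
```

On a road network the scan order would be computed once per source and reused for every value of i, since a full Dijkstra run is the expensive step.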

The core input families of the 9th DIMACS
Implementation Challenge are available at the
URL: http://www.dis.uniroma1.it/~challenge9/download.shtml.


Experimental Results 


One of the main goals of the Challenge was to
compare different techniques and algorithmic
approaches. The most popular topic was the
point-to-point shortest path problem, studied by six
research groups in the context of the Challenge.
For this problem, participants were additionally
invited to join a competition aimed at assessing
the performance and the robustness of different
implementations. The competition consisted of
preprocessing a version of the full USA graph
of Table 1 with unit edge lengths and answering
a sequence of 1,000 random distance queries.
The details were announced on the first day of
the workshop and the results were due on the
second day. To compare experimental results by
different participants on different platforms, each
participant ran a Dijkstra benchmark code [7] on
the USA graph to do machine calibration. The
final ranking was made by considering each query
time divided by the time required by the
benchmark code on the same platform (benchmark
ratio). Other performance measures taken into
account were space usage and the average number
of nodes scanned by query operations.

Implementation Challenge for Shortest Paths, Table 1 (continued)
Arc counts and bounding boxes of the USA road networks

Name         Arcs   Bounding box    Bounding box
                    latitude (N)    longitude (W)
USA    58 333 344   -               -
CTR    34 292 496   [25.0; 50.0]    [79.0; 100.0]
W      15 248 146   [27.0; 50.0]    [100.0; 130.0]
E       8 778 114   [24.0; 50.0]    [-∞; 79.0]
LKS     6 885 658   [41.0; 50.0]    [74.0; 93.0]
CAL     4 657 742   [32.5; 42.0]    [114.0; 125.0]
NE      3 897 636   [39.5; 43.0]    [-∞; 76.0]
NW      2 840 208   [42.0; 50.0]    [116.0; 126.0]
FLA     2 712 798   [24.0; 31.0]    [79; 87.5]
COL     1 057 066   [37.0; 41.0]    [102.0; 109.0]
BAY       800 172   [37.0; 39.0]    [121; 123]
NY        733 846   [40.3; 41.3]    [73.5; 74.5]

Six point-to-point implementations were run 
successfully on the USA graph defined for the 
competition. Among them, the fastest query time 
was achieved by the HH-based transit code [14]. 
Results are reported in Table 2. Codes RE and 
REAL(16, 1) [9] were not eligible for the competition,
but were used by the organizers as proof that
the problem is feasible. Some other codes were 
not able to deal with the size of the full USA 
graph, or incurred runtime errors. 
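
The benchmark ratio used for the ranking is simple to compute once both timings are available. The sketch below is illustrative: the function name is an assumption and the timings are made-up numbers, not Challenge measurements.

```python
def benchmark_ratio(query_times_ms, dijkstra_time_ms):
    """Average query time divided by the Dijkstra benchmark time on the same machine."""
    average_query_ms = sum(query_times_ms) / len(query_times_ms)
    return average_query_ms / dijkstra_time_ms


# Made-up measurements, for illustration only.
ratio = benchmark_ratio([0.8, 1.0, 1.2], dijkstra_time_ms=4000.0)
print(f"{ratio:.2e}")
```

Because both timings come from the same platform, the ratio is dimensionless and can be compared across machines, at least approximately.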

Implementation Challenge for Shortest Paths, Table 2
Results of the Challenge competition on the USA graph
(23.9 million nodes and 58.3 million arcs) with unit arc
lengths. The benchmark ratio is the average query time
divided by the time required to answer a query using the
Challenge Dijkstra benchmark code on the same platform.
Query times and node scans are average values per query
over 1000 random queries

                        Preprocessing            Query
Code                    Time (min)  Space (MB)   Node scans  Time (ms)  Benchmark ratio
HH-based transit [14]   104         3664         n.a.        0.019      4.78 · 10^-6
TRANSIT [4]             720         n.a.         n.a.        0.052      10.77 · 10^-6
HH Star [6]             32          2662         1082        1.14       287.32 · 10^-6
REAL(16,1) [9]          107         2435         823         1.42       ...
HH with DistTab [6]     29          2101         1671        1.61       ...
RE [9]                  88          861          3065        ...        ...

Experimental results for other variants of the
shortest paths problem are described in the papers
presented at the Challenge Workshop.


URL to Code


Generators of problem families and benchmark
solvers for shortest paths problems are available
at the URL: http://www.dis.uniroma1.it/~challenge9/download.shtml.
Problem Definition


The Eighth DIMACS Implementation Challenge, 
sponsored by DIMACS, the Center for Discrete 
Mathematics and Theoretical Computer Science, 
concerned heuristics for the symmetric Traveling 
Salesman Problem. The Challenge began in June 
2000 and was organized by David S. Johnson, 
Lyle A. McGeoch, Fred Glover and César Rego. 
It explored the state-of-the-art in the area of TSP 
heuristics, with researchers testing a wide range 
of implementations on a common (and diverse) 
set of input instances. The Challenge remained 
ongoing in 2007, with new results still being 
accepted by the organizers and posted on the 
Challenge website: www.research.att.com/~dsj/ 
chtsp. A summary of the submissions through 
2002 appeared in a book chapter by Johnson and 
McGeoch [5]. 

Participants tested their heuristics on four 
types of instances, chosen to test the robustness 
and scalability of different approaches: 


1. The 34 instances that have at least 1000 cities 
in TSPLIB, the instance library maintained by 
Gerd Reinelt. 

2. A set of 26 instances consisting of points 
uniformly distributed in the unit square, with 
sizes ranging from 1000 to 10,000,000 cities. 
3. A set of 23 randomly generated clustered
instances, with sizes ranging from 1000 to 
316,000 cities. 

4. A set of 7 instances based on random distance 
matrices, with sizes ranging from 1000 to 
10,000 cities. 


The TSPLIB instances and generators for the 
random instances are available on the Challenge 
website. In addition, the website contains 
a collection of instances for the asymmetric TSP 
problem. 

For each instance upon which a heuristic was 
tested, the implementers reported the machine 
used, the tour length produced, the user time, 
and (if possible) memory usage. Some heuristics 
could not be applied to all of the instances, 
either because the heuristics were inherently geo- 
metric or because the instances were too large. 
To help facilitate timing comparisons between 
heuristics tested on different machines, partici- 
pants ran a benchmark heuristic (provided by the 
organizers) on instances of different sizes. The 
benchmark times could then be used to normal- 
ize, at least approximately, the observed running 
times of the participants’ heuristics. 

The quality of a tour was computed from 
a submitted tour length in two ways: as a ratio 
over the optimal tour length for the instance 
(if known), and as a ratio over the Held-Karp 
(HK) lower bound for the instance. The Concorde 
optimization package of Applegate et al. [1] was 
able to find the optimum for 58 of the instances in 
reasonable time. Concorde was used in a second 
way to compute the HK lower bound for all but 
the three largest instances. A third algorithm,
based on Lagrangian relaxation, was used to compute
an approximate HK bound, a lower bound on
the true HK bound, for the remaining instances. The
Challenge website reports on each of these three 
algorithms, presenting running times and a com- 
parison of the bounds obtained for each instance. 

The Challenge website permits a variety of 
reports to be created: 


1. For each heuristic, tables can be generated 
with results for each instance, including tour 
length, tour quality, and raw and normalized 
running times. 
2. For each instance, a table can be produced
showing the tour quality and normalized run- 
ning time of each heuristic. 

3. For each pair of heuristics, tables and graphs 
can be produced that compare tour quality and 
running time for instances of different type 
and size. 


Heuristics for which results were submitted 
to the Challenge fell into several broad cate- 
gories: 

Heuristics designed for speed. These heuris- 
tics — all of which target geometric instances — 
have running times within a small multiple of the 
time needed to read the input instance. Examples 
include the strip and spacefilling-curve heuris- 
tics. The speed requirement affects tour quality 
dramatically. Two of these algorithms produced
tours within 14 % of the HK lower bound for
a particular TSPLIB instance, but none came
within 25 % on the other 89 instances.

Tour construction heuristics. These heuristics 
construct tours in various ways, without seeking 
to find improvements once a single tour passing 
through all cities is found. Some are simple, 
such as the nearest-neighbor and greedy heuris- 
tics, while others are more complex, such as the 
famous Christofides heuristic. These heuristics 
offer a number of options in trading time for tour 
quality, and several produce tours within 15 % 
of the HK lower bound on most instances in 
reasonable time. The best of them, a variant of 
Christofides, produces tours within 8 % on uni- 
form instances but is much more time-consuming 
than the other algorithms. 

Simple local improvement heuristics. These 
include the well-known two-opt and three-opt 
heuristics and variants of them. These heuristics 
outperform tour construction heuristics in terms 
of tour quality on most types of instances. For 
example, 3-opt gets within about 3 % of the 
HK lower bound on most uniform instances. The 
submissions in this category explored various im- 
plementation choices that affect the time-quality 
tradeoff. 
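
As a concrete reference point, a plain 2-opt pass can be sketched as follows. This is a minimal textbook version for Euclidean points, not any participant's implementation; Challenge submissions typically relied on refinements such as neighbor lists that this quadratic scan omits. The function names are chosen for the example.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting `points` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n))


def two_opt(points, tour):
    """Repeatedly replace two tour edges by two shorter ones until no swap helps."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for a in range(n - 1):
            for b in range(a + 2, n):
                if a == 0 and b == n - 1:
                    continue  # these two edges share a city
                p, q = points[tour[a]], points[tour[a + 1]]
                r, s = points[tour[b]], points[tour[(b + 1) % n]]
                delta = (math.dist(p, r) + math.dist(q, s)
                         - math.dist(p, q) - math.dist(r, s))
                if delta < -1e-12:
                    # Reversing the segment swaps edges (p,q),(r,s) for (p,r),(q,s).
                    tour[a + 1:b + 1] = reversed(tour[a + 1:b + 1])
                    improved = True
    return tour


# Corners of the unit square, visited in a self-crossing order.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
better = two_opt(square, [0, 2, 1, 3])
```

Each full scan examines O(n^2) edge pairs, which is one reason the implementation choices mentioned above matter for the time-quality tradeoff on large instances.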

Lin-Kernighan and its variants. These
heuristics extend the local search neighborhood
used in 3-opt. Lin-Kernighan can produce high-quality
tours (for example, within 2 % of the HK
lower bound on uniform instances) in reasonable
time. One variant, due to Helsgaun [3],
obtains tours within 1 % on a wide variety
of instances, although the running time can be
substantial.

Repeated local search heuristics. These 
heuristics are based on repeated executions of 
a heuristic such as Lin-Kernighan, with random 
kicks applied to the tour after a local optimum is 
found. These algorithms can yield high-quality 
tours at increased running time. 

Heuristics that begin with repeated local
search. One example is the tour-merge heuristic [2],
which runs repeated local search multiple
times, builds a graph containing edges found in 
the best tours, and does exhaustive search within 
the resulting graph. This approach yields the best 
known tours for some of the instances in the 
Challenge. 

The submissions to the Challenge demon- 
strated the remarkable effectiveness of heuristics 
for the traveling salesman problem. They 
also showed that implementation details, such
as the choice of data structure or whether to
approximate aspects of the computation, can
affect running time and/or solution quality
greatly. Results for a given heuristic also varied
enormously depending on the type of instance to
which it was applied.


URL to Code 


www.research.att.com/~dsj/chtsp 
Problem Definition


A distributed system is composed of a collection
of n processes that communicate with one another.
Two means of interprocess communication
have been heavily studied. Message-
passing systems model computer networks where 
each process can send information over message 
channels to other processes. In shared-memory 
systems, processes communicate less directly by 
accessing information in shared data structures.
Distributed algorithms are often easier to de- 
sign for shared-memory systems because of their 
similarity to single-process system architectures. 
However, many real distributed systems are con- 
structed as message-passing systems. Thus, a key 
problem in distributed computing is the imple- 
mentation of shared memory in message-passing 
systems. Such implementations are also called 
simulations or emulations of shared memory. 

The most fundamental type of shared data 
structure to implement is a (read-write) register, 
which stores a value, taken from some domain 
D. It is initially assigned a value from D and 
can be accessed by two kinds of operations, read 
and write(v), where v ∈ D. A register may be
either single-writer, meaning only one process 
is allowed to write it, or multi-writer, meaning 
any process may write to it. Similarly, it may be 
either single-reader or multi-reader. Attiya and 
Welch [4] give a survey of how to build multi- 
writer, multi-reader registers from single-writer, 
single-reader ones. 

If reads and writes are performed one at a time, 
they have the following effects: a read returns 
the value stored in the register to the invoking 
process, and a write(v) changes the value stored 
in the register to v and returns an acknowl- 
edgment, indicating that the operation is com- 
plete. When many processes apply operations 
concurrently, there are several ways to specify 
a register’s behavior [14]. A single-writer reg- 
ister is regular if each read returns either the 
argument of the write that completed most re- 
cently before the read began or the argument 
of some write operation that runs concurrently 
with the read. (If there is no write that com- 
pletes before the read begins, the read may return 
either the initial value of the register or the 
value of a concurrent write operation.) A register
is atomic (see Linearizability) if each
operation appears to take place instantaneously. 
More precisely, for any concurrent execution, 
there is a total order of the operations such that 
each read returns the value written by the last 
write that precedes it in the order (or the initial 
value of the register, if there is no such write). 
Moreover, this total order must be consistent
with the temporal order of operations: if one 
operation finishes before another one begins, the 
former must precede the latter in the total order. 
Atomicity is a stronger condition than regularity, 
but it is possible to implement atomic registers 
from regular ones with some complexity over- 
head [12]. 

This article describes the problem of 
implementing registers in an asynchronous 
message-passing system in which processes 
may experience crash failures. Each process 
can send a message, containing a finite string, 
to any other process. To make the descriptions 
of algorithms more uniform, it is often assumed 
that processes can send messages to themselves. 
All messages are eventually delivered. In the 
algorithms described below, senders wait for an 
acknowledgment of each message before sending 
the next message, so it is not necessary to assume 
that the message channels are first-in-first-out. 
The system is totally asynchronous: there is no 
bound on the time required for a message to 
be delivered to its recipient or for a process to 
perform a step of local computation. A process 
that fails by crashing stops executing its code, 
but other processes cannot distinguish between 
a process that has crashed and one that is running 
very slowly. (Failures of message channels [3] 
and more malicious kinds of process failures [15] 
have also been studied.) 

A t-resilient register implementation provides 
programmes to be executed by processes to simu- 
late read and write operations. These programmes 
can include any standard control structures and 
accesses to a process’s local memory, as well as 
instructions to send a message to another process 
and to read the process’s buffer, where incoming 
messages are stored. The implementation should 
also specify how the processes’ local variables 
are initialized to reflect any initial value of the im- 
plemented register. In the case of a single-writer 
register, only one process may execute the write 
programme. A process may invoke the read and 
write programmes repeatedly, but it must wait 
for one invocation to complete before starting the 
next one. In any such execution where at most t
processes crash, each of a process’s invocations
of the read or write programme should eventually 
terminate. Each read operation returns a result 
from the set D, and these results should satisfy 
regularity or atomicity. 

Relevant measures of algorithm complexity 
include the number of messages transmitted in 
the system to perform an operation, the number 
of bits per message, and the amount of local 
memory required at each process. One measure of 
time complexity is the time needed to perform an 
operation, under the optimistic assumption that 
the time to deliver messages is bounded by Δ and
local computation is instantaneous (although algorithms
must work correctly even without these
assumptions).


Key Results 


Implementing a Regular Register 

One of the core ideas for implementing shared
registers in message-passing systems is a construction
that implements a regular single-writer
multi-reader register. It was introduced by Attiya,
Bar-Noy and Dolev [3] and made more
explicit by Attiya [2]. A write(v) sends the value
v to all processes and waits until a majority
of the processes (⌊n/2⌋ + 1, including the writer
itself) return an acknowledgment. A reader sends
a request to all processes for their latest values.
When it has received responses from a majority
of processes, it picks the most recently written
value among them. If a write completes before
a read begins, at least one process that answers
the reader has received the write’s value prior to
sending its response to the reader. This is because
any two sets that each contain a majority of the
processes must overlap. The time required by
operations when delivery times are bounded is
2Δ.
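
The majority rule can be illustrated with a small sequential simulation. This is a sketch under strong simplifying assumptions, not the algorithm of [3] or [2] as published: there is no real message passing or concurrency, the `responders` argument stands in for the set of processes whose replies arrive, timestamps are unbounded integers, and an operation that cannot reach a majority raises an error where the real operation would simply keep waiting. All class and method names are chosen for the example.

```python
class Replica:
    """One process's local copy: the latest (timestamp, value) pair it has seen."""

    def __init__(self, initial):
        self.ts, self.value = 0, initial

    def store(self, ts, value):
        if ts > self.ts:  # ignore out-of-date writes
            self.ts, self.value = ts, value
        return "ack"

    def latest(self):
        return self.ts, self.value


class RegularRegister:
    """Single-writer register replicated over n processes with majority quorums."""

    def __init__(self, n, initial=None):
        self.replicas = [Replica(initial) for _ in range(n)]
        self.majority = n // 2 + 1  # floor(n/2) + 1
        self.next_ts = 0            # counter kept by the single writer

    def _require_majority(self, responders):
        if len(set(responders)) < self.majority:
            raise RuntimeError("no majority responded; a real operation would keep waiting")

    def write(self, value, responders):
        self._require_majority(responders)
        self.next_ts += 1
        for i in set(responders):
            self.replicas[i].store(self.next_ts, value)

    def read(self, responders):
        self._require_majority(responders)
        replies = [self.replicas[i].latest() for i in set(responders)]
        return max(replies, key=lambda reply: reply[0])[1]


reg = RegularRegister(n=5, initial="init")
reg.write("a", responders=[0, 1, 2])    # acknowledged by a majority
value = reg.read(responders=[2, 3, 4])  # overlaps the write quorum at process 2
```

In this run the read sees the written value only because process 2 belongs to both majorities, which is exactly the overlap argument made above.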

This algorithm requires the reader to deter- 
mine which of the values it receives is most 
recent. It does this using timestamps attached to 
the values. If the writer uses increasing integers as 
timestamps, the messages grow without bound as 
the algorithm runs. Using the bounded timestamp
scheme of Israeli and Li [13] instead yields the 
following theorem. 


Theorem 1 (Attiya [2]) There is a ⌈(n − 2)/2⌉-
resilient implementation of a regular single-
writer, multi-reader register in a message-passing
system of n processes. The implementation uses
O(n) messages per operation, with Θ(n^3) bits
per message. The writer uses O(n^4) bits of
local memory and each reader uses Θ(n^3)
bits.


Theorem 1 is optimal in terms of fault-tolerance.
If ⌈n/2⌉ processes can crash, the network can be
partitioned into two halves of size ⌊n/2⌋, with
messages between the two halves delayed
indefinitely. A write must terminate before any
evidence of the write is propagated to the half not
containing the writer, and then a read performed
by a process in that half cannot return an
up-to-date value. For t ≥ ⌈n/2⌉, registers can be
implemented in a message-passing system only if some
degree of synchrony is present in the system. The
exact amount of synchrony required was studied
by Delporte-Gallet et al. [6].

Theorem 1 is within a constant factor of the
optimal number of messages per operation.
Evidence of each write must be transmitted to at
least ⌈n/2⌉ − 1 processes, requiring Ω(n)
messages; otherwise this evidence could be
obliterated by crashes. A write must terminate even
if only ⌊n/2⌋ + 1 processes (including the writer)
have received information about the value
written, since the rest of the processes could have
crashed. Thus, a read must receive information
from at least ⌈n/2⌉ processes (including itself) to
ensure that it is aware of the most recent write
operation.

A t-resilient implementation, for t < ⌈n/2⌉, that
uses Θ(t) messages per operation is obtained by
the following adaptation. A set of 2t + 1
processes is preselected to be data storage servers.
Writes send information to the servers, and wait
for t + 1 acknowledgments. Reads wait for
responses from t + 1 of the servers and choose the
one with the latest timestamp.
Implementing an Atomic Register

Attiya, Bar-Noy and Dolev [3] gave a construction
of an atomic register in which readers forward
the value they return to all processes and
wait for an acknowledgment from a majority.
This is done to ensure that a read does not
return an older value than another read that
precedes it. Using unbounded integer timestamps,
this algorithm uses Θ(n) messages per operation.
The time needed per operation when delivery
times are bounded is 2Δ for writes and 4Δ for
reads. However, their technique of bounding the
timestamps increases the number of messages
per operation to O(n^2) (and the time per
operation to 12Δ). A better implementation of atomic
registers with bounded message size is given
by Attiya [2]. It uses the regular registers of
Theorem 1 to implement atomic registers using
the “handshaking” construction of Haldar and
Vidyasankar [12], yielding the following result.


Theorem 2 (Attiya [2]) There is a ⌈(n − 2)/2⌉-
resilient implementation of an atomic single-
writer, multi-reader register in a message-passing
system of n processes. The implementation uses
Θ(n) messages per operation, with Θ(n^3) bits
per message. The writer uses O(n^5) bits of local
memory and each reader uses Θ(n^4) bits.


Since atomic registers are regular, this algorithm 
is optimal in terms of fault-tolerance and within 
a constant factor of optimal in terms of the number
of messages. The time used when delivery
times are bounded is at most 14Δ for writes and
18Δ for reads.


Applications 


Any distributed algorithm that uses shared 
registers can be adapted to run in a message- 
passing system using the implementations 
described above. This approach yielded new 
or improved message-passing solutions for 
a number of problems, including randomized 
consensus [1], multi-writer registers [4], and
snapshot objects (see Distributed Snapshots). The
reverse simulation is also possible, using
a straightforward implementation of message 
channels by single-writer, single-reader registers. 
Thus, the two asynchronous models are 
equivalent, in terms of the set of problems 
that they can solve, assuming only a minority 
of processes crash. However, there is some
complexity overhead in using the simulations. 

If a shared-memory algorithm is implemented 
in a message-passing system using the algorithms 
described here, processes must continue to oper- 
ate even when the algorithm terminates, to help 
other processes execute their reads and writes. 
This cannot be avoided: if each process must 
stop taking steps when its algorithm terminates, 
there are some problems solvable with shared 
registers that are not solvable in the message- 
passing model [5]. 

Using a majority of processes to “validate” 
each read and write operation is an example 
of a quorum system, originally introduced for 
replicated data by Gifford [10]. In general, a quo- 
rum system is a collection of sets of processes, 
called quorums, such that every two quorums 
intersect. Quorum systems can also be designed 
to implement shared registers in other models 
of message-passing systems, including dynamic 
networks and systems with malicious failures. 
For examples, see [7, 9, 11, 15]. 
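
The defining property of a quorum system, that every two quorums intersect, is easy to check mechanically for small examples. The sketch below uses hypothetical helper names and enumerates all majorities explicitly, which is practical only for very small n.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """True if every two quorums in the collection share at least one process."""
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))


def majorities(n):
    """All subsets of {0, ..., n-1} of size floor(n/2) + 1."""
    return [set(c) for c in combinations(range(n), n // 2 + 1)]


majority_ok = is_quorum_system(majorities(5))   # majorities always intersect
disjoint_ok = is_quorum_system([{0, 1}, {2, 3}])  # two disjoint sets do not
```

Majorities are only one choice; any collection that passes this check supports the same overlap argument used for the register constructions above.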


Open Problems 


Although the algorithms described here are op- 
timal in terms of fault-tolerance and message 
complexity, it is not known if the number of bits 
used in messages and local memory is optimal. 
The exact time needed to do reads and writes 
when messages are delivered within time Δ is
also a topic of ongoing research. (See, for exam- 
ple, [8].) As mentioned above, the simulation of 
shared registers can be used to implement shared- 
memory algorithms in message-passing systems. 
However, because the simulation introduces con- 
siderable overhead, it is possible that some of 
those problems could be solved more efficiently 
by algorithms designed specifically for message-
passing systems. 
Problem Definition


Ensuring truthful evaluation of alternatives in 
human activities has always been an important 
issue throughout history. In sports, in particular, 
such an issue is vital and practice of the fair-play 
principle has been consistently put forward as a 
matter of foremost priority. In addition to relying 
on the code of ethics and professional responsi- 
bility of players and coaches, the design of game 
rules is an important measure in enforcing fair 
play. 

Ranking alternatives through pairwise com- 
parisons (or competitions) is the most common 
approach in sports tournaments. Its goal is to 
find out the “true” ordering among alternatives
through complete or partial pairwise competi- 
tions [1, 3-7]. Such studies have been mainly 
based on the assumption that all the players play 
truthfully, i.e., with their maximal effort. It is, 
however, possible that some players form a coali- 
tion and cheat for group benefit. An interesting 
example can be found in [2]. 


Problem Description 


The work of Chen, Deng, and Liu [2] considers 
the problem of choosing m winners out of n 
candidates. 

Suppose a tournament is held among n players
P_n = {p_1, ..., p_n} and m winners are expected
to be selected by a selection protocol. Here a
protocol f_{n,m} is a predefined function (which will
become clear later) to choose winners through
pairwise competitions, with the intention of finding
m players of highest capacity. When the tournament
starts, a distinct ID in N_n = {1, 2, ..., n}
is assigned to each player in P_n by a randomly
picked indexing function I : P_n → N_n. Then
a match is played between each pair of players.
The competition outcomes will form a graph G,
whose vertex set is N_n and whose edges represent the
results of all the matches. Finally, the graph will
be treated as the input to f_{n,m}, and it will output
a set of m winners. Now it should be clear that
f_{n,m} maps every possible tournament graph G to
a subset (of cardinality m) of N_n.

Suppose there exists a group of bad players
who play dishonestly, i.e., they might lose a
match on purpose to gain overall benefit for the
whole group, while the rest of the players always
play truthfully, i.e., they try their best to win
matches. The group of bad players gains benefit if
they are able to have more winning positions than
they would according to the true ranking. Given
knowledge of the selection protocol f_{n,m}, the indexing
function I, and the true ranking of all players, the
bad players try to find a cheating strategy that can
fool the protocol and gain benefit.

The problem is discussed under two models 
in which the characterizations of bad players are 
different. Under the collective incentive compat-
ible model, bad players are willing to sacrifice 
themselves to win group benefit, while the ones 
under the alliance incentive compatible model 
only cooperate if their individual interests are 
well maintained in the cheating strategy. 

The goal is to find an “ideal” protocol, under 
which players or groups of players maximize 
their benefits only by strictly following the fair- 
play principle, i.e., always play with maximal 
effort. 


Formal Definitions 


When the tournament begins, an indexing function
I is randomly picked, which assigns ID
I(p) ∈ N_n to each player p ∈ P_n. Then a
match is played between each pair of players,
and the results are represented as a directed graph
G. Finally, G is fed into the predefined selection
protocol f_{n,m}, to produce a set of m winners
I^{-1}(W), where W = f_{n,m}(G) ⊆ N_n.


Notations 


An indexing function I for a tournament attended
by players P_n = {p_1, p_2, ..., p_n} is a one-to-
one correspondence from P_n to the set of IDs:
N_n = {1, 2, ..., n}. A ranking function R is a one-
to-one correspondence from P_n to {1, 2, ..., n}.
R(p) represents the underlying true ranking of
player p among the n players. The smaller, the
stronger.

A tournament graph of size n is a directed
graph G = (N_n, E) such that for all i ≠ j ∈ N_n,
either ij ∈ E (player with ID i beats player
with ID j) or ji ∈ E. Let K_n denote the
set of all such graphs. A selection protocol f_{n,m},
which chooses m winners out of n candidates, is
a function from K_n to {S ⊆ N_n : |S| = m}.

A tournament T_n among players P_n is a pair
T_n = (R, B) where R is a ranking function from
P_n to N_n and B ⊆ P_n is the group of bad players.


Definition 1 (Benefit) Given a protocol f_{n,m}, a
tournament T_n = (R, B), an indexing function
I, and a tournament graph G ∈ K_n, the benefit
of the group of bad players is


Ben(f_{n,m}, T_n, I, G) = |{i ∈ f_{n,m}(G) : I^{-1}(i) ∈ B}|
                          − |{p ∈ B : R(p) ≤ m}|.


Given knowledge of f_{n,m}, T_n, and I, not every
G ∈ K_n is a feasible strategy for B, the group of
bad players. First, it depends on the tournament
T_n = (R, B), e.g., a player p_b ∈ B cannot beat
a player p_g ∉ B if R(p_b) > R(p_g). Second, it
depends on the property of bad players which is
specified by the model considered. Tournament
graphs that are recognized as feasible strategies
are characterized below, for each model.
The key difference is that a bad player in the
alliance incentive compatible model is not willing
to sacrifice his/her own winning position, while a
player in the other model fights for group benefit
at all costs.


Definition 2 (Feasible Strategy) Given f_{n,m},
T_n = (R, B), and I, graph G ∈ K_n is c-feasible
if


1. For every two players p_i, p_j ∉ B, if R(p_i) <
R(p_j), then I(p_i)I(p_j) ∈ E;

2. For all p_g ∉ B and p_b ∈ B, if R(p_g) <
R(p_b), then edge I(p_g)I(p_b) ∈ E.


Graph G ∈ K_n is a-feasible if it is c-feasible and
also satisfies


1. For every bad player p ∈ B, if R(p) ≤ m,
then I(p) ∈ f_{n,m}(G).


A cheating strategy is then a feasible tournament 
graph G that can be employed by the group of 
bad players to gain positive benefit. 


Definition 3 (Cheating Strategy) Given f_{n,m},
T_n = (R, B), and I, a cheating strategy for the
group of bad players under the collective incentive
compatible (alliance incentive compatible)
model is a graph G ∈ K_n which is c-feasible
(a-feasible) and satisfies Ben(f_{n,m}, T_n, I, G) > 0.

The following two problems are studied in
[2]: (1) Is there a protocol f_{n,m} such that for
all T_n and I no cheating strategy exists under
the collective incentive compatible model? (2) Is
there a protocol f_{n,m} such that for all T_n and
I, no cheating strategy exists under the alliance
incentive compatible model?


Key Results 


Definition 4 For all integers $n$ and $m$ such that
$2 < m < n - 2$, a tournament graph $G_{n,m} = (N_n, E) \in K_n$,
which consists of three parts $T_1$, $T_2$, and $T_3$,
is defined as follows:


1. $T_1 = \{1, 2, \ldots, m-2\}$.
For all $i < j \in T_1$, edge $ij \in E$;

2. $T_2 = \{m-1, m, m+1\}$.
$(m-1)m,\ m(m+1),\ (m+1)(m-1) \in E$;

3. $T_3 = \{m+2, m+3, \ldots, n\}$.
For all $i < j \in T_3$, edge $ij \in E$;

4. For all $i' \in T_i$ and $j' \in T_j$ such that $i < j$,
edge $i'j' \in E$.
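
The construction in Definition 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_gnm` and the edge representation (a set of ordered pairs, where `(i, j)` means that player i beats player j) are assumptions of this example, and the bounds check is the sketch's own (it only ensures that the middle part fits inside 1..n).

```python
from itertools import combinations


def build_gnm(n, m):
    """Return the edge set of G_{n,m}; (i, j) in the set means i beats j."""
    # Sketch-level check only: T2 = {m-1, m, m+1} must lie inside 1..n.
    if m < 2 or m + 1 > n:
        raise ValueError("T2 = {m-1, m, m+1} must fit inside 1..n")
    t1 = range(1, m - 1)          # T1 = {1, ..., m-2}
    t2 = [m - 1, m, m + 1]        # T2, the 3-cycle in the middle
    t3 = range(m + 2, n + 1)      # T3 = {m+2, ..., n}
    edges = set()
    # Inside T1 and T3 the smaller ID beats the larger ID.
    for part in (t1, t3):
        edges.update(combinations(part, 2))
    # Inside T2 the three players beat each other cyclically.
    edges.update({(m - 1, m), (m, m + 1), (m + 1, m - 1)})
    # Every player of an earlier part beats every player of a later part.
    parts = [t1, t2, t3]
    for a in range(3):
        for b in range(a + 1, 3):
            edges.update((i, j) for i in parts[a] for j in parts[b])
    return edges
```

For example, `build_gnm(6, 3)` yields a tournament on six players in which players 2, 3, and 4 form the cycle.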


Theorem 1 Under the collective incentive compatible
model, for every selection protocol $f_{n,m}$
with $2 < m < n - 2$, if $T_n = (R, B)$ satisfies
(1) at least one bad player ranks as high as
$m - 1$, (2) the ones ranked $m + 1$ and $m + 2$ are
both bad players, and (3) the one ranked $m$ is a
good player, then there always exists an indexing
function $I$ such that $G_{n,m}$ is a cheating strategy.


Theorem 2 Under the alliance incentive compatible
model, if $n - m > 3$, then there exists
a selection protocol $f_{n,m}$ [2] such that for every
tournament $T_n$, indexing function $I$, and
a-feasible strategy $G \in K_n$, $\mathrm{Ben}(f_{n,m}, T_n, I, G) \le 0$.


Applications 


The result shows that if players are willing to
sacrifice themselves, no protocol is able to prevent
malicious coalitions from obtaining undeserved
benefits.

The result may have potential applications in 
the design of output truthful mechanisms. 


Open Problems 


Under the collective incentive compatible model, 
the work of Chen, Deng, and Liu indicates that 
cheating strategies are available in at least 1/8 of 
tournaments, assuming the probability for each 
player to be in the bad group is 1/2. Could 
this bound be improved? Or could one find a 
good selection protocol in the sense that the 
number of tournaments with cheating strategies 
is close to this bound? On the other hand, al- 
though no ideal protocol exists in this model, 
does there exist any randomized protocol, under 
which the probability of having cheating strate- 
gies is negligible? 
Problem Definition


This problem is concerned with the efficient con- 
struction of an independent set of vertices (i.e., 
a set of vertices with no edges between them) 
with maximum cardinality, when the input is 
an instance of the uniform random intersection 
graphs model. This model was introduced by
Karoński, Scheinerman, and Singer-Cohen in [4]
and Singer-Cohen in [10], and it is defined as
follows.

Definition 1 (Uniform random intersection
graph) Consider a universe $M = \{1, 2, \ldots, m\}$ of
elements and a set of vertices $V = \{v_1, v_2, \ldots, v_n\}$.
If one assigns independently to each vertex
$v_j$, $j = 1, 2, \ldots, n$, a subset $S_{v_j}$ of $M$ by
choosing each element independently with
probability $p$ and puts an edge between two
vertices $v_{j_1}, v_{j_2}$ if and only if $S_{v_{j_1}} \cap S_{v_{j_2}} \neq \emptyset$,
then the resulting graph is an instance of the
uniform random intersection graph $G_{n,m,p}$.


The universe $M$ is sometimes called the label set and
its elements labels. Also, denote by $L_l$, for $l \in M$,
the set of vertices that have chosen label $l$.
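
As an illustration of the definition above, the following Python sketch samples an instance of the uniform model. The function names, the use of integers 1..n for vertices and 1..m for labels, and the optional `seed` parameter are assumptions made for this example only.

```python
import random
from itertools import combinations


def random_intersection_graph(n, m, p, seed=None):
    """Sample label sets S_v and the induced edge set of a G_{n,m,p} instance."""
    rng = random.Random(seed)
    # Each vertex picks each label independently with probability p.
    labels = {
        v: {l for l in range(1, m + 1) if rng.random() < p}
        for v in range(1, n + 1)
    }
    # Two vertices are adjacent iff their label sets intersect.
    edges = {
        (u, v)
        for u, v in combinations(range(1, n + 1), 2)
        if labels[u] & labels[v]
    }
    return labels, edges


def vertices_by_label(labels, m):
    """Return the sets L_l: the vertices that have chosen label l."""
    return {l: {v for v, s in labels.items() if l in s} for l in range(1, m + 1)}
```

Each set returned by `vertices_by_label` induces a clique in the sampled graph, which is the source of the edge dependencies discussed next.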

Because of the dependence of edges, this
model can abstract many real-life applications
more accurately than the Bernoulli random graphs
model $G_{n,\hat{p}}$, which assumes independence of edges.
Furthermore, Fill, Scheinerman, and
Singer-Cohen show in [3] that for some ranges
of the parameters $n, m, p$ ($m = n^{\alpha}$, $\alpha > 6$),
the spaces $G_{n,m,p}$ and $G_{n,\hat{p}}$ are equivalent
in the sense that the total variation distance
between the graph random variables has limit
0. The work of Nikoletseas, Raptopoulos, and
Spirakis [7] introduces two new models, namely
the general random intersection graphs model
$G_{n,m,\vec{p}}$, $\vec{p} = [p_1, p_2, \ldots, p_m]$, and the regular
random intersection graphs model $G_{n,m,\lambda}$, $\lambda > 0$,
which use a different way to randomly assign
labels to vertices, but the edge appearance
rule remains the same. The $G_{n,m,\vec{p}}$ model is
a generalization of the uniform model where
each label $i \in M$ is chosen independently with
probability $p_i$, whereas in the $G_{n,m,\lambda}$ model each
vertex chooses a random subset of $M$ with exactly
$\lambda$ labels.

The authors in [7] first consider the existence 
of independent sets of vertices of a given cardi- 
nality in general random intersection graphs and 
provide exact formulae for the mean and variance 
of the number of independent sets of vertices 
of cardinality k. Furthermore, they present and 
analyze three polynomial time (in the number
of labels $m$ and the number of vertices $n$)
algorithms for constructing large independent sets
of vertices when the input is an instance of the
$G_{n,m,p}$ model. To the best knowledge of the
entry authors, this work is the first to consider
algorithmic issues for these models of random
graphs.


Key Results 


The following theorems concern the existence of 
independent sets of vertices of cardinality k in 
general random intersection graphs. The proof of
Theorem 1 uses the linearity of expectation of
sums of random variables.


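Theorem 1 Let $X^{(k)}$ denote the number of independent
sets of size $k$ in a random intersection
graph $G(n, m, \vec{p})$, where $\vec{p} = [p_1, p_2, \ldots, p_m]$.
Then

\[ E\left[X^{(k)}\right] = \binom{n}{k} \prod_{i=1}^{m} \left( (1-p_i)^k + k p_i (1-p_i)^{k-1} \right). \]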
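
The formula of Theorem 1 is easy to evaluate numerically. The sketch below is only an illustration; the function name and the representation of $\vec{p}$ as a Python list are assumptions of this example. Each factor of the product is the probability that at most one of $k$ fixed vertices chooses label $i$.

```python
from math import comb, prod


def expected_independent_sets(n, k, probs):
    """Evaluate E[X^(k)] from Theorem 1 for label probabilities `probs` (k >= 1)."""
    # Factor for label i: no vertex of the k-set has it, or exactly one does.
    per_label = ((1 - p) ** k + k * p * (1 - p) ** (k - 1) for p in probs)
    return comb(n, k) * prod(per_label)
```

For instance, with $n = 3$, $k = 2$, and two labels of probability 1/2 each, every pair of vertices is independent with probability $(3/4)^2$, so the call `expected_independent_sets(3, 2, [0.5, 0.5])` evaluates to $3 \cdot 9/16 = 1.6875$.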


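Theorem 2 Let $X^{(k)}$ denote the number of independent
sets of size $k$ in a random intersection
graph $G(n, m, \vec{p})$, where $\vec{p} = [p_1, p_2, \ldots, p_m]$.
Then

\[ \mathrm{Var}\left(X^{(k)}\right) = \sum_{s=1}^{k} \binom{n}{2k-s} \binom{2k-s}{s} \left( \gamma(k,s) \frac{E\left[X^{(k)}\right]}{\binom{n}{k}} - \frac{E^2\left[X^{(k)}\right]}{\binom{n}{k}^2} \right), \]

where $E\left[X^{(k)}\right]$ is the mean number of independent
sets of size $k$ and

\[ \gamma(k,s) = \prod_{i=1}^{m} \left( (1-p_i)^{k-s} + (k-s) p_i (1-p_i)^{k-s-1} \left( 1 - \frac{s p_i}{1 + (k-1) p_i} \right) \right). \]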


Theorem 2 is proved by first writing the vari- 
ance as the sum of covariances and then apply- 
ing a vertex contraction technique that merges 
several vertices into one supervertex with sim- 
ilar probabilistic behavior in order to compute 
the covariances. By using the second moment
method (see [1]) one can derive thresholds for the
existence of independent sets of size k.

One of the three algorithms that were proposed 
in [7] is presented below. The algorithm starts 
with V (i.e., the set of vertices of the graph) as 
its “candidate” independent set. In every subse- 
quent step it chooses a label and removes from 
the current candidate independent set all vertices 
having that label in their assigned label set except 
for one. Because of the edge appearance rule, this 
ensures that after doing this for every label in M, 
the final candidate independent set will contain 
only vertices that do not have edges between 
them and so it will be indeed an independent set. 


Algorithm:
Input: A random intersection graph G_{n,m,p}.
Output: An independent set of vertices A_m.


set A_0 := V; set L := M;
for i = 1 to m do
begin
    select a random label l_i ∈ L; set L := L − {l_i};
    set D_i := {v ∈ A_{i−1} : l_i ∈ S_v};
    if (|D_i| ≥ 1) then select a random vertex u ∈ D_i and set D_i := D_i − {u};
    set A_i := A_{i−1} − D_i;
end
output A_m;
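
A direct Python transcription of the algorithm above may help in following the steps. It is a sketch under stated assumptions: the input is given as a dictionary `labels` mapping each vertex to its label set $S_v$ (not as an adjacency structure), labels are the integers 1..m, and the `seed` parameter is a convenience of this example. Drawing labels one by one without replacement is implemented as a random permutation.

```python
import random


def greedy_independent_set(labels, m, seed=None):
    """Return an independent set of the intersection graph defined by `labels`."""
    rng = random.Random(seed)
    candidate = set(labels)                 # A_0 := V
    order = list(range(1, m + 1))           # L := M
    rng.shuffle(order)                      # random label order, no repeats
    for label in order:
        # D_i: surviving vertices whose label set contains the current label.
        holders = sorted(v for v in candidate if label in labels[v])
        if holders:
            keep = rng.choice(holders)      # spare one random vertex u
            candidate -= set(holders) - {keep}
    return candidate                        # A_m
```

After all labels have been processed, every label is held by at most one surviving vertex, so no two surviving vertices share a label and the result is independent.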


The following theorem concerns the cardinality 
of the independent set produced by the algorithm. 
The analysis of the algorithm uses Wald’s equa- 
tion (see [9]) for sums of a random number of 
random variables to calculate the mean value of 
|Am|, and also Chernoff bounds (see e.g., [6]) for 
concentration around the mean. 


Theorem 3 For the case $mp = \alpha \log n$, for some
constant $\alpha > 1$ and $m \ge n$, and for some constant
$\beta > 0$, the following hold with high probability:


1. If $np \to \infty$ then $|A_m| \ge (1 - \beta) \frac{n}{\log n}$.

2. If $np \to b$ where $b > 0$ is a constant then
$|A_m| \ge (1 - \beta) n (1 - e^{-b})$.

3. If $np \to 0$ then $|A_m| \ge (1 - \beta) n$.

The above theorem shows that the algorithm
manages to construct a quite large independent
set with high probability.


Applications 


First of all, note that (as proved in [5]) any graph 
can be transformed into an intersection graph. 
Thus, the random intersection graphs models can 
be very general. Furthermore, for some ranges
of the parameters $n, m, p$ ($m = n^{\alpha}$, $\alpha > 6$) the
spaces $G_{n,m,p}$ and $G_{n,\hat{p}}$ are equivalent (as proved
by Fill, Scheinerman, and Singer-Cohen in [3],
showing that in this range the total variation
distance between the graph random variables has
limit 0).

Second, random intersection graphs (and in
particular the general intersection graphs model
of [7]) may model real-life applications more
accurately (compared to the $G_{n,\hat{p}}$ case). In particular,
such graphs can model resource allocation in
networks, e.g., when network nodes (abstracted 
by vertices) access shared resources (abstracted 
by labels): the intersection graph is in fact the 
conflict graph of such resource allocation prob- 
lems. 


Other Related Work 

In their work [4], Karoński et al. consider the
problem of the emergence of graphs with a constant
number of vertices as induced subgraphs
of $G_{n,m,p}$ graphs. By observing that the $G_{n,m,p}$
model generates graphs via clique covers (for
example, the sets $L_l$, $l \in M$, constitute an obvious
clique cover), they devise a natural way to use
them together with the first and second moment
methods in order to find thresholds for the appearance
of any fixed graph $H$ as an induced subgraph
of $G_{n,m,p}$ for various values of the parameters $n$,
$m$, and $p$.

The connectivity threshold for $G_{n,m,p}$ was
considered by Singer-Cohen in [10]. She studies
the case $m = n^{\alpha}$, $\alpha > 0$, and distinguishes
two cases according to the value of $\alpha$. For the
case $\alpha > 1$, the results look similar to the $G_{n,p}$
graphs, as the mean number of edges at the connectivity
thresholds is (roughly) the same. On
the other hand, for $\alpha \le 1$ we get denser graphs
in the $G_{n,m,p}$ model. Besides connectivity, [10]
also examines the size of the largest clique in
uniform random intersection graphs for certain
values of $n$, $m$, and $p$.

The existence of Hamilton cycles in $G_{n,m,p}$
graphs was considered by Efthymiou and
Spirakis in [2]. The authors use coupling
arguments to show that the threshold of
appearance of Hamilton cycles is quite close
to the connectivity threshold of $G_{n,m,p}$. Efficient
probabilistic algorithms for finding Hamilton
cycles in uniform random intersection graphs
were presented by Raptopoulos and Spirakis
in [8]. The analysis of those algorithms verifies that
they perform well w.h.p. even for values of $p$ that
are close to the connectivity threshold of $G_{n,m,p}$.
Furthermore, in the same work, an expected
polynomial time algorithm for finding Hamilton
cycles in $G_{n,m,p}$ graphs with constant $p$ is given.

In [11], Stark gives approximations of the distribution
of the degree of a fixed vertex in the
$G_{n,m,p}$ model. More specifically, by applying
a sieve method, the author provides an exact
formula for the probability generating function of
the degree of some fixed vertex and then analyzes
this formula for different values of the parameters
$n$, $m$, and $p$.


Open Problems 


A number of problems related to random in- 
tersection graphs remain open. Nearly all the 
algorithms proposed so far concerning construct- 
ing large independent sets and finding Hamilton 
cycles in random intersection graphs are greedy. 
An interesting and important line of research 
would be to find more sophisticated algorithms 
for these problems that outperform the greedy 
ones. Also, all these algorithms were presented 
and analyzed in the uniform random intersection 
graphs model. Very little is known about how the 
same algorithms would perform when their input 
was an instance of the general or even the regular 
random intersection graph models. 

Of course, many classical problems concerning
random graphs have not yet been studied.
One such example is the size of the minimum
dominating set (i.e., a set of vertices that has
the property that all vertices of the graph either
belong to this set or are connected to it) in a random
intersection graph. Also, what is the degree
sequence of $G_{n,m,p}$ graphs? Note that this is very
different from the problem addressed in [11].

Finally, notice that none of the results presented
in the bibliography for general or uniform
random intersection graphs carries over immediately
to regular random intersection graphs. Of
course, for some values of $n$, $m$, $p$, and $\lambda$, certain
graph properties shown for $G_{n,m,p}$ could also be
proved for $G_{n,m,\lambda}$ by showing concentration of
the number of labels chosen by any vertex via
Chernoff bounds. Other than that, the fixed sizes
of the sets assigned to each vertex impose more
dependencies on the model.
Problem Definition


Consider a text $S[1 \ldots n]$ over a finite alphabet
$\Sigma$. The problem is to build an index for $S$ such
that for any query pattern $P[1 \ldots m]$ and any integer
$k > 0$, all locations in $S$ that match $P$ with at
most $k$ errors can be reported efficiently. If the error
is measured in terms of the Hamming distance
(number of character substitutions), the problem
is called the k-mismatch problem. If the error is
measured in terms of the edit distance (number of
character substitutions, insertions, or deletions),
the problem is called the k-difference problem. The
two problems are formally defined as follows.


Problem 1 (k-mismatch problem) Consider a
text $S[1 \ldots n]$ over a finite alphabet $\Sigma$. For any
pattern $P$ and threshold $k$, position $i$ is an occurrence
of $P$ if the Hamming distance between $P$
and $S[i \ldots i']$ is at most $k$ for some $i'$. The
k-mismatch problem asks for an index $I$ for $S$ such
that, for any pattern $P$, all occurrences of $P$ in $S$
can be reported efficiently.


Problem 2 (k-difference problem) Consider a
text $S[1 \ldots n]$ over a finite alphabet $\Sigma$. For any
pattern $P$ and threshold $k$, position $i$ is an occurrence
of $P$ if the edit distance between $P$
and $S[i \ldots i']$ is at most $k$ for some $i'$. The
k-difference problem asks for an index $I$ for $S$ such
that, for any pattern $P$, all occurrences of $P$ in $S$
can be reported efficiently.
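
To make the two occurrence definitions concrete, the following Python sketch reports occurrences by brute force, without any index. It is only meant to pin down what an index must return, not how an index works. The function names are assumptions of this example, positions are 0-based (the text uses 1-based positions), and an occurrence is counted when the distance is at most $k$.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Start positions i with Hamming distance(pattern, text[i:i+m]) <= k."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if sum(a != b for a, b in zip(text[i:i + m], pattern)) <= k
    ]


def k_difference_occurrences(text, pattern, k):
    """Start positions i such that some non-empty text[i:j] is within
    edit distance k of the pattern."""
    m = len(pattern)
    hits = []
    for i in range(len(text)):
        # A match with at most k edits has length at most m + k.
        window = text[i:i + m + k]
        prev = list(range(m + 1))      # distances of pattern[:j] to ""
        best = None
        for t, ch in enumerate(window, 1):
            cur = [t]                  # distance of "" to window[:t]
            for j in range(1, m + 1):
                cur.append(min(
                    prev[j] + 1,                          # skip text char
                    cur[j - 1] + 1,                       # skip pattern char
                    prev[j - 1] + (pattern[j - 1] != ch)  # match/substitute
                ))
            prev = cur
            best = prev[m] if best is None else min(best, prev[m])
        if best is not None and best <= k:
            hits.append(i)
    return hits
```

With text "abcde", pattern "bd", and $k = 1$, the first function returns `[1, 2]` and the second returns `[1, 2, 3]`: position 3 matches only under edit distance, since "d" is obtained from "bd" by one deletion.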


These two problems are also called the indexed
inexact pattern matching problem or the indexed pattern
searching problem based on Hamming distance
or edit distance.

The major concern of these two problems is 
how to achieve efficient pattern searching without 
using a large amount of space for storing the 
index. 


Key Results 


For indexed k-mismatch or k-difference string
matching, a naive solution either requires an
index of size $\Omega(n^k)$ or supports the query
using $\Omega(m^k)$ time. The first non-trivial solution
is by Cole et al. [10]. They modify the suffix
tree to give an $O(n \log^k n)$-word index that
supports k-difference queries using
$O(m + \mathit{occ} + \frac{1}{k!}(c \log n)^k \log\log n)$ time. After that,
a number of indexes were proposed that support
k-mismatch/k-difference pattern queries for any
$k > 0$. All these indexes are created by
augmenting the suffix tree and its variants.
Tables 1 and 2 summarize the related results
in the literature for $k = 1$ and $k \ge 2$. Below, the
current best results are briefly summarized.


Indexed Approximate String Matching, Table 1 Known results for 1-difference matching. $\epsilon$ is some positive
constant smaller than 1 and occ is the number of 1-difference occurrences in the text


Space                                                 Running time

$O(|\Sigma| n \log n)$ words in avg                   $O(m + \mathit{occ})$ [15]
$O(|\Sigma| n \log n)$ words                          $O(m + \mathit{occ})$ in avg [15]
$O(n \log^2 n)$ words                                 $O(m \log n \log\log n + \mathit{occ})$ [1]
$O(n \log n)$ words                                   $O(m \log\log n + \mathit{occ})$ [4]
                                                      $O(m + \mathit{occ} + \log n \log\log n)$ [10]
$O(n)$ words                                          $O(\min\{n, |\Sigma| m^2\} + \mathit{occ})$ [8]
                                                      $O(|\Sigma| m \log n + \mathit{occ})$ [13]
                                                      $O(n^{\epsilon} \log n)$ [16]
                                                      $O(n^{\epsilon})$ [17]
                                                      $O(m + \mathit{occ} + |\Sigma| \log^2 n \log\log n)$ [5]
                                                      $O(m + \mathit{occ} + \log n \log\log n)$ [6]
$O(n (\log n \log\log n)^2 \log|\Sigma|)$ bits        $O(m + \mathit{occ})$ [2]
$O(n \sqrt{\log n} \log|\Sigma|)$ bits                $O(|\Sigma| m \log\log n + \mathit{occ})$ [14]
$O(n \log^{\epsilon} n \log|\Sigma|)$ bits            $O(|\Sigma| m + \mathit{occ})$ [3]
$O(n \log\log n \log|\Sigma|)$ bits                   $O((|\Sigma| m + \mathit{occ}) \log\log n)$ [3]
$O(n \log|\Sigma|)$ bits                              $O(|\Sigma| m \log^2 n + \mathit{occ} \log n)$ [13]
                                                      $O((|\Sigma| m \log\log n + \mathit{occ}) \log^{\epsilon} n)$ [14]
                                                      $O(m + (\mathit{occ} + |\Sigma| \log^3 n \log\log n) \log^{\epsilon} n)$ [5]

Indexed Approximate String Matching, Table 2 Known results for k-difference matching for $k \ge 2$. $c$
and $d$ are some positive constants and $\epsilon$ is some positive
constant smaller than 1. occ is the number of k-difference
occurrences in the text


Space                                                 Running time

$O(n^{1+\epsilon})$ words                             $O(m + \log\log n + \mathit{occ})$ [19]
$O(|\Sigma|^k n \log^k n)$ words in avg               $O(m + \mathit{occ})$ [15]
$O(|\Sigma|^k n \log^k n)$ words                      $O(m + \mathit{occ})$ in avg [15]
$O(n \log^k n)$ words in avg                          $O(3^k m^{k+1} + \mathit{occ})$ [9]
$O(\frac{d^k}{k!} n \log^k n)$ words                  $O(m + 3^k \mathit{occ} + \frac{1}{k!}(c \log n)^k \log\log n)$ [10]
$O(n \log^{k-1} n)$ words                             $O(m + k^3 3^k \mathit{occ} + \frac{1}{k!}(c \log n)^k \log\log n)$ [5]
$O(n)$ words                                          $O(\min\{n, |\Sigma|^k m^{k+2}\} + \mathit{occ})$ [8]
                                                      $O((|\Sigma| m)^k \max(k, \log n) + \mathit{occ})$ [13]
                                                      $O(m + k^3 3^k \mathit{occ} + (c \log n)^{k(k+1)} \log\log n)$ [5]
                                                      $O((2|\Sigma|)^{k-1} m^{k-1} \log n \log\log n + \mathit{occ})$ [6]
$O(n \sqrt{\log n} \log|\Sigma|)$ bits                $O((|\Sigma| m)^k (k + \log\log n) + \mathit{occ})$ [14]
$O(n \log|\Sigma|)$ bits                              $O((|\Sigma| m)^k \max(k, \log^2 n) + \mathit{occ} \log n)$ [13]
                                                      $O(((|\Sigma| m)^k (k + \log\log n) + \mathit{occ}) \log^{\epsilon} n)$ [14]
                                                      $O(m + (k^3 3^k \mathit{occ} + (c \log n)^{k^2+2k} \log\log n) \log^{\epsilon} n)$ [5]


Inexact Matching When k = 1


For the 1-mismatch and 1-difference approximate
matching problems, the theorems below give the
current best solutions. Both algorithms
handle long and short patterns separately. Short
patterns of size polylog(n) can be handled by brute force
using an index of size O(polylog(n)) space.
Long patterns can be handled with the help of
some augmented suffix tree.

When the index is of size $O(n \log|\Sigma|)$ bits,
the next theorem is the current best result.


Theorem 1 (Chan, Lam, Sung, Tam, and
Wong [5]) Given an index of size $O(n \log|\Sigma|)$
bits, 1-mismatch or 1-difference query can be
supported in $O(m + (\mathit{occ} + |\Sigma| \log^3 n \log\log n) \log^{\epsilon} n)$
time where $\epsilon$ is any positive constant
smaller than or equal to 1.


When we allow a bit more space, Belazzougui 
can further reduce the query time, as shown in the 
following theorem. 


Theorem 2 (Belazzougui [3]) Given an
index of size $O(n \log^{\epsilon} n \log|\Sigma|)$ bits (or
$O(n \log\log n \log|\Sigma|)$ bits, respectively),
1-mismatch/1-difference lookup can be supported
in $O(|\Sigma| m + \mathit{occ})$ (or $O((|\Sigma| m + \mathit{occ}) \log\log n)$,
respectively) time.


Inexact Matching When k ≥ 2


For the k-mismatch and k-difference approximate
matching problems where $k \ge 2$, existing solutions
are all based on the so-called k-error suffix
tree and its variants (following the idea of Cole
et al.).

Some current solutions create indexes whose 
sizes depend on k. Theorems 3-6 summarize the 
current best results in this direction. 


Theorem 3 (Maas and Nowak [15]) Given
an index of size $O(|\Sigma|^k n \log^k n)$ words,
k-mismatch/k-difference lookup can be supported
in $O(m + \mathit{occ})$ expected time.


Theorem 4 (Maas and Nowak [15]) Consider
a uniformly and independently generated text
of length $n$. There exists an index of size
$O(|\Sigma|^k n \log^k n)$ words on average such that
a k-mismatch/k-difference lookup query can
be supported in $O(m + \mathit{occ})$ worst-case
time.


Theorem 5 (Chan, Lam, Sung, Tam, and
Wong [5]) Given an index of size $O(n \log^{k-h+1} n)$
words where $h \le k$, k-mismatch lookup can be
supported in $O(m + \mathit{occ} + c^{k^2} \log^{\max(kh,\,k+h)} n \log\log n)$
time where $c$ is a positive constant.
For k-difference lookup, the term $\mathit{occ}$ becomes
$k^3 3^k \mathit{occ}$.


Theorem 6 (Chan, Lam, Sung, Tam, and
Wong [6]) Given an index of size $O(n \log^{k-1} n)$
words, k-mismatch/k-difference lookup can be
supported in $O(m + \mathit{occ} + \log^k n \log\log n)$
time.


Theorems 7-12 summarize the current best
results when the index size is independent of k.


Theorem 7 (Chan, Lam, Sung, Tam, and
Wong [5]) Given an index of size $O(n)$
words, k-mismatch lookup can be supported
in $O(m + \mathit{occ} + (c \log n)^{k(k+1)} \log\log n)$ time
where $c$ is a positive constant. For k-difference
lookup, the term $\mathit{occ}$ becomes $k^3 3^k \mathit{occ}$.


Theorem 8 (Chan, Lam, Sung, Tam, and
Wong [5]) Given an index of size $O(n \log|\Sigma|)$
bits, k-mismatch lookup can be supported in
$O(m + (\mathit{occ} + (c \log n)^{k(k+2)} \log\log n) \log^{\epsilon} n)$
time where $c$ is a positive constant and $\epsilon$ is any
positive constant smaller than or equal to 1.
For k-difference lookup, the term $\mathit{occ}$ becomes
$k^3 3^k \mathit{occ}$.


Theorem 9 (Lam, Sung, and Wong [14])
Given an index of size $O(n \sqrt{\log n} \log|\Sigma|)$
bits, k-mismatch/k-difference lookup can be
supported in $O((|\Sigma| m)^k (k + \log\log n) + \mathit{occ})$
time.


Theorem 10 (Lam, Sung, and Wong [14])
Given an index of size $O(n \log|\Sigma|)$ bits,
k-mismatch/k-difference lookup can be supported
in $O(((|\Sigma| m)^k (k + \log\log n) + \mathit{occ}) \log^{\epsilon} n)$ time
where $\epsilon$ is any positive constant smaller than or
equal to 1.


Theorem 11 (Chan, Lam, Sung, Tam, and
Wong [6]) Given an index of size $O(n)$
words, k-mismatch/k-difference lookup can be
supported in $O((2|\Sigma|)^{k-1} m^{k-1} \log n \log\log n + \mathit{occ})$
time.


Theorem 12 (Tsur [19]) Given an index of size
$O(n^{1+\epsilon})$ words, k-mismatch/k-difference lookup
can be supported in $O(m + \mathit{occ} + \log\log n)$ time.


Practically Fast Inexact Matching 


In addition, there are indexes which are efficient 
in practice for small k/m but give no worst-case 
complexity guarantees. Those methods are based 
on filtration. The basic idea is to partition the 
pattern into short segments and locate those short 
segments in the text allowing zero or a small 
number of errors. Those short segments help to 
identify candidate regions for the occurrences of 
the pattern. Finally, by verifying those candidate 
regions, all occurrences of the pattern are recov- 
ered. See [18] for a summary of those results. One 
of the best results based on filtration is stated in 
the following theorem. 


Theorem 13 (Myers [16] and Navarro and
Baeza-Yates [17]) If $k/m < 1 - O(1/\sqrt{|\Sigma|})$,
k-mismatch/k-difference search can be supported
in $O(n^{\epsilon})$ expected time, where $\epsilon$ is a positive
constant smaller than 1, with an index of size
$O(n)$ words.
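
The filtration idea described above can be illustrated with a small sketch for the k-mismatch case. It is not the method of [16] or [17]: it uses the standard pigeonhole observation that if the pattern is cut into $k + 1$ pieces, any occurrence with at most $k$ mismatches must contain at least one piece exactly. In this sketch the exact piece search uses Python's `str.find` as a stand-in for an index lookup, and the function name is an assumption of this example.

```python
def filtration_k_mismatch(text, pattern, k):
    """0-based start positions where pattern matches text with <= k mismatches."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k to form k+1 pieces")
    # Cut the pattern into k+1 non-empty pieces of nearly equal length.
    bounds = [j * m // (k + 1) for j in range(k + 2)]
    candidates = set()
    for j in range(k + 1):
        lo, hi = bounds[j], bounds[j + 1]
        piece = pattern[lo:hi]
        pos = text.find(piece)               # exact search (index stand-in)
        while pos != -1:
            start = pos - lo                 # implied start of the pattern
            if 0 <= start <= len(text) - m:
                candidates.add(start)
            pos = text.find(piece, pos + 1)
    # Verification step: keep only candidates within Hamming distance k.
    return sorted(
        s for s in candidates
        if sum(a != b for a, b in zip(text[s:s + m], pattern)) <= k
    )
```

The filter is effective when $k/m$ is small, because the pieces are then long and rarely occur by chance, so few candidate regions need verification.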


Other methods with good performance on av- 
erage include [11] and [12]. 

All the above approaches either try to index 
the strings with errors or are based on filtering. 
There are also solutions which use radically dif- 
ferent approaches. For instance, there are solu- 
tions which transform approximate string search- 
ing into range queries in metric space [7]. 


Applications 


Due to the advance in both the Internet and 
biological technologies, enormous text data is ac- 
cumulated. For example, 60G genomic sequence 
data are currently available in GenBank. The data 
size is expected to grow exponentially. 

To handle the huge data size, indexing tech- 
niques are vital to speed up the pattern matching 
queries. Moreover, exact pattern matching is no
longer sufficient for both the Internet and biological
data. For example, biological data usually
contains a lot of differences due to
imental errors and due to mutation and evolu- 
tion. Therefore, approximate pattern matching 
becomes more appropriate. This gives the mo- 
tivation for developing indexing techniques that 
allow pattern matching with errors. 


Open Problems 


The complexity for indexed approximate match- 
ing is still not fully understood. A number of 
questions are still open. For instance, there are 
two open questions: (1) Given a fixed index size 
of O(n) words, what is the best time complexity 
of a k-mismatch/k-difference query? (2) Fixing
the k-mismatch/k-difference query time to be
O(m + occ), what is the best space complexity
of the index?
Problem Definition


Regular expressions (REs) provide an expres- 
sive and powerful formalism for capturing the 
structure of messages, events, and documents. 
Consequently, they have been used extensively 
in the specification of a number of languages 
for important application domains, including the 
XPath pattern language for XML documents [6] 
and the policy language of the Border Gateway 
Protocol (BGP) for propagating routing informa- 
tion between autonomous systems in the Internet 
[12]. Many of these applications have to manage 
large databases of RE specifications and need to 
provide an effective matching mechanism that, 
given an input string, quickly identifies all the 
REs in the database that match it. This RE re- 
trieval problem is therefore important for a va- 
riety of software components in the middleware 
and networking infrastructure of the Internet. 

The RE retrieval problem can be stated as
follows: Given a large set $S$ of REs over an
alphabet $\Sigma$, where each RE $r \in S$ defines a
regular language $L(r)$, construct a data structure
on $S$ that efficiently answers the following query:
given an arbitrary input string $w \in \Sigma^*$, find the
subset $S_w$ of REs in $S$ whose defined regular
languages include the string $w$. More precisely,
$r \in S_w$ iff $w \in L(r)$. Since $S$ is a large, dynamic,
disk-resident collection of REs, the data structure
should be dynamic and provide efficient support
of updates (insertions and deletions) to $S$. Note
that this problem is the opposite of the more
traditional RE search problem where $S \subseteq \Sigma^*$ is
a collection of strings and the task is to efficiently
find all strings in $S$ that match an input regular
expression.


Notations 
An RE $r$ over an alphabet $\Sigma$ represents a subset
of strings in $\Sigma^*$ (denoted by $L(r)$) that can be
defined recursively as follows [9]: (1) the constants
$\epsilon$ and $\emptyset$ are REs, where $L(\epsilon) = \{\epsilon\}$ and
$L(\emptyset) = \emptyset$; (2) for any letter $a \in \Sigma$, $a$ is an
RE where $L(a) = \{a\}$; (3) if $r_1$ and $r_2$ are REs,
then their union, denoted by $r_1 + r_2$, is an RE
where $L(r_1 + r_2) = L(r_1) \cup L(r_2)$; (4) if $r_1$ and
$r_2$ are REs, then their concatenation, denoted by
$r_1.r_2$, is an RE where $L(r_1.r_2) = \{s_1 s_2 \mid s_1 \in L(r_1), s_2 \in L(r_2)\}$;
(5) if $r$ is an RE, then its
closure, denoted by $r^*$, is an RE where
$L(r^*) = L(\epsilon) \cup L(r) \cup L(rr) \cup L(rrr) \cup \cdots$; and (6) if $r$
is an RE, then a parenthesized $r$, denoted by $(r)$,
is an RE where $L((r)) = L(r)$. For example, if
$\Sigma = \{a, b, c\}$, then $(a + b).(a + b + c)^*.c$ is an
RE representing the set of strings that begin with
either an “a” or a “b” and end with a “c.” A string
$s \in \Sigma^*$ is said to match an RE $r$ if $s \in L(r)$.
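
The example RE above can be checked with Python's standard `re` module. This is only a notational illustration: Python writes union as `|` instead of `+` and concatenation by juxtaposition instead of `.`, and `fullmatch` is used because membership in $L(r)$ concerns the whole string. The helper name `matches` is an assumption of this example.

```python
import re

# (a + b).(a + b + c)*.c in the notation above, written in Python syntax.
EXAMPLE_RE = re.compile(r"(a|b)(a|b|c)*c")


def matches(s):
    """Return True iff the whole string s belongs to L(r) for the example RE."""
    return EXAMPLE_RE.fullmatch(s) is not None
```

For instance, `matches("ac")` and `matches("abcabc")` hold, whereas `matches("cac")` and `matches("ab")` do not.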

The language L(r) defined by an RE r can 
be recognized by a finite automaton (FA) M that 
decides if an input string w is in L(r) by reading 
each letter in w sequentially and updating its 
current state such that the outcome is determined 
by the final state reached by M after w has been 
processed [9]. Thus, M is an FA for r if the 
language accepted by M, denoted by L(M), is 
equal to L(r). An FA is classified as a determin- 
istic finite automaton (DFA) if its current state 
is always updated to a single state; otherwise, it 
is a nondeterministic finite automaton (NFA) if 
its current state could refer to multiple possible 
states. The trade-off between DFA and NFA
representations for an RE is that the latter is
more space efficient, while the former is more
time efficient for recognizing a matching string
by checking a single path of state transitions. Let
$|L(M)|$ denote the size of $L(M)$ and $|L_n(M)|$
denote the number of length-$n$ strings in $L(M)$.
Given a set $\mathcal{M}$ of finite automata, let $L(\mathcal{M})$
denote the language recognized by the automata
in $\mathcal{M}$; i.e., $L(\mathcal{M}) = \bigcup_{M_i \in \mathcal{M}} L(M_i)$.
M;EM 


Key Results 


The RE retrieval problem was first studied for a
restricted class of REs in the context of content-based
dissemination of XML documents using
XPath-based subscriptions (e.g., [1, 3, 7]), where
each XPath expression is processed in terms of a
collection of path expressions. While the XPath
language [6] allows rich patterns with tree structure
to be specified, the path expressions that it
supports lack the full expressive power of REs
(e.g., XPath does not permit the RE operators *,
+, and . to be arbitrarily nested in path expressions),
and thus extending these XML-filtering
techniques to handle general REs may not be
straightforward. Further, all of the XPath-based
methods are designed for indexing main-memory 
resident data. Another possible approach would 
be to coalesce the automata for all the REs 
into a single NFA and then use this structure 
to determine the collection of matching REs. 
It is unclear, however, if the performance of 
such an approach would be superior to a sim- 
ple sequential scan over the database of REs; 
furthermore, it is not easy to see how such a 
scheme could be adapted for disk-resident RE 
data sets. 

The first disk-based data structure that can 
handle the storage and retrieval of REs in their 
full generality is the RE-tree [4,5]. Similar to 
the R-tree [8], an RE-tree is a dynamic, height- 
balanced, hierarchical index structure, where the 
leaf nodes contain data entries corresponding to 
the indexed REs, and the internal nodes contain 
“directory” entries that point to nodes at the next 
level of the index. Each leaf node entry is of the
form $(id, M)$, where $id$ is the unique identifier of
an RE $r$ and $M$ is a finite automaton representing
$r$. Each internal node stores a collection of finite
automata, and each node entry is of the form
$(M, ptr)$, where $M$ is a finite automaton and $ptr$
is a pointer to some node $N$ (at the next level)
such that the following containment property is
satisfied: If $\mathcal{M}_N$ is the collection of automata
contained in node $N$, then $L(\mathcal{M}_N) \subseteq L(M)$.
The automaton $M$ is referred to as the bounding
automaton for $\mathcal{M}_N$. The containment property
is key to improving the search performance of
hierarchical index structures like RE-trees: if a
query string $w$ is not contained in $L(M)$, then
it follows that $w \notin L(M_i)$ for all $M_i \in \mathcal{M}_N$.
As a result, the entire subtree rooted at $N$ can
be pruned from the search space. Clearly, the
closer $L(M)$ is to $L(\mathcal{M}_N)$, the more effective
this search-space pruning will be.
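
The pruning effect of the containment property can be illustrated with a toy in-memory sketch. It is not the RE-tree implementation of [4, 5]: compiled Python regular expressions stand in for the stored automata, the classes `Leaf` and `Internal` and the hand-picked bounding expressions are assumptions of this example, and disk layout, node capacity, and updates are ignored.

```python
import re
from dataclasses import dataclass


@dataclass
class Leaf:
    entries: list   # [(re_id, compiled_pattern), ...]


@dataclass
class Internal:
    entries: list   # [(bounding_pattern, child_node), ...]


def search(node, w):
    """Return the ids of all indexed REs whose language contains w."""
    if isinstance(node, Leaf):
        return [rid for rid, rx in node.entries if rx.fullmatch(w)]
    hits = []
    for bound, child in node.entries:
        # Containment property: if the bounding automaton rejects w,
        # no RE in the subtree can accept it, so the subtree is skipped.
        if bound.fullmatch(w):
            hits.extend(search(child, w))
    return hits


# Two leaves; each bounding RE accepts a superset of its leaf's languages.
leaf1 = Leaf([(1, re.compile("a(a|b)*")), (2, re.compile("ab*"))])
leaf2 = Leaf([(3, re.compile("c+")), (4, re.compile("cb*"))])
root = Internal([
    (re.compile("a(a|b)*"), leaf1),
    (re.compile("c(b|c)*"), leaf2),
])
```

Here `search(root, "abb")` visits only the first leaf and returns `[1, 2]`, while `search(root, "cc")` visits only the second leaf and returns `[3]`.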

In general, there are an infinite number of
bounding automata for $\mathcal{M}_N$ with different degrees
of precision, from the least precise bounding
automaton with $L(M) = \Sigma^*$ to the most precise
bounding automaton, referred to as the minimal
bounding automaton, with $L(M) = L(\mathcal{M}_N)$.
Since the storage space for an automaton is de- 
pendent on its complexity (in terms of the number 
of its states and transitions), there is a space- 
precision trade-off involved in the choice of a 
bounding automaton for each internal node entry. 
Thus, even though minimal bounding automata 
result in the best pruning due to their tightness, it 
may not be desirable (or even feasible) to always 
store minimal bounding automata in RE-trees 
since their space requirement can be too large 
(possibly exceeding the size of an index node), 
thus resulting in an index structure with a low 
fan-out. Therefore, to maintain a reasonable fan- 
out for RE-trees, a space constraint is imposed 
on the maximum number of states (denoted by $\alpha$)
permitted for each bounding automaton in internal
RE-tree nodes. The automata stored in RE-tree
nodes are, in general, NFAs with a minimum
number of states. Also, for better space utilization,
each individual RE-tree node is required
to contain at least $m$ entries. Thus, the RE-tree
height is $O(\log_m(|S|))$.

RE-trees are conceptually similar to other hi- 
erarchical, spatial index structures, like the R-tree 
[8] that is designed for indexing a collection of 
multidimensional rectangles, where each internal 
entry is represented by a minimal bounding rect- 
angle (MBR) that contains all the rectangles in 
the node pointed to by the entry. RE-tree search 
simply proceeds top-down along (possibly) mul- 
tiple paths whose bounding automaton accepts 
the input string; RE-tree updates try to identify 
a “good” leaf node for insertion and can lead to 
node splits (or, node merges for deletions) that 
can propagate all the way up to the root. There is, 
however, a fundamental difference between the 
RE-tree and the R-tree in the indexed data types: 
regular languages typically represent infinite sets 
with no well-defined notion of spatial locality. 
This difference mandates the development of
novel algorithmic solutions for the core RE-tree
operations. To optimize for search performance,
the core RE-tree operations are designed to keep
each bounding automaton $M$ in every internal
node as "tight" as possible. Thus, if $M$ is the
bounding automaton for $\mathcal{M}_N$, then $L(M)$ should
be as close to $L(\mathcal{M}_N)$ as possible.

There are three core operations that need to be
addressed in the RE-tree context: (P1) selection
of an optimal insertion node, (P2) computing
an optimal node split, and (P3) computing an
optimal bounding automaton. The goal of (P1)
is to choose an insertion path for a new RE that
leads to "minimal expansion" in the bounding
automaton of each internal node of the
insertion path. Thus, given the collection of automata
$\mathcal{M}(N)$ in an internal index node $N$ and a new
automaton $M$, an optimal $M_i \in \mathcal{M}(N)$ needs to
be chosen to insert $M$ such that $|L(M_i) \cap L(M)|$
is maximum. The goal of (P2), which arises when
splitting a set of REs during an RE-tree node
split, is to identify a partitioning that results in
the minimal amount of "covered area" in terms
of the languages of the resulting partitions. More
formally, given the collection of automata
$\mathcal{M} = \{M_1, M_2, \ldots, M_k\}$ in an overflowed index node,
find the optimal partition of $\mathcal{M}$ into two disjoint
subsets $\mathcal{M}_1$ and $\mathcal{M}_2$ such that $|\mathcal{M}_1| \geq m$,
$|\mathcal{M}_2| \geq m$, and $|L(\mathcal{M}_1)| + |L(\mathcal{M}_2)|$ is
minimum. The goal of (P3), which arises during
insertions, node splits, or node merges, is to identify a
bounding automaton for a set of REs that does
not cover too much "dead space." Thus, given
a collection of automata $\mathcal{M}$, the goal is to find
the optimal bounding automaton $M$ such that
the number of states of $M$ is no more than $\alpha$,
$L(\mathcal{M}) \subseteq L(M)$, and $|L(M)|$ is minimum.

The objective of the above three operations is
to maximize the pruning during search by
keeping bounding automata tight. In (P1), the optimal
automaton $M_i$ selected (within an internal node)
to accommodate a newly inserted automaton $M$
is the one that maximizes $|L(M_i) \cap L(M)|$. The set of
automata $\mathcal{M}$ is split into two tight clusters in
(P2), while in (P3), the most precise automaton
(with no more than $\alpha$ states) is computed to cover
the set of automata in $\mathcal{M}$. Note that (P3) is unique
to RE-trees, while both (P1) and (P2) have their
equivalents in R-trees. The heuristic solutions
[2, 8] proposed for (P1) and (P2) in R-trees
aim to minimize the number of visits to nodes 
that do not lead to any qualifying data entries. 
Although the minimal bounding automata in RE- 
trees (which correspond to regular languages) are 
very different from the MBRs in R-trees, the 
intuition behind minimizing the area of MBRs 
(total area or overlapping area) in R-trees should 
be effective for RE-trees as well. The counterpart 
for area in an RE-tree is |L(M)|, the size of 
the regular language for M. However, since a 
regular language is generally an infinite set, new 
measures need to be developed for the size of a 
regular language or for comparing the sizes of 
two regular languages. 

One approach to compare the relative sizes of
two regular languages is based on the following
definition: for a pair of automata $M_i$ and $M_j$,
$L(M_i)$ is said to be larger than $L(M_j)$ if
there exists a positive integer $N$ such that for
all $k \geq N$,

$$\sum_{l=1}^{k} |L_l(M_i)| \geq \sum_{l=1}^{k} |L_l(M_j)|,$$

where $|L_l(M)|$ denotes the number of length-$l$ strings in $L(M)$.
Based on the above intuition, three increasingly
sophisticated measures are proposed to capture
the size of an infinite regular language. The
max-count measure simply counts the number
of strings in the language up to a certain size
$\lambda$; i.e.,

$$|L(M)| = \sum_{i=1}^{\lambda} |L_i(M)|.$$

This measure is useful for applications where the maximum
length of all the REs to be indexed is known and
is not too large, so that $\lambda$ can be set to some value
slightly larger than the maximum length of the
REs. A second, more robust measure that is less
sensitive to the value of the $\lambda$ parameter is the
rate-of-growth measure, which is based on the intuition
that a larger language grows at a faster rate than
a smaller language. The size of a language is
approximated by computing the rate of change
of its size from one "window" of lengths to the
next consecutive "window" of lengths: if $\lambda$ is a
length parameter that denotes the start of the first
window and $\theta$ is a window-size parameter, then

$$|L(M)| = \sum_{i=\lambda+\theta}^{\lambda+2\theta-1} |L_i(M)| \Big/ \sum_{i=\lambda}^{\lambda+\theta-1} |L_i(M)|.$$


As in the max-count measure, the parameters $\lambda$
and $\theta$ should be chosen to be slightly greater than
the number of states of $M$ to ensure that strings
involving a substantial portion of paths, cycles,
and accepting states are counted in each window.
However, there are cases where the
rate-of-growth measure also fails to capture the "larger
than" relationship between regular languages
[4]. To address some of the shortcomings of the
first two metrics, a third information-theoretic
measure is proposed that is based on Rissanen's
minimum description length (MDL) principle
[11]. The intuition is that if $L(M_i)$ is larger
than $L(M_j)$, then the per-symbol cost of an
MDL-based encoding of a random string in
$L(M_i)$ using $M_i$ is very likely to be higher than
that of a string in $L(M_j)$ using $M_j$, where the
per-symbol cost of encoding a string $w \in L(M)$
is the ratio of the cost of an MDL-based
encoding of $w$ using $M$ to the length of $w$. More
specifically, if $w = w_1.w_2.\cdots.w_n \in L(M)$
and $s_0, s_1, \ldots, s_n$ is the unique sequence of states
visited by $w$ in $M$, then the MDL-based encoding
cost of $w$ using $M$ is given by

$$\sum_{i} \lceil \log_2(n_i) \rceil,$$

where each $n_i$ denotes the number of transitions
out of state $s_i$, and $\log_2(n_i)$ is the number of
bits required to specify the transition out of state
$s_i$. Thus, a reasonable measure for the size of
a regular language $L(M)$ is the expected
per-symbol cost of an MDL-based encoding for a
random sample of strings in $L(M)$.
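
The first two measures can be sketched for the easy case in which $M$ is a DFA, where the counts $|L_i(M)|$ follow from a simple dynamic program over path counts. The function names, the dictionary encoding of the transition function, and the example automaton below are assumptions made for illustration only; they are not taken from [4].

```python
def count_by_length(delta, start, accepting, max_len):
    """Return [|L_0(M)|, ..., |L_max_len(M)|] for a DFA M.

    delta maps (state, symbol) -> state; a missing key means "no transition".
    """
    paths = {start: 1}  # number of length-i strings that lead to each state
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(c for q, c in paths.items() if q in accepting))
        nxt = {}
        for (q, _symbol), r in delta.items():
            if q in paths:
                nxt[r] = nxt.get(r, 0) + paths[q]
        paths = nxt
    return counts


def max_count(counts, lam):
    """Max-count measure: number of strings of length 1..lam."""
    return sum(counts[1:lam + 1])


def rate_of_growth(counts, lam, theta):
    """Ratio of the counts in the second window to those in the first.

    Requires len(counts) >= lam + 2 * theta and a non-empty first window.
    """
    second = sum(counts[lam + theta:lam + 2 * theta])
    first = sum(counts[lam:lam + theta])
    return second / first


# Example DFA over {a, b} accepting the strings that end in 'a'.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
counts = count_by_length(delta, start=0, accepting={1}, max_len=6)
print(counts)                        # [0, 1, 2, 4, 8, 16, 32]
print(max_count(counts, 3))          # 7
print(rate_of_growth(counts, 2, 2))  # 4.0
```

Because a DFA has exactly one run per string, counting paths counts strings; for an NFA the same path count would overcount strings with several accepting runs.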

To utilize the above metrics for measuring
$L(M)$, one common operation needed is the
computation of $|L_n(M)|$, the number of
length-$n$ strings in $L(M)$. While $|L_n(M)|$ can be
efficiently computed when $M$ is a DFA, the problem
becomes #P-complete when $M$ is an NFA [10].
Two approaches were proposed to approximate
$|L_n(M)|$ when $M$ is an NFA [10]. The first
approach is an unbiased estimator for $|L_n(M)|$,
which can be efficiently computed but can have
a very large standard deviation. The second
approach is a more accurate randomized algorithm
for approximating $|L_n(M)|$, but it is not very
useful in practice due to its high time
complexity of $O(n^{\log n})$. A more practical
approximation algorithm with a time complexity of
$O(n^2 |M|^2 \min\{|\Sigma|, |M|\})$ was proposed in [4].
The RE-tree operations (P1) and (P2) require
frequent computations of $|L(M_i \cap M_j)|$ and
$|L(M_i \cup M_j)|$ to be performed for pairs of
automata $M_i$, $M_j$. These computations can
adversely affect RE-tree performance since
construction of the intersection and union automaton
$M$ can be expensive. Furthermore, since the final
automaton $M$ may have many more states than
the two initial automata $M_i$ and $M_j$, the cost of
measuring $|L(M)|$ can be high. The performance
of these computations can, however, be optimized
by using sampling. Specifically, if the counts and
samples for each $L(M_i)$ are available, then this
information can be utilized to derive
approximate counts and samples for $L(M_i \cap M_j)$ and
$L(M_i \cup M_j)$ without incurring the overhead of
constructing the automata $M_i \cap M_j$ and $M_i \cup M_j$
and counting their sizes. The sampling techniques
used are based on the following results for
approximating the sizes of and generating uniform
samples of unions and intersections of arbitrary
sets:


Theorem 1 (Chan, Garofalakis, Rastogi [4])
Let $r_1$ and $r_2$ be uniform random samples of sets
$S_1$ and $S_2$, respectively.


1. $(|r_1 \cap S_2| \, |S_1|)/|r_1|$ is an unbiased estimator
of the size of $S_1 \cap S_2$.

2. $r_1 \cap S_2$ is a uniform random sample of $S_1 \cap S_2$
with size $|r_1 \cap S_2|$.

3. If the sets $S_1$ and $S_2$ are disjoint, then a
uniform random sample of $S_1 \cup S_2$ can be
computed in $O(|r_1| + |r_2|)$ time. If $S_1$ and $S_2$
are not disjoint, then an approximate uniform
random sample of $S_1 \cup S_2$ can be computed
with the same time complexity.
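
Parts 1 and 2 of the theorem can be sketched as follows. The function name and the use of a membership predicate `in_s2` (which, in the RE-tree setting, would be an acceptance test by the second automaton) are assumptions for illustration; the sets here are plain integer sets rather than regular languages.

```python
import random


def estimate_intersection(r1, size_s1, in_s2):
    """Estimate |S1 ∩ S2| from a uniform sample r1 of S1.

    Returns the estimate |r1 ∩ S2| * |S1| / |r1| (part 1) together with
    r1 ∩ S2, which serves as a sample of S1 ∩ S2 (part 2).
    """
    hits = [x for x in r1 if in_s2(x)]
    return len(hits) * size_s1 / len(r1), hits


# Illustration: S1 = {0, ..., 999}, S2 = multiples of 4, so |S1 ∩ S2| = 250.
rng = random.Random(0)
s1 = list(range(1000))
r1 = rng.sample(s1, 200)
estimate, sample = estimate_intersection(r1, len(s1), lambda x: x % 4 == 0)
print(estimate)  # varies with the sample; close to 250 on average
```

Only membership tests on sampled elements are needed, which is what lets the index avoid constructing the intersection automaton.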


Applications 


The RE retrieval problem also arises in the
context of XML document classification,
which identifies matching DTDs for XML
documents, as well as BGP routing, which
assigns appropriate priorities to BGP
advertisements based on their matching routing-system
sequences.


Experimental Results


Experimental results with synthetic data sets [5]
clearly demonstrate that the RE-tree index is
significantly more effective than performing a
sequential search for matching REs and, in a
number of cases, outperforms sequential search
by up to an order of magnitude.


Problem Definition


This entry is concerned with designing and
building indexes of a two-dimensional matrix, which
is basically the generalization of indexes of a
string, the suffix tree [12] and the suffix array
[11], to a two-dimensional matrix. This problem
was first introduced by Gonnet [7]. Informally,
a two-dimensional analog of the suffix tree is a
tree data structure storing all submatrices of an
$n \times m$ matrix, $n \geq m$. The submatrix tree [2]
is an incarnation of such indexes. Unfortunately,
building such indexes requires $\Omega(nm^2)$ time [2].
Therefore, much of the attention paid has been
restricted to square matrices and submatrices,
the important special case in which much better
results are available.

For square matrices, the Lsuffix tree and its
array form, storing all square submatrices of an
$n \times n$ matrix, have been proposed [3, 9, 10].
Moreover, the general framework for these index
families has also been introduced [4, 5]. Motivated by
LZ1-type image compression [14], the online
case, i.e., the matrix is given one row or column
at a time, has also been considered. These data
structures can be built in time close to $n^2$.
Building these data structures is a nontrivial extension
of the algorithms for the standard suffix tree and
suffix array. Generally, a tree data structure and
its array form of this type for square matrices are
referred to as the two-dimensional suffix tree and
the two-dimensional suffix array, which are the
main concerns of this entry.


Notations 

Let $A$ be an $n \times n$ matrix with entries defined
over a finite alphabet $\Sigma$. $A[i \ldots k, j \ldots l]$ denotes
the submatrix of $A$ with corners $(i, j)$, $(k, j)$,
$(i, l)$, and $(k, l)$. When $i = k$ or $j = l$, one of
the repeated indexes is omitted. For $1 \leq i, j \leq n$,
the suffix $A(i, j)$ of $A$ is the largest square
submatrix of $A$ that starts at position $(i, j)$ in $A$.
That is, $A(i, j) = A[i \ldots i + k, j \ldots j + k]$,
where $k = n - \max(i, j)$. Let $\$_i$ be a special
symbol not in $\Sigma$ such that $\$_i$ is lexicographically
smaller than any other character in $\Sigma$. Assume
that $\$_i$ is lexicographically smaller than $\$_j$ for
$i < j$. For notational convenience, assume that
the last entries of the $i$th row and column are $\$_i$.
This makes all suffixes distinct. See Fig. 1a, b for an
example.


Let $L\Sigma = \bigcup_{i=1}^{\infty} \Sigma^{2i-1}$. The strings of $L\Sigma$
are referred to as Lcharacters, and each of them
is considered as an atomic item. $L\Sigma$ is called
the alphabet of Lcharacters. Two Lcharacters are
equal if and only if they are equal as strings over
$\Sigma$. Moreover, given two Lcharacters $La$ and $Lb$
of equal length, $La$ is lexicographically smaller
than or equal to $Lb$ if and only if the string
corresponding to $La$ is lexicographically smaller
than or equal to that corresponding to $Lb$. A
chunk is the concatenation of Lcharacters with
the following restriction: an Lcharacter in $\Sigma^{2i-1}$
can precede only one in $\Sigma^{2(i+1)-1}$ and succeed
only one in $\Sigma^{2(i-1)-1}$. An Lstring is a chunk such
that the first Lcharacter is in $\Sigma$.

For dealing with matrices as strings, a
linear representation of square matrices is needed.
Given $A[1 \ldots n, 1 \ldots n]$, divide $A$ into $n$ L-shaped
characters. Let $a(i)$ be the concatenation of row
$A[i, 1 \ldots i - 1]$ and column $A[1 \ldots i, i]$. Then,
$a(i)$ can be regarded as an Lcharacter. The
linearized string of matrix $A$, called the Lstring
of matrix $A$, is the concatenation of
Lcharacters $a(1), \ldots, a(n)$. See Fig. 1c for an example.
Slightly different linearizations have been used
[9, 10, 13], but they are essentially the same in the
aspect of two-dimensional functionality.
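
The linearization just described can be sketched in a few lines. The function name `lstring` and the choice to represent each Lcharacter as a tuple are assumptions for illustration; the code uses 0-based indices, so the Lcharacter $a(i)$ of the text is stored at position $i - 1$.

```python
def lstring(A):
    """Return the Lstring of a square matrix A as a list of Lcharacters.

    A is a list of n rows, each of length n. Lcharacter a(i) is the row
    segment A[i, 1..i-1] followed by the column segment A[1..i, i].
    """
    n = len(A)
    lchars = []
    for i in range(n):  # 0-based i corresponds to a(i + 1) in the text
        row_part = tuple(A[i][:i])
        col_part = tuple(A[r][i] for r in range(i + 1))
        lchars.append(row_part + col_part)
    return lchars


A = [["a", "b"],
     ["c", "d"]]
print(lstring(A))  # [('a',), ('c', 'b', 'd')]
```

The $i$th Lcharacter has length $2i - 1$, which matches the definition of $L\Sigma$ as the union of the sets $\Sigma^{2i-1}$.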


Two-Dimensional Suffix Trees 

The suffix tree of matrix $A$ is a compacted trie
over the alphabet $L\Sigma$ that represents Lstrings
corresponding to all suffixes of $A$. Formally, the
two-dimensional suffix tree of matrix $A$ is a rooted
tree that satisfies the following conditions (see
Fig. 1d for an example):


1. Each edge is labeled with a chunk.

2. There is no internal node of outdegree one.

3. Chunks assigned to sibling edges start with
different Lcharacters, which are of the same
length as strings in $\Sigma^*$.

4. The concatenation of the chunks labeling the
edges on the path from the root to a leaf gives
the Lstring of exactly one suffix of $A$, say
$A(i, j)$. It is said that this leaf is associated
with $A(i, j)$.

5. There is exactly one leaf associated with each
suffix.


Conditions 4 and 5 mean that there is a
one-to-one correspondence between the leaves of the
tree and the suffixes of $A$ (which are all distinct
because $\$_i$ is unique).

[Figure 1 omitted: panels (a) through (e); see caption.]

Indexed Two-Dimensional String Matching, Fig. 1
(a) A matrix $A$, (b) the suffix $A(2,1)$ and Lcharacters
composing $A(2,1)$, (c) the Lstring of $A(2,1)$, (d) the suffix
tree of $A$, and (e) the suffix array of $A$ (omitting the
suffixes starting with $\$_i$)


Problem 1 (Construction of 2D suffix tree)


INPUT: An $n \times n$ matrix $A$.
OUTPUT: A two-dimensional suffix tree storing
all square submatrices of $A$.


Online Suffix Trees

Assume that $A$ is read online in row major order
(column major order can be considered
similarly). Let $A_t = A[1 \ldots t, 1 \ldots n]$ and
$row_t = A[t, 1 \ldots n]$. At time $t - 1$, nothing but $A_{t-1}$
is known about $A$. At time $t$, $row_t$ is read and
so $A_t$ is known. After time $t$, the online suffix
tree of $A$ stores all suffixes of $A_t$. Note that
Condition 4 may not be satisfied during the online
construction of the suffix tree. A leaf may be
associated with more than one suffix, because the
suffixes of $A_t$ are not all distinct.


Problem 2 (Online construction of 2D suffix tree)


INPUT: A sequence of rows of an $n \times n$ matrix $A$,
$row_1, row_2, \ldots, row_n$.

OUTPUT: A two-dimensional suffix tree storing
all square submatrices of $A_t$ after reading
$row_t$.


Two-Dimensional Suffix Arrays 

The two-dimensional suffix array of matrix $A$ is
basically a sorted list of all Lstrings
corresponding to suffixes of $A$. Formally, the $k$th element
of the array has the start position $(i, j)$ if and
only if the Lstring of $A(i, j)$ is the $k$th smallest
one among the Lstrings of all suffixes of $A$. See
Fig. 1e for an example. The two-dimensional
suffix array is also coupled with additional
information tables, called Llcp and Rlcp, to enhance
its performance like the standard suffix array. The
two-dimensional suffix array can be constructed
from the two-dimensional suffix tree in linear
time.
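
For intuition only, the definition can be turned into a naive construction that extracts every suffix, linearizes it, and sorts. This sketch is far slower than the algorithms cited in this entry, and it makes two simplifying assumptions that differ from the text: it omits the $\$_i$ end markers, so it breaks ties between equal Lstrings by start position, and it orders a proper prefix before any longer Lstring (Python's tuple ordering). The function names are hypothetical and indices are 0-based.

```python
def lstring(S):
    """Lstring of a square matrix S, as a tuple of Lcharacters (tuples)."""
    return tuple(
        tuple(S[i][:i]) + tuple(S[r][i] for r in range(i + 1))
        for i in range(len(S))
    )


def naive_suffix_array(A):
    """Start positions (i, j), sorted by the Lstrings of the suffixes A(i, j)."""
    n = len(A)

    def key(pos):
        i, j = pos
        k = n - max(i, j)  # side length of the largest square starting at (i, j)
        suffix = [row[j:j + k] for row in A[i:i + k]]
        return (lstring(suffix), pos)  # position breaks ties (no $ markers)

    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


print(naive_suffix_array(["ab", "ba"]))  # [(1, 1), (0, 0), (0, 1), (1, 0)]
```

Comparing Lcharacters as whole tuples mirrors the convention that each Lcharacter is an atomic item of the alphabet $L\Sigma$.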


Problem 3 (Construction of 2D suffix array) 


INPUT: An $n \times n$ matrix $A$.
OUTPUT: The two-dimensional suffix array
storing all square submatrices of $A$.


Submatrix Trees 

The submatrix tree is a tree data structure storing 
all submatrices. This entry just gives a result on 
submatrix trees. See [2] for details. 




Problem 4 (Construction of a submatrix tree) 


INPUT: An $n \times m$ matrix $B$, $n \geq m$.
OUTPUT: The submatrix tree and its array form 
storing all submatrices of B. 


Key Results 


Theorem 1 (Kim et al. 2011 [10], Cole and
Hariharan 2003 [1]) Given an $n \times n$ matrix $A$
over an integer alphabet, one can construct the
two-dimensional suffix tree in $O(n^2)$ time.


Kim and Park's result is a deterministic
algorithm, while Cole and Hariharan's result is a
randomized one. For an arbitrary alphabet, one
needs first to sort it and then to apply the theorem
above.


Theorem 2 (Na et al. 2007 [13]) Given an $n \times n$
matrix $A$, one can construct online the
two-dimensional suffix tree of $A$ in $O(n^2 \log n)$ time.


Theorem 3 (Kim et al. 2003 [9]) Given an $n \times n$
matrix $A$, one can construct the two-dimensional
suffix array of $A$ in $O(n^2 \log n)$ time without
constructing the two-dimensional suffix tree.


Theorem 4 (Giancarlo 1993 [2]) Given an $n \times m$
matrix $B$, one can construct the submatrix tree
of $B$ in $O(nm^2 \log(nm))$ time.


Applications 


Two-dimensional indexes can be used for many
pattern-matching problems of two-dimensional
applications such as low-level image processing,
image compression, visual databases, and so
on [3, 6]. Given an $n \times n$ text matrix and an
$m \times m$ pattern matrix over an alphabet $\Sigma$, the
two-dimensional pattern retrieval problem, which is
a basic pattern-matching problem, is to find all
occurrences of the pattern in the text. The
two-dimensional suffix tree and array of the text
can be queried in $O(m^2 \log |\Sigma| + occ)$ time
and $O(m^2 + \log n + occ)$ time, respectively,
where $occ$ is the number of occurrences of the
pattern in the text. This problem can be easily
extended to a set of texts. These queries have
the same procedure and performance as those of
indexes for strings. Online construction of the
two-dimensional suffix tree can be applied to
LZ1-type image compression [6].


Open Problems 


The main open problems on two-dimensional 
indexes are to construct indexes in optimal time. 
The linear-time construction algorithm for two- 
dimensional suffix trees is already known [10]. 
The online construction algorithm due to [13] 
is optimal for unbounded alphabets, but not for 
integer or constant alphabets. Another open prob- 
lem is to construct two-dimensional suffix arrays 
directly in linear time. 


Experimental Results 


An experiment that compares construction algo- 
rithms of two-dimensional suffix trees and suffix 
arrays was presented in [8]. Giancarlo’s algo- 
rithm [2] and Kim et al.’s algorithm [8] were im- 
plemented for two-dimensional suffix trees and 
suffix arrays, respectively. Random matrices of 
sizes 200 × 200 to 800 × 800 and alphabets of
sizes 2, 4, 16 were used for input data. According 
to experimental results, the construction of two- 
dimensional suffix arrays is ten times faster and 
five times more space efficient than that of
two-dimensional suffix trees.


Problem Definition


The theory of inductive inference is concerned 
with the capabilities and limitations of machine 
learning. Here the learning machine, the concepts 
to be learned, as well as the hypothesis space 
are modeled in recursion theoretic terms, based 
on the framework of identification in the limit 
[1,9, 15]. 

Formally, considering recursive functions 
(mapping natural numbers to natural numbers) 
as target concepts, a learner (inductive inference 
machine) is supposed to process, step by step, 
gradually growing initial segments of the graph 
of a target function. In each step, the learner 
outputs a program in some fixed programming 
system, where successful learning means that the 
sequence of programs returned in this process 
eventually stabilizes on some program actually 
computing the target function. 

Case and Smith [3,4] proposed several vari- 
ants of this model in order to study the influ- 
ence that certain constraints or relaxations may 
have on the capabilities of learners. Their models 
restrict (i) the number of mind changes (i.e., 
changes of output programs) a learner is allowed 
to make during the learning process and (ii) the 
number of errors the program eventually hypoth- 
esized may have when compared to the target 
function. 

One major result of studying the correspond- 
ing effects is a hierarchy of inference types culmi- 
nating in a model general enough to allow for the 
identification of the whole class of recursive func- 
tions by a single inductive inference machine. 


Notation 

The target concepts for learning in the model
discussed below are recursive functions [14]
mapping natural numbers to natural numbers.
Such functions, as well as partial recursive
functions in general, are considered as computable
in an arbitrary, but fixed Gödel numbering
$\varphi = (\varphi_i)_{i \in \mathbb{N}}$. Here $\mathbb{N} = \{0, 1, 2, \ldots\}$ denotes
the set of all natural numbers. $\varphi = (\varphi_i)_{i \in \mathbb{N}}$ is
interpreted as a programming system, where the
number $i \in \mathbb{N}$ is called a program for the partial
recursive function $\varphi_i$.

Suppose $f$ and $g$ are partial recursive
functions and $n \in \mathbb{N}$. Below, $f =^n g$ is written if
the set $\{x \in \mathbb{N} \mid f(x) \neq g(x)\}$ is of cardinality
at most $n$. If the set $\{x \in \mathbb{N} \mid f(x) \neq g(x)\}$ is
finite, this is denoted by $f =^* g$. One considers
$*$ as a special symbol for which the $<$-relation
is extended by $n < *$ for all $n \in \mathbb{N}$. For any
recursive $f$ and any $z \in \mathbb{N}$, let $f[z]$ denote
$(z, (f(0), \ldots, f(z)))$ for short.

For further basic recursion theoretic notions, 
the reader is referred to [14]. 


Learning Models 

Case and Smith [4] build their theory upon the
fundamental model of identification in the limit
[1, 9]. There a learner can be understood as an
algorithmic device, called an inductive inference
machine, which, given any "graph segment" $f[z]$
as its input, returns a program $i \in \mathbb{N}$. Such a
learner $M$ identifies a recursive function $f$ in the
limit, if there is some $j \in \mathbb{N}$ such that


$\varphi_j = f$ and $M(f[z]) = j$ for all but finitely many $z \in \mathbb{N}$.


A class of recursive functions is learnable in 
the limit, if there is an inductive inference ma- 
chine identifying each function in the class in 
the limit. Identification in the limit is called EX- 
identification, since a program for f is termed an 
explanation for f. 

For instance, the class of all primitive re- 
cursive functions is EX-identifiable, whereas the 
class of all recursive functions is not [9]. 
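
One classical way to obtain an EX-learner for a class whose members can be effectively enumerated as total functions is identification by enumeration: on each graph segment, output the first index whose function agrees with all data seen so far. The sketch below illustrates this with a short finite list of Python functions standing in for such an enumeration; the list, the function names, and the default guess are assumptions for illustration and are not the construction used in [4] or [9].

```python
def make_learner(hypotheses):
    """Build a learner over an enumeration of total functions.

    hypotheses[j] stands in for the function computed by program j.
    """
    def learner(values):
        # values = (f(0), ..., f(z)) is the data part of the segment f[z].
        for j, h in enumerate(hypotheses):
            if all(h(x) == y for x, y in enumerate(values)):
                return j  # first program consistent with the data
        return 0  # arbitrary guess when nothing in the list is consistent
    return learner


hypotheses = [lambda x: 0, lambda x: x, lambda x: x * x, lambda x: 2 * x]
learner = make_learner(hypotheses)

target = lambda x: x * x
guesses = [learner(tuple(target(x) for x in range(z + 1))) for z in range(5)]
print(guesses)  # [0, 1, 2, 2, 2]
```

On this target the sequence of guesses stabilizes on index 2 after two mind changes, which is the convergence behavior required by identification in the limit.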

The central question discussed by Case and 
Smith [4] is how the limitations of EX-learners 
are affected by posing certain requirements on the 
success criterion, concerning: 


• Convergence criteria:
  – e.g., when restricting the number of permitted
    mind changes
  – e.g., when relaxing the constraints on
    syntactical convergence of the sequence of
    programs returned in the learning process

• Accuracy:
  – e.g., when relaxing the number of permitted
    anomalies in the programs returned
    eventually


Problem 1 In which way do modifications of 
EX-identification in terms of accuracy and con- 
vergence criteria affect the capabilities of the 
corresponding learners? 


Problem 2 In particular, if inaccuracies are per- 
mitted, can EX-learners always refute inaccurate 
hypotheses? 


Problem 3 How much relaxation of the model
of EX-identification is needed to achieve
learnability of the full class of recursive
functions?


Key Results 


Accuracy and Convergence Constraints 

In order to systematically address these problems, 
Case and Smith [4] defined inference types 
reflecting restrictions and relaxations of EX- 
identification as follows. 


Definition 1 Suppose $S$ is a class of recursive
functions and $m, n \in \mathbb{N} \cup \{*\}$. $S$ is
$EX^m_n$-identifiable if there is an inductive inference
machine $M$, such that for any function $f \in S$, there
is some $j \in \mathbb{N}$ satisfying:


• $M(f[z]) = j$ for all but finitely many $z \in \mathbb{N}$.

• $\varphi_j =^m f$.

• The cardinality of the set $\{z \in \mathbb{N} \mid M(f[z]) \neq M(f[z + 1])\}$
  is at most $n$.


For intuition one may view $n$ as an upper bound
on the allowed number of "mind changes" and
$m$ as an upper bound on the allowed number of
"anomalies."

$EX^m_n$ denotes the set of all classes of
recursive functions which are $EX^m_n$-identifiable.


Definition 2 Suppose $S$ is a class of recursive
functions and $m \in \mathbb{N} \cup \{*\}$. $S$ is $BC^m$-identifiable
if there is an inductive inference machine $M$,
which, for any function $f \in S$, satisfies:


• $\varphi_{M(f[z])} =^m f$ for all but finitely many $z \in \mathbb{N}$.


$BC^m$ denotes the set of all classes of recursive
functions which are $BC^m$-identifiable. BC is short
for behaviorally correct; the difference to
EX-learning is that convergence of the sequence of
programs returned by the learner is defined only
in terms of semantics, no longer in terms of
syntax.


The Impact of Accuracy and Convergence 
Constraints 

In general, each permission of mind changes or 
anomalies increases the capabilities of learners; 
however, mind changes cannot be traded in for 
anomalies or vice versa. 


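Theorem 1 Let $a, b, c, d \in \mathbb{N} \cup \{*\}$. Then
$EX^a_b \subseteq EX^c_d$ if and only if $a \leq c$ and $b \leq d$.


Corollary 1 For any $m, n \in \mathbb{N}$, the following
inclusions hold.


1. $EX^m_n \subset EX^{m+1}_n \subset EX^*_n$.
2. $EX^m_n \subset EX^m_{n+1} \subset EX^m_*$.


Theorem 2 Let $n \in \mathbb{N}$. Then
$EX^* \subset BC^n \subset BC^{n+1} \subset BC^*$.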
These results provide a solution to Problem 1. 


Refutability 
In particular, refutability demands (in the 
sense that every incorrect hypothesis should be 
refutable; see [13]) are not applicable in the 
theory of inductive inference; see Problem 2. 
Formally, Case and Smith [4] consider 
refutability as a property guaranteed by 
Popperian machines, the latter being defined 
as follows: 


Definition 3 Suppose $M$ is an inductive
inference machine. $M$ is Popperian if, on any
input, $M$ returns a program of a recursive function.


Results thereon include the following:


Theorem 3 There is an EX-identifiable class S 
of recursive functions for which there is no Pop- 
perian inductive inference machine witnessing its 
EX-identifiability. 


Corollary 2 There is an $EX^1$-identifiable class $S$
of recursive functions for which there is no
Popperian inductive inference machine witnessing its
$EX^1$-identifiability.


Additionally, in $EX^1$-identification, Popper's
refutability principle cannot be applied even if it
concerns only those hypotheses returned in the
limit.


Learning All Recursive Functions 

Since the results above yield a hierarchy of
inference types with strictly growing collections of
learnable classes, there is also an implicit answer
to Problem 3: the class of recursive functions is
neither in $EX^m_n$ for any $m, n \in \mathbb{N} \cup \{*\}$ nor in
$BC^m$ for any $m \in \mathbb{N}$. In contrast to that, Case and
Smith [4] prove:


Theorem 4 The class of all recursive functions 
is in BC*. 


Applications 


The work of Case and Smith [4] has been of high 
impact in learning theory. 

A consequence of the discussion of anomalies 
is that refutability principles in general do not 
hold for identification in the limit. This result 
has given rise to later studies on methods and 
techniques inductive inference machines might 
apply in order to discover their errors [7] and 
thus to further insights into the nature of inductive 
inference. 

Concerning the study of mind change hier- 
archies, among others, their lifting to transfinite 
ordinal numbers [8] is a notable extension. 

Moreover, the theory of learning as proposed
by Case and Smith [4] has been applied for
the development of the theory of identifying
recursive [11] or recursively enumerable [10]
languages.


Open Problems


Among the currently open problems in induc- 
tive inference, one key challenge is to find a 
reasonable notion of the complexity of learning 
problems (i.e., of classes of recursive functions) 
involving the run-time complexity of learners as 
well as the number of mind changes required to 
learn the functions in a class. In particular, special 
natural classes of functions should be analyzed in 
terms of such a complexity notion. 

Though of course the hierarchies
$EX^m_0 \subset EX^m_1 \subset EX^m_2 \subset \cdots$ for any $m \in \mathbb{N}$
reflect some increase of complexity in that 
sense, a corresponding complexity notion 
would not address the aspect of run-time 
complexity of learners. Different complexity 
notions have been introduced, such as the so- 
called intrinsic complexity [2,6] (neglecting run- 
time complexity) and the “measure under the 
curve” [5] (respecting the number of examples 
required, but neglecting the number of mind 
changes). In particular, for learning deterministic 
finite automata, different notions of run-time 
complexity have been discussed [12]. 

However, the definition of a more capacious
complexity notion remains an open issue.


Problem Definition


A social network is a graph of relationships and 
interactions within a set of individuals. Informa- 
tion can spread within a social network by “word- 
of-mouth” effects. In other words, information 
diffuses from individuals to individuals in a social 
network through the connections between them, 
and if some information is spread by some initial 
individuals, many individuals may believe in it 
due to information diffusion. A social network
is denoted as $G = (V, E, w)$, where $V$ is a set of
vertexes with size $n$, $E \subseteq V \times V$ is a set of
edges with size $m$, and $w : E \to [0, 1]$ is a weight
function that assigns the weight $w(u, v)$ to each
edge $(u, v)$.


Independent Cascade (IC) and Linear Threshold (LT) Models

The IC and LT models [1] are two basic models of
influence diffusion in social networks, in which
each vertex is in one of two stages: inactive or active. The
influence always starts from a set $S$ of seeds
(initially active nodes). Time
is divided into discrete steps 0, 1, 2, .... Denote
by $S_i$ the set of active vertexes at step $i$
($S_0 = S$ and $S_{-1} = \emptyset$). In the IC model, influence
propagates as follows: $S_i$ is the union of
$S_{i-1}$ and the vertexes activated by vertexes in
$S_{i-1} \setminus S_{i-2}$ in step $i$. Each node $u$ has only one
chance to activate each of its neighbors $v$, with
probability $w(u, v)$, when $u$ first becomes active.
In the LT model, influence propagates as follows:
at the beginning each vertex $v$ picks a threshold
$\theta_v$ uniformly at random from $[0, 1]$, which is the
threshold for this vertex becoming active. In each
step $i$, $S_i = S_{i-1} \cup \{v \mid \sum_{u \in S_{i-1}} w(u, v) \geq \theta_v\}$.
Both the IC and LT models stop at step $t + 1$ when
the process reaches its maximum influence, i.e.,
$S_{t+1} = S_t$.
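
The two propagation rules can be sketched as a single simulation run each. The function names, the dictionary `w` mapping directed edges `(u, v)` to weights, and the explicit random generator argument are assumptions for illustration; the code follows the step rules stated above and is not taken from [1].

```python
import random


def independent_cascade(w, seeds, rng):
    """One IC run; w maps (u, v) -> activation probability w(u, v)."""
    active = set(seeds)
    frontier = set(seeds)  # vertexes that became active in the last step
    while frontier:
        new = set()
        for (u, v), p in w.items():
            # Each newly active u gets exactly one chance on each neighbor v.
            if u in frontier and v not in active and rng.random() < p:
                new.add(v)
        active |= new
        frontier = new
    return active


def linear_threshold(w, seeds, rng):
    """One LT run; every vertex draws its threshold once at the beginning."""
    nodes = {x for edge in w for x in edge} | set(seeds)
    theta = {v: rng.random() for v in nodes}
    active = set(seeds)
    while True:
        new = {
            v for v in nodes - active
            if sum(p for (u, t), p in w.items() if t == v and u in active)
            >= theta[v]
        }
        if not new:
            return active  # S_{t+1} = S_t: the process has stopped
        active |= new


# Deterministic toy network: weights 1.0 always fire, weight 0.0 never does.
w = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.0}
print(independent_cascade(w, {0}, random.Random(1)))  # {0, 1, 2}
print(linear_threshold(w, {0}, random.Random(1)))     # {0, 1, 2}
```

With non-trivial weights each call returns one random outcome, so the influence of a seed set would typically be estimated by averaging the size of the result over many runs.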


Problem 1: Influence Maximization 
Problem (InfMax) [1] 
INPUT: A social network G = (V, E,w) and k, 
the number of seeds. 
OUTPUT: The set S containing k seeds that
maximizes the influence I(S).

Price-Related Propagation (PR) Frame

Adding a monetary factor to the propagation process of
the IC and LT models yields the price-related (PR)
propagation frame. In the PR frame, only an individual
who adopts a product propagates this product’s
influence, and the adoption depends on the relationship
between the price offered and the individual’s
valuation of the product. In detail, every vertex u is
in one of three states: neutral, influenced, or active.
A neutral vertex u has no knowledge of, or no positive
attitude toward, the product. When u becomes
influenced, u holds a positive attitude toward the
product but has not adopted it yet. Only if u further
turns active does u adopt the product and propagate the
influence by telling its network neighbors. The PR
frame thus separates holding a positive attitude from
propagating influence, whereas the two coincide in the
traditional IC and LT models. This separation comes
from the fact that individuals in social networks are
independent human beings who are not only influenced by
the people around them but also have their own
judgment. Someone who receives some information will
typically evaluate it before spreading it.

The PR frame assumes that each individual u has a
valuation for the product, which is the highest price
this individual thinks the product is worth. An
influenced individual u turns active, adopts the
product, and propagates the influence only if its
valuation is higher than the offered price. The PR
frame is an extension of the IC and LT models; it
contains the PR-I model, based on IC, and the PR-L
model, based on LT. The rules by which an individual
turns from neutral to influenced in the PR-I and PR-L
models are the same as the rules by which an individual
turns from inactive to active in the IC and LT models,
respectively. In the PR frame, however, influenced
individuals do not propagate influence; only active
ones do, and an influenced individual turns active if
and only if the offered price is lower than its
valuation.

Pricing Strategies in the PR Frame

Since price is vitally significant in the PR frame,
we design two strategies to determine the prices
offered to the individuals. The first is binary
pricing (BYC), in which all chosen seeds are given free
samples and all other individuals are charged the same
price. The second is panoramic pricing (PAP), in which
the prices for individuals, including seeds, are
unconstrained and may differ from one individual to
another.

In the PR frame, choosing node u as a seed
merely means turning u to influenced. However,
in BYC, every seed u also becomes active because
each seed is offered a free sample, i.e., the offered
price is 0, which is no greater than the valuation.
In PAP, on the other hand, a seed u may not become
active.

Price plays a vital role in both the influence and the
profit in the PR frame. High prices may bring high
profit but hinder the influence propagation, and
enlarging the influence inevitably sacrifices some
profit. Based on this observation, a parameter
λ ∈ [0,1) is adopted to denote the decision maker’s
preference between influence and profit, and the
objective is the weighted sum of influence and profit,
which we call balanced influence and profit (BIP).


Problem 2: Balanced Influence and Profit
Maximization Problem (BIPMax) [2]

INPUT: A social network G = (V, E, w), the
distribution of customer valuations, and λ, the
decision maker’s preference.

OUTPUT: The seed set S and the price vector p for all
individuals that maximize the objective function
B(S, p) = λ · I(S, p) + (1 − λ) · R(S, p),
where I(S, p) is the influence and R(S, p) is
the profit.


Key Results 


Result 1: InfMax is NP-hard under both the IC and LT
models. [1]

Result 2: BIPMax is NP-hard under both the PR-I and
PR-L models. [2]

The above two results show that both InfMax and BIPMax
are hard to solve exactly. However, approximation
algorithms may exist, and the following two properties
are used to design and analyze such algorithms. Suppose
f is a set function on subsets of V.


Submodularity and Monotonicity

1. Submodular function. f is called submodular if for
every X ⊆ Y ⊆ V and z ∈ V \ Y,
f(X ∪ {z}) − f(X) ≥ f(Y ∪ {z}) − f(Y).

2. Monotone function. f is called monotone if
f(X ∪ {z}) ≥ f(X) for any set X ⊆ V and
element z ∈ V.
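
For a small ground set, both definitions can be checked by brute force. The sketch below is illustrative only: the helper names and the toy coverage function COVER are hypothetical, and the exhaustive enumeration is exponential in |V|, so it is meant for sanity checks on tiny examples.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    """f(X + z) >= f(X) for every X and every z."""
    return all(f(X | {z}) >= f(X) for X in subsets(ground) for z in ground)

def is_submodular(f, ground):
    """Marginal gain of z never grows when the base set grows."""
    ground = frozenset(ground)
    for Y in subsets(ground):
        for X in subsets(Y):
            for z in ground - Y:
                if f(X | {z}) - f(X) < f(Y | {z}) - f(Y):
                    return False
    return True

# Hypothetical coverage function: each vertex covers a set of items.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def coverage(S):
    return len(set().union(*(COVER[v] for v in S)))
```

Coverage functions are a standard example of monotone submodular functions, so both checks succeed on coverage, while a function such as |S|^2 fails the submodularity check.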


Result 3: Influence I(S) under both the IC and
LT models is submodular and monotone w.r.t.
S. [1]

Result 4: BIP B(S, p) under both the PR-I and PR-L
models is submodular w.r.t. S, if the prices
p are fixed and p_i ≥ c, where p_i is the ith
element of p and c is the manufacturing cost
of the product. [2]


Remark 1: B(S, p) under both the PR-I and PR-L
models is non-monotone w.r.t. S. [2]


Algorithm for InfMax 

Nemhauser et al. [3] showed that the greedy
hill-climbing algorithm achieves an approximation ratio
of 1 − 1/e for maximizing a submodular and monotone set
function f. The greedy algorithm for maximizing
influence is presented in Algorithm 1: each time, the
vertex that brings the highest marginal influence is
picked as a new seed, until the desired number of seeds
has been picked. Hence, according to Result 3,
Algorithm 1 has a constant performance ratio of 1 − 1/e
for InfMax.

Note that computing the exact influence, as well as the
marginal influence, is #P-hard [1]. To estimate the
influence in a reasonable time, Monte Carlo simulation
is usually adopted: a number of sample cascades are
generated and the average value over all samples is
taken.

Algorithm 1 Greedy algorithm
S ← ∅;
while |S| < k do
    u ← arg max_{u ∈ V \ S} {I(S ∪ {u}) − I(S)};
    S ← S ∪ {u};
end while
output S;
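
A minimal Python sketch of the greedy loop follows. It assumes the influence function is supplied as a callable (in practice a Monte Carlo estimate); the function name greedy_seeds and the toy REACH data are hypothetical and chosen only so that the example is deterministic.

```python
def greedy_seeds(vertices, k, influence):
    """Pick up to k seeds, each time adding the vertex with the largest marginal gain."""
    seeds = set()
    while len(seeds) < min(k, len(vertices)):
        base = influence(seeds)
        best = max((v for v in vertices if v not in seeds),
                   key=lambda v: influence(seeds | {v}) - base)
        seeds.add(best)
    return seeds

# Hypothetical deterministic stand-in for the influence: vertices reached by each seed.
REACH = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"},
         "d": {"d", "e"}, "e": {"e"}}

def reach_influence(S):
    return len(set().union(*(REACH[v] for v in S)))
```

With k = 2 the sketch first picks "a" (gain 3) and then "d" (gain 2), since "b" and "c" add nothing once "a" is chosen.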


Algorithms for BIPMax

For a non-monotone submodular set function f,
Feige et al. [4] devised a deterministic local-search
1/3-approximation and a randomized 2/5-approximation
algorithm if f is nonnegative; therefore, if the prices
p are known in advance and fixed, the techniques in [4]
are suitable approximation algorithms. However, the
prices p usually have to be determined in order to
achieve the maximum BIP, and general algorithms need
careful consideration.

To give a better pricing method for BIPMax, both the
manufacturing cost and the local influence should be
considered. The manufacturing cost is denoted by c.
Individual v_i’s valuation is a random variable X_i
whose cumulative distribution function (CDF) is denoted
by F_i. (If v_i is influenced and is offered price q,
then the probability that v_i turns active is
Prob(X_i > q) = 1 − F_i(q).) If v_i is chosen as the
only seed and offered price p, the expected profit
solely from v_i is (1 − F_i(p))(p − c); moreover, the
expected influence on other nodes solely from v_i is
(1 − F_i(p)) · I(v_i, p). However, I(v_i, p) depends on
other nodes’ prices; to ease the computation, the
following simple one-hop estimate is adopted:
Σ_{v_j: out-neighbor of v_i} w(i, j) / d^out(v_i),
where d^out(v_i) is the out-degree of v_i. Then the
optimal price p_i for v_i is calculated as follows:

B_i(p) = \lambda \, (1 - F_i(p)) \sum_{v_j:\ \text{out-neighbor of } v_i} \frac{w(i,j)}{d^{out}(v_i)} + (1 - \lambda)(1 - F_i(p))(p - c),   (1)

p_i = \arg\max_{p \in [0,1]} B_i(p).   (2)

Equation (1) considers both the manufacturing
cost and the network structure; however, the price
calculated by (2) is still myopic.
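
As an illustration of the myopic pricing step, the sketch below evaluates the weighted sum of the one-hop influence estimate and the expected profit on a grid of prices and returns the best grid point. Everything here is an assumption made for the example: the function names, the grid resolution, and the use of a uniform valuation distribution in the usage note are not taken from the source.

```python
def one_hop_estimate(out_weights):
    """Sum of out-edge weights divided by the out-degree (0 for isolated vertices)."""
    return sum(out_weights) / len(out_weights) if out_weights else 0.0

def myopic_price(cdf, lam, cost, one_hop, steps=1000):
    """Grid-search the price in [0, 1] that maximizes the one-hop objective B_i(p)."""
    def b_i(p):
        accept = 1.0 - cdf(p)                    # Prob(X_i > p)
        return lam * accept * one_hop + (1.0 - lam) * accept * (p - cost)
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=b_i)

def uniform_cdf(p):
    """CDF of a valuation uniform on [0, 1] (assumed distribution)."""
    return min(max(p, 0.0), 1.0)
```

For a uniform valuation with lam = 0.5, cost = 0.2, and a one-hop estimate of 0.4, the objective reduces to 0.1 + 0.4p − 0.5p^2, whose maximum is at p = 0.4; the grid search returns that point.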

Determine the Seeds and Prices Under BYC

In BYC the prices can only be 0 or the full price;
for a company this strategy takes the least
implementation expense. ABYC is the algorithm for
BYC. ABYC contains two stages: first offering every
individual the same full price, and second determining
the seeds to whom free samples are given.

Equation (2) is not used in the first stage, since the
obtained p_i may vary across the v_i. Instead we use
the universal optimal price:
P_u = arg max_{p ∈ [0,1]} Σ_i B_i(p).

A greedy procedure is used in the second stage to
determine the seeds: in every round, for each non-seed
vertex u we compute the marginal BIP of picking u as a
seed, and choose the vertex that provides the highest
marginal gain. When no vertex brings a positive
marginal BIP gain, ABYC stops.

Suppose the price vector is p = (p_1, ..., p_n);
denote by (p_{-i}, q) the vector obtained by changing
p_i, the ith element of p, to q, i.e.,
(p_{-i}, q) = (p_1, ..., p_{i-1}, q, p_{i+1}, ..., p_n).


Algorithm 2 ABYC: the algorithm for BYC

S ← ∅, p ← 0;
for ∀v_i ∈ V do
    p_i ← P_u;
end for
while true do
    u ← arg max_{v_i ∈ V \ S} {B(S ∪ {v_i}, (p_{-i}, 0)) − B(S, p)};
    if B(S ∪ {u}, (p_{-u}, 0)) − B(S, p) > 0 then
        S ← S ∪ {u}; p ← (p_{-u}, 0);
    else break;
    end if
end while
output (S, p);


Determine the Seeds and Prices Under PAP

BYC is easy to implement, but it is simple and
constrained; PAP is much freer, since prices are
assigned without constraint. APAP is the algorithm for
PAP, and like ABYC it contains two stages. In the first
stage, (2) is adopted to obtain p_i for every v_i. In
the second stage, the vertex with the maximum marginal
BIP gain is picked step by step until no positive gain
is available.

The computation of the marginal BIP gain under PAP when
adding v_i to the seed set S is much more complex than
under BYC, since when choosing v_i as a new seed a new
price may also be offered to it. Suppose the new price
for v_i is q; then the marginal BIP gain of adding v_i
is B(S ∪ {v_i}, (p_{-i}, q)) − B(S, p). Since B(S, p)
is a constant w.r.t. q, B(S ∪ {v_i}, (p_{-i}, q))
should be maximized. When offering price q to v_i, only
two outcomes exist in the sample space: outcome ω_1,
where v_i accepts the price and turns active, and
outcome ω_0, where v_i rejects the price, stays
influenced, and never spreads the influence. If ω_1
happens, the influence gain collected from v_i is 1 and
the profit gain collected from v_i is q − c; if the
influence from other nodes is I_1 and the profit from
other nodes is R_1, then the BIP gain is
g_i(q) = λ(I_1 + 1) + (1 − λ)(q − c + R_1),
which is a linear function of q. If ω_0 happens, the
influence gain collected from v_i is 1 and the profit
gain collected from v_i is 0; if the influence from
other nodes is I_0 and the profit from other nodes is
R_0, then the BIP gain is
h_i = λ(I_0 + 1) + (1 − λ)R_0, a constant independent
of q. Prob(ω_1) = 1 − F_i(q) and Prob(ω_0) = F_i(q).
Hence the expected BIP is:

\delta_i(q) = g_i(q) \cdot (1 - F_i(q)) + h_i \cdot F_i(q)   (3)


Algorithm 3 APAP: the algorithm for PAP
S ← ∅, p ← 0;
for ∀v_i ∈ V do
    p_i ← arg max_{p ∈ [0,1]} B_i(p);
end for
while true do
    for ∀v_i ∈ V \ S do
        p_i^* ← arg max_{q ∈ [0,1]} δ_i(q);
    end for
    u ← arg max_{v_i ∈ V \ S} {B(S ∪ {v_i}, (p_{-i}, p_i^*)) − B(S, p)};
    if B(S ∪ {u}, (p_{-u}, p_u^*)) − B(S, p) > 0 then
        S ← S ∪ {u}; p ← (p_{-u}, p_u^*);
    else break;
    end if
end while
output (S, p);

To calculate I_1 and R_1, set v_i to turn active with
probability 1 and run Monte Carlo simulations; to
calculate I_0 and R_0, set v_i to turn active with
probability 0 and run Monte Carlo simulations. After
obtaining I_1, R_1, I_0, and R_0, p_i^* should be
computed. If δ_i(q) has a closed form, then p_i^* is
easy to calculate. However, δ_i(q) may not have a
closed form. For example, if the valuation follows a
normal distribution, then F_i contains an integral term
and δ_i(q) has no closed form. In this case, golden
section search [5], which quickly finds the extremum of
a strictly unimodal function, can be used. This
technique successively narrows the range inside which
the extremum lies. Even if δ_i(q) is not unimodal on
all of [0,1], it is unimodal on subintervals of [0,1].
To reduce error, divide the interval [0,1] into several
small intervals of the same size and pick each small
interval’s midpoint as a sample q_j. The search starts
with the interval that contains the sample
q_0 = arg max_j δ_i(q_j) and stops when the interval
that contains p_i^* is narrower than a predefined
threshold.
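
The sampling-then-refining procedure can be sketched as follows. The function names, the number of subintervals, and the tolerance are illustrative assumptions; as a simplification, the sketch refines only inside the subinterval whose midpoint scored best, so it can miss a maximum that lies just across a subinterval boundary.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2      # 1/phi, about 0.618

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                   # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                         # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2

def best_price(delta, pieces=10, tol=1e-6):
    """Sample midpoints of equal subintervals of [0, 1], then refine in the best one."""
    width = 1.0 / pieces
    j = max(range(pieces), key=lambda i: delta((i + 0.5) * width))
    return golden_section_max(delta, j * width, (j + 1) * width, tol)
```

Each iteration reuses one of the two interior evaluations, so only one new call to the objective is needed per step, which matters when every evaluation of δ_i(q) involves a numerically integrated CDF.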
Problem Definition


One of the fundamental problems in social networks is
influence maximization. Informally, if we can convince
a small number of individuals in a social network to
adopt a new product or innovation, and the target is to
trigger the maximum number of further adoptions, then
which set of individuals should we convince? Consider a
social network as a graph G(V, E) consisting of
individuals (node set V) and relationships (edge set
E); essentially, influence maximization comes down to
the problem of finding important nodes or structures in
graphs.


Influence Diffusion 

In order to address the influence maximization problem,
it is first necessary to understand the influence
diffusion process in social networks. In other words,
how does influence propagate over time through a social
network? Assume time is partitioned into discrete time
slots; influence diffusion can then be modeled as the
process by which activations occur from neighbor to
neighbor. In each time slot, all previously activated
nodes remain active, and the others either remain
inactive or are activated by their neighbors according
to the activation constraints. The whole process runs
for a finite number of time slots and stops at a time
slot when no more activations occur. Let S denote the
set of initially activated nodes; we denote by f(S) the
eventual number of activations, and the target is to
maximize f(S) with a limited budget.


Problem (Influence Maximization)

INPUT: A graph G(V, E) where V is the set of
individuals and E is the set of relationships,
an activation model f, and a limited budget
number K.

OUTPUT: A set S of nodes where S ⊆ V such
that the final number of activations f(S) is
maximized and |S| ≤ K.


Activation Models 
The influence maximization problem was first
proposed by Domingos et al. and Richardson
et al. in [4] and [8], respectively, in which the
social networks are modeled as Markov random
fields. After that, Kempe et al. ([6] and [7])
further investigated this problem in two models:
Independent Cascade, proposed by Goldenberg
et al. ([5] and [11]), and Linear Threshold,
proposed by Granovetter et al. and Schelling et al.
in [9] and [10], respectively.

In the Independent Cascade model, the activations are
independent among different individuals, i.e., each
newly activated individual u has a chance, in the next
time slot, to activate each of his or her neighbors v
with a certain probability p(u, v), independently of
other activations. In the Linear Threshold model,
activation is threshold based: the influence from an
individual u on another individual v is represented by
a weight w(u, v), and the individual v is activated at
the moment when the sum of the weights he or she
receives from previously activated neighbors exceeds
the threshold θ(v). It is worth noting that there are
two ways to assign the thresholds to individuals:
random and deterministic. In the random model, the
thresholds are selected uniformly at random during the
process, while in the deterministic model, the
thresholds are assigned to individuals at the beginning
and fixed for all time slots. For the sake of
simplicity, they are called Random Linear Threshold and
Deterministic Linear Threshold, respectively.
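
A minimal sketch of the Deterministic Linear Threshold process is given below. The data layout (a dictionary of incoming edges), the function name simulate_lt, and the use of "at least the threshold" as the activation test are assumptions made for the example. Because active nodes never deactivate, sweeping until nothing changes reaches the same final set as a slot-by-slot simulation.

```python
def simulate_lt(in_edges, thresholds, seeds):
    """Deterministic Linear Threshold: return the final active set.

    in_edges maps v -> list of (u, w_uv); thresholds maps v -> its fixed threshold.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, edges in in_edges.items():
            if v in active:
                continue
            # total weight received from currently active in-neighbors
            total = sum(w for u, w in edges if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

A Random Linear Threshold run can reuse the same function by first drawing each threshold uniformly at random from [0, 1].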


Key Results 


Greedy Algorithm 

In [6], it has been found that the activation
function f under the Independent Cascade
model and the Random Linear Threshold model
is submodular. Therefore, the natural greedy
algorithm (Fig. 1), which repeatedly selects the node
with the maximum marginal gain, achieves
a (1 − 1/e)-approximation. However, the
problem of exactly calculating the activation
function f in a general graph G under the
Independent Cascade model or the Random
Linear Threshold model is #P-hard
[1, 2], which indicates that the greedy algorithm
is not a polynomial time algorithm for the two
models. The time complexity follows directly from
the pseudo-code (Fig. 1). Assume there exists an
oracle that can compute the activation function f
in τ time; then the greedy algorithm runs in
O(K|V|τ) time.

Fig. 1 Pseudo-code: Greedy algorithm

let S ← ∅ (S holds the selected nodes);
while |S| < K do
    find v ∈ (V \ S) such that f(S ∪ {v}) is maximized;
    let S ← S ∪ {v};
end while

In [13], it has been found that the problem 
of exactly calculating the activation function f 
given an arbitrary set S under the Deterministic 
Linear Threshold model can be solved in linear 
time in terms of the number of edges. Therefore, 
the greedy algorithm runs in O(K|V||E|) time.
However, it has no approximation guarantee un- 
der this model. 


Inapproximability Results

Under the Independent Cascade model or the
Random Linear Threshold model, it can be shown
by a gap-preserving reduction from the Set
Cover problem [3] that (1 − 1/e) is the best possible
polynomial time approximation ratio for the
influence maximization problem, assuming
NP ⊄ DTIME(n^{log log n}). Under the Deterministic
Linear Threshold model, it has been shown that there
is no polynomial time n^{1−ε}-approximation
algorithm for the influence maximization problem
unless P = NP, where n is the number of nodes and
0 < ε < 1 [12].

In fact, in the case that an individual can
be activated after one of his or her neighbors
becomes active, the greedy algorithm achieves
a polynomial time (1 − 1/e)-approximation,
whereas even in the simple case that
an individual can be activated when one or
two of his or her neighbors become active,
the influence maximization problem under the
Deterministic Linear Threshold model is NP-hard
to approximate.


Degree-Bounded Graphs 

A graph G(V, E) is a (d_1, d_2)-degree-bounded
graph if every node in V has at most d_1 incoming
edges and at most d_2 outgoing edges.

For the sake of simplicity, the influence
maximization problem over such a degree-bounded
graph is called (d_1, d_2)-influence maximization.
In [13], it has been found that for any constant
ε ∈ (0,1), there is no polynomial time
n^{1−ε}-approximation algorithm for the (2, 2)-influence
maximization problem under the Deterministic
Linear Threshold model unless P = NP, where n
is the number of nodes. This indicates that
the influence maximization problem under the
Deterministic Linear Threshold model is NP-hard to
approximate to within any nontrivial factor, even
if an individual can be activated when at least two
of his or her neighbors become active.


Applications 


Influence maximization would be of great interest 
for corporations, such as Facebook, LinkedIn, 
and Twitter, as well as individuals who desire to 
spread their products, ideas, etc. The solutions 
have a wide range of applications in various 
fields, such as product promotions where corpo- 
rations want to distribute sample products among 
customers, political elections where candidates 
want to spread their popularity or political ideas 
among voters, and emergency situations where
emergency news such as a sudden earthquake needs
to spread to every resident in the community.
In addition, the solutions may also be applicable
in military defense where malicious information
which has already propagated dynamically needs 
to be blocked. 
Problem Definition


This problem is concerned with efficiently find- 
ing a set of documents that closely match a query 
within a large corpus of documents (i.e., a search 
engine). This is accomplished by producing an 
index offline to the query processing and then 
using the index to quickly answer the queries. 
The indexing stage involves splitting the dataset 
into tokens and then constructing an inverted 
index which maps from each token to the list 
of document identifiers of the documents that 
contain that token (a postings list). The query 
can then be executed by converting it to a set of 
query tokens, using the inverted index to find the 
corresponding postings lists, and intersecting the 
lists to find the documents contained in all the 
lists (conjunctive intersection or boolean AND). 
A subsequent ranking step is used to restrict the 
conjunctive intersection to a list of top-k results 
that best answer the query. 
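
The indexing and query stages described above can be sketched as follows. This is a toy illustration under assumptions not in the source: tokenization is plain whitespace splitting, document identifiers are list positions, and the intersection uses Python sets rather than any of the specialized algorithms discussed later.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to its postings list: the sorted ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for token in set(text.lower().split()):
            index[token].append(doc_id)      # ids are appended in increasing order
    return index

def conjunctive_query(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    lists = sorted((index.get(t, []) for t in query.lower().split()), key=len)
    if not lists:
        return []
    result = set(lists[0])                   # start from the shortest postings list
    for postings in lists[1:]:
        result &= set(postings)
    return sorted(result)
```

For the three documents "the quick brown fox", "the lazy dog", and "quick brown dog", the query "quick brown" returns documents 0 and 2.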


Objective 

Produce an efficient system to answer queries, 
where efficiency is a space-time trade-off involv- 
ing the storage of the inverted index (space) and 
the intersection of the lists (time). If the inverted 
index is stored on a slow medium, then efficiency
might also include the size of the lists required to 
answer the query (transfer). 


Design Choices 
There are many degrees of freedom in designing 
such a system: 


1. Creating a document to internal identifier mapping
2. Encoding of the inverted index mapping
3. Encoding of the postings lists
4. Using auxiliary structures in postings lists
5. Ordering of the internal identifiers in the postings lists
6. Order and method of executing the list intersection


Variants 

For queries where the conjunctive intersection is 
too small, the intersection can be relaxed to find 
documents containing a weighted portion of the 
query tokens (see t-threshold or Weak-AND).
The query lists can be intersected using any 
boolean operators, though the most commonly 
added is the boolean NOT operator which can be 
used to quickly reduce the number of conjunctive 
results. The query results can also be reduced 
by including token offset restrictions, such as 
ensuring tokens appear as a phrase or within 
some proximity. While many implementations 
interleave the conjunctive intersection with the 
calculation of the ranking, some also use ranking 
information to prune documents from the inter- 
section or to terminate the query early when the 
correct results (or good enough results) have been 
found. 


Key Results 


Traditionally, inverted indexes were stored on 
disk, causing the reduction of transfer costs to 
be the dominant objective. Modern systems often
store their inverted indexes in memory, meaning
that reducing overall index size is important and
that implementation details of the intersection
algorithms can produce significant performance
differences, thus leading to a more subtle space-time
trade-off. In either case, the mapping portion
of the inverted index (the dictionary or lexicon) 
can be implemented using a data structure such 
as a B-tree which is both fast and compact, so we 
do not examine dictionary implementations in the 
remainder of this article. 


Multi-list Processing 

Intersecting multiple lists can be implemented 
by intersecting the two smallest lists and then 
intersecting the result with the next smallest list 
iteratively, thus producing a set versus set (svs)
or term-at-a-time (TAAT) approach. If the lists
are in sorted order, then each step of the svs
approach uses the merge algorithm, which takes
each element in the smaller list M and finds it in
the larger list N by executing a forward search
and then reports any elements that are found. The
M list could be encoded differently than the N
list, and indeed, after the first svs step, it is the
uncompressed result list of the previous step. The
sequential processing and memory accesses of
the svs approach allow the compiler and CPU
to optimize the execution, making this approach
extremely fast, even though temporary memory is 
required for intermediate result accumulators. If 
the lists are not sorted, then additional temporary 
memory must be used to intersect the lists using 
some equality-join algorithm. 
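
The svs approach with a merge step can be sketched as below for sorted, uncompressed lists. The function names are hypothetical, and the forward search here is a plain linear scan.

```python
def merge_intersect(small, large):
    """Intersect two sorted lists using a forward (linear) search in the larger one."""
    result, j = [], 0
    for x in small:
        while j < len(large) and large[j] < x:
            j += 1                     # forward search never moves backwards
        if j == len(large):
            break
        if large[j] == x:
            result.append(x)
    return result

def svs(lists):
    """Set versus set: start from the smallest list and fold in the others by size."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = list(ordered[0])
    for postings in ordered[1:]:
        result = merge_intersect(result, postings)
    return result
```

The intermediate result only shrinks, so after the first step the "smaller list" is always the uncompressed accumulator from the previous step.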

Intersecting multiple lists can also be imple- 
mented by intersecting all the lists at the same 
time. If the lists are not in sorted order, using 
this approach may require a large amount of 
temporary space for the join structures and the 
accumulators. If the lists are sorted, then we call 
this a document-at-a-time (DAAT) approach, and 
it requires very little temporary space: just one 
pointer per list to keep track of its processing 
location. The order that the lists are intersected 
could be static, such as ascending term frequency 
order (as done with svs), or it could adapt to 
the processing. All of these non-svs approaches 
jump among the lists that are stored at different 
memory locations, so it is more difficult for the 
compiler and CPU to optimize the execution. In 
addition, the loop iterating over the lists for each 
result item and the complications when using 
different list encodings will slow down non-svs
implementations. Despite these limitations, many 
of the optimizations for svs list intersection can 
be applied to implementations using non-svs ap- 
proaches. For systems that return the top-ranked 
results, a small amount of additional memory is 
needed for a top-k heap to keep track of the best 
results. Instead of adding results that match all 
the terms into a simple array of results, they are 
added to the heap. At the end of query processing, 
the content of the heap is output in rank order to 
form the final top-k query results. 
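
A top-k heap of this kind can be sketched with Python's heapq module. The input format (pairs of document id and score) and the function name top_k are assumptions for the example; ties are broken by document id.

```python
import heapq

def top_k(scored_docs, k):
    """Keep the k best (doc_id, score) pairs seen so far and return ids in rank order."""
    if k <= 0:
        return []
    heap = []                                  # min-heap: root is the weakest kept result
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            heapq.heapreplace(heap, (score, doc_id))
    return [doc_id for score, doc_id in sorted(heap, reverse=True)]
```

Keeping the weakest retained result at the root means each matching document costs at most one comparison plus an O(log k) heap update.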


Uncompressed Lists 

Storing the lists of document identifiers in an 
uncompressed format simply means using a se- 
quential array of integers. For fast intersection, 
the integers in these lists are stored in order, 
thus avoiding join structures and allowing many 
methods of searching for a particular value in a 
list. As a result, there are many fast algorithms 
available for intersecting uncompressed integer 
lists, but the memory used to store the uncom- 
pressed lists is very large and probes into the 
list can produce wasted or inefficient memory 
access. All of these uncompressed intersection 
algorithms rely on random access into the lists, 
so they are inappropriate for compressed lists. 
We present only three of the best performing 
algorithms [2]: 


Galloping svs (g-svs): Galloping forward 
search probes into the list to find a point past 
the desired value, where the probe distance 
doubles each time, then the desired location is 
found using binary search within the last two 
probe points. 

Galloping swapping svs (g-swsvs): In the pre- 
vious galloping svs algorithm, values from 
the smaller list are found in the larger list. 
Galloping swapping svs, however, finds val- 
ues from the list with the smaller number 
of remaining integers in the other list, thus 
potentially swapping the roles of the lists. 

Sorted Baeza-Yates using adaptive binary for- 
ward search (ab-sBY): The Baeza-Yates al- 
gorithm is a divide and conquer approach that 
finds the median value of the smaller list in the 
larger list, splits the lists, and recurses. Adding
matching values at the end of the recursion 
produces a sorted result list. The adaptive bi- 
nary forward search variant uses binary search 
within the recursed list boundaries, rather than 
using the original list boundaries. 
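
As an illustration of the first of these algorithms, the sketch below implements a galloping forward search and uses it to intersect a smaller sorted list with a larger one. The function names are hypothetical; the binary search between the last two probe points uses the standard bisect module.

```python
from bisect import bisect_left

def gallop(lst, target, start):
    """Return the first index >= start whose value is >= target (len(lst) if none)."""
    step, lo, hi = 1, start, start
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1                  # everything up to hi is known to be too small
        hi += step
        step *= 2                    # probe distance doubles each time
    return bisect_left(lst, target, lo, min(hi, len(lst)))

def galloping_intersect(small, large):
    """Find each value of the smaller sorted list in the larger one by galloping."""
    result, pos = [], 0
    for x in small:
        pos = gallop(large, x, pos)
        if pos == len(large):
            break
        if large[pos] == x:
            result.append(x)
    return result
```

Because the search for each value resumes from the previous position, the total work depends on the gaps between matches rather than on the full length of the larger list.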


Compressed Lists 

There are a large variety of compression algo- 
rithms available for sorted integer lists. The lists 
are first converted into differences minus one 
(i.e., deltas or d-gaps) to get smaller values, but 
this removes the ability to randomly access ele- 
ments in the list. Next, a variable length encoding 
is used to reduce the number of bits needed to 
store the values, often grouping multiple values 
together to allow word or byte alignment of 
the groups and faster bulk decoding. The most 
common list compression algorithms are Variable 
byte (vbyte), PForDelta (PFD), and Simple9 (S9). 
Recent work has improved decoding and delta
restore speeds for many list compression
algorithms using vectorization [7]. Additional gains
are possible by changing delta encoding to act 
on groups of values, thus improving runtime at 
the expense of using more space. Another re- 
cent approach called quasi-succinct indexing [10] 
first acts on the values as monotone sequences 
and then incorporates some delta encoding more 
deeply in the compression algorithm. 
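
The delta and variable byte steps can be sketched as follows. Byte layouts differ between implementations; the convention used here (seven data bits per byte, least significant group first, high bit set on the last byte of each value) is only one possible choice, and the function names are hypothetical.

```python
def vbyte_encode(doc_ids):
    """Delta-encode a strictly increasing list of ids, then vbyte-encode each gap."""
    out, prev = bytearray(), -1
    for doc_id in doc_ids:
        gap = doc_id - prev - 1          # difference minus one
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)       # low seven bits, more bytes follow
            gap >>= 7
        out.append(gap | 0x80)           # high bit marks the final byte of this value
    return bytes(out)

def vbyte_decode(data):
    """Invert vbyte_encode: rebuild gaps, then restore the original ids."""
    doc_ids, prev, gap, shift = [], -1, 0, 0
    for byte in data:
        if byte & 0x80:                  # final byte of the current gap
            gap |= (byte & 0x7F) << shift
            prev += gap + 1
            doc_ids.append(prev)
            gap, shift = 0, 0
        else:
            gap |= byte << shift
            shift += 7
    return doc_ids
```

Decoding has to run from the start of the list, because each identifier depends on the running sum of all earlier gaps; this is the loss of random access mentioned above.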


List Indexes 

List indexes, also known as skip structures or 
auxiliary indexes, can be included to jump over 
values in the postings lists and thus avoid de- 
coding, or even accessing, portions of the lists. 
The desired jump points can be encoded in- 
line with the lists, but they are better stored in 
a separate contiguous memory location without 
compression, allowing fast scanning through the 
jump points. These list index algorithms can be 
used with compressed lists by storing the deltas 
of the jump points, but the block-based structure 
causes complications if the jump point is not 
byte or word aligned, as well as block aligned. 
The actual list values that are found in the skip 
structures can either be maintained within the 
original compressed list (overlaid) preserving fast 
iteration through the list, or the values could
be extracted (i.e., removed) from the original 
compressed lists, giving a space reduction but 
slower iteration through the list. 

A simple list index algorithm (“skipper” [9])
groups by a fixed number of elements, storing
every Xth element in a separate array structure,
where X is a constant, so we refer to it as
skips(X). When intersecting lists, the skip structure
is scanned linearly to find the appropriate
jump point into the compressed structure,
where the decoding can commence. Using variable
length skips is possible, such as tuning the
number of skipped values relative to the list size
n, perhaps using a multiple of √n or log(n).
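
A fixed-interval skip structure can be sketched as follows. For readability the postings list is kept uncompressed, so "decoding a block" is just slicing; the function names and the (value, position) layout of the skip array are assumptions for the example.

```python
def build_skips(postings, x):
    """Store every x-th value of a sorted postings list together with its position."""
    return [(postings[i], i) for i in range(0, len(postings), x)]

def contains(postings, skips, x, target):
    """Scan the skip array linearly for the jump point, then search only that block."""
    start = 0
    for value, pos in skips:
        if value > target:
            break
        start = pos                      # last skip entry not greater than target
    return target in postings[start:start + x]
```

With a compressed list, the slice would be replaced by decoding the x values that start at the selected jump point.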

Another type of list index algorithm
(“lookup” [9]) groups by a fixed-size document
identifier range, using the top-level bits of
the value to index into an array storing the
desired location in the encoded list, similar to
a segment/offset scheme. Each list can pick the
number of bits in order to produce reasonable
jump sizes. We use D as the domain size and n as
the list size, giving a list’s density as y = n/D.
If we assume randomized data and use the
parameter B to tune the system, then using
⌊log_2(B/y)⌋ bottom-level bits will leave between
B/2 and B entries per segment in expectation. As a
result, we call this algorithm segment(B).


Bitvectors 

When using a compact domain of integers, as we 
are, the lists can instead be stored as bitvectors, 
where the bit number is the integer value and 
the bit is set if the integer is in the list. For 
runtime performance benefits, this mapping from 
the identifier to the bit location can be changed if 
it remains a one-to-one mapping and is applied to 
all bitvectors. 

If all the lists are stored as bitvectors, conjunc- 
tive list intersection can be easily implemented 
by combining the bitvectors of the query terms 
using bitwise AND (bvand), with the final step 
converting the result to a list of integers (bvcon- 
vert). Note, except for the last step, the result of 
each step is a bitvector rather than an uncom- 
pressed result list. The bvconvert algorithm can 
be implemented as a linear scan of the bits of
each word, but using a logarithmic check is faster 
since the bitvectors being converted are typically 
sparse. Encoding all the lists as bitvectors gives 
good query runtime, but the space usage is very 
large since there are many tokens. 
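
The following sketch illustrates the two steps named above, using Python integers as arbitrary-length bitvectors. It is only an illustration under that assumption: `to_bitvector` is a hypothetical helper, and the conversion shown isolates the lowest set bit repeatedly, which is a different trick from the logarithmic check mentioned in the text but likewise avoids visiting zero bits one by one.

```python
def to_bitvector(postings):
    """Set bit i for every document identifier i in the list."""
    bv = 0
    for doc_id in postings:
        bv |= 1 << doc_id
    return bv


def bvand(bitvectors):
    """Conjunctive intersection: bitwise AND of all query-term bitvectors."""
    result = bitvectors[0]
    for bv in bitvectors[1:]:
        result &= bv
    return result


def bvconvert(bv):
    """Convert a bitvector back into a sorted list of identifiers."""
    out = []
    while bv:
        low = bv & -bv                    # isolate the lowest set bit
        out.append(low.bit_length() - 1)  # its position is the identifier
        bv ^= low                         # clear it and continue
    return out
```

Intermediate results stay in bitvector form; only the final result is expanded into a list.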

To alleviate the space costs of using bitvectors, 
the lists with density less than a parameter value 
F can be stored using normal delta compression, 
resulting in a hybrid bitvector algorithm [4]. This 
hybrid algorithm intersects the delta-compressed 
lists using merge and then intersects the remain- 
der with the bitvectors by checking if the ele- 
ments are contained in the first bitvector (bv- 
contains), repeating this for each bitvector in the 
query, with the final remaining values being the 
query result. Bitvectors are faster than other ap- 
proaches for dense lists, so this hybrid algorithm 
is faster than non-bitvector algorithms. It can 
also be more compact than other compression 
schemes, because dense lists can be compactly 
stored as bitvectors. In addition, large overlaid 
skips can be used in the delta-compressed lists to 
improve query runtime. 
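
A minimal sketch of the hybrid query flow is given below, assuming the sparse lists have already been decoded into sorted Python lists and the dense lists are Python integers used as bitvectors. The function names are hypothetical, and the case of a query with no sparse list (which would use a pure bitvector AND) is deliberately left out.

```python
def to_bitvector(postings):
    """Set bit i for every document identifier i in the list."""
    bv = 0
    for doc_id in postings:
        bv |= 1 << doc_id
    return bv


def bvcontains(bv, doc_id):
    """Test whether doc_id is present in the bitvector."""
    return bool((bv >> doc_id) & 1)


def merge(a, b):
    """Two-pointer intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def hybrid_intersect(sparse_lists, bitvectors):
    """Merge the sparse lists first, then filter through each bitvector."""
    if not sparse_lists:
        raise ValueError("this sketch expects at least one sparse list")
    candidates = sparse_lists[0]
    for other in sparse_lists[1:]:
        candidates = merge(candidates, other)
    for bv in bitvectors:
        candidates = [d for d in candidates if bvcontains(bv, d)]
    return candidates
```

Because the sparse lists are intersected first, each bitvector is probed only for the few surviving candidates.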

In order to store more postings in the faster 
bitvector form, a semi-bitvector structure [5] en- 
codes the front portion of a list as a bitvector and 
the rest using skips over delta compression. By 
skewing lists to have dense front portions, this 
approach can improve both space and runtime. 


Other Approaches 
Quasi-succinct indices [10] store list values in 
blocks with the lower bits of the values in an array 
and the higher bits as deltas using unary encoding 
combined with skips for fast access. This struc- 
ture produces a good space-time trade-off when 
the number of higher-level bits is limited. The list 
intersection implementation can exploit the unary 
encoding of the higher-level bits by counting the 
number of ones in machine words to find values 
quickly. The resultant space-time performance is 
comparable to various skip-type implementations 
for conjunctive list intersection, though indexing 
speed may be slower. 

The treap data structure combines the func- 
tionality of a binary tree and a binary heap. This 
treap data structure can implement list intersection
[6] by storing each list as a treap where
the list values are used as the tree order and the 
frequencies are used as the heap order. During 
list intersection, subtrees can be pruned from 
the processing if the frequency is too low to 
produce highly ranked results. In order to make 
this approach viable, low-frequency values are 
stored in separate lists using delta-compression 
with skips. This treap and frequency separated 
index structure can produce some space-time per- 
formance improvements compared to existing 
ranking-based search systems. 

Wavelet trees can also be used to implement 
list intersection [8]. The postings lists are ordered 
by frequency, and each value is assigned a global 
location, so that each list can be represented 
by a range of global locations. A series of bit 
sequences represents a tree which starts with the 
frequency-ordered lists of the global locations 
and translates them into the actual document 
identifiers. Each level of the tree splits the docu- 
ment identifier domain ranges in half and encodes 
the edges of the tree using the bit sequences 
(left as 0 and right as 1). Multiple lists can be
intersected by following the translation of their 
document ranges in this tree of bit sequences, 
only following the branch if it occurs in all of the 
list translations. After some careful optimization 
of the translation code, the wavelet tree data 
structure results in similar space usage, but faster 
query runtimes than some existing methods for 
conjunctive queries. 


Ranking 

Ranking algorithms are closely guarded trade se- 
crets for large web search companies, so their de- 
tails are not generally known. Many approaches, 
however, add term frequencies and/or postings 
offsets into the postings lists and combine this 
with corpus statistics to produce good results, as 
done in the standard BM25 approach. Unfortu- 
nately, information such as frequencies cannot be 
easily added to bitvector structures, thus limiting 
their use. Data and link analysis can also help 
by producing a global order, such as PageRank, 
which can be factored into the ranking function 
to improve results. 


Reordering

Document identifiers in postings lists can be 
assigned to make the identifier deltas smaller 
and more compressible. As a first stage, they are 
assigned to form a compact domain of values, 
while a second stage renumbers these identi- 
fiers to optimize the system performance in a 
process referred to as document reordering [3]. 
Reordering can improve space usage by placing 
documents with similar terms close together in 
the ordering, thus reducing the size of the deltas, 
which can then be stored more compactly, among 
other benefits. 


Applications 


Intersecting inverted lists is at the heart of search 
engine query processing and top-k operators in 
databases. 


Open Problems 


Going forward, the main goal is to design list
representations which are provably optimal in 
terms of space usage (entropy) together with 
algorithms that achieve the optimal trade-off 
between time and the space used by the given 
representation. 


Experimental Results


Most published works involving intersections of 
inverted lists prove their results through experi- 
mentation. Following this approach, we show the 
relative performance of the described approaches 
using an in-memory conjunctive list intersection 
system that has document ordered postings lists 
without any ranking-based structures or processing.
We run this system on the TREC GOV2 corpus
and execute 5,000 of the associated queries.
These experiments were executed on an AMD 
Phenom II X6 1090T 3.6GHz Processor with 
6GB of memory running Ubuntu Linux 2.6.32- 
43-server. The code was compiled using the gcc 
4.4.3 compiler with the -O3 command line parameter.
Our results are presented in Fig. 1 using
a space-time (log-log) plot. For the configurations
considered, the block size of the encoding always
equals the skip size, X, and the bitvector configurations
all use X = 256.

With compressed lists alone, intersection is 
slow. On the other hand, uncompressed lists 
are much larger than compressed ones, but 
random access allows them to be fast. List 
indexes when combined with the compressed 
lists add some space, but their targeted access 
into the lists allows them to be even faster 
than the uncompressed algorithms. (For the list 
indexes, we present the fastest configurations we 
tested over the parameter ranges of X and B.) 

Intersections of Inverted Lists, Fig. 1 Space vs. time (log-log) plot for various intersection algorithms

The performance of list indexes suggests that
the benefits of knowing where to probe into
the list (i.e., using skips rather than a probing
search routine) outweigh the cost of decoding 
the data at that probe location. Using the hybrid 
bitvector approach is much faster and somewhat 
smaller than the other techniques. Adding large 
overlaid skips to the delta-compressed lists 
allows the bitvectors + skips algorithm to improve 
performance. 

Reordering the documents to be in URL 
order gives significant space improvements. 
This ordering also improves query runtimes 
significantly, for all combinations of skips 
and/or bitvectors. Splitting the documents into 
eight groups by descending number of terms 
in document, reordering within the groups by 
URL ordering (td-g8-url), and using semi- 
bitvectors produce additional improvements in 
both space and runtime. This demonstration of 
the superior performance of bitvectors suggests 
that integrating them into ranking-based systems 
warrants closer examination. 


URLs to Code and Datasets 


Several standard datasets and query workloads 
are available from the Text REtrieval Conference
(TREC at http://trec.nist.gov). Many implemen- 
tations of search engines are available in the open 
source community, including Wumpus (http:// 
www.wumpus-search.org), Zettair (http://www. 
seg.rmit.edu.au/zettair/), and Lucene (http:// 
lucene.apache.org). 


Cross-References 


Compressing Integer Sequences 


Recommended Reading 


1. Anh VN, Moffat A (2005) Inverted index compression
using word-aligned binary codes. Inf Retr
8(1):151-166

2. Barbay J, López-Ortiz A, Lu T, Salinger A (2009) An
experimental investigation of set intersection algorithms
for text searching. J Exp Algorithmics 14:3.7,
1-24

3. Blandford D, Blelloch G (2002) Index compression
through document reordering. In: Proceedings of
the data compression conference (DCC), Snowbird.
IEEE, pp 342-351

4. Culpepper JS, Moffat A (2010) Efficient set intersection
for inverted indexing. ACM Trans Inf Syst
29(1):1, 1-25

5. Kane A, Tompa FW (2014) Skewed partial bitvectors
for list intersection. In: Proceedings of the 37th ACM
international conference on research and development
in information retrieval (SIGIR), Gold Coast.
ACM, pp 263-272

6. Konow R, Navarro G, Clarke CLA, López-Ortiz
A (2013) Faster and smaller inverted indices with
treaps. In: Proceedings of the 36th ACM international
conference on research and development in information
retrieval (SIGIR), Dublin. ACM, pp 193-202

7. Lemire D, Boytsov L (2013) Decoding billions of integers
per second through vectorization. Softw Pract
Exp. doi: 10.1002/spe.2203. To appear

8. Navarro G, Puglisi SJ (2010) Dual-sorted inverted
lists. In: String processing and information retrieval
(SPIRE), Los Cabos. Springer, pp 309-321

9. Sanders P, Transier F (2007) Intersection in integer
inverted indices. In: Proceedings of the 9th
workshop on algorithm engineering and experiments
(ALENEX), New Orleans. SIAM, pp 71-83

10. Vigna S (2013) Quasi-succinct indices. In: Proceedings
of the 6th international conference on web search
and data mining (WSDM), Rome. ACM, pp 83-92

11. Zukowski M, Heman S, Nes N, Boncz P (2006)
Super-scalar RAM-CPU cache compression. In: Proceedings
of the 22nd international conference on data
engineering (ICDE), Atlanta. IEEE, pp 59.1-59.12


Intrinsic Universality in 
Self-Assembly 


Damien Woods 
Computer Science, California Institute of 
Technology, Pasadena, CA, USA 


Keywords 


Abstract Tile Assembly Model; Intrinsic
universality; Self-assembly; Simulation


Years and Authors of Summarized 
Original Work 


2012; Doty, Lutz, Patitz, Schweller, Summers,
Woods

2013; Demaine, Patitz, Rogers, Schweller, Summers,
Woods

2014; Meunier, Patitz, Summers, Theyssier,
Winslow, Woods

2014; Demaine, Demaine, Fekete, Patitz,
Schweller, Winslow, Woods


Problem Definition 


Algorithmic self-assembly [11] is the idea that 
small self-assembling molecules can compute as 
they grow structures. It gives programmers a set 
of theoretical models in which to specify and 
design target structures while trying to optimize 
resources such as number of molecule types or 
even construction time. The abstract Tile Assembly
Model [11] is one such model. An instance of
the model is called a tile assembly system and
is a triple $\mathcal{T} = (T, \sigma, \tau)$ consisting of a finite
set $T$ of square tiles, a seed assembly $\sigma$ (one or
more tiles stuck together), and a temperature
$\tau \in \{1,2,3,\ldots\}$, as shown in Fig. 1a. Each side of a
square tile has a glue (or color) $g$ which in turn
has a strength $s \in \{0,1,2,\ldots\}$. Growth occurs on
the integer plane and begins from a seed assembly
(or a seed tile) placed at the origin, as shown
in Fig. 1b. A tile sticks to a partially formed 
assembly if it can be placed next to the assembly 
in such a way that enough of its glues match the 
glues of the adjacent tiles on the assembly and the 
sum of the matching glue strengths is at least the 
temperature. Growth proceeds one tile at a time, 
asynchronously and nondeterministically. 
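
The attachment rule can be made concrete with a small sketch. The representation below (a `Tile` with four glue labels, a dictionary mapping glue labels to strengths, and an assembly stored as a dictionary from grid positions to tiles) is an assumption made for illustration and is not taken from the cited work.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    north: str
    east: str
    south: str
    west: str


# (dx, dy, side of the new tile, side of the neighbor that faces it)
NEIGHBORS = [
    (0, 1, "north", "south"),
    (1, 0, "east", "west"),
    (0, -1, "south", "north"),
    (-1, 0, "west", "east"),
]


def binding_strength(assembly, pos, tile, strength):
    """Sum the strengths of glues that match the adjacent tiles at pos."""
    x, y = pos
    total = 0
    for dx, dy, side, facing in NEIGHBORS:
        neighbor = assembly.get((x + dx, y + dy))
        if neighbor is None:
            continue
        glue = getattr(tile, side)
        if glue == getattr(neighbor, facing):
            total += strength.get(glue, 0)
    return total


def can_attach(assembly, pos, tile, strength, temperature):
    """A tile sticks if pos is empty and matching glue strength >= temperature."""
    if pos in assembly:
        return False
    return binding_strength(assembly, pos, tile, strength) >= temperature
```

At temperature 2, a tile can attach either through one strength-2 glue or cooperatively through two strength-1 glues on different sides; at temperature 1 a single matching strength-1 glue suffices.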

Here we discuss recent results and suggest 
open questions on intrinsic universality and sim- 
ulation as a method to compare self-assembly 
models. Figure 2 gives an overview of these and 
other results. For more details, see [12]. 


Simulation and Intrinsic Universality 

Intuitively, one self-assembly model simulates 
another if they grow the same structures, via 
the same dynamical growth processes, possibly 
with some spatial scaling. Let $\mathcal{S}$ and $\mathcal{T}$ be tile
assembly systems of the abstract Tile Assembly
Model described above. $\mathcal{S}$ is said to simulate $\mathcal{T}$ if
the following conditions hold: (1) each tile of $\mathcal{T}$
is represented by one or more $m \times m$ blocks of
tiles in $\mathcal{S}$ called supertiles, (2) the seed assembly
of $\mathcal{T}$ is represented by the seed assembly of $\mathcal{S}$
(one or more connected $m \times m$ supertiles), and
(3) via supertile representation every sequence of
tile placements in the simulated system $\mathcal{T}$ has a
corresponding sequence of supertile placements
in the simulator system $\mathcal{S}$, and vice versa. It
is worth pointing out that although the intuitive
idea of one assembly system simulating another
is fairly simple, the formal definition of simulation
[10] gets a little technical as the filling out of
supertiles in the simulator is an asynchronous and
nondeterministic distributed process with many
supertiles growing independently and in parallel
in the simulator system.


Intrinsic Universality in Self-Assembly, Fig. 1 An instance
of the abstract Tile Assembly Model and an example
showing simulation and intrinsic universality. (a)
A tile assembly system $\mathcal{T}$ consists of a tile set, seed tile,
and a temperature $\tau \in \mathbb{N}$. Colored glues on the tiles’
sides have a natural number strength (shown here as 0,
1, or 2 colored tabs). (b) Growth begins from the seed
with tiles sticking to the growing assembly if the sum of
the strengths of the matching glues is at least $\tau$. (c) An
intrinsically universal tile set U. (d) When initialized with
a seed assembly (which encodes $\mathcal{T}$) and at temperature 2,
the intrinsically universal tile set simulates the dynamics
of $\mathcal{T}$ with each tile placement in $\mathcal{T}$ being simulated by the
growth of an $m \times m$ block of tiles. Single tile attachment is
denoted by $\to$, and $\to^*$ denotes multiple tile attachments.
Note that both systems have many other growth dynamics
that are not shown

Intrinsic Universality in Self-Assembly, Fig. 2 Classes
of tile assembly systems and their relationship with respect
to simulation. There is an arrow from B to A if A
contains B with respect to simulation: that is, for each tile
assembly system $\mathcal{B} \in B$, there is a tile assembly system
$\mathcal{A}_{\mathcal{B}} \in A$ that simulates $\mathcal{B}$. Dashed arrows denote containment,
solid arrows denote strict containment, a self-loop
denotes the existence of an intrinsically universal tile set
for a class and its omission implies that the existence of
such a tile set is an open problem. aTAM: abstract Tile
Assembly Model (growth from a seed assembly by single
tile addition in 2D), $\tau$ denotes “temperature.” 2HAM:
Two-Handed Tile Assembly Model (assemblies of tiles
stick together in 2D). A 2HAM temperature hierarchy
is shown for some $c \in \{2,3,4,\ldots\}$ and, in fact, for
each such $c$ the set of temperatures $\{c^i \mid i \in \{2,3,\ldots\}\}$
gives an infinite hierarchy of classes of strictly increasing
simulation power in the 2HAM


Key Results 


The Abstract Tile Assembly Model Is 
Intrinsically Universal 

A class of tile assembly systems C is said to be
intrinsically universal if there exists a single set
of tiles U that simulates any instance of C. For
each such simulation, U should be appropriately
initialized as an instance (i.e., a tile assembly system)
of C itself. Figure 1d illustrates the concept.
For example, the abstract Tile Assembly Model
has been shown to be intrinsically universal [5].
Specifically, this means that there is a single set
of tiles U that when appropriately initialized is
capable of simulating an arbitrary tile assembly
system $\mathcal{T}$. To program such a simulation, tiles
from $\mathcal{T}$ are represented as $m \times m$ supertiles (built
from tiles in U), and the seed assembly of $\mathcal{T}$ is
represented as a connected assembly $\sigma_{\mathcal{T}}$ of such
supertiles. Furthermore, the entire tile assembly
system $\mathcal{T}$ (a finite object) is itself encoded in
the supertiles of $\sigma_{\mathcal{T}}$ of $\mathcal{U}$. Then if we watch all
possible growth dynamics in both $\mathcal{T} = (T, \sigma, \tau)$
and $\mathcal{U} = (U, \sigma_{\mathcal{T}}, 2)$, we get that both systems
produce the same set of assemblies via the same
dynamics where we use a supertile representation
function to map from supertiles over U to tiles
from $\mathcal{T}$. It is worth pointing out that in this
particular construction [5], the simulating system
is always (merely) at temperature $\tau = 2$ no
matter how large the temperature ($\tau > 1$) of the
simulated system.

This intrinsically universal tile set U has the 
ability to simulate both the geometry and growth 
order of any tile assembly system. Modulo spatial 
rescaling U represents the full power and expres- 
sivity of the entire abstract Tile Assembly Model. 


Noncooperative Assembly Is Weaker than 
Cooperative Assembly 

The temperature 1, or noncooperative, model is a 
restriction of the abstract Tile Assembly Model. 
Despite its esoteric name, it models a fundamen- 
tal and ubiquitous form of growth: asynchronous 
growing and branching tips in Euclidean space
where each new tile is added if it matches on at
least one side. Separating the power of the noncooperative
and cooperative models has presented
a significant challenge to the community.

Recently it has been shown that the noncooperative
model is provably weaker than the full
model [10] in the sense that it is not capable of
simulating arbitrary tile assembly systems. This
is the first fully general negative result about
temperature 1 that does not assume restrictions
on the model nor unproven hypotheses.

An interesting aspect of this result is that it 
holds for 3D noncooperative systems; they too 
cannot simulate arbitrary tile assembly systems. 
This seems quite shocking, given that 3D nonco- 
operative systems are Turing-universal [1]! So in 
particular, 3D noncooperative systems can simulate
2D (or 3D) cooperative systems by simulating
a Turing machine that in turn simulates
the cooperative system, but this loose style of 
simulation ends up destroying the geometry and 
dynamics of tile assembly by encoding every- 
thing as “geometry-less” strings. Hence, Turing- 
universal algorithmic behavior in self-assembly 
does not imply the ability to simulate, in a di- 
rect geometric fashion, arbitrary algorithmic self- 
assembly processes. 


One Tile to Rule Them All 

As an example of a simulation result on a very 
different model of self-assembly, Demaine, De- 
maine, Fekete, Patitz, Schweller, Winslow, and 
Woods [4] describe a sequence of simulations 
that route from square tiles, to the intrinsically 
universal tile set, to hexagons (with strength $< \tau$,
or weak, glues), to a single polygon that is
translatable, rotatable, and flippable. Their fixed-sized
polygon, when appropriately seeded, simulates
any tile assembly system from the abstract
Tile Assembly Model. They also show that with
translation only (i.e., no rotation), such results are
not possible with a small (size < 3) seed (although
with larger seeds, a single translation-only
polyomino simulates the space-time diagram of
a 1D cellular automaton). In the simpler setting
of Wang plane tiling, they give an easy method
to “compile” any tile set $T$ (on the square or
hexagonal lattice) to a single regular polygon that
simulates exactly the tilings of $T$, except with
tiny gaps between the polygons.


Two Hands 

It has been shown that the two-handed, or hierarchical,
model of self-assembly (where large
assemblies of tiles may come together in a single
step) is not intrinsically universal [3]. Specifically,
there is no tile set that, in the two-handed model,
can simulate all two-handed systems for all temperatures.
However, for each $\tau \in \{2,3,4,\ldots\}$,
there is a tile set $U_\tau$ that is intrinsically universal
for the class of two-handed systems that work at
temperature $\tau$. Also, there is an infinite hierarchy
of classes of such systems with each level
strictly more powerful than the one below. In fact,
there is an infinite set of such hierarchies, as
described in the caption of Fig. 2. These results
give a formalization of the intuition that multiple
long-range interactions are more powerful than
fewer long-range interactions in the two-handed
model.


Open Problems 


Gaps in Fig. 2 (i.e., missing solid arrows and
missing models) suggest a variety of open questions.
Also, it remains as future work to further
tease apart the power of restrictions of the abstract
Tile Assembly Model; for example, it remains
open whether 2D noncooperative systems
are intrinsically universal for themselves.

It is an open question whether or not the 
hexagonal Tile Assembly Model [4], various 
polygonal Tile Assembly Models [4, 7], the 
Nubot model [13], and Signal-Passing Tile 
Assembly Model [6,9] are intrinsically universal. 
Furthermore, simulation could be used to tease 
apart the power of subclasses of these models. 

Gilbert et al. [7] investigate the computational 
power of various kinds of polygonal tile assembly 
systems, showing that regular polygon tiles with 
>6 sides simulate Turing machines. What is the 
relationship between tile geometry and simula- 
tion power? Do more sides give strictly more 
simulation power? 

A desirable feature of a simulator is not only 
that it simulates all possible dynamics of some 
simulated system, but that the probability of a 
given dynamics is roughly equal in both the 
simulated system and the simulator. Is there an 
intrinsically universal tile set with that property? 
Here, the probability of seeing a given dynamics 
or assembly in a simulator should be close to that 
of the simulated system, where “close” means, 
say, within a factor proportional to the spatial 
scaling. 

Does there exist a tile set U for the abstract
Tile Assembly Model, such that for any (adversarially
chosen) seed assembly $\sigma$, at temperature 2,
this tile assembly system simulates some
tile assembly system $\mathcal{T}$? Moreover, U should be
able to simulate all such members $\mathcal{T}$ of some
nontrivial class S. U is a tile set that can do
one thing and nothing else: simulate tile assembly
systems from the class S. This question about U
is inspired by the factor simulation question in
CA [2].

Many algorithmic tile assembly systems use
cooperative self-assembly to simulate Turing machines
in a “zig-zag” fashion, as do a number
of experimentally implemented systems. Can
the negative result of [10] be extended to show
2D temperature 1 abstract Tile Assembly Model
systems do not simulate zig-zag tile assembly
systems?

There are a number of future research 
directions for the two-handed, or hierarchical, 
self-assembly model. One open question [3, 8] 
asks whether or not temperature $\tau$ two-handed
systems can simulate temperature $\tau - 1$ two-handed
systems. Another direction involves
finding which aspects of the model (e.g., 
mismatches, excess binding strength, geometric 
blocking) are required for intrinsic universality 
at a given temperature, to better understand the 
intricacies of this very powerful, but natural, 
model. 

Of course, there are many other ways to com- 
pare the power of self-assembly models: shape 
and pattern building, tile complexity, time com- 
plexity, determinism versus nondeterminism, and 
randomized (coin-flipping) algorithms in self- 
assembly. It remains as important future work to 
find relationships between these notions on the 
one hand and intrinsic universality and simulation 
on the other hand. Can ideas from intrinsic uni- 
versality be used to answer questions about these 
notions? 


Motivation


The problem of coordinating the access to a 
shared medium is a central challenge in wireless
networks. In order to solve this problem, a
proper medium access control (MAC) protocol is
needed. Ideally, such a protocol should not only 
be able to use the wireless medium as effectively 
as possible, but it should also be robust against 
a wide range of interference problems includ- 
ing jamming attacks. Interference problems from 
outside sources are usually ignored in theory 
but in practice it is important to take these into 
account, particularly because the ISM frequency 
band, which is the standard band used for wire- 
less communication, is one of the most dirty 
frequency bands as it is affected by many devices 
like microwaves. 


Problem Definition 


We model interference from outside sources with
the help of an adversary. In the most general
model that we have published so far [9], our adversarial
model is based on the most widely used
model to capture interference problems, which is
known as the SINR (signal-to-interference-and-noise
ratio) model. In the SINR model, a message
sent by node $u$ is correctly received by node $v$
if and only if $P_v(u)/(N + \sum_{w \in S} P_v(w)) \ge \beta$,
where $P_x(y)$ is the received power at node $x$
of the signal transmitted by node $y$, $N$ is the
background noise, and $S$ is the set of nodes $w \ne u$
that are transmitting at the same time as $u$. The
threshold $\beta > 1$ depends on the desired rate, the
modulation scheme, etc. When using the standard
model for signal propagation, this expression
results in
$(P(u)/d(u,v)^\alpha)/(N + \sum_{w \in S} P(w)/d(w,v)^\alpha) \ge \beta$,
where $P(x)$ is the strength
of the signal transmitted by $x$, $d(x,y)$ is the
Euclidean distance between $x$ and $y$, and $\alpha$ is
the path-loss exponent. We assume that all nodes
transmit with some fixed signal strength $P$ and
that $\alpha > 2 + \epsilon$ for some constant $\epsilon > 0$, which is
usually the case in an outdoors environment.

In most theory papers on MAC protocols, the
background noise $N$ is either ignored (i.e., $N = 0$)
or assumed to behave like a Gaussian variable.
This, however, is an oversimplification of the real
world. There are many sources of interference
producing a non-Gaussian noise such as electrical
devices, temporary obstacles, coexisting
networks, or jamming attacks. In order to capture
a very broad range of noise phenomena, we
model the background noise $N$ (due to jamming
or to environmental noise) with the aid of an
adversary ADV that has a fixed energy budget
within a certain time frame for each node $v$. More
precisely, in our case, a message transmitted by a
node $u$ will be successfully received by node $v$ if
and only if

$$\frac{P/d(u,v)^\alpha}{\mathrm{ADV}(v) + \sum_{w \in S} P/d(w,v)^\alpha} \ge \beta, \qquad (1)$$

where $\mathrm{ADV}(v)$ is the current noise level created
by the adversary at node $v$. The goal is to design
a MAC protocol that allows the nodes to successfully
transmit messages under this model as long
as this is in principle possible.
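
The reception rule (1) can be checked numerically with a short sketch. The function name `is_received`, the dictionary of node positions, and the parameter names are assumptions made for illustration; the inequality is evaluated in multiplied-out form so that a zero denominator causes no error.

```python
import math


def is_received(u, v, transmitters, positions, P, alpha, beta, adv_noise):
    """Return True if v correctly receives u's message under rule (1).

    transmitters: nodes sending in this round (u included or not).
    positions:    dict mapping each node to its (x, y) coordinates.
    adv_noise:    current adversarial noise level ADV(v) at node v.
    """
    signal = P / math.dist(positions[u], positions[v]) ** alpha
    interference = sum(
        P / math.dist(positions[w], positions[v]) ** alpha
        for w in transmitters
        if w != u and w != v  # v is sensing, so it is not a transmitter
    )
    # Same condition as signal / (adv_noise + interference) >= beta.
    return signal >= beta * (adv_noise + interference)
```

With u and v at distance 1, one far interferer, and a path-loss exponent of 3, the adversary needs a noise level of roughly P/β at v to block the reception.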

For the formal description and analysis, we
assume a synchronized setting where time proceeds
in synchronized time steps called rounds.
In each round, a node $u$ may either transmit a
message or sense the channel, but it cannot do
both. A node which is sensing the channel may
either (i) sense an idle channel, (ii) sense a busy
channel, or (iii) receive a packet. In order to
distinguish between an idle and a busy channel,
the nodes use a fixed noise threshold $\vartheta$: if the
measured signal power exceeds $\vartheta$, the channel
is considered busy, otherwise idle. Whether a
message is successfully received is determined
by the SINR rule described above. To leave some
chance for the nodes to communicate, we restrict
the adversary to be $(B,T)$-bounded: for each
node $v$ and time interval $I$ of length $T$, a $(B,T)$-bounded
adversary has an overall noise budget of
$B \cdot T$ that it can use to increase the noise level at
node $v$ and that it can distribute among the time
steps of $I$ as it likes, depending on the current
state of the nodes. This adversarial noise model is
very general, since in addition to being adaptive,
the adversary is allowed to make independent
decisions on which nodes to jam at any point
in time (provided that the adversary does not
exceed its noise budget over a time window of
size $T$).
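
As a small illustration of the budget constraint, the following sketch checks whether a recorded per-round noise trace at a single node respects the budget in every window of T consecutive rounds. The function name `is_bounded` and the trace representation are hypothetical and only serve to make the definition concrete.

```python
def is_bounded(noise, B, T):
    """Check that every window of T consecutive rounds has total noise <= B*T.

    noise: list with the adversarial noise level at one node in each round.
    """
    budget = B * T
    window = sum(noise[:T])
    if window > budget:
        return False
    for t in range(T, len(noise)):
        window += noise[t] - noise[t - T]  # slide the window by one round
        if window > budget:
            return False
    return True
```

Note that the adversary may spend the whole budget of a window in a single round, so a short burst of very strong jamming is allowed as long as it is followed by quiet rounds.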

Our goal is to design a symmetric local-control
MAC protocol (i.e., there is no central authority
controlling the nodes, and all the nodes are executing
the same protocol) that has a constant competitive
throughput against any $(B,T)$-bounded
adversary as long as certain conditions (that are
as general as possible) are met. In order to define
what we mean by “competitive,” we need some
notation. The transmission range of a node $v$ is
defined as the disk with center $v$ and radius $r$
with $P/r^\alpha \ge \beta\vartheta$. Given a constant $\epsilon > 0$,
a time step is called potentially busy at some
node $v$ if $\mathrm{ADV}(v) \ge (1 - \epsilon)\vartheta$ (i.e., only a
little bit of additional interference by the other
nodes is needed so that $v$ sees a busy channel).
For a not potentially busy time step, it is still
possible that a message sent by a node $u$ within
$v$’s transmission range is successfully received
by $v$. Therefore, as long as the adversary is forced
to offer not potentially busy time steps due to
its limited budget and every node has at least
one other node in its transmission range, it is in
principle possible for the nodes to successfully
transmit messages. To investigate that formally,
we use the following notation. For any time frame
$F$ and node $v$, let $f_v(F)$ be the number of time
steps in $F$ that are not potentially busy at $v$
and let $s_v(F)$ be the number of time steps in
which $v$ successfully receives a message. We call
a protocol c-competitive for some time frame $F$ if
$\sum_{v \in V} s_v(F) \ge c \cdot \sum_{v \in V} f_v(F)$. An adversary is
uniform if at any time step, $\mathrm{ADV}(v) = \mathrm{ADV}(w)$
for all nodes $v, w \in V$, which implies that
$f_v(F) = f_w(F)$ for all nodes.


Key Results


We presented a MAC protocol called SADE
which can achieve a c-competitive throughput
where c only depends on $\epsilon$ and the path-loss
exponent $\alpha$ but not on the size of the network
or other network parameters [9]. The intuition
behind SADE is simple: each node $v$ maintains a
parameter $p_v$ which specifies $v$’s probability of
accessing the channel at a given moment of time.
That is, in each round, each node $v$ decides to
broadcast a message with probability $p_v$. (This is
similar to classical random backoff mechanisms
where the next transmission time $t$ is chosen uniformly
at random from an interval of size $1/p_v$.)
The nodes adapt their $p_v$ values over time in a
multiplicative-increase multiplicative-decrease
manner, i.e., the value is lowered in times when
the channel is utilized (more specifically, we
decrease $p_v$ whenever a successful transmission
occurs) or increased during times when the
channel is idling. However, $p_v$ will never exceed
$\hat{p}$, for some sufficiently small constant $\hat{p} > 0$.

In addition to the probability value $p_v$, each
node $v$ maintains a time window estimate $T_v$ and
a counter $c_v$ for $T_v$. The variable $T_v$ is used to
estimate the adversary’s time window $T$: a good
estimation of $T$ can help the nodes recover from a
situation where they experience high interference
in the network. In times of high interference, $T_v$
will be increased and the sending probability $p_v$
will be decreased. Now we are ready to describe
SADE in full detail.


Initially, every node $v$ sets $T_v := 1$, $c_v := 1$,
and $p_v := \hat{p}$. In order to distinguish
between idle and busy rounds, each node
uses a fixed noise threshold of $\vartheta$.

The SADE protocol works in synchronized
rounds. In every round, each node $v$
decides with probability $p_v$ to send a message.
If it decides not to send a message, it
checks the following two conditions:

• If $v$ successfully receives a message,
  then $p_v := (1 + \gamma)^{-1} p_v$.

• If $v$ senses an idle channel (i.e., the total
  noise created by transmissions of other
  nodes and the adversary is less than $\vartheta$),
  then $p_v := \min\{(1 + \gamma) p_v, \hat{p}\}$ and
  $T_v := \max\{1, T_v - 1\}$.

Afterward, $v$ sets $c_v := c_v + 1$. If $c_v > T_v$,
then it does the following: $v$ sets $c_v := 1$,
and if there was no idle step among the
past $T_v$ rounds, then $p_v := (1 + \gamma)^{-1} p_v$
and $T_v := T_v + 2$.
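
The update rules in the box above can be sketched in code. The following Python class is a minimal illustration of one node's state and is not the authors' implementation: the class name `SadeNode`, the `history` list used to test for idle rounds, and the boolean inputs `sent`, `received`, and `idle` (which a real radio would derive from the noise threshold) are assumptions made for this sketch.

```python
import random


class SadeNode:
    """Per-node state for the update rules; all names are illustrative."""

    def __init__(self, gamma, p_max):
        self.gamma = gamma      # adaptation parameter (gamma in the text)
        self.p_max = p_max      # upper bound p-hat on the access probability
        self.p = p_max          # initially p_v := p-hat
        self.T = 1              # time window estimate T_v
        self.c = 1              # counter c_v
        self.history = []       # assumed helper: one idle flag per round

    def wants_to_send(self, rng=random):
        # Each round the node transmits with probability p_v.
        return rng.random() < self.p

    def update(self, sent, received, idle):
        # A node that transmitted cannot sense the channel in that round.
        sensed_idle = (not sent) and idle
        if not sent:
            if received:
                # Successful reception: multiplicative decrease.
                self.p = self.p / (1 + self.gamma)
            elif idle:
                # Idle channel: multiplicative increase, capped at p-hat.
                self.p = min((1 + self.gamma) * self.p, self.p_max)
                self.T = max(1, self.T - 1)
        self.history.append(sensed_idle)
        self.c += 1
        if self.c > self.T:
            self.c = 1
            # No idle step among the past T_v rounds: back off further.
            if not any(self.history[-self.T:]):
                self.p = self.p / (1 + self.gamma)
                self.T += 2
```

In a simulation, `wants_to_send` would be called once per round for every node, and `update` afterward with the channel observation for that round.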


Given that $\gamma \in O(1/(\log T + \log\log n))$, one
can show the following theorem, where $n$ is the
number of nodes and $N = \max\{n, T\}$.


Theorem 1 When running SADE for at least
$\Omega((T \log N)/\epsilon + (\log N)^4/(\gamma\epsilon)^2)$ time steps,
SADE has a $2^{-O((1/\epsilon)^{2/(\alpha-2)})}$-competitive
throughput for any $((1 - \epsilon)\vartheta, T)$-bounded
adversary as long as (a) the adversary is
uniform and the transmission range of every
node contains at least one node or (b) there are
at least $2/\epsilon$ nodes within the transmission range
of every node.


SADE is an adaption of the MAC protocol 
described in [6] for Unit Disk Graphs that works 
in more realistic network scenarios considering 
physical interference. Variants of SADE have also 
been shown to be successful in other scenarios: 

In [7] a variant called ANTIJAM is presented 
for a simpler wireless model but a more severe 
adversary called reactive adversary, which is an 
adversary that can base the jamming decision 
on the actions of the nodes in the current time 
step and not just the initial state of the system 
at the current time step. However, the adversary 
can only distinguish between the cases that at 
least one node is transmitting or no node is 
transmitting, i.e., it cannot determine whether a 
transmitted message is successfully received. 

In [8] another variant called COMAC is presented
for a simpler wireless model that can handle
coexisting networks. Even if these networks
cannot exchange any information and the number
of these networks is unknown, the protocol is
shown to be competitive.

All of these results trace back to a first result 
in [1] for a very simple wireless model and the 
case of a single-hop wireless network. 


Applications 


Practical applications of our results are MAC 
protocols that are much more robust to outside 
interference and jamming than the existing 
protocols like 802.11. In fact, it is known that 
a much weaker jammer than the ones considered 
by us already suffices to dramatically reduce 
the throughput of the standard 802.11 MAC 
protocol [2]. 


Open Problems 


So far, we have not considered the case of power 
control and multiple communication channels. 
Multiple communication channels have been cov- 
ered in several other works (e.g., [3-5]) but under 
an adversary that is not as powerful as our adver- 
sary. Also, several of our bounds are not tight yet, 
so it remains to determine tight upper and lower 
bounds on the competitiveness of MAC protocols 
within our models. 
Jamming-Resistant MAC Protocols for Wireless Networks, Fig. 1 The throughput with respect to varying $\epsilon$

Experimental Results


We conducted various simulations to study the
robustness of SADE. When varying $\epsilon$, we found
that the worst-case bound of Theorem 1 may
be too pessimistic in many scenarios, and the
throughput depends to a lesser extent on the
constant $\epsilon$. To be more specific, our results suggest
that the throughput depends only polynomially
on $\epsilon$ (cf. the left-most image of Fig. 1), so more
work is needed here.
Problem Definition


K-best enumeration problems are a type of 
combinatorial enumeration in which, rather than 
seeking a single best solution, the goal is to find 
a set of k solutions (for a given parameter value 
k) that are better than all other possible solutions. 
Many of these problems involve finding 
structures in a graph that can be represented 
by subsets of the graph’s edges. In particular, the 
k shortest paths between two vertices s and t in
a weighted network are a set of k distinct paths 
that are shorter than all other paths, and other 
problems such as the k smallest spanning trees of 
a graph or the k minimum weight matchings in a 
graph are defined in the same way. 


Key Results 


One of the earliest works in the area of k-best 
optimization was by Hoffman and Pavley [10] 
formulating the k-shortest path problem; their 
paper cites unpublished work by Bock, Kantner, 
and Hayes on the same problem. Later research 
by Lawler [12], Gabow [7], and Hamacher and 
Queyranne [8] described a general approach to 
k-best optimization, suitable for many of these 
problems, involving the hierarchical partitioning 
of the solution space into subproblems. One way 
of doing this is to view the optimal solution to
a problem as a sequence of edges, and define 
one subproblem for each edge, consisting of 
the solutions that first deviate from the optimal 
solution at that edge. Continuing this subdivision 
recursively leads to a tree of subproblems, 
each having a worse solution value than its 
parent, such that each possible solution is the 
best solution for exactly one subproblem in the 
hierarchy. A best-first search of this tree allows 
the k-best solutions to be found. Alternatively, 
if both the first and second best solutions can be 
found, and differ from each other at an edge e, 
then one can form only two subproblems, one 
consisting of the solutions that include e and 
one consisting of the solutions that exclude e. 
Again, the subdivision continues recursively; 
each solution (except the global optimum) is the 
second-best solution for exactly one subproblem, 
allowing a best-first tree search to find the k- 
best solutions. An algorithm of Frederickson [6] 
solves this tree search problem in a number of 
steps proportional to k times the degree of the 
tree; each step involves finding the solution (or 
second-best solution) to a single subproblem. 
Probably the most important and heavily stud- 
ied of the k-best optimization problems is the 
problem of finding k shortest paths, first for- 
mulated by Hoffman and Pavley [10]. In the 
most basic version of this problem, the paths are 
allowed to have repeated vertices or edges (unless 
the input is acyclic, in which case repetitions are 
impossible). An algorithm of Eppstein [4] solves 
this version of the problem in the optimal time 
bound $O(m + n \log n + k)$, where m and n are
the numbers of edges and vertices in the given 
graph; that is, after a preprocessing stage that is 
dominated by the time to use Dijkstra’s algorithm 
to find a single shortest-path tree, the algorithm 
takes constant time per path. Eppstein’s algorithm 
follows Hoffman and Pavley in representing a 
path by its sequence of deviations, the edges that 
do not belong to a tree T of shortest paths to 
the destination node. The deviation edges that
can be reached by a path in T from a given
node v are represented as a binary heap (ordered
by how much additional length the deviation
would cause) and these heaps are used to define
a partition of the solution space into subproblems,
consisting of the paths that follow a certain
sequence of deviations followed by one more
deviation from a specified heap. The best path in a
subproblem is the one that chooses the deviation 
at the root of its heap, and the remaining paths 
can be partitioned into three sub-subproblems, 
two for the children of the root and one for the 
paths that use the root deviation but then continue 
with additional deviations. In this way, Eppstein 
constructs a tree of subproblems to which Freder- 
ickson’s tree-searching method can be applied. 
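
To make the basic version of the problem concrete (paths may repeat vertices and edges), the following Python sketch lists the k shortest s-t walks by a plain best-first search over partial paths. It is deliberately not Eppstein's algorithm and does not achieve the time bound stated above; it only illustrates the problem being solved. The function name, the adjacency-list input format, and the assumption of nonnegative edge weights are choices made for this illustration.

```python
import heapq


def k_shortest_walks(graph, s, t, k):
    """Return up to k shortest s-t walks as (cost, path) pairs.

    graph maps a node to a list of (neighbor, weight) pairs; weights are
    assumed nonnegative. Walks may repeat vertices and edges.
    """
    heap = [(0, [s])]   # partial walks ordered by their length so far
    pops = {}           # how many times each node has been expanded
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        u = path[-1]
        pops[u] = pops.get(u, 0) + 1
        if pops[u] > k:
            # Only the k shortest walks reaching u can be prefixes of
            # one of the k shortest walks to t.
            continue
        if u == t:
            found.append((cost, path))
        for v, w in graph.get(u, []):
            heapq.heappush(heap, (cost + w, path + [v]))
    return found
```

For example, on the two-node graph `{'s': [('t', 1)], 't': [('s', 1)]}` with k = 2, the walks returned are s-t and s-t-s-t.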

In a graph with cycles (or in an undirected
graph which, when its edges are converted to
directed edges, has many cycles), it is generally
preferable to list only the k shortest simple (or
loopless) paths, not allowing repetitions within a
path. This variation of the k shortest paths problem
was formulated by Clarke et al. [3]. Yen's
algorithm [14] still remains the one with the
best asymptotic time performance,
$O(kn(m + n \log n))$; it is based on best-solution
partitioning using Dijkstra's algorithm to find the best
solution in each subproblem. A more recent
algorithm of Hershberger et al. [9] is often faster,
but is based on a heuristic that can sometimes
fail, causing it to become no faster than Yen's
algorithm. In the undirected case, it is possible to
find the k shortest simple paths in time
$O(k(m + n \log n))$ [11].
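
The partitioning idea behind Yen's algorithm can be sketched as follows: each prefix (root) of the most recently output path defines a subproblem, and Dijkstra's algorithm finds the best completion that avoids the root's earlier vertices and the edges already used by output paths with the same root. The Python code below is a textbook-style sketch under stated assumptions (a directed graph given as a dict of dicts with nonnegative weights, hypothetical function names `dijkstra` and `yen`); it uses a binary heap and makes no attempt to match the time bound quoted above.

```python
import heapq


def dijkstra(graph, s, t, banned_nodes, banned_edges):
    """Shortest s-t path as (cost, path) avoiding the banned items, or None."""
    dist, prev, done = {s: 0}, {}, set()
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            path = [t]
            while path[-1] != s:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if v not in dist or d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None


def yen(graph, s, t, k):
    """Up to k shortest simple s-t paths as (cost, path) pairs."""
    first = dijkstra(graph, s, t, set(), set())
    if first is None:
        return []
    best, candidates = [first], []
    while len(best) < k:
        last = best[-1][1]
        for i in range(len(last) - 1):
            root = last[:i + 1]
            root_cost = sum(graph[a][b] for a, b in zip(root, root[1:]))
            # Forbid the next edge of every output path sharing this root.
            banned_edges = {(p[i], p[i + 1]) for _, p in best
                            if p[:i + 1] == root}
            # Forbid earlier root vertices so the result stays simple.
            spur = dijkstra(graph, root[-1], t, set(root[:-1]), banned_edges)
            if spur is not None:
                cand = (root_cost + spur[0], root[:-1] + spur[1])
                if cand not in candidates and cand not in best:
                    heapq.heappush(candidates, cand)
        if not candidates:
            break
        best.append(heapq.heappop(candidates))
    return best
```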

Gabow [7] introduced both the problem of 
finding the k minimum-weight spanning trees 
of an edge-weighted graph, and the technique of 
finding a binary hierarchical subdivision of the 
space of solutions, which he used to solve the 
problem. In any graph, the best and second-best 
spanning trees differ only by one edge swap (the 
removal of one edge from a tree and its replace- 
ment by a different edge that reconnects the two 
subtrees formed by the removal), a property that 
simplifies the search for a second-best tree as 
needed for Gabow’s partitioning technique. The 
fastest known algorithms for the k-best span- 
ning trees problem are based on Gabow’s parti- 
tioning technique, together with dynamic graph 
data structures that keep track of the best swap 
in a network as that network undergoes a se- 
quence of edge insertion and deletion opera- 
tions. To use this technique, one initializes a 
fully-persistent best-swap data structure (one in 
which each update creates a new version of the
structure without modifying the existing versions, 
and in which updates may be applied to any 
version) and associates its initial version with 
the root of the subproblem tree. Then, whenever 
an algorithm for selecting the k best nodes of 
the subproblem tree generates a new node (a 
subproblem formed by including or excluding 
an edge from the allowed solutions) the parent 
node’s version of the data structure is updated 
(by either increasing or decreasing the weight 
of the edge to force it to be included or ex- 
cluded in all solutions) and the updated ver- 
sion of the data structure is associated with the 
child node. In this way, the data structure can 
be used to quickly find the second-best solu- 
tion for each of the subproblems explored by 
the algorithm. Based on this method, the k-best 
spanning trees of a graph with n vertices and 
m edges can be found (in an implicit repre- 
sentation based on sequences of swaps rather 
than explicitly listing all edges in each tree) 
in time $O(\mathrm{MST}(m,n) + k \min(n,k)^{1/2})$ where
MST(m,n) denotes the time for finding a single 
minimum spanning tree (linear time, if random- 
ized algorithms are considered) [5]. 

After paths and spanning trees, probably the 
next most commonly studied k-best enumeration 
problem concerns matchings. The problem of 
finding the k minimum-weight perfect matchings
in an edge-weighted graph was introduced by
Murty [13]. A later algorithm by Chegireddy and
Hamacher [1] solves the problem in time $O(kn^3)$
(where n is the number of vertices in the graph) 
using the technique of building a binary partition 
of the solution space. Other problems whose 
k-best solutions have been studied include the 
Chinese postman problem, the traveling salesman 
problem, spanning arborescences in a directed 
network, the matroid intersection problem, binary 
search trees and Huffman coding, chess strate- 
gies, integer flows, and network cuts. 

For many NP-hard optimization problems, 
where even finding a single best solution is 
difficult, an approach that has proven very 
successful is parameterized complexity, in which 
one finds an integer parameter describing the 
input instance or its solution that is often 
much smaller than the input size, and designs 
algorithms whose running time is a fixed
polynomial of the input size multiplied by a non- 
polynomial function of the parameter value. Chen 
et al. [2] extend this paradigm to k-best problems, 
showing that, for instance, many NP-hard k-best 
problems can be solved in polynomial time per 
solution for graphs of bounded treewidth. 


Applications 


The k shortest path problem has many applications.
The most obvious of these are in the
generation of alternative routes, in problems in- 
volving communication networks, transportation 
networks, or building evacuation planning. In 
bioinformatics, it has been applied to shortest- 
path formulations of dynamic programming al- 
gorithms for biological sequence alignment and 
also applied in the reconstruction of metabolic 
pathways, and reconstruction of gene regulation 
networks. The problem has been used frequently 
in natural language and speech processing, where 
a path in a network may represent a hypothesis 
for the correct decoding of an utterance or piece 
of writing. Other applications include motion 
tracking, genealogy, the design of power, com- 
munications, and transportation networks, timing 
analysis of circuits, and task scheduling. 

The problem of finding the k-best spanning 
trees has been applied to point process intensity 
estimation, the analysis of metabolic pathways, 
image segmentation and classification, the re- 
construction of pedigrees from genetic data, the 
parsing of natural-language text, and the analysis 
of electronic circuits. 
Problem Definition


The theory of bidimensionality simultaneously 
provides subexponential time parameterized al- 
gorithms and efficient approximation schemes 
for a wide range of optimization problems on 
planar graphs and, more generally, on classes of 
graphs excluding a fixed graph H as a minor. 
It turns out that bidimensionality also provides 
linear kernels for a multitude of problems on 
these classes of graphs. The results stated here 
unify and generalize a number of kernelization 
results for problems on planar graphs and graphs 
of bounded genus; see [2] for a more thorough 
discussion. 


Kernelization 

Kernelization is a mathematical framework for
the study of polynomial time preprocessing of
instances of computationally hard problems. Let
$\mathcal{G}$ be the set of all graphs. A parameterized graph
problem is a subset $\Pi$ of $\mathcal{G} \times \mathbb{N}$. An instance
is a pair $(G,k) \in \mathcal{G} \times \mathbb{N}$. The instance $(G,k)$
is a "yes"-instance of $\Pi$ if $(G,k) \in \Pi$ and a
"no"-instance otherwise. A strict kernel with $ck$
vertices for a parameterized graph problem $\Pi$
and constant $c > 0$ is an algorithm $A$ with the
following properties:


• $A$ takes as input an instance $(G,k)$, runs in
  polynomial time, and outputs another instance
  $(G',k')$.

• $(G',k')$ is a "yes"-instance of $\Pi$ if and only if
  $(G,k)$ is.

• $|V(G')| \le c \cdot k$ and $k' \le k$.


A linear kernel for a parameterized graph 
problem is a strict kernel with ck vertices for 
some constant c. We remark that our definition of 
a linear kernel is somewhat simplified compared 
to the classic definition [8], but that it is 
essentially equivalent. For a discussion of the 
definition of a kernel, we refer to the textbook of 
Cygan et al. [4]. 
Graph Classes

Bidimensionality theory primarily concerns itself 
with graph problems where the input graph is 
restricted to be in a specific graph class. A graph 
class C is simply a subset of the set G of all 
graphs. As an example, the set of all planar 
graphs is a graph class. Another example of a 
graph class is the set of all apex graphs. Here a 
graph H is apex if H contains a vertex v such that 
deleting v from H leaves a planar graph. Notice 
that every planar graph is apex. 

A graph H is a minor of a graph G if H can 
be obtained from G by deleting vertices, deleting 
edges, or contracting edges. Here contracting the 
edge {u,v} in G means identifying the vertices 
u and v and removing all self-loops and dou- 
ble edges. If H can be obtained from G just 
by contracting edges, then H is a contraction 
of G. 
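
As a small illustration of the contraction operation just defined, the Python sketch below contracts one edge of an undirected graph, identifying the two endpoints and discarding self-loops and double edges. The function name and the representation (a set of vertices and a set of two-element frozensets) are assumptions made for this example.

```python
def contract_edge(vertices, edges, u, v):
    """Contract the edge {u, v}: merge v into u.

    vertices is an iterable of vertex labels, edges an iterable of pairs.
    Returns (new_vertices, new_edges) with edges as frozensets, so that
    double edges collapse automatically; self-loops are dropped.
    """
    edge_set = {frozenset(e) for e in edges}
    if frozenset((u, v)) not in edge_set or u == v:
        raise ValueError("{u, v} must be an edge of the graph")
    new_vertices = {x for x in vertices if x != v}
    new_edges = set()
    for e in edge_set:
        a, b = tuple(e)
        a = u if a == v else a
        b = u if b == v else b
        if a != b:  # drop the self-loop created by the contraction
            new_edges.add(frozenset((a, b)))
    return new_vertices, new_edges
```

Contracting any edge of a triangle, for instance, yields a single edge, because the two edges to the third vertex become a double edge and are merged.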

A graph class $C$ is minor closed if every minor
of a graph in $C$ is also in $C$. A graph class $C$ is
minor-free if $C$ is minor closed and there exists
a graph $H \notin C$. A graph class $C$ is apex-minor-free
if $C$ is minor closed and there exists an apex
graph $H \notin C$. Notice that $H \notin C$ for a minor
closed class $C$ implies that $H$ cannot be a minor
of any graph $G \in C$.


CMSO Logic 

CMSO logic stands for Counting Monadic
Second Order logic, a formal language to
describe properties of graphs. A CMSO-sentence
is a formula $\psi$ with variables for single vertices,
vertex sets, single edges and edge sets, existential
and universal quantifiers ($\exists$ and $\forall$), logical
connectives $\vee$, $\wedge$ and $\neg$, as well as the following
operators:

• $v \in S$, where $v$ is a vertex variable and $S$ is
  a vertex set variable. The operator returns true
  if the vertex $v$ is in the vertex set $S$. Similarly,
  CMSO has an operator $e \in X$ where $e$ is an
  edge variable and $X$ is an edge set variable.

• $v_1 = v_2$, where $v_1$ and $v_2$ are vertex variables.
  The operator returns true if $v_1$ and $v_2$ are the
  same vertex of $G$. There is also an operator
  $e_1 = e_2$ to check equality of two edge
  variables $e_1$ and $e_2$.

• $\mathrm{adj}(v_1, v_2)$ is defined for vertex variables $v_1$
  and $v_2$ and returns true if $v_1$ and $v_2$ are
  adjacent in $G$.

• $\mathrm{inc}(v, e)$ is defined for a vertex variable $v$ and
  edge variable $e$. $\mathrm{inc}(v, e)$ returns true if the
  edge $e$ is incident to the vertex $v$ in $G$, in other
  words, if $v$ is one of the two endpoints of $e$.

• $\mathrm{card}_{p,q}(S)$ is defined for every pair of integers
  $p$, $q$, and vertex or edge set variable $S$.
  $\mathrm{card}_{p,q}(S)$ returns true if $|S| \equiv q \pmod p$.
  For an example, $\mathrm{card}_{2,1}(S)$ returns true if $|S|$
  is odd.


When we quantify a variable, we need to
specify whether it is a vertex variable, edge
variable, vertex set variable, or edge set variable. To
specify that an existentially quantified variable $x$
is a vertex variable we will write $\exists x \in V(G)$.
We will use $\forall e \in E(G)$ to universally quantify
edge variables and $\exists X \subseteq V(G)$ to existentially
quantify vertex set variables. We will always use
lower case letters for vertex and edge variables
and upper case letters for vertex set and edge set
variables.

A graph $G$ on which the formula $\psi$ is true is
said to model $\psi$. The notation $G \models \psi$ means
that $G$ models $\psi$. As an example, consider the
formula


$\psi_1 = \forall v \in V(G)\ \forall x \in V(G)\ \forall y \in V(G)\ \forall z \in V(G) :$
$(x = y) \vee (x = z) \vee (y = z) \vee \neg\mathrm{adj}(v, x) \vee \neg\mathrm{adj}(v, y) \vee \neg\mathrm{adj}(v, z)$


The formula $\psi_1$ states that for every four (not
necessarily distinct) vertices $v$, $x$, $y$, and $z$, if $x$,
$y$, and $z$ are distinct, then $v$ is not adjacent to all
of $\{x, y, z\}$. In other words, a graph $G$ models
$\psi_1$ if and only if the degree of every vertex of $G$
is at most 2. CMSO can be used to express many
graph properties, such as G having a Hamiltonian
cycle, G being 3-colorable, or G being planar.
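
The degree-bound formula above can be checked mechanically on small graphs by evaluating it exactly as written, quantifying over all four-tuples of vertices. The Python sketch below does this by brute force; the function name `models_psi1` and the edge-list input format are assumptions for illustration, and the approach is only meant to make the semantics of the formula concrete, not to be efficient.

```python
from itertools import product


def models_psi1(vertices, edges):
    """Evaluate the formula: for all v, x, y, z, either two of x, y, z
    coincide or v is non-adjacent to at least one of them."""
    edge_set = {frozenset(e) for e in edges}

    def adj(a, b):
        return frozenset((a, b)) in edge_set

    return all(
        x == y or x == z or y == z
        or not adj(v, x) or not adj(v, y) or not adj(v, z)
        for v, x, y, z in product(list(vertices), repeat=4)
    )
```

A path satisfies the formula, whereas a star with three leaves does not, since its center is adjacent to three distinct vertices.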

In CMSO, one can also write formulas where
one uses free variables. These are variables that
are used in the formula but never quantified with
an $\exists$ or $\forall$ quantifier. As an example, consider the
formula


$\psi_{DS} = \forall u \in V(G)\ \exists v \in V(G) :$
$(v \in S) \wedge (u = v \vee \mathrm{adj}(u, v))$


The variable $S$ is a free variable in $\psi_{DS}$ because
it is used in the formula, but is never quantified.
It does not make sense to ask whether a graph $G$
models $\psi_{DS}$ because when we ask whether the
vertex $v$ is in $S$, the set $S$ is not well defined.
However, if the set $S \subseteq V(G)$ is provided
together with the graph $G$, we can evaluate the
formula $\psi_{DS}$. $\psi_{DS}$ will be true for a graph $G$
and set $S \subseteq V(G)$ if, for every vertex $u \in V(G)$,
there exists a vertex $v \in V(G)$ such that $v$ is in $S$
and either $u = v$ or $u$ and $v$ are neighbors in $G$. In
other words, the pair $(G, S)$ models $\psi_{DS}$ (written
$(G,S) \models \psi_{DS}$) if and only if $S$ is a dominating
set in $G$ (i.e., every vertex not in $S$ has a neighbor
in $S$).
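
The dominating set formula with its free variable S can be evaluated directly once both the graph and the set are supplied. The following Python sketch mirrors the quantifier structure of the formula; the function name `models_psi_ds` and the input format are assumptions made for this illustration.

```python
def models_psi_ds(vertices, edges, S):
    """Evaluate: for every u there exists v with v in S and
    (u == v or u adjacent to v)."""
    edge_set = {frozenset(e) for e in edges}
    vertices = list(vertices)

    def adj(a, b):
        return frozenset((a, b)) in edge_set

    return all(
        any(v in S and (u == v or adj(u, v)) for v in vertices)
        for u in vertices
    )
```

On the path 1-2-3-4, the set {2, 3} satisfies the formula, while {2} does not because vertex 4 has no neighbor in it.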


CMSO-Optimization Problems

We are now in position to define the parameterized
problems for which we will obtain kernelization
results. For every CMSO formula $\psi$ with
a single free vertex set variable $S$, we define the
following two problems:

$\psi$-CMSO-MIN (MAX):


INPUT: Graph $G$ and integer $k$.

QUESTION: Does there exist a vertex set $S \subseteq V(G)$
such that $(G,S) \models \psi$ and $|S| \le k$
($|S| \ge k$ for MAX)?


Formally, $\psi$-CMSO-MIN (MAX) is a parameterized
graph problem where the "yes" instances
are exactly the pairs $(G, k)$ such that there exists
a vertex set $S$ of size at most $k$ (at least $k$)
and $(G,S) \models \psi$. We will use the term
CMSO-optimization problems to refer to $\psi$-CMSO-MIN
(MAX) for some CMSO formula $\psi$.


Kernelization, Bidimensionality and Kernels 


Many well-studied and not so well-studied
graph problems are CMSO-optimization problems.
Examples include VERTEX COVER, DOMINATING SET,
CYCLE PACKING, and the list goes
on and on (see [2]). We encourage the interested
reader to attempt to formulate the problems mentioned
above as CMSO-optimization problems.
We will be discussing CMSO-optimization problems
on planar graphs and on minor-free classes
of graphs.

Our results are for problems where the input
graph is promised to belong to a certain graph
class $C$. We formalize this by encoding membership
in $C$ in the formula $\psi$. For an example,
$\psi_{DS}$-CMSO-MIN is the well-studied DOMINATING
SET problem. If we want to restrict the problem to
planar graphs, we can make a new CMSO logic
formula $\psi_{planar}$ such that $G \models \psi_{planar}$ if and
only if $G$ is planar. We can now make a new
formula


$\psi'_{DS} = \psi_{DS} \wedge \psi_{planar}$


and consider the problem $\psi'_{DS}$-CMSO-MIN.
Here $(G,k)$ is a "yes" instance if $G$ has a
dominating set $S$ of size at most $k$ and $G$ is
planar. Thus, this problem also forces us to
check planarity of $G$, but this is polynomial
time solvable and therefore not an issue with
respect to kernelization. In a similar manner, one
can restrict any CMSO-optimization problem to
a graph class $C$, as long as there exists a CMSO
formula $\psi_C$ such that $G \models \psi_C$ if and only if
$G \in C$. Luckily, such a formula is known to exist
for every minor-free class $C$. We will say that a
parameterized problem $\Pi$ is a problem on the
graph class $C$ if, for every "yes" instance $(G, k)$
of $\Pi$, the graph $G$ is in $C$.

For any CMSO-MIN problem $\Pi$, we have that
$(G,k) \in \Pi$ implies that $(G,k') \in \Pi$ for all
$k' \ge k$. Similarly, for a CMSO-MAX problem $\Pi$,
we have that $(G,k) \in \Pi$ implies that $(G,k') \in \Pi$
for all $k' \le k$. Thus, the notion of "optimality"
is well defined for CMSO-optimization
problems. For the problem $\Pi = \psi$-CMSO-MIN,
we define


$OPT_\Pi(G) = \min\{k : (G,k) \in \Pi\}$.

If no $k$ such that $(G,k) \in \Pi$ exists, $OPT_\Pi(G)$
returns $+\infty$. Similarly, for the problem
$\Pi = \psi$-CMSO-MAX,


$OPT_\Pi(G) = \max\{k : (G,k) \in \Pi\}$.


If no $k$ such that $(G,k) \in \Pi$ exists, $OPT_\Pi(G)$
returns $-\infty$. We define $SOL_\Pi(G)$ to be a function
that given as input a graph $G$ returns a set
$S$ of size $OPT_\Pi(G)$ such that $(G,S) \models \psi$ and
returns null if no such set $S$ exists.


Bidimensionality

For many problems, it holds that contracting an
edge cannot increase the size of the optimal
solution. We will say that such problems are
contraction closed. Formally, a CMSO-optimization
problem $\Pi$ is contraction closed if for any $G$
and $uv \in E(G)$, $OPT_\Pi(G/uv) \le OPT_\Pi(G)$.
If contracting edges, deleting edges, and deleting
vertices cannot increase the size of the optimal
solution, we say that the problem is minor closed.
Informally, a problem is bidimensional if it is
minor closed and the value of the optimum grows
with both dimensions of a grid. In other words,
on a $(k \times k)$-grid, the optimum should be
approximately quadratic in $k$. To formally define
bidimensional problems, we first need to define the
$(k \times k)$-grid $H_k$, as well as the related graph $\Gamma_k$.
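
The contraction-closed condition can be tested exhaustively on a small graph. The Python sketch below takes DOMINATING SET as the example problem, computes the optimum by brute force, and checks that no single edge contraction increases it. The function names and the edge-list representation are assumptions for this illustration, and the exponential-time search is only suitable for very small graphs.

```python
from itertools import combinations


def opt_dominating_set(vertices, edges):
    """Size of a smallest dominating set, found by exhaustive search."""
    vs = sorted(vertices)
    closed = {v: {v} for v in vs}          # closed neighborhoods
    for a, b in edges:
        closed[a].add(b)
        closed[b].add(a)
    for size in range(len(vs) + 1):
        for cand in combinations(vs, size):
            if set().union(*(closed[v] for v in cand)) == set(vs):
                return size


def contract(vertices, edges, u, v):
    """Contract edge {u, v} by merging v into u."""
    new_vertices = [x for x in vertices if x != v]
    new_edges = set()
    for a, b in edges:
        a = u if a == v else a
        b = u if b == v else b
        if a != b:                          # no self-loops
            new_edges.add(frozenset((a, b)))  # no double edges
    return new_vertices, [tuple(e) for e in new_edges]


def contraction_never_increases_opt(vertices, edges):
    """Check OPT(G/uv) <= OPT(G) for every edge uv of this one graph."""
    opt = opt_dominating_set(vertices, edges)
    return all(
        opt_dominating_set(*contract(vertices, edges, u, v)) <= opt
        for u, v in edges
    )
```

A check of this kind on individual graphs is of course not a proof that the problem is contraction closed; it only illustrates what the definition asks for.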

For a positive integer $k$, a $k \times k$ grid, denoted
by $H_k$, is a graph with vertex set $\{(x, y) : x, y \in
\{1, \ldots, k\}\}$. Thus, $H_k$ has exactly $k^2$ vertices.
Two different vertices $(x, y)$ and $(x', y')$ are
adjacent if and only if $|x - x'| + |y - y'| = 1$. For an
integer $k > 0$, the graph $\Gamma_k$ is obtained from the
grid $H_k$ by adding in every grid cell the diagonal
edge going up and to the right and making the
bottom right vertex of the grid adjacent to all
border vertices. The graph $\Gamma_9$ is shown in Fig. 1.
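
The two graphs can be built directly from their definitions. The Python sketch below is one possible construction; it assumes that the second coordinate grows upward, so that the bottom right vertex of the grid is (k, 1) and the added diagonal in each cell joins (x, y) to (x + 1, y + 1). These orientation choices, and the function names, are assumptions of this sketch rather than something fixed by the text.

```python
def grid_graph(k):
    """Vertices and edges of the (k x k)-grid."""
    vertices = [(x, y) for x in range(1, k + 1) for y in range(1, k + 1)]
    edges = set()
    for x, y in vertices:
        if x < k:
            edges.add(frozenset({(x, y), (x + 1, y)}))
        if y < k:
            edges.add(frozenset({(x, y), (x, y + 1)}))
    return vertices, edges


def gamma_graph(k):
    """The triangulated grid with one corner joined to the whole border."""
    vertices, edges = grid_graph(k)
    edges = set(edges)
    # One diagonal per grid cell, going up and to the right.
    for x in range(1, k):
        for y in range(1, k):
            edges.add(frozenset({(x, y), (x + 1, y + 1)}))
    # Join the bottom right vertex to every other border vertex.
    corner = (k, 1)
    for x, y in vertices:
        on_border = x in (1, k) or y in (1, k)
        if on_border and (x, y) != corner:
            edges.add(frozenset({corner, (x, y)}))
    return vertices, edges
```

Storing edges as frozensets means that border vertices already adjacent to the corner in the grid are not counted twice.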

We are now ready to give the definition of
bidimensional problems. A CMSO-optimization
problem $\Pi$ is contraction-bidimensional if it is
contraction closed, and there exists a constant
$c > 0$ such that $OPT_\Pi(\Gamma_k) \ge ck^2$. Similarly,
$\Pi$ is minor-bidimensional if it is minor closed,
and there exists a constant $c > 0$ such that
$OPT_\Pi(H_k) \ge ck^2$.

As an example, the DOMINATING SET problem
is contraction-bidimensional. It is easy to
verify that contracting an edge may not increase
the size of the smallest dominating set of a graph
$G$ and that $\Gamma_k$ does not have a dominating set of
size smaller than $(k-2)^2/7$.

Separability

Our kernelization algorithms work by recursively 
splitting the input instance by small separators. 
For this to work, the problem has to be somewhat 
well behaved in the following sense. Whenever a 
graph is split along a small separator into two in- 
dependent sub-instances L and R, the size of the 
optimum solution for the graph G[L] is relatively 
close to the size of the intersection between L and 
the optimum solution to the original graph G. We 
now proceed with a formal definition of what it 
means for a problem to be well behaved. 

For a set $L \subseteq V(G)$, we define $\partial(L)$ to be
the set of vertices in $L$ with at least one neighbor
outside $L$. A CMSO-optimization problem $\Pi$ is
linear separable if there exists a constant $c > 0$
such that for every set $L \subseteq V(G)$, we have


$|SOL_\Pi(G) \cap L| - c \cdot |\partial(L)| \le OPT_\Pi(G[L])$
$\le |SOL_\Pi(G) \cap L| + c \cdot |\partial(L)|$.


For a concrete example, we encourage the
reader to consider the DOMINATING SET problem
and to prove that for DOMINATING SET the
inequalities above hold. The crux of the argument
is to augment optimal solutions of $G$ and $G[L]$ by
adding all vertices in $\partial(L)$ to them.
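
The separability inequalities can be checked exhaustively on a small graph for DOMINATING SET. The Python sketch below does so for every vertex subset L, using brute-force optimal solutions. The function names are hypothetical, and the default constant c = 1 is an assumption of this sketch: it is the value suggested by the augmentation argument outlined above, not one stated in the text.

```python
from itertools import combinations


def min_dominating_set(vertices, edges):
    """A smallest dominating set of the subgraph induced by `vertices`."""
    vs = sorted(vertices)
    closed = {v: {v} for v in vs}
    for a, b in edges:
        if a in closed and b in closed:   # keep only induced edges
            closed[a].add(b)
            closed[b].add(a)
    for size in range(len(vs) + 1):
        for cand in combinations(vs, size):
            if set().union(*(closed[v] for v in cand)) == set(vs):
                return set(cand)


def boundary(L, edges):
    """Vertices of L with at least one neighbor outside L."""
    out = set()
    for a, b in edges:
        if a in L and b not in L:
            out.add(a)
        if b in L and a not in L:
            out.add(b)
    return out


def separable_on(vertices, edges, c=1):
    """Check both inequalities for every subset L of this one graph."""
    sol = min_dominating_set(vertices, edges)
    vs = sorted(vertices)
    for size in range(len(vs) + 1):
        for L in map(set, combinations(vs, size)):
            opt_L = len(min_dominating_set(L, edges))
            slack = c * len(boundary(L, edges))
            inside = len(sol & L)
            if not (inside - slack <= opt_L <= inside + slack):
                return False
    return True
```

Setting c = 0 makes the check fail already on a cycle, which shows that the boundary term is genuinely needed.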


Key Results 


We can now state our main theorem. 


Theorem 1 Let $\Pi$ be a separable CMSO-optimization
problem on the graph class $C$. Then,
if $\Pi$ is minor-bidimensional and $C$ is minor-free,
or if $\Pi$ is contraction-bidimensional and $C$ is
apex-minor-free, $\Pi$ admits a linear kernel.


The significance of Theorem 1 is that it is, in
general, quite easy to formulate graph problems
as CMSO-optimization problems and prove that
the considered problem is bidimensional and
separable. If we are able to do this, Theorem 1
immediately implies that the problem admits a linear
kernel on all minor-free graph classes, or on all
apex-minor-free graph classes. As an example,
the DOMINATING SET problem has been shown
to have a linear kernel on planar graphs [1], and
the proof of this fact is quite tricky. However,
in our examples, we have shown that DOMINATING
SET is a CMSO-MIN problem, that it is
contraction-bidimensional, and that it is separable.
Theorem 1 now implies that DOMINATING
SET has a linear kernel not only on planar graphs
but on all apex-minor-free classes of graphs! One
can go through the motions and use Theorem 1 to
give linear kernels for quite a few problems. We
refer the reader to [9] for a non-exhaustive list.

We remark that the results stated here are 
generalizations of results obtained by Bodlaender 
et al. [2]. Theorem 1 is proved by combining
“algebraic reduction rules” (fully developed by 
Bodlaender et al. [2]) with new graph decompo- 
sition theorems (proved in [9]). The definitions 
here differ slightly from the definitions in the 
original work [9] and appear here in the way they 
will appear in the journal version of [9]. 


Cross-References 


Bidimensionality 
Data Reduction for Domination in Graphs 


Recommended Reading 


1. Alber J, Fellows MR, Niedermeier R (2004) 
Polynomial-time data reduction for dominating set. 
J ACM 51(3):363-384 

2. Bodlaender HL, Fomin FV, Lokshtanov D, Penninkx 
E, Saurabh S, Thilikos DM (2013) (Meta) Kerneliza- 
tion. CoRR abs/0904.0727. http://arxiv.org/abs/0904. 
0727 

3. Borie RB, Parker RG, Tovey CA (1992) Automatic
generation of linear-time algorithms from
predicate calculus descriptions of problems on
recursively constructed graph families. Algorithmica
7(5&6):555-581

4. Cygan M, Fomin FV, Kowalik L, Lokshtanov D, 
Marx D, Pilipczuk M, Pilipczuk M, Saurabh S (2015, 
to appear) Parameterized algorithms. Springer, Hei- 
delberg 

5. Demaine ED, Hajiaghayi M (2005) Bidimension- 
ality: new connections between FPT algorithms 
and PTASs. In: Proceedings of the 16th annual 
ACM-SIAM symposium on discrete algorithms 
(SODA), Vancouver. SIAM, pp 590-601 


6. Demaine ED, Hajiaghayi M (2008) The bidimension- 
ality theory and its algorithmic applications. Comput 
J 51(3):292-302 

7. Demaine ED, Fomin FV, Hajiaghayi M, Thilikos DM 
(2005) Subexponential parameterized algorithms on 
graphs of bounded genus and H -minor-free graphs. J 
ACM 52(6):866-893 

8. Downey RG, Fellows MR (2013) Fundamentals of 
parameterized complexity. Texts in computer science. 
Springer, London 

9. Fomin FV, Lokshtanov D, Saurabh S, Thilikos DM 
(2010) Bidimensionality and kernels. In: Proceedings 
of the 20th annual ACM-SIAM symposium on dis- 
crete algorithms (SODA), Austin. SIAM, pp 503-510 

10. Fomin FV, Lokshtanov D, Raman V, Saurabh S 
(2011) Bidimensionality and EPTAS. In: Proceed- 
ings of the 21st annual ACM-SIAM symposium on 
discrete algorithms (SODA), San Francisco. SIAM, 
pp 748-759 


Kernelization, Constraint 
Satisfaction Problems 
Parameterized above Average 


Gregory Gutin 
Department of Computer Science, Royal 
Holloway, University of London, Egham, UK 


Keywords 


Bikernel; Kernel; MaxCSP; MaxLin; MaxSat 


Years and Authors of Summarized 
Original Work 


2011; Alon, Gutin, Kim, Szeider, Yeo 


Problem Definition 


Let $r$ be an integer, let $V = \{v_1, \ldots, v_n\}$ be a set
of variables, each taking values $-1$ (TRUE) and $1$
(FALSE), and let $\Phi$ be a set of Boolean functions,
each involving at most $r$ variables from $V$. In the
problem MAX-$r$-CSP, we are given a collection
$F$ of $m$ Boolean functions, each $f \in F$ being a
member of $\Phi$ and each with a positive integral
weight. Our aim is to find a truth assignment
that maximizes the total weight of satisfied
functions from $F$. We will denote the maximum by
$\mathrm{sat}(F)$.

Let $A$ be the average weight (over all truth
assignments) of satisfied functions. Observe that
$A$ is a lower bound for $\mathrm{sat}(F)$. In fact, $A$ is
a tight lower bound, whenever the family $\Phi$ is
closed under replacing each variable by its
complement [1]. Thus, it is natural to parameterize
MAX-$r$-CSP as follows (AA stands for Above
Average).


MAX-r-CSP-AA


Instance: A collection F of m Boolean functions,
each f ∈ F being a member of Φ,
each with a positive integral weight, and a
nonnegative integer k.

Parameter: k.

Question: sat(F) ≥ A + k?


If Φ is the set of clauses with at most r literals,
then we get a subproblem of MAX-r-CSP-AA,
abbreviated MAX-r-SAT-AA, whose unparameterized
version is simply MAX-r-SAT. Assign
-1 or 1 to each variable in V randomly and
uniformly. Since a clause c of a MAX-r-SAT-AA
instance is satisfied with probability 1 - 2^{-r_c},
where r_c is the number of literals in c, we have
A = Σ_{c ∈ F} (1 - 2^{-r_c}). Clearly, A is a tight
lower bound.
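
As a small sanity check of the formula for A, the following sketch compares it with a brute-force average over all assignments. The clause encoding (signed integers, +i for x_i and -i for its negation) and the function names are assumptions made for this illustration only; clauses are taken to have unit weight and distinct variables.

```python
from fractions import Fraction
from itertools import product

def average_satisfied(clauses):
    """A = sum over clauses c of (1 - 2^(-r_c)), r_c = number of literals in c."""
    return sum(1 - Fraction(1, 2 ** len(c)) for c in clauses)

def brute_force_average(clauses, n):
    """Average number of satisfied clauses over all 2^n truth assignments."""
    total = 0
    for bits in product([False, True], repeat=n):
        total += sum(any(bits[abs(lit) - 1] == (lit > 0) for lit in c)
                     for c in clauses)
    return Fraction(total, 2 ** n)

# (x1 or not x2), (x2 or x3 or not x1), (not x3), all of unit weight
clauses = [(1, -2), (2, 3, -1), (-3,)]
A = average_satisfied(clauses)
assert A == brute_force_average(clauses, 3)
```

Exact rational arithmetic is used so that the two averages can be compared with equality rather than a tolerance.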

If Φ is the set S of equations ∏_{i ∈ I_j} v_i = b_j,
j = 1, ..., m, where v_i, b_j ∈ {-1, 1}, the b_j are
constants, and |I_j| ≤ r, then we get a subproblem of
MAX-r-CSP-AA, abbreviated MAX-r-LIN2-AA,
whose unparameterized version is simply
MAX-r-LIN2. Assign -1 or 1 to each variable in V
randomly and uniformly. Since each equation of
F can be satisfied with probability 1/2, we have
A = W/2, where W is the sum of the weights
of equations in F. For an assignment v = v^0 of
values to the variables, let sat(S, v^0) denote the
total weight of equations of S satisfied by the
assignment. The difference sat(S, v^0) - W/2
is called the excess of v^0. Let sat(S) be
the maximum of sat(S, v^0) over all possible
assignments v^0.
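
The definitions of excess and sat(S) can be made concrete on a toy system. The sketch below is only an illustration: the triple encoding (index tuple I_j, right-hand side b_j, weight) and the function names are assumptions, and the exhaustive search is meant for very small n.

```python
from fractions import Fraction
from itertools import product
from math import prod

def excess(equations, v):
    """sat(S, v) - W/2 for equations given as (index tuple I_j, b_j, weight)."""
    W = sum(w for _, _, w in equations)
    sat = sum(w for I, b, w in equations if prod(v[i] for i in I) == b)
    return Fraction(2 * sat - W, 2)

def max_excess(equations, n):
    """sat(S) - W/2, by exhaustive search over {-1, 1}^n (small n only)."""
    return max(excess(equations, v) for v in product((-1, 1), repeat=n))

# v0*v1 = 1 (weight 2), v1*v2 = -1 (weight 1), v0*v2 = 1 (weight 3)
S = [((0, 1), 1, 2), ((1, 2), -1, 1), ((0, 2), 1, 3)]
best = max_excess(S, 3)
```

In this example the three equations cannot all hold at once, so the best assignment satisfies the two heaviest ones; the excess averaged over all assignments is zero, which matches A = W/2.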

The following notion was introduced in [1].
Let Π and Π' be parameterized problems. A
bikernel for Π is a polynomial-time algorithm
that maps an instance (I, k) of Π to an instance
(I', k') of Π' such that (i) (I, k) ∈ Π if and only
if (I', k') ∈ Π' and (ii) k' ≤ f(k) and |I'| ≤
g(k) for some functions f and g. The function g(k) is
called the size of the bikernel. It is known that a
decidable problem is fixed-parameter tractable if
and only if it admits a bikernel [1]. However, in
general a bikernel can have an exponential size,
in which case the bikernel may not be useful as a
data reduction. A bikernel is called a polynomial
bikernel if both f(k) and g(k) are polynomials
in k.

When Π = Π' we say that a bikernel for Π is
simply a kernel of Π. A great deal of research has
been devoted to deciding whether a problem admits
a polynomial kernel.

The following lemma of Alon et al. [1] shows 
that polynomial bikernels imply polynomial ker- 
nels. 


Lemma 1 Let Π, Π' be a pair of decidable
parameterized problems such that the nonparameterized
version of Π' is in NP and the nonparameterized
version of Π is NP-complete. If
there is a bikernelization from Π to Π' producing
a bikernel of polynomial size, then Π has a
polynomial-size kernel.


Key Results 


Following [2], for a Boolean function f of
weight w(f) and on r(f) ≤ r Boolean variables
v_{i_1}, ..., v_{i_{r(f)}}, we introduce a polynomial
h_f(v), v = (v_1, ..., v_n), as follows. Let
S_f ⊆ {-1, 1}^{r(f)} denote the set of all satisfying
assignments of f. Then

\[
h_f(v) = w(f)\, 2^{r - r(f)} \sum_{(a_1,\dots,a_{r(f)}) \in S_f}
         \left[ \prod_{j=1}^{r(f)} (1 + v_{i_j} a_j) - 1 \right].
\]

Let h(v) = Σ_{f ∈ F} h_f(v). It is easy to see (cf.
[1]) that the value of h(v) at some v^0 is precisely
2^r (U - A), where U is the total weight of the
functions satisfied by the truth assignment v^0.
Thus, the answer to MAX-r-CSP-AA is YES if
and only if there is a truth assignment v^0 such
that h(v^0) ≥ k 2^r.
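
The identity h(v^0) = 2^r (U - A) can be verified numerically on a small instance. The sketch below assumes a hypothetical encoding of each function as a triple (weight, tuple of variable indices, set of satisfying assignments in {-1, 1}^{r(f)}); the helper names are likewise illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

def h(functions, r, v):
    """h(v) = sum_f w(f) 2^(r - r(f)) sum_{a in S_f} [prod_j (1 + v_{i_j} a_j) - 1]."""
    total = 0
    for w, idx, sat_set in functions:
        inner = sum(prod(1 + v[i] * a for i, a in zip(idx, assignment)) - 1
                    for assignment in sat_set)
        total += w * 2 ** (r - len(idx)) * inner
    return total

def satisfied_weight(functions, v):
    """U: total weight of the functions satisfied by v."""
    return sum(w for w, idx, sat_set in functions
               if tuple(v[i] for i in idx) in sat_set)

def average_weight(functions):
    """A: expected satisfied weight under a uniformly random assignment."""
    return sum(Fraction(w * len(sat_set), 2 ** len(idx))
               for w, idx, sat_set in functions)

r, n = 2, 3
functions = [
    (2, (0, 1), {(-1, -1), (-1, 1), (1, -1)}),  # v0 OR v1 (-1 means TRUE)
    (3, (2,), {(-1,)}),                          # v2
    (1, (1, 2), {(-1, 1), (1, -1)}),             # v1 XOR v2
]
A = average_weight(functions)
for v in product((-1, 1), repeat=n):
    assert h(functions, r, v) == 2 ** r * (satisfied_weight(functions, v) - A)
```

The product term equals 2^{r(f)} exactly when v agrees with the satisfying assignment a on the variables of f and 0 otherwise, which is why the identity holds.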

Algebraic simplification of h(v) leads
to the following (Fourier expansion of h(v),
cf. [7]):

\[
h(v) = \sum_{S \in F} c_S \prod_{i \in S} v_i, \tag{1}
\]

where F = {∅ ≠ S ⊆ {1, 2, ..., n} :
c_S ≠ 0, |S| ≤ r}. Thus, |F| ≤ n^r. The
sum Σ_{S ∈ F} c_S ∏_{i ∈ S} v_i can be viewed as the
excess of an instance of MAX-r-LIN2-AA,
and, thus, we can reduce MAX-r-CSP-AA into
MAX-r-LIN2-AA in polynomial time (since
r is fixed, the algebraic simplification can be
done in polynomial time and it does not matter
whether the parameter of MAX-r-LIN2-AA is
k or k' = k 2^r). It is proved in [5] that
MAX-r-LIN2-AA has a kernel with O(k^2) variables
and equations. This kernel is a bikernel from
MAX-r-CSP-AA to MAX-r-LIN2-AA. Thus, by
Lemma 1, we obtain the following theorem of
Alon et al. [1].


Theorem 1 MAX-r-CSP-AA admits a
polynomial-size kernel.
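
The reduction behind Theorem 1 can be illustrated on a single clause. The sketch below computes the Fourier coefficients c_S by brute force and turns each monomial into one equation. The weight 2|c_S| is an assumption chosen here so that the excess, defined as sat(S, v) - W/2, equals h(v) exactly; the source only says that the sum can be viewed as the excess, and all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def fourier_coefficients(h, n):
    """c_S = 2^(-n) * sum_v h(v) * prod_{i in S} v_i, keeping nonzero c_S only."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            c = Fraction(sum(h(v) * prod(v[i] for i in S) for v in points), 2 ** n)
            if c != 0:
                coeffs[S] = c
    return coeffs

def to_lin2(coeffs):
    """One equation prod_{i in S} v_i = sign(c_S) per monomial, weight 2|c_S|."""
    return [(S, 1 if c > 0 else -1, 2 * abs(c)) for S, c in coeffs.items()]

def excess(equations, v):
    """sat(S, v) - W/2 for equations given as (index tuple, b, weight)."""
    W = sum(w for _, _, w in equations)
    sat = sum(w for S, b, w in equations if prod(v[i] for i in S) == b)
    return sat - W / 2

# h for a single clause (v0 OR v1) of weight 1 with r = 2, i.e. -v0 - v1 - v0*v1
h = lambda v: 1 - (1 + v[0]) * (1 + v[1])
coeffs = fourier_coefficients(h, 2)
equations = to_lin2(coeffs)
```

The constant coefficient vanishes because h averages to zero over all assignments, so only nonempty sets S appear, as in (1).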


Applying a reduction from MAX-r-LIN2-AA
to MAX-r-SAT-AA in which each monomial in
(1) is replaced by 2^{r-1} clauses, Alon et al. [1]
obtained the following:


Theorem 2 MAX-r-SAT-AA admits a kernel
with O(k^2) clauses and variables.


It is possible to improve this theorem with 
respect to the number of variables in the kernel. 
The following result was first obtained by Kim 
and Williams [6] (see also [3]). 


Theorem 3 MAX-r-SAT-AA admits a kernel
with O(k) variables.


Crowston et al. [4] studied the following
natural question: How does the parameterized complexity
of MAX-r-SAT-AA change when r is no longer
a constant, but a function r(n) of n? They proved
that MAX-r(n)-SAT-AA is para-NP-complete
for any r(n) ≥ ⌈log n⌉. They also proved that,
assuming the exponential time hypothesis,
MAX-r(n)-SAT-AA is not even in XP for any integral
r(n) ≥ log log n + φ(n), where φ(n) is
any real-valued unbounded strictly increasing
computable function. This lower bound on
r(n) cannot be decreased much further as they
proved that MAX-r(n)-SAT-AA is (i) in XP
for any r(n) ≤ log log n - log log log n and
(ii) fixed-parameter tractable for any r(n) ≤
log log n - log log log n - φ(n), where φ(n)
is any real-valued unbounded strictly increasing
computable function. The proofs use some results
on MAXLIN2-AA.


Problem Definition


Research on kernelization is motivated in two 
ways. First, when solving a hard (e.g., NP-hard) 
problem in practice, a common approach is to 
first preprocess the instance at hand before run- 
ning more time-consuming methods (like integer 
linear programming, branch and bound, etc.). The 
following is a natural question. Suppose we use 
polynomial time for this preprocessing phase: 
what can be predicted of the size of the instance 
resulting from preprocessing? The theory of ker- 
nelization gives us such predictions. A second 
motivation comes from the fact that a decidable
parameterized problem belongs to the class FPT
(i.e., is fixed-parameter tractable) if and only if
the problem has a kernelization algorithm.

A parameterized problem is a subset of Σ* ×
N, for some finite set Σ. A kernelization
algorithm (or, in short, kernel) for a parameterized
problem Q ⊆ Σ* × N is an algorithm A that
receives as input a pair (x, k) ∈ Σ* × N and
outputs a pair (x', k') = A(x, k), such that:


• A uses time polynomial in |x| + k.
• (x, k) ∈ Q if and only if (x', k') ∈ Q.
• There are functions f, g such that |x'| ≤
  f(k) and k' ≤ g(k).
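
To make the three conditions concrete, here is a minimal sketch of a classical kernel that is not discussed in this entry: the high-degree rule (often attributed to Buss) for VERTEX COVER, where (G, k) asks for a vertex cover of size at most k. The graph is represented only by its edge set, so isolated vertices are dropped implicitly; function names and the fixed NO-instance are choices made for this illustration.

```python
from collections import Counter
from itertools import combinations

def buss_kernel(edges, k):
    """Map (G, k) to an equivalent Vertex Cover instance with at most k'^2 edges."""
    edges = {frozenset(e) for e in edges}
    while k >= 0:
        degree = Counter(u for e in edges for u in e)
        high = [u for u, d in degree.items() if d > k]
        if not high:
            break
        # A vertex of degree > k belongs to every cover of size <= k: take it.
        edges = {e for e in edges if high[0] not in e}
        k -= 1
    if k < 0 or len(edges) > k * k:
        return {frozenset((0, 1))}, 0      # fixed NO-instance: one edge, budget 0
    return edges, k

def has_vertex_cover(edges, k):
    """Exhaustive check, used here only to validate the kernel on tiny inputs."""
    edges = [frozenset(e) for e in edges]
    vertices = sorted({u for e in edges for u in e})
    return any(all(e & set(c) for e in edges)
               for size in range(min(k, len(vertices)) + 1)
               for c in combinations(vertices, size))

star_plus_edge = [(0, i) for i in range(1, 6)] + [(6, 7)]
kernel_edges, kernel_k = buss_kernel(star_plus_edge, 2)
```

Here the output has at most k^2 edges and hence at most 2k^2 vertices, and k' ≤ k, so both f and g are polynomials in this example.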

In the definition above, f and g give upper
bounds on the size and on the parameter, respectively,
of the reduced instance. Many well-studied
problems have kernels with k' ≤ k. The running
time of an exact algorithm that starts with a 
kernelization step usually is exponential in the 
size of the kernel (i.e., f(k)), and thus small 
kernels are desirable. A kernel is said to be poly- 
nomial, if f and g are bounded by a polynomial. 
Many well-known parameterized problems have 
a polynomial kernel, but there are also many for 
which such a polynomial kernel is not known. 

Recent techniques allow us to show, under a 
complexity theoretic assumption, for some pa- 
rameterized problems that they do not have a 
polynomial kernel. The central notion is that 
of compositionality; with the help of transfor- 
mations and cross compositions, a larger set of 
problems can be handled. 


Key Results 


Compositionality 

The basic building block for showing that
problems do not have a polynomial kernel (assuming
NP ⊈ coNP/poly) is the notion of
compositionality. It comes in two types: or-composition
and and-composition.


Definition 1 An or-composition for a parameterized
problem Q ⊆ Σ* × N is an algorithm that:

• Receives as input a sequence of instances
  for Q with the same parameter,
  (s_1, k), (s_2, k), ..., (s_r, k)

• Uses time polynomial in k + Σ_{i=1}^{r} |s_i|

• Outputs one instance for Q, (s', k') ∈ Σ* × N,
  such that:


1. (s', k') ∈ Q if and only if there is an i,
   1 ≤ i ≤ r, with (s_i, k) ∈ Q.
2. k' is bounded by a polynomial in k.


The notion of and-composition is defined
similarly, with the only difference that condition (1)
above is replaced by

(s', k') ∈ Q if and only if for all i, 1 ≤
i ≤ r: (s_i, k) ∈ Q.


We define the classic variant of a parameterized
problem Q ⊆ Σ* × N as the decision
problem, denoted Q^c, where we assume that the
parameter is encoded in unary, or, equivalently,
an instance (s, k) is assumed to have size |s| + k.

Combining results of three papers gives the 
following results. 


Theorem 1 Let Q ⊆ Σ* × N be a parameterized
problem. Suppose that the classic variant
of Q, Q^c, is NP-hard. Assume that NP ⊈
coNP/poly.


1. (Bodlaender et al. [3], Fortnow and
   Santhanam [12]) If Q has an or-composition,
   then Q has no polynomial kernel.

2. (Bodlaender et al. [3], Drucker [11]) If Q has
   an and-composition, then Q has no polynomial
   kernel.


The condition that NP ⊈ coNP/poly is
equivalent to coNP ⊈ NP/poly; if it does not
hold, the polynomial time hierarchy collapses to
the third level [19].

For many parameterized problems, one can 
establish (sometimes trivially, and sometimes 
with quite involved proofs) that they are or- 
compositional or and-compositional. Taking the 
disjoint union of instances often gives a trivial 
composition. A simple example is the LONG 
PATH problem; it gets as input a pair (G, k) with
G an undirected graph and asks whether G has a 
simple path of length at least k. 


Lemma 1 If NP ⊈ coNP/poly, then LONG
PATH has no kernel polynomial in k.


Proof LONG PATH is well known to be
NP-complete. Mapping (G_1, k), ..., (G_r, k) to
the pair (H, k) with H the disjoint union of
G_1, ..., G_r is an or-composition. So, the result
follows directly as a corollary of Theorem 1. □
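
The composition used in the proof can be sketched directly. The graph encoding (a vertex count plus an edge list on vertices 0..n-1) and the function names are assumptions for this illustration; the path search is exhaustive and only suitable for tiny graphs, and path length is counted in edges.

```python
def disjoint_union(graphs):
    """graphs: list of (n_i, edges on vertices 0..n_i-1); vertices are relabelled."""
    offset, edges = 0, []
    for n, es in graphs:
        edges.extend((u + offset, v + offset) for u, v in es)
        offset += n
    return offset, edges

def has_long_path(n, edges, k):
    """Exhaustive search for a simple path with at least k edges (tiny graphs only)."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def extend(u, visited, length):
        if length >= k:
            return True
        return any(extend(w, visited | {w}, length + 1)
                   for w in adj[u] if w not in visited)

    return any(extend(u, {u}, 0) for u in range(n))

def or_compose(instances, k):
    """(G_1, k), ..., (G_r, k)  ->  (H, k) with H the disjoint union."""
    return disjoint_union(instances), k

P3 = (3, [(0, 1), (1, 2)])           # longest path has 2 edges
P4 = (4, [(0, 1), (1, 2), (2, 3)])   # longest path has 3 edges
(H_n, H_edges), k = or_compose([P3, P4], 3)
assert has_long_path(H_n, H_edges, k) == any(has_long_path(n, e, 3) for n, e in [P3, P4])
```

A simple path never leaves a connected component, so H has a long path exactly when some G_i does, and the parameter is unchanged.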


The TREEWIDTH problem gets as input a
pair (G, k) and asks whether the treewidth of
G is at most k. As it is NP-hard to decide if the
treewidth of a given graph G is at most a given
number k [1] and the treewidth of a graph is
the maximum of the treewidths of its connected
components, taking the disjoint union gives an
and-composition for the TREEWIDTH problem and
shows that TREEWIDTH has no polynomial kernel unless
NP ⊆ coNP/poly. Similar proofs work for
many more problems. Many problems can be
seen to be and- or or-compositional and thus have
no polynomial kernels under the assumption that
NP ⊈ coNP/poly. See, e.g., [3, 5, 9, 17].


Transformations 

Several researchers observed independently (see 
[2,5,9]) that transformations can be used to show 
results for additional problems. The formaliza- 
tion is due to Bodlaender et al. [5]. 


Definition 2 A polynomial parameter transformation
(ppt) from parameterized problem Q ⊆
Σ* × N to parameterized problem R ⊆ Σ* × N
is an algorithm A that:


• Has as input an instance of Q, (s, k) ∈ Σ* × N.

• Outputs an instance of R, (s', k') ∈ Σ* × N.

• (s, k) ∈ Q if and only if (s', k') ∈ R.

• A uses time polynomial in |s| + k.

• k' is bounded by a polynomial in k.


The differences with the well-known 
polynomial time or Karp reductions from NP- 
completeness theory are small: note in particular 
that it is required that the new value of the 
parameter is polynomially bounded in the old 
value of the parameter. The following theorem 
follows quite easily. 


Theorem 2 (See [5,6]) Let R have a polynomial 
kernel. If there is a ppt from Q to R, and a 
polynomial time reduction from R to the classic 
variant of Q, then Q has a polynomial kernel. 


This implies that if we have a ppt from Q to R,
Q^c is NP-hard, and R^c ∈ NP, then when Q has no
polynomial kernel, R has no polynomial kernel.


Cross Composition 
Bodlaender et al. [4] introduced the concept 
of cross composition. It gives a more powerful 
mechanism to show that some problems have
no polynomial kernel, assuming NP ⊈
coNP/poly. We need first the definition of a
polynomial equivalence relation. 


Definition 3 A polynomial equivalence relation
is an equivalence relation on Σ* that can be
decided in polynomial time and has, for each n,
a polynomial number of equivalence classes that
contain strings of length at most n.


A typical example may be that strings repre- 
sent graphs and two graphs are equivalent if and 
only if they have the same number of vertices and 
edges. 


Definition 4 Let L be a language, R a polynomial
equivalence relation, and Q a parameterized
problem. An OR cross composition of L to Q
(w.r.t. R) is an algorithm that:


• Gets as input a sequence of instances
  s_1, ..., s_r of L that belong to the same
  equivalence class of R.

• Uses time polynomial in Σ_{i=1}^{r} |s_i|.

• Outputs an instance (s', k) of Q.

• k is polynomial in max_i |s_i| + log r.

• (s', k) ∈ Q if and only if there is an i with
  s_i ∈ L.


The definition for an AND cross composition
is similar; the last condition is replaced by


(s', k) ∈ Q if and only if for all i,
s_i ∈ L.


Theorem 3 (Bodlaender et al. [4]) If we have
an OR cross composition, or an AND cross
composition from an NP-hard language L into a
parameterized problem Q, then Q does not have
a polynomial kernel, unless NP ⊆ coNP/poly.


The main differences with or-composition and
and-composition are the following: we do not need to
start with a collection of instances from Q, but can use a
collection of instances of any NP-hard language;
the bound on the new value of k usually allows us
to restrict to collections of at most 2^k instances;
and with the polynomial equivalence relation, we
can make assumptions on "similarities" between
these instances.

For examples of OR cross compositions,
and of AND cross compositions, see, e.g.,
[4, 8, 13, 17].


Other Models and Improvements 

Different models of compressibility and stronger 
versions of the lower bound techniques have 
been studied, including more general models of 
compressibility (see [11] and [7]), the use of co- 
nondeterministic composition [18], weak com- 
position [15], Turing kernelization [16], and a 
different measure for compressibility based on 
witness size of problems in NP [14]. 


Problems Without Kernels 

Many parameterized problems are known to be 
hard for the complexity class W[1]. As decidable 
problems are known to have a kernel, if and only 
if they are fixed parameter tractable, it follows 
that W[1]-hard problems do not have a kernel, 
unless W[1] = FPT (which would imply that 
the exponential time hypothesis does not hold). 
See, e.g., [10]. 


Cross-References 


Kernelization, Polynomial Lower Bounds 
Kernelization, Preprocessing for Treewidth 
Kernelization, Turing Kernels 


Recommended Reading 


1. Arnborg S, Corneil DG, Proskurowski A (1987)
   Complexity of finding embeddings in a k-tree. SIAM
   J Algebr Discret Methods 8:277-284

2. Binkele-Raible D, Fernau H, Fomin FV, Lokshtanov
   D, Saurabh S, Villanger Y (2012) Kernel(s) for
   problems with no kernel: on out-trees with many leaves.
   ACM Trans Algorithms 8(5):38

3. Bodlaender HL, Downey RG, Fellows MR, Hermelin
   D (2009) On problems without polynomial kernels. J
   Comput Syst Sci 75:423-434

4. Bodlaender HL, Jansen BMP, Kratsch S (2011)
   Cross-composition: a new technique for kernelization
   lower bounds. In: Schwentick T, Dürr C (eds)
   Proceedings 28th international symposium on
   theoretical aspects of computer science, STACS 2011,
   Dortmund. Schloss Dagstuhl - Leibniz-Zentrum
   fuer Informatik, Leibniz International Proceedings in
   Informatics (LIPIcs), vol 9, pp 165-176

5. Bodlaender HL, Thomassé S, Yeo A (2011) Kernel
   bounds for disjoint cycles and disjoint paths. Theor
   Comput Sci 412:4570-4578

6. Bodlaender HL, Jansen BMP, Kratsch S (2012)
   Kernelization lower bounds by cross-composition. CoRR
   abs/1206.5941

7. Chen Y, Flum J, Müller M (2011) Lower bounds for
   kernelizations and other preprocessing procedures.
   Theory Comput Syst 48(4):803-839

8. Cygan M, Kratsch S, Pilipczuk M, Pilipczuk M,
   Wahlström M (2012) Clique cover and graph
   separation: new incompressibility results. In: Czumaj
   A, Mehlhorn K, Pitts AM, Wattenhofer R (eds)
   Proceedings of the 39th international colloquium on
   automata, languages and programming, ICALP 2012,
   Part I, Warwick. Lecture notes in computer science,
   vol 7391. Springer, pp 254-265

9. Dom M, Lokshtanov D, Saurabh S (2009)
   Incompressibility through colors and IDs. In: Albers S,
   Marchetti-Spaccamela A, Matias Y, Nikoletseas SE,
   Thomas W (eds) Proceedings of the 36th
   international colloquium on automata, languages and
   programming, ICALP 2009, Part I, Rhodes. Lecture
   notes in computer science, vol 5555. Springer,
   pp 378-389

10. Downey RG, Fellows MR (2013) Fundamentals of
    parameterized complexity. Texts in computer science.
    Springer, London

11. Drucker A (2012) New limits to classical and
    quantum instance compression. In: Proceedings of the
    53rd annual symposium on foundations of computer
    science, FOCS 2012, New Brunswick, pp 609-618

12. Fortnow L, Santhanam R (2011) Infeasibility of
    instance compression and succinct PCPs for NP.
    J Comput Syst Sci 77:91-106

13. Gutin G, Muciaccia G, Yeo A (2013)
    (Non-)existence of polynomial kernels for the test cover
    problem. Inf Process Lett 113:123-126

14. Harnik D, Naor M (2010) On the compressibility of
    NP instances and cryptographic applications. SIAM
    J Comput 39:1667-1713

15. Hermelin D, Wu X (2012) Weak compositions and
    their applications to polynomial lower bounds for
    kernelization. In: Rabani Y (ed) Proceedings of the
    22nd annual ACM-SIAM symposium on discrete
    algorithms, SODA 2012, Kyoto. SIAM, pp 104-113

16. Hermelin D, Kratsch S, Soltys K, Wahlström M,
    Wu X (2013) A completeness theory for polynomial
    (Turing) kernelization. In: Gutin G, Szeider S (eds)
    Proceedings of the 8th international symposium on
    parameterized and exact computation, IPEC 2013,
    Sophia Antipolis. Lecture notes in computer science,
    vol 8246. Springer, pp 202-215

17. Jansen BMP, Bodlaender HL (2013) Vertex cover
    kernelization revisited: upper and lower bounds for a
    refined parameter. Theory Comput Syst 53:263-299

18. Kratsch S (2012) Co-nondeterminism in
    compositions: a kernelization lower bound for a Ramsey-type
    problem. In: Proceedings of the 22nd annual
    ACM-SIAM symposium on discrete algorithms, SODA
    2012, Kyoto, pp 114-122

19. Yap HP (1986) Some topics in graph theory. London
    mathematical society lecture note series, vol 108.
    Cambridge University Press, Cambridge


Kernelization, Matroid Methods 


Magnus Wahlström
Department of Computer Science, Royal 
Holloway, University of London, Egham, UK 


Keywords 


Kernelization; Matroids; Parameterized com- 
plexity 


Years and Authors of Summarized 
Original Work 


2012; Kratsch, Wahlström


Problem Definition 


Kernelization is the study of the power of
polynomial-time instance simplification and
preprocessing and relates more generally to
questions of compact information representation.
Given an instance x of a decision problem P,
with an associated parameter k (e.g., a bound on
the solution size in x), a polynomial kernelization
is an algorithm which in polynomial time
produces an instance x' of P, with parameter k',
such that x ∈ P if and only if x' ∈ P and
such that both |x'| and k' are bounded by p(k)
for some p(k) = poly(k). A polynomial
compression is the variant where the output x'
is an instance of a new problem P' (and may not
have any associated parameter).

Matroid theory provides the tools for a very
powerful framework for kernelization and more
general information-preserving sparsification. As
an example application, consider the following
question. You are given a graph G = (V, E)
and two sets S, T ⊆ V of terminal vertices,
where potentially |V| ≫ |S|, |T|. The task is
to reduce G to a smaller graph G' = (V', E'),
with S, T ⊆ V' and |V'| bounded by a function
of |S| + |T|, such that for any sets A ⊆ S, B ⊆
T, the minimum (A, B)-cut in G' equals that
in G. Here, all cuts are vertex cuts and may
overlap A and B (i.e., the terminal vertices are
also deletable). It is difficult to see how to do this
without using both exponential time in |S| + |T|
(due to the large number of choices of A and B)
and an exponential dependency of |V'| on |S|
and |T| (due to potentially having to include
one min cut for every choice of A and B), yet
using the appropriate tools from matroid theory,
we can in polynomial time produce such a
graph G' with |V'| = O(|S|·|T|·min(|S|, |T|)).
Call (G, S, T) a terminal cut system; we will
revisit this example later.

The main power of the framework comes from 
two sources. The first is a class of matroids 
known as gammoids, which enable the represen- 
tation of graph-cut properties as linear indepen- 
dence of vectors; the second is a tool known as 
the representative sets lemma (due to Lovász [4]
via Marx [5]) applied to such a representation. To
describe these more closely, we need to review several
definitions.


Background on Matroids 
We provide only the bare essential definitions; 
for more, see Oxley [6]. Also see the relevant 
chapters of Schrijver [8] for a more computa- 
tional perspective and Marx [5] for a concise, 
streamlined, and self-contained presentation of 
the issues most relevant to our concerns. For s € 
N, we let [s] denote the set {1,..., s}. 

A matroid is a pair M = (V, I) where V
is a ground set and I ⊆ 2^V a collection of
independent sets, subject to three axioms:


1. ∅ ∈ I.

2. If B ∈ I and A ⊆ B, then A ∈ I.

3. If A, B ∈ I and |A| < |B|, then there exists
   some b ∈ (B \ A) such that A ∪ {b} ∈ I.
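
For very small ground sets, the three axioms can be checked exhaustively. The following sketch is an illustration only; the function name and the explicit listing of independent sets are assumptions, and the running time is exponential in the size of the ground set.

```python
from itertools import combinations

def is_matroid(ground, independent_sets):
    """Check the three matroid axioms by brute force."""
    I = {frozenset(s) for s in independent_sets}
    if not all(s <= frozenset(ground) for s in I):
        return False
    if frozenset() not in I:                       # axiom 1
        return False
    for B in I:                                    # axiom 2
        for size in range(len(B)):
            if any(frozenset(A) not in I for A in combinations(B, size)):
                return False
    for A in I:                                    # axiom 3
        for B in I:
            if len(A) < len(B) and not any(A | {b} in I for b in B - A):
                return False
    return True

ground = {1, 2, 3}
# All subsets of size at most 2: the uniform matroid of rank 2 on 3 elements.
uniform = [set(c) for size in range(3) for c in combinations(ground, size)]
# Downward closed, but {3} cannot be extended from {1, 2}: axiom 3 fails.
broken = [set(), {1}, {2}, {3}, {1, 2}]
```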

All matroids we deal with will be finite (i.e., have
finite ground sets). A set S ⊆ V is independent
in M if and only if S ∈ I. A basis is a maximal
independent set in M; observe that all bases of
a matroid have the same cardinality. The rank of
a set X ⊆ V is the maximum cardinality of an
independent set S ⊆ X; again, observe that this
is well defined.


Linearly Represented Matroids 

A prime example of a matroid is a linear matroid. 
Let A be a matrix over some field F, and let V
index the column set of A. Let I contain exactly
those sets of columns of A that are linearly
independent. Then M = (V, I) defines a matroid,
denoted M(A), known as a linear matroid. For
an arbitrary matroid M, if M is isomorphic to
a linear matroid M(A) (over a field F), then M
is representable (over F), and the matrix A
represents M. Observe that this is a compact
representation, as |I| would in the general case be
exponentially large, while the matrix A would
normally have a coding size polynomial in |V|.
In general, more powerful tools are available for 
linearly represented matroids than for arbitrary 
matroids (see, e.g., the MATROID MATCHING 
problem [8]). In particular, this holds for the 
representative sets lemma (see below). 
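
An independence oracle for a linear matroid M(A) amounts to a rank computation. The sketch below works over the rationals with exact arithmetic; the choice of field, the row-list encoding of A, and the function names are assumptions for this illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(A, cols):
    """Is the set `cols` of columns of A (given as a list of rows) independent?"""
    vectors = [[row[j] for row in A] for j in cols]
    return rank(vectors) == len(cols)

# Columns (1,0), (0,1), (1,1): any two are independent, all three are not.
A = [[1, 0, 1],
     [0, 1, 1]]
```

The matrix has 2 x 3 entries, while the matroid it represents has seven independent sets; this is the compactness referred to above, on a very small scale.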


Gammoids 
The class of matroids central to our concern
is the class of gammoids, first defined by
Perfect [7]. Let G = (V, E) be a (possibly directed)
graph, S ⊆ V a set of source vertices, and T ⊆
V a set of sink vertices (where S and T may
overlap). Let X ⊆ T be independent if and only
if there exists a collection of |X| pairwise
vertex-disjoint directed paths in G, each path starting
in S and ending in X; we allow paths to have
length zero (e.g., we allow a path from a vertex
x ∈ S ∩ X to itself). This notion of independence
defines a matroid on the ground set T, referred
to as the gammoid defined by G, S, and T. By
Menger's theorem, the rank of a set X ⊆ T
equals the cardinality of an (S, X)-min cut in G.
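
Independence in a gammoid can be tested with a standard unit-capacity flow computation after splitting each vertex into an in-copy and an out-copy. The sketch below is one straightforward way to do this, not a procedure taken from the cited works; vertices are assumed to be numbered 0..n-1, arcs are directed pairs without self-loops, and the function names are illustrative.

```python
from collections import deque

def max_disjoint_paths(n, arcs, S, X):
    """Maximum number of pairwise vertex-disjoint directed paths from S to X."""
    cap = {}

    def add(u, v):
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)          # residual (reverse) arc

    for v in range(n):                     # v_in = 2v, v_out = 2v + 1
        add(2 * v, 2 * v + 1)              # each vertex may be used once
    for u, v in arcs:
        add(2 * u + 1, 2 * v)
    s, t = 2 * n, 2 * n + 1
    for v in set(S):
        add(s, 2 * v)
    for v in set(X):
        add(2 * v + 1, t)
    adj = {}
    for u, v in cap:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:                            # augment along BFS paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in adj.get(u, []):
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return flow
        w = t
        while parent[w] is not None:
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1

def is_independent(n, arcs, S, X):
    """X is independent in the gammoid iff |X| disjoint S-X paths exist."""
    return max_disjoint_paths(n, arcs, S, X) == len(set(X))

arcs = [(0, 2), (1, 2), (2, 3), (2, 4)]    # every S-T path passes through vertex 2
```

In this example with S = {0, 1}, the set {3, 4} has rank 1, matching the (S, X)-min cut {2}; a vertex in both S and X contributes a path of length zero.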

Gammoids are representable over any sufficiently
large field [6], although only randomized
procedures for computing a representation are
known. An explicit randomized procedure was
given in [2], computing a representation of the
gammoid (G, S, T) in space (essentially) cubic in
|S| + |T|. Hence, gammoids imply a polynomial-sized
representation of terminal cut systems, as
defined in the introduction. This has implications
in kernelization [2], though it is not on its own the
most useful form, since it is not a representation
in terms of graphs.


Representative Sets 

Let M = (V, I) be a matroid, and X and Y
independent sets in M. We say that Y extends X
if X ∪ Y is independent and X ∩ Y = ∅. The
representative sets lemma states the following.


Lemma 1 ([4,5]) Let M = (V, I) be a linearly
represented matroid of rank r + s, and let S =
{S_1, ..., S_m} be a collection of independent sets,
each of size s. In polynomial time, we can
compute a set S* ⊆ S such that |S*| ≤ \binom{r+s}{s} and
for any independent set X, there is a set S ∈ S
that extends X if and only if there is a set S' ∈ S*
that extends X.


We refer to S* as a representative set for S
in M. This result is due to Lovász [4], made
algorithmic by Marx [5]; recently, Fomin et al. [1]
improved the running time and gave algorithmic
applications of the result. The power of the
lemma is extended by several tools which
construct new linearly represented matroids from
existing ones; see Marx [5]. For a particularly
useful case, for each i ∈ [s] let M_i = (V_i, I_i) be
a matroid, where each set V_i is a new copy of an
original ground set V. Given a representation of
these matroids over the same field F, we can form
a represented matroid M = (V_1 ∪ ... ∪ V_s, I_1 ×
... × I_s) as a direct sum of these matroids, where
an independent set X in M is the union of an
independent set X_i in M_i for each i. For an
element v ∈ V, let v(i) denote the copy of v
in V_i. Then the set {v(1), ..., v(s)} extends X =
X_1 ∪ ... ∪ X_s if and only if {v(i)} extends X_i for
each i ∈ [s]. In other words, we have constructed
an AND operation for the notion of an
extending set.


Closest Sets and Gammoids

We need one last piece of terminology. For
a (possibly directed) graph G = (V, E) and
sets A, X ⊆ V, let R_G(A, X) be the set
of vertices reachable from A in G \ X. The
set X is closest to A if there is no set X' ≠ X such
that |X'| ≤ |X| and X' separates X from A,
i.e., X ∩ R_G(A, X') = ∅. This is equivalent to X
being the unique minimum (A, X)-vertex cut.
For every pair of sets A, B ⊆ V, there is a unique
minimum (A, B)-vertex cut closest to A, which
can be computed in polynomial time. Finally, for
sets S and X, the set X pushed towards S is the
unique minimum (S, X)-vertex cut closest to S;
this operation is well defined and has no effect
if X is already closest to S. The following is
central to our applications.


Lemma 2 Let M be a gammoid defined from
a graph G = (V, E) and source set S. Let X
be independent in M, and let X' be X pushed
towards S. For any v ∈ V, the set {v} extends X
if and only if v ∈ R_G(S, X').
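
The reachability set R_G(A, X) and the closest-set condition that Lemma 2 relies on can be explored on small graphs with the sketch below. It tests closeness in the form "X is the unique minimum (A, X)-vertex cut" by exhaustive search, which is exponential and intended only as an illustration; the arc-list encoding and function names are assumptions.

```python
from itertools import combinations

def reachable(arcs, A, X):
    """R_G(A, X): vertices reachable from A in G with the vertices of X deleted."""
    X = set(X)
    seen = set(A) - X
    stack = list(seen)
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b not in X and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def is_closest(vertices, arcs, A, X):
    """True if no X' != X with |X'| <= |X| separates X from A (exhaustive search)."""
    X = frozenset(X)
    for size in range(len(X) + 1):
        for cand in combinations(sorted(vertices), size):
            if frozenset(cand) != X and not (X & reachable(arcs, A, cand)):
                return False
    return True

arcs = [("a", "b"), ("b", "c")]   # directed path a -> b -> c
V = {"a", "b", "c"}
```

On the path a -> b -> c with A = {a}, the cut {b} is not closest to A, because the terminal a is itself deletable and {a} separates b from A; pushing {b} or {c} towards A yields {a}.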


Key Results 


The most powerful version of the terminal cut 
system result is the following. 


Theorem 1 Let G = (V, E) be a (possibly
directed) graph, and X ⊆ V a set of vertices.
In randomized polynomial time, we can find a
set Z ⊆ V of |Z| = O(|X|^3) vertices such
that for every partition X = A ∪ B ∪ C ∪ D, the
set Z contains a minimum (A, B)-vertex cut in
the graph G \ D.


There is also a variant for cutting into more 
than two parts, as follows. 


Theorem 2 Let G = (V, E) be an undirected
graph, and X ⊆ V a set of vertices. In
randomized polynomial time, we can find a set Z ⊆ V
of |Z| = O(|X|^{s+1}) vertices such that for every
partition of X into at most s parts, the set Z
contains a minimum solution to the corresponding
multiway cut problem.


We also have the following further kerneliza- 
tion results; see [3] for problem statements. 


Theorem 3 The following problems admit
randomized polynomial kernels parameterized by the
solution size: ALMOST 2-SAT, VERTEX MULTICUT
with a constant number of cut requests, and
GROUP FEEDBACK VERTEX SET with a
constant-sized group.


Applications 


We now review the strategy behind kernelization 
usage of the representative sets lemma. 


Representative Sets: Direct Usage 

There have been various types of applications of 
the representative sets lemma in kernelization, 
from the more direct to the more subtle. We 
briefly review one more direct and one indirect. 
The most direct one is for reducing constraint 
systems. We illustrate with the DIGRAPH PAIR
CUT problem (which is closely related to a central
problem in kernelization [3]). Let G = (V, E)
be a digraph, with a source vertex s ∈ V, and
let P ⊆ V^2 be a set of pairs. The task is to find
a set X of at most k vertices (with s ∉ X) such
that R_G(s, X) does not contain any complete pair
from P. We show that it suffices to keep O(k^2)
of the pairs P. For this, replace s by a set S
of k + 1 copies of s, and let M be the gammoid
of (G, S, V). By Lemma 2, if X is closest to S
and |X| ≤ k, then {u, v} ⊆ R_G(s, X) if and only
if both {u} and {v} extend X in M. Hence, using
the direct sum construction, we can construct
a representative set P* ⊆ P with |P*| =
O(k^2) such that for any set X closest to S,
the set R_G(s, X) contains a pair {u, v} ∈ P
if and only if it contains a pair {u', v'} ∈ P*.
Furthermore, for an arbitrary set X, pushing X
towards S yields a set X' that can only be an
improvement on X (i.e., the set of pairs in R_G(s, X)
shrinks); hence for any set X with |X| ≤ k,
either pushing X towards S yields a solution to
the problem, or there is a pair in P* witnessing
that X is not a solution. Thus, the set P* may be
used to replace P, taking the first step towards a
kernel for the problem.


Indirect Usage

For more advanced applications, we “force” the 
lemma to reveal some set Z of special vertices 
in G, as follows. Let M be a linearly represented 
matroid, and let S = {S(v) : v ∈ V} be
a collection of subsets of M of bounded size.
Assume that we have shown that for every z ∈
Z, there is a carefully chosen set X(z), such
that S(v) extends X(z) if and only if v =
z. Then, necessarily, the representative set S*
for S must contain S(z) for every z ∈ Z,
by letting X = X(z) in the statement of the
lemma. Furthermore, we do not need to provide
the set X(z) ahead of time, since the (possibly
non-constructive) existence of such a set X(z)
is sufficient to force S(z) ∈ S*. Hence, the
set V* = {v ∈ V : S(v) ∈ S*} must contain Z,
among a polynomially bounded number of other
vertices. The critical challenge, of course, is to
construct the matroid M and sets S(v) and X(z)
such that S(z) indeed extends X(z), while S(v)
fails to extend X(z) for every v ≠ z.

We illustrate the application to reducing
terminal cut systems. Let G = (V, E) be an
undirected graph (the directed construction is similar),
with S, T ⊆ V, and define a set of vertices Z
where z ∈ Z if and only if there are sets A ⊆
S, B ⊆ T such that every minimum (A, B)-vertex
cut contains z. We wish to learn Z. Let a
sink-only copy of a vertex v ∈ V be a copy v'
of v with all edges oriented towards v'. Then
the following follows from Lemma 2 and the
definition of closest sets.


Lemma 3 Let A, B ⊆ V, and let X be a
minimum (A, B)-vertex cut. Then a vertex v ∈
V is a member of every minimum (A, B)-vertex
cut if and only if {v′} extends X in both the
gammoid (G, A, V) and the gammoid (G, B, V).


Via a minor modification, we can replace
the former gammoid by the gammoid (G, S, V)
and the latter by (G, T, V) (for appropriate
adjustments to the set X); we can then compute
a set V* of O(|S|·|T|·k) vertices (where k is
the size of an (S, T)-min cut) which contains Z.
From this, we may compute the sought-after
smaller graph G′, by iteratively bypassing
a single vertex v ∈ V \ (S ∪ T ∪ V*) and
recomputing V*, until V* ∪ S ∪ T = V;
observe that bypassing v does not change the
size of any (A, B)-min cut. Theorem 1 follows
by considering a modification of the graph G,
and Theorem 2 follows by a generalization of the
above, pushing into s different directions.
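
The bypass operation is not spelled out above. A minimal sketch, assuming the usual reading for undirected graphs (delete v and make its former neighbours pairwise adjacent), is given below; the function name and the adjacency-dictionary representation are illustrative choices, not taken from [3].

```python
def bypass(adj, v):
    """Return a copy of an undirected graph with v bypassed.

    adj maps each vertex to the set of its neighbours. Assumption: bypassing v
    means removing v and turning its neighbourhood N(v) into a clique, so every
    path that used to run through v can use a shortcut edge instead.
    """
    nbrs = adj[v]
    new = {u: set(ns) - {v} for u, ns in adj.items() if u != v}
    for u in nbrs:
        new[u] |= nbrs - {u}
    return new


# A path a - b - c: bypassing b leaves the single edge a - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
shortcut = bypass(path, "b")
```

The input graph is left unmodified, which makes it easy to recompute V* on the new graph after each bypass.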


Further Applications 
A polynomial kernel for MULTIWAY CUT (in the 
variants with only s terminals or with deletable 
terminals) essentially follows from the above, but 
the further kernelization applications in Theo- 
rem 3 require a few more steps. However, they 
follow a common pattern: First, we find an ap- 
proximate solution X of size poly(k) to “boot- 
strap” the process; second, we use X to transform 
the problem into a more manageable form (e.g., 
for ALMOST 2-SAT, this manageable form is DI- 
GRAPH PAIR CUT); and lastly, we use the above 
methods to kernelize the resulting problem. This 
pattern covers the problems listed in Theorem 3. 
Finally, the above results have some 
implications beyond kernelization. In particular, 
the existence of the smaller graph G’ computed 
for terminal cut systems, and correspondingly an 
implementation of a gammoid as a graph with
poly(|S| + |T|) vertices, was an open problem,
solved in [3].


Cross-References 


Kernelization, Exponential Lower Bounds 
Kernelization, Polynomial Lower Bounds 
Matroids in Parameterized Complexity and Ex- 
act Algorithms 


Recommended Reading 


1. Fomin FV, Lokshtanov D, Saurabh S (2014) Efficient 
computation of representative sets with applications 
in parameterized and exact algorithms. In: SODA, 
Portland, pp 142-151 

2. Kratsch S, Wahlström M (2012) Compression via matroids:
a randomized polynomial kernel for odd cycle
transversal. In: SODA, Kyoto, pp 94-103

3. Kratsch S, Wahlström M (2012) Representative sets
and irrelevant vertices: new tools for kernelization. In:
FOCS, New Brunswick, pp 450-459




4. Lovász L (1977) Flats in matroids and geometric
graphs. In: Proceedings of the sixth British combinatorial
conference, combinatorial surveys, Egham, pp 45-86

5. Marx D (2009) A parameterized view on matroid opti- 
mization problems. Theor Comput Sci 410(44):4471- 
4479 

6. Oxley J (2006) Matroid theory. Oxford graduate texts 
in mathematics. Oxford University Press, Oxford 

7. Perfect H (1968) Applications of Menger’s graph the- 
orem. J Math Anal Appl 22:96-111 

8. Schrijver A (2003) Combinatorial optimization: poly- 
hedra and efficiency. Algorithms and combinatorics. 
Springer, Berlin/New York 


Kernelization, Max-Cut Above Tight 
Bounds 


Mark Jones 
Department of Computer Science, Royal 
Holloway, University of London, Egham, UK 


Keywords 


Kernel; Lambda extendible; Max Cut; Parame- 
terization above tight bound 


Years and Authors of Summarized 
Original Work 


2012; Crowston, Jones, Mnich 


Problem Definition 


In the problem MAX CUT, we are given a graph 
G with n vertices and m edges, and asked to 
find a bipartite subgraph of G with the maximum 
number of edges. 

In 1973, Edwards [5] proved that if G is
connected, then G contains a bipartite subgraph
with at least $\frac{m}{2} + \frac{n-1}{4}$ edges, proving a
conjecture of Erdős. This lower bound on the size of
a bipartite subgraph is known as the Edwards-Erdős
bound. The bound is tight: for example,
it is an upper bound when G is a clique with an odd
number of vertices. Thus, it is natural to consider
parameterized MAX CUT above this bound, as
follows (AEE stands for Above Edwards-Erdős).


MAX CUT AEE


Instance: A connected graph G with n vertices
and m edges, and a nonnegative integer k.

Parameter: k.

Question: Does G have a bipartite subgraph
with at least $\frac{m}{2} + \frac{n-1}{4} + k$ edges?
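
To make the bound and the parameter concrete, the following sketch evaluates the Edwards-Erdős bound exactly and compares it with a brute-force maximum cut. It is only an illustration for tiny graphs (the brute force is exponential in n); the function names and the edge-list representation on vertices 0..n-1 are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations


def edwards_erdos_bound(n, m):
    """Guaranteed bipartite-subgraph size m/2 + (n-1)/4 for a connected graph."""
    return Fraction(m, 2) + Fraction(n - 1, 4)


def max_cut(n, edges):
    """Brute-force maximum cut: try every 2-colouring of vertices 0..n-1."""
    best = 0
    for mask in range(1 << n):
        cut = sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        best = max(best, cut)
    return best


def is_yes_instance(n, edges, k):
    """Decide MAX CUT AEE by exhaustive search (illustration only)."""
    return max_cut(n, edges) >= edwards_erdos_bound(n, len(edges)) + k


k5 = list(combinations(range(5), 2))      # clique on 5 vertices: the bound is tight
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]     # 4-cycle: bipartite, excess 5/4
```

For the clique on five vertices the maximum cut has 6 edges, exactly the bound 10/2 + 4/4, so the instance is a YES-instance only for k = 0.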


Mahajan and Raman [6], in their first pa- 
per on above-guarantee parameterizations, asked 
whether this problem is fixed-parameter tractable. 
As such, the problem was one of the first open 
problems in above-guarantee parameterizations. 


λ-Extendibility and the Poljak-Turzík Bound

In 1982, Poljak and Turzík [8] investigated extending
the Edwards-Erdős bound to cases when
the desired subgraph is something other than
bipartite. To this end, they introduced the notion
of λ-extendibility, which generalizes the notion
of “bipartiteness.” We will define the slightly
stronger notion of strong λ-extendibility, introduced
in [7], as later results use this stronger
notion.

Recall that a block of a graph G is a maximal 
2-connected subgraph of G. The blocks of a 
graph form a partition of its edges, and a vertex 
that appears in two or more blocks is a cut-vertex 
of the graph. 


Definition 1 For a family of graphs Π and 0 <
λ < 1, we say Π is strongly λ-extendible if the
following conditions are satisfied:


1. If G is connected and |G| = 1 or 2, then G ∈ Π.

2. G is in Π if and only if each of its blocks is in Π.

3. For any real-valued positive weight function w
on the edges of G, if X ⊆ V(G) is such that
G[X] is connected and G[X], G − X ∈ Π,
then G has a subgraph H ∈ Π that uses all
the edges of G[X], all the edges of G − X, and
at least a fraction λ (by weight) of the edges
between X and V(G) \ X.




The definition of λ-extendibility given in [8]
is the same as the above, except that the third
condition is only required when |X| = 2. Clearly
strong λ-extendibility implies λ-extendibility;
it is an open question whether the converse
holds.

The property of being bipartite is strongly
λ-extendible for λ = 1/2. Other strongly
λ-extendible properties include being acyclic for
directed graphs (λ = 1/2) and being r-colorable
(λ = 1 − 1/r).

Poljak and Turzík [8] extended Edwards’
result by showing that for any connected
graph G with n vertices and m edges, and
any λ-extendible property Π, G contains a
subgraph in Π with at least $\lambda m + \frac{1-\lambda}{2}(n-1)$
edges.

Thus, for any λ-extendible property Π, we can
consider the following variation of MAX CUT
AEE (APT stands for Above Poljak-Turzík).


Π-SUBGRAPH APT


Instance: A connected graph G with n vertices
and m edges, and a nonnegative integer k.

Parameter: k.

Question: Does G have a subgraph in Π with
at least $\lambda m + \frac{1-\lambda}{2}(n-1) + k$ edges?


Key Results 


We sketch a proof of the polynomial kernel result
for MAX CUT AEE, first shown in [2] (although
the method described here is slightly different from
that in [2]).

For a connected graph G with n vertices and
m edges, let β(G) denote the maximum number
of edges of a bipartite subgraph of G, let
$\gamma(G) = \frac{m}{2} + \frac{n-1}{4}$, and let ε(G) = β(G) − γ(G). Thus,
for an instance (G,k), our aim is to determine
whether ε(G) ≥ k.

Now consider a connected graph G with a set
X of three vertices such that G′ = G − X is
connected and G[X] ≅ P_3, the path with two
edges. Note that G[X] is bipartite. Let H′ be a
bipartite subgraph of G′ with β(G′) edges. As
bipartiteness is a 1/2-extendible property, we can create
a bipartite subgraph H of G using the edges
of H′, the edges of G[X], and at least half of
the edges between X and G − X. It follows
that $\beta(G) \ge \beta(G') + \frac{|E(X, V(G) \setminus X)|}{2} + 2$. As
$\gamma(G) = \gamma(G') + \frac{|E(X, V(G) \setminus X)| + 2}{2} + \frac{3}{4}$, we have
that $\varepsilon(G) = \beta(G) - \gamma(G) \ge \beta(G') - \gamma(G') + 2 - \frac{2}{2} - \frac{3}{4} = \varepsilon(G') + \frac{1}{4}$.

Consider a reduction rule in which, if there
exists a set X as described above, we delete X
from the graph. If we were able to apply such a
reduction rule 4k times on a graph G, we would
end up with a reduced graph G′ such that G′ is
connected and $\varepsilon(G) \ge \varepsilon(G') + \frac{4k}{4} \ge 0 + k$, and
therefore we would know that (G,k) is a YES-
instance. Of course there may be many graphs for
which such a set X cannot be found. However, we
can adapt this idea as follows. Given a connected
graph G, we recursively calculate a set of vertices
S(G) and a rational number t(G) as follows:


• If G is a clique or G is empty, then set S(G) = ∅
  and t(G) = 0.
• If G contains a set X such that |X| = 3, G′ =
  G − X is connected and G[X] ≅ P_3, then set
  S(G) = S(G′) ∪ X and set t(G) = t(G′) + 1/4.
• If G contains a cut-vertex v, then there exist
  non-empty sets of vertices X, Y such that X ∩
  Y = {v}, G[X] and G[Y] are connected, and
  all edges of G are in G[X] or G[Y]. Then set
  S(G) = S(G[X]) ∪ S(G[Y]) and set t(G) =
  t(G[X]) + t(G[Y]).

It can be shown that for a connected graph
G, one of these cases will always hold, and so
S(G) and t(G) are well defined. In the first case,
we have that ε(G) ≥ 0 by the Edwards-Erdős
bound. In the second case, we have already shown
that ε(G) ≥ ε(G′) + 1/4. In the third case, we
have that ε(G) = ε(G[X]) + ε(G[Y]) (note that
the union of a bipartite subgraph of G[X] and a
bipartite subgraph of G[Y] is a bipartite subgraph
of G). It follows that ε(G) ≥ t(G). Note also that
|S(G)| ≤ 12t(G). If we remove S(G) from G,
the resulting graph can be built by joining disjoint
graphs at a single vertex, using only cliques as the
initial graphs. Thus, G − S(G) has the property
that each of its blocks is a clique. We call such a
graph a forest of cliques.
We therefore get the following lemma.


Lemma 1 ([2]) Given a connected graph G with
n vertices and m edges, and an integer k, we can
in polynomial time either decide that (G,k) is a
YES-instance of MAX CUT AEE, or find a set S
of at most 12k vertices such that G − S is a forest
of cliques.
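
The recursive definition of S(G) and t(G) can be sketched directly in code. The version below is a naive illustration rather than the procedure of [2]: it tries the three cases in the order listed, searches all vertex triples, and (as an extra assumption of this sketch) only removes a triple X when G − X is non-empty. Graphs are dictionaries mapping each vertex to its neighbour set; all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def induced(adj, keep):
    """Subgraph of adj induced by the vertex set keep."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}


def components(adj):
    """Connected components of adj, each returned as a set of vertices."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def is_clique(adj):
    return all(len(adj[v]) == len(adj) - 1 for v in adj)


def s_and_t(adj):
    """Return (S(G), t(G)) for a connected graph G."""
    if is_clique(adj):                       # covers the empty graph and K_1 too
        return set(), Fraction(0)
    for X in combinations(sorted(adj), 3):   # case 2: induced P_3 with G - X connected
        rest = induced(adj, set(adj) - set(X))
        edges_in_X = sum(1 for u, v in combinations(X, 2) if v in adj[u])
        if rest and edges_in_X == 2 and len(components(rest)) == 1:
            S, t = s_and_t(rest)
            return S | set(X), t + Fraction(1, 4)
    for v in sorted(adj):                    # case 3: split at a cut-vertex
        comps = components(induced(adj, set(adj) - {v}))
        if len(comps) > 1:
            SX, tX = s_and_t(induced(adj, comps[0] | {v}))
            SY, tY = s_and_t(induced(adj, set(adj) - comps[0]))
            return SX | SY, tX + tY
    raise ValueError("no case applies; is the graph connected?")


c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # 4-cycle
p3 = {0: {1}, 1: {0, 2}, 2: {1}}                    # path with two edges
```

On the 4-cycle one triple is removed, giving |S(G)| = 3 and t(G) = 1/4; on the path P_3 the middle vertex is a cut-vertex and both sides are cliques, giving S(G) = ∅ and t(G) = 0. To use this in the spirit of Lemma 1, one would answer YES as soon as t(G) ≥ k and otherwise return S(G).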


By guessing a partition of S and then using 
a dynamic programming algorithm based on the 
structure of G — S, we get a fixed-parameter 
algorithm. 


Theorem 1 ([2]) MAX CUT AEE can be solved
in time $2^{O(k)} \cdot n^4$.


Using the structure of G − S and the fact that
|S| ≤ 12k, it is possible (using reduction rules)
to show first that the number of blocks in G − S
must be bounded for any NO-instance, and then
that the size of each block must be bounded (see
[2]).

Thus, we get a polynomial kernel for MAX
CUT AEE.


Theorem 2 ([2]) MAX CUT AEE admits a kernel
with $O(k^5)$ vertices.


Crowston et al. [3] were later able to improve
this to a kernel with $O(k^3)$ vertices.


Extensions to Π-SUBGRAPH APT

A similar approach can be used to show polynomial
kernels for Π-SUBGRAPH APT, for other
1/2-extendible properties. In particular, the property
of being an acyclic directed graph is 1/2-
extendible, and therefore every directed graph
with n vertices and m arcs has an acyclic subgraph
with at least $\frac{m}{2} + \frac{n-1}{4}$ arcs. The problem of
deciding whether there exists an acyclic subgraph
with at least $\frac{m}{2} + \frac{n-1}{4} + k$ arcs is fixed-parameter
tractable, and has an $O(k^2)$-vertex kernel [1].

The notion of a bipartite graph can be generalized
in the following way. Consider a graph G
with edges labeled either + or −. Then we say
G is balanced if there exists a partition V_1, V_2
of the vertices of G, such that all edges between
V_1 and V_2 are labeled − and all other edges are
labeled +. (Note that if all edges of a graph are
labeled −, then it is balanced if and only if it is
bipartite.) The property of being a balanced graph
is 1/2-extendible, just as the property of being
bipartite is. Therefore a graph with n vertices and
m edges, and all edges labeled + or −, will
have a balanced subgraph with at least $\frac{m}{2} + \frac{n-1}{4}$
edges. The problem of deciding whether there
exists a balanced subgraph with at least
$\frac{m}{2} + \frac{n-1}{4} + k$ edges is fixed-parameter tractable and
has an $O(k^3)$-vertex kernel [3].

Mnich et al. [7] showed that Lemma 1
applies not just for MAX CUT AEE, but for
Π-SUBGRAPH APT for any Π which is strongly
λ-extendible for some λ (with the bound 12k
replaced with $\frac{6k}{1-\lambda}$). Thus, Π-SUBGRAPH APT
is fixed-parameter tractable as long as it is
fixed-parameter tractable on graphs which are
close to being a forest of cliques. Using this
observation, Mnich et al. showed fixed-parameter
tractability for a number of versions of
Π-SUBGRAPH APT, including when Π is the family
of acyclic directed graphs and when Π is the set
of r-colorable graphs.

Crowston et al. [4] proved the existence of
polynomial kernels for a wide range of strongly
λ-extendible properties:


Theorem 3 ([4]) Let 0 < λ < 1, and let Π
be a strongly λ-extendible property of (possibly
oriented and/or labeled) graphs. Then
Π-SUBGRAPH APT has a kernel on $O(k^2)$ vertices
if Condition 1 or 2 holds, and a kernel on $O(k^3)$
vertices if only Condition 3 holds:


1. λ ≠ 1/2.

2. All orientations and labels (if applicable) of
the graph K_3 belong to Π.

3. Π is a hereditary property of simple or oriented
graphs.


Open Problems 


The Poljak-Turzík bound extends to edge-
weighted graphs. The weighted version is as
follows: for a graph G with nonnegative real
weights on the edges, and a λ-extendible family
of graphs Π, there exists a subgraph H of G
such that H ∈ Π and H has total weight at least
$\lambda \cdot w(G) + \frac{1-\lambda}{2} \cdot MST(G)$, where w(G) is the
total weight of G and MST(G) is the minimum
weight of a spanning tree in G.

Thus, we can consider the weighted versions
of MAX CUT AEE and Π-SUBGRAPH APT. It
is known that a weighted equivalent of Lemma 1
holds (in which all edges in a block of G − S have
the same weight), and as a result, the integer-
weighted version of MAX CUT AEE can be
shown to be fixed-parameter tractable. However,
nothing is known about kernelization results for
these problems. In particular, it remains an open
question whether the integer-weighted version of
MAX CUT AEE has a polynomial kernel.


Problem Definition


The problem MAXLIN2 can be stated as follows.
We are given a system of m equations
in variables $x_1, \dots, x_n$, where each equation is
$\prod_{i \in I_j} x_i = b_j$, for some $I_j \subseteq \{1, 2, \dots, n\}$
and $x_i, b_j \in \{-1, 1\}$ and j = 1,...,m. Each
equation is assigned a positive integral weight
$w_j$. We are required to find an assignment of
values to the variables in order to maximize the
total weight of the satisfied equations. MAXLIN2
is a well-studied problem, which according to
Håstad [8] “is as basic as satisfiability.”


Kernelization, MaxLin Above Average 


Note that one can think of MAXLIN2 as containing
equations $\sum_{i \in I_j} y_i = a_j$ over $\mathbb{F}_2$. This
is equivalent to the previous definition by letting
$y_i = 0$ if and only if $x_i = 1$ and letting $y_i = 1$
if and only if $x_i = -1$ (and $a_j = 1$ if and only
if $b_j = -1$ and $a_j = 0$ if and only if $b_j = 1$).
We will however use the original definition as this
was the formulation used in [1].

Let W be the sum of the weights of all equations
in an instance, S, of MAXLIN2 and let
sat(S) be the maximum total weight of equations
that can be satisfied simultaneously. To see that
W/2 is a tight lower bound on sat(S), choose
assignments to the variables independently and
uniformly at random. Then W/2 is the expected
weight of satisfied equations (as the probability
of each equation being satisfied is 1/2) and
thus W/2 is a lower bound. It is not difficult
to see that this bound is tight. For example,
consider a system consisting of pairs of equations
of the form $\prod_{i \in I} x_i = -1$, $\prod_{i \in I} x_i = 1$ of
the same weight, for some nonempty sets
$I \subseteq \{1, 2, \dots, n\}$.


As MAXLIN2 is an NP-hard problem, we
look for parameterized algorithms. We will
give the basic definitions of fixed-parameter
tractability (FPT) here and refer the reader to
[4, 5] for more information. A parameterized
problem is a subset L ⊆ Σ* × ℕ over a finite
alphabet Σ. L is fixed-parameter tractable (FPT,
for short) if membership of an instance (x, k) ∈
Σ* × ℕ in L can be decided in time $f(k)|x|^{O(1)}$,
where f is a function of the parameter k
only.

If we set the parameter, k, of an instance, S,
of MAXLIN2 to sat(S), then it is easy to see
that there exists an $O(f(k)|S|^{c})$ algorithm, due
to the fact that k = sat(S) ≥ W/2 ≥ |S|/2.
Therefore, this parameter is not of interest (it
is never small in practice), and a better parameter
would be k, where we want to decide if
sat(S) ≥ W/2 + k. Parameterizing above tight
lower bounds in this way was first introduced
in 1997 in [11]. This leads us to define the
following problem, where AA stands for Above
Average.


MAXLIN2-AA


Instance: A system S of equations
$\prod_{i \in I_j} x_i = b_j$, where $x_i, b_j \in \{-1, 1\}$,
j = 1,...,m and where each equation is
assigned a positive integral weight $w_j$, and
a nonnegative integer k.


Question: sat(S) ≥ W/2 + k?


The above problem has also been widely stud- 
ied when the number of variables in each equa- 
tion is bounded by some constant, say r, which 
leads to the following problem. 


MAX-r-LIN2-AA


Instance: A system S of equations
$\prod_{i \in I_j} x_i = b_j$, where $x_i, b_j \in \{-1, 1\}$,
$|I_j| \le r$, j = 1,...,m; each equation is
assigned a positive integral weight $w_j$, and
a nonnegative integer k.
Question: sat(S) ≥ W/2 + k?


Given a parameterized problem, Π, a kernel
of Π is a polynomial-time algorithm that maps an
instance (I, k) of Π to another instance, (I′, k′),
of Π such that (i) (I, k) ∈ Π if and only if
(I′, k′) ∈ Π, (ii) k′ ≤ f(k), and (iii) |I′| ≤
g(k) for some functions f and g. The function
g(k) is called the size of the kernel. It is well
known that a problem is FPT if and only if it has
a kernel.

A kernel is called a polynomial kernel if both
f(k) and g(k) are polynomials in k. A great deal
of research has been devoted to finding small-
sized kernels and in particular to deciding whether a
problem has a polynomial kernel.

We will show that both the problems stated
above are FPT and in fact admit kernels with a
polynomial number of variables. The number of
equations may be non-polynomial, so these kernels
are not real polynomial kernels. The above
problems were investigated in a number of papers;
see [1-3, 6].


Key Results


Below we outline the key results for both
MAXLIN2-AA and MAX-r-LIN2-AA. See [1] for
all the details not given here.


MaxLin2-AA 

Recall that MAXLIN2-AA considers a system S
of equations $\prod_{i \in I_j} x_i = b_j$, where $x_i, b_j \in
\{-1, 1\}$, j = 1,...,m and where each equation
is assigned a positive integral weight $w_j$. Let $\mathcal{F}$
denote the m different sets $I_j$ in the equations
of S and let $b_{I_j} = b_j$ and $w_{I_j} = w_j$ for each
j = 1,2,...,m.

Let $\varepsilon(x) = \sum_{I \in \mathcal{F}} w_I b_I \prod_{i \in I} x_i$ and note
that ε(x) is the difference between the total
weight of satisfied and falsified equations.
Crowston et al. [3] call ε(x) the excess and the
maximum possible value of ε(x) the maximum
excess.


Remark 1 Observe that the answer to MAXLIN2-
AA and MAX-r-LIN2-AA is YES if and only if
the maximum excess is at least 2k.
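
Remark 1 rests on the identity ε(x) = 2·(weight satisfied by x) − W, so sat(S) ≥ W/2 + k exactly when the maximum excess is at least 2k. The sketch below checks this by exhaustive search on small systems; the representation of an equation as a triple (index tuple, b, w) and the function names are assumptions of this example.

```python
from itertools import product


def excess(system, x):
    """Excess of assignment x: satisfied weight minus falsified weight."""
    total = 0
    for index_set, b, w in system:
        value = b
        for i in index_set:
            value *= x[i]          # ends as +1 if satisfied, -1 if falsified
        total += w * value
    return total


def satisfied_weight(system, x):
    """Weight satisfied by x, recovered from the excess via (W + excess) / 2."""
    W = sum(w for _, _, w in system)
    return (W + excess(system, x)) // 2


def best_assignment(system):
    """Brute-force assignment of maximum excess (exponential in n)."""
    variables = sorted({i for index_set, _, _ in system for i in index_set})
    candidates = (dict(zip(variables, values))
                  for values in product((1, -1), repeat=len(variables)))
    return max(candidates, key=lambda x: excess(system, x))


# x1 = 1 (weight 2), x1*x2 = -1 (weight 1), x2 = 1 (weight 1); W = 4.
system = [((1,), 1, 2), ((1, 2), -1, 1), ((2,), 1, 1)]
# A pair of contradictory equations of equal weight: the bound W/2 is tight.
tight = [((1, 2), -1, 3), ((1, 2), 1, 3)]
```

For the first system the maximum excess is 2 and sat(S) = 3 = W/2 + 1, so it is a YES-instance for k = 1 but not for k = 2; for the contradictory pair every assignment has excess 0.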


Let A be the matrix over $\mathbb{F}_2$ corresponding to
the set of equations in S, such that $a_{ji} = 1$ if
$i \in I_j$ and 0, otherwise. Consider the following
two reduction rules, where Rule 1 was introduced
in [9] and Rule 2 in [6].


Reduction Rule 1 ([9]) If we have, for a subset
I of {1,2,...,n}, an equation $\prod_{i \in I} x_i = b'_I$
with weight $w'_I$, and an equation $\prod_{i \in I} x_i = b''_I$
with weight $w''_I$, then we replace this pair by
one of these equations with weight $w'_I + w''_I$ if
$b'_I = b''_I$ and, otherwise, by the equation whose
weight is bigger, modifying its new weight to be
the difference of the two old ones. If the resulting
weight is 0, we delete the equation from the
system.
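
Rule 1 becomes a one-pass merge if each equation is stored as a signed weight b·w keyed by its index set: equal right-hand sides add up, opposite ones cancel, and a zero total means the equation disappears. The sketch below illustrates this; the triple format (index set, b, w) and the function name are assumptions of the example.

```python
def apply_rule_1(equations):
    """Merge equations with identical index sets, as in Reduction Rule 1.

    equations: iterable of (index_set, b, w) with b in {-1, 1} and w > 0.
    Returns a list of (frozenset, b, w) with pairwise distinct index sets.
    """
    signed = {}
    for index_set, b, w in equations:
        key = frozenset(index_set)
        signed[key] = signed.get(key, 0) + b * w
    return [(key, 1 if c > 0 else -1, abs(c))
            for key, c in signed.items() if c != 0]


reduced = apply_rule_1([
    ({1, 2}, 1, 3), ({1, 2}, 1, 2),    # same right-hand side: weights add to 5
    ({3}, -1, 4), ({3}, 1, 1),         # opposite sides: heavier one wins, weight 3
    ({4}, 1, 2), ({4}, -1, 2),         # equal weights cancel: equation deleted
])
```

Because merging works on signed sums, the order in which duplicate equations are processed does not matter.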


Reduction Rule 2 ([6]) Let t = rank A and
suppose columns $a^{i_1}, \dots, a^{i_t}$ of A are linearly
independent. Then delete all variables not in
$\{x_{i_1}, \dots, x_{i_t}\}$ from the equations of S.


Lemma 1 ([6]) Let S′ be obtained from S by
Rule 1 or 2. Then the maximum excess of S′ is
equal to the maximum excess of S. Moreover, S′
can be obtained from S in time polynomial in n
and m.
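
One possible way to carry out Rule 2 is Gaussian elimination over F_2 on the columns of A. The sketch below stores each column as an integer bitmask over the equations and greedily keeps a column whenever it is independent of those kept so far; this particular greedy choice and the function names are assumptions of the example, since the rule allows any maximal independent set of columns.

```python
def independent_columns(index_sets, n):
    """Indices of a maximal set of linearly independent columns of A over F_2.

    index_sets[j] is the set I_j of equation j; variables are numbered 1..n.
    """
    basis = {}      # leading bit -> reduced column (bitmask over equations)
    chosen = []
    for i in range(1, n + 1):
        col = sum(1 << j for j, I in enumerate(index_sets) if i in I)
        while col:
            lead = col.bit_length() - 1
            if lead not in basis:
                basis[lead] = col
                chosen.append(i)
                break
            col ^= basis[lead]      # eliminate the leading bit and continue
    return chosen


def apply_rule_2(index_sets, n):
    """Delete every variable outside the chosen independent columns."""
    keep = set(independent_columns(index_sets, n))
    return [set(I) & keep for I in index_sets]


# Equations on {1,2}, {2,3}, {1,3}: column 3 is the sum of columns 1 and 2.
reduced_sets = apply_rule_2([{1, 2}, {2, 3}, {1, 3}], 3)
```

Here the rank is 2, variable 3 is deleted, and the index sets become {1,2}, {2}, {1}. Rule 1 should be re-applied afterwards, since deleting variables can make index sets coincide.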


If we cannot change a weighted system S
using Rules 1 and 2, we call it irreducible. Let
S be an irreducible system of MAXLIN2-AA.
Consider the following algorithm introduced in
[3]. We assume that, in the beginning, no equation
or variable in S is marked.


ALGORITHM H.
While the system S is nonempty, do the following:


1. Choose an equation $\prod_{i \in I} x_i = b$ and mark
a variable $x_l$ such that $l \in I$.

2. Mark this equation and delete it from the
system.

3. Replace every equation $\prod_{i \in I'} x_i = b'$ in
the system containing $x_l$ by $\prod_{i \in I \Delta I'} x_i =
bb'$, where $I \Delta I'$ is the symmetric difference
of I and I′ (the weight of the equation
is unchanged).

4. Apply Reduction Rule 1 to the system.


The maximum H-excess of S is the maximum
possible total weight of equations marked by H
for S taken over all possible choices in Step 1 of
H. The following lemma indicates the potential
power of H.
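
A single run of Algorithm H, for one fixed policy of choices in Step 1, can be sketched as follows. Equations are stored as signed weights b·w keyed by their index sets (so Rule 1 is a sum per key); the `choose` callback, the rule of marking the smallest variable of the chosen equation, and the brute-force `max_excess` used for comparison are all assumptions of this illustration.

```python
from itertools import product
from math import prod


def run_algorithm_h(system, choose):
    """Total weight marked by one run of H.

    system: dict mapping frozenset index sets to nonzero signed weights b*w.
    choose: function picking the next index set (the choice made in Step 1).
    """
    system = dict(system)
    marked_weight = 0
    while system:
        I = choose(system)
        c = system.pop(I)                   # Step 2: mark and delete the equation
        b = 1 if c > 0 else -1
        marked_weight += abs(c)
        l = min(I)                          # Step 1: the marked variable x_l
        merged = {}
        for J, cj in system.items():
            if l in J:                      # Step 3: I' -> I delta I', b' -> b*b'
                J, cj = J ^ I, cj * b
            merged[J] = merged.get(J, 0) + cj   # Step 4: Rule 1 as a signed sum
        system = {J: cj for J, cj in merged.items() if cj != 0}
    return marked_weight


def max_excess(system):
    """Brute-force maximum excess, for checking small examples only."""
    variables = sorted(set().union(*system))
    return max(sum(c * prod(x[i] for i in I) for I, c in system.items())
               for values in product((1, -1), repeat=len(variables))
               for x in [dict(zip(variables, values))])


def heaviest(system):
    return max(system, key=lambda I: (abs(system[I]), sorted(I)))


# x1 = 1 (weight 2), x1*x2 = -1 (weight 1), x2 = 1 (weight 1)
example = {frozenset({1}): 2, frozenset({1, 2}): -1, frozenset({2}): 1}
marked = run_algorithm_h(example, heaviest)
```

With the heaviest-first policy the run marks the equation x1 = 1 and the two remaining equations cancel under Rule 1, so the marked weight is 2, which equals the maximum excess of this system. Any single run gives a lower bound on the maximum excess, because the marked equations can be satisfied simultaneously.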


Lemma 2 ([3]) Let S be an irreducible system.
Then the maximum excess of S equals its maximum
H-excess.


Theorem 1 ([1]) There exists an $O(n^{2k}
(nm)^{O(1)})$-time algorithm for MAXLIN2-AA[k]
that returns an assignment of excess of at least
2k if one exists, and returns NO otherwise.


In order to prove the above, the authors pick n
equations $e_1, \dots, e_n$ such that their rows in A are
linearly independent. An assignment of excess at
least 2k must either satisfy one of these equations
or falsify them all. If they are all falsified, then the
value of all variables is completely determined.
Thus, by Lemma 2, algorithm H can mark one of
these equations, implying a search tree of depth
at most 2k and width at most n. This implies the
desired time bound.

Theorem 2 below is proved using M-sum-free
sets, which are defined as follows (see [3]). Let
K and M be sets of vectors in $\mathbb{F}_2^n$ such that K ⊆
M. We say K is M-sum-free if no sum of two
or more distinct vectors in K is equal to a vector
in M.


Theorem 2 ([1]) Let S be an irreducible system
of MAXLIN2-AA[k] and let k ≥ 1. If $2k \le m \le
\min\{2^{n/(2k-1)} - 1, 2^n - 2\}$, then the maximum
excess of S is at least 2k. Moreover, we can find
an assignment with excess of at least 2k in time
$m^{O(1)}$.


Using the above, we can solve the problem
when $2k \le m \le 2^{n/(2k-1)} - 2$ and when $m \ge
n^{2k} - 1$ (using Theorem 1). The case when m <
2k immediately gives a kernel and the remaining
case when $2^{n/(2k-1)} - 2 < m \le n^{2k} - 2$ can
be shown to imply that $n \in O(k^2 \log k)$, thereby
giving us the main theorem and corollary of this
section.


Theorem 3 ([1]) The problem MAXLIN2-
AA[k] has a kernel with at most $O(k^2 \log k)$
variables.


Corollary 1 ([1]) The problem MAXLIN2-
AA[k] can be solved in time $2^{O(k \log k)} (nm)^{O(1)}$.


Max-r-Lin2-AA

In [6] it was proved that the problem MAX-
r-LIN2-AA admits a kernel with at most
$O(k^2)$ variables and equations (where r is
treated as a constant). The bound on the
number of variables was subsequently improved
by Crowston et al. [3] and Kim and
Williams [10]. The best known improvement is
by Crowston et al. [1].


Theorem 4 ([1]) The problem MAX-r-LIN2-
AA admits a kernel with at most (2k − 1)r
variables.


Both Theorem 4 and a slightly weaker analogous
result in [10] imply the
following:


Lemma 3 ([1,10]) There is an algorithm of
runtime $2^{O(k)} + m^{O(1)}$ for MAX-r-LIN2-AA.


Kim and Williams [10] proved that the last result
is best possible, in a sense, if the exponential
time hypothesis holds.


Theorem 5 ([10]) If MAX-3-LIN2-AA can be
solved in $O(2^{\epsilon k} 2^{\epsilon m})$ time for every ε > 0, then
3-SAT can be solved in $O(2^{\delta n})$ time for every
δ > 0, where n is the number of variables.


Open Problems 


The kernel for MAXLIN2-AA contains at most
$O(k^2 \log k)$ variables, but may contain an
exponential number of equations. It would be of
interest to decide if MAXLIN2-AA admits a kernel
that has at most a polynomial number of variables
and equations.


Problem Definition


In parameterized complexity, each instance (I, k)
of a problem comes with an additional parameter
k which describes structural properties of the
instance, for example, the maximum degree of an
input graph. A problem is called fixed-parameter
tractable if it can be solved in f(k)·poly(n) time,
that is, the super-polynomial part of the running
time depends only on k. Consequently, instances
of the problem can be solved efficiently if k is
small.

One way to show fixed-parameter tractability
of a problem is the design of a polynomial-time
data reduction algorithm that reduces any input
instance (I,k) to one whose size is bounded
in k. This idea is captured by the notion of
kernelization.


Definition 1 Let (I,k) be an instance of a
parameterized problem P, where I ∈ Σ* denotes
the input instance and k ∈ ℕ is a parameter.
Problem P admits a problem kernel if there
is a polynomial-time algorithm, called problem
kernelization, that computes an instance (I′, k′)
of the same problem P such that:


• (I,k) is a yes-instance if and only if (I′, k′) is
  a yes-instance, and
• |I′| + k′ ≤ g(k)


for a function g of k only.


Kernelization gives a performance guarantee
for the effectiveness of data reduction:
instances (I,k) with |I| > g(k) are provably
reduced to smaller instances. Thus, one aim in
the design of kernelization algorithms is to make
the function g as small as possible. In particular,
one wants to obtain kernelizations where g is a
polynomial function. These algorithms are called
polynomial problem kernelizations.

For many parameterized problems, however, 
the existence of such a polynomial problem ker- 
nelization is considered to be unlikely (under 
a standard complexity-theoretic assumption) [4]. 
Consequently, alternative models of parameter- 
ized data reduction, for example Turing kernel- 
ization, have been proposed. 

The concept of partial kernelization offers a
further approach to obtain provably useful data
reduction algorithms. Partial kernelizations do
not aim for a decrease of the instance size but
for a decrease of some part or dimension of the
instance. For example, if the problem input is a
binary matrix with m rows and n columns, the
instance size is O(n·m). A partial kernelization can
now aim for reducing one dimension of the input,
for example the number of rows m. Of course,
such a reduction is worthwhile only if we can
algorithmically exploit the fact that the number
of rows m is small. Hence, the aim is to reduce
a dimension of the problem for which there are
fixed-parameter algorithms. The dimension can
thus be viewed as a secondary parameter.
Altogether, this idea is formalized as follows.

Definition 2 Let (I,k) be an instance of a
parameterized problem P, where I ∈ Σ*
denotes the input instance and k is a parameter.
Let d : Σ* → ℕ be a computable function such
that P is fixed-parameter tractable with respect
to d(I). Problem P admits a partial problem kernel
if there is a polynomial-time algorithm, called
partial problem kernelization, that computes
an instance (I′,k′) of the same problem such
that:


• (I,k) is a yes-instance if and only if (I′, k′) is
  a yes-instance, and
• d(I′) + k′ ≤ g(k)


for a computable function g.


Any parameterized problem P which has a partial
kernel for some appropriate dimension d is
fixed-parameter tractable with respect to k: First,
one may reduce the original input instance (I, k)
to the partial kernel (I′, k′). In this partial kernel,
we have d(I′) ≤ g(k) and, since P can be solved
in f(d(I′)) · poly(n) time, it can thus be solved
in f(g(k)) · poly(n) time.

Using partial problem kernelization instead of 
classic problem kernelization can be motivated by 
the following two arguments. 

First, the function d in the partial problem
kernelization gives us a different goal in the design
of efficient data reduction rules. For instance, if
the main parameter determining the hardness of a
graph problem is the maximum degree, then an
algorithm that produces instances whose maximum
degree is O(k) but whose size is unbounded
might be more useful than an algorithm that
produces instances whose size is $O(k^4)$ but the
maximum degree is $\Omega(k^2)$.

Second, if the problem does not admit a
polynomial-size problem kernel, then it might
still admit a partially polynomial kernel, that is,
a partial kernel in which d(I′) + k′ ≤ poly(k).

We now give two examples for applications of 
partial kernelizations. 


Key Results 


The partial kernelization concept was initially 
developed to obtain data reduction algorithms 
for consensus problems, where one is given a 
collection of combinatorial objects and one is 
asked to find one object that represents this col- 
lection [2]. 

In the KEMENY SCORE problem, these objects
are permutations of a set U and the task is to
find a permutation that is close to these permutations
with respect to what is called Kendall’s
Tau distance, here denoted by τ. The formal
definition of the (unparameterized) problem is as
follows.


Input: A multiset $\mathcal{P}$ of permutations of a ground
set U and an integer ℓ.

Question: Is there a permutation P such that
$\sum_{P' \in \mathcal{P}} \tau(P, P') \le \ell$?


The parameter k under consideration is the average
distance between the input permutations, that is,


$k := \frac{1}{\binom{|\mathcal{P}|}{2}} \sum_{\{P, P'\} \subseteq \mathcal{P}} \tau(P, P')$.


Observe that, since τ can be computed efficiently,
KEMENY SCORE is fixed-parameter tractable
with respect to |U|: try all possible permutations
of U and choose the best one. Hence, if U is
small, then the problem is easy. Furthermore,
the number of input permutations is not such a
crucial feature since KEMENY SCORE is already
NP-hard for a constant number of permutations;
the partial kernelization thus aims for a reduction
of |U| and ignores the less important number
of input permutations.
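
The observation that KEMENY SCORE is easy for small |U| can be made concrete with a brute-force sketch: compute Kendall's Tau distance by counting discordant pairs and try all |U|! candidate permutations. Permutations are represented as tuples over the ground set; the function names are illustrative, and the average distance follows the formula for k given above.

```python
from fractions import Fraction
from itertools import combinations, permutations


def kendall_tau(p, q):
    """Number of element pairs that permutations p and q order differently."""
    pos_p = {u: i for i, u in enumerate(p)}
    pos_q = {u: i for i, u in enumerate(q)}
    return sum(1 for u, v in combinations(p, 2)
               if (pos_p[u] < pos_p[v]) != (pos_q[u] < pos_q[v]))


def kemeny_score(perms):
    """Minimum total distance to the input, by trying every permutation of U."""
    ground = perms[0]
    return min(sum(kendall_tau(candidate, p) for p in perms)
               for candidate in permutations(ground))


def average_distance(perms):
    """The parameter k: average distance over unordered pairs of inputs."""
    pairs = list(combinations(perms, 2))
    return Fraction(sum(kendall_tau(p, q) for p, q in pairs), len(pairs))


votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
```

For these three input permutations the optimum is the order a, b, c with total distance 2, and the average pairwise distance is 4/3. The running time is |U|! times a polynomial factor, which is exactly why reducing |U| is the goal of the partial kernelization.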

This reduction is obtained by removing
elements in U that are, compared to the other
elements, in roughly the same position in many
input permutations. The idea is based on a
generalization of the following observation:
If some element u is the first element of at
least $3|\mathcal{P}|/4$ input permutations, then this
element is the first element of an optimal
permutation. Any instance containing such an
element u can thus be reduced to an equivalent
instance with one fewer element.

By removing such elements, one obtains a
sub-instance of the original instance in which
every element contributes at least 3/16 to the
average distance k between the input permutations,
so that |U| ≤ (16/3)k. This leads to the following result.


Theorem 1 ([3]) KEMENY SCORE admits a
partial kernel with |U| ≤ (16/3)k.


Further partial kernelizations for consensus
problems have been obtained for CONSENSUS
CLUSTERING [2, 7] and SWAP MEDIAN
PARTITION [2]; the partial kernelization for
KEMENY SCORE has been experimentally
evaluated [3].

Another application of partial kernelization 
has been proposed for covering problems such as 
SET COVER [1]. 


Input: A family S of subsets of a ground set U
and an integer $\ell$.

Question: Is there a subfamily $S' \subseteq S$ of size
at most $\ell$ such that every element in U is
contained in at least one set of $S'$?


If $\ell \ge |U|$, then SET COVER has a trivial solution.
Thus, a natural parameter is the amount that
can be saved compared to this trivial solution,
that is, $k := |U| - \ell$. A polynomial problem
kernelization for SET COVER parameterized
by k is deemed unlikely, again under standard
complexity-theoretic assumptions. There is, however,
a partially polynomial problem kernel. The
dimension d is the universe size |U|. SET COVER
is fixed-parameter tractable with respect to |U|
as it can be solved in $f(|U|) \cdot \mathrm{poly}(n)$ time, for
example, by dynamic programming.
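
One possible form of such a dynamic program is sketched below. It is an illustration rather than the algorithm of [1]: the function name and the bitmask encoding of covered elements are assumptions, and the table has $2^{|U|}$ entries, each updated once per set.

```python
def min_set_cover(universe, family):
    """Smallest number of sets from `family` covering `universe`, or None."""
    elems = sorted(universe)
    index = {u: i for i, u in enumerate(elems)}
    full = (1 << len(elems)) - 1
    # Encode each set as a bitmask over the universe.
    masks = [sum(1 << index[u] for u in set(s) if u in index) for s in family]
    inf = float("inf")
    # best[c] = fewest sets whose union is exactly the element set c.
    best = [inf] * (full + 1)
    best[0] = 0
    for covered in range(full + 1):
        if best[covered] == inf:
            continue
        for m in masks:
            nxt = covered | m  # nxt >= covered, so increasing order is safe
            if best[covered] + 1 < best[nxt]:
                best[nxt] = best[covered] + 1
    return None if best[full] == inf else best[full]


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
optimum = min_set_cover(U, S)
saved = len(U) - optimum  # the largest k = |U| - l with a yes-answer
print(optimum, saved)
```

The running time is $O(2^{|U|} \cdot |S|)$ table updates, matching the shape $f(|U|) \cdot \mathrm{poly}(n)$.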

The idea behind the partial kernelization is to
greedily compute a subfamily $T \subseteq S$ of size k.
Then, it is observed that either this subfamily has
a structure that can be used to efficiently compute
a solution of the problem, or $|U| \le 2k^2 - 2$, or
there are elements in U whose removal yields an
equivalent instance. Altogether this leads to the
following.


Theorem 2 ([1]) SET COVER admits a partial
problem kernel with $|U| \le 2k^2 - 2$.


Open Problems 


The notion of partial kernelization is quite recent. 
Hence, the main aim for the near future is to iden- 
tify further useful applications of the technique. 
We list some problem areas that contain natural 
candidates for such applications. Problems that 
are defined on set families, such as SET COVER, 
have two obvious dimensions: the number m 
of sets in the set family and the size n of the 
universe. Matrix problems also have two obvious
dimensions: the number m of rows and the
number n of columns. For graph problems, useful
dimensions could be identified by examining the
so-called parameter hierarchy [6, 8]. Here, the
idea is to find dimensions whose value can be
much smaller than the number of vertices in the
graph. If the size |I| of the instance cannot be
reduced to be smaller than poly(k), then this might
still be possible for the smaller dimension d(I).
A further interesting research direction could be 
to study the relationship between partial kernel- 
ization and other relaxed notions of kernelization 
such as Turing kernelization. 

For some problems, the existence of partially 
polynomial kernels has been proven, but it is still 
unknown whether polynomial kernels exist. One 
such example is MAXLIN2-AA [5].


Problem Definition


Let r be an integer and let V be a set of n
variables. An ordering $\alpha$ is a bijection from V to
{1, 2, ..., n}; a constraint is an ordered r-tuple
$(v_1, v_2, \ldots, v_r)$ of distinct variables of V; $\alpha$ satisfies
$(v_1, v_2, \ldots, v_r)$ if $\alpha(v_1) < \alpha(v_2) < \cdots < \alpha(v_r)$.
An instance of MAX-r-LIN-ORDERING
consists of a multiset C of constraints, and the objective
is to find an ordering that satisfies the maximum
number of constraints. Note that MAX-2-LIN-ORDERING
is equivalent to the problem of
finding a maximum weight acyclic subgraph in an
integer-weighted directed graph. Since the FEEDBACK
ARC SET problem is NP-hard, MAX-2-LIN-ORDERING
is NP-hard, and thus MAX-r-LIN-ORDERING
is NP-hard for each $r \ge 2$.

Let $\alpha$ be an ordering chosen randomly and
uniformly from all orderings and let $c \in C$ be a
constraint. Then the probability that $\alpha$ satisfies c
is 1/r!. Thus the expected number of constraints
in C satisfied by $\alpha$ equals |C|/r!. This is a lower
bound on the maximum number of constraints
satisfied by an ordering, and, in fact, it is a
tight lower bound. This allows us to consider the
following parameterized problem (AA stands for
Above Average).
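
The averaging argument can be checked by exhaustive enumeration on a small instance. The sketch below is illustrative only; the helper names and the toy instance are assumptions, and orderings are represented as tuples listing the variables from first to last.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def satisfied(order, constraint):
    """True if the variables of `constraint` appear in this order in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in zip(constraint, constraint[1:]))


def average_and_max(variables, constraints):
    """Exact average and maximum number of satisfied constraints."""
    counts = [
        sum(satisfied(order, c) for c in constraints)
        for order in permutations(variables)
    ]
    return Fraction(sum(counts), len(counts)), max(counts)


V = ["a", "b", "c", "d"]
C = [("a", "b", "c"), ("b", "c", "d"), ("c", "a", "d")]
avg, best = average_and_max(V, C)
print(avg, best)
```

On this instance the first and third constraints contradict each other, so no ordering satisfies all three, yet the maximum still exceeds the average |C|/3!.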


MAX-r-LIN-ORDERING-AA


Instance: A multiset C of constraints and a
nonnegative integer k.

Parameter: k.

Question: Is there an ordering satisfying at
least |C|/r! + k constraints?


(1, 2, ..., r) is the identity permutation of the
symmetric group $S_r$. We can extend MAX-r-LIN-ORDERING
by considering an arbitrary subset
of $S_r$ rather than just {(1, 2, ..., r)}. Instead
of describing the extension for each arity $r \ge 2$,
we will do it only for r = 3, which is our main
interest, and leave the general case to the reader.

Let $\Pi \subseteq S_3$ = {(1,2,3), (1,3,2), (2,1,3),
(2,3,1), (3,1,2), (3,2,1)} be arbitrary. For an
ordering $\alpha : V \to \{1, 2, \ldots, n\}$, a constraint
$(v_1, v_2, v_3) \in C$ is $\Pi$-satisfied by $\alpha$ if there is
a permutation $\pi \in \Pi$ such that
$\alpha(v_{\pi(1)}) < \alpha(v_{\pi(2)}) < \alpha(v_{\pi(3)})$. Given $\Pi$, the problem
$\Pi$-CSP is the problem of deciding if there exists an
ordering of V that $\Pi$-satisfies all the constraints.
Every such problem is called a Permutation CSP
of arity 3. We will consider the maximization
version of these problems, denoted by MAX-$\Pi$-CSP,
parameterized above the average number of
constraints satisfied by a random ordering of V
(which can be shown to be a tight bound).


Kernelization, Permutation CSPs Parameterized above Average, Table 1 Permutation CSPs of arity 3 (after symmetry considerations)

$\Pi \subseteq S_3$                               Name                Complexity
$\Pi_0$ = {(123)}                                 3-LIN-ORDERING      Polynomial
$\Pi_1$ = {(123), (132)}                                              Polynomial
$\Pi_2$ = {(123), (213), (231)}                                       Polynomial
$\Pi_3$ = {(132), (231), (312), (321)}                                Polynomial
$\Pi_4$ = {(123), (231)}                                              NP-comp.
$\Pi_5$ = {(123), (321)}                          BETWEENNESS         NP-comp.
$\Pi_6$ = {(123), (132), (231)}                                       NP-comp.
$\Pi_7$ = {(123), (231), (312)}                   CIRCULAR ORDERING   NP-comp.
$\Pi_8$ = $S_3 \setminus$ {(123), (231)}                              NP-comp.
$\Pi_9$ = $S_3 \setminus$ {(123), (321)}          NON-BETWEENNESS     NP-comp.
$\Pi_{10}$ = $S_3 \setminus$ {(123)}                                  NP-comp.

It is easy to see that there is only one distinct
$\Pi$-CSP of arity 2. Guttmann and Maucher [5]
showed that there are in fact only 13 distinct
$\Pi$-CSPs of arity 3 up to symmetry, of which 11
are nontrivial. They are listed in Table 1 together
with their complexity. Some of the problems
listed in the table are well known and have
special names. For example, the problem for
$\Pi$ = {(123), (321)} is called the BETWEENNESS
problem.

Gutin et al. [4] proved that all 11 nontrivial
MAX-$\Pi$-CSP problems are NP-hard (even
though four of the $\Pi$-CSPs are polynomial).

Now observe that given a variable set V and
a constraint multiset C over V, for a random
ordering $\alpha$ of V, the probability of a constraint
in C being $\Pi$-satisfied by $\alpha$ equals $\frac{|\Pi|}{6}$. Hence,
the expected number of satisfied constraints from
C is $\frac{|\Pi|}{6}|C|$, and thus there is an ordering $\alpha$
of V satisfying at least $\frac{|\Pi|}{6}|C|$ constraints (and
this bound is tight). A derandomization argument
leads to $\frac{|\Pi|}{6}$-approximation algorithms for the
problems MAX-$\Pi$-CSP [1]. No better constant
factor approximation is possible assuming the
Unique Games Conjecture [1].
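
The $\Pi$-satisfaction test and the resulting average can be sketched in a few lines. The code below is an illustration under stated assumptions: permutations in $\Pi$ are written as 1-based triples, orderings are tuples listing the variables from first to last, and the function names and toy instance are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

BETWEENNESS = {(1, 2, 3), (3, 2, 1)}


def pi_satisfied(order, constraint, pi_set):
    """True if some pi in pi_set orders the constraint's variables in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return any(
        pos[constraint[p[0] - 1]]
        < pos[constraint[p[1] - 1]]
        < pos[constraint[p[2] - 1]]
        for p in pi_set
    )


def average_pi_satisfied(variables, constraints, pi_set):
    """Exact expected number of Pi-satisfied constraints under a random ordering."""
    orders = list(permutations(variables))
    total = sum(
        pi_satisfied(order, c, pi_set) for order in orders for c in constraints
    )
    return Fraction(total, len(orders))


V = ["a", "b", "c", "d"]
C = [("a", "b", "c"), ("b", "d", "a")]
avg = average_pi_satisfied(V, C, BETWEENNESS)
print(avg)
```

Because exactly one of the six relative orders of three distinct variables occurs in any ordering, the events for different $\pi \in \Pi$ are disjoint, which is what gives the factor $|\Pi|/6$.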

We will study the parameterization of MAX-$\Pi$-CSP
above tight lower bound:


$\Pi$-ABOVE AVERAGE ($\Pi$-AA)


Instance: A finite set V of variables, a multiset
C of ordered triples of distinct variables
from V and a nonnegative integer k.

Parameter: k.

Question: Is there an ordering $\alpha$ of V such
that at least $\frac{|\Pi|}{6}|C| + k$ constraints of C are
$\Pi$-satisfied by $\alpha$?


Key Results 


The following is a simple but important observation
in [4] allowing one to reduce $\Pi$-AA to
MAX-3-LIN-ORDERING-AA.


Proposition 1 Let $\Pi$ be a subset of $S_3$ such
that $\Pi \notin \{\emptyset, S_3\}$. There is a polynomial
time transformation f from $\Pi$-AA to
MAX-3-LIN-ORDERING-AA such that an instance
(V, C, k) of $\Pi$-AA is a Yes-instance if and only
if (V, C’, k) = f(V, C, k) is a Yes-instance of
MAX-3-LIN-ORDERING-AA.


Using a nontrivial reduction from MAX-3-LIN-ORDERING-AA
to a combination of MAX-2-LIN-ORDERING-AA
and BETWEENNESS-AA
and the facts that both problems admit
kernels with quadratic numbers of variables and
constraints (proved in [3] and [2], respectively),
Gutin et al. [4] showed that MAX-3-LIN-ORDERING-AA
also admits a kernel with
quadratic numbers of variables and constraints.
Kim and Williams [6] partially improved this
result by showing that MAX-3-LIN-ORDERING-AA
admits a kernel with O(k) variables.

The polynomial-size kernel result for MAX-3-LIN-ORDERING-AA
and Proposition 1 imply the
following (see [4] for details):


Theorem 1 ([4]) Let $\Pi$ be a subset of $S_3$
such that $\Pi \notin \{\emptyset, S_3\}$. The problem $\Pi$-AA
admits a polynomial-size kernel with $O(k^2)$
variables.


Open Problems 


Similar to Proposition 1, it is easy to prove
that, for each fixed r, every $\Pi$-AA can be reduced
to MAX-r-LIN-ORDERING-AA. Gutin et al. [4]
conjectured that for each fixed r the problem
MAX-r-LIN-ORDERING-AA is fixed-parameter
tractable.


Problem Definition


Several combinatorial optimization problems 
on graphs involve identifying a subset of 
nodes S, of the smallest cardinality, such that
the graph obtained after removing S satisfies
certain properties. For example, the VERTEX 
COVER problem asks for a minimum-sized 
subset of vertices whose removal makes the 
graph edgeless, while the FEEDBACK VERTEX 
SET problem involves finding a minimum-sized 
subset of vertices whose removal makes the graph 
acyclic. The F-DELETION problem is a generic 
formulation that encompasses several problems 
of this flavor. 

Let F be a finite set of graphs. In the F- 
DELETION problem, the input is an n-vertex 
graph G and an integer k, and the question is if
G has a subset S of at most k vertices, such that
G − S does not contain a graph from F as a minor.
The optimization version of the problem seeks 
such a subset of the smallest possible size. The 
PLANAR F-DELETION problem is the version 
of the problem where F contains at least one 
planar graph. The F-DELETION problem was
introduced by [3], who gave a non-constructive
algorithm running in time $O(f(k) \cdot n^2)$ for some
function f(k). This result was improved by [1] to
$O(f(k) \cdot n)$, for $f(k) = 2^{2^{O(k \log k)}}$.

For different choices of sets of forbidden minors
F, one can obtain various fundamental problems.
For example, when F = {$K_2$}, a complete
graph on two vertices, this is the VERTEX COVER
problem. When F = {$C_3$}, a cycle on three
vertices, this is the FEEDBACK VERTEX SET
problem. The cases of F being {$K_{2,3}$, $K_4$}, {$K_4$},
{$\theta_c$}, and {$K_3$, $T_2$} correspond to removing vertices
to obtain an outerplanar graph, a series-parallel
graph, a diamond graph, and a graph of
pathwidth one, respectively.


Tools 

Most algorithms for the PLANAR F-DELETION 
problem appeal to the notion of protrusions in 
graphs. An r-protrusion in a graph G is a sub- 
graph H of treewidth at most r such that the 
number of neighbors of H in G — H is at most r. 
Intuitively, a protrusion H in a graph G may be 
thought of as a subgraph of small treewidth which
is cut off from the rest of the graph by a small 
separator. 

Usually, as a means of preprocessing, pro- 
trusions are identified and replaced by smaller 
ones, while maintaining equivalence. The notion 
of graph replacement in this fashion originates in 
the work of [4]. The modern notion of protrusion
reductions has been employed in various
contexts [2, 5, 6, 12]. A widely used method for
developing a protrusion replacement algorithm 
is via the notion of finite-integer index. Roughly 
speaking, this property ensures that graphs can be 
related under some appropriate notion of equiva- 
lence with respect to the problem, and that there 
are only finitely many equivalence classes. This 
allows us to identify the class that the protrusion
belongs to and replace it with a canonical representative
for that class.


Key Results

The algorithms proposed for PLANAR
F-DELETION usually have the following
ingredients. First, the fact that F contains a
planar graph implies that any YES-instance 
of the problem must admit a small subset of 
vertices whose removal leads to a graph of small 
treewidth. It turns out that such graphs admit a 
convenient structure from the perspective of the 
existence of protrusions. In particular, most of 
the graph can be decomposed into protrusions. 
From here, there are two distinct themes. 

In the first approach, the protrusions 
are replaced by smaller, equivalent graphs. 
Subsequently, we have a graph that has no large 
protrusions. For such instances, it can be shown 
that if there is a solution, there is always one 
that is incident to a constant fraction of the edges 
in the graph, and this leads to a randomized 
algorithm by branching. Notably, the protrusion 
replacement can be performed by an algorithm 
that guarantees the removal of a constant fraction 
of vertices in every application. This helps in 
ensuring that the overall running time of the 
algorithm has a linear dependence on the size 
of the input. This algorithm is limited to the case 
when all graphs in F are connected, as is required 
in demonstrating finite-integer index. 


Theorem 1 ([8]) When every graph in F is connected,
there is a randomized algorithm solving
PLANAR F-DELETION in time $2^{O(k)} \cdot n$.


The second approach involves exploring the 
structure of the instance further. Here, an O(k)- 
sized subset of vertices is identified, with the key 
property that there is a solution that lives within 
it. The algorithm then proceeds to exhaustively 
branch on these vertices. This technique requires 
a different protrusion decomposition from the 
previous one. The overall algorithm is imple- 
mented using iterative compression. Since the 
protrusions are not replaced, this algorithm works 
for all instances of PLANAR F-DELETION, without
any further assumptions on the family F.
While both approaches lead to algorithms that are
single-exponential in k, the latter has a quadratic
dependence on the size of the input.


Theorem 2 ([11]) PLANAR F-DELETION can
be solved in time $2^{O(k)} \cdot n^2$.


In the context of approximation algorithms, 
the protrusion replacement is more intricate, be- 
cause the notion of equivalence is now more 
demanding. The replacement should preserve not 
only the exact solutions, but also approximate 
ones. By appropriately adapting the machinery of 
replacements with lossless protrusion replacers, 
the problem admits the following approximation 
algorithm. 


Theorem 3 ([8]) PLANAR F-DELETION admits 
a randomized constant ratio approximation algo- 
rithm. 


The PLANAR F-DELETION problem also 
admits efficient preprocessing algorithms. 
Formally, a kernelization algorithm for the 
problem takes an instance (G,k) as input and 
outputs an equivalent instance (H,k’) where 
the size of the output is bounded by a function 
of k. If the size of the output is bounded by 
a polynomial function of k, then it is called a 
polynomial kernel. The reader is referred to the 
survey [13] for a more detailed introduction to 
kernelization. 

The technique of protrusion replacement was 
developed and used successfully for kernelization 
algorithms on sparse graphs [2, 5]. These methods
were also used for the special case of the PLANAR
F-DELETION problem when F consists of a graph with
two vertices and a constant number of parallel
edges [6]. In the general setting of PLANAR
F-DELETION, kernelization involves anticipating
protrusions, that is, identifying subgraphs that
become protrusions after the removal of some
vertices from an optimal solution. These “near-protrusions”
are used to find irrelevant edges,
i.e., edges whose removal does not change
the problem, leading to natural reduction rules.
The process of finding an irrelevant edge appeals
to the well-quasi-ordering of a certain class of
graphs as a subroutine.


Theorem 4 ([8]) PLANAR F-DELETION admits
a polynomial kernel.


Applications 


The algorithms for PLANAR F-DELETION 
apply to any vertex deletion problem that 
can be described as hitting minor models of 
some fixed finite family that contains a planar 
graph. 

For a finite set of graphs F, let $\mathcal{G}_{F,k}$ be a
class of graphs such that for every $G \in \mathcal{G}_{F,k}$
there is a subset of vertices S of size at most
k such that $G \setminus S$ has no minor from F. The
following combinatorial result is a consequence
of the kernelization algorithm for PLANAR
F-DELETION.


Theorem 5 ([8]) For every set F that contains a
planar graph, every minimal obstruction for $\mathcal{G}_{F,k}$
is of size polynomial in k.


Kernelization algorithms on apex-free and 
H-minor-free graphs for all bidimensional 
problems from [5] can be implemented in linear 
time by employing faster protrusion reducers. 
This leads to randomized linear time, linear 
kernels for several problems. 

In the framework for obtaining EPTAS on
H-minor-free graphs in [7], the running time of
approximation algorithms for many problems is
$f(1/\varepsilon) \cdot n^{O(g(H))}$, where g is some function
of H only. The only bottleneck for improving
polynomial-time dependence is a constant
factor approximation algorithm for TREEWIDTH
$\eta$-DELETION. Using Theorem 3 instead, each
EPTAS from [7] runs in time $O(f(1/\varepsilon) \cdot n^2)$.
For the same reason, the PTAS algorithms for
many problems on unit disk and map graphs
from [9] become EPTAS algorithms.


Open Problems 


An interesting direction for further research is to
investigate F-DELETION when none of
the graphs in F is planar. The most interesting
case here is when F = {$K_5$, $K_{3,3}$}, also known
as the VERTEX PLANARIZATION problem. The
work in [10] demonstrates an algorithm with
running time $2^{O(k \log k)} \cdot n$, which notably has a
linear-time dependence on n. It remains open
as to whether VERTEX PLANARIZATION can be
solved in $2^{O(k)} \cdot n$ time. The question of polynomial
kernels in the non-planar setting is also
open; in particular, even the specific case of
F = {$K_5$} is unresolved.


Problem Definition


The work of Dell and van Melkebeek [4] refines 
the framework for lower bounds for kernelization 
introduced by Bodlaender et al. [1] and Fortnow 
and Santhanam [6]. The main contribution is 
that their results yield a framework for proving 
polynomial lower bounds for kernelization rather 
than ruling out all polynomial kernels for a prob- 
lem; this, for the first time, gives a technique for 
proving that some polynomial kernelizations are 
actually best possible, modulo reasonable com- 
plexity assumptions. A further important aspect 
is that, rather than studying kernelization directly, 
the authors give lower bounds for a far more 
general oracle communication protocol. In this 
way, they also obtain strong lower bounds for 
sparsification, lossy compression (in the sense 
of Harnik and Naor [7]), and probabilistically 
checkable proofs (PCPs). 

To explain the connection between kernelization
and oracle communication protocols, let
us first recall the following. A parameterized
problem is a language $Q \subseteq \Sigma^* \times \mathbb{N}$; the second
component k of instances $(x, k) \in \Sigma^* \times \mathbb{N}$ is
called the parameter. A kernelization for Q with
size $h : \mathbb{N} \to \mathbb{N}$ is an efficient algorithm that
gets as input an instance $(x, k) \in \Sigma^* \times \mathbb{N}$ and
returns an equivalent instance $(x', k')$, i.e., such
that $(x, k) \in Q$ if and only if $(x', k') \in Q$,
with $|x'|, k' \le h(k)$. If h(k) is polynomially
bounded in k, then we also call it a polynomial
kernelization.

One way to use a kernelization is to first 
simplify a given input instance and then solve the 
reduced instance by any (possibly brute-force) 
algorithm; together this yields an algorithm for 
solving the problem in question. If we abstract 
out the algorithm by saying that the answer for 
the reduced instance is given by an oracle, then 
we arrive at a special case of the following communication
protocol.


Definition 1 (oracle communication protocol 
[4]) An oracle communication protocol for a 
language L is a communication protocol for two 
players. The first player is given the input x and 
has to run in time polynomial in the length of
the input; the second player is computationally
unbounded but is not given any part of x. At the
end of the protocol, the first player should be able 
to decide whether x € L. The cost of the protocol 
is the number of bits of communication from the 
first player to the second player. 


As an example, if Q has a kernelization with 
size h, then instances (x,k) can be solved by 
a protocol of cost h(k). It suffices that the first 
player can compute a reduced instance $(x', k')$
and send it to the oracle who decides membership
of $(x', k')$ in Q; this yields the desired
answer for whether $(x, k) \in Q$. Note that the
communication protocol is far more general than 
kernelization because it makes no assumption 
about what exactly is sent (or in what encod- 
ing). More importantly, it also allows multiple 
rounds of communication, and the behavior of 
the oracle could also be active rather than just 
answering queries for the first player. Thus, the 
obtained lower bounds for oracle communication 
protocols are very robust, covering also relaxed 
forms of kernelization (like bikernels and com- 
pressions), and also yield the other mentioned 
applications. 
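
The special case "kernelize, then ask the oracle" can be illustrated on VERTEX COVER. The sketch below is not from the source: it swaps in Buss's classical high-degree rule as the kernelization, uses brute force as the computationally unbounded second player, and reports the number of edges sent as a rough proxy for the communication cost. All function names are hypothetical.

```python
from itertools import combinations


def buss_kernel(edges, k):
    """Buss's rule: a vertex of degree > k must be in every size-k cover.

    Returns a reduced (edges, k), or None if the instance is a NO-instance.
    """
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed and k >= 0:
        changed = False
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        for v, d in degree.items():
            if d > k:
                edges = {e for e in edges if v not in e}
                k -= 1
                changed = True
                break
    # With all degrees at most k, k vertices cover at most k * k edges.
    if k < 0 or len(edges) > k * k:
        return None
    return edges, k


def oracle(edges, k):
    """Second player: unbounded computation, here plain brute force."""
    vertices = sorted({v for e in edges for v in e})
    return any(
        all(e & set(s) for e in edges)
        for size in range(min(k, len(vertices)) + 1)
        for s in combinations(vertices, size)
    )


def protocol(edges, k):
    """First player reduces in polynomial time, then sends one message."""
    reduced = buss_kernel(edges, k)
    if reduced is None:
        return False, 0
    return oracle(*reduced), len(reduced[0])  # (answer, edges sent)


star_plus_edge = [(0, i) for i in range(1, 6)] + [(6, 7)]
print(protocol(star_plus_edge, 2), protocol(star_plus_edge, 1))
```

A general oracle communication protocol may use several rounds and arbitrary messages; this one-message form is only the restricted case that kernelization gives.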


Key Results 


A central result in the work of Dell and van
Melkebeek [4] (see also [5]) is the following
lemma, called complementary witness lemma.


Lemma 1 (complementary witness lemma [4])
Let L be a language and $t : \mathbb{N} \to \mathbb{N} \setminus \{0\}$ be
polynomially bounded such that the problem of
deciding whether at least one out of t(s) inputs
of length at most s belongs to L has an oracle
communication protocol of cost $O(t(s) \log t(s))$,
where the first player can be conondeterministic.
Then $L \in$ coNP/poly.


A previous work of Fortnow and Santhanam [6]
showed that an efficient algorithm
for encoding any t instances $x_1, \ldots, x_t$ of size at
most s into one instance y of size poly(s) such
that $y \in L$ if and only if at least one $x_i$ is in
L implies $L \in$ coNP/poly. (We recall that this
settled the OR-distillation conjecture of Bodlaender
et al. [1] and allowed their framework to rule
out polynomial kernels under the assumption
that NP $\not\subseteq$ coNP/poly.) Lemma 1 is obtained
by a more detailed analysis of this result and
requires an encoding of the OR of t(s) instances
into one instance of size $O(t(s) \log t(s))$ rather
than allowing only size poly(s) for all values of
t. This focus on the number t(s) of instances
in relation to the maximum instance size s is
the key for getting polynomial lower bounds for
kernelization (and other applications). In this
overview, we will not discuss the possibility of
conondeterministic behavior of the first player,
but the interested reader is directed to [9, 10] for
applications thereof.

Before outlining further results of Dell and van 
Melkebeek [4], let us state a lemma that cap- 
tures one way of employing the complementary 
witness lemma for polynomial lower bounds for 
kernelization. The lemma is already implicit in 
[4] and is given explicitly in follow-up work of 
Dell and Marx [3] (it can also be found in the current
full version [5] of [4]). We recall that OR(L)
refers to the language of all tuples $(x_1, \ldots, x_t)$
such that at least one $x_i$ is contained in L.


Lemma 2 ([3, 5]) Suppose that a parameterized
problem $\Pi$ has the following property for some
constant c: For some NP-complete language L,
there exists a polynomial-time mapping reduction
from OR(L) to $\Pi$ that maps an instance
$(x_1, \ldots, x_t)$ of OR(L) in which each $x_i$ has size
at most s to an instance of $\Pi$ with parameter
$k \le t^{1/c + o(1)} \cdot \mathrm{poly}(s)$. Then $\Pi$ does not have a
communication protocol of cost $O(k^{c - \varepsilon})$ for any
constant $\varepsilon > 0$ unless NP $\subseteq$ coNP/poly, even
when the first player is conondeterministic.


Intuitively, Lemma 2 follows from Lemma 1
because if the reduction and communication protocol
in Lemma 2 both exist (for all t), then
we can choose t(s) large enough (but polynomially
bounded in s) such that for all s we
get an oracle communication protocol of cost
O(t(s)) as required for Lemma 1. This implies
$L \in$ coNP/poly and, hence, NP $\subseteq$ coNP/poly
(since L is NP-complete). As discussed earlier,
any kernelization yields an oracle communication
protocol with cost equal to the kernel size and,
thus, this bound carries over directly to kernelization.
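
The choice of t(s) can be made explicit. The following calculation is a sketch under two assumptions that are not in the source: the poly(s) factor in Lemma 2 is written as $s^{a}$ for a constant a, and t(s) is taken to be $s^{b}$ for a constant b chosen below.

```latex
\begin{align*}
\text{cost} &= O\bigl(k^{c-\varepsilon}\bigr)
  \le O\Bigl(\bigl(t^{1/c+o(1)} \cdot s^{a}\bigr)^{c-\varepsilon}\Bigr)
  = O\bigl(t^{\,1-\varepsilon/c+o(1)} \cdot s^{a(c-\varepsilon)}\bigr).
\intertext{Setting $t(s) = s^{b}$ with $b \ge 2ac(c-\varepsilon)/\varepsilon$ gives
$s^{a(c-\varepsilon)} \le t(s)^{\varepsilon/(2c)}$, and for all sufficiently large $s$
the $o(1)$ term is at most $\varepsilon/(2c)$, so}
\text{cost} &\le O\bigl(t(s)^{\,1-\varepsilon/c+\varepsilon/(2c)} \cdot t(s)^{\varepsilon/(2c)}\bigr)
  = O\bigl(t(s)\bigr) \le O\bigl(t(s)\log t(s)\bigr).
\end{align*}
```

Since b is a constant, t(s) is polynomially bounded in s, which is the form required by Lemma 1.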

Let us now state the further results of Dell and 
van Melkebeek [4] using the context of Lemma 2. 
The central result is the following theorem on 
lower bounds for vertex cover on d-uniform hy- 
pergraphs. 


Theorem 1 ([4]) Let $d \ge 2$ be an integer and
$\varepsilon$ a positive real. If NP $\not\subseteq$ coNP/poly, there is
no protocol of cost $O(n^{d - \varepsilon})$ to decide whether a
d-uniform hypergraph on n vertices has a vertex
cover of at most k vertices, even when the first
player is conondeterministic.


To prove Theorem 1, Dell and van Melkebeek
devise a reduction from OR(SAT) to CLIQUE
on d-uniform hypergraphs parameterized by the
number of vertices (fulfilling the assumption
of Lemma 2 for c = d). This reduction relies
on an intricate lemma, the packing lemma,
that constructs a d-uniform hypergraph with t
cliques on s vertices each, but having only about
$O(t^{1/d + o(1)} \cdot s)$ vertices and no further cliques of
size s. In follow-up work, Dell and Marx [3] give
a simpler proof for Theorem 1 without making
use of the packing lemma, but use the lemma for
another of their results.

Note that the stated bound for VERTEX COVER
in d-uniform hypergraphs follows by complementation.
Furthermore, since every nontrivial
instance has $k \le n$, this also rules out kernelization
to size $O(k^{d - \varepsilon})$. The following lower
bound for SATISFIABILITY is obtained by giving
a reduction from VERTEX COVER on d-uniform
hypergraphs with parameter n. In the reduction,
hyperedges of size d are encoded by positive
clauses on d variables (one per vertex), and an
additional part of the formula (which requires
$d \ge 3$) checks that at most k of these variables
are set to true.


Theorem 2 ([4]) Let $d \ge 3$ be an integer and $\varepsilon$
a positive real. If NP $\not\subseteq$ coNP/poly, there is no
protocol of cost $O(n^{d - \varepsilon})$ to decide whether an
n-variable d-CNF formula is satisfiable, even when
the first player is conondeterministic.


Finally, the following theorem proves that several
known kernelizations for graph modification
problems are already optimal. The theorem is
proved by a reduction from VERTEX COVER (on
graphs) with parameter k that is similar in spirit
to the classical result of Lewis and Yannakakis on
NP-completeness of the $\Pi$-VERTEX DELETION
problem for nontrivial hereditary properties $\Pi$.
Note that Theorem 3 requires that the property $\Pi$
is not only hereditary, i.e., inherited by induced
subgraphs, but inherited by all subgraphs.


Theorem 3 ([4]) Let $\Pi$ be a graph property that
is inherited by subgraphs and is satisfied by
infinitely many but not all graphs. Let $\varepsilon$ be a
positive real. If NP $\not\subseteq$ coNP/poly, there is no
protocol of cost $O(k^{2 - \varepsilon})$ for deciding whether a
graph satisfying $\Pi$ can be obtained from a given
graph by removing at most k vertices.


As an example, the theorem implies that
the FEEDBACK VERTEX SET problem does
not admit a kernelization with size $O(k^{2 - \varepsilon})$.
This is in fact tight since a kernelization by
Thomassé [12] achieves $O(k^2)$ vertices and
$O(k^2)$ edges (cf. [4]); improving to $O(k^{2 - \varepsilon})$
edges is ruled out since it would yield an
encoding in size $O(k^{2 - \varepsilon'})$. Similarly, the well-known
kernelization for VERTEX COVER to
2k vertices is tight and cannot, in general, be
expected to yield instances with fewer than the
trivial $O(k^2)$ edges.


Applications 


Several authors have used the present approach
to get polynomial lower bounds for kernelizations
of certain parameterized problems; see, e.g.,
[2, 3, 8, 11]. Similarly, some results make use of
conondeterminism [9, 10] and the more general
setting of lower bounds for oracle communication
protocols [11].


Open Problems


Regarding applications it would be interesting to
have more lower bounds that use the full generality
of the oracle communication protocols.
Furthermore, it is an open problem to relax the
assumption of NP $\not\subseteq$ coNP/poly to the minimal
P $\ne$ NP.


Problem Definition


This work undertakes a theoretical study of pre- 
processing for the NP-hard TREEWIDTH prob- 
lem of finding a tree decomposition of width at 
most k for a given graph G. In other words, given
G and $k \in \mathbb{N}$, the question is whether G has
treewidth at most k. Several efficient reduction
rules are known that provably preserve the correct 
answer, and experimental studies show signifi- 
cant size reductions [3,5]. The present results 
study these and further newly introduced rules 
and obtain upper and lower bounds within the 
framework of kernelization from parameterized 
complexity. 

The general interest in computing tree de- 
compositions is motivated by the well-understood 
approach of using dynamic programming on tree 
decompositions that is known to allow fast al- 
gorithms on graphs of bounded treewidth (but 
with runtime exponential in the treewidth). A 
bottleneck for practical applications is the need 
for finding, as a first step, a sufficiently good tree 
decomposition; the best known exact algorithm
due to Bodlaender [2] runs in time exponential in
$k^3$ and is thus only of theoretical interest. This
motivates the use of heuristics and preprocessing
to find a reasonably good tree decomposition
quickly.


Tree Decompositions and Treewidth 

A tree decomposition for a graph G = (V, E)
consists of a tree T = (N, F) and a family
$\mathcal{X} := \{X_i \mid i \in N, X_i \subseteq V\}$. The sets $X_i$ are
also called bags and the vertices of T are usually
referred to as nodes to avoid confusion with G;
there is exactly one bag $X_i$ associated with each
node $i \in N$. The pair $(T, \mathcal{X})$ must fulfill the
following three properties: (1) Every vertex of G
is contained in at least one bag; (2) For each edge
$\{u, v\} \in E$ there must be a bag $X_i$ containing
both u and v; (3) For each vertex v of G the set
of nodes i of T with $v \in X_i$ induces a (connected)
subtree of T. The width of a tree decomposition
$(T, \mathcal{X})$ is equal to the size of the largest bag
$X_i \in \mathcal{X}$ minus one. The treewidth of a graph G,
denoted tw(G), is the smallest width taken over
all tree decompositions of G.
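
The three properties translate directly into a checker. The following is a minimal sketch with hypothetical names: bags are given as a dictionary from nodes of T to vertex sets, and the code assumes every tree edge joins two nodes that appear in that dictionary.

```python
def _connected(subset, adj):
    """True if `subset` is nonempty and connected in the graph `adj`."""
    if not subset:
        return False
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j in subset and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen == subset


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check that (T, bags) is a tree decomposition of G = (vertices, edges)."""
    nodes = set(bags)
    adj = {i: set() for i in nodes}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    # T itself must be a tree: connected with |N| - 1 edges.
    if len(tree_edges) != len(nodes) - 1 or not _connected(nodes, adj):
        return False
    # (1) Every vertex of G lies in at least one bag.
    if any(all(v not in bag for bag in bags.values()) for v in vertices):
        return False
    # (2) Both endpoints of every edge share some bag.
    if any(all(not {u, v} <= bag for bag in bags.values()) for u, v in edges):
        return False
    # (3) The nodes whose bags contain v induce a connected subtree.
    return all(
        _connected({i for i in nodes if v in bags[i]}, adj) for v in vertices
    )


def width(bags):
    """Size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


cycle_vertices = [1, 2, 3, 4]
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
cycle_bags = {0: {1, 2, 3}, 1: {1, 3, 4}}
print(is_tree_decomposition(cycle_vertices, cycle_edges, [(0, 1)], cycle_bags))
print(width(cycle_bags))
```

The example decomposes a 4-cycle into two bags of size three. The checker only certifies an upper bound on tw(G); computing the minimum width is the NP-hard part.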


Parameters 

The framework of parameterized complexity al- 
lows the study of the TREEWIDTH problem with 
respect to different parameters. A parameter is 
simply an integer value associated with each 
problem instance. The standard parameter for 
an optimization problem like TREEWIDTH is the 
desired solution quality k and we denote this 
problem by TREEWIDTH(k). Apart from this, 
structural parameters are considered that capture 
structural aspects of G. For example, the work
considers the behavior of TREEWIDTH when the
input graph G has a small vertex cover S, i.e.,
such that deletion of $\ell = |S|$ vertices yields
an independent set, with $\ell$ being used as the
parameter. Similarly, several other parameters are
discussed, foremost among them the feedback
vertex set number and the vertex deletion distance
to a single clique; the corresponding vertex sets
are called modulators, e.g., a feedback vertex set
is a modulator to a forest. We denote the arising
parameterized problems by TREEWIDTH(vc),
TREEWIDTH(fvs), and TREEWIDTH(vc($\overline{G}$)). To
decouple the overhead of finding, e.g., a minimum
vertex cover for G, all these variants assume
that an appropriate modulator is given along
with the input and the obtained guarantees are
in terms of the size of this modulator. Since
all studied parameters can be efficiently approximated
to within a constant factor of the optimum,
not providing an (optimal) modulator gives
only a constant-factor blowup in the obtained
results.


Kernelization 

A kernelization for a problem with parameter $\ell$ is
an efficient algorithm that, given an instance $(x, \ell)$,
returns an equivalent instance $(x', \ell')$ of size and
parameter value $\ell'$ bounded by some computable
function of $\ell$. If the bound is polynomial in $\ell$, then
we have a polynomial kernelization. Specialized
to, for example, TREEWIDTH(vc), a polynomial
kernelization would have the following behavior:
It gets as input an instance (G, S, k), asking
whether the treewidth of G is at most k, where
S is a vertex cover for G. In polynomial time it
creates an instance (G', S', k') such that: (1) the
size of the instance (G', S', k') and the parameter
value |S'| are bounded polynomially in $\ell = |S|$; (2) the
set S' is a vertex cover of G'; (3) the graph G has
treewidth at most k if and only if G' has treewidth
at most k'.


Key Results 


The kernelization lower bound framework of 
Bodlaender et al. [6] together with recent 
results of Drucker [9] is known to imply 
that TREEWIDTH(k) admits no polynomial
kernelization unless NP ⊆ coNP/poly and the
polynomial hierarchy collapses. The present
work takes a more detailed look at polynomial 
kernelization for TREEWIDTH with respect to 
structural parameters. The results are as follows. 


Theorem 1 TREEWIDTH(vc), i.e., parameterized
by vertex cover number, admits a polynomial
kernelization to an equivalent instance with
$O((\mathrm{vc}(G))^3)$ vertices.


An interesting feature of this result is that
it uses only three simple reduction rules that 
are well known and often used (cf. [5]). Two 
rules address so-called simplicial vertices, whose
neighborhood is a clique, and a third rule inserts
edges between certain pairs of vertices that have a
large number of common neighbors. Analyzing these
empirically successful rules with respect to the
vertex cover number of the input graph yields a
kernelization, a fact that nicely complements the
observed experimental success.
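
To make the idea of a simplicial-vertex rule concrete, here is a minimal Python sketch. It is not the exact formulation of the rules analyzed in the cited work; it uses the well-known fact that for a simplicial vertex v of degree d, tw(G) = max(d, tw(G - v)), and therefore tracks a lower bound instead of comparing against a target k. The function names and the adjacency-dictionary encoding are assumptions made for this illustration.

```python
def is_simplicial(adj, v):
    """True if the neighborhood of v is a clique (vacuously true for degree <= 1)."""
    nbrs = list(adj[v])
    return all(b in adj[a] for i, a in enumerate(nbrs) for b in nbrs[i + 1:])


def remove_simplicial_vertices(adj):
    """Exhaustively delete simplicial vertices from an undirected graph.

    `adj` maps each vertex to the set of its neighbors (symmetric).
    Returns (reduced_adj, low) such that tw(G) = max(low, tw(reduced graph)).
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    low = 0
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if is_simplicial(adj, v):
                # Removing v is safe; its degree is a lower bound on the treewidth.
                low = max(low, len(adj[v]))
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
    return adj, low
```

On a chordal graph such as a path or a complete graph, the rule alone eliminates every vertex and `low` equals the treewidth; on a chordless cycle of length four it does not apply at all.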


Theorem 2 TREEWIDTH(fvs), i.e., parameterized
by feedback vertex set number, admits
a polynomial kernelization to an equivalent
instance with $O((\mathrm{fvs}(G))^4)$ vertices.


The feedback vertex set number of a graph 
is upper bounded by its vertex cover number, 
and forests have feedback vertex set number zero 
but arbitrarily large vertex cover number. Thus, 
for large families of input graphs, this second 
result is stronger. The result again builds on sev- 
eral known reduction rules (including the above 
ones), among others, for handling vertices that 
are almost simplicial, i.e., all but one neighboring 
vertex form a clique. On top of these, several new 
rules are added. One of them addresses a previ- 
ously uncovered case of almost simplicial vertex 
removal, namely, when the vertex has degree 
exactly k + 1, where k is the desired treewidth 
bound. Furthermore, these reduction rules lead 
to a structure dubbed clique-seeing paths, which 
takes a series of fairly technical rules and analysis 
to reduce and bound. Altogether, this combina- 
tion leads to the above result. 


Theorem 3 TREEWIDTH(vc($\overline{G}$)), i.e., parameterized
by deletion distance to a single clique,
admits no polynomial kernelization unless
NP ⊆ coNP/poly and the polynomial hierarchy
collapses.


The proof uses the notion of a cross- 
composition introduced by Bodlaender et al. [8], 
which builds directly on the kernelization lower 
bound framework of Bodlaender et al. [6] 
and Fortnow and Santhanam [10]. The cross-composition
builds on the proof of NP-completeness
of TREEWIDTH by Arnborg et al. [1], which uses a
Karp reduction from CUTWIDTH to TREEWIDTH.
This construction
is extended significantly to yield a
cross-composition of CUTWIDTH ON SUBCUBIC 
GRAPHS (i.e., graphs of maximum degree three) 
into TREEWIDTH, which, roughly, requires an 
encoding of many CUTWIDTH instances into a 
single instance of TREEWIDTH with sufficiently 
small parameter. 

Overall, together with previously known re- 
sults, the obtained upper and lower bounds for 
TREEWIDTH cover a wide range of natural parameter
choices (see the discussion in [7]). If C is
any graph class that contains all cliques, then the
vertex deletion distance of a graph G to C is upper
bounded by vc($\overline{G}$). Thus, TREEWIDTH parameterized
by distance to C does not admit a polynomial
kernelization unless NP ⊆ coNP/poly.
This includes several well-studied classes like 
interval graphs, cographs, and perfect graphs. 
Since TREEWIDTH remains NP-hard on bipartite 
graphs, the result for parameterization by feed- 
back vertex set number cannot be generalized to 
vertex deletion to a bipartite graph. It may, how- 
ever, be possible to generalize this parameter to 
vertex deletion distance to an outerplanar graph, 
i.e., planar graphs having an embedding with 
all vertices appearing on the outer face. Since 
these graphs generalize forests, this value is upper 
bounded by the feedback vertex set number. 


Theorem 4 WEIGHTED TREEWIDTH(vc),
i.e., parameterized by vertex cover number,
admits no polynomial kernelization unless
NP ⊆ coNP/poly and the polynomial hierarchy
collapses.


In the WEIGHTED TREEWIDTH problem, 
each vertex comes with an integer weight, and 
the size of a bag in the tree decomposition 
is defined as the sum of the weights of its 
vertices. (Note that the present paper uses an
extra deduction of one such that treewidth and
weighted treewidth coincide for graphs with
all vertices having weight one.) The result is
proved by a cross-composition from TREEWIDTH
(to WEIGHTED TREEWIDTH parameterized by
vertex cover number) and complements the
polynomial kernelization for the unweighted
case. A key idea for the cross-composition is
to use a result of Bodlaender and Möhring [4] on
the behavior of treewidth under the join operation 
on graphs. This is combined with replacing all 
edges (in input graphs and join edges) by using 
a small number of newly introduced vertices of 
high weight. 


Open Problems 


A particularly interesting case left open by existing
results on polynomial kernelization for structural
parameterizations of TREEWIDTH is the vertex
deletion distance to outerplanar graphs.


Definition and Discussion


The basic definition of the field expresses ker- 
nelization as a Karp (many-one) self-reduction. 
Classical complexity and recursion theory offers 
quite a lot of alternative and more general notions 
of reducibilities. The most general notion, that 
of a Turing reduction, motivates the following 
definition: 

Let (Q, κ) be a parameterized problem over a
finite alphabet Σ.


• An input-bounded oracle for (Q, κ) is an oracle
  that, for any given input x ∈ Σ* of (Q, κ)
  and any bound t, first checks if |x|, |κ(x)| ≤ t,
  and if this is certified, it decides in constant
  time whether the input x is a YES instance of
  (Q, κ).

• A Turing kernelization (algorithm) for (Q, κ)
  is an algorithm that, provided with access to
  some input-bounded oracle for (Q, κ), decides
  on input x ∈ Σ* in polynomial time
  whether x is a YES instance of (Q, κ) or
  not. During its computation, the algorithm can
  produce (polynomially many) oracle queries
  x’ with bound t = h(κ(x)), where h is an
  arbitrary computable function. The function h
  is referred to as the size of the kernel.


If only one oracle access is permitted in a run 
of the algorithm, we basically get the classical 
notion of a (many-one or Karp) kernelization. 

A more general definition was given in [4], 
allowing access to a different (auxiliary) problem 
(Q’, κ’). As long as there is a computable reduction
from Q’ to Q, this does not make much of
a difference, as we could translate the queries to 
Q’ into queries of Q. Therefore, we prefer to use 
the definition given in [1]. 


Out-Branching: Showing 
the Difference 


In [1], the first example of a natural problem is 
provided that admits a Turing kernel of polyno- 
mial size, but (most likely) no Karp kernel of 
polynomial size. We provide some details in the 
following. 


Problem Definition 


A subdigraph T of a digraph D is an out-tree
if T is an oriented tree with only one vertex r
of indegree zero (called the root). The vertices
of T of outdegree zero are called leaves. If T
is a spanning out-tree, i.e., V(T) = V(D),
then T is called an out-branching of D. The 
DIRECTED MAXIMUM LEAF OUT-BRANCHING 
problem is to find an out-branching in a given 
digraph with the maximum number of leaves. 
The parameterized version of the DIRECTED 
MAXIMUM LEAF OUT-BRANCHING problem is 
k-LEAF OUT-BRANCHING, where for a given 
digraph D and integer k, it is asked to decide 
whether D has an out-branching with at least 
k leaves. If we replace “out-branching” with 
“out-tree” in the definition of k-LEAF OUT-BRANCHING,
we get a problem called k-LEAF OUT-TREE.
The parameterization κ is set to k in
both problems. As the two problems are easily 
translatable into each other, we focus on k-LEAF 
OUT-BRANCHING as the digraph analogue of the 
well-known MAXIMUM LEAF SPANNING TREE 
problem. 


Key Results 


It is shown that the problem variant where an 
explicit root is given as additional input, called 
ROOTED k-LEAF OUT-BRANCHING, admits a 
polynomial Karp kernel. Alternatively, this vari- 
ant can be seen as a special case of k-LEAF OUT- 
BRANCHING by adding one vertex of indegree 
zero and outdegree one, pointing to the desig- 
nated root of the original graph. By making a call 
to this oracle for each of the vertices as potential 
roots, this provides a Turing kernelization of 
polynomial size for k-LEAF OUT-BRANCHING. 
This result is complemented by showing that
k-LEAF OUT-TREE has no polynomial Karp kernel
unless coNP ⊆ NP/poly.
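
The root-guessing argument can be sketched in Python as follows. Everything here is illustrative: the function names are hypothetical, the `kernelize` hook defaults to the identity and merely marks the place where the polynomial Karp kernel for the rooted problem would shrink each query, and the brute-force routine is an exponential-time stand-in for the input-bounded oracle, usable only on tiny digraphs.

```python
from itertools import combinations


def _reachable(arcs, root, allowed):
    """Vertices reachable from `root` along arcs that stay inside `allowed`."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b in allowed and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def rooted_oracle_bruteforce(vertices, arcs, root, k):
    """Stand-in oracle: is there an out-branching rooted at `root` with >= k leaves?"""
    vertices, arcs = set(vertices), set(arcs)
    if len(vertices) == 1:
        return k <= 1  # the single vertex is both root and leaf
    others = vertices - {root}
    k = max(k, 1)  # with at least two vertices, every out-branching has a leaf
    if k > len(others):
        return False
    for leaves in combinations(others, k):
        internal = vertices - set(leaves)
        # The non-leaf part must be spanned by an out-tree rooted at `root` ...
        if _reachable(arcs, root, internal) != internal:
            continue
        # ... and every chosen leaf needs a parent among the remaining vertices.
        if all(any((u, v) in arcs for u in internal) for v in leaves):
            return True
    return False


def k_leaf_out_branching(vertices, arcs, k, rooted_oracle,
                         kernelize=lambda V, A, r, k: (V, A, r, k)):
    """Turing-kernel style decision: one (kernelized) oracle query per candidate root."""
    return any(rooted_oracle(*kernelize(vertices, arcs, r, k)) for r in vertices)
```

The answer is a disjunction over at most |V(D)| oracle queries, which is the simple, non-adaptive use of the oracle that the text describes.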

We list the reduction rules leading to the 
polynomial-size kernel for the rooted version 
in the following. 


Reachability Rule: If there exists a vertex u which
is disconnected from the root r, then return
No.

Useless Arc Rule: If vertex u disconnects a vertex
v from the root r, then remove the arc vu.

Bridge Rule: If an arc uv disconnects at least two
vertices from the root r, contract the arc uv.

Avoidable Arc Rule: If a vertex set S, |S| ≤ 2,
disconnects a vertex v from the root r, vw ∈
A(D) and xw ∈ A(D) for all x ∈ S, then
delete the arc vw.

Two Directional Path Rule: If there is a path
$P = p_1 p_2 \cdots p_{l-1} p_l$ with l = 7 or l = 8 such
that


• $p_1$ and $p_{\mathrm{in}} \in \{p_{l-1}, p_l\}$ are the only
  vertices with in-arcs from the outside of P

• $p_l$ and $p_{\mathrm{out}} \in \{p_1, p_2\}$ are the only vertices
  with out-arcs to the outside of P

• The path P is the unique out-branching of
  D[V(P)] rooted at $p_1$

• There is a path Q that is the unique out-branching
  of D[V(P)] rooted at $p_{\mathrm{in}}$ and
  ending in $p_{\mathrm{out}}$

• The vertex after $p_{\mathrm{out}}$ on P is not the same
  as the vertex after $p_l$ on Q


then delete $R = P \setminus \{p_1, p_{\mathrm{in}}, p_{\mathrm{out}}, p_l\}$ and
all arcs incident to these vertices from D.
Add two vertices u and v and the arc set
$\{p_{\mathrm{out}}u,\; uv,\; vp_{\mathrm{in}},\; p_l v,\; vu,\; up_1\}$ to D.


This reduction was simplified and improved 
in [2] by replacing the rather complicated 
last reduction rule by a rule that shortens 
induced bipaths of length four to length two. 
Here, $P = \{x_1, \ldots, x_l\}$, with $l \ge 3$, is an
induced bipath of length $l - 1$ if the set of arcs
neighbored to $\{x_2, \ldots, x_{l-1}\}$ in D is exactly
$\{(x_i, x_{i+1}), (x_{i+1}, x_i) \mid i \in \{1, \ldots, l-1\}\}$.
This yielded a Karp kernel with a quadratic 
number of vertices (measured in terms of the 
parameter k) for the rooted version. For directed 
acyclic graphs (DAGs), even a Karp kernel with 
a linear number of vertices is known for the 
rooted version [3]. Notice that also for DAGs 
(in fact, for quite restricted DAGs called willow 
graphs), the unrooted problem versions have no 
polynomial Karp kernel unless coNP ⊆ NP/poly,
as suggested by the hardness proof in [1]. 
Another direction of research is to obtain faster 
kernelization algorithms, often by restricting 
the use (and power) of reduction rules. For the 
k-LEAF OUT-BRANCHING, this was done by 
Kammer [6]. 


Hierarchies Based on Turing Kernels 


Based on the notion of polynomial parametric 
transformation, in [4] an intertwined WK/MK 
hierarchy was defined, in analogy to the well- 
known W/M hierarchy of (hard) parameterized 
problems. The lowest level (MK[1]) corresponds 
to (NP) problems with polynomial-size Karp ker- 
nels. The second-lowest level is WK[1], and this 
does not equal MK[1] unless coNP ⊆ NP/poly.
Typical complete problems for WK[1] are:


• Given a graph G of order n and an integer k,
  does G contain a clique of size k? Here, the
  parameterization is κ(G, k) = k · log(n).

• Given a nondeterministic Turing machine M
  and an integer k, does M stop within k steps?
  Here, the parameterization is κ(M, k) = k ·
  log(|M|).


As noticed in [4], the CLIQUE problem provides 
also another (less natural) example of a problem 
without polynomial-size Karp kernel that has a 
polynomial-size Turing kernel, taking as param- 
eterization the maximum degree of the input 
graph. 


How Much Oracle Access Is Needed? 


The examples we gave so far make use of oracles 
in a very simple way. More precisely, a very 
weak notion of truth-table reduction (disjunctive 
reduction) is applied. The INDEPENDENT SET 
problem on bull-free graphs [7] seems to provide 
a first example where the power of Turing re- 
ductions is used more extensively, as the oracle 
input is based on the previous computation of 
the reduction. Therefore, it could be termed an 
adaptive kernelization [5]. Yet another way of 
constructing Turing kernels was described by 
Jansen [5]. There, in a first step, the instance is 
decomposed (according to some graph decompo- 
sition in that case), and then the fact is used that 
either a solution is already obtained or it only 
exists in one of the (small) components of the 
decomposition. This framework is then applied 
to deduce polynomial-size Turing kernels, e.g., 
for the problem of finding a path (or a cycle) of 
length at least k in a planar graph G, where k is 
the parameter of the problem. 


Open Problems 


One of the simplest open questions is whether
LONGEST PATH, i.e., the problem of finding a
path of length at least k, admits a polynomial-size
Turing kernel on general graphs.

Conversely, no tools have been developed 
so far that allow for ruling out polynomial-size 
Turing kernels. For the question of practical 
applications of kernelization, this would be 
a much stronger statement than ruling out 
traditional Karp kernels of polynomial size, as 
a polynomial number of polynomial-size kernels 
can give a practical solution (see the discussion 
of k-LEAF OUT-BRANCHING above).


Problem Definition


Many application areas of algorithms research 
involve objects in motion. Virtual reality, 
simulation, air-traffic control, and mobile 
communication systems are just some examples. 
Algorithms that deal with objects in motion 
traditionally discretize the time axis and compute 
or update their structures based on the position of 
the objects at every time step. If all objects move 
continuously then in general their configuration 
does not change significantly between time 
steps — the objects exhibit spatial and temporal 
coherence. Although time-discretization methods 
can exploit spatial and temporal coherence they 
have the disadvantage that it is nearly impossible 
to choose the perfect time step. If the distance 
between successive steps is too large, then 
important interactions might be missed; if it is
too small, then unnecessary computations will 
slow down the simulation. Even if the time step is 
chosen just right, this is not always a satisfactory 
solution: some objects may have moved only 
slightly and in such a way that the overall data 
structure is not influenced. 

One would like to use the temporal coherence 
to detect precisely those points in time when there 
is an actual change in the structure. The kinetic 
data structure (KDS) framework, introduced by 
Basch et al. in their seminal paper [2], does
exactly that: by maintaining not only the structure
itself, but also some additional information, it
can determine when the structure will undergo
a “real” (combinatorial) change.


Key Results 


A kinetic data structure is designed to maintain 
or monitor a discrete attribute of a set of mov- 
ing objects, for example, the convex hull or the 
closest pair. The basic idea is that although all
objects move continuously, there are only certain 
discrete moments in time when the combinatorial 
structure of the attribute changes (in the earlier 
examples, the ordered set of convex-hull vertices 
or the pair that is closest, respectively). A KDS 
therefore contains a set of certificates that consti- 
tutes a proof of the property of interest. Certifi- 
cates are generally simple inequalities that assert 
facts like “point c is on the left of the directed line 
through points a and b.” These certificates are 
inserted in a priority queue (event queue) based 
on their time of expiration. The KDS then per- 
forms an event-driven simulation of the motion 
of the objects, updating the structure whenever 
an event happens, that is, when a certificate fails 
(see Fig. 1). It is part of the art of designing 
efficient kinetic data structures to find a small set 
of simple and easily updatable certificates that 
serve as a proof of the property one wishes to 
maintain. 

A KDS assumes that each object has a known 
motion trajectory or flight plan, which may be 
subject to restrictions to make analysis tractable. 
Two common restrictions would be translation 
along paths parametrized by polynomials of fixed 
degree d, or translation and rotation described 
by algebraic curves. Furthermore, certificates are 
generally simple algebraic equations, which im- 
plies that the failure time of a certificate can be 
computed as the next largest root of an algebraic 
expression. An important aspect of kinetic data 
structures is their on-line character: although the 
positions and motions (flight plans) of the objects 
are known at all times, they are not necessarily 
known far in advance. In particular, any object 
can change its flight plan at any time. A good 
KDS should be able to handle such changes in 
flight plans efficiently. 

Kinetic Data Structures, Fig. 1 The basic structure of an event-based
simulation with a KDS


Kinetic Data Structures, Fig. 2 Equivalent convex hull configurations (left
and right), a proof that a, b, and c form the convex hull of S (center).
Proof: a lies to the left of bc; d lies to the left of bc; b lies to the
right of ad; c lies to the left of ad


Kinetic Data Structures, Fig. 3 Certificate structure for points a, b, and c
being stationary and point d moving along a straight line

A detailed introduction to kinetic data struc- 
tures can be found in Basch’s Ph. D. thesis [1] or 
in the surveys by Guibas [3, 4]. In the following 
the principles behind kinetic data structures are 
illustrated by an easy example. 

Consider a KDS that maintains the convex 
hull of a set S of four points a,b,c, and d as 
depicted in Fig. 2. A set of four simple certificates 
is sufficient to certify that a, b, and c indeed form
the convex hull of S (see Fig. 2 center). This
implies that the convex hull of S will not change
under any motion of the points that does not lead 
to a violation of these certificates. To put it dif- 
ferently, if the points move along trajectories that 
move them between the configurations depicted 
in Fig. 2 without the point d ever appearing on 
the convex hull, then the KDS in principle does 
not have to process a single event. 

Now consider a setting in which the points
a, b, and c are stationary and the point d moves
along a linear trajectory (Fig. 3 left). Here the
KDS has exactly two events to process. At time
$t_1$ the certificate “d is to the left of bc” fails as the
point d appears on the convex hull. In this easy
setting, only the failed certificate is replaced by
“d is to the right of bc” with failure time “never”;
generally, processing an event would lead to the
scheduling and descheduling of several events
from the event queue. Finally, at time $t_2$ the
certificate “b is to the right of ad” fails as the
point b ceases to be on the convex hull and is
replaced by “b is to the left of ad” with failure
time “never.”

Certificate                   | Failure time
a lies to the left of bc      | never
d lies to the left of bc      | $t_1$
b lies to the right of ad     | $t_2$
c lies to the left of ad      | never
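
For a certificate of the form “d is to the left of bc” with b and c stationary and d moving linearly, the failure time is the root of a linear function of t. The Python sketch below illustrates this under assumptions that are not in the original text: points are coordinate pairs, d moves as d(t) = d0 + t · velocity starting at time 0, and the helper names are made up for this example.

```python
import heapq


def orientation(b, c, d):
    """Twice the signed area of (b, c, d); positive iff d is left of the directed line b -> c."""
    return (c[0] - b[0]) * (d[1] - b[1]) - (c[1] - b[1]) * (d[0] - b[0])


def failure_time_left_of(b, c, d0, velocity):
    """Time at which 'd is to the left of bc' fails, or None for 'never'.

    b and c are stationary; d moves as d(t) = d0 + t * velocity.
    The orientation value is then a0 + a1 * t, linear in t.
    """
    a0 = orientation(b, c, d0)
    if a0 <= 0:
        raise ValueError("certificate does not hold at time 0")
    a1 = (c[0] - b[0]) * velocity[1] - (c[1] - b[1]) * velocity[0]
    if a1 >= 0:
        return None  # d never gets closer to the line, so the certificate never fails
    return -a0 / a1


def build_event_queue(certificates):
    """Priority queue of (failure_time, name); certificates that never fail are skipped."""
    queue = [(t, name) for name, t in certificates.items() if t is not None]
    heapq.heapify(queue)
    return queue
```

Popping the queue with `heapq.heappop` yields the next certificate to fail. A full KDS would then repair the certificate set and schedule the failure times of the newly created certificates.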

Kinetic data structures and their accompany- 
ing maintenance algorithms can be evaluated and 
compared with respect to four desired character- 
istics. 


Responsiveness. One of the most important per- 
formance measures for a KDS is the time 
needed to update the attribute and to repair the 
certificate set when a certificate fails. A KDS 
is called responsive if this update time is 
“small”, that is, polylogarithmic. 

Compactness. A KDS is called compact if the 
number of certificates is near-linear in the 
total number of objects. Note that this is not
necessarily the same as the amount of storage 
the entire structure needs. 

Locality. A KDS is called local if every object is 
involved in only a small number of certificates 
(again, “small” translates to polylogarithmic). 
This is important whenever an object changes 
its flight plan, because one has to recompute
the failure times of all certificates this object 
is involved in, and update the event queue 
accordingly. Note that a local KDS is always 
compact, but that the reverse is not necessarily 
true. 

Efficiency. A certificate failure does not auto- 
matically imply a change in the attribute that 
is being maintained; it can also be an internal
event, that is, a change in some auxiliary 
structure that the KDS maintains. A KDS 
is called efficient if the worst-case number 
of events handled by the data structure for 
a given motion is small compared to the num- 
ber of combinatorial changes of the attribute 
(external events) that must be handled for that 
motion. 


Applications 


The paper by Basch et al. [2] sparked a large 
amount of research activities and over the last 
years kinetic data structures have been used to 
solve various dynamic computational geometry 
problems. A number of papers deal foremost with 
the maintenance of discrete attributes for sets of 
moving points, like the closest pair, width and di- 
ameter, clusters, minimum spanning trees, or the 
constrained Delaunay triangulation. Motivated by 
ad hoc mobile networks, there have also been 
a number of papers that show how to maintain 
the connected components in a set of moving 
regions in the plane. Major research efforts have 
also been seen in the study of kinetic binary space 
partitions (BSPs) and kinetic kd-trees for various 
objects. Finally, there are several papers that 
develop KDSs for collision detection in the plane 
and in three dimensions. A detailed discussion 
and an extensive list of references can be found 
in the survey by Guibas [4]. 
Problem Definition


For a given set of items N = {1, ..., n} with
nonnegative integer weights $w_j$ and profits $p_j$,
j = 1, ..., n, and a knapsack of capacity c, the
knapsack problem (KP) is to select a subset of 
the items such that the total profit of the selected 
items is maximized and the corresponding total 
weight does not exceed the knapsack capacity c. 


Knapsack 


Alternatively, a knapsack problem can be for- 
mulated as a solution of the following linear 
integer programming formulation: 


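$$\text{(KP)} \quad \text{maximize} \quad \sum_{j=1}^{n} p_j x_j \qquad (1)$$

$$\text{subject to} \quad \sum_{j=1}^{n} w_j x_j \le c, \qquad (2)$$

$$x_j \in \{0, 1\}, \quad j = 1, \ldots, n. \qquad (3)$$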


The knapsack problem is the simplest nontrivial 
integer programming model having binary vari- 
ables, only a single constraint, and only positive 
coefficients. A large number of theoretical and 
practical papers have been published on this prob- 
lem and its extensions. An extensive overview 
can be found in the books by Kellerer, Pferschy, 
and Pisinger [4] or Martello and Toth [7]. 

Adding the integrality condition (3) to the 
simple linear program (1)—(2) already puts (KP) 
into the class of NP-hard problems. Thus, (KP)
admits no polynomial time algorithms unless 
P = NP holds. 

Therefore, this entry will focus on approxi- 
mation algorithms for (KP). A common method 
to judge the quality of an approximation algorithm
is its worst-case performance. For a given
instance I, denote by $z^*(I)$ the optimal solution
value of (KP) and by $z^H(I)$ the corresponding
solution value of a heuristic H. For $\varepsilon \in\, ]0,1[$,
a heuristic H is called a $(1 - \varepsilon)$-approximation
algorithm for (KP) if for any instance I


$$z^H(I) \ge (1 - \varepsilon)\, z^*(I)$$


holds. Given a parameter $\varepsilon$, a heuristic H is called
a fully polynomial approximation scheme, or an
FPTAS, if H is a $(1 - \varepsilon)$-approximation algorithm
for (KP) for any $\varepsilon \in\, ]0,1[$, and its running time
is polynomial both in the length of the encoded
input n and $1/\varepsilon$. The first FPTAS for (KP) was
suggested by Ibarra and Kim [1] in 1975. It was
among the early FPTASes for discrete optimization
problems. It will be described in detail in the
following.


Key Results


(KP) can be solved in pseudopolynomial time 
by a simple dynamic programming algorithm. 
One possible variant is the so-called dynamic 
programming by profits (DP-Profits). The main 
idea of DP-Profits is to reach every possible total 
profit value with a subset of items of minimal to- 
tal weight. Clearly, the highest total profit value, 
which can be reached by a subset of weight not 
greater than the capacity c, will be an optimal 
solution. 

Let $y_j(q)$ denote the minimal weight of a
subset of items from $\{1, \ldots, j\}$ with total profit
equal to q. To bound the length of every array $y_j$,
an upper bound U on the optimal solution value
has to be computed. An obvious possibility would
be to use the upper bound $U_{LP} = \lfloor z^{LP} \rfloor$ from the
solution $z^{LP}$ of the LP-relaxation of (KP) and set
$U := U_{LP}$. It can be shown that $U_{LP}$ is at most
twice as large as the optimal solution value $z^*$.
Initializing $y_0(0) := 0$ and $y_0(q) := c + 1$ for
$q = 1, \ldots, U$, all other values can be computed
for $j = 1, \ldots, n$ and $q = 0, \ldots, U$ by using the
recursion


$$y_j(q) := \begin{cases} y_{j-1}(q) & \text{if } q < p_j, \\ \min\{y_{j-1}(q),\; y_{j-1}(q - p_j) + w_j\} & \text{if } q \ge p_j. \end{cases}$$


The optimal solution value is given by
$\max\{q \mid y_n(q) \le c\}$, and the running time of
DP-Profits is bounded by O(nU).
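
A compact Python sketch of DP-Profits is given below as an illustration. Two simplifications are assumptions of this sketch rather than part of the text: a single array is updated in place (iterating q downward) instead of keeping one array per item, and the default upper bound is the sum of all profits rather than the LP bound. The function name is hypothetical.

```python
def dp_profits(profits, weights, capacity, upper_bound=None):
    """Optimal knapsack value via dynamic programming by profits.

    y[q] holds the minimal weight of a subset with total profit exactly q,
    or capacity + 1 if no feasible subset reaches profit q.
    """
    U = sum(profits) if upper_bound is None else upper_bound
    INF = capacity + 1
    y = [0] + [INF] * U
    for p, w in zip(profits, weights):
        # Descending q guarantees that each item is used at most once.
        for q in range(U, p - 1, -1):
            candidate = min(y[q - p] + w, INF)
            if candidate < y[q]:
                y[q] = candidate
    return max(q for q in range(U + 1) if y[q] <= capacity)
```

For example, `dp_profits([60, 100, 120], [10, 20, 30], 50)` evaluates to 220, obtained by packing the second and third item.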


Theorem 1 (Ibarra, Kim) There is an FPTAS
for (KP) which runs in $O(n \log n + n/\varepsilon^2)$ time.


Proof The FPTAS is based on appropriate scaling
of the profit values $p_j$ and then running
DP-Profits with the scaled profit values. Scaling
means here that the given profit values $p_j$ are
replaced by new profits $\tilde{p}_j := \lfloor p_j / K \rfloor$
for an appropriately chosen constant K.

This scaling can be seen as a partitioning of 
the profit range into intervals of length K with 
starting points 0, K,2K,.... Naturally, for every 
profit value $p_j$, there is some integer value $i \ge 0$
such that $p_j$ falls into the interval $[iK, (i+1)K[$.
The scaling procedure generates for every $p_j$ the
value $\tilde{p}_j$ as the corresponding index i of the
lower interval bound iK.

Running DP-Profits yields a solution set $\tilde{X}$ for
the scaled items which will usually be different
from the original optimal solution set $X^*$. Evaluating
the original profits of item set $\tilde{X}$ yields the
approximate solution value $z^H$. The difference
between $z^H$ and the optimal solution value can
be bounded as follows:


$$z^H \ge \sum_{j \in \tilde{X}} K \left\lfloor \frac{p_j}{K} \right\rfloor \ge \sum_{j \in X^*} K \left\lfloor \frac{p_j}{K} \right\rfloor \ge \sum_{j \in X^*} K \left( \frac{p_j}{K} - 1 \right) = z^* - |X^*|\, K.$$


To get the desired performance guarantee of $1 - \varepsilon$,
it is sufficient to have


$$\frac{z^* - z^H}{z^*} \le \frac{|X^*|\, K}{z^*} \le \varepsilon.$$


To ensure this, K has to be chosen such that


$$K \le \frac{\varepsilon z^*}{|X^*|}. \qquad (4)$$


Since $n \ge |X^*|$ and $U_{LP}/2 \le z^*$, choosing $K := \varepsilon U_{LP} / (2n)$
satisfies condition (4) and thus guarantees
the performance ratio of $1 - \varepsilon$. Substituting U in
the O(nU) bound for DP-Profits by U/K yields
an overall running time of $O(n^2/\varepsilon)$.
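
The scaling step can be sketched in Python as follows. This is an illustration of the basic $O(n^2/\varepsilon)$ variant only, not the refined algorithm of the theorem. It assumes positive integer weights and nonnegative integer profits, discards items that do not fit on their own, and stores explicit item lists per table entry, which costs more memory than a careful implementation would. All function names are hypothetical.

```python
from fractions import Fraction


def lp_upper_bound(profits, weights, capacity):
    """Floor of the LP-relaxation optimum, computed greedily by profit/weight ratio.

    Assumes positive integer weights, each at most `capacity`.
    """
    order = sorted(range(len(profits)),
                   key=lambda j: Fraction(profits[j], weights[j]), reverse=True)
    total, residual = 0, capacity
    for j in order:
        if weights[j] <= residual:
            total += profits[j]
            residual -= weights[j]
        else:
            # Break item: only the fraction that still fits counts (rounded down).
            total += profits[j] * residual // weights[j]
            break
    return total


def knapsack_fptas(profits, weights, capacity, eps):
    """Return (value, items) with value >= (1 - eps) * optimum, via profit scaling."""
    items = [j for j in range(len(profits)) if weights[j] <= capacity]
    if not items:
        return 0, []
    p = [profits[j] for j in items]
    w = [weights[j] for j in items]
    n = len(items)
    u_lp = lp_upper_bound(p, w, capacity)
    if u_lp == 0:
        return 0, []
    K = Fraction(eps) * u_lp / (2 * n)      # K := eps * U_LP / (2n)
    scaled = [pj // K for pj in p]          # scaled profits floor(p_j / K)
    u_scaled = u_lp // K                    # bound on the scaled profit of any feasible set
    INF = capacity + 1
    y = [0] + [INF] * u_scaled              # minimal weight per scaled profit value
    chosen = [[]] + [None] * u_scaled       # one item set attaining y[q]
    for idx in range(n):
        for q in range(u_scaled, scaled[idx] - 1, -1):
            candidate = y[q - scaled[idx]] + w[idx]
            if candidate < y[q]:
                y[q] = candidate
                chosen[q] = chosen[q - scaled[idx]] + [items[idx]]
    best_q = max(q for q in range(u_scaled + 1) if y[q] <= capacity)
    solution = chosen[best_q]
    # Evaluate the original (unscaled) profits of the selected items.
    return sum(profits[j] for j in solution), solution
```

Exact rational arithmetic via `Fraction` avoids rounding surprises in the floor operations. The table has about $2n/\varepsilon$ entries, in line with the substitution of U by U/K in the running-time bound.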

A further improvement in the running time is
obtained in the following way. Separate the items
into small items (having profit $\le \frac{\varepsilon}{2} U_{LP}$) and large
items (having profit $> \frac{\varepsilon}{2} U_{LP}$). Then, perform
DP-Profits for the scaled large items only. To each
entry q of the obtained dynamic programming array
with corresponding weight y(q), the small items
are added to a knapsack with residual capacity
c − y(q) in a greedy way. The small items shall
be sorted in nonincreasing order of their profit to
weight ratio. Out of the resulting combined profit
values, the highest one is selected. Since every
optimal solution contains at most $2/\varepsilon$ large items,
$|X^*|$ can be replaced in (4) by $2/\varepsilon$, which results
in an overall running time of $O(n \log n + n/\varepsilon^2)$. The
memory requirement of the algorithm is $O(n + 1/\varepsilon^3)$.




Two important approximation schemes with 
advanced treatment of items and algorithmic fine-tuning
were presented some years later. The classical
paper by Lawler [5] gives a refined scaling
and partitioning of the items and several
other algorithmic improvements, which results in
a running time of $O(n \log(1/\varepsilon) + 1/\varepsilon^4)$. A second
paper by Magazine and Oguz [6] contains, among
other features, a partitioning and recombination
technique to reduce the space requirements of
the dynamic programming procedure. The fastest
algorithm is due to Kellerer and Pferschy [2, 3]
with running time
$O(n \min\{\log n, \log(1/\varepsilon)\} + 1/\varepsilon^2 \log(1/\varepsilon) \cdot \min\{n, 1/\varepsilon \log(1/\varepsilon)\})$
and space requirement $O(n + 1/\varepsilon^2)$.


Applications 


(KP) is one of the classical problems in combi- 
natorial optimization. Since (KP) has this simple 
structure and since there are efficient algorithms 
for solving it, many solution methods of more 
complex problems employ the knapsack problem 
(sometimes iteratively) as a subproblem. 

A straightforward interpretation of (KP) is an 
investment problem. A wealthy individual or in- 
stitutional investor has a certain amount of money 
c available which he wants to put into profitable 
business projects. As a basis for his decisions, he 
compiles a long list of possible investments in- 
cluding for every investment the required amount 
$w_j$ and the expected net return $p_j$ over a fixed
period. The aspect of risk is not explicitly taken 
into account here. Obviously, the combination of 
the binary decisions for every investment such 
that the overall return on investment is as large 
as possible can be formulated by (KP). 
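
As a small illustration of the investment interpretation, the following sketch solves (KP) exactly by the textbook dynamic program over capacities. It assumes that the required amounts and the budget are nonnegative integers; the function name `knapsack` and the sample figures are hypothetical and serve only as an example.

```python
def knapsack(profits, weights, capacity):
    """Exact 0/1 knapsack by dynamic programming over integer capacities.

    profits[j], weights[j]: expected return and required amount of investment j.
    Returns (best_total_profit, sorted list of chosen investment indices).
    """
    best = [0] * (capacity + 1)                       # best[c]: max profit with budget c
    take = [[False] * (capacity + 1) for _ in profits]
    for j, (p, w) in enumerate(zip(profits, weights)):
        # Scan capacities downward so that each investment is used at most once.
        for c in range(capacity, w - 1, -1):
            if best[c - w] + p > best[c]:
                best[c] = best[c - w] + p
                take[j][c] = True
    # Trace the binary decisions back from the full budget.
    chosen, c = [], capacity
    for j in range(len(profits) - 1, -1, -1):
        if take[j][c]:
            chosen.append(j)
            c -= weights[j]
    return best[capacity], sorted(chosen)
```

For instance, with returns (60, 100, 120), required amounts (10, 20, 30), and a budget of 50, the second and third investments are selected. The running time is proportional to the number of items times the budget, so this is a pseudopolynomial method rather than one of the approximation schemes.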

One may also view (KP) as a “cutting”
problem. Assume that a sawmill has to cut a log
into shorter pieces. The pieces must, however, be
cut into some predefined standard lengths $w_j$,
where each length has an associated selling price
$p_j$. In order to maximize the profit from the log,
the sawmill can formulate the problem as a (KP)
where the length of the log defines the capacity c.

Among the wide range of “real-world”
applications, we mention two-dimensional
cutting problems, column generation, separation
of cover inequalities, financial decision problems,
asset-backed securitization, scheduling problems,
knapsack cryptosystems, and, most recently,
combinatorial auctions. For a survey on
applications of knapsack problems, the reader
is referred to [4].


Problem Definition


What is the role of knowledge in distributed
computing?

Actions taken by a process in a distributed 
system can only be based on its local information 
or local knowledge. Indeed, in reasoning about 
distributed protocols, people often talk informally 
about what processes know about the state of the 
system and about the progress of the computa- 
tion. Can the informal reasoning about knowl- 
edge in distributed and multi-agent systems be 
given a rigorous mathematical formulation, and 
what uses can this have? 


Key Results 


In [4] Halpern and Moses initiated a theory
of knowledge in distributed systems. They
suggested that states of knowledge ascribed
to groups of processes, especially common
knowledge, have an important role to play.
Knowledge-based analysis of distributed
protocols has generalized well-known results
and enabled the discovery of new ones. These
include new efficient solutions to basic problems,
tools for relating results in different models,
and proofs of lower bounds and impossibility
results. For example, the inability to attain
common knowledge when communication is
unreliable was established in [4] and shown to
imply and generalize the impossibility of solving
the Coordinated Attack problem. Chandy and Misra
showed in [1] that in asynchronous systems there
is a tight connection between the manner in which
knowledge is gained or lost and the message chains
that underlie Lamport’s notion of potential causality.


Modeling Knowledge 

In philosophy, knowledge is often modeled by
so-called possible-worlds semantics. Roughly
speaking, at a given “world,” an agent will know
a given fact to be true precisely if this fact holds at
all worlds that the agent “considers possible.” In
a distributed system, the agents are processes
or computing elements. A simple language
for reasoning about knowledge is obtained by
starting out with a set $\Phi = \{p, q, p', q', \ldots\}$
of propositions, or basic facts. The facts in $\Phi$
will depend on the application we wish to study;
they may involve statements such as $x = 0$
or $x > y$ concerning values of variables or
about other aspects of the computation (e.g., in
the analysis of mutual exclusion, a proposition
CS(i) may be used to state that process i is in
the critical section). We obtain a logical language
$\mathcal{L}_n^K = \mathcal{L}_n^K(\Phi)$ for knowledge, which is a set of
formulas, by the following inductive definition.
First, $p \in \mathcal{L}_n^K$ for all propositions $p \in \Phi$.
Moreover, for all formulas $\varphi, \psi \in \mathcal{L}_n^K$, the
language contains the formulas $\neg\varphi$ (standing
for “not $\varphi$”), $\varphi \wedge \psi$ (standing for “$\varphi$ and $\psi$”),
and $K_i\varphi$ (“process i knows $\varphi$”), for every
process $i \in \{1, \ldots, n\}$. (Using the operators
“$\neg$” and “$\wedge$,” we can express all of the Boolean
operators. Thus, $\varphi \vee \psi$ (“$\varphi$ or $\psi$”) can be
expressed as $\neg(\neg\varphi \wedge \neg\psi)$, while $\varphi \Rightarrow \psi$ (“$\varphi$
implies $\psi$”) is $\neg\varphi \vee \psi$, etc.) The language
$\mathcal{L}_n^K$ is the basis of a propositional logic of
knowledge. Using it, we can form formulas such
as $K_1\mathrm{CS}(1) \wedge K_1 K_2 \neg\mathrm{CS}(2)$, which states that
“process 1 knows that it is in the critical section,
and it knows that process 2 knows that 2 is not in
the critical section.” $\mathcal{L}_n^K$ determines the syntax of
formulas of the logic. A mathematical definition
of what the formulas mean is called its semantics.

A process will typically know different things
in different computations; even within a computation,
its knowledge changes over time as a
result of communication and of observing various
events. We refer to time t in a run (or computation)
r by the pair (r, t), which is called a point.
Formulas are considered to be true or false at a
point (r, t), with respect to a set of runs R (we
call R a system). The set of points of R is denoted
by Pts(R).

The definition of knowledge is based on the
idea that at any given point (r, t), each process i
has a well-defined view, which depends on i’s
history up to time t in r. This view may consist
of all events that i has observed, or of a much
more restricted amount of information, which is
considered to be available to i at (r, t). In the
language of [3], this view can be thought of as
being process i’s local state at (r, t), which we
denote by $r_i(t)$. Intuitively, a process is assumed
to be able to distinguish two points iff its local
state at one is different from its state at the other.
In a given system R, the meaning of the propositions
in a set $\Phi$ needs to be defined explicitly.
This is done by way of an interpretation
$\pi : \Phi \times \mathrm{Pts}(R) \to \{\text{True}, \text{False}\}$. The pair $\mathcal{I} = (R, \pi)$
is called an interpreted system. We denote the
fact that $\varphi$ is satisfied, or true, at a point (r, t) in
the system $\mathcal{I}$ by $(\mathcal{I}, r, t) \models \varphi$. Semantics of $\mathcal{L}_n^K(\Phi)$
with respect to an interpreted system $\mathcal{I}$ is given
by defining the satisfaction relation “$\models$” by
induction on the structure of the formulas, as
follows:


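$(\mathcal{I}, r, t) \models p$ iff $\pi(p, (r, t)) = \text{True}$, for a proposition $p \in \Phi$

$(\mathcal{I}, r, t) \models \neg\varphi$ iff $(\mathcal{I}, r, t) \not\models \varphi$

$(\mathcal{I}, r, t) \models \varphi \wedge \psi$ iff both $(\mathcal{I}, r, t) \models \varphi$ and $(\mathcal{I}, r, t) \models \psi$

$(\mathcal{I}, r, t) \models K_i\varphi$ iff $(\mathcal{I}, r', t') \models \varphi$ whenever $r'_i(t') = r_i(t)$ and $(r', t') \in \mathrm{Pts}(R)$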


The fourth clause, which defines satisfaction
for knowledge formulas, can be applied repeatedly.
This gives meaning to formulas involving
knowledge about formulas that themselves
involve knowledge, such as
$K_2(\mathrm{CS}(2) \wedge \neg K_1 \neg\mathrm{CS}(2))$. Knowledge here is ascribed to
processes. The intuition is that the local state
captures all of the information available to the
process. If there is a scenario leading to another
point at which a fact $\varphi$ is false and the process
has the same state as it has now, then the process
does not know $\varphi$.
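
The clause for knowledge formulas can be illustrated on a small finite example. The sketch below is an assumption-laden toy, not part of the original text: a point is abstracted to a pair of values `(x, y)`, process 0 observes only `x`, process 1 observes only `y`, and the names `knows`, `local_state`, and `x_is_zero` are hypothetical.

```python
def knows(i, fact, point, points, local_state):
    """Return True iff process i knows `fact` at `point`.

    The fact must hold at every point of the system that process i
    cannot distinguish from `point`, i.e., where its local state is the same.
    """
    here = local_state(i, point)
    return all(fact(q) for q in points if local_state(i, q) == here)


def local_state(i, point):
    # Toy assumption: process i sees only the i-th component of the point.
    return point[i]


def x_is_zero(point):
    return point[0] == 0


# Toy system: all pairs (x, y) over {0, 1} with x <= y.
points = [(x, y) for x in (0, 1) for y in (0, 1) if x <= y]


def p1_knows_x_is_zero(point):
    # The formula K_1(x = 0), evaluated at a point; usable as a nested fact.
    return knows(1, x_is_zero, point, points, local_state)
```

At the point (0, 0), process 1 knows that x = 0, because y = 0 forces x = 0 in this system. Process 0 does not know that process 1 knows it, since process 0 cannot rule out the point (0, 1), where process 1 considers x = 1 possible.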

This notion of knowledge has fairly strong
properties that distinguish it from what one might
consider a reasonable notion of, say, human
knowledge. For example, it does not depend 
on computation, thoughts, or a derivation of 
what the process knows. It is purely “information 
based.” Indeed, any fact that holds at all elements 
of Pts(R) (e.g., the protocol that processes 
are following) is automatically known to all 
processes. Moreover, it is not assumed that a 
process can report its knowledge or that its 
knowledge is explicitly recorded in the local 
state. This notion of knowledge can be thought of 
as being ascribed to the processes by an external 
observer and is especially useful for analysis by 
a protocol designer. 


Common Knowledge and Coordinated 
Attack 

A classic example of a problem for which the 
knowledge terminology can provide insight is 
Jim Gray’s Coordinated Attack problem. We 
present it in the style of [4]: 


The Coordinated Attack Problem 

Two divisions of an army are camped on two hilltops,
and the enemy awaits in the valley below.
Neither general will decide to attack unless he is
sure that the other will attack with him, because
only a simultaneous attack guarantees victory.
The divisions do not initially have plans to attack,
and one of the commanding generals wishes to
coordinate a simultaneous attack (at some time
the next day). The generals can only communicate
by means of a messenger. Normally, it takes
the messenger 1 h to get from one encampment
to the other. However, the messenger can lose his
way or be captured by the enemy. Fortunately, on
this particular night, everything goes smoothly.
How long will it take them to coordinate an
attack?

It is possible to show by induction that k trips
of the messenger do not suffice, for all $k \ge 0$,
and hence the generals will be unable to attack.
Gray used this example to illustrate the impact
of unreliable communication on the ability to
consistently update distinct sites of a distributed
database. A much stronger result that generalizes
this and applies directly to practical problems can
be obtained based on a notion called common
knowledge. Given a group $G \subseteq \{1, \ldots, n\}$ of processes,
we define two new logical operators $E_G$
and $C_G$, corresponding to everyone (in G) knows
and is common knowledge in G, respectively. We
shall denote $E_G^1\varphi = E_G\varphi$ and inductively define
$E_G^{k+1}\varphi = E_G(E_G^k\varphi)$. Satisfaction for the new
operators is given by


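$(\mathcal{I}, r, t) \models E_G\varphi$ iff $(\mathcal{I}, r, t) \models K_i\varphi$ holds for all $i \in G$

$(\mathcal{I}, r, t) \models C_G\varphi$ iff $(\mathcal{I}, r, t) \models E_G^k\varphi$ holds for all $k \ge 1$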
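
On a finite set of points, the two operators can be computed directly from their definitions. The following sketch is illustrative only: facts are represented as sets of points, the function names are hypothetical, and the five-point example with its two local-state functions is invented for the purpose. Because knowing a fact implies that the fact holds, the sets for $E_G^k\varphi$ shrink as k grows, so iterating until nothing changes yields the points satisfying $C_G\varphi$.

```python
def everyone_knows(group, fact_points, points, local_state):
    """Points at which every process in `group` knows the fact.

    fact_points: set of points at which the fact holds.
    """
    result = set()
    for p in points:
        if all(q in fact_points
               for i in group
               for q in points
               if local_state(i, q) == local_state(i, p)):
            result.add(p)
    return result


def common_knowledge(group, fact_points, points, local_state):
    """Points at which the fact is common knowledge in `group` (finite systems)."""
    current = everyone_knows(group, fact_points, points, local_state)
    while True:
        nxt = everyone_knows(group, current, points, local_state)
        if nxt == current:       # fixed point reached: E^k holds for every k
            return current
        current = nxt


# Invented example: points 0..4; process 0 confuses {0,1} and {2,3},
# process 1 confuses {1,2} and {3,4}.
points = [0, 1, 2, 3, 4]


def local_state(i, p):
    return p // 2 if i == 0 else (p + 1) // 2


not_four = {0, 1, 2, 3}   # the fact "the current point is not 4"
```

In this example everyone knows the fact at points 0, 1, and 2, yet it is common knowledge at no point: each further level of "everyone knows" removes one more point along the chain of indistinguishable points.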


Somewhat surprisingly, common knowledge 
is not uncommon in practice. People shake hands 
to signal that they attained common knowledge 
of an agreement, for example. Similarly, a public 
announcement to a class or to an audience is 
considered common knowledge. Indeed, as we 
now discuss, simultaneous actions can lead to 
common knowledge. 

Returning to the Coordinated Attack problem,
consider three propositions $\mathit{attack}_A$, $\mathit{attack}_B$,
and $\mathit{delivered}$, corresponding, respectively, to
“general A is attacking,” “general B is attacking,”
and “at least one message has been delivered.”
The fact that the generals do not have
a plan to attack can be formalized by saying
that at least one of them does not attack unless
$\mathit{delivered}$ is true. Consider a set of runs R consisting
of all possible interactions of the generals
in the above setting. Suppose that the generals
follow the specifications, so they only ever attack
simultaneously at points of R. Then, roughly
speaking, since the generals’ actions depend on
their local state, general A knows when $\mathit{attack}_A$
is true. But since they only attack simultaneously
and $\mathit{attack}_B$ is true whenever $\mathit{attack}_A$ is true,
$K_A\mathit{attack}_B$ will hold whenever general A attacks.
Since B similarly knows when A attacks,
$K_B K_A\mathit{attack}_B$ will hold as well. Indeed, it can
be shown that when the generals attack in a system
that guarantees that attacks are simultaneous,
they must have common knowledge that they are
attacking.


Theorem 1 (Halpern and Moses [4]) Let R be
a system with unreliable communication, let
$\mathcal{I} = (R, \pi)$, let $(r, t) \in \mathrm{Pts}(R)$, and assume that
$|G| > 1$. Then $(\mathcal{I}, r, t) \models \neg C_G\,\mathit{delivered}$.


As in the case of Coordinated Attack, simultaneous
actions must be common knowledge
when they are performed. Moreover, in cases in
which such actions require a minimal amount of
communication to materialize, $C_G\,\mathit{delivered}$
must hold when they are performed. Theorem 1
implies that no such actions can be coordinated
when communication is unreliable. One immediate
consequence is:


Corollary 1 Under a protocol that satisfies the 
constraints of the Coordinated Attack problem, 
the generals never attack. 


The connection between common knowledge 
and simultaneous actions goes even deeper. It can 
be shown that when a fact that is not common 
knowledge to G becomes common knowledge 
to G, a state transition must occur simultaneously 
at all sites in G. If simultaneity cannot be coordinated
in the system, common knowledge cannot
be attained. This raises some philosophical
issues: Events and transitions that are viewed as
being simultaneous in a system that is modeled 
at a particular (“coarse”) granularity of time will 
fail to be simultaneous when time is modeled at 
a finer granularity. As discussed in [4], this is 
not quite a paradox, since there are many settings 
in which it is acceptable, and even desirable, to 
model interactions at a granularity of time in 
which simultaneous transitions do occur. 


A Hierarchy of States of Knowledge and 
Common Knowledge 

Common knowledge is a much stronger state of
knowledge than, say, knowledge of an individual
process. Indeed, it is best viewed as a state
of knowledge of a group. There is an essential
difference between $E_G^k$ (everyone knows that everyone
knows, for k levels), even for large k, and
$C_G$ (common knowledge). Indeed, for every k,
there are examples of tasks that can be achieved if
$E_G^{k+1}\varphi$ holds but not if $E_G^k\varphi$ does. This suggests
the existence of a hierarchy of states of group
knowledge, ranging from $E_G\varphi$ to $C_G\varphi$. But it
is also possible to define natural states of knowledge
for a group that are weaker than these. One
is $S_G$, where $S_G\varphi$ is true if $\bigvee_{i \in G} K_i\varphi$ holds, that is,
someone in G knows $\varphi$. Even weaker is distributed
knowledge, denoted by $D_G$, which is defined by


$(\mathcal{I}, r, t) \models D_G\varphi$ iff $(\mathcal{I}, r', t') \models \varphi$ for all $(r', t')$ satisfying $r'_i(t') = r_i(t)$ for all $i \in G$


Roughly speaking, the distributed knowledge 
of a group corresponds to what follows from the 
combined information of the group at a given 
instant. Thus, for example, if all processes start 
out with initial value 1, they will have distributed 
knowledge of this fact, even if no single process 
knows this individually. Halpern and Moses pro- 
pose a hierarchy of states of group knowledge and 
suggest that communication can often be seen as 
the process of moving the state of knowledge up 
the hierarchy: 


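$C_G\varphi \Rightarrow \cdots \Rightarrow E_G^{k+1}\varphi \Rightarrow \cdots \Rightarrow E_G\varphi \Rightarrow S_G\varphi \Rightarrow D_G\varphi$.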
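
The gap between the two weakest states in the hierarchy can be checked on the initial-value example. The sketch below rests on simplifying assumptions that are not in the original text: a point is just the tuple of initial values of three processes, the local state of process i is its own value, and the function names are hypothetical.

```python
from itertools import product


def knows(i, fact, point, points):
    """K_i: the fact holds at all points where process i has the same local state."""
    return all(fact(q) for q in points if q[i] == point[i])


def someone_knows(group, fact, point, points):
    """S_G: at least one member of the group knows the fact."""
    return any(knows(i, fact, point, points) for i in group)


def distributed_knowledge(group, fact, point, points):
    """D_G: the fact holds at all points that agree with `point`
    on the local states of all members of the group simultaneously."""
    return all(fact(q) for q in points
               if all(q[i] == point[i] for i in group))


def all_ones(point):
    return all(value == 1 for value in point)


# Every combination of initial values in {0, 1} for three processes is possible.
points = list(product((0, 1), repeat=3))
group = [0, 1, 2]
```

At the point (1, 1, 1), no single process knows that all initial values are 1, so $S_G$ fails for this fact, while pooling the three local states leaves (1, 1, 1) as the only candidate, so $D_G$ holds.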


Knowledge Gain and Loss in
Asynchronous Systems

In asynchronous systems there are no guarantees 
about the pace at which communication is deliv- 
ered and no guarantees about the relative rates 
at which processes operate. This motivated Lam- 
port’s definition of the happened-before relation 
among events. It is based on the intuition that in 
asynchronous systems only information obtained 
via message chains can affect the activity at a
given site. A crisp formalization of this intuition
was discovered by Chandy and Misra in [1]: 


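Theorem 2 (Chandy and Misra) Let $\mathcal{I}$ be an
asynchronous interpreted system, let $\varphi \in \mathcal{L}_n^K$,
and let $t' > t$. Then

Knowledge Gain: If $(\mathcal{I}, r, t) \not\models K_{i_1}\varphi$ and
$(\mathcal{I}, r, t') \models K_{i_m} K_{i_{m-1}} \cdots K_{i_1}\varphi$, then there
is a message chain through processes
$\langle i_1, i_2, \ldots, i_m \rangle$ in r between times t and t'.

Knowledge Loss: If $(\mathcal{I}, r, t) \models K_{i_m} K_{i_{m-1}} \cdots K_{i_1}\varphi$
and $(\mathcal{I}, r, t') \not\models \varphi$, then there is a message
chain through processes $\langle i_m, i_{m-1}, \ldots, i_1 \rangle$ in
r between times t and t'.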


Note that the second clause implies that sending 
messages can cause a process to lose knowledge 
about other sites. Roughly speaking, the only 
way a process can know a nontrivial fact about 
a remote site is if this fact can only be changed 
by explicit permission from the process. 


Applications and Extensions

The knowledge framework has been used in several
ways. We have already seen its use for
proving impossibility results in the discussion of
the Coordinated Attack example. One interesting
use of the formalism is as a tool for expressing
knowledge-based protocols, in which programs
can contain tests such as if $K_i$(msg received)
then .... Halpern and Zuck, for example, showed
that distinct solutions to the sequence transmission
problem under different assumptions regarding
communication faults were all implementations
of the same knowledge-based protocol [5].

A knowledge-based analysis can lead to
the design of efficient, sometimes optimal,
distributed protocols. Dwork and Moses analyzed
when facts become common knowledge when
processes can crash and obtained an efficient and
optimal solution to simultaneous consensus in
which decisions are taken when initial values
become common knowledge [2]. Moses and
Tuttle showed that in a slightly harsher failure
model, similar optimal solutions exist, but
they are not computationally efficient, because
computing when values are common knowledge
is NP-hard [7]. A thorough exposition of
reasoning about knowledge in a variety of fields
including distributed systems, game theory, and
philosophy appears in [3], while a later discussion
of the role of knowledge in coordination, with
further references, appears in [6].


Problem Definition


Treewidth is an important and widely used
graph parameter. Informally, the treewidth of a
graph measures how close the graph is to being
a tree. In particular, low-treewidth graphs often
exhibit behavior somewhat similar to that of trees,
in that many problems can be solved efficiently
on such graphs, often by using dynamic programming.
The treewidth of a graph $G = (V, E)$
is typically defined via tree decompositions. A
tree decomposition for G consists of a tree
$T = (V(T), E(T))$ and a collection of sets
$\{X_v \subseteq V\}_{v \in V(T)}$ called bags, such that the following
two properties are satisfied: (i) for each edge
$(a, b) \in E$, there is some node $v \in V(T)$
with both $a, b \in X_v$, and (ii) for each vertex
$a \in V$, the set of all nodes of T whose bags
contain a forms a nonempty (connected) subtree
of T. The width of a given tree decomposition is
$\max_{v \in V(T)} \{|X_v| - 1\}$, and the treewidth of a graph
G, denoted by tw(G), is the width of a minimum-width
tree decomposition for G.
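
The two defining properties and the width can be checked mechanically for a given candidate decomposition. The following sketch is a minimal illustration: the function names and the input format (bags as a dictionary from tree nodes to vertex sets) are hypothetical, and the code assumes that `tree_edges` really forms a tree on the bag nodes, which it does not verify.

```python
from collections import deque


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check properties (i) and (ii) for bags indexed by the nodes of a tree T.

    bags: dict mapping each tree node to a set of graph vertices.
    tree_edges: pairs of tree nodes; assumed (not checked) to form a tree.
    """
    # Property (i): every graph edge lies inside some bag.
    for a, b in edges:
        if not any(a in bag and b in bag for bag in bags.values()):
            return False
    adjacency = {node: set() for node in bags}
    for u, v in tree_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Property (ii): the nodes whose bags contain a vertex form a
    # nonempty connected subtree of T.
    for a in vertices:
        holders = {node for node, bag in bags.items() if a in bag}
        if not holders:
            return False
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor in holders and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a decomposition: the largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the cycle on vertices 1, 2, 3, 4, the two bags {1, 2, 3} and {1, 3, 4} joined by a single tree edge form a valid decomposition of width 2.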

In large-treewidth graph decompositions, we
seek to partition a given graph G into a large
number of disjoint subgraphs $G_1, \ldots, G_h$, where
each subgraph $G_i$ has a large treewidth. Specifically,
if k denotes the treewidth of G, h is the
desired number of subgraphs in the decomposition,
and r is the desired lower bound on
the treewidth of each subgraph $G_i$, then we are
interested in efficient algorithms that partition
any input graph G of treewidth k into h disjoint
subgraphs of treewidth at least r each, and in
establishing the bounds on h and r, in terms of
k, for which such a partition exists.


Key Results 


The main result of [1] is summarized in the 
following theorem. 


Theorem 1 There is an efficient algorithm that,
given integers $h, r, k \ge 0$, where either
$hr^2 \le k/\operatorname{poly}\log k$ or $h^3 r \le k/\operatorname{poly}\log k$ holds, and
a graph G of treewidth k, computes a partition of
G into h disjoint subgraphs of treewidth at least
r each.


Applications


While low-treewidth graphs can often be handled
well by dynamic programming, the major tool
for dealing with large-treewidth graphs so far has
been the Excluded Grid Theorem of Robertson
and Seymour [11]. The theorem states that there
is some function $g : \mathbb{Z}^+ \to \mathbb{Z}^+$, such that
for any integer t, every graph of treewidth at
least g(t) contains a $(t \times t)$-grid as a minor
(we say that a graph H is a minor of G iff we
can obtain H from G by a sequence of edge
deletions and edge contractions). A long line of
work is dedicated to improving the upper and
the lower bounds on the function g [2, 6, 7,
9-12]. The best current bounds show that the
theorem holds for $g(t) = O(t^{98} \cdot \operatorname{poly}\log(t))$ [2],
and the best negative result shows that
$g(t) = \Omega(t^2 \log t)$ must hold [12]. Robertson et al. [12]
suggest that $g(t) = \Theta(t^2 \log t)$ may be sufficient,
and Demaine et al. [5] conjecture that
the bound of $g(t) = \Theta(t^3)$ is both necessary
and sufficient. Large-treewidth graph decomposition
is a tool that makes it possible, in several
applications, to bypass the Excluded Grid Theorem
while obtaining stronger parameters. Such
applications include Erdős-Pósa-type results and
fixed-parameter tractable algorithms that rely on
the bidimensionality theory. We note that the
Excluded Grid Theorem of Robertson and Seymour
provides a large-treewidth graph decomposition
with weaker bounds. The most recent
polynomial bounds for the Excluded Grid Theorem
of [2] only ensure that a partition exists
for any h, r where $h^{49} r^{98} \le k/\operatorname{poly}\log k$. Prior
to the work of [1], the state-of-the-art bounds
for the Grid-Minor Theorem could only guarantee
that the partition exists whenever
$hr^2 \le \log k/\log\log k$.

We now provide several examples where the 
large-treewidth graph decomposition theorem 
can be used to improve previously known bounds. 


Erdős-Pósa-Type Results

A family $\mathcal{F}$ of graphs is said to satisfy the
Erdős-Pósa property iff there is an integer-valued
function $f_{\mathcal{F}}$, such that for every graph G, either
G contains k disjoint subgraphs isomorphic to
members of $\mathcal{F}$, or there is a set S of $f_{\mathcal{F}}(k)$ nodes,
such that $G \setminus S$ contains no subgraph isomorphic
to a member of $\mathcal{F}$. In other words, S is a cover,
or a hitting set, for $\mathcal{F}$ in G. Erdős and Pósa [8]
showed such a property when $\mathcal{F}$ is the family of
cycles, with $f_{\mathcal{F}}(k) = O(k \log k)$.

The Excluded Grid Theorem has been widely
used in proving Erdős-Pósa-type results, where
the specific parameters obtained depend on the
best known upper bound on the function g(k)
in the Excluded Grid Theorem. The parameters
in many Erdős-Pósa-type results can be significantly
strengthened using Theorem 1, as shown
in the following theorem:


Theorem 2 Let $\mathcal{F}$ be any family of connected
graphs and assume that there is an integer r, such
that any graph of treewidth at least r is guaranteed
to contain a subgraph isomorphic to a member
of $\mathcal{F}$. Then $f_{\mathcal{F}}(k) \le O(kr \cdot \operatorname{poly}\log(kr))$.


Combining Theorem 2 with the best current
bound for the Excluded Grid Theorem [2], we
obtain the following corollary.


Corollary 1 Let $\mathcal{F}$ be any family of connected
graphs, such that for some integer q, any graph
containing a $q \times q$ grid as a minor is guaranteed
to contain a subgraph isomorphic to a member of
$\mathcal{F}$. Then $f_{\mathcal{F}}(k) \le O(q^{98} k \cdot \operatorname{poly}\log(kq))$.


For a fixed graph H, let $\mathcal{F}(H)$ be the family of
all graphs that contain H as a minor. Robertson
and Seymour [11], as one of the applications
of their Excluded Grid Theorem, showed that
$\mathcal{F}(H)$ has the Erdős-Pósa property iff H is planar.
By directly applying Corollary 1, we get
the following improved near-linear dependence
on k.


Theorem 3 For any fixed planar graph H, the
family $\mathcal{F}(H)$ of graphs has the Erdős-Pósa property
with $f_{\mathcal{F}(H)}(k) = O(k \cdot \operatorname{poly}\log(k))$.


Improved Running Times for
Fixed-Parameter Tractability

The theory of bidimensionality [3] is a
powerful methodology in the design of fixed-parameter
tractable (FPT) algorithms. It led
to sub-exponential (in the parameter k) time
FPT algorithms for bidimensional parameters
in planar graphs and more generally graphs
that exclude a fixed graph H as a minor. The
theory is based on the Excluded Grid Theorem.
However, in general graphs, the weak bounds
of the Excluded Grid Theorem meant that one
could only derive FPT algorithms with running
time of the form $2^{k^c} n^{O(1)}$, for some large
constant c, by using the results of Demaine
and Hajiaghayi [4], and the recent polynomial
bounds for the Excluded Grid Theorem [2].
Using Theorem 1, we can obtain algorithms with
running times of the form $2^{k \cdot \operatorname{poly}\log(k)} n^{O(1)}$ for
the same class of problems as in [4].


Open Problems 


The authors conjecture that there is an efficient
algorithm that, given integers k, r, h with
$hr \le k/\operatorname{poly}\log k$, and any graph G of treewidth k,
finds a partition of G into h disjoint subgraphs of
treewidth at least r each. This remains an open
problem.


Experimental Results 


None are reported. 


URLs to Code and Data Sets 


None are reported.


Problem Definition


As the feature size keeps shrinking, it becomes
increasingly difficult to print circuit patterns using a
single lithography exposure. For 32/22 nm technology
nodes, double patterning lithography (DPL) is
one of the most promising techniques for the
industry. In DPL, a layout is decomposed into
two masks, where each feature in the layout is
uniquely assigned to one of the masks. By using
two masks that go through two separate exposures,
better printing resolution can be achieved.
For the 14/10 nm technology node and beyond, triple
patterning lithography (TPL) is one technique to
obtain qualified printing results. In TPL, a layout
is decomposed into three masks, which further
enhances the printing resolution. Currently, DPL and
TPL are the two most studied multiple patterning
techniques for advanced technology nodes [1-7].
Multiple patterning techniques such as quadruple
patterning lithography and beyond usually are not
investigated because of their increasing mask cost
and other technical issues.

For DPL/TPL, there is a minimum coloring
distance $d_{\min}$. If the distance between two features is
less than $d_{\min}$, they cannot be printed in the same
mask. $d_{\min}$ reflects the printing capabilities of
current technology and can be regarded as a
constant in the problem. One practical concern of
DPL/TPL is feature splitting, in which a feature
is split into two or more parts for a legal color
assignment. Such a splitting is called a stitch,
which increases manufacturing cost and complexity
due to additional line ends and tighter
overlay control. Therefore, minimizing the number
of stitches is a key objective for DPL/TPL
decompositions. Other concerns include minimizing
design rule violations, maximizing the
overlap length, and balancing the usage of different
colors. Among these concerns for DPL/TPL,
minimizing the number of stitches is the most
commonly studied one.

Multiple patterning decomposition is essentially
a graph k-coloring problem, where k = 2
for DPL and k = 3 for TPL. It is well
known that the 3-coloring problem is NP-complete,
even when the graph is planar. For general
layouts, ILP formulations are used in [1-3], and
some heuristics are proposed in [5-7]. In reality,
many industry designs are based on predesigned
standard cells, where the layout is usually in row
structures. All the cells are of exactly the same
height, with power rails running from the leftmost
edge of the cell to the rightmost edge. It is shown
that multiple patterning decomposition (k = 3)
for cell-based row-structure layouts is solvable in
polynomial time [4]. The following discussions are
based on k = 3. The same concept can be easily
extended to other multiple patterning techniques
such as k = 2 or k > 3.


Problem: Multiple Patterning
Decomposition

Using k colors to represent the k masks, multiple 
patterning decomposition can be defined as fol- 
lows: 


Input: Circuit layout and a minimum coloring
distance $d_{\min}$.

Output: A coloring solution where every feature
is assigned to one of the k colors.

Constraint: Any two features whose distance is
less than $d_{\min}$ cannot be assigned the same
color.


Key Results 


Multiple Patterning Decomposition for 
Standard Cell Designs 

Given a layout, a conflict graph $G = (V, E)$ is
constructed, where (I) the vertices $V = \{v_1, \ldots, v_n\}$
represent the features in the layout and (II) the edges
$E = \{e_1, \ldots, e_m\}$ represent conflicting relationships
between the features. A conflict edge exists if
the distance between the two features is within $d_{\min}$.
Imagine a cutting line that goes vertically across
the cell. Only a limited number of features
intersect with the cutting line, due to the
fixed height of the cell. Therefore, the coloring
solutions of each cutting line can be enumerated in
polynomial time. The set of polygons that intersect
with the same cutting line is called a cutting
line set. An example of a conflict graph, cutting
lines, and cutting line sets is shown in Fig. 1.
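
The construction of the conflict graph and of the cutting line sets can be sketched as follows. This is a simplified illustration under assumptions that are not in the original text: every feature is an axis-aligned rectangle given as `(x1, y1, x2, y2)`, the distance between features is the Euclidean gap between rectangles, a conflict means a distance strictly below the minimum coloring distance, and a cutting line is placed at the left boundary of each feature. All function names are hypothetical.

```python
from math import hypot


def rect_distance(a, b):
    """Euclidean gap between two axis-aligned rectangles (x1, y1, x2, y2)."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return hypot(dx, dy)


def conflict_edges(rects, d_min):
    """Pairs of feature indices whose distance is below d_min."""
    return [(i, j)
            for i in range(len(rects))
            for j in range(i + 1, len(rects))
            if rect_distance(rects[i], rects[j]) < d_min]


def cutting_line_sets(rects):
    """Map each cutting line (a feature's left boundary x) to the indices
    of the features that the vertical line at x intersects."""
    lines = sorted({r[0] for r in rects})
    return {x: [i for i, r in enumerate(rects) if r[0] <= x <= r[2]]
            for x in lines}
```

For three rectangles `(0, 0, 4, 1)`, `(3, 2, 5, 3)`, and `(8, 0, 9, 1)` with a minimum coloring distance of 2, only the first two conflict, and the cutting line at x = 3 intersects both of them.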

By using the left boundary of each feature as
a cutting line, the coloring solutions of each cutting
line are computed. Solutions of adjacent cutting lines are
connected together, which leads to a solution
graph. Polygon dummy extension is performed
to ensure that the constructed solution graph is
legal. For each polygon, its right boundary is
virtually extended to its rightmost conflicting
polygon. After extending the right boundaries
of the polygons, it is guaranteed that for any
polygon in a cutting line set, all its conflicting
polygons (with smaller x coordinates) appear
in the previous cutting line set. Therefore, the
solution graph can be incrementally constructed,
and the correctness of the graph is guaranteed.

Layout Decomposition for Multiple Patterning, Fig. 1 (a) Input layout. (b) Conflict graph. (c) Cutting lines and
the corresponding cutting line sets. There are four cutting lines $L_1$–$L_4$ and four cutting line sets $S_1$–$S_4$ in this layout

Layout Decomposition for Multiple Patterning,
Fig. 2 (a) Input layout with polygon dummy extension.
(b) Solution graph. The highlighted path is a sample
decomposition. (c) Sample decomposition. Different
colors represent different masks (legend: undecided,
mask 1, mask 2, mask 3)

The solution graph is complete in the sense
that it explores the entire solution space. It is proven
in [4] that every path in the solution graph corresponds
to a legal TPL decomposition and every
legal TPL decomposition corresponds to a path in
the solution graph. Figure 2 illustrates the overall
flow of their approach.


Minimizing Stitches

The approach can be extended to handle stitches.
All legal stitch candidates are computed for the
layout, where a polygon feature is decomposed
into a set of touching polygons by the stitch
candidates. A conflict graph $G = (V, E)$ is constructed
to model the rectangular layout, where
(I) the vertices $V = \{v_1, \ldots, v_n\}$ represent the
features in the layout and (II) the edges $E = \{e_1, \ldots, e_m\}$
represent different relationships between the features.
There are two types of edges in the graph:
conflict edges and stitch edges. A conflict edge
exists if the two features do not touch each other
and their distance is within $d_{\min}$. A stitch edge
exists if the two features touch each other.

A weighted solution graph is constructed,
where the weight of an edge denotes the number
of stitches needed between the two vertices.
A shortest path algorithm is utilized to obtain the
decomposition with the optimal number of stitches.
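
Because the solution graph connects only the solutions of adjacent cutting lines, a shortest path can be found by a single left-to-right pass. The sketch below illustrates that idea on an abstract layered graph and is not the algorithm of [4]: the function `min_stitch_path`, the `layers` input (one list of candidate solutions per cutting line), and the `weight` callback (stitch count between two compatible solutions, or `None` if they cannot be connected) are all hypothetical.

```python
def min_stitch_path(layers, weight):
    """Shortest path through a layered graph.

    layers: list of lists of hashable candidate solutions, one list per layer.
    weight(u, v): number of stitches between u and v, or None if not connected.
    Returns (total_stitches, path) or None if no path exists.
    """
    cost = {s: 0 for s in layers[0]}      # best cost of reaching each solution
    back = [{}]                           # back[k][v]: predecessor of v in layer k-1
    for prev, cur in zip(layers, layers[1:]):
        new_cost, pointers = {}, {}
        for v in cur:
            for u in prev:
                if u not in cost:
                    continue              # u itself is unreachable
                w = weight(u, v)
                if w is None:
                    continue              # incompatible solutions
                candidate = cost[u] + w
                if v not in new_cost or candidate < new_cost[v]:
                    new_cost[v] = candidate
                    pointers[v] = u
        cost = new_cost
        back.append(pointers)
    if not cost:
        return None
    end = min(cost, key=cost.get)
    path = [end]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return cost[end], path[::-1]


def stitch_weight(u, v):
    # Toy assumption: consecutive segments of one wire; a color change costs a stitch.
    return 0 if u == v else 1


# Toy input: allowed mask colors for three consecutive segments of a wire.
toy_layers = [[1, 2], [0, 1], [0, 2]]
```

In the toy input, no single color is allowed on all three segments, so at least one stitch is needed, and the pass above finds a coloring with exactly one.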


Multiple Patterning Coloring Constraint 

In practice, there are additional coloring constraints
such as balancing the usage of different
masks [4, 8] and assigning the same pattern to the
same type of cells [9]. For standard cell designs,
color balancing can be achieved simply by
using three global variables when parsing the
solution graph [4]. An efficient SAT formulation
with a limited number of clauses is used to guarantee
that cells of the same type have the same
coloring decomposition [9].


Applications 


Products using DPL in the 22 nm technology node
are already available on the market. TPL can be used
in the 14/10 nm technology node.


Open Problems 


None is reported. 


Experimental Results 


The authors in [1] show that as the minimum
coloring distance $d_{\min}$ increases, the number of
unsolved conflicts increases. They also observe
that the placement utilization has a very small
impact on the number of unsolved conflicts. The
results in [3] show that the speedup techniques
can greatly reduce the overall runtime without
adversely affecting the quality of the decomposition.
Better results on the same benchmarks
are reported in [5-7]. The authors in [4] show
that their algorithm is able to solve all TPL
decomposable benchmarks. For complex layouts
with stitch candidates, their approach computes
a decomposition with the optimal number of
stitches.


URLs to Code and Data Sets 

The NanGate Open Cell Library can be obtained 
online for free. 
Problem Definition


Layout decomposition is a key stage in the triple
patterning lithography manufacturing process,
where the original designed layout is divided into
three masks. There will be three exposure/etching
steps, through which the circuit layout can
be produced. When the distance between two
input features is less than a certain minimum
distance $min_s$, they need to be assigned to
different masks (colors) to avoid a coloring
conflict. Sometimes a coloring conflict can be
resolved by splitting a pattern into two different
masks. However, this introduces stitches, which
lead to yield loss because of overlay error.
Therefore, two of the main objectives in layout 
decomposition are conflict minimization and 
stitch minimization. An example of triple 
patterning layout decomposition is shown in 
Fig. 1, where all features are divided into three 
masks without any conflict and one stitch is 
introduced. 

Given an input layout, a conflict graph is
constructed to transform the initial geometric
relationships into an undirected graph with a set of
vertices $V$ and two sets of edges, which are
the conflict edges ($CE$) and stitch edges ($SE$),
respectively. $V$ has one or more vertices for each
polygonal shape, and each vertex is associated
with a polygonal shape. An edge is in $CE$ iff the
two corresponding vertices are within the minimum
coloring distance $min_s$. An edge is in $SE$ iff there
is a stitch candidate between the two vertices,
which are associated with the same polygonal
shape.
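
As a rough illustration of how the conflict edge set might be derived from geometry, the sketch below treats every feature as an axis-aligned rectangle and connects two features when they do not touch and their gap is below $min_s$. The rectangle encoding `(x1, y1, x2, y2)`, the Euclidean gap measure, and the function names are assumptions made for this example, not part of the cited work; stitch edges are not modeled.

```python
from itertools import combinations


def rect_gap(a, b):
    """Euclidean gap between axis-aligned rectangles (x1, y1, x2, y2).

    Returns 0 when the rectangles touch or overlap.
    """
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def conflict_edges(rects, min_s):
    """Return CE as a set of index pairs (i, j) with i < j.

    Assumption: two features conflict when they do not touch
    and their gap is strictly smaller than min_s.
    """
    ce = set()
    for i, j in combinations(range(len(rects)), 2):
        gap = rect_gap(rects[i], rects[j])
        if 0 < gap < min_s:
            ce.add((i, j))
    return ce
```

For three rectangles `[(0, 0, 10, 10), (12, 0, 20, 10), (40, 0, 50, 10)]` and `min_s = 5`, only the first two are close enough to form a conflict edge.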


Problem 1 (Layout Decomposition for Triple 
Patterning) 


INPUT: The decomposition graph where each 
vertex represents one polygonal shape, and 
all possible conflicts and stitches are in the 
conflict edge set CE and the stitch edge set 
SE, respectively. 

OUTPUT: A three-color assignment to the conflict
graph, such that the weighted cost of
conflicts and stitches is minimized. Additional
constraints may include color balancing,
overlay control, and color preference.


Key Results 


Given an input layout, the conflict graph is constructed.
Based on the conflict graph, layout
decomposition for triple patterning can be formulated
as an integer linear program (ILP) [5].
As shown in (1), the objective
of the ILP formulation is to minimize
the weighted cost of the conflict and stitch
numbers simultaneously:

$$\min \sum_{e_{ij} \in CE} c_{ij} + \alpha \sum_{e_{ij} \in SE} s_{ij} \qquad (1)$$

where $\alpha$ is a parameter for assigning the relative cost
of a stitch versus a conflict. Typically, $\alpha$ is much
smaller than 1, for example, 0.1, as resolving
conflicts is the most important objective during
layout decomposition. Although the ILP formulation
can solve the above layout decomposition
problem optimally, it does not scale to the
large layouts of modern VLSI designs, as the ILP
problem is NP-complete.

Layout Decomposition for Triple Patterning, Fig. 1 Layout decomposition for triple patterning lithography (TPL); the legend distinguishes Mask 1, Mask 2, and Mask 3

In [5], a semidefinite programming (SDP)-based
algorithm was proposed to achieve
good runtime and solution quality. Instead of
using two binary variables to represent three
masks, three unit vectors $(1, 0)$,
$(-\frac{1}{2}, \frac{\sqrt{3}}{2})$, and
$(-\frac{1}{2}, -\frac{\sqrt{3}}{2})$ are proposed to represent
them. Note that the angle between any two
vectors of the same color is 0, while the
angle between any two vectors with different
colors is $2\pi/3$. The inner product of two
$m$-dimensional vectors $v_i$ and $v_j$ is defined as
$v_i \cdot v_j = \sum_{k} v_{ik} v_{jk}$. Then for any two vectors
$v_i, v_j \in \{(1, 0), (-\frac{1}{2}, \frac{\sqrt{3}}{2}), (-\frac{1}{2}, -\frac{\sqrt{3}}{2})\}$,
the following property holds:

$$v_i \cdot v_j = \begin{cases} 1, & v_i = v_j \\ -\frac{1}{2}, & v_i \neq v_j \end{cases}$$

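Based on the vector representation, the layout
decomposition for triple patterning problem can
be written as the following vector program:

$$\min \sum_{e_{ij} \in CE} \frac{2}{3}\left(v_i \cdot v_j + \frac{1}{2}\right) + \frac{2\alpha}{3} \sum_{e_{ij} \in SE} (1 - v_i \cdot v_j) \qquad (2)$$

$$\text{s.t.} \quad v_i \in \left\{(1, 0), \left(-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right), \left(-\tfrac{1}{2}, -\tfrac{\sqrt{3}}{2}\right)\right\} \qquad (2a)$$

It shall be noted that $v_i$ here is discrete, which
makes the program very expensive to solve. The discrete
vector program is therefore relaxed to the corresponding
continuous formulation, which can be solved as
a standard semidefinite program (SDP), as
shown below:

$$\min \sum_{e_{ij} \in CE} \frac{2}{3}\left(y_i \cdot y_j + \frac{1}{2}\right) + \frac{2\alpha}{3} \sum_{e_{ij} \in SE} (1 - y_i \cdot y_j) \qquad (3)$$

$$\text{s.t.} \quad y_i \cdot y_i = 1, \quad \forall i \in V \qquad (3a)$$

$$-\frac{1}{2} \le y_i \cdot y_j, \quad \forall e_{ij} \in CE \qquad (3b)$$

$$Y \succeq 0 \qquad (3c)$$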

The resulting matrix $Y$, where $y_{ij} = y_i \cdot y_j$,
essentially provides relative coloring guidance
between two layout features (nodes in the conflict
graph). It is used to guide the final color
assignment. If $y_{ij}$ is close to 1, nodes $i$ and $j$
should be in the same mask; if $y_{ij}$ is close to
$-0.5$, nodes $i$ and $j$ tend to be in different masks.
The results show that with reasonable thresholds,
such as $0.9 < y_{ij} \le 1$ for the same mask
and $-0.5 \le y_{ij} < -0.4$ for different masks,
more than 80% of nodes/polygons are decided
by the global SDP. For the remaining values, heuristic
mapping algorithms are applied to assign
all vertices to their final colors.
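
The thresholding step can be pictured with a small sketch that sorts vertex pairs into "same mask", "different masks", and "undecided" groups. The matrix is taken as a plain list of lists, the thresholds are the example values quoted above, and the function name is hypothetical; the heuristic mapping of the undecided pairs is not shown.

```python
def classify_pairs(Y, same_thr=0.9, diff_thr=-0.4):
    """Split vertex pairs by the SDP guidance value y_ij.

    Returns three lists of (i, j) pairs with i < j:
    pairs guided to the same mask, pairs guided to different
    masks, and pairs left for a later heuristic.
    """
    same, different, undecided = [], [], []
    n = len(Y)
    for i in range(n):
        for j in range(i + 1, n):
            if Y[i][j] > same_thr:
                same.append((i, j))
            elif Y[i][j] < diff_thr:
                different.append((i, j))
            else:
                undecided.append((i, j))
    return same, different, undecided
```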

A set of graph simplification techniques has
been proposed to achieve speedup [1, 2, 5, 8]. For
example, one technique is called iterative vertex
removal, where all vertices with degree less than
or equal to two are detected and removed temporarily
from the conflict graph. After each vertex
removal, the degrees of the other vertices are
updated. This removal process continues
until all remaining vertices have degree three or more.
All the vertices that are temporarily removed
are stored in a stack $S$. After the color assignment
of the remaining conflict graph is solved,
the removed vertices in $S$ are added back for
color assignment. For row-based layout structures,
specific graph-based algorithms are proposed
to provide fast layout decomposition solutions
[4].
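
A minimal sketch of iterative vertex removal is given below, assuming the conflict graph is stored as a dictionary that maps each vertex to the set of its neighbors and that only conflict edges are present. The function names and the greedy "first free color" rule used when vertices are added back are assumptions of this example rather than the exact procedure of the cited papers. Because a removed vertex has at most two neighbors at removal time, one of the three colors is always free when it is restored.

```python
def iterative_vertex_removal(adj):
    """Peel off vertices of degree <= 2 until none remain.

    adj maps each vertex to the set of its neighbors.
    Returns (remaining_graph, stack); each stack entry is
    (vertex, neighbors_at_removal_time).
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    stack = []
    queue = [v for v in adj if len(adj[v]) <= 2]
    while queue:
        v = queue.pop()
        if v not in adj:
            continue  # already removed via an earlier queue entry
        neighbors = adj.pop(v)
        stack.append((v, neighbors))
        for u in neighbors:
            adj[u].discard(v)
            if len(adj[u]) <= 2:
                queue.append(u)
    return adj, stack


def restore_removed(coloring, stack):
    """Add removed vertices back in reverse order of removal."""
    coloring = dict(coloring)
    for v, neighbors in reversed(stack):
        used = {coloring[u] for u in neighbors}
        coloring[v] = min(c for c in range(3) if c not in used)
    return coloring
```

The remaining graph would be handed to the ILP or SDP solver; only afterwards are the stacked vertices colored.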

Triple patterning layout decomposition has
been actively studied in the last few years, with
many interesting results reported. In [5], the
performances of ILP- and SDP-based methods
were compared. As shown in Fig. 2, the SDP-based
method can achieve the same optimal solutions
as obtained by ILP for 14 out of 15 test cases.
However, the runtime of the ILP-based algorithm is
prohibitive when the problem size is big and the
layout is dense. Graph simplification techniques
are very effective in speeding up the layout
decomposition process, as they can effectively reduce
the ILP and SDP problem size. The coloring
density balance was integrated into the SDP
formulation in [6]. In [4], the SDP framework was
further extended to handle quadruple patterning
or more general multiple patterning lithography
with new vector definitions and linear-runtime
heuristic algorithms.

Layout Decomposition for Triple Patterning, Fig. 2 For the ISCAS benchmark suite, the results of ILP- and SDP-based methods are very comparable (bar chart of conflict number and stitch number for benchmarks C1-C10 and S1-S5)


URLs to Code and Data Sets 


Programs and benchmark suites can be found through
http://www.cerc.utexas.edu/utda/download/MPLD/.
Problem Definition


This problem is concerned with the learnability of 
multiplicity automata in Angluin’s exact learning 
model and applications to the learnability of func- 
tions represented by small multiplicity automata. 


The Learning Model 

It is the exact learning model [2]: Let $f$ be a
target function. A learning algorithm may propose
to an oracle, in each step, two kinds of
queries: membership queries (MQ) and equivalence
queries (EQ). In an MQ it may query for the
value of the function $f$ on a particular assignment
$z$. The response to such a query is the value $f(z)$.
(If $f$ is Boolean, this is the standard membership
query.) In an EQ it may propose to the oracle
a hypothesis function $h$. If $h$ is equivalent to
$f$ on all input assignments, then the answer to
the query is YES and the learning algorithm
succeeds and halts. Otherwise, the answer to
the equivalence query is NO and the algorithm
receives a counterexample, i.e., an assignment $z$
such that $f(z) \neq h(z)$. One says that the learner
learns a class of functions $C$ if, for every function
$f \in C$, the learner outputs a hypothesis $h$ that is
equivalent to $f$ and does so in time polynomial
in the “size” of a shortest representation of $f$
and the length of the longest counterexample.
The exact learning model is closely related to the
Probably Approximately Correct (PAC) model of
Valiant [19]. In fact, every equivalence query can
be easily simulated by a sample of random examples.
Therefore, learnability in the exact learning
model also implies learnability in the PAC model
with membership queries [2, 19].

Stefano Varricchio: deceased


Multiplicity Automata 

Let $K$ be a field, $\Sigma$ be an alphabet, and $\epsilon$ be the
empty string. A multiplicity automaton (MA) $A$
of size $r$ consists of $|\Sigma|$ matrices $\{\mu_\sigma : \sigma \in \Sigma\}$,
each of which is an $r \times r$ matrix of elements
from $K$, and an $r$-tuple $\gamma = (\gamma_1, \ldots, \gamma_r) \in K^r$.
The automaton $A$ defines a function $f_A : \Sigma^* \to K$
as follows. First, define a mapping $\mu$, which
associates with every string in $\Sigma^*$ an $r \times r$ matrix
over $K$, by $\mu(\epsilon) \triangleq ID$, where $ID$ denotes the
identity matrix, and for a string $w = \sigma_1 \sigma_2 \ldots \sigma_n$,
let $\mu(w) = \mu_{\sigma_1} \cdot \mu_{\sigma_2} \cdots \mu_{\sigma_n}$. A simple property
of $\mu$ is that $\mu(x \circ y) = \mu(x) \cdot \mu(y)$, where $\circ$
denotes concatenation. Now, $f_A(w) = [\mu(w)]_1 \cdot \gamma$
(where $[\mu(w)]_i$ denotes the $i$th row of the matrix
$\mu(w)$). Let $f : \Sigma^* \to K$ be a function. Associate
with $f$ an infinite matrix $F$, where each of its
rows is indexed by a string $x \in \Sigma^*$ and each of
its columns is indexed by a string $y \in \Sigma^*$. The
$(x, y)$ entry of $F$ contains the value $f(x \circ y)$.
The matrix $F$ is called the Hankel matrix of $f$.
The $x$th row of $F$ is denoted by $F_x$. The $(x, y)$
entry of $F$ may therefore be denoted as $F_x(y)$
and as $F_{x,y}$. The following result relates the size
of the minimal MA for $f$ to the rank of $F$ (cf. [4]
and references therein).


Theorem 1 Let $f : \Sigma^* \to K$ such that $f \not\equiv 0$
and let $F$ be its Hankel matrix. Then, the size $r$
of the smallest multiplicity automaton $A$ such that
$f_A \equiv f$ satisfies $r = \mathrm{rank}(F)$ (over the field $K$).
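
The definitions above can be made concrete with a small sketch over the rationals. It evaluates $f_A(w) = [\mu(w)]_1 \cdot \gamma$ for a two-state automaton that counts occurrences of the letter `a`, and computes the rank of a finite block of the Hankel matrix by Gaussian elimination. The example automaton, the restriction to strings of bounded length, and all function names are assumptions made for illustration; a finite block can only give a lower bound on $\mathrm{rank}(F)$ in general.

```python
from fractions import Fraction
from itertools import product


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    r = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(r)]
            for i in range(r)]


def ma_value(mu, gamma, w):
    """f_A(w): first row of mu(w) times the vector gamma."""
    r = len(gamma)
    M = [[int(i == j) for j in range(r)] for i in range(r)]  # mu(empty)
    for sigma in w:
        M = mat_mul(M, mu[sigma])
    return sum(M[0][k] * gamma[k] for k in range(r))


def hankel_block(f, alphabet, max_len):
    """Finite block of the Hankel matrix on strings of length <= max_len."""
    words = ["".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n)]
    return [[f(x + y) for y in words] for x in words]


def rank(rows):
    """Rank over the rationals via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


# Example automaton of size 2: f_A(w) = number of 'a' symbols in w.
MU = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
GAMMA = [0, 1]
```

For this automaton the Hankel block on strings of length at most 2 has rank 2, matching the size of the automaton.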


Key Results 

The learnability of multiplicity automata has 
been proved in [7] and, independently, in [17]. 
In what follows, let $K$ be a field, $f : \Sigma^* \to K$
be a function, and $F$ its Hankel matrix such that
$r = \mathrm{rank}(F)$ (over $K$).


Theorem 2 ([4]) The function $f$ is learnable by
an algorithm in time $O(|\Sigma| \cdot r \cdot M(r) + m \cdot r^3)$ using
$r$ equivalence queries and $O((|\Sigma| + \log m) r^2)$
membership queries, where $m$ is the size of the
longest counterexample obtained during the
execution of the algorithm and $M(r)$ is the complexity
of multiplying two $r \times r$ matrices.


Some extensions of the above result can be found
in [8, 13, 16]. In many cases of interest, the
domain of the target function $f$ is not $\Sigma^*$ but
rather $\Sigma^n$ for some value $n$, i.e., $f : \Sigma^n \to K$.
The length of counterexamples, in this case, is
always $n$, and so $m = n$. Denote by $F^d$ the
submatrix of $F$ whose rows are strings in $\Sigma^d$
and whose columns are strings in $\Sigma^{n-d}$, and let
$r_{max} = \max_{d=0}^{n} \mathrm{rank}(F^d)$ (where rank is taken
over $K$).


Theorem 3 ([4]) The function $f$ is learnable by
an algorithm in time $O(|\Sigma| r n \cdot M(r_{max}))$ using
$O(r)$ equivalence queries and
$O((|\Sigma| + \log n) r \cdot r_{max})$ membership queries.


The time complexity of the two above results has 
been recently further improved [9]. 


Applications 


The results of this section can be found in [3-6].
They show the learnability of various classes
of functions as a consequence of Theorems 2
and 3. This can be done by proving that for every
function $f$ in the class in question, the
corresponding Hankel matrix $F$ has low rank. As is
well known, any nondeterministic automaton can
be regarded as a multiplicity automaton, whose
associated function returns the number of
accepting paths of the nondeterministic automaton
on $w$. Therefore, the learnability of multiplicity
automata gives a new algorithm for learning
deterministic automata and unambiguous automata.
(A nondeterministic automaton is unambiguous
if for every $w \in \Sigma^*$, there is at most one
accepting path.) The learnability of deterministic
automata has been proved in [1]. By [14], the
class of deterministic automata contains the class
of $O(\log n)$-term DNF, i.e., DNF formulae over
$n$ Boolean variables with $O(\log n)$ terms.
Hence, this class can be learned using
multiplicity automata.


Classes of Polynomials 
Theorem 4 Let $p_{i,j} : \Sigma \to K$ be arbitrary
functions of a single variable ($1 \le i \le t$,
$1 \le j \le n$). Let $g_i : \Sigma^n \to K$ be defined by
$\prod_{j=1}^{n} p_{i,j}(z_j)$. Finally, let $f : \Sigma^n \to K$ be defined
by $f = \sum_{i=1}^{t} g_i$. Let $F$ be the Hankel matrix
corresponding to $f$ and $F^d$ the submatrices
defined in the previous section. Then, for every
$0 \le d \le n$, $\mathrm{rank}(F^d) \le t$.


Corollary 1 The class of functions that can
be expressed as functions over $GF(p)$ with
$t$ summands, where each summand $T_i$ is a
product of the form $p_{i,1}(x_1) \cdots p_{i,n}(x_n)$ (and
$p_{i,j} : GF(p) \to GF(p)$ are arbitrary functions),
is learnable in time $\mathrm{poly}(n, t, p)$.


The above corollary implies as a special case
the learnability of polynomials over $GF(p)$. This
extends the result of [18] from multilinear
polynomials to arbitrary polynomials. The algorithm
of Theorem 3, for polynomials with $n$ variables
and $t$ terms, uses $O(nt)$ equivalence queries and
$O(t^2 n \log n)$ membership queries. The special
case of the above class, the class of polynomials
over $GF(2)$, was known to be learnable before
[18]. Their algorithm uses $O(nt)$ equivalence
queries and $O(t^3 n)$ membership queries. The
following theorem extends the latter result to
infinite fields, assuming that the functions $p_{i,j}$ are
bounded-degree polynomials.


Theorem 5 The class of functions over a field $K$
that can be expressed as $t$ summands, where each
summand $T_i$ is of the form $p_{i,1}(x_1) \cdots p_{i,n}(x_n)$
and $p_{i,j} : K \to K$ are univariate polynomials
of degree at most $k$, is learnable in time
$\mathrm{poly}(n, t, k)$. Furthermore, if $|K| \ge nk + 1$, then
this class is learnable from membership queries
only in time $\mathrm{poly}(n, t, k)$ (with small probability
of error).
Classes of Boxes

Let $[\ell]$ denote the set $\{0, 1, \ldots, \ell - 1\}$. A box in
$[\ell]^n$ is defined by two corners $(a_1, \ldots, a_n)$ and
$(b_1, \ldots, b_n)$ (in $[\ell]^n$) as follows:

$$B_{a_1,\ldots,a_n,b_1,\ldots,b_n} = \{(x_1, \ldots, x_n) : \forall i,\ a_i \le x_i \le b_i\}.$$

A box can be represented by its characteristic
function in $[\ell]^n$. The following result concerns a
more general class of functions.


Theorem 6 Let $p_{i,j} : \Sigma \to \{0,1\}$ be arbitrary
functions of a single variable ($1 \le i \le t$,
$1 \le j \le n$). Let $g_i : \Sigma^n \to \{0,1\}$ be defined by
$\prod_{j=1}^{n} p_{i,j}(z_j)$. Assume that there is no point
$x \in \Sigma^n$ such that $g_i(x) = 1$ for more than $s$ functions
$g_i$. Finally, let $f : \Sigma^n \to \{0,1\}$ be defined
by $f = \bigvee_{i=1}^{t} g_i$. Let $F$ be the Hankel matrix
corresponding to $f$. Then, for every field $K$ and
for every $0 \le d \le n$, $\mathrm{rank}(F^d) \le \sum_{i=1}^{s} \binom{t}{i}$.


Corollary 2 The class of unions of disjoint
boxes can be learned in time $\mathrm{poly}(n, t, \ell)$ (where
$t$ is the number of boxes in the target function).
The class of unions of $O(\log n)$ boxes can be
learned in time $\mathrm{poly}(n, \ell)$.


Classes of DNF Formulae 

The learnability of DNF formulae has been 
widely investigated. The following special case 
of Corollary 1 solves an open problem of [18]: 


Corollary 3 The class of functions that can be
expressed as the exclusive OR of $t$ (not necessarily
monotone) monomials is learnable in time
$\mathrm{poly}(n, t)$.


While Corollary 3 does not refer to a subclass
of DNF, it already implies the learnability of
disjoint (i.e., satisfy-1) DNF. Since DNF is a
special case of union of boxes (with $\ell = 2$), one
also obtains the learnability of disjoint DNF from
Corollary 2. Positive results for satisfy-$s$ DNF
(i.e., DNF formulae in which each assignment
satisfies at most $s$ terms) can be obtained, with
larger values of $s$. The following two important
corollaries follow from Theorem 6. Note that
Theorem 6 holds in any field. For convenience
(and efficiency), let $K = GF(2)$.


Theorem 7 Let $f = T_1 \vee T_2 \vee \ldots \vee T_t$ be a
satisfy-$s$ DNF (i.e., each $T_i$ is a monomial). Let $F$
be the Hankel matrix corresponding to $f$. Then,

$$\mathrm{rank}(F^d) \le \sum_{i=1}^{s} \binom{t}{i} \le t^s.$$

Corollary 4 The class of satisfy-s DNF formu- 
lae, for s = O(1), is learnable in time poly(n, t). 


Corollary 5 The class of satisfy-$s$, $t$-term
DNF formulae is learnable in time $\mathrm{poly}(n)$
for the following choices of $s$ and $t$: (1) $t = O(\log n)$,
(2) $t = \mathrm{polylog}(n)$ and $s = O(\log n / \log\log n)$,
(3) $t = 2^{O(\log n / \log\log n)}$ and $s = O(\log\log n)$.


Classes of Decision Trees 

The algorithm of Theorem 3 efficiently learns the
class of disjoint DNF formulae. This includes the
class of decision trees. Therefore, decision trees
of size $t$ on $n$ variables are learnable using $O(tn)$
equivalence queries and $O(t^2 n \log n)$ membership
queries. This is better than the best known
algorithm for decision trees [11] (which uses
$O(t^2)$ equivalence queries and $O(t^2 n^2)$ membership
queries). The following results concern more
general classes of decision trees.


Corollary 6 Consider the class of decision trees
that compute functions $f : GF(p)^n \to GF(p)$ as
follows: each node $v$ contains a query of the form
“$x_i \in S_v$?” for some $S_v \subseteq GF(p)$. If $x_i \in S_v$,
then the computation proceeds to the left child of
$v$, and if $x_i \notin S_v$, the computation proceeds to
the right child. Each leaf $\ell$ of the tree is marked
by a value $\gamma_\ell \in GF(p)$, which is the output on all
the assignments that reach this leaf. Then, this
class is learnable in time $\mathrm{poly}(n, |L|, p)$, where
$L$ is the set of leaves.


The above result implies the learnability of
decision trees with “greater-than” queries in the
nodes, solving a problem of [11]. Every decision
tree with “greater-than” queries that computes a
Boolean function can be expressed as the union
of disjoint boxes. Hence, this case can also be
derived from Corollary 2. The next theorem will
be used to learn more classes of decision trees.


Theorem 8 Let $g_i : \Sigma^n \to K$ be arbitrary
functions ($1 \le i \le \ell$). Let $f : \Sigma^n \to K$ be
defined by $f = \prod_{i=1}^{\ell} g_i$. Let $F$ be the Hankel
matrix corresponding to $f$, and $G_i$ be the Hankel
matrix corresponding to $g_i$. Then,
$\mathrm{rank}(F^d) \le \prod_{i=1}^{\ell} \mathrm{rank}(G_i^d)$.

This theorem has some interesting applications.
The first application states that arithmetic circuits
of depth two with a multiplication gate of fan-in
$O(\log n)$ at the top level and addition gates
with unbounded fan-in at the bottom level are
learnable.


Corollary 7 Let $C$ be the class of functions that
can be expressed in the following way: Let
$p_{i,j} : \Sigma \to K$ be arbitrary functions of a single
variable ($1 \le i \le \ell$, $1 \le j \le n$). Let $\ell = O(\log n)$
and $g_i : \Sigma^n \to K$ ($1 \le i \le \ell$) be defined
by $\sum_{j=1}^{n} p_{i,j}(z_j)$. Finally, let $f : \Sigma^n \to K$ be
defined by $f = \prod_{i=1}^{\ell} g_i$. Then, $C$ is learnable in
time $\mathrm{poly}(n, |\Sigma|)$.

Corollary 8 Consider the class of decision trees
of depth $s$, where the query at each node $v$ is a
Boolean function $f_v$ with $r_{max} \le t$ (as defined
in section “Key Results”) such that $(t + 1)^s = \mathrm{poly}(n)$.
Then, this class is learnable in time
$\mathrm{poly}(n, |\Sigma|)$.


The above class contains, for example, all the
decision trees of depth $O(\log n)$ that contain in each
node a term or a XOR of a subset of variables as
defined in [15] (in this case $r_{max} \le 2$).


Negative Results 

In [4] some limitations of learnability via
the automaton representation have been proved.
One can show that the main algorithm does not
efficiently learn several important classes of
functions. More precisely, these classes contain
functions $f$ that have no “small” automaton, i.e., by
Theorem 1, the corresponding Hankel matrix $F$
is “large” over every field $K$.

Theorem 9 The following classes are not learnable
in time polynomial in $n$ and the formula size
using multiplicity automata (over any field $K$):
DNF; monotone DNF; 2-DNF; read-once DNF;
$k$-term DNF, for $k = \omega(\log n)$; satisfy-$s$ DNF,
for $s = \omega(1)$; and read-$j$ satisfy-$s$ DNF, for
$j = \omega(1)$ and $s = \Omega(\log n)$.


Some of these classes are known to be learnable
by other methods, some are natural
generalizations of classes known to be learnable as
automata ($O(\log n)$-term DNF [11, 12, 14], and
satisfy-$s$ DNF for $s = O(1)$ (Corollary 4)) or
by other methods (read-$j$ satisfy-$s$ for
$j \cdot s = O(\log n / \log\log n)$ [10]), and the learnability of
some of the others is still an open problem.
Problem Definition


This problem deals with learning “simple”
Boolean functions $f : \{0,1\}^n \to \{-1,1\}$
from uniform random labeled examples. In the
basic uniform-distribution PAC framework, the
learning algorithm is given access to a uniform
random example oracle $EX(f, U)$ which, when
queried, provides a labeled random example
$(x, f(x))$ where $x$ is drawn from the uniform
distribution $U$ over the Boolean cube $\{0,1\}^n$.
Successive calls to the $EX(f, U)$ oracle yield
independent uniform random examples. The
goal of the learning algorithm is to output a
representation of a hypothesis function
$h : \{0,1\}^n \to \{-1,1\}$ which with high probability
has high accuracy; formally, for any $\epsilon, \delta > 0$,
given $\epsilon$ and $\delta$ the learning algorithm should
output an $h$ which with probability at least $1 - \delta$
has $\Pr_{x \in U}[h(x) \neq f(x)] \le \epsilon$.

Many variants of the basic framework
described above have been considered. In the
distribution-independent PAC learning model,
the random example oracle is $EX(f, D)$ where
$D$ is an arbitrary (and unknown to the learner)
distribution over $\{0,1\}^n$; the hypothesis $h$ should
now have high accuracy with respect to $D$,
i.e., with probability $1 - \delta$, it must satisfy
$\Pr_{x \in D}[h(x) \neq f(x)] \le \epsilon$. Another variant that
has been considered is when the distribution $D$ is
assumed to be an unknown product distribution;
such a distribution is defined by $n$ parameters
$0 \le p_1, \ldots, p_n \le 1$, and a draw from $D$ is
obtained by independently setting each bit $x_i$ to
1 with probability $p_i$. Yet another variant is to
consider learning with the help of a membership
oracle: this is a “black box” oracle $MQ(f)$ for
$f$ which, when queried on an input $x \in \{0,1\}^n$,
returns the value of $f(x)$. The model of
uniform-distribution learning with a membership oracle
has been well studied; see e.g. [7, 15].

There are many ways to make precise the
notion of a “simple” Boolean function; one common
approach is to stipulate that the function be
computed by a Boolean circuit of some restricted
form. A circuit of size $s$ and depth $d$ consists
of $s$ AND and OR gates (of unbounded fanin)
in which the longest path from any input literal
$x_1, \ldots, x_n, \bar{x}_1, \ldots, \bar{x}_n$ to the output node is of
length $d$. Note that a circuit of size $s$ and depth
2 is simply a CNF formula or DNF formula.
The complexity class consisting of those Boolean
functions computed by $\mathrm{poly}(n)$-size, $O(1)$-depth
circuits is known as nonuniform $AC^0$.


Key Results 


Positive Results 

Linial et al. [16] showed that almost all of the 
Fourier weight of any constant-depth circuit is on 
“low-order” Fourier coefficients: 


Lemma 1 Let $f : \{0,1\}^n \to \{-1,1\}$ be a
Boolean function that is computed by an
AND/OR/NOT circuit of size $s$ and depth $d$.
Then for any integer $t \ge 0$,

$$\sum_{S \subseteq \{1,\ldots,n\},\, |S| > t} \hat{f}(S)^2 \le 2s \cdot 2^{-t^{1/d}/20}.$$

(Håstad [6] has given a refined version of
Lemma 1 with slightly sharper bounds; see
also [21] for a streamlined proof.) They also
showed that any Boolean function can be well
approximated by approximating its Fourier
spectrum.
Lemma 2 Let $f : \{0,1\}^n \to \{-1,1\}$ be any
Boolean function and let $g : \{0,1\}^n \to \mathbb{R}$ be an [...]


Using the above two results together with 
a procedure that estimates all the “low- 
order” Fourier coefficients, they obtained a 
quasipolynomial-time algorithm for learning 
constant-depth circuits: 


Theorem 1 There is an $n^{O(\log^d(n/\epsilon))}$-time
algorithm that learns any $\mathrm{poly}(n)$-size, depth-$d$
Boolean AND/OR/NOT circuit to accuracy $\epsilon$ with
respect to the uniform distribution, using uniform
random examples only.
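
The estimation procedure behind this result (often called the low-degree algorithm) can be sketched as follows: estimate every Fourier coefficient $\hat{f}(S)$ with $|S| \le t$ as the empirical average of $f(x)\chi_S(x)$, then predict with the sign of the resulting low-degree polynomial. The sketch assumes inputs in $\{0,1\}^n$, labels in $\{-1,1\}$, and the characters $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$; the function names are hypothetical, and the choice of $t$ and of the sample size that yields the stated running time is not reproduced here.

```python
from itertools import combinations


def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def low_degree_learn(samples, n, t):
    """Return a hypothesis h built from estimated low-order coefficients.

    samples is a list of (x, y) pairs with x in {0,1}^n and y in {-1, 1}.
    Every coefficient of order at most t is estimated by an average.
    """
    coeffs = {}
    for k in range(t + 1):
        for S in combinations(range(n), k):
            coeffs[S] = sum(y * chi(S, x) for x, y in samples) / len(samples)

    def h(x):
        g = sum(c * chi(S, x) for S, c in coeffs.items())
        return 1 if g >= 0 else -1

    return h
```

When the sample is the whole cube, the estimates are exact; a function whose Fourier weight lies entirely on sets of size at most $t$ is then recovered without error.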


Furst et al. [3] extended this result to learning
under constant-bounded product distributions. A
product distribution $D$ is said to be constant
bounded if each of its $n$ parameters $p_1, \ldots, p_n$
is bounded away from 0 and 1, i.e., satisfies
$\min\{p_i, 1 - p_i\} = \Theta(1)$.


Theorem 2 There is an $n^{O(\log^d(n/\epsilon))}$-time
algorithm that learns any $\mathrm{poly}(n)$-size, depth-$d$
Boolean AND/OR/NOT circuit to accuracy
$\epsilon$ given random examples drawn from any
constant-bounded product distribution.


By combining the Fourier arguments of
Linial et al. with hypothesis boosting, Jackson
et al. [8] were able to extend Theorem 1 to a
broader class of circuits, namely, constant-depth
AND/OR/NOT circuits that additionally contain
(a limited number of) majority gates. A majority
gate over $r$ Boolean inputs is a binary gate which
outputs “true” if and only if at least half of its $r$
Boolean inputs are set to “true.”


Theorem 3 There is an $n^{\log^{O(1)}(n/\epsilon)}$-time
algorithm that learns any $\mathrm{poly}(n)$-size, constant-depth
Boolean AND/OR/NOT circuit that
additionally contains $\mathrm{polylog}(n)$ many majority
gates to accuracy $\epsilon$ with respect to the uniform
distribution, using uniform random examples
only.


A threshold gate over $r$ Boolean inputs is a
binary gate defined by $r$ real weights $w_1, \ldots, w_r$
and a real threshold $\theta$; on input $(x_1, \ldots, x_r)$
the value of the threshold gate is 1 if and
only if $w_1 x_1 + \cdots + w_r x_r \ge \theta$. Gopalan and
Servedio [4] observed that a conjecture of
Gotsman and Linial [5] bounding the average
sensitivity of low-degree polynomial threshold
functions implies Fourier concentration (and
hence quasipolynomial time learnability using
the original Linial et al. [16] algorithm) for
Boolean functions computed by polynomial-size
constant-depth AND/OR/NOT circuits
augmented with a threshold gate as the topmost
(output) gate. Combining this observation with
upper bounds on the noise sensitivity of
low-degree polynomial threshold functions given in
[2], they obtained unconditional sub-exponential
time learning for these circuits. Subsequent
improvements of these noise sensitivity results,
nearly resolving the Gotsman-Linial conjecture
and giving stronger unconditional running
times for learning these circuits, were given by
Kane [10]; see [2, 4, 10] for detailed statements
of these results.


Negative Results 

Kharitonov [11] showed that under a strong but
plausible cryptographic assumption, the algorithmic
result of Theorem 1 is essentially optimal.
A Blum integer is an integer $N = P \cdot Q$ where
both $P$ and $Q$ are primes congruent to 3 modulo 4.
Kharitonov proved that if the problem of factoring
a randomly chosen $n$-bit Blum integer is
$2^{n^\epsilon}$-hard for some fixed $\epsilon > 0$, then any algorithm
that (even weakly) learns polynomial-size depth-$d$
circuits must run in time $n^{\log^{\Omega(d)} n}$, even if
it is only required to learn under the uniform
distribution and can use a membership oracle.
This implies that there is no polynomial-time
algorithm for learning polynomial-size, depth-$d$
circuits (for $d$ larger than some absolute constant).

Using a cryptographic construction of Naor 
and Reingold [18], Jackson et al. [8] proved a re- 
lated result for circuits with majority gates. They 
showed that under Kharitonov’s assumption, any 
algorithm that (even weakly) learns depth-5 circuits
consisting of $\log^k n$ many majority gates
must run in time $2^{\log^{\Omega(k)} n}$, even if it is only
required to learn under the uniform distribution
and can use a membership oracle. 


Applications 


The technique of learning by approximating 
most of the Fourier spectrum (Lemma 2 above) 
has found many applications in subsequent 
work on uniform-distribution learning. It is a 
crucial ingredient in the current state-of-the- 
art algorithms for learning monotone DNF 
formulas [20], monotone decision trees [19], 
and intersections of half-spaces [12] from 
uniform random examples only. Combined 
with a membership oracle-based procedure 
for identifying large Fourier coefficients, this 
technique is at the heart of an algorithm for 
learning decision trees [15]; this algorithm 
in turn plays a crucial role in the celebrated 
polynomial-time algorithm of Jackson [7] for 
learning polynomial-size depth-2 circuits under 
the uniform distribution. 

The ideas of Linial et al. have also been applied
to the difficult problem of agnostic learning.
In the agnostic learning framework, there is
a joint distribution $D$ over example-label pairs
$\{0,1\}^n \times \{-1,1\}$; the goal of an agnostic learning
algorithm for a class $C$ of functions is to construct
a hypothesis $h$ such that
$\Pr_{(x,y) \in D}[h(x) \neq y] \le \min_{f \in C} \Pr_{(x,y) \in D}[f(x) \neq y] + \epsilon$.
Kalai et al. [9]
gave agnostic learning algorithms for half-spaces
and related classes via an algorithm which may
be viewed as a generalization of Linial et al.’s
algorithm to a broader class of distributions.

Finally, there has been some applied work 
on learning using Fourier representations as well 
[17]. 


Open Problems 


Perhaps the most outstanding open question
related to this work is whether polynomial-size
circuits of depth two (i.e., DNF formulas)
can be learned in polynomial time from uniform
random examples only. Blum [1] has offered a
cash prize for a solution to a restricted version
of this problem. A hardness result for learning
DNF would also be of great interest; recent work
of Klivans and Sherstov [14] gives a hardness
result for learning ANDs of majority gates, but
hardness for DNF (ORs of ANDs) remains an
open question.

Another open question is whether the 
quasipolynomial-time algorithms for learning 
constant-depth circuits under uniform dis- 
tributions and product distributions can be 
extended to the general distribution-independent 
model. Known results in complexity theory 
imply that quasipolynomial-time distribution- 
independent learning algorithms for constant- 
depth circuits would follow from the exis- 
tence of efficient linear threshold learning 
algorithms with a sufficiently high level of 
tolerance to “malicious” noise. Currently no 
nontrivial distribution-independent algorithms
are known for learning circuits of depth 3;
for depth-2 circuits the best known running
time in the distribution-independent setting is
that of the $2^{\tilde{O}(n^{1/3})}$-time algorithm of Klivans and
Servedio [13].

A third direction for future work is to extend
the results of [8] to a broader class of
circuits. Can constant-depth circuits augmented
with $MOD_p$ gates be learned in quasipolynomial
time? Jackson et al. [8] discuss the limitations
of current techniques in addressing these
extensions.


Experimental Results 


None to report. 


Data Sets 


None to report. 


URL to Code 


None to report. 
Problem Definition


A disjunctive normal form (DNF) expression is 
a Boolean expression written as a disjunction 
of terms, where each term is the conjunction 
of Boolean variables that may or may not be 
negated. For example, $(v_1 \wedge v_2) \vee (v_2 \wedge v_3)$ is
a two-term DNF expression over three variables. 
DNF expressions occur frequently in digital cir- 
cuit design, where DNF is often referred to as 
sum of products notation. From a learning per- 
spective, DNF expressions are of interest because 
they provide a natural representation for certain 
types of expert knowledge. For example, the 
conditions under which complex tax rules apply 
can often be readily represented as DNFs. An- 
other nice property of DNF expressions is their 
universality: every n-bit Boolean function (the
type of function considered in this entry unless
otherwise noted) $f : \{0,1\}^n \to \{0,1\}$ can be
represented as a DNF expression $F$ over at most
n variables.

In the basic probably-approximately correct 
(PAC) learning model [24], n is a fixed positive 
integer and a target distribution $D$ over $\{0,1\}^n$
is assumed fixed. The learner will have black-box
access to an unknown arbitrary Boolean function
$f$ through an example oracle $EX(f, D)$ which,
when queried, selects $x \in \{0,1\}^n$ at random
according to $D$ and returns the pair $(x, f(x))$.
The DNF learning problem is then to design an 
algorithm provably meeting the following speci- 
fications. 


Problem 1 (PAC-DNF) 


INPUT: Positive integer $n$; $\epsilon, \delta > 0$; oracle
$EX(f, D)$ for $f : \{0,1\}^n \to \{0,1\}$ expressible
as a DNF having at most $s$ terms and for $D$
an arbitrary distribution over $\{0,1\}^n$.

OUTPUT: With probability at least $1 - \delta$ over the
random choices made by $EX(f, D)$ and the
algorithm (if any), a function $h : \{0,1\}^n \to \{0,1\}$
such that $\Pr_{x \sim D}[h(x) \neq f(x)] < \epsilon$
and such that $h(x)$ is computable in time
polynomial in $n$ and $s$ for each $x \in \{0,1\}^n$.

RUN TIME: Polynomial in $n$, $s$, $1/\epsilon$, and $1/\delta$.
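
As a rough illustration of the objects in Problem 1, the following Python sketch models the example oracle $EX(f, D)$ as a callable and estimates the error $\Pr_{x \sim D}[h(x) \neq f(x)]$ of a candidate hypothesis from fresh examples. The helper names (`dnf`, `make_example_oracle`, `empirical_error`) and the choice of the uniform distribution are assumptions made for illustration only; the sketch is not a learning algorithm.

```python
import random

def dnf(terms):
    """Build a DNF from terms; each term is a list of (index, required_bit) literals."""
    return lambda x: int(any(all(x[i] == b for i, b in term) for term in terms))

def make_example_oracle(f, sample_x, rng):
    """Return a callable standing in for EX(f, D): each call yields (x, f(x))."""
    def ex():
        x = sample_x(rng)          # draw x according to D
        return x, f(x)             # labelled example
    return ex

def empirical_error(h, ex, m):
    """Estimate Pr_{x~D}[h(x) != f(x)] from m fresh labelled examples."""
    return sum(h(x) != y for x, y in (ex() for _ in range(m))) / m

n = 3
f = dnf([[(0, 1), (1, 1)], [(1, 1), (2, 1)]])        # (v1 AND v2) OR (v2 AND v3)
uniform = lambda rng: tuple(rng.randint(0, 1) for _ in range(n))
ex = make_example_oracle(f, uniform, random.Random(0))
err = empirical_error(lambda x: x[1], ex, 4000)      # candidate hypothesis h(x) = v2
print(f"estimated error of h: {err:.3f}")            # true error under uniform D is 1/8
```

The hypothesis $h(x) = v_2$ errs exactly when $v_2 = 1$ and $v_1 = v_3 = 0$, so the estimate should concentrate near 1/8.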


Learning DNF Formulas 


The PAC-DNF problem has not been resolved 
at the time of this writing, and many believe 
that no algorithm can solve this problem (see, 
e.g., [2]). However, DNF has been shown to be 
learnable in several models that relax the PAC 
assumptions in various ways. In particular, all 
polynomial-time learning results to date have 
limited the choice of the target distribution D to 
(at most) the class of constant-bounded product 
distributions. For a constant $c \in (0, \frac{1}{2}]$, a
$c$-bounded product distribution $D_\mu$ is defined by
fixing a vector $\mu \in [c, 1-c]^n$ and having $D_\mu$
correspond to selecting each bit $x_i$ of $x$ independently
so that the mean value of $x_i$ is $\mu_i$; mathematically,
this distribution function can be written
as $D_\mu(x) = \prod_{i=1}^{n} (x_i \mu_i + (1 - x_i)(1 - \mu_i))$.
In most learning models, the target distribution 
is assumed to be selected by an unknown, even 
adversarial, process from among the allowed dis- 
tributions; thus, the limitation relative to PAC 
learning is on the class of allowed distributions, 
not on how a distribution is chosen. However, in 
an alternative model of learning from smoothed 
product distributions [17], the mechanism used to 
choose the target distribution is also constrained 
as follows. A constant $c \in (0, \frac{1}{4}]$ and an arbitrary
vector $\mu \in [2c, 1-2c]^n$ are fixed, a
perturbation $\Delta \in [-c, c]^n$ is chosen uniformly
at random, and the target distribution is taken to
be the $c$-bounded product distribution $D_{\mu'}$ such
that $\mu' = \mu + \Delta$. The learning algorithm now
needs to succeed only with high probability over
the choice of $\Delta$ (as well as over the usual choices,
and in the same run time as for PAC-DNF).
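
The product-distribution formula and the smoothing step can be made concrete with a short Python sketch. The function names and the particular values of $\mu$ and $c$ below are illustrative assumptions; the code only evaluates $D_\mu(x)$, samples from $D_\mu$, and draws a perturbed mean vector $\mu' = \mu + \Delta$.

```python
import random
from itertools import product

def product_pmf(x, mu):
    """D_mu(x) = prod_i (x_i * mu_i + (1 - x_i) * (1 - mu_i))."""
    p = 1.0
    for xi, mi in zip(x, mu):
        p *= xi * mi + (1 - xi) * (1 - mi)
    return p

def sample_product(mu, rng):
    """Draw x with independent bits, bit i being 1 with probability mu_i."""
    return tuple(1 if rng.random() < mi else 0 for mi in mu)

def smoothed_mu(mu, c, rng):
    """Perturb mu in [2c, 1-2c]^n by Delta uniform in [-c, c]^n.

    The result lies in [c, 1-c]^n, so D_{mu'} is c-bounded.
    """
    assert all(2 * c <= mi <= 1 - 2 * c for mi in mu)
    return [mi + rng.uniform(-c, c) for mi in mu]

mu = [0.3, 0.6, 0.5]
total = sum(product_pmf(x, mu) for x in product((0, 1), repeat=len(mu)))
mu_prime = smoothed_mu(mu, 0.1, random.Random(1))
x = sample_product(mu_prime, random.Random(2))
print(total, mu_prime, x)
```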


Problem 2 (SMOOTHED-DNF) 


INPUT: Same as PAC-DNF except that oracle
$EX(f, D_{\mu'})$ has $D_{\mu'}$ a smoothed product distribution.

OUTPUT: With probability at least $1 - \delta$ over the
random choice of $\mu'$ and the random choices
made by $EX(f, D_{\mu'})$ and the algorithm (if
any), a function $h : \{0,1\}^n \to \{0,1\}$ such that
$\Pr_{x \sim D_{\mu'}}[h(x) \neq f(x)] < \epsilon$.


Various more-informative oracles have also 
been studied, and we will consider two in par- 
ticular. A membership oracle MEM(f), given 
$x$, returns $f(x)$. A product random walk oracle
$PRW(f, D_\mu)$ [15] is initialized by selecting an
internal state vector $x$ at random according to a
fixed arbitrary constant-bounded product distribution
$D_\mu$. Then, on each call to $PRW(f, D_\mu)$, an
$i \in \{1, 2, \ldots, n\}$ is chosen uniformly at random
and bit $x_i$ in the internal vector $x$ is replaced with
a bit $b$ effectively chosen by flipping a $\mu_i$-biased
coin, so that $x_i$ will be 1 with probability $\mu_i$.
The oracle then returns the triple (i, x’, f(x’)) 
consisting of the selected i, the resulting new 
internal state vector x’, and the value that f 
assigns to this vector. These oracles are used to 
define two additional DNF learning problems. 
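
The behavior of the product random walk oracle described above can be sketched in Python as follows. The class name and the use of 0-based indices (the text uses $i \in \{1, \ldots, n\}$) are assumptions made for this illustration.

```python
import random

class ProductRandomWalkOracle:
    """Illustrative stand-in for PRW(f, D_mu); indices are 0-based here."""

    def __init__(self, f, mu, rng):
        self.f, self.mu, self.rng = f, list(mu), rng
        # Initial internal state is drawn from the product distribution D_mu.
        self.x = [1 if rng.random() < m else 0 for m in self.mu]

    def __call__(self):
        i = self.rng.randrange(len(self.mu))                     # uniform coordinate
        self.x[i] = 1 if self.rng.random() < self.mu[i] else 0   # mu_i-biased coin
        x_new = tuple(self.x)
        return i, x_new, self.f(x_new)

f = lambda x: int(x[0] and x[1])
oracle = ProductRandomWalkOracle(f, [0.5, 0.3, 0.7], random.Random(3))
for _ in range(3):
    print(oracle())   # each call returns the triple (i, x', f(x'))
```

Successive states differ in at most the one selected coordinate, which is what distinguishes this passive oracle from independent draws out of $EX(f, D_\mu)$.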


Problem 3 (PMEM-DNF) 


INPUT: Same as PAC-DNF except that the oracle
supplied is $MEM(f)$ and the target distribution
is a constant-bounded product distribution
$D_\mu$.

OUTPUT: With probability at least $1 - \delta$ over
the random choices made by the algorithm, a
function $h$ such that $\Pr_{x \sim D_\mu}[h(x) \neq f(x)] < \epsilon$.


Problem 4 (PRW-DNF) 


INPUT: Same as PAC-DNF except that the oracle
supplied is $PRW(f, D_\mu)$ for a constant-bounded
product distribution $D_\mu$.

OUTPUT: With probability at least $1 - \delta$ over
the random choices made by $PRW(f, D_\mu)$ and
the algorithm (if any), a function $h$ such that
$\Pr_{x \sim D_\mu}[h(x) \neq f(x)] < \epsilon$.

Certain other DNF learning problems and as- 
sociated results are mentioned briefly in the next 
section. 


Key Results 


The first algorithm for efficiently learning arbi- 
trary functions in time polynomial in their DNF 
size was the Harmonic Sieve [13], which solved 
the PMEM-DNF problem. This algorithm, like all 
algorithms for learning arbitrary DNF functions 
to date, relies in large part on Fourier analy- 
sis for its proof of correctness. In particular, 
a key component of the Sieve involves finding 
heavy (large magnitude) Fourier coefficients of 
certain functions using a variation on an algo-
rithm discovered by Goldreich and Levin [11] 
and first employed to obtain a learning result by 
Kushilevitz and Mansour [22]. The original Sieve 
also depends on a certain hypothesis boosting 
algorithm [10]. Subsequent work on the PMEM- 
DNF problem [5, 7, 8, 18, 21] has produced simpler
and/or faster algorithms. In the case of [18], the 
result is somewhat stronger as the approximator 
h is reliable: it rarely produces false positives. 
The best run time for the PMEM-DNF problem 
obtained thus far, $\tilde{O}(ns^4/\epsilon)$, is due to Feldman
[7]. 

Using a membership oracle is a form of ac- 
tive learning in which the learning algorithm 
is able to influence the oracle, as opposed to 
passive learning—exemplified by learning from 
an example oracle—where the learning algorithm 
merely accepts the information provided by the 
oracle. Thus, an apparently significant step in 
the direction of a solution to PAC-DNF occurred 
when Bshouty et al. [6] showed that PRW-DNF, 
which is obviously a passive learning problem, 
can be solved when the product distribution is 
constrained to be uniform (the distribution is 
$c$-bounded for $c = \frac{1}{2}$). Noise sensitivity analysis
plays a key role in Bshouty et al.’s result.
Jackson and Wimmer [15] subsequently defined 
the product random walk model and extended 
the result of [6] to show that PRW-DNF can be 
solved in general, not merely for uniform random 
walks. Both results still rely to some extent on the 
Harmonic Sieve, or more precisely on a slightly 
modified version called the Bounded Sieve [3]. 

More recently, the SMOOTHED-DNF 
problem has been defined and solved by Kalai, 
Samorodnitsky, and Teng [17]. As an example 
oracle is used by the algorithm solving this 
problem, this result can be viewed as a partial 
solution to PAC-DNF that applies when the 
target distribution is chosen in a somewhat 
“friendly,” rather than adversarial, way. Their 
algorithm avoids the Sieve’s need for boosting 
by combining a form of gradient projection 
optimization [12] with reliable DNF learning 
[18] in order to produce a good approximator 
h, given only the heavy Fourier coefficients of 
the function f to be approximated. Feldman [9] 
subsequently discovered a simpler algorithm for
this problem. 

Algorithms for efficiently learning DNF in a 
few, less studied, models are mentioned only 
briefly here due to space constraints. These 
results include uniform-distribution learning of 
DNF from a quantum example oracle [4] and 
from two different types of extended statistical 
queries [3, 14]. 

Finally, note that although the focus of this 
entry is on learning arbitrary functions in time 
polynomial in their DNF size, there is also a 
substantial literature on polynomial-time learning 
of restricted classes of functions representable 
by constrained forms of DNF, such as monotone 
DNF (functions expressible as a DNF having no 
negated variables). For the most part, this work 
predates the algorithms described here. [19] pro- 
vides a good summary of many early restricted- 
DNF results. See also Sellie’s algorithm [23] 
that, roughly speaking, efficiently learns with 
respect to the uniform target distribution most— 
but not necessarily all—functions representable 
by polynomial-size DNF given uniform random 
examples of each function. 


Applications 


DNF learning algorithms have proven useful as 
a basis for learning more expressive function- 
representation classes. In fact, the Harmonic 
Sieve, without alteration, can be used to learn 
with respect to the uniform distribution arbitrary 
functions in time polynomial in their size as 
a majority of parity functions (see [13] for a 
definition of the TOP class and a discussion 
of its superior expressiveness relative to DNF). 
In another generalization direction, a DNF 
expression can be viewed as a union of rectangles 
of the Boolean hypercube. Atici and Servedio [1] 
have given a generalized version of the Harmonic 
Sieve that can, among other things, efficiently 
learn with respect to uniform an interesting 
subset of the class of unions of rectangles over 
$\{0, 1, \ldots, b-1\}^n$ for non-constant $b$. Both of the
preceding results use membership queries. A few 
quasipolynomial-time passive learning results 
for classes more expressive than DNF in various
learning models have also been obtained [15, 16] 
by building on techniques employed originally in 
DNF learning algorithms.


Open Problems 


As indicated at the outset, a resolution, either 
positively or negatively, of the PAC-DNF ques- 
tion would be a major step forward. Several other 
DNF questions are also of particular interest. In 
the problem of agnostic learning [20] of DNF, the 
goal, roughly, is to efficiently find a function h 
that approximates an arbitrary function $f : \{0,1\}^n \to \{0,1\}$
nearly as well as the best s-term DNF, for any 
fixed s. Is DNF agnostically learnable in any rea- 
sonable model? Also welcome would be the dis- 
covery of efficient algorithms for learning DNF 
with respect to interesting classes of distributions 
beyond product distributions. Finally, although 
monotone DNF is not a universal function class, 
it should be mentioned that an efficient example- 
oracle algorithm for monotone DNF, even if re- 
stricted to the uniform target distribution, would 
be considered a breakthrough. 
Problem Definition


The Hamming distance $d_H(y, z)$ between two
binary strings $y$ and $z$ of the same length is the
number of entries in which $y$ and $z$ disagree. A binary
error-correcting code of minimum distance $d$
is a mapping $C : \{0,1\}^k \to \{0,1\}^n$ such that for
every two distinct inputs $x, x' \in \{0,1\}^k$, the encodings
$C(x)$ and $C(x')$ have Hamming distance
at least $d$. Error-correcting codes are employed
to transmit information over noisy channels. If
a sender transmits an encoding $C(x)$ of a message
$x$ via a noisy channel, and the recipient receives
a corrupt bit string $y \neq C(x)$, then, provided
that $y$ differs from $C(x)$ in at most $(d-1)/2$
locations, the recipient can recover $x$ from $y$.
The recipient can do so by searching for the string 
x that minimizes the Hamming distance between 
C(x) and y: there can be no other string x’ such 
that C(x’) has Hamming distance (d — 1)/2 or 
smaller from y, otherwise C(x) and C(x’) would 
be within Hamming distance $d - 1$ or smaller,
contradicting the above definition. The problem 
of recovering the message x from the corrupted 
encoding y is the unique decoding problem for the 
error-correcting code C. For the above-described 
scheme to be feasible, the decoding problem 
must be solvable via an efficient algorithm. These 
notions are due to Hamming [4]. 

Suppose that C is a code of minimum distance 
d, and such that there are pairs of encodings 
C(x), C(x’) whose distance is exactly d. Further- 
more, suppose that a communication channel is 
used that could make a number of errors larger 
than (d —1)/2. Then, if the sender transmits 
an encoded message using C, it is no longer 
possible for the recipient to uniquely reconstruct 
the message. If the sender, for example, trans- 
mits C(x), and the recipient receives a string y 
that is at distance d/2 from C(x) and at dis- 
tance d/2 from C(x’), then, from the perspec- 
tive of the recipient, it is equally likely that the 
original message was x or x’. If the recipient 
knows an upper bound e on the number of en- 
tries that the channel has corrupted, then, given 
the received string y, the recipient can at least 
compute the list of all strings x such that C(x) 
and $y$ differ in at most $e$ locations. An error-correcting
code $C : \{0,1\}^k \to \{0,1\}^n$ is $(e, L)$-list
decodable if, for every string $y \in \{0,1\}^n$,
the set $\{x \in \{0,1\}^k : d_H(C(x), y) \le e\}$ has cardinality
at most $L$. The problem of reconstructing
the list given y and e is the list-decoding problem 
for the code C. Again, one is interested in efficient 
algorithms for this problem. The notion of list- 
decoding is due to Elias [1]. 

A code $C : \{0,1\}^k \to \{0,1\}^n$ is a Hadamard
code if every two encodings $C(x)$, $C(x')$ differ in
precisely $n/2$ locations. In the Computer Science
literature, it is common to use the term Hadamard
code for a specific construction (the Reed-Muller
code of order 1) that satisfies the above property.
For a string $a \in \{0,1\}^k$, define the function
$\ell_a : \{0,1\}^k \to \{0,1\}$ as


$\ell_a(x) := \sum_i a_i x_i \bmod 2.$


Observe that, for $a \neq b$, the two functions $\ell_a$ and
$\ell_b$ differ on precisely $2^k/2$ inputs. For $n = 2^k$,
the code $H : \{0,1\}^k \to \{0,1\}^n$ maps a message
$a \in \{0,1\}^k$ into the $n$-bit string which is the truth
table of the function $\ell_a$. That is, if $b_1, \ldots, b_n$
is an enumeration of the $n = 2^k$ elements of
$\{0,1\}^k$, and $a \in \{0,1\}^k$ is a message, then the
encoding $H(a)$ is the $n$-bit string that contains the
value $\ell_a(b_i)$ in the $i$-th entry. Note that any two
encodings H(x), H(x’) differ in precisely n/2 en- 
tries, and so what was just defined is a Hadamard 
code. From now on, the term Hadamard code will 
refer exclusively to this construction. 
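
The construction can be checked directly for small $k$. The sketch below (function names are illustrative assumptions) builds $H(a)$ as the truth table of $\ell_a$ and verifies that distinct codewords differ in exactly $n/2$ positions.

```python
from itertools import product

def ell(a, x):
    """Inner product of a and x modulo 2."""
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def hadamard_encode(a):
    """Truth table of ell_a over all 2^k points, in lexicographic order."""
    return [ell(a, b) for b in product((0, 1), repeat=len(a))]

def hamming(y, z):
    """Number of positions in which y and z disagree."""
    return sum(yi != zi for yi, zi in zip(y, z))

k = 3
codewords = {a: hadamard_encode(a) for a in product((0, 1), repeat=k)}
distances = {hamming(codewords[a], codewords[b])
             for a in codewords for b in codewords if a != b}
print(distances)   # every pair of distinct codewords is at distance 2^k / 2
```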

It is known that the Hadamard code
$H : \{0,1\}^k \to \{0,1\}^{2^k}$ is $(\frac{1}{2} - \epsilon, \frac{1}{4\epsilon^2})$-list decodable
for every $\epsilon > 0$. The Goldreich-Levin results
provide an efficient list-decoding algorithm.

The following definition of the Fourier 
spectrum of a boolean function will be needed 
later to state an application of the Goldreich— 
Levin results to computational learning theory. 
For a string $a \in \{0,1\}^k$, define the function
$\chi_a : \{0,1\}^k \to \{-1, +1\}$ as $\chi_a(x) := (-1)^{\ell_a(x)}$.
Equivalently, $\chi_a(x) = (-1)^{\sum_i a_i x_i}$. For two
functions $f, g : \{0,1\}^k \to \mathbb{R}$, define their inner
product as


$\langle f, g \rangle := \frac{1}{2^k} \sum_{x} f(x) \cdot g(x).$


Then it is easy to see that, for every $a \neq b$,
$\langle \chi_a, \chi_b \rangle = 0$, and $\langle \chi_a, \chi_a \rangle = 1$. This means that
the functions $\{\chi_a\}_{a \in \{0,1\}^k}$ form an orthonormal
basis for the set of all functions $f : \{0,1\}^k \to \mathbb{R}$.
In particular, every such function $f$ can be written
as a linear combination


$f(x) = \sum_{a} \hat{f}(a) \chi_a(x)$


where the coefficients $\hat{f}(a)$ satisfy $\hat{f}(a) = \langle f, \chi_a \rangle$.
The coefficients $\hat{f}(a)$ are called the Fourier
coefficients of the function $f$.
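
For small $k$ these definitions can be evaluated exhaustively. The following sketch (names are illustrative assumptions; the example function is the majority of three bits) computes every Fourier coefficient by brute force and confirms that the expansion reconstructs $f$.

```python
from itertools import product

def chi(a, x):
    """Character chi_a(x) = (-1)^(sum_i a_i x_i)."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1

def fourier_coefficient(f, a, k):
    """hat f(a) = <f, chi_a> = 2^-k * sum_x f(x) * chi_a(x)."""
    return sum(f(x) * chi(a, x) for x in product((0, 1), repeat=k)) / 2 ** k

def fourier_spectrum(f, k):
    """All 2^k coefficients; exponential time, for illustration only."""
    return {a: fourier_coefficient(f, a, k) for a in product((0, 1), repeat=k)}

k = 3
maj = lambda x: int(sum(x) >= 2)
spectrum = fourier_spectrum(maj, k)
reconstruct = lambda x: sum(c * chi(a, x) for a, c in spectrum.items())
print(spectrum[(0, 0, 0)], reconstruct((1, 1, 0)))
```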


Key Results 


Theorem 1 There is a randomized algorithm GL
that, given in input an integer $k$ and a parameter
$\epsilon > 0$, and given oracle access to a function
$f : \{0,1\}^k \to \{0,1\}$, runs in time polynomial in
$1/\epsilon$ and in $k$ and outputs, with high probability
over its internal coin tosses, a set $S \subseteq \{0,1\}^k$
that contains all the strings $a \in \{0,1\}^k$ such that
$\ell_a$ and $f$ agree on at least a $1/2 + \epsilon$ fraction of
inputs.


Theorem 1 is proved by Goldreich and
Levin [3]. The result can be seen as a list-decoding
algorithm for the Hadamard code
$H : \{0,1\}^k \to \{0,1\}^{2^k}$; remarkably, the algorithm runs in time
polynomial in $k$, which is poly-logarithmic in the
length of the given corrupted encoding.
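
To make the guarantee of Theorem 1 concrete, the sketch below performs list decoding of the Hadamard code by exhaustive search. This is explicitly not the Goldreich-Levin algorithm: it reads the entire received word and takes time exponential in $k$, and it is meant only to show what set $S$ the efficient algorithm must contain. All names are illustrative assumptions.

```python
from itertools import product

def ell(a, x):
    """Inner product of a and x modulo 2."""
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def agreement(a, w, k):
    """Fraction of inputs x on which ell_a(x) equals the received bit w[x]."""
    points = list(product((0, 1), repeat=k))
    return sum(ell(a, x) == w[x] for x in points) / len(points)

def brute_force_list_decode(w, k, eps):
    """All a whose codeword agrees with w on at least a 1/2 + eps fraction."""
    return [a for a in product((0, 1), repeat=k)
            if agreement(a, w, k) >= 0.5 + eps]

k = 3
secret = (1, 0, 1)
w = {x: ell(secret, x) for x in product((0, 1), repeat=k)}
w[(1, 1, 0)] ^= 1                      # corrupt one of the 8 positions
decoded = brute_force_list_decode(w, k, 0.25)
print(decoded)
```

Here the corrupted word still agrees with the codeword of `secret` on 7 of 8 positions, while every other codeword agrees on at most 5, so the list contains `secret` alone.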


Theorem 2 There is a randomized algorithm
KM that, given in input an integer $k$ and parameters
$\epsilon, \delta > 0$, and given oracle access to a function
$f : \{0,1\}^k \to \{0,1\}$, runs in time polynomial
in $1/\epsilon$, in $1/\delta$, and in $k$ and outputs a set
$S \subseteq \{0,1\}^k$ and a value $g(a)$ for each $a \in S$.

With high probability over the internal coin
tosses of the algorithm,


1. $S$ contains all the strings $a \in \{0,1\}^k$ such that
$|\hat{f}(a)| \ge \epsilon$, and
2. For every $a \in S$, $|\hat{f}(a) - g(a)| \le \delta$.


Theorem 2 is proved by Kushilevitz and 
Mansour [5]; it is an easy consequence of the 
Goldreich—Levin algorithm. 


Applications 


There are two key applications of the Goldreich— 
Levin algorithm: one is to cryptography and the 
other is to computational learning theory. 


Application in Cryptography 

In cryptography, a one-way permutation is
a family of functions $\{p_n\}_{n \ge 1}$ such that: (i) for
every $n$, $p_n : \{0,1\}^n \to \{0,1\}^n$ is bijective,
(ii) there is a polynomial time algorithm
that, given $x \in \{0,1\}^n$, computes $p_n(x)$, and
(iii) for every polynomial time algorithm $A$ and
polynomial $q$, and for every sufficiently large $n$,


$\Pr_{x \sim \{0,1\}^n}[A(p_n(x)) = x] \le \frac{1}{q(n)}.$

That is, even though computing $p_n(x)$ given $x$ is
doable in polynomial time, the task of computing
$x$ given $p_n(x)$ is intractable. A hard-core predicate
for a one-way permutation $\{p_n\}$ is a family of
functions $\{B_n\}_{n \ge 1}$ such that: (i) for every $n$,
$B_n : \{0,1\}^n \to \{0,1\}$, (ii) there is a polynomial
time algorithm that, given $x \in \{0,1\}^n$, computes
$B_n(x)$, and (iii) for every polynomial time algorithm
$A$ and polynomial $q$, and for every sufficiently
large $n$,


$\Pr_{x \sim \{0,1\}^n}[A(p_n(x)) = B_n(x)] \le \frac{1}{2} + \frac{1}{q(n)}.$


That is, even though computing $B_n(x)$ given $x$ is
doable in polynomial time, the task of computing
$B_n(x)$ given $p_n(x)$ is intractable.

Goldreich and Levin [3] use their algorithm to 
show that every one-way permutation has a hard- 
core predicate, as stated in the next theorem. 


Theorem 3 Let $\{p_n\}$ be a one-way permutation;
define $\{p'_n\}$ such that $p'_{2n}(x, y) := p_n(x), y$
and let $B_{2n}(x, y) := \sum_i x_i y_i \bmod 2$. (For
odd indices, let $p'_{2n+1}(z, b) := p'_{2n}(z), b$ and
$B_{2n+1}(z, b) := B_{2n}(z)$.)

Then $\{p'_n\}$ is a one-way permutation and $\{B_n\}$
is a hard-core predicate for $\{p'_n\}$.


This result is used in efficient constructions of 
pseudorandom generators, pseudorandom func- 
tions, and private-key encryption schemes based 
on one-way permutations. The interested reader 
is referred to Chapter 3 in Goldreich’s mono- 
graph [2] for more details. 

There are also related applications in com- 
putational complexity theory, especially in the 
study of average-case complexity. See [7] for an 
overview. 


Application in Computational Learning 
Theory 

Loosely speaking, in computational learning 
theory one is given an unknown function 
$f : \{0,1\}^k \to \{0,1\}$ and one wants to compute,
via an efficient randomized algorithm, a representation
of a function $g : \{0,1\}^k \to \{0,1\}$ that
agrees with $f$ on most inputs. In the PAC learning
model, one has access to $f$ only via randomly
sampled pairs $(x, f(x))$; in the model of learning
with queries, instead, one can evaluate $f$ at points
of one’s choice. Kushilevitz and Mansour [5]
suggest the following algorithm: using the
algorithm of Theorem 2, find a set $S$ of large
coefficients and approximations $g(a)$ of the
coefficients $\hat{f}(a)$ for $a \in S$. Then define the
function $g(x) := \sum_{a \in S} g(a) \chi_a(x)$. If the error
caused by the absence of the smaller coefficients 
and the imprecision in the larger coefficient is 
not too large, g and f will agree on most inputs. 
(A technical point is that g as defined above is 
not necessarily a boolean function, but it can be 
easily “rounded” to be boolean.) Kushilevitz and 
Mansour show that such an approach works well 
for the class of functions $f$ for which $\sum_a |\hat{f}(a)|$
is bounded, and they observe that functions of 
small decision tree complexity fall into this class. 
In particular, they derive the following result. 


Theorem 4 There is a randomized algorithm
that, given in input parameters $k$, $m$, $\epsilon$ and
$\delta$, and given oracle access to a function
$f : \{0,1\}^k \to \{0,1\}$ of decision tree complexity
at most $m$, runs in time polynomial in $k$, $m$, $1/\epsilon$
and $\log 1/\delta$ and, with probability at least $1 - \delta$
over its internal coin tosses, outputs a circuit
computing a function $g : \{0,1\}^k \to \{0,1\}$ that
agrees with $f$ on at least a $1 - \epsilon$ fraction of
inputs.
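
The Kushilevitz-Mansour recipe (keep the large coefficients, sum the corresponding characters, round) can be sketched as follows. The function `large_coefficients` is an exhaustive, exponential-time stand-in for the algorithm of Theorem 2, and the threshold and target function are assumptions chosen for illustration.

```python
from itertools import product

def chi(a, x):
    """Character chi_a(x) = (-1)^(sum_i a_i x_i)."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1

def large_coefficients(f, k, theta):
    """Exhaustive stand-in for Theorem 2: exact coefficients with |coef| >= theta."""
    cube = list(product((0, 1), repeat=k))
    S = {}
    for a in cube:
        c = sum(f(x) * chi(a, x) for x in cube) / len(cube)
        if abs(c) >= theta:
            S[a] = c
    return S

def hypothesis(S):
    """Sum the retained terms and round the real value to a bit."""
    def h(x):
        g = sum(c * chi(a, x) for a, c in S.items())
        return int(g >= 0.5)
    return h

k = 4
f = lambda x: int(x[0] and x[1])       # a function with a very small decision tree
S = large_coefficients(f, k, 0.2)
h = hypothesis(S)
agree = sum(h(x) == f(x) for x in product((0, 1), repeat=k)) / 2 ** k
print(len(S), agree)
```

For this target only four coefficients are nonzero, each of magnitude 1/4, so the sparse hypothesis reproduces $f$ on every input.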


Another application of the Kushilevitz—Mansour 
technique is due to Linial, Mansour, and 
Nisan [6]. 
Problem Definition


The Fourier transform is among the most widely
used tools in computer science. Computing the
Fourier transform of a signal of length $N$ may be
done in time $O(N \log N)$ using the Fast Fourier
Transform (FFT) algorithm. This time bound
clearly cannot be improved below $\Theta(N)$, because
the output itself is of length $N$. Nonetheless, it
turns out that in many applications it suffices to 
find only the significant Fourier coefficients, i.e.,
Fourier coefficients occupying, say, at least 1%
of the energy of the signal. This motivates the 
problem discussed in this entry: the problem 
of efficiently finding and approximating the 
significant Fourier coefficients of a given signal 
(SFT, in short). A naive solution for SFT is to first 
compute the entire Fourier transform of the given 
signal and then to output only the significant 
Fourier coefficients; thus yielding no complexity 
improvement over algorithms computing the 
entire Fourier transform. In contrast, SFT can 
be solved far more efficiently in running time
$\tilde{O}(\log N)$ and while reading at most $\tilde{O}(\log N)$
out of the $N$ entries of the signal [2]. This fast
algorithm for SFT opens the way to applications 
taken from diverse areas including computational 
learning, error correcting codes, cryptography, 
and algorithms. 

It is now possible to formally define
the SFT problem, restricting our attention
to discrete signals. Use functional notation
where a signal is a function $f : G \to \mathbb{C}$
over a finite Abelian group $G$, its energy is
$\|f\|_2^2 := \frac{1}{|G|} \sum_{x \in G} |f(x)|^2$, and its maximal
amplitude is $\|f\|_\infty := \max\{|f(x)| : x \in G\}$. (For
readers more accustomed to vector notation,
the authors remark that there is a simple
correspondence between vector and functional
notation. For example, a one-dimensional
signal $(v_1, \ldots, v_N) \in \mathbb{C}^N$ corresponds to the
function $f : \mathbb{Z}_N \to \mathbb{C}$ defined by $f(i) = v_i$ for
all $i = 1, \ldots, N$. Likewise, a two-dimensional
signal $M \in \mathbb{C}^{N_1 \times N_2}$ corresponds to the function
$f : \mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2} \to \mathbb{C}$ defined by $f(i, j) = M_{ij}$
for all $i = 1, \ldots, N_1$ and $j = 1, \ldots, N_2$.) For
ease of presentation assume without loss of
generality that $G = \mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2} \times \cdots \times \mathbb{Z}_{N_k}$ for
$N_1, \ldots, N_k \in \mathbb{Z}^+$ (i.e., positive integers), where
$\mathbb{Z}_N$ is the additive group of integers modulo $N$.

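The Fourier transform of $f$ is the function
$\hat{f} : G \to \mathbb{C}$ defined for each $\alpha = (\alpha_1, \ldots, \alpha_k) \in G$
by


$\hat{f}(\alpha) := \frac{1}{|G|} \sum_{(x_1, \ldots, x_k) \in G} \left[ f(x_1, \ldots, x_k) \prod_{j=1}^{k} \omega_{N_j}^{\alpha_j x_j} \right],$


where $\omega_{N_j} = \exp(i 2\pi / N_j)$ is a primitive root
of unity of order $N_j$. For any $\alpha \in G$, $\mathrm{val}_\alpha \in \mathbb{C}$
and $\tau, \epsilon \in [0, 1]$, say that $\alpha$ is a $\tau$-significant
Fourier coefficient iff $|\hat{f}(\alpha)|^2 \ge \tau \|f\|_2^2$, and say
that $\mathrm{val}_\alpha$ is an $\epsilon$-approximation for $\hat{f}(\alpha)$ iff
$|\mathrm{val}_\alpha - \hat{f}(\alpha)| < \epsilon$.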


Problem 1 (SFT) 

INPUT: Integers $N_1, \ldots, N_k \ge 2$ specifying
the group $G = \mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$, a threshold
$\tau \in (0, 1)$, an approximation parameter
$\epsilon \in (0, 1)$, and oracle access (Say that an
algorithm is given oracle access to a function
$f$ over $G$, if it can request and receive the value
$f(x)$ for any $x \in G$ in unit time.) to $f : G \to \mathbb{C}$.
OUTPUT: A list of all $\tau$-significant Fourier
coefficients of $f$ along with $\epsilon$-approximations for
them.
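
The naive solution mentioned in the problem definition (compute the whole transform, then filter) is easy to write down for $G = \mathbb{Z}_N$. The sketch below is that baseline only, not the SFT algorithm of [2]; it reads all $N$ entries and uses a quadratic-time transform for simplicity. The normalization by $1/N$ and the sign of the exponent follow the definition above; the function names and the test signal are illustrative assumptions.

```python
import cmath
import math

def fourier_transform(f_vals):
    """hat f(a) = (1/N) * sum_x f(x) * omega^(a*x), omega = exp(2*pi*i/N)."""
    N = len(f_vals)
    return [sum(f_vals[x] * cmath.exp(2j * cmath.pi * a * x / N)
                for x in range(N)) / N
            for a in range(N)]

def naive_sft(f_vals, tau):
    """Return {a: hat f(a)} for every tau-significant coefficient."""
    N = len(f_vals)
    energy = sum(abs(v) ** 2 for v in f_vals) / N      # ||f||_2^2
    fhat = fourier_transform(f_vals)
    return {a: c for a, c in enumerate(fhat) if abs(c) ** 2 >= tau * energy}

N = 16
signal = [2 * math.cos(2 * math.pi * 3 * x / N) + 0.1 * (-1) ** x
          for x in range(N)]
significant = naive_sft(signal, 0.25)
print(sorted(significant))   # frequencies carrying at least 25% of the energy
```

The signal consists of a strong cosine at frequency 3 (which contributes to coefficients 3 and $N - 3$) plus a weak alternating term at frequency 8 that falls below the threshold.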


Key Results 


The key result of this entry is an algorithm 
solving the SFT problem which is much faster 
than algorithms for computing the entire Fourier 
transform. For example, for $f$ a Boolean function
over $\mathbb{Z}_N$, the running time of this algorithm
is $\log N \cdot \mathrm{poly}(\log\log N, 1/\tau, 1/\epsilon)$, in contrast
to the O(N log N) running time of the FFT 
algorithm. This algorithm is named the SFT 
algorithm. 


Theorem 1 (SFT algorithm [2]) There is an
algorithm solving the SFT problem with running time
$\log |G| \cdot \mathrm{poly}(\log\log |G|, \|f\|_\infty / \|f\|_2, 1/\tau, 1/\epsilon)$
for $|G| = \prod_{j=1}^{k} N_j$ the cardinality
of $G$.


Remarks 


1. The above result extends to functions f over 
any finite Abelian group G, as long as the 
algorithm is given a description of G by its 
generators and their orders [2]. 

2. The SFT algorithm reads at most
$\log |G| \cdot \mathrm{poly}(\log\log |G|, \|f\|_\infty / \|f\|_2, 1/\tau, 1/\epsilon)$ out
of the $|G|$ values of the signal.

3. The SFT algorithm is nonadaptive, that is,
oracle queries to f are independent of the 
algorithm’s progress. 

4. The SFT algorithm is a probabilistic algorithm 
having a small error probability, where prob- 
ability is taken over the internal coin tosses 
of the algorithm. The error probability can be 
made arbitrarily small by standard amplifica- 
tion techniques. 


The SFT algorithm follows a line of works 
solving the SFT problem for restricted function 
classes. Goldreich and Levin [9] gave an
algorithm for Boolean functions over the
group $\mathbb{Z}_2^k = \{0,1\}^k$. The running time of their
algorithm is polynomial in $k$, $1/\tau$ and $1/\epsilon$.
Mansour [10] gave an algorithm for complex
functions over groups $G = \mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$
provided that $N_1, \ldots, N_k$ are powers of
two. The running time of his algorithm is
polynomial in $\log |G|$, $\log(\max_{\alpha \in G} |f(\alpha)|)$, $1/\tau$
and $1/\epsilon$. Gilbert et al. [6] gave an algorithm
for complex functions over the group $\mathbb{Z}_N$
for any positive integer $N$. The running
time of their algorithm is polynomial in
$\log N$, $\log(\max_{x \in \mathbb{Z}_N} f(x) / \min_{x \in \mathbb{Z}_N} f(x))$, $1/\tau$
and $1/\epsilon$. Akavia et al. [2] gave an algorithm
for complex functions over any finite Abelian
group. The latter [2] improves on [6] even when
restricted to functions over $\mathbb{Z}_N$ in achieving
$\log N \cdot \mathrm{poly}(\log\log N)$ rather than $\mathrm{poly}(\log N)$
dependency on $N$. Subsequent works [7]
improved the dependency of [6] on $\tau$ and $\epsilon$.


Applications 


Next, the paper surveys applications of the SFT 
algorithm [2] in the areas of computational learn- 
ing theory, coding theory, cryptography, and al- 
gorithms. 


Applications in Computational Learning 
Theory 

A common task in computational learning is to
find a hypothesis $h$ approximating a function
$f$, when given only samples of the function $f$.
Samples may be given in a variety of forms, e.g.,
via oracle access to $f$. We consider the following
variant of this learning problem: $f$ and $h$ are
complex functions over a finite Abelian group
$G = \mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$, the goal is to find $h$ such
that $\|f - h\|_2^2 < \gamma \|f\|_2^2$ for $\gamma > 0$ an approximation
parameter, and samples of $f$ are given via
oracle access.

A straightforward application of the SFT
algorithm gives an efficient solution to the above
learning problem, provided that there is a small
set $\Gamma \subseteq G$ s.t. $\sum_{\alpha \in \Gamma} |\hat{f}(\alpha)|^2 > (1 - \gamma/3) \|f\|_2^2$.
The learning algorithm operates as follows.
It first runs the SFT algorithm to find all
$\alpha = (\alpha_1, \ldots, \alpha_k) \in G$ that are $\gamma/|\Gamma|$-significant
Fourier coefficients of $f$ along with their
$\gamma/(|\Gamma| \|f\|_\infty)$-approximations $\mathrm{val}_\alpha$, and then
returns the hypothesis


$h(x_1, \ldots, x_k) := \sum_{\alpha \text{ is } \gamma/|\Gamma|\text{-significant}} \mathrm{val}_\alpha \cdot \prod_{j=1}^{k} \omega_{N_j}^{\alpha_j x_j}.$


This hypothesis $h$ satisfies $\|f - h\|_2^2 < \gamma \|f\|_2^2$.
The running time of this learning
algorithm and the number of oracle queries
it makes are polynomially bounded by $\log |G|$,
$\|f\|_\infty / \|f\|_2$, and $|\Gamma| \|f\|_\infty / \gamma$.


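Theorem 2 Let $f : G \to \mathbb{C}$ be a function over
$G = \mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$, and $\gamma > 0$ an approximation
parameter. Denote
$t = \min\{|\Gamma| : \Gamma \subseteq G \text{ s.t. } \sum_{\alpha \in \Gamma} |\hat{f}(\alpha)|^2 > (1 - \gamma/3) \|f\|_2^2\}$. There
is an algorithm that, given $N_1, \ldots, N_k$, $\gamma$, and
oracle access to $f$, outputs a (short) description
of $h : G \to \mathbb{C}$ s.t. $\|f - h\|_2^2 < \gamma \|f\|_2^2$.
The running time of this algorithm is
$\log |G| \cdot \mathrm{poly}(\log\log |G|, \|f\|_\infty / \|f\|_2, t \|f\|_\infty / \gamma)$.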


More examples of function classes that can be 
efficiently learned using our SFT algorithm are 
given in [3]. 


Applications in Coding Theory 

Error correcting codes encode messages in a way 
that allows decoding, that is, recovery of the 
original message, even in the presence of noise. 
When the noise is very high, unique decoding
may be infeasible; nevertheless, it may still be
possible to list decode, that is, to find a short 
list of messages containing the original message. 
Codes equipped with an efficient list decoding 
algorithm have found many applications (see [11] 
for a survey). 

Formally, a binary code is a subset $C \subseteq \{0,1\}^*$
of codewords each encoding some message.
Denote by $C_{N,x} \in \{0,1\}^N$ a codeword of length
$N$ encoding a message $x$. The normalized
Hamming distance between a codeword
$C_{N,x}$ and a received word $w \in \{0,1\}^N$ is


$\Delta(C_{N,x}, w) := \frac{1}{N} |\{i \in \mathbb{Z}_N : C_{N,x}(i) \neq w(i)\}|,$


where $C_{N,x}(i)$ and $w(i)$ are the $i$th bits of
$C_{N,x}$ and $w$, respectively. Given $w \in \{0,1\}^N$
and a noise parameter $\eta > 0$, the list decoding
task is to find a list of all messages $x$ such that
$\Delta(C_{N,x}, w) < \eta$. The received word $w$ may be
given explicitly or implicitly; we focus on the 
latter where oracle access to w is given. Goldreich 
and Levin [9] give a list decoding algorithm for 
Hadamard codes, using in a crucial way their 
algorithm solving the SFT problem for functions 
over the Boolean cube. 

The SFT algorithm for functions over $\mathbb{Z}_N$
is a key component in a list decoding algorithm
given by Akavia et al. [2]. This list decoding
algorithm is applicable to a large class of
codes. For example, it is applicable to the code
$C^{msb} = \{C_{N,x} : \mathbb{Z}_N \to \{0,1\}\}_{x \in \mathbb{Z}_N^*, N \in \mathbb{Z}^+}$ whose
codewords are $C_{N,x}(j) = \mathrm{msb}_N(j \cdot x \bmod N)$
for $\mathrm{msb}_N(y) = 1$ iff $y \ge N/2$ and $\mathrm{msb}_N(y) = 0$
otherwise. More generally, this list decoding
algorithm is applicable to any Multiplication code
$C^P$ for $P$ a family of balanced and well concentrated
functions, as defined below. The running
time of this list decoding algorithm is polynomial
in $\log N$ and $1/(1 - 2\eta)$, as long as $\eta < \frac{1}{2}$.
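
The msb code and the list decoding task can be illustrated for a small modulus. The sketch below builds codewords $C_{N,x}(j) = \mathrm{msb}_N(j \cdot x \bmod N)$ and list decodes by exhaustive search over $\mathbb{Z}_N^*$. This brute-force decoder is an assumption made for illustration; it takes time polynomial in $N$ rather than in $\log N$ and is not the SFT-based algorithm of [2].

```python
from math import gcd

def msb(y, N):
    """msb_N(y) = 1 iff y >= N/2."""
    return 1 if 2 * y >= N else 0

def codeword(x, N):
    """C_{N,x} as a list indexed by j in Z_N."""
    return [msb((j * x) % N, N) for j in range(N)]

def normalized_hamming(c, w):
    """Fraction of positions in which c and w disagree."""
    return sum(ci != wi for ci, wi in zip(c, w)) / len(c)

def brute_force_list_decode(w, N, eta):
    """All x in Z_N^* whose codeword is within normalized distance eta of w."""
    return [x for x in range(1, N)
            if gcd(x, N) == 1 and normalized_hamming(codeword(x, N), w) < eta]

N, x, eta = 17, 5, 0.15
w = codeword(x, N)
w[1] ^= 1
w[2] ^= 1                                  # corrupt two of the 17 positions
decoded = brute_force_list_decode(w, N, eta)
print(decoded)
```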

Abstractly, the list decoding algorithm of [2]
is applicable to any code that is “balanced,”
“(well) concentrated,” and “recoverable,”
as defined next. A code is balanced if
$\Pr_{j \in \mathbb{Z}_N}[C_{N,x}(j) = 0] = \Pr_{j \in \mathbb{Z}_N}[C_{N,x}(j) = 1]$
for every codeword $C_{N,x}$. A code is (well) concentrated
if its codewords can be approximated
by a small number of significant coefficients in
their Fourier representation (and those Fourier
coefficients have small greatest common divisor
(GCD) with $N$). A code is recoverable if there
is an efficient algorithm mapping each Fourier
coefficient $\alpha$ to a short list of codewords for
which $\alpha$ is a significant Fourier coefficient.
The key property of concentrated codes is that
received words $w$ share a significant Fourier
coefficient with all close codewords $C_{N,x}$. The
high level structure of the list decoding algorithm
of [2] is therefore as follows. First, it runs the
SFT algorithm to find all significant Fourier
coefficients $\alpha$ of the received word $w$. Second, for
each such $\alpha$, it runs the recovery algorithm to find
all codewords $C_{N,x}$ for which $\alpha$ is significant.
Finally, it outputs all those codewords $C_{N,x}$.


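Definition 1 (Multiplication codes [2]) Let
$P = \{P_N : \mathbb{Z}_N \to \{0,1\}\}_{N \in \mathbb{Z}^+}$ be a family of
functions. Say that
$C^P = \{C_{N,x} : \mathbb{Z}_N \to \{0,1\}\}_{x \in \mathbb{Z}_N^*, N \in \mathbb{Z}^+}$
is a multiplication code for $P$
if for every $N \in \mathbb{Z}^+$ and $x \in \mathbb{Z}_N^*$, the encoding
$C_{N,x} : \mathbb{Z}_N \to \{0,1\}$ of $x$ is defined by

$$C_{N,x}(j) = P_N(j \cdot x \bmod N).$$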


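Definition 2 (Well concentrated [2]) Let
$P = \{P_N : \mathbb{Z}_N \to \mathbb{C}\}_{N \in \mathbb{Z}^+}$ be a family of functions.
Say that $P$ is well concentrated if $\forall N \in \mathbb{Z}^+$, $\gamma > 0$,
$\exists \Gamma \subseteq \mathbb{Z}_N$ s.t. (i) $|\Gamma| \le poly(\log N/\gamma)$,
(ii) $\sum_{\alpha \in \Gamma} |\widehat{P_N}(\alpha)|^2 \ge (1 - \gamma)\|P_N\|_2^2$, and
(iii) for all $\alpha \in \Gamma$, $\gcd(\alpha, N) \le poly(\log N/\gamma)$
(where $\gcd(\alpha, N)$ is the greatest common divisor
of $\alpha$ and $N$).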
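Conditions (i) and (ii) of Definition 2 can be explored numerically for small N. The sketch below is an illustration under stated assumptions: it uses a naive $O(N^2)$ Fourier transform with the normalization $\widehat{f}(\alpha) = \frac{1}{N}\sum_x f(x) e^{-2\pi i \alpha x/N}$ (so that $\sum_\alpha |\widehat{f}(\alpha)|^2 = \frac{1}{N}\sum_x |f(x)|^2$), and the function names are hypothetical. It is not the SFT algorithm, which avoids computing all coefficients.

```python
import cmath

def fourier_coefficients(f, N):
    """Naive DFT of f: Z_N -> C with a 1/N normalization."""
    return [sum(f(x) * cmath.exp(-2j * cmath.pi * a * x / N) for x in range(N)) / N
            for a in range(N)]

def smallest_heavy_set(f, N, gamma):
    """Greedily collect the heaviest coefficients until they carry at least
    a (1 - gamma) fraction of the total squared Fourier weight."""
    weights = [abs(c) ** 2 for c in fourier_coefficients(f, N)]
    total = sum(weights)
    order = sorted(range(N), key=lambda a: weights[a], reverse=True)
    chosen, mass = [], 0.0
    for a in order:
        if mass >= (1 - gamma) * total:
            break
        chosen.append(a)
        mass += weights[a]
    return chosen, mass / total

N = 64

def msb(y):
    """msb predicate on Z_64: 1 iff y >= N/2."""
    return 1 if 2 * y >= N else 0

gamma_set, captured = smallest_heavy_set(msb, N, gamma=0.1)
```

For the msb predicate, a handful of coefficients around $\alpha = 0$ already carries more than 90% of the weight, which is the kind of concentration the definition asks for.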


Theorem 3 (List decoding [2]) Let
$P = \{P_N : \mathbb{Z}_N \to \{0,1\}\}_{N \in \mathbb{Z}^+}$ be a family of efficiently
computable ($P$ is a family of efficiently computable functions if
there is an algorithm that, given any $N \in \mathbb{Z}^+$ and
$x \in \mathbb{Z}_N$, outputs $P_N(x)$ in time $poly(\log N)$),
well concentrated, and balanced functions. Let
$C^P = \{C_{N,x} : \mathbb{Z}_N \to \{0,1\}\}_{x \in \mathbb{Z}_N^*, N \in \mathbb{Z}^+}$ be the
multiplication code for $P$. Then there is an
algorithm that, given $N \in \mathbb{Z}^+$, $\eta < 1/2$, and oracle
access to $w : \mathbb{Z}_N \to \{0,1\}$, outputs all $x \in \mathbb{Z}_N^*$ for
which $\Delta(C_{N,x}, w) < \eta$. The running time of this
algorithm is polynomial in $\log N$ and $1/(1 - 2\eta)$.


Remarks 


1. The requirement that $P$ is a family of efficiently
computable functions can be relaxed.
It suffices to require that the list decoding
algorithm receives or computes a set $\Gamma \subseteq \mathbb{Z}_N$
with properties as specified in Definition 2.

2. The requirement that $P$ is a family of balanced
functions can be relaxed. Denote
$bias(P) = \min_{b \in \{0,1\}} \inf_{N \in \mathbb{Z}^+} \Pr_{j \in \mathbb{Z}_N}[P_N(j) = b]$.
Then the list decoding algorithm of [2] is
applicable to $C^P$ even when $bias(P) \ne 1/2$, as
long as $\eta < bias(P)$.


Applications in Cryptography 

Hard-core predicates for one-way functions are 
a fundamental cryptographic primitive, which is 
central for many cryptographic applications such 
as pseudo-random number generators, semantically
secure encryption, and cryptographic protocols.
Informally speaking, a Boolean predicate P is 
a hard-core predicate for a function f if P(x) 
is easy to compute when given x, but hard to 
guess with a non-negligible advantage beyond 
50% when given only f(x). The notion of hard- 
core predicates was introduced by Blum and 
Micali [2]. Goldreich and Levin [9] showed a 
randomized hardcore predicate for any one-way 
function, using in a crucial way their algorithm 
solving the SFT problem for functions over the 
Boolean cube. 

Akavia et al. [2] introduce a unifying framework
for proving that a predicate P is hard-core
for a one-way function f. Applying their
framework, they prove that a wide class of
predicates, the segment predicates, are hard-core
predicates for various well-known candidate
one-way functions. This yields new hard-core
predicates for well-known one-way function
candidates and reproves old results in an
entirely different way.

Elaborating on the above, a segment predicate
is any assignment of Boolean values to an arbitrary
partition of $\mathbb{Z}_N$ into $poly(\log N)$ segments,
or dilations of such an assignment. Akavia
et al. [2] prove that any segment predicate is
hard-core for any one-way function f defined over
$\mathbb{Z}_N$ for which, for a non-negligible fraction of the
x's in $\mathbb{Z}_N$, given f(x) and y, one can efficiently
compute f(xy) (where xy is multiplication
in $\mathbb{Z}_N$). This includes the following functions:
the exponentiation function $EXP_{p,g} : \mathbb{Z}_p \to \mathbb{Z}_p^*$,
defined by $EXP_{p,g}(x) = g^x \bmod p$ for each
prime p and a generator g of the group $\mathbb{Z}_p^*$;
the RSA function $RSA : \mathbb{Z}_N^* \to \mathbb{Z}_N^*$, defined
by $RSA(x) = x^e \bmod N$ for each N = pq
a product of two primes p, q, and e co-prime
to N; the Rabin function $Rabin : \mathbb{Z}_N^* \to \mathbb{Z}_N$,
defined by $Rabin(x) = x^2 \bmod N$ for each
N = pq a product of two primes p, q; and
the elliptic curve log function defined by
$ECL_{a,b,p,Q}(x) = xQ$ for each elliptic curve
$E_{a,b,p}(\mathbb{Z}_p)$ and Q a point of high order on the
curve.

The SFT algorithm is a central tool in the 
framework of [2]: Akavia et al. take a list decod- 
ing methodology, where computing a hard-core 
predicate corresponds to computing an entry in 
some error correcting code, predicting a predicate 
corresponds to access to an entry in a corrupted 
codeword, and the task of inverting a one-way 
function corresponds to the task of list decoding 
a corrupted codeword. The codes emerging in [2] 
are multiplication codes (see Definition 1 above),
which are list decoded using the SFT algorithm. 


Definition 3 (Segment predicates [2]) Let
$P = \{P_N : \mathbb{Z}_N \to \{0,1\}\}_{N \in \mathbb{Z}^+}$ be a family
of predicates that are non-negligibly
far from constant (a family of functions
$P = \{P_N : \mathbb{Z}_N \to \{0,1\}\}_{N \in \mathbb{Z}^+}$ is non-negligibly
far from constant if $\forall N \in \mathbb{Z}^+$ and $b \in \{0,1\}$,
$\Pr_{j \in \mathbb{Z}_N}[P_N(j) = b] \le 1 - poly(1/\log N)$).


• We say that $P_N$ is a basic t-segment
predicate if $P_N(x + 1) \ne P_N(x)$ for at most
t x's in $\mathbb{Z}_N$.

• We say that $P_N$ is a t-segment predicate
if there exist a basic t-segment predicate
$P'$ and $a \in \mathbb{Z}_N$ which is co-prime to N s.t.
$\forall x \in \mathbb{Z}_N$, $P_N(x) = P'(x/a)$.

• We say that P is a family of segment
predicates if $\forall N \in \mathbb{Z}^+$, $P_N$ is a t(N)-segment
predicate for $t(N) \le poly(\log N)$.
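The definition of segment predicates can be checked mechanically for small moduli. The following Python sketch uses hypothetical helper names (`boundary_count`, `dilate`) and assumes that $x + 1$ is taken modulo N and that $x/a$ means multiplication by $a^{-1}$ modulo N. As an example, for odd N the least significant bit is the dilation by $a = 2$ of the most significant bit, so it is a 2-segment predicate even though it changes value at almost every x.

```python
from math import gcd

def boundary_count(P, N):
    """Number of x in Z_N with P(x + 1) != P(x), indices taken mod N."""
    return sum(P((x + 1) % N) != P(x) for x in range(N))

def dilate(P_basic, a, N):
    """Return the predicate x -> P_basic(x / a), with division in Z_N."""
    if gcd(a, N) != 1:
        raise ValueError("a must be co-prime to N")
    a_inv = pow(a, -1, N)          # modular inverse (Python 3.8+)
    return lambda x: P_basic((x * a_inv) % N)

N = 101                            # an odd modulus, chosen for illustration

def msb(y):
    """Basic 2-segment predicate: 1 iff y >= N/2."""
    return 1 if 2 * y >= N else 0

lsb_like = dilate(msb, 2, N)       # P(x) = msb(x / 2 mod N)
```

With these definitions, `boundary_count(msb, N)` is 2, while `lsb_like` agrees with `x % 2` on all of $\mathbb{Z}_N$ and has 100 boundaries as a basic predicate.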


Theorem 4 (Hardcore predicates [2]) Let P 
be a family of segment predicates. Then, P is 
hard-core for RSA, Rabin, EXP, ECL, under the 
assumption that these are one-way functions. 


Application in Algorithms 

Our modern times are characterized by 
information explosion incurring a need for faster 
and faster algorithms. Even algorithms classically 
regarded as efficient — such as the FFT algorithm 
with its O(N log N) complexity — are often too 
slow for data-intensive applications, and linear 
or even sub-linear algorithms are imperative. 
Despite the vast variety of fields and applications 
where algorithmic challenges arise, some basic 
algorithmic building blocks emerge in many of 
the existing algorithmic solutions. Accelerating 
such building blocks can therefore accelerate 
many existing algorithms. One of these recurring 
building blocks is the Fast Fourier Transform 
(FFT) algorithm. The SFT algorithm offers 
a great efficiency improvement over the FFT 
algorithm for applications where it suffices to 
deal only with the significant Fourier coefficients. 
In such applications, replacing the FFT building 
block with the SFT algorithm accelerates the 
O(N log N) complexity in each application of 
the FFT algorithm to poly(log N) complex- 
ity [1]. Lossy compression is an example of such 
an application [1, 5, 8]. To elaborate, a central
component in several transform compression
methods (e.g., JPEG) is to first apply a Fourier (or
cosine) transform to the signal and then discard
many of its coefficients. To accelerate such
algorithms, instead of computing the entire Fourier
(or cosine) transform, the SFT algorithm can be
used to directly approximate only the significant
Fourier coefficients. Such an accelerated
algorithm achieves a compression guarantee as
good as the original algorithm (and possibly
better), but with running time improved to
poly(log N) in place of the former O(N log N).
Problem Definition


This problem is concerned with PAC learning 
of concept classes when training examples are 
affected by malicious errors. The PAC (prob- 
ably approximately correct) model of learning 
(also known as the distribution-free model of 
learning) was introduced by Valiant [13]. This 
model makes the idealized assumption that error- 
free training examples are generated from the 
same distribution which is then used to evaluate 
the learned hypothesis. In many environments, 
however, there is some chance that an erroneous 
example is given to the learning algorithm. The 
malicious noise model — again introduced by 
Valiant [14] — extends the PAC model by allowing 
example errors of any kind: it makes no assump- 
tions on the nature of the errors that occur. In this 
sense the malicious noise model is a worst-case 
model of errors, in which errors may be generated 
by an adversary whose goal is to foil the learning 
algorithm. Kearns and Li [8,9] study the maximal 
malicious error rate such that learning is still 
possible. They also provide a canonical method 
to transform any standard learning algorithm into
an algorithm which is robust against malicious 
noise. 


Notations Let X be a set of instances. The goal
of a learning algorithm is to infer an unknown
subset $C \subseteq X$ of instances which exhibit a certain
property. Such subsets are called concepts. It
is known to the learning algorithm that the correct
concept C is from a concept class $\mathcal{C} \subseteq 2^X$, $C \in \mathcal{C}$.
Let $C(x) = 1$ if $x \in C$ and $C(x) = 0$ if $x \notin C$.
As input the learning algorithm receives an accuracy
parameter $\varepsilon > 0$, a confidence parameter
$\delta > 0$, and the malicious noise rate $\beta \ge 0$.
The learning algorithm may request a sample of
labeled instances $S = ((x_1, \ell_1), \ldots, (x_m, \ell_m))$,
$x_i \in X$ and $\ell_i \in \{0,1\}$, and produces a hypothesis
$H \subseteq X$. Let D be the unknown distribution of
instances in X. Learning is successful if H misclassifies
an example with probability less than $\varepsilon$,
$err_D(C, H) := D\{x \in X : C(x) \ne H(x)\} < \varepsilon$.
A learning algorithm is required to be successful
with probability $1 - \delta$. The error of a hypothesis
H with respect to a sample S of labeled instances is
defined as $err(S, H) := |\{(x, \ell) \in S : H(x) \ne \ell\}| / |S|$.

The VC dimension $VC(\mathcal{C})$ of a concept class
$\mathcal{C}$ is the maximal number of instances $x_1, \ldots, x_d$
such that $\{(C(x_1), \ldots, C(x_d)) : C \in \mathcal{C}\} = \{0,1\}^d$.
The VC dimension is a measure of the
difficulty of learning concept class $\mathcal{C}$ [4].
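The two quantities defined above, the empirical error err(S, H) and the VC dimension, can be computed by brute force for tiny finite classes. The sketch below is only an illustration: the function names and the toy class of intervals over $X = \{0, \ldots, 5\}$ are assumptions made for this example and do not come from the cited works.

```python
from itertools import combinations

def empirical_error(sample, H):
    """err(S, H): fraction of labeled pairs (x, label) in S that H mislabels."""
    return sum(H(x) != label for x, label in sample) / len(sample)

def vc_dimension(instances, concept_class):
    """Largest d such that some d instances receive all 2^d labelings.

    Brute force; stops at the first d for which no set is shattered,
    since then no larger set can be shattered either.
    """
    best = 0
    for d in range(1, len(instances) + 1):
        shattered = any(
            len({tuple(C(x) for x in pts) for C in concept_class}) == 2 ** d
            for pts in combinations(instances, d))
        if not shattered:
            break
        best = d
    return best

# Hypothetical toy class: all intervals [lo, hi] over X, plus the empty concept.
X = list(range(6))
intervals = [(lambda x, lo=lo, hi=hi: int(lo <= x <= hi))
             for lo in X for hi in X if lo <= hi]
intervals.append(lambda x: 0)

S = [(1, 1), (2, 1), (4, 0), (5, 1)]     # a small labeled sample
H = lambda x: int(1 <= x <= 2)           # hypothesis: the interval [1, 2]
```

Intervals have VC dimension 2 (no three points can be labeled 1, 0, 1), and the hypothesis above mislabels one of the four sample points.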

To investigate the computational complexity 
of learning algorithms, sequences of concept 
classes with increasing complexity
$(X_n, \mathcal{C}_n)_n = ((X_1, \mathcal{C}_1), (X_2, \mathcal{C}_2), \ldots)$ are considered. In this
case the learning algorithm also receives a
complexity parameter n as input.


Generation of Examples In the malicious noise
model, the labeled instances $(x_i, \ell_i)$ are generated
independently from each other by the following
random process:


(a) Correct examples: with probability $1 - \beta$,
an instance $x_i$ is drawn from distribution D
and labeled by the correct concept C, $\ell_i = C(x_i)$.
(b) Noisy examples: with probability $\beta$, an arbitrary
example $(x_i, \ell_i)$ is generated, possibly
by an adversary.
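The random process above can be simulated directly. The following Python sketch is a hedged illustration: `malicious_sample` and the toy target, instance distribution, and label-flipping adversary are all hypothetical names and choices made for this example. A real adversary in this model is unrestricted and may pick any instance and label.

```python
import random

def malicious_sample(m, beta, draw_instance, concept, adversary, rng):
    """Draw m labeled examples; each is adversarial with probability beta."""
    sample = []
    for _ in range(m):
        if rng.random() < beta:
            sample.append(adversary(rng))          # (b) noisy example
        else:
            x = draw_instance(rng)                 # (a) correct example
            sample.append((x, concept(x)))
    return sample

# Hypothetical toy setting over X = {0, ..., 9}.
def target(x):
    return int(x >= 5)

def uniform_instance(rng):
    return rng.randrange(10)

def flip_adversary(rng):
    """One possible adversary: always reports the wrong label."""
    x = rng.randrange(10)
    return (x, 1 - target(x))

rng = random.Random(0)
clean = malicious_sample(200, 0.0, uniform_instance, target, flip_adversary, rng)
noisy = malicious_sample(200, 1.0, uniform_instance, target, flip_adversary, rng)
```

With $\beta = 0$ every label agrees with the target concept; with $\beta = 1$ every example comes from the adversary.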


Problem 1 (Malicious Noise Learning of $(X, \mathcal{C})$)
INPUT: Reals $\varepsilon, \delta > 0$, $\beta \ge 0$
OUTPUT: A hypothesis $H \subseteq X$


For any distribution D on X and any concept $C \in \mathcal{C}$,
the algorithm needs to produce with probability
$1 - \delta$ a hypothesis H such that $err_D(C, H) < \varepsilon$.
The probability $1 - \delta$ is taken with respect to the
random sample $(x_1, \ell_1), \ldots, (x_m, \ell_m)$ requested
by the algorithm. The examples $(x_i, \ell_i)$ are generated
as defined above.


Problem 2 (Polynomial Malicious Noise Learning of $(X_n, \mathcal{C}_n)_n$)
INPUT: Reals $\varepsilon, \delta > 0$, $\beta \ge 0$, integer $n \ge 1$
OUTPUT: A hypothesis $H \subseteq X_n$


For any distribution D on $X_n$ and any concept
$C \in \mathcal{C}_n$, the algorithm needs to produce
with probability $1 - \delta$ a hypothesis H such that
$err_D(C, H) < \varepsilon$. The computational complexity
of the algorithm must be bounded by a polynomial
in $1/\varepsilon$, $1/\delta$, and n.


Key Results 


Theorem 1 ([9]) Let $\mathcal{C}$ be a nontrivial concept
class with two concepts $C_1, C_2 \in \mathcal{C}$ that are equal
on an instance $x_1$ and differ on another instance
$x_2$: $C_1(x_1) = C_2(x_1)$ and $C_1(x_2) \ne C_2(x_2)$.
Then no algorithm can learn $\mathcal{C}$ with malicious
noise rate $\beta \ge \frac{\varepsilon}{1+\varepsilon}$.

Theorem 2 Let $\Delta > 0$ and $d = VC(\mathcal{C})$.
For a suitable constant k, any algorithm which 
requests a sample S of m = k othe tee labeled 
examples and returns a hypothesis $H \in \mathcal{C}$ which
minimizes err(S, H) learns the concept class $\mathcal{C}$
with malicious noise rate $\beta \le \frac{\varepsilon}{1+\varepsilon} - \Delta$.


Lower bounds on the number of examples 
necessary for learning with malicious noise were 
derived by Cesa-Bianchi et al. 
Theorem 3 ([7]) Let $\Delta > 0$ and $d = VC(\mathcal{C}) \ge 3$.
There is a constant $\kappa$ such that any algorithm
which learns $\mathcal{C}$ with malicious noise rate
$\beta = \frac{\varepsilon}{1+\varepsilon} - \Delta$ by requesting a sample and returning
a hypothesis $H \in \mathcal{C}$ which minimizes err(S, H)
needs a sample of size at least $m = \kappa \frac{\varepsilon d}{\Delta^2}$.


A general conversion of a learning algorithm
for the noise-free model into an algorithm for
the malicious noise model was given by Kearns
and Li.


Theorem 4 ([9]) Let A be a (polynomial-time)
learning algorithm which learns concept classes
$\mathcal{C}_n$ from $m(\varepsilon, \delta, n)$ noise-free examples, i.e., $\beta = 0$.
Then A can be converted into a (polynomial-time)
learning algorithm for $\mathcal{C}_n$ for any malicious
noise rate $\beta \le \frac{\log m(\varepsilon/8, 1/2, n)}{m(\varepsilon/8, 1/2, n)}$.


The next theorem relates learning with malicious
noise to a type of combinatorial optimization
problem.


Theorem 5 ([9]) Let $r \ge 1$ and $\alpha > 0$.


1. Let A be a polynomial-time algorithm which,
for any sample S, returns a hypothesis $H \in \mathcal{C}$
with $err(S, H) \le r \cdot \min_{C \in \mathcal{C}} err(S, C)$. Then
A learns concept class $\mathcal{C}$ for any malicious
noise rate B < azaater in time polynomial 
in $1/\varepsilon$, $\log 1/\delta$, $VC(\mathcal{C})$, and $1/\alpha$.

2. Let A be a polynomial-time learning algo- 
rithm for concept classes Cy, which tolerates 
a malicious noise rate B = * and returns a 
hypothesis $H \in \mathcal{C}_n$. Then A can be converted
into a polynomial-time algorithm which, for
any sample S, with high probability returns
a hypothesis $H \in \mathcal{C}_n$ such that
$err(S, H) \le (1 + \alpha) r \cdot \min_{C \in \mathcal{C}_n} err(S, C)$.


The computational hardness of several such 
related combinatorial optimization problems was 
shown by Ben-David, Eiron, and Long [3]. Some 
particular concept classes for which learning with 
malicious noise has been considered are monomi- 
als, CNF and DNF formulas [9, 14], symmetric 
functions and decision lists [9], multiple intervals 
on the real line [7], and halfspaces [11]. 


Learning with Malicious Noise 


Applications 


Several extensions of the learning model with 
malicious noise have been proposed, in particular 
the agnostic learning model [10] and the statis- 
tical query model [1]. The following relations 
between these models and the malicious noise 
model have been established: 


Theorem 6 ([10]) If concept class $\mathcal{C}$ is
polynomial-time learnable in the agnostic model,
then $\mathcal{C}$ is polynomial-time learnable with any
malicious noise rate $\beta \le \varepsilon/2$.


Theorem 7 ([1]) If $\mathcal{C}$ is learnable from (relative
error) statistical queries, then $\mathcal{C}$ is learnable with
any malicious noise rate $\beta \le \varepsilon / \log^p(1/\varepsilon)$ for a
suitably large p independent of $\mathcal{C}$.


Another learning model related to the malicious
noise model is learning with nasty noise [6].
In this model, examples affected by malicious
noise are not chosen at random with probability
$\beta$; instead, an adversary may manipulate an arbitrary
fraction of $\beta m$ examples out of a given sample
of size m. The malicious noise model was also
considered in the context of online learning [2]
and boosting [12]. A variant of the malicious
noise model for unsupervised learning has been
investigated in [5]. In this model, noisy data
points are again replaced by arbitrary points, possibly
generated by an adversary. Still, a correct
clustering of the points can be learned if the noise
rate is moderate.
Problem Definition


In the exact learning model of Angluin [2], a
learning algorithm A must discover an unknown
function $f : \{0,1\}^n \to \{0,1\}$ that is a member
of a known class C of Boolean functions. The
learning algorithm can make at least one of the
following types of queries about f:


• Equivalence query $EQ_f(g)$, for a candidate
function g:
The reply is either “yes,” if $g \Leftrightarrow f$, or a
counterexample a with $g(a) \ne f(a)$, otherwise.

• Membership query $MQ_f(a)$, for some $a \in \{0,1\}^n$:
The reply is the Boolean value f(a).

• Subset query $SubQ_f(g)$, for a candidate function g:
The reply is “yes,” if $g \Rightarrow f$, or a
counterexample a with $f(a) < g(a)$, otherwise.

• Superset query $SupQ_f(g)$, for a candidate
function g:
The reply is “yes,” if $f \Rightarrow g$, or a
counterexample a with $g(a) < f(a)$, otherwise.
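For very small n, the four query types can be simulated by exhaustive search over $\{0,1\}^n$. The class below is a hypothetical teaching aid, not an interface from [2] or [5]: the name `QueryOracle` is an assumption, a reply of `None` stands for “yes,” and any other reply is a counterexample.

```python
from itertools import product

class QueryOracle:
    """Brute-force oracle for an unknown f: {0,1}^n -> {0,1} (tiny n only)."""

    def __init__(self, f, n):
        self.f, self.n = f, n

    def _inputs(self):
        return product((0, 1), repeat=self.n)

    def MQ(self, a):
        """Membership query: the value f(a)."""
        return self.f(a)

    def EQ(self, g):
        """Equivalence query: None if g == f, else some a with g(a) != f(a)."""
        return next((a for a in self._inputs() if g(a) != self.f(a)), None)

    def SubQ(self, g):
        """Subset query: None if g => f, else some a with f(a) < g(a)."""
        return next((a for a in self._inputs() if self.f(a) < g(a)), None)

    def SupQ(self, g):
        """Superset query: None if f => g, else some a with g(a) < f(a)."""
        return next((a for a in self._inputs() if g(a) < self.f(a)), None)

# Example: the unknown function is x0 AND x1 on three variables.
oracle = QueryOracle(lambda a: a[0] & a[1], n=3)
g = lambda a: a[0]            # candidate that is too general
```

Here the candidate `g` passes the superset query but fails the subset query with the counterexample (1, 0, 0); a candidate passes the equivalence query exactly when it passes both.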


A disjunctive normal formula (DNF) is a 
depth-2 OR-AND circuit whose size is given 
by the number of its AND gates. Likewise, a 
conjunctive normal formula (CNF) is a depth- 
2 AND-OR circuit whose size is given by the 
number of its OR gates. Any Boolean function
can be represented as either a DNF or a CNF
formula. A k-DNF is a DNF where each AND
gate has a fan-in of at most k; similarly, we may
define a k-CNF.


Problem For a given class C of Boolean 
functions, such as polynomial-size Boolean 
circuits or disjunctive normal form (DNF) 
formulas, the goal is to design polynomial-time
learning algorithms that learn any unknown $f \in C$ and
ask a polynomial number of queries. The output
of the learning algorithm should be a function
g of polynomial size satisfying $g \Leftrightarrow f$. The
polynomial functions bounding the running time, 
query complexity, and output size are defined in 
terms of the number of inputs n and the size of 
the smallest representation (Boolean circuit or 
DNF) of the unknown function f. 


Key Results 


One of the main results proved in [5] is that 
Boolean circuits and disjunctive normal formulas 
are exactly learnable using equivalence queries 
and access to an NP oracle. 


Theorem 1 The following tasks can be accom- 
plished with probabilistic polynomial-time algo- 
rithms that have access to an NP oracle and make 
polynomially many equivalence queries: 


• Learning DNF formulas of size s using equivalence
queries that are depth-3 AND-OR-AND
formulas of size $O(sn^2/\log^2 n)$.

• Learning Boolean circuits of size s using
equivalence queries that are circuits of size
$O(sn + n \log n)$.


The idea behind this result is simple. Any class 
C of Boolean functions is exactly learnable with 
equivalence queries using the Halving algorithm 
of Littlestone [11]. This algorithm asks equiva- 
lence queries that are the majority of candidate 
functions from C. These are functions in C that 
are consistent with the counterexamples obtained 
so far by the learning algorithm. Since each such 
majority query eliminates at least half of the candidate
functions, $\log_2 |C|$ equivalence queries are
sufficient to learn any function in C. A problem
with using the Halving algorithm here is that
the majority query has exponential size. But it
can be shown that a majority of a polynomial
number of uniformly random candidate functions
is a good enough approximator to the majority of
all candidate functions. Moreover, with access to
an NP oracle, there is a randomized polynomial-time
algorithm for generating uniformly random
candidate functions due to Jerrum, Valiant, and
Vazirani [7]. This yields the result.
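The plain Halving algorithm described above is easy to simulate when the class is small enough to enumerate. The sketch below makes that assumption explicitly: functions are stored as truth tables, the majority is taken over all remaining candidates, and the equivalence query is answered by comparing against the target. It therefore illustrates only the halving argument, not the sampling-based version with an NP oracle; the name `halving_learn` and the toy class of monotone conjunctions are illustrative.

```python
from itertools import combinations, product

def halving_learn(concept_class, inputs, target):
    """Learn `target` with majority-vote equivalence queries.

    Functions are dicts mapping each input tuple to a bit.
    Returns the final hypothesis and the number of queries asked.
    """
    version_space = list(concept_class)
    queries = 0
    while True:
        # Majority vote of all candidates consistent with the answers so far.
        majority = {a: int(2 * sum(c[a] for c in version_space) >= len(version_space))
                    for a in inputs}
        queries += 1
        counterexample = next((a for a in inputs if majority[a] != target[a]), None)
        if counterexample is None:
            return majority, queries
        # At least half of the candidates agreed with the wrong majority value.
        version_space = [c for c in version_space
                         if c[counterexample] == target[counterexample]]

n = 4
inputs = list(product((0, 1), repeat=n))
# Toy class: the 16 monotone conjunctions over 4 variables.
concept_class = [{a: int(all(a[i] for i in subset)) for a in inputs}
                 for size in range(n + 1)
                 for subset in combinations(range(n), size)]
target = concept_class[7]
hypothesis, queries = halving_learn(concept_class, inputs, target)
```

Since every failed query removes at least half of the 16 candidates, at most $\log_2 16 = 4$ queries can fail, so at most 5 queries are asked in total.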

The next observation is that subset and superset
queries are apparently powerful enough to
simulate both equivalence queries and the NP
oracle. This is easy to see since the tautology test
$g \Leftrightarrow 1$ is equivalent to $SubQ_f(\bar{g}) \wedge SubQ_{\bar{f}}(\bar{g})$, for
any unknown function f; and $EQ_f(g)$ is equivalent
to $SubQ_f(g) \wedge SupQ_f(g)$. Thus, the following
generalization of Theorem 1 is obtained.


Theorem 2 The following tasks can be accom- 
plished with probabilistic polynomial-time algo- 
rithms that make polynomially many subset and 
superset queries: 


• Learning DNF formulas of size s using equivalence
queries that are depth-3 AND-OR-AND
formulas of size $O(sn^2/\log^2 n)$.

• Learning Boolean circuits of size s using
equivalence queries that are circuits of size
$O(sn + n \log n)$.


Stronger deterministic results are obtained by
allowing more powerful complexity-theoretic oracles.
The first of these results employs techniques
developed by Sipser and Stockmeyer [12, 13].


Theorem 3 The following tasks can be accomplished
with deterministic polynomial-time algorithms
that have access to a $\Sigma_3^p$ oracle and make
polynomially many equivalence queries:


• Learning DNF formulas of size s using equivalence
queries that are depth-3 AND-OR-AND
formulas of size $O(sn^2/\log^2 n)$.

• Learning Boolean circuits of size s using
equivalence queries that are circuits of size
$O(sn + n \log n)$.


In the following result, C is an infinite class
of functions containing functions of the form
$f : \{0,1\}^* \to \{0,1\}$. The class C is p-evaluatable if
the following tasks can be performed in polynomial
time:


Learning with the Aid of an Oracle 


• Given y, is y a valid representation for any
function $f_y \in C$?

• Given a valid representation y and $x \in \{0,1\}^*$,
is $f_y(x) = 1$?


Theorem 4 Let C be any p-evaluatable class. 
The following statements are equivalent: 


• C is learnable from polynomially many equivalence
queries of polynomial size (and unlimited
computational power).

• C is learnable in deterministic polynomial
time with equivalence queries and access to
a $\Sigma_5^p$ oracle.


For exact learning with membership queries, 
the following results are proved. 


Theorem 5 The following tasks can be accomplished
with deterministic polynomial-time algorithms
that have access to an NP oracle and make
polynomially many membership queries (in n and the
DNF and CNF sizes of f, where f is the unknown
function):


• Learning monotone Boolean functions.

• Learning $O(\log n)$-CNF $\cap$ $O(\log n)$-DNF.


The ideas behind the above result use techniques
from [2, 4]. For a monotone Boolean function
f, the standard closure algorithm uses both
equivalence and membership queries to learn f
using candidate functions g satisfying $g \Rightarrow f$.
The need for membership queries can be removed using
the following observation. Viewing $\neg f$ as a
monotone function on the inverted lattice, we can
learn f and $\neg f$ simultaneously using candidate
functions g, h, respectively, that satisfy $g \Rightarrow h$.
The NP oracle is used to obtain an example a
that either helps in learning f or in learning $\neg f$;
when no such example can be found, we have
learned f.


Theorem 6 Any class C of Boolean functions
that is exactly learnable using a polynomial
number of membership queries (and unlimited
computational power) is exactly learnable in
expected polynomial time using a polynomial
number of membership queries and access to an
NP oracle.

Moreover, any p-evaluatable class C
that is exactly learnable from a polynomial
number of membership queries (and unlimited
computational power) is also learnable in
deterministic polynomial time using a polynomial
number of membership queries and access to a
$\Sigma_5^p$ oracle.


Theorems 4 and 6 show that information-theoretic
learnability using equivalence and
membership queries can be transformed into
computational learnability at the expense of using
the $\Sigma_5^p$ and NP oracles, respectively.


Applications 


The learning algorithm for Boolean circuits using 
equivalence queries and access to an NP oracle 
has found an application in complexity theory. 
Watanabe (see [10]) showed an improvement 
on a known theorem of Karp and Lipton 
[8]: if NP has polynomial-size circuits, then 
the polynomial-time hierarchy PH collapses 
to $ZPP^{NP}$. Subsequently, Aaronson (see [1])
showed that queries to the NP oracle used in 
the learning algorithm (for Boolean circuits) 
cannot be parallelized by any relativizing 
techniques. 

Some techniques developed in Theorem 5 
for exact learning using membership queries 
of monotone Boolean functions have found 
applications in data mining [6]. 


Open Problems 


It is unknown if there are polynomial-time 
learning algorithms for Boolean circuits and 
DNF formulas using equivalence queries 
(without complexity-theoretic oracles). There
is strong cryptographic evidence that Boolean
circuits are not learnable in polynomial time
(see [3] and the references therein). The best
running time for learning DNF formulas is
$2^{\tilde{O}(n^{1/3})}$, as given by Klivans and Servedio [9].
It is unclear if membership queries help in this 
case. 
Problem Definition


In the last forty years, there has been tremendous
progress in the field of computer algorithms,
especially within the core area known as com- 
binatorial algorithms. Combinatorial algorithms 
deal with objects such as lists, stacks, queues, se- 
quences, dictionaries, trees, graphs, paths, points, 
segments, lines, convex hulls, etc., and constitute
the basis for several application areas including 
network optimization, scheduling, transport opti- 
mization, CAD, VLSI design, and graphics. For 
over thirty years, asymptotic analysis has been 
the main model for designing and assessing the 
efficiency of combinatorial algorithms, leading to 
major algorithmic advances. 

Despite so many breakthroughs, however, very 
little had been done (at least until 15 years ago) 
about the practical utility and assessment of this 
wealth of theoretical work. The main reason for 
this lack was the absence of a standard algorithm 
library, that is, of a software library that contains 
a systematic collection of robust and efficient im- 
plementations of algorithms and data structures, 
upon which other algorithms and data structures 
can be easily built. 

The lack of an algorithm library severely limits
the impact that combinatorial algorithms
can have. The continuous
re-implementation of basic algorithms and data
structures slows down progress and typically
discourages people from making the (additional)
effort to use an efficient solution, especially if
such a solution cannot be re-used. This makes the
migration of scientific discoveries into practice
a very slow process.

The major difficulty in building a library of 
combinatorial algorithms stems from the fact that 
such algorithms are based on complex data types, 
which are typically not encountered in program- 
ming languages (i.e., they are not built-in types). 
This is in sharp contrast with other computing 
areas such as statistics, numerical analysis, and 
linear programming. 


Key Results 


The currently most successful algorithm library 
is LEDA (Library for Efficient Data types and 
Algorithms) [4, 5]. It contains a very large collec- 
tion of advanced data structures and algorithms 
for combinatorial and geometric computing. The 
development of LEDA started in the early 1990s, 
it reached a very mature state in the late 1990s, 
and it continues to grow. LEDA has been written 
in C++ and has benefited considerably from the 
object-oriented paradigm. 

Four major goals have been set in the design 
of LEDA. 


1. Ease of use: LEDA provides a sizable collection
of data types and algorithms in a form
that can be readily used by non-experts.
It gives a precise and readable specification for
each data type and algorithm, which is short,
general, and abstract (to hide the details of
implementation). Most data types in LEDA are
parameterized (e.g., the dictionary data type
works for arbitrary key and information types).
To access the objects of a data structure by
position, LEDA introduced the item concept,
which casts positions into an abstract form.

2. Extensibility: LEDA is easily extensible by 
means of parametric polymorphism and can 
be used as a platform for further software 
development. Advanced data types are built 
on top of basic ones, which in turn rest on
a uniform conceptual framework and solid 
implementation principles. The main mecha- 
nism to extend LEDA is through the so-called 
LEDA extension packages (LEPs). A LEP 
extends LEDA into a particular application 
domain and/or area of algorithms that is not 
covered by the core system. Currently, there 
are 15 such LEPs; for details see [1]. 

3. Correctness: In LEDA, programs should 
give sufficient justification (proof) for their 
answers to allow the user of a program to 
easily assess its correctness. Many algorithms 
in LEDA are accompanied by program 
checkers. A program checker C for a program 
P is a (typically very simple) program that 
takes as input the input of P, the output of P, 
and perhaps additional information provided 
by P, and verifies that the answer of P is
indeed the correct one.

4. Efficiency: The implementations in LEDA 
are usually based on the asymptotically 
most efficient algorithms and data structures 
that are known for a problem. Quite 
often, these implementations have been 
fine-tuned and enhanced with heuristics 
that considerably improve running times. 
This makes LEDA not only the most 
comprehensive platform for combinatorial 
and geometric computing, but also a library 
that contains the currently fastest implemen- 
tations. 
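The idea of a program checker (goal 3 above) can be illustrated with a small, self-contained Python sketch. It is not LEDA code and does not use LEDA's API; all names are hypothetical. The program P tests whether a graph is bipartite and returns a witness either way: a 2-coloring, or an odd cycle. The checker C only inspects the input and the witness, so it is much simpler than P.

```python
from collections import deque

def bipartite_with_witness(adj):
    """Program P: return ('bipartite', coloring) or ('odd_cycle', vertex list)."""
    color, parent = {}, {}
    for s in adj:
        if s in color:
            continue
        color[s], parent[s] = 0, None
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    return 'odd_cycle', _tree_cycle(u, v, parent)
    return 'bipartite', color

def _tree_cycle(u, v, parent):
    """Cycle formed by the BFS tree paths from u and v to their common ancestor."""
    path_u, path_v = [u], [v]
    while parent[path_u[-1]] is not None:
        path_u.append(parent[path_u[-1]])
    while parent[path_v[-1]] is not None:
        path_v.append(parent[path_v[-1]])
    # Drop shared ancestors above the lowest common one.
    while len(path_u) > 1 and len(path_v) > 1 and path_u[-2] == path_v[-2]:
        path_u.pop()
        path_v.pop()
    return path_u + path_v[-2::-1]

def check_bipartite_answer(adj, answer):
    """Checker C: verify the witness against the input graph."""
    kind, witness = answer
    if kind == 'bipartite':
        return (set(witness) >= set(adj) and
                all(witness[u] != witness[v] for u in adj for v in adj[u]))
    # An odd closed walk along existing edges certifies non-bipartiteness.
    closed = zip(witness, witness[1:] + witness[:1])
    return len(witness) % 2 == 1 and all(v in adj[u] for u, v in closed)

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
triangle_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```

A user who distrusts P only needs to trust the few lines of the checker: a claimed 2-coloring is verified edge by edge, and a claimed odd cycle is verified by walking it.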


Since 1995, LEDA has been maintained by
Algorithmic Solutions Software GmbH [1], which is
responsible for its distribution in academia and
industry.

Other efforts for algorithm libraries include 
the Standard Template Library (STL) [7], the 
Boost Graph Library [2, 6], and the Computa- 
tional Geometry Algorithms Library (CGAL) [3]. 

STL [7] (introduced in 1994) is a library of 
interchangeable components for solving many 
fundamental problems on sequences of elements, 
which has been adopted into the C++ standard. 
It contributed the iterator concept which pro- 
vides an interface between containers (an object 
that stores other objects) and algorithms. Each 
algorithm in STL is a function template param-
eterized by the types of iterators upon which it 
operates. Any iterator that satisfies a minimum set 
of requirements can be used regardless of the data 
structure accessed by the iterator. The system- 
atic approach used in STL to build abstractions 
and interchangeable components is called generic 
programming. 

The Boost Graph Library [2, 6] is a C++ 
graph library that applies the notions of generic 
programming to the construction of graph algo- 
rithms. Each graph algorithm is written not in 
terms of a specific data structure, but instead in 
terms of a graph abstraction that can be easily im- 
plemented by many different data structures. This 
offers the programmer the flexibility to use graph 
algorithms in a wide variety of applications. The 
first release of the library became available in 
September 2000. 
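The principle of writing a graph algorithm against an abstraction rather than a concrete data structure can be sketched outside C++ as well. The Python example below is only an analogy and does not reproduce the Boost Graph Library interface: the single assumed abstraction is a `neighbors` callable, and the same breadth-first search then runs unchanged on an adjacency dictionary and on an implicit grid graph that is never stored.

```python
from collections import deque

def bfs_distances(source, neighbors):
    """Breadth-first search written only in terms of a `neighbors` callable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Representation 1: an explicit adjacency dictionary.
adjacency = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
from_dict = bfs_distances('a', adjacency.__getitem__)

# Representation 2: an implicit 3 x 3 grid graph, generated on demand.
def grid_neighbors(cell, width=3, height=3):
    x, y = cell
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(i, j) for i, j in steps if 0 <= i < width and 0 <= j < height]

from_grid = bfs_distances((0, 0), grid_neighbors)
```

The algorithm makes no assumption about how the graph is stored, which is the flexibility that generic programming aims for.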

The Computational Geometry Algorithms Li- 
brary [3] is another C++ library that focuses 
on geometric computing only. Its main goal is 
to provide easy access to efficient and reliable 
geometric algorithms to users in industry and 
academia. The CGAL library started in 1996 and 
the first release was in April 1998. 

Among all libraries mentioned above, LEDA is
by far the best (both in quality and efficiency of
implementations) regarding combinatorial computing.
It is worth mentioning that the later versions
of LEDA have also incorporated the iterator
concept of STL.

Finally, a notable effort concerns the Stony 
Brook Algorithm Repository [8]. This is not 
an algorithm library, but a comprehensive col- 
lection of algorithm implementations for over 
seventy problems in combinatorial computing, 
started in 2001. The repository features implementations
coded in different programming languages,
including C, C++, Java, Fortran, Ada,
Lisp, Mathematica, and Pascal.


Applications 

An algorithm library for combinatorial and geometric
computing has a wealth of applications
in a wide variety of areas, including network
optimization, scheduling, transport optimization
and control, VLSI design, computer graphics,
scientific visualization, computer-aided design and
modeling, geographic information systems, text
and string processing, text compression, cryptography,
molecular biology, medical imaging,
robotics and motion planning, and mesh partition
and generation.


Open Problems 


Algorithm libraries usually do not provide an in- 
teractive environment for developing and experi- 
menting with algorithms. An important research 
direction is to add an interactive environment into 
algorithm libraries that would facilitate the devel- 
opment, debugging, visualization, and testing of 
algorithms. 


Experimental Results 


There are numerous experimental studies based
on LEDA, STL, Boost, and CGAL, most of
which can be found on the World Wide Web. Also,
the web sites of some of the libraries contain
pointers to experimental work.


URL to Code 


The aforementioned algorithm libraries can be
downloaded from their corresponding web sites,
the details of which are given in the bibliography
(Recommended Reading).


Problem Definition


Lossless data compression is concerned with 
compactly representing data in a form that allows 
the original data to be faithfully recovered. 
Reduction in space can be achieved by exploiting 
the presence of repetition in the data.

Many of the main solutions for lossless data
compression in the last three decades have been
based on techniques first described by Ziv and
Lempel [22, 33, 34]. These methods gained popularity
in the 1980s via tools like Unix compress
and the GIF image format, and today they pervade
computer software, for example, in the zip,
gzip, and lzma compression utilities, in modem
compression standards V.42bis and V.44, and as 
a basis for the compression used in information 
retrieval systems [8] and Google’s BigTable [4] 
database system. Perhaps the primary reason for 
the success of Lempel and Ziv’s methods is their 
powerful combination of compression effective- 
ness and compression/decompression through- 
put. We refer the reader to [3,29] for a review of 
related dictionary-based compression techniques. 


Key Results 


Let $S[1 \ldots n]$ be a string of n symbols drawn
from an alphabet $\Sigma$. Lempel-Ziv-based compression
algorithms work by parsing S into a sequence
of substrings called phrases (or factors).
To achieve compression, each phrase is replaced 
by a compact representation, as detailed below. 


LZ78 

Assume the encoder has already parsed
the phrases $S_1, S_2, \ldots, S_{i-1}$, that is,
$S = S_1 S_2 \ldots S_{i-1} S'$ for some suffix S' of S. The
LZ78 [34] dictionary is the set of strings
obtained by adding a single symbol to one of
the strings $S_j$ or to the empty string. The next
phrase $S_i$ is then the longest prefix of S' that
is an element of the dictionary. For example,
S = bbaabbbabbabbaab has an LZ78 parsing
of b, ba, a, bb, bab, babb, aa, b. Clearly, all
LZ78 phrases will be distinct, except possibly the
final one. Let $S_0$ denote the empty string. If
$S_i = S_j a$, where $0 \le j < i$ and $a \in \Sigma$, the code word
emitted by LZ78 for $S_i$ will be the pair $(j, a)$.
Thus, if LZ78 parses the string S into y words, its
output is bounded by $y \log y + y \log |\Sigma| + O(y)$
bits.
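
The LZ78 parsing rule can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name lz78_parse and the dictionary-based trie are choices made here, and the final phrase (which may repeat an earlier one) is assumed to re-emit the code word of the phrase it repeats.

```python
def lz78_parse(s):
    """Return (phrases, codes) for the LZ78 parsing of s.

    codes[i] is the pair (j, a): phrase i+1 equals phrase j followed by
    symbol a, where phrase 0 is the empty string.
    """
    trie = {}                 # (parent phrase index, symbol) -> phrase index
    phrases, codes = [], []
    i, n = 0, len(s)
    while i < n:
        node, start = 0, i    # node 0 represents the empty phrase S_0
        # Follow the trie while the next symbol extends a known phrase.
        while i < n and (node, s[i]) in trie:
            node = trie[(node, s[i])]
            i += 1
        if i < n:
            # New phrase: known phrase `node` plus one fresh symbol.
            trie[(node, s[i])] = len(phrases) + 1
            codes.append((node, s[i]))
            i += 1
        else:
            # Input ended inside a known phrase: repeat that phrase's code.
            codes.append(codes[node - 1])
        phrases.append(s[start:i])
    return phrases, codes


phrases, codes = lz78_parse("bbaabbbabbabbaab")
print(phrases)
```

On the example string from the text, the phrases printed are b, ba, a, bb, bab, babb, aa, b. Because each step walks the trie once per symbol, the total work is linear in the length of the input, assuming constant-time dictionary lookups.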


LZ77

The LZ77 parsing algorithm takes a single positive
integer parameter w, called the window size.
Say the encoder has already parsed the phrases
$S_1, S_2, \ldots, S_{i-1}$, that is, $S = S_1 S_2 \ldots S_{i-1} S'$
for some suffix S' of S. The next phrase $S_i$ starts
at position $q = |S_1 S_2 \ldots S_{i-1}| + 1$ in S and
is the shortest prefix of S' that does not have an
occurrence starting at any position $q - w \le p_i < q$
in S. Thus defined, LZ77 phrases have the form
$ta$, where t (possibly empty) is the longest prefix
of S' that also has an occurrence starting at some
position $q - w \le p_i < q$ in S and a is the symbol
$S[q + |t|]$ that immediately follows t.

The version of LZ77 described above is of- 
ten called sliding window LZ77: a text window 
of length w that slides along the string during 
parsing is used to decide the next phrase. In the 
so-called infinite window LZ77, we enforce that
$q - w$ is always equal to 0; in other words,
throughout parsing the window stretches all the
way back to the beginning of the string. Infinite 
window parsing is more powerful than sliding 
window parsing, and for the remainder of this 
entry, the term “LZ77” refers to infinite window 
LZ77, unless explicitly stated. 

For S = bbaabbbabbabbaab, the infinite
window LZ77 parsing is b, ba, ab, bbab,
babbaa, b. Note that phrases are allowed to
overlap their definitions, as is the case with phrase
$S_5$ in our example. Like LZ78, all LZ77 phrases
are distinct, with the possible exception of the
last phrase. It is easy to see that infinite window
LZ77 never produces more phrases than LZ78.
If $S_i = ta$ with $a \in \Sigma$, the
code word for $S_i$ is the triple $(p_i, \ell_i, a)$, where
$p_i$ is the position of a previous occurrence of t in
$S_1 S_2 \ldots S_{i-1}$ and $\ell_i = |t|$.
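
To make the definition concrete, the following Python sketch computes the infinite window LZ77 parsing by brute force. It is only an illustration under stated assumptions: the name lz77_parse is chosen here, the search is quadratic or worse (the linear-time methods cited later in this entry rely on suffix arrays instead), ties between candidate sources are broken toward the rightmost one, and the final phrase always reserves its last symbol as the explicit symbol a.

```python
def lz77_parse(s):
    """Infinite-window LZ77 parsing of s as triples (p, length, a).

    p is the 1-indexed start of an earlier occurrence of the copied
    part t (0 when t is empty), length = |t|, and a is the symbol
    that follows t.
    """
    triples = []
    q, n = 0, len(s)                  # q: 0-indexed start of next phrase
    while q < n:
        best_len, best_pos = 0, 0
        for p in range(q):            # every source position left of q
            length = 0
            # The copy may run past q (self-overlap); always leave one
            # symbol of the input to serve as the explicit symbol a.
            while q + length < n - 1 and s[p + length] == s[q + length]:
                length += 1
            if length > 0 and length >= best_len:   # >= prefers rightmost
                best_len, best_pos = length, p + 1
        triples.append((best_pos, best_len, s[q + best_len]))
        q += best_len + 1
    return triples


def phrases_of(s, triples):
    """Cut s into the phrases described by the triples."""
    out, q = [], 0
    for _, length, _ in triples:
        out.append(s[q:q + length + 1])
        q += length + 1
    return out


t = lz77_parse("bbaabbbabbabbaab")
print(phrases_of("bbaabbbabbabbaab", t))
```

For the example string this yields the phrases b, ba, ab, bbab, babbaa, b; the fifth triple is (7, 5, a), whose source overlaps the phrase being defined.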

Finally, it is important to note that for a given
phrase $S_i = ta$, there is sometimes more than
one previous occurrence of t, leading to a choice
of $p_i$ value. If $p_i < q$ is the largest possible
for every phrase, then we call the parsing
rightmost. In their study on the bit complexity
of LZ compression, Ferragina et al. [9] showed
that the rightmost parsing can lead to encodings
asymptotically smaller than what is achievable
otherwise.


Compression and Decompression Complexity

The LZ78 parsing for a string of length n can
be computed in O(n) time by maintaining the
phrases in a trie. Doing so allows finding, in time
proportional to its length, the longest prefix of the
unparsed portion that is in the dictionary, and so
the time overall is linear in n. Compression in
sublinear time and space is possible using succinct
dynamic tries [16]. Decoding is somewhat
symmetric: an explicit trie is not needed, just
the parent pointers implicitly represented in the
encoded pairs.

Recovering the original string S from its
LZ77 parsing (i.e., decompression) is very easy:
$(p_i, \ell_i, a)$ is decoded by copying the symbols
of the substring $S[p_i \ldots p_i + \ell_i - 1]$ and then
appending a. By the definition of the parsing, any
symbol we need to copy will have already been
decoded (if we copy the strings left to right).
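
A short Python sketch of this decoding step follows. The function name lz77_decode and the 1-indexed position convention (with position 0 standing for an empty copy) are assumptions made for the illustration. Copying one symbol at a time, left to right, is what makes self-overlapping phrases work.

```python
def lz77_decode(triples):
    """Rebuild the string from LZ77 triples (p, length, a), p 1-indexed."""
    out = []
    for p, length, a in triples:
        start = p - 1
        for k in range(length):
            # Symbol-by-symbol copy: when the source overlaps the phrase
            # being written, out[start + k] was appended moments ago.
            out.append(out[start + k])
        out.append(a)
    return "".join(out)


triples = [(0, 0, "b"), (1, 1, "a"), (3, 1, "b"),
           (1, 3, "b"), (7, 5, "a"), (0, 0, "b")]
print(lz77_decode(triples))
```

The triples above describe the example string bbaabbbabbabbaab; the fifth one copies five symbols starting at position 7 even though only three of them exist when the copy begins.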

Obtaining the LZ77 parsing in the first place 
is not as straightforward and has been the subject 
of intense research since LZ77 was published. 
Indeed, it seems safe to speculate that LZ78 and 
sliding window LZ77 were invented primarily 
because it was initially unclear how infinite window
LZ77 could be computed efficiently. Today,
parsing is possible in worst-case O(n) time, using
little more than $n(\log n + \log |\Sigma|)$ bits of space.
Current state-of-the-art methods [14, 15, 18, 19]
operate offline, combining the suffix array [24] of
the input string with data structures for answering
next and previous smaller value queries [10].
For online parsing, the current best algorithm
uses $O(n \log n)$ time and $O(n \log |\Sigma|)$ bits of
space [32].


Compression Effectiveness 

It is well known that LZ converges to the entropy 
of any ergodic source [6, 30,31]. However, it is 
also possible to prove compression bounds on 
LZ-based schemes without probabilistic assump- 
tions on the input, using the notion of empirical 
entropy [25]. 


Convergence to Empirical Entropy 
For any string S, the kth-order empirical entropy
$H_k(S)$ is a lower bound on the compression
achievable by any compressor that assigns code
words to symbols based on statistics derived from
the k letters preceding each symbol in the string.
In particular, the output of LZ78 (and so LZ77)
is upper-bounded by $|S| H_k(S) + o(|S| \log |\Sigma|)$
bits [20] for any $k = o(\log_{|\Sigma|} |S|)$.
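
The entry does not spell out how $H_k(S)$ is computed, so the following Python sketch uses the standard definition from the literature on empirical entropy: $H_0$ is the Shannon entropy of the symbol frequencies, and $H_k$ averages the $H_0$ of the symbols that follow each length-k context. The function names h0 and hk are chosen here for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum(c / n * log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """kth-order empirical entropy of s, in bits per symbol."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        # Group each symbol by the k symbols that precede it.
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


print(h0("abab"), hk("abab", 1))
```

For the string abab, the zeroth-order entropy is 1 bit per symbol, while the first-order entropy is 0 because each symbol is fully determined by its predecessor.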


Relationship to Grammar Compression 

The smallest grammar problem is the problem 
of finding the smallest context-free grammar that 
generates only a given input string S. The size g* 
of the smallest grammar is a rather elegant mea- 
sure of compressibility, and Charikar et al. [5] 
established that finding it is NP-hard. They also 
considered several approximation algorithms, including
LZ78. The LZ78 parsing of S can be
viewed as a context-free grammar in which for
each dictionary word $S_i = S_j a$, there is a production
rule $X_i \to X_j a$. LZ78's approximation
ratio is rather bad: $\Omega(n^{2/3}/\log n)$.

Charikar et al. also showed that g* is at least 
the number of phrases z of the LZ77 parse of 
S and used the phrases of the parsing to derive 
a new grammar compression algorithm with ap- 
proximation ratio O(log(|S|/g*)). The same re- 
sult was discovered contemporaneously by Ryt- 
ter [28] and later simplified by Jez [17]. 


Greedy Versus Non-greedy Parsing 

LZ78 and LZ77 are both greedy algorithms: they 
select, at each step, the longest prefix of the 
remaining suffix of the input that is in the dic- 
tionary. For LZ77, the greedy strategy is optimal 
in the sense that it yields the minimum number 
of code words. However, if variable-length codes 
are used to represent each element of the code 
word triple, the greedy strategy does not yield 
an optimal parsing, as Ferragina, Nitto, and Venturini
have recently established [9]. For LZ78,
greedy parsing does not always produce the minimum
number of phrases. Indeed, in the worst
case, greedy parsing can produce a factor $O(\sqrt{n})$
more phrases than a simple non-greedy parsing strategy
that, instead of choosing the prefix that gives the
longest extension in the current iteration, chooses
the prefix that gives the longest extension in the
next iteration [26]. There are many variants
of LZ parsing that relax the greedy condition
with the aim of reducing the overall encoding size
in practice. Several of these non-greedy methods
are covered in textbooks (e.g., [3, 29]).


Applications 


As outlined at the start of this entry, the major
applications of Lempel and Ziv's methods are in
the field of lossless data compression. However,
the deep connections of these methods to string 
data mean the Lempel-Ziv parsing has also found 
important applications in string processing: the 
parsing reveals a great deal of information about 
the repetitive structure of the underlying string, 
and this can be used to design efficient algorithms 
and data structures. 


Pattern Matching in Compressed Space 
A compressed full-text index for a string
$S[1 \ldots n]$ is a data structure that takes space
proportional to the entropy of S while 
simultaneously supporting efficient queries over 
S. The supported queries can be relatively 
simple, such as random access to symbols of 
S, or more complex, such as reporting all the 
occurrences of a pattern P[1...m] in S. 
Arroyuelo et al. [2] describe a compressed
index based on LZ78. For any text S, their index
uses $(2 + \epsilon) n H_k(S) + o(n \log |\Sigma|)$ bits of
space and reports all c occurrences of P in S
in $O(m^2 \log m + (m + c) \log n)$ time. Their approach
stores two copies of the LZ78 dictionary
represented as tries. One trie contains the dictionary
phrases, and the other contains the reverse
phrases. The main trick to pattern matching is to
then split the pattern in two (in all m possible
ways) and then check for each half in the tries.
Kreft and Navarro [21] describe a compressed
index based on the LZ77 parsing. It requires
$3z \log n + O(z \log |\Sigma|) + o(1)$ bits of space and
supports extraction of $\ell$ symbols in $O(\ell h)$ time
and pattern matching in $O(m^2 h + m \log z + c \log z)$
time, where $h < \sqrt{n}$ is the maximum
length of a referencing chain in the parsing (a
position is copied from another, that one from
another, and so on, h times). More recently,
Gagie et al. [12] describe a different LZ77-based
index with improved query bounds that takes
$O(z \log n \log(n/z))$ bits of space and supports
extraction of $\ell$ symbols in $O(\ell + \log n)$ time
and pattern matching in $O(m \log m + c \log \log n)$
time. A related technique called indexing by kernelization,
which does not support access and
restricts the maximum pattern length that can be 
searched for, has recently emerged as a promising 
practical approach applicable to highly repetitive 
data [11]. This technique uses an LZ parsing to 
identify a subsequence (called a kernel) of the 
text that is guaranteed to contain at least one 
occurrence of every pattern the text contains. This 
kernel string is then further processed to obtain an 
index capable of searching the original text. 

A related problem is the compressed matching 
problem, in which the text and the pattern are 
given together and the text is compressed. The 
task here is to perform pattern matching in the 
compressed text without decompressing it. For 
LZ-based compressors, this problem was first 
considered by Amir, Benson, and Farach [1]. 
Considerable progress has been made since then, 
and we refer the reader to [13,27] (and to another 
encyclopedia entry) for an overview of more 
recent results. 


String Alignment 

Crochemore, Landau, and Ziv-Ukelson [7] used
LZ78 to accelerate sequence alignment: the problem
of finding the lowest-cost sequence of edit
operations that transforms one string $S[1 \ldots n]$
into another string $T[1 \ldots n]$. Masek and Paterson
proposed an $O(n^2/\log n)$ time algorithm that
applies when the costs of the edit operations are
rational. Crochemore et al.'s method runs in the
same time in the worst case, but allows real-valued
costs, and obtains an asymptotic speedup
when the underlying texts are compressible.

The textbook solution to the string alignment
problem runs in $O(n^2)$ time, using a straightforward
dynamic program that computes a matrix
$M[1 \ldots n, 1 \ldots n]$. The approach of the faster
algorithms is to break the dynamic program
matrix into blocks. Masek and Paterson use
blocks of uniform size. Crochemore et al. use
blocks delineated by the LZ78 parsing, the idea
being that whenever they need to solve a block
$M[i \ldots i', j \ldots j']$, they can solve it in $O(i' - i + j' - j)$
time by essentially copying their solutions
to the previous blocks $M[i \ldots i' - 1, j \ldots j']$ and
$M[i \ldots i', j \ldots j' - 1]$. A similar approach was
later used to speed up training of hidden Markov
models [23].
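
For reference, the textbook quadratic dynamic program mentioned above can be sketched as follows. This is the plain baseline, not the block-based method of Masek and Paterson or of Crochemore et al.; the function name and the default unit costs are assumptions made for the illustration.

```python
def edit_distance(s, t, sub_cost=1, indel_cost=1):
    """Lowest total cost of edit operations transforming s into t."""
    n, m = len(s), len(t)
    # M[i][j] = cost of transforming s[:i] into t[:j].
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * indel_cost          # delete all of s[:i]
    for j in range(1, m + 1):
        M[0][j] = j * indel_cost          # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if s[i - 1] == t[j - 1] else sub_cost
            M[i][j] = min(M[i - 1][j] + indel_cost,      # deletion
                          M[i][j - 1] + indel_cost,      # insertion
                          M[i - 1][j - 1] + match)       # match/substitute
    return M[n][m]


print(edit_distance("kitten", "sitting"))
```

Each cell depends only on its left, upper, and upper-left neighbors, which is the locality that the block decompositions exploit.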


Open Problems 


Ferragina, Nitto, and Venturini [9] provide an algorithm
for computing the rightmost LZ77 parsing
that takes $O(n + n \log |\Sigma| / \log \log n)$ time
and O(n) words of space to process a string of
length n. The existence of an O(n) time algorithm
independent of the alphabet size is an open
problem.

As mentioned above, the size z of the LZ77 
parsing is a lower bound on the size g* of the 
smallest grammar for a given string. Proving 
an asymptotic separation between g* and z (or, 
alternatively, finding a way to produce grammars 
of size z) is a problem of considerable theoretical 
interest. 


URLs to Code and Data Sets 


The source code of the gzip tool (based on
LZ77) is available at http://www.gzip.org, and
the related compression library zlib is available
at http://www.zlib.net. Source code for the more
efficient compressor LZMA is at
http://www.7-zip.org/sdk.html.

Source code for more recent LZ parsing
algorithms (developed in the last 2 years) is
available at http://www.cs.helsinki.fi/group/pads/.
This code includes the current fastest LZ
parsing algorithms for both internal and external
memory.

The Pizza&Chili Corpus is a frequently used
test data set for LZ parsing algorithms; see
http://pizzachili.dcc.uchile.cl/repcorpus.html.


Cross-References 


Approximate String Matching 
Compressed Suffix Array 
Grammar Compression 


Pattern Matching on Compressed Text 
Suffix Trees and Arrays 


Recommended Reading 


1. Amir A, Benson G, Farach M (1996) Let sleeping
files lie: pattern matching in Z-compressed files. J
Comput Syst Sci 52(2):299-307

2. Arroyuelo D, Navarro G, Sadakane K (2012)
Stronger Lempel-Ziv based compressed text indexing.
Algorithmica 62(1-2):54-101

3. Bell TC, Cleary JG, Witten IH (1990) Text compression.
Prentice-Hall, Upper Saddle River

4. Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach
DA, Burrows M, Chandra T, Fikes A, Gruber RE
(2008) Bigtable: a distributed storage system for
structured data. ACM Trans Comp Sys 26(2):1-26

5. Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran
M, Sahai A, Shelat A (2005) The smallest
grammar problem. IEEE Trans Inform Theory
51(7):2554-2576

6. Cover T, Thomas J (1991) Elements of information
theory. Wiley, New York

7. Crochemore M, Landau GM, Ziv-Ukelson M (2003)
A subquadratic sequence alignment algorithm for
unrestricted scoring matrices. SIAM J Comput
32(6):1654-1673

8. Ferragina P, Manzini G (2010) On compressing the
textual web. In: Proceedings of the third international
conference on web search and web data mining
(WSDM) 2010, New York, 4-6 February 2010.
ACM, pp 391-400

9. Ferragina P, Nitto I, Venturini R (2013) On the
bit-complexity of Lempel-Ziv compression. SIAM J
Comput 42(4):1521-1541

10. Fischer J, Heun V (2011) Space-efficient preprocessing
schemes for range minimum queries on static
arrays. SIAM J Comput 40(2):465-492

11. Gagie T, Puglisi SJ (2015) Searching and indexing
genomic databases via kernelization. Front Bioeng
Biotechnol 3(12). doi:10.3389/fbioe.2015.00012

12. Gagie T, Gawrychowski P, Kärkkäinen J, Nekrich
Y, Puglisi SJ (2014) LZ77-based self-indexing with
faster pattern matching. In: Proceedings of Latin-American
symposium on theoretical informatics
(LATIN), Montevideo. Lecture notes in computer
science, vol 8392. Springer, pp 731-742

13. Gawrychowski P (2013) Optimal pattern matching
in LZW compressed strings. ACM Trans Algorithms
9(3):25

14. Goto K, Bannai H (2013) Simpler and faster Lempel
Ziv factorization. In: Proceedings of the 23rd data
compression conference (DCC), Snowbird, pp 133-142

15. Goto K, Bannai H (2014) Space efficient linear time
Lempel-Ziv factorization for small alphabets. In:
Proceedings of the 24th data compression conference
(DCC), Snowbird, pp 163-172

16. Jansson J, Sadakane K, Sung W (2007) Compressed
dynamic tries with applications to LZ-compression
in sublinear time and space. In: Proceedings
of 27th FSTTCS, Montevideo. Lecture notes in
computer science, vol 4855. Springer, New Delhi,
pp 424-435

17. Jez A (2014) A really simple approximation of
smallest grammar. In: Kulikov AS, Kuznetsov SO,
Pevzner PA (eds) Proceedings of 25th annual symposium
combinatorial pattern matching (CPM) 2014,
Moscow, 16-18 June 2014. Lecture notes in computer
science, vol 8486. Springer, pp 182-191

18. Kärkkäinen J, Kempa D, Puglisi SJ (2013) Linear
time Lempel-Ziv factorization: simple, fast, small. In:
Proceedings of CPM, Bad Herrenalb. Lecture notes in
computer science, vol 7922, pp 189-200

19. Kempa D, Puglisi SJ (2013) Lempel-Ziv factorization:
simple, fast, practical. In: Zeh N, Sanders P
(eds) Proceedings of ALENEX, New Orleans. SIAM,
pp 103-112

20. Kosaraju SR, Manzini G (1999) Compression of low
entropy strings with Lempel-Ziv algorithms. SIAM J
Comput 29(3):893-911

21. Kreft S, Navarro G (2013) On compressing and
indexing repetitive sequences. Theor Comput Sci
483:115-133

22. Lempel A, Ziv J (1976) On the complexity of finite
sequences. IEEE Trans Inform Theory 22(1):75-81

23. Lifshits Y, Mozes S, Weimann O, Ziv-Ukelson
M (2009) Speeding up HMM decoding and training
by exploiting sequence repetitions. Algorithmica
54(3):379-399

24. Manber U, Myers GW (1993) Suffix arrays: a new
method for on-line string searches. SIAM J Comput
22(5):935-948

25. Manzini G (2001) An analysis of the Burrows-Wheeler
transform. J ACM 48(3):407-430

26. Matias Y, Sahinalp SC (1999) On the optimality of
parsing in dynamic dictionary based data compression.
In: Proceedings of the tenth annual ACM-SIAM
symposium on discrete algorithms, 17-19 January
1999, Baltimore, pp 943-944

27. Navarro G, Tarhio J (2005) LZgrep: a Boyer-Moore
string matching tool for Ziv-Lempel compressed text.
Softw Pract Exp 35(12):1107-1130

28. Rytter W (2003) Application of Lempel-Ziv factorization
to the approximation of grammar-based
compression. Theor Comput Sci 302(1-3):211-222

29. Salomon D (2006) Data compression: the complete
reference. Springer, New York/Secaucus

30. Sheinwald D (1994) On the Ziv-Lempel proof and
related topics. Proc IEEE 82:866-871

31. Wyner A, Ziv J (1994) The sliding-window Lempel-Ziv
algorithm is asymptotically optimal. Proc IEEE
82:872-877

32. Yamamoto J, I T, Bannai H, Inenaga S, Takeda M
(2014) Faster compact on-line Lempel-Ziv factorization.
In: Proceedings of 31st international symposium
on theoretical aspects of computer science (STACS),
Lyon. LIPIcs 25, pp 675-686

33. Ziv J, Lempel A (1977) A universal algorithm for
sequential data compression. IEEE Trans Inform Theory
23(3):337-343

34. Ziv J, Lempel A (1978) Compression of individual
sequences via variable-rate coding. IEEE Trans
Inform Theory 24(5):530-536


Leontief Economy Equilibrium 


Yinyu Ye 

Department of Management Science and 
Engineering, Stanford University, Stanford, CA, 
USA 


Keywords 


Bimatrix game; Competitive exchange; Compu- 
tational equilibrium; Leontief utility function 


Synonyms 


Algorithmic game theory; Arrow-Debreu market;
Max-min utility; Walras equilibrium


Years and Authors of Summarized 
Original Work 


2005; Codenotti, Saberi, Varadarajan, Ye 
2005; Ye 


Problem Definition 


The Arrow-Debreu exchange market equilibrium
problem was first formulated by Léon Walras in
1874 [7]. In this problem, everyone in a population
of m traders has an initial endowment of a
divisible good and a utility function for consuming
all goods, their own and others'. Every trader
sells the entire initial endowment and then uses
the revenue to buy a bundle of goods such that his
or her utility function is maximized. Walras asked
whether prices could be set for everyone's goods
such that this is possible. An answer was given by
Arrow and Debreu in 1954 [1], who showed that,
under mild conditions, such an equilibrium
exists if the utility functions are concave. In general,
it is unknown whether or not an equilibrium
can be computed efficiently; see, e.g., General
Equilibrium.

Consider a special class of Arrow-Debreu
problems, where each of the n traders has exactly
one unit of a divisible and distinctive good for
trade, and let trader i, i = 1, ..., n, bring good i;
this class of problems is called the pairing
class. For given prices $p_j$ on good j, consumer
i's maximization problem is

$$
\begin{array}{ll}
\text{maximize} & u_i(x_{i1}, \ldots, x_{in}) \\
\text{subject to} & \sum_j p_j x_{ij} \le p_i, \\
 & x_{ij} \ge 0, \quad \forall j,
\end{array}
\tag{1}
$$


where $x_{ij}$ is the quantity of good j purchased
by trader i. Let $x_i^*$ denote a maximal solution
vector of (1). Then, vector p is called the Arrow-Debreu
price equilibrium if there exists an $x_i^*$ for
consumer i, i = 1, ..., n, to clear the market,
that is,

$$\sum_i x_i^* = e,$$

where e is the vector of all ones representing
available goods on the exchange market.

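The Leontief economy equilibrium problem is
the Arrow-Debreu equilibrium problem when the
utility functions are in the Leontief form:

$$u_i(x_i) = \min_{j :\, h_{ij} > 0} \left\{ \frac{x_{ij}}{h_{ij}} \right\},$$

where the Leontief coefficient matrix is given by

$$
H = \begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1n} \\
h_{21} & h_{22} & \cdots & h_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
h_{n1} & h_{n2} & \cdots & h_{nn}
\end{pmatrix}.
\tag{2}
$$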

Here, one may assume that 


Assumption 1 H has no all-zero row, that is, 
every trader likes at least one good.


Key Results


Let $u_i$ be the equilibrium utility value of consumer
i and $p_i$ be the equilibrium price for good
i, i = 1, ..., n. Also, let U and P be diagonal
matrices whose diagonal entries are the $u_i$'s and
$p_i$'s, respectively. Then, the Leontief economy
equilibrium $p \in R^n$, together with $u \in R^n$, must
satisfy

$$
\begin{array}{l}
U H p = p, \\
P(e - H^T u) = 0, \\
H^T u \le e, \\
u, p \ge 0, \\
p \ne 0.
\end{array}
\tag{3}
$$


One can prove: 


Theorem 1 (Ye [8]) System (3) always has a
solution ($u \ne 0$, p) under Assumption 1 (i.e.,
H has no all-zero row). However, a solution to
System (3) may not be a Leontief equilibrium,
although every Leontief equilibrium satisfies
System (3).


A solution to System (3) is called a quasi-equilibrium.
For example,

$$
H^T = \begin{pmatrix}
1 & 2 & 0 \\
0 & 1 & 2 \\
0 & 0 & 1
\end{pmatrix}
$$

has a quasi-equilibrium $p^T = (1, 0, 0)$ and
$u^T = (1, 0, 0)$, but it is not an equilibrium.
This is because trader 3, although with zero
budget, can still purchase goods 2 and 3 at zero
prices. In fact, checking whether H has an equilibrium
is an NP-hard problem; see the discussion later. However,
under certain sufficient conditions, e.g., all
entries in H are positive, every quasi-equilibrium
is an equilibrium.
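
The example can be checked numerically. The sketch below assumes System (3) reads $UHp = p$, $P(e - H^T u) = 0$, $H^T u \le e$, $u, p \ge 0$, $p \ne 0$, and that the displayed 3 x 3 matrix is $H^T$ (so row 3 of H is (0, 2, 1), matching the remark that trader 3 wants goods 2 and 3). The function name is_quasi_equilibrium and the tolerance are choices made here.

```python
def is_quasi_equilibrium(H, u, p, tol=1e-9):
    """Check System (3) for coefficient matrix H (row i = trader i)."""
    n = len(H)
    Hp = [sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]
    HTu = [sum(H[i][j] * u[i] for i in range(n)) for j in range(n)]
    budget = all(abs(u[i] * Hp[i] - p[i]) <= tol for i in range(n))   # UHp = p
    slack = all(abs(p[j] * (1 - HTu[j])) <= tol for j in range(n))    # P(e - H^T u) = 0
    supply = all(HTu[j] <= 1 + tol for j in range(n))                 # H^T u <= e
    nonneg = all(x >= -tol for x in list(u) + list(p))                # u, p >= 0
    nonzero = any(x > tol for x in p)                                 # p != 0
    return budget and slack and supply and nonneg and nonzero


# H is the transpose of the matrix displayed in the text.
H = [[1, 0, 0],
     [2, 1, 0],
     [0, 2, 1]]
print(is_quasi_equilibrium(H, [1, 0, 0], [1, 0, 0]))
```

The check passes for the pair given in the text, even though trader 3 faces zero prices for the goods he or she wants, which is why a quasi-equilibrium need not be an equilibrium.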

Theorem 2 (Ye [8]) Let $B \subset \{1, 2, \ldots, n\}$,
$N = \{1, 2, \ldots, n\} \setminus B$, $H_{BB}$ be irreducible, and
$u_B$ satisfy the linear system

$$H_{BB}^T u_B = e, \quad H_{BN}^T u_B \le e, \quad \text{and} \quad u_B > 0.$$

Then the (right) Perron-Frobenius eigenvector
$p_B$ of $U_B H_{BB}$ together with $p_N = 0$
will be a solution to System (3). The
converse is also true. Moreover, there is always
a rational solution for every such B, that
is, the entries of the price vector are rational
numbers, if the entries of H are rational.
Furthermore, the size (bit length) of the
solution is bounded by the size (bit length)
of H.


The theorem implies that the traders in block
B can trade among themselves and keep others'
goods "free." In particular, if one trader likes his
or her own good more than any other good, that
is, $h_{ii} \ge h_{ij}$ for all j, then $u_i = 1/h_{ii}$, $p_i = 1$,
and $u_j = p_j = 0$ for all $j \ne i$, that is,
$B = \{i\}$, form a Leontief economy equilibrium.
The theorem thus establishes, for the first time,
a combinatorial algorithm to compute a Leontief
economy equilibrium by finding a right block
$B \ne \emptyset$, which is actually a nontrivial solution
($u \ne 0$) to an LCP problem

$$H^T u + v = e, \quad u^T v = 0, \quad 0 \ne u, v \ge 0. \tag{4}$$


If $H > 0$, then any complementary solution $u \ne 0$
of (4), together with its support $B = \{j : u_j > 0\}$,
induces a Leontief economy equilibrium
that is the (right) Perron-Frobenius eigenvector of
$U_B H_{BB}$, and it can be computed in polynomial
time by solving a linear equation. Even if $H \not> 0$,
any complementary solution $u \ne 0$ and $B = \{j : u_j > 0\}$,
as long as $H_{BB}$ is irreducible, induces
an equilibrium for System (3). The equivalence
between the pairing Leontief economy model and
the LCP also implies


Corollary 1 LCP (4) always has a nontrivial
solution $u \ne 0$, where $H_{BB}$ is irreducible with
$B = \{j : u_j > 0\}$, under Assumption 1 (i.e., H
has no all-zero row).


If Assumption 1 does not hold, the corollary may
not be true; see the example below:

$$
H^T = \begin{pmatrix}
0 & 2 \\
0 & 1
\end{pmatrix}.
$$


Applications


Given an arbitrary bimatrix game, specified by a
pair of n x m matrices A and B, with positive
entries, one can construct a Leontief exchange
economy with n + m traders and n + m goods
as follows. In words, trader i comes to the market
with one unit of good i, for i = 1, ..., n + m.
Traders indexed by any $j \in \{1, \ldots, n\}$ receive
some utility only from goods $k \in \{n + 1, \ldots, n + m\}$,
and this utility is specified by parameters
corresponding to the entries of the matrix B.
More precisely, the proportions in which the j-th
trader wants the goods are specified by the entries
on the j-th row of B. Vice versa, traders indexed
by any $n + j \in \{n + 1, \ldots, n + m\}$ receive some
utility only from goods $k \in \{1, \ldots, n\}$. In this
case, the proportions in which trader n + j
wants the goods are specified by the entries on
the j-th column of A.

In the economy above, one can partition the 
traders in two groups, which bring to the market 
disjoint sets of goods and are only interested 
in the goods brought by the group they do not 
belong to. 
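
The construction can be sketched in Python as follows. This is one reading of the description above, not code from the cited work: the function name leontief_matrix_from_game is hypothetical, matrices are plain lists of rows, and row i of the returned matrix H holds the Leontief coefficients of trader i.

```python
def leontief_matrix_from_game(A, B):
    """Build the (n+m) x (n+m) Leontief matrix H from n x m matrices A, B."""
    n, m = len(A), len(A[0])
    H = [[0] * (n + m) for _ in range(n + m)]
    for j in range(n):
        for k in range(m):
            # Trader j (first group) wants goods n+1..n+m in the
            # proportions given by row j of B.
            H[j][n + k] = B[j][k]
            # Trader n+k (second group) wants goods 1..n in the
            # proportions given by column k of A.
            H[n + k][j] = A[j][k]
    return H


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8, 9],
     [10, 11, 12]]
for row in leontief_matrix_from_game(A, B):
    print(row)
```

The two zero diagonal blocks reflect the partition noted above: no trader wants a good brought by his or her own group. Under this reading, the transpose of H has A as its upper-right block and the transpose of B as its lower-left block.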


Theorem 3 (Codenotti et al. [4]) Let (A, B)
denote an arbitrary bimatrix game, where one
assumes, w.l.o.g., that the entries of the matrices
A and B are all positive. Let

$$
H^T = \begin{pmatrix}
0 & A \\
B^T & 0
\end{pmatrix}
$$

describe the Leontief utility coefficient matrix
of the traders in a Leontief economy. There is
a one-to-one correspondence between the Nash
equilibria of the game (A, B) and the market
equilibria of the Leontief economy H. Furthermore,
the correspondence has the property that
a strategy is played with positive probability at a
Nash equilibrium if and only if the good held by
the corresponding trader has a positive price at
the corresponding market equilibrium.


The theorem implies that finding an equilibrium
for Leontief economies is at least as hard as finding
a Nash equilibrium for two-player nonzero
sum games, a problem recently proven PPAD-complete
(Chen and Deng [3]) and for which no polynomial
time approximation algorithm is known
today.

Furthermore, Gilboa and Zemel [6] proved a 
number of hardness results related to the compu- 
tation of Nash equilibria (NE) for finite games 
in normal form. Since the NE for games with 
more than two players can be irrational, these 
results have been formulated in terms of NP- 
hardness for multiplayer games, while they can 
be expressed in terms of NP-completeness for 
two-player games. Using a reduction to the NE 
game, Codenotti et al. proved: 


Theorem 4 (Codenotti et al. [4]) It is NP-hard
to decide whether a Leontief economy H has an
equilibrium.


On the positive side, Zhu et al. [9] recently 
proved the following result: 


Theorem 5 Let the Leontief utility matrix H be 
symmetric and positive. Then there is a fully poly- 
nomial time approximation scheme (FPTAS) for 
approximating a Leontief equilibrium, although 
the equilibrium set remains non-convex or non-connected.


Recommended Reading


The reader may want to read Brainard and Scarf
[2] on how to compute equilibrium prices in
1891; Chen and Deng [3] on the most recent
hardness result of computing the bimatrix game;
Cottle et al. [5] for literature on linear complementarity
problems; and all references listed in
[4] and [8] for the recent literature on computational
equilibrium.


Problem Definition


We provide a general method to prove the existence
of, and efficiently compute, elimination orderings
in graphs. Our method relies on several
tools that were known before but had not been
put together so far: the algorithm LexBFS due to
Rose, Tarjan, and Lueker, one of its properties
discovered by Berry and Bordat, and a local
decomposition property of graphs discovered by
Maffray, Trotignon, and Vušković.


Terminology 

In this paper, all graphs are finite and simple.
A graph G contains a graph F if F is isomorphic
to an induced subgraph of G. A class of graphs
is hereditary if for every graph G of the class, all
induced subgraphs of G belong to the class. A
graph G is F-free if it does not contain F. When
ℱ is a set of graphs, G is ℱ-free if it is F-free
for every F ∈ ℱ. Clearly every hereditary class
of graphs is equal to the class of ℱ-free graphs
for some ℱ (ℱ can be chosen to be the set of all
graphs not in the class but all proper induced
subgraphs of which are in the class). The induced
subgraph relation is not a well-quasi-order
(contrary, e.g., to the minor relation), so the set ℱ
does not need to be finite.

When X ⊆ V(G), we write G[X] for the
subgraph of G induced by X. An ordering
$(v_1, \dots, v_n)$ of the vertices of a graph G is an
ℱ-elimination ordering if for every $i = 1, \dots, n$,
$N_{G[\{v_1, \dots, v_i\}]}(v_i)$ is ℱ-free. Note that a
graph G admits such an ordering if and only if
every induced subgraph of G has a vertex whose
neighborhood is ℱ-free.


Example 

Let us illustrate our terminology on a classical 
example. We denote by S2 the independent graph 
on two vertices. A vertex is simplicial if its neigh- 
borhood is S2-free, or equivalently is a clique. A 
graph is chordal if it is hole-free, where a hole is 
a chordless cycle of length at least 4. 


Theorem 1 (Dirac [6]) Every chordal graph ad- 
mits an {S2}-elimination ordering. 


Theorem 2 (Rose, Tarjan, and Lueker [14]) 
There exists a linear-time algorithm that computes
an {S2}-elimination ordering of an input
chordal graph.


LexBFS 

To explain the results, we need to define LexBFS.
It is a linear-time algorithm of Rose, Tarjan, and
Lueker [14] whose input is any graph G together
with a vertex s and whose output is a linear
ordering of the vertices of G starting at s. A linear
ordering of the vertices of a graph G is a LexBFS
ordering if there exists a vertex s of G such that
the ordering can be produced by LexBFS when
the input is (G, s). The ordering from Theorem 2 is
in fact computed by LexBFS. We do not need
to define LexBFS more precisely here, because
the following result fully characterizes LexBFS
orderings.
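For readers who prefer something executable, here is a minimal sketch of LexBFS in its classical label-based form, together with a check that an ordering is an {S2}-elimination ordering (every vertex's earlier neighbors form a clique). It is a quadratic-time illustration, not the linear-time partition-refinement implementation of [14]. The names lex_bfs and is_clique_elimination_ordering and the adjacency-dictionary encoding are assumptions made for this example.

```python
def lex_bfs(adj, start):
    """Return a LexBFS ordering of a graph given as {vertex: set_of_neighbors}.

    Simple O(n^2)-style sketch: every unvisited vertex carries a label (a
    decreasing list of stamps) and the vertex with the lexicographically
    largest label is visited next.
    """
    n = len(adj)
    labels = {v: [] for v in adj}
    labels[start] = [n]              # larger than any other label: start goes first
    unvisited = set(adj)
    order = []
    for stamp in range(n - 1, -1, -1):
        # sorted() only makes tie-breaking deterministic; any tie-break is valid.
        v = max(sorted(unvisited), key=lambda u: labels[u])
        unvisited.remove(v)
        order.append(v)
        for w in adj[v]:
            if w in unvisited:
                labels[w].append(stamp)
    return order


def is_clique_elimination_ordering(adj, order):
    """True if, for each vertex, its neighbors appearing earlier form a clique."""
    seen = set()
    for v in order:
        earlier = [u for u in adj[v] if u in seen]
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in adj[a]:
                    return False
        seen.add(v)
    return True


# Two triangles sharing the edge bc, plus a pendant vertex e: a chordal graph.
chordal = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
# A hole of length 4: not chordal, so no ordering can work.
c4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

print(lex_bfs(chordal, "a"))
print(is_clique_elimination_ordering(chordal, lex_bfs(chordal, "a")))
print(is_clique_elimination_ordering(c4, lex_bfs(c4, "a")))
```

On the chordal example every start vertex yields an ordering that passes the check, in line with Theorem 2, while the 4-cycle fails for every start vertex because it has no simplicial vertex.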


Theorem 3 (Brandstädt, Dragan, and Nicolai [3])
An ordering < of the vertices of a graph
G = (V, E) is a LexBFS ordering if and only
if it satisfies the following property: for all
a, b, c ∈ V such that c < b < a, ca ∈ E,
and cb ∉ E, there exists a vertex d in G such that
d < c, db ∈ E, and da ∉ E.
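The characterization in Theorem 3 can be tested directly on small graphs. The sketch below is a brute-force check of the stated property over all triples; the function name and the adjacency-dictionary encoding are assumptions made for this illustration, and no attempt is made at efficiency.

```python
from itertools import combinations


def satisfies_four_point_condition(adj, order):
    """Check the property of Theorem 3 for `order` (a list of all vertices).

    For every c < b < a with ca an edge and cb a non-edge, some d < c must be
    adjacent to b and non-adjacent to a.
    """
    position = {v: i for i, v in enumerate(order)}
    # combinations() keeps list order, so each triple arrives as (c, b, a).
    for c, b, a in combinations(order, 3):
        if a in adj[c] and b not in adj[c]:
            before_c = order[:position[c]]
            if not any(b in adj[d] and a not in adj[d] for d in before_c):
                return False
    return True


# Path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(satisfies_four_point_condition(path, ["a", "b", "c"]))  # a LexBFS ordering
print(satisfies_four_point_condition(path, ["a", "c", "b"]))  # c cannot precede b
```

In the second ordering the triple (a, c, b) violates the property: a is adjacent to b but not to c, and no vertex precedes a, which matches the fact that a search started at a must visit its neighbor b before c.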


Key Results 


The following property was introduced by Maffray,
Trotignon, and Vušković in [12] (where it
was called Property (*)).


Definition 1 Let ℱ be a set of graphs. A graph G
is locally ℱ-decomposable if for every vertex v
of G, every F ∈ ℱ contained in N(v), and every
connected component C of G − N[v], there exists
y ∈ F such that y has a non-neighbor in F and
no neighbors in C. A class of graphs 𝒞 is locally
ℱ-decomposable if every graph G ∈ 𝒞 is locally
ℱ-decomposable.


It is easy to see that if a graph is locally
ℱ-decomposable, then so are all its induced subgraphs.
Therefore, for all sets of graphs ℱ, the
class of graphs that are locally ℱ-decomposable
is hereditary. The main result is the following.


Theorem 4 If ℱ is a set of non-complete graphs,
and G is a locally ℱ-decomposable graph, then
every LexBFS ordering of G is an ℱ-elimination
ordering.


First Example of Application 

Let us now illustrate how Theorem 4 can be
used with the simplest possible set made of
non-complete graphs ℱ = {S2}, where S2 is the
independent graph on two vertices. The following
is well known and easy to prove.


Lemma 1 A graph G is locally {S2}-decomp- 
osable if and only if G is chordal. 


Hence, a proof for Theorems 1 and 2 is easily
obtained by using Lemma 1 and Theorem 4.
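To make Definition 1 concrete in the special case ℱ = {S2}, the following sketch checks local {S2}-decomposability by brute force. For a pair of nonadjacent neighbors of v, each vertex of the pair is automatically a non-neighbor of the other, so only the condition "no neighbors in C" has to be tested. The function names and the adjacency-dictionary encoding are assumptions made for this example.

```python
from itertools import combinations


def components(adj, vertices):
    """Connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    result = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in remaining:
                    remaining.remove(w)
                    component.add(w)
                    stack.append(w)
        result.append(component)
    return result


def is_locally_s2_decomposable(adj):
    """Definition 1 specialised to the set {S2} (two nonadjacent vertices)."""
    for v in adj:
        closed_neighborhood = adj[v] | {v}
        outside = components(adj, set(adj) - closed_neighborhood)
        for y1, y2 in combinations(adj[v], 2):
            if y2 in adj[y1]:
                continue  # adjacent pair: does not induce S2
            for component in outside:
                # Need one of y1, y2 to have no neighbor in the component.
                if (adj[y1] & component) and (adj[y2] & component):
                    return False
    return True


chordal = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
c5 = {"a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
      "d": {"c", "e"}, "e": {"d", "a"}}

print(is_locally_s2_decomposable(chordal))
print(is_locally_s2_decomposable(c5))
```

The chordal example passes and the 5-cycle fails, as Lemma 1 predicts: in the cycle, both neighbors of a have a neighbor in the single component left after removing N[a].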


Sketch of Proof 
The proof of Theorem 4 relies mainly on the 
following. 


Theorem 5 (Berry and Bordat [2]) If G is a
non-complete graph and z is the last vertex of
a LexBFS ordering of G, then there exists a
connected component C of G − N[z] such that
for every neighbor x of z, either N[x] = N[z] or
N(x) ∩ C ≠ ∅.


Equivalently, if we put z together with its 
neighbors of the first type, the resultant set of 
vertices is a clique, a homogeneous set, and its 
neighborhood is a minimal separator. Such sets 
are called moplexes in [2], and Theorem 5 is stated
in terms of moplexes in [2]. Note that Theorem 5
can be proved from the following very convenient 
lemma. 


Lemma 2 Let < be a LexBFS ordering of a
graph G = (V, E). Let z denote the last vertex
in this ordering. Then for all vertices a, b, c ∈ V
such that c < b < a and ca ∈ E, there exists
a path from b to c whose internal vertices are
disjoint from N[z].


Truemper Configurations 

To state the next results, we need special types of
graphs that are called Truemper configurations.
They play an important role in structural graph
theory; see [15]. Let us define them. A 3-path
configuration is a graph induced by three internally
vertex-disjoint paths of length at least 1,
$P_1 = x_1 \dots y_1$, $P_2 = x_2 \dots y_2$, and
$P_3 = x_3 \dots y_3$, such that either $x_1 = x_2 = x_3$ or
$x_1, x_2, x_3$ are all distinct and pairwise adjacent,
and either $y_1 = y_2 = y_3$ or $y_1, y_2, y_3$ are all
distinct and pairwise adjacent. Furthermore, the
vertices of $P_i \cup P_j$, $i \neq j$, induce a hole. Note
that this last condition in the definition implies
the following:


• If $x_1, x_2, x_3$ are distinct (and therefore pairwise
adjacent) and $y_1, y_2, y_3$ are distinct, then
the three paths have length at least 1. In this
case, the configuration is called a prism.

• If $x_1 = x_2 = x_3$ and $y_1 = y_2 = y_3$, then the
three paths have length at least 2 (since a path
of length 1 would form a chord of the cycle
formed by the two other paths). In this case,
the configuration is called a theta.

• If $x_1 = x_2 = x_3$ and $y_1, y_2, y_3$ are distinct,
or if $x_1, x_2, x_3$ are distinct and $y_1 = y_2 = y_3$,
then at most one of the three paths has length
1, and the others have length at least 2. In this
case, the configuration is called a pyramid.


A wheel (H, v) is a graph formed by a hole H, 
called the rim, and a vertex v, called the center, 
such that the center has at least three neighbors 
on the rim. A Truemper configuration is a graph 
that is either a prism, a theta, a pyramid, or a 
wheel. 

A hole in a graph is a chordless cycle of length
at least 4. It is even or odd according to the parity
of the number of its edges. A graph is universally
signable if it contains no Truemper configuration.


Speeding Up of Known Algorithms 

We now state the previously known optimization 
algorithms for which we get better complexity by 
applying our method. In each case, we prove the 
existence of an elimination ordering, compute it 
with LexBFS, and take advantage of the ordering 
to solve the problem. Each time, we improve the 
previously known complexity by at least a factor 
of n: 


• Maximum weighted clique in even-hole-free
graphs in time O(nm)

• Maximum weighted clique in universally
signable graphs in time O(n + m)

• Coloring in universally signable graphs in
time O(n + m)


New Algorithms

We now systematically apply our method to all
possible sets made of non-complete graphs of
order 3. For each such set ℱ (there are seven of
them), we provide a class with an ℱ-elimination
ordering.

To describe the classes of graphs that we
obtain, we need to be more specific about wheels.
A wheel is a 1-wheel if for some consecutive
vertices x, y, z of the rim, the center is adjacent
to y and nonadjacent to x and z. A wheel is a
2-wheel if for some consecutive vertices x, y, z
of the rim, the center is adjacent to x and y and
nonadjacent to z. A wheel is a 3-wheel if for some
consecutive vertices x, y, z of the rim, the center
is adjacent to x, y, and z. Observe that a wheel
can be simultaneously a 1-wheel, a 2-wheel, and
a 3-wheel. On the other hand, every wheel is a
1-wheel, a 2-wheel, or a 3-wheel. Also, any 3-wheel
is either a 2-wheel or a universal wheel (i.e., a
wheel whose center is adjacent to all vertices of
the rim).

Up to isomorphism, there are four graphs on
three vertices, and three of them are not complete.
These three graphs (namely, the independent
graph on three vertices denoted by S3, the
path of length 2 denoted by P3, and its complement
denoted by $\overline{P_3}$) are studied in the next
lemma.


Lemma 3 For a graph G, the following hold:


(i) G is locally {S3}-decomposable if and only
if G is {1-wheel, theta, pyramid}-free.
(ii) G is locally {P3}-decomposable if and only
if G is 3-wheel-free.
(iii) G is locally {$\overline{P_3}$}-decomposable if and only
if G is {2-wheel, prism, pyramid}-free.


Applying our method then leads to the next 
result, that is, a description of eight classes of 
graphs (one of them is the class of chordal 
graphs, and one of them is the class of universally 
signable graphs). They are described in Table 1: 
the second column describes the forbidden 
induced subgraphs that define the class and the 
last column describes the neighborhood of the 
last vertex of a LexBFS ordering.


LexBFS, Structure, and Algorithms, Table 1 Eight classes of graphs

i   Class C_i                                        ℱ_i                              Neighborhood
1   {1-wheel, theta, pyramid}-free                   {S3}                             No stable set of size 3
2   3-wheel-free                                     {P3}                             Disjoint union of cliques
3   {2-wheel, prism, pyramid}-free                   {$\overline{P_3}$}               Complete multipartite
4   {1-wheel, 3-wheel, theta, pyramid}-free          {S3, P3}                         Disjoint union of at most two cliques
5   {1-wheel, 2-wheel, prism, theta, pyramid}-free   {S3, $\overline{P_3}$}           Stable sets of size at most 2 with all possible edges between them
6   {2-wheel, 3-wheel, prism, pyramid}-free          {P3, $\overline{P_3}$}           Clique or stable set
7   {wheel, prism, theta, pyramid}-free              {S3, P3, $\overline{P_3}$}       Clique or stable set of size 2
8   hole-free                                        {S2}                             Clique


LexBFS, Structure, and Algorithms, Table 2 Several
properties of classes defined in Table 1

i   Max clique        Coloring
1   NP-hard [13]      NP-hard [9]
2   O(nm) [14]        NP-hard [11]
3   O(nm)             NP-hard [11]
4   O(n + m)          ?
5   O(nm)             ?
6   O(n + m)          NP-hard [11]
7   O(n + m)          O(n + m)
8   O(n + m) [14]     O(n + m) [14]


Theorem 6 For i = 1, ..., 8, let C_i and ℱ_i
be the classes defined as in Table 1. For i =
1, ..., 8, the class C_i is exactly the class of locally
ℱ_i-decomposable graphs.


For each class C_i, we survey in Table 2 the
complexity of the maximum clique problem (for
which our method sometimes provides a fast
algorithm) and of the coloring problem.


Open Problems


Addario-Berry, Chudnovsky, Havet, Reed, and
Seymour [1] proved that every even-hole-free
graph admits a vertex whose neighborhood is the
union of two cliques. We wonder whether this
result can be proved by some search algorithm.

Our work suggests that a linear time algorithm
for the maximum clique problem might exist in
C_2, but we could not find it.

We are not aware of a polynomial time coloring
algorithm for graphs in C_4 or C_5.

Since class C_1 generalizes claw-free graphs, it
is natural to ask which of the properties of claw-free
graphs it has, such as a structural description
(see [4]), a polynomial time algorithm for
the maximum stable set (see [7]), approximation
algorithms for the chromatic number (see [10]),
and a polynomial time algorithm for the induced
linkage problem (see [8]).

In [5], an O(nm) time algorithm is described
for the maximum weighted stable set problem in
C_7. Since the class is a simple generalization of
chordal graphs, we wonder whether a linear time
algorithm exists.


Problem Definition


In this article, we discuss the problem of testing 
linearity of functions and, more generally, test- 
ing whether a given function is a group homo- 
morphism. An algorithm for this problem, given 
by [9], is one of the most celebrated property 
testing algorithms. It is part of or is a special 
case of many important property testers for alge- 
braic properties. Originally designed for program 
checkers and self-correctors, it has found uses in 
probabilistically checkable proofs (PCPs), which 
are an essential tool in proving hardness of ap- 
proximation. 

We start by formulating an important special
case of the problem, testing the linearity of
Boolean functions. A function $f : \{0,1\}^n \to \{0,1\}$
is linear if for some $a_1, a_2, \dots, a_n \in \{0,1\}$,

$$f(x_1, x_2, \dots, x_n) = a_1 x_1 + a_2 x_2 + \dots + a_n x_n.$$

The operations in this definition are over $\mathbb{F}_2$.
That is, given vectors $x = (x_1, \dots, x_n)$ and
$y = (y_1, \dots, y_n)$, where $x_1, \dots, x_n, y_1, \dots, y_n \in \{0,1\}$,
the vector $x + y = (x_1 + y_1 \bmod 2, \dots, x_n + y_n \bmod 2)$.
There is another, equivalent definition of linearity
of Boolean functions over $\{0,1\}^n$: a function f is
linear if for all $x, y \in \{0,1\}^n$,

$$f(x) + f(y) = f(x + y).$$


A generalization of a linear function, defined
above, is a group homomorphism. Given two
finite groups, (G, ∘) and (H, ⋆), a group homomorphism
from G to H is a function f : G → H
such that for all elements x, y ∈ G,

$$f(x) \star f(y) = f(x \circ y).$$

We would like to test (approximately) whether
a given function is linear or, more generally, is a
group homomorphism. Next, we define the property
testing framework [12, 23]. Linearity testing
was the first problem studied in this framework.
The linearity tester of [9] actually preceded the
definition of this framework and served as an
inspiration for it. Given a proximity parameter
ε ∈ (0, 1), a function is ε-far from satisfying a
specific property P (such as being linear or being
a group homomorphism) if it has to be modified
on at least an ε fraction of its domain in order
to satisfy P. A function is ε-close to P if it is
not ε-far from it. A tester for property P gets a
parameter ε ∈ (0, 1) and oracle access to a
function f. It must accept with probability at least
2/3 if the function f satisfies property P and reject
with probability at least 2/3 if f is ε-far from
satisfying P. (The choice of error probability in
the definition of the tester is arbitrary. Using
standard techniques, a tester with error probability
1/3 can be turned into a tester with error
probability δ ∈ (0, 1/3) by repeating the original
tester O(log(1/δ)) times and taking the majority
answer.) Our goal is to design an efficient
tester for group homomorphism.


Alternative Formulation 

Another way of viewing the same problem is in
terms of error-correcting codes. Given a function
f : G → H, we can form a codeword corresponding
to f by listing the values of f on all
points in the domain. The homomorphism code
is the set of all codewords that correspond to
homomorphisms from G to H. This is an
error-correcting code with large distance because, for
two different homomorphisms f, g : G → H,
the fraction of points x ∈ G on which f(x) =
g(x) is at most 1/2. In the special case when G
is $\{0,1\}^n$ and H is {0, 1}, we get the Hadamard
code. Our goal can be formulated as follows:
design an efficient algorithm that tests whether a
given string is a codeword of a homomorphism
code (or is ε-far from it).


Key Results 


The linearity (homomorphism) tester designed by
[9] repeats the following test several times, until
the desired success probability is reached, and
accepts iff all iterations accept.


Algorithm 1: BLR Linearity (Homomorphism) Test
input : Oracle access to an unknown function f : G → H.

1 Pick x, y ∈ G uniformly and independently at random.
2 Query f on x, y, and x + y to find out f(x), f(y), and f(x + y).
3 Accept if f(x) + f(y) = f(x + y); otherwise, reject.


Blum et al. [9] and Ben-Or et al. [7] showed
that O(1/ε) iterations of the BLR test suffice to
get a property tester for group homomorphism.
(The analysis in [9] worked for a special case of
the problem, and [7] extended it to all groups.)
It is not hard to prove that Ω(1/ε) queries are
required to test for linearity and, in fact, any
nontrivial property, so the resulting tester is optimal
in terms of the query complexity and the running
time.
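As an illustration, the repeated BLR test can be sketched for the Boolean case G = {0,1}^n and H = {0,1}, with inputs encoded as n-bit integers so that group addition is bitwise XOR. The encoding, the function names, and the helper functions parity and and_of_two_bits are assumptions made for this example; the number of iterations is left as a parameter rather than fixed to a particular constant times 1/ε.

```python
import random


def blr_test(f, n, iterations, rng):
    """Run the BLR test `iterations` times on f : {0,1}^n -> {0,1}.

    Inputs are n-bit integers; addition in {0,1}^n and in {0,1} is XOR.
    Accepts (returns True) only if every iteration accepts.
    """
    for _ in range(iterations):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True


def parity(mask):
    """Linear function x -> sum of the bits of x selected by mask, mod 2."""
    return lambda x: bin(x & mask).count("1") % 2


def and_of_two_bits(x):
    """Nonlinear function x -> x_1 AND x_2 (the two lowest bits)."""
    return (x & 1) & ((x >> 1) & 1)


rng = random.Random(0)
print(blr_test(parity(0b1011), 4, 200, rng))   # linear: always accepted
print(blr_test(and_of_two_bits, 4, 200, rng))  # far from linear: rejected w.h.p.
```

A linear function passes every iteration with certainty. For the AND of two bits, a single iteration rejects with probability 3/8, so 200 independent iterations all accept only with probability (5/8)^200.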

Considerable effort has gone into understanding the
rejection probability of the BLR test for functions
that are ε-far from homomorphisms over various
groups and, especially, for functions over $\{0,1\}^n$
(see [17] and references therein). A nice exposition
of the analysis for the latter special case,
which follows the Fourier-analytic approach of
[5], can be found in the book by [21].

Several works [8, 14, 24-26] showed how to
reduce the number of random bits required by
homomorphism tests. In the natural implementation
of the BLR test, $2 \log_2 |G|$ random bits per
iteration are used to pick x and y. Shpilka and
Wigderson [25] gave a homomorphism test for
general groups that needs only $(1 + o(1)) \log_2 |G|$
random bits.

The case when G is a subset of an infinite 
group, f is a real-valued function, and the oracle 
query to f returns a finite-precision approxi- 
mation to f(x) has been considered in [2, 10, 
11, 19,20]. These works gave testers with query 
complexity independent of the domain size (see 
[18] for a survey).


Applications


Self-Testing/Correcting Programs 

The linearity testing problem was motivated 
in [9] by applications to self-testing and self- 
correcting of programs. Suppose you are given 
a program that is known to be correct on most 
inputs but has not been checked (or, perhaps, 
is even known to be incorrect) on remaining 
inputs. A self-tester for f is an algorithm that 
can quickly verify whether a given program 
that supposedly computes f is correct on most 
inputs, without the aid of another program for f 
that has already been verified. A self-corrector 
for f is an algorithm that takes a program that 
correctly computes f on most inputs and uses it 
to correctly compute f on all inputs. 

Blum et al. [9] used their linearity test to con- 
struct self-testers for programs intended to com- 
pute various homomorphisms. Such functions in- 
clude integer, polynomial, matrix, and modular 
multiplication and division. Once it is verified 
that a program agrees on most inputs with some 
homomorphism, the task of determining whether 
it agrees with the correct homomorphism on most 
inputs becomes much easier. 

For programs intended to compute homomorphisms,
it is easy to construct self-correctors:
Suppose a program outputs f(x) on input x,
where f agrees on most inputs with a homomorphism
g. Fix a constant c. Consider the algorithm
that, on input x, picks c log(1/δ) values y from
the domain G uniformly at random, computes
f(x + y) − f(y), and outputs the value that is
seen most often, breaking ties arbitrarily. If f is
1/8-close to g, then, since both y and x + y are
uniformly distributed in G, it is the case that for
at least 3/4 of the choices of y, both g(x + y) =
f(x + y) and g(y) = f(y), in which case
f(x + y) − f(y) = g(x). Thus, it is easy to
show that there is a constant c such that if f is
1/8-close to a homomorphism g, then for all x, the
above algorithm outputs g(x) with probability at
least 1 − δ.
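The self-corrector described above can be sketched for the Boolean case, where inputs are n-bit integers and both addition and subtraction are XOR. The function names, the choice of n = 8, the corruption pattern, and the fixed number of trials are assumptions made for this example; the number of trials stands in for the c log(1/δ) of the text.

```python
import random
from collections import Counter


def self_correct(f, n, x, trials, rng):
    """Return the most frequent value of f(x + y) - f(y) over random y.

    Over {0,1}^n with values in {0,1}, both + and - are XOR.
    """
    votes = Counter()
    for _ in range(trials):
        y = rng.getrandbits(n)
        votes[f(x ^ y) ^ f(y)] += 1
    return votes.most_common(1)[0][0]


def g(x):
    """A homomorphism {0,1}^8 -> {0,1}: parity of the bits selected by a mask."""
    return bin(x & 0b10110101).count("1") % 2


# Corrupt g on 16 of the 256 inputs, so f is 1/16-close (hence 1/8-close) to g.
rng = random.Random(0)
corrupted = set(rng.sample(range(256), 16))


def f(x):
    return g(x) ^ (1 if x in corrupted else 0)


wrong_before = sum(f(x) != g(x) for x in range(256))
wrong_after = sum(self_correct(f, 8, x, 101, rng) != g(x) for x in range(256))
print(wrong_before, wrong_after)
```

Each vote is correct whenever neither y nor x + y is a corrupted point, which here happens with probability at least 7/8, so the majority over 101 votes recovers g(x) except with negligible probability.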


Probabilistically Checkable Proofs 
We discussed an equivalent formulation of the
linearity testing problem in terms of testing
whether a given string is a codeword of a
Hadamard code. This formulation has been used
in proofs of hardness of approximation of some
NP-hard problems and to construct PCP systems
that can be verified with a few queries (see, e.g.,
[3, 13]).


The BLR Test as a Building Block 

The BLR test has been generalized and extended 
in many ways, as well as used as a building block 
in other testers. One generalization, particularly 
useful in PCP constructions, is to testing if a 
given function is a polynomial of low degree (see, 
e.g., [1, 15, 16]). Other generalizations include 
tests for long codes [6, 13] and tests of linear 
consistency among multiple functions [4]. An example
of an algorithm that uses the BLR test as a
building block is a tester by [22] for the singleton
property of functions $f : \{0,1\}^n \to \{0,1\}$,
namely, the property that the function $f(x) = x_i$
for some $i \in [1, n]$.


Open Problems 


We discussed that the BLR test can be used 
to check whether a given string is a Hadamard 
codeword or far from it. For which other codes 
can such a check be performed efficiently? In 
other words, which codes are locally testable? We 
refer the reader to the entry ▶ Locally Testable
Codes.

Which other properties of functions can be
efficiently tested in the property testing model?
Some examples are given in the entries ▶ Testing
Juntas and Related Properties of Boolean
Functions and ▶ Monotonicity Testing. Testing
properties of graphs is discussed in the entries
▶ Testing Bipartiteness in the Dense-Graph
Model and ▶ Testing Bipartiteness of Graphs
in Sublinear Time.


Cross-References 

▶ Learning Heavy Fourier Coefficients of Boolean Functions
▶ Locally Testable Codes
▶ Monotonicity Testing
▶ Quantum Error Correction
▶ Testing Bipartiteness in the Dense-Graph Model
▶ Testing Bipartiteness of Graphs in Sublinear Time
▶ Testing if an Array Is Sorted
▶ Testing Juntas and Related Properties of Boolean Functions


Acknowledgments The first author was supported in 
part by NSF award CCF-1422975 and by NSF CAREER 
award CCF-0845701. 


Recommended Reading 


1. Alon N, Kaufman T, Krivelevich M, Litsyn S, Ron D (2003) Testing low-degree polynomials over GF(2). In: Proceedings of RANDOM'03, Princeton, pp 188-199
2. Ar S, Blum M, Codenotti B, Gemmell P (1993) Checking approximate computations over the reals. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, San Diego, pp 786-795
3. Arora S, Lund C, Motwani R, Sudan M, Szegedy M (1998) Proof verification and the hardness of approximation problems. J ACM 45(3):501-555
4. Aumann Y, Håstad J, Rabin MO, Sudan M (2001) Linear-consistency testing. J Comput Syst Sci 62(4):589-607
5. Bellare M, Coppersmith D, Håstad J, Kiwi M, Sudan M (1996) Linearity testing over characteristic two. IEEE Trans Inf Theory 42(6):1781-1795
6. Bellare M, Goldreich O, Sudan M (1998) Free bits, PCPs, and nonapproximability: towards tight results. SIAM J Comput 27(3):804-915
7. Ben-Or M, Coppersmith D, Luby M, Rubinfeld R (2008) Non-Abelian homomorphism testing, and distributions close to their self-convolutions. Random Struct Algorithms 32(1):49-70
8. Ben-Sasson E, Sudan M, Vadhan S, Wigderson A (2003) Randomness-efficient low degree tests and short PCPs via epsilon-biased sets. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on the Theory of Computing, San Diego, pp 612-621
9. Blum M, Luby M, Rubinfeld R (1993) Self-testing/correcting with applications to numerical problems. JCSS 47:549-595
10. Ergun F, Kumar R, Rubinfeld R (2001) Checking approximate computations of polynomials and functional equations. SIAM J Comput 31(2):550-576
11. Gemmell P, Lipton R, Rubinfeld R, Sudan M, Wigderson A (1991) Self-testing/correcting for polynomials and for approximate functions. In: Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing, New Orleans, pp 32-42
12. Goldreich O, Goldwasser S, Ron D (1998) Property testing and its connection to learning and approximation. J ACM 45(4):653-750
13. Håstad J (2001) Some optimal inapproximability results. J ACM 48(4):798-859
14. Håstad J, Wigderson A (2003) Simple analysis of graph tests for linearity and PCP. Random Struct Algorithms 22(2):139-160
15. Jutla CS, Patthak AC, Rudra A, Zuckerman D (2009) Testing low-degree polynomials over prime fields. Random Struct Algorithms 35(2):163-193
16. Kaufman T, Ron D (2006) Testing polynomials over general fields. SIAM J Comput 36(3):779-802
17. Kaufman T, Litsyn S, Xie N (2010) Breaking the epsilon-soundness bound of the linearity test over GF(2). SIAM J Comput 39(5):1988-2003
18. Kiwi M, Magniez F, Santha M (2001) Exact and approximate testing/correcting of algebraic functions: a survey. Electron Colloq Comput Complex 8(14). http://dblp.uni-trier.de/db/journals/eccc/eccc8.html#ECCC-TR01-014
19. Kiwi M, Magniez F, Santha M (2003) Approximate testing with error relative to input size. JCSS 66(2):371-392
20. Magniez F (2005) Multi-linearity self-testing with relative error. Theory Comput Syst 38(5):573-591
21. O'Donnell R (2014) Analysis of Boolean Functions. Cambridge University Press, New York
22. Parnas M, Ron D, Samorodnitsky A (2002) Testing basic Boolean formulae. SIAM J Discret Math 16(1):20-46
23. Rubinfeld R, Sudan M (1996) Robust characterizations of polynomials with applications to program testing. SIAM J Comput 25(2):252-271
24. Samorodnitsky A, Trevisan L (2000) A PCP characterization of NP with optimal amortized query complexity. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, Portland, pp 191-199
25. Shpilka A, Wigderson A (2006) Derandomizing homomorphism testing in general groups. SIAM J Comput 36(4):1215-1230
26. Trevisan L (1998) Recycling queries in PCPs and in linearity tests. In: Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, pp 299-308

Linearizability 


Maurice Herlihy 
Department of Computer Science, Brown 
University, Providence, RI, USA 


Keywords 


Atomicity 


Linearizability 


Years and Authors of Summarized 
Original Work 


1990; Herlihy, Wing 


Problem Definition 


An object in languages such as Java and C++ is
a container for data. Each object provides a set
of methods that are the only way to manipulate
that object’s internal state. Each object has a class
which defines the methods it provides and what
they do.

In the absence of concurrency, methods can be 
described by a pair consisting of a precondition 
(describing the object’s state before invoking the 
method) and a postcondition, describing, once 
the method returns, the object’s state and the 
method’s return value. If, however, an object is 
shared by concurrent threads in a multiprocessor 
system, then method calls may overlap in time, 
and it no longer makes sense to characterize 
methods in terms of pre- and post-conditions. 

Linearizability is a correctness condition for 
concurrent objects that characterizes an object’s 
concurrent behavior in terms of an “equivalent” 
sequential behavior. Informally, the object behaves
“as if” each method call takes effect instantaneously
at some point between its invocation
and its response. This notion of correctness has
some useful formal properties. First, it is non- 
blocking, which means that linearizability as such 
never requires one thread to wait for another to 
complete an ongoing method call. Second, it is 
local, which means that an object composed of 
linearizable objects is itself linearizable. Other 
proposed correctness conditions in the literature 
lack at least one of these properties. 


Notation 

An execution of a concurrent system is modeled
by a history, a finite sequence of method
invocation and response events. A subhistory of
a history H is a subsequence of the events of H.
A method invocation is written as ⟨x.m(a*) A⟩,
where x is an object, m a method name, a* a
sequence of arguments, and A a thread. A method
response is written as ⟨x:t(r*) A⟩, where t is
a termination condition and r* is a sequence of
result values.

A response matches an invocation if their 
objects and thread names agree. A method call 
is a pair consisting of an invocation and the next 
matching response. An invocation is pending in 
a history if no matching response follows the 
invocation. If H is a history, complete(H) is the 
subsequence of H consisting of all matching invo- 
cations and responses. A history H is sequential if 
the first event of H is an invocation, and each in- 
vocation, except possibly the last, is immediately 
followed by a matching response. 

Let H be a history. The thread subhistory
H|P is the subsequence of events in H with thread 
name P. The object subhistory H|x is similarly 
defined for an object x. Two histories H and H’ 
are equivalent if for every thread A, H|A = 
H'|A. A history H is well-formed if each thread 
subhistory H|A of H is sequential. Notice that 
thread subhistories of a well-formed history are 
always sequential, but object subhistories need 
not be. 
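The notions of subhistory, sequential history, and well-formedness can be made concrete with a small sketch. Events are modeled here as hypothetical (kind, object, thread) tuples with kind equal to "inv" or "res"; arguments and results are omitted because the definitions above do not depend on them. The function names are assumptions made for this example, and is_sequential enforces strict alternation of invocations and matching responses, which is slightly stronger than the wording above in that a response must also be preceded by its invocation.

```python
def thread_subhistory(history, thread):
    """H|P: the events of `history` whose thread name is `thread`."""
    return [event for event in history if event[2] == thread]


def object_subhistory(history, obj):
    """H|x: the events of `history` that concern object `obj`."""
    return [event for event in history if event[1] == obj]


def is_sequential(history):
    """Invocations and matching responses alternate, starting with an invocation.

    A trailing invocation without a response (a pending call) is allowed.
    """
    for i, (kind, obj, thread) in enumerate(history):
        if i % 2 == 0:
            if kind != "inv":
                return False
        else:
            _, prev_obj, prev_thread = history[i - 1]
            if kind != "res" or (obj, thread) != (prev_obj, prev_thread):
                return False
    return True


def is_well_formed(history):
    """Every thread subhistory is sequential."""
    threads = {event[2] for event in history}
    return all(is_sequential(thread_subhistory(history, t)) for t in threads)


# Threads A and B call methods of the same object q, and the calls overlap.
H = [("inv", "q", "A"), ("inv", "q", "B"), ("res", "q", "A"), ("res", "q", "B")]

print(is_well_formed(H))                          # each thread alone is sequential
print(is_sequential(object_subhistory(H, "q")))   # but the object subhistory is not
```

The example shows the remark at the end of the paragraph above: the history is well-formed, yet its object subhistory for q is not sequential because the two calls overlap.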

A sequential specification for an object is 
a prefix-closed set of sequential object histories 
that defines that object’s legal histories. A se- 
quential history H is legal if each object subhis- 
tory is legal. A method is total if it is defined 
for every object state, otherwise it is partial. 
(For example, a deq() method that blocks on an
empty queue is partial, while one that throws an 
exception is total.) 

A history H defines an (irreflexive) partial
order $\to_H$ on its method calls: $m_0 \to_H m_1$ if the
response event of $m_0$ occurs before the invocation
event of $m_1$. If H is a sequential history, then $\to_H$
is a total order.

Let H be a history and x an object such that
H|x contains method calls $m_0$ and $m_1$. A call
$m_0 \to_x m_1$ if $m_0$ precedes $m_1$ in H|x. Note that
$\to_x$ is a total order.

Informally, linearizability requires that
each method call appear to “take effect”
instantaneously at some moment between
its invocation and response. An important
implication of this definition is that method
calls that do not overlap cannot be reordered:
linearizability preserves the “real-time” order of
method calls. Formally,


Definition 1 A history H is linearizable if it can 
be extended (by appending zero or more response 
events) to a history H’ such that: 


• L1 complete(H’) is equivalent to a legal sequential
history S, and

• L2 If method call $m_0$ precedes method call $m_1$
in H, then the same is true in S.


S is called a linearization of H. (H may have 
multiple linearizations.) Informally, extending H 
to H’ captures the idea that some pending in- 
vocations may have taken effect even though 
their responses have not yet been returned to the 
caller. 
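Definition 1 can be checked by brute force on tiny histories. The sketch below makes several simplifying assumptions that are not part of the definition: there is a single shared FIFO queue, events are hypothetical tuples ("inv", thread, method, argument) and ("res", thread, result), the history is already complete (so no responses need to be appended), and a deq on an empty queue returns the string "empty". It tries every ordering of the method calls and keeps those that respect the real-time order (L2) and are legal for the queue (L1).

```python
from collections import deque
from itertools import permutations


def method_calls(history):
    """Pair each invocation with the next response of the same thread."""
    pending = {}
    calls = []
    for position, event in enumerate(history):
        kind, thread = event[0], event[1]
        if kind == "inv":
            pending[thread] = (position, event[2], event[3])
        else:
            inv_position, method, argument = pending.pop(thread)
            calls.append({"inv": inv_position, "res": position,
                          "method": method, "arg": argument, "result": event[2]})
    if pending:
        raise ValueError("this sketch only handles complete histories")
    return calls


def legal_for_queue(sequence):
    """Sequential specification of a FIFO queue."""
    queue = deque()
    for call in sequence:
        if call["method"] == "enq":
            queue.append(call["arg"])
            expected = None
        else:  # "deq"
            expected = queue.popleft() if queue else "empty"
        if call["result"] != expected:
            return False
    return True


def is_linearizable(history):
    calls = method_calls(history)
    for candidate in permutations(calls):
        # L2: if b's response precedes a's invocation, b must come before a.
        respects_real_time = all(
            not (later["res"] < earlier["inv"])
            for i, earlier in enumerate(candidate)
            for later in candidate[i + 1:]
        )
        if respects_real_time and legal_for_queue(candidate):
            return True
    return False


# The two enqueues overlap, so either one may take effect first.
overlapping = [("inv", "A", "enq", 1), ("inv", "B", "enq", 2),
               ("res", "A", None), ("res", "B", None),
               ("inv", "A", "deq", None), ("res", "A", 2)]
# enq(1) completes before enq(2) starts, so deq must return 1.
ordered = [("inv", "A", "enq", 1), ("res", "A", None),
           ("inv", "B", "enq", 2), ("res", "B", None),
           ("inv", "A", "deq", None), ("res", "A", 2)]

print(is_linearizable(overlapping))
print(is_linearizable(ordered))
```

The first history is linearizable because the overlapping enqueues can be ordered with enq(2) first; the second is not, since non-overlapping calls cannot be reordered. The search is exponential in the number of calls and is meant only to illustrate the definition.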


Key Results 


The Locality Property 
A property is local if all objects collectively 
satisfy that property provided that each individual 
object satisfies it. 

Linearizability is local: 


Theorem 1 H is linearizable if and only if H|x is
linearizable for every object x.


Proof The “only if” part is obvious.

For each object x, pick a linearization of H|x.
Let $R_x$ be the set of responses appended to H|x
to construct that linearization, and let $\to_x$ be
the corresponding linearization order. Let H’ be
the history constructed by appending to H each
response in $R_x$.

The $\to_H$ and $\to_x$ orders can be “rolled up”
into a single partial order. Define the relation $\to$
on method calls of complete(H’): For method
calls $m$ and $\bar{m}$, $m \to \bar{m}$ if there exist method calls
$m_0, \dots, m_n$ such that $m = m_0$, $\bar{m} = m_n$, and for
each i between 0 and n − 1, either $m_i \to_x m_{i+1}$
for some object x, or $m_i \to_H m_{i+1}$.

It turns out that $\to$ is a partial order. Clearly,
$\to$ is transitive. It remains to be shown that $\to$ is
anti-reflexive: for all x, it is false that $x \to x$.

The proof proceeds by contradiction. If not,
then there exist method calls $m_0, \dots, m_n$ such
that $m_0 \to m_1 \to \dots \to m_n$, $m_n \to m_0$,
and each pair is directly related by some $\to_x$ or
by $\to_H$.

Choose a cycle whose length is minimal. Suppose
all method calls are associated with the same
object x. Since $\to_x$ is a total order, there must
exist two method calls $m_{i-1}$ and $m_i$ such that
$m_{i-1} \to_H m_i$ and $m_i \to_x m_{i-1}$, contradicting
the linearizability of x.

The cycle must therefore include method calls
of at least two objects. By reindexing if necessary,
let $m_1$ and $m_2$ be method calls of distinct objects.
Let x be the object associated with $m_1$. None
of $m_2, \dots, m_n$ can be a method call of x. The
claim holds for $m_2$ by construction. Let $m_i$ be
the first method call in $m_3, \dots, m_n$ associated
with x. Since $m_{i-1}$ and $m_i$ are unrelated by
$\to_x$, they must be related by $\to_H$, so the response
of $m_{i-1}$ precedes the invocation of $m_i$.
The invocation of $m_2$ precedes the response of
$m_{i-1}$, since otherwise $m_{i-1} \to_H m_2$, yielding
the shorter cycle $m_2, \dots, m_{i-1}$. Finally, the response
of $m_1$ precedes the invocation of $m_2$, since
$m_1 \to_H m_2$ by construction. It follows that the
response to $m_1$ precedes the invocation of $m_i$,
hence $m_1 \to_H m_i$, yielding the shorter cycle
$m_1, m_i, \dots, m_n$.

Since $m_n$ is not a method call of x, but $m_n \to m_1$,
it follows that $m_n \to_H m_1$. But $m_1 \to_H m_2$
by construction, and because $\to_H$ is transitive,
$m_n \to_H m_2$, yielding the shorter cycle
$m_2, \dots, m_n$, the final contradiction. □


Locality is important because it allows concur- 
rent systems to be designed and constructed in 
a modular fashion; linearizable objects can be im- 
plemented, verified, and executed independently. 
A concurrent system based on a non-local cor- 
rectness property must either rely on a centralized 
scheduler for all objects, or else satisfy addi- 
tional constraints placed on objects to ensure 
that they follow compatible scheduling protocols. 
Locality should not be taken for granted; as 
discussed below, the literature includes proposals 
for alternative correctness properties that are not 
local. 
The Non-blocking Property

Linearizability is a non-blocking property: 
a pending invocation of a total method is never 
required to wait for another pending invocation 
to complete. 


Theorem 2 Let inv(m) be an invocation of a total
method. If $\langle x\ \mathit{inv}\ P \rangle$ is a pending invocation
in a linearizable history H, then there exists
a response $\langle x\ \mathit{res}\ P \rangle$ such that
$H \cdot \langle x\ \mathit{res}\ P \rangle$ is linearizable.


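Proof Let S be any linearization of H. If S
includes a response $\langle x\ \mathit{res}\ P \rangle$ to $\langle x\ \mathit{inv}\ P \rangle$, the
proof is complete, since S is also a linearization
of $H \cdot \langle x\ \mathit{res}\ P \rangle$. Otherwise, $\langle x\ \mathit{inv}\ P \rangle$ does not
appear in S either, since linearizations, by
definition, include no pending invocations. Because the
method is total, there exists a response $\langle x\ \mathit{res}\ P \rangle$
such that


$$S' = S \cdot \langle x\ \mathit{inv}\ P \rangle \cdot \langle x\ \mathit{res}\ P \rangle$$


is legal. S', however, is a linearization of
$H \cdot \langle x\ \mathit{res}\ P \rangle$, and hence is also a linearization
of H. □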


This theorem implies that linearizability by 
itself never forces a thread with a pending invoca- 
tion of a total method to block. Of course, block- 
ing (or even deadlock) may occur as artifacts 
of particular implementations of linearizability, 
but it is not inherent to the correctness property 
itself. This theorem suggests that linearizability 
is an appropriate correctness condition for sys- 
tems where concurrency and real-time response 
are important. Alternative correctness conditions,
such as serializability [1], do not share this
non-blocking property.

The non-blocking property does not rule 
out blocking in situations where it is explicitly 
intended. For example, it may be sensible for 
a thread attempting to dequeue from an empty 
queue to block, waiting until another thread 
enqueues an item. The queue specification 
captures this intention by making the deq() 
method’s specification partial, leaving its effect
undefined when applied to an empty queue. 
The most natural concurrent interpretation of
a partial sequential specification is simply to 
wait until the object reaches a state in which the 
method is defined. 


Other Correctness Properties 

Sequential Consistency [4] is a weaker correctness
condition that requires Property L1 but not
L2: method calls must appear to happen in some 
one-at-a-time, sequential order, but calls that do 
not overlap can be reordered. Every linearizable 
history is sequentially consistent, but not vice 
versa. Sequential consistency permits more con- 
currency, but it is not a local property: a system 
composed of multiple sequentially-consistent ob- 
jects is not itself necessarily sequentially consis- 
tent. 

Much work on databases and distributed sys- 
tems uses serializability as the basic correctness 
condition for concurrent computations. In this 
model, a transaction is a “thread of control” that 
applies a finite sequence of methods to a set of 
objects shared with other transactions. A history 
is serializable if it is equivalent to one in which 
transactions appear to execute sequentially, that 
is, without interleaving. A history is strictly seri- 
alizable if the transactions’ order in the sequen- 
tial history is compatible with their precedence 
order: if every method call of one transaction 
precedes every method call of another, the former 
is serialized first. (Linearizability can be viewed 
as a special case of strict serializability where 
transactions are restricted to consist of a single 
method applied to a single object.) 

Neither serializability nor strict serializability 
is a local property. If different objects serialize 
transactions in different orders, then there may be 
no serialization order common to all objects. Se- 
rializability and strict serializability are blocking 
properties: Under certain circumstances, a trans- 
action may be unable to complete a pending 
method without violating serializability. A dead- 
lock results if multiple transactions block one an- 
other. Such transactions must be rolled back and 
restarted, implying that additional mechanisms 
must be provided for that purpose. 
Applications


Linearizability is widely used as the basic 
correctness condition for many concurrent data 
structure algorithms [5], particularly for lock- 
free and wait-free data structures [2]. Sequential 
consistency is widely used for describing 
low-level systems such as hardware memory 
interfaces. Serializability and strict serializability 
are widely used for database systems in which 
it must be easy for application programmers to 
preserve complex application-specific invariants 
spanning multiple objects. 


Open Problems 


Modern multiprocessors often support very weak 
models of memory consistency. There are many 
open problems concerning how to model such 
behavior, and how to ensure linearizable object 
implementations on top of such architectures. 
Problem Definition


One of the central trade-offs in the theory of 
error-correcting codes is the one between the 
amount of redundancy needed and the fraction of 
errors that can be corrected. (This entry deals 
with the adversarial or worst-case model of 
errors—no assumption is made on how the errors 
and error locations are distributed beyond an 
upper bound on the total number of errors that 
may be caused.) The redundancy is measured by
the rate of the code, which is the ratio of the
number of information symbols in the message
to that in the codeword; thus, for a code with
encoding function $E : \Sigma^k \to \Sigma^n$, the rate equals
$k/n$. The block length of the code equals $n$, and
$\Sigma$ is its alphabet.

The goal in decoding is to find, given a noisy
received word, the actual codeword that it could
have possibly resulted from. If the target is to
correct a fraction $\rho$ of errors ($\rho$ will be called
the error-correction radius), then this amounts to
finding codewords within (normalized Hamming)
distance $\rho$ from the received word. We are
guaranteed that there will be a unique such
codeword provided the distance between every two
distinct codewords is at least $2\rho n$, or in other words
the relative distance of the code is at least $2\rho$.
However, since the relative distance $\delta$ of a code
must satisfy $\delta \le 1 - R$ where $R$ is the rate of the
code (by the Singleton bound), if one insists on
a unique answer, the best trade-off between $\rho$
and $R$ is $\rho = \rho_U(R) = (1 - R)/2$. But this is an
overly pessimistic estimate of the error-correction
radius, since the way Hamming spheres pack in
space, for most choices of the received word there
will be at most one codeword within distance
$\rho$ from it even for $\rho$ much greater than $\delta/2$.
Therefore, always insisting on a unique answer
will preclude decoding most such received words
owing to a few pathological received words that
have more than one codeword within distance
roughly $\delta/2$ from them.

A notion called list decoding, which dates
back to the late 1950s [1, 9], provides a clean
way to get around this predicament, and yet
deal with worst-case error patterns. Under list
decoding, the decoder is required to output a list
of all codewords within distance $\rho$ from the
received word. Let us call a code $C$ $(\rho, L)$-list
decodable if the number of codewords within
distance $\rho$ of any received word is at most $L$.
To obtain better trade-offs via list decoding,
$(\rho, L)$-list decodable codes are needed where
$L$ is bounded by a polynomial function of the
block length, since this is an a priori requirement
for polynomial time list decoding. How large can
$\rho$ be as a function of $R$ for which such
$(\rho, L)$-list decodable codes exist? A standard random
coding argument shows that $\rho \ge 1 - R - o(1)$
can be achieved over large enough alphabets,
cf. [2, 10], and a simple counting argument
shows that $\rho$ must be at most $1 - R$. Therefore
the list decoding capacity, i.e., the
information-theoretic limit of list decodability, is given by
the trade-off $\rho_{\mathrm{cap}}(R) = 1 - R = 2\rho_U(R)$. Thus
list decoding holds the promise of correcting
twice as many errors as unique decoding, for
every rate. The above-mentioned list decodable
codes are non-constructive. In order to realize the
potential of list decoding, one needs explicit
constructions of such codes, and on top of
that, polynomial time algorithms to perform list
decoding.

Building on works of Sudan [8], Guruswami
and Sudan [6] and Parvaresh and Vardy [7],
Guruswami and Rudra [5] present codes that
get arbitrarily close to the list decoding capacity
$\rho_{\mathrm{cap}}(R)$ for every rate. In particular, for every
$1 > R > 0$ and every $\varepsilon > 0$, they give explicit
codes of rate $R$ together with a polynomial time
list decoding algorithm that can correct up to
a fraction $1 - R - \varepsilon$ of errors. These are the
first explicit codes (with efficient list decoding
algorithms) that get arbitrarily close to the list
decoding capacity for any rate.


Description of the Code
Consider a Reed–Solomon (RS) code
$C = \mathrm{RS}_{F,F^*}[n,k]$ consisting of evaluations of
degree $k$ polynomials over some finite field $F$
at the set $F^*$ of nonzero elements of $F$. Let
$q = |F| = n + 1$. Let $\gamma$ be a generator of the
multiplicative group $F^*$, and let the evaluation
points be ordered as $1, \gamma, \gamma^2, \ldots, \gamma^{n-1}$. Using
all nonzero field elements as evaluation points is
one of the most commonly used instantiations of
Reed–Solomon codes.

Let $m \ge 1$ be an integer parameter called the
folding parameter. For ease of presentation, it
will be assumed that $m$ divides $n = q - 1$.


Definition 1 (Folded Reed–Solomon Code)
The $m$-folded version of the RS code $C$, denoted
$\mathrm{FRS}_{F,\gamma,m,k}$, is a code of block length $N = n/m$
over $F^m$. The encoding of a message $f(X)$,
a polynomial over $F$ of degree at most $k$, has as
its $j$'th symbol, for $0 \le j < n/m$, the $m$-tuple
$(f(\gamma^{jm}), f(\gamma^{jm+1}), \ldots, f(\gamma^{jm+m-1}))$. In
other words, the codewords of $C' = \mathrm{FRS}_{F,\gamma,m,k}$
are in one-one correspondence with those of the
RS code $C$ and are obtained by bundling together
consecutive $m$-tuples of symbols in codewords
of $C$.
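
The folding operation in Definition 1 can be illustrated with a toy encoder. This is a minimal sketch under assumptions that are not part of the entry: the field is a prime field GF(q), so arithmetic is plain integer arithmetic modulo q, and the caller supplies a generator gamma of the multiplicative group. The function names `rs_encode` and `fold` are hypothetical.

```python
def rs_encode(coeffs, q, gamma):
    """Evaluate f(X) = sum(coeffs[d] * X**d) at 1, gamma, ..., gamma**(n-1)
    over GF(q), where q is assumed prime and n = q - 1."""
    n = q - 1
    points = [pow(gamma, i, q) for i in range(n)]
    return [sum(c * pow(x, d, q) for d, c in enumerate(coeffs)) % q
            for x in points]

def fold(codeword, m):
    """Bundle consecutive m-tuples of symbols into one symbol over F^m."""
    if len(codeword) % m != 0:
        raise ValueError("m must divide the block length n")
    return [tuple(codeword[j * m:(j + 1) * m])
            for j in range(len(codeword) // m)]

# GF(7) with generator 3 and message f(X) = 1 + 2X; n = 6, so m = 2 gives N = 3.
rs_word = rs_encode([1, 2], q=7, gamma=3)
print(rs_word)           # the six RS symbols f(3**i mod 7)
print(fold(rs_word, 2))  # three folded symbols, each a pair
```

Folding does not change the information carried by a codeword; it only changes what counts as one symbol, and therefore what counts as one error.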
Key Results


The following is the main result of Guruswami 
and Rudra. 


Theorem 1 ([5]) For every $\varepsilon > 0$ and $0 < R < 1$,
there is a family of folded Reed–Solomon codes
that have rate at least $R$ and which can be list
decoded up to a fraction $1 - R - \varepsilon$ of errors
in time (and outputs a list of size at most)
$(N/\varepsilon^2)^{O(\varepsilon^{-1} \log(1/R))}$ where $N$ is the block length
of the code. The alphabet size of the code as
a function of the block length $N$ is $(N/\varepsilon^2)^{O(1/\varepsilon^2)}$.


The result of Guruswami and Rudra also works 
in a more general setting called list recovering, 
which is defined next. 


Definition 2 (List Recovering) A code $C \subseteq \Sigma^n$
is said to be $(\zeta, l, L)$-list recoverable if for every
sequence of sets $S_1, \ldots, S_n$ where each $S_i \subseteq \Sigma$
has at most $l$ elements, the number of codewords
$c \in C$ for which $c_i \in S_i$ for at least $\zeta n$ positions
$i \in \{1, 2, \ldots, n\}$ is at most $L$.

A code $C \subseteq \Sigma^n$ is said to be $(\zeta, l)$-list
recoverable in polynomial time if it is $(\zeta, l, L(n))$-list
recoverable for some polynomially bounded
function $L(\cdot)$, and moreover there is a polynomial
time algorithm to find the at most $L(n)$ codewords
that are solutions to any $(\zeta, l, L(n))$-list
recovering instance.


Note that when $l = 1$, $(\zeta, 1, \cdot)$-list recovering is
the same as list decoding up to a $(1 - \zeta)$ fraction
of errors. Guruswami and Rudra have the following
result for list recovering.


Theorem 2 ([5]) For every integer $l \ge 1$, for
all $R$, $0 < R < 1$ and $\varepsilon > 0$, and for every
prime $p$, there is an explicit family of folded
Reed–Solomon codes over fields of characteristic
$p$ that have rate at least $R$ and which can be
$(R + \varepsilon, l)$-list recovered in polynomial time. The
alphabet size of a code of block length $N$ in the
family is $(N/\varepsilon^2)^{O(\varepsilon^{-2} \log l/(1-R))}$.
Applications


To get within $\varepsilon$ of capacity, the codes in
Theorem 1 have alphabet size $N^{\Omega(1/\varepsilon^2)}$ where $N$ is
the block length. By concatenating folded RS
codes of rate close to 1 (that are list
recoverable) with suitable inner codes followed by
redistribution of symbols using an expander graph
(similar to a construction for linear-time unique
decodable codes in [3]), one can get within $\varepsilon$
of capacity with codes over an alphabet of size
$2^{O(\varepsilon^{-4} \log(1/\varepsilon))}$. A counting argument shows that
codes that can be list decoded efficiently to within
$\varepsilon$ of the capacity need to have an alphabet size of
$2^{\Omega(1/\varepsilon)}$.

For binary codes, the list decoding capacity is
known to be $\rho_{\mathrm{bin}}(R) = H^{-1}(1 - R)$ where $H(\cdot)$
denotes the binary entropy function. No explicit
constructions of binary codes that approach this
capacity are known. However, using the folded
RS codes of Guruswami and Rudra in a natural
concatenation scheme, one can obtain polynomial
time constructible binary codes of rate $R$ that can
be list decoded up to a fraction $\rho_{\mathrm{Zyab}}(R)$ of errors,
where $\rho_{\mathrm{Zyab}}(R)$ is the “Zyablov bound”.

See [5] for more details. 


Open Problems 


The work of Guruswami and Rudra could be
improved with respect to some parameters. The
size of the list needed to perform list decoding
to a radius that is within $\varepsilon$ of capacity grows as
$N^{O(\varepsilon^{-1} \log(1/R))}$ where $N$ and $R$ are the block
length and the rate of the code respectively.
It remains an open question to bring this list
size down to a constant independent of n (the
existential random coding arguments work with
a list size of $O(1/\varepsilon)$). The alphabet size needed
to approach capacity was shown to be a constant
independent of $N$. However, this involved a
brute-force search for a rather large (inner) code,
which translates to a construction time of about
$N^{O(\varepsilon^{-2} \log(1/\varepsilon))}$ (instead of the ideal construction
time where the exponent of $N$ does not depend on
$\varepsilon$). Obtaining a “direct” algebraic construction
over a constant-sized alphabet, such as the
generalization of the Parvaresh-Vardy framework
to algebraic-geometric codes in [4], might help
in addressing these two issues.

Finally, constructing binary codes that ap- 
proach list decoding capacity remains open. 
Problem Definition


Let L be a linked list of n vertices $x_1, x_2, \ldots, x_n$
such that every vertex $x_i$ stores a pointer $\mathrm{succ}(x_i)$
to its successor in L. As with any linked list,
we assume that no two vertices have the same
successor and that any vertex can reach the tail of
the list by following successor pointers. The head
of the list is the vertex that no other vertex in L
points to, and the tail is the vertex whose
successor is null. Given the head $x_h$ of
L, the list-ranking problem is to find the rank, or
distance, of each vertex $x_i$ in L from the head of
L: that is, $\mathrm{rank}(x_h) = 0$ and $\mathrm{rank}(\mathrm{succ}(x_i)) =
\mathrm{rank}(x_i) + 1$; refer to Fig. 1. A generalization
of this problem is to consider that each vertex $x_i$
stores, in addition to $\mathrm{succ}(x_i)$, a weight $w(x_i)$;
in this case the list is given as a set of tuples
$\{(x_i, w(x_i), \mathrm{succ}(x_i))\}$ and we want to compute
$\mathrm{rank}(x_h) = w(x_h)$, and $\mathrm{rank}(\mathrm{succ}(x_i)) =
\mathrm{rank}(x_i) + w(\mathrm{succ}(x_i))$.

List Ranking, Fig. 1 (a) An instance of the LR
problem, with the ranks shown in (b)


Key Results


List ranking is one of the fundamental problems
in the external memory model (EM or
I/O-model) which requires nontrivial techniques and
illustrates the differences (and connection)
between the models of computation, namely, the
random access machine (RAM), its parallel
version PRAM, and the parallel external memory
model (PEM). It also illustrates how ideas from
parallel algorithms are used in serial external
memory algorithms and the idea of using
geometrically decreasing sizes to get an algorithm that in
total is as fast as sorting. Furthermore, list ranking
is the main ingredient in the Euler tour technique,
which is one of the main techniques for obtaining
I/O-efficient solutions for fundamental problems
on graphs and trees.

In internal memory, list ranking can be solved
in $O(n)$ time with a straightforward algorithm
that starts from the head of the list and follows
successor pointers. In external memory, the same
algorithm may use $\Omega(n)$ I/Os: the intuition is
that in the worst case, the vertices are arranged in
such an order that following a successor pointer
will always require loading a new block from disk
(one I/O). If the vertices were arranged in order
of their ranks, then traversing the list would be
trivial, but arranging them in this order would
require knowing their ranks, which is exactly the
problem we are trying to solve.


The external memory model (see also chapter
I/O-Model) has been extended to a parallel
version, the PEM model, which models a private 
cache shared memory architecture, where the 
shared memory (external memory) is organized 
in blocks of size B, each cache (internal memory) 
has size M, and there are P processors. The 
cost of executing an algorithm is the number of 
parallel I/Os. When we use asymptotic notation, 
we think of M, B, and P as arbitrary nondecreasing
functions depending on n, the number
of elements constituting the input. Similar to the 
PRAM, there are different versions of the model, 
depending on the possibility of concurrent read 
and write. In the following we assume a Concur- 
rent Read Exclusive Write (CREW) policy [2]. 
For M = 2, B = 1 the PEM model is a PRAM 
model, and for P = 1 the EM model. 

Similar to the EM model (see chapter
External Sorting and Permuting), sorting
is an important building block for the PEM,
having complexity $\mathrm{sort}(n) = O\left(\frac{n}{PB} \log_d \frac{n}{B}\right)$
for $d = \max\{2, \min\{\frac{n}{PB}, \frac{M}{B}\}\}$ if $P \le \frac{n}{B}$ [7].

The complexity of permuting in the PEM
model is that of sorting unless the direct
algorithm with $O(n/P)$ I/Os is faster, i.e., for B smaller
than the logarithmic term.

In the RAM model, permuting and list ranking
have scanning complexity, while
(comparison-based) sorting is more expensive. In the PRAM
model, this changes slightly: permuting still has
scanning complexity, while list ranking has
sorting complexity (like any function where a single
output depends on all inputs, it needs $n/P + \log n$
time). In the external memory model, list ranking
has permuting complexity, which is higher than
scanning unless $B = O(1)$, and usually it is
sorting complexity. In contrast, for many
processors and large B (more precisely $P = M =
2B = \sqrt{n}$) and a restricted model where the
input is only revealed to the algorithm if certain
progress has been made, list ranking has
complexity $\Omega(\log^2 n)$ [8].


List-Ranking Algorithms in Parallel 

External Memory 

A solution for list ranking that runs in $O(\mathrm{sort}(n))$
I/Os was described by Chiang et al. [4]. The
general idea is based on a PRAM algorithm by
Cole and Vishkin and consists of the following
steps:


1. Find an independent set $I$ of $L$ (a set of
vertices such that no two vertices in $I$ are
adjacent in $L$) consisting of $\Theta(n)$ vertices.

2. Compute a new list $L - I$ by removing the
vertices in $I$ from $L$; that is, all vertices $x$
in $I$ are bridged out: let $y$ be the vertex with
succ($y$) = $x$, and then we set succ($y$) :=
succ($x$); additionally the weight of $x$ is added
to the weight of succ($x$). This ensures that the
rank of any vertex in $L - I$ is the same as its
rank in $L$.

3. Compute the ranks recursively on $L - I$.

4. Compute the ranks of the vertices in $I$ from
the ranks of the neighbors in $L - I$.
The key idea of the algorithm is finding an
independent set of size $\Omega(c \cdot n)$ for some constant
$c \in (0,1)$ and thus recursing on a list of size
$O((1 - c) \cdot n)$. The first step, finding a large
independent set, can be performed in $O(\mathrm{sort}(n))$
I/Os and is described in more detail below. The
second and fourth steps can be performed in a
couple of scanning and sorting passes in
overall $O(\mathrm{sort}(n))$ I/Os. For example, to update the
weights of vertices in $L - I$, it suffices to sort the
vertices in $I$ by their successor, sort the vertices
in $L - I$ by their vertex ID, and then scan the
two sorted lists, and for each pair $(x', x)$ in $I$
with $x = \mathrm{succ}(x')$, and $(x, \mathrm{succ}(x))$ in $L - I$,
we update the weight of $x$ to include the deleted
vertex $x'$: $w(x) = w(x) + w(x')$. Similarly, to
update the successors of vertices in $L - I$, it
suffices to sort $I$ by vertex ID, sort the vertices
in $L - I$ by their successor, and then scan the
two sorted lists. Once the ranks of the vertices
in $L - I$ are computed, it suffices to sort $I$ by
vertex ID, sort $L - I$ by successor, and then
scan the two lists to update the rank of each
vertex $x \in I$. Overall, the I/O-complexity of this
list-ranking algorithm is given by the recurrence
$T(n) \le O(\mathrm{sort}(n)) + T(c \cdot n)$, for some constant
$c \in (0,1)$, with solution $O(\mathrm{sort}(n))$ I/Os. This
is due to the convexity of the I/O behavior of
sorting, i.e., $\mathrm{sort}(c \cdot n) \le c \cdot \mathrm{sort}(n)$. The algorithm
can be used in the parallel setting as well because
the scanning here works on pairs of elements that
are stored in neighboring cells. For a moderately
large number of processors, sorting the original
instance still dominates the overall running time,
and more processors lead to an additional log
factor in the number of parallel I/Os [8].
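
The four steps above can be sketched in ordinary in-memory Python. This is an illustration of the recursion only, not an I/O-efficient implementation: dictionaries stand in for the sort-and-scan passes, the independent set is chosen by the coin-flipping rule (heads with a tails successor), and, as simplifying assumptions of this sketch, the head and the tail are never selected. The name `list_rank` and the `rng` parameter are hypothetical.

```python
import random

def list_rank(succ, weight, head, rng=None):
    """succ: vertex -> successor (None for the tail); weight: vertex -> number.
    Returns rank with rank(head) = w(head), rank(succ(x)) = rank(x) + w(succ(x))."""
    if rng is None:
        rng = random.Random(0)
    if len(succ) <= 2:
        # Base case: follow the pointers directly.
        ranks, total, x = {}, 0, head
        while x is not None:
            total += weight[x]
            ranks[x] = total
            x = succ[x]
        return ranks

    pred = {s: x for x, s in succ.items() if s is not None}

    # Step 1: independent set by coin flipping (repeat if nothing was chosen).
    indep = set()
    while not indep:
        heads = {x: rng.random() < 0.5 for x in succ}
        indep = {x for x, s in succ.items()
                 if x != head and s is not None and heads[x] and not heads[s]}

    # Step 2: bridge out every x in the set and push its weight to succ(x).
    new_succ = {x: s for x, s in succ.items() if x not in indep}
    new_weight = {x: weight[x] for x in new_succ}
    for x in indep:
        y, z = pred[x], succ[x]  # neither neighbour is in the set
        new_succ[y] = z
        new_weight[z] += weight[x]

    # Step 3: recurse on the shorter list.
    ranks = list_rank(new_succ, new_weight, head, rng)

    # Step 4: reinsert the removed vertices using their predecessors' ranks.
    for x in indep:
        ranks[x] = ranks[pred[x]] + weight[x]
    return ranks

order = [5, 2, 9, 1, 7, 3, 8]  # list order, head first
succ = {a: b for a, b in zip(order, order[1:] + [None])}
weight = {x: 1 for x in order}
weight[order[0]] = 0           # unweighted ranks: distance from the head
print(list_rank(succ, weight, order[0]))
```

Setting the head's weight to 0 and all other weights to 1 recovers the unweighted problem, where the rank of a vertex is its distance from the head.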


Computing a Large Independent Set of L 

There are several algorithms for finding an
independent set of a list L that run in $O(\mathrm{sort}(n))$ I/Os.
The simplest one is a randomized algorithm by
Chiang et al. [4], based on a PRAM algorithm by
Anderson and Miller. The idea is to flip a coin
for each vertex and then select the vertices whose
coin came up heads and whose successor’s coin
came up tails. This produces an independent set
of expected size $(n - 1)/4 = \Theta(n)$.

In the serial setting, a different way to compute
an independent set of L is to 3-color the vertices
in L (assign one of the three colors to each
vertex such that no two adjacent vertices have the
same color) and then pick the most popular of the
three colors, an independent set of at least $n/3$
vertices. This can be implemented in $O(\mathrm{sort}(n))$
I/Os using time-forward processing [4]. A more
direct algorithm, also based on time-forward
processing, is described in [11].

In the parallel setting, time-forward processing
is not available. Instead deterministic coin
tossing can be used. With permuting complexity,
any $k$-coloring of a list can be transformed to an
$O(\log k)$-coloring by what is known as deterministic
coin tossing [5]. This immediately leads to a
parallel deterministic algorithm with an additional
$\log^* n$ factor in the number of parallel I/Os.
Alternatively, as long as sorting is still convex
in n, a technique called delayed pointer
processing [3] can be used. This can be understood as a
parallel version of time-forward processing for a
DAG of depth $\log \log n$.
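
One round of deterministic coin tossing can be sketched as follows. This is the textbook Cole-Vishkin recoloring step written for an in-memory list, offered as an illustration rather than as the PEM algorithm of the cited works. The assumptions are that vertex IDs are distinct non-negative integers (they serve as the initial coloring) and that the tail, having no successor, keeps only its lowest bit. The names `dct_round` and `six_color` are hypothetical.

```python
def dct_round(succ, color):
    """Recolor every vertex with 2*i + b, where i is the lowest bit position
    in which its color differs from its successor's and b is its own bit i."""
    new_color = {}
    for x, s in succ.items():
        if s is None:
            new_color[x] = color[x] & 1  # tail: behave as if i = 0
        else:
            diff = color[x] ^ color[s]   # nonzero because the coloring is proper
            i = (diff & -diff).bit_length() - 1
            new_color[x] = 2 * i + ((color[x] >> i) & 1)
    return new_color

def six_color(succ):
    """Repeat the round until only the colors 0..5 remain."""
    color = {x: x for x in succ}  # distinct IDs form a proper coloring
    while max(color.values()) > 5:
        color = dct_round(succ, color)
    return color

order = [(i * 37) % 101 for i in range(101)]  # a scrambled list of 101 vertices
succ = {a: b for a, b in zip(order, order[1:] + [None])}
colors = six_color(succ)
print(sorted(set(colors.values())))
```

Each round maps colors of b bits to colors below 2b, so the number of rounds needed to reach six colors grows like $\log^* n$; reducing six colors to three takes a few additional passes that are not shown here.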

Alternatively, any CREW-PRAM list-ranking
algorithm can be executed on the PEM,
leading to one parallel I/O per step.


Lower Bounds 

There is a permuting complexity lower bound 
for list ranking, showing that the above-explained 
algorithm is asymptotically optimal for a large 
range of the parameters. This was sketched in [4] 
for the serial case and made precise in [8] by 
the indivisibility assumption that edges have to 
be treated as atoms and extended to the parallel 
case. Observe that in the PEM model, superlinear 
speedups in the number of processors are possible 
(the overall available fast memory increases), and 
hence a lower bound for the serial case does not
imply a good lower bound for the parallel case.

If the number of processors is high (or for 
parallel computational models where permuting 
is easy like BSP or map-reduce), list ranking 
seems to become more difficult than permuting. 
All known algorithms are a factor $O(\log n)$ more
expensive than permuting. One attempt at an 
explanation is a lower bound in the mentioned 
setting where the instance is only gradually re- 
vealed (depending on the algorithm already hav- 
ing solved certain other parts of the instance) [8]. 
For this particular parameter setting, the lower 
bound shows that the described sorting-based 
algorithm is optimal. 


Applications 


List ranking in external memory is particularly
useful in connection with Euler tours [4]. An
Euler tour of an undirected tree $T = (V, E)$ is
a traversal of $T$ that visits every edge twice, once
in each direction. Such a traversal is represented
as a linear list $L$ of edges and can be obtained
I/O-efficiently as follows: after fixing an order of
the edges $\{v, w_1\}, \ldots, \{v, w_k\}$ incident to each
node $v$ of $T$, we set the successor of $\{w_i, v\}$ in
$L$ to be $\{v, w_{i+1}\}$ and the successor of $\{w_k, v\}$ in
$L$ to be $\{v, w_1\}$. We break the resulting circular
list at some root node $r$ by choosing an edge
$\{v, r\}$ with successor $\{r, w\}$, setting the successor
of $\{v, r\}$ to be null and marking $\{r, w\}$ to be the
first edge of the traversal list $L$. List ranking on
$L$ is then applied in order to lay out the Euler
tour on disk in a way that about $B$ consecutive
list elements are kept in each block. As an Euler
tour reflects the structure of its underlying tree $T$,
many properties of $T$ can be derived from a few
scanning and sorting steps on the edge sequence
of the Euler tour once it has been stored in a way
suitable for I/O-efficient traversal. In fact, this
technique is not restricted to external memory
but has already been used earlier [9] for parallel
(PRAM) algorithms, where the scanning steps are
replaced by parallel prefix computations.
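
The successor rule for the Euler tour can be written down directly. The sketch below is an in-memory illustration under these assumptions: the tree is given as an adjacency dictionary whose neighbor lists fix the edge order at each node, directed pairs (u, v) represent the two traversal directions of an edge, and following pointers from the first edge stands in for the list-ranking step. The function name `euler_tour` is hypothetical.

```python
def euler_tour(adj, root):
    """adj: node -> ordered list of neighbours in an undirected tree.
    Returns the tour as a list of directed edges starting at the root."""
    succ = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        for i, w in enumerate(nbrs):
            # The edge entering v from w_i is followed by the edge leaving v
            # towards w_{i+1}, wrapping around from w_k to w_1.
            succ[(w, v)] = (v, nbrs[(i + 1) % k])

    # Break the circular list at the root: pick an edge entering the root,
    # remember its successor as the first edge, and cut the pointer.
    last = (adj[root][-1], root)
    first = succ[last]
    succ[last] = None

    # In-memory stand-in for list ranking: follow the successor pointers.
    tour, e = [], first
    while e is not None:
        tour.append(e)
        e = succ[e]
    return tour

tree = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(euler_tour(tree, root=1))
```

Every tree edge appears exactly twice in the result, once per direction, so a tree with $|V| - 1$ edges yields a list of $2(|V| - 1)$ elements.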

Classic tree problems solved with the Euler 
tour technique include tree rooting (finding the 
parent-child direction of tree edges after a vertex 
of an unrooted and undirected tree has been 
chosen to become the root), assigning pre-/post- 
/inorder numbers, and computing node levels or 
number of descendants. Euler tours and hence 
list ranking are also useful for non-tree graphs. 
For example, they are a basic ingredient of a 
clustering preprocessing step [1] for I/O-efficient 
breadth first search (BFS) on sparse undirected 
graphs G: after obtaining a spanning tree T for
G, an Euler tour around T is used in order to de- 
terministically obtain low diameter clusters of G. 


Experimental Results 


Despite their theoretical sorting complexity I/O
bound, external-memory list-ranking
implementations based on independent set removal suffer
from non-negligible constant factors. For small
to medium input sizes featuring $n \le M^2/(4 \cdot B)$,
Sibeyn modifies his connected components
algorithm [10] in order to solve practical
list-ranking problems in scanning complexity with a
small constant factor ($22 \cdot n/B$ I/Os). The
algorithm splits the input list into at most $M/(2 \cdot B)$
subproblems of $M/2$ consecutive node indices
each and processes these subproblems in two
passes (the first one running from high to low
index ranges and the second one vice versa).

For all nodes of the current range, Sibeyn’s 
algorithm follows the links leading to the nodes 
of the same sublists and updates the information 
on their final node and the number of links 
to it. For all nodes with links running outside 
the current sublist, the required information is 
requested in a batched fashion from the sub- 
problems containing the nodes to which they are 
linked. Phase-one-requests from and phase-two- 
answers to the sublists are processed only when 
the wave through the data hits the corresponding 
subproblem. Due to the overall size restriction, 
a buffer of size $\Theta(B)$ can be kept in main
memory for each subproblem in order to facilitate
I/O-efficient information transfer between the
subproblems. 

The implementation of Sibeyn’s algorithm in 
the STXXL [6] framework has been used as 
a building block in the engineering of many 
graph traversal algorithms [1]. For example, the 
improved clustering preprocessing based on list 
ranking and Euler tours helped in reducing the 
I/O wait time for BFS by up to two orders 
of magnitude compared to a previous clustering 
method. 
Problem Definition


The paper of Graham [8] was published in the 
1960s. Over the years, it served as a common 
example of online algorithms (though the original 
algorithm was designed as a simple approxima- 
tion heuristic). The following basic setting is 
considered. 

A sequence of n jobs is to be assigned to m
identical machines. Each job should be assigned 
to one of the machines. Each job has a size asso- 
ciated with it, which can be seen as its processing 
time or its load. The load of a machine is the 
sum of sizes of jobs assigned to it. The goal is to 
minimize the maximum load of any machine, also 
called the makespan. We refer to this problem as 
JOB SCHEDULING. 

If jobs are presented one by one and each 
job needs to be assigned to a machine in turn,
without any knowledge of future jobs, the prob- 
lem is called online. Online algorithms are typ- 
ically evaluated using the (absolute) competitive 
ratio, which is similar to the approximation ratio 
of approximation algorithms. For an algorithm 
A, we denote its cost by A as well. The cost 
of an optimal offline algorithm that knows the 
complete sequence of jobs is denoted by OPT. 
The competitive ratio of an algorithm A is the 
infimum $R \ge 1$ such that for any input,
$A \le R \cdot \mathrm{OPT}$.


Key Results 


In paper [8], Graham defines an algorithm called 
LIST SCHEDULING (LS). The algorithm receives 
jobs one by one. Each job is assigned in turn to a 
machine which has a minimal current load. Ties 
are broken arbitrarily. 

The main result is the following: 


Theorem 1 LS has a competitive ratio of $2 - 1/m$. 


Proof Consider a schedule created for a given 
sequence. Let $\ell$ denote a job that determines 
the makespan (that is, the last job assigned to a 
machine i that has a maximum load), let L denote 
its size, and let X denote the total size of all 
other jobs assigned to i. At the time when $\ell$ was 
assigned to i, this was a machine of minimum 
load. Therefore, the load of each machine is at 
least X. The makespan of an optimal schedule 
(i.e., a schedule that minimizes the makespan) is 
the cost of an optimal offline algorithm and thus 
is denoted by OPT. Let P be the sum of all job 
sizes in the sequence. 


The two following simple lower bounds on 
OPT can be obtained: 


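$$\mathrm{OPT} \ge L \qquad (1)$$


$$\mathrm{OPT} \ge \frac{P}{m} \ge \frac{m \cdot X + L}{m} = X + \frac{L}{m} \qquad (2)$$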


Inequality (1) follows from the fact that OPT 
needs to run job $\ell$ and thus at least one machine 
has a load of at least L. The first inequality in 
(2) is due to the fact that at least one machine 
receives at least a fraction $1/m$ of the total size 
of jobs. The second inequality in (2) follows 
from the comments above on the load of each 
machine. 

This proves that the makespan of the algorithm, 
X + L, can be bounded as follows: 


$$X + L \le \mathrm{OPT} + \frac{m-1}{m} L \le \mathrm{OPT} + \frac{m-1}{m} \mathrm{OPT} = \left(2 - \frac{1}{m}\right) \mathrm{OPT}. \qquad (3)$$


The first inequality in (3) follows from (2) and the 
second one from (1). 

To show that the analysis is tight, consider 
m(m - 1) jobs of size 1 followed by a single 
job of size m. After the smaller jobs arrive, 
LS obtains a balanced schedule in which every 
machine has a load of m - 1. The additional job 
increases the makespan to 2m - 1. However, an 
optimal offline solution would be to assign the 
smaller jobs to m - 1 machines and the remaining 
job to the remaining machine, getting a load 
of m. 

A natural question was whether this bound is 
best possible. In a later paper, Graham [9] showed 
that applying LS with a sorted sequence of jobs 
(by nonincreasing order of sizes) actually gives 
a better upper bound of $4/3 - 1/(3m)$ on the 
approximation ratio. A polynomial time approximation 
scheme was given by Hochbaum and Shmoys 
in [10]. This is the best offline result one could 
hope for as the problem is known to be NP-hard 
in the strong sense. 
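
The sorted variant mentioned above can be sketched as follows. This is a minimal illustration only: the name `lpt_makespan` is hypothetical, and the sketch returns just the makespan rather than the full schedule.

```python
import heapq

def lpt_makespan(job_sizes, m):
    """Run LS on the jobs sorted by nonincreasing size; return the makespan."""
    loads = [0] * m          # a list of equal values is already a valid heap
    for size in sorted(job_sizes, reverse=True):
        # Pop the minimum load, add the job, and push the new load back.
        heapq.heappush(loads, heapq.heappop(loads) + size)
    return max(loads)
```

On two machines with job sizes 3, 3, 2, 2, 2 the sketch yields a makespan of 7, while an optimal schedule (3 + 3 and 2 + 2 + 2) has makespan 6; the ratio 7/6 equals 4/3 - 1/(3m) for m = 2.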

As for the online problem, it was shown in 
[5] that no (deterministic) algorithm has a smaller 
competitive ratio than $2 - 1/m$ for the cases m = 2 
and m = 3. On the other hand, it was shown in 
a sequence of papers that an algorithm with a 
smaller competitive ratio can be found for any 
$m \ge 4$, and even algorithms with a competitive 
ratio that does not approach 2 for large m were 
designed. 

The best such result is by Fleischer and Wahl 
[6], who designed a 1.9201-competitive algo- 
rithm. Lower bounds of 1.852 and 1.85358 on the 
competitive ratio of any online algorithm were 
shown in [1,7]. Rudin [13] claimed a better lower 
bound of 1.88. 


Applications 


As the study of approximation algorithms and 
specifically online algorithms continued, the 
analysis of many scheduling algorithms used 
similar methods to the proof above. Below, 
several variants of the problem where almost 
the same proof as above gives the exact same 
bound are mentioned. 


Load Balancing of Temporary Tasks 

In this problem, the sizes of jobs are seen as loads. 
Time is a separate axis. The input is a sequence 
of events, where every event is an arrival or a 
departure of a job. The set of active jobs at time 
t is the set of jobs that have already arrived at 
this time and have not departed yet. The cost of 
an algorithm at a time t is its makespan at this 
time. The cost of an algorithm is its maximum 
cost over time. It turns out that the analysis above 
can be easily adapted for this model as well. 
It is interesting to note that in this case, the 
bound $2 - 1/m$ is actually best possible, as shown 
in [2]. 

Scheduling with Release Times and 
Precedence Constraints 

In this problem, the sizes represent processing 
times of jobs. Various versions have been studied. 
Jobs may have designated release times, which 
are the times when these jobs become available 
for execution. In the online scenario, each job 
arrives and becomes known to the algorithm only 
at its release time. Some precedence constraints 
may also be specified, defined by a partial order 
on the set of jobs. Thus, a job can be run only 
after its predecessors complete their execution. In 
the online variant, a job becomes known to the 
algorithm only after its predecessors have been 
completed. In these cases, LS acts as follows. 
Once a machine becomes available, a waiting job 
that arrived earliest is assigned to it. (If there is 
no waiting job, the machine is idle until a new 
job arrives). 

The upper bound of $2 - 1/m$ on the competitive 
ratio can be proved using a relation between the 
cost of an optimal schedule and the amount of 
time when at least one machine is idle (see [14] 
for details). 

This bound is tight for several cases. For the 
case where there are release times, no prece- 
dence constraints, and processing times (sizes) 
are not known upon arrival, Shmoys, Wein, and 
Williamson [15] proved a lower bound of $2 - 1/m$. 
For the case where there are only precedence 
constraints (no release times, and sizes of jobs 
are known upon arrival), a lower bound of the 
same value appeared in [4]. Note that the case 
with clairvoyant scheduling (i.e., sizes of jobs 
are known upon arrival), release times, and no 
precedence constraints is not settled. For m = 2, it 
was shown by Noga and Seiden [11] that the tight 
bound is $(5 - \sqrt{5})/2 \approx 1.38198$, and the upper 
bound is achieved using an algorithm that applies 
waiting with idle machines rather than scheduling 
a job as soon as possible, as done by LS. 


Open Problems 


The most challenging open problem is to find the 
best possible competitive ratio for this basic online 
problem of job scheduling. The gap between 
the upper bound and the lower bound is not large, 
yet it seems very difficult to find the exact bound. 
A possibly easier question would be to find the 
best possible competitive ratio for m = 4. A lower 
bound of $\sqrt{3} \approx 1.732$ has been shown by [12], 
and the currently known upper bound is 1.733 
by [3]. Thus, it may be the case that this bound 
would turn out to be $\sqrt{3}$. 

Problem Definition 


The pairwise local alignment problem is con- 
cerned with identification of a pair of similar 
substrings from two molecular sequences. This 
problem has been studied in computer science 
for four decades. However, most problem mod- 
els were generally not biologically satisfying or 
interpretable before 1974. In 1974, Sellers devel- 
oped a metric measure of the similarity between 
molecular sequences. [9] generalized this metric 
to include deletions and insertions of arbitrary 
length which represent the minimum number of 
mutational events required to convert one se- 
quence into another. 

Given two sequences S and T, a pairwise 
alignment is a way of inserting space characters 
'_' in S and T to form sequences S' and T', 
respectively, with the same length. There can be 
different alignments of two sequences. The score 
of an alignment is measured by a scoring metric 
$\delta(x, y)$. At each position i where both S'[i] and 
T'[i] are not spaces, the similarity between S'[i] 
and T'[i] is measured by $\delta(S'[i], T'[i])$. Usually, 
$\delta(x, y)$ is positive when x and y are the same and 
negative when x and y are different. For positions 
with consecutive space characters, the alignment 
scores of the space characters are not considered 
independently; this is because inserting or deleting 
a long region in molecular sequences is more 
likely to occur than inserting or deleting several 
short regions. Smith and Waterman use an affine 
gap penalty to model the similarity at positions 
with space characters. They define a consecutive 
substring with spaces in S' or T' as a gap. For 
each gap of length l, they give a linear penalty 
$W_l = W_s + l \times W_p$ for some predefined positive 
constants $W_s$ and $W_p$. The score of an alignment 
is the sum of the scores at each position i 
minus the penalties of each gap. For example, 
the alignment score of the following alignment 
is $\delta(G,G) + \delta(C,C) + \delta(C,C) + \delta(U,C) + 
\delta(G,G) - (W_s + 2 \times W_p)$. 


S: GCCAUUG 
T: GCC__CG 


The optimal global alignment of sequences S and 
T is the alignment of S and T with the maximum 
alignment score. 

Sometimes we want to know whether 
sequences S and T contain similar substrings 
instead of whether S and T are similar. In this 
case, we solve the pairwise local alignment 
problem, which asks for a substring U in S 
and another substring V in T such that the global 
alignment score of U and V is maximized. 


Pairwise Local Alignment Problem 
Input: Two sequences S[1...n] and T[1...m]. 
Output: A substring U in S and a substring V in 
T such that the optimal global alignment score of U 
and V is maximized. 


The pairwise local alignment problem can be 
solved in O(mn) time and O(mn) space by dynamic 
programming. The algorithm needs to fill 
in the four m × n tables $H$, $H_N$, $H_S$, and $H_T$, where 
each entry takes constant time. The individual 
meanings of these four tables are as follows. 


$H(i, j)$: maximum score of the global alignment 
of U and V over all suffixes U in 
S[1...i] and all suffixes V in T[1...j]. 

$H_N(i, j)$: maximum score of the global alignment 
of U and V over all suffixes U in 
S[1...i] and all suffixes V in T[1...j], with 
the restriction that S[i] and T[j] must be 
aligned. 

$H_S(i, j)$: maximum score of the global alignment 
of U and V over all suffixes U in 
S[1...i] and all suffixes V in T[1...j], with 
S[i] aligned with a space character. 

$H_T(i, j)$: maximum score of the global alignment 
of U and V over all suffixes U in 
S[1...i] and all suffixes V in T[1...j], with 
T[j] aligned with a space character. 


The optimal local alignment score of S and T 
will be $\max\{H(i, j)\}$, and the local alignment of 
S and T can be found by tracking back table H. 

In the tables, each entry can be filled in by the 
following recursion in constant time. 


Basic Step 


$$H(i, 0) = H(0, j) = 0, \quad 0 \le i \le n,\ 0 \le j \le m$$

$$H_N(i, 0) = H_N(0, j) = -\infty, \quad 0 \le i \le n,\ 0 \le j \le m$$

$$H_S(i, 0) = H_T(0, j) = W_s + W_p, \quad 0 \le i \le n,\ 0 \le j \le m$$

$$H_S(0, j) = H_T(i, 0) = -\infty, \quad 0 \le i \le n,\ 0 \le j \le m$$


Recursion Step 


$$H(i, j) = \max\{H_N(i, j),\ H_S(i, j),\ H_T(i, j),\ 0\}, \quad 1 \le i \le n,\ 1 \le j \le m$$

$$H_N(i, j) = H(i-1, j-1) + \delta(S[i], T[j]), \quad 1 \le i \le n,\ 1 \le j \le m$$

$$H_S(i, j) = \max\{H(i-1, j) - (W_s + W_p),\ H_S(i-1, j) - W_p\}, \quad 1 \le i \le n,\ 1 \le j \le m$$

$$H_T(i, j) = \max\{H(i, j-1) - (W_s + W_p),\ H_T(i, j-1) - W_p\}, \quad 1 \le i \le n,\ 1 \le j \le m$$
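
The recursion can be turned into a short score-only sketch in Python. Several choices here are assumptions made for illustration and are not taken from the entry: the function name `local_alignment_score`, the 0-based string indexing, the omission of traceback, and the boundary values of the gap tables, which are simply set to minus infinity so that every gap is opened from table H.

```python
import math

def local_alignment_score(S, T, delta, Ws, Wp):
    """Score of the best local alignment under gap penalty Ws + l * Wp.

    delta(x, y) is the similarity of two characters. Sketch only:
    gap-table boundaries are -inf (an assumption of this example).
    """
    n, m = len(S), len(T)
    NEG = -math.inf
    H = [[0] * (m + 1) for _ in range(n + 1)]     # H(i,0) = H(0,j) = 0
    HS = [[NEG] * (m + 1) for _ in range(n + 1)]  # S[i] aligned with a space
    HT = [[NEG] * (m + 1) for _ in range(n + 1)]  # T[j] aligned with a space
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # H_N: S[i] and T[j] aligned with each other.
            hn = H[i - 1][j - 1] + delta(S[i - 1], T[j - 1])
            # Open a new gap from H, or extend an existing one.
            HS[i][j] = max(H[i - 1][j] - (Ws + Wp), HS[i - 1][j] - Wp)
            HT[i][j] = max(H[i][j - 1] - (Ws + Wp), HT[i][j - 1] - Wp)
            H[i][j] = max(hn, HS[i][j], HT[i][j], 0)
            best = max(best, H[i][j])
    return best
```

With a match score of 2, a mismatch score of -1, and Ws = Wp = 1, aligning AAAACCCC against AAAAGCCCC scores 14: eight matches (16) minus one gap of length 1 (penalty 2).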


Applications 


Local alignment with affine gap penalty can 
be used for protein classification, phylogenetic 
footprinting, and identification of functional 
sequence elements. 


URL to Code 


http://bioweb.pasteur.fr/seqanal/interfaces/water.html 

Problem Definition 


This work of Miller and Myers [11] deals with the 
problem of pairwise sequence alignment in which 
the distance measure is based on the gap penalty 
model. They proposed an efficient algorithm to 
solve the problem when the gap penalty is a 
concave function of the gap length. 

Let X and Y be two strings (sequences) over an 
alphabet $\Sigma$. The pairwise alignment A of X and 
Y maps X, Y into strings X', Y' that may contain 
spaces (not in $\Sigma$) such that (1) $|X'| = |Y'| = \ell$; 
(2) removing spaces from X' and Y' returns X 
and Y, respectively; and (3) for any $1 \le i \le \ell$, 
X'[i] and Y'[i] cannot both be spaces, where X'[i] 
denotes the ith character in X'. 


Local Alignment (with Concave Gap Weights) 


To evaluate the quality of an alignment, there 
are many different measures proposed (e.g., edit 
distance, scoring matrix [12]). In this work, they 
consider the gap penalty model. 

A gap in an alignment A of X and Y is a 
maximal substring of contiguous spaces in either 
X' or Y'. There are gaps and aligned characters 
(both X'[i] and Y'[i] are not spaces) in an alignment. 
The score for a pair of aligned characters is 
based on a distance function $\delta(a, b)$ where 
$a, b \in \Sigma$. Usually $\delta$ is a metric, but this assumption is 
not required in this work. The penalty of a gap 
of length k is based on a nonnegative function 
W(k). The score of an alignment is the sum of 
the scores of all aligned characters and gaps. An 
alignment is optimal if its score is the minimum 
possible. 

The penalty function W(k) is concave if 
$\Delta W(k) \ge \Delta W(k + 1)$ for all $k \ge 1$, where 
$\Delta W(k) = W(k + 1) - W(k)$. 

The penalty function W(k) is affine if W(k) = 
a + bk where a and b are constants. An affine 
function is a special case of a concave function. The 
problem for affine gap penalties has been considered 
in [1, 7]. 
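
As a small illustration of the concavity condition, the following Python sketch tests it numerically over a finite range of gap lengths. The helper name `is_concave`, the finite bound `k_max`, and the small floating-point tolerance are assumptions of this example; a finite check can refute concavity but cannot prove it for all k.

```python
import math

def is_concave(W, k_max, tol=1e-12):
    """Check Delta W(k) >= Delta W(k+1) for k = 1 .. k_max - 1.

    Delta W(k) = W(k+1) - W(k). Hypothetical helper for illustration.
    """
    deltas = [W(k + 1) - W(k) for k in range(1, k_max + 1)]
    return all(d1 >= d2 - tol for d1, d2 in zip(deltas, deltas[1:]))
```

An affine penalty such as W(k) = 3 + 2k has constant differences and passes the check, as does a logarithmic penalty such as W(k) = 1 + 2 log k, whereas W(k) = k^2 fails it.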

The penalty function W(k) is a P-piece affine 
curve if the domain of W can be partitioned into 
P intervals, $(\tau_1 = 1, \chi_1), (\tau_2, \chi_2), \ldots, (\tau_P, \chi_P = \infty)$, 
where $\tau_i = \chi_{i-1} + 1$ for all $1 < i \le P$, 
such that for each interval, the values of W 
follow an affine function. More precisely, for 
any $k \in (\tau_i, \chi_i)$, $W(k) = a_i + b_i k$ for some 
constants $a_i$, $b_i$. 
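
A P-piece affine curve can be represented directly by its list of intervals and coefficients. The sketch below is one possible encoding, not the representation used by Miller and Myers; the name `make_piecewise_affine` and the tuple layout `(tau, chi, a, b)` are assumptions of this example.

```python
import math
from bisect import bisect_right

def make_piecewise_affine(pieces):
    """Build W(k) from pieces [(tau_i, chi_i, a_i, b_i), ...].

    Requires tau_1 = 1, chi_P = infinity and tau_i = chi_{i-1} + 1.
    """
    if pieces[0][0] != 1 or pieces[-1][1] != math.inf:
        raise ValueError("intervals must start at 1 and end at infinity")
    for (_, chi_prev, _, _), (tau, _, _, _) in zip(pieces, pieces[1:]):
        if tau != chi_prev + 1:
            raise ValueError("intervals must be contiguous")
    starts = [p[0] for p in pieces]

    def W(k):
        if k < 1:
            raise ValueError("gap length must be at least 1")
        # Binary search for the interval containing k: O(log P) per query.
        _, _, a, b = pieces[bisect_right(starts, k) - 1]
        return a + b * k

    return W
```

For example, the two pieces (1, 3, 2, 3) and (4, infinity, 8, 1) give W(3) = 11 and W(4) = 12.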


Problem 


Input: Two strings X and Y, the scoring function 
$\delta$, and the gap penalty function W(k). 
Output: An optimal alignment of X and Y. 


Key Results 


Theorem 1 If W(k) is concave, they provide an 
algorithm for computing an optimal alignment 
that runs in $O(n^2 \log n)$ time where n is the 
length of each string and uses O(n) expected 
space. 


Corollary 1 If W(k) is an affine function, the 
same algorithm runs in $O(n^2)$ time. 


Theorem 2 For some special types of gap 
penalty functions, the algorithm can be modified 
to run faster. 


• If W(k) is a P-piece affine curve, the algorithm 
can be modified to run in $O(n^2 \log P)$ 
time. 

• For a logarithmic gap penalty function, $W(k) = 
a + b \log k$, the algorithm can be modified to 
run in $O(n^2)$ time. 

• If W(k) is a concave function when k > 
K, the algorithm can be modified to run in 
$O(K + n^2 \log n)$ time. 


Applications 


Pairwise sequence alignment is a fundamental 
problem in computational biology. Sequence 
similarity usually implies functional and 
structural similarity. So, pairwise alignment can 
be used to check whether two given sequences 
have similar functions or structures and to predict 
functions of newly identified DNA sequences. One 
can refer to Gusfield’s book for some examples 
on the importance of sequence alignment (pp. 
212-214 of [8]). 

The alignment problem can be further divided 
into the global alignment problem and the local 
alignment problem. The problem defined here is 
the global alignment problem in which the whole 
input strings are required to align with each other. 
On the other hand, for local alignment, the main 
interest lies in identifying a substring from each 
of the input strings such that the alignment score 
of the two substrings is the minimum among all 
possible substrings. Local alignment is useful in 
aligning sequences that are not similar but contain 
a region that is highly conserved (similar). 
Usually this region is a functional part (domain) 
of the sequences. Local alignment is particu- 
larly useful in comparing proteins. Proteins in 
the same family from different species usually 
have some functional domains that are highly 
conserved while the other parts are not similar 
at all. Examples are the homeobox genes [4] for 
which the protein sequences are quite different 
in each species except for the functional domain, 
the homeodomain. 

Conceptually, the alignment score is used to 
capture the evolutionary distance between the 
two given sequences. Since a gap of more than 
one space can be created by a single mutational 
event, considering a gap of length k as a unit 
instead of k different point mutations may be more 
appropriate in some cases. However, which gap 
penalty function should be used is a difficult 
question to answer and sometimes depends on 
the actual application. Most applications, such 
as BLAST, use the affine gap penalty, which is 
still the dominant model in practice. On the other 
hand, Benner et al. [2] and Gu and Li [9] suggested 
using the logarithmic gap penalty in some 
cases. Whether using a concave gap penalty function 
in general is meaningful is still an open issue. 


Open Problem 


Note that the results of this paper have been 
independently obtained by Galil and Giancarlo 
[6], and for affine gap penalty, Gotoh [7] also 
gave an $O(n^2)$ algorithm for solving the alignment 
problem. In [5], Eppstein gave a faster 
algorithm that runs in $O(n^2)$ time for solving the 
same sequence alignment problem with concave 
gap penalty function. Whether a subquadratic 
algorithm exists for solving this problem remains 
open. As a remark, subquadratic algorithms do 
exist for solving the sequence alignment problem 
if the measure is not based on the gap penalty 
model, but is computed as $\sum_{i=1}^{\ell} \delta(X'[i], Y'[i])$ 
based only on a scoring function $\delta(a, b)$ where 
$a, b \in \Sigma \cup \{\_\}$ and '_' represents the space 
[3, 10]. 


Experimental Results 


They have performed some experiments to compare 
their algorithm with Waterman's $O(n^3)$ 
algorithm [13] on a number of different concave 
gap penalty functions. Artificial sequences are 
generated for the experiments. Results from their 
experiments lead to their conjectures that Waterman's 
method runs in O(n) time when the two 
given strings are very similar or the score for 
mismatch characters is small and their algorithm 
runs in $O(n^2)$ time if the range of the function 
W(k) is not functionally dependent on n. 

Problem Definition 


A local algorithm is a distributed algorithm on 
a network with a running time which is inde- 
pendent or almost independent of the network’s 
size or diameter. Usually, a distributed algorithm 
is called local if its time complexity is at most 
polylogarithmic in the size n of the network. 
Because the time needed to send information 
from one node of a network to another is at 
least proportional to the distance between the two 
nodes, in such an algorithm, each node’s com- 
putation is based on information from nodes in 
a close vicinity only. Although all computations 
are based on local information, the network as 
a whole typically still has to achieve a global 
goal. Having local algorithms is essential for 
obtaining time-efficient distributed protocols for 
large-scale and dynamic networks such as peer-to-peer 
networks or wireless ad hoc and sensor networks. 

In [2, 6, 7], Kuhn, Moscibroda, and Watten- 
hofer describe upper and lower bounds on the 
possible trade-off between locality (time com- 
plexity) of distributed algorithms and the quality 
(approximation ratio) of the achievable solution 
for an important class of problems called covering 
and packing problems. Interesting covering 
and packing problems in the context of net- 
works include minimum dominating set, mini- 
mum vertex cover, maximum matching, as well 
as certain flow maximization problems. All the 
results given in [2, 6, 7] hold for general network 
topologies. Interestingly, it is shown by Kuhn, 
Moscibroda, Nieberg, and Wattenhofer in [3, 4, 5] 
that covering and packing problems can be solved 
much more efficiently when assuming that the 
network topology has special properties which 
seem realistic for wireless networks. 


Distributed Computation Model 

In [2, 3, 4, 5, 6, 7], the network is modeled 
as an undirected and, except for [5], unweighted 
graph G = (V, E). Two nodes $u, v \in V$ of the 
network are connected by an edge $(u, v) \in E$ 
whenever there is a direct bidirectional communication 
channel connecting u and v. In the 
following, the number of nodes and the maximal 
degree of G are denoted by n = |V| and by $\Delta$. 

For simplicity, communication is assumed to 
be synchronous. That is, all nodes start an al- 
gorithm simultaneously and time is divided into 
rounds. In each round, every node can send an 
arbitrary message to each of its neighbors and 
perform some local computation based on the 
information collected in previous rounds. The 
time complexity of a synchronous distributed al- 
gorithm is the number of rounds until all nodes 
terminate. 
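
The synchronous round structure can be illustrated with a tiny simulator. The sketch below is hypothetical (the name `run_synchronous_rounds` and the choice of node identifiers as the exchanged information are assumptions); it shows that after k rounds a node can know exactly the nodes within distance k, which is why the number of rounds bounds the locality of an algorithm.

```python
def run_synchronous_rounds(adj, rounds):
    """Simulate `rounds` synchronous rounds of flooding node identifiers.

    adj maps each node to an iterable of its neighbors (undirected graph).
    Returns, for every node, the set of node ids it has learned.
    """
    known = {v: {v} for v in adj}
    for _ in range(rounds):
        # Messages of a round depend only on the state before the round.
        outbox = {v: frozenset(known[v]) for v in adj}
        for v in adj:
            for u in adj[v]:
                known[v] |= outbox[u]
    return known
```

On the path 0 - 1 - 2 - 3, node 0 knows {0, 1} after one round and the whole path only after three rounds.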

Local distributed algorithms in the described 
synchronous model were first considered 
in [8] and [9]. As an introduction to the above 
and similar distributed computation models, it is 
also recommended to read [11]. 


Distributed Covering and Packing Problems 

A fractional covering problem (P) and its dual 
fractional packing problem (D) are linear programs 
(LPs) of the canonical forms 


$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & A \cdot x \ge b \\ & x \ge 0 \end{array} \quad (P) \qquad\qquad \begin{array}{ll} \max & b^T y \\ \text{s.t.} & A^T \cdot y \le c \\ & y \ge 0 \end{array} \quad (D)$$


where all $a_{ij}$, $b_i$, and $c_i$ are non-negative. In 
a distributed context, finding a small (weighted) 
dominating set or a small (weighted) vertex cover 
of the network graph are the most important 
covering problems. A dominating set of a graph G 
is a subset S of its nodes such that all nodes of G 
either are in S or have a neighbor in S. The dominating 
set problem can be formulated as a covering 
integer LP by setting A to be the adjacency 
matrix with 1s on the diagonal, b to be the 
all-ones vector, and c to be the weight vector. 
A vertex cover is a subset of the nodes such that 
all edges are covered. Packing problems occur in 
a broad range of resource allocation problems. 
As an example, in [1] and [10], the problem of 
assigning flows to a given fixed set of paths is 
described. Another common packing problem is 
(weighted) maximum matching, the problem of 
finding a largest possible set of pairwise non- 
adjacent edges. 
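
The dominating set formulation can be made concrete with a short sketch. The helper names `covering_matrix` and `is_feasible_cover` are hypothetical; the code only builds the constraint matrix A (adjacency matrix with 1s on the diagonal) and tests the covering constraint A x >= b for a given integral or fractional vector x, without solving the LP.

```python
def covering_matrix(adj, nodes):
    """Adjacency matrix of the graph with 1s on the diagonal."""
    idx = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for v in nodes:
        A[idx[v]][idx[v]] = 1          # a node covers itself
        for u in adj[v]:
            A[idx[v]][idx[u]] = 1      # and is covered by its neighbors
    return A

def is_feasible_cover(A, x, b=None):
    """Check A x >= b componentwise; b defaults to the all-ones vector."""
    if b is None:
        b = [1] * len(A)
    return all(sum(a * xi for a, xi in zip(row, x)) >= bi
               for row, bi in zip(A, b))
```

For a star with center 0 and leaves 1, 2, 3, the vector x = (1, 0, 0, 0) is feasible (the center dominates every node), while x = (0, 1, 0, 0) is not.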

While computing a dominating set, vertex 
cover, or matching of the network graph are 
inherently distributed tasks, general covering 
and packing LPs have no immediate distributed 
meaning. To obtain a distributed version of these 
LPs, two dual LPs (P) and (D) are mapped to 
a bipartite network as follows. For each primal 
variable $x_i$ and for each dual variable $y_j$, there 
are nodes $v_i^p$ and $v_j^d$, respectively. There is an 
edge between two nodes $v_i^p$ and $v_j^d$ whenever 
$a_{ji} \ne 0$, i.e., there is an edge if the ith variable 
of an LP occurs in its jth inequality. 
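
The mapping to a bipartite network is mechanical, as the following sketch suggests. The function name `lp_to_bipartite` and the labeling of nodes as `("p", i)` and `("d", j)` are assumptions of this example; indices are 0-based.

```python
def lp_to_bipartite(A):
    """Return the edges of the bipartite network for constraint matrix A.

    Row j of A is the j-th inequality; column i is the i-th primal variable.
    Primal node ("p", i) and dual node ("d", j) are adjacent iff A[j][i] != 0.
    """
    edges = []
    for j, row in enumerate(A):
        for i, a in enumerate(row):
            if a != 0:
                edges.append((("p", i), ("d", j)))
    return edges
```

For A = [[1, 0, 2], [0, 3, 0]], the first inequality involves variables 0 and 2 and the second involves variable 1, giving three edges.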

In most real-world examples of distributed 
covering and packing problems, the network 
graph is of course not equal to the described 
bipartite graph. However, it is usually straight- 
forward to simulate an algorithm which is 
designed for the above bipartite network on the 
actual network graph without affecting time and 
message complexities. 


Bounded Independence Graphs 

In [3, 4, 5], local approximation algorithms for 
covering and packing problems for graphs occurring 
in the context of wireless ad hoc and 
sensor networks are studied. Because of scale, 
dynamism, and the scarcity of resources, these 
networks are a particularly interesting area to apply 
local distributed algorithms. 

Wireless networks are often modeled as unit 
disk graphs (UDGs): Nodes are assumed to be 
in a two-dimensional Euclidean plane and two 
nodes are connected by an edge iff their dis- 
tance is at most 1. This certainly captures the 
inherent geometric nature of wireless networks. 
However, unit disk graphs seem much too restric- 
tive to accurately model real wireless networks. 
In [3, 4, 5], Kuhn et al. therefore consider two 
generalizations of the unit disk graph model, 
bounded independence graphs (BIGs) and unit ball 
graphs (UBGs). A BIG is a graph where all local 
independent sets are of bounded size. In particular, 
it is assumed that there is a function I(r) which 
upper bounds the size of the largest independent 
set of every r-neighborhood in the graph. Note 
that the value of I(r) is independent of n, the 
size of the network. If I(r) is a polynomial in r, 
a BIG is said to be polynomially bounded. UDGs 
are BIGs with $I(r) \in O(r^2)$. UBGs are a natural 
generalization of UDGs. Given some underlying 
metric space (V, d), two nodes $u, v \in V$ are connected 
by an edge iff $d(u, v) \le 1$. If the metric 
space (V, d) has constant doubling dimension 
(the doubling dimension of a metric space is 
the logarithm of the maximal number of balls 
needed to cover a ball $B_r(x)$ in the metric space 
with balls $B_{r/2}(y)$ of half the radius), a UBG is 
a polynomially bounded BIG. 
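
As a concrete illustration of the unit disk graph model, the following sketch builds the graph from a list of points in the plane. The function name `unit_disk_graph` and the quadratic all-pairs construction are choices made for this example only.

```python
import math
from itertools import combinations

def unit_disk_graph(points):
    """Connect two points iff their Euclidean distance is at most 1.

    points is a list of (x, y) tuples; returns an adjacency dict keyed
    by point index. Simple O(n^2) construction for illustration.
    """
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= 1.0:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

For the points (0, 0), (1, 0), and (2.5, 0), only the first two are adjacent. Replacing `math.dist` with another metric d would give the corresponding unit ball graph.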


Key Results 


The first algorithms to solve general distributed 
covering and packing LPs appear in [1, 10]. 
In [1], it is shown that it is possible to 
find a solution which is within a factor of 
$1 + \varepsilon$ of the optimum in $O(\log^2(\rho n)/\varepsilon^2)$ 
rounds where $\rho$ is the ratio between the 
largest and the smallest non-zero coefficient 
of the LPs. The result of [1] is improved and 
generalized in [6, 7] where the following result is 
proven: 


Theorem 1 In k rounds, (P) and (D) can be 
approximated by a factor of $(\rho\Delta)^{O(1/\sqrt{k})}$ 
using messages of size at most $O(\log(\rho\Delta))$. 
A $(1 + \varepsilon)$-approximation can be found in time 
$O(\log^2(\rho\Delta)/\varepsilon^4)$. 


The algorithm underlying Theorem 1 needs only 
small messages of size $O(\log(\rho\Delta))$ and extremely 
simple and efficient local computations. 
If larger messages and more complicated (but 
still polynomial) local computations are allowed, 
it is possible to improve the result of Theorem 1: 


Theorem 2 In k rounds, LPs of the form 
(P) or (D) can be approximated by a factor 
of $O(n^{O(1/k)})$. This implies that a constant 
approximation can be found in time $O(\log n)$. 


Theorems 1 and 2 only give bounds on the quality 
of distributed solutions of covering and packing 
LPs. However, many of the practically relevant 
problems are integer versions of covering and 
packing LPs. Combined with simple randomized 
rounding schemes, the following upper bounds 
for dominating set, vertex cover, and matching 
are proven in [6, 7]: 


Theorem 3 Let $\Delta$ be the maximal degree of 
the given network graph. In k rounds, minimum 
dominating set can be approximated by a factor 
of $O(\Delta^{O(1/\sqrt{k})} \cdot \log \Delta)$ in expectation by using 
messages of size $O(\Delta)$. Without bound on the 
message size, an expected approximation ratio 
of $O(n^{O(1/k)} \cdot \log \Delta)$ can be achieved. Minimum 
vertex cover and maximum matching can both be 
approximated by a factor of $O(\Delta^{1/k})$ in k rounds. 


In [2, 7], it is shown that the upper bounds 
on the trade-offs between time complexity and 
approximation ratio given by Theorems 1-3 are 
almost optimal: 


Theorem 4 In k rounds, it is not possible 
to approximate minimum vertex cover 
better than by factors of $\Omega(\Delta^{1/k}/k)$ and 
$\Omega(n^{\Omega(1/k^2)}/k)$. This implies time lower bounds 
of $\Omega(\log \Delta / \log\log \Delta)$ and $\Omega(\sqrt{\log n / \log\log n})$ 
for constant or even poly-logarithmic approximation 
ratios. The same bounds hold for minimum 
dominating set, for maximum matching, as well 
as for the underlying LPs. 


While Theorem 4 shows that the results given by 
Theorems 1-3 are close to optimal for worst-case 
network topologies, the problems might be much 
simpler if restricted to networks which actually 
occur in reality. In fact, it is shown in [3, 4, 5] 
that the above results can indeed be improved 
if the network graph is assumed to be a BIG or 
a UBG with constant doubling dimension. In [5], 
the following result for UBGs is proven: 


Theorem 5 Assume that the network graph 
G =(V,E) is a UBG with underlying metric 
(V, d). If (V, d) has constant doubling dimension 
and if all nodes know the distances to their 
neighbors in G up to a constant factor, it is 
possible to find constant approximations for 
minimum dominating set, minimum vertex cover, 
maximum matching, as well as for general LPs of 
the forms (P) and (D) in O(log* n) rounds. (The 
log-star function log* n is an extremely slowly 
increasing function which gives the number of 
times the logarithm has to be taken to obtain 
a number smaller than 1.) 
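
To give a feeling for how slowly the log-star function grows, here is a minimal Python sketch that follows the wording of the parenthetical remark in Theorem 5. The function name `log_star` and the use of base-2 logarithms are assumptions of this example; many texts stop once the value is at most 1 rather than smaller than 1, which changes the result by at most one.

```python
import math

def log_star(n):
    """Number of times log2 must be applied to n to get a value below 1."""
    count = 0
    x = float(n)
    while x >= 1.0:
        x = math.log2(x)
        count += 1
    return count
```

With this convention, 65536 is reduced as 65536, 16, 4, 2, 1, 0, so five applications are needed.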


While the algorithms underlying the results 
of Theorems 1 and 2 for solving covering 
and packing LPs are deterministic or straightforward 
to derandomize, all known efficient 
algorithms to solve minimum dominating set 
and more complicated integer covering and 
packing problems are randomized. Whether 
there are good deterministic local algorithms for 
dominating set and related problems is a long- 
standing open question. In [3], it is shown that 
if the network is a BIG, efficient deterministic 
distributed algorithms exist: 


Theorem 6 On a BIG it is possible to find constant 
approximations for minimum dominating 
set, minimum vertex cover, maximum matching, 
as well as for LPs of the forms (P) and (D) 
deterministically in $O(\log \Delta \cdot \log^* n)$ rounds. 


In [4], it is shown that on polynomially bounded 
BIGs, one can even go one step further and 
efficiently find an arbitrarily good approximation 
by a distributed algorithm: 


Theorem 7 On a polynomially bounded BIG, 
there is a local approximation scheme which 
computes a $(1 + \varepsilon)$-approximation for minimum 
dominating set in time $O(\log \Delta \log^*(n)/\varepsilon + 
1/\varepsilon^{O(1)})$. If the network graph is a UBG with 
constant doubling dimension and nodes know 
the distances to their neighbors, a $(1 + \varepsilon)$-approximation 
can be computed in $O(\log^*(n)/\varepsilon + 
1/\varepsilon^{O(1)})$ rounds. 


Applications 


The most important application environments for 
local algorithms are large-scale decentralized 
systems such as wireless ad hoc and sensor 
networks or peer-to-peer networks. On such 
networks, only local algorithms lead to scalable 
systems. Local algorithms are particularly 
well-suited if the network is dynamic and 
computations have to be repeated frequently. 

A particular application of the minimum dom- 
inating set problem is the task of clustering the 
nodes of wireless ad hoc or sensor networks. As- 
signing each node to an adjacent node in a dom- 
inating set induces a simple clustering of the 
nodes. If the nodes of the dominating set (i.e., the 
cluster centers) are connected with each other by 
using additional nodes, the resulting structure can 
be used as a backbone for routing. 
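
The clustering step described above can be sketched as follows. The function name `cluster_by_dominating_set` and the tie-breaking rule (a node with several adjacent dominators joins the smallest one) are assumptions of this example; any adjacent dominator would do.

```python
def cluster_by_dominating_set(adj, dominators):
    """Map every node to a cluster center from the dominating set.

    adj maps each node to its neighbors; dominators is assumed to be a
    dominating set. A dominator is the center of its own cluster.
    """
    dominators = set(dominators)
    cluster_of = {}
    for v in adj:
        if v in dominators:
            cluster_of[v] = v
            continue
        candidates = sorted(u for u in adj[v] if u in dominators)
        if not candidates:
            raise ValueError(f"node {v!r} is not dominated")
        cluster_of[v] = candidates[0]
    return cluster_of
```

On the path 0 - 1 - 2 - 3 - 4 with dominating set {1, 3}, nodes 0, 1, and 2 form the cluster of center 1, and nodes 3 and 4 form the cluster of center 3.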


Open Problems 


There are a number of open problems related 
to the distributed approximation of covering and 
packing problems in particular and to distributed 
approximation algorithms in general. The most 
obvious open problem certainly is to close the 
gaps between the upper bounds of Theorems 1, 
2, and 3 and the lower bounds of Theorem 4. 
It would also be interesting to see how well 
other optimization problems can be approximated 
in a distributed manner. In particular, the dis- 
tributed complexity of more general classes of 
linear programs remains completely open. A very 
intriguing unsolved problem is to determine to 
what extent randomization is needed to obtain 
time-efficient distributed algorithms. Currently, 
the best deterministic algorithms for finding a dominating 
set of reasonable size and for many other 
problems take time $2^{O(\sqrt{\log n})}$ whereas the time 
complexity of the best randomized algorithms 
usually is at most polylogarithmic in the number 
of nodes. 

Problem Definition 


In many ways, familiar distributed computing 
communication models such as the message 
passing model do not describe the harsh 
conditions faced in wireless ad hoc and sensor 
networks closely enough. Ad hoc and sensor net- 
works are multi-hop radio networks and hence, 
messages being transmitted may interfere with 
concurrent transmissions leading to collisions 
and packet losses. Furthermore, the fact that all 
nodes share the same wireless communication 
medium leads to an inherent broadcast nature of 
communication. A message sent by a node can 
be received by all nodes in its transmission range. 
These aspects of communication are modeled by 
the radio network model, e.g., [2]. 


Definition 1 (Radio Network Model) In the
radio network model, the wireless network is
modeled as a graph $G = (V, E)$. In every time
slot, a node $u \in V$ can either send or not send
a message. A node $v$ with $(u,v) \in E$ receives the
message if and only if exactly one of its neighbors
has sent a message in this time slot.
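
The reception rule of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original model description: the function name `receivers` and the adjacency-dictionary representation are assumptions, and the sketch applies the rule to every node because the definition does not say whether a sending node can listen.

```python
def receivers(adj, senders):
    """Return {v: u} for every node v that receives u's message in one slot.

    adj maps each node to the set of its neighbors (hypothetical encoding);
    senders is the set of nodes transmitting in this time slot.
    """
    received = {}
    for v, neighbors in adj.items():
        transmitting = [u for u in neighbors if u in senders]
        if len(transmitting) == 1:  # exactly one sending neighbor: no collision
            received[v] = transmitting[0]
    return received


# Path a - b - c: b hears a alone, but a and c together collide at b.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
one_sender = receivers(adj, {"a"})
two_senders = receivers(adj, {"a", "c"})
```

In the second call, node b has two sending neighbors and therefore receives nothing, which is the collision behavior the model is meant to capture.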


While communication primitives such as 
broadcast, wake-up, or gossiping, have been 
widely studied in the literature on radio networks
(e.g., [1, 2, 8]), less is known about the 
computation of local network coordination 
structures such as clusterings or colorings. The 
most basic notion of a clustering in wireless 
networks boils down to the graph-theoretic notion 
of a dominating set. 


Definition 2 (Minimum Dominating Set
(MDS)) Let $G = (V, E)$ be a graph. A domi-
nating set is a subset $S \subseteq V$ such that every node
is either in $S$ or has at least one neighbor in $S$.
The minimum dominating set problem asks for
a dominating set $S$ of minimum cardinality.


A dominating set $S \subseteq V$ in which no two
neighboring nodes are in $S$ is a maximal inde-
pendent set (MIS). The distributed complexity of
computing an MIS in the message passing model
has been of fundamental interest to the distributed
computing community for over two decades
(e.g., [11-13]), but much less is known about the
problem’s complexity in radio network models.


Definition 3 (Maximal Independent Set (MIS))
Let $G = (V, E)$ be a graph. An independent set
is a subset of pairwise non-adjacent nodes in $G$.
A maximal independent set in $G$ is an indepen-
dent set $S \subseteq V$ such that for every node $u \notin S$,
there is a node $v \in \Gamma(u)$ in $S$, where $\Gamma(u)$
denotes the set of neighbors of $u$.
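
Definitions 2 and 3 translate directly into membership checks. The following is a small sketch under assumed conventions: the graph is an adjacency dictionary of sets, and the function names are hypothetical. It only verifies a candidate set; it is not one of the distributed algorithms discussed in this entry.

```python
def is_dominating_set(adj, s):
    """Every node is in s or has at least one neighbor in s."""
    return all(v in s or bool(adj[v] & s) for v in adj)


def is_independent_set(adj, s):
    """No two nodes of s are adjacent."""
    return all(not (adj[v] & s) for v in s)


def is_maximal_independent_set(adj, s):
    """An independent set is maximal exactly when it is also dominating."""
    return is_independent_set(adj, s) and is_dominating_set(adj, s)


# 4-cycle 0 - 1 - 2 - 3 - 0
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
```

On the 4-cycle, `{0, 2}` is an MIS, `{0, 1}` is dominating but not independent, and `{0}` is independent but not dominating.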


Another important primitive in wireless networks
is the vertex coloring problem: if different colors
are associated with different time slots in
a time-division multiple access (TDMA) scheme,
a correct coloring corresponds to a medium ac-
cess control (MAC) layer without direct interfer-
ence, that is, no two neighboring nodes send at
the same time.


Definition 4 (Minimum Vertex Coloring)
Let $G = (V, E)$ be a graph. A correct vertex
coloring for $G$ is an assignment of a color $c(v)$ to
each node $v \in V$ such that $c(u) \neq c(v)$ for any two
adjacent nodes $(u,v) \in E$. A minimum vertex
coloring is a correct coloring that minimizes the
number of used colors.
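
The link between colorings and TDMA slots can be illustrated with a short sketch. The round-robin frame (color c transmits in slots congruent to c modulo the number of colors) and the function names are assumptions chosen for illustration; the text itself only states that colors are associated with time slots.

```python
def is_correct_coloring(adj, color):
    """True if no edge joins two nodes of the same color."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


def senders_in_slot(color, num_colors, slot):
    """Nodes allowed to transmit in a slot of a round-robin TDMA frame."""
    return {v for v, c in color.items() if c == slot % num_colors}


# 4-cycle 0 - 1 - 2 - 3 - 0 with a correct 2-coloring
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
color = {0: 0, 1: 1, 2: 0, 3: 1}
```

With a correct coloring, the senders of every slot form an independent set, so no two neighboring nodes transmit at the same time.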


In order to capture the especially harsh
characteristics of wireless multi-hop networks 
immediately after their deployment, the 
unstructured radio network model makes 
additional assumptions. In particular, a new 
notion of asynchronous wake-up is considered, 
because, in a wireless, multi-hop environment, 
it is realistic to assume that some nodes 
join the network (e.g., become deployed, 
or switched on) later than others. Notice 
that this is different from the notion of 
asynchronous wake-up defined and studied
in [8] and subsequent work, in which nodes 
are assumed to be “woken up” by incoming 
messages. 


Definition 5 (Unstructured Radio Network
Model) In the unstructured radio network
model, the wireless network is modeled as a unit
disk graph (UDG) $G = (V, E)$. In every time
slot, a node $u \in V$ can either send or not send
a message. A node $v$ with $(u,v) \in E$ receives the
message if and only if exactly one of its neighbors
has sent a message in this time slot. Additionally,
the following assumptions are made:


• Asynchronous wake-up: New nodes can wake
up or join asynchronously at any time. Before
waking up, nodes neither receive nor send
any messages.

• No global clock: Nodes only have access to
a local clock that starts increasing after wake-
up.

• No collision detection: Nodes cannot distin-
guish between the event of a collision and
no message being sent. Moreover, a sending
node does not know how many (if any at
all) neighbors have received its transmission
correctly.

• Minimal global knowledge: At the time of
their wake-up, nodes have no information
about their neighbors in the network and they
do not know whether some neighbors are already
awake, executing the algorithm. However,
nodes know an upper bound for the maximum
number of nodes $n = |V|$.


The measure that captures the efficiency of an 
algorithm defined in the unstructured radio net- 
work model is its time-complexity. Since every 
node can wake up at a different time, the time- 
complexity of an algorithm is defined as the 
maximum number of time-slots between a node’s 
wake-up and its final, irrevocable decision. 


Definition 6 (Time Complexity) The running
time $T_v$ of a node $v \in V$ is defined as the number
of time slots between $v$’s waking up and the
time $v$ makes an irrevocable final decision on the
outcome of its protocol (e.g., whether or not it
joins the dominating set in a clustering algorithm,
or which color to take in a coloring algorithm).
The time complexity $T(Q)$ of algorithm $Q$
is defined as the maximum running time over all
nodes in the network, i.e., $T(Q) := \max_{v \in V} T_v$.


Key Results 


Naturally, algorithms for such uninitialized, 
chaotic networks have a different flavor
compared to “traditional” algorithms that operate 
on a given network graph that is static and well- 
known to all nodes. Hence, the algorithmic 
difficulty of the following algorithms partly 
stems from the fact that since nodes wake 
up asynchronously and do not have access 
to a global clock, the different phases of the 
algorithm may be arbitrarily intertwined or 
shifted in time. Hence, while some nodes may 
already be in an advanced stage of the algorithm, 
there may be nodes that have either just woken 
up, or that are still in an early stage. It was proven
in [9] that even in single-hop networks ($G$ is the
complete graph), no efficient algorithms exist if
nodes have no knowledge of $n$.


Theorem 1 If nodes have no knowledge of $n$,
every (possibly randomized) algorithm requires
up to $\Omega(n / \log n)$ time slots before at least one
node can send a message in single-hop networks.


In single-hop networks, and if $n$ is globally
known, [8] presented a randomized algorithm
that selects a unique leader in time $O(n \log n)$,
with high probability. This result has subse-
quently been improved to $O(\log n)$ by Jurdziński
and Stachowiak [9]. The generalized wake-up
problem in multi-hop radio networks was first
studied in [4].

The complexity of local network structures 
such as clusterings or colorings in unstructured 
multi-hop radio networks was first studied 
in [10]: A good approximation to the minimum 
dominating set problem can be computed in 
polylogarithmic time. 


Theorem 2 In the unstructured radio network
model, an expected $O(1)$-approximation to the
dominating set problem can be computed in ex-
pected time $O(\log^2 n)$. That is, every node de-
cides whether to join the dominating set within
$O(\log^2 n)$ time slots after its wake-up.


In a subsequent paper [18], it has been shown that
the running time of $O(\log^2 n)$ is sufficient even
for computing the more sophisticated MIS struc-
ture. This result is asymptotically optimal be-
cause, improving on a previously known bound
of $\Omega(\log^2 n / \log\log n)$ [9], a corresponding
lower bound of $\Omega(\log^2 n)$ has been proven in [6].


Theorem 3 With high probability, a maximal in-
dependent set (MIS) can be computed in expected
time $O(\log^2 n)$ in the unstructured radio network
model. This is asymptotically optimal.


It is interesting to compare this achievable upper
bound on the harsh unstructured radio network
model with the best known time lower bounds in
message passing models: $\Omega(\log^* n)$ in unit disk
graphs [12] and $\Omega(\sqrt{\log n / \log\log n})$ in general
graphs [11]. A time bound of $O(\log^2 n)$ was
also proven in [7] in a radio network model with-
out asynchronous wake-up and in which nodes
have a priori knowledge about their neighbor-
hood.

Finally, it is also possible to efficiently color
the nodes of a network as shown in [17], and sub-
sequently improved and generalized in Chap. 12
of [15].


Theorem 4 In the unstructured radio network
model, a correct coloring with at most $O(\Delta)$
colors can be computed in time $O(\Delta \log n)$ with
high probability.


Similar bounds for a model with collision detec- 
tion mechanisms are proven in [3]. 


Applications 


In wireless ad hoc and sensor networks, local 
network coordination structures find important 
applications. In particular, clusterings and color- 
ings can help in facilitating the communication 
between adjacent nodes (MAC layer protocols) 
and between distant nodes (routing protocols), or 
to improve the energy efficiency of the network. 

The following mentions two specific exam-
ples of applications: Based on the MIS algo-
rithm of Theorem 3, a protocol is presented
in [5], which efficiently constructs a spanner,
i.e., a more sophisticated initial infrastructure that
helps in structuring wireless multi-hop networks.
In [16], the same MIS algorithm is used as an in-
gredient for a protocol that minimizes the energy
consumption of wireless sensor nodes during the
deployment phase, a problem that was first
studied in [14].


Problem Definition


Consider some massive dataset represented as a
function $f : D \to R$, where $D$ is discrete and
$R$ is an arbitrary range. This dataset could be as
varied as an array of numbers, a graph, a matrix, 
or a high-dimensional function. Datasets are of- 
ten useful because they possess some property of 
interest. An array might be sorted, a graph might 
be connected, a matrix might be orthogonal, or a 
function might be convex. These properties are 
critical to the use of the dataset. Yet, due to 
unavoidable errors (say, in storing the dataset), 
these properties might not hold any longer. For 
example, a sorted array could become unsorted 
because of roundoff errors. 


Local Reconstruction 


Can we find a function $g : D \to R$ that
satisfies the property and is “sufficiently close” to
$f$? Let us formalize this question. Let $\mathcal{P}$ denote
a property, which we define as a subset of func-
tions. We define a distance between functions,
$\mathrm{dist}(f, g) = |\{x \mid f(x) \neq g(x)\}| / |D|$. In words,
this is the fraction of domain points where $f$
and $g$ differ (the relative Hamming distance).
This definition naturally extends to properties:
$\mathrm{dist}(f, \mathcal{P}) = \min_{h \in \mathcal{P}} \mathrm{dist}(f, h)$. This is the min-
imum amount of change $f$ must undergo to have
property $\mathcal{P}$. Our aim is to construct $g \in \mathcal{P}$
such that $\mathrm{dist}(f, g)$ is “small.” The latter can
be quantified by comparing with the baseline,
$\mathrm{dist}(f, \mathcal{P})$.
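
As a concrete reading of the distance between two functions, the following sketch computes the relative Hamming distance over an explicitly enumerated domain. Representing the functions as Python callables and the name `dist` as a plain helper are assumptions made for illustration; for a massive dataset one would estimate this quantity by sampling rather than enumerate the domain.

```python
def dist(f, g, domain):
    """Fraction of domain points on which f and g differ."""
    points = list(domain)
    differing = sum(1 for x in points if f(x) != g(x))
    return differing / len(points)


# An unsorted array f and a sorted array g that differ in one of four positions.
f_values = [1, 2, 5, 4]
g_values = [1, 2, 4, 4]
d_fg = dist(f_values.__getitem__, g_values.__getitem__, range(4))
```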

The offline reconstruction problem involves 
explicitly constructing g from f. But this is 
prohibitively expensive if f is a large dataset. 
Instead, we wish to locally construct g, meaning 
we want to quickly compute $g(x)$ (for $x \in D$)
without constructing g completely. 


Local filters [13]: A local filter for property $\mathcal{P}$
is an algorithm $A$ satisfying the following con-
ditions. The filter has oracle access to function
$f : D \to R$, meaning that it can access $f(x)$ for
any $x \in D$. Each such access is called a lookup.
The filter takes as input an auxiliary random seed
$\rho$ and $x \in D$. For fixed $f, \rho$, $A$ runs determinis-
tically on input $x$ to produce an output $A_{f,\rho}(x)$.
Note that $A_{f,\rho}$ specifies a function on domain $D$,
which will be the desired function $g$.


1. For each $f$ and $\rho$, $A_{f,\rho}$ always satisfies $\mathcal{P}$.

2. For each $f$, with high probability over $\rho$, the
function $A_{f,\rho}$ is suitably close to $f$.

3. For each $x$, $A_{f,\rho}(x)$ can be computed with
very few lookups.

4. The size of the random seed $\rho$ should be much
smaller than $|D|$.


Let $g$ be $A_{f,\rho}$. Condition 2 has been
formalized in at least two different ways. The
original definition demanded that
$\mathrm{dist}(f, g) \le c \cdot \mathrm{dist}(f, \mathcal{P})$, where $c$ is a fixed constant [13].
Other results only enforce Condition 2 when
$f \in \mathcal{P}$ [3,9]. One could imagine desiring
$|\mathrm{dist}(f, g) - \mathrm{dist}(f, \mathcal{P})| \le \delta$, for input parameter
$\delta$. Conditions 3 and 4 typically demand that
the lookup complexity and random seed lengths
are $o(|D|)$ (sometimes we desire them to be
$\mathrm{poly}(\log |D|)$ or even constant).


Connections with Other Models 

The notion of reconstruction through filters was 
first proposed by Ailon et al. in [1], though 
the requirements were less stringent. There is a 
sequence $x_1, x_2, \ldots$ of domain points generated
online. Given $x_i$, the filter outputs value $g(x_i)$.
The filter is allowed to store previous outputs 
to ensure consistency. Saks and Seshadhri [13] 
defined local filters to address two concerns with 
this model. First, the storage of all previous 
queries and answers is a massive space overhead. 
Second, different runs of the filter construct dif- 
ferent g’s (because the filter is randomized). So 
we cannot instantiate multiple copies of the filter 
to handle queries independently and consistently. 

Independent of this line of work, Brakerski 
defined local restorers [6], which are basically 
equivalent to filters with an appropriate setting 
of Conditions 1 and 2. A major generalization 
of local filters, called local computation algo- 
rithms, was subsequently proposed by Rubinfeld 
et al. [12]. This model considers computation on 
a large input where the output itself is large (e.g., 
one may want a maximal independent set of a 
massive graph). The aim is to locally compute the 
output, by an algorithm akin to a filter. 

Depending on how Condition 2 is instantiated,
filters can easily be used for tolerant testing and
distance approximation [11]. If the filter ensures
that $\mathrm{dist}(f, g)$ is comparable to $\mathrm{dist}(f, \mathcal{P})$, then
it suffices to estimate $\mathrm{dist}(f, g)$ for distance ap-
proximation.

A special case of local reconstruction that has 
received extensive attention is decoding of error 
correcting codes. Here, f is some string, and 
P is the set of all valid code words. In local 
decoding, there is either one correct output or a 
sparse list of possible correct outputs. For general 
properties, there may be many (possibly infinitely 
many) ways to construct a valid reconstruction 
g. This creates challenges for designing filters. 
Once the random seed is fixed, all query answers 
provided by the filter must be consistent with a
single function having the property. 


Key Results 


Over the past decade, there have been numerous 
results on local reconstruction, over a variety of 
domains. 


Monotonicity 

The most studied property for local reconstruc-
tion is monotonicity [1,3,5,13]. Consider
$f : [n]^d \to R$, where $d, n$ are positive integers and
$[n] = \{1, 2, \ldots, n\}$. The domain is equipped with
the natural coordinate-wise partial order. Namely,
$x \prec y$ if $x \neq y$ and all coordinates of $x$ are
less than or equal to those of $y$. A function $f$ is
monotone if for all $x \prec y$, $f(x) \le f(y)$. When
$d = 1$, monotone functions are exactly sorted
arrays.
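
For $d = 1$ the baseline distance to monotonicity can be computed exactly offline: the fewest entries that must change to sort an array is its length minus the length of a longest non-decreasing subsequence. The sketch below uses that standard fact; it reads the whole array, so it is an offline baseline and not a local filter, and the function name is hypothetical.

```python
from bisect import bisect_right


def dist_to_sorted(values):
    """Relative Hamming distance from an array to the nearest sorted array."""
    # tails[k] holds the smallest possible last element of a non-decreasing
    # subsequence of length k + 1 seen so far (patience sorting).
    tails = []
    for v in values:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return (len(values) - len(tails)) / len(values)


one_outlier = dist_to_sorted([1, 5, 2, 3, 4])
already_sorted = dist_to_sorted([1, 2, 2, 3])
```

In the first example only the entry 5 has to change, so the distance is one fifth.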

Most initial work on local filters focused ex-
clusively on monotonicity. There exists a filter
of running time $(\log n)^{O(d)}$ with
$\mathrm{dist}(f, g) \le 2^{O(d)} \, \mathrm{dist}(f, \mathcal{P})$ [13]. There are nearly matching
lower bounds that are extremely strong, holding even for
relaxed versions of Condition 2 [3].


Lipschitz Continuity 

Let $n, d$ be positive integers and $c$ be a positive
real number. A function $f : [n]^d \to R$ is $c$-
Lipschitz if for all $x, y$, $|f(x) - f(y)| \le c \, \|x - y\|_1$.
This is a fundamental property of functions and 
appears in functional analysis, statistics, and op- 
timization. Recently, Lipschitz continuity was 
studied from a property testing perspective by 
Jha and Raskhodnikova [9]. Most relevant to this 
essay, they gave an application of local Lipschitz 
filters for differential privacy. The guarantees on 
their filter are analogous to monotonicity (with
a weaker form of Condition 2). Awasthi, Jha,
Molinaro, and Raskhodnikova [3] gave matching 
lower bounds for these reconstruction problems. 


Dense Graph Properties 

Dense graphs are commonly studied in property 
testing, where the input is given as a binary adja- 
cency matrix. Brakerski’s work on local restorers 
provides filters for bipartiteness and existence
of large cliques. Large classes of dense graphs 
are known to be tolerant testable. These results 
have been extended to local filters for hypergraph 
properties by Austin and Tao [2]. This work was 
developed independently of all the previous work 
on filters, and their algorithms are called “repair” 
algorithms. 


Connectivity Properties of Sparse Graphs 

In the sparse graph setting, the input G is an 
adjacency list of a bounded-degree graph. Filters 
have been given for several properties regarding 
connectivity. Campagna, Guo, and Rubinfeld [7] 
provide reconstructors for k-connectivity and the 
property of having low diameter. Local recon- 
structors for the property of expansion were given 
by Kale and Seshadhri [10]. 


Convexity in 2, 3-Dimensions 

Chazelle and Seshadhri [8] studied reconstruction 
in the geometric setting. They focus on convex 
polygon and 3D convex polytope reconstruction. 
These results were in the online filter setting 
of [1], though their 3D result is a local filter. 


Open Problems 


For any property tester, one can ask if the as- 
sociated property has a local filter. Given the 
breadth of this area, we cannot hope to give a 
good summary of open problems. Nonetheless, 
we make a few suggestions. 


The Curse of Dimensionality for Monotonicity and Lipschitz

Much work has gone into understanding local 
filters for monotonicity, but it is not clear how to 
remove the exponential dependence on d. Can 
we find a reasonable setting for filters where a 
poly(d,logn) lookup complexity is possible? 
One possibility is requiring only “additive error”
in Condition 2. For some parameter $\delta > 0$,
we only want $|\mathrm{dist}(f, g) - \mathrm{dist}(f, \mathcal{P})| \le \delta$.
Is there a filter with lookup complexity
$\mathrm{poly}(d, \log n, 1/\delta)$? This definition would avoid
previous lower bounds [3].


Filters for Convexity

Filters for convexity could have a great impact 
on optimization. A local filter would implicitly 
represent a close enough convex function to an 
input non-convex function, which would be ex- 
tremely useful for (say) minimization. For this 
application, it would be essential to handle high- 
dimensional functions. Unfortunately, there are 
no known property testers for convexity in this 
setting, so designing local filters is a distant goal. 


Filters for Properties of Bounded-Degree 
Graphs 

The large class of minor-closed properties (such 
as planarity) is known to be testable for bounded- 
degree graphs [4]. Can we get local filters for 
these properties? This appears to be a challenging 
question, since even approximating the distance 
to planarity is a difficult problem. Nonetheless, 
the right relaxations of filter conditions could lead 
to positive results for filters. 


Problem Definition

Clustering is a form of unsupervised learning,
where the goal is to “learn” useful patterns in
a data set $D$ of size $n$. It can also be thought of
as a data compression scheme where a large data
set is represented using a smaller collection of
“representatives”. Such a scheme is characterized
by specifying the following:


1. A distance metric $d$ between items in the data
set. This metric should satisfy the triangle
inequality: $d(i, j) \le d(j, k) + d(k, i)$ for
any three items $i, j, k \in D$. In addition,
$d(i, j) = d(j, i)$ for all $i, j \in D$ and
$d(i, i) = 0$. Intuitively, if the distance between
two items is smaller, they are more similar.
The items are usually points in some high
dimensional Euclidean space $\mathbb{R}^d$. The
commonly used distance metrics include the
Euclidean and Hamming metrics, and the
cosine metric measuring the angle between
the vectors representing the items.

2. The output of the clustering process is
a partitioning of the data. This chapter
deals with center-based clustering. Here,
the output is a smaller set $C \subset \mathbb{R}^d$ of
centers which best represents the input data
set $D \subset \mathbb{R}^d$. It is typically the case that
$|C| \ll |D|$. Each item $j \in D$ is mapped to or
approximated by the closest center $i \in C$,
implying $d(i, j) \le d(i', j)$ for all $i' \in C$.
Let $\sigma : D \to C$ denote this mapping. This is
intuitive since closer-by (similar) items will
be mapped to the same center.

3. A measure of the quality of the clustering, 
which depends on the desired output. There 
are several commonly used measures for the 
quality of clustering. In each of the cluster- 
ing measures described below, the goal is to 
choose C such that |C| = k and the objective 
function f(C) is minimized. 


k-center: $f(C) = \max_{j \in D} d(j, \sigma(j))$.
k-median: $f(C) = \sum_{j \in D} d(j, \sigma(j))$.
k-means: $f(C) = \sum_{j \in D} d(j, \sigma(j))^2$.


All the objectives described above are NP-hard
to optimize in general metric spaces $d$,
leading to the study of heuristic and approxi-
mation algorithms. In the rest of this chapter,
the focus is on the k-median objective. The ap-
proximation algorithms for k-median clustering
are designed for $d$ being a general, possibly non-
Euclidean metric space. In addition, a collec-
tion $F$ of possible center locations is given as
input, and the set of centers $C$ is restricted to
$C \subseteq F$. From the perspective of approximation,
the restriction of the centers to a finite set $F$
is not too restrictive: for instance, the optimal
solution which is restricted to $F = D$ has ob-
jective value at most a factor 2 of the optimal
solution which is allowed arbitrary $F$. Denote
$|D| = n$ and $|F| = m$. The running times of the
heuristics designed will be polynomial in $m$, $n$,
and $1/\varepsilon$ for a parameter $\varepsilon > 0$. The metric space $d$ is now
defined over $D \cup F$.

A related problem to k-medians is its
Lagrangean relaxation, called FACILITY LOCA-
TION. In this problem, there is again a collection
$F$ of possible center locations. Each location
$i \in F$ has a location cost $r_i$. The goal is to choose
a collection $C \subseteq F$ of centers and construct the
mapping $\sigma : D \to C$ from the items to the centers
such that the following function is minimized:


$f(C) = \sum_{j \in D} d(j, \sigma(j)) + \sum_{i \in C} r_i$.


The facility location problem effectively gets
rid of the hard bound $k$ on the number of cen-
ters in k-medians, and replaces it with the cen-
ter cost term $\sum_{i \in C} r_i$ in the objective function,
thereby making it a Lagrangean relaxation of the
k-median problem. Note that the costs of centers
can now be non-uniform.

The approximation results for both the k-
median and facility location problems carry
over as is to the weighted case: Each item
$j \in D$ is allowed to have a non-negative
weight $w_j$. In the objective function $f(C)$,
the term $\sum_{j \in D} d(j, \sigma(j))$ is replaced with
$\sum_{j \in D} w_j \, d(j, \sigma(j))$. The weighted case is
especially relevant to the FACILITY LOCATION
problem where the item weights signify user
demands for a resource, and the centers denote
locations of the resource. In the remaining
discussion, “items” and “users” are used inter-
changeably to denote members of the set $D$.


Key Results


The method of choice for solving both the k-
median and FACILITY LOCATION problems is
the class of local search heuristics, which run
in “local improvement” steps. At each step $t$,
the heuristic maintains a set $C_t$ of centers. For
the k-median problem, this collection satisfies
$|C_t| = k$. A local improvement step first gener-
ates a collection of new solutions $\mathcal{E}_{t+1}$ from $C_t$.
This is done such that $|\mathcal{E}_{t+1}|$ is polynomial in the
input size. For the k-median problem, in addition,
each $C \in \mathcal{E}_{t+1}$ satisfies $|C| = k$. The improve-
ment step sets $C_{t+1} = \arg\min_{C \in \mathcal{E}_{t+1}} f(C)$. For
a pre-specified parameter $\varepsilon > 0$, the improve-
ment iterations stop at the first step $T$ where
$f(C_T) \ge (1 - \varepsilon) f(C_{T-1})$.

The key design issue is the specification of the
start set $C_0$, and the construction of $\mathcal{E}_{t+1}$ from $C_t$.
The key analysis issues are bounding the number
of steps $T$ till termination, and the quality of the
final solution $f(C_T)$ against the optimal solution
$f(C^*)$. The ratio $f(C_T)/f(C^*)$ is termed the
“locality gap” of the heuristic.

Since each improvement step reduces the
value of the solution by at least a factor of
$(1 - \varepsilon)$, the running time in terms of number
of improvement steps is given by the following
expression (here $D$ is the ratio of the largest
to smallest distance in the metric space over
$D \cup F$).


$T \le \log_{1/(1-\varepsilon)} \left( \frac{f(C_0)}{f(C_T)} \right) \le \frac{\log\left( f(C_0)/f(C_T) \right)}{\varepsilon} \le \frac{\log(nD)}{\varepsilon}$,


which is polynomial in the input size. Each im-
provement step needs computation of $f(C)$ for
$C \in \mathcal{E}_t$. This is polynomial in the input size since
$|\mathcal{E}_t|$ is assumed to be polynomial.
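
The bound on the number of improvement steps follows from a short calculation. The sketch below assumes that $f(C_{T-1}) > 0$, that every step before the stopping step $T$ achieved the $(1 - \varepsilon)$ improvement, and that logarithms are natural; these conventions are assumptions made here for illustration.

```latex
% Every step t < T passed the improvement test:
f(C_t) < (1-\varepsilon)\, f(C_{t-1}) \quad (1 \le t \le T-1)
\;\Longrightarrow\;
f(C_{T-1}) < (1-\varepsilon)^{T-1} f(C_0).
% Take logarithms and use \ln\frac{1}{1-\varepsilon} \ge \varepsilon for 0 < \varepsilon < 1:
T - 1 \;<\; \frac{\ln\bigl(f(C_0)/f(C_{T-1})\bigr)}{\ln\bigl(1/(1-\varepsilon)\bigr)}
\;\le\; \frac{1}{\varepsilon}\,\ln\frac{f(C_0)}{f(C_{T-1})}.
```

For k-median, if every nonzero distance lies between a smallest value $d_{\min}$ and a largest value $d_{\max}$, then any nonzero cost lies between $d_{\min}$ and $n \, d_{\max}$, so the ratio inside the logarithm is at most $n \, d_{\max}/d_{\min}$, which is the quantity $nD$ used in the text.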


k-Medians 

The first local search heuristic with provable
performance guarantees is presented in the work
of Arya et al. [1]. This is the natural p-swap
heuristic: Given the current center set $C_t$ of size
$k$, the set $\mathcal{E}_{t+1}$ is defined by:


$\mathcal{E}_{t+1} = \{ (C_t \setminus A) \cup B, \text{ where } A \subseteq C_t,\ B \subseteq F \setminus C_t,\ |A| = |B| \le p \}$.


The above simply means swap at most $p$ centers
from $C_t$ with the same number of centers from
$F \setminus C_t$. Recall that $|D| = n$ and $|F| = m$.
Clearly, $|\mathcal{E}_{t+1}| \le (k(m-k))^p \le (km)^p$. The
start set $C_0$ is chosen arbitrarily. The value $p$
is a parameter which affects the running time
and the approximation ratio. It is chosen to be
a constant, so that $|\mathcal{E}_t|$ is polynomial in $m$.
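
The swap neighborhood and the stopping rule can be sketched for the simplest case $p = 1$. This is a minimal illustration under stated assumptions, not the implementation analyzed by Arya et al.: the function names, the choice of the first $k$ facilities as the arbitrary start set, and the brute-force cost evaluation are all hypothetical.

```python
def kmedian_cost(items, centers, d):
    """k-median objective: each item pays the distance to its closest center."""
    return sum(min(d(i, j) for i in centers) for j in items)


def single_swap_local_search(items, facilities, k, d, eps=0.01):
    """Local search with single swaps (p = 1); facilities are assumed distinct."""
    current = set(facilities[:k])            # arbitrary start set C_0
    cost = kmedian_cost(items, current, d)
    while True:
        best, best_cost = current, cost
        for out in current:                  # center to close
            for new in facilities:           # center to open
                if new in current:
                    continue
                candidate = (current - {out}) | {new}
                c = kmedian_cost(items, candidate, d)
                if c < best_cost:
                    best, best_cost = candidate, c
        # Stop at the first step whose gain is below a (1 - eps) factor.
        if best_cost >= (1 - eps) * cost:
            return best, best_cost
        current, cost = best, best_cost


points = [0, 1, 2, 10, 11, 12]
centers, cost = single_swap_local_search(
    points, points, 2, lambda a, b: abs(a - b))
```

On this toy line metric, one swap moves a center from the left group to the middle of the right group, after which no swap improves the cost further.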


Theorem 1 ([1]) The p-swap heuristic achieves
locality gap $(3 + 2/p) + \varepsilon$ in running time
$O(nk (\log(nD))/\varepsilon \cdot (mk)^p)$. Furthermore, for
every $p$ there is a k-median instance where
the p-swap heuristic has locality gap exactly
$(3 + 2/p)$.


Setting $p = 1/\varepsilon$, the above heuristic achieves
a $3 + \varepsilon$ approximation in running time
$O(n (mk)^{O(1/\varepsilon)})$.


Facility Location 

For this problem, since there is no longer a con- 
straint on the number of centers, the local im- 
provement step needs to be suitably modified. 
There are two local search heuristics both of
which yield a locality gap of $3 + \varepsilon$ in polynomial
time.

The “add/delete/swap” heuristic proposed by
Kuehn and Hamburger [10] either adds a center
to $C_t$, drops a center from $C_t$, or swaps a center
in $C_t$ with one in $F \setminus C_t$. The start set $C_0$ is again
arbitrary.


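$\mathcal{E}_{t+1} = \{ (C_t \setminus A) \cup B, \text{ where } A \subseteq C_t,\ B \subseteq F \setminus C_t, \text{ and }
|A| = 0, |B| = 1 \text{ or } |A| = 1, |B| = 0 \text{ or }
|A| = 1, |B| = 1 \}$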


Clearly, $|\mathcal{E}_{t+1}| = O(m^2)$, making the running
time polynomial in the input size and $1/\varepsilon$.
Korupolu, Plaxton, and Rajaraman [9] show that
this heuristic achieves a locality gap of at most
$5 + \varepsilon$. Arya et al. [1] strengthen this analysis to
show that this heuristic achieves a locality gap
of $3 + \varepsilon$, and that this bound is tight in the sense
that there are instances where the locality gap is
exactly 3.
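
The add/delete/swap neighborhood can be sketched as follows. This is an illustrative toy under stated assumptions: the function names, the start set consisting of the first facility, the infinite cost assigned to an empty center set, and the brute-force evaluation are all hypothetical choices, not details taken from the cited papers.

```python
def facility_cost(items, centers, d, r):
    """Service cost plus opening cost; an empty center set is infeasible."""
    if not centers:
        return float("inf")
    service = sum(min(d(i, j) for i in centers) for j in items)
    return service + sum(r[i] for i in centers)


def neighborhood(current, facilities):
    """All solutions reachable by one add, one delete, or one swap."""
    closed = [i for i in facilities if i not in current]
    for i in closed:
        yield current | {i}                  # add
    for s in current:
        yield current - {s}                  # delete
        for i in closed:
            yield (current - {s}) | {i}      # swap


def add_delete_swap(items, facilities, d, r, eps=0.01):
    current = frozenset(facilities[:1])      # arbitrary start set C_0
    cost = facility_cost(items, current, d, r)
    while True:
        best = min(neighborhood(current, facilities),
                   key=lambda c: facility_cost(items, c, d, r))
        best_cost = facility_cost(items, best, d, r)
        if best_cost >= (1 - eps) * cost:    # gain too small: stop
            return (best, best_cost) if best_cost < cost else (current, cost)
        current, cost = best, best_cost


users = [0, 1, 2, 10, 11, 12]
opening = {1: 3, 6: 3, 11: 3}
chosen, total = add_delete_swap(
    users, [1, 6, 11], lambda a, b: abs(a - b), opening)
```

Starting from the single center at 1, one add step opens 11, and no further add, delete, or swap lowers the total cost of 10.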

The “add one/delete many” heuristic proposed 
by Charikar and Guha [2] is slightly more in- 
volved. This heuristic adds one facility and drops 
all facilities which become irrelevant in the new 
solution. 


$\mathcal{E}_{t+1} = \{ (C_t \cup \{i\}) \setminus I(i),
\text{ where } i \in F \setminus C_t,\ I(i) \subseteq C_t \}$


The set $I(i)$ is computed as follows: Let $W$ denote
the set of items closer to $i$ than to their assigned
centers in $C_t$. These items are ignored in the
computation of $I(i)$. For every center $s \in C_t$,
let $U_s$ denote all items which are assigned to $s$. If
$r_s + \sum_{j \in U_s \setminus W} d(j, s) > \sum_{j \in U_s \setminus W} d(j, i)$,
then it is cheaper to remove location $s$ and
reassign the items in $U_s \setminus W$ to $i$. In this case, $s$
is placed in $I(i)$. Let $N$ denote $m + n$. Computing
$I(i)$ is therefore an $O(N)$ time greedy procedure,
making the overall running time polynomial.
Charikar and Guha [2] show the following
theorem:


Theorem 2 ([2]) The local search heuristic
which attempts to add a random center
$i \notin C_t$ and remove set $I(i)$, computes a $3 + \varepsilon$
approximation with high probability within
$T = O(N \log N (\log N + 1/\varepsilon))$ improvement
steps, each with running time $O(N)$.
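
The greedy computation of the set of centers to drop when a new center is added, as described before Theorem 2, can be sketched directly from its definition. The function name, the dictionary `r` of opening costs, and the straightforward (not linear-time) implementation are assumptions made for illustration.

```python
def removable_centers(items, centers, new, d, r):
    """Centers that become cheaper to close once `new` is opened."""
    # Current assignment: each item goes to its closest open center.
    sigma = {j: min(centers, key=lambda s: d(s, j)) for j in items}
    # W: items that move to the new center anyway because it is closer.
    moved = {j for j in items if d(new, j) < d(sigma[j], j)}
    drop = set()
    for s in centers:
        rest = [j for j in items if sigma[j] == s and j not in moved]
        keep_cost = r[s] + sum(d(s, j) for j in rest)
        reassign_cost = sum(d(new, j) for j in rest)
        if keep_cost > reassign_cost:        # cheaper to close s
            drop.add(s)
    return drop


def line(a, b):
    return abs(a - b)


expensive = removable_centers([0, 2, 4], [0, 4], 2, line, {0: 5, 4: 5, 2: 1})
mixed = removable_centers([0, 2, 4], [0, 4], 2, line, {0: 1, 4: 5, 2: 1})
```

With expensive centers at 0 and 4, opening the center at 2 makes both of them worth closing; when the center at 0 is cheap, only the center at 4 is dropped.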


Capacitated Variants 

Local search heuristics are also known for ca-
pacitated variants of the k-median and facility
location problems. In this variant, each possible
location $i \in F$ can serve at most $u_i$
users. In the soft capacitated variant of facility
location, some $r_i \ge 0$ copies can be opened at
$i \in F$ so that the facility cost is $f_i r_i$ and the
number of users served is at most $r_i u_i$. The
optimization goal is now to decide the value of
$r_i$ for each $i \in F$ so that the assignment of users
to the centers satisfies the capacity constraints at
each center, and the cost of opening the centers
and assigning the users is minimized. For this
variant, Arya et al. [1] show a local search heuris-
tic with a locality gap of $4 + \varepsilon$.

In the version of facility location with hard
capacities, location $i \in F$ has a hard bound $u_i$ on
the number of users that can be assigned here.
If all the capacities $u_i$ are equal (uniform case),
Korupolu, Plaxton, and Rajaraman [9] present an
elegant local search heuristic based on solving
a transshipment problem which achieves an $8 + \varepsilon$
locality gap. The analysis is improved by Chu-
dak and Williamson [4] to show a locality gap of
$6 + \varepsilon$. The case of non-uniform capacities re-
quires significantly new ideas: Pal, Tardos, and
Wexler [14] present a network flow based local
search heuristic that achieves a locality gap of
$9 + \varepsilon$. This bound is improved to $8 + \varepsilon$ by Mah-
dian and Pal [12], who generalize several of the
local search techniques described above in order
to obtain a constant factor approximation for
the variant of facility location where the facility
costs are arbitrary non-decreasing functions of
the demands they serve.


Related Algorithmic Techniques 

Both the k-median and facility location prob- 
lems have a rich history of approximation re- 
sults. Since the study of uncapacitated facility 
location was initiated by Cornuejols, Nemhauser, 
and Wolsey [5], who presented a natural lin- 
ear programming (LP) relaxation for this prob- 
lem, several constant-factor approximations have 
been designed via several techniques, ranging 
from rounding of the LP solution [11, 15], local 
search [2, 9], the primal-dual schema [7], and 
dual fitting [6]. For the k-median problem, the 
first constant factor approximation [3] of 63 was 
obtained by rounding the natural LP relaxation 
via a generalization of the filtering technique 
in [11]. This result was subsequently improved 
to a 4 approximation by Lagrangean relaxation 
and the primal-dual schema [2, 7], and finally to 
a $(3 + \varepsilon)$ approximation via local search [1].


Applications 


The facility location problem has been widely 
studied in operations research [5, 10], and forms 
a fundamental primitive for several resource lo-
cation problems. The k-medians and k-means 
metrics are widely used in clustering, or un- 
supervised learning. For clustering applications, 
several heuristic improvements to the basic lo-
cal search framework have been proposed: k-
Medoids [8] selects a random input point and
replaces it with one of the existing centers if there
is an improvement; the CLARA [8] implemen-
tation of k-Medoids chooses the centers from
a random sample of the input points to speed up 
the computation; the CLARANS [13] heuristic 
draws a fresh random sample of feasible centers 
before each improvement step to further improve 
the efficiency. 
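
The random-swap heuristic described above can be
sketched as follows. This is a minimal illustration,
not the implementation of [8]: the function names,
the starting centers, and the `max_tries` stopping
rule are assumptions made for the example.

```python
import random

def cost(points, centers, dist):
    """k-median cost: each point pays the distance to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def swap_local_search(points, k, dist, seed=0, max_tries=1000):
    """Random single-swap local search (illustrative sketch).

    Starts from the first k points as centers (an arbitrary choice), then
    repeatedly draws a random input point and swaps it in for an existing
    center whenever this lowers the cost.
    """
    rng = random.Random(seed)
    centers = list(points[:k])
    best = cost(points, centers, dist)
    for _ in range(max_tries):
        candidate = rng.choice(points)
        if candidate in centers:
            continue
        for i in range(k):
            trial = centers[:i] + [candidate] + centers[i + 1:]
            trial_cost = cost(points, trial, dist)
            if trial_cost < best:
                centers, best = trial, trial_cost
                break
    return centers, best
```

For two well-separated groups on a line, such as
`[0, 1, 2, 10, 11, 12]` with `k = 2` and absolute
difference as the distance, the search ends with one
center in the middle of each group.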

Problem Definition

In the context of distributed network computing,
an important concern is the ability to design
local algorithms, that is, distributed algorithms
in which every node of the network can deliver
its result after having consulted only nodes in
its vicinity. (Each node is a computing entity
that can exchange messages with its neighbors in
the network along its communication links.) The
word “vicinity” has a rather vague interpretation
in general. Nevertheless, the objective is commonly
to design algorithms in which every node outputs
after having exchanged information with nodes at
constant distance from it (i.e., at distance
independent of the number of nodes n in the
network) or at distance at most polylogarithmic
in n, but certainly significantly smaller than n or
than the diameter of the network.

The tasks to be solved by distributed algorithms
acting in networks can be formalized as
follows. The network itself is modeled by an
undirected connected graph G with node set
V(G) and edge set E(G), without loops and
double edges. In the sequel, the word graph refers
only to this specific type of graph.
Nodes are labeled by a function
$\ell : V(G) \to \{0,1\}^*$ that assigns to every node v its
label $\ell(v)$. A pair $(G, \ell)$, where G is a graph and
$\ell$ is a labeling of G, is called a configuration,
and a collection $\mathcal{L}$ of configurations is called
a distributed language. A typical example of a
distributed language is:

$\mathcal{L}_{\text{properly colored}} = \{(G, \ell) : \ell(v) \neq \ell(v') \text{ for all } \{v, v'\} \in E(G)\}$.

Unless specified otherwise, we always assume
that the considered languages are decidable
in the sense of classical (sequential)
computability theory. To every distributed language
$\mathcal{L}$ can be associated a construction task, which
consists in computing the appropriate labels for a
given network. (Here, we restrict ourselves
to input-free construction tasks, but the content
of this chapter can be generalized to tasks with
inputs, in which case the labels are input-output
pairs, and, given the inputs, the nodes must
produce the appropriate outputs to fit in the
considered language.) Formally:

Problem 1 (Construction Task for $\mathcal{L}$)


INPUT: A graph G (in which nodes have no
labels);

OUTPUT: A label $\ell(v)$ at each node v, such that
$(G, \ell) \in \mathcal{L}$.


For instance, the construction task for
$\mathcal{L}_{\text{properly colored}}$ consists, for each node of a graph
G, in outputting a color so that no two adjacent
nodes output the same color. To every
distributed language $\mathcal{L}$ can also be associated a
decision task, which consists in having nodes
decide whether any given configuration $(G, \ell)$
is in $\mathcal{L}$ (in this case, every node v is given its
label $\ell(v)$ as input). This type of task finds
applications whenever it is desired to check the
correctness of a solution produced by another
algorithm or, say, by some black box that may
act incorrectly. The decision rule, motivated by
various considerations including termination
detection, is as follows: if $(G, \ell) \in \mathcal{L}$, then
all nodes must accept the configuration, while if
$(G, \ell) \notin \mathcal{L}$, then at least one node must reject
that configuration. In other words:


Problem 2 (Decision Task for $\mathcal{L}$)


INPUT: A configuration $(G, \ell)$ (i.e., each node
$v \in V(G)$ has a label $\ell(v)$);

OUTPUT: A boolean b(v) at each node v such
that:


$(G, \ell) \in \mathcal{L} \iff \bigwedge_{v \in V(G)} b(v) = \text{true}.$
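
As an illustration of this decision rule, the
following sketch decides the properly colored
language in one round. The adjacency-dictionary
representation and the function names are
assumptions made for the example.

```python
def decide_proper_coloring(adj, label):
    """Local decision for the properly colored language.

    adj maps each node to the list of its neighbors; label maps each node
    to its color. Node v accepts iff all its neighbors have a color
    different from its own. Returns the boolean b(v) of every node.
    """
    return {v: all(label[u] != label[v] for u in adj[v]) for v in adj}

def accepted(verdicts):
    """Global rule: the configuration is accepted iff every node accepts."""
    return all(verdicts.values())
```

On the path 0 - 1 - 2, the labeling (a, b, a) is
accepted by all nodes, while for (a, a, b) nodes 0
and 1 reject and node 2 still accepts, which is
enough to reject the configuration.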


For instance, a decision algorithm for
$\mathcal{L}_{\text{properly colored}}$ consists, for each node v of a
graph G with input some color $\ell(v)$, in accepting if
all its neighbors have colors distinct from $\ell(v)$
and rejecting otherwise. Finally, a third type of
task can be associated to distributed languages,
called verification tasks, which can also be seen
as a nondeterministic variant of the decision
tasks. In the context of verification, in addition
to its label $\ell(v)$, every node $v \in V(G)$ is also
given a certificate c(v). This provides G with a
global distributed certificate $c : V(G) \to \{0,1\}^*$
that is supposed to attest to the fact that the
labels are correct. If this is indeed the case, i.e.,
if $(G, \ell) \in \mathcal{L}$, then all nodes must accept the
instance (provided with the due certificate). Note
that a verification algorithm is allowed to reject
a configuration $(G, \ell) \in \mathcal{L}$ in case the certificate
is not appropriate for that configuration since,
for every configuration $(G, \ell) \in \mathcal{L}$, one just
asks for the existence of at least one appropriate
certificate. In addition, to prevent the nodes from
being fooled by some certificate on an illegal
instance, it is also required that if $(G, \ell) \notin \mathcal{L}$, then for
every certificate, at least one node must reject
that configuration. In other words:


Problem 3 (Verification Task for $\mathcal{L}$)


INPUT: A configuration $(G, \ell)$ and a distributed
certificate c;

OUTPUT: A boolean b(v, c) at each node v,
which may indeed depend on c, such that:


$(G, \ell) \in \mathcal{L} \iff \bigvee_{c' \in \{0,1\}^*} \bigwedge_{v \in V(G)} b(v, c') = \text{true}.$


For instance, cycle-freeness cannot be locally
decided, as even cycles and paths cannot be
locally distinguished. However, cycle-freeness can
be locally verified, using certificates on O(log n)
bits, as follows. The certificate of node v in
a cycle-free graph G is its distance in G to
some fixed node $v_0 \in V(G)$. The verification
algorithm essentially checks that every node v
with $c(v) > 0$ has a unique neighbor $v'$ with
$c(v') = c(v) - 1$ and all its other neighbors w
with $c(w) = c(v) + 1$, while a node v with
$c(v) = 0$ checks that all its neighbors w satisfy
$c(w) = 1$. If G has a cycle, then no certificates
can pass these tests. As in sequential
computability theory, the terminology “verification” comes
from the fact that a distributed certificate can be
viewed as a (distributed) proof that the current
configuration is in the language, and the role of
the algorithm is to verify this proof. The ability
to simultaneously construct a labeling $\ell$ for G as
well as a proof c certifying the correctness of $\ell$ is
a central notion in the design of distributed
self-stabilizing algorithms, in which variables can be
transiently corrupted.
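
The cycle-freeness verification described above
can be sketched as follows. The split into a
certificate-producing routine and a per-node check,
as well as all function names, are assumptions made
for the example; the graph is given as an adjacency
dictionary and is assumed to be connected.

```python
from collections import deque

def distance_certificates(adj, root):
    """Prover side: c(v) is the distance from v to a fixed root (BFS)."""
    c = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in c:
                c[u] = c[v] + 1
                queue.append(u)
    return c

def verify_at_node(v, adj, c):
    """Verifier side at node v: looks only at the certificates of neighbors."""
    nbr = [c[u] for u in adj[v]]
    if c[v] == 0:
        return all(x == 1 for x in nbr)
    parents = [x for x in nbr if x == c[v] - 1]
    others = [x for x in nbr if x != c[v] - 1]
    # exactly one neighbor is one step closer; all the others one step farther
    return len(parents) == 1 and all(x == c[v] + 1 for x in others)

def verify(adj, c):
    """The configuration is accepted iff every node accepts."""
    return all(verify_at_node(v, adj, c) for v in adj)
```

On a tree, the distance certificates are accepted
by every node. On a cycle, the node farthest from
the root sees two neighbors that are one step
closer and therefore rejects.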

Locality in distributed graph algorithms
deals with the design and analysis of
distributed network algorithms solving any of
the above three kinds of tasks.


Computational Model 

The study of local algorithms is usually tackled 
in the framework of the so-called LOCAL model, 
formalized and thoroughly studied in [13]. In this 
model, every node v is a Turing machine which is 
given an identity, i.e., a nonnegative integer id(v). 
All identities given to the nodes of any given net- 
work are pairwise distinct. All nodes execute the 
same algorithm. They wake up simultaneously, 
and the computation proceeds in synchronous
rounds, where each round consists of three phases
executed by each node: (1) send a message to 
all neighboring nodes in the network, (2) receive 
the messages sent by the neighboring nodes in 
the network, and (3) perform some individual 
computation. The complexity of an algorithm in 
the LOCAL model is measured in terms of the number
of rounds until all nodes terminate. This number 
of rounds is actually simply the maximum, taken 
over all nodes in the network, of the distance 
at which information is propagated from a node 
in the network. In fact, an algorithm performing 
in t rounds can be rewritten into an algorithm in
which every node, first, collects all data from the 
nodes at distance at most t from it in G and, 
second, performs some individual computation 
on these data. 
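
The equivalence between t rounds and the collection
of all data within distance t can be sketched by a
small simulation in which every node forwards
everything it knows in every round. The function
name and the choice of what a node knows (here, the
adjacency lists it has learned) are assumptions made
for the example.

```python
def gather_views(adj, t):
    """Simulate t synchronous rounds of full-information flooding.

    adj maps each node to the list of its neighbors. After the simulation,
    known[v] maps every node at distance at most t from v to its adjacency
    list, i.e., v has collected its radius-t neighborhood.
    """
    known = {v: {v: tuple(adj[v])} for v in adj}
    for _ in range(t):
        # phases (1) and (2): every node sends its state, neighbors receive it
        inbox = {v: [known[u] for u in adj[v]] for v in adj}
        # phase (3): individual computation, here merging received data
        new_known = {}
        for v in adj:
            merged = dict(known[v])
            for message in inbox[v]:
                merged.update(message)
            new_known[v] = merged
        known = new_known
    return known
```

Any t-round algorithm can then be viewed as a
function applied by each node v to `known[v]`.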

Observe that the LOCAL model is exclusively 
focusing on the locality issue and ignores several 
aspects of the computation. In particular, it is 
synchronous and fault-free. Also, the model is 
oblivious to the amount of individual computa- 
tion performed at each node, and it is oblivi- 
ous to the amount of data that are transmitted 
between neighbors at each round. An impor- 
tant consequence of these facts is that lower 
bounds derived in this model are very robust, in 
the sense that they are not resulting from clock 
drifts, crashes, nor from any kind of limitation 
on the individual computation or on the volume
of transmitted data. Instead, lower bounds in the
LOCAL model result solely from the fact that 
every node is unaware of what is lying beyond a 
certain horizon in the network and must cope with 
this uncertainty. (Most upper bounds are, however,
based on algorithms that perform polynomial-time
individual computations at each node and
exchange only a polylogarithmic number of bits
between nodes.)

Note also that the identities given to the nodes 
may impact the result of the computation. In 
particular, the label $\ell$ produced by a construction
algorithm may depend not only on G but also
on the identity assignment $\mathrm{id} : V(G) \to \mathbb{N}$.
The same holds for decision and verification 
algorithms, in which the accept/reject decision 
at a node may be impacted by its identity (thus, 
for an illegal configuration, the nodes that reject 
may differ depending on the identity assignment 
to the nodes). However, in the case of verification,
it is desirable that the certificates given to
the nodes do not depend on their identities, but
solely on the current configuration. Indeed, the
certificates should depend only on
the given configuration with respect to the
considered language and not
on implementation factors such as, say, the IP
address given to a computer. (The theory of
proof-labeling schemes [7], however, refers to scenarios
in which it is fully legitimate that certificates
also depend on the node identities.)


Classical Tasks 

Many tasks investigated in the framework of 
network computing are related to classical graph 
problems, including computing proper colorings, 
independent sets, matchings, dominating sets, 
etc. Optimization problems are however often 
weakened. For instance, the coloring problem 
considered in the distributed setting is typically 
$(\Delta + 1)$-coloring, where $\Delta$ denotes the maximum
node degree of the current network. Similarly, 
instead of looking for a minimum dominating set, 
or for a maximum independent set, one typically 
looks for dominating sets (resp., independent 
sets) that are minimal (resp., maximal) for 
inclusion. There are at least two reasons for
such relaxations, besides the fact that the relaxed
versions are sequentially solvable in polynomial
time by simple greedy algorithms while the
original versions are NP-hard. First, one can
trivially locally decide whether a solution of the
aforementioned relaxed problems satisfies the
constraints of the relaxed variants, which raises
the question of whether one can also construct
their solutions locally. (In contrast, problems like
minimum-weight spanning tree construction
cannot be checked locally, as the presence
of an edge in the solution may depend on
another edge, arbitrarily far in the network.)
Second, these relaxed problems already involve 
one of the most severe difficulties distributed 
computing has to cope with, that is, symmetry 
breaking. 
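
The two observations above, that the relaxed
problems are solved sequentially by simple greedy
algorithms and that their solutions can be checked
locally, can be sketched as follows. The function
names and the adjacency-dictionary representation
are assumptions made for the example; the greedy
routines are sequential, not distributed.

```python
def greedy_coloring(adj):
    """Sequential greedy coloring with at most Delta + 1 colors.

    Each node takes the smallest color not used by its already colored
    neighbors; a node of degree d always finds one among 0..d.
    """
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
    return color

def greedy_mis(adj):
    """Sequential greedy maximal independent set."""
    mis = set()
    for v in adj:
        if not any(u in mis for u in adj[v]):
            mis.add(v)
    return mis

def locally_valid_mis(adj, mis):
    """Each node inspects only its neighbors: members must have no neighbor
    in the set (independence), non-members at least one (maximality)."""
    return all(
        (not any(u in mis for u in adj[v])) if v in mis
        else any(u in mis for u in adj[v])
        for v in adj
    )
```

The difficulty in the distributed setting is that
the greedy order itself is a global object; choosing
such an order locally is exactly the symmetry
breaking problem mentioned above.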


Key Results 


In this section, we say that a distributed algorithm 
is local if and only if it performs in a constant 
number of rounds in the LOCAL model, and we 
are interested in identifying distributed languages 
that are locally constructible, locally decidable, 
and/or locally verifiable. 


Local Algorithms 

Naor and Stockmeyer [11] have thoroughly
studied the distributed languages that can be
locally constructed. They established that it is
TM-undecidable whether a distributed language can
be locally constructed, and this holds even if
one restricts the problem to distributed languages
that can be locally decided. (On the other hand,
it appears to be not easy to come up with a
nontrivial example of a distributed language that
can be constructed locally. One such nontrivial
example is given in [11]: weak coloring,
in which every non-isolated node must have at
least one neighbor colored differently, is locally
constructible for a large class of graphs. This
problem is related to some resource allocation
problem.) The crucial notion of order-invariant
algorithms, defined as algorithms such that the
output at every node is identical for every two
identity assignments that preserve the relative
ordering of the identities, was also introduced
in [11]. Using Ramsey theory, it is proved that
in networks with constant maximum degree, for
every locally decidable distributed language $\mathcal{L}$
with constant-size labels, if $\mathcal{L}$ can be constructed
by a local algorithm, then $\mathcal{L}$ can also be
constructed by a local order-invariant algorithm. This
result has many important consequences. One is,
for instance, the impossibility of solving
$(\Delta + 1)$-coloring and maximal independent set (MIS) in
a constant number of rounds. This follows from
the fact that a t-round order-invariant algorithm
cannot solve these problems in rings where nodes
are consecutively labeled from 1 to n, because
adjacent nodes with identities in $[t + 1, n - t - 1]$
must produce the same output. Another
important consequence of the restriction to
order-invariant algorithms is the derandomization
theorem in [11] stating that, in constant-degree graphs,
for every locally decidable distributed language $\mathcal{L}$
with constant-size labels, if $\mathcal{L}$ can be constructed
by a randomized Monte Carlo local algorithm,
then $\mathcal{L}$ can also be constructed by a deterministic
local algorithm.
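
The ring argument above can be checked on small
instances. The sketch below computes, for a ring
whose identities 1, ..., n appear in consecutive
order, the identities a node sees within distance t
and the relative order of these identities. The
function names are assumptions made for the
example, and n is assumed to be at least 2t + 1.

```python
def view_ids(n, i, t):
    """Identities within distance t of the node with identity i, listed
    along the ring, when identities 1..n are assigned consecutively."""
    return [((i - 1 + d) % n) + 1 for d in range(-t, t + 1)]

def order_pattern(ids):
    """Relative order of the identities: the rank of each entry in the view.
    An order-invariant algorithm can only depend on this pattern."""
    ranks = {x: r for r, x in enumerate(sorted(ids))}
    return tuple(ranks[x] for x in ids)
```

For n = 20 and t = 3, every node whose view does
not wrap around the ring has the same order
pattern, so an order-invariant algorithm gives all
these nodes the same output, which is incompatible
with a proper coloring or an MIS.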

The distributed languages that can be locally
decided, or verified, have been studied by
Fraigniaud, Korman, and Peleg in [6]. Several
complexity classes are defined and separated, and
complete languages are identified for the local
reduction. It is also shown in [6] that the class
of all distributed languages that can be locally
verified by a randomized Monte Carlo algorithm
with success probability $(\sqrt{5} - 1)/2$ includes all
distributed languages. The impact of randomization
is, however, somewhat limited, at least for
the class of distributed languages closed under
node deletion. Indeed, [6] establishes that for any
such language $\mathcal{L}$, if $\mathcal{L}$ can be locally decided
by a randomized Monte Carlo algorithm with
success probability greater than $(\sqrt{5} - 1)/2$, then $\mathcal{L}$ can
be locally decided by a deterministic algorithm.
Finally, [6] additionally discusses the power of
oracles providing nodes with information about
the current network, like, typically, its number of
nodes.

Almost Local Algorithms

Linial [8] proved that constructing a
$(\Delta + 1)$-coloring, or, equivalently, an MIS, requires
$\Omega(\log^* n)$ rounds, where $\log^* x$ is the number
of times one should iterate the log function,
starting from x, for reaching a value less than 1.
The $\log^*$ function grows quite slowly (e.g.,
$\log^* 10^{10} = 5$), but is not constant. This lower
bound holds even for n-node rings in which
identities are in [1, n], nodes know n, and nodes
share a consistent notion of clockwise direction.
Linial’s lower bound is tight, as a 3-coloring
algorithm performing in $O(\log^* n)$ rounds can be
obtained by adapting the algorithm by Cole and
Vishkin [5] originally designed for the PRAM
model to the setting of the lower bound. Also,
Linial [8] describes an $O(\log^* n)$-round algorithm
for $\Delta^2$-coloring. Note that the $\Omega(\log^* n)$-round
lower bound for $(\Delta + 1)$-coloring extends to
randomized Monte Carlo algorithms [10]. On the
other hand, the best known upper bounds on the
number of rounds to solve $(\Delta + 1)$-coloring in
arbitrary graphs are $2^{O(\sqrt{\log n})}$ for deterministic
algorithms [12] and expected $O(\log n)$ for
randomized Las Vegas algorithms [1, 9].
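
The following sketch illustrates the two notions
used above: the iterated logarithm, with the
convention stated in the text (base-2 logarithm,
iterate until the value drops below 1), and a
textbook rendering of the Cole and Vishkin color
reduction on an oriented ring, followed by the
standard reduction from six to three colors. It is
a sequential simulation, not the PRAM algorithm of
[5]; the function names and the round counter are
assumptions made for the example.

```python
import math

def log_star(x):
    """Number of times log2 is iterated, starting from x, until the value
    is less than 1 (the convention used in the text)."""
    count = 0
    while x >= 1:
        x = math.log2(x)
        count += 1
    return count

def cv_step(colors, succ):
    """One round: node v compares its color with its successor's color and
    keeps (index of lowest differing bit, own bit at that index)."""
    new = {}
    for v, c in colors.items():
        diff = c ^ colors[succ[v]]
        i = (diff & -diff).bit_length() - 1
        new[v] = 2 * i + ((c >> i) & 1)
    return new

def three_color_ring(ids):
    """ids: distinct nonnegative identities in clockwise order (n >= 3).
    Returns a proper 3-coloring and the number of simulated rounds."""
    n = len(ids)
    succ = {ids[j]: ids[(j + 1) % n] for j in range(n)}
    pred = {ids[(j + 1) % n]: ids[j] for j in range(n)}
    colors = {v: v for v in ids}          # identities form a proper coloring
    rounds = 0
    while max(colors.values()) >= 6:      # shrink the palette to {0,...,5}
        colors = cv_step(colors, succ)
        rounds += 1
    for x in (5, 4, 3):                   # remove colors 5, 4, 3 one by one
        for v in [u for u in ids if colors[u] == x]:
            taken = {colors[pred[v]], colors[succ[v]]}
            colors[v] = min(c for c in (0, 1, 2) if c not in taken)
        rounds += 1
    return colors, rounds
```

Each call to `cv_step` roughly replaces the number
of bits of the colors by its logarithm, which is
where the $\log^*$ behavior comes from.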

By expressing the complexity of local
algorithms in terms of both the size n of
the network and its maximum degree $\Delta$, one can
distinguish the impact of these two parameters.
For instance, Linial’s $O(\log^* n)$-round algorithm
for $\Delta^2$-coloring [8] can be adapted to produce an
$O(\Delta^2 + \log^* n)$-round algorithm for
$(\Delta + 1)$-coloring. This bound has been improved by a
series of contributions, culminating in the
currently best known algorithm for $(\Delta + 1)$-coloring
performing in $O(\Delta + \log^* n)$ rounds [3]. Also,
there is a randomized $(\Delta + 1)$-coloring algorithm
performing in expected $O(\log \Delta + \sqrt{\log n})$
rounds [14]. This algorithm was recently
improved to another algorithm performing in
$O(\log \Delta + e^{O(\sqrt{\log \log n})})$ rounds [4].


Additional Results 

The reader is invited to consult the monograph [2]
for more on local distributed graph coloring
algorithms, the survey [15] for a detailed
account of local algorithms, as well as the
textbook [13] for the design of distributed graph
algorithms in various contexts, including the LOCAL
model.


Open Problems 


As far as local construction tasks are concerned, 
in a way similar to what happens in sequential 
computing, the theory of local distributed 
computing lacks lower bounds. (Linial’s celebrated
lower bound [8] is actually one of the
very few examples of a nontrivial lower bound
for local computation.) As a consequence, one
observes large gaps between the numbers of 
rounds of the best known lower bounds and 
of the best known algorithms. This is typically 
the case for $(\Delta + 1)$-coloring. One of the most
important open problems in this field is in fact to 
close these gaps for coloring as well as for many 
other graph problems. Similarly, although studied 
in depth by, e.g., Naor and Stockmeyer [11], 
the power of randomization is still not fully 
understood in the context of local computation. In 
general, the best known randomized algorithms 
are significantly faster than the best known 
deterministic algorithms, as witnessed by the 
case of $(\Delta + 1)$-coloring. Nevertheless, it is not
known whether this is just an artifact of a lack of 
knowledge or an intrinsic separation between the 
two classes of algorithms. 

In the context of local decision and verification 
tasks, the interplay between the ability to decide 
or verify locally and the ability to search (i.e., 
construct) locally is not fully understood. The 
completeness notions for local decision in [6] 
do not seem to play the same role as the com- 
pleteness notions in classical complexity theory. 
In particular, in the context of local computing, 
one has not yet observed phenomena similar to 
self-reduction for NP-complete problems. Yet, 
the theory of local decision and verification is 
in its infancy, and it may be too early for draw- 
ing conclusions about its impact on distributed 
local search. An intriguing question is related 
to generalizing decision and verification tasks 
in a way similar to the polynomial hierarchy in
sequential computing, by adding more alternating
quantifiers in the specification of Problem 3. For 
instance, it would then be interesting to figure 
out whether each level of the hierarchy has a 
“natural” language as representative. 

Problem Definition


Classical error-correcting codes allow one to en- 
code a k-bit message x into an n-bit codeword 
C(x), in such a way that x can still be accurately 
recovered even if C(x) gets corrupted in a small 
number of coordinates. The traditional way to 
recover even a small amount of information con- 
tained in x from a corrupted version of C(x) is 
to run a traditional decoder for C, which would 
read and process the entire corrupted codeword, 
to recover the entire original message x. The 
required information or required piece of x can 
then be read off. In the current digital age where 
huge amounts of data need to be encoded and 
decoded, even running in linear time to read 
the entire encoded data might be too wasteful,
and the need for sublinear algorithms for error
correction is greater than ever. Especially if one is
only interested in recovering a single bit or a few 
bits of x, it is possible to have codes with much 
more efficient decoding algorithms, which allow 
for the decoding to take place by only reading a 
sublinear number of code positions. Such codes 
are known as locally decodable codes (LDCs). 
Locally decodable codes allow reconstruction of 
an arbitrary bit $x_i$ with high probability by only
reading $t \ll k$ randomly chosen coordinates of
(a possibly corrupted) C(x). 

The two main interesting parameters of a 
locally decodable code are (1) the codeword 
length n (as a function of the message length k) 
which measures the amount of redundancy that 
is introduced into the message by the encoder 
and (2) the query complexity of local decoding 
which counts the number of bits that need to be 
read from a (corrupted) codeword in order to 
recover a single bit of the message. Ideally, one 
would like to have both of these parameters 
as small as possible. One cannot, however, 
simultaneously minimize both of them; there 
is a trade-off. On one end of the spectrum, 
we have LDCs with the codeword length 
close to the message length, decodable with 
somewhat large query complexity. Such codes 
are useful for data storage and transmission. 
On the other end we have LDCs where the 
query complexity is a small constant, but the 
codeword length is large compared to the 
message length. Such codes find applications 
in complexity theory, derandomization, and 
cryptography (and this was the reason they 
were originally studied [15]). The true shape 
of the trade-off between the codeword length 
and the query complexity of LDCs is not known. 
Determining it is a major open problem (see [23] 
for an excellent recent survey of the LDC 
literature). 

Natural variants of locally decodable codes 
are locally correctable codes (LCCs) where every 
symbol of the true codeword can be recovered 
with high probability by reading only a small 
number of locations of the corrupted codeword. 
When the underlying code is linear, it is known 
that every LCC is also an LDC. 

Notation and Formal Definition

For a set $\Sigma$ and two vectors $c, c' \in \Sigma^n$, we
define the relative Hamming distance between
c and c', which we denote by $\Delta(c, c')$, to be
the fraction of coordinates where they differ:
$\Delta(c, c') = \Pr_{i \in [n]}[c_i \neq c'_i]$.

An error-correcting code of length n over the
alphabet $\Sigma$ is a subset $C \subseteq \Sigma^n$. The rate of C is
defined to be $\frac{\log |C|}{n \log |\Sigma|}$. The (minimum) distance of
C is defined to be the largest $\delta > 0$ such that for
every distinct $c_1, c_2 \in C$, we have $\Delta(c_1, c_2) \geq \delta$.

For an algorithm A and a string r, we will use
$A^r$ to represent that A is given query access to r.
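
The three quantities just defined can be computed
directly for small codes. The sketch below is only
an illustration of the definitions; the function
names and the representation of a code as a list of
equal-length strings are assumptions made for the
example.

```python
import math
from itertools import combinations

def rel_dist(c1, c2):
    """Relative Hamming distance: fraction of coordinates where c1, c2 differ."""
    return sum(a != b for a, b in zip(c1, c2)) / len(c1)

def rate(code, alphabet_size):
    """Rate log|C| / (n log|Sigma|) of a code given as a list of codewords."""
    n = len(next(iter(code)))
    return math.log(len(code)) / (n * math.log(alphabet_size))

def min_distance(code):
    """Smallest relative distance between two distinct codewords."""
    return min(rel_dist(a, b) for a, b in combinations(code, 2))
```

For the binary repetition code {000, 111}, the rate
is 1/3 and the minimum distance is 1.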

We now define locally decodable codes. 


Definition 1 (Locally Decodable Code) Let
$C \subseteq \Sigma^n$ be a code with $|C| = |\Sigma|^k$. Let
$E : \Sigma^k \to C$ be a bijection (we refer to E
as the encoding map for C; note that k/n equals
the rate of the code C). We say that (C, E) is
locally decodable from a $\delta'$-fraction of errors
with t queries if there is a randomized algorithm
A, such that:


* Decoding: Whenever a message $x \in \Sigma^k$
and a received word $r \in \Sigma^n$ are such that
$\Delta(r, E(x)) < \delta'$, then, for each $i \in [k]$,


$\Pr[A^r(i) = x_i] \geq 2/3.$


* Query complexity t: Algorithm $A^r(i)$ always
makes at most t queries to r.


Key Results 


Locally decodable codes have been implicitly 
studied in coding theory for a long time, 
starting with Reed’s “majority-logic decoder” for 
binary Reed-Muller codes [18]. They were first 
formally defined by Katz and Trevisan [15] (see 
also [19]). Since then, the quest for understanding 
locally decodable codes has generated many 
developments. 

As mentioned before, there are two main 
regimes in which LDCs have been studied. 
The first regime, on which most prior work 
has focused, is the regime where the query
complexity is small, even a constant. In this
setting, the most interesting question is to 
construct codes with rate as large as possible 
(though it is known that some significant loss in 
rate is inevitable). The second regime, which has 
been the focus of more recent work, is the high- 
rate regime. In this regime, one insists on the rate 
of the code being very close to 1 and then tries 
to obtain as low a query complexity as possible.
We now discuss the known results in both 
regimes. 


Low-Query Regime 

A significant body of work on LDCs has focused 
on local decoding with a constant number of 
queries. Local decoding with 2 queries is almost 
fully understood. The Hadamard code is a
2-query LDC with codeword length $n = 2^k$.
Moreover, it was shown in [9, 14] that any 2-query
locally decodable code (that is decodable from
some small fixed constant fraction of errors) must
have $n = 2^{\Omega(k)}$.
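
The 2-query decoder of the Hadamard code can be
sketched as follows. The encoding indexes the
$2^k$ codeword positions by the integers a whose
binary expansion is a vector in $\{0,1\}^k$, and
position a holds the inner product of x and a
modulo 2. The function names and this indexing
convention are assumptions made for the example.

```python
import random

def hadamard_encode(x):
    """x is a list of k bits; returns the 2^k inner products <x, a> mod 2."""
    k = len(x)
    xi = sum(bit << j for j, bit in enumerate(x))
    return [bin(xi & a).count("1") % 2 for a in range(2 ** k)]

def local_decode(r, k, i, rng=random):
    """Recover bit i of the message with 2 queries to the received word r.

    For a uniformly random a, <x, a> + <x, a xor e_i> = x_i (mod 2), and each
    of the two queried positions is individually uniformly distributed.
    """
    a = rng.randrange(2 ** k)
    return r[a] ^ r[a ^ (1 << i)]
```

If at most a $\delta'$ fraction of the positions of
r is corrupted, each query hits a corrupted position
with probability at most $\delta'$, so the decoder
errs with probability at most $2\delta'$.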

For a long time, it was generally believed
that for decoding with any constant number of
queries, this exponential blowup is needed: a
k-bit message must be encoded into at least
$\exp(k^{\epsilon})$ bits, for some constant $\epsilon > 0$. This
is precisely the trade-off exhibited by the
Reed-Muller codes, which were believed to be optimal.
Recently, in a surprising and beautiful sequence
of works [6, 8, 17, 22], a new family of codes
called matching vector codes was constructed,
and they were shown to have local decoding
parameters much better than those of
Reed-Muller codes. This family of codes gives
constant-query locally decodable codes which
encode k bits into as few as $n = \exp(\exp(\log^{\epsilon}(k)))$
bits for some small constant $\epsilon > 0$ and are
locally decodable from some fixed small constant
fraction of errors. (The parameter $\epsilon$ can be chosen
arbitrarily close to 0 by increasing the number of
queries as a function of $\epsilon$.)

There has also been considerable work [9, 
14, 15, 20, 21] on lower bounds on the length 
of low-query locally decodable codes. However, 
there is still a huge gap in our understanding of 
the best rate achievable for any query complexity
that is at least 3. For instance, for 3-query
LDCs that decode from a fixed small constant
fraction of errors, the best lower bounds we
know are of the form $n = \Omega(k^2)$ [21]. The
best upper bounds on the other hand only give 
constructions with n = exp(exp(log?“ (k))). 
For codes locally decodable for general query 
complexity t, it is known [15] that $n = k^{1 + \Omega(1/t)}$
(again, a really long way off from the upper
bounds). Thus, in particular, for codes of constant
rate, local decoding requires at least $\Omega(\log k)$
queries.


High-Rate Regime 

In the high-rate regime, one fixes the rate to be 
constant, i.e., $n = O(k)$ or even $(1 + \alpha)k$, for
some small constant $\alpha$, and then tries to construct
locally decodable codes with the smallest query 
complexity possible. Reed-Muller codes in this 
regime demonstrate the following setting of 
parameters: for any constant $\epsilon > 0$, there is a
family of Reed-Muller codes of rate $\exp(-1/\epsilon)$
that is decodable with $n^{\epsilon}$ queries. Until recently
this was the best trade-off for parameters known 
in this regime, and in fact we did not know any 
family of codes of rate > 1/2 with any sublinear 
query complexity. 

In the last few years, three very different
families of codes have been constructed [10-12] that
go well beyond the parameters of Reed-Muller
codes. These codes show that for arbitrary
$\alpha, \epsilon > 0$, and for every finite field F, for infinitely many
n, there is a linear code over F of length n with
rate $1 - \alpha$, which is locally decodable (and even
locally correctable) from a constant fraction
of errors with $O(n^{\epsilon})$ queries. (This
constant is positive and is a function of only $\alpha$
and $\epsilon$.) Codes with
such parameters were in fact conjectured not to
exist, and this conjecture, if it were true, would
have yielded progress on some well-known open
questions in arithmetic circuit complexity [7].

Even more recently, it was shown that one can 
achieve even subpolynomial query complexity 
while keeping rate close to 1. In [13], it was 
shown that, for any $\alpha > 0$ and for every finite
field F, for infinitely many n, there is a code
over F of length n with rate $1 - \alpha$, which is
locally decodable (and even locally correctable)
from a constant fraction of errors with
$2^{O(\sqrt{\log n \log \log n})}$ queries. (This constant is
positive and is a function of only $\alpha$.)

On the lower bound front, all we know are 
the $n = k^{1 + \Omega(1/t)}$ lower bounds by [15]. Thus,
in particular, it is conceivable that for any small
constant $\alpha$, one could have a family of codes
of rate $1 - \alpha$ that are decodable with $O(\log n)$
queries. 


Applications 


In theoretical computer science, locally de- 
codable codes have played an important 
part in the proof-checking revolution of the 
early 1990s [1,2,4,16], as well as in other 
fundamental results in hardness amplification 
and pseudorandomness [3, 5, 19]. Variations of 
locally decodable codes are also beginning to 
find practical applications in data storage and 
data retrieval. 


Open Problems 


In the constant query regime, the most important 
question is to get the rate to be as large as 
possible, and the major open question in this 
direction is the following: 


Question 1 Do there exist LDCs with polynomial
rate, i.e., $n = k^{O(1)}$, that are decodable with
O(1) queries?


In the high-rate regime, the best query lower 
bounds we know are just logarithmic, and the 
main challenge is to construct codes with im- 
proved query complexity, hopefully coming close 
to the best lower bounds we can prove. 


Question 2 Do there exist LDCs of rate $1 - \alpha$ or
even $\Omega(1)$ that are decodable with poly(log n)
queries?


Given the recent constructions of new fam- 
ilies of codes in both the high-rate and low- 
query regimes with the strengthened parameters, 
it doesn’t seem all that farfetched to imagine 
that we might soon be able to give much better 
answers to the above questions. 

Problem Definition

Locally testable codes (LTCs) are error-correcting
codes that support algorithms which can
distinguish valid codewords from words that are “far”
from all codewords by probing a given word only
at a sublinear (typically constant) number of
locations. LTCs are useful in the following scenario.
Suppose data is transmitted by encoding it using
an LTC. Then, one could check if the received data
is nearly uncorrupted or has been considerably
corrupted by making very few probes into the
received data.

An error-correcting code $C : \{0,1\}^k \to \{0,1\}^n$
is a function mapping k-bit messages
to n-bit codewords. The ratio k/n is referred to
as the rate of the code C. The Hamming distance
between two n-bit strings x and y, denoted by
$\Delta(x, y)$, is the number of locations where x and
y disagree, i.e., $\Delta(x, y) = |\{i \in [n] : x_i \neq y_i\}|$.
The relative distance between x and y, denoted
by $\delta(x, y)$, is the normalized distance, i.e.,
$\delta(x, y) = \Delta(x, y)/n$. The distance of the code
C, denoted by d(C), is the minimum Hamming
distance between two distinct codewords, i.e.,
$d(C) = \min_{x \neq y} \Delta(C(x), C(y))$. The distance of
a string w from the code C, denoted by $\Delta(w, C)$,
is the distance of the nearest codeword to w,
i.e., $\Delta(w, C) = \min_x \Delta(w, C(x))$. The relative
distance of a code and the relative distance of a
string w to the code are the normalized versions
of the corresponding distances.
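The definitions above can be made concrete with a small brute-force sketch. It assumes the Hadamard encoding (one parity bit per point of {0,1}^k) as the example code; the helper names `hamming`, `hadamard_encode`, `code_distance`, and `distance_to_code` are illustrative and not taken from the entry, and the exhaustive search is only feasible for tiny k.

```python
from itertools import product

def hamming(x, y):
    """Delta(x, y): number of positions where x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def hadamard_encode(msg):
    """Hadamard encoding: the bit <msg, a> mod 2 for every a in {0,1}^k."""
    k = len(msg)
    return tuple(sum(m * a for m, a in zip(msg, point)) % 2
                 for point in product((0, 1), repeat=k))

def code_distance(encode, k):
    """d(C): minimum Hamming distance between two distinct codewords."""
    msgs = list(product((0, 1), repeat=k))
    return min(hamming(encode(x), encode(y))
               for x in msgs for y in msgs if x != y)

def distance_to_code(w, encode, k):
    """Delta(w, C): distance from w to the nearest codeword."""
    return min(hamming(w, encode(x)) for x in product((0, 1), repeat=k))

k = 3
n = 2 ** k                       # block length of the Hadamard code
rate = k / n                     # rate k/n
d = code_distance(hadamard_encode, k)

# Flip one bit of a codeword and measure how far the result is from the code.
word = list(hadamard_encode((1, 0, 1)))
word[0] ^= 1
dist = distance_to_code(tuple(word), hadamard_encode, k)
```

For this toy code the relative distance is d/n, and the corrupted word is at relative distance dist/n from the code.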


Definition 1 (locally testable code (LTC)) A
code $C : \{0,1\}^k \to \{0,1\}^n$ is said to be
$(q, \delta, \varepsilon)$-locally testable if there exists a
probabilistic oracle algorithm T, also called a tester,
that, on oracle access to an input string $w \in \{0,1\}^n$,
makes at most q queries to the string w (a query
models a probe into the input string w in which one
symbol, here a bit, of w is read) and has the
following properties:


Completeness: For every message $x \in \{0,1\}^k$,
with probability 1 (over the tester’s internal
random coins), the tester T accepts the word
C(x). Equivalently,


$\forall x \in \{0,1\}^k, \ \Pr[T^{C(x)} \text{ accepts}] = 1.$


Soundness: For every string $w \in \{0,1\}^n$ such
that $\Delta(w, C) > \delta n$, the tester T rejects the
word w with probability at least $\varepsilon$ (despite
reading only q bits of the word w). Equivalently,

$\forall w \in \{0,1\}^n, \ \Delta(w, C) > \delta n \implies \Pr[T^w \text{ accepts}] \leq 1 - \varepsilon.$


Local testability was first studied in the
context of program checking by Blum, Luby,
and Rubinfeld [8], who showed that the
Hadamard code is locally testable (strictly
speaking, Blum et al. only showed that “Reed-
Muller codes of order 1,” a strict subclass of
Hadamard codes, are locally testable, while
later Kaufman and Litsyn [15] demonstrated
the local testability of the entire class of
Hadamard codes), and by Gemmell et al. [11],
who showed that the Reed-Muller codes are
locally testable. The notion of LTCs is implicit
in the work on locally checkable proofs by
Babai et al. [2] and subsequent works on PCP.
The explicit definition appeared independently
in the works of Rubinfeld and Sudan [17],
Friedl and Sudan [10], Arora’s PhD thesis [1]
(under the name of “probabilistically checkable
proofs”), and Spielman’s PhD thesis [18] (under
the name of “checkable codes”). A formal
study of LTCs was initiated by Goldreich and
Sudan [14].

The following variants of the above definition
of local testability have also been studied.


• 2-sided vs. 1-sided error: The above definition
of LTCs has perfect completeness, in the
sense that every valid codeword is accepted
with probability exactly 1. The tester, in this
case, is said to have 1-sided error. A 2-sided
error tester, on the other hand, accepts valid
codewords with probability at least c for some
$c \in (1 - \varepsilon, 1]$. However, most constructions of
LTCs have perfect completeness.

• Strong/robust LTCs: The soundness requirement
in Definition 1 can be strengthened in
the following sense. We can require that there
exists a constant $\rho \in (0, 1)$ such that for every
string $w \in \{0,1\}^n$ which satisfies $\Delta(w, C) > d$,
we have


$\Pr[T^w \text{ accepts}] \leq 1 - \rho \cdot \frac{d}{n}.$

In other words, non-codewords which are at
least a certain minimum distance from the
code are not only rejected with probability at
least $\varepsilon$ but are in fact rejected with probability
proportional to the distance of the
non-codeword from the code. Codes that have such
testers are called $(q, \rho)$-strong locally testable
codes. They are sometimes also referred to as
$(q, \rho)$-robust locally testable codes. Most
constructions of LTCs satisfy the stronger
soundness requirement.

• Adaptive vs. nonadaptive: The q queries of
the tester T could either be adaptive or
nonadaptive. Almost all constructions of LTCs are
nonadaptive.

• Tolerant testers: Tolerant LTCs are codes with
testers that accept not only valid codewords
but also words which are close to the code,
within a particular tolerance parameter $\delta'$ for
$\delta' < \delta$.


LTCs are closely related to probabilistically 
checkable proofs (PCPs). Most known construc- 
tions of PCPs yield LTCs with similar parame- 
ters. In fact, there is a generic transformation to 
convert a PCP of proximity (which is a PCP with 
more requirements) into an LTC with comparable 
parameters [7,19]. See a survey by Goldreich [13] 
for the interplay between PCP and LTC construc- 
tions. 

Locally decodable codes (LDCs), in contrast 
to LTCs, are codes with sublinear time decoders. 
Informally, such decoders can recover each mes- 
sage entry with high probability by probing the 
word at a sublinear (even constant) number of 
locations provided that the codeword has not been 
corrupted at too many locations. Observe that 
LTCs distinguish codewords from words that are 
far from the code while LDCs allow decoding 
from words that are close to the code. 


Key Results 


Local Testability of Hadamard Codes 
As a first example, we present the seminal result
of Blum, Luby, and Rubinfeld [8] that showed
that Hadamard codes are locally testable. In this
setting, we are given a string $f \in \{0,1\}^{2^k}$ and we
would like to test if f is a Hadamard codeword.
It will be convenient to view the $2^k$-bit long
string f as a function $f : \{0,1\}^k \to \{0,1\}$. In
this alternate view, Hadamard codewords (strictly
speaking, “Reed-Muller codewords of order 1”)
correspond to linear functions, i.e., they satisfy
$\forall x, y \in \{0,1\}^k$, $f(x) + f(y) = f(x + y)$. (Here
addition “+” refers to bitwise xor. In other words,
$b_1 + b_2 := b_1 \oplus b_2$ for $b_1, b_2 \in \{0,1\}$ and
$x + y = (x_1, x_2, \ldots, x_k) + (y_1, y_2, \ldots, y_k) =
(x_1 \oplus y_1, x_2 \oplus y_2, \ldots, x_k \oplus y_k)$ for $x, y \in \{0,1\}^k$.)
The following test is due to Blum, Luby, and
Rubinfeld. The accompanying theorem shows that
this is in fact a robust characterization of linear
functions.


BLR-Test
Input: Parameter k and oracle access to
$f : \{0,1\}^k \to \{0,1\}$:


1. Choose $x, y \in_R \{0,1\}^k$ uniformly at random.
2. Query f at locations x, y and x + y.
3. Accept iff $f(x) + f(y) = f(x + y)$.
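The three steps of the BLR-test can be sketched as follows. This is a minimal illustration, assuming that points of {0,1}^k are encoded as k-bit Python integers (so that `^` is the bitwise xor used as addition); the names `blr_test`, `exact_rejection_probability`, `linear`, and the corrupted example function are hypothetical and chosen for this sketch only.

```python
import random

def blr_test(f, k, rng=random):
    """One run of the BLR-test on f: {0,1}^k -> {0,1}; inputs are k-bit ints."""
    x = rng.randrange(2 ** k)
    y = rng.randrange(2 ** k)
    # Three queries: f(x), f(y), f(x + y), where + is bitwise xor.
    return (f(x) ^ f(y)) == f(x ^ y)

def exact_rejection_probability(f, k):
    """Rejection probability of the BLR-test, by enumerating all pairs (x, y)."""
    size = 2 ** k
    bad = sum((f(x) ^ f(y)) != f(x ^ y)
              for x in range(size) for y in range(size))
    return bad / size ** 2

def linear(a):
    """The linear function x -> <a, x> mod 2."""
    return lambda x: bin(a & x).count("1") % 2

k = 4
lin = linear(0b1011)

def corrupted(x):
    # lin with its value flipped at the single point x = 5,
    # so it has relative distance 1/16 from lin.
    return lin(x) ^ (1 if x == 5 else 0)

p_lin = exact_rejection_probability(lin, k)
p_cor = exact_rejection_probability(corrupted, k)
```

Here `p_lin` is 0 (linear functions are always accepted), while `p_cor` is bounded below by the relative distance 1/16 of the corrupted function from the linear functions.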


Clearly, the BLR-test always accepts all linear
functions (i.e., Hadamard codewords). Blum,
Luby, and Rubinfeld (with subsequent improvements
due to Coppersmith) showed that if f
has relative distance at least $\delta$ from all linear
functions, then the BLR-test rejects with probability
at least $\min\{\delta/2, 2/9\}$. Their result was more
general in the sense that it applied to all additive
groups and not just {0,1}. For the special case
of {0,1}, Bellare et al. [3] obtained the following
stronger result:


Theorem 1 ([3,8]) If f is at relative distance at
least $\delta$ from all linear functions, then the BLR-test
rejects f with probability at least $\delta$.


Local Testability of Reed-Muller Codes 

Rubinfeld and Sudan [17] considered the problem
of local testability of the Reed-Muller codes.
Here, we consider codes over non-Boolean
alphabets and the natural extension of LTCs to
this non-Boolean setting. Given a field F and
parameters d and m (where d + 1 < |F|), the
Reed-Muller code consists of codewords which
are evaluations of m-variate polynomials of total
degree at most d. Let $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_{d+1}$ be
(d + 2) distinct elements in the field F (recall
that d + 1 < |F|). The following test checks if a
given $f : F^m \to F$ is close to the Reed-Muller
code.


RS-test
Input: Field F, parameter d, and oracle access to
$f : F^m \to F$:


1. Choose $x, y \in_R F^m$ uniformly at random.

2. Query f at locations $(x + \alpha_i \cdot y)$, $i = 1, \ldots, d + 1$.

3. Interpolate to construct a univariate polynomial
$q : F \to F$ of degree at most d such
that $q(\alpha_i) = f(x + \alpha_i \cdot y)$, $i = 1, \ldots, d + 1$.

4. If $q(\alpha_0) = f(x + \alpha_0 \cdot y)$ accept, else reject.
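The steps of the RS-test can be sketched over a prime field. The sketch below makes several assumptions that are not in the entry: F is taken to be the integers modulo a prime p, the distinct elements are chosen as $\alpha_i = i$, f is a Python callable on m-tuples that returns values reduced modulo p, and modular inverses use `pow(den, -1, p)` (Python 3.8 or later). The names `lagrange_eval` and `rs_test` are illustrative.

```python
import random

def lagrange_eval(xs, ys, t, p):
    """Value at t of the unique polynomial of degree < len(xs) over F_p
    that passes through the points (xs[i], ys[i])."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (t - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def rs_test(f, m, d, p, rng=random):
    """One run of the RS-test for f: F_p^m -> F_p and degree bound d."""
    assert d + 1 < p
    alphas = list(range(d + 2))            # alpha_0, ..., alpha_{d+1} (assumed choice)
    x = [rng.randrange(p) for _ in range(m)]
    y = [rng.randrange(p) for _ in range(m)]

    def point(a):
        # The point x + a * y on the random line through x with direction y.
        return tuple((xi + a * yi) % p for xi, yi in zip(x, y))

    xs = alphas[1:]                         # d + 1 interpolation points
    ys = [f(point(a)) for a in xs]
    predicted = lagrange_eval(xs, ys, alphas[0], p)
    return predicted == f(point(alphas[0]))

p, m, d = 13, 2, 2

def low_degree(v):
    # Total degree 2, so it should always be accepted for d = 2.
    return (v[0] * v[0] + 3 * v[0] * v[1] + 5) % p

def cubic(v):
    # Total degree 3, so some lines should expose it for d = 2.
    return pow(v[0], 3, p)
```

Repeating `rs_test` independently and rejecting as soon as one run fails amplifies the rejection probability in the usual way.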


The above test checks that the restriction of
the function f to the line $\ell(t) = x + ty$ is
a univariate polynomial of degree at most d.
Clearly, multivariate polynomials of degree at
most d are always accepted by the RS-test.


Theorem 2 ([17]) There exists a constant c such
that if $|F| > 2d + 2$ and $\delta < c/d^2$, then the
following holds for every (positive) integer m. If
f has relative distance at least $\delta$ from all
m-variate polynomials of total degree at most d,
then the RS-test rejects f with probability at least
$\delta/2$.


Open Problems 


The Hadamard code is testable with three queries
but has inverse exponential rate, whereas the
Reed-Muller code (for certain settings of the
parameters d, m, and F) has polylogarithmic query
complexity and inverse polynomial rate. We can
ask if there exist codes that are good with respect
to both parameters. In other words, do there exist codes
with inverse polynomial rate and linear distance
which are testable with a constant number of
queries? Such a construction, with nearly linear
rate, was obtained by Ben-Sasson and Sudan [5]
and Dinur [9].


Theorem 3 ([5,9]) There exists a constant q and
an explicit family of codes $\{C_k\}_k$, where
$C_k : \{0,1\}^k \to \{0,1\}^{k \cdot \mathrm{polylog}\, k}$, that have linear
distance and are q-locally testable.


This construction is obtained by combining 
the algebraic PCP of proximity-based construc- 
tions due to Ben-Sasson and Sudan [5] with 
the gap amplification technique of Dinur [9]. 
Meir [16] obtained an LTC with similar parameters
using a purely combinatorial construction, albeit 
a non-explicit one. 

It is open whether this construction can be further
improved. In particular, it is open whether there
exist codes with constant rate and linear relative
distance (such codes are usually referred
to as good codes) that are locally testable with a
constant number of queries. We do not know of even a
non-explicit code with such properties. On the
contrary, it is known that random low-density
parity-check (LDPC) codes are in fact not
locally testable [6]. For a more detailed survey
on LTCs, their constructions, and limitations,
the interested reader is directed to the excellent
surveys by Goldreich [13], Trevisan [19], and
Ben-Sasson [4].
Problem Definition


Consider a weighted connected multigraph
$G = (V, E, \omega)$, where $\omega$ is a function from the edge
set E of G into the set of positive reals. For
a path P in G, the weight of P is the sum of
weights of edges that belong to the path P. For
a pair of vertices $u, v \in V$, the distance between
them in G is the minimum weight of a path
connecting u and v in G. For a spanning tree
T of G, the stretch of an edge $(u, v) \in E$ is
defined by


$\mathrm{stretch}_T(u, v) = \frac{\mathrm{dist}_T(u, v)}{\mathrm{dist}_G(u, v)},$


and the average stretch over all edges of E is


$\mathrm{avestr}(G, T) = \frac{1}{|E|} \sum_{(u,v) \in E} \mathrm{stretch}_T(u, v).$


The average stretch of a multigraph
$G = (V, E, \omega)$ is defined as the smallest average
stretch of a spanning tree T of G, avestr(G, T).
The average stretch of a positive integer n,
avestr(n), is the maximum average stretch of
an n-vertex multigraph G. The problem is to
analyze the asymptotic behavior of the function
avestr(n).
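The quantity avestr(G, T) can be computed directly from the definition on small inputs. The sketch below assumes vertices numbered 0..n-1 and edges given as (u, v, weight) triples, with the tree passed as a sub-list of the edges; the function names are illustrative. It evaluates a given tree only and does not search for the best spanning tree.

```python
import heapq

def dijkstra(n, edges, src):
    """Shortest-path distances from src in an undirected weighted multigraph."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def average_stretch(n, edges, tree_edges):
    """avestr(G, T): mean over edges (u, v) of dist_T(u, v) / dist_G(u, v)."""
    dist_g = [dijkstra(n, edges, s) for s in range(n)]
    dist_t = [dijkstra(n, tree_edges, s) for s in range(n)]
    total = sum(dist_t[u][v] / dist_g[u][v] for u, v, _ in edges)
    return total / len(edges)

# Unit-weight cycle on 4 vertices; the tree is the path obtained by dropping edge (3, 0).
cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
path = cycle[:3]
result = average_stretch(4, cycle, path)
```

In this example three edges have stretch 1 and the dropped edge has stretch 3, giving an average of 1.5.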

A closely related (dual) problem is to construct
a probability distribution D of spanning
trees for G, so that


$\mathrm{expstr}(G, D) = \max_{e = (u,v) \in E} \mathbb{E}_{T \in D}(\mathrm{stretch}_T(u, v))$


is as small as possible. Analogously,
$\mathrm{expstr}(G) = \min_D \{\mathrm{expstr}(G, D)\}$, where the minimum is over
all distributions D of spanning trees of G, and
$\mathrm{expstr}(n) = \max_G \{\mathrm{expstr}(G)\}$, where the maximum
is over all n-vertex multigraphs.

By viewing the problem as a 2-player zero-sum
game between a tree player that aims to
minimize the payoff and an edge player that aims
to maximize it, it is easy to see that for every
positive integer n, avestr(n) = expstr(n) [3]. The
probabilistic version of the problem is, however,
particularly convenient for many applications.


Key Results 
The problem has been studied since the 1960s [9, 14, 16,
17]. Major progress in its study was achieved
by Alon et al. [3], who showed that


$\Omega(\log n) = \mathrm{avestr}(n) = \mathrm{expstr}(n) = \exp\left(O\left(\sqrt{\log n \cdot \log\log n}\right)\right).$


Elkin et al. [10] improved the upper bound and
showed that

$\mathrm{avestr}(n) = \mathrm{expstr}(n) = O(\log^2 n \cdot \log\log n).$


Applications 


One application of low-stretch spanning trees
is for solving symmetric diagonally dominant
linear systems of equations. Boman and
Hendrickson [6] were the first to discover the
surprising relationship between these two seemingly
unrelated problems. They applied the spanning
trees of [3] to design solvers that run in time
$m^{3/2} 2^{O(\sqrt{\log n \log\log n})} \log(1/\epsilon)$. Spielman and Teng
[15] improved their results by showing how to
use the spanning trees of [3] to solve diagonally
dominant linear systems in time


$m 2^{O(\sqrt{\log n \log\log n})} \log(1/\epsilon).$


By applying the low-stretch spanning trees
developed in [10], the time for solving these linear
systems reduces to


$m \log^{O(1)} n \log(1/\epsilon),$


and to $O(n (\log n \log\log n)^2 \log(1/\epsilon))$ when the
systems are planar. Applying a recent reduction
of Boman, Hendrickson, and Vavasis [7], one
obtains an $O(n (\log n \log\log n)^2 \log(1/\epsilon))$ time
algorithm for solving the linear systems that arise
when applying the finite element method to solve
two-dimensional elliptic partial differential
equations.

Chekuri et al. [8] used low-stretch spanning
trees to devise an approximation algorithm
for the nonuniform buy-at-bulk network design
problem. Their algorithm provides the first
polylogarithmic approximation guarantee for this
problem.

Abraham et al. [2] use the technique of star
decomposition introduced by Elkin et al. [10]
to construct embeddings with a constant average
stretch, where the average is over all pairs of
vertices, rather than over all edges. The result of
Abraham et al. [2] was, in turn, already used in
a yet more recent work of Elkin et al. [11] on
fundamental circuits.


Open Problems 


Abraham and Neiman [1] subsequently devised
an algorithm for constructing a spanning tree with
average stretch O(log n log log n). The most
evident open problem is to close the gap between
this algorithm and the $\Omega(\log n)$ lower bound.
Another intriguing subject is the study of
low-stretch spanning trees for various restricted
families of graphs. Progress in this direction was
recently achieved by Emek and Peleg [12], who
constructed low-stretch spanning trees with
average stretch O(log n) for unweighted
series-parallel graphs. Discovering other applications of
low-stretch spanning trees is another promising
avenue of study.

Finally, there is a closely related relaxed no- 
tion of low-stretch Steiner or Bartal trees. Unlike 
a spanning tree, a Steiner tree does not have to 
be a subgraph of the original graph, but rather 
is allowed to use edges and vertices that were 
not present in the original graph. It is, however, 
required that the distances in the Steiner tree 
will be no smaller than the distances in the 
original graph. Low-stretch Steiner trees were 
extensively studied [4, 5, 13]. Fakcharoenphol et 
al. [13] devised a construction of low-stretch 
Steiner trees with an average stretch of O(log n).
It is currently unknown whether the techniques 
used in the study of low-stretch Steiner trees can 
help in improving the bounds for the low-stretch 
spanning trees. 
The Exponential Time Hypothesis
and Its Consequences 


In 2001, Impagliazzo, Paturi, and Zane [5, 6]
introduced the Exponential Time Hypothesis
(ETH): a complexity assumption saying that there
exists a constant c > 0 such that no algorithm for
3-SAT can achieve the running time of $O(2^{cn})$,
where n is the number of variables of the input
formula. In particular, this implies that there is
no subexponential-time algorithm for 3-SAT,
that is, one with running time $2^{o(n)}$. The key
result of Impagliazzo, Paturi, and Zane is the
Sparsification Lemma, proved in [6]. Without
going into technical details, the Sparsification
Lemma provides a reduction that allows us to
assume that the input instance of 3-SAT is sparse
in the following sense: the number of clauses is
linear in the number of variables. Thus, a direct
consequence is that, assuming ETH, there is a
constant c > 0 such that there is no algorithm for
3-SAT with running time $O(2^{c(n+m)})$. Hence, an
algorithm with running time $2^{o(n+m)}$ is excluded
in particular.

After the introduction of the Exponential Time 
Hypothesis and the Sparsification Lemma, it 
turned out that ETH can be used as a robust 
assumption for proving sharp lower bounds on 
the time complexity of various computational 
problems. For many classic NP-hard graph 
problems, like VERTEX COVER, 3-COLORING, 
or HAMILTONIAN CYCLE, the known NP- 
hardness reductions from 3-SAT are linear, 
i.e., they transform an instance of 3-SAT with 
n variables and m clauses into an instance of the 
target problem whose total size is O(n + m). 
Consequently, if any of these problems admitted
an algorithm with running time $2^{o(N+M)}$, where
N and M are the numbers of vertices and edges
of the graph, respectively, then the composition
of the reduction and such an algorithm would
yield an algorithm for 3-SAT with running
time $2^{o(n+m)}$, thus contradicting ETH. As all
these problems indeed can be solved in time
$2^{O(N)}$, this shows that the single-exponential
running time is essentially optimal. The same
problems restricted to planar graphs have
NP-hardness reductions from 3-SAT with a quadratic
size blowup, which excludes the existence of
$2^{o(\sqrt{N+M})}$ algorithms under ETH. Again, this is
matched by $2^{O(\sqrt{N})}$ algorithms obtained using
the Lipton-Tarjan planar separator theorem.
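The two compositions described above can be written out as a short calculation. The constants $c_1$ and $c_2$ below are hypothetical names for the constants hidden in the size bounds of the linear and the quadratic reductions, respectively; they do not appear in the entry.

```latex
% Linear reduction: the output graph has N + M \le c_1 (n + m).
\[
  2^{o(N+M)} \;=\; 2^{o(c_1 (n+m))} \;=\; 2^{o(n+m)} .
\]
% Planar case: quadratic blowup, N + M \le c_2 (n + m)^2.
\[
  2^{o(\sqrt{N+M})} \;=\; 2^{o(\sqrt{c_2}\,(n+m))} \;=\; 2^{o(n+m)} .
\]
```

In both cases the composed algorithm would solve 3-SAT in time $2^{o(n+m)}$, which ETH together with the Sparsification Lemma excludes.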

Of particular interest to us are applications
in parameterized complexity. Recall that a
parameterized problem is fixed-parameter tractable
(FPT) if there is an algorithm solving it in time
$f(k) \cdot n^c$, where n is the total input size, k is
the parameter, f is some computable function,
and c is a universal constant. Observe that if
we provide a reduction from 3-SAT to a
parameterized problem where the output parameter
depends linearly on n + m, then assuming ETH
we exclude the existence of a subexponential
parameterized algorithm, i.e., one with running
time $2^{o(k)} \cdot n^{O(1)}$. If the dependence of the output
parameter on n + m is different, then we obtain a
lower bound for a different function f. This idea
has been successfully applied to many
parameterizations and different running times.
For example, Lokshtanov et al. [8] introduced a
framework for proving lower bounds excluding
running times of the form $2^{o(k \log k)} \cdot n^{O(1)}$. The
framework can be used to show the optimality
of known FPT algorithms for several important
problems, with a notable example of CLOSEST
STRING.

More information on lower bounds based on 
ETH can be found in the survey of Lokshtanov 
et al. [7] or in the PhD thesis of the current 
author [9]. 


Problem Definition 


In the EDGE CLIQUE COVER problem, we are
given a graph G and an integer k, and the
question is whether one can find k complete
subgraphs $C_1, C_2, \ldots, C_k$ of G such that
$E(G) = \bigcup_{i=1}^{k} E(C_i)$. In other words, we have to cover
the whole edge set of G using k complete
subgraphs of G. Such a selection of k complete
subgraphs is called an edge clique cover.
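The condition in the problem definition is easy to check for a candidate solution. The following sketch assumes a simple undirected graph given as a list of edges and cliques given as sets of vertices; the function name `is_edge_clique_cover` is illustrative. It verifies a given cover and does not search for one.

```python
from itertools import combinations

def is_edge_clique_cover(edges, cliques):
    """Check that every clique is a complete subgraph of G and that
    the cliques together cover every edge of G."""
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cliques:
        for pair in combinations(sorted(clique), 2):
            pair = frozenset(pair)
            if pair not in edge_set:
                return False        # the vertex set is not a clique in G
            covered.add(pair)
    return covered == edge_set

# A triangle with a pendant edge: two cliques suffice.
g = [(0, 1), (1, 2), (0, 2), (2, 3)]
ok = is_edge_clique_cover(g, [{0, 1, 2}, {2, 3}])

# A 4-cycle contains no triangle, so {0, 1, 2} is not a valid clique.
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
bad = is_edge_clique_cover(c4, [{0, 1, 2}, {2, 3}, {3, 0}])
```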

The study of the parameterized complexity of 
EDGE CLIQUE COVER was initiated by Gramm 
et al. [3]. The main observation of Gramm et al. 
is the applicability of the following data reduction 
rule: as long as in G there exists a pair of perfect 
twins (i.e., adjacent vertices having exactly the 
same neighborhood), then it is safe to remove 
one of them (and decrement the parameter k 
in the case when these twins form an isolated 
edge in G). Once there are no perfect twins 
and no isolated vertices in the graph (the latter 
ones can be also safely removed), then one can 
easily show the following: there is no edge clique
cover of size less than log |V(G)|. Consequently,
instances with k < log |V(G)| can be discarded
as no-instances, and we are left with instances
satisfying $|V(G)| \leq 2^k$. In the language of
parameterized complexity, this is called a kernel
with $2^k$ vertices. By applying a standard covering
dynamic programming algorithm on this kernel,
we obtain an FPT algorithm for EDGE CLIQUE
COVER with running time $2^{2^{O(k)}} + |V(G)|^{O(1)}$;
the second summand is the time needed to apply
the data reduction rule exhaustively.
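The data reduction rule described above can be sketched as follows. This is a simple, unoptimized illustration, assuming a simple undirected graph stored as a dictionary from vertices to neighbor sets; the name `kernelize` and the convention of returning `None` for a recognized no-instance are choices made for this sketch, not the implementation of Gramm et al.

```python
import math

def kernelize(adj, k):
    """Exhaustively remove isolated vertices and perfect twins.

    Returns the reduced (adj, k), or None if the instance is recognized
    as a no-instance (k < 0, or k < log2 of the remaining vertex count).
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        # Isolated vertices can be safely removed.
        for v in [v for v, ns in adj.items() if not ns]:
            del adj[v]
            changed = True
        for u in list(adj):
            # A perfect twin of u is a neighbor with the same closed neighborhood.
            twin = next((v for v in adj[u]
                         if adj[u] | {u} == adj[v] | {v}), None)
            if twin is not None:
                if adj[u] == {twin}:
                    k -= 1          # the twins form an isolated edge
                for w in adj[u]:
                    adj[w].discard(u)
                del adj[u]
                changed = True
                break
    if k < 0 or (adj and k < math.log2(len(adj))):
        return None
    return adj, k

k4 = {v: {w for w in range(4) if w != v} for v in range(4)}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
```

On the complete graph `k4` with k = 1 the rule collapses the whole graph, whereas the 4-cycle `c4` has no perfect twins and is therefore left unchanged.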

Given the striking simplicity of the approach 
of Gramm et al. [3], the natural open question was 
whether the obtained double-exponential running 
time of the algorithm for EDGE CLIQUE COVER 
could be improved. 


Key Results 


This question has been resolved by Cygan 
et al. [1], who showed that, under ETH, the 
running time obtained by Gramm et al. [3] is 
essentially optimal. More precisely, they proved 
the following result: 


Lemma 1 There exists a polynomial-time algorithm
that, given an instance $\varphi$ of 3-SAT with n
variables and m clauses, constructs an equivalent
EDGE CLIQUE COVER instance (G, k) with
k = O(log n) and |V(G)| = O(n + m).


Thus, by considering a composition of the
reduction of Lemma 1 with a hypothetical
algorithm for EDGE CLIQUE COVER with running
time $2^{2^{o(k)}} \cdot |V(G)|^{O(1)}$, we obtain the following
lower bound:


Theorem 1 Unless ETH fails, there is no algorithm
for EDGE CLIQUE COVER with running
time $2^{2^{o(k)}} \cdot |V(G)|^{O(1)}$.


Curiously, Lemma 1 can be also used to 
show that the kernelization algorithm of Gramm 
et al. [3] is also essentially optimal. More 
precisely, we have the following theorem. 


Theorem 2 There exists a universal constant
$\varepsilon > 0$ such that, unless P = NP, there is no
constant $\lambda$ and a polynomial-time algorithm A
that takes an instance (G, k) of EDGE CLIQUE
COVER and outputs an equivalent instance
(G′, k′) of EDGE CLIQUE COVER with binary
encoding of length at most $\lambda \cdot 2^{\varepsilon k}$.


Lower Bounds Based on the Exponential Time Hypothesis: Edge Clique Cover 


The idea of the proof of Theorem 2 is to
consider the composition of three algorithms: (i)
the reduction of Lemma 1, (ii) a hypothetical
algorithm as in the statement of Theorem 2 for
a very small $\varepsilon > 0$, and (iii) a polynomial-time
reduction from EDGE CLIQUE COVER to 3-SAT,
whose existence follows from the fact that the
former problem is in NP and the latter one is
NP-hard. Since the constants hidden in the bounds
for algorithms (i) and (iii) are universal, for some
very small $\varepsilon > 0$ this composition would result
in an algorithm that takes any instance of 3-SAT
on n variables and shrinks it to an instance
that has total bitsize o(n), i.e., sublinear in n.
Hence, by applying this algorithm multiple times,
we would eventually shrink the instance at hand
to constant size, and then we could solve it
by brute force. As all the algorithms involved
run in polynomial time, this would imply that
P = NP.

We remark that Theorem 2 was not observed
in the original work of Cygan et al. [1], but
its proof can be found in the PhD thesis of the
current author [9], and it will appear in the
upcoming journal version of [1]. Also, in an earlier
work, Cygan et al. [2] proved a weaker statement
that EDGE CLIQUE COVER does not admit
a polynomial kernel unless $NP \subseteq coNP/poly$.
Theorem 2 shows that even a subexponential
kernel is unlikely under the weaker assumption
of $P \neq NP$.

Let us now shed some light on the proof of
Lemma 1, which is the crucial technical ingredient
of the results. The main idea is to base the
reduction on the analysis of the cocktail party
graph: for an integer n > 1, the cocktail party
graph $H_{2n}$ is defined as a complete graph on
2n vertices with a perfect matching removed.
Observe that $H_{2n}$ does not contain any perfect
twins, so it is immune to the data reduction rule of
Gramm et al. [3], and the minimum size of an edge
clique cover in $H_{2n}$ is at least 1 + log n. On the
other hand, it is relatively easy to construct a large
family of edge clique covers of $H_{2n}$ that have size
$2\lceil \log n \rceil$. Actually, the question of determining
the minimum size of an edge clique cover in $H_{2n}$
was studied from a purely combinatorial point
of view: Gregory and Pullman [4] proved that
it is equal to $\inf\{k : n \leq \binom{k-1}{\lceil k/2 \rceil}\}$, answering an
earlier question of Orlin. Note that this value is
larger than 1 + log n only by an additive term of
O(log log n).
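The Gregory and Pullman value can be tabulated numerically. The sketch below assumes the formula in the form $\min\{k : n \leq \binom{k-1}{\lceil k/2 \rceil}\}$ for n > 1 and logarithms to base 2; the function name `cocktail_party_cover_number` is illustrative.

```python
from math import comb, log2

def cocktail_party_cover_number(n):
    """Smallest k with n <= C(k - 1, ceil(k / 2)), assumed to be the minimum
    size of an edge clique cover of the cocktail party graph on 2n vertices."""
    assert n > 1
    k = 1
    while comb(k - 1, (k + 1) // 2) < n:   # (k + 1) // 2 equals ceil(k / 2)
        k += 1
    return k

# Compare the exact value with the 1 + log n lower bound for a few n.
table = [(n, cocktail_party_cover_number(n), 1 + log2(n))
         for n in (2, 3, 4, 16, 1000)]
```

For n = 2 the graph is a 4-cycle, whose four edges each need their own clique, and the formula indeed gives 4.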

Thus, the cocktail party graph provides a natural
example of a hard instance where the parameter
is logarithmic. The crux of the construction is
to start with the cocktail party graph $H_{2n}$ and, by
additional gadgeteering, force the solution inside
it to belong to the aforementioned family of edge
clique covers of size $2\lceil \log n \rceil$. The behavior of
these edge clique covers (called twin covers) can
be very well understood, and we can encode
the evaluation of the variables of the input 3-SAT
formula as a selection of a twin cover to
cover $H_{2n}$. In order to verify that the clauses
of the input formula are satisfied, we construct
additional clause gadgets. This involves only a
logarithmic number of additional cliques and
is based on careful constructions using binary
encodings.


Discussion 


After announcing the results of Cygan et al. [1], 
there was some discussion about their actual 
meaning. For instance, some authors suggested 
that the surprisingly high lower bound for EDGE 
CLIQUE COVER may be an argument against the
plausibility of the Exponential Time Hypothesis. 
Our view on this is quite different: the double- 
exponential lower bound suggests that EDGE 
CLIQUE COVER is an inherently hard problem, 
even though it may not seem as such at first 
glance. The relevant parameter in this problem 
is not really the number of cliques k, but 
rather $2^k$, the number of possible different
neighborhoods that can arise in a graph that 
is a union of k complete graphs. The lower 
bound of Cygan et al. [1] intuitively shows 
that one cannot expect to significantly reduce 
the number of neighborhoods that needs to be 
considered. 
Problem Definition


The dynamic connectivity problem requests 
maintenance of a graph G subject to the following 
operations: 


insert(u, v): insert an undirected edge (u, 
v) into the graph. 

delete(u, v): delete the edge (u, v) from the 
graph. 

connected(u, v): test whether u and v lie in
the same connected component.


Let m be an upper bound on the number of
edges in the graph. This entry discusses
cell-probe lower bounds for this problem. Let $t_u$ be
the complexity of insert and delete and $t_q$
the complexity of query.


The Partial-Sums Problem 

Lower bounds for dynamic connectivity are inti- 
mately related to lower bounds for another classic 
problem: maintaining partial sums. Formally, the 
problem asks one to maintain an array A[1..n] 
subject to the following operations: 


update(k, Δ): let A[k] ← Δ.

sum(k): returns the partial sum $\sum_{i=1}^{k} A[i]$.

testsum(k, σ): returns a boolean value indicating
whether sum(k) = σ.


To specify the problem completely, let elements
A[i] come from an arbitrary group G containing
at least $2^\delta$ elements. In the cell-probe model with
b-bit cells, let $t_u^\Sigma$ be the complexity of update
and $t_q^\Sigma$ the complexity of testsum (which is
also a lower bound on sum).

The tradeoffs between $t_u^\Sigma$ and $t_q^\Sigma$ are well
understood for all values of b and $\delta$. However,
this entry only considers lower bounds under
the standard assumptions that $b = \Omega(\lg n)$ and
$t_u^\Sigma \geq t_q^\Sigma$. It is standard to assume $b = \Omega(\lg n)$ for
upper bounds in the RAM model; this assumption
also means that the lower bound applies to the
pointer machine. Then, Patrascu and Demaine [6]
prove:


Theorem 1 The complexity of the partial-sums
problem satisfies: $t_q^\Sigma \cdot \lg(t_u^\Sigma / t_q^\Sigma) = \Omega(\delta/b \cdot \lg n)$.
Observe that this matches the textbook upper
bound using augmented trees. One can build
a balanced binary tree over A[1], ..., A[n] and
store in every internal node the sum of its subtree.
Then, updates and queries touch O(lg n) nodes
(and spend $O(\lceil \delta/b \rceil)$ time in each one due to the
size of the group). To decrease the query time,
one can use a B-tree.
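The augmented-tree upper bound can be sketched with an array-based balanced binary tree. The sketch assumes the group is the integers under addition (an abelian group, so the order in which subtree sums are combined does not matter; a non-abelian group such as a permutation group would need an order-preserving combination). The class name `PartialSums` and its layout are illustrative.

```python
class PartialSums:
    """Balanced binary tree over A[1..n]; each internal node stores its subtree sum."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        self.tree = [0] * (2 * self.size)   # leaves live at size .. 2*size - 1

    def update(self, k, delta):
        """A[k] <- delta, then repair the O(lg n) ancestors of the leaf."""
        i = self.size + k - 1
        self.tree[i] = delta
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sum(self, k):
        """Return A[1] + ... + A[k] by combining O(lg n) subtree sums."""
        total = 0
        lo, hi = self.size, self.size + k   # half-open leaf range [lo, hi)
        while lo < hi:
            if lo & 1:
                total += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

    def testsum(self, k, sigma):
        """Return whether sum(k) equals sigma."""
        return self.sum(k) == sigma

ps = PartialSums(5)
for index, value in enumerate([3, 1, 4, 1, 5], start=1):
    ps.update(index, value)
```

Both `update` and `sum` walk one root-to-leaf path (or two, for the range), which is where the O(lg n) node count in the text comes from.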


Relation to Dynamic Connectivity 

We now clarify how lower bounds for
maintaining partial sums imply lower bounds
for dynamic connectivity. Consider the
partial-sums problem over the group $G = S_n$, i.e., the
permutation group on n elements. Note that
$\delta = \lg(n!) = \Omega(n \lg n)$. It is standard to set
$b = O(\lg n)$, as this is the natural word size used
by dynamic connectivity upper bounds. This
implies $t_q^\Sigma \lg(t_u^\Sigma / t_q^\Sigma) = \Omega(n \lg n)$.

The lower bound follows from implementing
the partial-sums operations using dynamic
connectivity operations. Refer to Fig. 1. The vertices
of the graph form an integer grid of size n × n.
Each vertex is incident to at most two edges,
one edge connecting to a vertex in the previous
column and one edge connecting to a vertex
in the next column. Point (x, y) in the grid
is connected to point (x + 1, A[x](y)), i.e., the
edges between two adjacent columns describe the
corresponding permutation from the partial-sums
vector.

Dynamic Connectivity, Fig. 1 Constructing an
instance of dynamic connectivity that mimics
the partial-sums problem

To implement update(x, π), all the edges
between column x and x + 1 are first deleted and
then new edges are inserted according to π. This
gives $t_u^\Sigma = O(2n \cdot t_u)$. To implement testsum(x, π),
one can use n connected queries between
the pairs of points $(1, y) \leadsto (x + 1, \pi(y))$.
Then, $t_q^\Sigma = O(n \cdot t_q)$. Observe that the sum
query cannot be implemented as easily. Dynamic
connectivity is the main motivation to study the
testsum query.
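The reduction can be simulated end to end on small inputs. The sketch below makes several assumptions of its own: `NaiveDynamicConnectivity` is a stand-in that answers each query by breadth-first search (it is not an efficient dynamic connectivity structure), permutations are tuples over `range(n)`, all A[x] start as the identity, and the grid uses columns 1..n+1 so that every A[x] has a column to map into.

```python
from collections import deque

class NaiveDynamicConnectivity:
    """Stand-in structure: insert/delete edit an adjacency map, connected runs a BFS."""

    def __init__(self):
        self.adj = {}

    def insert(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def connected(self, u, v):
        seen, queue = {u}, deque([u])
        while queue:
            a = queue.popleft()
            if a == v:
                return True
            for b in self.adj.get(a, ()):
                if b not in seen:
                    seen.add(b)
                    queue.append(b)
        return False

class PermutationPartialSums:
    """Partial sums over S_n implemented with dynamic connectivity on a grid."""

    def __init__(self, n):
        self.n = n
        self.graph = NaiveDynamicConnectivity()
        self.A = {x: tuple(range(n)) for x in range(1, n + 1)}
        for x in range(1, n + 1):
            for y in range(n):
                self.graph.insert((x, y), (x + 1, y))

    def update(self, x, pi):
        # Delete the n edges between columns x and x + 1, then insert n new ones.
        for y in range(self.n):
            self.graph.delete((x, y), (x + 1, self.A[x][y]))
        for y in range(self.n):
            self.graph.insert((x, y), (x + 1, pi[y]))
        self.A[x] = tuple(pi)

    def testsum(self, x, sigma):
        # n connected queries between (1, y) and (x + 1, sigma(y)).
        return all(self.graph.connected((1, y), (x + 1, sigma[y]))
                   for y in range(self.n))

ps = PermutationPartialSums(3)
ps.update(1, (1, 2, 0))
ps.update(2, (0, 2, 1))
```

After these two updates the composition of A[1] followed by A[2] maps 0, 1, 2 to 2, 1, 0, so `ps.testsum(2, (2, 1, 0))` holds while other candidate permutations are rejected.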

The lower bound of Theorem 1 translates
into $n t_q \cdot \lg(2n t_u / n t_q) = \Omega(n \lg n)$; hence
$t_q \lg(t_u / t_q) = \Omega(\lg n)$. Note that this lower
bound implies $\max\{t_u, t_q\} = \Omega(\lg n)$. The
best known upper bound (using amortization
and randomization) is $O(\lg n (\lg\lg n)^3)$ [9]. For
any $t_u = \Omega(\lg n (\lg\lg n)^3)$, the lower bound
tradeoff is known to be tight. Note that the
graph in the lower bound is always a disjoint
union of paths. This implies optimal lower
bounds for two important special cases: dynamic
trees [8] and dynamic connectivity in plane
graphs [2].


Key Results 
Understanding Hierarchies 


Epochs 

To describe the techniques involved in the
lower bounds, first consider the sum query and
assume $\delta = b$. In 1989, Fredman and Saks [3]
initiated the study of dynamic cell-probe lower
bounds, essentially showing a lower bound
of $t_q^\Sigma \lg t_u^\Sigma = \Omega(\lg n)$. Note that this implies
$\max\{t_u^\Sigma, t_q^\Sigma\} = \Omega(\lg n / \lg\lg n)$.

At an intuitive level, their argument proceeded
as follows. The hard instance will have n random
updates, followed by one random query. Leave
r ≥ 2 to be determined. Looking back in time
from the query, one groups the updates into
exponentially growing epochs: the latest r updates are
epoch 1, the earlier $r^2$ updates are epoch 2, etc.
Note that epoch numbers increase going back in
time, and there are $O(\log_r n)$ epochs in total.
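The epoch decomposition can be illustrated with a few lines of code. The sketch assumes updates are numbered 1..n in chronological order (n being the latest), and that the oldest epoch is simply truncated when the updates run out; the function name `epochs` is illustrative.

```python
def epochs(n, r):
    """Group updates 1..n into epochs, looking back in time from the query.

    Epoch i covers r**i consecutive updates (the oldest epoch may be shorter).
    Returns (first_update, last_update) pairs, epoch 1 first.
    """
    assert r >= 2
    result = []
    end, size = n, r
    while end > 0:
        start = max(1, end - size + 1)
        result.append((start, end))
        end = start - 1
        size *= r
    return result

layout = epochs(100, 3)
```

With n = 100 and r = 3 this yields four epochs of sizes 3, 9, 27, and 61, consistent with the $O(\log_r n)$ bound on the number of epochs.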

For some epoch i, consider revealing to the
query all updates performed in all epochs
different from i. Then, the query reduces to a
partial-sums query among the updates in epoch i. Unless
the query is to an index below the minimum index
updated in epoch i, the answer to the query is
still uniformly random, i.e., has $\delta$ bits of entropy.
Furthermore, even if one is given, say, $r^i \delta / 100$
bits of information about epoch i, the answer
still has $\Omega(\delta)$ bits of entropy on average. This
is because the query and updates in epoch i are
uniformly random, so the query can ask for any
partial sum of these updates, uniformly at random.
Each of the $r^i$ partial sums is an independent
random variable of entropy $\delta$.

Now one can ask how much information is 
available to the query. At the time of the query, 
let each cell be associated with the epoch during 
which it was last written. Choosing an epoch i 
uniformly at random, one can make the following 
intuitive argument:


1. No cells written by epochs i+1, i+2, ...
can contain information about epoch i, as they
were written in the past.

2. In epochs 1, ..., i-1, a number of
$b\,t_u^{\Sigma} \cdot \sum_{j<i} r^j \le b\,t_u^{\Sigma} \cdot 2r^{i-1}$ bits were written.
This is less than $r^i \delta / 100$ bits of information
for $r > 200\,t_u^{\Sigma}$ (recall the assumption $\delta = b$).
By the above, this implies the query answer
still has $\Omega(\delta)$ bits of entropy.

3. Since i is uniformly random among $O(\log_r n)$
epochs, the query makes an expected
$O(t_q^{\Sigma} / \log_r n)$ probes to cells from epoch i.
All queries that make no cell probes to epoch i
have a fixed answer (entropy 0), and all other
queries have answers of entropy $\le \delta$. Since
an average query has entropy $\Omega(\delta)$, a query
must probe a cell from epoch i with constant
probability. That means $t_q^{\Sigma} / \log_r n = \Omega(1)$,
and $t_q^{\Sigma} = \Omega(\log_r n) = \Omega(\lg n / \lg t_u^{\Sigma})$.




One should appreciate the duality between the 
proof technique and the natural upper bounds 
based on a hierarchy. Consider an upper bound 
based on a tree of degree r. The last r random 
updates (epoch 1) are likely to be uniformly 
spread in the array. This means the updates touch 
different children of the root. Similarly, the $r^2$
updates in epoch 2 are likely to touch every node 
on level 2 of the tree, and so on. Now, the lower 
bound argues that the query needs to traverse 
a root-to-leaf path, probing a node on every level 
of the tree (this is equivalent to one cell from 
every epoch). 


Time Hierarchies 

Despite considerable refinement to the lower
bound techniques, the lower bound of
$\Omega(\lg n / \lg\lg n)$ was not improved until 2004.
Then, Patrascu and Demaine [6] showed an
optimal bound of $t_q^{\Sigma} \lg(t_u^{\Sigma} / t_q^{\Sigma}) = \Omega(\lg n)$, implying
$\max\{t_u^{\Sigma}, t_q^{\Sigma}\} = \Omega(\lg n)$. For simplicity,
the discussion below disregards the tradeoff and
just sketches the $\Omega(\lg n)$ lower bound.

Patrascu and Demaine’s [6] counting 
technique is rather different from the epoch 
technique; refer to Fig. 2. The hard instance 
is a sequence of n operations alternating between 
updates and queries. They consider a balanced 
binary tree over the time axis, with every leaf 
being an operation. Now for every node of the 
tree, they propose to count the number of cell 
probes made in the right subtree to a cell written 
in the left subtree. Every probe is counted exactly 
once, for the lowest common ancestor of the read 
and write times. 

Now focus on two sibling subtrees, each containing
k operations. The k/2 updates in the left
subtree, and the k/2 queries in the right subtree,
are expected to interleave in index space. Thus,
the queries in the right subtree ask for $\Omega(k)$ different
partial sums of the updates in the left subtree.
Thus, the right subtree “needs” $\Omega(k\delta)$ bits
of information about the left subtree, and this information
can only come from cells written in the
left subtree and read in the right one. This implies
a lower bound of $\Omega(k)$ probes, associated with
the parent of the sibling subtrees. This bound is
linear in the number of leaves, so summing up
over the tree, one obtains a total $\Omega(n \lg n)$ lower
bound, or $\Omega(\lg n)$ cost per operation.

Lower Bounds for Dynamic Connectivity, Fig. 2 Analysis of cell probes in the (a) epoch-based and (b) time-hierarchy techniques


An Optimal Epoch Construction 

Rather surprisingly, Patrascu and Tarnita [7] man- 
aged to reprove the optimal tradeoff of Theorem 1 
with minimal modifications to the epoch argu- 
ment. In the old epoch argument, the information 
revealed by epochs 1,...,i —1 about epoch i 
was bounded by the number of cells written in 
these epochs. The key idea is that an equally good 
bound is the number of cells read during epochs 
1,...,i — 1 and written during epoch i. 

In principle, all cell reads from epoch
i-1 could read data from epoch i, making
these two bounds identical. However, one can
randomize the epoch construction by inserting
the query after an unpredictable number of
updates. This randomization “smooths” out
the distribution of epochs from which cells
are read, i.e., a query reads $O(t_q^{\Sigma} / \log_r n)$
cells from every epoch, in expectation over the
randomness in the epoch construction. Then,
the $O(r^{i-1})$ updates in epochs 1, ..., i-1
only read $O(r^{i-1} \cdot t_u^{\Sigma} / \log_r n)$ cells from
epoch i. This is not enough information if
$r > t_u^{\Sigma} / \log_r n = \Theta(t_u^{\Sigma} / t_q^{\Sigma})$, which implies
$t_q^{\Sigma} = \Theta(\log_r n) = \Omega(\lg n / \lg(t_u^{\Sigma} / t_q^{\Sigma}))$.


Technical Difficulties 


Nondeterminism 

The lower bounds sketched above are based on
the fact that the sum query needs to output $\Omega(\delta)$
bits of information about every query. If dealing
with the decision testsum query, an argument
based on output entropy can no longer work.

The most successful idea for decision queries
has been to convert them to queries with non-boolean
output, in an extended cell-probe model
that allows nondeterminism. In this model, the
query algorithm is allowed to spawn an arbitrary
number of computation threads. Each thread can
make $t_q$ cell probes, after which it must either terminate
with a ‘reject’ answer, or return an answer
to the query. All nonrejecting threads must return
the same output. In this model, a query with
arbitrary output is equivalent to a decision query,
because one can just nondeterministically guess
the answer, and then verify it.

By the above, the challenge is to prove good
lower bounds for sum even in the nondeterministic
model. Nondeterminism shakes our view
that when analyzing epoch i, only cell probes
to epoch i matter. The trouble is that the query
may not know which of its probes are actually
to epoch i. A probe that reads a cell from a previous
epoch provides at least some information
about epoch i: no update in the epoch decided to
overwrite the cell. Earlier this was not a problem
because the goal was only to rule out the
case that there are zero probes to epoch i. Now,
however, different threads can probe any cell in
memory, and one cannot determine which threads
actually avoid probing anything in epoch i. In
other words, there is a covert communication
channel between epoch i and the query in which
the epoch can use the choice of which cells to
write in order to communicate information to the
query.

There are two main strategies for handling 
nondeterministic query algorithms. Husfeldt 
and Rauhe [4] give a proof based on some 
interesting observations about the combinatorics 
of nondeterministic queries. Patrascu and
Demaine [6] use the power of nondeterminism
itself to output a small certificate that rules out 
useless cell probes. The latter result implies 
the optimal lower bound of Theorem 1 for 
testsum and, thus, the logarithmic lower bound 
for dynamic connectivity. 


Alternative Histories 

The framework described above relies on fixing
all updates in epochs different from i to an average
value and arguing that the query answer
still has a lot of variability, depending on updates
in epoch i. This is true for aggregation problems
but not for search problems. If a searched item is
found with equal probability in any epoch, then
fixing all other epochs renders epoch i irrelevant
with probability $1 - 1/\log_r n$.

Alstrup et al. [1] propose a very interesting
refinement to the technique, proving
$\Omega(\lg n / \lg\lg n)$ lower bounds for an impressive
collection of search problems. Intuitively, their
idea is to consider $O(\log_r n)$ alternative histories
of updates, chosen independently at random.
Epoch i is relevant in at least one of the histories
with constant probability. On the other hand,
even if one knows what epochs 1 through i - 1
learned about epoch i in all histories, answering
a random query is still hard.


Bit-Probe Complexity 

Intuitively, if the word size is b = 1, the lower
bound for connectivity should be roughly
$\Omega(\lg^2 n)$, because a query needs $\Omega(\lg n)$
bits from every epoch. However, ruling out
anything except zero probes to an epoch turns
out to be difficult, for the same reason that
the nondeterministic case is difficult. Without
giving a very satisfactory understanding of this
issue, Patrascu and Tarnita [7] use a large bag
of tricks to show an $\Omega((\lg n / \lg\lg n)^2)$ lower
bound for dynamic connectivity. Furthermore,
they consider the partial-sums problem in $\mathbb{Z}_2$ and
show an $\Omega(\lg n / \lg\lg\lg n)$ lower bound, which is
a triply-logarithmic factor away from the upper
bound!


Applications 


The lower bound discussed here extends by easy 
reductions to virtually all natural fully dynamic 
graph problems [6]. 


Open Problems 


By far, the most important challenge for future
research is to obtain a lower bound of $\omega(\lg n)$
per operation for some dynamic data structure
in the cell-probe model with word size $O(\lg n)$.
Miltersen [5] specifies a set of technical conditions
for what qualifies as a solution to such
a challenge. In particular, the problem should be
a dynamic language membership problem.

For the partial-sums problem, though sum is
perfectly understood, testsum still lacks tight
bounds for certain ranges of parameters [6]. In
addition, obtaining tight bounds in the bit-probe
model for partial sums in $\mathbb{Z}_2$ appears to be rather
challenging.


Problem Definition


In the online bin packing problem, a sequence
of items with sizes in the interval (0, 1] arrive
one by one and need to be packed into bins,
so that each bin contains items of total size
at most 1. Each item must be irrevocably assigned
to a bin before the next item becomes
available. The algorithm has no knowledge about
future items. There is an unlimited supply of bins
available, and the goal is to minimize the total
number of used bins (bins that receive at least one
item).

The most common performance measure for
online bin packing algorithms is the asymptotic
performance ratio, or asymptotic competitive ratio,
which is defined as


$$R_{\mathrm{asy}}(A) := \limsup_{n \to \infty} \max_{L} \left\{ \frac{A(L)}{\mathrm{OPT}(L)} \;\middle|\; \mathrm{OPT}(L) = n \right\}. \qquad (1)$$


Hence, for any input L, the number of bins used 
by an online algorithm A is compared to the 
optimal number of bins needed to pack the same 
input. Note that calculating the optimal num- 
ber of bins might take exponential time; more- 
over, it requires that the entire input is known in 
advance. 


Key Results 


Yao showed that no online algorithm has performance
ratio less than 3/2 [7]. The following
construction is very important in the context of
proving lower bounds for online algorithms. Start
with an item of type 1 and size $1/2 + \varepsilon$, for some
very small $\varepsilon > 0$. Now, in each step, add an item
of the largest possible size of the form $1/s + \varepsilon$
that can fit with all previous items into a single
bin. That is, the second item has size $1/3 + \varepsilon$,
the third item has size $1/7 + \varepsilon$, etc. To be more
precise, it can be shown that the sizes in this input
sequence are given by $1/t_i + \varepsilon$ ($i \ge 1$), where $t_i$
is defined by


$$t_1 = 2, \qquad t_{i+1} = t_i (t_i - 1) + 1, \quad i \ge 1.$$


The first few numbers of this sequence are
2, 3, 7, 43, 1807. This sequence was first
examined by Sylvester [5].

Since we allow an additive constant to the
competitive ratio, in order to turn the above
set of items into an input that can be used to
prove a lower bound for any online algorithm,
we need to repeat each item in the input N times
for some arbitrarily large N. To summarize the
preceding discussion, the input has the following
form:


$$N \times \left(\tfrac{1}{2} + \varepsilon\right),\; N \times \left(\tfrac{1}{3} + \varepsilon\right),\; N \times \left(\tfrac{1}{7} + \varepsilon\right),\; N \times \left(\tfrac{1}{43} + \varepsilon\right),\; N \times \left(\tfrac{1}{1807} + \varepsilon\right),\; \ldots \qquad (2)$$


where the items are given to the algorithm in
order of nondecreasing size.
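
As a small illustration, the denominators used in this construction can be generated directly from the recurrence, and exact rational arithmetic shows how much room one bin has left after taking one item of each size (ignoring the small additive term). The function names sylvester and remaining_space are hypothetical and chosen only for this sketch.

```python
from fractions import Fraction


def sylvester(count):
    """First `count` terms of t_1 = 2, t_{i+1} = t_i * (t_i - 1) + 1."""
    terms = []
    t = 2
    for _ in range(count):
        terms.append(t)
        t = t * (t - 1) + 1
    return terms


def remaining_space(terms):
    """Space left in a unit bin after packing one item of size 1/t for each t."""
    return 1 - sum(Fraction(1, t) for t in terms)


terms = sylvester(5)
print(terms)
print(remaining_space(terms[:3]))  # space left after 1/2, 1/3 and 1/7
```

After the first k items the remaining space is exactly 1/(t_{k+1} - 1), which is slightly more than 1/t_{k+1}; this is why the next item still fits when the additive term is small enough.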

Brown and Liang independently gave a lower
bound of 1.53635 [2,3], using the sequence (2).
Van Vliet showed how to use the input defined 
by (2) to prove a lower bound of 1.54014. Van 
Vliet set up a linear programming formulation to 
define all possible online algorithms for this input 
to prove the lower bound. 

This works by characterizing online algo- 
rithms by which patterns they use and by how 
frequently they use them. A pattern is a multiset 
of items which fit together in one bin (have 
total size at most 1). As N tends to infinity, an 
online algorithm can be fully characterized by the 
fraction of bins that it packs according to each 
pattern. 

As an example, consider the two largest item
sizes in (2). The only valid patterns are (1,0),
(1,1), (0,2), where (x, y) means that there are
x items of size $1/2 + \varepsilon$ and y items of size $1/3 + \varepsilon$ in
the bin. The N smallest items arrive first, and the
online algorithm will pack them into bins with
patterns (1,1) and (0,2) (where the first choice
means that one item is now packed into it, and in
the future only one item of size $1/2 + \varepsilon$ is possibly
packed with it). Say it uses $x_1$ bins with pattern
(1,1) and $x_2$ with pattern (0,2); then we must
have $x_1 + 2x_2 \ge N$ or $x_1/N + 2x_2/N \ge 1$. In
the linear program, we use variables $x'_i = x_i/N$,
thus eliminating the appearance of the number N
altogether.

The input (2) appeared to be “optimal” to 
make life hard for online algorithms: the smallest 
items arrive first, and the input is constructed in 
such a way that each item is as large as possible 
given the larger items (that are defined first). 
Intuitively, larger items are more difficult to 
handle by online algorithms than smaller items. 
Surprisingly, however, in 2012 Balogh et al. [1]
managed to prove a lower bound of 248/161 ≈
1.54037 using a slight modification of the input.
Instead of using 1/43 as the fourth item size, they 
use 1/49 and then continue the construction in 
the same manner as before. We get the following 
input: 


1 1 
Jinx (Gre) x(sote 


For the input that consists only of the first four 
phases of this input, the resulting lower bound is 
now slightly lower than before, but this is more 
than compensated for by the next items. 


Open Problems 


Other variations of the input sequence (2) do not
seem to give better lower bounds. Yet there is
still a clear gap to the best known upper bound
of 1.58889 by Seiden [4]. Can we give a stronger
lower bound using some other construction?


Problem Definition


One of the most fundamental algorithmic problems
on trees is how to find the lowest common
ancestor (LCA) of a pair of nodes. The LCA of
nodes u and v in a tree is the shared ancestor
of u and v that is located farthest from the
root. More formally, the lowest common ancestor
(LCA) problem is:


Preprocess: A rooted tree T having n nodes. 

Query: For nodes u and v of tree T, query
$\mathrm{LCA}_T(u, v)$ returns the least common ancestor
of u and v in T, that is, it returns the node
farthest from the root that is an ancestor of
both u and v. (When the context is clear, we
drop the subscript T on the LCA.)


The goal is to optimize both the preprocessing
time and the query time. We will therefore refer
to the running time of an algorithm with preprocessing
time $T_P(N)$ and query time of $T_Q(N)$ as
having run time $(T_P(N), T_Q(N))$.

The LCA problem has been studied inten- 
sively both because it is inherently beautiful al- 
gorithmically and because fast algorithms for 
the LCA problem can be used to solve other 
algorithmic problems. 


Key Results 


In [7], Harel and Tarjan showed the surprising re- 
sult that LCA queries can be answered in constant 
time after only linear preprocessing of the tree 
T. This result was simplified over the course of 
several papers, and current solutions are based on 
combinations of four themes: 


1. The LCA problem is equivalent to the 
range minimum query (RMQ) problem, 
defined below, in that they can be reduced 
to each other in linear preprocessing time and 
constant query time. Thus, an optimal solution 
for one yields an optimal solution for the 
other. 

2. The LCA of certain trees, notably complete 
binary trees and trees that are linear paths, 
can be computed quickly. General trees can 
be decomposed into special trees. Similarly, 
the RMQ of certain classes of arrays can be 
computed quickly. 

3. Nodes can be labeled to capture information
about their position in the tree. These labels
can be used to compute the label of the LCA
of two nodes.


Harel and Tarjan [7] showed that LCA
computation has a lower bound of $\Omega(\log\log n)$
on a pointer machine. Therefore, the fast
algorithms presented here will all require
operations on O(log n)-bit words. We will see
how to use this assumption not only to make
queries O(1) time but to improve preprocessing
from O(n log n) to O(n) via Four Russians
encoding of small problem instances.

Below, we explore each of these themes for 
LCA computation. 


RMQ 


The range minimum query (RMQ) problem, 
which seems quite different from the LCA 
problem, is, in fact, intimately linked. It is defined 
as: 


Preprocess: A length n array A of numbers.

Query: For indices i and j between 1 and n,
query $\mathrm{RMQ}_A(i, j)$ returns the index of the
smallest element in the subarray A[i,...,j].
(When the context is clear, we drop the subscript
A on the RMQ.)


The following two lemmas give linear reduc- 
tions between LCA and RMQ. 


Reducing LCA to RMQ 

Lemma 1 ([3]) If there is a (f(n), g(n))-time
solution for RMQ, then there is a (f(2n - 1) +
O(n), g(2n - 1) + O(1))-time solution for
LCA.


Proof Let T be the input tree. The reduction 
relies on one key observation: 


Observation 1 The LCA of nodes u and v is the
shallowest node encountered between the visits to
u and to v during a depth-first search traversal
of T.


Therefore, the reduction proceeds as follows. 


1. Let array E[1,...,2n - 1] store the nodes
visited in an Euler tour of the tree T. (The
Euler tour of T is the sequence of nodes we
obtain if we write down the label of each node
each time it is visited during a DFS. The array
of the Euler tour has length 2n - 1 because we
start at the root and subsequently output a node
each time we traverse an edge. We traverse
each of the n - 1 edges twice, once in each
direction.) That is, E[i] is the label of the ith
node visited in the Euler tour of T.

2. Let the level of a node be its distance from the
root. Compute the Level Array L[1,...,2n -
1], where L[i] is the level of node E[i] of the
Euler tour.

3. Let the representative of a node in an Euler
tour be the index of the first occurrence of the
node in the tour (in fact, any occurrence of i
will suffice to make the algorithm work, but
we consider the first occurrence for the sake
of concreteness); formally, the representative
of i is $\arg\min_j \{E[j] = i\}$. Compute the
Representative Array R[1,...,n], where R[i]
is the index of the representative of node i.


Each of these three steps takes O(n)
time, yielding O(n) total time. To compute
$\mathrm{LCA}_T(u, v)$, we note the following:


— The nodes in the Euler tour between the first
visits to u and to v are E[R[u],..., R[v]] (or
E[R[v],..., R[u]]).

— The shallowest node in this subtour is at index
$\mathrm{RMQ}_L(R[u], R[v])$, since L[i] stores the level
of the node at E[i] and the RMQ will thus
report the position of the node with minimum
level.

— The node at this position is $E[\mathrm{RMQ}_L(R[u], R[v])]$,
which is thus the output of $\mathrm{LCA}_T(u, v)$,
by Observation 1.


Thus, we can complete our reduction by preprocessing
Level Array L for RMQ. As promised,
L is an array of size 2n - 1, and building it takes
time O(n). The total preprocessing is f(2n - 1) +
O(n). To calculate the query time, observe that an
LCA query in this reduction uses one RMQ query
in L and three array references at O(1) time each,
for a total of g(2n - 1) + O(1) time, and we have
completed the proof of the reduction. □
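
The reduction can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tree is given as a dictionary mapping each node to a list of its children (a representation chosen here, not taken from the text), arrays are 0-indexed rather than 1-indexed, and the RMQ on L is answered by a naive linear scan that stands in for whatever RMQ structure is plugged in.

```python
def euler_tour(children, root):
    """Return (E, L, R): Euler tour, level array, first-occurrence index."""
    E, L, R = [root], [0], {root: 0}
    # Iterative DFS; each stack entry is (node, depth, iterator over children).
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:
        node, depth, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:
                # Traversing the edge back up: record the parent again.
                parent, parent_depth, _ = stack[-1]
                E.append(parent)
                L.append(parent_depth)
        else:
            R.setdefault(child, len(E))
            E.append(child)
            L.append(depth + 1)
            stack.append((child, depth + 1, iter(children.get(child, []))))
    return E, L, R


def lca(E, L, R, u, v):
    """LCA via one range-minimum query on the level array L."""
    lo, hi = sorted((R[u], R[v]))
    # Naive RMQ stand-in: a real solution would query a preprocessed structure.
    k = min(range(lo, hi + 1), key=L.__getitem__)
    return E[k]


tree = {0: [1, 2], 1: [3, 4], 2: [5]}
E, L, R = euler_tour(tree, 0)
print(E)
print(lca(E, L, R, 3, 4), lca(E, L, R, 3, 5))
```

Replacing the linear scan in lca with any (f(n), g(n)) RMQ structure on L gives the bounds stated in Lemma 1.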


Reducing RMQ to LCA 

Lemma 2 ([6]) If there is a (f(n), g(n)) solution
for LCA, then there is a (f(n) + O(n), g(n)
+ O(1)) solution for RMQ.

Proof Let A[1,...,n] be the input array.

The Cartesian tree of an array is defined as fol- 
lows. The root of a Cartesian tree is the minimum 
element of the array, and the root is labeled with 
the position of this minimum. Removing the root 
element splits the array into two pieces. The left 
and right children of the root are the recursively 
constructed Cartesian trees of the left and right 
subarrays, respectively. 

A Cartesian tree can be built in linear time
as follows. Suppose $C_i$ is the Cartesian tree of
A[1,...,i]. To build $C_{i+1}$, we notice that node
i+1 will belong to the rightmost path of $C_{i+1}$, so
we climb up the rightmost path of $C_i$ until finding
the position where i+1 belongs. Each comparison
either adds an element to the rightmost path
or removes one, and each node can only join the
rightmost path and leave it once. Thus, the total
time to build $C_n$ is O(n).
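
The incremental construction can be sketched with an explicit stack holding the rightmost path. The representation below (two child-index arrays, 0-based indices, ties resolved so that the leftmost minimum becomes the ancestor) is an assumption of this sketch rather than something fixed by the text.

```python
def cartesian_tree(A):
    """Return (root, left, right) child-index arrays of the Cartesian tree of A."""
    n = len(A)
    left = [None] * n
    right = [None] * n
    stack = []  # indices on the rightmost path, root at the bottom
    for i in range(n):
        last = None
        # Climb the rightmost path past every element larger than A[i].
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        left[i] = last  # the subtree that was climbed past hangs to the left of i
        if stack:
            right[stack[-1]] = i  # i becomes the new right child
        stack.append(i)
    root = stack[0] if stack else None
    return root, left, right


root, left, right = cartesian_tree([3, 1, 4, 1, 5, 9, 2, 6])
print(root, left, right)
```

Every index is pushed once and popped at most once, which is the amortized argument behind the linear bound.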

The reduction is as follows. 


— Let C be the Cartesian tree of A. Recall that 
we associate with each node in C the index i 
corresponding to A[i]. 


Claim $\mathrm{RMQ}_A(i, j) = \mathrm{LCA}_C(i, j)$.


Proof Consider the least common ancestor, k, of
i and j in the Cartesian tree C. In the recursive
description of a Cartesian tree, k is the first
node that separates i and j. Thus, in the array
A, element A[k] is between elements A[i] and
A[j]. Furthermore, A[k] must be the smallest
such element in the subarray A[i,...,j] since,
otherwise, there would be a smaller element k' in
A[i,...,j] that would be an ancestor of k in C,
and i and j would already have been separated
by k'.

More concisely, since k is the first element
to split i and j, it is between them because it
splits them, and it is minimal because it is the first
element to do so. Thus, it is the RMQ. □


We can complete our reduction by preprocessing
the Cartesian tree C for LCA. Tree C takes
time O(n) to build, and because C is an n node
tree, LCA preprocessing takes f(n) time, for a
total of f(n) + O(n) time. The query then takes
g(n) + O(1), and we have completed the proof
of the reduction. □




An Algorithm for RMQ 

Observe that RMQ has a solution with complexity
$(O(n^2), O(1))$: build a table storing answers
to all of the $\binom{n}{2}$ possible queries. To achieve
$O(n^2)$ preprocessing rather than the $O(n^3)$ naive
preprocessing, we apply a trivial dynamic program.
Notice that answering an RMQ query now
requires just one array lookup.

To improve the $(O(n^2), O(1))$-time brute-force
table algorithm for RMQ to (O(n log n),
O(1)), precompute the result of all queries with
a range size that is a power of two. That is, for
every i between 1 and n and every j between
1 and log n, find the minimum element in the
block starting at i and having length $2^j$, that is,
compute $M[i, j] = \arg\min_{k = i, \ldots, i + 2^j - 1} \{A[k]\}$.
Table M therefore has size O(n log n), and it can
be filled in time O(n log n) by using dynamic
programming. Specifically, find the minimum
in a block of size $2^j$ by comparing the two
minima of its two constituent blocks of size
$2^{j-1}$. More formally, $M[i, j] = M[i, j - 1]$ if
$A[M[i, j - 1]] \le A[M[i + 2^{j-1}, j - 1]]$, and
$M[i, j] = M[i + 2^{j-1}, j - 1]$ otherwise.

How do we use these blocks to compute an
arbitrary RMQ(i, j)? We select two overlapping
blocks that entirely cover the subrange: let $2^k$ be
the size of the largest block that fits into the range
from i to j, that is, let $k = \lfloor \log(j - i) \rfloor$. Then
RMQ(i, j) can be computed by comparing the
minima of the following two blocks: i to $i + 2^k - 1$
(M[i, k]) and $j - 2^k + 1$ to j ($M[j - 2^k + 1, k]$).
These values have already been computed, so we
can find the RMQ in constant time.

This gives the Sparse Table (ST) algorithm for 
RMQ, with complexity (O(n logn), O(1)). 
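
A minimal sketch of the Sparse Table idea follows. It deviates from the text in two labeled ways: indices are 0-based, and the block exponent is taken as the floor of the logarithm of the range length j - i + 1, so that a query with i = j is also handled. The table is stored row by row, with M[j][i] playing the role of M[i, j].

```python
def build_sparse_table(A):
    """M[j][i] is the index of the minimum of A[i : i + 2**j]."""
    n = len(A)
    M = [list(range(n))]  # blocks of length 1
    j = 1
    while (1 << j) <= n:
        half = 1 << (j - 1)
        prev = M[j - 1]
        row = []
        for i in range(n - (1 << j) + 1):
            a, b = prev[i], prev[i + half]  # minima of the two half blocks
            row.append(a if A[a] <= A[b] else b)
        M.append(row)
        j += 1
    return M


def rmq(A, M, i, j):
    """Index of the minimum of A[i..j] inclusive, for 0 <= i <= j < len(A)."""
    k = (j - i + 1).bit_length() - 1  # floor(log2(range length))
    a, b = M[k][i], M[k][j - (1 << k) + 1]  # two overlapping blocks
    return a if A[a] <= A[b] else b


A = [3, 1, 4, 1, 5, 9, 2, 6]
M = build_sparse_table(A)
print(rmq(A, M, 2, 6), rmq(A, M, 4, 7))
```

Preferring the left candidate on ties makes the query return the leftmost minimum, which keeps the answers deterministic when values repeat.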


LCA on Special Trees and RMQ on 
Special Arrays 


In this section, we consider special cases of LCA 
and RMQ that have fast solutions. These can be 
used to build optimal algorithms. 


Paths and Balanced Binary Trees 

If a tree is a path, that is, every node has outdegree
1, then computing the LCA is quite trivial. In that
case, the depth of each node can be computed in
O(n) time, and the LCA of two nodes is the node
with smaller depth.

For a complete binary tree, the optimal algo- 
rithm is somewhat more involved. In this case, 
each node can be assigned a label of length 
O(logn) from which the LCA can be computed 
in constant time. This labeling idea can be ex- 
tended to general trees, as we will see in sec- 
tion “Labeling Schemes.” 

Consider the following node labeling: for any
node v of depth d(v), the label L(v) is obtained
by assigning to the first d(v) bits of the code the
left-right path from the root, where a left edge is
coded with a 0 and a right edge is coded with a 1.
The (d(v) + 1)st bit is 1, and all subsequent bits, up
to 1 + log n, are 0.

Now let x = L(u) XOR L(v) and let w =
LCA(u, v). The first d(w) bits of x are 0, since
u and v share the same path until then. If neither
node is an ancestor of the other, the next
bit differs, so the (d(w) + 1)st bit of x is the first
bit that is 1. Thus, by computing $\lfloor \log x \rfloor$, we find
the depth of w. (If one node is an ancestor of the
other, w is the shallower of the two, so in general
d(w) is the minimum of this value, d(u), and d(v).)
Then we can construct the label
of w by taking the first d(w) bits of L(u), then
a 1 at position d(w) + 1, and then 0s. All these
operations take constant time, and all labels can
be computed in linear time.
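
The labeling can be sketched with integer bit operations. The helper names label, depth, and lca_label and the parameter h (the height of the complete binary tree, so that labels have h + 1 bits) are assumptions of this sketch. The depth of the answer is capped by the depths of both nodes, which covers the case where one node is an ancestor of the other.

```python
def label(path_bits, h):
    """Label of the node reached from the root by path_bits (0 = left, 1 = right)."""
    d = len(path_bits)
    value = 0
    for bit in path_bits:
        value = (value << 1) | bit
    # Path bits first, then a single 1, then zeros, for h + 1 bits in total.
    return (value << (h + 1 - d)) | (1 << (h - d))


def depth(lab, h):
    """Depth of a node, read off from the position of the lowest set bit."""
    return h - ((lab & -lab).bit_length() - 1)


def lca_label(lu, lv, h):
    """Label of the LCA of the nodes labeled lu and lv."""
    x = lu ^ lv
    # Number of leading bits on which the two labels agree.
    common = h + 1 if x == 0 else (h + 1) - x.bit_length()
    d = min(common, depth(lu, h), depth(lv, h))
    shift = h + 1 - d
    return ((lu >> shift) << shift) | (1 << (h - d))


h = 3
u = label([0, 0, 1], h)
v = label([0, 1, 0], h)
print(bin(lca_label(u, v, h)), bin(label([0], h)))
```

On a machine with O(log n)-bit words, the XOR, the highest-set-bit computation, and the shifts are each a constant number of word operations.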

The first optimal LCA algorithm, by Harel and 
Tarjan [7], decomposed arbitrary trees into paths 
and balanced binary trees, using optimal labeling 
algorithms for each part. Such labeling schemes 
have been used as components in many of the 
subsequent algorithms. It turns out that there is an 
O(log n)-bit labeling scheme for arbitrary trees 
where the LCA can be computed in constant time 
just from labels. This algorithm will be discussed 
in section “Labeling Schemes.” 


An (O(n), O(1))-Time Algorithm for ±1RMQ

We have already seen an (O(n logn), O(1))- 
time algorithm for RMQ, which thus yields an 
LCA algorithm of the same complexity. However, 
it is possible to do better, via a simple observa- 
tion, plus the Four Russians technique. 

Consider the RMQ problem generated by the
reduction given in Lemma 1. The level tour of a
tree is not an arbitrary instance of RMQ. Rather,
we note that all entries are integers and adjacent
entries differ by one. We call this special case
±1RMQ.

If we can show an (O(n), O(1))-time algorithm
for ±1RMQ, we directly get an algorithm
of the same complexity for LCA, by Lemma 1,
but we also get an algorithm of the same complexity
for general RMQ by Lemma 2. Thus, to
solve an arbitrary RMQ problem optimally, first
compute the Cartesian tree and then the Euler and
Level tours, thus reducing an arbitrary RMQ to a
±1RMQ in linear time.

In order to improve the preprocessing of
±1RMQ, we will use a table lookup technique
to precompute answers on small subarrays,
for a log-factor speedup. To this end, partition
A into blocks of size $(\log n)/2$. Define an array
A’[1,...,2n/log n], where A’[i] is the minimum
element in the ith block of A. Define an equal size
array B, where B[i] is a position in the ith block
in which value A’[i] occurs. Recall that RMQ
queries return the position of the minimum and
that the LCA to RMQ reduction uses the position
of the minimum, rather than the minimum itself.
Thus, we will use array B to keep track of where
the minima in A’ came from.

The ST algorithm runs on array A’ in time
(O(n), O(1)). Having preprocessed A’ for RMQ,
consider how we answer any query RMQ(i, j) in
A. The indices i and j might be in the same
block, so we have to preprocess each block to
answer RMQ queries. If i < j are in different
blocks, then we can answer the query RMQ(i, j)
as follows. First compute the values:


1. The minimum from i forward to the end of its
block

2. The minimum of all the blocks between i’s 
block and j’s block 

3. The minimum from the beginning of j’s block 
to j 


The query will return the position of the minimum
of the three values computed. The second
minimum is found in constant time by an RMQ
on A’, which has been preprocessed using the ST
algorithm. But we need to know how to answer
range minimum queries inside blocks to compute
the first and third minima and thus to finish off the
algorithm. Thus, the in-block queries are needed
whether i and j are in the same block or not.

Therefore, we focus now only on in-block 
RMQs. If we simply performed RMQ prepro- 
cessing on each block, we would spend too much 
time in preprocessing. If two blocks were iden- 
tical, then we could share their preprocessing. 
However, it is too much to hope for that blocks 
would be so repeated. The following observation 
establishes a much stronger shared-preprocessing 
property. 


Observation 2 If two arrays X[1,...,k] and
Y[1,...,k] differ by some fixed value at each
position, that is, there is a c such that X[i] =
Y[i] + c for every i, then all RMQ answers will
be the same for X and Y. In this case, we can use
the same preprocessing for both arrays.


Thus, we can normalize a block by subtracting
its initial offset from every element. We now use
the ±1 property to show that there are very few
kinds of normalized blocks.


Lemma 3 There are $O(\sqrt{n})$ kinds of normalized
blocks.


Proof Adjacent elements in normalized blocks
differ by +1 or -1. Thus, normalized blocks are
specified by a ±1 vector of length $(1/2 \cdot \log n) - 1$.
There are $2^{(1/2 \cdot \log n) - 1} = O(\sqrt{n})$ such vectors. □


We are now basically done. We create $O(\sqrt{n})$
tables, one for each possible normalized block.
In each table, we put all $((\log n)/2)^2 = O(\log^2 n)$
answers to all in-block queries. This gives a total
of $O(\sqrt{n} \log^2 n)$ total preprocessing of normalized
block tables and O(1) query time. Finally,
compute, for each block in A, which normalized
block table it should use for its RMQ queries.
Thus, each in-block RMQ query takes a single
table lookup.
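
The shared in-block tables can be sketched as follows. This is an illustration under stated assumptions: the block size is passed in explicitly instead of being fixed to half the logarithm of n, tables are created lazily for the signatures that actually occur rather than for all possible ones, the last block may be shorter (so its length is part of the lookup key), and all function names are hypothetical.

```python
def block_signature(block):
    """Encode the +1/-1 steps of a block as a bitmask (bit t set if step t is +1)."""
    sig = 0
    for t in range(1, len(block)):
        if block[t] - block[t - 1] == 1:
            sig |= 1 << (t - 1)
    return sig


def in_block_table(sig, size):
    """All in-block RMQ answers for the normalized block with this signature."""
    vals = [0]
    for t in range(size - 1):
        vals.append(vals[-1] + (1 if (sig >> t) & 1 else -1))
    table = [[0] * size for _ in range(size)]
    for i in range(size):
        best = i
        for j in range(i, size):
            if vals[j] < vals[best]:
                best = j
            table[i][j] = best  # offset of the leftmost minimum in i..j
    return table


def preprocess_blocks(A, size):
    """Build one table per distinct (signature, length) and map blocks to keys."""
    tables, block_keys = {}, []
    for start in range(0, len(A), size):
        block = A[start:start + size]
        key = (block_signature(block), len(block))
        if key not in tables:
            tables[key] = in_block_table(key[0], key[1])
        block_keys.append(key)
    return tables, block_keys


def in_block_rmq(tables, block_keys, size, i, j):
    """Index in A of the minimum of A[i..j]; i and j must lie in the same block."""
    b = i // size
    start = b * size
    return start + tables[block_keys[b]][i - start][j - start]


A = [0, 1, 2, 1, 0, 1, 0, -1, 0, 1, 2, 1]
tables, keys = preprocess_blocks(A, 4)
print(len(tables), in_block_rmq(tables, keys, 4, 5, 7))
```

In the example, the first and third blocks have the same normalized shape, so they share one table, which is the effect Observation 2 is meant to capture.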

Overall, the total space and preprocessing used
for normalized block tables and A’ tables is O(n)
and the total query time is O(1). This gives an
optimal algorithm for LCA and RMQ. This algorithm
was first presented as a PRAM algorithm
by Berkman et al. [3]. Although this algorithm
is quite simple, and easily implementable, for
many years after its publication, LCA computation
was still considered to be too complicated
to implement. The algorithm presented here is
somewhat streamlined compared to Berkman et 
al.’s algorithm, because they had the added goal 
of making a parallelizable algorithm. Thus, for 
example, they used two levels of encoding to 
remove the log factor from the preprocessing, 
with the first level breaking the RMQ into blocks 
of size logn and in the second level breaking 
the blocks up into mini-blocks of size loglogn. 
Similarly, the sparse table algorithm was some- 
what different and required binary-tree LCA as a 
subroutine. 

It is possible, even probable, that the slight
complexities of the PRAM version of this algorithm
obscured its elegance. This theory was
tested by Bender and Farach-Colton [2], who 
presented the sequential version of the same algo- 
rithm with the simplified Sparse Table and RMQ 
blocking scheme presented here, with the goal of 
establishing the practicality of LCA computation. 
This seems to have done the trick, and many 
variants and implementations of this algorithm 
now exist. 


Labeling Schemes 


We have already seen a labeling scheme that 
allows for the fast computation of LCA on com- 
plete binary trees. But labels can be used to solve 
the LCA problem on arbitrary trees, as shown by 
Alstrup et al. [1]. 

To be specific, the goal is to assign to every
node an O(log n)-bit label so that the label of the
LCA of two nodes can be computed in constant
time from their labels. We have seen how to do
this for a complete binary tree, but the problem
there was simplified by the fact that the depth of
such a tree is O(log n) and the branching factor
is two. Here, we consider the general case of
arbitrary depth and degree.

Begin by decomposing the tree into heavy 
paths. To do so, let the weight w(v) of any 
node v be the number of nodes in the subtree 
rooted at that node. All edges between a node 
and its heaviest child are called heavy edges 
and all other edges are called light edges. 
Ties are broken arbitrarily. Note that there
are O(log n) light edges on any root-leaf
path.
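
As a small illustration of the decomposition, the following Python sketch classifies edges as heavy or light and counts the light edges above a node. The function names and the child-list tree representation are assumptions made for this example, not part of the original presentation. The logarithmic bound holds because a light child carries at most half the weight of its parent, so each light edge at least halves the subtree weight.

```python
def heavy_light_edges(children, root=0):
    """Classify tree edges as heavy or light.

    `children[v]` lists the children of node v. Returns (weight, heavy_child),
    where weight[v] is the number of nodes in the subtree of v and
    heavy_child[v] is the child reached by the heavy edge (None for a leaf).
    """
    n = len(children)
    weight = [1] * n
    heavy_child = [None] * n
    # Iterative preorder, so deep trees do not hit the recursion limit.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    for v in reversed(order):          # descendants are processed before v
        for c in children[v]:
            weight[v] += weight[c]
            if heavy_child[v] is None or weight[c] > weight[heavy_child[v]]:
                heavy_child[v] = c     # ties keep the first child seen
    return weight, heavy_child

def light_edges_to(v, parent, heavy_child):
    """Count the light edges on the path from the root to v."""
    count = 0
    while parent[v] is not None:
        if heavy_child[parent[v]] != v:
            count += 1
        v = parent[v]
    return count

# Example tree: 0 -> {1, 2}, 1 -> {3, 4}, 4 -> {5}
children = [[1, 2], [3, 4], [], [], [5], []]
parent = [None, 0, 0, 1, 1, 4]
weight, heavy_child = heavy_light_edges(children)
light_counts = [light_edges_to(v, parent, heavy_child) for v in range(6)]
```

In this example the heavy path from the root is 0, 1, 4, 5, and nodes 2 and 3 each hang off it by a single light edge.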

A path to a node can be specified by alternately
specifying how far one traverses a heavy path
before exiting to a light edge and the rank of
the light edge at the point of exit. Each such
code takes O(log n) bits, yielding $O(\log^2 n)$ bits
in total.

To reduce the number of bits to O(log n),
Alstrup et al. applied alphabetic codes, which
preserve lexicographic ordering but are of variable
length. In particular, they used a code with
the following properties:


- Let $Y = y_1, y_2, \ldots, y_k$ be a sequence of
integers with $\sum_i y_i = s$.

- There exists an alphabetic sequence
$B = b_1, b_2, \ldots, b_k$ for Y such that, for all i,
$|b_i| \le \lceil \log s \rceil - \lfloor \log y_i \rfloor$.


The idea now is that large trees get short 
codes and small trees get large codes. By cleverly 
building alphabetic codes that depend on the size 
of the trees, the code lengths telescope, giving a
final code of length O(log n). We refer the reader
to the paper or to the excellent presentation by 
Bille [4] for details. 


Succinct Representations 


There have been several extensions of the 
LCA/RMQ problem. The one that has received 
the most attention is that of succinct structures 
to compute the LCA or RMQ. These structures 
are succinct because they use O(n) bits of extra
space, as opposed to the structures above, which
use Ω(n) words of memory, or Ω(n log n) bits.

The first succinct solution for LCA was due 
to Sadakane [10], who gave an optimal LCA algorithm
using $2n + O(n (\log \log n)^2 / \log n)$ bits.
The main approach of this algorithm is to replace 
the Sparse Table algorithm with a more bit- 
efficient variant. 

The first succinct solution for RMQ was also
due to Sadakane [9]. His algorithm takes 4n +
o(n) bits. The main idea, once again, follows the
Sparse Table algorithm. By using a ternary Cartesian
tree which stores values not in internal nodes
but in leaves, the preorder of the nodes coincides
with the order of the values.

The current best solution is by Fischer [5],
who uses 2n + o(n) bits, which is shown to be
optimal up to lower-order terms. The structure
with the smallest known o(n)-bit term [8] uses
$2n + O(n / \log^c n)$ bits, for any constant c. All
of these succinct solutions answer queries in O(1) time.
Problem Definition


Linear and integer programs have played a crucial 
role in the theory of approximation algorithms for 
combinatorial optimization problems. While they 
have also been central in identifying polynomial 
time solvable problems, it is only recently that 
these tools have been put to use in designing exact 
algorithms for NP-complete problems. Following 
the paradigm of above-guarantee parameteriza- 
tion in fixed-parameter tractability, these efforts 
have focused on designing algorithms where the 
exponential component of the running time de- 
pends only on the excess of the solution above 
the optimum value of a linear program for the 
problem. 


Method Description 

The linear program obtained from a given integer 
linear program (ILP) by relaxing the integrality 
conditions on the variables is called the stan- 
dard relaxation of the ILP or the standard LP. 
Similarly, the linear program obtained from the 
ILP by restricting the domain of the variables 
to the set of all half-integers is called the half-integral
relaxation of the ILP or the half-integral LP.
The standard LP is said to have a half-integral
optimum if the optimum values of the
standard relaxation and half-integral relaxation 
coincide. 

The VERTEX COVER problem provides one of 
the simplest illustrations of this method. In this 
problem, the objective is to find a minimum-sized 
subset of vertices whose removal makes a given 
graph edgeless. In the well-known integer linear
programming (ILP) formulation for VERTEX
COVER, given a graph G, a feasible solution is
defined as a function $x : V \to \{0, 1\}$ satisfying
the edge constraints $x(u) + x(v) \ge 1$ for every
edge (u, v). The objective of the linear program is
to minimize $\sum_{u \in V} x(u)$ over all feasible solutions
x. The value of the optimum solution to this ILP
is denoted by vc(G). In the standard relaxation
of the above ILP, the constraint $x(v) \in \{0, 1\}$ is
replaced with $x(v) \ge 0$, for all $v \in V$. For a
graph G, this relaxation is denoted by LPVC(G),
and the minimum value of LPVC(G) is denoted
by $vc^*(G)$.
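
To make the two quantities concrete, the following Python sketch computes vc(G) and vc*(G) for a triangle by brute force. It is an illustration only, not the method of the cited papers: the function name `min_cover_value` is hypothetical, and restricting the relaxed variables to {0, 1/2, 1} relies on the known fact that LPVC(G) has a half-integral optimum.

```python
from fractions import Fraction
from itertools import product

def min_cover_value(vertices, edges, domain):
    """Minimize sum x(u) subject to x(u) + x(v) >= 1 on every edge,
    with each x(u) restricted to `domain`.

    Exhaustive search, exponential in the number of vertices: tiny graphs only.
    """
    best = None
    for values in product(domain, repeat=len(vertices)):
        x = dict(zip(vertices, values))
        if all(x[u] + x[v] >= 1 for u, v in edges):
            total = sum(values)
            if best is None or total < best:
                best = total
    return best

HALF = Fraction(1, 2)
triangle_vertices = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]

# Integral solutions give vc(G); half-integral solutions give vc*(G).
vc = min_cover_value(triangle_vertices, triangle_edges, (0, 1))
vc_star = min_cover_value(triangle_vertices, triangle_edges, (0, HALF, 1))
excess = vc - vc_star
```

For the triangle, any vertex cover needs two vertices, while assigning 1/2 to every vertex satisfies all edge constraints at cost 3/2, so the excess above the LP optimum is 1/2.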

It is known that LPVC(G) has a half-integral 
optimum [10] and that LPVC(G) is persistent
[11], that is, if a variable is assigned 0 (respectively
1) in an optimum solution to the standard
LP, then it can be discarded from (respectively
included into) an optimum vertex cover
of G. Based on the persistence of LPVC(G), 
a polynomial time preprocessing procedure for 
VERTEX COVER immediately follows. More pre- 
cisely, as long as there is an optimum solu- 
tion to the standard LP which assigns 0 or 1 
to a vertex of G, one may discard or include 
this vertex in the optimum solution. When this 
procedure cannot be executed any longer, an 
arbitrary vertex of G is selected, and the al- 
gorithm branches into 2 exhaustive cases based 
on this vertex being included or excluded in an 
optimum vertex cover of G. Standard analysis
shows that this is an algorithm running in time
$O(4^{vc(G) - vc^*(G)} |G|^{O(1)})$.


Key Results


This method was first used in the context of
fixed-parameter tractability by Guillemot [5],
who used it to give FPT algorithms for path-transversal
and cycle-transversal problems.
Subsequently, Cygan et al. [4] improved upon 
this result to give an FPT algorithm for the 
MULTIWAY CUT problem parameterized above 
the LP value. In this problem, the objective is 
to find a smallest set of vertices which pair-wise 
separates a given subset of vertices of a graph. As 
a consequence of this algorithm, they were able 
to obtain the current fastest FPT algorithm for 
MULTIWAY CUT parameterized by the solution 
size. 


Theorem 1 ([4]) There is an algorithm for
MULTIWAY CUT running in time $O(2^k |G|^{O(1)})$,
where k is the size of the solution.


Following this work, Narayanaswamy 
et al. [9] and Lokshtanov et al. [8] considered the 
VERTEX COVER problem and built upon these 
methods with several additional problem-specific 
steps to obtain improved FPT algorithms for 
several problems, with the most notable among 
them being the ODD CYCLE TRANSVERSAL 
problem — the problem of finding a smallest set 
of vertices to delete in order to obtain a bipartite 
graph. These results were the first improvements 
over the very first FPT algorithm for this problem 
given by Reed, Smith, and Vetta [12]. 


Theorem 2 ([8]) There is an algorithm for
ODD CYCLE TRANSVERSAL running in time
$O(2.32^k |G|^{O(1)})$, where k is the size of the
solution.


Iwata et al. [7] applied this method to
several problems to improve the polynomial
dependence of the running times on the input
size. Using network-flow-based linear time
algorithms for solving the half-integral Vertex
Cover LP (LPVC), they obtained an FPT
algorithm for ODD CYCLE TRANSVERSAL
with a linear dependence on the input size.
Most recently, using tools from the theory of
constraint satisfaction, Wahlström [13] extended
this approach to a much broader class of
problems with half-integral LPs and obtained
improved FPT algorithms for a number of
problems including node-deletion UNIQUE
LABEL COVER and GROUP FEEDBACK VERTEX
SET. The UNIQUE LABEL COVER problem plays
a central role in the theory of approximation
and was studied from the point of view of
parameterized complexity by Chitnis et al. [2].
The GROUP FEEDBACK VERTEX SET problem
is a generalization of the classical FEEDBACK
VERTEX SET problem. The fixed-parameter
tractability of this problem was proved in [5]
and [3].


Theorem 3 ([2]) There is an algorithm for
node-deletion UNIQUE LABEL COVER running
in time $O(|\Sigma|^{2k} |G|^{O(1)})$ and an algorithm for
GROUP FEEDBACK VERTEX SET running in time
$O(4^k |G|^{O(1)})$. In the first case, $|\Sigma|$ denotes the
size of the alphabet, and in either case, k denotes
the size of the solution.


Applications 


This method relies crucially on the half-integrality
of a certain LP for the problem at hand. The
most well-known problems with this property are
VERTEX COVER, MULTIWAY CUT, and certain
problems for which Hochbaum [6] defined a particular
kind of ILPs, referred to as IP2. However,
the work of Wahlström [13] lifts this approach to
a more general class of problems by interpreting
a half-integral relaxation as a polynomial-time
solvable problem on the discrete search space
$\{0, \frac{1}{2}, 1\}^n$.


Open Problems 


A primary challenge here is to build upon these 
LP-based tools to design an FPT algorithm for 
ODD CYCLE TRANSVERSAL with a provably 
optimal dependence on the parameter under ap- 
propriate complexity theoretic assumptions. 


Experimental Results 


Experimental results comparing algorithms for
VERTEX COVER based on this method with other
state-of-the-art empirical methods are given in [1].
Problem Definition


Error-correcting codes are fundamental tools 
used to transmit digital information over 
unreliable channels. Their study goes back to the 
work of Hamming and Shannon, who used them 
as the basis for the field of information theory. 
The problem of decoding the original information 
up to the full error-correcting potential of the 
system is often very complex, especially for 
modern codes that approach the theoretical limits 
of the communication channel. 

LP decoding [4, 5, 8] refers to the appli- 
cation of linear programming (LP) relaxation 
to the problem of decoding an error-correcting 
code. Linear programming relaxation is a stan- 
dard technique in approximation algorithms and 
operations research, and is central to the study of 
efficient algorithms to find good (albeit subopti- 
mal) solutions to very difficult optimization prob- 
lems [13]. LP decoders have tight combinatorial 
characterizations of decoding success that can be
used to analyze error-correcting performance. 

The codes for which LP decoding has received 
the most attention are low-density parity-check 
(LDPC) codes [9], due to their excellent error- 
correcting performance. The LP decoder is par- 
ticularly attractive for analysis of these codes 
because the standard message-passing algorithms 
such as belief propagation (see [15]) used for 
decoding are often difficult to analyze, and indeed 
the performance of LP decoding is closely tied to 
these methods. 


Error-Correcting Codes and Maximum-Likelihood Decoding

This section begins with a very brief overview of 
error-correcting codes, sufficient for formulating 
the LP decoder. Some terms are not defined 
for space reasons; for a full treatment of error- 
correcting codes in context, the reader is referred 
to textbooks on the subject (e.g., [11]). 

A binary error-correcting code is a subset
$C \subseteq \{0,1\}^n$. The rate of the code C is
$r = \log(|C|)/n$. A linear binary code is a linear
subspace of $\{0,1\}^n$. A codeword is a vector
$y \in C$. Note that $0^n$ is always a codeword of
a linear code, a fact that will be useful later. When
the code is used for communication, a codeword
$\dot{y} \in C$ is transmitted over a noisy channel,
resulting in some received word $\hat{y} \in \Sigma^n$, where
$\Sigma$ is some alphabet that depends on the channel
model. Generally in LP decoding a memoryless,
symmetric channel is assumed. One common
such channel is the binary symmetric channel
(BSC) with parameter p, which will be referred
to as $\mathrm{BSC}_p$, where 0 < p < 1/2. In the $\mathrm{BSC}_p$,
the alphabet is $\Sigma = \{0,1\}$, and for each i, the
received symbol $\hat{y}_i$ is equal to $\dot{y}_i$ with probability
1 − p, and $\hat{y}_i = 1 - \dot{y}_i$ otherwise. Although LP
decoding works with more general channels,
this chapter will focus on the $\mathrm{BSC}_p$.

The maximum-likelihood (ML) decoding
problem is the following: given a received
word $\hat{y} \in \{0,1\}^n$, find the codeword $y^* \in C$
that is most likely to have been sent over the
channel. Defining the vector $\gamma \in \{-1,+1\}^n$
where $\gamma_i = 1 - 2\hat{y}_i$, it is easy to show:

$$y^* = \arg\min_{y \in C} \sum_i \gamma_i y_i. \qquad (1)$$

The complexity of the ML decoding problem depends
heavily on the code being used. For simple
codes such as a repetition code $C = \{0^n, 1^n\}$, the
task is easy. For more complex (and higher-rate)
codes such as LDPC codes, ML decoding is NP-hard [1].
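
For a tiny code, the minimization in (1) can be carried out by exhaustive search. The following Python sketch is an illustration only: the function names are hypothetical, and the search is exponential in general, which is consistent with the hardness remark above. On the binary symmetric channel the cost of a codeword equals its Hamming distance to the received word minus the number of ones in the received word, so minimizing the cost selects a nearest codeword.

```python
def cost_vector(received):
    """gamma_i = 1 - 2 * received_i: a received 0 costs +1, a received 1 costs -1."""
    return [1 - 2 * bit for bit in received]

def cost(gamma, y):
    """Objective of (1) for a candidate codeword y."""
    return sum(g * yi for g, yi in zip(gamma, y))

def ml_decode(code, received):
    """Brute-force version of (1): minimize the cost over all codewords."""
    gamma = cost_vector(received)
    return min(code, key=lambda y: cost(gamma, y))

def hamming_distance(a, b):
    """Number of positions in which a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

# Length-5 repetition code C = {00000, 11111}.
repetition = [(0,) * 5, (1,) * 5]
received = (1, 0, 1, 1, 0)
decoded = ml_decode(repetition, received)
```

Here the received word is at distance 2 from the all-ones codeword and distance 3 from the all-zeros codeword, so the all-ones codeword is returned.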
Since ML decoding can be very hard in general,
one turns to sub-optimal solutions that can be
found efficiently. LP decoding, instead of trying
to solve (1), relaxes the constraint $y \in C$, and
instead requires that $y \in P$ for some succinctly
describable linear polytope $P \subseteq [0, 1]^n$, resulting
in the following linear program:

$$y_{LP} = \arg\min_{y \in P} \sum_{i=1}^{n} \gamma_i y_i. \qquad (2)$$

It should be the case that the polytope includes all 
the codewords, and does not include any integral 
non-codewords. As such, a polytope P is called 
proper for code C if $P \cap \{0, 1\}^n = C$.

The LP decoder works as follows. Solve the
LP in (2) to obtain $y_{LP} \in [0, 1]^n$. If $y_{LP}$ is integral
(i.e., all elements are 0 or 1), then output $y_{LP}$.
Otherwise, output “error”. By the definition of 
a proper polytope, if the LP decoder outputs 
a codeword, it is guaranteed to be equal to the 
ML codeword y*. This fact is known as the ML 
certificate property. 


Comparing with ML Decoding 

A successful decoder is one that outputs the orig- 
inal codeword transmitted over the channel, and 
so the quality of an algorithm is measured by the 
likelihood that this happens. (Another common 
non-probabilistic measure is the worst-case per- 
formance guarantee, which measures how many 
bit-flips an algorithm can tolerate and still be 
guaranteed to succeed.) Note that $y^*$ is the one
most likely to be the transmitted codeword $\dot{y}$, but
it is not always the case that $y^* = \dot{y}$. However, no
decoder can perform better than an ML decoder,
and so it is useful to use ML decoding as a basis
for comparison.

LP Decoding, Fig. 1 A decoding polytope P
(dotted line) and the convex hull C (solid line)
of the codewords $\dot{y}$, $y_1$, $y_2$, and $y_3$. Also shown
are the four possible cases (a-d) for the objective
function, and the normal cones to both P and C

Figure 1 provides a geometric perspective of
LP decoding, and its relation to exact ML decoding.
Both decoders use the same LP objective
function, but over different constraint sets.
In exact ML decoding, the constraint set is the
convex hull C of codewords (i.e., the set of points
that are convex combinations of codewords from
C), whereas relaxed LP decoding uses the larger
polytope P. In Fig. 1, the four arrows labeled
(a)-(d) correspond to different “noisy” versions
of the LP objective function. (a) If there is very
little noise, then the objective function points to
the transmitted codeword $\dot{y}$, and thus both ML
decoding and LP decoding succeed, since both
have the transmitted codeword $\dot{y}$ as the optimal
point. (b) If more noise is introduced, then ML
decoding succeeds, but LP decoding fails, since
the fractional vertex $y'$ is optimal for the relaxation.
(c) With still more noise, ML decoding
fails, since $y_3$ is now optimal; LP decoding still
has a fractional optimum $y'$, so this error is in
some sense “detected”. (d) Finally, with a lot of
noise, both ML decoding and LP decoding have
$y_3$ as the optimum, and so both methods fail
and the error is “undetected”. Note that in the
last two cases (c, d), when ML decoding fails,
the failure of the LP decoder is in some sense
the fault of the code itself, as opposed to the
decoder.
Normal Cones and C-Symmetry
The (negative) normal cone at $\dot{y}$ (also called the
fundamental cone [10]) is defined as follows for P and for C:

$$N_{\dot{y}}(P) = \{\gamma \in \mathbb{R}^n : \sum_i \gamma_i (y_i - \dot{y}_i) \ge 0 \text{ for all } y \in P\},$$

$$N_{\dot{y}}(C) = \{\gamma \in \mathbb{R}^n : \sum_i \gamma_i (y_i - \dot{y}_i) \ge 0 \text{ for all } y \in C\}.$$

Note that $N_{\dot{y}}(P)$ corresponds to the set of cost
vectors $\gamma$ such that $\dot{y}$ is an optimal solution
to (2). The set $N_{\dot{y}}(C)$ has a similar interpretation
as the set of cost vectors $\gamma$ for which $\dot{y}$ is the
ML codeword. Since $P \supseteq C$, it is immediate
from the definition that $N_{\dot{y}}(C) \supseteq N_{\dot{y}}(P)$ for all
$\dot{y} \in C$. Fig. 1 shows these two cones and their
relationship.

The success probability of an LP decoder is
equal to the total probability mass of $N_{\dot{y}}(P)$,
under the distribution on cost vectors defined
by the channel. The success probability of ML
decoding is similarly related to the probability
mass in the normal cone $N_{\dot{y}}(C)$. Thus, the discrepancy
between the normal cones of P and C
is a measure of the gap between exact ML and
relaxed LP decoding.

This analysis is specific to a particular transmitted
codeword $\dot{y}$, but one would like to apply it
in general. When dealing with linear codes, for
most decoders one can usually assume that an
arbitrary codeword is transmitted, since the decision
region for decoding success is symmetric.
The same holds true for LP decoding (see [4] for
proof), as long as the polytope P is C-symmetric,
defined as follows:


Definition 1 A proper polytope P for the
binary code C is C-symmetric if, for all
$y \in P$ and $\dot{y} \in C$, it holds that $y' \in P$, where
$y'_i = |y_i - \dot{y}_i|$.


Using a Dual Witness to Prove Error Bounds

In order to prove that LP decoding succeeds,
one must show that $\dot{y}$ is the optimal solution to
the LP in (2). If the code C is linear, and the
relaxation is proper and C-symmetric, one can
assume that $\dot{y} = 0^n$, and then show that $0^n$ is
optimal. Consider the dual of the decoding LP
in (2). If there is a feasible point of the dual LP
that has the same cost (i.e., zero) as the point $0^n$
has in the primal, then $0^n$ must be an optimal
point of the decoding LP. Therefore, to prove that
the LP decoder succeeds, it suffices to exhibit
a zero-cost point in the dual. (Actually, since the
existence of the zero-cost dual point only proves
that $0^n$ is one of possibly many primal optima,
one needs to be a bit more careful, a minor
issue deferred to more complete treatments of this
material.)


Key Results 


LP decoders have mainly been studied in the 
context of Low-Density Parity-Check codes [9], 
and their generalization to expander codes [12]. 
LP decoders for Turbo codes [2] have also been 
defined, but the results are not as strong. This 
summary of key results gives bounds on the word 
error rate (WER), which is the probability, over 
the noise in the channel, that the decoder does 
not output the transmitted word. These bounds 
are relative to specific families of codes, which
are defined as infinite sets of codes of increasing
length whose rate is bounded from below by 
some constant. Here the bounds are given in 
asymptotic form (without constants instantiated), 
and only for the binary symmetric channel. 
Many other important results that are not listed
here are known for LP decoding and related 
notions. Some of these general areas are surveyed 
in the next section, but there is insufficient space 
to reference most of them individually; the reader 
is referred to [3] for a thorough bibliography. 


Low-Density Parity-Check Codes 

The polytope P for LDPC codes, first defined 
in [4, 8, 10], is based on the underlying Tanner 
graph of the code, and has a linear number of 
variables and constraints. If the Tanner graph 
expands sufficiently, it is known that LP decoding 
can correct a constant fraction of errors in the 
channel, and thus has an inverse exponential error 
rate. This was proved using a dual witness: 


Theorem 1 ([6]) For any rate r > 0, there is
a constant $\epsilon > 0$ such that there exists a rate-r
family of low-density parity-check codes with
length n where the LP decoder succeeds as long
as at most $\epsilon n$ bits are flipped by the channel. This
implies that there exists a constant $\epsilon' > 0$ such
that the word error rate under the $\mathrm{BSC}_p$ with
$p < \epsilon'$ is at most $2^{-\Omega(n)}$.


Expander Codes 

The capacity of a communication channel bounds 
from above the rate one can obtain from a family 
of codes and still get a word error rate that goes 
to zero as the code length increases. The notation
$C_p$ is used to denote the capacity of the $\mathrm{BSC}_p$.
Using a family of codes based on expanders [12], 
LP decoding can achieve rates that approach 
capacity. Compared to LDPC codes, however, 
this comes at the cost of increased decoding 
complexity, as the size of the LP is exponential 
in the gap between the rate and capacity. 


Theorem 2 ([7]) For any p > 0, and any rate
$r < C_p$, there exists a rate-r family of expander
codes with length n such that the word error
rate of LP decoding under the $\mathrm{BSC}_p$ is at most
$2^{-\Omega(n)}$.


Turbo Codes 
Turbo codes [2] have the advantage that they can 
be encoded in linear time, even in a streaming 
fashion. Repeat-accumulate codes are a simple
form of Turbo code. The LP decoder for Turbo 
codes and their variants was first defined in [4, 5], 
and is based on the trellis structure of the com- 
ponent convolutional codes. Due to certain prop- 
erties of turbo codes it is impossible to prove 
bounds for turbo codes as strong as the ones for 
LDPC codes, but the following is known: 


Theorem 3 ([5]) There exists a rate 1/2 − o(1)
family of repeat-accumulate codes with length n,
and a constant $\epsilon > 0$, such that under the $\mathrm{BSC}_p$
with $p < \epsilon$, the LP decoder has a word error rate
of at most $n^{-\Omega(1)}$.


Applications 


The application of LP decoding that has received 
the most attention so far is for LDPC codes. The 
LP for this family of codes not only serves as 
an interesting alternative to more conventional 
iterative methods [15], but also gives a useful 
tool for analyzing those methods, an idea first 
explored in [8, 10, 14]. Iterative methods such 
as belief propagation use local computations on 
the Tanner graph to update approximations of the 
marginal probabilities of each code bit. In this 
type of analysis, the vertices of the polytope P are 
referred to as pseudocodewords, and tend to coin- 
cide with the fixed points of this iterative process. 
Other notions of pseudocodeword-like structures 
such as stopping sets are also known to coincide 
with these polytope vertices. Understanding these 
structures has also inspired the design of new 
codes for use with iterative and LP decoding. 
(See [3] for a more complete bibliography of this 
work). 

The decoding method itself can be extended 
in many ways. By adding redundant informa- 
tion to the description of the code, one can de- 
rive tighter constraint sets to improve the error- 
correcting performance of the decoder, albeit at 
an increase in complexity. Adaptive algorithms 
that try to add constraints “on the fly” have also 
been explored, using branch-and-bound or other 
techniques. Also, LP decoding has inspired the 
use of other methods from optimization theory in 
decoding error-correcting codes. (Again, see [3]
for references.) 


Open Problems 


The LP decoding method gives a simple, efficient 
and analytically tractable approach to decoding 
error-correcting codes. The results known to this 
point serve as a proof of concept that strong 
bounds are possible, but there are still important 
questions to answer. Although LP decoders can 
achieve capacity with decoding time polynomial 
in the length of the code, the complexity of the 
decoder still depends exponentially on the gap 
between the rate and capacity (as is the case 
for all other known provably efficient capacity- 
achieving decoders). Decreasing this dependence 
would be a major accomplishment, and perhaps 
LP decoding could help. Improving the fraction 
of errors correctable by LP decoding is also an 
important direction for further research. 

Another interesting question is whether there 
exist constant-rate linear-distance code families 
for which one can formulate a polynomial-sized 
exact decoding LP. Put another way, is there 
a constant-rate linear-distance family of codes 
whose convex hulls have a polynomial number 
of facets? If so, then LP decoding would be 
equivalent to ML decoding for this family. If not, 
this is strong evidence that suboptimal decoding 
is inevitable when using good codes, which is 
a common belief. 

An advantage to LP decoding is the ML cer- 
tificate property mentioned earlier, which is not 
enjoyed by most other standard suboptimal de- 
coders. This property opens up the possibility for 
a wide range of heuristics for improving decoding 
performance, some of which have been analyzed,
but which largely remain wide open.

LP decoding has (for the most part) only 
been explored for LDPC codes under memoryless 
symmetric channels. The LP for turbo codes has 
been defined, but the error bounds proved so 
far are not a satisfying explanation of the ex- 
cellent performance observed in practice. Other 
codes and channels have gotten little, if any, 
attention. 
Problem Definition


Majority rule is arguably the best decision mech- 
anism for public decision-making, which is em- 
ployed not only in public management but also in 
business management. The concept of majority 
equilibrium captures such a democratic spirit in 
requiring that no other solutions would please 
more than half of the voters in comparison to 
it. The work of Chen, Deng, Fang, and Tian 
[1] considers a public facility location problem 
decided via a voting process under the majority 
rule on a discrete network. This work distinguishes
itself from previous work by applying
the computational complexity approach to the
study of majority equilibrium. For the model with
a single public facility located in trees, cycles,
and cactus graphs, it is shown that the majority
equilibrium can be found in linear time. On the
other hand, when the number of public facilities
is taken as the input size (not a constant),
finding a majority equilibrium is shown to be
NP-hard.

© Springer Science+Business Media New York 2016
M.-Y. Kao (ed.), Encyclopedia of Algorithms,
DOI 10.1007/978-1-4939-2864-4

Consider a network G = ((V, w), (E, l)) with
vertex- and edge-weight functions $w : V \to \mathbb{R}^+$
and $l : E \to \mathbb{R}^+$, respectively. Each vertex $i \in V$
represents a community, and w(i) represents
the number of voters that reside there. For each
$e \in E$, l(e) > 0 represents the length of the road
e = (i, j) connecting two communities i and j.
For two vertices $i, j \in V$, the distance between
i and j, denoted by $d_G(i, j)$, is the length of
a shortest path joining them. The location of
a public facility such as a library, community
center, etc., is to be determined by the public
via a voting process under the majority rule.
Here, each member of the community desires to
have the public facility close to himself, and the
decision has to be agreed upon by a majority
of the voters. Denote the vertex set of G by
$V = \{v_1, v_2, \ldots, v_n\}$. Then each $v_i \in V$ has a
preference order $\ge_i$ on V induced by the distance
on G. That is, $x \ge_i y$ if and only if
$d_G(v_i, x) \le d_G(v_i, y)$ for two vertices $x, y \in V$; similarly,
$x >_i y$ if and only if $d_G(v_i, x) < d_G(v_i, y)$.
Based on such a preference profile, four types of
majority equilibrium, called Condorcet winners,
are defined as follows.
Definition 1 Let $v_0 \in V$; then $v_0$ is called:


1. A weak quasi-Condorcet winner, if for every
$u \in V$ distinct from $v_0$,

$$w(\{v_i \in V : v_0 \ge_i u\}) \ge \sum_{v_i \in V} w(v_i)/2;$$


2. A strong quasi-Condorcet winner, if for every
$u \in V$ distinct from $v_0$,

$$w(\{v_i \in V : v_0 \ge_i u\}) > \sum_{v_i \in V} w(v_i)/2;$$


3. A weak Condorcet winner, if for every $u \in V$
distinct from $v_0$,

$$w(\{v_i \in V : v_0 >_i u\}) \ge w(\{v_i \in V : u >_i v_0\});$$


4. A strong Condorcet winner, if for every $u \in V$
distinct from $v_0$,

$$w(\{v_i \in V : v_0 >_i u\}) > w(\{v_i \in V : u >_i v_0\}).$$


Under the majority voting mechanism described 
above, the problem is to develop efficient ways 
for determining the existence of Condorcet win- 
ners and finding such a winner when one exists. 


Problem 1 (Finding Condorcet Winners) INPUT:
A network G = ((V, w), (E, l)). OUTPUT:
A Condorcet winner $v \in V$ or nonexistence of
Condorcet winners.
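
For small networks, the weak quasi-Condorcet condition of Definition 1 can be checked directly. The Python sketch below is a brute-force illustration, not one of the linear-time algorithms discussed in this entry: the function names and the edge-triple input format are assumptions, vertices are numbered 0 to n − 1, and the network is assumed to be connected. It takes cubic time because it computes all-pairs distances first.

```python
def all_pairs_distances(n, edges):
    """Floyd-Warshall on an undirected graph; edges are (i, j, length) triples."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j, length in edges:
        d[i][j] = min(d[i][j], length)
        d[j][i] = min(d[j][i], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def weak_quasi_condorcet_winners(weights, edges):
    """Return every v0 such that, against each other vertex u, the voters who
    find v0 at least as close as u carry at least half of the total weight."""
    n = len(weights)
    d = all_pairs_distances(n, edges)
    half = sum(weights) / 2
    winners = []
    for v0 in range(n):
        if all(
            sum(weights[i] for i in range(n) if d[i][v0] <= d[i][u]) >= half
            for u in range(n) if u != v0
        ):
            winners.append(v0)
    return winners

# Path 0 - 1 - 2 with unit edge lengths.
path_edges = [(0, 1, 1), (1, 2, 1)]
uniform = weak_quasi_condorcet_winners([1, 1, 1], path_edges)
skewed = weak_quasi_condorcet_winners([3, 1, 1], path_edges)
two_vertices = weak_quasi_condorcet_winners([1, 1], [(0, 1, 1)])
```

With uniform weights the middle vertex of the path wins; moving most of the voters to one end shifts the winner there; and on a single edge with equal weights both endpoints qualify.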


Key Results 


The mathematical results on the Condorcet 
winners depend deeply on the understanding of 
combinatorial structures of underlying networks. 
Theorems 1-3 below are given for weak quasi- 
Condorcet winners in the model with a single 
facility to be located. Other kinds of Condorcet 
winners can be discussed similarly. 


Theorem 1 Every tree has one weak quasi-Condorcet
winner or two adjacent weak
quasi-Condorcet winners, which can be found
in linear time.


Theorem 2 Let $C_n$ be a cycle of order n with
vertex-weight function $w : V(C_n) \to \mathbb{R}^+$. Then
$v \in V(C_n)$ is a weak quasi-Condorcet winner
of $C_n$ if and only if the weight of each
$\lfloor \frac{n+1}{2} \rfloor$-interval containing v is at least
$\frac{1}{2} \sum_{v \in C_n} w(v)$.
Furthermore, the problem of finding a weak
quasi-Condorcet winner of $C_n$ is solvable in
linear time.


Given a graph G = (V, E), a vertex v of G 
is a cut vertex if E(G) can be partitioned into 
two nonempty subsets $E_1$ and $E_2$ such that the
induced graphs $G[E_1]$ and $G[E_2]$ have just the
vertex v in common. A block of G is a connected 
subgraph of G that has no cut vertices and is 
maximal with respect to this property. Every 
graph is the union of its blocks. A graph G is 
called a cactus graph, if G is a connected graph 
in which each block is an edge or a cycle. 


Theorem 3 The problem of finding a weak
quasi-Condorcet winner of a cactus graph with 
vertex-weight function is solvable in linear 
time. 


In general, the problem can be extended to 
the cases where a number of public facilities are 
required to be located during one voting process, 
and the definitions of Condorcet winners can also 
be extended accordingly. In such cases, the public 
facilities may be of the same type or different 
types; and the utility functions of the voters may 
be of different forms. 


Theorem 4 If there is a bounded constant
number of public facilities to be located at one 
voting process under the majority rule, then the 
problem of finding a Condorcet winner (any of 
the four types) can be solved in polynomial time. 


Theorem 5 If the number of public facilities to
be located is not a constant but part of
the input, the problem of finding a Condorcet
winner is NP-hard, and the corresponding decision
problem, deciding whether a candidate set
of public facilities is a Condorcet winner, is
co-NP-complete.


Applications


Demange [2] first reviewed continuous and discrete
spatial models of collective choice, aiming
at characterizing the public facility location
problem as a result of the public voting process.
Although the network models in Chen et al. [1] 
have been studied for some problems in eco- 
nomics [3,4], the main point of Chen et al.’s work 
is the computational complexity and algorithmic 
approach. This approach can be applied to more 
general public decision-making processes. 

For example, consider a public road repair 
problem, pioneered by Tullock [5] to study re- 
distribution of tax revenue under a majority rule 
system. An edge-weighted graph G = (V, E,w) 
represents a network of local roads, where the 
weight of each edge represents the cost of repair- 
ing the road. There is also a distinguished vertex 
s ∈ V representing the entry point to the highway
system. The majority decision problem involves
a set of agents A ⊆ V situated at vertices of the
network who would choose a subset F of edges.
The cost of repairing F, which is the sum of the
weights of edges in F, will be shared by all n
agents, each paying an n-th of the total. In this model,
a majority stable solution under the majority rule
is a subset F ⊆ E that connects s to a subset
A_1 ⊆ A of agents with |A_1| > |A|/2 such that
no other solution H connecting s to a subset
of agents A_2 ⊆ A with |A_2| > |A|/2 satisfies the
conditions that Σ_{e∈H} w(e) ≤ Σ_{e∈F} w(e) and, for
each agent in A_2, its shortest path to s in solution
H is not longer than that in solution F, with at
least one of the inequalities being strict. It is shown
in Chen et al. [1] that for this model, finding
a majority equilibrium is NP-hard for general
networks and is polynomially solvable for tree
networks.


Cross-References 


General Equilibrium 

Leontief Economy Equilibrium 

Local Search for K-medians and Facility Loca- 
tion 




Recommended Reading 


1. Chen L, Deng X, Fang Q, Tian F (2002) Majority 
equilibrium for public facility allocation. Lect Notes 
Comput Sci 2697:435-444 

2. Demange G (1983) Spatial models of collective choice. 
In: Thisse JF, Zoller HG (eds) Locational analysis of 
public facilities. North-Holland Publishing Company, 
Amsterdam 

3. Hansen P, Thisse JF (1981) Outcomes of voting and
planning: Condorcet, Weber and Rawls locations. J Publ
Econ 16:1-15

4. Schummer J, Vohra RV (2002) Strategy-proof location
on a network. J Econ Theory 104:405-428

5. Tullock G (1959) Some problems of majority voting. 
J Polit Econ 67:571-579 


Manifold Reconstruction 


Siu-Wing Cheng 

Department of Computer Science and 
Engineering, Hong Kong University of Science 
and Technology, Hong Kong, China 


Keywords 


Čech complex; Delaunay complex; Homology;
Homeomorphism; Implicit function; Voronoi 
diagram 


Years and Authors of Summarized 
Original Work 


2005; Cheng, Dey, Ramos 

2008; Niyogi, Smale, Weinberger 
2014; Boissonnat, Ghosh 

2014; Cheng, Chiu 


Problem Definition 


With the widespread use of sensing and Internet technologies,
a large number of numeric attributes
for a physical or cyber phenomenon can now
be collected. If each attribute is viewed as a
coordinate, an instance in the collection can be
viewed as a point in R^d for some large d. When
the physical or cyber phenomenon is governed by
only a few latent parameters, it is often postulated
that the data points lie on some unknown
smooth compact manifold M of dimension k,
where k < d. The goal is to reconstruct a
faithful representation of M from the data points.
Reconstruction problems are ill-posed in general.
Therefore, the data points are assumed to be 
dense enough so that it becomes theoretically 
possible to obtain a faithful reconstruction. The 
quality of the reconstruction is measured in sev- 
eral ways: the Hausdorff distance between the 
reconstruction and M, the deviation between the 
normal spaces of the reconstruction and M at 
nearby points, and whether the reconstruction and 
M are topologically equivalent. 


Key Results 


It is clear that more data points are needed in
some parts of M than others, and this is captured
well by the concepts of medial axis and local
feature size. The medial axis of M is the closure
of the set of points in R^d that are at the same
distance from two or more closest points in M.
For every point x ∈ M, its local feature size f(x) is
the distance from x to the medial axis of M. The
input set S of data points in M is an ε-sample
if for every point x ∈ M, d(x, S) ≤ ε f(x).
The input set S is a uniform ε-sample if for
every point x ∈ M, d(x, S) ≤ ε, assuming
that the ambient space has been scaled such that
min_{x∈M} f(x) = 1. Furthermore, if for every pair
of points p, q ∈ S, d(p, q) ≥ δ f(p) or
d(p, q) ≥ δ for some constant δ, then we call
S an (ε, δ)-sample or a uniform (ε, δ)-sample,
respectively. Most reconstruction results in the
literature are about what theoretical guarantees
can be offered when ε is sufficiently small.


Cocone Complex 

Cheng, Dey, and Ramos [7] gave the first proof
that a faithful homeomorphic simplicial complex
can be constructed from the data points. First,
the dimension k of M and the tangent spaces
at the data points are estimated using known
algorithms in the literature (e.g., [4, 8, 10]). Let
θ be some appropriately small constant angle.
The θ-cocone at a point p ∈ S is the subset of
points z ∈ R^d such that pz makes an angle at
most θ with the estimated tangent space at p.
Consider the Delaunay triangulation of S. Let
σ be a Delaunay simplex and let V_σ be its dual
Voronoi cell. The cocone complex K of S is the
collection of Delaunay simplices σ such that V_σ
intersects the θ-cocones of the vertices of σ. It
turns out that if the simplices in the Delaunay
triangulation of S have bounded aspect ratio,
then K is a k-dimensional simplicial complex
homeomorphic to M. Moreover, the Hausdorff
distance between K and M is at most ε times
the local feature size, and the angle deviation
between the normal spaces of K and M at nearby
points is O(ε). A difficult step of the algorithm
is the generalization of sliver removal [6] in R^3
to R^d in order to ensure that the simplices have
bounded aspect ratio. The intuition is simple, but
the analysis is quite involved. Figure 1 shows a
sliver pqrs. View the Delaunay triangulation as a
weighted Delaunay triangulation with all weights
equal to zero. If the weight of p is increased,
the orthoball of pqrs (the ball at zero power
distance from all its vertices) moves away from
p and becomes larger. Therefore, the ball will
become so large that it contains some other points
in S, thereby eliminating the sliver pqrs from the
weighted Delaunay triangulation.

Manifold Reconstruction, Fig. 1 As the weight of p increases, the orthoball becomes larger and moves away from p
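The cocone membership test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name in_cocone, the use of NumPy, and the representation of the estimated tangent space as a matrix whose columns span it are all choices made here, not part of the cited algorithm.

```python
import numpy as np

def in_cocone(p, z, tangent_basis, theta):
    """Return True if the vector from p to z makes an angle of at most
    theta (radians) with the subspace spanned by the columns of
    tangent_basis (a hypothetical d x k array for the estimated tangent space)."""
    v = np.asarray(z, dtype=float) - np.asarray(p, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return True  # convention chosen here: p itself belongs to its cocone
    # Orthonormalize the basis so that the projection is Q Q^T v.
    q, _ = np.linalg.qr(np.asarray(tangent_basis, dtype=float))
    projection = q @ (q.T @ v)
    # The angle between v and the subspace is the angle between v and its projection.
    cos_angle = min(1.0, np.linalg.norm(projection) / norm)
    return bool(np.arccos(cos_angle) <= theta)
```

Deciding whether a Voronoi cell intersects a cocone is a harder geometric predicate and is not attempted here; the sketch only covers the point-in-cocone test.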


Theorem 1 ([7]) There exist constants ε and δ
such that if S is an (ε, δ)-sample of a smooth
compact manifold M of dimension k in R^d, a simplicial
complex K can be constructed such that:


• K is homeomorphic to M.

• Let τ be a j-dimensional simplex of K. Let p
be a vertex of τ. For every point q ∈ M, if
d(p, q) = O(ε f(p)), then for every normal
vector n at q, τ has a normal vector v that
makes an O(ε) angle with n.

• For every point x in K, its distance from the
nearest point y ∈ M is O(ε f(y)).


The running time of the algorithm is exponential
in d.

Boissonnat, Guibas, and Oudot [3] show that 
the Voronoi computation can be avoided if one 
switches to the weighted witness complex. Moreover,
given an ε-sample without the lower bound
on the interpoint distances, they can construct a 
family of plausible reconstructions of dimensions 
1,2,...,k and let the user choose an appropriate 
one. The sliver issue is also encountered [3] and 
resolved in an analogous manner. 


Čech Complex

Betti numbers are informative topological invariants
of the shape. There are d + 1 Betti
numbers, β_i for i ∈ [0, d]. The zeroth Betti
number β_0 is the number of connected components
in M. In three dimensions (i.e., d = 3),
β_2 is the number of voids in M (i.e., bounded
components in R^3 \ M), and β_1 is the number
of independent one-dimensional cycles in M
that cannot be contracted within M to a single
point. Two cycles are homologous if one can
be deformed into the other continuously. Two
overlapping cycles can be combined by eliminating
the overlap. A set of cycles is independent
if no cycle can be obtained by continuously deforming
and combining some other cycles in the
set. A tunnel is physically accommodated in the
complement of M, and its existence is witnessed
by a one-dimensional cycle in M that "circles
around" the tunnel (i.e., cannot be contracted in
M to a single point). Therefore, the number of
independent one-dimensional cycles that cannot
be contracted to a single point measures the number
of "independent tunnels." In fact, voids live
in the complement of M too, and their number
is measured by the number of independent
2-dimensional cycles that cannot be contracted to
a single point. In general, β_i is the number of
independent i-dimensional cycles that cannot be
contracted in M to a single point. Alternatively,
β_i is the rank of the ith homology group. Note
that β_i = 0 for i > k as M is k-dimensional.

Numerical procedures are known to compute 
the betti numbers of a simplicial complex 
(e.g., [11]) and only the incidence relations 
among the elements in the complex are needed. 
Therefore, given a homeomorphic simplicial 
complex K of M, the betti numbers of K and 
hence of M can be computed. In fact, K and M 
have the same homology (groups). But requiring 
a homeomorphic complex is overkill. Let B
be a set of balls of the same radius r. For every
subset of balls in B, connect the ball centers in
the subset to form a simplex if these balls have a
nonempty common intersection. The resulting
collection of simplices is known as a Čech
complex C. Niyogi, Smale, and Weinberger [12]
proved that if S is a dense sample from a 
uniform probability distribution on M and r is 
set appropriately, then C has the same homology 
as M with high probability. 
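To illustrate the remark that Betti numbers need only the incidence relations of a complex, the following sketch computes them from a list of simplices by ranking boundary matrices. It is an assumption-laden illustration, not the procedure of [11]: it works over the two-element field GF(2), so it reports mod-2 Betti numbers (which agree with the usual ones when the homology has no torsion), and the function names closure, rank_gf2, and betti_numbers are invented here.

```python
from itertools import combinations

def closure(simplices):
    """All nonempty faces of the given simplices, grouped by dimension."""
    faces = {}
    for s in simplices:
        s = tuple(sorted(s))
        for r in range(1, len(s) + 1):
            for f in combinations(s, r):
                faces.setdefault(r - 1, set()).add(f)
    return {dim: sorted(fs) for dim, fs in faces.items()}

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are ints used as bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Clear that bit from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """Mod-2 Betti numbers [b_0, ..., b_top] of the complex generated by simplices."""
    faces = closure(simplices)
    top = max(faces)
    ranks = {0: 0, top + 1: 0}  # boundary maps out of dimension 0 and top+1 are zero
    for d in range(1, top + 1):
        index = {f: i for i, f in enumerate(faces[d - 1])}
        rows = []
        for s in faces[d]:
            mask = 0
            for f in combinations(s, d):  # the d+1 facets of a d-simplex
                mask |= 1 << index[f]
            rows.append(mask)
        ranks[d] = rank_gf2(rows)
    # b_d = (number of d-simplices) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(faces[d]) - ranks[d] - ranks[d + 1] for d in range(top + 1)]
```

For instance, the three edges of a triangle form a circle, and the four triangular faces of a tetrahedron form a sphere; the sketch can be run on both to compare β_1 and β_2.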


Theorem 2 ([12]) Let S be a set of n points
drawn in i.i.d. fashion according to the uniform
probability measure on M. Assume that r < 1/2.
There exist constants α_1 and α_2 depending on
M such that if n > α_1 (log α_2 + log(1/δ)), then
C has the same homology as M with probability
greater than 1 − δ.


In general, the Čech complex contains many
simplices. Niyogi, Smale, and Weinberger
showed that the same result also holds when
the probability distribution has support near M
instead of exactly on M. Subsequently, similar
results have also been obtained for the Vietoris-Rips
complex [1].


Tangential Delaunay Complex 

Although the cocone complex gives a homeomor- 
phic reconstruction, the running time is exponen- 
tial in d. It is natural to ask whether the running 
time can be made to depend exponentially on k 
instead. Boissonnat and Ghosh [2] answered this 
question affirmatively by introducing a new local 
Delaunay reconstruction. 

Let S be a dense (ε, δ)-sample of M. Suppose
that the tangent spaces at the data points
in S have been estimated with an O(ε) angular
error. Take a point p ∈ S. Let H_p be the
estimated tangent space at p. Let V_p be the
Voronoi cell owned by p. Identify the set star(p)
of Delaunay simplices that are incident to p
and whose dual Voronoi faces intersect H_p. The
collection of all such stars forms the tangential
Delaunay complex. The bisectors between p and
the other points in S intersect H_p in (k − 1)-dimensional
affine subspaces. These affine subspaces
define a Voronoi diagram in H_p, and
star(p) is determined by the cell owned by p in
this Voronoi diagram in H_p. Therefore, no data
structure of dimension higher than k is needed in
the computation.

The tangential Delaunay complex is not a 
triangulated manifold in general though due to 
some inconsistencies. For example, if σ is a
simplex in star(p), it is not necessarily true that
σ is in star(q) for another vertex q of σ. Such
inconsistencies can be removed by assigning 
weights to the points in S appropriately as in 
the case of eliminating slivers from the cocone 
complex. 


Theorem 3 ([2]) The guarantees in Theorem 1
can be obtained by an algorithm that runs in
O(dn^2 + d 2^{O(k^2)} n) time, where n is the number
of data points.


More theoretical guarantees are provided
in [2].


Implicit Function

The complexes in the previous methods are either
very large or not so easy to compute in practice.
An alternative approach is to approximate M by
the zero-set of an implicit function φ : R^d →
R^{d−k} that is defined using the data points in S,
assuming that S is a uniform ε-sample:


φ(x) = Σ_{p∈S} ω(x, p) · B_{φ,x}^t (x − p),


where ω and B_{φ,x} are defined as follows.

• Let c > 3. Define ω(x, p) = γ(d(x, p)) /
Σ_{q∈S} γ(d(x, q)), where

γ(s) = (1 − s/(kcε))^{2k} (2s/(cε) + 1), if s ∈ [0, kcε],
γ(s) = 0, if s > kcε.

Notice that φ(x) depends only on the points in
S within a distance kcε from x.

• For every point p ∈ S, let T_p be a d × k
matrix with orthonormal columns such that its
column space is an approximate tangent space
at p with angular error O(ε). For every point
x ∈ R^d, let C_x = Σ_{p∈S} ω(x, p) T_p T_p^t.
Therefore, the space L_x spanned by the eigenvectors
of C_x corresponding to the smallest
d − k eigenvalues is a "weighted average"
of the approximate normal spaces at the data
points. Define B_{φ,x} to be a d × (d − k) matrix
with linearly independent columns such that
its column space is L_x. It turns out the zero-set
of φ is independent of the choices of B_{φ,x}.


The weight function w makes local recon- 
struction possible without a complete sampling 
of M. Moreover, the construction of φ is computationally
less intensive than the construction of
a complex. The following guarantees are offered. 
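A minimal numerical sketch of evaluating such an implicit function at a query point is given below. It assumes the form φ(x) = Σ_{p∈S} ω(x, p) B^t (x − p), with ω a normalized compactly supported weight and B spanning the eigenvectors of C_x = Σ_p ω(x, p) T_p T_p^t for its d − k smallest eigenvalues. The specific weight profile γ(s) = (1 − s/(kcε))^{2k} (2s/(cε) + 1) on [0, kcε] is an assumption of this sketch, as are the function names and the array layout (samples as an n × d array, tangents as an n × d × k array).

```python
import numpy as np

def gamma(s, k, c, eps):
    """Assumed compactly supported weight profile; zero beyond radius k*c*eps."""
    h = k * c * eps
    s = np.asarray(s, dtype=float)
    inside = (1.0 - s / h) ** (2 * k) * (2.0 * s / (c * eps) + 1.0)
    return np.where(s <= h, inside, 0.0)

def phi(x, samples, tangents, k, c, eps):
    """Evaluate the implicit function at x (returns a vector of length d - k)."""
    x = np.asarray(x, dtype=float)
    diffs = x - samples                          # row i is x - p_i
    w = gamma(np.linalg.norm(diffs, axis=1), k, c, eps)
    total = w.sum()
    if total == 0.0:
        raise ValueError("no sample point within distance k*c*eps of x")
    w = w / total                                # normalized weights omega(x, p)
    # C_x = sum_p omega(x, p) T_p T_p^t
    c_x = np.einsum("p,pij,plj->il", w, tangents, tangents)
    _, vecs = np.linalg.eigh(c_x)                # eigenvalues in ascending order
    d = x.shape[0]
    b = vecs[:, : d - k]                         # basis of the averaged normal space
    return b.T @ (w @ diffs)                     # sum_p omega(x, p) B^t (x - p)
```

Because eigenvectors are determined only up to sign, the sign of the returned vector is arbitrary; its zero-set is unaffected, which matches the remark that the zero-set does not depend on the choice of basis.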


Theorem 4 ([5]) Let M̂ be the set of points at
distance ε^τ or less from M for any fixed τ ∈
(1, 2). Let S_φ denote the zero-set of φ. Let ν
denote the map that sends a point in R^d to the
nearest point in M.


• For a small enough ε, the restriction of ν to
S_φ ∩ M̂ is a homeomorphism between S_φ ∩ M̂
and M.

• For every x ∈ S_φ ∩ M̂, the angle between the
normal space of M at ν(x) and the normal
space of S_φ at x is O(ε^{(τ−1)/2}).


A provably good iterative projection operator
is also known for φ [9].


Problem Definition


This chapter studies market games for their per- 
formance and convergence of the equilibrium 
points. The main application is the content dis- 
tribution in cellular networks in which a service 
provider needs to provide data to users. The 
service provider can use several cache locations 
to store and provide the data. The assumption is 
that cache locations are selfish agents (resident 
subscribers) who want to maximize their own 
profit. Most of the results apply to a general 
framework of monotone two-sided markets. 


Uncoordinated Two-Sided Markets 

Various economic interactions can be modeled as 
two-sided markets. A two-sided market consists 
of two disjoint groups of agents: active agents 
and passive agents. Each agent has a preference 
list over the agents of the other side, and can 
be matched to one (or many) of the agents in 
the other side. A central solution concept for
these markets is the stable matching, introduced
by Gale and Shapley [5]. It is well known that 
stable matchings can be achieved using a central- 
ized polynomial-time algorithm. Many markets, 
however, do not have any centralized match- 
ing mechanism to match agents. In those mar- 
kets, matchings are formed by actions of self- 
interested agents. Knuth [9] introduced unco- 
ordinated two-sided markets. In these markets, 
cycles of better or best responses exist, but ran- 
dom better response and best response dynamics 
converge to a stable matching with probability 
one [2, 10, 14]. Our model for content distri- 
bution corresponds to a special class of unco- 
ordinated two-sided markets that is called the 
distributed caching games. 

Before introducing the distributed caching 
game as an uncoordinated two-sided market, 
the distributed caching problem and some game 
theoretic notations are defined. 


Distributed Caching Problem 

Let U be a set of n cache locations with given
available capacities A_i and given available bandwidths
B_i for each cache location i. There are k
request types (request types can be thought of as
different files that should be delivered to clients);
each request type t has a size a_t (1 ≤ t ≤ k).
Let H be a set of m requests with a reward R_j,
a required bandwidth b_j, a request type t_j for
each request j, and a cost c_ij for connecting each
cache location i to each request j. The profit
of providing request j by cache location i is
f_ij = R_j − c_ij. A cache location i can service
a set of requests S_i if it satisfies the bandwidth
constraint Σ_{j∈S_i} b_j ≤ B_i and the capacity
constraint Σ_{t∈{t_j | j∈S_i}} a_t ≤ A_i (this means
that the sum of the sizes of the request types
of the requests in cache location i should be
less than or equal to the available capacity of
cache location i). A set S_i of requests is feasible
for cache location i if it satisfies both of these
constraints. The goal of the DCP problem is to
find a feasible assignment of requests to cache
locations to maximize the total profit, i.e., the
total reward of requests that are provided minus
the connection costs of these requests.
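The two constraints and the objective can be made concrete with a small sketch. The function names and the dictionary-based data layout below are assumptions made for illustration; the point to notice is that the capacity constraint charges each request type once per cache location, however many requests of that type are served there.

```python
def is_feasible(requests, bandwidth_cap, capacity, req_bandwidth, req_type, type_size):
    """Check the bandwidth and capacity constraints for one cache location.

    requests      -- set of request ids assigned to the cache location
    bandwidth_cap -- available bandwidth B_i
    capacity      -- available capacity A_i
    req_bandwidth -- dict: request id -> required bandwidth b_j
    req_type      -- dict: request id -> request type t_j
    type_size     -- dict: request type -> size a_t
    """
    used_bandwidth = sum(req_bandwidth[j] for j in requests)
    stored_types = {req_type[j] for j in requests}  # each type is stored once
    used_capacity = sum(type_size[t] for t in stored_types)
    return used_bandwidth <= bandwidth_cap and used_capacity <= capacity

def total_profit(assignment, reward, cost):
    """Total profit: sum of R_j - c_ij over all served requests.

    assignment -- dict: cache id -> set of request ids (each request served at most once)
    reward     -- dict: request id -> R_j
    cost       -- dict of dicts: cost[i][j] = c_ij
    """
    return sum(reward[j] - cost[i][j]
               for i, served in assignment.items()
               for j in served)
```

Two requests of the same type therefore consume bandwidth twice but capacity only once, which is what makes the problem more general than a plain multiple knapsack.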


Strategic Games

A strategic game G is defined as a tuple
G(U, {F_i | i ∈ U}, {α_i(·) | i ∈ U}) where
(i) U is the set of n players or agents, (ii) F_i
is a family of feasible (pure) strategies or
actions for player i, and (iii) α_i : Π_{i∈U} F_i →
R^+ ∪ {0} is the (private) payoff or utility
function for agent i, given the set of strategies
of all players. Player i's strategy is denoted
by s_i ∈ F_i. A strategy profile or a (strategy)
state, denoted by S = (s_1, s_2, ..., s_n), is
a vector of strategies of players. Also let
S ⊕ s'_i = (s_1, ..., s_{i−1}, s'_i, s_{i+1}, ..., s_n).


Best-Response Moves

In a non-cooperative game, each agent wishes
to maximize its own payoff. For a strategy
profile S = (s_1, s_2, ..., s_n), a better response
move of player i is a strategy s'_i such that
α_i(S ⊕ s'_i) ≥ α_i(S). In a strict better response
move, the above inequality is strict. Also, for
a strategy profile S = (s_1, s_2, ..., s_n), a best
response of player i in S is a better response
move s_i^* ∈ F_i such that for any strategy s_i ∈ F_i,
α_i(S ⊕ s_i^*) ≥ α_i(S ⊕ s_i).


Nash Equilibria 

A pure strategy Nash equilibrium (PSNE) of 
a strategic game is a strategy profile in which each 
player plays his best response. 


State Graph 

The state graph, D = (F, E), of a strategic game
G is an arc-labeled directed graph, where the
vertex set F corresponds to the set of strategy
profiles or states in G, and there is an arc from
state S to state S' with label i if the only difference
between S and S' is in the strategy of player i and
player i plays one of his best responses in strategy
profile S'. A best-response walk is a directed
walk in the state graph.


Price of Anarchy 

Given a strategic game G(U, {F_i | i ∈ U},
{α_i(·) | i ∈ U}) and a maximization social function
γ : Π_{i∈U} F_i → R, the price of anarchy, denoted
by poa(G, γ), is the worst ratio between the
social value of a pure Nash equilibrium and the
optimum.
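For very small games, the definitions of pure Nash equilibrium and price of anarchy can be checked by brute force. The sketch below enumerates all strategy profiles, so it is exponential in the number of players and is meant only to make the definitions concrete; the function names and the callback-based interface (payoff(i, profile) and social(profile)) are assumptions of this illustration.

```python
from itertools import product

def pure_nash_equilibria(strategy_sets, payoff):
    """Profiles in which no player can strictly improve by a unilateral deviation.

    strategy_sets -- list of lists; strategy_sets[i] holds the strategies of player i
    payoff        -- function (i, profile) -> payoff of player i in the profile
    """
    equilibria = []
    for profile in product(*strategy_sets):
        stable = True
        for i, options in enumerate(strategy_sets):
            current = payoff(i, profile)
            for s in options:
                deviated = profile[:i] + (s,) + profile[i + 1:]
                if payoff(i, deviated) > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

def price_of_anarchy(strategy_sets, payoff, social):
    """Worst social value over pure Nash equilibria divided by the optimum.

    Raises ValueError if the game has no pure Nash equilibrium.
    """
    optimum = max(social(p) for p in product(*strategy_sets))
    worst = min(social(p) for p in pure_nash_equilibria(strategy_sets, payoff))
    return worst / optimum
```

As a hypothetical example, take two players who each pick one of two resources worth 2 and 1, splitting a resource's value equally when both pick it; the profile in which both pick the more valuable resource is an equilibrium with social value 2, against an optimum of 3.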


Distributed Caching Games 

The distributed caching game can be formalized 
as a two-sided market game: active agents 
correspond to n resident subscribers or cache 
locations, and passive agents correspond to m 
requests from transit subscribers. Formally, given 
an instance of the DCP problem, a strategic 
game G(U, {F_i | i ∈ U}, {α_i | i ∈ U}) is defined
as follows. The set of players (or active agents)
U is the set of cache locations. The family of
feasible strategies F_i of a cache location i is
the family of subsets s_i of requests such that
Σ_{j∈s_i} b_j ≤ B_i and Σ_{t∈{t_j | j∈s_i}} a_t ≤ A_i. Given
a vector S = (s_1, s_2, ..., s_n) of strategies of
cache locations, the favorite cache locations
for request j, denoted by FAV(j), is the set of
cache locations i such that j ∈ s_i and f_ij has the
maximum profit among the cache locations that
have request j in their strategy set, i.e., f_ij ≥ f_i'j
for any i' such that j ∈ s_i'. For a strategy profile
S = (s_1, ..., s_n), α_i(S) = Σ_{j : i∈FAV(j)} f_ij / |FAV(j)|.
Intuitively, the above definition implies that 
the profit of each request goes to the cache 
locations with the minimum connection cost (or 
equivalently with the maximum profit) among the 
set of cache locations that provide this request. If 
more than one cache location have the maximum 
profit (or minimum connection cost) for a request 
j, the profit of this request is divided equally 
between these cache locations. The payoff of 
a cache location is the sum of profits from the 
requests it actually serves. A player i serves 
a request j if i € FAV(j). The social value of 
strategy profile S, denoted by y(S), is the sum of 
profits of all players. This value y(S) is a measure 
of the efficiency of the assignment of requests 
and request types to cache locations. 
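The payoff rule just described (each request pays the cache locations offering it at maximum profit, splitting equally on ties) can be sketched as follows. The function names and the dictionary layout are assumptions made for this illustration, and exact fractions are used so that equal splits do not introduce rounding.

```python
from fractions import Fraction

def payoffs(strategies, profit):
    """Payoff of every cache location under the favorite-cache rule.

    strategies -- dict: cache id -> set of request ids in its strategy
    profit     -- dict of dicts: profit[i][j] is the profit of request j at cache i
    """
    result = {i: Fraction(0) for i in strategies}
    requests = set().union(*strategies.values())
    for j in requests:
        holders = [i for i, s in strategies.items() if j in s]
        best = max(profit[i][j] for i in holders)
        favorites = [i for i in holders if profit[i][j] == best]  # FAV(j)
        for i in favorites:
            result[i] += Fraction(best) / len(favorites)
    return result

def social_value(strategies, profit):
    """Sum of the payoffs of all cache locations."""
    return sum(payoffs(strategies, profit).values())
```

A cache location that lists a request without being one of its favorites earns nothing from it, so in this sketch it still spends bandwidth and capacity on that request for no return.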


Special Cases 

In this paper, the following variants and special 
cases of the DCP problem are also studied: 
The CapDCP problem is a special case of 
DCP problem without bandwidth constraints. 
The BanDCP problem is a special case of the
DCP problem without capacity constraints.
In the uniform BanDCP problem, the bandwidth
consumption of all requests is the same. In the
uniform CapDCP problem, the size of all request
types is the same.


Many-to-One Two-Sided Markets with Ties 

In the distributed caching game, active and pas- 
sive agents correspond to cache locations and 
requests, respectively. The set of feasible strategies
for each active agent corresponds to a set of
solutions to a packing problem. Moreover, the
preferences of both active and passive agents are
determined from the profit of requests to cache
locations. In many-to-one two-sided markets, the 
preference of passive and active agents as well as 
the feasible family of strategies are arbitrary. The 
preference list of agents may have ties as well. 


Monotone and Matroid Markets 

In monotone many-to-one two-sided markets, the 
preferences of both active and passive agents 
are determined based on payoffs p_ij = p_ji for
each active agent i and passive agent j (similar
to the DCP game). An agent i prefers j to j'
if p_ij > p_ij'. In matroid two-sided markets, the
feasible set of strategies of each active agent is the
set of independent sets of a matroid. Therefore, the
uniform BanDCP game is a matroid two-sided
market game. 


Key Results 


In this section, the known results for these prob- 
lems are summarized. 


Centralized Approximation Algorithm 

The distributed caching problem generalizes the 
multiple knapsack problem and the generalized 
assignment problem [3] and as a result is an APX- 
hard problem. 


Theorem 1 ([4]) There exists a linear programming-based
(1 − 1/e)-approximation algorithm and
a local search 1/2-approximation algorithm for the
DCP problem.


The (1 − 1/e)-approximation for this problem
is based on rounding an exponentially large
configuration linear program [4]. On the
basis of some reasonable complexity theoretic
assumptions, this approximation factor of 1 − 1/e
is tight for this problem. More formally,


Theorem 2 ([4]) For any ε > 0, there exists no
(1 − 1/e + ε)-approximation algorithm for the DCP
problem unless NP ⊆ DTIME(n^{O(log log n)}).


Price of Anarchy 

Since the DCP game is a strategic game, it 
possesses mixed Nash equilibria [12]. The DCP 
game is a valid-utility game with a submodular 
social function as defined by Vetta [16]. This 
implies that the performance of any mixed Nash 
equilibrium of this game is at least 1/2 of the
optimal solution.


Theorem 3 ([4, 11]) The DCP game is a valid- 
utility game and the price of anarchy for mixed 
Nash equilibria is 1/2. Moreover, this result holds
for all monotone many-to-one two-sided markets 
with ties. 


A direct proof of the above price of anarchy 
bound for the DCP game can be found in [11]. 


Pure Nash Equilibria: Existence and Convergence

This part surveys known results for existence and 
convergence of pure Nash equilibria. 


Theorem 4 ([11]) There are instances of the
IBDC game that have no pure Nash equilibrium. 


Since IBDC is a special case of CapDCP, the
above theorem implies that there are instances
of the CapDCP game that have no pure Nash
equilibrium. In the above theorem, the bandwidth
consumption of requests is not uniform, and
this was essential in finding the example. The
following theorems concern the uniform variants
of these games.


Theorem 5 ([1, 11]) Any instance of the uniform
BanDCP game does not contain any cycle
of strict best-response moves, and thus possesses
a pure Nash equilibrium. On the other hand, there
are instances of the uniform CapDCP game with 
no pure Nash equilibria. 


The above result for the uniform BanDCP game 
can be generalized to matroid two-sided markets 
with ties as follows. 


Theorem 6 ([1]) Any instance of the monotone
matroid two-sided market game with ties is a potential
game, and possesses pure Nash equilibria.
Moreover, any instance of the matroid two-sided
market game with ties possesses pure Nash
equilibria.


Convergence Time to Equilibria 

This section proves that there are instances of the 
uniform CapDCP game in which finding a pure 
Nash equilibrium is PLS-hard [8]. The definition 
of PLS-hard problems can be found in papers by 
Yannakakis et al. [8, 15]. 


Theorem 7 ([11]) There are instances of the
uniform CapDCP game with pure Nash equilibria
for which finding a pure Nash equilibrium is
PLS-hard. (It is also possible to say that finding a
sink equilibrium is PLS-hard. A sink equilibrium
is a set of strategy profiles that is closed under
best-response moves. A pure equilibrium is
a sink equilibrium with exactly one profile. This
equilibrium concept is formally defined in [7].)


Using the above proof and a result of Schaffer 
and Yannakakis [13, 15], it is possible to show 
that in some instances of the uniform CapDCP 
game, there are states from which all paths of best 
responses have exponential length. 


Corollary 1 ([11]) There are instances of the
uniform CapDCP game that have pure Nash 
equilibria with states from which any sequence 
of best-response moves to any pure Nash equi- 
librium (or sink equilibrium) has an exponential 
length. 


The above theorems show exponential convergence
time to pure Nash equilibria in general DCP
games. For the special case of the uniform
BanDCP game, the following is a positive result
for the convergence time to equilibria.


Theorem 8 ([2]) The expected convergence time
of a random best-response walk to pure Nash 
equilibria in matroid monotone two-sided mar- 
kets (without ties) is polynomial. 


Since the uniform BanDCP game is a special 
case of matroid monotone two-sided markets 
with ties, the above theorem indicates that for 
the BanDCP game with no tie in the profit 
of requests, the convergence time of a random 
best-response walk is polynomial. Finally, we 
state a theorem about the convergence time of 
the general (non-monotone) matroid two-sided 
market games. 


Theorem 9 ([2]) In the matroid two-sided markets
(without ties), a random best response dynamic
of players may cycle, but it converges to
a Nash equilibrium with probability one. However,
it may take exponential time to converge to
a pure Nash equilibrium.


Pure Nash equilibria of two-sided market 
games correspond to stable matchings in two- 
sided markets and vice-versa [2]. The fact that 
better response dynamics of players in two-sided 
market games may cycle, but will converge to 
a stable matching has been proved in [9, 14]. 
Ackermann et al. [2] extend these results for best- 
response dynamics, and show an exponential 
lower bound for expected convergence time to 
pure Nash equilibria. 


Applications 


The growth of the Internet, the World Wide Web, 
and wide-area wireless networks allows an increasing
number of users to access vast amounts
of information in different geographic areas. As 
one of the most important functions of the service 
provider, content delivery can be performed by 
caching popular items in cache locations close 
to the users. Performing such a task in a decentralized
manner in the presence of self-interested
entities in the system can be modeled as an
uncoordinated two-sided market game.

The 3G subscriber market can be categorized 
into groups with shared interest in location-based 
services, e.g., the preview of movies in a theater 
or scenes of the mountain nearby. Since the 3G 
radio resources are limited, it is expensive to 
repeatedly transmit large quantities of data over 
the air interface from the base station (BS). It 
is more economical for the service provider to 
offload such repeated requests on to the ad-hoc 
network comprised of its subscribers where some 
of them recently acquired a copy of the data. In 
this scenario, the goal for the service provider 
is to give incentives for peer subscribers in the 
system to cache and forward the data to the 
requesting subscribers. Since each data item is 
large in size and transit subscribers are mobile, 
we assume that the data transfer occurs in a close 
range of a few hops. 

In this setting, envision a system consisting 
of two groups of subscribers: resident and transit 
subscribers. Resident subscribers are less mo- 
bile and mostly confined to a certain geograph- 
ical area. Resident subscribers have incentives to 
cache data items that are specific to this geo- 
graphical region since the service provider gives 
monetary rewards for satisfying the queries of 
transit subscribers. Transit subscribers request 
their favorite data items when they visit a partic- 
ular region. Since the service provider does not 
have knowledge of the spatial and temporal distri- 
bution of requests, it is difficult if not impossible 
for the provider to stipulate which subscriber 
should cache which set of data items. Therefore, 
the decision of what to cache is left to each indi- 
vidual subscriber. The realization of this content 
distribution system depends on two main issues. 
First, since subscribers are selfish agents, they 
may act to increase their individual payoff and 
decrease the performance of the system. Here, we
provide a framework in which we can prove that,
in an equilibrium of this framework, the system
performs efficiently. The
second issue is that the payoff of each request 
for each agent must be a function of the set of 
agents that have this request in their strategy, 
since these agents compete on this request and
the profit of this request should be divided among
these agents in an appropriate way. Therefore, 
each selfish agent may change the set of items 
it cached in response to the set of items cached 
by others. This model leads to a non-cooperative 
caching scenario that can be modeled as a two-
sided market game, studied and motivated in the
context of market sharing games and distributed 
caching games [4, 6, 11]. 


Open Problems 


It is known that there exist instances of the dis- 
tributed caching game with no pure Nash equilib- 
ria. It is also known that best response dynamics 
of players may take exponential time to converge 
to pure Nash equilibria. An interesting question 
is to study the performance of sink equilibria [7, 
11] or the price of sinking [7, 11] for these 
games. The distributed caching game is a valid-utility
game. Goemans, Mirrokni, and Vetta [7]
show that despite the price of anarchy of 1/2
for valid-utility games, the performance of sink
equilibria (or price of sinking) for these games
is 1/n. We conjecture that the price of sinking
for DCP games is a constant. Moreover, it would be
interesting to show that after polynomially many rounds
of best responses of players the approximation
factor of the solution is a constant. We know
that one round of best responses of players is 
not sufficient to get constant-factor solutions. It 
might be easier to show that after a polynomial 
number of random best responses of players, the 
expected total profit of players is at least a con- 
stant factor of the optimal solution. Similar pos- 
itive results for sink equilibria and random best 
responses of players are known for congestion 
games [7, 11]. 

The complexity of verifying whether a given state
of the distributed caching game is in a sink
equilibrium is an interesting question to
explore. Also, given a distributed caching game
(or a many-to-one two-sided market game), an
interesting problem is to check whether all
sink equilibria are pure Nash equilibria.
Finally, an interesting direction of research is
to classify the classes of two-sided market games
for which pure Nash equilibria exist or best-response
dynamics of players converge to a pure
Nash equilibrium.


Problem Definition


Let G = (V, E) be an undirected graph on n =
|V| vertices and m = |E| edges. A matching in G
is a set of edges M ⊆ E such that no two edges
in M share a vertex. Matching has been one
of the most well-studied problems in algorithmic
graph theory for decades [4]. A matching M is
called a maximum matching if the number of edges
in M is maximum. The fastest known algorithm
for maximum matching, due to Micali and Vazirani
[5], runs in O(m√n) time. A matching is said to
be maximal if it is not strictly contained in any
other matching. It is well known that a maximal
matching achieves a factor-2 approximation of the
maximum matching.
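The factor-2 claim can be illustrated with a greedy construction. The following is a minimal sketch, not taken from the entry: the function name `greedy_maximal_matching` and the edge-list representation are assumptions made for illustration.

```python
def greedy_maximal_matching(edges):
    """Scan edges once; keep an edge whenever both endpoints are still free.

    The result is maximal: every edge that was skipped has at least one
    matched endpoint, so no further edge can be added.
    """
    matched = set()   # vertices already covered by the matching
    matching = []     # edges chosen so far
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path a-b-c-d: picking the middle edge first gives a maximal matching of
# size 1, while the maximum matching {(a, b), (c, d)} has size 2.
path = [("b", "c"), ("a", "b"), ("c", "d")]
result = greedy_maximal_matching(path)
```

On this path the greedy result has one edge while the optimum has two, so the factor of 2 is attained.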


Key Result 


We address the problem of maintaining a maximal
matching in a fully dynamic environment,
allowing updates in the form of both insertion
and deletion of edges. Ivković and Lloyd [3]
designed the first fully dynamic algorithm for
maximal matching with O((n + m)^0.7072) update
time. In this entry, we present a fully dynamic
algorithm for maximal matching that achieves
O(log n) expected amortized time per update.


Ideas Underlying the Algorithm 


We begin with some terminologies and notations 
that will facilitate our description and also pro- 
vide some intuition behind our approach. Let 
M denote a matching in the given graph at any
instant; an edge (u, v) ∈ M is called a matched
edge where u is referred to as a mate of v and
vice versa. An edge in E \ M is an unmatched
edge. A vertex x is matched if there exists an edge
(x, y) ∈ M; otherwise it is free or unmatched.
In order to maintain a maximal matching, it 
suffices to ensure that there is no edge (u, v) in 
the graph such that both u and v are free with 
respect to the matching M. Therefore, a natural 
technique for maintaining a maximal matching
is to keep track of whether each vertex is matched
or free. When an edge (u, v) is inserted, we add
(u, v) to the matching if u and v are free. For the
case when an unmatched edge (u, v) is deleted,
no action is required. When a matched edge (u, v)
is deleted, we search the neighborhoods of both u and v
for a free vertex and update the matching accordingly. It follows
that each update takes O(1) computation time ex- 
cept when it involves deletion of a matched edge; 
in this case the computation time is of the order 
of the sum of the degrees of the two endpoints of 
the deleted edge. So this trivial algorithm is quite
efficient for small degree vertices, but could be
expensive for large degree vertices. An alternate
approach could be to match a free vertex u with
a randomly chosen neighbor, say v. Following
the standard adversarial model, it can be observed
that an expected deg(u)/2 edges incident
to u will be deleted before deleting the matched
edge (u, v). So the expected amortized cost per
edge deletion for u is roughly

O((deg(u) + deg(v)) / (deg(u)/2)).

If deg(v) < deg(u), this cost is O(1). But if
deg(v) ≫ deg(u), then it can be as bad as the
trivial algorithm. To circumvent this problem, we
introduce an important notion, called ownership
of edges. Intuitively, we assign an edge to that
endpoint which has higher degree.
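The trivial algorithm described above can be sketched as follows. This is an illustrative sketch only: the class name `TrivialDynamicMatching`, its method names, and the adjacency-set representation are assumptions, not part of the entry.

```python
from collections import defaultdict


class TrivialDynamicMatching:
    """Maintains a maximal matching by tracking which vertices are free."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of neighbors
        self.mate = {}               # matched vertex -> its mate

    def _try_match(self, x):
        # Scan the neighborhood of x for a free vertex: O(deg(x)) time.
        if x in self.mate:
            return
        for y in self.adj[x]:
            if y not in self.mate:
                self.mate[x], self.mate[y] = y, x
                return

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)
        # O(1): match the new edge only if both endpoints are free.
        if u not in self.mate and v not in self.mate:
            self.mate[u], self.mate[v] = v, u

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Only the deletion of a matched edge requires any work.
        if self.mate.get(u) == v:
            del self.mate[u], self.mate[v]
            self._try_match(u)
            self._try_match(v)

    def is_maximal(self):
        # Maximal iff no edge has both endpoints free.
        return all(u in self.mate or v in self.mate
                   for u in list(self.adj) for v in self.adj[u])


dm = TrivialDynamicMatching()
for e in [("b", "c"), ("a", "b"), ("c", "d")]:
    dm.insert(*e)
dm.delete("b", "c")  # frees b and c; each rematches with its free neighbor
```

The costly step is `_try_match`, which scans the full neighborhood of each endpoint of a deleted matched edge; this is the degree-dependent cost that the ownership idea is meant to control.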

The idea of choosing a random mate and the 
trivial algorithm described above can be com- 
bined together to design a simple algorithm for 
maximal matching. This algorithm maintains a 
partition of the vertices into two levels. Level 
0 consists of vertices which own fewer edges,
and we handle the updates there using the trivial
algorithm. Level 1 consists of vertices (and
their mates) which own a larger number of edges,
and we use the idea of random mate to handle
their updates. This 2-LEVEL algorithm achieves
O(√n) expected amortized time per update. A
careful analysis of the 2-LEVEL algorithm
suggests that a finer partition of vertices could
help in achieving a faster update time. This leads
to our log₂ n-LEVEL algorithm that achieves
expected amortized O(log n) time per update.

Our algorithm relies crucially on randomization
in order to handle the updates efficiently.
The matching maintained by the algorithm at any
stage depends on its random bits and is not known
to the adversary, so the adversary cannot choose
the updates adaptively. This oblivious adversarial
model is the same one assumed for randomized
data structures such as universal hashing.


The 2-LEVEL Algorithm


The algorithm maintains a partition of the set 
of vertices into two levels. Each edge present in 
the graph will be owned by one or both of its 
endpoints as follows. If both the endpoints of an
edge are at level 0, then it is owned by both of
them. Otherwise it will be owned by exactly that
endpoint which lies at a higher level. If both the
endpoints are at level 1, the tie will be broken
suitably by the algorithm. Let O_u denote the set
of edges owned by a vertex u at any instant of the
algorithm. With a slight abuse of notation, we
will also use O_u to denote {v | (u, v) ∈ O_u}. As the
algorithm proceeds, the vertices will make transitions
from one level to another and the ownership
of edges will also change accordingly.

The algorithm maintains the following three 
invariants after each update: 


1. Every vertex at level 1 is matched. Every free
vertex at level 0 has all its neighbors matched.

2. Every vertex at level 0 owns less than √n
edges at any stage.

3. Both endpoints of every matched edge are at 
the same level. 


It follows from the first invariant that the 
matching M is maximal at each stage. The sec- 
ond and third invariants help in incorporating the 
two ideas of our algorithm efficiently. 


Handling Insertion of an Edge 

Let (u, v) be the edge being inserted. If either u
or v is at level 1, there is no violation of any
invariant. However, if both u and v are at level 0,
then we proceed as follows. Both u and v become
owners of the edge (u, v). If u and v are free,
then we add (u, v) to M. Notice that the insertion
of (u, v) also increases |O_u| and |O_v|
by one and so may lead to a violation of Invariant
2. We process the vertex that owns more edges;
let u be that vertex. If |O_u| = √n, then Invariant
2 is violated. In order to restore it, u moves
to level 1 and gets matched to some vertex, say
y, selected uniformly at random from O_u. Vertex
y also moves to level 1 to satisfy Invariant 3. If
w and x were, respectively, the earlier mates of u
and y at level 0, then the matching of u with y has
rendered w and x free. Both w and x search for
free neighbors at level 0 and update the matching
accordingly. It is easy to observe that in all these
cases, it takes O(√n) time to handle an edge
insertion.


Handling Deletion of an Edge

Let (u, v) be an edge that is deleted. If (u, v) ∉
M, all the invariants are still valid. Let us consider
the more important case of (u, v) ∈ M: the
deletion of (u, v) has caused u and v to become
free. Therefore, the first invariant might be
violated for u and v. If edge (u, v) was at level
0, then both u and v search for a free neighbor
and update the matching accordingly. This takes
O(√n) time. If edge (u, v) was at level 1, then u
(similarly v) is processed as follows.

First, u disowns all its edges whose other
endpoint is at level 1. If |O_u| is still greater than
or equal to √n, then u stays at level 1 and selects
a random mate from O_u. However, if |O_u| has
fallen below √n, then u moves to level 0 and
gets matched to a free neighbor (if any). For each
neighbor of u at level 0, the transition of u from
level 1 to 0 is, effectively, like the insertion of a new
edge. This transition increases the
number of edges owned by each neighbor of u
at level 0. As a result, the second invariant for
such a neighbor at level 0 may get violated if
the number of edges it owns now becomes √n.
To take care of these scenarios, we proceed as
follows. We scan each neighbor of u at level 0,
and for each neighbor w with |O_w| = √n, a mate
is selected randomly from O_w and w is moved to
level 1 along with its mate. This concludes the
deletion procedure of edge (u, v).


Analysis of the Algorithm 
It may be noted that, unlike insertion, the deletion
of an edge could potentially lead to moving
many vertices from level 0 to 1 and this may involve
significant computation. However, we will
show that the expected amortized computation
per update is O(√n).

We analyze the algorithm using the concept of 
epochs. 


Definition 1 At any time t, let (u, v) be any edge
in M. Then the epoch of (u, v) at time t is the
maximal time interval containing t during which
(u, v) ∈ M.


The entire life span of an edge (u,v) can 
be viewed as a sequence of epochs when it 
is matched, separated by periods when it is
unmatched. Any edge update that does not
change the matching is processed in O(1) time. 
An edge update that changes the matching results 
in the start of new epoch(s) or the termination of 
some existing epoch(s). And it is only during 
the creation or termination of an epoch that 
significant computation is involved. For the 
purpose of analyzing the update time (when 
matching is affected), we assign the computation 
performed to the corresponding epochs created or 
terminated. It is easy to see that the computation 
associated with an epoch at level 0 is O(√n). The
computation associated with an epoch at level 1 is
of the order of the sum of the degrees of the endpoints
of the corresponding matched edge, which may
be Ω(n). When a vertex moves from level 0 to 1,
although it owns √n edges, this may grow later
to O(n). So the computation associated with an
epoch at level 1 can be quite high. We will show 
that the expected number of such epochs that get 
terminated during any arbitrary sequence of edge 
updates will be relatively small. The following 
lemma plays a key role. 


Lemma 1 The deletion of an edge (u, v) at level
1 terminates an epoch with probability ≤ 1/√n.


Proof The deletion of edge (u, v) will lead to
termination of an epoch only if (u, v) ∈ M. If
edge (u, v) was owned by u at the time of its
deletion, note that u owned at least √n edges at
the moment of start of its epoch. Since u selected
its matched edge uniformly at random from these
edges, the (conditional) probability is at most 1/√n. The
same argument applies if v was the owner, so
(u, v) is a matched edge at the time of deletion
of (u, v) with probability at most 1/√n. □


Consider any sequence of m edge updates. 
We analyze the computation associated with all 
the epochs that get terminated during these m 
updates. It follows from Lemma 1 and the linearity
of expectation that the expected number
of epochs terminated at level 1 will be at most m/√n.
As discussed above, computation associated with
each epoch at level 1 is O(n). So the expected
computation associated with the termination of
all epochs at level 1 is O(m√n). The number of
epochs destroyed at level 0 is trivially bounded by
O(m). Each epoch at level 0 has O(√n) computation
associated with it, so the total computation
associated with these epochs is also O(m√n).
We conclude the following. 


Theorem 1 Starting with a graph on n vertices
and no edges, we can maintain a maximal matching
for any sequence of m updates in expected
O(m√n) time.


The log₂ n-LEVEL Algorithm

The key idea for improving the update time lies
in the second invariant of our 2-LEVEL algorithm.
Let a(n) be the threshold for the maximum
number of edges that a vertex at level 0 can own.
Consider an epoch at level 1 associated with some
edge, say (u, v). The computation associated with
this epoch is of the order of the number of edges u
and v own, which can be Θ(n) in the worst case.
However, the expected duration of the epoch is
of the order of the minimum number of edges u
can own at the time of its creation, i.e., a(n).
Therefore, the expected amortized computation
per edge deletion at level 1 is O(n/a(n)). Balancing
this with the a(n) update time at level 0
yields a(n) = √n.
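The balancing step can be written out as a short calculation. As a sketch, treat the level-0 cost as exactly a(n) and the amortized level-1 cost as exactly n/a(n), ignoring constant factors; the symbol T(n) for the resulting update time is introduced here only for illustration.

```latex
\[
T(n) \;=\; \max\!\left( a(n),\ \frac{n}{a(n)} \right),
\qquad
a(n) = \frac{n}{a(n)}
\;\Longleftrightarrow\; a(n)^2 = n
\;\Longleftrightarrow\; a(n) = \sqrt{n}.
\]
```

The first term grows with a(n) while the second shrinks, so the maximum is minimized where the two are equal, giving T(n) = O(√n), in agreement with Theorem 1.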

In order to improve the running time of our 
algorithm, we need to decrease the ratio between 
the maximum and the minimum number of edges 
a vertex can own during an epoch at any level. It is 
this ratio that determines the expected amortized 
time per edge deletion. This observation leads us 
to a finer partitioning of the ownership classes. 
When a vertex creates an epoch at level i, it
owns at least 2^i edges, and during the epoch,
it is allowed to own at most 2^(i+1) − 1 edges.
As soon as it owns 2^(i+1) edges, it migrates to a
higher level. Notice that the ratio of maximum
to minimum edges owned by a vertex during an
epoch gets reduced from √n to a constant, leading
to about log₂ n levels. Though the log₂ n-LEVEL
algorithm can be seen as a natural generalization
of our 2-LEVEL algorithm, there are many intricacies
that make the algorithm and its analysis
quite involved. For example, a single edge update
may lead to a sequence of falls and rises of many
vertices across the levels of the data structure.
Moreover, there may be several vertices trying
to fall or rise at any time while processing an
update. Taking a top-down approach in processing
these vertices simplifies the description of the
algorithm. The analysis of the algorithm becomes
easier when we analyze each level separately.
This analysis at any level is quite similar to the
analysis of level 1 in our 2-LEVEL algorithm.
We refer the interested reader
to the journal version of this paper for a
full account of the algorithm and its analysis.
The final result achieved by our log₂ n-LEVEL
algorithm is stated below.
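The ownership thresholds above can be made concrete with a small helper. This is only a sketch of the threshold arithmetic, under the assumption that a vertex owning k ≥ 1 edges would create its epoch at the level i with 2^i ≤ k ≤ 2^(i+1) − 1; the name `epoch_level` is hypothetical, and the full algorithm involves considerably more than this rule.

```python
def epoch_level(owned_edges):
    """Return the level i with 2**i <= owned_edges <= 2**(i + 1) - 1."""
    if owned_edges < 1:
        raise ValueError("a vertex must own at least one edge")
    # For a positive integer k, k.bit_length() - 1 equals floor(log2(k)).
    return owned_edges.bit_length() - 1


def ownership_range(level):
    """Minimum and maximum number of edges owned during an epoch at `level`."""
    return 2 ** level, 2 ** (level + 1) - 1


lo, hi = ownership_range(3)  # level 3 allows between 8 and 15 owned edges
```

Within a level the maximum is less than twice the minimum, which is the constant ratio the paragraph refers to.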


Theorem 2 Starting with a graph on n vertices
and no edges, we can maintain a maximal matching
for any sequence of m updates in expected
O(m log n) time.


Using standard probability tools, it can be 
shown that the bound on the update time as stated 
in Theorem 2 holds with high probability, as well 
as with limited independence. 


Open Problems 


There have been new results on maintaining approximate
weighted matching [2] and (1 + ε)-approximate
matching [1, 6] for ε < 1. The
interested reader should study these results. For
any ε < 1, whether it is possible to maintain a (1 +
ε)-approximate matching in polylogarithmic update
time is still an open problem.


Problem Definition


The study of matching market equilibrium was 
initiated by Shapley and Shubik [13] in an assign- 
ment model. A classical instance of the matching 
market involves a set B of n unit-demand buyers 
and a set Q of m indivisible items, where each 
buyer wants to buy at most one item and each 
item can be sold to at most one buyer. Each 
buyer i has a valuation v_ij ≥ 0 for each item
j, representing the maximum amount that i is
willing to pay for item j. Each item j has a
reserve price r_j ≥ 0, below which it won't be
sold. Without loss of generality, one can assume 
there is a null item whose value is zero to all 
buyers and whose price is always zero. 

An output of the matching market is a tuple
(p, x), where p = (p_1, ..., p_m) ≥ 0 is a price
vector with p_j denoting the price charged for
item j and x = (x_1, ..., x_n) is an allocation
vector with x_i denoting the item that i wins. If
i does not win any item, x_i = ∅. An output
is essentially a bipartite matching with monetary
transfers between the matched parties (buyers and
items). A feasible price vector is any vector p ≥
r = (r_1, ..., r_m). Given an outcome (p, x), the
utility (payoff) to a buyer i who gets item j and
pays p_j is u_i(p, x) = v_ij − p_j (assuming linear
surplus) and u_i(p, x) = 0 if i gets nothing. At
price p, the demand set for buyer i is D_i(p) =
{j ∈ arg max_j' (v_ij' − p_j') | v_ij − p_j ≥ 0}.

A tuple (p, x) is called a competitive equilib- 
rium if: 


• For any item j, p_j = r_j if no one wins j in
allocation x.

• If buyer i wins item j (x_i = j), then j ∈ D_i(p).

• If buyer i does not win any item (x_i = ∅),
then for every item j, v_ij − p_j ≤ 0.


The first condition above is a market clearance 
(efficiency) condition, which says that all 
unallocated items are priced at the given reserve 
prices. The second and third conditions ensure 
envy-freeness (fairness), implying that each 
buyer is allocated with an item that maximizes 
his utility at these prices. In a market competitive 
equilibrium, all items with prices higher than 
the reserve prices are sold out and everyone 
gets his maximum utility at the corresponding 
allocation. 
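The three conditions can be checked mechanically for a small instance. The sketch below is illustrative and makes several assumptions not in the entry: the function name `is_competitive_equilibrium`, valuations as a list of rows `values[i][j]`, and `None` standing for "wins nothing".

```python
def is_competitive_equilibrium(values, reserve, prices, alloc):
    """Check feasibility and the three equilibrium conditions for (prices, alloc)."""
    m = len(reserve)
    # Feasibility: prices at or above reserve, each item sold at most once.
    if any(prices[j] < reserve[j] for j in range(m)):
        return False
    sold = [x for x in alloc if x is not None]
    if len(sold) != len(set(sold)):
        return False
    # Condition 1: every unallocated item is priced at its reserve price.
    if any(j not in sold and prices[j] != reserve[j] for j in range(m)):
        return False
    for i, x in enumerate(alloc):
        surplus = [values[i][j] - prices[j] for j in range(m)]
        if x is None:
            # Condition 3: a buyer who wins nothing has no positive surplus.
            if any(s > 0 for s in surplus):
                return False
        elif surplus[x] < 0 or surplus[x] < max(surplus):
            # Condition 2: the item won must lie in the buyer's demand set.
            return False
    return True


values = [[5, 3], [4, 1]]   # values[i][j]: buyer i's valuation of item j
reserve = [0, 0]
ok = is_competitive_equilibrium(values, reserve, [2, 0], [1, 0])
bad = is_competitive_equilibrium(values, reserve, [1, 0], [1, 0])
```

At prices (2, 0), buyer 0 is indifferent between the two items and buyer 1 strictly prefers item 0, so the allocation is an equilibrium; at prices (1, 0), buyer 0 would rather have item 0, which violates the second condition.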

An outcome (p, x) is called a minimum
competitive equilibrium if it is a competitive
equilibrium and for any other competitive
equilibrium (p', x'), p_j ≤ p'_j for every item
j. It represents the interests of all buyers in terms
of their total payment. Similarly, an outcome
(p, x) is called a maximum competitive equilibrium
if it is a competitive equilibrium and for any
other competitive equilibrium (p', x'), p_j ≥ p'_j
for every item j. It represents the interests of
all sellers in terms of total payment received.
Maximum and minimum equilibria represent the
contradictory interests of the two parties in a
two-sided matching market at the two extremes.

The solution concept of competitive equilibrium
is closely related to the well-established
stability solution concept initiated by Gale and
Shapley [9] in pure two-sided matching markets
without money. To define stability in a multi-item
matching market with money, we need to
have a definition of preferences. Buyers prefer
items with larger utility (v_ij − p_j), and sellers
(items) prefer buyers with larger payment. Given
an outcome (p, x), where p_j is the price of item
j and x_i is the allocation of buyer i, we say
(i, j) is a blocking pair if there is p'_j such that
p'_j > p_j (item j can receive more payment)
and the utility that i obtains in (p, x) is less than
v_ij − p'_j (by paying p'_j for item j, buyer i can get
more utility). An outcome (p, x) is stable if it has
no blocking pairs, that is, no unmatched buyer-seller
pair can mutually benefit by trading with
each other instead of their current partners.
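The blocking-pair definition can be turned into a direct test. A price p'_j > p_j with u_i < v_ij − p'_j exists exactly when v_ij − p_j > u_i, since p'_j may be chosen arbitrarily close to p_j. The sketch below uses that observation; the name `blocking_pairs` and the list-based data layout (with `None` for "wins nothing") are assumptions made for illustration.

```python
def blocking_pairs(values, prices, alloc):
    """Return all (buyer, item) pairs that block the outcome (prices, alloc)."""
    pairs = []
    for i, x in enumerate(alloc):
        # Utility buyer i currently obtains under (prices, alloc).
        u_i = 0 if x is None else values[i][x] - prices[x]
        for j in range(len(prices)):
            # Buyer i can offer slightly more than prices[j] and still gain.
            if j != x and values[i][j] - prices[j] > u_i:
                pairs.append((i, j))
    return pairs


values = [[5, 3], [4, 1]]
stable = blocking_pairs(values, [2, 0], [1, 0])
unstable = blocking_pairs(values, [1, 0], [1, 0])
```

At prices (1, 0), buyer 0 holds item 1 with utility 3 but could pay a little more than 1 for item 0 and obtain utility close to 4, so (0, 0) is a blocking pair.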


Key Results 


For any given matching market, Shapley and 
Shubik [13] formulate an efficient matching mar- 
ket outcome as a linear program of maximizing 
the social welfare of the allocation. The duality 
theorem then shows the existence of competi- 
tive equilibrium. They also prove that there is a 
unique minimum (maximum) equilibrium price 
vector. 


Theorem 1 (Shapley and Shubik [13]) A
matching market competitive equilibrium always
exists. The set of competitive equilibrium price
vectors p forms a lattice in R^m.


Shapley and Shubik [13] also establish the 
connections between stability and competitive 
equilibrium. 


Theorem 2 (Shapley and Shubik [13]) In
a multi-item market, an outcome (p,x) is a 
competitive equilibrium if and only if it is stable. 


However, they did not define an adjustment 
process like the deferred-acceptance algorithm by 
Gale and Shapley [9]. Crawford and Knoer [5] 
study a more generalized setup with firms and 
workers, where firms can hire multiple workers 
while each worker can be employed by at most
one firm. It is essentially a many-to-one matching
framework, where firms can be considered as
buyers with multi-unit demand while workers can
be considered as items. They describe a salary
adjustment process, which is essentially a version 
of the deferred-acceptance algorithm. They also 
show that for arbitrary capacities of the firms 
(buyers), when ties are ruled out, it always con- 
verges to a minimum competitive equilibrium. 
They provide an alternative proof of Shapley and 
Shubik’s [13] result and allow significant gener- 
alization of it. However, the analysis in Crawford 
and Knoer [5] is flawed by two unnecessarily 
restrictive assumptions. Later, Kelso and Craw- 
ford [10] relax these assumptions and propose 
a modification to the salary adjustment process, 
which works as follows (firms are essentially 
buyers with multi-unit demand and workers are 
the items): 


Salary Adjustment Process
(Kelso-Crawford [10])


1. Firms begin by facing a set of
initially very low salaries
(reserve prices).

2. Firms make offers to their most
preferred set of workers. Any offer
previously made by a firm to a
worker that was not rejected must
be honored.

3. Each worker who receives one or
more offers rejects all but his
favorite, which he tentatively
accepts.

4. Offers not rejected in previous
periods remain in force. For
each rejected offer a firm made,
increase the feasible salary
for the rejecting worker. Firms
continue to make offers to their
favorite sets of workers.

5. The process stops when no
rejections are made. Workers then
accept the offers that remain in
force from the firms they have not
rejected.


An important assumption underlying the 
algorithm is that workers are gross substitutes 
from the standpoint of the firm. That is, when 
the price of one worker goes up, demand for 
another worker should not go down. As the salary 
adjustment process goes on, for firms, the set of
feasible offers is reduced as some offers are
rejected. For workers, the set of offers grows 
as more offers come up. Therefore, there is a 
monotonicity of offer sets in opposite directions. 
Alternatively, we can have a worker-proposing 
algorithm, where workers all begin by offering 
their services at the highest possible wage at their 
most preferred firm. 


Theorem 3 (Kelso and Crawford [10]) The 
salary adjustment process converges to a 
minimum competitive equilibrium, provided that 
all workers are gross substitutes from each firm’s 
standpoint. In other words the final competitive 
equilibrium allocation is weakly preferred by 
every firm to any other equilibrium allocation. 


Taking a different perspective, Demange, 
Gale, and Sotomayor [7] propose an ascending 
auction-based algorithm (called “exact auction 
mechanism” in their original paper) that 
converges to a minimum competitive equilibrium 
for the original one-to-one matching problem. It 
is a variant of the so-called Hungarian method 
by Kuhn [11] for solving the optimal assignment 
problem. The algorithm works as follows:


Exact Auction Mechanism
(Demange-Gale-Sotomayor [7])


1. Assume (for simplicity) that
valuations v_ij are all integers.

2. Set the initial price vector, p^0,
to the reserve prices, p^0 = r.

3. At round t when current prices
are p^t, each buyer i declares his
demand set, D_i(p^t).

4. If there is no over-demanded set
of items, terminate the process.
A market equilibrium at prices
p^t exists, and a corresponding
allocation can be found by maximum
matching.

5. If there exist over-demanded
set(s), find a minimal
over-demanded set S, and for all
j ∈ S, set p_j^(t+1) = p_j^t + 1. Set t = t + 1,
and go to step 3.


Theorem 4 (Demange, Gale, and Sotomayor
[7]) The exact auction mechanism always
finds a competitive equilibrium. Moreover, the
equilibrium it finds is the minimum competitive
equilibrium.

This auction outcome can be computed efficiently
since we can compute (minimal) over-demanded
sets in polynomial time. It is not necessary to
increment prices by one unit in each iteration.
Instead, one can raise prices in the over-demanded
set until the demand set of one of the respective
bidders enlarges. It turns out that the payments
(minimum price equilibrium) are precisely the
VCG payments. Therefore, the mechanism is
incentive compatible, and for every buyer, it is a
dominant strategy to specify his true valuations.
Moreover, this mechanism is even group strategy-proof,
meaning that no strict subset of buyers who
can collude has an incentive to misrepresent
their true valuations.
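The price-raising loop of the exact auction mechanism can be sketched for small instances. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the null item is represented by `None`, a set of items is treated as over-demanded when the number of buyers whose demand set lies entirely inside it exceeds its size, and the minimal over-demanded set is found by brute force over subsets (exponential time, unlike the polynomial-time computation mentioned above).

```python
from itertools import combinations


def demand_set(values_i, prices):
    """Items maximizing v_ij - p_j; the null item (None) is added at surplus 0."""
    best = max(max(v - p for v, p in zip(values_i, prices)), 0)
    demand = {j for j, (v, p) in enumerate(zip(values_i, prices)) if v - p == best}
    if best == 0:
        demand.add(None)
    return demand


def minimal_overdemanded_set(demands, m):
    """Smallest-cardinality over-demanded set (hence minimal), or None."""
    for size in range(1, m + 1):
        for items in combinations(range(m), size):
            s = set(items)
            # Count buyers who demand only items inside s.
            if sum(1 for d in demands if d <= s) > size:
                return s
    return None


def exact_auction(values, reserve):
    """Raise prices by one unit on a minimal over-demanded set until none exists."""
    prices = list(reserve)
    while True:
        demands = [demand_set(v, prices) for v in values]
        over = minimal_overdemanded_set(demands, len(prices))
        if over is None:
            return prices, demands
        for j in over:
            prices[j] += 1


final_prices, final_demands = exact_auction([[5, 3], [4, 1]], [0, 0])
```

In this two-buyer example the price of item 0 rises until buyer 0 becomes indifferent between the two items. For a single item with valuations 10 and 7, the loop stops at price 7, the second-highest valuation, consistent with the VCG remark above.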

Demange, Gale, and Sotomayor [7] also pro- 
pose an approximation algorithm, called “ap- 
proximate auction mechanism” for computing a 
minimum competitive equilibrium. It is a version 
of the deferred-acceptance algorithm proposed by 
Crawford and Knoer [5], which in turn is a special 
case of the algorithm of Kelso and Crawford [10]. 
The algorithm works as follows. 


Approximate Auction Mechanism
(Demange-Gale-Sotomayor [7])


1. Set the initial price vector, p^0,
to the reserve prices, p^0 = r.

2. At round t when current prices
are p^t, each buyer i may bid for
any item. When he does so, he is
committed to that item, which means
he commits himself to possibly
buying the item at the announced
price. The item is (tentatively)
assigned to that bidder.

3. At this point, any uncommitted
bidder may:

* Bid for some unassigned item, in
which case he becomes committed
to it at its initial price.

* Bid for an assigned item, in
which case he becomes committed
to that item, its price increases
by some fixed amount δ, and the
bidder to whom it was assigned
becomes uncommitted.

* Drop out of the bidding.

4. When there are no more uncommitted
bidders, the auction terminates.
Each committed bidder buys the
item assigned to him at its current
price.

This approximate auction mechanism would
be appealing to the buyers since it does not
require them to decide in advance exactly what
their bidding behavior will be. Instead, at each
stage, a buyer can make use of present and past
stages of the auction to decide his next bid. If
buyers behave in accordance with linear valuations,
the final price will differ from the minimum
equilibrium price by at most kδ units, where
k is the minimum of the number of items and
bidders. Thus, by making δ (the unit by which
bids are increased) sufficiently small, one can
get arbitrarily close to the minimum equilibrium
price.


Theorem 5 (Demange, Gale, and Sotomayor
[7]) Under the approximate auction mechanism,
the final price of an item will differ from the
minimum equilibrium price by at most kδ, where
k = min(m, n).
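The approximate mechanism can be simulated under one particular bidding rule. The sketch below assumes, purely for illustration, that every uncommitted bidder bids myopically for the item giving him the largest surplus at the price he would have to pay (current price for an unassigned item, current price plus δ for an assigned one) and drops out when that surplus is negative; the mechanism itself leaves bidders free to behave otherwise. The name `approximate_auction` and the data layout are hypothetical.

```python
def approximate_auction(values, reserve, delta):
    """Simulate the approximate auction with myopic surplus-maximizing bidders."""
    n, m = len(values), len(reserve)
    prices = list(reserve)
    owner = [None] * m            # owner[j]: bidder currently committed to item j
    uncommitted = list(range(n))
    while uncommitted:
        i = uncommitted.pop()

        def cost(j):
            # An assigned item can only be taken over at its price plus delta.
            return prices[j] + (delta if owner[j] is not None else 0)

        j = max(range(m), key=lambda item: values[i][item] - cost(item))
        if values[i][j] - cost(j) < 0:
            continue              # bidder i drops out of the bidding
        if owner[j] is not None:
            uncommitted.append(owner[j])  # previous holder becomes uncommitted
            prices[j] += delta
        owner[j] = i
    return prices, owner


prices, owner = approximate_auction([[10], [7]], [0], 1)
```

With one item, valuations 10 and 7, and δ = 1, the two bidders alternate until the lower-valuation bidder drops out; the final price is within kδ = 1 of the minimum equilibrium price of 7.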

The mechanisms discussed so far (approxi- 
mately) compute a minimum competitive equi- 
librium. These approaches can be easily trans- 
formed to compute a maximum competitive equi- 
librium. Chen and Deng [3] discuss a combinato- 
rial algorithm which iteratively increases prices 
to converge to a maximum competitive equilib- 
rium starting from an arbitrary equilibrium. 


Applications 


The assignment model is used by Becker [2] to 
study marriage and household economics. Based 
on the fact that stable outcomes all correspond 
to optimal assignments, he studies which men 
are matched to which women under different 
assumptions of the assignment matrix. 


Extensions 


The existence of competitive equilibrium was
later established by Crawford and Knoer
[5], Gale [8], and Quinzii [12] for more general
utility functions than the linear surplus,
provided u_ij(·) is strictly decreasing and continuous
everywhere.

Under a minimum competitive equilibrium
mechanism, it is a dominant strategy for every 
buyer to report his true valuation. On the other 
hand, under a maximum competitive equilibrium 
mechanism, while the sellers will be truthful, it 
is possible that some buyer bids a false value to 
obtain more utility. In a recent study, Chen and 
Deng [3] show the convergence from the max- 
imum competitive equilibrium toward the mini- 
mum competitive equilibrium in a deterministic 
and dynamic setting. 

Another strand of recent studies focuses on the
assignment model with budget constraints, which 
is applicable to many marketplaces such as online 
and TV advertising markets. An extra budget con- 
straint introduces discontinuity in the utility func- 
tion, which fundamentally changes the properties 
of competitive equilibria. In such setups, a com- 
petitive equilibrium does not always exist. Aggar- 
wal et al. [1] study the problem of computing a 
weakly stable matching in the assignment model 
with quasi-linear utilities subject to a budget 
constraint. However, a weakly stable matching 
does not possess the envy-freeness property of a 
competitive equilibrium. Chen et al. [4] establish 
a connection between competitive equilibrium in 
the assignment model with budgets and strong 
stability. They then give a strongly polynomial-time
algorithm for deciding the existence of and
computing a minimum competitive equilibrium
for a general class of utility functions in the
assignment model with budgets.


Problem Definition


In recent years matroids have been used in the
fields of parameterized complexity and exact
algorithms. Many of these works rely on the
computation of representative families. Let
$M = (E, \mathcal{I})$ be a matroid and
$\mathcal{S} = \{S_1, \ldots, S_t\} \subseteq \mathcal{I}$
be a family of independent sets of size $p$. A
subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$
is called a $q$-representative family for $\mathcal{S}$
(denoted by $\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$)
if for every $Y \subseteq E$ of size at most $q$,
whenever there exists a set $S \in \mathcal{S}$
disjoint from $Y$ with $S \cup Y \in \mathcal{I}$,
there exists a set $\widehat{S} \in \widehat{\mathcal{S}}$
disjoint from $Y$ with $\widehat{S} \cup Y \in \mathcal{I}$.
The basic algorithmic question regarding
representative families is, given a matroid
$M = (E, \mathcal{I})$, a family $\mathcal{S} \subseteq \mathcal{I}$
of independent sets of size $p$ and a positive
integer $q$, compute
$\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$
of size as small as possible, as quickly as
possible.
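
The definition can be checked directly on tiny instances. The following is a
minimal brute-force sketch, not one of the algorithms surveyed in this entry.
It assumes a uniform matroid of rank p + q, so that S ∪ Y is independent
whenever S and Y are disjoint and |Y| ≤ q, and the condition reduces to
disjointness. The function names is_q_representative and
greedy_representative are hypothetical, and the running time is exponential.

```python
from itertools import combinations


def is_q_representative(sub, family, universe, q):
    """Brute-force test of the definition for a uniform matroid of rank p + q.

    Assumption: every set in `family` has the same size p, so independence of
    S ∪ Y reduces to S and Y being disjoint with |Y| <= q.
    """
    sub = [frozenset(s) for s in sub]
    family = [frozenset(s) for s in family]
    if not set(sub) <= set(family):
        return False  # a representative family must be a subfamily
    for size in range(q + 1):
        for y in combinations(universe, size):
            y = frozenset(y)
            family_ok = any(s.isdisjoint(y) for s in family)
            sub_ok = any(s.isdisjoint(y) for s in sub)
            if family_ok and not sub_ok:
                return False
    return True


def greedy_representative(family, universe, q):
    """Drop sets one at a time while the remainder stays q-representative."""
    kept = [frozenset(s) for s in family]
    for s in list(kept):
        trial = [t for t in kept if t != s]
        if is_q_representative(trial, family, universe, q):
            kept = trial
    return kept


universe = list(range(4))
family = [set(c) for c in combinations(universe, 2)]  # all 2-sets, so p = 2
rep = greedy_representative(family, universe, q=1)
print(len(family), len(rep))
```

On this toy input the six pairs shrink to three, which matches the bound
$\binom{p+q}{p} = \binom{3}{2}$ discussed next.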

The Two-Families Theorem of Bollobás [1]
for extremal set systems implies that every family
of independent sets of size $p$ in a uniform matroid
has a $q$-representative family with at most
$\binom{p+q}{p}$ sets. The generalization of the
Two-Families Theorem to subspaces of a vector
space by Lovász [5] implies that every family of
independent sets of size $p$ in a linear matroid has
a $q$-representative family with at most
$\binom{p+q}{p}$ sets. In fact one can show that the
cardinality $\binom{p+q}{p}$ of a $q$-representative
family of a family of independent sets of size $p$
is optimal. It is important to note that the size
of the $q$-representative family of a family of sets
of size $p$ in a uniform or linear matroid only
depends on $p$ and $q$ and not on the cardinality
of the ground set of the matroid, and this fact is
used to design parameterized and exact
algorithms.


Key Results 


For uniform matroids, Monien [8] gave an
algorithm for computing a $q$-representative family
of size at most $\sum_{i=0}^{q} p^{i}$ in time
$O(pq \cdot \sum_{i=0}^{q} p^{i} \cdot t)$,
and Marx [6] gave another algorithm for
computing a $q$-representative family of size at most
$\binom{p+q}{p}$ in time $O(p^{q} \cdot t^{2})$.
For uniform matroids, Fomin et al. [2] proved the
following theorem.
Theorem 1 ([2, 9]) Let $\mathcal{S} = \{S_1, \ldots, S_t\}$ be
a family of sets of size $p$ over a universe of
size $n$ and let $0 < x < 1$. For a given $q$,
a $q$-representative family
$\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$ with at
most $x^{-p}(1-x)^{-q} \cdot 2^{o(p+q)} \cdot \log n$ sets can be
computed in time $O((1-x)^{-q} \cdot 2^{o(p+q)} \cdot t \cdot \log n)$.


In [3], Fomin, Lokshtanov, and Saurabh
proved Theorem 1 for $x = \frac{p}{p+q}$. That is, a
$q$-representative family
$\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$ with at most
$\binom{p+q}{p} \cdot 2^{o(p+q)} \cdot \log n$ sets can be computed
in time $O\left(\left(\frac{p+q}{q}\right)^{q} \cdot 2^{o(p+q)} \cdot t \cdot \log n\right)$.
Later Fomin et al. [2] observed that the proof in [3]
can be modified to work for every $0 < x < 1$
and allows an interesting trade-off between the
size of the computed representative families and
the time taken to compute them, and this trade-off
can be exploited algorithmically to speed
up “representative families based” algorithms.
Independently, at the same time, Shachnai and
Zehavi [9] also observed that the proof in [3]
can be generalized to get Theorem 1. We
would like to mention that in fact a variant of
Theorem 1 is proved in [2,9] which computes a
weighted $q$-representative family. The proof of
Theorem 1 uses an algorithmic variant of the “random
permutation” proof of Bollobás Lemma and an
efficient construction of a variant of universal
sets called $n$-$p$-$q$-separating collections.

For linear matroids, Marx [7] showed that
Lovász’s proof can be transformed into an
algorithm computing a $q$-representative family:


Theorem 2 ([7]) Given a linear representation
$A_M$ of a matroid $M = (E, \mathcal{I})$, a family
$\mathcal{S} = \{S_1, \ldots, S_t\}$ of independent sets of size $p$ and a
positive integer $q$, there is an algorithm which
computes $\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$ of size
$\binom{p+q}{p}$ in time
$2^{O(p \log(p+q))} \cdot \binom{p+q}{p}^{O(1)} \cdot (\|A_M\| t)^{O(1)}$, where
$\|A_M\|$ is the size of $A_M$.


Fomin, Lokshtanov, and Saurabh [3] gave an 
efficient computation of representative families in 
linear matroids: 


Theorem 3 ([3]) Let $M = (E, \mathcal{I})$ be a linear
matroid of rank $p + q = k$ given together with
its representation matrix $A_M$ over a field $F$. Let
$\mathcal{S} = \{S_1, \ldots, S_t\}$ be a family of independent sets
of size $p$. Then $\widehat{\mathcal{S}} \subseteq^{q}_{rep} \mathcal{S}$
with at most $\binom{p+q}{p}$ sets can be computed in
$O\left(\binom{p+q}{p} t p^{\omega} + t \binom{p+q}{q}^{\omega-1}\right)$
operations over $F$, where $\omega < 2.373$ is the matrix
multiplication exponent.


Matroids in Parameterized Complexity and Exact Algorithms


We draw the reader's attention to the fact
that in Theorems 2 and 3, the cardinality of the
computed $q$-representative family is optimal and
polynomial in $p$ and $q$ if one of $p$ or $q$ is a
constant, unlike in Theorem 1. As in the case
of Theorem 1, Theorem 3 is also proved for a
weighted $q$-representative family.

Most of the algorithms using representative
families are dynamic programming algorithms.
A class of families that often arises in dynamic
programming is that of product families. A family
$\mathcal{F}$ is called the product of two families
$\mathcal{A}$ and $\mathcal{B}$, where $\mathcal{A}$
and $\mathcal{B}$ are families of independent sets in a
matroid $M = (E, \mathcal{I})$, if
$\mathcal{F} = \{A \cup B \mid A \in \mathcal{A}, B \in \mathcal{B}, A \cap B = \emptyset, A \cup B \in \mathcal{I}\}$.
Fomin et al. [2] gave two algorithms to compute a
$q$-representative family of a product family
$\mathcal{F}$, one for uniform matroids and the
other for linear matroids. These algorithms
significantly outperform the naive way of computing
the product family $\mathcal{F}$ first and then a
representative family of it.


Applications 


Representative families are used to design effi- 
cient algorithms in parameterized complexity and 
exact algorithms. 


Parameterized and Exact Algorithms 

In this subsection we list some of the parameter- 
ized and exact algorithms obtained using repre- 
sentative families. 


1. $\ell$-MATROID INTERSECTION. In this problem
   we are given $\ell$ matroids
   $M_1 = (E, \mathcal{I}_1), \ldots, M_\ell = (E, \mathcal{I}_\ell)$
   along with their linear representations
   $A_{M_1}, \ldots, A_{M_\ell}$ over the same field $F$
   and a positive integer $k$. The objective is to
   find a $k$-element subset of $E$ that is
   independent in all the matroids
   $M_1, \ldots, M_\ell$. Marx [7] gave a randomized
   algorithm for the problem running in time
   $f(k, \ell) \left(\sum_{i=1}^{\ell} \|A_{M_i}\|\right)^{O(1)}$,
   where $f$ is a computable function. By giving an
   algorithm for deterministic truncation of linear
   matroids, Lokshtanov et al. [4] gave a
   deterministic algorithm for the problem running
   in time
   $2^{\omega k \ell} \left(\sum_{i=1}^{\ell} \|A_{M_i}\|\right)^{O(1)}$,
   where $\omega$ is the matrix multiplication
   exponent.

2. LONG DIRECTED CYCLE. In the LONG DIRECTED
   CYCLE problem, we are interested in finding a
   cycle of length at least $k$ in a directed graph.
   Fomin et al. [2] and Shachnai and Zehavi [9] gave
   an algorithm of running time
   $O(6.75^{k+o(k)} m n^{2} \log^{2} n)$ for this
   problem.

3. SHORT CHEAP TOUR. In this problem we are given
   an undirected $n$-vertex graph $G$, a weight
   function $w : E(G) \to \mathbb{N}$, and an
   integer $k$. The objective is to find a path of
   length $k$ with minimum weight. Fomin et al. [2]
   and Shachnai and Zehavi [9] gave an
   $O(2.619^{k} n^{O(1)} \log W)$ time algorithm for
   SHORT CHEAP TOUR, where $W$ is the largest edge
   weight in the given input graph.

4. MULTILINEAR MONOMIAL DETECTION. Here the input
   is an arithmetic circuit $C$ over $\mathbb{Z}^{+}$
   representing a polynomial $P(X)$ over
   $\mathbb{Z}^{+}$. The objective is to test whether
   $P(X)$, construed as a sum of monomials, contains
   a multilinear monomial of degree $k$. For this
   problem Fomin et al. [2] gave an algorithm of
   running time
   $O(3.8408^{k} 2^{o(k)} s(C) n \log^{2} n)$, where
   $s(C)$ is the size of the circuit.

5. MINIMUM EQUIVALENT GRAPH (MEG). In this problem
   we are seeking a spanning subdigraph $D'$ of a
   given $n$-vertex digraph $D$ with as few arcs as
   possible in which the reachability relation is
   the same as in the original digraph $D$. Fomin,
   Lokshtanov, and Saurabh [3] gave the first
   single-exponential exact algorithm, i.e., of
   running time $2^{O(n)}$, for the problem.

6. Dynamic Programming Over Graphs of Bounded
   Treewidth. Fomin et al. [2] gave algorithms with
   running time
   $O((1 + 2^{\omega-1} \cdot 3)^{tw} \, tw^{O(1)} n)$
   for FEEDBACK VERTEX SET and STEINER TREE, where
   $tw$ is the treewidth of the input graph, $n$ is
   the number of vertices in the input graph, and
   $\omega$ is the matrix multiplication exponent.


Open Problems 


1. Can we improve the running time for the com- 
putation of representative families in linear 
matroids or in specific matroids like graphic 
matroids? 
Problem Definition


Given an undirected edge-weighted graph, G = 
(V, E), the maximum cut problem (MAX CUT) is 
to find a bipartition of the vertices that maximizes 
the weight of the edges crossing the partition. If 
the edge weights are non-negative, then this prob- 
lem is equivalent to finding a maximum weight 
subset of the edges that forms a bipartite sub- 
graph, i.e., the maximum bipartite subgraph prob- 
lem. All results discussed in this article assume 
non-negative edge weights. MAX CUT is one 
of Karp’s original NP-complete problems [20]. 
In fact, it is NP-hard to approximate to within a
factor better than $\frac{16}{17}$ [17,35].


Max Cut 


For nearly 20 years, the best-known approx- 
imation factor for MAX CUT was half, which 
can be achieved by a very simple algorithm: 
form a set $S$ by placing each vertex in $S$ with
probability half. Since each edge crosses the cut 
(S,V \ S) with probability half, the expected 
value of this cut is half the total edge weight. 
This implies that for any graph, there exists a cut 
with value at least half of the total edge weight. 
In 1976, Sahni and Gonzalez presented a deter- 
ministic half-approximation algorithm for MAX 
CUT, which is essentially a de-randomization of 
the aforementioned randomized algorithm [31]: 
iterate through the vertices and form sets $S$ and $\bar{S}$
by placing each vertex in the set that maximizes
the weight of cut $(S, \bar{S})$ thus far. After each
iteration of this process, the weight of this cut
will be at least half of the weight of the edges
with both endpoints in $S \cup \bar{S}$.
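
The greedy procedure just described can be sketched in a few lines of Python.
This is an illustrative sketch rather than the presentation in [31]: the
function name greedy_max_cut, the (u, v, w) edge-list format, and the
tie-breaking rule are assumptions made here, and the input is assumed to have
no self-loops and only non-negative weights.

```python
def greedy_max_cut(n, weighted_edges):
    """Greedy half-approximation for MAX CUT.

    weighted_edges is a list of (u, v, w) with 0 <= u, v < n and w >= 0.
    Returns (side, cut) where side[v] is 0 for S and 1 for its complement.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = {}
    for v in range(n):
        # Weight from v to the neighbors already placed on each side.
        to_s = sum(w for u, w in adj[v] if side.get(u) == 0)
        to_sbar = sum(w for u, w in adj[v] if side.get(u) == 1)
        # Place v opposite the heavier side so the larger weight is cut.
        side[v] = 1 if to_s >= to_sbar else 0

    cut = sum(w for u, v, w in weighted_edges if side[u] != side[v])
    return side, cut


# Complete graph K_4 with unit weights: total weight 6.
k4 = [(i, j, 1) for i in range(4) for j in range(i + 1, 4)]
side, cut = greedy_max_cut(4, k4)
print(side, cut)
```

Each vertex cuts at least half of the weight to its already placed neighbors,
which is where the factor of one half comes from.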

This simple half-approximation algorithm 
uses the fact that for any graph with non-negative 
edge weights, the total edge weight of a given 
graph is an upper bound on the value of its 
maximum cut. There exist classes of graphs 
for which a maximum cut is arbitrarily close 
to half the total edge weight, i.e., graphs for 
which this “trivial” upper bound can be close to 
twice the true value of an optimal solution. An 
example of such a class of graphs is complete
graphs on $n$ vertices, $K_n$. In order to obtain an
approximation factor better than half, one must 
be able to compute an upper bound on the value 
of a maximum cut that is better, i.e., smaller, than 
the trivial upper bound for such classes of graphs. 


Linear Programming Relaxations 

For many optimization (maximization) problems, 
linear programming has been shown to yield 
better (upper) bounds on the value of an optimal 
solution than can be obtained via combinatorial 
methods. There are several well-studied linear 
programming relaxations for MAX CUT. For
example, a classical integer program has a variable
$x_e$ for each edge and a constraint for each odd
cycle, requiring that an odd cycle $C$ contribute at
most $|C| - 1$ edges to an optimal solution.

$$
\begin{aligned}
\max \; & \sum_{e \in E} w_e x_e \\
& \sum_{e \in C} x_e \le |C| - 1 \quad \forall \text{ odd cycles } C \\
& x_e \in \{0, 1\}.
\end{aligned}
$$

The last constraint can be relaxed so that each
$x_e$ is required to lie between 0 and 1, but need
not be integral, i.e., $0 \le x_e \le 1$. Although
this relaxation may have exponentially many con- 
straints, there is a polynomial-time separation 
oracle (equivalent to finding a minimum weight 
odd cycle), and thus, the relaxation can be solved 
in polynomial time [14]. Another classical integer 
program contains a variable $x_{ij}$ for each pair of
vertices. In any partition of the vertices, either
zero or two edges from a three-cycle cross the
cut. This requirement is enforced in the following
integer program. If edge $(i, j) \notin E$, then $w_{ij}$ is
set to 0.

$$
\begin{aligned}
\max \; & \sum_{i,j \in V} w_{ij} x_{ij} \\
& x_{ij} + x_{jk} + x_{ki} \le 2 \quad \forall i, j, k \in V \\
& x_{ij} + x_{jk} - x_{ki} \ge 0 \quad \forall i, j, k \in V \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
$$

Again, the last constraint can be relaxed so that
each $x_{ij}$ is required to lie between 0 and 1. In
contrast to the aforementioned cycle-constraint- 
based linear program, this linear programming re- 
laxation has a polynomial number of constraints. 

Both of these relaxations actually have the 
same optimal value for any graph with non- 
negative edge weights [3,26,30]. (For a simplified 
proof of this, see [25].) Poljak showed that the 
integrality gap for each of these relaxations is 
arbitrarily close to 2 [26]. In other words, there 
are classes of graphs that have a maximum cut 
containing close to half of the edges, but for 
which each of the above relaxations yields an 
upper bound close to all the edges, i.e., no better 
than the trivial “all-edges” bound. In particular, 
graphs with a maximum cut close to half the 
edges and with high girth can be used to demon- 
strate this gap. A comprehensive look at these 
linear programming relaxations is contained in
the survey of Poljak and Tuza [30]. 

Another natural integer program uses vari- 
ables for vertices rather than edges: 


$$\max \sum_{(i,j) \in E} w_{ij} \left( x_i (1 - x_j) + x_j (1 - x_i) \right) \qquad (1)$$

$$x_i \in \{0, 1\} \quad \forall i \in V. \qquad (2)$$

Replacing (2) with $x_i \in [0, 1]$ results in a
nonlinear relaxation that is actually just as hard to
solve as the integer program. This follows from the
fact that any fractional solution can be rounded
to obtain an integer solution with at least the
same value. Indeed, for any vertex $h \in V$ with
fractional value $x_h$, we can rewrite the objective
function (1) as follows. Edges adjacent to vertex
$h$ are denoted by $\delta(h)$. For ease of notation, let
us momentarily assume the graph is unweighted,
although the argument works for non-negative
edge weights.

$$\sum_{(i,j) \in E \setminus \delta(h)} \left( x_i (1 - x_j) + x_j (1 - x_i) \right) + x_h \underbrace{\sum_{j \in \delta(h)} (1 - x_j)}_{A} + (1 - x_h) \underbrace{\sum_{j \in \delta(h)} x_j}_{B}. \qquad (3)$$

If $A > B$, we round $x_h$ to 1, otherwise we round
it to 0. Repeating this process for all vertices
results in an integral solution whose objective
value is no less than the objective value of the
initial fractional solution.
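
The rounding argument can be illustrated with a short sketch. It is written
under assumptions chosen here for illustration: edges are given as (i, j, w)
triples with non-negative weights, and the names objective and
round_fractional are hypothetical. Because the objective is linear in each
single variable, setting a vertex to 1 when A > B and to 0 otherwise cannot
decrease the value.

```python
def objective(x, edges):
    """Value of objective (1) for a (possibly fractional) assignment x."""
    return sum(w * (x[i] * (1 - x[j]) + x[j] * (1 - x[i])) for i, j, w in edges)


def round_fractional(x, edges):
    """Round each coordinate in turn without decreasing the objective."""
    x = list(x)
    for h in range(len(x)):
        # Edges touching h, each with the endpoint other than h.
        incident = [(j if i == h else i, w) for i, j, w in edges if h in (i, j)]
        a = sum(w * (1 - x[u]) for u, w in incident)  # coefficient of x_h
        b = sum(w * x[u] for u, w in incident)        # coefficient of 1 - x_h
        x[h] = 1 if a > b else 0
    return x


triangle = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
frac = [0.5, 0.5, 0.5]
rounded = round_fractional(frac, triangle)
print(objective(frac, triangle), rounded, objective(rounded, triangle))
```

On the unit-weight triangle, the all-halves solution has value 1.5 and the
rounded solution cuts two edges.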


Eigenvalue Upper Bounds 

Delorme and Poljak [8] presented an eigenvalue 
upper bound on the value of a maximum cut, 
which was a strengthened version of a previous 
eigenvalue bound considered by Mohar and Pol- 
jak [24]. Computing Delorme and Poljak’s upper 
bound is equivalent to solving an eigenvalue min- 
imization problem. They showed that their bound 
is computable in polynomial time with arbitrary 
precision. In a series of works, Delorme, Poljak,
and Rendl showed that this upper bound behaves
“differently” from the linear programming-based
upper bounds. For example, they studied classes
of sparse random graphs (e.g., G(n, p) with p = 
50/n) and showed that their upper bound is close 
to optimal on these graphs [9]. Since graphs of 
this type can also be used to demonstrate an 
integrality gap arbitrarily close to 2 for the afore- 
mentioned linear programming relaxations, their 
work highlighted contrasting behavior between 
these two upper bounds. Further computational 
experiments on other classes of graphs gave more 
evidence that the bound was indeed stronger than 
previously studied bounds [27,29]. Delorme and 
Poljak conjectured that the five cycle
demonstrated the worst-case behavior for their bound: a
ratio of $\frac{32}{25 + 5\sqrt{5}} \approx 0.88445$ between their bound
and the optimal integral solution. However, they
could not prove that their bound was strictly less 
than twice the value of a maximum cut in the 
worst case. 


Key Result 


In 1994, Goemans and Williamson presented a 
randomized 0.87856-approximation algorithm 
for MAX CUT [12]. Their breakthrough work was 
based on rounding a semidefinite programming 
relaxation and was the first use of semidefinite 
programming in approximation algorithms. 
Poljak and Rendl showed that the upper bound 
provided by this semidefinite relaxation is 
equivalent to the eigenvalue bound of Delorme 
and Poljak [28]. Thus, Goemans and Williamson 
proved that the eigenvalue bound of Delorme and 
Poljak is no more than 1.138 times the value of a 
maximum cut. 


A Semidefinite Relaxation 

MAX CUT can be formulated as the following 
quadratic integer program, which is NP-hard to 
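solve. Each vertex $i \in V$ is represented by a
variable $y_i$, which is assigned either 1 or $-1$
depending on which side of the cut it appears.

$$
\begin{aligned}
\max \; & \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - y_i y_j) \\
& y_i \in \{-1, 1\} \quad \forall i \in V.
\end{aligned}
$$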


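Goemans and Williamson considered the following
relaxation of this integer program, in which
each vertex is represented by a unit vector.

$$
\begin{aligned}
\max \; & \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - v_i \cdot v_j) \\
& v_i \cdot v_i = 1 \quad \forall i \in V \\
& v_i \in \mathbb{R}^{n} \quad \forall i \in V.
\end{aligned}
$$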


They showed that this relaxation is equivalent to 
a semidefinite program. Specifically, consider the 
following semidefinite relaxation: 


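$$
\begin{aligned}
\max \; & \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - y_{ij}) \\
& y_{ii} = 1 \quad \forall i \in V \\
& Y \text{ positive semidefinite.}
\end{aligned}
$$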


The equivalence of these two relaxations is due 
to the fact that a matrix Y is positive semidefinite 
if and only if there is a matrix $B$ such that
$B^{T} B = Y$. The latter relaxation can be solved to
within arbitrary precision in polynomial time via
the ellipsoid algorithm, since it has a polynomial-
time separation oracle [15]. Thus, a solution to
the first relaxation can be obtained by finding a
solution to the second relaxation and finding a
matrix $B$ such that $B^{T} B = Y$. If the columns
of $B$ correspond to the vectors $\{v_i\}$, then
$y_{ij} = v_i \cdot v_j$, yielding a solution to the first relaxation.


Random-Hyperplane Rounding 

Goemans and Williamson showed how to round 
the semidefinite programming relaxation of MAX 
CUT using a new technique that has since become 
known as “random-hyperplane rounding” [12]. 
First obtain a solution to the first relaxation, 
which consists of a set of unit vectors $\{v_i\}$, one
vector for each vertex. Then choose a random
vector $r \in \mathbb{R}^{n}$ in which each coordinate of $r$
is chosen from the standard normal distribution.
Finally, set $S = \{i \mid v_i \cdot r > 0\}$ and output the cut
$(S, V \setminus S)$.
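
The rounding step alone can be sketched as follows. The sketch assumes the
unit vectors have already been obtained from some SDP solver, which is not
shown; the function name hyperplane_round and the dictionary input format are
hypothetical choices made for illustration.

```python
import random


def hyperplane_round(vectors, rng=random):
    """Random-hyperplane rounding.

    vectors maps each vertex to a unit vector (a tuple of floats), assumed to
    come from a solution of the vector relaxation. Returns the set S of
    vertices whose vector has positive dot product with a Gaussian vector r.
    """
    dim = len(next(iter(vectors.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {i for i, v in vectors.items()
            if sum(a * b for a, b in zip(v, r)) > 0}


# Two orthogonal unit vectors: the angle is pi/2, so the edge between them
# should be cut in roughly half of the trials.
rng = random.Random(1)
vecs = {0: (1.0, 0.0), 1: (0.0, 1.0)}
trials = 4000
cut_count = sum(len(hyperplane_round(vecs, rng)) == 1 for _ in range(trials))
print(cut_count / trials)
```

The empirical frequency in the usage example is only a sanity check of the
separation probability analyzed in the text that follows.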


The probability that a particular edge $(i, j) \in E$
crosses the cut is equal to the probability
that the dot products $v_i \cdot r$ and $v_j \cdot r$ differ in
sign. This probability is exactly equal to $\theta_{ij}/\pi$,
where $\theta_{ij}$ is the angle between vectors $v_i$ and
$v_j$. Thus, the expected weight of edges crossing
the cut is equal to $\sum_{(i,j) \in E} \theta_{ij}/\pi$. How large is
this compared to the objective value given by the
semidefinite programming relaxation, i.e., what is
the approximation ratio?

Define $\alpha_{gw}$ as the worst-case ratio of the
expected contribution of an edge to the cut, to
its contribution to the objective function of the
semidefinite programming relaxation. In other
words: $\alpha_{gw} = \min_{0 \le \theta \le \pi} \frac{2}{\pi} \cdot \frac{\theta}{1 - \cos\theta}$. It can be
shown that $\alpha_{gw} > 0.87856$. Thus, the expected
value of a cut is at least $\alpha_{gw} \cdot SDP_{OPT}$, resulting
in an approximation ratio of at least 0.87856 for
MAX CUT. The same analysis applies to weighted
graphs with non-negative edge weights.
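
The constant can be checked numerically. The sketch below evaluates the ratio
of an edge's cut probability, theta/pi, to its contribution to the relaxation,
(1 - cos theta)/2, on a fine grid of angles. The grid search and the function
names ratio and alpha_gw are illustrative assumptions, not part of the
original analysis.

```python
import math


def ratio(theta):
    """(theta / pi) divided by (1 - cos(theta)) / 2, for 0 < theta <= pi."""
    return (theta / math.pi) / ((1.0 - math.cos(theta)) / 2.0)


def alpha_gw(steps=200000):
    """Approximate the minimum of ratio over (0, pi] on a uniform grid."""
    return min(ratio(math.pi * k / steps) for k in range(1, steps + 1))


print(round(alpha_gw(), 5))
```

The grid minimum is slightly above the true minimum and agrees with the
value 0.87856 quoted above to five decimal places.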

This algorithm was de-randomized by Maha- 
jan and Hariharan [23]. Goemans and Williamson 
also applied their random-hyperplane rounding 
techniques to give improved approximation guar- 
antees for other problems such as MAX-DICUT 
and MAX-2SAT. 


Integrality Gap and Hardness 

Karloff showed that there exist graphs for which
the best hyperplane is only a factor $\alpha_{gw}$ of the
maximum cut [19], showing that there are graphs
for which the analysis in [12] is tight. Since the
optimal SDP value for such graphs equals the
optimal value of a maximum cut, these graphs
cannot be used to demonstrate an integrality gap.
However, Feige and Schechtman showed that
there exist graphs for which the maximum cut is
an $\alpha_{gw}$ fraction of the SDP bound [10], thereby
establishing that the approximation guarantee of
Goemans and Williamson’s algorithm matches
the integrality gap of their semidefinite programming
relaxation. Recently, Khot, Kindler, Mossel,
and O’Donnell [22] showed that if the Unique
Games Conjecture of Khot [21] is true, then it is
NP-hard to approximate MAX CUT to within any
factor larger than $\alpha_{gw}$.
Better-than-Half Approximations Without SDPs

Since Goemans and Williamson presented an
$\alpha_{gw}$-approximation algorithm for MAX CUT, it
has been an open question if one can obtain a
matching approximation factor or even an
approximation ratio of $\frac{1}{2} + \epsilon$ for some constant
$\epsilon > 0$ without using SDPs. Trevisan presented an
algorithm based on spectral partitioning with
running time $O(n^{2})$ and an approximation guarantee
of 0.531 [34], which was subsequently improved
to 0.614 by Soto [32].


Applications 


The work of Goemans and Williamson paved 
the way for the further use of semidefinite 
programming in approximation algorithms, 
particularly for graph partitioning problems. 
Methods based on the random-hyperplane 
technique have been successfully applied to many 
optimization problems that can be categorized 
as partition problems. A few examples are 
3-COLORING [18], MAX-3-CUT [7, 11, 13], MAX- 
BISECTION [16], CORRELATION CLUSTER- 
ING [5,33], and SPARSEST CUT [2]. Additionally, 
some progress has been made in extending 
semidefinite programming techniques outside 
the domain of graph partitioning to problems 
such as BETWEENNESS [6], BANDWIDTH [4], 
and LINEAR EQUATIONS mod p [1]. 
Problem Definition


The MAX LEAF SPANNING TREE problem asks 
us to find a spanning tree with at least k leaves 
in an undirected graph. The decision version of 
parameterized MAX LEAF SPANNING TREE is 
the following: 


MAX LEAF SPANNING TREE 

INPUT: A connected graph G, and an integer k. 
PARAMETER: An integer k. 

QUESTION: Does G have a spanning tree with at 
least k leaves? 


The parameterized complexity of the
NP-complete
MAX LEAF SPANNING TREE problem has been
extensively studied [2, 3, 9, 11] using a variety 
of kernelization, branching and other fixed- 
parameter tractable (FPT) techniques. The au- 
thors are the first to propose an extremal structure 
method for hard computational problems. The 
method, following in the sense of Grothendieck 
and in the spirit of the graph minors project of 
Robertson and Seymour, is that a mathematical 
project should unfold as a series of small steps in 
an overall trajectory that is described by the ap- 
propriate “mathematical machine.” The authors 
are interested in statements of the type: Every
connected graph on n vertices that satisfies a
certain set of properties has a spanning tree with at
least k leaves, and this spanning tree can be found
in time $O(f(k) + n^{c})$, where c is a constant
(independent of k) and f is an arbitrary function.

In parameterized complexity, the value k is 
called the parameter and is used to capture some 
structure of the input or other aspect of the 
computational objective. For example, k might 
be the number of edges to be deleted in order 
to obtain a graph with no cycles, or k might 
be the number of DNA sequences to be aligned 
in an alignment, or k may be the maximum 
type-declaration nesting depth of a compiler, or 
$k = 1/\epsilon$ may be the parameterization in the
analysis of approximation, or k might be a composite
of several variables. 

There are two important ways of comparing 
FPT algorithms, giving rise to two FPT races. 
In the “f(k)” race, the competition is to find
ever more slowly growing parameter functions
f(k) governing the complexity of FPT algorithms.
The “kernelization race” refers to the following 
lemma stating that a problem is in FPT if and 
only if the input can be preprocessed (kernelized) 
in “ordinary” polynomial time into an instance 
whose size is bounded by a function of k only. 


Lemma 1 A parameterized problem $\Pi$ is in
FPT if and only if there is a polynomial-time
transformation (in both n and k) that takes (x, k)
to (x’, k’) such that:


(1) (x, k) is a yes-instance of $\Pi$ if and only if
(x’, k’) is a yes-instance of $\Pi$,

(2) $k' \le k$, and

(3) $|x'| \le g(k)$ for some fixed function g.


In the situation described by the lemma, say that 
we can kernelize to instances of size at most 
g(k). Although the two races are often closely 
related, the result is not always the same. The 
current best FPT algorithm for MAX LEAF is due 
to Bonsma [1] (following the extremal structure 
approach outlined by the authors) with a running 
time of $O^{*}(8.12^{k})$ to determine whether a graph
G on n vertices has a spanning tree with at least 
k leaves; however the authors present the FPT 
algorithm with the smallest kernel size. 

The authors list five independent deliverables 
associated to the extremal structure theory, and 
illustrate all of the objectives for the MAX LEAF 
problem. The five objectives are: 


(A) Better FPT algorithms as a result of deeper
structure theory, more powerful reduction
rules associated with that structure theory,
and stronger inductive proofs of improved
kernelization bounds.

(B) Powerful preprocessing (data
reduction/kernelization) rules and combinations
of rules that can be used regardless of
whether the parameter is small and that can
be combined with other approaches, such
as approximation and heuristics. These are
usually easy to program.

(C) Gradients and transformation rules for local
search heuristics.

(D) Polynomial-time approximation algorithms
and performance bounds proved in a
systematic way.

(E) Structure to exploit for solving other
problems.


Key Results 


The key results are programmatic, providing 
a method of extremal structure as a systematic 
method for designing FPT algorithms. The five 
interrelated objectives listed above are surveyed, 
and each is illustrated using the MAX LEAF 
SPANNING TREE problem. 


Objective A: FPT Algorithms 
The objective here is to find polynomial-time 
preprocessing (kernelization) rules where g(k) is 
as small as possible. This has a direct payoff in 
terms of program objective B. 

Rephrased as a structure theory question, the 
crucial issue is: What is the structure of graphs 
that do not have a subgraph with k leaves? 


Max Leaf Spanning Tree 


Max Leaf Spanning Tree, Fig. 1 Reduction rules were 
developed in order to reduce this Kleitman—West graph 
structure 


A graph theory result due to Kleitman and 
West shows that a graph of minimum degree 
at least 3, that excludes a k-leaf subgraph, 
has at most 4(k − 3) vertices. Figure 1 shows
that this is the best possible result for this 
hypothesis. However, investigating the structure 
using extremal methods reveals the need for 
the reduction rule of Fig. 2. About 20 different 
polynomial-time reduction rules (some much 
more complex and “global” in structure than 
the simple local reduction rule depicted) are 
sufficient to kernelize to a graph of minimum 
degree 2 having at most 3.5k vertices. 

In general, an instance of a parameterized
problem consists of a pair (x, k). A “boundary”
is located by holding x fixed, varying k, and
observing whether the outcome of the
decision problem is yes or no. Of interest is the
boundary when x is reduced. A typical boundary
lemma looks like the following.


Lemma 2 Suppose (G, k) is a reduced instance
of MAX LEAF, with (G, k) a yes-instance and
(G, k + 1) a no-instance. Then $|G| \le ck$. (Here
c is a small constant that becomes clarified
during the investigation.)


A proof of a boundary lemma is by minimum 
counterexample. A counterexample would be 
a graph such that (1) (G, k) is reduced, (2) (G, k) 
is a yes-instance of MAX LEAF, (3) (G,k + 1) is 
a no-instance, and (4) |G| > ck. 

The proof of a boundary lemma unfolds grad- 
ually. Initially, it is not known what bound will 

Max Leaf Spanning Tree, Fig. 2 A reduction rule for the Kleitman–West graph


eventually succeed and it is not known exactly 
what is meant by reduced. In the course of an 
attempted proof, these details are worked out. As 
the arguments unfold, structural situations will 
suggest new reduction rules. Strategic choices 
involved in a boundary lemma include: 


(1) Determining the polarity of the boundary, and 
setting up the boundary lemma. 

(2) Choosing a witness structure. 

(3) Setting inductive priorities. 

(4) Developing a series of structural claims that 
describe the situation at the boundary. 

(5) Discovering reduction rules that can act in 
polynomial-time on relevant structural situa- 
tions at the boundary. 

(6) As the structure at the boundary becomes 
clear, filling in the blank regarding the ker- 
nelization bound. 


The overall structure of the argument is “by 
minimum counterexample” according to the pri- 
orities established by choice 3, which generally 
make reference to choice 2. The proof proceeds 
by a series of small steps consisting of structural 
claims that lead to a detailed structural picture at
the “boundary” and thereby to the bound on the
size of G that is the conclusion of the lemma. The
complete proof assembles a series of claims made 
against the witness tree, various sets of vertices, 
and inductive priorities and sets up a master 
inequality leading to a proof by induction, and 
a 3.5k problem kernel. 


Objective B: Polynomial-Time
Preprocessing and Data-Reduction Routines

The authors have designed a table for tracing 
each possible boundary state for a possible solu- 
tion. Examples are given that show the surprising 
power of cascading data-reduction rules on real 


input distributions and that describe a variety of 
mathematical phenomena relating to reduction 
rules. For example, some reduction rules, such as 
the Kleitman—West dissolver rule for MAX LEAF 
(Fig. 2), have a fixed “boundary size” (in this 
case 2), whereas crown-type reduction rules do 
not have a fixed boundary size. 


Objective C: Gradients and Solution 
Transformations for Local Search 

A generalization of the usual setup for local 
search is given, based on the mathematical power 
of the more complicated gradient in obtaining 
superior kernelization bounds. Idea 1 is that lo- 
cal search be conducted based on maintaining 
a “current witness structure” rather than a full 
solution (spanning tree). Idea 2 is to use the list 
of inductive priorities to define a “better solution” 
gradient for the local search. 


Objective D: Polynomial-Time
Approximation Algorithms

The polynomial-time extremal structure theory
leads directly to a constant-factor polynomial-time
approximation algorithm for MAX LEAF. First, reduce
G using the kernelization rules. The rules are
approximation-preserving. Take any tree T (not
necessarily spanning) in G. If all of the structural
claims hold, then (by the boundary lemma
arguments) the tree T must have at least n/c leaves
for c = 3.75. Therefore, lifting T back along the
reduction path, we obtain a c-approximation.

If at least one of the structural claims does not 
hold, then the tree T can be improved against one 
of the inductive priorities. Notice that each claim 
is proved by an argument that can be interpreted 
as a polynomial-time routine that improves T,
when the claim is contradicted. 

These consequences can be applied to the
original T (and its successors) only a
polynomial number of times (determined by the list of
inductive priorities) until one arrives at a tree
T’ for which all of the various structural claims
hold. At that point, we must have a c-approximate
solution.


Max Leaf Spanning Tree, Table 1 The complexity ecology of parameters


      TW          BW          VC          DS          G      ML
TW    FPT         W[1]-hard   FPT         FPT         ?      FPT
BW    FPT         W[1]-hard   FPT         FPT         ?      FPT
VC    FPT         ?           FPT         FPT         ?      FPT
DS    ?           ?           W[1]-hard   W[1]-hard   ?      ?
G     W[1]-hard   W[1]-hard   W[1]-hard   W[1]-hard   FPT    ?
ML    FPT         ?           FPT         FPT         FPT    ?


Objective E: Structure to Exploit in the
Ecology of Complexity

The objective here is to understand how every 
input-governing problem parameter affects the 
complexity of every other problem. As a small 
example, consider Table 1, using the shorthand
TW is TREEWIDTH, BW is BANDWIDTH, VC 
is VERTEX COVER, DS is DOMINATING SET, 
G is GENUS and ML is MAX LEAF. The entry 
in the second row and fourth column indicates 
that there is an FPT algorithm to optimally solve 
the DOMINATING SET problem for a graph G of 
bandwidth at most k. The entry in the fourth row 
and second column indicates that it is unknown 
whether BANDWIDTH can be solved optimally by 
an FPT algorithm when the parameter is a bound 
on the domination number of the input. 

MAX LEAF applies to the last row of the 
table. For graphs of max leaf number bounded 
by k, the maximum size of an independent set 
can be computed in time O*(2.972^k) based on
a reduction to a kernel of size at most 7k. There 
is a practical payoff for using the output of one 
problem as the input to another. 


Applications 


The MAX LEAF SPANNING TREE problem has 
motivations in computer graphics for creating 
triangle strip representations for fast interactive 
rendering [5]. Other applications are found in the 


area of traffic grooming and network design, such 
as the design of optical networks and the uti- 
lization of wavelengths in order to minimize net- 
work cost, either in terms of the line-terminating 
equipment deployed or in terms of electronic 
switching [6]. The minimum-energy problem in 
wireless networks consists of finding a trans- 
mission radius vector for all stations in such 
a way that the total transmission power of the 
whole network is the least possible. A restricted 
version of this problem is equivalent to the MAX 
LEAF SPANNING TREE problem [7]. Finding 
spanning trees with many leaves is equivalent to 
finding small connected dominating sets and is
also called the MINIMUM CONNECTED DOMINATING
SET problem [13].
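
The equivalence stated in the last sentence can be checked on a small example. The following is a minimal sketch, not taken from the cited work: the example graph, the spanning tree, and the helper names leaves_of_tree and is_connected_dominating_set are all assumptions made for illustration. It verifies that the internal (non-leaf) vertices of a spanning tree form a connected dominating set, so a spanning tree with many leaves yields a small such set.

```python
from collections import deque

def leaves_of_tree(tree_edges):
    """Return the degree-1 vertices of a tree given as an edge list."""
    degree = {}
    for u, v in tree_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {v for v, d in degree.items() if d == 1}

def is_connected_dominating_set(graph, subset):
    """Check that `subset` dominates `graph` and induces a connected subgraph."""
    subset = set(subset)
    if not subset:
        return False
    # Domination: every vertex is in the subset or has a neighbour in it.
    for v in graph:
        if v not in subset and not (set(graph[v]) & subset):
            return False
    # Connectivity: breadth-first search restricted to the subset.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w in subset and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == subset

# Hypothetical example: a wheel with hub 0 and rim 1..5 (adjacency lists).
graph = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2, 5],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [0, 3, 5],
    5: [0, 4, 1],
}
star_tree = [(0, v) for v in range(1, 6)]              # 5 leaves
path_tree = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]    # 2 leaves

star_internal = set(graph) - leaves_of_tree(star_tree)
path_internal = set(graph) - leaves_of_tree(path_tree)
```

In this example the star has five leaves and a connected dominating set of size one, while the path has two leaves and a connected dominating set of size four; in both cases the number of leaves plus the size of the set equals the number of vertices.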


Open Problems 


Branching Strategies 

While extremal structure is in some sense the 
right way to design an FPT algorithm, this is 
not the only way. In particular, the recipe is 
silent on what to do with the kernel. An open 
problem is to find general strategies for employ- 
ing “parameter-appropriate structure theory” in 
branching strategies for sophisticated problem 
kernel analysis. 


Turing Kernelizability 

The polynomial-time transformation of (x, k) to 
the simpler reduced instance (x’, k’) is a many:1 
transformation. One can generalize the notion 
of many:1 reduction to Turing reduction. How 
should the quest for p-time extremal theory un- 
fold under this “more generous” FPT? 
Algorithmic Forms of the Boundary Lemma Approach

The hypothesis of the boundary lemma that (G, 
k) is a yes-instance implies that there exists a wit- 
ness structure to this fact. There is no assumption 
that one has algorithmic access to this structure, 
and when reduction rules are discovered, these 
have to be transformations that can be applied 
to (G, k) and a structure that can be discovered 
in (G, k) in polynomial time. In other words, 
reduction rules cannot be defined with respect to 
the witness structure. Is it possible to describe 
more general approaches to kernelization where 
the witness structure used in the proof of the 
boundary lemma is polynomial-time computable, 
and this structure provides a conditional context 
for some reduction rules? How would this change 
the extremal method recipe? 


Problem Annotation 

One might consider a generalized MAX LEAF 
problem where vertices and edges have various 
annotations as to whether they must be leaves (or 
internal vertices) in a solution, etc. Such a gen- 
eralized form of the problem would generally be 
expected to be “more difficult” than the vanilla 
form of the problem. However, several of the 
“best known” FPT algorithms for various prob- 
lems, are based on these generalized, annotated 
forms of the problems. Examples include PLA- 
NAR DOMINATING SET and FEEDBACK VER- 
TEX SET [4]. Should annotation be part of the 
recipe for the best possible polynomial-time ker- 
nelization? 
Problem Definition


In a scheduling problem we have to find an opti- 
mal schedule of jobs. Here we consider the paral- 
lel machines case, where m machines are given, 
and we can use them to schedule the jobs. In the 
most fundamental model, each job has a known 
processing time, and to schedule the job we have 
to assign it to a machine, and we have to give its 
starting time and a completion time, where the 
difference between the completion time and the 
starting time is the processing time. No machine 
may simultaneously run two jobs. If no further
assumptions are given then the machines can 
schedule the jobs assigned to them without an idle 
time and the total time required to schedule the 
jobs on a machine is the sum of the processing 
times of the jobs assigned to it. We call this value 
the load of the machine. 

Concerning the machine environment, three
different models are considered. If the processing
time of a job is the same on each machine, then
we call the machines identical machines. If each
machine i has a speed s_i, each job j has a processing
weight p_j, and the processing time of job j on the
i-th machine is p_j/s_i, then we call the machines
related machines. If the processing time of job
j is given by an arbitrary positive vector P_j =
(p_j(1), ..., p_j(m)), where the processing time
of the job on the i-th machine is p_j(i), then we
call the machines unrelated machines. Here we
consider the identical machine case unless
stated otherwise.
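
As a small worked example of the three machine models, the sketch below computes machine loads for one fixed assignment. The job data, the speeds, and the helper name machine_loads are hypothetical and chosen only for illustration.

```python
def machine_loads(assignment, m, time_on):
    """Sum the processing times of the jobs assigned to each of m machines.

    assignment[j] is the machine index of job j;
    time_on(j, i) is the processing time of job j on machine i.
    """
    loads = [0.0] * m
    for j, i in enumerate(assignment):
        loads[i] += time_on(j, i)
    return loads

# Hypothetical data: three jobs, two machines.
weights = [4, 2, 6]                     # p_j
speeds = [1, 2]                         # s_i (related machines)
unrelated = [(4, 1), (2, 5), (6, 3)]    # P_j = (p_j(1), p_j(2))
assignment = [0, 1, 1]                  # job 0 on machine 0, jobs 1 and 2 on machine 1

identical_loads = machine_loads(assignment, 2, lambda j, i: weights[j])
related_loads = machine_loads(assignment, 2, lambda j, i: weights[j] / speeds[i])
unrelated_loads = machine_loads(assignment, 2, lambda j, i: unrelated[j][i])
```

The same assignment gives minimum load 4 in all three models here, but the load of the second machine changes from 8 (identical) to 4 (related, speed 2) to 8 (unrelated).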

Many objective functions are considered for
scheduling problems. Here we consider only
models where the goal is to maximize
the minimal load, a problem proposed
in [4]. We note that the most usual objective
is to minimize the maximal load, which
is called the makespan. The objective considered here is,
in some sense, the dual of the makespan, but both
objectives require balancing the loads of the
machines.

A straightforward reduction from the well-known
NP-hard partition problem shows that the
investigated problem is NP-hard. Therefore one
main research question is to develop polynomial-time
approximation algorithms, which cannot ensure
an optimal solution but always give a solution
that is not much worse than the optimal
one. These approximation algorithms are usually
evaluated by the approximation ratio. For
maximization problems, an algorithm is called a
c-approximation if the objective value given by
the algorithm is at least c times
the optimal objective value. If we have a polynomial-time
(1 − ε)-approximation algorithm for
every ε > 0, then this class of algorithms is
called a polynomial-time approximation scheme (PTAS
for short). If these algorithms are also polynomial
in 1/ε, then we call the class a fully polynomial-time
approximation scheme (FPTAS for short).


Key Results 


Approximation Algorithms 

Since we plan to balance the load on the ma- 
chines, a straightforward idea to find a solution 
is to use some greedy method to schedule the 
jobs. If we schedule the jobs one by one, then 
a greedy algorithm assigns the job to the ma- 
chine with the smallest load. Unfortunately if 
we schedule the jobs in arbitrary order, then this 
algorithm does not have constant approximation 
ratio. In the worst inputs the problem is caused by 
the large jobs which are scheduled at the end of 
the input. Therefore, the next idea is to order the
jobs by decreasing size and schedule them one
by one, assigning the current job to the machine
with the smallest load. This algorithm is called
LPT (longest processing time) and is analyzed in
[4] and [3]. The first analysis was presented in
[4], where the authors proved that the algorithm
is a 3/4-approximation and also proved that no
greater approximation ratio can be given for the
algorithm as the number of machines tends to ∞.
In [3] a more sophisticated analysis is given; the
authors proved that the exact approximation ratio is
(3m − 1)/(4m − 2). Later, in [7], a PTAS was
presented for the problem. The time complexity
of the algorithm is O(c_ε · m · m), where c_ε is
a constant which depends exponentially on ε.
Thus the presented class of algorithms is not
an FPTAS. It is worth noting, however, that we cannot
expect an FPTAS for the problem: it belongs
to the class of strongly NP-hard problems;
thus an FPTAS would yield P = NP. The case of
unrelated machines is much more difficult. In [2]
it is proved that no approximation algorithm with
ratio better than 1/2 exists unless P = NP; therefore we
cannot expect a PTAS in this case.
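
The LPT rule described above can be sketched as follows for identical machines. This is an illustrative implementation only; the function name lpt_schedule and the example instance are assumptions, and a binary heap is just one convenient way to find the least loaded machine.

```python
import heapq

def lpt_schedule(processing_times, m):
    """LPT: take jobs in non-increasing order of size and put each one
    on the machine whose current load is smallest.

    Returns (jobs per machine, minimum machine load).
    """
    heap = [(0, i) for i in range(m)]        # (current load, machine index)
    heapq.heapify(heap)
    machines = [[] for _ in range(m)]
    for p in sorted(processing_times, reverse=True):
        load, i = heapq.heappop(heap)        # least loaded machine
        machines[i].append(p)
        heapq.heappush(heap, (load + p, i))
    loads = [sum(jobs) for jobs in machines]
    return machines, min(loads)

# Hypothetical instance: 7 jobs on 3 machines.
jobs = [5, 5, 4, 4, 3, 3, 3]
machines, min_load = lpt_schedule(jobs, 3)
```

On this instance LPT produces loads 11, 8, and 8, so the minimum load is 8, whereas the partition {5, 4}, {5, 4}, {3, 3, 3} achieves minimum load 9. The ratio 8/9 is consistent with the bound (3m − 1)/(4m − 2) = 0.8 for m = 3.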


Online and Semi-online Problem 

In many applications we do not have a priori 
knowledge about the input and the algorithms 
must make their decision online based only on 
the past information. These algorithms are called 
online algorithms. In scheduling problems this 
means that the jobs arrive one by one and the
algorithm has to schedule the arriving job without
any knowledge of the later ones. This area
is very similar to the area of approximation
algorithms; again, we cannot expect algorithms
that always find optimal solutions. In approximation
algorithms the obstacle is that we do not
have exponential time for computation; in online
algorithms it is the lack of information. The
algorithms are also evaluated by a similar measure,
which in the area of online algorithms is called the
competitive ratio. For maximization problems, an
online algorithm is called c-competitive if the
objective value given by the algorithm is at least
c times the optimal objective value.

The online version of scheduling to maximize
the minimal load is studied in [1]. The most
straightforward online algorithm is the above-mentioned
greedy algorithm, List, which assigns the
current job to the machine with the smallest load.
It is 1/m-competitive, and it is easy to see (considering
m jobs of size 1 and, if they are assigned
to different machines, m − 1 further jobs of size m)
that no better deterministic online algorithm can
be given. In [1] randomized algorithms are also studied;
the authors presented a 1/O(√m log m)-competitive
randomized algorithm and proved
that no randomized algorithm can have a better
competitive ratio than 1/Ω(√m). The case of
related machines is also studied, and it is proved
that no algorithm exists whose competitive
ratio depends only on the number of machines.
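
The lower-bound instance mentioned in parentheses above can be replayed directly. The sketch below is illustrative: the function names are hypothetical, and ties between equally loaded machines are broken by lowest index, which is an assumption the text does not fix.

```python
def list_schedule_online(jobs, m):
    """Greedy online rule: each arriving job goes to a least loaded machine
    (ties broken by lowest machine index)."""
    loads = [0] * m
    for p in jobs:
        i = min(range(m), key=lambda idx: loads[idx])
        loads[i] += p
    return loads

def adversarial_sequence(m):
    """m jobs of size 1 followed by m - 1 jobs of size m."""
    return [1] * m + [m] * (m - 1)

m = 4
online_loads = list_schedule_online(adversarial_sequence(m), m)
# Offline, all unit jobs can share one machine and each large job
# gets its own machine, so every machine has load m.
offline_loads = [m] * m
```

For m = 4 the greedy rule spreads the unit jobs over all machines and ends with loads 5, 5, 5, 1, so its minimum load is 1, while the offline schedule reaches minimum load 4; the ratio is 1/m.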

In semi-online problems, some extra
information is usually given to the algorithm. The first
such model is also studied in [1]. The authors
studied the version where the optimal value
is known in advance; they presented an
m/(2m − 1)-competitive algorithm and
proved that if m = 2 or m = 3, then no semi-online
algorithm in this model with a better competitive
ratio exists. For related machines
with known optimal value, a 1/m-competitive
algorithm is given. Several further semi-online
versions have been studied in the literature: in [5] the
version where the maximal job size
is known in advance, and in [6] the version where
the total processing time of all jobs and the largest
processing time are known in advance.
Applications


The first paper [4] mentions an application as the 
motivation of the model. It is stated that the prob- 
lem was motivated by modeling the sequencing 
of maintenance actions for modular gas turbine 
aircraft engines. If a fleet of M identical machines 
(engines) are given and they must be kept opera- 
tional for as long as possible, then we obtain the 
objective to maximize the minimal load. 
Problem Definition


Consider two rooted trees T_1 and T_2 with n leaves
each. The internal nodes of each tree have at least
two children each. The leaves in each tree are
labeled with the same set of labels, and further, no
label occurs more than once in a particular tree.
An agreement subtree of T_1 and T_2 is defined as
follows. Let L_1 be a subset of the leaves of T_1 and
let L_2 be the subset of those leaves of T_2 which
have the same labels as leaves in L_1. The subtree
of T_1 induced by L_1 is an agreement subtree
of T_1 and T_2 if and only if it is isomorphic to
the subtree of T_2 induced by L_2. The maximum
agreement subtree problem (henceforth called
MAST) asks for the largest agreement subtree of
T_1 and T_2.

The terms induced subtree and isomorphism 
used above need to be defined. Intuitively, the 
subtree of T induced by a subset L of the leaves 
of T is the topological subtree of T restricted 
to the leaves in L, with branching information 
relevant to L preserved. More formally, for any 
two leaves a, b of a tree T, let lca_T(a, b) denote
their lowest common ancestor in T. If a = b,
lca_T(a, b) = a. The subtree U of T induced by
a subset L of the leaves is the tree with leaf set
L and interior node set {lca_T(a, b) | a, b ∈ L},
inheriting the ancestor relation from T, that is,
for all a, b ∈ L, lca_U(a, b) = lca_T(a, b).

Intuitively, two trees are isomorphic if the 
children of each node in one of the trees can 
be reordered so that the leaf labels in each tree 
occur in the same order and the shapes of the 
two trees become identical. Formally, two trees
U_1 and U_2 with the same leaf labels are said
to be isomorphic if there is a 1-1 mapping μ
between their nodes, mapping leaves to leaves
with the same labels, and such that for any two
different leaves a, b of U_1, μ(lca_{U_1}(a, b)) =
lca_{U_2}(μ(a), μ(b)).
Key Results


Previous Work 

Finden and Gordon [8] gave a heuristic algorithm
for the MAST problem on binary trees which had
an O(n^5) running time and did not guarantee
an optimal solution. Kubicka, Kubicki, and
McMorris [12] gave an O(n^((0.5+ε) log n)) algorithm
for the same problem. The first polynomial-time
algorithm for this problem was given by Steel
and Warnow [14]; it had a running time of
O(n^2). Steel and Warnow also considered the
case of nonbinary and unrooted trees. Their
algorithm takes O(n^2) time for fixed-degree
rooted and unrooted trees and O(n^4.5 log n)
for arbitrary-degree rooted and unrooted trees.
They also give a linear reduction from the rooted
to the unrooted case. Farach and Thorup gave
an O(n c^(√(log n))) time algorithm for the MAST
problem on binary trees; here c is a constant
greater than 1. For arbitrary-degree trees, their
algorithm takes O(n^2 c^(√(log n))) time for the
unrooted case [6] and O(n^1.5 log n) time for
the rooted case [7]. Farach, Przytycka, and
Thorup [4] obtained an O(n log^3 n) algorithm
for the MAST problem on binary trees. Kao [11]
obtained an algorithm for the same problem
which takes O(n log^2 n) time. This algorithm
takes O(min{n d^2 log d log^2 n, n d^(3/2) log^3 n}) time for
degree d trees.

The MAST problem for more than two trees
has also been studied. Amir and Keselman [1]
showed that the problem is NP-hard for even 3
unbounded degree trees. However, polynomial
bounds are known [1, 5] for three or more
bounded degree trees.


Our Contribution 

An O(n log n) algorithm for the MAST problem
for two binary trees is presented here. This
algorithm is obtained by improving upon the
O(n log^3 n) algorithm from [4] (in fact, the final
journal version [3] combines both papers). The
O(n log^3 n) algorithm of [4] can be viewed as
taking the following approach (although the authors
do not describe it this way). It identifies two
special cases and then solves the general case by
interpolating between these cases.


Special Case 1 

The internal nodes in both trees form a path. The 
MAST problem reduces to essentially a size n 
Longest Increasing Subsequence problem in this 
case. As is well known, this can be solved in 
O(n logn) time. 
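
For reference, the standard O(n log n) method for the Longest Increasing Subsequence problem mentioned here can be sketched as below. This shows only the well-known subroutine (patience-style, with binary search); how the leaves of the two path-shaped trees are turned into a sequence is not spelled out in the text and is not modeled here. The function name is hypothetical.

```python
from bisect import bisect_left

def longest_increasing_subsequence_length(seq):
    """Length of a longest strictly increasing subsequence in O(n log n).

    tails[k] holds the smallest possible last element of an increasing
    subsequence of length k + 1 seen so far.
    """
    tails = []
    for x in seq:
        k = bisect_left(tails, x)    # first position with tails[k] >= x
        if k == len(tails):
            tails.append(x)          # x extends the longest subsequence
        else:
            tails[k] = x             # x gives a smaller tail for length k + 1
    return len(tails)
```

For example, on the sequence 3, 1, 4, 1, 5, 9, 2, 6 the function returns 4 (one witness is 1, 4, 5, 9).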


Special Case 2 

Both trees T_1 and T_2 are complete binary trees.
For each node v in T_2, only certain nodes u in T_1
can be usefully mapped to v, in the sense that the
subtree of T_1 rooted at u and the subtree of T_2
rooted at v have a nonempty agreement subtree.
There are O(n log^2 n) such pairs (u, v). This can
be seen as follows. Note that for (u, v) to be
such a pair, the subtree of T_1 rooted at u and the
subtree of T_2 rooted at v must have a leaf label in
common. For each label, there are only O(log^2 n)
such pairs, obtained by pairing each ancestor of
the leaf with this label in T_1 with each ancestor
of the leaf with this label in T_2. The total number
of interesting pairs is thus O(n log^2 n). For each
pair, computing the MAST takes O(1) time, as it
is simply a question of deciding the best way of
pairing their children.

The interpolation process takes a centroid de- 
composition of the two trees and compares pairs 
of centroid paths, rather than individual nodes 
as in the complete tree case. The comparison of 
a pair of centroid paths requires finding match- 
ings with special properties in appropriately de- 
fined bipartite graphs, a nontrivial generalization 
of the Longest Increasing Subsequence problem.
This process creates O(n log^2 n) interesting
(u, v) pairs, each of which takes O(log n) time to
process.

This work provides two improvements, each 
of which gains a log n factor. 


Improvement 1 

The complete tree special case is improved to
O(n log n) time as follows. A pair of nodes
(u, v), u ∈ T_1, v ∈ T_2, is said to be interesting
if there is an agreement subtree mapping u to
v. As is shown below, for complete trees, the
total number of interesting pairs (u, v) is just
O(n log n). Consider a node v in T_2. Let L_2 be
the set of leaves which are descendants of v. Let
L_1 be the set of leaves in T_1 which have the same
labels as the leaves in L_2. The only nodes that
may be mapped to v are the nodes u in the subtree
of T_1 induced by L_1. The number of such nodes
u is O(size of the subtree of T_2 rooted at v).
The total number of interesting pairs is thus the
sum of the sizes of all subtrees of T_2, which is
O(n log n).
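
The last step, that the subtree sizes of a complete binary tree sum to O(n log n), can be checked by a short calculation. The sketch below measures subtree size by number of leaves (the node count is less than twice that, so the order of growth is the same); the function name is hypothetical.

```python
def sum_of_subtree_leaf_counts(height):
    """For a complete binary tree with n = 2**height leaves, add up the
    number of leaves below every node (leaves included)."""
    n = 2 ** height
    total = 0
    for depth in range(height + 1):
        nodes_at_depth = 2 ** depth
        leaves_below_each = n // nodes_at_depth
        total += nodes_at_depth * leaves_below_each   # each level contributes n
    return total
```

Each of the log_2 n + 1 levels contributes exactly n, so the sum is n(log_2 n + 1); for n = 8 this gives 32.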

This reduces the number of interesting pairs
(u, v) to O(n log n). Again, processing a pair
takes O(1) time (this is less obvious, since identifying
the descendants of u which root the subtrees
with which the two subtrees of v can be matched
is nontrivial). Constructing the above induced
subtree itself can be done in O(|L_1|) time, as
will be detailed later. The basic tool here is to
preprocess trees T_1 and T_2 in O(n) time so
that least common ancestor queries can be
answered in O(1) time.


Improvement 2 

As in [4], when the trees are not complete bi- 
nary trees, the algorithm takes centroid paths and 
matches pairs of centroid paths. The O(logn) 
cost that the algorithm in [4] incurs in processing 
each such interesting pair of paths arises when 
there are large (polynomial in n size) instances of 
the generalized Longest Increasing Subsequence 
problem. At first sight, it is not clear that large 
instances of these problems can be created for 
sufficiently many of the interesting pairs; unfor- 
tunately, this turns out to be the case. However, 
these problem instances still have some useful 
structure. By using (static) weighted trees, pairs 
of interesting vertices are processed in O(1) time 
per pair, on the average, as is shown by an 
appropriately parametrized analysis. 


The Multiple Degree Case 

The techniques can be generalized to higher degree
bounds d > 2 by combining them with techniques
from [6, Sect. 2] for unbounded degrees.
This appears to yield an algorithm with running
time O(min{n √d log^2 n, n d log n log d}).
The conjecture, however, is that there is an algorithm
with running time O(n √d log n).


Applications 


Motivation 

The MAST problem arises naturally in biology 
and linguistics as a measure of consistency be- 
tween two evolutionary trees over species and 
languages, respectively. An evolutionary tree for 
a set of taxa, either species or languages, is a 
rooted tree whose leaves represent the taxa and 
whose internal nodes represent ancestor informa- 
tion. It is often difficult to determine the true 
phylogeny for a set of taxa, and one way to gain 
confidence in a particular tree is to have different 
lines of evidence supporting that tree. In the 
biological taxa case, one may construct trees from 
different parts of the DNA of the species. These 
are known as gene trees. For many reasons, these 
trees need not entirely agree, and so one is left 
with the task of finding a consensus of the various 
gene trees. The maximum agreement subtree is 
one method of arriving at such a consensus. 
Notice that a gene tree is usually a binary tree, since
DNA replicates by a binary branching process. 
Therefore, the case of binary trees is of great 
interest. 

Another application arises in automated trans- 
lation between two languages (Grishman and 
Yangarber, NYU, Private Communication). The 
two trees are the parse trees for the same meaning 
sentences in the two languages. A complication 
that arises in this application (due in part to 
imperfect dictionaries) is that words need not 
be uniquely matched, i.e., a word at the leaf of 
one tree could match a number (usually small) 
of words at the leaves of the other tree. The 
aim is to find a maximum agreement subtree; 
this is done with the goal of improving context- 
using dictionaries for automated translation. So 
long as each word in one tree has only a con- 
stant number of matches in the other tree (possi- 
bly with differing weights), the algorithm given 
here can be used, and its performance remains 
O(n log n). More generally, if there are m word
matches in all, the performance becomes O((m +
n) log n). Note, however, that if there are two
collections of equal-meaning words in the two
trees of sizes k_1 and k_2, respectively, they induce
k_1 k_2 matches.
Problem Definition


The maximum agreement subtree problem for k
trees (k-MAST) is a generalization of a similar
problem for two trees (MAST). Consider a tuple
of k rooted leaf-labeled trees (T^1, T^2, ..., T^k). Let
A = {a_1, a_2, ..., a_n} be the set of leaf labels.
Any subset B ⊆ A uniquely determines the so-called
topological restriction T|B of the tree T
to B. Namely, T|B is the topological subtree of
T spanned by all leaves labeled with elements
from B and the lowest common ancestors of all
pairs of these leaves. In particular, the ancestor
relation in T|B is defined so that it agrees with
the ancestor relation in T. A subset B of A such
that T^1|B, ..., T^k|B are isomorphic is called an
agreement set.


Problem 1 (k-MAST) INPUT: A tuple T =
(T^1, ..., T^k) of leaf-labeled trees, with a
common set of labels A = {a_1, ..., a_n}, such
that for each tree T^i there exists a one-to-one
mapping between the set of leaves of that tree
and the set of labels A.

OUTPUT: k-MAST(T) is equal to the maximum
cardinality of an agreement set of T.


Key Results 


In the general setting, the k-MAST problem is
NP-complete for k ≥ 3 [1]. Under the assumption
that the degree of at least one of the trees is
bounded, Farach et al. proposed an algorithm
leading to the following theorem:


Theorem 1 If the degree of the trees in the tuple
T = (T^1, ..., T^k) is bounded by d, then
k-MAST(T) can be computed in O(kn^3 + n^d) time.


In what follows, the problem is restricted to 
finding the cardinality of the maximum agree- 
ment set rather than the set itself. The extension 
of this algorithm to an algorithm that finds the 
agreement set (and subsequently the agreement 
subtree) within the same time bounds is relatively 
straightforward. 

Recall that the classical O(n^2) dynamic
programming algorithm for MAST of two 
binary trees [11] processes all pairs of internal 
nodes of the two trees in a bottom-up fashion. 
For each pair of such nodes, it computes the 
MAST value for the subtrees rooted at this pair. 
There are O(n^2) pairs of nodes, and the MAST
value for the subtrees rooted at a given pair of 
nodes can be computed in constant time from 
MAST values of previously processed pairs of 
nodes. 
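
The dynamic program for two binary trees can be sketched as follows. This is a minimal illustration under several assumptions: trees are written as nested pairs with string leaf labels, the recursion is memoized top-down rather than processed bottom-up, and the six-way recurrence is the standard one for rooted binary trees (it is not quoted from [11]). The names leaf_set and mast are hypothetical, and hashing nested tuples means the sketch does not achieve constant time per pair.

```python
from functools import lru_cache

def is_leaf(t):
    """A leaf is a label string; an internal node is a pair of subtrees."""
    return isinstance(t, str)

@lru_cache(maxsize=None)
def leaf_set(t):
    """Set of leaf labels below node t."""
    if is_leaf(t):
        return frozenset([t])
    left, right = t
    return leaf_set(left) | leaf_set(right)

@lru_cache(maxsize=None)
def mast(u, v):
    """Size of a maximum agreement subtree of the subtrees rooted at u and v."""
    if is_leaf(u):
        return 1 if u in leaf_set(v) else 0
    if is_leaf(v):
        return 1 if v in leaf_set(u) else 0
    u1, u2 = u
    v1, v2 = v
    return max(
        mast(u1, v1) + mast(u2, v2),   # match children straight
        mast(u1, v2) + mast(u2, v1),   # match children crossed
        mast(u1, v), mast(u2, v),      # drop one side of u
        mast(u, v1), mast(u, v2),      # drop one side of v
    )

t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
t3 = ((("a", "b"), "c"), "d")
t4 = ((("a", "b"), "d"), "c")
```

Here mast(t1, t2) is 2, since no three leaves have the same restricted shape in both trees, while mast(t3, t4) is 3 (for instance the labels a, b, c).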

To set the stage for the more general case,
let k-MAST(v), for a tuple v = (v_1, ..., v_k), be
the solution to the k-MAST
problem for the subtrees T^1(v_1), ..., T^k(v_k),
where T^i(v_i) is the subtree of T^i rooted at
v_i. If, for all i, u_i is a strict ancestor of v_i
in T^i, then v is dominated by u (denoted
v ≺ u).

A naive extension of the algorithm for two
trees to an algorithm for k trees would require
computing k-MAST(v) for all possible tuples v
by processing these tuples in an order consistent
with the domination relation. The basic idea that
allows one to avoid Ω(n^k) complexity is to replace
the computation of k-MAST(v) with the computation
of a related value, mast(v), defined to
be the size of the maximum agreement set for
the subtrees of (T^1, ..., T^k) rooted at (v_1, ..., v_k),
subject to the additional restriction that the agreement
subtrees themselves are rooted at v_1, ..., v_k,
respectively. Clearly


k-MAST(T^1, ..., T^k) = max_v mast(v).
The benefit of computing mast rather than
k-MAST follows from the fact that most of the mast
values are zero and it is possible to identify (very
efficiently) the tuples v with nonzero mast values.


Remark 1 If mast(v) > 0 then v = (lca^{T^1}(a, b),
..., lca^{T^k}(a, b)) for some leaf labels a, b, where
lca^{T^i}(a, b) is the lowest common ancestor of the
leaves labeled by a and b in the tree T^i.


A tuple v such that v = (lca^{T^1}(a, b), ..., lca^{T^k}(a, b))
for some a, b ∈ A is called an lca-tuple.
By Remark 1 it suffices to compute mast
values for the lca-tuples only. Just like in the
naive approach, mast(v) is computed from mast
values of other lca-tuples dominated by v. Another
important observation is that only some lca-tuples
dominated by v are needed to compute
mast(v). To capture this, Farach et al. define the
so-called proper domination relation (introduced
formally below) and show that the mast value
for any lca-tuple v can be computed from mast
values of lca-tuples properly dominated by v
and that the proper domination relation has size
O(n^3).


Proper Domination Relation 


Index the children of each internal node of any
tree in an arbitrary way. Given a pair v, w of
lca-tuples such that w ≺ v, the corresponding
domination relation has associated with it a
direction δ = (δ_1, ..., δ_k), where w_i descends
from the child of v_i indexed with δ_i.
Let v_i(j) be the child of v_i with index j. The
domination direction δ is termed active if the
subtrees rooted at v_1(δ_1), ..., v_k(δ_k) have
at least one leaf label in common. Note that
each leaf label can witness only one active direction,
and consequently each lca-tuple can have
at most n active domination directions. Two directions
δ and δ' are called compatible if
and only if the direction vectors differ in all
coordinates.


Definition 1 v properly dominates u (denoted
u < v) if v dominates u along an active direction
δ and there exists another tuple w which is also
dominated by v along an active direction δ'
compatible with δ.


From the definition of proper domination and 
from the fact that each leaf label can witness only 
one active domination direction, the following 
observations can be made: 


Remark 2 The proper domination relation < on
lca-tuples has size O(n^3). Furthermore, the relation
can be computed in O(kn^3) time.


Remark 3 For any lca-tuple v, if mast(v) > 0
then either v is an lca-tuple composed of leaves
with the same label or v properly dominates some
lca-tuple.


It remains to show how the values mast(v)
are computed. For each lca-tuple v, the so-called
compatibility graph G(v) is constructed. The
nodes of G(v) are the active directions from v, and
there is an edge between two such nodes if and
only if the corresponding directions are compatible.
The vertices of G(v) are weighted, and the weight
of a vertex corresponding to an active direction
δ equals the maximum mast value of an lca-tuple
dominated by v along this direction. Let
MWC(G(v)) be the maximum weight clique in 
G(v). 
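
To make the compatibility graph concrete, the sketch below tests compatibility of two directions and finds a maximum weight clique by brute force. It is illustrative only: the function names and the example weights are hypothetical, the weight of a clique is assumed to be the sum of its vertex weights, and the exhaustive search is exponential, unlike the clique generation analyzed in [6].

```python
from itertools import combinations

def compatible(d1, d2):
    """Two directions are compatible iff they differ in every coordinate."""
    return all(a != b for a, b in zip(d1, d2))

def max_weight_clique(weights):
    """Brute-force maximum clique weight in the compatibility graph.

    weights maps each active direction (a tuple of child indices, one per
    tree) to its vertex weight.
    """
    directions = list(weights)
    best = 0
    for size in range(1, len(directions) + 1):
        for subset in combinations(directions, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                best = max(best, sum(weights[d] for d in subset))
    return best

# Hypothetical active directions for k = 3 binary trees (child indices 0/1).
example_weights = {(0, 0, 0): 2, (1, 1, 1): 3, (0, 1, 1): 4}
```

In the example only (0, 0, 0) and (1, 1, 1) are compatible, so the best clique has weight 2 + 3 = 5, which beats the single vertex of weight 4.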

The bottom-up algorithm for computing
nonzero mast values is based on the following
recursive dependency, whose correctness follows
immediately from the corresponding definitions
and Remark 3:


Lemma 1 For any lca-tuple v,


mast(v) = 1            if all elements of v are leaves,
          MWC(G(v))    otherwise.                          (1)


The final step is to demonstrate that once the
lca-tuples and the proper domination relation
are precomputed, the computation of all nonzero
mast values can be performed in O(n^d) time.
This is done by generating all possible cliques
for all G(v). Using the fact that the degree
of at least one tree is bounded by d, one can
show that all the cliques can be generated in
O(d^3 (ne/d)^d) time and that
there are O(d (ne/d)^d) of them [6].


Applications 


The k-MAST problem is motivated by the need 
to compare evolutionary trees. Recent advances 
in experimental techniques in molecular biology 
provide diverse data that can be used to con- 
struct evolutionary trees. This diversity of data 
combined with the diversity of methods used to 
construct evolutionary trees often leads to the 
situation when the evolution of the same set 
of species is explained by different evolutionary 
trees. The maximum agreement subtree prob- 
lem has emerged as a measure of the agreement 
between such trees and as a method to iden- 
tify subtree which is common for these trees. 
The problem was first defined by Finden and 
Gordon in the context of two binary trees [7]. 
These authors also gave a heuristic algorithm 
to solve the problem. The O(n^2) dynamic programming
algorithm for computing MAST values
for two binary trees was given in [11].
Subsequently, a number of improvements led
to fast algorithms for computing the MAST value
of two trees under various assumptions about
rooting and tree degrees (see [5, 8, 10] and references
therein).

The MAST problem for three or more unbounded-degree
trees is NP-complete [1]. Amir
and Keselman report an O(kn^(d+1) + n^(2d)) time
algorithm for the agreement of k bounded-degree
trees. The work described here provides an
O(kn^3 + n^d) algorithm for the case where the number of
trees is k and the degree of at least one tree
is bounded by d. For d = 2 the complexity
of the algorithm is dominated by the first
term. An O(kn^3) algorithm for this case was
also given by Bryant [4], and an O(n^2 log^(k−1) n)
implementation of this algorithm was proposed
in [9].

The k-MAST problem is fixed-parameter
tractable in p, the smallest number of leaf labels
such that the removal of the corresponding leaves
produces agreement (see [2] and references
therein). The approximability of MAST
and related problems has been studied in [3] and
references therein.
Problem Definition


A phylogenetic tree is a rooted, unordered tree 
whose leaves are distinctly labeled and whose 
internal nodes have degree at least two. By dis- 
tinctly labeled, we mean that no two leaves in the 
tree have the same label. Let T be a phylogenetic 
tree with a leaf label set S. For any subset S’ of S, 
the topological restriction of T to S’ (denoted by 
T | S’) is the tree obtained from T by deleting all 
nodes which are not on any path from the root 
to a leaf in S’ along with their incident edges 
and then contracting every edge between a node 
having just one child and its child. See Fig. 1 
for an illustration. For any phylogenetic tree T, 
denote its set of leaf labels by A(T). 
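The topological restriction defined above can be sketched in a few lines. This is only an illustration under assumptions not made in the entry: a rooted tree is encoded as nested Python tuples, leaves are strings, and the function name `restrict` is hypothetical.

```python
def restrict(tree, keep):
    """Return T | keep for a rooted tree given as nested tuples.

    A leaf is a string label; an internal node is a tuple of subtrees.
    Returns None when no leaf of `tree` belongs to `keep`.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    # Delete every subtree that contains no kept leaf.
    children = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not children:
        return None
    if len(children) == 1:
        # Contract the edge between a one-child node and its child.
        return children[0]
    return tuple(children)


T = (("a", "b"), ("c", ("d", "e")))
print(restrict(T, {"a", "c", "d"}))
```

Here the node above a and b loses b and is contracted away, and the same happens to the node above d and e, which matches the two steps (deletion, then contraction) in the definition.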

The maximum agreement supertree problem 
(MASP) [12] is defined as follows. 


Problem 1 Let $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ be an
input set of phylogenetic trees, where the sets
$A(T_i)$ may overlap. The maximum agreement
supertree problem (MASP) asks for a phylogenetic
tree $Q$ with leaf label set $A(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} A(T_i)$
such that $|A(Q)|$ is maximized and for each
$T_i \in \mathcal{T}$, it holds that $T_i \mid A(Q)$ is isomorphic to
$Q \mid A(T_i)$.


Maximum Agreement Supertree, Fig. 1 Let T be the
phylogenetic tree on the left. Then T | {a,c,d} is the
phylogenetic tree shown on the right


The following notation is used below:
$n = |\bigcup_{T_i \in \mathcal{T}} A(T_i)|$, $k = |\mathcal{T}|$, and
$D = \max_{T_i \in \mathcal{T}} \{\deg(T_i)\}$, where $\deg(T_i)$ is the degree
of $T_i$ (i.e., the maximum number of children of
any node belonging to $T_i$).

A problem related to MASP is the maximum 
compatible supertree problem (MCSP) [2]: 


Problem 2 Let $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ be an
input set of phylogenetic trees, where the sets
$A(T_i)$ may overlap. The maximum compatible
supertree problem (MCSP) asks for a phylogenetic
tree $W$ with leaf label set $A(W) \subseteq \bigcup_{T_i \in \mathcal{T}} A(T_i)$
such that $|A(W)|$ is maximized
and for each $T_i \in \mathcal{T}$, it holds that $T_i \mid A(W)$ can
be obtained from $W \mid A(T_i)$ by applying a series
of edge contractions.


For information about MCSP, refer to [2, 11]. 


Key Results 


The special case of the maximum agreement
supertree problem in which $A(T_1) = A(T_2) = \cdots = A(T_k)$
has been well studied in the literature and
is also known as the maximum agreement subtree
problem (MAST). By utilizing known results for
MAST, several results can be obtained for various
special cases of MASP. Firstly, it is known that
MAST can be solved in $O(\sqrt{D}\, n \log(2n/D))$
time when $k = 2$ (see [13]) or in $O(kn^3 + n^D)$
time when $k \ge 3$ (see [4, 6]), which leads to the
following theorems.


Theorem 1 ([12]) When $k = 2$, MASP can
be solved in $O(T_{MAST} + n)$ time, where
$T_{MAST}$ is the time required to solve MAST
for two $O(n)$-leaf trees. Note that
$T_{MAST} = O(\sqrt{D}\, n \log(2n/D))$.


Theorem 2 ([2]) For any fixed $k \ge 3$, if
every leaf appears in either 1 or k trees,
MASP can be solved in $O(T_{MAST} + kn)$
time, where $T_{MAST}$ is the time required to
solve MAST for $\{T_1|L, T_2|L, \ldots, T_k|L\}$, where
$L = \bigcap_{T_i \in \mathcal{T}} A(T_i)$. Note that
$T_{MAST} = O(k|L|^3 + |L|^D)$.


On the negative side, the maximum agree- 
ment supertree problem is NP-hard in general, 
as shown by the next theorem. (A rooted triplet 
is a binary phylogenetic tree with exactly three 
leaves.) 


Theorem 3 ([2, 12]) For any fixed $k \ge 3$, MASP
with unbounded D is NP-hard. Furthermore,
MASP with unbounded k remains NP-hard even
if restricted to rooted triplets, i.e., D = 2.


The inapproximability results for MAST by 
Hein et al. [9] and Gasieniec et al. [7] immedi- 
ately carry over to MASP with unbounded D as 
follows. 


Theorem 4 ([2, 12]) MASP cannot be approximated
within a factor of $2^{\log^{\delta} n}$ in polynomial
time for any constant $\delta < 1$, unless
NP $\subseteq$ DTIME[$2^{\mathrm{polylog}\, n}$], even when restricted
to $k = 3$. Also, MASP cannot be approximated
within a factor of $n^{\varepsilon}$ for any constant $\varepsilon$ where
$0 \le \varepsilon < \frac{1}{9}$ in polynomial time unless P =
NP, even for instances containing only trees of
height 2.


Although MASP is difficult to approximate 
in polynomial time, a simple approximation 
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algorithm based on a technique from [1] achieves 
an approximation factor that is close to the 
bounds given in Theorem 4. 


Theorem 5 ([12]) MASP can be approximated
within a factor of $\frac{n}{\log n}$ in
$O(n^2) \cdot \min\{O(k \cdot (\log\log n)^2),\ O(k + \log n \cdot \log\log n)\}$ time.
MASP restricted to rooted triplets can be
approximated within a factor of $\frac{n}{\log n}$ in
$O(k + n^2 \log^2 n)$ time.


Fixed-parameter tractable algorithms for solving
MASP also exist. In particular, for binary
phylogenetic trees, Jansson et al. [12] first gave
an $O(k(2n^2)^{3k^2})$-time algorithm. Later, Guillemot
and Berry [8] improved the time complexity
to $O((8n)^k)$. Hoang and Sung [11] further
improved the time complexity to $O((6n)^k)$, as
summarized in Theorem 6.


Theorem 6 ([11]) MASP restricted to D = 2
can be solved in $O((6n)^k)$ time.


For the case where each tree in 7 has degree at 
most D, Hoang and Sung [11] gave the following 
fixed-parameter polynomial-time solution. 


Theorem 7 ([11]) MASP restricted to phylogenetic
trees of degree at most D can be solved in
$O((kD)^{kD+3}(2n)^k)$ time.


For unbounded n, k, and D, Guillemot and 
Berry [8] proposed a solution that is efficient 
when the input trees are similar. 


Theorem 8 ([8]) MASP can be solved in
$O((2k)^p kn^2)$ time, where p is an upper bound
on the number of leaves that are missing from
$\bigcup_{T_i \in \mathcal{T}} A(T_i)$ in a MASP solution.


Applications 


One challenge in phylogenetics is to develop 
good methods for merging a collection of phy- 
logenetic trees on overlapping sets of taxa into 
a single supertree so that no (or as little as 
possible) branching information is lost. Ideally, 
the resulting supertree can then be used to deduce 
evolutionary relationships between taxa which 
do not occur together in any one of the in- 
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put trees. Supertree methods are useful because 
most individual studies investigate relatively few 
taxa [15] and because sample bias leads to certain 
taxa being studied much more frequently than 
others [3]. Also, supertree methods can combine 
trees constructed for different types of data or 
under different models of evolution. Furthermore, 
although computationally expensive methods for 
constructing reliable phylogenetic trees are in- 
feasible for large sets of taxa, they can be ap- 
plied to obtain highly accurate trees for smaller, 
overlapping subsets of the taxa which may then 
be merged using computationally less intense, 
supertree-based techniques (see, e.g., [5, 10, 14]). 

Since the set of trees which is to be combined 
may in practice contain contradictory branching 
structure (e.g., if the trees have been constructed 
from data originating from different genes or if 
the experimental data contains errors), a supertree 
method needs to specify how to resolve conflicts. 
One intuitive idea is to identify and remove a 
smallest possible subset of the taxa so that the 
remaining taxa can be combined without con- 
flicts. In this way, one would get an indication of 
which ancestral relationships can be regarded as 
resolved and which taxa need to be subjected to 
further experiments. The above biological prob- 
lem can be formalized as MASP. 


Open Problems 


An open problem is to improve the time complex- 
ity of the currently fastest algorithms for solving 
MASP. Moreover, the existing fixed-parameter 
polynomial-time algorithms for MASP are not 
practical, so it could be useful to provide heuris- 
tics that work well on real data. 
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Problem Definition 


The input to an instance of the classical stable 
marriage problem consists of a set of n men and 
n women. Additionally, each person provides a 
strictly ordered preference list of the opposite set. 
The goal is to find a complete matching of men 
to women that is also stable, i.e., a matching 
having the property that there does not exist a 
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man and a woman who prefer each other over 
their matched assignment. In their seminal work, 
Gale and Shapley [2] showed that every instance 
of the stable marriage problem admits at least 
one stable matching and showed that one can 
be found in polynomial time (see entry » Stable 
Marriage). In fact, stable marriage instances can 
have exponentially many stable matchings [18]. 

More general settings arise when relaxations 
of Gale and Shapley’s original version are permit- 
ted. In the stable marriage problem with incom- 
plete lists (SMI), men and women may deem ar- 
bitrary members of the opposite set unacceptable, 
prohibiting the pair from being matched together. 
In the stable marriage problem with ties (SMT), 
preference lists need not be strictly ordered but 
may instead contain subsets of agents all having 
the same rank. Instances of SMT and SMI always 
admit a stable matching, and, crucially, all stable 
matchings for a fixed instance have the same 
cardinality. Interestingly, when both ties and in- 
complete lists are allowed (denoted SMTI, see 
entry > Stable Marriage with Ties and Incomplete 
Lists), stable matchings again exist but can differ 
in cardinality. How can we find one of maximum 
cardinality? 


Key Results 


Benchmark Results 

Manlove et al. [20] established two key bench- 
marks for the problem of computing a maximum 
cardinality stable matching (MAX-SMTI). First, 
they showed that the problem is NP-hard under 
the following two simultaneous restrictions. One 
set of agents, say, the men, all have strictly 
ordered preference lists, while each woman’s 
preference list is either strictly ordered or is a tie 
of length two. Second, they showed that MAX- 
SMTI is approximable within a factor of 2 by 
arbitrarily breaking the ties and finding any stable 
matching in the resulting SMI instance. 
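The tie-breaking approach just described can be sketched as follows. This is a minimal illustration, not the authors' code: the data layout (each preference list is a list of tie groups) and the names `break_ties` and `gale_shapley_smi` are assumptions, and acceptability is assumed to be mutual.

```python
def break_ties(prefs):
    """Flatten each list of tie groups into a strict list (arbitrary order)."""
    return {p: [q for group in groups for q in group]
            for p, groups in prefs.items()}


def gale_shapley_smi(men, women):
    """Man-proposing Gale-Shapley for strict, possibly incomplete lists."""
    rank = {w: {m: i for i, m in enumerate(lst)} for w, lst in women.items()}
    next_idx = {m: 0 for m in men}   # next woman each man will propose to
    partner = {}                     # woman -> current man
    free = list(men)
    while free:
        m = free.pop()
        while next_idx[m] < len(men[m]):
            w = men[m][next_idx[m]]
            next_idx[m] += 1
            if m not in rank.get(w, {}):
                continue             # w finds m unacceptable
            current = partner.get(w)
            if current is None:
                partner[w] = m
                break
            if rank[w][m] < rank[w][current]:
                partner[w] = m       # w trades up; her old partner is free
                free.append(current)
                break
    return {m: w for w, m in partner.items()}


men = {"m1": [["w1"]], "m2": [["w1"], ["w2"]]}
women = {"w1": [["m1", "m2"]], "w2": [["m2"]]}   # w1 is indifferent
matching = gale_shapley_smi(break_ties(men), break_ties(women))
print(matching)
```

In this small instance, breaking w1's tie in favour of m1 yields a stable matching of size 2, while breaking it in favour of m2 yields one of size 1, which illustrates why arbitrary tie-breaking only guarantees a factor of 2.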

Since then, researchers have focused on im- 
proving the approximability bounds for MAX- 
SMTI. The severity of the restrictions in Manlove 
et al.’s hardness results has led researchers to 
study not only the general version but a number 
of special cases of MAX-SMTI as well. 
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Upper Bounds: The General Case 
For the general case of MAX-SMTI, Iwama et al.
[12] gave a $(2 - c\,\frac{\log n}{n})$-approximation algorithm,
where $c$ is a positive constant. This algorithm was
subsequently improved to yield a performance
guarantee of $2 - \frac{c'}{\sqrt{n}}$, where $c'$ is a positive
constant which is at most $1/(4\sqrt{6})$ [14]. The first
approximation algorithm to achieve a constant 
performance guarantee better than two was given 
by Iwama et al. [13], establishing a performance 
ratio of 15/8. Next, Kiraly [16] devised a new 
approximation algorithm with a bound of 5/3. Fi- 
nally, McDermid [21] obtained 3/2, which is cur- 
rently the best known approximation ratio. Later, 
Paluch [22] and Kiraly [17] also obtained approx- 
imation algorithms with the same performance 
guarantee of 3/2; however, their algorithms have 
the advantage of running in linear time. Kiraly’s 
has the extra benefit of requiring only “local” 
preference list information. The following theo- 
rem summarizes the best known upper bound for 
the general case. 


Theorem 1 There is a 3/2-approximation algo- 
rithm for MAX-SMTI. 


Upper Bounds: Special Cases 

The special case of MAX-SMTI that has received
the most attention is that in which ties may
appear in one set only. We let 1S-MAX-SMTI
denote this problem. Halldorsson et al. [3] gave
a $(2/(1 + T^{-2}))$-approximation algorithm for
1S-MAX-SMTI, where T is the length of the
longest tie. This bound was improved to 13/7
for MAX-SMTI instances in which ties are
restricted to be of size at most 2 [3]. They later
showed that 10/7 is achievable for 1S-MAX- 
SMTI [4] via a randomized approximation al- 
gorithm. Irving and Manlove [11] described a 
5/3-approximation algorithm for 1S-MAX-SMTI 
instances in which lists may have at most one tie 
that may only appear at the end of the preference 
list. One of the most important results in this 
area was that of Kiraly [16], who provided a par- 
ticularly simple and elegant 3/2-approximation 
algorithm with an equally transparent analysis 
for 1S-MAX-SMTI (with no further restrictions 
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on the problem). Since then, further improve- 
ments have been obtained for 1S-MAX-SMTI 
by Iwama, Miyazaki, and Yanagisawa [15], who 
exploited a linear programming relaxation to ob- 
tain a 25/17-approximation. Huang and Kavitha 
[10] used different techniques to improve upon 
this, giving a linear-time algorithm with a ratio 
of 22/15. Radnai [23] tightened their analysis to 
show that 41/28 is in fact achieved. Finally, Dean 
and Jalasutram [1] showed that the algorithm 
given in [15] actually achieves 19/13 through an 
analysis using a factor-revealing LP. The follow- 
ing theorem summarizes the best known upper 
bounds for the special cases of MAX-SMTI. 


Theorem 2 There is a 19/13-approximation al- 
gorithm for 1S-MAX-SMTI. When all ties have 
length at most two, there is a (randomized) 10/7- 
approximation algorithm for MAX-SMTI. 


Lower Bounds 

The best lower bounds on approximability are 
due to Yanagisawa [24] and Iwama et al. [5]. 
Yanagisawa [24] showed that MAX-SMTI is NP-hard
to approximate within 33/29 and UGC-hard
to approximate within 4/3. 1S-MAX-SMTI
was shown by Iwama et al. [5] to be NP-hard 
to approximate within 21/19 and UGC-hard to 
approximate within 5/4. The next theorem sum- 
marizes these results. 


Theorem 3 It is NP-hard to approximate MAX-SMTI
(1S-MAX-SMTI) within 33/29 (21/19). It is
UGC-hard to approximate MAX-SMTI (1S-MAX- 
SMTI) within 4/3 (5/4). 


Applications 


Stable marriage research is a fascinating subset 
of theoretical computer science not only for its 
intrinsic interest but also for its widespread ap- 
plication to real-world problems. Throughout the 
world, centralized matching schemes are used in 
various contexts such as the assignment of stu- 
dents to schools and graduating medical students 
to hospitals. We direct the reader to [19, Section 
1.3.7] for a comprehensive overview (see also 
entry >» Hospitals/Residents Problem). Perhaps 
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the most famous of these is the National Resident 
Matching Program (NRMP) [7] in the United 
States, which allocates over 35,000 graduating 
medical students to their first job at a hospital. 
Similar schemes exist in Canada [8], Scotland 
[9], and Japan [6]. In one way or another, all 
of these matching schemes require one or both 
of the sets involved to produce preference lists 
ranking the other set. Methods similar to the 
Gale-Shapley algorithm are then used to create 
the assignments. 

Both economists and computer scientists alike 
have influenced the design and implementation 
of such matching schemes. In fact, the 2012 
Nobel Prize for Economic Sciences was awarded 
to Alvin Roth and Lloyd Shapley, in part for 
their contribution to the widespread deployment 
of matching algorithms in practical settings. Re- 
searchers Irving and Manlove at the School of 
Computing Science at the University of Glasgow 
led the design and implementation of algorithms 
for the Scottish Foundation Allocation Scheme 
that have been used by NHS Education for Scot- 
land to assign graduating medical students to 
hospital programs [19, Section 1.3.7]. This set- 
ting has actually yielded true instances of MAX- 
SMTI, as hospital programs have been allowed to 
have ties in their preference lists. 
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Problem Definition 


This problem is a pattern matching problem on 
leaf-labeled trees. Each input tree is considered 
as a branching pattern inducing specific groups 
of leaves. Given a set of input trees with identical 
leaf sets, the goal is to find the largest subset 
of leaves on the branching pattern of which the 
input trees do not disagree. A maximum com- 
patible tree is a tree on such a leaf set and 
with a branching pattern respecting that of each 
input tree (see below for a formal definition). 
The maximum compatible tree problem (MCT) 
is to find such a tree or, equivalently, its leaf 
set. The main motivation for this problem is in 
phylogenetics, to measure the similarity between 
evolutionary trees or to represent a consensus 
of a set of trees. The problem was introduced 
in [10] and [11, under the MRST acronym]. 
Previous related works concern the well-known 
maximum agreement subtree problem (MAST). 


Maximum Compatible Tree, Fig. 1 Three unrooted trees: a tree $T$, a tree $T'$ such that $T' = T|\{a, c, e\}$, and a tree $T''$ such that $T'' \unrhd T$ ($T''$ refines $T$)


Solving MAST is finding the largest subset of 
leaves on which all input trees exactly agree. 
The difference between MAST and MCT is that 
MAST seeks a tree whose branching information 
is isomorphic to that of a subtree in each of the 
input trees, while MCT seeks a tree that contains 
the branching information (i.e., groups) of the 
corresponding subtree of each input tree. This 
difference allows the tree obtained for MCT to 
be more informative, as it can include branch- 
ing information present in one input tree but 
not in the others, as long as this information 
is compatible (in the sense of [14]) with the
others. Both problems are equivalent when all
input trees are binary. Ganapathy and Warnow [6] 
were the first to give an algorithm to solve MCT 
in its general form. Their algorithm relies on a 
simple dynamic programming approach similar 
to a work on MAST [13] and has a running time 
exponential in the number of input trees and in 
the maximum degree of a node in the input trees. 
Later, [1] proposed a fixed-parameter algorithm 
using one parameter only. Approximation results 
have also been obtained, [3,7] proposing low-cost 
polynomial-time algorithms that approximate the 
complement of MCT within a constant factor. 


Notations Trees considered here are evolution- 
ary trees (phylogenies). Such a tree T has its 
leaf set L(T) in bijection with a label set and 
is either rooted, in which case all internal nodes 
have at least two children each, or unrooted, in 
which case internal nodes have a degree of at 
least three. Given a set L of labels and a tree 
T, the restriction of T to L, denoted T|L, is 
the tree obtained in taking the smallest induced 
subgraph of T that connects leaves with labels
in $L \cap L(T)$ and then removing any degree-two
(non-root) node to make the tree homeomorphically
irreducible. Two trees $T$, $T'$ are isomorphic,
denoted $T = T'$, if and only if there is a graph
isomorphism $T \to T'$ preserving leaf labels
(and the roots if both trees are rooted). A tree $T$
refines a tree $T'$, denoted $T \unrhd T'$, whenever $T$
can be transformed into $T'$ by collapsing some
of its internal edges (collapsing an edge means 
removing it and merging its extremities). See 
Fig. 1 for examples of these relations between 
trees. Note that a tree T properly refining another 
tree T’ agrees with the entire evolutionary history 
of T’ while containing additional information 
absent from T’: at least one high-degree node of 
T’ is replaced in T by several nodes of lesser 
degree; hence, T contains more information than
T’ on which species belong together. 
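For rooted trees, the refinement relation can be checked through clusters (the leaf sets below each node): collapsing an internal edge removes exactly one cluster, so T refines T' when both have the same leaves and every cluster of T' is also a cluster of T. The sketch below assumes rooted trees encoded as nested tuples with string leaves; the names `clusters` and `refines` are hypothetical, and unrooted trees would need splits instead of clusters.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) of a rooted nested-tuple tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves, result = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        result |= child_clusters
    result.add(leaves)          # cluster of the current node
    return leaves, result


def refines(t, t_prime):
    """True if rooted tree t refines t_prime (collapsing edges of t gives t_prime)."""
    leaves_t, clusters_t = clusters(t)
    leaves_p, clusters_p = clusters(t_prime)
    return leaves_t == leaves_p and clusters_p <= clusters_t


resolved = (("a", "b"), "c")
star = ("a", "b", "c")
print(refines(resolved, star), refines(star, resolved))
```

The resolved tree refines the star tree (collapse the edge above a and b), but not the other way round, in line with the remark that a refining tree carries additional grouping information.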

Given a collection $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ of
input trees with identical leaf sets $L$, a tree $T$
with leaves in $L$ is said to be compatible with
$\mathcal{T}$ if and only if $\forall T_i \in \mathcal{T}$, $T \unrhd T_i|L(T)$. If
there is a tree $T$ compatible with $\mathcal{T}$ such that
$L(T) = L$, then the collection $\mathcal{T}$ is said to
be compatible. Knowing whether a collection is
compatible is a problem for which linear-time 
algorithms have been known for a long time 
(e.g., [9]). The MAXIMUM COMPATIBLE TREE 
problem is a natural optimization version of this 
problem to deal with incompatible collections of 
trees. 


Problem 1 (MAXIMUM COMPATIBLE TREE, MCT)


INPUT: A collection $\mathcal{T}$ of trees with the same
leaf sets.
OUTPUT: A tree compatible with $\mathcal{T}$ having the
largest number of leaves. Such a tree is
denoted MCT($\mathcal{T}$).


Maximum Compatible Tree, Fig. 2 An incompatible
collection of two input trees $\{T_1, T_2\}$ and their maximum
compatible tree, $T = \mathrm{MCT}(T_1, T_2)$. Removing the
leaf d renders the input trees compatible, hence $L(T) =
\{a, b, c, e\}$. Here, $T$ strictly refines $T_2$ restricted to
$L(T)$, which is expressed by the fact that a node in $T_2$
(the blue one) has its child subtrees distributed between
several connected nodes of $T$ (blue nodes). Note also that
here $|\mathrm{MCT}(T_1, T_2)| > |\mathrm{MAST}(T_1, T_2)|$


See Fig. 2 for an example. Note that $\forall \mathcal{T}$,
$|\mathrm{MCT}(\mathcal{T})| \ge |\mathrm{MAST}(\mathcal{T})|$ and that MCT is
equivalent to MAST when the input trees are
binary. Note also that instances of MCT and MAST
can have several optimum solutions.


Key Results


Exact Algorithms

The MCT problem was shown to be NP-hard on 6
trees by [10] and then on 2 trees by [11]. The
NP-hardness holds as long as one of the input trees is
not of bounded degree. For two bounded-degree
trees, Hein et al. [11] mention a polynomial-time
algorithm based on aligning trees. Ganapathy and
Warnow propose an exponential algorithm for
solving MCT in the general case [6]. Given two
trees $T_1$, $T_2$, they show how to compute a binary
MCT of any pair of subtrees ($S_1 \in T_1$, $S_2 \in T_2$)
by dynamic programming. Subtrees whose root is
of high degree are handled by considering every
possible partition of the root's children into two
sets. This causes the complexity bound to have
a term exponential in d, the maximum degree
of a node in the input trees. When dealing with
k input trees, k-tuples of subtrees are considered,
together with the simultaneous bipartitions of the
roots' children for the k subtrees.
Hence, the complexity bound is also exponential
in k.


Theorem 1 ([6]) Let L be a set of n leaves.
The MCT problem for a collection of k rooted
trees on L in which each tree has degree
at most d + 1 can be solved in $O(2^{2kd} n^k)$
time.


The result easily extends to unrooted trees 
by considering each of the n leaves in 
turn as a possible root for all trees of the 
collection. 


Theorem 2 ([6]) Given a collection of k unrooted
trees with degree at most d + 1 on an
n-leaf set, the MCT problem can be solved in
$O(2^{2kd} n^{k+1})$ time.


Let $\mathcal{T}$ be a collection on a set L of n leaves.
[1] considered the following decision problem,
denoted MCT$_p$: given $\mathcal{T}$ and $p \in [0, n]$, does
$|\mathrm{MCT}(\mathcal{T})| \ge n - p$?


Theorem 3 ([1])


1. MCT$_p$ on rooted trees can be solved in
$O(\min\{3^p kn,\ 2.27^p + kn^3\})$ time.
2. MCT$_p$ on unrooted trees can be solved in
$O((p + 1) \times \min\{3^p kn,\ 2.27^p + kn^3\})$ time.


The $3^p kn$ term comes from an algorithm that
first identifies in $O(kn)$ time a 3-leaf set $S$
on which the input trees conflict and then
recursively obtains a maximum compatible tree
$T_1$, resp. $T_2$, $T_3$, for each of the three
collections $\mathcal{T}_1$, resp. $\mathcal{T}_2$, $\mathcal{T}_3$, obtained by removing
from the input trees a leaf in $S$ and lastly
returning the $T_i$ such that $|\mathrm{MCT}(\mathcal{T}_i)|$ is maximum
(with $i \in [1, 3]$). The $2.27^p + kn^3$ term comes
from an algorithm using a reduction of MCT
to 3-HITTING SET. Negative results have been
obtained by Guillemot and Nicolas concerning
the fixed-parameter tractability of MCT with
regard to the maximum degree D of the input
trees:


Theorem 4 ([8])


1. MCT is W[1]-hard with respect to D.

2. MCT cannot be solved in $O(N^{o(2^{D/2})})$ time
unless SNP $\subseteq$ SE, where N denotes the input
length, i.e., $N = O(kn)$.

The MCT problem also admits a variant
that deals with supertrees, i.e., trees having
different (but overlapping) sets of leaves. The
resulting problem is W[2]-hard with respect to
p [2].
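The three-way branching behind the $3^p kn$ bound can be sketched for rooted trees as follows. This is an illustration under stated assumptions, not the algorithm of [1]: trees are nested tuples with string leaves, the helper names are hypothetical, and the conflict search is a naive comparison of clusters rather than the O(kn) procedure mentioned above. It relies on the fact that rooted trees on the same leaf set are compatible exactly when no two of their clusters properly overlap.

```python
from itertools import combinations


def clusters(tree):
    """Return (leaf set, set of clusters) of a rooted nested-tuple tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves, result = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        result |= child_clusters
    result.add(leaves)
    return leaves, result


def find_conflict(cluster_sets, leaves):
    """Return three leaves on which two trees conflict, or None."""
    restricted = [{c & leaves for c in cs} for cs in cluster_sets]
    for cs1, cs2 in combinations(restricted, 2):
        for x in cs1:
            for y in cs2:
                common, only_x, only_y = x & y, x - y, y - x
                if common and only_x and only_y:   # proper overlap
                    return min(common), min(only_x), min(only_y)
    return None


def mct_p(cluster_sets, leaves, p):
    """Largest compatible leaf set reachable by removing at most p leaves, or None."""
    conflict = find_conflict(cluster_sets, leaves)
    if conflict is None:
        return leaves
    if p == 0:
        return None
    best = None
    for leaf in conflict:        # one of the three leaves must be removed
        sol = mct_p(cluster_sets, leaves - {leaf}, p - 1)
        if sol is not None and (best is None or len(sol) > len(best)):
            best = sol
    return best


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
all_leaves, _ = clusters(t1)
cluster_sets = [clusters(t)[1] for t in (t1, t2)]
print(mct_p(cluster_sets, all_leaves, 1))
```

Each recursive call branches three ways and lowers the budget by one, so the search tree has at most $3^p$ leaves; only the per-node cost differs from the bound stated in Theorem 3.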


Approximation Algorithms 

The idea of locating and then eliminating suc- 
cessively all the conflicts between the input trees 
has also led to approximation algorithms for the 
complement version of the MCT problem, denoted 
CMCT. Let $L$ be the leaf set of each tree in
an input collection $\mathcal{T}$; CMCT aims at selecting
the smallest set of leaves $S \subseteq L$ such
that the collection $\{T_i|(L - S) : T_i \in \mathcal{T}\}$ is
compatible.


Theorem 5 ([7]) Given a collection $\mathcal{T}$ of k
rooted trees on an n-leaf set L, there is a
3-approximation algorithm for CMCT that runs in
$O(k^2 n^2)$ time.


The running time of this algorithm was later
improved:


Theorem 6 ([3, 5]) There is an $O(kn + n^2)$
time 3-approximation algorithm for CMCT on a
collection of k rooted trees with n leaves.
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Note also that working on rooted or unrooted 
trees does not change the achievable approxima- 
tion ratio for CMCT [3]. 


Applications 


In bioinformatics, the MCT problem (and sim- 
ilarly MAST) is used to reach different prac- 
tical goals. The first motivation is to measure 
the similarity of a set of trees. These trees can 
represent RNA secondary structures [11, 12] or 
estimates of a phylogeny inferred from differ- 
ent datasets composed of molecular sequences 
(e.g., genes) [14]. The gap between the size of 
a maximum compatible tree and the number of 
input leaves indicates the degree of dissimilarity
of the input trees. Concerning the phylogenetic 
applications, quite often some edges of the trees 
inferred from the datasets have been collapsed 
due to insufficient statistical support, resulting in 
some higher-degree nodes in the trees considered 
by MCT. Each such node does not indicate a 
multi-speciation event but rather the uncertainty 
with respect to the branching pattern to be chosen 
for its child subtrees. In such a situation, the 
MCT problem is to be preferred to MAST, as 
it correctly handles high-degree nodes, enabling 
them to be resolved according to branching infor- 
mation present in other input trees. As a result, 
more leaves are conserved in the output tree; 
hence, a larger degree of similarity is detected 
between the input trees. Note also that a low 
similarity value between the input trees can be 
due to horizontal gene transfers. When these 
events are not too numerous, identifying species 
subject to such effects is done by first suspecting 
leaves discarded from a maximum compatible 
tree. 

The shape of a maximum compatible tree, 
i.e., not just its size, also has an application in 
systematic biology to obtain a consensus of a set 
of phylogenies that are optimal for some tree- 
building criterion. For instance, the maximum 
parsimony and maximum likelihood criteria can 
provide several dozens (sometimes hundreds) of 
optimal or near-optimal trees. In practice, these 
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trees are first grouped into islands of neighboring 
trees, and a consensus tree is obtained for each 
island by resorting to a classical consensus tree 
method, e.g., the majority-rule or strict consen- 
sus. The trees representing the islands form a 
collection of which a consensus is then sought. 
However, consensus methods keeping all input 
leaves tend to create poorly resolved trees. An 
alternative approach lies in proposing a represen- 
tative tree that contains a largest possible subset 
of leaves on the position of which the trees of 
the collection agree. Again, MCT is more suited 
than MAST as the input trees can contain some 
high-degree nodes, with the same meaning as 
discussed above. 


Open Problems 


A direction for future work would be to exam- 
ine the variant of MCT where some leaves are 
imposed in the output tree. This question arises 
when a biologist wants to ensure that the species 
central to his study are contained in the output 
tree. For MAST on two trees, this constrained 
variant of the problem was shown in a natural 
way to be of the same complexity as the regu- 
lar version [4]. For MCT however, such a con- 
straint can lead to several optimization problems 
that need to be sorted out. Another important 
work to be done is a set of experiments to mea- 
sure the range of parameters for which the algo- 
rithms proposed to solve or approximate MCT are 
useful. 


URLs to Code and Datasets 
A Perl program is available on request from the
author of this entry.
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Problem Definition 


Energy resources are very limited in wireless 
sensor networks since the wireless devices are 
small and battery powered. There are two ways 
to deploy wireless sensor networks. One is de- 
terministic [1] deployment, and the other one is 
stochastic or random deployment [2]. In deter- 
ministic deployment, the goal is to minimize the 
number of sensors. In the latter one, the goal is 
to improve coverage ratio. Normally, in random 
or stochastic deployment, one target is covered 
by several sensors. It is unnecessary and a waste 
of energy to activate all sensors around the tar- 
get to monitor it. We can prolong the coverage 
duration through making sleep/activate schedules 
in wireless sensor networks when we don’t have 
abundant energy resources. In Fig. 1, t1, t2, and t3
are three targets, and s1, s2, and s3 are three sensors.
Sensor s1 can cover t1 and t2, s2 can cover t1 and t3,
and s3 can cover t2 and t3. Assume each sensor can be
active for 1 h. Then s1 and s2 can collaborate to cover
all targets, and the coverage duration will be 1 h.
After 1 h, there are not enough sensors to cover
all targets. The coverage lifetime in this case is
1 h, and s3 sleeps throughout this hour. There is
another coverage choice: s1 and s2 collaborate for
0.5 h to cover all targets while s3 sleeps, s2 and
s3 collaborate for 0.5 h while s1 sleeps, and s1 and
s3 collaborate for 0.5 h while s2 sleeps. The total
coverage lifetime then becomes 1.5 h. The problem
is how to divide sensors into groups and how long
each group should work to prolong the coverage
lifetime. One sensor can appear in several groups,
but the total active time of one sensor must not
exceed its battery capacity.

Maximum Lifetime Coverage, Fig. 1 Network model
and comparison of disjoint and non-disjoint coverage [3]
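The arithmetic of the two schedules in this example can be checked mechanically. The sketch below is only an illustration: the dictionary encoding of the instance and the function name `schedule_lifetime` are assumptions, and the coverage relation is the one read off from the example (s1 covers t1 and t2, s2 covers t1 and t3, s3 covers t2 and t3).

```python
def schedule_lifetime(cover, battery, schedule, targets):
    """Return the total lifetime of a schedule, or raise ValueError if infeasible.

    cover:    sensor -> set of targets it can monitor
    battery:  sensor -> maximum total active time (hours)
    schedule: list of (group of sensors, duration in hours)
    """
    used = {s: 0.0 for s in battery}
    for group, duration in schedule:
        covered = set().union(*(cover[s] for s in group))
        if not targets <= covered:
            raise ValueError(f"group {sorted(group)} misses a target")
        for s in group:
            used[s] += duration
    for s, t in used.items():
        if t > battery[s] + 1e-9:
            raise ValueError(f"sensor {s} exceeds its battery capacity")
    return sum(duration for _, duration in schedule)


cover = {"s1": {"t1", "t2"}, "s2": {"t1", "t3"}, "s3": {"t2", "t3"}}
battery = {"s1": 1.0, "s2": 1.0, "s3": 1.0}
targets = {"t1", "t2", "t3"}

disjoint = [({"s1", "s2"}, 1.0)]
rotating = [({"s1", "s2"}, 0.5), ({"s2", "s3"}, 0.5), ({"s1", "s3"}, 0.5)]
print(schedule_lifetime(cover, battery, disjoint, targets))
print(schedule_lifetime(cover, battery, rotating, targets))
```

In the rotating schedule every sensor is active in exactly two groups of 0.5 h each, so each battery is used for exactly 1 h while the network stays covered for 1.5 h.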

We model a wireless sensor network as a graph
G = (S, T, E, W, L). The sensor set is denoted as
S, and T represents the set of targets in the network.
If a target t ∈ T can be covered by a sensor s ∈ S,
then there is an edge (s, t) in G. In Fig. 1b, there
is an edge between t1 and s1 since t1 is in s1's
sensing range. All of these edges are stored in E.
Heterogeneous sensors are considered: different
sensors may consume different amounts of energy
to do the same task, so in general the sensors have
different weights. W denotes the weights of
all sensors, and L denotes the energy capacity
of all sensors in G. Based on the definition of
G, the formal definition of MLCP is as
follows.
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Formal Definition of MLCP 


Definition 1 (MLCP [3]) Given
$G = (S, T, E, W, L)$, MLCP is to find a set of sensor
subsets together with a duration for each subset,
$(S_1, L_1), (S_2, L_2), \ldots, (S_k, L_k)$ in $G$, to
maximize $\sum_{i=1}^{k} L_i$, where $S_i$ represents a
sensor subset in $G$ and $L_i$ represents the time
duration of $S_i$, satisfying:


1. $\forall i \in \{1, 2, \ldots, k\}$, $S_i$ satisfies full coverage:
$\forall t \in T$ and $\forall S_i$, $\exists s \in S_i$ satisfying $(t, s) \in E$.

2. For each sensor, the total active time should be
smaller than or equal to its power constraint.


It is a long-standing open problem whether 
Maximum Lifetime Coverage Problem (MLCP) 
has a polynomial-time constant-approximation 
algorithm [3]. 

Based on primal-dual method (PD method), 
Minimum Weight Sensor Coverage Problem 
(MWSCP) is used to help to solve MLCP. 
The formal definition of MWSCP is given in 
Definition 2. 


Definition 2 (MWSCP [3]) MWSCP is to find
a sensor subset $SC$ in $G = (S, T, E, W, L)$
to minimize $\sum_{s \in SC} w(s)$, where $w(s)$ represents
the weight of sensor $s$, such that $\forall t \in T$, $\exists s \in SC$
satisfying $(t, s) \in E$.


Key Results 


1. Heuristic linear programming without perfor- 
mance guarantee [4] 

2. Pure heuristic algorithm better than heuristic 
linear programming 

3. A 4-approximation algorithm for MLCP,
improving on the ratio of 1 + ln n, where n is
the number of sensors

4. A 4-approximation algorithm for MWSCP


Integer Linear Programming and Heuristic 
Algorithms Proposed by Cardei et al. [4] 
Cardei et al. prove that MLCP is NP-hard and propose two algorithms [4]. The first algorithm models MLCP as an integer linear program. To solve it, the authors first relax the integer variables to real values and obtain an optimal solution to the relaxed linear program. The maximum time duration of each sensor group is then determined from this optimal solution, and the remaining battery capacity of every sensor is updated. A new MLCP instance is formed with the sensors' remaining energy after the previous round, and the procedure is repeated. The lifetime achieved in the network is the total duration accumulated over all rounds.

Maximum Lifetime Coverage, Fig. 2. Double partition and shifting [3]

The second algorithm is a heuristic. It finds a sensor group that covers all targets; the lifetime of this group is determined by the minimum remaining energy among the sensors in the group. It then updates the energy level of every sensor and chooses a sensor group again, until no such group can be found. The final lifetime is the sum of the time durations of all sensor groups.
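
The loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Cardei et al.'s exact procedure: the text does not say how a covering group is chosen, so the hypothetical helper `greedy_cover` picks sensors greedily by the number of still-uncovered targets, and the names `coverage`, `energy`, and `heuristic_lifetime` are illustrative. Every active sensor is assumed to consume one unit of energy per unit of time, so sensor weights are ignored.

```python
def greedy_cover(targets, coverage, energy):
    """Return a set of sensors with energy left that covers all targets, or None.

    Assumption: sensors are picked greedily by the number of uncovered
    targets they cover; the source text does not fix a selection rule.
    """
    uncovered = set(targets)
    chosen = set()
    alive = {s for s, e in energy.items() if e > 0}
    while uncovered:
        candidates = sorted(alive - chosen)  # sorted only to break ties deterministically
        best = max(candidates, key=lambda s: len(coverage[s] & uncovered), default=None)
        if best is None or not coverage[best] & uncovered:
            return None  # some target cannot be covered any more
        chosen.add(best)
        uncovered -= coverage[best]
    return chosen


def heuristic_lifetime(targets, coverage, energy):
    """Repeat: pick a covering group, run it until its weakest sensor is empty."""
    energy = dict(energy)  # work on a copy of the remaining energy
    schedule = []
    while True:
        group = greedy_cover(targets, coverage, energy)
        if not group:  # None (no cover left) or empty (no targets)
            break
        duration = min(energy[s] for s in group)  # limited by the weakest sensor
        for s in group:
            energy[s] -= duration
        schedule.append((group, duration))
    return schedule, sum(d for _, d in schedule)


if __name__ == "__main__":
    coverage = {1: {"a", "b"}, 2: {"a"}, 3: {"b"}}
    print(heuristic_lifetime({"a", "b"}, coverage, {1: 1, 2: 1, 3: 1}))
```

Each round drains at least one sensor completely, so the loop runs at most once per sensor. A different selection rule can be tried by replacing `greedy_cover` only.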


Performance-Guaranteed Approximation
Algorithm Proposed by Ding et al. [3]

Ding et al. use the primal-dual method to solve MLCP. The primal problem is MWSCP [5]. To get a constant-approximation algorithm for MWSCP, double partition and shifting are used. As shown in Fig. 2, the area is divided into cells of size $\frac{\sqrt{2}}{2} m r \times \frac{\sqrt{2}}{2} m r$, where $r$ represents the sensing range of the sensors and $m$ is a predetermined value; $r = 1$ in Fig. 2. Each cell is further divided into small squares of size $\frac{\sqrt{2}}{2} r \times \frac{\sqrt{2}}{2} r$, which means there will be $m \times m$ small squares in each cell. After the two partitions, there will be $m$ horizontal strips and $m$ vertical strips in each cell. Using dynamic programming, the optimal solution to MWSCP can be found in each cell. Combining the optimal solutions of all cells gives an approximation algorithm for MWSCP over the whole area. To handle the sensors on the border of each cell, the small squares are shifted to the purple position in Fig. 2 and then to the yellow position. The shifting stops after $m$ times. The final result is obtained by taking the average of the solutions for the black partition, the purple partition, the yellow partition, and the other shifts. It is proved in [3] that the final result for MWSCP has a constant performance ratio of 4. The running time of the algorithm is determined by the predetermined value $m$. If $m$ is large, the result is more precise, but the running time is high. If $m$ is small, the running time is low, but the performance drops.

Based on the idea of the primal-dual method, the solution to MWSCP helps to derive a solution to MLCP with the same performance ratio. The solution is derived iteration by iteration. In each iteration, MWSCP is solved first, and then the time duration of the resulting sensor set is determined. The lifetime and weight of each sensor are then updated. The algorithm stops when no such sensor set exists.

Maximum Lifetime Coverage, Fig. 3 Comparison of lifetime. (a) Lifetimes among different algorithms. (b) Lifetimes among different partitions [3]. Both panels plot lifetime against the number of sensors for target = 5 and target = 10; panel (a) compares Alg. 3 with the heuristic, and panel (b) compares m = 6 with m = 3.


Experimental Results 


Cardei et al. demonstrate that their pure heuristic 
algorithm outperforms their heuristic linear pro- 
gramming algorithm in running time and lifetime. 

Ding et al. [3] conduct their experimental comparisons in an area of $6\sqrt{2} \times 6\sqrt{2}$ with $m = 6$. They deploy sensors and targets randomly in that area. All sensors have the same sensing range of 2, and the initial power capacity of each sensor is 1. To show the effect of density on performance, they increase the number of sensors from 15 to 70 in steps of 5 and increase the number of targets from 5 to 10. Ding et al. compare their algorithm to Cardei's pure heuristic algorithm. If there are more sensors and the number of targets is fixed, the lifetime increases because there are more sensor groups (Fig. 3).

Problem Definition


Let $G = (V, E)$ be an undirected graph, and let $n = |V|$, $m = |E|$. A matching in $G$ is a subset $M \subseteq E$ such that no two edges of $M$ have a common endpoint. A perfect matching is a matching of cardinality $n/2$. The most basic matching-related problems are finding a maximum matching (i.e., a matching of maximum size) and, as a special case, finding a perfect matching if one exists. One can also consider the case where a weight function $w : E \to \mathbb{R}$ is given and the problem is to find a maximum weight matching.

The maximum matching and maximum weight matching problems are two of the most fundamental algorithmic graph problems. They have also played a major role in the development of combinatorial optimization and algorithmics.


Maximum Matching 


An excellent account of this can be found in a classic monograph [11] by Lovász and Plummer devoted entirely to matching problems. A more up-to-date but also more technical discussion of the subject can be found in [19].


Classical Approach 
Solving the maximum matching problem in time 
polynomial in n is a highly nontrivial task. The 
first such solution was given by Edmonds [3] in 1965 and has time complexity $O(n^3)$. Edmonds' ingenious algorithm uses a combinatorial approach based on augmenting paths and blossoms.
Several improvements followed, culminating in the algorithm with complexity $O(m\sqrt{n})$ given by Micali and Vazirani [12] in 1980 (a complete proof of the correctness of this algorithm was given much later by Vazirani [21]; a nice exposition of the algorithm and its generalization to the weighted case can be found in a work of Gabow and Tarjan [4]). Beating this bound proved very difficult; several authors managed to achieve only a logarithmic speed-up for certain values of m and n. All these algorithms essentially follow the combinatorial approach introduced by Edmonds.

The maximum matching problem is much simpler for bipartite graphs. The complexity of $O(m\sqrt{n})$ was achieved for this case already in 1971 by Hopcroft and Karp [7], while the key ideas of the first polynomial algorithms date back to the 1920s and the works of König and Egerváry (see [11] and [19]).


Algebraic Approach 

Around the time Micali and Vazirani introduced their matching algorithm, Lovász gave a randomized (Monte Carlo) reduction of the problem of testing whether a given n-vertex graph has a perfect matching to the problem of computing a certain determinant of an n × n matrix. Using the Hopcroft-Bunch fast Gaussian elimination algorithm [1], this determinant can be computed in time $MM(n) = O(n^\omega)$, the time required to multiply two n × n matrices. Since $\omega < 2.38$ (see [2, 20]), for dense graphs, this algorithm is asymptotically faster than the matching algorithm of Micali and Vazirani.
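
The determinant test can be illustrated with a short Python sketch. It assumes the matrix in question is the Tutte matrix of the graph (a skew-symmetric matrix with one indeterminate per edge), with the indeterminates replaced by random elements of a prime field. For simplicity it uses ordinary cubic-time Gaussian elimination rather than the Hopcroft-Bunch algorithm, so it shows the reduction, not the $O(n^\omega)$ running time. The names `P`, `det_mod_p`, and `has_perfect_matching` are illustrative.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; all arithmetic is done in the field Z_P


def det_mod_p(matrix, p=P):
    """Determinant of a square matrix over Z_p by plain Gaussian elimination."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] % p != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def has_perfect_matching(n, edges, rng=random):
    """Monte Carlo test on vertices 0..n-1.

    An answer of True is always correct. An answer of False can be wrong
    with probability at most n / P (Schwartz-Zippel bound).
    """
    t = [[0] * n for _ in range(n)]
    for u, v in edges:
        x = rng.randrange(1, P)  # random value for the indeterminate of edge (u, v)
        t[u][v] = x
        t[v][u] = (-x) % P
    return det_mod_p(t) != 0


if __name__ == "__main__":
    print(has_perfect_matching(4, [(0, 1), (1, 2), (2, 3)]))  # path on 4 vertices
    print(has_perfect_matching(4, [(0, 1), (0, 2), (0, 3)]))  # star, no perfect matching
```

The one-sided error comes from the fact that the determinant is identically zero as a polynomial exactly when the graph has no perfect matching, so a nonzero value is a certificate.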


However, Lovász's algorithm only tests for a perfect matching; it does not find one. Using it to find perfect/maximum matchings in a straightforward fashion yields an algorithm with complexity $O(m n^\omega) = O(n^{4.38})$. A major open problem in the field was thus: can maximum matchings actually be found in $O(n^\omega)$ time?

The first step in this direction was taken in 1989 by Rabin and Vazirani [16]. They showed that maximum matchings can be found in time $O(n^{\omega+1}) = O(n^{3.38})$.


Key Results 


The following theorems state the key results of 
[13]. 


Theorem 1 Maximum matching in an $n$-vertex graph $G$ can be found in $\tilde{O}(n^3)$ time (Las Vegas) by performing Gaussian elimination on a certain matrix related to $G$.

Theorem 2 Maximum matching in an $n$-vertex bipartite graph can be found in $\tilde{O}(n^\omega)$ time (Las Vegas) by performing a Hopcroft-Bunch fast Gaussian elimination on a certain matrix related to $G$.

Theorem 3 Maximum matching in an $n$-vertex graph can be found in $\tilde{O}(n^\omega)$ time (Las Vegas).

Note: $\tilde{O}$ notation suppresses polylogarithmic factors, so $\tilde{O}(f(n))$ means $O(f(n) \log^k(n))$ for some $k$.

Let us briefly discuss these results. Theorem 1 shows that effective matching algorithms can be simple. This is in sharp contrast to augmenting path- and blossom-based algorithms, which are generally regarded as quite complicated.

The other two theorems show that, for dense 
graphs, the algebraic approach is asymptotically 
faster than the combinatorial one. 

The algorithm for the bipartite case is very simple. Its only nonelementary part is the fast matrix multiplication algorithm used as a black box by the Hopcroft-Bunch algorithm. The general algorithm, however, is complicated and uses strong structural results from matching theory. A natural question is whether or not it is possible to give a simpler and/or purely algebraic algorithm. This has been positively answered by Harvey [5].

Several other related results followed. Mucha and Sankowski [14] showed that maximum matchings in planar graphs can be found in time $O(n^{\omega/2}) = O(n^{1.19})$, which is currently the fastest known. Yuster and Zwick [22] extended this to any excluded minor class of graphs. Harvey [6] described a significantly simpler and purely algebraic version of the algorithm for general graphs. Sankowski [17] gave an RNC work-efficient matching algorithm (see also Mulmuley et al. [15] and Karp et al. [9] for earlier, less efficient RNC matching algorithms, and Karloff [8] for a description of a general technique for making such algorithms Las Vegas). He also generalized Theorem 2 to the case of weighted bipartite graphs with integer weights from $[0, \ldots, W]$, showing that in this case maximum weight matchings can be found in time $\tilde{O}(W n^\omega)$ (see [18]).


Applications 


The maximum matching problem has numerous 
applications, both in practice and as a subroutine 
in other algorithms. A nice discussion of practical 
applications can be found in the monograph [11] 
by Lovász and Plummer. It should be noted,
however, that algorithms based on fast matrix 
multiplication are completely impractical, so the 
results discussed here are not really useful in 
these applications. 

On the theoretical side, faster maximum (weight) matching algorithms yield faster algorithms for related problems: the disjoint s-t paths problem, the minimum (weight) edge cover problem, the (maximum weight) b-matching problem, the (maximum weight) b-factor problem, the maximum (weight) T-join problem, and the Chinese postman problem. For a detailed discussion of all these applications, see [11] and [19].

The algebraic algorithm of Theorem 1 also has a significant educational value. The combinatorial algorithms for the general maximum matching problem are generally regarded as too complicated for an undergraduate course. That is definitely not the case with the algebraic $\tilde{O}(n^3)$ algorithm.


Open Problems 


One of the most important open problems in the area is generalizing the results discussed above to weighted graphs. Sankowski [18] gives an $\tilde{O}(W n^\omega)$ algorithm for bipartite graphs with integer weights from the interval $[0, \ldots, W]$. The running time of this algorithm depends poorly on $W$: it grows linearly with the largest weight. No effective algebraic algorithm is known for general weighted graphs.

Another interesting but most likely very hard problem is the derandomization of the algorithms discussed.

Problem Definition


Given a sequence of numbers, $A = (a_1, a_2, \ldots, a_n)$, and two positive integers L, U, where $1 \le L \le U \le n$, the maximum-density segment problem is to find a consecutive subsequence, i.e., a segment or substring, of A with length at least L and at most U such that the average value of the numbers in the subsequence is maximized.


Key Results 


If there is no length constraint, then obviously 
the maximum-density segment is the maximum 
number in the sequence. Let’s first consider the 
problem where only the length lower bound L 
is imposed. By observing that the length of the 
shortest maximum-density segment with length 
at least L is at most 2L — 1, Huang [9] gave an 
O(nL)-time algorithm. Lin et al. [13] proposed a 
new technique, called the right-skew decomposi- 
tion, to partition each suffix of A into right-skew 
segments of strictly decreasing averages. The 
right-skew decomposition can be done in O(n) time, and it can report, for each position i, a consecutive subsequence of A starting at that position such that the average value of the numbers in the subsequence is maximized. On the basis of the right-skew decomposition, Lin et al. [13] devised an O(n log L)-time algorithm for the maximum-density segment problem with a lower bound L, which was improved to O(n) time by Goldwasser et al. [8]. Kim [11] gave another O(n)-time algorithm by reducing the problem to the maximum-slope problem in computational geometry. As for the problem which takes both L and U into consideration, Chung and Lu [6] bypassed the construction of the right-skew decomposition and gave an O(n)-time algorithm.
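
The simplest of these bounds, O(nL) for the lower-bound-only problem, follows directly from the observation that some optimal segment has length at most 2L - 1. The sketch below is a straightforward Python rendering of that idea, not a transcription of Huang's algorithm; the function name `max_density_segment` is illustrative, and the input is assumed to consist of integers so that densities can be compared exactly with `Fraction`.

```python
from fractions import Fraction
from itertools import accumulate


def max_density_segment(a, L):
    """Return (density, start, end) of a densest segment of length >= L.

    Only lengths L..2L-1 are tried, which suffices because a longer segment
    can be split into two pieces of length >= L, one of which is at least
    as dense. Indices are 0-based and inclusive. Runs in O(nL) time.
    """
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = [0] + list(accumulate(a))  # prefix[i] = a[0] + ... + a[i-1]
    best = None
    for start in range(n - L + 1):
        longest = min(2 * L - 1, n - start)
        for length in range(L, longest + 1):
            density = Fraction(prefix[start + length] - prefix[start], length)
            if best is None or density > best[0]:
                best = (density, start, start + length - 1)
    return best


if __name__ == "__main__":
    # Binary GC indicator of the DNA sequence ATGACTCGAGCTCGTCA.
    gc = [int(c) for c in "00101011011011010"]
    print(max_density_segment(gc, 3))
```

Adding an upper bound U only requires capping `longest` at `min(U, 2 * L - 1, n - start)`, by the same splitting argument.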

It should be noted that a closely related prob- 
lem in data mining, which basically deals with a 
binary sequence, was independently formulated 
and studied by Fukuda et al. [7]. 


An Extension to Multiple Segments 

Given a sequence of numbers, $A = (a_1, a_2, \ldots, a_n)$, and two positive integers L and k, where $k \le n/L$, let $d(A[i, j])$ denote the density of segment $A[i, j]$, defined as $(a_i + a_{i+1} + \cdots + a_j)/(j - i + 1)$. The problem is to find k disjoint segments $\{s_1, s_2, \ldots, s_k\}$ of A, each of length at least L, such that $\sum_{1 \le i \le k} d(s_i)$ is maximized.

Chen et al. [5] proposed an O(nkL)-time algorithm, and an improved $O(nL + k^2 L^2)$-time algorithm was given by Bergkvist and Damaschke [2]. Liu and Chao [14] gave an $O(n + k^2 L \log L)$-time algorithm.


Applications 


In all organisms, the GC base composition of DNA varies between 25 % and 75 %, with the greatest variation in bacteria. Mammalian genomes typically have a GC content of 45-50 %. Nekrutenko and Li [15] showed that the extent of the compositional heterogeneity in a genomic sequence strongly correlates with its GC content. Genes are found predominantly in the GC-richest isochore classes. Hence, finding GC-rich regions is an important problem in gene recognition and comparative genomics.

Given a DNA sequence, one would attempt to find segments of length at least L with the highest C+G ratio. Specifically, each of nucleotides C and G is assigned a score of 1, and each of nucleotides A and T is assigned a score of 0.

DNA sequence:    ATGACTCGAGCTCGTCA
Binary sequence: 00101011011011010

The maximum-average segments of the binary sequence correspond to those segments with the highest GC ratio in the DNA sequence. Readers can refer to [1, 3, 4, 11-13, 16-18] for more variants and applications.


Open Problems 


The best asymptotic time bound of the algorithms for the multiple maximum-density segments problem is $O(n + k^2 L \log L)$. Can this problem be solved in O(n) time?

Problem Definition


Given a sequence of numbers, $A = (a_1, a_2, \ldots, a_n)$, and two positive integers L, U, where $1 \le L \le U \le n$, the maximum-sum segment problem is to find a consecutive subsequence, i.e., a segment or substring, of A with length at least L and at most U such that the sum of the numbers in the subsequence is maximized.


Key Results 


The maximum-sum segment problem without 
length constraints is linear-time solvable by using 
Kadane’s algorithm [2]. Huang extended the 
recurrence relation used in [2] for solving the 
maximum-sum segment problem and derived 
a linear-time algorithm for computing the 
maximum-sum segment with length at least L. 
Lin et al. [13] proposed an O(n)-time algorithm 
for the maximum-sum segment problem with 
both L and U constraints, and an online version 
was given by Fan et al. [10]. 
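
One standard way to reach linear time with both length bounds is a sliding-window minimum over prefix sums. The Python sketch below uses that technique; it is an illustration under that assumption and is not claimed to be the algorithm of Lin et al. [13]. The name `max_sum_segment` is illustrative. Setting L = 1 and U = n gives the unconstrained problem solved by Kadane's algorithm.

```python
from collections import deque
from itertools import accumulate


def max_sum_segment(a, L, U):
    """Return (sum, start, end) of a maximum-sum segment with L <= length <= U.

    Indices are 0-based and inclusive. For every end position the best start
    is the one with the smallest prefix sum inside the allowed window, which
    a monotone deque maintains in amortized O(1) time, for O(n) overall.
    """
    n = len(a)
    if not 1 <= L <= U <= n:
        raise ValueError("need 1 <= L <= U <= len(a)")
    prefix = [0] + list(accumulate(a))  # prefix[i] = a[0] + ... + a[i-1]
    window = deque()  # candidate starts, with strictly increasing prefix values
    best = None
    for end in range(L, n + 1):  # the segment is a[start:end]
        newest = end - L  # latest start allowed for this end
        while window and prefix[window[-1]] >= prefix[newest]:
            window.pop()
        window.append(newest)
        while window[0] < end - U:  # drop starts that make the segment too long
            window.popleft()
        start = window[0]
        total = prefix[end] - prefix[start]
        if best is None or total > best[0]:
            best = (total, start, end - 1)
    return best


if __name__ == "__main__":
    print(max_sum_segment([2, -5, 3, 4, -1, 2], 1, 6))
    print(max_sum_segment([2, -5, 3, 4, -1, 2], 2, 3))
```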


An Extension to Multiple Segments 


Computing the k largest sums over all possible segments is a natural extension of the maximum-sum segment problem. This extension has been considered from two perspectives, one of which allows the segments to overlap, while the other does not.

Linear-time algorithms for finding all the nonoverlapping maximal segments were given in [5, 15]. On the other hand, one may focus on finding the k maximum-sum segments when overlapping is allowed. A naive approach is to choose the k largest from the sums of all possible contiguous subsequences, which requires $O(n^2)$ time. Bae and Takaoka [1] presented an O(kn)-time algorithm for the k maximum-sum segment problem. Liu and Chao [14] noted that the k maximum-sum segment problem can be solved in O(n + k) time [9] and gave an O(n + k)-time algorithm for the length-constrained k maximum-sum segment problem.
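
For reference, the naive approach for overlapping segments is easy to write down. The Python sketch below enumerates all $O(n^2)$ segment sums from prefix sums and keeps the k largest with a bounded heap, which adds a log k factor to the enumeration cost. The function name `k_largest_segment_sums` is illustrative, and the sketch is only a baseline, not one of the cited algorithms.

```python
import heapq
from itertools import accumulate


def k_largest_segment_sums(a, k):
    """Return the k largest sums over all contiguous segments, largest first."""
    n = len(a)
    prefix = [0] + list(accumulate(a))  # prefix[i] = a[0] + ... + a[i-1]
    sums = (
        prefix[end] - prefix[start]
        for start in range(n)
        for end in range(start + 1, n + 1)
    )
    return heapq.nlargest(k, sums)


if __name__ == "__main__":
    print(k_largest_segment_sums([1, -2, 3], 3))
```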


Applications 


The algorithms for the maximum-sum segment 
problem have applications in finding GC-rich re- 
gions in a genomic DNA sequence, postprocess- 
ing sequence alignments, and annotating multiple 
sequence alignments. Readers can refer to [3-8, 
11, 13, 15-18] for more variants and applications. 


Open Problems 


It would be interesting to consider the higher-dimensional cases.

Problem Definition


The max-min allocation problem has the following setting. There is a set $A$ of $m$ agents and a set $J$ of $n$ items. Each agent $i \in A$ has utility $u_{ij} \in \mathbb{R}_{\ge 0}$ for item $j \in J$. Given a subset of items $S \subseteq J$, the utility of this set to agent $i$ is denoted as $u_i(S) := \sum_{j \in S} u_{ij}$. The max-min allocation problem is to find an allocation of items to agents such that the minimum utility among the agents is maximized. That is, $\min_{i \in A} u_i(S_i)$ is maximized, where $S_i \subseteq J$ is the set of items allocated to agent $i$ and $S_i \cap S_j = \emptyset$ for $i \ne j$.

The problem naturally arises as an approach to 
maximize fairness. Fairness is an important con- 
cept arising in numerous settings ranging from 
border disputes in political science to frequency 
allocations in spectrum auctions. Max-min fair- 
ness is one of the standard notions of fairness 
and has been an object of study for decades [6]. 
Most of the older works, however, have focussed 
on divisible settings, that is, situations where the 
resource can be infinitely divided and allocated. 
Furthermore, the computational perspective, that 
is, how efficiently can one find a fair allocation, 
has not been a primary viewpoint. The max-min 
allocation problem is a combinatorial allocation 
problem where the items cannot be divided, and 
the interest is in designing polynomial time algo- 
rithms to obtain fair, or near-fair, allocations. 


Key Results 


The max-min allocation problem is NP-hard, and the focus is on designing approximation algorithms. Let OPT be the optimum value of a certain instance. A $\rho$-approximate solution, for $\rho > 1$, is an allocation where each agent gets utility at least OPT/$\rho$. A $\rho$-approximation algorithm returns a $\rho$-approximate solution given any instance. An algorithm is a polynomial time approximation scheme (PTAS) if for any constant $\varepsilon > 0$, it returns a $(1 + \varepsilon)$-approximate solution in polynomial time.

Woeginger [11] obtained a PTAS for the max-min allocation problem when the utility of an item is the same for all agents. Bezáková and Dani [5] gave the first nontrivial $(n - m + 1)$-approximation algorithm for the general max-min allocation problem and also showed that it is NP-hard to obtain a better than 2-approximation algorithm for the problem. The latter result remains the best hardness result known to date.

Bansal and Sviridenko [3] introduced a 
restricted version of the max-min allocation 
problem which they called the Santa Claus 
problem. In this version, each item $j$ has an inherent utility $u_j$; however, it can only be allocated to an agent in a certain subset $A_j \subseteq A$. Equivalently, for each item $j$ and agent $i$, $u_{ij} \in \{u_j, 0\}$. Bansal and Sviridenko [3] described an $O(\log\log m / \log\log\log m)$-approximation algorithm for the Santa Claus problem. Soon after, Feige [8] described an algorithm which estimates the value of the optimum of the Santa Claus problem up to an $O(1)$ factor in polynomial time, although at the time no efficient algorithm was known to construct the allocation. Following constructive versions of the Lovász local lemma due to Moser and Tardos [10] and Haeupler
et al. [9], there now exists a polynomial time 
O(1)-approximation algorithm for the Santa 
Claus problem. The constant, however, is 
necessarily quite large and to our knowledge 
has not been explicitly specified in any published 
work. In contrast, Asadpour et al. [2] described 
a local search algorithm which returns a 4- 
approximate solution to the Santa Claus problem; 
however, it is not known whether the procedure 
terminates in polynomial time or not. 

Asadpour and Saberi [1] described a polynomial time $O(\sqrt{m} \log^3 m)$-approximation algorithm for the general max-min allocation problem. Bateni et al. [4] obtained an $O(m^{\varepsilon})$-approximation algorithm running in $m^{O(1/\varepsilon)}$ time for certain special cases of the max-min allocation problem; in their special cases, utilities $u_{ij}$ lie in the set $\{0, 1, \infty\}$, and furthermore, for each item $j$ there exists at most one agent $i$ with $u_{ij} = \infty$. Chakrabarty et al. [7] designed an $O(n^{\varepsilon})$-approximation algorithm for the general max-min allocation problem which runs in $n^{O(1/\varepsilon)}$ time, for any $\varepsilon = \Omega(\log\log n / \log n)$. This implies a quasi-polynomial time $O(\log^{10} n)$-approximation algorithm and a polynomial time $O(m^{\varepsilon})$-approximation algorithm, for any constant $\varepsilon > 0$, for the max-min allocation problem. An algorithm runs in quasi-polynomial time if the logarithm of its running time is upper bounded by a polynomial in the bit length of the data.


Sketch of the Techniques 

Almost all algorithms for the max-min allocation 
problem follow by rounding linear programming 
(LP) relaxations of the problem. One starts with a 
guess $T$ of the optimum OPT. Using this, one writes an LP which has a feasible solution if OPT $\ge T$. The nontrivial part is to round this LP solution to obtain an allocation with every agent getting utility $\ge T/\rho$. Since, by doing a binary search over the guesses, one can get $T$ very close to OPT, the rounding step implies a $\rho$-approximation algorithm. Henceforth, we assume
that T has been guessed to be OPT, and further- 
more, by scaling all utility values appropriately, 
we assume OPT = 1. 

The first LP relaxation one may think of is the following. First one clips each utility value at 1: $u_{ij} := \min(1, u_{ij})$. If OPT = 1, then the following LP is feasible.

\[
\begin{aligned}
\sum_{j \in J} u_{ij}\, x_{ij} &\ge 1 && \forall i \in A \qquad (1)\\
\sum_{i \in A} x_{ij} &= 1 && \forall j \in J \qquad (2)\\
x_{ij} &\ge 0 && \forall i \in A,\ j \in J
\end{aligned}
\]

The first inequality states that every agent gets utility at least OPT = 1, and the second states that each item is allocated. It is not hard to find instances, and in fact Santa Claus instances, where the LP is feasible for OPT = 1, but in any allocation, some agent will obtain utility at most $1/m$. In other words, the integrality gap of this LP relaxation is (at least) $m$.

There is a considerably stronger LP relaxation which is called the configuration LP relaxation. The variables in this LP are of the form $y_{i,C}$ where $i \in A$ and $C \subseteq J$ with $\sum_{j \in C} u_{ij} \ge 1$; $(i, C)$ is called a feasible configuration in this case. $\mathcal{C}$ denotes the collection of all feasible configurations.

\[
\begin{aligned}
\sum_{C : (i,C) \in \mathcal{C}} y_{i,C} &= 1 && \forall i \in A \qquad (3)\\
\sum_{i \in A} \ \sum_{C : (i,C) \in \mathcal{C},\ j \in C} y_{i,C} &= 1 && \forall j \in J \qquad (4)
\end{aligned}
\]

The first constraint states that each agent gets precisely one subset of items, and the second states that each item is in precisely one feasible configuration. Although the LP has possibly exponentially many variables, the dual has only polynomially many variables and can be solved to arbitrary accuracy via the ellipsoid method. We refer the reader to [3] for details.
Bansal and Sviridenko [3] show that in the Santa Claus problem, a solution to the above LP can be rounded to give an allocation where every agent obtains utility $\rho = \Omega\left(\frac{\log\log\log m}{\log\log m}\right)$. To do this, the authors partition the items into big, if $u_j \ge \rho$, and small otherwise. A solution is $\rho$-approximate if every agent gets either one big item or $\ge 1/\rho$ small items. The big items are taken care of via a “matching like” procedure, while the small items are allocated by randomized rounding. Bansal and Sviridenko [3] use the Lovász local lemma (LLL) to analyze the randomized algorithm.
Feige [8] uses a more sophisticated randomized 
rounding in phases along with an LLL analysis 
to obtain a constant factor approximation to the 
value of the optimum. At the time, no algorithmic 
proofs of LLL were known; however, following 
works of [9, 10], a polynomial time O(1)- 
approximation algorithm is now known for the 
Santa Claus problem. 

In the general max-min allocation problem, the main difficulty in generalizing the above technique is that the same item could be big for one agent and small for another agent. Asadpour and Saberi [1] give a two-phase rounding algorithm. In the first phase, a random matching is obtained between agents and their big items with roughly the property that each item is allocated with the same probability that the configuration LP prescribes. In the second phase, each agent $i$ randomly selects a set $C$ of items with probability $y_{i,C}$. Since there are $m$ agents, at most $m$ items are allocated in the first phase. This allows [1] to argue that, with high probability, there is enough (roughly $1/\sqrt{m}$) utility remaining among the unmatched items in $C$. Finally, they also show that the same item is “claimed” by no more than $O(\log m)$ agents.

The integrality gap of the configuration LP is $\Omega(\sqrt{m})$. Therefore a new LP relaxation is required to go beyond the Asadpour-Saberi result. We now briefly sketch the technique of Chakrabarty et al. [7]. First, they show that any instance can be “reduced” to a canonical instance where agents are either heavy or light. Heavy agents have utility 1 for a subset of big items and 0 for the rest. Light agents have a unique private item which gives them utility 1, and the rest of the items either are small and give utility $1/K$ or give utility 0. Here $K \approx n^{\varepsilon}$ is a large integer.
The LP of [7] is parametrized by a maximum 
matching M between heavy agents and their 
big items. If all heavy agents are matched, then 
there is nothing to be done since light agents 
can allocate their private item. Otherwise, there is a reassignment strategy in which a light agent is allocated $K$ small items, upon which he “frees” his private item, which is then allocated to another agent, and so on, until an unmatched heavy agent gets a big item. This reassignment can be seen as a directed in-arborescence whose depth can be argued to be at most $1/\varepsilon$, since at each level we encounter roughly $K$ new light agents. The LP encodes this reassignment as a flow with a variable for each flow path of length at most $1/\varepsilon$; this implies that the number of variables is at most $n^{O(1/\varepsilon)}$, with a similar number of constraints. Therefore, the LP can be solved in $n^{O(1/\varepsilon)}$ time, which dominates the running time. If the instance has OPT = 1, then the LP has a feasible solution. One would then expect that given such a feasible solution, one can obtain an allocation with every heavy agent getting a big item and each light agent getting either his private item or $n^{\varepsilon}$ small items. Unfortunately, this may not be true. What
Chakrabarty et al. [7] show is that if the LP has 
a feasible solution, then a “partial” allocation 
can be found where some light agents obtain 
sufficiently many small items, and their private 
items can be used to obtain a larger matching M’ 
among heavy agents and big items. The process 
is then repeated iteratively, with a new LP at 
each step guiding the partial allocation, till one 
matches every heavy agent. 


Summary 

In summary, the best polynomial time algorithms 
for the general max-min allocation problem, as of 
the date this article is written, have approximation 
factors which are a polynomial in the input data. 
On the other hand, even a 2-approximation for 
the problem has not been ruled out. Closing 
this gap is a confounding problem in the area 
of approximation algorithms. A constant factor 
approximation algorithm is known for the special 
case called the Santa Claus problem; even here, getting a polynomial time algorithm achieving a “small” constant factor is an interesting problem.

Problem Definition


Mechanism design and private data analysis both 
study the question of performing computations 
over data collected from individual agents while 
satisfying additional restrictions. The focus in 
mechanism design is on performing computations that are compatible with the incentives of the individual agents, and the additional restrictions are toward motivating agents to participate
in the computation (individual rationality) and 
toward having them report their true data (in- 
centive compatibility). The focus in private data 
analysis is on performing computations that limit 
the information leaked by the output on each 
individual agent’s sensitive data, and the addi- 
tional restriction is on the influence each agent 
may have on the outcome distribution (differen- 
tial privacy). We refer the reader to the sections 
on algorithmic game theory and on differential 
privacy for further details and motivation. 


Incentives and privacy. In real-world settings, 
incentives influence how willing individuals are 
to part with their private data. For example, an 
agent may be willing to share her medical data 
with her doctor, because the utility from sharing 
is greater than the loss of utility from privacy con- 
cerns, while she would probably not be willing to 
share the same information with her accountant. 
Furthermore, privacy concerns can also cause 
individuals to misbehave in otherwise incentive- 
compatible, individually rational mechanisms. 
Consider for example a second-price auction: 
the optimal strategy in terms of payoff is to 
truthfully report valuations, but an agent may 
consider misreporting (or abstaining) because 
the outcome reveals the valuation of the second- 
price agent, and the agent does not want to risk 
their valuation being revealed. In studies based 
on sensitive information, e.g., a medical study 
asking individuals to reveal whether they have 
syphilis, a typical individual with syphilis may be 
less likely to participate than a typical individual 
without the disease, thereby skewing the overall 
sample. The bias may be reduced by offering 
appropriate compensation to participating agents. 


The framework. Consider a setting with $n$ individual agents, and let $x_i \in X$ be the private data of agent $i$ for some type set $X$. Let $f : X^n \to Y$ be a function of the joint inputs of the agents $x = (x_1, \ldots, x_n)$. Our goal is to build a mechanism $M$ that computes $f(x)$ accurately and is compatible with incentives and privacy as we will now describe.


Mechanism Design and Differential Privacy 


We first fix a function $v$ that models the gain in utility that an agent
derives from the outcome of the mechanism. We restrict our attention to
a setting where this value can only depend on the agent's data and the
outcome $y$ of the mechanism:

$$v_i = v(x_i, y).$$


We also fix a function $\lambda$ that models the loss in utility that an
agent incurs because information about her private data is leaked by the
outcome of the mechanism. Importantly, $\lambda$ depends on the mechanism
$M$, as the computation $M$ performs determines the leakage. The loss can
also depend on how much the agent values privacy, described by a
parameter $p_i$ (a real number in our modeling), on the actual data of
all the individuals, on the outcome, as well as other parameters such as
the strategy of the agent:

$$\lambda_i = \lambda(M, p_i, x_{-i}, x_i, y, \ldots).$$


The overall utility that agent $i$ derives from participating in the
computation of $M$ is

$$u_i = v_i - \lambda_i. \qquad (1)$$

With this utility function in mind, our goal will be to construct
truthful mechanisms $M$ that compute $f$ accurately. We note that in
Eq. 1 we typically think about both $v_i$ and $\lambda_i$ as positive
quantities, but we do not exclude either of them being negative, so
either quantity may result in a gain or a loss in utility.

We can now define the mechanism $M : X^n \times \mathbb{R}^n \to Y$ to be
a randomized function that takes as inputs the private inputs of the
agents $x$ and their privacy valuations $p$ and returns a value in the
set $Y$.


Modeling the privacy loss. In order to analyze specific mechanisms, we
will need to be able to control the privacy loss $\lambda$. Toward this
end, we will need to assume that $\lambda$ has some structure, and so we
now discuss the assumptions we make and their justifications.

One view of privacy loss is to consider a framework of sequential games:
an individual is not only participating in mechanism $M$, but she will
also participate in other mechanisms $M', M'', \ldots$ in the future, and
each participation will cause her to gain or lose in utility. Because
her inputs to these functions may be correlated, revealing her private
inputs in $M$ may cause her to obtain less utility in the future. For
example, an individual may hesitate to participate in a medical study
because doing so might reveal she has a genetic predisposition to a
certain disease, which may increase her insurance premiums in the
future. This view is general and can formalize many of the concerns we
typically associate with privacy: discrimination because of medical
conditions, social ostracism, demographic profiling, etc.

The main drawback of this view is that it is difficult to know what the
future mechanisms $M', M'', \ldots$ may be. However, if $M$ is
differentially private, then participating in $M$ entails a guarantee
that remains meaningful even without knowing the future mechanisms. To
see this, we will use the following definition that is equivalent to the
definition of $\epsilon$-differential privacy [3]:


Definition 1 (Differential privacy) A (randomized) mechanism
$M : X^n \to Y$ is $\epsilon$-differentially private if for all
$x, x' \in X^n$ that differ on one entry, and for all
$g : Y \to [0, \infty)$, it holds that

$$\mathrm{Exp}[g(M(x))] \le e^{\epsilon} \cdot \mathrm{Exp}[g(M(x'))],$$

where the expectation is over the randomness introduced by the
mechanism $M$.


Note that $e^{\epsilon} \approx 1 + \epsilon$ for small $\epsilon$; thus,
if $g(y)$ models the expected utility of an individual tomorrow given
that the result of $M(x) = y$ today, then by participating in a
differentially private mechanism, the individual's expected utility will
change by a factor of at most roughly $1 + \epsilon$.


Fact 1. Let $g : Y \to [-1, 1]$. If $M$ is $\epsilon$-differentially
private, then
$\mathrm{Exp}[g(M(x'))] - \mathrm{Exp}[g(M(x))] \le 2(e^{\epsilon} - 1) \approx 2\epsilon$
for all $x, x' \in X^n$ that differ on one entry.

To see why this is true, let $g_-(y) = \max(0, -g(y))$ and
$g_+(y) = \max(0, g(y))$. From Definition 1 and the bound on the outcome
of $g$, we get that
$\mathrm{Exp}[g_+(M(x'))] - \mathrm{Exp}[g_+(M(x))] \le (e^{\epsilon} - 1) \cdot \mathrm{Exp}[g_+(M(x))] \le e^{\epsilon} - 1$
and, similarly,
$\mathrm{Exp}[g_-(M(x))] - \mathrm{Exp}[g_-(M(x'))] \le e^{\epsilon} - 1$.
As $g(y) = g_+(y) - g_-(y)$, we conclude that
$\mathrm{Exp}[g(M(x'))] - \mathrm{Exp}[g(M(x))] \le 2(e^{\epsilon} - 1)$.


With this in mind, we typically view $\lambda$ as being "bounded by
differential privacy" in the sense that if $M$ is
$\epsilon$-differentially private, then $|\lambda_i| \le p_i \cdot \epsilon$,
where $p_i$ (a positive real number) is an upper bound on the maximum
value of $2|g(y)|$. In certain settings we make even more specific
assumptions about $\lambda_i$, and these are discussed in the sequel.


Generic Problems 
We will discuss two generic problems for which 
key results will be given in the next section: 


Privacy-aware mechanism design. Given an optimization problem
$q : X^n \times Y \to \mathbb{R}$, construct a privacy-aware mechanism
whose output $\hat{y}$ approximately maximizes $q(x, \cdot)$. Using the
terminology above, this corresponds to setting
$f(x) = \mathrm{argmax}_y\, q(x, y)$, and the mechanism is said to
compute $f(\cdot)$ with accuracy $\alpha$ if (with high probability)
$q(x, f(x)) - q(x, \hat{y}) \le \alpha$. We mention two interesting
instantiations of $q(\cdot)$. When $q(x, y) = \sum_i v(x_i, y)$, the
problem is of maximizing social welfare. When $x_i$ corresponds to how
agent $i$ values a digital good and $Y = \mathbb{R}^+$ is interpreted as
a price for the good, setting $q(x, y) = y \cdot |\{i : x_i \ge y\}|$
corresponds to maximizing the revenue from the good.


Purchasing privacy. Given a function $f : X^n \to Y$, construct a
mechanism computing payments to agents for eliciting permission to use
(some of) the entries of $x$ in an approximation for $f(x)$. Here it is
assumed that the agents cannot lie about their private values (but can
misreport their privacy valuations). We will consider two variants of
the problem. In the insensitive value model, agents only care about the
privacy of their private values $x$. In the sensitive value model,
agents also care about the privacy of their privacy valuations $p$,
e.g., because there may be a correlation between $x_i$ and $p_i$.


Basic Differentially Private Mechanisms 

We conclude this section with two differentially 
private mechanisms that are used in the construc- 
tions presented in the next section. 


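The Laplace mechanism [3]. The Laplace distribution with parameter
$1/\epsilon$, denoted $\mathrm{Lap}(1/\epsilon)$, is a continuous
probability distribution with zero mean and variance $2/\epsilon^2$. The
probability density function of $\mathrm{Lap}(1/\epsilon)$ is
$h(z) = \frac{\epsilon}{2} e^{-\epsilon |z|}$. For $\Delta \ge 0$ we get
$\Pr_{Z \sim \mathrm{Lap}(1/\epsilon)}[|Z| > \Delta] = e^{-\epsilon \Delta}$.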


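Fact 2. The mechanism $M_{\mathrm{Lap}}$ that on input $x \in \{0, 1\}^n$
outputs $y = \#\{i : x_i = 1\} + Z$ where
$Z \sim \mathrm{Lap}(1/\epsilon)$ is $\epsilon$-differentially private.
From the properties of the Laplace distribution, we get that

$$\Pr_{y \sim M_{\mathrm{Lap}}(x)}\big[\,|y - \#\{i : x_i = 1\}| > \Delta\,\big] \le e^{-\epsilon \Delta}.$$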
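The noisy count of Fact 2 can be sketched in Python as follows. This is a minimal illustration rather than a hardened implementation: it assumes the standard-library `random` module as the source of randomness and samples $\mathrm{Lap}(1/\epsilon)$ as the difference of two independent exponential draws with rate $\epsilon$. The names `sample_laplace`, `noisy_count`, and `laplace_tail` are illustrative and do not come from [3].

```python
import math
import random


def sample_laplace(epsilon, rng=random):
    """Draw Z ~ Lap(1/epsilon): the difference of two independent
    Exp(rate=epsilon) variables is Laplace with scale 1/epsilon."""
    return rng.expovariate(epsilon) - rng.expovariate(epsilon)


def noisy_count(bits, epsilon, rng=random):
    """Return #{i : x_i = 1} + Z with Z ~ Lap(1/epsilon)."""
    return sum(bits) + sample_laplace(epsilon, rng)


def laplace_tail(epsilon, delta):
    """Pr[|Z| > delta] for Z ~ Lap(1/epsilon) and delta >= 0."""
    return math.exp(-epsilon * delta)
```

For example, `noisy_count([1, 0, 1, 1], 0.5)` returns the true count 3 perturbed by noise whose magnitude exceeds 2 with probability `laplace_tail(0.5, 2.0)`, i.e., $e^{-1}$.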


The exponential mechanism [8]. Consider the optimization problem defined
by $q : X^n \times Y \to \mathbb{R}$, where $q$ satisfies
$|q(x, y) - q(x', y)| \le 1$ for all $y \in Y$ and all $x, x'$ that
differ on one entry.


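Fact 3. The mechanism $M_{\mathrm{Exp}}$ that on input $x \in X^n$
outputs $y \in Y$ chosen according to

$$\Pr[y = t] = \frac{\exp\left(\frac{\epsilon}{2}\, q(x, t)\right)}{\sum_{\ell \in Y} \exp\left(\frac{\epsilon}{2}\, q(x, \ell)\right)} \qquad (2)$$

is $\epsilon$-differentially private. Moreover,

$$\Pr_{y \sim M_{\mathrm{Exp}}(x)}\big[q(x, y) \ge \mathrm{opt}(x) - \Delta\big] \ge 1 - |Y| \cdot \exp(-\epsilon \Delta / 2), \qquad (3)$$

where $\mathrm{opt}(x) = \max_{y \in Y} q(x, y)$.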
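The sampling rule of Eq. 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the outcome set is a finite list, the score function `q(x, t)` is supplied by the caller and is assumed to change by at most 1 when one entry of `x` changes, and the name `exponential_mechanism` is hypothetical. Subtracting the maximum score before exponentiating is a numerical-stability step that leaves the sampling distribution unchanged.

```python
import math
import random


def exponential_mechanism(x, outcomes, q, epsilon, rng=random):
    """Sample t from `outcomes` with probability proportional to
    exp(epsilon * q(x, t) / 2), as in Eq. 2."""
    scores = [epsilon * q(x, t) / 2.0 for t in outcomes]
    top = max(scores)  # shift scores so the largest exponent is 0
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(outcomes, weights=weights, k=1)[0]
```

With `q` counting how many entries of `x` equal the candidate outcome, the mechanism returns the most common value with probability that grows with both $\epsilon$ and the margin over the other outcomes.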


Notation. For two $n$-entry vectors $x, x'$, we write $x \sim_i x'$ to
denote that they agree on all but the $i$-th entry. We write $x \sim x'$
if $x \sim_i x'$ for some $i$.




Key Results 


The work of McSherry and Talwar [8] was the first to realize a
connection between differential privacy and mechanism design. They
observed that (with bounded utility from the outcome) a mechanism that
preserves $\epsilon$-differential privacy is also $\epsilon$-truthful,
yielding $\epsilon$-truthful mechanisms for approximately maximizing
social welfare or revenue. Other works in this vein (using differential
privacy but without incorporating the effect of privacy loss directly
into the agent's utility function) include [6, 10, 12].


Privacy-Aware Mechanism Design 
The mechanisms of this section share the follow- 
ing setup assumptions: 


Optimization problem. $q : X^n \times Y \to [0, n]$ and a utility
function $U : X \times Y \to [0, 1]$.

Input. $n$ players each having an input $x_i \in X$ and a privacy
valuation $p_i$. The players may misreport $x_i$.

Output. The mechanism outputs an element $y \in Y$ approximately
maximizing $q(x, y)$.

Utility. Each player obtains utility $U(x_i, y) - \lambda_i$ where the
assumptions on how the privacy loss $\lambda_i$ behaves vary for the
different mechanisms below and are detailed in their respective
sections.

Accuracy. Let $\mathrm{opt}(x) = \max_{y \in Y} q(x, y)$. A mechanism is
$(\Delta, \delta)$-accurate if, for all $x$, it chooses $y \in Y$ such
that $\Pr[\mathrm{opt}(x) - q(x, y) \le \Delta] \ge 1 - \delta$ where the
probability is taken over the random coins of the mechanism. (One can
also define accuracy in terms of
$\mathrm{opt}(x) - \mathrm{Exp}[q(x, y)]$.)


Worst-Case Privacy Model 

In the worst-case privacy model, the privacy loss of mechanism $M$ is
only assumed to be upper bounded by the mechanism's privacy parameter,
as in the discussion following Fact 1 [9]:

$$0 \le \lambda_i \le p_i \cdot \epsilon \quad \text{where} \quad \epsilon = \max_{x' \sim x,\; y \in Y} \ln \frac{\Pr[M(x) = y]}{\Pr[M(x') = y]}. \qquad (4)$$

Nissim, Orlandi, and Smorodinsky [9] give a generic construction of
privacy-aware mechanisms assuming an upper bound on the privacy loss as
in Eq. 4. The fact that $\lambda_i$ is only upper bounded excludes the
possibility of punishing misreporting via privacy loss (compare with
Algorithms 3 and 4 below), and hence, the generic construction resorts
to a somewhat nonstandard modeling from [10]. To illustrate the main
components of the construction, we present a specific instantiation in
the context of pricing a digital good, where such a nonstandard modeling
is not needed.


Pricing a digital good. An auctioneer selling a digital good wishes to
design a single-price mechanism that would (approximately) optimize her
revenue. Every agent has a valuation
$x_i \in X = \{0, 0.01, 0.02, \ldots, 1\}$ for the good and privacy
preference $p_i$. Agents are asked to declare $x_i$ to the mechanism,
which chooses a price $y \in Y = \{0.01, 0.02, \ldots, 1\}$. Let $x'_i$
be the report of agent $i$. If $x'_i < y$, then agent $i$ neither pays
nor receives the good and hence gains zero utility, i.e., $v_i = 0$. If
$x'_i \ge y$, then agent $i$ gets the good and pays $y$ and hence gains
in utility. We let this gain be $v_i = x_i - y + 0.005$, where the
additional 0.005 can be viewed as modeling a preference to receive the
good (technically, this breaks the tie between the cases $x'_i = y$ and
$x'_i = y - 0.01$). To summarize,

$$v(x_i, x'_i, y) = \begin{cases} x_i - y + 0.005 & \text{if } y \le x'_i, \\ 0 & \text{otherwise.} \end{cases}$$

Algorithm 1 (ApxOptRev)

Auxiliary input: privacy parameter $\epsilon$, probability $0 < \eta < 1$.

Input: $x' = (x'_1, \ldots, x'_n) \in X^n$.

ApxOptRev executes $M_1$ with probability $1 - \eta$ and $M_2$ otherwise,
where $M_1, M_2$ are:

$M_1$: Choose $y \in Y$ using the exponential mechanism,
$M_{\mathrm{Exp}}$ (Fact 3), i.e.,

$$\Pr[y = t] = \frac{\exp\left(\frac{\epsilon}{2} \cdot t \cdot |\{i : x'_i \ge t\}|\right)}{\sum_{\ell \in Y} \exp\left(\frac{\epsilon}{2} \cdot \ell \cdot |\{i : x'_i \ge \ell\}|\right)}.$$

$M_2$: Choose $y \in Y$ uniformly at random.
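A minimal Python sketch of Algorithm 1 follows. It makes two assumptions that are not part of the original text: valuations and prices are represented in integer cents (1 to 100) to avoid floating-point comparisons on the 0.01 grid, and the helper names `revenue` and `apx_opt_rev` are illustrative. Revenue is measured in units of the good's maximum price, so changing one report changes it by at most 1, as Fact 3 requires.

```python
import math
import random

PRICES_CENTS = list(range(1, 101))  # Y = {0.01, ..., 1.00} in cents


def revenue(reports_cents, price_cents):
    """Revenue y * |{i : x'_i >= y}| at the given price."""
    buyers = sum(1 for r in reports_cents if r >= price_cents)
    return price_cents / 100.0 * buyers


def apx_opt_rev(reports_cents, epsilon, eta, rng=random):
    """With probability eta pick a uniform price (M2); otherwise run the
    exponential mechanism with revenue as the score (M1)."""
    if rng.random() < eta:
        return rng.choice(PRICES_CENTS)  # M2
    scores = [epsilon * revenue(reports_cents, t) / 2.0 for t in PRICES_CENTS]
    top = max(scores)  # stabilise the exponentials
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(PRICES_CENTS, weights=weights, k=1)[0]  # M1
```

The uniform branch is what makes misreporting costly: under $M_2$ every price is chosen with equal probability, so any false report has a chance of producing a strictly worse outcome for the agent.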


The privacy loss for agent $i$ is from the information that may be
potentially leaked on $x'_i$ via the chosen price $y$. The auctioneer's
optimal revenue is
$\mathrm{opt}(x) = \max_{t \in Y}(t \cdot |\{i : x_i \ge t\}|)$, and the
revenue she obtains when the mechanism chooses price $y$ is
$y \cdot |\{i : x'_i \ge y\}|$. The mechanism is presented in
Algorithm 1.


Agent utility. To analyze agent behavior, compare the utility of a
misreporting agent to a truthful agent. (i) As Algorithm 1 is
$\epsilon$-differentially private, by our assumption on $\lambda_i$, by
misreporting agent $i$ may reduce her disutility due to information
leakage by at most $p_i \cdot \epsilon$. (ii) Note that
$v(x_i, x'_i, y) \le v(x_i, x_i, y)$. Using this and Fact 1, we can
bound the expected gain due to misreporting in $M_1$ as follows:

$$\mathrm{Exp}_{y \sim M_1(x'_{-i}, x'_i)}[v(x_i, x'_i, y)] - \mathrm{Exp}_{y \sim M_1(x'_{-i}, x_i)}[v(x_i, x_i, y)] \le \mathrm{Exp}_{y \sim M_1(x'_{-i}, x'_i)}[v(x_i, x_i, y)] - \mathrm{Exp}_{y \sim M_1(x'_{-i}, x_i)}[v(x_i, x_i, y)] \le 2\epsilon.$$

(iii) On the other hand, in $M_2$, agent $i$ loses at least
$g = 0.01 \cdot 0.005$ in utility whenever $x_i \ne x'_i$; this is
because $y$ falls in the set $\{x_i + 0.01, \ldots, x'_i\}$ with
probability $x'_i - x_i \ge 0.01$ when $x_i < x'_i$, in which case she
loses at least 0.005 in utility and, similarly, $y$ falls in the set
$\{x'_i, \ldots, x_i - 0.01\}$ with probability $x_i - x'_i \ge 0.01$
when $x'_i < x_i$, in which case she loses at least 0.005 in utility.


We hence get that agent $i$ strictly prefers to report truthfully when

$$2\epsilon - \eta g + p_i \epsilon \le 0. \qquad (5)$$

Designer utility. Let $m$ be the number of agents for which Eq. 5 does
not hold. We have $\mathrm{opt}(x') \ge \mathrm{opt}(x) - m$, and hence,
using Fact 3, we get that

$$\Pr_{y \sim \mathrm{ApxOptRev}(x')}\big[\, y \cdot |\{i : x'_i \ge y\}| < \mathrm{opt}(x) - m - \Delta \,\big] \le |Y| \cdot \exp(-\epsilon \Delta / 2) + \eta = 100 \cdot \exp(-\epsilon \Delta / 2) + \eta.$$

We omit from this short summary the discussion of how to choose the
parameters $\epsilon$ and $\eta$ (this choice directly affects $m$). One
possibility is to assume the $p_i$ have nice properties [9].


Per-Outcome Privacy Model

In the output-specific model, the privacy loss of mechanism $M$ is
evaluated on a per-output basis [2]. Specifically, on output $y \in Y$
it is assumed that

$$|\lambda_i(x, y)| \le p_i \cdot F_i(x, y) \quad \text{where} \quad F_i(x, y) = \max_{x', x'' \sim_i x} \ln \frac{\Pr[M(x') = y]}{\Pr[M(x'') = y]}. \qquad (6)$$

To interpret Eq. 6, consider a Bayesian adversary that has a prior
belief $\mu$ on $x_i$ and fix $x_{-i}$. After seeing
$y = M(x_{-i}, x_i)$, the Bayesian adversary updates her belief to
$\mu'$. For every event $E$ defined over $x_i$, we get that

$$\mu'(E) = \mu(E \mid M(x_{-i}, x_i) = y) = \mu(E) \cdot \frac{\Pr[M(x_{-i}, x_i) = y \mid E]}{\Pr[M(x_{-i}, x_i) = y]} \in \mu(E) \cdot e^{\pm F_i(x, y)}.$$


This suggests that $\lambda_i$ models harm that is "continuous" in the
change in adversarial belief about $i$, in the sense that a small
adversarial change in belief entails small harm. (Note, however, that
this argument is restricted to adversarial beliefs on $x_i$ given
$x_{-i}$.)


Comparison with Worst-Case Privacy 

Note that if $M$ is $\epsilon$-differentially private, then
$F_i(x, y) \le \epsilon$ for all $x, y$. Equation 6 can hence be seen as
a variant of Eq. 4 where the fixed value $\epsilon$ is replaced with the
output-specific $F_i(x, y)$. One advantage of such a per-outcome model
is that the typical gain from misreporting is significantly smaller than
$\epsilon$. In fact, for all $x \in X^n$ and $x'_i \in X$,

$$\Big|\, \mathrm{Exp}_{y \sim M(x)}[F_i(x, y)] - \mathrm{Exp}_{y \sim M(x_{-i}, x'_i)}[F_i(x, y)] \,\Big| = O(\epsilon^2).$$


On the other hand, the modeled harm is somewhat 
weaker, as (by Fact 1) Eq.4 also captures harm 
that is not continuous in beliefs (such as decisions 
based on the belief crossing a certain threshold). 

Assuming privacy loss is bounded as in Eq. 6, 
Chen, Chong, Kash, Moran, and Vadhan [2] 
construct truthful mechanisms for an election 
between two candidates, facility location, and a 
VCG mechanism for public projects (the latter 
uses payments). Central to the constructions is 
the observation that F; is large exactly when 
agent 7 has influence on the outcome of M(). 
To illustrate the main ideas in the construction, 
we present here the two-candidate election 
mechanism. 


Two-candidate election. Consider the setting of an election between two
candidates. Every agent $i$ has a preference $x_i \in X = \{A, B\}$ and
privacy preference $p_i$. Agents are asked to declare $x_i$ to the
mechanism, which chooses an outcome $y \in Y = \{A, B\}$. The utility of
agent $i$ is then

$$u(x_i, y) = \begin{cases} 1 & \text{if } x_i = y, \\ 0 & \text{otherwise.} \end{cases}$$


The privacy loss for agent $i$ is from the information that may be
potentially leaked on her reported $x'_i$ via the outcome $y$. The
designer's goal is to (approximately) maximize the agents' social
welfare (i.e., total utility from the outcome). The mechanism is
presented in Algorithm 2.


Algorithm 2 (ApxMaj)

Auxiliary input: privacy parameter $\epsilon$.

Input: $x' = (x'_1, \ldots, x'_n) \in X^n$.

ApxMaj performs the following:

1. Sample a value $Z$ from $\mathrm{Lap}(1/\epsilon)$.
2. Choose $y = A$ if $|\{i : x'_i = A\}| > |\{i : x'_i = B\}| + Z$ and
   $y = B$ otherwise.
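The noisy majority vote of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch: reports are assumed to be the strings `"A"` and `"B"`, the Laplace noise is drawn as the difference of two exponential variables with rate $\epsilon$, and the function name `apx_maj` is hypothetical.

```python
import random


def apx_maj(reports, epsilon, rng=random):
    """Return "A" if #A > #B + Z with Z ~ Lap(1/epsilon), else "B"."""
    # Difference of two Exp(rate=epsilon) draws is Lap(1/epsilon).
    z = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    votes_a = sum(1 for r in reports if r == "A")
    votes_b = sum(1 for r in reports if r == "B")
    return "A" if votes_a > votes_b + z else "B"
```

When the margin between the two candidates is large compared with $1/\epsilon$, the noise almost never flips the outcome, which is why a single agent's report rarely matters.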


Agent utility. To analyze agent behavior, we compare the utility of a
misreporting agent to a truthful agent. Notice that once the noise $Z$
is fixed, if agent $i$ affects the outcome by misreporting, then her
disutility from information leakage is at most $p_i \cdot \epsilon$ and
her utility from the outcome decreases by 1. If agent $i$ cannot affect
the outcome, then misreporting changes neither. We hence get that agent
$i$ strictly prefers to report truthfully when

$$p_i \cdot \epsilon < 1. \qquad (7)$$


Note that by our analysis, Eq. 7 implies universal 
truthfulness — agent i prefers to report truthfully 
for every choice of the noise Z. In contrast, Eq. 5 
only implies truthfulness in expectation. 


Social welfare. Letting $m$ be the number of agents for which Eq. 7 does
not hold, and using Fact 2, we get that Algorithm ApxMaj maximizes
social welfare up to error $m + \frac{\ln(1/\delta)}{\epsilon}$ with
probability $1 - \delta$. As in the previous section, we omit from this
short summary the discussion of how to choose $\epsilon$ (this choice
affects $m$ and hence the accuracy of the mechanism).


Purchasing Privacy 
The mechanisms of this section share the follow- 
ing setup assumptions, unless noted otherwise: 


Input. $n$ players each having a data bit $x_i \in \{0, 1\}$ and a
privacy valuation $p_i > 0$. The players may misreport $p_i$ but cannot
misreport $x_i$. We will assume for convenience of notation that
$p_1 \le p_2 \le \ldots \le p_n$.

Intermediate outputs. The mechanism selects a subset of participating
players $S \subseteq [n]$, a scaling factor $t$, and a privacy parameter
$\epsilon$.

Output. The mechanism uses the Laplace mechanism to output an estimate
$s = \frac{1}{t}\left(\sum_{i \in S} x_i + Z\right)$ where
$Z \sim \mathrm{Lap}(1/\epsilon)$ and payments $v_i$ for $i \in [n]$.

Utility. Each player obtains utility $v_i - \lambda_i$ where the
assumptions on how the privacy loss $\lambda_i$ behaves vary for the
different mechanisms below and are detailed in their respective
sections.

Accuracy. A mechanism is $\alpha$-accurate if
$\Pr[|s - f(x)| \le \alpha n] \ge 2/3$ where the probability is taken
over the random coins of the mechanism.


We focus on designing mechanisms that approximate the sum function
$f(x) = \sum_{i=1}^{n} x_i$ where each $x_i \in \{0, 1\}$, which has been
the most widely studied function in this area. As one can see
from the above setup assumptions, the crux of 
the mechanism design problem is in selecting 
the set S, choosing a privacy parameter €, and 
computing payments for the players. We note that 
several of the works we describe below gener- 
alize beyond the setting we describe here (i.e., 
computing different f, fewer assumptions, etc.). 
The following presentation was designed to give 
a unified overview (sacrificing some generality), 
but to preserve the essence both of the challenges 
posed by the problem of purchasing private data 
and each mechanism’s idea in addressing the 
challenges. 


Insensitive Valuation Model
In the insensitive valuation model, the privacy loss $\lambda_i$ of a
mechanism $M$ is assumed to be [5]

$$\lambda_i = p_i \cdot \epsilon_i \quad \text{where} \quad \epsilon_i = \max_{x,\; x' \sim_i x,\; p,\; s} \ln \frac{\Pr[M(x, p) = s]}{\Pr[M(x', p) = s]}. \qquad (8)$$

It is named the insensitive valuation model because $\epsilon_i$ only
measures the effect on privacy of changing player $i$'s data bit, but
not the effect of changing that player's privacy valuation.
Algorithm 3 (FairQuery)

Auxiliary input: budget constraint $B > 0$.

1. Let $k \in [n]$ be the largest integer such that
   $p_k / (n - k) \le B / k$.
2. Select $S = \{1, \ldots, k\}$ and set $\epsilon = \frac{1}{n - k}$.
   Set the scaling factor $t = 1$.
3. Set payments $v_i = 0$ for all $i > k$ and
   $v_i = \min\{B/k,\; p_{k+1}\epsilon\}$ for all $i \le k$.


Algorithm 4 (MinCostAuction)

Auxiliary input: accuracy parameter $\alpha \in (0, 1)$.

1. Set $\alpha' = \frac{\alpha}{1/2 + \ln 3}$ and $k = (1 - \alpha')n$.
2. Select $S = \{1, \ldots, k\}$ and $\epsilon = \frac{1}{n - k}$. Set
   the scaling factor $t = 1$.
3. Set payments $v_i = 0$ for $i > k$ and $v_i = p_{k+1}\epsilon$ for
   all $i \le k$.

Mechanisms. Two mechanisms are presented 
in the insensitive value model in [5], listed in 
Algorithms 3 and 4. Algorithm 3 (FairQuery) is 
given a hard budget constraint and seeks to opti- 
mize accuracy under this constraint; Algorithm 4 
(MinCostAuction) is given a target accuracy re- 
quirement and seeks to minimize payouts under 
these constraints. 


Guarantees. Algorithms 3 and 4 are individ- 
ually rational and truthful. Furthermore, Algo- 
rithm 3 achieves the best possible accuracy (up 
to constant factors) for the class of envy-free 
and individually rational mechanisms, where the 
sum of payments to players does not exceed B. 
Algorithm 4 achieves the minimal payout (up to 
constant factors) for the class of envy-free and 
individually rational mechanisms that achieve a- 
accuracy. 


Sensitive Value Model
Ghosh and Roth [5] also defined the sensitive value model where
$\lambda_i$ is as in Eq. 8, except that $\epsilon_i$ is defined to equal

$$\max_{x,\; p,\; (x', p') \sim_i (x, p),\; s} \ln \frac{\Pr[M(x, p) = s]}{\Pr[M(x', p') = s]}. \qquad (9)$$

Namely, we also measure the effect on the outcome distribution of the
change in a single player's privacy valuation. It was shown in [5] and
subsequent generalizations [11] that in this model and various
generalizations where the privacy valuation itself is sensitive, it is
impossible to build truthful, individually rational, and accurate
mechanisms with worst-case guarantees and making finite payments. To
bypass these impossibility results, several relaxations were introduced.


Bayesian relaxation [4]. Fleischer and Lyu use the sensitive notion of
privacy loss given in Eq. 9. In order to bypass the impossibility
results about sensitive values, they assume that the mechanism designer
has knowledge of prior distributions $P^0, P^1$ for the privacy
valuations. They assume that all players with data bit $b$ have privacy
valuation sampled independently according to $P^b$, namely, that
$p_i \leftarrow_R P^{x_i}$, independently for all $i$. Their mechanism
is given in Algorithm 5.


Algorithm 5 Bayesian mechanism from [4] 


Auxiliary input: privacy parameter €. 


1. Compute c = 1 — a Compute ay for 
b € {0,1} such that Pr polP <aQpl] =c. 
p= 


2. Let $S$ be the set of players $i$ such that $p_i \le \alpha_{x_i}$.
   Set the scaling factor $t = c$.

3. For each player $i \in S$, pay $\epsilon \alpha_{x_i}$. Pay the other
   players 0.


Algorithm 5 is truthful and individually ratio- 
nal. Assuming that the prior beliefs are correct, 
the mechanism is O(4)-accurate. The key use of 
knowledge of the priors is in accuracy: the prob- 
ability of a player participating is c independent 
of its data bit. 


Take-it-or-leave-it mechanisms [7]. Ligett and Roth put forward a
setting where the privacy loss is decomposed into two parts

$$\lambda_i = \lambda_i^p + \lambda_i^x,$$

where $\lambda_i^p$ is the privacy loss incurred by leaking information
of whether or not an individual is selected to participate (i.e.,
whether individual $i$ is in the set $S$), and where $\lambda_i^x$ is
the privacy loss incurred by leaking information about the actual data
bit.

The interpretation is that a surveyor approaches an individual and
offers them $v_i$ to participate. The individual cannot avoid responding
to this question and so unavoidably incurs a privacy loss $\lambda_i^p$
without compensation. If he chooses to participate, then he loses an
additional $\lambda_i^x$, but in this case he receives $v_i$ in payment.
While this is the framework we have been working in all along, up until
now we have not distinguished between these two sources of privacy loss,
rather considering only the overall loss. By explicitly separating them,
[7] can make more precise statements about how incentives relate to each
source of privacy loss.

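In this model the participation decision of an individual is a function
(only) of its privacy valuation, and so we define

$$\lambda_i^p = p_i \epsilon_i^p \quad \text{where} \quad \epsilon_i^p = \max_{x,\; p,\; p' \sim_i p,\; s} \ln \frac{\Pr[M(x, p) = s]}{\Pr[M(x, p') = s]}. \qquad (10)$$

We define $\lambda_i^x = p_i \epsilon_i^x$, where $\epsilon_i^x$ is as in
the insensitive model, Eq. 8. The mechanism is given in Algorithm 6.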

Algorithm 6 is $\alpha$-accurate. It is not individually rational since
players cannot avoid the take-it-or-leave-it offer, which leaks
information about their privacy valuation that is not compensated.
However, it is “one-sided truthful’ in the sense 
that rational players will accept any offer v; 
satisfying vj > A? — A*. [7] also proves that 
for appropriately chosen 7, the total payments 
made by Algorithm 6 are not much more than 


where ¢* is as in the 


that of the optimal envy-free mechanism mak- 
ing the same take-it-or-leave-it offers to every 
player. 


Algorithm 6 (Take-it-or-leave-it mechanism [7])

Auxiliary input: accuracy parameter $\alpha \in (0, 1)$, payment
increment $\eta > 0$.

1. Set $j = 1$ and $\epsilon = \alpha$.
2. Repeat the following:
   (a) Set $E_j = 100(\log j + 1)/\alpha^2$ and $S_j = \emptyset$.
   (b) For $i = 1$ until $E_j$:
       i. Sample without replacement $i \leftarrow_R [n]$.
       ii. Offer player $i$ a payment of $(1 + \eta)^j$.
       iii. If player $i$ accepts, set $S_j = S_j \cup \{i\}$.
   (c) Sample $\nu \sim \mathrm{Lap}(1/\epsilon)$. If
       $|S_j| + \nu \ge (1 - \alpha/8)E_j$, then break and output
       selected set $S = S_j$, privacy parameter $\epsilon$, and
       normalizing factor $t = E_j$. For every $j' \le j$, pay
       $(1 + \eta)^{j'}$ to each player that accepted in round $j'$ and
       pay 0 to all other players.
   (d) Otherwise, increment $j$ and continue.


Monotonic valuations [11]. Nissim, Vadhan, and Xiao [11] study a
relaxation of sensitive values that they call monotonic valuations,
where it is assumed that

$$\lambda_i(x, p) \le p_i \cdot \epsilon_i^{\mathrm{mon}}(x, p) \quad \text{where} \quad \epsilon_i^{\mathrm{mon}}(x, p) = \max_{(x', p') \sim_i^{\mathrm{mon}} (x, p),\; s} \ln \frac{\Pr[M(x, p) = s]}{\Pr[M(x', p') = s]}.$$

Here, $(x', p') \sim_i^{\mathrm{mon}} (x, p)$ denotes that
$(x', p'), (x, p)$ are identical in all entries except the $i$'th entry,
and in the $i$'th entry, it holds that either $x_i > x'_i$ and
$p_i \ge p'_i$ both hold or $x_i < x'_i$ and $p_i \le p'_i$ both hold.

The intuition behind the definition is that for many natural settings,
$x_i = 1$ is more sensitive than $x_i = 0$ (e.g., if $x_i$ represents
whether an individual tested positive for syphilis), and it is therefore
reasonable to restrict attention to the case where the privacy valuation
when $x_i = 1$ is at least the privacy valuation when $x_i = 0$.

There are two other aspects in which this notion is unlike those used in
the earlier works on purchasing privacy: (i) the definition may depend
on the input, so the privacy loss may be smaller on some inputs than
others, and (ii) we assume only an upper bound on the privacy loss,
since $\epsilon^{\mathrm{mon}}$ does not say which information is leaked
about player $i$, and so it may be that the harm done to player $i$ is
not as severe as $\epsilon^{\mathrm{mon}}$ would suggest. The mechanism
is given in Algorithm 7.


Algorithm 7 (Mechanism for monotonic val- 
uations [11]) 


Auxiliary inputs: budget constraint B > 0, privacy 
parameter € > 0. 


1. Sett = Jen’ 

2. Output selected set $S = \{i \mid p_i \le \tau\}$, output
   privacy parameter $\epsilon$, and scaling factor $t = 1$.

3. Pay B/n to players in S, pay 0 to others. 


Algorithm 7 is individually rational for all
players and truthful for all players satisfying $p_i \le \tau$.
Assuming all players are rational, on inputs
where there are $h$ players having $p_i > \tau$, the
mechanism is (O(4) + h)-accurate. The accu- 
racy guarantee holds regardless of how the play- 
ers with p; > t report their privacy valuations. 
Problem Definition


Mutual exclusion is a fundamental concurrent programming problem (see
the Concurrent Programming, Mutual Exclusion entry), in which a set of
processes must coordinate their access to a critical section so that, at
any point in time, at most a single process is in the critical section.

To a large extent, shared-memory mutual exclusion research has focused
on busy-waiting mutual exclusion, in which, while waiting for the
critical section to be freed, processes repeatedly test the values of
shared-memory variables. A significant portion of this research over
the last two decades was devoted to local-spin algorithms [2], in which
all busy-waiting is done by means of read-only loops that repeatedly
test locally accessible variables.


Local-Spin Algorithms and the RMRs Metric

A natural way to measure the time complex- 
ity of algorithms in shared-memory multiproces- 
sors is to count the number of memory accesses 
they require. This measure is problematic for 
busy-waiting algorithms because, in this case, a 
process may perform an unbounded number of 
memory accesses while busy-waiting for another 
process holding the lock. Moreover, Alur and 
Taubenfeld [1] have shown that even the first 
process to enter the critical section can be made 
to perform an unbounded number of accesses. 

As observed by Anderson, Kim, and Her- 
man [6], “most early shared-memory algorithms 
employ...busy-waiting loops in which many 
shared variables are read and written...Under 
contention, such busy-waiting loops generate 
excessive traffic on the processors-to-memory 
interconnection network, resulting in poor 
performance.” 

Contemporary shared-memory mutual exclu- 
sion research focuses on local-spin algorithms, 
which avoid this problem as they busy-wait by 
means of performing read-only loops that repeat- 
edly test locally accessible variables (see, e.g., 
[4,5, 7,9, 11, 13, 14]). The performance of these 
algorithms is measured using the remote memory 
references (RMRs) metric. 

The classification of memory accesses into 
local and remote depends on the type of multipro- 
cessor. In the distributed shared-memory (DSM) 
model, each shared variable is local to exactly one 
processor and remote to all others. In the cache- 
coherent (CC) model, each processor maintains 
local copies of shared variables inside a cache; 
the consistency of copies in different caches is en- 
sured by a coherence protocol. At any given time, 
a variable is local to a processor if the coherence 
protocol guarantees that the corresponding cache 
contains an up-to-date copy of the variable and is 
remote otherwise. 

Anderson was the first to present a local-spin mutual exclusion
algorithm using only reads and writes with bounded RMR complexity [3].
In his algorithm, a process incurs O(n) RMRs to enter and exit its
critical section, where n is the maximum number of processes
participating in the algorithm. Yang and Anderson improved on that and
presented an O(log n)-RMR mutual exclusion algorithm based on reads and
writes [20]. This is asymptotically optimal under both the CC and DSM
models [7].


Read-Modify-Write Operations 

The system’s hardware or operating system pro- 
vides primitive operations (or simply operations) 
that can be applied to shared variables. The sim- 
plest operations, which are always assumed, are 
the familiar read and write operations. Modern
architectures provide stronger read-modify-write (RMW)
operations (a.k.a. fetch-and-Φ operations). The
most notable of these is compare-and-swap (abbreviated
CAS), which takes three arguments: an
address of a shared variable, an expected value, 
and a new value. The CAS operation atomically 
does the following: if the variable stores the 
expected value, it is replaced with the new value; 
otherwise, it is unchanged. The success or failure 
of the CAS operation is then reported back to 
the program. It is crucial that this operation is 
executed atomically; thus, an algorithm can read 
a datum from memory, modify it, and write it 
back only if no other process modified it in the 
meantime. 

Another widely implemented RMW operation 
is the swap operation, which takes two argu- 
ments: an address of a shared variable and a 
new value. When applied, it atomically stores 
the new value to the shared variable and returns 
the previous value. The CAS operation may be 
viewed as a conditional version of swap, since it 
performs a swap operation only if the value of 
the variable to which it is applied is the expected 
value. 
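The semantics of swap and CAS described above can be sketched in Python as follows. Python exposes no hardware RMW instructions, so this sketch assumes a `threading.Lock` as a stand-in for the atomicity that the hardware provides; the class `AtomicCell` and the helper `cas_increment` are illustrative names, not part of any library.

```python
import threading


class AtomicCell:
    """A shared variable with read, swap, and compare-and-swap.
    The internal lock only emulates hardware atomicity."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def swap(self, new):
        """Atomically store `new` and return the previous value."""
        with self._guard:
            old, self._value = self._value, new
            return old

    def compare_and_swap(self, expected, new):
        """Atomically store `new` only if the cell holds `expected`;
        report success or failure."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def cas_increment(cell):
    """Read, modify, and write back only if no one interfered."""
    while True:
        old = cell.read()
        if cell.compare_and_swap(old, old + 1):
            return old + 1
```

The retry loop in `cas_increment` is the pattern the text describes: a failed CAS means another process modified the variable in the meantime, so the update is recomputed from a fresh read.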

Architectures supporting strong RMW opera- 
tions admit implementations of mutual exclusion 
that are more efficient in terms of their RMR 
complexity as compared with architectures that 
support only read and write operations. In work 
that preceded the introduction of the MCS lock, 
Anderson [2] and Graunke and Thakkar [12] 
presented lock algorithms, using strong RMW
operations such as CAS and swap, that incur only
a constant number of RMRs on CC multiproces- 
sors. However, these algorithms are not local spin 
on DSM multiprocessors and the amount of pre- 
allocated memory per lock is linear in the number 
of processes that may use it. 


Key Results 

Mellor-Crummey and Scott’s algorithm [18] is 
the first local-spin mutual exclusion algorithm in 
which processes incur only a constant number of 
RMRs to enter and exit the critical section, in 
both CC and DSM multiprocessors. The amount 
of memory that needs to be pre-allocated by 
locks using this algorithm (often called MCS 
locks) is constant rather than a function of the 
maximum number of processes that may use the 
lock. Moreover, MCS locks guarantee a strong
notion of fairness called first-in, first-out (FIFO,
a.k.a. first-come-first-served). Informally, FIFO
ensures that processes succeed in capturing the
lock in the order in which they start waiting for it
(see [15] for a more formal definition of the FIFO
property).


Algorithm 1 Mellor-Crummey and Scott algorithm

 1 Qnode: structure {bit locked, Qnode* next};
 2 shared Qnode nodes[0 ... n − 1], Qnode* tail initially null;
 3 local Qnode* myNode initially &nodes[i], successor, pred;

 4 Entry code for process i
 5 myNode.next ← null;
 6 pred ← swap(tail, myNode);
 7 if pred ≠ null then
 8     myNode.locked ← true;
 9     pred.next ← myNode;
10     repeat while myNode.locked = true;
11 end

12 Critical Section;

13 Exit code for process i
14 if myNode.next = null then
15     if compare-and-swap(tail, myNode, null) = false then
16         repeat while myNode.next = null;
17         successor ← myNode.next;
18         successor.locked ← false;
19     end
20 else
21     successor ← myNode.next;
22     successor.locked ← false;
23 end


The Algorithm 

Pseudocode of the algorithm is presented in Al- 
gorithm 1. The key data structure used by the 
algorithm is the nodes array (statement 2), where 
entry i is owned by process i, for i ∈ {0, ..., n − 1}.
This array represents a queue of processes
waiting to enter the critical section. Each array 
entry is a Qnode structure (statement 1), com- 
prising a next pointer to the structure of the 
next process in the queue and a locked flag on 
which a process waits until it is signaled by its 
predecessor. Shared variable tail points to the end 
of the queue and either stores a pointer to the 
structure of the last process in the queue or is null 
if the queue is empty. 

Before entering the critical section, a process 
p first initializes the next pointer of its Qnode 
structure to null (statement 5), indicating that it 
is about to become the last process in the queue. 
It then becomes the last process in the queue by
atomically swapping a pointer to its Qnode
structure into tail (statement 6); the previous value
of tail is stored in local variable pred. If the queue
was previously empty (statement 7), p enters the
critical section immediately. Otherwise, p sets
the locked flag of its Qnode structure (statement 8),
writes a pointer to its Qnode structure to the next
field of its predecessor’s Qnode structure (statement 9),
and then busy-waits until it is signaled by its
predecessor (statement 10).

To exit the critical section, process p first 
checks whether its queue successor (if any) has 
set the next pointer of its Qnode structure (state- 
ment 14), in which case p signals its successor 
to enter the critical section (statements 21-22). 
Even if no process has set p’s next pointer yet, 
it is still possible that p does have a successor q 
that executed the swap of statement 6 but did not 
yet update p’s next pointer in statement 9. Also 
in this case, p must signal q before it is allowed
to exit the critical section. To determine whether
or not this is the case, p attempts to perform a
CAS operation that will swap the value of tail
back to null if p is the only process in the queue. If the
CAS fails, then p does have a successor and must 
therefore wait until its next pointer is updated by 
the successor. Once this happens, p signals its 
successor and exits (statements 16-18). 
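
The entry and exit code described above can be sketched in Python as follows. This is only an illustrative simulation under stated assumptions: Python offers no hardware swap or CAS, so the hypothetical AtomicRef class emulates them with an internal lock, and the names QNode, MCSLock, and run_mcs_demo are introduced here for the example. Comments refer to the statement numbers of Algorithm 1.

```python
import threading
import time


class AtomicRef:
    """Emulates an atomic pointer cell (assumption: stands in for hardware swap/CAS)."""

    def __init__(self):
        self._value = None
        self._guard = threading.Lock()

    def swap(self, new):
        with self._guard:
            old, self._value = self._value, new
            return old

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = AtomicRef()

    def acquire(self, node):
        node.next = None                       # statement 5
        pred = self.tail.swap(node)            # statement 6
        if pred is not None:                   # statement 7
            node.locked = True                 # statement 8
            pred.next = node                   # statement 9
            while node.locked:                 # statement 10: spin on own node
                time.sleep(0)

    def release(self, node):
        if node.next is None:                  # statement 14
            if self.tail.compare_and_swap(node, None):   # statement 15
                return                         # queue is empty again
            while node.next is None:           # statement 16: successor is linking
                time.sleep(0)
        node.next.locked = False               # statements 17-18 and 21-22


def run_mcs_demo(num_threads=4, iterations=200):
    """Increment a counter non-atomically inside the critical section."""
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()                         # each thread spins only on its own node
        for _ in range(iterations):
            lock.acquire(node)
            value = counter[0]
            time.sleep(0)                      # invite interleaving inside the section
            counter[0] = value + 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]
```

If mutual exclusion holds, run_mcs_demo returns num_threads times iterations, since no update of the counter is lost.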

Mellor-Crummey and Scott’s paper won the 
2006 Edsger W. Dijkstra Prize in Distributed 
Computing. Quoting from the prize announce- 
ment, the MCS lock is “...probably the most 
influential practical mutual exclusion algorithm 
of all time.” 


Cross-References 


Concurrent Programming, Mutual Exclusion 
Transactional Memory 
Wait-Free Synchronization 


Further Reading 


For a comprehensive discussion of local-spin mu- 
tual exclusion algorithms, the reader is referred 
to the excellent survey by Anderson, Kim, and 
Herman [6]. Craig, Landin, and Hagersten [8, 17] 
presented another queue lock — the CLH lock. 
The algorithm underlying CLH locks is simpler 
than the MCS algorithm and, unlike MCS, only 
requires the swap strong synchronization opera- 
tion. On the downside, CLH locks are not local 
spin on DSM multiprocessors. 

In many contemporary multiprocessor archi- 
tectures, processors are organized in clusters and 
intercluster communication is much slower than 
intra-cluster communication. Hierarchical locks 
take into account architecture-dependent consid- 
erations such as inter- and intra-cluster laten- 
cies and may thus improve lock performance 
on nonuniform memory architectures (NUMA). 
The idea underlying hierarchical locks is that 
intra-cluster lock transfers should be favored. A
key challenge faced by hierarchical lock imple- 
mentations is that of ensuring fairness. 

Radovic and Hagersten presented the first hi- 
erarchical lock algorithms [19], based on the idea 
that a process busy-waiting for a lock should back
off for a short duration if the lock is held by
a process from its own cluster and for a much 
longer duration otherwise; this is a simple way of 
ensuring that intra-cluster lock transfers become 
more likely. Several works pursued this line of 
research by presenting alternative NUMA-aware 
lock algorithms (e.g., [10, 16]). For alternatives to 
lock-based concurrent programming, the reader 
is referred to [Wait-free Synchronization,
Transactional Memory].


Problem Definition


This field of research revolves around the design
of algorithms in the presence of memory constraints.
Research on this topic has been going
on for over 40 years [16]. Initially, this was
motivated by the high cost of memory space.
Afterward, the topic received renewed interest
with the appearance of smartphones and other types
of handheld devices for which large amounts of
memory are either expensive or not desirable.

Although many variations of this principle 
exist, the general idea is the same: the input is in 
some kind of read-only data structure, the output 
must be given in a write-only structure, and in 
addition to these two structures, we can only use a 
fixed amount of memory to compute the solution. 
This memory should be enough to cover all 
space requirements of the algorithm (including 
the variables directly used by the algorithm, space
needed for recursion, procedure calls,
etc.). In the following we list the most commonly
considered limitations for both the input and the 
workspace. 


Considerations on the Input 

One of the most restrictive models that has been 
considered is the one-pass (or streaming) model. 
In this setting the elements of the input can only 
be scanned once in a sequential fashion. Given 
the limitations, the usual aim is to approximate 
the solution and ideally obtain some kind of 
worst-case approximation ratio with respect to 
the optimal solution. 

The natural extension of the above constraint 
is the multi-pass model, in which the input can be 
scanned sequentially a constant number of times. 
Here we look for a trade-off between the number
of passes and either the size of the workspace or 
the quality of the approximation. 

The next natural step is to allow the input to be
scanned any number of times and even to allow
random access to the input values. Research for
this model focuses on either computability (i.e.,
determining whether or not a particular problem 
is solvable with a workspace of fixed size) or the 
design of efficient algorithms whose running time 
is not much worse (when compared to the case in 
which no space constraints exist). 

A more generous model considered is the in- 
place one. In this model, the values of the input 
can be rearranged (or sometimes even overwrit- 
ten). Note that the input need not be recoverable 
after an execution of the algorithm. By making 
an appropriate permutation of the input, we can 
usually encode different data structures. Thus,
algorithms under this model often achieve
running times comparable to those in unconstrained
settings.


Considerations on the Workspace 

The most natural way to measure the space re- 
quired by the algorithms is simply the number of 
bits used. In many cases it is simpler to count the
number of words (i.e., the minimum amount of
space needed to store a variable, a pointer to some
position in the input, or simply a counter) used by
the algorithm. It is widely accepted that a word
needs O(log n) bits; thus it is easy to convert
between the two measures.

Most of the literature focuses on the case in
which the workspace can only fit O(log n) bits.
This workspace (combined with random access 
to the input) defines the heavily studied log-space 
complexity class within computational complex- 
ity. Within this field the main focus of research 
is to determine whether or not a problem can be 
solved (without considering other properties such 
as the running time of the algorithm). Due to the 
logarithmic bit-word equivalence, an algorithm 
that uses O(log n) bits is also referred to as a
constant workspace algorithm. 

There has also been an interest in the design 
of algorithms whose workspace depends on some 
parameter determined by the user. In this case 
the aim is to obtain an algorithm whose running 
time decreases as the space increases (this is often 
referred to as a time-space trade-off). 


Key Results 


Selection and Sorting in Multi-pass Models

One of the most studied problems under the
multi-pass model is sorting. That is, given a list 
of n distinct numbers, how fast can we sort them? 
How many passes of the input are needed when 
our total amount of memory is limited? Whenever 
workspace is not large enough to hold the sorted 
list, the aim is to simply report the values of the 
input in ascending order. 

The first time-space trade-off algorithm for
sorting under the multi-pass model was given
by Munro and Paterson [13], where several upper
and lower space bounds were given (as a
function of the number of times we can scan
the input). The bounds were afterward improved
and extended for the random access model: it
is known that the running time of an algorithm
that uses O(b) bits must be at least Ω(n²/b) [8].
Matching upper bounds for the case in which b ∈
Ω(log n) ∩ O(n / log n) were shown by Pagter
and Rauhe [15] (where b denotes the size of the
workspace, in bits).


Selection 

Another closely related topic is selection. In addition
to the list of n numbers, we are given an
integer k ≤ n. The aim is to compute the number
whose rank is k (i.e., the kth smallest value). It is
well known that this problem can be solved in
linear time when no space constraints exist [7].
Munro and Paterson [13] presented a time-space
trade-off algorithm for the multi-pass model. For
any w ∈ Ω(log² n) ∩ O(n / log n), the algorithm
runs in O(n log_w n + n log w) time and uses O(w)
words of space.

The algorithm stores two values, called filters,
between which the element to select is known
to lie (initially, the filters are simply
set to −∞ and +∞, respectively). Thus, the aim is to
iteratively scan the input shrinking the distance 
between the filters. At each iteration we look for 
an element whose rank is as close as possible 
to k (ignoring elements that do not lie within 
the filters). Once we have a candidate, we can 
compute its exact rank in linear time by com- 
paring its value with the other elements of the 
input and update either the upper or lower filter 
accordingly. The process is repeated until O(w) 
elements remain between the filters. 

The key to this algorithm lies in a good choice
of an approximation so that a large number of
values are filtered. The method of Munro and
Paterson [13] first constructs a sample of the
input as follows: for a block of up to
w/⌈log(n/w)⌉ elements, its sample simply consists
of these elements sorted in increasing order. For
larger blocks B (say, 2^i · w/⌈log(n/w)⌉ elements for
some i ∈ {1, ..., ⌈log(n/w)⌉}), partition the block into
two equally sized sub-blocks and construct their
samples inductively. Then, the sample of B is created
by selecting every other element of each of
the samples of the two sub-blocks and sorting the
obtained list. The sample of the whole input can
be constructed in a bottom-up fashion that uses
at most O(w/log(n/w)) words in each level (thus, O(w)
in total). Once we have computed the sample
of the input, we can extract its approximation
by selecting the corresponding value within the
sample.


Randomized Algorithms 

The previous approach can be drastically simpli- 
fied under randomized environments. Simply se- 
lect an element of the input uniformly at random, 
compute its rank, and update one of the filters ac- 
cordingly. With high probability after a constant 
number of iterations, a constant fraction of the 
input will be discarded. Thus, overall O(log n)
iterations (and a constant number of words) will 
be needed to find the solution. Chan [9] improved 
this idea, obtaining an algorithm that runs in 
O(n log log_w n) time and uses O(w) words (for
any w < n).
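
The simple randomized scheme described above (not Chan's improved algorithm) can be sketched as follows. This is a minimal illustration that assumes the input values are distinct and treats the input list as read-only; the function name select_kth and the use of reservoir sampling to pick a uniform candidate between the filters are choices made for this example.

```python
import random


def select_kth(values, k, rng=None):
    """Return the k-th smallest (1-indexed) of distinct `values` using O(1) extra words.

    `values` is only scanned, never modified or copied.
    """
    rng = rng or random.Random()
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    low, high = float("-inf"), float("inf")    # the two filters
    while True:
        # Scan 1: pick a uniformly random element strictly between the filters
        # (reservoir sampling keeps a single candidate and a counter).
        candidate, seen = None, 0
        for v in values:
            if low < v < high:
                seen += 1
                if rng.randrange(seen) == 0:
                    candidate = v

        # Scan 2: compute the exact rank of the candidate.
        rank = sum(1 for v in values if v <= candidate)

        if rank == k:
            return candidate
        if rank < k:
            low = candidate                    # the answer is larger
        else:
            high = candidate                   # the answer is smaller
```

The element of rank k always lies strictly between the filters, so a candidate exists in every round, and each round removes at least the candidate from further consideration.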


Improvements 

For most values of w, the algorithm of Munro and
Paterson is asymptotically tight, even if we allow
random access to the input. Thus, further research
focused on extending the range of workspace sizes for
which optimality is known. Frederickson [11] increased
the optimality range for the selection problem to
any w ∈ Ω(log² n) ∩ O(2^{log n / log* n}). Recently,
Elmasry et al. [10] gave a linear time algorithm
that only uses O(n) bits (i.e., they preserve the
linear running time of [7] and reduce the size of
the workspace by a logarithmic factor). Raman
and Ramnath [17] used a similar approximate
median approach for the case in which o(log n)
words fit in the workspace.


Undirected Graph Connectivity in the 
Random Access Model 

Given an undirected graph G = (V, E) and two
vertices s, t ∈ V, the undirected s-t connectivity
problem is to decide whether or not there exists a
path from s to t in G. This problem can be easily
solved in linear time using linear space (with
either breadth-first search or depth-first search
schemes) in unrestricted models. However, determining
the existence of a deterministic algorithm
that only uses O(log n) bits of space was a
long-standing open problem in the field of complexity
theory.


Problem Background 

Aleliunas et al. [1] showed that the problem can 
be easily solved with a randomized algorithm. 
Essentially they show that a sufficiently long ran- 
dom walk will traverse all vertices of a connected 
graph. Thus, if we start at s and do not reach t 
after a polynomially bounded number of steps, 
we can conclude that with high probability, s and 
t are not connected. 
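
The random-walk idea can be sketched as follows. This is a hedged illustration, not the construction of [1] verbatim: the adjacency-list input format, the function name, and the default step budget of 8n³ are assumptions made for the example. Only the current vertex and a step counter are stored, which fits in O(log n) bits.

```python
import random


def random_walk_connected(adj, s, t, steps=None, rng=None):
    """One-sided test for s-t connectivity in an undirected graph.

    `adj[v]` lists the neighbors of v. A True answer is always correct;
    a False answer may be wrong with small probability.
    """
    rng = rng or random.Random()
    n = len(adj)
    if steps is None:
        steps = 8 * n ** 3                     # assumed polynomial budget
    current = s
    for _ in range(steps):
        if current == t:
            return True
        if not adj[current]:
            return False                       # isolated vertex: the walk is stuck
        current = rng.choice(adj[current])     # move to a uniformly random neighbor
    return current == t
```

For instance, on a graph consisting of the path 0-1-2-3 and a separate edge 4-5, a walk started at 0 can reach 3 but can never reach 5.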

The connectivity problem can also be solved 
with a nondeterministic logspace algorithm 
(where the certificate is simply the path 
connecting s and t). Thus, Savitch’s theorem [19] 
can transform it to a deterministic algorithm 
that uses O(log² n) bits (and superpolynomial
time). The space requirements were afterward
reduced to O(log^{3/2} n) [14] and O(log^{4/3} n) [2].
Recently, Reingold [18] positively answered 
the question by giving a deterministic logspace 
algorithm. Although no discussion on the running 
time is explicitly mentioned, it is well known that 
Reingold’s algorithm runs in polynomial time. 
This is due to the fact that a Turing machine with
a logarithmic space constraint can have at most
2^{O(log n)} = n^{O(1)} different configurations.


Reingold’s Algorithm 

Conceptually, the algorithm aims to transform G 
into a graph in which all vertices have degree 
three. This is done by virtually replacing each 
vertex of degree k > 3 by a cycle of k vertices 
each of which is adjacent to one of the neighbors 
of the previous graph (and adding self-loops to 
vertices of low degree). 

The algorithm then combines the squaring 
and the zig-zag product operations. The squaring 
operation connects vertices whose distance in the 
original graph is at most two, while the zig-zag 
product between two graphs G and H essentially 
replaces every vertex of G with a copy of H (and 
connects vertices of two copies of H if and only 
if the original vertices were adjacent in G).

Intuitively speaking, the squaring operation
will reduce the diameter while the zig-zag 
product keeps the degree of the vertices bounded 
(for this algorithm, H consists of a sparse graph 
of constant degree and small diameter). After 
repeating this process a logarithmic number 
of times, the resulting graph has bounded 
degree, logarithmic diameter and preserves the 
connectivity between the corresponding vertices. 
In particular, we can determine the connectivity 
between u and v by exhaustively looking through 
all paths of logarithmic length starting from u.
Since each vertex has bounded degree, the paths 
can be encoded without exceeding the space 
bounds. The algorithm will stop as soon as v is 
found in any of these paths or after all paths have 
been explored. 

Even though we cannot store the transforma- 
tion of G explicitly, the only operation that is 
needed during the exhaustive search is to de- 
termine the ith clockwise neighbor of a vertex 
after j transformation steps have been done on 
G (for some i and j ∈ O(log n)). Reingold
provided a method to answer such queries using
a constant number of bits on the graph resulting
after doing j − 1 transformation steps on G. Thus,
by repeating the process inductively, we can find 
the solution to our query without exceeding the 
space bounds. 


Other Models of Note 

The study of memory constrained algorithms has
received a lot of interest from the computational
geometry community. Most of these algorithms use
random access to the input, use a constant number of
words, and aim to report fundamental geometric data
structures. For example, Jarvis march [12] (also
known as the gift-wrapping algorithm) computes
the convex hull of a set of n points in O(nh) time
(where h is the number of vertices on the convex
hull). Asano and Rote [3] showed how to compute
the Voronoi diagram and minimum spanning
tree of a given set of points in O(n²) and O(n³)
time, respectively. Similarly, several time-space
trade-off algorithms have been designed for classic
problems within a simple polygon, such as
triangulation [4], shortest path computation [4],
or visibility [5]. These algorithms use properties
of the problem considered so as to compute
the solution using local information whenever
possible. In most cases, this results in an
algorithm that is completely different from those
used when no memory constraints exist.
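
As an illustration of a constant-workspace geometric algorithm, the following sketch implements Jarvis march as a generator: the input list is only read, hull vertices are reported one at a time (playing the role of the write-only output), and only a few indices are kept. The function names and the tie-breaking rule for collinear points are choices made for this example.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def jarvis_march(points):
    """Yield the convex hull vertices in counterclockwise order.

    Each hull vertex costs one scan of the input, giving O(nh) time overall.
    """
    n = len(points)
    if n == 0:
        return
    start = min(range(n), key=lambda i: points[i])   # leftmost (then lowest) point
    current = start
    while True:
        yield points[current]
        candidate = None
        for i in range(n):
            if points[i] == points[current]:
                continue
            if candidate is None:
                candidate = i
                continue
            turn = cross(points[current], points[candidate], points[i])
            farther = dist2(points[current], points[i]) > dist2(
                points[current], points[candidate])
            # Replace the candidate if point i lies to its right,
            # or is collinear with it but farther away.
            if turn < 0 or (turn == 0 and farther):
                candidate = i
        if candidate is None or points[candidate] == points[start]:
            return
        current = candidate
```

For example, list(jarvis_march([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])) reports the four corners of the square and skips the interior point.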


Compressed Stack 

A different approach was taken by Barba 
et al. [6], where the class of stack algorithms 
is considered. This class consists of deterministic 
algorithms that have one-pass access to the
input and can use a constant number of words. 
In addition, they allow the usage of a stack so as 
to store elements of the input. At any instant of 
time, only the top element of the stack is available 
to the algorithm. Note that with this additional 
stack, we can store up to O(n) values of the 
input. Hence, the model is not strictly speaking 
memory constrained. 

Although this model may seem a bit artificial, 
Barba et al. give several examples of well-known 
programs that fit into this class of algorithms 
(such as computing the convex hull of a simple 
polygon or computing the visibility of a point 
inside a simple polygon). More interestingly, they 
show how to remove the auxiliary stack, effec- 
tively transforming any stack algorithm into a 
memory constrained algorithm that uses O(w)
words (for any w ∈ {1, ..., n}).


Block Reconstruction 

The key to this approach lies in the fact that portions
of the stack can be reconstructed efficiently:
let a_i, a_{i+1}, ..., a_j be a set of consecutive elements
of the input (for some i < j) such that
we know that a_i and a_j are in the stack after
a_j has been processed. Then, we can determine
which values in between a_i and a_j are also in the
stack by re-executing the algorithm restricting the
input to the a_i, ..., a_j interval (thus, taking time
proportional to O(j − i)).
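
The reconstruction property can be checked on a toy stack algorithm. The sketch below uses a monotone stack (pop while the top is larger than the incoming value, then push) purely as an assumed example of the class; it is not taken from [6]. If indices i and j are both on the stack after a_j has been processed, re-running the algorithm on a_i, ..., a_j alone reproduces exactly the part of the stack between them.

```python
def run_stack_algorithm(values, lo, hi):
    """Return the indices on the stack after processing values[lo..hi] in order."""
    stack = []
    for idx in range(lo, hi + 1):
        # Pop every element larger than the incoming one, then push it.
        while stack and values[stack[-1]] > values[idx]:
            stack.pop()
        stack.append(idx)
    return stack


def reconstruct_block(values, i, j):
    """Recover the stack portion between indices i and j in O(j - i) time.

    Precondition: both i and j are on the stack after index j has been processed.
    """
    return run_stack_algorithm(values, i, j)
```

With values = [5, 1, 4, 2, 6, 3, 7, 8], the full run leaves indices [1, 3, 5, 6, 7] on the stack, and reconstruct_block(values, 3, 7) recovers the portion [3, 5, 6, 7] without looking at the elements before index 3.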

Using this idea, we can avoid explicitly storing
the whole stack by using a compressed stack
data structure that never exceeds the size of the
workspace. For the particular case in which w =
Θ(√n), the algorithm virtually partitions the
input into blocks of size √n. The invariant of
the data structure is that the portions of the
stack corresponding to the last two blocks that
have pushed elements into the stack are stored,
whereas for any other block, we only store the
first and last elements that are in the stack, if
any. By using a charging scheme, they show that
each block triggers at most one reconstruction,
and each reconstruction takes time proportional
to the size of the destroyed block. In all, the compressed
stack data structure reduces the size of
the workspace to Θ(√n) without asymptotically
increasing the running time.

In the general case, the input is split
into p equally sized blocks (where p =
max{2, w / log n}), and each block is further
subpartitioned into blocks until the blocks of the
lowermost level can be explicitly stored (or the
recursion exceeds the size of the workspace). The
smaller the size of the workspace, the higher
the number of levels it will have and thus more
time will be spent in reconstructing blocks. This
creates a time-space trade-off for any stack
algorithm whose running time is O(n² log n / 2^w)
for any workspace of w ∈ o(log n) words.
For larger workspaces, the algorithm runs in
O(n^{1+1/log p}) time and uses O(p log_p n) words
(for 2 ≤ p ≤ n).


Problem Definition


The Model: A mobile robotic sensor (or simply 
sensor) is modeled as a computational unit with 
sensorial capabilities: it can perceive the spatial 
environment within a fixed distance V > 0, 
called visibility range, it has its own local work- 
ing memory, and it is capable of performing local 
computations [3,4]. 

Each sensor is a point in its own local Carte- 
sian coordinate system (not necessarily consistent 
with the others), of which it perceives itself as the 
center. A sensor can move in any direction, but 
it may be stopped before reaching its destination, 
e.g. because of limits to its motion energy; how- 
ever, it is assumed that the distance traveled in 
a move by a sensor is not infinitesimally small 
(unless it brings the sensor to its destination). 

The sensors have no means of direct commu- 
nication: any communication occurs in an im- 
plicit manner, by observing the other sensors’ 
positions. Moreover, they are autonomous (i.e.,
without a central control), identical (i.e., they
execute the same protocol), and anonymous (i.e., 
without identifiers that can be used during the 
computation). 

The sensors can be active or inactive. When 
active, a sensor performs a Look-Compute-Move 
cycle of operations: it first observes the portion 
of the space within its visibility range obtaining 
a snapshot of the positions of the sensors in its 
range at that time (Look); using the snapshot as 
an input, the sensor then executes the algorithm to 
determine a destination point (Compute); finally, 
it moves toward the computed destination, if 
different from the current location (Move). After 
that, it becomes inactive and stays idle until the 
next activation. Sensors are oblivious: when a 
sensor becomes active, it does not remember 
any information from previous cycles. Note that
several sensors could actually occupy the same
point; we call multiplicity detection the ability of 
a sensor to see whether a point is occupied by a 
single sensor or by more than one. 

Depending on the degree of synchronization 
among the cycles of different sensors, three sub- 
models are traditionally identified: synchronous, 
semi-synchronous, and asynchronous. In the syn- 
chronous (FSYNC) and in the semi-synchronous 
(SSYNC) models, there is a global clock tick 
reaching all sensors simultaneously, and a sen- 
sor’s cycle is an instantaneous event that starts at 
a clock tick and ends by the next. In FSYNC, at 
each clock tick all sensors become active, while 
in SSYNC only a subset of the sensors might 
be active in each cycle. In the asynchronous 
model (ASYNC), there is no global clock and 
the sensors do not have a common notion of 
time. Furthermore, the duration of each activity 
(or inactivity) is finite but unpredictable. As a 
result, sensors can be seen while moving, and 
computations can be made based on obsolete 
observations. 

Let S(t) = {s_1(t), ..., s_n(t)} denote the set
of the sensors’ positions at time t. When no ambiguity
arises, we shall omit the temporal index t.
Moreover, with an abuse of notation we indicate
by s_i both a sensor and its position. Let S_i(t)
denote the set of sensors that are within distance
V from s_i at time t, that is, the set of sensors
that are visible from s_i. At any point in time
t, the sensors induce a visibility graph G(t) =
(N, E(t)) defined as follows: N = S and ∀ r, s ∈
N, (r, s) ∈ E(t) iff r and s are at distance no
more than the visibility range V.
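
The visibility graph definition translates directly into code. The sketch below is a minimal illustration in which sensors are identified by their index in a list of positions, and the helper names visibility_graph and is_connected are introduced for this example; the connectivity check is included because connectivity of the initial graph G(0) is a standing assumption in the results that follow.

```python
import math


def visibility_graph(positions, V):
    """Return adjacency sets: r and s are adjacent iff their distance is at most V."""
    n = len(positions)
    adj = {i: set() for i in range(n)}
    for r in range(n):
        for s in range(r + 1, n):
            if math.dist(positions[r], positions[s]) <= V:
                adj[r].add(s)
                adj[s].add(r)
    return adj


def is_connected(adj):
    """Depth-first search from sensor 0; True iff every sensor is reached."""
    if not adj:
        return True
    seen = {0}
    frontier = [0]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen) == len(adj)
```

For three sensors at (0, 0), (1, 0), and (2, 0) with V = 1, the outer two do not see each other, yet the graph is connected through the middle sensor.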


The Problem: In this setting, one of the most
basic coordination and synchronization tasks is
Gathering: the sensors, initially placed in arbi- 
trary distinct positions in a 2-dimensional space, 
must congregate at a single location (the choice 
of the location is not predetermined) within finite 
time. In the following, we assume n > 2. A prob- 
lem closely related to Gathering is Convergence, 
where the sensors need to be arbitrarily close to 
a common location, without the requirement of 
ever reaching it. A special type of convergence 
(also called Near-Gathering or collisionless
convergence) requires the sensors to converge
without ever colliding with each other.


Key Results 


Basic Impossibility Results 

First of all, neither Convergence nor Gathering 
can be achieved from arbitrary initial placements 
if the initial visibility graph G(0) is not con- 
nected. So, in all the literature, G(0) is always 
assumed to be connected. Furthermore, if the 
sensors have neither agreement on the coordinate 
system nor multiplicity detection, then Gathering 
is not solvable in SSYNC (and thus in ASYNC), re- 
gardless of the range of visibility and the amount 
of memory that they may have. 


Theorem 1 ([8]) In the absence of multiplicity
detection and of any agreement on the coordinate
systems, Gathering is deterministically unsolvable
in SSYNC.


Given this impossibility result, the natural 
question is whether the problem can be solved 
with common coordinate systems. 


Gathering with Common Coordinate Systems

Assuming that the sensors agree on a common 
coordinate system, Gathering has been shown 
to be solvable under the weakest of the three 
schedulers (ASYNC) [2]. 

Let R be the rightmost vertical axis where
some sensor initially lies. The idea of the algorithm
is to make the sensors move toward R, in
such a way that, after a finite number of steps,
they will reach it and gather at the bottommost
position occupied by a sensor at that time. Let the
Look operation of sensor s_i at time t return S_i(t).
The computed destination point of s_i depends
on the positions of the visible sensors. Once the
computation is completed, s_i moves toward its
destination (but it may stop before the destination
is reached). Informally:


• If s_i sees sensors to its left or above on the
  vertical axis passing through its position (this
  axis will be referred to as its vertical axis), it
  does not move.

• If s_i sees sensors only below on its vertical
  axis, it moves down toward the nearest sensor.

• If s_i sees sensors only to its right, it moves
  horizontally toward the vertical axis of the
  nearest sensor.

• If s_i sees sensors both below on its vertical
  axis and on its right, it computes a destination
  point and performs a diagonal move to the
  right and down, as explained below.


To describe the diagonal movement we introduce
some notation (refer to Fig. 1). Let AA' be
the vertical diameter of S_i(t) with A' the top
and A the bottom end point; let L_i denote the
topologically open region (with respect to AA')
inside S_i(t) and to the right of s_i and let S = s_i A
and S' = s_i A', where neither S nor S' includes
s_i. Let Ψ be the vertical axis of the horizontally
closest sensor (if any) in L_i.


Diagonal Movement (Ψ)

B := upper intersection between S_i(t) and Ψ;
C := lower intersection between S_i(t) and Ψ;
2β := ∠A s_i B;
If β < 60° then (B, Ψ) := Rotate(s_i, B);
H := Diagonal_Destination(Ψ, A, B);
Move towards H.


where Rotate() and Diagonal_Destination() are as follows:


• Rotate(s_i, B) rotates the segment s_i B in
  such a way that β = 60° and returns the
  new position of B and Ψ. This angle choice
  ensures that the destination point is not outside
  the circle.

• Diagonal_Destination(Ψ, A, B)
  computes the destination of s_i as follows:
  the direction of s_i’s movement is given
  by the perpendicular to the segment AB;
  the destination of s_i is the point H on the
  intersection of the direction of its movement
  and of the axis Ψ.


Memoryless Gathering of Mobile Robotic Sensors, Fig. 1 From [4]: (a) Notation. (b) Horizontal move. (c) Diagonal move

Memoryless Gathering of Mobile Robotic Sensors, Fig. 2 From [4]: Notation for algorithm CONVERGENCE [1]


Theorem 2 ([2]) With common coordinate sys-
tems, Gathering is possible in ASYNC. 


Gathering has been shown to be possible in
SSYNC even when compasses are unstable for
arbitrarily long periods, provided that the sensors
share a common notion of clockwise direction, that
the compasses eventually stabilize, and that the
total number of sensors is known [9].


Convergence and Near-Gathering 


Convergence in SSYNC

The impossibility result does not apply to the case 
of Convergence. In fact, it is possible to solve it in 
SSYNC in the basic model (i.e., without common 
coordinate systems) [1]. 

Let SC; (t) denote the smallest enclosing cir- 
cle of the set {s;(t)|s; € S(¢)} of positions 
of sensors in S(t); let c;(t) be the center of 


SC; (t). 


CONVERGENCE
Assumptions: SSYNC.

If S_i(t) = {s_i}, then gathering is completed.
∀ s_j ∈ S_i(t) \ {s_i}:
    d_j := dist(s_i(t), s_j(t)),
    θ_j := ∠ c_i(t) s_i(t) s_j(t),
    l_j := (d_j/2) cos θ_j + sqrt((V/2)^2 − ((d_j/2) sin θ_j)^2),
limit := min over s_j ∈ S_i(t) \ {s_i} of {l_j},
goal := dist(s_i(t), c_i(t)),
D := min{goal, limit},
p := point on s_i(t)c_i(t) at distance D from s_i(t).

Move towards p.
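The computation in CONVERGENCE can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: positions are 2D tuples in a single coordinate frame, the smallest enclosing circle is found by brute force over pairs and triples (adequate only for a handful of sensors), and the names sec_center and convergence_destination are hypothetical helpers, not taken from [1].

```python
import itertools
import math


def _circle_two(a, b):
    # Circle having segment ab as a diameter.
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return center, math.dist(a, b) / 2


def _circle_three(a, b, c):
    # Circumcircle of a, b, c, or None when the points are collinear.
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def sec_center(points):
    """Center of the smallest enclosing circle (brute force, small inputs only)."""
    if len(points) == 1:
        return points[0]
    candidates = [_circle_two(a, b) for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = _circle_three(*triple)
        if circle is not None:
            candidates.append(circle)
    enclosing = [(r, c) for c, r in candidates
                 if all(math.dist(c, p) <= r + 1e-9 for p in points)]
    return min(enclosing)[1]


def convergence_destination(s_i, visible, V):
    """Destination p of the sensor at s_i, given the visible positions and range V."""
    others = [p for p in visible if p != s_i]
    if not others:                      # S_i(t) = {s_i}: nothing to do
        return s_i
    c = sec_center([s_i] + others)
    goal = math.dist(s_i, c)
    if goal == 0:
        return s_i
    ux, uy = (c[0] - s_i[0]) / goal, (c[1] - s_i[1]) / goal  # unit vector s_i -> c_i
    limit = math.inf
    for s_j in others:
        vx, vy = s_j[0] - s_i[0], s_j[1] - s_i[1]
        half_cos = (ux * vx + uy * vy) / 2   # (d_j / 2) cos(theta_j)
        half_sin = (ux * vy - uy * vx) / 2   # (d_j / 2) sin(theta_j), signed
        l_j = half_cos + math.sqrt(max(0.0, (V / 2) ** 2 - half_sin ** 2))
        limit = min(limit, l_j)
    D = min(goal, limit)
    return (s_i[0] + D * ux, s_i[1] + D * uy)
```

For two sensors at (0, 0) and (1, 0) with V = 1, the sensor at the origin is sent to the midpoint (0.5, 0), which is both the center of the enclosing circle and the farthest point allowed by the visibility constraint.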


Every time s_i becomes active, it moves toward
c_i(t), but only up to a certain distance.
Specifically, if s_i does not see any sensor other
than itself, then s_i does not move. Otherwise,
its destination is the point p on the segment
s_i(t)c_i(t) that is closest to c_i(t) and that satisfies
the following condition: For every sensor s_j ∈
S_i(t), p lies in the disk C_j whose center is the
midpoint m_j of s_i(t) and s_j(t) and whose radius
is V/2 (see Fig. 2). This condition ensures that s_i
and s_j will still be visible to each other after the
movement of s_i, and possibly of s_j.


Theorem 3 ([1]) Convergence is possible in
SSYNC.


Convergence in ASYNC

Convergence has been shown to be possible also
in ASYNC, but under special schedulers: partial
ASYNC [6] and 1-fair ASYNC [5]. In partial
ASYNC the time spent by a sensor performing the
Look, Compute, and Sleep operations is bounded
by a globally predefined amount and the time
spent in the Move operation by a locally predefined
amount; in 1-fair ASYNC, between two
successive activations of each sensor, all the other
sensors are activated at most once. Finally,
Convergence has been studied also in the presence of
perception inaccuracies (radial errors in locating
a sensor), and it has been shown how to reach
convergence in FSYNC for small inaccuracies.


Near-Gathering 

Slight modifications can make the algorithm of 
[1] described above collisionless, thus solving 
also the collisionless Convergence problem (i.e., 
Near-Gathering) in SSYNC. Near-Gathering can 
be achieved also in ASYNC, with two additional 
assumptions [7]: (1) the sensors must partially
agree on a common coordinate system (one axis
is sufficient) and (2) the initial visibility graph
must be well connected, that is, the subgraph of
the visibility graph that contains only the edges
corresponding to sensors at distance strictly
smaller than V must be connected.


Open Problems 


The existing results for Gathering and Convergence
leave several problems open. For example,
Gathering in SSYNC (and thus ASYNC) has been
proven impossible when neither multiplicity
detection nor an orientation is available. While
common orientation suffices, it is not known
whether the presence of multiplicity detection
alone is sufficient to solve the problem. Also,
the impossibility result does not apply to FSYNC;
however, no algorithm is known in such a setting
that does not make use of orientation. Finally, it is
not known whether Convergence (collisionless or
not) is solvable in ASYNC without additional
assumptions: so far neither an algorithm nor an
impossibility proof has been provided.


Problem Definition


The class of piecewise smooth complex (PSC) 
includes geometries that go beyond smooth sur- 
faces. They contain polyhedra, smooth and non- 
smooth surfaces with or without boundaries, and 
more importantly non-manifolds. Thus, provable 
mesh generation algorithms for this domain ex- 
tend the scope of mesh generation to a wide 
variety of domains. Just as in surface mesh gener- 
ation, we are required to compute a set of points 
on the input complex and then connect them with 
a simplicial complex which is geometrically close 
and is topologically equivalent to the input. One 
challenge that makes this task harder is that the 
PSCs allow arbitrary small input angles, a notori- 
ous well-known hurdle for mesh generation. 

A PSC is a set of cells, each being a smooth,
connected manifold, possibly with boundary. The
0-cells, 1-cells, and 2-cells are called corners,
ridges, and patches, respectively. A PSC could
also contain 3-cells that designate regions to be
meshed with tetrahedra, if we are interested in a
volume mesh.


Definition 1 (ridge; patch) A ridge is a closed,
connected subset of a smooth 1-manifold without
boundary in R^3. A patch is a 2-manifold that is a
closed, connected subset of a smooth 2-manifold
without boundary in R^3.


Definition 2. A piecewise smooth complex S is 
a finite set of vertices, ridges, and patches that 
satisfy the following conditions. 


1. The boundary of each cell in S is a union of 
cells in S. 

2. If two distinct cells c_1 and c_2 in S intersect,
their intersection is a union of cells in S
included in the boundary of c_1 or c_2.


Our goal is to generate a triangulation of a 
PSC. Element quality is not a primary issue here 
though a good radius-edge ratio can be obtained 
by additional refinement except near the small 
input angles. The definition below makes our
notion of triangulation of a PSC precise. Recall that
|T| denotes the underlying space of a complex T
(Fig. 1).


Definition 3 (triangulation of a piecewise
smooth complex) A simplicial complex T
is a triangulation of a PSC S if there is a
homeomorphism h from |S| to |T| such that
h(v) = v for each vertex v ∈ S and for each cell
ξ ∈ S, there is a subcomplex T_ξ ⊆ T such that h
is a homeomorphism from ξ to |T_ξ|.


Key Results 


To generate a mesh for a PSC with theoretical
guarantees, we use Delaunay refinement as in
smooth surface meshing. For a point set
P ⊂ R^3, let Vor P and Del P denote the Voronoi
diagram and Delaunay triangulation of P,
respectively. The restricted Delaunay complex, as in
smooth surface meshing, plays an important role
in sampling the PSCs.

Meshing Piecewise Smooth Complexes, Fig. 1 Example meshes of PSC: (left) a piecewise smooth surface, a non-manifold, a surface with non-trivial topology


Let V_ξ denote the dual Voronoi face of a
Delaunay simplex ξ in Del P. The restricted Voronoi
face of V_ξ with respect to X ⊂ R^3 is the
intersection V_ξ|_X = V_ξ ∩ X. The restricted Voronoi
diagram and restricted Delaunay triangulation of
P with respect to X are


Vor P|_X = {V_ξ|_X | V_ξ|_X ≠ ∅} and
Del P|_X = {ξ | V_ξ|_X ≠ ∅}, respectively.


In words, Del P|x consists of those Delaunay 
simplices in Del P whose dual Voronoi face in- 
tersects X. We call these simplices restricted. 
For a restricted triangle, its dual Voronoi edge 
intersects the domain in a single or multiple 
points. These are the centers of surface Delaunay 
balls that circumscribe the vertices of the triangle. 

In smooth surface meshing, the restricted tri- 
angles violating certain desirable properties are 
refined by the addition of the surface Delaunay 
ball centers. It turns out that this process can- 
not continue forever because the inserted points 
maintain a fixed lower bound on their distances 
from the existing points. This argument breaks 
down if non-smoothness is allowed. In particular, 
ridges and corners where several patches meet 
cause the local feature size to be zero in which 
case inserted points with a lower bound on local 
feature size may come arbitrarily close to each 
other. Nevertheless, Boissonnat and Oudot [1]
showed that the Delaunay refinement that they
proposed for smooth surfaces can be extended
to a special class of piecewise smooth surfaces
called Lipschitz surfaces. Their algorithm may
break down for domains with small angles. The
analysis requires that the input angles subtended
by the tangent planes of the patches meeting at
a ridge or a corner be close to 180°.

The first guaranteed algorithm for PSCs 
with small angles is due to Cheng, Dey, 
and Ramos [3]. They introduced the idea of 
using weighted vertices as protecting balls 
in a weighted Delaunay triangulation. In this 
triangulation each point p is equipped with
a weight w_p, which can also be viewed as a
ball p̂ = B(p, w_p) centered at p with radius
w_p. The squared weighted distance between
two points (p, w_p) and (q, w_q) is measured as
‖p − q‖^2 − w_p^2 − w_q^2. Notice that the weight can be
zero in which case the weighted point is a regular
point. One can define a Voronoi diagram and 
its dual Delaunay triangulation for a weighted 
point set just like their Euclidean counterparts 
by replacing Euclidean distances with weighted 
distances. To emphasize the weighted case, let 
us denote a weighted point set P with P[w] 
and its weighted Delaunay triangulation with 
Del P[w]. 
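As a small illustration of the weighted distance just defined, the following Python sketch evaluates it for points given as coordinate tuples. The function name power_distance is a hypothetical helper chosen for this example, not part of any cited software.

```python
import math


def power_distance(p, w_p, q, w_q):
    """Squared weighted distance ||p - q||^2 - w_p^2 - w_q^2."""
    return math.dist(p, q) ** 2 - w_p ** 2 - w_q ** 2


# Balls of radii 3 and 4 whose centers are 5 apart meet at a right angle,
# so their weighted distance is zero.
orthogonal = power_distance((0, 0, 0), 3, (5, 0, 0), 4)

# With both weights zero, the squared Euclidean distance is recovered.
unweighted = power_distance((0, 0), 0, (3, 4), 0)
```

A value of zero corresponds to two balls intersecting orthogonally, a negative value to balls that overlap more deeply than that, and a positive value to balls that are farther apart.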

The algorithm of Cheng, Dey, and Ramos [3] 
has two phases, the protection phase and the 
refinement phase. In the protection phase, it com- 
putes a set of protecting balls centered at the 
ridges and corners of the input PSC. The union of
the protecting balls covers the ridges and corners
completely. Let P[w] be the weighted points
representing the protecting balls. The algorithm
computes Del P[w] and the restricted Delaunay
triangles in Del P[w]|_S. In the refinement phase,
it refines the restricted triangles if either they
have a large surface Delaunay ball (albeit in the 
weighted sense), or they fail to form a topological 
disk around each vertex on each patch adjoining 
the vertex. The algorithm guarantees that the final 
mesh is a triangulation of the input PSC and 
the two are related by a homeomorphism. The 
proof of the homeomorphism uses an extension 
of the topological ball property of Edelsbrunner 
and Shah [6] to accommodate PSCs. 

The algorithm of Cheng et al. [3] requires 
difficult geometric computations such as feature 
size estimates. Cheng, Dey, and Levine [2] sim- 
plified some of these computations at the expense 
of weakening the topological guarantees. Like 
the smooth surface meshing algorithm in [4], 
they guarantee that each input patch and ridge 
is approximated with output simplices that form 
a manifold of appropriate dimension. The al- 
gorithm is supplied with a user-specified size 
parameter. If this size parameter is small enough, 
the output is a triangulation of the PSC in the 
sense of Definition 3. In a subsequent paper, Dey
and Levine [5] proposed a strategy that combines
the protection phase with the refinement phase, which
allows the ball sizes to be adjusted on the fly rather
than computed beforehand by estimating
feature sizes. Cheng, Dey, and Shewchuk [4]
refined this strategy even further to obtain an
improved algorithm with detailed analysis.


Theorem 1 ([4]) There is a Delaunay refinement
algorithm that runs with a parameter
λ > 0 on an input PSC S and outputs a mesh
T = Del P[w]|_S with the following guarantees:


1. For each patch σ ∈ S, |Del P[w]|_σ| is a
2-manifold with boundary, and every vertex
in Del P[w]|_σ lies on σ. The boundary of
|Del P[w]|_σ| is homeomorphic to Bd σ,
the boundary of σ, and every vertex in
Del P[w]|_{Bd σ} lies on Bd σ.

2. If λ is sufficiently small, then T is a
triangulation of S (recall Definition 3). Furthermore,
there is a homeomorphism h from |S| to |T|
such that for every i-dimensional cell ξ ∈ S_i
with i ∈ [0, 2], h is a homeomorphism from
ξ to |Del P[w]|_ξ|, every vertex in Del P[w]|_ξ
lies on ξ, and the boundary of |Del P[w]|_ξ| is
|Del P[w]|_{Bd ξ}|.


Notice that the above guarantee specifies 
that the homeomorphism between the input and 
the output respects the stratification of corners, 
ridges, and patches and thus preserves these 
features. Once a mesh for the surface patches is 
completed, Delaunay refinement algorithms can 
further refine the mesh to improve the quality 
of the surface triangles or the tetrahedra they 
enclose. The algorithm can only attack triangles 
and tetrahedra with large orthoradius-edge ratios; 
some simplices with large circumradius-edge 
ratios may survive. The tetrahedral refinement 
algorithm should be careful in that if inserting 
a vertex at the circumcenter of a poor-quality 
tetrahedron destroys some surface triangle, the 
algorithm simply should opt not to insert the new 
vertex. This approach has the flaw that tetrahedra 
with large radius-edge ratios sometimes survive 
near the boundary. 


URLs to Code and Data Sets 


CGAL (http://cgal.org), a library of geometric
algorithms, contains software for mesh generation
of piecewise smooth surfaces. The DelPSC
software that implements the PSC meshing algorithm
as described in [4] is available from
http://web.cse.ohio-state.edu/~tamaldey/delpsc.html.


Problem Definition: The Notion of a
Message Adversary


Message adversaries have been introduced by N. 
Santoro and P. Widmayer in a paper titled Time 
is not a healer [15] to model and understand 
what they called dynamic transmission failures in 
the context of synchronous systems. Then, they 
extended their approach in [16] where they used 
the term ubiquitous failures. The terms heard- 
of communication [5], transient link failure [17], 
and mobile failure [12] have later been used by 
other authors to capture similar network behav- 
iors in synchronous or asynchronous systems. 


Message Adversaries 


The aim of this approach is to consider mes- 
sage losses as a normal link behavior (as long as 
messages are correctly transmitted). The notion 
of a message adversary is of a different nature 
than the notion of the fair link assumption. A fair 
link assumption is an assumption on each link 
taken separately, while the message adversary 
notion considers the network as a whole; its aim 
is not to build a reliable network but to allow 
the statement of connectivity requirements that 
must be met for problems to be solved. Mes- 
sage adversaries allow us to consider topology 
changes not as anomalous network behaviors, but 
as an essential part of the deep nature of the 
system. 


Reliable Synchronous Systems 

A fully connected synchronous system is made
up of n computing entities (processes) denoted
p_1, ..., p_n, where each pair of processes is
connected by a bidirectional link. The progress
of the processes is ruled by a global clock which
generates a sequence of rounds. During a round,
each process (a) first sends a message to all the
other processes (broadcast), (b) then receives a
message from each other process, and (c) finally
executes a local computation. The fundamental
property of a synchronous system is that the
messages sent at a round r are received during
the very same round r. As we can see, this type
of synchrony is an abstraction that encapsulates 
(and hides) specific timing assumptions (there are 
bounds on message transfer delays and process- 
ing times, and those are known and used by the 
underlying system level to generate the sequence 
of synchronized rounds [13]). 

In the case of a reliable synchronous system, 
both processes and links are reliable, i.e., no 
process deviates from its specification, and all 
the messages that are sent, and only them, are 
received (exactly once) by each process. 


Message Adversary 

A message adversary is a daemon which, at
each round, can suppress messages (hence, these
messages are never received). The adversary is
not prevented from having read access to the
local states of the processes at the beginning of
each round.

It is possible to associate a directed graph G^r
with each round r. Its vertices are the processes,
and there is an edge from p_i to p_j if the message
sent at round r by p_i to p_j is not suppressed
by the adversary. There is a priori no relation
between the consecutive graphs G^r, G^{r+1}, etc. As an
example, the daemon can define G^{r+1} from the
local states of the processes at the end of round r.

Let SMP_n[adv : AD] denote the synchronous
system whose communications are
under the control of an adversary denoted AD.
SMP_n[adv : ∅] denotes the synchronous system
in which the adversary has no power (it can
suppress no message), while SMP_n[adv : ∞]
denotes the synchronous system in which the
adversary can suppress all the messages at
every round. It is easy to see that, from a
message adversary and computability point of
view, SMP_n[adv : ∅] is the most powerful
synchronous system model, while SMP_n[adv : ∞]
is the weakest. More generally, the more
constrained the message adversary AD, the more
powerful the synchronous system.


Key Results 
Key Results in Synchronous Systems 


The Spanning Tree Adversary 

Let TREE be the message adversary defined by
the following constraint: at every round r, the
graph G^r is an undirected spanning tree, i.e.,
the adversary cannot suppress the two messages
(one in each direction) sent on the edges
of G^r. Let SMP_n[adv : TREE] denote the
corresponding synchronous system. As already
indicated, for any r and r' ≠ r, G^r and G^{r'} are
not required to be related; they can be composed
of totally different sets of links.

Let us assume that each process p_i has an
initial input v_i. It is shown in [11] that SMP_n[adv :
TREE] allows the processes to compute any
computable function on their inputs, i.e., functions on
the vector [v_1, ..., v_n].

Solving this problem amounts to ensuring that
each input v_i reaches each process p_j despite the
fact that the spanning tree can change arbitrarily
from one round to the next. This follows from
the following observation. At any round r, the set
of processes can be partitioned into two subsets:
the set yes_i which contains the processes that
have received v_i, and the set no_i which contains
the processes that have not yet received v_i. As
G^r is an undirected spanning tree (the tree is
undirected because no message is suppressed on
each of its edges), it follows that there is an edge
of G^r that connects a process of the set yes_i to
a process that belongs to the set no_i. So during
round r, there is at least one process of the set no_i
which receives a copy of v_i and will consequently
belong to the set yes_i of the next round. It follows
that at most (n − 1) rounds are necessary for v_i to
reach all the processes.
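The argument above can be checked on small examples with the following Python sketch. It is a simulation under stated assumptions, not the construction of [11]: the adversary is modeled by a random spanning tree per round (a real adversary may choose trees adversarially), every process that holds v_i forwards it on all its tree edges, and the function names are hypothetical.

```python
import random


def random_spanning_tree(n, rng):
    """Edges of a random spanning tree on processes 0..n-1.

    Each process, in shuffled order, attaches to a randomly chosen earlier one.
    """
    order = list(range(n))
    rng.shuffle(order)
    return [(order[k], order[rng.randrange(k)]) for k in range(1, n)]


def rounds_to_disseminate(n, source, rng):
    """Number of rounds until every process holds the input of `source`."""
    yes = {source}                      # processes that already hold v_i
    rounds = 0
    while len(yes) < n:
        edges = random_spanning_tree(n, rng)   # the graph G^r of this round
        newly = set()
        for a, b in edges:              # messages travel in both directions
            if a in yes and b not in yes:
                newly.add(b)
            if b in yes and a not in yes:
                newly.add(a)
        yes |= newly                    # receivers forward only from the next round on
        rounds += 1
    return rounds


# Example run with 8 processes; the bound from the text is n - 1 = 7 rounds.
example_rounds = rounds_to_disseminate(8, 0, random.Random(1))
```

Because every spanning tree has an edge crossing from yes_i to no_i, the loop adds at least one process per round, which is exactly the (n − 1) bound stated above.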


Consensus in the Presence of Message 
Adversaries 

In the consensus problem, each process proposes 
a value and has to decide a value v such that 
v was proposed by a process, and no two pro- 
cesses decide different values. This problem is 
addressed in [2, 6, 8, 12, 17] in the message ad- 
versary context. 


Impossibility Agreement-Related Results 

As presented in [15], the k-process agreement
problem (which must not be confused with the
k-set agreement problem) is defined as follows.
Each process p_i proposes an input value v_i ∈
{0, 1}, and at least k processes have to decide
the same proposed value v. Let us observe that
this problem can be trivially solved without any
communication when k ≤ ⌈n/2⌉ (namely, each
process decides its input value). Let us also notice
that if k > ⌈n/2⌉, there is at most one decision
value.

It is shown in [15] that the k-process
agreement problem cannot be solved for k >
⌈n/2⌉, if the message adversary is allowed to
suppress up to (n − 1) messages at every round.
Other impossibility results are presented in [16].
More results can be found in [3].


d-Solo Executions

A process runs solo when it computes its local 
output without receiving any information from 
other processes, either because they crashed or 
they are too slow. This corresponds to a mes- 
sage adversary that suppresses all the messages 
sent to some subset of processes (which conse- 
quently run solo). The computability power of 
models in which several processes may run solo 
is addressed in [9]. This paper introduces the
concept of d-solo model, 1 ≤ d ≤ n (synchronous
round-based wait-free models where up
to d processes may run solo in each round),
and characterizes the colorless tasks that can
be solved in a d-solo model. Among other
results, this paper shows that the (d, ε)-solo
approximate agreement task (which generalizes
ε-approximate agreement) can be solved in the
d-solo model, but cannot be solved in the (d + 1)-solo
model. Hence, the d-solo models define a
strict hierarchy.


Key Results in Asynchronous Systems 
In a very interesting way, message adversaries
allow the establishment of equivalences between
synchronous systems and asynchronous systems.
These equivalences, which are depicted in
Fig. 1 (from [14]), concern tasks. A task is the
distributed analog of a mathematical function
[10], in the sense that each process has
an input and must compute an output, and the
processes need to cooperate to compute their
individual outputs (this cooperation is inescapable,
which makes the task distributed).

A ≃_T B means that any task that can be
computed in the model A can be computed in the
model B and vice versa. An arrow from A to B
means that, from a task solvability point of view,
the model A is strictly stronger than the model B
(there are tasks solvable in A and not in B, and
all the tasks solvable in B are solvable in A).

Let ARW_{n,n−1}[fd : ∅] (resp. AMP_{n,n−1}
[fd : ∅]) denote the basic asynchronous read/write
(resp. message-passing) system model in which
up to (n − 1) processes may crash (premature
halting). These models are also called wait-free
models. The notation [fd : ∅] means that these
systems are not enriched with additional
computational power. Differently, ARW_{n,n−1}[fd : FD]
(resp. AMP_{n,n−1}[fd : FD]) denotes ARW_{n,n−1}
[fd : ∅] (resp. AMP_{n,n−1}[fd : ∅]) enriched with
a failure detector FD (see below).


• The message adversary denoted TOUR (for
tournament) was introduced by Afek and
Gafni in [1]. At any round, this adversary can
suppress one message on each link but not
both: for any pair of processes (p_i, p_j), either
the message from p_i to p_j or the message
from p_j to p_i or none of them can be
suppressed.

The authors have shown the following
model equivalence: SMP_n[adv : TOUR]
≃_T ARW_{n,n−1}[fd : ∅]. This is an important
result as it established for the first time a
very strong relation linking message-passing
synchronous systems where no process
crashes, but messages can be lost according
to the adversary TOUR, with the basic
asynchronous wait-free read/write model.

Message Adversaries, Fig. 1 A message adversary hierarchy based on task equivalence and failure detectors. Equivalences shown:
[14]: SMP_n[adv : SOURCE, QUORUM] ≃_T AMP_{n,n−1}[fd : Σ, Ω]
[14]: SMP_n[adv : QUORUM] ≃_T AMP_{n,n−1}[fd : Σ]
[14]: SMP_n[adv : SOURCE, TOUR] ≃_T ARW_{n,n−1}[fd : Ω]
[1]: SMP_n[adv : TOUR] ≃_T ARW_{n,n−1}[fd : ∅]
[14]: SMP_n[adv : SOURCE] ≃_T AMP_{n,n−1}[fd : Ω]
SMP_n[adv : ∞] ≃_T AMP_{n,n−1}[fd : ∅]

• The other model equivalences, from a task
solvability point of view, are due to Raynal
and Stainer [14], who considered two failure
detectors and introduced two associated
message adversaries.

A failure detector is an oracle that provides
processes with information on failures. The
failure detector Ω, called "eventual leader,"
was introduced in [4]. It is the failure detector
that provides the minimal information on
failures that allows consensus to be solved in
ARW_{n,n−1}[fd : ∅] and in AMP_{n,n−1}[fd : ∅]
where a majority of processes do not crash.
The failure detector Σ, which is called "quorum,"
was introduced in [7]. It is the failure
detector that provides the minimal information
on failures that allows a read/write register to
be built on top of an asynchronous message-passing
system prone to up to (n − 1) process
crashes.

The message adversary SOURCE is
constrained by the following property: there is a
process p_x and a round r such that, at any
round r' > r, the adversary does not suppress
the message sent by p_x to the other processes.
The message adversary QUORUM captures
the message patterns that allow the quorums
defined by Σ to be obtained.

As indicated, the corresponding model
equivalences are depicted in Fig. 1. As an
example, when considering distributed tasks, the
synchronous message-passing model where
no process crashes and where the message
adversary is constrained by SOURCE and
TOUR (SMP_n[adv : SOURCE, TOUR]) and
the basic wait-free asynchronous read/write
model enriched with Ω (ARW_{n,n−1}[fd : Ω])
have the same computability power.

When looking at the figure, it is easy to
see that the suppression of the constraint
TOUR from the model SMP_n[adv :
SOURCE, TOUR] gives the model SMP_n
[adv : SOURCE], which is equivalent
to AMP_{n,n−1}[fd : Ω]. Hence,
suppressing TOUR weakens the model from
ARW_{n,n−1}[fd : Ω] to AMP_{n,n−1}[fd : Ω]
(i.e., from asynchronous read/write with Ω to
asynchronous message-passing with Ω).


Applications 


Message adversaries are important because they 
allow network changes in synchronous systems 
to be easily captured. Message losses are no 
longer considered as link or process failures, 
but as a normal behavior generated by process 
mobility and wireless links. Message adversaries 
provide us with a simple way to state assumptions 
(and sometimes minimal assumptions) on link 
connectivity which allow distributed computing 
problems to be solved in synchronous systems. 
They also allow the statement of equivalences 
relating (a) synchronous systems weakened 
by dynamically changing topology and (b) 
asynchronous read/write (or message-passing) 
systems enriched with distinct types of failure
detectors.


Problem Definition


The Traveling Salesman Problem (TSP) is the 
following optimization problem: 


Input: A complete loopless undirected graph
G = (V, E, w) with a weight function w :
E → Q_{≥0} that assigns to each edge a
nonnegative weight.

Feasible solutions: All Hamiltonian tours, i.e.,
the subgraphs H of G that are connected and
in which each node has degree two.

Objective function: The weight function
w(H) = Σ_{e ∈ H} w(e) of the tour.

Goal: Minimization. 


The TSP is an NP-hard optimization problem.
This means that a polynomial time algorithm
for the TSP does not exist unless P = NP. One
way out of this dilemma is provided by
approximation algorithms. A polynomial time
algorithm for the TSP is called an α-approximation
algorithm if the tour H produced by the
algorithm fulfills w(H) ≤ α · OPT(G). Here OPT(G)
is the weight of a minimum weight tour of G.
If G is clear from the context, one just writes
OPT. An α-approximation algorithm always
produces a feasible solution whose objective value
is at most a factor of α away from the optimum
value. α is also called the approximation factor
or performance guarantee. α does not need to be
a constant; it can be a function that depends on
the size of the instance or the number of nodes n.

If there exists a polynomial time approxi- 
mation algorithm for the TSP that achieves 
an exponential approximation factor in n, then 
P=NP [6]. Therefore, one has to look at 
restricted instances. The most natural restriction
is the triangle inequality, that is,


w(u, v) ≤ w(u, x) + w(x, v) for all u, v, x ∈ V.


The corresponding problem is called the Met- 
ric TSP. For the Metric TSP, approximation al- 
gorithms that achieve a constant approximation 
factor exist. Note that for the Metric TSP, it is 
sufficient to find a tour that visits each vertex 
at least once: Given such a tour, we can find 
a Hamiltonian tour of no larger weight by skip- 
ping every vertex that we already visited. By 
the triangle inequality, the new tour cannot get 
heavier. 


Key Results 


A simple 2-approximation algorithm for the 
Metric TSP is the tree doubling algorithm. 
It uses minimum spanning trees to compute 
Hamiltonian tours. A spanning tree T of a graph
G = (V, E, w) is a connected acyclic subgraph of
G that contains each node of V. The weight w(T)
of such a spanning tree is the sum of the weights
of the edges in it, i.e., w(T) = Σ_{e ∈ T} w(e).
A spanning tree is called a minimum spanning
tree if its weight is minimum among all
spanning trees of G. One can efficiently compute
a minimum spanning tree, for instance via Prim's
or Kruskal's algorithm, see e.g., [5].

The tree doubling algorithm seems to be folk- 
lore. The next lemma is the key for proving the 
upper bound on the approximation performance 
of the tree doubling algorithm. 


Lemma 1 Let T be a minimum spanning tree of
G = (V, E, w). Then w(T) ≤ OPT.


Proof If one deletes any edge of a Hamiltonian
tour of G, one gets a spanning tree of G. Applied
to an optimum tour, this yields a spanning tree of
weight at most OPT, since edge weights are
nonnegative; a minimum spanning tree is no heavier. □

Algorithm 1 Tree doubling algorithm


Input: a complete loopless edge weighted undirected
graph G = (V, E, w) with weight function w : E →
Q_{≥0} that fulfills the triangle inequality

Output: a Hamiltonian tour of G that is a
2-approximation

1: Compute a minimum spanning tree T of G.

2: Duplicate each edge of T and obtain a Eulerian
multigraph T'.

3: Compute a Eulerian tour of T' (for instance via a
depth first search in T). Whenever a node is visited
in the Eulerian tour that was already visited, this
node is skipped and one proceeds with the next
unvisited node along the Eulerian cycle. (This process
is called shortcutting.) Return the resulting
Hamiltonian tour H.
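The three steps of Algorithm 1 can be sketched in Python as follows. The sketch assumes the instance is given as a symmetric weight matrix w (a list of lists) that fulfills the triangle inequality; it uses Prim's algorithm for step 1 and relies on the fact that the order in which a depth first search first visits the nodes of T is exactly the shortcut Eulerian tour of T'. The function names, including the brute-force optimum_weight used only for checking tiny instances, are hypothetical.

```python
import itertools
import math


def minimum_spanning_tree(w):
    """Prim's algorithm on a symmetric weight matrix; returns adjacency lists of T."""
    n = len(w)
    adj = [[] for _ in range(n)]
    # best[v] = (cheapest known connection weight, tree endpoint) for nodes not in T
    best = {v: (w[0][v], 0) for v in range(1, n)}
    while best:
        v = min(best, key=lambda x: best[x][0])
        _, parent = best.pop(v)
        adj[parent].append(v)
        adj[v].append(parent)
        for u in best:
            if w[v][u] < best[u][0]:
                best[u] = (w[v][u], v)
    return adj


def tree_doubling_tour(w):
    """Hamiltonian tour (as a node order) obtained by doubling and shortcutting."""
    adj = minimum_spanning_tree(w)
    tour, seen, stack = [], set(), [0]
    while stack:                       # iterative depth first search in T
        v = stack.pop()
        if v in seen:
            continue                   # already visited: this is the shortcut
        seen.add(v)
        tour.append(v)
        stack.extend(reversed(adj[v]))
    return tour


def tour_weight(w, tour):
    """Weight of the closed tour that visits the nodes in the given order."""
    return sum(w[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def optimum_weight(w):
    """Brute-force OPT; only feasible for very small instances."""
    rest = range(1, len(w))
    return min(tour_weight(w, [0] + list(p)) for p in itertools.permutations(rest))


# Example: Euclidean distances between points in the plane form a metric.
points = [(0, 0), (3, 1), (1, 4), (5, 5), (2, 2), (6, 0), (4, 3)]
w = [[math.dist(a, b) for b in points] for a in points]
tour = tree_doubling_tour(w)
```

On this example the returned order visits every point once, and its weight can be compared against optimum_weight(w) to observe the factor of at most 2.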


Theorem 2 Algorithm 1 always returns a
Hamiltonian tour whose weight is at most twice the
weight of an optimum tour. Its running time is
polynomial.


Proof By Lemma 1, w(T) ≤ OPT. Since one
duplicates each edge of T, the weight of T' equals
w(T') = 2w(T) ≤ 2 OPT. When taking shortcuts
in step 3, a path in T' is replaced by a single edge.
By the triangle inequality, the sum of the weights
of the edges in such a path is at least the weight
of the edge it is replaced by. (Here, the algorithm
breaks down for arbitrary weight functions.) Thus
w(H) ≤ w(T'). This proves the claim about the
approximation performance.

The running time is dominated by the time
needed to compute a minimum spanning tree.
This is clearly polynomial. □


Christofides’ algorithm (Algorithm 2) is a clever 
refinement of the tree doubling algorithm. It 
first computes a minimum spanning tree. On 
the nodes that have an odd degree in T, it then 
computes a minimum weight perfect matching. 
A matching M of G is called a matching on
U ⊆ V if all edges of M consist of two nodes
from U. Such a matching is called perfect if
every node of U is incident with an edge of M.


Lemma 3 Let U ⊆ V, #U even. Let M be
a minimum weight perfect matching on U. Then
w(M) ≤ OPT/2.

Algorithm 2 Christofides' algorithm


Input: a complete loopless edge weighted undirected
graph G = (V, E, w) with weight function w : E →
Q_{≥0} that fulfills the triangle inequality
Output: a Hamiltonian tour of G that is a
3/2-approximation
1: Compute a minimum spanning tree T of G.
2: Let U ⊆ V be the set of all nodes that have odd
degree in T. In G, compute a minimum weight
perfect matching M on U.
3: Compute a Eulerian tour of T ∪ M (considered as
a multigraph).
4: Take shortcuts in this Eulerian tour to a
Hamiltonian tour H.


Proof Let H be an optimum Hamiltonian tour of
G. One takes shortcuts in H to get a tour H' on
G|_U as follows: H induces a permutation of the
nodes in U, namely the order in which the nodes
are visited by H. One connects the nodes of U
in the order given by the permutation. To every
edge of H' corresponds a path in H connecting the
two nodes of this edge. By the triangle inequality,
w(H') ≤ w(H). Since #U is even, H' is the union
of two matchings. The lighter one of these two
has a weight of at most w(H')/2 ≤ OPT/2. □


One can compute a minimum weight perfect
matching in time $O(n^3)$, see for instance [5].


Theorem 4 Algorithm 2 is a 3/2-approximation 
algorithm with polynomial running time. 


Proof First observe that the number of odd
degree nodes of the spanning tree is even, since
the sum of the degrees of all nodes equals
$2(n - 1)$, which is even. Thus a perfect matching
on $U$ exists. The weight of the Eulerian tour
is obviously $w(T) + w(M)$. By Lemma 1,
$w(T) \le \mathrm{OPT}$. By Lemma 3, $w(M) \le \mathrm{OPT}/2$.
The weight $w(H)$ of the computed tour $H$ is
at most the weight of the Eulerian tour by the
triangle inequality, i.e., $w(H) \le w(T) + w(M) \le \tfrac{3}{2}\,\mathrm{OPT}$.
Thus the algorithm is a 3/2-approximation algorithm. Its
running time is $O(n^3)$. □
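The four steps of Algorithm 2 can be sketched as follows. This is only an illustration under stated assumptions: the input is a symmetric metric weight matrix `w`, all function names are hypothetical, and the matching step uses an exponential-time bitmask dynamic program that is practical only for a handful of odd-degree nodes. It stands in for the polynomial-time matching algorithm cited in the text.

```python
import math
from functools import lru_cache

def mst_edges(w):
    """Prim's algorithm; returns the list of MST edges (u, v)."""
    n = len(w)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    return edges

def min_weight_perfect_matching(w, nodes):
    """Exact matching by bitmask DP: exponential time, small inputs only."""
    nodes = list(nodes)

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1        # lowest unmatched index
        best = (math.inf, ())
        for j in range(i + 1, len(nodes)):
            if (mask >> j) & 1:
                cost, rest = solve(mask & ~(1 << i) & ~(1 << j))
                cand = cost + w[nodes[i]][nodes[j]]
                if cand < best[0]:
                    best = (cand, rest + ((nodes[i], nodes[j]),))
        return best

    return list(solve((1 << len(nodes)) - 1)[1])

def christofides_tour(w):
    n = len(w)
    tree = mst_edges(w)                             # step 1
    degree = [0] * n
    for u, v in tree:
        degree[u] += 1
        degree[v] += 1
    odd = [v for v in range(n) if degree[v] % 2 == 1]
    multigraph = tree + min_weight_perfect_matching(w, odd)   # step 2
    # Step 3: Hierholzer's algorithm on the multigraph (all degrees are even).
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(multigraph):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used, stack, euler = [False] * len(multigraph), [0], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()                            # discard edges already traversed
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            euler.append(stack.pop())
    # Step 4: shortcuts, keeping only the first visit of every node.
    seen, tour = set(), []
    for v in euler:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour
```

Replacing `min_weight_perfect_matching` by a polynomial-time matching routine would be needed to obtain the running time stated in Theorem 4.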


Applications 


Experimental analysis shows that the tour computed by
Christofides’ algorithm deviates by 10 % to 15 % from
the optimum tour [3]. However, it can serve as
a good starting tour for other heuristics like the
Lin–Kernighan heuristic.


Open Problems 


The analysis of Algorithm 2 is tight; an example
is the metric completion of the graph depicted
in Fig. 1. The unique minimum spanning
tree consists of all solid edges. It has only two
nodes of odd degree. The edge between these two
nodes has weight $(1 + \varepsilon)(n + 1)$. No shortcuts
are needed, and the weight of the tour produced
by the algorithm is $\approx 3n$. An optimum tour consists
of all dashed edges plus the leftmost and
rightmost solid edge. The weight of this tour is
$(2n - 1)(1 + \varepsilon) + 2 \approx 2n$.

The question whether there is an approxima- 
tion algorithm with a better performance guar- 
antee is a major open problem in the theory of 
approximation algorithms. 

Held and Karp [2] design an LP-based algorithm
that computes a lower bound for the weight
of an optimum TSP tour. It is conjectured that
the weight of an optimum TSP tour is at most
a factor of 4/3 larger than this lower bound, but
this conjecture has remained unproven for more than three
decades. An algorithmic proof of this conjecture
would yield a 4/3-approximation algorithm for
the Metric TSP.

Metric TSP, Fig. 1 A tight example for Christofides’ algorithm. There are $2n + 1$ nodes. Solid edges have a weight of one, dashed ones have a weight of $1 + \varepsilon$


Experimental Results 


See, e.g., [3], where a deviation of 10 % to 15 %
of the optimum (more precisely of the Held–Karp
bound) is reported for various sorts of instances.


Data Sets 


The webpage of the 8th DIMACS implementation
challenge, www.research.att.com/~dsj/chtsp/,
contains a large collection of instances.


Recommended Reading


Christofides never published his algorithm. It is
usually cited as one of two technical reports
from Carnegie Mellon University: TR 388 of
the Graduate School of Industrial Administration
(now Tepper School of Business) and CS-93-13.
Neither seems to be available at Carnegie
Mellon University anymore [Frank Balbach, personal
communication, 2006]. A one-page abstract
was published in a conference record. But his
algorithm quickly found its way into standard
textbooks on algorithm theory; see [7] for a recent
one.
Problem Definition


Metrical task systems (MTS), introduced
by Borodin, Linial, and Saks [5], is a cost
minimization problem defined on a metric space
$(X, d_X)$ and informally described as follows:
A given system has a set of internal states $X$. The
aim of the system is to serve a given sequence
of tasks. The servicing of each task has a certain
cost that depends on the task and the state of
the system. The system may switch states before
serving the task, and the total cost for servicing
the task is the sum of the service cost of the
task in the new state and the distance between
the states in a metric space defined on the set
of states. Following Manasse, McGeoch, and
Sleator [11], an extended model is considered
here, in which the set of allowable tasks may be
restricted.


Notation 

Let $T^*$ denote the set of finite sequences of
elements from a set $T$. For $x, y \in T^*$, $x \circ y$ is the
concatenation of the sequences $x$ and $y$, and $|x|$ is
the length of the sequence $x$.


Definition 1 (Metrical Task System) Fix a metric
space $(X, d_X)$. Let
$\Gamma = \{(r(x))_{x \in X} : \forall x \in X,\ r(x) \in [0, \infty]\}$
be the set of all possible tasks.
Let $T \subseteq \Gamma$ be a subset of tasks, called allowable
tasks.

MTS$((X, d_X), T, a_0 \in X)$:

INPUT: A finite sequence of tasks $\tau = (\tau_1, \ldots, \tau_m) \in T^*$.

OUTPUT: A sequence of points $a = (a_1, \ldots, a_m) \in X^*$, $|a| = |\tau|$.

OBJECTIVE: minimize

$$\mathrm{cost}(\tau, a) = \sum_{i=1}^{m} \bigl(d_X(a_{i-1}, a_i) + \tau_i(a_i)\bigr).$$

When $T = \Gamma$, the MTS problem is called general.


When $X$ is finite and the task sequence $\tau \in T^*$ is
given in advance, a dynamic programming algorithm
can compute an optimal solution in space
$O(|X|)$ and time $O(|\tau| \cdot |X|)$. MTS, however, is
most interesting in an online setting, where the
system must respond to a task $\tau_i$ with a state
$a_i \in X$ without knowing the future tasks in $\tau$.
Formally,


Definition 2 (Online algorithms for MTS)
A deterministic algorithm for MTS$((X, d_X), T, a_0)$
is a mapping $S : T^* \to X^*$ such that
for every $\tau \in T^*$, $|S(\tau)| = |\tau|$. A deterministic
algorithm $S : T^* \to X^*$ is called online if for
every $\tau, \sigma \in T^*$, there exists $a \in X^*$, $|a| = |\sigma|$,
such that $S(\tau \circ \sigma) = S(\tau) \circ a$. A randomized
online algorithm is a probability distribution over
deterministic online algorithms.
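For the offline setting mentioned before Definition 2, a straightforward dynamic program keeps, for every state, the cheapest cost of serving the tasks seen so far and ending in that state. The sketch below is an illustration only: the names are hypothetical, states are assumed to be indexed 0, ..., |X| - 1, and this naive version spends $O(|X|^2)$ time per task, so it matches the $O(|X|)$ space bound but not the time bound quoted in the text.

```python
import math

def offline_mts_cost(dist, tasks, start):
    """Optimal offline cost of serving `tasks` on a finite metric space.

    dist[x][y]  : distance d_X between states x and y
    tasks[i][x] : cost of serving task i in state x (may be math.inf)
    start       : index of the initial state a_0
    """
    n = len(dist)
    # best[x] = cheapest cost of serving the tasks so far and ending in state x
    best = [0 if x == start else math.inf for x in range(n)]
    for task in tasks:
        best = [
            min(best[y] + dist[y][x] for y in range(n)) + task[x]
            for x in range(n)
        ]
    return min(best)
```

For example, on the two-point uniform metric with tasks `[[0, 5], [5, 0], [5, 0]]` and start state 0, the cheapest schedule serves the first task in state 0 and then moves once to state 1, for a total cost of 1.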
Online algorithms for MTS are evaluated using
(asymptotic) competitive analysis, which considers,
roughly speaking, the worst ratio of the algorithm’s
cost to the optimal cost taken over all
possible task sequences.


Definition 3 A randomized online algorithm
$R$ for MTS$((X, d_X), T, a_0)$ is called c-competitive
(against oblivious adversaries) if there exists
$b = b(X) \in \mathbb{R}$ such that for any task sequence
$\tau \in T^*$ and any point sequence $a \in X^*$ with
$|a| = |\tau|$,

$$\mathbb{E}[\mathrm{cost}(\tau, R(\tau))] \le c \cdot \mathrm{cost}(\tau, a) + b,$$

where the expectation is taken over the distribution $R$.


The competitive ratio of an online algorithm $R$
is the infimum over $c \ge 1$ for which $R$ is
c-competitive. The deterministic [respectively, randomized]
competitive ratio of MTS$((X, d_X), T, a_0)$
is the infimum over the competitive ratios
of all deterministic [respectively, randomized]
online algorithms for this problem. Note that
because of the existential quantifier over $b$, the
asymptotic competitive ratio (both randomized
and deterministic) of MTS$((X, d_X), T, a_0)$ is
independent of $a_0$, and it can therefore be dropped
from the notation.


Key Results 


Theorem 1 ([5]) The deterministic competitive
ratio of the general MTS problem on any n-point
metric space is $2n - 1$.


In contrast to the deterministic case, the understanding
of randomized algorithms for general
MTS is not complete, and generally no sharp
bounds such as Theorem 1 are known.


Theorem 2 ([5, 10]) The randomized competitive
ratio of the general MTS problem on the n-point
uniform space (where all distances are
equal) is at least $H_n = \sum_{i=1}^{n} i^{-1}$, and at most
The best bounds currently known for general n-point
metrics are proved in two steps: First the
given metric is approximated by an ultrametric,
and then a bound on the competitive ratio of
general MTS on ultrametrics is proved.


Theorem 3 ([8, 9]) For any n-point metric space
$(X, d_X)$, there exists an $O(\log^2 n \log\log n)$-competitive
randomized algorithm for the general
MTS on $(X, d_X)$.


The metric approximation component in the
proof of Theorem 3 is called probabilistic
embedding. An optimal $O(\log n)$ probabilistic
embedding is shown by Fakcharoenphol, Rao, and
Talwar in [8], improving on results by Alon,
Karp, Peleg, and West and by Bartal, where this
notion was invented. A different type of metric
approximation with better bounds for metrics of
low aspect ratio is given in [3].

Fiat and Mendel [9] show an $O(\log n \log\log n)$-competitive
algorithm for n-point ultrametrics,
improving (and using) a result of Bartal, Blum,
Burch, and Tomkins [1], where the first
polylogarithmic (or even sublinear) competitive
randomized algorithm for general MTS on general
metric spaces is presented.


Theorem 4 ([2, 12]) For any n-point metric
space $(X, d_X)$, the randomized competitive ratio
of the general MTS on $(X, d_X)$ is at least
$\Omega(\log n / \log\log n)$.


The metric approximation component in the
proof of Theorem 4 is called Ramsey subsets.
It was first used in this context by Karloff,
Rabani, and Ravid, later improved by Blum,
Karloff, Rabani, and Saks, and Bartal, Bollobás,
and Mendel [2]. A tight result on Ramsey subsets
is proved by Bartal, Linial, Mendel, and Naor.
For a simpler (and stronger) proof, see [12].

A lower bound of $\Omega(\log n / \log\log n)$ on the
competitive ratio of any randomized algorithm
for general MTS on n-point ultrametrics is proved
in [2], improving previous results of Karloff,
Rabani, and Ravid, and Blum, Karloff, Rabani,
and Saks.

The last theorem is the only one not concern- 
ing general MTSs. 
Theorem 5 ([6]) It is PSPACE-hard to determine
the competitive ratio of a given MTS instance
$((X, d_X), a_0 \in X, T)$, even when $d_X$ is the uniform
metric. On the other hand, when $d_X$ is
uniform, there is a polynomial time deterministic
online algorithm for MTS$((X, d_X), a_0 \in X, T)$
whose competitive ratio is $O(\log |X|)$ times the
deterministic competitive ratio of the
MTS$((X, d_X), a_0, T)$. Here it is assumed that the instance
$((X, d_X), a_0, T)$ is given explicitly.


Applications 


Metrical task systems were introduced as an abstraction
for online computation; they generalize
many concrete online problems such as paging,
weighted caching, k-server, and list update.
Historically, the model served as an indicator for a general
theory of competitive online computation.

The main technical contribution of the MTS
model is the development of the work function
algorithm used to prove the upper bound in Theorem 1.
This algorithm was later analyzed by
Koutsoupias and Papadimitriou in the context
of the k-server problem, and was shown to be
$(2k - 1)$-competitive. Furthermore, although the
MTS model generalizes the k-server problem,
the general MTS problem on the n-point metric
is essentially equivalent to the $(n - 1)$-server
problem on the same metric [2]. Hence, lower
bounds on the competitive ratio of general MTS
imply lower bounds for the k-server problem, and
algorithms for general MTS may constitute a first
step in devising an algorithm for the k-server
problem, as is the case with the work function
algorithm.

The metric approximations used in Theorems 3
and 4 have found other algorithmic applications.


Open Problems 


There is still an obvious gap between the upper
bound and lower bound known on the randomized
competitive ratio of general MTS on general
finite metrics. It is known that, contrary to the
deterministic case, the randomized competitive
ratio is not constant across all metric spaces of
the same size. However, in those cases where
exact bounds are known, the competitive ratio is
$\Theta(\log n)$. An obvious conjecture is that the
randomized competitive ratio is $\Theta(\log n)$ for any n-point
metric. Arguably, the simplest classes of metric
spaces for which no upper bound on the randomized
competitive ratio better than $O(\log^2 n)$ is
known are paths and cycles.

Also lacking is a “middle theory” for MTS. 
On the one hand, general MTS are understood 
fairly well. On the other hand, specialized MTS 
such as list update, deterministic k-server algo- 
rithms, and deterministic weighted-caching, are 
also understood fairly well, and have a much 
better competitive ratio than the corresponding 
general MTS. What may be missing are “in 
between” models of MTS that can explain the low 
competitive ratios for some of the concrete online 
problems mentioned above. 

It would also be nice to strengthen Theorem 5
and obtain a polynomial time deterministic online
algorithm whose competitive ratio on any MTS
instance on any n-point metric space is at most
poly-log(n) times the deterministic competitive
ratio of that MTS instance.
Problem Definition


MINHASH sketches (also known as min-wise 
sketches) are randomized summary structures of 
subsets which support set union operations and 
approximate processing of cardinality and simi- 
larity queries. 

Set-union support, also called mergeability, 
means that a sketch of the union of two sets 
can be computed from the sketches of the two 
sets. In particular, this applies when the second 
set is a single element. The queries supported 
by MINHASH sketches include cardinality (of a 
subset from its sketch) and similarity (of two 
subsets from their sketches). 

Sketches are useful for massive data analysis. 
Working with sketches often means that instead 
of explicitly maintaining and manipulating very 
large subsets (or equivalently 0/1 vectors), we can 
instead maintain the much smaller sketches and 
can still query properties of these subsets. 

We denote the universe of elements by $U$ and
its size by $n = |U|$. We denote by $S(X)$ the
sketch of the subset $X \subseteq U$.


Set Operations 

• Inserting an element: Given a set $X$ and
element $y \in U$, a sketch $S(X \cup \{y\})$ can be
computed from $S(X)$ and $y$.

• Merging two sets: For two (possibly overlapping)
sets $X$ and $Y$, we can obtain a sketch of
their union $S(X \cup Y)$ from $S(X)$ and $S(Y)$.


Support for insertion makes the sketches suit- 
able for streaming, where elements (potentially 
with repeated occurrences) are introduced se- 
quentially. Support for merges is important for 
parallel or distributed processing: We can sketch 
a data set that has multiple parts by sketching 
each part and combining the sketches. We can 
also compute the sketches by partitioning the data 
into parts, sketching each of the parts concur- 
rently, and finally merging the sketches of the 
parts to obtain a sketch of the full data set. 


Queries 
From the sketches of subsets, we would like to
(approximately) answer queries on the original
data. More precisely, for a set of subsets $\{X_i\}$, we
are interested in estimating a function $f(\{X_i\})$.
To do this, we apply an estimator $\hat{f}$ to the
respective set of sketches $\{S(X_i)\}$.

We would like our estimators to have certain 
properties: When estimating nonnegative quan- 
tities (such as cardinalities or similarities), we 
would want the estimator to be nonnegative as 
well. We are often interested in unbiased estima- 
tors and always in admissible estimators, which 
are Pareto optimal in terms of variance (variance 
on one instance cannot be improved without 
increasing variance on another). We also seek 
good concentration, meaning that the probability 
of error decreases exponentially with the relative 
error. 

We list some very useful queries that are 
supported by MINHASH sketches: 


• Cardinality: The number of elements in the
set, $f(X) = |X|$.

• Similarity: The Jaccard coefficient
$f(X, Y) = |X \cap Y| / |X \cup Y|$, cosine similarity
$f(X, Y) = |X \cap Y| / \sqrt{|X||Y|}$, or cardinality of the union
$f(X, Y) = |X \cup Y|$.

• Complex relations: Cardinality of the union
of multiple sets $|\bigcup_i X_i|$, number of elements
occurring in at least 2 sets
$|\{j \mid \exists\, i_1 \ne i_2,\ j \in X_{i_1} \cap X_{i_2}\}|$,
set differences, etc.

• Domain queries: When elements have associated
metadata (age, topic, activity level), we
can include this information in the sketch,
which becomes a random sample of the set.
Including this information allows us to process
domain queries, which depend on the
metadata. For example, “the number of Californians
in the union of two (or more) sets.”


Key Results 


MINHASH sketches were proposed as summary
structures which satisfy the above requirements.
There are multiple variants of the MINHASH
sketch, which are optimized for different
applications. The common thread is that the
elements $x \in U$ of the universe $U$ are assigned
random rank values $r(x)$ (which are typically
produced by a random hash function). The MINHASH
sketch $S(X)$ of a set $X$ includes order
statistics (maximum, minimum, or top-/bottom-k
values) of the set of ranks $\{r(x) \mid x \in X\}$.
Note that when we sketch multiple sets, the
same random rank assignment is common to all
sketches (we refer to this as coordination).

Before stating precise definitions for the dif- 
ferent MINHASH sketch structures, we provide 
some intuition for the power of order statistics. 
We first consider cardinality estimation. The minimum
rank value $\min_{x \in X} r(x)$ is the minimum of
$|X|$ independent random variables, and therefore,
its expectation should be smaller when the cardinality
$|X|$ is larger. Thus, the minimum rank
carries information on $|X|$. We next consider
the sketches of two sets X and Y. Recall that 
they are computed with respect to the same as- 
signment r. Therefore, the minimum rank values 
carry information on the similarity of the sets: 
in particular, when the sets are more similar, 
their minimum ranks are more likely to be equal. 
Finally, the minimum rank element of a set is 
a random sample from the set and, therefore, 
as such, can support estimation of statistics of 
the set. 

The variations of MINHASH sketches differ in 
the particular structure: how the rank assignment 
is used and the domain and distribution of the 
ranks r(x) ~ D. 


Structure 
MINHASH sketches are parameterized by an integer
$k \ge 1$, which controls a trade-off between the
size of the sketch representation and the accuracy
of approximate query results.

MINHASH sketches come in the following 
three common flavors: 


• A k-mins sketch [6, 13] includes the smallest
rank in each of $k$ independent rank assignments.
There are $k$ different rank functions
$r_i$, and the sketch $S(X) = (\tau_1, \ldots, \tau_k)$ has
$\tau_i = \min_{y \in X} r_i(y)$. When viewed as a sample,
it corresponds to sampling $k$ times with
replacement.

• A k-partition sketch [13, 14, 18], which in
the context of cardinality estimation is called
stochastic averaging, uses a single rank assignment
together with a uniform at random
mapping of items to $k$ buckets. We use
$b : U \to [k]$ for the bucket mapping and $r$ for the
rank assignment. The sketch $(\tau_1, \ldots, \tau_k)$ then
includes the item with minimum rank in each
bucket. That is, $\tau_i = \min_{y \in X \mid b(y) = i} r(y)$. If
the set $\{y \in X \mid b(y) = i\}$ is empty, the entry is
typically defined as the supremum of the domain of $r$.

• A bottom-k sketch [4, 6] $\tau_1 < \cdots < \tau_k$
includes the $k$ items with smallest rank in
$\{r(y) \mid y \in X\}$. Interpreted as a sample, it
corresponds to sampling $k$ elements without
replacement. Related uses of the same method
include KMV sketch [2], coordinated order
samples [3, 19, 21], or conditional random
sampling [17].


Note that all three flavors are the same when 
k=1. 
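The three flavors can be sketched in a few lines. The code below is only an illustration under assumptions that are not part of the text: ranks are derived from SHA-256 and mapped to [0, 1), the value 1.0 stands for the supremum of the rank domain, and the helper names `hash64` and `rank` are hypothetical.

```python
import hashlib

def hash64(x, seed):
    """Hypothetical helper: a 64-bit integer hash of (seed, x)."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def rank(x, seed=0):
    """Rank of x in [0, 1); the same (x, seed) always yields the same rank."""
    return (hash64(x, seed) >> 11) / 2**53

def k_mins(X, k):
    """Smallest rank under each of k independent rank assignments."""
    return [min((rank(y, seed=i) for y in X), default=1.0) for i in range(k)]

def k_partition(X, k):
    """Smallest rank in each of k buckets; 1.0 marks an empty bucket."""
    sketch = [1.0] * k
    for y in X:
        b = hash64(y, "bucket") % k          # bucket mapping b : U -> [k]
        sketch[b] = min(sketch[b], rank(y))
    return sketch

def bottom_k(X, k):
    """The k smallest ranks under a single rank assignment, in increasing order."""
    return sorted({rank(y) for y in X})[:k]
```

Because every sketch is computed from the same deterministic rank function, sketches of different sets are coordinated, and for `k = 1` the three functions return the same single minimum rank.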

With all three flavors, the sketch represents k 
random elements of D. When viewed as random 
samples, MINHASH sketches of different subsets 
X are coordinated, since they are generated using 
the same random rank assignments to the domain 
U. The notion of coordination is very power- 
ful. It means that similar subsets have similar 
sketches (a locality sensitive hashing property). 
It also allows us to support merges and similarity 
estimation much more effectively. Coordination 
in the context of survey sampling was introduced 
in [3] and was applied for sketching data in [4,6]. 


Rank Distribution 

Since we typically use a random hash function
$H(x) \sim D$ to generate the ranks, it always
suffices to store element identifiers instead of
ranks, which means the representation of each
rank value is $\lceil \log_2 n \rceil$ bits and the bit size of the
sketch is at most $k \log n$. This representation size
is necessary when we want to support domain
queries: the sketch of each set should identify
the element associated with each included rank,
so we can retrieve the metadata needed to evaluate
a selection predicate.
For the applications of estimating cardinalities
or pairwise similarities, however, we can work
with ranks that are not unique to elements and, in
particular, come from a smaller discrete domain.
Working with smaller ranks allows us to use
sketches of a much smaller size and also replace
the dependence of the sketch on the domain size
($O(\log n)$ per entry) by dependence on the subset
sizes. In particular, we can support cardinality
estimation and similarity estimation of subsets of
size at most $m$ with ranks of size $O(\log\log m)$.
Since the $k$ rank values used in the sketch are typically
highly correlated, the sketch $S(X)$ can be
stored using $O(\log\log m + k)$ bits in expectation
($O(\log\log m + k \log k)$ bits for similarity). This
is useful when we maintain sketches of many sets
and memory is at a premium, as when collecting
traffic statistics in IP routers.

For analysis, it is often convenient to work 
with continuous ranks, which without loss of 
generality are r ~ U[0, 1] [6], since there is a 
monotone (order preserving) transformation from 
any other continuous distribution. Using ranks 
of size O(logn) is equivalent to working with 
continuous ranks. 

In practice, we work with discrete ranks, for
example, values restricted to $1/2^i$ for integral
$i > 0$ [13] or more generally using a base $b > 1$
and using $1/b^i$. This is equivalent to drawing
a continuous rank and rounding it down to the
largest discrete point of the form $1/b^i$ that does not exceed it.


Streaming: Number of Updates 

Consider now maintaining a MINHASH sketch
in the streaming model. We maintain a sketch
$S$ of the set $X$ of elements seen so far.
When we process a new element $y$, then
if $y \in X$, the sketch is not modified. We can
show that the number of times the sketch is
modified is in expectation at most $k \ln n$, where
$n = |X|$ is the number of distinct elements in
the prefix. We provide the argument for bottom-k
sketches. It is similar with other flavors. The
probability that a new element has a rank value
that is smaller than the kth smallest rank in $X$
is the probability that it is in one of the first
$k$ positions in a permutation of size $n + 1$.
That is, the probability is 1 if $n < k$ and is
$k/(n + 1)$ otherwise. Summing over the distinct
elements $i = 1, \ldots, n$, for $n \ge k$ we obtain
$\sum_{i=1}^{n} \min\{1, k/i\} \le k + k \ln(n/k)$,
which is at most $k \ln n$ for $k \ge 3$.
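The update count can be checked numerically. The sketch below is an illustration under assumptions of its own: ranks of new distinct elements are drawn independently and uniformly with a seeded generator, and the function names are hypothetical. It simulates a bottom-k sketch and also evaluates the exact expectation, in which the i-th distinct element modifies the sketch with probability min{1, k/i}.

```python
import bisect
import math
import random

def simulate_updates(n, k, seed=0):
    """Count modifications of a bottom-k sketch over n distinct elements."""
    rng = random.Random(seed)
    sketch = []                       # the k smallest ranks seen so far, sorted
    updates = 0
    for _ in range(n):
        r = rng.random()              # rank of a new distinct element
        if len(sketch) < k or r < sketch[-1]:
            bisect.insort(sketch, r)
            del sketch[k:]            # drop the evicted largest rank, if any
            updates += 1
    return updates

def expected_updates(n, k):
    """Exact expectation: the i-th distinct element causes an update w.p. min(1, k/i)."""
    return sum(min(1.0, k / i) for i in range(1, n + 1))
```

For instance, `expected_updates(1000, 8)` stays below both `8 * (1 + math.log(1000 / 8))` and `8 * math.log(1000)`, so only a small fraction of the 1000 insertions touch the sketch.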


Inserting an Element 

We now consider inserting an element y, 
that is, obtaining a sketch S(X U {y}) from 
S(X) and y. The three sketch flavors have 
different properties and trade-offs in terms 
of insertion costs. We distinguish between 
insertions that result in an actual update of the 
sketch and insertions where the sketch is not 
modified. 


• k-mins sketch: We need to generate the rank
of $y$, $r_i(y)$, in each of $k$ different assignments
($k$ hash computations). We can then compare,
coordinate-wise, each rank with the respective
one in the sketch, taking the minimum of the
two values. This means that each insertion,
whether the sketch is updated or not, results
in $O(k)$ operations.

• Bottom-k sketch: We apply our hash function
to generate $r(y)$. We then compare $r(y)$ with
$\tau_k$. If the sketch contains fewer than $k$ ranks
($|S| < k$ or $\tau_k$ is the rank domain supremum),
then $r(y)$ is inserted to $S$. Otherwise, the sketch
is updated only if $r(y) < \tau_k$. In this case, the
largest sketch entry $\tau_k$ is discarded and $r(y)$ is
inserted to the sketch $S$. When the sketch is not
modified, the operation is $O(1)$. Otherwise, it can be
$O(\log k)$.

• k-partition sketch: We apply the hash functions
to $y$ to determine the bucket $b(y) \in [k]$
and the rank $r(y)$. To determine if an update
is needed, we compare $r(y)$ and $\tau_{b(y)}$. If the
latter is empty ($\tau_{b(y)}$ is the domain supremum)
or if $r(y) < \tau_{b(y)}$, we assign $\tau_{b(y)} \leftarrow r(y)$.
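The insertion rules for the three flavors can be sketched as follows. This is an illustration, not a reference implementation: the SHA-256-based `rank` helper, the use of 1.0 as the domain supremum, and the heap-plus-set layout of the bottom-k sketch are all assumptions introduced here. Each function reports whether the sketch was modified.

```python
import hashlib
import heapq

def hash64(x, seed):
    """Hypothetical helper: a 64-bit integer hash of (seed, x)."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def rank(x, seed=0):
    """Rank of x in [0, 1), used as the shared rank assignment."""
    return (hash64(x, seed) >> 11) / 2**53

def insert_k_mins(sketch, y):
    """O(k): one hash and one comparison per coordinate."""
    modified = False
    for i in range(len(sketch)):
        r = rank(y, seed=i)
        if r < sketch[i]:
            sketch[i], modified = r, True
    return modified

def insert_k_partition(sketch, y):
    """One bucket lookup and one comparison; 1.0 marks an empty bucket."""
    b = hash64(y, "bucket") % len(sketch)
    r = rank(y)
    if r < sketch[b]:
        sketch[b] = r
        return True
    return False

class BottomK:
    """Bottom-k sketch; the heap stores negated ranks, so -heap[0] is tau_k."""

    def __init__(self, k):
        self.k, self.heap, self.members = k, [], set()

    def insert(self, y):
        r = rank(y)
        if r in self.members:
            return False                          # y is already represented
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -r)         # sketch not yet full
        elif r < -self.heap[0]:
            evicted = -heapq.heappushpop(self.heap, -r)   # O(log k)
            self.members.discard(evicted)
        else:
            return False                          # O(1): rank too large
        self.members.add(r)
        return True

    def ranks(self):
        return sorted(self.members)
```

A k-mins or k-partition sketch starts as `[1.0] * k`, and repeated insertions of the same element leave every sketch unchanged, which is what makes the structures suitable for streams with repeated occurrences.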


Merging 
We now consider computing the sketch $S(X \cup Y)$
from the sketches $S(X)$ and $S(Y)$ of two sets
$X$ and $Y$.

For k-mins and k-partition sketches, the
sketch $S(X \cup Y)$ is simply the coordinate-wise
minimum $(\min\{\tau_1, \tau'_1\}, \ldots, \min\{\tau_k, \tau'_k\})$
of the sketches $S(X) = (\tau_1, \ldots, \tau_k)$ and
$S(Y) = (\tau'_1, \ldots, \tau'_k)$. For bottom-k sketches,
the sketch $S(X \cup Y)$ includes the $k$ smallest
rank values in $S(X) \cup S(Y)$.
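Both merge rules are short enough to state as code. The sketch below assumes sketches are plain Python lists of ranks (sorted increasingly for bottom-k); the SHA-256-based `rank` helper and the builder functions are hypothetical and included only so that the merge can be checked against a sketch computed directly from the union.

```python
import hashlib

def rank(x, seed=0):
    """Hypothetical rank function mapping (seed, x) to [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def k_mins(X, k):
    return [min((rank(y, seed=i) for y in X), default=1.0) for i in range(k)]

def bottom_k(X, k):
    return sorted({rank(y) for y in X})[:k]

def merge_coordinatewise(s1, s2):
    """Merge rule for k-mins and k-partition sketches."""
    return [min(a, b) for a, b in zip(s1, s2)]

def merge_bottom_k(s1, s2, k):
    """Merge rule for bottom-k sketches: k smallest ranks of the union."""
    return sorted(set(s1) | set(s2))[:k]
```

With overlapping sets such as `X = set(range(60))` and `Y = set(range(40, 100))`, merging the two sketches gives the same result as sketching `X | Y` directly, because the k smallest ranks of the union are among the k smallest ranks of each part.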


Estimators 
Estimators are typically derived specifically for a
given sketch flavor and rank distribution.

Cardinality estimators were pioneered by Fla- 
jolet and Martin [13], continuous ranks were con- 
sidered in [6], and lower bounds were presented 
in [1]. State-of-the-art practical solutions include 
[7, 14]. Cardinality estimation can be viewed in 
the context of the theory of point estimation: 
estimating the parameter of a distribution (the 
cardinality) from the sketch (the “outcome”). 
Estimation theory implies that current estimators 
are optimal (minimize variance) for the sketch 
[7]. Recently, historic inverse probability (HIP) 
estimators were proposed, which apply with all 
sketch types and improve variance by maintain- 
ing an approximate count alongside the MIN- 
HASH sketch [7], which is updated when the 
sketch is modified. 

Estimators for set relations were first consid- 
ered in [6] (cardinality of union, by computing 
a sketch of the union and applying a cardinality 
estimator) and [4] (the Jaccard coefficient, which 
is the ratio of intersection to union size). The 
Jaccard coefficient can be estimated on all sketch 
flavors (when ranks are not likely to have colli- 
sions) by simply looking at the respective ratio 
in the sketches themselves. In general, many set 
relations can be estimated from the sketches, and 
state-of-the-art derivation is given in [8, 10]. 
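Two simple estimators for bottom-k sketches can be sketched as follows. The formulas used here are assumptions in the sense that the text does not spell them out: the cardinality estimate $(k - 1)/\tau_k$ is the classical estimator for ranks drawn uniformly from [0, 1), and the Jaccard estimate is the fraction of the k smallest ranks of the union that appear in both sketches. The SHA-256-based `rank` helper is hypothetical.

```python
import hashlib

def rank(x):
    """Hypothetical rank function mapping x to [0, 1)."""
    digest = hashlib.sha256(f"0:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def bottom_k(X, k):
    return sorted({rank(y) for y in X})[:k]

def cardinality_bottom_k(sketch, k):
    """Estimate |X| from a bottom-k sketch with uniform ranks."""
    if len(sketch) < k:
        return len(sketch)            # the sketch holds every element: exact count
    return (k - 1) / sketch[-1]       # sketch[-1] is tau_k, the k-th smallest rank

def jaccard_bottom_k(s1, s2, k):
    """Estimate |X ∩ Y| / |X ∪ Y| from two coordinated bottom-k sketches."""
    union = sorted(set(s1) | set(s2))[:k]     # bottom-k sketch of X ∪ Y
    if not union:
        return 0.0
    common = set(s1) & set(s2)
    return sum(r in common for r in union) / len(union)
```

Larger values of `k` tighten both estimates; with `k = 256`, for example, the relative error of the cardinality estimate is typically a few percent.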


Applications 


Approximate distinct counters are widely used 
in practice. Applications include statistics collec- 
tion at IP routers and counting distinct search 
queries [15]. 

An important application of sketches, and 
their first application to estimate set relations, 
was introduced in [6]. Given a directed graph, 
and a node v, we can consider the set R(v) of 
all reachable nodes. It turns out that sketches
S(R(v)) for all nodes v in a graph can be 
computed very efficiently, in nearly linear time. 
The approach naturally extends to sketching 
neighborhoods in a graph. The sketches of nodes 
support efficient estimation of important graph 
properties, such as the distance distribution, node 
similarities (compare their relations to other 
nodes), and influence of a set of nodes [11, 12]. 

Similarity estimation using sketches was applied
to identify near-duplicate Web pages by
sketching “shingles” that are consecutive lists of
words [4]. Since then, MINHASH sketches have been
extensively applied for similarity estimation of
text documents and other entities.


Extensions 


Weighted Elements 

MINHASH sketches are summaries of sets or of
0/1 vectors. In many applications, each element
$x \in U$ has a different intrinsic nonnegative
weight $w(x) > 0$, and queries are formulated
with respect to these weights: Instead of cardinality
estimates we can consider the respective
weighted sum $\sum_{x \in X} w(x)$. Instead of the Jaccard
coefficient for the similarity of two sets $X$ and $Y$,
we may be interested in the weighted version
$\sum_{x \in X \cap Y} w(x) / \sum_{x \in X \cup Y} w(x)$. When this is the
case, to obtain more accurate query results, we
use sketches so that the inclusion probability of
an element increases with its weight. The sketch
in this case would correspond to a weighted sample.
This is implemented by using ranks which
are drawn from a distribution that depends on the
weights $w(y)$ [6, 9, 19-21].


Hash Functions 

We assumed here the availability of a truly random
hash function. In practice, observed performance
is consistent with this assumption. We mention,
however, that the amount of independence needed
was formally studied using min-wise independent
hash functions [5, 16].
Uno


Problem Definition 


Let $G$ be a graph on $n$ vertices and $m$ edges.
An edge is written $xy$ (equivalently $yx$). A
dominating set in $G$ is a set of vertices $D$
such that every vertex of $G$ is either in $D$ or
is adjacent to some vertex of $D$. It is said to
be minimal if it does not contain any other
dominating set as a proper subset. For every
vertex $x$, let $N[x]$ be $\{x\} \cup \{y \mid xy \in E\}$, and for
every $S \subseteq V$ let $N[S] := \bigcup_{x \in S} N[x]$. For $S \subseteq V$
and $x \in S$, we call any $y \in N[x] \setminus N[S \setminus x]$ a
private neighbor of $x$ with respect to $S$. The set
of minimal dominating sets of $G$ is denoted by
$\mathcal{D}(G)$. We are interested in an output-polynomial
algorithm for enumerating $\mathcal{D}(G)$, i.e., listing,
without repetitions, all the elements of $\mathcal{D}(G)$
in time bounded by $p\bigl(n + m, \sum_{D \in \mathcal{D}(G)} |D|\bigr)$
for some polynomial $p$ (DOM-ENUM for short).

DOM-ENUM is a special case of HYPERGRAPH DUALIZATION.
Let $\mathcal{N}(G)$, called the closed neighborhood
hypergraph, be the hypergraph with hyperedges
$\{N[x] \mid x \in V\}$. It is easy to see that $D$
is a dominating set of $G$ if and only if $D$ is a
transversal of $\mathcal{N}(G)$, so the minimal dominating
sets of $G$ are exactly the minimal transversals of $\mathcal{N}(G)$.
For several graph classes, the closed neighborhood
hypergraphs are subclasses of hypergraph
classes where an output-polynomial algorithm
is known for HYPERGRAPH DUALIZATION,
e.g., minor-closed classes of graphs, graphs of
bounded degree, graphs of bounded conformality,
graphs of bounded degeneracy, graphs of
logarithmic degeneracy [11, 12, 19]. So, DOM-ENUM
seems more tractable than HYPERGRAPH
DUALIZATION since there exist families of
hypergraphs that are not closed neighborhoods of
graphs [1].
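The definitions above can be made concrete with a brute-force sketch. It is only an illustration: graphs are assumed to be given as a dictionary mapping each vertex to its set of neighbors, the function names are hypothetical, and the enumeration tries all vertex subsets, so it is exponential and in no way an output-polynomial algorithm. Minimality is tested with the standard fact that a dominating set is minimal exactly when each of its vertices has a private neighbor.

```python
from itertools import combinations

def closed_nbhd(adj, x):
    """N[x]: the vertex x together with its neighbors."""
    return {x} | adj[x]

def is_dominating(adj, D):
    """D is dominating if the closed neighborhoods of D cover all vertices."""
    return set().union(*(closed_nbhd(adj, x) for x in D)) == set(adj)

def is_transversal(adj, D):
    """D hits every hyperedge N[x] of the closed neighborhood hypergraph."""
    return all(D & closed_nbhd(adj, x) for x in adj)

def private_neighbors(adj, S, x):
    """Vertices of N[x] that no other vertex of S dominates."""
    others = set().union(*(closed_nbhd(adj, z) for z in S if z != x))
    return closed_nbhd(adj, x) - others

def is_minimal_dominating(adj, D):
    return is_dominating(adj, D) and all(private_neighbors(adj, D, x) for x in D)

def minimal_dominating_sets(adj):
    """Brute force over all subsets: exponential, for tiny graphs only."""
    V = sorted(adj)
    return [
        set(D)
        for r in range(len(V) + 1)
        for D in combinations(V, r)
        if is_minimal_dominating(adj, set(D))
    ]
```

On the path with vertices 0, 1, 2 (`{0: {1}, 1: {0, 2}, 2: {1}}`), the minimal dominating sets are `{1}` and `{0, 2}`; in the latter, each endpoint is its own private neighbor.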


Key Results 


Contrary to several special cases of HYPERGRAPH
DUALIZATION in graphs (e.g., enumeration
of maximal independent sets, enumeration of
spanning forests, etc.), DOM-ENUM is equivalent
to HYPERGRAPH DUALIZATION. Indeed, it is
proved in [14] that with every hypergraph $\mathcal{H}$,
one can associate a co-bipartite graph $B(\mathcal{H})$
such that every minimal dominating set of
$B(\mathcal{H})$ is either a transversal of $\mathcal{H}$ or has size
at most 2. A consequence is that there exists a
polynomial delay polynomial space algorithm
for HYPERGRAPH DUALIZATION if and only
if there exists one for DOM-ENUM, even in
co-bipartite graphs. The reduction is moreover
asymptotically tight (with respect to polynomial
delay reductions as defined in [19]) in the sense
that there exist hypergraphs $\mathcal{H}$ such that for
every graph $G$ we cannot have $tr(\mathcal{H}) = \mathcal{D}(G)$
[14]. This intriguing result has the advantage of
bringing tools from graph structural theory to
tackle the difficult and widely open problem
HYPERGRAPH DUALIZATION. Furthermore,
until recently most graph classes where DOM-ENUM
is known to be tractable were those for
which closed neighborhood hypergraphs were
subclasses of some of the tractable hypergraph
classes for HYPERGRAPH DUALIZATION. We
will give examples of graph classes where graph
theory helps a lot to solve DOM-ENUM and
sometimes allows new techniques for the
enumeration to be introduced.

It is widely known now that every monadic 
second-order formula can be checked in poly- 
nomial time in graph classes of bounded clique- 
width [3,20]. Courcelle proved in [2] that one can 
also enumerate, with linear delay linear space, 
the solutions of every monadic second-order for- 
mula. Since one can express in monadic second- 
order logic that a subset D of vertices is a 
minimal dominating set, DOM-ENUM has a linear
delay, linear space algorithm in graph classes of bounded
clique-width. The algorithm by Courcelle is quite
ingenious: it first constructs a DAG, some sub-
trees of which correspond to the positive runs of
the tree-automata associated with the formula on
the given graph, and then enumerates these sub-
trees.

Many graph classes do not have bounded 
clique-width (interval graphs, permutation 
graphs, unit-disk graphs, etc.) and many such 
graph classes have nice structures that helped in 
the past for solving combinatorial problems, e.g., 
the clique-tree of chordal graphs, permutation 
models, etc. For some of these graph classes 
structural results can help to solve DOM-ENUM.
A common tool in the enumeration area is the
parsimonious reduction. One wants to enumer-
ate a set of objects O and instead constructs
a bijective function b : O → T such that
there is an efficient algorithm to enumerate T.
For instance, it is proved in [11, 14] that ev-
ery minimal dominating set D of a split graph
G can be characterized by D ∩ C(G), where
C(G) is the clique of G. A consequence is
that in a split graph G there is a bijection be-
tween D(G) and the set {S ⊆ C(G) | ∀x ∈
S, x has a private neighbor}, and since this latter
set is an independent system, DOM-ENUM in split
graphs admits a linear delay polynomial space
algorithm.

One can obtain other parsimonious reductions 
using graph structures. For instance, it is easy to 
check that every minimal dominating set in an 
interval graph is a collection of paths. Moreover, 
using the interval model (and ordering intervals 
from their left endpoints) every minimal domi- 
nating set can be constructed greedily by keeping 
track of the last two chosen vertices. Indeed it is 
proved in [13] that with every interval graph G 
one can associate a DAG, the maximal paths of 
which are in bijection with the minimal dominat- 
ing sets of G. The nodes of the DAG are pairs 
(x, y) such that x < y and such that x and y 
can be both in a minimal dominating set, and the 
arcs are ((x, y), (y, z)) such that (1) {x, y, z}
can be in a minimal dominating set and (2) there is
no vertex between y and z that is not dominated
by y or z. Sources are pairs (x, y) where every
interval before x is dominated by x, and sinks
are pairs (x, y) where every interval after y is
dominated by y. This reduction to maximal paths
of a DAG can be adapted to several other graph 
classes having a linear structure similar to the 
interval model, e.g. permutation graphs, circular- 
arc graphs [13]. In general, if with every graph G
in a graph class C one can associate an ordering
of the vertices such that for every subset S ⊆ V
the possible ways to extend S into a minimal
dominating set depend only on the last k vertices
of S, for some fixed constant k depending only on
C, then for every G ∈ C the enumeration of D(G)
can be reduced to the enumeration of paths in a
DAG as for interval graphs, and thus DOM-ENUM
is tractable in C [19]. This seems for instance to
be the case for d-trapezoid graphs.
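
Once such a DAG has been built, the enumeration itself is a plain depth-first traversal. The following Python sketch lists the source-to-sink paths of an arbitrary DAG; it does not construct the DAG of [13], and the dictionary representation and the function name are assumptions made for this example. Since every node of a finite DAG reaches a sink, no branch of the search is wasted.

```python
def maximal_paths(succ):
    """Yield every source-to-sink path of a DAG given as {node: [successors]}."""
    has_pred = {v for outs in succ.values() for v in outs}
    sources = [u for u in succ if u not in has_pred]

    def extend(path):
        outs = succ.get(path[-1], [])
        if not outs:            # sink reached: the path is maximal
            yield tuple(path)
            return
        for v in outs:
            path.append(v)
            yield from extend(path)
            path.pop()

    for s in sources:
        yield from extend([s])

dag = {"s": ["a", "b"], "a": ["t"], "b": ["t", "u"], "t": [], "u": []}
paths = list(maximal_paths(dag))
```

For the toy DAG above the generator produces the three paths s-a-t, s-b-t and s-b-u, in that order.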

Parsimonious reductions between graph
classes can also be defined. For instance, the
completion of a graph G, i.e., the set of edges
that can be added to G without changing D(G),
is characterized in [11, 14]. This characterization
led the authors to prove that the completion
of every P6-free chordal graph is a split graph,
which results in a linear delay polynomial space
algorithm for DOM-ENUM in P6-free chordal
graphs.

The techniques developed by the HYPER- 
GRAPH DUALIZATION community combined 
with graph structural theory can give rise to new 
tractable cases of DOM-ENUM. For instance, 
the main drawback of Berge’s algorithm is that
at some level computed transversals are not
necessarily subsets of solutions, and this prevents
one from obtaining an output-polynomial algorithm
since the computed set may be arbitrarily large
compared to the solution set [21]. One way to
overcome this difficulty consists in choosing
some levels l_1, ..., l_k of Berge’s algorithm such
that every computed set at level l_i is a subset
of a solution at level l_{i+1}. A difficulty with
that scheme is to compute all the descendants
in level l_{i+1} of a transversal in level l_i. This
idea combined with the structure of minimal 
dominating sets in line graphs is used to derive a 
polynomial delay polynomial space algorithm for 
DOM-ENUM in line graphs [15]. A consequence 
is that there is a polynomial delay polynomial 
space algorithm to list the set of minimal edge 
dominating sets in graphs. 

Another well-known technique in the enumeration
area is backtracking. Start from the empty
set, and in each iteration choose a vertex x and
partition the problem into two sub-problems:
the enumeration of minimal dominating sets
containing x and the enumeration of those
not containing x. At each step we have a
set X to include in the solution and a set
Y not to include. If at each step one can
solve the EXTENSION PROBLEM, i.e., decide whether
there is a minimal dominating set containing
X and not intersecting Y, then DOM-ENUM
admits a polynomial delay polynomial space
algorithm. However, the EXTENSION PROBLEM
is NP-complete in general [19] and even in
split graphs [16]. But sometimes structure
helps. For instance, in split graphs whenever
X ∪ Y ⊆ C(G), the EXTENSION PROBLEM
is polynomial [11, 14] and was the key for the
linear delay algorithm. Another special case
of the EXTENSION PROBLEM is proved to be 
polynomial in chordal graphs using the clique 
tree of chordal graphs and is also the key 
to prove that DOM-ENUM in chordal graphs 
admits a polynomial delay polynomial space 
algorithm [16]. The algorithm relies heavily on the
clique tree and is a nested combination of several
enumeration algorithms.
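
The backtracking scheme described above can be sketched in Python as follows. The extension oracle used here is a brute-force, exponential-time stand-in, so the sketch only illustrates the control flow; a polynomial delay algorithm would replace it with a class-specific polynomial test such as those of [11, 14, 16]. All names are assumptions made for this example.

```python
from itertools import combinations

def dominates(adj, d):
    return all(x in d or any(y in d for y in adj[x]) for x in adj)

def is_minimal_dominating(adj, d):
    return dominates(adj, d) and all(not dominates(adj, d - {x}) for x in d)

def extension_oracle(adj, include, exclude):
    """Brute-force stand-in for the EXTENSION PROBLEM: is there a minimal
    dominating set D with include contained in D and D disjoint from exclude?"""
    free = [v for v in adj if v not in include and v not in exclude]
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            if is_minimal_dominating(adj, set(include) | set(extra)):
                return True
    return False

def backtrack(adj, order=None, include=frozenset(), exclude=frozenset()):
    """Yield all minimal dominating sets by branching on one vertex at a time."""
    order = list(adj) if order is None else order
    if not extension_oracle(adj, include, exclude):
        return                      # prune: no solution below this node
    if not order:
        yield include               # every vertex decided: include is a solution
        return
    x, rest = order[0], order[1:]
    yield from backtrack(adj, rest, include | {x}, exclude)
    yield from backtrack(adj, rest, include, exclude | {x})

# Path a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
sols = list(backtrack(path))
```

Because a branch is explored only when the oracle answers yes, every leaf reached by the search is a solution and no solution is produced twice.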


Open Problems 


1. The first major challenge is to find an output- 
polynomial algorithm for DOM-ENUM, even 
in co-bipartite graphs. One way to address 
this problem is to understand the structure of 
minimal dominating sets in a graph. Failing 
to solve this problem, can graphs help to 
improve the quasi-polynomial time algorithm 
by Fredman and Khachiyan [7]? 

2. So far, although the techniques used to solve
DOM-ENUM in many graph classes are well
known, deep structural graph theory has not
been used, and the graph structures exploited are more or
less ad hoc. Can we unify all these results and
obtain at the same time new positive results?
Indeed, there are several well-studied graph
classes where the status of DOM-ENUM is
still open: bipartite graphs, unit-disk graphs,
and graphs of bounded expansion, to cite a few.
Are the tools developed so far sufficient to address these
graph classes?

3. There are several well-studied variants of the 
dominating set problem, in particular total 
dominating set and connected dominating set 
(see the monographs [9, 10]). It is proved in 
[14] that the enumeration of minimal total 
dominating sets and minimal connected dom- 
inating sets in split graphs is equivalent to 
HYPERGRAPH DUALIZATION. This is some-
what surprising, and we do not yet understand
why such small variations make the problem
difficult even in split graphs. Can we explain
this situation?


4. From [14] we know that the enumeration of
minimal connected dominating sets is harder
than HYPERGRAPH DUALIZATION. Are both 
problems equivalent? Can we find a graph 
class C where each graph in C has a non- 
exponential number of minimal connected 
dominating sets, but minimum connected 
dominating set is NP-complete? Notice that 
if a class of graphs C has a polynomially 
bounded number of minimal separators, 
then the enumeration of minimal connected 
dominating sets can be reduced to DOM- 
ENUM [14]. 


5. A related question to DOM-ENUM is a tight
bound for the number of minimal dominat-
ing sets in graphs. The best upper bound is
O(1.7159^n) and the best lower bound is 15^{n/6}
[6]. For several graph classes, tight bounds
were obtained [4, 8]. Prove that 15^{n/6} is the
upper bound or find the tight bound.

6. Another subject related to DOM-ENUM is the
counting of (minimal) dominating sets in time
polynomial in the size of the input graph. While the counting
of dominating sets is a #P-hard problem and
has been investigated in the past [5, 17, 18],
not much is known about the counting of min-
imal dominating sets; one can cite a few exam-
ples: graphs of bounded clique-width [2], and
interval, permutation and circular-arc graphs
[13]. If we define for G the minimal domina-
tion polynomial MD(G, x), that is, the gener-
ating function of its minimal dominating sets,
for which graph classes can this polynomial
be computed? Does it have a (linear) recursive
definition? For which values of x can we evalu-
ate it?


Problem Definition


A minimal perfect hash function (MPHF) is a
(data structure providing a) bijective map from a
set S of n keys to the set of the first n natural
numbers. In the static case (i.e., when the set S
is known in advance), there is a wide spectrum of
solutions available, offering different trade-offs in
terms of construction time, access time, and size
of the data structure.


Problem Formulation 

Let [x] denote the set of the first x natural num-
bers. Given a positive integer u = 2^w and a set
S ⊆ [u] with |S| = n, a function h : S → [m] is
perfect if and only if it is injective and minimal if
and only if m = n. An (M)PHF is a data structure
that allows one to evaluate a (minimal) perfect
function of this kind.

When comparing different techniques for 
building MPHFs, one should be aware of the 
trade-offs between construction time, evaluation 
time, and space needed to store the function. A 
general tenet is that evaluation should happen 
in constant time (with respect to n), whereas 
construction is only required to be feasible in a 
practical sense. 

Space is often the most important aspect when 
a construction is taken into consideration; usu- 
ally space is computed in an exact (i.e., non- 
asymptotic) way. Some exact space lower bounds 
for this problem are known (they are pure space 
bounds and do not consider evaluation time): 
Fredman and Komlós proved [4] that no MPHF
can occupy less than n log e + log log u + O(log n)
bits, as soon as u ≥ n^{2+ε}; this bound is essen-
tially tight [9, Sect. 1.2.3, Thm. 8], disregarding
evaluation time.


Key Results 


One fundamental question is how close to the
space lower bound n log e + log log u one can
stay if the evaluation must be performed in con-
stant time. The best theoretical results in this
direction are given in [6], where an n log e +
log log u + O(n (log log n)^2 / log n + log log log u)
technique is provided (optimal up to an additive
factor) whose construction takes linear time in
expectation. The technique is only of theoretical 
relevance, though, as it yields a low number of 
bits per key only for unrealistically large values 
of n. 

We will describe two practical solutions: the
first one provides a structure that is simple, con-
stant time, and asymptotically space optimal (i.e.,
O(n)); its actual space requirement is about twice
the lower bound. The second one can potentially
approach the lower bound, even if in practice this
would require an unfeasibly long construction
time; nonetheless, it provides the smallest known
practical data structure, occupying about 1.44
times the lower bound.

We present the two constructions in some 
detail below; they both use the idea of building 
an MPHF out of a PHF that we explain first. 


From a PHF to a MPHF 
Given a set T ⊆ [m] of size |T| = n, define
rank_T : [m] → [n] by letting


rank_T(p) = |{i ∈ T | i < p}|.


Clearly, every PHF g : S → [m] can be
combined with rank_{g(S)} : [m] → [n] to ob-
tain an MPHF. Jacobson [7] offers a constant-
time implementation for the rank data structure
that uses o(m) additional bits besides the set T
represented as an array of m bits; furthermore,
constant-time solutions exist that require as little
as O(n/(log n)^c) bits (for any desired c) over the
information-theoretical lower bound log C(m, n), where
C(m, n) denotes the binomial coefficient [10].

For practical solutions, see [5, 11]. For very
sparse sets T, the Elias-Fano scheme can be
rewarding in terms of space, but query time
becomes O(log(m/n)).
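
The composition of a PHF with a rank function can be sketched in a few lines of Python. This is only an illustration of the idea: the prefix-sum array used here takes far more than the o(m) extra bits of the succinct structures cited above, and the function names and the toy hash g are assumptions made for this example.

```python
def build_mphf(g, keys, m):
    """Turn a perfect hash g: keys -> range(m) into a minimal one via rank."""
    bits = [0] * m
    for x in keys:
        assert bits[g(x)] == 0, "g is not injective on keys"
        bits[g(x)] = 1
    # rank[p] = number of set bits strictly before position p
    rank = [0] * (m + 1)
    for p in range(m):
        rank[p + 1] = rank[p] + bits[p]
    return lambda x: rank[g(x)]

keys = [10, 22, 35, 47]
g = lambda x: x % 7          # perfect on keys: values 3, 1, 0, 5
h = build_mphf(g, keys, 7)
values = sorted(h(x) for x in keys)
```

Here g occupies positions 0, 1, 3 and 5 of a 7-slot table, and ranking those positions yields the values 0, 1, 2, 3, so h is minimal.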


The Hypergraph-Based Construction 

We start by recalling the hypergraph-based
construction presented in [8]. Their method,
albeit originally devised only for order-
preserving MPHF, can be used to store compactly
an arbitrary r-bit function f : S → [2^r].
The construction draws three hash functions
h_0, h_1, h_2 : S → [γn] (with γ ≈ 1.23)
and builds a 3-hypergraph with one hyperedge
(h_0(x), h_1(x), h_2(x)) for every x ∈ S. With
positive probability, this hypergraph does not
have a nonempty 2-core, that is, its hyperedges
can be sorted in such a way that every hyperedge
contains (at least) a vertex that never appeared
before, called the hinge. Equivalently, the set of
equations (in the variables a_i)


f(x) = (a_{h_0(x)} + a_{h_1(x)} + a_{h_2(x)}) mod 2^r


has a solution that can be found by a hypergraph-
peeling process in time O(n). Storing the func-
tion consists in storing γn integers of r bits each
(the array a_i), so γrn bits are needed (excluding
the bits needed for the hash functions); function
evaluation takes constant time.
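
A minimal Python sketch of the peeling process and of the resulting equation solving is given below. Several details are assumptions made for this example and are not taken from [8]: the hash functions are simulated with SHA-256, each of the three vertices of a hyperedge is drawn from its own third of the table (so that they are always distinct), and the construction simply retries with a new seed when peeling fails.

```python
import hashlib

def _hash(x, seed, i, size):
    digest = hashlib.sha256(f"{seed}:{i}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size

def edge_of(x, seed, seg):
    # Assumption: vertex i of the edge lies in the i-th third of the table.
    return tuple(i * seg + _hash(x, seed, i, seg) for i in range(3))

def peel(edges, num_vertices):
    """Return [(edge_index, hinge)] in peeling order, or None if a 2-core remains."""
    incident = [set() for _ in range(num_vertices)]
    for e, vs in enumerate(edges):
        for v in vs:
            incident[v].add(e)
    stack = [v for v in range(num_vertices) if len(incident[v]) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue
        (e,) = incident[v]
        order.append((e, v))
        for w in edges[e]:
            incident[w].discard(e)
            if len(incident[w]) == 1:
                stack.append(w)
    return order if len(order) == len(edges) else None

def build(f, keys, r, gamma=1.23, max_tries=100):
    seg = int(gamma * len(keys)) // 3 + 1
    for seed in range(max_tries):
        edges = [edge_of(x, seed, seg) for x in keys]
        order = peel(edges, 3 * seg)
        if order is not None:
            break
    else:
        raise RuntimeError("no peelable hypergraph found")
    a = [0] * (3 * seg)
    mask = (1 << r) - 1
    # Reverse peeling order: the hinge of an edge occurs in no edge that was
    # peeled later, so fixing it cannot break an equation already satisfied.
    for e, hinge in reversed(order):
        others = sum(a[v] for v in edges[e] if v != hinge)
        a[hinge] = (f(keys[e]) - others) & mask
    return a, seed, seg

def evaluate(x, a, seed, seg, r):
    return sum(a[v] for v in edge_of(x, seed, seg)) & ((1 << r) - 1)

keys = [f"key{i}" for i in range(50)]
table = {k: (7 * i) % 256 for i, k in enumerate(keys)}
a, seed, seg = build(table.get, keys, r=8)
ok = all(evaluate(k, a, seed, seg, 8) == table[k] for k in keys)
```

Evaluation reads three array cells and adds them modulo 2^r, which is the constant-time lookup described above; on keys outside S the returned value is arbitrary.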

In [3] the authors (who were not aware
of [8]) present a “mutable Bloomier filter,” which
is formed by a PHF and a data storage indexed
by the output of the PHF. The idea is to let
r = 2 and to decide f after the hinges have
been successfully determined, letting f(x) be the
index of the hinge of the hyperedge associated
with x, that is, the index i ∈ {0, 1, 2} such that
h_i(x) is the hinge of (h_0(x), h_1(x), h_2(x)). This
way, the function g : S → [m] defined by g(x) =
h_{f(x)}(x) is a PHF, and it is stored in 2γn bits.

The fact that combining such a construction 
with a ranking data structure might actually pro- 
vide an MPHF was noted in [2] (whose authors 
did not know [3]). An important implementation 
trick that makes it possible to get ~2.65 bits per 
key is the fact that r = 2, but actually we need to 
store three values. Thus, when assigning values to 
hinges, we can use 3 (which modulo 3 is equiva- 
lent to zero) in place of 0: in this way, hinges are 
exactly associated with those a_i that are nonzero,
which makes it possible to build a custom ranking 
structure that does not use an additional bit vec- 
tor, but rather ranks directly nonzero pairs of bits. 


The “Hash, Displace, and Compress” 
Construction 

A completely different approach is suggested
in [1]: once more, they first build a PHF h : S →
[m] where m = (1 + ε)n for some ε > 0. The
set S is first divided into r buckets by means of
a first-level hash function g : S → [r]; the r
buckets g^{-1}(0), ..., g^{-1}(r − 1) are sorted by their
cardinalities, with the largest buckets first.

Let B_0, ..., B_{r−1} be the buckets and let
φ_0, φ_1, φ_2, ... be a sequence of independent
fully random hash functions S → [m]. For every
i = 0, ..., r − 1, the construction algorithm
determines the smallest index p_i such that φ_{p_i}
is injective when applied to B_i and moreover
φ_{p_i}(B_i) is disjoint from ⋃_{j<i} φ_{p_j}(B_j). A careful
analysis shows that this construction takes linear
time in expectation (the choice of r impacts on
construction time) and that the expected p_i is
bounded by a constant, so the indices can be
stored in O(log(1/ε)m) space.
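
The bucket-by-bucket search for the indices p_i can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of [1]: the fully random functions φ_p are simulated with SHA-256 seeded by p, the table size is rounded up to int((1 + ε)n) + 1, and the final compression of the index array is omitted.

```python
import hashlib

def _h(x, seed, size):
    d = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(d[:8], "big") % size

def build_phf(keys, r, eps=0.25):
    m = int((1 + eps) * len(keys)) + 1
    buckets = [[] for _ in range(r)]
    for x in keys:
        buckets[_h(x, "g", r)].append(x)      # first-level hash g
    occupied = [False] * m
    index = [0] * r
    # Largest buckets first: they are the hardest to place.
    for b in sorted(range(r), key=lambda b: -len(buckets[b])):
        p = 0
        while True:
            slots = {_h(x, p, m) for x in buckets[b]}
            injective = len(slots) == len(buckets[b])
            if injective and not any(occupied[s] for s in slots):
                break
            p += 1
        index[b] = p
        for s in slots:
            occupied[s] = True
    return index, m

def lookup(x, index, m):
    return _h(x, index[_h(x, "g", len(index))], m)

keys = [f"key{i}" for i in range(100)]
index, m = build_phf(keys, r=20)
slots = [lookup(x, index, m) for x in keys]
```

With 100 keys and r = 20 the average bucket size is 5; the resulting function is perfect on the keys but not minimal, so a rank structure would still be needed to obtain an MPHF.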

In practice, if r is chosen so that the average 
bucket size is ~5, it is possible to obtain an 
MPHF using ~2.05 bits per key with a construc-
tion time that is still feasible, albeit an order
of magnitude larger than the hypergraph-based 
construction. 

The authors of [1] also discuss a variant that 
can directly build MPHFs, but the construction 
time is no longer linear in expectation; moreover, 
from a practical viewpoint it is useful to enlarge 
slightly the buckets so that they have a prime size 
(this makes it easier to generate a good sequence 
of hash functions [1]). 


Open Problems 


Improving construction and query time in prac-
tice and getting closer to the space lower bound
while keeping the construction feasible are the main
open problems about MPHFs, as there are al-
ready known constructions that close the gap
asymptotically.


URLs to Code and Data Sets 


The Sux4J library (http://sux4j.di.unimi.it/)
provides Java implementations of the methods
we discussed. The CMPH library
(http://cmph.sourceforge.net/) provides C implementations.


Problem Definition


Overview 
Minimum bisection is a basic representative of 
a family of discrete optimization problems deal- 
ing with partitioning the vertices of an input 
graph. Typically, one wishes to minimize the 
number of edges going across between the dif- 
ferent pieces, while keeping some control on the 
partition, say by restricting the number of pieces 
and/or their size. (This description corresponds 
to an edge-cut of the graph; other variants cor- 
respond to a vertex-cut with similar restrictions.) 
In the minimum bisection problem, the goal is to 
partition the vertices of an input graph into two 
equal-size sets, such that the number of edges 
connecting the two sets is as small as possible. 
In a seminal paper in 1988, Leighton and 
Rao [14] devised for MINIMUM-BISECTION 
a logarithmic-factor bicriteria approximation 
algorithm. (A bicriteria approximation algorithm 
partitions the vertices into two sets each 
containing at most 2/3 of the vertices, and its 
value, i.e., the number of edges connecting the 
two sets, is compared against that of the best 
partition into equal-size sets.) Their algorithm 
has found numerous applications, but the
question of finding a true approximation with
a similar factor remained open for over a decade.
In 1999, Feige and Krauthgamer [6]
devised the first polynomial-time algorithm that 
approximates this problem within a factor that is 
polylogarithmic (in the graph size). 


Cuts and Bisections 

Let G = (V, E) be an undirected graph with
n = |V| vertices, and assume for simplicity that
n is even. For a subset S of the vertices, let
S̄ = V \ S. The cut (also known as cutset) (S, S̄)
is defined as the set of all edges with one endpoint
in S and one endpoint in S̄. These edges are said
to cross the cut, and the two sets S and S̄ are
called the two sides of the cut.

Assume henceforth that G has nonnegative
edge-weights. (In the unweighted version, every
edge has a unit weight.) The cost of a cut (S, S̄)
is then defined to be the total edge-weight of all
the edges crossing the cut.

A cut (S, S̄) is called a bisection of G if
its two sides have equal cardinality, namely
|S| = |S̄| = n/2. Let b(G) denote the minimum
cost of a bisection of G.
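
To make these definitions concrete, the following Python sketch computes the cost of a cut and finds b(G) by exhaustive search over all subsets of size n/2. It runs in exponential time and is only meant for tiny examples; the edge-weight dictionary and the function names are assumptions made for this illustration.

```python
from itertools import combinations

def cut_cost(weights, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))

def min_bisection(vertices, weights):
    """Brute-force b(G): try every subset of size n/2 (exponential time)."""
    vertices = list(vertices)
    half = len(vertices) // 2
    return min((cut_cost(weights, s), s) for s in combinations(vertices, half))

# 4-cycle 0-1-2-3-0 with one heavy edge
weights = {(0, 1): 5, (1, 2): 1, (2, 3): 1, (3, 0): 1}
cost, side = min_bisection(range(4), weights)
```

In this example the cheapest bisection keeps the heavy edge (0, 1) on one side and cuts the two unit-weight edges (1, 2) and (3, 0), so b(G) = 2.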


Problem 1 (MINIMUM-BISECTION) 

INPUT: An undirected graph G with nonnegative 
edge-weights. 

OUTPUT: A bisection (S, S̄) of G that has mini-
mum cost.


This definition has a crucial difference from the 
classical MINIMUM-CUT problem (see e.g., [10] 
and references therein), namely, there is a restric- 
tion on the sizes of the two sides of the cut. As 
it turns out, MINIMUM-BISECTION is NP-hard 
(see [9]), while MINIMUM-CUT can be solved in 
polynomial time. 


Balanced Cuts and Edge Separators 

The above rather basic definition of mini-
mum bisection can be extended in several
ways. Specifically, one may require only an
upper bound on the size of each side. For
0 < β < 1, a cut (S, S̄) is called β-balanced if
max{|S|, |S̄|} ≤ βn. Note that the latter requirement
implies min{|S|, |S̄|} ≥ (1 − β)n. In this
terminology, a bisection is a 1/2-balanced cut.


Problem 2 (β-BALANCED-CUT)

INPUT: An undirected graph G with nonnegative
edge-weights.

OUTPUT: A β-balanced cut (S, S̄) of G with
max{|S|, |S̄|} ≤ βn, that has cost as small as
possible.


The special case of β = 2/3 is commonly referred
to as the EDGE-SEPARATOR problem.

In general, the sizes of the two sides may be
specified in advance arbitrarily (rather than being
equal); in this case the input contains a number
k, and the goal is to find a cut (S, S̄) such
that |S| = k. One may also wish to divide the
graph into more than two pieces of equal size,
and then the input contains a number r > 2, or
alternatively, to divide the graph into r pieces
whose sizes are k_1, ..., k_r, where the numbers k_i
are prescribed in the input; in either case, the
goal is to minimize the number of edges crossing
between different pieces.


Problem 3 (PRESCRIBED-PARTITION) 

INPUT: An undirected graph G = (V, E) with
nonnegative edge-weights, and integers k_1, ..., k_r
such that Σ_i k_i = |V|.

OUTPUT: A partition V = V_1 ∪ ··· ∪ V_r of G
with |V_i| = k_i for all i, such that the total edge-
weight of edges whose endpoints lie in different
sets V_i is as small as possible.


Key Results 


The main result of Feige and Krauthgamer [6]
is an approximation algorithm for MINIMUM-
BISECTION. The approximation factor they orig-
inally claimed is O(log^2 n), because it used the
algorithm of Leighton and Rao [14]; however,
by using instead the algorithm of [2], the factor
immediately improves to O(log^{1.5} n).


Theorem 1 Minimum-Bisection can be approx-
imated in polynomial time within O(log^{1.5} n)
factor. Specifically, the algorithm produces for an
input graph G a bisection (S, S̄) whose cost is at
most O(log^{1.5} n) · b(G).


The algorithm immediately extends to similar 
results for related and/or more general problems 
that are defined above. 


Theorem 2 β-Balanced-Cut (and in particular
Edge-Separator) can be approximated in polyno-
mial time within O(log^{1.5} n) factor.


Theorem 3 Prescribed-Partition can be approx-
imated in time n^{O(r)} to within O(log^{1.5} n) factor.


For all three problems above, the approximation 
ratio improves to O(logn) for the family of 
graphs excluding a fixed minor (which includes 
in particular planar graphs). For simplicity, this 
result is stated for Minimum-Bisection.


Theorem 4 In graphs excluding a fixed graph
as a minor (e.g., planar graphs), the problems
(i) Minimum-Bisection, (ii) β-Balanced-Cut, and
(iii) Prescribed-Partition with fixed r can all be
approximated in polynomial time within factor
O(log n).


It should be noted that all these results can be
generalized further, including vertex-weights and
terminal vertices (s-t pairs); see [6, Sect. 5].


Related Work 

A bicriteria approximation algorithm for β-
balanced cut returns a cut that is β′-balanced
for a predetermined β′ > β. For bisection, for
example, β = 1/2 and typically β′ = 2/3.

The algorithms in the above theorems use
(in a black-box manner) an approximation
algorithm for a problem called minimum
quotient-cuts (or equivalently, sparsest-cut with
uniform-demands). For this problem, the best
approximation currently known is O(√(log n))
for general graphs due to Arora, Rao, and
Vazirani [2], and O(1) for graphs excluding
a fixed minor due to Klein, Plotkin, and
Rao [13]. These approximation algorithms
for minimum quotient-cuts immediately give
a polynomial time bicriteria approximation
(sometimes called pseudo-approximation)
for MINIMUM-BISECTION. For example, in
general graphs the algorithm is guaranteed to
produce a 2/3-balanced cut whose cost is at
most O(√(log n)) · b(G). Note however that
a 2/3-balanced cut does not provide a good
approximation for the value of b(G). For instance,
if G consists of three disjoint cliques of equal
size, an optimal 2/3-balanced cut has no edges,
whereas b(G) = Ω(n^2). For additional related
work, including approximation algorithms for
dense graphs, for directed graphs, and for other
graph partitioning problems, see [6, Sect. 1]
and the references therein.
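
The three-clique example can be checked by brute force on a small instance. The Python sketch below builds three disjoint cliques on k = 4 vertices each and compares the cheapest 2/3-balanced cut with the cheapest bisection; the function names are assumptions made for this illustration, and the search is exponential in n.

```python
from itertools import combinations

def three_cliques(k):
    """Vertex count and edge list of three disjoint cliques on k vertices each."""
    edges = []
    for c in range(3):
        block = range(c * k, (c + 1) * k)
        edges.extend(combinations(block, 2))
    return 3 * k, edges

def min_balanced_cut(n, edges, max_side):
    """Cheapest cut whose larger side has at most max_side vertices (brute force)."""
    best = None
    # The smaller side has between n - max_side and n // 2 vertices.
    for size in range(n - max_side, n // 2 + 1):
        for side in combinations(range(n), size):
            s = set(side)
            cost = sum((u in s) != (v in s) for u, v in edges)
            best = cost if best is None else min(best, cost)
    return best

n, edges = three_cliques(4)
pseudo = min_balanced_cut(n, edges, 2 * n // 3)   # 2/3-balanced cut
exact = min_balanced_cut(n, edges, n // 2)        # bisection
```

Separating one clique from the other two costs nothing, whereas any bisection must split at least one clique; splitting a clique of size k evenly cuts (k/2)^2 = n^2/36 edges, which is 4 for k = 4.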


Applications 


One major motivation for MINIMUM-BISECTION, 
and graph partitioning in general, is a divide-
and-conquer approach to solving a variety of
optimization problems, especially in graphs, 
see e.g., [15, 16]. In fact, these problems 
arise naturally in a wide range of practical 
settings such as VLSI design and image 
processing; sometimes, the motivation is 
described differently, e.g., as a clustering task. 

Another application of MINIMUM-BISECTION
is in assignment problems, of a form that is
common in parallel systems and in scientific
computing: jobs need to be assigned to machines
in a balanced way, while assigning certain pairs
of jobs to the same machine as much as possible.
For example, consider assigning n jobs to 2
machines, when the amount of communication
between every two jobs is known, and the goal
is to have equal load (number of jobs) on
each machine and to minimize the total
communication that goes between the machines.
Clearly, this last problem can be restated as
MINIMUM-BISECTION in a suitable graph.

It should be noted that in many of these 
settings, a true approximation is not absolutely 
necessary, and a bicriteria approximation may 
suffice. Nevertheless, the algorithms stated in 
section “Key Results” have been used to design 
algorithms for other problems, such as (1) an 
approximation algorithm for minimum bisection 
in k-uniform hypergraphs [3]; (2) an approxi- 
mation algorithm for a variant of the minimum 
multicut problem [17]; and (3) an algorithm that 
efficiently certifies the unsatisfiability of random 
2k-SAT with sufficiently many clauses [5]. 

From a practical perspective, numerous 
heuristics (algorithms without worst-case 
guarantees) for graph partitioning have been 
proposed and studied, see [1] for an extensive 
survey. For example, one of the most famous
heuristics is Kernighan and Lin’s local search
heuristic for minimum bisection [11].


Open Problems 


Currently, there is a large gap between the
O(log^{1.5} n) approximation ratio for MINIMUM-
BISECTION achieved by Theorem 1 and the
hardness of approximation results known for
it. As mentioned above, MINIMUM-BISECTION
is known to be NP-hard (see [9]).

The problem is not known to be APX-hard,
but several results provide evidence towards this
possibility. Bui and Jones [4] show that for every
fixed ε > 0, it is NP-hard to approximate the
minimum bisection within an additive term of
n^{2−ε}. Feige [7] showed that if refuting 3SAT
is hard on average on a natural distribution of
inputs, then for every fixed ε > 0 there is no
(4/3 − ε)-approximation algorithm for minimum
bisection. Khot [12] proved that minimum bisec-
tion does not admit a polynomial-time approxi-
mation scheme (PTAS) unless NP has random-
ized sub-exponential time algorithms.

Taking a broader perspective, currently there 
is a (multiplicative) gap of O(logn) between the 
approximation ratio for MINIMUM-BISECTION 
and that of minimum quotient-cuts (and thus also 
to the factor achieved by bicriteria approxima-
tion). It would be interesting to know whether this gap can be
reduced, e.g., by using the algorithm of [2] in
a non-black box manner.

The vertex-cut version of MINIMUM-
BISECTION is defined as follows: the goal is
to partition the vertices of the input graph into
V = A ∪ B ∪ S with |S| as small as possible,
under the constraints that max{|A|, |B|} ≤ n/2
and no edge connects A with B. It is not known
whether a polylogarithmic factor approximation
can be attained for this problem. It should be
noted that the same question regarding the
directed version of MINIMUM-BISECTION was
answered negatively by Feige and Yahalom [8].


Problem Definition


This problem is concerned with the most efficient 
use of redundancy in load balancing on faulty 
parallel links. More specifically, this problem 
considers a setting where some messages need 
to be transmitted from a source to a destination 
through some faulty parallel links. Each link fails 
independently with a given probability, and in 
case of failure, none of the messages assigned 
to it reaches the destination. (This assumption 
is realistic if the messages are split into many 
small packets transmitted in a round-robin fash- 
ion. Then the successful delivery of a message 
requires that all its packets should reach the 
destination.) An assignment of the messages to 
the links may use redundancy, i.e., assign mul- 
tiple copies of some messages to different links. 
The reliability of a redundant assignment is the 
probability that every message has a copy on
some active link, thus managing to reach the
destination. Redundancy increases reliability, but 
also increases the message load assigned to the 
links. A good assignment should achieve high 
reliability and keep the maximum load of the 
links as small as possible. 

The reliability of a redundant assignment de- 
pends on its structure. In particular, the reliability 
of different assignments putting the same load 
on every link and using the same number of 
copies for each message may vary substantially 
(e.g., compare the reliability of the assignments 
in Fig. 1). The crux of the problem is to find an 
efficient way of exploiting redundancy in order to 
achieve high reliability and low maximum load. 
(If one does not insist on minimizing the max- 
imum load, a reliable assignment is constructed 
by assigning every message to the most reliable 
links.) 

The work of Fotakis and Spirakis [1] for- 
mulates the scenario above as an optimization 
problem called Minimum Fault-Tolerant Conges- 
tion and suggests a simple and provably effi- 
cient approach of exploiting redundancy. This 
approach naturally leads to the formulation of 
another interesting optimization problem, namely 
that of computing an efficient fault-tolerant parti- 
tion of a set of faulty parallel links. [1] presents 
polynomial-time approximation algorithms for 
computing a fault-tolerant partition of the links 
and proves that combining fault-tolerant parti- 
tions with standard load balancing algorithms 
results in a good approximation to Minimum 
Fault-Tolerant Congestion. To the best knowl- 
edge of the entry authors, this work is the first to 
consider the approximability of computing a re- 
dundant assignment that minimizes the maximum 
load of the links subject to the constraint that 
random faults should be tolerated with a given 
probability. 


Notations and Definitions 

Let M denote a set of m faulty parallel links
connecting a source s to a destination t, and
let J denote a set of n messages to be trans-
mitted from s to t. Each link i has a rational
capacity c_i ≥ 1 and a rational failure probabil-
ity f_i ∈ (0, 1). Each message j has a rational
size s_j ≥ 1. Let f_max = max_{i∈M} {f_i} denote the
failure probability of the most unreliable link.
Particular attention is paid to the special case of
identical capacity links, where all capacities are
assumed to be equal to 1.

Minimum Congestion Redundant Assignments, Fig.
1 Two redundant assignments of 4 unit size messages to
8 identical links. Both assign every message to 4 links
and 2 messages to every link. The corresponding graph
is depicted below each assignment. The assignment on
the left is the most reliable 2-partitioning assignment φ_2.
Lemma 3 implies that for every failure probability
f, φ_2 is at least as reliable as any other assign-
ment φ with Cong(φ) ≤ 2. For instance, φ_2 is at
least as reliable as the assignment on the right. In-
deed the reliability of the assignment on the right
is 1 − 4f^4 + 2f^6 + 4f^7 − 3f^8, which is bounded
from above by Rel(φ_2) = 1 − 2f^4 + f^8 for all
f ∈ [0, 1]

The reliability of a set of links M′, denoted
Rel(M′), is the probability that there is an active
link in M′. Formally, Rel(M′) = 1 − ∏_{i∈M′} f_i.
The reliability of a collection of disjoint link
subsets ℳ = {M_1, ..., M_ν}, denoted Rel(ℳ), is
the probability that there is an active link in every
subset of ℳ. Formally,


\[ \mathrm{Rel}(\mathcal{M}) = \prod_{\ell=1}^{\nu} \mathrm{Rel}(M_\ell) = \prod_{\ell=1}^{\nu} \Bigl( 1 - \prod_{i \in M_\ell} f_i \Bigr) \]


A redundant assignment φ : J → 2^M \ ∅ is
a function that assigns every message j to a non-
empty set of links φ(j) ⊆ M. An assignment
φ is feasible for a set of links M′ if for every
message j, φ(j) ∩ M′ ≠ ∅. The reliability of an
assignment φ, denoted Rel(φ), is the probability
that φ is feasible for the actual set of active links.
Formally,


\[ \mathrm{Rel}(\phi) = \sum_{\substack{M' \subseteq M \\ \forall j \in J,\ \phi(j) \cap M' \neq \emptyset}} \Bigl( \prod_{i \in M'} (1 - f_i) \prod_{i \in M \setminus M'} f_i \Bigr) \]


The congestion of an assignment $\phi$, denoted $\mathrm{Cong}(\phi)$, is the
maximum load assigned by $\phi$ to a link in $M$. Formally,

$$\mathrm{Cong}(\phi) = \max_{i \in M} \Bigl\{ \sum_{j :\, i \in \phi(j)} \frac{s_j}{c_i} \Bigr\}.$$
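
The two definitions above can be evaluated directly on small instances. The following is a minimal sketch, not the method of [1]: it computes Rel by brute-force enumeration of all sets of active links (exponential in the number of links) and Cong by summing loads. The function names and the concrete link structure chosen for the right-hand assignment of Fig. 1 (every pair of messages shares a link, and two disjoint pairs share a second one) are assumptions made for illustration; that structure is consistent with the polynomial quoted in the figure caption.

```python
from fractions import Fraction
from itertools import combinations


def reliability(assignment, fail):
    """Rel(phi): probability that every message keeps at least one active link.

    assignment[j] is the set of links used by message j;
    fail[i] is the failure probability of link i.
    Brute force over all sets of active links (exponential in m).
    """
    m = len(fail)
    total = Fraction(0)
    for k in range(m + 1):
        for active in combinations(range(m), k):
            active = set(active)
            # phi is feasible for `active` if every message hits an active link
            if all(links & active for links in assignment):
                p = Fraction(1)
                for i in range(m):
                    p *= (1 - fail[i]) if i in active else fail[i]
                total += p
    return total


def congestion(assignment, sizes, caps):
    """Cong(phi): maximum over links of (total assigned size) / capacity."""
    load = [Fraction(0)] * len(caps)
    for j, links in enumerate(assignment):
        for i in links:
            load[i] += Fraction(sizes[j])
    return max(load[i] / caps[i] for i in range(len(caps)))


# Instance in the spirit of Fig. 1: 4 unit messages, 8 identical links, f = 1/2.
fail = [Fraction(1, 2)] * 8
# 2-partitioning assignment: messages 0,1 on links 0-3, messages 2,3 on links 4-7.
phi2 = [{0, 1, 2, 3}, {0, 1, 2, 3}, {4, 5, 6, 7}, {4, 5, 6, 7}]
# Assumed structure of the other assignment: link i carries the message pair pairs[i].
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1), (2, 3)]
other = [{i for i, pr in enumerate(pairs) if j in pr} for j in range(4)]

print(reliability(phi2, fail))   # 225/256 = (1 - f^4)^2 at f = 1/2
print(reliability(other, fail))  # 205/256 = 1 - 4f^4 + 2f^6 + 4f^7 - 3f^8 at f = 1/2
print(congestion(phi2, [1] * 4, [1] * 8), congestion(other, [1] * 4, [1] * 8))  # 2 2
```

Exact rational arithmetic is used so that the values can be compared with the closed-form polynomials without rounding error.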
Problem 1 (Minimum Fault-Tolerant Congestion)
INPUT: A set of faulty parallel links $M = \{(c_1, f_1), \ldots, (c_m, f_m)\}$,
a set of messages $J = \{s_1, \ldots, s_n\}$, and a rational number
$\epsilon \in (0,1)$.
OUTPUT: A redundant assignment $\phi : J \to 2^M \setminus \emptyset$ with
$\mathrm{Rel}(\phi) \ge 1 - \epsilon$ that minimizes $\mathrm{Cong}(\phi)$.


Minimum Fault-Tolerant Congestion is NP-hard 
because it is a generalization of minimizing 
makespan on (reliable) parallel machines. The 
decision version of Minimum Fault-Tolerant 
Congestion belongs to PSPACE, but it is not 
clear whether it belongs to NP. The reason is 
that computing the reliability of a redundant 
assignment and deciding whether it is a feasible
solution is #P-complete.

The work of Fotakis and Spirakis [1] presents 
polynomial-time approximation algorithms for 
Minimum Fault-Tolerant Congestion based on 
a simple and natural class of redundant assign- 
ments whose reliability can be computed easily. 
The high-level idea is to separate the reliability aspect from load balancing.
Technically, the set of links is partitioned into a collection of disjoint
subsets $\mathcal{M} = \{M_1, \ldots, M_\nu\}$ with
$\mathrm{Rel}(\mathcal{M}) \ge 1 - \epsilon$. Every subset
$M_\ell \in \mathcal{M}$ is regarded as a reliable link of effective capacity
$c(M_\ell) = \min_{i \in M_\ell}\{c_i\}$. Then any algorithm for load balancing
on reliable parallel machines can be used for assigning the messages to the
subsets of $\mathcal{M}$, thus computing a redundant assignment $\phi$ with
$\mathrm{Rel}(\phi) \ge 1 - \epsilon$.

The assignments produced by this approach are called partitioning assignments.
More precisely, an assignment $\phi : J \to 2^M \setminus \emptyset$ is a
$\nu$-partitioning assignment if for every pair of messages $j, j'$, either
$\phi(j) = \phi(j')$ or $\phi(j) \cap \phi(j') = \emptyset$, and $\phi$ assigns
the messages to $\nu$ different link subsets.
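
The separation of reliability from load balancing can be sketched in a few lines. This is only an illustration under stated assumptions: the partition is taken as given, and a simple largest-message-first greedy rule stands in for "any algorithm for load balancing on reliable parallel machines"; it is not claimed to be the algorithm analyzed in [1]. The names `partition_reliability` and `partitioning_assignment` are hypothetical.

```python
from fractions import Fraction


def partition_reliability(parts, fail):
    """Rel(M) = prod over subsets M_l of (1 - prod_{i in M_l} f_i)."""
    rel = Fraction(1)
    for part in parts:
        all_fail = Fraction(1)
        for i in part:
            all_fail *= fail[i]
        rel *= 1 - all_fail
    return rel


def partitioning_assignment(parts, caps, sizes):
    """Assign each message to one whole subset of the partition.

    Each subset acts as one reliable link whose effective capacity is the
    minimum capacity inside it. Greedy stand-in: largest message first, placed
    on the subset whose resulting load/capacity ratio is smallest.
    Returns (assignment, congestion).
    """
    eff = [min(caps[i] for i in part) for part in parts]
    load = [Fraction(0)] * len(parts)
    assignment = [None] * len(sizes)
    for j in sorted(range(len(sizes)), key=lambda j: -sizes[j]):
        best = min(range(len(parts)), key=lambda l: (load[l] + sizes[j]) / eff[l])
        load[best] += sizes[j]
        assignment[j] = set(parts[best])
    cong = max(load[l] / eff[l] for l in range(len(parts)))
    return assignment, cong


# 8 identical links with f = 1/2, split into two subsets of 4 links each.
parts = [[0, 1, 2, 3], [4, 5, 6, 7]]
fail = [Fraction(1, 2)] * 8
rel = partition_reliability(parts, fail)
phi, cong = partitioning_assignment(parts, [1] * 8, [1, 1, 1, 1])
print(rel, cong)  # 225/256 2
```

Because every message is copied to all links of its subset, the reliability of the resulting assignment equals the reliability of the partition, which is a simple product and needs no enumeration.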

Computing an appropriate fault-tolerant collection of disjoint link subsets is
an interesting optimization problem by itself. A feasible solution
$\mathcal{M}$ satisfies the constraint that
$\mathrm{Rel}(\mathcal{M}) \ge 1 - \epsilon$. For identical capacity links, the
most natural objective is to maximize the number of subsets in $\mathcal{M}$
(equivalently, the number of reliable links used by the load balancing
algorithm). For arbitrary capacities, this objective generalizes to maximizing
the total effective capacity of $\mathcal{M}$.


Problem 2 (Maximum Fault-Tolerant Partition)
INPUT: A set of faulty parallel links $M = \{(c_1, f_1), \ldots, (c_m, f_m)\}$,
and a rational number $\epsilon \in (0,1)$.
OUTPUT: A collection $\mathcal{M} = \{M_1, \ldots, M_\nu\}$ of disjoint subsets
of $M$ with $\mathrm{Rel}(\mathcal{M}) \ge 1 - \epsilon$ that maximizes
$\sum_{\ell=1}^{\nu} c(M_\ell)$.


The problem of Maximum Fault-Tolerant Partition is NP-hard. More precisely,
given $m$ identical capacity links with rational failure probabilities and a
rational number $\epsilon \in (0,1)$, it is NP-complete to decide whether the
links can be partitioned into sets $M_1$ and $M_2$ with
$\mathrm{Rel}(M_1) \cdot \mathrm{Rel}(M_2) \ge 1 - \epsilon$.


Key Results 


Theorem 1 There is a 2-approximation algorithm for Maximum Fault-Tolerant
Partition of identical capacity links. The time complexity of the algorithm is
$O\bigl((m - \sum_{i \in M} \ln f_i) \ln m\bigr)$.


Theorem 2 For every constant $\delta > 0$, there is a $(8+\delta)$-approximation
algorithm for Maximum Fault-Tolerant Partition of capacitated links. The time
complexity of the algorithm is polynomial in the input size and $1/\delta$.


To demonstrate the efficiency of the partitioning approach for Minimum
Fault-Tolerant Congestion, Fotakis and Spirakis prove that for certain
instances, the reliability of the most reliable partitioning assignment bounds
from above the reliability of any other assignment with the same congestion
(see Fig. 1 for an example).


Lemma 3 For any positive integers $\lambda$, $\nu$, $\mu$ and any rational
$f \in (0,1)$, let $\phi$ be a redundant assignment of $\lambda\nu$ unit size
messages to $\nu\mu$ identical capacity links with failure probability $f$. Let
$\phi_\nu$ be the $\nu$-partitioning assignment that assigns $\lambda$ messages
to each of $\nu$ disjoint subsets consisting of $\mu$ links each. If
$\mathrm{Cong}(\phi) \le \lambda = \mathrm{Cong}(\phi_\nu)$, then
$\mathrm{Rel}(\phi) \le (1 - f^{\mu})^{\nu} = \mathrm{Rel}(\phi_\nu)$.
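
As a worked check of the lemma on the instance of Fig. 1 ($\lambda = 2$, $\nu = 2$, $\mu = 4$), one can compare $\mathrm{Rel}(\phi_2) = (1 - f^4)^2$ with the reliability $1 - 4f^4 + 2f^6 + 4f^7 - 3f^8$ stated in the figure caption for the other assignment. The factorization below is an illustrative calculation added here, not taken from [1]:

$$(1 - f^4)^2 - \bigl(1 - 4f^4 + 2f^6 + 4f^7 - 3f^8\bigr) = 2f^4 - 2f^6 - 4f^7 + 4f^8 = 2f^4 (1 - f)^2 (1 + 2f + 2f^2) \ge 0.$$

The difference is non-negative for every $f \in [0,1]$ and vanishes only at $f = 0$ and $f = 1$, in agreement with the bound of Lemma 3.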


Based on the previous upper bound on the reliability of any redundant
assignment, [1] presents polynomial-time approximation algorithms for Minimum
Fault-Tolerant Congestion.


Theorem 4 There is a quasi-linear-time 4-approximation algorithm for Minimum
Fault-Tolerant Congestion on identical capacity links.


Theorem 5 There is a polynomial-time
$2\lceil \ln(m/\epsilon) / \ln(1/f_{\max}) \rceil$-approximation algorithm for
Minimum Fault-Tolerant Congestion on instances with unit size messages and
capacitated links.
Applications


In many applications dealing with faulty 
components (e.g., fault-tolerant network 
design, fault-tolerant routing), a combinatorial 
structure (e.g., a graph, a hypergraph) should 
optimally tolerate random faults with respect 
to a given property (e.g., connectivity, non- 
existence of isolated points). For instance, 
Lomonosov [5] derived tight upper and 
lower bounds on the probability that a graph 
remains connected under random edge faults. 
Using the bounds of Lomonosov, Karger [3] 
obtained improved theoretical and practical 
results for the problem of estimating the 
reliability of a graph. In this work, Lemma 3
provides a tight upper bound on the probability
that isolated nodes do not appear in a not
necessarily connected hypergraph with $\lambda\nu$
nodes and $\nu\mu$ "faulty" hyperedges of cardinality $\lambda$.

More precisely, let $\phi$ be any assignment of $\lambda\nu$ unit size messages
to $\nu\mu$ identical links that assigns every message to $\mu$ links and
$\lambda$ messages to every link. Then $\phi$ corresponds to a hypergraph
$H_\phi$, where the set of nodes consists of $\lambda\nu$ elements
corresponding to the unit size messages and the set of hyperedges consists of
$\nu\mu$ elements corresponding to the identical links. Every hyperedge
contains the messages assigned to the corresponding link and has cardinality
$\lambda$ (see Fig. 1 for a simple example with $\lambda = 2$, $\nu = 2$, and
$\mu = 4$). Clearly, an assignment $\phi$ is feasible for a set of links
$M' \subseteq M$ iff the removal of the hyperedges corresponding to the links
in $M \setminus M'$ does not leave any isolated nodes in $H_\phi$. (For a node
$v$, let $\deg_H(v) = |\{e \in E(H) : v \in e\}|$; a node $v$ is isolated in
$H$ if $\deg_H(v) = 0$.) Lemma 3 implies that the hypergraph corresponding to
the most reliable $\nu$-partitioning assignment maximizes the probability that
isolated nodes do not appear when hyperedges are removed equiprobably and
independently.

The previous work on fault-tolerant network 
design and routing mostly focuses on the worst- 
case fault model, where a feasible solution must 
tolerate any configuration of a given number of 
faults. The work of Gąsieniec et al. [2] studies the
fault-tolerant version of minimizing congestion 
of virtual path layouts in a complete ATM net- 
work. In addition to several results for the worst- 
case fault model, [2] constructs a virtual path 
layout of logarithmic congestion that tolerates 
random faults with high probability. On the other 
hand, the work of Fotakis and Spirakis shows 
how to construct redundant assignments that tol- 
erate random faults with a probability given as 
part of the input and achieve a congestion close 
to optimal. 


Open Problems 


An interesting research direction is to determine 
the computational complexity of Minimum Fault- 
Tolerant Congestion and related problems. The 
decision version of Minimum Fault-Tolerant 
Congestion is included in the class of languages 
decided by a polynomial-time non-deterministic 
Turing machine that reduces the language to 
a single call of a #P oracle. After calling the 
oracle once, the Turing machine rejects if the 
oracle’s outcome is less than a given threshold 
and accepts otherwise. This class is denoted $\mathrm{NP}^{\#P[1,\mathrm{comp}]}$
in [1]. In addition to Minimum Fault-Tolerant Congestion,
$\mathrm{NP}^{\#P[1,\mathrm{comp}]}$ includes the decision version of Stochastic
Knapsack considered in [4]. A result of Toda and Watanabe [6] implies that
$\mathrm{NP}^{\#P[1,\mathrm{comp}]}$ contains the entire Polynomial Hierarchy.
A challenging open problem is to determine whether the decision version of
Minimum Fault-Tolerant Congestion is complete for
$\mathrm{NP}^{\#P[1,\mathrm{comp}]}$.

A second direction for further research 
is to consider the generalizations of other 
fundamental optimization problems  (e.g., 
shortest paths, minimum connected subgraph) 
under random faults. In the fault-tolerant version 
of minimum connected subgraph for example, 
the input consists of a graph whose edges fail 
independently with given probabilities, and 
a rational number $\epsilon \in (0,1)$. The goal is to
compute a spanning subgraph with a minimum
number of edges whose reliability is at least
$1 - \epsilon$.
Problem Definition


Sensors are now ubiquitous, and wireless sensor networks have been studied
extensively. Two properties of such networks are of primary importance:
coverage and connectivity. A sensor is typically used to collect information,
and hence its sensing area has to cover the target (points or an area). The
sensing area of a wireless sensor is usually a disk centered at the sensor; the
radius of this disk is called the sensing radius. After information is
collected, the sensor has to send it to a central station for analysis. This
requires all active sensors to form a connected communication network. Every
sensor also has a communication function: it can send information to other
sensors located in its communication area, which is also a disk centered at the
sensor. The radius of the communication disk is called the communication
radius.

Sensors are often very small and powered by batteries. Therefore, energy
efficiency is a major issue in the study of wireless sensor networks. A sensor
network is said to be homogeneous if all sensors in the network have the same
sensing radius and the same communication radius. For a homogeneous wireless
sensor network, the energy consumption can be measured by the number of active
sensors. The minimum connected sensor cover problem is a classic optimization
problem based on this consideration, and it is described as follows.

Consider a homogeneous wireless sensor network in the Euclidean plane. Given a
connected target area $\Omega$, find the minimum number of sensors satisfying
the following two conditions:


[Coverage] The target area $\Omega$ is covered by
selected sensors.

[Connectivity] Selected sensors induce a connected network.


Minimum Connected Sensor Cover 


A subset of sensors is called a sensor cover 
if it satisfies the coverage condition and called a 
connected sensor cover if it satisfies both the cov- 
erage condition and the connectivity condition. 
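
The two conditions can be checked mechanically. The sketch below is illustrative only and rests on two assumptions that are not part of the definition: the target area is represented by a finite list of sample points, and the selected sensors form the whole input list. The function name `is_connected_sensor_cover` is hypothetical.

```python
from collections import deque
from math import dist


def is_connected_sensor_cover(sensors, targets, r_sense, r_comm):
    """Check coverage and connectivity for a set of selected sensors.

    sensors: list of (x, y) positions of the selected sensors.
    targets: list of (x, y) sample points standing in for the target area.
    r_sense, r_comm: common sensing and communication radii (homogeneous network).
    """
    if not sensors:
        return False
    # Coverage: every target point lies in the sensing disk of some sensor.
    covered = all(any(dist(t, s) <= r_sense for s in sensors) for t in targets)
    if not covered:
        return False
    # Connectivity: breadth-first search in the communication graph, where two
    # sensors are adjacent if their distance is at most r_comm.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(len(sensors)):
            if v not in seen and dist(sensors[u], sensors[v]) <= r_comm:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(sensors)


print(is_connected_sensor_cover([(0, 0), (1, 0)], [(0.5, 0)], 1.0, 1.0))          # True
print(is_connected_sensor_cover([(0, 0), (3, 0)], [(0, 0), (3, 0)], 1.0, 1.0))    # False
```

The second call returns False because the two sensors cover the sample points but are too far apart to communicate, so they form a sensor cover that is not connected.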

The minimum connected sensor cover prob- 
lem is NP-hard. The study on approximation 
solutions of this problem has attracted many 
researchers. 


Key Results 


The minimum connected sensor cover problem was first proposed by Gupta, Das,
and Gu [8]. They presented a greedy algorithm with performance ratio
$O(r \ln n)$, where $n$ is the number of sensors and $r$ is the link radius of
the sensor network, i.e., for any two sensors with overlapping sensing disks,
their hop distance in the communication network is at most $r$.

Zhang and Hou [12] studied the special case in which the communication radius
is at least twice the sensing radius, and they showed that in this case the
coverage of a connected region implies connectivity. For this case, they
presented a polynomial-time constant-approximation.

Das and Gupta [13] and Xing et al. [11] 
explored more about the relationship between 
coverage and connectivity. Bai et al. [2] studied a 
sensor deployment problem regarding the cover- 
age and connectivity. Alam and Haas [1] studied 
the minimum connected sensor cover problem in 
three-dimensional sensor networks. 

Funke et al. [5] allow sensors to vary their sensing radius. With variable
sensing radius and communication radius, Zhou, Das, and Gupta [14] designed a
polynomial-time approximation with performance ratio $O(\log n)$. Ghosh and
Das [6] designed a greedy approximation using the maximal independent set and
the Voronoi diagram. They determined the size of the connected sensor cover
produced by their algorithm. However, no comparison with the optimal solution,
that is, no analysis of the approximation performance ratio, is given. In fact,
none of the above efforts improves on the approximation performance ratio
$O(r \ln n)$ given by Gupta, Das, and Gu [8].
Wu et al. [10] made the first improvement. They present two polynomial-time
approximations. The first one has performance ratio $O(r)$. This approximation
is designed based on a polynomial-time constant-approximation [4] or a
polynomial-time approximation scheme [9] for the minimum target coverage
problem, defined as follows: given a homogeneous set of sensors and a set of
target points in the Euclidean plane, find the minimum number of sensors
covering all given target points.

The $O(r)$-approximation consists of three steps. In the first step, it
replaces the target area by $O(n^2)$ target points such that the target area is
covered by a subset of sensors if and only if those target points are all
covered by this subset of sensors. In the second step, it computes a
constant-approximation solution, say a $c$-approximation solution $S$, for the
minimum target coverage problem with those target points as input. Since the
optimal solution for the minimum connected sensor cover problem must be a
feasible solution for the minimum target coverage problem, $|S|$ is within a
factor $c$ of the size of a minimum connected sensor cover, i.e.,
$|S| \le c \cdot opt_{mcsc}$.

In the third step, the algorithm employs a polynomial-time 1.39-approximation
algorithm for the network Steiner tree problem [3] and applies it to the input
consisting of the communication network of sensors, taken as a graph with unit
weight on each edge, and the terminal set $S$. Note that in a graph with unit
weight on each edge, the total edge weight of a tree is the number of vertices
in the tree minus one. Therefore, the result obtained in the third step is a
connected sensor cover with cardinality upper bounded by $|S|$ plus
$(1 + 1.39 \times (\text{size of minimum network Steiner tree on } S))$.

To estimate the size of a minimum network Steiner tree on $S$, consider a
minimum connected sensor cover $S^*$. Let $T$ be a spanning tree in the
subgraph induced by $S^*$. For each sensor $s \in S$, there must exist a sensor
$s' \in S^*$ whose sensing disk overlaps the sensing disk of $s$. Therefore,
there is a path $P_s$, with distance at most $r$, connecting $s$ and $s'$ in
the communication network. Clearly, $T \cup (\bigcup_{s \in S} P_s)$ is a
Steiner tree on $S$. Hence, the size of a minimum network Steiner tree on $S$
is at most $|S| - 1 + |S| \cdot r$. Therefore, the connected sensor cover
obtained in the third step has size at most

$$|S| + 1 + 1.39 \cdot \bigl(|S|(r + 1) - 1\bigr) \le |S|(1.39r + 2.39) \le c(1.39r + 2.39) \cdot opt_{mcsc} = O(r) \cdot opt_{mcsc}.$$


The second polynomial-time approximation designed by Wu et al. [10] is a
randomized algorithm with performance ratio $O(\log^3 n)$. This approximation
is obtained from the following two observations: (1) The minimum connected
sensor cover problem is a special case of the minimum connected set cover
problem. (2) The minimum connected set cover problem is closely related to the
group Steiner tree problem. Therefore, some results on group Steiner trees can
be transformed into results on connected sensor covers.

In conclusion, Wu et al. [10] obtained the 
following: 


Theorem 1 There exists a polynomial-time $O(r)$-approximation for the minimum
connected sensor cover problem. There also exists a polynomial-time randomized
algorithm with performance ratio $O(\log^3 n)$ for the minimum connected sensor
cover problem.


Open Problems 


Of the two approximations in Theorem 1, one has performance ratio $O(r)$,
independent of $n$, and the other has performance ratio $O(\log^3 n)$,
independent of $r$. This fact suggests that either $n$ and $r$ are closely
related or there exists a polynomial-time constant-approximation. Therefore, we
have the following conjecture:


Conjecture 1 There exists a polynomial-time 
O(1)-approximation for the minimum connected 
sensor cover problem. 
Problem Definition


In the most common model for wireless networks, stations are represented by
points in $\mathbb{R}^2$. They are equipped with an omnidirectional transmitter
and receiver, which enable them to communicate with other stations. In order
to send a message from a station $s$ to a station $t$, station $s$ needs to
emit the message with enough power such that $t$ can receive it. It is usually
assumed that the power required by a station $s$ to transmit data directly to
station $t$ is $\|st\|^{\alpha}$, for some constant $\alpha \ge 1$, where
$\|st\|$ denotes the distance between $s$ and $t$.

Because of the omnidirectional nature of the transmitters and receivers, a
message sent by a station $s$ with power $r^{\alpha}$ can be received by all
stations within a disc of radius $r$ around $s$. Hence the energy required to
send a message from a station $s$ directly to a set of stations $S'$ is
determined by $\max_{v \in S'} \|sv\|^{\alpha}$.

An instance of the minimum energy broadcast routing problem in wireless
networks (MEBR) consists of a set of stations $S$ and a constant
$\alpha \ge 1$. One of the stations in $S$ is designated as the source station
$s_0$. The goal is to send a message at minimum energy cost from $s_0$ to all
other stations in $S$. This operation is called broadcast.
In the case $\alpha = 1$, the optimal solution is to send the message directly
from $s_0$ to all other stations. For $\alpha > 1$, sending the message via
intermediate stations which forward it to other stations is often more energy
efficient.

A solution of the MEBR instance can be described in terms of a so-called
broadcast tree, that is, a directed spanning tree of $S$ which contains
directed paths from $s_0$ to all other vertices. The solution described by a
broadcast tree $T$ is the one in which every station forwards the message to
all its out-neighbors in $T$.


Problem 1 (MEBR)

INSTANCE: A set $S$ of points in $\mathbb{R}^d$, $s_0 \in S$ designated as the
source, and a constant $\alpha$.
SOLUTION: A broadcast tree $T$ of $S$.
MEASURE: The objective is to minimize the total energy needed to broadcast a
message from $s_0$ to all other nodes, which can be expressed by

$$\sum_{u \in S} \max_{v \in \delta(u)} \|uv\|^{\alpha}, \qquad (1)$$

where $\delta(u)$ denotes the set of out-neighbors of station $u$ in $T$.


The MEBR problem is known to be NP-hard for $d \ge 2$ and $\alpha > 1$ [2].
APX-hardness is known for $d \ge 3$ [5].


Key Results 


Numerous heuristics have been proposed for this 
problem. Only a few of them have been ana- 
lyzed theoretically. The one which attains the best 
approximation guarantee is the so-called MST- 
heuristic [12]. 

MST-HEURISTIC: Compute a minimum spanning tree of $S$ (mst($S$)) and turn it
into a broadcast tree by directing the edges away from the source.
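
The heuristic and the cost measure (1) can be sketched as follows. This is a minimal illustration, assuming stations are given as coordinate tuples; running Prim's algorithm from the source is one convenient way to obtain the edges already directed away from it. The names `mst_broadcast_tree` and `broadcast_energy` are hypothetical.

```python
from math import dist


def mst_broadcast_tree(points, source=0):
    """Prim's algorithm started at the source.

    Returns a dict mapping each station index to the list of its out-neighbors,
    i.e., the Euclidean MST with edges directed away from the source.
    """
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection to the tree
    parent = [None] * n
    best[source] = 0.0
    children = {u: [] for u in range(n)}
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return children


def broadcast_energy(points, children, alpha):
    """Cost (1): each station pays (distance to its farthest out-neighbor)^alpha."""
    return sum(
        max(dist(points[u], points[v]) for v in kids) ** alpha
        for u, kids in children.items()
        if kids
    )


# Four collinear stations; the source is the leftmost one.
stations = [(0, 0), (1, 0), (2, 0), (3, 0)]
tree = mst_broadcast_tree(stations, source=0)
print(tree)                                   # {0: [1], 1: [2], 2: [3], 3: []}
print(broadcast_energy(stations, tree, 2))    # 3.0
print(broadcast_energy(stations, {0: [1, 2, 3]}, 2))  # 9.0 for the direct broadcast
```

On this instance with exponent 2, relaying along the MST costs 3 while a single direct transmission from the source costs 9, which illustrates why forwarding via intermediate stations can pay off.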


Theorem 1 ([1]) In the Euclidean plane, the MST-heuristic is a 6-approximation
algorithm for MEBR for all $\alpha \ge 2$.
Minimum Energy Broadcasting in Wireless Geometric Networks, Fig. 1 Illustration of the first and second
approach for bounding w(S). In both approaches, w(S) is bounded in terms of the total area covered by the shapes


Theorem 2 ([9]) In the Euclidean three-dimensional space, the MST-heuristic is
an 18.8-approximation algorithm for MEBR for all $\alpha \ge 3$.


For $\alpha < d$, the MST-heuristic does not provide a constant approximation
ratio. The $d$-dimensional kissing numbers represent lower bounds for the
performance of the MST-heuristic. Hence the analysis for $d = 2$ is tight,
whereas for $d = 3$ the lower bound is 12.


Analysis 
The analysis of the MST-heuristic is based on good upper bounds for

$$w(S) := \sum_{e \in \mathrm{mst}(S)} \|e\|^{\alpha}, \qquad (2)$$

which obviously is an upper bound on (1). The radius of an instance of MEBR is
the distance from $s_0$ to the station furthest from $s_0$. It turns out that
the MST-heuristic performs worst on instances of radius 1 whose optimal
solution is to broadcast the message directly from $s_0$ to all other stations.
Since the optimal value for such instances is 1, the approximation ratio
follows from good upper bounds on $w(S)$ for instances with radius 1.

The rest of this section focuses on the case $d = \alpha = 2$. There are two
main approaches for upper bounding $w(S)$. In both approaches, $w(S)$ is upper
bounded in terms of the area of particular kinds of shapes associated with
either the stations or with the edges of the MST.

In the first approach, the shapes are disks of radius $m/2$ placed around every
station of $S$, where $m$ is the length of the longest edge of mst($S$). Let
$A$ denote the area covered by the disks. One can prove
$w(S) \le \frac{4}{\pi}\bigl(A - \pi(m/2)^2\bigr)$. Assuming that $S$ has
radius 1, one can prove $w(S) \le 8$ quite easily [4]. This approach can even
be extended to obtain $w(S) \le 6.33$ [8], and it can be generalized for
$d > 2$.
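
The constant 8 can be recovered by a short calculation; the following is an illustrative sketch of one way to see it and is not claimed to be the exact argument of [4]. It starts from the area bound $w(S) \le \frac{4}{\pi}(A - \pi(m/2)^2)$ of the first approach. In an instance of radius 1, every station lies within distance 1 of $s_0$, so the star centered at $s_0$ is a spanning tree with all edges of length at most 1; since a minimum spanning tree also minimizes the longest edge, $m \le 1$. Moreover, all disks of radius $m/2$ around the stations lie inside the disk of radius $1 + m/2$ around $s_0$, so $A \le \pi(1 + m/2)^2$. Substituting,

$$w(S) \le \frac{4}{\pi}\Bigl(\pi\bigl(1 + \tfrac{m}{2}\bigr)^2 - \pi\bigl(\tfrac{m}{2}\bigr)^2\Bigr) = 4(1 + m) \le 8.$$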

In the second approach [7, 11], $w(S)$ is expressed in terms of shapes
associated with the edges of mst($S$), e.g., diamond shapes as shown on the
right of Fig. 1. The area of a diamond for an edge $e$ is equal to
$\|e\|^2 / (2\sqrt{3})$. Since one can prove that the diamonds never intersect,
one obtains $w(S) = 2\sqrt{3} \cdot A$, where $A$ is the total area of the
diamonds. For instances with radius 1, one can get $w(S) \le 12.15$.

For the 2-dimensional case, one can even ob- 
tain a matching upper bound [1]. The shapes used 
in this proof are equilateral triangles, arranged 
in pairs along every edge of the MST. As can 
be seen on the left of Fig. 2, these shapes do 
intersect. Still one can obtain a good upper bound 
on their total area by means of the convex hull 
of S: 

Let the extended convex hull of S be the convex hull of S extended by
equilateral triangles along the border of the convex hull. One can prove that
the total area generated by the equilateral triangle shapes along the edges of
mst(S) is upper bounded by the area of the extended convex hull of S. By
showing that for instances of radius 1 the area of the extended convex hull is
maximized by the point configuration shown on the right of Fig. 2, the matching
upper bound of 6 can be established.

Minimum Energy Broadcasting in Wireless Geometric Networks, Fig. 2 Illustration
of the tight bound for d = 2. The total area of the equilateral triangles on
the left is bounded by its extended convex hull shown in the middle. The point
set that maximizes area of the extended convex hull is the star shown on the
right


Applications


The MEBR problem is a special case of a large class of problems called range
assignment problems. In all these problems, the goal is to assign a range to
each station such that a certain type of communication operation such as
broadcast, all-to-1 (gathering), or all-to-all (gossiping) can be accomplished.
See [3] for a survey on range assignment problems.

It is worth noting that the problem of upper bounding w(S) has already been
considered in different contexts. The idea of using diamond shapes to upper
bound the length of an MST has already been used by Gilbert and Pollak in [6].
Steele [10] makes use of space filling curves to bound w(S).


Open Problems


An obvious open problem is to close the gap in the analysis of the
MST-heuristic for the three-dimensional case. This might be very difficult, as
the lower bound from the kissing number might not be tight.

Even in the plane, the approximation ratio of the MST-heuristic is quite large.
It would be interesting to see a different approach for the problem, maybe
based on LP-rounding. It is still not known whether MEBR is APX-hard for
instances in the Euclidean plane. Hence there might exist a PTAS for this
problem.
Problem Definition


Ad hoc wireless networks have received signifi- 
cant attention in recent years due to their poten- 
tial applications in battlefield, emergency disaster 
relief and other applications [11, 15]. Unlike 
wired networks or cellular networks, no wired 
backbone infrastructure is installed in ad hoc 
wireless networks. A communication session is 
achieved either through a single-hop transmission 
if the communication parties are close enough, 
or through relaying by intermediate nodes oth- 
erwise. Omni-directional antennas are used by 
all nodes to transmit and receive signals. They 
are attractive in their broadcast nature. A single 
transmission by a node can be received by many 
nodes within its vicinity. This feature is extremely 
useful for multicasting/broadcasting communica- 
tions. For the purpose of energy conservation, 
each node can dynamically adjust its transmitting 
power based on the distance to the receiving node 
and the background noise. In the most common power-attenuation model [10], the
signal power falls as $1/r^{\kappa}$, where $r$ is the distance from the
transmitter antenna and $\kappa$ is a real constant between 2 and 4 dependent
on the wireless environment. Assume that all receivers have the same power
threshold for signal detection, which is typically normalized to one. With
these assumptions, the power required to support a link between two nodes
separated by a distance $r$ is $r^{\kappa}$. A key observation here is that
relaying a signal between two nodes may result in lower total transmission
power than communicating over a large distance, due to the nonlinear power
attenuation. The network nodes are assumed to be given as a finite point set
$P$ in a two-dimensional plane. (The terms node, point, and vertex are
interchangeable here: node is a network term, point is a geometric term, and
vertex is a graph term.) For any real number $\kappa$, let $G^{(\kappa)}$
denote the weighted complete graph over $P$ in which the weight of an edge $e$
is $\|e\|^{\kappa}$.

The minimum-energy unicast routing is essentially a shortest-path problem in
$G^{(\kappa)}$. Consider any unicast path from a node $p = p_0 \in P$ to
another node $q = p_m \in P$: $p_0 p_1 \cdots p_{m-1} p_m$. In this path, the
transmission power of each node $p_i$, $0 \le i \le m-1$, is
$\|p_i p_{i+1}\|^{\kappa}$ and the transmission power of $p_m$ is zero. Thus
the total transmission energy required by this path is
$\sum_{i=0}^{m-1} \|p_i p_{i+1}\|^{\kappa}$, which is the total weight of this
path in $G^{(\kappa)}$. So by applying any shortest-path algorithm such as
Dijkstra's algorithm [5], one can solve the minimum-energy unicast routing
problem.
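
The reduction to shortest paths can be sketched directly. The snippet below is a minimal illustration, assuming nodes are given as coordinate tuples and using the standard-library heap for Dijkstra's algorithm on the complete graph with edge weights equal to the distance raised to the power kappa. The function name `min_energy_unicast` is hypothetical.

```python
import heapq
from math import dist


def min_energy_unicast(points, src, dst, kappa):
    """Dijkstra on the complete graph whose edge (u, v) weighs |uv|^kappa.

    Returns (total transmission energy, list of node indices on the path).
    """
    n = len(points)
    energy = [float("inf")] * n
    prev = [None] * n
    energy[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        e, u = heapq.heappop(heap)
        if e > energy[u]:
            continue            # stale heap entry
        if u == dst:
            break
        for v in range(n):
            if v != u:
                cand = e + dist(points[u], points[v]) ** kappa
                if cand < energy[v]:
                    energy[v] = cand
                    prev[v] = u
                    heapq.heappush(heap, (cand, v))
    # Walk the predecessor pointers back from the destination.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return energy[dst], path[::-1]


# Three collinear nodes: relaying through the middle node halves the energy.
nodes = [(0, 0), (1, 0), (2, 0)]
print(min_energy_unicast(nodes, 0, 2, 2))   # (2.0, [0, 1, 2])
```

With kappa = 2, the direct transmission from node 0 to node 2 would cost 4, whereas the relayed path costs 1 + 1 = 2, matching the observation that relaying can reduce total transmission power.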

However, for broadcast applications (in general, multicast applications),
minimum-energy routing is far more challenging. Any broadcast routing is viewed
as an arborescence (a directed tree) $T$, rooted at the source node of the
broadcasting, that spans all nodes. Use $f_T(p)$ to denote the transmission
power of the node $p$ required by $T$. For any leaf node $p$ of $T$,
$f_T(p) = 0$. For any internal node $p$ of $T$,

$$f_T(p) = \max_{pq \in T} \|pq\|^{\kappa},$$

in other words, the $\kappa$-th power of the longest distance between $p$ and
its children in $T$. The total energy required by $T$ is
$\sum_{p \in P} f_T(p)$. Thus the minimum-energy broadcast routing problem is
different from the conventional link-based minimum spanning tree (MST) problem.
Indeed, while the MST problem can be solved in polynomial time by algorithms
such as Prim's algorithm and Kruskal's algorithm [5], it is NP-hard [4] to find
the minimum-energy broadcast routing tree for nodes placed in a two-dimensional
plane. In its general graph version, the minimum-energy broadcast routing
problem can also be shown to be NP-hard [7], and even worse, it cannot be
approximated within a factor of $(1 - \epsilon) \log \Delta$ unless
$\mathrm{NP} \subseteq \mathrm{DTIME}[n^{O(\log\log n)}]$, by an
approximation-preserving reduction from the Connected Dominating Set problem
[8], where $\Delta$ is the maximal degree and $\epsilon$ is an arbitrarily
small positive constant.
Three greedy heuristics have been proposed for the minimum-energy broadcast
routing problem in [15]. The MST heuristic first applies Prim's algorithm to
obtain an MST and then orients it as an arborescence rooted at the source node.
The SPT heuristic applies Dijkstra's algorithm to obtain a shortest-path tree
(SPT) rooted at the source node. The BIP heuristic is the node version of
Dijkstra's algorithm for SPT. It maintains, throughout its execution, a single
arborescence rooted at the source node. The arborescence starts from the source
node, and new nodes are added to the arborescence one at a time on a minimum
incremental cost basis until all nodes are included in the arborescence. The
incremental cost of adding a new node to the arborescence is the minimum
additional power by which some node in the current arborescence must increase
its transmission power to reach this new node. The implementation of BIP is
based on the standard Dijkstra's algorithm, with one fundamental difference in
the operation performed whenever a new node $q$ is added. Whereas Dijkstra's
algorithm updates the node weights (representing the currently known distances
to the source node), BIP updates the cost of each link (representing the
incremental power to reach the head node of the directed link). When $q$ is
added through the link $pq$, this update is performed by subtracting the cost
of the added link $pq$ from the cost of every link $pr$ that starts from $p$
and leads to a node $r$ not in the new arborescence.
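
The incremental-cost rule of BIP can be sketched as follows. This is a simple cubic-time illustration that recomputes the incremental costs in every round instead of maintaining updated link costs as described above; it assumes nodes are given as coordinate tuples, and the function name `bip` is hypothetical.

```python
from math import dist


def bip(points, source, kappa):
    """Grow one arborescence from the source by minimum incremental power.

    Returns (parent, total_power), where parent[q] is the node that reaches q
    (None for the source) and total_power is the sum of all node powers.
    """
    n = len(points)
    power = [0.0] * n           # current transmission power of every node
    parent = {source: None}     # nodes already in the arborescence
    while len(parent) < n:
        best = None
        for p in parent:
            for q in range(n):
                if q not in parent:
                    cost = dist(points[p], points[q]) ** kappa
                    inc = max(0.0, cost - power[p])   # extra power p needs for q
                    if best is None or inc < best[0]:
                        best = (inc, p, q, cost)
        _, p, q, cost = best
        power[p] = max(power[p], cost)
        parent[q] = p
    return parent, sum(power)


# Four collinear nodes with the source at the left end.
nodes = [(0, 0), (1, 0), (2, 0), (3, 0)]
parent, total = bip(nodes, 0, 2)
print(parent)   # {0: None, 1: 0, 2: 1, 3: 2}
print(total)    # 3.0
```

After node 1 is added, reaching node 2 from the source would require 4 - 1 = 3 additional units of power, while node 1 can reach it for 1 unit, so the arborescence grows as a chain.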


Key Results 


The performance of these three greedy heuristics has been evaluated in [15] by
simulation studies. However, their analytic performance in terms of the
approximation ratio remained open until the work of Wan et al. [13], which
derived bounds on their approximation ratios.

Let us begin with the SPT algorithm. Let $\varepsilon$ 
be a sufficiently small positive number. Consider 
$m$ nodes $p_1, p_2, \ldots, p_m$ evenly distributed on 
a circle of radius 1 centered at a node $o$. For 
$1 \le i \le m$, let $q_i$ be the point on the line segment 
$op_i$ with $\|oq_i\| = \varepsilon$. They consider a broadcast 
from the node $o$ to these $n = 2m$ nodes 
$p_1, p_2, \ldots, p_m, q_1, q_2, \ldots, q_m$. The SPT is the 
superposition of the paths $o q_i p_i$, $1 \le i \le m$. Its 
total energy consumption is $\varepsilon^2 + m(1-\varepsilon)^2$. On 
the other hand, if the transmission power of 
node $o$ is set to 1, then the signal can reach 
all other points. Thus the minimum energy consumed 
by all broadcasting methods is at most 1. So the 
approximation ratio of SPT is at least 
$\varepsilon^2 + m(1-\varepsilon)^2$. As $\varepsilon \to 0$, this ratio 
converges to $n/2 = m$. 
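
The arithmetic of this lower-bound instance can be checked numerically. The sketch below assumes the quadratic cost model used in the example; the function names and the parameter `kappa` are hypothetical and introduced only for illustration.

```python
def two_hop_is_cheaper(eps, kappa=2):
    """Path cost of o -> q_i -> p_i versus the direct link o -> p_i (cost 1);
    this is why the shortest-path tree relays through q_i."""
    return eps ** kappa + (1 - eps) ** kappa < 1

def spt_energy(m, eps, kappa=2):
    """Energy of the SPT: o reaches all q_i with power eps**kappa,
    and each of the m nodes q_i reaches its p_i with (1 - eps)**kappa."""
    return eps ** kappa + m * (1 - eps) ** kappa

def single_hop_energy(kappa=2):
    """Energy when o alone transmits with range 1 and reaches every node."""
    return 1.0 ** kappa

# Ratio for m = 10 as eps shrinks; it approaches m.
for eps in (0.1, 0.01, 0.001):
    print(eps, spt_energy(10, eps) / single_hop_energy())
```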
They [13] also proved that 


Theorem 1 The approximation ratio of MST is 
at least 6 for any $\kappa \ge 2$. 


Theorem 2 The approximation ratio of BIP is at 
least 13/3 for any $\kappa \ge 2$. 


They then derived the upper bounds by exten- 
sively using the geometric structures of Euclidean 
MSTs (EMST). They first observed that as long 
as the cost of a link is an increasing function 
of the Euclidean length of the link, the set of 
MSTs of any point set coincides with the set 
of Euclidean MSTs of the same point set. They 
proved a key result about an upper bound on the 
quantity $\sum_{e \in \mathrm{MST}(P)} \|e\|^2$ for any finite point 
set $P$ inside a disk with radius one. 


Theorem 3 Let $c$ be the supremum of $\sum_{e \in \mathrm{MST}(P)} \|e\|^2$ 
over all such point sets $P$. Then $6 \le c \le 12$. 


The following lemma proved in [13] is used to 
bound the energy cost for broadcast when each 
node can dynamically adjust its power. 


Lemma 4 For any point set $P$ in the plane, the 
total energy required by any broadcasting among 
$P$ is at least $\frac{1}{c} \sum_{e \in \mathrm{MST}(P)} \|e\|^{\kappa}$. 


Lemma 5 For any broadcasting among a point 
set $P$ in a two-dimensional plane, the total energy 
required by the arborescence generated by the 
BIP algorithm is at most $\sum_{e \in \mathrm{MST}(P)} \|e\|^{\kappa}$. 


Thus, they conclude the following two theorems. 


Theorem 6 The approximation ratio of EMST is 
at most c, and therefore is at most 12. 


Theorem 7 The approximation ratio of BIP is at 
most c, and therefore is at most 12. 
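
The step from the two lemmas to these upper bounds can be written out as a short derivation. It is a sketch under the assumption that the cost of a link $e$ is $\|e\|^{\kappa}$; the symbols $\mathrm{OPT}(P)$, $E_{\mathrm{MST}}(P)$, and $E_{\mathrm{BIP}}(P)$ are introduced here for illustration and denote the energy of an optimal broadcast, of the MST arborescence, and of the BIP arborescence.

```latex
% Lower bound on any broadcast (Lemma 4):
\mathrm{OPT}(P) \;\ge\; \frac{1}{c} \sum_{e \in \mathrm{MST}(P)} \|e\|^{\kappa}.
% A node's power is the largest cost among its outgoing tree links,
% which is at most the sum of those costs, so
E_{\mathrm{MST}}(P) \;\le\; \sum_{e \in \mathrm{MST}(P)} \|e\|^{\kappa},
\qquad
E_{\mathrm{BIP}}(P) \;\le\; \sum_{e \in \mathrm{MST}(P)} \|e\|^{\kappa}
\quad \text{(Lemma 5)}.
% Dividing by the lower bound:
\frac{E_{\mathrm{MST}}(P)}{\mathrm{OPT}(P)} \;\le\; c \;\le\; 12,
\qquad
\frac{E_{\mathrm{BIP}}(P)}{\mathrm{OPT}(P)} \;\le\; c \;\le\; 12 .
```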


Later, Wan et al. [14] studied the energy efficient 
multicast for wireless networks when each node 
can dynamically adjust its power. Given a set 
of receivers Q, the problem Min-Power Asymmetric 
Multicast seeks, for any given communication 
session, an arborescence T of minimum 
total power which is rooted at the source node 
s and reaches all nodes in Q. As a generaliza- 
tion of Min-Power Asymmetric Broadcast Rout- 
ing, Min-Power Asymmetric Multicast Routing 
is also NP-hard. Wieselthier et al. [15] adapted 
their three broadcasting heuristics to three 
multicasting heuristics by a technique of pruning; 
these are called pruned minimum spanning 
tree (P-MST), pruned shortest-path tree (P-SPT), 
and pruned broadcasting incremental power (P-BIP), 
respectively, in [14]. The idea is as follows. 
They first obtain a spanning tree rooted at the 
source of a given multicast session by applying 
any of the three broadcasting heuristics. They 
then eliminate from the spanning arborescence 
all nodes which do not have any descendant in 
Q. They [14] show by constructing examples that 
all structures P-SPT, P-MST and P-BIP could 
have approximation ratio as large as O(n) in the 
worst case for multicast. They then further pro- 
posed a multicast scheme with a constant approx- 
imation ratio on the total energy consumption. 
Their protocol for Min-Power Asymmetric Multicast 
Routing is based on the Takahashi-Matsuyama 
Steiner tree heuristic [12]. Initially, the multicast 
tree T contains only the source node. At each 
iterative step, the multicast tree T is grown by 
one path from some node in T to some destination 
node from Q that is not yet in the tree T. The path 
must have the least total power among all such 
paths from a node in T to a node in Q - T. This 
procedure is repeated until all required nodes are 
included in T. This heuristic is referred to as 
Shortest Path First (SPF). 
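
The growth step of SPF can be sketched as repeated shortest-path searches started from the whole current tree. The sketch assumes a directed graph given as a dictionary of per-link power costs and takes the power of a path to be the sum of its link costs, which is a simplification made here; the function name `spf_multicast` is hypothetical.

```python
import heapq
from itertools import count

def spf_multicast(graph, source, receivers):
    """Grow a multicast tree by repeatedly attaching the receiver that is
    cheapest to reach from any node already in the tree.
    graph: {u: {v: cost}} with non-negative costs (assumed representation)."""
    tree_nodes = {source}
    tree_edges = []
    remaining = set(receivers) - tree_nodes
    tie = count()  # tie-breaker so heap entries never compare nodes directly
    while remaining:
        # Multi-source Dijkstra: every tree node starts at distance 0.
        dist = {u: 0.0 for u in tree_nodes}
        prev = {}
        heap = [(0.0, next(tie), u) for u in tree_nodes]
        heapq.heapify(heap)
        target = None
        while heap:
            d, _, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u in remaining:
                target = u
                break
            for v, w in graph.get(u, {}).items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, next(tie), v))
        if target is None:
            raise ValueError("some receiver is unreachable from the source")
        # Add the path by walking back until the existing tree is reached.
        v = target
        while v not in tree_nodes:
            u = prev[v]
            tree_edges.append((u, v))
            tree_nodes.add(v)
            v = u
        remaining -= tree_nodes
    return tree_edges
```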


Theorem 8 For asymmetric multicast communi- 
cation, the approximation ratio of SPF is between 
6 and 2c, which is at most 24. 


Applications 


Broadcasting and multicasting in wireless ad hoc 
networks are critical mechanisms in various ap- 
plications such as information diffusion, wireless 
networks, and also for maintaining consistent 
global network information. Broadcasting is of- 
ten necessary in MANET routing protocols. For 
example, many unicast routing protocols such as 
Dynamic Source Routing (DSR), Ad Hoc On 
Demand Distance Vector (AODV), Zone Routing 
Protocol (ZRP), and Location Aided Routing 
(LAR) use broadcasting or a derivation of it to 
establish routes. Currently, these protocols all 
rely on a simplistic form of broadcasting called 
flooding, in which each node (or all nodes in 
a localized area) retransmits each received unique 
packet exactly one time. The main problems 
with flooding are that it typically causes un- 
productive and often harmful bandwidth conges- 
tion, as well as inefficient use of node resources. 
Broadcasting is also more efficient than sending 
multiple copies of the same packet through unicast. 
It is highly important to use power-efficient 
broadcast algorithms for such networks since 
wireless devices are often powered by batteries 
only. 


Open Problems 


There are some interesting questions left for fur- 
ther study. For example, the exact value of the 
constant c remains unsolved. A tighter upper 
bound on c can lead to tighter upper bounds on 
the approximation ratios of both the link-based 
MST heuristic and the BIP heuristic. They con- 
jecture that the exact value for c is 6, which seems 
to be true based on their extensive simulations. 
The second question is what is the approximation 
lower bound for minimum energy broadcast? Is 
there a PTAS for this problem? 

So far, all the known theoretically good algorithms 
assume either that the power needed 
to support a link $uv$ is proportional to $\|uv\|^{\kappa}$ 
or that it is a fixed cost that is independent of the 
neighboring nodes with which the node communicates. 
In practice, the energy consumption of a node 
is neither solely dependent on the distance to its 
farthest neighbor nor totally independent of its 
communication neighbors. For example, a more 
general power consumption model for a node 
$u$ would be $c_1 + c_2 \cdot \|uv\|^{\kappa}$ for some constants 
$c_1 \ge 0$ and $c_2 \ge 0$, where $v$ is its farthest 
communication neighbor in a broadcast structure. No 
theoretical result is known about the approximation 
of the optimum broadcast or multicast 
structure under this model. When $c_2 = 0$, this 
is the case where all nodes have a fixed power 
for communication. Minimizing the total power 
used by a reliable broadcast tree is then equivalent to 
the minimum connected dominating set problem 
(MCDS), i.e., minimizing the number of nodes that 
relay the message, since all relaying nodes of 
a reliable broadcast form a connected dominating 
set (CDS). Notice that recently a PTAS [2] has 
been proposed for MCDS in unit disk graphs (UDG). 

Another important question is how to find 
efficient broadcast/multicast structures such that 
the delay from the source node to the last node 
receiving the message is bounded by a predetermined 
value while the total energy consumption is min- 
imized. Notice that here the delay of a broad- 
cast/multicast based on a tree is not simply the 
height of the tree: many nodes cannot transmit 
simultaneously due to the interference. 


Problem Definition 


The problem is concerned with efficiently 
scheduling jobs on a system with multiple 
resources to provide a good quality of service. In 
the scheduling literature, several models have been 
considered to model the problem setting, and 
several different measures of quality have been 
studied. This note considers the following model: 
There are several identical machines, and jobs 
are released over time. Each job is characterized 
by its size, which is the amount of processing 
it must receive to be completed, and its release 
time, before which it cannot be scheduled. In this 
model, Leonardi and Raz studied the objective 
of minimizing the average flow time of the jobs, 
where the flow time of a job is the duration 
of time since it is released until its processing 
requirement is met. Flow time is also referred 
to as response time or sojourn time and is a 
very natural and commonly used measure of the 
quality of a schedule. 


Notations 

Let $J = \{1, 2, \ldots, n\}$ denote the set of jobs in 
the input instance. Each job $j$ is characterized by 
its release time $r_j$ and its processing requirement 
$p_j$. There is a collection of $m$ identical machines, 
each having the same processing capability. A 
schedule specifies which job executes at what 
time on each machine. Given a schedule, the 
completion time $c_j$ of a job $j$ is the earliest time 
at which job $j$ receives $p_j$ amount of service. 
The flow time $f_j$ of $j$ is defined as $c_j - r_j$. A 
schedule is said to be preemptive if a job can be 
interrupted arbitrarily and its execution can be 
resumed later from the point of interruption without 
any penalty. A schedule is non-preemptive if a 
job cannot be interrupted once it is started. In the 
context of multiple machines, a schedule is said 
to be migratory if a job can be moved from one 
machine to another during its execution without 
any penalty. In the off-line model, all the jobs $J$ 
are given in advance; in the online model, a job 
becomes known only at its release time. For scheduling 
problems, the online model is usually more realistic 
than the off-line model. 


Key Results 


For a single machine, it is a folklore result that 
the Shortest Remaining Processing Time (SRPT) 
policy, which at any time works on the job with 
the least remaining processing time, is optimal 
for minimizing the average flow time. Note that 
SRPT is an online algorithm and is a preemptive 
scheduling policy. 
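
A minimal sketch of SRPT on a single machine follows. It assumes each job is given as a (release time, size) pair with positive size; the function name `srpt_single_machine` is a hypothetical name chosen here.

```python
import heapq

def srpt_single_machine(jobs):
    """Simulate preemptive SRPT on one machine.
    jobs: list of (release_time, size); returns flow times in input order."""
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    flow = [None] * len(jobs)
    heap = []  # entries (remaining work, job index) of released, unfinished jobs
    t = 0
    i = 0
    while i < len(order) or heap:
        if not heap:
            # Machine is idle: jump to the next release time.
            t = max(t, jobs[order[i]][0])
        while i < len(order) and jobs[order[i]][0] <= t:
            j = order[i]
            heapq.heappush(heap, (jobs[j][1], j))
            i += 1
        rem, j = heapq.heappop(heap)
        next_release = jobs[order[i]][0] if i < len(order) else float("inf")
        # Run the shortest job until it finishes or a new job arrives.
        run = min(rem, next_release - t)
        t += run
        if run == rem:
            flow[j] = t - jobs[j][0]
        else:
            heapq.heappush(heap, (rem - run, j))
    return flow
```

With a job of size 3 released at time 0 and a job of size 1 released at time 1, the sketch preempts the long job and yields flow times 4 and 1 (total 5), whereas running the jobs without preemption in arrival order would give 3 and 3 (total 6).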

If no preemption is allowed, Kellerer, Tautenhahn, 
and Woeginger [6] gave an $O(n^{1/2})$ 
approximation algorithm for minimizing the flow 
time on a single machine and also showed that 
no polynomial time algorithm can have an approximation 
ratio of $n^{1/2-\varepsilon}$ for any $\varepsilon > 0$ unless 
P = NP. 

Leonardi and Raz [8] gave the first nontrivial 
results for minimizing the average flow time on 
multiple machines. Later, a simpler presentation 
of this result was given by Leonardi [7]. The main 
result of [8] is the following. 


Theorem 1 ([8]) On multiple machines, the 
SRPT algorithm is O(min(log(n/m), log P)) 
competitive for minimizing average flow time, 
where P is the maximum to minimum job size 
ratio. 


They also gave a matching lower bound (up to 
constant factors) on the competitive ratio. 


Theorem 2 ([8]) For the problem of minimizing 
flow time on multiple machines, any 
online algorithm has a competitive ratio of 
$\Omega(\min(\log(n/m), \log P))$, even when randomization 
is allowed. 


Note that minimizing the average flow time 
is equivalent to minimizing the total flow time. 
Suppose each job pays $1 at each time unit it 
is alive (i.e., unfinished); then the total payment 
received is equal to the total flow time. Summing 
up the payment over each time step, the total 
flow time can be expressed as the summation 
over the number of unfinished jobs at each time 
unit. As SRPT works on jobs that can be finished 
as soon as possible, it seems intuitively that 
it should have the least number of unfinished 
jobs at any time. While this is true for a single 
machine, it is not true for multiple machines (as 
shown in an example below). The main idea of 
[8] was to show that at any time, the number 
of unfinished jobs under SRPT is “essentially” 
no more than O(log P) times that under 
any other algorithm. To do this, they developed 
a technique of grouping jobs into a logarithmic 
number of classes according to their remaining 
sizes and arguing about the total unfinished work 
in these classes. This technique has found a lot of 
uses since then to obtain other results. To obtain 
a guarantee in terms of n, some additional ideas 
are required. 

The instance below shows how SRPT 
could deviate from optimum in the case of 
multiple machines. This instance is also the 
key component in the lower bound construction 
in Theorem 2 above. Suppose there are two 
machines, and three jobs of size 1, 1, and 2 
arrive at time $t = 0$. SRPT would schedule the 
two jobs of size 1 at $t = 0$ and then work on the 
size 2 job at time $t = 1$. Thus, it has one unit of 
unfinished work at $t = 2$. However, the optimum 
could schedule the size 2 job at time 0 and finish 
all these jobs by time 2. Now, at time $t = 2$, 
three more jobs with sizes 1/2, 1/2, and 1 arrive. 
Again, SRPT will work on the size 1/2 jobs first, and 
it can be seen that it will have two unfinished 
jobs with remaining work 1/2 each at $t = 3$, 
whereas the optimum can finish all these jobs by 
time 3. This pattern is continued by giving three 
jobs of size 1/4, 1/4, and 1/2 at $t = 3$ and so on. 
After $k$ steps, SRPT will have $k$ jobs with remaining sizes 
$1/2, 1/4, \ldots, 1/2^{k-2}, 1/2^{k-1}, 1/2^{k-1}$, 
while the optimum has no jobs remaining. Now 
the adversary can give 2 jobs of size $1/2^k$ each 
every $1/2^k$ time units for a long time, which 
implies that SRPT could be $\Omega(\log P)$ worse than 
optimum. 
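
The first rounds of this instance can be replayed with a small simulator. The sketch below assumes migratory, preemptive SRPT on m identical machines and uses exact rational arithmetic; the function name `srpt_remaining` and the (release, size) job format are hypothetical choices made here.

```python
from fractions import Fraction

def srpt_remaining(jobs, m, horizon):
    """Remaining sizes of the unfinished jobs at time `horizon` (just before
    any arrivals at that instant) when m machines always run the m jobs
    with the least remaining work. jobs: list of (release_time, size)."""
    horizon = Fraction(horizon)
    pending = sorted((Fraction(r), Fraction(p)) for r, p in jobs)
    alive = []  # remaining work of released, unfinished jobs
    t = Fraction(0)
    i = 0
    while t < horizon:
        while i < len(pending) and pending[i][0] <= t:
            alive.append(pending[i][1])
            i += 1
        alive.sort()
        running = alive[:m]
        # Next event: the horizon, the next release, or the next completion.
        nxt = horizon
        if i < len(pending):
            nxt = min(nxt, pending[i][0])
        if running:
            nxt = min(nxt, t + running[0])
        dt = nxt - t
        alive = [rem - dt for rem in running] + alive[m:]
        alive = [rem for rem in alive if rem > 0]
        t = nxt
    return sorted(alive)

half, quarter = Fraction(1, 2), Fraction(1, 4)
jobs = [(0, 1), (0, 1), (0, 2),
        (2, half), (2, half), (2, 1),
        (3, quarter), (3, quarter), (3, half)]
print(srpt_remaining(jobs, 2, 2))   # one job with one unit of work left
print(srpt_remaining(jobs, 2, 3))   # two jobs with 1/2 unit left each
```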

Leonardi and Raz also considered off-line algorithms 
for the non-preemptive setting in their 
paper. 


Theorem 3 ([8]) There is a polynomial time off-line 
algorithm that achieves an approximation 
ratio of $O(n^{1/2} \log(n/m))$ for minimizing average 
flow time on $m$ machines without preemption. 


To prove this result, they give a general technique 
to convert a preemptive schedule to a non-preemptive 
one at the loss of an $O(n^{1/2})$ factor 
in the approximation ratio. They also showed an 
almost matching lower bound. In particular, 


Theorem 4 ([8]) No polynomial time algorithm 
for minimizing the total flow time on multiple 
machines without preemption can have an approximation 
ratio of $O(n^{1/3-\varepsilon})$ for any $\varepsilon > 0$, 
unless P = NP. 


Extensions 

Since the publication of these results, they have 
been extended in several directions. Recall 
that SRPT is both preemptive and migratory. 
Awerbuch, Azar, Leonardi, and Regev [2] 
gave an online scheduling algorithm that is 
nonmigratory and still achieves a competitive 
ratio of O(min(log(n/m), log P)). Avrahami 
and Azar [1] gave an even more restricted 
O(min(log P,log(n/m))) competitive online 
algorithm. Their algorithm, in addition to being 
nonmigratory, dispatches a job immediately to 
a machine upon its arrival. Recently, Garg and 
Kumar [4, 5] have extended these results to a 
setting where machines have nonuniform speeds. 
Other related problems and settings such as 
stretch minimization (defined as the flow time 
divided by the size of a job), weighted flow time 
minimization, general cost functions such as 
weighted norms, and the non-clairvoyant setting 
where the size of a job is unknown upon its 
arrival have also been investigated. The reader 
is referred to the relatively recent survey [9] for 
more details. 


Applications 


The flow time measure considered here is one of 
the most widely used measures of quality of ser- 
vice, as it corresponds to the amount of time one 
has to wait to get the job done. The scheduling 
model considered here arises very naturally when 
there are multiple resources and several agents 
that compete for service from these resources. 
For example, consider a computing system with 
multiple homogeneous processors where jobs are 
submitted by users arbitrarily over time. Keeping 
the average response time low also keeps the 
frustration levels of the users low. The model is 
not necessarily limited to computer systems. At 
a grocery store, each cashier can be viewed as a 
machine, and the users lining up to check out can 
be viewed as jobs. The flow time of a user is the time 
spent waiting until she finishes her transaction 
with the cashier. Of course, many applications 
impose additional constraints: it may be 
infeasible to preempt jobs, or customers may expect 
a certain fairness, for example preferring to be 
served in a first-come-first-served manner at a 
grocery store. 


Open Problems 


The online algorithm of Leonardi and Raz is also 
the best-known off-line approximation algorithm 
for the problem. In particular, it is not known 
whether an O(1) approximation exists even for 
the case of two machines. Settling this would 
be very interesting. In related work, Bansal [3] 
considered the problem of finding nonmigratory 
schedules for a constant number of machines. 
He gave an algorithm that produces a $(1 + \varepsilon)$-approximate 
solution for any $\varepsilon > 0$ in time 
$n^{O(\log n/\varepsilon^2)}$. This suggests the possibility of a 
polynomial time approximation scheme for the 
problem, at least for the case of a constant number 
of machines. 


Problem Definition 


Let $S$ be a set of $n$ points in $d$-dimensional 
real space where $d \ge 1$ is an integer constant. 
A minimum spanning tree (MST) of $S$ is 
a connected acyclic graph with vertex set $S$ of 
minimum total edge length. The length of an edge 
equals the distance between its endpoints under 
some metric. Under the so-called $L_p$ metric, the 
distance between two points $x$ and $y$ with coordinates 
$(x_1, x_2, \ldots, x_d)$ and $(y_1, y_2, \ldots, y_d)$, 
respectively, is defined as the $p$th root of the sum 
$\sum_{i=1}^{d} |x_i - y_i|^p$. 
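
As an illustration of the definition, the sketch below computes distances under an L_p metric and builds a geometric MST with the quadratic-time version of Prim's algorithm on the complete graph. This is a baseline for small inputs, not one of the faster geometric algorithms; the function names are hypothetical, and treating `p = inf` as the maximum-coordinate metric is an addition made here.

```python
def lp_distance(x, y, p=2.0):
    """Distance between points x and y under the L_p metric."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def geometric_mst(points, p=2.0):
    """Prim's algorithm on the complete graph over `points` (O(n^2) time).
    Returns (list of edges as index pairs, total length)."""
    n = len(points)
    if n == 0:
        return [], 0.0
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection to the tree
    link = [None] * n           # tree endpoint realizing best[v]
    best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] is not None:
            edges.append((link[u], u))
            total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = lp_distance(points[u], points[v], p)
                if d < best[v]:
                    best[v] = d
                    link[v] = u
    return edges, total
```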


Key Results 


Since there is a very large number of papers 
concerned with geometric MSTs, only a few of 
them will be mentioned here. 

In the common Euclidean $L_2$ metric, which 
simply measures straight-line distances, the MST 
problem in two dimensions can be solved optimally 
in time $O(n \log n)$, by using the fact that the 
MST is a subgraph of the Delaunay triangulation 
of the input point set. The latter is in turn the 
dual of the Voronoi diagram of $S$, for which 
there exist several $O(n \log n)$-time algorithms. 
The term “optimally” here refers to the algebraic 
computation tree model. After computation of the 
Delaunay triangulation, the MST can be computed 
in only $O(n)$ additional time, by using a 
technique by Cheriton and Tarjan [6]. 

Even for higher dimensions, i.e., when $d > 2$, 
it holds that the MST is a subgraph of the dual 
of the Voronoi diagram; however, this fact cannot 
be exploited in the same way as in the two-dimensional 
case, because this dual may contain 
$\Omega(n^2)$ edges. Therefore, in higher dimensions, 
other geometric properties are used to reduce the 
number of edges which have to be considered. 
The first subquadratic-time algorithm for higher 
dimensions was due to Yao [15]. A more efficient 
algorithm was later proposed by Agarwal et al. 
[1]. For $d = 3$, their algorithm runs in randomized 
expected time $O((n \log n)^{4/3})$ and for $d \ge 4$, 
in expected time $O(n^{2 - 2/(\lceil d/2 \rceil + 1) + \varepsilon})$, where 
$\varepsilon$ stands for an arbitrarily small positive constant. 

The algorithm by Agarwal et al. builds on 
exploring the relationship between computing 
an MST and finding a closest pair between $n$ 
red points and $m$ blue points, which is called 
the bichromatic closest pair problem. They 
showed that if $T_d(n, m)$ denotes the time to 
solve the latter problem, then an MST can be 
computed in $O(T_d(n, n) \log^d n)$ time. Later, 
Callahan and Kosaraju [4] improved this bound to 
$O(T_d(n, n) \log n)$. Both methods achieve running 
time $O(T_d(n, n))$ if $T_d(n, n) = \Omega(n^{1+\alpha})$ for 
some $\alpha > 0$. Finally, Krznaric et al. [11] showed 
that the two problems, i.e., computing an MST 
and computing the bichromatic closest pair, have 
the same worst-case time complexity (up to 
constant factors) in the commonly used algebraic 
computation tree model and for any fixed $L_p$ 
metric. The hardest part to prove is that an MST 
can be computed in time $O(T_d(n, n))$. The other 
part, which is that the bichromatic closest pair 
problem is not harder than computing the MST, 
is easy to show: if one first computes an MST 
for the union of the $n + m$ red and blue points, 
one can then find a closest bichromatic pair in 
linear time, because at least one such pair has to 
be connected by some edge of the MST. 
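
The easy direction of this equivalence can be sketched directly: compute an MST of the union and scan its edges for the shortest one joining a red and a blue point. The sketch assumes the Euclidean metric and non-empty point sets, and it uses a simple quadratic Prim's algorithm as the MST routine; the function names are hypothetical.

```python
import math

def mst_edges(points):
    """Euclidean MST by Prim's algorithm; returns (length, u, v) triples."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    link = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def bichromatic_closest_pair(red, blue):
    """Closest red-blue pair, found by a linear scan over the MST edges."""
    points = list(red) + list(blue)
    is_red = [True] * len(red) + [False] * len(blue)
    # Every MST contains a shortest edge crossing the red/blue cut.
    d, u, v = min((d, u, v) for d, u, v in mst_edges(points)
                  if is_red[u] != is_red[v])
    if not is_red[u]:
        u, v = v, u
    return d, points[u], points[v]
```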

The algorithm proposed by Krznaric et al. [11] 
is based on the standard approach of joining trees 
in a forest with the shortest edge connecting two 
different trees, similar to the classical Kruskal’s 
and Prim’s MST algorithms for graphs. To reduce 
the number of candidates to be considered as 
edges of the MST, the algorithm works in a 
sequence of phases, where in each phase only 
edges of equal or similar length are considered, 
within a factor of 2. 

The initial forest is the set S of points, that is, 
each point of the input constitutes an individual 
edgeless tree. Then, as long as there is more than 
one tree in the forest, two trees are merged by 
producing an edge connecting two nodes, one 
from each tree. After this procedure, the edges 
produced comprise a single tree that remains in 
the forest, and this tree constitutes the output of 
the algorithm. 
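
The generic forest-merging scheme can be sketched with a union-find structure. The version below is the plain Kruskal-style baseline over all point pairs under the Euclidean metric; it does not include the phases, groups, or leaders of the algorithm by Krznaric et al., and the function name `merge_forest_mst` is hypothetical.

```python
import math
from itertools import combinations

def merge_forest_mst(points):
    """Start with one edgeless tree per point and repeatedly join two
    different trees by the shortest edge between them.
    Returns the produced edges as (i, j, length) triples."""
    parent = list(range(len(points)))

    def find(i):
        # Root of the tree containing point i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted((math.dist(points[i], points[j]), i, j)
                        for i, j in combinations(range(len(points)), 2))
    produced = []
    for length, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:               # endpoints lie in different trees
            parent[ri] = rj        # merge the two trees
            produced.append((i, j, length))
            if len(produced) == len(points) - 1:
                break              # a single tree remains
    return produced
```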

Assume that the next edge that the algorithm 
is going to produce has length $l$. Each tree $T$ in 
the forest is partitioned into groups of nodes, each 
group having a specific node representing the 
group. The representative node in such a group 
is called a leader. Furthermore, every node in a 
group including the leader has the property that it 
lies within distance $\varepsilon \cdot l$ from its leader, where $\varepsilon$ 
is a real constant close to zero. 

Instead of considering all pairs of nodes which 
can be candidates for the next edge to produce, 
first, only pairs of leaders are considered. Only if 
a pair of leaders belong to different trees and the 
distance between them is approximately $l$ is 
the closest pair of points between their two respective 
groups computed, using the algorithm 
for the bichromatic closest pair problem. 

Also, the following invariant is maintained: for 
any phase producing edges of length $\Theta(l)$ and 
for any leader, there is only a constant number 
of other leaders at distance $\Theta(l)$. Thus, the total 
number of pairs of leaders to consider is only 
linear in the number of leaders. 

Nearby leaders for any given leader can be 
found efficiently by using bucketing techniques 
and data structures for dynamic closest pair 
queries [3], together with extra artificial points 
which can be inserted and removed for probing 
purposes at various small boxes at distance $O(l)$ 
from the leader. In order to maintain the invariant, 
when moving to subsequent phases, one reduces 
the number of leaders accordingly, as pairs of 
nearby groups merge into single groups. Another 
tool which is also needed to consider the right 
types of pairs is to organize the groups according 
to the various directions in which there can be 
new candidate MST edges adjacent to nodes in 
the group. For details, please see the original 
paper by Krznaric et al. [11]. 

There is a special version of the bichromatic 
closest point problem which was shown by Krz- 
naric et al. [11] to have the same worst-case time 
complexity as computing an MST, namely, the 
problem for the special case when both the set 
of red points and the set of blue points have a 
very small diameter compared with the distance 
between the closest bichromatic pair. This ratio 
can be made arbitrarily small by choosing a suitable 
$\varepsilon$ as the parameter for creating the groups and 
leaders mentioned above. This fact was exploited 
in order to derive more efficient algorithms for 
the three-dimensional case. 

For example, in the $L_1$ metric, it is possible to 
build in time $O(n \log n)$ a special kind of planar 
Voronoi diagram for the blue points on a plane 
separating the blue from the red points having 
the following property: for each query point $q$ in 
the half-space including the red points, one can 
use this Voronoi diagram to find in time $O(\log n)$ 
the blue point which is closest to $q$ under the 
$L_1$ metric. (This planar Voronoi diagram can 
be seen as defined by the vertical projections 
of the blue points onto the plane containing the 
diagram, and the size of a Voronoi cell depends 
on the distance between the corresponding blue 
point and the plane.) So, by using subsequently 
every red point as a query point for this data 
structure, one can solve the bichromatic closest 
pair problem for such well-separated red-blue 
sets in total $O(n \log n)$ time. 

By exploiting and building upon this idea, 
Krznaric et al. [11] showed how to find an MST 
of $S$ in optimal $O(n \log n)$ time under the $L_1$ 
and $L_\infty$ metrics when $d = 3$. This is an improvement 
over previous bounds due to Gabow 
et al. [10] and Bespamyatnikh [2], who proved 
that, for $d = 3$, an MST can be computed in 
$O(n \log n \log\log n)$ time under the $L_1$ and $L_\infty$ 
metrics. 

The main results of Krznaric et al. [11] are 
summarized in the following theorem. 


Theorem In the algebraic computation tree 
model, for any fixed Lp metric and for any fixed 
number of dimensions, computing the MST has 
the same worst-case complexity, within constant 
factors, as solving the bichromatic closest pair 
problem. Moreover, for three-dimensional space 
under the $L_1$ and $L_\infty$ metrics, the MST (as well 
as the bichromatic closest pair) can be computed 
in optimal O(nlog n) time. 


Approximate and Dynamic Solutions 


Callahan and Kosaraju [4] showed that a spanning 
tree of length within a factor $1 + \varepsilon$ from that 
of an MST can be computed in time 
$O(n(\log n + \varepsilon^{-d/2} \log \varepsilon^{-1}))$. Approximation algorithms with 
worse trade-off between time and quality had 
earlier been developed by Clarkson [7], Vaidya 
[14] and Salowe [13]. In addition, if the input 
point set is supported by certain basic data struc- 
tures, then the approximate length of an MST 
can be computed in randomized sublinear time 
[8]. Eppstein [9] gave fully dynamic algorithms 
that maintain an MST when points are inserted or 
deleted. 


Applications 


MSTs belong to the most basic structures in com- 
putational geometry and in graph theory, with a 
vast number of applications. 


Open Problems 


Although the complexity of computing MSTs 
is settled in relation to computing bichromatic 
closest pairs, this means also that it remains 
open for all cases where the complexity of com- 
puting bichromatic closest pairs remains open, 
e.g., when the number of dimensions is greater 
than 3. 


Experimental Results 


Narasimhan and Zachariasen [12] have reported 
experiments with computing geometric MSTs via 
well-separated pair decompositions. More recent 
experimental results are reported by Chatterjee, 
Connor, and Kumar [5]. 


Problem Definition 


The following classical optimization problem 
is considered: for a given undirected weighted 
geometric network, find its minimum-cost 
subnetwork that satisfies a priori given multi- 
connectivity requirements. This problem 
restricted to geometric networks is considered 
in this entry. 


Notations 

Let $G = (V, E)$ be a geometric network, whose 
vertex set $V$ corresponds to a set of $n$ points 
in $\mathbb{R}^d$ for a certain integer $d$, $d \ge 2$, and whose 
edge set $E$ corresponds to a set of straight-line 
segments connecting pairs of points in $V$. $G$ is 
called complete if $E$ connects all pairs of points 
in $V$. 

The cost $\delta(x, y)$ of an edge connecting a pair 
of points $x, y \in \mathbb{R}^d$ is equal to the Euclidean 
distance between points $x$ and $y$, that is, 
$\delta(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$, where $x = (x_1, \ldots, x_d)$ and 
$y = (y_1, \ldots, y_d)$. More generally, the cost 
$\delta(x, y)$ could be defined using other norms, such 
as $\ell_p$ norms for any $p \ge 1$, i.e., 
$\delta(x, y) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{1/p}$. The cost of the network is 
equal to the sum of the costs of the edges of the 
network, $\mathrm{cost}(G) = \sum_{(x,y) \in E} \delta(x, y)$. 

A network $G = (V, E)$ spans a set $S$ of points 
if $V = S$. $G = (V, E)$ is $k$-vertex connected 
if for any set $U \subseteq V$ of fewer than $k$ vertices, 
the network $(V \setminus U, E \cap ((V \setminus U) \times (V \setminus U)))$ 
is connected. Similarly, $G$ is $k$-edge connected if 
for any set $\mathcal{E} \subseteq E$ of fewer than $k$ edges, the 
network $(V, E \setminus \mathcal{E})$ is connected. 
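
The two connectivity definitions can be transcribed literally into a brute-force check, which is useful only for very small networks since it enumerates all removal sets. The sketch assumes an undirected network given as a vertex collection and a list of vertex pairs (parallel edges allowed); the function names are hypothetical.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True if the undirected graph induced on `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices

def is_k_vertex_connected(vertices, edges, k):
    """Removing any set of fewer than k vertices leaves a connected network."""
    vertices = list(vertices)
    for size in range(k):
        for removed in combinations(vertices, size):
            if not is_connected(set(vertices) - set(removed), edges):
                return False
    return True

def is_k_edge_connected(vertices, edges, k):
    """Removing any set of fewer than k edges leaves a connected network."""
    edges = list(edges)
    for size in range(k):
        # Edges are removed by position so that parallel edges are distinct.
        for removed in combinations(range(len(edges)), size):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if not is_connected(vertices, kept):
                return False
    return True
```

For example, two triangles sharing a single vertex form a network that is 2-edge connected but not 2-vertex connected, since deleting the shared vertex separates the triangles.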


The (Euclidean) minimum-cost k-vertex-connected 
spanning network problem: for a 
given set $S$ of $n$ points in the Euclidean space 
$\mathbb{R}^d$, find a minimum-cost $k$-vertex-connected 
Euclidean network spanning points in $S$. 


The (Euclidean) minimum-cost k-edge-connected 
spanning network problem: for a 
given set $S$ of $n$ points in the Euclidean space 
$\mathbb{R}^d$, find a minimum-cost $k$-edge-connected 
Euclidean network spanning points in $S$. 

A variant that allows parallel edges is also 
considered: 


The (Euclidean) minimum-cost k-edge-connected 
spanning multi-network problem: 
for a given set $S$ of $n$ points in the Euclidean 
space $\mathbb{R}^d$, find a minimum-cost $k$-edge-connected 
Euclidean multi-network spanning 
points in $S$ (where the multi-network can have 
parallel edges). 

The concept of minimum-cost k-connectivity 
naturally extends to include that of Euclidean 
Steiner k-connectivity by allowing the use of 
additional vertices, called Steiner points. For a 
given set $S$ of points in $\mathbb{R}^d$, a geometric network 
$G$ is Steiner $k$-vertex connected (or Steiner $k$-edge 
connected) for $S$ if the vertex set of $G$ is a 
superset of $S$ and for every pair of points from 
$S$ there are $k$ internally vertex-disjoint (edge-disjoint, 
respectively) paths connecting them in $G$. 


The (Euclidean) minimum-cost Steiner k- 
vertex/edge connectivity problem: find a 
minimum-cost network on a superset of S that is 
Steiner k-vertex/edge connected for S. 

Note that for k = 1, it is simply the Steiner 
minimal tree problem, which has been very ex- 
tensively studied in the literature (see, e.g., [15]). 

In a more general formulation of multi- 
connectivity graph problems, nonuniform 
connectivity constraints have to be satisfied. 


The survivable network design problem: for a 
given set $S$ of points in $\mathbb{R}^d$ and a connectivity 
requirement function $r : S \times S \to \mathbb{N}$, 
find a minimum-cost geometric network spanning 
points in $S$ such that for any pair of vertices 
$p, q \in S$ the subnetwork has $r_{p,q}$ internally 
vertex-disjoint (or edge-disjoint, respectively) 
paths between $p$ and $q$. 

In many applications of this problem, often 
regarded as the most interesting ones [10, 14], 
the connectivity requirement function is specified 
with the help of a one-argument function which 
assigns to each vertex $p$ its connectivity type 
$r_p \in \mathbb{N}$. Then, for any pair of vertices $p, q \in S$, 
the connectivity requirement $r_{p,q}$ is simply given 
as $\min\{r_p, r_q\}$ [13, 14, 18, 19]. This includes the 
Steiner tree problem (see, e.g., [2]), in which 
$r_p \in \{0, 1\}$ for any vertex $p \in S$. 

A polynomial-time approximation scheme 
(PTAS) is a family of algorithms $\{A_\varepsilon\}$ such that, 
for each fixed $\varepsilon > 0$, $A_\varepsilon$ runs in time polynomial 
in the size of the input and produces a $(1 + \varepsilon)$-approximation. 


Related Work 

For a very extensive presentation of results 
concerning problems of finding minimum- 
cost k-vertex- and k-edge-connected spanning 
subgraphs, nonuniform connectivity, connectivity 
augmentation problems, and geometric problems, 
see [1, 3, 4, 12, 16]. 

Despite the practical relevance of the multi- 
connectivity problems for geometrical networks 
and the vast amount of practical heuristic results 
reported (see, e.g., [13, 14, 18, 19]), very little 
theoretical research had been done towards de- 
veloping efficient approximation algorithms for 
these problems until a few years ago. This con- 
trasts with the very rich and successful theoretical 
investigations of the corresponding problems in 
general metric spaces and for general weighted 
graphs. And so, until 1998, even for the simplest
and most fundamental multi-connectivity problem,
that of finding a minimum-cost 2-vertex-connected
network spanning a given set of points in the
Euclidean plane, obtaining approximations
achieving better than a 3/2 ratio had been elusive
(the ratio 3/2 is the best polynomial-time
approximation ratio known for general networks
whose weights satisfy the triangle inequality [9];
for other results, see, e.g., [5, 16]).


Key Results


The first result is an extension of the well-known
NP-hardness result of minimum-cost
2-connectivity in general graphs (see, e.g., [11])
to geometric networks.


Theorem 1 The problem of finding a minimum-cost
2-vertex-/2-edge-connected geometric network
spanning a set of n points in the plane is
NP-hard.


The next result shows that if one considers the
minimum-cost multi-connectivity problems in
sufficiently high dimension, the problems become
APX-hard.


Theorem 2 ([6]) There exists a constant ξ > 0
such that it is NP-hard to approximate within
1 + ξ the minimum-cost 2-connected geometric
network spanning a set of n points in R^{⌈log_2 n⌉}.


This result also extends to any ℓ_p norm.


Theorem 3 ([6]) For integer d ≥ log n and for
any fixed p ≥ 1, there exists a constant ξ > 0
such that it is NP-hard to approximate within
1 + ξ the minimum-cost 2-connected network
spanning a set of n points in the ℓ_p metric in R^d.


Since the minimum-cost multi-connectivity 
problems are hard, research turned to
the study of approximation algorithms. By
combining some of the ideas developed for the 
polynomial-time approximation algorithms for 
TSP due to Arora [2] (see also [17]) together 
with several new ideas developed specifically for 
the multi-connectivity problems in geometric 
networks, Czumaj and Lingas obtained the 
following results. 


Theorem 4 ([6,7]) Let k and d be any integers,
k, d ≥ 2, and let ε be any positive real. Let S be a
set of n points in R^d. There is a randomized
algorithm that in time

n · (log n)^{(kd/ε)^{O(d)}} · 2^{2^{(kd/ε)^{O(d)}}}

with probability at least 0.99 finds a k-vertex-
connected (or k-edge-connected) spanning network
for S whose cost is at most (1 + ε) times the
optimum.

Furthermore, this algorithm can be derandomized
in polynomial time to return a k-vertex-connected
(or k-edge-connected) spanning network for S
whose cost is at most (1 + ε) times the optimum.


Observe that when d, k, and ε are all constant,
the running time is n · log^{O(1)} n.

The results in Theorem 4 give a PTAS for 
small values of k and d. 


Theorem 5 (PTAS for vertex/edge connectivity
[6,7]) Let d ≥ 2 be any constant integer. There is
a certain positive constant c < 1 such that for all
k such that k ≤ (log log n)^c, the problems of
finding a minimum-cost k-vertex-connected spanning
network and a k-edge-connected spanning network
for a set of points in R^d admit PTAS.


The next theorem deals with multi-networks 
where feasible solutions are allowed to use paral- 
lel edges. 


Theorem 6 ([7]) Let k and d be any integers,
k, d ≥ 2, and let ε be any positive real. Let S
be a set of n points in R^d. There is a randomized
algorithm that in time

n · log n · (d/ε)^{O(d)} + n · 2^{(k^{O(1)} · (d/ε)^{O(d^2)})},

with probability at least 0.99
finds a k-edge-connected spanning multi-network
for S whose cost is at most (1 + ε) times the
optimum. The algorithm can be derandomized in
polynomial time.


Combining this theorem with the fact that par- 
allel edges can be eliminated in case k = 2, one 
obtains the following result for 2-connectivity in 
networks. 


Theorem 7 (Approximation schemes for
2-connected graphs, [7]) Let d be any integer,
d ≥ 2, and let ε be any positive real. Let S be
a set of n points in R^d. There is a randomized
algorithm that in time

n · log n · (d/ε)^{O(d)} + n · 2^{(d/ε)^{O(d^2)}},

with probability at least 0.99
finds a 2-vertex-connected (or 2-edge-connected)
spanning network for S whose cost is at most
(1 + ε) times the optimum. This algorithm can be
derandomized in polynomial time.


For constant d, the running time of the
randomized algorithms is
n log n · (1/ε)^{O(1)} + 2^{(1/ε)^{O(1)}}.


Theorem 8 ([8]) Let d be any integer, d ≥ 2,
and let ε be any positive real. Let S be a set
of n points in R^d. There is a randomized
algorithm that in time

n · log n · (d/ε)^{O(d)} + n · 2^{(d/ε)^{O(d^2)}} + n · 2^{2^{d^{d^{O(1)}}}},

with probability at least
0.99 finds a Steiner 2-vertex-connected (or
2-edge-connected) spanning network for S whose
cost is at most (1 + ε) times the optimum. This
algorithm can be derandomized in polynomial time.


Theorem 9 ([8]) Let d be any integer, d ≥ 2,
and let ε be any positive real. Let S be a set of
n points in R^d. There is a randomized algorithm
that in time

n · log n · (d/ε)^{O(d)} + n · 2^{(d/ε)^{O(d^2)}} + n · 2^{2^{d^{d^{O(1)}}}},

with probability at least 0.99 gives a
(1 + ε)-approximation for the geometric network
survivability problem with r_v ∈ {0, 1, 2} for any
v ∈ V. This algorithm can be derandomized in
polynomial time.


Applications 


Multi-connectivity problems are central in
algorithmic graph theory and have numerous
applications in computer science and operations
research; see, e.g., [1, 12, 14, 19]. They also play
a very important role in the design of networks
that arise in practical situations; see, e.g., [1, 14].
Typical application areas include telecommuni- 
cation, computer, and road networks. Low degree 
connectivity problems for geometrical networks 
in the plane can often closely approximate such 
practical connectivity problems (see, e.g., the dis- 
cussion in [14, 18, 19]). The survivable network 
design problem in geometric networks also arises 
in many applications, e.g., in telecommunication, 
communication network design, VLSI design, 
etc. [13, 14, 18, 19]. 


Open Problems 


The results discussed above lead to efficient
algorithms only for small connectivity requirements
k; the running time is polynomial only for
values of k up to (log log n)^c for a certain positive
constant c < 1. It is an interesting open problem
whether one can obtain polynomial-time
approximation schemes also for large values of k.
It is also an interesting open problem whether the
multi-connectivity problems in geometric networks
admit practically fast approximation schemes.


Problem Definition


The minimum spanning tree (MST) problem is,
given a connected, weighted, and undirected
graph G = (V, E, w), to find the tree with
minimum total weight spanning all the vertices
V. Here, w : E → R is the weight function. The
problem is frequently defined in geometric terms,
where V is a set of points in d-dimensional space
and w corresponds to Euclidean distance. The
main distinction between these two settings is the
form of the input. In the graph setting, the input
has size O(m + n) and consists of an enumeration
of the n = |V| vertices and m = |E| edges and
edge weights. In the geometric setting, the input
consists of an enumeration of the coordinates of
each point (O(dn) space): all (n choose 2) edges
are implicitly present and their weights implicit in
the point coordinates. See [16] for a discussion of
the Euclidean minimum spanning tree problem.


History 

The MST problem is generally recognized [7, 12] 
as one of the first combinatorial problems studied 
specifically from an algorithmic perspective. It 
was formally defined by Borůvka in 1926 [1]
(predating the fields of computability theory and 
combinatorial optimization and even much of 
graph theory), and since his initial algorithm, 
there has been a sustained interest in the problem. 
The MST problem has motivated research in 
matroid optimization [3] and the development 
of efficient data structures, particularly priority 
queues (aka heaps) and disjoint set structures 
[2, 18]. 


Related Problems 

The MST problem is frequently contrasted with 
the traveling salesman and minimum Steiner tree 
problems [6]. A Steiner tree is a tree that may 
span any superset of the given points; that is,
additional points may be introduced that reduce
the weight of the minimum spanning tree. The 
traveling salesman problem asks for a tour (cycle) 
of the vertices with minimum total length. The 
generalization of the MST problem to directed 
graphs is sometimes called the minimum branch- 
ing [5]. Whereas the undirected and directed 
versions of the MST problem are solvable in 
polynomial time, traveling salesman and mini- 
mum Steiner tree are NP-complete [6]. 


Optimality Conditions 
A cut is a partition (V′, V″) of the vertices V. An
edge (u, v) crosses the cut (V′, V″) if u ∈ V′ and
v ∈ V″. A sequence (v_0, v_1, ..., v_{k−1}, v_0) is a
cycle if (v_i, v_{(i+1) mod k}) ∈ E for 0 ≤ i < k.
The correctness of all MST algorithms is
established by appealing to the dual cut and cycle
properties, also known as the blue rule and red
rule [18].


Cut Property An edge is in some minimum span- 
ning tree if and only if it is the lightest edge 
crossing some cut. 

Cycle Property An edge is not in any minimum 
spanning tree if and only if it is the sole 
heaviest edge on some cycle. 


It follows from the cut and cycle properties
that if the edge weights are unique, then there
is a unique minimum spanning tree, denoted
MST(G). Uniqueness can always be enforced
by breaking ties in any consistent manner.
MST algorithms frequently appeal to a useful
corollary of the cut and cycle properties called
the contractibility property. Let G\C denote
the graph derived from G by contracting the
subgraph C, that is, C is replaced by a single
vertex c and all edges incident to exactly one
vertex in C become incident to c; in general,
G\C may have more than one edge between two
vertices.

Contractibility Property If C is a subgraph such
that for all pairs of edges e and f with exactly
one endpoint in C, there exists a path P ⊆ C
connecting e and f with each edge in P lighter
than either e or f, then C is contractible. For
any contractible C, it holds that MST(G) =
MST(C) ∪ MST(G\C).


The Generic Greedy Algorithm

Until recently, all MST algorithms could be
viewed as mere variations on the following
generic greedy MST algorithm. Let 𝒯 consist
initially of n trivial trees, each containing a
single vertex of G. Repeat the following step
n − 1 times. Choose any T ∈ 𝒯 and find the
minimum weight edge (u, v) with u ∈ T and v
in a different tree, say T′ ∈ 𝒯. Replace T and T′
in 𝒯 with the single tree T ∪ {(u, v)} ∪ T′. After
n − 1 iterations, 𝒯 = {MST(G)}. By the cut
property, every edge selected by this algorithm is
in the MST.
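
The generic greedy step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is connected, vertices are numbered 0..n−1, edge weights are distinct, and the names generic_greedy_mst and pick are hypothetical. The pick argument models "choose any T"; the naive scan over all edges makes no attempt at efficiency.

```python
def generic_greedy_mst(n, edges, pick=min):
    """Generic greedy MST on a connected graph.

    edges: list of (weight, u, v) tuples with distinct weights.
    pick:  chooses which current tree to grow, given the set of tree labels.
    """
    comp = list(range(n))      # comp[v] = label of the tree containing v
    chosen = []
    for _ in range(n - 1):
        t = pick(set(comp))    # "choose any T"
        # Edges with exactly one endpoint in T cross the cut (T, V \ T).
        crossing = [e for e in edges if (comp[e[1]] == t) != (comp[e[2]] == t)]
        w, u, v = min(crossing)            # lightest edge leaving T
        chosen.append((w, u, v))
        other = comp[u] if comp[v] == t else comp[v]
        comp = [t if c == other else c for c in comp]   # merge T' into T
    return chosen


edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(sorted(generic_greedy_mst(4, edges)))
```

Always growing the same tree (the default pick) behaves like Jarník's algorithm; other choices of pick select the same edge set when the weights are distinct, only in a different order.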


Modeling MST Algorithms 

Another corollary of the cut and cycle properties 
is that the set of minimum spanning trees of a 
graph is determined solely by the relative order 
of the edge weights — their specific numerical 
values are not relevant. Thus, it is natural to 
model MST algorithms as binary decision trees, 
where nodes of the decision tree are identified 
with edge weight comparisons and the children of 
a node correspond to the possible outcomes of the
comparison. In this decision tree model, a trivial 
lower bound on the time of the optimal MST 
algorithm is the depth of the optimal decision 
tree. 


Key Results 


The primary result of [14] is an explicit MST 
algorithm that is provably optimal even though its 
asymptotic running time is currently unknown. 


Theorem 1 There is an explicit, deterministic
minimum spanning tree algorithm whose running
time is on the order of D_MST(m, n), where m is the
number of edges, n the number of vertices, and
D_MST(m, n) the maximum depth of an optimal
decision tree for any m-edge n-node graph.


It follows that the Pettie-Ramachandran
algorithm [14] is asymptotically no worse than any
MST algorithm that deduces the solution through
edge weight comparisons. The best known upper
bound on D_MST(m, n) is O(m α(m, n)), due to
Chazelle [2]. It is trivially Ω(m).

Let us briefly describe how the Pettie-Ramachandran
algorithm works. An (m, n)
instance is a graph with m edges and n vertices.
Theorem 1 is proved by giving a linear time
decomposition procedure that reduces any (m, n)
instance of the MST problem to instances of size
(m*, n*), (m_1, n_1), ..., (m_s, n_s), where
m = m* + Σ_i m_i, n = Σ_i n_i,
n* ≤ n / log log log n, and each n_i ≤ log log log n.
The (m*, n*) instance can be solved in O(m + n)
time with existing MST algorithms [2]. To solve
the other instances, the Pettie-Ramachandran
algorithm performs a brute-force search to find
the minimum depth decision tree for every
graph with at most log log log n vertices. Once
these decision trees are found, the remaining
instances are solved in O(Σ_i D_MST(m_i, n_i)) =
O(D_MST(m, n)) time. Due to the restricted size
of these instances (n_i ≤ log log log n), the time
for a brute-force search is a negligible o(n).
The decomposition procedure makes use of
Chazelle's soft heap [2] (an approximate priority
queue) and an extension of the contractibility
property.


Approximate Contractibility Let G′ be derived
from G by increasing the weight of some
edges. If C is contractible w.r.t. G′, then
MST(G) = MST(MST(C) ∪ MST(G\C) ∪ E*),
where E* is the set of edges with increased
weights.


A secondary result of [14] is that the running 
time of the optimal algorithm is actually linear on 
nearly every graph topology, under any permuta- 
tion of the edge weights. 


Theorem 2 Let G be selected uniformly at random
from the set of all n-vertex, m-edge graphs.
Then regardless of the edge weights, MST(G)
can be found in O(m + n) time with probability
1 − 2^{−Ω(m/α^2)}, where α = α(m, n) is the slowly
growing inverse Ackermann function.


Theorem 1 should be contrasted with the results
of Karger, Klein, and Tarjan [9] and Chazelle [2] 
on the randomized and deterministic complexity 
of the MST problem. 


Minimum Spanning Trees 


Theorem 3 ([9]) The minimum spanning forest
of a graph with m edges can be computed by a
randomized algorithm in O(m) time with
probability 1 − 2^{−Ω(m)}.


Theorem 4 ([2]) The minimum spanning tree of
a graph can be computed in O(m α(m, n)) time
by a deterministic algorithm, where α is the
inverse Ackermann function.


Applications 


Borůvka [1] invented the MST problem while
considering the practical problem of electrify- 
ing rural Moravia (present-day Czech Republic) 
with the shortest electrical network. MSTs are 
used as a starting point for heuristic approxi- 
mations to the optimal traveling salesman tour 
and optimal Steiner tree, as well as other net- 
work design problems. MSTs are a component 
in other graph optimization algorithms, notably 
the single-source shortest path algorithms of Tho- 
rup [19] and Pettie-Ramachandran [15]. MSTs 
are used as a tool for visualizing data that is 
presumed to have a tree structure; for example, 
if a matrix contains dissimilarity data for a set 
of species, the minimum spanning tree of the 
associated graph will presumably group closely 
related species; see [7]. Other modern uses of 
MSTs include modeling physical systems [17] 
and image segmentation [8]; see [4] for more 
applications. 


Open Problems 


The chief open problem is to determine the de- 
terministic complexity of the minimum spanning 
tree problem. By Theorem 1, this is tantamount
to determining the decision tree complexity of the 
MST problem. 


Experimental Results 


Moret and Shapiro [11] evaluated the performance
of greedy MST algorithms using a variety
of priority queues. They concluded that the best
MST algorithm is Jarník's [7] (also attributed
to Prim and Dijkstra; see [3, 7, 12]) as
implemented with a pairing heap [13]. Katriel,
Sanders, and Träff [10] designed and implemented
a non-greedy randomized MST algorithm based on that
of Karger et al. [9]. They concluded that on 
moderately dense graphs, it runs substantially 
faster than the greedy algorithms tested by Moret 
and Shapiro.


Problem Definition


Given a set S of n points in the Euclidean 
plane, a triangulation T of S is a maximal set 
of nonintersecting straight-line segments whose
endpoints are in S. The weight of T is defined
as the total Euclidean length of all edges in T. 
A triangulation that achieves minimum weight 
is called a minimum weight triangulation, often 
abbreviated MWT, of S. 


Key Results 


Since there is a very large number of papers and
results dealing with minimum weight triangulation,
only a few of them can be mentioned here.

Mulzer and Rote have shown that MWT is NP- 
hard [12]. Their proof of NP-completeness is not 
given explicitly; it relies on extensive calculations 
which they performed with a computer. Remy 
and Steger have shown a quasi-polynomial time 
approximation scheme for MWT [13]. These re- 
sults are stated in the following theorem: 


Theorem 1 The problem of computing the MWT
(minimum weight triangulation) of an input set
S of n points in the plane is NP-hard. However,
for any constant ε > 0, a triangulation of S
achieving the approximation ratio of 1 + ε can be
computed in time n^{O(log^8 n)}.


The complexity status of the symmetric prob- 
lem of finding the maximum weight triangulation 
is still open, but there exists a quasi-polynomial 
time approximation scheme for it [10]. 


The Quasi-Greedy Triangulation 
Approximates the MWT 

Levcopoulos and Krznaric showed that a trian- 
gulation of total length within a constant factor 
of MWT can be computed in polynomial time 
for arbitrary point sets [7]. The triangulation 
achieving this result is a modification of the so- 
called greedy triangulation. The greedy triangu- 
lation starts with the empty set of diagonals and 
keeps adding a shortest diagonal not intersecting 
the diagonals which have already been added, 
until a full triangulation is produced. The greedy 
triangulation has been shown to approximate the 
minimum weight triangulation within a constant 
factor, unless a special case arises where the
greedy diagonals inserted are “climbing” in a
special, very unbalanced way along a relatively 
long concave chain containing many vertices and 
with a large empty space in front of it, at the same 
time blocking visibility from another, opposite 
concave chain of many vertices. In such “bad” 
cases, the worst-case ratio between the length 
of the greedy and the length of the minimum 
weight triangulation is shown to be Θ(√n).
To obtain a triangulation which always approx- 
imates the MWT within a constant factor, it 
suffices to take care of this special bad case in 
order to avoid the unbalanced “climbing,” and 
replace it by a more balanced climbing along 
these two opposite chains. Each edge inserted 
in this modified method is still almost as short 
as the shortest diagonal, within a factor smaller 
than 1.2. Therefore, the modified triangulation 
which always approximates the MWT is named 
the quasi-greedy triangulation. In a similar way 
as the original greedy triangulation, the quasi- 
greedy triangulation can be computed in time 
O(n logn) [8]. Gudmundsson and Levcopoulos 
[5] showed later that a variant of this method 
can also be parallelized, thus achieving a constant 
factor approximation of MWT in O(log n) time,
using O(n) processors in the CRCW PRAM 
model. Another by-product of the quasi-greedy 
triangulation is that one can easily select in linear 
time a subset of its edges to obtain a convex 
partition which is within a constant factor of the 
minimum length convex partition of the input 
point set. This last property was crucial in the 
proof that the quasi-greedy triangulation approx- 
imates the MWT. The proof also uses an older 
result that the (original, unmodified) greedy trian- 
gulation of any convex polygon approximates the 
minimum weight triangulation [9]. Some of the 
results from [7] and from [8] can be summarized 
in the following theorem: 


Theorem 2 Let S be an input set of n points 
in the plane. The quasi-greedy triangulation 
of S, which is a slightly modified version of 
the greedy triangulation of S, has total length 
within a constant factor of the length of the MWT 
(minimum weight triangulation) of S and can 
be computed in time O(n log n). Moreover, the
(unmodified) greedy triangulation of S has length
within O(√n) of the length of MWT of S, and
this bound is asymptotically tight in the worst 
case. 
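
The (unmodified) greedy triangulation described above can be sketched as follows. This is a naive illustration, not the O(n log n) method of [8]: it assumes distinct points given as coordinate tuples with no three collinear, and the function names are hypothetical. Candidate segments are tried in order of increasing length, and a segment is kept if it does not properly cross a segment kept earlier.

```python
from itertools import combinations
from math import dist


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(p, q, r, s):
    """True if segments pq and rs cross at a point interior to both."""
    if len({p, q, r, s}) < 4:      # sharing an endpoint is allowed
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def greedy_triangulation(points):
    """Edges of the greedy triangulation (general position assumed)."""
    kept = []
    candidates = sorted(combinations(points, 2), key=lambda e: dist(*e))
    for p, q in candidates:
        if not any(crosses(p, q, r, s) for r, s in kept):
            kept.append((p, q))
    return kept


rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(len(greedy_triangulation(rectangle)))   # four sides plus one diagonal
```

With integer coordinates the orientation test is exact; for floating-point input a robust predicate would be needed.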


Computing the Exact Minimum Weight 
Triangulation 

Below, three approaches to compute the exact
MWT are briefly discussed. These approaches
assume that it is numerically possible to effi- 
ciently compare the total length of sets of line 
segments in order to select the set of smallest 
weight. This is a simplifying assumption, since 
this is an open problem per se. However, the prob- 
lem of computing the exact MWT remains NP- 
hard even under this assumption [12]. The three 
approaches differ with respect to the creation and 
selection of subproblems, which are then solved 
by dynamic programming. 

The first approach, sketched by Lingas [11],
employs a general method for computing optimal
subgraphs of the complete Euclidean graph. By
developing this approach, it is possible to achieve
subexponential time 2^{O(√n log n)}. The idea is to
create the subproblems which are solved by
dynamic programming. This is done by trying all
(suitable) planar separators of length O(√n),
separating the input point set in a balanced way,
and then proceeding recursively within the
resulting subproblems.

The second approach uses fixed-parameter 
algorithms. So, for example, if there are only 
O(log n) points in the interior of the convex hull 
of S, then the MWT of S can be computed in 
polynomial time [4]. This approach extends also 
to compute the minimum weight triangulation 
under the constraint that the outer boundary 
is not necessarily the convex hull of the input 
vertices; it can be an arbitrary polygon. Some 
of these algorithms have been implemented; see 
Grantson et al. [2] for a comparison of some 
implementations. These dynamic programming 
approaches take typically cubic time with respect 
to the points of the boundary but exponential time 
with respect to the number of remaining points. 
So, for example, if k is the number of hole points
inside the boundary polygon, then an algorithm,
which has also been implemented, can compute
the exact MWT in time O(n^3 · 2^k · k) [2].
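
For intuition about the cubic dependence on the boundary points, consider the simplest special case with no hole points and a convex boundary. There the classical interval dynamic program applies; the sketch below is that textbook recurrence, not the algorithm of [2], and the function name is hypothetical. It returns the total length of the diagonals only, since the polygon sides belong to every triangulation.

```python
from functools import lru_cache
from math import dist


def convex_polygon_mwt(poly):
    """Minimum total diagonal length over all triangulations of a convex
    polygon whose vertices are listed in boundary order."""
    n = len(poly)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest triangulation of the sub-polygon poly[i..j], including
        # the chord (i, j) when that chord is a diagonal.
        if j - i < 2:
            return 0.0
        best = min(cost(i, k) + cost(k, j) for k in range(i + 1, j))
        # The pair (0, n-1) is a polygon side, not a diagonal.
        chord = dist(poly[i], poly[j]) if j - i < n - 1 else 0.0
        return best + chord

    return cost(0, n - 1)


quad = [(0, 0), (4, 0), (4, 1), (0, 3)]
print(convex_polygon_mwt(quad))   # length of the shorter diagonal
```

There are O(n^2) subproblems with O(n) choices each, which gives the cubic running time.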

In an attempt to solve larger problems, a dif- 
ferent approach uses properties of MWT which 
usually help to identify, for random point sets, 
many edges that must be, respectively cannot be, 
in MWT. One can then use dynamic program- 
ming to fill in the remaining MWT edges. For 
random sets consisting of tens of thousands of 
points from the uniform distribution, one can thus 
compute the exact MWT in minutes [1]. 


Applications 


The problem of computing a triangulation arises, 
for example, in finite element analysis, terrain 
modeling, stock cutting, and numerical approxi- 
mation [3,6]. The minimum weight triangulation 
has attracted the attention of many researchers, 
mainly due to its natural definition of optimality, 
and because it has proved to be a challenging 
problem over the past 30 years, with unknown 
complexity status until the end of 2005. 


Open Problems 


All results mentioned leave open problems. 
For example, can one find a simpler proof 
of NP-completeness, which can be checked 
without running computer programs? It would 
be desirable to improve the approximation 
constant which can be achieved in polynomial 
time (to simplify the proof, the constant 
shown in [7] is not explicitly calculated and it 
would be relatively large, if the proof is not 
refined). The time bound for the approximation 
scheme could hopefully be improved. It could 
also be possible to refine the software which 
computes efficiently the exact MWT for large 
random point sets, so that it can handle 
efficiently a wider range of input, i.e., not 
only completely random point sets. This could 
perhaps be done by combining this software with 
implementations of fixed-parameter algorithms, 
as the ones reported in [2,4], or with other 
approaches. It is also open whether or not the
subexponential exact method can be further
improved. 


Experimental Results 


Please see the last paragraph under the section 
about key results. 


URL to Code 


Link to code used to compare some dynamic
programming approaches in [2]:
http://fuzzy.cs.unimagdeburg.de/~borgelt/pointgon.html


Problem Definition


The minimum weighted completion time problem
involves (i) a set J of n jobs, a positive
weight w_j for each job j ∈ J, and a release
date r_j before which it cannot be scheduled;
(ii) a set of m machines, each of which can
process at most one job at any time; and (iii) an
arbitrary set of positive values {p_{i,j}}, where p_{i,j}
denotes the time to process job j on machine i. A
schedule involves assigning jobs to machines and
choosing an order in which they are processed.
Let C_j denote the completion time of job j for a
given schedule. The weighted completion time of
a schedule is defined as Σ_{j∈J} w_j C_j, and the goal
is to compute a schedule that has the minimum
weighted completion time.

In the scheduling notation introduced by Graham
et al. [8], a scheduling problem is denoted
by a 3-tuple α|β|γ, where α denotes the machine
environment, β denotes the additional constraints
on jobs, and γ denotes the objective function.
In this article, we will be concerned with the
α-values 1, P, R, and Rm, which respectively
denote one machine, identical parallel machines
(i.e., for a fixed job j and for each machine i, p_{i,j}
equals a value p_j that is independent of i),
unrelated machines (the p_{i,j}'s depend on both
job j and machine i), and a fixed number m (not
part of the input) of unrelated machines. The field
β takes on the value r_j, which indicates that the
jobs have release dates, and the value pmtn, which
indicates that preemption of jobs is permitted.
Further, the value prec in the field β indicates that
the problem may involve precedence constraints
between jobs, which pose further restrictions
on the schedule. The field γ is either Σ w_j C_j
or Σ C_j, which denote total weighted and total
(unweighted) completion times, respectively.

Some of the simpler classes of the weighted
completion time scheduling problems admit
optimal polynomial-time solutions. They include
the problem P || Σ C_j, for which the shortest-
job-first strategy is optimal, the problem
1 || Σ w_j C_j, for which Smith's rule [14]
(scheduling jobs in nondecreasing order
of p_j/w_j values) is optimal, and the problem
R || Σ C_j, which can be solved via matching
techniques [3, 10]. With the introduction of
release dates, even the simplest classes of the
weighted completion time minimization problem
become strongly nondeterministic polynomial-
time (NP)-hard. In this article, we focus on
the work of Afrati et al. [1], whose main
contribution is the design of polynomial-time
approximation schemes (PTASs) for several
classes of scheduling problems to minimize
weighted completion time with release dates.
Prior to this work, the best solutions for
minimizing weighted completion time with
release dates were all O(1)-approximation
algorithms (e.g., [5, 6, 12]); the only known
PTAS for a strongly NP-hard problem involving
weighted completion time was due to Skutella
and Woeginger [13], who developed a PTAS
for the problem P || Σ w_j C_j. For an excellent
survey on the minimum weighted completion
time problem, we refer the reader to Chekuri
and Khanna [4]. Another important objective
is the flow time, which is a generalization of
completion time; a recent breakthrough of Bansal
and Kulkarni shows how to approximate the total
flow time and maximum flow time to within
polylogarithmic factors [2].
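
As a small illustration of the single-machine case 1 || Σ w_j C_j without release dates, Smith's rule can be written directly. The sketch below assumes each job is a (p_j, w_j) pair with positive weight; the function names and the sample instance are hypothetical.

```python
def weighted_completion_time(order):
    """Sum of w_j * C_j when the jobs (p_j, w_j) run back to back
    on one machine in the given order, starting at time 0."""
    t = 0
    total = 0
    for p, w in order:
        t += p            # completion time C_j of this job
        total += w * t
    return total


def smith_order(jobs):
    """Smith's rule: nondecreasing order of p_j / w_j."""
    return sorted(jobs, key=lambda job: job[0] / job[1])


jobs = [(3, 1), (1, 2), (2, 2)]
schedule = smith_order(jobs)
print(schedule, weighted_completion_time(schedule))
```

On this instance the rule schedules the jobs with ratios 0.5, 1, and 3 in that order, for a weighted completion time of 2·1 + 2·3 + 1·6 = 14.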


Key Results 


Afrati et al. [1] were the first to develop PTASs 
for weighted completion time problems involving 
release dates. We summarize the running times of 
these PTASs in Table 1. 

The results presented in Table 1 were obtained
through a careful sequence of input transforma- 
tions followed by dynamic programming. The in- 
put transformations ensure that the input becomes 
well structured at a slight loss in optimality, while 
dynamic programming allows efficient enumera- 
tion of all the near-optimal solutions to the well- 
structured instance. 

The first step in the input transformation is
geometric rounding, in which the processing
times and release dates are converted to powers
of 1 + ε with at most 1 + ε loss in the overall
performance. More significantly, the step (i)
ensures that there are only a small number of
distinct processing times and release dates to
deal with, (ii) allows time to be broken into
geometrically increasing intervals, and (iii) aligns
release dates with start and end times of intervals.
These are useful properties that can be exploited
by dynamic programming.

Minimum Weighted Completion Time, Table 1 Summary of results of Afrati et al. [1]

Problem                      Running time of polynomial-time approximation schemes
1 | r_j | Σ w_j C_j          O(2^{poly(1/ε)} n + n log n)
P | r_j | Σ w_j C_j          O((m + 1)^{poly(1/ε)} n + n log n)
P | r_j, pmtn | Σ w_j C_j    O(2^{poly(1/ε)} n + n log n)
Rm | r_j | Σ w_j C_j
Rm | r_j, pmtn | Σ w_j C_j
Rm || Σ w_j C_j
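
The rounding step can be illustrated with a short sketch. It assumes values are rounded up to the next power of 1 + ε (the text only says they are converted to powers of 1 + ε), and the helper names are hypothetical. The second helper returns the index x of the geometrically increasing interval [(1 + ε)^x, (1 + ε)^{x+1}) that contains a given time.

```python
import math


def round_up_to_power(value, eps):
    """Smallest power of (1 + eps) that is >= value, for value > 0
    (up to floating-point error)."""
    k = math.ceil(math.log(value) / math.log1p(eps))
    return (1.0 + eps) ** k


def interval_index(t, eps):
    """Index x with (1 + eps)^x <= t < (1 + eps)^(x + 1), for t > 0."""
    return math.floor(math.log(t) / math.log1p(eps))


eps = 0.1
for p in [0.3, 1.0, 7.0, 123.456]:
    print(p, round_up_to_power(p, eps), interval_index(p, eps))
```

Each rounded value exceeds the original by a factor of at most 1 + ε, which is where the 1 + ε loss comes from, and afterwards only O(log_{1+ε}(max/min)) distinct values remain.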

The second step in the input transformation is
time stretching, in which small amounts of idle
time are added throughout the schedule. This step
also changes completion times by a factor of at
most $1 + O(\epsilon)$ but is useful for cleaning up the
schedule. Specifically, if a job is large (i.e.,
occupies a large portion of the interval where it
executes), it can be pushed into the idle time of
a later interval where it is small. This ensures
that most jobs have small sizes compared with the
length of the intervals where they execute, which
greatly simplifies schedule computation. The next
step is job shifting. Consider a partition of the
time interval $[0, \infty)$ into intervals of the form
$I_x = [(1 + \epsilon)^x, (1 + \epsilon)^{x+1})$ for integral values
of $x$. The job-shifting step ensures that there is a
slightly suboptimal schedule in which every job $j$
gets completed within $O(\log_{1+\epsilon}(1 + 1/\epsilon))$ intervals
after $r_j$. This has the following nice property: If
we consider blocks of intervals $B_0, B_1, \ldots$, with
each block $B_i$ containing $O(\log_{1+\epsilon}(1 + 1/\epsilon))$
consecutive intervals, then a job $j$ starting in block
$B_i$ completes within the next block. Further, the
other steps in the job-shifting phase ensure that
there are not too many large jobs which spill
over to the next block; this allows the dynamic
programming to be done efficiently.
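The geometric intervals used above can be made concrete with a short sketch. The function names `interval_index` and `round_up_release` are illustrative and not taken from [1]; the rounding rule shown (move a release date up to the next interval start) is one simple way to align release dates with interval boundaries, and the exact transformation in [1] may differ in its details.

```python
import math


def interval_index(t: float, eps: float) -> int:
    """Return x such that (1+eps)^x <= t < (1+eps)^(x+1), for t > 0."""
    base = 1.0 + eps
    x = math.floor(math.log(t) / math.log(base))
    # Correct for floating-point error near interval boundaries.
    while base ** x > t:
        x -= 1
    while base ** (x + 1) <= t:
        x += 1
    return x


def round_up_release(r: float, eps: float) -> float:
    """Smallest interval start (1+eps)^x that is >= r, for r > 0.

    The result is less than (1+eps) * r, so the rounding delays
    a release date by a factor smaller than 1 + eps.
    """
    base = 1.0 + eps
    x = interval_index(r, eps)
    start = base ** x
    return start if start == r else base ** (x + 1)
```

With `eps = 1.0` the intervals are [1, 2), [2, 4), [4, 8), and so on, so a release date of 5 falls in interval 2 and is rounded up to 8.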

The precise steps in the algorithms and their
analysis are subtle, and the above description is
clearly an oversimplification. We refer the reader
to [1] or [4] for further details.


Applications 


A number of optimization problems in parallel 
computing and operations research can be for- 
mulated as machine scheduling problems. When 
precedence constraints are introduced between 
jobs, the weighted completion time objective can 
generalize the more commonly studied makespan 
objective and hence is important. 


Open Problems 


Some of the major open problems in this area are
to improve the approximation ratios for scheduling
on unrelated or related machines for jobs
with precedence constraints. The following problems
in particular merit special mention. The
best known solution for the $1 \mid prec \mid \sum_j w_j C_j$
problem is the 2-approximation algorithm due to
Hall et al. [9]; improving upon this factor is a
major open problem in scheduling theory. The
problem $R \mid prec \mid \sum_j w_j C_j$ in which the
precedence constraints form an arbitrary acyclic graph
is especially open: the only known results in
this direction are when the precedence constraints
form chains [7] or trees [11].

The other open direction is inapproximability:
there are significant gaps between the known
approximation guarantees and hardness factors
for various problem classes. For instance, the
problems $R \mid\mid \sum w_j C_j$ and $R \mid r_j \mid \sum w_j C_j$
are both known to be APX-hard, but the best known
algorithms for these problems (due to Skutella
[12]) have approximation ratios of 3/2 and 2,
respectively. Closing these gaps remains a
significant challenge.
Problem Definition


The min sum set cover (MSSC) problem is a
latency version of the set cover problem. The
input to MSSC consists of a collection of sets
$\{S_i\}_{i \in [m]}$ over a universe of elements
$[n] := \{1, 2, 3, \ldots, n\}$. The goal is to schedule
elements, one at a time, to hit all sets as early on
average as possible. Formally, we would like to find a
permutation $\pi : [n] \to [n]$ of the elements $[n]$
($\pi(i)$ is the $i$th element in the ordering) such that
the average (or equivalently total) cover time of
the sets $\{S_i\}_{i \in [m]}$ is minimized. The cover time
of a set $S_i$ is defined as the earliest time $t$ such
that $\pi(t) \in S_i$. For convenience, we will say that
we schedule/process element $\pi(i)$ at time $i$.
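As a small illustration of the objective, the sketch below evaluates the cover times of a given ordering. The function names are illustrative, times are 1-indexed as in the definition, and every set is assumed to be nonempty.

```python
def cover_times(sets, order):
    """Cover time of each set under the ordering `order`.

    sets:  list of nonempty sets of elements
    order: list giving the permutation, order[t-1] is scheduled at time t
    """
    position = {e: t for t, e in enumerate(order, start=1)}
    # A set is covered when its earliest-scheduled element is processed.
    return [min(position[e] for e in s) for s in sets]


def total_cover_time(sets, order):
    """MSSC objective: the sum of the cover times of all sets."""
    return sum(cover_times(sets, order))
```

For the sets {1, 2}, {2, 3}, {3}, the ordering 2, 3, 1 has cover times 1, 1, 2 (total 4), whereas the ordering 1, 2, 3 has cover times 1, 2, 3 (total 6).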

Since MSSC was introduced in [4], several
generalizations have been studied. Here we discuss
two of them. In the generalized min sum set
cover (GMSSC) problem [2], each set $S_i$ has a
requirement $\kappa_i$. In this generalization, a set $S_i$ is
covered at the first time $t$ when $\kappa_i$ elements are
scheduled from $S_i$, i.e.,
$|\{\pi(1), \pi(2), \ldots, \pi(t)\} \cap S_i| \ge \kappa_i$.
Note that MSSC is a special case of
GMSSC when $\kappa_i = 1$ for all $i \in [m]$.

Another interesting generalization is submodular
ranking (SR) [1]. In SR, each set $S_i$
is replaced with a nonnegative and monotone
submodular function $f_i : 2^{[n]} \to [0, 1]$ with
$f_i([n]) = 1$; a function $f$ is said to be submodular
if $f(A \cup B) + f(A \cap B) \le f(A) + f(B)$ for
all $A, B \subseteq [n]$ and monotone if $f(A) \le f(B)$
for all $A \subseteq B$. The cover time of each function
$f_i$ is now defined as the earliest time $t$ such
that $f_i(\{\pi(1), \pi(2), \ldots, \pi(t)\}) = 1$. Note
that GMSSC is a special case of SR when
$f_i(A) = \min\{|S_i \cap A| / \kappa_i, 1\}$. Also it is worth
noting that SR generalizes set cover.


Key Results 


We summarize main results known for MSSC, 
GMSSC, and SR. 


Theorem 1 ([4]) There is a 4-approximation for 
MSSC, and there is a matching lower bound 4—€ 
unless P = NP. 


Interestingly, the tight 4-approximation was 
achieved by a very simple greedy algorithm that 
schedules an element at each time that covers the 
largest number of uncovered sets. The analysis 
in [4] introduced the notion of “histograms”; see 
below for more detail. 
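The greedy rule just described can be sketched as follows. This is a minimal illustration rather than the presentation in [4]: the names are illustrative, ties are broken in favor of the element listed first in `universe`, and no attempt is made at an efficient implementation.

```python
def greedy_mssc(sets, universe):
    """Greedy ordering: repeatedly schedule the element that
    covers the largest number of still-uncovered sets."""
    uncovered = [set(s) for s in sets]
    remaining = list(universe)
    order = []
    while remaining:
        # max() returns the first maximizer, which fixes the tie-breaking.
        best = max(remaining,
                   key=lambda e: sum(1 for s in uncovered if e in s))
        order.append(best)
        remaining.remove(best)
        uncovered = [s for s in uncovered if best not in s]
    return order


def total_cover_time(sets, order):
    """Sum over sets of the (1-indexed) time of their first scheduled element."""
    position = {e: t for t, e in enumerate(order, start=1)}
    return sum(min(position[e] for e in s) for s in sets)
```

On the sets {1, 2}, {2, 3}, {3} over the universe 1, 2, 3, the sketch schedules 2 first (it hits two sets), then 3, then 1, for a total cover time of 4.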
Theorem 2 ([3, 6, 7]) There is an
O(1)-approximation for GMSSC.


Azar and Gamzu gave the first nontrivial
approximation for GMSSC whose guarantee was
$O(\log \max_i \kappa_i)$ [2]. The analysis was also based
on histograms and was inspired by the work
in [4]. Bansal et al. [3] showed that the analysis
of the greedy algorithm in [2] is essentially
tight and used a linear programming relaxation
and randomized rounding to give the first
O(1)-approximation for GMSSC; the precise
approximation factor obtained was 485. The LP
used in [3] was a time-indexed LP strengthened
with knapsack covering inequalities. The rounding
procedure combined threshold rounding and
randomized "boosted-up" independent rounding.
The approximation was later improved to 28 by
[7], subsequently to 12.4 by [6]. The key idea for
these improvements was to use $\alpha$-point rounding
to resolve conflicts between elements, which is
popular in the scheduling literature.


Theorem 3 ([1, 5]) There is an $O(\log(1/\epsilon))$-
approximation for SR where $\epsilon$ is the minimum
marginal positive increase of any function $f_i$.


Note that this result immediately implies an
$O(\log \max_i \kappa_i)$-approximation for GMSSC. The
algorithm in [1] is an elegant greedy algorithm
which schedules an element $e$ at time $t$ with the
maximum $\sum_i (f_i(A \cup \{e\}) - f_i(A)) / (1 - f_i(A))$;
here $A$ denotes all elements scheduled by time
$t - 1$, and if $f_i(A) = 1$, then $f_i$ is excluded from
the summation. Note that this algorithm becomes
the greedy algorithm in [4] for the special case
of MSSC. The analysis was also based on histograms.
Later, Im et al. [5] gave an alternative
analysis of this greedy algorithm which was
inspired by the analysis of other latency problems.
We note that the algorithm that schedules element
$e$ that gives the maximum total marginal increase
of $\{f_i\}$ has a very poor approximation guarantee,
as observed in [1].
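The greedy rule for SR stated above can be sketched directly from its definition. The names `gmssc_function` and `greedy_sr` are illustrative and not from [1]; each $f_i$ is modeled as a Python callable on a set of elements, and ties are broken in favor of the element listed first.

```python
def gmssc_function(s, kappa):
    """f(A) = min(|S ∩ A| / kappa, 1): the GMSSC special case of SR."""
    s = frozenset(s)
    return lambda a: min(len(s & a) / kappa, 1.0)


def greedy_sr(functions, universe):
    """Schedule, at each step, the element e maximizing
    sum_i (f_i(A ∪ {e}) - f_i(A)) / (1 - f_i(A)),
    where A is the set scheduled so far and functions with
    f_i(A) = 1 are excluded from the sum."""
    chosen = set()
    remaining = list(universe)
    order = []
    while remaining:
        def score(e):
            total = 0.0
            for f in functions:
                base = f(chosen)
                if base >= 1.0:      # already covered, excluded
                    continue
                total += (f(chosen | {e}) - base) / (1.0 - base)
            return total

        best = max(remaining, key=score)
        order.append(best)
        remaining.remove(best)
        chosen.add(best)
    return order
```

When every requirement is 1, each term of the score is 1 for a newly covered set and 0 otherwise, so the rule reduces to picking the element that covers the most uncovered sets, matching the remark above about the MSSC special case.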

As we discussed above, there are largely
three analysis techniques used in this line
of work: histogram-based analysis, latency
argument-based analysis, and LP rounding.
We will sketch these techniques following
[3-5] closely; we chose these papers since
they present the techniques in a relatively
simpler way, though they do not necessarily
give the best approximation guarantees or most
general results. We begin with the analysis tools
developed for greedy algorithms. To present
key ideas more transparently, we will focus on
MSSC.


Histogram-Based Analysis 

We sketch the analysis of the 4-approximation in
[4]. Let $R_t$ denote the uncovered sets at time $t$ and
$N_t$ the sets that are first covered at time $t$. Observe
that $\sum_{t \in [n]} |R_t|$ is the algorithm's total cover
time. In the analysis, we represent the optimal
and the algorithm's solutions using histograms.
First, in the optimal solution's histogram sets
are ordered in increasing order of their cover
times, and set $S_i$ has width 1 and height equal
to its cover time. In the algorithm's solution, as
before, sets are ordered in increasing order of
their cover times, but set $S_i$ has height equal to
$|R_t| / |N_t|$ where $t$ is $S_i$'s cover time. Here, the
increase in the algorithm's objective at time $t$ is
uniformly distributed to sets $N_t$ that are newly
covered at time $t$. Note that the areas of both
histograms are equal to the optimal cost and the
algorithm's cost, respectively. Then we can show
that after shrinking the algorithm's histogram by
a factor of 2, both horizontally and vertically,
one can place it completely inside the optimal
solution's histogram. This analysis is very simple
and is based on a clever observation on the greedy
solution's structure. This type of analysis was
also used in [1, 2].


Latency Argument-Based Analysis

This analysis does not seem to yield tight
approximation guarantees, but could be more flexible
since it does not compare two histograms
directly. The key idea is to show that if we
can't charge the number of uncovered sets in
our algorithm's schedule at time $t$ to the
analogous number in the optimal schedule, then our
algorithm must have covered a lot of sets during
the time interval $[t/2, t]$. In other words, if our
algorithm didn't make enough progress recently,
then our algorithm's current status can be shown
to be comparable to the optimal solution's status.
Intuitively, if our algorithm is not comparable
to the optimal solution, then the algorithm can
nearly catch up with the optimal solution by
following the choices the optimal solution has
made. For technical reasons, we may have to
compare our algorithm's status to the optimal
solution's earlier status. This analysis is easily
generalized to GMSSC, SR, and more general
metric settings [5].

We now discuss the linear programming-
based approach. Bansal et al. discussed why
greedy algorithms are unlikely to yield an O(1)-
approximation for GMSSC [3].


LP and Randomized Rounding

Consider the following time-indexed integer
program (IP) used in [3]: variable $x_{et}$ is an indicator
variable that is 1 if element $e$ is scheduled at time
$t$, otherwise 0. Variable $y_{it}$ is 1 if $S_i$ is covered
by time $t$, otherwise 0. The IP is relaxed into an
LP by allowing $x, y$ to be fractional.

$$
\begin{aligned}
\min \quad & \sum_{t \in [n]} \sum_{i \in [m]} (1 - y_{it}) \\
\text{s.t.} \quad & \sum_{t \in [n]} x_{et} = 1 && \forall e \in [n] \\
& \sum_{e \in [n]} x_{et} = 1 && \forall t \in [n] \\
& \sum_{e \in S_i \setminus A} \sum_{1 \le t' < t} x_{et'} \ge (\kappa_i - |A|) \cdot y_{it} && \forall i \in [m],\ A \subseteq S_i,\ t \in [n] \\
& 0 \le y \le 1 \\
& x \ge 0.
\end{aligned}
$$
Note that for integral solutions, the objective
is exactly the total cover time since each set $S_i$
uncovered at time $t$ adds 1 to the objective. The
first two constraints say that every element must
be scheduled and exactly one element must be
scheduled at a time. If we use the most natural
constraint $\sum_{e \in S_i} \sum_{1 \le t' < t} x_{et'} \ge \kappa_i \cdot y_{it}$ (it says
that if set $S_i$ is covered by time $t$, then there
must be at least $\kappa_i$ elements scheduled from $S_i$
by time $t$), the LP has a large integrality gap [3].
Hence, [3] strengthened the LP with the above
knapsack covering inequalities. There is an easy
separation oracle for the last constraint; hence we
can solve the LP in polynomial time. The analysis
is done by showing that the expected cover time
of each set $S_i$ is at most an O(1) factor larger than
the earliest time $t$ when the set $S_i$ is covered by
the LP by at least a half, i.e., $y_{it} \ge 1/2$. This
is sufficient to give an O(1)-approximation since
the LP pays at least $t/2$ for set $S_i$.


Applications 


The MSSC problem and its closely related prob- 
lems have various applications in adaptive query 
processing and distributed resource allocation 
problems. Also GMSSC has applications in 
Web page ranking and broadcast scheduling. 
For details, see [1,4]. Perhaps it would be no 
stretch to say that min sum set cover problems 
are at least loosely connected to all problems 
whose goal is to satisfy multiple demands with 
the overall minimum latency. 


Open Problems 


An outstanding open problem is to settle the
approximability of GMSSC. As mentioned before,
GMSSC captures MSSC (all $\kappa_i = 1$), for which
there is a tight 4-approximation known [4]. The
other extreme case is when $\kappa_i = |S_i|$ for all
$i$. This problem is essentially equivalent to a
classic precedence constrained scheduling problem
$1 \mid prec \mid \sum_j w_j C_j$ for which there are several
2-approximations known; see [3] for pointers.
However, the current best approximation factor
known for GMSSC is 12.4. Im et al. conjectured
that GMSSC admits a 4-approximation [6].




Recommended Reading 


1. Azar Y, Gamzu I (2011) Ranking with submodular 
valuations. In: SODA, San Francisco, pp 1070-1079 

2. Azar Y, Gamzu I, Yin X (2009) Multiple intents re- 
ranking. In: STOC, Bethesda, pp 669-678 

3. Bansal N, Gupta A, Krishnaswamy R (2010) A con- 
stant factor approximation algorithm for generalized 
min-sum set cover. In: SODA, Austin, pp 1539-1545 

4. Feige U, Lovász L, Tetali P (2004) Approximating min
sum set cover. Algorithmica 40(4):219-234

5. Im S, Nagarajan V, van der Zwaan R (2012) Minimum
latency submodular cover. In: ICALP (1), Warwick,
pp 485-497

6. Im S, Sviridenko M, van der Zwaan R (2014) Preemptive and
non-preemptive generalized min sum set cover. Math
Program 145(1-2):377-401

7. Skutella M, Williamson DP (2011) A note on the 
generalized min-sum set cover problem. Oper Res Lett 
39(6):433-436 


Misra-Gries Summaries 


Graham Cormode 
Department of Computer Science, University of 
Warwick, Coventry, UK 


Keywords 


Approximate counting; Frequent items; Stream- 
ing algorithms 


Years and Authors of Summarized 
Original Work 


1982; Misra, Gries 


Problem Definition 


The frequent items problem is to process a stream 
of items and find all items occurring more than 
a given fraction of the time. It is one of the 
most heavily studied problems in data stream 
algorithms, dating back to the 1980s. Many ap- 
plications rely directly or indirectly on finding 
the frequent items, and implementations are in 
use in large-scale industrial systems. Informally, 
given a sequence of items, the problem is simply 
to find those items which occur most frequently. 
Typically, this is formalized as finding all items
whose frequency exceeds a specified fraction of 
the total number of items. Variations arise when 
the items have weights and further when these 
weights can also be negative. 


Definition 1 Given a stream S of n items
$t_1, \ldots, t_n$, the frequency of an item $i$ is
$f_i = |\{j : t_j = i\}|$. The exact $\phi$-frequent items
comprise the set $\{i \mid f_i > \phi n\}$.


Example The stream S = (a, b, a, c, c, a, b, d)
has $f_a = 3$, $f_b = 2$, $f_c = 2$, $f_d = 1$. For $\phi = 0.2$,
the frequent items are a, b, and c.
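The example can be checked with a direct (non-streaming) computation that stores a count for every distinct item. This is only a reference implementation of Definition 1, with an illustrative function name; it uses space linear in the number of distinct items.

```python
from collections import Counter


def exact_frequent_items(stream, phi):
    """Items whose frequency strictly exceeds phi * n (Definition 1)."""
    counts = Counter(stream)
    n = len(stream)
    return {item for item, f in counts.items() if f > phi * n}
```

For the stream of the example, n = 8 and the threshold for phi = 0.2 is 1.6, so a, b, and c qualify while d (frequency 1) does not.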


A streaming algorithm which solves this problem
must use a linear amount of space, even for
large values of $\phi$: given an algorithm that claims
to solve this problem, we could insert a set S
of N items, where every item has frequency 1.
Then, we could also insert N copies of item $i$. If
$i$ is then reported as a frequent item (occurring
more than 50 % of the time), then $i \in S$, else
$i \notin S$. Consequently, since set membership requires
$\Omega(N)$ space, $\Omega(N)$ space is also required to
solve the frequent items problem. Instead, an
approximate version is defined based on a tolerance
for error $\epsilon$.


Definition 2 Given a stream S of n items, the $\epsilon$-
approximate frequent items problem is to return
a set of items F so that for all items $i \in F$,
$f_i > (\phi - \epsilon) n$, and there is no $i \notin F$ such that
$f_i > \phi n$.


Since the exact ($\epsilon = 0$) frequent items problem
is hard in general, we will use "frequent
items" or "the frequent items problem" to refer
to the $\epsilon$-approximate frequent items problem. A
related problem is to estimate the frequency of
items on demand.


Definition 3 Given a stream S of n items defining
frequencies $f_i$ as above, the frequency estimation
problem is to process a stream so that,
given any $i$, an estimate $\hat{f}_i$ is returned satisfying
$\hat{f}_i \le f_i \le \hat{f}_i + \epsilon n$.


Key Results 


The problem of frequent items dates back at least
to a problem first studied by Moore in 1980 [5].
It was published as a "problem" in the Journal
of Algorithms in the June 1981 issue [17], to
determine if there was a majority choice in a list
of n votes.


Preliminaries: The Majority Algorithm 

In addition to posing the majority question as 
a problem, Moore also invented the MAJORITY 
algorithm along with Boyer in 1980, described 
in a technical report from early 1981 [4]. A 
similar solution with proof of the optimal number 
of comparisons was provided by Fischer and
Salzberg [9]. MAJORITY can be stated as follows:
store the first item and a counter, initialized to 1. 
For each subsequent item, if it is the same as the 
currently stored item, increment the counter. If it 
differs and the counter is zero, then store the new 
item and set the counter to 1; else, decrement the 
counter. After processing all items, the algorithm 
guarantees that if there is a majority vote, then 
it must be the item stored by the algorithm. 
The correctness of this algorithm is based on a 
pairing argument: if every non-majority item is 
paired with a majority item, then there should still 
remain an excess of majority items. Although not 
posed as a streaming problem, the algorithm has 
a streaming flavor: it takes only one pass through 
the input (which can be ordered arbitrarily) to find 
a majority item. To verify that the stored item 
really is a majority, a second pass is needed to 
simply count the true number of occurrences of 
the stored item. 
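The description above translates almost line by line into code. The following sketch uses illustrative names; `majority` adds the second, verifying pass and therefore expects a sequence that can be traversed twice.

```python
def majority_candidate(stream):
    """One pass of MAJORITY: returns the stored item, or None if empty."""
    it = iter(stream)
    try:
        candidate = next(it)   # store the first item
    except StopIteration:
        return None
    count = 1                  # counter initialized to 1
    for item in it:
        if item == candidate:
            count += 1
        elif count == 0:
            candidate, count = item, 1   # replace the stored item
        else:
            count -= 1
    return candidate


def majority(items):
    """Second pass: confirm the candidate occurs in more than half of items."""
    candidate = majority_candidate(items)
    if candidate is None:
        return None
    occurrences = sum(1 for x in items if x == candidate)
    return candidate if 2 * occurrences > len(items) else None
```

The first pass alone can return an item that is not a majority (for instance on a list of all-distinct votes), which is why the counting pass is needed.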


Algorithm 1: MISRA-GRIES(k)

n ← 0; T ← ∅;
for each i :
    n ← n + 1;
    if i ∈ T
        then c_i ← c_i + 1;
    else if |T| < k − 1
        then T ← T ∪ {i};
             c_i ← 1;
    else for all j ∈ T
        do c_j ← c_j − 1;
           if c_j = 0
               then T ← T \ {j};
Misra-Gries Summary
The Misra-Gries summary is a simple algorithm
that solves the frequent items problem. It can be
viewed as a generalization of MAJORITY to track
multiple frequent items.

Instead of keeping a single counter and item
from the input, the MISRA-GRIES summary
stores k − 1 (item, counter) pairs. The natural
generalization of MAJORITY is to compare
each new item against the stored items T and
increment the corresponding counter if it is
among them. Else, if there is some counter with
count zero, it is allocated to the new item and the
counter set to 1. If all k − 1 counters are allocated
to distinct items, then all are decremented by 1. A
grouping argument is used to argue that any item
which occurs more than n/k times must be stored
by the algorithm when it terminates. Example
pseudocode to illustrate this algorithm is given
in Algorithm 1, making use of set notation to
represent the operations on the set of stored items
T: items are added and removed from this set
using set union and set subtraction, respectively,
and we allow ranging over the members of this
set (thus, implementations will have to choose
appropriate data structures which allow the
efficient realization of these operations). We
also assume that each item j stored in T has
an associated counter c_j. For items not stored in
T, c_j is defined as 0 and does not need to be
explicitly stored.
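A minimal runnable sketch of the update rule, assuming a Python dictionary as the data structure for T and its counters (the function name is illustrative). The decrement step here scans all stored counters, so it does not reproduce the constant-time implementations from the literature.

```python
def misra_gries(stream, k):
    """Return the Misra-Gries summary: a dict with at most k - 1 counters.

    Items absent from the dict have an implicit counter of 0.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # All k - 1 counters are in use: decrement every one
            # and release those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

On the stream (a, b, a, c, c, a, b, d) with k = 3, the sketch ends with counters for a and d only. Item a, the only item occurring more than n/k = 8/3 times, is retained, and its stored count of 1 underestimates its true frequency of 3.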

This n/k generalization was first proposed 
by Misra and Gries [16]. The time cost of the 
algorithm is dominated by the O(1) dictionary 
operations per update and the cost of decre- 
menting counts. Misra and Gries use a balanced 
search tree and argue that the decrement cost is 
amortized O(1); Karp et al. propose a hash table 
to implement the dictionary [11]; and Demaine 
et al. show how the cost of decrementing can be 
made worst case O(1) by representing the counts 
using offsets and maintaining multiple linked 
lists [8]. 

Bose et al. [3] observed that executing this
algorithm with $k = 1/\epsilon$ ensures that the count
associated with each item on termination is at
most $\epsilon n$ below the true value. The bounds on
the accuracy of the structure were tightened by
Berinde et al. to show that the error depends only
on the “tail”: the total weight of items outside the 
top-k most frequent, rather than the total weight 
of all items [2]. This gives a stronger accuracy 
guarantee when the input distribution is skewed, 
for example, if the frequencies follow a Zipfian 
distribution. They also show that the algorithm 
can be altered to tolerate updates with weights, 
rather than assuming that each item has equal unit 
weight. 

A similar data structure called SPACESAVING 
was introduced by Metwally et al. [15]. This 
structure also maintains a set of items and coun- 
ters, but follows a different set of update rules. 
Recently, it was shown that the SPACESAVING 
structure is isomorphic to MISRA-GRIES: the 
state of both structures can be placed in corre- 
spondence as each update arrives [1]. The dif- 
ferent representations reflect that SPACESAVING 
maintains an upper bound on the count of stored 
items, while MISRA-GRIES keeps a lower bound. 
In studies, the upper bound tends to be closer to 
the true count, but it is straightforward to switch 
between the two representations. 

Moreover, Agarwal et al. [1] showed that 
the MISRA-GRIES summary is mergeable. That 
is, two summaries of different inputs of size 
k can be combined together to obtain a new 
summary of size k that summarizes the union 
of the two inputs. This merging can be done 
repeatedly, to summarize arbitrarily many inputs 
in arbitrary configurations. This allows the 
summary to be used in distributed and parallel 
environments. 
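One way to realize such a merge is sketched below: add the counters of the two summaries item by item, and if too many counters remain, subtract a common value and discard counters that are no longer positive. This follows the general shape of the merge described in [1], but the function name, the `size` parameter (the maximum number of counters kept), and the exact choice of the subtracted value are assumptions made for illustration.

```python
from collections import Counter


def merge_summaries(a, b, size):
    """Merge two counter summaries into one with at most `size` counters."""
    merged = Counter(a)
    merged.update(b)          # adds the counts of items present in both
    if len(merged) <= size:
        return dict(merged)
    # Subtract the (size + 1)-th largest count from every counter and
    # keep only the strictly positive ones; at most `size` survive.
    threshold = sorted(merged.values(), reverse=True)[size]
    return {item: c - threshold
            for item, c in merged.items() if c > threshold}
```

For example, merging {x: 5, y: 2} with {x: 1, z: 3} into a summary of at most two counters first gives x: 6, z: 3, y: 2; subtracting the third-largest value, 2, leaves x: 4 and z: 1.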

Lastly, the concept behind the algorithm
of tracking information on k representative
elements has inspired work in other settings.
Liberty [12] showed how this can be used to track
an approximation to the best rank-k summary
of a matrix, using k rows. This was extended
by Ghashami and Phillips [10] to offer better 
accuracy by keeping more rows. 


Applications 


The question of tracking approximate counts 
for a large number of possible objects arises in
a number of settings. Many applications have
arisen in the context of the Internet, such as 
tracking the most popular sources, destinations, or
source-destination pairs (those with the highest 
amount of traffic) or tracking the most popular 
objects, such as the most popular queries to a 
search engine, or the most popular pieces of 
content in a large content host. It forms the basis 
of other problems, such as finding the frequent 
itemsets within a stream of transactions: those 
subsets of items which occur as a subset of many 
transactions. Solutions to this problem have used 
ideas similar to the count and prune strategy of 
the Misra-Gries summary to find approximate 
frequent itemsets [14]. Finding approximate 
counts of items is also needed within other stream 
algorithms, such as approximating the entropy of 
a stream [6]. 


Experimental Results 


There have been a number of experimental stud- 
ies of Misra-Gries and related algorithms, for a 
variety of computing models. These have shown 
that the algorithm is accurate and fast to exe- 
cute [7, 13]. 


URLs to Code and Data Sets 


Code for this algorithm is widely available: 


http://www.cs.rutgers.edu/~muthu/massdal-code-index.html

http://hadjieleftheriou.com/sketches/index.html 

https://github.com/cpnielsen/twittertrends 
Problem Definition


How can a network be explored efficiently with 
the help of mobile agents? This is a very broad 
question and to answer it adequately it will be 
necessary to understand more precisely what mo- 
bile agents are, what kind of networked environ- 
ment they need to probe, and what complexity 
measures are interesting to analyze. 


Mobile Agents 

Mobile agents are autonomous, intelligent com- 
puter software that can move within a network. 
They are modeled as automata with limited mem- 
ory and computation capability and are usually 
employed by another entity (to which they must 
report their findings) for the purpose of collecting 
information. The actions executed by the mobile 
agents can be discrete or continuous and tran- 
sitions from one state to the next can be either 
deterministic or non-deterministic, thus giving 
rise to various natural complexity measures de- 
pending on the assumptions being considered. 


Network Model 
The network model is inherited directly from the 
theory of distributed computing. It is a connected 
graph whose vertices comprise the computing
nodes and edges correspond to communication 
links. It may be static or dynamic and its re- 
sources may have various levels of accessibility. 
Depending on the model being considered, nodes 
and links of the network may have distinct labels. 
A particularly useful abstraction is an anonymous 
network whereby the nodes have no identities, 
which means that an agent cannot distinguish 
two nodes except perhaps by their degree. The 
outgoing edges of a node are usually thought 
of as distinguishable but an important distinc- 
tion can be made between a globally consistent 
edge-labeling versus a locally independent edge- 
labeling. 


Efficiency Measures for Exploration 

Efficiency measures being adopted involve the
time required for completing the exploration task,
usually measured either by the number of edge
traversals or nodes visited by the mobile agent.
The interplay between the time required for
exploration and the memory used by the mobile agent
(time/memory tradeoffs) is a key consideration
in evaluating algorithms. Several researchers
impose no restrictions on the memory
but rather seek algorithms minimizing
exploration time. Others investigate the minimum
size of memory which allows for exploration
of a given type of network (e.g., tree) of given
(known or unknown) size, regardless of the
exploration time. Finally, several researchers
consider time/memory tradeoffs.


Main Problems 

Given a model for both the agents and the net- 
work, the graph exploration problem is that of 
designing an algorithm for the agent that allows 
it to visit all of the nodes and/or edges of the 
network. A closely related problem is where the 
domain to be explored is presented as a region of 
the plane with obstacles and exploration becomes 
visiting all unobstructed portions of the region in 
the sense of visibility. Another related problem 
is that of rendezvous where two or more agents 
are required to gather at a single node of a net- 
work. 
Key Results


Claude Shannon [17] is credited with the first
finite automaton algorithm capable of exploring
an arbitrary maze (which has a range of 5 × 5
squares) by trial and error means. Exploration
problems for mobile agents have been exten- 
sively studied in the scientific literature and the 
reader will find a useful historical introduction in 
Fraigniaud et al. [11]. 


Exploration in General Graphs 

The network is modeled as a graph and the agent 
can move from node to node only along the 
edges. The graph setting can be further specified 
in two different ways. In Deng and Papadim- 
itriou [8] the agent explores strongly connected 
directed graphs and it can move only in the 
direction from head to tail of an edge, but not 
vice-versa. At each point, the agent has a map 
of all nodes and edges visited and can recognize 
if it sees them again. They minimize the ratio of 
the total number of edges traversed divided by 
the optimum number of traversals, had the agent 
known the graph. In Panaite and Pelc [15] the 
explored graph is undirected and the agent can 
traverse edges in both directions. In the graph 
setting it is often required that apart from com- 
pleting exploration the agent has to draw a map 
of the graph, i.e., output an isomorphic copy of it. 
Exploration of directed graphs assuming the exis- 
tence of labels is investigated in Albers and Hen- 
zinger [1] and Deng and Papadimitriou [8]. Also 
in Panaite and Pelc [15], an exploration algorithm
is proposed working in time $e + O(n)$, where
$n$ is the number of nodes and $e$ the number of
links. Fraigniaud et al. [11] investigate memory
requirements for exploring unknown graphs (of
unknown size) with unlabeled nodes and locally
labeled edges at each node. In order to explore
all graphs of diameter $D$ and max degree $d$ a
mobile agent needs $\Omega(D \log d)$ memory bits even
when exploration is restricted to planar graphs.
Several researchers also investigate exploration
of anonymous graphs in which agents are allowed
to drop and remove pebbles. For example in
Bender et al. [4] it is shown that one pebble
is enough for exploration, if the agent knows
an upper bound on the size of the graph, and
$\Theta(\log \log n)$ pebbles are necessary and sufficient
otherwise.
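For orientation, the simplest baseline in the undirected, labeled setting is depth-first exploration, sketched below. It is not the algorithm of [15]: it assumes the agent can recognize nodes it has already visited and has unbounded memory for its map, and it traverses every edge exactly twice, for a total of 2e edge traversals rather than e + O(n). The function name and the adjacency-list input are illustrative.

```python
def dfs_explore(adj, start):
    """Depth-first exploration of a connected simple undirected graph.

    adj:   dict mapping each node to a list of its neighbours
    start: node where the agent begins
    Returns (set of visited nodes, number of edge traversals).
    """
    visited = {start}
    used = set()      # undirected edges the agent has already walked
    moves = 0

    def visit(u):
        nonlocal moves
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in used:
                continue
            used.add(edge)
            moves += 1                # walk from u to v
            if v not in visited:
                visited.add(v)
                visit(v)
            moves += 1                # walk back from v to u

    visit(start)
    return visited, moves
```

On a triangle (n = 3, e = 3) the sketch visits all three nodes with 6 edge traversals and ends at the starting node.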


Exploration in Trees 

In this setting it is assumed the agent can dis- 
tinguish ports at a node (locally), but there is 
no global orientation of the edges and no mark- 
ers available. Exploration with stop is when the 
mobile agent has to traverse all edges and stop 
at some node. For exploration with return the 
mobile agent has to traverse all edges and stop 
at the starting node. In perpetual exploration the 
mobile agent has to traverse all edges of the tree 
but is not required to stop. The upper and lower 
bounds on memory for the exploration algorithms 
analyzed in Diks et al. [9] are summarized in
the table, depending on the knowledge that the
mobile agent has. Here, $n$ is the number of nodes
of the tree, $N \ge n$ is an upper bound known to
the mobile agent, and $d$ is the maximum degree
of a node of the tree.


Exploration | Knowledge  | Lower bounds               | Upper bounds
Perpetual   | ∅          | None                       | $O(\log d)$
w/Stop      | $n \le N$  | $\Omega(\log \log \log n)$ | $O(\log N)$
w/Return    | ∅          | $\Omega(\log n)$           | $O(\log^2 n)$


Exploration in a Geometric Setting 
Exploration in a geometric setting with unknown 
terrain and convex obstacles is considered by 
Blum et al. [5]. They compare the distance 
walked by the agent (or robot) to the length 
of the shortest (obstacle-free) path in the scene 
and describe and analyze robot strategies that 
minimize this ratio for different kinds of scenes. 
There is also related literature for exploration 
in more general settings with polygonal and 
rectangular obstacles by Deng et al. [7] and 
Bar-Eli et al. [3], respectively. A setting that 
is important in wireless networking is when 
nodes are aware of their location. In this case, 
Kranakis et al. [12] give efficient algorithms for 
navigation, namely compass routing and face 
routing that guarantee delivery in Delaunay and 
arbitrary planar geometric graphs, respectively, 
using only local information. 
Rendezvous

The rendezvous search problem differs from 
the exploration problem in that it concerns two 
searchers placed at different nodes of a graph that 
want to minimize the time required to rendezvous 
(usually) at the same node. At any given time 
the mobile agents may occupy a vertex of the 
graph and can either stay still or move from 
vertex to vertex. It is of interest to minimize
the time required to rendezvous. A natural
extension of this problem is to study multi-agent
mobile systems. More generally, given
a particular agent model and network model, a set 
of agents distributed arbitrarily over the nodes of 
the network are said to rendezvous if executing 
their programs after some finite time they all 
occupy the same node of the network at the same 
time. Of special interest is the highly symmetric 
case of anonymous agents on an anonymous 
network and the simplest interesting case is 
that of two agents attempting to rendezvous on 
a ring network. In particular, in the model studied 
by Sawchuk [16] the agents cannot distinguish 
between the nodes, the computation proceeds in 
synchronous steps, and the edges of each node 
are oriented consistently. The table summarizes
time/memory tradeoffs known for six algorithms
(see Kranakis et al. [13] and Flocchini et al. [10])
when the $k$ mobile agents use indistinguishable
pebbles (one per mobile agent) to mark their
position in an $n$ node ring.


Memory             | Time                        | Memory      | Time
$O(k \log n)$      | $O(n)$                      | $O(\log n)$ | $O(n)$
$O(\log n)$        | $O(kn)$                     | $O(\log k)$ | $O(n)$
$O(k \log \log n)$ | $O(n \log n / \log \log n)$ | $O(\log k)$ | $O(n \log k)$


Kranakis et al. [14] show a striking computational
difference for rendezvous in an oriented,
synchronous n × n torus when the mobile agents
may have more indistinguishable tokens. It is
shown that two agents with a constant number
of unmovable tokens, or with one movable token
each, cannot rendezvous if they have o(log n)
memory, while they can perform rendezvous with
detection as long as they have one unmovable
token and O(log n) memory. In contrast, when
two agents have two movable tokens each then
rendezvous (respectively, rendezvous with detection)
is possible with constant memory in
a torus. Finally, two agents with three movable tokens
each and constant memory can perform rendezvous
with detection in a torus. If the condition
on synchrony is dropped, the rendezvous problem
becomes very challenging. For a given initial
location of agents in a graph, De Marco et al. [6]
measure the performance of a rendezvous algorithm
as the number of edge traversals of
both agents until rendezvous is achieved. If the
agents are initially situated at a distance D in an
infinite line, they give a rendezvous algorithm
with cost O(D|L_min|^2) when D is known and
O((D + |L_max|)^3) if D is unknown, where |L_min|
and |L_max| are the lengths of the shorter and
longer label of the agents, respectively. These results
still hold for the case of the ring of unknown
size, but then they also give an optimal algorithm
of cost O(n|L_min|), if the size n of the ring is
known, and of cost O(n|L_max|), if it is unknown.
For arbitrary graphs, they show that rendezvous is 
feasible if an upper bound on the size of the graph 
is known and they give an optimal algorithm of 
cost O(D|Lmin|) if the topology of the graph and 
the initial positions are known to the agents. 


Applications 


Interest in mobile agents has been fueled by two 
overriding concerns. First, to simplify the com- 
plexities of distributed computing, and second 
to overcome the limitations of user interface ap- 
proaches. Today they find numerous applications 
in diverse fields such as distributed problem solv- 
ing and planning (e.g., task sharing and coordina- 
tion), network maintenance (e.g., daemons in net- 
working systems for carrying out tasks like mon- 
itoring and surveillance), electronic commerce 
and intelligence search (e.g., data mining and 
surfing crawlers to find products and services 
from multiple sources), robotic exploration (e.g., 
rovers, and other mobile platforms that can ex- 
plore potentially dangerous environments or even 
enhance planetary extravehicular activity), and 
distributed rational decision making (e.g., auction
protocols, bargaining, decision making). The interested
reader can find useful information in several
articles in the volume edited by Weiss [18].


Open Problems 


Specific directions for further research would 
include the study of time/memory tradeoffs in 
search game models (see Alpern and Gal [2]). 
Multi-agent systems are particularly useful for 
content-based searches and exploration, and fur- 
ther investigations in this area would be fruitful. 
Memory restricted mobile agents provide a rich 
model with applications in sensor systems. In 
the geometric setting, navigation and routing in 
a three dimensional environment using only local 
information is an area with many open problems.

Problem Definition


The verification of monadic second-order (MSO) 
graph properties, equivalently, the model- 
checking problem for MSO logic over finite 
binary relational structures, is fixed-parameter 
tractable (FPT) where the parameter consists of 
the formula that expresses the property and the 
tree-width or the clique-width of the input graph 
or structure. How can one build usable algorithms for
this problem? The proof of the general theorem
(an algorithmic meta-theorem, cf. [12]) is based 
on the description of the input by algebraic terms 
and the construction of finite automata that accept 
the terms describing the satisfying inputs. But 
these automata are in practice much too large 
to be constructed [11, 14]. A typical number 
of states is 921? and lower bounds match this 
number. Can one use automata and overcome 
this difficulty? 


Key Results 


We propose to use fly-automata (FA) [3]. They 
are automata whose states are described and not 
listed and whose transitions are computed on 
the fly and not tabulated. When running on a 
term of size 1,000, a fly-automaton with 22" 
states computes only 1,000 transitions if it is 
deterministic. FA can have infinitely many states. 
For example, a state can record, among other 
things, the (unbounded) number of occurrences 
of a particular symbol in the input term. FA can
thus check certain graph properties that are not
monadic second-order expressible. An example 
is regularity, the fact that all vertices have the 
same degree. Furthermore, an FA equipped with 
an output function that maps the set of accept- 
ing states to an effectively given domain D can 
compute a value, for example, the number of k- 
colorings of the given graph G or the minimum 
cardinality of one of the k color classes if G 
is k-colorable (this number measures how close
this graph is to being (k − 1)-colorable). We have
implemented and tested an FA that computes the 
number of 3-colorings of a graph. 

Tree-width and clique-width are graph com- 
plexity measures that serve as parameters in many 
FPT algorithms [7, 8, 10]. Both are based on 
hierarchical decompositions of graphs that can 
be expressed by terms written with the opera- 
tion symbols of appropriate graph algebras [6]. 
The model-checking automata take such terms 
as inputs. We will present results concerning 
graphs of bounded clique-width. The similar re- 
sults for graphs of bounded tree-width reduce 
to them as we will explain at the end of this 
section. 


Graph Algebras and Monadic Second-Order Logic

Graphs are finite, undirected, and without loops 
and multiple edges. The extension to directed 
graphs, possibly with loops and/or labels, is 
straightforward. A graph G is identified with
the relational structure (V_G, edg_G) where edg_G
is a binary symmetric relation representing
adjacency.

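Rather than giving a formal definition of
monadic second-order (MSO) logic, we present
the closed formula expressing 3-colorability (an
NP-complete property). It is ∃X, Y.Col(X, Y)
where Col(X, Y) is the formula

$$X \cap Y = \emptyset \;\wedge\; \forall u, v.\{\, edg(u, v) \Longrightarrow [\neg(u \in X \wedge v \in X) \wedge \neg(u \in Y \wedge v \in Y) \wedge \neg(u \notin X \cup Y \wedge v \notin X \cup Y)] \,\}.$$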


This formula expresses that X, Y and V_G − (X ∪ Y)
are the three color classes of a 3-coloring.
The corresponding colors are respectively 1, 2,
and 3.

Definition 1 (The graph algebra G)

(a) We will use N_+ as a set of labels called
    port labels. A p-graph is a triple G =
    (V_G, edg_G, π_G) where π_G is a mapping:
    V_G → N_+. If π_G(x) = a, we say that x is
    an a-port. The set π(G) of port labels of G
    is its type. By using a default label, say 1, we
    make every nonempty graph into a p-graph
    of type {1}.
(b) We let F_k be the following finite set of
    operations on p-graphs of type included in
    C := {1, ..., k} ⊆ N_+:

    • The binary symbol ⊕ denotes the union
      of two disjoint p-graphs,
    • The unary symbol relab_{a→b} denotes the
      relabelling that changes every port label
      a into b (where a, b ∈ C),
    • The unary symbol add_{a,b}, for a < b,
      a, b ∈ C, denotes the edge addition that
      adds an edge between every a-port x and
      every b-port y (unless there is already an
      edge between them; our graphs have no
      multiple edges),
    • For each a ∈ C, the nullary symbol a
      denotes an isolated a-port.

(c) Every term t in T(F_k) (the set of finite terms
    written with F_k) is called a k-expression. Its
    value is a p-graph, val(t), that we now define.
    For each position u of t (equivalently, each
    node u of the syntax tree of t), we define
    a p-graph val(t)/u, whose vertex set is the
    set of leaves of t below u. The definition
    of val(t)/u is, for fixed t, by bottom-up
    induction on u:

    • If u is an occurrence of a, then val(t)/u
      has vertex u as an a-port and no edge,
    • If u is an occurrence of ⊕ with sons u_1
      and u_2, then val(t)/u := val(t)/u_1 ⊕
      val(t)/u_2 (note that val(t)/u_1 and
      val(t)/u_2 are disjoint),
    • If u is an occurrence of relab_{a→b} with son
      u_1, then val(t)/u := relab_{a→b}(val(t)/u_1),
    • If u is an occurrence of add_{a,b} with son
      u_1, then val(t)/u := add_{a,b}(val(t)/u_1).

    Finally, val(t) := val(t)/root_t. Its vertex
    set is the set of all leaves (occurrences of
    nullary symbols). For an example, let

    $$t := add^{1}_{b,c}\big(add^{2}_{a,b}(a^{3} \oplus^{4} b^{5}) \;\oplus^{6}\; relab^{7}_{b \to c}(add^{8}_{a,b}(a^{9} \oplus^{10} b^{11}))\big)$$

    where the superscripts 1-11 number the
    positions of t. The p-graph val(t) is
    3_a − 5_b − 11_c − 9_a where the subscripts
    a, b, c indicate the port labels. (For clarity,
    port labels are letters in examples.) If
    u := 2 and w := 8, then t/u = t/w =
    add_{a,b}(a ⊕ b); however, val(t)/u is the
    p-graph 3_a − 5_b and val(t)/w is 9_a − 11_b,
    isomorphic to val(t)/u.

(d) The clique-width of a graph G, denoted by
    cwd(G), is the least integer k such that G is
    isomorphic to val(t) for some t in T(F_k). We
    denote by G_k the set val(T(F_k)) of p-graphs
    that are the value of a term over F_k. We let F
    be the union of the sets F_k and G be the union
    of the sets G_k. Every p-graph is isomorphic to
    a graph in G, hence, has a clique-width.

(e) An F-congruence is an equivalence relation
    ≈ on p-graphs such that:

    • Two isomorphic p-graphs are equivalent,
      and
    • if G ≈ G′ and H ≈ H′, then
      π(G) = π(G′), G ⊕ H ≈ G′ ⊕ H′,
      add_{a,b}(G) ≈ add_{a,b}(G′) and
      relab_{a→b}(G) ≈ relab_{a→b}(G′).

(f) A set of graphs L is recognizable if it is a
    union of classes of an F-congruence such
    that, for each finite type C ⊆ N_+, the
    number of equivalence classes of p-graphs of
    type C is finite.
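
The inductive definition of val(t) in item (c) can be traced mechanically. The following is a minimal sketch, assuming terms are encoded as nested Python tuples; the tuple encoding and the function name `evaluate` are illustrative choices, not part of the original text. Positions are numbered from left to right in the written term, which reproduces the vertex names 3, 5, 9, and 11 of the example.

```python
from itertools import count


def evaluate(term):
    """Return (labels, edges) of the p-graph val(term).

    Hypothetical term encoding (an assumption of this sketch):
      ("leaf", a)          nullary symbol a: an isolated a-port
      ("union", t1, t2)    disjoint union of two p-graphs
      ("relab", a, b, t1)  relabel every a-port into a b-port
      ("add", a, b, t1)    add an edge between every a-port and every b-port
    Each vertex is named by the position of its leaf in the written term.
    """
    positions = count(1)

    def walk(t):
        kind = t[0]
        if kind == "leaf":
            return {next(positions): t[1]}, set()
        if kind == "union":
            left_labels, left_edges = walk(t[1])
            next(positions)  # the union occurrence itself takes a position
            right_labels, right_edges = walk(t[2])
            return {**left_labels, **right_labels}, left_edges | right_edges
        next(positions)  # occurrence of the unary symbol
        a, b, sub = t[1], t[2], t[3]
        labels, edges = walk(sub)
        if kind == "relab":
            return {v: (b if lab == a else lab) for v, lab in labels.items()}, edges
        if kind == "add":
            new_edges = {frozenset((u, v))
                         for u, lu in labels.items() if lu == a
                         for v, lv in labels.items() if lv == b}
            return labels, edges | new_edges
        raise ValueError(f"unknown symbol {kind!r}")

    return walk(term)


def ab_edge():
    # The subterm add_{a,b}(a ⊕ b), which occurs twice in the example term.
    return ("add", "a", "b", ("union", ("leaf", "a"), ("leaf", "b")))


t = ("add", "b", "c", ("union", ab_edge(), ("relab", "b", "c", ab_edge())))
labels, edges = evaluate(t)
```

On the example term, `labels` assigns a, b, a, c to vertices 3, 5, 9, 11, and `edges` is the path 3 − 5 − 11 − 9, in agreement with the p-graph given in the definition.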
Definition 2 (Fly-automata)


(a) Let H be a finite or countable, effectively
    given, signature. A fly-automaton over H
    (in short, an FA over H) is a 4-tuple A =
    (H, Q_A, δ_A, Acc_A) such that Q_A is the finite
    or countable, effectively given, set of
    states; Acc_A is the set of accepting states, a
    decidable subset of Q_A; and δ_A is a computable
    function that defines the transition
    rules: for each tuple (f, q_1, ..., q_m) with
    q_1, ..., q_m ∈ Q_A, f ∈ H, ρ(f) = m ≥ 0,
    δ_A(f, q_1, ..., q_m) is a finite set of states. We
    write f[q_1, ..., q_m] → q (and f → q if f is
    nullary) to mean that q ∈ δ_A(f, q_1, ..., q_m).
    We say that A is finite if H and Q_A are finite.

(b) Runs and recognized languages are defined
    as usual; see [1]. A deterministic FA A
    (by "deterministic" we mean "deterministic
    and complete") has a unique run on each term
    t, and q_A(t) is the state reached at the root
    of t. The mapping q_A is computable, and the
    membership in L(A) of a term t ∈ T(H) is
    decidable.

(c) Every FA A that is not deterministic can
    be determinized by an easy extension of the
    usual construction, see [3]; it is important
    that the sets δ_A(f, q_1, ..., q_m) be finite.

(d) A deterministic FA over H with output function
    is a 4-tuple A = (H, Q_A, δ_A, Out_A)
    that is a deterministic FA where Acc_A is
    replaced by a total and computable output
    function Out_A : Q_A → D such that D is
    an effectively given domain. The function
    computed by A is Comp(A) : T(H) → D
    such that Comp(A)(t) := Out_A(q_A(t)).

Example 1 The number of accepting runs of an
automaton.


Let A = (H, Q_A, δ_A, Acc_A) be a nondeterministic
FA. We construct a deterministic FA B
that computes the number of accepting runs of A
on any term in T(H). As set of states Q_B, we
take the set of finite subsets of Q_A × N_+. The
transitions are defined so that B reaches state α
at the root of t ∈ T(H) if and only if α is the
finite set of pairs (q, n) ∈ Q_A × N_+ such that
n is the number of runs of A that reach state q
at its root. This number is finite and α can be
seen as a partial function: Q_A → N_+ having
a finite domain. For a symbol f of arity 2, B
has the transition: f[α, β] → γ where γ is the
set of pairs (q, n) such that n is the sum of the
integers n_p · n_r over all pairs (p, r) ∈ Q_A × Q_A
such that (p, n_p) ∈ α, (r, n_r) ∈ β and q ∈
δ_A(f, p, r). The transitions for other symbols are
defined similarly. The function Out_B maps a state
α to the sum of the integers n such that (q, n) ∈
α ∩ (Acc_A × N_+). □
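
The construction of B can be sketched in a few lines. This is a minimal illustration, not the implementation of [3]: the transition function is passed in as an ordinary Python callable (so that transitions are computed on the fly), terms are nested tuples whose first component is the symbol, and the names `count_runs`, `delta`, and the toy parity automaton are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def count_runs(term, delta, accepting):
    """Number of accepting runs of a nondeterministic bottom-up automaton A.

    term:      nested tuple (symbol, child_1, ..., child_m)
    delta:     callable (symbol, tuple_of_child_states) -> finite set of states
    accepting: set of accepting states of A
    A state of the deterministic automaton B is a Counter that maps each
    reachable state q of A to the number of runs of A reaching q.
    """
    def state_of(t):
        symbol, children = t[0], t[1:]
        child_states = [state_of(c) for c in children]
        result = Counter()
        # Choose one reachable state per child; the run counts multiply.
        for combo in product(*(cs.items() for cs in child_states)):
            states = tuple(q for q, _ in combo)
            runs = 1
            for _, n in combo:
                runs *= n
            for q in delta(symbol, states):
                result[q] += runs
        return result

    root = state_of(term)
    # Out_B: sum the counts attached to accepting states.
    return sum(n for q, n in root.items() if q in accepting)


def delta(symbol, states):
    # Toy automaton: each leaf "a" guesses a bit; "f" propagates the parity.
    if symbol == "a":
        return {0, 1}
    return {sum(states) % 2}


leaf = ("a",)
two_leaves = count_runs(("f", leaf, leaf), delta, {1})
three_leaves = count_runs(("f", ("f", leaf, leaf), leaf), delta, {1})
```

With accepting set {1}, the toy automaton accepts exactly the bit assignments of odd parity, so the two calls count 2 of the 4 assignments and 4 of the 8 assignments, respectively.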


Example 2 An FA for checking 3-colorability.


In order to construct an FA that accepts the
terms t ∈ T(F) such that val(t) is 3-colorable,
we first construct an FA A for the property
Col(X, Y). For this purpose, we transform F
into F^(2) by replacing each nullary symbol a by
the four nullary symbols (a, ij), i, j ∈ {0, 1}.
A term t ∈ T(F^(2)) defines, first, the graph
val(t′) where t′ is obtained from t by removing
the Booleans i, j from the nullary symbols and,
second, the pair (V_X, V_Y) such that V_X is the set
of vertices u (leaves of t) that are occurrences
of (a, 1j) for some a and j and V_Y is the set
of those that are occurrences of (a, i1) for some
a and i. The set of terms t ∈ T(F^(2)) such
that Col(V_X, V_Y) holds in val(t′) is defined by a
deterministic FA A that we now specify. Its states
are Error and the finite subsets of N_+ × {1, 2, 3}.
Their meanings are as follows:


• At position u of t, the automaton reaches state
  Error if and only if val(t′)/u has a vertex in
  V_X ∩ V_Y or an edge between two vertices,
  either both in V_X or both in V_Y or both in
  V_G − (V_X ∪ V_Y), hence of the same color,
  respectively 1, 2, or 3;

• It reaches state α ⊆ N_+ × {1, 2, 3} if and only
  if these conditions do not hold and α is the set
  of pairs (a, i) such that val(t′)/u has an a-port
  of color i.


All states except Error are accepting. Here are
the transitions of A:


(a, 00) → {(a, 3)}, (a, 10) → {(a, 1)},
(a, 01) → {(a, 2)}, (a, 11) → Error.


For α, β ⊆ N_+ × {1, 2, 3}, A has transitions:


⊕[α, β] → α ∪ β,

add_{a,b}[α] → Error, if (a, i) and (b, i) belong to α for some i = 1, 2, 3,

add_{a,b}[α] → α, otherwise,

relab_{a→b}[α] → β, obtained by replacing a by b in each pair of α.

Its other transitions are ⊕[α, β] → Error if
α or β is Error, add_{a,b}[Error] → Error, and
relab_{a→b}[Error] → Error.

This FA checks Col(X, Y). To check
∃X, Y.Col(X, Y), we build a nondeterministic
FA B by deleting the state Error, by replacing
the first three rules of A by a → {(a, 3)}, a →
{(a, 1)}, a → {(a, 2)}, and by deleting those
that yield Error. All states are accepting, but on
some terms, no run can reach the root, and these
terms are rejected. Furthermore, the construction
of Example 1 shows how to make B into a
deterministic FA that computes the number of
3-colorings, because the 3-colorings of val(t)
are in bijection with the accepting runs of B
on t. □
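
The automaton B, combined with the counting construction of Example 1, fits in a short program. The sketch below is an illustration under stated assumptions, not the implementation reported in [3]: terms use a hypothetical nested-tuple encoding, port labels are strings, and the names `run_counts` and `count_3_colorings` are invented for the example.

```python
from collections import Counter

COLORS = (1, 2, 3)


def run_counts(term):
    """Map each state of B (a frozenset of (port label, color) pairs)
    to the number of runs of B that reach it at the root of `term`.

    Hypothetical term encoding (an assumption of this sketch):
      ("leaf", a), ("union", t1, t2), ("relab", a, b, t1), ("add", a, b, t1)
    """
    kind = term[0]
    if kind == "leaf":
        # Rules a -> {(a, 1)}, a -> {(a, 2)}, a -> {(a, 3)}.
        return Counter({frozenset({(term[1], c)}): 1 for c in COLORS})
    if kind == "union":
        left, right = run_counts(term[1]), run_counts(term[2])
        out = Counter()
        for s1, n1 in left.items():
            for s2, n2 in right.items():
                out[s1 | s2] += n1 * n2
        return out
    a, b, sub = term[1], term[2], term[3]
    out = Counter()
    for state, n in run_counts(sub).items():
        if kind == "add":
            # A would go to Error here; B simply has no transition.
            if any((a, c) in state and (b, c) in state for c in COLORS):
                continue
            out[state] += n
        elif kind == "relab":
            out[frozenset((b if p == a else p, c) for p, c in state)] += n
        else:
            raise ValueError(f"unknown symbol {kind!r}")
    return out


def count_3_colorings(term):
    # All states of B are accepting, so every surviving run counts.
    return sum(run_counts(term).values())


def ab_edge():
    return ("add", "a", "b", ("union", ("leaf", "a"), ("leaf", "b")))

path4 = ("add", "b", "c", ("union", ab_edge(), ("relab", "b", "c", ab_edge())))
triangle = ("add", "a", "b", ("add", "a", "c", ("add", "b", "c",
            ("union", ("leaf", "a"), ("union", ("leaf", "b"), ("leaf", "c"))))))
```

Here `path4` is the example term of Definition 1(c), whose value is a path on four vertices, and `triangle` is a 3-expression for the complete graph on three vertices. The states stay small because they record only which (label, color) pairs occur, not the graph itself.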


Recognizability Theorem: The set of graphs
that satisfy a closed MSO formula φ is
F-recognizable.


Weak Recognizability Theorem: For every
closed MSO formula φ, for every k, the set of
graphs in G_k that satisfy φ is F_k-recognizable.


About proofs: The Recognizability Theorem 
is Theorem 5.68 of [6]. Its proof shows that 
the equivalence defined by the fact that the two 
considered p-graphs have the same type and sat- 
isfy the same closed MSO formulas of quantifier 
height at most that of φ satisfies the conditions
of Definition 1(f). (These formulas have unary 
predicates for expressing port labels.) The Weak 
Recognizability Theorem follows from the for- 
mer one. It can be proved directly by constructing 
an FA over F [3]. (We construct a single FA, not 
a particular FA for each subsignature F_k as in
Theorem 6.35 of [6].) This construction can be 
implemented, at least in a number of nontrivial 
cases. The proof of the strong theorem does not 
provide any usable automaton. 


Counting and Optimizing Automata 

Let P(X_1, ..., X_s) be an MSO property of vertex
sets X_1, ..., X_s. We denote (X_1, ..., X_s) by X̄,
and t ⊨ P(X̄) means that X̄ satisfies P in the
graph val(t) defined by a term t. We are interested
not only in checking the validity of ∃X̄.P(X̄) but
also in computing from a term t the following
values:


#X̄.P(X̄), defined as the number of assignments
X̄ such that t ⊨ P(X̄),

SpX̄.P(X̄), the spectrum of P(X̄), defined as the
set of tuples of the form (|X_1|, ..., |X_s|) such
that t ⊨ P(X̄),

MSpX̄.P(X̄), the multispectrum of P(X̄), defined
as the multiset of tuples (|X_1|, ..., |X_s|)
such that t ⊨ P(X̄).


These computations can be done by FA. The
construction for #X̄.P(X̄) is similar to that of
Example 1. We obtain in this way FPT or XP
algorithms [8, 10].


Edge Set Quantifications and Tree-Width 

The two recognizability theorems and the corre- 
sponding constructions of FA yielding FPT and 
XP algorithms hold and can be done for graphs 
of bounded tree-width and MSO formulas with 
edge set quantifications: it suffices to replace 
a graph G by its incidence graph Inc(G), a 
bipartite graph whose vertices are those of G 
and its edges, to observe that the clique-width 
of Inc(G) is bounded in terms of the tree-width 
of G and that an MSO formula with edge set 
quantifications over G can be translated into an 
MSO formula over Inc(G). Another approach is
in [2]. 


Beyond MS Logic 

The property that the considered graph is the 
union of two disjoint regular graphs with possibly 
some edges between these two subgraphs is not 
MSO expressible but can be checked by an FA. 
An FA can also compute the minimal number of
edges between X and V_G − X such that G[X]
and G[V_G − X] are connected, when such a set X
exists.


Open Problems 


The parsing problem for graphs of clique- 
width at most k is NP-complete (with k in 
the input) [9]. Good heuristics remain to be 
developed.

Experimental Results


These constructions have been implemented and 
tested [3-5]. We have computed the number of 
optimal colorings of graphs of clique-width at 
most 8 for which the chromatic polynomial is 
known, which allows us to verify the correctness of
the automaton. We can verify in, respectively, 35 
and 105 min that the 20 x 20 and the 6 x 60 grids 
are 3-colorable. In 29 min, we can verify that the 
McGee graph (24 vertices) given by a term over 
Fj is acyclically 3-colorable. 

A different approach using games is presented 
in [13].

Problem Definition


Many complex networks of interest, such as the
Internet, social, and biological networks, exhibit
community structure, where nodes are naturally
clustered into tightly connected communities,
with only sparser connections between them.
Modularity maximization is concerned with
finding such community structures in a given
complex network.

Consider a network represented as an undirected
graph G = (V, E) consisting of n = |V|
vertices and m = |E| edges. The adjacency
matrix of G is denoted by A = (A_ij), where
A_ij is the weight of edge (i, j) and A_ij = 0 if
(i, j) ∉ E. We also denote the (weighted) degree
of vertex i, the total weight of edges incident at
i, by deg(i) or, in short, d_i.

Community structure (CS) is a division of
the vertices in V into a collection of disjoint
subsets of vertices C = {C_1, C_2, ..., C_l} (with
unspecified l) where ∪_{i=1}^{l} C_i = V. Each subset
C_i ⊆ V is called a community, and we wish to
have more edges connecting vertices in the same
communities than edges that connect vertices in
different communities. The modularity [7] of C
is the fraction of the edges that fall within the
given communities minus the expected value of
this fraction if edges were distributed at random.
The randomization of the edges is done so as to
preserve the degree of each vertex. If vertices i
and j have degrees d_i and d_j, then the expected
number of edges between i and j is d_i d_j / (2M). Thus,
the modularity, denoted by Q, is then

$$Q(\mathcal{C}) = \frac{1}{2M} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2M} \right) \delta_{ij}, \qquad (1)$$

where M is the total edge weight and the element
δ_ij of the membership matrix δ is defined
as

$$\delta_{ij} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are in the same community} \\ 0, & \text{otherwise.} \end{cases}$$

The modularity values can be either positive
or negative, and higher (positive) modularity
values indicate stronger community structure.
Therefore, the modularity maximization problem
asks us to find a division C which maximizes the
modularity value Q(C).
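
Equation (1) translates directly into code. The sketch below is a straightforward quadratic-time evaluation, assuming the graph is given as a symmetric dictionary of weighted adjacency lists without self-loops; the function name `modularity` and the small test graph are illustrative choices.

```python
def modularity(adj, communities):
    """Evaluate Q(C) from Eq. (1).

    adj:         dict vertex -> dict neighbour -> weight (symmetric, no loops)
    communities: list of disjoint vertex sets covering all vertices
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(degree.values())  # equals 2M since each edge is counted twice
    member = {v: c for c, comm in enumerate(communities) for v in comm}
    total = 0.0
    for i in adj:
        for j in adj:
            if member[i] == member[j]:  # delta_ij = 1
                total += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return total / two_m


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])
q_whole = modularity(adj, [{0, 1, 2, 3, 4, 5}])
```

For this graph M = 7. Splitting along the bridge gives Q = (12 − 98/14)/14 = 5/14, while placing all vertices in one community gives Q = 0, which illustrates why the natural division scores higher.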


Key Results 


Computational Complexity 

This problem is different from the partition
problem as we do not know the total number
of partitions beforehand; that is, l is
unspecified. Somewhat surprisingly, modularity
maximization is still NP-complete on trees, one
of the simplest graph classes.


Theorem 1 Modularity maximization on trees is 
NP-complete. 


The proof has been presented in [3], reducing
from the subset-sum problem. Furthermore, for
dense graphs, namely, for the complements of
3-regular graphs, DasGupta and Desai have provided
a (1 + ε)-inapproximability result for the
modularity maximization problem [1], stated as follows.


Theorem 2 It is NP-hard to approximate the
modularity maximization problem on (n − 4)-regular
graphs within a factor of 1 + ε for some
constant ε > 0.


The proof has been presented in [1], reduc- 
ing from the maximum-cardinality-independent 
set problem for 3-regular graphs (3-MIS). The 
basic intuition behind this proof is that large-size 
cliques must be properly contained within the 
communities. 


Exact Solutions 

Although the problem is NP-complete, efficient
algorithms to obtain optimal solutions for small
networks are still of interest. Dinh and Thai
have presented an exact algorithm with a running
time of O(n^5) for the problem on uniform-weighted
trees [3]. The algorithm is based on
dynamic programming, which exploits the
relationship between maximizing modularity and
minimizing the sum of squares of component
volumes, where the volume of a component S is
defined as vol(S) = Σ_{v∈S} d_v.

When the input graph is not a tree, an exact
solution based on integer linear programming
(ILP) is provided by Dinh and Thai [3]. Note that
in the ILP for modularity maximization, there is
a triangle inequality x_ij + x_jk − x_ik ≥ 0 to
guarantee that the values of x_ij are consistent with
each other. Here x_ij = 0 if i and j are in the same
community; otherwise x_ij = 1. Therefore, the
ILP has 3·C(n, 3) = Θ(n^3) constraints, which is about
half a million constraints for a network of 100
vertices (3 · 161,700 = 485,100). As a consequence,
the sizes of solved instances were limited to a few
hundred nodes. Along this direction, Dinh and Thai
have presented a sparse metric, which reduces the number
of constraints to O(n^2) in sparse networks where
m = O(n) [3].


Approximation Algorithms 

When G is a tree, the problem can be solved by a
polynomial time approximation scheme (PTAS)
with a running time of O(n^{ε^{-1}+1}) for ε > 0
[3]. The PTAS is solely based on the following
observation. Removing k − 1 edges in G will
yield k connected communities and Q_k ≥ (1 −
1/k) Q_opt, where Q_k is the maximum modularity of
a community structure with k communities and
Q_opt is the optimal solution.

When the degree distribution of G follows
a power law, i.e., the fraction of nodes
in the network having degree k is proportional
to k^{−γ}, where 1 < γ < 4, the problem can
be approximated to a constant factor for γ > 2
and up to an O(1/log n) factor when 1 < γ < 2 [2].
The details of this algorithm, namely, low-degree
following (LDF), are presented in Algorithm 1.


Algorithm 1 Low-degree following algorithm
(parameter d_0 ∈ N_+)
 1. L := ∅, M := ∅, O := ∅, p_i = 0 ∀i = 1..n
 2. for each vertex i ∈ V do
 3.   if (k_i ≤ d_0) & (i ∉ L ∪ M) then
 4.     if N(i) \ M ≠ ∅ then
 5.       Select a vertex j ∈ N(i) \ M
 6.       Let M = M ∪ {i}, L = L ∪ {j}, p_i = j
 7.     else
 8.       Select a vertex t ∈ N(i)
 9.       O = O ∪ {i}, p_i = t
10. ℒ = ∅
11. for each vertex i ∈ V \ (M ∪ O) do
12.   C_i = {i} ∪ {j ∈ M | p_j = i} ∪ {t ∈ O | p_{p_t} = i}
13.   ℒ = ℒ ∪ {C_i}
14. Return ℒ

The selection of d_0 is important to derive
the approximation factor, as d_0 needs to be a
sufficiently large constant that is still relatively
small compared with n when n tends to infinity.
In an actual implementation of the algorithm,
Dinh and Thai have designed an automatic
selection of d_0 to
maximize Q. LDF can be extended to solve the
problem in directed graphs [2].

Theorem 3 ([1]) There exists an O(log d)-approximation
for d-regular graphs with
d < n/(2 ln n) and an O(log d_max)-approximation for
weighted graphs with d_max < n^{1/5}/(16 ln n).


Modularity in Dynamic Networks 

Networks in real life are very dynamic, which 
requires us to design an adaptive approximation 
algorithm to solve the problem. In this approach, 
the community structure (CS) at time t is detected
based on the community structure at time t − 1
and the changes in the network, instead of
recomputing it from scratch. Indeed, the above
LDF algorithm can be enhanced to cope with
this situation [4]. At first, LDF is run to find the
base CS at time 0. Then at each time step, we
adaptively follow and unfollow the nodes that
violate the condition in line 3 of Algorithm 1.


Applications 


Finding community structures has applications in
a wide range of domains. For example, a community in
biological networks often consists of proteins, genes,
or subunits with functional similarity. Thus, find- 
ing communities can help to predict unknown 
proteins. Likewise, in online social networks, a 
community can be a group of users having com- 
mon interests; therefore, obtaining CS can help 
to predict user interests. Furthermore, detect- 
ing CS finds itself extremely useful in deriving 
social-based solutions for many network prob- 
lems, such as forwarding and routing strategies 
in communication networks, Sybil defense, worm 
containment on cellular networks, and sensor 
reprogramming. From the network visualization
perspective, finding CS helps display core network
components and their mutual interactions, hence
presenting a more compact and understandable
description of the network as a whole.

Problem Definition


A minimal perfect hash function is a (data struc- 
ture providing a) bijective map from a set S of 
n keys to the set of the first n natural numbers.
In the static case (i.e., when the set S is known
in advance), there is a wide spectrum of solutions
available, offering different trade-offs in terms of 
construction time, access time, and size of the 
data structure. 

An important distinction is whether any bi- 
jection will be suitable or whether one wants it 
to respect some specific property. A monotone 
minimal perfect hash function (MMPHF) is one 
where the keys are bit vectors and the function is 
required to preserve their lexicographic order. 

Sometimes in the literature, this situation is 
identified with the one in which the set S has 
some predefined linear order and the bijection is 
required to respect the order: in this case, one 
should more precisely speak of order-preserving 
minimal perfect hash functions; for this scenario, 
a ready-made Ω(n log n) space lower bound is
trivially available (since all the n! possible key 
orderings must be representable). This is not true 
in the monotone case, so the distinction between 
monotone and order preserving is not moot. 


Problem Formulation 

Let [x] denote the set of the first x natural numbers.
Given a positive integer u = 2^w, and a set
S ⊆ [u] with |S| = n, a function h : S → [m]
is perfect iff it is injective, minimal iff m = n,
and monotone iff x < y implies h(x) < h(y).
In the following, we assume the RAM
model with words of length w. For simplicity, we
will describe only the case in which the keys have
fixed length w; the results can be extended to the
general case [2].
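
The three properties are easy to check on a reference implementation. The sketch below stores the rank of every key in an explicit table, which satisfies the definition but uses far more space than the succinct structures discussed in this entry; it is meant only as a specification to test against, and both function names are illustrative.

```python
def build_reference_mmphf(keys):
    """Explicit rank table: key -> position in sorted order.

    This is a correctness baseline, not a succinct data structure.
    """
    return {x: rank for rank, x in enumerate(sorted(keys))}


def is_monotone_minimal_perfect(h, keys):
    """Check that h is injective, onto [n], and order preserving on keys."""
    values = [h[x] for x in sorted(keys)]
    # Visiting keys in increasing order must yield exactly 0, 1, ..., n - 1.
    return values == list(range(len(values)))


S = {3, 17, 42, 5, 8}
h = build_reference_mmphf(S)
```

Any candidate MMPHF for S can be validated with `is_monotone_minimal_perfect`; a function that is perfect and minimal but permutes two keys fails the check.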


Key Results 


One key building block that is needed to describe
the possible approaches to the problem is that
of storing an arbitrary r-bit function h : S →
[2^r] in a succinct way (i.e., using rn + o(n)
bits and guaranteeing constant access time). A
practical (although slightly less efficient) method
for storing a succinct function f : S → [2^r] was
presented in [8]. The construction draws three
hash functions h_0, h_1, h_2 : S → [γn] (with
γ ≈ 1.23) and builds a 3-hypergraph with one
hyperedge (h_0(x), h_1(x), h_2(x)) for every x ∈
S. With positive probability, this hypergraph does
not have a nonempty 2-core; or, equivalently, the
set of equations (in the variables a_i)

$$f(x) = \left( a_{h_0(x)} + a_{h_1(x)} + a_{h_2(x)} \right) \bmod 2^r$$

has a solution that can be found by a hypergraph-peeling
process in time O(n). Storing the function
amounts to storing γn integers of r bits each
(the array a_i), so γrn bits are needed (excluding
the bits required to store the hash functions),
and it is possible to improve this amount to
γn + rn using standard techniques [2]; function
evaluation takes constant time (one essentially
just needs to evaluate three hash functions). We
shall refer to this data structure as an MWHC
function (from the name of the authors). Alternative,
asymptotically more efficient data structures
for the same purpose are described in [3, 4].

Monotone Minimal Perfect Hash Functions, Fig. 1 A
toy example: S = {s_0, ..., s_10} is divided into three
buckets of size three (except for the last one that contains
just two elements), whose delimiters D = {s_2, s_5, s_8}
appear in boldface

Monotone Minimal Perfect Hash Functions, Fig. 2
Bucketing with longest common prefix for the set S of
Fig. 1: d_0 maps each element x of S to the length of the
longest common prefix of the bucket to which x belongs;
d_1 maps each longest common prefix to the bucket index


Trivial Solution

MWHC functions can themselves be used to
store an MMPHF (setting r = ⌈log n⌉ and using
the appropriate function f). This idea (requiring
γn⌈log n⌉ bits) is also implied in the original
paper [8], where the authors actually present their
construction as a solution to the order-preserving
minimal perfect hash function problem.
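
The peeling construction behind an MWHC function can be sketched as follows. Several details are assumptions of this illustration rather than part of [8]: the three hash values are derived from BLAKE2b with a seed used as salt, each hash function is given its own third of the array so that the three vertices of a hyperedge are always distinct, and the names `build_mwhc` and `query` are hypothetical. The array is kept as a plain Python list, so the sketch shows the logic but not the bit-level space bound.

```python
import hashlib


def _vertices(key, seed, block):
    """Three array positions for `key`, one in each third of the array."""
    digest = hashlib.blake2b(repr(key).encode(), digest_size=24,
                             salt=seed.to_bytes(8, "little")).digest()
    return tuple(i * block + int.from_bytes(digest[8 * i:8 * i + 8], "little") % block
                 for i in range(3))


def build_mwhc(values, r, gamma=1.23, max_tries=100):
    """values: dict key -> integer in [0, 2**r). Returns (seed, block, a)."""
    keys = list(values)
    block = int(gamma * len(keys) / 3) + 1
    mask = (1 << r) - 1
    for seed in range(max_tries):
        edges = {k: _vertices(k, seed, block) for k in keys}
        incident = [set() for _ in range(3 * block)]
        for k, vs in edges.items():
            for v in vs:
                incident[v].add(k)
        # Peeling: repeatedly remove a hyperedge that owns a degree-one vertex.
        stack = [v for v in range(3 * block) if len(incident[v]) == 1]
        order = []
        while stack:
            v = stack.pop()
            if len(incident[v]) != 1:
                continue
            (k,) = incident[v]
            order.append((k, v))
            for u in edges[k]:
                incident[u].discard(k)
                if len(incident[u]) == 1:
                    stack.append(u)
        if len(order) < len(keys):
            continue  # nonempty 2-core: retry with fresh hash functions
        a = [0] * (3 * block)
        # Solve the equations in reverse peeling order; the vertex v of each
        # hyperedge is still unassigned when its equation is processed.
        for k, v in reversed(order):
            others = sum(a[u] for u in edges[k] if u != v)
            a[v] = (values[k] - others) & mask
        return seed, block, a
    raise RuntimeError("no peelable hypergraph found")


def query(key, seed, block, a, r):
    return sum(a[u] for u in _vertices(key, seed, block)) & ((1 << r) - 1)


table = {f"key{i}": (7 * i) % 16 for i in range(40)}
seed, block, a = build_mwhc(table, r=4)
```

Querying a key that was not in the original set returns an arbitrary r-bit value, which is the expected behaviour of a static function of this kind. Storing the rank of each key with r = ⌈log n⌉ gives the trivial solution described above.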


A Constant-Time, O(n log w)-Space 

Solution 

More sophisticated approaches are based on the 
general technique of bucketing: a distributor 
function f : S — [m] is stored, which divides 
the set of keys into m buckets respecting the 
lexicographic order; then, for every bucket 
i € [m], a MMPHF g; on f~!(i) is stored as 
a succinct function. Different choices for the 
distributor and the bucket sizes provide different 
space/time trade-offs. 

A constant-time solution can be obtained
using buckets of equal size $b = O(\log n)$. Let
$B_0, \ldots, B_{m-1}$ be the unique order-preserving
partition of $S$ into buckets of size $b$ and $p_i$
be the longest common prefix of the keys in
$B_i$; Fig. 1 shows an example with $b = 3$. It is
easy to see that the $p_i$'s are all distinct, which
allows us to build a succinct function $d_1$ mapping
$p_i \mapsto i$. Moreover, we can store a succinct
function $d_0 : S \to [w]$ mapping $x$ to the length
of the longest common prefix $p_i$ of the bucket
$B_i$ containing $x$. The two functions together
form the distributor (given $x$, one applies $d_1$
to the prefix of $x$ of length $d_0(x)$); see Fig. 2.
The space required by $d_0$ and $d_1$ is $O(n \log w)$
and $O((n/b) \log(n/b)) = O(n)$, respectively.
The functions $g_i$ require $\sum_i O(|B_i| \log b) =
O(n \log\log n) = O(n \log w)$ bits. The overall
space requirement is thus $O(n \log w)$; optimal
values for $b$ when using MWHC functions are
computed in [2].
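
The bucketing scheme above can be illustrated with a small sketch. It is not the succinct construction of the article: plain Python dictionaries stand in for the succinct functions $d_0$, $d_1$, and $g_i$, the keys are assumed to be distinct bit strings of equal length given in sorted order, and the name `build_mmphf` is hypothetical.

```python
from os.path import commonprefix  # character-wise longest common prefix

def build_mmphf(keys, b):
    """Return a function mapping each key to its rank in `keys`.

    Assumes `keys` is a sorted list of distinct, equal-length bit strings.
    Dictionaries replace the succinct functions d0, d1 and g_i.
    """
    d0, d1, g = {}, {}, {}
    for i in range(0, len(keys), b):
        bucket = keys[i:i + b]
        p = commonprefix(bucket)       # p_i: longest common prefix of B_i
        assert p not in d1             # the p_i's are pairwise distinct
        d1[p] = i // b                 # d1 maps p_i to the bucket index
        for offset, x in enumerate(bucket):
            d0[x] = len(p)             # d0 maps a key to |p_i|
            g[x] = offset              # g_i: rank of the key inside B_i

    def rank(x):
        bucket_index = d1[x[:d0[x]]]   # distributor: apply d1 to a prefix of x
        return bucket_index * b + g[x]

    return rank

keys = ["0001", "0010", "0100", "0101", "0111", "1000",
        "1001", "1011", "1100", "1110", "1111"]
rank = build_mmphf(keys, 3)
print([rank(x) for x in keys])
```

The sketch only shows how the distributor and the per-bucket functions compose; the space bounds discussed above depend on replacing the dictionaries with succinct functions.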


An O(log w)-Time, O(n log log w)-Space Solution

In search for a better space bound, we note that
an obvious alternative approach to the bucketing
problem is by ranking. Given a set of strings $X$,
a ranking data structure provides, for each string
$s \in [u]$, the number of strings in $X$ that are
smaller than $s$, that is, $|\{x \in X \mid x < s\}|$.
If we consider the set $D$ of delimiters (i.e., the
set containing the last string of each bucket), a
distributor can be built using any data structure
that provides the rank of an arbitrary string with
respect to $D$.

For instance, a trivial way to obtain such
rank information is to build a compacted trie [7]
containing the strings in $D$ (see Fig. 3). Much
more sophisticated data structures are
available today (e.g., [6]), but they all fail to
meet their purpose in our case: they occupy too
much space, and we do not really need to rank
all possible strings; we just need to rank strings
in $S$. We call this problem the relative ranking
problem: given sets $D \subseteq S$, we want to rank a
string $s$ w.r.t. $D$ under the condition that $s$ belongs
to $S$.

[Fig. 3: The standard compacted trie built from the set $D$
of Fig. 1. This data structure can be used to rank arbitrary
elements of the universe with respect to $D$: when the trie is
visited with an element not in $D$, the visit will terminate at
an exit node, determining that the given element is to the
left (i.e., smaller than) or to the right (i.e., larger than) all
the leaves that descend from that node. The picture shows,
for each element of $S$, the node where the visit would end.]

The relative ranking problem can be
approached in different ways: in particular, a
static probabilistic z-fast trie [1] provides an $O(n)$-
space, $O(\log w)$-time solution for buckets of size
$O(\log w)$ (the trie can actually provide a wrong
output for a small set of keys; their correct output
needs to be stored separately; see [1] for details).
The functions $g_i$ require $O(n \log\log w)$ bits,
which dominates the space bound.
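
As an illustration of a distributor obtained by ranking, the following sketch computes the bucket of a key as its rank with respect to the delimiters. A sorted list with binary search stands in for the compacted trie or z-fast trie, so the sketch shows the logic only and none of the space bounds; `make_distributor` is a hypothetical name, and keys are assumed to be sorted, equal-length bit strings.

```python
from bisect import bisect_left

def make_distributor(keys, b):
    """Return bucket(x) = |{d in D : d < x}| for x in `keys`.

    D holds the last key of every full bucket of size b; a sorted Python
    list replaces the trie-based ranking structures of the text.
    """
    delimiters = keys[b - 1::b]

    def bucket(x):
        return bisect_left(delimiters, x)  # number of delimiters smaller than x

    return bucket

keys = ["0001", "0010", "0100", "0101", "0111", "1000",
        "1001", "1011", "1100", "1110", "1111"]
bucket = make_distributor(keys, 3)
print([bucket(x) for x in keys])
```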


Different Approaches 

Other theoretical and practical solutions, corrob- 
orated with experimental data, were described 
in [2]. 


Open Problems 


Currently, the main open problem about monotone
minimal perfect hashing is that all known
lower bounds are trivial; in particular, the lower
bound $n \log e + \log w - O(\log n)$ by Fredman
and Komlós (provided that $u = 2^w \ge n^{2+\epsilon}$
for some fixed $\epsilon > 0$) for minimal perfect hash
functions [5] is of little help, as it is essentially
independent of the size of the universe. We
already know that the dependence on the size
of the universe can be really small (as small as
$O(n \log\log w)$ bits), but it is currently conjectured
that there is no monotone minimal perfect
hashing scheme whose number of bits per key is
constant in the size of the universe.

URLs to Code and Datasets


The most comprehensive implementation of
MMPHFs is contained in the free Java library
Sux4J (http://sux4j.di.unimi.it/). A good
collection of datasets is available from the LAW
(http://law.di.unimi.it/) in the form of lists of
URLs from web crawls and lists of ids from social
networks (e.g., Wikipedia): these are typical use
cases of a MMPHF.

Problem Definition


A real-valued function $f : D \to \mathbb{R}$ defined over
a partially ordered set (poset) $D$ is monotone if
$f(x) \le f(y)$ for any two points $x \prec y$. In
this article, we focus on the poset induced by
the coordinates of a $d$-dimensional $n$-hypergrid,
$[n]^d$, where $x \preceq y$ iff $x_i \le y_i$ for all integers
$1 \le i \le d$. Here, we have used $[n]$ as a shorthand
for $\{1,\ldots,n\}$. The hypercube, $\{0,1\}^d$, and the $n$-
line, $[n]$, are two special cases of this.

Monotonicity testing is the algorithmic prob- 
lem of deciding whether a given function is 
monotone. The algorithm has query access to the 
function, which means that it can query f at any 
domain point x and obtain the value of f(x). 
The performance of the algorithm is measured 
by the number of queries it makes. Although the 
running time of the algorithm is also important, 
we ignore this parameter in this article because 
in most relevant cases it is of the same order of 
magnitude as the query complexity. We desire 
algorithms whose query complexity is bounded 
polynomially in d and logn. 

Monotonicity is a fundamental property. In 
one dimension, monotone functions correspond 
to sorted arrays. In many applications it may be
useful to know if an outcome of a procedure 
monotonically changes with some or all of its 
attributes. In learning theory, for instance, mono- 
tone concepts are known to require fewer samples 
to learn; it is useful to test beforehand if a concept 
is monotone or not. 


Property Testing Framework 

It is not too hard to see that, without any extra 
assumptions, monotonicity testing is infeasible 
unless almost the entire input is accessed. There- 
fore, the problem is studied under the property 
testing framework where the goal is to distinguish 
monotone functions from those which are “far” 
from monotone. A function $f$ is said to be $\varepsilon$-far
from being monotone if it needs to be modified on
at least an $\varepsilon$-fraction of the domain points to make it
monotone.

Formally, the monotonicity testing problem
is defined as follows. Given an input distance
parameter $\varepsilon$, the goal is to design a $q(d,n,\varepsilon)$-
query tester, which is a randomized algorithm that
queries the function on at most $q(d,n,\varepsilon)$ points
and satisfies the following requirements:


(a) If the function is monotone, the algorithm
accepts.

(b) If the function is $\varepsilon$-far from being monotone,
the algorithm rejects.


The tester can err in each of the above cases but 
the error probability should be at most 1/3. If a 
tester never rejects a monotone function, then it is 
called a one-sided error tester. If the queries made 
by the tester do not depend on the function values 
returned by the previous queries, then the tester 
is called a nonadaptive tester. The function q is 
called the query complexity of the monotonicity 
tester, and one is interested in (ideally matching) 
upper and lower bounds for it. 


Key Results 


Monotonicity testing was first studied by Ergün
et al. [7] for functions defined on the $n$-line, that
is, the case when $d = 1$. The authors designed an
$O(\varepsilon^{-1} \log n)$-query tester.

Goldreich et al. [9] studied monotonicity
testing over the hypercube $\{0,1\}^d$ and designed
an $O(\varepsilon^{-1} d)$-query tester for Boolean functions,
that is, functions whose range is $\{0,1\}$.
This tester repeats the following edge test
$O(\varepsilon^{-1} d)$ times: sample an edge of the hypercube
uniformly at random, query the function values
at its endpoints, and check if these two points
violate monotonicity, that is, if $x \prec y$ and
$f(x) > f(y)$, where $x, y$ are the queried points.
If at any time a violation is found, the tester
rejects; otherwise it accepts. Clearly, this tester is
nonadaptive and has one-sided error. Goldreich
et al. [9] also demonstrated an $O(\varepsilon^{-1} d \log n)$-
query tester for Boolean functions defined over
the $d$-dimensional $n$-hypergrid. This tester also
defines a distribution over comparable pairs (not
necessarily adjacent) of points, that is, points
$x, y$ with $x \prec y$. In each iteration, it samples a
pair from the distribution, queries the function
value on these points, and checks for a violation
of monotonicity. Such testers are called pair
testers.
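
The edge test over the hypercube can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of [9]: the constant 2 in the default number of repetitions is an arbitrary choice standing in for the constant hidden in $O(\varepsilon^{-1} d)$, and the function name and signature are hypothetical.

```python
import math
import random

def edge_tester(f, d, eps, rng=None, trials=None):
    """Nonadaptive, one-sided edge tester for f: {0,1}^d -> {0,1}.

    Returns False (reject) as soon as a violated edge is sampled and
    True (accept) otherwise. `trials` defaults to ceil(2 d / eps); the
    constant 2 is an assumption, not taken from the source.
    """
    rng = rng or random.Random()
    if trials is None:
        trials = math.ceil(2 * d / eps)
    for _ in range(trials):
        # A uniform point plus a uniform coordinate gives a uniform edge.
        x = [rng.randint(0, 1) for _ in range(d)]
        i = rng.randrange(d)
        lower, upper = list(x), list(x)
        lower[i], upper[i] = 0, 1
        if f(tuple(lower)) > f(tuple(upper)):   # violation of monotonicity
            return False
    return True

# A monotone threshold function is always accepted.
print(edge_tester(lambda x: int(sum(x) >= 2), d=4, eps=0.5))
```

Because the tester rejects only after seeing a violated edge, it never rejects a monotone function, which is the one-sided error property mentioned above.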

Goldreich et al. [9] showed a range reduction
theorem which states that if there exists a
pair tester for Boolean functions over the
hypergrid whose query complexity has linear
dependence on $\varepsilon^{-1}$, that is, $q(d,n,\varepsilon) = \varepsilon^{-1} q(d,n)$,
then such a tester could be extended to
give an $O(\varepsilon^{-1} q(d,n) |R|)$-query tester for
real-valued functions, where $R$ is the range
of the real-valued function. Note that $|R|$
could be as large as $2^d$. Dodis et al. [6]
obtained a stronger range reduction theorem
and showed an $O(\varepsilon^{-1} q(d,n) \log |R|)$-query
tester under the same premise. This implied an
$O(\varepsilon^{-1} d \log n \log |R|)$-query monotonicity tester
for the hypergrid.

Recently, Chakrabarty and Seshadhri [3] removed
the dependence on the range size, exhibiting
an $O(\varepsilon^{-1} d \log n)$-query monotonicity tester
for any real-valued function. In fact, the tester
in their paper is the same as the tester defined
in [9] alluded to above. In a separate paper, the
same authors [4] complemented the above result
by showing that any tester (adaptive and with
two-sided error) for monotonicity with distance
parameter $\varepsilon$ must make $\Omega(\varepsilon^{-1} d \log n)$ queries.


Sketch of the Techniques 

In this section, we sketch some techniques used 
in the results mentioned above. For simplicity, 
we restrict ourselves to functions defined on the 
hypercube. 

To analyze their tester, Goldreich et al. [9]
used the following shifting technique from combinatorics
to convert a Boolean function $f$ to a
monotone function. Pick any dimension $i$ and
look at all the violated edges whose endpoints
differ precisely in this dimension. If $(x \prec y)$ is
a violation, then since $f$ is Boolean, it must be
that $f(x) = 1$ and $f(y) = 0$. For every such
violation, redefine $f(x) = 0$ and $f(y) = 1$. Note
that once dimension $i$ is treated so, there is no
violation across dimension $i$. The crux of the argument
is that for any other dimension $j$, the number
of violated edges across that dimension can
only decrease. Therefore, once all dimensions are
treated, the function becomes monotone, and the
number of points at which it has been modified
is at most twice the number of violated edges in
the beginning. In other words, if $f$ is $\varepsilon$-far, then
the number of violated edges is at least $\varepsilon 2^{d-1}$.
Therefore, the edge test succeeds with probability
at least $\varepsilon/d$ since the total number of edges is
$d 2^{d-1}$.
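
The shifting argument can be checked on small examples with the sketch below. It represents a Boolean function as a dictionary over the points of $\{0,1\}^d$, treats the dimensions one at a time, and counts violated edges; the helper names are hypothetical and the brute-force enumeration is meant only for tiny $d$.

```python
from itertools import product

def up_neighbor(x, i):
    """Return x with coordinate i raised from 0 to 1."""
    return x[:i] + (1,) + x[i + 1:]

def violated_edges(f, d):
    """Count hypercube edges (x, y), x below y, with f[x] > f[y]."""
    return sum(1 for x in product((0, 1), repeat=d) for i in range(d)
               if x[i] == 0 and f[x] > f[up_neighbor(x, i)])

def shift(f, d):
    """Treat dimensions 0, ..., d-1 in turn, swapping the two values
    on every violated edge across the current dimension."""
    g = dict(f)
    for i in range(d):
        for x in product((0, 1), repeat=d):
            if x[i] == 0:
                y = up_neighbor(x, i)
                if g[x] > g[y]:            # violation: g[x] = 1, g[y] = 0
                    g[x], g[y] = g[y], g[x]
    return g

points = list(product((0, 1), repeat=3))
f = {x: 1 - x[0] for x in points}          # anti-monotone in coordinate 0
g = shift(f, 3)
changed = sum(f[x] != g[x] for x in points)
print(violated_edges(f, 3), violated_edges(g, 3), changed)
```

For every Boolean function the shifted function has no violated edge, and the number of changed points is at most twice the initial number of violated edges, as the argument above states.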

The fact that treating one dimension does not 
increase the number of violated edges across any 
other dimension crucially uses the fact that the 
function is Boolean and is, in general, not true 
for all real-valued functions. The range reduction 
techniques of [6,9] convert a real-valued function 
to a collection of Boolean ones with certain 
properties, and the size of the collection, which 
is $|R|$ in [9] and $O(\log |R|)$ in [6], appears as the
extra multiplicative term in the query complexity. 

The technique of Chakrabarty and Seshadhri [3]
to handle general real-valued
functions departs from the above in that it does
not directly fix a function to make it monotone.
Rather, they use the connection between the
distance to monotonicity and matchings in the
violation graph $G_f$, which was defined by [6].
The vertices of $G_f$ are the domain points, and
$(x, y)$ is an edge in $G_f$ if and only if $(x, y)$ is a
violation to monotonicity of $f$. A folklore result,
which appears in print in [8], is that the distance
to monotonicity of $f$ is precisely the cardinality
of the minimum vertex cover of $G_f$ divided by
the domain size. This, in turn, implies that if $f$
is $\varepsilon$-far, then any maximal matching in $G_f$ must
have cardinality at least $\varepsilon 2^{d-1}$.
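
The relation between vertex covers and maximal matchings of the violation graph can be verified by brute force on a tiny hypercube. The sketch below is only an illustration: it enumerates all comparable pairs and all vertex subsets, which is feasible for $d = 3$ but not beyond very small $d$, and every function name is hypothetical.

```python
from itertools import combinations, product

def violation_graph(f, d):
    """Return (points, edges); (x, y) is an edge iff x is strictly
    below y in the coordinate-wise order and f[x] > f[y]."""
    points = list(product((0, 1), repeat=d))

    def below(x, y):
        return x != y and all(a <= b for a, b in zip(x, y))

    edges = [(x, y) for x in points for y in points
             if below(x, y) and f[x] > f[y]]
    return points, edges

def greedy_maximal_matching(edges):
    """Scan the edges once, keeping each edge whose endpoints are free."""
    used, matching = set(), []
    for x, y in edges:
        if x not in used and y not in used:
            matching.append((x, y))
            used.update((x, y))
    return matching

def min_vertex_cover_size(points, edges):
    """Exhaustive search; exponential in the number of points."""
    for size in range(len(points) + 1):
        for cover in combinations(points, size):
            chosen = set(cover)
            if all(x in chosen or y in chosen for x, y in edges):
                return size

f = {x: 1 - x[0] for x in product((0, 1), repeat=3)}
points, edges = violation_graph(f, 3)
matching = greedy_maximal_matching(edges)
cover = min_vertex_cover_size(points, edges)
print(len(matching), cover)
```

The endpoints of any maximal matching form a vertex cover, so the minimum vertex cover has size at most twice that of the matching; this is the step that turns the distance bound into a lower bound on the matching size.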

Chakrabarty and Seshadhri [3] introduce the
notion of the weighted violation graph $G_f$ where
the weight of $(x \prec y)$ is defined as $(f(x) -
f(y))$. They prove that the number of violated
edges of the hypercube is at least the size of the
maximum weight matching $M^*$ in $G_f$. Since $M^*$
is also maximal, this shows that the number of
violated edges is at least $\varepsilon 2^{d-1}$. The proof of [3]
follows by charging each matched pair of points
in $M^*$ differing in the $i$th dimension to a distinct
violated edge across the $i$th dimension.


Boolean Monotonicity Testing 
Let us go back to Boolean functions defined over
the hypercube. Recall that Goldreich et al. [9]
designed an $O(\varepsilon^{-1} d)$-query tester for such
functions. Furthermore, their analysis is tight:
there are functions for which the edge tester's
success probability is $O(\varepsilon/d)$. The best known
lower bound [8], however, is that any nonadaptive
tester with one-sided error with respect to a
specific constant distance parameter requires
$\Omega(\sqrt{d})$ queries, and any adaptive, one-sided
error tester requires $\Omega(\log d)$ queries.
Chakrabarty and Seshadhri [2] obtained
the first $o(d)$-query tester: they describe an
$O(d^{7/8} \varepsilon^{-3/2} \ln(1/\varepsilon))$-query tester for Boolean
functions defined over the hypercube. The tester
is a combination of the edge test and what the
authors call the path test; in each step one of
the two is performed with probability 1/2. The
path test is the following. Orient all edges in
the hypercube to go from the point with fewer
1s to the one with more 1s. Sample a random
directed path from the all-0s point to the all-
1s point. Sample two points $x, y$ on this path
which (a) have "close" to $d/2$ ones, and (b)
are "sufficiently" far away from each other.
Query $f(x), f(y)$ and check for a violation to
monotonicity.
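
One round of the path test might be sketched as below. The text leaves "close" and "sufficiently" unspecified, so the parameters `width` (allowed deviation of the number of ones from $d/2$) and `gap` (minimum distance between the two levels) are assumptions introduced here, as is the uniform choice of the two levels; the actual distribution used in [2] may differ.

```python
import random

def path_test(f, d, rng, width, gap):
    """Run one path test on f: {0,1}^d -> {0,1}.

    Returns True if the sampled pair shows no violation and False if it
    does. `width` and `gap` are hypothetical tuning parameters.
    """
    lo, hi = max(0, d // 2 - width), min(d, d // 2 + width)
    if hi - lo < gap:
        raise ValueError("gap exceeds the allowed band of levels")
    # A random directed path from all-0s to all-1s corresponds to a random
    # order in which the coordinates are switched from 0 to 1.
    order = list(range(d))
    rng.shuffle(order)
    a = rng.randint(lo, hi - gap)      # number of ones in x
    b = rng.randint(a + gap, hi)       # number of ones in y
    x = [0] * d
    for i in order[:a]:
        x[i] = 1
    y = list(x)
    for i in order[a:b]:
        y[i] = 1
    return f(tuple(x)) <= f(tuple(y))

rng = random.Random(0)
print(path_test(lambda x: int(sum(x) >= 5), 10, rng, width=2, gap=4))
```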

Chakrabarty and Seshadhri [2] obtain the following
result for the path tester. If in the hypercube
(and not the violation graph), there exists
a matching of violated edges whose cardinality
equals $\sigma 2^d$, then the path test catches a violation
with probability roughly $\Omega(\sigma^3/\sqrt{d})$. Although
this is good if $\sigma$ is large, say a constant, the
analysis above does not give better testers for
functions with small $\sigma$. The authors circumvent
this via the following dichotomy theorem. Given
a function $f$ with distance parameter $\varepsilon$, let the
number of violated edges be $\delta 2^d$; from the result
of Goldreich et al. [9], we know that $\delta \ge \varepsilon$.
Chakrabarty and Seshadhri [2] prove that if for
any function $f$ the quantity $\sigma$ is small, then
$\delta$ must be large. In particular, they prove that
$\delta\sigma \ge \varepsilon^2/32$. Therefore, for any function, either
the edge test or the path test succeeds with probability
$\omega(1/d)$.

Very recent work settles the question of nonadaptive,
Boolean monotonicity testing. Chen,
De, Servedio, and Tan [5] prove that any nonadaptive
monotonicity tester for Boolean functions
needs to make $\Omega(d^{1/2-c})$ queries for any
constant $c > 0$, even if it is allowed to have two-
sided error. Khot, Minzer, and Safra [10] generalize
the dichotomy theorem of Chakrabarty and
Seshadhri [2] to obtain an $O(d^{1/2} \varepsilon^{-2})$-query,
nonadaptive monotonicity tester with one-sided error.


Open Problems 


The query complexity of adaptive monotonicity
testers for Boolean-valued functions is not well
understood. The best upper bounds are those of
nonadaptive testers, while the best lower bound
is only $\Omega(\log d)$. Understanding whether adaptivity
helps in Boolean monotonicity testing or
not is an interesting open problem. Recent work
of Berman, Raskhodnikova, and Yaroslavtsev [1]
sheds some light: they show that any nonadaptive,
one-sided error monotonicity tester for functions
$f : [n] \times [n] \to \{0,1\}$, that is, functions
defined over the two-dimensional grid, requires
$\Omega(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})$ queries where $\varepsilon$ is the distance
parameter; on the other hand, they also demonstrate
an adaptive, one-sided error monotonicity
tester for such functions which makes only $O(\frac{1}{\varepsilon})$
queries.

In this article, we only discussed the poset
defined by the $n$-hypergrid. The best known tester
for functions defined over a general $N$-element
poset is an $O(\sqrt{N/\varepsilon})$ tester, while the best lower
bound is $\Omega(N^{1/\log\log N})$ for nonadaptive testers.
Both results are due to Fischer et al. [8], and
closing this gap is a challenging problem.

Problem Definition


A multi-armed bandit is a sequential decision 
problem defined on a set of actions. At each time 
step, the decision maker selects an action from 
the set and obtains an observable payoff. The 
goal is to maximize the total payoff obtained in 
a sequence of decisions. The name bandit refers 
to the colloquial term for a slot machine (“one- 
armed bandit” in American slang) and to the 
decision problem, faced by a casino gambler, of 
choosing which slot machine to play next. Ban- 
dit problems naturally address the fundamental 
trade-off between exploration and exploitation 
in sequential experiments. Indeed, the decision 
maker must use a strategy (called allocation pol- 
icy) able to balance the exploitation of actions 
that did well in the past with the exploration 
of actions that might give higher payoffs in the 
future. Although the original motivation came
from clinical trials [14] (when different treat- 
ments are available for a certain disease and 
one must decide which treatment to use on the 
next patient), bandits have often been used in 
industrial applications, for example, to model the 
sequential allocation of a unit resource to a set of 
competing tasks. 


Definitions and Notation 
A bandit problem with $K \ge 2$ actions is specified
by the processes $(X_{i,t})$ that, at each time step
$t = 1, 2, \ldots$, assign a payoff $X_{i,t}$ to each action
$i = 1, \ldots, K$. An allocation policy selects at
time $t$ an action $I_t \in \{1, \ldots, K\}$, possibly using
randomization, and receives the associated payoff
$X_{I_t,t}$. Note that the index $I_t$ of the action selected
by the allocation policy at time $t$ can only depend
on the set $X_{I_1,1}, \ldots, X_{I_{t-1},t-1}$ of previously
observed payoffs (and on the policy's internal
randomization). It is this information constraint that
creates the exploration vs. exploitation dilemma
at the core of bandit problems.

The performance of an allocation policy over
a horizon of $T$ steps is typically measured against
that of the policy that consistently plays the
optimal action for this horizon. This notion of
performance, called regret, is formally defined by

\[
R_T = \max_{i=1,\ldots,K} \mathbb{E}\left[ \sum_{t=1}^{T} X_{i,t} - \sum_{t=1}^{T} X_{I_t,t} \right]. \qquad (1)
\]

The expectation in (1) is taken with respect to 
both the policy’s internal randomization and the 
potentially stochastic nature of the payoff pro- 
cesses. Whereas we focus here on payoff pro- 
cesses that are either deterministic or stochastic 
i.i.d., other choices have been also considered. 
Notable examples are the Markovian payoff pro- 
cesses [8] or the more general Markov decision 
processes studied in reinforcement learning. 

If the processes $(X_{i,t})$ are stochastic i.i.d. with
unknown expectations $\mu_1, \ldots, \mu_K$, as in Robbins'
original formalization of the bandit problem [13], then
$\max_{i=1,\ldots,K} \mathbb{E}\big[\sum_{t=1}^{T} X_{i,t}\big] = T\mu^*$,
where $\mu^* = \max_i \mu_i$ is the highest expected
payoff. In this case, the regret (1) becomes the
stochastic regret

\[
R_T^{\mathrm{IID}} = T\mu^* - \sum_{t=1}^{T} \mathbb{E}\left[ \mu_{I_t} \right]. \qquad (2)
\]


On the other hand, if the payoff processes $(X_{i,t})$
are fixed, deterministic sequences of unknown
numbers $x_{i,t}$, then (1) becomes the nonstochastic
regret

\[
R_T^{\mathrm{DET}} = \max_{i=1,\ldots,K} \sum_{t=1}^{T} x_{i,t} - \mathbb{E}\left[ \sum_{t=1}^{T} x_{I_t,t} \right], \qquad (3)
\]

where the expectation is only with respect to
the internal randomization used by the allocation
policy. This nonstochastic version of the bandit
problem is directly inspired by the problem of
playing repeatedly an unknown game; see the
pioneering work of Hannan [9] and Blackwell [5]
on repeated games and also the recent literature
on online learning.

The analyses in [2,3] assume bounded payoffs,
$X_{i,t} \in [0,1]$ for all $i$ and $t$. Under this
assumption, $R_T = O(T)$ irrespective of the
allocation policy being used. The main problem
is to determine the optimal allocation policies
(the ones achieving the slowest regret growth)
for the stochastic and the deterministic case. The
parameters that are typically used to express the
regret bounds are the horizon $T$ and the number
$K$ of actions.


Key Results 


Consider first the stochastic i.i.d. bandits
with $K \ge 2$ actions and expected payoffs
$\mathbb{E}[X_{i,t}] = \mu_i$ for $i = 1, \ldots, K$ and $t \ge 1$.
Also, let $\Delta_i = \mu^* - \mu_i$ (where, as before,
$\mu^* = \max_{i=1,\ldots,K} \mu_i$). A simple instance of
stochastic bandits are the Bernoulli bandits,
where payoffs $X_{i,1}, X_{i,2}, \ldots$ for each action $i$
are i.i.d. Bernoulli random variables. Lai and
Robbins [11] prove the following asymptotic
lower bound on $R_T^{\mathrm{IID}}$ for Bernoulli bandits.
Let $N_{i,T} = |\{t = 1, \ldots, T : I_t = i\}|$ be the
number of times the allocation policy selected
action $i$ within horizon $T$, and let $\mathrm{KL}(\mu, \mu')$ be
the Kullback-Leibler divergence between two
Bernoulli random variables of parameter $\mu$ and
$\mu'$.

Theorem 1 ([11]) Consider an allocation policy
that, for any Bernoulli bandit with $K \ge 2$ actions,
for any action $i$ with $\mu_i < \mu^*$, and for any $a > 0$,
satisfies $\mathbb{E}[N_{i,T}] = o(T^a)$. Then, for any choice
of $\mu_1, \ldots, \mu_K$,

\[
\liminf_{T \to \infty} \frac{R_T^{\mathrm{IID}}}{\ln T} \ge \sum_{i : \Delta_i > 0} \frac{\Delta_i}{\mathrm{KL}(\mu_i, \mu^*)}.
\]

This shows that when $\mu_1, \ldots, \mu_K$ are fixed and
$T \to \infty$, no policy can have a stochastic regret
growing slower than $\Theta(K \ln T)$ in a Bernoulli
bandit. For any fixed horizon $T$, the following
stronger lower bound holds.


Theorem 2 ([3]) For all $K \ge 2$ and any horizon
$T$, there exist $\mu_1, \ldots, \mu_K$ such that any allocation
policy for the Bernoulli bandit with $K$
actions suffers a stochastic regret of at least

\[
R_T^{\mathrm{IID}} \ge \frac{1}{20} \sqrt{KT}. \qquad (4)
\]

Note that Yao's minimax principle immediately
implies the lower bound $\Omega(\sqrt{KT})$ on the nonstochastic
regret $R_T^{\mathrm{DET}}$.

Starting with [11], several allocation policies
have been proposed that achieve a stochastic
regret of optimal order $K \ln T$. The UCB algorithm
of [2] is a simple policy achieving this
goal nonasymptotically. At each time step $t$, UCB
selects the action $I_t$ maximizing $\bar{X}_{i,t} + C_{i,t}$. Here
$\bar{X}_{i,t}$ is the sample average of payoffs in previous
selections of $i$, and $C_{i,t}$ is an upper bound on the
length of the confidence interval for the estimate
of $\mu_i$ at confidence level $1 - \frac{1}{t}$.

Theorem 3 ([2]) There exists an allocation policy
that for any Bernoulli bandit with $K \ge 2$
actions satisfies

\[
R_T^{\mathrm{IID}} \le \sum_{i : \Delta_i > 0} \left( \frac{8}{\Delta_i} \ln T + 2 \right) \quad \text{for all } T.
\]


The same result also applies to any stochastic
i.i.d. bandit with payoffs bounded in $[0,1]$. A
comparison with Theorem 2 can be made by
removing the dependence on $\Delta_i$ in the upper
bound of Theorem 3. Once rewritten in this way,
the bound becomes $R_T^{\mathrm{IID}} \le 2\sqrt{KT \ln T}$, showing
optimality (up to log factors) of UCB even when
the values $\mu_1, \ldots, \mu_K$ can be chosen as a function
of a target horizon $T$.
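
A minimal sketch of a UCB-style policy is given below. The text does not specify the confidence term, so the sketch assumes the common choice $\sqrt{2 \ln t / N_i}$, where $N_i$ counts the previous selections of action $i$; this choice, the function name `ucb1`, and the representation of arms as zero-argument callables are all assumptions made for illustration.

```python
import math
import random

def ucb1(arms, horizon):
    """Play `horizon` rounds on `arms`, a list of zero-argument callables
    returning payoffs in [0, 1]; return the selection count of each arm.

    Index of arm i: sample mean + sqrt(2 ln t / N_i) (assumed bonus).
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1                              # play every arm once
        else:
            i = max(range(k), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        payoff = arms[i]()
        counts[i] += 1
        sums[i] += payoff
    return counts

rng = random.Random(0)
bernoulli = [lambda p=p: float(rng.random() < p) for p in (0.2, 0.8)]
print(ucb1(bernoulli, 2000))
```

The suboptimal arm keeps being sampled, but only about logarithmically often in the horizon, which is the behavior behind the $K \ln T$ regret order discussed above.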

A bound on the nonstochastic regret matching
the lower bound of Theorem 2 up to log factors
is obtained via the randomized Exp3 policy
introduced in [3]. At each time step $t$, Exp3
selects each action $i$ with probability proportional
to $\exp(\eta_t \hat{X}_{i,t})$, where $\hat{X}_{i,t}$ is an importance
sampling estimate of the cumulative payoff
$x_{i,1} + \cdots + x_{i,t-1}$ (recall that $x_{i,t}$ is only observed
if $I_t = i$) and the parameter $\eta_t$ is set to
$\sqrt{(\ln K)/(tK)}$.


Theorem 4 ([3]) For any bandit problem with
$K \ge 2$ actions and deterministic payoffs $(x_{i,t})$,
the regret of the Exp3 algorithm satisfies
$R_T^{\mathrm{DET}} \le 2\sqrt{KT \ln K}$ for all $T$.
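
The Exp3 rule described above can be sketched as follows. The sketch follows the description in the text (exponential weights on importance-sampling estimates with $\eta_t = \sqrt{(\ln K)/(tK)}$); the estimate $x_{i,t}/p_{i,t}$ for the played action is the standard importance-sampling form and is an assumption here, as is the function name. Subtracting the largest estimate before exponentiating is a numerical-stability detail that does not change the probabilities.

```python
import math
import random

def exp3(payoffs, rng):
    """Run the Exp3 rule on a table of deterministic payoffs.

    payoffs[t][i] in [0, 1] is the payoff of action i at step t + 1;
    only the payoff of the played action is used. Returns the total
    payoff collected by the policy.
    """
    k = len(payoffs[0])
    estimates = [0.0] * k          # importance-sampling estimates
    total = 0.0
    for t, row in enumerate(payoffs, start=1):
        eta = math.sqrt(math.log(k) / (t * k))
        top = max(estimates)
        weights = [math.exp(eta * (e - top)) for e in estimates]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        total += row[i]
        estimates[i] += row[i] / probs[i]   # unbiased estimate of x_{i,t}
    return total

table = [[0.0, 1.0]] * 2000        # action 1 always pays 1, action 0 pays 0
print(exp3(table, random.Random(0)))
```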


Variants 
The bandit problem has been extended in several
directions. For example, in the pure exploration
variant of stochastic bandits [6], a different notion
of regret (called simple regret) is used. In this
setting, at the end of each step $t$, the policy has to
output a recommendation $J_t$ for the index of the
optimal action. The simple regret of the policy for
horizon $T$ is then defined by $r_T = \mu^* - \mathbb{E}[\mu_{J_T}]$.

The term adaptive adversary is used to denote
a generalized nonstochastic bandit problem
where the payoff processes of all actions are represented
by a deterministic sequence $f_1, f_2, \ldots$
of functions. The payoff at time $t$ of action
$i$ is then defined by $f_t(I_1, \ldots, I_{t-1}, i)$, where
$I_1, \ldots, I_{t-1}$ is the sequence of actions selected
by the policy up to time $t-1$. In the presence of an
adaptive adversary, the appropriate performance
measure is policy regret [1].

In the setting of contextual bandits [15], at 
each time step the allocation policy has access 
to side information (e.g., in the form of a feature 
vector). Here regret is not measured with respect 
to the best action, but rather with respect to the 
best mapping from side information to actions in 
a given class of such mappings. 

If the set of actions in a regular bandit is very
large, possibly infinite, then the regret can be
made small by imposing dependencies on the
payoffs. For instance, the payoff at time $t$ of
each action $a$ is defined by $f_t(a)$, where $f_t$ is an
unknown function. Control on the regret is then
achieved by making specific assumptions on the
space of actions $a$ and on the payoff functions
$f_t$ (e.g., linear, Lipschitz, smooth, convex, etc.);
see, e.g., [4,7] for early works in this direction.


Applications 


Bandit problems have an increasing number of 
industrial applications particularly in the area 
of online services, where one can benefit from 
adapting the service to the individual sequence of 
requests. A prominent example of bandit appli- 
cation is online advertising. This is the problem 
of deciding which advertisement to display on 
the web page delivered to the next visitor of a 
website. A related problem is website optimiza- 
tion, which deals with the task of sequentially 
choosing design elements (font, images, layout) 
of the web page to be displayed to the next 
visitor. Another important application area is that 
of personalized recommendation systems, where 
the goal is to choose what to show from multiple 
content feeds in order to match the user’s interest. 
In these applications, the payoff is associated
with the user's actions, e.g., click-throughs or
other desired behaviors; see, e.g., [12]. Bandits
have also been applied to the problem of source
routing, where a sequence of packets must be
routed from a source host to a destination host in
a given network, and the network protocol allows
one to choose a specific source-destination path for
each packet to be sent. The (negative) payoff is
the time it takes to deliver a packet and depends
additively on the congestion of the edges in the
chosen path; see, e.g., [4]. A further application
area is computer game playing, where each move 
is chosen by simulating and evaluating many 
possible game continuations after the move. Al- 
gorithms for bandits (more specifically, for a tree- 
based version of the bandit problem) can be used 
to explore more efficiently the huge tree of game 
continuations by focusing on the most promising 
subtrees. This idea has been successfully imple- 
mented in the MoGo player, which plays Go at 
the world-class level. MoGo is based on the UCT 
strategy for hierarchical bandits [10], which in 
turn builds on the UCB allocation policy. 

Problem Definition


Three related optimization problems derived
from the classical edge disjoint paths problem
(EDP) are described. An instance of EDP
consists of an undirected graph $G = (V, E)$
and a multiset $T = \{s_1t_1, s_2t_2, \ldots, s_kt_k\}$ of $k$
node pairs. EDP is a decision problem: can the
pairs in $T$ be connected (alternatively routed)
via edge-disjoint paths? In other words, are there
paths $P_1, P_2, \ldots, P_k$ such that for $1 \le i \le k$, $P_i$
is a path from $s_i$ to $t_i$, and no edge $e \in E$ is in
more than one of these paths? EDP is known
to be NP-Complete. This article considers three
maximization problems related to EDP.


• Maximum Edge-Disjoint Paths Problem
(MEDP). Input to MEDP is the same as for
EDP. The objective is to maximize the number
of pairs in $T$ that can be routed via edge-
disjoint paths. The output consists of a subset
$S \subseteq \{1, 2, \ldots, k\}$ and for each $i \in S$ a path
$P_i$ connecting $s_i$ to $t_i$ such that the paths are
edge-disjoint. The goal is to maximize $|S|$.

• Maximum Edge-Disjoint Paths Problem
with Congestion (MEDPwC). MEDPwC is
a relaxation of MEDP. The input, in addition
to $G$ and the node pairs, contains an integer
congestion parameter $c$. The output is the
same as for MEDP: a subset $S \subseteq \{1, 2, \ldots, k\}$
and for each $i \in S$ a path $P_i$ connecting $s_i$
to $t_i$. However, the paths $P_i$, $i \in S$, are
not required to be edge-disjoint. The relaxed
requirement is that for each edge $e \in E$,
the number of paths for the routed pairs that
contain $e$ is at most $c$. Note that MEDPwC
with $c = 1$ is the same as MEDP.

• All-or-Nothing Multicommodity Flow
Problem (ANF). ANF is a different relaxation
of MEDP obtained by relaxing the notion of
routing. A pair $s_it_i$ is now said to be routed
if a unit flow is sent from $s_i$ to $t_i$ (potentially
on multiple paths). The input is the same as
for MEDP. The output consists of a subset
$S \subseteq \{1, 2, \ldots, k\}$ such that there is a feasible
multicommodity flow in $G$ that routes one
unit of flow for each pair in $S$. The goal is to
maximize $|S|$.
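
To make the definitions of MEDP and MEDPwC concrete, the following sketch solves tiny instances by exhaustive search. It is exponential-time and meant only as an executable restatement of the problem, not as an algorithm from the article; it assumes a simple graph (no parallel edges), and the names `simple_paths` and `max_routable` are hypothetical.

```python
from collections import Counter

def simple_paths(adj, s, t, seen=None):
    """Yield every simple s-t path as a list of undirected edges."""
    seen = seen or (s,)
    if s == t:
        yield []
        return
    for v in adj.get(s, ()):
        if v not in seen:
            for rest in simple_paths(adj, v, t, seen + (v,)):
                yield [frozenset((s, v))] + rest

def max_routable(edges, pairs, c=1):
    """Largest number of pairs routable so that every edge carries at
    most c paths (c = 1 is MEDP). Exhaustive search for tiny inputs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    load = Counter()

    def solve(i):
        if i == len(pairs):
            return 0
        best = solve(i + 1)                      # leave pair i unrouted
        s, t = pairs[i]
        for path in simple_paths(adj, s, t):
            if all(load[e] < c for e in path):
                for e in path:
                    load[e] += 1
                best = max(best, 1 + solve(i + 1))
                for e in path:
                    load[e] -= 1
        return best

    return solve(0)

# Both pairs must cross the single edge (u, v).
edges = [("s1", "u"), ("s2", "u"), ("u", "v"), ("v", "t1"), ("v", "t2")]
pairs = [("s1", "t1"), ("s2", "t2")]
print(max_routable(edges, pairs), max_routable(edges, pairs, c=2))
```

In this instance only one pair can be routed with edge-disjoint paths, while congestion 2 allows both, which illustrates how MEDPwC relaxes MEDP.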

In the rest of the article, graphs are assumed
to be undirected multigraphs. Given a graph
$G = (V, E)$ and $S \subseteq V$, let $\delta_G(S)$ denote
the set of edges with exactly one end point
in $S$. Let $n$ denote the number of vertices in the
input graph.


Key Results 


A few results in the broader literature are re- 
viewed in addition to the results from [5]. EDP 
is NP-Complete when k is part of the input. 
A highly non-trivial result of Robertson and Sey- 
mour yields a polynomial time algorithm when k 
is a fixed constant. 


Theorem 1 ([16]) There is a polynomial time 
algorithm for EDP when k is a fixed constant 
independent of the input size. 


Using Theorem 1 it is easy to see that
MEDP and MEDPwC have polynomial time 
algorithms for fixed k. The same holds for 
ANF by simple enumeration since the decision 
version is polynomial-time solvable via linear 
programming. 

The focus of this article is on the case when k 
is part of the input, and in this setting, all three 
problems considered are NP-hard. The starting 
point for most approximation algorithms is the 
natural multicommodity flow relaxation given 
below. This relaxation is valid for both MEDP 
and ANF. The end points of the input pairs are 
referred to as terminals and let X denote the set 
of terminals. To describe the relaxation as well as 
simplify further discussion, the following simple 
assumption is made without loss of generality; 
each node in the graph participates in at most 
one of the input pairs. This assumption implies 
that the input pairs induce a matching M on the 
terminal set $X$. Thus the input for the problem can
alternatively be given as a triple $(G, X, M)$.

For the given instance $(G, X, M)$, let $\mathcal{P}_i$ denote
the set of paths joining $s_i$ and $t_i$ in $G$ and let
$\mathcal{P} = \cup_i \mathcal{P}_i$. The LP relaxation has the following
variables. For each path $P \in \mathcal{P}$ there is a variable
$f(P)$ which is the amount of flow sent on $P$. For
each pair $s_it_i$ there is a variable $x_i$ to indicate the
total flow that is routed for the pair.

\[
\begin{aligned}
\text{(MCF-LP)} \quad \max \ & \sum_{i=1}^{k} x_i \quad \text{s.t.} \\
& x_i - \sum_{P \in \mathcal{P}_i} f(P) = 0, && 1 \le i \le k \\
& \sum_{P : e \in P} f(P) \le 1, && \forall e \in E \\
& x_i, f(P) \in [0,1], && 1 \le i \le k,\ P \in \mathcal{P}
\end{aligned}
\]


The above path formulation has an exponential
(in $n$) number of variables; however, it can still
be solved in polynomial time. There is also an
equivalent compact formulation with a polynomial
number of variables and constraints. Let
OPT denote the value of an optimum solution to
a given instance. Similarly, let OPT-LP denote the
value of an optimum solution to the LP relaxation
for the given instance. It can be seen that
OPT-LP $\ge$ OPT. It is known that the integrality gap
of (MCF-LP) is $\Omega(\sqrt{n})$ [10]; that is, there is
an infinite family of instances such that
OPT-LP/OPT $= \Omega(\sqrt{n})$. The current best approximation
algorithm for MEDP is given by the
following theorem.


Theorem 2 ([7]) The integrality gap of (MCF-LP) 
for MEDP is $\Theta(\sqrt{n})$ and there is an $O(\sqrt{n})$ 
approximation for MEDP. 


For MEDPwC the approximation ratio improves 
with the congestion parameter c. 


Theorem 3 ([18]) There is an $O(n^{1/c})$ approximation 
for MEDPwC with congestion parameter c. 
In particular there is a polynomial time 
algorithm that routes $\Omega(\text{OPT-LP}/n^{1/c})$ pairs 
with congestion at most c. 


The above theorem is established via randomized 
rounding of a solution to (MCF-LP). Similar 
results, but via simpler combinatorial algorithms, 
are obtained in [2, 15]. 

In [5] a new framework was introduced to 
obtain approximation algorithms for routing problems 
in undirected graphs via (MCF-LP). A key 
part of the framework is the so-called well-linked 
decomposition that allows a reduction of an arbitrary 
instance to an instance in which the terminals 
satisfy a strong property. 


Definition 1 Let G = (V, E) be a graph. A subset 
$X \subseteq V$ is cut-well-linked in G if for every 
$S \subset V$, $|\delta_G(S)| \ge \min\{|S \cap X|, |(V \setminus S) \cap X|\}$. 
X is flow-well-linked if there exists a feasible fractional 
multicommodity flow in G for the instance 
in which there is a demand of $1/|X|$ for each 
unordered pair $uv$, $u, v \in X$. 
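
The cut condition in Definition 1 can be tested directly on very small graphs. The sketch below is a brute-force check that enumerates every proper nonempty vertex subset S, so it takes exponential time and is meant only to make the definition concrete; the function name `is_cut_well_linked` and the edge-list input format are assumptions of this example.

```python
from itertools import combinations


def is_cut_well_linked(vertices, edges, terminals):
    """Brute-force test of the cut-well-linked condition of Definition 1.

    vertices: iterable of vertex names; edges: list of (u, v) pairs
    (undirected, unit capacity); terminals: the set X.
    """
    vertices = list(vertices)
    X = set(terminals)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            S = set(subset)
            # |delta_G(S)|: number of edges with exactly one endpoint in S
            boundary = sum(1 for u, v in edges if (u in S) != (v in S))
            inside = len(S & X)
            if boundary < min(inside, len(X) - inside):
                return False
    return True


path_edges = [(0, 1), (1, 2), (2, 3)]
cycle_edges = path_edges + [(3, 0)]
print(is_cut_well_linked(range(4), path_edges, {0, 1, 2, 3}))
print(is_cut_well_linked(range(4), cycle_edges, {0, 1, 2, 3}))
```

On the 4-vertex path with all vertices as terminals the set S = {0, 1} has a single boundary edge but two terminals on each side, so the condition fails; closing the path into a cycle repairs it.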


The main result in [5] is the following. 


Theorem 4 ([5]) Let (G, X, M) be an instance 
of MEDP or ANF and let OPT-LP be the 
value of an optimum solution to (MCF-LP) 
on (G, X, M). Then there is a polynomial 
time algorithm that obtains a collection of 
instances $(G_1, X_1, M_1), (G_2, X_2, M_2), \dots, 
(G_h, X_h, M_h)$ with the following properties: 


• The graphs $G_1, G_2, \dots, G_h$ are node-disjoint 
induced subgraphs of G. For $1 \le i \le h$, 
$X_i \subseteq X$ and $M_i \subseteq M$. 

• For $1 \le i \le h$, $X_i$ is flow-well-linked in $G_i$. 

• $\sum_{i=1}^{h} |X_i| = \Omega(\text{OPT-LP}/\log^2 n)$. 


For planar graphs and graphs that exclude a fixed 
minor, the above theorem gives a stronger 
guarantee: $\sum_{i=1}^{h} |X_i| = \Omega(\text{OPT-LP}/\log n)$. 
A well-linked instance satisfies a strong 
symmetry property based on the following 
observation. If X is flow-well-linked in G then 
for any matching J on X, OPT-LP on the instance 
(G, X, J) is $\Omega(|X|)$. Thus the particular matching 
M of a given well-linked instance (G, X, M) is 
essentially irrelevant. The second part of the 
framework in [5] consists of exploiting the 
well-linked property of the instances produced by the 
decomposition procedure. At a high level this 
is done by showing that if G has a well-linked 
set X, then it contains a "crossbar" (a routing 
structure) of size $\Omega(|X|/\mathrm{poly}(\log n))$. See [5] 
for more precise definitions. Techniques for the 
second part vary based on the problem as well as 
the family of graphs in question. The following 
results are obtained using Theorem 4 and other 
non-trivial ideas for the second part [3-5, 8]. 


Theorem 5 ([5]) There is an $O(\log^2 n)$ approximation 
for ANF. This improves to an $O(\log n)$ 
approximation in planar graphs. 


Theorem 6 ([5]) There is an $O(\log n)$ approximation 
for MEDPwC in planar graphs for c = 2. 
There is an $O(\log n)$ approximation for ANF in 
planar graphs. 


Theorem 7 ([8]) There is an $O(r \log n \log r)$ 
approximation for MEDP in graphs of treewidth 
at most r. 


Generalizations and Variants 

Some natural variants and generalizations of the 
problems mentioned in this article are obtained 
by considering three orthogonal aspects: (i) node 
disjointness instead of edge-disjointness, (ii) capacities 
on the edges and/or nodes, and (iii) demand 
values on the pairs (each pair $s_i t_i$ has an 
integer demand $d_i$ and the objective is to route $d_i$ 
units of flow between $s_i$ and $t_i$). Results similar to 
those mentioned in the article are shown to hold 
for these generalizations and variants [5]. Capacities 
and demand values on pairs are somewhat 
easier to handle while node-disjoint problems 
often require additional non-trivial ideas. The 
reader is referred to [5] for more details. 

For some special classes of graphs (trees, 
expanders and grids to name a few), constant 
factor or poly-logarithmic approximation ratios 
are known for MEDP. 


Multicommodity Flow, Well-linked Terminals and Routing Problems, Table 1 Known bounds for MEDP, ANF and MEDPwC in general undirected graphs. 

Integrality gap of (MCF-LP) 

          Upper bound        Lower bound 
MEDP      $O(\sqrt{n})$      $\Omega(\sqrt{n})$ 
MEDPwC    $O(n^{1/c})$       $\Omega(\log^{(1-\varepsilon)/(c+1)} n)$ 
ANF       $O(\log^2 n)$      $\Omega(\log^{1/2-\varepsilon} n)$ 


Applications 


Flow problems are at the core of combinatorial 
optimization and have numerous applications in 
optimization, computer science and operations 
research. Very special cases of EDP and MEDP 
include classical problems such as single-commodity 
flows and matchings in general 
graphs, both of which have many applications. 
EDP and variants arise most directly in telecom- 
munication networks and VLSI design. Since 
EDP captures difficult problems as special cases, 
there are only a few algorithmic tools that can 
address the numerous applications in a unified 
fashion. Consequently, empirical research tends 
to focus on application specific approaches to 
obtain satisfactory solutions. The flip side of the 
difficulty of EDP is that it offers a rich source of 
problems, the study of which has led to important 
algorithmic advances of broad applicability, as 
well as fundamental insights in graph theory, 
combinatorial optimization, and related fields. 


Open Problems 


A number of very interesting open problems 
remain regarding the approximability of the problems 
discussed in this article. Table 1 gives the 
best known upper and lower bounds on the 
approximation ratio as well as the integrality gap of 
(MCF-LP). All the inapproximability results in 
Table 1, and the integrality gap lower bounds 
for MEDPwC and ANF, are from [1]. The 
inapproximability results are based on the assumption 
that $\mathrm{NP} \not\subseteq \mathrm{ZTIME}(n^{\mathrm{polylog}(n)})$. Closing the gaps 
between the lower and upper bounds remains the 
major open problem. 


The best upper bound on the approximation ratio is the 
same as the upper bound on the integrality gap of (MCF-LP). 

Approximation ratio 

          Lower bound 
MEDP      $\Omega(\log^{1/2-\varepsilon} n)$ 
MEDPwC    $\Omega(\log^{(1-\varepsilon)/(c+1)} n)$ 
ANF       $\Omega(\log^{1/2-\varepsilon} n)$ 


Problem Definition 


The Multicut problem is a natural generalization 
of the s-t mincut problem: given an undirected 
capacitated graph G = (V, E) with k pairs of 
vertices $\{s_i, t_i\}$, the goal is to find a subset 
of edges of the smallest total capacity whose 
removal from G disconnects $s_i$ from $t_i$ for every 
$i \in \{1, \dots, k\}$. However, unlike the Mincut 
problem which is polynomial-time solvable, the 
Multicut problem is known to be NP-hard and 
APX-hard for $k \ge 3$ [6]. 
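
To make the problem statement concrete, here is a small exhaustive-search sketch in Python. It tries every subset of edges, so it is exponential and usable only on toy instances; it is not one of the algorithms discussed in this entry. The function names and the dictionary-based input format are assumptions of the example.

```python
from itertools import combinations


def separated(edge_list, removed, s, t):
    """Return True if s cannot reach t once the edges in `removed` are deleted."""
    adj = {}
    for u, v in edge_list:
        if (u, v) in removed:
            continue
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [s], {s}
    while stack:
        x = stack.pop()
        if x == t:
            return False
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return True


def min_multicut(capacity, pairs):
    """capacity: {(u, v): c} for undirected edges; pairs: list of (s_i, t_i)."""
    edge_list = list(capacity)
    best_cost, best_cut = float("inf"), None
    for r in range(len(edge_list) + 1):
        for cut in combinations(edge_list, r):
            cost = sum(capacity[e] for e in cut)
            removed = set(cut)
            if cost < best_cost and all(
                separated(edge_list, removed, s, t) for s, t in pairs
            ):
                best_cost, best_cut = cost, removed
    return best_cost, best_cut


# Star with center "c" and three leaves; every pair of leaves is a commodity.
star = {("c", "a"): 1, ("c", "b"): 1, ("c", "d"): 1}
leaf_pairs = [("a", "b"), ("b", "d"), ("a", "d")]
print(min_multicut(star, leaf_pairs))
```

On this star the minimum multicut has capacity 2. Every flow path uses two of the three unit-capacity edges, so the total multicommodity flow is at most 3/2, which gives a tiny instance where the two quantities differ.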

This problem is closely related to the Multi- 
Commodity Flow problem. The input to the latter 
is a capacitated network with k commodities 
(source-sink pairs); the goal is to route as much 
total flow between these source-sink pairs as pos- 
sible while satisfying capacity constraints. The 
maximum multi-commodity flow in a graph can 
be found in polynomial time via linear program- 
ming, and there are also several combinatorial 
FPTASes known for this problem [7, 9, 11]. 

It is immediate from the definition of Multi- 
cut that the multicommodity flow in a graph is 
bounded above by the capacity of a minimum 
multicut in the graph. When there is a single 
commodity to be routed, the max-flow min-cut 
theorem of Ford and Fulkerson [8] states that 
the converse also holds: the maximum s-t flow 
in a graph is exactly equal to the minimum s- 
t cut in the graph. This duality between flows 
and cuts in a graph has many applications and, in 
particular, leads to a simple algorithm for finding 
the minimum cut in a graph. 

Given its simplicity and elegance, several at- 
tempts have been made to extend this duality 
to other classes of flow and partitioning prob- 
lems. Hu showed, for example, that the min- 
multicut equals the maximum multi-commodity 
flow when there are only two commodities in 
the graph [12]. Unfortunately, this property does 
not extend to graphs with more than two com- 
modities. The focus has therefore been on obtain- 
ing approximate max-multicommodity flow min- 
multicut theorems. Such theorems would also 
imply a polynomial-time algorithm for approxi- 
mately computing the capacity of the minimum 
multicut in a graph. 


Key Results 


Garg, Vazirani and Yannakakis [10] were the first 
to obtain an approximate max-multicommodity 
flow min-multicut theorem. They showed that 
the maximum multicommodity flow in a graph 
is always at least an $\Omega(1/\log k)$ fraction of the 
minimum multicut in the graph. Moreover, their 
proof of this result is constructive. That is, they 
also provide an algorithm for computing a multicut 
for a given graph with capacity at most 
$O(\log k)$ times the maximum multicommodity 
flow in the graph. This is the best approximation 
algorithm known to date for the Multicut 
problem. 


Theorem 1 Let M denote the minimum multicut 
in a graph with k commodities and f denote 
the maximum multicommodity flow in the graph. 
Then 

$$\frac{M}{O(\log k)} \le f \le M.$$


Moreover, there is a polynomial time algorithm 
for finding an O(log k)-approximate multicut in 
a graph. 


Furthermore, they show that this theorem is tight 
to within constant factors. That is, there are families 
of graphs in which the gap between the 
maximum multicommodity flow and minimum 
multicut is $\Omega(\log k)$. 


Theorem 2 There exists an infinite family of multicut 
instances $\{(G_k, P_k)\}$ such that for all k, the 
graph $G_k = (V_k, E_k)$ contains k vertices and 
$P_k \subseteq V_k \times V_k$ is a set of $\Omega(k^2)$ source-sink pairs. 
Furthermore, the maximum multicommodity flow 
in the instance $(G_k, P_k)$ is $O(k/\log k)$ and the 
minimum multicut is $\Omega(k)$. 


Garg et al. also consider the Sparsest Cut problem 
which is another partitioning problem closely 
related to Multicut, and provided an approxima- 
tion algorithm for this problem. Their results for 
Sparsest Cut have subsequently been improved 
upon [3, 15]. The reader is referred to the entry 
on Sparsest Cut for more details. 


Applications 


A key application of the Multicut problem is 
to the 2CNF≡ Deletion problem. The latter is 
a constraint satisfaction problem in which, given 
a weighted set of clauses of the form $P \equiv Q$, 
where P and Q are literals, the goal is to delete 
a minimum weight set of clauses so that the 
remaining set is satisfiable. The 2CNF≡ Deletion 
problem models a number of partitioning problems, 
for example the Minimum Edge-Deletion 
Graph Bipartization problem: finding the minimum 
weight set of edges whose deletion makes 
a graph bipartite. Klein et al. [14] showed that 
the 2CNF≡ Deletion problem reduces in an 
approximation preserving way to Multicut. Therefore, 
a ρ-approximation to Multicut implies a 
ρ-approximation to 2CNF≡ Deletion. (See the 
survey by Shmoys [16] for more applications.) 


Open Problems 


There is a big gap between the best-known 
algorithm for Multicut and the best hardness 
result (APX-hardness) known for the problem. 
Improvements in either direction may 
be possible, although there are indications 
that the $O(\log k)$ approximation is the best 
possible. In particular, Theorem 2 implies 
that the integrality gap of the natural linear 
programming relaxation for Multicut is $\Omega(\log k)$. 
Although improved approximations have been 
obtained for other partitioning problems using 
semi-definite programming instead of linear 
programming, Agarwal et al. [1] showed that 
similar improvements cannot be achieved for 
Multicut: the integrality gap of the natural SDP 
relaxation for Multicut is also $\Omega(\log k)$. On the 
other hand, there are indications that the 
APX-hardness is not tight. In particular, assuming the 
so-called Unique Games conjecture, it has been 
shown that Multicut cannot be approximated to 
within any constant factor [4, 13]. In light of these 
negative results, the main open problem related 
to this work is to obtain a super-constant hardness 
for the Multicut problem under a standard 
assumption such as $\mathrm{P} \ne \mathrm{NP}$. 

The Multicut problem has also been 
studied in directed graphs. The best known 
approximation algorithm for this problem is 
an $O(n^{11/23} \log^{O(1)} n)$-approximation due to 
Aggarwal, Alon and Charikar [2], while on the 
hardness side, Chuzhoy and Khanna [5] show that 
there is no $2^{\Omega(\log^{1-\varepsilon} n)}$ approximation, for any 
$\varepsilon > 0$, unless $\mathrm{NP} \subseteq \mathrm{ZPP}$. Chuzhoy and Khanna 
also exhibit a family of instances for which the 
integrality gap of the natural LP relaxation of 
this problem (which is also the gap between the 
maximum directed multicommodity flow and the 
minimum directed multicut) is $\tilde{\Omega}(n^{1/7})$. 


Cross-References 


Sparsest Cut 


Recommended Reading 


1. Agarwal A, Charikar M, Makarychev K, Makarychev 
Y (2005) $O(\sqrt{\log n})$ approximation algorithms for 
Min UnCut, Min 2CNF deletion, and directed cut 
problems. In: Proceedings of the 37th ACM symposium 
on theory of computing (STOC), Baltimore, 
pp 573-581 

2. Aggarwal A, Alon N, Charikar M (2007) Improved 
approximations for directed cut problems. In: Pro- 
ceedings of the 39th ACM symposium on theory of 
computing (STOC), San Diego, pp 671-680 

3. Arora S, Rao S, Vazirani U (2004) Expander flows, 
geometric embeddings, and graph partitionings. In: 
Proceedings of the 36th ACM symposium on theory 
of computing (STOC), Chicago, pp 222-231 

4. Chawla S, Krauthgamer R, Kumar R, Rabani Y, 
Sivakumar D (2005) On the hardness of approximat- 
ing sparsest cut and multicut. In: Proceedings of the 
20th IEEE conference on computational complexity 
(CCC), San Jose, pp 144-153 

5. Chuzhoy J, Khanna S (2007) Polynomial flow-cut 
gaps and hardness of directed cut problems. In: Pro- 
ceedings of the 39th ACM symposium on theory of 
computing (STOC), San Diego, pp 179-188 

6. Dahlhaus E, Johnson DS, Papadimitriou CH, Seymour 
PD, Yannakakis M (1994) The complexity of 
multiterminal cuts. SIAM J Comput 23(4):864-894 

7. Fleischer L (1999) Approximating fractional multicommodity 
flow independent of the number of com- 
modities. In: Proceedings of the 40th IEEE sympo- 
sium on foundations of computer science (FOCS), 
New York, pp 24-31 

8. Ford LR, Fulkerson DR (1956) Maximal flow through 
a network. Can J Math 8:399-404 

9. Garg N, Könemann J (1998) Faster and simpler 
algorithms for multicommodity flow and other frac- 
tional packing problems. In: Proceedings of the 39th 
IEEE symposium on foundations of computer science 
(FOCS), pp 300-309 




10. Garg N, Vazirani VV, Yannakakis M (1996) Approximate 
max-flow min-(multi)cut theorems and their 
applications. SIAM J Comput 25(2):235-251 

11. Grigoriadis MD, Khachiyan LG (1996) Coordination 
complexity of parallel price-directive decomposition. 
Math Oper Res 21:321-340 

12. Hu TC (1963) Multi-commodity network flows. Oper 
Res 11(3):344-360 

13. Khot S, Vishnoi N (2005) The unique games conjecture, 
integrality gap for cut problems and the embeddability 
of negative-type metrics into $\ell_1$. In: Proceedings 
of the 46th IEEE symposium on foundations of 
computer science (FOCS), pp 53-62 

14. Klein P, Agrawal A, Ravi R, Rao S (1990) Approx- 
imation through multicommodity flow. In: Proceed- 
ings of the 31st IEEE symposium on foundations of 
computer science (FOCS), pp 726-737 

15. Linial N, London E, Rabinovich Y (1995) The ge- 
ometry of graphs and some of its algorithmic appli- 
cations. Combinatorica 15(2):215-245, Also in Pro- 
ceedings of 35th FOCS, pp 577-591 (1994) 

16. Shmoys DB (1997) Cut problems and their applica- 
tion to divide-and-conquer. In: Hochbaum DS (ed) 
Approximation algorithms for NP-hard problems. 
PWS, Boston, pp 192-235 


Multidimensional Compressed 
Pattern Matching 


Amihood Amir 

Department of Computer Science, Bar-Ilan 
University, Ramat-Gan, Israel 

Department of Computer Science, Johns 
Hopkins University, Baltimore, MD, USA 


Keywords 


Multidimensional compressed search; Pat- 
tern matching in compressed images; Two- 
dimensional compressed matching 


Years and Authors of Summarized 
Original Work 


2003; Amir, Landau, Sokol 


Problem Definition 


Let c be a given compression algorithm, and let 
c(D) be the result of c compressing data D. The 
compressed search problem with compression 
algorithm c is defined as follows. 


INPUT: Compressed text c(T) and pattern P. 
OUTPUT: All locations in T where pattern P 
occurs. 


A compressed matching algorithm is optimal if 
its time complexity is O(|c(T)|). 

Although optimality in terms of time is al- 
ways important, when dealing with compres- 
sion, the criterion of extra space is perhaps 
more important (Ziv, Personal communication, 
1995). Applications employ compression tech- 
niques specifically because there is a limited 
amount of available space. Thus, it is not suffi- 
cient for a compressed matching algorithm to be 
optimal in terms of time, it must also satisfy the 
given space constraints. Space constraints may 
be due to limited amount of disk space (e.g., on 
a server), or they may be related to the size of the 
memory or cache. Note that if an algorithm uses 
as little extra space as the size of the cache, the 
runtime of the algorithm is also greatly reduced 
as no cache misses will occur [13]. It is also 
important to remember that in many applications, 
e.g., LZ compression on strings, the compression 
ratio |S|/|c(S)| is a small constant. In a case 
where the compression ratio of the given text 
is a constant, an optimal compressed matching 
algorithm performs no better than the naive algorithm of 
decompressing the text. However, if the constants 
hidden in the "big O" are smaller than the 
compression ratio, then the compressed matching 
does offer a practical benefit. If those constants 
are larger than the optimal, the compressed search 
algorithm may, in fact, be using more space than 
the uncompressed text. 


Definition 1 (inplace) A compressed matching 
is said to be inplace if the extra space used is 
proportional to the input size of the pattern. 


Note that this definition encompasses the com- 
pressed matching model (e.g., [2]) where the 
pattern is input in uncompressed form, as well 
as the fully compressed model [10], where the 
pattern is input in compressed form. The inplace 
requirement allows the extra space to be the input 
size of the pattern, whatever that size may be. 
However, in many applications the compression 
ratio is a constant; therefore, a stronger space 
constraint is defined. 


Definition 2 Let AP be the set of all patterns 
of size m, and let c(AP) be the set of all compressed 
images of AP. Let m' be the length of 
the smallest pattern in c(AP). A compressed 
matching algorithm with input pattern P of length 
m is called strongly inplace if the amount of extra 
space used is proportional to m'. 


The problem as defined above is equally 
applicable to textual (one-dimensional), image 
(two-dimensional), or any type of data, such as 
bitmaps, concordances, tables, XML data, or any 
possible data structure. 

The compressed matching problem is consid- 
ered crucial in image databases, since they are 
highly compressible. The initial definition of the 
compressed matching paradigm was motivated 
by the two dimensional run-length compression. 
This is the compression used for fax transmis- 
sions. The run-length compression is defined as 
follows. 

Let $S = s_1 s_2 \cdots s_n$ be a string over some alphabet 
$\Sigma$. The run-length compression of string 
S is the string $S' = \sigma_1^{r_1} \sigma_2^{r_2} \cdots \sigma_k^{r_k}$ such that (1) 
$\sigma_i \ne \sigma_{i+1}$ for $1 \le i < k$ and (2) S can be described 
as the concatenation of k segments, the 
symbol $\sigma_1$ repeated $r_1$ times, the symbol $\sigma_2$ 
repeated $r_2$ times, ..., and the symbol $\sigma_k$ repeated 
$r_k$ times. The two-dimensional run-length compression 
is the concatenation of the run-length 
compression of all the matrix rows (or columns). 
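
The definition above translates almost directly into code. The following Python sketch represents the compressed string as a list of (symbol, run length) pairs; that representation and the function names are choices made for this illustration only.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the list of (symbol, run length) pairs for the string s."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


def run_length_encode_2d(rows):
    """Concatenate the run-length encodings of all matrix rows."""
    return [pair for row in rows for pair in run_length_encode(row)]


print(run_length_encode("aaabccdd"))
print(run_length_encode_2d(["aab", "bbb"]))
```

Because rows are encoded independently and then concatenated, a run never crosses a row boundary: in the second call the single `b` ending the first row and the three `b`s of the second row stay separate pairs.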

The two-dimensional run-length compressed 
matching problem is defined as follows: 

INPUT: Text array T of size n × n, and pattern 
array P of size m × m, both in two-dimensional 
run-length compressed form. 

OUTPUT: All locations in T of occurrences of P. 
Formally, the output is the set of locations (i, j) 
such that T[i + k, j + l] = P[k + 1, l + 1] for 
k, l = 0 ... m − 1. 

Another ubiquitous lossless two-dimensional 
compression is CompuServe’s GIF standard, 
widely used on the World Wide Web. It uses 
LZW [19] (a variation of LZ78) on the image 
linearized row by row. 

The two-dimensional LZ compression is 
formally defined as follows. Given an image 
$T[1 \dots n, 1 \dots n]$, create a string $T_{lin}[1 \dots n^2]$ 
by concatenating all rows of T. Compressing 
$T_{lin}$ with one-dimensional LZ78 yields the 
two-dimensional LZ compression of the image T. 
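
A minimal sketch of this definition in Python follows. It linearizes the image row by row and applies a textbook LZ78 parse that emits (dictionary index, next symbol) tokens; the token format, the handling of a leftover final phrase, and the function names are assumptions of this example rather than details taken from the cited work.

```python
def lz78_compress(s):
    """Textbook LZ78: emit (index of longest known phrase, next symbol)."""
    dictionary = {"": 0}
    tokens = []
    phrase = ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the current phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:  # input ended inside a known phrase
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decompress(tokens):
    """Rebuild the string by replaying the phrase dictionary."""
    phrases = [""]
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


def lz_compress_image(rows):
    """Two-dimensional LZ compression: LZ78 on the row-by-row linearization."""
    return lz78_compress("".join(rows))


image = ["abab", "abab"]
tokens = lz_compress_image(image)
print(tokens)
print(lz78_decompress(tokens))
```

The decompressor works sequentially from the token stream, which is the property the later discussion of sequential decompression in small space relies on, although this sketch itself stores the whole dictionary.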

The two-dimensional LZ compressed match- 
ing problem is defined as follows: 

INPUT: Text array T of size n × n, and pattern 
array P of size m × m, both in two-dimensional 
LZ compressed form. 

OUTPUT: All locations in T of occurrences of 
P. Formally, the output is the set of locations (i, 
j) such that T[i + k, j + l] = P[k + 1, l + 1] for 
k, l = 0 ... m − 1. 


Key Results 


The definition of compressed search first ap- 
peared in the context of searching for two di- 
mensional run-length compression [1, 2]. The 
following result was achieved there. 


Theorem 1 (Amir and Benson [3]) There exists 
an $O(|c(T)| \log |c(T)|)$ worst-case time solution 
to the compressed search problem with the 
two-dimensional run-length compression algorithm. 


The above mentioned paper did not succeed 
in achieving either an optimal or an inplace 
algorithm. Nevertheless, it introduced the 
notion of two-dimensional periodicity. As in 
strings, periodicity plays a crucial role in 
two-dimensional string matching, and its advent has 
provided solutions to many longstanding open 
problems of two-dimensional string matching. 
In [5], it was used to achieve the first linear- 
time, alphabet-independent, two-dimensional 
text scanning. Later, in [4, 16] it was used in 
two different ways for a linear-time witness table 
construction. In [7] it was used to achieve the 
first parallel, time and work optimal, CREW 
algorithm for text scanning. A simpler variant of 
periodicity was used by [11] to obtain a constant- 
time CRCW algorithm for text scanning. A recent 
further attempt has been made [17] to generalize 
periodicity analysis to higher dimensions. 

The first optimal two-dimensional compressed 
search algorithm was the following. 


Theorem 2 (Amir et al. [6]) There exists an 
O(|c(T)|) worst-case time solution to the com- 
pressed search problem with the two-dimensional 
run-length compression algorithm. 


Optimality was achieved by a concept the authors 
called witness-free dueling. The paper proved 
new properties of two-dimensional periodicity. 
This enables duels to be performed in which no 
witness is required. At the heart of the dueling 
idea lies the concept that two overlapping occur- 
rences of a pattern in a text can use the content 
of a predetermined text position or witness in 
the overlap to eliminate one of them. Finding 
witnesses is a costly operation in a compressed 
text; thus, the importance of witness-free dueling. 

The original algorithm of Amir et al. [6] 
takes time $O(|c(T)| + |P| \log \sigma)$, where $\sigma$ is 
$\min(|P|, |\Sigma|)$, and $\Sigma$ is the alphabet. However, 
with the witness table construction of Galil and 
Park [12] the time is reduced to $O(|c(T)| + |P|)$. 
Using known techniques, one can modify their 
algorithm so that its extra space is $O(|P|)$. This 
creates an optimal algorithm that is also inplace, 
provided the pattern is input in uncompressed 
form. With use of the run-length compression, 
the difference between |P| and |c(P)| can be 
quadratic. Therefore it is important to seek an 
inplace algorithm. 


Theorem 3 (Amir et al. [9]) There exists an 
$O(|c(T)| + |P| \log \sigma)$ worst-case time solution 
to the compressed search problem with the 
two-dimensional run-length compression algorithm, 
where $\sigma$ is $\min(|P|, |\Sigma|)$, and $\Sigma$ is the alphabet, 
for all patterns that have no trivial rows (rows 
consisting of a single repeating symbol). The 
amount of space used is $O(|c(P)|)$. 


This algorithm uses the framework of the non- 
compressed two dimensional pattern matching 
algorithm of [6]. The idea is to use the dueling 
mechanism defined by Vishkin [18]. Applying 
the dueling paradigm directly to run-length compressed 
matching has previously been considered 
impossible since the location of a witness in the 
compressed text cannot be accessed in constant 
time. In [9], a way was shown in which a witness 
can be accessed in (amortized) constant time, 
enabling a relatively straightforward application 
of the dueling paradigm to compressed matching. 

A strongly inplace compressed matching algo- 
rithm exists for the two-dimensional LZ compres- 
sion, but its preprocessing is not optimal. 


Theorem 4 (Amir et al. [8]) There exists an 
O(\c(T)| + | P|? logo) worst-case time solution 
to the compressed search problem with the two- 
dimensional LZ compression algorithm, where 
$\sigma$ is $\min(|P|, |\Sigma|)$, and $\Sigma$ is the alphabet. The 
amount of space used is $O(m)$, for an $m \times m$ size 
pattern. $O(m)$ is the best compression achievable 
for any $m \times m$ sized pattern under the 
two-dimensional LZ compression. 


The algorithm of [8] can be applied to any two- 
dimensional compressed text, in which the com- 
pression technique allows sequential decompres- 
sion in small space. 


Applications 


The problem has many applications since two- 
dimensional data appears in many different types 
of compression. The two compressions discussed 
here are the run-length compression, used by fax 
transmissions, and the LZ compression, used by 
GIF. 


Open Problems 


Any lossless two-dimensional compression used, 
especially one with a large compression ratio, 
presents the problem of enabling the search with- 
out uncompressing the data for saving of both 
time and space. 

Searching in two-dimensional lossy compres- 
sions will be a major challenge. Initial steps in 
this direction can be found in [14, 15], where 
JPEG compression is considered. 


Problem Definition 


Given two two-dimensional arrays, the text 
$T[1 \dots n, 1 \dots n]$ and the pattern $P[1 \dots m, 
1 \dots m]$, $m \le n$, both with element values from 
alphabet $\Sigma$ of size $\sigma$, the basic two-dimensional 
string matching (2DSM) problem is to find all 
occurrences of P in T, i.e., all $m \times m$ subarrays of 
T that are identical to P. In addition to the basic 
problem, several types of generalizations are 
considered: approximate matching (allow local 
errors), invariant matching (allow global trans- 
formations), and multidimensional matching. 

In approximate matching, an occurrence is a 
subarray S of the text, whose distance d(S, P) 
from the pattern does not exceed a threshold 
k. Different distance measures lead to different 
variants of the problem. When no distance is 
explicitly mentioned, the Hamming distance, the 
number of mismatching elements, is assumed. 

For one-dimensional strings, the most com- 
mon distance is the Levenshtein distance, the 
minimum number of insertions, deletions, and 
substitutions for transforming one string into the 
other. A simple generalization to two dimen- 
sions is the Krithivasan—Sitalakshmi (KS) dis- 
tance, which is the sum of row-wise Leven- 
shtein distances. Baeza-Yates and Navarro [6] 
introduced several other generalizations, one of 
which, the RC distance, is defined as follows. A 
two-dimensional array can be decomposed into 
a sequence of rows and columns by removing 
either the last row or the last column from the 
array until nothing is left. Different decomposi- 
tions are possible depending on whether a row 
or a column is removed at each step. The RC 
distance is the minimum cost of transforming a 
decomposition of one array into a decomposition 
of the other, where the minimum is taken over 
all possible decompositions as well as all possi- 
ble transformations. A transformation consists of 
insertions, deletions, and modifications of rows 
and columns. The cost of inserting or deleting 
a row/column is the length of the row/column, 
and the cost of modification is the Levenshtein 
distance between the original and the modified 
row/column. 
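
Of the distances above, the KS distance is the simplest to compute. The sketch below assumes both arrays are given as lists of row strings with the same number of rows (an assumption of this example) and sums the row-wise Levenshtein distances using the standard dynamic program; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def ks_distance(A, B):
    """KS distance: sum of Levenshtein distances between corresponding rows."""
    if len(A) != len(B):
        raise ValueError("this sketch assumes the same number of rows")
    return sum(levenshtein(row_a, row_b) for row_a, row_b in zip(A, B))


print(levenshtein("kitten", "sitting"))
print(ks_distance(["abc", "abd"], ["abc", "ad"]))
```

The RC distance is harder to compute because it also minimizes over all row/column decompositions, which this sketch does not attempt.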

The invariant matching problems search 
for occurrences that match the pattern after 
some global transformation of the pattern. In 
the scaling invariant matching problem, an 
occurrence is a subarray that matches the pattern 
scaled by some factor. If only integral scaling 
factors are allowed, the definition of the problem 
is obvious. For real-valued scaling, a refined 
model is needed, where the text and pattern 
elements, called pixels in this case, are unit 
squares on a plane. Scaling the pattern means 
stretching the pixels. An occurrence is a matching 
M between text pixels and pattern pixels. The 
scaled pattern is placed on top of the text with 
one corner aligned, and each text pixel T[r, s], 
whose center is covered by the pattern, is matched 
with the covering pattern pixel P[r', s'], i.e., 
([r, s], [r', s']) ∈ M. 

In the rotation invariant matching problem, 
too, an occurrence is a matching between text 
pixels and pattern pixels. This time the center of 
the pattern is placed at the center of a text pixel, 
and the pattern is rotated around the center. The 
matching is again defined by which pattern pixels 
cover which text pixel centers. 

All the problems can be generalized to more 
than two dimensions. In the d-dimensional problem, 
the text is an $n^d$ array and the pattern an 
$m^d$ array. The focus is on two dimensions, but 
multidimensional generalizations of the results 
are mentioned when they exist. 

Many other variants of the problems are omit- 
ted here due to a lack of space. Some of them as 
well as some of the results in this entry are sur- 
veyed by Amir [1]. A wider range of problems as 
well as traditional image processing techniques 
for solving them can be found in [9]. 


Key Results 


The classical solution to the 2DSM problem by 
Bird [8] and independently by Baker [7] reduces 
the problem to one-dimensional string matching. 
It has two phases: 


1. Find all occurrences of pattern rows on 
the text rows and mark them. This takes 
$O(n^2 \log \min(m, \sigma))$ time using the 
Aho-Corasick algorithm. On an integer alphabet 
$\Sigma = \{0, 1, \dots, \sigma - 1\}$, the time can be 
improved to $O(n^2 + m^2 \min(m, \sigma) + \sigma)$ using 
$O(m^2 \min(m, \sigma) + \sigma)$ space. 

2. The pattern is considered a sequence of m 
rows and each n × m subarray of the text a 
sequence of n rows. The Knuth-Morris-Pratt 
string matching algorithm is used for finding 
the occurrences of the pattern in each subarray. 
The algorithm makes O(n) row comparisons 
for each of the n − m + 1 subarrays. With the 
markings from Step 1, a row comparison can 
be done in constant time, giving $O(n^2)$ time 
complexity for Step 2. 
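

The two phases can be sketched in Python as follows. This is a simplified illustration, not the Bird-Baker algorithm as analyzed above: Step 1 is replaced by a dictionary lookup of each length-m text slice (an assumption made to keep the example short, costing O(n²m) time instead of using Aho-Corasick), while Step 2 runs Knuth-Morris-Pratt over the resulting row identifiers. Positions are 0-based and all function names are hypothetical.

```python
def kmp_search(seq, pat):
    """Return all start indices of pat in seq (Knuth-Morris-Pratt)."""
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0
    for i, x in enumerate(seq):
        while k and x != pat[k]:
            k = fail[k - 1]
        if x == pat[k]:
            k += 1
        if k == len(pat):
            hits.append(i - len(pat) + 1)
            k = fail[k - 1]
    return hits


def two_phase_match(text, pattern):
    """Find all (row, column) positions where pattern occurs in text."""
    n_rows, n_cols, m = len(text), len(text[0]), len(pattern)
    # Give each distinct pattern row an identifier.
    row_id = {}
    pattern_ids = [row_id.setdefault(tuple(row), len(row_id)) for row in pattern]
    occurrences = []
    for c in range(n_cols - m + 1):
        # Step 1 (simplified): mark which pattern row, if any, starts at (r, c).
        marks = [row_id.get(tuple(text[r][c:c + m]), -1) for r in range(n_rows)]
        # Step 2: each row comparison is now a constant-time identifier comparison.
        for r in kmp_search(marks, pattern_ids):
            occurrences.append((r, c))
    return occurrences


text = ["abab", "baba", "abab", "baba"]
pattern = ["ab", "ba"]
print(sorted(two_phase_match(text, pattern)))
```

Identical pattern rows receive the same identifier, so the column-wise KMP scan compares whole rows by comparing single integers.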


The time complexity of the Bird-Baker
algorithm is linear if the alphabet size $\sigma$ is
constant. The algorithm of Amir, Benson, and
Farach [4] (with improvements by Galil and
Park [13]) achieves linear time independent of
the alphabet size using a quite different kind of
algorithm based on string matching by duels and
two-dimensional periodicity.


Theorem 1 (Bird [8]; Baker [7]; Amir, Benson,
and Farach [4]) The 2DSM problem can be
solved in the optimal $O(n^2)$ worst-case time.


The Bird-Baker algorithm generalizes
straightforwardly into higher dimensions by
repeated application of Step 1 to reduce a
problem in d dimensions into $n - m + 1$ problems
in $d - 1$ dimensions. The time complexity
is $O(d n^d \log m^d)$. The Amir-Benson-Farach
algorithm has been generalized to three
dimensions with the time complexity $O(n^3)$ [14].

The average-case complexity of the 2DSM
problem was studied by Kärkkäinen and Ukkonen
[16], who proved a lower bound and gave an
algorithm matching the bound.


Theorem 2 (Kärkkäinen and Ukkonen [16])
The 2DSM problem can be solved in the optimal
$O(n^2 (\log_\sigma m)/m^2)$ average-case time.


The result (both lower and upper bound)
generalizes to the d-dimensional case with
the $O(n^d \log_\sigma m / m^d)$ average-case time
complexity.

Amir and Landau [3] give algorithms for
approximate 2DSM problems for both the Hamming
distance and the KS distance. The RC
model was developed and studied by Baeza-Yates
and Navarro [6].


Theorem 3 (Amir and Landau [3]; Baeza-Yates
and Navarro [6]) The approximate 2DSM
problem can be solved in $O(k n^2)$ worst-case time
for the Hamming distance, in $O(k^2 n^2)$ worst-case
time for the KS distance, and in $O(k^2 m n^2)$
worst-case time for the RC distance.


The results for the KS and RC distances generalize
to d dimensions with the time complexities
$O(k(k + d) n^d)$ and $O(d!\, m^{2d} n^d)$, respectively.

Approximate matching algorithms with
good average-case complexity are described by
Kärkkäinen and Ukkonen [16] for the Hamming
distance and by Baeza-Yates and Navarro [6] for
the KS and RC distances.


Theorem 4 (Kärkkäinen and Ukkonen [16];
Baeza-Yates and Navarro [6]) The approximate
2DSM problem can be solved in
$O(k n^2 (\log_\sigma m)/m^2)$ average-case time for the
Hamming and KS distances and in $O(n^2/m)$
average-case time for the RC distance.


The results for the Hamming and the RC
distance have d-dimensional generalizations with
the time complexities $O(k n^d (\log_\sigma m^d)/m^d)$
and $O(k n^d / m^{d-1})$, respectively.

The scaling and rotation invariant 2DSM
problems involve a continuous valued parameter
(scaling factor or rotation angle). However, the
corresponding matching between text and pattern
pixels changes only at certain points, and there
are only $O(nm)$ effectively distinct scales and
$O(m^3)$ effectively distinct rotation angles. A
separate search for each distinct scale or rotation
would give algorithms with time complexities
$O(n^3 m)$ and $O(n^2 m^3)$, but faster algorithms
exist.


Theorem 5 (Amir and Chencinski [2]; Amir,
Kapah, and Tsur [5]) The scaling invariant
2DSM problem can be solved in $O(n^2 m)$ worst-case
time, and the rotation invariant 2DSM problem
in $O(n^2 m^2)$ worst-case time.


Fast average-case algorithms for the rotation 
invariant problem are described by Fredriksson, 
Navarro, and Ukkonen [12]. They also consider 
approximate matching versions. 


Theorem 6 (Fredriksson, Navarro, and Ukkonen
[12]) The rotation invariant 2DSM problem
can be solved in the optimal $O(n^2 (\log_\sigma m)/m^2)$
average-case time. The rotation invariant
approximate 2DSM problem can be solved in the optimal
$O(n^2 (k + \log_\sigma m)/m^2)$ average-case time.


Fredriksson, Navarro, and Ukkonen [12] also
consider rotation invariant matching in d
dimensions.

Hundt, Liśkiewicz, and Nevries [15] show that
there are $O(n^4 m^2)$ effectively distinct
combinations of scales and rotations and give an
$O(n^6 m^2)$ time algorithm for finding the best
match under a distance that generalizes the
Hamming distance, implying the following result.


Theorem 7 (Hundt, Liśkiewicz, and Nevries [15])
The scaling and rotation invariant 2DSM and
approximate 2DSM problems can be solved in
$O(n^6 m^2)$ time.


Applications 


The main application area is pattern matching in 
images, particularly applications where the point 
of view in the image is well defined, such as aerial 
and astronomical photography, optical character 
recognition, and biomedical imaging. Even three- 
dimensional problems arise in biomedical appli- 
cations [10]. 


Open Problems 


There may be some room for improving the 
results under the combined scaling and rota- 
tion invariance using techniques similar to those 
in Theorems 5 and 6. Many combinations of 
the different variants of the problem have not 
been studied. With rotation invariant approximate 
matching under the RC distance even the problem 
needs further specification. 


Experimental Results 


No conclusive results exist, though some
experiments are reported in [10, 11, 15, 16].


Problem Definition


The problem is concerned with scheduling dy- 
namically arriving jobs in the scenario when the 
processing requirements of jobs are unknown 
to the scheduler. This is a classic problem that 
arises, for example, in CPU scheduling, where 
users submit jobs (various commands to the op- 
erating system) over time. The scheduler is only 
aware of the existence of the job and does not 
know how long it will take to execute, and the 
goal is to schedule jobs to provide good quality of 
service to the users. Formally, this note considers 
the average flow time measure, defined as the 
average duration of time since a job is released 
until its processing requirement is met. 


Notations 


Let $J = \{1, 2, \ldots, n\}$ denote the set of jobs in
the input instance. Each job j is characterized by
its release time $r_j$ and its processing requirement
$p_j$. In the online setting, job j is revealed to the
scheduler only at time $r_j$. A further restriction
is the non-clairvoyant setting, where only the
existence of job j is revealed at $r_j$; in particular,
the scheduler does not know $p_j$ until the job
meets its processing requirement and leaves the
system. Given a schedule, the completion time
$c_j$ of a job j is the earliest time at which job j
receives $p_j$ amount of service. The flow time $f_j$
of j is defined as $c_j - r_j$. A schedule is said to be
preemptive, if a job can be interrupted arbitrarily, 
and its execution can be resumed later from 
the point of interruption without any penalty. It 
is well known that preemption is necessary to 
obtain reasonable guarantees even in the offline 
setting [4]. 

There are several natural non-clairvoyant algo- 
rithms such as first come first served, processor 
sharing (work on all current unfinished jobs at 
equal rate), and shortest elapsed time first (work 
on job that has received least amount of service 
thus far). Coffman and Kleinrock [2] proposed
another natural algorithm known as multilevel
feedback queueing (MLF). MLF works as
follows: there are queues $Q_0, Q_1, Q_2, \ldots$ and
thresholds $0 < t_0 < t_1 < t_2 < \ldots$. Initially upon
arrival, a job is placed in $Q_0$. When a job in
$Q_i$ receives $t_i$ amount of cumulative service, it is
moved to $Q_{i+1}$. The algorithm at any time works
on the lowest numbered nonempty queue.
Coffman and Kleinrock analyzed MLF in a queuing
theoretic setting, where the jobs arrive according 
to a Poisson process and the processing require- 
ments are chosen identically and independently 
from a known probability distribution. 

Recall that the online shortest remaining
processing time (SRPT) algorithm, which at any
time works on the job with the least remaining
processing time, produces an optimum schedule.
However, SRPT requires the knowledge of job
sizes and hence is not non-clairvoyant. Since a
non-clairvoyant algorithm only knows a lower
bound on a job's size (determined by the amount
of service it has received thus far), MLF tries to
mimic SRPT by favoring jobs that have received
the least service thus far.


Key Results 


While non-clairvoyant algorithms have been
studied extensively in the queuing theoretic
setting for many decades, this notion was considered
relatively recently in the context of competitive
analysis by Motwani, Phillips, and Torng [5].
As in traditional competitive analysis, a
non-clairvoyant algorithm is called c-competitive if
for every input instance, its performance is no
worse than c times that of the optimum offline
solution for that instance. Motwani, Phillips, and
Torng showed the following.


Theorem 1 ([5]) For the problem of minimizing
average flow time on a single machine, any
deterministic non-clairvoyant algorithm must
have a competitive ratio of at least $\Omega(n^{1/3})$,
and any randomized algorithm must have a
competitive ratio of at least $\Omega(\log n)$, where
n is the number of jobs in the instance.


It is not too surprising that any deterministic
algorithm must have a poor competitive ratio.
For example, consider MLF where the thresholds
are powers of 2, i.e., 1, 2, 4, .... Say $n = 2^k$
jobs of size $2^k + 1$ each arrive at times
$0, 2^k, 2 \cdot 2^k, \ldots, (2^k - 1) 2^k$, respectively.
Then, it is easily verified that the average flow
time under MLF is $\Omega(n^2)$, whereas the average
flow time under the optimum algorithm is $\Theta(n)$.
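
The gap on this instance can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions not stated in the entry: time advances in unit steps (exact here because all releases, sizes, and thresholds are integers), ties inside a queue are broken in favor of the earlier job, and the function names are hypothetical. With thresholds 1, 2, 4, ..., a job that has received s units of service sits in queue number `s.bit_length()`.

```python
def average_flow_time(releases, sizes, key):
    """Preemptive single-machine simulation in unit time steps.
    key(served, remaining) ranks jobs; the smallest key runs next,
    with ties broken by job index (i.e., by release order)."""
    n = len(sizes)
    served = [0] * n
    completion = [None] * n
    finished, t = 0, 0
    while finished < n:
        ready = [j for j in range(n)
                 if releases[j] <= t and completion[j] is None]
        if ready:
            j = min(ready,
                    key=lambda j: (key(served[j], sizes[j] - served[j]), j))
            served[j] += 1
            if served[j] == sizes[j]:
                completion[j] = t + 1
                finished += 1
        t += 1
    return sum(completion[j] - releases[j] for j in range(n)) / n


def mlf_key(served, remaining):
    # Non-clairvoyant: only the service received so far is used.
    # Queue index for thresholds 1, 2, 4, ... is the bit length of served.
    return served.bit_length()


def srpt_key(served, remaining):
    # Clairvoyant baseline: shortest remaining processing time.
    return remaining


def bad_instance(k):
    """n = 2^k jobs of size 2^k + 1 released at 0, 2^k, 2 * 2^k, ..."""
    n = 2 ** k
    return [i * 2 ** k for i in range(n)], [2 ** k + 1] * n
```

For k = 3 (eight jobs of size 9), every job under MLF is preempted with one unit of work left as soon as the next job arrives, so all jobs finish only after time 64, while SRPT finishes each job shortly after the next one is released.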

Note that MLF performs poorly on the above
instance since all jobs are stuck till the end with
just one unit of work remaining. Interestingly,
Kalyanasundaram and Pruhs [3] designed a
randomized variant of MLF (known as RMLF) and
proved that its competitive ratio is almost
optimum. For each job j, and for each queue $Q_i$, the
RMLF algorithm sets a threshold $t_{i,j}$ randomly
and independently according to a truncated
exponential distribution. Roughly speaking, setting a
random threshold ensures that if a job is stuck in
a queue, then its remaining processing is a
reasonable fraction of its original processing time.


Theorem 2 ([3]) The RMLF algorithm is
$O(\log n \log \log n)$ competitive against an
oblivious adversary. Moreover, the RMLF
algorithm is $O(\log n \log \log n)$ competitive
even against an adaptive adversary provided the
adversary chooses all the job sizes in advance.


Later, Becchetti and Leonardi [1] showed that
in fact the RMLF is optimally competitive up to
constant factors. They also analyzed RMLF on
identical parallel machines.


Theorem 3 ([1]) The RMLF algorithm is
$O(\log n)$ competitive for a single machine. For
multiple identical machines, RMLF achieves a
competitive ratio of $O(\log n \log(n/m))$, where m
is the number of machines.


Applications 


MLF and its variants are widely used in operating
systems [6, 7]. These algorithms are not only
close to optimum with respect to flow time but
also have other attractive properties; for example,
the amortized number of preemptions is logarithmic
(preemptions occur only if a job arrives or departs
or moves to another queue).


Open Problems 


It is not known whether there exists an
o(n)-competitive deterministic algorithm. It would be
interesting to close the gap between the upper
and lower bounds for this case. Often in real
systems, even though the scheduler may not know
the exact job size, it might have some information
about its distribution based on historical data.
An interesting direction of research could be
to design and analyze algorithms that use this
information.


Problem Definition


The topic of this article is the parameterized 
multilinear monomial detection problem: 


k-MLD: Given an arithmetic circuit C
representing a polynomial $P(X)$ over $\mathbb{Z}_+$,
decide whether $P(X)$ construed as a sum of
monomials contains a multilinear monomial
of degree k.


An arithmetic circuit is a directed acyclic graph
with nodes corresponding to addition and
multiplication gates, sources labeled with variables
from a set X or positive integers, and one special
terminal corresponding to the output gate. A
monomial of degree k is a product of exactly k
variables from X, and it is called multilinear if
these k variables are distinct.

The k-MLD problem is arguably a fundamental
problem, encompassing as special cases
many natural and well-studied parameterized
problems. Along with the algorithm for its
solution, k-MLD provides a general framework
for designing parameterized algorithms [11, 15].
The framework has yielded the fastest known
algorithms for many parameterized problems,
including all parameterized decision problems
that were previously known to be solvable via
dynamic programming combined with the
color-coding method [2].


Key Results 


Theorem 1 The k-MLD problem can be solved
by a randomized algorithm with one-sided error
in $O^*(2^k)$ time and polynomial space. (The $O^*(\cdot)$
notation hides factors polynomial in the size of
the input.)


The algorithm claimed in Theorem 1 always
reports the correct answer when the input
polynomial does not contain a degree-k multilinear
monomial. In the opposite case, it reports a
correct answer with probability at least 1/4.


Overview of the Algorithm 

The algorithm utilizes a set of commutative
matrices $\mathcal{M} \subseteq \mathbb{Z}_2^{2^k \times 2^k}$
with the following properties:


(i) For each $M \in \mathcal{M}$, we have $M^2 = 0 \bmod 2$.

(ii) If $M_1, \ldots, M_k$ are randomly selected matrices
from $\mathcal{M}$, then their product is equal to
the "all-ones" matrix $\mathbf{1}^{2^k \times 2^k}$, with
probability at least 1/4, and 0 mod 2 otherwise.


The construction of $\mathcal{M}$ is based on matrix
representations of the group $\mathbb{Z}_2^k$, i.e., the abelian
group of k-dimensional 0-1 vectors with addition
mod 2 [11].
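
Properties (i) and (ii) can be observed on a toy scale. The sketch below is an assumption-laden illustration rather than the construction of [11]: it works directly in the group algebra of $\mathbb{Z}_2^k$ with coefficients mod 2 instead of with explicit $2^k \times 2^k$ matrices, represents an algebra element as the set of group elements (bitmasks) with coefficient 1, and maps a vector v to the element e + v, where e is the identity. The function names are hypothetical.

```python
from functools import reduce


def mul(a, b):
    """Multiply two elements of the group algebra of Z_2^k over Z_2.
    An element is a frozenset of bitmasks (those with coefficient 1)."""
    out = set()
    for g in a:
        for h in b:
            out ^= {g ^ h}  # toggling implements addition mod 2
    return frozenset(out)


def lift(v):
    """Map a vector v of Z_2^k to e + v (e is the zero vector)."""
    return frozenset({0}) ^ frozenset({v})


def product(vectors):
    """Product of the lifted vectors; frozenset({0}) is the unit."""
    return reduce(mul, (lift(v) for v in vectors), frozenset({0}))
```

In this representation the square of any lifted vector is the empty set (zero), mirroring property (i). The product of k lifted vectors is the set of all $2^k$ group elements (the analogue of the all-ones matrix) when the vectors are linearly independent over $\mathbb{Z}_2$ and zero otherwise, mirroring property (ii).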

To simplify our discussion, let us assume that
all monomials in $P(X)$ have total degree k. The
role of $\mathcal{M}$ is then almost self-evident: evaluating
$P(X)$ on a random assignment $\bar{X} : X \to \mathcal{M}$ will
annihilate (mod 2) all non-multilinear monomials
in $P(X)$ due to property (i). On the other hand,
each degree-k multilinear monomial will "survive"
the assignment with constant probability,
due to property (ii).

However, property (ii) clearly does not suffice
for $P(X)$ to evaluate to nonzero (with some
probability) in the presence of multilinear monomials.
The main reason is that the coefficients of all
multilinear monomials in $P(X)$ may be equal
to 0 mod 2. The solution is the introduction of
a new set A of "fingerprint" variables in order
to construct an extended polynomial $P(X, A)$
over $\mathbb{Z}_2$. The key property of $P(X, A)$ is that
its multilinear monomials are in a one-to-one
correspondence with copies of the multilinear
monomials in $P(X)$. Specifically, each copy of
a multilinear monomial $\mu(X)$ of $P(X)$ gets its
own distinct multilinear monomial of the form
$q(A)\mu(X)$ in $P(X, A)$. The extended polynomial
is constructed by applying simple operations on
C. In most cases it is enough to attach a distinct
"multiplier" variable from A to each edge
of C; in the general case, some more work is
needed that may increase the size of C by a
quadratic factor in the worst case. We can then
consider the effect of evaluating $P(X, A)$
on $\bar{X}$. If $P(X)$ does not contain any degree-k
multilinear monomial, then each monomial of
$P(X, A)$ contains a squared variable from X.
Hence $P(\bar{X}, A)$ is equal to 0 mod 2. On the other
hand, if $P(X)$ does contain a degree-k multilinear
monomial, then, by construction, the diagonal
entries of $P(\bar{X}, A)$ are all equal to a nonzero
polynomial $Q(A)$, with probability at least 1/4.
Due to its size, we cannot afford to write down
$Q(A)$, but we do have "black-box" access to it
via evaluating it. We can thus test it for identity
with zero via the Schwartz-Zippel Lemma [14].
This requires only a single evaluation of $Q(A)$ on
a random assignment $\bar{A} : A \to \mathrm{GF}[2^{\log_2 k + 10}]$.
Overall, the algorithm returns a "yes" if and only
if $P(\bar{X}, \bar{A}) \neq 0 \bmod 2$.

By the properties of $\mathcal{M}$, it suffices to compute
the trace of $P(\bar{X}, A)$ or equivalently the sum of
its eigenvalues. As observed in [11, 13], this can
be done with $2^k$ evaluations of $P(X, A)$ over the
ring of polynomials $\mathbb{Z}[A]$. This yields the $O^*(2^k)$
time and polynomial space claims.

The construction of an extended polynomial 
P(X, A) was used in [11, 15] for two special 
cases, but it can be generalized to arbitrary poly- 
nomials as claimed in [13]. The idea to use the 
Schwartz-Zippel Lemma in order to test Q(A) for 
identity with zero appeared in [15]. 


A Negative Result 

The matrices in $\mathcal{M}$ together with multiplication
and addition modulo 2 form a commutative
matrix algebra $\mathcal{A}$ which has $2^{2^k}$ elements.
Computations with this algebra require time at least
$2^k$ since merely describing one element of $\mathcal{A}$
requires $2^k$ bits. One may ask whether there
is another significantly smaller algebra that can
replace $\mathcal{A}$ in this evaluation-based framework and
yield a faster algorithm. The question was
answered in the negative in [13], for the special but
important case when $k = |X|$. Concretely, it was
shown that there is no evaluation-based algorithm
that can detect a multilinear term of degree n in
an n-variate polynomial in $o(2^n/\sqrt{n})$ time.


A Generalization 

The CONSTRAINED k-MLD problem is a
generalization of k-MLD that was introduced in [12].
The set X is a union of t mutually disjoint sets
of variables $X_1, \ldots, X_t$, each associated with a
positive number $\mu_i$. The sets $X_i$ and the numbers
$\mu_i$, for $i = 1, \ldots, t$, are part of the input. A
multilinear monomial is permissible if it contains
at most $\mu_i$ variables from $X_i$, for all i. The
problem is then defined as follows:


CONSTRAINED k-MLD: Given an arithmetic
circuit C representing a polynomial $P(X)$
over $\mathbb{Z}_+$, decide whether $P(X)$ construed as
a sum of monomials contains a permissible
multilinear monomial of degree k.


Theorem 2 The CONSTRAINED k-MLD problem
can be solved by a randomized algorithm
with one-sided error in $O^*(2^k)$ time and
polynomial space.


Theorem 2 was shown in [5]. It is worth noting
that the algorithm in the proof of Theorem 2 does
not rely on matrix algebras. Thus, it provides an
alternative proof for Theorem 1.


Applications 


Many parameterized problems are reducible to
the k-MLD problem. For several of these problems,
the fastest known algorithms (in terms of the
exponential dependence on k) consist of a
relatively simple reduction to a k-MLD instance
and a subsequent invocation of the algorithm for
its solution. Such problems include the k-TREE
problem on directed graphs and certain packing
problems [11, 13, 15].

The k-MLD framework provides an expo- 
nentially faster alternative to the color-coding 
method [2] for parameterized decision problems. 
As noted in [8], color-coding-based algorithms 
consist canonically of a random assignment of k 
colors to entities of the input (e.g., to the vertices 
of graph) followed by a dynamic programming 
procedure. From the steps of the dynamic pro- 
gramming procedure, one can delineate the con- 
struction of a k-MLD instance. This task was, for 
example, undertaken in [9], giving improved al- 
gorithms for all subgraph containment problems 
previously solved via color coding. 

The algebraic language provided by the 
k-MLD framework has simplified the design 
of parameterized algorithms, yielding faster 
algorithms even for problems for which the 
applicability of color coding was not apparent 
due to the more complicated underlying 
dynamic programming procedures. This includes 
partial graph domination problems [13] and 
parameterized string problems in computational 
biology [6]. 

All algebraizations used in [11, 13] construct
polynomials whose monomials are in one-to-one
correspondence with potential solutions of the
underlying combinatorial problem (e.g., length-k
walks). The actual solutions (e.g., k-paths) are
mapped to multilinear monomials, while
nonsolutions are mapped to non-multilinear
monomials. Andreas Björklund introduced a significant
departure from this approach [4]. He viewed
modulo 2 computations as a resource rather than as
a nuisance and worked on sharper algebraizations
using appropriate determinant polynomials.
These polynomials do contain multilinear
monomials corresponding to nonsolutions, unlike
previous algebraizations; however, these come
in pairs and they cancel out modulo 2. On the
other hand, with the appropriate use of
"fingerprint" variables, the multilinear monomials
corresponding to valid solutions appear with a
coefficient of 1. Björklund's novel ideas led initially
to a faster algorithm for d-dimensional matching
problems [4] and subsequently to the breakthrough
$O(1.67^n)$ time algorithm for the HAMILTONIAN
PATH problem on n-vertex graphs [3],
breaking below the $O^*(2^n)$ barrier and marking
the first progress in the problem after nearly
50 years of stagnation.

Modulo 2 cancelations were also explicitly
exploited in the design of the first
single-exponential time algorithms for various
graph connectivity problems parameterized by
treewidth [7]. For example, the HAMILTONIAN
PATH problem can now be solved in $O^*(4^t)$ time
for n-vertex graphs, assuming a tree decomposition
of width at most t is given as input. The
previously fastest algorithm runs in $O^*(t^t)$ time.


Open Problems 


The following problems are open: 


• Color coding stood until recently as the fastest
deterministic algorithm for the k-MLD problem,
running in $O^*((2e)^k)$ time; the currently
fastest deterministic algorithm runs in $O^*(2.85^k)$
time [10]. Is there a deterministic algorithm
for k-MLD that runs in $O^*(2^k)$ time?

• The known deterministic algorithms can also
solve the weighted version of k-PATH, which
asks for a k-path of minimum weight, in
$O^*(2.85^k \log W)$ time, where W is the largest
edge weight in the graph. The algorithm for
k-MLD can be adapted to solve weighted
k-PATH in $O^*(2^k W)$ time. Is there an
$O^*(2^k \log W)$ time algorithm for weighted
k-PATH and, more generally, for weighted
versions of k-MLD?

• Color coding with balanced hashing families
has been used to approximately count k-paths
in a graph, in $O^*((2e)^k)$ time [1]. Is there an
$O^*(2^k)$ approximate counting algorithm?

• Finally, is there an algorithm for k-MLD that
runs in $O^*(2^{\epsilon k})$ time, for any $\epsilon < 1$? Such
an algorithm would constitute major progress
in exact and parameterized algorithms for
NP-hard problems.


Problem Definition


Given a finite set of k pattern strings
$P = \{P^1, P^2, \ldots, P^k\}$ and a text string
$T = t_1 t_2 \ldots t_n$, T and the $P^i$'s being sequences
over an alphabet $\Sigma$ of size $\sigma$, the multiple
string matching (MSM) problem is to find
one or, more generally, all the text positions
where a $P^i$ occurs in T. More precisely the
problem is to compute the set
$\{j \mid \exists i, P^i = t_j t_{j+1} \ldots t_{j+|P^i|-1}\}$
or equivalently the set
$\{j \mid \exists i, P^i = t_{j-|P^i|+1} t_{j-|P^i|+2} \ldots t_j\}$.
Note that reporting all the occurrences of the patterns
may lead to a quadratic output (e.g., when the $P^i$'s
and T are drawn from a one-letter alphabet). The
length of the shortest pattern in P is denoted by
$\ell_{\min}$. This problem is an extension of the exact
string matching problem.

Both worst- and average-case complexities are
considered. For the latter one assumes that pattern
and text are randomly generated by choosing
each character uniformly and independently from
$\Sigma$. For simplicity and practicality the assumption
$|P^i| = o(n)$ is made, for $1 \le i \le k$, in this entry.


Key Results 


A first solution to the multiple string matching
problem consists in applying an exact string
matching algorithm for locating each pattern in
P. This solution has an $O(kn)$ worst-case time
complexity. There are more efficient solutions
along two main approaches. The first one, due
to Aho and Corasick [1], is an extension of the
automaton-based solution for matching a single
string. The second approach, initiated by
Commentz-Walter [5], extends the Boyer-Moore
algorithm to several patterns.

The Aho-Corasick algorithm first builds a trie
T(P), a digital tree recognizing the patterns of P.
The trie T(P) is a tree whose edges are labeled by
letters and whose branches spell the patterns of
P. A node p in the trie T(P) is associated with
the unique word w spelled by the path of T(P)
from its root to p. The root itself is identified with
the empty word $\varepsilon$. Notice that if w is a node in
T(P), then w is a prefix of some $P^i \in P$. If in
addition $a \in \Sigma$, then child(w, a) is equal to wa if
wa is a node in T(P); it is equal to NIL otherwise.

During a second phase, when patterns are
added to the trie, the algorithm initializes an
output function out. It associates the singleton $\{P^i\}$
with the nodes $P^i$ ($1 \le i \le k$) and associates the
empty set with all other nodes of T(P).

Finally, the last phase of the preprocessing 
consists in building a failure link for each node of 
the trie and simultaneously completing the output 
function. The failure function fail is defined on 
nodes as follows (w is a node): fail(w) = u where 
u is the longest proper suffix of w that belongs to 
T(P). Computation of failure links is done dur- 
ing a breadth-first traversal of T(P). Completion 
of the output function is done while computing 
the failure function fail using the following rule: 
if fail(w) = u, then out(w) = out(w) U out(u). 

To stop going back with failure links during 
the computation of the failure links, and also to 
overpass text characters for which no transition is 
defined from the root during the searching phase, 
a loop is added on the root of the trie for these 
symbols. This finally produces what is called a 
pattern matching machine or an Aho-Corasick 
automaton (see Fig. 1). 

After the preprocessing phase is completed, 
the searching phase consists in parsing the text 
T with T(P). This starts at the root of T(P) 
and uses failure links whenever a character in T 
does not match any label of outgoing edges of the 
current node. Each time a node with a nonempty 
output is encountered, this means that the patterns 
of the output have been discovered in the text, 
ending at the current position. Then, the position 
is output. 
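
The preprocessing and searching phases described above can be sketched as follows. This is a minimal illustration under assumptions of its own: nodes are numbered in insertion order, transitions are stored in dictionaries, the loop on the root is realized by defaulting missing root transitions to the root, and the function names `build` and `search` are hypothetical.

```python
from collections import deque


def build(patterns):
    """Return (goto, fail, out) for the pattern set.
    goto[v] maps a letter to a child node, fail[v] is the failure link,
    and out[v] is the set of patterns ending at node v."""
    goto, out = [{}], [set()]
    for p in patterns:                      # trie and initial output
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                out.append(set())
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].add(p)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())         # depth-1 nodes fail to root
    while queue:                            # breadth-first traversal
        w = queue.popleft()
        for ch, child in goto[w].items():
            u = fail[w]
            while u and ch not in goto[u]:
                u = fail[u]
            fail[child] = goto[u].get(ch, 0)
            out[child] |= out[fail[child]]  # complete the output
            queue.append(child)
    return goto, fail, out


def search(text, patterns):
    """Return sorted (end_position, pattern) pairs, 0-based positions."""
    goto, fail, out = build(patterns)
    node, hits = 0, []
    for pos, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)        # root loop on unknown letters
        hits.extend((pos, p) for p in out[node])
    return sorted(hits)
```

With the patterns inserted in the order search, ear, arch, chart, the node numbering coincides with the state numbering of Fig. 1, so the failure links can be compared against the figure directly.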


Theorem 1 (Aho and Corasick [1]) After
preprocessing P, searching for the occurrences of
the strings of P in a text T can be done in time
$O(n \times \log \sigma)$. The running time of the associated
preprocessing phase is $O(|P| \times \log \sigma)$. The extra
memory space required for both operations is
$O(|P|)$.


state  prefix  fail  out
0      ε       0
1      s       0
2      se      7
3      sea     8
4      sear    9     {ear}
5      searc   12
6      search  13    {search, arch}
7      e       0
8      ea      10
9      ear     11    {ear}
10     a       0
11     ar      0
12     arc     14
13     arch    15    {arch}
14     c       0
15     ch      0
16     cha     10
17     char    11
18     chart   0     {chart}

(The root, state 0, carries a loop labeled with
every symbol c ∉ {s, e, a, c}.)


Multiple String Matching, Fig. 1 The pattern matching
machine or Aho-Corasick automaton for the set of strings
{search, ear, arch, chart}


The Aho-Corasick algorithm is actually a gen- 
eralization to a finite set of strings of the Morris- 
Pratt exact string matching algorithm. 

Commentz-Walter [5] generalized the Boyer- 
Moore exact string matching algorithm to mul- 
tiple string matching. Her algorithm builds a 
trie for the reverse patterns in P together with 
two shift tables and applies a right to left scan 
strategy. However, it is intricate to implement and 
has a quadratic worst-case time complexity. 

The DAWG-match algorithm [6] is a
generalization of the BDM exact string matching
algorithm. It consists in building an exact indexing
structure for the reverse strings of P such as
a factor automaton or a generalized suffix tree,
instead of just a trie as in the Aho-Corasick and
Commentz-Walter algorithms (see Fig. 2). The
overall algorithm can be made optimal by using
both an indexing structure for the reverse patterns
and an Aho-Corasick automaton for the patterns.
Then, searching involves scanning some portions
of the text from left to right and some other
portions from right to left. This makes it possible
to skip large portions of the text T.


Multiple String Matching, Fig. 2 An example of
DAWG, index structure used for matching the set of
strings {search, ear, arch, chart}. The automaton
accepts the reverse prefixes of the strings


Theorem 2 (Crochemore et al. [6]) The
DAWG-match algorithm performs at most 2n
symbol comparisons. Assuming that the sum
of the length of the patterns in P is less than
$\ell_{\min}^k$, the DAWG-match algorithm makes on
average $O((n \log_\sigma \ell_{\min})/\ell_{\min})$ inspections of
text characters.


The bottleneck of the DAWG-match algorithm
is the construction time and space consumption of
the exact indexing structure. This can be avoided
by replacing the exact indexing structure by a
factor oracle for a set of strings. A factor oracle
is a simple automaton that may recognize
a few additional strings compared to an exact
indexing structure. When the factor oracle is used
alone, it gives the Set Backward Oracle Matching
(SBOM) algorithm [2]. It is an exact algorithm
that behaves almost as well as the DAWG-match
algorithm.

The bit-parallelism technique can be used to
simulate the DAWG-match algorithm. It gives
the MultiBNDM algorithm of Navarro and
Raffinot [9]. This strategy is efficient when
$k \times \ell_{\min}$ bits fit in a few computer words. The
prefixes of strings of P of length $\ell_{\min}$ are packed
together in a bit vector. Then, the search is similar
to the BNDM exact string matching and is performed
for all the prefixes at the same time.

The use of the generalization of the bad- 
character shift alone as done in the Horspool 
exact string matching algorithm gives poor per- 
formances for the MSM problem due to the 
high probability of finding each character of the 
alphabet in one of the strings of P. 

The algorithm of Wu and Manber [13] considers
blocks of length $\ell$. Blocks of such a length
are hashed using a function h into values less
than maxvalue. Then shift[h(B)] is defined as the
minimum between $|P^i| - j$ and $\ell_{\min} - \ell + 1$
with $B = p^i_{j-\ell+1} \ldots p^i_j$ for $1 \le i \le k$ and
$1 \le j \le |P^i|$. The value of $\ell$ varies with the
minimum length of the strings in P and the size of
the alphabet. The value of maxvalue varies with
the memory space available.

The searching phase of the algorithm consists
in reading blocks B of length $\ell$. If shift[h(B)] > 0,
then a shift of length shift[h(B)] is applied.
Otherwise, when shift[h(B)] = 0, the patterns
ending with block B are examined one by one
in the text. The first block to be scanned is
$t_{\ell_{\min}-\ell+1} \ldots t_{\ell_{\min}}$. This method is incorporated in
the agrep command [12].

Recent works have been devoted to multiple
string matching on packed strings where each
symbol is encoded using $\log \sigma$ bits. In this
context, Belazzougui [3] gave an efficient
algorithm that runs in
$O(n((\log k + \log \ell_{\min} + \log\log |P|)/\ell_{\min} + (\log \sigma)/w) + occ)$
time, where w is the size of the machine word and occ
is the number of occurrences of patterns of P in
T. On average it is possible to solve the problem
in $O(n/\ell_{\min})$ time using $O(|P| \log |P|)$ bits of
space [4].


Applications 


MSM algorithms serve as a basis for multidimensional
pattern matching and approximate pattern
matching with wildcards. The problem has many
applications in computational biology, database
search, bibliographic search, virus detection in
data flows, and several others.
Experimental Results 


The book of G. Navarro and M. Raffinot [10] is
a good introduction to the domain. It presents
plots that report the experimental evaluation
of multiple string matching algorithms
for different alphabet sizes, pattern lengths, and
sizes of pattern set.


URLs to Code and Data Sets 


Well-known packages offering efficient MSM
are agrep (https://github.com/Wikinaut/agrep)
and grep with the -F option
(http://www.gnu.org/software/grep/grep.html).
Multiple Unit Auctions with Budget Constraint 


Problem Definition 


In this problem, an auctioneer would like to sell
an idiosyncratic commodity with m copies to n
bidders, denoted by i = 1,2,...,n. Each bidder
i has two kinds of privately known information:
$t_i^u \in \mathbb{R}^+$, $t_i^b \in \mathbb{R}^+$. $t_i^u$ represents the price buyer
i is willing to pay per copy of the commodity
and $t_i^b$ represents i's budget.

Then a one-round sealed-bid auction proceeds
as follows. Simultaneously, all the bidders submit
their bids to the auctioneer. When receiving the
reported unit value vector $u = (u_1, \ldots, u_n)$ and
the reported budget vector $b = (b_1, \ldots, b_n)$ of
bids, the auctioneer computes and outputs the
allocation vector $x = (x_1, \ldots, x_n)$ and the price
vector $p = (p_1, \ldots, p_n)$. Each element of the
allocation vector indicates the number of copies
allocated to the corresponding bidder. If bidder
i receives $x_i$ copies of the commodity, he pays
the auctioneer $p_i x_i$. Then bidder i's total payoff
is $(t_i^u - p_i) x_i$ if $x_i p_i \le t_i^b$ and $-\infty$ otherwise.
Correspondingly, the revenue of the auctioneer is
$A(u, b, m) = \sum_i p_i x_i$.
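The payoff and revenue definitions above can be written down directly. The sketch below is only an illustration: the function names are hypothetical, and the budget test is read as "total payment at most the budget", which is an assumption about the garbled inequality in the source.

```python
import math

def bidder_payoff(true_value, true_budget, price, copies):
    """Payoff of one bidder: (value - price) * copies if the payment fits the budget."""
    if price * copies <= true_budget:        # assumed reading of the budget constraint
        return (true_value - price) * copies
    return -math.inf                         # exceeding the budget is infinitely bad

def auction_revenue(prices, allocation):
    """Revenue of the auctioneer: sum over bidders of price times copies received."""
    return sum(p * x for p, x in zip(prices, allocation))
```

For instance, a bidder with unit value 10 and budget 12 who receives 3 copies at price 4 pays exactly 12 and obtains payoff 18, whereas with budget 11 the same outcome has payoff minus infinity.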

If each bidder submits his privately true unit
value $t_i^u$ and budget $t_i^b$ to the auctioneer, the
auctioneer can determine the single price p (i.e.,
$p_i = p$ for every i) and the allocation vector which
maximize the auctioneer's revenue. This optimal
single price revenue is denoted by F(u, b, m).

In this problem, we assume that bidders have
free will and complete knowledge of the auction
mechanism. Each bidder simply reports the bid
(possibly different from his privately true values)
that maximizes his payoff according to the auction
mechanism.

So the objective of the problem is to design
a truthful auction satisfying voluntary participation
to raise the auctioneer's revenue as much as
possible. An auction is truthful if for every bidder
i, bidding his true valuation would maximize his
payoff, regardless of the bids submitted by the
other bidders [7,8]. An auction satisfies voluntary
participation if each bidder's payoff is guaranteed
to be nonnegative if he reports his bid truthfully.
The success of the auction A is determined
by the competitive ratio $\beta$, which is defined as the
upper bound of $\frac{F(u,b,m)}{A(u,b,m)}$ [6]. Clearly, the smaller
the competitive ratio $\beta$ is, the better the auction A is.


Definition 1 (Multiple-Unit Auctions with 
Budget Constraint) 

INPUT: the number of copies m, the submitted 
unit value vector u, and the submitted budget 
vector b. 

OUTPUT: the allocation vector x and the price 
vector p. 


CONSTRAINTS: 


(a) Truthful; 
(b) Voluntary participation; 
(c) $\sum_{i=1}^{n} x_i \le m$.


Key Results 


Let $b_{\max}$ denote the largest budget among the
bidders receiving copies in the optimal solution
and define $\alpha = \frac{F}{b_{\max}}$.


Theorem 1 ([3]) A truthful auction satisfying
voluntary participation with competitive ratio
$1 / \max_{0<\delta<1} (1-\delta)\left(1 - 2e^{-\alpha\delta^2/36}\right)$ can be
designed.


Theorem 2 ([1]) A truthful auction satisfying
voluntary participation with competitive ratio
$\frac{4\alpha}{\alpha-1}$ can be designed.

Theorem 3 ([1]) If $\alpha$ is known in advance, then
a truthful auction satisfying voluntary partici- 


4 5 Das ~ (xa+la 
pation with competitive ratio Gaanz can be 
: _ a@—1+((@-1)?—4a)!/2 
designed, where x = ——~—,,——_.. 


Theorem 4 ([1]) For any truthful randomized
auction A satisfying voluntary participation, the
competitive ratio is at least $2 - \epsilon$ when $\alpha = 2$.


Applications 


This problem is motivated by the development of
the IT industry and the popularization of auctions,
especially auctions on the Internet. Multiple copy
auctions of relatively low-value goods, such as
the auction of online ads for search terms to
bidders with budget constraints, are assuming a
very important role. The revenue of companies such
as Google and Yahoo! depends almost entirely on certain
types of auctions. There are many papers, including
[2,4,5], which focus on different facets of the
same model.
Problem Definition 


This problem is motivated by an important and 
timely application in computational biology that 
arises in whole-genome shotgun sequencing. 
Shotgun sequencing is a high throughput 
technique that has resulted in the sequencing 
of a large number of bacterial genomes as 
well as Drosophila (fruit fly) and Mouse and 
the celebrated Human genome (at Celera) (see, 
e.g., [8]). In all such projects, one is left with 
a collection of DNA fragments. These fragments 
are subsequently assembled, in-silico, by a com- 
putational algorithm. The typical assembly algo- 
rithm repeatedly merges overlapping fragments 
into longer fragments called contigs. For various
biological and computational reasons, some regions
of the DNA cannot be covered by the contigs.
Thus, the contigs must be ordered and oriented 
and the gaps between them must be sequenced 
using slower, more tedious methods. For further 
details see, e.g., [3]. When the number of gaps 
is small (e.g., less than ten) biologists often use 
combinatorial PCR. This technique initiates a set 
of “bi-directional molecular walks” along the 
gaps in the sequence; these walks are facilitated 
by PCR. In order to initiate the molecular walks 
biologists use primers. Primers are designed so 
that they bind to unique (with respect to the entire 
DNA sequence) templates occurring at the end 
of each contig. A primer (at the right temperature 
and concentration) anneals to the designated 
unique DNA substring and promotes copying of 
the template starting from the primer binding site, 
initiating a one-directional walk along the gap in 
the DNA sequence. A PCR reaction occurs, and 
can be observed as a DNA ladder, when two 
primers that bind to positions on two ends of the 
same gap are placed in the same test tube. 

If there are N contigs, the combinatorial (exhaustive)
PCR technique tests all possible pairs
(quadratically many) of 2N primers by placing
two primers per tube with the original uncut DNA
strand. PCR products can be detected using gels
or they can be read using sequencing technology
or DNA mass spectrometry. When the number
of gaps is large, the quadratic number of PCR
experiments is prohibitive, so primers are pooled
using K > 2 primers per tube; this technique is
called multiplex PCR [4]. This problem deals
with finding optimal strategies for pooling the
primers to minimize the number of biological
experiments needed in the gap-closing process.

This problem can be modeled as the problem
of identifying or learning a hidden matching
given a vertex set V and an allowed query operation:
for a subset $F \subseteq V$, the query $Q_F$ is "does
F contain at least one edge of the matching?" In
this formulation each vertex represents a primer,
an edge of the matching represents a reaction, and
the query represents checking for a reaction when
a set of primers are combined in a test tube. The
objective is to identify the matching asking as few
queries as possible, that is, performing as few tests
as possible. For further discussion of this model
see [3, 7].
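The query model can be made concrete with a small sketch. Here a hidden matching is a collection of vertex pairs and a query on a subset F answers whether F contains both endpoints of some matching edge. The names `make_query_oracle` and `learn_by_all_pairs` are hypothetical, and the second function is just the exhaustive two-primers-per-tube strategy, included for comparison.

```python
from itertools import combinations

def make_query_oracle(matching):
    """Return a function answering: does F contain at least one matching edge?"""
    edges = [frozenset(e) for e in matching]

    def query(subset):
        subset = set(subset)
        return any(e <= subset for e in edges)   # an edge is inside F if both ends are

    return query

def learn_by_all_pairs(vertices, query):
    """Exhaustive strategy: one query per pair of vertices (quadratically many)."""
    return {frozenset(pair) for pair in combinations(vertices, 2) if query(pair)}
```

With the oracle for the matching {0,1}, {2,3}, the query on {0,2} is negative while the query on {1,2,3} is positive, and the exhaustive strategy recovers both edges.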

This problem is of interest even in the
deterministic, fully non-adaptive case. A family
$\mathcal{F}$ of subsets of a vertex set V solves the matching
problem on V if for any two distinct matchings
$M_1$ and $M_2$ on V there is at least one $F \in \mathcal{F}$
that contains an edge of one of the matchings and
does not contain any edge of the other. Obviously,
any such family enables learning an unknown
matching deterministically and non-adaptively,
by asking the questions $Q_F$ for each $F \in \mathcal{F}$.
The objective here is to determine the minimum
possible cardinality of a family that solves the
matching problem on a set of n vertices.
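For very small vertex sets, whether a family solves the matching problem can be checked by brute force: the family works exactly when distinct matchings produce distinct vectors of query answers. The sketch below assumes that matchings are not required to be perfect (the empty matching is included), and the function names are hypothetical.

```python
from itertools import combinations

def all_matchings(vertices):
    """Yield every matching on the given vertices as a frozenset of 2-element frozensets."""
    vertices = list(vertices)
    if len(vertices) < 2:
        yield frozenset()
        return
    first, rest = vertices[0], vertices[1:]
    yield from all_matchings(rest)                   # 'first' stays unmatched
    for i, partner in enumerate(rest):               # 'first' is matched to 'partner'
        remaining = rest[:i] + rest[i + 1:]
        for m in all_matchings(remaining):
            yield m | {frozenset((first, partner))}

def solves_matching_problem(family, vertices):
    """True if the answer vectors of all matchings are pairwise distinct."""
    family = [frozenset(F) for F in family]
    seen = set()
    for m in all_matchings(vertices):
        answers = tuple(any(e <= F for e in m) for F in family)
        if answers in seen:
            return False
        seen.add(answers)
    return True
```

The number of matchings grows very quickly with the number of vertices, so this check is only meant for experimenting with tiny instances.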

Other interesting variants of this problem are 
when the algorithm may be randomized, or when 
it is adaptive, that is when the queries are asked 
in k rounds, and the queries of each round may 
depend on the answers from the previous rounds. 


Key Results 


In [2], the authors study the number of queries
needed to learn a hidden matching in several
models. Following is a summary of the main
results presented in this paper.

The trivial upper bound on the size of a family
that solves the matching problem on n vertices is
$\binom{n}{2}$, achieved by the family of all pairs of vertices.
Theorem 1 shows that in the deterministic non-adaptive
setting one cannot do much better than
this, namely, that the trivial upper bound is tight
up to a constant factor. Theorem 2 improves this
upper bound by showing a family of approximately
half that size that solves the matching
problem.


Theorem 1 For every n > 2, every family $\mathcal{F}$
that solves the matching problem on n vertices
satisfies

$$|\mathcal{F}| \ge \frac{49}{153} \binom{n}{2}.$$

Theorem 2 For every n there exists a family of
size

$$\left(\frac{1}{2} + o(1)\right) \binom{n}{2}$$

that solves the matching problem on n vertices.


Theorem 3 shows that one can do much better
using randomized algorithms. That is, one can
learn a hidden matching asking only $O(n \log n)$
queries, rather than order of $n^2$. These randomized
algorithms make no errors; however, they
might ask more queries with some small probability.


Theorem 3 The matching problem on n vertices
can be solved by probabilistic algorithms with the
following parameters:


• 2 rounds and $(1/(2 \ln 2))\, n \log n\, (1 + o(1)) \approx 0.72\, n \log n$ queries

• 1 round and $(1/\ln 2)\, n \log n\, (1 + o(1)) \approx 1.44\, n \log n$ queries.


Finally, Theorem 4 considers adaptive algorithms.
In this case there is a tradeoff between
the number of queries and the number of rounds.
The more rounds one allows, the fewer tests
are needed. However, as each round can start only
after the previous one is completed, this increases
the running time of the entire procedure.


Theorem 4 For all $3 \le k \le \log n$, there is a
deterministic k-round algorithm for the matching
problem on n vertices that asks

$$O\left(n^{1+\frac{1}{2(k-1)}} (\log n)^{1+\frac{1}{k-1}}\right)$$

queries per round.


Applications 


As described in section "Problem Definition",
this problem was motivated by the application of 
gap closing in whole-genome sequencing, where 
the vertices correspond to primers, the edges to 
PCR reactions between pairs of primers that bind 
to the two ends of a gap, and the queries to tests 
in which a set of primers are combined in a test 
tube. 

This gap-closing problem can be stated more 
generally as follows. Given a set of chemicals, 
a guarantee that each chemical reacts with at most 
one of the others, and an experimental mech- 
anism to determine whether a reaction occurs 
when several chemicals are combined in a test 
tube, the objective is to determine which pairs of 
chemicals react with each other with a minimum 
number of experiments. 

Another generalization which may have more 
applications in molecular biology is when the 
hidden subgraph is not a matching but some 
other fixed graph, or a family of graphs. The 
paper [2], as well as some other related works 
(e.g., [1, 5, 6]), consider this generalization for 
other graphs. Some of these generalizations have 
other specific applications in molecular biology. 


Open Problems 


• Determine the smallest possible constant c
such that there is a deterministic non-adaptive
algorithm for the matching problem on n vertices
that performs $c \binom{n}{2} (1 + o(1))$ queries.

• Find more efficient deterministic k-round algorithms
or prove lower bounds for the number
of queries in such algorithms.

• Find efficient algorithms and prove lower
bounds for the generalization of the problem
to graphs other than matchings.
Problem Definition 


Tolerance graphs model interval relations in such
a way that intervals can tolerate a certain degree
of overlap without being in conflict. A graph
G = (V,E) on n vertices is a tolerance graph
if there exists a collection $I = \{I_v \mid v \in V\}$
of closed intervals on the real line and a set
$t = \{t_v \mid v \in V\}$ of positive numbers, such that
for any two vertices $u, v \in V$, $uv \in E$ if and only
if $|I_u \cap I_v| \ge \min\{t_u, t_v\}$, where $|I|$ denotes the
length of the interval $I$.
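As a small illustration of this definition, the edge set of a tolerance graph can be computed from a tolerance representation by comparing overlap lengths with tolerances. The sketch assumes the standard adjacency condition (overlap at least the smaller tolerance), and the function names and the dictionary-based input format are hypothetical.

```python
def overlap_length(a, b):
    """Length of the intersection of two closed intervals given as (left, right)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def tolerance_graph_edges(intervals, tolerances):
    """Edges uv with |I_u ∩ I_v| >= min(t_u, t_v); vertices are the dict keys."""
    vertices = sorted(intervals)
    edges = set()
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            overlap = overlap_length(intervals[u], intervals[v])
            if overlap >= min(tolerances[u], tolerances[v]):
                edges.add((u, v))
    return edges
```

With intervals a = [0, 10], b = [8, 20], c = [6, 20] and tolerances 3, 5, 5, the overlap of a and b has length 2, which is below both tolerances, so a and b are not adjacent, while a and c overlap on length 4, which is at least the tolerance of a.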

Tolerance graphs have been introduced in [3], 
in order to generalize some of the well-known ap- 
plications of interval graphs. If in the definition of 
tolerance graphs we replace the operation “min” 
between tolerances by “max,” we obtain the class 
of max-tolerance graphs [7]. Both tolerance and 
max-tolerance graphs have attracted many re- 
search efforts (e.g., [4,5, 7-10]) as they find 
numerous applications, especially in bioinfor- 
matics, constraint-based temporal reasoning, and 
resource allocation problems, among others [4, 
5, 7, 8]. In particular, one of their applications 
is in the comparison of DNA sequences from 
different organisms or individuals by making use 
of a software tool like BLAST [1]. 

In some circumstances, we may want to 
treat different parts of the genomic sequences 
in BLAST nonuniformly, since for instance some 
of them may be biologically less significant or we 
have less confidence in the exact sequence due to 
sequencing errors in more error-prone genomic 
regions. That is, we may want to be more 
tolerant at some parts of the sequences than at 
others. This concept leads naturally to the notion 
of multitolerance (known also as bitolerance) 
graphs [5, 11]. The main idea is to allow two
different tolerances for each interval, one to the
left and one to the right side, respectively. Then,
every interval tolerates in its interior part the
intersection with other intervals by an amount
that is a convex combination of these two border
tolerances.

Formally, let $I = [l, r]$ be a closed interval on
the real line and $l_t, r_t \in I$ be two numbers
between $l$ and $r$, called tolerant points; note that it
is not necessary that $l_t \le r_t$. For every $\lambda \in [0, 1]$,
we define the interval $I_{l_t,r_t}(\lambda) = [l + (r_t - l)\lambda,\; l_t + (r - l_t)\lambda]$,
which is the convex combination
of $[l, l_t]$ and $[r_t, r]$. Furthermore, we define the set
$\mathcal{I}(I, l_t, r_t) = \{I_{l_t,r_t}(\lambda) \mid \lambda \in [0, 1]\}$ of intervals.
That is, $\mathcal{I}(I, l_t, r_t)$ is the set of all intervals that
we obtain when we linearly transform $[l, l_t]$ into
$[r_t, r]$. For an interval $I$, the set of tolerance intervals
$\tau$ of $I$ is defined either as $\tau = \mathcal{I}(I, l_t, r_t)$ for
some values $l_t, r_t \in I$ of tolerant points or as $\tau = \{\mathbb{R}\}$.
A graph G = (V,E) is a multitolerance
graph if there exists a collection $I = \{I_v \mid v \in V\}$
of closed intervals and a family $t = \{\tau_v \mid v \in V\}$
of sets of tolerance intervals, such that for any two
vertices $u, v \in V$, $uv \in E$ if and only if there
exists an element $Q_u \in \tau_u$ with $Q_u \subseteq I_v$ or there
exists an element $Q_v \in \tau_v$ with $Q_v \subseteq I_u$. Then,
the pair $(I, t)$ is called a multitolerance representation
of G. Tolerance graphs are a special case
of multitolerance graphs.
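The adjacency test in this definition quantifies over infinitely many tolerance intervals, but both endpoints of the interval for parameter λ are nondecreasing linear functions of λ, so the feasible values of λ form an interval that can be computed directly. The following is a sketch under the assumption that endpoints are integers or `Fraction` values; the function names and the dictionary layout (key `"I"` for the interval, key `"pts"` for the tolerant points, `None` for the set consisting of the whole real line) are hypothetical.

```python
from fractions import Fraction

def some_tolerance_interval_fits(l, r, lt, rt, a, b):
    """True if [l + (rt-l)*lam, lt + (r-lt)*lam] lies inside [a, b] for some lam in [0, 1]."""
    lo, hi = Fraction(0), Fraction(1)
    if rt > l:                       # left endpoint moves right as lam grows
        lo = max(lo, Fraction(a - l, rt - l))
    elif l < a:                      # left endpoint is fixed at l and sticks out
        return False
    if r > lt:                       # right endpoint moves right as lam grows
        hi = min(hi, Fraction(b - lt, r - lt))
    elif lt > b:                     # right endpoint is fixed at lt and sticks out
        return False
    return lo <= hi

def multitolerance_adjacent(u, v):
    """Adjacency of two vertices given as {'I': (l, r), 'pts': (lt, rt) or None}."""
    def fits(x, y):
        if x["pts"] is None:         # tau = {R}: the real line fits in no interval
            return False
        (l, r), (lt, rt) = x["I"], x["pts"]
        return some_tolerance_interval_fits(l, r, lt, rt, *y["I"])
    return fits(u, v) or fits(v, u)
```

Choosing the tolerant points at distance t from the two endpoints of an interval makes every tolerance interval have length t, which recovers the adjacency rule of ordinary tolerance graphs.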

Note that, in general, the adjacency of two
vertices u and v in a multitolerance graph G
depends on both sets of tolerance intervals $\tau_u$
and $\tau_v$. However, since the real line $\mathbb{R}$ is not
included in any finite interval, if $\tau_u = \{\mathbb{R}\}$ for
some vertex u of G, then the adjacency of u with
another vertex v of G depends only on the set
$\tau_v$ of v. If G has a multitolerance representation
$(I, t)$, in which $\tau_v \ne \{\mathbb{R}\}$ for every $v \in V$,
then G is called a bounded multitolerance
graph. Bounded multitolerance graphs coincide
with trapezoid graphs, i.e., the intersection graphs
of trapezoids between two parallel lines $L_1$ and
$L_2$ on the plane, and have received considerable
attention in the literature [5, 11]. However, the
trapezoid intersection model cannot cope with
general multitolerance graphs, in which it can be
$\tau_v = \{\mathbb{R}\}$ for some vertices v. Therefore, the only
way until now to deal with general multitolerance
graphs was to use the inconvenient multitolerance
representation, which uses an infinite number of
tolerance intervals.
Key Results 


In this entry we introduce the first nontrivial inter- 
section model for general multitolerance graphs, 
given by objects in the 3-dimensional space, 
called trapezoepipeds. This trapezoepiped repre- 
sentation unifies in a simple and intuitive way 
the widely known trapezoid representation for 
bounded multitolerance graphs and the paral- 
lelepiped representation for tolerance graphs [9]. 
The main idea is to exploit the third dimension
to capture the information of the vertices with
$\tau_v = \{\mathbb{R}\}$ as the set of tolerance intervals.
This intersection model can be constructed ef- 
ficiently (in linear time), given a multitolerance 
representation. 

Apart from being important on its own, the 
trapezoepiped representation can be also used to 
design efficient algorithms and structural results. 
Given a multitolerance graph with n vertices and 
m edges, we present algorithms that compute 
a minimum coloring and a maximum clique in 
$O(n \log n)$ time (which turns out to be optimal),
and a maximum-weight independent set
in $O(m + n \log n)$ time (where $\Omega(n \log n)$ is
a lower bound for the complexity of this problem [2]).
Moreover, a variation of this algorithm
can compute a maximum-weight independent set 
in optimal O(n logn) time, when the input is a 
tolerance graph, thus closing the complexity gap 
of [9]. 

Given a multitolerance representation of
a graph G = (V,E), vertex $v \in V$ is called
bounded if $\tau_v = \mathcal{I}(I_v, l_{t_v}, r_{t_v})$ for some values
$l_{t_v}, r_{t_v} \in I_v$. Otherwise, v is unbounded.
$V_B$ and $V_U$ are the sets of bounded and
unbounded vertices in V, respectively. Clearly
$V = V_B \cup V_U$.


Definition 1 For a vertex $v \in V_B$ (resp. $v \in V_U$)
in a multitolerance representation of G, the values
$t_{v,1} = l_{t_v} - l_v$ and $t_{v,2} = r_v - r_{t_v}$ (resp.
$t_{v,1} = t_{v,2} = \infty$) are the left tolerance and the
right tolerance of v, respectively. Moreover, if
$v \in V_U$, then $t_v = \infty$ is the tolerance of v.


It can be easily seen by Definition 1 that if
we set $t_{v,1} = t_{v,2}$ for every vertex $v \in V$, then
we obtain a tolerance representation, in which
$t_{v,1} = t_{v,2}$ is the (unique) tolerance of v. Let now
$L_1$ and $L_2$ be two parallel lines at unit distance
in the plane.


Definition 2 Given an interval $I_v = [l_v, r_v]$ and
tolerances $t_{v,1}, t_{v,2}$, $\overline{T}_v$ is the trapezoid in $\mathbb{R}^2$
defined by the points $c_v, b_v$ on $L_1$ and $a_v, d_v$
on $L_2$, where $a_v = l_v$, $b_v = r_v$, $c_v = \min\{r_v, l_v + t_{v,1}\}$,
and $d_v = \max\{l_v, r_v - t_{v,2}\}$.
The values $\phi_{v,1} = \operatorname{arccot}(c_v - a_v)$ and
$\phi_{v,2} = \operatorname{arccot}(b_v - d_v)$ are the left slope and the right
slope of $\overline{T}_v$, respectively. Moreover, for every
unbounded vertex $v \in V_U$, $\phi_v = \phi_{v,1} = \phi_{v,2}$
is the slope of $\overline{T}_v$.


Note that, in Definition 2, the endpoints
$a_v, b_v, c_v, d_v$ of any trapezoid $\overline{T}_v$ (on the
lines $L_1$ and $L_2$) lie on the plane z = 0
in $\mathbb{R}^3$. Therefore, since we assumed that the
distance between the lines $L_1$ and $L_2$ is one,
these endpoints of $\overline{T}_v$ correspond to the points
$(a_v, 0, 0)$, $(b_v, 1, 0)$, $(c_v, 1, 0)$, and $(d_v, 0, 0)$ in
$\mathbb{R}^3$, respectively. For the sake of presentation, we
may not distinguish in the following between
these points in $\mathbb{R}^3$ and the corresponding
real values $a_v, b_v, c_v, d_v$, whenever this
slight abuse of notation does not cause any
confusion.

We are ready to give the main definition of this
entry, namely, the trapezoepiped representation.
For a set X of points in $\mathbb{R}^3$, denote by $H_{convex}(X)$
the convex hull defined by the points of X. That
is, $\overline{T}_v = H_{convex}(a_v, b_v, c_v, d_v)$ for every vertex
$v \in V$ by Definition 2, where $a_v, b_v, c_v, d_v$ are
points of the plane z = 0 in $\mathbb{R}^3$.


Definition 3 (trapezoepiped representation)
Let G = (V,E) be a multitolerance
graph with a multitolerance representation
$\{I_v = [a_v, b_v], \tau_v \mid v \in V\}$ and
$\Delta = \max\{b_v \mid v \in V\} - \min\{a_v \mid v \in V\}$ be the greatest distance
between two interval endpoints. For every vertex
$v \in V$, the trapezoepiped $T_v$ of v is the convex
set of points in $\mathbb{R}^3$ defined as follows:


(a) If $t_{v,1}, t_{v,2} \le |I_v|$ (i.e., v is bounded), then
$T_v = H_{convex}(\overline{T}_v, a'_v, b'_v, c'_v, d'_v)$.

(b) If $t_v = t_{v,1} = t_{v,2} = \infty$ (i.e., v is
unbounded), then $T_v = H_{convex}(a'_v, c'_v)$.


Here $a'_v = (a_v, 0, \Delta - \cot\phi_{v,1})$, $b'_v = (b_v, 1, \Delta - \cot\phi_{v,2})$,
$c'_v = (c_v, 1, \Delta - \cot\phi_{v,1})$,
and $d'_v = (d_v, 0, \Delta - \cot\phi_{v,2})$. The set of
trapezoepipeds $\{T_v \mid v \in V\}$ is a trapezoepiped
representation of G (Fig. 1).


Theorem 1 Let G = (V, E) be a multitolerance
graph with a multitolerance representation
$\{I_v = [a_v, b_v], \tau_v \mid v \in V\}$. Then for every
$u, v \in V$, $uv \in E$ if and only if $T_u \cap T_v \ne \emptyset$.


Efficient Algorithms 

As one of our main tools towards providing
efficient algorithms on multitolerance graphs,
we refine Definition 3 by introducing the
notion of a canonical trapezoepiped representation.
A trapezoepiped representation R of a
multitolerance graph G = (V,E) is called
canonical if the following is true: for every
unbounded vertex $v \in V_U$ in R, if we replace
$T_v$ by $H_{convex}(\overline{T}_v, a'_v, c'_v)$ in R, we would
create a new edge in G. Note that replacing
$T_v$ by $H_{convex}(\overline{T}_v, a'_v, c'_v)$ in R is equivalent to
replacing in the corresponding multitolerance
representation of G the infinite tolerance $t_v = \infty$
by the finite tolerances $t_{v,1} = t_{v,2} = |I_v|$,
i.e., to making v a bounded vertex. Clearly,
every trapezoepiped representation R can be
transformed to a canonical one by iteratively
replacing unbounded vertices with bounded ones
(without introducing new edges), as long as this
is possible. Using techniques from computational
geometry, we can prove the next theorem.


Theorem 2 Every trapezoepiped representation 
of a multitolerance graph G with n vertices can 
be transformed to a canonical representation of 
G in O(n logn) time. 


The main idea for the proof of Theorem 2
is the following. We associate with every unbounded
vertex $v \in V_U$ an (appropriately defined)
point $p_v$ and with every bounded vertex
$u \in V_B$ three points $p_{u,1}, p_{u,2}, p_{u,3}$ in the plane.
Furthermore we associate with every bounded
vertex $u \in V_B$ the two line segments $\ell_{u,1}$ and $\ell_{u,2}$
in the plane, which have the points $\{p_{u,1}, p_{u,2}\}$
and $\{p_{u,2}, p_{u,3}\}$ as endpoints, respectively. We
can prove that an unbounded vertex $v \in V_U$ can
be replaced by a bounded vertex without introducing
a new edge if and only if, in the above
construction, the point $p_v$ lies above the lower
envelope Env(L) of the line segments
$L = \{\ell_{u,1}, \ell_{u,2} : u \in V_B\}$. Since $|L| = O(n)$, we
can compute Env(L) in $O(n \log n)$ time using the
algorithm of [6].

Multitolerance Graphs, Fig. 1 (a) A multitolerance
graph G and (b) a trapezoepiped representation R of G.
Here, $h_{v_i,j} = \Delta - \cot\phi_{v_i,j}$ for every bounded vertex
$v_i \in V_B$ and $j \in \{1, 2\}$, while $h_{v_i} = \Delta - \cot\phi_{v_i}$ for
every unbounded vertex $v_i \in V_U$

In the resulting canonical representation R'
of G, for every unbounded vertex $v \in V_U$,
there exists at least one bounded vertex $u \in V_B$
such that $uv \notin E$ and $T_v$ lies "above" $T_u$ in
R'. Moreover, we can prove that in this case
$N(v) \subseteq N(u)$, and thus there exists a minimum
coloring of G where u and v have the same color.
The main idea for our (optimal) $O(n \log n)$-time
minimum coloring algorithm is the following.
We first compute in $O(n \log n)$ time a minimum
coloring of the induced subgraph $G[V_B]$ using
the coloring algorithm of [2] for trapezoid graphs.
Then, given this coloring, we assign in linear time
a color to all unbounded vertices. Furthermore,
using Theorem 2, the maximum clique algorithm
of [2] for trapezoid graphs, and the fact that
multitolerance graphs are perfect, we provide
an (optimal) $O(n \log n)$-time maximum clique
algorithm for multitolerance graphs.

Our O(m + nlogn)-time maximum-weight 
independent set algorithm for multitolerance 
graphs is based on dynamic programming. Dur- 
ing its execution, the algorithm uses binary search 
trees to maintain two finite sets M and H of O(n) 
weighted markers each, which are appropriately 
sorted on the real line. For the case where the 
input graph G is a tolerance graph, this algorithm 
can be slightly modified to compute a maximum- 
weight independent set in (optimal) O(n logn) 
time, thus closing the complexity gap of [9]. 


Classification of Multitolerance Graphs 

Apart from its use in devising efficient algo- 
rithms, the trapezoepiped representation proved 
useful also in classifying multitolerance graphs 
inside the hierarchy of perfect graphs that is 
given in [5, Figure 2.8]. The resulting hierarchy 
of classes of perfect graphs is complete, i.e., all 
inclusions are strict. 
Open Problems 


The trapezoepiped representation provides ge- 
ometric insight for multitolerance graphs, and 
it can be expected to prove useful in deriving 
new algorithmic as well as structural results. It 
remains open to close the gap between the lower 
bound of $\Omega(n \log n)$ and the upper bound of
$O(m + n \log n)$ for the weighted independent set
on general multitolerance graphs. Furthermore, 
interesting open problems for further research in- 
clude the weighted clique problem, the Hamilto- 
nian cycle problem, the dominating set problem, 
as well as the recognition problem of general 
multitolerance graphs. 


Recommended Reading 


1. Altschul SF, Gish W, Miller W, Myers EW, Lipman 
DJ (1990) Basic local alignment search tool. J Mol 
Biol 215(3):403-410 

2. Felsner S, Müller R, Wernisch L (1997) Trapezoid
graphs and generalizations, geometry and algorithms.
Discret Appl Math 74:13-32

3. Golumbic MC, Monma CL (1982) A generalization 
of interval graphs with tolerances. In: Proceedings of 
the 13th Southeastern conference on combinatorics, 
graph theory and computing, Boca Raton. Congressus 
Numerantium, vol 35, pp 321-331 

4. Golumbic MC, Siani A (2002) Coloring algo- 
rithms for tolerance graphs: reasoning and scheduling 
with interval constraints. In: Proceedings of the 
joint international conferences on artificial intelli- 
gence, automated reasoning, and symbolic computa- 
tion (AISC/Calculemus), Marseille, pp 196-207 

5. Golumbic MC, Trenk AN (2004) Tolerance graphs. 
Cambridge studies in advanced mathematics. Cam- 
bridge University Press, Cambridge 

6. Hershberger J (1989) Finding the upper envelope of 
n line segments in O(n log n) time. Inf Process Lett 
33(4):169-174 

7. Kaufmann M, Kratochvil J, Lehmann KA, Subrama- 
nian AR (2006) Max-tolerance graphs as intersection 
graphs: cliques, cycles, and recognition. In: Proceed- 
ings of the 17th annual ACM-SIAM symposium on 
discrete algorithms (SODA), Miami, pp 832-841 

8. Lehmann KA, Kaufmann M, Steigele S, Nieselt K 
(2006) On the maximal cliques in c-max-tolerance 
graphs and their application in clustering molecular 
sequences. Algorithms Mol Biol 1:9 

9. Mertzios GB, Sau I, Zaks S (2009) A new intersection 
model and improved algorithms for tolerance graphs. 
SIAM J Discret Math 23(4):1800-1813 


10. Mertzios GB, Sau I, Zaks S (2011) The recognition 
of tolerance and bounded tolerance graphs. SIAM J 
Comput 40(5):1234-1257

11. Parra A (1998) Triangulating multitolerance graphs. 
Discret Appl Math 84(1-3):183-197 


Multiway Cut 


Gruia Calinescu 
Department of Computer Science, Illinois 
Institute of Technology, Chicago, IL, USA 


Keywords 


Multiterminal cut 


Years and Authors of Summarized 
Original Work 


1998; Calinescu, Karloff, Rabani 


Problem Definition 


Given an undirected graph with edge costs and 
a subset of k nodes called terminals, a multiway 
cut is a subset of edges whose removal discon- 
nects each terminal from the rest. MULTIWAY 
CUT is the problem of finding a multiway cut of 
minimum cost. 


Previous Work 

Dahlhaus, Johnson, Papadimitriou, Seymour, and
Yannakakis [6] initiated the study of MULTIWAY
CUT and proved that MULTIWAY CUT is MAX
SNP-hard even when restricted to instances with
three terminals and unit edge costs. Therefore,
unless P = NP, there is no polynomial-time
approximation scheme for MULTIWAY CUT. For
k = 2, the problem is identical to the undirected
version of the extensively studied s-t min-cut
problem of Ford and Fulkerson, and thus has
polynomial-time algorithms (see, e.g., [1]). Prior
to this paper, the best (and essentially the only)
approximation algorithm for $k \ge 3$ was due to
the above-mentioned paper of Dahlhaus et al.
They give a very simple combinatorial isolation
heuristic that achieves an approximation ratio of
$2(1 - 1/k)$. Specifically, for each terminal i, find
a minimum-cost cut separating i from the remaining
terminals, and then output the union of the
$k - 1$ cheapest of the k cuts. For k = 4 and for
k = 8, Alon (see [6]) observed that the isolation
heuristic can be modified to give improved ratios
of 4/3 and 12/7, respectively.
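The isolation heuristic described above can be sketched in a few lines. The minimum isolating cut of a terminal is computed here with a plain augmenting-path (Edmonds-Karp style) max-flow after attaching all other terminals to an artificial super-sink. This choice of max-flow routine, the edge-dictionary input format, and the function names are assumptions made for illustration only.

```python
from collections import deque

def min_isolating_cut(cost, source, others):
    """Min-cost set of edges (keys of `cost`) separating `source` from all `others`."""
    inf = float("inf")
    sink = object()                        # hypothetical super-sink joined to `others`
    cap = {}

    def add(u, v, c):                      # undirected edge = two arcs of capacity c
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v][u] = cap[v].get(u, 0) + c

    for (u, v), c in cost.items():
        add(u, v, c)
    for t in others:
        add(t, sink, inf)

    while True:
        parent = {source: None}            # BFS in the residual graph
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                          # `parent` now holds the source side of a min cut
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
    return {e for e in cost if (e[0] in parent) != (e[1] in parent)}

def isolation_heuristic(cost, terminals):
    """Union of the k-1 cheapest isolating cuts, one cut per terminal."""
    cuts = []
    for t in terminals:
        cut = min_isolating_cut(cost, t, [s for s in terminals if s != t])
        cuts.append((sum(cost[e] for e in cut), cut))
    cuts.sort(key=lambda pair: pair[0])
    result = set()
    for _, cut in cuts[:-1]:               # drop the most expensive isolating cut
        result |= cut
    return result
```

On a star whose center is joined to terminals 1, 2, 3 by edges of cost 1, 2, 3, the three isolating cuts cost 1, 2, and 3, and the heuristic returns the two cheapest edges, of total cost 3.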

In special cases, far better results are known. 
For fixed k in planar graphs, the problem is 
solvable in polynomial time [6]. For trees and 
2-trees, there are linear-time algorithms [5]. For 
dense unweighted graphs, there is a polynomial- 
time approximation scheme [2, 8]. 


Key Results 


Theorem 1 ([3]) There is a deterministic polynomial
time algorithm that finds a multiway cut
of cost at most $(1.5 - 1/k)$ times the optimum
multiway cut.


The approximation algorithm from Theorem 1 is
based on a novel linear programming relaxation
described later. On the basis of the same
linear program, the approximation ratio was
subsequently improved to 1.3438 by Karger,
Klein, Stein, Thorup, and Young [10]. For three
terminals, [10] and Cheung, Cunningham, and
Tang [4] give very different 12/11-approximation
algorithms.

Two variations of the problem have been considered
in the literature: Garg, Vazirani, and Yannakakis [9]
obtain a $(2 - 2/k)$-approximation
ratio for the node-weighted version, and Naor
and Zosin [11] obtain a 2-approximation for the
case of directed graphs. It is known that any
approximation ratio for these variations translates
immediately into the same approximation ratio
for VERTEX COVER, and thus it is hard to get any
significant improvement over the approximation
ratio of 2.

The algorithm from Theorem 1 appears next,
giving a flavor of how this result is obtained. The
complete proof of the approximation ratio is not
long and appears in [3] or the book [12].


Notation 
Let G = (V,E) be an undirected graph on
V = {1, 2, ..., m} in which each edge $uv \in E$
has a non-negative cost c(u,v) = c(v,u), and
let $T = \{1, 2, \ldots, k\} \subseteq V$ be a set of terminals.
MULTIWAY CUT is the problem of finding a minimum
cost set $C \subseteq E$ such that in $(V, E \setminus C)$,
each of the terminals 1, 2, ..., k is in a different
component. Let MWC = MWC(G) be the value
of the optimal solution to MULTIWAY CUT.

$\Delta_k$ denotes the (k - 1)-simplex, i.e., the
(k - 1)-dimensional convex polytope in $\mathbb{R}^k$ given
by $\{x \in \mathbb{R}^k \mid (x \ge 0) \wedge (\sum_i x_i = 1)\}$.

For $x \in \mathbb{R}^k$, $\|x\|$ is its $L_1$ norm: $\|x\| = \sum_i |x_i|$.
For j = 1, 2, ..., k, $e^j \in \mathbb{R}^k$ denotes the unit
vector given by $(e^j)_j = 1$ and $(e^j)_i = 0$ for all
$i \ne j$.


LP-Relaxation 

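The simplex relaxation for MULTIWAY CUT with
edge costs has as variables k-dimensional real
vectors $x^u$, defined for each vertex $u \in V$:

$$
\begin{aligned}
\text{Minimize} \quad & \frac{1}{2} \sum_{uv \in E} c(u,v) \cdot \|x^u - x^v\| \\
\text{subject to} \quad & x^u \in \Delta_k && \forall u \in V \\
& x^t = e^t && \forall t \in T.
\end{aligned}
$$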


In other words, the terminals stay at the vertices
of the $(k-1)$-simplex, the other nodes may be placed
anywhere in the simplex, and an edge's length is
measured by the total variation distance between its
endpoints. Clearly, placing all nodes at simplex
vertices gives an integral solution: the lengths of
edges are either 0 (if both endpoints are at the
same vertex) or 1 (if the endpoints are at different
vertices), and the removal of all unit length edges
disconnects the graph into at least k components,
each containing at most one terminal.

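To solve this relaxation as a linear program,
new variables are introduced: $y^{uv}$, defined for all
$uv \in E$, and $x_i^u$, defined for all $u \in V$ and $i \in T$.
Also new variables are $y_i^{uv}$, defined for all $i \in T$
and $uv \in E$. Then one writes the linear program:

$$
\begin{aligned}
\text{Minimize} \quad & \frac{1}{2} \sum_{uv \in E} c(u,v)\, y^{uv} \\
\text{subject to} \quad & x^u \in \Delta_k && \forall u \in V \\
& x^t = e^t && \forall t \in T \\
& y^{uv} = \sum_{i \in T} y_i^{uv} && \forall uv \in E \\
& y_i^{uv} \ge x_i^u - x_i^v && \forall uv \in E,\ i \in T \\
& y_i^{uv} \ge x_i^v - x_i^u && \forall uv \in E,\ i \in T.
\end{aligned}
$$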


It is easy to see that this linear program optimally
solves the simplex relaxation above, by
noticing that an optimal solution to the linear
program can be assumed to put $y_i^{uv} = |x_i^u - x_i^v|$
and $y^{uv} = \|x^u - x^v\|$. Thus, solving the simplex
relaxation can be done in polynomial time. This
is the first step of the approximation algorithm.
Clearly, the value $Z^*$ of this solution is a lower
bound on the cost of the minimum multiway cut
MWC.

The second step of the algorithm is a rounding 
procedure which transforms a feasible solution of 
the simplex relaxation into an integral feasible 
solution. The rounding procedure below differs 
slightly from the one given in [3], but can be 
proven to give exactly the same solution. This 
variant is easier to present, although if one wants 
to prove the approximation ratio then the only 
way we know of is by showing that indeed this 
variant gives the same solution as the more com- 
plicated algorithm given in [3]. 


Rounding 

Set $B(i, \rho) = \{u \in V \mid x_i^u > 1 - \rho\}$, the set
of nodes suitably “close” to terminal i in
the simplex. Choose a permutation
$\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_k)$ to be either
$(1, 2, 3, \ldots, k-1, k)$ or $(k-1, k-2, k-3, \ldots, 1, k)$ with
probability 1/2 each. Independently, choose
$\rho \in (0, 1)$ uniformly at random. Then, process the
terminals in the order $\sigma(1), \sigma(2), \sigma(3), \ldots, \sigma(k)$.
For each j from 1 to $k - 1$, place the nodes that
remain in $B(\sigma_j, \rho)$ at $e^{\sigma_j}$. Place whatever nodes
remain at the end at $e^k$. The following code
specifies the rounding procedure more formally.
$\bar{x}$ denotes the rounded (integral) solution.

Algorithm 1 The rounding procedure

   Let $\sigma = (1, \ldots, k-3, k-2, k-1, k)$ or $(k-1, k-2, k-3, \ldots, 1, k)$, each with prob. 1/2
   Let $\rho$ be a random real in (0, 1) /* See the paragraph below. */
   for $j = 1$ to $k - 1$ do
      for all u such that $u \in B(\sigma_j, \rho) \setminus \bigcup_{i < j} B(\sigma_i, \rho)$ do
         $\bar{x}^u = e^{\sigma_j}$ /* assign node u to terminal $\sigma_j$ */
      end for
   end for
   for all u such that $u \notin \bigcup_{i < k} B(\sigma_i, \rho)$ do
      $\bar{x}^u = e^k$
   end for

To derandomize and implement this algorithm
in polynomial time, one tries both permutations $\sigma$
and at most $k(n + 1)$ values of $\rho$. Indeed, for any
permutation $\sigma$, two different values of $\rho$, $\rho_1 < \rho_2$,
produce combinatorially distinct solutions only
if there is a terminal i and a node u such that
$x_i^u \in (1 - \rho_2, 1 - \rho_1]$. Thus, there are at most
$k(n + 1)$ “interesting” values of $\rho$, which can be
determined easily by sorting the nodes according
to each coordinate separately. The resulting
discrete sample space for $(\sigma, \rho)$ has size at most
$2k(n + 1)$, so one can search it exhaustively.

The analysis of the algorithm, however, is
based on the randomized algorithm above, as the
proof shows that the expected total cost of edges
whose endpoints are at different vertices of $\Delta_k$ in
the rounded solution $\bar{x}$ is at most $1.5\, Z^*$. To get
a $(1.5 - 1/k) Z^*$ upper bound, one must rename
the terminals such that terminal k maximizes 
a certain quantity given by the simplex relaxation, 
or alternatively randomly pick a terminal as the 
last element of the permutation (the order of 
the first k — 1 terminals does not matter as long 
as both the increasing and the decreasing per- 
mutations are tried by the rounding procedure). 
Exhaustive search of the sample space produces 
one integral solution whose cost does not exceed 
the average. 
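
The rounding and its exhaustive derandomization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [3]: terminals are indexed 0, ..., k-1 (0-based), the fractional solution is passed as a dictionary `x` mapping each node to its k simplex coordinates, and the function names `cut_cost`, `round_once`, and `derandomized_rounding` are hypothetical. Rather than listing the interesting values of ρ exactly, the sketch tries one representative ρ strictly between each pair of consecutive breakpoints $1 - x_i^u$, which is enough to see every combinatorially distinct outcome. The toy instance uses a feasible, not necessarily optimal, fractional solution.

```python
def cut_cost(assign, edges):
    """Total cost of edges whose endpoints land on different terminals."""
    return sum(c for u, v, c in edges if assign[u] != assign[v])


def round_once(x, order, rho):
    """Assign each node to the first terminal i in `order` (last one excluded)
    with x[u][i] > 1 - rho; nodes captured by no ball go to the last terminal."""
    assign = {}
    for u, coords in x.items():
        assign[u] = order[-1]
        for i in order[:-1]:
            if coords[i] > 1 - rho:
                assign[u] = i
                break
    return assign


def derandomized_rounding(x, k, edges):
    """Try both permutations and one rho per interval between breakpoints."""
    # Increasing and decreasing orders of the first k-1 terminals; terminal k-1 is last.
    orders = [list(range(k)), list(range(k - 2, -1, -1)) + [k - 1]]
    # Membership in B(i, rho) can only change when rho crosses some 1 - x_i^u.
    breakpoints = sorted({1 - c for coords in x.values() for c in coords} | {0.0, 1.0})
    candidates = [(a + b) / 2 for a, b in zip(breakpoints, breakpoints[1:])]
    best_assign, best_cost = None, float("inf")
    for order in orders:
        for rho in candidates:
            assign = round_once(x, order, rho)
            cost = cut_cost(assign, edges)
            if cost < best_cost:
                best_assign, best_cost = assign, cost
    return best_assign, best_cost


# Toy instance: terminals 0, 1, 2 and one free node 3 (edges are (u, v, cost)).
x = {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0), 2: (0.0, 0.0, 1.0), 3: (0.5, 0.3, 0.2)}
edges = [(3, 0, 2), (3, 1, 1), (3, 2, 1)]
assign, cost = derandomized_rounding(x, 3, edges)
```

On this instance the search keeps the assignment that places node 3 with terminal 0, cutting the two unit-cost edges.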

Applications


MULTIWAY CUT is used in Computer Vision, but 
unless one can solve the instance exactly, algo- 
rithms for the generalization METRIC LABELING 
are needed. MULTIWAY CUT has applications in 
parallel and distributed computing, as well as in 
chip design. 


Open Problems 


The improvements of [10, 4] are based on better
rounding procedures and both compare the
integral solution obtained to $Z^*$. This leads to
the natural question: what is the supremum, over
multiway cut instances G, of $\mathrm{MWC}(G)/Z^*(G)$?
This supremum is called the integrality gap or
integrality ratio. For three terminals, [10] and [4]
show that the integrality gap is exactly 12/11,
while for general k, Freund and Karloff [7] give
a lower bound of 8/7. The best-known upper
bound is 1.3438, achieved by an approximation
algorithm of [10].

Problem Definition


Protein phosphorylation plays an important
role in various biological functions and cellular
processes. Identifying potential phosphorylation
sites in a protein often helps to reveal functional
details at the molecular level, and it has traditionally
been performed by in vivo or in vitro experiments.
Over the last decade, bioinformatics has
contributed significantly to characterizing
protein structures and functions solely from
primary sequence information, which also sheds light
on phosphorylation site prediction. We expect
in silico prediction not only to provide an
alternative way to identify protein phosphorylation
sites at lower cost but also to offer much higher
throughput (e.g., proteome-wide screening), so that
biologists can quickly pinpoint the potential sites
for further experiments from a long list of targets.
It is therefore becoming valuable and imperative
to build a bioinformatics tool or framework that
can predict general and kinase-specific
phosphorylation sites in proteins.

By definition, protein phosphorylation prediction
is a computational approach to determine
whether a certain amino acid in a protein sequence
can potentially be phosphorylated.
More specifically, given a protein (or peptide)
sequence $P = [a_i]$, $i = 1, \ldots, n$ (where n is
the sequence length, with amino acid $a_i$ at the ith
position), the prediction algorithm must tell whether
each $a_i$ (especially when $a_i$ is serine, threonine,
or tyrosine) can be phosphorylated in P.
In a kinase-specific format, this question is asked
with a proposed kinase name or kinase family.
Moreover, this question can also be asked in a
species-specific or condition-specific format.


Key Results 


Machine Learning Approach 

The algorithm we designed for this prediction
task resolves the association between
phosphorylation and sequence information
from the experimentally identified phosphorylation
sites. Thus, it is formulated as a machine
learning approach rather than an ab initio method.
The collected experimental data need to be split
into training and testing sets to generate, tune,
and validate our machine learning models. These
machine learning models are then capable of
predicting general or kinase-specific phosphorylation
sites for proteins with unknown sites.
The general or specific prediction models
depend strongly on the corresponding data sets used
in training. For example, kinase-specific prediction
requires kinase-specific training data sets. After
different preprocessing steps are applied to the data
sets for general or specific purposes, the prediction
models are generated with the same machine
learning method (i.e., support vector machine
[1-3] in our framework).

Technically speaking, per-site prediction can
be modeled as a binary classification problem,
where the class label Y is either +1 for an identified
phosphorylation site or -1 for an unidentified
site, with X as its feature vector. A machine
learning model can be considered as a map
function from feature space X to the class label,
i.e., $M : g(X) \to Y$, obtained from the training
data set $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_m, Y_m)\}$. The
prediction for the unknown $X^*$ is simply
calculated through $Y^* = g(X^*)$.

In our case, $X_i$ is a feature vector from the
protein sequence, extracted from a flanking peptide
surrounding the i-th amino acid $a_i$. The flanking
sequence is usually centered on $a_i$ and is a
substring of the original protein sequence, denoted as
$p(a_i) = [a_{i-w}, \ldots, a_i, \ldots, a_{i+w}]$, where w is called
the window size.
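
As a minimal sketch of the windowing step, the following Python extracts the flanking peptide $p(a_i)$ for every serine, threonine, or tyrosine in a sequence. The function names are hypothetical, indices are 0-based, and padding positions beyond the sequence ends with "-" is an assumption of this sketch; the text above does not say how the ends of a protein are handled.

```python
def flanking_peptide(sequence, i, w, pad="-"):
    """Return p(a_i) = [a_{i-w}, ..., a_i, ..., a_{i+w}] for 0-based index i.
    Positions outside the sequence are filled with `pad` (an assumption)."""
    return "".join(
        sequence[j] if 0 <= j < len(sequence) else pad
        for j in range(i - w, i + w + 1)
    )


def candidate_sites(sequence, w, residues="STY"):
    """List (index, flanking peptide) for every S, T, or Y in the sequence."""
    return [
        (i, flanking_peptide(sequence, i, w))
        for i, a in enumerate(sequence)
        if a in residues
    ]


sites = candidate_sites("MKSPTAY", w=1)
```

Each returned peptide has length 2w + 1, so downstream feature extractors can rely on a fixed window length.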

Support vector machine (SVM) then generates 
our prediction model M by maximizing the mar- 
gin of the classification boundary. 


Features 
K nearest neighbor (KNN) scores, disorder 
scores, and amino acid information are used as 
the features in our SVM-based machine learning 
approach. 

For amino acid $a_i$, the k nearest neighbors are
defined as the top k most similar peptides
(those at the smallest distances) to the target peptide
$p(a_i)$ in the training data set. Besides a fixed size k,
the neighborhood can also be defined as a
certain percentage of the whole training data
set (e.g., 1% of the total population). The KNN
score is then the ratio between the numbers of
positive and negative sites within this predefined
neighborhood.

Notice that the peptide similarity needs to be
defined and normalized. To illustrate, the two
flanking sequences centered on
amino acids $a_i$ and $a_j$ are represented as
$p(a_i) = [a_{i-w}, a_{i-w+1}, \ldots, a_i, \ldots, a_{i+w-1}, a_{i+w}]$ and
$p(a_j) = [a_{j-w}, a_{j-w+1}, \ldots, a_j, \ldots, a_{j+w-1}, a_{j+w}]$,
respectively.


The distance $D(p(a_i), p(a_j))$ between peptides
$p(a_i)$ and $p(a_j)$ is calculated by

$$D(p(a_i), p(a_j)) = 1 - \frac{\sum_{k=-w}^{w} S(a_{i+k}, a_{j+k})}{2w + 1},$$

where w is the window size, and the function $S(\cdot,\cdot)$
calculates the amino acid similarity between
two residues based on the normalized amino acid
substitution matrix Q. More specifically,

$$S(a_i, a_j) = \frac{Q(a_i, a_j) - \min(Q)}{\max(Q) - \min(Q)},$$

where $a_i$ and $a_j$ are two amino acids, Q is the
substitution matrix, and max(Q) and min(Q) represent
the maximal and minimal values in the
matrix Q. By default, BLOSUM62 is used as the
most general substitution matrix. In fact, Q can
also be directly calculated from the training data
set, in which case the KNN score is very specific to the
training samples.
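
The distance and the KNN score can be sketched as follows. This is an illustration only: `toy_q` is a hypothetical stand-in for a real substitution matrix such as BLOSUM62 (whose values are not reproduced here), the function names are invented for this sketch, and the score is computed as the fraction of positive sites among the k nearest neighbors, which is one reading of the ratio described above.

```python
def toy_q(a, b):
    """Hypothetical substitution score: 4 for a match, -1 for a mismatch."""
    return 4 if a == b else -1


Q_MIN, Q_MAX = -1, 4  # smallest and largest entries of the toy matrix


def similarity(a, b):
    """Normalized similarity S(a, b) in [0, 1]."""
    return (toy_q(a, b) - Q_MIN) / (Q_MAX - Q_MIN)


def peptide_distance(p, q):
    """D(p, q) = 1 - (sum of per-position similarities) / (2w + 1)."""
    assert len(p) == len(q), "both peptides must use the same window size"
    return 1 - sum(similarity(a, b) for a, b in zip(p, q)) / len(p)


def knn_score(target, training, k):
    """Fraction of positive (+1) sites among the k nearest training peptides."""
    nearest = sorted(training, key=lambda item: peptide_distance(target, item[0]))[:k]
    positives = sum(1 for _, label in nearest if label == 1)
    return positives / len(nearest)


training = [("KSP", 1), ("KSA", 1), ("GTG", -1), ("AAA", -1)]
score = knn_score("KSP", training, k=2)
```

With the toy matrix the distance reduces to the fraction of mismatching positions; a real substitution matrix would give graded similarities instead.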

The disorder score per amino acid site reflects the
stability of the local structure and is calculated by
VSL2B [4]. Treating the disorder property
as a neighborhood-dependent and continuous
feature, we correct (smooth) the disorder
score at the position $a_i$ using the mean value
across the flanking peptide $p(a_i)$, i.e.,

$$\mathrm{Disorder}(a_i) = \mathrm{average}(p(a_i)) = \frac{1}{2w + 1} \sum_{k=-w}^{w} \mathrm{disorder}(a_{i+k})$$
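
The smoothing is a plain moving average, sketched below. The per-residue scores are assumed to come from an external predictor and are passed in as a list; the function name is hypothetical, and truncating the window at the sequence ends (dividing by the number of available positions rather than 2w + 1) is an assumption of this sketch.

```python
def smoothed_disorder(scores, i, w):
    """Mean of the raw disorder scores over positions i-w .. i+w (0-based),
    clipped to the sequence boundaries."""
    lo = max(0, i - w)
    hi = min(len(scores), i + w + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)


raw = [0.0, 0.3, 0.6, 0.9, 0.6]
center = smoothed_disorder(raw, 2, 1)   # averages 0.3, 0.6, 0.9
edge = smoothed_disorder(raw, 0, 1)     # averages 0.0, 0.3 only
```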

Amino acid information for the flanking sequence
$p(a_i)$ can refer to both composition and position
information. At one extreme, amino acid frequency
reflects composition information but no
position information. The amino acid preference
in phosphorylated peptides [5] can be revealed by
this frequency feature. The size of this frequency
vector is 20; it stores the normalized counts
of each amino acid type within the range of
the flanking peptide $p(a_i)$. At the other extreme,
an amino acid binary vector provides
position-specific information by keeping a 0-1 vector
for each amino acid at each position. The length
of the amino acid binary vector is 20*w, much longer
than the frequency vector, which may cause
over-fitting in machine learning when the sample
size is small. Therefore, selecting the right way to
encode and represent the amino acid information
is a trade-off between preserving the positional
information and limiting the length of the feature
vector.
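
The two encodings can be contrasted with a short sketch. The alphabet ordering and the function names are assumptions made for illustration. The binary encoding here emits one block of 20 indicators for every position of whatever peptide it is given, so its length is 20 times the peptide length; this may differ from the exact window convention used by the authors.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def frequency_vector(peptide):
    """Length-20 vector of normalized residue counts (composition only)."""
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def binary_vector(peptide):
    """Concatenated 0-1 indicator blocks, one block of 20 per position."""
    return [1 if residue == a else 0 for residue in peptide for a in AMINO_ACIDS]


freq = frequency_vector("KSPSA")
bits = binary_vector("KSPSA")
```

The frequency vector stays at length 20 regardless of the window, while the binary vector grows linearly with it, which is the trade-off described above.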


Bootstrap and Aggregation 

Because the unidentified phosphorylation sites
(negative data) greatly outnumber the
identified ones (positive data) in the training
data set, we use bootstrap procedures to avoid
the potential bias that this imbalance would
introduce in the final classifier. The
bootstrap step is a randomized resampling that
yields a balanced training set each time; it is
repeated many times in order to explore the whole
sample space thoroughly. This produces one
model per bootstrap step, $M_1, \ldots, M_k$, where
k can be in the thousands. The final classification
is then based on the vote or the mean value of these
models, i.e., $G : G[g_1(X), \ldots, g_k(X)]$, where
G is called the aggregation step. The aggregated
model is thus not biased by the imbalance of
the labels in the training data.
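
A minimal sketch of balanced bootstrapping followed by majority-vote aggregation is given below. To stay self-contained it uses a hypothetical one-dimensional threshold classifier as the base learner instead of an SVM; the function names, the number of rounds, and the seed are all assumptions of this illustration.

```python
import random


def balanced_bootstrap(pos, neg, rng):
    """Resample equally many positives and negatives, with replacement."""
    n = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(n)], [rng.choice(neg) for _ in range(n)]


def train_threshold_model(pos, neg):
    """Toy base learner: threshold halfway between the two class means."""
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else -1


def bagged_classifier(pos, neg, rounds=25, seed=0):
    """Train one model per bootstrap sample and aggregate by majority vote."""
    rng = random.Random(seed)
    models = [
        train_threshold_model(*balanced_bootstrap(pos, neg, rng))
        for _ in range(rounds)
    ]

    def aggregate(x):
        votes = sum(model(x) for model in models)
        return 1 if votes > 0 else -1

    return aggregate


positives = [0.8, 0.9, 0.85]
negatives = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.12, 0.22, 0.18]
g = bagged_classifier(positives, negatives)
```

Using an odd number of rounds avoids tied votes; averaging real-valued model outputs would be the alternative mentioned in the text.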


Cross Validation 

With the trained model, the testing result is often
displayed as a trade-off between specificity and
sensitivity, e.g., by a receiver-operating characteristic
(ROC) curve. Specificity and sensitivity are
defined as follows:

$$\text{specificity} = \frac{TN}{TN + FP}, \qquad \text{sensitivity} = \frac{TP}{TP + FN},$$

where TN represents true negative, FP false positive,
TP true positive, and FN false negative.
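
The two measures follow directly from the confusion counts, as in this small sketch. Labels are assumed to be +1 and -1 as in the classification setup above, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for labels in {+1, -1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp, fn, tn, fp


def specificity(tn, fp):
    return tn / (tn + fp)


def sensitivity(tp, fn):
    return tp / (tp + fn)


y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, -1, 1, -1]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
```

Sweeping the decision threshold of a classifier and recomputing these two values at each setting traces out the ROC curve.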

Cross validation is a way to measure whether the
predictive power of a phosphorylation model
trained on known data extends well
to unknown data. Usually, cross validation is
performed with a leave-one-out strategy (for
small data sets, e.g., kinase-specific data sets) or
with non-overlapping testing and training sets
in an x-fold setting (for general phosphorylation
sites).


Musite as a Toolkit 

Musite is an open-source software toolkit designed
for large-scale phosphorylation prediction
for both general and kinase-specific cases [6, 7].
The framework is quite flexible, so that users can
take advantage of different preprocessing steps
for specifying training or testing data, as well as
pick different features and tune parameters.
By default, Musite provides general phosphorylation
prediction models, several popular
kinase-specific models, and multiple species-specific
predictions (e.g., a plant-specific tool was built
using our in-house plant protein phosphorylation
database P3DB [8-10]). Moreover, when trained with
users' specific data sets, Musite is also capable of
generating customized models for precise prediction
tailored to their own research focus.


Applications 


This tool or framework can be used as a quick
filter on a long list of candidate proteins, allowing
experimental biologists to narrow down the
phosphorylation sites on which to perform biochemical
assays. It can also help to evaluate or compare the
experimental observations in discovery studies.
On the other hand, computational experts can
use this tool to do comparative studies by fast
and cheap computer screening across multiple
proteomes. This tool can be easily and freely
incorporated into any translational bioinformatics
pipeline to characterize or annotate protein
functionality within large-scale proteomics studies.


Open Problems 


1. The current version is an alignment-free
method. Is it possible or necessary to consider
alignment, i.e., allowing indels in the peptide
similarity calculation?

2. The current features are more or less local. 
Are there any feasible features representing 
long distance association with phosphoryla- 
tion site? 

3. Can we extract any interesting biological rules 
from the machine learning models for general 
or specific phosphorylation events? 


URLs to Code 


The source code can be downloaded from SourceForge.

http://musite.sourceforge.net/ 

The online prediction Web services are available 
at: 

http://musite.net/ 

http://p3db.org/prediction.php 

Problem Definition


This problem is concerned with multicast
routing and cost sharing in a selfish network
composed of relay terminals and receivers. It
is motivated by the recent observation
that selfish behavior in the network could
largely degrade existing system performance,
or even render the system dysfunctional. The work
of Wang, Li and Chu [7] first presented some negative
results on strategyproof mechanisms in multicast routing
and sharing, and then proposed a new solution
based on Nash Equilibrium that could greatly
improve the performance.

Wang, Li and Chu modeled a network by a
link-weighted graph $G = (V, E, c)$, where V is the set
of all nodes and c is the cost vector of the set E of
links. For a multicast session, let Q denote the set
of all receivers. In the game-theoretic networking
literature, there are usually two models for
multicast cost/payment sharing.

Axiom Model (AM) All receivers must receive
the service, or equivalently, each receiver
has an infinite valuation [3]. In this model, a sharing
method $\xi$ computes how much each receiver
should pay when the receiver set is R and the cost
vector is c.

Valuation Model (VM) There is a set
$Q = \{q_1, q_2, \ldots, q_r\}$ of r possible receivers. Each
receiver $q_i \in Q$ has a valuation $\eta_i$ for receiving the
service. Let $\eta = (\eta_1, \eta_2, \ldots, \eta_r)$ be the valuation
vector and $\eta_R$ be the valuation vector of a set
$R \subseteq Q$ of receivers. In this model, they are
interested in a sharing mechanism S consisting of
a selection scheme $\sigma(\eta, c)$ and a sharing method
$\xi(\eta, c)$. $\sigma_i(\eta, c)$ denotes whether receiver i
receives the service or not, and $\xi_i(\eta, c)$ computes
how much the receiver $q_i$ should pay for the
multicast service. Let $P(\eta, c)$ be the total payment
for providing the service to the receiver set.

Algorithm 1 The multicast system $\Psi^{DM} = (M^{DM}, S^{DM})$
based on multicast tree LCPT

 1: Compute path $LCP(s, q_j, d)$ and set $\phi_j = [\ldots]$, an
    expression in $B_{mm}(s, q_j, d)$ and $d$, for every $q_j \in Q$.
 2: Set $O_i^{DM}(\eta, d) = 0$ and $P_i^{DM}(\eta, d) = 0$ for each link
    $e_i \notin LCP(s, q_j, d)$.
 3: for each receiver $q_j$ do
 4:    if $\eta_j \ge \phi_j$ then
 5:       Receiver $q_j$ is granted the service and charged
          $\xi_j^{DM}(\eta, d)$; set $R = R \cup q_j$.
 6:    else
 7:       Receiver $q_j$ is not granted the service and is
          charged 0.
 8:    end if
 9: end for
10: Set $O_i^{DM}(\eta, d) = 1$ and $P_i^{DM}(\eta, d) = P_i^{LCPT}(\eta_R, d)$
    for each link $e_i \in LCPT(R, d)$.

In the valuation model, a receiver who is
willing to receive the service is not guaranteed to
receive the service. For notational simplicity,
$\sigma(\eta, c)$ is used to denote the set of actual
receivers. Under the Valuation Model, a fair
sharing according to the following criteria is
studied.


• Budget Balance: For the receiver set
  $R = \sigma(\eta, c)$, $P(\eta, c) = \sum_{q_i \in Q} \xi_i(\eta, c)$. If
  $\alpha \cdot P(\eta, c) \le \sum_{q_i \in R} \xi_i(\eta, c) \le P(\eta, c)$,
  for some given parameter $0 < \alpha \le 1$, then
  $S = (\sigma, \xi)$ is called α-budget-balance. If
  budget balance is not achievable, then
  a sharing scheme S may need to be α-budget-balance
  instead of budget balance.

• No Positive Transfer (NPT): Any receiver
  $q_i$'s sharing should not be negative.

• Free Leaving (FR): The potential receivers
  who do not receive the service should not pay
  anything.

• Consumer Sovereignty (CS): For any receiver
  $q_i$, if $\eta_i$ is sufficiently large, then $q_i$ is
  guaranteed to be an actual receiver.

• Group-Strategyproof (GS): Assume that
  $\eta$ is the valuation vector and $\eta' \ne \eta$. If
  $\xi_i(\eta', c) \ge \xi_i(\eta, c)$ for each $q_i \in \eta$, then
  $\xi_i(\eta', c) = \xi_i(\eta, c)$.

Notations 

The path with the lowest cost between two nodes
s and t is denoted as $LCP(s, t, c)$, and its cost
is denoted as $|LCP(s, t, c)|$. Given a simple path
P in the graph G with cost vector c, the sum
of the cost of links on path P is denoted as
$|P(c)|$. For a simple path $P = v_i \leadsto v_j$, if
$LCP(s, t, c) \cap P = \{v_i, v_j\}$, then P is called
a bridge over $LCP(s, t, c)$. This bridge P covers
link $e_k$ if $e_k \in LCP(v_i, v_j, c)$. Given a link $e_i \in
LCP(s, t, c)$, the path with the minimum cost that
covers $e_i$ is denoted as $B_{min}(e_i, c)$. The bridge
$B_{mm}(s, t, c) = \max_{e_i \in LCP(s, t, c)} B_{min}(e_i, c)$ is the
max-min cover of the path $LCP(s, t, c)$.

A bridge set $\mathcal{B}$ is a bridge cover for
$LCP(s, t, c)$ if for every link $e_i \in LCP(s, t, c)$,
there exists a bridge $B \in \mathcal{B}$ such that
$e_i \in LCP(v_{s(B)}, v_{t(B)}, c)$. The weight of a bridge
cover $\mathcal{B}(s, t, c)$ is defined as
$|\mathcal{B}(s, t, c)| = \sum_{B \in \mathcal{B}(s, t, c)} \sum_{e_i \in B} c_i$.
A bridge cover is a least
bridge cover (LB), denoted by $LB(s, t, c)$, if it
has the smallest weight among all bridge covers
that cover $LCP(s, t, c)$.

Algorithm 2 FPA Mechanism $M^{AUC}$

 1: Each terminal bids a price $b_i$.
 2: Every link sends a unit size dummy packet with property
    $\rho = \tau \cdot (n \cdot b_u - \sum_{e_i \in G} b_i)$ and receives payment
    $f_i(s, q_1, b) = \tau \cdot [\, b_u (n \cdot b_u - \sum_{e_j \in G - e_i} b_j) - b_i^2 / 2 \,]$.
    Here, $b_u$ is the maximum cost any link can declare.
 3: Compute the unique path $LCP(s, q_1, b')$ by applying a
    certain fixed tie-breaking rule consistently.
 4: Each terminal bids again for a price $b'_i$.
 5: for each link $e_i$ do
 6:    It is selected to relay the packet and receives payment
       $b'_i$ if and only if $e_i$ is on path $LCP(s, q_1, b')$.
 7: end for


Key Results 


Theorem 1 If $\Psi = (M, S)$ is an α-stable multicast
system, then $\alpha \le 1/n$.


Theorem 2 Multicast system $\Psi^{DM}$ is $1/(r \cdot n)$-stable,
where r is the number of receivers.


Theorem 1 gives an upper bound on α for any
α-stable unicast system $\Psi$. It is not difficult to
observe that Theorem 1 still holds even if the
receivers are cooperative. Theorem 2 shows that
there exists a multicast system that is $1/(r \cdot n)$-stable.
When r = 1, the problem becomes the traditional
unicast system and the bound is tight. When
the dominant-strategy requirement is relaxed to a Nash
Equilibrium requirement, Wang et al. propose a First
Price Auction (FPA) mechanism under the
Axiom Model that has many nice properties.


Theorem 3 There exists NE for FPA mechanism
$M^{AUC}$ and for any NE, (a) each link bids his true
cost as the first bid $b_i$, (b) the actual shortest path
is always selected, (c) the total cost for different
NE differs at most 2 times.


Based on the FPA Mechanism $M^{AUC}$, Wang, Li
and Chu design a unicast system as follows.

Algorithm 3 FPA based unicast system

 1: Execute Lines 1-3 in Algorithm 2.
 2: Compute $LB(s, q_1, b)$, and set $\phi = |LB(s, q_1, b)| / 2$.
 3: If $\phi \le \eta_1$ then set $\sigma_1^{AU}(\eta, b) = 1$ and $\xi_1^{AU}(\eta, b) = \phi$.
    Every relay link on LCP is selected and receives an
    extra payment $b'_i$.
 4: For each link $e_i \notin LCP(s, q_1, b')$, it receives a
    payment $P_i^{AU}(\eta, b) = \gamma \cdot (b'_i - b_i)^2$.


Theorem 4 The FPA based unicast system not
only has Nash Equilibria, but also is 1/2-NE-stable
with ε additive, for any given ε.


By treating each receiver as a separate receiver
and applying a process similar to that of the unicast
system, Wang, Li and Chu extended the unicast
system to a multicast system.


Theorem 5 The FPA based multicast system not
only has Nash Equilibria, but also is $1/(2 \cdot r)$-NE-stable
with ε additive, for any given ε.


Applications 


A growing research effort has recently been devoted
to non-cooperative games.
Among the various forms of games, the
unicast/multicast routing game [2, 5, 6] and the
multicast cost sharing game [1, 3, 4] have
received a considerable amount of attention
over the past few years due to their applications in the
Internet. However, both the unicast/multicast routing
game and the multicast cost sharing game are
one-sided: the unicast/multicast routing game does
not take the receivers into account, while the
multicast cost sharing game does not treat the
links as non-cooperative. Wang, Li and Chu study
the scenario, called a multicast system,
in which both the links and the receivers could
be non-cooperative. Solving this problem paves
the way for real-world commercial multicast
and unicast applications. Examples include, but are
not limited to, the multicast of video content
in wireless mesh networks and commercial WiFi
systems, and multicast routing in the core Internet.


Open Problems 


A number of problems related to the work of
Wang, Li and Chu [7] remain open. First and
foremost, the upper bound and lower bound on α
still have a gap of r if the multicast system is
α-stable, and a gap of 2r if the multicast system is
α-Nash stable.

Second, Wang, Li and Chu only showed
the existence of the Nash Equilibrium under their
systems. They have not characterized the convergence
of the Nash Equilibrium or the strategies
of the users, which are not only interesting but also
important problems.

Problem Definition


In this entry, the authors state results on some 
transformation-based distances for evolutionary 
trees. Several distance models for evolutionary
trees have been proposed in the literature. Among 
them, the best known is perhaps the nearest 
neighbor interchange (nni) distance introduced 
independently in [10] and [9]. The authors will 
focus on the nni distance and a closely related 
distance called the subtree-transfer distance 
originally introduced in [5, 6]. Several papers 
that involved DasGupta, He, Jiang, Li, Tromp, 
and Zhang essentially showed the following 
results: 


• A correspondence between the nni distance
  and the linear-cost subtree-transfer distance on
  unweighted trees.

• Computing the nni distance is NP-hard, but
  admits a fixed-parameter tractable algorithm and a
  logarithmic-ratio approximation algorithm.

• A 2-approximation algorithm for the linear-cost
  subtree-transfer distance on weighted
  evolutionary trees.


The authors first define the nni and linear-cost 
subtree-transfer distances for unweighted trees. 
Then the authors extend the nni and linear-cost 
subtree-transfer distances to weighted trees. For 
the purpose of this entry, an evolutionary tree 
(also called phylogeny) is an unordered tree, has 
uniquely labeled leaves and unlabeled interior 
nodes, can be unrooted or rooted, can be un- 
weighted or weighted, and has all internal nodes 
of degree 3. 


Unweighted Trees 

An nni operation swaps two subtrees that are 
separated by an internal edge (u, v), as shown in 
Fig. 1. 

The nni operation is said to operate on this
internal edge. The nni distance, $D_{nni}(T_1, T_2)$,
between two trees $T_1$ and $T_2$ is defined as the
minimum number of nni operations required to
transform one tree into the other.
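
A single nni operation is easy to express on an adjacency-set representation of an unrooted tree. The sketch below is illustrative only: the dictionary-of-sets encoding and the function names `nni` and `nni_neighbors` are assumptions, and node labels are assumed to be mutually comparable (e.g., all strings) so that they can be sorted. It fixes one subtree hanging off u and exchanges it with each of the two subtrees hanging off v, which yields the two trees reachable by an nni on the edge (u, v).

```python
def nni(adj, u, v, a, c):
    """Return a copy of the tree in which subtree a (attached to u) and
    subtree c (attached to v) are exchanged across the internal edge (u, v)."""
    assert v in adj[u] and a in adj[u] and c in adj[v] and a != v and c != u
    new = {node: set(neighbors) for node, neighbors in adj.items()}
    new[u].remove(a)
    new[a].remove(u)
    new[v].remove(c)
    new[c].remove(v)
    new[u].add(c)
    new[c].add(u)
    new[v].add(a)
    new[a].add(v)
    return new


def nni_neighbors(adj, u, v):
    """The two trees obtained by an nni operation on internal edge (u, v)."""
    a = next(x for x in sorted(adj[u]) if x != v)
    return [nni(adj, u, v, a, c) for c in sorted(adj[v]) if c != u]


# Quartet tree: leaves A, B attached to u; leaves C, D attached to v.
tree = {
    "u": {"A", "B", "v"}, "v": {"C", "D", "u"},
    "A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"},
}
first, second = nni_neighbors(tree, "u", "v")
```

Exchanging the other subtree at u instead would produce the same two unrooted topologies, so fixing one side loses nothing.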

An nni operation can also be viewed as mov- 
ing a subtree past a neighboring internal node. 
A more general operation is to transfer a sub- 
tree from one place to another arbitrary place. 
Figure 2 shows such a subtree-transfer operation. 

Nearest Neighbor Interchange and Related Distances, Fig. 1 The two
possible nni operations on an internal edge (u, v): exchange B ↔ C or
B ↔ D

Nearest Neighbor Interchange and Related Distances, Fig. 2 An example
of subtree-transfer

The subtree-transfer distance between two
trees $T_1$ and $T_2$ is the minimum number of
subtrees one needs to move to transform $T_1$
into $T_2$ [5-7]. It is sometimes appropriate in
practice to discriminate among subtree-transfer
operations as they occur with different frequencies.
In this case, one can charge each subtree-transfer
operation a cost equal to the distance
(the number of nodes passed) that the subtree
has moved in the current tree. The linear-cost
subtree-transfer distance, $D_{lcst}(T_1, T_2)$, between
two trees $T_1$ and $T_2$ is then the minimum total
cost required to transform $T_1$ into $T_2$ by
subtree-transfer operations [1, 3].


Weighted Trees 

Both the linear-cost subtree-transfer and nni models
can be naturally extended to weighted trees.
The extension for nni is straightforward: an nni
operation is simply charged a cost equal to the
weight of the edge it operates on. For feasibility
of weighted nni transformation between two
given weighted trees $T_1$ and $T_2$, one also requires
that the following conditions are satisfied: (1) for
each leaf label a, the weight of the edge in $T_1$
incident on a is the same as the weight of the
edge in $T_2$ incident on a and (2) the multisets of
weights of internal edges of $T_1$ and $T_2$ are the
same (Fig. 3).

In the case of linear-cost subtree-transfer,
although the idea is immediate, i.e., a moving subtree
should be charged for the weighted distance
it travels, the formal definition needs some care
and is given below. Consider (unrooted) trees
in which each edge e has a weight $w(e) \ge 0$.
To ensure feasibility of transforming a tree into
another, one requires the total weight of all edges
to equal one. A subtree-transfer is now defined
as follows. Select a subtree S of T at a given
node u and select an edge $e \notin S$. Split the edge e
into two edges $e_1$ and $e_2$ with weights $w(e_1)$ and
$w(e_2)$ ($w(e_1), w(e_2) \ge 0$, $w(e_1) + w(e_2) = w(e)$),
and move S to the common end point of $e_1$ and
$e_2$. Finally, merge the two remaining edges $e'$
and $e''$ adjacent to u into one edge with weight
$w(e') + w(e'')$. The cost of this subtree-transfer
is the total weight of all the edges over which S
is moved. Figure 3 gives an example. The
edge-weights of the given tree are normalized so that
their total sum is 1. The subtree S is transferred
to split the edge $e_4$ into $e_6$ and $e_7$ such that $w(e_6)$,
$w(e_7) \ge 0$ and $w(e_6) + w(e_7) = w(e_4)$; finally,
the two edges $e_1$ and $e_2$ are merged to $e_5$ such that
$w(e_5) = w(e_1) + w(e_2)$. The cost of transferring
S is $w(e_2) + w(e_3) + w(e_6)$.

Note that for weighted trees, the linear-cost 
subtree-transfer model is more general than the 
nni model in the sense that one can slide a subtree 
along an edge with subtree-transfers. Such an 
operation is not realizable with nni moves. 

Nearest Neighbor Interchange and Related Distances, Fig. 3
Subtree-transfer on weighted phylogenies. Tree (b) is obtained from
tree (a) with one subtree-transfer


Key Results 


Let $T_1$ and $T_2$ be the two trees, each with n nodes,
that are being used in the distance computation.


Theorem 1 ([1, 2, 4]) Assume that $T_1$ and $T_2$ are
unweighted. Then, the following results hold:


• $D_{nni}(T_1, T_2) = D_{lcst}(T_1, T_2)$.

• Computing $D_{nni}(T_1, T_2)$ is NP-complete.

• Suppose that $D_{nni}(T_1, T_2) \le d$. Then, an optimal
  sequence of nni operations transforming
  $T_1$ into $T_2$ can be computed in
  $O(n^2 \log n + n \cdot 2^{23d/2})$ time.

• $D_{nni}(T_1, T_2)$ can be approximated to within a
  factor of $\log n + O(1)$ in polynomial time.


Theorem 2 ([1-4]) Assume that $T_1$ and $T_2$ are
weighted. Then, the following results hold:


• $D_{nni}(T_1, T_2)$ can be approximated to within a
  factor of $6 + 6 \log n$ in $O(n^2 \log n)$ time.

• Assume that $T_1$ and $T_2$ are allowed to have
  leaves that are not necessarily uniquely labeled.
  Then, computing $D_{lcst}(T_1, T_2)$ is NP-hard.

• $D_{lcst}(T_1, T_2)$ can be approximated to within a
  factor of 2 in $O(n^2 \log n)$ time.


Applications 


The results reported here are on transformation- 
based distances for evolutionary trees. Such a 
tree can be rooted if the evolutionary origin is 
known and can be weighted if the evolutionary 
length on each edge is known. Reconstructing the
correct evolutionary tree for a set of species is
one of the fundamental yet difficult problems in 
evolutionary genetics. Over the past few decades, 
many approaches for reconstructing evolution- 
ary trees have been developed, including (not 
exhaustively) parsimony, compatibility, distance, 
and maximum likelihood approaches. The out- 
comes of these methods usually depend on the 
data and the amount of computational resources 
applied. As a result, in practice they often lead to 
different trees on the same set of species [8]. It 
is thus of interest to compare evolutionary trees 
produced by different methods or by the same 
method on different data. 

Another motivation for investigating the
linear-cost subtree-transfer distance comes from
the study of recombination. When recombination
of DNA sequences occurs during evolution, two
sequences meet and generate a new sequence, 
consisting of genetic material taken left of the 
recombination point from the first sequence and 
right of the point from the second sequence 
[5, 6]. From a phylogenetic viewpoint, before 
the recombination, the ancestral material on 
the present sequence was located on two 
sequences, one having all the material to the 
left of the recombination point and another 
having all the material to the right of the 
breaking point. As a result, the evolutionary 
history can no longer be described by a single 
tree. The recombination event partitions the 
sequences into two neighboring regions. The 
history for the left and the right regions could 
be described by separate evolutionary trees. The 
recombination makes the two evolutionary trees 
describing neighboring regions differ. However,
two neighboring trees cannot be arbitrarily different;
one must be obtainable from the other by a
subtree-transfer operation. When more than
one recombination occurs, one can describe an
evolutionary history using a list of evolutionary
trees, each corresponding to some region of the
sequences and each obtainable by several
subtree-transfer operations from its predecessor 
[6]. The computation of a linear-cost subtree- 
transfer distance is useful in reconstructing such 
a list of trees based on parsimony [5, 6]. 


Open Problems 


1. Is there a constant ratio approximation 
algorithm for the nni distance on unweighted 
evolutionary trees or is the O(log n)- 
approximation the best possible? 

2. Is the linear-cost subtree-transfer distance NP-hard
to compute on weighted evolutionary
trees if leaf labels are required to be
unique?

3. Can one improve the approximation ratio 
for linear-cost subtree-transfer distance on 
weighted evolutionary trees?


Problem Definition


Let G = (V, E) be an n-vertex, m-edge directed
graph (digraph), whose edges are associated with
a real-valued cost function wt : E → R. The
cost, wt(P), of a path P in G is the sum of the
costs of the edges of P. A simple path C whose
starting and ending vertices coincide is called
a cycle. If wt(C) < 0, then C is called a negative
cycle. The goal of the negative cycle problem is
to detect whether there is such a cycle in a given
digraph G with real-valued edge costs and, if
one exists, to output it.

The negative cycle problem is closely related 
to the shortest path problem. In the latter, a minimum
cost path between two vertices s and t is
sought. It is easy to see that an s-t shortest path 
exists if and only if no s-t path in G contains 
a negative cycle [1, 13]. It is also well-known 
that shortest paths from a given vertex s to all 
other vertices form a tree called shortest path 
tree [1, 13]. 


Key Results 


For the case of general digraphs, the best algorithm
to solve the negative cycle problem (or to
compute the shortest path tree, if such a cycle
does not exist) is the classical Bellman-Ford
algorithm that takes O(nm) time (see e.g., [1]).
Alternative methods with the same time complexity
are given in [4, 7, 12, 13]. Moreover, in [11,
Chap. 7] an extension of the Bellman-Ford algorithm
is described which, in addition to detecting
and reporting the existing negative cycles (if any),
builds a shortest path tree rooted at some vertex
s reaching those vertices u whose shortest s-u
path does not contain a negative cycle. If edge
costs are integers larger than −L (L ≥ 2), then
a better algorithm was given in [6] that runs
in $O(m \sqrt{n} \log L)$ time, and it is based on bit
scaling.
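
As a rough illustration of the O(nm) Bellman-Ford approach mentioned above, the following Python sketch detects a negative cycle and reports its vertices. It is a textbook-style variant, not the specific implementation of any cited reference; the function name `find_negative_cycle` and the edge-list input format are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (tail, head, cost); hypothetical input format


def find_negative_cycle(n: int, edges: List[Edge]) -> Optional[List[int]]:
    """Return the vertices of some negative cycle in order, or None."""
    # Starting every label at 0 acts like a virtual source joined to all
    # vertices by zero-cost edges, so a cycle anywhere in G can be found.
    dist = [0.0] * n
    pred = [-1] * n
    last_updated = -1
    for _ in range(n):
        last_updated = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                last_updated = v
        if last_updated == -1:
            return None  # labels stabilized, so there is no negative cycle
    # A label still changed in the n-th pass, so a negative cycle exists.
    # Following n predecessor links from that vertex lands on the cycle.
    x = last_updated
    for _ in range(n):
        x = pred[x]
    cycle = [x]
    y = pred[x]
    while y != x:
        cycle.append(y)
        y = pred[y]
    cycle.reverse()  # predecessor links run backwards along the cycle
    return cycle
```

Each of the n passes scans all m edges, which gives the O(nm) bound; the cycle is recovered from the predecessor links in O(n) additional time.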

A simple deterministic algorithm that runs in
$O(n^2 \log n)$ expected time with high probability
is given in [10] for a large class of input
distributions, where the edge costs are chosen 
randomly according to the endpoint-independent 
model (this model includes the common case 
where all edge costs are chosen independently 
from the same distribution). 

Better results are known for several important 
classes of sparse digraphs (i.e., digraphs with 
m = O(n) edges) such as planar digraphs, out- 
erplanar digraphs, digraphs of small genus, and 
digraphs of small treewidth.

For general sparse digraphs, an algorithm is
given in [8] that solves the negative cycle problem
in $O(n + \tilde{\gamma}^{1.5} \log \tilde{\gamma})$ time, where $\tilde{\gamma}$ is a topological
measure of the input sparse digraph G
whose value varies from 1 up to O(n). Informally,
$\tilde{\gamma}$ represents the minimum number of outerplanar
subgraphs, satisfying certain separation
properties, into which G can be decomposed. In
particular, $\tilde{\gamma}$ is proportional to $\gamma(G) + q$, where
G is supposed to be embedded into an orientable
surface of genus $\gamma(G)$ so as to minimize the
number q of faces that collectively cover all
vertices. For instance, if G is outerplanar, then
$\tilde{\gamma} = 1$, which implies an optimal O(n)-time
algorithm for this case. The algorithm in [8] does
not require such an embedding to be provided
by the input. In the same paper, it is shown
that random $G_{n,p}$ graphs with threshold function
1/n are planar with probability one and have an
expected value for $\tilde{\gamma}$ equal to O(1). Furthermore,
an efficient parallelization of the algorithm on the
CREW PRAM model of computation is provided
in [8].

Better bounds for planar digraphs are as follows.
If edge costs are integers, then an algorithm
running in $O(n^{4/3} \log(nL))$ time is given
in [9]. For real edge costs, an $O(n \log^3 n)$-time
algorithm was given in [5].

An optimal O(n)-time algorithm is given in [3] 
for the case of digraphs with small treewidth 
(and real edge costs). Informally, the treewidth 
t of a graph G is a parameter which measures
how close the structure of G is to a tree. For
instance, the class of graphs of small treewidth
includes series-parallel graphs (t = 2) and outerplanar
graphs (t = 2). An optimal parallel algorithm
for the same problem, on the EREW PRAM
model of computation, is provided in [2]. 


Applications 


Finding negative cycles in a digraph is a funda- 
mental combinatorial and network optimization 
problem that spans a wide range of applications 
including: shortest path computation, two dimen- 
sional package element, minimum cost flows, 
minimal cost-to-time ratio, model verification, 
compiler construction, software engineering,
VLSI design, scheduling, circuit production,
constraint programming and image processing. 
For instance, the isolation of negative feedback 
loops is imperative in the design of VLSI circuits. 
It turns out that such loops correspond to negative 
cost cycles in the so-called amplifier-gain graph 
of the circuit. In constraint programming, it 
is required to check the feasibility of sets of 
constraints. Systems of difference constraints 
can be represented by constraint graphs, and 
one can show that such a system is feasible if 
and only if there are no negative cost cycles 
in its corresponding constraint graph. In zero- 
clairvoyant scheduling, the problem of checking 
whether there is a valid schedule in such 
a scheduling system can be reduced to detecting 
negative cycles in an appropriately defined 
graph. For further discussion on these and other 
applications see [1, 12, 14]. 
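
The link between difference constraints and negative cycles can be made concrete with a small Python sketch. It assumes constraints of the form x_j − x_i ≤ b, each encoded as a hypothetical triple (j, i, b), and uses plain Bellman-Ford relaxation rather than any of the faster algorithms cited in this entry.

```python
from typing import List, Optional, Tuple

Constraint = Tuple[int, int, float]  # (j, i, b) encodes x_j - x_i <= b


def solve_difference_constraints(
    n: int, constraints: List[Constraint]
) -> Optional[List[float]]:
    """Return a feasible assignment x[0..n-1], or None if none exists."""
    # Constraint graph: one edge i -> j of cost b per constraint.
    edges = [(i, j, b) for (j, i, b) in constraints]
    # Zero initial labels play the role of a virtual source vertex.
    x = [0.0] * n
    for _ in range(n):
        changed = False
        for i, j, b in edges:
            if x[i] + b < x[j]:
                x[j] = x[i] + b
                changed = True
        if not changed:
            return x  # shortest-path labels satisfy every constraint
    return None  # still changing after n passes: negative cycle, infeasible
```

When the system is feasible, the returned labels are shortest-path distances from the virtual source, so x[j] ≤ x[i] + b holds for every edge, which is exactly the constraint x_j − x_i ≤ b.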


Open Problems 


The negative cycle problem is closely related 
to the shortest path problem. The existence 
of negative edge costs makes the solution of 
the negative cycle problem or the computation 
of a shortest path tree more difficult and 
thus more time consuming compared to the 
time required to solve the shortest path tree 
problem in digraphs with non-negative edge 
costs. For instance, for digraphs with real edge 
costs, compare the O(nm)-time algorithm in
the former case with the $O(m + n \log n)$-time
algorithm for the latter case (Dijkstra’s algorithm
implemented with an efficient priority queue; see 
e.g., [1]). 

It would therefore be interesting to try to 
reduce the gap between the above two time com- 
plexities, even for special classes of graphs or the 
case of integer costs. 

The only case where these two complexities 
coincide concerns the digraphs of small 
treewidth [3], making it the currently most 
general such class of graphs. For planar digraphs, 
the result in [5] is only a polylogarithmic factor 
away from the O(n)-time algorithm in [9] that 
computes a shortest path tree when the edge 
costs are non-negative.


Experimental Results


An experimental study for the negative cycle 
problem is conducted in [4]. In that paper, several 
methods that combine a shortest path algorithm 
(based on the Bellman-Ford approach) with a cycle
detection strategy are investigated, along with
some new variations of them. It turned out that the
performance of algorithms for the negative cycle 
problem depends on the number and the size of 
the negative cycles. This gives rise to a collection 
of problem families for testing negative cycle 
algorithms. 

A follow-up of the above study is presented 
in [14], where two new heuristics are introduced 
and are incorporated into three of the algorithms
considered in [4] (the original Bellman-Ford and
the variations in [13] and [7]), achieving dramatic
improvements. The data sets considered in [14] 
are those in [4]. 


Data Sets 


Data set generators and problem families are
described in [4], and are available from
http://www.avglab.com/andrew/soft.html.


URL to Code 


The code used in [4] is available from
http://www.avglab.com/andrew/soft.html.


Problem Definition


Over the last few decades, a wide variety of 
networks have emerged. The general structure 
of these networks including their global con- 
nectivity properties has been studied extensively. 
On the other hand, strategic aspects of them 
are also very interesting to explore by consid- 
ering the nodes as independent agents. The ex- 
citing area of network creation games attempts 
to understand how real-world networks (such as 
the Internet) develop when multiple independent 
agents (e.g., ISPs) build pieces of the network to 
selfishly improve their own objective functions 
which heavily depend on their connectivity prop- 
erties. 

We start by elaborating on these connectivity
objectives and their relation to the global design
and structure of the network. Network design is 
a fundamental family of problems at the intersec- 
tion between computer science and operations re- 
search, amplified in importance by the sustained 
growth of computer networks such as the Inter- 
net. Traditionally, the goal is to find a minimum- 
cost (sub) network that satisfies some specified 
property such as k-connectivity or connectivity 
on terminals (as in the classic Steiner tree prob- 
lem). Such a formulation captures the (possibly 
incremental) creation cost of the network but does 
not incorporate the cost of actually using the 
network. By contrast, network routing has the 
goal of optimizing the usage cost of the network 
but assumes that the network has already been 
created. The network creation game attempts to 
unify the network design and network routing 
problems by modeling both creation and usage 
costs. Specifically, each node in the system is 
an independent selfish agent that can create a 
link (edge) to any other node, at a cost of α.
In addition to these creation costs, each node
incurs a usage cost related to the distances to
the other nodes. In the model introduced by
Fabrikant, Luthra, Maneva, Papadimitriou, and
Shenker [11], the usage cost incurred by a node
is the sum of distances to all other nodes. Equivalently,
if we divide the cost (and thus α) by the
number n of nodes, the usage cost is the average
distance to other nodes. In another natural
model, the usage cost incurred by a node is the 
maximum distance to all other nodes: this model 
captures the worst-case instead of average-case 
behavior of routing. To model the dominant be- 
havior of large-scale networking scenarios such 
as the Internet, we consider each node to be an 
agent (player) [12] that selfishly tries to minimize 
its own creation and usage costs [1, 6, 11]. In 
this context, the price of anarchy [14, 15, 17] is 
the worst possible ratio of the total cost found 
by some independent selfish behavior and the 
optimal total cost possible by a centralized, so- 
cial welfare-maximizing solution. The price of 
anarchy is a well-studied concept in algorithmic 
game theory for problems such as load balancing, 
routing, and network design; see, e.g., [1, 3-7,
11, 15, 16].


Equilibria To model the dominant behavior of 
large-scale networking scenarios such as the In- 
ternet, we consider the case where every node 
(player) selfishly tries to minimize its own cre- 
ation and usage cost. This game-theoretic setting 
naturally leads to the various kinds of equilibria 
and the study of their structure. Two frequently 
considered notions are Nash equilibrium, where 
no player can change its strategy (which edges to 
buy) to locally improve its cost, and strong Nash 
equilibrium, where no coalition of players can 
change their collective strategy to locally improve 
the cost of each player in the coalition. Nash 
equilibria capture the combined effect of both 
selfishness and lack of coordination, while strong 
Nash equilibria separate these issues, enabling 
coordination and capturing the specific effect 
of selfishness. However, the notion of strong 
Nash equilibrium is extremely restrictive in our 
context, because all players can simultaneously 
change their entire strategies, abusing the local 
optimality intended by original Nash equilibria 
and effectively forcing globally near-optimal so- 
lutions. Thus it makes sense to focus on weaker 
notions of equilibria.


Structure of equilibria What structural properties
can be predicted about equilibria in network
creation games? For example, Fabrikant
et al. [11] conjectured that equilibrium graphs in
the unilateral model were all trees, but this is not
always the case, as shown by Albers et al. [1].
One particularly interesting structural feature is
whether all equilibrium graphs have small di- 
ameter (say, polylogarithmic), analogous to the 
small-world phenomenon. A closely related issue 
is the price of anarchy, that is, the worst possible 
ratio of the total cost of an equilibrium (found 
by independent selfish behavior) and the optimal 
total cost possible by a centralized solution (max- 
imizing social welfare). The price of anarchy is a 
well-studied concept in algorithmic game theory 
for problems such as load balancing, routing, 
and network design. Upper bounds on diameter 
of equilibrium graphs translate to approximately 
equal upper bounds on the price of anarchy but 
not necessarily vice versa. 


Notation 

Formally, we define four games depending on 
the objective (sum or max) and the consent 
(unilateral or bilateral). In all versions, we have 
n players; call them 1, 2, ..., n. The strategy
of player i is specified by a subset $s_i$ of
$\{1, 2, \ldots, n\} \setminus \{i\}$, which corresponds to the set
of neighbors to which player i forms a link.
Together, let $s = \{s_1, s_2, \ldots, s_n\}$ denote the
strategies of all players.

To define the cost of a strategy, we introduce
an undirected graph $G_s$ with vertex set
$\{1, 2, \ldots, n\}$. In the unilateral game, $G_s$ has an
edge (i, j) if either $i \in s_j$ or $j \in s_i$. In the
bilateral game, $G_s$ has an edge (i, j) if both
$i \in s_j$ and $j \in s_i$. Define $d_s(i, j)$ to be
the distance (the number of edges in a shortest
path) between vertices i and j in graph $G_s$.
In the sum game, the cost incurred by player i
is $c_i(s) = \alpha |s_i| + \sum_{j=1}^{n} d_s(i, j)$, and in the max
game, the cost incurred by player i is
$c_i(s) = \alpha |s_i| + \max_{j=1}^{n} d_s(i, j)$. In both cases, the total
cost incurred by strategy s is $c(s) = \sum_{i=1}^{n} c_i(s)$.
In the unilateral game, a (pure) Nash equilibrium
is a strategy s such that $c_i(s) \le c_i(s')$ for all
strategies s′ that differ from s in only one player i.
The price of anarchy is then the maximum cost of
a Nash equilibrium divided by the minimum cost
of any strategy (called the social optimum).
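
The definitions above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: players are numbered 0 to n−1 instead of 1 to n, the helper names (`build_graph`, `distances_from`, `player_cost`, `is_nash`) are invented for this example, and the equilibrium test simply enumerates every unilateral deviation, so it is exponential in n and meant for tiny instances.

```python
from collections import deque
from itertools import combinations
from math import inf


def build_graph(strategies, bilateral=False):
    """Adjacency sets of G_s; strategies[i] is the set s_i of player i."""
    n = len(strategies)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            i_wants, j_wants = j in strategies[i], i in strategies[j]
            linked = (i_wants and j_wants) if bilateral else (i_wants or j_wants)
            if linked:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def distances_from(adj, source):
    """BFS distances d_s(source, .); unreachable vertices get infinity."""
    dist = [inf] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def player_cost(strategies, i, alpha, objective="sum", bilateral=False):
    """c_i(s) = alpha * |s_i| plus the sum or the max of distances from i."""
    dist = distances_from(build_graph(strategies, bilateral), i)
    usage = sum(dist) if objective == "sum" else max(dist)
    return alpha * len(strategies[i]) + usage


def is_nash(strategies, alpha, objective="sum"):
    """Brute-force check of the unilateral Nash condition."""
    n = len(strategies)
    for i in range(n):
        current = player_cost(strategies, i, alpha, objective)
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for links in combinations(others, size):
                trial = list(strategies)
                trial[i] = set(links)
                if player_cost(trial, i, alpha, objective) < current:
                    return False
    return True
```

For example, with four players and the star `[{1, 2, 3}, set(), set(), set()]`, a leaf pays 1 + 2 + 2 = 5 in the sum game; buying an extra link to another leaf would cost α + 4, so the star passes `is_nash` for α = 2 but not for α = 0.5.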

In the bilateral game, Nash equilibria are not 
so interesting because the game requires coalition 
between two players to create an edge (in general).
For example, if every player i chooses the
empty strategy $s_i = \emptyset$, then we obtain a Nash
equilibrium inducing an empty graph $G_s$, which
has an infinite cost c(s). To address this issue,
Corbo and Parkes [6] use the notion of pairwise
stability [13]: a strategy is pairwise stable if (1)
for any edge (i, j) of $G_s$, both $c_i(s) \le c_i(s')$ and
$c_j(s) \le c_j(s')$, where s′ differs from s only in
deleting edge (i, j) from $G_s$, and (2) for any non-edge
(i, j) of $G_s$, either $c_i(s) < c_i(s')$ or
$c_j(s) < c_j(s')$, where s′ differs from s only in adding
edge (i, j) to $G_s$. The price of anarchy is then
the maximum cost of a pairwise-stable strategy 
divided by the social optimum (the minimum cost 
of any strategy). 


Key Results 


We start with the sum unilateral games. Fabrikant
et al. [11] introduce these games and prove an
upper bound of $O(\sqrt{\alpha})$ on the price of anarchy
for all α. Albers et al. [1] prove that the price
of anarchy is constant for $\alpha = O(\sqrt{n})$, as
well as for the larger range $\alpha \ge 12 n \lg n$.
In addition, Albers et al. prove a general upper
bound of $15\left(1 + \left(\min\{\alpha^2/n, n^2/\alpha\}\right)^{1/3}\right)$. The latter
bound shows the first sublinear worst-case
bound, $O(n^{1/3})$, for all α. Demaine et al. [10]
prove the first $o(n^{\varepsilon})$ upper bound on the price
of anarchy for general α, namely, $2^{O(\sqrt{\lg n})}$.
They also prove that the price of anarchy is constant
for $\alpha = O(n^{1-\varepsilon})$ for any fixed ε > 0,
substantially reducing the range of α for which
constant bounds have not been obtained. Demaine
et al. also prove that in the max unilateral games,
the price of anarchy is at most 2 for $\alpha \ge n$,
$O(\min\{4^{\sqrt{\log n}}, (n/\alpha)^{1/3}\})$ for $2\sqrt{\log n} \le \alpha \le n$,
and $O(n^{2/\alpha})$ for $\alpha < 2\sqrt{\log n}$.
Alon et al. [2] consider a natural version of
network creation games in which nodes can only
switch their edges instead of drastically changing
their strategies. In these simpler games, they
achieve similar bounds on the price of anarchy.
The advantage of their model is its simplicity
in the agents' strategies at each point and
the fact that there is no α to be considered in
their model.

The bilateral variation on the network cre- 
ation game, considered by Corbo and Parkes [6], 
requires both nodes to agree before they can 
create a link between them. In the sum bilateral 
network creation game, Corbo and Parkes prove 
that the price of anarchy is $O(\min\{\sqrt{\alpha}, n/\sqrt{\alpha}\})$.
Demaine et al. [10] prove that this upper bound
is tight by showing a matching lower bound
of $\Omega(\min\{\sqrt{\alpha}, n/\sqrt{\alpha}\})$. For the max bilateral
case, Demaine et al. show that the price of anarchy
is $\Theta(n/(\alpha + 1))$ for $\alpha \le n$ and at most 2 for
$\alpha > n$.

Finding a polylogarithmic upper bound on 
price of anarchy for all values of a in these 
four network creation settings remains an open 
problem. In an effort to reduce the upper bounds, 
Demaine et al. [9] introduce the cooperative network
creation games, in which all agents can contribute
to the construction of any edge even if they
are not an endpoint of the edge. They prove that
in the sum cooperative network creation game 
the price of anarchy is at most polylogarithmic 
in terms of the number of nodes. As a result, 
they exhibit the small-world phenomenon (poly- 
logarithmic diameter) in the equilibrium graphs 
of these games. To reduce the price of anarchy 
even further, Demaine et al. [8] consider a special 
version of network creation games, and using 
some kind of an advertising campaign, they show 
that the price of anarchy is a constant number 
independent of the number of nodes. 


Techniques 

To keep this survey of results short, we only
overview some of the combinatorial techniques
in this area. Albers et al. [1] observe that
any node u has the option of just connecting to
another node v and exploiting the BFS tree rooted
at v. In an equilibrium graph $G_s$, this should not
be a better strategy for u. Applying this trick and
summing up all these inequalities for different
choices of u, Albers et al. prove that for any Nash
equilibrium s and any vertex v in $G_s$, the cost c(s)
is at most $2\alpha(n - 1) + n \, \mathrm{Dist}(v) + (n - 1)^2$, where
$\mathrm{Dist}(v) = \sum_{v' \in V(G_s)} d_s(v, v')$.

Demaine et al. use this lemma to prove that the
price of anarchy is O(D), where D is the diameter
of the graph. To upper bound the diameter, they
develop different techniques for different ranges
of α. For instance, for $\alpha = O(n^{1-\varepsilon})$, they
prove that the neighborhood sizes around any
node grow exponentially with a rate of
$n/\alpha = \Omega(n^{\varepsilon})$. Formally, they prove that when the radius
of the neighborhood around a node is doubled,
the number of nodes inside the neighborhood
is multiplied by n/α until this radius becomes
comparable with the diameter D of the graph.
Clearly, it takes O(1/ε) rounds of doubling the
neighborhood radius to cover all nodes, which
means that the diameter is at most exponential
in 1/ε, which is a constant for a fixed ε.
For other ranges of α, more complicated bounds
are needed.


Problem Definition


In this entry, the following two problems are 
considered: (1) the problem of finding an approx- 
imate Nash equilibrium in a positively normal- 
ized bimatrix (or two-player) game; and (2) the 
smoothed complexity of finding an exact Nash 
equilibrium in a bimatrix game. It turns out that 
these two problems are strongly correlated [3]. 
Let G = (A, B) be a bimatrix game, where
$A = (a_{i,j})$ and $B = (b_{i,j})$ are both n × n matrices.
Game G is said to be positively normalized
if $0 \le a_{i,j}, b_{i,j} \le 1$ for all $1 \le i, j \le n$.

Let $P^n$ denote the set of all probability vectors
in $\mathbb{R}^n$, i.e., non-negative vectors whose entries
sum to 1. A Nash equilibrium [8] of G = (A, B)
is a pair of mixed strategies $(x^* \in P^n, y^* \in P^n)$
such that for all $x, y \in P^n$,


$(x^*)^T A y^* \ge x^T A y^*$ and $(x^*)^T B y^* \ge (x^*)^T B y$,


while an ε-approximate Nash equilibrium is
a pair $(x^* \in P^n, y^* \in P^n)$ that satisfies


$(x^*)^T A y^* \ge x^T A y^* - \varepsilon$ and
$(x^*)^T B y^* \ge (x^*)^T B y - \varepsilon$, for all $x, y \in P^n$.
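
Checking whether a given pair of mixed strategies is an ε-approximate Nash equilibrium is easy, even though finding one may be hard. The Python sketch below relies on the standard fact that a linear function over the probability simplex is maximized at a pure strategy, so only the n rows and n columns need to be examined. The function names `nash_gap` and `is_approx_nash` are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def nash_gap(A: Matrix, B: Matrix, x: List[float], y: List[float]) -> float:
    """Largest gain any one player can get by deviating from (x, y)."""
    n = len(A)
    # Payoff of each pure row against y, and of each pure column against x.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    col_payoffs = [sum(x[i] * B[i][j] for i in range(n)) for j in range(n)]
    row_value = sum(x[i] * row_payoffs[i] for i in range(n))  # x^T A y
    col_value = sum(col_payoffs[j] * y[j] for j in range(n))  # x^T B y
    return max(max(row_payoffs) - row_value, max(col_payoffs) - col_value)


def is_approx_nash(A: Matrix, B: Matrix, x: List[float], y: List[float],
                   eps: float) -> bool:
    """True if (x, y) satisfies both inequalities of the definition above."""
    return nash_gap(A, B, x, y) <= eps
```

For the positively normalized matching-pennies game A = [[1, 0], [0, 1]], B = [[0, 1], [1, 0]], the uniform strategies have gap 0, while the pure profile x = y = (1, 0) has gap 1 because the column player can gain 1 by switching columns.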

In the smoothed analysis [11] of bimatrix
games, a perturbation of magnitude σ > 0 is
first applied to the input game: For a positively
normalized n × n game G = (A, B), let $\bar{A}$ and $\bar{B}$
be two matrices with


$\bar{a}_{i,j} = a_{i,j} + r^A_{i,j}$ and $\bar{b}_{i,j} = b_{i,j} + r^B_{i,j}$, for all $1 \le i, j \le n$,


where $r^A_{i,j}$ and $r^B_{i,j}$ are chosen independently and
uniformly from the interval [−σ, σ] or from a Gaussian
distribution with variance σ². These two kinds
of perturbations are referred to as σ-uniform
and σ-Gaussian perturbations, respectively. An
algorithm for bimatrix games has polynomial
smoothed complexity (under σ-uniform or
σ-Gaussian perturbations) [11] if it finds a Nash
equilibrium of game $(\bar{A}, \bar{B})$ in expected time
poly(n, 1/σ), for all (A, B).


Key Results 


The complexity class PPAD [9] is defined in the
entry “Complexity of Bimatrix Nash Equilibria.”
The following theorems are proved in [3]. 


Theorem 1 For any constant c > 0, the problem
of computing a $1/n^c$-approximate Nash
equilibrium of a positively normalized n × n bimatrix
game is PPAD-complete.


Theorem 2 The problem of computing a Nash
equilibrium in a bimatrix game is not in smoothed
polynomial time, under uniform or Gaussian perturbations,
unless PPAD ⊆ RP.


Corollary 1 The smoothed complexity of the
Lemke-Howson algorithm is not polynomial, under
uniform or Gaussian perturbations, unless
PPAD ⊆ RP.


Applications 


See the entry “Complexity of Bimatrix Nash Equilibria.”


Open Problems 


There remains a complexity gap on the approximation
of Nash equilibria in bimatrix games: The
result of [7] shows that an ε-approximate Nash
equilibrium can be computed in $n^{O(\log n / \varepsilon^2)}$ time,
while [3] shows that no algorithm can find an
ε-approximate Nash equilibrium in poly(n, 1/ε)
time for ε of order 1/poly(n), unless PPAD is in
P. However, the hardness result of [3] does not
cover the case when ε is a constant between 0
and 1. Naturally, it is unlikely that the problem
of finding an ε-approximate Nash equilibrium is
PPAD-complete when ε is an absolute constant,
for otherwise, all the search problems in PPAD
would be solvable in $n^{O(\log n)}$ time, due to the
result of [7]. An interesting open problem is whether,
for every constant ε > 0, there is a polynomial-time
algorithm for finding an ε-approximate Nash
equilibrium. The following conjectures are proposed
in [3]:

Conjecture 1 There is an $O(n^{k + \varepsilon^{-c}})$-time algorithm
for finding an ε-approximate Nash equilibrium
in a bimatrix game, for some constants c
and k.


Conjecture 2 There is an algorithm to find
a Nash equilibrium in a bimatrix game
with smoothed complexity $O(n^{k + \sigma^{-c}})$ under
perturbations with magnitude σ, for some
constants c and k.


It is also conjectured in [3] that Corollary 1 
remains true without any complexity assumption 
on class PPAD. A positive answer would extend 
the result of [10] to the smoothed analysis framework.


Problem Definition


Phylogenies are binary trees whose leaves are 
labeled with distinct leaf labels. This entry
is concerned with a well-known
measurement, called the non-shared edge distance,
for comparing the dissimilarity between two phy- 
logenies. Roughly speaking, the non-shared edge 
distance counts the number of edges that differ- 
entiate one phylogeny from the other. 

Let e be an edge in a phylogeny T. Removing
e from T splits T into two subtrees. The leaf 
labels are partitioned into two subsets according 
to the subtrees. The edge e is said to induce 
a partition of the set of leaf labels. Given two 
phylogenies T and T’ having the same number 
of leaves with the same set of leaf labels, an 
edge e in T is shared if there exists some edge 
e’ in T’ such that the edges e and e’ induce the 
same partition of the set of leaf labels in their
corresponding tree. Otherwise, e is non-shared.
Notice that T and T’ have the same number of 
edges, so that the number of non-shared edges in 
T (with respect to T’) is the same as the number 
of non-shared edges in T’ (with respect to T). 
Such a number is called the non-shared edge 
distance between T and T’. Two problems are 
defined as follows: 


Non-shared Edge Distance Problem 

INPUT: Two phylogenies on the same set of leaf
labels

OUTPUT: The non-shared edge distance between
the two input phylogenies


All-Pairs Non-shared Edge Distance Problem

INPUT: A collection of phylogenies on the same
set of leaf labels

OUTPUT: The non-shared edge distance between
each pair of the input phylogenies


Extension 

Phylogenies that are commonly used in practice 
have weights associated to the edges. The notion 
of non-shared edge can be easily extended for 
edge-weighted phylogenies. In this case, an edge 
e will induce a partition of the set of leaf labels as 
well as the multi-set of edge weights (here, edge 
weights are allowed to be non-distinct). Given 
two edge-weighted phylogenies R and R’ having 
the same set of leaf labels and the same multi- 
set of edge weights, an edge e in R is shared 
if there exists some edge e’ in R’ such that the 
edges e and e’ induce the same partition of the set 
of leaf labels and the multi-set of edge weights. 
Otherwise, e is non-shared. The non-shared edge 
distance between R and R’ is similarly defined, 
giving the following problem: 


General Non-shared Edge Distance Problem

INPUT: Two edge-weighted phylogenies on the
same set of leaf labels and same multi-set of
edge weights

OUTPUT: The non-shared edge distance between
the two input phylogenies


Key Results


Day [3] proposed the first linear-time algorithm 
for the Non-shared Edge Distance Problem. 


Theorem 1 Let T and T' be two input phylo- 
genies with the same set of leaf labels and n be 
the number of leaves in each phylogeny. The non- 
shared edge distance between T and T’ can be 
computed in O(n) time. 
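

To illustrate the definition (not Day's linear-time algorithm), the following Python sketch computes the distance by collecting the leaf bipartition induced by every edge of each tree and comparing the two collections. It assumes each unrooted phylogeny is given as a list of edges, that leaf labels are mutually comparable (for example strings), and that internal node names are arbitrary; the function names are invented for this example, and the running time is quadratic in the worst case.

```python
def edge_splits(edges):
    """Set of leaf bipartitions induced by the edges of an unrooted tree.

    Each bipartition is stored as the side that avoids a fixed anchor leaf.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    anchor = min(leaves)  # same anchor in both trees: same label set
    # Root the tree at the anchor leaf and record a top-down visiting order.
    parent = {anchor: None}
    order = [anchor]
    for v in order:  # the list grows while it is being scanned
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    below = {v: set() for v in adj}
    splits = set()
    for v in reversed(order):  # children are handled before their parents
        if v in leaves:
            below[v].add(v)
        if parent[v] is not None:
            splits.add(frozenset(below[v]))  # edge (v, parent[v])
            below[parent[v]] |= below[v]
    return splits


def non_shared_edge_distance(edges1, edges2):
    """Number of edges of the first tree whose split is absent in the second."""
    return len(edge_splits(edges1) - edge_splits(edges2))
```

For the two four-leaf phylogenies ab|cd and ac|bd, all leaf edges are shared and only the single internal edge differs, so the distance is 1.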


Let A be a collection of k phylogenies on
the same set of leaf labels and n be the number
of leaves in each phylogeny. The All-Pairs Non-shared
Edge Distance Problem can be solved by
applying Theorem 1 on each pair of phylogenies,
thus solving the problem in a total of $O(k^2 n)$
time. Pattengale and Moret [9] proposed a randomized
result based on [7] to solve the problem
approximately, whose running time is faster when
$n < k < 2^n$.


Theorem 2 Let ε be a parameter with ε > 0.
Then, there exists a randomized algorithm such
that with probability at least $1 - k^{-2}$, the non-shared
edge distance between each pair of phylogenies
in A can be approximated within a factor
of (1 + ε) from the actual distance; the running
time of the algorithm is $O(k(n^2 + k \log k)/\varepsilon^2)$.


For general phylogenies, let R and R’ be two 
input phylogenies with the same set of leaf labels 
and the same multi-set of edge weights and n 
be the number of leaves in each phylogeny. The
General Non-shared Edge Distance Problem can
be solved easily in $O(n^2)$ time by applying Theorem 1
in a straightforward manner. The running
time is improved by Hon et al. in [5].


Theorem 3 The non-shared edge distance between
R and R′ can be computed in $O(n \log n)$
time.


Applications 


Phylogenies are commonly used by biologists 
to model the evolutionary relationship among 
species. Many reconstruction methods (such as 
maximum parsimony, maximum likelihood, compatibility,
distance matrix) produce different phylogenies
based on the same set of species, and it is
interesting to compute the dissimilarities between 
them. Also, through the comparison, information 
about rare genetic events such as recombinations 
or gene conversions may be uncovered. The most 
common dissimilarity measure is the Robinson-Foulds
metric [11], which is exactly the same as
the non-shared edge distance. 
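
As an illustration, the distance can be computed directly from the leaf bipartitions induced by internal edges. The sketch below is a simple quadratic-time version, not the O(n) algorithm of Theorem 1. It assumes unweighted phylogenies and assumes that the distance counts the internal edges whose bipartition occurs in exactly one of the two trees. The nested-tuple tree encoding and all function names are hypothetical choices made for this example.

```python
def leaf_set(tree):
    """All leaf labels of a tree given as nested tuples (assumed encoding)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial leaf bipartitions induced by the internal edges of the tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so that equal bipartitions compare equal.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Skip trivial splits (single leaves and the empty side at the root).
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def non_shared_edge_distance(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(non_shared_edge_distance(t1, t2))
```

The two example trees share the edge separating {d, e} from the rest and differ in one edge each, so the sketch reports a distance of 2.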

Other dissimilarity measures, such as the 
nearest-neighbor interchange (NNI) distance 
and the subtree-transfer (STT) distance (see [2] 
for details), are also proposed in the literature. 
These measures are sometimes preferred by the 
biologists since they can be used to deduce the 
biological events that create the dissimilarity. 
Nevertheless, these measures are usually difficult 
to compute. In particular, computing the NNI 
distance and the STT distance is shown to be NP- 
hard by DasGupta et al. [1, 2]. Approximation 
algorithms are devised for these problems 
(NNI, [4, 8]; STT, [1, 6]). Interestingly, all 
these algorithms make use of the non-shared 
edge distance to bound their approximation 
ratios. 
Problem Definition


Many common computational problems on 
directed graphs are computationally intractable; 
they are NP-complete and sometimes even 
harder. Examples include domination problems 
such as directed dominating set, Kernel, directed 
Steiner networks, directed disjoint paths, and 
many other problems. 

For undirected graphs, there is an extensive
structure theory available to help deal with this
computational intractability. In particular, there
is a well-developed hierarchy of classes of undirected
graphs and a rich set of algorithmic tools
that allow hard computational problems to be solved
on these classes of graphs. Most notable in
this context are classes of graphs of bounded tree
width, planar graphs or graphs embeddable on
any other fixed surface, classes excluding a fixed
minor, and many other graph classes. This theory
is closely related to parameterized complexity
theory.

For directed graphs, to date, there is no compa- 
rable theory available. A directed version of tree 
width was introduced by Reed [9] and Johnson 
et al. [4]. Further proposals for "tree width"-like
width measures for directed graphs have been 
made in the literature; see, e.g., references in 
[1]. Algorithmically, the main application is that 
on classes of bounded directed tree width, the 
directed k-disjoint paths problem can be solved 
in polynomial time for any fixed value of k. 

Almost all of these proposals have in common
that the class of acyclic digraphs (DAGs)
has small width, i.e., acyclic digraphs are taken
as particularly simple digraphs. While this is
certainly useful for problems such as directed disjoint
paths, problems such as directed dominating
set remain NP-complete and fixed-parameter
intractable on acyclic directed graphs.

What is needed, therefore, are digraph param- 
eters and structural classes of digraphs which 
separate acyclic digraphs into simple and hard 
instances. Nowhere crownful classes propose a 
solution to this problem based on the concept of 
excluded directed minors. 


Key Results 


While there is a well-defined concept of a mi- 
nor for undirected graphs, there is as yet no 
commonly agreed concept of directed minors. A 
widely used, and very conservative, version of di- 
rected minor is a butterfly minor (see, e.g., [4]) in 
which a directed edge (u, v) is contractible if it is 
the only outgoing edge of u or the only incoming 
edge of v. In [5] a much more general concept of 
directed minors is used to give a classification of 
classes of digraphs in terms of shallow directed 
minors. For the sake of brevity, we introduce 
directed minors here only for digraphs called
crowns, which is enough for defining nowhere 
crownful classes of digraphs. 

An out-branching is a digraph H whose un- 
derlying undirected graph is a tree and in which 
there is a unique vertex r, the root of H, such 
that all edges are oriented away from the root, 
i.e., every vertex in H is reachable by a unique 
directed path from the root. An in-branching is 
the same as an out-branching but all edges are 
oriented towards the root. 


Definition 1 A crown of order q, for q > 0, is
the graph S_q with

• V(S_q) := {v_1, ..., v_q} ∪ {u_{i,j} : 1 ≤ i < j ≤ q} and

• E(S_q) := {(u_{i,j}, v_i), (u_{i,j}, v_j) : 1 ≤ i < j ≤ q}.
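
The crown of order q has one vertex v_i for each index i and one vertex u_{i,j} for each pair i < j, with u_{i,j} pointing to v_i and v_j. A minimal sketch of this construction follows; the tuple-based vertex names and the function name are assumptions made for this example only.

```python
from itertools import combinations


def crown(q):
    """Return (vertices, edges) of the crown of order q.

    Vertices are encoded as ("v", i) and ("u", i, j) with 1 <= i < j <= q
    (a hypothetical encoding chosen for this sketch).
    """
    vs = [("v", i) for i in range(1, q + 1)]
    us = [("u", i, j) for i, j in combinations(range(1, q + 1), 2)]
    edges = []
    for (_, i, j) in us:
        # Each pair vertex u_{i,j} has exactly two out-neighbours: v_i and v_j.
        edges.append((("u", i, j), ("v", i)))
        edges.append((("u", i, j), ("v", j)))
    return vs + us, edges


vertices, edges = crown(4)
print(len(vertices), len(edges))
```

Every edge leaves a u-vertex and enters a v-vertex, so a crown is acyclic with q + q(q − 1)/2 vertices and q(q − 1) edges.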


Definition 2 Let H with V(H) := {v_1, ..., v_q}
∪ {u_{i,j} : 1 ≤ i < j ≤ q} be a crown of order
q, for some q > 0. A digraph G contains H as a
directed minor if for every v_i, 1 ≤ i ≤ q, there
is an in-branching T_i ⊆ G and for every u_{i,j},
1 ≤ i < j ≤ q, there is an out-branching S_{i,j} ⊆
G such that all subgraphs T_i, S_{i,j} are pairwise
vertex disjoint and for all 1 ≤ i < j ≤ q, there
are edges e_i, e_j from a vertex in S_{i,j} to a vertex
in T_i and T_j, respectively.

H is a depth-r minor of G, or an r-shallow
minor of G, denoted H ≼_r G, for some r > 0, if
all S_{i,j} and all T_i are of height at most r.


Definition 3 A class C of directed graphs is
nowhere crownful if for every r > 0 there exists
a q = q(r) such that S_q ⋠_r G for all G ∈ C. If
the function taking each r to q(r) as above is
computable, then we call C effectively nowhere
crownful.


Nowhere crownful classes of digraphs are very
general. For instance, let C be a class of digraphs
and let C̄ be the class of underlying undirected graphs
(obtained from digraphs in C by ignoring edge
direction). If C̄ has bounded genus, excludes
a fixed minor, or is nowhere dense, then C is
nowhere crownful. But there are nowhere crownful
classes C of digraphs such that C̄ does not
have any of the properties above. On the other
hand, the class of acyclic digraphs is not nowhere
crownful as it contains every crown.

Nowhere crownful classes of digraphs can be
characterized equivalently as follows. Let G be
a digraph and d > 0. A set U ⊆ V(G) is d-scattered
if there are no v ∈ V(G) and u ≠ u' ∈ U
with u, u' ∈ N_d^+(v). That is, no two elements of
U can be reached from a single vertex v by paths
of length at most d.
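
The condition can be checked directly by a bounded breadth-first search from every vertex. The following sketch does so under two assumptions: the digraph is given as a dictionary of out-neighbour lists, and a vertex counts as reachable from itself by a path of length 0. The names out_ball and is_d_scattered are hypothetical.

```python
from collections import deque


def out_ball(adj, v, d):
    """Vertices reachable from v by a directed path of length at most d."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == d:
            continue  # do not extend paths beyond length d
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def is_d_scattered(adj, vertices, U, d):
    """True if no vertex reaches two distinct elements of U within distance d."""
    U = set(U)
    return all(len(out_ball(adj, v, d) & U) <= 1 for v in vertices)


# r -> x -> b and r -> d: b and d are 1-scattered but not 2-scattered.
adj = {"r": ["x", "d"], "x": ["b"]}
vertices = ["r", "x", "b", "d"]
print(is_d_scattered(adj, vertices, {"b", "d"}, 1))
print(is_d_scattered(adj, vertices, {"b", "d"}, 2))
```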


Definition 4 A class C of directed graphs is
uniformly quasi-wide if there are functions
s : ℕ → ℕ and N : ℕ × ℕ → ℕ such that for
every G ∈ C, all d, m ∈ ℕ, and every W ⊆ V(G)
with |W| > N(d, m), there is a set S ⊆ V(G)
with |S| ≤ s(d) and a set U ⊆ W with |U| = m
such that U is d-scattered in G − S. The functions
s, N are called the margin of C. If s and N are
computable, then we call C effectively uniformly
quasi-wide.


Theorem 1 A class C of digraphs is nowhere 
crownful if, and only if, it is directed uniformly 
quasi-wide. 


Nowhere crownful classes of digraphs were
defined as a directed analogue of the concept
of nowhere dense classes of undirected graphs,
for which a similar equivalence to uniform
quasi-wideness can be proved. See [2, 3, 6-8]
for nowhere dense classes of graphs and
algorithmic applications. As mentioned above,
nowhere crownful classes properly generalise
nowhere denseness to digraphs.


Applications 


A directed dominating set in a digraph G is a set
X ⊆ V(G) such that every u ∈ V(G) \ X is
the out-neighbour of a vertex in X. A distance-d
directed dominating set is a set X ⊆ V(G) such
that every vertex v ∈ V(G) can be reached from a
vertex in X by a directed path of length at most d.
An important variant of the undirected dominating
set problem is the connected dominating set
problem, where we are asked to find a dominating
set D of size k such that D induces a connected
subgraph. There are various natural translations
of this problem to the directed case: we can
require the dominating set to induce a strongly
connected subgraph or we can simply require it
to induce an out-branching. The second variation,
which we call dominating out-branching, still
captures the idea that information can flow from
the root to all vertices in the dominating set.

There is an easy reduction from the undirected
dominating set problem to its directed counterpart,
proving that the directed dominating set
problem is fixed-parameter intractable and NP-complete.
In fact, the problem of deciding whether
an undirected graph G contains a dominating
set of order k can be reduced to the question of
whether an acyclic digraph contains a directed
dominating set of order k. The result of the
reduction is a crown (plus one extra vertex). So,
classes of digraphs on which this problem and its
extension to distance-d versions are to become
tractable should exclude crowns. This observation
was one of the motivations for defining and
studying nowhere crownful classes of digraphs.
Furthermore, if besides the directed dominating
set we also want to solve its distance-d version,
we need to exclude crown minors in some form.

However, excluding shallow crowns is
sufficient for these problems to become fixed-parameter
tractable.


Theorem 2 Let C be a class of directed graphs 
which is nowhere crownful. Then the directed 
(independent or unrestricted) dominating set 
problem, the dominating out-branching problem, 
as well as their distance-d versions are fixed- 
parameter tractable on C. 


In the same way, several other similar 
problems can be shown to become tractable on 
nowhere crownful classes. 


Open Problems 


As mentioned before, nowhere crownful classes 
are modelled after nowhere dense classes of 
undirected graphs. For such classes, many 
other equivalent characterizations and powerful 
algorithmic applications are known. For instance, 
nowhere dense classes of graphs allow for very 
efficient sparse neighborhood covers (and this is
again an if-and-only-if characterization) and can be defined by a
game yielding bounded search tree techniques.
Furthermore, there is a close connection between 
nowhere dense classes of graphs and generalized 
colouring numbers (see, e.g., [8]). 

It is open to what extent these characterizations
and applications can be generalized to the digraph
setting.

A particular open problem is the tractability of 
strongly connected Steiner networks and strongly 
connected dominating set on nowhere crownful 
classes of digraphs. 

Nowhere crownful classes provide a way for 
dealing with domination-type problems. Directed 
tree width on the other hand provides a way of 
dealing with linkage problems such as disjoint 
paths. It is open how to bring the two concepts 
together. 
Problem Definition


Cooperative game theory considers how to distribute
the total income generated by a set of
participants in a joint project to individuals. The
Nucleolus, which tries to capture the intuition of
minimizing the dissatisfaction of players, is one of the
most well-known solution concepts among various
attempts to obtain a unique solution. Deng, Fang,
and Sun [3] study
the Nucleolus of flow games from the algorithmic
point of view. They show that, for a flow game
defined on a simple network (arc capacities being
all equal), computing the Nucleolus can be done
in polynomial time, and for flow games in general,
both the computation and the recognition
of the Nucleolus are NP-hard.

A cooperative (profit) game (N, v) consists of
a player set N = {1, 2, ..., n} and a characteristic
function v : 2^N → ℝ with v(∅) = 0, where
the value v(S) (S ⊆ N) is interpreted as the profit
achieved by the collective action of players in S.
Any vector x ∈ ℝ^n with Σ_{i ∈ N} x_i = v(N) is an
allocation. An allocation x is called an imputation
if x_i ≥ v({i}) for all i ∈ N. Denote by I(v) the
set of imputations of the game.

Given an allocation x, the excess of a coalition
S (S ⊆ N) at x is defined as


e(S, x) = x(S) − v(S),


where x(S) = Σ_{i ∈ S} x_i for S ⊆ N. The value
e(S, x) can be interpreted as a measure of satisfaction
of coalition S with the allocation x.
The core of the game (N, v), denoted by C(v),
is the set of allocations whose excesses are all
nonnegative. For an allocation x of the game
(N, v), let θ(x) denote the (2^n − 2)-dimensional
vector whose components are the nontrivial excesses
e(S, x), ∅ ≠ S ≠ N, arranged in
non-decreasing order. That is, θ_i(x) ≤ θ_j(x)
for 1 ≤ i < j ≤ 2^n − 2. Denote by ≽_l
the "lexicographically greater than" relationship
between vectors of the same dimension.


Definition 1 The Nucleolus η(v) of game
(N, v) is the set of imputations that lexicographically
maximize θ(x) over all imputations
x ∈ I(v). That is,


η(v) = {x ∈ I(v) : θ(x) ≽_l θ(y) for all y ∈ I(v)}.
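
To make the definition concrete, the sorted excess vector of an allocation can be built by brute force and two allocations compared lexicographically. The sketch below enumerates all 2^n − 2 nontrivial coalitions, so it is only practical for tiny games. The three-player majority game, the 0-based player indices, and the function name excess_vector are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations


def excess_vector(n, v, x):
    """Excesses e(S, x) = x(S) - v(S) over all nontrivial coalitions S,
    sorted in non-decreasing order. Players are numbered 0..n-1 here."""
    excesses = []
    for size in range(1, n):  # skips the empty set and the grand coalition
        for S in combinations(range(n), size):
            excesses.append(sum(x[i] for i in S) - v(frozenset(S)))
    return sorted(excesses)


# Hypothetical example: any coalition of at least two players earns 1.
def majority(S):
    return 1 if len(S) >= 2 else 0


x = [Fraction(1, 3)] * 3
y = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
theta_x = excess_vector(3, majority, x)
theta_y = excess_vector(3, majority, y)

# Python compares lists lexicographically, matching the order used above.
print(theta_x > theta_y)
```

Here the smallest excess under x is −1/3 while under y it is −1/2, so x is lexicographically preferred to y.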


Even though the Nucleolus may contain multiple
points by definition, it was proved by
Schmeidler [14] that the Nucleolus of a game
with nonempty imputation set contains exactly
one element. Kopelowitz [12] proposed that the
Nucleolus can be obtained by recursively solving
sequential linear programs (SLP):


LP_k :   max ε
         s.t. x(S) = v(S) + ε_r   for all S ∈ J_r, r = 0, 1, ..., k − 1,
              x(S) ≥ v(S) + ε     for all S ∈ 2^N \ ∪_{r=0}^{k−1} J_r,
              x ∈ I(v).
Here, J_0 = {∅, N} and ε_0 = 0 initially;
the number ε_r is the optimum value of the rth
program (LP_r), and J_r = {S ∈ 2^N : x(S) =
v(S) + ε_r for every x ∈ X_r}, where X_r = {x ∈
I(v) : (x, ε_r) is an optimal solution to LP_r}. It
can be shown that after at most n − 1 iterations,
one arrives at a unique optimal solution (x*, ε_k),
where x* is just the Nucleolus of the game. In
addition, the set of optimal solutions X_1 to the
first program LP_1 is called the least core of the
game.


The definition of the Nucleolus entails comparisons
between vectors of exponential length.
With the linear programming approach, each
linear program in SLP may have exponential
size in the number of players. Clearly, neither
provides an efficient solution in general.

Flow games, first studied by Kalai and Zemel
[9, 10], arise from the profit distribution problem
related to the maximum flow in a network. Let
D = (V, E; ω; s, t) be a directed flow network,
where V is the vertex set, E is the arc set, ω :
E → ℝ^+ is the arc capacity function, and s
and t are the source and the sink of the network,
respectively. The network D is simple if ω(e) =
1 for each e ∈ E, which is denoted briefly by
D = (V, E; s, t).


Definition 2 The flow game Γ_f = (E, v) associated
with network D = (V, E; ω; s, t) is
defined by:


(i) The player set is E.

(ii) For every S ⊆ E, v(S) is the value of a maximum
flow from s to t in the subnetwork of D
consisting only of arcs belonging to S.
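
The characteristic function of a flow game can be evaluated for a single coalition by running any maximum-flow routine on the arcs owned by that coalition. The sketch below uses a plain breadth-first augmenting-path method with integer capacities. The arc-list encoding, the indexing of players by arc position, and the function names are assumptions made for this example.

```python
from collections import defaultdict, deque


def max_flow(arcs, s, t):
    """Maximum s-t flow value for a list of arcs (u, v, capacity)."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v, c in arcs:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction for residual arcs
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return total
        path = []
        w = t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        push = min(cap[e] for e in path)
        for u, w in path:
            cap[(u, w)] -= push
            cap[(w, u)] += push
        total += push


def flow_game_value(arcs, coalition, s, t):
    """v(S): maximum flow using only the arcs whose indices are in S."""
    return max_flow([arcs[i] for i in coalition], s, t)


# A simple network (all capacities 1) with three arcs, i.e., three players.
arcs = [("s", "a", 1), ("a", "t", 1), ("s", "t", 1)]
print(flow_game_value(arcs, {0, 1, 2}, "s", "t"))
```

In this example the grand coalition has value 2, the coalition {0, 1} has value 1, and player 0 alone has value 0.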


Problem 1 (Computing the Nucleolus) 


INSTANCE: A flow network D = (V, E; ω; s, t).

QUESTION: Is there a polynomial time algo- 
rithm to compute the Nucleolus of the flow 
game associated with D? 


Problem 2 (Recognizing the Nucleolus) 

INSTANCE: A flow network D = (V, E; ω; s, t)
and y : E → ℝ^+.

QUESTION: Is it true that y is the Nucleolus of 
the flow game associated with D? 


Nucleolus 


Key Results 


Theorem 1 Let D = (V, E; s, t) be a simple
network and Γ_f = (E, v) be the associated flow
game. Then the Nucleolus η(v) can be computed
in polynomial time.


By making use of the duality technique in linear
programming, Kalai and Zemel [10] gave a characterization
of the core of a flow game. They
further conjectured that their approach may serve
as a practical basis for computing the Nucleolus.
In fact, the proof of Theorem 1 in the work of
Deng, Fang, and Sun [3] is just an elegant application
of Kalai and Zemel's approach (especially
the duality technique) and hence settles their
conjecture.


Theorem 2 Given a flow game Γ_f = (E, v) defined
on network D = (V, E; ω; s, t), computing
the Nucleolus η(v) is NP-hard.


Theorem 3 Given a flow game Γ_f = (E, v)
defined on network D = (V, E; ω; s, t) and an
imputation y ∈ I(v), checking whether y is the
Nucleolus of Γ_f is NP-hard.


Although a flow game can be formulated as a
linear production game [2], the size of the reduction
may in general be exponential,
and consequently, their complexity results on the
Nucleolus are independent. However, in the NP-hardness
proofs of Theorems 2 and 3, the flow
game constructed possesses a polynomial size
formulation as a linear production game [3]. Therefore,
as a direct corollary, the same NP-hardness
conclusions for linear production games are obtained.
That is, both computing and recognizing
the Nucleolus of a linear production game are
NP-hard.


Applications 


As an important solution concept in economics
and game theory, the Nucleolus and related solution
concepts have been applied to insurance
policies, real estate, bankruptcy, etc. However,
it is a challenging problem to decide what classes
of cooperative games permit polynomial time
computation of the Nucleolus.

The first polynomial time algorithm for the
Nucleolus in a special tree game was proposed
by Megiddo [13], advocating efficient
algorithms for cooperative game solutions.
Subsequently, some efficient algorithms have
been developed for computing the Nucleolus,
such as for assignment games [15] and matching
games [1, 11]. On the negative side, NP-hardness
results were obtained for minimum cost
spanning tree games [5] and weighted voting
games [4].

Granot, Granot, and Zhu [8] observed that 
most of the efficient algorithms for computing 
the Nucleolus are based on the fact that the 
information needed to completely characterize 
the Nucleolus is much less than that dictated 
by its definition. Therefore, they introduced the 
concept of a characterization set for the Nu- 
cleolus to embody the notion of “minimum” 
relevant information needed for the Nucleolus. 
Furthermore, based on the sequential linear pro- 
grams (SLP), they established a general relation- 
ship between the size of a characterization set 
and the complexity of computing the Nucleo- 
lus. Following this approach, some known effi- 
cient algorithms for computing the Nucleolus are 
derived directly. 

Another approach to computing the Nucleolus 
is taken by Faigle, Kern, and Kuipers [6], which 
is motivated by Schmeidler’s observation that the 
Nucleolus of a game lies in the kernel [14]. In 
the case where the kernel of the game contains 
exactly one core vector and the minimum excess 
for any given allocation can be computed effi- 
ciently, their approach derives a polynomial time 
algorithm for the Nucleolus. However, their algo- 
rithm uses the ellipsoid method as a subroutine, 
implying that the efficiency of the algorithm is of 
a more theoretical kind. 


Open Problems


The field of combinatorial optimization has much
to offer for the study of cooperative games. It
is usually the case that the values of subgroups
can be obtained via a combinatorial optimization
problem, in which case the game is called a combinatorial
optimization game. This class of games leads
to the application of a variety of combinatorial
optimization techniques in the design and analysis
of algorithms, as well as in establishing complexity
results. One of the most interesting results is
the LP duality characterization of the core [2].
However, little work has dealt with the Nucleolus
using the duality technique so far. Hence, the
work of Deng, Fang, and Sun [3] on computing
the Nucleolus may be of independent interest.

There are still many unsolved complexity
questions concerning the Nucleolus. For the
computation of the Nucleolus of matching
games, Kern and Paulusma [11] proposed an
efficient algorithm for the unweighted case and
conjectured that it is in general NP-hard.
Biro, Kern, and Paulusma [1] partly settled the
conjecture by showing that in the weighted case,
when the matching game has a nonempty core,
the Nucleolus can be computed in polynomial
time. Since both the flow game and the matching
game fall into the class of packing/covering
games, it is interesting to know the complexity of
computing the Nucleolus for other game models
in this class, such as vertex covering games and
minimum coloring games.

For cooperative games arising from NP-hard
combinatorial optimization problems, the computation
of the Nucleolus may in general be a
hard task. For example, in a traveling salesman
game, the nodes of the graph are the players together with an
extra node 0, and the value of a subgroup S of
players is the length of a minimum Hamiltonian
tour in the subgraph induced by S ∪ {0} [2]. It
would not be surprising if both the
computation and the recognition of the Nucleolus
for this game model turned out to be NP-hard. However,
this is not known yet. The same questions are
posed for facility location games [7], though
there have been efficient algorithms for some
special cases.

Moreover, when the computation of the Nucleolus
is difficult, it is also interesting to seek
meaningful approximation concepts of the Nucleolus,
especially from the political and economic
background.
Problem Definition


Here is a precise definition of BST algorithms and
their costs. This model is implied by most BST
papers and developed in detail by Wilber [22].
A static set of n keys is stored in the nodes of
a binary tree. The keys are from a totally ordered
universe, and they are stored in symmetric order.
Each node has a pointer to its left child, to its
right child, and to its parent. Also, each node may
keep o(log n) bits of additional information but
no additional pointers.
A BST algorithm is required to process a sequence
of m accesses (without insertions or deletions),
S = s_1, s_2, s_3, s_4, ..., s_m. The ith access
starts from the root and follows pointers until s_i
is reached. The algorithm can update the fields in
any node or rotate any edges that it touches along
the way. The cost of the algorithm to execute an
access sequence is defined to be the number of
nodes touched plus the number of rotations. A
BST algorithm is online if it processes access
s_i without making use of anything after s_i in the
access sequence.

Let A be any online BST algorithm, and define
A(S) to be the cost to algorithm A of processing
sequence S and OPT(S, T_0) to be the minimum
possible (online or off-line) cost to process the sequence
S, starting from an initial tree T_0. The algorithm
A is T-competitive if for all possible sequences
S, A(S) ≤ T · OPT(S, T_0) + O(m + n).

Since the number of rotations needed to
change any binary tree of n keys into another
one (with the same n keys) is at most 2n − 6
[4, 5, 12, 13, 15], it follows that OPT(S, T_0) differs
from OPT(S, T_0') by at most 2n − 6. Thus, if
m > n, then the initial tree can only affect the
constant factor.


Key Results 


The interleave bound is a lower bound on
OPT(S, T_0) that depends only on S. Consider
any binary search tree P of all the elements
in T_0. For each node y in P, define the left side
of y to include all nodes in y's left subtree and y,
and define the right side of y to include all nodes
in y's right subtree. For each node y, label each
access s_i in S by whether it is in the left or right
side of y, ignoring all accesses not in y's subtree.
Denote the number of times the label changes
for y as IB(S, y). The interleave bound IB(S) is


IB(S) = Σ_y IB(S, y).


Theorem 1 (Interleave Lower Bound [6, 22])
IB(S)/2 − n is a lower bound on OPT(S, T_0).
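
The quantity IB(S) can be computed by walking each access down the reference tree P and counting label changes at the nodes it passes. The sketch below is one way to do this under stated assumptions: the keys are the integers 1..n, and P is taken to be the balanced tree whose root on a key interval [lo, hi] is the midpoint. Both choices, and the function name, are assumptions made for this example; the definition allows any fixed P.

```python
def interleave_bound(n, accesses):
    """IB(S) for keys 1..n with respect to a balanced reference tree P."""
    last_label = {}  # node y of P -> label of the latest access in y's subtree
    changes = 0
    for s in accesses:
        lo, hi = 1, n
        while lo <= hi:
            y = (lo + hi) // 2  # root of the subtree of P covering [lo, hi]
            # The left side of y is y's left subtree together with y itself.
            label = "L" if s <= y else "R"
            if y in last_label and last_label[y] != label:
                changes += 1
            last_label[y] = label
            if s == y:
                break
            if s < y:
                hi = y - 1
            else:
                lo = y + 1
    return changes


print(interleave_bound(3, [1, 3, 1, 3]))
print(interleave_bound(3, [1, 2, 3]))
```

For n = 3 the alternating sequence 1, 3, 1, 3 switches sides at the root three times, whereas the sequential scan 1, 2, 3 switches only once.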


Demaine et al. observe that it is impossible to
use this lower bound to improve the competitive
ratio beyond O(log log n).


Theorem 2 (Tango is an O(log log n)-competitive
BST [6]) The running time of Tango BST on a
sequence S of m accesses is O((OPT(S, T_0) +
n) · (1 + log log n)).


Applications 


The binary search tree (BST) is one of the oldest
data structures in the history of computer science. 
It is frequently used to maintain an ordered set 
of data. In the last 40 years, many specialized 
binary search trees have been designed for spe- 
cific applications. Almost every one of them sup- 
ports access, insertion, and deletion in worst-case 
O(log n) time on average for random sequences 
of access. This matches the best theoretically 
possible worst-case bound. For most of these data 
structures, a random sequence of m accesses will 
use O(m log n) time. 

While it is impossible to have better asymptotic
performance for a random sequence of m accesses,
many real-world access sequences
are not random. For instance, if the accesses
are randomly drawn from a small subset of k
elements, it is possible to answer all the accesses
in O(m log k) time. A notable binary search tree
is the splay tree. It has been proved to perform well for many
access patterns [2, 3, 8, 14, 16-18]. As a result,
Sleator and Tarjan [14] conjectured that the splay tree
is O(1)-competitive with the optimal off-line BST.
After more than 20 years, the conjecture remains
an open problem.


O(log log n)-Competitive Binary Search Tree 


Over the years, several restricted types of
optimality have been proved. Many of these restrictions
and usage patterns are based on real-world
applications. If each access is drawn independently
at random from a fixed distribution D,
Knuth [11] constructed a BST based on D that is
expected to run in optimal time up to a constant
factor. Sleator and Tarjan [14] achieve the same
bound without knowing D ahead of time. Other
types include key-independent optimality [10]
and BST with free rotations [1].

In 2004, Demaine et al. suggested searching 
for alternative BST algorithms that have small 
but nonconstant competitive factors [6]. They 
proposed Tango, the first data structure proved 
to achieve a nontrivial competitive factor of 
O(log log n). This is a major step toward
developing an O(1)-competitive BST, and this
line of research could potentially replace a large 
number of specialized BSTs. 


Extensions and Promising Research 
Directions 


Following this paper, several new O(log log n)-competitive
BSTs have emerged [9, 21]. A
notable example is the multi-splay tree [21], which
generalizes the interleave bound to include
insertions and deletions. Multi-splay trees also
have many theorems analogous to those for splay trees
[20, 21], such as the access lemma and the
working set theorem. Wang [21] conjectured
that multi-splay trees are O(1)-competitive, but this
remains an open problem.

Returning to the original motivation for this 
research, the problem of finding an o(log log n)- 
competitive online BST remains open. Several 
attempts have been made to improve the lower 
bound [6, 7, 22], but none of them have led 
to a lower competitive ratio. Even in the off- 
line model, the problem of finding an O(1)- 
competitive BST is difficult. The best known off- 
line constant competitive algorithm uses dynamic 
programming and requires exponential time. 

In 2009, Demaine et al. [23] described a geometric
view of BST algorithms. This is an equivalent
model of BST algorithms, but sufficiently
different that it has allowed progress to be made
in a number of directions. First of all, it has
simplified and unified BST lower bounds. It has
also allowed progress to be made toward proving
a different algorithm to be O(1)-competitive.
The algorithm is called GreedyFuture and was 
proposed as an off-line algorithm in 1988 by Joan 
Lucas [24]. After each access, the algorithm re- 
structures the access path according to the future 
accesses. Specifically if the next access is on this 
path, then that node is made the new root and the 
left and right sides are built in a similar fashion 
recursively. If the next access is to a subtree of 
that path, then the path node to the left of that 
subtree is made the root, and the path node to 
the right of it is made the right child of the root, 
and the process again continues recursively. A 
remarkable result of the geometric view is that 
it shows how the GreedyFuture algorithm can 
actually be implemented as an online algorithm. 
At the moment, this seems to be the most likely 
candidate to be proven to be O(1) competitive. 
Problem Definition


Consider a communication network, for exam- 
ple, the network of cities across the country 
connected by communication links. There are 
several sender-receiver pairs on this network that 
wish to communicate by sending traffic across 
the network. The problem deals with routing all 
the traffic across the network such that no link 
in the network is overly congested. That is, no 
link in the network should carry too much traffic 
relative to its capacity. The obliviousness refers 
to the requirement that the routes in the network 
must be designed without the knowledge of the 
actual traffic demands that arise in the network, 
i.e., the route for every sender-receiver pair stays 
fixed irrespective of how much traffic any pair 
chooses to send. Designing a good oblivious 
routing strategy is useful since it ensures that the
network is robust to changes in the traffic pattern. 


Notations 

Let G = (V, E) be an undirected graph with
nonnegative capacities c(e) on edges e ∈ E. Suppose
there are k source-destination pairs (s_i, t_i)
for i = 1, ..., k, and let d_i denote the amount of
flow (or demand) that pair i wishes to send from
s_i to t_i. Given a routing of these flows on G, the
congestion of an edge e is defined as u(e)/c(e),
the ratio of the total flow u(e) crossing edge e
to its capacity. The congestion of the overall routing
is defined as the maximum congestion over
all edges. The congestion minimization problem
is to find the routing that minimizes the maximum
congestion. Observe that specifying a flow from
s_i to t_i is equivalent to finding a probability distribution
(not necessarily unique) on a collection
of paths from s_i to t_i.
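
The following sketch evaluates the congestion of a routing that is specified, as described above, by a probability distribution over paths for each pair. The dictionary-based encoding of capacities, routings, and demands, and the function name congestion, are assumptions made for this example.

```python
from collections import defaultdict


def congestion(capacity, routing, demand):
    """Maximum over edges of (total flow on the edge) / (edge capacity).

    capacity: frozenset({u, v}) -> capacity of the undirected edge
    routing:  (s, t) -> list of (path, probability) with paths as vertex tuples
    demand:   (s, t) -> amount of flow the pair wishes to send
    """
    load = defaultdict(float)
    for pair, paths in routing.items():
        for path, prob in paths:
            for u, v in zip(path, path[1:]):
                load[frozenset((u, v))] += demand.get(pair, 0) * prob
    return max((load[e] / capacity[e] for e in load), default=0.0)


# A 4-cycle a-b-c-d with unit capacities and one pair (a, c) sending 2 units.
capacity = {frozenset(e): 1.0 for e in [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]}
demand = {("a", "c"): 2.0}
split = {("a", "c"): [(("a", "b", "c"), 0.5), (("a", "d", "c"), 0.5)]}
single = {("a", "c"): [(("a", "b", "c"), 1.0)]}
print(congestion(capacity, split, demand), congestion(capacity, single, demand))
```

Splitting the demand evenly over the two paths gives congestion 1.0, while sending it all along one path gives congestion 2.0.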

The congestion minimization problem can be 
studied in many settings. In the offline setting, 
the instance of the flow problem is provided in 
advance, and the goal is to find the optimum 
routing. In the online setting, the demands arrive 
in an arbitrary adversarial order, and a flow must 
be specified for a demand immediately upon 
arrival; this flow is fixed forever and cannot be 
rerouted later when new demands arrive. Several 
distributed approaches have also been studied 
where each pair routes its flow in a distributed 
manner based on some global information such 
as the current congestion on the edges. 

In this note, the oblivious setting is considered. 
Here, a routing scheme is specified for each pair 
of vertices in advance without any knowledge 
of which demands will actually arrive. Note that 
an algorithm in the oblivious setting is severely 
restricted. In particular, if d_i units of demand
arrive for pair (s_i, t_i), the algorithm must necessarily
route this demand according to the prespecified
paths irrespective of the other demands
or any other information such as congestion of
other edges. Thus, given a network graph G,
the oblivious flows need to be computed just
once. After this is done, the job of the routing
algorithm is trivial; whenever a demand arrives, it
simply routes it along the precomputed path. An
oblivious routing scheme is called c-competitive if
for any collection of demands D, the maximum
congestion of the oblivious routing is no more
than c times the congestion of the optimum offline
solution for D. Given this stringent requirement
on the quality of oblivious routing, it is not a
priori clear that any reasonable oblivious routing
scheme should exist at all.


Key Results 


Oblivious routing was first studied in the con- 
text of permutation routing where the demand 
pairs form a permutation and have unit value 
each. It was shown that any oblivious routing 
that specifies a single path (instead of a flow) 
between every two vertices must necessarily per- 
form badly. This was first shown by Borodin and 
Hopcroft [6] for hypercubes, and the argument 
was later extended to general graphs by Kakla- 
manis, Krizanc, and Tsantilas [10], who showed 
the following. 


Theorem 1 ([6,10]) For every graph G of size 
n and maximum degree d and every oblivious 
routing strategy using only a single path for every 
source-destination pair, there is a permutation 
that causes an overlap of at least $(n/d)^{1/2}$ paths
at some node. Thus, if each edge in G has
unit capacity, the edge congestion is at least


$(n/d)^{1/2}/d$.


Since there exist constant-degree graphs such
as the butterfly graphs that can route any permuta- 
tion with logarithmic congestion, this implies that 
such oblivious routing schemes must necessarily 
perform poorly on certain graphs. 

Fortunately, the situation is substantially better 
if the single path requirement is relaxed and a 
probability distribution on paths (equivalently a 
flow) is allowed between each pair of vertices. In 
a seminal paper, Valiant and Brebner [13] gave 
the first oblivious permutation routing scheme 
with low congestion on the hypercube. It is instructive
to consider their scheme. Consider a
hypercube with $N = 2^n$ vertices. Represent
vertex i by the binary expansion of i. For any
two vertices s and t, there is a canonical path (of
length at most $n = \log N$) from s to t obtained
by starting from s and flipping the bits of s
in left-to-right order to match those of t.
Consider the routing scheme that, for a pair s and t,
first chooses some node p uniformly at random,
routes the flow from s to p along the canonical
path, and then routes it from p to t along the
canonical path (or equivalently, it sends 1/N units
of flow from s to each intermediate vertex p and
then routes it to t). A relatively simple analysis
shows that


Theorem 2 ([13]) The above oblivious routing
scheme achieves a congestion of O(1) for hypercubes.
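
The two-phase scheme behind Theorem 2 can be sketched as follows. Vertices are encoded as n-bit integers; the function names and the choice to fix bits from the most significant end are illustrative assumptions.

```python
import random


def canonical_path(s, t, n):
    """Bit-fixing path from s to t in the n-dimensional hypercube.

    Bits are corrected from the most significant (leftmost) to the least.
    """
    path, cur = [s], s
    for bit in reversed(range(n)):
        mask = 1 << bit
        if (cur ^ t) & mask:      # this bit still differs from t
            cur ^= mask
            path.append(cur)
    return path


def two_phase_path(s, t, n, rng=random):
    """Route s -> p -> t through a uniformly random intermediate vertex p."""
    p = rng.randrange(1 << n)
    first = canonical_path(s, p, n)
    second = canonical_path(p, t, n)
    return first + second[1:]     # drop the duplicated copy of p


path = two_phase_path(0b000, 0b111, 3, random.Random(0))
```

Sampling p once per message corresponds to the randomized view; sending 1/N of the flow through every p gives the equivalent fractional routing.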


Subsequently, oblivious routing schemes were
proposed for a few other special classes of networks.
However, the problem of designing oblivious
routing schemes for general graphs remained
open until recently, when in a breakthrough result
Räcke showed the following.


Theorem 3 ([11]) For any undirected capacitated
graph G = (V, E), there exists an oblivious
routing scheme with congestion $O(\log^3 n)$ where
n is the number of vertices in G.


The key to Räcke’s theorem is a hierarchical
decomposition procedure of the underlying graph
(described in further detail below). This hierarchical
decomposition is a fundamental combinatorial
result about the cut structure of graphs
and has found several other applications, some of
which are mentioned in section “Applications.”
Räcke’s proof of Theorem 3 only showed the
existence of a good hierarchical decomposition
and did not give an efficient polynomial time
algorithm to find it. In subsequent work, Harrelson,
Hildrum, and Rao [9] gave a polynomial
time procedure to find the decomposition and
improved the competitive ratio of the oblivious
routing to $O(\log^2 n \log\log n)$.


Theorem 4 ([9]) There exists an $O(\log^2 n \log\log n)$-competitive
oblivious routing scheme for
general graphs, and moreover, it can be found in
polynomial time.


Recently, Räcke [12] has given a tight $O(\log n)$-competitive
oblivious routing scheme together
with an efficient algorithm to find it. His algorithm
is based on an elegant connection to
probabilistic embedding of arbitrary metrics into
tree metrics.

Interestingly, Azar et al. [4] show that the
problem of finding the optimum oblivious routing
for a graph can be formulated as a linear
program. Their formulation has exponentially
many constraints: for each possible demand matrix
with optimum congestion 1, a constraint enforces
that the oblivious routing has low congestion
for this demand matrix.
Azar et al. [4] give a separation oracle for this
problem, and hence, it can be solved using the
ellipsoid method. A more practical polynomial-size
linear program was given later by Applegate
and Cohen [2]. Bansal et al. [5] considered a
more general variant referred to as online
oblivious routing that can also be used to find an
optimum oblivious routing. However, note that
without Räcke’s result, it would not be clear
whether these optimum routings were any good.
Moreover, these techniques do not give a hierarchical
decomposition and hence may be less
desirable in certain contexts. On the other hand,
they may be more useful sometimes since they
produce an optimum routing (while [9] implies
an $O(\log^2 n \log\log n)$-competitive routing for
any graph, the best oblivious routing could have
a much better guarantee for a specific graph).

Oblivious routing has also been studied for
directed graphs; however, the situation is much
worse here. Azar et al. [4] show that there exist
directed graphs where any oblivious routing is
$\Omega(\sqrt{n})$-competitive. Some positive results are
also known. Hajiaghayi et al. [7] show a substantially
improved guarantee of $O(\log^2 n)$ for
directed graphs in the random demands model.
Here, each source-sink pair has a distribution
(which is known by the algorithm) from which it
chooses its demand independently. A relaxation
of oblivious routing known as semi-oblivious
routing has also been studied recently [8].


Techniques 

This section describes the high-level idea
of Räcke’s result. For a subset $S \subset V$,
let cap(S) denote the total capacity of the
edges that cross the cut $(S, V \setminus S)$, and let
dem(S) denote the total demand that must be
routed across the cut $(S, V \setminus S)$. Observe that
$q = \max_{S \subset V} \mathrm{dem}(S)/\mathrm{cap}(S)$ is a lower bound on
the congestion of any solution. On the other hand,
the key result [3, 13] relating multicommodity
flows and cuts implies that there is a routing
such that the maximum congestion is at most
$O(q \log k)$ where k is the number of distinct
source-sink pairs. However, note that this by itself
does not suffice to obtain good oblivious routings,
since a pair $(s_i, t_i)$ can have different routings for
different demand sets. The main idea of Räcke
was to impose a treelike structure for routing
on the graph to achieve obliviousness. This
is formalized by a hierarchical decomposition
described below.
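
As a small illustration of the cut lower bound q, the following sketch enumerates every cut of a tiny graph. The brute-force enumeration, the function name, and the three-vertex instance are assumptions made for illustration; the approach is exponential in |V| and is not how the bound is used in the analysis.

```python
from itertools import combinations


def cut_lower_bound(vertices, capacity, demands):
    """Brute-force q = max over cuts S of dem(S) / cap(S).

    capacity: {(u, v): c} for undirected edges; demands: {(s, t): d}.
    Cuts with zero capacity are skipped, so the graph is assumed connected.
    """
    vertices = list(vertices)
    best = 0.0
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            S = set(subset)
            cap = sum(c for (u, v), c in capacity.items()
                      if (u in S) != (v in S))
            dem = sum(d for (s, t), d in demands.items()
                      if (s in S) != (t in S))
            if cap > 0:
                best = max(best, dem / cap)
    return best


# Hypothetical path graph a - b - c with unit capacities.
q = cut_lower_bound("abc",
                    {("a", "b"): 1.0, ("b", "c"): 1.0},
                    {("a", "c"): 3.0})
```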

Consider a hierarchical decomposition of the
graph $G = (V, E)$ as follows. Starting from the set
$S = V$, the sets are partitioned successively until
each set becomes a singleton vertex. This hierarchical
decomposition can be viewed naturally as
a tree T, where the root corresponds to the set V
and the leaves correspond to the singleton sets {v}.
Let $S_i$ denote the subset of V corresponding to
node i in T. For an edge (i, j) in the tree where
i is the child of j, assign it a capacity equal to
cap($S_i$) (note that this is the capacity from $S_i$ to
the rest of G and not just the capacity between $S_i$ and
$S_j$ in G). The tree T is used to simulate routing
in G and vice versa. Given a demand from u to v
in G, consider the corresponding (unique) route
between the leaves corresponding to {u} and {v} in T.
For any set of demands, it is easily seen that the
congestion in T is no more than the congestion
in G. Conversely, Räcke showed that there also
exists a tree T where the routes in T can be
mapped back to flows in G, such that for any
set of demands, the congestion in G is at most
$O(\log^3 n)$ times that in T. In this mapping, a
flow along the edge (i, j) in the tree T corresponds to a
suitably constructed flow between sets $S_i$ and $S_j$
in G. Since the route between any two vertices in T
is unique, this gives an oblivious routing in G.
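
The tree side of this correspondence is easy to sketch: tree edge capacities are cut capacities in G, and the route between two leaves is the unique path through their lowest common ancestor. The representation below (a child-to-parent map plus the vertex set of each tree node) and the example decomposition are illustrative assumptions; mapping tree routes back to flows in G, the hard part of the construction, is not shown.

```python
def tree_edge_capacities(parent, members, capacity):
    """Capacity of tree edge (i, parent[i]) is cap(S_i), the capacity leaving S_i in G."""
    caps = {}
    for node in parent:                       # every non-root tree node
        S = members[node]
        caps[node] = sum(c for (u, v), c in capacity.items()
                         if (u in S) != (v in S))
    return caps


def tree_route(parent, leaf_u, leaf_v):
    """Unique path between two leaves of T via their lowest common ancestor."""
    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain

    up_u, up_v = ancestors(leaf_u), ancestors(leaf_v)
    on_u = set(up_u)
    lca = next(x for x in up_v if x in on_u)
    return up_u[:up_u.index(lca) + 1] + up_v[:up_v.index(lca)][::-1]


# Hypothetical path graph a - b - c - d, split as {a, b} | {c, d}.
capacity = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 1}
parent = {"X": "R", "Y": "R", "a": "X", "b": "X", "c": "Y", "d": "Y"}
members = {"R": set("abcd"), "X": set("ab"), "Y": set("cd"),
           "a": {"a"}, "b": {"b"}, "c": {"c"}, "d": {"d"}}
caps = tree_edge_capacities(parent, members, capacity)
route = tree_route(parent, "a", "d")
```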

Räcke uses very clever ideas to show the
existence of such a hierarchical decomposition.
Describing the construction is beyond the scope
of this note, but it is instructive to understand the
properties that must be satisfied by such a decomposition.
First, the tree T should capture the
bottlenecks in G, i.e., if there is a set of demands
that produces high congestion in G, then it should
also produce a high congestion in T. A natural
approach to construct T would be to start with
V, split V along a bottleneck (formally, along
a cut with low sparsity), and recurse. However,
this approach is too simple to work. As discussed
below, T must also satisfy two other natural conditions,
known as the bandwidth property and the
weight property, which are motivated as follows.
Consider a node i connected to its parent j in T.
Then, i needs to route dem($S_i$) flow out of $S_i$,
and it incurs congestion dem($S_i$)/cap($S_i$) in T.
However, when T is mapped back to G, all the
flow going out of $S_i$ must pass via $S_j$. To ensure
that the edges from $S_i$ to $S_j$ are not overloaded, it
must be the case that the capacity from $S_i$ to $S_j$
is not too small compared to the capacity from
$S_i$ to the rest of the graph $V \setminus S_i$. This is referred
to as the bandwidth property. Räcke guarantees
that this ratio is always $\Omega(1/\log n)$ for every
$S_i$ and $S_j$ corresponding to edges (i, j) in the
tree. The weight property is motivated as follows.
Consider a node i in T with children $i_1, \ldots, i_p$;
then the weight property essentially requires that
the sets $S_{i_1}, \ldots, S_{i_p}$ should be well connected
among themselves even when restricted to the
subgraph $S_i$. To see why this is needed, consider
any communication between, say, nodes $i_1$ and $i_2$
in T. It takes the route $i_1$ to i to $i_2$, and hence,
in G, $S_{i_1}$ cannot use edges that lie outside $S_i$ to
communicate with $S_{i_2}$. Räcke shows that these
conditions suffice and that a decomposition can
be obtained that satisfies them.

The factor $O(\log^3 n)$ in Räcke’s guarantee
arises from three sources. The first logarithmic
factor is due to the flow-cut gap [3, 13]. The
second is due to the logarithmic height of the tree,
and the third is due to the loss of a logarithmic
factor in the bandwidth and weight properties.


Applications 


The problem has widespread applications to routing
in networks. In practice, it is often required
that the routes be single paths (instead of
flows). This can often be achieved by randomized
rounding techniques (sometimes under the
assumption that the demand-to-capacity ratios are
not too large). The flow formulation provides a
much cleaner framework for studying the problems
above.

Interestingly, the hierarchical decomposition
has also found widespread use in other seemingly
unrelated areas such as obtaining good preconditioners
for solving systems of linear equations,
finding edge-disjoint paths and multicommodity
flow problems, online network optimization,
speeding up the running time of graph algorithms,
and so on.


Problem Definition


This entry surveys some of the applications of
“oblivious subspace embeddings,” introduced by
Sarlós in [19], to problems in linear algebra.


Definition 1 ([19]) Given $0 < \varepsilon < 1/2$ and a
d-dimensional subspace $E \subset \mathbb{R}^n$, we say an $m \times n$
matrix $\Pi$ is an $\varepsilon$-subspace embedding for E if


$\forall x \in E:\ (1 - \varepsilon)\|x\|_2^2 \le \|\Pi x\|_2^2 \le (1 + \varepsilon)\|x\|_2^2.$


The goal is to have m small so that $\Pi$ provides
dimensionality reduction for E.

Given $0 < \varepsilon, \delta < 1/2$, and integers $1 \le d \le n$,
an $(\varepsilon, \delta, d, n)$-oblivious subspace embedding
(OSE) is a distribution D over $m \times n$ matrices
such that for every d-dimensional linear subspace
$E \subset \mathbb{R}^n$,


$\mathbb{P}_{\Pi \sim D}(\Pi \text{ is an } \varepsilon\text{-subspace embedding for } E) \ge 1 - \delta.$


Sometimes we omit a subset of the variables
$\varepsilon, \delta, d, n$ if they are understood from context.


In the definition of an OSE, note that we can
write $E = \{Ux : x \in \mathbb{R}^d\}$ where $U \in \mathbb{R}^{n \times d}$
and $U^T U = I$. That is, the columns of U form
an orthonormal basis for E. Therefore, we would
like that $\|\Pi U x\|_2^2 \approx \|Ux\|_2^2 = x^T U^T U x = \|x\|_2^2$
for all $x \in \mathbb{R}^d$. Letting $\|\cdot\|$ denote operator
norm and noting that $\|A\| = \sup_{\|x\|_2 = 1} |x^T A x|$ for
any real symmetric A, we see that being an OSE
as above is equivalent to the following holding for
all $U \in \mathbb{R}^{n \times d}$ with orthonormal columns:


$\mathbb{P}_{\Pi \sim D}\left(\|(\Pi U)^T (\Pi U) - I\| > \varepsilon\right) < \delta.$   (1)


Sarlós introduced OSEs [19] to provide faster
approximate algorithms for least squares regression
and low-rank approximation. In these problems,
the input is a tall and skinny matrix
$A \in \mathbb{R}^{n \times d}$ ($n > d$). For regression, we also are given
$b \in \mathbb{R}^n$. The goal is to solve some computational
problem given A, and naturally the running time
depends on both n and d. The basic idea is
to instead run the computation on $\Pi A$ for $\Pi$
sampled from an OSE and (1) prove that the
quality of solution found is near optimal if $\varepsilon$ is
small and (2) enjoy faster computation time to
find a solution since the dimensionality of the
problem is reduced ($\Pi A$ is $m \times d$, whereas A is
$n \times d$, $m < n$).


Key Results


As mentioned above, Sarlós showed how to use
OSEs to speed up least squares regression and 
low-rank approximation. Below we first discuss 
constructions of OSEs, and then we elaborate on 
applications. 


Constructing OSEs 

One OSE is to pick the entries of $\Pi \in \mathbb{R}^{m \times n}$
i.i.d. from a Gaussian distribution with mean
zero and variance 1/m, where
$m = O((d + \log(1/\delta))/\varepsilon^2)$. In fact it suffices to pick any
“Johnson-Lindenstrauss transform,” i.e., a $\Pi$
which preserves the Euclidean norms of a certain
set of $2^{O(d)}$ vectors up to $1 \pm \varepsilon$ (see [7]). This
setting of m is optimal for any OSE [17]. The
downside of such constructions is that multiplying
$\Pi A$ then takes time O(nmd), which is in
fact worse than the time to solve the problems
considered here.

Sarlós remedied this by picking $\Pi$ from the
“Fast Johnson-Lindenstrauss” distribution [1],
which improved this time to $O(nd \log n) + \mathrm{poly}(d)/\varepsilon^2$.
A related construction, the “Subsampled
Randomized Hadamard Transform”
(SRHT), with improved bounds for OSEs was
analyzed in [14, 21] using matrix concentration
inequalities. In this construction, one chooses
$\Pi = \sqrt{n/m} \cdot SHD$ where $D \in \mathbb{R}^{n \times n}$ is
diagonal with random signs on the diagonal, H
is any bounded orthonormal system that supports
matrix-vector multiplication in $O(n \log n)$
time (i.e., H should be orthogonal with
$\max_{i,j} |H_{i,j}| = O(1/\sqrt{n})$), and S is a sampling
matrix with m rows. That is, the rows of S are
independent, and each row of S has exactly a
single 1 in a uniformly random location and
zeroes elsewhere. The works [14, 21] showed
one can take $m = O(d \log(d/\delta)/\varepsilon^2)$. Note we
can multiply $\Pi A$ in time $O(nd \log n + m)$ by
individually multiplying $\Pi$ by each column of A.
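
A minimal sketch of applying an SRHT to one vector, taking H to be the normalized Walsh-Hadamard matrix (one admissible bounded orthonormal system, so n must be a power of two). The function names and the pure-Python implementation are illustrative assumptions; the same `signs` and `rows` must be reused for every column of A so that a single $\Pi$ is applied throughout.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def draw_srht(n, m, rng):
    """Sample the random parts of Pi: the diagonal of D and the rows picked by S."""
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    rows = [rng.randrange(n) for _ in range(m)]
    return signs, rows


def srht_apply(x, signs, rows):
    """Return sqrt(n/m) * S H D x for one vector x of length n."""
    n, m = len(x), len(rows)
    hdx = fwht([s * v for s, v in zip(signs, x)])   # H D x in O(n log n)
    scale = math.sqrt(n / m)
    return [scale * hdx[r] for r in rows]
```

For example, `signs, rows = draw_srht(n, m, random.Random(0))` followed by one `srht_apply` call per column of A yields the sketched matrix column by column.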

Subsequently, Clarkson and Woodruff [7]
showed that the Thorup-Zhang sketch [20]
provides an OSE with small m. In particular,
consider a random $\Pi$ with independent columns
where in each column there is a single nonzero
entry placed in a uniformly random location.
The value of this nonzero is uniform in $\{-1, 1\}$.
Note that with this construction, one can multiply
$\Pi A$ in time O(nnz(A)), where nnz(·) counts
nonzero entries. They showed this distribution is
an OSE for $m = O(d^2 \log^6(d/\varepsilon)/(\varepsilon^2 \delta))$. This
bound was improved independently in [15, 16]
to $m = O(d^2/(\varepsilon^2 \delta))$ via the moment method
(see also an observation of Nguyễn [12, Remark
6.4]). Note also a valid OSE is the product of
the SRHT with this construction, yielding m
as for the SRHT and with multiplication time
$O(\mathrm{nnz}(A) + d^3 \log(d/(\varepsilon\delta))/\varepsilon^2)$ to apply to A.
Nelson and Nguyễn [16] analyzed the “Sparse
Johnson-Lindenstrauss Transform” (SJLT) of
[12] in the context of OSEs. In particular, they
showed one can choose an OSE with
$m = O(d \log^8(d/\delta)/\varepsilon^2)$ and $s = O(\log^3(d/\delta)/\varepsilon)$
nonzero entries per column. See also [5]. Note
$\Pi A$ can be computed in time $O(s \cdot \mathrm{nnz}(A))$.
One could also choose $m = O(d^{1+\gamma}/\varepsilon^2)$,
$s = O(1/\varepsilon)$ for any fixed constant $0 < \gamma < 1$,
in which case $\delta = 1/d^c$ for any desired
constant $c > 0$. A conjecture of [16] is that
$m = O((d + \log(1/\delta))/\varepsilon^2)$, $s = O(\log(d/\delta)/\varepsilon)$
suffices.
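
The single-nonzero-per-column construction (the Thorup-Zhang sketch) can be applied without ever forming $\Pi$, which is where the O(nnz(A)) running time comes from. The sketch below is an illustrative assumption about data layout: A is stored as a list of sparse rows, and the function names are hypothetical.

```python
import random


def draw_sparse_embedding(n, m, rng):
    """One nonzero per column of Pi: column i holds sign[i] in row bucket[i]."""
    bucket = [rng.randrange(m) for _ in range(n)]
    sign = [rng.choice((-1, 1)) for _ in range(n)]
    return bucket, sign


def apply_sparse_embedding(A, bucket, sign, m):
    """Compute Pi A for A given as sparse rows [{col: value}, ...].

    Each nonzero of A is touched once, so the cost is O(nnz(A)) plus the
    O(m) needed to allocate the output rows.
    """
    out = [dict() for _ in range(m)]
    for i, row in enumerate(A):
        target, s = out[bucket[i]], sign[i]
        for j, value in row.items():
            target[j] = target.get(j, 0) + s * value
    return out


# Hypothetical 3 x 2 input with explicit (non-random) bucket and sign choices.
A = [{0: 1, 1: 2}, {0: 3}, {1: 5}]
sketch = apply_sparse_embedding(A, bucket=[0, 1, 0], sign=[1, -1, -1], m=2)
```

Row i of A is added, with its sign, to output row `bucket[i]`; this is exactly the product with a matrix whose i-th column has a single nonzero.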


Applying OSEs 


Least Squares Regression 
The input is $A \in \mathbb{R}^{n \times d}$, $b \in \mathbb{R}^n$. The goal is to
compute


$x^* = \mathrm{argmin}_{x \in \mathbb{R}^d} \|Ax - b\|_2.$


By optimality of $x^*$, $Ax^*$ must be the projection
of b onto the column span of A. Write the singular
value decomposition (SVD) $A = U \Sigma V^T$,
where $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{d \times r}$ have orthonormal
columns, and r is the rank of A. Also, $\Sigma \in \mathbb{R}^{r \times r}$
is diagonal with strictly positive entries on the
diagonal (the “singular values” of A). Then the
column spans of U and of A are identical, and so
$Ax^* = UU^T b$ is the desired projection. Thus,
we can choose $x^* = V \Sigma^{-1} U^T b$. Alternatively
one can write $x^* = (A^T A)^+ A^T b$, where the
pseudoinverse of a matrix B with SVD $L D W^T$
is $B^+ = W D^{-1} L^T$.

Simply computing $A^T A$ in the formula for
$x^*$ naively takes $O(nd^2)$ time (or $O(nd^{\omega - 1})$ if
using fast matrix multiplication). Note the following
observation.


Observation 1 Let E be the subspace spanned
by b and the columns of A. Let $\Pi$ be an
$\varepsilon$-subspace embedding for E, and write
$\tilde{x} = \mathrm{argmin}_x \|\Pi A x - \Pi b\|_2$. Then


$\|A\tilde{x} - b\|_2 \le \sqrt{\frac{1 + \varepsilon}{1 - \varepsilon}} \cdot \|Ax^* - b\|_2.$


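Proof By optimality of $\tilde{x}$, $\|\Pi A\tilde{x} - \Pi b\|_2 \le
\|\Pi A x^* - \Pi b\|_2$. Furthermore, since
$A\tilde{x} - b,\ Ax^* - b \in E$, we have
$(1 - \varepsilon)\|A\tilde{x} - b\|_2^2 \le \|\Pi A\tilde{x} - \Pi b\|_2^2$
and $\|\Pi A x^* - \Pi b\|_2^2 \le
(1 + \varepsilon)\|Ax^* - b\|_2^2$. The claim follows.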
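
Written as a single chain, with $\tilde{x}$ denoting the minimizer of the sketched problem and $x^*$ the true minimizer, the three inequalities in the proof combine as follows (a restatement of the argument above, not an additional result):

```latex
\[
(1-\varepsilon)\,\|A\tilde{x}-b\|_2^2
\;\le\; \|\Pi A\tilde{x}-\Pi b\|_2^2
\;\le\; \|\Pi Ax^*-\Pi b\|_2^2
\;\le\; (1+\varepsilon)\,\|Ax^*-b\|_2^2 .
\]
```

The outer steps use the subspace embedding property on vectors of E, and the middle step uses optimality of $\tilde{x}$ for the sketched problem. Dividing by $1-\varepsilon$ and taking square roots gives the stated bound.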


The above observation informs us that minimizing
$\|\Pi A x - \Pi b\|_2$ for a subspace embedding
$\Pi$ is sufficient to obtain a high-quality solution to
the original problem. Using an OSE with m rows,
one can compute $\tilde{x}$ in $O(md^2)$ time (or faster
using fast matrix multiplication).

Sarlós also provided another method of using
OSEs for least squares regression. First we provide
a definition.


Definition 2 We call a distribution D over $\mathbb{R}^{m \times n}$
an $(\varepsilon, \delta)$-AMM$_F$ distribution if, for every pair of
matrices A, B each with n rows,


$\mathbb{P}_{\Pi \sim D}\left(\|(\Pi A)^T (\Pi B) - A^T B\|_F > \varepsilon \|A\|_F \|B\|_F\right) < \delta,$


where $\|A\|_F = (\sum_{i,j} A_{i,j}^2)^{1/2}$ is the Frobenius
norm of A. (“AMM” here stands for “approximate
matrix multiplication.”)


The work [10] was the first to propose using
AMM$_F$ and (non-oblivious) subspace embeddings
in the context of low-rank approximation.
Sarlós showed that any Johnson-Lindenstrauss
(JL) distribution also provides AMM$_F$, but with
$\delta$ increased by some factor involving the dimensions
of A, B. This factor was removed for
random sign matrices in [8] and later for a fairly
general class of JL distributions in [12, Theorem
6.2]. We state the relevant definition and theorem
for this general class.


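Definition 3 ([12]) We say a distribution D over
$\mathbb{R}^{m \times n}$ has $(\varepsilon, \delta, p)$-JL moments if for any $x \in \mathbb{R}^n$
of unit Euclidean norm,


$\mathbb{E}_{\Pi \sim D}\left|\|\Pi x\|_2^2 - 1\right|^p < \varepsilon^p \delta.$
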

Theorem 1 ([12]) Given $\varepsilon, \delta \in (0, 1/2)$, let D
be any distribution with the $(\varepsilon, \delta, p)$-JL moment
property for some $p \ge 2$. Then D is a $(3\varepsilon, \delta)$-AMM$_F$
distribution.


For constant $\delta$, for example, it thus follows
from [20] that the Thorup-Zhang sketch with
$m = O(1/\varepsilon^2)$ provides AMM$_F$. Sarlós then
proved the following.


Theorem 2 Suppose $\tilde{x} = \mathrm{argmin}_x \|\Pi A x - \Pi b\|_2$
where the distribution that $\Pi$ is drawn from
is (1) an $(O(1), \delta)$-OSE for d-dimensional subspaces
(with a distortion parameter independent
of $\varepsilon$), and (2) a $(\sqrt{\varepsilon/d}, \delta)$-AMM$_F$ distribution.
Then with probability $1 - 2\delta$,


$\|A\tilde{x} - b\|_2 \le (1 + O(\varepsilon))\|Ax^* - b\|_2.$


The above theorem combined with [16] allows,
for example, picking $\Pi$ as the SJLT with
$m = O(d^{1+\gamma} + d/\varepsilon)$, $s = O(1)$ for any constant
$0 < \gamma < 1$ to achieve $(1 + \varepsilon)$-multiplicative error
for least squares regression.


Low-Rank Approximation 
The input is $A \in \mathbb{R}^{n \times d}$ and positive integer k,
and the goal is to compute


$A_k = \mathrm{argmin}_{B:\ \mathrm{rank}(B) \le k} \|A - B\|_F.$


Given the SVD $A = U \Sigma V^T$, the Schmidt
approximation theorem (later rediscovered as
the Eckart-Young theorem) yields that
$A_k = U \Sigma_k V^T$, where $\Sigma_k$ retains only the k largest
elements of $\Sigma$ and zeroes out the rest. Up to
terms logarithmic in dimension and precision, the
SVD can be computed in time $nd^{\omega - 1}$ [9] where
$\omega$ is the exponent of square matrix multiplication.


Thus, low-rank approximation can be solved in 
the same time bound. 

A scheme based on OSEs and AMM$_F$
was given by Sarlós. For matrices B, S, let
$\mathrm{Proj}_{S,k}(B)$ denote the best rank-k approximation
to B in the column span of S. Equivalently, it
is the best rank-k approximation to the matrix
formed by projecting each column of B to the
column span of S. Sarlós’ theorem is as follows.


Theorem 3 ([19]) Let $\Pi$ be drawn from a
distribution which is (1) an $(O(1), \delta)$-OSE for
k-dimensional subspaces and (2) a $(\sqrt{\varepsilon/k}, \delta)$-AMM$_F$
distribution. Then with probability $1 - 2\delta$,


$\|A - \mathrm{Proj}_{A\Pi^T, k}(A)\|_F \le (1 + O(\varepsilon))\|A - A_k\|_F.$


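The above theorem has led to low-rank
approximation algorithms with $(1 + \varepsilon)$-multiplicative
error running in time $O(\mathrm{nnz}(A)) +
\tilde{O}(nk^2/\mathrm{poly}(\varepsilon))$ [7, 15, 16] or even $O(\mathrm{nnz}(A)) +
\tilde{O}(nk^{\omega - 1}/\mathrm{poly}(\varepsilon))$ [16]. Here $\tilde{O}(\cdot)$ hides
logarithmic factors. These algorithms, in these
running times, can output a decomposition
$L \in \mathbb{R}^{n \times k}$, $D \in \mathbb{R}^{k \times k}$, $W \in \mathbb{R}^{d \times k}$ with D
diagonal and L, W having orthonormal columns
such that


$\|A - L D W^T\|_F \le (1 + \varepsilon)\|A - A_k\|_F.$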


Other Applications 

OSEs have also found other applications, e.g., to
approximating leverage scores [11], distributed
principal component analysis [13], k-means clustering
[6], canonical correlation analysis [3], support
vector machines [18], $\ell_p$ regression [22],
ridge regression [14], CUR matrix factorization
[4], and streaming approximation of eigenvalues
[2]. The reader may investigate these references
for more details.


Problem Definition


Wireless sensor networks are composed of many 
small devices called sensor nodes with sensing, 
computing and radio frequency communication 
capabilities. Sensor nodes are typically deployed 
in an ad hoc manner and use their sensors to col- 
lect environmental data. The emerging network 
collectively processes, aggregates and propagates 
data to regions of interest, e.g., from a region 
where an event is being detected to a base station 
or a mobile user. This entry is concerned with the 
data propagation duty of the sensor network in the 
presence of obstacles. 

For different reasons, including energy conser- 
vation and limited transmission range of sensor 
nodes, information propagation is achieved via 
multi-hop message transmission, as opposed to 
single-hop long range transmission. As a con- 
sequence, message routing becomes necessary. 
Routing algorithms are usually situated at the 
network layer of the protocol stack where the 
most important component is the (dynamic) com- 
munication graph. 


Definition 1 (Communication graph) A wire- 
less sensor network is viewed as a graph 
G = (V, E) where vertexes correspond to sensor 
nodes and edges represent wireless links between 
nodes. 


Wireless sensor networks have stringent con- 
straints that make classical routing algorithms 
inefficient, unreliable or even incorrect. There- 
fore, the specific requirements of wireless sensor 
networks have to be addressed [2] and geographic 
routing offers the possibility to design particu- 
larly well adapted algorithms. 


Geographic Routing 

A geographic routing algorithm takes advantage 
of the fact that sensor nodes are location aware, 
i.e., they know their position in a coordinate 
system following the use of a localization proto- 
col [7]. Although likely to introduce a significant 
overhead, the use of a localization protocol is also 
likely to be inevitable in many applications where 
environmental data collected by the sensors
would be useless if not related to some geographical
information. For those applications, node
location awareness can be assumed to be avail- 
able for routing purposes at no additional cost. 


The Power of Simple Geographic Routing 

The early “most forward within range” (MFR) 
or greedy geographic routing algorithms [14] 
route messages by maximizing, at each hop, the 
progress on a projected line towards the destina- 
tion or, alternatively, minimizing the remaining 
distance to the message’s destination. Both of 
these greedy heuristics are referred to as greedy 
forwarding (GF). Greedy forwarding is a very 
appealing routing technique for wireless sensor 
networks. Among explanations for the attrac- 
tiveness of GF are the following. (1) GF, as is 
almost imperatively required, is fully distributed. 
(2) It is lightweight in the sense that it induces 
no topology control overhead. (3) It is all-to- 
all (as opposed to all-to-one). (4) Making no 
assumptions on the structure of the communi- 
cation graph, which can be directed, undirected, 
stable or dynamic (e.g., nodes may be mobile 
or wireless links may appear and disappear, for 
example following environmental fluctuation or 
as a consequence of lower protocol stack layers 
such as sleep/awake schemes for energy saving 
purposes), it is robust. (5) It is on-demand: no 
routing table or gradient has to be built prior to 
message propagation. (6) Efficiency is featured as 
messages find short paths to their destination in 
terms of hop count. (7) It is very simple and thus 
easy to implement. (8) It is memory efficient in 
the sense that (8a) the only information stored in 
the message header is the message’s destination 
and that (8b) it is “ecologically sound” because 
no “polluting” information is stored on the sensor 
nodes visited by messages. 
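
A minimal sketch of one greedy forwarding step, using the variant that minimizes the remaining distance to the destination. Positions are assumed to be (x, y) tuples and the function name is illustrative; returning None models the situation in which no neighbor is strictly closer to the destination than the current node.

```python
import math


def greedy_next_hop(current, neighbors, destination):
    """Neighbor strictly closer to the destination than `current`, or None.

    Only the destination (carried in the message header) and the positions
    of the current node and its neighbors are needed; no routing table is used.
    """
    best, best_dist = None, math.dist(current, destination)
    for candidate in neighbors:
        d = math.dist(candidate, destination)
        if d < best_dist:
            best, best_dist = candidate, d
    return best
```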


Problem Statement 

Although very appealing, GF suffers from a major
flaw: when a message reaches a local minimum
where no further progress towards the
destination is possible, the routing algorithm fails.
There are two major reasons for the occurrence of
local minima: routing holes [1] and obstacles.


Definition 2 The so-called routing holes are low
density regions of the network where no sensor
nodes are available for next-hop forwarding.


Even in uniform-randomly deployed networks, 
routing holes appear as the manifestation of sta- 
tistical variance of node density. Although in- 
creasing as network density diminishes, routing 
holes have a severe impact on the performance of 
GF even for very high density networks [12]. 


Definition 3 A transmission blocking obstacle is 
a region of the network where no sensors are 
deployed and through which radio signals do not 
propagate. 


Clearly, large obstacles lying between a message 
and its destination tend to make GF fail. 

The problem reported in this entry is to 
find a geographic routing algorithm that main- 
tains the advantages of greedy forwarding listed 
in section “Geographic Routing” such as sim- 
plicity, light weight, robustness and efficiency 
while overcoming its weaknesses: the inability to 
escape local minimum nodes created by routing 
holes and large transmission blocking obstacles 
such as those seen in Fig. 1. 


Problem 1 (Escaping routing holes) The first 
problem is to route messages out of the many 
routing holes which are statistically doomed to 
occur even in dense networks. 


Problem 2 (Contouring obstacles) The second 
problem is to design a protocol capable of rout- 
ing messages around large transmission blocking 
obstacles. 


Problem 1 can be considered a simplified instance
of Problem 2. Lightweight solutions to problem 1 
have been previously proposed, usually using 
limited backtracking [6] or controlled flooding 
combined with a GF heuristic [4, 13]. However, 
as shown in [5] where an integrated model for 
obstacles is proposed and where different algorithms
are compared with respect to their obstacle
avoidance capability, those solutions do not
satisfactorily solve Problem 2 in the sense that 
only small and simple obstacles are efficiently 
bypassed. 


Key Results 


In [12] a new geographic routing around ob- 
stacles (GRIC) algorithm was proposed to ad- 
dress the problems described in the previous 
section. 


Basic Idea of the Algorithm 

In GF, the strategy is to always propagate the 
message to the neighbor that maximizes progress 
towards the destination. Similarly, GRIC also 
maximizes progress in a chosen direction. 
However, this direction is not necessarily the 
message’s destination but an ideal direction of 
progress which has to be computed according to 
one of two possible strategies: the inertia mode or 
the rescue mode described below. Finally, it was 
found that performance is better in the presence 
of slightly unstable networks, cf. Result 4,
and thus in the case where the communication
graph is very stable it is recommended to use 
a randomized version of GRIC where nodes 
about to take a routing decision randomly 
mark as either passive or active each outbound 
wireless link of the communication graph. Only 
active wireless links can be used for message 
propagation, and link status is re-evaluated each 
time a new routing decision is taken. Marking 
links as active with a probability of p = 0.95 
was found to be a good choice for practical 
purposes [12]. 


Inertia Mode 

The idea of the inertia mode is that a message 
should have a strong incentive to go towards its 
destination but this incentive should be moder- 
ated by one to follow the straight ahead direction 
of current motion “... like a celestial body in 
a planet system ...” [12]. The inertia mode aims 
at making messages follow closely the perimeter 
of routing holes and obstacles in order to even- 
tually bypass them and ensure final routing to 
the destination. To implement the inertia mode, 
a single additional assumption is made: sensor 
nodes should be aware of the position of the node 
from which they receive a message. As an ex- 
ample, this could be done by piggy-backing this 
1-hop away routing path history in the message 
header. Knowing its own position p, the message’s
destination and the 1-hop away previous
position of the message, a sensor node can compute
the vectors $v_{cur}$ and $v_{dst}$ starting at position
p and pointing in the direction of current motion
and the direction to the message’s destination
respectively. The inertia mode defines the ideal
direction of progress, $v_{idl}$, as a vector starting at
point p and lying “somewhere in between” $v_{cur}$
and $v_{dst}$. More precisely, let $\alpha$ be the only angle
in $[-\pi, \pi[$ such that $v_{dst}$ is obtained by applying
a rotation of angle $\alpha$ to $v_{cur}$; then $v_{idl}$ is the vector
obtained by applying a rotation of angle $\alpha'$ to
$v_{cur}$, where $\alpha' = \mathrm{sign}(\alpha) \cdot \min\{\pi/6, |\alpha|\}$. Finally,
the message is greedily forwarded to the neighbor
node maximizing progress in the computed ideal
direction of progress $v_{idl}$.
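
The inertia-mode computation can be sketched as below. Positions are assumed to be (x, y) tuples, the function names are illustrative, and the cap on the turn is exposed as a parameter `max_turn` whose default of π/6 is an assumption of this sketch. `math.atan2` returns angles in $[-\pi, \pi]$, which differs from the half-open interval in the text only in the degenerate case of an exact U-turn.

```python
import math


def signed_angle(v_from, v_to):
    """Angle by which v_from must be rotated to point along v_to."""
    cross = v_from[0] * v_to[1] - v_from[1] * v_to[0]
    dot = v_from[0] * v_to[0] + v_from[1] * v_to[1]
    return math.atan2(cross, dot)


def rotate(v, angle):
    """Rotate the 2D vector v counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def inertia_direction(prev_pos, pos, dest, max_turn=math.pi / 6):
    """Ideal direction of progress: turn towards dest by at most max_turn."""
    v_cur = (pos[0] - prev_pos[0], pos[1] - prev_pos[1])
    v_dst = (dest[0] - pos[0], dest[1] - pos[1])
    alpha = signed_angle(v_cur, v_dst)
    alpha_prime = math.copysign(min(max_turn, abs(alpha)), alpha)
    return rotate(v_cur, alpha_prime)


def forward(pos, neighbors, direction):
    """Neighbor maximizing progress, i.e., the projection onto `direction`."""
    return max(neighbors,
               key=lambda q: (q[0] - pos[0]) * direction[0]
                             + (q[1] - pos[1]) * direction[1])
```

For a message that moved from (0, 0) to (1, 0) with destination (1, 5), the destination lies a quarter turn to the left, so the ideal direction is the current heading rotated by only `max_turn`.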


Rescue Mode

In order to improve overall performance and 
to bypass complex obstacles, the rescue mode 
imitates the right-hand rule (RHR) which is a well 
known wall follower technique to find one’s way 
out of a maze. A high-level description of the 
RHR component of GRIC is given below while 
details will be found in [12]. In GRIC, the RHR 
makes use of a virtual compass and a flag. The 
virtual compass assigns to $v_{cur}$ a cardinal point
value, treating the message’s destination as the
north. Considering the angle $\alpha$ defined in the
previous section, the compass returns a string
x-y with x equal to north or south if $|\alpha|$ is
smaller or greater than $\pi/2$ respectively, while
y is equal to west or east if $\alpha$ is negative or
positive respectively. The first time the compass
returns a south value, the flag is raised and tagged 
with the (x, y) value of the compass. Raising 
the flag means that the message is being routed 
around an obstacle using the RHR rule if the 
compass indicates south-west. In the case where 
the compass indicates south-east, a symmetric 
case not discussed here for brevity is applied 
using the left-hand rule (LHR) instead of the 
RHR. Once the flag is raised, it stays up with its 
tag unchanged until the compass indicates north, 
meaning that the obstacle has been bypassed. 
In fact, a small optimization can be made by 
lowering the flag only if the compass points to 
the north-west (in the case of the RHR) and not 
if it points north-east, but cf. [12] for details. 
According to the RHR the obstacle’s perimeter 
should be followed closely and kept on the right 
side of the message’s current direction. If ever the 
compass and the flag’s tag disagree, i.e., if the 
flag is tagged with south-west and the compass 
returns south-east, it is assumed that the message 
is turning left too much, that it risks going away 
from the obstacle and that the RHR is at risk of 
being violated (a symmetric case applies for the 
LHR). When this is so, GRIC responds by calling
the rescue mode, which changes the default way
of computing $v_{idl}$: in rescue mode the message is
forced to turn right (or left if the LHR is applied),
by defining $v_{idl}$ as the vector obtained by applying
to $v_{cur}$ a rotation of angle $\alpha''$ (instead of $\alpha'$ in
inertia mode), where $\alpha'' = -\mathrm{sign}(\alpha)(2\pi - |\alpha|)/6$.
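
A minimal sketch of the virtual compass, the flag, and the choice between the inertia and rescue rotations follows. The function names, the tuple representation of compass readings, and the handling of the boundary cases (|α| exactly π/2, α exactly 0) are assumptions of this sketch; the north-west lowering optimization mentioned above is not modeled, and the π/6 cap of the inertia mode is also assumed.

```python
import math

def compass(alpha):
    """Compass reading (x, y) for the signed angle alpha between the current
    direction and the direction to the destination.
    Boundary cases are assigned arbitrarily here."""
    x = "north" if abs(alpha) < math.pi / 2 else "south"
    y = "west" if alpha < 0 else "east"
    return (x, y)

def update_flag(flag, reading):
    """flag is None when lowered, otherwise the (x, y) tag it was raised with.
    Raised on the first south reading; kept unchanged until a north reading."""
    if flag is None:
        return reading if reading[0] == "south" else None
    return None if reading[0] == "north" else flag

def rescue_needed(flag, reading):
    """Rescue mode is triggered when the flag is up and the compass reads
    south on the opposite side from the flag's tag."""
    return flag is not None and reading[0] == "south" and reading != flag

def rotation_angle(alpha, rescue, cap=math.pi / 6):
    """Angle applied to the current direction: alpha'' in rescue mode,
    alpha' (capped at `cap`, assumed pi/6) in inertia mode."""
    if rescue:
        return -math.copysign((2 * math.pi - abs(alpha)) / 6, alpha)
    return math.copysign(min(cap, abs(alpha)), alpha)
```

With a flag tagged south-west and a current reading of south-east (α > 0), the rescue angle is negative, that is, a clockwise rotation, which matches the forced right turn of the right-hand rule.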
Main Findings

The performance of GRIC was evaluated through 
simulations. The main parameters were the 
presence (or absence) of different shapes of 
large communication blocking obstacles and the 
network density which ranged from very low 
to very high and controls the average degree of 
the communication graph and the occurrence of 
routing holes. The main performance metrics 
were the success rate, i.e., the percentage
of messages routed to destination, and the 
path length. The main findings are that GRIC 
efficiently, i.e., using short paths, bypasses 
routing holes and obstacles but that, in the
presence of hard obstacles, the performance
degrades as the network density decreases. In Fig. 1, typical
routing paths found by GRIC for different 
obstacle shapes are illustrated, cf. [12] for details 
on the simulation environment. 


Result 1 In the absence of obstacles, routing 
holes are bypassed for every network density: The 
success rate is close to 100 % as long as the 
source and the destination are connected. Also, 
routing is efficient in the sense that path lengths 
are very short.


Result 2 Some convex obstacles such as the 
one in Fig. 1b are bypassed with almost 100% 
success rate and using short paths, even for 
low densities. When the density gets very low 
performance diminishes: If the density gets be- 
low the critical level guaranteeing the commu- 
nication graph to be connected with high prob- 
ability, then the success probability diminishes 
quickly and successful routings use longer rout- 
ing paths. 


Result 3 Some large concave obstacles such 
as those in Fig. 1c and d are efficiently
bypassed. However, when facing such obstacles 
performance becomes more sensitive to network 
density. The success rate drops and routing paths 
become longer when the density gets below 
a certain level depending on the exact obstacle 
shape. 
Result 4 (Robustness) Similarly to GF, GRIC
is robust to link instability. Furthermore, it was
observed that limited link instability has a significantly
positive impact on performance. This can
be understood as the fact that messages are less 
likely to enter endless routing loops in a “hot” 
system than in a “cold” system. 


Applications 


Replacement for Greedy Forwarding 

Because it makes no compromise with the advantages
of GF, except that it may be somewhat
more complicated to implement, and because
it overcomes GF’s main limitations, GRIC can
probably replace GF for most routing scenarios,
including but not limited to wireless sensor
networks. As an example, opportunistic-routing
strategies [11] could be applied to GRIC rather 
than to GF. 


Wireless Sensor Networks with Large 
Obstacles 

GRIC successfully bypasses large communi- 
cation blocking obstacles. However, it does so 
efficiently only if the network density is high 
enough. This suggests that the obstacle avoidance 
feature of GRIC may be more useful for dense 
wireless networks than for sparse networks. 
Wireless sensor networks are an example of 
networks which are usually considered to be 
dense. 


Dynamic Networks 

There exist some powerful alternatives to GRIC 
such as the celebrated guaranteed delivery proto- 
cols GFG [3], GPSR [8] or GOAFR [10]. Those 
protocols rely on a planarization phase such as 
the lazy cross-link detection protocol (CLDP) [9]. 
LCR implies significant topology maintenance 
overhead which would be amortized over time if 
the network is stable enough. On the contrary, 
if the network is highly dynamic the necessity 
for frequent updates could make this topology 
maintenance overhead prohibitive. GRIC may 
thus be a preferable choice for dynamic networks
where the communication graph is not a stable 
structure. 


Open Problems 


(1) Hard concave obstacles such as the one in
Fig. 1d are still a challenge for lightweight
protocols, since in this configuration GRIC’s
performance is strongly dependent on network
density. (2) Low to very low densities are
challenging when combined with large obstacles,
even when they are “simple” convex obstacles
like the one in Fig. 1b. (3) The problem reported
in this entry in the case of 3-dimensional
networks is open. Inertia may be of some
help; however, the virtual compass and the
right-hand rule seem quite strongly dependent
on the 2-dimensional plane. (4) GRIC is
not loop free. A mechanism to detect loops 
or excessively long routing paths would be 
quite important for practical purposes. (5) The 
understanding of GRIC could be improved. 
Analytical results are lacking and new metrics 
could be considered such as network lifetime, 
energy consumption or traffic congestion. 
Problem Definition


Online interval coloring is a graph coloring prob- 
lem. In such problems the vertices of a graph are 
presented one by one. Each vertex is presented in 
turn, along with a list of its edges in the graph, 
which are incident to previously presented ver- 
tices. The goal is to assign colors (which without 
loss of generality are assumed to be nonnegative 
integers) to the vertices, so that two vertices 
which share an edge receive different colors and 
the total number of colors used (or alternatively, 
the largest index of any color that is used) is 
minimized. The smallest number of colors, for 
which the graph still admits a valid coloring, is 
called the chromatic number of the graph. 

The interval coloring problem is defined as 
follows. Intervals on the real line are presented 
one by one, and the online algorithm must assign 
each interval a color before the next interval ar- 
rives, so that no two intersecting intervals receive 
the same color. The goal is again to minimize 
the number of colors used to color any interval. 
The last problem is equivalent to coloring of 
interval graphs. These are graphs which have a 
representation (or realization) where each inter- 
val represents a vertex and two vertices share an 
edge if and only if they intersect. It is assumed 
that the interval graph arrives online together with 
its realization. 

Given an interval graph, denote the size of the 
largest cardinality clique (complete subgraph) in 
it by $\omega$. Interval graphs have the special property
that in a realization, the set of vertices in a clique 
have a common point in which they all intersect. 

Before discussing the online problem, some 
properties of interval graphs need to be stated. 
There exists a simple offline algorithm which 
produces an optimal coloring of interval graphs. 
An algorithm applies First Fit, if each time it 
needs to assign a color to an interval, it assigns a 
smallest index color which still produces a valid 
coloring. The optimal algorithm simply considers 


Online Interval Coloring 


intervals sorted from left to right by their left end
points and applies First Fit. Note that the resulting
coloring never uses more than $\omega$ colors. Indeed,
interval graphs are perfect. A graph G is perfect
if any induced subgraph G’ of G (including G)
can be colored using $\omega(G')$ colors, where $\omega(G')$
is the size of the largest cardinality clique in G’.
(For any graph, $\omega$ is a clear lower bound on its
chromatic number.)
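
The offline procedure above can be sketched in a few lines. The sketch assumes closed intervals given as `(left, right)` tuples and colors numbered from 0; `first_fit_offline` and `clique_number` are illustrative names, not part of the original entry.

```python
def intersects(a, b):
    """Closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def first_fit_offline(intervals):
    """Color intervals in order of left endpoint, giving each the smallest
    color not used by an already-colored intersecting interval.
    Returns the colors in the original input order."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    color = {}
    for i in order:
        used = {color[j] for j in color if intersects(intervals[i], intervals[j])}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return [color[i] for i in range(len(intervals))]

def clique_number(intervals):
    """Largest number of intervals sharing a point (non-empty input);
    for closed intervals the maximum is attained at some left endpoint."""
    return max(sum(1 for (l, r) in intervals if l <= p <= r)
               for (p, _) in intervals)
```

On the four intervals [1, 3], [6, 8], [2, 5], [4, 7], the sorted order yields two colors, which equals the clique number.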

However, once intervals arrive in an arbitrary 
order, it is impossible to design an optimal color- 
ing. Consider a simple example where the two in- 
tervals [1, 3] and [6, 8] are introduced. If they are 
colored using two distinct colors, this is already 
suboptimal, since using the same color for both 
of them is possible. However, if the sequence of 
intervals is augmented with [2, 5] and [4, 7], these 
two new intervals cannot receive the color of the 
previous intervals or the same color for both new 
intervals. Thus, three colors are used, even though 
a valid coloring using two colors can be designed. 
Note that even if it is known in advance that the 
input can be colored using exactly two colors, not 
knowing whether the additional intervals are as 
defined above, or alternatively, a single interval 
[2, 7] arrives instead, leads to the usage of three 
colors instead of only two. 
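
To make the example concrete, the following sketch runs First Fit in arrival order on the four intervals above. First Fit is used here only as one representative online algorithm (the argument in the text applies to any online algorithm), and `first_fit_online` is an illustrative name.

```python
def first_fit_online(intervals):
    """Assign colors in arrival order: each closed interval gets the smallest
    color not used by an earlier intersecting interval."""
    assigned = []  # (interval, color) pairs in arrival order
    for cur in intervals:
        used = {c for (prev, c) in assigned
                if prev[0] <= cur[1] and cur[0] <= prev[1]}
        c = 0
        while c in used:
            c += 1
        assigned.append((cur, c))
    return [c for (_, c) in assigned]

# Adversarial arrival order from the text: three colors are used.
adversarial = first_fit_online([(1, 3), (6, 8), (2, 5), (4, 7)])
# Same intervals sorted by left endpoint: two colors suffice.
sorted_order = first_fit_online([(1, 3), (2, 5), (4, 7), (6, 8)])
```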

Online coloring is typically hard, even for
some simple graph classes
such as trees. This is due to the lower bound
of $\Omega(\log n)$ (where n is the number of
vertices), given by Gyárfás and Lehel [9] on
the competitive ratio of online coloring of
trees. There are very few classes for which
constant bounds are known. One such class is
line graphs, for which Bar-Noy, Motwani, and
Naor [3] showed that First Fit is 2-competitive
(specifically it uses at most $2 \cdot OPT - 1$ colors,
where OPT is the number of colors in an optimal
coloring), and this is the best possible bound. This
result was later generalized to $k \cdot OPT - k + 1$ for
(k + 1)-claw-free graphs by [8] (note that line
graphs are 3-claw-free).


Key Results 


The paper of Kierstead and Trotter [11] provides 
a solution to the online interval coloring problem. 




They show that the best possible competitive 
ratio is 3 which is achieved by an algorithm they 
design. More accurately, the following theorem is 
proved in the paper. 


Theorem 1 Given an interval graph which is in- 
troduced online and presented via its realization, 
any online algorithm uses at least $3\omega - 2$ colors
to color the graph, and there exists an algorithm 
which achieves this bound. 


The algorithm does not need to know $\omega$ in
advance. Moreover, even though the algorithm
is deterministic, it was shown in [13] that the 
lower bound of 3 on the competitive ratio of 
online algorithms for interval coloring holds for 
randomized algorithms as well. Thus, [11] gives 
a complete solution for the problem. 

The main idea of the algorithm is creation of
“levels.” At the time of arrival of an interval, it
is classified into a level as follows. Denote by
$A_k$ the union of sets of intervals which currently
belong to all levels $1, \ldots, k$. Intervals are classified
so that the largest cardinality clique in $A_k$
is of size k. Thus, $A_1$ is simply a set of
non-intersecting intervals. On arrival of an interval,
the algorithm finds the smallest k such that the
new interval can join level k, without violating
the rule above. It can be shown that each level
can be colored using two colors by an offline
algorithm. Since the algorithm defined here is
online, such a coloring cannot be found in general
(see example above). However, it is shown in
[11] that at most three colors are required for
each such level, and a coloring using three colors
can be found by applying First Fit on each level
(with disjoint sets of colors). Moreover, the first
level can always be colored using a single color,
and $\omega$ is exactly equal to the number of levels.
Thus the total number of colors used is at most
$3(\omega - 1) + 1 = 3\omega - 2$.
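
The level-based scheme can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: intervals are closed `(left, right)` tuples, a color is represented as a pair `(level, index)` so that levels use disjoint color sets, and the names `kt_color` and `clique_with` are hypothetical.

```python
def intersects(a, b):
    """Closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def clique_with(new, others):
    """Size of the largest clique containing `new` among others + [new].
    The common point of such a clique can be taken as new's left endpoint
    or the left endpoint of an intersecting interval that starts inside new."""
    cands = [o for o in others if intersects(o, new)]
    points = [new[0]] + [o[0] for o in cands if o[0] >= new[0]]
    return max(1 + sum(1 for o in cands if o[0] <= p <= o[1]) for p in points)

def kt_color(intervals):
    """Process intervals in arrival order; return a (level, index) color each.
    An interval joins the smallest level k such that the largest clique in
    levels 1..k, including the new interval, has size at most k."""
    levels = []   # levels[k-1] holds (interval, index) pairs of level k
    result = []
    for cur in intervals:
        k = 1
        while True:
            lower = [iv for lvl in levels[:k] for (iv, _) in lvl]
            if clique_with(cur, lower) <= k:
                break
            k += 1
        while len(levels) < k:
            levels.append([])
        # First Fit restricted to the chosen level.
        used = {c for (iv, c) in levels[k - 1] if intersects(iv, cur)}
        c = 0
        while c in used:
            c += 1
        levels[k - 1].append((cur, c))
        result.append((k, c))
    return result
```

On the arrival order [1, 3], [6, 8], [2, 5], [4, 7] the first two intervals land on level 1 and the last two on level 2, for three colors in total, within the $3\omega - 2 = 4$ guarantee for $\omega = 2$.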


Applications 


In this section, both real-world applications of 
the problem and applications of the methods of 
Kierstead and Trotter [11] to related problems are 
discussed. 
Many applications arise in various communication
networks. The need for connectivity
all over the world is rapidly increasing. On the 
other hand, networks are still composed of very 
expensive parts. Thus application of optimization 
algorithms is required in order to save costs. 

Consider a network with a line topology that 
consists of links. Each connection request is for 
a path between two nodes in the network. The 
set of requests assigned to a channel must consist 
of disjoint paths. The goal is to minimize the 
number of channels (colors) used. A connection 
request from a to b corresponds to an interval 
[a, b], and the goal is to minimize the number of 
required channels to serve all requests. 

Another network-related application arises if
the requests have constant duration c and all
requests have to be served as fast as possible.
In this case the colors correspond to time slots, 
and the total number of colors corresponds to the 
schedule length. The problem can be described as 
a scheduling problem as well, and it is clearly of 
theoretical interest being a natural online graph 
coloring problem. Two later studies are of pos- 
sible interest here, both due to their relevance to 
the original problem and for the usage of related 
methods. 

The applications in networks stated above
raise a generalized problem studied in recent
years. In these applications, it is assumed that
once a connection request between two points is
satisfied, the channel is blocked at least for the
duration of this request. An interesting question
that was raised by Adamy and Erlebach [1] is
the following. Assume that a request consists
not only of a requested interval but also of
a bandwidth requirement. That is, a customer
of a communication channel specifies exactly 
how much of the channel is needed. Thus, in 
some cases it is possible to have overlapping 
requests sharing the same channel. It is required 
that at every point, the sum of all bandwidth 
requirements of requests sharing a color cannot 
exceed the value 1, which is the capacity of 
the channel. This problem is called online 
interval coloring with bandwidth. In the paper 
[1], a (large) constant competitive algorithm was 
designed for the problem. The original interval 
coloring problem is a special case of this problem 
where all bandwidth requests are 1. Note that this
problem is a generalization of bin packing as 
well, since bin packing is the special case of the 
problem where all requests have a common point. 
Azar et al. [2] designed an algorithm of competitive
ratio at most 10 for this problem. This
was done by partitioning the requests into four
classes based on their bandwidth requirements
and coloring each such class separately. The
class of requests with bandwidth in $(\frac{1}{2}, 1]$ was
colored using the basic algorithm of [11], since
no two such requests colored with one color can
overlap. The two other classes, which are $(0, \frac{1}{4}]$
and $(\frac{1}{4}, \frac{1}{2}]$, were colored using adaptations of
the algorithm of [11]. Epstein and Levy [6, 7] designed
improved lower bounds on the competitive
ratio, showing that online interval coloring with 
bandwidth is harder than online interval coloring. 
Another problem related to coloring is the 
max coloring problem [5, 14, 15]. In this prob- 
lem each interval is given a nonnegative weight. 
Given a coloring, the weight of a color is the 
maximum weight of any vertex of this color. The 
goal is to minimize the sum of weights of the 
used colors. Note that if all weights are 1, max 
coloring reduces to the graph coloring problem. 
Several papers [5, 15], starting with that of Pem- 
maraju, Raman, and Varadarajan [15], designed 
algorithms for max coloring that are based on the 
algorithm of [11] (sometimes as a black box). 


Open Problems 


Since the paper [11] provided a nice and clean 
solution to the online interval coloring problem, 
it does not directly raise open problems. Yet, one 
related problem has been of interest to researchers over
the last 30 years, namely the performance of
First Fit on this problem. It was shown by Kierstead
[10] that First Fit uses at most $40\omega$ colors,
thus implying that First Fit has a constant competitive
ratio. The quest for the exact competitive
ratio was never completed. The best current
published results are an upper bound of $10\omega$ by
[15] and a lower bound of $4.4\omega$ by Chrobak and
Ślusarek [4]. See [16] for recent developments. In
particular, it is mentioned that a lower bound of
$4.99999\omega - C$ (for a fixed $C > 0$) was proved by
Kierstead and Trotter in 2004, later improved to a
lower bound of $(5 - \varepsilon)\omega$ for any $\varepsilon > 0$, implying a
lower bound of 5 on the competitive ratio of First
Fit [12]. It is interesting to note that for online
interval coloring with bandwidth, First Fit has an 
unbounded competitive ratio [1]. 
Problem Definition


Suppose we are going to invest in a stock market. 
Our neighbor, for mysterious reasons, happens 
to know how the market evolves. But he can- 
not change his portfolio (proportions of holding 
stocks) once committed (to avoid being caught by 
regulators, say). On the other hand, we, the nor- 
mal investor, do not have any inside information 
but can sell and buy at will. If we and our pre- 
scient neighbor invest the same amount of money, 
is there a (computationally feasible) way for us to 
perform comparably well to our neighbor, with- 
out knowing his investing strategy? Surprisingly 
(as contrary to our real-life experience perhaps), 
the answer is yes, and we will see it through the 
lens of online learning. Disclaimer: The reader
is at his own risk if he decides to practice the 
beautiful theoretical results we describe below. 

The online learning problem is best described 
as a multi-round two-person game between the 
“learner” and the “environment,” following the 
protocol: 


The Online Learning Protocol 


For $t = 1, \ldots, T$:
    Learner predicts $x_t \in D$;
    Environment responds with a cost function $f_t : D \to \mathbb{R}$;
    Learner suffers an immediate cost $f_t(x_t)$;
    Learner learns some information of $f_t$.


Through the multi-round interactions with the 
environment, the learner tries to learn the be- 
havior of the environment so as to minimize its 
cumulative cost in the time horizon $t \in [1, T]$,
where we could allow the game to continue
indefinitely, i.e., $T = \infty$.

The online learning framework is particularly 
relevant in real applications where (1) sequential 
decisions are needed, (2) average good perfor- 
mance is desired, and (3) the process is too 
complicated to be modeled statistically. In our 
stock example above, the learner will be us (nor- 
mal stock holder), and the environment will be 
the market. Each day we submit our portfolio 
$x_t$, carefully constructed based on the past information
and perhaps also mingled with some
randomness (coin tosses for luck). The market
responds with rises and falls of the stock prices,
represented as the cost function $f_t$. We suffer the
loss $f_t(x_t)$ and learn something about the market
(e.g., $f_t$), and life moves on to the next day.
(If it feels more comfortable, one can negate $f_t$
and call it gain. We shall not do this, because
“a true warrior faces her bleak life bravely.”) Our
adventure ends at day T, which is fixed in advance. (For
$T = \infty$, the adventure never ends.) Of course,
the goal is to earn on average as much money 
as possible; it is OK if we lose occasionally. 
Also, for an average person (us), it is perhaps too 
complicated to have a clear idea what is exactly
going on in that stock market. As mentioned, 
we would like to compete against our “prescient 
neighbor.” This is formalized as the regret below. 
To evaluate the performance of the learner, the 
following notion of regret plays a central role: 


$$R_T(x) = \sum_{t=1}^{T} \big(f_t(x_t) - f_t(x)\big), \quad \forall x \in C \subseteq D. \tag{1}$$


Intuitively, the learner compares itself with the 
baseline (e.g., the “prescient” neighbor) that constantly
predicts $x \in C$ in each round. We are
interested in bounding the learner’s regret with 
respect to the “best” competitor in the set C (al- 
though our notation drops the dependence on C): 


$$R_T := \sup_{x \in C} \mathbf{E}(R_T(x)), \tag{2}$$

where the expectation $\mathbf{E}(\cdot)$ is taken with respect
to any internal randomization the learner or the
environment might use. The learner is said to be
(Hannan) consistent if

$$\frac{R_T}{T} \to 0 \ \text{ as } T \to \infty, \quad \text{i.e., } R_T = o(T).$$


In other words, the learner performs, on average, 
as well as the best constant competitor in the long 
run. 

We adopted the notion of regret not because 
we believe a constant (unchanging) predictor is 
the best strategy for our problem. Instead, the 
regret should be interpreted as a bare minimum 
requirement: If there does exist a constant pre- 
dictor that performs reasonably well on our task, 
it would be unacceptable if our algorithm is 
not even on par with it. More often than not, 
we would like to do better than any constant 
predictor, but this can be highly nontrivial (either 
computationally or statistically). 

We have allowed the learner to operate on a 
larger set D than its “competitors” (which are 
restricted to C). Of course this buys the learner 
some advantage, which sometimes is necessary 
for consistency, particularly when C is a nonconvex
set. For instance, consider the game where
the sets C = D = {0, 1} and the cost functions

$$f_t(x) = \begin{cases} 1, & \text{if } x = x_t, \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$


Recall that in our online learning protocol, we 
have no control on how the environment reacts. In 
the very worst case, the environment may appear 
to be completely “hostile.” For instance, the cost 
function $f_t$ in (3) is thus defined to make the
learner always suffer unit cost in each round. On 
the other hand, the best constant competitor in 
C suffers cost at most $T/2$ in T rounds. Hence,
$R_T \ge T/2$ for all T, meaning that any learner
that follows our protocol cannot be consistent. 
The lesson is, of course, that we cannot compete 
under a very adversarial environment. However, 
if we allow the learner to randomize its decisions 
and correspondingly pay expected cost, then it is 
again possible to devise consistent learners for 
this game [8], provided that the environment is 
oblivious, i.e., it does not adapt to the learner’s 
randomization, thus constraining its “hostility.” 
Intuitively, randomization and averaging smooth 
out the possible worst-case (but oblivious) reac- 
tions of the environment. This is also equivalent 
to allowing the learner to operate on $D = [0, 1]$,
the convex hull of $C = \{0, 1\}$. Indeed, for binary
x we can interpret the cost function in (3) as
$f_t(x) = |x - y_t|$, where in the worst case the
environment could happen to choose $y_t = 1 - x_t$
from the set C. In the randomized setting, the
learner first picks $x \in D$, the convex hull of C,
and then chooses 1 with probability x and 0 otherwise.
Provided that the environment still chooses
(however adversarial) $y_t \in C$, the expected cost
the learner suffers is again $f_t(x) = |x - y_t|$, but
this time extended to the convex domain D. The
claim that there exists a consistent learner under 
this randomized setting follows from Theorem 2 
below. Intuitively, now the learner sits in the 
middle (x = 1/2) and leans toward the better 
constant predictor fast enough. 

The previous example shows that consistency 
may not always be achievable. Consequently, the 
interesting questions in online learning include
(but are not limited to) the following: 


• Identifying settings under which consistency can be achieved
• Determining the correct order of the regret as T tends to infinity
• Devising computationally efficient and order-optimal learners


These questions heavily depend on what the 
learner can learn in each round. For instance, in 
the full information setting, the learner observes 
the entire cost function $f_t$; in the bandit setting,
the learner only observes its incurred cost $f_t(x_t)$,
while in the partial monitoring setting, the 
learner only observes some quantity related to 
its cost. The geometry of the decision set D and 
the competitor set C, as well as the structural 
property (such as convexity, smoothness, etc.) of 
the cost functions, also play a significant role. In 
the next section, we will consider a special case 
where a particularly simple algorithm known as 
online gradient descent suffices to achieve the 
optimal regret. For more complete and thorough 
discussions, please refer to the excellent book [3] 
and surveys [2, 8]. 


Online Convex Programming 
We further simplify our online learning protocol 
as follows: 


Online Convex Programming (on the real 
line) 


• $D \subseteq \mathbb{R}$ is a closed convex set, with $r = \max_{x, y \in D} |x - y| < \infty$;
• $\forall t \le T$, $f_t$ is convex and differentiable on some open set containing D;
• The gradient is uniformly bounded: $\sup_{x \in D,\, t \le T} |\nabla f_t(x)| \le M < \infty$;
• The learner gets to observe $\nabla f_t(x_t)$ in round t.
The third condition is satisfied if each $f_t$ is
M-Lipschitz continuous, i.e.,

$$\forall x, y \in D, \quad |f_t(x) - f_t(y)| \le M \cdot |x - y|; \tag{4}$$

while the last condition is certainly met if the
cost function $f_t$ is revealed to the learner in
each round. Under this setting, Zinkevich [9] first
analyzed the online learner that simply follows
the (projected) gradient update:

$$\forall t \ge 1, \quad x_{t+1} = P_D\big(x_t - \eta_t \nabla f_t(x_t)\big), \tag{5}$$

where $\eta_t \ge 0$ is a small step size that we
determine later and

$$P_D(x) = \operatorname*{argmin}_{y \in D} |x - y| \tag{6}$$

is the (Euclidean) projection of x onto the closed
set D, i.e., the closest point in D to x. The
projection is needed since the learner’s prediction
$x_{t+1}$ is restricted to the decision set D.

Before we analyze the regret of the above 
online gradient algorithm, let us first observe that 


$$R_T(x) = \sum_{t=1}^{T} \big(f_t(x_t) - f_t(x)\big) \le \sum_{t=1}^{T} \nabla f_t(x_t)(x_t - x) \le M \sum_{t=1}^{T} |x_t - x|, \tag{7}$$


where the first inequality follows from the convexity
of $f_t$ and the second from the bound M on the
gradient. Interestingly, the right-hand side is
the worst-case regret for the special case where
each $f_t$ is a linear function, say, $w_t x$ for some
$|w_t| \le M$. In other words, we could have restricted
the game to linear cost functions, instead
of the seemingly more general convex functions.

The regret of an online learner can be bounded 
by analyzing its progress with respect to some 
potential function. Here we choose the familiar 
quadratic potential. Note that for any $x \in C \subseteq D$,
clearly $P_D(x) = x$; hence,

$$\begin{aligned}
|x_{t+1} - x|^2 &= |P_D(x_t - \eta_t \nabla f_t(x_t)) - P_D(x)|^2 \\
&\le |x_t - \eta_t \nabla f_t(x_t) - x|^2 \\
&\le |x_t - x|^2 - 2\eta_t \nabla f_t(x_t) \cdot (x_t - x) + \eta_t^2 M^2 \\
&\le |x_t - x|^2 - 2\eta_t \big(f_t(x_t) - f_t(x)\big) + \eta_t^2 M^2,
\end{aligned} \tag{8}$$


where the first inequality follows from the 1-Lipschitz
continuity of the projection $P_D(\cdot)$, the second from
expanding the square and using the gradient bound M, and
the last inequality is due to the convexity of $f_t$.
Dividing (8) by $2\eta_t$, summing the indices from
t = 1 to t = T, and rearranging, we have


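$$R_T(x) = \sum_{t=1}^{T} \big(f_t(x_t) - f_t(x)\big) \le \sum_{t=1}^{T} \frac{1}{2\eta_t}\big(|x_t - x|^2 - |x_{t+1} - x|^2\big) + M^2 \sum_{t=1}^{T} \frac{\eta_t}{2} \tag{9}$$

$$= \frac{1}{2\eta_1}|x_1 - x|^2 + \sum_{t=2}^{T} \Big(\frac{1}{2\eta_t} - \frac{1}{2\eta_{t-1}}\Big)|x_t - x|^2 - \frac{1}{2\eta_T}|x_{T+1} - x|^2 + M^2 \sum_{t=1}^{T} \frac{\eta_t}{2}. \tag{10}$$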


Setting the step size $\eta_t$ properly leads to our key
results, summarized in the next section. 


Key Results 


If the horizon T is finite and known in advance, 
then we can use a constant step size $\eta_t \equiv \eta$.
Optimizing with respect to $\eta > 0$ from (10) yields


Theorem 1 (e.g., [8, 9]) Let $\eta_t \equiv \eta = \frac{c}{M\sqrt{T}}$ for
some constant c > 0; then the online gradient
learner achieves sublinear regret

$$R_T(x) \le M\sqrt{T}\,\frac{c^2 + |x_1 - x|^2}{2c} \le M\sqrt{T}\,\frac{c^2 + r^2}{2c} \tag{11}$$

for the online convex programming problem.


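If the horizon is not known in advance, inspired
by the step size in Theorem 1, we can try setting
$\eta_t = \frac{c}{M\sqrt{t}}$. Note that $\eta_t$ is decreasing with
respect to t. Continuing from (10):

$$R_T(x) \le r^2 \Big(\frac{1}{2\eta_1} + \sum_{t=2}^{T} \Big(\frac{1}{2\eta_t} - \frac{1}{2\eta_{t-1}}\Big)\Big) + M^2 \sum_{t=1}^{T} \frac{\eta_t}{2} = \frac{r^2}{2\eta_T} + M^2 \sum_{t=1}^{T} \frac{\eta_t}{2}.$$

Using integration, $\sum_{t=1}^{T} \frac{1}{2\sqrt{t}} \le \sqrt{T}$. Thus, we have proved


Theorem 2 (Zinkevich [9]) Let $\eta_t = \frac{c}{M\sqrt{t}}$ for
some constant c > 0; then the online gradient
learner achieves sublinear regret (simultaneously
for all T)

$$R_T(x) \le M\sqrt{T}\,\frac{2c^2 + r^2}{2c} \tag{12}$$

for the online convex programming problem.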


Comparing to Theorem 1, we only lose a
constant 2 in Theorem 2, but the result now holds
simultaneously for all T, a property sometimes
called anytime. Theorems 1 and 2 not only imply
the consistency of the online gradient learner but
also demonstrate that $R_T = O(\sqrt{T})$, since the
right-hand sides of (11) and (12) are independent
of the competitor x. In fact, this rate is optimal,
i.e., there exists an instantiation where no learner
(efficient or not) can do better; see, e.g., [6].
Thanks to the convexity assumption on $f_t$ (and
the decision set D), the online gradient algorithm
can be efficiently implemented if the gradient
$\nabla f_t$ and the projection $P_D(\cdot)$ can be efficiently
computed.
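
As an illustration, projected online gradient descent on an interval D = [lo, hi] of the real line can be sketched as below, assuming the anytime step size $\eta_t = c/(M\sqrt{t})$. The function names, the choice of the left endpoint as $x_1$, and the representation of each round by a gradient callable are assumptions of this sketch.

```python
import math

def project(x, lo, hi):
    """Euclidean projection of x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def online_gradient_descent(grads, lo, hi, M, c=1.0, x1=None):
    """Run the projected gradient update with step size c / (M * sqrt(t)).

    grads: list of callables; grads[t-1](x) returns the derivative of f_t at x,
           assumed bounded by M in absolute value on [lo, hi].
    Returns the predictions x_1, ..., x_T (x_1 defaults to lo)."""
    x = lo if x1 is None else x1
    predictions = []
    for t, grad in enumerate(grads, start=1):
        predictions.append(x)          # commit to x_t before seeing f_t
        eta = c / (M * math.sqrt(t))   # decreasing step size
        x = project(x - eta * grad(x), lo, hi)
    return predictions
```

For instance, with D = [0, 1] and the quadratic costs $f_t(x) = (x - y_t)^2/2$ for targets $y_t \in \{0, 1\}$, the derivative $x - y_t$ is bounded by M = 1 and r = 1, so with c = 1 the regret against any fixed point of D should stay below $1.5\sqrt{T}$ according to (12).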


Doubling Trick When the horizon T is not
known in advance, we can also use the doubling
trick, which divides the time into exponentially
increasing phases

$$[1, T] \subseteq \bigcup_{i=1}^{\lceil \log_2(T+1) \rceil} [2^{i-1}, 2^{i}),$$

and on the ith phase, we use the constant step size
$\eta_i = O(1/\sqrt{2^{i-1}})$ suggested in Theorem 1. The
overall regret is bounded by

$$\sum_{i=1}^{\lceil \log_2(T+1) \rceil} O\big(\sqrt{2^{i-1}}\big) = \frac{\sqrt{2}}{\sqrt{2} - 1}\, O\big(\sqrt{T+1}\big).$$

So asymptotically we only lose a factor of
$\frac{\sqrt{2}}{\sqrt{2} - 1} \approx 3.41$.
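
The phase schedule of the doubling trick can be sketched as follows. The sketch assumes that phase i covers rounds $2^{i-1}, \ldots, 2^i - 1$ and uses the Theorem 1 step size for a horizon of $2^{i-1}$ rounds, $c/(M\sqrt{2^{i-1}})$; the function name and the constants c and M are illustrative.

```python
import math

def doubling_phases(T, c=1.0, M=1.0):
    """Return (first_round, last_round, step_size) for each phase needed to
    cover rounds 1..T. In practice T is unknown and phases are generated on
    the fly; T is passed here only to truncate the last phase."""
    phases = []
    i = 1
    while 2 ** (i - 1) <= T:
        start = 2 ** (i - 1)
        end = min(2 ** i - 1, T)
        eta = c / (M * math.sqrt(2 ** (i - 1)))  # constant within the phase
        phases.append((start, end, eta))
        i += 1
    return phases
```

A learner would restart the constant-step-size algorithm at the beginning of each phase, so no single phase needs to know the overall horizon.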


Other Rates It is possible to tighten the regret 
rate if the cost functions are more “regular.” 
Intuitively, this means the environment is more 
constrained hence can only be less adversarial. 
Indeed, if $f_t - \frac{\sigma}{2}|\cdot|^2$ is convex, namely, $f_t$ is
$\sigma$-strongly convex, Hazan et al. [6] showed that the
online gradient learner equipped with a smaller
step size $\eta_t = \frac{1}{\sigma t}$ suffers only logarithmic regret
$O(\log T)$, an exponential improvement compared
to Theorem 2. As with the time horizon, it
is possible to achieve the same logarithmic regret
without knowing the parameter $\sigma$; see [1]. Similarly,
if $f_t$ is so-called exponentially concave, a
similar logarithmic regret can be achieved using 
a second-order Newton-type learner [6]. 


Extension to High Dimensions The above anal- 
ysis easily extends to high dimensions. In fact, 
Theorems 1 and 2 hold in any abstract Hilbert
space, with virtually the same proof (provided
that we replace the absolute value with the Hilbert
norm). The cost functions $f_t$ need not be differentiable
either; picking an arbitrary subgradient
in the subdifferential $\partial f_t(x_t)$ would suffice.


Extension to Composite Functions The regret 
can be extended to include a penalty function g 
as follows: 


$$R_T(x) = \sum_{t=1}^{T} f_t(x_t) + g(x_t) - f_t(x) - g(x). \tag{13}$$

Our previous definition in (1) corresponds to the
setting where $g(x) = 0$ iff $x \in D$ (otherwise the
regret is set to $\infty$). We could simply treat $f_t + g$ as
a whole and apply the online gradient algorithm
without any modification. A different approach,
resulting in a similar regret bound, upgrades the
projection to the proximity operator (of g):

$$P_g^{\eta}(x) = \operatorname*{argmin}_{y} \frac{1}{2\eta}|x - y|^2 + g(y), \tag{14}$$


where $\eta > 0$ is the step size to be chosen appropriately.
The latter approach is not only more
general but also leads to more structured intermediate
predictions [4]. For instance, if $g(x) = \sum_i |x_i|$
is the $\ell_1$ norm, then $[P_g^{\eta}(x)]_i = \operatorname{sign}(x_i) \cdot \max\{|x_i| - \eta, 0\}$,
which would be exactly zero
if $|x_i|$ is small and $\eta$ is large. In contrast, if we
apply online gradient descent directly to $f_t + g$,
we would almost never get sparse intermediate
predictions.
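The soft-thresholding formula above is easy to try out. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the update `prox_gradient_step` assumes that the proximity operator simply takes the place of the projection in the gradient update, as the text suggests.

```python
def soft_threshold(x, eta):
    """Proximity operator of g(y) = sum_i |y_i| with step size eta (coordinate-wise)."""
    out = []
    for xi in x:
        magnitude = max(abs(xi) - eta, 0.0)
        sign = (xi > 0) - (xi < 0)
        out.append(sign * magnitude)
    return out


def prox_gradient_step(x, grad, eta):
    """One assumed update: gradient step on f_t, then the proximity operator of g."""
    moved = [xi - eta * gi for xi, gi in zip(x, grad)]
    return soft_threshold(moved, eta)
```

Coordinates whose magnitude is at most the step size are mapped to exactly zero, which is the source of the sparse intermediate predictions.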


Without Projections The online gradient
learner is computationally efficient only when
the projection $P_D(\cdot)$ in (6) (or more generally the
proximity operator in (14)) can be efficiently
implemented. In some applications, this is
unfortunately not the case. Instead, Hazan
and Kale [5] proposed a different learner that
bypasses the projection step. Basically, the
learner iteratively finds the vertices of the
decision set D and then takes suitable convex
combinations of them to make progress.


Connection to Stochastic Optimization The
regret bound in Theorem 2 is closely related to
some results in stochastic optimization, for the
following problem [7]:

$$\inf_{x \in D} f(x), \quad \text{where } f(x) := \mathbf{E}_{\xi}(F(x, \xi)), \qquad (15)$$

and $\xi$ is some random variable. The stochastic
(sub)gradient method is a popular iterative algorithm
for optimizing (15). In each iteration, it
randomly draws an independent sample $\xi_t$ and
follows the projected (sub)gradient update:

$$x_{t+1} = P_D(x_t - \eta_t \nabla_x F(x_t, \xi_t)),$$

for some small step size $\eta_t > 0$. The similarity
to the online gradient learner is apparent once
we identify $f_t(x) := F(x, \xi_t)$. Thus, the regret
bound in Theorem 2 implies


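$$
\begin{aligned}
O\left(\frac{1}{\sqrt{T}}\right)
&\ge \sup_{x \in D} \mathbf{E}\left[\frac{1}{T}\sum_{t=1}^{T} \big(f_t(x_t) - f_t(x)\big)\right] \\
&= \sup_{x \in D} \mathbf{E}\left[\frac{1}{T}\sum_{t=1}^{T} \big(F(x_t, \xi_t) - F(x, \xi_t)\big)\right] \\
&= \mathbf{E}\left[\frac{1}{T}\sum_{t=1}^{T} f(x_t)\right] - \inf_{x \in D} f(x) \\
&\ge \mathbf{E}\left[f\left(\frac{1}{T}\sum_{t=1}^{T} x_t\right)\right] - \inf_{x \in D} f(x),
\end{aligned}
$$

provided that the random sample $\xi_t$ is
independent of $x_t$ and $F(\cdot, \xi)$ is convex for
(almost) every realization of $\xi$. In other words,
the ergodic mean $\frac{1}{T}\sum_{t=1}^{T} x_t$ approaches, in
expectation, the infimum in (15) at the rate
$O(1/\sqrt{T})$.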
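As a rough numerical illustration of the ergodic mean, the following Python sketch runs the projected stochastic gradient update on a one-dimensional toy problem. Everything specific here is an assumption made for the example: the decision set D = [-1, 1], the loss $F(x, \xi) = (x - \xi)^2/2$ with $\xi$ uniform on [0, 1) (so the minimizer of f is 0.5), and the step size $\eta_t = 1/\sqrt{t}$.

```python
import random


def project_interval(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def averaged_sgd(grad, sample, T, lo=-1.0, hi=1.0, x0=0.0):
    """Run projected SGD for T rounds and return the ergodic mean of x_1, ..., x_T."""
    x = x0
    total = 0.0
    for t in range(1, T + 1):
        total += x                       # accumulate x_t before updating
        xi = sample()                    # fresh sample, independent of x_t
        eta = 1.0 / t ** 0.5             # assumed step size of order 1/sqrt(t)
        x = project_interval(x - eta * grad(x, xi), lo, hi)
    return total / T


rng = random.Random(0)
# Gradient of F(x, xi) = (x - xi)^2 / 2 with respect to x is x - xi.
avg = averaged_sgd(grad=lambda x, xi: x - xi, sample=rng.random, T=10000)
```

With these choices the returned average should land close to 0.5, the minimizer of the expected loss.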


Cross-References 


Multi-armed Bandit Problem 


Recommended Reading 


1. Bartlett PL, Hazan E, Rakhlin A (2007) Adaptive 
online gradient descent. In: Platt JC, Koller D, Singer 
Y, Roweis ST (eds) Advances in neural information 
processing systems 20 (NIPS). Curran Associates, Inc., 
Vancouver, pp 257-269 

2. Bubeck S, Cesa-Bianchi N (2012) Regret analysis of 
stochastic and nonstochastic multi-armed bandit prob- 
lems. Found Trends Mach Learn 5(1):1-122 

3. Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, 
and games. Cambridge University Press, New York 

4. Duchi JC, Shalev-Shwartz S, Singer Y, Tewari A 
(2010) Composite objective mirror descent. In: Kalai 
AT, Mohri M (eds) The 23rd conference on learning 
theory (COLT). Haifa, pp 14-26 

5. Hazan E, Kale S (2012) Projection-free online learn- 
ing. In: Langford J, Pineau J (eds) The 29th interna- 
tional conference on machine learning (ICML), Edin- 
burgh. Omnipress, pp 521-528 

6. Hazan E, Agarwal A, Kale S (2007) Logarithmic re- 
gret algorithms for online convex optimization. Mach 
Learn 69:169-192 



7. Nemirovski A, Juditsky A, Lan G, Shapiro A (2009)
Robust stochastic approximation approach to stochastic
programming. SIAM J Optim 19(4):1574-1609

8. Shalev-Shwartz S (2011) Online learning and on- 
line convex optimization. Found Trends Mach Learn 
4(2):107-194 

9. Zinkevich M (2003) Online convex programming
and generalized infinitesimal gradient ascent. In:
Fawcett T, Mishra N (eds) The 20th international
conference on machine learning (ICML), Washington.
AAAI Press, pp 928-936


Online List Update 


Shahin Kamali 
David R. Cheriton School of Computer Science, 
University of Waterloo, Waterloo, ON, Canada 


Keywords 


Competitive analysis; Data compression; Online 
computation; Self-adjusting lists 


Years and Authors of Summarized 
Original Work 


1985; Sleator, Tarjan 


Problem Definition 


List update is one of the classic problems in 
the context of online computation. The main 
motivation for the study of the problem is self- 
adjusting lists. Consider a linear list which rep- 
resents a dictionary abstract data type. There 
are three elementary operations in the dictionary, 
namely, insertion, deletion, and lookup (search). 
To perform these operations on an item x, an 
algorithm needs to search for x, i.e., examine the 
list items, one by one, to find x. For the case 
of an insertion, all items should be sequentially 
checked to ensure that the inserted item is not 
already in the list. A deletion also requires finding 
the item that is being deleted. In this manner, all 
operations can be translated into a sequence of 
lookups or accesses to the items in the list. To
access an item at index i, an algorithm examines
i items and therefore incurs an access cost of i.
Immediately after the access, the algorithm can
move the accessed item to any position closer to
the front of the list at no extra cost; this is called
a free exchange. It is also possible to exchange
any two consecutive items at a cost of 1 through a
paid exchange. The objective is to organize the
list, using free and paid exchanges, so that the 
total cost (for accesses and paid exchanges) is 
minimized. 

The list update is naturally an online problem, 
i.e., at the time of accessing an item, it is not clear 
what items will be requested in the future. An 
online algorithm has to take its decision without 
any knowledge about the forthcoming requests. 
For example, Move-To-Front (MTF) is a well- 
known list update algorithm which moves an 
accessed item to the front using a free exchange. In
taking its decision, MTF does not rely on any 
information about future requests. Among other 
classic list update algorithms, we might mention 
Transpose (TRANS) and Frequency Count (FC). 
After accessing an item, TRANS moves it one step 
closer to the front, i.e., it exchanges the position 
of x with its preceding item. FC maintains the list 
in a way that more frequent items appear closer 
to the front. In doing so, it maintains a counter 
for each item x which indicates the number of 
previous requests to x. 
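The three rules can be compared with a small simulator. The Python sketch below is only illustrative: the function name `serve` is hypothetical, access costs follow the 1-based index convention above, and for FC it is assumed that an item moves forward only past items with a strictly smaller request count (the text does not fix a tie-breaking rule).

```python
def serve(requests, initial, rule):
    """Serve requests on a list; return (total access cost, final list)."""
    lst = list(initial)
    counts = {item: 0 for item in lst}
    cost = 0
    for x in requests:
        i = lst.index(x)             # 0-based position of x
        cost += i + 1                # accessing index i + 1 costs i + 1
        counts[x] += 1
        if rule == "MTF":
            lst.insert(0, lst.pop(i))            # free exchange to the front
        elif rule == "TRANS":
            if i > 0:                            # swap with the preceding item
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
        elif rule == "FC":
            j = i
            # Assumed tie-breaking: pass only items with strictly smaller counts.
            while j > 0 and counts[lst[j - 1]] < counts[x]:
                j -= 1
            lst.insert(j, lst.pop(i))
        else:
            raise ValueError("unknown rule: " + rule)
    return cost, lst
```

All moves here bring the accessed item closer to the front, so they are free exchanges and only access costs are counted.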

Competitive analysis is the standard method 
for the study and classification of list update algo- 
rithms. An algorithm is said to be c-competitive 
if the cost of serving any request sequence never 
exceeds c times the optimal cost of an offline 
algorithm OPT which knows the entire sequence 
in advance. More precisely, an algorithm A is c-competitive
if $A(\sigma) \le c \cdot \mathrm{OPT}(\sigma) + b$ for any
sequence $\sigma$. Here, $A(\sigma)$ and $\mathrm{OPT}(\sigma)$ respectively
denote the costs of A and OPT for serving $\sigma$, and
c and b are constants.


Key Results 


List update algorithms were initially studied in
regard to their typical behavior on sequences that
follow probability distributions. The average cost
ratio of an algorithm A is the ratio between the
expected cost of A for a random sequence and
the cost of an optimal offline algorithm which
arranges items in nonincreasing order by probability.
Under this setting, the ratio achieved by FC
is 1 [14], while that of MTF is $\pi/2$ [7]. Moreover,
there are distributions in which TIMESTAMP has a
better ratio than MTF [14]. These results indicate
that FC and TIMESTAMP are better than MTF.
However, in practice, MTF has an advantage over
the other algorithms. This is partially because the
input sequences do not necessarily follow a fixed
probability distribution.

In their seminal paper, Sleator and Tarjan 
proved that MTF is 2-competitive, while TRANS 
and FC do not achieve a constant competitive
ratio [15]. At the same time, for sufficiently long 
lists, no algorithm can achieve a competitive ratio 
better than 2. For a while, MTF was the only 
algorithm with optimal competitive ratio until 
Albers introduced the TIMESTAMP algorithm [1]. 
After accessing an item x, TIMESTAMP inserts 
x in front of the first item y that is before x in 
the list and is requested at most once since the 
last request for x. If there is no such item y, or 
if this is the first access to x, TIMESTAMP does 
not reorganize the list. TIMESTAMP is also 2- 
competitive [1]. 
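The TIMESTAMP rule can be sketched directly from this description. The Python code below is a simple, unoptimized illustration; the function name is hypothetical, and the request history is rescanned on every access for clarity rather than speed.

```python
def timestamp_serve(requests, initial):
    """Serve requests with the TIMESTAMP rule; return (total access cost, final list)."""
    lst = list(initial)
    history = []                     # all requests served so far
    cost = 0
    for x in requests:
        i = lst.index(x)
        cost += i + 1
        # Position in the history of the previous request to x, if any.
        last = None
        for t in range(len(history) - 1, -1, -1):
            if history[t] == x:
                last = t
                break
        if last is not None:
            window = history[last + 1:]          # requests since the last request to x
            for j in range(i):                   # scan the items before x from the front
                if window.count(lst[j]) <= 1:    # y requested at most once in the window
                    lst.insert(j, lst.pop(i))    # insert x in front of y
                    break
        history.append(x)
    return cost, lst
```

On the first access to an item, or when every preceding item was requested at least twice since the last request to x, the list is left unchanged.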


Randomized Algorithms 

No randomized list update algorithm can be bet- 
ter than 2-competitive against adaptive adver- 
saries. However, there are randomized algorithms 
with better competitive ratios against oblivious 
adversaries. Reingold et al. introduced a random- 
ized algorithm called BIT which assigns a bit to 
each item x and initially sets it, uniformly and 
randomly, to be 0 or 1. At the time of an access to 
an element x, the bit of x is complemented, and if 
it becomes 1, the algorithm moves x to the front. 
BIT has a competitive ratio of 1.75 [13]. Albers 
et al. proposed a hybrid algorithm, called COMB, 
which randomly selects between TIMESTAMP 
and BIT strategies [4]. Upon a request to an item, 
the algorithm applies BIT strategy with proba- 
bility 0.8 and TIMESTAMP with probability 0.2. 
COMB has a competitive ratio of 1.6 [4] which is 
the best among existing algorithms. Teia proved 
that no randomized algorithm can be better than 
1.5-competitive [16]. 
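A minimal Python sketch of BIT follows. The function name and the `rng` parameter are hypothetical; any object with a `randint(a, b)` method, such as `random.Random`, can supply the initial bits.

```python
import random


def bit_serve(requests, initial, rng):
    """Serve requests with the BIT rule; return (total access cost, final list)."""
    lst = list(initial)
    # One uniformly random bit per item, drawn once at the start.
    bit = {item: rng.randint(0, 1) for item in lst}
    cost = 0
    for x in requests:
        i = lst.index(x)
        cost += i + 1
        bit[x] ^= 1                      # complement the bit of x
        if bit[x] == 1:
            lst.insert(0, lst.pop(i))    # move x to the front
    return cost, lst
```

Since the only randomness is in the initial bits, each item is moved to the front on every second access to it; a call such as `bit_serve("ccb", "abc", random.Random(1))` runs one random instance.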
Locality of Reference

Real-life sequences usually exhibit locality 
of reference which implies that the currently 
requested item is more likely to be requested 
again. One model of locality is concave analysis
in which the sequences are consistent with a
concave function f so that the number of distinct
requests in any window of size t is at most
f(t). MTF is the unique optimal solution under
bijective analysis for sequences that have locality 
of reference with respect to concave analysis [6]. 
Bijective analysis is an alternative to competitive 
analysis that directly compares two algorithms 
based on their worst-case and average-case 
behavior (see [6] for details). 

Inspired by the concave analysis, Dorrigiv
et al. [8] defined the nonlocality of a sequence
$\sigma$ of length n, denoted by $\hat{\lambda}(\sigma)$, as $\sum_{i=1}^{n} d_i$, in
which $d_i$ is the number of distinct items requested
since the last request to the ith item in $\sigma$. For the
first request to an item, $d_i$ is equal to the length
$l$ of the list. For any sequence $\sigma$, the cost of any
online algorithm is at least $\hat{\lambda}(\sigma)$, while MTF has
the same cost of $\hat{\lambda}(\sigma)$. The cost of TIMESTAMP
is at least $2\hat{\lambda}(\sigma)$, and TRANS and FC both have a
cost of at least $l/2 \times \hat{\lambda}(\sigma)$ [8]. These results imply
an advantage for MTF when sequences have high
locality.
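The nonlocality measure can be computed with a single pass over the sequence. The Python sketch below makes one assumption that the text does not spell out: the window used for $d_i$ runs from just after the previous request to the same item up to and including the current request, so an immediately repeated request has $d_i = 1$ (which matches the cost MTF pays for it). The function name is hypothetical.

```python
def nonlocality(sigma, list_length):
    """Sum of d_i over the sequence, with d_i = list_length on first requests."""
    last_seen = {}
    total = 0
    for i, x in enumerate(sigma):
        if x not in last_seen:
            d = list_length                      # first request to x
        else:
            # Assumed convention: distinct items in the window ending at request i.
            d = len(set(sigma[last_seen[x] + 1:i + 1]))
        last_seen[x] = i
        total += d
    return total
```

For example, with a list of length 2 the sequence "aab" has $d = (2, 1, 2)$ under this convention.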

Albers and Lauer defined an alternative locality
model which assigns a value $\lambda \in [0, 1]$
to each sequence [2]. Larger values of $\lambda$
imply a higher locality. Using this notion of
locality, the competitive ratio of MTF is at most
$\frac{2}{1+\lambda}$, i.e., for sequences with high locality, MTF
is 1-competitive. The ratio of TIMESTAMP does
not improve on request sequences satisfying $\lambda$-locality,
i.e., it remains 2-competitive. The same
holds for algorithm COMB, i.e., it remains 1.6-competitive.
However, for the algorithm BIT the
competitive ratio improves to $\min\{1.75, \frac{2+\lambda}{1+\lambda}\}$.


Applications 


As mentioned earlier, the basic application of
the list update is in maintaining self-adjusting
lists. Martinez and Roura [11], and also Munro
[12], observed that the cost of a complete rearrangement
of the items which precede an item at position i is
proportional to i rather than $i^2$. For example,
accessing the item at the end of a list and
reversing the list can be done in linear time, while
under the standard model, it has a quadratic cost.
In the resulting MRM model, after an access to an item at
index i, the preceding items can be rearranged free
of charge. It is known that any online algorithm
has a competitive ratio of $\Omega(l / \lg l)$ for a list
of length $l$ under the MRM model [11, 12], i.e.,
under this practical setting of the problem, no
online algorithm can be competitive (see [9] for
details).

List update is widely used for compression 
purposes. Consider each character of a text as 
an item in the list and the text as the input 
sequence. A compression algorithm writes an 
arbitrary initial configuration in the compressed 
file, as well as the access costs of a list update 
algorithm A for serving each character. In the 
decompression phase, the algorithm starts from 
the same initial configuration and follows the 
steps of the algorithm by reading the access cost 
written in the compressed file. In order to enhance 
the performance of the compression schemes, the 
Burrows-Wheeler Transform (BWT) can be ap- 
plied to the input string to increase the locality of 
the input. The bzip2 compression program which 
applies MTF after BWT transform outperforms 
the widely used gzip program by more than 5 % 
on the standard Canterbury corpus. 
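The encoding and decoding steps described above can be sketched with MTF as the list update algorithm. In this Python illustration the function names and the `alphabet` argument (the agreed initial configuration) are hypothetical, and the output is a plain list of access costs rather than an actual bit-level code.

```python
def mtf_encode(text, alphabet):
    """Replace each character by its access cost (1-based index) under MTF."""
    lst = list(alphabet)
    codes = []
    for ch in text:
        i = lst.index(ch)
        codes.append(i + 1)
        lst.insert(0, lst.pop(i))    # move the accessed character to the front
    return codes


def mtf_decode(codes, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    lst = list(alphabet)
    out = []
    for c in codes:
        ch = lst.pop(c - 1)          # the cost tells which position was accessed
        out.append(ch)
        lst.insert(0, ch)
    return "".join(out)
```

Inputs with high locality produce many small codes, which is why a locality-increasing transform such as the BWT helps before this step.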

To theoretically study the list update problem
in the context of compression, it is better to
assume the cost of accessing an item at index
i is $O(\log i)$ rather than $O(i)$. This is because,
when an item is accessed in the ith position,
the value of i is written as a binary code rather
than unary. Sleator and Tarjan show that MTF
is 2-competitive if the access cost is a convex
function. On the other hand, some algorithms are
competitive when the access cost is linear and
noncompetitive when the access cost is $O(\log i)$.
However, this is not the case for MTF, and it has
been shown that MTF is 2-competitive when the
access cost is logarithmic [10]. In other words,
MTF is useful for compression, even for the
sequences (files) generated by an adversary.


Open Problems 


The competitive ratio of the best randomized 
algorithm lies in the range [1.5, 1.6]. Closing this 
gap is an important direction for future research. 
Almost all existing algorithms have the projective
property which informally means that the relative
position of any two items in the lists maintained
by these algorithms only depends on the requests
to these items. Projective algorithms are analyzed
under the partial cost model where the cost of
accessing an item at index i is $i - 1$. It is known
that no online algorithm with the projective property
can achieve a competitive ratio better than
1.6 under the partial cost model [5]. Hence, to
introduce an algorithm with competitive ratio
better than 1.6 of COMB, one needs to deviate from
the projective property.

Reingold et al. introduced another model,
called the d-paid exchange model, in which the
cost of paid exchanges is scaled up by a value
$d \ge 1$, while free exchanges are not allowed [13].
Under this model, no deterministic algorithm
can be better than 3-competitive [13], while
the best existing algorithm is 4.56-competitive
(reported in [3]). For the particular case of
d = 1, the best lower and upper bounds are,
respectively, 3 and 4 (MTF is 4-competitive).
Closing these gaps is another direction for future
research.
Problem Definition


Load balancing of temporary tasks is an on- 
line problem. In this problem, arriving tasks (or 
jobs) are to be assigned to processors, which 
are also called machines. In this entry, determin- 
istic online load balancing of temporary tasks 
with unknown duration is discussed. The input 
sequence consists of departures and arrivals of 
tasks. If the sequence consists of arrivals only, the 
tasks are called permanent. Events happen one 
by one, so that the next event appears after the 
algorithm completes dealing with the previous 
event. 

Clearly, the problem with temporary tasks is 
different from the problem with permanent tasks. 
One such difference is that for permanent tasks, 
the maximum load is always achieved in the end 
of the sequence. For temporary tasks, this is not 
always the case. Moreover, the maximum load 
may be achieved at different times for different 
algorithms. 

In the most general model, there are m machines
1, ..., m. The information of an arriving
job j is a vector $p_j$ of length m, where $p_j^i$ is the
load or size of job j if it is assigned to machine
i. As stated above, each job is to be assigned to a
machine before the next arrival or departure. The
load of a machine i at time t is denoted by $L_i^t$
and is the sum of the loads (on machine i) of jobs
which are assigned to machine i that arrived by
time t and did not depart by this time. The goal
is to minimize the maximum load of any machine
over all times t. This machine model is known
as unrelated machines (see [3] for a study of the 
load-balancing problem of permanent tasks on 
unrelated machines). Many more specific models 
were defined. In the sequel, a few such models 
are described. 

For an algorithm A, denote its cost by A as 
well. The cost of an optimal offline algorithm 
that knows the complete sequence of events in 
advance is denoted by OPT. Load balancing is 
studied in terms of the (absolute) competitive 
ratio. The competitive ratio of A is the infimum
R such that for any input, $A \le R \cdot \mathrm{OPT}$. If the
competitive ratio of an online algorithm is at most 
C, it is also called C-competitive. 

Uniformly related machines [3, 12] are machines
with speeds associated with them; thus,
machine i has speed $s_i$, and the information
that a job j needs to provide upon its arrival is
just its size, or the load that it incurs on a unit
speed machine, which is denoted by $p_j$. Then, let
$p_j^i = p_j / s_i$. If all speeds are equal, this results
in identical machines [13].

Restricted assignment [8] is a model where
each job may be run only on a subset of the
machines. A job j is associated with a running
time $p_j$, which is the time to run it on any of its
permitted machines $M_j$. Thus, if $i \in M_j$, then
$p_j^i = p_j$, and otherwise, $p_j^i = \infty$.


Key Results 


The known results in all four models are surveyed 
below. 


Identical Machines 

Interestingly, the well-known algorithm of Gra- 
ham [13], List Scheduling, which is defined for 
identical machines, is valid for temporary tasks as 
well as permanent tasks. This algorithm greedily
assigns a new job to the least loaded machine.
The competitive ratio of this algorithm is $2 - 1/m$,
which is best possible (see [5]). Note that the
competitive ratio is the same as for permanent 
tasks, but for permanent tasks, it is possible to 
achieve a competitive ratio which does not tend 
to 2 for large m, see, e.g., [11]. 
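The greedy rule is straightforward to simulate with departures included. The following Python sketch is illustrative only; the event format and the function name are hypothetical, and ties between equally loaded machines are broken by lowest index.

```python
def list_scheduling(events, m):
    """Greedy assignment on m identical machines; returns the maximum load ever seen.

    events: ("arrive", job_id, size) or ("depart", job_id), processed in order.
    """
    load = [0.0] * m
    where = {}                       # job_id -> (machine, size)
    peak = 0.0
    for ev in events:
        if ev[0] == "arrive":
            _, job, size = ev
            i = min(range(m), key=lambda k: load[k])    # least loaded machine
            load[i] += size
            where[job] = (i, size)
            peak = max(peak, load[i])
        else:
            _, job = ev
            i, size = where.pop(job)
            load[i] -= size
    return peak
```

Because tasks are temporary, the peak is tracked after every arrival rather than read off at the end of the sequence.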


Uniformly Related Machines 

The situation for uniformly related machines is
not very different. In this case, the algorithms of
Aspnes et al. [3] and of Berman et al. [12] cannot
be applied as they are, and some modifications
are required. The algorithm of Azar et al. [7]
has a competitive ratio of at most 20, and it is
based on the general method introduced in [3].
The algorithm of [3] keeps a guess value $\lambda$, which
is an estimate of the cost of an optimal offline
algorithm OPT. An invariant that must be kept is
$\lambda \le 2 \cdot \mathrm{OPT}$. At each step, a procedure is applied
for some value of $\lambda$ (which can be initialized as
the load of the first job on the fastest machine).
The procedure for a given value of $\lambda$ is applied
until it fails, and some job cannot be assigned
while satisfying all conditions. The procedure is
designed so that if it fails, then it must be the
case that $\mathrm{OPT} > \lambda$. In that case, the value of $\lambda$ is doubled,
and the procedure is reinvoked for the new value,
ignoring all assignments that were done for smaller
values of $\lambda$. This method is called doubling and
results in an algorithm with a competitive ratio
which is at most four times the competitive ratio
achieved by the procedure. The procedure for a
given $\lambda$ acts as follows. Let c be a target competitive
ratio for the procedure. The machines are
sorted according to speed. Each job is assigned
to the first machine in the sorted order such that
the job is assignable to it. A job j arriving at time
t is assignable to machine i if $p_j / s_i \le \lambda$ and
$L_i^{t-1} + p_j / s_i \le c\lambda$. It is shown in [7] that c = 5
allows the algorithm to succeed in the assignment
of all jobs (i.e., to have at least one assignable
machine for each job) as long as $\mathrm{OPT} \le \lambda$.
Note that the constant c for permanent tasks used
in [3] is 2. As for lower bounds, it is shown in
[7] that the competitive ratio R of any algorithm
satisfies $R \ge 3 - o(1)$. The upper bound has
been improved to $6 + 2\sqrt{5} \approx 10.47$ by Bar-Noy
et al. [9].
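The doubling method with the assignability test can be sketched as follows. This Python code is a hedged illustration, not the algorithm of [7] verbatim: the event format and names are hypothetical, the machines are assumed to be scanned from slowest to fastest (the text only says "sorted according to speed"), and loads are counted per guess so that earlier assignments are ignored after a doubling.

```python
def assign_related(events, speeds, c=5.0):
    """Return ({job: machine} for jobs still present, final guess lambda).

    events: ("arrive", job_id, size) or ("depart", job_id), processed in order.
    """
    m = len(speeds)
    order = sorted(range(m), key=lambda i: speeds[i])   # ASSUMPTION: slowest first
    lam = None
    phase = 0
    phase_load = [0.0] * m       # load of jobs assigned under the current guess only
    placed = {}                  # job_id -> (machine, load, phase)
    for ev in events:
        if ev[0] == "depart":
            i, p, ph = placed.pop(ev[1])
            if ph == phase:
                phase_load[i] -= p
            continue
        _, job, size = ev
        if lam is None:
            lam = size / max(speeds)     # load of the first job on the fastest machine
        while True:
            target = None
            for i in order:
                p = size / speeds[i]
                # Assignable: p_j/s_i <= lambda and L_i + p_j/s_i <= c * lambda.
                if p <= lam and phase_load[i] + p <= c * lam:
                    target = i
                    break
            if target is not None:
                break
            lam *= 2.0                   # the procedure failed: double the guess
            phase += 1
            phase_load = [0.0] * m       # ignore assignments made for smaller guesses
        p = size / speeds[target]
        phase_load[target] += p
        placed[job] = (target, p, phase)
    return {job: i for job, (i, _, _) in placed.items()}, lam
```

The inner loop always terminates, since after enough doublings the fastest machine satisfies both conditions with an empty per-guess load.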
Restricted Assignment

As for restricted assignment, temporary tasks
make this model much more difficult than permanent
tasks. The competitive ratio $O(\log m)$
which is achieved by a simple greedy algorithm
(see [8]) does not hold in this case. In fact,
the competitive ratio of this algorithm becomes
$\Omega(m^{2/3})$ [4]. Moreover, in the same paper, a lower
bound of $\Omega(\sqrt{m})$ on the competitive ratio of any
algorithm was shown. The construction was quite
involved; however, Ma and Plotkin [14] gave a
simplified construction which yields the same
result.

The construction of [14] selects a value
p, which is the largest integer that satisfies
$p + p^2 \le m$. Clearly, $p = \Theta(\sqrt{m})$. The lower
bound uses two sets of machines, p machines
which are called "the small group" and $p^2$
machines which are called "the large group."
The construction consists of $p^2$ phases, each of
which consists of p jobs and is dedicated to one
machine in the large group. In phase i, job k of
this phase can run either on the k-th machine of
the small group or the i-th machine of the large
group. After this arrival, only one of these p jobs
does not depart. An optimal offline algorithm
assigns all jobs in each phase to the small group
except for the one job that will not depart. Thus,
when the construction is completed, it has one
job on each machine of the large group. The
maximum load ever achieved by OPT is 1.
However, the algorithm does not know at each
phase which job will not depart. If no job is
assigned to the small group in phase i, then the
load of machine i becomes p. Otherwise, a job
that the algorithm assigns to the small group is
chosen as the one that will not depart. In this way,
after $p^2$ phases, a total load of $p^2$ is accumulated
on the small group, which means that at least one
machine there has load p. This completes the
construction.

An alternative algorithm called ROBIN HOOD
was designed in [7]. This algorithm keeps a lower
bound on OPT, which is the maximum between
the following two functions. The first one is the
maximum average machine load over time. The
second is the maximum job size that has ever
arrived. Denote this lower bound at time t (after
t events have happened) by $B^t$. A machine i is
called rich at time t if $L_i^t \ge \sqrt{m} B^t$. Otherwise,
it is called poor. The windfall time of a rich
machine i at time t is the time $t'$ such that i is
poor at time $t' - 1$ and rich at times $t', \ldots, t$, i.e.,
the last time that machine i became rich. Clearly,
machines can become poor due to an update of
$B^t$ or departure of jobs. A machine can become
rich due to arrival of jobs that are assigned to it.

The algorithm assigns a job j to a poor machine
in M(j) if such a machine exists. Otherwise,
j is assigned to the machine in M(j) with
the most recent windfall time. The analysis makes
use of the fact that at most $\sqrt{m}$ machines can be
rich simultaneously.
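The assignment rule can be sketched in Python as below. This is a simplified illustration under stated assumptions: only arrivals are simulated (departures are omitted to keep the sketch short), the lower bound is updated with the arriving job before richness is evaluated, and the first poor machine in the permitted list is chosen when several exist. The function name and input format are hypothetical.

```python
import math


def robin_hood(jobs, m):
    """Assign arriving jobs; jobs is a list of (size, permitted_machines).

    Returns the list of chosen machines. Arrivals only (a simplifying assumption).
    """
    load = [0.0] * m
    rich_since = [None] * m      # windfall time of each rich machine, None if poor
    bound = 0.0                  # lower bound B on OPT
    total = 0.0
    choice = []
    for t, (size, allowed) in enumerate(jobs, start=1):
        total += size
        # B = max(largest job seen, largest average machine load seen).
        bound = max(bound, size, total / m)
        threshold = math.sqrt(m) * bound
        # A larger bound can turn rich machines poor again.
        for i in range(m):
            if load[i] < threshold:
                rich_since[i] = None
        poor = [i for i in allowed if rich_since[i] is None]
        if poor:
            i = poor[0]
        else:
            i = max(allowed, key=lambda k: rich_since[k])   # most recent windfall
        load[i] += size
        if rich_since[i] is None and load[i] >= threshold:
            rich_since[i] = t    # machine i becomes rich at time t
        choice.append(i)
    return choice
```

With m = 9 and unit jobs, a machine becomes rich once it holds three jobs; when all permitted machines are rich, the one that became rich last receives the job.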

Note that for small values of m ($m \le 5$), the
competitive ratio of the greedy algorithm is still
best possible, as shown in [1]. In this paper, it was
shown that these bounds are $(m + 3)/2$ for m =
3, 4, 5. It is not difficult to see that for m = 2, the
best bound is 2.


Unrelated Machines 

The most extreme difference occurs for unrelated
machines. Unlike the case of permanent
tasks, where an upper bound of $O(\log m)$ can
be achieved [3], it was shown in [2] that any
algorithm has a competitive ratio of $\Omega(m / \log m)$.
Note that a trivial algorithm, which assigns each
job to the machine where it has a minimum load,
has a competitive ratio of at most m [3].


Applications 


In [10], a hierarchical model was studied. This
is a special case of restricted assignment where
for each job j, M(j) is a prefix of the machines.
They showed that even for temporary tasks, an
algorithm of constant competitive ratio exists for
this model.

In [6], which studied resource augmentation in
load balancing, temporary tasks were considered
as well. Resource augmentation is a type of
analysis where the online algorithm is compared
to an optimal offline algorithm which has fewer
machines.
Open Problems


Small gaps still remain for both uniformly re- 
lated machines and for unrelated machines. For 
unrelated machines, it could be interesting to 
find if there exists an algorithm of competitive 
ratio o(m) or whether the simple algorithm stated 
above has optimal competitive ratio (up to a 
multiplicative factor). 
Problem Definition


We are given an undirected graph G = (V, E)
offline, where node v has a given weight $w_v$.
Initially, the output graph $H \subseteq G$ is the empty
graph. In the generic online Steiner network design
problem, each online step has a connectivity
request $C_i$ and the online algorithm must augment
the output graph H to meet the new request.
We will consider the following problems in this
domain:


• Steiner tree. Each connectivity request $C_i$
comprises a new vertex $t_i \in V$ (called a
terminal) that must be connected in H to all
previous terminals. (The first terminal $t_0$ is
often called the root and the constraint $C_i$ can
then be restated as connecting terminal $t_i$ to
the root.)

• Steiner forest. Each connectivity request $C_i$
comprises a new vertex pair $(s_i, t_i)$ (called a
terminal pair) that must be connected in H.

• Group Steiner tree. Each connectivity request
$C_i$ comprises a new set (group) of vertices
$T_i \subseteq V$ (called a terminal group). The first
terminal group $T_0$ is a single vertex r called
the root. At least one vertex in each terminal
group must be connected in H to the root.

• Group Steiner forest. Each connectivity request
$C_i$ comprises a new pair of sets (groups)
of vertices $(S_i, T_i)$ (called a terminal group
pair). For each terminal group pair, at least
one vertex in $S_i$ must be connected in H to
at least one vertex in $T_i$.

• Prize-collecting Steiner tree (resp., Prize-collecting
Steiner forest). Each connectivity
request comprises a new terminal $t_i$ (resp.,
a new terminal pair $(s_i, t_i)$) and a penalty
$\pi_i > 0$; the algorithm must either pay the
penalty $\pi_i$ or augment graph H to connect
terminal $t_i$ to the root (resp., augment graph
H to connect the terminal pair $(s_i, t_i)$).


In the (group) Steiner tree and (group) Steiner 
forest problems, the objective is to minimize the 
total weight (i.e., sum of weights of vertices) of 
graph H. In the prize-collecting versions of these 
problems, the objective is to minimize the sum of 
the total weight of H and the sum of penalties 
paid by the algorithm. 


Key Results 


The following theorem was obtained by Naor 
et al. [7] for the online node-weighted Steiner tree 
problem. 


Theorem 1 There is a randomized online algorithm
for the node-weighted Steiner tree problem
that has a competitive ratio of $O(\log^2 k \log n)$
and runs in polynomial time.


This was the first result to obtain a polylogarithmic
competitive ratio for the online node-weighted
Steiner tree problem. The competitive
ratio for this problem was later improved to
$O(\log k \log n)$ (see [5]), which is tight up to
constants.

The lower bound follows from the observation
that the online set cover problem is a special case
of the online node-weighted Steiner tree problem.
For the online set cover problem, a lower
bound of $\Omega\left(\frac{\log m \log n}{\log\log m + \log\log n}\right)$ for deterministic
algorithms was obtained by Alon et al. [1], where
m is the number of sets and n is the number
of elements. This was later improved and extended
to a lower bound of $\Omega(\log m \log n)$ for
randomized algorithms by Korman [6]. An online
set cover instance can be encoded as an online
node-weighted Steiner tree instance where the
terminals are the elements and the nonterminals
are the sets. This encoding yields a lower bound
of $\Omega(\log k \log n)$ for the online node-weighted
Steiner tree problem and its generalizations discussed
below.

In addition to the Steiner tree problem, Naor
et al. [7] also considered the online node-weighted
Steiner forest problem and the online
node-weighted group Steiner tree problem.
In fact, they obtained the following theorem
for the online node-weighted group Steiner
forest problem which generalizes both these
problems.


Theorem 2 There is a randomized online algorithm
for the node-weighted group Steiner forest
problem that has a competitive ratio polylogarithmic
in n and k and runs in quasi-polynomial
time.


For edge-weighted graphs, the same competitive 
ratio was obtained with a polynomial-time 
algorithm. 

Subsequent to this work, Hajiaghayi et al. [4] 
investigated the online node-weighted Steiner 
forest problem and obtained the first polynomial- 
time algorithm with a polylogarithmic competi- 
tive ratio. 


Theorem 3 There is a randomized online
algorithm for the node-weighted Steiner forest
problem that has a competitive ratio of
$O(\log^2 k \log n)$ and runs in polynomial time.


The competitive ratio is tight up to a logarithmic
factor owing to the online set cover lower bound
described above. For graphs with an excluded
minor (such as planar graphs), they gave an
improved competitive ratio of $O(\log n)$ for this
problem, which is tight up to constants. Moreover,
the result can be extended to all {0, 1}-proper
functions which were introduced by Goemans
and Williamson [3] to capture a broad range
of connectivity problems and extended later to
node-weighted graphs by Demaine et al. [2].

For the prize-collecting variants of the online 
node-weighted Steiner tree and Steiner forest 
problems, Hajiaghayi et al. [5] gave the first 
algorithms with a polylogarithmic competitive 
ratio by showing that these problems can be 
reduced to the fractional versions of their non 
prize-collecting variants while losing only a log- 
arithmic factor in the competitive ratio. This led 
to the following results. 


Theorem 4 There is a randomized online algorithm
for the prize-collecting node-weighted
Steiner tree problem that has a competitive ratio
of $O(\log k \log^2 n)$. For the node-weighted prize-collecting
Steiner forest problem, there is a randomized
online algorithm that has a competitive
ratio of $O(\log^2 k \log^2 n)$. Both these algorithms
run in polynomial time.


Corresponding results for edge-weighted graphs 
were previously known [8]. 


Applications 


Online node-weighted Steiner problems have 
broad applications in designing communication 
networks where the clientele grows over 
time. 


Open Problems 


Suppose we are given a node-weighted
undirected graph G = (V, E). In the online
edge-connectivity (resp., vertex-connectivity)
version of the survivable network design problem
(SNDP), the online connectivity requirement
$C_i$ comprises a pair of terminals $(s_i, t_i)$ and an
integer requirement $r_i > 0$. The online algorithm
must augment the output graph H so that there
are $r_i$ edge-disjoint (resp., node-disjoint) paths
between $s_i$ and $t_i$ in H. The objective is to
minimize the total weight of H.

An interesting open problem is to obtain an
algorithm with competitive ratio
O(r_max^a log^b n), for any constants a and b,
for the online node-weighted SNDP problem with
either edge or vertex connectivity requirements,
where r_max = max_i r_i.


Experimental Results 


No experimental results are known. 
Problem Definition


A file-caching problem instance specifies a cache 
size k (a positive integer) and a sequence of 
requests to files, each with a size (a positive 
integer) and a retrieval cost (a nonnegative num- 
ber). The goal is to maintain the cache to satisfy 
the requests while minimizing the retrieval cost. 
Specifically, for each request, if the file is not 
in the cache, one must retrieve it into the cache 
(paying the retrieval cost) and remove other files 
to bring the total size of files in the cache to k 
or less. Weighted caching or weighted paging is 
the special case when each file size is 1. Paging 
is the special case when each file size and each 
retrieval cost is 1 (then the retrieval cost is the 
number of cache misses, and the fault rate is the 
average retrieval cost per request). 

An algorithm is online if its response to 
each request is independent of later requests. 
In practice this is generally necessary. Standard 
worst-case analysis is not meaningful for online 
algorithms: any algorithm will have some input
sequence that forces a retrieval for every request.
Yet worst-case analysis can be done meaningfully
as follows. An algorithm is c(h, k)-competitive if
on any sequence σ the total (expected) retrieval
cost incurred by the algorithm using a cache
of size k is at most c(h, k) times the minimum
cost to handle σ with a cache of size h (plus a
constant independent of σ). Then the algorithm
has competitive ratio c(h, k). The study of
competitive ratios is called competitive analysis. (In
the larger context of approximation algorithms 
for combinatorial optimization, this ratio is 
commonly called the approximation ratio.) 


Algorithms. Here are definitions of a number 
of caching algorithms; first is LANDLORD. 
LANDLORD gives each file “credit” (equal to 
its cost) when the file is requested and not in 
cache. When necessary, LANDLORD reduces 
all cached files’ credits proportionally to file
size, then evicts files as they run out of credit. 


File-caching algorithm LANDLORD

Maintain real value credit[f] with each file f
(credit[f] = 0 if f is not in the cache).
When a file g is requested:

1. if g is not in the cache:
2.     until the cache has room for g:
3.         for each cached file f: decrease
           credit[f] by Δ · size[f],
4.         where Δ = min_{f ∈ cache} credit[f]/size[f].
5.         Evict from the cache any subset
           of the zero-credit files f.
6.     Retrieve g into the cache; set
       credit[g] ← cost(g).
7. else reset credit[g] anywhere between its
   current value and cost(g).
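
The pseudocode can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name Landlord, its method names, the use of exact fractions for credits, the choice to evict zero-credit files oldest-first and only while room is still needed (line 5 allows any subset), and the choice to reset credit to cost(g) on a hit (line 7 allows any value in the range) are all made up for the example.

```python
from fractions import Fraction


class Landlord:
    """Illustrative LANDLORD cache (hypothetical class, not from the entry)."""

    def __init__(self, capacity):
        self.capacity = capacity          # cache size k
        self.credit = {}                  # cached file -> remaining credit
        self.size = {}                    # cached file -> file size
        self.total_cost = Fraction(0)     # total retrieval cost paid so far

    def _used(self):
        return sum(self.size.values())

    def request(self, g, size, cost):
        """Serve a request to file g; return True if g had to be retrieved."""
        if g in self.credit:
            # Line 7: any value between the current credit and cost(g) is
            # allowed; this sketch (an assumption) always resets to cost(g).
            self.credit[g] = Fraction(cost)
            return False
        if size > self.capacity:
            raise ValueError("file is larger than the cache")
        # Lines 2-5: lower credits and evict until g fits.
        while self._used() + size > self.capacity:
            delta = min(self.credit[f] / self.size[f] for f in self.credit)
            for f in self.credit:
                self.credit[f] -= delta * self.size[f]
            # Line 5: evict zero-credit files, oldest first, only while needed.
            for f in [f for f, c in self.credit.items() if c == 0]:
                if self._used() + size <= self.capacity:
                    break
                del self.credit[f]
                del self.size[f]
        # Line 6: retrieve g and give it full credit.
        self.credit[g] = Fraction(cost)
        self.size[g] = size
        self.total_cost += Fraction(cost)
        return True


cache = Landlord(capacity=2)
misses = [cache.request(f, size=1, cost=1) for f in "abcb"]
```

With unit sizes and costs, as in the usage lines, the sketch behaves like a paging algorithm: the first three requests are retrievals and the second request to b is served from the cache.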


For weighted caching, file sizes equal 1. GREEDY 
DUAL is LANDLORD for this special case. 
BALANCE is the further special case obtained 
by leaving credit unchanged in line 7. 

For paging, file sizes and costs equal 1.
FLUSH-WHEN-FULL is obtained by evicting all
zero-credit files in line 5; FIRST-IN-FIRST-OUT
is obtained by leaving credits unchanged in line
7 and evicting the file that entered the cache
earliest in line 5; LEAST-RECENTLY-USED is
obtained by raising credits to 1 in line 7 and
evicting the least-recently requested file in line 5.
The MARKING algorithm is obtained by raising
credits to 1 in line 7 and evicting a random
zero-credit file in line 5. (LANDLORD generalizes to
arbitrary covering problems with submodular
costs as described in [10].)


Key Results 


This entry focuses on competitive analysis of 
paging and caching strategies as defined above. 
Competitive analysis has been applied to many 
problems other than paging and caching, and 
much is known about other methods of analysis 
(mainly empirical or average case) of paging and 
caching strategies, but these are outside the scope of
this entry. 


Paging 

In a seminal paper, Sleator and Tarjan showed
that LEAST-RECENTLY-USED, FIRST-IN-FIRST-OUT,
and FLUSH-WHEN-FULL are
k/(k - h + 1)-competitive [13]. Sleator and Tarjan
also showed that this competitive ratio is the best
possible for any deterministic online algorithm.
Fiat et al. showed that the MARKING algorithm is
2H_k-competitive and that no randomized online
algorithm is better than H_k-competitive [6].
Here H_k = 1 + 1/2 + ... + 1/k ≈
0.58 + ln k. McGeoch and Sleator gave an
optimal H_k-competitive randomized online
paging algorithm [12].


Weighted Caching 

For weighted caching, Chrobak et al. showed that
the deterministic online BALANCE algorithm is
k-competitive [4]. Young showed that GREEDY
DUAL is k/(k - h + 1)-competitive and that GREEDY
DUAL is a primal-dual algorithm: it generates
a solution to the linear-programming dual which
proves the near-optimality of the primal
solution [14]. Bansal et al., resolving a long-standing
open problem, used the primal-dual framework to
give an O(log k)-competitive randomized
algorithm for weighted caching [2].


File Caching 

When each cost equals 1 (the goal is to minimize
the number of retrievals), or when each file’s cost
equals the file’s size (the goal is to minimize
the total number of bytes retrieved), Irani gave
O(log^2 k)-competitive randomized online
algorithms [7].

For general file caching, Irani and Cao
showed that a restriction of LANDLORD is
k-competitive [3]. Independently, Young showed
that LANDLORD is k/(k - h + 1)-competitive [15].


Other Theoretical Models 

Practical performance can be better than the 
worst case studied in competitive analysis. 
Refinements of the model have been proposed 
to increase realism. Borodin et al. [1], to model 
locality of reference, proposed the access- 
graph model (see also [8, 9]). Koutsoupias and 
Papadimitriou proposed the comparative ratio 
(for comparing classes of online algorithms 
directly) and the diffuse-adversary model (where 
the adversary chooses requests probabilistically 
subject to restrictions) [11]. Young showed that
any k/(k - h + 1)-competitive algorithm is also loosely
O(1)-competitive: for any fixed ε, δ > 0, on any
sequence, for all but a δ-fraction of cache sizes k,
the algorithm either is O(1)-competitive or pays
at most ε times the sum of the retrieval costs [15].


Analyses of Deterministic Algorithms 
Here is a competitive analysis of GREEDY DUAL 
for weighted caching. 


Theorem 1 GREEDY DUAL is
k/(k - h + 1)-competitive for weighted caching.


Proof Here is an amortized analysis (in the spirit 
of Sleator and Tarjan, Chrobak et al., and Young; 
see [14] for a different primal-dual analysis). 
Define potential

Φ = (h - 1) · Σ_{f ∈ GD} credit[f]
    + k · Σ_{f ∈ OPT} (cost(f) - credit[f]),

where GD and OPT denote the current caches of
GREEDY DUAL and OPT (the optimal off-line
algorithm that manages the cache to minimize
the total retrieval cost), respectively. After each
request, GREEDY DUAL and OPT take (some
subset of) the following steps in order.

OPT evicts a file f: Since credit[f] ≤
cost(f), Φ cannot increase.

OPT retrieves requested file g: OPT pays
cost(g); Φ increases by at most k cost(g).

GREEDY DUAL decreases credit[f] for all
f ∈ GD: The cache is full and the requested file
is in OPT but not yet in GD. So |GD| = k and
|OPT ∩ GD| ≤ h - 1. Thus, the total decrease in
Φ is Δ[(h - 1)|GD| - k |OPT ∩ GD|] ≥
Δ[(h - 1)k - k(h - 1)] = 0.

GREEDY DUAL evicts a file f: Since
credit[f] = 0, Φ is unchanged.

GREEDY DUAL retrieves requested file g
and sets credit[g] to cost(g): GREEDY DUAL
pays c = cost(g). Since g was not in GD but
is in OPT, credit[g] = 0 and Φ decreases by
-(h - 1)c + kc = (k - h + 1)c.

GREEDY DUAL resets credit[g] between its
current value and cost(g): Since g ∈ OPT and
credit[g] only increases, Φ decreases.

So, with each request: (1) when OPT retrieves
a file of cost c, Φ increases by at most kc;
(2) at no other time does Φ increase; and (3)
when GREEDY DUAL retrieves a file of cost c, Φ
decreases by at least (k - h + 1)c. Since initially
Φ = 0 and finally Φ ≥ 0, it follows that GREEDY
DUAL’s total cost times k - h + 1 is at most OPT’s
cost times k.
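
The bound can be checked numerically for paging, where LEAST-RECENTLY-USED is a special case of GREEDY DUAL. The Python sketch below is an illustration only: the function names are hypothetical, and the offline optimum is computed with the furthest-in-future eviction rule, which is assumed here as a standard optimal offline policy for paging and is not described in this entry. Both caches start empty, so the potential is initially 0.

```python
import random
from collections import OrderedDict


def lru_faults(requests, k):
    """Number of retrievals made by LEAST-RECENTLY-USED with cache size k."""
    cache = OrderedDict()              # pages ordered from least to most recent
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)    # page becomes the most recently used
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)   # evict the least recently used page
            cache[page] = True
    return faults


def opt_faults(requests, h):
    """Retrievals of the furthest-in-future offline rule with cache size h."""
    n = len(requests)
    next_use = [0] * n                 # index of the next request to the same page
    last_seen = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = last_seen.get(requests[i], float("inf"))
        last_seen[requests[i]] = i
    cache = {}                         # cached page -> index of its next request
    faults = 0
    for i, page in enumerate(requests):
        if page not in cache:
            faults += 1
            if len(cache) >= h:
                victim = max(cache, key=cache.get)   # requested furthest ahead
                del cache[victim]
        cache[page] = next_use[i]
    return faults


rng = random.Random(0)
requests = [rng.randrange(8) for _ in range(500)]
k, h = 5, 3
# Conclusion of the proof: (k - h + 1) * (online cost) <= k * (optimal cost).
assert (k - h + 1) * lru_faults(requests, k) <= k * opt_faults(requests, h)
```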


Extension to File Caching 

Although the proof above easily extends to 
LANDLORD, it is more informative to analyze 
LANDLORD via a general reduction from file 
caching to weighted caching: 


Corollary 1 LANDLORD is k/(k - h + 1)-competitive
for file caching.


Proof Let W be any deterministic c-competitive
weighted-caching algorithm. Define file-caching
algorithm F_W as follows. Given request sequence
σ, F_W simulates W on a weighted-caching
sequence σ′ as follows. For each file f, break f
into s = size(f) “pieces” {f_i}, each of size 1 and cost
cost(f)/size(f). When f is requested, give a
batch (f_1, f_2, ..., f_s)^{N+1} of requests for pieces
to W. Take N large enough so W has all pieces
{f_i} cached after the first sN requests of the
batch.

Assume that W respects equivalence: after
each batch, for every file f, all or none of f’s
pieces are in W’s cache. After each batch, make
F_W update its cache correspondingly to {f :
f_i ∈ cache(W)}. F_W’s retrieval cost for σ is at
most W’s retrieval cost for σ′, which is at most
c OPT(σ′), which is at most c OPT(σ). Thus, F_W
is c-competitive for file caching.

Now, observe that GREEDY DUAL can be
made to respect equivalence. When GREEDY
DUAL processes a batch of requests
(f_1, f_2, ..., f_s)^{N+1} resulting in retrievals, for the last s
requests, make GREEDY DUAL set credit[f_i] =
cost(f_i) = cost(f)/s in line 7. In general,
restrict GREEDY DUAL to raise credits of
equivalent pieces f_i equally in line 7. After each
batch the credits on equivalent pieces f_i will be
the same. When GREEDY DUAL evicts a piece f_i,
make GREEDY DUAL evict all other equivalent
pieces f_j (all will have zero credit).

With these restrictions, GREEDY DUAL
respects equivalence. Finally, taking W to be
GREEDY DUAL above, F_W is LANDLORD.


Analysis of the Randomized MARKING Algorithm

Here is a competitive analysis of the MARKING
algorithm.


Theorem 2 The MARKING algorithm is 2H_k-competitive
for paging.


Proof Given a paging request sequence σ,
partition σ into contiguous phases as follows. Each
phase starts with the request after the end of the
previous phase and continues as long as possible
subject to the constraint that it should contain
requests to at most k distinct pages. (Each phase
starts when the algorithm runs out of zero-credit
files and reduces all credits to zero.)

Say a request in the phase is new if the item
requested was not requested in the previous phase.
Let m_i denote the number of new requests in the
ith phase. During phases i - 1 and i, k + m_i
distinct files are requested. OPT has at most k of
these in cache at the start of the (i - 1)st phase,
so it will retrieve at least m_i of them before the
end of the ith phase. So OPT’s total cost is at least
max{Σ_i m_{2i}, Σ_i m_{2i+1}} ≥ Σ_i m_i / 2.

Say a non-new request is redundant if it is
to a file with credit 1 and nonredundant
otherwise. Each new request costs the MARKING
algorithm 1. The jth nonredundant request costs
the MARKING algorithm at most m_i/(k - j + 1)
in expectation because, of the k - j + 1 files that if
requested would be nonredundant, at most m_i are
not in the cache (and each is equally likely to be in
the cache). Thus, in expectation MARKING pays
at most m_i + Σ_{j=1}^{k-m_i} m_i/(k - j + 1) ≤ m_i H_k
for the phase and at most H_k Σ_i m_i total, which
is at most 2H_k times OPT’s total cost.
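
The phase structure used in the proof can be made concrete with a short Python sketch of the MARKING algorithm for paging. It is only an illustration: the function names are hypothetical, a "marked" page stands for a page with credit 1, and the random generator is passed in explicitly so that runs are reproducible.

```python
import random


def marking_faults(requests, k, rng):
    """Count retrievals of the MARKING algorithm with cache size k."""
    cache, marked = set(), set()       # marked pages are those with credit 1
    faults = 0
    for page in requests:
        if page not in cache:
            faults += 1
            if len(cache) == k:
                if marked == cache:    # no zero-credit page left: new phase
                    marked.clear()
                # Evict a uniformly random unmarked (zero-credit) page.
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(page)
        marked.add(page)               # raise the credit of the requested page
    return faults


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / j for j in range(1, k + 1))


faults = marking_faults([1, 2, 1, 2, 3, 1], k=3, rng=random.Random(0))
```

In the usage line only three distinct pages are requested and the cache holds three pages, so every retrieval is a first request to a page.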


Applications 


Variants of GREEDY DUAL and LANDLORD have 
been incorporated into file-caching software such 
as Squid [5]. 


Open Problems 


None to report. 


Experimental Results 


For a study of competitive ratios on practical 
inputs, see, for example, [3, 5, 14].
Problem Definition


We consider an online version of the classical 
problem of preemptive scheduling on uniformly 
related machines. 

We are given m machines with speeds s_1 ≥
s_2 ≥ ... ≥ s_m and a sequence of jobs, each
described by its processing time (length). The 
actual time needed to process a job with length 
p on a machine with speed s is p/s. In the 
preemptive version, each job may be divided 
into several pieces, which can be assigned to 
different machines in disjoint time slots. (A job 
may be scheduled in several time slots on the 
same machine, and there may be times when 
a partially processed job is not running at all.) 
The objective is to find a schedule of all jobs in 
which the maximal completion time (makespan) 
is minimized. 

In the online problem, jobs arrive one by one 
and the algorithm needs to assign each incoming 
job to some time slots on some machines, without 
any knowledge of the jobs that arrive later. This 
problem, also known as list scheduling, was first 
studied by Graham [8] for identical machines 
(i.e., s_1 = ... = s_m = 1), without preemption.
In the preemptive version, upon the arrival of a 
job, its complete assignment at all times must 
be given and the algorithm is not allowed to 
change this assignment later. In other words, the 
online nature of the problem is in the order of the 
input sequence, and it is not related to possible 
preemptions and the time in the schedule. 


Key Results 


The main result is an optimal online algorithm 
RatioStretch for preemptive scheduling on 
uniformly related machines [4]. RatioStretch 
achieves the best possible competitive ratio not
only in the general case but also for any number 
of machines and any particular combination 
of machine speeds. Although RatioStretch 
is deterministic, its competitive ratio matches 
the best competitive ratio of any randomized 
algorithm. This proves that randomization does 
not help for this variant of preemptive scheduling. 

For any fixed set of speeds, the competitive 
ratio of the algorithm RatioStretch can be com- 
puted by solving a linear program. However, 
its worst-case value over all speed combinations 
is not known. Nevertheless, using the fact that 
there exists an e-competitive randomized
algorithm [5], it is possible to conclude that
RatioStretch also achieves the ratio of at most
e ≈ 2.718. The best lower bound shows that
RatioStretch (and thus any algorithm) is not 
better than 2.112-competitive, by providing an 
explicit numerical instance on 200 machines [3]. 


Key Techniques 


The idea of the algorithm RatioStretch is fairly 
natural. Suppose that the algorithm is given a 
ratio R which we are trying to achieve. For each 
arriving job, RatioStretch computes the optimal 
makespan for jobs that have arrived so far and 
runs the incoming job as slowly as possible so
that it finishes at R times the computed optimal 
makespan. There are many ways of creating such 
a schedule given the flexibility of preemptions. 
RatioStretch chooses a particular one based on 
the notion of a virtual machine from [5]. Given 
a schedule, the ith virtual machine at each time 
corresponds to the ith fastest real machine that is 
idle. (In particular, before the first job, the virtual 
machines are the real machines.) This assignment 
of the real machines to the virtual machines can 
vary at different times in the schedule. Due to pre- 
emption, a virtual machine can be thought of and 
used as a single machine with changing speed. 
The key idea of RatioStretch is to schedule each 
job on two adjacent virtual machines. 
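
The first step of RatioStretch, computing the optimal makespan of the jobs seen so far, can be sketched in Python. The entry does not say how this value is obtained; the sketch assumes the classical closed form for preemptive scheduling on uniformly related machines, namely the maximum of the total work divided by the total speed and, for each k < m, the work of the k largest jobs divided by the speed of the k fastest machines. The function name is hypothetical.

```python
from fractions import Fraction


def optimal_preemptive_makespan(speeds, jobs):
    """Optimal preemptive makespan on uniformly related machines.

    Assumes the classical closed form (not stated in this entry):
    max(P / S, max over k < m of P_k / S_k), where P_k is the total length
    of the k largest jobs and S_k the total speed of the k fastest machines.
    """
    speeds = sorted(speeds, reverse=True)
    jobs = sorted(jobs, reverse=True)
    best = Fraction(sum(jobs)) / Fraction(sum(speeds))
    load = speed = Fraction(0)
    # With fewer than k jobs the ratio only shrinks, so stop at len(jobs).
    for k in range(min(len(speeds) - 1, len(jobs))):
        load += Fraction(jobs[k])
        speed += Fraction(speeds[k])
        best = max(best, load / speed)
    return best


# One job of length 4 on speeds 2 and 1 cannot use both machines at once.
single = optimal_preemptive_makespan([2, 1], [4])
# Target completion time of the incoming job for a hypothetical ratio R = 2.
target = 2 * single
```

RatioStretch would then schedule the incoming job so that it completes at time R times this value.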

If RatioStretch fails on some input for a 
given R, it is possible to use the lower bound 
technique from [7] and show that there is no R- 
competitive algorithm. This implies that if the 
algorithm knows the optimal competitive ratio R,
it never fails and thus it is R-competitive. 

It remains to find the optimal competitive ratio 
R. Since the lower bound technique from [7] 
results in a linear condition, one can show that 
R can be computed by a linear program for each 
combination of speeds. 


Semi-online Scheduling 


The algorithm RatioStretch can be extended to 
semi-online scenarios [6]. This term encompasses 
situations where some partial information about 
the input is given to the scheduler in advance. 
Already Graham [9] studied a semi-online variant 
of scheduling on identical machines: he proved 
that if the jobs are presented in non-increasing 
order of their processing times, the approxima- 
tion ratio of list scheduling decreases from 2 to 
4/3. Since then numerous semi-online models 
of scheduling have been studied; typical exam- 
ples include (sequences of) jobs with decreasing 
processing times, jobs with bounded processing 
times, sequences with known total processing 
time of jobs, and so on. Most of these models can 
be viewed as online algorithms on a restricted set 
of input sequences. 
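
The effect of the non-increasing restriction can be seen on a tiny instance. The Python sketch below implements non-preemptive list scheduling on identical machines (each job goes to a currently least-loaded machine); the function name and the instance are made up for the illustration.

```python
import heapq


def list_schedule_makespan(jobs, m):
    """Makespan of list scheduling on m identical machines (no preemption)."""
    loads = [0] * m
    heapq.heapify(loads)
    for p in jobs:
        least = heapq.heappop(loads)       # a currently least-loaded machine
        heapq.heappush(loads, least + p)
    return max(loads)


arbitrary_order = list_schedule_makespan([1, 1, 2], m=2)
non_increasing = list_schedule_makespan(sorted([1, 1, 2], reverse=True), m=2)
```

On this instance the arbitrary order yields makespan 3, while the non-increasing order yields the optimal makespan 2.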

RatioStretch can be generalized so that it is 
optimal for any chosen semi-online restriction. 
This means not only the cases listed above — the 
restriction can be given as an arbitrary set of se- 
quences that are allowed as inputs. Again, for any 
semi-online restriction, RatioStretch achieves 
the best possible approximation ratio for any 
number of machines and any particular combina- 
tion of machine speeds; it is deterministic, but its 
approximation ratio matches the best possible ap- 
proximation ratio of any randomized algorithm. 
This result also provides a clear separation be- 
tween the design of the algorithm and the analysis 
of the optimal approximation ratio. While the 
algorithm is always the same, the analysis of the 
optimal ratio depends on the studied restrictions. 

For typical semi-online restrictions, the opti- 
mal ratio can again be computed by linear pro- 
grams (with machine speeds as parameters). Then 
we can study the relations between the optimal 
approximation ratios for different semi-online
restrictions and give some bounds for a large 
number of machines by analysis of these linear 
programs. One interesting result is that the overall 
ratio with known sum of processing times is the 
same as in the purely online case — even though 
for a small fixed number of machines, knowing 
the sum provides a significant advantage. 

Some basic restrictions form an inclusion 
chain: the inputs where the first job has the 
maximal processing time (which is equivalent 
to known maximal processing time) include the 
inputs with non-increasing processing times, 
which in turn include the inputs with all 
jobs of equal processing time. The restriction 
to non-increasing processing times gives the 
same approximation ratio as when all jobs 
have equal processing times, even for any 
particular combination of speeds. The overall 
approximation ratio of these two equivalent 
problems is at most 1.52. For known maximal 
processing time of a job, there exists a computer- 
generated hard instance with approximation ratio 
1.88 with 120 machines. Thus, restricting the 
jobs to be non-increasing helps the algorithm 
much more than just knowing the maximal 
processing time of a job. This is very different 
from identical machines, where knowing the 
maximal processing time is equally powerful as 
knowing that all the jobs are equal; see [10]. 


Small Number of Machines 


For two, three, and sometimes four machines, it is 
possible to give an exact formula for the competi- 
tive ratio for any speed combination [2,3]. This is 
a fairly routine task which can be simplified (but 
not completely automated) using standard math- 
ematical software. Once the solution is known, 
verification amounts to checking the given primal 
and dual solutions for the linear program. 


Open Problems 


The main remaining open problem is to develop 
techniques for determining or bounding the over- 
all competitive ratio of the optimal algorithm 
RatioStretch. In particular, it would be interest-
ing to obtain a tight bound in the online case. 

It is also open whether similar techniques can be used
for the non-preemptive problem. In this case, 
the currently best algorithms were obtained by a 
doubling approach. This means that a competi- 
tive algorithm is designed for the case when the 
optimum is approximately known in advance, and 
then, without this knowledge, it is used in phases 
with geometrically increasing guesses of the op- 
timum. Such an approach probably cannot lead to 
an optimal algorithm for this type of scheduling
problem. The best lower and upper bounds for
non-preemptive scheduling on uniformly related 
machines are 2.438 and 5.828 for deterministic 
algorithms (see [1]) and 2 and 4.311 for random- 
ized algorithms (see [1,7]). Thus, it is still open 
whether randomized algorithms are better than 
deterministic. 
Problem Definition


With the ever-increasing reach of the Internet, 
crowdsourcing contests have become an increas- 
ingly convenient alternative for completing tasks, 
compared to traditional hire-and-pay methods. 
There are several websites dedicated to providing 
users a platform for creating their own crowd- 
sourcing contests. For instance, Taskcn.com al- 
lows users to post tasks, collect submissions from 
registered users, and provide a monetary reward 
to the best submission. The reach of crowdsourc- 
ing is far beyond tedious/labor-intensive tasks. 
Netflix, for instance, issued a million-dollar con- 
test for developing a collaborative filtering algo- 
rithm to predict user ratings for films, instead 
of hiring an in-house research team to develop 
this. The Indian Government used a crowdsourc- 
ing contest to pick a new symbol for its rupee 
currency. 


Optimal Crowdsourcing Contests 


The Questions 

In designing a crowdsourcing contest, a principal, 
with a preallocated sum of monetary reward in 
hand, seeks to identify the format of the contest 
that optimizes the quality of the best submission. 
For instance, Topcoder.com issues 2/3 of the re- 
ward to the best submission and 1/3 of the reward 
to the second-best submission. Is this the format 
best suited to optimize the best submission? Or 
should the entire award be given to the winner? 
More generally, should the precise division of 
rewards be even announced prior to the contest, 
or should they be announced only as a function 
of the quality of the submissions received? In a 
different direction, crowdsourcing contests require
several people to expend effort in producing
submissions, but often only the best submission is
put to use. How much effort is getting “burnt” in 
this process compared to conventional hire-and- 
pay? 


The Model 
Formally, let there be n contestants, and let the
monetary reward be normalized to $1.
Contestants enter their submissions, which are ranked
according to their qualities. Agent i’s submission
quality p_i is a function of their skill v_i and their
effort e_i, given by p_i = v_i · e_i. The skill v_i can
be interpreted as the rate at which agent i can
do useful work. The contest designer can observe
only the submission qualities p_i and not the
skills v_i. However, the distribution F from
which the v_i are drawn (independently) is
common knowledge to all contestants and the contest
designer. Every contestant’s goal is to maximize
their utility, namely, their reward minus the effort
they expended. If x_i is the probability that agent
i gets the reward, then their utility is given by
x_i - e_i = x_i - p_i/v_i.

We model crowdsourcing contests as all-pay 
auctions, following the contest architecture liter- 
ature [4,6]. In an all-pay auction with n bidders, 
a seller auctions a good that bidder i values at
v_i. The value v_i is private to bidder i, but the
distribution F from which the v_i are independently
drawn is common knowledge to the seller and
the bidders. The seller solicits sealed bids from
the agents, and all bidders agree to pay their
bids regardless of which bidder gets the good
(corresponding to all contestants losing their
effort irrespective of which contestant wins the
contest). Which agent gets the good depends on
the allocation rule of the auction. Given the rules
of the auction, each bidder aims to maximize
his utility. If x_i is the probability that agent i
receives the good, then agent i maximizes his
utility of v_i x_i - p_i. Note that this utility is
precisely v_i times the utility of a contestant in
the crowdsourcing contest defined in the previous
paragraph. From agent i’s perspective, v_i is just
a constant. Thus, the incentives in the contest and
the all-pay auction are identical. Consequently,
designing a contest to maximize the quality of the best
submission, namely, to maximize max_i p_i, is the
same as designing an all-pay auction to maximize
the maximum payment. We therefore have an
all-pay auction design problem where the objective
is not the traditional one of maximizing revenue
(Σ_i p_i) but requires maximizing max_i p_i.

We assume that the space of possible valua- 
tions V is an interval and the density f(-) of the 
value distribution is nonzero everywhere in this 
interval. 


Bayesian Nash Equilibrium 

In an all-pay auction it is not strategic for an agent
to bid his true value v: the probability he wins
the good is at most 1 (in which case he gets a
value v), whereas he is sure to lose his bid. Thus,
agents submit bids smaller than their true value.
An agent’s bidding function b_i(·) maps their true
value to bids. A profile of bidding functions
(b_1(·), ..., b_n(·)) is a Bayesian Nash equilibrium
(BNE) if the bidding functions are mutual best
responses, i.e., if values are drawn from F and
other agents bid according to their bidding
functions, agent i weakly prefers following his own
bidding strategy b_i(·) over submitting any other
bid. For a given outcome x_i(v), let x_i(v_i) =
E_{v_{-i}}[x_i(v)], and let p_i(v_i) = E_{v_{-i}}[p_i(v)].

We will appeal to the following result from [2] 
that shows that for most of the all-pay auctions 
that we discuss, there exists a unique Bayesian 
Nash equilibrium, and it is also symmetric. That 
is, in the unique BNE, all agents have the same 
bidding function. 
Theorem 1 ([2]) In the all-pay auction
parameterized by a reserve price and a nonincreasing
sequence of rewards a_1, ..., a_n, where the agents
whose bids meet the reserve are assigned to the
rewards in decreasing order of bids, a symmetric
BNE exists and is the unique equilibrium.
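
As an illustration of a symmetric BNE, consider the winner-takes-all all-pay auction with no reserve and values drawn uniformly from [0, 1]. The sketch below assumes the standard closed form b(v) = ((n - 1)/n) · v^n for the equilibrium bid in this special case; the formula, the uniform distribution, and the function names are assumptions made for the example and are not taken from this entry. The code checks numerically that a bidder with value v does best by bidding b(v) when all others follow b.

```python
def equilibrium_bid(v, n):
    """Assumed symmetric equilibrium bid for uniform values on [0, 1]."""
    return (n - 1) / n * v ** n


def utility_of_mimicking(v, w, n):
    """Expected utility of a bidder with value v who bids as a type-w bidder."""
    win_probability = w ** (n - 1)     # all n - 1 rivals have values below w
    return v * win_probability - equilibrium_bid(w, n)   # the bid is always paid


n, v = 3, 0.6
grid = [i / 100 for i in range(101)]
best_type = max(grid, key=lambda w: utility_of_mimicking(v, w, n))
```

On this grid the utility is maximized by behaving as one's true type, which is the best-response property required of a BNE.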


Key Results 


We now present the key results concerning 
the design of optimal crowdsourcing contests 
(from [3]). 


Rank-Based-Reward Contests 

Consider the class of contests that predetermine
the division of rewards into fractions a_1, ..., a_n,
s.t. Σ_i a_i = 1, a_i ≥ a_{i+1}, and a_i ≥ 0. That
is, agents are ordered by submission qualities and
the ith best submission receives an a_i fraction of the
reward. In this notation, Topcoder’s contest will
be a_1 = 2/3, a_2 = 1/3, and a_k = 0 for k > 2.
The first key result is that if the goal is to
maximize the maximum payment, the optimal all-pay
auction is to award the good completely to the
highest bidder. In contest language, the optimal
contest is a winner-takes-all contest. Note that by
Theorem 1 this contest format has a unique BNE.


Theorem 2 When the contestant skills are 
distributed i.i.d., the optimal rank-based-reward 
contest is a winner-takes-all contest. 


Optimal Symmetric Contest 

Is there an even better contest in the larger space 
of contests? Suppose we allow contests that
announce rewards as a function of agents’
submission qualities; what is the optimal contest? We
focus on the class of symmetric contests and opti- 
mize over their symmetric equilibria. For a large 
class of distributions, including distributions that 
satisfy the monotone hazard rate property (e.g., 
uniform, normal, exponential), the optimal auc- 
tion will turn out to have a unique equilibrium 
that is also symmetric. 


Theorem 3 When the contestant skills are 
distributed i.i.d. from a distribution that satisfies 
the monotone hazard rate condition, the optimal 
symmetric contest is a highest-submission-wins
contest subject to a minimum submission quality.


Proof (Sketch) We prove this result through an 
argument that mirrors Myerson’s revenue opti- 
mal auction argument [8]. Recall that in auc- 
tion theory terms, we have to prove that the 
optimal auction is a highest-bidder-wins auction 
subject to a minimum bid reserve. Writing out the 
expression for the expected maximum payment 
and using the characterization of BNE payments 
in terms of allocation, we realize that the ex- 
pected maximum payment is just the expected 
virtual welfare. That is, let φ(·) be a
distribution-dependent transformation that is applied to each
agent’s value v_i to obtain

φ(v_i) = v_i F(v_i)^{n-1} - (1 - F(v_i)^n) / (n f(v_i)).

The expected virtual welfare of an
outcome is just E_v[Σ_i φ(v_i) x_i(v)]. If this is
the quantity to maximize, it is immediate that 
the optimal outcome is to allocate completely to 
the agent with the highest virtual value subject 
to the highest virtual value being nonnegative. If 
the virtual value transformation were a strictly 
increasing function (whenever it is positive), the 
bidder with the highest value, and hence also the 
highest bid because of our focus on symmetric 
equilibria, will also be the bidder with the high- 
est virtual value. Thus the highest-bidder-wins 
auction subject to a minimum bid reserve will 
implement the desired outcome. Now, for the 
distribution-dependent transformation φ(·) to be
strictly increasing, it is enough for the distribution
to satisfy the monotone hazard rate condition.
Finally, this contest has a unique BNE from
Theorem 1.
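
For a concrete feel of the argument, the sketch below evaluates the virtual value for skills drawn uniformly from [0, 1], assuming the transformation has the form φ(v) = v F(v)^{n-1} - (1 - F(v)^n)/(n f(v)). With F(v) = v and f(v) = 1 this becomes φ(v) = v^n - (1 - v^n)/n, which is increasing and crosses zero at v = (1/(n + 1))^{1/n}. The uniform distribution and the function names are assumptions made for the example.

```python
def virtual_value(v, n):
    """phi(v) for uniform values on [0, 1] (assumed form, see lead-in)."""
    return v ** n - (1 - v ** n) / n


def reserve_value(n):
    """Value at which the uniform-case virtual value crosses zero."""
    return (1 / (n + 1)) ** (1 / n)


n = 3
threshold = reserve_value(n)
# Agents with values below the threshold have negative virtual value, so the
# optimal outcome never allocates to them.
grid = [i / 20 for i in range(21)]
increasing = all(
    virtual_value(a, n) < virtual_value(b, n) for a, b in zip(grid, grid[1:])
)
```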


Theorem 4 For any setting with i.i.d. values, 
the optimal symmetric contest is defined by a 
minimum submission quality and a subset of 
submission qualities called forbidden qualities 
that has the following format: the contest solicits 
submissions and rounds them down to the nearest 
non-forbidden quality; it then distributes the re- 
ward equally among the highest submissions sub- 
ject to the submissions being above the minimum 
submission quality. 


Proof (Sketch) Continuing with the proof of 
Theorem 3, if the virtual value transformation 
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were not increasing, allocating to the highest 
virtual value is no more a BNE outcome. In 
this case, the transformation ¢ is “ironed” to 
obtain a nondecreasing function (-), such that 
the expected maximum payment is equal to the 
expected ironed virtual surplus. To optimize 
this quantity, the outcome should be to allocate 
completely to the agent with the highest ironed 
virtual value subject to it being nonnegative. 
In case of a tie, all agents with the highest 
ironed virtual value get equal allocations. Such an 
allocation will result in a discontinuous allocation 
function and hence a discontinuous payment 
function. That is, some payments are forbidden. 
Correspondingly, to ensure that some bids are
forbidden, the auction explicitly says that bids 
in certain regions will be rounded down so that 
no rational agent will bid inside that interval. 
This explains the format of the optimal contest 
specified in the theorem. 
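
The contest format of Theorem 4 can be sketched as follows. This is only an illustration under stated assumptions: forbidden qualities are represented as a list of disjoint open intervals `(lo, hi)` lying above the minimum quality, a submission inside an interval is rounded down to `lo`, and all names (`effective_quality`, `reward_shares`) are hypothetical.

```python
def effective_quality(q, forbidden, minimum):
    """Quality credited to a submission, or None if it is below the minimum.

    `forbidden` is a list of disjoint open intervals (lo, hi); a submission
    strictly inside one of them is rounded down to lo (an assumed encoding).
    """
    if q < minimum:
        return None
    for lo, hi in forbidden:
        if lo < q < hi:
            return lo
    return q


def reward_shares(submissions, forbidden, minimum, reward=1.0):
    """Split the reward equally among the highest credited submissions."""
    eff = [effective_quality(q, forbidden, minimum) for q in submissions]
    valid = [e for e in eff if e is not None]
    if not valid:
        return [0.0] * len(submissions)
    top = max(valid)
    winners = [i for i, e in enumerate(eff) if e == top]
    return [reward / len(winners) if i in winners else 0.0
            for i in range(len(submissions))]
```

With one forbidden interval (0.4, 0.7) and minimum quality 0.2, submissions 0.5 and 0.65 are both credited with 0.4 and therefore tie, which is exactly the tie-splitting case described in the proof sketch.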


Utilization Ratio of Crowdsourcing 

In a crowdsourcing contest, which is like an all- 
pay auction, every agent’s submission is col- 
lected, but only the best submission is used. In 
contrast, in conventional contracting, which is 
like first- or second-price auctions, only the win- 
ner makes any submission at all, and thus there 
is no underutilization. One way of measuring 
the amount of work that actually gets utilized in 
crowdsourcing as opposed to getting “burnt” is 
to study the ratio of the maximum payment and 
the sum of all payments in an all-pay auction. It 
turns out that the utilization ratio in a large class
of contests is at least 1/2.


Theorem 5 In any highest-submission-wins
contest with a minimum submission quality, the 
quality of the best submission is at least half of the 
sum total of the qualities of all the submissions. 


Related Work 

Other objectives that have been studied in 
contest design include maximizing the sum of 
submission qualities instead of the maximum 
submission quality [5-7] and maximizing the 
sum of submission qualities less the normalized 
reward [1]. The rank-based-reward result in 
Theorem 2 is quite robust and continues to hold in
many of these other models as well. Moldovanu 
and Sela [7] study multi-round contests and show 
that there are situations where it is better to split 
contestants into two divisions and to have a final 
among the divisional winners. DiPalantino and 
Vojnovic [4] study crowdsourcing websites as a 
matching market. Yang et al. [9] and DiPalantino 
and Vojnovic [4] study contestant behavior from 
contest website Taskcn.com and observe that 
experienced contestants strategize well. 


Open Problems 


Multi-round Contests 

The optimality result discussed here is restricted 
to single-round contests. If one were allowed 
to do a tournament-style multi-round contest, 
what is the optimal contest in this large class 
of contests? How significant is the difference in 
objective value when one is allowed to organize 
more than one round of contest? How does the 
objective value grow with the number of rounds? 


Problem Definition


The Byzantine agreement problem (BA) is concerned
with multiple processors (parties, “players”),
all starting with some initial value, agreeing
on a common value despite the possible
disruptive or even malicious behavior of some
of them. BA is a fundamental problem in
fault-tolerant distributed computing and secure
multi-party computation.

The problem was introduced by Pease, 
Shostak and Lamport in [17], who showed that 
the number of faulty processors must be less 
than a third of the total number of processors 
for the problem to have a solution. They also 
presented a protocol matching this bound, which 
requires a number of communication rounds
proportional to the number of faulty processors:
exactly $t + 1$, where $t$ is the number of faulty
processors. Fischer and Lynch [10] later showed
that this number of rounds is necessary in
the worst-case run of any deterministic BA
protocol. Furthermore, the above assumes that
communication takes place in synchronous
rounds. Fischer, Lynch and Paterson [11] proved
that no completely asynchronous BA protocol
can tolerate even a single processor with the
simplest form of misbehavior, namely, ceasing
to function at an arbitrary point during the
execution of the protocol (“crashing”).

To circumvent the above-mentioned lower 
bound on the number of communication rounds 
and impossibility result, respectively, researchers 
beginning with Ben-Or [1] and Rabin [18], and 
followed by many others (e.g., [3, 5]) explored 
the use of randomization. In particular, Rabin 
showed that linearly resilient BA protocols 
in expected constant rounds were possible, 
provided that all the parties have access to 
a “common coin” (i.e., a common source of 
randomness) — essentially, the value of the coin 
can be adopted by the non-faulty processors 
in case disagreement at any given round is 
detected, a process that is repeated multiple 
times. This line of research culminated in the
unconditional (or information-theoretic) setting
with the work of Feldman and Micali [9],
who showed an efficient (i.e., polynomial-time)
probabilistic BA protocol that tolerates the maximal
number of faulty processors and runs in an expected
constant number of rounds. (Karlin and Yao, in
“Probabilistic lower bounds for the Byzantine
generals problem,” an unpublished manuscript,
showed that the maximum number of faulty
processors for probabilistic BA is also $t < n/3$,
where $n$ is the total number of processors.) The
main achievement of the Feldman-Micali work
is to show how to obtain a shared random coin
with constant success probability in the presence
of the maximum allowed number of misbehaving
parties “from scratch”.

Randomization has also been applied to BA 
protocols in the computational (or cryptographic) 
setting and for weaker failure models. See [6] for
an early survey on the subject. 


Notations 

Consider a set $P = \{P_1, P_2, \ldots, P_n\}$ of processors
(probabilistic polynomial-time Turing machines)
out of which $t$, $t < n$, may not follow the
protocol, and may even collude and behave in arbitrary
ways. These processors are called faulty; it is
useful to model the faulty processors as being
coordinated by an adversary, sometimes called
a t-adversary.

For $1 \le i \le n$, let $b_i \in \{0, 1\}$ denote party
$P_i$’s initial value. The work of Feldman and Micali
considers the problem of designing a probabilistic
BA protocol in the model defined below.


System Model 

The processors are assumed to be connected by 
point-to-point private channels. Such a network 
is assumed to be synchronous, i.e., the processors 
have access to a global clock, and thus the com- 
putation of all processors can proceed in a lock- 
step fashion. It is customary to divide the com- 
putation of a synchronous network into rounds. 
In each round, processors send messages, receive 
messages, and perform some local computation. 

The t-adversary is computationally unbounded,
adaptive (i.e., it chooses which
processors to corrupt on the fly), and decides 
on the messages the faulty processors send 
in a round depending on the messages sent 
by the non-faulty processors in all previous 
rounds, including the current round (this is called 
a rushing adversary). 

Given the model above, the goal is to solve the
problem stated in the entry » Byzantine Agreement;
that is, for every set of inputs and any behavior 
of the faulty processors, to have the non-faulty 
processors output a common value, subject to 
the additional condition that if they all start the 
computation with the same initial value, then 
that should be the output value. The difference 
with respect to the other entry is that, thanks to 
randomization, BA protocols here run in expected 
constant rounds. 

Problem 1 (BA)

INPUT: Each processor $P_i$, $1 \le i \le n$, has bit $b_i$.
OUTPUT: Eventually, each processor $P_i$,
$1 \le i \le n$, outputs bit $d_i$ satisfying the following
two conditions:


• Agreement: For any two non-faulty processors
$P_i$ and $P_j$, $d_i = d_j$.

• Validity: If $b_i = b_j = b$ for all non-faulty processors
$P_i$ and $P_j$, then $d_i = b$ for all non-faulty
processors $P_i$.


In the above definition input and output values 
are from {0, 1}. This is without loss of generality, 
since there is a simple two-round transformation 
that reduces a multi-valued agreement problem to 
the binary problem [19]. 
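
The two conditions of Problem 1 can be checked mechanically on a finished run. The sketch below is only a property checker, not a BA protocol; the function name `check_ba` and the encoding of a run as lists of input bits, output bits, and a set of faulty indices are assumptions made for illustration.

```python
def check_ba(inputs, outputs, faulty):
    """Return (agreement, validity) for one finished run.

    inputs[i] and outputs[i] are the bits b_i and d_i of processor P_i;
    `faulty` is the set of indices of faulty processors, whose bits are ignored.
    """
    honest = [i for i in range(len(inputs)) if i not in faulty]

    # Agreement: all non-faulty processors output the same bit.
    agreement = len({outputs[i] for i in honest}) <= 1

    # Validity: only constrains runs in which all non-faulty inputs coincide.
    honest_inputs = {inputs[i] for i in honest}
    validity = True
    if len(honest_inputs) == 1:
        b = next(iter(honest_inputs))
        validity = all(outputs[i] == b for i in honest)

    return agreement, validity
```

Note that Validity holds vacuously whenever the non-faulty processors start with different bits, which is why the second test case below reports a Validity-respecting but Agreement-violating run.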


Key Results 


Theorem 1 Let $t < n/3$. Then there exists
a polynomial-time BA protocol running in
an expected constant number of rounds.


The number of rounds of the Feldman-Micali
BA protocol is expected constant, but there is
no bound in the worst case; that is, for every
$r$, the probability that the protocol proceeds for
more than $r$ rounds is very small, yet greater
than 0 (in fact, equal to $2^{-O(r)}$). Further, the
non-faulty processors may not terminate in the
same round. (Indeed, it was shown by Dwork and
Moses [7] that at least $t + 1$ rounds are necessary
for simultaneous termination. In [13], Goldreich
and Petrank combine “the best of both worlds” by
showing a BA protocol running in an expected constant
number of rounds which always terminates
within $t + O(\log t)$ rounds.)

The Feldman—Micali BA protocol assumes 
synchronous rounds. As mentioned above, one 
of the motivations for the use of randomization 
was to overcome the impossibility result due 
to Fischer, Lynch and Paterson [11] of BA in 
asynchronous networks, where there is no global 
clock, and the adversary is also allowed to schedule
the arrival time of a message sent to a non-faulty
processor (of course, faulty processors may
not send any message(s)). In [8], Feldman mentions
that the Feldman-Micali BA protocol can
be modified to work on asynchronous networks,
at the expense of tolerating $t < n/4$ faults. In [4],
Canetti and Rabin present a probabilistic asynchronous
BA protocol for $t < n/3$ that differs from
the Feldman-Micali approach in that it is a Las
Vegas protocol, i.e., it has non-terminating runs,
but when it terminates, it does so in constant
expected rounds.


Applications 


There exists a one-to-one correspondence, 
possibility- and impossibility-wise between BA 
in the unconditional setting as defined above 
and a formulation of the problem called the 
“Byzantine generals” by Lamport, Shostak and 
Pease [15], where there is a distinguished source
among the parties sending a value, call it $b_s$, and
the rest of the parties having to agree on it. The
Agreement condition remains unchanged; the
Validity condition becomes


• Validity: If the source is non-faulty, then
$d_i = b_s$ for all non-faulty processors $P_i$.


A protocol for this version of the problem realizes 
a functionality called a “broadcast channel” on 
a network with only point-to-point connectivity. 
Such a tool is very useful in the context of 
cryptographic protocols and secure multi-party 
computation [12]. Probabilistic BA is particularly 
relevant here, since it provides a constant- 
round implementation of the functionality. In 
this respect, without any optimizations, the 
reported actual number of expected rounds of 
the Feldman—Micali BA protocol is at most 
57. Recently, Katz and Koo [14] presented 
a probabilistic BA protocol with an expected 
number of rounds at most 23. 

BA has many other applications. Refer to the entry
» Byzantine Agreement, as well as to, e.g., [16]
for further discussion of other application areas.


Problem Definition


The classical stable marriage problem (SM), first
studied by Gale and Shapley [5], is introduced in
» Stable Marriage. An instance of SM comprises
a set $\mathcal{M} = \{m_1, \ldots, m_n\}$ of $n$ men and a set
$\mathcal{W} = \{w_1, \ldots, w_n\}$ of $n$ women and for each
person a preference list, which is a total order
over the members of the opposite sex. A man’s
(respectively woman’s) preference list indicates
his (respectively her) strict order of preference
over the women (respectively men). A matching
$M$ is a set of $n$ man-woman pairs in which each
person appears exactly once. If the pair $(m, w)$ is
in the matching $M$, then $m$ and $w$ are partners
in $M$, denoted by $w = M(m)$ and $m = M(w)$.
Matching $M$ is stable if there is no man $m$ and
woman $w$ such that $m$ prefers $w$ to $M(m)$ and $w$
prefers $m$ to $M(w)$.

The key result established in [5] is that at 
least one stable matching exists for every instance 
of SM. In general, there may be many stable 
matchings, so the question arises as to what is 
an appropriate definition for the “best” stable 
matching and how such a matching may be found. 

Gale and Shapley described an algorithm to 
find a stable matching for a given instance of 
SM. This algorithm may be applied either from 
the men’s side or from the women’s side. In the 
former case, it finds the so-called man-optimal 
stable matching, in which each man has the 
best partner, and each woman the worst partner, 
that is possible in any stable matching. In the 
latter case, the woman-optimal stable matching 
is found, in which these properties are inter- 
changed. For some instances of SM, the man- 
optimal and woman-optimal stable matchings co- 
incide, in which case this is the unique stable 
matching. In general, however, there may be 
many other stable matchings between these two 
extremes. Knuth [13] was first to show that the 
number of stable matchings can grow exponen- 
tially with n. 
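
The proposal algorithm and the stability condition defined above can be sketched as follows. This is a minimal illustration rather than a reference implementation: people are numbered 0 to n-1, preference lists are Python lists ordered from most to least preferred, and the function names are chosen for the example.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Proposer-optimal stable matching; returns match[p] = receiver of proposer p."""
    n = len(proposer_prefs)
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = [{p: pos for pos, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_choice = [0] * n          # next list position each proposer will try
    holder = [None] * n            # holder[r] = proposer currently held by r
    free = list(range(n))
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holder[r]
        if current is None:
            holder[r] = p
        elif rank[r][p] < rank[r][current]:
            holder[r] = p          # r trades up; the old partner becomes free
            free.append(current)
        else:
            free.append(p)         # r rejects p
    match = [None] * n
    for r, p in enumerate(holder):
        match[p] = r
    return match


def is_stable(match, men_prefs, women_prefs):
    """True if no man m and woman w prefer each other to their partners."""
    n = len(match)
    husband = [None] * n
    for m, w in enumerate(match):
        husband[w] = m
    for m in range(n):
        for w in men_prefs[m]:
            if w == match[m]:
                break              # remaining women are worse than M(m)
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False       # (m, w) is a blocking pair
    return True
```

Calling `gale_shapley(men_prefs, women_prefs)` yields the man-optimal matching, while swapping the arguments yields the woman-optimal one (indexed by woman). On the instance with `men_prefs = [[0, 1], [1, 0]]` and `women_prefs = [[1, 0], [0, 1]]` the two differ, so that instance has more than one stable matching.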

Because of the imbalance inherent, in general, 
in the man-optimal and woman-optimal solu- 
tions, several other notions of optimality in SM 
have been proposed. 

A stable matching $M$ is egalitarian if the sum

$$\sum_i r(m_i, M(m_i)) + \sum_j r(w_j, M(w_j))$$

is minimized over all stable matchings, where
$r(m, w)$ represents the rank, or position, of $w$
in $m$’s preference list and similarly for $r(w, m)$.
An egalitarian stable matching incorporates an 
optimality criterion that does not overtly favor the 
members of one sex — though it is easy to con- 
struct instances having many stable matchings in 
which the unique egalitarian stable matching is in 
fact the man (or woman) optimal. 

A stable matching $M$ is minimum regret if
the value $\max_p r(p, M(p))$ is minimized over all
stable matchings, where the maximum is taken
over all persons $p$. A minimum-regret stable
matching involves an optimality criterion based 
on the least happy member of the society, but 
again, minimum regret can coincide with man 
optimal or woman optimal in some cases, even 
when there are many stable matchings. 

A stable matching is rank maximal (or lexico- 
graphically maximal) if, among all stable match- 
ings, the largest number of people have their first 
choice partner and, subject to that, the largest 
number have their second choice partner and so 
on. 

A stable matching $M$ is sex equal if the difference

$$\left| \sum_i r(m_i, M(m_i)) - \sum_j r(w_j, M(w_j)) \right|$$

is minimized over all stable matchings. This definition
is an explicit attempt to ensure that one
sex is treated no more favorably than the other, 
subject to the overriding criterion of stability. 
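
The rank-based objectives defined so far (egalitarian, minimum regret, sex equal) are simple functions of a given matching. The sketch below evaluates them for one matching; it does not search over stable matchings. It assumes people are numbered 0 to n-1, `match[m]` is the partner of man m, ranks are 1-based positions in the preference lists, and the sex-equality measure is taken as an absolute difference. All function names are illustrative.

```python
def partner_ranks(match, men_prefs, women_prefs):
    """1-based ranks r(m, M(m)) for all men and r(w, M(w)) for all women."""
    men = [men_prefs[m].index(w) + 1 for m, w in enumerate(match)]
    women = [women_prefs[w].index(m) + 1 for m, w in enumerate(match)]
    return men, women


def egalitarian_cost(match, men_prefs, women_prefs):
    """Sum of all partner ranks, over both sexes."""
    men, women = partner_ranks(match, men_prefs, women_prefs)
    return sum(men) + sum(women)


def regret(match, men_prefs, women_prefs):
    """Rank of the partner of the least happy person."""
    men, women = partner_ranks(match, men_prefs, women_prefs)
    return max(men + women)


def sex_equality_cost(match, men_prefs, women_prefs):
    """Absolute difference between the men's and the women's rank sums."""
    men, women = partner_ranks(match, men_prefs, women_prefs)
    return abs(sum(men) - sum(women))
```

An optimal matching under any of these criteria is one that minimizes the corresponding function over the set of stable matchings; enumerating that set naively can be exponential in n, which is why the structural results discussed later matter.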

In the weighted stable marriage problem 
(WSM), each person has, as before, a strictly 
ordered preference list, but the entries in this 
list have associated costs or weights — wt(m, w) 
represents the weight associated with woman 
w in the preference list of man m and likewise 
for wt(w,m). It is assumed that the weights are 
strictly increasing along each preference list. 

A stable matching $M$ in an instance of WSM
is optimal if

$$\sum_i wt(m_i, M(m_i)) + \sum_j wt(w_j, M(w_j))$$

is minimized over all stable matchings.

A stable matching $M$ in an instance of WSM
is balanced if

$$\max\left\{ \sum_i wt(m_i, M(m_i)),\ \sum_j wt(w_j, M(w_j)) \right\}$$

is minimized over all stable matchings.

These same forms of optimality may be de- 
fined in the more general context of the sta- 
ble marriage problem with incomplete preference 
lists (SMI); see » Stable Marriage for a formal 
definition of this problem. 

Again as described in » Stable Marriage, the 
stable roommates problem (SR) is a non-bipartite 
generalization of SM, also introduced by Gale 
and Shapley [5]. In contrast to SM, an instance 
of SR may or may not admit a stable matching. 
Irving [9] gave the first polynomial-time algo- 
rithm to determine whether an SR instance admits 
a stable matching and if so to find one such 
matching. 

There is no concept of man or woman optimal
in the SR context, nor is there any analogue
of sex-equal or balanced matching. However, the
other forms of optimality introduced above can 
be defined also for instances of SR and WSR 
(weighted stable roommates). 

A comprehensive treatment of many aspects of 
the stable marriage problem, as of 1989, appears 
in the monograph of Gusfield and Irving [8], and 
a more recent detailed exposition is given by 
Manlove [14]. 


Key Results 


The key to providing efficient algorithms for the 
various kinds of optimal stable matching is an 
understanding of the algebraic structure underly- 
ing an SM instance and the discovery of methods 
to exploit this structure. Knuth [13] attributes to 
Conway the observation that the set of stable 
matchings for an SM instance forms a distribu- 
tive lattice under a natural dominance relation. 
Irving and Leather [10] characterized this lattice 
in terms of the so-called rotations — essentially 
minimal differences between lattice elements — 
which can be efficiently computed directly from 
the preference lists. The rotations form a natural 
partial order, the rotation poset, and there is a 
one-to-one correspondence between the stable 
matchings and the closed subsets of the rotation 
poset. 

Building on these structural results, Gusfield
[6] gave an $O(n^2)$ algorithm to find a minimum-regret
stable matching, improving an earlier
$O(n^4)$ algorithm described by Knuth [13] and
attributed to Selkow. Irving et al. [11] showed
how the application of network flow methods to
the rotation poset yields efficient algorithms for
egalitarian and rank-maximal stable matchings as
well as for an optimal stable matching in WSM.
These algorithms have complexities $O(n^4)$,
$O(n^5 \log n)$, and $O(n^4 \log n)$, respectively.
Subsequently, by using an interpretation of
a stable marriage instance as an instance of
2-SAT and with the aid of a faster network
flow algorithm exploiting the special structure
of networks representing SM instances, Feder
[3, 4] reduced these complexities to $O(n^3)$,
$O(n^{3.5})$, and $O(\min(n, \sqrt{K})\, n^2 \log(K/n^2 + 2))$,
respectively, where $K$ is the weight of an optimal
solution.

By way of contrast, and perhaps surprisingly, 
the problems of finding a sex-equal stable match- 
ing for an instance of SM and of finding a 
balanced stable matching for an instance of WSM 
have been shown to be NP-hard [2, 12]. 

The following theorem summarizes the cur- 
rent state of knowledge regarding the various 
flavors of optimal stable matching in SM and 
WSM. 


Theorem 1 For an instance of SM:

1. A minimum-regret stable matching can be
found in $O(n^2)$ time.

2. An egalitarian stable matching can be found
in $O(n^3)$ time.

3. A rank-maximal stable matching can be found
in $O(n^{3.5})$ time.

4. The problem of finding a sex-equal stable
matching is NP-hard.

For an instance of WSM:

1. An optimal stable matching can be found in
$O(\min(n, \sqrt{K})\, n^2 \log(K/n^2 + 2))$ time, where
$K$ is the weight of an optimal solution.

2. The problem of finding a balanced stable
matching is NP-hard, but can be approximated
within a factor of 2 in $O(n^2)$ time.


Among related problems that can also be solved
efficiently by exploitation of the rotation structure
of an instance of SM are the following [6]:

• All stable pairs, i.e., pairs that belong to at
least one stable matching, can be found in
$O(n^2)$ time.

• All stable matchings can be enumerated in
$O(n^2 + kn)$ time, where $k$ is the number of
such matchings.


Results analogous to those of Theorem 1 are
known for the more general SMI problem. In 
the case of the stable roommates problem (SR), 
some of these problems appear to be harder, as 
summarized in the following theorem. 


Theorem 2 For an instance of SR:

1. A minimum-regret stable matching can be
found in $O(n^2)$ time [7].

2. The problem of finding an egalitarian stable
matching is NP-hard. It can be approximated
in polynomial time within a factor of $\alpha$ if and
only if minimum vertex cover can be approximated
within $\alpha$ [1, 2].

For an instance of WSR (weighted stable roommates):

1. The problem of finding an optimal stable
matching is NP-hard, but can be approximated
within a factor of 2 in $O(n^2)$ time [3].


Applications 


The best known and most important applications
of stable matching algorithms are in centralized
matching schemes in the medical and educational
domains. » Hospitals/Residents Problem
includes a summary of some of these applications.


Open Problems


There remains the possibility of improving the 
complexity bounds for some of the optimiza- 
tion problems discussed and for finding better 
polynomial-time approximation algorithms for 
the various NP-hard problems. 


Problem Definition


Let $S$ be a set of $n$ points or vertices in $\mathbb{R}^2$.
An edge is a closed line segment connecting two
points. Let $E$ be a collection of edges determined
by vertices of $S$. The graph $G = (S, E)$ is a plane
geometric graph if (i) no edge contains a vertex
other than its endpoints, that is, $ab \cap S = \{a, b\}$
for every edge $ab \in E$, and (ii) no two edges
cross, that is, $ab \cap cd \subseteq \{a, b\}$ for every two
edges $ab \ne cd$ in $E$. A triangulation of $S$
is a plane geometric graph $T = (S, E)$ with
$E$ being maximal. Here maximality means that
edges in $E$ bound the convex hull of $S$, i.e., the
smallest convex set in $\mathbb{R}^2$ that contains $S$, and
subdivide its interior into disjoint faces bounded
by triangles.

A plane geometric graph $G = (S, E)$ can
be augmented with an edge set $E'$ so that it
is a triangulation $T = (S, E \cup E')$, referred
to as a triangulation of $G$. In this case, $E$ is
the set of constraining edges if it is not empty.
Some triangulations of $G$ are classified as optimal
triangulations according to various shape criteria.
Many of these criteria are defined as max-min,
short for maximizing the minimum, or min-max,
short for minimizing the maximum. The first
quantifier is over all possible triangulations of
$G$ while the second one is over all measures
$\mu$ (e.g., angles) of triangles of a triangulation. For
example, in the case of a min-max $\mu$ criterion,
we define the measure of a triangulation $A$ as
$\mu(A) = \max\{\mu(t) : t \text{ is a triangle of } A\}$. If $A$
and $B$ are two triangulations of $G$, then $B$ is called
an improvement of $A$ if either $\mu(B) < \mu(A)$
or $\mu(B) = \mu(A)$ and the set of triangles $t$
of $B$ with $\mu(t) = \mu(B)$ is a proper subset of
that of $A$. Triangulation $A$ is optimal for $\mu$, i.e.,
a min-max $\mu$ triangulation of $G$, if there exists
no improvement of $A$. Hence, the computational
problem addressed here is to find a specific optimal
triangulation for a given plane geometric
graph $G$.


Key Results 


There are a few algorithmic paradigms or approaches
to solve the optimal triangulation problems
in $\mathbb{R}^2$.


The Edge-Flip Approach 

The most notable one is the edge-flip approach [11]
to solve the max-min angle
triangulation problem of a point set $S$. Given
a triangulation $A$ of $G = (S, \emptyset)$, edge-flip is a
local optimization method that operates on two
adjacent triangles whose union forms a convex
polygon. It replaces (or flips) the edge $bd$ shared
by triangles $abd$ and $cdb$ with the edge $ac$
when the smallest angle of these triangles is
smaller than that of $abc$ and $acd$. In effect,
an edge-flip replaces two existing triangles
with two new triangles to (possibly) obtain an
improvement of $A$. By repeating the edge-flip
until no such edge $bd$ exists, the algorithm
produces a specific max-min angle triangulation
of $S$, known as the Delaunay triangulation, in
$O(n^2)$ time. Besides being a max-min angle
triangulation [3], the Delaunay triangulation is also
the min-max circumscribed circle and the min-max
smallest enclosing circle [2] triangulation.
Note that other approaches exist to compute
the Delaunay triangulation more efficiently in
$O(n \log n)$ time [3].
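
The local flip rule described above can be written down directly from the angle comparison. The following is a minimal sketch of the test for a single convex quadrilateral, not a full triangulation algorithm; it assumes points are (x, y) tuples, that a, b, c, d are the corners of a convex quadrilateral listed in order with bd as the current diagonal, and it uses floating-point angles, so near-degenerate inputs would need exact arithmetic. Function names are illustrative.

```python
import math


def angle_at(v, u, w):
    """Interior angle at vertex v of triangle (u, v, w), in radians."""
    ux, uy = u[0] - v[0], u[1] - v[1]
    wx, wy = w[0] - v[0], w[1] - v[1]
    cross = ux * wy - uy * wx
    dot = ux * wx + uy * wy
    return math.atan2(abs(cross), dot)


def min_angle(p, q, r):
    """Smallest interior angle of triangle pqr."""
    return min(angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q))


def should_flip(a, b, c, d):
    """True if replacing diagonal bd by ac increases the smallest angle.

    Compares triangles abd, cdb (current) with abc, acd (after the flip).
    """
    current = min(min_angle(a, b, d), min_angle(c, d, b))
    flipped = min(min_angle(a, b, c), min_angle(a, c, d))
    return current < flipped
```

For the kite with corners (-3, 0), (0, -1), (3, 0), (0, 1), the short vertical diagonal gives a smallest angle of about 36.9 degrees while the long horizontal one gives about 18.4 degrees, so the rule keeps the short diagonal and flips the long one.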


The Edge-Insertion Approach 

The edge-insertion approach is considered as an 
extension of the edge-flip approach, to replace 
one or more edges in each operation. The basic 
idea is to iteratively improve a current triangu- 
lation A by an edge-insertion step which adds 
an appropriate, new edge say qs to A, deleting 
edges in A that cross qs and re-triangulating the 
resulting polygons to the left and the right of qs. 
In other words, the method starts by constructing
an arbitrary triangulation $A$ of $G$ and then
repeatedly applies edge-insertion steps until no
further improvement exists. As in the case
of edge-flip, this does not work for all measures
$\mu$, as some may lead to suboptimal solutions.
The approach is known to be applicable if the
conditions of the so-called Cake-Cutting Lemma,
which guarantees the existence of an improvement,
are fulfilled; see [1, 5] for details. The next
theorem summarizes the results obtained by the 
edge-insertion approach. 


Theorem 1 For a plane geometric graph $G = (S, E)$
of $n = |S|$ vertices:

1. A min-max angle triangulation of $G$ can be
computed in time $O(n^2 \log n)$ and storage $O(n)$.

2. A max-min height triangulation of $G$ can be
computed in time $O(n^2 \log n)$ and storage $O(n)$.

3. A min-max eccentricity triangulation of $G$ can
be computed in time $O(n^3)$ and storage $O(n^2)$.

4. A min-max slope triangulation of $G$ can be
computed in time $O(n^3)$ and storage $O(n^2)$.


Let us go through the measures mentioned
in the theorem. The height of a triangle is the
minimum distance from a vertex to the opposite
edge. The eccentricity of a triangle is the infimum
over all distances between the center of the circumcircle
of the triangle and points in its closure.
To define the slope of a triangle, the triangulation
is given as a 2D projection of a 2.5D piecewise-linear
surface where each vertex of $S$ has a third
coordinate, and the slope of each triangle is its
slope in $\mathbb{R}^3$.


The Subgraph Approach 

The subgraph approach constructs a desired 
optimal triangulation by first computing a 
substructure of the optimal triangulation and then 
completes the computation by solving the smaller 
problems defined by the substructure. This 
approach works when (i) the substructure can 
be computed efficiently and (ii) the substructure 
subdivides the problem into smaller problems 
such as polygons that can be solved efficiently. 
For instance, the approach has successfully
solved the min-max length triangulation problem 
using a substructure called relative neighborhood 
graph [4]. Here the length of a triangle is the 
length of its longest edge. 


Theorem 2 A min-max length triangulation of a
set of $n$ points in $\mathbb{R}^2$ can be constructed in $O(n^2)$
time and storage.


Note that the theorem is formulated with reference
to a set of $n$ points instead of the general
plane geometric graph. In fact, this theorem is
valid for the latter provided the minimization
condition is defined over all edges (of triangles)
including the constraining edges. In both
cases, the correctness of the theorem follows
from the fact that every point set $S$ in $\mathbb{R}^2$ has
a min-max length triangulation $mlt(S)$ such that
$rng(S) \cup ch(S) \subseteq mlt(S)$, where $rng(S)$ is the
relative neighborhood graph of $S$ and $ch(S)$ is
the set of edges bounding the convex hull of $S$.
Since $rng(S)$ and $ch(S)$ can each be computed
in $O(n \log n)$ time, and $rng(S) \cup ch(S)$ is a connected
graph on $S$, the min-max length triangulation
problem can be solved by first constructing
$rng(S) \cup ch(S)$ and then computing an optimal
triangulation within each polygon defined by
edges of $rng(S) \cup ch(S)$. The latter is solvable in
$O(n^2)$ time. Besides the Euclidean metric, Theorem
2 can be extended to general normed metrics as
stated in the next theorem.


Theorem 3 Let $S$ be a set of $n$ points in $\mathbb{R}^2$
equipped with a normed metric. Given the
relative neighborhood graph, a min-max length
triangulation of $S$ can be constructed in time
$O(n^2)$.


Examples of normed metrics are the $\ell_p$-metrics,
for $p = 1, 2, 3, \ldots$, and the so-called
$\lambda$-metric used in VLSI applications. Note that the
relative neighborhood graph under the $\ell_p$-metrics
can be computed in $O(n \log n)$ time. As for the
other normed metrics, a relative neighborhood
graph can be constructed in time $O(n^3)$ with a
trivial approach to test all $\binom{n}{2}$ edges, each in time
$O(n)$.
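
The trivial cubic-time construction mentioned above can be sketched as follows. The sketch assumes the usual definition of the relative neighborhood graph, which is not restated in this entry: points p and q are joined unless some third point r is strictly closer to both of them than they are to each other. The function names and the `lp_metric` helper are illustrative.

```python
from itertools import combinations


def lp_metric(p):
    """Distance function for the l_p metric in the plane (p >= 1)."""
    def dist(a, b):
        return (abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p) ** (1.0 / p)
    return dist


def relative_neighborhood_graph(points, dist):
    """Edges (i, j), i < j, of the RNG, found by testing all pairs in O(n^3)."""
    n = len(points)
    edges = []
    for i, j in combinations(range(n), 2):      # all n-choose-2 candidate edges
        d = dist(points[i], points[j])
        # O(n) test: is there a point strictly closer to both endpoints?
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d
            for k in range(n) if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges
```

Passing the metric as a function keeps the construction independent of the norm, which matches the setting of Theorem 3 where the relative neighborhood graph is assumed to be given for whichever normed metric is in use.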

We note that min-max length is currently the 
only nontrivial length criterion known to be solv- 
able in polynomial time. The max-min length 
triangulation problem for an input point set is 
shown to be NP-complete, while the same prob- 
lem for a convex polygon is known to be solvable 
in linear time [6]. Another related problem is 
to find the minimum weight triangulation of G, 
where the weight of a triangulation is the sum of
the lengths of its edges. This problem is proven to be
NP-hard [12]. 


Applications 


Triangulation is a prominent meshing method 
that decomposes a domain into a collection of tri- 
angles. Such decomposition is used in many areas 
of engineering and scientific applications such as 
physics simulation, visualization, approximation 
theory, numerical analysis, computer-aided geo- 
metric design, etc. It is desirable to obtain an opti- 
mal triangulation, often with respect to the angle, 
edge length, aspect ratio, etc., as the quality of the 
subsequent computation depends on the shapes of 
the triangles. Two popular techniques that greatly 
depend on such optimal triangulations are finite 
element analysis and surface interpolation; see, 
for example, the survey in [7]. 


Open Problems


There are a few other interesting measures one 
can define over a triangulation, such as area, 
aspect ratio, and vertex degree. The min-max 
area and max-min area triangulation problems for 
a point set are still open, though the special case 
of a convex polygon can be solved in polynomial 
time [10]. The problem to triangulate a plane 
geometric graph with degree at most seven is 
known to be NP-complete [8], and the min- 
max degree problem for an arbitrary biconnected 
plane geometric graph is also NP-complete [9]. 
Its general problem without any constraining 
edges is still open. 


URLs to Code and Data Sets 


A version of the edge-insertion approach was 
implemented by Roman Waupotitsch. It is known 
to be available at: ftp://ftp.ncsa.uiuc.edu/SGI/MinMaxer/


Problem Definition


Find a minimal sum-of-products expression
for a Boolean function. Consider a Boolean
algebra with elements False and True. A Boolean
function $f(y_1, y_2, \ldots, y_n)$ of $n$ Boolean input
variables specifies, for each combination of input
variable values, the function’s value. It is possible
to represent the same function with various
expressions. For example, the first and last
expressions in Table 1 correspond to the same
function. Assuming access to complemented
input variables, straightforward implementations
of these expressions would require two AND
gates and an OR gate for $(\bar{a} \wedge \bar{b}) \vee (\bar{a} \wedge b)$ and
only a wire for $\bar{a}$. Although the implementation
efficiency depends on target technology, in
general terser expressions enable greater
efficiency. Boolean minimization is the task of
deriving the tersest expression for a function.
Elegant and optimal algorithms exist for solving
the variant of this problem in which the
expression is limited to two levels, i.e., a layer
of AND gates followed by a single OR gate or a
layer of OR gates followed by a single AND gate.


Key Results 


This survey will start by introducing the Karnaugh
Map visualization technique, which will
be used to assist in the subsequent explanation
of the Quine-McCluskey algorithm for two-level
Boolean minimization. This algorithm is optimal
for its constrained problem variant. It is one of the 
fundamental algorithms in the field of computer- 
aided design and forms the basis or inspiration 
for many solutions to more general variants of the 
Boolean minimization problem. 


Karnaugh Maps 

Karnaugh Maps [4] provide a method of visual- 
izing adjacency in Boolean space. A Karnaugh 
Map is a projection of an n-dimensional hyper- 
cube onto a two-dimensional surface such that 
adjacent points in the hypercube remain adjacent 
in the two-dimensional projection. Figure 1 illustrates
Karnaugh Maps of 1, 2, 3, and 4 variables:
a, b, c, and d.
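
The adjacency property can be checked mechanically. The following is a minimal sketch, assuming rows and columns are labeled with a reflected Gray code and that the row variables are the high-order bits of the cell index; the function names and this particular variable ordering are illustrative assumptions and need not match the layout drawn in Fig. 1.

```python
def gray_code(bits):
    """Return the reflected Gray code on `bits` bits as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** bits)]


def karnaugh_layout(row_bits, col_bits):
    """Return a grid of cell indices; row variables are the high-order bits."""
    return [[(r << col_bits) | c for c in gray_code(col_bits)]
            for r in gray_code(row_bits)]


def differs_in_one_bit(x, y):
    """True if the two cell indices differ in exactly one variable."""
    return bin(x ^ y).count("1") == 1


layout = karnaugh_layout(2, 2)  # four variables, 4 x 4 map
for row in layout:
    print(" ".join(f"{cell:04b}" for cell in row))
```

Every pair of horizontally or vertically neighboring cells, including the wrap-around neighbors at the map edges, differs in exactly one variable, which is what allows implicants to appear as rectangles.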

A literal is a single appearance of a comple- 
mented or uncomplemented input variable in a 
Boolean expression. A product term or impli- 
cant is the Boolean product, or AND, of one or 
more literals. Every implicant corresponds to the 
repeated balanced bisection of Boolean space, 
or of the corresponding Karnaugh Map, i.e., an

Optimal Two-Level Boolean Minimization, Table 1 Equivalent representations with different implementation complexities

Expression              Meaning in English               Boolean Logic Identity
(¬a ∧ ¬b) ∨ (¬a ∧ b)    not a and not b or not a and b   Distributivity
¬a ∧ (¬b ∨ b)           not a and either not b or b      Complements
¬a ∧ True               not a and True                   Boundedness
¬a                      not a

Optimal Two-Level Boolean Minimization, Fig. 1 Boolean function spaces from one to four dimensions and their corresponding Karnaugh Maps


implicant is a rectangle in a Karnaugh Map with
width m and height n where m = 2^j and n = 2^k
for arbitrary nonnegative integers j and k, e.g.,
the ovals in Fig. 2(ii-v). An elementary implicant
is an implicant in which, for each variable of 
the corresponding function, the variable or its 
complement appears, e.g., the circles in Fig. 2(ii). 
Implicant A covers implicant B if every elemen- 
tary implicant in B is also in A. 

Prime implicants are implicants that are not 
covered by any other implicants, e.g., the ovals 
and circle in Fig. 2(iv). It is unnecessary to con- 
sider anything but prime implicants when seek- 
ing a minimal function representation because, 
if non-prime implicants could be used to cover 
some set of elementary implicants, there is guar- 
anteed to exist a prime implicant that covers those 
elementary implicants and contains fewer literals. 
One can draw the largest implicants covering 
each elementary implicant and covering no posi- 
tions for which the function is False, thereby us- 
ing Karnaugh Maps to identify prime implicants. 
One can then manually seek a compact subset of 


prime implicants covering all elementary impli- 
cants in the function. 

This Karnaugh Map-based approach is effec- 
tive for functions with few inputs, i.e., those 
with low dimensionality. However, representing 
and manipulating Karnaugh Maps for functions 
of many variables is challenging. Moreover, the 
Karnaugh Map method provides no clear set of 
rules to follow when selecting a minimal subset 
of prime implicants to implement a function. 


The Quine-McCluskey Algorithm 
The Quine-McCluskey algorithm provides a
formal, optimal way of solving the two-level 
Boolean minimization problem. W. V. Quine laid 
the essential theoretical groundwork for optimal 
two-level logic minimization [7,8]. However, E. 
J. McCluskey first proposed a precise algorithm 
to fully automate the process [6]. Both are built 
upon the ideas of M. Karnaugh [4]. 

The Quine-McCluskey method has two
phases: (1) produce all prime implicants and
(2) select a minimal subset of prime implicants

Optimal Two-Level Boolean Minimization, Fig. 2 (i) Karnaugh Map of function f(a, b, c, d), (ii) elementary
implicants, (iii) second-order implicants, (iv) prime implicants, and (v) a minimal cover


covering the function. In the first phase, the 
elementary implicants of a function are iteratively 
combined to produce implicants with fewer 
literals. Eventually, all prime implicants are 
thus produced. In the second phase, a minimal 
subset of prime implicants covering the on-set 
elementary implicants is selected using unate 
covering [5]. 

The Quine-McCluskey method may be illustrated
using an example. Consider the function
indicated by the Karnaugh Map in Fig. 2(i) and 
the truth table in Table 2. For each combination 
of Boolean input variable values, the function 
f(a, b,c, d) is required to output a 0 (False), a 
1 (True), or has no requirements. The lack of 
requirements is indicated with an X, or don’t-care 
symbol. 

Expanding implicants as much as possible will 
ultimately produce the prime implicants. To do 
this, combine on-set and don’t-care elementary 
implicants using the reduction theorem
((¬a ∧ ¬b) ∨ (¬a ∧ b) = ¬a) shown in Table 1. The elementary
implicants are circled in Fig. 2(ii) and listed in the

Optimal Two-Level Boolean Minimization, Table 2 Truth table of function f(a, b, c, d)

Elementary implicant   Function   Elementary implicant   Function
(a,b,c,d)              value      (a,b,c,d)              value
0000                   X          1000                   0
0001                   0          1001                   0
0010                   1          1010                   0
0011                   1          1011                   1
0100                   0          1100                   1
0101                   0          1101                   1
0110                   0          1110                   X
0111                   0          1111                   1


second column of Table 3. In this table, 0s indicate
complemented variables, and 1s indicate uncomplemented
variables, e.g., 0010 corresponds
to ¬a ∧ ¬b ∧ c ∧ ¬d. It is necessary to determine all possible
combinations of implicants. It is impossible to
combine nonadjacent implicants, i.e., those that
differ in more than one variable. Therefore, it
is not necessary to consider combining any pair
of implicants with a number of uncomplemented
variables differing by any value other than 1. This

Optimal Two-Level Boolean Minimization, Table 3
Identifying prime implicants

Number    Elementary implicant   Second-order   Third-order
of ones   (a,b,c,d)              implicant      implicant
0         0000 ✓                 00X0
1         0010 ✓                 001X
2         0011 ✓                 X011
          1100 ✓                 110X ✓         11XX
                                 11X0 ✓
3         1011 ✓                 1X11
          1101 ✓                 11X1 ✓
          1110 ✓                 111X ✓
4         1111 ✓


fact can be exploited by organizing the implicants 
based on the number of ones they contain, as indi- 
cated by the first column in Table 3. All possible 
combinations of implicants in adjacent subsets 
are considered. For example, consider combining 
0010 with 0011, which results in 001X or ¬a ∧ ¬b ∧ c,
and also consider combining 0010 with 1100, 
which is impossible due to differences in more 
than one variable. Whenever an implicant is suc- 
cessfully merged, it is marked. These marked im- 
plicants are clearly not prime implicants because 
the implicants they produced cover them and con- 
tain fewer literals. Note that marked implicants 
should still be used for subsequent combinations. 
The merged implicants in the third column of 
Table 3 correspond to those depicted in Fig. 2(iii). 

After all combinations of elementary impli- 
cants have been considered, and successful com- 
binations listed in the third column, this process is 
repeated on the second-order merged implicants 
in the third column, producing the implicants 
in the fourth column. Implicants that contain 
don’t-care marks in different locations may not 
be combined. This process is repeated until a 
column yielding no combinations is arrived at. 
The unmarked implicants in Table 3 are the prime 
implicants, which correspond to the implicants 
depicted in Fig. 2(iv). 
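
The first phase can be sketched in a few lines. This is a minimal illustration rather than McCluskey's exact procedure: for brevity it compares all pairs in each column instead of only those in adjacent number-of-ones groups, and the function names are assumptions. The on-set and don't-care set are those of the running example, as listed in Tables 3 and 4.

```python
from itertools import combinations


def merge(p, q):
    """Merge two implicants that differ in exactly one specified position.

    Returns the merged implicant, or None if the pair cannot be combined
    (several differences, or don't-care marks in different locations).
    """
    diff = [i for i, (x, y) in enumerate(zip(p, q)) if x != y]
    if len(diff) == 1 and "X" not in (p[diff[0]], q[diff[0]]):
        i = diff[0]
        return p[:i] + "X" + p[i + 1:]
    return None


def prime_implicants(on_set, dont_cares):
    """Return all prime implicants as strings over the alphabet 0, 1, X."""
    current = set(on_set) | set(dont_cares)
    primes = set()
    while current:
        marked, merged = set(), set()
        for p, q in combinations(sorted(current), 2):
            m = merge(p, q)
            if m is not None:
                merged.add(m)
                marked.update((p, q))  # marked implicants are not prime
        primes |= current - marked     # unmarked implicants are prime
        current = merged               # next column of the table
    return primes


on_set = ["0010", "0011", "1011", "1100", "1101", "1111"]
dont_cares = ["0000", "1110"]
print(sorted(prime_implicants(on_set, dont_cares)))
```

On this input the sketch yields the five prime implicants 00X0, 001X, X011, 1X11, and 11XX.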

After a function’s prime implicants have been 
identified, it is necessary to select a minimal 
subset that covers the function. The problem can 


Optimal Two-Level Boolean Minimization, Table 4
Solving unate covering problem to select minimal cover

Requirements              Resources (prime implicants)
(elementary implicants)   00X0   001X   X011   1X11   11XX
0010                      ✓      ✓
0011                             ✓      ✓
1011                                    ✓      ✓
1100                                                  ✓
1101                                                  ✓
1111                                           ✓      ✓


be formulated as unate covering. As shown in 
Table 4, label each column of a table with a prime 
implicant; these are resources that may be used 
to fulfill the requirements of the function. Label 
each row with an elementary implicant from the 
on-set; these rows correspond to requirements. 
Do not add rows for don’t cares. Don’t cares im- 
pose no requirements, although they were useful 
in simplifying prime implicants. Mark each row— 
column intersection for which the elementary 
implicant corresponding to the row is covered by 
the prime implicant corresponding to the column. 
If a column is selected, all the rows for which 
the column contains marks are covered, 1.e., those 
requirements are satisfied. The goal is to cover 
all rows with a minimal-cost subset of columns. 
McCluskey defined minimal cost as having a 
minimal number of prime implicants, with ties 
broken by selecting the prime implicants contain- 
ing the fewest literals. The most appropriate cost 
function depends on the implementation tech- 
nology. One can also use a similar formulation 
with other cost functions, e.g., minimize the total 
number of literals by labeling each column with 
a cost corresponding to the number of literals in 
the corresponding prime implicant. 
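
The covering table can be solved exhaustively for small instances. The sketch below is an illustrative brute-force search, not an efficient covering algorithm: it tries subsets in order of increasing size and breaks ties by total literal count, following the cost definition attributed to McCluskey above. The function names are assumptions, and the running time is exponential in the number of prime implicants.

```python
from itertools import combinations


def covers(implicant, minterm):
    """True if the implicant (over 0, 1, X) covers the elementary implicant."""
    return all(i in ("X", m) for i, m in zip(implicant, minterm))


def minimal_cover(primes, on_set):
    """Return a smallest set of primes covering on_set, fewest literals on ties."""
    primes = sorted(primes)
    for size in range(1, len(primes) + 1):
        best = None
        for subset in combinations(primes, size):
            if all(any(covers(p, m) for p in subset) for m in on_set):
                literals = sum(len(p) - p.count("X") for p in subset)
                if best is None or literals < best[0]:
                    best = (literals, subset)
        if best is not None:
            return list(best[1])
    return None  # no cover exists


primes = ["00X0", "001X", "X011", "1X11", "11XX"]
on_set = ["0010", "0011", "1011", "1100", "1101", "1111"]
print(minimal_cover(primes, on_set))
```

For the running example every minimal cover contains three prime implicants, and 11XX is always among them because it is the only column marked in rows 1100 and 1101.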

One can use a number of heuristics to accelerate
solution of the unate covering problem, e.g.,
neglect rows that have a superset of the marks of
any other row, for they will be implicitly covered,
and neglect columns that have a subset of the
marks of any other column if their costs are at least as
high, for the other column is at least as useful.
One can easily select columns as long as there
exists a row with only one mark because the
marked column is required for a valid solution.
However, there exist problem instances in which
each row contains multiple marks. In the
worst case, the best existing algorithms are re- 
quired to make tentative decisions, determine the 
consequences, and then backtrack and evaluate 
alternative decisions. 

The unate covering problem appears in many
applications. It is NP-complete [5], even for
the instances arising during two-level minimization [9].
Its use in the Quine-McCluskey method
predates its categorization as an NP-complete
problem by 16 years. A detailed treatment of this
problem would go well beyond the scope of this 
entry. However, Gimpel [3] as well as Coudert 
and Madre [2] provide good starting points for 
further reading. 

Some families of logic functions have opti- 
mal two-level representations that grow in size 
exponentially in the number of inputs, but have 
more compact multilevel implementations. These 
families are frequently encountered in arithmetic, 
e.g., a function indicating whether the number of 
on inputs is odd. Efficient implementation of such 
functions requires manual design or multilevel 
minimization [1]. 
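
The odd-parity function mentioned above makes the exponential growth easy to verify. In the sketch below (function names are illustrative assumptions), any two on-set elementary implicants differ in at least two variables, so none can be merged: each is itself a prime implicant, and a two-level sum-of-products form needs all 2^(n-1) of them.

```python
from itertools import combinations, product


def odd_parity_on_set(n):
    """Elementary implicants of the n-input function that is 1 on odd parity."""
    return ["".join(bits) for bits in product("01", repeat=n)
            if bits.count("1") % 2 == 1]


def hamming(p, q):
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(p, q))


def two_level_terms_for_parity(n):
    """Number of product terms in the minimal two-level form of odd parity."""
    on_set = odd_parity_on_set(n)
    # No two on-set points are adjacent, so nothing can ever be merged.
    assert all(hamming(p, q) >= 2 for p, q in combinations(on_set, 2))
    return len(on_set)


print([two_level_terms_for_parity(n) for n in range(1, 7)])
```

By contrast, a multilevel implementation can chain n - 1 two-input XOR gates, so its size grows only linearly in n.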


Applications 


Digital computers are composed of precisely two 
things: (1) implementations of Boolean logic 
functions and (2) memory elements. The Quine-McCluskey
method is used to permit efficient
implementation of Boolean logic functions in a
wide range of digital logic devices, including
computers. The Quine-McCluskey method
served as a starting point or inspiration for most
currently used logic minimization algorithms. Its
direct use is contraindicated when functions are not
amenable to efficient two-level implementation,
e.g., many arithmetic functions.


Problem Definition


The Orienteering problem and its variants are 
in the large class of vehicle routing problems, 
also containing the traveling salesperson problem 
(TSP), in which the goal is to find a short route 
that visits several potential destinations. Typically,
the input is represented by a graph G(V, E)
with an associated length function ℓ: E → R+,
where each destination is a vertex v ∈ V, and an
edge e = (u, v) has length ℓ(e) representing the
distance between u and v or the time it takes to
travel between them. Unlike TSP, where the goal 
is to find a short tour visiting all vertices, Orien- 
teering and its variants typically involve finding 
short walks that visit many vertices; having to 
choose the set of vertices to visit adds additional 
complexity to the problem. 

In Orienteering, we are given a bound on the 
maximum length of the walk (also referred to 
as a budget), and the goal is to visit as many 
vertices as possible. A closely related problem is 
k-Stroll; here, we are given an integer k, and the 
goal is to find a walk that is as short as possible, 
subject to visiting at least k vertices. For both 
these problems, the walks are allowed to traverse 
an edge multiple times; the length of a walk W is
Σ_{e ∈ W} ℓ(e). Hence, w.l.o.g., one can assume that
the input graph is complete (by working with its 
metric completion) or, equivalently, that the input 
is represented by a metric. 
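
A minimal sketch of the metric completion, assuming an undirected graph given as a list of (u, v, length) triples over vertices 0..n-1. It uses the Floyd-Warshall all-pairs shortest path recurrence; the function name and input format are illustrative choices.

```python
def metric_completion(n, edges):
    """Return the n x n matrix of shortest-path distances of an undirected graph."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, length in edges:
        d[u][v] = min(d[u][v], length)
        d[v][u] = min(d[v][u], length)
    # Floyd-Warshall: allow vertex k as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


d = metric_completion(3, [(0, 1, 1), (1, 2, 1), (0, 2, 5)])
print(d[0][2])  # the direct edge of length 5 is replaced by the path via vertex 1
```

A walk in the original graph corresponds to a walk of the same length in the completion and vice versa, which is why nothing is lost by working with the metric.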

We focus mainly on the “point-to-point” ver- 
sions of these problems, in which the start and 
end vertices of the walk are also specified; here, 
the goal is to find a short walk from the specified 
start vertex to the specified end vertex that visits 
many other vertices. One can also consider the 
variant in which only the start vertex is specified 
(and the algorithm can choose where to end the 
walk) or the one in which neither the start nor 
the end vertex is specified. These are referred to 
as the “rooted” and “unrooted” variants, respec- 
tively. We define the problems formally below. 


Problem 1 (Orienteering) 


INPUT: Graph G(V, E), with an associated
length function ℓ: E → R+, start and
end vertices s, t ∈ V, and a budget/length
bound L.
OUTPUT: An s-t walk of total length at most L.
OBJECTIVE: Maximize the number of distinct
vertices in the walk.


Problem 2 (k-Stroll) 

INPUT: Graph G(V, E), with an associated
length function ℓ: E → R+, start and end
vertices s, t ∈ V, and an integer k.

OUTPUT: An s-t walk containing at least k dis- 
tinct vertices. 

OBJECTIVE: Minimize the total length of the 
walk. 


Orienteering and k-Stroll are “dual” problems. 
They are equivalent in terms of exact solvability; 
a polynomial-time optimal algorithm for one can 
be used to obtain a polynomial-time optimal 
algorithm for the other. However, this is not 
true from the standpoint of approximability; an 
α-approximation for one does not immediately
imply an α-approximation for the other.
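
The exact-solvability direction from k-Stroll to Orienteering can be sketched as follows. The exact k-Stroll solver here is a brute-force stand-in (exponential time) for a hypothetical optimal oracle, and it assumes the input d is a metric given as a distance matrix, so that an optimal walk visits exactly the required number of vertices via direct edges. All names are illustrative.

```python
from itertools import permutations


def k_stroll(d, s, t, k):
    """Minimum length of an s-t walk with at least k distinct vertices (brute force)."""
    others = [v for v in range(len(d)) if v not in (s, t)]
    need = max(0, k - len({s, t}))  # extra vertices beyond the endpoints
    if need > len(others):
        return float("inf")
    best = float("inf")
    for middle in permutations(others, need):
        path = [s, *middle, t]
        best = min(best, sum(d[a][b] for a, b in zip(path, path[1:])))
    return best


def orienteering_via_k_stroll(d, s, t, budget):
    """Largest k whose optimal k-Stroll fits in the budget (0 if t is unreachable in time)."""
    best_k = 0
    for k in range(1, len(d) + 1):
        if k_stroll(d, s, t, k) <= budget:
            best_k = k
    return best_k


# Four points on a line at positions 0, 1, 2, 3; the walk starts and ends at 0.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
print(orienteering_via_k_stroll(d, 0, 0, 4))
```

Only at most n oracle calls are made, so a polynomial-time exact k-Stroll algorithm would give a polynomial-time exact Orienteering algorithm. The same argument does not carry over to approximation, because a k-Stroll solution that is slightly too long violates the hard budget.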

Orienteering with time windows (Orient-TW) 
is a generalization of Orienteering in which each 
vertex v has an associated time interval or win- 
dow [R(v), D(v)], and a vertex is considered 
“visited” (i.e., is counted toward the objective) 
only if the total length of the walk from the start 
vertex up to v is in the range [R(v), D(v)]. (For 
intuition, if the length of an edge is interpreted 
as the time taken to traverse it, then the vertex 
is counted if the time at which it is visited falls 
within its time window.) R(v) and D(v) are re- 
ferred to as the release time and deadline of vertex 
v, respectively. A special case of this problem 
(sometimes called orienteering with deadlines or 
even deadline-TSP) is when R(v) = 0 for all 
vertices v. 


Problem 3 (Orienteering with Time Windows) 


INPUT: Graph G(V, E), with an associated
length function ℓ: E → R+, start and end
vertices s, t ∈ V, a budget/length bound L,
and a time interval [R(v), D(v)] for each
vertex v ∈ V.

OUTPUT: An s-t walk of total length at most L.

OBJECTIVE: Maximize the number of distinct
vertices in the walk that are visited during
their time intervals. A vertex v is counted as
visited only if the walk visits v at a time
t ∈ [R(v), D(v)]; we assume it takes ℓ units
of time to cross an edge of length ℓ.
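
The objective of Orient-TW can be made concrete with a small evaluator. This is a sketch under the definition above: the arrival time at a vertex is the length of the walk up to that point, the walk does not wait at vertices, and a vertex that is passed several times counts once if any of those visits falls in its window. The function name and data layout are illustrative assumptions.

```python
def evaluate_walk(d, walk, windows):
    """Return (total length, number of distinct vertices visited in their windows).

    d is a distance matrix, walk a list of vertices, and windows[v] the
    pair (R(v), D(v)) of release time and deadline of vertex v.
    """
    time = 0
    counted = set()
    for i, v in enumerate(walk):
        if i > 0:
            time += d[walk[i - 1]][v]
        release, deadline = windows[v]
        if release <= time <= deadline:
            counted.add(v)
    return time, len(counted)


d = [[abs(i - j) for j in range(4)] for i in range(4)]
windows = {0: (0, 0), 1: (3, 4), 2: (0, 1), 3: (5, 5)}
print(evaluate_walk(d, [0, 1, 2, 1, 3], windows))
```

In this example vertex 1 is too early on its first visit (time 1) but inside its window on the second (time 3), while vertex 2 is reached at time 2, after its deadline.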


See [9] for an overview and applications of 
many vehicle routing problems related to Orien- 
teering and its variants. 


Key Results 


Orienteering is both NP-hard and APX-hard [3]; 
the same applies for k-Stroll, as a generalization 
of TSP. Therefore, we focus primarily on approx- 
imability. 


Undirected Graphs 

Arkin et al. [1] gave a 2-approximation for rooted 
Orienteering in the Euclidean plane. Chen and 
Har-Peled [7] improved this to a PTAS for the 
plane and higher-dimensional Euclidean metrics. 

For general undirected graphs (i.e., symmetric 
metrics), the first constant-factor approximation 
for rooted Orienteering was given by Blum et al. 
[3]. They obtained many of the key insights 
used in subsequent papers, reducing Orienteering 
to k-Stroll. Blum et al. [3] showed that an α-approximation
to k-Stroll gives a (1 + ⌈3α/2 − 1/2⌉)-approximation
for rooted Orienteering. This was
improved by Bansal et al. [2] to ⌈3α/2 − 1/2⌉
for the harder point-to-point variant. Since there
is a (2 + ε)-approximation for k-Stroll due to
[4], this gives a 3-approximation for Orienteering.
Chekuri et al. [6] reduced Orienteering to
a bicriteria version for k-Stroll; this gives the
current best (2 + ε)-approximation for Orienteering,
matching the ratio for k-Stroll.

A key challenge in Orienteering is the hard 
constraint on the total length of the walk L. In 
particular, if the shortest path from the source s 
to the destination t has length close to L, then
even a small detour from the shortest path to visit 
a cluster of many vertices might result in reaching 
t after the deadline L. Roughly speaking, [3] 
shows that an optimum walk can be broken down 
into segments which are “monotonic,” meaning 
that they visit vertices in increasing order of their
(shortest path) distance from s, and segments
which are “non-monotonic,” in which case the 
length of the segment is at least 3 times the 
shortest path between its endpoints. To find a 
good walk, one can use a dynamic program to 
“enumerate” all relevant segmentations of the 
optimal path; for each monotonic segment, one 
can find the optimum sub-path using dynamic 
programming. For the non-monotonic segments, 
one can “skip” the reward from some of them, 
which saves considerable distance since the de- 
tours taken by the path in such segments are 
large. This saving allows one to take a little extra 
distance to collect reward in the remaining non- 
monotonic segments (using an approximation al- 
gorithm for k-Stroll) while still keeping the total 
length of the walk at most L. 


Directed Graphs 

Orienteering is more challenging in directed
graphs, or asymmetric metrics. The first poly-logarithmic
approximation algorithms were due
independently to [6, 8]: Nagarajan and Ravi [8] gave an
O(log² n / log log n)-approximation using an
LP-based approach, while Chekuri et al. [6] gave an
O(log² OPT)-approximation using combinatorial
techniques. The ratio of [6] is better when OPT,
the number of vertices visited by an optimal
walk, is much less than n and is slightly worse
otherwise; on the other hand, the LP of [8] is
based on the well-known Held-Karp relaxation
for asymmetric TSP, and a conjectured improved
upper bound on the integrality gap of this
relaxation would immediately give an improved
approximation ratio for directed Orienteering.
Both these papers also obtain poly-logarithmic
approximations for the closely related problem
Directed k-TSP, which is the special case of
k-Stroll when s = t.

Chekuri and Pal [5] gave a quasi-polynomial- 
time O(logn)-approximation for directed Ori- 
enteering and several generalizations, including 
Orient-TW. This algorithm is based on repeatedly 
“guessing” the midpoint of sub-paths, and hence 
it does not appear easy to obtain a polynomial-time
equivalent.


Orienteering with Time Windows

Orient-TW in undirected graphs, with arbitrary
distinct release times and deadlines for each vertex,
was first studied by [2], which gave an
O(log n)-approximation for the case with only
deadlines (i.e., where R(v) = 0 for all v)
and O(log² n) for the general problem. Chekuri
et al. [6] later gave an O(max{log OPT, log R})-approximation
for Orient-TW, where R denotes
the ratio between the lengths of the longest and
shortest time windows; when all time windows
are polynomially bounded, this is an O(log n)-approximation.
(For directed graphs, via [6, 8],
we lose additional poly-logarithmic factors.)

The general approach taken by these papers, 
following [2], is to use combinatorial techniques 
to reduce the given instance of the problem to a 
collection of subproblems in which all vertices 
have identical or disjoint time windows. For a 
set of vertices with identical time windows, these 
windows can be effectively ignored, yielding an 
instance of Orienteering; walks for different sets 
with disjoint time windows can be combined 
using dynamic programming. Thus, Orient-TW 
can be reduced to Orienteering with the loss of 
logarithmic factors in the approximation ratio. 


Open Problems 


There are several natural open problems related 
to Orienteering. 


1. Is there a PTAS for Orienteering in undirected
planar graphs? In recent years, PTASes have
been obtained for many related problems, including
TSP, STEINER TREE, and their prize-collecting
versions, but extending these techniques
to Orienteering and k-Stroll (or even
the easier k-MST problem) seems challenging.

2. Can one obtain an O(logn) or even O(1)- 
approximation for directed Orienteering? The 
quasi-polynomial-time approximation of [5] 
provides some evidence that it may be possi- 
ble. Can one obtain any poly-logarithmic ap- 
proximation for directed k-Stroll? Currently, 
only bicriteria approximations are known. 


3. Is there an O(log n)-approximation for
Orient-TW?


Recommended Reading 


1. Arkin E, Mitchell J, Narasimhan G (1998) Resource-constrained
geometric network optimization. In: Symposium on computational
geometry, Minneapolis, pp 307-316

2. Bansal N, Blum A, Chawla S, Meyerson A (2004)
Approximation algorithms for deadline-TSP and vehicle
routing with time-windows. In: Proceedings of the
36th annual ACM symposium on theory of computing,
Chicago. ACM, New York, pp 166-174

3. Blum A, Chawla S, Karger D, Lane T, Meyerson A,
Minkoff M (2007) Approximation algorithms for orienteering
and discounted-reward TSP. SIAM J Comput
37(2):653-670

4. Chaudhuri K, Godfrey B, Rao S, Talwar K (2003)
Paths, trees, and minimum latency tours. In: 44th annual
symposium on foundations of computer science,
Cambridge. IEEE Computer Society, pp 36-45

5. Chekuri C, Pal M (2005) A recursive greedy algorithm
for walks in directed graphs. In: Proceedings of the
46th annual symposium on foundations of computer
science, Pittsburgh. IEEE Computer Society, pp 245-253

6. Chekuri C, Korula N, Pal M (2012) Improved algorithms
for orienteering and related problems. ACM
Trans Algorithms (TALG) 8(3):23

7. Chen K, Har-Peled S (2008) The orienteering problem
in the plane revisited. SIAM J Comput 38(1):385-397,
preliminary version in Proceedings of the ACM SoCG,
Sedona, 2006, pp 247-254

8. Nagarajan V, Ravi R (2011) The directed orienteering
problem. Algorithmica 60(4):1017-1030

9. Toth P, Vigo D (eds) (2001) The vehicle routing
problem. SIAM monographs on discrete mathematics
and applications. Society for Industrial and Applied
Mathematics, Philadelphia


Orthogonal Range Searching on 
Discrete Grids 


Yakov Nekrich 
David R. Cheriton School of Computer Science, 
University of Waterloo, Waterloo, ON, Canada 


Keywords 


Orthogonal range searching; Word RAM model 



Years and Authors of Summarized 
Original Work 


1988; Chazelle 

2000; Alstrup, Brodal, Rauhe 
2004; JaJa, Mortensen, Shi 
2007; Patrascu 

2009; Karpinski, Nekrich 
2011; Chan, Larsen, Patrascu 
2013; Chan 


Problem Definition 


Let S be a set of n d-dimensional points. In 
the orthogonal range searching problem we keep 
S in a data structure, so that for an arbitrary 
query rectangle Q = [a_1, b_1] × ... × [a_d, b_d]
information about points in Q ∩ S can be found.
Range searching is a fundamental computational 
geometry problem with numerous applications in 
data bases, text indexing, string processing, and 
network analysis. In computational geometry it 
is frequently assumed that point coordinates are 
real and the data structure works in the real RAM 
model. In a vast majority of practical situations 
we can, however, make a stronger assumption 
that point coordinates are discrete values. This 
scenario is captured by the word RAM model 
of computation: all coordinates are integers that 
fit into a machine word and standard operations 
on words can be performed in constant time. In 
this case, we say that points are on an integer 
grid. If point coordinates are also bounded by a 
parameter U, we say that points are on a grid of 
size U (also called a U x U grid if points are 
two-dimensional). 

The discrete grid assumption leads to im- 
proved results for many range searching prob- 
lems. In this entry we consider several such prob- 
lems. We remark that a problem on an integer grid 
can be always reduced to the same problem on a 
grid of size n using the technique called reduction 
to rank space [1, 8]. This reduction takes O(n)
additional space and increases the query time by
an additive term pred(n, U), where pred(n, U) =
O(min(log log U, √(log n / log log n))) is the time
needed to answer a predecessor query [2, 21].
Unless specified otherwise, we will assume that
all points are on a grid of size n. Throughout this
entry, the space usage of described data structures
is measured in words; each word consists of w ≥
log n bits and can hold a coordinate of any point
from the input set.
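
Reduction to rank space can be illustrated for two dimensions as follows. In this sketch each coordinate is replaced by its rank among the distinct coordinate values of the input, and a query rectangle is translated with one successor and one predecessor search per side. Binary search on a sorted array stands in for the predecessor data structure, so the translation costs O(log n) here rather than pred(n, U); the function names are assumptions.

```python
from bisect import bisect_left, bisect_right


def to_rank_space(points):
    """Return sorted distinct x values, sorted distinct y values, and ranked points."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    ranked = [(bisect_left(xs, x), bisect_left(ys, y)) for x, y in points]
    return xs, ys, ranked


def map_query(xs, ys, a, b, c, d):
    """Translate [a, b] x [c, d] into an equivalent rank-space rectangle."""
    return (bisect_left(xs, a),       # rank of the successor of a
            bisect_right(xs, b) - 1,  # rank of the predecessor of b
            bisect_left(ys, c),
            bisect_right(ys, d) - 1)


points = [(10, 500), (70, 20), (1000, 300), (40, 40)]
xs, ys, ranked = to_rank_space(points)
print(ranked)
print(map_query(xs, ys, 30, 800, 30, 600))
```

A point lies in the original query rectangle exactly when its ranked image lies in the translated rectangle, so any structure for the n × n grid can answer the original query after the translation.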


Key Results 
Orthogonal Range Reporting 


Two Dimensions 

The problem is to keep a set of points S in a data
structure, so that all points in Q ∩ S for any query
rectangle Q can be reported. Overmars [17] was
the first to consider this problem in the discrete
grid scenario. His data structure needs O(n log n)
words of space and supports two-dimensional
queries on a U × U grid in O(log log U + k)
time, where k is the number of reported points.
Alstrup et al. [1] improved the space usage and
described a data structure that uses O(n log^ε n)
space and supports queries in O(log log n + k)
time. Henceforth, ε denotes an arbitrarily small
positive constant. There are also data structures
that use less space but need more than constant
time per reported point. Chan et al. [6] describe
an O(n)-space data structure that answers two-dimensional
queries in O((k + 1) log^ε n) time
and an O(n log log n)-space data structure that
answers queries in O((k + 1) log log n) time. Any
data structure that uses n log^{O(1)} n space needs
Ω(log log n + k) time to answer two-dimensional
range reporting queries. This follows from the reduction
of the two-dimensional range reporting problem
to the predecessor problem [12] and the
lower bound for the predecessor problem [20].

In the special case of three-sided range re- 
porting queries, the query range is bounded on 
three sides. Thus, a three-sided query range is 
a product of a closed interval and a half-open 
interval. Three-sided range reporting queries can 
be answered in O(k + 1) time using a linear 
space data structure [1]. The restriction on point 
coordinates can be relaxed for one dimension 
in the case of three-sided queries: there is a 
linear-space data structure for points on an n × N grid
that answers queries of the form [a, b] × (−∞, d]
or [a, b] × [c, +∞) in O(k + 1) time. Symmetrically,
there is also a linear-space data structure
for points on an N × n grid that answers queries
[a, +∞) × [c, d] or (−∞, b] × [c, d] in O(k + 1)
time. We list the best currently known results
for two-dimensional orthogonal range reporting 
in rows 1-4 of Table 1. 


Three Dimensions 

Three-dimensional queries can be answered in 
optimal O(log log n + k) time and O(n log^{1+ε} n)
space [6]. There is also an O(n log n (log log n)?)- 
space data structure that answers queries in 
O((loglogn)? + kloglogn) time; this result 
is achieved by combining the approach of 
[11] with the point location data structure of 
[4]. Alternatively, there is a data structure that 
uses O(n logn) space and answers queries in 
O(log^{1+ε} n + k log^ε n) time [7]. Better space
bounds can be achieved for some special cases. A 
three-dimensional query that is a product of three 
closed intervals, Q = [a,b] x [c,d] x [e, f], 
is called a (2,2,2)-sided query; this is the most
general case of three-dimensional queries. We
say that a three-dimensional orthogonal range 
reporting query Q is a (2,2,1)-sided query if 
it is a product of two closed intervals and one 
half-open interval: Q = [a, b] × [c, d] × [e, +∞).
There exists a data structure that uses O(n log^ε n)
space and answers (2,2,1)-sided queries in
O(log log n + k) time [6]. Since (2,2,1)-
sided queries are not easier to answer than 
two-dimensional queries, this query time is 
optimal. There is also a data structure that uses 
O(n(loglogn)*) space and answers (2,2, 1)- 
sided queries in O((loglogn)? + k loglogn) 
time; this result is obtained by combining 
the approach of Karpinski and Nekrich [11] 
with the point location data structure of 
Chan [4]. Finally, we can also answer (2, 2, 1)- 
sided queries in O(log^{1+ε} n + k log^ε n)
time using a linear-space data structure [7].
A (1,1,1)-sided query (also known as dominance
query) Q is a product of three half-open
intervals: Q = (−∞, a] × (−∞, b] × (−∞, c].
Chan [4] describes a linear space data structure
that answers (1,1,1)-sided queries in optimal
O(log log n + k) time. See Table 2.


Orthogonal Range Searching on Discrete Grids, Table 1 Two-dimensional orthogonal range reporting. Four-sided
queries denote general two-dimensional queries

Ref.   Space             Query time                Grid    Remarks
[1]    O(n)              O(k + 1)                  n × N   3-sided
[1]    O(n log^ε n)      O(log log n + k)          n × n   4-sided
[6]    O(n log log n)    O((k + 1) log log n)      n × n   4-sided
[6]    O(n)              O((k + 1) log^ε n)        n × n   4-sided
[16]   O(n log^ε n)      O(log log n + k')         n × n   Sorted 4-sided
[22]   O(n log log n)    O((k' + 1) log log n)     n × n   Sorted 4-sided
[16]   O(n)              O((k' + 1) log^ε n)       n × n   Sorted 4-sided


Query type

(2, 2, 2)-sided 
(2, 2, 2)-sided 
(2, 2, 2)-sided 


Ref. Space Query time 

[6] O(nlog't* n) O(log logn + k) 

(11) +[4] O(n log n(log logn)) O((loglogn)? + k loglogn) 
[7] O(n logn) O(log! t* n + k log® n) 

[6] O(n log® n) O(loglogn + k) 

(11+ [4] O(n(log logn)) O((loglogn)? + k loglogn) 
[7] O(n) O(log't* n + k log’ n) 

[4] O(n) O(log logn + k) 


(2,2, 1)-sided 
(2,2, 1)-sided 
(2, 2, 1)-sided 
(1, 1, 1)-sided


Multi-dimensional Queries

Using range trees [3], we can extend the
three-dimensional data structures so that d-dimensional
queries for any integer constant
d can be answered. The query time and
space usage grow by O(log^{d−3} n). We can
also increase the arity of range trees, so that
each internal node has O(log^{ε/(2(d−3))} n)
children. In this case, the query time grows by
O((log n / log log n)^{d−3}) and the space usage
grows by O(log^{d−3+ε} n). Thus, there is a data
structure that uses O(n log^{2+ε} n) space and
answers four-dimensional range reporting queries
in O(log n + k) time [6]. This result almost
matches the lower bound of Patrascu [19] stating
that any n log^{O(1)} n-space data structure answers
four-dimensional orthogonal range reporting
queries in Ω(log n / log log n) time. The best
currently known d-dimensional data structure
needs O((log n / log log n)^{d−3} log log n + k) time
and uses O(n log^{d−2+ε} n) space [6].


Emptiness Queries and One-Reporting Queries

An orthogonal range emptiness query Q asks
whether Q contains any points of the input set
S. A one-reporting query Q asks for an arbitrary
point p in Q ∩ S if Q ∩ S ≠ ∅; if Q ∩ S = ∅, a
special NULL value is returned. We can employ
all data structures described above for answering
emptiness and one-reporting queries. Any previously
described data structure that answers range
reporting queries in time O(q(n) + k · q'(n)) can
be used to answer emptiness and one-reporting
queries (for the same dimension and the same
query type) in O(q(n)) time and O(q(n) + q'(n))
time, respectively.
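
The reduction above can be sketched as follows. This is a minimal illustration, not one of the cited data structures: the reporting routine is a naive linear scan written as a lazy generator, and the names `report`, `one_report`, and `is_empty` are assumptions introduced here. The point is only that a one-reporting query stops after the first reported point, so it pays for the search plus a single reported point.

```python
def report(points, q):
    """Lazily yield the points of S inside the rectangle q = ((a, b), (c, d)).

    A naive scan stands in for a real range reporting structure.
    """
    (a, b), (c, d) = q
    for (x, y) in points:
        if a <= x <= b and c <= y <= d:
            yield (x, y)


def one_report(points, q):
    """Return an arbitrary point of Q ∩ S, or None (the NULL value) if it is empty."""
    return next(report(points, q), None)


def is_empty(points, q):
    """Emptiness query: True when Q contains no point of S."""
    return one_report(points, q) is None
```

For example, with S = [(1, 1), (3, 4), (5, 2)], the query [2, 4] × [3, 5] reports (3, 4), while [6, 7] × [0, 9] is empty.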


Two-Dimensional Range Successor and Sorted 
Range Reporting Queries 

An orthogonal range successor (also known as
range next value) query Q = [a, b] × [c, d]
asks for the leftmost point in S ∩ Q. In [16] the
authors considered the following generalization
of range successor queries: for a query range
Q = [a, b] × [c, d], report all points in Q ∩ S in
left-to-right order. Sorted range reporting queries
can also be answered in online mode: a query
can be terminated when the k' leftmost points
in Q ∩ S are reported for any k' ≤ |Q ∩ S|,
and k' can be specified at query time. Nekrich
and Navarro [16] describe data structures that use
O(n) and O(n log^ε n) space and answer sorted
range reporting queries in O((k' + 1) log^ε n) and
O(log log n + k') time, respectively. The data
structure of [22] uses O(n log log n) space and
supports queries in O((k' + 1) log log n) time.
See Table 1. Sorted range reporting queries for
k' = 1 are equivalent to range successor queries.
Thus, data structures for sorted range reporting
match the complexity of the best currently known
structures for standard two-dimensional point reporting
(respectively, the data structures for range
successor queries match data structures for one-reporting
in two dimensions).


Orthogonal Range Counting 

The problem is to keep a set of points S
in a data structure so that for any query
rectangle Q, the number of points in Q ∩ S
can be computed. The data structure of [10]
uses O(n (log n / log log n)^{d-2}) space and
answers d-dimensional dominance counting
queries (i.e., counts points in a range that
is a product of d half-open intervals) in
O((log n / log log n)^{d-1}) time. We can count
points in any d-dimensional rectangle by answering
O(2^d) d-dimensional dominance queries.
Hence, d-dimensional range counting queries
can also be answered in O((log n / log log n)^{d-1})
time and O(n (log n / log log n)^{d-2}) space for
any constant d. Dynamic data structures that
use O(n) space, answer two-dimensional range
counting queries in O((log n / log log n)^2) time,
and support updates in polylogarithmic time
are described in [14] and [9]. The query time of
the static data structure is optimal for d = 2
dimensions: by the lower bound of [18], any two-dimensional
data structure that uses n log^{O(1)} n
space needs Ω(log n / log log n) time to answer
queries.
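
The reduction from rectangle counting to 2^d dominance counts is inclusion-exclusion. A minimal sketch for d = 2 follows, assuming integer grid coordinates so that the open side of a range can be expressed as a - 1; `dominance_count` is a naive linear scan standing in for the structure of [10], and both function names are introduced here for illustration only.

```python
def dominance_count(points, x, y):
    """Count points (px, py) with px <= x and py <= y (naive stand-in)."""
    return sum(1 for (px, py) in points if px <= x and py <= y)


def range_count(points, a, b, c, d):
    """Count points in [a, b] x [c, d] using 2^2 = 4 dominance counts.

    Assumes integer coordinates, so 'strictly less than a' is '<= a - 1'.
    """
    return (dominance_count(points, b, d)
            - dominance_count(points, a - 1, d)
            - dominance_count(points, b, c - 1)
            + dominance_count(points, a - 1, c - 1))
```

In d dimensions the same pattern uses one signed dominance count per corner of the query rectangle, which gives the O(2^d) factor mentioned above.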
We can, however, reduce the query cost if k
is small, where k = |Q ∩ S| is the number of
points in a query range. Chan and Wilkinson [5]
describe a data structure that uses O(n log log n)
words of space and answers two-dimensional
range counting queries in optimal O(log log n +
log_w k) time, where w is the size of the machine
word. Nekrich [15] describes a data structure
that uses O(n) words and counts the number
of points in a two-dimensional three-sided range
in optimal O(log_w k) time. There are also data
structures that provide an approximate answer
for two- and three-dimensional range counting
queries [5, 13, 15]. These data structures return
a value k̂ such that (1 − δ)k ≤ k̂ ≤ (1 + δ)k
for an arbitrary query Q and some fixed constant
δ > 0. We can answer two-dimensional approximate
counting queries in O(log log n) time
using an O(n log log n)-space data structure [5];
we can answer three-sided approximate counting
queries in O(1) time using an O(n)-space data
structure [15]. An approximate answer to a three-dimensional
dominance counting query can be
obtained in O((log log n)^3) time using an O(n)-space
data structure; we can estimate the number
of points in any three-dimensional range within 
the same time using an O(n log? n)-space data 
structure [13]. If we plug the point location data
structure of [4] into the data structure of [13], then
the query time of approximate three-dimensional
queries is reduced to O((log log n)^2).


Open Problems 


In spite of extensive research and significant
achievements, many important questions are still
open. The best currently known data structure
that supports two-dimensional reporting queries
in optimal time needs O(n log^ε n) space [1].
The existence of a data structure that uses o(n log^ε n)
space for any ε > 0 and achieves optimal query
time is an interesting open question. Another
important problem is improving the space complexity
of d-dimensional range reporting for d >
2 dimensions and the query time of d-dimensional
range reporting for d > 4 dimensions.
Problem Definition


This problem is concerned with efficiently de- 
signing a serverless infrastructure for a federation 
of hosts to store, index and locate information, 
and for efficient data dissemination among the 
hosts. The key services of peer-to-peer (P2P) 
overlay networks are: 


1. A keyed lookup protocol locates information
at the server(s) that hold it.

2. Data store, update, and retrieve operations
maintain a distributed persistent data
repository.

3. Broadcast and multicast support information
dissemination to multiple recipients.


Because of their symmetric, serverless nature, 
these networks are termed P2P networks. Below, 
we often refer to hosts participating in the net- 
work as peers. 

The most influential mechanism in this area
is consistent hashing, pioneered in a paper by
Karger et al. [21]. The idea is roughly the following.
Frequently, a good way of arranging
a lookup directory is a hash table, giving fast
O(1)-complexity data access. In order to scale
and provide highly available lookup services,
we partition the hash table and assign different
chunks to different servers. So, for example,
if the hash table has entries 1 through n, and
there are k participating servers, we can have
each server select a virtual identifier from 1 to
n at random. Server i will then be responsible
for key values that are closer to i than to
any other server identifier. With a good randomization
of the hash keys, we can have a more
or less balanced distribution of information between
our k servers. In expectation, each server
will be responsible for n/k keys. Furthermore,
the departure or arrival of a server perturbs only
one or two other servers with adjacent virtual
identifiers.
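
The assignment rule just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the construction of [21]: the identifier space is taken to be a ring of size 2^32, server identifiers are obtained by hashing server names rather than drawn at random, and the names `RING`, `h`, and `owner` are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING = 2 ** 32  # size of the identifier space (an illustrative choice)


def h(name: str) -> int:
    """Map a name to a point of the identifier space."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")


def owner(key: str, server_ids: list[int]) -> int:
    """Return the server identifier closest to the key's hash on the ring."""
    ids = sorted(server_ids)
    k = h(key)
    i = bisect_left(ids, k)
    succ = ids[i % len(ids)]   # first identifier clockwise from k (wraps around)
    pred = ids[i - 1]          # first identifier counterclockwise from k
    d_succ = (succ - k) % RING
    d_pred = (k - pred) % RING
    return succ if d_succ <= d_pred else pred
```

When a server leaves, only the keys it owned are reassigned (to one of its two ring neighbors); every other key keeps its owner, which is the locality of change noted above.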
A network of servers that implement
consistent hashing is called a distributed hash
table (DHT). Many current-generation resource-sharing
networks, and virtually all academic
research projects in the area, are built around
the DHT idea.

The challenge in maintaining DHTs is twofold:


Overlay routing Given a hash key i, and starting
from any node r in the network, the problem is
to find the server s whose key range contains
i. The key name i bears no relation to any
real network address, such as the IP address
of a node, and therefore we cannot use the
underlying IP infrastructure to locate s. An
overlay routing network links the nodes, and
provides them with a routing protocol, such
that r can route toward s using the routing
target i.

Dynamic maintenance DHTs must work in 
a highly dynamic environment in which the 
size of the network is not known a priori, 
and where there are no permanent servers 
for maintaining either the hash function or 
the overlay network (all servers are assumed 
to be ephemeral). This is especially acute in 
P2P settings, where the servers are transient 
users who may come and go as they wish. 
Hence, there must be a decentralized protocol, 
executed by joining and leaving peers, that 
incrementally maintains the structure of the 
system. Additionally, a joining peer should be 
able to correctly execute this protocol while 
initially only having knowledge of a single, 
arbitrary participating network node. 


One of the first overlay network projects was 
Chord [35], after which this encyclopedia 
entry is named (2001; Stoica, Morris, Karger, 
Kaashoek, Balakrishnan). More details about 
Chord are given below. 


Key Results 


The P2P area is very dynamic and rapidly evolving.
The current entry provides a mere snapshot,
covering dominant and characteristic strategies,
but not offering an exhaustive survey.


Unstructured Overlays 
Many of the currently deployed widespread 
resource-sharing networks have little or no 
particular overlay structure. More specifically, 
early systems such as Gnutella version 0.4 had 
no overlay structure at all, and allowed every 
node to connect to other nodes arbitrarily. This 
resulted in severe load and congestion problems. 
Two-tier networks were introduced to
reduce communication overhead and solve
the scalability issues that early networks like
Gnutella version 0.4 had. Two-tier networks
consist of one tier of relatively stable and
powerful nodes, called servers (superpeers,
ultrapeers), and a larger tier of clients that
search the network through servers. Most current
networks, including eDonkey/eMule, KaZaa,
and Gnutella, are built using two tiers. Servers
provide directory store and search facilities.
Searching is either limited to servers to which
clients directly connect (eDonkey/eMule) or done
by limited-depth flooding among the servers
(Gnutella). The two-tier design considerably
enhances the scalability and reliability of P2P
networks. Nevertheless, the connections among
servers and between clients and servers are made
in a completely ad hoc manner. Thus, these
networks provide no guarantee for the success
of searches, nor a bound on their costs.


Structured Overlays Without Locality 
Awareness 


Chord 

The Chord system was built at MIT and is
currently being developed under NSF’s IRIS
project (http://project-iris.net/). Several aspects
of the Chord [35] design have influenced
subsequent systems. We briefly explain the core
structure of Chord here. Nodes have binary
identifiers, assigned uniformly at random. Nodes
are arranged in a linked ring according to their
virtual identifiers. In addition, each node has
shortcut links to other nodes along the ring, with
link i pointing to a node 2^i away in the virtual identifier space.
In this way, one can move gradually to the target
by decreasing the distance by half at every step.
Routing takes on average log n hops to reach
any target, in a network containing n nodes.
Each node maintains approximately log n links,
providing the ability to route to geometrically
increasing distances.
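
The halving argument can be illustrated with a small sketch. It makes a simplifying assumption that is not part of Chord itself: the ring is fully populated, i.e., every one of the 2^m identifiers is occupied by a node, so link i of node x leads exactly to (x + 2^i) mod 2^m. The function name `chord_route` is introduced here for illustration.

```python
def chord_route(src: int, dst: int, m: int) -> list[int]:
    """Greedy shortcut routing on a fully populated ring of 2**m identifiers.

    At each hop the current node follows its longest link that does not
    overshoot the target, which at least halves the remaining distance.
    """
    n = 1 << m
    path = [src]
    cur = src
    while cur != dst:
        dist = (dst - cur) % n               # clockwise distance still to cover
        step = 1 << (dist.bit_length() - 1)  # largest power of two <= dist
        cur = (cur + step) % n
        path.append(cur)
    return path
```

With m = 6 (n = 64 nodes), routing from 0 to 37 visits 0, 32, 36, 37: one hop per set bit of the distance, hence at most m = log n hops.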


Constant Per-Node State 

Several overlay network algorithms were developed
with the goal of pushing the amount of
network state kept by each node in the overlay to
a minimum. We refer to the state kept by a node
as its degree, as it mostly reflects the number
of connections to other nodes. Viceroy [23] was
the first to demonstrate a dynamic network in
which each node stores only five links to other
network nodes, and routes to any other node in
a logarithmic number of hops, log n for a network
of n nodes. Viceroy provided a dynamic
emulation of a butterfly network (see [11] for
a textbook exposition of interconnection networks
such as the butterfly). Later, several emulations of De
Bruijn networks emerged, including the generic
one of Abraham et al. (AAABMP) [1], the
distance-halving network [26], D2B [13], and
Koorde [20]. Constant-degree overlay networks
are too fragile for practical purposes, and may
easily degrade in performance or even partition in
the face of failures. A study of overlay networks
under churn demonstrated these points [18]. Indeed,
to the best of our knowledge, none of these
constant-degree networks were built. Their main
contribution, and the main reason for mentioning
these works here, is to show that it is possible
in principle to bring the per-node state down to a
small constant.


Content Addressable Network 

The Content Addressable Network (CAN) [31]
developed at ICSI builds the network as a virtual
d-dimensional space, giving every node a
d-dimensional identifier. The routing topology
resembles a d-dimensional torus. Routing is done
by following the Euclidean coordinates in every
dimension, yielding a d·n^{1/d}-hop routing strategy.
The parameter d can be tuned by the network
administrator. Note that for d = log n, CAN’s
features are the same as in Chord, namely, logarithmic
degree and logarithmic routing hop count.


Overlay Routing Inspired by “Small-World” 
Networks 

The Symphony [24] algorithm emulates routing
in a small world. Nodes have k links to
nodes whose virtual identifiers are chosen at random
according to a routable small-world distribution [22].
With k links, Symphony is expected to
find a target in (log^2 n)/k hops.


Overlay Networks Supporting Range Queries 

One of the deficiencies of DHTs is that they
support only exact key lookup; hence, they do
not address well the need to locate a range of
keys, or to perform a fuzzy search, e.g., search
for any key that matches some prefix. SkipGraphs [4]
and the SkipNet [19] scheme from
Microsoft (project Herald) independently developed
a similar DHT based on a randomized
skip list [28] that supports range queries over
a distributed network. The approach in both of
these networks is to link objects into a doubly
linked list, sorted by object names, over which
“shortcut” pointers are built. Pointers from each
object skip to a geometric sequence of distances
in the sorted list, i.e., the first pointer jumps two
items away, the second four items, and so on,
up to pointer log n − 1, which jumps over half
of the list. Logarithmic, load-balanced lookup is
achieved in this scheme in the same manner as in
Chord. Because the identifier space is sorted by
object names, rather than hash identifiers, ranges
of objects can be scanned efficiently simply by
routing to the lowest value in the range; the
remaining range nodes reside contiguously along
the ring.

By prefixing organization names to object
names, SkipNet achieves contiguity of nodes
belonging to a single organization along the ring,
and the ability to map objects on nodes in their
local organizations. In this way SkipNet achieves
resource proximity and isolation; it is the only system
besides RP [33] to have this feature.

Whereas the SkipGraphs work focuses on randomized
load-balancing strategies and proofs,
the SkipNet system considers issues of dynamic
maintenance and variable base sizes, and adopts the
locality-awareness strategy of Pastry [33], which
is described below.


P2P, Table 1 Comparison of various measures of lookup schemes with no locality awareness

Overlay lookup scheme                   Topology resemblance   Hops          Degree
Chord                                   Hypercube              log n         log n
Viceroy                                 Butterfly              log n         5
AAABMP, Distance-halving, Koorde, D2B   De Bruijn              log n         4
Symphony                                Small world            (log^2 n)/k   k
SkipGraphs/SkipNet                      Skip list              log n         log n
CAN                                     Torus                  d·n^{1/d}     d


Summary of Non-Locality-Aware Networks 

Each of the networks mentioned above is distinct
in one or more of the following properties:
the (intuitive) emulated topology, the expected
number of hops required to reach a target, and
the per-node degree. Table 1 summarizes these
properties.


Locality Awareness 

The problem with the approaches listed above is
that they ignore the proximity of nodes in the
underlying networks, and allow hopping back and
forth across large physical distances in search
of content. Recent studies of scalable content
exchange networks [17] have indicated that up to
80% of Internet searches could be satisfied by
local hosts within one’s own organization. Therefore,
even one far hop might be too costly. The
next systems we encounter consider proximity
relations among nodes in order to obtain locality
awareness, i.e., that lookup costs are proportional
to the actual distance of interacting parties.


Growth-Bounded Networks 

Several locality-aware lookup networks were
built around a bit-fixing protocol that borrows
from the seminal work of Plaxton et al. [27]
(PRR). The growth-bounded network model at
which this scheme is aimed views the network
as a metric space, and assumes that the densities
of nodes in different parts of the network are
not terribly different. The PRR [27] lookup
scheme uses prefix routing, similar to Chord.
It differs from Chord in that a link for flipping the
ith identifier bit connects with any node whose
length-i prefix matches the next hop. In this way,
the scheme favors the closest one in the network.
This strategy builds geometric routing, whose
characteristic is that the routing steps toward
a target increase geometrically in distance. This is
achieved by having large flexibility in the choice
of links for each prefix at the beginning of a route,
and narrowing it down as the route progresses.
The result is an overlay routing scheme that finds
any target with a cost that is proportional to the
shortest-distance route.

The systems that adopt the PRR algorithm are
Pastry [33], Tapestry [36], and Bamboo [32].
A very close variant is Kademlia [25], in which
links are symmetric. It is worth mentioning that
the LAND scheme [2] improves on PRR by providing
a nearly optimal locality guarantee;
however, LAND has not been deployed.


Applications 


Caching 

The Coral network [14] from NYU, built on top
of DSHT [15], has been operational since around
2004. It provides free content delivery services
on top of the PlanetLab distributed test bed [9],
similar to the commercial services offered by
the Akamai network. People use it to provide
multiple, fast access points to content they wish
to publish on the Web.

Coral optimizes access locality and download
rate using locality-aware lookup provided by
DSHT. Within Coral, DSHT is utilized to support
locality-aware object location in two applications.
First, Coral contains a collection of HTTP
proxies that serve as content providers; DSHT
is used by clients for locating a close-by proxy.
Second, proxy servers themselves use DSHT to
locate a nearby copy of content requested by the
client, thus making use of copies of the content
that are stored in the network, rather than going
to the source of the content.


Multicast 

Several works deploy an event notification or
publish-subscribe service over an existing routing
overlay by building reverse-routing multicast
paths from a single “target” to all “sources.”
For example, multicast systems built in this way
include the Bayeux network [38], which is built
over Tapestry [36], and SCRIBE [5], which is
built over Pastry. In order to publish a file, the
source uses flooding to advertise a tuple that
contains the semantic name of a multicast session
and a unique ID. This tuple is hashed to obtain
a node identifier, which becomes the session root
node. Each node can join this multicast session by
sending a message to the root. Nodes along the
way maintain membership information, so that
a multicast tree is formed in the reverse direction.
The file content (and any updates) is flooded
down the tree. Narada [8] is built with the same
general architecture, but differs in its choice of
links and the maintenance of data.


Routing Infrastructure 

A DHT can serve well to store routing and (potentially
dynamic) location information of virtual
host names. This idea has been utilized in a number
of projects. A naming system for the Internet
called CoDoNS [30] was built at Cornell University
over the BeeHive overlay [29]. CoDoNS
provides a safety net and is a possible replacement
for the Domain Name System, the current
service for looking up host names. Support for
virtual IPv6 network addresses is provided in [37]
by mapping names to their up-to-date, reachable
IPv4 address. The Internet Indirection Infrastructure [34]
built at the University of California,
Berkeley provides support for virtual Internet
host addresses that allows mobility.
Collaborative Content Delivery

Recent advances provide collaborative content 
delivery solutions that address both load balance 
and resilience via striping. The content is split 
into pieces (quite possibly with some redundancy 
through error-correcting codes). The source 
pushes the pieces of the file to an initial group 
of nodes, each of which becomes a source of 
a distribution tree for its piece, and pushes it 
to all other nodes. These works demonstrate 
clearly the advantages of data striping, i.e., 
of simultaneously exchanging stripes of data, 
over a tree-based dissemination of the full 
content. 

SplitStream [6] employs the Pastry routing 
overlay in order to construct multiple trees, such 
that each participating node is an inner node in 
only one tree. It then supports parallel download 
of stripes within all trees. SplitStream [6] strives 
to obtain load balancing between multicast nodes. 
It achieves that by splitting the published content 
into several parts, called stripes, and publishing 
each part separately. Each stripe is published 
using a tree-based multicast. The workload is 
divided between the participating nodes by sending
each stripe using a different multicast tree.
Load balance is achieved by carefully choosing 
the multicast trees so that each node serves as an 
interior node in at most one tree. This reduces 
the number of “free riders” who only receive 
data. 

A very popular file-distribution network is the 
BitTorrent system [10]. Nodes in BitTorrent are 
divided into seed nodes and clients. Seed nodes 
contain the desired content in full (either by 
being original providers, or by having completed 
a recent download of the content). Client nodes 
connect with a seed node or several seed nodes, 
as well as a tracker node, whose goal is to 
keep track of currently downloading clients. Each 
client selects a group (currently, of size about 
20) of other downloading clients, and exchanges 
chunks of data obtained from the seed(s) with 
them. BitTorrent employs several intricate strategies
for selecting which chunks to request from
what other clients, in order to obtain fair load 
sharing of the content distribution and, at the 
same time, achieve fast download. 




BitTorrent currently does not contain P2P
searching facilities. It relies on central sites
known as “trackers” to locate content, and to
coordinate the BitTorrent download process.
Recent announcements by Bram Cohen (the
creator of BitTorrent) and creators of other
BitTorrent clients state that new protocols based
on BitTorrent will be available soon, in which the
role of trackers is eliminated, and searching and
coordination are done in a completely P2P manner.

Experience with BitTorrent and similar systems
indicates that the main problem with this
approach is that towards the end of a download,
many peers may be missing the same rare chunks,
and the download slows down. Fairly sophisticated
approaches were published in an attempt to
overcome this issue.

Recently, a number of works at Microsoft
Research have demonstrated the benefits of network
coding in efficient multicast, e.g., [7] and
Avalanche [16]. We do not cover these techniques
in detail here, but only briefly state the principal
ideas that underlie them.

The basic approach in network coding is to re-encode
all the chunks belonging to the file, so
that each one that is shared is actually a linear
combination of all the pieces. The blocks are
then distributed with a description of the content.
Once a node obtains these re-encoded chunks, it
can generate new combinations from the ones it
has, and can send those out to other peers. The
main benefit is that peers can make use of any
new piece, instead of having to wait for specific
chunks that are missing. This means no one peer
can become a bottleneck, since no piece is more
important than any other. Once a peer collects
sufficiently many such chunks, it may use them
to reconstruct the whole file.
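
The idea can be made concrete with a toy sketch. The assumptions here are ours and not taken from [7] or Avalanche [16]: chunks are small integers treated as bit strings, linear combinations are taken over GF(2) (plain XOR; deployed systems typically use a larger finite field), and the "description of the content" sent with each block is the bitmask of chunks that were combined. The names `encode`, `recombine`, and `decode` are illustrative.

```python
import random


def encode(chunks, rng):
    """Return (mask, payload): payload is the XOR of the chunks selected by mask."""
    mask = rng.randrange(1, 1 << len(chunks))  # random nonzero coefficient vector
    payload = 0
    for i, c in enumerate(chunks):
        if mask >> i & 1:
            payload ^= c
    return mask, payload


def recombine(packets, rng):
    """A peer mixes packets it already holds into a new one, without decoding."""
    mask = payload = 0
    for m, p in packets:
        if rng.random() < 0.5:
            mask ^= m
            payload ^= p
    return mask, payload


def decode(packets, k):
    """Gaussian elimination over GF(2); returns the k chunks, or None if rank < k."""
    pivots = {}  # pivot bit -> (mask, payload); no other row has that bit set
    for mask, payload in packets:
        for bit, (pm, pp) in pivots.items():   # reduce against known rows
            if mask >> bit & 1:
                mask ^= pm
                payload ^= pp
        if mask == 0:
            continue                           # dependent packet: nothing new
        bit = mask.bit_length() - 1            # new pivot position
        for b, (pm, pp) in list(pivots.items()):
            if pm >> bit & 1:                  # keep the system fully reduced
                pivots[b] = (pm ^ mask, pp ^ payload)
        pivots[bit] = (mask, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]
```

Any k linearly independent packets suffice for decoding, regardless of which peers produced them, which is why no individual chunk becomes a bottleneck.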

It is worth noting that in unstructured settings, 
it was recently shown that network coding offers 
no advantage [12]. 
Problem Definition


Valiant’s work defines a model for representing
the general problem of learning a Boolean concept
from examples. The motivation comes from
classical fields of artificial intelligence [2], pattern
classification [8], and machine learning [13].
Classically, these fields have employed numerous
heuristics for representing knowledge and defining
criteria by which computer algorithms can
learn. The pioneering work of [16, 17] provided
the leap from heuristic-based approaches to a rigorous
statistical theory of pattern recognition (see
also [1, 7, 14]). Their main contribution was the
introduction of probabilistic upper bounds on the
generalization error which hold uniformly over a
whole class of concepts. Valiant’s main contribution
is in formalizing this probabilistic theory
into a general model for computational inference.
This model, which is known as the Probably Approximately
Correct (PAC) model of learnability,
is concerned with the computational complexity of
learning. In his formulation, learning is depicted
as an interaction between a teacher and a learner
with two main procedures: one provides
randomly drawn examples x of the concept c
that is being learned, and the second acts as an
oracle that provides the correct classification
label c(x). Based on a finite number of such
examples drawn identically and independently
according to any fixed probability distribution,
the aim of the learner is to infer an approximation
of c which is correct with high confidence. Using
the terminology of [12], suppose X denotes the
space of instances, i.e., objects which a learner
can obtain as training examples. A concept over
X is a Boolean mapping from X to {0, 1}. Let P
be any fixed probability distribution over X and
c a fixed target concept to be learned. For any
hypothesis concept h over X, define L(h) =
P(c(x) ≠ h(x)) as the error of h, i.e., the probability
that h disagrees with c on a test instance x
which is drawn according to P. Then, according
to Valiant, an algorithm A for learning c is one
which runs in time t and with a sample of size m,
where both t and m are polynomials with respect
to some parameters (to be specified below), and
produces a hypothesis concept h such that with
high confidence L(h) is small.


Key Results 


The main result of Valiant’s work is a formal definition
of what constitutes a learnable problem.
Formally, this is stated as follows: let H be a
class of concepts over X. Then H is learnable
if there exists an algorithm A with the following
property: for every possible target concept c ∈
H, for every probability distribution P on X
(this is sometimes referred to as the “distribution-independence”
assumption), for all values of a
confidence parameter 0 < δ < 1/2 and an
approximation accuracy parameter 0 < ε < 1/2,
if A receives as input the values of δ and ε and a
sample S = {(x_i, c(x_i))}_{i=1}^{m} of cardinality m
(which may depend on ε and δ) which consists of
examples x_i that are randomly drawn according
to P and labeled by an oracle as c(x_i), then with
probability at least 1 − δ, A outputs a hypothesis concept
h ∈ H such that the error L(h) ≤ ε. That ε can
be arbitrarily close to zero follows from what is
known as the “noise-free” assumption, i.e., that
the labels comprise the true value of the target
concept. If A runs in time t and if t and m are
polynomial in 1/ε and 1/δ (and possibly other
relevant parameters, such as n if the space of
instances X is {0, 1}^n or R^n), then H is efficiently
PAC learnable.
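
As a concrete instance of the definition, the sketch below learns Boolean conjunctions over {0, 1}^n with the standard elimination algorithm from the textbook literature; it is offered as an illustration and is not claimed to be the exact procedure of Valiant’s paper. The sample-size function uses the well-known bound for a learner that outputs a hypothesis consistent with the sample from a finite class H, m ≥ (1/ε)(ln |H| + ln(1/δ)). The function names and the representation of a literal as a pair (i, v), meaning "x[i] == v", are assumptions made here.

```python
import math


def sample_size(num_hypotheses: int, eps: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite class:
    m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / eps)


def learn_conjunction(sample, n):
    """Elimination algorithm: start from all 2n literals and drop every
    literal that is contradicted by some positive example."""
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for x, label in sample:
        if label == 1:
            literals -= {(i, 1 - x[i]) for i in range(n)}
    return literals


def predict(literals, x):
    """Evaluate the conjunction of the given literals on instance x."""
    return int(all(x[i] == v for (i, v) in literals))
```

The hypothesis never labels a negative instance as positive when the target is itself a conjunction, and both the running time and the sample size (there are about 3^n conjunctions, so ln |H| is linear in n) are polynomial in n, 1/ε, and 1/δ.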

Valiant has shown that the following classes
are all efficiently PAC learnable: the class of conjunctive
normal form expressions with a bounded
number of literals in each clause, the class of
monotone disjunctive normal form expressions
(here the learner requires in addition to S also
an oracle that can answer membership queries,
i.e., provide the true label c(x) for an x in
question), and the class of arbitrary expressions in
which each variable occurs just once (using more
powerful oracles). The work following Valiant’s
paper (see [11] for references) has shown that the
classes of k-DNF, k-CNF, and k-decision lists are
efficiently PAC learnable for each fixed k. Under
suitable complexity-theoretic hardness assumptions,
the class of concepts in the form of a disjunction
of two conjunctions is not PAC learnable
and neither is the class of existential conjunctive
concepts on structural instance spaces with two
objects. Linear threshold concepts (perceptrons)
are PAC learnable on both Boolean and real-valued
instance spaces, but the class of concepts
in the form of a conjunction of two linear threshold
concepts is not PAC learnable. The same
holds for disjunctions and linear thresholds of linear
thresholds (i.e., multilayer perceptrons with
two hidden units). If the weights are restricted to
1 and 0 (but the threshold is arbitrary), then linear
threshold concepts on Boolean instance spaces
are not PAC learnable.

It should be noted that the notion of PAC
learnability discussed throughout this entry is
sometimes referred to as “proper” PAC learnability
because of the requirement that, when
learning a concept class H, the learning algorithm
must output a hypothesis that also belongs to H.
Several of the negative results mentioned above
can be circumvented in a model of “improper”
PAC learning, where the learning algorithm is
allowed to output hypotheses from a broader class
of functions than H. This is sometimes referred to
as agnostic PAC learnability (see [15], Ch. 3, [12]
and the proceedings of the COLT conferences for
many results of this type).


Applications 


Valiant’s paper is a milestone in the history 
of the area known as Computational Learning 
Theory (see proceedings of COLT conferences). 
The PAC model has been criticized in that the 
distribution-independence assumption and the 
notion of target concepts with noise-free training 
data are unrealistic in practice, e.g., in machine 
learning and AI. There has thus been much 
work on learning models that relax several of 
the assumptions in Valiant’s PAC model. These 
include models which allow noisy labels or
remove the assumptions on the independence
of training examples, relax the assumption on 
the probability distribution to be fixed, allow 
the bounds to be distribution dependent, permit 
the training sample to be picked by the learner 
and labeled by the oracle instead of the random 
sample or chosen by a helpful teacher, allow 
learning regression, and use generalized loss 
functions. For references, see Sec.2.6 of [1] 
and Ch. 3 of [15]. An important follow-up of 
Valiant’s model was the work of [6] who unified 
his model with the uniform convergence results 
of [16]. They showed the important dependence 
between the notion of learnability and certain 
combinatorial properties of concept classes, one 
of which is known as the Vapnik-Chervonenkis 
(VC) dimension (see Sec. 3.4 of [1] for history 
on the VC-dimension).


Problem Definition


A collection of packets need to be routed from 
a set of specified sources to a set of specified 
destinations in an arbitrary network. Leighton, 
Maggs and Rao [5] looked at a model where 
this task is divided into two separate tasks: the 
first is the path selection task, where for each
specified packet i with source $s_i$ and
destination $t_i$, a simple (meaning edges don’t
repeat) path $P_i$ through the network from $s_i$ to $t_i$
is pre-selected. Packets traverse the network in
a store and forward manner: each time a packet 
is forwarded it travels along the next link in the 
pre-selected path. It is assumed that only one 
packet can cross each individual link at each 
given global (synchronous) timestep. Thus, when 
there is contention for a link, packets awaiting 
traversal are stored in the local link’s queue 
(special source and sink queues of unbounded 
size are also defined that store packets at their 
origins and destinations). Thus, the second 
task, and the focus of the Leighton, Maggs and 
Rao result (henceforth called the LMR result) 
is the scheduling task: a determination, when 
a link’s queue is not empty, of which packet 
gets to traverse the link in the next timestep 
(where it is assumed to immediately join the 
link queue for its next hop). The goal is to 
schedule the packets so that the maximum time 
that it takes any packet to reach its destination is 
minimized. 

There are two parameters of the network to- 
gether with the pre-selected paths that are clearly 
relevant. One is the congestion c, defined as the 
maximum number of paths that all use the same 
link. The other is the dilation d, which is simply 
the length of the longest path that any packet 
traverses in the network. Clearly each of c and 
d is a lower-bound on the length of any schedule 
that routes all the packets to their destinations. It 
is easy to see that a schedule of length at most cd 
always exists. In fact, any schedule that never lets 
a link go idle if there is a packet that can use that 
link at that timestep is guaranteed to terminate in 
cd steps, because each packet traverses at most d 
links, and at any link can be delayed by at most 
c − 1 other packets.
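
The two parameters and the cd bound for never-idle schedules can be checked on small instances. The following is a minimal sketch, assuming each path is given as a list of hashable link identifiers; the function names and the tie-breaking rule (smallest packet index first) are illustrative choices, not part of the LMR model.

```python
from collections import Counter, defaultdict


def congestion_and_dilation(paths):
    """paths: list of paths, each a list of links such as (u, v)."""
    usage = Counter(link for path in paths for link in path)
    c = max(usage.values(), default=0)               # most heavily used link
    d = max((len(path) for path in paths), default=0)  # longest path
    return c, d


def greedy_schedule_length(paths):
    """Simulate a greedy (never-idle) store-and-forward schedule.

    In every timestep, each link with waiting packets forwards exactly one
    of them (here the one with the smallest index, an arbitrary choice).
    Returns the number of timesteps until all packets have arrived.
    """
    pos = [0] * len(paths)          # pos[i] = links already crossed by packet i
    remaining = sum(1 for p in paths if p)
    steps = 0
    while remaining:
        waiting = defaultdict(list)
        for i, path in enumerate(paths):
            if pos[i] < len(path):
                waiting[path[pos[i]]].append(i)
        for queue in waiting.values():
            i = min(queue)          # one packet crosses each busy link
            pos[i] += 1
            if pos[i] == len(paths[i]):
                remaining -= 1
        steps += 1
    return steps


paths = [
    [("a", "b"), ("b", "c")],
    [("x", "b"), ("b", "c")],
    [("b", "c"), ("c", "d")],
]
c, d = congestion_and_dilation(paths)
steps = greedy_schedule_length(paths)
# Any schedule needs at least max(c, d) steps; a never-idle one needs at most c*d.
assert max(c, d) <= steps <= c * d
```

Here the link ("b", "c") carries three paths, so c = 3, and every path has two links, so d = 2.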
Key Results


The surprising and beautiful result of LMR is as 
follows: 


Theorem ([5]) For any network G with a pre- 
specified set of paths P with congestion c and 
dilation d, there exists a schedule of length O(c + 
d), where the queue sizes at each edge are always 
bounded by a constant. 


The original proof of the LMR paper is non- 
constructive. That is, it uses the Local Lemma [3] 
to prove the existence of such a schedule, but 
does not give a way to find it. In his book [10], 
Scheideler showed that in fact, a O(c + d) 
schedule exists with edge queue sizes bounded 
by 2 (and gave a simpler proof of the original 
LMR result). A subsequent paper of Leighton, 
Maggs and Richa in 1999 [6] provides a con- 
structive version of the original LMR paper as 
follows: 


Theorem ([6]) For any network G with a pre- 
specified set of paths P with congestion c and 
dilation d, there exists a schedule of length 
O(c + d). Furthermore, such a schedule can 
be found in $O(p \log^{1+\epsilon} p \, \log^*(c + d))$ time for
any $\epsilon > 0$, where p is the sum of the lengths of the
paths taken by the packets and $\epsilon$ is incorporated
into the constant hidden by the big-O in the 
schedule length. 


The algorithm in the paper is a randomized one, 
though the authors claim that it can be deran- 
domized using the method of conditional proba- 
bilities. However, even though the algorithm of 
Leighton, Maggs and Richa is constructive, it 
is still an offline algorithm: namely, it requires 
full knowledge of all packets in the network 
and the precise paths that each will traverse 
in order to construct the schedule. The original 
LMR paper also gave a simple randomized online 
algorithm, that, by assigning delays to packets 
independently and uniformly at random from an 
appropriate interval, results in a schedule which 
is much better than greedy schedules, though not 
as good as the offline constructions.


Theorem ([5]) There is a simple randomized online
algorithm for producing, with high probability,
a schedule of length O(c + d log(Nd))
using queues of size O(log(Nd)), where c is the
congestion, d is the dilation, and N is the number
of packets.


In the special case where it is assumed that all 
packets follow shortest paths in the network, 
Meyer auf der Heide and Vöcking produced
a simple randomized online algorithm that produces,
with high probability, a schedule of length
O(c + d + log(Nd)) steps, but queues can be as
large as O(c) [7]. For arbitrary paths, the LMR
online result was ultimately improved to
$O(c + d + \log^{1+\epsilon} N)$ steps, for any $\epsilon > 0$, with high
probability, in a series of two papers by Rabani
and Tardos [9], and Rabani and Ostrovsky [8].
Online protocols have also been studied in a set- 
ting where additional packets are dynamically 
injected into the network in adversarial settings, 
see [10] for a survey. 

We now return briefly to the first
task, namely pre-constructing the set of paths.
Clearly, the goal is to find, for a particular set 
of packets with pre-specified sources and des- 
tinations, a set of paths that minimizes c + d. 
Srinivasan and Teo [12] designed an off-line algo- 
rithm that produces a set of paths whose c + d 
is provably within a constant factor of optimal. 
Together with the offline LMR result, this gives
a constant-factor approximation algorithm for the
offline store-and-forward packet routing problem.
Note that the approach of trying to minimize 
c + d rather than c alone seems crucial; produc- 
ing schedules within a constant factor of optimal 
congestion c is hard, and in fact has been shown 
to be related to the integrality gap for multicom- 
modity flow [1, 2]. 


Applications 


Network Emulations 

Typically, a guest network G is emulated by 
a host network H by embedding G into H. Nodes 
of G are mapped to nodes of H, while edges of 
G are mapped to paths in H. If P is the set of
e paths (each corresponding to an edge in the
guest network G), the congestion and dilation
can be defined analogously as in the main result
for the set of paths P, namely c denotes the
maximum number of paths that use any one edge
of H, and d is the length of the longest path in
P. In addition, the load l is defined to be the
maximum number of nodes in G that are mapped
to a single node of H. Once G is embedded in
H, H can emulate G as follows: Each node of
H emulates the local computations performed by
the l (or fewer) nodes mapped to it in O(l) time.
Then for each packet sent along an edge of G, H
sends a packet along the corresponding path in
the embedding; using the offline LMR result this
takes O(c + d) steps. Thus, H can emulate each
step of G in O(c + d + l) steps.


Job Shop Scheduling 

Consider a scheduling problem with jobs
$J_1, \ldots, J_r$ and machines $m_1, \ldots, m_s$ for which
each job must be performed on a specified
sequence of machines (in a specified order).
Assume each job spends unit time on each
machine, and that no machine has to work on
any job more than once (in the language of
job-shop scheduling, this is the non-preemptive,
acyclic, job-shop scheduling problem with unit
jobs). Mapping sequences of machines to paths
and jobs to packets encodes this problem as an
instance of the main packet routing problem: if c is
the maximum number of jobs that have to be run on
any one machine and d is the maximum
number of different machines that work on
any single job, then the corresponding packet
routing instance has congestion O(c) and
dilation O(d). Then the offline LMR result
shows that there is a schedule that completes 
all jobs in O(c + d) steps, where in addition, 
each job waits at most a constant number of 
steps in between consecutive machines (and the 
queue of jobs waiting for any particular machine 
will always be bounded by a constant). Similar 
techniques to those developed in the LMR paper 
have subsequently been applied to more general 
instances of Job-Shop Scheduling; see [4, 11]. 


Packet Routing 


Open Problems 


The main open problem is whether there is a randomized
online packet scheduling algorithm that matches
the offline LMR bound of O(c + d). The bound
of [8] is close, but still grows logarithmically with 
the total number of packets. 

For job shop scheduling, it is unknown 
whether the constant-factor approximation 
algorithm for the non-preemptive acyclic job- 
shop scheduling problem with unit length jobs 
implied by LMR can be improved to a PTAS. It is 
also unknown whether there is a constant-factor 
approximation in the case of arbitrary-length 
jobs.


Problem Definition


A multi-queue network switch serves m incoming 
queues by transmitting data packets arriving at 
m input ports through one single output port. In 
each time step, an arbitrary number of packets 
may arrive at the input ports, but only one packet 
can be passed through the common output port. 
Each packet is marked with a value indicating its 
priority in the Quality of Service (QoS) network. 
Since each queue has bounded capacity B and the 
rate of arriving packets can be much higher than 
the transmission rate, packets can be lost due to 
insufficient queue space. The goal is to maximize 
the throughput which is defined as the total value 
of transmitted packets. The problem comprises 
two dependent questions: buffer management, 
namely which packets to admit into the queues, 
and scheduling, i.e., which (FIFO) queue to use 
for transmission in each time step.

Two scenarios are distinguished: (a) unit
packet values (all packets have the same value) and
(b) arbitrary packet values.

The problem is considered as an online problem,
i.e., at time step t, only the packet arrivals
until t are known, but nothing about future packet
arrivals. The online switch performance in QoS 
based networks is studied by using competitive 
analysis in which the throughput of the online 
algorithm is compared to the throughput of an op- 
timal offline algorithm knowing the whole arrival 
sequence in advance. 

If not stated otherwise, the admission control 
is assumed to allow preemption, i.e., packets
once enqueued need not necessarily be transmit- 
ted, but can be discarded. 


Problem 1 (Unit Value Problem) All packets 
have value 1. Since all packets are thus equally 
important, the admission control policies sim- 
plify: All arriving packets are to be enqueued; 
in the case of buffer overflow, it does not matter 
which packets are stored in the queue and which 
packets are discarded. 


Problem 2 (General Problem) Each packet has
its individual value, where usually a range $[1, \alpha]$
is given for all packets. A special case is
the two-value model, where the values are
restricted to $\{1, \alpha\}$.


Key Results 
Unit Value Packets 
Deterministic Algorithms 


Theorem 1 ({1]) For any buffer size B, the com- 
petitive ratio of each deterministic online algo- 
rithm is not smaller than (epg + 2)/(ep —1+ 7) 
$\ge e/(e - 1) \approx 1.58$, where $e_B = ((B + 1)/B)^B$.


Theorem 2 ([4]) Every work-conserving online
algorithm is 2-competitive. 


Theorem 3 ([1]) For any buffer size B, the competitive
ratio of any greedy algorithm, which
always serves a longest queue (LQF), is at least
$2 - 1/B$ if $m \gg B$.


Algorithm: SGR (Semi-Greedy) In each time 
step, the algorithm executes the first rule that 
applies to the current buffer configuration. 


1. If there is a queue buffering more than $\lfloor B/2 \rfloor$
packets, serve the queue currently having the 
maximum load. 

2. If there is a queue the hitherto maximum load 
of which is less than B, serve among these 
queues the one currently having the maximum 
load. 

3. Serve the queue currently having the maxi- 
mum load. 


Ties are broken by choosing the queue with the 
smallest index. The hitherto maximum load is 
reset to 0 for all queues whenever all queues are
unpopulated in SGR’s configuration. 
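
The three rules can be sketched as a queue-selection function. This is an illustrative sketch under stated assumptions: the threshold in rule 1 is read as floor(B/2), rule 2 is restricted to non-empty queues (an empty queue cannot be served), and the names `sgr_choose_queue`, `loads`, and `max_loads` are hypothetical. The caller is assumed to maintain `max_loads` and to reset it to 0 whenever all queues are empty.

```python
def sgr_choose_queue(loads, max_loads, B):
    """Return the index of the queue SGR serves, or None if all are empty.

    loads[i]     -- current number of packets in queue i
    max_loads[i] -- hitherto maximum load of queue i since the last reset
    B            -- buffer size of every queue
    """
    nonempty = [i for i, load in enumerate(loads) if load > 0]
    if not nonempty:
        return None

    def fullest(candidates):
        # Maximum current load; ties are broken by the smallest index.
        return min(candidates, key=lambda i: (-loads[i], i))

    # Rule 1: some queue holds more than floor(B/2) packets.
    if any(loads[i] > B // 2 for i in nonempty):
        return fullest(nonempty)

    # Rule 2: among queues whose hitherto maximum load is below B.
    never_full = [i for i in nonempty if max_loads[i] < B]
    if never_full:
        return fullest(never_full)

    # Rule 3: plain longest-queue-first.
    return fullest(nonempty)
```

With B = 4, loads [1, 2, 0] and hitherto maximum loads [4, 2, 0], rule 1 does not apply, and rule 2 selects queue 1 because queue 0 has already been full once.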


Theorem 4 ([1]) If B is even, then SGR is
$17/9 \approx 1.89$-competitive. If B is odd, then SGR
is $(17/9 + \delta_B)$-competitive, where $\delta_B$ is an
additive term given in [1].


Theorem 5 ([3]) Algorithm $E^M$ (not
stated in detail due to space limitation),
which is based on a water level algorithm
and uses a fractional matching in an online
constructed graph, achieves a competitiveness
of $\frac{e}{e-1}\left(1 + \frac{\lfloor H_m + 1 \rfloor}{B}\right)$, where $H_m$ denotes
the m-th harmonic number. Thus, $E^M$ is
asymptotically $\frac{e}{e-1}$-competitive for $B \gg \log m$.


Randomized Algorithms 


Theorem 6 ([1]) The competitive ratio of
each randomized online algorithm is at least
$\rho = 1.4659$ for any buffer size B ($\rho = 1 + \frac{1}{\alpha + 1}$,
where $\alpha$ is the unique positive root of
$e^{\alpha} = \alpha + 2$).
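
The constant in Theorem 6 can be checked numerically. The sketch below assumes the reading that the bound equals 1 + 1/(a + 1), where a is the positive root of e^a = a + 2; the bisection routine and its bracket [0.5, 2] are illustrative choices.

```python
import math


def positive_root(lo=0.5, hi=2.0, tol=1e-12):
    """Bisection for the positive root of g(a) = e^a - a - 2.

    g is increasing for a > 0 (its derivative is e^a - 1), so the
    positive root is unique and the bracket [lo, hi] contains it.
    """
    def g(a):
        return math.exp(a) - a - 2.0

    assert g(lo) < 0 < g(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = positive_root()            # roughly 1.146
rho = 1.0 + 1.0 / (alpha + 1.0)    # roughly 1.4659, the bound in Theorem 6
```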


Theorem 7 (Generalizing technique [9]) If
there is a randomized c-competitive algorithm 
A for B =1, then there is a randomized c- 
competitive algorithm A for all B.


Algorithm: RS (Random Schedule)


1. The algorithm uses m auxiliary queues
$Q_1, \ldots, Q_m$ of sizes $B_1, \ldots, B_m$ (different
buffer sizes at the distinct ports are allowed),
respectively. These queues contain real
numbers from the range (0,1), where each
number is labeled as either marked or
unmarked. Initially, these queues are empty.

2. Packet arrival: If a new packet arrives at queue
$q_i$, then the algorithm chooses uniformly at
random a real number from the range (0,1)
that is inserted into queue $Q_i$ and labeled as
unmarked. If queue $Q_i$ was full when the
packet arrived, the number at the head of the
queue is deleted prior to the insertion of the
new number.

3. Packet transmission: Check whether queues
$Q_1, \ldots, Q_m$ contain any unmarked number.
If there are unmarked numbers, let $Q_i$ be the
queue containing the largest unmarked number.
Change the label of this number to
“marked” and select queue $q_i$ for transmission.
Otherwise (no unmarked number), transmit
a packet from any non-empty queue if such
exists.
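
The bookkeeping of RS can be sketched as follows. This is a minimal illustration, not the implementation from [4]: the class name, method names, and 0-based queue indices are assumptions, and the sketch only returns the index of the selected queue without managing the real packet buffers.

```python
import random
from collections import deque


class RandomSchedule:
    """Auxiliary-queue bookkeeping for RS (illustrative sketch)."""

    def __init__(self, sizes, rng=None):
        self.sizes = list(sizes)               # B_1, ..., B_m
        self.aux = [deque() for _ in sizes]    # entries are [number, marked]
        self.rng = rng or random.Random()

    def on_arrival(self, i):
        """Step 2: a packet arrives at queue i."""
        q = self.aux[i]
        if len(q) == self.sizes[i]:
            q.popleft()                        # delete the number at the head
        # random() draws from [0, 1); the endpoint 0 has negligible probability.
        q.append([self.rng.random(), False])

    def choose_queue(self, real_loads):
        """Step 3: return the index of the queue to serve, or None."""
        best = None                            # (queue index, entry)
        for i, q in enumerate(self.aux):
            for entry in q:
                if not entry[1] and (best is None or entry[0] > best[1][0]):
                    best = (i, entry)
        if best is not None:
            best[1][1] = True                  # mark the largest unmarked number
            return best[0]
        # No unmarked number: serve any non-empty real queue, if one exists.
        for i, load in enumerate(real_loads):
            if load > 0:
                return i
        return None
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which is convenient when comparing RS against a deterministic policy on the same arrival sequence.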


Theorem 8 ([4]) Randomized algorithm RS is
$\frac{e}{e-1} \approx 1.58$-competitive.


Algorithm: RP (Random Permutation) Let P be
the set of permutations of {1,...,m}, denoted
as m-tuples. Choose $\pi \in P$ according to the uniform
distribution and fix it. In each transmission
step, choose among the populated queues the
one whose index is closest to the front of the
m-tuple $\pi$.
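
RP needs only one random choice, made before the first packet arrives. A minimal sketch, assuming 0-based queue indices and a hypothetical helper name `make_rp`:

```python
import random


def make_rp(m, rng=None):
    """Draw a uniformly random permutation of the m queue indices once,
    then always serve the populated queue that appears first in it."""
    rng = rng or random.Random()
    order = list(range(m))
    rng.shuffle(order)              # fixed for the whole run

    def choose_queue(loads):
        for i in order:
            if loads[i] > 0:
                return i
        return None                 # all queues are empty

    return order, choose_queue
```

The returned `order` is exposed only so that a run can be inspected or replayed; the scheduler itself is the closure `choose_queue`.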


Theorem 9 ([9]) Randomized algorithm RP
is 3/2-competitive for B = 1. By Theorem 7,
there is a randomized algorithm derived from RP that is
3/2-competitive for arbitrary B.


Arbitrary Value Packets 


Definition 1 A switching algorithm ALG is 
called comparison-based if it bases its decisions 
on the relative order between packet values (by 
performing only comparisons), with no regard to 
the actual values.


Theorem 10 (Zero-one principle [5]) Let ALG
be a comparison-based switching algorithm (de- 
terministic or randomized). ALG is c-competitive 
if and only if ALG achieves a c-competitiveness 
for all packet sequences whose values are re- 
stricted to {0, 1} for every possible way of break- 
ing ties between equal values. 


Algorithm: GR (Greedy) Enqueue a new packet
if


• the queue is not full
• or a packet with the smallest value in the
queue has a lower value than the new packet.
In this case, a smallest value packet is discarded
and the new packet is enqueued.


Algorithm: TLH (Transmit Largest Head) 


1. Buffer management: Use algorithm GR inde- 
pendently in all m incoming queues. 

2. Scheduling: At each time step, transmit the 
packet with the largest value among all pack- 
ets at the head of the queues. 
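
GR and TLH together can be sketched in a few lines. The sketch assumes each queue is a FIFO of packet values with capacity B ≥ 1; the function names and the tie-breaking rule (smallest queue index among equal head values) are illustrative assumptions.

```python
from collections import deque


def gr_admit(queue, value, B):
    """Greedy admission (GR) for one FIFO queue of capacity B >= 1.

    Returns True if the new packet is enqueued.
    """
    if len(queue) < B:
        queue.append(value)
        return True
    smallest = min(queue)
    if smallest < value:
        queue.remove(smallest)   # discard one smallest-value packet
        queue.append(value)
        return True
    return False                 # the new packet is rejected


def tlh_transmit(queues):
    """TLH scheduling: send the largest packet among the queue heads.

    Returns (queue index, value) or None if every queue is empty.
    """
    heads = [(q[0], -i) for i, q in enumerate(queues) if q]
    if not heads:
        return None
    value, neg_i = max(heads)    # largest value, then smallest index
    queues[-neg_i].popleft()
    return -neg_i, value
```

Because both functions compare values but never use their magnitudes, this policy is comparison-based in the sense of Definition 1.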


Theorem 11 ([5]) Algorithm TLH is 3-competitive.


Algorithm: TL (Transmit Largest) 


1. Buffer management: Use algorithm GR inde- 
pendently in all m incoming queues. 

2. Scheduling: At each time step, transmit the 
packet with the largest value among all pack- 
ets stored in the queues. 


Algorithm: $GS^A$ (Generic Switch)


1. Buffer management: Apply buffer manage- 
ment policy A to all m incoming queues. 

2. Scheduling: Run a simulation of algorithm TL 
(in the preemptive relaxed model) with the 
online input sequence $\sigma$. Adopt all scheduling
decisions of TL, i.e., at each time step, transmit 
the packet at the head of the queue used by TL 
simulation. 


Theorem 12 (General reduction [4]) Let
$GS^A$ denote the algorithm obtained by running
algorithm GS with the event-driven single-queue
buffer management policy A (preemptive or
non-preemptive) and let $c_A$ be the competitive
ratio of A. The competitive ratio of $GS^A$ satisfies
$c_{GS^A} \le 2 \cdot c_A$.


Applications 


The unit value scenario models most current 
networks, e.g., IP networks which only support 
a “best effort” service in which all packet streams 
are treated equally, whereas the scenario with 
arbitrary packet values integrates full QoS capa- 
bilities. 

The general reduction technique allows one
to restrict attention to single-queue
buffer problems. It can be applied to a
1.75-competitive algorithm named PG by Bansal
et al. [7], which achieves the best ratio known
today, and yields an algorithm $GS^{PG}$ that is
3.5-competitive for multi-queue buffers (3.5 is still
higher than 3, which is the competitive ratio of
TLH). In the 2-value preemptive model, Lotker
and Patt-Shamir [8] presented a mark&flush
algorithm mf that is 1.30-competitive for single
queue buffers and that the general reduction
technique transforms into a 2.60-competitive
algorithm $GS^{mf}$ for multi-queue buffers.

For the general non-preemptive model,
Andelman et al. [2] presented a policy for a single
queue called Exponential-Interval Round-Robin
(EIRR), which is $(e \lceil \ln \alpha \rceil)$-competitive, and
showed also a lower bound of $\Theta(\log \alpha)$. In the
multi-queue buffer case, the general reduction
technique provides a non-preemptive
$(e \lceil \ln \alpha \rceil)$-competitive algorithm.


Open Problems 


It is known from Theorem 3 that the competitive 
ratio of any greedy algorithm in the unit value 
model is at least 2 if m >> B. Which is the 
tight upper bound for greedy algorithms in the 
opposite case B >> m? 

The proof of the lower bound e/(e —1) in 
Theorem 1 uses m >> B whereas Theorem 5
achieves e/(e − 1) as an upper bound for
B >> log m. In [4], a lower bound of 1.366
is shown, independent of B and m. Which is 
the optimal competitive ratio for arbitrary B 
and m? 

Due to the general reduction technique in
Theorem 12, the competitive ratio for multi-queue
buffer algorithms can be improved if better
competitiveness results for single queue buffer
algorithms are achieved. Currently, approximately 1.43 [2]
and 1.75 [7] are the best known lower and upper
bounds, respectively. How can this gap be reduced?


Cross-References 


Online Paging and Caching 
Packet Switching in Single Buffer 


Recommended Reading 


1. Albers S, Schmidt M (2005) On the performance of 
greedy algorithms in packet buffering. SIAM J Comput 
35:278-304 

2. Andelman N, Mansour Y, Zhu A (2003) Competitive 
queueing policies for QoS switches. In: Proceedings 
of the 14th ACM-SIAM symposium on discrete algo- 
rithms (SODA), pp 761-770 

3. Azar Y, Litichevskey M (2004) Maximizing through- 
put in multiqueue switches. In: Proceedings of the 12th 
annual European symposium on algorithms (ESA), pp 
53-64 

4. Azar Y, Richter Y (2003) Management of multi-queue 
switches in QoS networks. In: Proceedings of the 35th 
ACM symposium on theory of computing (STOC), pp 
82-89 

5. Azar Y, Richter Y (2004) The zero-one principle for 
switching networks. In: Proceedings of the 36th ACM 
symposium on theory of computing (STOC), pp 64-71 

6. Azar Y, Richter Y (2004) An improved algorithm for 
CIOQ switches. In: Proceedings of the 12th annual 
European symposium on algorithms (ESA). LNCS, vol 
3221, pp 65-76 

7. Bansal N, Fleischer L, Kimbrel T, Mahdian M, 
Schieber B, Sviridenko M (2004) Further improve- 
ments in competitive guarantees for QoS buffering. In: 
Proceedings of the 31st international colloquium on 
automata, languages, and programming (ICALP), pp 
64-71 

8. Lotker Z, Patt-Shamir B (2003) Nearly optimal FIFO 
buffer management for two packet classes. Comput 
Netw 42(4):481-492



9. Schmidt M (2005) Packet buffering: randomization 
beats deterministic algorithms. In: Proceedings of the 
22nd annual symposium on theoretical aspects of com- 
puter science (STACS). LNCS, vol 3404, pp 293-304 


Packet Switching in Single Buffer 


Rob van Stee 
University of Leicester, Leicester, UK 


Keywords 


Buffering 


Years and Authors of Summarized 
Original Work 


2003; Bansal, Fleischer, Kimbrel, Mahdian,
Schieber, Sviridenko


Problem Definition 


In this entry, consider a quality-of-service (QoS) 
buffering system that is able to hold B packets. 
Time is slotted. At the beginning of a time step, a 
set of packets (possibly empty) arrives, and at the 
end of the time step, a single packet may leave 
the buffer to be transmitted. Since the buffer has 
a bounded size, at some point packets may need 
to be dropped. The buffer management algorithm 
has to decide at each step which of the packets 
to drop and which packets to transmit, subject 
to the buffer capacity constraint. The value of a 
packet p is denoted by v(p). The system obtains 
the value of the packets it sends and gains no 
value otherwise. The aim of the buffer manage- 
ment algorithm is to maximize the total value of 
transmitted packets. 

In the FIFO model, the packet transmitted at 
time t is always the first (oldest) packet in the
buffer. 

In the nonpreemptive model, packets accepted
to the queue will be transmitted eventually and
cannot be dropped. In this model, the best competitive
ratio achievable is $\Theta(\log \alpha)$, where $\alpha$ is
the ratio of the maximum value of a packet to the
minimum [1,2].

In the preemptive model, packets can also 
be dropped at some later time before they are 
served. The rest of this entry focuses on this 
model. Mansour, Patt-Shamir, and Lapid [11] 
were the first to study preemptive queuing 
policies for a single FIFO buffer, proving that 
the natural greedy algorithm (see definition 
in Fig.1) maintains a competitive ratio of at 
most 4. This bound was improved to the tight 
value of 2 by Kesselman, Lotker, Mansour, 
Patt-Shamir, Schieber, and Sviridenko [8]. An 
alternative proof of the 2-competitiveness, due 
to Kimbrel [9], is presented in Epstein and van 
Stee’s survey on buffer management [5]. 

The greedy algorithm is not optimal since it 
never preempts a packet until the buffer is full and 
this might be too late. The first algorithm with a 
competitive ratio strictly below 2 was presented 
by Kesselman, Mansour, and van Stee [7]. This 
algorithm uses a parameter $\beta$ and introduces an
extra rule for processing arrivals that is executed
before rules (1) and (2) of the greedy algorithm.
This rule is formulated in Fig. 2.


Packet Switching in Single Buffer, Fig. 1 The natural greedy algorithm

It is shown in [7] that by taking $\beta = 15$, the
algorithm preemptive greedy (PG) has a com- 
petitive ratio of 1.983. The analysis is rather 
complicated and is done by assigning the value of 
packets served by the offline algorithm to packets 
served by PG. 

A lower bound of 5/4 for this problem was 
shown in [11]. This was improved to $\sqrt{2}$ in [2]
and then to 1.419 in [7]. 


Key Results 


A modification of PG was presented by Bansal,
Fleischer, Kimbrel, Mahdian, Schieber, and
Sviridenko [3]. It changes rule 0 to rule 0′
(Fig. 3).

Thus, the modification compared to PG is that 
this algorithm finds a “locally optimal” packet to 
evict. We will denote modified preemptive greedy 
by MPG. 


Packet Switching in Single Buffer, Fig. 2 Extra rule for the preemptive greedy algorithm 


Packet Switching in Single Buffer, Fig. 3 Modified preemptive greedy


Theorem 1 ([3]) For $\beta = 4$, MPG has a competitive
ratio of 1.75.


The proof begins by showing that in order to
analyze the performance of MPG, it is sufficient
to consider only input instances in which the
value of each packet is either 0 or $\beta^i$ for some
$i \ge 0$, but ties are allowed to be broken by the
adversary.

The authors then define an interval structure
for input instances. An interval I is said to be
of type i if at every step $t \in I$, MPG outputs a
packet of value at least $\beta^i$, and I is a maximal
interval with this property.

$\mathcal{I}_i$ is the collection of maximal intervals of
type i, and $\mathcal{I}$ is the union of all $\mathcal{I}_i$’s. This is
a multiset, since an interval of type i can also
be contained in an interval of one or more types
$j < i$.

This induces an interval structure which is a
sequence of ordered rooted trees in a natural way:
The root of each tree is an interval in $\mathcal{I}_0$, and the
children of each interval $I \in \mathcal{I}_i$ are all the maximal
intervals of type i + 1 which are contained
in I. These children are ordered from left to right
based on time, as are the trees themselves. The
intervals of type i (and the vertices that represent
them) are distinguished by whether or not an
eviction of a packet of value at least $\beta^i$ occurred
during the interval.

To complete the proof, the authors show that
for every interval structure $\mathcal{T}$, the competitive
ratio of MPG on instances with interval structure
$\mathcal{T}$ can be bounded by the solution of a linear
program indexed by $\mathcal{T}$. Finally, it is shown that
for every $\mathcal{T}$ and every $\beta \ge 4$, the solution of this
program is at most $2 - 1/\beta$.


Applications 


In recent years, there has been a lot of interest 
in quality-of-service networks. In regular IP net- 
works, packets are indistinguishable, and in case 
of overload, any packet may be dropped. In a
commercial environment, it is much more preferable
to allow better service to higher-paying customers
or customers with critical requirements.
The idea of quality-of-service guarantees is that
packets are marked with values which indicate 
their importance. 

This naturally leads to decision problems at 
network switches when many packets arrive and 
overload occurs. The algorithm presented in this 
entry can be used to maximize network perfor- 
mance in a network which supports quality of 
service. 


Open Problems 


Despite substantial advances in improving the 
upper bound for this problem, a fairly large gap 
remains. Sgall (quoted in Jawor [6]) showed that 
the performance of PG is as good as that of MPG. 
Englert and Westermann [4] showed that PG has 
a competitive ratio of at most $\sqrt{3} \approx 1.732$ and
at least $1 + \frac{1}{2}\sqrt{2} \approx 1.707$. Thus, to improve
further, a different algorithm will be needed. 

The authors also note that Lotker and Patt- 
Shamir [10] studied the special case of two packet 
values and derived a 1.3-competitive algorithm, 
which closely matches the corresponding lower 
bound of 1.28 from Mansour et al. [11]. 
An open problem is to close the remaining 
small gap.


Problem Definition


Given a user query, current web search services 
retrieve all web pages that contain the query 
terms resulting in a huge number of web pages 
for the majority of searches. Thus it is cru- 
cial to reorder or rank the resulting documents 
with the goal of placing the most relevant docu- 
ments first. Frequently, ranking uses two types of 
information: (1) query-specific information and 
(2) query-independent information. The query- 
specific part tries to measure how relevant the 
document is to the query. Since it depends to 
a large part on the content of the page, it is 
mostly under the control of the page’s author. 
The query-independent information tries to es- 
timate the quality of the page in general. To 
achieve an objective measure of page quality, 
it is important that the query-independent infor- 
mation incorporates a measure that is not con- 
trolled by the author. Thus the problem is to 
find a measure of page quality that: (a) cannot 
be easily manipulated by the web page’s author 
and (b) works well for all web pages. This is 
challenging as web pages are extremely hetero- 
geneous. 


Key Results 


The hyperlink structure of the web is a good 
source for basing such a measure as it is hard 
for one author or a small set of authors to influ- 
ence the whole structure, even though they can 
manipulate a subset of the web pages. Brin and 
Page showed that a relatively simple analysis of 
the hyperlink structure of the web can be used 
to produce a quality measure for web documents 
that leads to large improvements in search qual- 
ity. The measure is called the PageRank mea- 
sure. 


Linear Algebra-Based Definition 

Let n be the total number of web pages. The 
PageRank vector is an n-dimensional vector with
one dimension for each web page. Let d be a 
small constant, like 1/8, let deg(p) denote the 
number of hyperlinks in the body text of page p,
and let PR(p) denote the PageRank value of page
p. Assume first that every page contains at least
one hyperlink. In such a collection of web pages,
the PageRank vector is computed by solving a
system of linear equations that contains for each
page p the equation


$$PR(p) = \frac{d}{n} + (1 - d) \sum_{q \text{ has hyperlink to } p} \frac{PR(q)}{\deg(q)}$$


In matrix notation, the PageRank vector is the
eigenvector with 1-norm one of the matrix A with
entry $A_{pq} = d/n + (1 - d)/\deg(q)$ if q has a
hyperlink to p and $A_{pq} = d/n$ otherwise.

If web pages without hyperlinks are allowed 
in this linear system, then they might become 
“PageRank sinks”, i.e., they would “receive” 
PageRank from the pages pointing to them, but 
would not “give out” their PageRank, potentially 
resulting in an “unusually high” PageRank value 
for themselves. Brin and Page proposed two ways 
to deal with web pages without out-links, namely 
either to recursively delete them until no such 
web pages exist anymore in the collection or to 
add a hyperlink from each such page to every 
other page. 


Random Surfer Model 

Let the web graph G = (V,E) be a directed 
graph such that each node corresponds to a web 
page and every hyperlink corresponds to a di- 
rected edge from the referencing node to the 
referenced node. The PageRank can also be in- 
terpreted as the following random walk in the 
web graph. The random walk starts at a random 
node in the graph. Assume in step k it visits 
page q. Then it flips a biased coin, and with 
probability d or if q has no out-edges, it selects 
a random node out of V and visits it in step 
k + 1. Otherwise it selects a random out-edge 
of the current node and visits it in step k + 1. 
(Note that this corresponds to adding a directed 
edge from every page without hyperlink to ev- 
ery node in the graph.) Under certain conditions 
(which do not necessarily hold on the web) the 
stationary distribution of this random walk cor- 
responds to the PageRank vector. See [1, 4] for 
details. 


Brin and Page also suggested computing the 
PageRank vector approximately using the power 
method, i.e., by setting all initial values to 1/n 
and then repeatedly using the PageRank vector of 
the previous iteration to compute the PageRank 
vector of the current iteration using the above 
linear equations. After a hundred iterations, 
barely any values change and the computation 
is stopped. 
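
The power method described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pagerank`, the dictionary representation `out_links`, and the tiny example graph are assumptions, and pages without out-links are treated as linking to every page (the random-surfer convention), which is one of several possible ways of handling them.

```python
def pagerank(out_links, d=1 / 8, iterations=100):
    """Power iteration for PR(p) = d/n + (1 - d) * sum(PR(q) / deg(q)).

    out_links maps each page to the list of pages it links to.
    Assumption: a page without out-links behaves as if it linked to
    every page in the collection.
    """
    pages = sorted(out_links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}          # all initial values are 1/n
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(pr[q] for q in pages if not out_links[q])
        nxt = {p: d / n + (1 - d) * dangling / n for p in pages}
        for q in pages:
            for p in out_links[q]:
                nxt[p] += (1 - d) * pr[q] / len(out_links[q])
        pr = nxt
    return pr


# Hypothetical four-page web: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this example the values sum to one, page "c" receives the largest value, and page "d", which nobody links to, keeps only the baseline d/n.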


Applications 


The PageRank measure is used as one of the 
factors by Google in its ranking of search results. 
The PageRank computation can be applied to 
other domains as well. Two examples are repu- 
tation management in P2P networks and learning 
word dependencies in natural language process- 
ing. In relational databases PageRank was used 
to weigh database tuples in order to improve 
keyword searching when a user does not know the 
schema. Finally, in rank aggregation PageRank 
can be used to find a permutation that minimally 
violates a set of given orderings. See [1] for more 
details. 

Variations of PageRank were studied as well. 
Personalizing the PageRank computation such 
that the values reflect the interest of a user has 
received a lot of attention. See [3] for a survey on 
this topic. It can also be modified to be used for 
detecting web search spam, i.e., web pages that 
try to manipulate web search results [1]. 

Problem Definition 


In the general form of multiprocessor precedence 
scheduling problems a set of n tasks to be exe- 
cuted on m processors is given. Each task requires 
exactly one unit of execution time and can run on 
any processor. A directed acyclic graph specifies 
the precedence constraints where an edge from 
task x to task y means task x must be completed 
before task y begins. A solution to the problem 
is a schedule of shortest length indicating when 
each task is started. The work of Jung, Serna, 
and Spirakis provides a parallel algorithm (on 
a PRAM machine) that solves the above problem 
for the particular case that m = 2, that is where 
there are two parallel processors. 

The two processor precedence constraint 
scheduling problem is defined by a directed 
acyclic graph (dag) G = (V, E). The vertices 
of the graph represent unit time tasks, and the 
edges specify precedence constraints among 
the tasks. If there is an edge from node x to 
node y then x is an immediate predecessor of 
y. Predecessor is the transitive closure of the 
relation immediate predecessor, and successor 
is its symmetric counterpart. A two processor 
schedule is an assignment of the tasks to time 
units 1,...,t so that each task is assigned exactly 
one time unit, at most two tasks are assigned to 
the same time unit, and if x is a predecessor of y 
then x is assigned to a lower time unit than y. The 
length of the schedule is t. A schedule having 
minimum length is an optimal schedule. Thus the 
problem is the following: 


Name: Two processor precedence constraint scheduling 

Input: A directed acyclic graph 

Output: A minimum length schedule preserving the precedence constraints. 


Preliminaries 


The algorithm assumes that tasks are partitioned 
into levels as follows: 

(i) Every task will be assigned to only one level. 

(ii) Tasks having no successors will be assigned to level 1. 

(iii) For each level i, all tasks which are immediate predecessors of tasks in level i will be assigned to level i + 1. 


Clearly topological sort will accomplish the 
above partition, and this can be done by an 
NC algorithm that uses O(n^3) processors and 
O(logn) time, see [3]. Thus, from now on, it is 
assumed that a level partition is given as part of 
the input. For the sake of convenience two special 
tasks, t_0 and t*, are added, in such a way that 
the original graph can be thought of as the graph 
formed by all tasks that are successors of t_0 and 
predecessors of t*. Thus t_0 is a predecessor of 
all tasks in the system (actually an immediate 
predecessor of the tasks in the highest level 
L(G)) and t* is a successor of all tasks in 
the system (an immediate successor of level 1 
tasks). 
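
As an illustration of the level partition, the following sequential Python sketch assigns a level to every task. It is not the NC algorithm referred to above; it assumes the usual reading of rules (i)-(iii), namely that a task without successors is on level 1 and every other task is one level above its highest immediate successor. The function name `level_partition` and the example graph are hypothetical.

```python
def level_partition(successors):
    """Return {task: level} for a DAG given as {task: [immediate successors]}.

    Assumption: level(x) = 1 if x has no successors, otherwise
    1 + the maximum level among the immediate successors of x.
    Uses recursion, so it is only suitable for small graphs.
    """
    level = {}

    def visit(x):
        if x not in level:
            level[x] = 1 + max((visit(y) for y in successors[x]), default=0)
        return level[x]

    for task in successors:
        visit(task)
    return level


# Hypothetical precedence graph: a -> c -> e, b -> c, b -> d.
dag = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
levels = level_partition(dag)
```

Here tasks d and e land on level 1, c on level 2, and a and b on level 3, so L(G) = 3.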

Notice that if two tasks are at the same level 
they can be paired. But when x and y are at 
different levels, they can be paired only when 
neither of them is a predecessor of the other. 
Let L(G) denote the number of levels in a given 
precedence graph G. A level schedule schedules 
tasks level by level. More precisely, suppose lev- 
els L(G),...,i + 1 have already been scheduled 
and there are k unscheduled tasks remaining on 
level i. When k is even, those tasks are paired 
with each other. When k is odd, k - 1 of the tasks 
are paired with each other, while the remaining 
task may (but not necessarily) be paired with 
a task from a lower level. 

Given a level schedule level i jumps to level i’ 
(i’ < i) if the last time step containing a task from 
level i also contains a task from level i’. If the 
last task from level i is scheduled with an empty 
slot, it is said that level i jumps to level 0. The 
jump sequence of a level schedule is the list of 
levels jumped to. A lexicographically first jump 
schedule is a level schedule whose jump sequence 
is lexicographically greater than any other jump 
sequence resulting from a level schedule. 

Given a graph G a level partition of G is 
a partition of the nodes in G into two sets in 
such a way that levels 0,...,k are contained in 
one set (the upper part) denoted by U, and levels 
k + 1,..., L in the other (the lower part) denoted 
by L. Given a graph G and a level i, the i-partition 
of G (or the partition at level i) is formed by 
the graphs U_i and L_i, where U_i contains all 
nodes x such that level(x) < i and L_i contains 
all nodes x with level(x) > i. Note that each i-partition 
determines two different level partitions 
depending on whether level i nodes are assigned 
to the upper or the lower part. A task x ∈ U_i is 
called free with respect to a partition at level i if x 
has no predecessors in L_i. 

Auxiliary Problems 

The algorithm for the two processors precedence 
constraint scheduling problem uses as a building 
block an algorithm for solving a matching prob- 
lem in a particular graph class. 

A full convex bipartite graph G is a triple 
(V, W, E), where V = {v_1,..., v_k} and 
W = {w_1,..., w_{k'}} are disjoint sets of vertices. 
Furthermore the edge set E satisfies the following 
property: If (v_i, w_j) ∈ E then (v_q, w_j) ∈ E for 
all q ≥ i. Thus, from now on it is assumed that 
the graph is connected. 

A set F ⊆ E is a matching in the graph 
G = (V, W, E) iff no two edges in F have a common 
endpoint. A maximal matching is a matching 
that cannot be extended by the addition of any 
edge in G. A lexicographically first maximal 
matching is a maximal matching whose sorted 
list of edges is lexicographically first among all 
maximal matchings in G. 
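
For intuition, the lexicographically first maximal matching can be obtained sequentially by scanning the edges in sorted order and keeping every edge whose endpoints are still free. The Python sketch below does exactly that; it is a plain greedy procedure, not the parallel algorithm of this entry, and the names `lex_first_maximal_matching`, `first`, and the small example are assumptions made for illustration.

```python
def lex_first_maximal_matching(edges):
    """Greedy scan of the edges in sorted order.

    edges: iterable of (i, j) pairs meaning v_i is adjacent to w_j.
    An edge is kept whenever both of its endpoints are still unmatched.
    """
    matched_v, matched_w, matching = set(), set(), []
    for i, j in sorted(edges):
        if i not in matched_v and j not in matched_w:
            matching.append((i, j))
            matched_v.add(i)
            matched_w.add(j)
    return matching


# Hypothetical full convex example with |V| = |W| = 3:
# w_j is adjacent to every v_q with q >= first[j].
first = {1: 2, 2: 1, 3: 2}
edges = [(q, j) for j, lo in first.items() for q in range(lo, 4)]
matching = lex_first_maximal_matching(edges)
```

The greedy choice is safe because any maximal matching that omits the smallest available edge has a sorted edge list that starts with a larger edge.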


Key Results 


When the number of processors m is arbitrary 
the problem is known to be NP-complete [8]. 
For any fixed m ≥ 3, the complexity is open [6]. Here 
the case of interest is m = 2. For two 
processors a number of efficient algorithms have 
been given. For sequential algorithms see [2, 4, 
5] among others. The first deterministic parallel 
algorithm was given by Helmbold and Mayr [7], 
thus establishing membership in the class NC. 
Previously [9] gave a randomized NC algorithm 
for the problem. Jung, Serna and Spirakis present 
a new parallel algorithm for the two processors 
scheduling problem that takes time O(log^2 n) and 
uses O(n^3) processors on a CREW PRAM. The 
algorithm improves the number of processors of 
the algorithm given in [7] from O(n^7 L(G)^2), 
where L(G) is the number of levels in the precedence 
graph, to O(n^3). Both algorithms compute 
a level schedule that has a lexicographically first 
jump sequence. 

To match jumps with tasks, a solution 
to the problem of computing the lexicographically 
first matching for a special type of convex 
bipartite graphs, here called full convex bipartite 
graphs, is used. A geometric interpretation of this 
problem leads to the discovery of an efficient parallel 
algorithm to solve it. 


Theorem 1 The lexicographically first maximal 
matching of full convex bipartite graphs can be 
computed in time O(log n) on a CREW PRAM 
with O(n^3/log n) processors, where n is the 
number of nodes. 


The previous algorithm is used to solve efficiently 
in parallel two related problems. 


Theorem 2 Given a precedence graph G, there 
is a PRAM parallel algorithm that computes all 
levels that jump to level 0 in the graph L_i and all 
tasks in level i - 1 that can be scheduled together 
with a task in level i, for i = 1,..., L(G), using 
O(n^3) processors and O(log^2 n) time. 


Theorem 3 Given a level partition of a graph 
G together with the levels in the lower part in 
which one task remains to be matched with some 
other task in the upper part of the graph, there 
is a PRAM parallel algorithm that computes 
the corresponding tasks in time O(log n) using 
n^3/log n processors. 


With those building blocks the algorithm for two 
processor precedence constraint scheduling starts 
by doing some preprocessing, followed by an 
adequate decomposition that ensures that at each 
recursive call a number of problems of half size 
are solved in parallel. This recursive schema is 
the following: 


Algorithm Schedule 

0. Preprocessing 

1. Find a level i such that |U_i| ≤ n/2 and |L_i| ≤ n/2. 

2. Match levels that jump to free tasks in level i. 

3. Match levels that jump to free tasks in U_i. 

4. If level i (or i + 1) remains unmatched, try to match it with a non free task. 

5. Delete all tasks used to match jumps. 

6. Apply (1)-(5) in parallel to L_i and the modified U_i. 

Algorithm Schedule stops whenever the corresponding 
graph has only one level. 
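
Step 1 of the schema only needs the number of tasks on each level. The following Python sketch shows one way such a splitting level could be found sequentially; the function name `find_split_level` and the list representation of level sizes are assumptions, and U_i and L_i are taken, as in the Preliminaries, to be the tasks on levels below and above i.

```python
def find_split_level(level_sizes):
    """level_sizes[i - 1] is the number of tasks on level i (levels 1..L).

    Returns the first level i with |U_i| <= n/2 and |L_i| <= n/2, where
    U_i holds the tasks on levels below i and L_i those on levels above i.
    """
    n = sum(level_sizes)
    below = 0                      # |U_i| for the current level i
    for i, size in enumerate(level_sizes, start=1):
        above = n - below - size   # |L_i|
        if 2 * below <= n and 2 * above <= n:
            return i
        below += size
    raise ValueError("no tasks given")
```

Such a level always exists when there is at least one task: the first level at which the running total reaches n/2 has fewer than n/2 tasks below it and at most n/2 tasks above it.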

The correctness and complexity bounds for algorithm 
Schedule follow from the previous results, 
leading to: 


Theorem 4 There is an NC algorithm which 
finds an optimal two processors schedule for any 
precedence graph in time O(log^2 n) using O(n^3) 
processors. 


Applications 


A fundamental problem in many applications is 
to devise a proper schedule to satisfy a set of 
constraints. Assigning people to jobs, meetings 
to rooms, or courses to final exam periods are 
all different examples of scheduling problems. 
A key and critical algorithm in parallel processing 
is the one mapping tasks to processors. Many 
properties of the system, like load balancing, total 
execution time, etc., rely on the performance of 
such an algorithm. Scheduling problems differ 
widely in the nature of the constraints that must 
be satisfied, the type of processors, and the type 
of schedule desired. 

The focus on precedence-constrained scheduling 
problems for directed acyclic graphs has 
a most direct practical application in problems 
arising in parallel processing, in particular in 
systems where computations are decomposed, 
prior to scheduling, into approximately equal 
sized tasks and the corresponding partial ordering 
among them is computed. These constraints must 
define a directed acyclic graph, acyclic because 
a cycle in the precedence constraints represents 
a Catch-22 situation that can never be resolved. 


Open Problems 


The parallel deterministic algorithm for the two 
processors scheduling problem presented here 
improves the number of processors of the Helm- 
bold and Mayr algorithm for the problem [7]. 
However, the complexity bounds are far from 
optimal: recall that the sequential algorithm given 
in [5] uses time O(e + n α(n)), where e is the 
number of edges in the precedence graph and 
α(n) is an inverse Ackermann function. Such 
an optimal algorithm might have a quite different 
approach, in which the levelling algorithm is not 
used. 

Interestingly enough computing the lexico- 
graphically first matching for full convex bipartite 
graphs is in NC, in contrast with the results 
given in [1] which show that many problems de- 
fined through a lexicographically first procedure 
in the plane are P-complete. It is an interesting 
problem to show whether all these problems fall 
in NC when they are convex. 

Problem Definition 


Given a weighted undirected graph G with n ver- 
tices and m edges, compute a minimum spanning 
tree (or spanning forest) of G on a parallel ran- 
dom access machine (PRAM) without concurrent 
write capability. 

A minimum spanning tree of a graph is a 
spanning tree with the smallest possible sum 
of edge weights. The parallel random access 
machine (PRAM) is an abstract model for 
designing parallel algorithms and understanding 
the power of parallelism. In this model, 
processors (each being a random access machine) 
work in a synchronous manner and communicate 
through a shared memory. PRAMs can be further 
classified according to whether more than one 
processor is allowed to read from or write to the 
same shared memory location simultaneously. 
The strongest model is CRCW (concurrent-read, 
concurrent-write) PRAM, and the weakest is 
EREW (exclusive-read, exclusive-write) PRAM. 
For an introduction to PRAM algorithms, one can 
refer to Karp and Ramachandran [8] and JaJa [5]. 

The input graph G is assumed to be given in 
the form of adjacency lists. Furthermore, isolated 
(degree-0) vertices are removed, and hence it is 
assumed that m > n. 

Key Results 


The MST problem is related to the connected 
component (CC) problem, which is to find the 
connected components of an undirected graph. 
Sequential algorithms for solving the CC prob- 
lem and the MST problem in O(m) time and 
O(m log n) time, respectively, were known a few 
decades ago. A number of more efficient MST 
algorithms have since been published, the most 
recent of which is Pettie and Ramachandran’s 
algorithm [9], which is provably optimal. 

In the parallel context, both problems are often 
solved in a similar way. With respect to CRCW 
PRAM, the two problems can be solved using 
O(log n) time and n + m processors (see, e.g., 
Cole and Vishkin [3]). Using randomization, (n + 
m)/log n processors are sufficient to solve these 
problems in O(log n) expected time [2, 10]. 

For EREW PRAM, O(log^2 n) time algorithms 
for the CC and MST problems were developed 
in the early 1980s. For a while, it was believed 
that the exclusive write models (including both 
concurrent read and exclusive read) could not 
overcome the O(log^2 n) time bound [8]. The first 
breakthrough was due to Johnson and Metaxas 
[6]; they devised O(log^{1.5} n) time algorithms for 
the CC problem and the MST problem. These 
results were soon improved to O(log n log log n) 
time by Chong and Lam. If randomization is 
allowed, the time complexity can be improved to 
O(log n) expected time and optimal work [7, 10, 
11]. Finally, Chong, Han, and Lam [1] obtained 
an algorithm for MST (and CC) using O(log n) 
time and n + m processors. This algorithm does 
not need randomization. Notice that O(log n) 
is optimal since these graph problems are at 
least as hard as computing the OR of n bits, 
and Cook et al. [4] have proven that the latter 
requires Ω(log n) time on exclusive-write PRAM 
no matter how many processors are used. 

Below is a sketch of some ideas for computing 
a minimum spanning tree in parallel without 
using concurrent write. 

Without loss of generality, assume that the 
edge weights are all distinct. Thus, G has a 
unique minimum spanning tree, which is denoted 
by T_G. Let B be a subset of edges in G which 
contains no cycle. B induces a set of trees F = 
{T_1, T_2, ..., T_l} in a natural sense: two vertices 
in G are in the same tree if they are connected 
by edges of B. B is said to be a λ-forest if each 
tree T ∈ F has at least λ vertices. For example, if 
B is the empty set, then B is a 1-forest; a spanning 
tree is an n-forest. 

Suppose that B is a λ-forest and its edges are 
all found in T_G. Then B can be augmented to give 
a 2λ-forest using a greedy approach: Let F' be an 
arbitrary subset of F including all trees T ∈ F 
with fewer than 2λ vertices. For every tree in F', 
pick its minimum external edge (i.e., the smallest- 
weight edge connecting to a vertex outside the 
tree). Denote B' as this set of edges. It can be 
proven that B' consists of edges in T_G only and 
B ∪ B' is a 2λ-forest. The above idea allows us 
to find T_G in ⌊log n⌋ stages as follows: 


1. B ← ∅ 
2. For i = 1 to ⌊log n⌋ do /* Stage i */ 

   (a) Let F be the set of trees induced by B on 
   G. Let F' be an arbitrary subset of F such 
   that F' includes all trees T ∈ F with fewer 
   than 2^i vertices. 

   (b) B_i ← {e | e is the minimum external edge 
   of T ∈ F'}; B ← B ∪ B_i 

3. return B 


Different strategies for choosing the set F' in 
Step 2(a) may lead to different B_i's. Nevertheless, 
B[1, i] = B_1 ∪ ... ∪ B_i is always a subset of T_G and 
induces a 2^i-forest. In particular, B[1, ⌊log n⌋] 
induces exactly one tree, which is exactly T_G. 
Using standard parallel algorithmic techniques, 
each stage can be implemented in O(log n) time 
on EREW PRAM using a linear number of processors 
(see, e.g., [5]). Therefore, T_G can be 
found in O(log^2 n) time. In fact, most parallel 
algorithms for finding MST are based on a similar 
approach. These parallel algorithms are “sequential” 
in the sense that the computation of B_i starts 
only after B_{i-1} is available. 
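
The stage scheme can be simulated sequentially to see how the forest grows. The Python sketch below is only such a simulation, not a PRAM implementation: it always chooses F' = F (one of the allowed choices), uses a union-find structure that is not part of the description above, and simply repeats stages until no tree has an external edge. The name `mst_by_stages` and the edge format are assumptions.

```python
def mst_by_stages(n, edges):
    """edges: list of (weight, u, v) with distinct weights, vertices 0..n-1.

    Sequential simulation of the stage scheme with F' = F in every
    stage. Stages are repeated until no tree has an external edge, so
    a disconnected input yields a minimum spanning forest.
    """
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    B = set()
    while True:
        best = {}                      # tree root -> minimum external edge
        for e in edges:
            w, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue               # edge lies inside one tree
            for r in (ru, rv):
                if r not in best or e < best[r]:
                    best[r] = e
        if not best:
            return B
        for w, u, v in set(best.values()):   # the edge set B_i of this stage
            B.add((w, u, v))
            parent[find(u)] = find(v)
```

Because every tree picks an edge in each stage, the smallest tree at least doubles in size per stage, which is where the ⌊log n⌋ bound on the number of stages comes from.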

The O(log n)-time EREW algorithm in [1] is 
based on some structural properties related to 
MST and can compute the B_i's in a more parallel 
fashion. In this algorithm, there are ⌊log n⌋ 
concurrent threads (a thread is simply a group 
of processors). For 1 ≤ i ≤ ⌊log n⌋, Thread i 
aims at computing B_i, and it actually starts long 
before Thread i - 1 has computed B_{i-1}, and it 
receives the output of Threads 1 to i - 1 (i.e., 
B_1, ..., B_{i-1}) incrementally. More specifically, 
the algorithm runs in ⌊log n⌋ supersteps, where 
each superstep lasts O(1) time. Thread i delivers 
B_i at the end of the ith superstep. The computation 
of Thread i is divided into ⌊log i⌋ phases. 
Let us first consider a simple case when i is a 
power of 2. Phase 1 of Thread i starts at the 
(i/2 + 1)th superstep, i.e., when B_1, ..., B_{i/2} 
are available. Phase 1 takes no more than i/4 
supersteps, ending at the (i/2 + i/4)th superstep. 
Phase 2 starts at the (i/2 + i/4 + 1)th superstep 
(i.e., when B_{i/2+1}, ..., B_{i/2+i/4} are available) 
and uses i/8 supersteps. Each subsequent phase 
uses half as many supersteps as the preceding 
phase. The last phase (Phase log i) starts and 
ends within the ith superstep; note that B_{i-1} is 
available after the (i - 1)th superstep. 
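
The timetable of a thread can be tabulated directly from this description. The small Python sketch below does so for i a power of 2; the function name `phase_schedule` is hypothetical, and the only interpretation added is that every phase before the last one gets i/2^(j+1) supersteps while the last phase is placed in superstep i.

```python
def phase_schedule(i):
    """(phase, first superstep, last superstep) for Thread i, i a power of 2.

    Phase j < log2(i) gets i / 2**(j + 1) supersteps; the final phase is
    placed in superstep i, as stated in the text.
    """
    assert i >= 2 and i & (i - 1) == 0, "i must be a power of 2"
    phases = i.bit_length() - 1          # log2(i)
    schedule, start = [], i // 2 + 1
    for j in range(1, phases):
        length = i // 2 ** (j + 1)
        schedule.append((j, start, start + length - 1))
        start += length
    schedule.append((phases, i, i))      # last phase: within superstep i
    return schedule
```

For Thread 16, for instance, this gives phases occupying supersteps 9-12, 13-14, 15, and 16.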


Applications 


Finding connected components or MST is a key 
step in several parallel algorithms for other graph 
problems. For example, the Chong-Han-Lam 
algorithm implies an O(logn) time algorithm 
for finding ear decomposition and biconnectivity 
without using concurrent write. 

Problem Definition 


Computational Social Choice is an interdisci- 
plinary research area involving Economics, Po- 
litical Science, and Social Science on the one 
side and Mathematics and Computer Science 
(including artificial intelligence and multi-agent 
systems) on the other side. Concrete questions 
addressed in this area include the following three: 
How efficiently can one determine the winner of 
an election, given a number of votes with prefer- 
ences over a number of alternatives? Is it possible 
to obtain a desirable outcome of an election 
by executing a number of campaigning actions? 
(Formally, such problems are often modeled as 
bribery.) Can the chair of an election influence 
the result of an election by modifying the set 
of available alternatives (e.g., by encouraging 
some alternatives (candidates) to participate in 
the run)? 

The main objective of parameterized complex- 
ity is to analyze computationally hard problems 
with respect to multiple input parameters and 
to classify these problems according to their in- 
herent difficulty when “viewed through different 
parameterizations.” The complexity of a problem 
typically depends on the values of a multitude of 
input parameters, so this approach allows one to 
classify NP-hard problems on a much finer scale than 
in classical complexity theory (where the complexity 
of a problem is only measured relative 
to the input size). In particular, a parameterized 
problem consisting of an input instance I and a 
parameter k is called fixed-parameter tractable 
if it can be solved in f(k) · |I|^{O(1)} time for a 
computable function f which typically grows at 
least exponentially in k. This still means, how- 
ever, that the problem is efficiently solvable for 
small parameter values. 

Parameterized complexity analysis, which 
still adheres to a worst-case complexity scenario, 
has been successfully applied in many areas. 
Computational Social Choice, although being 
a more recent application area [5], is among 
the most natural ones. To make this more 
precise, we next discuss in more detail three 
prominent voting problems that were already 
briefly mentioned in the introductory part: winner 
determination, campaign management/bribery, 
and control. 

An election E := (C, V) consists of a set C 
of m alternatives c_1, c_2, ..., c_m and a list V of 
n voters v_1, v_2, ..., v_n. Each voter v has a linear 
order ≻_v over the set C which we call a preference 
order. For example, if C = {c_1, c_2, c_3}, 
then the preference order c_1 ≻_v c_2 ≻_v c_3 of 
voter v indicates that v likes c_1 most (the 1st 
position), then c_2, and c_3 least (the 3rd position). 
For any two distinct alternatives c and c', we 
write c ≻_v c' if voter v prefers c over c'. For 
an election E = (C, V), an alternative c ∈ C is 
a Condorcet winner if any other alternative c' ∈ 
C \ {c} satisfies 

$$|\{v \in V \mid c \succ_v c'\}| > |\{v \in V \mid c' \succ_v c\}|.$$
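
The Condorcet condition is easy to check directly by pairwise comparison. The following Python sketch illustrates this; the function name `condorcet_winner` and the representation of a vote as a list ordered from most to least preferred are assumptions made for the example.

```python
def condorcet_winner(alternatives, votes):
    """votes: list of preference orders, each a list from most to least preferred.

    Returns the Condorcet winner, or None if there is none.
    """
    def beats(c, other):
        prefer_c = sum(v.index(c) < v.index(other) for v in votes)
        return prefer_c > len(votes) - prefer_c

    for c in alternatives:
        if all(beats(c, other) for other in alternatives if other != c):
            return c
    return None
```

A cyclic profile such as a > b > c, b > c > a, c > a > b has no Condorcet winner, which is why rules like Dodgson's measure how far a candidate is from being one.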


It is important to note that voting problems typi- 
cally come along with many natural parameter- 
izations, the two most obvious ones being the 
number of candidates (which is typically small in 
political elections, say) or the number of voters 
(which is often small in applications concerning 
multi-agent systems or decision making by com- 
mittees). A further, standard type of parameter 
refers to the size (or the value) of the solution that 
we seek. 

We now give an example of a winner deter- 
mination problem, focusing on the voting rule 
due to Dodgson (also known as the writer Lewis 
Carroll). In Dodgson’s system, the score of a 
candidate is the smallest number of swaps of 
adjacent candidates needed to ensure that this 
candidate is a Condorcet winner. In the DODG- 
SON SCORE problem we ask about the score of a 
given candidate in an election: 
Input: An election E := (C, V), a distinguished 
alternative c ∈ C, and a nonnegative integer k. 

Question: Can one make c the Condorcet 
winner by swapping a total number of at 
most k pairs of neighboring alternatives 
(i.e., k “bubble sort operations”) in the voters’ 
preference orders? 
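
To make the problem concrete, the Dodgson score of a candidate in a very small election can be found by exhaustive search over all profiles reachable by adjacent swaps. The Python sketch below does this with breadth-first search; it takes exponential time and is meant purely as an illustration, not as one of the algorithms discussed in this entry. The names `is_condorcet_winner` and `dodgson_score` and the tuple representation of votes are assumptions.

```python
from collections import deque


def is_condorcet_winner(c, votes):
    others = set(votes[0]) - {c}
    return all(
        2 * sum(v.index(c) < v.index(o) for v in votes) > len(votes)
        for o in others
    )


def dodgson_score(c, votes):
    """Breadth-first search over profiles reachable by adjacent swaps.

    votes: list of tuples, most preferred alternative first.
    Exponential time; meant only for tiny elections.
    """
    start = tuple(tuple(v) for v in votes)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        profile, swaps = queue.popleft()
        if is_condorcet_winner(c, profile):
            return swaps
        for i, vote in enumerate(profile):
            for p in range(len(vote) - 1):
                swapped = list(vote)
                swapped[p], swapped[p + 1] = swapped[p + 1], swapped[p]
                nxt = profile[:i] + (tuple(swapped),) + profile[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, swaps + 1))
```

An instance of DODGSON SCORE with bound k is then a yes-instance exactly when `dodgson_score(c, votes) <= k`.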


Our campaign management/bribery example 
is based on the t-Approval voting rule. In t-Approval, 
every voter can assign one point to 
each of their t most preferred alternatives, and the 
alternatives with maximum total number of points 
win. Notably, 1-Approval simply is the frequently 
used Plurality voting rule. Now, the NP-hard 
problem SWAP BRIBERY FOR t-APPROVAL reads 
as follows. 


Input: An election E := (C, V), a distinguished 
alternative c ∈ C, a cost function assigning 
a nonnegative integer cost to every swap-of- 
consecutive-candidates operation, and a non- 
negative integer β called the budget. 

Question: Can one make c a winner by swap 
operations of total cost at most β? 


Intuitively, each swap operation means a cam- 
paigning effort that convinces the voter that one 
candidate is better than the other and comes at a 
given cost (measured in time or money or some 
other way). 

Finally, our control example focuses on con- 
trol by adding alternatives in an election based on 
the Plurality voting rule. Note that other control 
actions include, for example, deleting alternatives 
or adding/deleting voters (assuming a powerful 
and corrupted chair of an election). The goal of 
the chair can either be to ensure someone’s vic- 
tory (constructive control) or preclude someone 
from winning (destructive control); we focus on 
the former. The NP-hard problem PLURALITY 
CONSTRUCTIVE CONTROL BY ADDING ALTER- 
NATIVES reads as follows: 


Input: An election E := (C, V), a distinguished 
alternative c ∈ C, a set of “spoiler alterna- 
tives” for possible addition, and a nonnegative 
integer k. 

Question: Can one make c a winner by adding 
at most k spoiler alternatives? 


Note that we assume that every voter has a clear 
linear order over all alternatives (including the 
spoiler alternatives) and that this is known by the 
manipulating election chair. 
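
A brute-force procedure for this control problem simply tries every set of at most k spoilers. The Python sketch below illustrates it; its running time grows exponentially with k, and it assumes the co-winner model (c only has to be among the alternatives with the most points). The function names, the argument layout, and the example election are hypothetical.

```python
from itertools import combinations


def plurality_winners(candidates, votes):
    """Each voter gives one point to their top choice among `candidates`."""
    points = {c: 0 for c in candidates}
    for vote in votes:
        top = next(a for a in vote if a in candidates)
        points[top] += 1
    best = max(points.values())
    return {c for c, s in points.items() if s == best}


def control_by_adding(c, registered, spoilers, votes, k):
    """Try every set of at most k spoilers; exponential in k.

    votes: full preference orders over registered + spoiler alternatives.
    Returns a set of spoilers that makes c a winner, or None.
    """
    for size in range(k + 1):
        for added in combinations(sorted(spoilers), size):
            if c in plurality_winners(set(registered) | set(added), votes):
                return set(added)
    return None


# Hypothetical election: adding spoiler "s" takes two first places from "d".
votes = [("c", "d", "s"), ("c", "s", "d"), ("s", "d", "c"),
         ("s", "d", "c"), ("d", "c", "s")]
answer = control_by_adding("c", {"c", "d"}, {"s"}, votes, 1)
```

Without the spoiler, d wins with three points against two; once s is added, c and s tie with two points each and d drops to one.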


Key Results 


We again start with our winner determination 
example. Bartholdi, Tovey, and Trick [1] were the 
first to provide an “ILP-based” fixed-parameter 
tractability result in the context of Computational 
Social Choice (actually the result was stated 
implicitly). They developed an integer linear 
program (ILP) to solve the NP-hard DODGSON 
SCORE problem and gave a running time bound 
based on a famous result of Lenstra, concerning 
the exact solvability of integer linear programs 
with “few” variables. Although not stated explicitly 
in their publication, this implies fixed-parameter 
tractability for DODGSON SCORE 
with respect to the parameter number m of 
alternatives. 

Bartholdi et al.’s [1] integer linear program for 
DODGSON SCORE reads as follows. It computes 
the Dodgson score of an alternative c. 

$$\min \sum_{i,j} j \cdot x_{i,j} \quad \text{subject to}$$

$$\forall i \in V: \quad \sum_{j} x_{i,j} = N_i,$$

$$\forall y \in C: \quad \sum_{i,j} e_{i,j,y} \cdot x_{i,j} \ge d_y,$$

$$x_{i,j} \ge 0.$$

Here, V denotes the set of preference order 
types (i.e., the set of different preference orders 
in the given election), N_i denotes the number of 
voters of type i, x_{i,j} denotes the number of voters 
with preference order of type i for which alternative 
c will be moved upward by j positions, and 
e_{i,j,y} is 1 if the result of moving alternative c by j 
positions upward in a preference order of type i 
is that c gains an additional voter support against 
alternative y, and 0 otherwise. Furthermore, d_y is 
the deficit of c with respect to alternative y, that 
is, the minimum number of voter supports that 
c must gain against y to defeat y in a pairwise 
comparison. If c already defeats y, then d_y = 0. 
Altogether, the integer linear program contains 
at most m · m! variables x_{i,j}, where m denotes 
the number of alternatives. Thus, the number 
of variables in the described integer linear program 
is upper-bounded by a function in parameter 
m, yielding fixed-parameter tractability due to 
Lenstra’s result. 

We remark that beyond the above parame- 
terization by the number m of alternatives, it 
is also known that DODGSON SCORE is fixed- 
parameter tractable with respect to the parameter 
total number of swaps [4] (this is an exam- 
ple of a parameter that measures the solution 
value). 

Now we briefly discuss some parameterized 
complexity results for SWAP BRIBERY FOR 
t-APPROVAL due to Dorn and Schlotter [8]. 
The SWAP BRIBERY problem was introduced 
by Elkind et al. [9], who have shown that the 
problem is NP-complete for a variety of voting 
rules, including t-Approval (for t ≥ 2). Dorn and 
Schlotter provided a detailed discussion of the 
complexity of the problem for t-Approval, and, 
in particular, they have shown that the problem 
is fixed-parameter tractable when parameterized 
by the number of voters. In contrast, if 
we take t to be the parameter (i.e., the problem 
no longer considers a single, fixed voting rule 
but a whole family of them), then the problem 
is W[1]-hard (and, unless unlikely complexity 
class collapses occur, the problem is not fixed- 
parameter tractable). 

The complexity of control problems was first 
studied by Bartholdi, Tovey, and Trick [2], who 
— in particular — have shown that PLURALITY 
CONSTRUCTIVE CONTROL BY ADDING AL- 
TERNATIVES (PCCAC) is NP-complete. Control 
problems have received relatively little attention 
from the parameterized complexity perspective. 
For example, among other issues, Betzler and 
Uhlmann [3] considered parameterization by 
the number of voters (for Copeland voting rule, 
which we do not discuss here). Liu et al. [10] 
considered the parameterization by the solution 
size (i.e., the number of candidates to be 
added) and, among other results, obtained W[2]- 
hardness for PCCAC (in essence, this already 
follows from the proof of Bartholdi, Tovey, and 
Trick). 


Open Problems 


1. We sketched the ILP-based fixed-parameter 
tractability result for DODGSON SCORE. A 
key question for this and many other prob- 
lems shown to be fixed-parameter tractable 
using Lenstra’s result is whether the (imprac- 
tical) ILP formulation can be replaced by 
a direct combinatorial algorithm (still pro- 
viding fixed-parameter tractability); we point 
to a recent survey [6] for a broader exposi- 
tion on that. Concerning DODGSON SCORE, 
it is also interesting to settle its parameterized 
complexity with respect to the number of 
votes [4]. 

2. One of the most intriguing questions regarding 
the complexity of SWAP BRIBERY is whether 
the problem (for some given voting rule) is 
fixed-parameter tractable when parameterized 
by the number of candidates (or, if not, 
then if at least there is a fixed-parameter 
tractable approximation scheme; interestingly, 
for SHIFT BRIBERY, a significantly simpler 
variant of the problem, such an approximation 
scheme indeed exists [7]). Dorn and 
Schlotter [8] showed that this problem is 
fixed-parameter tractable, but their proof only 
applies (in essence) to the case where each 
swap has the same cost. 

3. While there are quite a number of NP- 
completeness results regarding control 
problems and various voting rules, there are 
relatively few parameterized results. Can one 
turn some of these NP-completeness results 
into fixed-parameter tractability results for 
some natural parameters? 


A recent survey article [6] contains several more 
research challenges concerning the parameter- 
ized complexity of problems from Computational 
Social Choice.


Problem Definition


ONE-SIDED CROSSING MINIMIZATION (OSCM)
can be viewed as a specific form of drawing
a bipartite graph $G = (V_1, V_2, E)$, where all
vertices from partition $V_i$ are assigned to the same
line (also called layer) $L_i$ in the plane, with $L_1$
and $L_2$ being parallel. The vertex assignment to
$L_1$ is fixed, while that to $L_2$ is free and should
be chosen in a way to minimize the number of
crossings between edges drawn as straight-line
segments.


Notations 

A graph $G$ is described by its vertex set
$V$ and its edge set $E$, i.e., $G = (V, E)$, with
$E \subseteq V \times V$. The (open) neighborhood of
a vertex $v$, denoted $N(v)$, collects all vertices that
are adjacent to $v$. $N[v] = N(v) \cup \{v\}$ denotes
the closed neighborhood of $v$. $\deg(v) = |N(v)|$
is the degree of $v$. For a vertex set $S$,
$N(S) = \bigcup_{v \in S} N(v)$, and $N[S] = N(S) \cup S$.
$G[S]$ denotes the graph induced by vertex set $S$,
i.e., $G[S] = (S, E \cap (S \times S))$. A graph $G = (V, E)$
with vertex set $V$ and edge set $E \subseteq V \times V$ is
bipartite if there is a partition of $V$ into two sets
$V_1$ and $V_2$ such that $V = V_1 \cup V_2$, $V_1 \cap V_2 = \emptyset$,
and $E \subseteq V_1 \times V_2$. For clarity, $G = (V_1, V_2, E)$
is written in this case.


Parameterized Algorithms for Drawing Graphs 


A two-layer drawing of a bipartite graph
$G = (V_1, V_2, E)$ can be described by two linear orders
$<_1$ on $V_1$ and $<_2$ on $V_2$. This drawing can
be realized as follows: the vertices of $V_1$ are
placed on a line $L_1$ (also called layer) in the
order induced by $<_1$, and the vertices of $V_2$ are
placed on a second layer $L_2$ (parallel to the first
one) in the order induced by $<_2$; then, draw
a straight-line segment for each edge $e = (u_1, u_2)$
in $E$ connecting the points that represent $u_1$ and
$u_2$, respectively. A crossing is a pair of edges
$e = (u_1, u_2)$ and $f = (v_1, v_2)$ that cross in the
realization of a two-layer drawing $(G, <_1, <_2)$. It
is well known that two edges cross if and only
if $u_1 <_1 v_1$ and $v_2 <_2 u_2$ (possibly after exchanging
the names of $e$ and $f$); in other words, this
notion is a purely combinatorial object, independent
of the concrete realization of the two-layer
drawing. $\mathrm{cr}(G, <_1, <_2)$ denotes the number
of crossings in the described two-layer drawing.
In the graph drawing context, it is of course
desirable to draw graphs with few crossings.
In its simplest (yet probably most important)
form, the vertex order in one layer is fixed,
and the aim is to minimize crossings by choosing
an order of the second layer. Formally, this
means:


Problem 1 (k-OSCM)

INPUT: A simple n-vertex bipartite graph
$G = (V_1, V_2, E)$ and a linear order $<_1$ on $V_1$,
a nonnegative integer $k$ (the parameter).

OUTPUT: If possible, a linear order $<_2$ on $V_2$ such
that $\mathrm{cr}(G, <_1, <_2) \le k$. If no such order exists,
the algorithm should tell so.


Parameterized Algorithms for Drawing Graphs, Fig. 1 The running example for OSCM


Given an instance $G = (V_1, V_2, E)$ and $<_1$ of
OSCM and two vertices $u, v \in V_2$,


$$c_{uv} = \mathrm{cr}\bigl(G[N[\{u, v\}]],\ <_1 \cap\, (N(\{u, v\}) \times N(\{u, v\})),\ \{(u, v)\}\bigr).$$


Hence, the closed neighborhoods of $u$ and $v$ are
considered when assuming the ordering $u <_2 v$.
Consider the following as a running example: 


Example 1 In Fig. 1, a concrete drawing of
a bipartite graph is shown. Is this drawing optimal 
with respect to the number of crossings, assuming 
the ordering of the upper layer being fixed? At 
some points, more than two edges cross; in that 
case, a number is shown to count the crossings. 
All crossings are emphasized by a surrounding 
box. 

Let us now compute the crossing number matrix
$(c_{uv})$ for this graph.


    c_uv   a   b   c   d   e
    a      -   4   5   0   1
    b      1   -   1   0   0
    c      3   3   -   0   1
    d      3   2       -   1
    e      2   3   2   0   -

The number of crossings in the given drawing
can hence be computed as


$$c_{ab} + c_{ac} + c_{ad} + c_{ae} + c_{bc} + c_{bd} + c_{be} + c_{cd} + c_{ce} + c_{de} = 13.$$
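
The bookkeeping behind such a computation can be sketched in a few lines of Python. The graph below is a small hypothetical instance (it is not the graph of Fig. 1, which is not reproduced here), and the function names `count_crossings` and `crossing_number` are illustrative assumptions. The sketch counts crossings of a two-layer drawing directly and compares the result with the pairwise crossing numbers $c_{uv}$.

```python
def count_crossings(edges, pos1, pos2):
    """Count crossing edge pairs in a two-layer drawing.

    edges: list of (x, u) with x on layer L1 and u on layer L2.
    pos1, pos2: position of each vertex on its layer.
    """
    total = 0
    for i, (x1, u1) in enumerate(edges):
        for x2, u2 in edges[i + 1:]:
            # Two edges cross iff their endpoints appear in opposite
            # relative order on the two layers.
            if (pos1[x1] - pos1[x2]) * (pos2[u1] - pos2[u2]) < 0:
                total += 1
    return total


def crossing_number(edges, pos1, u, v):
    """c_uv: crossings between edges at u and edges at v, with u left of v."""
    nu = [x for x, w in edges if w == u]
    nv = [x for x, w in edges if w == v]
    return sum(1 for x in nu for y in nv if pos1[x] > pos1[y])


# Hypothetical instance: L1 = (1, 2, 3) in fixed order, L2 = {a, b}.
edges = [(1, "b"), (2, "a"), (3, "a"), (3, "b")]
pos1 = {1: 0, 2: 1, 3: 2}

c_ab = crossing_number(edges, pos1, "a", "b")
c_ba = crossing_number(edges, pos1, "b", "a")
print(c_ab, count_crossings(edges, pos1, {"a": 0, "b": 1}))
print(c_ba, count_crossings(edges, pos1, {"b": 0, "a": 1}))
```

With only two vertices on the free layer, the number of crossings of each ordering coincides with the corresponding single entry of the crossing number matrix.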


Key Results 


Exact exponential-time algorithms are mostly 
interesting when dealing with problems for which 
no polynomial-time algorithm is expected to 
exist. 


Theorem 1 ([6]) The decision problem corresponding
to k-OSCM is NP-complete.


In the following, to state the results, let
$G = (V_1, V_2, E)$ be an instance of OSCM, where
the ordering $<_1$ of $V_1$ is fixed.

It can be checked in polynomial time if an 
order of V2 exists that avoids any crossings. 
This observation can be based on either of the 
following graph-theoretic characterizations: 


Theorem 2 ([3]) $\mathrm{cr}(G, <_1, <_2) = 0$ if and only
if $G$ is acyclic and, for every path $(x, a, y)$ of
$G$ with $x, y \in V_1$, it holds: for all $u \in V_1$ with
$x <_1 u <_1 y$, the only edge incident to $u$ (if any)
is $(u, a)$.


The previously introduced notion is crucial due to
the following facts:


Lemma 3 $\sum_{u, v \in V_2,\ u <_2 v} c_{uv} = \mathrm{cr}(G, <_1, <_2)$.


Theorem 4 ([9]) If $k$ is the minimum number
of edge crossings in an OSCM instance
$(G = (V_1, V_2, E), <_1)$, then


$$\sum_{u, v \in V_2,\ u \neq v} \min\{c_{uv}, c_{vu}\} \;\le\; k \;<\; 1.4664 \sum_{u, v \in V_2,\ u \neq v} \min\{c_{uv}, c_{vu}\}.$$


In fact, Nagamochi also presented an approx- 
imation algorithm with a factor smaller than 
1.4664. 
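
As a rough illustration of the lower bound in Theorem 4, the following Python sketch computes the sum of pairwise minima and compares it with an optimum found by brute force over all orderings. The instance and the function names are hypothetical; the sum is taken over unordered pairs, which is an assumption about how the index set of the theorem is meant, and the brute force is only feasible for very small $V_2$.

```python
from itertools import combinations, permutations


def crossing_numbers(adj, pos1):
    """c[(u, v)]: crossings between edges at u and at v when u is left of v."""
    return {
        (u, v): sum(1 for x in adj[u] for y in adj[v] if pos1[x] > pos1[y])
        for u in adj for v in adj if u != v
    }


def pairwise_lower_bound(c, vertices):
    """Sum of min{c_uv, c_vu} over all unordered pairs {u, v}."""
    return sum(min(c[(u, v)], c[(v, u)]) for u, v in combinations(vertices, 2))


def brute_force_optimum(c, vertices):
    """Minimum of the Lemma 3 sum over all linear orders of the free layer."""
    return min(
        sum(c[(u, v)] for u, v in combinations(order, 2))
        for order in permutations(vertices)
    )


# Hypothetical instance: neighbors on L1 for each vertex of V2.
adj = {"a": [1, 3], "b": [2], "c": [1, 2, 3]}
pos1 = {1: 0, 2: 1, 3: 2}
c = crossing_numbers(adj, pos1)
lower = pairwise_lower_bound(c, sorted(adj))
best = brute_force_optimum(c, sorted(adj))
print(lower, best)
```

In this tiny instance every pair costs the same in both orders, so the lower bound is tight; in general the optimum may exceed it, but by Theorem 4 only by a bounded factor.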
Furthermore, for any $u \in V_2$ with $\deg(u) > 0$,
let $l_u$ be the leftmost neighbor of $u$ on $L_1$, and
$r_u$ be the rightmost neighbor of $u$. Two vertices
$u, v \in V_2$ are called unsuited if there exists some
$x \in N(u)$ with $l_v <_1 x <_1 r_v$, or there exists
some $x \in N(v)$ with $l_u <_1 x <_1 r_u$. Otherwise,
they are called suited. Observe that, for $\{u, v\}$
suited, $c_{uv} \cdot c_{vu} = 0$. Dujmović and Whitesides
have shown:


Lemma 5 ([5]) In any optimal ordering $<_2$ of
the vertices of $V_2$, $u <_2 v$ is found if $r_u \le_1 l_v$.


This means that all suited pairs appear in their 
natural ordering. 

This already allows us to formulate a first
parameterized algorithm for OSCM, which is
a simple search tree algorithm. In the course of
this algorithm, a suitable ordering $<_2$ on $V_2$ is
gradually constructed; when settling the ordering
between $u$ and $v$ on $V_2$, $u <_2 v$ or $v <_2 u$
is committed. A generalized instance of OSCM
therefore contains, besides the bipartite graph
$G = (V_1, V_2, E)$, a partial ordering $<_2$ on $V_2$.
A vertex $v \in V_2$ is fully committed if, for all
$u \in V_2 \setminus \{v\}$, $\{u, v\}$ is committed.

Lemma 5 allows us to state the following rule: 


RR1: For every pair of vertices $\{u, v\}$ from $V_2$
with $c_{uv} = 0$, commit $u <_2 v$. In the example, d
would be fully committed by applying RR1, since
the d-column in the crossing number matrix is all
zeros; hence, ignore d in what follows.

Algorithm 1 is a simple search tree algorithm
for OSCM that repeatedly uses Rule RR1.


Lemma 6 OSCM can be solved in time $O^*(2^k)$.


Proof Before any branching can take place, the
graph instance is reduced, so that every pair of
vertices $\{u, v\}$ from $V_2$ which is not committed
satisfies $\min\{c_{uv}, c_{vu}\} \ge 1$. Therefore, each
recursive branch reduces the parameter by at least
one. $\square$
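
The branching idea can be sketched in Python as follows. This is a simplified variant under stated assumptions, not the algorithm from the literature: it applies no reduction rules (so the $O^*(2^k)$ bound of Lemma 6 is not claimed for it), and instead of decrementing $k$ it compares the total cost of all committed pairs with the budget, which is justified by Lemma 3. The names `crossing_numbers`, `transitive_closure`, and `oscm_search` are hypothetical.

```python
from itertools import combinations


def crossing_numbers(adj, pos1):
    """c[(u, v)]: crossings between edges at u and at v when u is left of v."""
    return {
        (u, v): sum(1 for x in adj[u] for y in adj[v] if pos1[x] > pos1[y])
        for u in adj for v in adj if u != v
    }


def transitive_closure(pairs):
    """Close a set of ordered pairs (u, v), read as u <_2 v, under transitivity."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def oscm_search(vertices, c, k, committed=frozenset()):
    """Return a linear order on V2 (as a set of pairs) with at most k
    crossings, or None if no such order exists."""
    committed = transitive_closure(committed)
    # The cost of the committed pairs never decreases, so exceeding k is final.
    if sum(c[pair] for pair in committed) > k:
        return None
    for u, v in combinations(vertices, 2):
        if (u, v) not in committed and (v, u) not in committed:
            # Branch on the two ways to settle the pair {u, v}.
            for pair in ((u, v), (v, u)):
                result = oscm_search(vertices, c, k, committed | {pair})
                if result is not None:
                    return result
            return None
    return committed  # every pair is settled within the budget


adj = {"a": [3], "b": [2], "c": [1]}
pos1 = {1: 0, 2: 1, 3: 2}
c = crossing_numbers(adj, pos1)
print(oscm_search(sorted(adj), c, 0))
```

Only uncommitted, hence incomparable, pairs are branched on, so the closure never creates a contradiction in this sketch and no explicit cycle test is needed.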


It is possible to improve on this very simple
search tree algorithm. A first observation is that
it is not necessary to branch at $\{x, y\} \subseteq V_2$ with
$c_{xy} = c_{yx}$. This means two modifications to
Algorithm 1:

Algorithm 1 A search tree algorithm solving OSCM, called OSCM-ST-simple


Require: a bipartite graph $G = (V_1, V_2, E)$, an integer $k$, a linear ordering $<_1$ on $V_1$, a partial ordering $<_2$ on $V_2$
Ensure: YES iff the given OSCM instance has a solution


 1: repeat
 2:    Exhaustively apply the reduction rules, adjusting $<_2$ and $k$ accordingly.
 3:    Determine the vertices whose order is settled by transitivity and adjust $<_2$ and $k$ accordingly.
 4: until there are no more changes to $<_2$ and to $k$
 5: if $k < 0$ or $<_2$ contains both $(x, y)$ and $(y, x)$ then
 6:    return NO.
 7: else if $\exists \{x, y\} \subseteq V_2$ : neither $x <_2 y$ nor $y <_2 x$ is settled then
 8:    if OSCM-ST-simple($G$, $k - 1$, $<_1$, $<_2 \cup \{(x, y)\}$) then
 9:       return YES
10:    else
11:       return OSCM-ST-simple($G$, $k - 1$, $<_1$, $<_2 \cup \{(y, x)\}$)
12:    end if
13: else
14:    return YES
15: end if
• The branching condition (line 7) should exclude pairs with $c_{xy} = c_{yx}$.
• The final return (line 14) should arbitrarily commit some
  $\{x, y\} \subseteq V_2$ that are not yet committed and
  recurse; only if all $\{x, y\} \subseteq V_2$ are committed,
  YES is to be returned.

These modifications immediately yield an
$O^*(1.6182^k)$ algorithm for OSCM. This is also
the core of the algorithm proposed by Dujmović
and Whitesides [5]. There, more details are
discussed, as, for example:

• How to efficiently calculate all the crossing
  numbers $c_{xy}$ in a preprocessing phase.
• How to integrate branch and cut elements
  in the algorithm that are surely helpful from
  a practical perspective.
• How to generalize the algorithm for instances
  that allow integer weights on the edges
  (multiple edges).

Fig. 2 A search tree example for OSCM


Further improvements are possible if one gives
a deeper analysis of local patterns $\{x, y\} \subseteq V_2$
such that $c_{xy} c_{yx} \le 2$. This way, it has been
shown:


Theorem 7 ([4]) OSCM can be solved in time
$O^*(1.4656^k)$.


A possible run of the improved search tree algorithm
is displayed in Fig. 2, with the (optimal)
outcome shown in Fig. 3.

Parameterized Algorithms for Drawing Graphs, Fig. 3 An optimal solution to the example instance


Variants and Related Problems have been discussed
in the literature.


1. Change the goal of the optimization: minimize
the number of edges involved in crossings
(ONE-LAYER PLANARIZATION (OLP)).
As observed in [7, 10], Theorem 2 almost
immediately leads to an $O^*(3^k)$ algorithm for
OLP that was subsequently improved down to
$O^*(2^k)$ in [10].

2. One could allow more degrees of freedom 
by considering two (or more) layer assign- 
ments at the same time. For both the cross- 
ing minimization and the planarization vari- 
ants, parameterized algorithms are reported in 
[3, 7, 10]. 

3. One can consider other additional constraints 
on the drawings or the admissible orderings; 
in [8], parameterized algorithms for two-layer 
assignment problems are discussed where the 
admissible orderings are restricted by binary 
trees. 


Applications 


Besides seeing the question of drawing bipartite 
graphs as an interesting problem in itself, e.g., 
for nice drawings of relational diagrams, this 
question quite naturally shows up in the so-called 
Sugiyama approach to hierarchical graph draw- 
ing, see [12]. This very popular approach tackles 
the problem of laying out a hierarchical graph in 
three phases: (1) cycle removal, (2) assignment
of vertices to layers, and (3) ordering of the vertices
within each layer. The last phase is usually performed
in a sweep-line fashion, intermediately solving 
many instances of OSCM. The third variant in 
the discussion above has important applications 
in computational biology. 


Open Problems 


As with all exponential-time algorithms, it is 
always a challenge to further improve on the 
running times of the algorithms or to prove lower 
bounds on those running times under reason- 
able complexity theoretic assumptions. Let us 
notice that the tacit assumptions underlying the 
approach by parameterized algorithmics are well 
met in this application scenario: e.g., one would 
not accept drawings with many crossings any- 
ways (if such a situation is encountered in prac- 
tice, one would switch to another way of rep- 
resenting the information); so, one can safely 
assume that the parameter is indeed small. 

This is also true for other NP-hard subproblems
that relate to the Sugiyama approach.
However, no easy solutions should be expected. 
For example, the DIRECTED FEEDBACK ARC 
SET PROBLEM [1] that is equivalent to the first 
phase is not known to admit a nice parameterized 
algorithm, see [2].


Experimental Results


Suderman [10] reports on experiments with
nearly all problem variants discussed above; see
also [11] for a more accessible presentation of
some of the experimental results.


URL to Code 


Suderman presents several JAVA applets related 
to the problems discussed in this article, see 
http://cgm.cs.mcgill.ca/~msuder/.


Problem Definition


Parameterized strings, or p-strings, are strings
that contain both ordinary symbols from an alphabet
$\Sigma$ and parameter symbols from an alphabet
$\Pi$. Two equal-length p-strings $s$ and $s'$ are a
parameterized match, or p-match, if one p-string
can be transformed into the other by applying a
one-to-one function that renames the parameter
symbols. The following example of a p-match is
one with both ordinary and parameter symbols.
The ordinary symbols are in lowercase and the
parameter symbols are in uppercase:


s  = AbAbCAdbACdd
s' = DbDbEDdbDEdd


In some of the problems to be considered, it
will be sufficient to solve for p-strings in which
all symbols are parameter symbols, as this is
the more difficult part of the problem. In other
words, the case in which $\Sigma = \emptyset$. In this case, the
definition can be reformulated so that $s$ and $s'$ are
a p-match if there exists a bijection $\pi : \Pi_s \to \Pi_{s'}$
such that $\pi(s) = s'$, where $\pi(s)$ is the renaming
of each character of $s$ via $\pi$.

The following problems will be considered.
Parameterized matching: given a parameterized
pattern $p$ of length $m$ and a parameterized text $t$,
find all locations $i$ of $t$ for
which $p$ p-matches $t_i \ldots t_{i+m-1}$, where $m = |p|$.
The same problem is also considered in two
dimensions. Approximate parameterized matching:
find all substrings of a parameterized text $t$
that are approximate parameterized matches of
a parameterized pattern $p$ (to be fully defined
later).


Key Results 


Baker [4] introduced parameterized matching in 
the framework of her seminal work on discover- 
ing duplicate code within large programs for the 
sake of code minimization. An example of two 
code fragments that p-match taken from the X 
Windows system can be found in [4]. 


Parameterized Suffix Trees 

In [4] and in the follow-up journal versions [6,7], 
a novel method was presented for parameterized 
matching by constructing parameterized suffix 
trees. The advantage of the parameterized suf- 
fix tree is that it supports indexing, i.e., one 
can preprocess a text and subsequently answer 


Parameterized Pattern Matching 


parameterized queries p in O(|p|) time. In order 
to achieve parameterized suffix trees, it is nec- 
essary to introduce the concept of a predecessor 
string. A predecessor string of a string s has at 
each location i the distance between i and the 
location containing the previous appearance of 
the symbol. The first appearance of each symbol 
is replaced with a 0. For example, the predecessor 
string of aabbaba is 0,1,0,1,3,2,2. A simple and 
well-known fact is that: 


Observation 1 ([7]) $s$ and $s'$ p-match if and only
if they have the same predecessor string.


Notice that this implies transitivity of parame- 
terized matching, since if s and s’ p-match and 
s’ and s” p-match, then, by the observation, s 
and s’ have the same predecessor string and, 
likewise, s’ and s” have the same predecessor 
string. This implies that s and s” have the same 
predecessor string and hence, by the observation, 
p-match. 

Moreover, one may also observe that if r is a 
prefix of s, then the predecessor string of r, by 
definition, is exactly the |r|-length prefix of the 
predecessor string of s. Hence, similar to regular 
pattern matching, a parameterized pattern $p$
p-matches at location $i$ of $t$ if and only if the
$|p|$-length predecessor string of $p$ is equal to the
$|p|$-length prefix of the predecessor string of the
suffix $t_i \ldots t_n$. Combining these observations, it is
natural to do as follows: create a (parameterized 
suffix) tree with a leaf for each suffix where the 
path from the root to the leaf corresponding to 
a given suffix will have its predecessor string 
labeling the path. Branching in the parameterized 
suffix tree, as with suffix trees, occurs according 
to the labels of the predecessor strings. See [4, 6, 
7] for an example. 
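
The predecessor-string characterization can be illustrated with a minimal Python sketch. It assumes that all symbols are parameter symbols and uses a naive quadratic-time scan rather than a parameterized suffix tree; the function names `predecessor_string` and `p_match_positions` are hypothetical.

```python
def predecessor_string(s):
    """Distance to the previous occurrence of each symbol, 0 at first occurrences."""
    last_seen = {}
    result = []
    for i, symbol in enumerate(s):
        result.append(i - last_seen[symbol] if symbol in last_seen else 0)
        last_seen[symbol] = i
    return result


def p_match_positions(pattern, text):
    """All 0-based locations i where pattern p-matches text[i:i+m].

    The predecessor string of each window is recomputed from scratch,
    because the predecessor string of a suffix is not simply a suffix
    of the predecessor string of the whole text.
    """
    m = len(pattern)
    target = predecessor_string(pattern)
    return [
        i for i in range(len(text) - m + 1)
        if predecessor_string(text[i:i + m]) == target
    ]


print(predecessor_string("aabbaba"))
print(p_match_positions("aba", "xyxyx"))
```

The first call reproduces the predecessor string of aabbaba given in the text; the second reports every window of the text that p-matches the pattern.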

Baker’s method essentially mimics the Mc- 
Creight suffix tree construction [18]. However, 
while the suffix tree and the parameterized suffix 
tree are very similar, there is a slight hitch. A 
strong component of the suffix tree construction 
is the suffix link. This is used for the construction 
and, sometimes, for later pattern searches. The 
suffix link is based on the distinct right context 
property, which does not hold for the parameter- 
ized suffix tree. In fact, the node that is pointed
to by the suffix link may not even exist. The main
parts of [6,7] are dedicated to circumventing this 
problem. 

In [7] Baker added the notion of “bad” suf- 
fix links, which point to the vertex just above, 
i.e., closer to the root than the desired place, 
and of updating them with a lazy evaluation 
when they are used. The algorithm runs in time
$O(n|\Pi| \log|\Sigma|)$. In [6] (which is chronologically
later than [7] despite being the first to appear)
Baker changed the definition of “bad” suffix links
to point to just below the desired place. This
turns out to have nice properties, and one can use
more sophisticated data structures to improve the
construction time to $O(n(|\Pi| + \log|\Sigma|))$.

Kosaraju [16] made a careful analysis of 
Baker’s properties utilized in the algorithm 
of [6] which suffer from the $|\Pi|$ factor. He
pointed out two sources for this large factor. He
handled these two issues by using a concatenable
queue and maintaining it in a lazy manner.
This is sufficient to reduce the $|\Pi|$ factor to
a $\log|\Pi|$ factor, yielding an algorithm of time
$O(n(\log|\Pi| + \log|\Sigma|))$.

Obviously if the alphabet or symbol set is 
large, the construction time may be O(n logn). 
Cole and Hariharan [9] showed how to construct 
the parameterized suffix trees in randomized 
O(n) time for alphabets and parameters 
taken from a polynomially sized range, e.g.,
$[1, \ldots, m^c]$. They did this by adding additional
nodes to the tree in a back-propagation manner
which is reminiscent of fractional cascading.
They showed that this adds only $O(n)$ nodes and
allows the updating of the missing suffix links. 
However, this causes other problems and one 
may find the details of how this is handled in 
their paper. 


More Methods for Parameterized Matching

Obviously the parameterized suffix tree 
efficiently solves the parameterized matching 
problem. Nevertheless, a couple of other 
results on parameterized matching are worth 
mentioning.

First, in [6] it was shown how to construct the
parameterized suffix tree for the pattern and then
to run the parameterized text through it, giving an
algorithm with $O(m)$ space instead of $O(n)$.

Amir et al. [2] presented a simple method 
to solve the parameterized matching problem 
by mimicking the algorithm of Knuth, Morris, 
and Pratt. Their algorithm works in
$O(n \cdot \min(\log|\Pi|, m))$ time independent of the
alphabet size ($|\Sigma|$). Moreover, they proved that the log
factor cannot be avoided for large symbol sets.

In [5] parameterized matching was solved 
with a Boyer-Moore type algorithm. In [10] 
the problem was solved with a Shift-Or 
type algorithm. Both handle the average case 
efficiently. In [10] emphasis was also put on 
the case of multiple parameterized matching, 
which was previously solved in [14] with an 
Aho-Corasick automaton-style algorithm. 


Two-Dimensional Parameterized Matching 
Two-dimensional parameterized matching arises 
in applications of image searching; see [13] for 
more details. Two-dimensional parameterized 
matching is the natural extension of parameterized
matching where one seeks p-matches
of a two-dimensional parameterized pattern $p$
within a two-dimensional parameterized text $t$.
It must be pointed out that classical methods 
for two-dimensional pattern matching, such as 
the L-suffix tree method, fail for parameterized 
matching. This is because known methods tend to 
cut the text and pattern into pieces to avoid going 
out of boundaries of the pattern. This is fine 
because each pattern piece can be individually 
evaluated (checked for equality) to a text piece. 
However, in parameterized matching, there is a 
strong dependency between the pieces. 

In [1] an innovative solution for the problem 
was given based on a collection of lineariza- 
tions of the pattern and text with the property 
to be currently described. Consider a lineariza- 
tion. Two elements with the same character, say 
“a,” in the pattern are defined to be neighbors 
if there is no other “a” between them in this 
linearization. Now take all the “a”s of the pattern
and create a graph $G_a$ with the “a”s as the
nodes and edges between two of them if they are
neighbors in some linearization. We say that two
“a”s are chained if there is a path from one
to the other in $G_a$. Applying one-dimensional
parameterized matching on these linearizations 
ensures that any two elements that are chained 
will be evaluated to map to the same text value 
(the parameterized property). A collection of lin- 
earizations has the fully chained property if every 
two locations in p with the same character are 
chained. It was shown in [1] that one can obtain 
a collection of log m linearizations that is fully 
chained and that does not exceed pattern bound- 
ary limits. Each such linearization is solved with 
a convolution-based pattern-matching algorithm. 
This takes $O(n^2 \log m)$ time for each linearization,
where the text size is $n^2$. Hence, overall the
time is $O(n^2 \log^2 m)$.

A different solution was proposed in [13], 
where it was shown that it is possible to solve the
problem in $O(n^2 + m^{2.5}\,\mathrm{polylog}\, m)$ time, where the
text size is $O(n^2)$ and the pattern size is $O(m^2)$.
Clearly, this is more efficient for large texts. 


Approximate Parameterized Matching 

Our last topic relates to parameterized matching 
in the presence of errors. Errors occur in various 
applications and it is natural to consider param- 
eterized matching with the Hamming distance 
metric or the edit distance metric. 

In [8] the parameterized matching problem 
was considered in conjunction with the edit dis- 
tance. Here the definition of edit distance was 
slightly modified so that the edit operations are 
defined to be insertion, deletion, and parame- 
terized replacements, i.e., the replacement of a 
substring with a string that p-matches it. An 
algorithm for finding the “parameterized edit dis- 
tance” of two strings was devised whose effi- 
ciency is close to the efficiency of the algorithms 
for computing the classical edit distance. 

However, it turns out that the operation of 
parameterized replacement relaxes the problem 
to an easier problem. The reason that the prob- 
lem becomes easier is that two substrings that 
participate in two parameterized replacements are 
independent of each other (in the parameterized 
sense).

A more rigid, but more realistic, definition for
the Hamming distance variant was given in [3].
For a pair of equal-length strings $s$ and $s'$ and
a bijection $\pi$ defined on the alphabet of $s$, the
$\pi$-mismatch is the Hamming distance between
the image under $\pi$ of $s$ and $s'$. The minimal
$\pi$-mismatch over all bijections $\pi$ is the approximate
parameterized match. The problem considered in
[3] is to find for each location $i$ of a text $t$ the
approximate parameterized match of a pattern $p$
with the substring beginning at location $i$. In [3]
the problem was defined and linear-time algorithms
were given for the case where the pattern is
binary or the text is binary. However, this solution
does not carry over to larger alphabets.
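
To make the definition concrete, the following Python sketch computes the minimal $\pi$-mismatch of two equal-length strings by brute force. It assumes, for simplicity, that $\pi$ ranges over all bijections of the combined symbol set of both strings onto itself, and it enumerates all of them, so it is only usable for very small alphabets. The function name `min_pi_mismatch` is hypothetical, and this is not one of the algorithms of [3].

```python
from itertools import permutations


def min_pi_mismatch(s, s2):
    """Minimal Hamming distance between pi(s) and s2 over all bijections pi."""
    if len(s) != len(s2):
        raise ValueError("strings must have equal length")
    symbols = sorted(set(s) | set(s2))
    best = len(s)
    for image in permutations(symbols):
        pi = dict(zip(symbols, image))  # one candidate renaming
        mismatches = sum(1 for a, b in zip(s, s2) if pi[a] != b)
        best = min(best, mismatches)
    return best


print(min_pi_mismatch("abab", "cdcd"))
print(min_pi_mismatch("ababa", "cdcdd"))
```

A result of zero means the two strings p-match exactly; a positive result counts the positions that no single renaming can reconcile.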

Unfortunately, under this definition, the 
methods for classical string matching with 
errors for the Hamming distance, also known 
as pattern matching with mismatches, seem to 
fail. Following is an outline of a classical method 
[17] for pattern matching with mismatches that 
uses suffix trees. 

The pattern is compared separately with each
suffix of the text, beginning at locations
$1 \le i \le n - m + 1$. Using a suffix tree of the text and
precomputed lowest common ancestor information
(which can be computed once in linear time [11]),
one can find the longest common prefix of the 
pattern and the corresponding suffix (in constant 
time). There must be a mismatch immediately af- 
terwards. The algorithm jumps over the mismatch 
and repeats the process, taking into consideration 
the offsets of the pattern and suffix. 
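
The jumping scheme for ordinary (non-parameterized) strings can be sketched as follows. This Python sketch replaces the constant-time longest-common-prefix query (suffix tree plus lowest common ancestors) by a naive character scan, so it shows only the control flow and not the running time of the classical method; the function names `lcp` and `mismatches_at` are hypothetical.

```python
def lcp(p, i, t, j):
    """Length of the longest common prefix of p[i:] and t[j:] (naive scan)."""
    k = 0
    while i + k < len(p) and j + k < len(t) and p[i + k] == t[j + k]:
        k += 1
    return k


def mismatches_at(p, t, start, limit):
    """Mismatch positions (in p) of p against t[start:start+len(p)].

    Stops early once more than `limit` mismatches have been found.
    Assumes start + len(p) <= len(t).
    """
    found = []
    i = 0
    while i < len(p):
        i += lcp(p, i, t, start + i)  # jump over the matching stretch
        if i < len(p):
            found.append(i)           # a mismatch must follow immediately
            if len(found) > limit:
                break
            i += 1                    # jump over the mismatch and continue
    return found


print(mismatches_at("abcab", "abxabzab", 0, 2))
```

Each loop iteration consumes one matching stretch and at most one mismatch, so with constant-time prefix queries a location with at most k mismatches is handled in O(k) steps.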

When attempting to apply this technique to 
a parameterized suffix tree, it fails. To illustrate 
this, consider the first matching substring (up un- 
til the first error) and the next matching substring 
(after the error). Both of these substrings p-match 
the substring of the text that they are aligned 
with. However, it is possible that combined they 
do not form a p-match. See the example below. 
In the example abab p-matches cdcd followed 
by a mismatch and subsequently followed by 
abaa p-matching efee. However, different $\pi$'s are
required for the local p-matches. This example
also emphasizes why the definition of [8] is a
simplification. Specifically, each local p-matching
substring is one replacement, i.e., abab with cdcd
is one replacement and abaa with efee is one
more replacement. However, the definition of
[3] captures the globality of the parameterized
matching, not allowing, in this case, abab to
p-match to two different substrings.


p = ababaabaa...
t = ...cdcddefee...


In [12] the problem of parameterized matching
with $k$ mismatches was considered. The parameterized
matching problem with $k$ mismatches
seeks all locations $i$ in text $t$ where the minimal
$\pi$-mismatch between $p$ and $t_i \ldots t_{i+m-1}$ has at
most $k$ mismatches. An $O(nk^{1.5} + mk \log m)$
time algorithm was presented in [12]. At the base
of the algorithm, i.e., for the case where $|p| =
|t| = m$, an $O(m + k^{1.5})$ algorithm is used based
on maximum matching algorithms. Then the al- 
gorithm uses a doubling scheme to handle the 
growing distance between potential parameter- 
ized matches (with at most k mismatches). Also 
shown in [12] is a strong relationship between 
maximum matching algorithms in sparse graphs 
and parameterized matching with k errors. 

The rigid, but more realistic, definition for the 
Hamming distance version given in [3] can be 
naturally extended to the edit distance. Lately, it 
was shown that this problem is nondeterministic 
polynomial-time complete [15]. 


Applications 


Parameterized matching has applications in code 
duplication detection in programming languages, 
in homework plagiarism detection, and in image 
processing, among others [1,4].


Problem Definition


Much research has been devoted to finding 
classes of propositional formulas in conjunctive 
normal form (CNF) for which the recognition 
problem as well as the propositional satisfiability 
problem (SAT) can be decided in polynomial 
time. Some of these classes form infinite chains
$C_1 \subset C_2 \subset \cdots$ such that every CNF formula
is contained in some $C_k$ for $k$ sufficiently
large. Such classes are typically of the form
$C_k = \{F \in \mathrm{CNF} : \pi(F) \le k\}$, where $\pi$ is
a computable mapping that assigns to CNF
formulas $F$ a non-negative integer $\pi(F)$; we
call such a mapping a satisfiability parameter.
Since SAT is an NP-complete problem (actually,
the first problem shown to be NP-complete [1]),
we must expect that, the larger $k$, the longer the
worst-case running times of the polynomial-time
algorithms that recognize and decide
satisfiability of formulas in $C_k$. Hence there
is a certain tradeoff between the generality
of classes and the performance guarantee for
the corresponding algorithms. Szeider [12]
initiates a broad investigation of this tradeoff
in the conceptual framework of parameterized
complexity [2, 3, 6]. This investigation draws
attention to satisfiability parameters $\pi$ for which
the following holds: recognition and satisfiability
decision of formulas $F$ with $\pi(F) \le k$ can be
carried out in uniform polynomial time, that is,
by algorithms with running time bounded by
a polynomial whose order is independent of
$k$ (albeit, possibly involving a constant factor
that is exponential in $k$). If a satisfiability
parameter $\pi$ allows satisfiability decision in
uniform polynomial time, we say that SAT is
fixed-parameter tractable with respect to $\pi$.


Satisfiability Parameters Based on Graph 
Invariants 

One can define satisfiability parameters by means 
of certain graphs associated with CNF formulas. 
The primal graph of a CNF formula is the graph 
whose vertices are the variables of the formula; 
two variables are joined by an edge if the vari- 
ables occur together in a clause. The incidence 
graph of a CNF formula is the bipartite graph 
whose vertices are the variables and clauses of 
the formula; a variable and a clause are joined by 
an edge if the variable occurs in the clause. 
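
Both graphs are easy to build from a clause list. The following Python sketch assumes a DIMACS-style encoding (a clause is a list of nonzero integers, with a negative integer denoting a negated variable) and represents clauses by their index in the formula; the function names are hypothetical.

```python
from itertools import combinations


def primal_graph(formula):
    """Edges {x, y} between variables that occur together in some clause."""
    edges = set()
    for clause in formula:
        variables = sorted({abs(literal) for literal in clause})
        edges.update(combinations(variables, 2))
    return edges


def incidence_graph(formula):
    """Edges (variable, clause index) for every occurrence of a variable."""
    return {
        (abs(literal), index)
        for index, clause in enumerate(formula)
        for literal in clause
    }


# (x1 or not x2) and (x2 or x3) and (not x1)
formula = [[1, -2], [2, 3], [-1]]
print(sorted(primal_graph(formula)))
print(sorted(incidence_graph(formula)))
```

The polarity of a literal plays no role in either graph; only the occurrence of the variable matters.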


Satisfiability Parameters Based on Backdoor Sets

The concept of backdoor sets [13] gives rise 
to several interesting satisfiability parameters. 
Let C be a class of CNF formulas. A set B of 
variables of a CNF formula $F$ is a strong
$C$-backdoor set if for every partial truth assignment
$\tau : B \to \{\text{true}, \text{false}\}$, the restriction of $F$ to $\tau$
belongs to $C$. Here, the restriction of $F$ to $\tau$ is the
CNF formula obtained from $F$ by removing all
clauses that contain a literal that is true under $\tau$
and by removing from the remaining clauses all
literals that are false under $\tau$.
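
The definition can be checked directly for a given candidate set by enumerating all assignments to it. The Python sketch below does this for the class of Horn formulas (at most one positive literal per clause), again assuming DIMACS-style integer literals; the function names are hypothetical, and the enumeration takes time exponential in the size of the candidate set. It verifies a given set and does not search for a small one.

```python
from itertools import product


def restrict(formula, tau):
    """Restriction of a CNF formula to a partial assignment tau (var -> bool)."""
    result = []
    for clause in formula:
        # Drop clauses containing a literal that is true under tau.
        if any(abs(l) in tau and tau[abs(l)] == (l > 0) for l in clause):
            continue
        # Drop the literals that are false under tau from the other clauses.
        result.append([l for l in clause if abs(l) not in tau])
    return result


def is_horn(formula):
    """A formula is Horn if every clause has at most one positive literal."""
    return all(sum(1 for l in clause if l > 0) <= 1 for clause in formula)


def is_strong_backdoor(formula, backdoor, in_class=is_horn):
    """Check that every assignment to the backdoor variables lands in the class."""
    variables = sorted(backdoor)
    for values in product([False, True], repeat=len(variables)):
        tau = dict(zip(variables, values))
        if not in_class(restrict(formula, tau)):
            return False
    return True


# (x1 or x2 or x3) and (not x1 or x2) and (not x2 or not x3)
formula = [[1, 2, 3], [-1, 2], [-2, -3]]
print(is_strong_backdoor(formula, {1, 2}))
print(is_strong_backdoor(formula, {1}))
```

Here the first clause has three positive literals, so assigning only $x_1$ cannot always make the restriction Horn, while assigning both $x_1$ and $x_2$ always does.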
Key Results


Theorem 1 (Gottlob, Scarcello, and Sideri [4]) 
SAT is fixed-parameter tractable with respect to 
the treewidth of primal graphs. 


Several satisfiability parameters that generalize 
the treewidth of primal graphs, such as the 
treewidth and clique-width of incidence graphs, 
have been studied [5, 10, 12]. 

The maximum deficiency of a CNF formula F 
is the number of clauses remaining exposed by 
a maximum matching of the incidence graph of F. 
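
Following this definition, the maximum deficiency can be computed with any bipartite matching routine. The Python sketch below uses a simple augmenting-path matching, which is an assumption made for brevity and not the method of [11]; clauses are again lists of DIMACS-style integer literals, and the function name `maximum_deficiency` is hypothetical.

```python
def maximum_deficiency(formula):
    """Number of clauses left exposed by a maximum matching of the incidence graph."""
    clause_vars = [sorted({abs(l) for l in clause}) for clause in formula]
    matched_clause = {}  # variable -> index of the clause it is matched to

    def try_match(index, seen):
        # Search for an augmenting path starting at the clause `index`.
        for var in clause_vars[index]:
            if var in seen:
                continue
            seen.add(var)
            if var not in matched_clause or try_match(matched_clause[var], seen):
                matched_clause[var] = index
                return True
        return False

    matched = sum(1 for index in range(len(clause_vars)) if try_match(index, set()))
    return len(clause_vars) - matched


# (x1) and (not x1) and (x1 or x2): three clauses, only two variables.
print(maximum_deficiency([[1], [-1], [1, 2]]))
```

In the example, at most two of the three clauses can be matched to distinct variables, so one clause remains exposed.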


Theorem 2 (Szeider [11]) SAT is fixed-parameter
tractable with respect to maximum
deficiency.


A CNF formula is minimal unsatisfiable if it 
is unsatisfiable but removing any of its clauses 
makes it satisfiable. Recognition of minimal un- 
satisfiable formulas is DP-complete [9]. 


Corollary 1 (Szeider [11]) Recognition of 
minimal unsatisfiable CNF formulas is fixed- 
parameter tractable with respect to the difference 
between the number of clauses and the number 
of variables. 


Theorem 3 (Nishimura, Ragde, and Szei- 
der [7]) SAT is fixed-parameter tractable with 
respect to the size of strong Horn-backdoor sets 
and with respect to the size of strong 2CNF- 
backdoor sets. 


Applications 


Satisfiability provides a powerful and general 
formalism for solving various important prob- 
lems including hardware and software verifica- 
tion and planning. Instances stemming from ap- 
plications usually contain a “hidden structure” 
(see, e.g., [13]). The satisfiability parameters con- 
sidered above are designed to make this hidden 
structure explicit in the form of small values for
the parameter. Thus, satisfiability parameters are
a way to make the hidden structure accessible to 
an algorithm. 


Open Problems 


A new line of research is concerned with the iden-
tification of further satisfiability parameters that
allow a fixed-parameter tractable SAT decision,
are more general than the known parameters, and
apply well to real-world problem instances.


Cross-References 


Maximum Matching 
Treewidth of Graphs 


Recommended Reading 


1. Cook SA (1971) The complexity of theorem-proving 
procedures. In: Proceedings of the 3rd annual sym- 
posium on theory of computing, Shaker Heights, pp 
151-158 

2. Downey RG, Fellows MR (1999) Parameterized com- 
plexity, Monographs in computer science. Springer, 
Berlin 

3. Flum J, Grohe M (2006) Parameterized complexity 
theory. Texts in theoretical computer science. An 
EATCS series, vol XIV. Springer, Berlin 

4. Gottlob G, Scarcello F, Sideri M (2002) Fixed-
parameter complexity in AI and nonmonotonic rea-
soning. Artif Intell 138:55-86

5. Gottlob G, Szeider S (2007) Fixed-parameter algo- 
rithms for artificial intelligence, constraint satisfac- 
tion, and database problems. Comput J Spec Issue 
Parameterized Complex Adv Access 

6. Niedermeier R (2006) Invitation to fixed-parameter 
algorithms, Oxford lecture series in mathematics and 
its applications. Oxford University Press, Oxford 

7. Nishimura N, Ragde P, Szeider S (2004) Detect- 
ing backdoor sets with respect to Horn and binary 
clauses. In: Informal proceedings of SAT 2004, 7th 
international conference on theory and applications 
of satisfiability testing, Vancouver, 10-13 May 2004, 
pp 96-103 

8. Nishimura N, Ragde P, Szeider S (2007) Solving SAT 
using vertex covers. Acta Inf 44(7-8):509-523 

9. Papadimitriou CH, Wolfe D (1988) The complexity 
of facets resolved. J Comput Syst Sci 37:2-13 




10. Samer M, Szeider S (2007) Algorithms for propo- 
sitional model counting. In: Proceedings of LPAR 
2007, 14th international conference on logic for 
programming, artificial intelligence and reasoning, 
Yerevan, 15-19 Oct 2007. Lecture notes in computer 
science, vol 4790. Springer, Berlin, pp 484-498 

11. Szeider S (2004) Minimal unsatisfiable formulas 
with bounded clause-variable difference are fixed- 
parameter tractable. J Comput Syst Sci 69:656-674 

12. Szeider S (2004) On fixed-parameter tractable pa- 
rameterizations of SAT. In: Giunchiglia E, Tacchella 
A (eds) Theory and applications of satisfiability, 6th 
international conference, SAT 2003, selected and re- 
vised papers. Lecture notes in computer science, vol 
2919. Springer, Berlin, pp 188-202 

13. Williams R, Gomes C, Selman B (2003) On the 
connections between backdoors, restarts, and heavy- 
tailedness in combinatorial search. In: Informal pro- 
ceedings of SAT 2003 sixth international conference 
on theory and applications of satisfiability testing, 
5-8 May 2003, S. Margherita Ligure — Portofino, 
pp 222-230 


Parity Games 


Tim A.C. Willemse and Maciej Gazda 
Department of Mathematics and Computer 
Science, Eindhoven University of Technology, 
Eindhoven, The Netherlands 


Keywords


Automata; Computer-aided verification; Deter-
minacy; Infinite duration games; Perfect infor-
mation; Two-player games


Years and Authors of Summarized 
Original Work 


1991; Emerson, Jutla 
1991; Mostowski 


Problem Definition 


A parity game is an infinite duration game, played
by players odd and even, denoted by □ and ◇,
respectively, on a directed, finite graph. Through-
out this note, we let ○ denote an arbitrary player
and we write ○̄ for ○'s opponent; i.e., ◇̄ = □
and □̄ = ◇.


Definition 1 (Parity game) A parity game is a
tuple (V, E, Ω, (V_◇, V_□)), where


• V is a set of vertices, partitioned into a set V_◇
  of vertices owned by player ◇ and a set V_□
  of vertices owned by player □,

• E ⊆ V × V is a total edge relation,

• Ω: V → ℕ is a priority function that assigns
  priorities to vertices.


The graph (V, E) underlying a parity game is
often referred to as the arena. Parity games are
depicted as graphs with diamond-shaped vertices
representing vertices owned by player ◇ and
box-shaped vertices representing those owned by
player □. The priorities associated with vertices
are written inside vertices; see the game depicted
in Fig. 1.

Imagine the following game, played on an
arena. One starts by placing a token on a ver-
tex. Then, players perpetually move this token
according to a single simple rule: if the token is
on some vertex v ∈ V_○, player ○ gets to move the
token to an adjacent vertex. The infinite sequence
of vertices visited this way is referred to as a play,
and the parity of the lowest priority (associated
with the vertices) that occurs infinitely often on
the play defines its winner: player ◇ wins if and
only if this priority is even.
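
For plays that eventually repeat a fixed cycle forever, the winner can be read off directly, since exactly the vertices on the cycle are visited infinitely often. The Python sketch below assumes such a "lasso" representation (a finite prefix followed by a repeated cycle); the function name and the encoding of priorities as a dictionary are illustrative assumptions.

```python
def winner_of_lasso(priority, prefix, cycle):
    """Winner of the play prefix + cycle + cycle + ...

    Only vertices on the cycle occur infinitely often, so the prefix does
    not influence the outcome.
    """
    if not cycle:
        raise ValueError("an infinite play needs a nonempty cycle")
    lowest = min(priority[v] for v in cycle)
    return "even" if lowest % 2 == 0 else "odd"


priority = {"u": 2, "v": 1, "w": 0}
first = winner_of_lasso(priority, ["w"], ["u", "v"])
second = winner_of_lasso(priority, ["w", "v"], ["u"])
```

The vertex w with priority 0 appears only in the prefix in both plays, so it never decides the winner.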

It does not immediately follow from the above 
notion of winning that there is always a player 
that can win all plays that start in a given vertex. 
The most elementary of all problems concerning 
parity games is thus as follows. 


Problem 1 Are parity games determined? That 
is, for a given parity game and some vertex in that 
game, is there always a way in which one of the 
players can play such that, regardless of how her 
opponent moves, the resulting plays are won by 
her? 


The determinacy problem can be formalized
by describing the choices players make when
they engage in a play; that is, we can formally
capture their strategies. More specifically, a strat-
egy for player ○ determines for a vertex v be-
longing to that player which of its adjacent ver-
tices the token will be moved to next once the
game play moves the token to v. Of course,
such a decision can be based on the history of
the play so far. We write vE to denote the set
{w ∈ V | (v, w) ∈ E}. A strategy for player ○
is thus described adequately by a partial function
σ: V*V_○ → V satisfying that for all sequences
v_1 … v_n ∈ V* for which σ is defined, we have
σ(v_1 … v_n) ∈ v_n E. For a given strategy σ, a
play p conforms to σ if for all finite prefixes q
of p, whenever σ is defined for q, also q σ(q) is
a prefix of p. We say that a strategy σ for player
○ is winning from a vertex v if and only if ○ is
the winner of every play that starts in v and that
conforms to σ. The problem of determinacy can
then be rephrased as follows: for a given vertex v in
a game, is there always a player with a winning
strategy from v?


Parity Games, Fig. 1 Parity games depicted as graphs with priorities associated with vertices, written inside vertices

A vertex is won by some player if that player
has a winning strategy from that vertex. The ver-
tices won by player ◇ and player □ are denoted
W_◇ and W_□, respectively; determining these sets
is typically referred to as solving a game. This
immediately leads to the second fundamental
question:


Problem 2 Can we compute W_◇ and W_□?


Solving the above computational problem may 
not necessarily involve computing the winning 
strategies for both players. Nonetheless, the win- 
ning strategies themselves have their own merit, 
if only that they serve as certificates in proving 
that the winning sets for both players are indeed 
just that. This leads to the third problem of 
interest: 


Problem 3 Can we, for all vertices that are won 
by one of the players, compute winning strategies 
for that player? 


Key Results 


The first claim, stated below, positively answers 
the determinacy problem for parity games. 


Theorem 1 Parity games are determined: for
every vertex either player ◇ or player □ has a
winning strategy from that vertex.


Determinacy of parity games already follows
from a general result due to Martin [9], who
showed that Borel games (which subsume parity
games) are determined. The proof of the latter re-
sult employs strategies that require infinite mem-
ory. A deep result by Emerson and Jutla [1],
found independently by Mostowski [11], states
that parity games are in fact memoryless deter-
mined: if a player has a winning strategy from
a vertex v, she also has a memoryless strategy
that is winning for her; that is, player ○ always
has a strategy σ for which σ(pv) = σ(p′v) for
all sequences pv and p′v for which it is defined
and which can thus be represented by a partial
function σ′: V_○ → V.

A simpler and constructive proof of memo-
ryless determinacy for parity games with a fi-
nite arena was subsequently proposed by
McNaughton [10] and extended to games with an
infinite arena by Zielonka [15]. From the memo-
ryless determinacy result, it follows that the prob-
lem of deciding whether player ◇ has a winning
strategy from a given vertex is in NP: essentially
one can guess a memoryless strategy for player ◇
and check in polynomial time whether it is win-
ning; the latter can be done efficiently by showing
the absence of odd cycles using, e.g., [8]. In a
similar manner, one can prove that the problem
is in coNP, making the decision problem one of
those interesting problems that are in NP ∩ coNP.
In fact, in 1998, Jurdziński [5] showed that the
problem is in UP ∩ coUP. The complexity class
UP is a subclass of NP and is defined to contain
all problems that can be recognized by a non-
deterministic polynomial time Turing machine
that for every input has at most one accepting
computation. From the memoryless determinacy,
one also immediately obtains a positive answer to
the second question.
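
The polynomial-time check of a guessed memoryless strategy can be sketched as follows. This is a simple quadratic illustration, not the method of [8]: after fixing the strategy of player even, the strategy loses exactly when some reachable cycle has an odd lowest priority. The dictionary-based game encoding and all names are assumptions made for the example, and the strategy is assumed to be defined on every reachable vertex owned by player even.

```python
def is_winning_for_even(succ, owner, priority, sigma, start):
    """Check whether the memoryless strategy sigma of player even wins
    every play from `start` (lowest priority seen infinitely often wins)."""

    def moves(v):
        # Even follows sigma; odd may still use any outgoing edge.
        return [sigma[v]] if owner[v] == "even" else succ[v]

    # Vertices reachable from start when even plays according to sigma.
    reach, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in moves(v):
            if w not in reach:
                reach.add(w)
                stack.append(w)

    # Look for a cycle through a vertex u of odd priority p that only
    # visits priorities >= p; the lowest priority on such a cycle is p.
    for u in reach:
        p = priority[u]
        if p % 2 == 0:
            continue
        seen, stack = set(), [u]
        while stack:
            v = stack.pop()
            for w in moves(v):
                if priority[w] < p:
                    continue
                if w == u:
                    return False  # odd can repeat this cycle forever
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return True


succ = {0: [0, 1], 1: [1, 0]}
owner = {0: "even", 1: "odd"}
priority = {0: 0, 1: 1}
stay = is_winning_for_even(succ, owner, priority, {0: 0}, 0)
leave = is_winning_for_even(succ, owner, priority, {0: 1}, 0)
```

Looping on vertex 0 wins for even, whereas moving to vertex 1 lets odd loop there on priority 1.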


Theorem 2 The problem of deciding whether
v ∈ W_◇ is in NP ∩ coNP, and the sets W_◇ and
W_□ can be computed.


McNaughton and Zielonka's constructive
proofs can be converted into a recursive
algorithm for computing W_◇ and W_□ for games
with a finite arena. This algorithm, which we
will introduce shortly, can be modified in a
straightforward manner to also produce winning
(memoryless) strategies for both players, thus
also answering the third question.


Theorem 3 Winning strategies for players ◇
and □ can be computed.


Algorithms for Parity Games 

While the problem of computing the winning 
sets of a parity game is decidable, its exact 
complexity is still open, but over the years, the 
upper bound has been improved. We therefore 
summarize the various algorithms that have been 
invented for solving parity games so far, starting 
with a brief exposition of the recursive algorithm 
which is, as we described above, also interesting 
for its theoretical consequences. For the remain-
der of this section, fix a parity game G =
(V, E, Ω, (V_◇, V_□)) with n vertices, m edges,
and d different priorities.

The recursive algorithm effectively decom-
poses a game into subgames and solves these.
An essential ingredient in this decomposition is
the notion of an ○-attractor into a set of vertices
U, denoted Attr_○(U), which is the least set A
satisfying A ⊇ U and


• for all v ∈ V_○, if vE ∩ A ≠ ∅, then also
  v ∈ A;
• for all v ∈ V_○̄, if vE ⊆ A, then also v ∈ A.


Intuitively, the ○-attractor into U contains U and
exactly those vertices from which ○ can force play
into U.

Writing G ∩ A for the parity game (V ∩ A, E ∩
(A × A), Ω|_A, (V_◇ ∩ A, V_□ ∩ A)), where Ω|_A is
the priority function Ω restricted to the vertices
in A, and writing G \ A for G ∩ (V \ A), the
recursive algorithm is defined as in Algorithm 1.


Algorithm 1: The recursive algorithm


function SOLVE(G)
  Input: parity game G = (V, E, Ω, (V_◇, V_□))
  Output: winning partition (W_◇, W_□)

  if V = ∅ then (W_◇, W_□) ← (∅, ∅)
  else
    m ← min{Ω(v) | v ∈ V}
    if m mod 2 = 0 then p ← ◇ else p ← □
    end if
    U ← {v ∈ V | Ω(v) = m}
    A ← Attr_p(U)
    (W′_◇, W′_□) ← SOLVE(G \ A)
    if W′_p̄ = ∅ then (W_p, W_p̄) ← (A ∪ W′_p, ∅)
    else
      B ← Attr_p̄(W′_p̄)
      (W″_◇, W″_□) ← SOLVE(G \ B)
      (W_p, W_p̄) ← (W″_p, W″_p̄ ∪ B)
    end if
  end if
  return (W_◇, W_□)
end function

The correctness of Algorithm 1 leans on the
observation that lower priorities in the game dom-
inate higher priorities and that revisiting these
lower priorities is beneficial to the player with the
"same parity" as this priority. The algorithm runs
in polynomial space and its runtime complexity
is O(m · n^d).

The first improvement on this runtime that
maintains a polynomial space complexity was
achieved by an algorithm due to Jurdziński [6] in
2000. This algorithm, colloquially known as the
small progress measures (SPM) algorithm, runs
in time O(d · m · (n/⌊d/2⌋)^⌊d/2⌋). Jurdziński's algo-
rithm builds on decorations of the game arena,
called parity progress measures and game parity
progress measures; the former exist for a parity
game with only odd-owned vertices if and only if
the game only has even cycles (and thus is won by
player ◇); the latter extend parity progress mea-
sures to arbitrary games and are essentially wit-
nesses for winning strategies for player ◇. The
SPM algorithm computes game parity progress
measures using a fixpoint iteration, ensuring the
measures are the least measures that decrease
along a play with each bad odd priority that
is encountered and only increase when reaching
beneficial even priorities.

The next improvement came in 2006 and was
based on a modification of the recursive algo-
rithm, resulting in a subexponential algorithm
with running time n^O(√n); see [7]. It relies on
a notion called a dominion for a player: a set of
vertices D that is won by that player by staying
within D and without allowing her opponent to
leave D. The main idea behind the algorithm is
that it identifies small dominions (of size at most
√(2n)) using a dedicated algorithm and removes
them from the game prior to the recursive calls.
This algorithm, in turn, inspired Schewe [13] to
improve on the runtime complexity for games
with a small number of priorities. Rather than
using a brute-force method for searching and
eliminating dominions, Schewe's algorithm uti-
lizes a modified SPM algorithm while executing
the standard recursive algorithm. As a result, this
reduces the complexity of solving parity games to
O(m · (κn/d)^γ(d)), where κ is a small constant and
γ(d) ≈ d/3.

In parallel to the abovementioned solutions,
a different family of algorithms has been devel-
oped. They are based on the notion of strategy
improvement, which has been known in game
theory since the 1960s. The first algorithm of this
kind designed specifically for parity games is due
to Vöge and Jurdziński [14]. In this approach,
one must devise an order ≤_○ on strategies
that satisfies two conditions. Firstly, the maxi-
mal strategy w.r.t. ≤_○ is winning for ○ on W_○.
Secondly, there has to be an (efficiently com-
putable) improvement procedure which, for every
strategy σ that is not ≤_○-maximal, computes a
better strategy σ′ >_○ σ. Strategy improvement
algorithms start with a certain initial strategy
and perform a sequence of improvement steps,
until the maximal strategy is reached. While a
single iteration (improvement step) is typically
efficient, so far no policy guaranteeing a poly-
nomial number of iterations has been found. In
fact, Friedmann [3, 4] has proved that the key
strategy improvement algorithms have worst-case
exponential running time.


Applications 


Parity games underlie a number of problems in 
theoretical computer science. For instance, they 
served as a vehicle for elegantly proving the 
complementation lemma for automata on infinite 
trees, a crucial lemma in Rabin’s proof of the 
decidability of a particular second-order math- 
ematical theory. Parity games are also used in 
word and emptiness problems for a variety of 
(alternating) automata [1]. Moreover, they are 
closely related to other two-player, infinite du- 
ration games with perfect information such as 
mean payoff games, which have, among others, 
applications in scheduling. 

The practical significance of parity games 
stems from the fact that they have proved to be 
of great value in computer-aided software and 
hardware verification and synthesis. Of particular 
importance is the result that parity games are 
polynomial-time equivalent to model checking 
for the modal μ-calculus (see, for instance, [2]),
a modal logic that expressively subsumes most 
of the popular temporal logics used in computer- 
aided verification. We present this transformation 
to illustrate the tight connection between game 
theory and logic. 


Parity Games for Model Checking 

Say we are given a structure (S, A, R), where A is
a set of atomic actions, (S, R) is a directed graph
in which S is a set of states, and R ⊆ S × A × S
is a (for simplicity) total edge-labeled transition
relation. The structure (S, A, R) is often referred
to as a Labeled Transition System (LTS), and
it serves the purpose of modeling the behavior
of software or hardware. The modal μ-calculus
allows for reasoning about such behaviors; the
logic is defined through the following grammar:


f, g ::= true | false | X | f ∧ g | f ∨ g |
         [a]f | ⟨a⟩f | νX.f | μX.f


where a ∈ A and X is a propositional variable,
taken from a sufficiently large set of variables
𝒳. We write σ to denote either μ or ν. For sim-
plicity, we assume that a propositional variable
X in f is bound at most once (by some σ)
and occurrences of X are all within the scope
of its binder. Expressions in the logic are inter-
preted in the context of an LTS (S, A, R) and
a mapping e: 𝒳 → 2^S, typically referred to
as an environment, assigning sets of states to
propositional variables. The modal operators ⟨_⟩_
and [_]_ allow for reasoning with the transition
relation of an LTS; e.g., ⟨a⟩f will hold in states
that have some a-successor satisfying f, whereas
[a]f will hold in states for which all a-successors
(if any) satisfy f. More formally, the meaning of
a formula f is established by stating which states
in the LTS satisfy it; this satisfaction relation,
denoted s, e ⊨ f, is defined inductively as
follows:


s, e ⊨ true
s, e ⊭ false
s, e ⊨ X       iff  s ∈ e(X)
s, e ⊨ f ∧ g   iff  s, e ⊨ f and s, e ⊨ g
s, e ⊨ f ∨ g   iff  s, e ⊨ f or s, e ⊨ g
s, e ⊨ [a]f    iff  for all (s, a, t) ∈ R: t, e ⊨ f holds
s, e ⊨ ⟨a⟩f    iff  there exists (s, a, t) ∈ R such that t, e ⊨ f holds
s, e ⊨ νX.f    iff  s ∈ ⋃{S′ ⊆ S | S′ ⊆ F(S′)}
s, e ⊨ μX.f    iff  s ∈ ⋂{S′ ⊆ S | F(S′) ⊆ S′}


where F(T) = {t ∈ S | t, e[X ↦ T] ⊨ f}
is a monotone operator in the complete
lattice (2^S, ⊆), and where e[X ↦ T] is
the environment e in which X is assigned
set T. Perhaps due to its extreme expressive
power, expressions in the modal μ-calculus are
famously known for being hard to understand.
Expressions using only one fixpoint are
reasonably straightforward to interpret. For
instance, the formula νX.⟨a⟩X holds for a
state s with an infinite a-path: essentially,
we are looking for the largest solution (its
fixpoint) to the equation "X = ⟨a⟩X," or, more
semantically, the largest set T ⊆ S that can
be assigned to X that satisfies the equation
T = {s ∈ S | ∃t ∈ T : (s, a, t) ∈ R}.
An expression such as μY.⟨b⟩Y ∨ ⟨a⟩true
holds for a state s whenever there is a finite
sequence of b-transitions leading to a state
having an a-successor. Mixing fixpoints allows
for expressing more complicated properties,
unfortunately at the expense of readability,
making formulas such as νX.μY.⟨a⟩X ∨ ⟨b⟩Y
(expressing that, when it holds, there is an
infinite sequence of a,b steps in which each stretch
of b-steps (if any) is of finite length) hard to
understand.
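
On a finite LTS, single fixpoints like the two just discussed can be computed by plain iteration: start from the full state set for a greatest fixpoint and from the empty set for a least fixpoint, and apply the operator until nothing changes. The Python sketch below does this for the properties "there is an infinite a-path" and "a finite sequence of b-transitions leads to a state with an a-successor". The tiny LTS and all function names are assumptions made for illustration.

```python
def diamond(R, action, target):
    """States with some `action`-successor in `target`."""
    return {s for (s, a, t) in R if a == action and t in target}


def gfp(F, S):
    """Greatest fixpoint of a monotone F, iterating down from S."""
    T = set(S)
    while True:
        nxt = F(T)
        if nxt == T:
            return T
        T = nxt


def lfp(F):
    """Least fixpoint of a monotone F, iterating up from the empty set."""
    T = set()
    while True:
        nxt = F(T)
        if nxt == T:
            return T
        T = nxt


S = {0, 1, 2}
R = {(0, "a", 0), (1, "b", 0), (2, "b", 2)}

# nu X. <a>X : states with an infinite a-path
nu = gfp(lambda T: diamond(R, "a", T), S)
# mu Y. <b>Y or <a>true : finitely many b-steps reach an a-successor
mu = lfp(lambda T: diamond(R, "b", T) | diamond(R, "a", S))
```

State 2 can perform b forever but never reaches a state with an a-successor, so it is excluded from the least fixpoint.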

The model checking problem is to decide,
given some formula f, a state s in an LTS,
and some environment e, whether s, e ⊨ f.
This problem can be reduced to solving a
parity game as follows: define a parity game
(V, E, Ω, (V_◇, V_□)) in which V = S × Φ(f),
where Φ(f) is the set of all subformulas of
f, and in which E, V_◇, and V_□ are defined
structurally as follows:


vertex                      successor(s)                    owner
(s, true)                   (s, true)                       ◇
(s, false)                  (s, false)                      ◇
(s, X) and σX.g ∈ Φ(f)      (s, σX.g)                       ◇
(s, X) and σX.g ∉ Φ(f)      (s, X)                          ◇
(s, f ∧ g)                  (s, f) and (s, g)               □
(s, f ∨ g)                  (s, f) and (s, g)               ◇
(s, [a]f)                   all (t, f) for (s, a, t) ∈ R    □
(s, ⟨a⟩f)                   all (t, f) for (s, a, t) ∈ R    ◇
(s, νX.f)                   (s, f)                          ◇
(s, μX.f)                   (s, f)                          ◇


The priority function is assigned in such a way
that it meets the following conditions:


• Ω((s, true)) = 0 and Ω((s, false)) = 1;
• if σX.g ∉ Φ(f), then Ω((s, X)) = 0 if s ∈
  e(X) and Ω((s, X)) = 1 otherwise;
• if σX.g ∈ Φ(f), then
  – Ω((s, X)) is even if σ = ν and odd
    otherwise, and
  – Ω((s, X)) ≤ Ω((t, Y)) if Y depends on X
    (i.e., if X is free in g in σY.g);
• Ω((s, g)) is maximal for other formulas g ∈
  Φ(f).


There is an "optimal" assignment of priorities
that does not assign values larger than the alter-
nation depth of the μ-calculus formula f [12].
Intuitively, the alternation depth is a measure
of the degree of semantic alternations between
μ- and ν-operators. Using an optimal priority
assignment yields better complexity bounds for
the model checking problem. The theorem below
establishes the connection between the model
checking problem and the parity game solving
problem.


Theorem 4 s, e ⊨ f iff player ◇ wins (s, f) in
the constructed parity game.


On the one hand, through the above reduction, 
practical algorithmic progress in solving parity 
games directly impacts the performance and scal- 
ability of tooling for conducting the verification 
and synthesis. Abstracting from syntactic details, 
the more elementary parity games setting, on the 
other hand, permits studying the true complexity 
of model checking and, at the same time, provides 
a better understanding of the dynamics of the 
modal μ-calculus.


Open Problems 


Parity games are among the few problems that are
in NP ∩ coNP for which no polynomial time al-
gorithm has been found. The key open problem is
thus whether there is a polynomial time algorithm
for solving parity games.


Problem 4 Can W_◇ and W_□ be computed in
polynomial time?


Problem Definition


Let c be a given compression function that maps
strings A to their compressed representations
c(A). The problem of compressed pattern match-
ing (CPM) is defined as follows:


Problem 1 (Compressed Pattern Matching)
Given a pattern string P and a compressed
text string c(T), determine whether there is an
occurrence of P in T, without decompressing T.


A CPM algorithm is said to be optimal if it
runs in O(|P| + |c(T)|) time. The time/space
complexity of the CPM problem can be a new
criterion to evaluate compression schemes in ad-
dition to the traditional ones: the compression
ratio and the time/space complexity of compres-
sion/decompression.

The CPM problem was first defined in the 
work of Amir and Benson [1], and many studies 
have been made over different compression 
formats. Kida et al. [9] introduced a useful CPM- 
oriented abstraction of compression formats, 
named collage systems. Outputs of various
compression algorithms, not only dictionary-
based compression algorithms but also grammar-
based compression algorithms, can be regarded
as collage systems, and hence algorithmic
research working on collage systems is of great
significance. They presented in the same paper a
general Knuth-Morris-Pratt (KMP) algorithm on 
collage systems. A general Boyer-Moore (BM) 
algorithm on collage systems was also designed 
by almost the same authors [17]. 


Collage Systems 
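Let Σ be a fixed finite alphabet. A collage system
on Σ is a pair (D, S) defined as follows:


• D is a sequence of assignments X_1 =
  expr_1; X_2 = expr_2; …; X_n = expr_n where,
  for each k = 1, …, n, X_k is a variable and
  expr_k is any of the form:

    a         for a ∈ Σ ∪ {ε}                        (primitive assignment)
    X_i X_j   for i, j < k                           (concatenation)
    ^[j]X_i   for i < k and a positive integer j     (j length prefix truncation)
    X_i^[j]   for i < k and a positive integer j     (j length suffix truncation)
    (X_i)^j   for i < k and a positive integer j     (j times repetition)

• S is a sequence X_{i_1} ⋯ X_{i_ℓ} of variables de-
  fined in D.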


By the j length prefix (resp. suffix) truncation,
we mean an operation on strings which takes a
string w and returns the string obtained from w by
removing its prefix (resp. suffix) of length j. The
variables X_k represent the strings val(X_k) ob-
tained by evaluating their expressions. A collage
system (D, S) represents the string obtained by
concatenating the strings val(X_{i_1}), …, val(X_{i_ℓ})
represented by variables X_{i_1}, …, X_{i_ℓ} of S.
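
To make the semantics of val concrete, the following Python sketch evaluates a collage system by explicit decompression. It only illustrates what string a collage system represents; a CPM algorithm would avoid exactly this expansion. The tuple encoding of the five expression forms (primitive assignment, concatenation, prefix truncation, suffix truncation, repetition) and all names are assumptions made for the example.

```python
def evaluate(D, S):
    """Return the string represented by the collage system (D, S).

    D: list of (name, expr) pairs in definition order, where expr is one of
       ("prim", a), ("cat", Xi, Xj), ("ptrunc", Xi, j), ("strunc", Xi, j),
       ("rep", Xi, j).
    S: list of variable names defined in D.
    """
    val = {}
    for name, expr in D:
        kind = expr[0]
        if kind == "prim":          # a single character or the empty string
            val[name] = expr[1]
        elif kind == "cat":         # concatenation of two earlier variables
            val[name] = val[expr[1]] + val[expr[2]]
        elif kind == "ptrunc":      # remove the prefix of length j
            val[name] = val[expr[1]][expr[2]:]
        elif kind == "strunc":      # remove the suffix of length j
            w = val[expr[1]]
            val[name] = w[:max(0, len(w) - expr[2])]
        elif kind == "rep":         # j-fold repetition
            val[name] = val[expr[1]] * expr[2]
        else:
            raise ValueError(f"unknown expression form: {kind}")
    return "".join(val[x] for x in S)


D = [
    ("X1", ("prim", "a")),
    ("X2", ("prim", "b")),
    ("X3", ("cat", "X1", "X2")),    # ab
    ("X4", ("rep", "X3", 3)),       # ababab
    ("X5", ("ptrunc", "X4", 1)),    # babab
    ("X6", ("strunc", "X4", 2)),    # abab
]
text = evaluate(D, ["X5", "X6"])
```

Here |D| = 6 and |S| = 2, so the collage system has size 8 and represents a string of length 9.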

The size of D is the number n of assignments
and is denoted by |D|. The height of D, denoted
by height(D), is defined to be the longest path
length of the dependency graph of D, namely, a
directed acyclic graph such that (1) the vertices
are the variables in D and (2) a directed edge from
X_k to X_i exists if and only if D contains a non-
primitive assignment X_k = expr_k such that X_i
appears in expr_k. The length of S is the number ℓ
of variables in S and is denoted by |S|. The size of
collage system (D, S) is defined to be |D| + |S|.

It should be noted that any collage system
can be converted into an equivalent one with
|S| = 1, by adding a series of assignments with
concatenation operations into D. This may imply
S is unnecessary. However, a variety of com-
pression schemes can be captured naturally by
separating D (defining phrases) from S (giving a
factorization of text T into phrases). How to ex-
press outputs of existing compression algorithms
is found in [9].

A collage system (D, S) is said to be
truncation-free if D contains no truncation
operation, repetition-free if D contains no
repetition operation, and regular if it is
truncation- and repetition-free. A regular collage
system (D, S) is simple if |val(Y)| = 1 or
|val(Z)| = 1 for every assignment X = YZ
of D.


Outputs of grammar-based compression al- 
gorithms such as Re-Pair, Sequitur, and Byte 
Pair Encoding (BPE) fall into the class of regu- 
lar collage systems, and outputs of LZ78/LZW 
fall into the class of simple collage systems. 
LZ77 factorization is an abstraction of LZ77 and 
its variants, which has two variations depending 
upon whether self-referencing is allowed. The 
LZ77 factorization Z of T with (resp. without)
self-referencing can be transformed into a collage
system (resp. a repetition-free collage system) of
size O(|Z| · log |Z|) generating T (see [5]).

It should be mentioned that the so-called 
straight-line programs (SLPs) are the regular 
collage systems with |S| = 1. 


Key Results 


Theoretical Aspect 

Amir et al. [2] presented two solutions
to CPM for LZW with time complexities
O(|c(T)| log |P| + |P|) and O(|c(T)| + |P|²),
respectively. The latter was generalized by Kida
et al. [9] via the unified framework of collage
systems.


Theorem 1 (Kida et al. [9]) CPM for collage
systems can be solved in O((|D| + |S|) ·
height(D) + |P|²) time using O(|D| + |P|²)
space. The factor height(D) is dropped for
truncation-free collage systems.


We briefly sketch the algorithm of [9]. It was
originally intended to solve the all-occurrence
version of the CPM problem and reports all
locations of T at which P occurs with addi-
tional time linearly proportional to the number
of pattern occurrences. The algorithm has two
stages: first, it preprocesses D and P, and second,
it processes the variables of S. In the second
stage, it simulates the moves of the KMP automaton
running on the uncompressed text by using two
functions, Jump and Output, both of which take as
input a state q and a variable X. The former is used
to substitute just one state transition for the con-
secutive state transitions of the KMP automaton
for the string val(X) for each variable X of
S, and the latter is used to report all pattern
occurrences found during the state transitions.
Let δ be the state-transition function of the KMP
automaton. Then Jump(q, X) = δ(q, val(X)),
and Output(q, X) is the set of lengths |w| of
nonempty prefixes w of val(X) such that δ(q, w)
is the final state. A naive two-dimensional array
implementation of the two functions requires
Ω(|D| · |P|) space, and the size of Output(q, X)
can be exponential in |D|. The data structures of
[9] use only O(|D| + |P|²) space, are built in
O(|D| · height(D) + |P|²) time, and enable us to
compute Jump(q, X) in O(1) time and enumerate
the set Output(q, X) in O(height(D) + ℓ) time,
where ℓ = |Output(q, X)|. The factor height(D)
is dropped for truncation-free collage systems.
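
The role of Jump can be illustrated with the naive two-dimensional array implementation mentioned above, restricted to regular collage systems (primitive assignments and concatenations only) and to the existence version of CPM. The Python sketch below is not the data structure of [9]: it stores a full table per variable, and it replaces Output by a Boolean table recording whether the final state is reached. The tuple encoding of assignments and all names are assumptions made for the example; the alphabet must contain every character of the text.

```python
def kmp_automaton(P, alphabet):
    """Transition table delta[q][c] of the KMP automaton for a nonempty
    pattern P; state len(P) is the final state."""
    m = len(P)
    delta = [dict() for _ in range(m + 1)]
    for c in alphabet:
        delta[0][c] = 1 if P[0] == c else 0
    x = 0  # state reached on the pattern without its first character
    for q in range(1, m + 1):
        for c in alphabet:
            delta[q][c] = delta[x][c]
        if q < m:
            delta[q][P[q]] = q + 1
            x = delta[x][P[q]]
    return delta


def cpm_exists(P, D, S, alphabet):
    """Decide whether P occurs in the text represented by (D, S) without
    expanding the text. Only ("prim", a) and ("cat", Y, Z) are supported."""
    delta = kmp_automaton(P, alphabet)
    m = len(P)
    states = range(m + 1)
    jump, hit = {}, {}
    for name, expr in D:
        if expr[0] == "prim":
            c = expr[1]
            if c == "":  # empty string: no transition, no occurrence
                jump[name] = list(states)
                hit[name] = [False] * (m + 1)
            else:
                jump[name] = [delta[q][c] for q in states]
                hit[name] = [jump[name][q] == m for q in states]
        elif expr[0] == "cat":
            _, y, z = expr
            # Jump(q, YZ) = Jump(Jump(q, Y), Z)
            jump[name] = [jump[z][jump[y][q]] for q in states]
            hit[name] = [hit[y][q] or hit[z][jump[y][q]] for q in states]
        else:
            raise ValueError("only regular collage systems are supported")
    q = 0
    for x in S:
        if hit[x][q]:
            return True
        q = jump[x][q]
    return False


D = [
    ("X1", ("prim", "a")),
    ("X2", ("prim", "b")),
    ("X3", ("cat", "X1", "X2")),  # ab
    ("X4", ("cat", "X3", "X3")),  # abab
]
found = cpm_exists("aba", D, ["X4"], "ab")
```

Each variable of S is processed with a single table lookup, so the second stage takes time proportional to |S|, while the tables occupy |D| · (|P| + 1) entries, which is the space cost that the data structures of [9] avoid.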

By replacing |D| + |S| with |c(T)|, the above
theorem means that the existence version of CPM
can be solved in O(|c(T)| + |P|²) time for
any compression format that falls into the class
of truncation-free collage systems. The |P|² term
is acceptable since it is often smaller than |c(T)| in
practice. But removing the quadratic dependency
on |P| is of great interest in theory.

Consider a maximal variable sequence S[i..j]
of S[1..ℓ] such that the string val(S[k]) is a
factor of pattern P for every k ∈ [i..j]. Take the
longest suffix h of val(S[i − 1]) and the longest
prefix t of val(S[j + 1]) that are factors of P
to obtain the sequence
h, val(S[i]), …, val(S[j]), t of pattern factors.
Any pattern occurrence must appear in such a
pattern factor sequence, except for the case that
some variable X = S[k] contains a pattern
occurrence in its string val(X). The CPM task
can thus be reduced to a number of instances of
pattern matching in a sequence of pattern factors.
It should be noted that the task of pattern match-
ing in a pattern factor sequence depends only
on P, and not on T or its compression
format. Gawrychowski [7] described an elaborate
technique to perform this task in linear time.
On the other hand, in the reduction task, we are
faced with the need to solve the so-called factor
concatenation problem [9]:


Preprocess P to build a data structure that returns 
in constant time the vertex representing the factor 
xy, for any two (explicit or implicit) vertices of 
suffix tree of P representing factors x, y of P. 


An O(|P|²) time preprocessing was presented in
[9]. For LZW or, more generally, simple collage
systems, it is rather straightforward to see that if
the alphabet is of constant size, the preprocessing
requires only O(|P|) time since either x or y
is of length 1, and the reduction task thus takes
O(|D| + |S| + |P|) time. CPM for simple collage
systems can therefore be solved in optimal linear
time for a constant alphabet. Gawrychowski [7]
further described how to keep it linear even in the
case of an integer alphabet for LZW.


Theorem 2 (Gawrychowski [7]) CPM for LZW 
can be solved in optimal linear time even for a 
polynomial size integer alphabet, assuming the 
word RAM model. 


For LZ77, one possible solution is to convert
the input LZ77 factorization into a truncation-free
collage system and then apply the CPM algorithm
of [9]. We can convert an LZ77 factorization
of T into a truncation-free collage system with
an increase in size by a factor of O(log(|T|/|c(T)|))
in time linear in the output size (see [4]). The
resulting algorithm thus has time complexity of
O(|c(T)| log(|T|/|c(T)|) + |P|²). Gawrychowski [6]
successfully removed the quadratic dependency
on |P| again.


Theorem 3 (Gawrychowski [6]) CPM for
LZ77 can be solved in O(|c(T)| log(|T|/|c(T)|) + |P|)
time.


Table 1 summarizes the best known solutions
to CPM for several compression formats. 


Practical Aspect 

From a practical viewpoint, we have two goals. 
One is to perform the CPM task in less time 
compared with a decompression followed by an 
ordinary search (Goal 1), and the other is to 
perform it in less time compared with an ordinary 
search over uncompressed text (Goal 2). 
An optimal CPM algorithm theoretically achieves 
the two goals if $|c(T)| = o(|T|)$. However, 
we often observe $|c(T)| = \Theta(|T|)$ in practice, 
and hence reducing the constant factors hidden 
behind the O-notation of time complexity of 
CPM algorithms plays a crucial role in achieving 
the two goals, especially for Goal 2. For example, 
code words are limited to multiples of 8 bits to 
avoid bit manipulation. 

Kida et al. [8] reported the first experimental 
results in this area, achieving Goal 1 for LZW. 
Navarro and Tarhio [14] presented BM-type 
algorithms for LZ78/LZW compression schemes and 
showed they are twice as fast as a decompression 
followed by a search using the best algorithms. 

Data compression can be regarded as a 
preprocessing that allows a fast search in the context 
of Goal 2. An appropriate compression format 
would be chosen for this purpose. We note that 
in general, some occurrences of the encoded 
pattern can be false matches, and/or the pattern 
possibly occurs in several different forms within 
the encoded text. There are two lines of research 
work addressing Goal 2. One is to put a restriction 
on the compression scheme so that every pattern 
occurrence can be identified simply as a substring 
of the encoded text that is identical to the encoded 
pattern. The advantage is that any favored pattern 
matching algorithm can be used to search the 
encoded text for the encoded pattern. The works of 
Manber [10] and Rautio et al. [15] are along this 
line. The latter is based on a combination of the 
so-called stopper encoding and the 
Boyer-Moore-Horspool (BMH) algorithm and is regarded as 
the fastest combination that achieves Goal 2. The 
drawback of this line is, however, that the 
restriction considerably sacrifices the compression ratio 
(e.g., 60-70% for typical English texts). In the 
case of natural language texts written in western 
languages such as English (having explicit word 
boundaries), there are some compression formats 
that enable us to achieve Goal 2 by using a 
modification of byte-oriented Huffman coding on 
words (see, e.g., [3]). 


Pattern Matching on Compressed Text, Table 1 Best known solutions to CPM for several compression formats 

Compression formats              Time complexity                              Work 
Run-length                       $O(|c(T)| + |P|)$                            Trivial 
LZW                              $O(|c(T)| + |P|)$                            [7] 
LZ77                             $O(|c(T)| \log(|T|/|c(T)|) + |P|)$           [6] 
Simple collage systems           $O((|D| + |S|) + |P|)$                       [7] 
Truncation-free collage systems  $O((|D| + |S|) + |P|^2)$                     [9] 
Collage systems                  $O((|D| + |S|) \cdot height(D) + |P|^2)$     [9] 

The other line is to suppress a false detection 
or detection omission by an algorithmic device, 
without putting such a restriction on the compres- 
sion scheme. The work of Miyazaki et al. [13] 
for Huffman encoding and the works of Shibata 
et al. [16,17] for BPE are along this line. While 
all of the works [10, 13, 15-17] mentioned here 
achieve Goal 2, the compression ratios are poor: 
BPE is the best among them. A BPE compressed 
text is a regular collage system with limitation 
|D| < 256. Matsumoto [12] extended BPE to 
get a higher compression ratio by easing the 
limitation and using the byte-oriented Huffman 
coding for representing the variables occurring 
in S. Their CPM algorithm runs fast to achieve 
Goal 2, but the memory requirement increases as 
|D| grows. Maruyama [11] introduced a new 
compression scheme, called the context-sensitive 
grammar transform, whose compression ratio 
matches that of gzip and Re-Pair. Their CPM 
algorithm searches almost twice as fast as 
the KMP-type CPM algorithm of [16] on BPE 
and faster than [15] for short patterns. 


Problem Definition 


A tile type is a colored unit square each of whose 
four sides is provided with a glue. An assembly 
is a partial function from Z^2 (the 2D grid) to a tile 
type set T. A (rectangular) pattern P (of width w 
and height h) is a function from the rectangular 
domain [w] × [h] to a set of colors, where [m] = 
{1, ..., m} for m ∈ N. If at most k colors appear 
on P, we say P is k-colored. Since tiles are colored, 
an assembly of domain [w] × [h] induces a unique 
pattern of width w and height h. 

The rectilinear tile assembly system (RTAS) 
is a variant of Winfree’s aTAM system [10]. 
Figure 1 illustrates how an RTAS self-assembles 
the binary counter pattern. An RTAS is a pair of a 
finite set of tile types and an L-shape seed, which 
is an assembly of domain {(0, 0)} ∪ ([w] × {0}) ∪ 
({0} × [h]). Starting from the L-shape seed, it tiles 
the plane according to the following rule: 


RTAS’s tiling rule: A tile can attach at a 
position (x, y) if its west glue matches the east 
glue of the tile at (x−1, y) and its south glue 
matches the north glue of the tile at 
(x, y−1). 


Hence, (1, 1) is initially the sole position on the 
L-shape seed at which a tile can attach. The 
attachment of a tile there makes the positions (2, 1) 
and (1, 2) attachable. Tiles attach in this manner one 
after another and an assembly grows rectilinearly. 
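
The tiling rule can be illustrated with a minimal sketch that grows an assembly from an L-shape seed using the four half-adder tile types of Fig. 1. The dictionary representation of tile types, the function names, the choice of the sum bit as the tile color, and the particular seed glues are all assumptions made for this illustration; they are not taken from the cited works.

```python
from itertools import product


def half_adder_tiles():
    """Four tile types: west glue = input A, south glue = input B,
    north glue = sum S = A xor B, east glue = carry-out C = A and B.
    Using the sum bit as the tile color is an assumption of this sketch."""
    tiles = []
    for a, b in product((0, 1), repeat=2):
        s, c = a ^ b, a & b
        tiles.append({"color": s, "north": s, "east": c, "south": b, "west": a})
    return tiles


def is_directed(tiles):
    """Directed: no two tile types share both the west and the south glue."""
    keys = [(t["west"], t["south"]) for t in tiles]
    return len(keys) == len(set(keys))


def assemble(tiles, seed_north, seed_east):
    """Grow the assembly from an L-shape seed.

    seed_north[x-1] is the north glue of the seed tile at (x, 0);
    seed_east[y-1] is the east glue of the seed tile at (0, y).
    Returns the induced pattern as rows y = 1..h, or None if growth gets stuck.
    """
    if not is_directed(tiles):
        raise ValueError("this sketch only handles directed tile sets")
    w, h = len(seed_north), len(seed_east)
    by_inputs = {(t["west"], t["south"]): t for t in tiles}
    north = {(x, 0): seed_north[x - 1] for x in range(1, w + 1)}
    east = {(0, y): seed_east[y - 1] for y in range(1, h + 1)}
    pattern = []
    # Row-major order is one valid attachment order: (x-1, y) and (x, y-1)
    # are always tiled before (x, y).
    for y in range(1, h + 1):
        row = []
        for x in range(1, w + 1):
            tile = by_inputs.get((east[(x - 1, y)], north[(x, y - 1)]))
            if tile is None:
                return None
            north[(x, y)], east[(x, y)] = tile["north"], tile["east"]
            row.append(tile["color"])
        pattern.append(row)
    return pattern


# Seed: all-zero bits along the bottom, carry-in 1 along the left column.
counter = assemble(half_adder_tiles(), seed_north=[0, 0, 0], seed_east=[1, 1, 1, 1])
```

With this seed, row y of the resulting pattern is the binary representation of y with the least significant bit at x = 1, which is one way to read the binary counter pattern.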

An RTAS is directed (a.k.a. deterministic) if 
no two tile types share both the west and south 
glues, as the one in Fig. 1. A directed RTAS admits 
a unique assembly A to which no tile can attach 
any more. Then we say that it uniquely 
self-assembles the pattern of A. 

Patterned self-assembly tile set synthesis 
(PATS), proposed by Ma and Lombardi [7], aims 
at minimizing the RTAS that self-assembles a 
given pattern, where an RTAS is measured by the 
cardinality of its tile type set. Since the minimum 
RTAS is known to be directed [2], this problem 
is formulated as follows: 

Patterned self-assembly tile set synthesis 
(PATS) [7] 


GIVEN: A (rectangular) pattern P 
FIND: A minimum directed RTAS that uniquely 
self-assembles P 


For k > 1, the k-colored PATS (k-PATS) is a 
practical variant of PATS which takes only 
k-colored patterns as input. 


Patterned Self-Assembly Tile Set Synthesis, Fig. 1 
(Left) Four tile types implement together the half-adder 
with two inputs A, B from the west and south, the output 
S to the north, and the carryout C to the east. (Right) Copies 
of the “half-adder” tile types turn the L-shape seed into the 
binary counter pattern 


Key Results 


PATS and k-PATS have been studied mainly in 
two research directions so far: computational 
complexity and algorithms. 

The NP-hardness of 2-PATS was claimed in 
[8], but what was proved NP-hard there was 
something different. Czeizler and Popa proved 
that PATS is NP-hard [1] (a concise proof is in 
[5]) by a polynomial-time reduction of 3SAT to 
the decision variant of PATS: given a pattern P 
and n ∈ N, decide whether P can be uniquely 
self-assembled by a directed RTAS with n tile 
types. Variables and clauses of a given 3SAT 
instance φ are color-coded so that the number of 
colors in the reduced pattern is in proportion to 
the size of φ. 

The potential of geometry, or more precisely, the 
configuration of colors, as a medium for encoding a 
3SAT instance φ was suggested 
by Seki [9]. In the reduction, a set T_eval of 84 tile 
types is designed as a 3SAT-verifier, i.e., using 
tiles in the set, a directed RTAS evaluates φ to be 
true and assembles a pattern P(φ), starting from a 
seed encoding φ and some satisfying assignment. 
A subpattern GADGET of P(φ) endows P(φ) 
with the property that in order for a directed 
RTAS with a set T of at most 84 tile types to 
assemble P(φ), T must be isomorphic to T_eval. 
The pattern P(φ) is 60-colored. Hence, φ is 
satisfiable if and only if (P(φ), 84) is a 
yes-instance of 60-PATS. Johnsen, Kao, and Seki have 
refined the original design so that the number of 
colors decreased to 29 [3] and further to 11 [4]. 

In these proofs, quite a few colors are devoted 
just to make the property of GADGET manually 
checkable. Giving up the manual checkability of 
the GADGET property has yielded a 
computer-assisted proof of NP-hardness of 2-PATS by Kari 
et al. [6]. The proof was verified in two different 
environments. Note that in this proof, 
everything but the GADGET property is manually 
checkable. 

The NP-hardness of 2-PATS makes it 
essentially indispensable for exact PATS-solvers 
to search through an exponential number of solution 
candidates. Göös et al. designed PATS-solvers: 
an exhaustive partition-search branch-and-bound 
(PS-BB) algorithm, its heuristic modification 
(PS-H), and an ASP-solver-based algorithm [2]. 
PS-BB is an exact algorithm for PATS, running 
in practical time just for small patterns, say 
7 × 7, while PS-H works even for larger patterns 
in exchange for the loss of guarantee on the 
minimality of its output. 

Let P be an input pattern of width w and 
height h. The colors on P induce a partition π(c) 
of P’s domain [w] × [h]. In principle, among 
all possible partitions, PS-BB searches for the 
one π_min of least cardinality satisfying: 


• π(c) is coarser than π_min in the sense that 
∀q ∈ π_min, ∃p ∈ π(c), q ⊆ p. 

• One can associate each class p of π_min with a 
quadruple of glues such that a directed RTAS 
with the associated tile type set uniquely 
self-assembles P. 


Note that the finest partition π_max = {{(x, y)} | 
(x, y) ∈ [w] × [h]} satisfies these conditions 
(associated tile types “hardcode their position” in 
glues). The coarser-finer relation yields a tree of 
partitions whose root is π_max. This is the search 
tree of PS-BB (and PS-H). 

PS-BB prunes branches using a bounding 
function to save computational resources. PS-H 
more greedily optimizes the order in which 
the coarsenings of a partition are explored by 
preferring some search paths to others. 
A random choice among the preferred paths 
lets PS-H perform differently at each run. 
PS-H, is a variant of PS-H which runs 
multiple independent searches in parallel for 
efficiency. 


Patterned Self-Assembly Tile Set Synthesis 


Open Problems 


The lack of guarantee on the minimality of tile 
type sets output by the heuristic algorithms or 
on their running time motivates the design of 
a polynomial-time approximation algorithm for 
PATS. An approximation ratio of 14/13 ≈ 1.077 is 
known to be unachievable in polynomial time, 
unless P = NP [6]. 

A manually checkable proof for the 
NP-hardness of 2-PATS is of theoretical rather than 
practical interest. 


URLs to Code and Data Sets 


The computer program for the computer-assisted 
proof is available online: 
http://self-assembly.net/wiki/index.php?title=2PATS-tileset-search. 


Problem Definition 


De novo sequencing arises from the identification 
of peptides by using tandem mass spectrometry 
(MS/MS). A peptide is a sequence of amino acids 
in biochemistry and can be regarded as a string 
over a finite alphabet from a computer scientist’s 
point of view. Each letter in the alphabet 
represents a different kind of amino acid and is 
associated with a mass value. In the biochemical 
experiment, a tandem mass spectrometer is utilized to 
fragment many copies of the peptide into pieces 
and to measure the mass values (in fact, the mass 
to charge ratios) of the fragments simultaneously. 
This gives a tandem mass spectrum. Since differ- 
ent peptides normally produce different spectra, 
it is possible, and now a common practice, to 
deduce the amino acid sequence of the peptide 
from its spectrum. Often this deduction involves 
searching a database for a peptide that could 
have produced the spectrum. But in many cases 
such a database does not exist or is not complete, 
and the calculation has to be done without 
consulting a database. The latter approach is called 
de novo sequencing. 

A general form of de novo sequencing 
problems is described in [2]. First, a score function 
f(P, S) is defined to evaluate the pairing of a 
peptide P and a spectrum S. Then the de novo 
sequencing problem seeks a peptide P such 
that f(P, S) is maximized for a given spectrum 
S. 

When the peptide is fragmented in the tandem 
mass spectrometer, many types of fragments can 
be generated. The most common fragments are 
the so-called b-ions and y-ions. b-ions 
correspond to the prefixes of the peptide sequence, 
and y-ions to the suffixes. Readers are referred to 
[8] for the biochemical details of the MS/MS 
experiments and the possible types of fragment 
ions. For clarity, in what follows only b-ions 
and y-ions are considered, and the de novo se- 
quencing problem will be formulated as a pure 
computational problem. 

A spectrum $S = \{(x_i, h_i)\}$ is a set of peaks, 
each of which has a mass value $x_i$ and an intensity 
value $h_i$. A peptide $P = a_1 a_2 \ldots a_n$ is a string 
over a finite alphabet $\Sigma$. Each $a \in \Sigma$ is 
associated with a positive mass value $m(a)$. For any 
string $t = t_1 t_2 \ldots t_k$, denote 
$m(t) = \sum_{i=1}^{k} m(t_i)$. The mass 
of a length-$i$ prefix (b-ion) of $P$ is defined as 
$b_i = c_b + m(a_1 a_2 \ldots a_i)$. The mass of a 
length-$i$ suffix (y-ion) of $P$ is defined as 
$y_i = c_y + m(a_{n-i+1} \ldots a_{n-1} a_n)$. Here $c_b$ and 
$c_y$ are two constants related to the nature of the 
MS/MS experiments. If the mass unit used for 
measuring each amino acid is dalton, then $c_b = 1$ and 
$c_y = 19$. 

Let $\delta$ be a mass error tolerance that is 
associated with the mass spectrometer. For a mass value 
$m$, the set of peaks matched by $m$ is defined as 
$D(m) = \{(x_i, h_i) \in S : |x_i - m| < \delta\}$. The general 
idea of de novo sequencing is to maximize the 
number and intensities of the peaks matched by 
all b- and y-ions. Normally, $\delta$ is far less than 
the minimum mass of an amino acid. Therefore, 
for different $i$ and $j$, $D(b_i) \cap D(b_j) = \emptyset$ and 
$D(y_i) \cap D(y_j) = \emptyset$. However, $D(b_i)$ and $D(y_j)$ 
may share common peaks. So, if not defined 
carefully, a peak may be counted twice in the 
score function. There are two different definitions 
of the de novo sequencing problem, corresponding to 
two different ways of handling this situation. 


Definition 1 (Anti-symmetric de novo 
sequencing) 

Instance: A spectrum $S$, a mass value $M$, and an 
error tolerance $\delta$. 
Solution: A peptide $P$ such that $m(P) = M$, 
and $D(b_i) \cap D(y_j) = \emptyset$ for any $i$, $j$. 
Objective: Maximize 
$\sum_{k=1}^{n} \sum_{(x_j, h_j) \in D(b_k) \cup D(y_k)} h_j$. 


This definition discards the peptides that give a 
pair of $b_i$ and $y_j$ with similar mass values, 
because this happens rather infrequently in practice. 
Another definition allows the peptides to have 
pairs of $b_i$ and $y_j$ with similar mass values. 
However, when a peak is matched by multiple 
ions, it is counted only once. More precisely, 
define the matched peaks by $P$ as 

$$D(P) = \bigcup_{i=1}^{n} \bigl(D(b_i) \cup D(y_i)\bigr).$$
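
The quantities defined so far can be computed directly. The following is a minimal sketch, not the algorithm of any cited work: it evaluates the b- and y-ion masses, the matched peaks D(m), and the sum of the intensities of the peaks in D(P), with each peak counted once. The function names and the two-letter toy alphabet with masses 30 and 12 are hypothetical and chosen only so that one peak is matched by both a b-ion and a y-ion.

```python
def ion_masses(peptide, mass, c_b=1.0, c_y=19.0):
    """Return ([b_1..b_n], [y_1..y_n]) for a peptide given as a string."""
    b, total = [], 0.0
    for a in peptide:                 # prefixes a_1 ... a_i
        total += mass[a]
        b.append(c_b + total)
    y, total = [], 0.0
    for a in reversed(peptide):       # suffixes a_{n-i+1} ... a_n
        total += mass[a]
        y.append(c_y + total)
    return b, y


def matched_peaks(spectrum, m, delta):
    """D(m): the peaks (x, h) whose mass x is within delta of m."""
    return {(x, h) for (x, h) in spectrum if abs(x - m) < delta}


def matched_intensity(peptide, spectrum, mass, delta):
    """Sum of intensities over D(P); a peak hit by several ions counts once."""
    b, y = ion_masses(peptide, mass)
    hit = set()
    for m in b + y:
        hit |= matched_peaks(spectrum, m, delta)
    return sum(h for _, h in hit)


# Hypothetical toy alphabet and spectrum (not real amino acid data).
toy_mass = {"U": 30.0, "V": 12.0}
toy_spectrum = [(31.0, 4.0), (43.0, 1.0), (50.0, 9.0)]
toy_score = matched_intensity("UV", toy_spectrum, toy_mass, delta=0.5)
```

For the peptide "UV" the ions are b_1 = 31, b_2 = 43, y_1 = 31, and y_2 = 61, so the peak at mass 31 is matched by both b_1 and y_1 but contributes its intensity only once.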


Definition 2 (De novo sequencing) 

Instance: A spectrum $S$, a mass value $M$, and an 
error tolerance $\delta$. 
Solution: A peptide $P$ such that $m(P) = M$. 
Objective: Maximize 
$f(P, S) = \sum_{(x_j, h_j) \in D(P)} h_j$. 


Key Results 


Anti-symmetric de novo sequencing was studied 
in [1,2]. These studies convert the spectrum into 
a spectrum graph. Each peak in the spectrum 
generates a few nodes in the spectrum graph, 
corresponding to the different types of ions that 
may produce the peak. Each edge in the graph 
indicates that the mass difference of the two 
adjacent nodes is approximately the mass of an 
amino acid, and the edge is labeled with the 
amino acid. When at least one of each pair of 
$b_i$ and $y_{n-i}$ matches a peak in the spectrum, the 
de novo sequencing problem reduces to 
finding the “anti-symmetric” longest path in 
the graph. A dynamic programming algorithm for 
this purpose was published in [1]. 


Theorem 1 ([1]) The longest anti-symmetric 
path in a spectrum graph G = (V, E) can be 
found in O(|V| |E|) time. 

Under Definition 2, de novo sequencing was 
studied in [6] and a polynomial time algorithm 
was provided. The algorithm is again a dynamic 
programming algorithm. For two mass values 
$(m, m')$, the dynamic programming calculates an 
optimal pair of prefix $Aa$ and suffix $a'A'$, such 
that 


1. $m(Aa) = m$ and $m(a'A') = m'$. 

2. Either $c_b + m(A) < c_y + m(a'A') < c_b + m(Aa)$ 
or $c_y + m(A') < c_b + m(A) < c_y + m(a'A')$. 

The calculation for $(m, m')$ is based on the 
optimal solutions of smaller mass values. Because 
of the second requirement above, it is proved in 
[6] that not all pairs $(m, m')$ are needed. This 
is used to speed up the algorithm. A carefully 
designed strategy can eventually output a prefix 
and a suffix so that their concatenation forms 
the optimal solution of the de novo sequencing 
problem. More specifically, the following theorem 
holds. 


Theorem 2 ([5]) The de novo sequencing 
problem has an algorithm that gives the optimal 
peptide in $O(|\Sigma| \times \delta \times \max_{a \in \Sigma} m(a) \times M)$ time. 


Perceptron Algorithm 


Because $|\Sigma|$, $\delta$, and $\max_{a \in \Sigma} m(a)$ are all 
constants, the algorithm in fact runs in time linear 
in $M$ with a large coefficient. 

Although the above algorithms are designed 
to maximize the total intensities of the matched 
peaks, they can be adapted to work on more 
sophisticated score functions. Some studies of 
other score functions can be found in [2-5]. Some 
of these score functions require new algorithms. 


Applications 


The algorithms have been implemented into soft- 
ware programs to assist the analyses of tandem 
mass spectrometry data. Software using the spec- 
trum graph approach includes Sherenga [2]. The 
de novo sequencing algorithm under the second 
definition was implemented in PEAKS [5]. More 
complete lists of the de novo sequencing software 
and their comparisons can be found in [7,9]. 


URL to Code 


PEAKS free trial version is available at 
http://www.bioinfor.com/. 


Problem Definition 


The Perceptron algorithm [1, 13] is an iterative 
algorithm for learning classification functions. 
The Perceptron was mainly studied in the online 
learning model. As an online learning algorithm, 
the Perceptron observes instances in a sequence 
of trials. The observation at trial $t$ is denoted 
by $x_t$. After each observation, the Perceptron 
predicts a yes/no (+/−) outcome, denoted $\hat{y}_t$, 
which is calculated as follows: 

$$\hat{y}_t = \mathrm{sign}(\langle w_t, x_t \rangle),$$

where $w_t$ is a weight vector which is learned 
by the Perceptron and $\langle \cdot, \cdot \rangle$ is the inner product 
operation. Once the Perceptron has made a 
prediction, it receives the correct outcome, denoted 
$y_t$, where $y_t \in \{+1, -1\}$. If the prediction 
of the Perceptron was incorrect, it updates its 
weight vector, presumably improving the chance 
of making an accurate prediction on subsequent 
trials. The update rule of the Perceptron is 

$$w_{t+1} = \begin{cases} w_t + y_t x_t & \text{if } \hat{y}_t \neq y_t \\ w_t & \text{otherwise.} \end{cases} \qquad (1)$$
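
The prediction and update rule (1) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, instances are plain lists or tuples of floats, and predicting −1 when the inner product is exactly zero is an assumption, since the text does not fix sign(0).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def online_perceptron(examples):
    """Run the Perceptron over a sequence of (x, y) pairs with y in {+1, -1}.

    Returns the final weight vector and the number of prediction mistakes.
    """
    dim = len(examples[0][0])
    w = [0.0] * dim                      # w_1 = 0
    mistakes = 0
    for x, y in examples:
        y_hat = 1 if dot(w, x) > 0 else -1   # assumption: sign(0) = -1
        if y_hat != y:                   # update only on a mistake
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
    return w, mistakes


# A small linearly separable toy sequence (hypothetical data).
toy = [((1.0, 0.0), 1), ((-1.0, 0.0), -1), ((2.0, 1.0), 1), ((-2.0, -1.0), -1)]
w_final, n_mistakes = online_perceptron(toy)
```

On this toy sequence the only mistake occurs at the first trial, after which the weight vector (1, 0) classifies the remaining instances correctly.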


The quality of an online learning algorithm is 
measured by the number of prediction mistakes 
it makes along its run. Novikoff [12] and Block 
[2] have shown that whenever the Perceptron is 
presented with a sequence of linearly separable 
examples, it makes a bounded number of 
prediction mistakes which does not depend on the 
length of the sequence of examples. Formally, let 
$(x_1, y_1), \ldots, (x_T, y_T)$ be a sequence of 
instance-label pairs. Assume that there exists a unit vector 
$u$ ($\|u\|_2 = 1$) and a positive scalar $\gamma > 0$ such that 
for all $t$, $y_t \langle u, x_t \rangle \geq \gamma$. In words, $u$ separates the 
instance space into two half-spaces such that 
positively labeled instances reside in one half-space, 
while the negatively labeled instances belong to 
the second half-space. Moreover, the distance 
of each instance to the separating hyperplane, 
$\{x : \langle u, x \rangle = 0\}$, is at least $\gamma$. The scalar $\gamma$ is 
often referred to as the margin attained by $u$ on 
the sequence of examples. Novikoff and Block 
proved that the number of prediction mistakes 
the Perceptron makes on a sequence of linearly 
separable examples is at most $(R/\gamma)^2$, where 
$R = \max_t \|x_t\|_2$ is the minimal radius of an 
origin-centered ball enclosing all the instances. 
In 1969, Minsky and Papert [11] underscored 
serious limitations of the Perceptron by showing 
that it is impossible to learn many classes of 
patterns using the Perceptron (e.g., XOR functions). 
This fact caused a significant decrease of interest 
in the Perceptron. The Perceptron regained 
its popularity after Freund and Schapire [9] 
proposed to use it in conjunction with kernels. 
The kernel-based Perceptron not only can handle 
non-separable datasets but can also be utilized for 
efficiently classifying nonvectorial instances such 
as trees and strings (see, e.g., [5]). 

To implement the Perceptron in conjunction 
with kernels, one can utilize the fact that at each 
trial of the algorithm, the weight vector $w_t$ can be 
written as a linear combination of the instances 

$$w_t = \sum_{i \in I_t} y_i x_i,$$

where $I_t = \{i < t : \hat{y}_i \neq y_i\}$ is the set of 
indices of trials in which the Perceptron made a 
prediction mistake. Therefore, the prediction of 
the algorithm can be rewritten as 

$$\hat{y}_t = \mathrm{sign}\Bigl(\sum_{i \in I_t} y_i \langle x_i, x_t \rangle\Bigr),$$

and the update rule of the weight vector can 
be replaced with an update rule for the set of 
erroneous trials 

$$I_{t+1} = \begin{cases} I_t \cup \{t\} & \text{if } \hat{y}_t \neq y_t \\ I_t & \text{otherwise.} \end{cases} \qquad (2)$$

In the kernel-based Perceptron, the inner 
product $\langle x_i, x_j \rangle$ is replaced with a Mercer kernel 
function, $K(x_i, x_j)$, without any further changes 
to the algorithm (for a discussion on Mercer 
kernels, see, e.g., [15]). Intuitively, the kernel 
function $K(x_i, x_j)$ implements an inner product 
$\langle \phi(x_i), \phi(x_j) \rangle$ where $\phi$ is a nonlinear mapping 
from the original instance space into another 
(possibly high-dimensional) Hilbert space. Even 
if the original instances are not linearly separable, 
the images of the instances due to the 
nonlinear mapping $\phi$ can be linearly separable and 
thus the kernel-based Perceptron can handle 
non-separable datasets. Since the analysis of the 
Perceptron does not depend on the dimensionality of 
the instances, all of the formal results still hold 
when the algorithm is used in conjunction with 
kernel functions (Table 1). 


Perceptron Algorithm, Table 1 Correspondence between the standard Perceptron algorithm and the kernel-based 
Perceptron 

Online Perceptron 
INITIALIZATION: $w_1 = 0$ 
For $t = 1, 2, \ldots$ 
    Receive an instance $x_t$ 
    Predict an outcome $\hat{y}_t = \mathrm{sign}(\langle w_t, x_t \rangle)$ 
    Receive correct outcome $y_t \in \{+1, -1\}$ 
    Update: $w_{t+1} = w_t + y_t x_t$ if $\hat{y}_t \neq y_t$; $w_{t+1} = w_t$ otherwise 

Kernel-based online Perceptron 
INITIALIZATION: $I_1 = \emptyset$ 
For $t = 1, 2, \ldots$ 
    Receive an instance $x_t$ 
    Predict an outcome $\hat{y}_t = \mathrm{sign}\bigl(\sum_{i \in I_t} y_i K(x_i, x_t)\bigr)$ 
    Receive correct outcome $y_t \in \{+1, -1\}$ 
    Update: $I_{t+1} = I_t \cup \{t\}$ if $\hat{y}_t \neq y_t$; $I_{t+1} = I_t$ otherwise 
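
The kernel-based variant can be sketched by storing the set of erroneous trials instead of a weight vector. In this sketch the function names are hypothetical, the degree-2 polynomial kernel is just one possible choice of Mercer kernel, trials are indexed from 0, and sign(0) is taken to be −1 by assumption. The toy data are the four XOR-labeled points, presented repeatedly.

```python
def poly2_kernel(a, b):
    """Degree-2 polynomial kernel K(a, b) = (1 + <a, b>)^2."""
    return (1 + sum(ai * bi for ai, bi in zip(a, b))) ** 2


def kernel_perceptron(examples, kernel):
    """Run the kernel-based online Perceptron.

    Returns (I, predictions): the indices of erroneous trials and the
    prediction made at every trial.
    """
    erroneous = []                       # I_1 is empty
    predictions = []
    for t, (x, y) in enumerate(examples):
        score = sum(examples[i][1] * kernel(examples[i][0], x) for i in erroneous)
        y_hat = 1 if score > 0 else -1   # assumption: sign(0) = -1
        predictions.append(y_hat)
        if y_hat != y:
            erroneous.append(t)          # I_{t+1} = I_t with t added
    return erroneous, predictions


# XOR-style labels: not linearly separable in the original space.
xor_points = [((1, 1), -1), ((-1, -1), -1), ((1, -1), 1), ((-1, 1), 1)]
xor_errors, xor_predictions = kernel_perceptron(xor_points * 4, poly2_kernel)
```

With this kernel and presentation order, the fourth pass over the four points is classified without mistakes, whereas no single linear separator through the origin can label all four points correctly.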


Key Results 


In the following, a mistake bound for the 
Perceptron in the non-separable case (see, e.g., [10, 14]) 
is provided. 


Theorem Assume that the Perceptron is 
presented with the sequence of examples 
$(x_1, y_1), \ldots, (x_T, y_T)$ and denote 
$R = \max_t \|x_t\|_2$. Let $u$ be a unit length weight vector 
($\|u\|_2 = 1$), let $\gamma > 0$ be a scalar, and denote 

$$L = \sum_{t=1}^{T} \max\{0,\, 1 - y_t \langle u/\gamma, x_t \rangle\}.$$

Then, the number of prediction mistakes the 
Perceptron makes on the sequence of examples is at 
most 

$$L + \left(\frac{R}{\gamma}\right)^2 + \frac{R\sqrt{L}}{\gamma}.$$

Note that if there exist $u$ and $\gamma$ such that 
$y_t \langle u, x_t \rangle \geq \gamma$ for all $t$, then $L = 0$ and the 
above bound reduces to Novikoff’s bound, $(R/\gamma)^2$. 
Note also that the bound does not depend on 
the dimensionality of the instances. Therefore, 
it holds for the kernel-based Perceptron as well 
with $R = \max_t \sqrt{K(x_t, x_t)}$. 
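
As a small numerical illustration of the theorem, the bound can be evaluated for a given comparison vector u and margin parameter γ. The function names and the toy data below are hypothetical; the sketch does not check that u has unit length, which the theorem assumes.

```python
import math


def cumulative_hinge_loss(examples, u, gamma):
    """L = sum over t of max{0, 1 - y_t <u/gamma, x_t>}."""
    total = 0.0
    for x, y in examples:
        margin = y * sum(ui * xi for ui, xi in zip(u, x)) / gamma
        total += max(0.0, 1.0 - margin)
    return total


def mistake_bound(examples, u, gamma):
    """Evaluate L + (R/gamma)^2 + R*sqrt(L)/gamma for the given u and gamma."""
    loss = cumulative_hinge_loss(examples, u, gamma)
    r_squared = max(sum(xi * xi for xi in x) for x, _ in examples)  # R^2
    return loss + r_squared / gamma ** 2 + math.sqrt(r_squared * loss) / gamma


# Hypothetical toy data: separable by u = (1, 0) with margin 1 ...
separable = [((1.0, 0.0), 1), ((-1.0, 0.0), -1), ((2.0, 1.0), 1), ((-2.0, -1.0), -1)]
# ... and the same data with one contradicting (noisy) example appended.
noisy = separable + [((1.0, 0.0), -1)]

bound_separable = mistake_bound(separable, u=(1.0, 0.0), gamma=1.0)
bound_noisy = mistake_bound(noisy, u=(1.0, 0.0), gamma=1.0)
```

For the separable data L = 0 and the bound equals R^2/γ^2 = 5, which is Novikoff’s bound; the single noisy example contributes 2 to L and raises the bound accordingly.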


Applications 


So far the Perceptron has been viewed through the 
prism of online learning. Freund and Schapire [9] 
proposed a simple conversion of the Perceptron 
algorithm to the batch learning setting. A batch 
learning algorithm receives as input a training set 
of examples {(x1, y1),..., (x7, yr)} sampled in- 
dependently from an underlying joint distribution 
over the instance and label space. The algorithm 
is required to output a single classification func- 
tion which performs well on unseen examples as 
long as the unseen examples are sampled from 
the same distribution as the training set. The 
conversion of the Perceptron to the batch setting 
proposed by Freund and Schapire is called the 
voted Perceptron algorithm. The idea is to simply 
run the online Perceptron on the training set of 
examples, thus producing a sequence of weight 
vectors $w_1, \ldots, w_T$. Then, the single 
classification function to be used for unseen examples is a 
majority vote over the predictions of the weight 
vectors. That is, 

$$f(x) = \begin{cases} +1 & \text{if } |\{t : \langle w_t, x \rangle > 0\}| > |\{t : \langle w_t, x \rangle < 0\}| \\ -1 & \text{otherwise.} \end{cases}$$


It was shown (see again [9]) that if the number 
of prediction mistakes the Perceptron makes on 
the training set is small, then f(x) is likely to 
perform well on unseen examples as well. 
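
The majority vote can be sketched as follows. This is a simplified illustration of the voting rule stated above rather than the implementation of [9]; the function names and toy training set are hypothetical, and w_t denotes the weight vector used to predict at trial t, so w_1 is the zero vector.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weight_vector_sequence(examples):
    """Run the online Perceptron once and return [w_1, ..., w_T]."""
    w = [0.0] * len(examples[0][0])
    history = []
    for x, y in examples:
        history.append(list(w))              # w_t, used at trial t
        y_hat = 1 if dot(w, x) > 0 else -1   # assumption: sign(0) = -1
        if y_hat != y:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return history


def voted_predict(history, x):
    """f(x): +1 if more weight vectors vote positive than negative, else -1."""
    positive = sum(1 for w in history if dot(w, x) > 0)
    negative = sum(1 for w in history if dot(w, x) < 0)
    return 1 if positive > negative else -1


# Hypothetical training set.
train = [((1.0, 0.0), 1), ((-1.0, 0.0), -1), ((2.0, 1.0), 1), ((-2.0, -1.0), -1)]
votes = weight_vector_sequence(train)
```

Weight vectors with a zero inner product abstain from the vote, and a tie is resolved as −1, following the "otherwise" branch of the definition of f.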

Finally, it should be noted that the Perceptron 
algorithm was used for other purposes such as 
solving linear programs [3] and training 
support vector machines [14]. In addition, variants of 
the Perceptron were used for numerous additional 
problems such as online learning on a budget 
[4,8], multiclass categorization and ranking 
problems [6,7], and discriminative training for hidden 
Markov models [5]. 


Problem Definition 


Let $S = \{s_1, s_2, \ldots, s_n\}$ be a set of elements 
called objects and let $C = \{c_1, c_2, \ldots, c_m\}$ be 
a set of functions called characters such that 
each $c_j \in C$ is a function from $S$ to the set 
$\{0, 1, \ldots, r_j - 1\}$ for some integer $r_j$. For every 
$c_j \in C$, the set $\{0, 1, \ldots, r_j - 1\}$ is called the 
set of allowed states of character $c_j$, and for any 
$s_i \in S$ and $c_j \in C$, it is said that the state of $s_i$ 
on $c_j$ is $\alpha$, or that the state of $c_j$ for $s_i$ is $\alpha$, where 
$\alpha = c_j(s_i)$. The character state matrix for $S$ 
and $C$ is the $(n \times m)$-matrix in which entry $(i, j)$ 
for any $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$ 
equals the state of $s_i$ on $c_j$. 

In this encyclopedia entry, a phylogeny for $S$ 
is an unrooted tree whose leaves are bijectively 
labeled by $S$. For every $c_j \in C$ and 
$\alpha \in \{0, 1, \ldots, r_j - 1\}$, define the set $S_{c_j,\alpha}$ by 
$S_{c_j,\alpha} = \{s_i \in S : \text{the state of } s_i \text{ on } c_j \text{ is } \alpha\}$. A perfect 
phylogeny for $(S, C)$ (if one exists) is a 
phylogeny $T$ for $S$ such that the following holds: 
for each $c_j \in C$ and pair of allowed states $\alpha$, $\beta$ 
of $c_j$ with $\alpha \neq \beta$, the minimal subtree of $T$ 
that connects $S_{c_j,\alpha}$ and the minimal subtree of $T$ 
that connects $S_{c_j,\beta}$ are vertex-disjoint. See Fig. 1 
for an example. The Perfect Phylogeny Problem 
(also called the Character Compatibility Problem 
in the literature [2,9]) is the following: 


Problem 1 (The Perfect Phylogeny Problem) 


INPUT: An (n x m)-character state matrix M for 
some S and C. 

OUTPUT: A perfect phylogeny for (S, C), if one 
exists; otherwise, null. 


Below, we define $r = \max_{j \in \{1, 2, \ldots, m\}} r_j$ for 
the input character state matrix M. 
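
The definition of a perfect phylogeny can be checked directly for a given tree. The sketch below only verifies a candidate tree; it does not construct one and is not any of the algorithms cited in this entry. The adjacency-dictionary representation, the function names, and the four-object example are hypothetical, and the sketch assumes the leaves are already bijectively labeled by the objects.

```python
def minimal_subtree(adj, terminals):
    """Vertex set of the minimal subtree of a tree connecting `terminals`.

    Computed by repeatedly pruning vertices of degree at most 1 that are
    not terminals.
    """
    terminals = set(terminals)
    if not terminals:
        return set()
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and u not in terminals:
                    stack.append(u)
    return alive


def is_perfect_phylogeny(adj, states):
    """Check vertex-disjointness of the state subtrees for every character.

    `states` maps each object (leaf name) to its row of the character
    state matrix.
    """
    num_characters = len(next(iter(states.values())))
    for j in range(num_characters):
        groups = {}
        for obj, row in states.items():
            groups.setdefault(row[j], set()).add(obj)
        used = set()
        for members in groups.values():
            subtree = minimal_subtree(adj, members)
            if used & subtree:
                return False
            used |= subtree
    return True


# Hypothetical example: four objects, two binary characters.
rows = {"s1": (0, 0), "s2": (0, 1), "s3": (1, 1), "s4": (1, 1)}
good_tree = {"s1": ["u"], "s2": ["u"], "s3": ["v"], "s4": ["v"],
             "u": ["s1", "s2", "v"], "v": ["s3", "s4", "u"]}
bad_tree = {"s1": ["u"], "s3": ["u"], "s2": ["v"], "s4": ["v"],
            "u": ["s1", "s3", "v"], "v": ["s2", "s4", "u"]}
```

In the second tree the subtrees for the two states of the first character both pass through the internal vertices u and v, so it is not a perfect phylogeny for these rows.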


Key Results 


The following negative result was proved by 
Bodlaender, Fellows, and Warnow [2] and, inde- 
pendently, by Steel [14]: 


Theorem 1 ([2, 14]) The Perfect Phylogeny 
Problem is NP-hard. 


On the other hand, certain restrictions of the 
Perfect Phylogeny Problem can be solved 
efficiently. One such special case occurs if the 
number of allowed states of each character is 
limited. For this case, Agarwala and 
Fernández-Baca [1] designed a dynamic programming-based 
algorithm that builds perfect phylogenies on 
certain subsets of S called c-clusters (also referred 
to as proper clusters in [5, 10] and as 
character subfamilies in [6]) in a bottom-up fashion. 
Each c-cluster G has the property that: (1) G 
and S \ G share at most one state of each 
character; and (2) for at least one character, G 
and S \ G share no states. The number of 
c-clusters is at most $2^r m$, and the algorithm’s 
total running time is $O(2^{3r}(nm^3 + m^4))$, i.e., 
exponential in r. Hence, the Perfect Phylogeny 
Problem is polynomial-time solvable when the 
number of allowed states of every character is 
upper-bounded by O(log(m + n)). Subsequently, 
Kannan and Warnow [10] presented a modified 
algorithm with improved running time. They 
restructured the algorithm of [1] to eliminate one 
of the three nested loops that step through all 
possible c-clusters and added a preprocessing 
step which speeds up the innermost loop. The 
resulting time complexity is given by: 


Theorem 2 ([10]) The algorithm of Kannan and 
Warnow in [10] solves the Perfect Phylogeny 
Problem in $O(2^{2r} n m^2)$ time. 


A perfect phylogeny T for (S,C) is called 
minimal if no tree which results by contracting 
an edge of T is a perfect phylogeny for (S, C). 
In [10], Kannan and Warnow also showed how to 
extend their algorithm to enumerate all minimal 
perfect phylogenies for (S,C) by constructing a 
directed acyclic graph that implicitly stores the 
set of all perfect phylogenies for (S, C). 


Theorem 3 ([10]) The extended algorithm of 
Kannan and Warnow in [10] enumerates the set 
of all minimal perfect phylogenies for (S,C) so 
that the maximum computation time between two 
consecutive outputs is $O(2^{2r} n m^2)$. 


Perfect Phylogeny (Bounded Number of States), Fig. 
1 (a) An example of a character state matrix M for 
$S = \{s_1, s_2, \ldots, s_6\}$ and $C = \{c_1, c_2, c_3\}$ with 
$r_1 = 3$, $r_2 = 4$, and $r_3 = 2$, i.e., $r = 4$. (b) A perfect 
phylogeny for (S,C). For convenience, the states of all 
three characters for each object in S are shown 


Perfect Phylogeny (Bounded Number of 
States), Table 1 The running times of the fastest 
known algorithms for the Perfect Phylogeny Problem 
with a bounded number of states 

r      Running time                    Reference 
2      $O(nm)$                         [11] together with [7] 
3      $\min\{O(nm^2), O(n^2 m)\}$     [3, 10] together with [9] 
4      $\min\{O(nm^2), O(n^2 m)\}$     [10] together with [9] 
≥ 5    $O(2^{2r} n m^2)$               [10] 


For small values of r, even faster algorithms 
are known. See Table 1 for a summary. 
If r = 2, then the problem can be solved 
in O(nm) time by reducing it to the Directed 
Perfect Phylogeny Problem for Binary Characters 
(see, e.g., Encyclopedia entry » Directed Perfect 
Phylogeny (Binary Characters) for a definition 
of this variant of the problem) using O(nm) 
time [7,11] and then applying Gusfield’s 
O(nm)-time algorithm [7]. If r = 3 or r = 4, the 
problem is solvable in $O(n^2 m)$ time by another 
algorithm by Kannan and Warnow [9], which is 
faster than the algorithm from Theorem 2 when 
n < m. Also note that for the case r = 3, there 
exists an older algorithm by Dress and Steel [3] 
whose time complexity coincides with that of the 
algorithm in Theorem 2. 

For other special cases of the Perfect Phy- 
logeny Problem that can be solved efficiently, see 
Encyclopedia entry » Directed Perfect Phylogeny 
(Binary Characters) or the survey by Fernandez- 
Baca [5]. 


Applications 


Computational evolutionary biology relies on 
efficient methods for inferring, from some given 
data, a phylogenetic tree that accurately describes 
the evolutionary relationships among a set of 
objects (e.g., biological species, proteins, or 
genes) assumed to have been produced by an 
evolutionary process. One of the most widely 
used techniques for reconstructing a phylogenetic 
tree is to represent the objects as vectors of 
character states and look for a tree that clusters 
objects which have a lot in common. The Perfect 
Phylogeny Problem can be regarded as the ideal 
special case of this approach in which the given 
data contains no errors, evolution is treelike, and 
each character state can emerge only once in the 
evolutionary history. 

However, data obtained experimentally 
seldom admits a perfect phylogeny, so various 
optimization versions of the problem such as 
maximum parsimony and maximum compatibility 
are often considered in practice. These strategies 
generally lead to NP-complete problems, 
but there exist heuristics that work well 
for most inputs. See, e.g., [4, 5, 12] for a 
discussion. Nevertheless, algorithms for the 
Perfect Phylogeny Problem may be useful 
even when the data does not admit a perfect 
phylogeny, for example, if there exists a perfect
phylogeny for m − O(1) of the characters in C.
In fact, in one crucial step of their proposed 
character-based methodology for determining 
the evolutionary history of a set of related natural 
languages, Warnow, Ringe, and Taylor [15] 
consider all subsets of C in decreasing order of 
cardinality, repeatedly applying the algorithm 
of [10] until a largest subset of C which 
admits a perfect phylogeny is found. The ideas 
behind the algorithms of [1] and [10] have 
also been utilized and extended by Fernandez- 
Baca and Lagergren [6] in their algorithm for 
computing near-perfect phylogenies in which 
the constraints on the output have been relaxed 
in order to permit non-perfect phylogenies 
whose so-called penalty score is less than or
equal to a prespecified parameter q; see [6] for
details. (See also [13] for a fixed-parameter 
tractable algorithm for this problem variant 
when r = 2.) 
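
The subset search described above can be sketched as follows. This is only an illustration of the "decreasing order of cardinality" strategy, not the implementation used in [15]: the function name `largest_compatible_subset` and the callable `admits_perfect_phylogeny` are hypothetical, and the callable stands in for any decision procedure for the Perfect Phylogeny Problem (for example, the algorithm of [10]).

```python
from itertools import combinations


def largest_compatible_subset(characters, admits_perfect_phylogeny):
    """Return a largest subset of `characters` accepted by the oracle.

    `admits_perfect_phylogeny` is an assumed callable that takes a tuple
    of characters and returns True if they admit a perfect phylogeny.
    Subsets are tried in decreasing order of cardinality, so the first
    accepted subset has maximum size. The search is exponential in the
    number of discarded characters.
    """
    chars = list(characters)
    for size in range(len(chars), -1, -1):
        for subset in combinations(chars, size):
            if admits_perfect_phylogeny(subset):
                return list(subset)
    return []
```

When only O(1) characters have to be discarded, the number of oracle calls stays polynomial in m, which is why a fast decision algorithm matters here.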

The motivation for considering a bounded 
number of states is that characters based on 
directly observable traits are, by the way they 
are defined, naturally bounded by some small 
number (often 2). When biomolecular data is
used to define characters, the number of allowed
states is typically bounded by a constant, e.g.,
r = 2 for SNP markers, r = 4 for DNA or RNA
sequences, or r = 20 for amino acid sequences
(see also Encyclopedia entry » Directed Perfect
Phylogeny (Binary Characters)). Moreover,
characters with r = 2 can be useful in 
comparative linguistics [8]. 


Open Problems 


An open problem is to determine whether the
time complexity of the algorithm of Kannan and
Warnow [10] can be improved. As noted in [5], it
would be interesting to find out if the Perfect
Phylogeny Problem is solvable in $O(2^{2r} nm)$
time for any r, or more generally, in
$O(f(r) \cdot nm)$ time, where f is a function of
r which does not depend on n or m, since this
would match the fastest known algorithm for the
special case r = 2 (see Table 1). Another open
problem is to establish lower bounds on the
computational complexity of the Perfect Phylogeny
Problem with a bounded number of states.


Problem Definition


In the context of the perfect phylogeny
haplotyping (PPH) problem, each vector
$h \in \{0,1\}^m$ is called a haplotype, while
each vector $g \in \{0,1,2\}^m$ is called a
genotype. Haplotypes are binary encodings of DNA
sequences,
while genotypes are ternary encodings of pairs 
of DNA sequences (one sequence for each 
of the two homologous copies of a certain 
chromosome). 

Two haplotypes $h'$ and $h''$ are said to resolve
a genotype g if, at each position j: (i) if
$g_j \in \{0,1\}$ then both $h'_j = g_j$ and
$h''_j = g_j$; (ii) if $g_j = 2$ then either
$h'_j = 0$ and $h''_j = 1$, or $h'_j = 1$ and
$h''_j = 0$. If $h'$ and $h''$ resolve g, we
write $g = h' + h''$. An instance of the PPH
problem consists of a set
$G = \{g^1, g^2, \ldots, g^n\}$ of genotypes. A
set H of haplotypes such that, for each
$g \in G$, there are $h', h'' \in H$ with
$g = h' + h''$, is called a resolving set for G.
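
The definition of "resolve" translates directly into a position-by-position check. The sketch below is a minimal illustration; the function names `resolves` and `is_resolving_set` and the tuple-of-integers representation of haplotypes and genotypes are assumptions made here, not notation from the entry.

```python
def resolves(h1, h2, g):
    """Return True if haplotypes h1 and h2 resolve genotype g.

    h1 and h2 are sequences over {0, 1}; g is a sequence over {0, 1, 2}.
    """
    if not (len(h1) == len(h2) == len(g)):
        return False
    for a, b, gj in zip(h1, h2, g):
        if gj in (0, 1):
            # Condition (i): both haplotypes carry the homozygous allele.
            if a != gj or b != gj:
                return False
        elif gj == 2:
            # Condition (ii): the haplotypes carry different alleles.
            if {a, b} != {0, 1}:
                return False
        else:
            raise ValueError("genotype entries must be 0, 1, or 2")
    return True


def is_resolving_set(H, G):
    """Return True if every genotype in G is resolved by a pair from H."""
    H = [tuple(h) for h in H]
    return all(
        any(resolves(h1, h2, g) for h1 in H for h2 in H)
        for g in G
    )
```

Note that the pair may consist of the same haplotype twice, which is what happens for a genotype with no entry equal to 2.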

A perfect phylogeny for a set H of haplotypes 
is a rooted tree T for which 


• the set of leaves is H and the root is labeled by
some binary vector r;

• each index $j \in \{1, \ldots, m\}$ labels exactly one
edge of T;

• if an edge e is labeled by an index k, then, for
each leaf h that can be reached from the root
via a path through e, it is $h_k \neq r_k$.


Without loss of generality, it can be assumed
that the vector labeling the root is r = 0. Within
the PPH problem, T is meant to represent the
evolution of the sequences at the leaves from
a common ancestral sequence (the root). Each
edge labeled with an index represents a point
in time when a mutation happened at a specific
site. This model of evolution is also known as
coalescent [11]. It can be shown that a perfect
phylogeny for H exists if and only if for all
choices of four haplotypes
$h^1, \ldots, h^4 \in H$ and two indices i, j,


$\{ h^a_i h^a_j : 1 \leq a \leq 4 \} \neq \{00, 01, 10, 11\}$.
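
The condition above can be checked by brute force over all pairs of indices. The following is a minimal sketch under the assumption that haplotypes are given as equal-length tuples over {0, 1}; the function name `passes_four_gamete_test` is hypothetical, and the quadratic scan over index pairs is chosen for clarity, not speed.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of indices exhibits all of 00, 01, 10, 11.

    For each pair of indices (i, j), collect the set of value pairs
    (h_i, h_j) over all haplotypes h. The condition fails exactly when
    some pair of indices shows all four combinations.
    """
    H = [tuple(h) for h in haplotypes]
    if not H:
        return True
    m = len(H[0])
    for i, j in combinations(range(m), 2):
        gametes = {(h[i], h[j]) for h in H}
        if len(gametes) == 4:
            return False
    return True
```

The running time of this sketch is O(n m^2) for n haplotypes of length m.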


Given the above definitions, the problem sur- 
veyed in this entry is the following: 


Perfect Phylogeny Haplotyping Problem 
(PPH) 

Given a set G of genotypes, find a resolving set H 
of haplotypes and a perfect phylogeny T for H, or 
determine that such a resolving set does not exist. 

In a slightly different version of the above
problem, one asks for all perfect phylogenies
for H instead of just one (in fact, all known
algorithms for PPH do find all perfect
phylogenies).

The perfect phylogeny haplotyping problem was
introduced by Gusfield [7], who also proposed
a nearly linear-time $O(nm\,\alpha(nm))$ algorithm
for its solution (where $\alpha(\cdot)$ is the
extremely slowly growing inverse Ackermann
function). The algorithm relied on a reduction to
a complex procedure for the graph realization
problem (Bixby and Wagner [2]), which is very
difficult to understand and implement.
Later approaches for PPH proposed much
simpler, albeit slower, $O(nm^2)$ algorithms
(Bafna et al. [1]; Eskin et al. [6]). However,
a major question was left open: does there exist
a linear-time algorithm for PPH? In [7], Gusfield
conjectured that this should be the case. The
2005 algorithm by Ding, Filkov, and Gusfield [5]
surveyed in this entry settles the above conjecture
in the affirmative.


Key Results 


The main idea of the algorithm is to find the
maximal sub-graphs that are common to all PPH
solutions. Let us call a P-class a maximal
sub-graph common to all PPH trees for G. The
authors show that each P-class consists of two
sub-trees which, in each PPH solution, can appear
in either one of two possible ways (called flips
of the P-class) with respect to any fixed P-class
taken as a reference. Hence, if there are k
P-classes, there are $2^{k-1}$ distinct PPH
solutions.

The algorithm resorts to an original and effec- 
tive data structure, called the shadow tree, which 
gives an implicit representation of all P-classes. 
The data structure is built incrementally, by pro- 
cessing one genotype at a time. The total cost for
building and updating the shadow tree is linear in
the input size (i.e., in nm). A detailed description
of the shadow tree requires a rather large number 
of definitions, possibly accompanied by figures 
and examples. Here, we will introduce only its 
basic features, those that allow us to state the 
main theorems of [5]. 

The shadow tree is a particular type of directed
rooted tree, which contains both edges and links
(strictly speaking, the latter are just arcs, but they
are called links to underline their specific use in
the algorithm). The edges are of two types,
tree-edges and shadow-edges, and are associated
with the indices {1,...,m}. For each index i,
there is a tree-edge labeled $t_i$ and a
shadow-edge labeled $s_i$. Both edges and links
are oriented, with their head closer to the root
than their tail. Other than the root, each node of
the shadow tree is the endpoint of exactly one
tree-edge or one shadow-edge (while the root is
the head of two “dummy” links). The links are
used to connect certain tree-edges and
shadow-edges. A link can be either free or
fixed. The head of a free link can still be changed
during the execution of the algorithm, but once
a link is fixed, it cannot be changed any more.

Tree-edges, shadow-edges and fixed links are
organized into classes, which are sub-graphs of
the shadow tree. Each fixed link is contained
in exactly one class, while each free link
connects one class to another, called its parent.
For each index i, if the tree-edge $t_i$ is in a
class X, then the shadow-edge $s_i$ is in X as
well, so that a class can be seen as a pair of
“twin” sub-trees of the shadow tree. The free
links point out from the roots of the sub-trees
(the class roots). Classes change during the
running of the algorithm. Specifically, classes
are created (containing a single tree-edge and
shadow-edge) when a new genotype is processed;
a class can be merged with its parent, by fixing
a pair of free links; finally, a class can be
flipped, by switching the heads of the two free
links that connect the class roots to the parent
class.

A tree T is said to be “contained in” a shadow 
tree if T can be obtained by flipping some classes 
in the shadow tree, followed by contracting all 
links and shadow-edges. Let us call contraction 
of a class the sub-graph (consisting of a pair 
of sub-trees, made of tree-edges only) that is 
obtained from a class X of the shadow tree by 
contracting all fixed links and shadow-edges of 
X. The following are the main results obtained 
in [5]: 


Proposition 1 Every P-class can be obtained 
by contraction of a class of the final shadow 
tree produced by the algorithm. Conversely, every 
contraction of a class of the final shadow tree is 
a P-class. 


Theorem 1 Every PPH solution is contained 
in the final shadow tree produced by the algo- 
rithm. Conversely, every tree contained in the 
final shadow tree is a distinct PPH solution. 


Theorem 2 The total time required for building 
and updating the shadow tree is O(nm). 


Applications 


The PPH problem arises in the context of single
nucleotide polymorphism (SNP) analysis in
human genomes. A SNP is the site of a single
nucleotide which varies in a statistically
significant way in a population. The determination
of SNP locations and of common SNP patterns
(haplotypes) is of the utmost importance. In fact,
SNP analysis is used to understand the nature
of several genetic diseases, and the international
Haplotype Map Project is devoted to SNP study
(Helmuth [9]).

The values that a SNP can take are called
its alleles. Almost all SNPs are bi-allelic, i.e., 
out of the four nucleotides A, C, T, G, only two 
are observed at any SNP. Humans are diploid 
organisms, with DNA organized in pairs of chro- 
mosomes (of paternal and of maternal origin). 
The sequence of alleles on a chromosome copy 
is called a haplotype. Since SNPs are bi-allelic, 
haplotypes can be encoded as binary strings. For 
a given SNP, an individual can be either homozy- 
gous, if both parents contributed the same allele, 
or heterozygous, if the paternal and maternal 
alleles are different. 

Haplotyping an individual consists of deter- 
mining his two haplotypes. Haplotyping a pop- 
ulation consists of haplotyping each individual of 
the population. While it is today economically 
infeasible to determine the haplotypes directly, 
there is a cheap experiment which can determine 
the (less informative and often ambiguous) geno- 
types. 

A genotype of an individual contains the
conflated information about the two haplotypes.
For each SNP, the genotype specifies which are the
two (possibly identical) alleles, but does not
specify their origin (paternal or maternal). The
ternary encoding that is used to represent a
genotype g has the following meaning: at each SNP
j, it is $g_j = 0$ (respectively, 1) if the
individual is homozygous for the allele 0
(respectively, 1), and $g_j = 2$ if the individual
is heterozygous. There may be many possible pairs
of haplotypes that justify a particular genotype
(there are $2^{k-1}$ pairs of haplotypes that can
resolve a genotype with k heterozygous SNPs).
Given a set of genotypes, in order to determine
the correct resolving set out of the exponentially
many possibilities, one imposes some “biologically
meaningful” constraints that the solution must
possess. The perfect phylogeny model (coalescent)
requires that the resolving set must fit a
particular type of evolutionary tree. That is, all
haplotypes should descend from some ancestral
haplotype, via mutations that happened (only once)
at specific sites over time. The coalescent model
is accurate especially for short haplotypes (for
longer haplotypes there is also another type of
evolutionary event, recombination, that should be
taken into account).
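
The count of resolving pairs can be made concrete by enumerating them. The sketch below is an illustration only; the function name `resolving_pairs` and the tuple representation are assumptions. To avoid listing each unordered pair twice, the first heterozygous position is pinned to 0 in the first haplotype, which leaves a free binary choice at each of the remaining k - 1 heterozygous positions (for k at least 1).

```python
from itertools import product


def resolving_pairs(g):
    """List the unordered haplotype pairs that resolve genotype g.

    g is a sequence over {0, 1, 2}. With k >= 1 heterozygous positions
    (entries equal to 2) the result has 2**(k - 1) pairs; with k = 0
    the only pair is (g, g).
    """
    het = [j for j, gj in enumerate(g) if gj == 2]
    if not het:
        h = tuple(g)
        return [(h, h)]
    first, rest = het[0], het[1:]
    pairs = []
    for bits in product((0, 1), repeat=len(rest)):
        h1, h2 = list(g), list(g)
        # Pin the first heterozygous site to break the pair symmetry.
        h1[first], h2[first] = 0, 1
        for j, b in zip(rest, bits):
            h1[j], h2[j] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs
```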

The linear-time PPH algorithm is of signifi- 
cant practical value in two respects. First, there 
are instances of the problem where the number 
of SNPs considered is fairly large (genotypes 
can extend over several kilo-bases). For these 
long instances, the advantage of an O(nm)
algorithm with respect to the previous $O(nm^2)$
approach is evident. On the other hand, when
genotypes are relatively short, the benefit of us- 
ing the linear-time algorithm is not immediately 
evident (both algorithms run extremely quickly). 
Nevertheless, there are situations in which one 
has to solve a large set of haplotyping problems, 
where each single problem is defined over short 
genotypes. For instance, this is the case in which 
one examines all “small” subsets of SNPs in order 
to determine the subsets for which there is a PPH 
solution. In this type of application, the gain of 
efficiency with the use of the linear-time PPH 
algorithm is significant (Chung and Gusfield [4]; 
Wiuf [14]). 


Open Problems 


A linear-time algorithm is the best possible for 
PPH, and no open problems are listed in [5]. 


Experimental Results 


The algorithm has been implemented in C and 
its performance has been compared with the 
previous fastest PPH algorithm, i.e., DPPH 
(Bafna et al. [1]). In the case of m = 2000 
and n = 1000, the algorithm is about 250-times 
faster than DPPH, and is capable of solving an 
instance in an average time of 2 s, versus almost 
8 min needed by DPPH (on a “standard” 2005 
Personal Computer). The smaller instances (e.g., 
with m = 50 SNPs) are such that the superior 
performance of the algorithm is not as evident, 
with an average running time of 0.07 s versus 
0.2 s. However, as already remarked, when the
small instances are executed within a loop, the
speed-up is again two or more orders of
magnitude.


Data Sets 


The data sets used in [5] have been generated 
by the program ms (Hudson [12]), which is the 
widely used standard for instance generation re- 
flecting the coalescent model of SNP sequence 
evolution. Real-life instances can be found at the 
HapMap web site http://www.hapmap.org. 


URL to Code 


http://wwwesif.cs.ucdavis.edu/~gusfield/Ipph/


Problem Definition


Circuit partitioning consists of dividing the cir- 
cuit into parts, each of which can be imple- 
mented as a separate component (e.g., a chip) 
that satisfies the design constraints. The work of 
Rajaraman and Wong [5] considers the problem 
of dividing a circuit into components, subject to 
area constraints, such that the maximum delay at 
the outputs is minimized. 

A combinational circuit can be represented as 
a directed acyclic graph G = (V, E), where V is 
the set of nodes and E is the set of directed edges.
Each node represents a gate in the network and 
each edge (u, v) in E represents an interconnec- 
tion between gates u and v in the network. The 
fanin of a node is the number of edges incident 
into it, and the fanout of a node is the number of 
edges incident out of it. A primary input (PI) is a 
node with fanin 0, while a primary output (PO) is 
a node with fanout 0. Each node has a weight and 
a delay associated with it. 
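
As a small illustration of these definitions, the fanin, fanout, primary inputs, and primary outputs of a network can be computed in one pass over the edges. The function name `primary_inputs_outputs` and the edge-list representation are assumptions made for this sketch.

```python
def primary_inputs_outputs(nodes, edges):
    """Compute fanin, fanout, PIs, and POs of a directed acyclic graph.

    `nodes` is an iterable of hashable node names and `edges` is an
    iterable of (u, v) pairs meaning an interconnection from u to v.
    """
    nodes = list(nodes)
    fanin = {v: 0 for v in nodes}
    fanout = {v: 0 for v in nodes}
    for u, v in edges:
        fanout[u] += 1
        fanin[v] += 1
    primary_inputs = [v for v in nodes if fanin[v] == 0]
    primary_outputs = [v for v in nodes if fanout[v] == 0]
    return fanin, fanout, primary_inputs, primary_outputs
```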


Definition 1 A clustering of a network G =
(V, E) is a triple $(H, \phi, \Sigma)$, where


1. $H = (V', E')$ is a directed acyclic graph.
2. $\phi$ is a function mapping $V'$ to $V$ such that
   ◦ For every edge $(u', v') \in E'$,
     $(\phi(u'), \phi(v')) \in E$.
   ◦ For every node $v' \in V'$ and edge
     $(u, \phi(v')) \in E$, there exists a unique
     $u' \in V'$ such that $\phi(u') = u$ and
     $(u', v') \in E'$.
   ◦ For every PO node $v \in V$, there exists a
     unique $v' \in V'$ such that $\phi(v') = v$.
3. $\Sigma$ is a partition of $V'$.


Let $\Gamma = (H = (V', E'), \phi, \Sigma)$ be a
clustering of G. For $v \in V$, $v' \in V'$, if
$\phi(v') = v$, we call $v'$ a copy of v. The set
$V'$ consists of all the copies of the nodes in V
that appear in the clustering. A node $v'$ is a PI
(respectively, PO) in $\Gamma$ if $\phi(v')$ is a
PI (respectively, PO) in G. It follows from the
definition of $\phi$ that H is logically
equivalent to G.


The weights and delays on the individual
nodes in G yield weights and delays of nodes in
H and a delay for the clustering $\Gamma$. The
weight (respectively, delay) of a node $v'$ in
$V'$ is the weight (respectively, delay) of
$\phi(v')$. The weight of any cluster
$C \in \Sigma$, denoted by W(C), is the sum of
the weights of the nodes in C. The delay of a
clustering is given by the general delay model of
Murgai et al. [3], which is as follows. The delay
of an edge $(u', v') \in E'$ is D (which is a
given parameter) if $u'$ and $v'$ belong to
different elements of $\Sigma$ and zero otherwise.
The delay along a path in H is simply the sum of
the delays of the edges of the path. Finally, the
delay of $\Gamma$ is the delay of a maximum-delay
path in H, among all the paths from a PI node to a
PO node in H.
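
Given a clustering, its delay can be evaluated with a longest-path computation over the directed acyclic graph H. The sketch below is an illustration and not the optimization algorithm of [5]; the function name `clustering_delay`, the dictionary `cluster_of`, and the optional `node_delay` argument are assumptions. With `node_delay` omitted, the path delay is the sum of edge delays exactly as stated above; passing node delays adds each gate delay along the path, which is an assumption about how gate delays enter the model.

```python
from collections import deque


def clustering_delay(nodes, edges, cluster_of, D, node_delay=None):
    """Delay of a clustering: the maximum-delay PI-to-PO path in H.

    An edge (u, v) costs D if u and v lie in different clusters and 0
    otherwise. `node_delay` (assumed, optional) maps nodes to
    non-negative gate delays; missing nodes count as 0.
    """
    nodes = list(nodes)
    node_delay = node_delay or {}
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: process nodes in topological order.
    best = {v: node_delay.get(v, 0) for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            edge = D if cluster_of[u] != cluster_of[v] else 0
            best[v] = max(best[v], best[u] + edge + node_delay.get(v, 0))
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Primary outputs are the nodes without successors.
    return max((best[v] for v in nodes if not succ[v]), default=0)
```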


Definition 2 Given a combinational network
G = (V, E) with weight function
$w : V \to \mathbb{R}^+$, weight capacity M, and a
delay function $\delta : V \to \mathbb{R}^+$, we
say that a clustering $\Gamma = (H, \phi, \Sigma)$
is feasible if for every cluster $C \in \Sigma$,
W(C) is at most M. The circuit clustering problem
is to compute a feasible clustering $\Gamma$ of G
such that the delay of $\Gamma$ is minimum among
all feasible clusterings of G.


An early work of Lawler et al. [2] presented a
polynomial-time optimal algorithm for the circuit
clustering problem in the special case where all
the gate delays are zero (i.e., $\delta(v) = 0$
for all v).


Key Results 


Rajaraman and Wong [5] presented an optimal 
polynomial-time algorithm for the circuit cluster- 
ing problem under the general delay model. 


Theorem 1 There exists an algorithm that
computes an optimal clustering for the circuit
clustering problem in $O(n^2 \log n + nm)$ time,
where n and m are the numbers of vertices and
edges, respectively, of the given combinational
network.


This result can be extended to compute
optimal clusterings under any monotone clustering
constraint. A clustering constraint is monotone
if any connected subset of nodes in a feasible
cluster is also feasible [2].

Theorem 2 The circuit clustering problem can
be solved optimally under any monotone cluster- 
ing constraint in time polynomial in the size of the 
circuit. 


Applications 


Circuit partitioning/clustering is an important 
component of very large scale integration design. 
One application of the circuit clustering problem 
formulated above is to implement a circuit on 
multiple field programmable gate array chips. 
The work of Rajaraman and Wong focused on 
clustering combinational circuits to minimize 
delay under area constraints. Related studies 
have considered other important constraints, 
such as pin constraints [1] and a combination 
of area and pin constraints [6]. Further 
work has also included clustering sequential 
circuits (as opposed to combinational circuits) 
with the objective of minimizing the clock 
period [4]. 


Experimental Results 


Rajaraman and Wong reported experimental re- 
sults on five ISCAS (International Symposium 
on Circuits and Systems) circuits. The number 
of nodes in these circuits ranged from 196 to 
913. They reported the maximum delay of the 
clusterings and running times of their algorithm 
on a Sun Sparc workstation.


Problem Definition


Permutation Enumeration 

Let $S_n$ be the set of permutations of [n] =
{1,2,...,n}. We write a permutation as a
sequence of elements in [n] such that each
element appears exactly once. Permutation
enumeration is the task of listing all
permutations in $S_n$. For
example, there are 24 permutations of [4]: 1234,
1243, 1324, 1342, 1423, 1432, 2134, 2143, 2314, 
2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 
3421, 4123, 4132, 4213, 4231, 4312, 4321. The 
enumeration of permutations is a basic and long- 
standing enumeration problem, and it was sur- 
veyed by Sedgewick [8]. 

Note that the above example lists all the 
permutations of [4], and we have listed them 
in lexicographic order, which is the most 
natural way to enumerate them. The purpose 
of this paper is to introduce representative 
methods for enumeration problems by showing 
how these methods are applied to permutation 
enumeration. 


Efficiency of Enumeration Algorithms 

The efficiency of an algorithm is measured by 
the time and space complexity for a given input 
size. However, many enumeration problems have 
an exponential number of outputs (solutions) for 
a given input. Hence, an enumeration algorithm 
may require exponential time in order to output 
the solutions. Typically, an enumeration algo- 
rithm is measured by the “delay time.” We say 
that an enumeration algorithm has delay d if 
(1) it takes at most d time to output the first 
object, and (2) it takes at most d time between 
two consecutive outputs. See [1, 3,4] for further 
details. Note that the delay time does not include 
the time required to output the objects, since this 
is typically ignored when estimating the time 
complexity of an enumeration algorithm. The 
space complexity of an enumeration algorithm 
is an estimate of the amount of working mem- 
ory required by the algorithm (as in the usual 
sense). 


Key Results 


Enumeration by Partition Search 

In the partition search enumeration method,
objects are listed by repeatedly partitioning the
set of objects. As an example, we will apply the
partition search method to permutation
enumeration. We will partition $S_n$ by fixing
the first element of a permutation. Denote by
$S_n(i) \subseteq S_n$ the set of permutations in
which i in [n] is the first element. Then, $S_n$
is partitioned into $S_n(1), S_n(2), \ldots,
S_n(n)$. Hence, if we have the list of all
permutations of [n] \ {i} for each i =
1,2,...,n, then we can enumerate all permutations
in $S_n$ by prepending i as the first element to
every permutation. This recursive structure gives
the algorithm shown as Algorithm 1. To begin,
Algorithm 1 is called with the empty sequence
and the empty set. The algorithm recursively
fixes the first element, and we then obtain all
permutations in $S_n$. Figure 1 illustrates a
tree structure of recursive calls of the
algorithm. The root corresponds to the empty
sequence. Each vertex of the tree corresponds to
the prefix of a permutation, and it is obtained by
removing the last element of its child. The leaves
correspond to the permutations in $S_n$.
Algorithm 1 traverses the tree structure in a
depth-first manner.


Algorithm 1: PARTITION-SEARCH($\pi$, S)

/* $\pi$ is the current subpermutation, and S is
   the set of elements in $\pi$ */
if S = [n] then     /* $\pi$ is a permutation in $S_n$ */
    Output $\pi$;
    return;
foreach $i \in [n] \setminus S$ do
    PARTITION-SEARCH($\pi + i$, $S \cup \{i\}$);
    /* The operation '+' is a concatenation */
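
The partition search just described can be sketched as a recursive generator. This is a minimal illustration of Algorithm 1, with the assumption that the prefix is held as a tuple and the set S as a frozenset; the name `partition_search` is chosen here for the sketch. Because candidates are tried in increasing order, the permutations come out in lexicographic order.

```python
def partition_search(n, prefix=(), used=frozenset()):
    """Yield all permutations of {1, ..., n} that extend `prefix`.

    `prefix` is the current subpermutation and `used` is the set of
    its elements, mirroring the arguments of Algorithm 1.
    """
    if len(prefix) == n:
        # The prefix is a complete permutation: output it.
        yield prefix
        return
    for i in range(1, n + 1):
        if i not in used:
            # Fix i as the next element and recurse on the remainder.
            yield from partition_search(n, prefix + (i,), used | {i})
```

Calling `list(partition_search(4))` produces the 24 permutations of [4] in the order given in the Problem Definition.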

We now estimate the running time of
Algorithm 1. Let T be the running time for
traversing an edge of the search tree. Since the
depth of the search tree is n, the delay of the
algorithm is O(nT) in the worst case.


Theorem 1 One can enumerate all permutations
by the partition search method. The delay
of the algorithm is O(nT), where T is the time
for traversing an edge of the search tree. The
required working space is O(n).


Enumeration by Gray Code 

A combinatorial Gray code is a list of all of
the objects in some class such that two
consecutive objects in the list differ only by a
small amount. Since the list contains all of the
objects, an algorithm that generates a
combinatorial Gray code can be regarded as an
enumeration algorithm. There are combinatorial
Gray codes for various combinatorial objects, and
these have been surveyed [7]. For permutations,
the difference between two consecutive objects is
a swap of two adjacent elements, where we say that
the i-th and the (i + 1)-th elements of a
permutation are adjacent. The well-known
Steinhaus-Johnson-Trotter algorithm (or
Johnson-Trotter algorithm) [5, 8, 10, 11]
generates such a combinatorial Gray code for the
permutations of [n], and it can therefore be
regarded as an algorithm for enumerating
permutations.


Permutation Enumeration, Fig. 1 Search tree of the partition search method
(figure not reproduced; the root is the empty sequence, each internal
vertex is a prefix of a permutation, and the 24 leaves are the
permutations of [4] in lexicographic order)

Let $\pi = p_1 p_2 \ldots p_{n-1}$ be in
$S_{n-1}$, and denote by
$\pi(i) = p_1 p_2 \ldots p_i \, n \, p_{i+1} \ldots p_{n-1}$
for $i = 0, 1, \ldots, n-1$ the permutation
obtained from $\pi$ by inserting n between $p_i$
and $p_{i+1}$. Then, the list
$\pi(0), \pi(1), \ldots, \pi(n-1)$ or
$\pi(n-1), \pi(n-2), \ldots, \pi(0)$ is a
combinatorial Gray code for a subset of $S_n$.
Such lists can be defined for all permutations in
$S_{n-1}$, and the lists for all permutations in
$S_{n-1}$ contain all permutations in $S_n$.
Assume that we have a combinatorial Gray code for
$S_{n-1}$. Let $\pi_i$ be the i-th permutation in
the list. Then, we construct the list
$\pi_i(0), \pi_i(1), \ldots, \pi_i(n-1)$ if i is
even, and $\pi_i(n-1), \pi_i(n-2), \ldots,
\pi_i(0)$ if i is odd. The obtained list is a
combinatorial Gray code for $S_n$. Note that if i
is even, then $\pi_{i+1}(n-1)$ is obtained from
$\pi_i(n-1)$ by swapping two adjacent elements,
where $\pi_{i+1}$ is the (i + 1)-th permutation in
a combinatorial Gray code for $S_{n-1}$.
Similarly, if i is odd, then $\pi_{i+1}(0)$ is
obtained from $\pi_i(0)$ by swapping two adjacent
elements. By recursively applying this idea, we
can design a combinatorial Gray code for $S_n$.
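
The recursive construction can be written down almost verbatim. The sketch below is a minimal illustration that builds the whole list in memory; the function name `gray_code_by_insertion` is an assumption, and positions in the list for $S_{n-1}$ are counted from 1 so that odd positions sweep n from the right end to the left end and even positions sweep it back.

```python
def gray_code_by_insertion(n):
    """Return the permutations of {1, ..., n} as a combinatorial Gray code.

    The list for n is built from the list for n - 1 by inserting n into
    every position of each permutation, alternating the sweep direction.
    """
    if n == 1:
        return [(1,)]
    result = []
    previous = gray_code_by_insertion(n - 1)
    for index, pi in enumerate(previous, start=1):
        if index % 2 == 1:
            positions = range(n - 1, -1, -1)  # pi(n-1), ..., pi(0)
        else:
            positions = range(0, n)           # pi(0), ..., pi(n-1)
        for i in positions:
            # pi(i): insert n after the first i elements of pi.
            result.append(pi[:i] + (n,) + pi[i:])
    return result
```

This version uses space proportional to the full output, so it only illustrates the ordering; it is not a low-delay enumeration algorithm.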


Now we explain the details of the
Steinhaus-Johnson-Trotter algorithm. The algorithm
first outputs the identity permutation
$\iota = 12 \ldots n$. Let us consider the case of
$\pi = p_1 p_2 \ldots p_n$ in
$S_n \setminus \{\iota\}$, and let us assume that
$\pi$ is generated from
$\pi' = p'_1 p'_2 \ldots p'_n$ by swapping two
adjacent elements in $\pi'$. We construct the next
permutation of $\pi$ by swapping n and its
left-adjacent or its right-adjacent element. More
precisely, the rule of swapping is as follows.


1. n is the last element of $\pi$.

   1-1. n is the second-to-last element of $\pi'$.

        In this case, $\pi$ is obtained from $\pi'$
        by swapping $p'_{n-1} = n$ and $p'_n$ by
        Step 3. We swap two elements in [n − 1]
        by recursively applying this swapping
        algorithm to the subpermutation
        $p_1 p_2 \ldots p_{n-1}$, which is obtained
        from $\pi$ by removing the element n. Let
        $\pi_1$ be the obtained subpermutation. Then,
        we append n to $\pi_1$ as the last element.
        The obtained permutation is the next
        permutation.

   1-2. n is also the last element of $\pi'$.

        We construct the next permutation of
        $\pi$ by swapping $p_{n-1}$ and $p_n = n$.

2. n is the first element of $\pi$. Similar to Step 1,
   we construct the next permutation of $\pi$, as
   follows:

   2-1. n is the second element of $\pi'$.

        We recursively apply this swapping
        algorithm to the subpermutation
        $p_2 p_3 \ldots p_n$. Let $\pi_2$ be the
        obtained subpermutation. Then, we prepend n
        to $\pi_2$ as the first element. The obtained
        permutation is the next permutation.

   2-2. n is also the first element of $\pi'$.

        We construct the next permutation of
        $\pi$ by swapping $p_1 = n$ and $p_2$.

3. Otherwise.

   We swap $p_i = n$ and $p_{i+1}$ if $\pi$ is
   obtained from $\pi'$ by swapping $p'_{i-1} = n$
   and $p'_i$ of $\pi'$, and swap $p_{i-1}$ and
   $p_i = n$ if $\pi$ is obtained from $\pi'$ by
   swapping $p'_i$ and $p'_{i+1} = n$ of $\pi'$.


Permutation Enumeration, Table 1 List of $S_4$ in
combinatorial Gray code (read column by column)

1234    3124    2314
1243    3142    2341
1423    3412    2431
4123    4312    4231
4132    4321    4213
1432    3421    2413
1342    3241    2143
1324    3214    2134


Table 1 shows the list of permutations in $S_4$
enumerated by the above swapping algorithm.
Note that each permutation is obtained from its
predecessor by swapping two adjacent elements.

Pseudocode for the above algorithm is
shown in Algorithm 2, which is the main
routine, and Algorithm 3, which is a subroutine
that generates the next permutation of a given
permutation. The pseudocode assumes that
a direction vector d = (d(1),d(2),...,d(n))
is stored in global memory. Each d(i) for
i = 1,2,...,n represents the direction in which
the element i in the current permutation goes to
obtain the next permutation. More precisely, an
instruction of “left” or “right” is stored in each
d(i). By using the direction vector, we know
in which direction two adjacent elements were
swapped without needing to check the current
permutation and the preceding permutation.
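
A direct transcription of the main routine and the swapping subroutine into Python might look as follows. This is a sketch under stated assumptions: the direction vector is passed explicitly as a dictionary `d` instead of living in global memory, directions are encoded as the offsets -1 (left) and +1 (right), and the names `swap_step` and `gray_code` are chosen here. Like the pseudocode, it is not the constant-delay loopless version.

```python
from math import factorial

LEFT, RIGHT = -1, 1  # assumed encoding of the "left"/"right" instructions


def swap_step(n, perm, d):
    """Return the successor of `perm`, a list over {1, ..., n}."""
    if n == 1:
        return perm
    if perm[-1] == n and d[n] == RIGHT:
        # n reached the right end: advance the rest, then turn n around.
        rest = swap_step(n - 1, perm[:-1], d)
        d[n] = LEFT
        return rest + [n]
    if perm[0] == n and d[n] == LEFT:
        # n reached the left end: advance the rest, then turn n around.
        rest = swap_step(n - 1, perm[1:], d)
        d[n] = RIGHT
        return [n] + rest
    # Otherwise move n one step in its current direction.
    k = perm.index(n)
    j = k + d[n]
    out = list(perm)
    out[k], out[j] = out[j], out[k]
    return out


def gray_code(n):
    """Yield all permutations of {1, ..., n} by adjacent swaps."""
    d = {i: LEFT for i in range(1, n + 1)}
    perm = list(range(1, n + 1))
    for _ in range(factorial(n)):
        yield tuple(perm)
        perm = swap_step(n, perm, d)
```

The call to `perm.index(n)` and the list slicing cost O(n) per step, which is one reason this transcription is not efficient.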

Our implementations (Algorithms 2 and 3) are 
not efficient, but an efficient loopless algorithm 
was given by Sedgewick [8], as in the following 
theorem. 


Theorem 2 ([8]) After constructing the identity
permutation in O(n) time, one can enumerate all
permutations in $S_n$ in the order of a
combinatorial Gray code with a constant time
delay.


It can be observed that the combinatorial Gray
code order defined by the algorithm represents a
Hamiltonian path of the permutohedron. Figure 2
shows a permutohedron of $S_4$ and its
Hamiltonian path corresponding to the
combinatorial Gray code.


Enumeration by Reverse Search 

Avis and Fukuda [2] proposed a reverse search 
enumeration method. The idea of the reverse 
search method is as follows: We first define a 
rooted tree structure for the objects such that each 
vertex corresponds to an object and each edge 
corresponds to a relation between two objects. 
Then, by traversing the tree structure, we enumer- 
ate all of the objects. 

Now, we illustrate the reverse search method 
by applying it to permutation enumeration [9]. 
We define a rooted tree structure for S,, as 
follows. To construct a tree structure, we de- 
fine its root and the parent of each permutation 


Algorithm 2: GRAY-CODE(1) 


1 d(i) fori = 1,2,...,m represents in which 
direction the element 7 goes; 

2 foreach = 1,2,...,1 do 
/x Initialization of direction 
vector d(i) «/ 
d(i) < left 


4na<12...n /* Set the identity 
permutation to mw. «/; 

5 foreachi = 1,2,...,n!do 

Output 7; 

7 me < SWAP(n,7); 


i 


Algorithm 3: SWAP(1, 7 = P1p2..- Pn) 


1 ifm = 1 then return 2 

2 else if py =n and d(n) = right then 

3 qe’ < SwaP(n — 1, pi p2.-- Pn—1)3 

4 d(n) < left; 

5 return z’ +7 

6 else if pj =n and d(n) = left then 

7 mt’ < SWAP(n — 1, D2 p3.-- Pn); 

8 d(n) < right; 

9 return 1 + 7’ 

0 else return the permutation obtained from a by 
swapping 7 and its left-adjacent or its right-adjacent 
element depending on d(n) 


Permutation Enumeration 


Permutation 
Enumeration, Fig. 2 
Permutohedron of S4 and 
its Hamiltonian path 


in S_n except the root. We define the identity
permutation ι = 12...n as the root of the
tree structure. Then, we define the parent of a
permutation by an adjacent swap of two elements.
Intuitively, the parent is defined so that there is
greater similarity between it and ι than there is
between the child and ι. A formal definition is
as follows. Let π = p_1 p_2 ... p_n ∈ S_n \ {ι} be a
permutation, and let i be the minimum index such
that p_i > p_{i+1} holds. The parent permutation of
π, denoted by P(π), is the permutation obtained
from π by swapping p_i and p_{i+1}. Then, we
call π a child permutation of P(π). Note that
the parent permutation of π is uniquely defined.
For example, P(3421) = 3241, P(3241) =
2341, P(2341) = 2314, P(2314) = 2134, and
P(2134) = 1234 are obtained. By repeatedly
finding the parent permutations, we have a
sequence of permutations in S_n that ends
with the identity permutation. By merging these
sequences, we have the tree structure, called the
family tree T_n of S_n. Figure 3 shows the family
tree T_4.
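
The parent definition can be checked with a small Python sketch. The function names `parent` and `path_to_root` are hypothetical and introduced only for this illustration; permutations are represented as tuples.

```python
def parent(perm):
    """Return P(perm): swap p_i and p_{i+1} at the minimum index i with p_i > p_{i+1}."""
    for i in range(len(perm) - 1):
        if perm[i] > perm[i + 1]:
            p = list(perm)
            p[i], p[i + 1] = p[i + 1], p[i]
            return tuple(p)
    raise ValueError("the identity permutation is the root and has no parent")


def path_to_root(perm):
    """Follow parents repeatedly until the identity permutation is reached."""
    path = [perm]
    while list(perm) != sorted(perm):
        perm = parent(perm)
        path.append(perm)
    return path
```

Calling `path_to_root((3, 4, 2, 1))` reproduces the chain 3421, 3241, 2341, 2314, 2134, 1234 from the example in the text.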

We next design an algorithm that traverses 
the family tree by recursively generating all of 
the child permutations of any permutation. In- 
tuitively, the operation to generate a child per- 
mutation is the reverse of finding the parent 
permutation. 

We introduce some notation. Let π = p_1 p_2 ... p_n
be a permutation in S_n. Then, we denote by
π[i] = p_1 p_2 ... p_{i-1} p_{i+1} p_i p_{i+2} ... p_n
the permutation obtained from π by swapping p_i
and p_{i+1}. Note that π[i] is a child permutation
of π if and only if π = P(π[i]) holds. A key point
in finding i with π = P(π[i]) is to maintain the
reverse point r(π), which is the minimum index
of π such that p_{r(π)} > p_{r(π)+1}. For convenience,
we set r(ι) = n for the identity permutation ι in
S_n. Note that the subpermutation p_1 p_2 ... p_{r(π)}
is the maximal increasing prefix of π. If we know
r(π) of π, all child permutations are generated
as follows. For each i = 1, 2, ..., r(π) - 1, π[i]
is a child permutation of π since π = P(π[i])
holds. If p_{r(π)} < p_{r(π)+2} holds, then π[r(π) + 1]
is a child permutation of π. Otherwise,
π[r(π) + 1] is not a child permutation. For each
i = r(π), r(π) + 2, r(π) + 3, ..., n - 1, π[i]
is not a child permutation. Based on the above
observation, we obtain the enumeration algorithm
shown in Algorithm 4. To begin, Algorithm 4 is
called with the identity permutation, which is the
root of the family tree.


Algorithm 4: REVERSE-SEARCH(π = p_1 p_2 ... p_n)

Let r(π) be the reverse point of π;
Output π;
for each i = 1, 2, ..., r(π) - 1 do
    REVERSE-SEARCH(π[i])
if r(π) ≤ n - 2 and p_{r(π)} < p_{r(π)+2} then
    REVERSE-SEARCH(π[r(π) + 1])
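
A direct Python transcription of Algorithm 4 is sketched below. It recomputes the reverse point from scratch in O(n) time at every node, so it illustrates the traversal order only and does not achieve the constant delay discussed next. The helper names `reverse_point` and `swapped` are assumptions made for this sketch.

```python
def reverse_point(perm):
    """Return r(perm): the minimum 1-based index i with p_i > p_{i+1}, or n for the identity."""
    n = len(perm)
    for i in range(1, n):
        if perm[i - 1] > perm[i]:
            return i
    return n


def swapped(perm, i):
    """Return perm[i] in the text's notation: swap the entries at 1-based positions i and i+1."""
    p = list(perm)
    p[i - 1], p[i] = p[i], p[i - 1]
    return tuple(p)


def reverse_search(perm, out):
    """Append perm and all of its descendants in the family tree to out (preorder)."""
    n = len(perm)
    r = reverse_point(perm)
    out.append(perm)
    for i in range(1, r):
        reverse_search(swapped(perm, i), out)
    # p_{r} is perm[r - 1] and p_{r+2} is perm[r + 1] in 0-based indexing.
    if r <= n - 2 and perm[r - 1] < perm[r + 1]:
        reverse_search(swapped(perm, r + 1), out)
    return out
```

Starting from the identity, `reverse_search((1, 2, 3, 4), [])` visits each of the 24 permutations of S_4 exactly once.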

By maintaining the reverse point of the current
permutation during a traversal of the family tree,
we can use a stack to generate each child
permutation in O(1) time. To estimate the running
time of the algorithm, note that the algorithm can
traverse each edge of the family tree in O(1) time.
However, the delay of the algorithm is not
bounded by O(1) in the case that the next
permutation is output only after unwinding deep
recursive calls without outputting any permutation.
Nevertheless, by applying the speed-up method
proposed by Nakano and Uno [6], we have the
following theorem.


Theorem 3 ([9]) After constructing the root (the
identity permutation) in O(n) time, one can
enumerate all the permutations in S_n by the reverse
search method with a constant time delay. The
required working space is O(n).


Problem Definition


Let n be a positive integer. A distance matrix
of order n is a matrix D of size (n × n) which
satisfies (1) D_{i,j} > 0 for all i, j ∈ {1,2,...,n}
with i ≠ j; (2) D_{i,j} = 0 for all i, j ∈
{1,2,...,n} with i = j; and (3) D_{i,j} = D_{j,i}
for all i, j ∈ {1,2,...,n}. In the literature,
a distance matrix of order n is also called a
dissimilarity matrix of order n.

Below, all trees are assumed to be unrooted
and edge-weighted. For any tree T, the distance
between two nodes u and v in T is defined as the
sum of the weights of all edges on the unique path
in T between u and v and is denoted by d^T_{u,v}. A
tree T is said to realize a given distance matrix D
of order n if and only if it holds that {1,2,...,n}
is a subset of the nodes of T and d^T_{i,j} = D_{i,j}
for all i, j ∈ {1,2,...,n}. Finally, a distance
matrix D is called additive or tree-realizable if
and only if there exists a tree which realizes D.
See Fig. 1 for an example.
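
The definition of "realizes" can be made concrete with a small checker. The sketch below assumes a tree given as a list of (u, v, weight) triples and a matrix D stored as a 0-indexed list of lists, so that D[i-1][j-1] holds D_{i,j}; this representation and the function names are choices made for illustration only.

```python
from collections import defaultdict


def tree_distances(edges, source):
    """Return the path distances from source to every reachable node of the tree."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:  # paths in a tree are unique
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def realizes(edges, D):
    """Check that nodes 1..n occur in the tree and that tree distances equal D."""
    n = len(D)
    for i in range(1, n + 1):
        dist = tree_distances(edges, i)
        for j in range(1, n + 1):
            if dist.get(j) != D[i - 1][j - 1]:
                return False
    return True
```

For instance, a star with an unlabeled center "c" and leaves 1, 2, 3 at distances 1, 2, 3 from the center realizes the matrix with D_{1,2} = 3, D_{1,3} = 4, and D_{2,3} = 5.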


Problem 1 (The Phylogenetic Tree from Dis- 
tance Matrix Problem) 

INPUT: A distance matrix D of order n 

OUTPUT: A tree which realizes D and has the 
smallest possible number of nodes, if D is addi- 
tive, otherwise null 


In the time complexities listed below, the time 
needed to input all of D is not included. Instead, 
O(1) is charged to the running time whenever 
an algorithm requests to know the value of any 
specified entry of D. 


Key Results 


Several authors have independently shown how
to solve the Phylogenetic Tree from Distance
Matrix Problem in O(n^2) time. (See [5] for a
short survey of older algorithms which do not run
in O(n^2) time.)


Theorem 1 ([2, 4, 5, 7, 14]) There exists an
algorithm which solves the Phylogenetic Tree from
Distance Matrix Problem in O(n^2) time.


Although the various existing algorithms are 
different, it can be proved that: 




Theorem 2 ([8,14]) For any given distance ma- 
trix, the solution to the Phylogenetic Tree from 
Distance Matrix Problem is unique. 


Furthermore, the algorithms referred to in
Theorem 1 have optimal running time since
any algorithm for the Phylogenetic Tree from
Distance Matrix Problem must in the worst case
query all Ω(n^2) entries of D to make sure that
D is additive. However, if it is known in advance
that the input distance matrix is additive, then the
time complexity improves as follows.


Theorem 3 ([9, 12]) There exists an algorithm
which solves the Phylogenetic Tree from Distance
Matrix Problem restricted to additive distance
matrices in O(kn log_k n) time, where k is the
maximum degree of the tree that realizes the input
distance matrix.


The algorithm of Hein [9] starts with a tree
containing just two nodes and then successively
inserts each node i into the tree by repeatedly
choosing a pair of existing nodes and computing
where on the path between them i should be
attached, until i's position has been determined.
The same basic technique is used in the O(n^2)-
time algorithm of Waterman et al. [14] referred
to in Theorem 1 above, but the algorithm of
Hein selects paths which are more efficient at
discriminating between the possible positions for i.
According to [12], the running time of Hein's
algorithm is O(kn log_k n).
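
The step "computing where on the path between them i should be attached" can be illustrated with a standard identity for additive matrices. The sketch below is not taken from [9]; it only shows the arithmetic for one pair of existing nodes u, v, under the assumption that D is additive and stored as a list of lists. The function names are hypothetical.

```python
def attachment_offset(D, u, v, i):
    """Distance from u, measured along the u-v path, to the point where i's branch leaves it."""
    return (D[u][i] + D[u][v] - D[v][i]) / 2


def branch_length(D, u, v, i):
    """Distance from i to the u-v path."""
    return (D[u][i] + D[v][i] - D[u][v]) / 2
```

Both formulas follow from writing D[u][i], D[v][i], and D[u][v] as sums of the three path segments that meet at the branching point. If the branch length is zero, node i lies on the u-v path itself.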

A lower bound that implies the optimality of 
Theorem 3 is given by the next theorem. 


Theorem 4 ([10]) The Phylogenetic Tree from
Distance Matrix Problem restricted to additive
distance matrices requires Ω(kn log_k n) queries
to the distance matrix D, where k is the maximum
degree of the tree that realizes D, even if
restricted to trees in which all edge weights are
equal to 1.


Phylogenetic Tree Construction from a Distance Matrix, Fig. 1 (a) An additive distance matrix D of order 5.
(b) A tree T which realizes D. Here, {1,2,...,5} forms a subset of the nodes of T


Independently of [9], Culberson and Rudnicki [5]
presented an algorithm for the Phylogenetic Tree
from Distance Matrix Problem and claimed it to
have O(kn log_k n) time complexity when restricted
to additive distance matrices and trees in which
all edge weights are equal to 1. As pointed out by
Reyzin and Srivastava [12], the algorithm actually
runs in Θ(n^{3/2} √k) time. See [12] for a
counterexample to [5] and a correct analysis. On
the positive side, the following special case is
solvable in linear time by the Culberson-Rudnicki
algorithm:


Theorem 5 ([5]) There exists an O(n)-time
algorithm which solves the Phylogenetic Tree from
Distance Matrix Problem restricted to additive 
distance matrices for which the realizing tree 
contains two leaves only and has all edge weights 
equal to 1. 


Applications 


The main application of the Phylogenetic Tree 
from Distance Matrix Problem is in the con- 
struction of a tree (a so-called phylogenetic tree) 
that represents evolutionary relationships among 
a set of studied objects (e.g., species or other 
taxa, populations, proteins, genes, etc.). Here, it 
is assumed that the objects are indeed related 
according to a treelike branching pattern caused 
by an evolutionary process and that their true 
pairwise evolutionary distances are proportional 
to the measured pairwise dissimilarities. See, 
e.g., [1, 6, 7, 14] for examples and many ref- 
erences as well as discussions on how to esti- 
mate pairwise dissimilarities based on biological 
data. Other applications of the Phylogenetic Tree
from Distance Matrix Problem can be found in
psychology, for example, to describe semantic
memory organization [1], in comparative linguistics
to infer the evolutionary history of a set of
languages [11], or in the study of the filiation of
manuscripts to trace how manuscript copies of a
text (whose original version may have been lost)
have evolved in order to identify discrepancies
among them or to reconstruct the original text [1,
3, 13].

In general, real data seldom forms additive 
distance matrices [14]. Therefore, in practice, 
researchers consider optimization versions of the 
Phylogenetic Tree from Distance Matrix Prob- 
lem which look for a tree that “almost” real- 
izes D. Many alternative definitions of “almost” 
have been proposed, and numerous heuristics 
and approximation algorithms have been devel- 
oped. A comprehensive description of some of 
the most popular methods for phylogenetic re- 
construction from a non-additive distance matrix 
such as Neighbor-joining [16] as well as more 
background information can be found in, e.g., 
Chapter 11 of [6]. See also [1] and [15] and the
references therein.


Problem Definition


In the classic VERTEX-DISJOINT PATHS problem,
the input consists of an n-vertex graph G and
k pairs of terminals (s_i, t_i), 1 ≤ i ≤ k, and the
question is whether there exist pairwise
vertex-disjoint paths P_1, P_2, ..., P_k such that
for every 1 ≤ i ≤ k, the path P_i starts in s_i and
ends in t_i. In this entry we are interested in the
complexity of this problem restricted to planar
directed graphs.


Key Results 


An algorithm for the VERTEX-DISJOINT PATHS
problem in undirected graphs with running time
f(k)n^3 for some function f is one of the key
ingredients of the minor testing algorithm of
Robertson and Seymour [8]. The approach can
be summarized as follows: either the input graph
has treewidth bounded by a function of k, in
which case we can apply standard dynamic
programming techniques, or, by the Excluded
Grid Theorem, the input graph contains a large
grid minor. In the second case, we may deduce
that a middle vertex of the grid minor is irrelevant
for the problem and can be discarded.

Planar Directed k-VERTEX-DISJOINT PATHS Problem, Fig. 1 Different homotopy classes of a solution: in the first two
figures, the solutions are of the same class, whereas in the third figure the homotopy class is different

The original proof of the irrelevancy of a 
middle vertex of a grid minor by Robertson and 
Seymour [8] is not only highly involved but also 
leads to an extremely large dependency on k in 
the running time bound. A more recent algorithm 
by Kawarabayashi and Wollan [6] improves upon 
the original approach in both these aspects, but is 
still very complex. 

As already observed by Robertson and Seymour
in [7], the situation becomes dramatically
simpler if we restrict ourselves to planar graphs.
A short self-contained argument of irrelevancy
of the middle vertex of a ck × ck grid minor,
for some universal constant c, is due to Adler, 
Kolliopoulos, Krause, Lokshtanov, Saurabh, and 
Thilikos [1]. It is worth noting that the expo- 
nential dependency on k in the irrelevant vertex 
argument is necessary. The intuitive reason why 
planarity greatly helps in the VERTEX-DISJOINT 
PATHS problem is that, on the plane, the solution 
paths need to correspond to noncrossing curves 
and one path serves as a separator for other paths. 
This allows us to use a wide variety of topological 
arguments. 

In directed graphs, the VERTEX-DISJOINT
PATHS problem is already NP-hard for two
paths (k = 2) [3], so we cannot hope for similar
results. However, it turns out that in the directed
case the planarity assumption is very useful,
too. More than 20 years ago, Schrijver showed
that the VERTEX-DISJOINT PATHS problem in
n-vertex planar directed graphs can be solved
in time n^{O(k)} [9]. Recently, Cygan, Marx,
Pilipczuk, and Pilipczuk presented a fixed-
parameter algorithm for this problem, running
in time 2^{2^{O(k^2)}} · n^{O(1)} [2].


Key Techniques for Planar Directed Graphs 


Schrijver's Algorithm

The approach of Schrijver [9] can be summarized
as follows. The main observation is that there are
n^{O(k)} homotopy types of the solution, where two
different solutions are considered homotopic if
the paths of one solution can be "shifted" (modified
by a homotopy) to obtain the second solution,
without crossing any face that contains a terminal
(without loss of generality, we may assume that
all terminals are of degree one, so that the notion
of a face containing a terminal is well defined).
See also Fig. 1 for an illustration. A second
ingredient of Schrijver's approach is a polynomial-
time algorithm that essentially checks if there
exists a solution in one homotopy class. (It should
be noted that this statement is a significant
simplification, as Schrijver's algorithm operates
on the notions of (co)homologies and in fact
searches for a solution in a significant superset of
one homotopy class, but that is sufficient for our
needs.)


The Fixed-Parameter Algorithm 

The first step in the fixed-parameter algorithm
of [2] is an appropriate irrelevant vertex rule for
the problem. Unfortunately, for directed graphs
there is no Excluded Grid Theorem which is as
convenient as it is in the undirected case.
Although the conjecture of Johnson, Robertson,
Seymour, and Thomas [4] about the connections
between directed treewidth and directed grid
minors has been recently proven by Kawarabayashi
and Kreutzer for graphs excluding a fixed
minor [5], directed treewidth does not seem well
suited for dynamic programming algorithms for
the VERTEX-DISJOINT PATHS problem [10], nor
is it clear whether a directed grid minor can be
useful for an irrelevant vertex argument. In [2], it
is proven that a family of sufficiently many
concentric cycles with alternating direction, without
any terminal enclosed by the outermost cycle, is
sufficient to make an irrelevant vertex argument.

In the light of such an irrelevant vertex rule,
the next question is: what is the structure of a
graph without many concentric cycles with
alternating directions? The answer provided in [2]
can be informally stated as follows: such a graph
can be decomposed into a bounded (in k) number
of disk and ring components, connected by a
bounded number of bundles, where every bundle
is a set of edges in one direction that lie close to
each other on the plane (see Fig. 2).

Planar Directed k-VERTEX-DISJOINT PATHS Problem, Fig. 2 An example of a decomposition with six
disk components and a single ring component

Unfortunately, it is not easy to make algorithmic
use of such a decomposition; this should
be contrasted with the undirected case, where
bounded treewidth immediately yields efficient
algorithms via a standard dynamic programming
approach. The approach taken in [2] is to make
use of Schrijver's approach and use the
decomposition to enumerate only an FPT-bounded
number of "reasonable" homotopy types of the
solution.

Recall that the n^{O(k)} term in the time
complexity of Schrijver's algorithm comes from the
number of homotopy classes. It is quite easy to
see that this bound cannot be improved: each
of the k solution paths can "wind" an arbitrary
number of times in some part of the graph,
leading to different homotopy classes. This is also
the case in the decomposition, as can be seen
in Fig. 3. To deal with this issue, an involved
technical argument is developed in [2] to
show that for any such place in the graph as
in Fig. 3, there is some "canonical" number of
turns, and we may assume that the solution will
take approximately this number of turns, up to
an additive f(k) term, for some computable
function f. This leads to an FPT bound on the
number of "reasonable" homotopy classes to
consider and, consequently, to an FPT running time
bound.

Planar Directed k-VERTEX-DISJOINT PATHS Problem, Fig. 3 An example of a solution path
winding many times between four disk components


Problem Definition


Let S be a set of n points in the plane and let
G be an undirected graph with vertex set S, in
which each edge (u, v) has a weight, which is
equal to the Euclidean distance |uv| between the
points u and v. For any two points p and q in S,
their shortest-path distance in G is denoted by
δ_G(p, q). If t ≥ 1 is a real number, then G is a
t-spanner for S if δ_G(p, q) ≤ t|pq| for any two
points p and q in S. Thus, if t is close to 1,
then the graph G contains close approximations
to the (n choose 2) Euclidean distances determined
by the pairs of points in S. If, additionally, G
consists of O(n) edges, then this graph can be
considered a sparse approximation to the complete
graph on S. The smallest value of t for which G is
a t-spanner is called the stretch factor (or dilation)
of G. For a comprehensive overview of geometric
spanners, see the book by Narasimhan and
Smid [16].
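
The definition of the stretch factor can be evaluated directly on small inputs. The following Python sketch uses an all-pairs shortest-path computation (Floyd-Warshall, chosen here only for brevity) and assumes that the points are pairwise distinct tuples of coordinates and that edges are pairs of indices into the point list. The function name `stretch_factor` is an assumption of this sketch.

```python
from itertools import combinations
from math import dist, inf


def stretch_factor(points, edges):
    """Return the smallest t such that the Euclidean graph (points, edges) is a t-spanner."""
    n = len(points)
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v in edges:
        w = dist(points[u], points[v])  # edge weight = Euclidean length
        d[u][v] = d[v][u] = min(d[u][v], w)
    # Floyd-Warshall: d[i][j] becomes the shortest-path distance in G.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(
        d[p][q] / dist(points[p], points[q])
        for p, q in combinations(range(n), 2)
    )
```

The result is infinite when G is disconnected, which matches the definition: no finite t works in that case.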

Assume that each edge (u,v) of G is em- 
bedded as the straight-line segment between the 
points u and v. The graph G is said to be plane if 
its edges intersect only at their common vertices. 

In this entry, the following two problems are 
considered: 


Problem 1 Determine the smallest real number
t > 1 for which the following is true: For every
set S of n points in the plane, there exists a plane
graph with vertex set S, which is a t-spanner for
S. Moreover, design an efficient algorithm that
constructs such a plane t-spanner.


Problem 2 Determine the smallest positive integer
D for which the following is true: There exists
a constant t, such that for every set S of n points in
the plane, there exists a plane graph with vertex
set S and maximum degree at most D, which is
a t-spanner for S. Moreover, design an efficient
algorithm that constructs such a plane t-spanner.


Key Results


Let S be a finite set of points in the plane that
is in general position, i.e., no three points of S
are on a line and no four points of S are on
a circle. The Delaunay triangulation of S is the
plane graph with vertex set S, in which (u, v)
is an edge if and only if there exists a circle
through u and v that does not contain any point
of S in its interior. (Since S is in general position,
this graph is a triangulation.) The Delaunay
triangulation of a set of n points in the plane
can be constructed in O(n log n) time. Dobkin,
Friedman and Supowit [10] were the first to show
that the stretch factor of the Delaunay triangulation
is bounded by a constant: They proved
that the Delaunay triangulation is a t-spanner
for t = π(1 + √5)/2. The currently best known
upper bound on the stretch factor of this graph is
due to Keil and Gutwin [12]:


Theorem 1 Let S be a finite set of points in the
plane. The Delaunay triangulation of S is a
t-spanner for S, for t = 4π√3/9.


A slightly stronger result was proved by Bose et
al. [3]. They proved that for any two points p
and q in S, the Delaunay triangulation contains
a path between p and q, whose length is at most
(4π√3/9)|pq| and all edges on this path have
length at most |pq|.

Levcopoulos and Lingas [14] generalized the
result of Theorem 1: Assume that the Delaunay
triangulation of the set S is given. Then, for any
real number r > 0, a plane graph G with vertex
set S can be constructed in O(n) time, such that G
is a t-spanner for S, where t = (1 + 1/r) 4π√3/9,
and the total length of all edges in G is at most
2r + 1 times the weight of a minimum spanning
tree of S.

The Delaunay triangulation can alternatively
be defined to be the dual of the Voronoi diagram
of the set S. By considering the Voronoi
diagram for a metric other than the Euclidean
metric, a corresponding Delaunay triangulation is
obtained. Chew [7] has shown that the Delaunay
triangulation based on the Manhattan metric is
a √10-spanner (in this spanner, path lengths are
measured in the Euclidean metric). The currently
best result for Problem 1 is due to Chew [8]:


Theorem 2 Let S be a finite set of points in the 
plane, and consider the Delaunay triangulation 
of S that is based on the convex distance function 
defined by an equilateral triangle. This plane 
graph is a 2-spanner for S (where path-lengths 
are measured in the Euclidean metric). 


Das and Joseph [9] have generalized the result of
Theorem 1 in the following way (refer to Fig. 1).
Let G be a plane graph with vertex set S and let
α be a real number with 0 < α < π/2. For any
edge e of G, let Δ_1 and Δ_2 be the two isosceles
triangles with base e and base angle α. The edge e
is said to satisfy the α-diamond property, if at
least one of the triangles Δ_1 and Δ_2 does not
contain any point of S in its interior. The plane
graph G is said to satisfy the α-diamond property,
if every edge e of G satisfies this property. For
a real number d ≥ 1, G satisfies the d-good
polygon property, if for every face f of G, and
for every two vertices p and q on the boundary
of f, such that the line segment joining them is
completely inside f, the shortest path between p
and q along the boundary of f has length at most
d|pq|. Das and Joseph [9] proved that any plane
graph satisfying both the α-diamond property and
the d-good polygon property is a t-spanner, for
some real number t that depends only on α and
d. A slight improvement on the value of t was
obtained by Lee [13]:


Theorem 3 Let α ∈ (0, π/2) and d ≥ 1 be real
numbers, and let G be a plane graph that satisfies
the α-diamond property and the d-good polygon
property. Then, G is a t-spanner for the vertex set
of G, where

\[
t = \frac{8(\pi - \alpha)^2 d}{\alpha^2 \sin^2(\alpha/4)} .
\]

To give some examples, it is not difficult to
show that the Delaunay triangulation satisfies
the α-diamond property with α = π/4. Drysdale
et al. [11] have shown that the minimum weight
triangulation satisfies the α-diamond property
with α = π/4.6. Finally, Lee [13] has shown
that the greedy triangulation satisfies the
α-diamond property with α = π/6. Of course,
any triangulation satisfies the d-good polygon
property with d = 1.
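
To get a feeling for the size of the constant in Theorem 3, the bound can be evaluated numerically for the three triangulations just mentioned. The sketch below simply evaluates the formula t = 8(π - α)^2 d / (α^2 sin^2(α/4)); the function name `diamond_spanner_bound` is hypothetical.

```python
from math import pi, sin


def diamond_spanner_bound(alpha, d=1.0):
    """Evaluate the stretch-factor bound of Theorem 3 for given alpha and d."""
    if not (0 < alpha < pi / 2) or d < 1:
        raise ValueError("Theorem 3 requires 0 < alpha < pi/2 and d >= 1")
    return 8 * (pi - alpha) ** 2 * d / (alpha ** 2 * sin(alpha / 4) ** 2)


if __name__ == "__main__":
    for name, alpha in [
        ("Delaunay triangulation", pi / 4),
        ("minimum weight triangulation", pi / 4.6),
        ("greedy triangulation", pi / 6),
    ]:
        print(f"{name}: t <= {diamond_spanner_bound(alpha):.1f}")
```

The values are large (well above the constants of Theorems 1 and 2), which illustrates that Theorem 3 is a general existence result rather than a tight bound for any particular triangulation.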

Now consider Problem 2, that is, the problem
of constructing plane spanners whose maximum
degree is small. The first result for this problem
is due to Bose et al. [2]. They proved that the
Delaunay triangulation of any finite point set
contains a subgraph of maximum degree at most 27,
which is a t-spanner (for some constant t). Li and
Wang [15] improved this result, by showing that
the Delaunay triangulation contains a t-spanner
of maximum degree at most 23. Given the
Delaunay triangulation, the subgraphs in [2, 15] can
be constructed in O(n) time. The currently best
result for Problem 2 is by Bose et al. [6]:


Theorem 4 Let S be a set of n points in the 
plane. The Delaunay triangulation of S contains 
a subgraph of maximum degree at most 17, which 
is a t-spanner for S, for some constant t. Given the 
Delaunay triangulation of S, this subgraph can be 
constructed in O(n) time. 


In fact, the result in [6] is more general: 


Theorem 5 Let S be a set of n points in the
plane, let α ∈ (0, π/2) be a real number, and let
G be a triangulation of S that satisfies the
α-diamond property. Then, G contains a subgraph
of maximum degree at most 14 + ⌈2π/α⌉, which
is a t-spanner for S, where t depends only on α.
Given the triangulation G, this subgraph can be
constructed in O(n) time.


Applications 


Plane spanners have applications in on-line
path-finding and routing problems that arise in, for
example, geographic information systems and
communication networks. In these application
areas, the complete environment is not known,
and routing has to be done based only on the
source, the destination, and the neighborhood of
the current position. Bose and Morin [4, 5] have
shown that, in this model, good routing strategies
exist for plane graphs, such as the Delaunay
triangulation and graphs that satisfy both the
α-diamond property and the d-good polygon
property. These strategies are competitive, in the sense
that the paths computed have lengths that are
within a constant factor of the Euclidean distance
between the source and destination. Moreover,
these routing strategies use only a limited amount
of memory.

Planar Geometric Spanners, Fig. 1 On the left, the
α-diamond property is illustrated. At least one of the
triangles Δ_1 and Δ_2 does not contain any point of S in
its interior. On the right, the d-good polygon property is
illustrated. p and q are two vertices on the same face f
which can see each other. At least one of the two paths
between p and q along the boundary of f has length at most
d|pq|


Open Problems


None of the results for Problems 1 and 2 that are
mentioned in section “Key Results” seem to be
optimal. The following problems are open:


1. Determine the smallest real number t, such
   that the Delaunay triangulation of any finite
   set of points in the plane is a t-spanner. It is
   widely believed that t = π/2. By Theorem 1,
   t ≤ 4π√3/9.

2. Determine the smallest real number t, such
   that a plane t-spanner exists for any finite set
   of points in the plane. By Theorem 2, t ≤ 2.
   By taking S to be the set of four vertices of
   a square, it follows that t must be at least √2.

3. Determine the smallest integer D, such that
   the Delaunay triangulation of any finite set of
   points in the plane contains a t-spanner (for
   some constant t) of maximum degree at most
   D. By Theorem 4, D ≤ 17. It follows from
   results in Aronov et al. [1] that the value of
   D must be at least 3.

4. Determine the smallest integer D, such that
   a plane t-spanner (for some constant t) of
   maximum degree at most D exists for any
   finite set of points in the plane. By Theorem 4
   and results in [1], 3 ≤ D ≤ 17.


Problem Definition


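Given a directed, planar graph G = (V, E) with
arc capacities c : E → ℝ^+, a subset S of
source vertices, and a subset T of sink vertices,
the goal is to find a maximum flow from the
source vertices to the sink vertices:

\[
\begin{aligned}
\max \quad & \sum_{su \,:\, s \in S,\ su \in E} f_{su} \\
\text{s.t.} \quad & \sum_{uv \,:\, uv \in E} f_{uv} - \sum_{vw \,:\, vw \in E} f_{vw} = 0
  && \forall v \in V \setminus (S \cup T) && (1) \\
& 0 \le f_e \le c_e && \forall e \in E && (2)
\end{aligned}
\]


Key Results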


In general (i.e., nonplanar) graphs, multiple
sources and sinks can be reduced to the single-
source, single-sink case by introducing an
artificial source and sink and connecting them
to all the sources and sinks, respectively, but
this reduction does not preserve planarity. Using
Orlin's algorithm for sparse graphs [21] leads
to a running time of O(n^2 / log n). For integer
capacities less than U, one could instead use the
algorithm of Goldberg and Rao [9], which leads
to a running time of O(n^{1.5} log n log U).

Maximum flow in planar graphs with multiple
sources and sinks was first studied by Miller
and Naor [19]. They gave a divide-and-conquer
algorithm for the case where all the sinks and
the sources are on the boundary of a single face.
Plugging in the linear-time shortest-path algorithm
of Henzinger et al. [12] yields a running
time of O(n log n). Borradaile and Harutyunyan
have given an iterative algorithm with the same
running time [2]. Miller and Naor also gave an
algorithm for the case where the sources and
the sinks reside on the boundaries of k different
faces. Using the O(n log n)-time single-
source, single-sink maximum flow algorithm of
Borradaile and Klein [3] yields a running time
of O(k^2 n log^2 n). Miller and Naor show that,
when it is known how much of the commodity is
produced/consumed at each source and each sink,
finding a consistent routing of flow that respects
arc capacities can be reduced to negative-length
shortest paths [19], which can be solved in planar
graphs in O(n log^2 n / log log n) time [20].


Near-Linear Time Algorithm 

Borradaile et al. gave the first O(n poly log n)-
time algorithm for the multiple-source, multiple-
sink maximum flow problem in directed planar
graphs. The approach uses pseudoflows [10, 14]
(flows which may violate the balance
constraints (1) in a limited way) and a divide-and-
conquer scheme influenced by that of Johnson
and Venkatesan [15] and that of Miller and
Naor [19], using the separators introduced by
Miller: a (triangulated) planar graph G can
be separated by a simple cycle C of O(√n)
vertices [18].

In each of the two subgraphs, a more general
problem is solved in which, after the two
recursive calls have been executed, within each
of the two subgraphs, there is no residual path
from any source to any sink, from any source
to C, or from C to any sink. Then, since C is
a separator, there is no residual path from any
source to any sink in G; however, the balance
constraints (1) may not be satisfied for vertices in
C. The flow is then balanced among the vertices
in C by augmenting the flow so that there is no
residual path in G from a vertex with positive
inflow to a vertex with positive outflow. The
resulting flow can then be turned into a maximum
flow in linear time.

The core of the algorithm is this final balancing
procedure, which involves a series of |C| - 1
max-flow computations in G. Since |C| is
O(√n), the challenge is carrying out all these
max-flow computations in near-linear time. The
procedure uses a succinct representation to keep
track of the changes to the pseudoflow without
explicitly storing the changes. The representation
relies on the relationship between circulations in
G and shortest paths in the dual, and the
computations make use of an adaptation of
Fakcharoenphol and Rao's efficient implementation of
Dijkstra's algorithm [7]. The resulting running time
to balance the flow is O(n log^2 n) for an
overall running time of O(n log^3 n) for the
original multiple-source, multiple-sink maximum
flow problem.


Applications 


Multiple-source, multiple-sink min-cut arises in 
several computer vision problems including im- 
age segmentation (or binary labeling) [11]. For 
the case of more than two labels, there is a 
powerful and effective heuristic [5] using a very 
large-neighborhood [1] local search; the inner 
loop consists of solving the two-label case. 
Maximum matching in a bipartite planar 
graph reduces to multiple-source, multiple-sink 
maximum flow. Multiple-source, multiple-sink 
maximum flow can also be used for finding 
orthogonal drawings of planar graphs with a 
minimum number of bends [6] and uniformly 
monotone subdivisions of polygons [23]. 
Problem Definition 


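Given a directed, planar graph G = (V, E) with 
arc capacities c : E → ℝ⁺, a source vertex 
s, and a sink vertex t, the goal is to find a flow 
assignment f_e for each arc e ∈ E such that 

\[
\begin{aligned}
\max \quad & \sum_{su \,:\, su \in E} f_{su} \\
\text{s.t.} \quad & \sum_{uv \,:\, uv \in E} f_{uv} = \sum_{vw \,:\, vw \in E} f_{vw} && \forall v \in V \setminus \{s, t\} && (1) \\
& 0 \le f_e \le c_e && \forall e \in E && (2)
\end{aligned}
\]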
Key Results 


In the paper proposing the maximum flow prob- 
lem in general graphs, Ford and Fulkerson [5] 
gave a generic method for computing a maximum 
flow: the augmenting-path algorithm. The algo- 
rithm is iterative: find a path P from the source to 
the sink such that capacity constraint (2) is loose 
for each arc on P (residual); increase the flow 
on each arc in P by a constant chosen so that at 
least one of the capacity constraints becomes tight; 
update the capacities of each arc, making note 
that the reverse of these arcs now have residual 
capacity; and repeat until there is no path from 
the source to the sink along which the flow can be 
augmented. By augmenting the flow along a path, 
the balance constraints (1) are always satisfied. 
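
The augmenting-path method described above can be sketched as follows. This is a minimal illustration, not the original formulation: the function name max_flow, the integer vertex labels 0..n-1, and the choice of breadth-first search to find each residual path are assumptions made here, since the method itself leaves the choice of path open.

```python
from collections import deque

def max_flow(n, arcs, s, t):
    """Augmenting-path maximum flow on vertices 0..n-1.

    arcs is a list of (u, v, capacity) triples.
    """
    # residual[u][v] is the remaining capacity of arc (u, v).
    residual = [dict() for _ in range(n)]
    for u, v, c in arcs:
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse arc starts with no residual capacity

    value = 0
    while True:
        # Breadth-first search for a path whose arcs all have loose capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value  # no augmenting path is left

        # Walk back from t to collect the path and its bottleneck capacity.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)

        # Augment: at least one arc on the path becomes tight, and the
        # reverse arcs gain residual capacity.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck
```

Because flow is only ever changed along a complete source-to-sink path, every intermediate vertex receives and forwards the same amount, which is why constraints (1) hold throughout.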


st-Planar Graphs 

Ford and Fulkerson further showed that, in the 
case of planar graphs when the source and the 
sink are on a common face (st-planar graphs), by 
selecting the augmenting paths to be as far to the 
left as possible in each iteration (viewing s on the 
bottom and t on the top), each arc is saturated at 
most once, resulting in at most |E| iterations [5]. 
In 1979, Itai and Shiloach showed that each 
iteration of this algorithm could be implemented 
in O(log n) time using a priority queue and gave 
a simple example showing that any implemen- 
tation of this algorithm is capable of sorting n 
numbers [11]. In 1991, Hassin demonstrated that 
such a maximum st-flow could be derived from 
shortest-path distances in the planar dual G* of G 
where capacities in G are interpreted as lengths in 
G* [7]. Faster algorithms for computing shortest 
paths in planar graphs culminated in a linear-time 
algorithm for this case of maximum st-flow in 
planar graphs with s and t on a common face [9]. 


Undirected Planar Graphs 

For undirected planar graphs, Reif gave an algo- 
rithm for computing the maximum st-flow where 
s and ¢ need not be on a common face, by 
way of several shortest-path computations in the 
dual [19]. The algorithm finds a shortest path 
P in G* from a vertex adjacent to the face 
corresponding to s to a vertex adjacent to the 
face corresponding to t. Reif proves that the 
minimum separating cycle C, which is dual to a 
minimum st-cut, crosses P only once; by finding 
the minimum separating cycle C_v through each 
vertex v of P, we will surely find C: C is the 
minimum of the cycles C_v. 
These cycles can be found in time log n times the 
time for one shortest-path computation via divide 
and conquer over the length of P. Hassin and 
Johnson show that the corresponding maximum 
flow can be computed within this framework 
by computing shortest-path distances between 
the nested cycles C_v [8]. The shortest-path algorithms 
of Henzinger et al. [9] or Klein [15] 
can be used to reimplement these algorithms 
in O(n log n) time. Italiano et al. [12] further 
improved this running time to O(n log log n) by 
using an r-division to break the graph into suffi- 
ciently small pieces through which shortest paths 
can be efficiently computed. 

If the capacities are all unit, the maximum st-flow 
can be computed in linear time [1]. 


Directed Planar Graphs 

Maximum st-flow in directed graphs is more 
general since the problem of maximum st-flow in 
an undirected graph can be converted to a directed 
problem by introducing two oppositely oriented 
arcs of equal capacity for each edge. Johnson and 
Venkatesan gave a divide-and-conquer algorithm 
that finds a flow of input value v in O(n^1.5 log n) 
time [13]. The algorithm divides the graph using 
balanced separators, finding a flow in each side 
of value v. However, the flow on the O(√n) 
boundary edges of each subproblem might not be 
feasible. Each boundary edge is made feasible via 
an st-planar flow computation. Miller and Naor 
showed that finding a directed st-flow of value 
v could be reduced to computing shortest-path 
distances in a graph with positive and negative 
lengths [17]. Here, v units of flow are routed (perhaps 
violating the capacity constraints) along any 
s-to-t path P. For those arcs whose capacities are 
violated, we must route the excess flow through 
the rest of the graph. This is a feasible circulation 
problem and can be solved using shortest-path 
distances in the dual graph, where lengths may 
be negative (representing the negative or violated 
capacities). Using an O(n polylog n)-time algorithm 
for computing shortest paths in a planar 
graph with negative edge lengths [4, 16, 18] gives 
an O(n polylog n log C)-time algorithm where 
C is the sum of the capacities. 
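
The first step of the reduction above, routing v units along an arbitrary s-to-t path and recording which capacities are violated, might look like the sketch below. The function name route_along_path and the dictionary representation of capacities are hypothetical choices for illustration. The subsequent dual shortest-path computation that resolves the violations is not shown.

```python
def route_along_path(capacity, path, value):
    """Push `value` units along `path` and return the residual capacities.

    capacity maps arcs (u, v) to capacities; path is a vertex list from s to t.
    Returns the residual capacities and the sorted list of violated arcs.
    """
    residual = dict(capacity)
    for u, v in zip(path, path[1:]):
        # The arc on the path loses capacity and may become negative (violated).
        residual[(u, v)] = residual.get((u, v), 0) - value
        # Its reverse gains the same amount of residual capacity.
        residual[(v, u)] = residual.get((v, u), 0) + value
    violated = sorted(arc for arc, r in residual.items() if r < 0)
    return residual, violated
```

The negative entries of the returned residual capacities are the ones that appear as negative lengths in the dual graph.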

If the capacities are all unit, the maximum st- 
flow can be computed in linear time [21]. 


Leftmost-Path Algorithm 

Borradaile and Klein gave an augmenting-path, 
O(n log n)-time algorithm for the maximum st-flow 
problem in directed planar graphs. The algorithm 
is a generalization of the algorithm for 
the st-planar case, augmenting flow repeatedly 
along the leftmost path from s to t. However, 
with s and t not on a common face, what leftmost 
means is not clear. With the graph embedded such 
that t is on the external face and the clockwise 
cycles saturated, a leftmost path is well-defined 
and can be found with a left-first, depth-first 
search into t. Clockwise cycles can be initially 
saturated with a circulation defined by potentials 
on the faces given by shortest-path distances 
in the dual graph [14], and clockwise cycles 
remain saturated under leftmost augmentations. 
Borradaile and Klein, with an analysis later improved 
by Erickson [3], showed that under these conditions 
an arc and its reverse can be saturated at most 
once, resulting in at most 2m augmentations. 
Augmentations can be performed in O(log n) 
time using a dynamic tree data structure, resulting 
in an O(n log n) running time. 


Applications 


Maximum st-flow in directed planar graphs 
has applications to computer vision problems. 
Schmidt et al. [20] use it as a black box for image 
segmentation and Greig et al. [6] provide an 
example for smoothing noisy images. 


Open Problems 


Currently, maximum st-flow in undirected planar 
graphs can be computed more quickly than in 
directed. Can this gap be closed? 
Experimental Results 


Schmidt et al. [20] have implemented this algo- 
rithm and compared its performance on an image 
segmentation problem. 


URLs to Code and Data Sets 


Hoch and Wang have provided an open-source 
implementation of the algorithm [10]. Eisenstat 
has an implementation of the linear-time algo- 
rithm for unit-capacity graphs [2]. 
Problem Definition 


Given a graph G = (V, E), the crossing number 
cr(G) of G is the smallest number of edge cross- 
ings possible for any drawing of G into the plane. 
Since its introduction in the mid-1940s, the cross- 
ing number problem has proved to be notoriously 
difficult. Even some of the oldest and seemingly 
simplest questions remain unanswered, despite 
the large amount of research. 

Already the problem definition is more 
ambiguous than it may seem. We will usually 
only consider drawings where vertices are 
mapped to distinct points in the plane and 
edges to continuous non-self-intersecting curves 
between their end vertices. Any non-vertex 
point may be contained in at most two 
edge curves, in which case these curves have to 
meet transversally (i.e., cross). There are several 
different and specialized related crossing number 
variants; see [16] for a comprehensive annotated 
list. 

A planarization of a nonplanar graph G is a 
planar graph obtained from G by drawing G into 
the plane and replacing the crossings by dummy 
vertices of degree 4. Observe that in other liter- 
ature, the term planarization is also sometimes 
used to denote a (large) planar spanning subgraph 
of G. 


Key Results 


The (decision version of the) crossing number 
problem is NP-complete, even when all vertices 
have degree at most three [13] or when the 
removal of a single edge would give a planar 
graph [3]. There is, however, a fixed parameter 
tractable (FPT) algorithm to test in linear time 
for a constant k (not part of the input), whether 
G allows a crossing number of at most k [15]. 
The dependency on k, however, is doubly ex- 
ponential, and the algorithm is far from being 
applicable in practice. Most questions regarding 
the problem’s approximability remain open; see 
below for details. 


Planarization Algorithms 
The practically strongest heuristic is the pla- 
narization approach, cf. Fig. 1: First, we seek a 
maximal planar subgraph (observe that finding a 
maximum planar subgraph is already NP-hard). 
In other words, we temporarily remove edges 
from G until it becomes planar. Then, we reinsert 
those edges with as few crossings as possible 
one after another. After each step, crossings are 
replaced by dummy vertices, such that we can 
consider a sequence of edge insertion problems: 
Given a planar graph H and an edge e ∉ E(H), 
let H_e := H + e. Since cr(H_e) is still NP- 
hard to obtain, we ask for a crossing-minimum 
solution under the side-constraint that H is drawn 
planarly. An embedding is an equivalence class 
over planar drawings, based on the cyclic order 
of the edges around their incident vertices. For a 
fixed embedding of H, the insertion problem is 
trivially solvable via a breadth-first search in H’s 
dual graph. However, the number of embeddings 
of H is exponential in general. 
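
For the fixed-embedding case mentioned above, the breadth-first search in the dual graph can be sketched as follows. The input format (each face given as a cyclic list of its vertices) and the function name min_crossings_fixed_embedding are assumptions made for this illustration, and the graph is assumed to be simple.

```python
from collections import deque

def min_crossings_fixed_embedding(faces, v, w):
    """Fewest crossings needed to insert edge (v, w) into a fixed embedding.

    faces lists every face of the embedding as a cyclic sequence of vertices.
    """
    # Map each edge to the faces on its two sides.
    sides = {}
    for i, face in enumerate(faces):
        for a, b in zip(face, face[1:] + face[:1]):
            sides.setdefault(frozenset((a, b)), []).append(i)

    # Two faces are adjacent in the dual when they share an edge.
    dual = {i: set() for i in range(len(faces))}
    for incident in sides.values():
        for i in incident:
            for j in incident:
                if i != j:
                    dual[i].add(j)

    # Multi-source BFS from the faces around v to any face around w.
    dist = {i: 0 for i, face in enumerate(faces) if v in face}
    queue = deque(dist)
    while queue:
        i = queue.popleft()
        if w in faces[i]:
            return dist[i]  # each dual edge on the path is one crossing
        for j in dual[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return None
```

Each step of the search crosses one edge of H, so the returned distance is the number of crossings of the inserted edge for this particular embedding.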

The seminal paper [12] shows that it is possi- 
ble to find the best embedding in linear time using 
SPR-trees, or formally: 


Theorem 1 Let G be a planar graph and v, w ∈ 
V(G) two vertices. We can find a planar embedding 
of G in O(|V(G)|) time, into which 
an edge e = (v,w) can be inserted with the 
least possible number of edge crossings over all 
possible embeddings. 


To discuss this result, we first need to describe 
SPR-trees, which are used to decompose graphs 
into their triconnectivity structures. While their 
graph-theoretic foundation is based on Tutte [17], 
the data structure was first suggested by Di Battista 
and Tamassia [11] under the name SPQR-tree. 
Nowadays, we often drop the “Q” from the 
abbreviation, as the corresponding node type is 
not necessary, and use the following contemporary 
definition (see, e.g., [7]), illustrated in Fig. 2. 


Definition 1 (SPR-tree) The SPR-tree T of a 
biconnected graph G is the unique smallest tree 
satisfying the following properties: 


1. Each node ν in T holds a graph S_ν = 
(V_ν, E_ν), V_ν ⊆ V(G), called skeleton. Each 
edge of E_ν is either a real edge from E(G) or 
a virtual edge f = (u, v) where {u, v} forms 
a 2-cut (a split pair) in G. 

2. T has only three different node types with the 
following skeleton structures: 

S: S_ν is a simple cycle; it represents a serial 
component. 

P: S_ν consists of two vertices connected by 
multiple edges, a parallel component. 

R: S_ν is a simple triconnected graph. 

3. For every edge (ν, μ) in T, S_ν (S_μ) contains 
a specific virtual edge e_μ (e_ν) which “represents” 
S_μ (S_ν, respectively). Both edges e_μ 
and e_ν connect the same vertices. 

4. The original graph G can be obtained by 
recursively applying the following operation: 
For the edge (ν, μ) in T, let e_μ, e_ν be the 
virtual edges as in (3) connecting the same 
vertices u, v. A merged graph (S_ν ∪ S_μ) − e_μ − 
e_ν is obtained by gluing the skeletons together 
at u, v and removing e_μ, e_ν. 


There are several essential properties of any 
SPR-tree: First, it has linear size and can be 
constructed in linear time. Second, a simple tri- 
connected graph — the skeleton of an R-node — al- 
lows only a unique embedding and its mirror; the 
embedding of an S-node skeleton is unique, and 
the possible embeddings of a P-node are precisely 
all cyclic permutations of its edges. Moreover, 
each embedding of G can be precisely described 
via the subembeddings of the skeletons. 
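
To make the embedding freedom of a P-node concrete, the following sketch enumerates the cyclic orderings of its skeleton edges. The function name cyclic_orders and the edge labels are hypothetical. Rotations of the same cyclic order are listed once, while an order and its mirror image are listed separately.

```python
from itertools import permutations

def cyclic_orders(edges):
    """All cyclic orderings of the given edges, each listed once."""
    first, rest = edges[0], edges[1:]
    # Fixing the first edge removes rotations of the same cyclic order.
    return [(first,) + p for p in permutations(rest)]
```

A P-node skeleton with k edges therefore has (k − 1)! such orderings, which is the source of the exponential number of embeddings in general.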

In order to obtain Theorem 1, it is shown that 
we only need to consider the unique shortest path 
P in G’s SPR-tree from any node whose skeleton 
contains v to any node whose skeleton contains 
w. We will specify an embedding for each skele- 
ton S_μ of the nodes μ ∈ P; the embedding of 
all other skeletons is irrelevant. Moreover, the 
optimum embedding of each of the former skele- 
tons can be chosen independent of the others: For 
each S_μ, we can specify a source s and a target 
t and ask for an embedding such that edge (s, t) 
can be inserted with the least possible number of 
edge crossings into planar S_μ: In the first (last) 
skeleton along P, the source (target) simply is v 
(w, respectively). In the other cases, the source 
(target) is the virtual edge corresponding to the 
predecessor (successor) along P. For simplicity, 
it may be helpful to consider a subdivision of 
such a virtual edge, such that each source and 
target can be represented by a vertex. If μ is an S- 
node, its skeleton has a unique embedding, (s, t) 
will require no crossings, and there is nothing to 
specify. If μ is a P-node, both the source and the 
target are virtual edges due to minimality of P; 
we pick any embedding where the two virtual 
edges appear consecutively, so that (s, t) again 
requires no crossings. Finally, if μ is an R-node, 
its skeleton allows only a unique embedding and 
its mirror and hence has a unique dual graph. We 
compute a shortest path between s and t via a 
simple breadth-first-search in the dual of S_μ, as 
for the fixed embedding case. 

Finally, traverse P and let μ, μ′ be any two 
consecutive nodes. We want to establish that if 
the insertion path in S_μ enters its target virtual 
edge “from the left” (corresponding to some arbi- 
trary predefined orientation of the virtual edges), 
it leaves the source virtual edge in S_μ′ “to the 
right” or vice versa. If this is not already the 
case, it suffices to flip the embedding of S_μ′ (we 
can ignore the case when μ′ is an S-node). This 
establishes a suitable embedding of G. 

Planarisation and Crossing Minimisation, Fig. 1 The 
planarization heuristic: the first figure shows a graph 
drawn with the optimal number of three crossings. The 
planarization heuristic starts with a maximal (in the fig- 
ure, in fact, maximum) planar subgraph (second figure) 
where the edges (x, y) and (a, b) are removed. Then, 
iteratively, we find optimal embeddings to reinsert these 
edges into the planarly drawn graph, simulating crossings 
by dummy vertices (shown as squares). Although the 
insertion problems are solved optimally, the result requires 
four crossings 

This algorithmic breakthrough has allowed the 
planarization heuristic to perform extraordinar- 
ily well in practice, both in terms of running 
time and of solution quality. The algorithm and 
its proofs also give rise to several strong pre- 
and postprocessing routines. Furthermore, it is 
pivotal to several later results such as insertion 
of stars [6] or insertion-based approximation al- 
gorithms [2, 7, 8]. The strongest of the latter 
considers the problem of inserting several edges 
F simultaneously into a planar graph G [7]: 
it can be shown that this multi-edge-insertion 
problem approximates cr(G + F) [8], but is 
unfortunately itself already NP-hard. However, 
this insertion problem can in turn also be ap- 
proximated. This approximation chain gives the 
currently only practically relevant approximation 
algorithm. In fact, it arguably gives the best 
running time vs. solution-quality trade-off among 
all known algorithms in practice [5]. 


Exact Approaches 

In certain cases, a heuristic or approximate so- 
lution is not good enough, e.g., when the result 
is to be used as a base case in some formal 
graph-theoretic proof. There are exact approaches 
based on integer linear programs. The currently 
strongest one, constituting the second central 
paper of this entry, is able to solve typical “real- 
world graph drawing” instances (i.e., relatively 
sparse graphs with up to 80-100 vertices) to 
optimality in a couple of minutes [9]. The idea 
is to introduce binary indicator variables x_{e,f} for 
each edge pair e, f ∈ E, which are 1 if and only 
if the two edges cross in the optimum solution. 
Minimizing the sum of these variables gives the 
desired objective function. 

Planarisation and Crossing Minimisation, Fig. 2 A graph (to the left) and its decomposition into triconnectivity 
structures, arranged within an SPR-tree. Thick dashed or dotted edges are virtual 

It remains to ensure that the variables are 
set correctly. To this end, we introduce Kura- 
towski constraints. The famous Kuratowski the- 
orem states that planar graphs are characterized 
by the absence of subdivisions of certain small 
subgraphs (namely, K5 and K3,3). In other words, 
for each such subgraph K, we can require that 
at least one edge pair e, f € E(K) crosses. 
Unfortunately, there may be an exponential num- 
ber of such subgraphs in G, and we also have 
to take care of Kuratowski subgraphs that only 
arise because certain other edge pairs cross (es- 
tablishing a dummy vertex). Even when solving 
all these challenges, a further crucial problem 
remains: Consider a (presumably optimum) 0/1- 
assignment to the x-variables, satisfying all Ku- 
ratowski constraints. It is still NP-complete to 
decide whether this solution is at all feasible! 
Consider an edge e that is crossed by edges f and 
g. Our x-variables establish these crossings, but 
we do not know the order of f and g along e. De- 
ciding whether any feasible order exists is what 
makes the problem still hard. There are two meth- 
ods to solve the problem: We can subdivide each 
edge e sufficiently often and allow at most one 
crossing per edge segment. Alternatively, we can 
introduce additional y-variables to explicitly 
describe a linear ordering of all edges crossing 
edge e, for all edges e. Of course, we have to 
modify the Kuratowski constraints accordingly. 

While the second modeling approach is prac- 
tically more efficient, we like to showcase the 
central ideas only with a simplified version of 
the first model here. Let U be any upper bound 
on the crossing number of G, e.g., found via the 
planarization heuristic described above. Clearly, 
any optimum solution will have at most U cross- 
ings on any edge. Therefore, let H be the graph 
obtained from G by subdividing each edge into 
U edge segments. Now, we solve the problem on 
H instead of G, where we can require that each 
segment is crossed at most once: 


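\[
\begin{aligned}
\min \quad & \sum_{e, f \in E(H),\, e \neq f} x_{\{e,f\}} && \text{subject to} && (1) \\
& \sum_{f \in E(H) \setminus \{e\}} x_{\{e,f\}} \le 1 && \forall e \in E(H) && (2) \\
& x_{\{e,f\}} \in \{0, 1\} && \forall e, f \in E(H),\ e \neq f && (3)
\end{aligned}
\]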


This model minimizes the sum of the variables 
indicating a crossing between a pair of edge 
segments and allows at most one crossing per 
segment. It remains to discuss the Kuratowski 
constraints to establish that only graph- 
theoretically feasible solutions are allowed. Let 
𝒳 denote the set of all binary solution vectors 
satisfying (2), and consider any solution vector 
x ∈ 𝒳. Furthermore, let R(x) be the set of edge 
pairs {e, f} for which x_{{e,f}} = 1. Starting with 
H, we can realize x by, for each {e, f} ∈ R(x), 
subdividing e and f and identifying the two 
new vertices. This vertex may be called a dummy 
vertex, representing the crossing. We obtain a 
final graph H[x] and let 𝒦(x) denote the set of 
all Kuratowski subdivisions in H[x]. Intuitively, 
for any subdivision K ∈ 𝒦(x), we need to require 
at least one crossing on K, if the crossings 
R(x) exist. Formally, this establishes the final 
constraint class: 

\[
\sum_{e, f \in E(K),\, e \neq f} x_{\{e,f\}} \;\ge\; 1 - \sum_{\{e,f\} \in R(\mathbf{x})} \bigl(1 - x_{\{e,f\}}\bigr)
\qquad \forall \mathbf{x} \in \mathcal{X},\ K \in \mathcal{K}(\mathbf{x}) \qquad (4)
\]


Independent of whether we use the just de- 
scribed subdivision-based model or the stronger 
one based on linear orderings, the obtained ILP 
models are much too large to solve directly. 
We require both special separation and column- 
generation routines, in order to produce the ac- 
tually necessary constraints and variables on the 
fly. The approaches’ practical applicability is fur- 
thermore only possible due to the strong heuris- 
tics described above (which often give optimum 
upper bounds early on), heavy preprocessing [4], 
and efficient planarity testing routines. 


Open Problems 


The area of crossing numbers is filled with in- 
teresting open questions. Let us pinpoint two of 
them: 

The original question, as stated in 1944 by Pál 
Turán, was for the crossing number of complete 
bipartite graphs. We still do not know the answer 
for this graph class, nor do we for complete 
graphs. While we have upper bounds, which 
are conjectured to be optimal, we are stuck 
with partial proofs and positive results for small 
graphs. 

We know that the crossing number problem is APX- 
hard [1], i.e., there cannot be a polynomial-time 
approximation scheme unless P = NP. However, even for 
graphs with bounded maximum degree, the best 
approximation ratios are only slightly sublinear 
in |V| [10] or dependent on parameters like 
the graph’s genus [14] or the number of edges 
whose removal makes the graph planar [7]. 
Does there exist a constant factor approximation 
to the crossing number problem? At least for 
graphs with bounded degree? 


URLs to Code and Data Sets 


The free (GPL) Open Graph Drawing Frame- 
work (OGDF) contains implementations of the 
strongest planarization heuristics and the exact 
algorithms: http://www.ogdf.net. 

A web front-end to the exact crossing minimizer 
is freely available at http://crossings.uos.de. 
Problem Definition 


The problem is to determine whether or not the 
input graph G is planar. The definition pertinent 
to planarity-testing algorithms is: G is planar 
if there is an embedding of G into the plane 
(vertices of G are mapped to distinct points and 
edges of G are mapped to curves between their re- 
spective endpoints) such that edges do not cross. 
Algorithms that test the planarity of a graph can 
be modified to obtain such an embedding of the 
graph. 


Key Results 


Theorem 1 There is an algorithm that given 
a graph G with n vertices, determines whether or 
not G is planar in O(n) time. 


The first linear-time algorithm was obtained by 
Hopcroft and Tarjan [5] by analyzing an itera- 
tive version of a recursive algorithm suggested 
by Auslander and Parter [1] and corrected by 
Goldstein [4]. The algorithm is based on the 
observation that a connected graph is planar if 
and only if all its biconnected components are 
planar. The recursive algorithm works with each 
biconnected component in turn: find a separating 
cycle C and partition the edges of G not in C; 
define a component of the partition as consisting 
of edges connected by a path in G that does not 
use an edge of C; and recursively consider each 
cyclic component of the partition. If each compo- 
nent of the partition is planar and the components 
can be combined with C to give a planar graph, 
then G is planar. 

Another method for determining planarity was 
suggested by Lempel, Even, and Cederbaum [6]. 
The algorithm starts by embedding a single 
vertex and the edges incident to this vertex. It 
then considers a vertex incident to one of these 
edges. For correctness, the vertices must be con- 
sidered in a particular order. This algorithm was 
first implemented in O(n) time by Booth and 
Lueker [2] using an efficient implementation of 
the PQ-trees data structure. Simpler implementa- 
tions of this algorithm have been given by Boyer 
and Myrvold [3] and Shih and Hsu [8]. 



Tutte gave an algebraic method for giving 
a straight-line embedding of a graph that, if the 
input graph is 3-connected and planar, is guaran- 
teed to generate a planar embedding. The key idea 
is to fix the vertices of one face of the graph to be 
the corners of a convex polygon and then embed 
every other vertex as the geometric average of its 
neighbors. 
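
The averaging idea can be illustrated with the sketch below. Tutte's method solves a linear system; this sketch instead approximates its solution by repeatedly replacing each free vertex with the average of its neighbors, which is an assumption made here to keep the example free of linear-algebra libraries. The names tutte_embedding, adjacency, and fixed are hypothetical.

```python
def tutte_embedding(adjacency, fixed, iterations=200):
    """Barycentric embedding; fixed maps outer-face vertices to (x, y) positions."""
    pos = {v: fixed.get(v, (0.0, 0.0)) for v in adjacency}
    for _ in range(iterations):
        for v, nbrs in adjacency.items():
            if v in fixed:
                continue  # outer-face vertices stay on the convex polygon
            # Move v to the average of its neighbors' current positions.
            pos[v] = (sum(pos[u][0] for u in nbrs) / len(nbrs),
                      sum(pos[u][1] for u in nbrs) / len(nbrs))
    return pos
```

For K4 with three vertices fixed at the corners of a triangle, the fourth vertex lands at the centroid of that triangle.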


Applications 


Planarity testing has applications to computer- 
aided circuit design and VLSI layout by deter- 
mining whether a given network can be realized 
in the plane. 


URL to Code 


LEDA has an efficient implementation of the 
Hopcroft and Tarjan planarity testing algorithm [7]: 
http://www.algorithmic-solutions.info/leda_guide/graph_algorithms/planar_kuratowski.html 
Problem Definition 


Point location is a well-studied problem in com- 
putational geometry with many applications in 
geometric information systems and computer- 
aided design. In general terms, the problem is to 
find which element of a given object contains a 
given query point. More precisely, we are given 
a subdivision S of a metric space, usually the 
Euclidean plane. The goal is then to preprocess 
S so that for a query point p we can determine 
which region of the subdivision contains p. There 
are many variants of the problem, with different 
constraints on the subdivision. The most common 
variant and the focus of this overview is planar 
point location, where S is a polygonal subdivi- 
sion of the Euclidean plane. In fact, for several 
data structures, we assume that the input is a 
triangulation. (Note that a polygonal subdivision 
can be triangulated in linear time [1].) As for 
most query data structures, the efficiency of a 
point location structure is measured by its space 
requirement, the time needed to preprocess the 
input, and the query time. The efficiency of the 
algorithms and data structures described here will 
be expressed as a function of n, the number of 
vertices in the subdivision. 


Key Results 


Most solutions for the point location problem 
build upon one (or a combination) of three basic 
ideas: walking in a triangulation, a trapezoidal de- 
composition, or a hierarchical triangulation. The 
three methods provide a basic trade-off between 
space usage and query time and are described in 
more detail below. 


Walking in a Triangulation 

This approach is the simplest of the three and 
requires no additional storage or preprocessing if 
the input is provided in a suitable format in which 
it takes constant time to access the neighbors 
of a triangle. For each query, we start in some 
triangle or vertex of the triangulation and walk 
to the query point by traversing the triangulation, 
walking from the current triangle to a neighbor in 
each step. The triangle to visit next is determined 
by a walking strategy. Devillers et al. [3] describe 
three walking strategies that are described below 
and illustrated in Fig. 1. 


• Straight-line walk. Starting at a vertex of the 
triangulation we walk along a straight line 
toward the query point. 

• Orthogonal walk. Instead of walking along a 
straight line, we first walk parallel to the x- 
axis and then parallel to the y-axis. 

• Visibility walk. For each triangle visited, we 
pick an edge of the triangle (at random or in 
some specific order) and test if its supporting 
line separates the query point from the trian- 
gle. If this is the case, then we continue our 
walk by crossing that edge into a new triangle. 
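
The visibility walk from the list above can be sketched as follows. The data layout (counterclockwise triangles plus a neighbor table) and the names orient and visibility_walk are assumptions made for this illustration. The sketch tests edges in a fixed order, so it is only suitable for triangulations, such as Delaunay triangulations, on which a fixed-order walk is known to terminate, and it assumes the triangulation covers a convex region.

```python
def orient(a, b, q):
    """Positive if q lies to the left of the directed line from a to b."""
    return (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])

def visibility_walk(points, triangles, neighbors, start, q):
    """Walk from triangle `start` to the triangle containing q.

    triangles[t] holds three vertex indices in counterclockwise order;
    neighbors[t][i] is the triangle across edge (t[i], t[(i+1) % 3]), or None.
    """
    t = start
    while True:
        tri = triangles[t]
        for i in range(3):
            a, b = points[tri[i]], points[tri[(i + 1) % 3]]
            # The supporting line of this edge separates q from the triangle.
            if orient(a, b, q) < 0:
                if neighbors[t][i] is None:
                    return None  # q lies outside the triangulation
                t = neighbors[t][i]
                break
        else:
            return t  # no edge separates q from the current triangle
```

Each step costs only a few orientation tests, which matches the observation that the visibility walk spends little time per triangle.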


The worst-case behavior of each of the walking 
strategies is very bad as there are triangulations 
where each walk visits Ω(n) triangles in expectation 
given a random starting point and query 
point. In fact, the visibility walk is not guaranteed 
to reach the query point if edges are picked in 
a fixed order and the randomized version may 
require exponential time [3]. However, this seems 
to require many long and thin triangles, which do 
not occur in most practical applications, where 
the subdivision is often closer to a Delaunay tri- 
angulation. Experiments on Delaunay triangula- 
tions of random point sets suggest that the meth- 
ods are comparable in total query time, though 
they provide a trade-off between the number of 
triangles visited and the time spent per triangle. 


Point Location, Fig. 1 The paths traversed by (a) a straight-line walk, (b) an orthogonal walk, and (c) a visibility 
walk. Note that any path following the arrows is possible in the visibility walk 
Point Location, Fig. 2 (a) Part of a trapezoidal decomposition with (b) a sequence of trapezoids visited by a query 


Point Location, Fig. 3 The triangulations of the hierarchy from left to right. Triangles containing the query point are 
marked with dark gray and other triangles inspected with light gray 


That is, the straight-line walk visits the fewest 
triangles, but needs most time to determine which 
triangle to visit next, whereas the visibility walk 
visits the most triangles, but spends the least time 
per triangle. 


Trapezoidal Decomposition 

The first data structure for point location to 
achieve O(log n) query time was 
provided by Dobkin and Lipton [6]. The structure 
is created by cutting the subdivision into slices 
using vertical lines through each of its vertices. 
Point location can then be done using two binary 
searches, first on the slices and then within a slice. 
This creates O(n²) trapezoids, but Sarnak and 
Tarjan [11] show we don’t have to explicitly store 
all of them and instead only need O(n) space and 
O(n log n) time to store them implicitly in a 
search structure. 
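
The two binary searches of the slab approach can be sketched as below. This sketch stores every slab explicitly, so it uses quadratic space in the worst case and does not include the space-saving refinement of Sarnak and Tarjan. The function names are hypothetical, and the input is assumed to be a list of non-crossing, non-vertical segments, each given with its left endpoint first.

```python
from bisect import bisect_right

def y_at(seg, x):
    """Height of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """Cut the plane at every endpoint and sort the segments inside each slab."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        spanning = [s for s in segments if s[0][0] <= left and s[1][0] >= right]
        slabs.append(sorted(spanning, key=lambda s: y_at(s, mid)))
    return xs, slabs

def segment_below(xs, slabs, q):
    """Return the segment directly below q, or None if there is none."""
    i = bisect_right(xs, q[0]) - 1   # first binary search: find the slab
    if i < 0 or i >= len(slabs):
        return None
    slab = slabs[i]
    lo, hi = 0, len(slab)            # second binary search: within the slab
    while lo < hi:
        m = (lo + hi) // 2
        if y_at(slab[m], q[0]) <= q[1]:
            lo = m + 1
        else:
            hi = m
    return slab[lo - 1] if lo > 0 else None
```

Knowing the segment directly below the query point identifies the trapezoid, and hence the region of the subdivision, that contains it.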

A subdivision into trapezoids can also be cre- 
ated in a more careful way so that only O(n) 
trapezoids are created. For a polygonal subdivi- 
sion, its trapezoidal decomposition is defined as 
the result of shooting vertical rays upward and 
downward from each vertex of the subdivision 
until they hit an edge of the subdivision; if they 
do not hit such an edge, they extend to infinity 
(see also Fig. 2). A trapezoidal decomposition 
can be constructed by incrementally adding the 
edges of the subdivision [10]. Each insertion of an 
edge can interrupt some rays and introduces two 
new ones, namely, the rays from the endpoints 
of the edge. Seidel [12] showed that the history 
of this process can be recorded in a directed 
acyclic graph, which can be used for point loca- 
tion. Using randomized incremental construction, 
that is, adding the edges in random order, the 
longest path in the graph has an expected length 
of O(log n). Building the structure itself takes 
O(n log n) expected time and O(n) expected 
space. For a guaranteed query time of O(log n), 
we can check the longest path and rebuild the 
structure if it is too long, resulting in O(n log n) 
expected preprocessing time and O(log n) guaranteed 
query time [13]. 


Hierarchical Triangulation 

Point location can also be done using a so- 
called hierarchical triangulation. We start with 
a triangulation of the subdivision and in each 
level of the hierarchy, we remove a subset of 
the points and compute a triangulation of the 
remainder. Each triangle of this new triangulation 
then stores a list of triangles from the previous tri- 
angulation that intersect it as illustrated in Fig. 3. 
This process is repeated until only one triangle 
remains. (Here, we assume that the outer face 
of the input triangulation is a triangle and these 
vertices are never removed.) To query for a point
p, we traverse this hierarchy of triangles starting
at the top level consisting of only one triangle. At
each level, we know the triangle T that contains
p and find the triangle T’ from the previous level
that contains p by a linear search on the triangles
that intersect T (see Fig. 3 from right to left).
The query time depends on the number of 
levels and the number of intersections each triangle
has with the previous level. Kirkpatrick [9]
showed that it is possible to find a set containing
a constant fraction of the points, such that its
removal creates triangles that each intersect a
constant number of triangles from the previous
level. He also shows that such a set of points can
be found in O(n) time. By picking and removing
points from each level in this way, we create at
most O(log n) levels and each triangle will intersect
at most O(1) triangles from the previous level. As a
result, a point location query takes O(log n) time
in total, while the structure requires O(n) space.


Hybrid and Refined Approaches 

The walking strategy requires no additional mem- 
ory and is fast on small instances, but has very 
poor performance on larger instances, both in 
theory and in practice. To make the walking 
approach more feasible for larger inputs, sev- 
eral structures have been proposed that combine 
walking strategies with other methods to obtain 
fast query times with low overhead costs in terms 
of space and preprocessing time. 


Delaunay Hierarchy. Hierarchical triangulations 
can be combined with walking algorithms to 
reduce the number of levels in the hierarchy. 
Finding the correct triangle in the next level of the 
hierarchy is done using a walking algorithm as 
opposed to a linear search. Devillers [2,3] showed 
that this results in a very fast query time without 
requiring a lot of preprocessing time or space. 


Jump and Walk. In most walking strategies, the 
starting point is chosen at random. In the jump- 
and-walk approach, we use a set S of several 
starting points and start our walk from the one 
that is nearest to the query point. The set S 
can either be chosen at random for each new 
query [5] or picked more carefully and stored in 
a data structure for nearest-neighbor searches [4].
Depending on the size of S and the complexity of
the search structure, query times between O(n^{1/3})
and O(log n) can be achieved.


Experimental Results 


Several solutions have been implemented in
CGAL, the Computational Geometry Algorithms
Library. Haran and Halperin [7] compared several
of the implementations from CGAL. Their 
results show that the various methods provide 
a trade-off between how much memory and 
preprocessing time is used and the resulting 
query time. Overall, they conclude that a jump- 
and-walk algorithm performs well, if the set 
of potential starting points is carefully chosen 
and stored in an efficient search structure. 
Recently, the CGAL variant of the trapezoidal 
decomposition approach has received a major 
overhaul [8]. Unlike some of the other variants,
this implementation guarantees an O(log n)
query time, while experiments show that the
implementation is still competitive with other 
approaches.


Problem Definition


Let R denote the set of reals and R^d the
d-dimensional real space. A finite subset of R^d is
called a point set. The set of all point sets (subsets
of R^d) is denoted P(R^d).

Point pattern matching problems ask for
finding similarities between point sets under
some transformations. In the basic set-up a target
point set T ⊂ R^d and a pattern point set (point
pattern) P ⊂ R^d are given, and the problem
is to locate a subset I of T (if it exists) such
that P matches I. Matching here means that P
becomes exactly or approximately equal to I
when a transformation from a given set F of
transformations is applied on P.

Set F can be, for example, the set of all 
translations (a constant vector added to each 
point in P), or all compositions of translations and 
rotations (after a translation, each point is rotated 
with respect to a common origin; this preserves 
the distances and is also called a rigid movement), 
or all compositions of translations, rotations, and 
scales (after translating and rotating, distances to 
the common origin are multiplied by a constant). 

The problem variant with exact matching,
called the Exact Point Pattern Matching (EPPM)
problem, requires that f(P) = I for some
f ∈ F. In other words, the EPPM problem is
to decide whether or not there is an allowed
transformation f such that f(P) ⊆ T. For
example, if F is the set of translations, the
problem is simply to decide whether P + t ⊆ T
for some t ∈ R^d.

Approximate matching is a better model of
many situations that arise in practice. Then the
quality of the matching between f(P) and I is
controlled using a threshold parameter ε ≥ 0 and
a distance function δ: P(R^d) × P(R^d) → R for
measuring distances between point sets. Given
ε ≥ 0, the Approximate Point Pattern Matching
(APPM) problem is to determine whether there is
a subset I ⊆ T and a transformation f ∈ F such
that δ(f(P), I) ≤ ε.

The choice of the distance function δ is
another source of diversity in the problem
statement. A variant requires that there is
a one-to-one mapping between f(P) and I, and
each point p of f(P) is ε-close to its one-to-one
counterpart p* in I, that is, |p - p*| ≤ ε.
A commonly studied relaxed version uses
matching under a many-to-one mapping: it is only
required that each point of f(P) has some point
of I that is ε-close; this distance is also known
as the directed Hausdorff distance. Still more
variants come from the choice of the norm |·| to
measure the distance between points.

Another form of approximation is obtained
by allowing a minimum amount of unmatched
points in P: The Largest Common Point Set (LCP)
problem asks for the largest I ⊆ T such that
I ⊆ f(P) for some f ∈ F. In the Largest Approximately
Common Point Set (LACP) problem
each point p* ∈ I must occur ε-close to a point
p ∈ f(P).

Finally, a problem closely related to point pattern
matching is to evaluate for point sets A and B
their smallest distance min_{f ∈ F} δ(f(A), B) under
transformations F or to test if this distance is at most ε.
This problem is called the distance evaluation
problem.


Key Results 


A folklore result is a voting algorithm that
solves EPPM under translations in O(|P||T|
log(|T||P|)) time: Collect all translations
mapping each point of P to each point of T, sort
the set, and report the translation getting most
votes. If some translation f gets |P| votes, then
a subset I such that f(P) = I is found. With some
care in organizing the sorting, one can achieve
O(|P||T| log |P|) time [13].
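
The voting idea can be sketched in a few lines. As an assumption of this sketch, the votes are tallied with a hash-based counter instead of the sorting step described above, which changes the constant factors but not the idea; the function name is hypothetical, and points are assumed to be distinct tuples of integers.

```python
from collections import Counter

def best_translation(pattern, target):
    """Return (translation, votes) for the translation mapping most pattern points into target."""
    votes = Counter()
    for p in pattern:
        for t in target:
            # The translation that moves p onto t receives one vote.
            votes[tuple(ti - pi for ti, pi in zip(t, p))] += 1
    return votes.most_common(1)[0]
```

A result with `votes == len(pattern)` signals an exact occurrence of the pattern; a smaller maximum gives the size of the largest common point set under translations.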

The voting algorithm also solves the LCP 
problem under translations. A faster algorithm 
specific to EPPM is as follows: Let p_1, p_2, ..., p_m
and t_1, t_2, ..., t_n be the lists of pattern and target
points, respectively, lexicographically ordered according
to their d-dimensional coordinate values.
Consider the translation f_{i_1} = t_{i_1} - p_1 for any
1 ≤ i_1 ≤ n. One can scan the target points in
lexicographic order to find a point t_{i_2} such that
p_2 + f_{i_1} = t_{i_2}. If such a point is found, one can continue
scanning from t_{i_2 + 1} on to find t_{i_3} such that
p_3 + f_{i_1} = t_{i_3}. This process is continued until
a translated point of P does not occur in T or until 
a translated occurrence of the entire P is found. 
Careful implementation of this idea leads to the 
following result showing that the time bound of 
the naive string matching algorithm is possible 
also for the exact point pattern matching under 
translations.


Theorem 1 (Ukkonen et al. 2003 [13]) The
EPPM problem under translations for point pattern
P and target T can be solved in O(mn) time
and O(n) space, where m = |P| ≤ |T| = n.
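
One way to realize the scanning idea within the O(mn) bound is to keep one forward-moving pointer per pattern point. The following is a minimal sketch under that assumption, not necessarily the implementation of [13]; the function name is hypothetical, the pattern is assumed to be non-empty, and points are tuples of integers.

```python
def exact_matches(pattern, target):
    """Return all translations f with pattern + f contained in target."""
    P, T = sorted(pattern), sorted(target)   # lexicographic order
    m, n = len(P), len(T)
    q = [0] * m          # q[j]: scan position in T for pattern point P[j]
    found = []
    for t in T:          # candidate translations f = t - P[0], in increasing order
        f = tuple(a - b for a, b in zip(t, P[0]))
        matched = True
        for j in range(1, m):
            want = tuple(a + b for a, b in zip(P[j], f))
            # Since f only grows lexicographically, q[j] never has to move backward.
            while q[j] < n and T[q[j]] < want:
                q[j] += 1
            if q[j] == n or T[q[j]] != want:
                matched = False
                break
        if matched:
            found.append(f)
    return found
```

Each of the m pointers advances at most n times in total, which gives the O(mn) bound after sorting.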


Quadratic running times are probably the best one 
can achieve for PPM algorithms: 


Theorem 2 (Clifford et al. 2006 [10]) The LCP 
problem under translations is 3SUM-hard. 


This means that an o(|P||T|) time algorithm
for LCP would yield an o(n^2) algorithm for the
3SUM problem, where |T| = n and |P| = Θ(n).
The 3SUM problem asks, given n numbers,
whether there are three numbers a, b, and c
among them such that a + b + c = 0; finding
a sub-quadratic algorithm for 3SUM would be 
a surprise [5]. For a more in-depth combinatorial 
characterization of the geometric properties of 
the EPPM problem, see [7]. 
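
For reference, the textbook quadratic-time algorithm for 3SUM is sketched below; it is included only to make the hardness statement concrete and is not taken from [5] or [10]. The function name is hypothetical, and the three numbers are assumed to come from three distinct positions of the input.

```python
def has_3sum(nums):
    """Decide whether three entries of nums (at distinct positions) sum to zero, in O(n^2) time."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:               # two-pointer scan over the sorted suffix
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return False
```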

For the distance evaluation problems there is a
plethora of results. An excellent survey of the key
results until 1999 is by Alt and Guibas [2]. As an
example, consider in the 2-dimensional case how
one can decide in O(n log n) time whether there
is a transformation f composed of translation,
rotation and scale, such that f(A) = B, where
A, B ⊂ R^2 and n = |A| = |B|: The idea is to
convert A and B into an invariant form such that
one can easily check their congruence under the
transformations. First, scale is taken into account
by scaling A to have the same diameter as B
(in O(n log n) time). If A and B are congruent,
then they must have the same centroids (which
can be computed in O(n) time). Consider rotating
a line from the centroid and listing the angles
and distances to other points in the order they
are met during the rotation. Having done this (in
O(n log n) time) on both A and B, the lists of angles
and distances should be cyclic shifts of each
other; that is, the list L_A of A occurs as a substring in
L_B L_B, where L_B is the list of B. This latter step
can be done in O(n) time using any linear time
exact string matching algorithm. One obtains the
following result.


Theorem 3 (Atkinson 1987 [4]) It is possible
to decide in O(n log n) time whether there is
a transformation f composed of translation, rotation
and scale, such that f(A) = B, where
A, B ⊂ R^2 and |A| = |B| = n.


The approximate variant of the above problem is
much harder. Denote by f(A) =_ε B the directed
approximate congruence of point sets A and B,
meaning that there is a one-to-one mapping
from f(A) to B such that for each point in f(A)
its image in B is ε-close. The following result
demonstrates the added difficulty.


Theorem 4 (Alt et al. 1988 [3]) It is possible
to decide in O(n^6) time whether there is
a translation f such that f(A) =_ε B, where
A, B ⊂ R^2 and |A| = |B| = n. The same
algorithm solves the corresponding LACP
problem for point pattern P and target T under
the one-to-one matching condition in O((mn)^3)
time, where m = |P| ≤ |T| = n.


To get an idea of the techniques to achieve
the O((mn)^3) time algorithm for LACP,
consider first the one-dimensional version,
i.e., let P, T ⊂ R. Observe that if there is
a translation f' such that f'(P) =_ε T, then there
is a translation f such that f(P) =_ε T and a point
p ∈ P that is mapped exactly at ε-distance
of a point t ∈ T. This lets one concentrate on
these 2mn representative translations. Consider
these translations sorted from left to right.
Denote the left-most translation by f. Create
a bipartite graph whose nodes are the points
in P and in T, placed on opposite sides. There
is an edge between p ∈ P and t ∈ T if and
only if f(p) is ε-close to t. Finding a maximum
matching in this graph tells the size of the
largest approximately common point set after
applying the translation f. One can repeat this
on each representative translation to find the
overall largest common point set. When the
representative translations are considered from
left to right, the bipartite graph instances are such
that one can compute the maximum matchings
greedily at each translation in time O(|P|) [6].
Hence, the algorithm solves the one-dimensional
LACP problem under translations and one-to-one
matching condition in time O(m^2 n), where
m = |P| ≤ |T| = n.

In the two-dimensional case, the set of representative
translations is more implicitly defined:
In short, the mapping of each point p ∈ P ε-close
to each point t ∈ T gives mn circles. The boundary
of each such circle is partitioned into intervals
such that the end points of these intervals can be
chosen as representative translations. There are
O((mn)^2) such representative translations. As
in the one-dimensional case, each representative
translation defines a bipartite graph. Once the
representative translations along a circle are processed,
e.g., counterclockwise, the bipartite graph
changes only by one edge at a time. This allows
an O(mn) time update for the maximum matching
at each representative translation, yielding an
overall O((mn)^3) time algorithm [3].
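
The one-dimensional procedure described above can be sketched as follows. This is a simplified illustration, not the algorithm of [6]: it recomputes a greedy matching from scratch in O(m + n) time for each of the 2mn representative translations instead of updating it in O(|P|) time. The function names are hypothetical, and exact arithmetic (e.g., integer coordinates and an integer ε) is assumed so that points at distance exactly ε are recognized.

```python
def matched_count(P, T, f, eps):
    """Size of a maximum one-to-one matching between P + f and T with pairs at distance <= eps.

    P and T must be sorted; on a line, greedily pairing the leftmost
    unmatched points is optimal for a uniform threshold.
    """
    i = j = count = 0
    while i < len(P) and j < len(T):
        d = P[i] + f - T[j]
        if abs(d) <= eps:
            count += 1
            i += 1
            j += 1
        elif d < 0:
            i += 1      # P[i] + f is too far left to match T[j] or anything after it
        else:
            j += 1      # T[j] is too far left to match P[i] + f or anything after it
    return count

def lacp_1d(pattern, target, eps):
    """Return (size, translation) of a largest approximately common point set."""
    P, T = sorted(pattern), sorted(target)
    best = (0, None)
    for p in P:
        for t in T:
            # Representative translations place p exactly at distance eps from t.
            for f in (t - p - eps, t - p + eps):
                c = matched_count(P, T, f, eps)
                if c > best[0]:
                    best = (c, f)
    return best
```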

More efficient algorithms for variants of this 
problem have been developed by Efrat, Itai, and 
Katz [11], as by-products of more efficient bipar- 
tite matching algorithms for points on a plane. 
Their main result is the following: 


Theorem 5 (Efrat et al. 2001 [11]) It is possible
to decide in O(n^5 log n) time whether there
is a translation f such that f(A) =_ε B, where
A, B ⊂ R^2 and |A| = |B| = n.


The problem becomes somewhat easier when the
one-to-one matching condition is relaxed; the
one-to-one condition seems to necessitate the use
of bipartite matching in one form or another.
Without the condition, one can match the points 
independently of each other. This gives many 
tools to preprocess and manipulate the point sets 
during the algorithm using dynamic geometric 
data structures. Such techniques are exploited 
e.g., in the following result. 


Theorem 6 (Chew and Kedem 1992 [8]) The
LACP problem under translations and using directed
Hausdorff distance and the L_1 norm can
be solved in O(mn log n) time, where P, T ⊂ R^2
and m = |P| ≤ |T| = n. The distance evaluation
problem for directed Hausdorff distance can
be solved in O(n^2 log^2 n) time.


Most algorithms revisited here have relatively 
high running times. To obtain faster algorithms, 
it seems that randomization and approximation
techniques are necessary. See [9] for a comprehensive
summary of the main achievements in
that line of development. 

Finally, note that the linear transformations
considered here are not always enough to
model a real-world problem, even when
approximate congruence is allowed. Sometimes
the proper transformation between two point 
sets (or between their subsets) is non-linear, 
without an easily parametrizable representation. 
Unfortunately, the formulations trying to capture 
such non-uniformness have been proven NP- 
hard [1] or even NP-hard to approximate within 
any constant factor [12]. 


Applications 


Point pattern matching is a fundamental problem 
that naturally arises in many application domains 
such as computer vision, pattern recognition, image
retrieval, music information retrieval, bioinformatics,
dendrochronology, and many others.


Problem Definition


Definition 1 A simple polygon is a polygon
whose interior is simply connected, i.e., it
consists of a single connected component and
does not contain holes.


Definition 2 A triangulation of a simple polygon 
P with N vertices is a partition of the polygon, 
considered as a full-dimensional subset of the 
plane, into N — 2 nonoverlapping triangles such 
that the set of vertices of these triangles is the set 
of vertices of P, such that no edge of a triangle 
lies outside of P, and such that no triangle edges 
intersect except in their common endpoints. 
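
To make Definition 2 concrete, the following sketch triangulates a simple polygon by naive ear clipping. It is not one of the algorithms surveyed in this entry and runs in cubic worst-case time as written; it only illustrates that a simple polygon with N vertices yields N - 2 triangles. The function names are hypothetical, and the input is assumed to be a simple polygon given in counterclockwise order without three collinear consecutive vertices.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive if counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of the counterclockwise triangle (a, b, c)."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0

def triangulate(poly):
    """Return N - 2 triangles as index triples into poly."""
    idx = list(range(len(poly)))
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = poly[i], poly[j], poly[l]
            if cross(a, b, c) <= 0:
                continue                      # reflex or degenerate corner, not an ear
            if any(in_triangle(poly[v], a, b, c) for v in idx if v not in (i, j, l)):
                continue                      # another vertex blocks the diagonal
            triangles.append((i, j, l))       # clip the ear at vertex j
            del idx[k]
            break
        else:
            raise ValueError("no ear found; is the polygon simple and counterclockwise?")
    triangles.append(tuple(idx))
    return triangles
```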


Key Results 


In addition to the regularization-based approach 
by Garey et al. [7], three other O(N log N)- 
time algorithms are milestones on the way toward 
an optimal linear-time algorithm. In the first of 
these algorithms [2], Chazelle uses a linear-time 
“polygon-cutting” approach to partition a sim- 
ple polygon by a suitably chosen diagonal; the 
resulting divide-and-conquer scheme yields an 
O(N log N)-time algorithm for simple polygons. 
Hertel and Mehlhorn [8] present an O(N log N)- 
time plane-sweep algorithm and refine its analy- 
sis to yield an O(N + R log R)-time upper bound, 
where R is the number of concave polygon an- 
gles. Chazelle and Incerpi [4] present a divide- 
and-conquer algorithm with O(N + S log S) run- 
ning time; here, S denotes the maximum number 
of times the boundary of the polygon changes 
from “spiraling” to “antispiraling.” 


Polygon Triangulation in o(N log N) Time
Algorithms with o(N log N) running time were
developed by Tarjan and van Wyk [14] (using
Jordan sorting and finger trees) and Kirkpatrick,
Klawe, and Tarjan [10] (using efficient point-location
structures). Both algorithms can be
shown to run in O(N loglogN) time; the 
algorithm by Kirkpatrick et al. can be made to run 
in O(N log* N) time if the polygon’s vertices 
have bounded integer coordinates. Clarkson, 
Tarjan, and van Wyk [7] restate the algorithm 
by Tarjan and van Wyk in a randomized setting 
using random sampling and develop a Las Vegas 
algorithm with O(N log* N) expected running 
time. The same expected running time can be 
obtained by a considerably simpler randomized 
incremental construction presented by Seidel [13];
as an added benefit, this algorithm constructs an 
efficient data structure for vertical ray shooting 
among a set of line segments. 


Polygon Triangulation via Trapezoidation
The key to an efficient polygon triangulation algorithm
was the observation that polygon triangulation is
linear-time equivalent to polygon trapezoidation. Here,
the task is to compute for each vertex v of a simple
polygon P the point (if any) of the boundary
of P that is visible from v when shooting horizontal
rays (chords) from v toward ±∞ through
the interior of P. The resulting structure is called
the visibility map of P.


Theorem 1 ([6]) Given the trapezoidal decomposition
of a simple polygon P, a triangulation of
P can be computed in linear time and vice versa. 


The proof builds upon the fact that a trapezoidation
for a set of points in general position,
i.e., a set in which no two points share the same
y-coordinate, consists of trapezoids, which may
degenerate into triangles, that have exactly two
polygon vertices on their boundary. A trapezoid
T is said to be a class-A-trapezoid if these vertices
lie on the same side of T (this includes
the case of triangles); otherwise, it is said to
be a class-B-trapezoid. Fournier and Montuno
observed that a polygon can be partitioned into
so-called unimonotone polygons by adding diagonals
between the vertices of class-B-trapezoids
and that these unimonotone polygons can be
triangulated independently in linear time (see
Fig. 1).

Polygon Triangulation, Fig. 1 (a) Trapezoidation. Phases of Fournier and
Montuno’s algorithm [6]. By connecting the vertices of
class-B-trapezoids in (a), the polygon is subdivided into
unimonotone polygons as shown in (b).


Polygon Triangulation in Linear Time 

The only deterministic linear-time algorithm 
for triangulating a simple polygon known so 
far is due to Chazelle [3]. Chazelle’s algorithm 
uses a divide-and-conquer approach to compute 
the visibility map of a simple polygon. Since 
the divide-and-conquer approach subdivides 
the polygon’s boundary into polygonal chains, 
there is no proper notion of the interior of the 
polygon through which the chords are supposed 
to pass. Instead, the polygon is embedded into
the spherical plane on which the chords can
“wrap around infinity,” and the visibility map
is computed by always shooting rays in both
directions.

Stated in terms of visibility maps, the algo- 
rithm’s task now can be reduced to merging 
two visibility maps. To avoid linear-time merging 
steps which in turn would lead to O(N log N) 
runtime, the algorithm proceeds as follows: in a
first, bottom-up phase, the algorithm repeatedly
merges the visibility maps of two subchains of
the same length that share a common vertex. To
ensure a sublinear running time, however, the
algorithm does not compute the full visibility
map, i.e., the map obtained by shooting rays from
each vertex. Instead, for a subchain consisting of
2^λ + 1 vertices, the algorithm maintains a
2^⌈βλ⌉-granular visibility map, where β is a suitable constant.

Polygon Triangulation, Fig. 1 (continued) (b) Unimonotone subpolygons. (c) Triangulation. The numbers in
(c) indicate the order in which the ear-cutting algorithm
constructs the triangles in the triangulation of the unimonotone
subpolygon P (shown in gray).


Definition 3 A visibility map for a polygonal
chain P is γ-granular for some γ > 0 if no part
of the boundary of any region consists of more
than γ consecutive edges of P and if no two
adjacent regions can be merged without violating
this property.


The consequence of this definition is that
for polygons in general position, i.e., polygons
for which no two vertices share the same
y-coordinate, a γ-granular visibility map consists
of O(N/γ + 1) regions, each of which is bounded
by a constant number of chords and polygonal
chains with a total complexity of O(γ). This
enables a compact representation of each submap
along with a uniform upper bound on the
coarseness of the approximation of the visibility
map.

For each subchain π considered throughout
the bottom-up phase, the algorithm computes two
“oracle” data structures, which are reused in the
final phase of the algorithm. The first oracle, the
so-called ray shooter, returns in O(f(γ)) time
for any point in the plane the first point of the
γ-granular visibility map of π that is seen when
shooting a horizontal ray in either direction. This
oracle, whose construction is based upon Lipton
and Tarjan’s planar separator theorem [11],
is used when merging the visibility map of π
with the visibility map of a subchain π’ that
shares a common vertex p with π. Starting with
p, the algorithm walks along π and π’ and
uses the respective ray shooter to update the
visibility information for as many vertices as
needed to guarantee the desired granularity. Due
to Chazelle’s polygon-cutting theorem [2], each
region in either submap is closed under visibility.
This implies that the “ray-shooter” oracle can be
defined for each region separately, and only one
oracle for a region in π needs to be queried for
any vertex in π’ (and vice versa) as long as the
algorithm keeps track of the region the vertex
currently under consideration lies in.

The second oracle, the so-called arc cutter,
subdivides π in O(g(γ)) time into g(γ) subarcs,
each of which is given along with an h(γ)-granular
visibility map. Using these two oracles,
merging a γ_1-granular visibility map for a subchain
consisting of N_1 vertices with a γ_2-granular
visibility map for a subchain consisting of N_2
vertices, γ_2 ≥ γ_1, can be done in


O((N_1/γ_1 + N_2/γ_2 + 1) f(γ_2) g(γ_2) (h(γ_2) + log(N_1 + N_2)))


time. Chazelle proves that one can maintain these
oracles such that f(x) ∈ O(x^0.74), g(x) ∈ O(log x),
and h(x) ∈ O(x^0.20), with γ_2 ∈ O(N_2^0.20); this
eventually implies a sublinear merging step and
hence a linear-time complexity of the bottom-up
phase.

The final, top-down phase incrementally re- 
fines all regions of the visibility maps produced 
in the bottom-up phase. Using the “arc-cutter” 
oracle, each polygonal chain on the boundary 
of a region is subdivided into an appropriate 
number of subchains. As a result of the bottom- 
up phase and by carefully aligning the subchains 
considered in that phase with the results of the 
arc cutter, visibility maps, ray shooters, and arc 
cutters are available for each of these chains. The 
algorithm then uses the ray shooters to construct 
new chords and the arc cutters to further refine 
the visibility maps until the recursion bottoms 
out, and visibility maps of constant size can be 
refined by a brute-force algorithm. An inductive 
proof yields a linear runtime for the refinement 
of each region in the visibility map that was 
constructed in the bottom-up phase; hence, the 
overall running time is linear. 

While Chazelle’s algorithm uses only reason- 
ably complicated data structures and subroutines, 
the analysis of both phases strongly suggests 
large constant factors hidden in the Big-Oh no- 
tation. In addition, the algorithm requires rather 
delicate implementation issues to be solved, in 
particular regarding the representation of the vis- 
ibility maps, and thus it is not surprising that 
Chazelle mentioned developing a simpler, ran- 
domized algorithm with expected linear runtime 
as a major open problem. 


Randomized Polygon Triangulation in 
Expected Linear Time 

Over a decade after the publication of Chazelle’s 
deterministic, linear-time algorithm, Amato, 
Goodrich, and Ramos [1] affirmatively answered 
Chazelle’s question. Their algorithm follows 
Chazelle’s two-phase approach and uses a 
bottom-up phase to preprocess helper data 
structures for so-called portal queries in the 
subchains’ visibility maps. The top-down phase 
also subdivides the polygonal chain into smaller 
chains and refines the visibility maps. In contrast 
to Chazelle’s algorithm, however, this refinement
step is done on a random sample of the subchains
only. As the sampling probability tends to one as 
the size of the subchain approaches O(1), both 
correctness and an expected linear runtime can 
be shown. 


Applications 


Being able to efficiently triangulate simple poly- 
gons has a variety of applications in compu- 
tational geometry, computer graphics, and geo- 
graphic information systems. For some of the 
problems considered, using a linear-time polygon 
triangulation algorithm is the key to obtaining 
an optimal algorithm. One such example among 
a variety of results is the optimal point-location 
scheme presented by Kirkpatrick [9], whose preprocessing
time is linear assuming the availability
of a linear-time triangulation algorithm. Several
other applications are covered in O’Rourke’s
textbook on art gallery problems [12].


Open Problems 


The main open problem is to devise an optimal 
deterministic algorithm that is reasonably effi- 
cient in practice. 


Experimental Results 


Due to its inherent complexity, Chazelle’s algo- 
rithm has eluded a rigorous experimental eval- 
uation so far. Preliminary results reported by 
Vahrenhold [15], however, seem to indicate that 
running even the first nontrivial stage of the 
bottom-up process takes significantly more time 
than running, e.g., Seidel’s randomized algorithm 
[13] in full. Similarly, Amato et al. [1] con- 
jecture that their randomized algorithm, despite 
its expected optimal runtime, is “not likely to 
be of practical value,” either. Hence, for practi- 
cal purposes, the simplicity of the deterministic 
algorithm by Hertel and Mehlhorn [8] and the 
randomized algorithm by Seidel [13] strongly 
advocates their use.


Problem Definition


This problem is concerned with the Nash equilibria
of a game based on the ad auction used
by Google and Yahoo. This research work [5]
is motivated by the huge revenue that the adword
auction generates every year. It defines two
types of Nash equilibrium in the position auction
game, applies economic analysis to the equilibria,
and provides some empirical evidence that the
Nash equilibria of the position auction describe
the basic properties of the prices observed in
Google’s adword auction reasonably accurately.
The problem being studied is closely related to
the assignment game studied in [1, 3, 4]. In addition,
[2] independently examined the problem and
developed related results.


The Model and Its Notations 

Consider the problem of assigning agents
a = 1, 2, ..., A to slots s = 1, 2, ..., S, where
agent a’s valuation for slot s is given by
u_as = v_a x_s. The slots are numbered such that
x_1 > x_2 > ... > x_S. It is assumed that x_s = 0
for all s > S and the number of agents is greater
than the number of slots. A higher position
receives more clicks, so x_s can be interpreted
as the click-through rate for slot s. The value
v_a > 0 can be interpreted as the expected profit
per click, so u_as = v_a x_s indicates the expected
profit to advertiser a from appearing in slot s.

The slots are sold via an auction. Each agent
bids an amount b_a, with the slot with the best
click-through rate being assigned to the agent
with the highest bid, the second-best slot to the
agent with the second highest bid, and so on.
Renumbering the agents if necessary, let v_s be
the value per click of the agent assigned to slot
s. The price agent s faces is the bid of the agent
immediately below him, so p_s = b_{s+1}. Hence the
net profit that agent a can expect to make if he
acquires slot s is (v_a - p_s) x_s = (v_a - b_{s+1}) x_s.
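
The allocation and pricing rule just described can be sketched as follows. The function names are hypothetical, slots and agents are indexed from 0, distinct bids are assumed, and, as in the model, there are assumed to be more agents than slots.

```python
def run_position_auction(bids, ctrs):
    """Assign slots by decreasing bid; each winner pays the next-highest bid per click.

    bids[a] is agent a's bid, ctrs[s] the click-through rate x_s of slot s
    (ctrs is assumed to be sorted in decreasing order).
    Returns a list of (agent, price_per_click, ctr), one entry per slot.
    """
    order = sorted(range(len(bids)), key=lambda a: -bids[a])
    result = []
    for s, x in enumerate(ctrs):
        # The price of slot s is the bid of the agent ranked immediately below.
        price = bids[order[s + 1]] if s + 1 < len(order) else 0
        result.append((order[s], price, x))
    return result

def net_profit(value, price, ctr):
    """Expected net profit (v - p) * x of an agent with value per click v."""
    return (value - price) * ctr
```

For example, with bids 3, 5, 1, 4 and two slots, the agent bidding 5 receives the top slot at price 4, and the agent bidding 4 receives the second slot at price 3.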


Definitions 


Definition 1 A Nash equilibrium set of prices
(NE) satisfies


(v_s - p_s) x_s ≥ (v_s - p_t) x_t, for t > s
(v_s - p_s) x_s ≥ (v_s - p_{t-1}) x_t, for t < s


where p_t = b_{t+1}.


Definition 2 A symmetric Nash equilibrium set
of prices (SNE) satisfies


(v_s - p_s) x_s ≥ (v_s - p_t) x_t for all t and s.

Equivalently,


v_s (x_s - x_t) ≥ p_s x_s - p_t x_t for all t and s.
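
The two definitions can be checked mechanically for a given outcome. The sketch below is a simplified illustration with hypothetical function names: indices start at 0, `v[s]` is the value of the agent in slot s, `b` lists the bids in decreasing order with at least one more entry than there are slots (so that the price of slot s is `b[s + 1]`), and only deviations of agents who currently hold a slot are examined.

```python
def is_ne(v, b, x):
    """Nash equilibrium check: moving up to slot t costs b[t], moving down costs b[t + 1]."""
    S = len(x)
    for s in range(S):
        stay = (v[s] - b[s + 1]) * x[s]
        for t in range(S):
            if t == s:
                continue
            price = b[t + 1] if t > s else b[t]
            if stay < (v[s] - price) * x[t]:
                return False
    return True

def is_sne(v, b, x):
    """Symmetric Nash equilibrium check: every slot t is evaluated at its current price b[t + 1]."""
    S = len(x)
    return all((v[s] - b[s + 1]) * x[s] >= (v[s] - b[t + 1]) * x[t]
               for s in range(S) for t in range(S))
```

Because moving up costs at least the current price of the higher slot in `is_ne`, every outcome accepted by `is_sne` is also accepted by `is_ne`, while the converse can fail.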


Key Results 


Facts of NE and SNE 

Fact 1 (Non-negative surplus) In an SNE,
v_s ≥ p_s.

Fact 2 (Monotone values) In an SNE, v_{s-1} ≥ v_s
for all s.

Fact 3 (Monotone prices) In an SNE, p_{s-1} x_{s-1}
> p_s x_s and p_{s-1} ≥ p_s for all s. If v_s > p_s,
then p_{s-1} > p_s.

Fact 4 (NE ⊃ SNE) If a set of prices is an SNE,
then it is an NE.

Fact 5 (One-step solution) If a set of bids satisfies
the symmetric Nash equilibria inequalities
for s + 1 and s - 1, then it satisfies these
inequalities for all s.

Fact 6 The maximum revenue NE yields the
same revenue as the upper recursive solution
to the SNE.


A Sufficient and Necessary Condition 
of the Existence of a Pure Strategy Nash 
Equilibrium in the Position Auction Game 


Theorem 1 In the position auction described
before, a pure strategy Nash equilibrium exists if
and only if all the intervals


[ (p_s x_s - p_{s+1} x_{s+1}) / (x_s - x_{s+1}),  (p_{s-1} x_{s-1} - p_s x_s) / (x_{s-1} - x_s) ]


for s = 2, 3, ..., S


are non-empty.


Applications 


The model studied in this paper is a simple and el- 
egant abstraction of the real adword auctions used 
by search engines such as Google and Yahoo. 
Different search engines have slightly different 
rules. For example, Yahoo ranks the advertisers 
according to their bids, while Google ranks the 
advertisers not only according to their bids but 
also according to the likelihood of their links 
being clicked. 

However, a similar analysis can be applied to
real-world situations, as the author has demonstrated
above for the Google adword auction case.


Problem Definition


The power grid of an integrated system is respon- 
sible for providing reliable supply and ground 
voltages to every circuit element in the system. 
Degradations in the supply voltage levels can 
result in parametric failures due to increased 
delays, whereby circuits no longer meet their 
specifications, as well as catastrophic failures due 
to incorrect gate switching. Further, power grids 
are susceptible to reliability faults due to catas- 
trophic failure modes such as electromigration. 
Therefore, accurate power grid analysis is a vital 
step in integrated circuit design. 

Power grids may be analyzed under DC 
waveforms that reflect the steady-state currents 
drawn by the circuit or under transient analysis 
that captures the response of the grid to specific 
time-varying current waveforms; inductive 
effects, particularly in the integrated circuit
package; and decoupling capacitors that are
deliberately placed in the circuit to temper the 
effect of large transients. For both DC and 
transient analysis, the problem can be abstracted 
as solving a set of linear algebraic equations of 
the following form: 


$G V = E$, (1)


where $G \in \mathbb{R}^{n \times n}$ models the conductances in
the system, $V \in \mathbb{R}^{n}$ is the vector of unknown
node voltages, and $E \in \mathbb{R}^{n}$ is the right-hand
side (RHS) vector, modeling the current loads. In
case of DC analysis, a single such system must 
be solved, while in case of transient analysis, one 
such system is solved at each time step. For com- 
putational efficiency, a constant time step is often 
used during transient analysis of power grids in 
order to ensure that the G matrix, whose entries 
depend on the time step, remains unchanged 
through the simulation. 

Given a power grid topology with |E| 
resistors, these equations can be formulated using 
modified nodal analysis [2] in O(|E|) time. 
Matrix $G$ is sparse and diagonally dominant
($\sum_{j \ne i} |g_{ij}| \le g_{ii}$, $\forall i$), and all off-diagonals of
$G$ are less than or equal to zero.
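As a small illustration of system (1), the sketch below builds $G$ and $E$ for a hypothetical resistor chain (one supply pad feeding a row of nodes, each drawing a load current), checks diagonal dominance, and solves the system by Gauss-Seidel iteration. The chain topology, the function names, and the choice of Gauss-Seidel are assumptions made here for illustration; production solvers use sparse storage and the methods discussed in this entry.

```python
def build_chain_grid(n, g_wire, g_pad, vdd, loads):
    """Nodal equations G V = E for: pad --g_pad-- node0 --g_wire-- node1 -- ..."""
    G = [[0.0] * n for _ in range(n)]
    E = [-load for load in loads]          # load currents leave the node
    G[0][0] += g_pad                       # connection of node 0 to the pad
    E[0] += g_pad * vdd                    # known pad voltage moves to the RHS
    for i in range(n - 1):                 # stamp each wire conductance
        G[i][i] += g_wire
        G[i + 1][i + 1] += g_wire
        G[i][i + 1] -= g_wire
        G[i + 1][i] -= g_wire
    return G, E


def is_diagonally_dominant(G):
    n = len(G)
    return all(
        sum(abs(G[i][j]) for j in range(n) if j != i) <= G[i][i]
        for i in range(n)
    )


def gauss_seidel(G, E, sweeps=10_000, tol=1e-12):
    """Iteratively solve G V = E; suitable here because G is symmetric positive definite."""
    n = len(E)
    V = [0.0] * n
    for _ in range(sweeps):
        delta = 0.0
        for i in range(n):
            off = sum(G[i][j] * V[j] for j in range(n) if j != i)
            new = (E[i] - off) / G[i][i]
            delta = max(delta, abs(new - V[i]))
            V[i] = new
        if delta < tol:
            break
    return V


G, E = build_chain_grid(n=4, g_wire=1.0, g_pad=1.0, vdd=1.0, loads=[0.01] * 4)
V = gauss_seidel(G, E)
print(is_diagonally_dominant(G), [round(v, 6) for v in V])
```

For this chain the voltages can be verified by hand: the pad carries the total current 0.04, so node 0 sits at 0.96, and the successive drops of 0.03, 0.02 and 0.01 give 0.93, 0.91 and 0.90.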

The task of power grid analysis is to determine 
all voltage levels in the system and verify that 
the maximum deviation from their ideal values 
is within a user-specified bound and to ensure 
that the current density in each wire is within a 
user-specified limit in order to assure resilience 
to electromigration failure. 


Key Results 


Mainstream methods for solving such systems 
include direct methods such as LU/Cholesky fac- 
torization and iterative methods. Due to the fa- 
vorable structural properties of the power grid, 
notably sparsity and diagonal dominance, it is 
possible to solve these systems efficiently. How- 
ever, the scale of the problem, where power grids 
may have hundreds of millions or billions of 
resistors, poses large memory and computation 
challenges even to the most efficient solvers. As
a result, there has been considerable work on
developing specialized solvers for power grids. 
Notable contributions in this direction are de- 
scribed below. 


Hierarchy-Based Solvers 

In real designs, the power grid is inherently hi- 
erarchical since it is created as a part of a hier- 
archical design process, where individual blocks 
with locally constructed power grids are first designed
individually and then assembled at the chip level. 
This structure is exploited in [9] to build a hierar- 
chical solution to the grid. 

Based on inherent hierarchy, the power grid 
has k local partitions, corresponding to blocks, 
and a global partition that connects the power 
grids within these blocks. The global grid is then 
defined to include the set of nodes that lie in 
the global partition and the port nodes, while the 
grid in a local partition constitutes a local grid. 
The local grid is connected to the global grid 
through a set of nodes called ports, and due to the 
hierarchical structure, the number of port nodes 
is a small fraction of all nodes. The technique 
consists of the following steps: 


Macromodeling Each of the $k$ local grids may
be modeled as a multi-port linear element
represented by a macromodel of the type $I =
A \cdot V + S$, where $I$ and $V$ are the vectors of
port currents and port voltages, respectively, and
$A$ and $S$ are, respectively, a constant matrix and
a constant vector. Here, the $A$ matrix is sparsified
with bounded and minimal loss of accuracy using
an integer knapsack scheme.

Solution of the global grid Once the macro- 
models for all the local grids are generated, 
the entire network is abstracted simply as the 
global grid, with the macromodel elements 
connected to it at the port nodes. This system 
is solved to determine the voltages at all ports. 

Solution of the local grids Given the port volt- 
ages, the local grids are then each separately 
solved to provide the solution to the entire 
system.


Multigrid-Based Solvers

Multigrid-based approaches are an effective way 
of solving large systems of equations and have 
been customized to solve power grids [3]. The 
solution proceeds by creating a coarsened form 
of the network with a reduced number of nodes, 
which can be solved efficiently, and then by 
propagating the result of this solution to the full 
network. The technique consists of four steps: 


Grid reduction, in which the large power grid is 
coarsened by selecting a subset of nodes that 
are to be maintained, while the other nodes are 
removed. The number of variables is therefore 
significantly reduced from $n$ to $m$.

Interpolation, in which an $n \times m$ interpolation
operator matrix $P$ is defined to map between the
coarsened grid and the original grid. This
interpolation operator relates the voltages on the
removed nodes to those on the coarsened grid,
thereby allowing the solution of the coarsened
grid to accurately reflect that of the original
grid.

Solution of the coarsened grid, in which a so- 
lution is found for the voltages in the coars- 
ened grid by solving the above linear equa- 
tions. 

Mapping the solution from the coarsened grid 
to the original grid by applying the interpola- 
tion operator concludes the process. 
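The four steps above can be sketched on a toy example. The sketch below is an illustration under stated assumptions rather than the method of [3]: it forms the coarse system by the Galerkin product $P^T G P$ (the source does not say how the coarse operator is built), uses linear interpolation on a hypothetical five-node chain, and performs a single coarse solve without the smoothing and recursion a full multigrid solver would add. All function names are introduced here.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def coarse_grid_solve(G, E, P):
    """Steps 2-4: build the coarse system, solve it, interpolate back."""
    Pt = transpose(P)
    G_coarse = matmul(matmul(Pt, G), P)    # m x m (Galerkin product, an assumption)
    E_coarse = matvec(Pt, E)
    V_coarse = solve_dense(G_coarse, E_coarse)
    return matvec(P, V_coarse)             # map back to the n original nodes


# Five-node chain fed by a pad at node 0 (unit conductances, Vdd = 1).
G = [[2, -1, 0, 0, 0],
     [-1, 2, -1, 0, 0],
     [0, -1, 2, -1, 0],
     [0, 0, -1, 2, -1],
     [0, 0, 0, -1, 1]]
E = [0.99, -0.01, -0.01, -0.01, -0.01]
# Step 1: keep nodes 0, 2, 4; removed nodes 1 and 3 average their neighbors.
P = [[1, 0, 0], [0.5, 0.5, 0], [0, 1, 0], [0, 0.5, 0.5], [0, 0, 1]]

V_approx = coarse_grid_solve(G, E, P)
residual = [e - gv for e, gv in zip(E, matvec(G, V_approx))]
print([round(v, 4) for v in V_approx])
```

With this construction the residual $E - G V$ of the interpolated solution is orthogonal to the columns of $P$, which is the sense in which the coarse solution is the best approximation available within the interpolated space.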


Random Walk-Based Solvers 

The diagonal dominance of the power grid en- 
ables a special property that creates an exact 
mapping between the solution of the power grid 
equations and the use of random walks on a 
network. This idea has been used in [8] and 
further sped up in [6]. Unlike other approaches 
that require all (or most) nodes in a system to be 
solved together, random walk approaches allow 
for a single node to be solved alone. This is 
particularly useful during incremental analysis 
and optimization [1]. 

The family of random walk methods has been 
extended to solve entire systems, in a marriage 
with iterative linear solvers based on conjugate 
gradient methods. The intuition is that since
random walks provide approximate solutions
rapidly, they can be used to build effective 
preconditioners for iterative solvers [7]. 
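The mapping behind random walk solvers can be illustrated as follows. Rewriting the nodal equation at node $x$ gives $V_x = \sum_i (g_i / \sum_j g_j) V_i - I_x / \sum_j g_j$, which is the expected value of a walk that pays $I_x / \sum_j g_j$ at each visited node, moves to a neighbor with probability proportional to the connecting conductance, and collects the pad voltage on arrival at a supply pad. The sketch below estimates a single node voltage this way on a hypothetical four-node chain; the function name, data layout, and parameters are assumptions for illustration and do not reproduce the implementations of [6, 8].

```python
import random


def random_walk_voltage(start, neighbors, loads, pad_voltage, walks, rng):
    """Monte Carlo estimate of the voltage at `start`.

    neighbors[x] lists (neighbor, conductance) pairs, loads[x] is the current
    drawn at x, and pad_voltage maps supply pads to their fixed voltages.
    """
    total = 0.0
    for _ in range(walks):
        node, gain = start, 0.0
        while node not in pad_voltage:
            nodes = [nbr for nbr, _ in neighbors[node]]
            conds = [g for _, g in neighbors[node]]
            gain -= loads[node] / sum(conds)              # cost paid at this node
            node = rng.choices(nodes, weights=conds)[0]   # conductance-weighted step
        total += gain + pad_voltage[node]                 # reward on reaching a pad
    return total / walks


# pad --1-- 0 --1-- 1 --1-- 2 --1-- 3, each node drawing 0.01 (illustrative values)
neighbors = {
    0: [("pad", 1.0), (1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
loads = {0: 0.01, 1: 0.01, 2: 0.01, 3: 0.01}
pad_voltage = {"pad": 1.0}

estimate = random_walk_voltage(0, neighbors, loads, pad_voltage,
                               walks=20_000, rng=random.Random(0))
print(round(estimate, 3))
```

The exact voltage at node 0 for this chain is 0.96, and the estimate converges to it as the number of walks grows. Only node 0 is solved; no other node voltage is computed, which is the property that makes the approach attractive for incremental analysis.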


Applications 


Power grid analysis is a vital ingredient in the 
design of every integrated circuit, and there are 
several commercial offerings of design automa- 
tion tools that analyze power grids. Aside from 
the problem of solving the linear system, the 
issue of determining the worst-case excitation is 
also a difficult problem. In spite of numerous 
efforts, automated tools for this purpose have 
been excessively pessimistic and therefore ineffective.
It is generally accepted that user-specified
patterns are the most effective way to provide 
input excitations, particularly in a design world 
where the power grid must be analyzed at mul- 
tiple corners and multiple modes (corresponding 
to different supply voltages that could be applied 
to the circuit). 


Experimental Results 


Intelligent solutions for solving linear systems of 
equations have found extensive use in the analysis 
of power grids. Solvers that are used include
direct solvers as well as iterative solvers such as
the preconditioned conjugate gradient method.
Preconditioners based on
methods such as support trees have been found 
to be useful, and random walk preconditioners 
have also been shown to outperform conventional 
methods. Because this task is computationally
intensive, there has been active work on developing
parallel and multithreaded power grid solvers. 
For example, the 2011 and 2012 editions of the 
Tau workshop have hosted contests on solving 
power grid problems [4, 5]. 


URLs to Code and Data Sets 


A set of power grid benchmarks has been made
available to the community at http://dropzone.tamu.edu/~pli/PGBench.


Problem Definition


Consider an ordered universe $U$ and a set $T \subset U$
with $|T| = n$. The goal is to preprocess $T$ such
that the following query can be answered efficiently:
given $x \in U$, report the predecessor of
$x$, i.e., $\max\{y \in T \mid y < x\}$. One can also consider
the dynamic problem, where elements are
inserted into and deleted from $T$. Let $t_q$ be the query
time, and $t_u$ the update time.

This is a fundamental search problem, with 
an impressive number of applications. Later, this 
entry discusses IP lookup (forwarding packets 
on the Internet), orthogonal range queries and 
persistent data structures as examples. 

The problem was considered in many com- 
putational models. In fact, most models below 
were initially defined to study the predecessor 
problem. 


Comparison model: The problem can be solved
through binary search in $\Theta(\lg n)$ comparisons.
There is a lot of work on adaptive bounds, 
which may be sublogarithmic. Such bounds 
may depend on the finger distance, the work- 
ing set, entropy etc. 

Binary search trees: Predecessor search is one 
of the fundamental motivations for binary 
search trees. In this restrictive model, one can 
hope for an instance optimal (competitive) 
algorithm. Attempts to achieve this are
described in a separate entry (see $O(\log \log n)$-competitive
Binary Search Trees (2004;
Demaine, Harmon, Iacono, Patrascu)).

Word RAM: Memory is organized as words of 
b bits, and can be accessed through indi- 
rection. Constant-time operations include the 
standard operations in a language such as C 
(addition, multiplication, shifts and bitwise 
operations). 

It is standard to assume the universe is $U =
\{0, \dots, 2^\ell - 1\}$, i.e., one deals with $\ell$-bit integers.
The floating point representation was designed
so that order is preserved when values are
interpreted as integers, so any algorithm will
also work for $\ell$-bit floating point numbers.
The standard transdichotomous assumption is
that $b = \Theta(\ell)$, so that an input integer is represented
in a word. This implies $b \ge \lg n$.

Cell-probe model: This is a nonuniform model 
stronger than the word RAM, in which the op- 
erations are arbitrary functions on the memory 
words (cells) which have already been probed. 
Thus, $t_q$ only counts the number of cell probes.
This is an ideal model for lower bounds, since 
it does not depend on the operations imple- 
mented by a particular computer. 

Communication games: Let Alice have the
query $x$, and Bob have the set $T$. They are
trying to find the predecessor of $x$ through
$t$ rounds of communication, where in each
round Alice sends $m_A$ bits, and Bob replies
with $m_B$ bits.
This can simulate the cell-probe model when
$m_B = b$ and $m_A$ is the logarithm of the memory
size. Then $t \le t_q$, and one can use communication
complexity to obtain cell-probe lower
bounds.

External memory: The unit of access is a page,
containing $B$ words of $\ell$ bits each. B-trees
solve the problem with query and update
time $O(\log_B n)$, and one can also achieve
this oblivious to the value of $B$ (see Cache-oblivious
B-tree (2005; Bender, Demaine,
Farach-Colton)). The cell-probe model with
$b = B \cdot \ell$ is stronger than this model.

AC$^0$ RAM: This is a variant of the word RAM
in which allowable operations are functions 
that have constant depth, unbounded fan-in 
circuits. This excludes multiplication from the 
standard set of operations. 

RAMBO: This is a variant of the RAM with
a nonstandard memory, where words of mem- 
ory can overlap in their bits. In the static case 
this is essentially equivalent to a normal RAM. 
However, in the dynamic case updates can be 
faster due to the word overlap [5]. 


The worst-case logarithmic bound for comparison
search is not particularly informative when
efficiency really matters. In practice, B-trees and
variants are standard when dealing with huge data 
sets. Solutions based on RAM tricks are essential 
when the data set is not too large, but a fast query 
time is crucial, such as in software solutions to IP 
lookup [7]. 


Key Results 


Building on a long line of research, Patrascu 
and Thorup [15, 16] finally obtained matching 
upper and lower bounds for the static problem in 
the word RAM, cell-probe, external memory and 
communication game models. 

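Let $S$ be the number of words of space available.
(In external memory, this is equivalent to
$S/B$ pages.) Define $a = \lg(S \cdot \ell / n)$. Also define
$\lg x = \lceil \log_2(x + 2) \rceil$, so that $\lg x \ge 1$ even if
$x \in [0, 1]$. Then the optimal search time is, up to
constant factors:


$$\min \begin{cases}
\log_b n = \Theta(\min\{\log_B n, \log_\ell n\}) \\[4pt]
\lg \dfrac{\ell - \lg n}{a} \\[8pt]
\dfrac{\lg(\ell/a)}{\lg\left(\dfrac{a}{\lg n} \cdot \lg\dfrac{\ell}{a}\right)} \\[12pt]
\dfrac{\lg(\ell/a)}{\lg\left(\lg\dfrac{\ell}{a} \Big/ \lg\dfrac{\lg n}{a}\right)}
\end{cases} \qquad (1)$$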


The bound is achieved by a deterministic 
query algorithm. For any space S, the data 
structure can be constructed in time O(S) by 
a randomized algorithm, starting with the set 
T given in sorted order. Updates are supported 
in expected time $t_q + O(S/n)$. Thus, besides
locating the element through one predecessor 
query, updates change a minimal fraction of the 
data structure. 

Lower bounds hold in the powerful cell-probe
model, and hold even for randomized algorithms.
When $S \ge n^{1+\varepsilon}$, the optimal trade-off for
communication games coincides with (1). Note that
the case $S = n^{1+o(1)}$ essentially disappears in
the reduction to communication complexity, because
Alice's messages only depend on $\lg S$.
Thus, there is no asymptotic difference between
$S = O(n)$ and, say, $S = O(n^2)$.


Upper Bounds 
The following algorithmic techniques give the 
optimal result: 


• B-trees give $O(\log_B n)$ query time with linear
space.

• Fusion trees, by Fredman and Willard [10],
achieve a query time of $O(\log_b n)$. The basis
of this is a fusion node, a structure which
can search among $b^\varepsilon$ values in constant time.
This is done by recognizing that only a few
bits of each value are essential, and packing
the relevant information about all values in
a single word.

• Van Emde Boas search [18] can solve the
problem in $O(\lg \ell)$ time by binary searching
for the length of the longest common prefix
between the query and a value in $T$. Beginning
the search with a table lookup based on the
first $\lg n$ bits, and ending when there is enough
space to store all answers, the query time is
reduced to $O(\lg((\ell - \lg n)/a))$.

• A technique by Beame and Fich [4] can perform
a multiway search for the longest common
prefix, by maintaining a careful balance
between $\ell$ and $n$. This is relevant when the
space is at least $n^{1+\varepsilon}$, and gives the third
branch of (1). Patrascu and Thorup [15] show
how related ideas can be implemented with
smaller space, yielding the last branch of (1).
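The van Emde Boas idea of binary searching for the longest common prefix can be sketched with hash tables, one per prefix length. The code below is an illustrative simplification with assumed names (`PrefixSearch`, `ell`); it stores every prefix of every key, so it uses $O(n \ell)$ table entries rather than the optimized space discussed above, and it returns the largest key less than or equal to the query (a convention assumed here). Keys are assumed to be integers in $[0, 2^\ell)$.

```python
class PrefixSearch:
    def __init__(self, keys, ell):
        self.ell = ell
        self.keys = sorted(set(keys))
        # levels[d] maps each length-d prefix to the (first, last) positions
        # in self.keys of the keys that start with that prefix.
        self.levels = [dict() for _ in range(ell + 1)]
        for idx, key in enumerate(self.keys):
            for d in range(ell + 1):
                prefix = key >> (ell - d)
                first, last = self.levels[d].get(prefix, (idx, idx))
                self.levels[d][prefix] = (min(first, idx), max(last, idx))

    def predecessor(self, y):
        if not self.keys:
            return None
        # Binary search for the longest prefix of y shared with some key.
        # Valid because the set of present prefix lengths is downward closed.
        lo, hi = 0, self.ell
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if (y >> (self.ell - mid)) in self.levels[mid]:
                lo = mid
            else:
                hi = mid - 1
        if lo == self.ell:
            return y                      # y itself is stored
        first, last = self.levels[lo][y >> (self.ell - lo)]
        next_bit = (y >> (self.ell - lo - 1)) & 1
        if next_bit == 1:
            return self.keys[last]        # every key under the prefix is smaller
        # Every key under the prefix is larger: step to the key before them.
        return self.keys[first - 1] if first > 0 else None


index = PrefixSearch([16, 17, 25, 30], ell=5)
print(index.predecessor(20), index.predecessor(24), index.predecessor(5))
# prints: 17 17 None
```

Each query performs $O(\lg \ell)$ hash lookups followed by a constant amount of work, matching the first bound stated in the van Emde Boas item.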


Observe that external memory only features 
in the optimal trade-off through the $O(\log_B n)$
term coming from B-trees. Thus, it is optimal 
to either use the standard, comparison-based B- 
trees, or use the best word RAM strategy which 
completely ignores external memory. 


Lower Bounds 

All lower bounds before [15] were shown in
the communication game model. Ajtai [1] was
the first to prove a superconstant lower bound.
His results, with a correction by Miltersen [12],
show that for polynomial space, there exists $n$ as
a function of $\ell$ making the query time $\Omega(\sqrt{\lg \ell})$,
and likewise there exists $\ell$ as a function of $n$ making
the query complexity $\Omega(\sqrt[3]{\lg n})$.

Miltersen et al. [13] revisited Ajtai’s proof, ex- 
tending it to randomized algorithms. More impor- 
tantly, they captured the essence of the proof in 
an independent round elimination lemma, which 
is an important tool for proving lower bounds in 
asymmetric communication. 

Beame and Fich [4] improved Ajtai's lower
bounds to $\Omega(\lg \ell / \lg\lg \ell)$ and $\Omega(\sqrt{\lg n / \lg\lg n})$
respectively. Sen and Venkatesh [17] later gave 
an improved round elimination lemma, which 
can reprove these lower bounds, but also for 
randomized algorithms. 

Finally, using the message compression 
lemma of [6] (an alternative to round elimi- 
nation), Patrascu and Thorup [15] showed an 
optimal trade-off for communication games. This 
is also an optimal lower bound in the other 
models when $S \ge n^{1+\varepsilon}$, but not for smaller
space. 

More importantly, [15] developed the first 
tools for proving lower bounds exceeding 
communication complexity, when $S = n^{1+o(1)}$.
This showed the first separation ever between
a data structure of polynomial size, and one of
near linear size. This is fundamentally impossible 
through a direct communication lower bound, 
since the reduction to communication games 
only depends on lg S. 

The full result of Patrascu and Thorup [15] is
the trade-off (1). Initially, this was shown only for 
deterministic query algorithms, but eventually it 
was extended to a randomized lower bound as 
well [16]. Among the surprising consequences 
of this result was that the classic van Emde 
Boas search is optimal for near-linear space (and 
thus for dynamic data structures), whereas with 
quadratic space it can be beaten by the technique 
of Beame and Fich. 

A key technical idea of [15] is to analyze many 
queries simultaneously. Then, one considers 
a communication game involving all queries, 
and proves a direct-sum version of the round 
elimination lemma. Arguably, the proof is even 
simpler than for the regular round elimination 
lemma. This is achieved by considering
a stronger model for the inductive analysis, in
which the algorithm is allowed to reject a large 
fraction of the queries before starting to make 
probes. 


Bucketing 

The rich recursive structure of the problem can 
not only be used for fast queries, but also to 
optimize the space and update time — of course, 
within the limits of (1). The idea is to place ranges 
of consecutive values in buckets, and include 
a single representative of each bucket in the 
predecessor structure. After performing a query 
on the predecessor structure (now with fewer ele- 
ments), one need only search within the relevant 
bucket. 
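A minimal sketch of bucketing follows. The class name, the fixed bucket size, and the use of a sorted list with binary search as the top-level "predecessor structure" are assumptions for illustration; any predecessor structure could take the place of the top level. The query returns the largest key less than or equal to $x$ (a convention assumed here).

```python
import bisect


class BucketedPredecessor:
    def __init__(self, keys, bucket_size):
        keys = sorted(keys)
        # Consecutive runs of keys form the buckets.
        self.buckets = [keys[i:i + bucket_size]
                        for i in range(0, len(keys), bucket_size)]
        # One representative (the smallest key) per bucket goes into the
        # top-level structure, which therefore holds only n / bucket_size keys.
        self.representatives = [bucket[0] for bucket in self.buckets]

    def predecessor(self, x):
        # Query the (smaller) top-level structure first ...
        i = bisect.bisect_right(self.representatives, x) - 1
        if i < 0:
            return None                   # x is smaller than every key
        # ... then search only inside the relevant bucket.
        bucket = self.buckets[i]
        return bucket[bisect.bisect_right(bucket, x) - 1]


index = BucketedPredecessor([3, 8, 15, 16, 23, 42, 57, 60, 71], bucket_size=3)
print(index.predecessor(40), index.predecessor(16), index.predecessor(2))
# prints: 23 16 None
```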

Because buckets of size $w^{O(1)}$ can be handled
in constant time by fusion trees, it follows that
factors of $w$ in space are irrelevant. A more extreme
application of the idea is given by exponential
trees [3]. Here buckets have size $O(n^{1-\gamma})$,
where $\gamma$ is a sufficiently small constant. Buckets
are handled recursively in the same way, leading
to $O(\lg \lg n)$ levels. If the initial query time is at
least $t_q \ge \lg^{\varepsilon} n$, the query times at each level decrease
geometrically, so overall time only grows
by a constant factor. However, any polynomial
space is reduced to linear, for an appropriate
choice of $\gamma$. Also, the exponential tree can be
updated in $O(t_q)$ time, even if the original data
structure was static.


Applications 


Perhaps the most important application of prede- 
cessor search is IP lookup. This is the problem 
solved by routers for each packet on the Internet, 
when deciding which subnetwork to forward the 
packet to. Thus, it is probably the most frequently run
algorithmic problem in the world. Formally, this 
is an interval stabbing query, which is equivalent 
to predecessor search in the static case [9]. As 
this is a problem where efficiency really mat- 
ters, it is important to note that the fastest de- 
ployed software solutions [7] use integer search 
strategies (not comparison-based), as theoretical 
results would predict.

In addition, predecessor search is used pervasively
in data structures, when reducing problems
to rank space. Given a set $T$, one often wants to
relabel it to the simpler $\{1, \dots, n\}$ (“rank space”),
while maintaining order relations. If one is pre- 
sented with new values dynamically, the need for 
a predecessor query arises. Here are a couple of 
illustrative examples: 


• In orthogonal range queries, one maintains
a set of points in $U^d$, and queries for points
in some rectangle $[a_1, b_1] \times \cdots \times [a_d, b_d]$.
Though bounds typically grow exponentially
with the dimension, the dependence on
the universe can be factored out. At query
time, one first runs $2d$ predecessor queries
transforming the universe to $\{1, \dots, n\}^d$.

• To make pointer data structures persistent [8],
an outgoing link is replaced by a vector of 
pointers, each valid for some period of time. 
Deciding which link to follow (given the time 
being queried) is a predecessor problem. 
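The rank-space reduction for orthogonal range queries can be sketched as follows. The helper name `to_rank_interval` and the sample points are assumptions for illustration, ranks are 0-based here rather than drawn from $\{1, \dots, n\}$, and binary search over a sorted list stands in for whichever predecessor structure is used. Each dimension costs two searches, giving the $2d$ queries mentioned above.

```python
import bisect


def to_rank_interval(sorted_coords, lo, hi):
    """Translate the query interval [lo, hi] into an inclusive interval of ranks."""
    rank_lo = bisect.bisect_left(sorted_coords, lo)        # first coordinate >= lo
    rank_hi = bisect.bisect_right(sorted_coords, hi) - 1   # last coordinate <= hi
    return rank_lo, rank_hi


points = [(10, 5), (20, 40), (30, 15), (70, 25)]
xs = sorted(p[0] for p in points)
ys = sorted(p[1] for p in points)

# The same points relabeled by rank in each dimension.
ranked = [(xs.index(x), ys.index(y)) for x, y in points]

# Query rectangle [15, 60] x [10, 30], translated with 2d = 4 searches.
(x_lo, x_hi) = to_rank_interval(xs, 15, 60)
(y_lo, y_hi) = to_rank_interval(ys, 10, 30)

hits = [p for p, (rx, ry) in zip(points, ranked)
        if x_lo <= rx <= x_hi and y_lo <= ry <= y_hi]
print(hits)  # [(30, 15)]
```

After the translation, the range query itself runs entirely on small integer ranks, so its cost no longer depends on the size of the original universe.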


Finally, it is interesting to note that the lower 
bounds for predecessor hold, by reductions, for 
all applications described above. To make these 
reductions possible, the lower bounds are in fact 
shown for the weaker colored predecessor prob- 
lem. In this problem, the values in T are colored 
red or blue, and the query only needs to find the 
color of the predecessor. 


Open Problems 


It is known [2] how to implement fusion trees
with AC$^0$ instructions, but not the other query
strategies. What is the best query trade-off
achievable on the AC$^0$ RAM? In particular, can
van Emde Boas search be implemented with
AC$^0$ instructions?

For the dynamic problem, can the update times 
be made deterministic? In particular, can van 
Emde Boas search be implemented with fast 
deterministic updates? This is a very appealing 
problem, with applications to deterministic dictionaries
[14]. Also, can fusion nodes be updated
deterministically in constant time? Atomic heaps
[11] achieve this when searching only among
$(\lg n)^{\varepsilon}$ elements, not $b^{\varepsilon}$.

Finally, does an update to the predecessor
structure require a query? In other words, can
$t_u = o(t_q)$ be obtained, while still maintaining
efficient query times?


Problem Definition


Given a set $S$ of $n$ keys from the set $[1..2^\ell]$, the
goal of predecessor search is to return, given a
key $y \in [1..2^\ell]$, the largest key $x \in S$ such that
$x \le y$. We have the following models:


Comparison model Balanced binary search 
trees [1,6] can solve the problem in optimal 
O(logn) time in the comparison model, in 
which the key can only be manipulated through 
comparisons with each other. 


External memory model In this model, it is 
assumed that the data is read and written into 
blocks of B elements (integers in our case) and 
the cost of a query or algorithm is the number of 
read or written blocks. In this model, B-trees [7] 
can solve the problem in $O(\log_B n)$ time and
O(n) space. 


RAM model This is the main subject of study. 
The model assumes that all standard arithmetic 
and logic operations (including multiplication) on 
integers of length w take constant time, where w 
is the computer word size. It is assumed that $w \ge
\ell \ge \log n$.


AC$^0$ RAM model This model is similar to the
RAM model except that it only contains instructions
that can be implemented with circuits of
polynomial size, constant depth and unbounded
fan-in. The only affected instruction is multiplication,
which cannot be implemented with such
circuits. While this model has been often consid- 
ered in the literature in the past, modern com- 
puters support multiplications very fast, and the 
bottleneck is usually the memory access. 


Cell probe model This model is used to prove 
lower bounds. It also has an associated word 
size w, but the cost of a query or an update 
is just the number of accessed memory words 
(computations have zero cost). 


Useful Concepts 
The two main techniques used for predecessor 
search are cardinality reduction and length reduction.
At every step of the query, one would wish
to either reduce the cardinality of the searched set 
or reduce the length of the searched keys. 


Balanced Binary Search Tree 

Cardinality reduction is achieved through the use
of a balanced binary search tree. This allows doing
a predecessor search in $O(\log n)$ time in the
comparison model. The B-tree is a generalization
of the balanced binary search tree, where every
node can have $B$ children instead of just two.
A static balanced binary search tree allows one
to divide the set of searched keys by 2 at every
level. A static B-tree allows one to divide the set
by $B$. In the dynamic case, the cardinality can
be reduced by a factor less than 2 (less than $B$
for a dynamic B-tree) at every level, but it is
guaranteed that the cardinality goes to one after
$O(\log n)$ levels ($O(\log_B n)$ for a B-tree).
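In the static case, the same halving of the candidate set at every step is obtained by binary search over a sorted array, which the following minimal sketch uses in place of an explicit tree (an assumption made for brevity; the function name is introduced here). It returns the largest key $x \le y$, as in the problem definition.

```python
import bisect


def predecessor(sorted_keys, y):
    """Largest key x in sorted_keys with x <= y, or None if there is none."""
    i = bisect.bisect_right(sorted_keys, y)   # O(log n) comparisons
    return sorted_keys[i - 1] if i > 0 else None


keys = [16, 17, 25, 30]
print(predecessor(keys, 20), predecessor(keys, 16), predecessor(keys, 15))
# prints: 17 16 None
```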


Trie 

A key concept is that of a trie. A predecessor
search for a key in a trie takes $O(\ell)$ time. A
trie built on a set $S$ of $n$ keys from $[1..2^\ell]$ is
a binary tree with $\ell$ levels numbered top-down.
All the leaves of the trie are at level $\ell$. Every
edge in the tree has a label that is either 0 or 1.
Let $x[i]$ denote bit number $i$ of the integer
$x$, where $i \in [1..\ell]$ and the bits are numbered
from the most to the least significant bit ($x[1]$ is
the most significant bit). Denote by $x[i..j]$ the
binary string that consists of the concatenation
of the bits $x[i], \dots, x[j]$. A node of the trie at
depth $d \in [0, \ell]$ is labeled by a binary string
of length $d$, formed by concatenating the edge
labels from the root to the node. There is
a node at depth $d$ labeled by binary string $p$ if
and only if there exists at least one element $x \in S$
such that $x[1..d] = p$. A node at depth $d < \ell$
labeled by string $p$ has as children the nodes
at depth $d + 1$ labeled by $p0$ and $p1$ (if they
both exist; otherwise it has only one child).
The leaves of the trie are labeled by the strings of
$S$, and the root is labeled by the empty string. A
trie occupies $O(n \ell \log n)$ bits.

In order to support
predecessor searches, every internal node of the
trie stores two elements of $S$. The two elements
stored by a node labeled by binary string $p$ are the
largest element prefixed by $p$ and the predecessor
of $p0^{\ell - |p|}$. A predecessor search on a
trie for a key $y$ is then done by traversing the
trie top-down, at level $i \in [0..\ell - 1]$ following the
edge labeled by character $y[i + 1]$. The node at
which the traversal stops is called the locus of $y$.
The predecessor of $y$ is then easily determined
from its locus. If the locus is a leaf labeled by
$x$, then necessarily $x = y$ and $y$ is returned
as the predecessor. Otherwise, the locus is some
internal node and the predecessor is one of the
two elements of $S$ stored at that internal node.
Suppose that the internal node is at level $i$ and
is labeled by string $p$. Then, if $y[i + 1] = 1$,
the predecessor is the largest element prefixed by
$p$; otherwise, it is the predecessor of $p0^{\ell - |p|}$.

In a compacted trie (Patricia trie), only leaves and
internal nodes with degree 2 are kept, resulting in
a tree of $2n - 1$ nodes, $n$ leaves, and $n - 1$ internal
nodes (see Fig. 1). The trie will then occupy only
$O(n(\ell + \log n))$ bits. A predecessor search can be
implemented on a compacted trie in a way similar
to the non-compacted trie. The main difference is
that the locus in a compacted trie is either a leaf
or a location in the middle of a compacted edge.


Key Results 


Van Emde Boas 

The Van Emde Boas tree [17-19] is a compacted
trie representation supporting predecessor search
by doing a binary search on the trie levels. Since
the number of levels is $\ell$, the binary search takes
time $O(\log \ell)$. The original structure used space
$O(2^\ell)$. Later, Willard [20] showed how to use
hashing to reduce the space usage to $O(n)$ while
keeping the search time bounded by $O(\log \ell)$.

Predecessor Search, String Algorithms and Data
Structures, Fig. 1 Three trie variants used for storing
the integers 16, 17, 25, and 30. On the left, an ordinary
trie; in the middle, a compacted trie; and on the right, a
compacted trie in which the compacted paths are omitted
and replaced by their lengths. Notice that the compacted
trie has exactly $2 \cdot 4 - 1 = 7$ nodes (panels: Trie;
Compacted trie; Compacted trie (omitted paths))


Searching a Compacted Trie (Fusion Trees)
The fusion tree reduces the predecessor search
problem to the search on small compacted tries,
each encoded in one memory word. The idea of
the fusion tree dates back to a paper by Ajtai et
al. [2], where it was shown how to implement predecessor
search using a compacted trie in which
the compacted paths are omitted and only their
lengths are stored (the trie on the right in Fig. 1).
This allows encoding a trie in only $O(n(\log \ell + \log n))$
bits, compared to the $O(n(\ell + \log n))$ bits
for the ordinary compacted trie. Then, a predecessor
search for a key $y$ is done by first doing
a top-down traversal of the compacted trie, by
always pretending that a comparison between the
compacted paths and the bits of the searched key
is successful (only bits that label the edges are
compared to bits of the searched key). At the
end, the search terminates at a leaf that points
to an integer $x \in S$ that is one of the elements
that share the longest prefix with the searched
key. Then, the locus (and hence the predecessor)
is determined by filling the traversed compacted
paths with bits from $x$ and comparing those bits
with the corresponding bits of $y$.

The main observation is that a predecessor
search can be supported in constant time in the
cell probe model whenever $n \log \ell \le w$, since it
involves reading a constant number of memory
locations corresponding to the trie encoded in a
constant number of words and to two elements of
$S$ (the traversal of the trie is charged zero cost).
Later, Fredman and Willard [11] invented the fusion
node, where they implemented a predecessor
search in constant time in the RAM model, when
$n \le w^{1/c}$ for $c = 6$. This allows searching the
predecessors among a set of $w^{1/c}$ keys for $c = 6$
in constant time. By implementing a B-tree with
block size $B = w^{1/c}$, one can achieve query time
$O(\log_B n) = O(\log n / \log w)$. Finally, Patrascu
and Thorup [15] have shown how to implement
the approach with constant time updates and
queries on $B = w^{1/c}$ keys for $c = 4$. This allows
searches and dynamic updates in deterministic
$O(\log n / \log w)$ time.


Beame and Fich 

Beame and Fich [8] use a more advanced search
that combines cardinality and length reductions.
As a building block, they use a data structure
that recursively reduces a search over $n$ keys of
length $\ell$ to a search over a group of $n'$ keys of
length $\ell'$ such that either $\ell' = \ell$ but $n' < q$
or $\ell' = \ell/h$ for some parameters $h$ and $q$.
This reduction technique was taken from an algorithm
used for integer sorting [5]. Their search
time is $O\left(\frac{\log \ell}{\log \log \ell}\right)$ and the space is quadratic.
Combining with the fusion tree, this gives query
time $O\left(\sqrt{\frac{\log n}{\log \log n}}\right)$ with quadratic space. This is
achieved by using the fusion tree when $\log \ell >
\sqrt{\log n \log \log n}$ and the new data structure when
$\log \ell \le \sqrt{\log n \log \log n}$. They also prove a
matching lower bound, by showing that one cannot
achieve space polynomial in $n$ with query
time $o\left(\frac{\log \ell}{\log \log \ell}\right)$ for all values of $n$ and $w$ or query
time $o\left(\sqrt{\frac{\log n}{\log \log n}}\right)$ for all values of $\ell$ and $w$. Later,
Sen and Venkatesh show that their lower bound
holds even if randomization is allowed [16].


Exponential Search Trees 

The exponential search tree [4] allows one to transform any predecessor data structure with polynomial space and preprocessing time, and query time $q(n, \ell, w)$, into a data structure with $O(n)$ space and preprocessing time, and query time at most $O(\log\log n \cdot q(n, \ell, w))$. Given some predecessor data structure $P$ that supports queries in time $q(n, \ell, w)$ and uses space and preprocessing time $S(n) = O(n^c)$, the exponential search tree is a recursive data structure built from a sorted set $x_1 < \cdots < x_n$ of $n$ keys as follows:


1. The root has degree $d = n^{1/(c+1)}$.

2. The $n$ keys are partitioned into $d$ blocks of size $b = n/d = n^{c/(c+1)}$ each, and the data structure $P$ is built on the set that consists of the first element of each of the blocks $2 \ldots d$ (the elements $x_{b+1}, \ldots, x_{(d-1)b+1}$).

3. The root has $d$ children, where every child is itself an exponential search tree built on the $b = n^{c/(c+1)}$ elements of a block.


The recursion stops whenever we have a tree of constant size, in which the predecessor search is trivially supported in constant time. The construction time $C(n)$ follows the equation


$$C(n) = S\left(n^{1/(c+1)}\right) + n^{1/(c+1)} \cdot C\left(n^{c/(c+1)}\right),$$


which gives $C(n) = O\left(n^{c/(c+1)}\right) + n^{1/(c+1)} \cdot C\left(n^{c/(c+1)}\right)$. By iteratively expanding the term $C\left(n^{c/(c+1)}\right)$, we get $C(n) = O(n)$.

A query is done by traversing the $\log\log n$ levels of the tree and doing predecessor searches at the structure $P$ of each traversed node. The query time follows the equation $Q(n, \ell, w) = q\left(n^{1/(c+1)}, \ell, w\right) + Q\left(n^{c/(c+1)}, \ell, w\right)$, which solves to $Q(n, \ell, w) = O(\log\log n \cdot q(n, \ell, w))$. In the same paper [4], Andersson and Thorup show how to insert or delete an element in worst-case constant time, once a pointer to its predecessor (or to itself in case of a deletion) has been determined.
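
The recursive construction can be illustrated with a minimal static sketch in Python. It is not the structure of [4]: the class name `ExpSearchTree`, the cutoff `LEAF_SIZE`, and the use of a sorted list with binary search as a stand-in for the structure $P$ are all assumptions made for illustration, and updates are not supported. The predecessor of $x$ is taken here to be the largest key that is at most $x$.

```python
import bisect


class ExpSearchTree:
    """Static sketch of an exponential search tree over sorted distinct keys."""

    LEAF_SIZE = 4  # assumed cutoff for "constant size" subtrees

    def __init__(self, keys, c=1):
        n = len(keys)
        if n <= self.LEAF_SIZE:
            self.leaf_keys = list(keys)
            self.children = None
            return
        d = max(2, round(n ** (1.0 / (c + 1))))  # root degree, about n^(1/(c+1))
        b = -(-n // d)                           # block size, ceil(n / d)
        blocks = [keys[i:i + b] for i in range(0, n, b)]
        # Stand-in for P: the first key of every block except the first one.
        self.splitters = [blk[0] for blk in blocks[1:]]
        self.children = [ExpSearchTree(blk, c) for blk in blocks]

    def predecessor(self, x):
        """Return the largest key <= x, or None if there is no such key."""
        if self.children is None:
            candidates = [k for k in self.leaf_keys if k <= x]
            return max(candidates) if candidates else None
        # Query the stand-in for P, then recurse into exactly one child.
        i = bisect.bisect_right(self.splitters, x)
        return self.children[i].predecessor(x)
```

With `c = 1` every block has about $\sqrt{n}$ keys, so a query descends through roughly $\log\log n$ levels and performs one search in the stand-in for $P$ per level, mirroring the recurrence for $Q(n, \ell, w)$.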


Deterministic Dynamic Bounds 
When applied to Beame and Fich's solution with time $O\left(\frac{\log \ell}{\log\log \ell}\right)$, the exponential search tree gives $O(n)$ space with time $O\left(\log\log n \cdot \frac{\log \ell}{\log\log \ell}\right)$, and when applied to the solution with $O\left(\sqrt{\frac{\log n}{\log\log n}}\right)$ time and quadratic space, it keeps the same time bound and achieves $O(n)$ space. By combining the two bounds with the bounds of the fusion tree [15], one gets the following time bounds, which are the best known ones for dynamic deterministic predecessor search:


$$O\left(1 + \min\left\{\frac{\log n}{\log w},\ \sqrt{\frac{\log n}{\log\log n}},\ \log\log n \cdot \frac{\log \ell}{\log\log \ell}\right\}\right) \qquad (1)$$


The space is $O(n)$ for all the three branches. The bounds refer to the maximum of update and search times.


Optimal Static Bounds 

Pătraşcu and Thorup [13] obtained optimal lower and upper bounds for the static case. They obtain optimal trade-offs between the time and the space usage. Define $a = \log \frac{S}{n} + \log w$, where $S$ is the space usage. Then, the optimal time bound is


$$\Theta\left(1 + \min\left\{\frac{\log n}{\log w},\ \log\frac{\ell - \log n}{\log w},\ \frac{\log\frac{\ell}{a}}{\log\left(\frac{a}{\log n} \cdot \log\frac{\ell}{a}\right)},\ \frac{\log\frac{\ell}{a}}{\log\left(\log\frac{\ell}{a} \Big/ \log\frac{\log n}{a}\right)}\right\}\right) \qquad (2)$$


They later show that their lower bound holds even if randomization is allowed [14]. The first branch of the upper bound corresponds to the fusion tree. The second branch is obtained by first reducing the length from $\ell$ to $\ell - \log n$ bits, by dividing the universe into $n$ buckets according to their most significant $\log n$ bits, and then by storing a separate Van Emde Boas structure for every bucket. The query time is then reduced from $O(\log(\ell - \log n))$ to $\log\frac{\ell - \log n}{\log w}$, by stopping the Van Emde Boas recursion when the key length gets to $\log w$ bits. Finally, the last two branches are implemented using a refinement of the technique by Beame and Fich.


Optimal Randomized Dynamic Bounds 
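When allowing randomization, optimal bounds can be achieved [15]. Again, the bounds are for the maximum of query and update times:


$$\Theta\left(1 + \min\left\{\frac{\log n}{\log w},\ \log\frac{\log(2^{\ell} - n)}{\log w},\ \frac{\log\frac{\ell}{\log w}}{\log\left(\log\frac{\ell}{\log w} \Big/ \log\frac{\log n}{\log w}\right)}\right\}\right) \qquad (3)$$


The space usage is linear ($O(n)$) or almost linear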
O(nw?D), 

The first branch of the upper bound corresponds to the dynamic version of the fusion tree [15] and the third branch to a dynamic version of the fourth branch of the optimal static upper bound. The second branch is similar to the second branch of the optimal static upper bound, with the difference that the term $\ell - \log n$ is replaced by $\log(2^{\ell} - n)$.

The first and third branches are trivially op- 
timal, since the bounds are the same as the static 
ones and any lower bound that applies to the latter 
also applies to the former. The authors show that 
the second branch is also optimal by proving a 
corresponding lower bound that is stronger than 
the static one. 


Applications 


Range queries are very important in databases. Answering range queries is an obvious and natural application of predecessor search. A query (in one dimension) asks, given a range $[a, b] \subseteq [1..2^{\ell}]$, to return every element $x$ in the set $S \cap [a, b]$. This can obviously be solved by doing two predecessor queries for $a$ and $b$ and then reporting all elements between the two predecessors (excluding the predecessor of $a$). In the comparison model and the external memory model, this is the best one can hope for, and the optimal time bounds are $O(\log n + |S \cap [a, b]|)$ and $O(\log_B n + |S \cap [a, b]|/B)$, respectively. Surprisingly, in the RAM model, there exists a linear-space static solution with $O(|S \cap [a, b]|)$ query time [3]. An important application of predecessor search is the IP forwarding problem, which must be solved by every internet router. The router contains a database of subnetworks specified by their IP address prefixes, and each received packet has to be forwarded to the subnetwork with the longest matching prefix. The IP forwarding problem is an instance of the longest common prefix problem, which in the static case is equivalent to the predecessor problem.
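
The reduction from range reporting to two predecessor queries can be sketched as follows. This is only an illustration on a sorted Python list: the function names are hypothetical, binary search stands in for a real predecessor structure, and the convention assumed here is that the query for $a$ returns the largest key strictly smaller than $a$ (so that $a$ itself is reported when it belongs to $S$), while the query for $b$ returns the largest key that is at most $b$.

```python
import bisect


def strict_predecessor_index(sorted_keys, a):
    """Index of the largest key < a, or -1 if there is none."""
    return bisect.bisect_left(sorted_keys, a) - 1


def predecessor_index(sorted_keys, b):
    """Index of the largest key <= b, or -1 if there is none."""
    return bisect.bisect_right(sorted_keys, b) - 1


def range_report(sorted_keys, a, b):
    """Return all keys x with a <= x <= b using two predecessor queries."""
    lo = strict_predecessor_index(sorted_keys, a)  # excluded from the output
    hi = predecessor_index(sorted_keys, b)         # included in the output
    return sorted_keys[lo + 1:hi + 1]
```

After the two searches, the reporting step costs time proportional to the output size $|S \cap [a, b]|$, which is the additive term in the bounds above.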

The lower and upper bounds on predecessor search can be used to prove bounds for other problems through reductions. For example, predecessor search can be reduced to two-dimensional range search, allowing one to prove a lower bound of $\Omega(\log\log n)$ time for the two-dimensional range-emptiness problem on sets of $n$ points on a grid of m rows by n columns. Optimal bounds can also be proved for rank queries on sequence representations through reduction to and from predecessor queries [9].


Open Problems 


The deterministic complexity of the dynamic predecessor search is still open. Another open problem is whether updates can be supported faster than searches when the search time is optimal or near optimal (of course, one can always support constant update time when the query time is the trivial $O(n)$). For the moment, this is not disallowed by any lower bound and has been achieved for the related dynamic ranking problem [10], in which a set $S \subseteq [1..2^{\ell}]$ is maintained under updates and a query asks, given an integer $y \in S$, to count the number of elements of $S$ smaller than $y$.
Problem Definition


The price of anarchy captures the lack of coordination in systems where users are selfish and may have conflicting interests. It was first proposed by Koutsoupias and Papadimitriou in [8], where the term coordination ratio was used instead, but later Papadimitriou in [12] coined the term price of anarchy that finally prevailed in the literature. Roughly, the price of anarchy is the system cost (e.g., makespan, average latency) of the worst-case Nash equilibrium over the optimal system cost that would be achieved if the players were forced to coordinate. Although it was originally defined in order to analyze a simple load-balancing game, it was soon applied to numerous variants and to more general games. The family of (weighted) congestion games [11, 13] is a nice abstract form to describe most of the alternative settings. (We focus our presentation on cost minimization problems in congestion games. We mention some utility maximization problems where price of anarchy analysis has been used in the Applications section.)

The price of anarchy may vary, depending on the


• Equilibrium solution concept (e.g., pure, mixed, correlated equilibria)
• Characteristics of the congestion game
  - Players set (e.g., atomic, non-atomic)
  - Strategy set (e.g., symmetric, asymmetric, parallel machines, network, general)
  - Players' cost functions (e.g., linear, polynomial)
• Social cost (e.g., maximum, sum, total latency)


Notation 

Let $G$ be a (finite) game that is determined by the triple $(N, (S_i)_{i \in N}, (c_i)_{i \in N})$. $N = \{1, \ldots, n\}$ is the set of the players that participate in the game. $S_i$ is a pure strategy set for player $i$. An element $A_i \in S_i$ is a pure strategy for player $i \in N$. A pure strategy profile $A = (A_1, \ldots, A_n)$ is a vector of pure strategies, one for each player. The set of all possible pure strategy profiles is denoted by $S = S_1 \times \cdots \times S_n$. The cost of a player $i \in N$, for a pure strategy profile, is determined by a cost function $c_i : S \to \mathbb{R}$.

A pure strategy profile $A$ is a pure Nash equilibrium if none of the players $i \in N$ can benefit by unilaterally deviating to another pure strategy $s_i \in S_i$:


$$c_i(A) \le c_i(A_{-i}, s_i) \qquad \forall i \in N,\ \forall s_i \in S_i,$$

where $(A_{-i}, s_i)$ is the simple strategy profile that results when just the player $i$ deviates from strategy $A_i \in S_i$ to strategy $s_i \in S_i$.

Similarly, a mixed strategy $p_i$ for a player $i \in N$ is a probability distribution over her pure strategy set $S_i$. A mixed strategy profile $p$ is the tuple $p = (p_1, \ldots, p_n)$, where player $i$ chooses mixed strategy $p_i$. The expected cost of a player $i \in N$ with respect to $p$ is


$$c_i(p) = \sum_{A \in S} p(A)\, c_i(A),$$


where $p(A) = \prod_{i \in N} p_i(A_i)$ is the probability that the pure strategy profile $A$ occurs, with respect to $(p_i)_{i \in N}$. A mixed strategy profile $p$ is a Nash equilibrium if and only if

$$c_i(p) \le c_i(p_{-i}, s_i) \qquad \forall i \in N,\ \forall s_i \in S_i.$$

The social cost of a pure strategy profile $A$, denoted by $\mathrm{SC}(A)$, is the maximum cost of a player, $\mathrm{MAX}(A) = \max_{i \in N} c_i(A)$, or the average cost of a player. For simplicity, the sum of the players' costs is considered (i.e., $n$ times the average cost), $\mathrm{SUM}(A) = \sum_{i \in N} c_i(A)$. The same definitions extend naturally to the case of mixed strategies, but with expected costs in this case.

The (mixed) price of anarchy [8] for a game is the worst-case ratio, among all the (mixed) Nash equilibria, of the social cost over the optimal cost, $\mathrm{OPT} = \min_{P \in S} \mathrm{SC}(P)$:


$$\mathrm{PA} = \max_{p \text{ is a NE}} \frac{\mathrm{SC}(p)}{\mathrm{OPT}}.$$


The price of anarchy for a class of games is the maximum (supremum) price of anarchy among all the games of this class.


Congestion Games. Here, a general class of games is described that captures most of the games for which price of anarchy is studied in the literature. A congestion game [11, 13] is defined by the tuple $(N, E, (S_i)_{i \in N}, (f_e)_{e \in E})$, where $N = \{1, \ldots, n\}$ is a set of players, $E$ is a set of facilities, $S_i \subseteq 2^E$ is the pure strategy set for player $i$, a pure strategy $A_i \in S_i$ is a subset of the facility set, and $f_e$ is a cost (or latency) function with respect to the facility $e \in E$. (Unless otherwise stated, only linear cost functions are considered throughout this article. See [14] and references therein for results about more general cost functions, and for additional results see entries 00260, 00251, 00053.)

A pure strategy profile $A = (A_1, \ldots, A_n)$ is a vector of pure strategies, one for each player. The cost $c_i(A)$ of player $i$ for the pure strategy profile $A$ is given by


$$c_i(A) = \sum_{e \in A_i} f_e(n_e(A)),$$


where $n_e(A)$ is the number of the players that use facility $e$ in $A$.
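
To make the definitions concrete, the following Python sketch computes the pure price of anarchy of a tiny congestion game by brute force, using the sum social cost. The function names and the two-facility example are hypothetical and chosen only for illustration; cost functions are assumed to return integers so that the ratio can be kept exact.

```python
from fractions import Fraction
from itertools import product


def player_cost(i, profile, cost_fn):
    """c_i(A): sum of f_e(n_e(A)) over the facilities e in A_i."""
    load = {}
    for strategy in profile:
        for e in strategy:
            load[e] = load.get(e, 0) + 1
    return sum(cost_fn[e](load[e]) for e in profile[i])


def social_cost(profile, cost_fn):
    """SUM(A): total cost of all players."""
    return sum(player_cost(i, profile, cost_fn) for i in range(len(profile)))


def is_pure_nash(profile, strategy_sets, cost_fn):
    """True if no player can strictly lower her cost by deviating alone."""
    for i, options in enumerate(strategy_sets):
        current = player_cost(i, profile, cost_fn)
        for s in options:
            deviation = profile[:i] + (s,) + profile[i + 1:]
            if player_cost(i, deviation, cost_fn) < current:
                return False
    return True


def pure_price_of_anarchy(strategy_sets, cost_fn):
    """Worst pure Nash equilibrium cost divided by the optimal cost."""
    profiles = list(product(*strategy_sets))
    opt = min(social_cost(p, cost_fn) for p in profiles)
    worst = max(social_cost(p, cost_fn) for p in profiles
                if is_pure_nash(p, strategy_sets, cost_fn))
    return Fraction(worst, opt)


# Hypothetical example: two players, two facilities with f_a(x) = x, f_b(x) = 2.
options = [frozenset({"a"}), frozenset({"b"})]
example_sets = [options, options]
example_costs = {"a": lambda x: x, "b": lambda x: 2}
example_poa = pure_price_of_anarchy(example_sets, example_costs)  # Fraction(4, 3)
```

In this example the profile in which both players use facility a is a pure Nash equilibrium of total cost 4, while splitting the players over the two facilities costs 3, so the ratio is 4/3. The enumeration is exponential in the number of players and is meant only for very small instances.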

A congestion game is called symmetric or single commodity if all the players have the same strategy set: $S_i = C$. The term asymmetric or multi-commodity is used to refer to all the games including the symmetric ones. A special class is the class of network congestion games. In these games, the facilities are edges of a (multi)graph $G(V, E)$. The pure strategy set for a player $i \in N$ is the set of simple paths from a source $s_i \in V$ to a destination $t_i \in V$. In network symmetric congestion games, all the players have the same source and destination.

A natural generalization of congestion games 
are the weighted congestion games, where every 
player controls an amount of traffic w;. The cost 
of each facility e € EF depends on the total load of 
the facility. In this case, a well-studied social cost 
function is the weighted sum of players costs, or 
total latency. 

In a congestion game with splittable weights 
(divisible demands), every player i € N, instead 
of fixing a single pure strategy, is allowed to 
distribute her demand among her pure strategy 
set. 

Finally, in a non-atomic congestion game, there are $k$ different player types $1 \ldots k$. Players are infinitesimal and for each player type $i$ the continuum of the players is denoted by the interval $[0, n_i]$. In general, each player type contributes in a different way to the congestion on the facility $e \in E$, and this contribution is determined by a positive rate of consumption $r_{s,e}$ for a strategy $s \in S_i$ and a facility $e \in s$. Each player chooses a strategy that results in a strategy distribution $x = (x_s)_{s \in S}$, with $\sum_{s \in S_i} x_s = n_i$.


Key Results 


Maximum Social Cost 

First, we review results on the price of anarchy w.r.t. maximum social cost, which was historically the first social cost considered in [8]. Formally, for a pure strategy profile $A$, the social cost is


$$\mathrm{SC}(A) = \mathrm{MAX}(A) = \max_{i \in N} c_i(A).$$


The definition naturally extends to mixed strategies.


Theorem 1 ([7-10]) The price of anarchy for $m$ identical machines is $\Theta\left(\frac{\log m}{\log\log m}\right)$.


Theorem 2 ([7]) The price of anarchy for $m$ uniformly related machines with speeds $s_1 \ge s_2 \ge \cdots \ge s_m$ is


$$\Theta\left(\min\left\{\frac{\log m}{\log\log\log m},\ \frac{\log m}{\log\left(\frac{\log m}{\log(s_1/s_m)}\right)}\right\}\right).$$


Theorem 3 ([4]) The price of anarchy for pure equilibria is $O(\sqrt{n})$ for asymmetric but at most 5/2 for symmetric congestion games.


Average Social Cost: Total Latency 

Here, we consider as social cost the (weighted) sum (total latency) of the players' costs for (weighted) congestion games, i.e.,


$$\mathrm{SC}(A) = \mathrm{SUM}(A) = \sum_{i \in N} c_i(A),$$
$$\mathrm{SC}(A) = C(A) = \sum_{i \in N} w_i c_i(A).$$


The definition naturally extends to mixed strategies.
Theorem 4 ([2-4]) The price of anarchy is 5/2 for asymmetric and $\frac{5n-2}{2n+1}$ for symmetric congestion games.


Theorem 5 ([2, 3]) The price of anarchy for weighted congestion games is $1 + \varphi \approx 2.618$.


Theorem 6 ([6]) The price of anarchy is at most 
3/2 for congestion games with splittable weights. 


Theorem 7 ([15, 16]) The price of anarchy for 
non-atomic congestion games is 4/3. 


Key Proof Technique: Smoothness. Most of the above results on atomic congestion games have been generalized for polynomial latencies [1-3] and hold for various equilibrium concepts. Roughgarden's smoothness framework [14] distills the main ideas in these proofs and provides a general, canonical proof recipe to obtain price of anarchy bounds. He also shows how smoothness provides tight bounds for congestion games with general cost functions.


Applications 


The efficiency of large-scale networks, in which 
selfish users interact, is highly affected due to the 
users’ selfish behavior. The price of anarchy is a 
quantitative measure of the lack of coordination 
in such systems. It is a useful theoretical tool 
for the analysis and design of telecommunication 
and traffic networks, where selfish users compete 
on system’s resources motivated by their atomic 
interests and are indifferent to the social welfare. 

The price of anarchy has also been studied in utility maximization problems; see, for example, Valid Utility Games [17]. Finally, a line of work shows that the price of anarchy can be used to evaluate the performance of mechanisms; see, for example, [5] for an analysis of simultaneous Second-Price Auctions.


Cross-References 


Best Response Algorithms for Selfish Routing 
Computing Pure Equilibria in the Game of 
Parallel Links 
Problem Definition


This entry considers a selfish routing model formally introduced by Koutsoupias and Papadimitriou [10], in which the goal is to route the traffic on parallel links with linear latency functions. One can describe this model as a scheduling problem with $m$ independent machines with speeds $s_1, \ldots, s_m$ and $n$ independent tasks with weights $w_1, \ldots, w_n$. The goal is to allocate the tasks to the machines to minimize the maximum load of the links in the system.

It is assumed that all tasks are assigned by noncooperative agents. The set of pure strategies for task $i$ is the set $\{1, \ldots, m\}$, and a mixed strategy is a distribution on this set.

Given a combination $(j_1, \ldots, j_n) \in \{1, \ldots, m\}^n$ of pure strategies, one for each task, the cost for task $i$ is $\sum_{k : j_k = j_i} \frac{w_k}{s_{j_i}}$, which is the time needed for machine $j_i$ chosen by task $i$ to complete all tasks allocated to that machine. Similarly, for a combination of pure strategies $(j_1, \ldots, j_n) \in \{1, \ldots, m\}^n$, the load of machine $j$ is defined as $\sum_{i : j_i = j} \frac{w_i}{s_j}$.


Given $n$ tasks of length $w_1, \ldots, w_n$ and $m$ machines with the speeds $s_1, \ldots, s_m$, let opt denote the social optimum, that is, the minimum cost over all combinations of pure strategies:


$$\mathrm{opt} = \min_{(j_1, \ldots, j_n) \in \{1, \ldots, m\}^n}\ \max_{1 \le j \le m}\ \sum_{i : j_i = j} \frac{w_i}{s_j}.$$


For example, if all machines have the same unit speed ($s_j = 1$ for every $j$, $1 \le j \le m$) and all tasks have the same unit weight ($w_i = 1$ for every $i$, $1 \le i \le n$), then the social optimum is $\lceil n/m \rceil$.


It is also easy to see that in any system 


It is known that computing the social optimum is NP-hard even for identical speeds (see [10]).

For mixed strategies, let $p_i^j$ denote the probability that an agent $i$ sends the entire traffic $w_i$ to a machine $j$. Let $\ell_j$ denote the expected load on a machine $j$, that is,


$$\ell_j = \frac{1}{s_j} \sum_{i=1}^{n} w_i\, p_i^j.$$


For a task $i$, the expected cost of task $i$ on machine $j$ is equal to


$$c_i^j = \frac{w_i}{s_j} + \sum_{t \ne i} \frac{w_t\, p_t^j}{s_j} = \ell_j + (1 - p_i^j) \frac{w_i}{s_j}.$$

The expected cost $c_i^j$ corresponds to the expected finish time of task $i$ on machine $j$ under the processor sharing scheduling policy. This is an appropriate cost model with respect to the underlying traffic routing application.


Definition 1 (Nash equilibrium) The probabilities $(p_i^j)_{1 \le i \le n,\, 1 \le j \le m}$ define a Nash equilibrium if and only if any task $i$ will assign nonzero probabilities only to machines that minimize $c_i^j$, that is, $p_i^j > 0$ implies $c_i^j \le c_i^q$ for every $q$, $1 \le q \le m$.


As an example, in the system considered above in which all machines have the same unit speed and all weights are the same, the uniform probabilities $p_i^j = \frac{1}{m}$ for all $1 \le j \le m$ and $1 \le i \le n$ define a system in a Nash equilibrium.

The existence of a Nash equilibrium over mixed strategies for noncooperative games was shown by Nash [13]. In fact, the routing game considered here admits an equilibrium even if all players are restricted to pure strategies, as shown by Fotakis et al. [7].

Fix an arbitrary Nash equilibrium, that is, fix the probabilities $(p_i^j)_{1 \le i \le n,\, 1 \le j \le m}$ that define a Nash equilibrium. Consider the randomized allocation strategies in which each task $i$ is allocated to a single machine chosen independently at random according to the probabilities $p_i^j$, that is, task $i$ is allocated to machine $j$ with probability $p_i^j$. Let $C_j$, $1 \le j \le m$, be the random variable indicating the load of machine $j$ in our random experiment. Observe that $C_j$ is the weighted sum of independent 0-1 random variables $J_i^j$, $\Pr[J_i^j = 1] = p_i^j$, such that


$$C_j = \frac{1}{s_j} \sum_{i=1}^{n} w_i \cdot J_i^j.$$


Let $c$ denote the maximum expected load over all machines, that is,


$$c = \max_{1 \le j \le m} \ell_j.$$


Notice that $E[C_j] = \ell_j$, and therefore, $c = \max_{1 \le j \le m} E[C_j]$.

Finally, let the social cost $C$ be defined as the expected maximum load (instead of maximum expected load), that is,


$$C = E\left[\max_{1 \le j \le m} C_j\right].$$


Observe that $c \le C$ and possibly $c \ll C$. The goal is to estimate the price of anarchy (also called the worst-case coordination ratio), which is the worst-case ratio


$$R = \max \frac{C}{\mathrm{opt}},$$


where the maximum is over all Nash equilibria.
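
The quantities $\ell_j$, $c$, $C$, and opt can be checked on small instances with the following Python sketch. The function names are hypothetical, all computations enumerate the $m^n$ assignments and are therefore only suitable for tiny examples, and probabilities are assumed to be given as exact fractions in a matrix `probs[i][j]` standing for $p_i^j$.

```python
from fractions import Fraction
from itertools import product


def expected_loads(weights, speeds, probs):
    """l_j = (1 / s_j) * sum_i w_i * p_i^j for every machine j."""
    return [sum(w * probs[i][j] for i, w in enumerate(weights)) / speeds[j]
            for j in range(len(speeds))]


def is_nash(weights, speeds, probs):
    """Check Definition 1: positive probability only on cost-minimizing machines."""
    loads = expected_loads(weights, speeds, probs)
    m = len(speeds)
    for i, w in enumerate(weights):
        cost = [loads[j] + (1 - probs[i][j]) * Fraction(w) / speeds[j]
                for j in range(m)]
        best = min(cost)
        if any(probs[i][j] > 0 and cost[j] > best for j in range(m)):
            return False
    return True


def max_load(weights, speeds, assignment):
    """Maximum machine load under a pure assignment (task i -> assignment[i])."""
    load = [Fraction(0)] * len(speeds)
    for i, j in enumerate(assignment):
        load[j] += Fraction(weights[i]) / speeds[j]
    return max(load)


def expected_max_load(weights, speeds, probs):
    """Social cost C = E[max_j C_j] under independent random choices."""
    total = Fraction(0)
    for assignment in product(range(len(speeds)), repeat=len(weights)):
        pr = Fraction(1)
        for i, j in enumerate(assignment):
            pr *= probs[i][j]
        total += pr * max_load(weights, speeds, assignment)
    return total


def social_optimum(weights, speeds):
    """opt: minimum over all pure assignments of the maximum load."""
    return min(max_load(weights, speeds, a)
               for a in product(range(len(speeds)), repeat=len(weights)))


# Two unit-weight tasks on two unit-speed machines, uniform probabilities.
half = Fraction(1, 2)
w, s, p = [1, 1], [1, 1], [[half, half], [half, half]]
c_small = max(expected_loads(w, s, p))   # maximum expected load c
C_big = expected_max_load(w, s, p)       # expected maximum load C
```

For this instance the uniform probabilities form a Nash equilibrium with $c = 1$, $C = 3/2$, and opt $= 1$, which illustrates that $c$ and $C$ can differ.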


Key Results 


Early Work 

The study of the price of anarchy has been initiated by Koutsoupias and Papadimitriou [10], who also showed some very basic results for this model. For example, they proved that for two identical machines, the price of anarchy is exactly $\frac{3}{2}$, and for two machines (with possibly different speeds), the price of anarchy is at least $\varphi = \frac{1 + \sqrt{5}}{2}$. Koutsoupias and Papadimitriou also showed that for $m$ identical machines, the price of anarchy is $\Omega\left(\frac{\log m}{\log\log m}\right)$ and it is at most $O(\sqrt{m \ln m})$, and for $m$ arbitrary machines, the price of anarchy is $O\left(\sqrt{\frac{s_1}{s_m} \sum_{j=1}^{m} \frac{s_j}{s_m}} \cdot \sqrt{\log m}\right)$, where $s_1 \ge s_2 \ge \cdots \ge s_m$ [10].

Koutsoupias and Papadimitriou [10] also conjectured that the price of anarchy for $m$ identical machines is $\Theta\left(\frac{\log m}{\log\log m}\right)$. In the quest to resolve this conjecture, Mavronicolas and Spirakis [12] considered the problem in the so-called fully mixed model, which is a special class of Nash equilibria in which all $p_i^j$ are strictly positive. In this model, Mavronicolas and Spirakis [12] showed that for $m$ identical machines in the fully mixed Nash equilibrium, the price of anarchy is $\Theta\left(\frac{\log m}{\log\log m}\right)$. Similarly, they also proved that for $m$ (not necessarily identical) machines and $n$ identical weights in the fully mixed Nash equilibrium, if $m \le n$, then the price of anarchy is $O\left(\frac{\log n}{\log\log n}\right)$.
The motivation behind studying fully mixed equilibria is the so-called fully mixed Nash equilibrium conjecture stating that these equilibria maximize the price of anarchy because they maximize the randomization. The conjecture seems to be quite appealing as a fully mixed equilibrium can be computed in polynomial time, which led to numerous studies of this kind of equilibria with the hope to obtain efficient algorithms for computing or approximating the price of anarchy with respect to mixed equilibria. However, Fischer and Vöcking [6] disproved the fully mixed Nash equilibrium conjecture and showed that there is a mixed Nash equilibrium whose expected cost is larger than the expected cost of the fully mixed Nash equilibrium by a factor of $\Omega\left(\frac{\log m}{\log\log m}\right)$. Furthermore, they presented polynomial time algorithms for approximating the price of anarchy for mixed equilibria on identical machines up to a constant factor.


Tight Bounds for the Price of Anarchy 

Czumaj and Vöcking [4] entirely resolved the conjecture of Koutsoupias and Papadimitriou [10] and gave an exact description of the price of anarchy as a function of the number of machines and the ratio of the speed of the fastest machine over the speed of the slowest machine. (To simplify the notation, for any real $x > 0$, let $\log x$ denote $\log x = \max\{\log_2 x, 1\}$. Also, following standard convention, $\Gamma(N)$ is used to denote the gamma (factorial) function, which for any natural $N$ is defined by $\Gamma(N + 1) = N!$ and for an arbitrary real $x > 0$ is $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$. For the inverse of the gamma function, $\Gamma^{(-1)}(N)$, it is known that $\Gamma^{(-1)}(N) = x$ such that $\lfloor x \rfloor! \le N - 1 \le \lceil x \rceil!$. It is well known that $\Gamma^{(-1)}(N) = \frac{\log N}{\log\log N}(1 + o(1))$.)


Theorem 1 ([4] Upper bound) The price of anarchy for $m$ machines is bounded from above by


$$O\left(\min\left\{\frac{\log m}{\log\log\log m},\ \frac{\log m}{\log\left(\frac{\log m}{\log(s_1/s_m)}\right)}\right\}\right),$$


where it is assumed that the speeds satisfy $s_1 \ge \cdots \ge s_m$.

In particular, the price of anarchy for $m$ machines is $O\left(\frac{\log m}{\log\log\log m}\right)$.
The theorem follows directly from the following two results [4]: that the maximum expected load $c$ satisfies $c = \mathrm{opt} \cdot O\left(\min\left\{\frac{\log m}{\log\log m},\ \log\left(\frac{s_1}{s_m}\right)\right\}\right)$ and that the social cost $C$ satisfies $C = \mathrm{opt} \cdot O\left(\frac{\log m}{\log\left(\frac{\mathrm{opt} \cdot \log m}{c}\right)} + 1\right)$.

If one applies these results to systems in which all agents follow only pure strategies, then since $\ell_j = C_j$ for every $j$, it holds that $C = c$. This leads to the following result.

Corollary 1 ([4]) For pure strategies the price of anarchy for $m$ machines is upper bounded by


$$O\left(\min\left\{\frac{\log m}{\log\log m},\ \log\left(\frac{s_1}{s_m}\right)\right\}\right),$$


where it is assumed that the speeds satisfy $s_1 \ge \cdots \ge s_m$.


Theorem 3 below proves that this corollary 
gives an asymptotically tight bound for the price 
of anarchy for pure strategies. 

By Theorem 1, in the special case when all machines are identical, the price of anarchy is $O\left(\frac{\log m}{\log\log m}\right)$; this result has also been obtained independently by Koutsoupias et al. [11]. However, in this special case one can get a stronger bound that is tight up to an additive constant.


Theorem 2 ([4]) For $m$ identical machines the price of anarchy is at most


$$\Gamma^{(-1)}(m) + O(1) = \frac{\log m}{\log\log m} \cdot (1 + o(1)).$$


One can obtain a lower bound for the price of anarchy for $m$ identical machines by considering the system in which $p_i^j = \frac{1}{m}$ for every $i, j$. Gonnet [9] proved that then the price of anarchy is $\Gamma^{(-1)}(m) - \frac{3}{2} + o(1)$, which implies that Theorem 2 is tight up to an additive constant.

The next theorem shows that the upper bound in Theorem 1 is asymptotically tight.


Theorem 3 ([4] Lower bound) The price of anarchy for $m$ machines is lower bounded by

$$\Omega\left(\min\left\{\frac{\log m}{\log\log\log m},\ \frac{\log m}{\log\left(\frac{\log m}{\log(s_1/s_m)}\right)}\right\}\right).$$


In particular, the price of anarchy for $m$ machines is $\Omega\left(\frac{\log m}{\log\log\log m}\right)$.

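In fact, it can be shown [4] (analogously to the upper bound) that for every positive integer $m$, positive real $r$, and $S \ge 1$, there exists a set of $m$ machines with $\frac{s_1}{s_m} = S$ being in a Nash equilibrium and satisfying $\mathrm{opt} = r$, $c = \mathrm{opt} \cdot \Omega\left(\min\left\{\frac{\log m}{\log\log m},\ \log\left(\frac{s_1}{s_m}\right)\right\}\right)$, and $C = \mathrm{opt} \cdot \Omega\left(\frac{\log m}{\log\left(\frac{\mathrm{opt} \cdot \log m}{c}\right)}\right)$.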


Applications 


The model discussed here has been extended in 
the literature in numerous ways, in particular in 
[1,5, 8]; see also survey presentations in [3, 14]. 


Open Problems 


An interesting attempt that adds an algorithmic 
or constructive element to the analysis of the 
price of anarchy is made in [2]. The idea behind 
“coordination mechanisms” is not to study the 
price of anarchy for a fixed system, but to design 
the system in such a way that the increase in cost 
or the loss in performance due to selfish behav- 
ior is as small as possible. This is a promising 
direction of research that might result in practical 
guidelines of how to build a distributed system 
that does not suffer from selfish behavior but 
might even exploit the selfishness of the agents. 


Experimental Results 


None is reported. 


URLs to Code and Data Sets 


None is reported. 
Problem Definition


Let there be $n$ agents and a set of feasible outcomes $\Omega$. For concreteness, readers may think of $\Omega$ as the set of allocations of $m$ items to $n$ agents. Each agent has a private value function $v_i : \Omega \to [0, 1]$ over feasible outcomes. We focus on direct revelation mechanisms, which first let each agent $i$ report a value function $v_i$, then choose a feasible outcome $\omega \in \Omega$ and a payment $p_i$ for each agent $i$ according to the reported value functions. Let $\omega(v)$ and $p(v)$ denote the outcome and payment vector chosen by the mechanism. Note that both $\omega(v)$ and $p(v)$ may be random variables.

We hope to achieve the following three objec- 
tives: 


Objective 1: Maximizing Social Welfare 

The social welfare of a feasible outcome $\omega \in \Omega$ is the sum of the agents' values for the outcome, namely, $\sum_{i=1}^{n} v_i(\omega)$. We hope to approximately maximize the expected social welfare of the chosen outcome over the randomness of the mechanism, a widely considered objective in mechanism design.

Approximately maximizing social welfare given the true value functions is a well-studied algorithmic problem (e.g., [4, 16]). In our mechanism design setting, agents may choose not to report the true value if it fits their interests. So the mechanism has an additional challenge of motivating the agents to report truthfully.


Objective 2: Incentive Compatibility 

We adopt the standard assumption that each agent $i$ aims to maximize the expectation of his quasi-linear utility, which equals his value for the chosen outcome less his payment. A mechanism is incentive compatible if truth telling maximizes an agent's expected utility regardless of the reported values of other agents, that is, for any agent $i$, any true value $v_i$, reported value $v_i'$, and any reported values of other agents $v_{-i}$, we have $E[v_i(\omega(v_i, v_{-i})) - p_i(v_i, v_{-i})] \ge E[v_i(\omega(v_i', v_{-i})) - p_i(v_i', v_{-i})]$. We also consider a relaxed notion called $\alpha$-incentive compatibility, where an agent's expected utility of truth telling can be worse by at most an additive $\alpha$ compared to his utility of reporting any alternative value.

There is a vast literature on designing in- 
centive compatible mechanisms with approxi- 
mately optimal social welfare (see, e.g., [11] 
for a comprehensive survey). We remark the 
Vickrey-Clarke-Groves (VCG) mechanism [2, 5, 
15], which chooses an outcome that maximizes 
the social welfare and uses payments to align 
the interests of the agents and the mechanism 
designer. When computational efficiency is not 
of concern, the VCG mechanism gives optimal 
social welfare and is incentive compatible for 
arbitrary problems. However, it does not achieve 
the next objective. 


Objective 3: Protecting Agents’ Privacy 

Our last objective is to protect the agents' privacy by ensuring that the chosen outcome and payments do not reveal too much information about any individual agent's private value function. Agents may care about their privacy for both exogenous and endogenous reasons. On the one hand, privacy is a basic desideratum. On the other hand, violating an agent's privacy could explicitly hurt the agent's utility in the future, e.g., companies may post higher reserve prices based on an agent's past values if such information is revealed by previous mechanisms.

Definition 1 ([3]) A mechanism is $(\varepsilon, \delta)$-differentially private if for any agent $i$, any value $v_i$, alternative value $v_i'$, any values of other agents $v_{-i}$, and any subset of feasible outcomes $S \subseteq \Omega$,


$$\Pr[\omega(v_i, v_{-i}) \in S] \le e^{\varepsilon} \cdot \Pr[\omega(v_i', v_{-i}) \in S] + \delta.$$
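
For a finite outcome set, the inequality in Definition 1 can be checked directly for one pair of neighboring inputs. The sketch below is only an illustration with hypothetical function names: it takes the two outcome distributions as dictionaries and uses the fact that the subset $S$ maximizing $\Pr_p[S] - e^{\varepsilon} \Pr_q[S]$ consists of exactly the outcomes where that difference is positive. A full check of a mechanism would have to repeat this over all agents and all pairs of reports.

```python
import math


def min_delta(p, q, eps):
    """Smallest delta with Pr_p[S] <= e^eps * Pr_q[S] + delta for all subsets S."""
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(o, 0.0) - math.exp(eps) * q.get(o, 0.0))
               for o in outcomes)


def satisfies_dp(p, q, eps, delta, tol=1e-12):
    """Check the (eps, delta) inequality in both directions for one input pair."""
    return (min_delta(p, q, eps) <= delta + tol
            and min_delta(q, p, eps) <= delta + tol)


# Hypothetical two-outcome example: the favored outcome is chosen with
# probability 3/4 under report v_i and with probability 1/4 under v_i'.
dist_v = {"x": 0.75, "y": 0.25}
dist_v_alt = {"x": 0.25, "y": 0.75}
```

In this example the probability ratios are at most 3, so the pair satisfies the inequality with $\varepsilon = \ln 3$ and $\delta = 0$, but not with $\varepsilon = 0.5$ and $\delta = 0$. The tolerance `tol` only absorbs floating-point rounding.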


We remark that the payments may violate the 
agents’ privacy as well. Nevertheless, we can 
make the prices privacy preserving without 
changing the agents’ expected utilities by adding 
any scale of noise with expectation zero to the 
payments. (Having arbitrarily large variance in 
the payment and, thus, in the utility of an agent 
is an undesirable property. In some settings, it 
is possible to privately compute prices without 
having large variance. Readers are referred to 
Hsu et al. [6] for details, which we will omit due 
to space constraint.) So we focus on the privacy 
property of the outcome in the above definition. 

We provide two informal interpretations of differential
privacy (for sufficiently small $\delta$). Information
theoretically, a mechanism being $(\epsilon, \delta)$-differentially
private implies that the outcome reveals at most $O(\epsilon^2)$
bits of information about any individual agent’s private
value. Game theoretically, it implies that truth telling
may decrease an agent’s future utility by at most a factor
of $e^{-\epsilon} \approx 1 - \epsilon$.

For some mechanism design problems, such as auctioning
$m$ items to $n$ agents, no $(\epsilon, \delta)$-differentially
private mechanisms can approximately maximize social
welfare. For such resource allocation problems, let
$\omega_{-i}$ denote the allocation to all agents except
agent $i$. We consider the following relaxed notion of
privacy:


Definition 2 ([8]) A mechanism is $(\epsilon, \delta)$-jointly
differentially private if for any agent $i$, any value $v_i$,
alternative value $v_i'$, any values of other agents $v_{-i}$,
and any subset of feasible outcomes $S \subseteq \Omega$,


$$\Pr[\omega_{-i}(v_i, v_{-i}) \in S] \le e^{\epsilon} \cdot \Pr[\omega_{-i}(v_i', v_{-i}) \in S] + \delta .$$


In settings where each agent can see only his own
allocation, a mechanism being $(\epsilon, \delta)$-jointly
differentially private also implies that it reveals at
most $O(\epsilon^2)$ bits of information about an agent’s
value and that truth telling decreases an agent’s future
utility by at most a factor of $e^{-\epsilon}$, even if
the adversary colludes with all other agents.


Related Work 

The problem we consider in this article falls 
into the growing literature on the interface of 
game theory and differential privacy (see, e.g., 
Pai and Roth [14] for a survey). McSherry and 
Talwar [10] proposed using differentially private 
mechanisms to design auctions by pointing out 
that differential privacy implies approximate in- 
centive compatibility and resilience to collusion. 
They also proposed the exponential mechanism, 
which is an important building block in one of the 
results we will discuss. Nissim et al. [13] showed 
how to convert differentially private mechanisms 
into exactly incentive compatible mechanisms 
in some settings, but the final mechanisms no 
longer protect agents’ privacy. Xiao [17] pro- 
posed mechanisms that are both incentive com- 
patible and differentially private in some special 
cases. Unfortunately, it does not seem possible to 
extend the results of Nissim et al. [13] and Xiao 
[17] to more general problems. Finally, Xiao [17], 
Chen et al. [1], and Nissim et al. [12] consid- 
ered modeling the agents’ concern for privacy 
in the utility functions and introduced incentive 
compatible mechanisms for some special cases 
in this model. In sum, most previous techniques 
apply only to special cases. In this article, we 
summarize two recent techniques for designing 
privacy-preserving auctions for a large family of 
mechanism design problems. 


Key Results 


Almost all mechanism design problems can be classified
into two families: social choice problems and resource
allocation problems. In a social choice problem, the set
of feasible outcomes is independent of the number of
agents $n$. In particular, the number of feasible outcomes
is independent of $n$. For example, leader elections and
choosing a subset of public projects subject to a budget
constraint fall into this family. In a resource allocation
problem, such as allocating $m$ items to $n$ agents, the
set of feasible outcomes depends on the number of agents.
In particular, the number of feasible outcomes grows
exponentially with $n$. Below we discuss two techniques
by Huang and Kannan [7] and Hsu et al. [6] for designing
privacy-preserving auctions for these two families of
problems, respectively.


Social Choice Problems 

Huang and Kannan [7] proposed a technique for designing
incentive compatible and $\epsilon$-differentially private
mechanisms for arbitrary mechanism design problems. For
social choice problems, in particular, this technique also
gives nearly optimal social welfare.


Theorem 1 ([7]) For any mechanism design problem, there
is an incentive compatible and $\epsilon$-differentially
private mechanism that gives at least
$\mathrm{OPT} - \frac{2}{\epsilon}\left(\ln|\Omega| + \ln\frac{1}{\beta}\right)$
social welfare with probability at least $1 - \beta$.


This mechanism is based on the exponential 
mechanism by McSherry and Talwar [10], a 
general differentially private mechanism that 
can be applied to a large family of problems. 
The social welfare guarantee and $\epsilon$-differential
privacy in Theorem 1 follow directly from 
properties of the exponential mechanism. 
However, the exponential mechanism is not 
incentive compatible in general. Huang and 
Kannan [7] noticed that the exponential 
mechanism can be viewed as maximizing a 
linear combination of the social welfare and 
the Shannon entropy of the outcome distribution. 
Therefore, its allocation rule is equivalent to that
of the VCG mechanism in a virtual market where
the set of feasible outcomes consists of distributions
over the original outcomes, and the set of agents consists
of the original $n$ agents plus an additional agent
whose value equals the entropy of the chosen
distribution. As a result, using the payments in 
the virtual market along with the exponential 
mechanism achieves incentive compatibility. 
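
The entropy view described above can be sketched in a few lines. The sketch assumes the common form of the exponential mechanism in which an outcome is drawn with probability proportional to exp(ε · welfare / 2), with welfare sensitivity 1; the function names and the example welfare values are hypothetical. Under that assumption, the sampling distribution is the maximizer of expected welfare plus (2/ε) times the Shannon entropy, which the last lines check against two alternative distributions.

```python
import math
import random

def exponential_mechanism_distribution(welfare, eps):
    """Probabilities proportional to exp(eps * welfare[o] / 2)."""
    top = max(welfare)  # subtract the maximum for numerical stability
    weights = [math.exp(eps * (w - top) / 2.0) for w in welfare]
    total = sum(weights)
    return [w / total for w in weights]

def sample_outcome(welfare, eps, rng=random):
    """Draw one outcome index from the exponential mechanism."""
    probs = exponential_mechanism_distribution(welfare, eps)
    return rng.choices(range(len(welfare)), weights=probs, k=1)[0]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def objective(probs, welfare, eps):
    """Expected welfare plus (2/eps) times the entropy of the distribution."""
    expected = sum(p * w for p, w in zip(probs, welfare))
    return expected + (2.0 / eps) * entropy(probs)

welfare = [3.0, 2.5, 1.0]   # hypothetical social welfare of three outcomes
eps = 1.0
probs = exponential_mechanism_distribution(welfare, eps)
uniform = [1.0 / 3] * 3
greedy = [1.0, 0.0, 0.0]
print(objective(probs, welfare, eps) >= objective(uniform, welfare, eps))
print(objective(probs, welfare, eps) >= objective(greedy, welfare, eps))
```

The payments of the virtual market are not modeled here; the sketch only illustrates the allocation rule.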

In social choice problems, $\ln|\Omega|$ is a constant
independent of $n$. So the loss in social welfare is
a constant independent of $n$. On the other hand,
the social welfare is the sum of $n$ agents’ values,
each of which is between 0 and 1. Hence, in a
large market with many agents, it is reasonable to
expect the optimal social welfare (if not of scale
$\Theta(n)$) to be much larger than the additive loss in
Theorem 1 in practical instances.

In resource allocation problems, however, $|\Omega|$
grows exponentially in $n$ and, thus, $\ln|\Omega|$ is of
scale $\Omega(n)$. For instance, consider matching $m = n$
items to $n$ agents. Then, $|\Omega| = n!$ and
$\ln|\Omega| = \Omega(n \ln n)$. Even if the optimal social
welfare is of scale $O(n)$, we would need $\epsilon$ to be
at least $\Omega(\ln n)$ to have a nontrivial social welfare
guarantee in Theorem 1. This means that the mechanism
would reveal $\Omega(\ln^2 n)$ bits of information about
an agent’s private value, and truth telling may
decrease an agent’s future utility by a poly($n$)
factor. Further, this is not only a limitation of the
current technique. Huang and Kannan [7] showed
that no $\epsilon$-differentially private mechanism can give
a nontrivial social welfare guarantee for
$\epsilon = o(\ln n)$, even without incentive compatibility.
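
The contrast between the two families can be made concrete with a small calculation. The sketch below assumes that the additive loss in Theorem 1 has the form (2/ε)(ln|Ω| + ln(1/β)); the function names and the parameter values are hypothetical. It compares a social choice problem with ten outcomes against matching n items to n agents, where |Ω| = n!.

```python
import math

def additive_loss(log_num_outcomes, eps, beta):
    """Assumed loss term (2/eps) * (ln|Omega| + ln(1/beta))."""
    return (2.0 / eps) * (log_num_outcomes + math.log(1.0 / beta))

def matching_log_outcomes(n):
    """ln(n!) for matching n items to n agents, via the log-gamma function."""
    return math.lgamma(n + 1)

n, eps, beta = 1000, 1.0, 0.1
social_choice_loss = additive_loss(math.log(10), eps, beta)
matching_loss = additive_loss(matching_log_outcomes(n), eps, beta)
# The welfare is at most n, since each value lies between 0 and 1.
print(social_choice_loss < n, matching_loss > n)
```

With these values the loss for the social choice problem is a small constant, while the loss for matching exceeds the largest possible welfare n, so the guarantee is vacuous unless ε grows with ln n.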


Resource Allocation Problems 

Given the obstacles for applying differential 
privacy to resource allocation problems, Hsu 
et al. [6] looked into a relaxed notion of 
privacy, namely, joint differential privacy. In 
particular, they considered matching m items 
to n agents where each item has a supply 
of at least s copies and then generalized the 
results to combinatorial auctions with gross 
substitute value functions (e.g., [9]). Their first 
result is a jointly differentially private (yet not 
incentive compatible) mechanism with nearly 
optimal social welfare when the supply $s$ is
polylogarithmic in $n$ and $m$. Their main technique
is a noisy variant of the deferred-acceptance 
algorithm by Kelso and Crawford [9]. 


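Theorem 2 ([6]) For combinatorial auctions with gross
substitute valuations, there is an $\epsilon$-jointly
differentially private algorithm that gives at least
$\mathrm{OPT} - \alpha n$ social welfare with probability
at least $1 - \beta$ if
$s = \Omega\left(\frac{1}{\epsilon\alpha^{3}}\,\mathrm{polylog}\left(n, m, \frac{1}{\alpha}, \frac{1}{\beta}\right)\right)$.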


They also showed that a supply of $\omega(1)$ is needed
for a jointly differentially private mechanism to
achieve $o(n)$ additive loss in social welfare. More
precisely, they showed:


Theorem 3 ([6]) No jointly differentially private
algorithm can compute matchings with social
welfare at least $\mathrm{OPT} - \alpha n$ if $s \le O(1/\sqrt{\alpha})$.


Their approach can also be used to design approx- 
imately incentive compatible and jointly differen- 
tially private mechanisms, but the supply needs to 
be polynomially large. 


Theorem 4 (implicit in [6]) For combinatorial
auctions with gross substitute valuations, there
is an $\alpha$-incentive compatible and $\epsilon$-jointly
differentially private algorithm that gives at least
$\mathrm{OPT} - \alpha n$ social welfare with probability at
least $1 - \beta$ if $s = \Omega(m)$, where the constant
depends on $\epsilon$, $\delta$, $\alpha$, and $\beta$.


Open Problems 


The results of Huang and Kannan [7] and Hsu 
et al. [6] provided a preliminary step towards 
designing auctions that protect agents’ privacy. 
There are still many open problems in this topic, 
some of which we sketch below. 

First, the techniques of Hsu et al. [6] funda- 
mentally rely on properties of gross substitute 
value functions and, thus, cannot be extended 
to more general value functions. Further, there 
are many important families of value functions 
beyond gross substitute, e.g., submodular functions,
subadditive functions, etc. So it is natural
to seek techniques that work for more general
families of value functions.


Problem 1 Are there jointly differentially pri- 
vate mechanisms that achieve nearly optimal so- 
cial welfare for arbitrary value functions? 


Even if we restrict our attention to gross sub- 
stitute value functions or even to matching mar- 
kets, Theorems 2 and 3 leave a large gap between 
the upper and lower bounds on the supply needed 
by jointly differentially private mechanisms to 
get nearly optimal social welfare. Closing this
gap would advance our understanding of joint
differential privacy.


Problem 2 What is the minimal supply needed
so that a jointly differentially private mechanism
can achieve nearly optimal social welfare
in combinatorial auctions? In particular, is the
logarithmic dependency on $n$ and $m$ in Theorem 2
necessary?


Finally, the current technique for achieving 
both approximate incentive compatibility and 
joint differential privacy requires a polynomially 
large supply of items, much larger than the supply 
needed for achieving joint differential privacy 
alone. Does approximate incentive compatibility 
make the problem fundamentally harder? Or is it 
just a limitation of the current technique? 


Problem 3 What is the minimal supply needed 
so that an approximately incentive compatible 
and jointly differentially private mechanism can 
achieve nearly optimal social welfare in combi- 
natorial auctions? In particular, is the polynomial 
dependency on m in Theorem 4 necessary? 


Problem Definition


Spectral analysis refers to a family of popular 
and effective methods that analyze an input ma- 
trix by exploiting information about its eigen- 
vectors or singular vectors. Applications include 
principal component analysis, low-rank approx- 
imation, and spectral clustering. Many of these 
applications are commonly performed on data 
sets that feature sensitive information such as 
patient records in a medical study. In such cases 
privacy is a major concern. Differential privacy 
is a powerful general-purpose privacy definition. 
This entry explains how differential privacy may
be applied to the task of approximately computing
the top singular vectors of a matrix.

Generally speaking, the input is a real-valued
matrix $A \in \mathbb{R}^{m \times n}$ and a parameter
$k \in \mathbb{N}$. We think of the input matrix as
specifying $n$ attributes for $m$ individuals. The goal
of the algorithm is to approximately compute the first
$k \le \min\{m, n\}$ singular vectors of $A$ while achieving
differential privacy. There are several notions of
approximation as well as several variants of differential
privacy that make sense in this context.


Approximation Guarantee 

Let $A = U \Sigma V^\top$ denote the singular value
decomposition of $A$ with singular values
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\max\{m,n\}} \ge 0$.
Further, let $U_k$ and $V_k$ represent the first $k$
columns of $U$ and $V$, respectively. In other words,
$U_k$ consists of the first $k$ left singular vectors of
$A$ and $V_k$ consists of the first $k$ right singular
vectors.


Principal Angle 

Principal angles are a useful tool for comparing
the distance between subspaces. The sine of
the largest principal angle between subspaces
$X, Y$ of equal dimension represented by
orthonormal matrices is defined as
$\sin\theta(X, Y) = \|(I - XX^\top)Y\|$, where the norm
refers to the spectral norm (or $\ell_2$-operator norm).
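
For intuition, the definition can be evaluated by hand in the simplest case of one-dimensional subspaces, where the orthonormal matrices are unit vectors and the spectral norm reduces to the Euclidean norm. The following sketch is only that special case; the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def sin_principal_angle(x, y):
    """||(I - x x^T) y|| for the lines spanned by the vectors x and y."""
    x, y = normalize(x), normalize(y)
    dot = sum(a * b for a, b in zip(x, y))
    residual = [b - dot * a for a, b in zip(x, y)]  # y minus its projection on x
    return math.sqrt(sum(r * r for r in residual))

print(sin_principal_angle([1.0, 0.0], [1.0, 1.0]))  # sine of a 45 degree angle
```

Identical lines give 0 and orthogonal lines give 1, which matches the role of the quantity as a distance between subspaces.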

A natural objective is to require that the algorithm
$M$ outputs an orthonormal matrix $X \in \mathbb{R}^{n \times k}$
so as to minimize $\|(I - XX^\top)V_k\|$. We
call this the principal angle objective. The angle
is of course 0 when $X = V_k$. We will also be
interested in the case where the rank of $X$ is larger
than the rank of $V_k$. Note that our objective is still
well defined.


Expressed Variance 

Another natural objective is to output an orthonormal
matrix $X \in \mathbb{R}^{n \times k}$ so as to maximize
the variance captured by the subspace spanned
by the columns of $X$. A convenient way to express
this objective is to maximize the quantity
$\|AX\|_F$, where the norm refers to the Frobenius
norm. It is not difficult to show that this objective
is maximized for $X = V_k$. Again, the objective
is still well defined when the rank of $X$ is larger
than that of $V_k$.


Privacy Guarantee 

Differential privacy requires the definition of a 
neighborhood relation on matrices, denoted A ~ 
A’. Pairs of matrices in this relation are called 
neighboring. Differential privacy requires that the 
algorithm maps neighboring databases to nearly 
indistinguishable output distributions. 


Definition 1 Given a neighborhood relation $\sim$,
we say that a randomized algorithm $M$ satisfies
$(\epsilon, \delta)$-differential privacy if for all neighboring
matrices $A \sim A'$ and for every measurable set $S$
in the output space of the algorithm, we have that


$$\Pr\{M(A) \in S\} \le \exp(\epsilon)\,\Pr\{M(A') \in S\} + \delta. \qquad (1)$$


Neighborhood Relations. Typically in differen- 
tial privacy, the neighborhood relation is chosen 
to be the set of all pairs of matrices that differ in 
at most one row. Unfortunately, this definition is 
unattainable in the spectral setting as the privacy 
definition is sensitive to the scale of the row 
vector that is being changed. Indeed, if we replace
a single row of the matrix by a vector $\lambda u$ of
norm $\lambda$, then as we let $\lambda$ tend to infinity, the
top right singular vector of the matrix will tend to the
vector $u$ (say, in angle).

To circumvent this problem, we will generally
specify a norm bound in each neighbor relation.
It is important to note that the strength of the privacy
definition then depends on the scaling of the
matrix.


— We say that A, A’ are entry neighbors if they
differ in at most one entry, by at most 1 in absolute value.

— We say that A, A’ are row neighbors if they 
differ in at most one row by a vector whose 
Euclidean norm is bounded by 1. 
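
The two relations can be written as simple predicates on matrices stored as lists of rows. The sketch reads "entry neighbors" as differing in a single entry, which is an assumption about the intended definition; the function names and example matrices are hypothetical, and both matrices are assumed to have the same shape.

```python
import math

def is_entry_neighbor(a, b):
    """True if a and b differ in at most one entry, and by at most 1 there."""
    diffs = [abs(x - y)
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)
             if x != y]
    return len(diffs) <= 1 and all(d <= 1 for d in diffs)

def is_row_neighbor(a, b):
    """True if a and b differ in at most one row, by a vector of norm <= 1."""
    changed = [(row_a, row_b) for row_a, row_b in zip(a, b) if row_a != row_b]
    if len(changed) > 1:
        return False
    return all(
        math.sqrt(sum((x - y) ** 2 for x, y in zip(row_a, row_b))) <= 1
        for row_a, row_b in changed
    )

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 2.5], [3.0, 4.0]]   # one entry changed by 0.5
C = [[1.6, 2.6], [3.0, 4.0]]   # one row changed by a vector of norm about 0.85
print(is_entry_neighbor(A, B), is_row_neighbor(A, B))
print(is_entry_neighbor(A, C), is_row_neighbor(A, C))
```

The pair (A, C) shows that a row change of small norm need not be an entry change, so the row relation covers strictly more pairs.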


All entry neighbors are of course also row neigh- 
bors so that the privacy definition based on row 
neighbors is stronger than that of entry neighbors. 
It is sometimes natural to scale the matrix such 
that either all entries have magnitude at most 1 
or all rows have Euclidean norm at most 1. 
While this may strengthen the privacy guarantee, 
it also leads to a corresponding deterioration 
in the utility guarantee of the algorithm as the 
signal-to-noise ratio decreases. It is tempting to 
nonuniformly scale each row by a different factor. 
However, this can dramatically change the singu- 
lar vector decomposition and does not in general 
lead to an easily interpretable guarantee. 


Key Results 


We describe two simple and effective methods 
that lead to nearly optimal approximation guaran- 
tees in various settings we introduced above. The 
first algorithm is based on the well-known power 
method. The other uses a simple Gaussian noise 
addition step (Fig. 1). 


Noisy Power Method 

For simplicity we describe the algorithm in the
case where $A$ is a symmetric $n \times n$ matrix. The
algorithm extends straightforwardly to rectangular
and asymmetric matrices as explained in [5].
We first state a general-purpose analysis of the private
power method (PPM).


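Theorem 1 ([3]) Let $k \le p$. Then, the private power
method satisfies $(\epsilon, \delta)$-differential privacy
under the entry neighbor relation, and after
$L = O\left(\frac{\sigma_k}{\sigma_k - \sigma_{k+1}} \log(n)\right)$
iterations, we have with probability 9/10 that


$$\|(I - X_L X_L^\top) V_k\| \le O\left( \frac{\sigma \max_{\ell=1}^{L} \|X_\ell\|_\infty \sqrt{n \log L}}{\sigma_k - \sigma_{k+1}} \cdot \frac{\sqrt{p}}{\sqrt{p} - \sqrt{k-1}} \right).$$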


When $p = k + \Omega(k)$, the trailing factor becomes
a constant. If $p = k$, it creates a factor $O(k)$
overhead. In the worst case we can always bound
$\|X_\ell\|_\infty$ by 1 since $X_\ell$ is an orthonormal basis.
However, in principle, we could hope that a much
better bound holds provided that the target basis
$V_k$ has small coordinates. Hardt and Roth [4, 5]
suggested a way to accomplish a stronger bound
by considering a notion of coherence of $A$, denoted
as $\mu(A)$. The coherence parameter varies
between 1 and $n$ but is often sublinear in $n$.
Intuitively, the coherence measures the correlation
between the singular vectors of the matrix and
the standard basis. Low coherence means that
the singular vectors have small coordinates in the
standard basis. Many results on matrix completion
and robust PCA crucially rely on such an
assumption, though the exact notion is somewhat
different here. Specifically, if $A = V \Sigma V^\top$ is
a singular vector decomposition of $A$, we define
$\mu(A) = n \max_{i,j \in [n]} |V_{ij}|^2$.

Theorem 2 ([3]) Under the assumptions of Theorem 1,
we have the conclusion


$$\|(I - X_L X_L^\top) V_k\| \le O\left( \frac{\sigma \sqrt{\mu(A) \log n \log L}}{\sigma_k - \sigma_{k+1}} \cdot \frac{\sqrt{p}}{\sqrt{p} - \sqrt{k-1}} \right).$$


Input: Symmetric $A \in \mathbb{R}^{n \times n}$, $L$, $p$, privacy parameters $\epsilon, \delta > 0$
1. Let $X_0$ be a random orthonormal basis and put $\sigma = \epsilon^{-1}\sqrt{4pL\log(1/\delta)}$
2. For $\ell = 1$ to $L$:
   a) $Y_\ell \leftarrow A X_{\ell-1} + G_\ell$ where $G_\ell \sim N(0, \|X_{\ell-1}\|_\infty^2 \sigma^2)^{n \times p}$.
   b) Compute the QR-factorization $Y_\ell = X_\ell R_\ell$
Output: Matrix $X_L$

Private Spectral Analysis, Fig. 1 Private power method. Here $\|X\|_\infty = \max_{ij} |X_{ij}|$


Input: Matrix $A \in \mathbb{R}^{m \times n}$, privacy parameters $\epsilon, \delta > 0$, parameter $p \in \mathbb{N}$
1. Let $E$ be a symmetric matrix where the upper triangle (including the diagonal) is sampled
   i.i.d. from $N(0, \sigma^2)$ where $\sigma = \sqrt{2\ln(1.25/\delta)}/\epsilon$
2. $C \leftarrow A^\top A + E$
Output: Top $p$ singular vectors $X \in \mathbb{R}^{n \times p}$ of $C$

Private Spectral Analysis, Fig. 2 Gaussian mechanism
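
The private power method of Fig. 1 can be sketched in plain Python for the special case p = 1, where the QR-factorization of a single column is just a normalization. This is a minimal illustration under that assumption, not the general algorithm; the function name, the test matrix, and the parameter values are hypothetical.

```python
import math
import random

def noisy_power_method(a, num_iters, eps, delta, rng):
    """Private power method specialised to p = 1 (a single vector)."""
    n = len(a)
    # Noise parameter from Fig. 1 with p = 1.
    sigma = math.sqrt(4 * 1 * num_iters * math.log(1 / delta)) / eps
    x = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in x))
    x = [c / norm for c in x]          # random unit vector X_0
    for _ in range(num_iters):
        # Standard deviation ||X||_inf * sigma, i.e. variance ||X||_inf^2 sigma^2.
        scale = max(abs(c) for c in x) * sigma
        y = [sum(a[i][j] * x[j] for j in range(n)) + rng.gauss(0, scale)
             for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]      # QR of one column is normalisation
    return x

# Hypothetical symmetric matrix with a large gap between its top two eigenvalues.
a = [[1000.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = noisy_power_method(a, num_iters=20, eps=10.0, delta=1e-6, rng=random.Random(0))
print(abs(x[0]))
```

When the gap between the top two eigenvalues is large relative to the noise scale, as in this example, the output stays close to the top eigenvector despite the added noise.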


Gaussian Mechanism 

The Gaussian mechanism first appeared in [1] 
and was recently revisited [2]. The algorithm 
simply computes the covariance matrix of the 
data set and adds suitably scaled (symmetric) 
Gaussian noise to the covariance matrix. The 
result is differentially private, and the top singular 
vectors of the perturbed covariance matrix serve 
as an approximation of the true singular vectors 
(Fig. 2). 
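
A minimal sketch of the noise-addition step of Fig. 2 follows, using plain lists of rows. The function name and the example data are hypothetical; extracting the top singular vectors of the perturbed matrix is left out and could be done with any standard eigenvalue routine.

```python
import math
import random

def gaussian_mechanism_covariance(a, eps, delta, rng):
    """Return C = A^T A + E with symmetric Gaussian noise E."""
    n = len(a[0])
    sigma = math.sqrt(2 * math.log(1.25 / delta)) / eps
    c = [[sum(row[i] * row[j] for row in a) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(0, sigma)   # upper triangle, diagonal included
            c[i][j] += noise
            if i != j:
                c[j][i] += noise          # mirror so that E stays symmetric
    return c

a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three data points with two attributes
c = gaussian_mechanism_covariance(a, eps=1.0, delta=1e-5, rng=random.Random(1))
print(c[0][1] == c[1][0])
```

Note that the noise scale does not depend on the data, which is why a norm bound on the rows is needed for the privacy guarantee to be meaningful.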


Theorem 3 ([2]) Let $k \le p$. Then, the
Gaussian mechanism satisfies $(\epsilon, \delta)$-differential
privacy under the row neighbor relation,
and with probability $1 - o(1)$, we have
AX | > |AVull — O (ok Vn). Moreover, 


with probability 1 — o(1), 


VAX = AVell — 0 (; 


Applications 


Principal Component Analysis 

In principal component analysis, the goal is
to compute the top $k$ singular vectors of the
$n \times n$ matrix $A^\top A$. Recall that we identified
data points with row vectors in $A$. The singular
vectors of $A^\top A$ are identical to the right
singular vectors of $A$. Hence, both algorithms
we previously discussed immediately solve this
problem.


Low-Rank Approximation 

In low-rank approximation, the goal is to output
a matrix $B$ of rank $k$ such that $\|A - B\|_\nu$ is
small, where $\nu \in \{2, F\}$. For either norm,
the optimal solution is given by the truncated
singular value decomposition $B = U_k \Sigma_k V_k^\top$.
In the context of privacy-preserving spectral
analysis, a good approximation $\hat{V}_k$ to $V_k$ typically
leads to a good low-rank approximation by
performing a privacy-preserving multiplication
step $\hat{U}_k \hat{\Sigma}_k = A\hat{V}_k + N$, where $N$ is
a suitably chosen noise matrix. See, for example, [5].


Open Problems 


1. Is it possible to obtain an expressed variance
   guarantee for the noisy power method? For
   instance, can we match the bounds achieved
   by Gaussian noise addition via the power
   method? The problem with the Gaussian
   mechanism is that it computes the matrix
   $A^\top A$, which is impractical when $n$ is large
   but $A$ may be sparse. In this case, the power
   method is computationally far more efficient.

2. Can we weaken the incoherence assumption in
   Theorem 2?

3. Theorem 3 depends on the separation between
   $\sigma_k$ and $\sigma_{k+1}$ even when $p > k$. Is it possible
   to replace the dependence on $\sigma_k - \sigma_{k+1}$ with
   a dependence on $\sigma_k - \sigma_{p+1}$?


Problem Definition


An important problem in wireless sensor networks
is that of local detection and propagation,
i.e., the local sensing of a crucial event
and the energy- and time-efficient propagation of
data reporting its realization to a control center
(for a graphical presentation, see Fig. 1).
This center (called the “sink”) could be some
human authority responsible for taking action
upon the realization of the crucial event. More
formally:

Probabilistic Data Forwarding in Wireless Sensor
Networks, Fig. 1 A sensor network (showing the control
center, the sensor nodes, and the sensor field)


Definition 1 Assume that a single sensor, E,
senses the realization of a local event $\mathcal{E}$. Then
the propagation problem is the following: “How
can sensor E, via cooperation with the rest of
the sensors in the network, efficiently propagate
information reporting the realization of the event
to the sink S?”


Note that this problem is in fact closely related to 
the more general problem of data propagation in 
sensor networks. 


Wireless Sensor Networks 

Recent dramatic developments in  micro- 
electro-mechanical systems (MEMS), wireless 
communications and digital electronics have 
led to the development of small in size, low- 
power, low-cost sensor devices. Such extremely 
small (soon in the cubic millimetre scale) 
devices integrate sensing, data processing and 
wireless communication capabilities. Each such
device, examined individually, might appear
to have small utility; however, the effective
distributed self-organization of large numbers
of such devices into an ad-hoc network may
lead to the efficient accomplishment of large
sensing tasks. Their wide range of applications 
is based on the use of various sensor types 
(i.e., thermal, visual, seismic, acoustic, radar, 
magnetic, etc.) to monitor a wide variety of 
conditions (e.g., temperature, object presence 
and movement, humidity, pressure, noise levels 
etc.). For a survey on wireless sensor networks 
see [1] and also [6, 9]. 


A Simple Model

Sensor networks are comprised of a vast number 
of ultra-small homogeneous sensors, which are 
called “grain” particles. Each grain particle is 
a fully-autonomous computing and communica- 
tion device, characterized mainly by its avail- 
able power supply (battery) and the energy cost 
of computation and transmission of data. Such 
particles (in the model here) do not move. Each 
particle is equipped with a set of monitors (sen- 
sors) for light, pressure, humidity, temperature 
etc. and has a broadcast (digital radio) beacon 
mode. 

It is assumed that grain particles are randomly 
deployed in a given area of interest. Such a place- 
ment may occur e.g., when throwing sensors 
from an airplane over an area. A special case is
also considered, in which the network is a lattice (or
grid) deployment of sensors. This grid placement
of grain particles is motivated by certain applica- 
tions, where it is possible to have a pre-deployed 
sensor network, where sensors are put (possibly 
by a human or a robot) in a way that they form 
a 2-dimensional lattice. 

It is assumed that each particle has the fol- 
lowing abilities: (i) It can estimate the direc- 
tion of a received transmission (e.g., via the 
technology of direction-sensing antennae). (ii) It 
can estimate the distance from a nearby particle 
that did the transmission (e.g., via estimation of 
the attenuation of the received signal). (iii) It 
knows the direction towards the sink S. This can 
be implemented during a set-up phase, where 
the (powerful) sink broadcasts the information 
about itself to all particles. (iv) All particles have 
a common co-ordinates system. Notice that GPS 
information is not assumed. Also, there is no need 
to know the global structure of the network. 


Key Results 


The Basic Idea 

For the above problem, [3] proposes a protocol
that tries to minimize energy consumption by
probabilistically favoring certain paths of local
data transmissions towards the sink. This
protocol is called PFR (Probabilistic Forwarding
Protocol). Its basic idea is to avoid flooding by
favoring (in a probabilistic manner) data propagation
along sensors which lie “close” to the
(optimal) transmission line, ES, that connects the
sensor node detecting the event, E, and the sink,
S. This is implemented by locally calculating the
angle $\phi$ = (EPS), whose corner point P is the
sensor currently running the local protocol, having
received a transmission from a nearby sensor
previously possessing the event information (see
Fig. 2). If $\phi$ is greater than or equal to a predetermined
threshold, then P will transmit (and thus propagate
the information further). Otherwise, it decides
whether to transmit with probability equal to $\phi/\pi$.
Because of the probabilistic nature of data propagation
decisions, and to prevent the propagation
process from failing early, the protocol initially
uses (for a short time period which is evaluated)
a flooding mechanism that leads to a sufficiently
large “front” of sensors possessing the data under
propagation. When such a “front” is created,
probabilistic forwarding is performed.


The PFR Protocol 
The protocol evolves in two phases: 


Probabilistic Data Forwarding in Wireless Sensor
Networks, Fig. 2 Angle $\phi$ and proximity to the optimal
line

Probabilistic Data Forwarding in Wireless Sensor
Networks, Fig. 3 Thin zone of particles

Phase 1: The “Front” Creation Phase 

Initially the protocol builds (by using a limited,
in terms of rounds, flooding) a sufficiently large
“front” of particles, to guarantee the survivability
of the data propagation process. During this
phase, each particle that has received the data to
be propagated deterministically forwards it
towards the sink.


Phase 2: The Probabilistic Forwarding Phase 

Each particle P possessing the information under
propagation (called info($\mathcal{E}$) hereafter) calculates
an angle $\phi$ by calling the subprotocol
“$\phi$-calculation” (see description below) and broadcasts
info($\mathcal{E}$) to all its neighbors with probability
$P_{fwd}$ (or it does not propagate any data with
probability $1 - P_{fwd}$) as follows:


$$P_{fwd} = \begin{cases} 1 & \text{if } \phi \ge \phi_{threshold} \\ \dfrac{\phi}{\pi} & \text{otherwise} \end{cases}$$


where $\phi$ is the (EPS) angle and $\phi_{threshold} = 134^\circ$
(the selection reasons of this value are discussed
in [3]).
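
The forwarding rule of Phase 2 amounts to a one-line function. The sketch below assumes angles measured in radians and the rule "forward with probability 1 at or above the threshold, and with probability φ/π otherwise"; the function and constant names are hypothetical.

```python
import math
import random

PHI_THRESHOLD = math.radians(134)   # threshold value quoted in the text

def forwarding_probability(phi, threshold=PHI_THRESHOLD):
    """P_fwd = 1 if phi >= threshold, else phi / pi (phi in radians)."""
    return 1.0 if phi >= threshold else phi / math.pi

def decides_to_forward(phi, rng=random):
    """One randomized forwarding decision of a particle with angle phi."""
    return rng.random() < forwarding_probability(phi)

print(forwarding_probability(math.radians(90)))    # a particle far from the line ES
print(forwarding_probability(math.radians(150)))   # a particle close to the line ES
```

Particles near the line ES see an angle close to 180 degrees and always forward, while particles far from the line forward only occasionally, which is what limits the number of activated particles.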

If the density of particles is appropriately
large, then for a line ES there is (with high probability)
a sequence of points “closely surrounding
ES” whose angles $\phi$ are larger than $\phi_{threshold}$ and
such that successive points are within transmission
range. All such points broadcast and thus essentially
they follow the line ES (see Fig. 3).


Probabilistic Data Forwarding in Wireless Sensor
Networks, Fig. 4 Angle $\phi$ calculation example


The $\phi$-calculation Subprotocol (see Fig. 4)
Let $P_{prev}$ be the particle that transmitted info($\mathcal{E}$) to
P.

1. When $P_{prev}$ broadcasts info($\mathcal{E}$), it also
   attaches the info $|E P_{prev}|$ and the direction
   of $P_{prev}E$.

2. P estimates the direction and length of line
   segment $P_{prev}P$, as described in the model.

3. P now computes angle $(E P_{prev} P)$, and computes
   $|EP|$ and the direction of $PE$ (this will
   be used in further transmission from P).

4. P also computes angle $(P_{prev} P E)$ and by subtracting
   it from $(P_{prev} P S)$ it finds $\phi$.
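
The quantity that the subprotocol produces is the angle at P between the directions towards E and towards S. The sketch below computes that angle directly from coordinates, which is a simplification: the protocol itself derives it from locally estimated directions and distances and does not assume known positions. The function name and the example coordinates are hypothetical.

```python
import math

def angle_eps(e, p, s):
    """Angle at P between the segments PE and PS, in radians."""
    to_e = (e[0] - p[0], e[1] - p[1])
    to_s = (s[0] - p[0], s[1] - p[1])
    dot = to_e[0] * to_s[0] + to_e[1] * to_s[1]
    cross = to_e[0] * to_s[1] - to_e[1] * to_s[0]
    return abs(math.atan2(cross, dot))

E, S = (0.0, 0.0), (10.0, 0.0)
print(math.degrees(angle_eps(E, (5.0, 1.0), S)))   # close to the line ES
print(math.degrees(angle_eps(E, (5.0, 5.0), S)))   # far from the line ES
```

A particle on the segment ES sees an angle of 180 degrees, and the angle shrinks as the particle moves away from the line, so comparing it with the threshold selects a thin zone around ES.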


Performance Properties of PFR 

Any protocol $\Pi$ solving the data propagation
problem must satisfy the following three properties:
(a) Correctness. $\Pi$ must guarantee that data
arrives at the position S, given that the whole network
exists and is operational. (b) Robustness.
$\Pi$ must guarantee that data arrives at enough
points in a small interval around S, in cases where
part of the network has become inoperative. (c)
Efficiency. If $\Pi$ activates $k$ particles during its
operation then $\Pi$ should have a small ratio of
the number of activated over the total number of
particles, $r = \frac{k}{N}$. Thus $r$ is an energy efficiency
measure of $\Pi$. It is shown that this is indeed the
case for PFR.

Consider a partition of the network area into 
small squares of a fictitious grid G (see Fig. 5). 
When particle density is high enough, occupancy 
arguments guarantee that with very high proba- 
bility (tending to 1) all squares get particles. All 
the analysis is conditioned on this event, call it 
F, of at least one particle in each square. Below 
only sketches of proofs are provided (full proofs 
can be found in [3]). 


The Correctness of PFR 

Consider any square $\Sigma$ intersecting the ES line.
By the occupancy argument above, there is w.h.p.
a particle in this square. Clearly, the worst case
is when the particle is located in one of the
corners of $\Sigma$ (since the two corners located
farthest from the ES line have the smallest
$\phi$-angle among all positions in $\Sigma$). By geometric
calculations, [3] proves that the angle $\phi$ of this
particle is $\phi > 134^\circ$. But the initial square (i.e.,
that containing E) always broadcasts, and any
intermediate intersecting square will be notified (by
induction) and thus broadcast because of the
argument above. Thus the sink will be reached
if the whole network is operational:


Lemma 1 ([3]) PFR succeeds with probability 1
given the event F.


The Energy Efficiency of PFR 

Consider a “lattice-shaped” network like the one
in Fig. 5 (all results will hold for any random
deployment “in the limit”). The analysis of the
energy efficiency considers particles that are active
but are as far as possible from ES. [3] estimates
an upper bound on the number of activated particles
in an $n \times n$ (i.e., $N = n \times n$) lattice. If $k$ is this
number then $r = \frac{k}{N}$ ($0 < r \le 1$) is the “energy
efficiency ratio” of PFR. More specifically, in [3]
the authors prove the (very satisfactory) result
below. They consider the area around the ES line,
whose particles participate in the propagation
process. The number of active particles is thus,
roughly speaking, captured by the size of this
area, which in turn is equal to $|ES|$ times the
maximum distance from ES. This maximum
distance is clearly a random variable. To calculate
the expectation and variance of this variable,
the authors in [3] basically “upper bound” the
stochastic process of the distance from ES by
a random walk on the line, and subsequently
“upper bound” this random walk by a well-known
stochastic process (i.e., the “discouraged arrivals”
birth and death Markovian process). Thus they
prove:


Theorem 2 ([3]) The energy efficiency of the
PFR protocol is $\Theta\left(\left(\frac{n_0}{n}\right)^2\right)$ where $n_0 = |ES|$
and $n = \sqrt{N}$, where $N$ is the number of particles
in the network. For $n_0 = |ES| = o(n)$, this is $o(1)$.


The Robustness of PFR 

Consider particles “very near” to the ES line.
Clearly, such particles have large $\phi$-angles (i.e.,
$\phi > 134^\circ$). Thus, even in the case that some of
these particles are not operating, the probability
that none of those operating transmits (during
phase 2) is very small. Thus:


Lemma 3 ([3]) PFR manages to propagate the
crucial data across lines parallel to ES, and of
constant distance, with fixed nonzero probability
(not depending on $n$, $|ES|$).


Applications 


Sensor networks can be used for continuous sens- 
ing, event detection, location sensing as well as 
micro-sensing. Hence, sensor networks have sev- 
eral important applications, including (a) security 
(like biological and chemical attack detection), 
(b) environmental applications (such as fire de- 
tection, flood detection, precision agriculture), (c) 
health applications (like telemonitoring of human 
physiological data) and (d) home applications 
(e.g., smart environments and home automation). 
Also, sensor networks can be combined with 
other wireless networks (like mobile) or fixed 
topology infrastructures (like the Internet) to pro- 
vide transparent wireless extensions in global 
computing scenarios.


Open Problems


It would be interesting to come up with formal
models for sensor networks, especially with
respect to energy aspects; in this respect, [10]
models energy dissipation using stochastic methods.
Also, it is important to investigate fundamental
trade-offs, such as those between energy
and time. Furthermore, the presence of mobility 
and/or multiple sinks (highly motivated by ap- 
plications) creates new challenges (see e.g., [2, 
11]). Finally, heterogeneity aspects (e.g., having 
sensors of various types and/or combinations of 
sensor networks with other types of networks like 
p2p, mobile and the Internet) are very important; 
in this respect see e.g., [5, 13]. 


Experimental Results 


An implementation of the PFR protocol along 
with a detailed comparative evaluation (using 
simulation) with greedy forwarding protocols can 
be found in [4]; with clustering protocols (like 
LEACH, [7]) in [12]; with tree maintenance ap- 
proaches (like Directed Diffusion, [8]) in [5]. 
Several performance measures are evaluated, like 
the success rate, the latency and the energy dis- 
sipation. The simulations mainly suggest that 
PFR behaves best in sparse networks of high 
dynamics. 


Problem Definition


Virus identification is an important research
topic in molecular biology. One method uses
probes. A probe is a short oligonucleotide of size
8-25, which serves as an identifier when identifying
a virus in a biological sample through hybridization.
If each probe hybridizes to a unique virus,
then identification of the virus is straightforward.
However, unique probes are very hard to
obtain, especially for virus subtypes that are
closely related. Therefore, how to identify viruses
with the minimum number of nonunique probes
becomes an interesting problem.

Given a biological sample and a set of possibly
nonunique probes, how can one select a minimum
subset of probes that identifies the viruses in the
biological sample? This problem is called nonunique
probe selection.


Key Results


Suppose the biological sample contains only one
virus. The problem is to determine which virus
it is. To do so, it is sufficient to select probes
satisfying the condition that different viruses
hybridize to different subsets of probes. This condition
enables us to find the virus easily from a test
outcome.

In general, suppose the biological sample contains
at most d viruses. Then the selected probes
should satisfy the condition that different sets
of at most d viruses hybridize to different
subsets of selected probes. Schliep, Torney, and
Rahmann [9] first pointed out that this is actually
a nonadaptive group testing problem
[3].

Consider each virus as an item and each probe
as a pool consisting of all viruses hybridized by
the probe. A nonadaptive group testing with n
items and t pools can be represented by a t × n
binary matrix with rows labeled by pools and
columns labeled by items, where cell (i, j) contains
a 1-entry if and only if the ith pool contains item j.
This binary matrix is called the incidence matrix
of the nonadaptive group testing. In the theory of
nonadaptive group testing, the above condition
means that the incidence matrix is d̄-separable.
Specifically, a binary matrix is d̄-separable if all
Boolean sums of at most d columns are distinct.
Here, by Boolean sum, we mean the following: If
each column is seen as the set of rows corresponding
to 1-entries in the column, then the Boolean
sum can be seen as the union of columns. The
Boolean sum is a classical notion in the study of
group testing. With a d̄-separable matrix, the test
outcome can identify up to d viruses in a biological
sample.
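
The separability condition above can be checked by brute force on small
matrices. The following is a minimal sketch, not taken from the cited
works: the function names are hypothetical, the matrix is assumed to be a
list of rows of 0/1 entries, and the empty set of columns (a sample with no
virus) is assumed to count among the sets of "at most d" columns. The
running time is exponential in d, so this is only meant for illustration.

```python
from itertools import combinations

def boolean_sum(matrix, cols):
    """Row-wise OR (union) of the chosen columns, as a tuple of 0/1 entries."""
    return tuple(int(any(row[j] for j in cols)) for row in matrix)

def is_d_bar_separable(matrix, d):
    """True if all Boolean sums of at most d columns are pairwise distinct.

    Assumption: the empty column set (all-zero outcome) is included.
    """
    n = len(matrix[0])
    seen = set()
    for size in range(d + 1):
        for cols in combinations(range(n), size):
            outcome = boolean_sum(matrix, cols)
            if outcome in seen:
                return False  # two different virus sets give the same outcome
            seen.add(outcome)
    return True
```

For instance, the matrix with rows (1, 1, 0) and (0, 1, 1) separates single
viruses, but the union of columns 0 and 1 equals column 1, so it fails the
test for d = 2.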

In nonadaptive group testing, each test is on a
pool. Thus, each probe can also be seen as a test.
The test outcome is positive if the probe hybridizes
to some virus in the biological sample and
negative otherwise. Test outcomes for all selected
probes can be written as a column vector, which
is exactly the union of the columns corresponding
to the viruses contained in the biological sample, where
a 1-entry denotes a positive outcome and a 0-entry
denotes a negative outcome. Therefore, the definition
of a d̄-separable matrix means that different
sets of at most d viruses receive different
t-dimensional test-outcome vectors.

The nonunique probe selection problem can 
also be formulated as follows: 


MIN-d̄-SS (Minimum d̄-Separable Submatrix).
Given a binary matrix M, find a minimum subset of
rows that forms a d̄-separable submatrix.


For any fixed d, MIN-d̄-SS is NP-hard [3].
Moreover, given the test outcome obtained from
a d̄-separable matrix, it may take time $O(n^d)$ to find all
existing viruses. This means that it is hard to decode
the test outcome from a d̄-separable matrix
[3]. Therefore, Thai et al. [10] proposed using a
d-disjunct matrix instead of a d̄-separable matrix.
A binary matrix is d-disjunct if no union of d
columns contains any other column.
Decoding the test outcome from a d-disjunct matrix
is very easy [3]. This introduces another minimization
problem:


MIN-d-DS (Minimum d-Disjunct Submatrix). 
Given a d-disjunct binary matrix M, find a 
minimum subset of rows to form a d-disjunct
submatrix.
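
To illustrate why decoding from a d-disjunct matrix is easy, the sketch
below uses the standard elimination rule from group testing: an item is
reported as present exactly when no pool containing it tested negative.
The function names and the list-of-rows matrix layout are assumptions made
for this example; the check for d-disjunctness is a brute-force test that
is only practical for small d.

```python
from itertools import combinations

def is_d_disjunct(matrix, d):
    """True if no union of d columns contains any other column."""
    n = len(matrix[0])
    for c in range(n):
        others = [j for j in range(n) if j != c]
        for group in combinations(others, d):
            # Column c is contained in the union of `group` if every row
            # with a 1 in column c also has a 1 in some column of `group`.
            if all(any(row[j] for j in group) for row in matrix if row[c]):
                return False
    return True

def decode_disjunct(matrix, outcome):
    """Items all of whose pools tested positive (naive elimination decoding)."""
    n = len(matrix[0])
    return [j for j in range(n)
            if all(outcome[i] for i, row in enumerate(matrix) if row[j])]
```

With a d-disjunct matrix and at most d viruses present, every absent virus
has at least one probe that hybridizes to it and to none of the present
viruses, so that probe tests negative and rules the virus out. Decoding
therefore takes time linear in the size of the matrix.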


Theoretically, there is another similar problem 
as follows: 


MIN-d-SS (Minimum d-Separable Submatrix).
Given a d-separable binary matrix M, find a
minimum subset of rows to form a d-separable
submatrix, where a binary matrix is d-separable if
all Boolean sums of exactly d columns are distinct.


For d = 1, MIN-d-SS is exactly the minimum
test cover problem [5], also called the minimum
test set problem [2] or the minimum test collection
problem [6], which has a greedy approximation with
performance ratio 1 + 2 ln n, where n is the number of
items [2]. This fact suggests designing
greedy approximations for MIN-d̄-SS, MIN-d-SS,
and MIN-d-DS.

In fact, it is easy to construct greedy approximations
with performance ratio 1 + 2d ln n for
MIN-d-SS, 1 + (d + 1) ln n for MIN-d-DS, and
1 + 2d ln(n + 1) for MIN-d̄-SS. For example,
let us study MIN-d-DS. Consider the collection
S of all possible pairs (C, D) of one column
C and a submatrix D with d columns other than C. Clearly
$|S| < n^{d+1}$. A row is said to cover such a pair
(C, D) if and only if at this row, the entry of
column C is 1 and all entries of columns in D are
0. Now, MIN-d-DS is equivalent to the problem
of finding the minimum number of rows covering
all such pairs. This is a special case of the set
cover problem. It is well known that there is a
greedy algorithm for the set cover problem with
performance ratio $1 + \ln|S| < 1 + (d+1)\ln n$.
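
The reduction to set cover described above can be sketched as follows.
This is an illustrative implementation under stated assumptions rather
than the algorithm of any cited paper: the name greedy_min_d_ds is
hypothetical, the matrix is a list of rows of 0/1 entries, and all pairs
(C, D) are enumerated explicitly, which is feasible only for small d.

```python
from itertools import combinations

def greedy_min_d_ds(matrix, d):
    """Greedy set cover over pairs (C, D); returns sorted indices of chosen rows."""
    n = len(matrix[0])
    pairs = [(c, group)
             for c in range(n)
             for group in combinations([j for j in range(n) if j != c], d)]

    def covers(row, pair):
        c, group = pair
        # A row covers (C, D) if it has a 1 in column C and 0s in all of D.
        return row[c] == 1 and all(row[j] == 0 for j in group)

    uncovered = set(pairs)
    chosen = []
    while uncovered:
        # Pick the row covering the largest number of still-uncovered pairs.
        best = max(range(len(matrix)),
                   key=lambda i: sum(covers(matrix[i], p) for p in uncovered))
        gained = {p for p in uncovered if covers(matrix[best], p)}
        if not gained:
            raise ValueError("input matrix is not d-disjunct")
        chosen.append(best)
        uncovered -= gained
    return sorted(chosen)
```

A selected set of rows forms a d-disjunct submatrix exactly when every pair
is covered, so the loop terminates with a feasible solution whenever the
input matrix itself is d-disjunct.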

This greedy algorithm works well only for
small d because its running time is $O(n^{d+1})$.
When d is large, it runs too slowly. Therefore, we
must look for other, smarter methods. Schliep, Torney,
and Rahmann [9] proposed an algorithm which
adds probes one by one until the incidence matrix
with the considered viruses forms a d̄-separable
matrix. This does not work for large d either.
In fact, if d is not bounded, then testing whether
a binary matrix is d-separable, d̄-separable,
or d-disjunct is co-NP-complete [3]. There exist
more methods [8] in the literature, which work
well for small d. However, no efficient method
has been found to produce good solutions for
larger d.

In some applications, the pool size
cannot be too big due to the sensitivity of
tests. For example, UNH suggested that in AIDS
testing, each pool should not contain more
than five blood samples. When the pool size
is bounded, the problem becomes easier. For
instance, let us consider the case that every
pool has size at most 2, so that all pools of
size 2 together with the items form a graph G
where pools are edges and items are vertices.
Halldórsson et al. [6] and De Bontridder
et al. [2] proved that in this case, MIN-1-SS is
still APX-hard, which means that there is no
polynomial-time approximation scheme for it
unless NP = P. They also showed that MIN-1-SS
in this case has a polynomial-time approximation
with performance ratio 7/6 + ε for any fixed
ε > 0.

A surprising result was shown by Wang
et al. [11]: a subgraph H of G represents
a d-disjunct matrix if and only if every vertex in
H has degree at least d + 1, and hence, finding
such an H with the minimum number of edges is
polynomial-time solvable. What about the case
that all pools have size 3? Wang et al. proved that
in this case MIN-d-DS is still NP-hard. However,
there exist polynomial-time approximations with
better performance.


Applications 


In practice, we may select nonunique probes in 
the following steps [9]: 


Step 1. Estimate an upper bound d for the num- 
ber of viruses existing in a given biologi- 
cal sample. Collect a large set of nonunique 
probes to form a d̄-separable matrix.

Step 2. From this large set of probes, find a 
subset of probes to identify up to d viruses 
by computing an approximation solution for 
MIN-d-DS or MIN-d̄-SS.

Step 3. Decode the presence or absence of 
viruses in the given biological sample from 
test outcome. 


Open Problems 


When d is not fixed, MIN-d-DS belongs to $\Sigma_2^p$
and is conjectured to be $\Sigma_2^p$-complete [3].


Problem Definition


The topic of prophet inequality has been studied
in optimal stopping theory since the 1970s
[7, 9, 10] and more recently in computer science
[1, 3, 6, 8]. In the prophet inequality setting,
we are given (not necessarily identical) independent
distributions $D_1, \ldots, D_n$, a sequence of random
variables $x_1, \ldots, x_n$ where $x_i$ is drawn from $D_i$, and
a collection M of feasible subsets of $\{1, \ldots, n\}$.
An onlooker has to choose from the succession
of these values, where $x_i$ is revealed at
time step i. The onlooker starts with an empty
set $S = \emptyset$. Upon the arrival of a value $x_i$,
the onlooker can choose to either add $x_i$ to
the set S or discard it permanently. After the
arrival of all values, the indices of values in S
should form a feasible set in M. The revenue
of the onlooker is the total value of variables in
S. The onlooker’s goal is to maximize his/her
(expected) revenue compared to the hindsight
expected revenue of a prophet who knows the
drawn values in advance. The optimal offline
solution (the prophet’s revenue) is defined as
$OPT = E\left[\max_{I \in M} \sum_{i \in I} x_i\right]$. The competitive
ratio of an algorithm for the onlooker is defined
as the worst-case ratio of the expected revenue of
the onlooker over OPT. This inequality has
been interpreted as meaning that a prophet with
complete foresight has only a bounded advantage
over an onlooker who observes the variables
one by one, and this explains the name prophet
inequality.


Different Variants 

The basic prophet inequality discovered by Krengel,
Sucheston, and Garling in the 1970s concerns
the case in which the onlooker can only
choose one value [9], i.e., M is the set of singletons.
Decades later in 2007, Hajiaghayi, Kleinberg,
and Sandholm [6] considered the k-choice
prophet secretary variant in which sets with at
most k elements are feasible. Later in 2012,
Kleinberg and Weinberg [8] considered the more
general matroid prophet inequality. In this variant
the collection M contains the independent sets of
a matroid.

Other prophet inequality settings (dependent
$D_i$’s, restricted prophets, etc.) have been considered
in the literature as well. For an overview of
these models, we refer the reader to [6, 8] and
references therein.


Key Results 


Krengel, Sucheston, and Garling [9] were the first to
consider the basic prophet inequality. Using a very
simple example, they showed that no online algorithm
can have a competitive ratio better than 1/2: let
$q = 1/\epsilon$. The first value, i.e., $x_1$, is always 1. The
second value is either q with probability $\epsilon$ or 0
with probability $1 - \epsilon$. Observe that the expected
revenue of any (randomized) online algorithm
is at most $\max\{1, \epsilon \cdot (1/\epsilon)\} = 1$. However, the
prophet, i.e., the optimum offline solution, would
choose $x_2$ if $x_2 = q$; otherwise he would choose
the first value. Thus the optimum offline revenue
is $(1 - \epsilon) \times 1 + \epsilon \cdot (1/\epsilon) \approx 2$. We note that without
considering stochastic assumptions, we cannot
hope to get any constant competitive ratio.

An algorithm for the basic prophet inequality 
problem can be described by setting a threshold 
for every step: we stop at the first step that 
the arriving value is higher than the threshold 
of that step. The classical prophet inequality re- 
sult [9] states that by choosing the same threshold 
OPT/2 for every step, one achieves the tight 
competitive ratio of 1/2. 
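
The single-threshold rule can be tried out numerically. The sketch below
is an illustration under stated assumptions, not part of the cited
results: the function names are hypothetical, each distribution is passed
as a callable that draws one value from a random generator, and OPT is not
computed exactly but estimated from the same samples that are then used to
evaluate the rule.

```python
import random

def threshold_rule(values, threshold):
    """Return the first value exceeding the threshold, or 0.0 if none does."""
    for v in values:
        if v > threshold:
            return v
    return 0.0

def estimate_ratio(samplers, trials=20000, seed=0):
    """Monte Carlo estimate of (revenue of threshold OPT/2) / OPT."""
    rng = random.Random(seed)
    draws = [[draw(rng) for draw in samplers] for _ in range(trials)]
    # Estimate of OPT = E[max_i x_i] for the single-choice setting.
    opt = sum(max(row) for row in draws) / trials
    alg = sum(threshold_rule(row, opt / 2) for row in draws) / trials
    return alg / opt

# Example: two independent values, each uniform on [0, 1).
ratio = estimate_ratio([lambda r: r.random(), lambda r: r.random()])
```

On benign inputs such as this one, the estimated ratio is typically well
above 1/2; the bound of 1/2 is attained only by worst-case instances like
the two-value example given earlier.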

For the k-choice variant, Hajiaghayi et al. [6]
show an algorithm with the competitive ratio
$1 - O\left(\sqrt{\ln k / k}\right)$. Later Alaei [1] improved this
bound to $1 - \frac{1}{\sqrt{k+3}}$ using an involved randomized
approach (gamma-conservative magician). Alaei,
Hajiaghayi, and Liaghat simplified and generalized
these results to the matching prophet inequality
[2,3]. Later they generalized their result
to the online stochastic generalized assignment
problem [4] (GAP) with a slightly worse competitive
ratio of $1 - \frac{1}{\sqrt{k}}$. In GAP, we have a set
of items to be placed in a set of bins. The bins
are known in advance, but the sequence of items 
arrives online; each item has a value and a size; 
upon arrival, an item can be placed in one of 
the bins or can be discarded permanently; the 
objective is to maximize the total value of the 
placement. Both value and size of an item may 
depend on the bin in which the item is placed; 
the size of an item is revealed only after it has 
been placed in a bin; distribution information is 
available about the value and size of each item 
in advance (not necessarily i.i.d.); however, items 
arrive in adversarial order (nonadaptive adversary).
Alaei et al. [4] show an algorithm with the
competitive ratio of $1 - \frac{1}{\sqrt{k}}$, where in this setting
k is interpreted as the minimum number of items
that can fill up the capacity of a bin.

Kleinberg and Weinberg [8] considered the
matroid prophet inequality. They show an elegant
algorithm that still achieves the competitive ratio
of 1/2. Generalizing their result still further, they
show that under an intersection of p matroid
constraints, the prophet’s revenue exceeds the
onlooker’s by a factor of at most O(p), and
this factor is also tight. Kleinberg and Weinberg
design the following algorithm for the matroid
prophet inequality. The algorithm pretends that
the online selection process is Phase 1 of a two-phase
game; after each $x_i$ has been revealed in
Phase 1 and the algorithm has accepted some set
$A_1$, Phase 2 begins. In Phase 2, a new weight will
be sampled for every matroid element, independently
of the Phase 1 weights, and the algorithm
will play the role of the prophet on the Phase 2
weights, choosing the max-weight subset $A_2$ such
that $A_1 \cup A_2$ is independent. However, the revenue
for choosing an element in Phase 2 is only
half of its value. When observing element i and
deciding whether to select it, our algorithm can
be interpreted as making the choice that would
maximize its expected revenue if Phase 1 were
to end immediately after making this decision
and Phase 2 were to begin. Of course, Phase 2
is purely fictional: it never actually takes place,
but it plays a key role in both the design and
the analysis of the algorithm. The analysis of
the algorithm is involved and relies on a careful
analysis of the expected revenue at each step. For
further intuitions about the analysis, we refer the
reader to [8].


Applications 


Beyond their interest as theorems about pure on- 
line algorithms or optimal stopping rules, prophet 
inequalities also have applications to mechanism 
design. Mechanism design has traditionally fo- 
cused on the offline setting where all agents 
are present up front. However, many electronic 
commerce applications do not fit that model be- 
cause the agents can arrive and depart dynami- 
cally. This is characteristic, for example, of online 
ticket auctions, search keyword auctions, Internet 
auctions, and scheduling computing jobs on a
cloud. The online aspect is characteristic of some
important traditional applications as well, such as 
the sale of a house, where the buyers arrive and 
depart dynamically. 

The pioneering work of Hajiaghayi, Kleinberg,
and Sandholm [6] initiated the research on the 
relationship between algorithmic mechanism de- 
sign and prophet inequalities. They observed that 
algorithms used in the derivation of prophet in- 
equalities, owing to their monotonicity proper- 
ties, could be interpreted as (temporarily) truthful 
online auction mechanisms and that the prophet 
inequality in turn could be interpreted as the 
mechanism’s approximation guarantee. Indeed, 
Bayesian optimal mechanism design problems 
provide a compelling application of prophet in- 
equalities in economics. In such a Bayesian mar- 
ket, we have a set of n agents with private types 
sampled from (not necessarily identical) known
distributions. Upon receiving the reported types, 
a seller has to allocate resources and charge prices 
to the agents. The goal is to maximize the seller’s 
revenue in equilibrium. Chawla et al. [5] pio- 
neered the study of the approximability of a spe- 
cial class of such mechanisms, sequential posted 
pricing (SPM): the seller makes a sequence of 
take-it-or-leave-it offers to agents, offering an 
item for a specific price. They show that, although
simple, SPMs approximate the optimal revenue
in many different settings. Therefore prophet in- 
equalities directly translate to approximation fac- 
tors for the seller’s revenue in these settings 
through standard machineries. Indeed one can 
analyze the so-called virtual values of winning 
bids introduced by Roger Myerson [11], to prove 
via prophet inequalities that the expected virtual 
value obtained by the SPM mechanism approxi- 
mates an offline optimum that is with respect to 
the exact types. Chawla et al. [5] provide a type
of prophet inequality in which one can choose the
ordering of agents. As mentioned before, Klein- 
berg and Weinberg [8] later improved their result 
by giving an algorithm with the tight competitive 
ratio of 0.5 for an adversarial ordering. 


Problem Definition


The quadtree describes a class of data structures 
for geometric objects. A quadtree partitions space 
hierarchically using a stopping rule that decides 
when a region is small enough so that it does not 
need to be subdivided further. If the space is d 
dimensional, a quadtree recursively divides a d- 
dimensional hypercube containing the input data
into $2^d$ hypercubes until each region satisfies the
given stopping rule. In 2D, the hypercubes are 
squares. Three-dimensional quadtrees are also 
known as octrees. Quadtrees have been used for 
many types of data, such as points, line segments, 
polygons, rectangles, curves, and images, and for 
many types of applications. For a detailed presen- 
tation, we refer to the book by Samet [10]. While 
their worst-case behavior is good only in some 
simple cases, quadtrees perform well empirically 
in many applications. 

A quadtree can be stored as a tree that corre- 
sponds to the hierarchical subdivision of the input 
region. A region that is subdivided further is then 
represented by a node with four children, one for 
each quadrant; the cells that are not subdivided 
further constitute the leaves of the tree and repre- 
sent a subdivision of the input region. A quadtree 
with m leaves has exactly (m − 1)/3 internal
nodes and 4m/3 − 1/3 nodes in total. Hence it
can be described by a sequence of 4m/3 − 1/3
bits representing the nodes of the tree in preorder, 
where each internal node is represented by a 1 
(meaning that the next bit encodes its first child) 
and each leaf is represented by a 0. However, 
for efficient navigation one would typically use 
a pointer-based data structure. Alternatively, one 
may store only the leaves of the tree, ordered 
along a space-filling curve. This variant of the 
quadtree is called the linear quadtree and was 
introduced by Gargantini [3]. The linear quadtree 
has smaller memory requirements as it does not 
store the tree structure but only the data in the
leaves. This makes it particularly useful when
dealing with large data. 
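
The preorder bit encoding described above can be sketched in a few lines.
The representation chosen here is an assumption made for illustration: a
leaf is None and an internal node is a list of its four children; the
function names are hypothetical.

```python
def encode(node):
    """Preorder bit string: '1' for an internal node, '0' for a leaf."""
    if node is None:
        return "0"
    return "1" + "".join(encode(child) for child in node)

def decode(bits):
    """Rebuild the nested-list tree from its preorder bit string."""
    def parse(pos):
        if bits[pos] == "0":
            return None, pos + 1
        children = []
        pos += 1
        for _ in range(4):  # every internal node has exactly four children
            child, pos = parse(pos)
            children.append(child)
        return children, pos

    tree, end = parse(0)
    if end != len(bits):
        raise ValueError("trailing bits after a complete tree")
    return tree

# A root whose second quadrant is subdivided once more: m = 7 leaves.
tree = [None, [None, None, None, None], None, None]
code = encode(tree)
```

Here m = 7, so there are (m − 1)/3 = 2 internal nodes and the code has
4m/3 − 1/3 = 9 bits, in agreement with the counts stated above.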

In this entry we focus on quadtree construction 
algorithms that are efficient on very large data. To 
analyze these algorithms, we use the I/O-Model
and the Cache-Oblivious Model. We’ll use the
terms linear quadtree and quadtree subdivision 
interchangeably. We define the size of a subdivi- 
sion as the number of cells it contains and the size 
of a cell is the size of the data (points and edges) 
it contains/intersects. 


The Complexity of Quadtrees for Points in the Plane

Let P be a set of n points in the plane and 
assume, for simplicity, that the points lie in the 
unit square. A quadtree for P corresponds to a 
recursive subdivision of the unit square into four 
equal regions, called canonical squares, quad- 
rants, or cells, until each square contains at most 
one point. Following customary terminology in 
the computational geometry literature (and in 
deviation from Samet [10]), we refer to this 
generically as a point quadtree. 

In the worst case, the size of a quadtree sub- 
division on P cannot be bounded by a function
of n. If ε is the distance between the two
closest points in P, the worst-case complexity is
$O(n \lg(1/\epsilon))$, and the corresponding tree may have
a large number of empty nodes. A compressed 
quadtree is a quadtree where paths of nodes that 
each have three empty children are merged into a 
single node along with their empty children; the 
region corresponding to the merged node is called 
a donut and represents the difference between 
two canonical squares. A compressed quadtree 
for a set of n points in the plane such that each 
cell contains at most one point has size O(n) and 
height O(n) in the worst case. 
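
The dependence of an uncompressed point quadtree on the distance between
close points can be observed with a small recursive sketch. It is an
illustration only: the function name is hypothetical, points are assumed
to be distinct pairs of floats in the half-open unit square, and the
function merely counts leaves and measures the height instead of storing
the tree.

```python
def build(points, x0=0.0, y0=0.0, size=1.0):
    """Return (number of leaves, height) of the point quadtree on `points`.

    Assumes distinct points inside the square [x0, x0+size) x [y0, y0+size).
    """
    if len(points) <= 1:
        return 1, 0  # stopping rule: at most one point per cell
    half = size / 2
    leaves, height = 0, 0
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = x0 + dx * half, y0 + dy * half
            inside = [(x, y) for (x, y) in points
                      if cx <= x < cx + half and cy <= y < cy + half]
            sub_leaves, sub_height = build(inside, cx, cy, half)
            leaves += sub_leaves
            height = max(height, sub_height)
    return leaves, height + 1
```

Two far-apart points are separated by a single split, whereas the two
points (0.1, 0.1) and (0.2, 0.1) need three levels, most of whose cells
are empty; moving the points closer together increases the depth further,
independently of n.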


The Complexity of Quadtrees for Line 
Segments in the Plane 

Let E be a set of n non-intersecting line segments
in the plane (for example, the edges of a
planar subdivision) and assume, as above, that
the edges lie in the unit square. We refer to a
quadtree for E generically as an edge quadtree
and assume that each edge is stored in all the cells
that it intersects. The simplest way to define an
edge quadtree may be to take a point quadtree
on the endpoints of the edges and then store
each edge with the leaves that correspond to the
quadtree cells intersected by the edge. We denote
by I the number of intersections between E and
the cells in the subdivision. Even if we use a
compressed quadtree, in the worst case, there can
be $\Theta(n)$ cells that each intersect $\Theta(n)$ edges,
so $I = \Theta(n^2)$, and the quadtree will have size
$\Theta(n + I) = \Theta(n^2)$. Other edge quadtrees can
be defined by formulating stopping criteria that
allow subdividing cells further in order to limit
the number of edges that intersect each cell;
this will result in a subdivision with more cells
but a smaller number of edges per cell. However,
obtaining a good trade-off between the size of
a cell (number of points and edges inside or
intersecting it) and the number of cells in the
subdivision is not possible in the worst case. Note
that an edge quadtree that splits a region until it
intersects a single edge will result in a subdivision
of unbounded size since the distance between two
edges can be arbitrarily small.


Quadtrees and Morton Indexing 

Quadtrees are often used in conjunction with a 
z-order space-filling curve. A z-order, or Morton 
order, can be understood as a mapping from two- 
dimensional (in general multidimensional) data 
to one dimension. We use a z-order curve that 
visits the four quadrants of the initial square, 
recursively, in the order top left, top right, bot- 
tom left, and bottom right. This order gives a 
well-defined ordering between any two canonical 
squares in the subdivision. If we define canonical 
squares to be closed on the top and left side and 
open on the bottom and right side, the z-order also 
gives a well-defined ordering between any two 
points in the input region. Let $p = (p_x, p_y)$ be
a point in the unit square $[0, 1)^2$, with the x-axis
oriented from left to right and the y-axis oriented
from top to bottom. We define the z-index Z(p)
of p to be the value in the range [0, 1) obtained
by interleaving the bits in the fractional parts of
$p_x$ and $p_y$, starting with a bit from $p_y$. The value
Z(p) is sometimes called the Morton block index
of p. The z-order of two points in the unit square
is the order of their z-indices. A crucial property
is that the z-indices of all points in a canonical
square σ form an interval $[z_1, z_2)$ of [0, 1), where
$z_1$ is the z-index of the top left corner of σ. A
donut cell is the difference between two canonical
squares $[z_1, z_2)$ and $[z_3, z_4)$, and thus, it is the
union of two intervals $[z_1, z_3)$ and $[z_4, z_2)$.
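
The bit interleaving that defines the z-index can be written out as
follows. This is a minimal sketch under stated assumptions: the function
name is hypothetical, coordinates are floats in [0, 1), and only the first
16 fractional bits of each coordinate are used, so the result is a
truncated z-index.

```python
def z_index(px, py, bits=16):
    """Truncated z-index of (px, py): interleave fractional bits, y bit first."""
    scale = 1 << bits
    ix, iy = int(px * scale), int(py * scale)  # first `bits` fractional bits
    z = 0
    for b in range(bits - 1, -1, -1):          # most significant bit first
        z = (z << 1) | ((iy >> b) & 1)         # bit from p_y comes first
        z = (z << 1) | ((ix >> b) & 1)         # then the bit from p_x
    return z / (1 << (2 * bits))               # scale back into [0, 1)
```

With the y-axis oriented from top to bottom, the top left corners of the
four quadrants of the unit square receive the z-indices 0, 0.25, 0.5, and
0.75 in the order top left, top right, bottom left, bottom right, and
every point of the top left quadrant falls into the interval [0, 0.25).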

With this notation, a (compressed) quadtree
subdivision corresponds to a subdivision Q of
the z-order curve and can be viewed as a set
of consecutive, adjacent, nonoverlapping intervals,
covering [0, 1), in z-order:
$Q = \{[z_1 = 0, z_2), [z_2, z_3), \ldots\}$.
Each interval $[z_i, z_{i+1})$ corresponds
to a cell $\sigma_i$, which is either a canonical
square or a part of a donut. We note that this representation
does not make any assumptions on the
stopping criterion used to generate the quadtree
subdivision and thus works on any quadtree subdivision,
no matter how many points are in a
region and whether it is compressed or not. A
linear edge quadtree can therefore be represented
as a set of key-edge pairs, where each intersection
of an edge e with a quadtree cell σ corresponding
to an interval $[z_1, z_2)$ is represented by storing
edge e with key $z_1$; thus each cell stores all edges
that intersect it [2,4].
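
The interval view of a linear edge quadtree suggests a simple in-memory
lookup, sketched below. The function names and the choice of a sorted list
plus a dictionary are assumptions for illustration; the structures in the
cited works are external-memory structures and are not reproduced here.

```python
from bisect import bisect_right
from collections import defaultdict

def locate(starts, z):
    """Index i of the cell [starts[i], starts[i+1]) that contains z-index z.

    `starts` is the sorted list of interval starts z_1 = 0 < z_2 < ...
    """
    return bisect_right(starts, z) - 1

def build_edge_index(pairs):
    """Group key-edge pairs (interval start, edge) by their key."""
    index = defaultdict(list)
    for key, edge in pairs:
        index[key].append(edge)
    return index

def edges_at(starts, index, z):
    """Edges stored with the cell whose interval contains z."""
    return index.get(starts[locate(starts, z)], [])
```

For example, with interval starts 0, 0.25, 0.5, 0.75 and an edge stored
under the keys 0 and 0.25, a query with z-index 0.3 is answered by a
binary search followed by a lookup of key 0.25.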


Key Results 


Point Quadtrees 
Agarwal et al. [1] described an algorithm for
constructing a quadtree on a set of n points in the
plane such that each cell contains O(k) points;
the algorithm runs in $O\left(\frac{n}{B} \cdot \frac{h}{\log(M/B)}\right)$ I/O’s, where
h is the height of the quadtree, and M and B denote
the memory size and the block size of the I/O-model.
Effectively, this is
O(sort(n)) I/O’s only when h = O(log n), which
is true when the points are nicely distributed. A
bound on the size of the quadtree is not given, and
the quadtree is not compressed, which means the
quadtree size can be unbounded in the worst case.
The algorithms were implemented and tested
as part of an application to interpolate LIDAR
datasets, which are nicely distributed and unlikely
to cause worst-case behavior.

De Berg et al. [2] described an algorithm to
construct a compressed quadtree subdivision with
at most one point per cell in O(sort(n)) I/O’s, as
a step in the construction of their Guard-quadtree
for edges, which is discussed below. Haverkort
et al. [4] describe a simple generalization of
this algorithm which constructs a compressed
quadtree subdivision of O(n/k) cells with at
most k points per cell in the same I/O-bound.
Thus, compared to the algorithm by Agarwal
et al., a stronger bound on the I/O-complexity
is obtained, along with an upper bound on the
number of cells in the subdivision.


PM Quadtrees 

A variety of edge quadtrees were described by 
Samet and various co-authors [5-7, 9, 11, 12]. All
of these solutions are aimed at subdividing the 
cells that intersect too many edges, while also 
limiting the total size of the quadtree and being 
able to construct it I/O-efficiently.

The PM quadtree [11] allows a region to 
contain more than one edge if the edges meet 
at a vertex inside the region; otherwise it keeps 
subdividing it. Variants of PM quadtrees differ 
in how to handle regions that contain no vertices 
(only edges). The segment quadtree [12] is a lin- 
ear quadtree in which a leaf cell is either empty, 
contains one edge and no vertices, or contains 
precisely one vertex and its incident edges. The 
most versatile structure within the PM family is 
the PMR quadtree [9], a linear quadtree where 
each region may have a variable number of seg- 
ments and regions are split if they contain more 
than a predetermined threshold of edges. The tree 
is built incrementally, by inserting each segment 
into all the regions that it intersects. When a 
region contains more segments than a prede- 
termined splitting threshold, the region is split, 
once, into four quadrants. Improved algorithms 
for the construction (or bulk loading) of the PMR 
quadtree were described in [5-7]. These algo- 
rithms are developed and optimized with massive 
data in mind and use I/O-efficient sorting as one
of the steps. It is reported that in many cases 
(although not in the worst case), the I/O-cost of
the bulk-loading algorithm is the same as that of 
external sorting [5]. The algorithms are reported 
to perform well in practice, but there are several 
disadvantages: the resulting quadtree depends
on the insertion order; complexity is analyzed
in terms of various parameters that depend on
the data; and the performance is not worst-case 
optimal. On the plus side, the algorithms can 
handle insertions and work in situations where 
the data is dynamic. 


Star-Quadtrees 

The Star-quadtree by De Berg et al. [2] is de- 
signed for fat triangulations (a triangulation is fat 
if every angle of every triangle is larger than some 
fixed positive constant δ). A Star-quadtree is a
linear, uncompressed edge quadtree that splits a 
region until all edges intersecting a region are 
incident on one common endpoint (similar to 
the PM quadtree by Samet and Webber [11]). 
The Star-quadtree can be built on any set of 
edges in the plane, but, when the input is a fat 
triangulation, it can be shown that this stopping
rule creates a quadtree (1) of O(n) size and (2)
in which each leaf cell (each cell in the
subdivision) intersects O(1) edges. The height
of the quadtree can still be O(n), which makes 
a top-down construction, such as that used by 
Agarwal et al. [1], height dependent and not op- 
timal. The authors of the Star-quadtree describe 
a completely different algorithm for its construc- 
tion that crucially exploits the stopping criterion 
and runs in O(sort(7)) 1/o’s if the input is a fat 
triangulation. 


Guard-Quadtrees 

The Guard-quadtree by De Berg et al. [2] is 
designed for sets of non-intersecting edges of 
low density: a set of edges has density λ if
any disk D is intersected by at most λ edges
whose length is at least the diameter of D. For
a given set of n edges, the authors define a set 
of at most 4n guards, namely, the vertices of 
the minimum axis-parallel bounding rectangles 
of the individual edges. The Guard-quadtree is 
a linear, compressed edge quadtree that splits a 
region until it contains at most one guard. As 
the set of guards is a superset of the endpoints 
of the edges, this leads to a subdivision that is 
more refined than a quadtree built only on the 
endpoints of edges. The stopping rule, together 
with compression, leads to a quadtree subdivision 
that has O(n) cells and each cell intersects O(1)
edges, provided the set of edges to be stored has
low density. Furthermore, the quadtree can be
constructed in O(sort(n)) I/Os in this case.
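
The guard set described above is easy to compute directly. The following is a minimal sketch, assuming edges are given as pairs of (x, y) endpoint tuples; the function name `guards` and the input representation are illustrative choices and do not come from [2].

```python
from typing import Iterable

Point = tuple[float, float]
Edge = tuple[Point, Point]


def guards(edges: Iterable[Edge]) -> set[Point]:
    """Collect the corners of each edge's minimum axis-parallel bounding box.

    Every edge contributes at most 4 guards, and its two endpoints are
    always among them, so the guard set is a superset of the endpoints.
    """
    result: set[Point] = set()
    for (x1, y1), (x2, y2) in edges:
        xs = (min(x1, x2), max(x1, x2))
        ys = (min(y1, y2), max(y1, y2))
        for x in xs:
            for y in ys:
                result.add((x, y))  # duplicates collapse in the set
    return result
```

An axis-parallel edge has a degenerate bounding box, so it contributes only its two endpoints; this is why the bound is "at most 4n" rather than exactly 4n.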


K-Quadtrees 

Combining ideas from De Berg et al. [2] with
packing more vertices in a region, Haverkort
et al. [4] described an I/O-efficient edge quadtree
referred to as a K-quadtree. For any k ≥ 1, the
K-quadtree is a compressed, linear quadtree built
on the endpoints of the edges, with O(n/k) cells
in total and such that each cell contains O(k) vertices
(and such that each edge is stored in all cells
that it intersects). Each cell in the subdivision
can intersect O(n) edges in the worst case. For
k = 1, a K-quadtree is a linear, compressed edge
quadtree with O(n) cells and at most one vertex
per cell. Larger values of k can be chosen to trade
off between the number of cells O(n/k) and the
number of vertices in a cell O(k).

The algorithm for building a K-quadtree has
two steps: First it builds, in O(sort(n)) I/Os, a
linear, compressed quadtree subdivision on the
endpoints of the edges E with O(n/k) cells in total and
such that each cell contains O(k) vertices. This
step is a simple generalization of the algorithm
for building Guard-quadtrees from [2]. In the second
step, the K-quadtree construction algorithm
computes the intersections between the edges and
the subdivision in O(sort(n + l)) I/Os, where
l = O(n^2/k) is the total number of intersections.

The main idea of the second step of the algorithm
is to split the set of edges into edges of
positive slope E+ and edges of negative slope E-
and compute the intersections of each set separately.
The intersections of E+ with the subdivision
are computed by time-forward processing,
as follows. The cells of the quadtree subdivision
are scanned in z-order. At any point during this
scan, there is a frontier: an xy-monotone curve
that constitutes the boundary between the cells
that have already been scanned and the cells that
are still to be scanned. The algorithm relies on the
property that an edge of positive slope intersects
the cells in the subdivision in z-order (a similar
property holds for the edges of negative slope
and a reflected version of the z-order). During
the scan, each edge of E+ is passed on, from
each intersected quadtree cell to the next, through
a supporting data structure that stores the edges
intersecting the frontier.
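
To make the scanning order concrete, the sketch below computes a z-order (Morton) key by bit interleaving and sorts cells by it. It is only an illustration under simplifying assumptions: cells are equal-sized grid cells with non-negative integer coordinates, and the x bit is placed below the y bit at each level. A quadtree subdivision with cells of different sizes needs a key per cell corner plus a level, which is not shown here, and the source does not specify which of the possible z-order conventions is used.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton key: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_order(cells):
    """Sort (x, y) grid cells along the z-order curve (assumed convention)."""
    return sorted(cells, key=lambda c: interleave_bits(c[0], c[1]))
```

With this convention the four cells of a 2 x 2 grid are visited lower-left, lower-right, upper-left, upper-right, so both coordinates never decrease "on average" along the curve, which is the intuition behind positive-slope edges meeting cells in z-order.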

Unlike standard instantiations of time-forward 
processing, the supporting data structure is not 
a priority queue, but it is a list, implemented as 
two stacks, containing the edges that intersect the 
frontier, in order along the frontier. At each point 
in time, the list starts at the bottom of one stack 
and goes up to the top and then down the other 
stack. The cutting point between the two stacks 
corresponds to the current scanning position in 
the list; scanning backward or forward in the 
list for lookups and updates is implemented by 
moving elements from one stack to another. The 
key to I/O-efficiency is that the total amount of
scanning that is needed to maintain the supporting
data structure is linear in the output size,
incurring only O(scan(l)) I/Os.
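
The two-stack list can be sketched in a few lines. This is an in-memory illustration only (the class and method names are hypothetical, and an I/O-efficient version would use external-memory stacks); it shows how moving the cursor corresponds to moving elements between the stacks, and it counts those moves, which is the quantity the analysis bounds.

```python
class FrontierList:
    """A list with a cursor, stored as two stacks.

    `before` holds the items before the cursor (its top is the item just
    before the cursor); `after` holds the items after the cursor (its top
    is the item just after the cursor).
    """

    def __init__(self, items=()):
        self.before = []
        self.after = list(reversed(list(items)))
        self.moves = 0  # total scanning work performed so far

    def forward(self) -> bool:
        """Move the cursor one item forward; False if already at the end."""
        if not self.after:
            return False
        self.before.append(self.after.pop())
        self.moves += 1
        return True

    def backward(self) -> bool:
        """Move the cursor one item backward; False if already at the start."""
        if not self.before:
            return False
        self.after.append(self.before.pop())
        self.moves += 1
        return True

    def insert(self, item) -> None:
        """Insert an item immediately before the cursor."""
        self.before.append(item)

    def remove_next(self):
        """Remove and return the item just after the cursor (None if absent)."""
        return self.after.pop() if self.after else None

    def to_list(self) -> list:
        """Return the whole list in order along the frontier."""
        return self.before + self.after[::-1]
```

Reading the list from the bottom of `before` up to its top and then down `after` reproduces the order along the frontier, matching the description above.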

As the algorithm relies only on the basic
building blocks of I/O-efficient sorting, scanning,
and stacks, it is also easy to implement cache
obliviously.

Compared to a quadtree that employs a stop- 
ping criterion that aims to bound the number of 
edges intersecting a cell (like PMR, Star- and 
Guard-quadtrees), the simpler K-quadtree has a 
couple of advantages: (1) the resulting subdi- 
vision size is smaller; (2) the total size of the 
quadtree (the number of intersections between 
edges and the subdivisions) is also smaller since 
the size of the subdivision is smaller; and (3) the 
quadtree can be built in O(sort(n + l)) I/Os,
without making any assumptions about the input. 


Datasets 


Common test datasets for 2D quadtrees are 
triangulated terrains and USA TIGER data. They 
represent relatively simple classes of inputs;
however, they arise frequently in practice and
have been used extensively as test beds for
spatial index structures. The TIGER dataset
consists of 50 datasets, one for each state,
containing roads, railways, boundaries, and
hydrography in the state. The size of a dataset
ranges from 115,626 edges (Delaware) to 40.4
million edges (Texas). The TIGER datasets can
be downloaded from
http://www.census.gov/cgi-bin/geo/shapefiles2013/main.


Experimental Results 


Since many of the quadtree algorithms perform 
much better in practice than their theoretical 
worst-case bounds, experimental analysis is an 
important way to assess their merits. Some of
the earliest experimental analysis of quadtree
performance on massive data was by Hjaltason and
Samet [6]. They describe ample results concerning
the practical performance of PMR quadtrees in
terms of construction time, insertions and bulk 
insertions, comparison with R-tree bulk-loading, 
and, as an application, performance of spatial join 
using quadtrees to store the datasets. Their test 
data consists of TIGER datasets ranging from 
40K lines to approximately 260K edges on a 
machine with 64MB RAM. 

Agarwal et al. [1] implemented and tested
their I/O-efficient point quadtree as part of an
application to interpolate LIDAR datasets, where it
was used specifically for batched neighbor find- 
ing (finding the points in all neighbor leaves for 
each quadtree leaf). The algorithms are scalable 
up to at least 500 million points (20GB raw data) 
(their platform was an Intel 3.4GHz machine with 
1GB RAM running Linux). 

Haverkort et al. [4] described an experimental 
analysis of K-quadtrees reporting on the con- 
struction time and size of the quadtree (number 
of cells and number of edge-cell intersections) for 
various values of k, as well as computing a spatial 
join using K-quadtrees. The K-quadtree construction
algorithm is efficient and scalable, with the
running time decreasing as more points are
packed into a leaf. Even though the number of
edges intersecting a cell may be large, the average 
size of a cell stays low and the total size of 
the quadtree is linear. Their tests use TIGER 
data with the largest bundle, corresponding to 
the entire USA, having approximately 427 mil- 
lion edges, on a machine with 512MB RAM. 
A comparison with the PMR quadtree results of
Hjaltason and Samet [6] is difficult because of the
difference in platforms.


Extensions 


A series of recent results have shown that com- 
pressed quadtrees and Delaunay triangulations 
are equivalent structures, in the sense that a com- 
pressed quadtree of a set of points P in the 
plane can be computed in linear time given the 
Delaunay triangulation DT(P); and the other 
way around, the Delaunay triangulation DT(P) 
can be computed in linear time given a compressed
quadtree of P; see, for example, Löffler
and Mulzer [8]. In the I/O model, both problems
can be solved in O(sort(n)) I/Os. This naturally
raises the question of whether one can be computed
from the other in O(scan(n)) I/Os.


Problem Definition


A network representation of a complex system 
comprises nodes, which represent system 
elements, and edges, which represent interactions 
between the elements. Networks may be
described in terms of their topology; for instance,
some nodes may be connected to an atypically
large number of other nodes, and some may act
as bridge nodes that participate in paths between
many other pairs of nodes (Fig. 1). For a review
of topological network measures, see [1-3].

Quantification of Regulation in Networks with
Positive and Negative Interaction Weights, Fig. 1
Common network measures applied to a sample 9-node
network with symmetric interactions. Darker nodes have
higher betweenness centrality (i.e., they tend to act as a
bridge between other pairs of nodes); note that even nodes
with low degree (i.e., few connections) may have high
betweenness centrality. Highlighted edges show a shortest
path (length 4) between nodes 1 and 8

In some contexts, this topological structure 
serves as a basis for a dynamical description, 
where nodes are characterized by a dynamic 
variable that is regulated by the node’s interac- 
tions. For instance, in the Boolean framework, 
nodes are either ON or OFF (1 or 0, respec- 
tively) [4]. In biological regulatory networks, 
where interactions between system elements can 
represent both upregulation and downregulation, 
one common dynamic scheme is summative [5]: 


$$x_i(t+1) = \operatorname{sgn}\Big[\sum_j E_{ji}\, x_j(t)\Big],$$


where $E_{ji}$ is the weight of the interaction from
node j to node i and absent interactions have
a weight of 0 by definition. In such a framework,
the state change of a node can propagate
to the node(s) it directly regulates, then to the 
node(s) they regulate, and so on. This information 
flow across a network is sometimes referred to 
as network communicability [6]. A topological 
analysis of the network should, in principle, give 
insight into its dynamical structure and address 
questions such as, “Which nodes yield a strong 
influence on many other nodes?,” “Which nodes 
are regulated in a complex way by many other 
nodes?,” and “Which nodes seem to have a pe- 
ripheral impact on the dynamics of the network?” 
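
As an illustration of the summative scheme, the sketch below performs one synchronous update. It assumes the weights are stored as a nested list with `E[j][i]` the weight from node j to node i, and it uses the mathematical sign function, which returns -1, 0, or 1; how those outputs are mapped onto ON/OFF states differs between models and is not fixed by the text above. The names `sgn` and `step` are illustrative.

```python
def sgn(v: float) -> int:
    """Sign function: -1 for negative, 0 for zero, 1 for positive input."""
    return (v > 0) - (v < 0)


def step(E, x):
    """One synchronous update of every node under the summative rule.

    E[j][i] is the weight of the interaction from node j to node i
    (0 when the interaction is absent); x is the list of current states.
    """
    n = len(x)
    return [sgn(sum(E[j][i] * x[j] for j in range(n))) for i in range(n)]
```

For a three-node chain in which node 0 activates node 1 and node 1 inhibits node 2, the state `[1, 1, 0]` is mapped to `[0, 1, -1]`: node 0 has no input, node 1 receives a positive input, and node 2 a negative one.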

However, while networks have been used to 
explicate the structure and function of a large and 
diverse array of complex systems, most network
measures consider the most general properties of
networks and are therefore ill-suited for appli- 
cation to specialized networks. The positive and 
negative edge weights typically used in biological 
regulatory networks are one such specialization: 
standard network measures do not consider edge 
weights of opposite sign and are therefore ill- 
equipped to fully capture the dynamical implica- 
tions of their topology. 

Here, we address this shortcoming by devel- 
oping a suite of topological measures that address 
the regulatory relationship between nodes that are 
connected by edges with both positive and neg- 
ative edge weights. We first consider node-node 
interactions and then summarize those measures 
to quantify both the regulatory impact of a node 
on the entire network and of the entire network 
on a node. 


Key Results 


To first consider node-node relationships, we 
introduce two complementary measures. The 
weighted node-node path count from node i to 
node j considers both the number of paths from 
node i to node j and their length: 


$$\omega_{ij} = \sum_{l=1}^{l_{\max}} \frac{p^{+}_{lij} + p^{-}_{lij}}{l}.$$

Here, $p^{+}_{lij}$ and $p^{-}_{lij}$ respectively indicate the
number of positive and negative paths from node i to
node j of length l. While we here consider a path
to be positive if it contains 0 or an even number of
negative edges, note that this measure effectively
ignores the sign of the paths. To take this into
consideration, we introduce the node-node path
influence:

$$\pi_{ij} = \sum_{l=1}^{l_{\max}} \frac{p^{+}_{lij} - p^{-}_{lij}}{l}.$$

$\pi_{ij}$ is therefore bounded by the range $[-\omega_{ij}, \omega_{ij}]$.
$\omega_{ij}$ indicates the regulatory strength insofar as it
is large when there are many short paths between
the nodes and decreases when the paths are few
and/or long; $\pi_{ij}$ indicates the overall regulatory
nature of those interactions. Values close to
0 relative to $|\omega_{ij}|$ indicate mixed (complex)
regulation, while values whose magnitude is close to
$|\omega_{ij}|$ indicate overall positive or negative regulation.

Node-network relationships may be assessed
by cumulating these measures with a fixed source
or target node. The node path influence, $\iota_i$,
and node path susceptibility, $\sigma_j$, take this into
account for a fixed source and target node, respectively:

$$\iota_i = \sum_j \pi_{ij}\,\omega_{ij},$$

$$\sigma_j = \sum_i \pi_{ij}\,\omega_{ij}.$$


The summative product results in large absolute 
values for these measures only when the regula- 
tion is both strong and consistent in sign. Nodes 
receiving low values are regulated weakly and/or 
in a complex way. 
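
The four measures can be computed by brute-force path enumeration on small networks. The sketch below is an illustration under explicit assumptions that are not taken from [7]: paths are simple (no repeated nodes), only pairs with distinct source and target are counted, edge signs are +1 or -1, and a path of length l contributes 1/l to the path count and (sign)/l to the path influence. The function names and the dictionary-based edge representation are hypothetical.

```python
from collections import defaultdict


def path_measures(edges, nodes, l_max):
    """Return (omega, pi), each a dict keyed by (source, target).

    edges maps (u, v) -> +1 or -1 for a directed edge from u to v.
    """
    succ = defaultdict(list)
    for (u, v), sign in edges.items():
        succ[u].append((v, sign))
    omega = defaultdict(float)
    pi = defaultdict(float)

    def extend(source, node, visited, length, sign):
        if length == l_max:
            return
        for nxt, s in succ[node]:
            if nxt in visited:          # keep paths simple
                continue
            new_len, new_sign = length + 1, sign * s
            omega[(source, nxt)] += 1.0 / new_len
            pi[(source, nxt)] += new_sign / new_len
            extend(source, nxt, visited | {nxt}, new_len, new_sign)

    for i in nodes:
        extend(i, i, {i}, 0, 1)
    return dict(omega), dict(pi)


def node_summaries(omega, pi, nodes):
    """Return (iota, sigma): sums of pi * omega over targets and sources."""
    def prod(i, j):
        return pi.get((i, j), 0.0) * omega.get((i, j), 0.0)
    iota = {i: sum(prod(i, j) for j in nodes) for i in nodes}
    sigma = {j: sum(prod(i, j) for i in nodes) for j in nodes}
    return iota, sigma
```

The running time grows with the number of simple paths of length at most `l_max`, which is why the enumeration is only practical for small networks or small `l_max`.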


In cases where edge weights take on values
other than ±1, the $p_{lij}$ values may readily be
modified to, for instance, the sum of the mean
interaction weights of the pertinent paths. This
modification reduces to the above definition when
edge weights are restricted to ±1. In both cases,
however, the above measures are characterized by
the parameter $l_{\max}$, which represents the longest
path considered by the algorithm. Counting all
paths of arbitrary length for all but the simplest
networks is computationally intractable, and so in
practice $l_{\max}$ must generally be a low number. We
therefore introduce a complementary measure,
strength of connection, which considers paths
of arbitrary length through network erosion. The
measure is determined for any two nodes i and
j via a procedure that assigns every node a characteristic
value. In the below pseudo-algorithm,
these values are stored in a dictionary d.


N = number of nodes in graph
d = {}
for node in graph: d[node] = infinity
if i != j:
    while a path exists between i and j:
        SP = [nodes on the shortest path between i and j]
        SPL = length of shortest path between i and j
        if SPL == 1:
            delete edge between i and j
            d[i] = d[j] = SPL
        else:
            for node in SP:
                if d[node] == infinity: d[node] = SPL
                if (node != i and node != j): remove node from graph
    return sum(1/(values in d))/(N/2)
else:
    while a cycle containing i exists:
        SP = [nodes on the shortest cycle containing node i]
        SPL = length of shortest cycle containing node i
        if SPL == 1:
            delete self-edge
            d[i] = SPL
        else:
            for node in SP:
                if d[node] == infinity: d[node] = SPL
                if node != i: remove node from graph
    return sum(1/(values in d))/((N+2)/2)


Quantification of Regulation in Networks with
Positive and Negative Interaction Weights, Fig. 2
Adapted from Figs. 1 and 2 of [7]. (a) A fully
connected 5-node network. Solid black arrows indicate
positive regulation, while dashed, red arrows indicate
negative regulation. (b) A circle at position i, j has
a size proportional to $\omega_{ij}$ ($\max(\omega_{ij}) = 2.75$) and
color determined by $\pi_{ij}/\omega_{ij}$, with positive, neutral,
and negative sign corresponding to green, black, or red
coloring, respectively. Circles are additionally identified
with a small white concentric circle if $\pi_{ij}/\omega_{ij} < -0.2$.
(c) A scatter plot of node path influence, $\iota$, and node path
susceptibility, $\sigma$. (d) The strength of connection measure
indicates which node pairs remain well connected under
network erosion; the values vary significantly despite each
node having equal degree and the network being strongly
connected


The normalization factors force the returned
value to be bounded by 1 [7]. While this algorithm
does not consider the sign of paths,
it is straightforward to modify it to, e.g., include
only those paths that are of the specified
sign. Such a modification would then yield both
a positive strength of connection and a negative
strength of connection. We demonstrate the
above-defined measures for a simple network in
Fig. 2.
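
The erosion procedure for strength of connection can be turned into runnable code along the following lines. This is a sketch under assumptions that the pseudo-algorithm leaves open: the graph is directed and given as a dictionary mapping every node to the set of its successors, shortest paths are found by breadth-first search, and ties between equally short paths are broken arbitrarily. The helper names are hypothetical, and the normalization constants are copied from the pseudo-algorithm as printed.

```python
import math
from collections import deque


def shortest_path(adj, src, dst):
    """BFS path from src to dst as a node list, or None if none exists.

    When src == dst this returns the shortest cycle through src, with src
    listed first and last (a self-edge gives [src, src]).
    """
    parent = {}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                path = [node]
                while path[-1] != src:
                    path.append(parent[path[-1]])
                return path[::-1] + [dst]
            if nxt != src and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def remove_node(adj, node):
    """Delete a node together with all edges into and out of it."""
    adj.pop(node, None)
    for successors in adj.values():
        successors.discard(node)


def strength_of_connection(graph, i, j):
    adj = {u: set(vs) for u, vs in graph.items()}  # erode a copy only
    n = len(adj)
    d = {node: math.inf for node in adj}
    while True:
        path = shortest_path(adj, i, j)
        if path is None:
            break
        spl = len(path) - 1                  # path length in edges
        if spl == 1:
            adj[i].discard(j)                # direct edge or self-edge
            d[i] = d[j] = spl
        else:
            for node in path:
                if d[node] == math.inf:
                    d[node] = spl
                if node != i and node != j:
                    remove_node(adj, node)
    total = sum(1.0 / value for value in d.values())  # 1/inf is 0.0
    return total / (n / 2) if i != j else total / ((n + 2) / 2)
```

Each pass through the loop deletes either one edge or at least one intermediate node, so the loop terminates after at most (number of nodes + 1) passes.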


Applications 


The analytical measures introduced above may 
be applied to any network with both positive and 
negative edge weights. Biological regulatory 
networks are a prime example of complex 
systems that are often modeled in this way. 
For example, the measures have been applied 
to explicate the regulatory cross talk of a 
network of the immune response responding
to both respiratory bacteria and allergen [7]. The
measures stand to inform the dynamical regula- 
tion between nodes from a strictly topological 
perspective and thereby (1) provide insight into 
systems where the dynamic behavior is poorly 
understood and (2) complement dynamic analysis 
in systems where the regulatory behavior is 
understood. 


Open Problems 


The methodology discussed here considers the 
topology of a network with weighted positive and 
negative interactions. However, network analy- 
sis often involves an investigation of network 
dynamics, where the details of the interactions 
encoded in the network topology play a pivotal 
role. The role of network topology in constraining
network dynamics is an active area of study (see,
e.g., [8]).


Problem Definition


In the element distinctness problem, one is given 
a list of N elements $x_1, \ldots, x_N \in \{1, \ldots, m\}$
and one must determine if the list contains two
equal elements. Access to the list is granted by
submitting queries to a black box, and there are
two possible types of query.

Value Queries. In this type of query, the
input to the black box is an index i. The black
box outputs $x_i$ as the answer. In the quantum
version of this model, the input is a quantum
state that may be entangled with the workspace
of the algorithm. The joint state of the query,
the answer register, and the workspace may be
represented as $\sum_{i,y,z} a_{i,y,z} |i, y, z\rangle$, with y being an
extra register which will contain the answer to
the query and z being the workspace of the algorithm.
The black box transforms this state into
$\sum_{i,y,z} a_{i,y,z} |i, (y + x_i) \bmod m, z\rangle$. The simplest
particular case is if the input to the black box is of
the form $\sum_i a_i |i, 0\rangle$. Then, the black box outputs
$\sum_i a_i |i, x_i\rangle$. That is, a quantum state consisting of
the index i is transformed into a quantum state,
each component of which contains $x_i$ together
with the corresponding index i.

Comparison Queries. In this type of query,
the input to the black box consists of two indices
i, j. The black box gives one of the three possible
answers: "$x_i > x_j$," "$x_i < x_j$," or "$x_i = x_j$." In
the quantum version, the input is a quantum state
consisting of basis states $|i, j, z\rangle$, with i, j being
two indices and z being the algorithm's workspace.

There are several reasons why the element 
distinctness problem is interesting to study. First 
of all, it is related to sorting. Being able to
sort $x_1, \ldots, x_N$ enables one to solve element
distinctness by first sorting $x_1, \ldots, x_N$ in
increasing order. If there are two equal elements
$x_i = x_j$, then they will be next to each
other in the sorted list. Therefore, after one
has sorted $x_1, \ldots, x_N$, one must only check the
sorted list to see if each element is different from
the next one. Because of this relation, the element
distinctness problem captures some of the same
difficulty as sorting. This has led to a long line
of research on classical lower bounds for the
element distinctness problem (cf. [5, 11, 21] and
many other papers).
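
The reduction to sorting described above fits in a few lines. This is a minimal classical sketch; the function name `has_collision` is illustrative, and Python's built-in comparison sort stands in for whichever sorting method is available.

```python
def has_collision(xs) -> bool:
    """Return True if the list contains two equal elements.

    After sorting, equal elements are adjacent, so one linear pass over
    neighbouring pairs is enough.
    """
    ordered = sorted(xs)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

The sort dominates the cost, which is where the O(N log N) comparison bound for this approach comes from.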

Second, the central concept of the algorithms 
for the element distinctness problem is the notion 
of a collision. This notion can be generalized in 
different ways, and its generalizations are use- 
ful for building quantum algorithms for various 
graph-theoretic problems (e.g., triangle finding 
[18]) and matrix problems (e.g., checking matrix 
identities [12]). 

A generalization of element distinctness is element
k-distinctness [3], in which one must determine
if there exist k different indices
$i_1, \ldots, i_k \in \{1, \ldots, N\}$ such that
$x_{i_1} = x_{i_2} = \cdots = x_{i_k}$.
A further generalization is the k-subset finding
problem [14], in which one is given a function
$f(y_1, \ldots, y_k)$ and must determine whether
there exist $i_1, \ldots, i_k \in \{1, \ldots, N\}$ such that
$f(x_{i_1}, x_{i_2}, \ldots, x_{i_k}) = 1$ (where $x_1, \ldots, x_N$ are
the input data).

Key Results 


Element Distinctness: Summary of Results 
In the classical (non-quantum) context, the
natural solution to the element distinctness
problem is sorting, as described in
the previous section. This uses O(N) value
queries (or O(N log N) comparison queries)
and O(N log N) time. Any classical algorithm
requires $\Omega(N)$ value or $\Omega(N \log N)$ comparison
queries. If the algorithm is restricted to o(N)
space, stronger lower bounds are known [21].

In the quantum context, Buhrman et al. [13]
gave the first nontrivial quantum algorithm, using
$O(N^{3/4})$ queries. Ambainis [3] then designed a
new algorithm, based on a novel idea using quantum
walks. Ambainis' algorithm uses $O(N^{2/3})$
queries and is known to be optimal: Aaronson and
Shi [1, 2, 15] have shown that any quantum algorithm
for element distinctness must use $\Omega(N^{2/3})$
queries.

For quantum algorithms that are restricted to 
storing r values $x_i$ (where $r \le N^{2/3}$), the best
algorithm runs in $O(N/\sqrt{r})$ time.

All of these results are for value queries. They 
can be adapted to the comparison query model, 
with a log N factor increase in the complexity. 
The time complexity is within a polylogarithmic
$O(\log^c N)$ factor of the query complexity, as
long as the computational model is sufficiently
general [3]. (Random access quantum memory 
is necessary for implementing any of the known 
quantum algorithms.) 

Using the quantum walk methods, one can 
also solve the k-distinctness problem [3]. This 
gives a quantum algorithm for k-distinctness 
(and k-subset finding) that uses $O(N^{k/(k+1)})$
value queries and $O(N^{k/(k+1)})$ memory. For
the case when the memory is restricted to
$r \le N^{k/(k+1)}$ values of $x_i$, it suffices to use
$O(r + N^{k/2}/r^{(k-1)/2})$ value queries. The
results generalize to comparison queries and
time complexity, with a polylogarithmic factor
increase in the time complexity (similarly to the
element distinctness problem). For the k-subset
finding problem, Belovs and Rosmanis [8] have
shown that there is a function $f(y_1, \ldots, y_k)$
for which $\Omega(N^{k/(k+1)})$ queries are also
necessary.

For the k-distinctness problem, a better
quantum algorithm has been recently developed
by Belovs [6], using the learning graph approach.
It solves 3-distinctness using $O(N^{5/7})$ value
queries and k-distinctness using
$O(N^{1 - 2^{k-2}/(2^k - 1)})$
value queries. The algorithm for 3-distinctness
can be implemented so that it runs in time
$O(N^{5/7} \log^c N)$ [9]. It is an open problem to
construct a time-efficient implementation for
k > 3.


Element Distinctness: The Methods 

Ambainis' algorithm has the following structure.
Its state space is spanned by basis states $|T\rangle$, for
all sets of indices $T \subseteq \{1, \ldots, N\}$ with $|T| = r$.
The algorithm starts in a uniform superposition of
all $|T\rangle$ and repeatedly applies a sequence of two
transformations:


1. Conditional phase flip: $|T\rangle \to -|T\rangle$ for all T
such that T contains i, j with $x_i = x_j$, and
$|T\rangle \to |T\rangle$ for all other T;

2. Quantum walk: perform $O(\sqrt{r})$ steps of quantum
walk, as defined in [3]. Each step is a
transformation that maps each $|T\rangle$ to a combination
of basis states $|T'\rangle$ for $T'$ that differ
from T in one element.


The algorithm maintains another quantum register,
which stores all the values of $x_i$, $i \in T$. This
register is updated with every step of the quantum
walk.

If there are two elements i, j such that $x_i = x_j$,
repeating these two transformations O(N/r)
times increases the amplitudes of $|T\rangle$ containing
i, j. Measuring the state of the algorithm at
that point with high probability produces a set T
containing i, j. Then, from the set T, we can find
i and j.

The basic structure of [3] is similar to Grover's
quantum search, but with one substantial difference.
In Grover's algorithm, instead of using a
quantum walk, one would use Grover's diffusion
transformation. Implementing Grover's diffusion
requires $\Omega(r)$ updates to the register that stores
$x_i$, $i \in T$. In contrast to Grover's diffusion, each
step of quantum walk changes T by one element,
requiring just one update to the list of $x_i$, $i \in T$.
Thus, $O(\sqrt{r})$ steps of quantum walk can be performed
with $O(\sqrt{r})$ updates, quadratically better
than Grover's diffusion. And, as shown in [3],
the quantum walk provides a sufficiently good
approximation of diffusion for the algorithm to
work correctly.

This was one of the first uses of quantum 
walks to construct quantum algorithms. 
Ambainis, Kempe, and Rivosh [4] then gener- 
alized it to handle searching on grids (described 
in another entry of this encyclopedia). Their al- 
gorithm is based on the same mathematical ideas, 
but has a slightly different structure. Instead of 
alternating quantum walk steps with phase flips, it 
performs a quantum walk with two different walk 
rules — the normal walk rule and the “perturbed” 
one. (The normal rule corresponds to a walk 
without a phase flip and the “perturbed” rule 
corresponds to a combination of the walk with a 
phase flip.) 


Generalization to Arbitrary Markov Chains 
Szegedy [20] and Magniez et al. [19] have 
generalized the algorithms of [4] and [3], 
respectively, to speed up the search of an 
arbitrary Markov chain. The main result of [19] is 
as follows. 

Let P be an irreducible Markov chain with 
state space X. Assume that some states in the 
state space of P are marked. Our goal is to find 
a marked state. This can be done by a classical 
algorithm that runs the Markov chain P until it 
reaches a marked state (Algorithm 1). 

There are three costs that contribute to the 
complexity of Algorithm 1: 


1. Setup cost S: the cost to sample the initial 
state x from the initial distribution. 

2. Update cost U: the cost to simulate one step 
of a random walk. 

3. Checking cost C: the cost to check if the 
current state x is marked. 


Algorithm 1: Search by a classical random walk
1. Initialize x to a state sampled from some initial
distribution over the states of P.
2. Repeat $t_2$ times:
(a) If the current state x is marked, output x and
stop;
(b) Simulate $t_1$ steps of random walk, starting with
the current state x.
3. If the algorithm has not terminated, output "no
marked state."


The overall complexity of the classical algorithm
is then $S + t_2(t_1 U + C)$. The required $t_1$ and $t_2$
can be calculated from the characteristics of the
Markov chain P. Namely,


Proposition 1 ([19]) Let P be an ergodic, yet
symmetric Markov chain. Let $\delta > 0$ be the
eigenvalue gap of P, and assume that whenever
the set of marked states M is nonempty, we have
$|M|/|X| \ge \epsilon$. Then there are $t_1 = O(1/\delta)$
and $t_2 = O(1/\epsilon)$ such that Algorithm 1 finds a
marked element with high probability.

Thus, the cost of finding a marked element
classically is $O(S + \frac{1}{\epsilon}(\frac{1}{\delta} U + C))$. Magniez
et al. [19] construct a quantum algorithm that
finds a marked element in
$O(S' + \frac{1}{\sqrt{\epsilon}}(\frac{1}{\sqrt{\delta}} U' + C'))$ steps,
with $S'$, $U'$, and $C'$ being quantum
versions of the setup, update, and checking costs
(in most applications, these are of the same
order as S, U, and C). This achieves a quadratic
improvement in the dependence on both
$\epsilon$ and $\delta$.
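
The classical search procedure (Algorithm 1) can be sketched generically. In the sketch below the Markov chain is supplied through three caller-provided functions, which are assumptions of this illustration rather than part of [19]: `sample_initial(rng)` draws the starting state (cost S), `step(x, rng)` simulates one walk step (cost U), and `is_marked(x)` performs the check (cost C).

```python
import random


def random_walk_search(sample_initial, step, is_marked, t1, t2, rng=None):
    """Search for a marked state with a classical random walk.

    Performs at most t2 checks, with t1 walk steps between consecutive
    checks, for a total cost of S + t2 * (t1 * U + C).
    Returns a marked state, or None if none was found.
    """
    rng = rng or random.Random()
    x = sample_initial(rng)
    for _ in range(t2):
        if is_marked(x):
            return x
        for _ in range(t1):
            x = step(x, rng)
    return None
```

A caller would choose `t1` on the order of 1/δ and `t2` on the order of 1/ε, following Proposition 1; the sketch itself does not compute either quantity.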


The element distinctness problem is solved
by a particular case of this algorithm: a search
on the Johnson graph. The Johnson graph is the
graph whose vertices $v_T$ correspond to subsets
$T \subseteq \{1, \ldots, N\}$ of size $|T| = r$. A vertex $v_T$
is connected to a vertex $v_{T'}$ if the subsets T and
$T'$ differ in exactly one element. A vertex $v_T$ is
marked if T contains indices i, j with $x_i = x_j$.

Consider the following Markov chain on the
Johnson graph. The starting probability distribution
s is the uniform distribution over the vertices
of the Johnson graph. In each step, the Markov
chain chooses the next vertex $v_{T'}$ from all vertices
that are adjacent to the current vertex $v_T$,
uniformly at random. While running the Markov
chain, one maintains a list of all $x_i$, $i \in T$. This
means that the costs of the classical Markov chain
are as follows:


• Setup cost of S = r queries (to query all
$x_i$, $i \in T$, where $v_T$ is the starting state).

• Update cost of U = 1 query (to query the
value $x_i$, $i \in T' - T$, where $v_T$ is the vertex
before the step and $v_{T'}$ is the new vertex).

• Checking cost of C = 0 queries (the values
$x_i$, $i \in T$ are already known to the algorithm,
and no further queries are needed).


The quantum costs $S'$, $U'$, and $C'$ are of the same
order as S, U, and C.
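
The query accounting for the classical chain on the Johnson graph can be checked with a small simulation. This is a sketch of the classical walk only, not of the quantum algorithm; the function name, the `max_steps` cutoff, and the fixed seed are illustrative assumptions, and it requires 0 < r < N. The collision check is done on values already stored, so it issues no queries.

```python
import random


def johnson_walk_search(xs, r, max_steps, seed=0):
    """Walk on r-subsets of indices; return (collision_found, queries)."""
    rng = random.Random(seed)
    queries = 0
    values = {}                       # index -> queried value, for i in T
    for i in rng.sample(range(len(xs)), r):
        values[i] = xs[i]             # setup: r queries in total
        queries += 1
    for _ in range(max_steps):
        if len(set(values.values())) < len(values):
            return True, queries      # check: uses stored values only
        outside = [k for k in range(len(xs)) if k not in values]
        leaving = rng.choice(sorted(values))
        entering = rng.choice(outside)
        del values[leaving]
        values[entering] = xs[entering]   # update: exactly 1 query
        queries += 1
    return len(set(values.values())) < len(values), queries
```

On an input with no equal elements the walk never stops early, so the query count is exactly r plus the number of steps taken.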

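For this Markov chain, it can be shown that the
eigenvalue gap is $\delta = O(1/r)$ and the fraction of
marked states is $\epsilon = O(r^2/N^2)$. Thus, the
quantum algorithm runs in time

$$O\left(S' + \frac{1}{\sqrt{\epsilon}}\left(\frac{1}{\sqrt{\delta}} U' + C'\right)\right)
= O\left(S' + \frac{N}{r}\left(\sqrt{r}\, U' + C'\right)\right)
= O\left(r + \frac{N}{\sqrt{r}}\right).$$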
Learning Graphs 

Another framework that generalizes the element
distinctness algorithm is that of learning graphs,
introduced by Belovs [6].
A learning graph is a structure that describes the
algorithm's information about the input data. Using
this approach, many quantum algorithms can be
described as sequences of high-level instructions
(which can be compiled into a standard quantum
query algorithm). For example, the element
distinctness algorithm corresponds to a sequence of
three operations:


1. Load $O(N^{2/3})$ values $x_i$ for randomly chosen
$i \in \{1, 2, \ldots, N\}$.

2. Load one of the two equal elements $x_i$.

3. Load the other equal element $x_j$.


Belovs [6] describes rules for determining the
complexity of each step. In the algorithm above,
the complexities are $O(N^{2/3})$, $O(\sqrt{N})$, and
$O(N^{2/3})$, respectively. This results in the same
overall complexity of $O(N^{2/3})$.

The learning graph approach has been used
to construct new quantum algorithms for
k-distinctness [7], triangle-finding [6], and other
tasks.


Applications


Magniez et al. [19] showed how to use the ideas
from the element distinctness algorithm as a
subroutine to solve the triangle problem. In the
triangle problem, one is given a graph G on n
vertices, accessible by queries to an oracle, and
one must determine whether the graph contains
a triangle (three vertices $v_1, v_2, v_3$ with $v_1 v_2$,
$v_1 v_3$, and $v_2 v_3$ all being edges). This problem
requires $\Omega(n^2)$ queries classically. Magniez
et al. [19] showed that it can be solved using
$O(n^{1.3} \log^c n)$ quantum queries, with a modification
of the element distinctness algorithm as a
subroutine. This has been improved by several
authors. Currently, the best quantum algorithm
for triangle finding is by Le Gall [17], which uses
$O(n^{1.25} \log^c n)$ queries. It is also based on
quantum walks but uses them in a much more
complex way.

The methods of Szegedy [20] and Magniez 
et al. [19] can be used as subroutines for quantum 
algorithms for checking matrix identities [12, 18]. 

Bernstein et al. [10] have used the element
distinctness algorithm to design a quantum
algorithm for the subset sum problem, by combining
the element distinctness algorithm with ideas
from classical algorithms for subset sum. The
resulting algorithm solves the subset sum
problem for n numbers in 2^{(0.241+o(1))n} time steps,
under some heuristic assumptions that are similar
to the ones that are assumed for classical subset
sum algorithms. The best classical algorithm uses
2^{(0.291+o(1))n} time steps.


Open Problems 


1. How many queries are necessary to solve the
element distinctness problem if the memory
accessible to the algorithm is limited to r
items, r < N^{2/3}? The algorithm of [3] gives
O(N/√r) queries, and the best lower bound
is Ω(N^{2/3}) queries.

2. Consider the following problem: 

Graph collision [18]. The problem is
specified by a graph G (which is arbitrary
but known in advance) and variables
x_1, ..., x_N ∈ {0, 1}, accessible by queries
to an oracle. The task is to determine if G
contains an edge uv such that x_u = x_v = 1.
How many queries are necessary to solve this
problem?


The element distinctness algorithm can be
adapted to solve this problem with O(N^{2/3})
queries [18], but there is no matching lower
bound. Is there a better algorithm? A better
algorithm for the graph collision problem would
immediately imply a better algorithm for the
triangle problem.
Problem Definition


Every positive integer n has a unique decomposition
as a product of primes n = p_1^{e_1} ··· p_k^{e_k},
for prime numbers p_i and positive integer
exponents e_i. Computing the decomposition
p_1, e_1, ..., p_k, e_k from n is the factoring
problem.

Factoring has been studied for many hundreds
of years, and exponential-time algorithms for it
include trial division, Lehman’s
method, Pollard’s ρ method, and Shanks’s class
group method [1]. With the invention of the RSA
public-key cryptosystem in the late 1970s, the
problem became practically important and started
receiving much more attention. The security of
RSA is closely related to the complexity of
factoring, and in particular, RSA is secure only if
factoring does not have an efficient algorithm.
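As a concrete reference point for the problem statement, the simplest of the exponential-time methods named above, trial division, can be sketched as follows. This is only an illustration; the function name `trial_division` is an assumption introduced here, and the running time is exponential in the bit length of n.

```python
def trial_division(n):
    """Return [(p_1, e_1), ..., (p_k, e_k)] with n = p_1^e_1 * ... * p_k^e_k."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:   # strip every copy of the prime p
                n //= p
                e += 1
            factors.append((p, e))
        p += 1
    if n > 1:                   # whatever remains is a prime factor
        factors.append((n, 1))
    return factors
```

The loop performs up to about √n divisions, which is roughly 2^{(log n)/2} in terms of the input size log n.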


The first subexponential-time algorithm is due
to Morrison and Brillhart [4] using a continued
fraction algorithm. This was succeeded by the
quadratic sieve method of Pomerance and the
elliptic curve method of Lenstra [5]. The number
field sieve [2, 3], found in 1989, is the best-known
classical algorithm for factoring and runs
in time exp(c (log n)^{1/3} (log log n)^{2/3}) for some
constant c. Shor’s result is a polynomial-time
quantum algorithm for factoring.


Key Results 


Theorem 1 ([2, 3]) There is a subexponential-time
classical algorithm that factors the integer
n in time exp(c (log n)^{1/3} (log log n)^{2/3}).


Theorem 2 ([6]) There is a polynomial-time
quantum algorithm that factors integers. The
algorithm factors n in time O((log n)^2 (log log n)
(log log log n)) plus polynomial in log n
postprocessing, which can be done classically.


Applications 


Computationally hard number theoretic problems 
are useful for public-key cryptosystems. 
The RSA public-key cryptosystem, as well as
others, requires that factoring not have an
efficient algorithm. The best-known classical
algorithms for factoring can help determine how
secure the cryptosystem is and what key sizes to
choose. Shor’s quantum algorithm for factoring
can break these systems in polynomial time using
a quantum computer.


Open Problems 


It is open whether there is a polynomial-time 
classical algorithm for factoring. 
Problem Definition


A triangle is a clique of size three in an undi- 
rected graph. Triangle finding has been the sub- 
ject of extensive study as a basic search problem 
whose quantum query complexity is still open, in 
contrast to unstructured search [6] and element 
distinctness [1]. 

This survey concerns quantum query
algorithms for triangle finding. A quantum query
algorithm for a search problem P = {M_f}_f
is a sequence of unitary operators Q^f =
U_T O_f U_{T-1} O_f ··· U_1 O_f U_0 such that if M_f ≠ ∅,
measuring Q^f |0⟩ yields a member of
M_f, the set of objects associated with input f,
with probability at least 2/3. The operators O_f are
oracle queries, O_f : |x⟩|a⟩ ↦ |x⟩|a ⊕ f(x)⟩,
which yield information about f, whereas the
U_i are independent of f. The quantum query
complexity of P is the minimum number of
oracle queries required by a quantum query
algorithm for P.

In the context of triangle finding, the function
f is the adjacency matrix of an undirected graph
on vertices [n], G ⊆ [n]^2, with m = |G| edges,
where (a, b) ∈ G ⇒ (b, a) ∈ G, by convention.
The associated set, M_G, is the set of triangles
in G.


Problem 1 (Triangle finding) 


INPUT: The adjacency matrix f of a graph G on
n vertices.

OUTPUT: A triangle (a, b, c) ∈ [n]^3 such that
(a, b), (b, c), (a, c) ∈ G, if one exists.
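To make the query model concrete, here is a minimal classical sketch that solves Problem 1 by brute force while counting how many distinct entries of the adjacency matrix it reads. The function name `find_triangle`, the 0-based vertex labels, and the list-of-lists matrix format are assumptions made for illustration only; this is a classical baseline, not one of the quantum algorithms surveyed here.

```python
from itertools import combinations

def find_triangle(adj):
    """adj: symmetric 0/1 adjacency matrix (list of lists).

    Returns (triangle, queries), where triangle is a tuple (a, b, c)
    or None, and queries counts distinct matrix entries examined.
    """
    n = len(adj)
    seen = {}

    def edge(a, b):
        key = (min(a, b), max(a, b))
        if key not in seen:          # each vertex pair is queried at most once
            seen[key] = adj[a][b]
        return seen[key]

    for a, b, c in combinations(range(n), 3):
        if edge(a, b) and edge(b, c) and edge(a, c):
            return (a, b, c), len(seen)
    return None, len(seen)
```

On a triangle-free graph this procedure may read every one of the n(n-1)/2 vertex pairs, which is the classical cost that the quantum algorithms below improve upon.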


A lower bound of Ω(n) on the quantum
query complexity of the triangle finding problem
follows from a reduction from search [5].
It is easy to see that the randomized query
complexity of the triangle finding problem is
Θ(n^2).


Key Results 


Progress on the quantum query complexity
of triangle finding has closely followed the
development of quantum algorithmic techniques
for search problems. The first upper bounds
were based on increasingly clever use of
the structure of the problem, combined with
amplitude amplification [4]. The first bound to go
beyond the amplitude amplification framework,
achieving O(n^{13/10}) [10], was one of the
first applications of the quantum walk search
technique introduced by Ambainis in his element
distinctness algorithm [1] and extended in [13]
and [11]. The next bound of O(n^{35/27}) was the
first application of a new quantum algorithmic
technique, the learning graph framework [3].
This finding led to the development of extensions
to the quantum walk search technique to give
an O(n^{35/27}) quantum walk algorithm for
triangle finding [7]. The next improvement to
O(n^{9/7}) also used the learning graph framework
[9], whereas the most recent upper bound of
O(n^{5/4}) uses, once again, a quantum walk search
algorithm [8].



An O(n + √(nm)) Algorithm Using
Amplitude Amplification

A trivial application of Grover’s quantum search
algorithm solves the triangle finding problem
with O(n^{3/2}) quantum queries by searching over
[n]^3. Buhrman et al. [5] improved this upper
bound in the special case where G is sparse (i.e.,
m = o(n^2)).

The algorithm searches for an edge (a, b) ∈
G in O(√(|[n]^2|/m)) = O(n/√m) quantum
queries and then for c ∈ [n] such that (a, b, c)
is a triangle in O(√n) quantum queries. The
second step succeeds when (a, b) is a triangle
edge, which happens with probability at least
1/m when G contains a triangle, so applying
amplitude amplification to this procedure gives an
O(√m (n/√m + √n)) = O(n + √(nm)) upper
bound:


Theorem 1 (Buhrman et al. [5]) Using quantum
amplitude amplification, the triangle finding
problem can be solved in O(n + √(nm)) quantum
queries.


An O(n^{10/7}) Algorithm Using Amplitude
Amplification

The algorithm of Szegedy et al. [10, 12] is also 
based on amplitude amplification; however, it 
exploits additional combinatorial structure in the 
triangle finding problem. 

For A ⊆ [n] and w ∈ [n], define Δ_G(A, w) :=
{(u, v) ∈ A^2 : (u, w), (v, w) ∈ G}. Choose
a random subset X ⊆ [n] of size n^χ log n, for
χ = 3/7. Query X × [n] and search for an
edge in E_X := ∪_{w∈X} Δ_G([n], w), which can be
determined from G ∩ (X × [n]), using O(|X|n +
√|E_X|) = O(n^{1+χ}) queries. Either a triangle is
found, or E_X ∩ G = ∅.

Let G' := [n]^2 \ E_X. If a triangle is not found
in the first step, then G ⊆ G'. Fix α = β = 1/7.
Szegedy et al. show that for most X, G' can be
partitioned into (T, E), such that T has O(n^{3-α})
triangles and |E ∩ G| = O(n^{2-β} + n^{2-χ+α+β}),
in O(n^{1+α+β}) queries (or a triangle is found
in the process). If G ⊆ G', any triangle in G
either lies in T, in which case it can be found
in O(√(n^{3-α})) queries using quantum search, or
intersects E, in which case it can be found in
O(n + √(n|G ∩ E|)) queries using the algorithm
of Buhrman et al. This gives the following:


Theorem 2 (Szegedy; Magniez, Santha and
Szegedy [10, 12]) Using amplitude amplification,
the triangle finding problem can be solved
in O(n^{10/7}) quantum queries.


An O(n^{13/10}) Algorithm Using Quantum
Walks

A more efficient algorithm for the triangle finding 
problem was obtained by Magniez et al. [10], 
using the quantum walk search technique intro- 
duced by Ambainis [1]. 

Given oracle access to a function defining a
relation M ⊆ [n]^k, Ambainis’ quantum walk
search procedure finds (a_1, ..., a_k) ∈ M if
M ≠ ∅. The algorithm walks on sets A ⊆ [n]
of size n^α, keeping track of some data structure
D(A) for the current state A and transitioning,
in superposition, from A to A' for A' ⊆ [n] of
size n^α such that |A \ A'| = 1. Assume access
to a quantum procedure Φ that determines if
A^k ∩ M ≠ ∅ using D(A), with checking cost C
queries. Suppose D(A) can be constructed from
scratch at setup cost S queries and modified from
D(A) to D(A') when |A \ A'| = 1 at an update
cost U. Then the procedure finds an element of
M in O(S + (n/n^α)^{k/2} (√(n^α) U + C)) quantum
queries. (For details, see the encyclopedia entry
on element distinctness.)

For a fixed graph G ⊆ [n]^2, consider the graph
collision problem on G, where an input f defines
the binary relation M_f ⊆ [n]^2 satisfying (u, u') ∈
M_f if f(u) = f(u') = 1 and (u, u') ∈ G.
Setting k = 2, it is a simple exercise to see
that a quantum walk search algorithm solves this
problem with O(n^α + (n/n^α)(√(n^α) · 1 + 0)) =
O(n^α + n^{1-α/2}) queries. Setting α = 2/3 gives
an upper bound of O(n^{2/3}) quantum queries for
graph collision.

Magniez et al. [10] solve triangle finding using
a quantum walk algorithm whose checking
subroutine is based on graph collision. Let M be the
set of triangle edges. Define D(A) = G ∩ A^2.
Then S = n^{2α} initial queries are needed to set
up D(A), and U = n^α new queries are needed to
update D(A), where α is now 3/5. The checking
step consists of an algorithm that, given a
known subgraph H = G ∩ A^2 on n^α vertices,
decides if H contains a triangle edge using C =
O(√n · n^{2α/3}) queries, as follows. For any v ∈ [n],
define f_v on A by f_v(u) = 1 if (u, v) ∈ G. An
edge (a, b) ∈ A^2 is a graph collision in f_v on
G ∩ A^2 if and only if (a, b, v) is a triangle, so
searching for v ∈ [n] for which f_v has a graph
collision, using O(√n (n^α)^{2/3}) quantum queries,
is equivalent to deciding if G ∩ A^2 contains a
triangle edge. Repeat O(log n) times, to decrease
the error to 1/n^c, since the subroutine is called
many times. This gives the following:


Theorem 3 (Magniez, Santha, and Szegedy
[10]) Using a quantum walk search procedure,
the triangle finding problem can be solved in
O(n^{13/10}) quantum queries.
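The exponent 13/10 can be checked by plugging the costs into the walk formula O(S + (n/n^α)^{k/2} (√(n^α) U + C)) with k = 2. The sketch below does this with exact fractions; it assumes S = n^{2α} (one query per pair in A^2), U = n^α, and C = √n · n^{2α/3}, ignores logarithmic factors, and uses a hypothetical helper name `walk_exponent`.

```python
from fractions import Fraction as F

def walk_exponent(alpha):
    """Exponent of n in S + (n/n^alpha) * (sqrt(n^alpha) * U + C)."""
    setup = 2 * alpha                               # S = n^(2*alpha)
    update = (1 - alpha) + alpha / 2 + alpha        # (n/n^a) * sqrt(n^a) * n^a
    check = (1 - alpha) + F(1, 2) + 2 * alpha / 3   # (n/n^a) * sqrt(n) * n^(2a/3)
    return max(setup, update, check)

best = walk_exponent(F(3, 5))
```

With α = 3/5 the update and checking terms both equal 13/10 and the setup term is 6/5, so the three costs are balanced as far as this formula allows.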


An O(n^{35/27}) Algorithm Using Learning
Graphs

The learning graph framework, introduced by 
Belovs [3], allows for the construction of a 
quantum algorithm from a particular type of 
edge-weighted graph called a learning graph. For 
further details, refer to [3]. The first application 
of this framework was a new upper bound 
on the quantum query complexity of triangle 
finding. 

A learning graph may be constructed in stages,
corresponding to searching for more and more
specialized structures, which will eventually
contain a 1-certificate for the problem being solved.
In Belovs’ application to triangle finding, the first
part of the learning graph corresponds to searching
for an n^α-vertex subset of [n], A, containing
two triangle vertices a and b. The next two stages
correspond to searching for an n^{2α-σ}-edge graph
on A, H, which contains the triangle edge {a, b}.
The final stages correspond to the graph collision
subroutine used in [10] to decide if any edge of
the queried subgraph H is a triangle edge. Using
α = 2/3 and σ = 1/27 gives the following:


Theorem 4 (Belovs [3]) Using a learning graph
algorithm, the triangle finding problem can be
solved in O(n^{35/27}) = O(n^{1.297}) quantum
queries.
Additionally, a quantum walk search algorithm
based on this learning graph construction solves
triangle finding in O(n^{35/27}) queries [7].


An O(n^{9/7}) Algorithm Using Learning
Graphs

The next upper bound on the quantum query
complexity of triangle finding, due to Lee et al.
[9], also uses a learning graph. The first part of
their learning graph corresponds to searching for
an n^α-vertex subset A ⊆ [n], containing a triangle
vertex a. The next part corresponds to searching
for an n^β-vertex subset B ⊆ [n], containing a
vertex, b, from the same triangle as a. The final
part corresponds to the graph collision subroutine
used in [10], but optimized for an unbalanced
bipartite graph, used to decide if any edge of
G ∩ (A × B) is a triangle edge. Using α = 4/7
and β = 5/7 gives the following:


Theorem 5 (Lee, Magniez and Santha [9])
Using a learning graph algorithm, the triangle
finding problem can be solved in O(n^{9/7}) =
O(n^{1.2858}) quantum queries.


As with the previous algorithm, there exists a
quantum walk search algorithm based on this
learning graph construction that solves triangle
finding in O(n^{9/7}) queries [7].


An O(n^{5/4}) Algorithm Using Quantum
Walks

The best known upper bound on the quantum
query complexity of triangle finding is an
algorithm by Le Gall [8]. Le Gall’s algorithm
uses the quantum walk search technique, as in
the O(n^{13/10})-query algorithm, combined with
a more clever utilization of the combinatorial
structure of triangle finding, similar to that of the
O(n^{10/7})-query algorithm, and a quantum search
algorithm of Ambainis that finds an x such that
Φ(x) = 1 in cost O(√(Σ_x C(x)^2)), where C(x)
is the cost to compute Φ(x) [2].

The algorithm begins, like the O(n^{10/7})
algorithm, by choosing a random X ⊆ [n] of size
n^χ log n and searching for a triangle in X × [n]^2.
This is done by quantum search on X × [n]^2,
using O(√|X × [n]^2|) = O(n^{1+χ/2}) quantum
queries. If no triangle is found, as in the O(n^{10/7})
algorithm, the rest of the algorithm will make use
of the fact that E_X ∩ G = ∅, although in this
case, since X × [n] is not queried, E_X is not
known.

The rest of the algorithm consists of the fol- 
lowing four levels of recursion: 


1. Using a quantum walk search algorithm,
search for a set A ⊆ [n] of size n^α such
that A^2 contains a triangle edge. Maintain a
data structure, D(A), encoding G ∩ (A × X).

2. For any A ⊆ [n], to check if A^2 contains a
triangle edge, search for a vertex c ∈ [n] such
that A^2 × {c} contains a triangle.

3. For any A ⊆ [n] and c ∈ [n], to check if
A^2 × {c} contains a triangle, use a quantum
walk search algorithm to search for a set B ⊆
A of size n^β such that B^2 × {c} contains
a triangle. Maintain a data structure, D^c(B),
encoding G ∩ (B × {c}).

4. For any B ⊆ [n] and c ∈ [n], to check if B^2 ×
{c} contains a triangle, search for an edge in
Δ_G(B, c) \ E_X. Here the algorithm exploits
the fact that there is no edge in E_X. The set
E_X ∩ B^2 can be determined from G ∩ (A × X).


Constructing D(A) costs S = |A × X| =
O(n^{α+χ}) queries. Mapping D(A) to D(A') costs
U = 2|X| = O(n^χ) queries. Let Φ(A) =
1 if A^2 has a triangle edge. Then if C is the
quantum query complexity of computing Φ(A),
the quantum query complexity of finding a
triangle in G \ E_X is O(S + (n/n^α)(√(n^α) U + C)) =
O(n^{α+χ} + n^{1+χ-α/2} + n^{1-α} C).

Let Φ_A(c) = 1 if A^2 × {c} contains a triangle.
To compute Φ(A), search for c ∈ [n] such that
Φ_A(c) = 1. Let C'(c) be the cost of computing
Φ_A(c), which will vary in c. Then by [2], C =
O(√(Σ_{c∈[n]} C'(c)^2)).

Let Φ^c(D^c(B)) = 1 if B^2 × {c} has a triangle.
To compute Φ_A(c), search for B ⊆ A such
that Φ^c(D^c(B)) = 1. Creating D^c(B) costs
S' = |B × {c}| = n^β queries. Mapping D^c(B)
to D^c(B') costs U' = 2 queries. If computing
Φ^c(D^c(B)) costs C''(c), computing Φ_A(c)
costs C'(c) = S' + (n^α/n^β)(√(n^β) U' + C''(c)) =
O(n^β + n^{α-β/2} + n^{α-β} C''(c)) queries.

Observe that Φ^c(D^c(B)) = 1 if and only if
Δ_G(B, c) contains an edge. Since G ∩ E_X =
∅, one need only search Δ_G(B, c) \ E_X for
an edge. The set Δ_G(B, c) can be determined
from D^c(B), and E_X ∩ A^2 can be determined
from D(A), so Δ_G(B, c) \ E_X is known. Thus,
C''(c) = O(√|Δ_G(B, c) \ E_X|).

Using combinatorial arguments, Le Gall
proves an upper bound on |Δ_G(B, c) \ E_X|
relative to |Δ_G(A, c) \ E_X| for most B, allowing
him to use further combinatorial arguments to
show an upper bound on C = √(Σ_{c∈[n]} C'(c)^2)
of O(n^{1/2+χ} + n^{1/2+β} + n^{1/2+α-β/2} +
n^{1/2+α-χ/2}). Setting χ = β = 1/2 and α = 3/4
then gives the following:


Theorem 6 (Le Gall [8]) Using a quantum walk
search algorithm, the triangle finding problem
can be solved in O(n^{5/4}) quantum queries.
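The parameter choice can be sanity-checked by evaluating the exponents of the cost expressions given above with exact fractions. The sketch below is only a check of the arithmetic, ignoring logarithmic factors; the function name `le_gall_exponent` and the way the terms are grouped are assumptions made for illustration.

```python
from fractions import Fraction as F

def le_gall_exponent(alpha, beta, chi):
    """Largest exponent of n among the cost terms described in the text."""
    half = F(1, 2)
    first_stage = 1 + chi / 2                  # search over X x [n]^2
    # bound on C = sqrt(sum_c C'(c)^2)
    c = max(half + chi, half + beta,
            half + alpha - beta / 2, half + alpha - chi / 2)
    # outer walk: S + (n/n^alpha) * (sqrt(n^alpha) * U + C)
    outer = max(alpha + chi, 1 + chi - alpha / 2, 1 - alpha + c)
    return max(first_stage, outer)

exponent = le_gall_exponent(F(3, 4), F(1, 2), F(1, 2))
```

With χ = β = 1/2 and α = 3/4, all four terms in the bound on C equal 1, and the first stage, the setup cost, and the checking cost all come to exponent 5/4.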


The quantum query complexity of triangle
finding is still open, as the best known lower
bound is Ω(n).


Cross-References 


Quantum Algorithm for Element Distinctness 
Quantum Algorithm for the Collision Problem 
Quantum Analogues of Markov Chains 


Recommended Reading 


1. Ambainis A (2007) Quantum walk algorithm for
element distinctness. SIAM J Comput 37(1):210-239

2. Ambainis A (2010) Quantum search with variable 
times. Theory Comput Syst 47(3):786-807 

3. Belovs A (2012) Span programs for functions
with constant-sized 1-certificates. In: Proceeding of
STOC, New York, pp 77-84

4. Brassard G, Høyer P, Mosca M, Tapp A (2002)
Quantum amplitude amplification and estimation. In:
Quantum computation and quantum information: a
millennium volume. AMS contemporary mathematics
series millennium volume, vol 305. American
Mathematical Society, Providence, pp 53-74

5. Buhrman H, Dürr C, Heiligman M, Høyer P, Santha
M, Magniez F, de Wolf R (2005) Quantum
algorithms for element distinctness. SIAM J Comput
34(6):1324-1330


6. Grover LK (1996) A fast quantum mechanical algo- 
rithm for database search. In: Proceeding of STOC, 
Philadelphia, pp 212-219 

7. Jeffery S, Kothari R, Magniez F (2013) Nested quan- 
tum walks with quantum data structures. In: Proceed- 
ing of SODA, New Orleans, pp 1474-1485 

8. Le Gall F (2014) Improved quantum algorithm for
triangle finding via combinatorial arguments. In:
Proceeding of FOCS, Philadelphia, pp 216-225.
quant-ph/1407.0085

9. Lee T, Magniez F, Santha M (2013) Improved quan- 
tum query algorithms for triangle finding and associa- 
tivity testing. In: Proceeding of SODA, New Orleans, 
pp 1486-1502 

10. Magniez F, Santha M, Szegedy M (2007) Quantum 
algorithms for the triangle problem. SIAM J Comput 
37(2):413-424 

11. Magniez F, Nayak A, Roland J, Santha M 
(2011) Search via quantum walk. SIAM J Comput 
40(1):142-164. quant-ph/0608026 

12. Szegedy M (2003) On the quantum query complexity 
of detecting triangles in graphs. quant-ph/0310107 

13. Szegedy M (2004) Quantum speed-up of Markov 
chain based algorithms. In: Proceeding of FOCS, 
Rome, pp 32-41 


Quantum Algorithm for Search on 
Grids 


Andris Ambainis 

Faculty of Computing, University of Latvia, 
Riga, Latvia 

Keywords 

Spatial search 

Years and Authors of Summarized 
Original Work 


2005; Ambainis, Kempe, Rivosh 


Problem Definition 


Consider a √N × √N grid, with each location
storing a bit that is 0 or 1. The locations on
the grid are indexed by (i, j), where i, j ∈
{0, 1, ..., √N - 1}; a_{i,j} denotes the value stored
at the location (i, j).

The task is to find a location storing a_{i,j} = 1.
This problem is an abstract model for search
in a two-dimensional database, with each location
storing a variable x_{i,j} with more than two values.
The goal is to find x_{i,j} that satisfies certain
constraints. One can then define new variables
a_{i,j} with a_{i,j} = 1 if x_{i,j} satisfies the constraints
and search for i, j satisfying a_{i,j} = 1.

The grid is searched by a “robot,” which at 
any moment of time is at one location (i, j). In
one time unit, the robot can either examine the 
current location or move one step in one of the 
four directions (left, right, up, or down). 

In a probabilistic version of this model, the 
robot is probabilistic. It makes its decisions 
(querying the current location or moving) 
randomly according to prespecified probability 
distributions. At any moment of time, such a 
robot is at a probability distribution over the 
locations of the grid. In the quantum case, one 
has a “quantum robot” [5] which can be in a 
quantum superposition of locations (i, j) and is
allowed to perform transformations that move it 
at most one step at a time. 

There are several ways to make this model of 
a “quantum robot” precise [1] and they all lead to 
similar results. 

The simplest to define is the Z-local model
of [1]. In this model, the robot’s state space is
spanned by states |i, j, a⟩ with i, j representing
the current location and a being the internal
memory of the robot. The robot’s state |ψ⟩ can
be any quantum superposition of those: |ψ⟩ =
Σ_{i,j,a} α_{i,j,a} |i, j, a⟩, where α_{i,j,a} are complex
numbers such that Σ_{i,j,a} |α_{i,j,a}|^2 = 1. In one step, the
robot can either perform a query of the value at
the current location or a Z-local transformation.


A query is a transformation that leaves the i, j
parts of a state |i, j, a⟩ unchanged and modifies
the a part in a way that depends only on the value
a_{i,j}. A Z-local transformation is a transformation
that maps any state |i, j, a⟩ to a superposition
that involves only states with the robot being either
at the same location or at one of the four adjacent
locations (|i, j, b⟩, |i - 1, j, b⟩, |i + 1, j, b⟩,
|i, j - 1, b⟩, or |i, j + 1, b⟩, where the content of the
robot’s memory b is arbitrary).

The problem generalizes naturally to a
d-dimensional grid of size N^{1/d} × N^{1/d} × ... ×
N^{1/d}, with the robot being allowed to query or move
one step in one of the d directions in one unit of
time.


Key Results 


Early Results 

This problem was first studied by Benioff [5]
who considered the use of the usual quantum
search algorithm by Grover [9] in this setting.
Grover’s algorithm allows one to search a collection
of N items a_{i,j} with O(√N) queries. However,
it does not respect the structure of a grid. Between
any two queries, it performs a transformation that
may require the robot to move from any location
(i, j) to any other location (i', j'). In the robot
model, where the robot is only allowed to move
one step in one time unit, such a transformation
requires O(√N) steps to perform. Implementing
Grover’s algorithm, which requires O(√N)
such transformations, therefore takes O(√N) ×
O(√N) = O(N) time, providing no advantage
over the naive classical algorithm.

The first algorithm improving over the naive 
use of Grover’s search was proposed by Aaronson 
and Ambainis [1] who achieved the following 
results: 


* Search on a √N × √N grid, if it is known
that the grid contains exactly one a_{i,j} = 1, in
O(√N log^{3/2} N) steps.

* Search on a √N × √N grid, if the grid may
contain an arbitrary number of a_{i,j} = 1, in
O(√N log^{5/2} N) steps.

* Search on an N^{1/d} × N^{1/d} × ... × N^{1/d} grid, for
d ≥ 3, in O(√N) steps.


They also considered a generalization of the
problem, search on a graph G, in which the robot
moves on the vertices v of the graph G and
searches for a variable a_v = 1. In one step, the
robot can examine the variable a_v corresponding
to the current vertex v or move to another vertex
w adjacent to v. Aaronson and Ambainis [1]
gave an algorithm for searching an arbitrary
graph with grid-like expansion properties in
O(N^{1/2+o(1)}) steps. The main technique in those
algorithms was the use of Grover’s search and its
generalization, amplitude amplification [6], in
combination with “divide-and-conquer” methods
recursively breaking up a grid into smaller parts.


Quantum Walks 

The next algorithms were based on quantum
walks [3, 7, 8]. Ambainis, Kempe, and Rivosh [3]
presented an algorithm, based on a discrete
time quantum walk, which searches the
two-dimensional √N × √N grid in O(√N log N) steps,
if the grid is known to contain exactly one
a_{i,j} = 1, and in O(√N log^2 N) steps in the
general case. Childs and Goldstone [8] achieved
a similar performance, using a continuous time
quantum walk. Curiously, it turned out that the
performance of the walk crucially depended on
the particular choice of the quantum walk, both
in the discrete and continuous time, and some
very natural choices of quantum walk (e.g., one
in [7]) failed.

Besides providing an almost optimal quantum
speedup, the quantum walk algorithms also have
an additional advantage: their simplicity. The
discrete quantum walk algorithm of [3] uses just
two bits of quantum memory. Its basis states are
|i, j, d⟩, where (i, j) is a location on the grid and
d is one of the four directions: ←, →, ↑, and
↓. The basic algorithm consists of the following
simple steps:


1. Generate the state (1/(2√N)) Σ_{i,j,d} |i, j, d⟩.

2. Repeat O(√(N log N)) times:

   (a) Perform the transformation

             [ -1/2   1/2   1/2   1/2 ]
       C_0 = [  1/2  -1/2   1/2   1/2 ]
             [  1/2   1/2  -1/2   1/2 ]
             [  1/2   1/2   1/2  -1/2 ]

       on the states |i, j, ←⟩, |i, j, →⟩, |i, j, ↑⟩,
       |i, j, ↓⟩ if a_{i,j} = 0 and the transformation
       C_1 = -I on the same four states if a_{i,j} = 1.

   (b) Move one step according to the direction
       register and reverse the direction:

       |i, j, →⟩ → |i + 1, j, ←⟩,
       |i, j, ←⟩ → |i - 1, j, →⟩,
       |i, j, ↑⟩ → |i, j - 1, ↓⟩,
       |i, j, ↓⟩ → |i, j + 1, ↑⟩.

If a_{i,j} = 1 for one location (i, j), a
significant part of the algorithm’s final state will
consist of the four states |i, j, d⟩ for the location
(i, j) with a_{i,j} = 1. This can be used to detect
the presence of such a location. More precisely, if
we run the algorithm for O(√(N log N)) steps and
measure the state, we obtain one of the four states
|i, j, d⟩ with probability Θ(1/log N).

We can increase the probability of the algorithm
finding the right location (i, j) by either repeating
the algorithm or using quantum amplitude
amplification. Quantum amplitude amplification
[6] takes a quantum algorithm that succeeds
with a small probability ε and increases the
success probability to 3/4, by repeating the
quantum algorithm O(1/√ε) times. In our
case, ε = Θ(1/log N), which means that it
suffices to repeat the basic algorithm O(√(log N))
times. This increases the running time from
O(√(N log N)) for the basic algorithm to
O(√N log N).

A quantum algorithm for search on a grid can
also be derived by designing a classical algorithm
that finds a_{i,j} = 1 by performing a random walk
on the grid and then applying Szegedy’s general
translation of classical random walks to quantum
walks, with a quadratic speedup over
the classical random walk algorithm [15]. The
resulting algorithm is similar to the algorithm of
[3] described above and has the same running
time.

For an overview on related quantum algo- 
rithms using similar methods, see [2, 10]. 
Further Developments

The running time of the algorithm has been
improved to O(√(N log N)) time steps if the grid is
known to contain exactly one (i, j) with a_{i,j} = 1
and O(√N log^{1.5} N) steps in the general case.
This can be achieved in two different ways. First,
Tulsi [16] showed how to modify the quantum
walk so that, after O(√(N log N)) steps, it finds
the right (i, j) with a constant probability. This
eliminates the need to use amplitude
amplification.

Second, Ambainis et al. [4] showed that the
same result can be achieved without modifying
the quantum walk, by a simple classical
postprocessing. That is, even if the quantum walk does
not find the right (i, j), its final state is much more
likely to be an (i', j') that is close to (i, j). One
can then run the quantum walk for O(√(N log N))
steps once, measure the result, obtain a location
(i', j'), and search the nearby locations for (i, j)
with a_{i,j} = 1.

Search algorithms similar to the original 2D 
search algorithm have been analyzed for a num- 
ber of other graphs (e.g., for hierarchical net- 
works [12]). 


Applications 


The quantum algorithm for search on the grid 
by Ambainis, Kempe, and Rivosh [3] has been 
generalized by Szegedy [15], obtaining a gen- 
eral procedure for speeding up classical Markov 
chains (described in more detail in the article on 
Quantization of Markov Chains). Szegedy’s gen- 
eralization concerns a class of algorithms called 
Search by Random Walk in which one performs a 
random walk on some search space until finding 
an element with a certain property. Szegedy [15] 
showed that if a classical random walk finds a 
marked element in T steps (on average), there is 
a quantum algorithm that detects the existence of 
a marked element in O(√T) steps.

It is an open problem to extend Szegedy’s
algorithm so that it not only detects the existence
of an element with the desired property but also
finds it in O(√T) time steps. (This is known as
the “finding problem.”) A step in this direction
was made by Magniez et al. [11] who generalized
Tulsi’s algorithm for search on the grid [16] to
solve the finding problem in O(√T) steps whenever
the classical random walk is vertex transitive
and the search space has a unique element with
the desired property.

Quantum algorithms for spatial search are also
useful for designing quantum communication
protocols for the set disjointness problem. In
the set disjointness problem, one has two parties
holding inputs x ∈ {0, 1}^N and y ∈ {0, 1}^N, and
they have to determine if there is i ∈ {1, ..., N}
for which x_i = y_i = 1. (One can think of x and
y as representing subsets X, Y ⊆ {1, ..., N}
with x_i = 1 (y_i = 1) if i ∈ X (i ∈ Y).
Then, determining if x_i = y_i = 1 for some
i is equivalent to determining if X ∩ Y ≠ ∅.)

The goal is to solve the problem, communicating
as few bits between the two parties as possible.
Classically, Ω(N) bits of communication
are required [13]. The optimal quantum protocol
[1] uses O(√N) quantum bits of communication
and its main idea is to reduce the problem to
spatial search. As shown by the Ω(√N) lower
bound of [14], this algorithm is optimal.
Problem Definition


Pell’s equation is one of the oldest studied problems
in number theory. For a positive square-free
integer d, Pell’s equation is x^2 - d y^2 = 1, and the
problem is to compute integer solutions x, y of
the equation [8, 10]. The earliest algorithm for it
uses the continued fraction expansion of √d and
was developed by Indian mathematicians around AD 1000.
Lagrange showed that there are an infinite number
of solutions of Pell’s equation. All solutions
are of the form x_n + y_n √d = (x_1 + y_1 √d)^n,
where the smallest solution, (x_1, y_1), is called the
fundamental solution. The solution (x_1, y_1) may
have exponentially many bits in general in terms
of the input size, which is log d, and so cannot be
written down in polynomial time. To resolve this
difficulty, the computational problem is recast as
computing the integer closest to the regulator
R = ln(x_1 + y_1 √d). In this representation,
solutions of Pell’s equation are positive integer
multiples of R.
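For small d, the classical continued fraction method mentioned above can be sketched directly. The code below is an illustration under stated assumptions, not an efficient algorithm: the function names `pell_fundamental` and `regulator` are introduced here, and the number of iterations (and the size of the output) can grow exponentially in log d, which is exactly why the problem is recast in terms of R.

```python
import math

def pell_fundamental(d):
    """Smallest positive (x, y) with x*x - d*y*y == 1, for non-square d > 0."""
    a0 = math.isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    # state of the continued fraction expansion of sqrt(d)
    m, den, a = 0, 1, a0
    # successive convergents h/k of sqrt(d)
    h_prev, h = 1, a0
    k_prev, k = 0, 1
    while h * h - d * k * k != 1:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

def regulator(d):
    """R = ln(x_1 + y_1 * sqrt(d)), as a floating-point approximation."""
    x1, y1 = pell_fundamental(d)
    return math.log(x1 + y1 * math.sqrt(d))
```

For instance, `pell_fundamental(7)` walks through the convergents 2/1, 3/1, 5/2, 8/3 and stops at (8, 3), since 64 - 7·9 = 1.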

Solving Pell’s equation is a special case of
computing the unit group of a number field. For a
positive non-square integer Δ congruent to 0 or 1
mod 4, K = Q(√Δ) is a real quadratic number
field. Its subring O = Z[(Δ + √Δ)/2] ⊂ Q(√Δ)
is called the quadratic order of discriminant Δ.
The unit group is the set of invertible elements
of O. Units have the form ±ε^k, where k ∈ Z,
for some ε > 1 called the fundamental unit.
The fundamental unit ε can have exponentially
many bits, so an approximation of the regulator
R = ln ε is computed. In this representation
the unit group consists of integer multiples of R.
Given the integer closest to R, there are classical
polynomial-time algorithms to compute R to any
precision. There are also efficient algorithms to
test if a given number is a good approximation
to an integer multiple of a unit or to compute the
least significant digits of ε = e^R [1, 3].

Two related and potentially more difficult
problems are the principal ideal problem and
computing the class group of a number field. In
the principal ideal problem, a number field and
an ideal $I$ of $\mathcal{O}$ are given, and the problem is
to decide if the ideal is principal, i.e., whether
there exists $\alpha$ such that $I = \alpha\mathcal{O}$. If it is principal,
then one can ask for an approximation of $\ln \alpha$.
There are efficient classical algorithms to verify
that a number is close to $\ln \alpha$ [1, 3]. The class
group of a number field is the finite abelian group
defined by taking the set of fractional ideals
modulo the principal fractional ideals. The class
number is the size of the class group. Computing
the unit group, computing the class group, and
solving the principal ideal problem are three of
the main problems of computational algebraic
number theory [3]. Assuming the GRH, they are
in NP $\cap$ CoNP [9].


Key Results 


The best known classical algorithms for the prob- 
lems defined in the last section take subexponen- 
tial time, but there are polynomial-time quantum 
algorithms for them [5,7]. 


Theorem 1 Given a quadratic discriminant $\Delta$,
there is a classical algorithm that computes an
integer multiple of the regulator to within one.
Assuming the GRH, this algorithm computes the
regulator to within one and runs in expected time

$$\exp\left(\sqrt{(\log \Delta) \log\log \Delta}\right)^{O(1)}.$$


Theorem 2 There is a polynomial-time quantum
algorithm that, given a quadratic discriminant
$\Delta$, approximates the regulator of the
associated order $\mathcal{O}$ to within $\delta$ in time polynomial in $\log \Delta$
and $\log \delta$ with probability exponentially close to
one.


Corollary 1 There is a polynomial-time quan- 
tum algorithm that solves Pell’s equation. 


The quantum algorithm for Pell’s equation 
uses the existence of a periodic function on the 
reals which has period R and is one-to-one within 
each period [5,7]. There is a discrete version of 
this function that can be computed efficiently. 
This function does not have the same periodic 
property since it cannot be evaluated at arbitrary 
real numbers such as R, but it does approximate 
the situation well enough for the quantum algo- 
rithm. In particular, computing the approximate 
period of this function gives R to the closest
integer or, in other words, computes a generator 
for the unit group. 


Theorem 3 There is a polynomial-time quantum 
algorithm that solves the principal ideal problem 
in real quadratic number fields. 


Corollary 2 There is a polynomial-time quan- 
tum algorithm that can break the Buchmann- 
Williams key-exchange protocol in real quadratic 
number fields. 


Theorem 4 The class group and class number of 
a real quadratic number field can be computed in 
quantum polynomial time assuming the GRH. 


In general, one can ask to find the unit group
of an arbitrary degree number field $\mathbb{Q}(\theta)$, where
$\theta$ is the root of a polynomial with rational coefficients.
There are two parameters associated with
this problem. The first is the discriminant, which
generalizes the parameter $\Delta$ above. The second is the
degree $n$ of the number field as a vector space
over the rational numbers. In the above example
the degree is fixed at 2. The unit group of an
arbitrary degree number field can also be computed
efficiently by a quantum algorithm.


Theorem 5 ([4]) The unit group of a number
field can be computed by a quantum algorithm in
time polynomial in the logarithm of the discriminant and the
degree $n$.


This last result uses a major generalization 
of the hidden subgroup problem to continuous 
functions. A new method is used to compute the 
function that is polynomial time in the degree of 
the number field and solves the hidden subgroup 
problem for continuous groups. 


Applications 


Computationally hard number theoretic problems 
are useful for public key cryptosystems. There are 
reductions from factoring to Pell’s equation and 
Pell’s equation to the principal ideal problem, but 
no reductions are known in the opposite direction. 
The principal ideal problem forms the basis of 
the Buchmann-Williams key-exchange protocol 
[2]. Identification schemes based on this problem
have been proposed by Hamdy and Maurer
[6]. The classical exponential-time algorithms 
help determine which parameters to choose for 
the cryptosystem. The best known algorithm for 
Pell’s equation is exponentially slower than the 
best factoring algorithm. Systems based on these 
harder problems were proposed as alternatives in 
case factoring turns out to be polynomial time 
solvable. The efficient quantum algorithms can 
break these cryptosystems. 


Open Problems 


Lattice-based cryptography is the leading class of 
candidates for primitives secure against quantum 
computers. Recent systems have used lattices 
from number fields in order to make them 
more efficient. It is an open question whether 
lattice-based systems are secure against quantum 
computers, given that quantum computers 
have an exponential advantage over classical 
computers for some problems in number 
fields. 
Problem Definition


A function $F$ is said to be $r$-to-one if every
element in its image has exactly $r$ distinct preimages.

Input: an $r$-to-one function $F$.
Output: distinct $x_1$ and $x_2$ such that $F(x_1) = F(x_2)$.


Key Results 


The algorithm presented here finds collisions
in arbitrary $r$-to-one functions $F$
after only $O(\sqrt[3]{N/r})$ expected evaluations
of $F$, where $N$ is the size of the domain of $F$.
The algorithm uses the function as
a black box, that is, the only thing the
algorithm requires is the capacity to evaluate
the function. Again assuming the function
is given by a black box, the algorithm is
optimal [1], and it is more efficient than the
best possible classical algorithm which has
query complexity $\Omega(\sqrt{N/r})$. The result is
stated precisely in the following theorem and
corollary.


Theorem 1 Given an $r$-to-one function $F : X \to Y$
with $r \ge 2$ and an integer $1 \le k \le N = |X|$,
algorithm Collision(F, k) returns a collision
after an expected number of $O(k + \sqrt{N/(rk)})$
evaluations of $F$ and uses space $\Theta(k)$. In particular,
when $k = \sqrt[3]{N/r}$, then Collision(F, k)
uses an expected number of $O(\sqrt[3]{N/r})$
evaluations of $F$ and space $\Theta(\sqrt[3]{N/r})$.


Corollary 1 There exists a quantum algorithm
that can find a collision in an arbitrary $r$-to-one
function $F : X \to Y$, for any $r \ge 2$, using space
$S$ and an expected number of $O(T)$ evaluations
of $F$ for every $1 \le S \le T$ subject to
$ST^2 \ge |F(X)|$ where $F(X)$ denotes the image
of $F$.


The algorithm uses as a procedure a version of
Grover’s search algorithm. Given a function $H$
with domain size $n$ and a target $y$, Grover(H, y)
returns an $x$ such that $H(x) = y$ in expected
$O(\sqrt{n})$ evaluations of $H$.


Collision(F, k):


1. Pick an arbitrary subset $K \subseteq X$ of cardinality
   $k$. Construct a table $L$ of size $k$ where each
   item in $L$ holds a distinct pair $(x, F(x))$ with
   $x \in K$.

2. Sort $L$ according to the second entry in each
   item of $L$.

3. Check if $L$ contains a collision, that is, check
   if there exist distinct elements $(x_0, F(x_0))$,
   $(x_1, F(x_1)) \in L$ for which $F(x_0) = F(x_1)$.
   If so, go to step 6.

4. Compute $x_1$ = Grover(H, 1) where
   $H : X \to \{0,1\}$ denotes the function defined
   by $H(x) = 1$ if and only if there exists
   $x_0 \in K$ so that $(x_0, F(x)) \in L$ but $x \neq x_0$.
   (Note that $x_0$ is unique if it exists since we
   already checked that there are no collisions
   in $L$.)

5. Find $(x_0, F(x_1)) \in L$.

6. Output the collision $\{x_0, x_1\}$.
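
The control flow of Collision(F, k) can be traced with a purely classical sketch. In the Python code below, the call to Grover(H, 1) is replaced by a scan over $X$ in random order, so the sketch reproduces the six steps but not the quantum query bound; the function name, the `rng` parameter, and the representation of $X$ as a list are assumptions made for illustration.

```python
import random


def collision(F, X, k, rng=random):
    """Classical walk-through of Collision(F, k); Grover is replaced by a scan."""
    # Step 1: pick a subset K of size k and tabulate (x, F(x)).
    K = rng.sample(X, k)
    # Step 2: sort the table by the second entry.
    L = sorted(((x, F(x)) for x in K), key=lambda item: item[1])
    # Step 3: after sorting, a collision inside L shows up as adjacent equal images.
    for (x0, v0), (x1, v1) in zip(L, L[1:]):
        if v0 == v1:
            return x0, x1
    # No collision in L, so each image in L has a unique stored preimage x0.
    table = {v: x for x, v in L}

    # Step 4: H(x) = 1 iff F(x) appears in L with a stored preimage different from x.
    def H(x):
        v = F(x)
        return v in table and table[v] != x

    candidates = list(X)
    rng.shuffle(candidates)
    x1 = next(x for x in candidates if H(x))  # classical stand-in for Grover(H, 1)
    # Step 5: look up the partner of x1 in the table.
    x0 = table[F(x1)]
    # Step 6: output the collision.
    return x0, x1
```

Because $F$ is $r$-to-one with $r \ge 2$, every element of $K$ has at least one other preimage of its image, so the search in step 4 always has a solution.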


Applications 


This problem is of particular interest for cryptol- 
ogy because some functions known as hash func- 
tions are used in various cryptographic protocols. 
The security of these protocols crucially depends 
on the presumed difficulty of finding collisions in 
such functions. 
Problem Definition


Given positive real numbers $a \neq 1$, $b$, the
logarithm of $b$ to base $a$ is the unique real
number $s$ such that $b = a^s$. The notion of the
discrete logarithm is an extension of this concept
to general groups.


Problem 1 (Discrete logarithm)


INPUT: Group $G$, $a, b \in G$ such that $b = a^s$ for
some positive integer $s$.

OUTPUT: The smallest positive integer $s$ satisfying
$b = a^s$, also known as the discrete
logarithm of $b$ to the base $a$ in $G$.


The usual logarithm corresponds to the discrete
logarithm problem over the group of positive
reals under multiplication. The most common
case of the discrete logarithm problem is when
the group $G = \mathbb{Z}_p^*$, the multiplicative group of
integers between 1 and $p - 1$ modulo $p$, where
$p$ is a prime. Another important case is when the
group $G$ is the group of points of an elliptic curve
over a finite field.


Key Results 


The discrete logarithm problem in $\mathbb{Z}_p^*$, where
$p$ is a prime, as well as in the group of
points of an elliptic curve over a finite field,
is believed to be intractable for randomized
classical computers. That is, any, possibly
randomized, algorithm for the problem running
on a classical computer will take time
that is superpolynomial in the number of
bits required to describe an input to the
problem. The best classical algorithm for
finding discrete logarithms in $\mathbb{Z}_p^*$, where $p$
is a prime, is Gordon’s [4] adaptation of
the number field sieve which runs in time
$\exp(O((\log p)^{1/3} (\log\log p)^{2/3}))$.

In a breakthrough result, Shor [9] gave an 
efficient quantum algorithm for the discrete loga- 
rithm problem in any group G; his algorithm runs 
in time that is polynomial in the bit size of the 
input. 


Result 1 ([9]) There is a quantum algorithm
solving the discrete logarithm problem in any
group $G$ on $n$-bit inputs in time $O(n^3)$ with
probability at least 3/4.


Description of the Discrete Logarithm 
Algorithm 

Shor’s algorithm [9] for the discrete logarithm 
problem makes essential use of an efficient 
quantum procedure for implementing a unitary 
transformation known as the quantum Fourier 
transform. His original algorithm gave an
efficient procedure for performing the quantum
Fourier transform only over groups of the
form $\mathbb{Z}_r$, where $r$ is a “smooth” integer, but
nevertheless, he showed that this itself sufficed
to solve the discrete logarithm in the general
case. In this article, however, a more modern
description of Shor’s algorithm is given. In
particular, a result by Hales and Hallgren [5]
is used which shows that the quantum Fourier
transform over any finite cyclic group $\mathbb{Z}_r$ can be


Quantum Algorithm for the Discrete Logarithm Problem 


efficiently approximated to inverse-exponential 
precision. 

A description of the algorithm is given below.
A general familiarity with quantum notation on
the part of the reader is assumed. A good introduction
to quantum computing can be found
in the book by Nielsen and Chuang [8]. Let
$(G, a, b, \bar{r})$ be an instance of the discrete logarithm
problem, where $\bar{r}$ is a supplied upper bound
on the order of $a$ in $G$. That is, there exists a
positive integer $r \le \bar{r}$ such that $a^r = 1$. By using
an efficient quantum algorithm for order finding
also discovered by Shor [9], one can assume that
the order of $a$ in $G$ is known, that is, the smallest
positive integer $r$ satisfying $a^r = 1$. Shor’s
order-finding algorithm runs in time $O((\log \bar{r})^3)$.
Let $\epsilon > 0$. The discrete logarithm algorithm
works on three registers, of which the first two
are each $t$ qubits long, where $t := O(\log r +
\log(1/\epsilon))$, and the third register is big enough to
store an element of $G$. Let $U$ denote the unitary
transformation


$$U : |x\rangle|y\rangle|z\rangle \mapsto |x\rangle|y\rangle|z \oplus (b^x a^y)\rangle,$$


where $\oplus$ denotes bitwise XOR. Given access to
a reversible oracle for group operations in $G$, $U$
can be implemented reversibly in time $O(t^3)$ by
repeated squaring.

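Let $\mathbb{C}[\mathbb{Z}_r]$ denote the Hilbert space of functions
from $\mathbb{Z}_r$ to complex numbers. The computational
basis of $\mathbb{C}[\mathbb{Z}_r]$ consists of the delta functions
$\{|l\rangle\}_{0 \le l \le r-1}$, where $|l\rangle$ is the function that
sends the element $l$ to 1 and the other elements of
$\mathbb{Z}_r$ to 0. Let $\mathrm{QFT}_{\mathbb{Z}_r}$ denote the quantum Fourier
transform over the cyclic group $\mathbb{Z}_r$ defined as the
following unitary operator on $\mathbb{C}[\mathbb{Z}_r]$:


$$\mathrm{QFT}_{\mathbb{Z}_r} : |x\rangle \mapsto r^{-1/2} \sum_{y \in \mathbb{Z}_r} e^{-2\pi i xy/r} |y\rangle.$$


It can be implemented in quantum time
$O(t \log(t/\epsilon) + \log^2(1/\epsilon))$ up to an error of
$\epsilon$ using one $t$-qubit register [5]. Note that
for any $k \in \mathbb{Z}_r$, $\mathrm{QFT}_{\mathbb{Z}_r}$ transforms the state
$r^{-1/2} \sum_{x \in \mathbb{Z}_r} e^{2\pi i kx/r} |x\rangle$ to the state $|k\rangle$. For any
integer $l$, $0 \le l \le r-1$, define


$$|\hat{l}\rangle := r^{-1/2} \sum_{k=0}^{r-1} e^{-2\pi i lk/r} |a^k\rangle. \qquad (1)$$


Observe that $\{|\hat{l}\rangle\}_{0 \le l \le r-1}$ forms an orthonormal
basis of $\mathbb{C}[\langle a \rangle]$, where $\langle a \rangle$ is the subgroup generated
by $a$ in $G$ and is isomorphic to $\mathbb{Z}_r$, and
$\mathbb{C}[\langle a \rangle]$ denotes the Hilbert space of functions
from $\langle a \rangle$ to complex numbers.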


Algorithm 1 (Discrete logarithm)


INPUT: Elements $a, b \in G$, a quantum circuit for
$U$, the order $r$ of $a$ in $G$.

OUTPUT: With constant probability, the discrete
logarithm $s$ of $b$ to the base $a$ in $G$.

RUNTIME: A total of $O(t^3)$ basic gate operations,
including four invocations of $\mathrm{QFT}_{\mathbb{Z}_r}$ and
one of $U$.

PROCEDURE:


1. Repeat Steps (a)-(e) twice, obtaining
   $(sl_1 \bmod r, l_1)$ and $(sl_2 \bmod r, l_2)$.
   (a) $|0\rangle|0\rangle|0\rangle$
       Apply $\mathrm{QFT}_{\mathbb{Z}_r}$ to the first two registers:
   (b) $\mapsto r^{-1} \sum_{x,y \in \mathbb{Z}_r} |x\rangle|y\rangle|0\rangle$
       Apply $U$:
   (c) $\mapsto r^{-1} \sum_{x,y \in \mathbb{Z}_r} |x\rangle|y\rangle|b^x a^y\rangle$
       Apply $\mathrm{QFT}_{\mathbb{Z}_r}$ to the first two registers:
   (d) $\mapsto r^{-1/2} \sum_{l=0}^{r-1} |sl \bmod r\rangle|l\rangle|\hat{l}\rangle$
       Measure the first two registers:
   (e) $\mapsto (sl \bmod r, l)$
2. If $l_1$ is not coprime to $l_2$, abort.
3. Let $k_1, k_2$ be integers such that $k_1 l_1 +
   k_2 l_2 = 1$. Then, output $s = k_1(sl_1) +
   k_2(sl_2) \bmod r$.


The working of the algorithm is explained below.
From Eq. (1), it is easy to see that


$$|b^x a^y\rangle = r^{-1/2} \sum_{l=0}^{r-1} e^{2\pi i l(sx+y)/r} |\hat{l}\rangle.$$


Thus, the state in Step 1(c) of the above algorithm
can be written as

$$
\begin{aligned}
r^{-1} \sum_{x,y \in \mathbb{Z}_r} |x\rangle|y\rangle|b^x a^y\rangle
&= r^{-3/2} \sum_{l=0}^{r-1} \sum_{x,y \in \mathbb{Z}_r} e^{2\pi i l(sx+y)/r} |x\rangle|y\rangle|\hat{l}\rangle \\
&= r^{-3/2} \sum_{l=0}^{r-1} \left[ \sum_{x \in \mathbb{Z}_r} e^{2\pi i slx/r} |x\rangle \right] \left[ \sum_{y \in \mathbb{Z}_r} e^{2\pi i ly/r} |y\rangle \right] |\hat{l}\rangle.
\end{aligned}
$$


Now, applying $\mathrm{QFT}_{\mathbb{Z}_r}$ to the first two registers
gives the state in Step 1(d) of the above algorithm.
Measuring the first two registers gives $(sl \bmod
r, l)$ for a uniformly distributed $l$, $0 \le l \le r-1$,
in Step 1(e). By elementary number theory, it
can be shown that if integers $l_1, l_2$ are uniformly
and independently chosen between 0 and $r - 1$,
they will be coprime with constant probability. In
that case, there will be integers $k_1, k_2$ such that
$k_1 l_1 + k_2 l_2 = 1$, leading to the discovery of the
discrete logarithm $s$ in Step 3 of the algorithm
with constant probability. Since actually only an
$\epsilon$-approximate version of $\mathrm{QFT}_{\mathbb{Z}_r}$ can be applied,
$\epsilon$ can be set to be a sufficiently small constant,
and this will still give the correct discrete logarithm
$s$ in Step 3 of the algorithm with constant
probability. The success probability of Shor’s
algorithm for the discrete logarithm problem can
be boosted to at least 3/4 by repeating it a constant
number of times.
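
Steps 2 and 3 are purely classical and can be sketched in a few lines of Python. The code below takes two measurement outcomes of the form $(sl \bmod r, l)$ and recovers $s$ with the extended Euclidean algorithm; the function names are hypothetical, and `simulated_measurement` merely imitates the outcome distribution of Step 1(e) using a known $s$, so it stands in for the quantum part rather than implementing it.

```python
import random


def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def recover_log(sample1, sample2, r):
    """Steps 2-3: combine (s*l1 mod r, l1) and (s*l2 mod r, l2) into s, or None on abort."""
    (c1, l1), (c2, l2) = sample1, sample2
    g, k1, k2 = extended_gcd(l1, l2)
    if g != 1:
        return None  # Step 2: l1 and l2 are not coprime, abort
    # k1*l1 + k2*l2 == 1, hence k1*(s*l1) + k2*(s*l2) == s (mod r).
    return (k1 * c1 + k2 * c2) % r


def simulated_measurement(s, r, rng):
    """Imitate Step 1(e): a uniformly random l together with s*l mod r."""
    l = rng.randrange(r)
    return (s * l) % r, l
```

For instance, with $r = 100$ and $s = 37$, the pairs $(11, 3)$ and $(59, 7)$ yield 37, whereas a pair of outcomes with $l_1 = 4$ and $l_2 = 6$ leads to an abort.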


Generalizations of the Discrete Logarithm 
Algorithm 

The discrete logarithm problem is a special case 
of a more general problem called the hidden 
subgroup problem [8]. The ideas behind Shor’s 
algorithm for the discrete logarithm problem can 
be generalized in order to yield an efficient quan- 
tum algorithm for hidden subgroups in Abelian 
groups (see [1] for a brief sketch). It turns out that
finding the discrete logarithm of $b$ to the base $a$
in $G$ reduces to the hidden subgroup problem in
the group $\mathbb{Z}_r \times \mathbb{Z}_r$ where $r$ is the order of $a$ in
$G$. Besides the discrete logarithm problem, other
cryptographically important functions like integer
factoring, finding the order of permutations,
as well as finding self-shift-equivalent polynomials
over finite fields can be reduced to instances
of a hidden subgroup in Abelian groups.


Applications 


The assumed intractability of the discrete logarithm
problem lies at the heart of several cryptographic
algorithms and protocols. The first example
of public-key cryptography, namely, the
Diffie-Hellman key exchange [2], uses discrete
logarithms, usually in the group $\mathbb{Z}_p^*$ for a prime $p$.
The security of the US national standard Digital
Signature Algorithm (see [7] for details and more
references) depends on the assumed intractability
of discrete logarithms in $\mathbb{Z}_p^*$, where $p$ is a prime.
The ElGamal public-key cryptosystem [3] and
its derivatives use discrete logarithms in appropriately
chosen subgroups of $\mathbb{Z}_p^*$, where $p$ is a
prime. More recent applications include those in
elliptic curve cryptography [6], where the group
consists of the group of points of an elliptic curve
over a finite field.
Problem Definition


The parity of $n$ bits $x_0, x_1, \ldots, x_{n-1} \in \{0, 1\}$ is


$$x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1} = \sum_{i=0}^{n-1} x_i \bmod 2.$$


As an elementary Boolean function, parity is
important not only as a building block of digital 
logic but also for its instrumental roles in several 
areas such as error correction, hashing, discrete 
Fourier analysis, pseudorandomness, communi- 
cation complexity, and circuit complexity. The 
feature of parity that underlies its many appli- 
cations is its maximum sensitivity to the in- 
put: flipping any bit in the input changes the 
output. The computation of parity from its in- 
put bits is quite straightforward in most com- 
putation models. However, two settings deserve 
attention. 

The first is the circuit complexity of parity 
when the gates are restricted to AND, OR, and 
NOT gates. It is known that parity cannot be com- 
puted by such a circuit of a polynomial size and 
a constant depth, a groundbreaking result proved 
independently by Furst, Saxe, and Sipser [7] and 
Ajtai [1] and improved by several subsequent 
works. 

The second, and the focus of this article,
is in the decision tree model (also called the
query model or the black-box model), where
the input bits $x = x_0 x_1 \cdots x_{n-1} \in \{0,1\}^n$
are known to an oracle only, and the algorithm
needs to ask questions of the type “$x_i = ?$” to
access the input. The complexity is measured by
the number of queries. Specifically, a quantum
query is the application of the following query
gate:


$$O_x : |i, b\rangle \mapsto |i, b \oplus x_i\rangle, \quad i \in \{0, \ldots, n-1\},\ b \in \{0,1\}.$$


Key Results 


Proposition 1 There is a quantum query algorithm
computing the parity of 2 bits with probability
1 using 1 query.

Proof Denote by $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$. The
initial state of the algorithm is


$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |-\rangle.$$


Apply a query gate, using the first register for the
index slot and the second register for the answer
slot. The resulting state is


$$\frac{1}{\sqrt{2}}\left((-1)^{x_0}|0\rangle + (-1)^{x_1}|1\rangle\right) \otimes |-\rangle.$$


Applying a Hadamard gate $H = |+\rangle\langle 0| + |-\rangle\langle 1|$
on the first register brings the state to


$$(-1)^{x_0}|x_0 \oplus x_1\rangle \otimes |-\rangle.$$


Thus measuring the first register gives $x_0 \oplus x_1$
with certainty.
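
The three steps of the proof can be checked numerically. The Python sketch below tracks only the two amplitudes of the index register; it assumes, as in the proof, that the answer register stays in $|-\rangle$, so that the query acts on the index register as $|i\rangle \mapsto (-1)^{x_i}|i\rangle$. The function name is hypothetical.

```python
from math import isclose, sqrt


def parity_one_query(x0, x1):
    """Return the outcome probabilities [P(0), P(1)] of the index register."""
    # Initial state |+> on the index register (the answer register |-> is implicit).
    amp = [1 / sqrt(2), 1 / sqrt(2)]
    # One query with |-> in the answer slot: |i> picks up the phase (-1)^{x_i}.
    amp = [amp[0] * (-1) ** x0, amp[1] * (-1) ** x1]
    # Hadamard gate on the index register.
    amp = [(amp[0] + amp[1]) / sqrt(2), (amp[0] - amp[1]) / sqrt(2)]
    # Measurement probabilities in the computational basis.
    return [a * a for a in amp]
```

For each of the four inputs, all of the probability mass lands on the outcome $x_0 \oplus x_1$, up to floating-point rounding.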


Corollary 1 There is a quantum query algorithm
computing the parity of $n$ bits with probability
1 using $\lceil n/2 \rceil$ queries.


The above quantum upper bound for parity 
is tight, even if the algorithm is allowed to err 
with a probability bounded away from 1/2 [6]. In 
contrast, any classical randomized algorithm with 
bounded error probability requires n queries. This 
follows from the fact that on a random input, any 
classical algorithm not knowing all the input bits 
is correct with precisely 1/2 probability. 


Applications 


The quantum speedup for computing parity was 
first observed by Deutsch [4]. His algorithm 
uses $|0\rangle$ in the answer slot, instead of $|-\rangle$.
After one query, the algorithm has 3/4 chance 
of computing the parity, better than any classical 
algorithm (1/2 chance). The presented algorithm 
is actually a special case of the Deutsch- 
Jozsa Algorithm, which solves the following 
problem now referred to as the Deutsch-Jozsa 
Problem. 


Problem 1 (Deutsch-Jozsa Problem) Let $n \ge
1$ be an integer. Given an oracle function $f :
\{0,1\}^n \to \{0,1\}$ that satisfies either (a) $f(x)$ is
constant on all $x \in \{0,1\}^n$ or (b) $|\{x : f(x) =
1\}| = |\{x : f(x) = 0\}| = 2^{n-1}$, determine
which case it is.
When $n = 1$, the above problem is precisely
parity of 2 bits. For a general $n$, the Deutsch-Jozsa
Algorithm solves the problem using only
once the following query gate:


$$O_f : |x, b\rangle \mapsto |x, f(x) \oplus b\rangle, \quad x \in \{0,1\}^n,\ b \in \{0,1\}.$$

The algorithm starts with

$$|0^n\rangle \otimes |-\rangle.$$


It applies $H^{\otimes n}$ on the index register (the first
$n$ qubits), changing the state to


$$\frac{1}{2^{n/2}} \sum_{x \in \{0,1\}^n} |x\rangle \otimes |-\rangle.$$


The oracle gate is then applied, resulting in


$$\frac{1}{2^{n/2}} \sum_{x \in \{0,1\}^n} (-1)^{f(x)} |x\rangle \otimes |-\rangle.$$


For the second time, $H^{\otimes n}$ is applied on the index
register, bringing the state to


$$\frac{1}{2^n} \sum_{y \in \{0,1\}^n} \left( \sum_{x \in \{0,1\}^n} (-1)^{f(x) + x \cdot y} \right) |y\rangle \otimes |-\rangle. \qquad (1)$$


Finally, the index register is measured in the com- 
putational basis. The Algorithm returns “Case 
(a)” if 0” is observed, otherwise returns “Case 
(b).” 

By direct inspection, the amplitude of $|0^n\rangle$
is $\pm 1$ in Case (a) and 0 in Case (b). Thus the
algorithm is correct with probability 1. It is easy
to see that any deterministic algorithm requires
$2^n/2 + 1$ queries in the worst case; thus the
algorithm provides the first exponential quantum
versus deterministic speedup.
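
The claim about the amplitude of $|0^n\rangle$ can be verified for small $n$ by evaluating the inner sum of the final state directly. The Python sketch below does this classically with $2^n$ evaluations of $f$, so it illustrates the algebra only and says nothing about query complexity; the function names are hypothetical, and bit strings are encoded as integers in the range $0, \ldots, 2^n - 1$.

```python
def dj_amplitude(f, n, y=0):
    """Amplitude of |y> after the second Hadamard layer: 2^-n * sum_x (-1)^(f(x) + x.y)."""
    total = 0
    for x in range(2 ** n):
        # x.y is the inner product mod 2, i.e. the parity of the bitwise AND.
        dot = bin(x & y).count("1")
        total += (-1) ** (f(x) + dot)
    return total / 2 ** n


def deutsch_jozsa(f, n):
    """Decide the promise problem from the amplitude of |0^n>."""
    # Observing 0^n has probability amplitude^2: 1 in Case (a), 0 in Case (b).
    return "constant" if abs(dj_amplitude(f, n)) == 1 else "balanced"
```

For a constant function the amplitude at $y = 0^n$ is $+1$ or $-1$ depending on the constant value, while for any balanced function the terms cancel exactly.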

Note that an expected $O(1)$ number of queries
is sufficient for randomized algorithms to solve
the Deutsch-Jozsa Problem with a constant success
probability arbitrarily close to 1. Thus the
Deutsch-Jozsa Algorithm does not have much
advantage compared with error-bounded randomized
algorithms. One might also feel that the
saving of one query for computing the parity
of 2 bits by Deutsch-Jozsa Algorithm is due to
the artificial definition of one quantum query.
Thus the significance of the Deutsch-Jozsa Algorithm
is not in solving a practical problem,
but in its pioneering use of quantum Fourier
transform (QFT), of which $H^{\otimes n}$ is one, in the
pattern


QFT — Query — QFT. 


The same pattern appears in many subsequent 
quantum algorithms, including those found by 
Bernstein and Vazirani [2], Simon [9], and 
Shor [8]. 

The Deutsch-Jozsa Algorithm is also referred 
to as Deutsch Algorithm. The algorithm as pre- 
sented above is actually the result of the im- 
provement by Cleve, Ekert, Macchiavello, and 
Mosca [3] and independently by Tapp (unpub- 
lished) on the algorithm in [5]. 
Problem Definition


Associated with each number field is a finite 
abelian group called the class group. The order 
of the class group is called the class number. 
Computing the class number and the structure 
of the class group of a number field is among 
the main tasks in computational algebraic number 
theory [4]. 

A number field $F$ can be defined as a subfield
of the complex numbers $\mathbb{C}$ which is generated
over the rational numbers $\mathbb{Q}$ by an algebraic
number, i.e., $F = \mathbb{Q}(\theta)$ where $\theta$ is the root of
a polynomial with rational coefficients. The ring
of integers $\mathcal{O}$ of $F$ is the subset consisting of
all elements that are roots of monic polynomials
with integer coefficients. The ring $\mathcal{O} \subseteq F$ can
be thought of as a generalization of $\mathbb{Z}$, the ring of
integers in $\mathbb{Q}$. In particular, one can ask whether
$\mathcal{O}$ is a principal ideal domain and whether elements
in $\mathcal{O}$ have unique factorization. Another
interesting problem is computing the unit group
$\mathcal{O}^*$, which is the set of invertible algebraic integers
inside $F$, that is, elements $\alpha \in \mathcal{O}$ such that
$\alpha^{-1}$ is also in $\mathcal{O}$.
Ever since the class group was discovered by
Gauss in 1798, it has been an interesting object
of study. The class group of $F$ is the set of equivalence
classes of fractional ideals of $F$, where
two ideals $I$ and $J$ are equivalent if there exists
$\alpha \in F^*$ such that $J = \alpha I$. Multiplication of two
ideals $I$ and $J$ is defined as the ideal generated
by all products $ab$, where $a \in I$ and $b \in J$.
Much is still unknown about number fields, such 
as whether there exist infinitely many number 
fields with trivial class group. The question of the 
class group being trivial is equivalent to asking 
whether the elements in the ring of integers O of 
the number field have unique factorization. 

In addition to computing the class number and 
the structure of the class group, computing the 
unit group and determining whether given ideals 
are principal, called the principal ideal problem, 
are also central problems in computational alge- 
braic number theory. 


Key Results 


The best known classical algorithms for the class
group take subexponential time [1, 2, 4]. Assuming
the GRH, computing the class group, the unit
group, and solving the principal ideal problem are
in NP $\cap$ CoNP [10].

The following theorems state that the three 
problems defined above have efficient quantum 
algorithms [7,9]. 


Theorem 1 There is a polynomial-time quantum 
algorithm that computes the unit group of a 
constant degree number field. 


Theorem 2 There is a polynomial-time quantum 
algorithm that solves the principal ideal problem 
in constant degree number fields. 


Theorem 3 The class group and class number of 
a constant degree number field can be computed 
in quantum polynomial time assuming the GRH. 


Computing the class group means computing 
the structure of a finite abelian group given a set 
of generators for it. When it is possible to effi- 
ciently multiply group elements (including com- 
puting large powers of elements) and efficiently 
compute unique representations of each group
element, then this problem reduces to the stan- 
dard hidden subgroup problem over the integers 
and therefore has an efficient quantum algorithm. 
Ideal multiplication is efficient in number fields. 
For imaginary number fields, there are efficient 
classical algorithms for computing group ele- 
ments with a unique representation, and therefore 
there is an efficient quantum algorithm for com- 
puting the class group. 

For real number fields, there is no known way 
to efficiently compute unique representations of 
class group elements. As a result, the classical 
algorithms typically compute the unit group and 
class group at the same time. A quantum algo- 
rithm [7] is able to efficiently compute the unit 
group of a number field and then use the principal 
ideal algorithm to compute a unique quantum 
representation of each class group element. Then 
the standard quantum algorithm can be applied 
to compute the class group structure and class 
number. 


Applications 


There are factoring algorithms based on com- 
puting the class group of an imaginary number 
field. One is exponential time and the other is 
subexponential time [4]. 

Computationally hard number theoretic
problems are useful for public key cryptosystems. 
Pell’s equation reduces to the principal 
ideal problem, which forms the basis of the 
Buchmann-Williams key-exchange protocol 
[3]. Identification schemes have also been 
based on this problem by Hamdy and Maurer 
[8]. The classical exponential-time algorithms 
help determine which parameters to choose 
for the cryptosystem. Factoring reduces to 
Pell’s equation, and the best known algorithm 
for it is exponentially slower than the best 
factoring algorithm. Systems based on these 
harder problems were proposed as alternatives 
in case factoring turns out to be polynomial time 
solvable. The efficient quantum algorithms can 
break these cryptosystems. 
Open Problems


The unit group of an arbitrary degree number 
field has an efficient quantum algorithm [6], and 
computing the class group and solving the prin- 
cipal ideal problem are related to this problem. 
One open problem is to compute certain towers 
of number fields with special properties, such 
as an infinite family with constant root discrim- 
inant [5]. 
Problem Definition


The input is an undirected simple graph $G$ on $n$
vertices. The graph is given by its adjacency matrix:
For any two vertices $u$ and $v$, one can query
whether $u$ and $v$ are connected by an edge. (Note
that classical algorithms usually have access to $G$
in the form of incidence lists. However, specification
of the input graph in the form of adjacency
matrix is standard in quantum algorithms.) Two
special vertices of the graph, $s$ and $t$, are selected.
The task is to detect whether $s$ and $t$ lie in
the same connected component of $G$. Quantum
algorithms for this problem are described.

Classically, this problem can be solved in
quadratic time by a variety of algorithms. It is
easy to see that this is optimal. Also, the
st-connectivity problem is a canonical example of
a problem in RL (the class of problems solvable
in randomized logspace) [1]. Later, it was shown
to be in L (deterministic logspace) [8].


Key Results 


Previous Algorithm 

Dürr et al. [5] gave a quantum algorithm with
the following properties. Its query complexity is
$O(n^{3/2})$, and its time complexity is the same
up to logarithmic factors. The algorithm repeatedly
executes a quantum subroutine that uses
$O(\log n)$ qubits and requires quantum read-only
access to $O(n \log n)$ classical bits. This memory
is changed between the runs of the quantum
subroutine.

The algorithm is based on Borůvka’s
algorithm [4]. It solves a more general
problem of finding a minimum spanning
tree of $G$. In particular, the algorithm outputs
a list of the connected components
of $G$.


Main Algorithm 

Theorem 1 ([3]) Consider the st-connectivity
problem on an $n$-vertex graph $G$ with the
additional promise: Either $s$ and $t$ lie in different
components of $G$, or they are connected by a path
of length at most $d$. The above problem can be
solved by a quantum algorithm in $O(n\sqrt{d})$ time,
$O(n\sqrt{d})$ queries, and $O(\log n)$ space.


Thus, in the worst case of $d = n - 1$, the
complexity of the algorithm is the same as that of the
algorithm by Dürr et al. But if $d$ is small, this
algorithm performs better. This promise appears
quite naturally in practice.

The algorithm is based on the quantum algo- 
rithm for evaluating span programs [6, 7]. 


Applications 


The st-connectivity algorithm or its modifi- 
cations can be used as a quantum version of 
dynamic programming. In general, quantum 
algorithms provide no advantage in implementing 
dynamic programming. The algorithm of 
Theorem 1, although it does not have the
full power of dynamic programming, attains a 
quadratic speedup (for small values of d). In [3], 
this algorithm is combined with the color-coding 
approach [2] to solve the problem of finding 
small subgraphs. 

For example, consider the problem of detecting
the presence of a k-path in an input graph G
given by its adjacency matrix. (We assume that
k = O(1).) Color each vertex of G in a color
from {0, 1, ..., k} independently and uniformly
at random. Leave only those edges of G that
connect vertices whose colors differ by exactly
1. Add two new vertices s and t, connect s
to all vertices of color 0, and connect t to all
vertices of color k. Denote the resulting graph
by G'.

We say that a k-path in G is colored correctly
if the colors of its vertices go from 0 to k starting
with one of its end points. Thus, for any k-path of
G, the probability it is colored correctly is $\Omega(1)$.

Execute the algorithm of Theorem 1 on G'
with d = k + 2. If G contains a correctly
colored k-path, then G' has a path of length k + 2
from s to t; hence, the algorithm accepts. On the
other hand, if s and t are connected by a path
in G', then G contains a k-path (not necessarily
correctly colored). Hence, if there is no k-path
in G, the algorithm rejects for any coloring of
G. By repeating the algorithm a constant number
of times with different colorings, it is possible to
distinguish these two cases.
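
The reduction described above can be sketched classically. The following minimal Python sketch builds G' for one random coloring and, in place of the quantum algorithm of Theorem 1, uses an ordinary breadth-first search to test st-connectivity. The function names (`build_layered_graph`, `connected`, `has_k_path`) and the default number of trials are illustrative assumptions and do not come from [3]; the sketch assumes k >= 1.

```python
import random
from collections import deque


def build_layered_graph(adj, k, rng):
    """Build G' (as adjacency lists) from adjacency matrix adj for one random coloring."""
    n = len(adj)
    color = [rng.randrange(k + 1) for _ in range(n)]  # colors 0..k, uniform
    s, t = n, n + 1                                   # the two new vertices
    nbrs = [[] for _ in range(n + 2)]
    for u in range(n):
        for v in range(u + 1, n):
            # keep only edges whose endpoint colors differ by exactly 1
            if adj[u][v] and abs(color[u] - color[v]) == 1:
                nbrs[u].append(v)
                nbrs[v].append(u)
    for v in range(n):
        if color[v] == 0:
            nbrs[s].append(v)
            nbrs[v].append(s)
        if color[v] == k:
            nbrs[t].append(v)
            nbrs[v].append(t)
    return nbrs, s, t


def connected(nbrs, s, t):
    """Classical stand-in for Theorem 1: breadth-first search from s."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in nbrs[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def has_k_path(adj, k, trials=200, seed=0):
    """One-sided test: never accepts when G has no k-path."""
    rng = random.Random(seed)
    return any(connected(*build_layered_graph(adj, k, rng)) for _ in range(trials))
```

The error is one-sided, as in the text: a graph without a k-path is rejected under every coloring, while a graph with a k-path is accepted once some trial colors that path correctly.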

Classically, color coding is capable of finding 
a subgraph H in the input graph, if H is an 
arbitrary fixed tree. In the quantum case, the class 
of graphs is narrower. 


Problem 1 (Subgraph/not-a-minor promise
problem) Let H be a fixed simple graph. The
input is a graph G given by its adjacency matrix.
The task is to distinguish two cases:


• The graph G contains H as a subgraph.
• The graph G does not contain H as a minor.


Classically, this problem requires $\Omega(n^2)$
queries even if H is a single edge. The quantum
query lower bound is $\Omega(n)$.

Theorem 2 ([3]) Assume that H is a triangle
or an edge-subdivision of a star. The
subgraph/not-a-minor promise problem for H
on an n-vertex input graph can be solved
by a quantum algorithm in O(n) time. The
algorithm uses O(n) queries. If H is an
edge-subdivision of a star, the algorithm uses
$O(\log n)$ space.


Corollary 1 Assume that H is a path or an 
edge-subdivision of a claw (a 3-star). There exists 
a quantum algorithm that detects whether an n- 
vertex input graph contains H as a subgraph in 
O(n) time. The algorithm uses O(n) queries and 
O(log n) space. 

Problem Definition


Let S be any algebraic structure over which
matrix multiplication is defined, such as a field
(e.g., real numbers), a ring (e.g., integers), or
a semiring (e.g., the Boolean semiring). If we
use + and · to denote the addition and
multiplication operations over S, then the matrix
product C of two n × n matrices A and B
is defined as $C_{ij} := \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$ for all
$i, j \in \{1, 2, \ldots, n\}$. Over the Boolean semiring,
the addition and multiplication operations are the
logical OR and logical AND operations,
respectively, and thus, the matrix product C is defined
as $C_{ij} = \bigvee_{k=1}^{n} (A_{ik} \wedge B_{kj})$. In this article we
consider the following problems.


Problem 1 (Matrix multiplication) 


INPUT: Two n × n matrices A and B with entries
from S.
OUTPUT: The matrix C := AB.

Problem 2 (Matrix product verification)


INPUT: Three n × n matrices A, B, and C with
entries from S.

OUTPUT: A bit indicating whether or not C =
AB.


The matrix multiplication problem is a
well-studied problem in classical computer science.
The straightforward algorithm for matrix
multiplication that computes each entry separately
using its definition uses $O(n^3)$ operations. In 1969,
Strassen [17] presented an algorithm that
multiplies matrices over any ring using only $O(n^{2.807})$
operations, showing that the straightforward
approach was suboptimal. Since then there have
been many improvements and the complexity of
matrix multiplication remains an area of active
research.

Surprisingly, the matrix product verification
problem can be solved faster. In 1979,
Freivalds [6] presented an optimal $O(n^2)$ time
bounded-error probabilistic algorithm to solve
the matrix product verification problem over any
ring using a randomized fingerprinting technique,
which has found numerous other applications in
theoretical computer science (see, e.g., Ref. [15]).
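
As an illustration of the fingerprinting idea, here is a minimal Python sketch of Freivalds' test for integer matrices: instead of computing AB, it compares A(Br) with Cr for random 0/1 vectors r, which costs $O(n^2)$ operations per round. The function name `freivalds` and the `rounds` and `seed` parameters are illustrative assumptions.

```python
import random


def freivalds(A, B, C, rounds=20, seed=None):
    """Return False if some random fingerprint shows AB != C, else True.

    If AB == C the answer is always True; if AB != C each round detects
    the mismatch with probability at least 1/2.
    """
    rng = random.Random(seed)
    n = len(A)
    for _ in range(rounds):
        r = [rng.randrange(2) for _ in range(n)]  # random 0/1 vector
        # Three matrix-vector products, O(n^2) operations each.
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False
    return True
```

With the default of 20 rounds, a wrong product is accepted with probability at most $2^{-20}$.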

In the quantum setting, these problems are 
traditionally studied in the model of quantum 
query complexity, where we assume the entries of 
the input matrices are provided by a black box or 
an oracle. The query complexity of an algorithm 
is the number of queries made to the oracle. The 
bounded-error quantum query complexity of a 
problem is the minimum query complexity of 
any quantum algorithm that solves the problem 
with bounded error, i.e., it outputs the correct 
answer with probability greater than (say) 2/3. 
The time complexity of an algorithm refers to the 
time required to implement the remaining non- 
query operations. In this article we only consider 
bounded-error quantum algorithms. 


Key Results


It is not known if quantum algorithms can
improve the time complexity of the general
matrix multiplication problem compared to classical
algorithms. Improvements are possible for matrix
product verification and special cases of the
matrix multiplication problem, as described below.


Matrix Product Verification over Rings

According to Buhrman and Špalek [3], matrix
product verification was first studied (in an
unpublished paper) by Ambainis, Buhrman,
Høyer, Karpinski, and Kurur. Using a recursive
application of Grover's algorithm [7], they gave
an $O(n^{7/4})$ query algorithm for the problem.
The first published work on the topic is due
to Buhrman and Špalek [3], who gave an
$O(n^{5/3})$ query algorithm for matrix product
verification over any ring using a generalization
of Ambainis' element distinctness algorithm [1].
This algorithm also achieves the same query
complexity over semirings and more general
algebraic structures. The algorithm can easily
be cast in the quantum walk search framework
of Magniez, Nayak, Roland, and Santha [14] as
explained in the survey by Santha [16]. More
interestingly, they presented an algorithm with
time complexity $O(n^{5/3})$ for the problem over
fields and integral domains. Their algorithm uses
the same technique used by Freivalds [6] and
is therefore also time efficient over arbitrary
rings. Buhrman and Špalek also proved a lower
bound showing that any bounded-error quantum
algorithm must make at least $\Omega(n^{3/2})$ queries to
solve the problem over the field $\mathbb{F}_2$. This lower
bound can be extended to all rings [10].


Theorem 1 (Matrix product verification over
rings) The matrix product verification problem
over any ring can be solved by a quantum
algorithm with query complexity $O(n^{5/3})$ and time
complexity $O(n^{5/3})$. Furthermore, any quantum
algorithm must make $\Omega(n^{3/2})$ queries to solve
the problem over a ring.


Buhrman and Špalek also studied the
relationship between the complexity of their algorithm
and the number of incorrect entries in the
purported product, C, and showed that their
algorithm performs better when C has a large number
of incorrect entries [3].


Matrix Multiplication over Rings

The quantum query complexity of multiplying
two n × n matrices is easy to characterize in terms
of the input size. Clearly the query complexity is
upper bounded by the input size, $O(n^2)$. On the
other hand, if A equals the identity matrix, then
C = B and in this case the matrix multiplication
problem is equivalent to learning all the bits
of an input of size $n^2$, which requires $\Omega(n^2)$
queries. This follows, for example, from the fact
that computing the parity of $n^2$ bits requires
$\Omega(n^2)$ queries [2, 5]. This shows that the
quantum query complexity of matrix multiplication is
$\Theta(n^2)$, which is the same as the classical query
complexity. Similarly, no quantum algorithm is
known to improve the time complexity of matrix
multiplication over rings compared to classical
algorithms.

Buhrman and Špalek [3] studied the matrix
multiplication problem in terms of n and an
additional parameter $\ell$, the number of nonzero
entries in the output matrix C, and showed the
following result.


Theorem 2 The matrix multiplication problem
over any ring can be solved by a quantum
algorithm with query and time complexity upper
bounded by


$O(n^{5/3}\ell^{2/3})$ when $1 \le \ell \le \sqrt{n}$,
$O(n^{3/2}\ell)$ when $\sqrt{n} \le \ell \le n$, and
$O(n^{2}\sqrt{\ell})$ when $n \le \ell \le n^2$,
where $\ell$ is the number of nonzero entries in the
output matrix C.


When $\ell$ is small, this algorithm achieves
subquadratic time complexity, and when $\ell$
approaches $n^2$, its time complexity is close to
$O(n^3)$, which is trivial and slower than known
classical algorithms. A detailed comparison of
this quantum algorithm with classical algorithms
may be found in Ref. [3].


Boolean Matrix Product Verification 

Buhrman and Špalek [3] also studied the matrix
product verification problem over the Boolean
semiring and showed that the problem can be
solved with query and time complexity $O(n^{3/2})$.
On the other hand, the best known lower bound
is only $\Omega(n^{1.055})$ queries due to Childs, Kimmel,
and Kothari [4].


Theorem 3 (Boolean matrix product verification)
The Boolean matrix product verification
problem can be solved by a quantum algorithm
with query complexity $O(n^{3/2})$ and time
complexity $O(n^{3/2})$. Furthermore, any quantum
algorithm must make $\Omega(n^{1.055})$ queries to solve the
problem.


Boolean Matrix Multiplication 

As before, the quantum query complexity of
multiplying two n × n Boolean matrices is $\Theta(n^2)$,
since it is at least as hard as learning $n^2$ input bits.
The time complexity of Boolean matrix
multiplication can be improved to $O(n^{2.5})$ by observing
that the inner product of two Boolean vectors of
length n can be computed with $O(\sqrt{n})$ queries
using Grover's algorithm [7]. This observation
also speeds up matrix multiplication over some
other semirings.

Similar to the matrix multiplication problem
over rings, Boolean matrix multiplication can be
studied in terms of an additional parameter $\ell$, the
number of nonzero entries in the output matrix.
Indeed, the problem has been extensively studied
in this setting.

Buhrman and Špalek [3] observed that two
Boolean matrices can be multiplied with query
complexity $O(n^{3/2}\sqrt{\ell})$. This upper bound
was improved by Vassilevska Williams and
Williams [18], who presented an algorithm
with query complexity
$O(\min\{n^{1.3}\ell^{17/30},\; n^2 + n^{13/15}\ell^{47/60}\})$,
which was then improved by
Le Gall [11]. Finally, Jeffery, Kothari, and
Magniez [8] presented a quantum algorithm
for Boolean matrix multiplication that makes
$\tilde{O}(n\sqrt{\ell})$ queries. These upper bounds are
depicted in Fig. 1. The log factors present in
their algorithm were later removed to yield an
algorithm with query complexity $O(n\sqrt{\ell})$ [9].
Jeffery, Kothari, and Magniez [8] also proved a
matching lower bound of $\Omega(n\sqrt{\ell})$ when $\ell \le \epsilon n^2$
for any constant $\epsilon < 1$. Their algorithm can
also be modified to achieve time complexity
$O(n\sqrt{\ell} + \ell\sqrt{n})$ [12].

Quantum Algorithms for Matrix Multiplication and
Product Verification, Fig. 1 Upper bounds on the
quantum query complexity of Boolean matrix
multiplication


Theorem 4 (Boolean matrix multiplication)
The Boolean matrix multiplication problem can
be solved by a quantum algorithm with query
complexity $O(n\sqrt{\ell})$. Furthermore, any quantum
algorithm that solves the problem must make
$\Omega(n\sqrt{\ell})$ queries when $\ell \le \epsilon n^2$ for any constant
$\epsilon < 1$. Boolean matrix multiplication can be
solved in time $O(n\sqrt{\ell} + \ell\sqrt{n})$.


Recently the problem has also been studied 
in terms of the sparsity of the input matrix. Le 
Gall and Nishimura [13] present algorithms with 
improved time complexity in this case. Their al- 
gorithm’s time complexity is a complicated func- 
tion of the parameters and the reader is referred 
to Ref. [13] for details. 


Matrix Multiplication over Other Semirings 
Le Gall and Nishimura [13] recently initiated 
the study of matrix multiplication over semirings 
other than the Boolean semiring and presented 
algorithms with improved time complexity for 
the (max, min)-semiring and related semirings. 


Open Problems 


Several open problems remain in the time and 
query complexity settings. In the time complexity 
setting, a major open problem is whether quan- 
tum algorithms can solve the matrix multiplica- 
tion problem faster than classical algorithms over 
any ring. In the query complexity setting, the 
complexity of matrix product verification over
rings and the Boolean semiring remains open.
The best upper and lower bounds are presented
in Theorems 1 and 3. A more comprehensive
survey of the quantum query complexity of matrix
multiplication and its relation to other problems 
studied in quantum query complexity such as 
triangle finding and graph collision can be found 
in the first author’s PhD thesis [10], which also 
contains additional open problems. 

Problem Definition


This problem is concerned with the development 
of quantum methods to speed up classical algo- 
rithms based on simulated annealing (SA). 

SA is a well-known and powerful strategy to
solve discrete combinatorial optimization
problems [1]. The search space $\Sigma = \{\sigma_0, \ldots, \sigma_{d-1}\}$
consists of d configurations $\sigma_i$, and the goal is to
find the (optimal) configuration that corresponds
to the global minimum of a given cost function
$E : \Sigma \to \mathbb{R}$. Monte Carlo implementations of
SA generate a stochastic sequence of
configurations via a sequence of Markov processes that
converges to the low-temperature Gibbs
(probability) distribution, $\pi_{\beta_m}(\sigma) \propto \exp(-\beta_m E(\sigma))$.
If $\beta_m$ is sufficiently large, sampling from the
Gibbs distribution outputs an optimal
configuration with large probability, thus solving the
combinatorial optimization problem. The
annealing process depends on the choice of an
annealing schedule, which consists of a sequence
of d × d stochastic matrices (transition rules)
$S(\beta_1), S(\beta_2), \ldots, S(\beta_m)$. Such matrices are
determined, e.g., by using Metropolis-Hastings [2].
The real parameters $\beta_k$ denote a sequence of
"inverse temperatures." The implementation
complexity of SA is given by m, the number of times
that transition rules must be applied to converge
to the desired Gibbs distribution (within arbitrary
precision). Commonly, the stochastic matrices
are sparse, and each list of nonzero conditional
probabilities and corresponding configurations,
$\{\Pr_\beta(\sigma_j|\sigma_i),\, j : \Pr_\beta(\sigma_j|\sigma_i) > 0\}$, can be
efficiently computed on input $(i, \beta)$. This implies
an efficient Monte Carlo implementation of each
Markov process. When a lower bound on the
spectral gap of the stochastic matrices (i.e., the
difference between the two largest eigenvalues)
is known and given by $\Delta > 0$, one can choose
$(\beta_{k+1} - \beta_k) \propto \Delta/E_{\max}$ and $\beta_0 = 0$, $\beta_m \propto \log\sqrt{d}$.
$E_{\max}$ is an upper bound on $\max_\sigma |E(\sigma)|$.
The constants of proportionality depend on the
error probability $\epsilon$, which is the probability of
not finding an optimal solution after the transition
rules have been applied. These choices result in a
complexity $m \propto E_{\max} \log\sqrt{d}/\Delta$ for SA [3].
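
To make the classical baseline concrete, the following is a minimal Python sketch of SA on n-bit strings with a Metropolis acceptance rule, one proposed single-bit flip per schedule step, and a linear schedule of inverse temperatures from 0 to `beta_max`. The function name, the single-bit-flip proposal, and the linear schedule are illustrative assumptions; the text above only requires some sequence of transition rules satisfying the stated conditions.

```python
import math
import random


def simulated_annealing(energy, n_bits, beta_max, m, seed=None):
    """Run m Metropolis steps while raising beta linearly from 0 to beta_max.

    energy: hypothetical cost function E mapping a list of bits to a number.
    Returns the final configuration (as a tuple) and its energy.
    """
    rng = random.Random(seed)
    state = [rng.randrange(2) for _ in range(n_bits)]  # uniform start (beta = 0)
    e = energy(state)
    for k in range(1, m + 1):
        beta = beta_max * k / m          # k-th inverse temperature
        i = rng.randrange(n_bits)        # propose flipping one random bit
        state[i] ^= 1
        e_new = energy(state)
        # Metropolis rule: always accept downhill moves, accept uphill moves
        # with probability exp(-beta * (E_new - E)).
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new
        else:
            state[i] ^= 1                # reject: undo the flip
    return tuple(state), e
```

For example, with `energy = sum` (the number of ones) the unique optimal configuration is the all-zeros string, and a sufficiently long schedule with large final `beta_max` returns it with high probability.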

Quantum computers can theoretically solve 
some problems, such as integer factorization, 
more efficiently than classical computers [4]. 
This work addresses the question of whether 
quantum computers could also solve combina- 
torial optimization problems more efficiently or 
not. The answer is satisfactory in terms of $\Delta$
(Section “Key Results”). The complexity of a
quantum algorithm is determined by the number 
of elementary steps needed to prepare a quantum 
state that allows one to sample from the Gibbs 
distribution after measurement. Similar to SA, 
such a complexity is given by the number of 
times a unitary corresponding to the stochastic 
matrix is used. For simplicity, we assume that 
the stochastic matrices are sparse and disregard 
the cost of computing each list of nonzero con- 
ditional probabilities and configurations, as well 
as the cost of computing $E(\sigma)$. We also assume
$d = 2^n$ and the space of configurations $\Sigma$ is
represented by n-bit strings. Some assumptions
can be relaxed.


Problem 

INPUT: An objective function $E : \Sigma \to \mathbb{R}$,
sparse stochastic matrices $S(\beta)$ satisfying the
detailed balance condition, a lower bound
$\Delta > 0$ on the spectral gap of $S(\beta)$, an error
probability $\epsilon > 0$.

OUTPUT: A random configuration $\sigma_i \in \Sigma$ such
that $\Pr(\sigma_i \in S_0) \ge 1 - \epsilon$, where $S_0$ is the set
of optimal configurations that minimize E.


Key Results 


The main result is a quantum algorithm, referred
to as quantum simulated annealing (QSA), that
solves a combinatorial optimization problem with
high probability using $m_Q \propto E_{\max} \log\sqrt{d}/\sqrt{\Delta}$
unitaries corresponding to the stochastic
matrices [5]. The quantum speedup is in the spectral
gap, as $1/\sqrt{\Delta} \ll 1/\Delta$ when $\Delta \ll 1$.

Computationally hard combinatorial opti- 
mization problems are typically manifest in 
a spectral gap that decreases exponentially 
fast in logd, the problem size. The quadratic 
improvement in the gap is then most significant 
in hard instances. The QSA algorithm is based 
on ideas and techniques from quantum walks 
and the quantum Zeno effect. The quantum 
Zeno effect can be implemented by evolution 
randomization [6]. Nevertheless, recent results 
on “spectral gap amplification” allow for other
quantum algorithms that result in a similar 
complexity scaling [7]. 


Quantum Walks for QSA 

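A quantization of the classical random walk is
obtained by first defining a $d^2 \times d^2$ unitary matrix
X that satisfies [8-10]


$$X|\sigma_i\rangle|0\rangle = \sum_{j=0}^{d-1} \sqrt{\Pr\nolimits_\beta(\sigma_j|\sigma_i)}\;|\sigma_i\rangle|\sigma_j\rangle . \qquad (1)$$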


The configuration 0 represents a simple
configuration, e.g., $0 = \sigma_0 = 0\ldots0$ (the n-bit string),
and $\Pr_\beta(\sigma_j|\sigma_i)$ are the entries of the stochastic
matrix $S(\beta)$. The other $d^2 \times d^2$ unitary matrices
used by QSA are P, the permutation (swap)
operator that transforms $|\sigma_i\rangle|\sigma_j\rangle$ into $|\sigma_j\rangle|\sigma_i\rangle$,
and $R = 1 - 2|0\rangle\langle 0|$, the reflection operator over
$|0\rangle$.

The quantum walk is $W = X^\dagger P X P R P X^\dagger P X R$,
and the detailed balance condition implies [5]


$$W \sum_i \sqrt{\pi_\beta(\sigma_i)}\,|\sigma_i\rangle|0\rangle = \sum_i \sqrt{\pi_\beta(\sigma_i)}\,|\sigma_i\rangle|0\rangle , \qquad (2)$$


where $\pi_\beta(\sigma_i)$ are the probabilities given by the
Gibbs distribution. ($X$, $X^\dagger$, and $W$ also depend
on $\beta$.) The goal of QSA is to prepare the
corresponding eigenstate of W in Eq. 2, within certain
precision $\epsilon > 0$, and for inverse temperature
$\beta_m \propto \log d$. A projective quantum measurement
of $|\sigma_i\rangle$ on such a state outputs an optimal solution
in the set $S_0$ with probability $\Pr(S_0) \ge 1 - \epsilon$.


Evolution Randomization and QSA 
Implementation 

The QSA is based on the idea of adiabatic state
transformations [6, 11]. For $\beta = 0$, the initial
eigenstate of W is $\sum_i |\sigma_i\rangle|0\rangle/\sqrt{d}$, which can
be prepared easily on a quantum computer. The
purpose of QSA is then to drive this initial state
towards the eigenstate of W for inverse
temperature $\beta_m$, within given precision. This is achieved
by applying the sequence of unitary operations
$[W(\beta_m)]^{t_m} \cdots [W(\beta_2)]^{t_2}[W(\beta_1)]^{t_1}$ to the initial
state (Fig. 1). In contrast to SA, $(\beta_{k+1} - \beta_k) \propto 1/E_{\max}$ [11],
but the initial and final inverse
temperatures are also $\beta_0 = 0$ and $\beta_m \propto \log\sqrt{d}$.
This implies that the number of different inverse
temperatures in QSA is $m \propto E_{\max} \log\sqrt{d}$,
where the constant of proportionality depends
on $\epsilon$. The nonnegative integers $t_k$ can be sampled
randomly according to several distributions [6].
One way is to obtain $t_k$ after sampling multiple
(but constant) times from a uniform distribution
on integers between 0 and Q - 1, where
$Q = \lceil 2\pi/\sqrt{\Delta} \rceil$. The average cost of QSA is
then $m\langle t_k\rangle \propto E_{\max} \log\sqrt{d}/\sqrt{\Delta}$. One can use
Markov's inequality to avoid those (improbable)
instances where the cost is significantly greater
than the average cost. The QSA and the values of
the constants are given in detail in Fig. 1.
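
The classical bookkeeping of this schedule can be sketched as follows. This minimal Python sketch only computes the inverse temperatures and the random powers $t_k$; it does not simulate the quantum walk W. It assumes the constants quoted in the caption of Fig. 1 (step $\delta\beta = \epsilon/(2E_{\max})$, $m = \lceil 2\beta_m E_{\max}/\epsilon \rceil$, $Q = \lceil 2\pi/\sqrt{\Delta} \rceil$, and $t_k$ equal to the sum of two uniform samples); the function name `qsa_schedule` and its argument names are hypothetical.

```python
import math
import random


def qsa_schedule(e_max, beta_m, gap, eps, seed=None):
    """Return ([(beta_k, t_k)], m, Q) for the QSA evolution-randomization schedule.

    e_max: upper bound on |E|; beta_m: final inverse temperature;
    gap: lower bound Delta on the spectral gap; eps: target error.
    """
    rng = random.Random(seed)
    delta_beta = eps / (2.0 * e_max)              # assumed step, from Fig. 1
    m = math.ceil(2.0 * beta_m * e_max / eps)     # number of inverse temperatures
    q = math.ceil(2.0 * math.pi / math.sqrt(gap))
    schedule = []
    for k in range(1, m + 1):
        # t_k = t' + t'' with t', t'' uniform on {0, ..., Q - 1}
        t_k = rng.randrange(q) + rng.randrange(q)
        schedule.append((k * delta_beta, t_k))
    return schedule, m, q
```

Since each of the two uniform samples has mean (Q - 1)/2, the expected total number of applications of W in this sketch is m(Q - 1), which scales as $1/\sqrt{\Delta}$ in the gap.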


Analytical Properties of W 

The quantum walk W has eigenvalues $e^{\pm i\phi_j}$,
for $j = 0, \ldots, d - 1$, in the relevant subspace.
In particular, $\phi_0 = 0 \le \phi_1 \le \cdots \le \phi_{d-1}$
and $\phi_1 \ge \sqrt{\Delta}$ [5, 7-9]. This implies that the
relevant spectral gap for methods based on
quantum adiabatic state transformations is of order
$\sqrt{\Delta}$. The quantum speedup follows from the fact
that the complexity of such methods, recently
discussed in [6, 11-13], depends on the inverse
of the relevant gap.

Quantum Algorithms for Simulated Annealing, Fig. 1
Flow diagram for the QSA. At each step k, two integers
$t'$ and $t''$ are drawn from unif[0, Q - 1], $t = t' + t''$ is
formed, and $[W(\beta = k\,\delta\beta)]^{t}$ is applied; the loop repeats
while k < m, after which $|\sigma_i\rangle$ is measured and the output
$\sigma_i$ satisfies $\Pr(\sigma_i \in S_0) \ge 1 - \epsilon$. Under the assumptions,
the input state can be easily prepared on a quantum
computer by applying a sequence of n Hadamard gates
on n qubits. unif[0, Q - 1] is the uniform distribution
on nonnegative integers in that range and
$Q = \lceil 2\pi/\sqrt{\Delta} \rceil$. $\delta\beta = \beta_{k+1} - \beta_k = \epsilon/(2E_{\max})$ and
$m = \lceil 2\beta_m E_{\max}/\epsilon \rceil$. Like SA, the final inverse temperature is
$\beta_m = (\gamma/2)\log(2\sqrt{d}/\epsilon)$, where $\gamma$ is the gap of E, that
is, $\gamma = \min_{\sigma \notin S_0} E(\sigma) - E(S_0)$. The average cost of the
QSA is then $mQ = \lceil 2\pi\gamma E_{\max} \log(2\sqrt{d}/\epsilon)/(\epsilon\sqrt{\Delta}) \rceil$,
and dependence on $\epsilon$ can be made fully logarithmic by
repeated executions of the algorithm. A quantum
computer implementation of W can be efficiently done by
using the algorithm that computes the nonzero conditional
probabilities of the stochastic matrix $S(\beta)$

Applications


Like SA, QSA can be applied to solve
general discrete combinatorial optimization
problems [14]. QSA is often more efficient
than exhaustive search in finding the optimal
configuration. Examples of problems where
QSA can be powerful include the simulation of
equilibrium states of Ising spin glasses or Potts
models, solving satisfiability problems, or solving
the traveling salesman problem.


Open Problems 


Some (classical) Monte Carlo implementations
do not require varying an inverse temperature and
apply the same (time-independent) transition rule
S to converge to the Gibbs distribution. The
number of times the transition rule must be applied
is the so-called mixing time, which depends on
the inverse spectral gap of S [15]. The
development of quantum algorithms to speed up this
type of Monte Carlo algorithm remains open.
Also, the technique of spectral gap amplification
outputs a Hamiltonian $H(\beta)$ on input $S(\beta)$. The
relevant eigenvalue of such a Hamiltonian is
zero, and the remaining eigenvalues are $\pm\sqrt{\lambda_j}$,
where $\lambda_j \ge \Delta$. This opens the door to a quantum
adiabatic version of the QSA, in which $H(\beta)$ is
changed slowly and the quantum system remains
in an “excited” eigenstate of eigenvalue zero at
all times. The speedup is also due to the increase
in the eigenvalue gap. Nevertheless, finding a
different Hamiltonian path with the same gap,
where the adiabatic evolution occurs within the
lowest energy eigenstates of the Hamiltonians, is
an open problem.

Problem Definition


The problem is to find a vector $x \in \mathbb{C}^N$ such
that $Ax = b$, for some given inputs $A \in \mathbb{C}^{N \times N}$
and $b \in \mathbb{C}^N$. Several variants are also
possible, such as rectangular matrices A, including
overdetermined and underdetermined systems of
equations.

Unlike in the classical case, the output of this
algorithm is a quantum state on log(N) qubits
whose amplitudes are proportional to the entries
of x, along with a classical estimate of
$\|x\| := \sqrt{\sum_i |x_i|^2}$. Similarly, the input b is given as a
quantum state. The matrix A is specified
implicitly as a row-computable matrix. Specifying the
input and output in this way makes it possible to
find x in time sublinear, or even polylogarithmic,
in N. The next section has more discussion of
the relation of this algorithm to classical linear
systems solvers.


Key Results 


Suppose that: 


• $A \in \mathbb{C}^{N \times N}$ is Hermitian, has all eigenvalues
in the range $[-1, -1/\kappa] \cup [1/\kappa, 1]$ for some
known $\kappa \ge 1$, and has at most s nonzero entries per
row. The parameter $\kappa$ is called the condition
number (defined more generally to be the ratio
of the largest to the smallest singular value)
and s is the sparsity.

• There is a quantum algorithm running in time
$T_A$ that takes an input $i \in [N]$ and outputs the
nonzero entries of the ith row, together with
their location.

• Assume that $\|b\| = 1$ and that there is a
corresponding quantum algorithm to produce the
state $|b\rangle$ that runs in time $T_B$.


Define $x' := A^{-1}|b\rangle$ and $x = x'/\|x'\|$.

We use the notation x to refer to the vector
as a mathematical object and $|x\rangle$ to refer to the
corresponding quantum state on log(N) qubits.
For a variable T, let $\tilde{O}(T)$ denote a quantity
upper bounded by $T \cdot \mathrm{polylog}(T)$. The norm
of a vector $\|x\|$ is the usual Euclidean norm
$\sqrt{\sum_i |x_i|^2}$, while for a matrix $\|A\|$ is the operator
norm $\max_{\|x\|=1} \|Ax\|$, or equivalently the largest
singular value of A.


Quantum Algorithm for Linear Systems 

The main result is that $|x\rangle$ and $\|x'\|$ can
be produced, both up to error $\epsilon$, in time
$\mathrm{poly}(\kappa, s, \epsilon^{-1}, \log(N), T_A, T_B)$. More precisely,
the following run-times are known:


$\tilde{O}(\kappa T_B + \log(N)\, s^2 \kappa^2 T_A/\epsilon)$ [5] (1a)
$\tilde{O}(\kappa T_B + \log(N)\, s^2 \kappa T_A/\epsilon^3)$ [1] (1b)


A key subroutine is Hamiltonian simulation, and 
the run-times in (1) are based on the recent 
improvements in this component due to [3]. 


Hardness Results and Comparison 
to Classical Algorithms 


These algorithms are analogous to classical
algorithms for solving linear systems of equations,
but do not achieve exactly the same thing. Most
classical algorithms output the entire vector x as
a list of N numbers, while the quantum
algorithms output the state $|x\rangle$, i.e., a superposition
on log(N) qubits whose N amplitudes equal the
entries of x. This allows potentially faster
algorithms but for some tasks will be weaker. This
resembles the difference between the Quantum
Fourier Transform and the classical Fast Fourier
Transform.

To compare the classical and quantum com- 
plexities for this problem, we should consider 
classical tasks (with classical output) that can be 
solved with the help of quantum linear equations 
algorithms. One can show that better classical 
algorithms for such tasks exist only if all quantum
algorithms could be simulated more quickly by 
classical algorithms. This is because the linear 
systems problem is BQP-complete, i.e., solving 
large sparse well-conditioned linear systems of 
equations is equivalent in power to general pur- 
pose quantum computing. 

To make this precise, define
LinearSystemSample($N, \kappa, \epsilon, T_A$) to be the problem of producing
a sample $i \in [N]$ from a distribution p satisfying
$\sum_{i=1}^{N} \big|p_i - |x_i|^2\big| \le \epsilon$, where $x = x'/\|x'\|$,
$x' = A^{-1}b$, and $b = e_1$ (i.e., one in the first entry
and zero elsewhere). Additionally the eigenvalues
of A should have absolute value between $1/\kappa$ and
1, and there should exist a classical algorithm for
computing the entries of a row of A that runs in
time $T_A$. This problem differs slightly from the
version described above, but only in ways that
make it easier, so that it still makes sense to talk
about a matching hardness result.

Theorem 1 Consider a quantum circuit on
n qubits that applies two-qubit unitary gates
$U_1, \ldots, U_T$ to the $|0\rangle^{\otimes n}$ state and concludes
by outputting the result of measuring the
first qubit. It is possible to simulate this
measurement outcome up to error $\epsilon$ by reducing
to LinearSystemSample($N, \kappa, \epsilon/2, T_A$) with
$N = O(2^n T/\epsilon)$, $\kappa = O(T/\epsilon)$, and $T_A =$
poly log(N).


In other words, LinearSystemSample is at
least as hard to solve as any quantum computation
of the appropriate size. This result is nearly tight:
when combined with the algorithm of [1], the
relation between N, $\kappa$ (for linear
system solving) and n, T (for quantum circuits)
is known to be nearly optimal, while the
correct $\epsilon$ dependence is known up to a polynomial
factor.

Theorem 1 can also rule out classical
algorithms for LinearSystemSample($N, \kappa, \epsilon, T_A$).
Known algorithms for the problem (assuming
for simplicity that A is s-sparse) run in time
$\mathrm{poly}(N)\,\mathrm{poly\,log}(\kappa/\epsilon) + N T_A$ (direct solvers),
$N\,\mathrm{poly}(\kappa)\,\mathrm{poly\,log}(1/\epsilon)\,T_A$ (iterative methods),
or even $s^{\kappa \ln(1/\epsilon)}\,\mathrm{poly\,log}(N)$ (direct expansion
of $x \approx \sum_{n \le \kappa \ln(1/\epsilon)} (I - A)^n b$, assuming A
is positive semidefinite). Depending on the
parameters $N, \kappa, \epsilon, s$, a different one of these may
be optimal. And from Theorem 1 it follows (a)
that any nontrivial improvement in these
algorithms would imply a general improvement in the
ability of classical computers to simulate
quantum mechanics and (b) that such improvement is
impossible for algorithms that use the function
describing A in a black-box manner (i.e., as an
oracle).
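
The direct expansion mentioned above is the truncated Neumann series for $A^{-1}$. As a small numerical illustration, the following Python sketch sums the first $\lceil \kappa \ln(1/\epsilon) \rceil$ powers of (I - A) applied to b, using dense list-of-lists matrices. It assumes A is symmetric with eigenvalues in $[1/\kappa, 1]$, so that the series converges; the function name `neumann_solve` is illustrative, and the dense arithmetic here does not reflect the sparse, sampling-based cost quoted in the text.

```python
import math


def neumann_solve(A, b, kappa, eps):
    """Approximate x = A^{-1} b by sum_{n=0}^{K} (I - A)^n b, K = ceil(kappa ln(1/eps)).

    Assumes the eigenvalues of A lie in [1/kappa, 1]; the truncation error is
    then at most about kappa * eps * ||b||.
    """
    n_terms = math.ceil(kappa * math.log(1.0 / eps))
    size = len(b)
    term = list(b)   # current term (I - A)^n b, starting with n = 0
    x = list(b)      # running partial sum
    for _ in range(n_terms):
        # term <- (I - A) term
        term = [term[i] - sum(A[i][j] * term[j] for j in range(size))
                for i in range(size)]
        x = [x[i] + term[i] for i in range(size)]
    return x
```

For instance, for A = [[0.75, 0.25], [0.25, 0.75]] (eigenvalues 1/2 and 1, so $\kappa = 2$) and b = (1, 0), the exact solution is (1.5, -0.5), and the truncated series approaches it geometrically.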


Applications and Extensions 


Linear system solving is usually a subroutine in 
a larger algorithm, and the following algorithms 
apply it to a variety of settings. Complexity anal- 
yses can be found in the cited papers, but since 
hardness results are not known for them, we 
cannot say definitively whether they outperform 
all possible classical algorithms. 

Machine Learning

A widely used application of linear systems of
equations is least-squares estimation
of a model [6]. In this problem, we are given
a matrix $A \in \mathbb{R}^{n \times p}$ with $n \ge p$ (for an
overdetermined model) along with a vector $b \in \mathbb{R}^n$, and we
wish to compute $\arg\min_{x \in \mathbb{R}^p} \|Ax - b\|$. If A is
well conditioned, sparse, and implicitly specified,
then the state $|x\rangle$ can be found quickly [6],
and from this, features of x can be extracted by
measurement.


Differential Equations 

Consider the differential equation [2]

$$\dot{x}(t) = A(t)x(t) + b(t), \qquad x(t) \in \mathbb{R}^N. \qquad (2)$$

One of the simplest ways to solve this is to
discretize time to take values $t_1 < \ldots < t_m$ and
approximate


$$x(t_{i+1}) \approx x(t_i) + \big(A(t_i)x(t_i) + b(t_i)\big)(t_{i+1} - t_i). \qquad (3)$$

By treating $(x(t_1), \ldots, x(t_m))$ as a single vector
of size Nm, we can find this vector as a solution
of the linear system of equations specified by (3).
More sophisticated higher-order solvers can also
be made quantum; see [2] for details.
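
To show how the Euler recurrence (3) becomes a single linear system, the following Python sketch treats the scalar case N = 1. It builds the m × m matrix M and right-hand side with rows $x_1 = x_0$ and $x_{i+1} - (1 + a(t_i)h_i)\,x_i = b(t_i)h_i$, then solves the system classically by forward substitution. The function names and the restriction to N = 1 are simplifying assumptions; a quantum solver would be applied to the same system rather than to the substitution step.

```python
def euler_system(a, b, x0, times):
    """Build (M, rhs) encoding forward Euler for x'(t) = a(t) x(t) + b(t), x(times[0]) = x0."""
    m = len(times)
    M = [[0.0] * m for _ in range(m)]
    rhs = [0.0] * m
    M[0][0] = 1.0          # first row: x_1 = x0
    rhs[0] = x0
    for i in range(m - 1):
        h = times[i + 1] - times[i]
        M[i + 1][i + 1] = 1.0
        M[i + 1][i] = -(1.0 + a(times[i]) * h)
        rhs[i + 1] = b(times[i]) * h
    return M, rhs


def solve_lower_triangular(M, rhs):
    """Forward substitution for a lower-triangular system M x = rhs."""
    x = []
    for i, row in enumerate(M):
        acc = rhs[i] - sum(row[j] * x[j] for j in range(i))
        x.append(acc / row[i])
    return x
```

For example, with a(t) = -1, b(t) = 0, x0 = 1, and 101 equally spaced times on [0, 1], the last entry of the solution approximates $e^{-1}$ to within the first-order Euler error.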


Boundary-Value Problems 

The solution to PDEs can also be expressed
in terms of the solution to a linear system of
equations [4]. For example, in Poisson's equation
we are given a function $Q : \mathbb{R}^3 \to \mathbb{R}$ and want
to find $u : \mathbb{R}^3 \to \mathbb{R}$ such that $-\nabla^2 u = Q$.
By defining x and b to be discretized versions
of u, Q, this PDE becomes an equation of the
form Ax = b. One challenge is that if A is the
finite-difference operator (i.e., discretized second
derivative) for an L × L × L box, then its condition
number will scale as $L^2$. Since the total number
of points is $O(L^3)$, this means the quantum
algorithm cannot achieve a substantial speedup.
Classically this condition number is typically
reduced by using preconditioners. A method for
using preconditioners with the quantum linear
system solver was presented in [4], along with
an application to an electromagnetic scattering
problem. The resulting complexity is still not
known.
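
The $L^2$ growth of the condition number mentioned above can be checked numerically in a simple setting. This Python sketch assumes the standard second-difference matrix (2 on the diagonal, -1 on the off-diagonals) on L interior grid points with Dirichlet boundary conditions, whose eigenvalues are known in closed form as $2 - 2\cos(k\pi/(L+1))$ for k = 1, ..., L. For an L × L × L box with the same stencil in each direction, the extreme eigenvalues are three times the one-dimensional ones, so the ratio is unchanged. The function name is illustrative.

```python
import math


def laplacian_condition_number(L):
    """Condition number of the L x L second-difference matrix tridiag(-1, 2, -1)."""
    lam_min = 2.0 - 2.0 * math.cos(math.pi / (L + 1))        # k = 1
    lam_max = 2.0 - 2.0 * math.cos(L * math.pi / (L + 1))    # k = L
    return lam_max / lam_min
```

Doubling L roughly quadruples the returned value, consistent with a condition number of order $L^2$.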

Problem Definition


Spatial Search and Walk Processes 

Spatial search by quantum walk is database 
search with the additional constraint that one 
is required to move through the search space 
that obeys some locality structure. For example, 
the data items may be stored at the vertices of a 
two-dimensional grid. The requirement of moves 
along the edges of the grid captures the cost 
of accessing different items starting from some 
fixed position in the database. 

One possible way of carrying out spatial
search is by performing a random walk on the 
search space or its quantum analog, a quantum 
walk. The complexity of spatial search by quan- 
tum walk is strongly tied to the quantum hitting 
time [19] of the walk. 

Let S, with |S| = n, be a finite set of states. 
Assume that a subset $M \subseteq S$ of states are
marked. We are given a procedure C that, on input
$x \in S$ and an associated data structure d(x),
checks whether the state x is marked. The goal is
either to find a marked state when promised that
$M \neq \emptyset$ (search version) or to determine whether
M is nonempty (decision version). 

The algorithm progresses in stages. In the 
setup stage, we access some state of S (usually 
a random state). In the walk stage we move 
from state to state, performing a spatial walk as 
described below. The moves are called updates. 
In addition, in the walk stage we perform checks 
to see if the current state is marked at steps 
selected by the algorithm. 

In the classical setting, the transition probabilities
of the spatial walk are described by a
stochastic matrix $P = (p_{x,y})_{x,y \in S}$. This makes
the walk a Markov chain. In every move the
algorithm must perform a random transition according
to P. The possible $x \to y$ moves,
i.e., those with $p_{x,y} \neq 0$, form the edges of a
(directed) graph G, and we say that the Markov
chain P has locality structure G.
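
As a small illustration, the sketch below builds the stochastic matrix of a
lazy random walk on a two-dimensional grid and reads off its locality
structure. The wrap-around (torus) boundary, the laziness parameter 1/2, and
all function names are assumptions made for the example.

```python
import numpy as np

def grid_walk(side):
    """Lazy walk on a side x side torus: stay w.p. 1/2, else a uniform neighbor."""
    n = side * side
    P = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            x = r * side + c
            P[x, x] += 0.5
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                y = ((r + dr) % side) * side + (c + dc) % side
                P[x, y] += 0.125
    return P

def locality_structure(P):
    """Directed edges x -> y with p_{x,y} != 0."""
    return {(int(x), int(y)) for x, y in zip(*np.nonzero(P))}

P = grid_walk(4)
G = locality_structure(P)
```

Each of the 16 states has five outgoing edges here: one self-loop and four
grid neighbors.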

We define the search problem in the classical 
setting, which carries over to the quantum case 
with little modification: 


INPUT: Markov chain P on set S, marked subset
$M \subseteq S$ that is implicitly specified by a checking
procedure C, and the associated costs:


Cost type | Setup | Update | Checking
Notation  | S     | U      | C


OUTPUT: a marked state if one exists (search 
version) or a Boolean return value that indi- 
cates whether M is empty or not (decision 
version). 


The algorithm is required to be correct with 
probability at least 2/3 in either case, the search 
or the decision problem. The significance of the 
setup cost, which is incurred only once, will be 
clearer when we see some applications. Often 
we can choose between several competing walks, 
and we would like to design the one with mini- 
mum total cost. 

In the quantum case, the random process P is 
replaced by a quantum walk $W_P$ that has the same
locality structure as P. The costs S, U, C reflect 
the costs of quantum operations. 


The Quantum Walk Algorithm 

Designing a quantum analog of P is not so 
straightforward, since stochastic matrices have no 
immediate unitary equivalents. One either needs 
to abandon the discrete-time nature of the walk 
[15] or define the walk operator on a space other 
than $\mathbb{C}^S$. Here we take the second route.

We say that a Markov chain P is irreducible
if its underlying digraph is strongly connected.
Let P be an irreducible Markov chain, let $\pi$
be its unique stationary distribution, and let $P^*$
(with $P^* = (p^*_{x,y})$) denote the time-reversed
Markov chain, where $p^*_{x,y} := \pi_y p_{y,x} / \pi_x$. Define
the following vectors in the vector space $\mathbb{C}^S$:

$|p_x\rangle = \sum_{y \in S} \sqrt{p_{x,y}}\, |y\rangle$ and
$|p^*_y\rangle = \sum_{x \in S} \sqrt{p^*_{y,x}}\, |x\rangle$.

Define the unitary operator $W_P := R_1 R_2$ on
$\mathbb{C}^{S \times S}$ as the product of the two reflections

$R_2 := \sum_{x \in S} |x\rangle\langle x| \otimes (2|p_x\rangle\langle p_x| - I)$ and
$R_1 := \sum_{y \in S} (2|p^*_y\rangle\langle p^*_y| - I) \otimes |y\rangle\langle y|$.

The operator $W_P$ is called the quantum analog of P, or the
discrete-time quantum walk operator arising
from P, and may be viewed as a walk on
the edges of the underlying graph G. We
define a "checking" operator on $\mathbb{C}^S$, based
on whether or not the current state is marked:

$O_M := \sum_{x \notin M} |x\rangle\langle x| - \sum_{x \in M} |x\rangle\langle x|$.
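
The definitions above can be sketched numerically for a small chain. The
block below builds $R_1$, $R_2$, and $W_P = R_1 R_2$ as dense matrices on
$\mathbb{C}^{S \times S}$, ordering the basis so that $|x\rangle|y\rangle$
has index x·n + y. The example chain and all function names are
illustrative assumptions, and dense matrices are only practical for very
small S.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a probability vector."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def szegedy_walk(P):
    n = P.shape[0]
    pi = stationary_distribution(P)
    # Time-reversed chain: p*_{y,x} = pi_x p_{x,y} / pi_y.
    P_star = (P.T * pi[None, :]) / pi[:, None]
    I = np.eye(n)
    R1 = np.zeros((n * n, n * n))
    R2 = np.zeros((n * n, n * n))
    for z in range(n):
        E = np.zeros((n, n))
        E[z, z] = 1.0                  # projector |z><z|
        p = np.sqrt(P[z])              # amplitudes of |p_z>
        q = np.sqrt(P_star[z])         # amplitudes of |p*_z>
        R2 += np.kron(E, 2 * np.outer(p, p) - I)
        R1 += np.kron(2 * np.outer(q, q) - I, E)
    return R1 @ R2, pi

# A non-symmetric irreducible chain, chosen only for illustration.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
W, pi = szegedy_walk(P)

# The state sum_x sqrt(pi_x) |x>|p_x> should be fixed by both reflections.
phi0 = sum(np.sqrt(pi[x]) * np.kron(np.eye(3)[x], np.sqrt(P[x])) for x in range(3))
```

Because all amplitudes are real, W is a real orthogonal matrix in this
sketch, and unitarity reduces to $W W^T = I$.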

In the above description, we have suppressed 
the data structure associated with a state in the 
Markov chain for the sake of simplicity. The pre- 
cise description of the operators can be derived 
via the isometry $|x\rangle \mapsto |x\rangle|d(x)\rangle$ between the
appropriate spaces (see, e.g., Refs. [28,29]). The 
data structure becomes especially significant in 
the context of the complexity of the operators. 

A search algorithm by quantum walk is described
by a quantum circuit that acts on "registers"
or "wires" which are associated with the
space $\mathbb{C}^S \otimes \mathbb{C}^S \otimes \mathbb{C}^k$, for some k > 0. We again
suppress the registers carrying the data structure.
The first two registers hold the current edge,
and the last register holds auxiliary information,
or work space, that drives the quantum walk.
The quantum circuit implements the composition
$X := X_t X_{t-1} \cdots X_1$, where each $X_i$ is either
$W_P$ or $O_M$ acting on the edge registers, possibly
controlled by the auxiliary register, or a unitary
operator independent of P and M acting on any
of the registers. The circuit X is applied to a
suitably constructed initial state $|\phi_0\rangle$.

We associate a cost with each operator as 
a measure of its complexity, with respect to a 
resource of interest. The resource could be cir- 
cuit size or in the query model (which is the 
more typical application) the number of queries. 
We denote the cost of implementing $W_P$ as a
quantum circuit in the units of the resource of
interest by U (update cost), the cost of constructing
$O_M$ by C (checking cost), and the cost of
preparing the initial state, $|\phi_0\rangle$, of the algorithm
by S (setup cost). Every time an operator is
used, we incur the cost associated with it. This
abstraction, implicit in Ref. [3] and made explicit
in Ref. [28], allows $W_P$ and $O_M$ to be treated
as black-box operators and provides a convenient
way to capture time complexity or, in the quantum
query model, query complexity. The cost of
the sequence $X_t X_{t-1} \cdots X_1$ is the sum of the
costs of the individual operators. The observation
probability is the probability that we observe an
element of M on measuring the first register of
the final state, $|\phi_t\rangle := X|\phi_0\rangle$, in the standard
basis $(|x\rangle)_{x \in S}$. In the decision version of the
problem, we measure a fixed single qubit of the 
auxiliary register in the standard basis to obtain 
the output of the algorithm. 


Key Results 


Walk Definitions 

Quantum walks were first introduced by David 
Meyer and John Watrous to study quantum cellu- 
lar automata and quantum logspace, respectively. 
Discrete-time quantum walks were investigated 
for their own sake by Ambainis, Bach, Nayak, 
Vishwanath, and Watrous [4, 32] and Aharonov, 
Ambainis, Kempe, and Vazirani [2] on the infinite 
line and the n-cycle, respectively. The central 
issues in the early development of quantum walks 
included the definition of the walk operator, no- 
tions of mixing and hitting times, and the speedup 
achievable compared to the classical setting. 


Hitting Time 

Exponential quantum speedup of the hitting time 
between antipodes of the hypercube was shown 
by Kempe [19]. Childs, Cleve, Deotto, Farhi, 
Gutmann, and Spielman [13] presented the first 
oracle problem solvable exponentially faster by a 
quantum walk-based algorithm than by any (not 
necessarily walk-based) classical algorithm. 

The first systematic studies of quantum hitting
time on the hypercube and the d-dimensional
torus were conducted by Shenvi, Kempe,
and Whaley [34] and Ambainis, Kempe, and
Rivosh [5]. Improving upon the Grover search-based
spatial search algorithm of Aaronson
and Ambainis, Ambainis et al. [5] showed that
the d-dimensional torus with n nodes can be
searched by quantum walk in $\sqrt{n}$ steps with
observation probability $\Omega(1)$ for $d \geq 3$ and in
$\sqrt{n \log n}$ steps with observation probability
$\Omega(1/\log n)$ for d = 2 (see also Ref. [11]).
Combining the algorithm for d = 2 with
amplitude amplification [9], we get an algorithm
with observation probability $\Omega(1)$, at a cost that
is a multiplicative factor of $\sqrt{\log n}$ larger.

In the results in Refs. [13, 19], the algorithm 
has implicit knowledge of the target state, as 
the walk starts from a state whose location is 
“related” to that of the target. It is not known if 
we can achieve an exponential speedup when the 
walk starts in a state that is independent of the 
target. 


Element Distinctness 

The first result that used a quantum walk to 
solve a natural algorithmic problem, the so-called 
element distinctness problem, was due to Am- 
bainis [3]. The problem is to find out if among 
the set of s elements of a database, two are 
identical. Ambainis constructed a walk on the 
Johnson graph J(r,s) whose vertices are the r- 
size subsets of a universe of size s (in his case 
the universe corresponds to the set of all database 
elements), with two subsets connected iff their 
symmetric difference has size two. A subset is 
marked, i.e., it is an element of M, if it captures 
two identical database elements. In the quantum 
(but also the classical) query model, the setup 
cost is r, which stands for the cost of down- 
loading r (random) database elements. Update 
incurs a constant cost, as it requires reading a 
new database element and forgetting an old one. 
Furthermore, since we are in the query model, 
the checking cost is zero, since whether a state 
is marked can be deduced from the currently held 
database elements without any further download. 
Ambainis ingeniously balanced the costs S and
U, finding that in the quantum case, the optimum
choice for r is $s^{2/3}$, leading to a query complexity
of $s^{2/3}$ (this is a nontrivial balance: in the
classical case, the same walk gives no speedup).
In contrast, the Grover algorithm, the inspiration
behind Ambainis' work, has no balancing option:
its setup and update costs are zero in the query
model. (The Grover search may be viewed as a
quantum walk on the complete graph.) It turns
out that the above walk-based quantum query
algorithm with complexity $O(s^{2/3})$ matches the
lower bound due to Aaronson and Shi [1].
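
The balance can be reproduced with a short calculation. The sketch below
plugs S = r, U = 1, C = 0 into a cost of the form
$S + \frac{1}{\sqrt{\varepsilon}}(\frac{1}{\sqrt{\delta}} U + C)$ (the form
of Theorem 4 later in this entry). It assumes a spectral gap
$\delta \approx 1/r$ for the walk on the Johnson graph and a marked fraction
$\varepsilon \approx (r/s)^2$ when exactly one colliding pair exists. All
constants are dropped, so only the scaling is meaningful.

```python
import math

def walk_query_cost(r, s):
    """Query cost r + s / sqrt(r), up to constants (see the stated assumptions)."""
    S, U, C = r, 1.0, 0.0
    delta = 1.0 / r           # assumed spectral gap, constants dropped
    eps = (r / s) ** 2        # assumed fraction of marked r-subsets, constants dropped
    return S + (U / math.sqrt(delta) + C) / math.sqrt(eps)

s = 10**6
# Coarse grid search over subset sizes r; the step 97 is arbitrary.
best_r = min(range(2, s, 97), key=lambda r: walk_query_cost(r, s))
best_cost = walk_query_cost(best_r, s)
```

The minimizer of $r + s/\sqrt{r}$ is $r = (s/2)^{2/3}$, so both the best r
and the resulting cost are within constant factors of $s^{2/3}$.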


General Markov Chains 

Ambainis’s result is based on the quantum hitting 
time of J(r,s) for a marked set of relative size 
Ge. In Ref. [35], Szegedy investigates the hit- 
ting time of quantum walks arising from general 
Markov chains. His definitions (walk operator, 
hitting time) are abstracted directly from Ref. [3] 
and are consistent with prior literature, although 
slightly different in presentation. 

For a Markov chain P, the (classical)
average hitting time of M can be expressed
in terms of the leaking walk matrix $P_M$,
which is obtained from P by deleting all rows
and columns indexed by states of M. Let
$v_1, \dots, v_{n-m}$ be the normalized eigenvectors
of $P_M$, and let $\lambda_1, \dots, \lambda_{n-m}$ be the associated
eigenvalues, where m = |M|. Let h(x, M)
denote the expected time to reach M from x.
Let $\mu : S \to \mathbb{R}^+$ be the initial distribution
from which we start and $\mu'$ its restriction to
$S \setminus M$. Denote the vector $(\sqrt{\mu'(x)})_{x \in S \setminus M}$
by u. Then the average hitting time of M
is

$h = \sum_{x \in S} \mu(x)\, h(x, M) = \sum_{k=1}^{n-m} \frac{|(v_k, u)|^2}{1 - \lambda_k}.$

Although the leaking walk matrix $P_M$ is not
stochastic, one can consider the absorbing
walk matrix $P' = \begin{pmatrix} P_M & P'' \\ 0 & I \end{pmatrix}$, where $P''$ is
the matrix obtained from P by deleting the
rows indexed by M and the columns indexed
by $S \setminus M$. The walk P' behaves like P but
is absorbed by the first marked state it hits.
Consider the quantum analog $W_{P'}$ of P' and
$|\phi_0\rangle := \sum_{x \in S} \sqrt{\pi(x)}\, |x\rangle |p_x\rangle$, where $\pi$ is the
stationary distribution of P. The state $|\phi_0\rangle$ is
stationary for $W_P$, i.e., an eigenvector with
eigenvalue 1. Define the quantum hitting time,
H, of set M to be the smallest t for which
$\| W_{P'}^t |\phi_0\rangle - |\phi_0\rangle \| \geq 0.1$. Note that the cost of
$W_{P'}$ is proportional to U + C.
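
The spectral expression for the classical hitting time can be checked
against a direct computation. The sketch below assumes a symmetric P (a lazy
walk on a cycle, chosen only for illustration) and a uniform starting
distribution, so that every entry of u equals $1/\sqrt{n}$; the function
names are hypothetical.

```python
import numpy as np

def cycle_walk(n):
    """Symmetric lazy walk on the n-cycle."""
    P = 0.5 * np.eye(n)
    for x in range(n):
        P[x, (x + 1) % n] += 0.25
        P[x, (x - 1) % n] += 0.25
    return P

def hitting_times(P, marked):
    n = P.shape[0]
    keep = [x for x in range(n) if x not in marked]
    P_M = P[np.ix_(keep, keep)]                  # leaking walk matrix
    # Direct route: h(x, M) solves (I - P_M) h = 1 on the unmarked states.
    h_x = np.linalg.solve(np.eye(len(keep)) - P_M, np.ones(len(keep)))
    h_direct = h_x.sum() / n                     # uniform start; h(x, M) = 0 on M
    # Spectral route: sum_k |(v_k, u)|^2 / (1 - lambda_k).
    lam, V = np.linalg.eigh(P_M)                 # valid because P_M is symmetric
    nu = V.T @ np.full(len(keep), 1 / np.sqrt(n))
    h_spectral = np.sum(nu ** 2 / (1 - lam))
    return h_direct, h_spectral

h_direct, h_spectral = hitting_times(cycle_walk(12), marked={0})
```

For this particular walk the average hitting time of a single marked vertex
is $(n^2 - 1)/3$, which gives 143/3 for n = 12.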
The motivation behind this definition of quantum
hitting time is the following. The classical
hitting time measures the number of iterations
of the absorbing walk P' required to noticeably
skew the uniform starting distribution. Similarly,
the quantum hitting time bounds the number of
iterations of the following quantum algorithm
for detecting whether M is nonempty: At each
step, apply operator $W_{P'}$. If M is empty, then
P' = P and the starting state is left invariant. If
M is nonempty, then the angle between $W_{P'}^t |\phi_0\rangle$
and $W_P^t |\phi_0\rangle$ gradually increases (for t not too
large). Using an additional control register to
apply either $W_{P'}$ or $W_P$ with quantum control,
the divergence of these two states (should M be
nonempty) can be detected. The required number
of iterations is characterized by H.

It remains to compute H. When P is symmetric
and ergodic, the expression for the classical
hitting time has a quantum analog [35] (we assume
m < n/2 for technical reasons):

$H \leq \sum_{k=1}^{n-m} \frac{\nu_k^2}{\sqrt{1 - \lambda_k}}$   (1)

where $\nu_k = (v_k, u)$. Note that $u = \frac{1}{\sqrt{n}}(1, \dots, 1)$,
since P is symmetric, so $\nu_k$ is the sum of the coordinates
of $v_k$ multiplied by $1/\sqrt{n}$. From (1) and
the expression for h, one can derive an amazing
connection between the classical and quantum
hitting times:


Theorem 1 (Szegedy [35]) Let P be symmetric
and ergodic, and let h be the classical hitting
time for marked set M and uniform starting
distribution. Then the quantum hitting time of M
is at most $\sqrt{h}$. Therefore, the cost of solving the
decision version of the problem is of order
$S + \sqrt{h}\,(U + C)$.
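
One way to see how the bound follows from (1) and the spectral expression
for h is a Cauchy-Schwarz argument. This is a sketch, not spelled out in
the text. It writes $\nu_k = (v_k, u)$ and uses that the $v_k$ are
orthonormal, so $\sum_k \nu_k^2 = \|u\|^2 \leq 1$:

```latex
\begin{align*}
H \;\le\; \sum_{k=1}^{n-m} \frac{\nu_k^2}{\sqrt{1-\lambda_k}}
  &= \sum_{k=1}^{n-m} \nu_k \cdot \frac{\nu_k}{\sqrt{1-\lambda_k}} \\
  &\le \Bigl(\sum_{k=1}^{n-m} \nu_k^2\Bigr)^{1/2}
       \Bigl(\sum_{k=1}^{n-m} \frac{\nu_k^2}{1-\lambda_k}\Bigr)^{1/2} \\
  &\le 1 \cdot \sqrt{h}.
\end{align*}
```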
One can further show: 


Theorem 2 (Szegedy [35]) If P is state-transitive
and |M| = 1, then the marked state is
observed with probability at least n/h with cost
$O(S + \sqrt{h}\,(U + C))$.


The observation probability n/h can be
increased to $\Omega(1)$ with $\sqrt{h/n}$ iterations of the
algorithm from Theorem 2, using amplitude
amplification [9]. Theorems 1 and 2 imply most
quantum hitting time results of the previous
section directly, relying only on estimates of the
corresponding classical hitting times. Expression
(1) is based on a fundamental connection
between the eigenvalues and eigenvectors of
P and $W_P$. Notice that $p^*_{y,x} = p_{y,x}$ for
symmetric P, so $|p^*_y\rangle = |p_y\rangle$. So $R_1$ and $R_2$
are reflections through the subspaces generated
by $\{|p_x\rangle \otimes |x\rangle : x \in S\}$ and $\{|x\rangle \otimes |p_x\rangle : x \in S\}$,
respectively. The eigenvalues of $R_1 R_2$ can be
expressed in terms of the eigenvalues of the
mutual Gram matrix D(P) of these systems.
This matrix D(P), the discriminant matrix
of P, equals P when P is symmetric. The
formula remains fairly simple even when P is
not symmetric. In particular, the absorbing walk
P' has discriminant matrix $\begin{pmatrix} P_M & 0 \\ 0 & I \end{pmatrix}$. Finally,
the relation between D(P) and the spectral
decomposition of $W_P$ is given by:


Theorem 3 (Szegedy [35]) Let P be an arbitrary
Markov chain on a finite state space S
and let $\cos\theta_1 \geq \dots \geq \cos\theta_l$ be those singular
values of D(P) lying in the open interval (0, 1),
with associated singular vector pairs $v_j, w_j$ for
$1 \leq j \leq l$. Then the nontrivial eigenvalues of
$W_P$ (namely, those other than 1 and −1) and their
corresponding eigenvectors are
$(e^{-2i\theta_j}, R_1 w_j - e^{-i\theta_j} R_2 v_j)$ and
$(e^{2i\theta_j}, R_1 w_j - e^{i\theta_j} R_2 v_j)$ for
$1 \leq j \leq l$.
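
The eigenvalue part of this statement can be verified numerically on a
small example. The sketch below takes the entries of the discriminant
matrix to be $\sqrt{p_{x,y}\, p^*_{y,x}}$ (the Gram matrix of the two vector
systems described above) and uses a three-state reversible chain chosen only
for illustration. For this chain D(P) has singular values 1, 1/2, 0, so the
only value in (0, 1) is $\cos\theta = 1/2$.

```python
import numpy as np

# A small reversible chain; its stationary distribution is (1/4, 1/2, 1/4).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
n = len(pi)

P_star = (P.T * pi[None, :]) / pi[:, None]   # p*_{y,x} = pi_x p_{x,y} / pi_y
D = np.sqrt(P * P_star.T)                    # D_{x,y} = sqrt(p_{x,y} p*_{y,x})

# Walk operator W_P = R_1 R_2 on C^{S x S}, with |x>|y> at index x*n + y.
I = np.eye(n)
R1 = np.zeros((n * n, n * n))
R2 = np.zeros((n * n, n * n))
for z in range(n):
    E = np.zeros((n, n))
    E[z, z] = 1.0
    p, q = np.sqrt(P[z]), np.sqrt(P_star[z])
    R2 += np.kron(E, 2 * np.outer(p, p) - I)
    R1 += np.kron(2 * np.outer(q, q) - I, E)
W = R1 @ R2

# Singular values strictly inside (0, 1), up to a numerical tolerance.
sing = np.linalg.svd(D, compute_uv=False)
thetas = [float(np.arccos(sv)) for sv in sing if 1e-9 < sv < 1 - 1e-9]
walk_eigs = np.linalg.eigvals(W)

def has_eigenvalue(z, tol=1e-8):
    return bool(np.min(np.abs(walk_eigs - z)) < tol)

found = [(has_eigenvalue(np.exp(2j * t)), has_eigenvalue(np.exp(-2j * t)))
         for t in thetas]
```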


Subsequent Developments 

Magniez, Nayak, Roland, and Santha [29] used 
the Szegedy quantum analog $W_P$ of an ergodic
walk P, rather than that of its absorbing version
P', to develop a search algorithm in the style
of Ambainis [3].


Theorem 4 (Magniez, Nayak, Roland, Santha
[29]) Let P be reversible and ergodic with
spectral gap $\delta > 0$. Let M have probability either
zero or $\varepsilon > 0$ under the stationary distribution
of P. There is a quantum algorithm solving the
search problem with cost
$S + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\, U + C\right)$.

The main idea here is to apply quantum phase
estimation [14, 21] to the quantum walk $W_P$
in order to implement an approximate reflection
operator about the initial state. This operator is
then used along with the checking operator $O_M$
in an amplitude amplification scheme to get the
final algorithm.

The average classical hitting time h may be
bounded by $1/(\delta\varepsilon)$ (with $\delta$, M, $\varepsilon$ as in Theorem 4),
and this bound is tight for most known applica- 
tions. In these applications, the above algorithm 
finds marked elements with complexity at most 
that of the Szegedy algorithm. In other applica- 
tions, for instance, Triangle Finding [28], where 
the checking cost C is much larger than the 
update cost U, the complexity of the algorithm 
in Theorem 4 is asymptotically smaller. 
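
The difference between the two cost expressions is easy to see with
numbers. The sketch below substitutes the bound $h \leq 1/(\delta\varepsilon)$
for h in the Szegedy-style cost and compares it with the cost from
Theorem 4. The parameter values are hypothetical and were picked so that
checking is much more expensive than updating.

```python
import math

def szegedy_cost(S, U, C, delta, eps):
    """S + sqrt(h) * (U + C), with h replaced by its bound 1 / (delta * eps)."""
    return S + (U + C) / math.sqrt(delta * eps)

def mnrs_cost(S, U, C, delta, eps):
    """Cost from Theorem 4: S + (1 / sqrt(eps)) * (U / sqrt(delta) + C)."""
    return S + (U / math.sqrt(delta) + C) / math.sqrt(eps)

# Hypothetical parameters with C >> U.
params = dict(S=100.0, U=1.0, C=50.0, delta=1e-2, eps=1e-4)
ratio = szegedy_cost(**params) / mnrs_cost(**params)

# With free checking (C = 0) the two expressions coincide.
free_check = dict(params, C=0.0)
```

In the second expression the checking cost is multiplied only by
$1/\sqrt{\varepsilon}$ rather than by $1/\sqrt{\delta\varepsilon}$, which is
where the saving comes from.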

In the case of the two-dimensional square 
grid with n vertices, the average classical hitting 
time h is $n \log n$. This is asymptotically smaller
than $1/(\delta\varepsilon)$ when there is a single marked element.
(In this case, $1/(\delta\varepsilon) = n^2$.) Algorithms due to Ambainis
et al. [5] and Szegedy [35] find a unique
marked state with $O(\sqrt{n} \log n)$ steps of quantum
walk, a $\sqrt{\log n}$ factor larger than $\sqrt{h}$. Tulsi [36]
showed how we may find a unique marked element
in $O(\sqrt{h})$ steps. Magniez, Nayak, Richter,
and Santha [30] extended this result to show that
for any state-transitive Markov chain, a unique
marked state can be found in $O(\sqrt{h})$ steps. They
also devised a detection algorithm that solves
the decision version of the problem for any reversible
Markov chain and any number of marked
elements, in $O(\sqrt{h})$ steps (thus extending Theorem 1).

Krovi, Magniez, Ozols, and Roland [23] pre- 
sented a different quantum algorithm for find- 
ing multiple marked elements in any reversible 
Markov chain. They introduced a notion of inter- 
polation between any reversible chain P and its 
absorbing counterpart P’ and used the quantum 
analog of the interpolated walk. In the case of a 
unique marked element, the resulting algorithm 
solves the search version of the problem with 
cost $S + \sqrt{h}\,(U + C)$. The precise relationship
between the number of steps of the quantum
walk taken by the algorithm in the case of more
than one marked element and the corresponding
classical hitting time remains open. It is known
that for certain choices of P and M, the former
may be asymptotically larger than $\sqrt{h}$.

The schema due to Magniez et al. [29] de- 
scribed above has been extended in different 
ways. Jeffery, Kothari, and Magniez [17] use a 
quantum state as the data structure d(x) associated
with a state $x \in S$ in quantum algorithms
with nested walks. In this manner, they avoid 
the repeated overhead of setup cost in the inner 
quantum walks used for checking marked states. 
They solve several problems, including Triangle 
Finding, with query as well as time complexity 
matching, up to polylogarithmic factors, the per- 
formance of algorithms previously derived from 
learning graphs [7,26]. Childs, Jeffery, Kothari, 
and Magniez [8] introduced the use of a data 
structure that depends on the state transition in 
the walk. Using this, they develop quantum al- 
gorithms with nested walks, where the recursion 
occurs in the update operation. The cost incurred 
is essentially what we would expect from The- 
orem 4. This extension leads to algorithms that 
are as efficient in time as in query complexity, 
for applications such as 3-Distinctness. Indepen- 
dently, Belovs designed a different quantum walk 
algorithm [8], which leads to a similar result for 
3-Distinctness. 


Applications 


We list some quantum walk-based results for 
search problems that represent speedups over 
Grover search-based solutions. All are inspired 
by Ambainis’ algorithm for element distinctness. 


Triangle Finding 

Suppose we are given the adjacency matrix A of a 
graph on n vertices and are required to determine
if the graph contains a triangle (i.e., a clique
of size 3), using as few queries as possible to
the entries of A. The classical query complexity
of this problem is $\Theta(n^2)$. Magniez, Santha, and
Szegedy [28] gave an $O(n^{1.3})$ algorithm. This
upper bound has been improved by a sequence
of results [7, 25, 26, 29] (see also Ref. [17]) to
$O(n^{5/4})$. Several of these algorithms, including
the current best algorithm due to Le Gall [25], are
based on the quantum walk search framework. 


Matrix Product Verification and Matrix 
Multiplication 

Suppose we are given three n x n matrices A, 
B, C over a ring and are required to determine 
if $AB \neq C$, i.e., if there exist i, j such that
$\sum_k A_{ik} B_{kj} \neq C_{ij}$. We would like to make
as few queries as possible to the entries of A,
B, and C. This problem has classical query
complexity $\Theta(n^2)$. Buhrman and Spalek [10]
gave an $O(n^{5/3})$ quantum query algorithm. They
also observed that two Boolean matrices can be
multiplied with query complexity $O(n^{3/2}\sqrt{\ell})$,
where $\ell$ is the number of nonzero entries in
the product. This has since been improved in
a sequence of results [16, 24, 37] to $O(n\sqrt{\ell})$.
The algorithm due to Le Gall [24] builds upon 
quantum walk algorithms. We refer the reader to 
Ref. [22] for further work on this topic. 


Group Commutativity Testing 

Suppose we are presented with a black-box group 
specified by its k generators and are required to 
determine if the group commutes using as few 
queries as possible to the group product operation 
(i.e., queries of the form “What is the product of 
elements g and h?”). The classical query complexity
is O(k) group operations. Magniez and
Nayak [27] gave an (essentially optimal) $O(k^{2/3})$
quantum query algorithm for this problem. The
algorithm involves a quantum walk on the product
of two graphs whose vertices are ordered $l$-tuples
of distinct generators.


Forbidden Subgraph Property 

A property of graphs is called minor closed when 
the following condition holds: if a graph has 
the property, then all its minors also possess the 
property. A graph property (which need not be 
minor closed) is called a forbidden subgraph 
property (FSP) if it can be described by a finite set 
of forbidden subgraphs. Suppose we are given the 
adjacency matrix A of a graph on n vertices and 
are required to determine if the graph has a minor 
closed property $\Pi$, using as few queries as possible
to the entries of A. Childs and Kothari [12]
show that if $\Pi$ is nontrivial and is not FSP,
then it has query complexity in $\Theta(n^{3/2})$. They
complement this with a more efficient algorithm
for any minor closed property $\Pi$ that is FSP.
The algorithm has query complexity $O(n^{\alpha})$ for
some $\alpha < 3/2$ and is based on the quantum walk
search framework. 


3-Distinctness 

This is a generalization of the element
distinctness problem. Suppose we are given
elements $x_1, \dots, x_m \in \{1, \dots, m\}$ and are
asked if there exist three distinct indices i, j, k
such that $x_i = x_j = x_k$. The Ambainis
quantum walk algorithm achieves query and time
complexity $O(m^{3/4})$. The query complexity was
improved to $O(m^{5/7})$ by Belovs [6] using a new
technique, learning graphs, while the best time
complexity remained unchanged. Childs et al. [8]
later designed time-efficient query algorithms
with complexity $O(m^{5/7})$, using extensions of
the quantum walk search framework.


Open Problems 


Many issues regarding quantum analogs of 
Markov chains remain unresolved, both for the 
search problem and the closely related mixing 
problem. 


Search Problem 

Can the quadratic quantum speedup of hitting 
time for the decision version of the problem 
be extended from all reversible Markov chains 
to all ergodic ones? Can quantum walks also 
find marked elements quadratically faster than 
classical walks, in the case of reversible Markov 
chains with multiple marked states? What other 
algorithmic applications of search by quantum 
walk can be found? 


Sampling Problem 

Another wide use of Markov chains in classical 
algorithms is in generating samples from certain 
probability distributions. In particular, Markov 
chain Monte Carlo algorithms work by running 
a carefully designed ergodic Markov chain. After
a number of steps given by the mixing time
of P, the distribution over states is guaranteed to
be $\varepsilon$-close to its stationary distribution $\pi$. Such
algorithms form the basis of most randomized 
algorithms for approximating #P-complete prob- 
lems (see, e.g., Ref. [18]). The sampling problem 
may be formalized as follows: 


INPUT: Markov chain P, tolerance $\varepsilon \in (0, 1)$.
OUTPUT: A sample from a distribution that is $\varepsilon$-close
to $\pi$ in total variation distance.
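
For a small chain, the classical notion of mixing time used in this
formulation can be computed by brute force. The sketch below takes the
mixing time to be the smallest t at which every starting state is
$\varepsilon$-close to $\pi$ in total variation distance; this is one common
convention, assumed here. The lazy walk on a cycle and the function names
are illustrative.

```python
import numpy as np

def total_variation(p, q):
    return 0.5 * np.abs(p - q).sum()

def mixing_time(P, pi, eps, max_steps=10_000):
    """Smallest t such that every row of P^t is eps-close to pi."""
    Pt = np.eye(P.shape[0])
    for t in range(max_steps + 1):
        if max(total_variation(row, pi) for row in Pt) <= eps:
            return t
        Pt = Pt @ P
    raise RuntimeError("chain did not mix within max_steps")

# Lazy walk on the 8-cycle; its stationary distribution is uniform.
n = 8
P = 0.5 * np.eye(n)
for x in range(n):
    P[x, (x + 1) % n] += 0.25
    P[x, (x - 1) % n] += 0.25
pi = np.full(n, 1 / n)
t_mix = mixing_time(P, pi, eps=0.25)
```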


Notions of quantum mixing time were first 
proposed and analyzed on the line, the cycle, 
and the hypercube [2, 4, 31,32]. Kendon and Tre- 
genna [20] and Richter [33] have investigated the 
use of decoherence in improving mixing of quan- 
tum walks. Two fundamental questions about 
quantum mixing time remain open: What is the 
“most natural” definition? And when is there a 
quantum speedup over the classical mixing time?

Problem Definition


A knot invariant is a function on knots (or 
links, i.e., circles embedded in $\mathbb{R}^3$) which is
invariant under isotopy of the knot, i.e., it does 
not change under stretching, moving, tangling, 
etc. (cutting the knot is not allowed). In low 
dimensional topology, the discovery and use 
of knot invariants is of central importance. In 
1984, Jones [12] discovered a new knot invariant, 
now called the Jones polynomial $V_L(t)$, which
is a Laurent polynomial in $\sqrt{t}$ with integer
coefficients and which is an invariant of the 
link L. In addition to the important role it has 
played in low dimensional topology, the Jones 
polynomial has found applications in numerous 
fields, from DNA recombination [16] to statistical 
physics [20]. 

From the moment of the discovery of the 
Jones polynomial, the question of how hard it is
to compute became important. There is a very
simple inductive algorithm (essentially due to
Conway [5]) to compute it by changing crossings
in a link diagram, but, naively applied, this takes
exponential time in the number of crossings. It
was shown [11] that the computation of $V_L(t)$
is #P-hard for all but a few values of t where
$V_L(t)$ has an elementary interpretation. Thus, a
polynomial time algorithm for computing $V_L(t)$
for any value of t other than those elementary
ones is unlikely. Of course, the #P-hardness of 
the problem does not rule out the possibility 
of good approximations. Still, the best classical 
algorithms to approximate the Jones polynomial 
at all but trivial values are exponential. Simply 
stated, the problem becomes: 


Problem 1 For what values of t and for what
level of approximation can the Jones polynomial
$V_L(t)$ be approximated in time polynomial in the
number of crossings and links of the link L? 


Key Results 


As mentioned above, exact computation of the 
Jones polynomial for most t is #P-hard, and the
best known classical algorithms to approximate 
the Jones polynomial are exponential. The key 
results described here consider the above problem 
in the context of quantum rather than classical 
computation. 

The results concern the approximation of links 
that are given as closures of braids. (All links 
can be described this way.) Briefly, a braid of n 
strands and m crossings is described pictorially 
by n strands hanging alongside each other, with 
m crossings, each of two adjacent strands. A 
braid B may be “closed” to form a link by tying 
its ends together in a variety of ways, two of 
which are the trace closure (denoted by $B^{tr}$)
which joins the ith strand from the top right to the
ith strand from the bottom right (for each i) and
the plat closure (denoted by $B^{pl}$) which is defined
only for braids with an even number of strands by
connecting pairs of adjacent strands (beginning at
the rightmost strand) on both the top and bottom.

Quantum Approximation of the Jones Polynomial, Fig. 1
The trace closure (left) and plat closure (right) of the
same 4-strand braid


Examples of the trace and plat closure of the same 
4-strand braid are given in Fig. 1. 

For such braids, the following results have 
been shown by Aharonov, Jones, and Landau: 


Theorem 1 ([2]) For a given braid B in $B_n$ with
m crossings and a given integer k, there is a
quantum algorithm which, with all but exponentially
small probability (failure probability exponentially
small in n + m + k), outputs a complex number r with
$|r - V_{B^{tr}}(e^{2\pi i/k})| < \varepsilon d^{n-1}$, where $d = 2\cos(\pi/k)$
and $\varepsilon$ is inverse polynomial in n, k, m, using time
that is polynomial in n, m, k.


Theorem 2 ([2]) For a given braid B in $B_n$ with
m crossings and a given integer k, there is a
quantum algorithm which, with all but exponentially
small probability (failure probability exponentially
small in n + m + k), outputs a complex number r with
$|r - V_{B^{pl}}(e^{2\pi i/k})| < \varepsilon d^{n-1}$, where $d = 2\cos(\pi/k)$
and $\varepsilon$ is inverse polynomial in n, k, m, using time
that is polynomial in n, m, k.
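
To get a feel for the error window in these statements, the sketch below
evaluates $\varepsilon d^{n-1}$ with $d = 2\cos(\pi/k)$, taking the bound of
Theorem 1 as given. The function name and the sample values of n, k, and
$\varepsilon$ are illustrative assumptions.

```python
import math

def additive_window(n, k, eps):
    """Evaluate eps * d**(n - 1) with d = 2 * cos(pi / k)."""
    d = 2 * math.cos(math.pi / k)
    return eps * d ** (n - 1)

# For k = 5, d is the golden ratio, so the window grows exponentially in n.
d5 = 2 * math.cos(math.pi / 5)
windows = [additive_window(n, 5, 0.01) for n in (4, 8, 16)]

# For k = 3, d = 1 and the window stays at eps regardless of n.
flat = additive_window(10, 3, 0.5)
```

For any fixed k greater than 3, d exceeds 1 and the window widens
exponentially with the number of strands. This is why an additive guarantee
of this kind is much weaker than a multiplicative one.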


The original connection between quantum 
computation and the Jones polynomial was 
made earlier in the series of papers [6-9]. 
A model of quantum computation based on 
Topological Quantum Field Theory (TQFT) and
Chern-Simons theory was defined in [6, 9], and 
Kitaev, Larsen, Freedman, and Wang showed 
that this model is polynomially equivalent in 
computational power to the standard quantum 
computation model in [7, 8]. These results, 
combined with a deep connection between 
TQFT and the value of the Jones polynomial
at particular roots of unity discovered by Witten
13 years earlier [18], implicitly implied (without
explicitly formulating) an efficient quantum
algorithm for the approximation of the Jones
polynomial at the value $e^{2\pi i/5}$.

The approximations given by the above algorithms
are additive, namely, the result lies in
a given window, whose size is independent of 
the actual value being approximated. The for- 
mulation of this kind of additive approxima- 
tion was given in [4]; this is much weaker than 
a multiplicative approximation, which is what 
one might desire (again, see discussion in [4]). 
One might wonder if under such weak require- 
ments, the problem remains meaningful at all. It 
turns out that, in fact, this additive approxima- 
tion problem is hard for quantum computation, 
a result originally shown by Freedman, Kitaev, 
and Wang: 


Theorem 3 (Adapted from [8]) The problem of 
approximating the Jones polynomial of the plat 
closure of a braid at $e^{2\pi i/k}$ for constant k, to
within the accuracy given in Theorem 2, is BQP- 
hard. 


A different proof of this result was given 
in [19], and the result was strengthened by 
Aharonov and Arad [1] to any k which is 
polynomial in the size of the input, namely, 
for all the plat closure cases for which the 
algorithm is polynomial in the size of the 
braid.

Understanding the Algorithm
The structure of the solution described by Theorems
1 and 2 consists of four steps:


1. Mapping the Jones polynomial computation 
to a computation in the Temperley-Lieb al- 
gebra. There exists a homomorphism of the 
braid group inside the so-called Temperley- 
Lieb algebra (this homomorphism was the 
connection that led to the original discovery 
of the Jones polynomial in [12]). Using this 
homomorphism, the computation of the Jones 
polynomial of either the plat or trace closure 
of a braid can be mapped to the computation 
of a particular linear functional (called the 
Markov trace) of the image of the braid in 
the Temperley-Lieb algebra (for an essential 
understanding of a geometrical picture of the 
Temperley-Lieb algebra, see [14]). 

2. Mapping the Temperley-Lieb algebra calcula- 
tion into a linear algebra calculation. Using 
a representation of the Temperley-Lieb alge- 
bra, called the path model representation, the 
computation in step 1 is shown to be equal 
to a particular weighted trace of the matrix 
corresponding to the Temperley-Lieb algebra 
element coming from the original braid. 

3. Choosing the parameter t corresponding to 
unitary matrices. The matrix in step 2 is a 
product of basic matrices corresponding to 
individual crossings in the braid group; an 
important characteristic of these basic ma- 
trices is that they have a local structure. In 
addition, by choosing the values of t as in
Theorems 1 and 2, the matrices corresponding
to individual crossings become unitary. The 
result is that the original problem has been 
turned into a weighted trace calculation of a 
matrix formed from a product of local unitary 
matrices — a problem well suited to a quantum 
computer. 

4. Implementing the quantum algorithm. Finally 
the weighted trace calculation of a matrix 
described in step 3 is formally encoded into 
a calculation involving local unitary matrices 
and qubits. 


A nice exposition of the algorithm is given in 
[15]. 
Applications


Since the publication of [2], a number of
interesting results have ensued investigating 
the possibility of quantum algorithms for other 
combinatorial/topological questions. Quantum 
algorithms have been developed for the case 
of the HOMFLY-PT two-variable polynomial 
of the trace closure of a braid at certain pairs of 
values [19]. (This entry also extends the results of 
[2] to a class of more generalized braid closures; 
it is recommended reading as a complement to 
[2] or [15] as it gives the representation theory of 
the Jones-Wenzl representations, thus putting the
path model representation of the Temperley-Lieb 
algebra in a more general context.) A quantum 
algorithm for the colored Jones polynomial is 
given in [10]. 

Significant progress was made on the question 
of approximating the partition function of the 
Tutte polynomial of a graph [3]. This polyno- 
mial, at various parameters, captures important 
combinatorial features of the graph. Intimately 
associated to the Tutte polynomial is the Potts 
model, a model originating in statistical physics 
as a generalization of the Ising model to more 
than 2 states [17,20]; approximating the partition 
function of the Tutte polynomial of a graph is a 
very important question in statistical physics. The 
work of [3] develops a quantum algorithm for 
additive approximation of the Tutte polynomial 
for all planar graphs at all points in the Tutte 
plane and shows that for a significant set of 
these points (though not those corresponding to 
the Potts model) the problem of approximating 
is a complete problem for a quantum computer. 
Unlike previous results, these results use non- 
unitary representations. 


Open Problems 


There remain many unanswered questions related 
to the computation of the Jones polynomial from 
both a classical and quantum computational point 
of view. 

From a classical computation point of
view, the originally stated Problem 1 remains
wide open for all but trivial choices of t.
A result as strong as Theorem 2 for a
classical computer seems unlikely since it
would imply (via Theorem 3) that classical
computation is as strong as quantum
computation. A result by Jordan and Shor
[13] shows that the approximation given in
Theorem 1 solves a complete problem for a
presumed (but not proven) weaker quantum
model called the one-clean-qubit model.
Since this model is presumably weaker than the
full quantum computation model, a classical
result as strong as Theorem 1 for the trace
closure of a braid is perhaps in the realm of
possibility.

From a quantum computational point of view, 
various open directions seem worthy of pursuit. 
Most of the quantum algorithms known as of the 
writing of this entry are based on the quantum 
Fourier transform and solve problems which 
are algebraic and number theoretical in nature. 
Arguably, the greatest challenge in the field of 
quantum computation (together with the physical 
realization of large scale quantum computers) 
is the design of new quantum algorithms 
based on substantially different techniques. The 
quantum algorithm to approximate the Jones 
polynomial is significantly different from the 
known quantum algorithms in that it solves a 
problem which is combinatorial in nature, and it 
does so without using the Fourier transform. 
These observations suggest investigating the 
possibility of quantum algorithms for other 
combinatorial/topological questions. Indeed, the 
results described in the applications section above 
address questions of this type. Of particular 
interest would be progress beyond [3] in the 
direction of the Potts model, specifically either 
showing that the approximation given in [3] is 
non-trivial or providing a different non-trivial 
algorithm. 
Problem Definition


Quantum information theory distinguishes classical
bits from quantum bits or qubits. The quantum
state of n qubits is represented by a complex vector
in $(\mathbb{C}^2)^{\otimes n}$, where $(\mathbb{C}^2)^{\otimes n}$ is the tensor product
of n 2-dimensional complex vector spaces.
Classical n-bit strings form a basis for the vector
space $(\mathbb{C}^2)^{\otimes n}$. Column vectors in $(\mathbb{C}^2)^{\otimes n}$ are
denoted as $|\psi\rangle$ and row vectors are denoted as
$|\psi\rangle^\dagger = (|\psi\rangle^*)^T = \langle\psi|$. The complex inner product
between vectors $|\psi\rangle$ and $|\phi\rangle$ is conveniently
written as $\langle\psi|\phi\rangle$.

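Entangled quantum states $|\Psi\rangle \in (\mathbb{C}^2)^{\otimes n}$ are
those quantum states that cannot be written as
a product of some vectors $|\psi_i\rangle \in \mathbb{C}^2$, that is,
$|\Psi\rangle \neq \bigotimes_i |\psi_i\rangle$. The Bell states are four orthogonal
(maximally) entangled states defined as

$|\Psi_{00}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$,
$|\Psi_{10}\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$,
$|\Psi_{01}\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$,
$|\Psi_{11}\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$.
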
The Pauli matrices X, Y, and Z are three unitary,
Hermitian 2 x 2 matrices. They are defined as
$X = |0\rangle\langle 1| + |1\rangle\langle 0|$, $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$, and
$Y = iXZ$.
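
The stated properties of the Pauli matrices can be checked numerically. The following is a minimal sketch in plain Python; the helper names (matmul, dagger, scale) are illustrative and not part of the entry.

```python
# 2x2 complex matrices as nested lists; no external libraries needed.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
X = [[0j, 1 + 0j], [1 + 0j, 0j]]     # |0><1| + |1><0|
Z = [[1 + 0j, 0j], [0j, -1 + 0j]]    # |0><0| - |1><1|
Y = scale(1j, matmul(X, Z))          # Y = iXZ

for name, P in (("X", X), ("Y", Y), ("Z", Z)):
    hermitian = dagger(P) == P
    unitary = matmul(dagger(P), P) == I2
    print(name, "Hermitian:", hermitian, "unitary:", unitary)
```

Because all entries are small integers times 1 or i, the comparisons are exact and no numerical tolerance is needed.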

Quantum states can evolve dynamically under
inner product preserving unitary operations
$U$ ($U^{-1} = U^\dagger$). Quantum information can be
mapped onto observable classical information
through the formalism of quantum measurements.
In a quantum measurement on a state $|\psi\rangle$
in $(\mathbb{C}^2)^{\otimes n}$, a basis $\{|x\rangle\}$ in $(\mathbb{C}^2)^{\otimes n}$ is chosen. This
basis is made observable through an interaction
of the qubits with a macroscopic measurement
system. A basis vector x is thus observed with
probability $P(x) = |\langle x|\psi\rangle|^2$.

Quantum information theory or more narrowly 
quantum Shannon theory is concerned with pro- 
tocols which enable distant parties to efficiently 
transmit quantum or classical information, pos- 
sibly aided by the sharing of quantum entangle- 
ment between the parties. For a detailed introduc- 
tion to quantum information theory, see the book 
by Nielsen and Chuang [12]. 


Key Results 


Superdense coding [3] is the protocol in which 
two classical bits of information are sent from 
sender Alice to receiver Bob. This is accomplished
by sharing a Bell state $|\Psi_{00}\rangle_{AB}$ between
Alice and Bob and the transmission of one qubit.
The protocol is illustrated in Fig. 1. Given two
bits $b_1$ and $b_2$, Alice performs the following
unitary transformation on her half of the Bell
state:
Quantum Dense Coding, Fig. 1 Dense coding. Alice
and Bob use a shared Bell state to transmit two classical
bits $b = (b_1, b_2)$ by sending one qubit. Double lines are
classical bits and single lines represent quantum bits


$P_{b_1 b_2} \otimes I_B |\Psi_{00}\rangle = |\Psi_{b_1 b_2}\rangle$, (1)


i.e., one of the four Bell states. Here $P_{00} = I$,
$P_{01} = X$, $P_{10} = Z$, and $P_{11} = XZ = -iY$.
Alice then sends her qubit to Bob. This allows
Bob to do a measurement in the Bell basis. He
distinguishes the four states $|\Psi_{b_1 b_2}\rangle$ and learns
the value of the two bits $b_1$ and $b_2$.
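
The protocol can be traced on a classical simulator. The sketch below is an illustration under stated assumptions, not the entry's own construction: two-qubit states are stored as lists of four real amplitudes in the order |00>, |01>, |10>, |11> with Alice's qubit first, the Bell measurement is modeled by squared overlaps with the four Bell states, and the function names are hypothetical. For b = (1, 1) the encoded state equals the corresponding Bell state only up to a global sign, which does not affect the measurement.

```python
import math

S = 1 / math.sqrt(2)
# Amplitudes ordered as |00>, |01>, |10>, |11>; the first qubit is Alice's.
BELL = {
    (0, 0): [S, 0, 0, S],     # |Psi_00>
    (0, 1): [0, S, S, 0],     # |Psi_01>
    (1, 0): [S, 0, 0, -S],    # |Psi_10>
    (1, 1): [0, S, -S, 0],    # |Psi_11>
}
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def apply_on_alice(gate, state):
    """Apply a 2x2 gate to the first qubit of a two-qubit state."""
    out = [0] * 4
    for a in range(2):
        for b in range(2):
            for a2 in range(2):
                out[2 * a2 + b] += gate[a2][a] * state[2 * a + b]
    return out

def encode(b1, b2):
    """Alice applies P = X^b2 Z^b1 to her half of |Psi_00>."""
    state = BELL[(0, 0)]
    if b1:
        state = apply_on_alice(Z, state)
    if b2:
        state = apply_on_alice(X, state)
    return state

def bell_measure(state):
    """Outcome probabilities of a measurement in the Bell basis."""
    return {label: sum(x * y for x, y in zip(bell, state)) ** 2
            for label, bell in BELL.items()}

def decode(state):
    """Bob's decoded bits: the Bell outcome with the largest probability."""
    probs = bell_measure(state)
    return max(probs, key=probs.get)

results = {bits: decode(encode(*bits)) for bits in BELL}
print(results)
```

In the noiseless case each encoded state has overlap of magnitude one with exactly one Bell state, so Bob recovers both bits with certainty.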

The protocol demonstrates the interplay be- 
tween classical information and quantum infor- 
mation. No information can be communicated 
by merely sharing an entangled state such as 
$|\Psi_{00}\rangle$ without the actual transmission of physical
information carriers. On the other hand, it is 
a consequence of Holevo’s theorem [10] that 
one qubit can encode at most one classical bit 
of information. The protocol of dense coding 
shows that the two resources of entanglement and 
qubit transmission combined give rise to a su- 
perdense coding of classical information. Dense 
coding is thus captured by the following resource 
inequality: 


1 ebit + 1 qubit $\geq$ 2 cbits. (2)


In words, one bit of quantum entanglement (one 
ebit) in combination with the transmission of one 
qubit is sufficient for the transmission of two 
classical bits or cbits. 

Dense coding can be generalized to the encod- 
ing of continuous variables, namely, the encoding 
of quadrature variables (x, p) of an electromag- 
netic field into one half of a two-mode squeezed 
state [2]. Such a two-mode squeezed state approximates
the two-mode EPR state (in which
both quadrature variables are perfectly correlated,
i.e., $x_1 = x_2$ and $p_1 = -p_2$) in the limit
of large squeezing. The authors in [2] show that 
the information transmission capacity through the 
EPR state is, in the limit of large squeezing, 
twice that of a direct encoding using a single 
transmitted mode. The scheme thus exemplifies 
the notion of dense coding through the use of 
quantum entanglement. 

Quantum teleportation [4] is a protocol that is 
dual to dense coding. In quantum teleportation, 
1 ebit (a Bell state) is used in conjunction with 
the transmission of two classical bits to send 
one qubit from Alice to Bob. Thus, the resource 
relation for quantum teleportation is 


1 ebit + 2 cbits $\geq$ 1 qubit. (3)


The relation with quantum teleportation allows
one to argue that dense coding is optimal. It
is not possible to encode 2k classical bits in
m < k quantum bits even in the
presence of shared quantum entanglement. Let us
assume the opposite and obtain a contradiction.
One uses quantum teleportation to convert the
transmission of k quantum bits into the transmission
of 2k classical bits. Then one can use
the assumed superdense coding scheme to encode
these 2k bits into m < k qubits. As a
result one can send k quantum bits by effectively
transmitting m < k quantum bits (and sharing
quantum entanglement), which is known to be
impossible.


Applications 


Harrow [8] has introduced the notion of a coher- 
ent bit or cobit. The notion of a cobit is useful 
in understanding resource relations and trade-offs 
between quantum and classical information. The 
noiseless transmission of a qubit from Alice to
Bob can be viewed as the linear map $S_q : |x\rangle_A \to |x\rangle_B$
for a set of basis states $\{|x\rangle\}$. The transmission
of a classical bit can be viewed as the linear
map $S_c : |x\rangle_A \to |x\rangle_B |x\rangle_E$ where E stands for
the environment Eve. Eve’s copy of every basis
state $|x\rangle$ can be viewed as the output of a quantum
measurement, and thus, Bob’s state is classical.
The transmission of a cobit corresponds to the
linear map $S_{co} : |x\rangle_A \to |x\rangle_A |x\rangle_B$. Since Alice
keeps a copy of the transmitted data, Bob’s state 
is classical. On the other hand, the cobit can also 
be used to generate a Bell state between Alice 
and Bob. Since no qubit can be transmitted via 
a cobit, a cobit is weaker than a qubit. A cobit 
is stronger than a classical bit since entanglement 
can be generated using a cobit. 

One can define a coherent version of 
superdense coding and quantum teleportation 
in which measurements are replaced by unitary 
operations. In this version of dense coding, Bob
replaces his Bell measurement by a rotation
of the states $|\Psi_{b_1 b_2}\rangle$ to the states $|b_1 b_2\rangle_B$.
Since Alice keeps her input bits, the coherent
protocol implements the map $|x_1 x_2\rangle_A \to |x_1 x_2\rangle_A |x_1 x_2\rangle_B$.
Thus, one can strengthen the
dense coding resource relation to


1 ebit + 1 qubit $\geq$ 2 cobits. (4)

Similarly, the coherent execution of quantum
teleportation gives rise to the modified relation
2 cobits + 1 ebit $\geq$ 1 qubit + 2 ebits. One can omit
1 ebit on both sides of the inequality by using
ebits catalytically, i.e., they can be borrowed and
returned at the end of the protocol. One can then
combine both coherent resource inequalities and
obtain a resource equality:


2 cobits = 1 qubit + 1 ebit. (5)
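
Written out step by step, the argument leading to Eq. (5) reads as follows. This only restates the relations given in the text; the labels on the left are added for orientation.

```latex
\begin{align*}
\text{coherent dense coding, Eq. (4):}\quad
  & 1\,\text{ebit} + 1\,\text{qubit} \;\geq\; 2\,\text{cobits},\\
\text{coherent teleportation:}\quad
  & 2\,\text{cobits} + 1\,\text{ebit} \;\geq\; 1\,\text{qubit} + 2\,\text{ebits},\\
\text{one ebit used catalytically:}\quad
  & 2\,\text{cobits} \;\geq\; 1\,\text{qubit} + 1\,\text{ebit},\\
\text{both directions combined:}\quad
  & 2\,\text{cobits} \;=\; 1\,\text{qubit} + 1\,\text{ebit}.
\end{align*}
```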


A different extension of dense coding is the 
notion of superdense coding of quantum states 
proposed in [9]. Instead of dense coding clas- 
sical bits, the authors in [9] propose to code 
quantum bits whose quantum states are known to 
the sender Alice. This last restriction is usually 
referred to as the remote preparation of qubits, in 
contrast to the transmission of qubits whose states 
are unknown to the sender. In remote preparation 
of qubits, the sender Alice can use the additional 
knowledge about her states in the choice of en- 
coding. In [9] it is shown that one can obtain the 
asymptotic resource relation 
1 ebit + 1 qubit $\geq$ 2 remotely prepared qubit(s). (6)


Such a relation would be impossible if the r.h.s.
were replaced by 2 qubits. In that case the inequality
could be used repeatedly to obtain that
1 qubit suffices for the transmission of an arbitrary
number of qubits, which is impossible.

The “non-oblivious” superdense coding of 
quantum states should be compared with the 
non-oblivious and asymptotic variant of quantum 
teleportation which was introduced in [5]. In this 
protocol, referred to as remote state preparation 
(using classical bits), the quantum teleportation 
inequality, Eq. (3), is tightened to 


1 ebit + 1 cbit $\geq$ 1 remotely prepared qubit(s). (7)


These various resource (in)equalities and their
underlying protocols can be viewed as the first steps
toward a comprehensive theory of resource inequalities.
The goal of such a theory [6] is to provide a unified
and simplified approach to quantum Shannon
theory.


Experimental Results 


In [11] a partial realization of dense coding was 
given using polarization states of photons as 
qubits. The Bell state |W ,) can be produced 
by parametric down-conversion; this state
was used in the experiment as the shared 
entanglement between Alice and Bob. With 
current experimental techniques, it is not possible 
to carry out a low-noise measurement in the Bell 
basis which uniquely distinguishes the four Bell 
states. Thus, in [11] one of three messages, 
a trit, is encoded into the four Bell states. 
Using two-particle interferometry, Bob learns 
the value of the trit by distinguishing two of the 
four Bell states uniquely and obtaining a third 
measurement signal for the two other Bell states. 

In perfect dense coding, the channel capacity
is 2 bits. For the trit-scheme of [11], the ideal
channel capacity is $\log_2 3 \approx 1.58$ bits. Due to the noise
in the operations and measurements, the authors
of [11] estimate the experimentally achieved capacity
as 1.13 bits. In [1] it is shown how the
presence of additional entanglement of the polarized
photons in their orbital angular momentum degree
of freedom (hyperentanglement) can assist in
distinguishing all 4 Bell states in a modified Bell
state analyzer. A capacity of 1.63 bits is reported.

In [13] the complete protocol of dense coding
was carried out using two $^9$Be$^+$ ions confined
to an electromagnetic trap. A qubit is formed by
two internal hyperfine levels of the $^9$Be$^+$ ion.
Single-qubit and two-qubit operations are carried 
out using two polarized laser beams. A single 
qubit measurement is performed by observing 
a weak/strong fluorescence of |0) and |1). The 
authors estimate that the noise in the unitary 
transformations and measurements leads to an 
overall error rate on the transmission of the bits 
b of 15%. This results in an effective channel 
capacity of 1.16 bits. 
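
To get a feeling for how an error rate translates into a capacity, the following sketch evaluates the capacity of a symmetric channel. The noise model is an assumption made here for illustration only (every wrong symbol is taken to be equally likely); the experimental papers base their figures on their own measured data, so the model is not expected to reproduce the reported numbers exactly.

```python
import math

def binary_entropy(e):
    """Shannon entropy in bits of a binary variable with probability e."""
    if e in (0.0, 1.0):
        return 0.0
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)

def symmetric_capacity(num_symbols, error_rate):
    """Capacity in bits of a symmetric channel in which an erroneous
    output is equally likely to be any of the other symbols."""
    return (math.log2(num_symbols)
            - binary_entropy(error_rate)
            - error_rate * math.log2(num_symbols - 1))

ideal_dense = symmetric_capacity(4, 0.0)   # four distinguishable Bell states
ideal_trit = math.log2(3)                  # three distinguishable messages
noisy_dense = symmetric_capacity(4, 0.15)  # assumed uniform 15% symbol error

print(ideal_dense, ideal_trit, noisy_dense)
```

Under this simplified model a 15% symbol error rate gives a value slightly above 1.15 bits, in the same range as the capacity quoted for the ion trap experiment.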

In [7] dense coding was carried out using 
NMR spectroscopy. The two qubits were formed 
by the nuclear spins of $^1$H and $^{13}$C of chloroform
molecules $^{13}$CHCl$_3$ in liquid solution at room
temperature. The full dense coding protocol was 
implemented using the technique of temporal 
averaging and the application of coherent RF 
pulses; see [12] for details. The authors estimate 
an overall error rate on the transmission of the 
bits b of less than 10%.
Problem Definition


A quantum system can never be regarded as
completely isolated from its environment,
which permanently disturbs the
state of the system. The resulting noise problem
threatens quantum computers and their great
promise, namely, to provide a computational 
advantage over classical computers for certain 
problems (see also the cross-references in 
the section “Cross-References”). Quantum 
noise is usually modeled by the notion of a 
quantum channel which generalizes the classical 
case and, in particular, includes scenarios for 
communication (space) and storage (time) of 
quantum information. For more information 
about quantum channels and quantum infor- 
mation in general, see [19]. A basic channel 
is the quantum mechanical analog of the classical 
binary symmetric channel [17]. This quantum 
channel is called the depolarizing channel and
depends on a real parameter $p \in [0,1]$. Its
effect is to randomly apply one of the Pauli
spin matrices X, Y, and Z to the state of the
system, mapping a quantum state $\rho$ of one qubit
to $(1 - p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z)$.
It should be noted that it is always possible to 
map any quantum channel to a depolarizing 
channel by twirling operations. The basic 
problem of quantum error correction is to devise
a mechanism that allows one to recover quantum
information that has been sent through a quantum
channel, in particular the depolarizing channel.
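
The action of the depolarizing channel on a single-qubit density matrix can be computed directly from its definition. The sketch below uses plain nested lists; the function names are illustrative and not taken from the entry.

```python
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def depolarize(rho, p):
    """Return (1 - p) rho + (p / 3) (X rho X + Y rho Y + Z rho Z)."""
    out = [[(1 - p) * rho[i][j] for j in range(2)] for i in range(2)]
    for P in (X, Y, Z):
        term = matmul(matmul(P, rho), P)
        for i in range(2):
            for j in range(2):
                out[i][j] += (p / 3) * term[i][j]
    return out

zero = [[1, 0], [0, 0]]            # density matrix |0><0|
plus = [[0.5, 0.5], [0.5, 0.5]]    # density matrix of (|0> + |1>)/sqrt(2)

mixed = depolarize(zero, 0.75)     # p = 3/4 gives the maximally mixed state
damped = depolarize(plus, 0.3)     # off-diagonal terms shrink by 1 - 4p/3
print(mixed, damped)
```

The example illustrates that the map preserves the trace and that p = 3/4 sends every input to the maximally mixed state, since the channel can be rewritten as $(1 - 4p/3)\rho + (2p/3) I$ for states of unit trace.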


Key Results 


For a long time, it was not known whether it 
would be possible to protect quantum information 
against noise. Even some indication in the form 
of the no-cloning theorem was put forward to 
support the view that it might be impossible. 
The no-cloning theorem essentially says that an 
unknown quantum state cannot be copied per- 
fectly. This dashes hopes that, similar to the clas- 
sical case, a simple triple-replication and majority 
voting mechanism may be used in the quantum 
case as well. Therefore, it came as a surprise 
when Shor [20] found a quantum code which 
encodes one qubit into nine qubits in such a 
way that the resulting state has the ability to be 
protected against arbitrary single-qubit errors on 
each of these nine qubits. The idea is to use a 
concatenation of two threefold repetition codes. 
One of them protects against bit-flip errors while
the other protects against phase-flip errors. The
quantum code is a two-dimensional subspace of
the $2^9$-dimensional Hilbert space $(\mathbb{C}^2)^{\otimes 9}$. Two
orthogonal basis vectors of this space are identified
with the logical 0 and 1 states, respectively,
usually called $|\bar{0}\rangle$ and $|\bar{1}\rangle$. Explicitly, the code is
given by


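$|\bar{0}\rangle = \frac{1}{2\sqrt{2}} (|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle)$,

$|\bar{1}\rangle = \frac{1}{2\sqrt{2}} (|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle)$.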


The state $\alpha|0\rangle + \beta|1\rangle$ of one qubit is encoded to
the state $\alpha|\bar{0}\rangle + \beta|\bar{1}\rangle$ of the nine-qubit system. The
reason why this code can correct one arbitrary
quantum error is as follows.

First, suppose that a bit-flip error has happened,
which in quantum mechanical notation is
given by the operator X. Then a majority vote of
each block of three qubits 1-3, 4-6, and 7-9 can
be computed and the bit flip can be corrected. To
correct against phase-flip errors, which are given
by the operator Z, the fact is used that the code
can be written as $|\bar{0}\rangle = |{+}{+}{+}\rangle$,
$|\bar{1}\rangle = |{-}{-}{-}\rangle$, where $|\pm\rangle =
\frac{1}{\sqrt{2}}(|000\rangle \pm |111\rangle)$. By measuring each block of
three in the basis $\{|+\rangle, |-\rangle\}$, the majority of the
phase flips can be detected and one phase-flip
error can be corrected. Similarly, it can be shown
that Y, which is a combination of a bit flip and a
phase flip, can be corrected.
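
The correction procedure can be followed on a small classical simulation of the nine-qubit code. The sketch below makes several simplifying assumptions that are not part of the entry: states are dictionaries from 9-bit strings to real amplitudes, Y is simulated as Z followed by X (ignoring the global phase), the bit-flip syndrome is read off a basis string in the support, and the phase-flip syndrome is obtained from the sign of the expectation value of X applied to all six qubits of two neighboring blocks. All function names are hypothetical.

```python
import math
from itertools import product

def block_state(sign):
    """(|000> + sign * |111>) / sqrt(2) as a dict of amplitudes."""
    return {"000": 1 / math.sqrt(2), "111": sign / math.sqrt(2)}

def logical(sign):
    """Three copies of block_state(sign): logical 0 for +1, logical 1 for -1."""
    b = block_state(sign)
    return {k1 + k2 + k3: b[k1] * b[k2] * b[k3]
            for k1, k2, k3 in product(b, repeat=3)}

def encode(alpha, beta):
    zero, one = logical(+1), logical(-1)
    return {k: alpha * zero[k] + beta * one[k] for k in zero}

def apply_x(state, q):
    """Bit flip on qubit q (0-based)."""
    def flip(k):
        return k[:q] + ("1" if k[q] == "0" else "0") + k[q + 1:]
    return {flip(k): a for k, a in state.items()}

def apply_z(state, q):
    """Phase flip on qubit q."""
    return {k: (-a if k[q] == "1" else a) for k, a in state.items()}

def bit_flip_syndrome(state):
    """Two parity checks per block; every basis string in the support of a
    state with at most one bit flip yields the same parities."""
    k = next(k for k, a in state.items() if abs(a) > 1e-12)
    return [(int(k[b] != k[b + 1]), int(k[b + 1] != k[b + 2]))
            for b in (0, 3, 6)]

def flip_blocks(k, blocks):
    bits = list(k)
    for b in blocks:
        for q in range(3 * b, 3 * b + 3):
            bits[q] = "1" if bits[q] == "0" else "0"
    return "".join(bits)

def phase_flip_syndrome(state):
    """Compare the signs of neighboring blocks via the expectation of X on
    all six qubits of blocks (0, 1) and of blocks (1, 2)."""
    syndrome = []
    for blocks in ((0, 1), (1, 2)):
        expect = sum(a * state.get(flip_blocks(k, blocks), 0.0)
                     for k, a in state.items())
        syndrome.append(int(expect < 0))
    return tuple(syndrome)

LOCATION = {(1, 0): 0, (1, 1): 1, (0, 1): 2}   # syndrome -> faulty position

def correct(state):
    for block, syndrome in enumerate(bit_flip_syndrome(state)):
        if syndrome in LOCATION:
            state = apply_x(state, 3 * block + LOCATION[syndrome])
    syndrome = phase_flip_syndrome(state)
    if syndrome in LOCATION:
        # A Z on any qubit of the affected block undoes the phase flip.
        state = apply_z(state, 3 * LOCATION[syndrome])
    return state

def overlap(s1, s2):
    return sum(a * s2.get(k, 0.0) for k, a in s1.items())

def all_single_errors_corrected(alpha=0.6, beta=0.8):
    original = encode(alpha, beta)
    errors = (apply_x, apply_z, lambda s, q: apply_x(apply_z(s, q), q))
    return all(abs(overlap(original, correct(err(original, q))) - 1) < 1e-9
               for q in range(9) for err in errors)

print(all_single_errors_corrected())
```

Note that the simulation reads the syndromes from the full state vector, which a physical device cannot do; the section on syndrome decoding describes how the same information is extracted with ancilla qubits.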


Discretization of Noise 

Even though the above procedure seemingly only 
takes care of bit-flips and phase-flip errors, it 
actually is true that an arbitrary error affecting 
a single qubit out of the nine qubits can be 
corrected. In particular, and perhaps surprisingly, 
this is also the case if one of the nine qubits is 
completely destroyed. The linearity of quantum 
mechanics allows this method to work. Linearity 
implies that whenever operators A and B can be 
corrected, so can their sum A + B [8, 20, 22].
Since the (finite) set $\{\mathbf{1}_2, X, Y, Z\}$ forms a vector
space basis for the (continuous) set of all one-qubit
errors, the nine-qubit code can correct an
arbitrary single-qubit error.
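
That the identity and the Pauli matrices span all 2 x 2 matrices can be made concrete by computing the expansion coefficients of an arbitrary operator. The sketch below uses the standard formula c_P = Tr(P E)/2; the example matrix and the function names are chosen here for illustration.

```python
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def pauli_coefficients(E):
    """Coefficients c with E = sum_P c[P] * P, computed as Tr(P E) / 2."""
    return {name: sum(P[i][j] * E[j][i] for i in range(2) for j in range(2)) / 2
            for name, P in PAULIS.items()}

def rebuild(coeffs):
    """Reassemble the matrix from its Pauli coefficients."""
    return [[sum(coeffs[name] * PAULIS[name][i][j] for name in PAULIS)
             for j in range(2)] for i in range(2)]

# An arbitrary (not even unitary) single-qubit error operator.
E = [[0.3 + 0.1j, -0.7j], [2.0, 1.5 - 0.4j]]
coeffs = pauli_coefficients(E)
print(coeffs)
print(rebuild(coeffs))
```

Since a code that corrects I, X, Y, and Z on a given qubit corrects every linear combination of them, the four coefficients are all that matters for the error E.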


Syndrome Decoding and the Need for Fresh Ancillas

A way to do the majority vote quantum- 
mechanically is to introduce two new qubits 
(also called ancillas) that are initialized in |0). 
Then, the results of the two parity checks 
for the repetition code of length three can be 
computed into these two ancillas. This syndrome 
computation for the repetition code can be done 
using the so-called controlled not (CNOT) gates 
[19] and Hadamard gates. After this, the qubits 
holding the syndrome will factor out (i.e., they 
have no influence on future superpositions or 
interferences of the computational qubits) and 
can be discarded. Quantum error correction 
demands a large supply of fresh qubits for 
the syndrome computations which have to be 
initialized in a state |0). The preparation of many 
such states is required to fuel active quantum 
error-correcting cycles, in which syndrome 
measurements have to be applied repeatedly. This 
poses great challenges to any concrete physical 
realization of quantum error-correcting codes. 


Conditions for General Quantum Codes 

Soon after the discovery of the first quantum 
code, general conditions required for the exis- 
tence of codes, which protect quantum systems 
against noise, were sought after. Here the noise is 
modeled by a general quantum channel, given by 
a set of error operators $E_i$. The Knill-Laflamme
conditions [13] yield such a characterization. Let
C be the code subspace and let $P_C$ be an orthogonal
projector onto C. Then the existence of
a recovery operation for the channel with error
operators $E_i$ is equivalent to the equation

$P_C E_i^\dagger E_j P_C = \lambda_{i,j} P_C$

for all i and j, where $\lambda_{i,j}$ are some complex
constants. This recently has been extended to
the more general framework of subsystem codes
(also called operator quantum error-correcting
codes) [16].
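
For a small example the Knill-Laflamme conditions can be verified by brute force. The sketch below treats the three-qubit repetition code with basis |000>, |111> and error operators given as strings over I, X, Z. It relies on two simplifications that are assumptions of this illustration: the code basis consists of computational basis states, and the errors are Hermitian tensor products of I, X, Z, so that each product of two errors maps a basis string to a signed basis string.

```python
def apply_pauli(pauli, bits):
    """Apply a tensor product of I, X, Z to a basis string.
    Returns the new basis string and the sign picked up."""
    out, sign = [], 1
    for p, b in zip(pauli, bits):
        if p == "X":
            out.append("1" if b == "0" else "0")
        else:
            if p == "Z" and b == "1":
                sign = -sign
            out.append(b)
    return "".join(out), sign

def knill_laflamme_holds(code_basis, errors):
    """Check <a| E_i^dagger E_j |b> = const(i, j) * delta_ab for all code
    basis states a, b and all pairs of errors (E_i is Hermitian here)."""
    for ei in errors:
        for ej in errors:
            elems = {}
            for a in code_basis:
                for b in code_basis:
                    s1, sign1 = apply_pauli(ej, b)
                    s2, sign2 = apply_pauli(ei, s1)
                    elems[(a, b)] = sign1 * sign2 if s2 == a else 0
            const = elems[(code_basis[0], code_basis[0])]
            for (a, b), value in elems.items():
                if value != (const if a == b else 0):
                    return False
    return True

CODE = ["000", "111"]
print(knill_laflamme_holds(CODE, ["III", "XII", "IXI", "IIX"]))  # bit flips
print(knill_laflamme_holds(CODE, ["III", "ZII"]))                # phase flip
print(knill_laflamme_holds(CODE, ["III", "XII", "IXX"]))         # too many flips
```

Single bit flips satisfy the conditions, whereas a phase flip does not (the diagonal matrix elements differ in sign on the two code words), and neither does an error set that contains both a single and a complementary double bit flip.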


Constructing Quantum Codes 

The problem of deriving general constructions 
of quantum codes was addressed in a series 
of groundbreaking papers by several research 
groups in the mid-1990s. Techniques were 
developed which allow classical coding theory 
to be imported to an extent that is enough to 
provide many families of quantum codes with 
excellent error correction properties. 

The IBM group [3] investigated quantum 
channels, placed bounds on the quantum 
channels’ capacities, and showed that for some 
channels, it is possible to compute the capacity 
(such as for the quantum erasure channel). 
Furthermore, they showed the existence of a five- 
qubit quantum code that can correct an arbitrary 
error, thereby being much more efficient than 
Shor’s code. Around the same time, Calderbank 
and Shor [4] and Steane [21] found a construction 
of quantum codes from any pair $C_1$, $C_2$ of
classical linear codes satisfying $C_2^\perp \subseteq C_1$.
Named after their inventors, these codes are 
known as CSS codes. 
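
As an illustration of the containment condition, the following sketch checks it for the binary [7,4] Hamming code, taking both classical codes to be the Hamming code (the choice that yields the seven-qubit Steane code). The parity check matrix is the standard one; the assumption that both matrices have full row rank is made here so that dimensions can be read off from the number of rows.

```python
# Parity check matrix of the binary [7,4] Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def dual_contained(H1, H2):
    """True if the dual of C2 lies in C1, where H1 and H2 are parity check
    matrices of C1 and C2. The rows of H2 span the dual of C2, so the test
    is that every row of H2 satisfies all parity checks of C1."""
    return all(sum(x * y for x, y in zip(r1, r2)) % 2 == 0
               for r1 in H1 for r2 in H2)

def css_parameters(H1, H2):
    """Length n and number of encoded qubits k = k1 + k2 - n,
    assuming both matrices have full row rank."""
    n = len(H1[0])
    k1, k2 = n - len(H1), n - len(H2)
    return n, k1 + k2 - n

print(dual_contained(H, H), css_parameters(H, H))
```

For the Hamming code the check succeeds and the resulting quantum code encodes one qubit into seven.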

The AT&T group [5] found a general way of 
defining a quantum code. Whenever a classical 
code over the finite field $\mathbb{F}_4$ exists that is additively
closed and self-orthogonal with respect to
the Hermitian inner product, they were able to 
find even more examples of codes. Independently, 
D. Gottesman [8,9] developed the theory of stabi- 
lizer codes. These are defined as the simultaneous 
eigenspaces of an abelian subgroup of the group 
of tensor products of Pauli matrices on several 
qubits. Soon after this, it was realized that the two 
constructions are equivalent. 
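
The defining property of a stabilizer code, an abelian group of Pauli operators, is easy to test on an example. The sketch below uses the usual generators of the five-qubit code written as strings over I, X, Y, Z; the helper names are illustrative. Two Pauli strings commute exactly when they carry different non-identity letters on an even number of positions.

```python
FIVE_QUBIT = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

def commute(p, q):
    """Pauli strings commute iff they anticommute on an even number of
    positions (positions where both are non-identity and different)."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0

def syndrome(error, generators=FIVE_QUBIT):
    """One bit per generator: 1 if the error anticommutes with it."""
    return tuple(0 if commute(error, g) else 1 for g in generators)

def single_qubit_errors(n):
    for q in range(n):
        for letter in "XYZ":
            yield "I" * q + letter + "I" * (n - q - 1)

abelian = all(commute(g, h) for g in FIVE_QUBIT for h in FIVE_QUBIT)
syndromes = {e: syndrome(e) for e in single_qubit_errors(5)}
print(abelian, len(set(syndromes.values())))
```

All 15 single-qubit Pauli errors produce distinct nonzero syndromes, which is why four generators on five qubits suffice to correct an arbitrary single-qubit error.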

A stabilizer code which encodes k qubits
into n qubits and has distance d is denoted
by [[n,k,d]]. It can correct up to $\lfloor (d - 1)/2 \rfloor$
errors of the n qubits. The rate of the code is
defined as r = k/n. Similar to classical codes,
bounds on quantum error-correcting codes are
known, i.e., the Hamming, Singleton, and linear
programming bounds.
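
The quantities above are simple to evaluate for concrete parameters. The sketch below computes the number of correctable errors and the rate, and tests the quantum Singleton bound n - k >= 2(d - 1) and the quantum Hamming bound in its standard form for nondegenerate codes; the function names are chosen for this illustration.

```python
from math import comb

def correctable_errors(d):
    """Number of arbitrary single-qubit errors a distance-d code corrects."""
    return (d - 1) // 2

def rate(n, k):
    return k / n

def satisfies_singleton(n, k, d):
    """Quantum Singleton bound: n - k >= 2 (d - 1)."""
    return n - k >= 2 * (d - 1)

def satisfies_hamming(n, k, d):
    """Quantum Hamming bound for nondegenerate codes:
    sum_{j <= t} 3^j C(n, j) * 2^k <= 2^n with t = floor((d - 1) / 2)."""
    t = correctable_errors(d)
    return sum(3 ** j * comb(n, j) for j in range(t + 1)) * 2 ** k <= 2 ** n

for n, k, d in ((9, 1, 3), (7, 1, 3), (5, 1, 3), (4, 1, 3)):
    print((n, k, d), correctable_errors(d), rate(n, k),
          satisfies_singleton(n, k, d), satisfies_hamming(n, k, d))
```

The parameters [[5,1,3]] meet both bounds with equality, while the hypothetical parameters [[4,1,3]] violate them.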
Asymptotically Good Codes

Matching the developments in classical algebraic 
coding theory, an interesting question deals 
with the existence of asymptotically good 
codes, i.e., families of quantum codes with
parameters $[[n_i, k_i, d_i]]$, where $i \geq 0$,
which have asymptotically nonvanishing rate
$\lim_{i \to \infty} k_i/n_i > 0$ and nonvanishing relative
distance $\lim_{i \to \infty} d_i/n_i > 0$. In [4], the existence
of asymptotically good codes was established 
using random codes. Using algebraic geometry 
(Goppa) codes, it was later shown by Ashikhmin, 
Litsyn, and Tsfasman that there are also 
explicit families of asymptotically good quantum 
codes [2]. Currently, most constructions of
quantum codes are from the abovementioned
stabilizer/additive code construction, with
the notable exception of a few nonadditive codes and
some codes which do not fit into the framework
of Pauli error bases.


Applications 


Besides their canonical application to protect 
quantum information against noise, quantum 
error-correcting codes have been used for other 
purposes as well. The Preskill/Shor proof of the 
security of the quantum key distribution scheme 
BB84 relies on an entanglement purification 
protocol, which in turn uses CSS codes [19]. 
Furthermore, quantum codes have been used 
for quantum secret sharing, quantum message 
authentication, and secure multiparty quantum 
computations. Properties of stabilizer codes are 
also germane for the theory of fault-tolerant 
quantum computation. 


Open Problems 


The literature of quantum error correction is fast 
growing, and the list of open problems is cer- 
tainly too vast to be surveyed here in detail. The 
following short list is highly influenced by the 
preference of the author. 


1. It is desirable to find quantum codes for 
which all stabilizer generators have low 
weight and which at the same time allow for
efficient fault-tolerant quantum computation 
with the encoded data. These requirements 
correspond to a quantum equivalent to low- 
density parity check (LDPC) codes. So far 
only a few constructions are known, but 
recent progress was made by Gottesman [10] 
who used quantum LDPC codes to show that 
universal fault-tolerant quantum computing 
with constant overhead is possible. See also 
[11,15] for recent progress on quantum LDPC 
codes. 

2. It is an open problem to find new families of 
quantum codes that improve on the currently 
best known estimates for the threshold 
for fault-tolerant quantum computing, in 
particular for codes that can be implemented 
on a two-dimensional fabric of qubits. An 
advantage might be had by using subsystem 
codes since they allow for simple error 
correction circuits. For more information 
about noise thresholds, see also the entry on
Fault-Tolerant Quantum Computation.

3. Many quantum codes are designed for 
the depolarizing channel, where, roughly
speaking, the error probability is improved
from p to $p^{d/2}$ for a distance d code. The
independence assumption underlying this 
model might not always be justified, and 
therefore, it seems imperative to consider 
other channels, e.g., non-Markovian local 
error models. Under some assumptions on the 
decay of the interaction strengths, threshold 
results for such channels have been shown [1]. 
However, it remains open to find constructions 
of good codes for non-Markovian noise and 
in general for noise models that are more 
realistic than the depolarizing channel. 


Experimental Results 


Active quantum error-correcting codes, such as 
those codes which require syndrome measure- 
ments and correction operations, as well as pas- 
sive codes (i.e., codes in which the system stays 
in a simultaneous invariant subspace of all error 
operators for certain types of noise), have been 
demonstrated for various physical systems. First,
this was shown in nuclear magnetic resonance 
(NMR) experiments [14]. The three-qubit repeti- 
tion code, which protects one qubit against phase- 
flip error Z, was then demonstrated in an ion trap 
for beryllium ion qubits [6]. 

Subsequently, architectures have been
proposed [18] that would in principle allow the
construction of scalable quantum computers based on
ion traps and concatenated coding, e.g., based
on the [[7,1,3]] Steane code. In superconducting
qubit systems, using an architecture that supports
nine physical qubits, high gate fidelities have
been reported [12]. This suggests that it might be
possible in this architecture to achieve error rates
that are below the threshold for the surface code,
which is known to be around 1% [7].


Data Sets 


Markus Grassl maintains http://www.codetables.de,
which contains tables of the best known
quantum codes, some entries of which extend
([5], Table III). It also contains bounds on the
minimum distance of quantum codes for given 
lengths and dimensions and contains information 
about the construction of the codes. In principle, 
this can be used to get explicit generator matrices 
(see also the following section “URL to Code”). 


URL to Code 


The computer algebra system Magma
(http://magma.maths.usyd.edu.au/magma/) has functions
and data structures for defining and
analyzing quantum codes. Several quantum
codes are already defined in a database of
quantum codes. For instance, the command
BestKnownQuantumCode(F, n, k) returns the
best known quantum code (i.e., one of the
highest known minimum weight) over the field
F, of length n, and dimension k. Magma allows the
user to define new quantum codes and to study
their properties such as the weight distribution
and automorphisms, and it provides several predefined methods
for obtaining new codes from a set of given ones.
Problem Definition


Secret keys, i.e., random bitstrings not known to 
an adversary, are a vital resource in cryptography 
(they can be used, e.g., for message encryption 
or authentication). The distribution of secret keys
among distant parties, possibly only connected by 
insecure communication channels, is thus a fun- 
damental cryptographic problem. Quantum key 
distribution (QKD) is a method to solve this 
problem using quantum communication. It relies 
on the fact that any attempt of an adversary to 
wiretap the communication would, by the laws of 
quantum mechanics, inevitably introduce distur- 
bances which can be detected. 

For the technical definition, consider a setting 
consisting of two honest parties, called Alice and 
Bob, as well as an adversary, Eve. Alice and 
Bob are connected by a quantum channel Q 
which might be coupled to a (quantum) system 
E controlled by Eve (see Fig. 1). In addition, it 
is assumed that Alice and Bob have some means 
to exchange classical messages authentically, that 
is, they can make sure that Eve is unable to (un- 
detectably) alter classical messages during trans- 
mission. If only insecure communication chan- 
nels are available, Alice and Bob can achieve this 
using an authentication scheme [19]. The scheme 
requires that Alice and Bob have a short initial 
key or at least some initial common randomness 
that is not entirely known to Eve [17]. This 
is why QKD is sometimes called quantum key 
growing. 

A QKD protocol $\pi = (\pi_A, \pi_B)$ is a pair of
algorithms for Alice and Bob, producing classical
outputs $S_A$ and $S_B$, respectively. $S_A$ and
$S_B$ take values in $\mathcal{S} \cup \{\perp\}$, where
$\mathcal{S}$ is called the key space and $\perp$ is a
symbol (not contained in $\mathcal{S}$) indicating that no
key can be generated. A QKD protocol $\pi$ with key space
$\mathcal{S}$ is said to be perfectly secure on a channel
$Q$ if, after its execution using communication over $Q$,
the following holds:


• $S_A = S_B$;

• if $S_A \neq \perp$, then $S_A$ and $S_B$ are uniformly
distributed on $\mathcal{S}$ and independent of the state
of $E$.


More generally, $\pi$ is said to be $\varepsilon$-secure on
$Q$ if it satisfies the above conditions except
with probability (at most) $\varepsilon$. Furthermore, $\pi$ is
said to be $\varepsilon$-robust on $Q$ if the probability that
$S_A = \perp$ is at most $\varepsilon$. These definitions may be
extended to sets of channels $Q$, i.e., one demands
that the conditions hold for any member of
the set.

In the standard literature on QKD, protocols
are typically parametrized by some positive number
$k$ quantifying certain resources needed for their
execution (e.g., the amount of communication).
A protocol $\pi = (\pi_k)_{k \in \mathbb{N}}$ is said to be secure
(robust) on a set of channels if there exists a
sequence $(\varepsilon_k)_{k \in \mathbb{N}}$ which approaches zero
exponentially fast such that $\pi_k$ is $\varepsilon_k$-secure
($\varepsilon_k$-robust) on this set for any $k \in \mathbb{N}$.
Moreover, if the key space of $\pi_k$ is denoted by
$\mathcal{S}_k$, the key rate of $\pi = (\pi_k)_{k \in \mathbb{N}}$ is
defined by $r = \lim_{k \to \infty} \ell_k / k$, where
$\ell_k := \log_2 |\mathcal{S}_k|$ is the key length.

The ultimate goal is to construct QKD protocols
$\pi$ which are secure against general attacks,
i.e., secure on the set of all possible channels
$Q$. This ensures that an adversary cannot get
any information on the generated key even if
she fully controls the communication between
Alice and Bob. At the same time, a protocol $\pi$
should be robust on a set of realistic channels,
corresponding to a situation where the noise of
the channel is below a given threshold and no
adversary is present. Note that, in contrast to
security, robustness cannot be guaranteed on the
set of all possible channels. Indeed, an adversary
could, for instance, interrupt the entire communication
between Alice and Bob (in which case key
generation is obviously impossible).

Quantum Key Distribution, Fig. 1 A QKD protocol $\pi$
consists of algorithms $\pi_A$ and $\pi_B$ for Alice and Bob,
respectively. The algorithms communicate over a quantum
channel $Q$ that might be coupled to a system $E$ controlled
by an adversary. The goal is to generate identical keys $S_A$
and $S_B$ which are independent of $E$


Key Results 


Protocols 
On the basis of the pioneering work of Wiesner 
[20], Bennett and Brassard, in 1984, invented 
QKD and proposed a first protocol, known today 
as the BB84 protocol [3]. In 1991, Ekert invented 
entanglement-based QKD. His protocol is com- 
monly referred to as E91 [8] and provides an 
additional level of security, termed device inde- 
pendence [1,9]. Later, in an attempt to increase 
the efficiency and practicability of QKD, various 
extensions to the BB84 and E91 protocols as well 
as alternative schemes have been proposed. 
QKD protocols can generally be subdivided
into (at least) two subprotocols. The purpose
of the first, called distribution protocol, is to
generate a raw key pair, i.e., a pair of correlated
classical values $X$ and $Y$ known to Alice and
Bob, respectively. In many protocols (including
BB84), Alice chooses $X = (X_1, \ldots, X_k)$ at
random, encodes each of the $X_i$ into the state of
a quantum particle, and then sends the $k$ particles
over the quantum channel to Bob. Upon receiving
the particles, Bob applies a measurement to each
of them, resulting in $Y = (Y_1, \ldots, Y_k)$. The
crucial idea now is that, by virtue of the laws
of quantum mechanics, the secrecy of the raw
key is a function of the strength of the correlation
between $X$ and $Y$; in other words, the
more information about the (raw) key an adversary
tries to acquire, the more disturbances she
introduces.
This is exploited in the second subprotocol,
called distillation protocol. Roughly speaking,
Alice and Bob estimate the statistics of the raw
key pair $(X, Y)$. If the correlation between their
respective parts is sufficiently strong, they use
classical techniques such as information reconciliation
(error correction) and privacy amplification
to turn $(X, Y)$ into a pair $(S_A, S_B)$ of identical
and secret keys. (See [4] for the case of a classical
adversary, which is relevant for the analysis of
security against individual attacks, and [14, 16] for
the quantum-mechanical case, which is relevant in the
context of collective and general attacks; cf. the
characterization below.)


Key Rate as a Function of Robustness and 
Security 

The performance (in terms of the key rate) of a 
QKD protocol strongly depends on the desired 
level of security and robustness, as illustrated in 
Fig. 2. (The robustness is typically measured in
terms of the maximum tolerated channel noise,
i.e., the maximum noise of a channel $Q$ such
that the protocol is still robust on $Q$.) The results
summarized below apply to protocols of the form
described above where, for the analysis of robustness,
it is assumed that the quantum channel
$Q$ connecting Alice and Bob is memoryless and
time invariant, i.e., each transmission is subject
to the same type of disturbances. Formally, such
channels are denoted by $Q = \bar{Q}^{\otimes k}$, where $\bar{Q}$
describes the action of the channel in a single
transmission.


Security Against Individual Attacks 

A QKD protocol $\pi$ is said to be secure against
individual attacks if it is secure on the set of
channels $Q$ of the form $\bar{Q}^{\otimes k}$ under the constraint
that the coupling to $E$ is purely classical. Note
that this notion of security is relatively weak.
Essentially, it only captures attacks where the
adversary applies identical and independent measurements
to each of the particles sent over the
channel.

The following statement can be derived from a
classical argument due to Csiszár and Körner [6].
Let $\tau$ be a distribution subprotocol as described
above, i.e., $\tau$ generates a raw key pair $(X, Y)$.
Moreover, let $\bar{\mathcal{S}}$ be a set of quantum channels $\bar{Q}$
suitable for $\tau$. Then there exists a QKD protocol
$\pi$ (parametrized by $k$) consisting of $k$ executions
of the subprotocol $\tau$ followed by an appropriate
distillation subprotocol such that the following
holds: $\pi$ is robust on $Q = \bar{Q}^{\otimes k}$ for any
$\bar{Q} \in \bar{\mathcal{S}}$, is secure against individual attacks,
and has key rate at least

$$r \geq \min_{\bar{Q} \in \bar{\mathcal{S}}} H(X|Z) - H(X|Y), \tag{1}$$

where the conditional Shannon entropies on
the r.h.s. are evaluated for the joint distribution
$P_{XYZ}$ of the raw key $(X, Y)$ and the (classical)
state $Z$ of Eve’s system $E$ after one execution of
$\tau$ on the channel $\bar{Q}$. Evaluating the right-hand side
for the BB84 protocol on a channel with bit-flip
probability $e$ shows that the rate is non-negative
if $e \leq 14.6\%$ [10].




Security Against Collective Attacks 

A QKD protocol $\pi$ is said to be secure against
collective attacks if it is secure on the set of channels
$Q$ of the form $\bar{Q}^{\otimes k}$ with arbitrary coupling
to $E$. This notion of security is strictly stronger
than security against individual attacks, but it still
relies on the assumption that an adversary does
not apply joint operations to the particles sent
over the channel.

As shown by Devetak and Winter [7], the
above statement for individual attacks extends
to collective attacks when replacing inequality
(1) by

$$r \geq \min_{\bar{Q} \in \bar{\mathcal{S}}} S(X|E) - H(X|Y), \tag{2}$$

where $S(X|E)$ is the conditional von Neumann
entropy evaluated for the classical value $X$ and
the quantum state of $E$ after one execution of $\tau$
on $\bar{Q}$. For the standard BB84 protocol, the rate is
positive as long as the bit-flip probability $e$ of the
channel satisfies $e < 11.0\%$ [18] (see Fig. 2 for a
graph of the performance of an extended version
of the protocol).
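
As a rough numerical illustration of the 11.0 % threshold, the sketch below assumes that, for BB84 on a channel with bit-flip probability e, the right-hand side of (2) takes the closed form 1 - 2h(e), where h is the binary entropy function. That closed form is an assumption borrowed from the standard analysis of BB84 with one-way post-processing; it is not derived in this entry. The function names are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bb84_rate(e: float) -> float:
    """Assumed closed form of the key rate bound: 1 - 2 h(e)."""
    return 1.0 - 2.0 * binary_entropy(e)


def noise_threshold(lo: float = 0.0, hi: float = 0.5, iters: int = 60) -> float:
    """Bisection for the bit-flip probability at which the assumed rate hits zero.

    The rate is decreasing on [0, 0.5], positive at 0 and negative at 0.5.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bb84_rate(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"assumed rate vanishes near e = {noise_threshold():.4f}")
```

Under this assumption the zero of the rate lies close to e = 0.11, consistent with the threshold quoted above.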


Security Against General Attacks 

A QKD protocol $\pi$ is said to be secure against
general attacks if it is secure on the set of all
channels $Q$. This type of security is sometimes
also called full or unconditional security as it 
does not rely on any assumptions on the type of 
attacks (as long as they are constrained to the 
communication channel) or the resources needed 
by an adversary. 

The first QKD protocol to be proved secure 
against general attacks was the BB84 protocol. 
The original argument by Mayers [13] was fol- 
lowed by various alternative proofs. Most no- 
tably, based on a connection to the problem of 
entanglement purification [5] established by Lo 
and Chau [12], Shor and Preskill [18] presented 
a general argument which applies to various ver- 
sions of the BB84 protocol. 

Later it was shown that, for virtually any
QKD protocol, security against collective attacks
implies security against general attacks [14, 15].
In particular, the above statement about the security
of QKD protocols against collective attacks,
including formula (2) for the key rate, extends to
security against general attacks.


Applications 


Because the notion of security described above 
is composable [16] (see [2, 14] for a general 
discussion of composability of QKD), the key 
generated by a secure QKD protocol can in prin- 
ciple be used within any application that requires 
a secret key (such as one-time pad encryption). 
More precisely, let A be a scheme which, when
using a perfect key S (i.e., a uniformly distributed
bitstring which is independent of the adversary’s
knowledge), has some failure probability $\delta$
(according to some arbitrary failure criterion). Then,
if the perfect key S is replaced by the key generated
by an $\varepsilon$-secure QKD protocol, the failure
probability of A is bounded by $\delta + \varepsilon$ [14].


Experimental Results 


Most known QKD protocols (including BB84 
and E91) only require relatively simple quan- 
tum operations on Alice and Bob’s side (e.g., 
preparing a two-level quantum system in a given 
state or measuring the state of such a system). 
This makes it possible to realize them with to- 
day’s technology. Experimental implementations 
of QKD protocols usually use photons as carriers 
of quantum information, because they can easily 
be transmitted (e.g., through optical fibers or free 
space). A main limitation, however, is noise in 
the transmission, which, with increasing distance 
between Alice and Bob, reduces the performance 
of the protocol (see Fig. 2). We refer to [11] for an 
overview on quantum cryptography with a focus 
on experimental aspects. 
Problem Definition


Brief Description 

The search problem can be described informally 
as finding an item possessing a specific property, 
in a given set of N items. Each item either does or 
does not possess the specified property, and that 
can be checked by a binary query. The complexity 
of the problem is the number of such queries 
required to find the desired item (also called the 
target item). The items are often collected in a 
database and sorted to simplify the subsequent
searches. When they are not sorted, there is no 
shortcut to the brute force method of checking 
each item one by one until the desired item is 
found. A familiar example of a database is a tele- 
phone directory. Its entries are sorted according 
to the names of persons but not according to 
the telephone numbers. Hence, it is easy to find 
the telephone number of a particular person, but 
difficult to find the name of a person to whom 
a particular telephone number belongs (i.e., a 
lookup is difficult when it is not in the same order 
in which the database is sorted).

The $\Omega(N)$ lower bound on search speed,
based on inspection of one item at a time, is
correct only for classical computers. Quantum
computers can be in a superposition of multiple
states, however, and so can inspect multiple items
at the same time. There is no obvious lower
bound on how fast a quantum search can be, nor
is there an obvious technique faster than the brute
force search. It turns out, though, that there is an
efficient and optimal quantum search algorithm
that requires only $O(\sqrt{N})$ queries [15].

This quantum algorithm is very different from
the search on a classical computer [8]. The optimal
classical strategy is to check the items one
at a time in a random order, avoiding in later
trials the items that have already been checked
earlier. After $n$ items have been checked from
a uniform distribution, the probability that the
search hasn’t yet succeeded is
$(1 - 1/N)(1 - 1/(N-1)) \cdots (1 - 1/(N-n+1))$. For $n \ll N$,
the success probability is therefore roughly
$1 - (1 - 1/N)^n \approx n/N$. Increasing this success
probability to $\Theta(1)$ requires the number of items
checked, $n$, to be $\Theta(N)$.
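
The product above can be checked with exact rational arithmetic. The sketch below (function name chosen for illustration) multiplies the factors and shows that they telescope to (N - n)/N, so the classical success probability after n distinct checks is exactly n/N.

```python
from fractions import Fraction


def classical_failure_probability(N: int, n: int) -> Fraction:
    """Probability that n distinct random checks all miss the single target.

    Multiplies (1 - 1/N)(1 - 1/(N-1)) ... (1 - 1/(N-n+1)) exactly.
    """
    prob = Fraction(1)
    for j in range(n):
        prob *= 1 - Fraction(1, N - j)
    return prob


if __name__ == "__main__":
    N, n = 100, 10
    failure = classical_failure_probability(N, n)
    print("failure:", failure, "success:", 1 - failure)
```

Because the success probability grows only linearly in n, a constant success probability needs a number of checks proportional to N.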

In contrast to classical computation, quantum 
computation is formulated in terms of wavelike 
complex amplitudes, whose interference can be 
used to cancel undesirable components and boost 
the desired component. Quantum search is then 
analogous to the design of a multi-element an- 
tenna array, where a careful choice of phases can 
boost the radiation in a particular direction. The 
analysis of such structures is carried out using the 
algebra of unitary transformations, and absolute- 
value squares of the amplitudes give the observation
probabilities. Unitary transformations
include rotations and reflections (about various 
directions) in the space of amplitudes, as well 
as local phase shifts. Rotations and reflections 
redistribute the amplitudes and are similar to 
classical transformations. On the other hand, the 
phase shifts are uniquely quantum; they do not 
alter probabilities of individual components, but 
affect their subsequent interference pattern. The 
challenge of quantum computation is to find a 
sequence of elementary unitary operations (i.e., 
quantum logic gates) that solve the given com- 
puter science problem, while ensuring that the 
input and the output of the quantum algorithm 
have clear classical interpretations. 

The quantum search algorithm steadily increases
the amplitude of the desired item through
a series of quantum operations. Starting with an
initial amplitude $1/\sqrt{N}$, in $n$ steps, the amplitude
increases to roughly $n/\sqrt{N}$, and hence,
the success probability (on observation of the
state) increases to $n^2/N$. Boosting this to $\Theta(1)$
requires only $O(\sqrt{N})$ steps, approximately the
square root of the number of steps required by
the best classical algorithm.

The quantum search algorithm is of wide in- 
terest because of its versatility; it can be adapted 
to different settings in a variety of fields, giving a 
new class of quantum algorithms extending well 
beyond the search problems. Since its discovery, 
it has been incorporated in solutions of many 
quantum problems — several of them are men- 
tioned later in this article. Even now, two decades 
after the algorithm’s discovery, new applications 
and extensions keep on appearing regularly. 


Formal Construction 

Let the items in the set be labeled by an index i = 
1,2,..., N. Let the binary query be represented 
by an oracle f(i), such that f(i) = 1 when 
i represents a desired item and f(i) = 0 oth- 
erwise. The quantum algorithm works in an N- 
dimensional vector space with complex coordi- 
nates, known as the Hilbert space. We use Dirac’s 
notation, which is standard in the literature of 
quantum mechanics and quantum computation. 
Then the items are mapped to the $N$ orthogonal
basis vectors $|i\rangle$ of the Hilbert space, and the binary
query is mapped to the selective phase-shift
operator defined by $U_f |i\rangle = (-1)^{f(i)} |i\rangle$. Given
the binary query $f(i)$, it is easy to construct the
operator $U_f$ using an ancilla qubit.
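
One common way to realize the ancilla construction is phase kickback. The sketch below is an assumption about how the construction might look, not a statement of the article's own method: a bit-flip oracle that maps |i>|b> to |i>|b XOR f(i)> acts on an ancilla prepared in (|0> - |1>)/sqrt(2), and the result equals the phase oracle acting on the index register alone. Items are indexed from 0 here for convenience, and all names are illustrative.

```python
import numpy as np


def bit_flip_oracle(f_values):
    """Permutation matrix for |i>|b> -> |i>|b XOR f(i)> (ancilla is the last qubit)."""
    N = len(f_values)
    B = np.zeros((2 * N, 2 * N))
    for i, fi in enumerate(f_values):
        for b in (0, 1):
            B[2 * i + (b ^ fi), 2 * i + b] = 1.0
    return B


def phase_oracle(f_values):
    """Diagonal matrix with entries (-1)^f(i), acting on the index register only."""
    return np.diag([(-1.0) ** fi for fi in f_values])


# Hypothetical query on N = 4 items with a single target at index 2.
f_values = [0, 0, 1, 0]
minus = np.array([1.0, -1.0]) / np.sqrt(2)   # ancilla state (|0> - |1>)/sqrt(2)
psi = np.full(4, 0.5)                        # any index-register state works

lhs = bit_flip_oracle(f_values) @ np.kron(psi, minus)
rhs = np.kron(phase_oracle(f_values) @ psi, minus)

if __name__ == "__main__":
    print("phase kickback reproduces the phase oracle:", np.allclose(lhs, rhs))
```

The ancilla is left unchanged, so it can be reused for every query.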

The problem is to start from a specific initial
state $|s\rangle$ and evolve to a target state $|t\rangle$
satisfying $U_f|t\rangle = -|t\rangle$, by applying a sequence
of unitary operations. The number of times $U_f$
is used in the algorithm is its query complexity.
This search problem is unstructured because
nothing is known about the solution, except the
information available from the oracle that can
tell whether or not a specific state is the target
state.

NP-complete problems can be represented as
exhaustive search problems. For example, let $\phi$
be a 3-SAT formula on $n$ Boolean variables.
Then the search problem is to find an assignment
for the variables, $i \in \{1, 2, \ldots, N = 2^n\}$,
that satisfies $\phi$. This example does not involve a
database and so bypasses concerns regarding how
the items are stored in a physical memory device
and the spatial relationship among them.
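
A minimal sketch of how such a formula can serve as the binary query: each integer index encodes one assignment, and the query returns 1 exactly when every clause is satisfied. The clause encoding (positive integer k for variable k, negative for its negation), the 0-based indexing, and the example formula are all hypothetical choices made here for illustration.

```python
def make_sat_oracle(clauses, n):
    """Return f(i) for a CNF formula on n variables.

    Bit (k - 1) of the integer i is the value assigned to variable k.
    A literal k means "variable k is true"; -k means "variable k is false".
    """
    def f(i: int) -> int:
        bits = [(i >> (v - 1)) & 1 for v in range(1, n + 1)]
        for clause in clauses:
            satisfied = any(
                bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause
            )
            if not satisfied:
                return 0
        return 1
    return f


# Hypothetical 3-SAT instance on n = 3 variables.
clauses = [(1, 2, -3), (-1, 2, 3), (1, -2, 3)]
f = make_sat_oracle(clauses, 3)
solutions = [i for i in range(2 ** 3) if f(i) == 1]

if __name__ == "__main__":
    print("satisfying assignments (as integers):", solutions)
```

Evaluating f costs time polynomial in n, while the number of candidate indices is N = 2^n, which is why the query count is the natural complexity measure.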


Key Results 


Grover [15] showed that there indeed exists a
quantum search algorithm that provides a square-root
speedup over the optimal classical randomized
algorithm. The algorithm has its simplest
form when there is only one target item. Then
the algorithm starts with an unbiased uniform
superposition state
$|s\rangle = (1/\sqrt{N}) \sum_{i=1}^{N} |i\rangle$ and
performs $Q = O(\sqrt{N})$ iterations to evolve to
the state $(U_D U_f)^Q |s\rangle$. Each iteration consists of
two reflection operations:


1. $U_f = 1 - 2|t\rangle\langle t|$ is a reflection along $|t\rangle$. It
uses the binary query to flip the sign of the
amplitude of the target state.

2. $U_D = 2|s\rangle\langle s| - 1$ is a reflection about $|s\rangle$. It can
be carried out without any information about
the target state. Since the action of $|s\rangle\langle s|$ gives
the average amplitude state, $U_D$ amounts to
inversion about the average or overrelaxation.
At the end, the final state is measured in the $\{|i\rangle\}$
basis that encodes the item labels, and $i$ is output.
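
The two reflections can be simulated directly on a classical machine for small N. The following is a minimal state-vector sketch, assuming dense matrices and 0-based item labels; it is meant to illustrate the iteration, not to be an efficient implementation. The iteration count is the nearest integer to the exact rotation count, computed from sin(theta) = 1/sqrt(N).

```python
import numpy as np


def grover_search(N: int, target: int):
    """Simulate Grover search for one target among N items.

    Returns the number of iterations used and the final measurement
    probabilities over the item labels 0..N-1.
    """
    s = np.full(N, 1.0 / np.sqrt(N))          # uniform superposition |s>
    U_f = np.eye(N)
    U_f[target, target] = -1.0                # 1 - 2|t><t|
    U_D = 2.0 * np.outer(s, s) - np.eye(N)    # 2|s><s| - 1

    theta = np.arcsin(1.0 / np.sqrt(N))
    Q = int(np.floor(np.pi / (4.0 * theta)))  # nearest integer to pi/(4 theta) - 1/2

    state = s.copy()
    for _ in range(Q):
        state = U_D @ (U_f @ state)           # one iteration: query, then diffusion
    return Q, state ** 2


if __name__ == "__main__":
    Q, probs = grover_search(64, target=37)
    print("iterations:", Q, "most likely item:", int(np.argmax(probs)))
```

Each loop iteration uses the query exactly once, so Q is also the query count of the simulated run.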

There are several ways of analyzing the
algorithm, and the geometric picture is perhaps
the simplest. We observe that throughout
the evolution, the quantum state stays in the
two-dimensional subspace (of the Hilbert space)
spanned by $|s\rangle$ and $|t\rangle$. Initially, the amplitude
of the state along $|t\rangle$ is $\langle t|s\rangle = 1/\sqrt{N}$, and
the angle between $|t\rangle$ and $|s\rangle$ is $\pi/2 - \theta$ with
$\sin\theta = 1/\sqrt{N}$. It is a general property of linear
transformations in two dimensions that a pair
of reflections about two distinct axes produces
a rotation, and the amount of rotation is twice
the angle between the two axes. The quantum
search algorithm is an alternating sequence
of reflections about two different axes. Each
application of the operator $U_D U_f$ rotates the
quantum state from $|s\rangle$ toward $|t\rangle$ by angle
$2\theta$. The number of iterations $Q$ required to
exactly reach the target state is therefore given
by $(2Q + 1)\theta = \pi/2$. In practice, we have to
truncate $Q$ to the integer $\lfloor Q + 0.5 \rfloor$, introducing a
small error. The success probability still remains
at least $\cos^2\theta = 1 - 1/N$. Asymptotically,
$Q = (\pi/4)\sqrt{N}$.
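
The rotation picture can be written out explicitly. The block below is a short worked derivation under the definitions above, writing $U_f$ for the oracle reflection and $U_D$ for the reflection about $|s\rangle$; the unit vector $|t_\perp\rangle$, orthogonal to $|t\rangle$ within the plane spanned by $|s\rangle$ and $|t\rangle$, is notation introduced here for illustration.

```latex
\begin{align*}
|s\rangle &= \sin\theta\,|t\rangle + \cos\theta\,|t_\perp\rangle,
  \qquad \sin\theta = 1/\sqrt{N},\\
(U_D U_f)^{Q}\,|s\rangle &= \sin\bigl((2Q+1)\theta\bigr)\,|t\rangle
  + \cos\bigl((2Q+1)\theta\bigr)\,|t_\perp\rangle,\\
p_{\mathrm{success}} &= \sin^{2}\bigl((2Q+1)\theta\bigr).
\end{align*}
```

For example, $N = 4$ gives $\theta = \pi/6$, so a single iteration ($Q = 1$) yields $(2Q+1)\theta = \pi/2$ and the target is found with certainty.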

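The reflection about the uniform superposition
state, $U_D$, is known as the Grover diffusion
operation. When the indices are represented
in binary notation, with $N = 2^n$, we
have $|s\rangle = H^{\otimes n}|0\rangle^{\otimes n}$ in terms of the Hadamard
operator

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Then $U_D = H^{\otimes n} U_0 H^{\otimes n}$, with
$U_0 = 2|0\rangle^{\otimes n}\langle 0|^{\otimes n} - 1$
being the reflection about the $|0\rangle^{\otimes n}$ state, and
it can be implemented using $O(n)$ qubit-level
operations. In this case, the full quantum search
algorithm evolves the state $|0\rangle^{\otimes n}$ to the state
$(H^{\otimes n} U_0 H^{\otimes n} U_f)^Q H^{\otimes n} |0\rangle^{\otimes n}$.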
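
The factorization of the diffusion operation through Hadamard operators can be verified numerically for a small register. The sketch below builds the n-qubit Hadamard transform as a Kronecker product and compares the product with the reflection about the uniform superposition; the helper names are illustrative.

```python
from functools import reduce

import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def hadamard_n(n: int) -> np.ndarray:
    """n-fold Kronecker power of the single-qubit Hadamard operator."""
    return reduce(np.kron, [H] * n)


def diffusion_from_hadamards(n: int) -> np.ndarray:
    """Hadamards, reflection about |0...0>, Hadamards."""
    N = 2 ** n
    U_0 = -np.eye(N)
    U_0[0, 0] = 1.0                      # 2|0...0><0...0| - 1
    Hn = hadamard_n(n)
    return Hn @ U_0 @ Hn


n = 3
N = 2 ** n
s = np.full(N, 1.0 / np.sqrt(N))
direct = 2.0 * np.outer(s, s) - np.eye(N)   # reflection about |s>, built directly

if __name__ == "__main__":
    print("factorized form matches:", np.allclose(diffusion_from_hadamards(n), direct))
```

The dense matrices here have N^2 entries and serve only as a check; on a quantum register the same operator needs only O(n) qubit-level operations, as stated above.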

When there are $M$ target items, instead of
just one, all that is required is to replace $|t\rangle$ by
$\sum_{j=1}^{M} |t_j\rangle/\sqrt{M}$ in the algorithm. The final
measurement then yields one of the target items after
$O(\sqrt{N/M})$ queries. Thus, we have [15]:


Theorem 1 (Grover search) There is a quantum
black-box unstructured search algorithm
with success probability $\Theta(1)$, which finds any
one of the $M$ target items in a set of $N$ items,
using $O(\sqrt{N/M})$ queries.
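
A small simulation can illustrate the scaling with M. The sketch below assumes that the number of targets M is known in advance, uses 0-based labels, and applies both reflections as vector operations rather than matrices; the function name is illustrative.

```python
import numpy as np


def grover_multi(N: int, targets):
    """Simulate Grover search with several targets.

    Returns the iteration count and the total probability of measuring
    any one of the target items.
    """
    idx = list(targets)
    M = len(idx)
    s = np.full(N, 1.0 / np.sqrt(N))
    signs = np.ones(N)
    signs[idx] = -1.0                          # oracle flips the sign of every target

    theta = np.arcsin(np.sqrt(M / N))
    Q = int(np.floor(np.pi / (4.0 * theta)))   # nearest integer to pi/(4 theta) - 1/2

    state = s.copy()
    for _ in range(Q):
        state = signs * state                      # query
        state = 2.0 * s * (s @ state) - state      # reflection about |s>
    return Q, float(np.sum(state[idx] ** 2))


if __name__ == "__main__":
    Q, p = grover_multi(256, [3, 50, 101, 200])
    print("iterations:", Q, "success probability:", round(p, 4))
```

With N = 256 and M = 4 the ratio N/M equals 64, so the run uses the same number of iterations as a single-target search over 64 items.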


This algorithm has several noteworthy properties:


The algorithm is optimal. It saturates the
$\Omega(\sqrt{N})$ lower bound [5] on the number of
queries required for an unstructured quantum 
search. The evolution from |s) to |t) follows 
the shortest geodesic route in the Hilbert 
space at constant speed. A variational analysis 
shows that the algorithm cannot be improved 
by even a single query [43]. 

The best classical search algorithm has to 
walk randomly through all the items, while the 
quantum search algorithm performs a directed 
walk in the Hilbert space. The square-root 
speedup of the quantum search algorithm can 
therefore be understood as the well-known re- 
sult that directed walk provides a square-root 
speedup over random walk while covering the 
same distance. 

The algorithm can be looked upon as evo- 
lution of the quantum state from |s) to |t), 
governed by a Hamiltonian containing two 
terms, |t) (t| and |s)(s|. The former represents 
a potential energy attracting the state toward 
|t), and the latter represents a kinetic en- 
ergy diffusing the state throughout the Hilbert 
space. The algorithm is then the discrete Trot- 
ter’s formula, generated by exponentiating the 
two terms in the Hamiltonian [17]. 

Grover search does not require the full power 
of quantum dynamics and can be implemented 
using any system that obeys the superposi- 
tion principle. Explicit examples using clas- 
sical waves in the form of coupled oscilla- 
tors have been constructed [20, 29]. In these 
mechanical systems, the role of the uniform 
superposition state is played by the center-of- 
mass mode, and the search problem becomes 
the energy focusing problem. The classical 
wave implementation requires the same num- 
ber of queries as the quantum algorithm. The 
difference is that to represent $N$ items, we
need $N$ wave modes but only $n = \log_2 N$
qubits.

Grover search finds with certainty a single 
target state out of four possibilities using a 
single binary query, i.e., $Q = 1$ for $N = 4$.
The best classical Boolean algorithm can
distinguish only two items with a single binary 
query, and so it needs two binary queries 
to carry out the same task. When the query 
can be factored into subqueries, e.g., the item 
label is searched for one digit at a time and 
not as a whole, the best quantum (or wave) 
arrangement for a database is a quaternary 
tree. Then every subquery reduces the search 
space by a factor of 4, which is a factor-of-2 
advantage over the classically optimal binary 
tree. An additional advantage following from 
commutativity of superposition is that, unlike 
the classical case, the quantum tree does not 
require sorting of the database [28]. 

The quantum search algorithm is robust 
against changes in the initial state and the 
operators, in sharp contrast to many other 
quantum processes that are highly sensitive 
to errors. The initial state and the diffusion 
operator are related in Grover search, but can 
be separated in a more general context [2, 40]. 
Let $U_D$ be the modified diffusion operator
with $|s\rangle$ as an eigenstate with eigenvalue 1,
i.e., the diffusion is translationally invariant.
The algorithm then succeeds with $\Theta(1)$
probability, provided $\alpha = |\langle t|s\rangle|$ as well as the
angular spectral gap of $U_D$ in the vicinity of
identity (say $\theta$) are bounded away from zero.
The number of queries required is $O(B/\alpha)$,
where $B^2$ is related to the second moment of
the eigenvalue distribution of $U_D$ and obeys
$B^2 \leq 1 + (4/\theta^2)$, in contrast to the classical
result $O(1/\alpha^2)$. Grover search, therefore, can
be generalized to an entire class of algorithms 
that use different diffusion operators. This 
flexibility is one of the reasons why Grover 
search ideas appear frequently in quantum 
algorithms. 

Quantum search can be implemented so as 
to be robust also against faulty queries, a 
problem known as bounded-error search. 
When the query has a bounded coherent
error, say
$U_f|i\rangle|1\rangle = \sqrt{p_i}\,(-1)^{f(i)}|\tilde{i}\rangle|1\rangle + \sqrt{1 - p_i}\,|\tilde{i}\rangle|0\rangle$
with each $p_i \geq 0.9$ ($|\tilde{i}\rangle$ are
arbitrary states and the last qubit is a witness
for the fault), quantum search can still be
implemented using $O(\sqrt{N})$ queries [22].
The deterioration of the search algorithm
depends on $p_i$, but that is only in the scaling
constant for the number of queries.


A useful generalization of the quantum search
algorithm is the amplitude amplification technique
[10, 16], which can be applied on top of
nearly any quantum algorithm for any problem.
It says that given a quantum algorithm that solves
a problem with a small success probability $\epsilon$, the
success probability can be increased to roughly
$m^2\epsilon$ using $O(m)$ calls to that algorithm. (Classically,
the success probability can be increased to
only about $m\epsilon$.) For the standard search problem,
the simple algorithm that picks a random item has
success probability $\epsilon = 1/N$, which the quantum
search algorithm increases to $\Theta(1)$.

More formally, let $V$ be the unitary operator
corresponding to an algorithm that evolves the
initial state $|s\rangle$ to $V|s\rangle$. Its success probability is
$\epsilon = |\langle t|V|s\rangle|^2 = |V_{ts}|^2$. The algorithm obtained
by replacing $|s\rangle$ by $V|s\rangle$ in Grover search, i.e.,
replacing $U_D$ by $V U_D V^\dagger$, then increases the
success probability to $\Theta(1)$ in $O(1/|V_{ts}|)$
iterations. In particular, this algorithm evolves
the quantum state in the two-dimensional subspace
spanned by $|t\rangle$ and $V|s\rangle$, rotating it by
angle $2\sin^{-1}|V_{ts}|$ at every iteration. In order to
implement $U_f$, the algorithm needs a witness
for the correctness of the output. Thus, we have
[10, 16]:


Theorem 2 (Amplitude amplification) Let $A$
be a quantum algorithm that outputs a correct
answer with witness, with known probability
$\epsilon = \sin^2\theta$. Furthermore, let $m = \lfloor \pi/(4\theta) \rfloor$.
Then there is an algorithm $A'$ that uses $2m + 1$
calls to $A$ and $A^{-1}$ and outputs a correct answer
with probability $\epsilon' \geq 1 - \epsilon$.
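
The theorem can be illustrated with a small state-vector sketch. Everything concrete below is an assumption made for the example: the initial state is taken to be a basis vector, the algorithm is modeled by a plane rotation V that moves a little amplitude onto a single target, and the helper names are hypothetical. The amplified iteration is the Grover iteration with the reflection about the initial state conjugated by V.

```python
import numpy as np


def rotation(N: int, a: int, b: int, angle: float) -> np.ndarray:
    """Real unitary rotating basis vector a toward basis vector b by the given angle."""
    V = np.eye(N)
    c, sn = np.cos(angle), np.sin(angle)
    V[a, a] = c
    V[b, b] = c
    V[b, a] = sn
    V[a, b] = -sn
    return V


def amplify(V: np.ndarray, s: np.ndarray, target: int):
    """Amplitude amplification of the algorithm V started from state s.

    Returns (initial success probability, iterations m, final success probability).
    V is applied once at the start and V, V^dagger once per iteration,
    for 2m + 1 applications in total.
    """
    N = len(s)
    start = V @ s
    eps = abs(start[target]) ** 2
    theta = np.arcsin(np.sqrt(eps))
    m = int(np.floor(np.pi / (4.0 * theta)))

    U_f = np.eye(N, dtype=complex)
    U_f[target, target] = -1.0                       # witness-based sign flip
    U_D = 2.0 * np.outer(s, s.conj()) - np.eye(N)    # reflection about |s>
    G = V @ U_D @ V.conj().T @ U_f                   # one amplification step

    state = start.astype(complex)
    for _ in range(m):
        state = G @ state
    return eps, m, abs(state[target]) ** 2


N, target = 8, 5
s = np.zeros(N)
s[0] = 1.0
eps, m, p_final = amplify(rotation(N, 0, target, 0.1), s, target)

if __name__ == "__main__":
    print(f"initial {eps:.4f}, iterations {m}, final {p_final:.4f}")
```

In this example the initial success probability is about one percent, and the final value stays above the bound 1 - epsilon stated in the theorem.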


Depending on the actual implementation, it
is possible to vary the quantum search algorithm
somewhat from the preceding deterministic
and optimal approach and obtain small improvements:


• The algorithm needs $O(\sqrt{N} \log N)$ qubit-level
operations in order to implement $H^{\otimes n}$
and $U_D$. The $\log N$ factor in this count can
be suppressed by adding a small number of
queries to the algorithm. A simple scheme
divides the $n$ qubits into $k$ sets of $n/k$ qubits
each and uses the Grover diffusion operators
$U_D^{(j)} = H^{\otimes n/k} U_0^{(j)} H^{\otimes n/k}$ that act only
on one set at a time, leaving the other sets
unchanged [18]. Sequentially going through
all the sets generates the transformation
$V = \left( \prod_{j=1}^{k} U_D^{(j)} U_f \right) H^{\otimes n}$, which is then used
for amplitude amplification with the initial
state $|0\rangle^{\otimes n}$. Overall, the number of qubit-level
operations reduces by a factor $O(k)$,
while the number of queries goes up by a factor
$1 + \Theta(k N^{-1/k})$, provided $k N^{-1/k} = o(1)$.
The choice $k = \Theta(n/\log n)$ reduces the
qubit-level operations to $O(\sqrt{N} \log n)$, at
the cost of increasing the queries by a factor
$1 + \Theta(1/\log n)$.

• Consider the partial search problem where the
items are separated into $N/b$ blocks of size
$b$ each, and only the block containing the
desired item is to be located using the same
$U_f$. In that case, the number of queries can
be reduced by $0.34\sqrt{b}$ for large $b$ [25]. The
procedure first uses Grover search to make
the amplitudes of nontarget blocks sufficiently
small, then applies Grover search in parallel
within each block to make the amplitudes of
the target block sufficiently negative (amplitudes
of nontarget blocks remain unchanged
in this step), and then executes a final $U_D$
operation to reduce the amplitudes of nontarget
blocks to zero.

• Though Grover search proceeds from $|s\rangle$ to
$|t\rangle$ with uniform speed in the Hilbert space,
it slows down in terms of the success probability
as it nears the target state. So one
can reduce the expected number of queries
by stopping the algorithm before reaching the
target state and then looking for the desired
item probabilistically. That amounts to minimizing
$(Q + 1)/p$, with the success probability
$p = \sin^2((2Q + 1)\sin^{-1}(1/\sqrt{N}))$. This
probabilistic search reduces the required number
of queries asymptotically to $0.690\sqrt{N}$
[14].

• Consider the search problem where the times
required for querying different items are
not the same. This can happen when the
query is an algorithm $A$ acting on different
input states $|i\rangle$. When the query for the $i$th
item takes time $t_i$ to execute, unstructured
quantum search can be accomplished in time
$O\left(T = \left(\sum_{i=1}^{N} t_i^2\right)^{1/2}\right)$ when $t_i$ are known
a priori [3]. The strategy is to divide the items
into multiple groups so that items in every
group have query times within a constant
factor, apply Grover search within each group,
and then query the groups sequentially. The
number of groups needed is $O(\log N)$, and the
result improves upon the global Grover search
bound $O(\sqrt{N}\, t_{\max})$. Amplitude amplification
can be used when $A$ is probabilistic, and a
polylogarithmic overhead in $T$ is required
when $t_i$ are not known in advance.
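
The constant quoted for early stopping in the list above can be reproduced numerically. The sketch below minimizes the ratio (Q + 1)/p over integer iteration counts by brute force, using the success probability p after Q iterations. Reading the extra 1 as the cost of checking the measured item is an interpretation adopted here, and the function names are illustrative.

```python
import math


def expected_queries(Q: int, N: int) -> float:
    """Expected cost (Q + 1)/p of repeating a Q-iteration run until success."""
    theta = math.asin(1.0 / math.sqrt(N))
    p = math.sin((2 * Q + 1) * theta) ** 2
    return (Q + 1) / p


def best_stopping_point(N: int) -> int:
    """Iteration count that minimizes the expected cost, found by exhaustive scan."""
    Q_full = int(math.floor(math.pi / (4.0 * math.asin(1.0 / math.sqrt(N)))))
    return min(range(Q_full + 1), key=lambda Q: expected_queries(Q, N))


if __name__ == "__main__":
    N = 10 ** 6
    Q = best_stopping_point(N)
    ratio = expected_queries(Q, N) / math.sqrt(N)
    print(f"stop after {Q} iterations, expected cost = {ratio:.3f} * sqrt(N)")
```

For comparison, running to the target state costs about (pi/4) sqrt(N), roughly 0.785 sqrt(N), so stopping early and retrying on failure saves about 12 % of the queries on average.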


Applications 


NP-Complete Problems 

Even though NP-complete problems have some 
structure, there are few known algorithms that 
exploit this structure to solve them, and often the 
only recourse left is to solve them as exhaustive 
search problems. Since quantum search does not 
assume any structure or pattern in the input data, 
it provides a square-root speedup in such cases. 


Quantum Counting 

The counting problem is to find the number of
items in a set that satisfy the given query. Its
quantum solution is based on the fact that the
iterative evolution in Grover search is periodic,
with angular frequency $\omega = 2\sin^{-1}(\sqrt{M/N})$.
The phase estimation procedure (based on quantum
Fourier transform) [24] can therefore determine
$M$ approximately, up to error $\sqrt{M}$, using
$O(\sqrt{N})$ queries. Then, using the property that
$\omega$ differs by $1/\sqrt{M(N - M)}$ between adjacent
values of $M$, $M$ can be determined exactly using
$O(\sqrt{M(N - M)})$ queries [27]. For $M = o(N)$,
this quantum result is a power-law improvement
over the classical result of $\Theta(N)$ queries,
although not as good as a square-root speedup.
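
The spectral fact behind quantum counting can be checked classically for a small instance. The sketch below does not perform phase estimation; it simply diagonalizes the Grover iteration (oracle reflection followed by reflection about the uniform superposition), reads off the rotation angle, and inverts the relation between the angle and M. Labels are 0-based and the function names are illustrative.

```python
import numpy as np


def grover_operator(N: int, targets) -> np.ndarray:
    """Dense matrix of one Grover iteration for the given target set."""
    idx = list(targets)
    s = np.full(N, 1.0 / np.sqrt(N))
    U_f = np.eye(N)
    U_f[idx, idx] = -1.0                       # flip the sign of each target
    U_D = 2.0 * np.outer(s, s) - np.eye(N)     # reflection about |s>
    return U_D @ U_f


def count_from_spectrum(N: int, targets):
    """Recover the rotation angle and the number of targets from eigenphases."""
    phases = np.angle(np.linalg.eigvals(grover_operator(N, targets)))
    # Discard the trivial eigenvalues +1 and -1; keep the rotation angle in (0, pi).
    omega = min(float(p) for p in phases if 1e-3 < p < np.pi - 1e-3)
    return omega, N * np.sin(omega / 2.0) ** 2


if __name__ == "__main__":
    omega, M_est = count_from_spectrum(16, [2, 7, 11])
    print(f"omega = {omega:.5f}, estimated M = {M_est:.5f}")
```

Phase estimation would obtain the same angle to finite precision by using the query repeatedly, which is where the query counts quoted above come from.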


Element Distinctness 

An early application of Grover search was to find
collisions, i.e., given oracle access to a 2-to-1
function $f$, find distinct arguments $x, y$ such that
$f(x) = f(y)$. The quantum collision problem
has an $O(N^{1/3})$ algorithm [9]. The more general
element distinctness problem is to find distinct
$x, y$ such that $f(x) = f(y)$, for an unknown
function $f$ that can be accessed only by an oracle.
Ambainis discovered an optimal $O(N^{2/3})$ quantum
algorithm for this problem [2]. It searches
a suitably constructed graph, with the Grover
diffusion operation replaced by a certain quantum
walk. The vertices of the graph correspond to
various subsets of items $S_i \subset \{1, 2, \ldots, N\}$,
each of size $N^{2/3}$; two vertices are connected
by an edge when the corresponding subsets differ
by only one item, and the target vertices are the
subsets $S_i$ that solve the element distinctness
problem.


Distributed Search 

Grover search is also useful in improving com- 
munication complexity. For example, a straight- 
forward distributed implementation of the quan- 
tum search algorithm solves the set intersection 
problem or the appointment problem. The result 
is that, when A and B have respective data strings
$x, y \in \{0,1\}^N$ and they want to find an index i
such that $x_i = y_i = 1$, only $O(\sqrt{N} \log N)$
qubits of communication are necessary [11]. This
result has led to an exponential classical/quantum 
separation in the memory required to evaluate a 
certain total function with a streaming input [26]. 


Fixed-Point Search 
The iterative evolution in Grover’s search algorithm
is cyclic, and knowledge of N is necessary
to stop it at the right time to find the target
item. In contrast, fixed-point search algorithms
converge monotonically to the target state. For a
long time, a fixed-point quantum algorithm was
considered unlikely, since any iterative unitary
evolution is periodic. Surprisingly, a way out was
provided by recursive unitary evolution. When
the reflection operations in the amplitude amplification
algorithm are replaced by selective phase
shifts of $\pi/3$ (e.g., $R_i = 1 + (e^{i\pi/3} - 1)|i\rangle\langle i|$
for the state $|i\rangle$), $|\langle t|V|s\rangle|^2 = 1 - \epsilon$ implies
$|\langle t|V R_s V^\dagger R_t V|s\rangle|^2 = 1 - \epsilon^3$ [19]. So each
recursive substitution of the operator V by the
operator $V R_s V^\dagger R_t V$ reduces the deviation of
the final state from the target state to the cube
of what it was before. (The corresponding best
classical reduction is $O(\epsilon^2)$, e.g., by majority rule
selection after three trials.) This technique does
not give a square-root speedup for search, but it
is useful when $\epsilon$ is small, for instance, in error
correction. It has been used to design composite 
pulse sequences for reducing systematic errors 
[35]. An iterative quantum search algorithm with 
similar properties has been obtained combining 
reflection operations with non-unitary projective 
measurements [41]. Another recent construction 
is a bounded-error quantum search algorithm
(i.e., success probability $p > 1 - \delta$) that varies
the phase shifts between $\pi$ and $\pi/3$ as a function
of the iteration number [42]. It exhibits square-root
speedup as well as convergence to the target
state, provided that both $\delta$ and $|\langle t|s\rangle|$ are bounded
away from zero.
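
The cubing of the failure probability under the $\pi/3$ phase shifts can be verified on a small example. The sketch below is an illustration only: it assumes a two-dimensional state space with the source and target taken as the two basis states, and an arbitrary 2 x 2 unitary V parameterized by hypothetical angles theta and alpha. It applies $V R_s V^\dagger R_t V$ to the source state and compares the resulting failure probability with $\epsilon^3$.

```python
import cmath
import math

def apply(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(matrix[r][c] * vec[c] for c in range(len(vec)))
            for r in range(len(matrix))]

def dagger(matrix):
    """Conjugate transpose of a square matrix."""
    n = len(matrix)
    return [[matrix[c][r].conjugate() for c in range(n)] for r in range(n)]

def phase_shift(vec, index, angle=math.pi / 3):
    """R_i = 1 + (e^{i*angle} - 1)|i><i| applied to vec, for basis state i."""
    out = list(vec)
    out[index] *= cmath.exp(1j * angle)
    return out

def failure_before_and_after(theta, alpha=0.3):
    """Return (epsilon, epsilon') for V and for V R_s V^dagger R_t V."""
    c, s = math.cos(theta), math.sin(theta)
    ph = cmath.exp(1j * alpha)
    V = [[c, -s * ph], [s * ph.conjugate(), c]]  # an arbitrary 2x2 unitary
    source, target = 0, 1                        # basis states (assumed)
    start = [1.0 + 0j, 0j]
    eps = 1.0 - abs(apply(V, start)[target]) ** 2
    state = apply(V, start)              # V|s>
    state = phase_shift(state, target)   # R_t
    state = apply(dagger(V), state)      # V^dagger
    state = phase_shift(state, source)   # R_s
    state = apply(V, state)              # V
    return eps, 1.0 - abs(state[target]) ** 2

eps, eps_after = failure_before_and_after(theta=1.0)
print(eps, eps_after, eps ** 3)
```

For any choice of theta and alpha, the second and third printed numbers coincide up to floating-point error, matching the $1 - \epsilon \to 1 - \epsilon^3$ statement.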


Spatial Search 

This is the search problem where the items be- 
longing to a database are spread over distinct 
physical locations, say a d-dimensional lattice, 
and there is a restriction that one can proceed 
from any location to only its neighbors while 
searching for the target item. Its quantum solution 
replaces the global Grover diffusion operator by a 
local quantum walk, and Grover search becomes 
the $d \to \infty$ limit. The required number of
queries has to obey the double lower bound
$\Omega(dN^{1/d}, \sqrt{N})$; the former arises from the
finite speed of movement on the lattice and the
latter from the optimality of Grover search. The
best algorithms are found in the framework of
relativistic quantum mechanics. They use $O(\sqrt{N})$
queries for d > 2, with the scaling constant
approaching $\pi/4$ from above as $d \to \infty$ [1,33].
In the critical dimension d = 2, the algorithms
are slowed down by logarithmic factors arising
from the infrared divergence, and the best known
algorithm requires $O(\sqrt{N \log N})$ queries [39].
For non-integer values of d, the scaling behavior
of the algorithm has been verified using numerical
simulations on fractal lattices [32].


Markov Chain Evolution 

Generic stationary stochastic processes (e.g., ran- 
dom walks) are defined in terms of transition 
matrices that encode the possible evolutionary 
changes at each step. Many properties of the 
resulting evolution (e.g., hitting time, detection, 
mixing, escape time) scale as negative powers 
of the spectral gap of the transition matrix. For 
Markovian evolution on bipartite graphs, the tran- 
sition matrix can be separated into two disjoint 
parts, say $\{x\} \to \{y\}$ and $\{y\} \to \{x\}$. Szegedy
constructed two reflection operators from these 
parts and defined a quantum evolution operator 
as their product [38] (classical Markov chain 
evolution does not allow such reflection opera- 
tors). The spectral gap of this quantum evolution 
operator scales as the square root of the spectral 
gap of the original transition matrix and so speeds 
up the evolution the same way as Grover search 
does. 


Recursive Search 

Game-tree evaluation, which is a recursive search
problem, is an extension of unstructured search.
Classically, using the alpha-beta pruning technique,
the value of a balanced binary AND-OR
tree can be computed with o(1) error in expected
time $O(N^{\log_2[(1+\sqrt{33})/4]}) = O(N^{0.754})$ [36].
This is optimal even for bounded-error algorithms
[37]. By applying quantum search recursively,
a depth-d regular AND-OR tree can
be evaluated with constant error in time
$\sqrt{N} \cdot O(\log N)^{d-1}$. The log factors come from
amplifying the success probability of inner searches to
be close to one. Bounded-error quantum search
eliminates these log factors, reducing the time to
$O(\sqrt{N} \cdot c^d)$ for some constant c. Recently, an
$O(N^{0.5+o(1)})$ time algorithm has been discovered
for evaluating an arbitrary AND-OR tree on
N variables [4, 13].


Open Problems 


In several applications of the quantum search 
algorithm, only the leading asymptotic behavior 
of the query complexity is known, and attempts 
to suppress logarithmic corrections (when they 
appear) and reduce the scaling constants continue 
[34]. In this section, we point out some other 
offbeat applications. 


Hamiltonian Evolution 

Many conventional algorithms for simulations of 
quantum systems with sparse Hamiltonians use 
the Trotter formula with a small step size. They 
have a power-law dependence of the computa- 
tional complexity on the simulation error and 
hence are not efficient. In contrast, Grover search 
amounts to a Trotter formula with the largest 
possible step size, given the projection operator 
nature of the terms in the Hamiltonian. A recent 
exciting realization is that this feature leads to 
only logarithmic dependence of the computa- 
tional complexity on the simulation error, which 
is an exponential improvement. The general strat- 
egy is to decompose the sparse Hamiltonian as a 
sum of projection operators, formulate the evo- 
lution problem as a multi-query search problem, 
and then use a large step size Trotter formula 
to simulate it [7,31]. This framework can also 
readily benefit classical simulations of quantum 
systems. 


Molecular Biology 

Many molecular processes of metabolism occur 
at scales, nanometer and picosecond, where 
quantum dynamics is relevant. They frequently 
involve unstructured search and transport, in the 
sense that correct ingredients for the processes 
have to be found from the mixture of molecules 
floating around. Evolution over billions of years
has certainly produced complex machinery to 
carry out these searches efficiently, although 
we do not fully comprehend their optimization 
criteria. Attempts to understand some of these 
processes suggest that Grover search may 
have played a role in their design, quite 
likely exploiting coherent coupled vibrational 
modes and not quantum superposition. An 
intriguing example is that the universal genetic 
language uses an alphabet of four letters, 
while a binary alphabet would be sufficient 
and simpler to construct during evolution [30]. 
Coherent vibrational dynamics of molecules also 
contributes to efficient energy transport during 
photosynthesis and to the detection of smell [23]. 


Ordered Search 

A sequentially ordered database can be easily
searched by factoring f(i) into subqueries for
individual digits of i. An alternative is to use a
different oracle g(i), such that g(i) = 0 when
i represents items before the desired item and
g(i) = 1 otherwise. Classically, binary search
is the optimal algorithm given either f(i) or
g(i) and requires $\lceil \log_2 N \rceil$ queries. The optimal
quantum algorithm for f(i) is quaternary
search with $0.5 \lceil \log_2 N \rceil$ queries, but surprisingly
a quantum algorithm using g(i) can do better.
In case of g(i), though the optimal solution is
unknown, the query complexity for an exact algorithm
has a lower bound of $0.221 \log_2 N$ [21] and
a known solution of $0.433 \log_2 N$ [12] (there also
exists a quantum stochastic Las Vegas algorithm
with $0.32 \log_2 N$ expected queries and o(1) error
[6]).
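
As a classical point of reference for these query counts, the following sketch runs binary search against the oracle g(i) and counts oracle calls. The helper names are illustrative assumptions, items are indexed from 0, and the desired item is assumed to be present.

```python
import math

def make_oracle(target):
    """g(i) = 0 for items before the desired item, 1 otherwise."""
    return lambda i: 0 if i < target else 1

def binary_search_with_g(num_items, g):
    """Return (index, queries): the smallest i with g(i) == 1 and the calls used."""
    lo, hi, queries = 0, num_items - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if g(mid):
            hi = mid        # desired item is at mid or before it
        else:
            lo = mid + 1    # desired item is after mid
    return lo, queries

N = 1000
worst = max(binary_search_with_g(N, make_oracle(t))[1] for t in range(N))
print(worst, math.ceil(math.log2(N)))
```

The worst case over all targets never exceeds $\lceil \log_2 N \rceil$ oracle calls, which is the classical figure the quantum bounds above are compared against.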


Search with Additional Structure 

It may be possible to speed up a search pro- 
cess beyond the square-root speedup of Grover 
search, when the problem has extra structure 
beyond the minimal information provided by the 
oracle f(i). The details of the algorithm and 
the extent of speedup would then depend on 
the extra structure, and the possibilities are open 
to explorations. Some examples are symmetries 
among the items, associative memory recall with 
connections, and patterns in the Boolean function 
to be evaluated. Another problem of interest is
determination of the complete path (with certain 
properties) from the initial to the target state 
instead of locations of just the end points. 


Perspective 

In a lecture at the Bell Labs in 1985, Richard 
Feynman made an interesting observation. In 
the 1940s when airplanes were being developed, 
aeronautical engineers had proved bounds and 
theorems about why planes would never be able 
to fly faster than the speed of sound. For several 
years, this speed was regarded as being as fundamental
a bound for flight as the speed of light is for
communications. However, gradually just by 
using intelligent design, it was discovered that 
airplanes could indeed fly faster than the speed 
of sound — only the rules of design in the new 
regime were very different. The question is
whether the bounds on quantum computation
(specifically the $\Omega(\sqrt{N})$ bound for search) will
continue to hold, or by making the rules of design
very different, just as in the case of supersonic 
airplanes, someone will find a way around these 
bounds. No one has found any loophole in the 
arguments in the 20 years since the lower bound 
for quantum search was discovered, despite 
numerous scientists from different fields having 
tried their hand at it. On the other hand, even 
though this bound has been derived over and over 
again using different methods, no one has come 
up with a simple and short physical explanation 
for it, which would give one the assurance that 
one really understood it. 
Problem Definition


Our goal is to design differentially private
algorithms to answer statistical queries on a
sensitive database. We model the database
$D = (x_1, \ldots, x_n) \in (\{0,1\}^d)^n$ as a collection of n
records, one per individual, each consisting
of d binary attributes. A differentially private
algorithm is a randomized algorithm whose output
distribution does not depend “significantly”
on any one record of the database. The formal
definition is as follows:


Definition 1 ([8]) An algorithm $A : (\{0,1\}^d)^n \to R$
is $(\epsilon, \delta)$-differentially private if for every pair
of databases $D, D' \in (\{0,1\}^d)^n$ that differ on at
most one row and every $S \subseteq R$,


$$P[A(D) \in S] \le e^{\epsilon} P[A(D') \in S] + \delta.$$


Henceforth, we will say A is differentially private
if it satisfies $(1, 1/n^2)$-differential privacy. (The
choice of $\epsilon = 1$ and $\delta = 1/n^2$ is arbitrary and
can be replaced with $\epsilon = c$ and $\delta = 1/n^{1+c'}$
for any constants $c, c' > 0$ without affecting any
stated results.)


A statistical query (henceforth, simply
query) is specified by a Boolean predicate
$q : \{0,1\}^d \to \{0,1\}$. The answer to a query is
the expected value of the predicate over records
in the database. Abusing notation, we write


$$q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x_i).$$


We wish to design a differentially private al- 
gorithm A that takes a database and a set of 
statistical queries and outputs an approximate 
answer to each query. 


Definition 2 An algorithm A is $\alpha$-accurate for
a query q if A(D) outputs $a \in [0,1]$ such that
$|a - q(D)| \le \alpha$, with probability at least 99/100.
An algorithm A is $\alpha$-accurate for a set of queries
$Q = \{q_1, q_2, \ldots\}$ if A(D) outputs $(a_q)_{q \in Q}$ such
that for every $q \in Q$, $|a_q - q(D)| \le \alpha$ with
probability at least 99/100.


The goal is to design differentially private
algorithms that are $\alpha$-accurate for sets of queries
Q as large as possible. As privacy is easier to
achieve when the number of records n is large, 
we will seek to obtain privacy and accuracy for 
n as small as possible. Lastly, we seek to make 
the algorithms as computationally efficient as 
possible. 
Key Results


As a baseline, we will consider simple addi- 
tive perturbation [2, 6-8], which answers each 
query by independently perturbing the answer 
with noise from a suitable distribution. 


Theorem 1 There is a differentially private
algorithm A that takes a database $D \in (\{0,1\}^d)^n$
and a set of queries $Q = \{q_1, \ldots, q_k\}$ as input,
runs in time $\mathrm{poly}(n, d, |q_1| + \cdots + |q_k|)$, and is
$\alpha$-accurate for Q so long as $n \ge O(|Q|^{1/2}/\alpha)$.


Here, |q| represents the time complexity of
evaluating the predicate on a single row of the
database. Typically, this is assumed to be
poly(d).

Additive perturbation is differentially private 
and computationally efficient, but requires that 
the size of the database be polynomial in the num- 
ber of queries, and thus is restricted to answering 
at most about $n^2$ queries. As we will see, it is
possible to accurately answer exponentially more 
queries under differential privacy. 
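
To make the baseline concrete, here is a minimal sketch of additive perturbation. It is an assumption-laden illustration rather than the mechanism behind Theorem 1: it uses Laplace noise with the privacy budget split evenly across queries (basic composition), which needs n on the order of $|Q|/\alpha$ and is therefore weaker than the $|Q|^{1/2}/\alpha$ bound stated above, whose proof relies on a tighter analysis. All function names are hypothetical.

```python
import random

def exact_answer(query, database):
    """q(D): the average of the 0/1 predicate over the records."""
    return sum(query(record) for record in database) / len(database)

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def additive_perturbation(database, queries, epsilon=1.0, rng=None):
    """Answer every query with independent Laplace noise (a sketch)."""
    rng = rng or random.Random()
    # Each q(D) changes by at most 1/n when one record changes, and the
    # budget epsilon is split evenly over the queries (basic composition).
    scale = len(queries) / (epsilon * len(database))
    return [exact_answer(q, database) + laplace_noise(scale, rng)
            for q in queries]

# Toy database with d = 2 attributes (illustrative data only).
database = [(i % 2, 1) for i in range(100000)]
queries = [lambda x: x[0], lambda x: x[1]]
noisy = additive_perturbation(database, queries, rng=random.Random(0))
print(noisy)
```

The noise scale grows linearly with the number of queries in this sketch, which illustrates why additive perturbation cannot handle a number of queries far beyond a polynomial in n.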


Answering Many Queries via No-Regret 
Learning 

The first algorithm that improved on additive 
perturbation for answering arbitrary queries was 
given by Blum, Ligett, and Roth [3]. Surprisingly, 
they showed for the first time that it was possi- 
ble to answer exponentially many queries under 
differential privacy. Subsequent to their work, 
there were several improvements in the compu- 
tational efficiency, functionality, and quantitative 
guarantees of their algorithm. This work led to 
the private multiplicative weights algorithm of 
Hardt and Rothblum [10]. We summarize the 
capabilities of this algorithm in the following 
theorem. 


Theorem 2 ([10]) There is a differentially private
algorithm A that takes a database
$D \in (\{0,1\}^d)^n$ and a set of queries $Q = \{q_1, \ldots, q_k\}$
as input, runs in time $\mathrm{poly}(n, 2^d, |q_1| + \cdots + |q_k|)$,
and is $\alpha$-accurate for Q so long as
$n \ge O(\sqrt{d} \log |Q| / \alpha^2)$.


The private multiplicative weights algorithm
is based on the following surprisingly simple
framework: Begin with a “crude approximation”
$D^1$ of the database. Then, for $t = 1, \ldots, T$,
find (in a differentially private manner) a query
$q^t \in Q$ such that the approximation $D^t$ does
not give an accurate answer. That is,
$|q^t(D) - q^t(D^t)| > \alpha$. Use $q^t$ to “update” $D^t$ into a better
approximation $D^{t+1}$. Finally, output the answers
to Q given by $D^T$.

Remarkably, it is possible to find a query $q^t \in Q$
such that $D^t$ is inaccurate (or conclude that
none exists) using much less data than would
be required to simply answer all the queries
in Q using additive perturbation. Perhaps even
more surprisingly, it can be shown that if the
updates are performed using the multiplicative
weights update rule, then after $T = O(d/\alpha^2)$
iterations (independent of |Q|!), the database $D^T$
will give an accurate answer to every $q \in Q$.
This argument makes use of the guarantee that
the multiplicative weights update rule is a “no-regret
learning algorithm” (cf. the survey of Arora,
Hazan, and Kale [1] for more information about
the multiplicative weights update rule). This fast
convergence makes it possible to argue that the 
algorithm can give accurate and differentially 
private answers with much less data than would 
be required by simple additive perturbation. 
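
The update framework can be sketched as follows. This is deliberately the non-private skeleton only: the inaccurate query is selected exactly rather than in a differentially private manner, so the code illustrates the convergence of the multiplicative weights update and provides no privacy guarantee. The approximation is kept as a distribution over all of $\{0,1\}^d$, which is where the $2^d$ running time comes from. The step size $\alpha/2$, the round limit, and all names are assumptions made for the sketch.

```python
import itertools
import math

def answer(query, dist):
    """Expected value of a 0/1 predicate under a distribution over records."""
    return sum(weight * query(x) for x, weight in dist.items())

def multiplicative_weights(true_dist, queries, alpha):
    """Refine a uniform 'crude approximation' until every query is alpha-accurate."""
    universe = list(true_dist)
    eta = alpha / 2.0                                      # assumed step size
    approx = {x: 1.0 / len(universe) for x in universe}    # crude D^1
    max_rounds = math.ceil(4 * math.log(len(universe)) / alpha ** 2) + 1
    for _ in range(max_rounds):
        # NON-PRIVATE selection of the worst query (assumption of this sketch).
        worst = max(queries,
                    key=lambda q: abs(answer(q, approx) - answer(q, true_dist)))
        gap = answer(worst, approx) - answer(worst, true_dist)
        if abs(gap) <= alpha:
            break                                          # all queries accurate
        sign = -1.0 if gap > 0 else 1.0                    # shrink if overestimating
        approx = {x: w * math.exp(sign * eta * worst(x)) for x, w in approx.items()}
        total = sum(approx.values())
        approx = {x: w / total for x, w in approx.items()}
    return approx

d = 3
universe = list(itertools.product((0, 1), repeat=d))
database = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]    # toy data
true_dist = {x: database.count(x) / len(database) for x in universe}
queries = [lambda x, j=j: x[j] for j in range(d)]
approx = multiplicative_weights(true_dist, queries, alpha=0.1)
worst_error = max(abs(answer(q, approx) - answer(q, true_dist)) for q in queries)
print(worst_error)
```

A private version would replace the exact selection step with a randomized, differentially private one and would add noise to the measured gap; those steps are omitted here.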


Computational Complexity and Optimality 

When |Q| is large, the private multiplicative
weights algorithm requires many fewer records n
than additive perturbation (when $|Q| \gg d/\alpha^2$).
One might ask whether even fewer records
suffice. Bun, Ullman, and Vadhan [4] gave a
negative answer to this question, and showed that
the private multiplicative weights algorithm uses
essentially the fewest records possible.


Theorem 3 ([4]) There is no (even computationally
inefficient) differentially private algorithm
A that takes an arbitrary set of queries
$Q = \{q_1, \ldots, q_k\}$ with $k \gg d/\alpha^2$ and a database
$D \in (\{0,1\}^d)^n$ with $n \le \Omega(\sqrt{d} \log |Q| / \alpha^2)$ as
input and is $\alpha$-accurate for the set of queries Q.


A drawback of the private multiplicative 
weights algorithm (and all known algorithms 
with similar properties), when compared to ad- 
ditive perturbation, is computational complexity. 
Even when answering a polynomial number of
efficiently computable queries, the running time
of private multiplicative weights is dominated
by the factor of $2^d$, which is exponential in the
number of attributes in the database. Ullman [13]
showed that this is inherent, and (under a widely 
believed cryptographic assumption) improving 
on additive perturbation requires exponential 
running time. 


Theorem 4 ([13]) Assuming the existence of
one-way functions, there is no differentially
private algorithm A that takes an arbitrary
set of queries $Q = \{q_1, \ldots, q_k\}$ and a database
$D \in (\{0,1\}^d)^n$ with $n \le \Omega(|Q|^{1/2})$ as input,
runs in time $\mathrm{poly}(n, d, |q_1| + \cdots + |q_k|)$, and is
1/3-accurate for the set of queries Q.


Together, these negative results show the private 
multiplicative weights algorithm is nearly opti- 
mal for answering large sets of arbitrary statis- 
tical queries under differential privacy. 


Faster Algorithms for Marginal Queries via 
Efficient Learning 

Given the hardness of answering arbitrary
queries, there has been a significant effort to
design faster differentially private algorithms that
improve on additive noise for natural restricted
sets of queries. One such set of queries is
k-way marginals. These queries are specified by a
subset of attributes $S \subseteq [d]$ of size at most k and
a pattern $t \in \{0,1\}^{|S|}$ and ask for the fraction of
records in D that have each attribute $j \in S$ set
to $t_j$. Note that there are $\mathrm{poly}(d^k)$ such queries,
and thus, additive perturbation would require
running time $\mathrm{poly}(d^k)$ and $n \ge \mathrm{poly}(d^k)$. On the
other hand, private multiplicative weights would
require running time $\mathrm{poly}(2^d)$, but $n \ge O(k\sqrt{d})$
would suffice.
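
A k-way marginal query is simple to state in code. The sketch below evaluates one exactly (with no privacy protection); the function names and the toy database are illustrative assumptions, and attributes are indexed from 0.

```python
def marginal_query(attributes, pattern):
    """Predicate q(x) = 1 iff x[j] == t_j for every attribute j in S."""
    def q(record):
        return int(all(record[j] == bit for j, bit in zip(attributes, pattern)))
    return q

def answer(query, database):
    """Fraction of records in the database that satisfy the predicate."""
    return sum(query(x) for x in database) / len(database)

# Toy database with d = 3 binary attributes (illustrative data only).
database = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
q = marginal_query(attributes=(0, 2), pattern=(1, 1))
print(answer(q, database))  # fraction of records with x[0] = 1 and x[2] = 1
```

Each such predicate is a conjunction over at most k attributes, which is the structure the learning-based algorithms in this section exploit.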

Most of the more effective algorithms for 
answering k-way marginal queries are based on 
the following technique, introduced by Gupta 
et al. [9]: View the database D as specifying a
function $f_D(q) = q(D)$ that maps a query to its
answer on D, and then attempt to “learn” a
differentially private approximation $g_D \approx f_D$.
Intuitively, the value of this approach is that learning
algorithms see the evaluation of $f_D$ on a small
number of queries and then are able to predict the
value of $f_D$ on new queries. Since the learning
algorithm only needs a small number of examples,
it is easier to ensure differential privacy. If the
queries q are “simple,” then good learning
algorithms may exist for the function $f_D$. In the case
of k-way marginal queries, it turns out that $f_D$
is in fact (an average of) conjunctions, and there 
are learning algorithms for this class of functions 
that satisfy various interesting parameter trade- 
offs. This technique underlies the following two 
results: 


Theorem 5 ([12], building on [11]) For every
$k \in \mathbb{N}$, there is a differentially private
algorithm A that takes a database $D \in (\{0,1\}^d)^n$
as input and runs in time $\mathrm{poly}(n, d^{\sqrt{k}})$, and if
$n \ge \mathrm{poly}(d^{\sqrt{k}})$, A outputs a summary of the
database that yields 1/100-accurate answers to
every k-way marginal query. That is, for every
k-way marginal query q, one can obtain a
1/100-accurate answer to q in time $\mathrm{poly}(n, d^{\sqrt{k}})$.


Theorem 6 ([5]) For every $k \in \mathbb{N}$, there is a
differentially private algorithm A that takes a
database $D \in (\{0,1\}^d)^n$ as input and runs in
time $\mathrm{poly}(n, 2^{d^{1-\Omega(1/\sqrt{k})}})$, and if $n \ge kd^{0.51}$, A
outputs 1/100-accurate answers to every k-way
marginal query.


We remark that there are many other algo- 
rithms for answering k-way marginal queries 
based on this learning approach, each achiev- 
ing different parameter trade-offs and guarantees 
of accuracy. At the time of writing, improving 
these algorithms and extending these techniques 
to richer classes of queries remains an active area 
of research. 
Problem Definition


Quorum systems are tools for increasing the 
availability and efficiency of replicated services. 
A quorum system for a universe of servers is 
a collection of subsets of servers, each pair of 
which intersect. Intuitively, each quorum can 
operate on behalf of the system, thus increasing 
its availability and performance, while the 
intersection property guarantees that operations 
done on distinct quorums preserve consistency. 

The motivation for quorum systems stems
from the need to make critical missions
performed by machines reliable. The only
way to increase the reliability of a service, aside
from using intrinsically more robust hardware, is 
via replication. To make a service robust, it can 
be installed on multiple identical servers, each 
one of which holds a copy of the service state 
and performs read/write operations on it. This 
allows the system to provide information and 
perform operations even if some machines fail 
or communication links go down. Unfortunately, 
replication incurs a cost in the need to keep
the servers consistent. To enhance the availability
and performance of a replicated service, Gifford 
and Thomas introduced in 1979 [3, 14] the 
usage of votes assigned to each server, such 
that a majority of the sum of votes is sufficient 
to perform operations. More generally, quorum 
systems are defined formally as follows: 

Quorum system: Assume a universe U
of servers, |U| = n, and an arbitrary number
of clients. A quorum system $\mathcal{Q} \subseteq 2^U$ is a set of
subsets of U, every pair of which intersect. Each
$Q \in \mathcal{Q}$ is called a quorum.


Access Protocol 
To demonstrate the usability of quorum systems 
in constructing replicated services, quorums are 
used here to implement a multi-writer multi- 
reader atomic shared variable. Quorums have also 
been used in various mutual exclusion protocols, 
to achieve Consensus, and in commit protocols. 
In the application, clients perform read and 
write operations on a variable x that is replicated 
at each server in the universe U. A copy of the 
variable x is stored at each server, along with
a timestamp value t. Timestamps are assigned by
a client to each replica of the variable when the 
client writes the replica. Different clients choose 
distinct timestamps, e.g., by choosing integers 
appended with the name of c in the low-order bits. 
The read and write operations are implemented as 
follows. 

Write: For a client c to write the value v, it
queries each server in some quorum Q to obtain
a set of value/timestamp pairs $A = \{(v_u, t_u)\}_{u \in Q}$;
chooses a timestamp $t \in T_c$, from its own set of
timestamps $T_c$, greater than the highest
timestamp value in A; and updates x and the
associated timestamp at each server in Q to v
and t, respectively.

Read: For a client to read x, it queries each
server in some quorum Q to obtain a set of
value/timestamp pairs $A = \{(v_u, t_u)\}_{u \in Q}$. The
client then chooses the pair (v, t) with the highest
timestamp in A to obtain the result of the read
operation. It writes back (v, t) to each server in
some quorum Q'.

In both read and write operations, each server 
updates its local variable and timestamp to the 
received values (v,t) only if t is greater than the 
timestamp currently associated with the variable. 
The above protocol correctly implements the se- 
mantics of a multi-writer multi-reader atomic 
variable (see » Linearizability). 
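
The read and write operations can be illustrated with a small sequential simulation. This is a sketch under simplifying assumptions, not a distributed implementation: there is no concurrency, networking, or failure handling; timestamps are modeled as (counter, client name) pairs so that different clients never choose the same timestamp; and the class and function names are hypothetical.

```python
class Server:
    """One replica holding a copy of x and its timestamp."""

    def __init__(self):
        self.value, self.timestamp = None, (0, "")

    def query(self):
        return self.value, self.timestamp

    def update(self, value, timestamp):
        # Accept the pair only if its timestamp is strictly newer.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

def write(client_id, value, quorum):
    """Query a quorum, pick a higher timestamp, then update the quorum."""
    highest = max(server.query()[1] for server in quorum)
    timestamp = (highest[0] + 1, client_id)   # client name breaks ties
    for server in quorum:
        server.update(value, timestamp)

def read(read_quorum, write_back_quorum):
    """Return the value with the highest timestamp and write it back."""
    value, timestamp = max((s.query() for s in read_quorum),
                           key=lambda pair: pair[1])
    for server in write_back_quorum:
        server.update(value, timestamp)
    return value

servers = [Server() for _ in range(5)]       # majority quorums have size 3
write("c1", "a", servers[0:3])
first = read(servers[2:5], servers[0:3])     # overlaps the write quorum in server 2
write("c2", "b", servers[1:4])
second = read(servers[0:3], servers[2:5])
print(first, second)
```

Because any two quorums intersect, every read quorum contains at least one server that saw the latest completed write, so the highest-timestamp pair is the most recent value.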


Key Results 


Perhaps the two most obvious quorum systems 
are the singleton, and the set of majorities, or 
more generally, weighted majorities suggested by 
Gifford [3]. 

Singleton: The set system $\mathcal{Q} = \{\{u\}\}$ for
some $u \in U$ is the singleton quorum system.

Weighted Majorities: Assume that every
server s in the universe U is assigned
a number of votes $w_s$. Then, the set system
$\mathcal{Q} = \{Q \subseteq U : \sum_{q \in Q} w_q > (\sum_{q \in U} w_q)/2\}$ is
a quorum system called Weighted Majorities.
When all the weights are the same, simply call
this the system of Majorities.


Quorums, Fig. 1 The Grid quorum system of 6 × 6,
with one quorum shaded


An example of a quorum system that cannot 
be defined by voting is the following Grid con- 
struction: 

Grid: Suppose that the universe of servers is
of size $n = k^2$ for some integer k. Arrange the
universe into a $\sqrt{n} \times \sqrt{n}$ grid, as shown in Fig. 1.
A quorum is the union of a full row and one
element from each row below the full row. This
yields the Grid quorum system, whose quorums
are of size $O(\sqrt{n})$.
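
The Grid construction is small enough to enumerate for tiny k. The following sketch (function name and server labeling are illustrative assumptions) builds every Grid quorum on a k x k universe so that the intersection property and the quorum sizes can be inspected directly.

```python
import itertools

def grid_quorums(k):
    """All Grid quorums on a k x k universe; servers are (row, column) pairs."""
    quorums = []
    for full_row in range(k):
        row = {(full_row, col) for col in range(k)}
        rows_below = range(full_row + 1, k)
        # Choose one column in each row below the full row.
        for cols in itertools.product(range(k), repeat=len(rows_below)):
            picks = {(r, c) for r, c in zip(rows_below, cols)}
            quorums.append(frozenset(row | picks))
    return quorums

k = 3
quorums = grid_quorums(k)
all_intersect = all(a & b for a, b in itertools.combinations(quorums, 2))
largest = max(len(q) for q in quorums)
print(len(quorums), all_intersect, largest)
```

Two quorums always meet because the one whose full row is lower contains that entire row, while the other picks an element from it (or shares the same full row). The largest quorum has 2k - 1 elements, consistent with the $O(\sqrt{n})$ size.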

Maekawa suggests in [6] a quorum system that 
has several desirable symmetry properties, and in 
particular, that every pair of quorums intersect in 
exactly one element: 

FPP: Suppose that the universe of servers
is of size $n = q^2 + q + 1$, where $q = p^r$ for
a prime p. It is known that a finite projective plane
exists for n, with $q^2 + q + 1$ pairwise intersecting
subsets, each subset of size q + 1, and where
each element is contained in q + 1 subsets. Then
the set of finite projective plane subsets forms
a quorum system.


Voting and Related Notions 
Since generally it would be senseless to access 
a large quorum if a subset of it is a quorum, 
a good definition may avoid such anomalies. 
Garcia-Molina and Barbara [2] call such well- 
formed systems coteries, defined as follows: 
Coterie: A coterie $\mathcal{Q} \subseteq 2^U$ is a quorum system
such that for any distinct $Q, Q' \in \mathcal{Q}$: $Q \not\subseteq Q'$.
Of special interest are quorum systems that
cannot be reduced in size (i.e., that no quorum in
the system can be reduced in size). Garcia-Molina
and Barbara [2] use the term “dominates” to mean
that one quorum system is always superior to 
another, as follows: 

Domination: Suppose that $\mathcal{Q}, \mathcal{Q}'$ are two coteries,
$\mathcal{Q} \ne \mathcal{Q}'$, such that for every $Q' \in \mathcal{Q}'$, there
exists a $Q \in \mathcal{Q}$ such that $Q \subseteq Q'$. Then $\mathcal{Q}$
dominates $\mathcal{Q}'$. $\mathcal{Q}'$ is dominated if there exists a coterie
$\mathcal{Q}$ that dominates it, and is non-dominated if no
such coterie exists.
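
These definitions translate directly into set operations. The sketch below is illustrative only; the function names and the example systems are assumptions. It checks the intersection property, the coterie condition, and domination for systems given as collections of frozensets.

```python
import itertools

def is_quorum_system(system):
    """Every pair of quorums intersects."""
    return all(a & b for a, b in itertools.combinations(system, 2))

def is_coterie(system):
    """A quorum system in which no quorum is contained in another."""
    return is_quorum_system(system) and not any(
        a < b or b < a for a, b in itertools.combinations(system, 2))

def dominates(first, second):
    """first dominates second: they differ, and each quorum of second
    contains some quorum of first."""
    return set(first) != set(second) and all(
        any(q <= q_prime for q in first) for q_prime in second)

pairs_through_b = [frozenset("ab"), frozenset("bc")]
singleton_b = [frozenset("b")]
majorities = [frozenset("ab"), frozenset("bc"), frozenset("ac")]

print(is_coterie(pairs_through_b), dominates(singleton_b, pairs_through_b))
print(is_coterie(majorities), dominates(singleton_b, majorities))
```

The system {{a,b},{b,c}} is a coterie but is dominated by the singleton {{b}}, whereas the Majorities system on three servers is not dominated by it.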

Voting was mentioned above as an intuitive 
way of thinking about quorum techniques. As 
it turns out, vote assignments and quorums are 
not equivalent. Garcia-Molina and Barbara [2] 
show that quorum systems are strictly more 
general than voting, i.e., each vote assignment 
has some corresponding quorum system but 
not the other way around. In fact, for a system
with n servers, there is a double-exponential
($2^{2^{cn}}$) number of non-dominated coteries, and
only $O(2^{n^2})$ different vote assignments, though
for $n \le 5$, voting and non-dominated coteries are
identical.


Measures 

Several measures of quality have been identified 
to address the question of which quorum system 
works best for a given set of servers; among these, 
load and availability are elaborated on here. 


Load 

A measure of the inherent performance of a quorum
system is its load. Naor and Wool define in
[10] the load of a quorum system as the probability
of accessing the busiest server in the best case.
More precisely, given a quorum system $\mathcal{Q}$, an
access strategy w is a probability distribution on
the elements of $\mathcal{Q}$; i.e., $\sum_{Q \in \mathcal{Q}} w(Q) = 1$. w(Q)
is the probability that quorum Q will be chosen
when the service is accessed. Load is then defined
as follows:

Load: Let a strategy w be given for a quorum
system $\mathcal{Q} = \{Q_1, \ldots, Q_m\}$ over a universe U.
For an element $u \in U$, the load induced by w on u
is $l_w(u) = \sum_{Q_i \ni u} w(Q_i)$. The load induced by
a strategy w on a quorum system $\mathcal{Q}$ is


$$L_w(\mathcal{Q}) = \max_{u \in U} \{l_w(u)\}.$$

The system load (or just load) on a quorum
system $\mathcal{Q}$ is


$$L(\mathcal{Q}) = \min_{w} \{L_w(\mathcal{Q})\},$$


where the minimum is taken over all strategies.

The load is a best-case definition, and will 
be achieved only if an optimal access strategy 
is used, and only in the case that no failures 
occur. A strength of this definition is that load 
is a property of a quorum system, and not of the 
protocol using it. 

The following theorem was proved in [10] for 
all quorum systems. 


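Theorem 1 Let $\mathcal{Q}$ be a quorum system over
a universe of n elements. Denote by $c(\mathcal{Q})$ the
size of the smallest quorum of $\mathcal{Q}$. Then
$L(\mathcal{Q}) \ge \max\{\frac{1}{c(\mathcal{Q})}, \frac{c(\mathcal{Q})}{n}\}$. Consequently,
$L(\mathcal{Q}) \ge \frac{1}{\sqrt{n}}$.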
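
The load induced by a given strategy is straightforward to compute. The sketch below evaluates it for the Majorities system on three servers under the uniform strategy and compares it with the bound $\max\{1/c, c/n\}$, where c is the smallest quorum size. The names are illustrative assumptions, and the code computes only the load of one strategy; finding the system load in general requires minimizing over all strategies, which is not attempted here.

```python
from fractions import Fraction
import itertools

def induced_load(strategy):
    """Load induced by a strategy: the access probability of the busiest server.

    `strategy` maps each quorum (a frozenset) to its probability w(Q).
    """
    assert sum(strategy.values()) == 1
    universe = set().union(*strategy)
    return max(sum(p for quorum, p in strategy.items() if u in quorum)
               for u in universe)

n = 3
majorities = [frozenset(c) for c in itertools.combinations(range(n), 2)]
uniform = {quorum: Fraction(1, len(majorities)) for quorum in majorities}

load = induced_load(uniform)
smallest = min(len(q) for q in majorities)
lower_bound = max(Fraction(1, smallest), Fraction(smallest, n))
print(load, lower_bound)
```

Here the uniform strategy already meets the lower bound, so it is optimal for this small system and the system load equals 2/3.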


Availability 

The resilience f of a quorum system provides one 
measure of how many crash failures a quorum 
system is guaranteed to survive. 

Resilience: The resilience f of a quorum system
$\mathcal{Q}$ is the largest k such that for every set
$K \subseteq U$, |K| = k, there exists $Q \in \mathcal{Q}$ such that
$K \cap Q = \emptyset$.

Note that the resilience f is at most $c(\mathcal{Q}) - 1$,
since by disabling the members of the smallest
quorum every quorum is hit. It is possible,
however, that an f-resilient quorum system,
though vulnerable to a few failure configurations
of f + 1 failures, can survive many configurations
of more than f failures. One way to measure
this property of a quorum system is to assume
that each server crashes independently with
probability p and then to determine the probability $F_p$
that no quorum remains completely alive. This
is known as failure probability and is formally
defined as follows:

Failure probability: Assume that each server
in the system crashes independently with
probability p. For every quorum $Q \in \mathcal{Q}$ let $E_Q$ be
the event that Q is hit, i.e., at least one
element $i \in Q$ has crashed. Let $\mathrm{crash}(\mathcal{Q})$ be the
event that all the quorums $Q \in \mathcal{Q}$ were hit, i.e.,
$\mathrm{crash}(\mathcal{Q}) = \bigwedge_{Q \in \mathcal{Q}} E_Q$. Then the system failure
probability is $F_p(\mathcal{Q}) = \Pr(\mathrm{crash}(\mathcal{Q}))$.

Peleg and Wool study the availability of
quorum systems in [11]. A good failure
probability $F_p(\mathcal{Q})$ for a quorum system $\mathcal{Q}$
has $\lim_{n \to \infty} F_p(\mathcal{Q}) = 0$ when $p < \frac{1}{2}$. Note
that the failure probability of any quorum
system whose resilience is f is at least $e^{-\Omega(f)}$.
Majorities has the best availability when $p < \frac{1}{2}$;
for $p = \frac{1}{2}$, there exist quorum constructions with
$F_p(\mathcal{Q}) = \frac{1}{2}$; for $p > \frac{1}{2}$, the singleton has the
best failure probability $F_p(\mathcal{Q}) = p$, but for most
quorum systems, $F_p(\mathcal{Q})$ tends to 1.
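
For small universes the failure probability can be computed exactly by enumerating crash configurations. The following sketch does this by brute force; it is exponential in the number of servers and meant only as an illustration, with hypothetical names and example values.

```python
import itertools

def failure_probability(universe, quorums, p):
    """Exact F_p: probability that every quorum contains a crashed server."""
    universe = list(universe)
    total = 0.0
    for r in range(len(universe) + 1):
        for crashed in itertools.combinations(universe, r):
            crashed = set(crashed)
            if all(quorum & crashed for quorum in quorums):  # all quorums hit
                total += p ** r * (1 - p) ** (len(universe) - r)
    return total

servers = range(3)
majorities = [frozenset(c) for c in itertools.combinations(servers, 2)]
singleton = [frozenset({0})]

p = 0.1
print(failure_probability(servers, majorities, p))  # 3 p^2 (1 - p) + p^3
print(failure_probability(servers, singleton, p))   # p
```

With p = 0.1, Majorities on three servers fails only when at least two servers crash, while the singleton fails whenever its single server does, in line with the comparison above for p < 1/2.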


The Load and Availability of Quorum 
Systems 
Quorum constructions can be compared by
analyzing their behavior according to the above
measures. The singleton has a load of 1, resilience
0, and failure probability $F_p = p$. This system
has the best failure probability when $p > \frac{1}{2}$, but
otherwise performs poorly in both availability
and load.

The system of Majorities has a load of
$\lceil \frac{n+1}{2} \rceil / n \approx \frac{1}{2}$. It is resilient to
$\lfloor \frac{n-1}{2} \rfloor$ failures, and
its failure probability is $e^{-\Omega(n)}$. This system has
the highest possible resilience and asymptotically
optimal failure probability, but poor load.

Grid’s load is $O(1/\sqrt{n})$, which is within a
constant factor of optimal. However, its resilience
is only $\sqrt{n} - 1$ and it has poor failure probability
which tends to 1 as n grows.

The resilience of an FPP quorum system is
$q \approx \sqrt{n}$. The load of FPP was analyzed in [10]
and shown to be $L(\mathrm{FPP}) = \frac{q+1}{n} \approx 1/\sqrt{n}$, which
is optimal. However, its failure probability tends
to 1 as n grows.

As demonstrated by these systems, there is
a tradeoff between load and fault tolerance in
quorum systems, where the resilience $f$ of a quorum
system $\mathcal{Q}$ satisfies $f \le n L(\mathcal{Q})$. Thus, improving
one must come at the expense of the
other, and it is in fact impossible to simultaneously
achieve both optimally. One might conclude
that good load conflicts with low failure
probability, but this is not necessarily the case.
In fact, there exist quorum systems such as the
Paths system of Naor and Wool [10] and the
Triangle Lattice of Bazzi [1] that achieve asymptotically
optimal load of $O(1/\sqrt{n})$ and have close
to optimal failure probability for their quorum
sizes. Another construction is the CWlog system
of Peleg and Wool [12], which has unusually
small quorum sizes of $\log n - \log\log n$, and for
systems with quorums of this size, has optimal
load, $L(\mathrm{CWlog}) = O(1/\log n)$, and optimal
failure probability.
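
To make the comparison of the singleton and Majorities concrete, the following Python sketch evaluates their failure probabilities exactly under independent crashes. The function names, and the choice to model Majorities as all subsets of size floor(n/2) + 1, are assumptions made for this illustration and are not taken from the cited analyses.

```python
from math import comb


def majority_failure_probability(n: int, p: float) -> float:
    """Exact F_p for Majorities over n servers.

    Every quorum is hit exactly when fewer than floor(n/2) + 1 servers
    survive, where each server crashes independently with probability p.
    """
    quorum_size = n // 2 + 1
    # Sum over every number of survivors that is too small to form a quorum.
    return sum(
        comb(n, alive) * (1 - p) ** alive * p ** (n - alive)
        for alive in range(quorum_size)
    )


def singleton_failure_probability(p: float) -> float:
    """The singleton fails exactly when its one server crashes."""
    return p


def majority_resilience(n: int) -> int:
    """Largest number of crashes that always leaves a majority alive."""
    return n - (n // 2 + 1)
```

For p below 1/2 the Majorities value shrinks as n grows, while the singleton stays at p regardless of n; for p = 1/2 and odd n the Majorities value is exactly 1/2 by symmetry.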


Byzantine Quorum Systems 

For the most part, quorum systems were studied 
in environments where failures may simply cause 
servers to become unavailable (benign failures). 
But what if a server may exhibit arbitrary, pos- 
sibly malicious behavior? Malkhi and Reiter [7] 
carried out a study of quorum systems in environ- 
ments prone to arbitrary (Byzantine) behavior of 
servers. Intuitively, a quorum system tolerant of 
Byzantine failures is a collection of subsets of 
servers, each pair of which intersect in a set 
containing sufficiently many correct servers to 
mask out the behavior of faulty servers. More 
precisely, Byzantine quorum systems are defined 
as follows: 


Masking quorum system 
A quorum system $\mathcal{Q}$ is a $b$-masking quorum
system if it has resilience $f \ge b$, and each pair
of quorums intersect in at least $2b + 1$ elements.
The masking quorum system requirements 
enable a client to obtain the correct answer 
from the service despite up to b Byzantine 
server failures. More precisely, a write operation 
remains as before; to obtain the correct value 
of x from a read operation, the client reads 
a set of value/timestamp pairs from a quorum Q 
and sorts them into clusters of identical pairs. 
It then chooses a value/timestamp pair that is 
returned from at least b + 1 servers, and therefore 
must contain at least one correct server. The 
properties of masking quorum systems guarantee 
that at least one such cluster exists. If more 
than one such cluster exists, the client chooses 
the one with the highest timestamp. It is easy 
to see that any value so obtained was written 
before, and moreover, that the most recently
written value is obtained. Thus, the semantics
of a multi-writer multi-reader safe variable are 
obtained (see » Linearizability) in a Byzantine 
environment. 
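
The read rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `masking_read`, the representation of a reply as a `(value, timestamp)` tuple, and the `None` result when no cluster is large enough are all hypothetical choices, not part of the protocol specification in [7].

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

Pair = Tuple[Hashable, int]  # (value, timestamp), an assumed encoding


def masking_read(replies: Iterable[Pair], b: int) -> Optional[Pair]:
    """Choose the result of a read from the replies of one quorum.

    replies: one (value, timestamp) pair per server in the quorum.
    b: assumed upper bound on the number of Byzantine servers.
    """
    clusters = Counter(replies)  # group identical pairs and count them
    # A pair vouched for by b + 1 servers was reported by a correct server.
    vouched = [pair for pair, count in clusters.items() if count >= b + 1]
    if not vouched:
        return None  # should not happen in a b-masking quorum system
    # Among the vouched pairs, the highest timestamp wins.
    return max(vouched, key=lambda pair: pair[1])
```

With b = 1, a single forged reply carrying a huge timestamp is ignored, because it never gathers the b + 1 matching replies needed to form a qualifying cluster.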

For a b-masking quorum system, the following 
lower bound on the load holds: 


Theorem 2 Let $\mathcal{Q}$ be a $b$-masking quorum system.
Then $L(\mathcal{Q}) \ge \max\left\{\frac{2b+1}{c(\mathcal{Q})}, \frac{c(\mathcal{Q})}{n}\right\}$, and consequently
$L(\mathcal{Q}) \ge \sqrt{\frac{2b+1}{n}}$.


This bound is tight, and masking quorum constructions
meeting it were shown.

Malkhi and Reiter explore in [7] two
variations of masking quorum systems. The
first, called dissemination quorum systems, is
suited for services that receive and distribute self-verifying
information from correct clients (e.g.,
digitally signed values) that faulty servers can
fail to redistribute but cannot undetectably alter.
The second variation, called opaque masking
quorum systems, is similar to regular masking
quorums in that it makes no assumption of self-verifying
data, but it differs in that clients do not
need to know the failure scenarios for which the
service was designed. This somewhat simplifies
the client protocol and, in the case that the failures
are maliciously induced, reveals less information
to clients that could guide an attack attempting
to compromise the system. It is also shown in [7]
how to deal with faulty clients in addition to
faulty servers.


Probabilistic Quorum Systems 

The resilience of any quorum system is bounded
by half of the number of servers. Moreover, as
mentioned above, there is an inherent tradeoff
between low load and good resilience, so that it is
in fact impossible to simultaneously achieve both
optimally. In particular, quorum systems over $n$
servers that achieve the optimal load of $\frac{1}{\sqrt{n}}$ can
tolerate at most $\sqrt{n}$ faults.

To break these limitations, Malkhi et al. pro- 
pose in [8] to relax the intersection property of 
a quorum system so that “quorums” chosen ac- 
cording to a specified strategy intersect only with 
very high probability. They accordingly name
these probabilistic quorum systems. These systems
admit the possibility, albeit small, that two
operations will be performed at non-intersecting 
quorums, in which case consistency of the system 
may suffer. However, even a small relaxation 
of consistency can yield dramatic improvements 
in the resilience and failure probability of the 
system, while the load remains essentially un- 
changed. Probabilistic quorum systems are thus 
most suitable for use when availability of op- 
erations despite the presence of faults is more 
important than certain consistency. This might 
be the case if the cost of inconsistent operations 
is high but not irrecoverable, or if obtaining the 
most up-to-date information is desirable but not 
critical, while having no information may have 
heavier penalties. 

The family of constructions suggested in [8] is
as follows:

$W(n, \ell)$: Let $U$ be a universe of size $n$. $W(n, \ell)$,
$\ell \ge 1$, is the system $\langle \mathcal{Q}, w \rangle$ where $\mathcal{Q}$ is the
set system $\mathcal{Q} = \{Q \subseteq U : |Q| = \ell\sqrt{n}\}$; $w$
is an access strategy defined by $\forall Q \in \mathcal{Q}$,
$w(Q) = \frac{1}{|\mathcal{Q}|}$.

The probability of choosing according to $w$
two quorums that do not intersect is less than
$e^{-\ell^2}$, and can be made sufficiently small by
appropriate choice of $\ell$. Since every element
is in $\binom{n-1}{\ell\sqrt{n}-1}$ quorums, the load $L(W(n, \ell))$ is
$\frac{\ell}{\sqrt{n}} = O(\frac{1}{\sqrt{n}})$. Because only $\ell\sqrt{n}$ servers need
be available in order for some quorum to be available,
$W(n, \ell)$ is resilient to $n - \ell\sqrt{n}$ crashes.
The failure probability of $W(n, \ell)$ is less than
$e^{-\Omega(n)}$ for all $p \le 1 - \frac{\ell}{\sqrt{n}}$, which is asymptotically
optimal. Moreover, if $\frac{1}{2} \le p \le 1 - \frac{\ell}{\sqrt{n}}$,
this probability is provably better than any (non-probabilistic)
quorum system.
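
As a small numerical check of the construction, the Python sketch below computes the exact probability that two quorums of size $\ell\sqrt{n}$, drawn uniformly and independently, are disjoint, so that it can be compared with $e^{-\ell^2}$. The function names and the restriction to perfect-square $n$ (so that the quorum size is an integer) are assumptions of this example.

```python
from math import comb, exp, isqrt


def non_intersection_probability(n: int, ell: int) -> float:
    """Probability that two uniformly random quorums of W(n, ell) are disjoint.

    Assumes n is a perfect square so that ell * sqrt(n) is an integer.
    """
    root = isqrt(n)
    if root * root != n:
        raise ValueError("n must be a perfect square in this sketch")
    q = ell * root
    if 2 * q > n:
        return 0.0  # two quorums this large always intersect
    # Fix the first quorum; the second must avoid all q of its elements.
    return comb(n - q, q) / comb(n, q)


def load(n: int, ell: int) -> float:
    """Each server is accessed with probability |Q| / n under the uniform strategy."""
    return ell * isqrt(n) / n
```

For example, `non_intersection_probability(100, 2)` stays below `exp(-4)`, while `load(100, 2)` is 0.2.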

Relaxing consistency can also provide dra- 
matic improvements in environments that may 


experience Byzantine failures. More details can 
be found in [8]. 


Applications 


Just about any fault-tolerant distributed protocol,
such as Paxos [5] or consensus [1], implicitly
builds on quorums, typically majorities. More
concretely, quorum-based scalable data repositories
were built, such as Fleet [9], Rambo [4], and
Rosebud [13].

Problem Definition


Consider a graph $G(V, E)$. For any two vertices
$u, v \in V$, $d(u, v)$ denotes the distance between $u$ and $v$ in
$G$. The general problem concerns a coloring of
the graph $G$ and is defined as follows:


Definition 1 (k-coloring problem) 

INPUT: A graph G(V, E). 

OUTPUT: A function $\varphi: V \to \{1, \ldots, \infty\}$,
called a $k$-coloring of $G$, such that $\forall u, v \in V$,
$x \in \{0, 1, \ldots, k\}$: if $d(u, v) \le k - x + 1$ then
$|\varphi(u) - \varphi(v)| \ge x$.


© Springer Science+Business Media New York 2016 
M.-Y. Kao (ed.), Encyclopedia of Algorithms, 
DOI 10.1007/978-1-4939-2864-4 


OBJECTIVE: Let $|\varphi(V)| = \nu_\varphi$. Then $\nu_\varphi$ is the
number of colors that $\varphi$ actually uses (it is
usually called the order of $G$ under $\varphi$). The number
$\lambda_\varphi = \max_{v \in V} \varphi(v) - \min_{u \in V} \varphi(u) + 1$ is usually
called the span of $G$ under $\varphi$. The function $\varphi$
satisfies one of the following objectives:


• minimum span: $\lambda_\varphi$ is the minimum possible
over all possible functions $\varphi$ of $G$;

• minimum order: $\nu_\varphi$ is the minimum possible
over all possible functions $\varphi$ of $G$;

• Min span order: $\varphi$ obtains a minimum span
and moreover, from all minimum span assignments,
$\varphi$ obtains a minimum order.

• Min order span: $\varphi$ obtains a minimum order and
moreover, from all minimum order assignments,
$\varphi$ obtains a minimum span.


Note that the case $k = 1$ corresponds to the well-known
problem of vertex graph coloring. Thus, the
$k$-coloring problem (with $k$ as an input) is NP-complete [4].
The case of the $k$-coloring problem
where $k = 2$ is called the Radiocoloring problem.


Definition 2 (Radiocoloring Problem (RCP) [7]) 
INPUT: A graph $G(V, E)$.

OUTPUT: A function $\Phi: V \to \mathbb{N}^*$ such that
$|\Phi(u) - \Phi(v)| \ge 2$ if $d(u, v) = 1$ and
$|\Phi(u) - \Phi(v)| \ge 1$ if $d(u, v) = 2$.

OBJECTIVE: The least possible number of colors
(order) needed to radiocolor $G$ is denoted
by $X_{order}(G)$. The least possible number
$\max_{v \in V} \Phi(v) - \min_{u \in V} \Phi(u) + 1$ (span) needed
for the radiocoloring of $G$ is denoted as
$X_{span}(G)$. Function $\Phi$ satisfies one of the
following:


• Min span RCP: $\Phi$ obtains a minimum span,
i.e., $\lambda_\Phi = X_{span}(G)$;

• Min order RCP: $\Phi$ obtains a minimum order,
$\nu_\Phi = X_{order}(G)$;

• Min span order RCP: $\Phi$ obtains a minimum
span and moreover, from all minimum span
assignments, $\Phi$ obtains a minimum order.

• Min order span RCP: $\Phi$ obtains a minimum
order and moreover, from all minimum order
assignments, $\Phi$ obtains a minimum span.


A problem related to the RCP concerns the
square of a graph $G$, which is defined as follows:


Definition 3 Given a graph $G(V, E)$, $G^2$ is the
graph having the same vertex set $V$ and an edge
set $E'$: $\{u, v\} \in E'$ iff $d(u, v) \le 2$ in $G$.


The related problem is to color the square of
a graph $G$, $G^2$, so that no two neighbor vertices
(in $G^2$) get the same color. The objective is
to use a minimum number of colors, denoted
as $\chi(G^2)$ and called the chromatic number of the
square of the graph $G$. Fotakis et al. [5, 6] first
observed that for any graph $G$, $X_{order}(G)$ is the
same as the (vertex) chromatic number of $G^2$, i.e.,
$X_{order}(G) = \chi(G^2)$.
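
The definitions above can be checked mechanically on small graphs. The following Python sketch builds $G^2$ from adjacency sets and tests whether a given labeling is a radiocoloring, reporting its order and span. The graph encoding (a dictionary of adjacency sets) and all function names are assumptions made for this illustration.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected graph


def square(g: Graph) -> Graph:
    """Return G^2: u and v are adjacent iff 1 <= d(u, v) <= 2 in G."""
    g2 = {u: set(nbrs) for u, nbrs in g.items()}
    for u, nbrs in g.items():
        for w in nbrs:
            g2[u] |= g[w] - {u}  # add neighbours of neighbours
    return g2


def is_radiocoloring(g: Graph, phi: Dict[Hashable, int]) -> bool:
    """Check the two distance constraints of the RCP definition."""
    g2 = square(g)
    for u, v in combinations(g, 2):
        gap = abs(phi[u] - phi[v])
        if v in g[u] and gap < 2:   # d(u, v) = 1 requires a gap of at least 2
            return False
        if v in g2[u] and gap < 1:  # d(u, v) <= 2 requires distinct colors
            return False
    return True


def order(phi: Dict[Hashable, int]) -> int:
    """Number of distinct colors used."""
    return len(set(phi.values()))


def span(phi: Dict[Hashable, int]) -> int:
    """Largest color minus smallest color, plus one."""
    return max(phi.values()) - min(phi.values()) + 1
```

On the path a-b-c, the labeling a=2, b=4, c=1 is a valid radiocoloring with order 3 and span 4, whereas a=1, b=3, c=1 is rejected because a and c are at distance 2 and share a color.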


Key Results 


Fotakis et al. [5, 6] studied min span order, min
order and min span RCP in a planar graph $G$.
A planar graph is a graph whose edges can
be embedded in the plane without crossings. The
following results are obtained:


• It is first shown that the number of colors
used in the min span order RCP of graph $G$
is different from the chromatic number of the
square of the graph, $\chi(G^2)$. In particular, it
may be greater than $\chi(G^2)$.

• It is then proved that the radiocoloring
problem for general graphs is hard to
approximate (unless NP = ZPP, the class
of problems with polynomial time zero-error
randomized algorithms) within a factor of
$n^{1/2 - \epsilon}$ (for any $\epsilon > 0$), where $n$ is the number
of vertices of the graph. However, when
restricted to some special cases of graphs,
the problem becomes easier.

• It is shown that the min span RCP and min
span order RCP are NP-complete for planar
graphs. Note that few combinatorial problems
remain hard for planar graphs and their proofs
of hardness are not easy since they have to use
planar gadgets which are difficult to find and
understand.

• It presents an $O(n\Delta(G))$ time algorithm that
approximates the min order of RCP, $X_{order}$, of
a planar graph $G$ by a constant ratio which
tends to 2 as the maximum degree $\Delta(G)$ of $G$
increases.

The algorithm presented is motivated by
a constructive coloring theorem of Heuvel
and McGuinness [9]. The construction of [9]
can lead (as shown) to an $O(n^2)$ technique
assuming that a planar embedding of $G$ is
given. Fotakis et al. [5, 6] improve the time
complexity of the approximation and present
an algorithm that is much simpler to verify and
implement. The algorithm does not need any
planar embedding as input.

• Finally, the work considers the problem
of estimating the number of different
radiocolorings of a planar graph $G$.
This is a #P-complete problem (as can
be easily seen from the completeness
reduction presented there, which can be made
parsimonious). The authors employ here
standard techniques of rapidly mixing Markov
Chains and the new method of coupling for
purposes of proving rapid convergence (see
e.g., [10]) and present a fully polynomial
randomized approximation scheme for
estimating the number of radiocolorings
with $\lambda$ colors for a planar graph $G$, when
$\lambda \ge 4\Delta(G) + 50$.

In [8] and [7] it has been proved that the
problem of min span RCP is NP-complete, even
for graphs of diameter 2. The reductions use
highly non-planar graphs. In [11] it is proved that
the problem of coloring the square of a general
graph is NP-complete.

Another variation of RCP for planar graphs,
called distance-2-coloring, is studied in [12]. This
is the problem of coloring a given graph $G$ with
the minimum number of colors so that vertices
at distance at most two get different colors.
Note that this problem is equivalent to coloring
the square of the graph $G$, $G^2$. In [12] it is
proved that the distance-2-coloring problem for
planar graphs is NP-complete. As shown
in [5, 6], this problem is different from the min
span order RCP. Thus, the NP-completeness
proof in [12] does not imply the NP-completeness
of min span order RCP proved
in [5, 6]. In [12] a 9-approximation algorithm for
the distance-2-coloring of planar graphs is also
provided.

Independently and in parallel, Agnarsson and
Halldórsson in [1] presented approximations
for the chromatic number of square and power
graphs ($G^k$). In particular, they presented a 1.8-approximation
algorithm for coloring the square
of a planar graph of large degree ($\Delta(G) \ge 749$).
Their method utilizes the notion of inductiveness
of the square of a planar graph.

Bodlaender et al. in [2] also proved,
independently and in parallel, that the min
span RCP, called $\lambda$-labeling there, is NP-complete
for planar graphs, using an approach similar to
the one used in [5, 6]. In the same work
the authors presented approximations for the
problem for some interesting families of graphs:
outerplanar graphs, graphs of bounded treewidth,
permutation and split graphs.


Applications 


The Frequency Assignment Problem (FAP) in
radio networks is a well-studied, interesting
problem, aiming at assigning frequencies to
transmitters exploiting frequency reuse while
keeping signal interference to acceptable levels.
The interference between transmitters is
modeled by an interference graph $G(V, E)$,
where $V$ ($|V| = n$) corresponds to the set of
transmitters and $E$ represents distance constraints
(e.g., if two neighbor nodes in $G$ get the same or
close frequencies then this causes unacceptable
levels of interference). In most real-life cases
the network topology formed has some special
properties, e.g., $G$ is a lattice network or a planar
graph. Planar graphs are mainly the object of
study in [5, 6].

The FAP is usually modeled by variations of 
the graph coloring problem. The set of colors 
represents the available frequencies. In addition, 
each color in a particular assignment gets an inte- 
ger value which has to satisfy certain inequalities 
compared to the values of colors of nearby nodes 
in G (frequency-distance constraints). A discrete 
version of FAP is the k-coloring problem, of 
which a particular instance, for k = 2, is inves- 
tigated in [5, 6]. 

Real networks reserve bandwidth (range of
frequencies) rather than distinct frequencies. In
this case, an assignment seeks to use as small
a range of frequencies as possible. It is sometimes
desirable to use as few distinct frequencies of
a given bandwidth (span) as possible, since the
unused frequencies are available for other use.
However, there are cases where the primary objective
is to minimize the number of frequencies
used and the span is a secondary objective, since
we wish to avoid reserving an unnecessarily large
span. These realistic scenarios directed researchers
to consider optimization versions of the RCP,
where one aims at minimizing the span (bandwidth)
or the order (distinct frequencies used) of
the assignment. Such optimization problems are
investigated in [5, 6].

Problem Definition


This classic problem in complexity theory is 
concerned with efficiently finding a satisfying 
assignment to a propositional formula. The input 
is a formula with n Boolean variables which is 
expressed as an AND of ORs with 3 variables 
in each OR clause (a 3-CNF formula). The 
goal is to (1) find an assignment of variables 
to TRUE and FALSE so that the formula has 
value TRUE or (2) prove that no such assignment 
exists. Historically, recognizing satisfiable 3- 
CNF formulas was the first “natural” example 
of an NP-complete problem, and, because it 
is NP-complete, no polynomial-time algorithm 
can succeed on all 3-CNF formulas unless P 
= NP [4, 10]. Because of the numerous practical 
applications of 3-SAT, and also due to its position 
as the canonical NP-complete problem, many 
heuristic algorithms have been developed for 
solving 3-SAT, and some of these algorithms have
been analyzed rigorously on random instances. 


Notation 

A 3-CNF formula over variables $x_1, x_2, \ldots, x_n$ is
the conjunction of $m$ clauses $C_1 \wedge C_2 \wedge \cdots \wedge C_m$,
where each clause is the disjunction of 3 literals,
$C_i = \ell_{i1} \vee \ell_{i2} \vee \ell_{i3}$, and each literal $\ell_{ij}$ is
either a variable or the negation of a variable
(the negation of the variable $x$ is denoted by $\bar{x}$).
A 3-CNF formula is satisfiable if and only if
there is an assignment of variables to truth values
so that every clause contains at least one true
literal. Here, all asymptotic analysis is in terms
of $n$, the number of variables in the 3-CNF
formula, and a sequence of events $\{E_n\}$ is said
to hold with high probability (abbreviated whp)
if $\lim_{n \to \infty} \Pr[E_n] = 1$.


Distributions 
There are many distributions over 3-CNF formulas
which are interesting to consider, and this
chapter focuses on dense satisfiable instances.
Dense satisfiable instances can be formed by conditioning
on the event $\{I_{n,m}$ is satisfiable$\}$, but
this conditional distribution is difficult to sample
from and to analyze. This has led to research
in “planted” random instances of 3-SAT, which
are formed by first choosing a truth assignment
$\varphi$ uniformly at random and then selecting each
clause independently from the triples of literals
where at least one literal is set to TRUE by
the assignment $\varphi$. The clauses can be included
with equal probabilities in analogy to the $I_{n,p}$
or $I_{n,m}$ distributions above [8, 9], or different
probabilities can be assigned to the clauses with
one, two, or three literals set to TRUE by $\varphi$, in
an effort to better hide the satisfying assignment
[2,2].
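
A planted instance with equal clause probabilities can be sampled along the following lines. This Python sketch is only an illustration: the function name `planted_3sat`, the DIMACS-style signed-integer encoding of literals, the fixed clause count `m` with clauses drawn with replacement, and the rejection-sampling approach are all assumptions of the example rather than details taken from [8, 9].

```python
import random
from typing import List, Tuple

Clause = Tuple[int, ...]  # DIMACS-style literals: +i is x_i, -i is its negation


def planted_3sat(n: int, m: int, seed: int = 0) -> Tuple[List[bool], List[Clause]]:
    """Sample a hidden assignment and m clauses that it satisfies (n >= 3)."""
    rng = random.Random(seed)
    # phi[i] is the planted truth value of x_i; index 0 is unused.
    phi = [False] + [rng.random() < 0.5 for _ in range(n)]
    clauses: List[Clause] = []
    while len(clauses) < m:
        variables = rng.sample(range(1, n + 1), 3)  # three distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        # Rejection step: keep only clauses with a literal that is true under phi.
        if any(phi[abs(lit)] == (lit > 0) for lit in clause):
            clauses.append(clause)
    return phi, clauses
```

Because exactly one of the eight sign patterns over any three variables is falsified by the planted assignment, the rejection step discards about one candidate clause in eight.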


Problem 1 (3-SAT) 


INPUT: 3-CNF Boolean formula $F = C_1 \wedge C_2 \wedge
\cdots \wedge C_m$, where each clause $C_i$ is of the form
$C_i = \ell_{i1} \vee \ell_{i2} \vee \ell_{i3}$ and each literal $\ell_{ij}$ is
either a variable or the negation of a variable.

OUTPUT: A truth assignment of variables to 
Boolean values which makes at least one 
literal in each clause TRUE or a certificate 
that no such assignment exists. 


Key Results 


A line of basic research dedicated to identifying 
hard search and decision problems, as well as the 
potential cryptographic applications of planted 
instances of 3-SAT, has motivated the develop- 
ment of algorithms for 3-SAT which are known 
to work on planted random instances. 

Majority Vote Heuristic: If every clause 
consistent with the planted assignment is 
included with the same probability, then there 
is a bias towards including the literal satisfied 
by the planted assignment more frequently than 
its negation. This is the motivation behind the 
majority vote heuristic, which assigns each 
variable to the truth value which will satisfy 
the majority of the clauses in which it appears. 
Despite its simplicity, this heuristic has been
proven successful whp for sufficiently dense
planted instances [8].


Theorem 1 When $c$ is a sufficiently large constant
and $I \sim I^{\varphi}_{n,\, cn \log n}$, whp the majority vote
heuristic finds the planted assignment $\varphi$.

When the density of the planted random instance
is lower than $c \log n$, the majority
vote heuristic will fail, and if the relative
probability of the clauses satisfied by one, two,
and three literals is adjusted appropriately, then
it will fail miserably. But there are alternative
approaches.
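
The majority vote heuristic itself is short enough to sketch in Python. The signed-integer literal encoding, the function name `majority_vote`, and the tie-breaking rule (ties go to TRUE) are assumptions of this example.

```python
from typing import List, Sequence, Tuple


def majority_vote(n: int, clauses: Sequence[Tuple[int, ...]]) -> List[bool]:
    """Set each variable to the value that satisfies most of its clauses.

    Literals are signed integers: +i is x_i and -i is its negation.
    Returns a list indexed 1..n (index 0 is unused). Ties are broken
    towards True, which is an arbitrary choice made for this sketch.
    """
    balance = [0] * (n + 1)
    for clause in clauses:
        for lit in clause:
            # A positive occurrence votes for True, a negated one for False.
            balance[abs(lit)] += 1 if lit > 0 else -1
    return [False] + [balance[i] >= 0 for i in range(1, n + 1)]
```

For the clauses (1, 2, -3), (1, -2, -3), (1, 2, 3), (-1, -2, -3), the variable x_1 occurs positively three times and negatively once, so it is set to TRUE, while x_3 is set to FALSE.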

For planted instances where the density is 
a sufficiently large constant, the majority vote 
heuristic provides a good starting assignment, 
and then the k-OPT heuristic can finish the job. 
The k-OPT heuristic of [6] is defined as follows: 
Initialize the assignment by majority vote. Initial- 
ize k to 1. While there exists a set of k variables 
for which flipping the values of the assignment 
will (1) make false clauses true and (2) will not 
make true clauses false, flip the values of the 
assignment on these variables. If this reaches a 
local optimum that is not a satisfying assignment, 
increase k and continue. 
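
The local search just described can be sketched as follows. This is an illustrative Python version under stated assumptions, not the procedure analyzed in [6]: the starting assignment is passed in as a parameter, `k` is capped by a hypothetical `max_k` so that the sketch always terminates, and every set of `k` variables is tried by brute force.

```python
from itertools import combinations


def satisfied(clause, a):
    """True if some literal of the clause is true under assignment a."""
    return any(a[abs(lit)] == (lit > 0) for lit in clause)


def flip(a, variables):
    """Negate the listed variables of assignment a in place."""
    for v in variables:
        a[v] = not a[v]


def try_improving_flip(n, clauses, a, k):
    """Flip one set of k variables that fixes a clause and breaks none.

    Returns True, leaving the flip applied to a, if such a set exists.
    """
    before = [satisfied(c, a) for c in clauses]
    for subset in combinations(range(1, n + 1), k):
        flip(a, subset)
        after = [satisfied(c, a) for c in clauses]
        fixes = any(new and not old for old, new in zip(before, after))
        breaks = any(old and not new for old, new in zip(before, after))
        if fixes and not breaks:
            return True
        flip(a, subset)  # undo and try the next set
    return False


def k_opt(n, clauses, start, max_k=3):
    """Return a satisfying assignment (indexed 1..n), or None if the cap is hit."""
    a = list(start)
    k = 1
    while not all(satisfied(c, a) for c in clauses):
        if k > max_k:
            return None
        if not try_improving_flip(n, clauses, a, k):
            k += 1  # local optimum for this k, so widen the search
    return a
```

Every accepted flip strictly increases the number of satisfied clauses, so the loop performs at most one accepted flip per clause before it either succeeds or exhausts the cap on `k`.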


Theorem 2 When $c$ is a sufficiently large constant
and $I \sim I^{\varphi}_{n,\, cn}$, the k-OPT heuristic finds
a satisfying assignment in polynomial time whp.
The same is true even in the semi-random case,
where an adversary is allowed to add clauses to
$I$ that have all three literals set to TRUE by $\varphi$
before giving the instance to the k-OPT heuristic.


A related algorithm has been shown to run in 
expected polynomial time in [9], and a rigorous 
analysis of warning propagation (WP), a message 
passing algorithm related to survey propagation, 
has shown that WP is successful whp on planted 
satisfying assignments, provided that the clause 
density exceeds a sufficiently large constant [5]. 

When the relative probabilities of clauses con- 
taining one, two, and three literals are adjusted 
carefully, it is possible to make the majority vote 
assignment very different from the planted as- 
signment. A way of setting these relative proba- 
bilities that is predicted to be difficult is discussed 
in [2]. If the density of these instances is high
enough (and the relative probabilities are anything
besides the case of “Gaussian elimination
with noise”), then a spectral heuristic provides 
a starting assignment close to the planted as- 
signment and local reassignment operations are 
sufficient to recover a satisfying assignment [7]. 

More formally, consider instance $I =
I_{n,p_1,p_2,p_3}$, formed by choosing a truth
assignment $\varphi$ on $n$ variables uniformly at random
and including in $I$ each clause with exactly
$i$ literals satisfied by $\varphi$ independently with
probability $p_i$. By setting $p_1 = p_2 = p_3$, this
reduces to the distribution mentioned above.

Setting $p_1 = p_2$ and $p_3 = 0$ yields a
natural distribution on 3CNFs with a planted
not-all-equal assignment, a situation where the
greedy variable assignment rule generates a random
assignment. Setting $p_2 = p_3 = 0$ gives
3CNFs with a planted exactly-one-true assignment
(which succumb to the greedy algorithm
followed by the nonspectral steps below). Also,
correctly adjusting the ratios of $p_1$, $p_2$, and $p_3$
can obtain a variety of (slightly less natural)
instance distributions which thwart the greedy
algorithm. Carefully selected values of $p_1$, $p_2$, and
$p_3$ are considered in [2], where it is conjectured
that no algorithm running in polynomial time
can solve $I_{n,p_1,p_2,p_3}$ whp when $p_i = c_i \alpha / n^2$
and

$$0.007 < c_3 < 0.25, \quad c_2 = (1 - 4c_3)/6, \quad c_1 = (1 + 2c_3)/6, \quad \alpha > \frac{4.25}{7}.$$


The spectral heuristic modeled after the coloring
algorithms of [1,3] was developed for such
planted distributions in [7]. This polynomial time
algorithm returns a satisfying assignment
to $I_{n,p_1,p_2,p_3}$ whp when $p_1 = d/n^2$, $p_2 =
\eta_2 d/n^2$, and $p_3 = \eta_3 d/n^2$, for $0 \le \eta_2, \eta_3 \le
1$, and $d \ge d_{min}$, where $d_{min}$ is a function
of $\eta_2, \eta_3$. The algorithm is structured as follows:


1. Construct a graph G from the 3CNF. 
2. Find the most negative eigenvalue of a matrix 
related to the adjacency matrix of G.
3. Assign a value to each variable based on the
signs of the eigenvector corresponding to the 
most negative eigenvalue. 

4. Iteratively improve the assignment. 

5. Perfect the assignment by exhaustive search 
over a small set containing all the incorrect 
variables. 


A more elaborate description of each step is the 
following: 


Step (1): Given 3CNF $I = I_{n,p_1,p_2,p_3}$, where
$p_1 = d/n^2$, $p_2 = \eta_2 d/n^2$, and $p_3 =
\eta_3 d/n^2$, the graph in step (1), $G = (V, E)$,
has $2n$ vertices, corresponding to the literals
in $I$, and labeled $\{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$. $G$ has an
edge between vertices $\ell_i$ and $\ell_j$ if $I$ includes
a clause with both $\ell_i$ and $\ell_j$ (and $G$ does not
have multiple edges).

Step (2): Consider $G' = (V, E')$, formed by
deleting all the edges incident to vertices with
degree greater than $180d$. Let $A$ be the adjacency
matrix of $G'$. Let $\lambda$ be the most negative
eigenvalue of $A$ and $v$ be the corresponding
eigenvector.

Step (3): There are two assignments to consider,
$\pi_+$, which is defined by

$$\pi_+(x_i) = \begin{cases} T, & \text{if } v_i \ge 0; \\ F, & \text{otherwise;} \end{cases}$$

and $\pi_-$, which is defined by

$$\pi_-(x) = \neg \pi_+(x).$$

Let $\pi_0$ be the better of $\pi_+$ and $\pi_-$ (i.e., the
assignment which satisfies more clauses). It
can be shown that $\pi_0$ agrees with $\varphi$ on at
least $(1 - C/d)n$ variables for some absolute
constant $C$.
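
Steps (1) through (3) can be sketched in dependency-free Python as shown below. This is a rough illustration under several assumptions that are not taken from [7]: the eigenvector is approximated by plain power iteration on a shifted matrix (with no convergence guarantee when the eigenvalue gap is small), vertex $2(i-1)$ stands for $x_i$ and vertex $2(i-1)+1$ for its negation, the sign of the entry at the vertex of $x_i$ decides $\pi_+(x_i)$, and all function names are hypothetical.

```python
from itertools import combinations


def literal_index(lit):
    """Vertex number of a literal: x_i -> 2(i-1), its negation -> 2(i-1)+1."""
    return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)


def build_graph(n, clauses, max_degree):
    """Steps (1)-(2): literal graph with edges at high-degree vertices removed."""
    adj = [set() for _ in range(2 * n)]
    for clause in clauses:
        for a, b in combinations(clause, 2):
            u, v = literal_index(a), literal_index(b)
            adj[u].add(v)
            adj[v].add(u)
    heavy = {u for u in range(2 * n) if len(adj[u]) > max_degree}
    return [set() if u in heavy else adj[u] - heavy for u in range(2 * n)]


def most_negative_eigenvector(adj, iterations=500):
    """Approximate the eigenvector of the most negative adjacency eigenvalue.

    Power iteration on (shift * I - A): with shift above the maximum degree
    this matrix is positive definite and its top eigenvector is the one sought.
    """
    size = len(adj)
    shift = max((len(nbrs) for nbrs in adj), default=0) + 1
    v = [float(i + 1) for i in range(size)]  # arbitrary deterministic start
    for _ in range(iterations):
        w = [shift * v[u] - sum(v[x] for x in adj[u]) for u in range(size)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def count_satisfied(clauses, a):
    return sum(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)


def spectral_start(n, clauses, d):
    """Step (3): return the better of pi_plus and its complement (indexed 1..n)."""
    adj = build_graph(n, clauses, max_degree=180 * d)
    v = most_negative_eigenvector(adj)
    pi_plus = [False] + [v[2 * i] >= 0 for i in range(n)]
    pi_minus = [False] + [not t for t in pi_plus[1:]]
    return max((pi_plus, pi_minus), key=lambda a: count_satisfied(clauses, a))
```

On a complete bipartite graph the iteration converges to a vector whose signs separate the two sides, which is the behavior that sign-based rounding in step (3) relies on.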

Step (4): For $i = 1, \ldots, \log n$, do the following:
for each variable $x$, if $x$ appears in $5\varepsilon d$
clauses unsatisfied by $\pi_{i-1}$, then set $\pi_i(x) =
\neg \pi_{i-1}(x)$, where $\varepsilon$ is an appropriately chosen
constant (taking $\varepsilon = 0.1$ works); otherwise set
$\pi_i(x) = \pi_{i-1}(x)$.

Step (5): Let $\pi'_0 = \pi_{\log n}$ denote the final assignment
generated in step (4). Let $A$ be
the set of variables which do not appear in
$(3 + 4\varepsilon)d$ clauses as the only true literal
with respect to assignment $\pi'_0$, and let $B$ be
the set of variables which do not appear in
$(\mu_D + \varepsilon)d$ clauses, where $\mu_D d = (3 + 6)d +
(6 + 3)\eta_2 d + 3\eta_3 d + O(1/n)$ is the expected
number of clauses containing variable $x$. Form
partial assignment $\pi'_1$ by unassigning all variables
in $A$ and $B$. Now, for $i \ge 1$, if there is
a variable $x_i$ which appears in less than $(\mu_D -
2\varepsilon)d$ clauses consisting of variables that are all
assigned by $\pi'_i$, then let $\pi'_{i+1}$ be the partial
assignment formed by unassigning $x_i$ in $\pi'_i$.
Let $\pi'$ be the partial assignment when this
process terminates. Consider the graph $\Gamma$ with
a vertex for each variable that is unassigned in
$\pi'$ and an edge between two variables if they
appear in a clause together. If any connected
component in $\Gamma$ is larger than $\log n$, then
fail. Otherwise, find a satisfying assignment
for $I$ by performing an exhaustive search on
the variables in each connected component
of $\Gamma$.


Theorem 3 For any constants $0 \le \eta_2, \eta_3 \le 1$,
except $(\eta_2, \eta_3) = (0, 1)$, there exists a constant
$d_{min}$ such that for any $d \ge d_{min}$, if $p_1 = d/n^2$,
$p_2 = \eta_2 d/n^2$, and $p_3 = \eta_3 d/n^2$, then this
polynomial-time algorithm produces a satisfying
assignment for random instances drawn from
$I_{n,p_1,p_2,p_3}$ whp.


Applications 


3-SAT is a universal problem, and due to its sim- 
plicity, it has potential applications in many areas, 
including proof theory and program checking, 
planning, cryptanalysis, machine learning, and 
modeling biological networks. 


Open Problems 


An important direction is to develop alternative 
models of random distributions which more ac- 
curately reflect the type of instances that occur in 
the real world.

Data Sets


Sample instances of satisfiability and 3-SAT are 
available on the web at http://www.satlib.org/. 


URL to Code 


Solvers and information on the annual satisfiabil- 
ity solving competition are available on the web 
at http://www.satlive.org/.

Problem Definition


This problem is concerned with using the multi- 
writer multi-reader register primitive in the 
shared memory model to design a fast, wait-free 
implementation of consensus. Below are detailed 
descriptions of each of these terms. 


Consensus Problems 

There are $n$ processors and the goal is to design
distributed algorithms to solve the following two 
consensus problems for these processors. 


Problem 1 (Binary consensus) 

INPUT: Processor $i$ has input bit $b_i$.

OUTPUT: Each processor $i$ has output bit $b'_i$ such
that: (1) all the output bits $b'_i$ equal the same value
$v$; and (2) $v = b_i$ for some processor $i$.

Problem 2 (Id consensus)

INPUT: Processor $i$ has a unique id $u_i$.

OUTPUT: Each processor $i$ has output value
$u'_i$, such that: (1) all the output values $u'_i$ equal
the same value $u$; and (2) $u = u_i$ for some
processor $i$.


Wait-Free 

This result builds on extensive previous work on 
the shared memory model of parallel computing. 
Shared object types include data structures such 
as read/write registers and synchronization primitives
such as “test and set”. A shared object is said
to be wait-free if it ensures that every invocation 
on the object is guaranteed a response in finite 
time even if some or all of the other processors in 
the system crash. In this problem, the existence 
of wait-free registers is assumed and the goal is 
to create a fast wait-free algorithm to solve the 
consensus problem. In the rest of this summary, 
“wait-free implementations” will be referred to 
simply as “implementations” i.e., the term wait- 
free will be omitted. 


Multi-writer Multi-reader Register 

Many past results on solving consensus in the 
shared memory model assume the existence of 
a single writer multi-reader register. For such 
a register, there is a single writer client and 
multiple reader clients. Unfortunately, it is easy 
to show that the per processor step complexity 
of any implementation of consensus from single 
writer multi-reader registers will be at least linear 
in the number of processors. Thus, to achieve 
a time efficient implementation of consensus, the 
more powerful primitive of a multi-writer multi- 
reader register must be assumed. A multi-writer 
multi-reader register assumes the clients of the 
register are multiple writers and multiple readers. 
It is well known that it is possible to implement 
such a register in the shared memory model. 


The Adversary 

Solving the above problems is complicated by the 
fact that the programmer has little control over 
the rate at which individual processors execute. 
To model this fact, it is assumed that the schedule 
at which processors run is picked by an adversary.
It is well-known that there is no deterministic
algorithm that can solve either Binary consensus 
or ID consensus in this adversarial model if the 
number of processors is greater than 1 [6, 7]. 
Thus, researchers have turned to the use of ran- 
domized algorithms to solve this problem [1]. 
These algorithms have access to random coin 
flips. Three types of adversaries are considered 
for randomized algorithms. The strong adversary 
is assumed to know the outcome of a coin flip 
immediately after the coin is flipped and to be 
able to modify its schedule accordingly. The 
oblivious adversary has to fix the schedule before 
any of the coins are flipped. The intermediate 
adversary is not permitted to see the outcome 
of a coin flip until some process makes a choice 
based on that coin flip. In particular, a process can 
flip a coin and write the result in a global register, 
but the intermediate adversary does not know the 
outcome of the coin flip until some process reads 
the value written in the register. 


Key Results 


Theorem 1 Assuming the existence of multi- 
writer multi-reader registers, there exists a ran- 
domized algorithm to solve binary consensus 
against an intermediate adversary with O(1) ex- 
pected steps per processor. 


Theorem 2 Assuming the existence of multi- 
writer multi-reader registers, there exists a ran- 
domized algorithm to solve id-consensus against 
an intermediate adversary with O(log² n) expected
steps per processor.


Both of these results assume that every proces- 
sor has a unique identifier. Prior to this result, 
the fastest known randomized algorithm for bi- 
nary consensus made use of single writer multi- 
ple reader registers, was robust against a strong 
adversary, and required O(n log² n) steps per
processor [2]. Thus, the above improvements are 
obtained at the cost of weakening the adversary 
and strengthening the system model when com- 
pared to [2]. 


Applications


Binary consensus is one of the most fundamental 
problems in distributed computing. An example 
of its importance is the following result shown 
by Herlihy [8]: If an abstract data type X to- 
gether with shared memory is powerful enough to 
implement wait-free consensus, then X together 
with shared memory is powerful enough to im- 
plement in a wait-free manner any other data 
structure Y. Thus, using this result, a wait-free 
version of any data structure can be created using 
only wait-free multi-writer multi-reader registers 
as a building block. 

Binary consensus has practical applications 
in many areas including: database management, 
multiprocessor computation, fault diagnosis, and 
mission-critical systems such as flight control. 
Lynch contains an extensive discussion of some 
of these application areas [9]. 


Open Problems 


This result leaves open several problems. First, 
it leaves open a gap on the number of steps 
per process required to perform randomized con- 
sensus using multi-writer multi-reader registers 
against the strong adversary. A recent result by
Attiya and Censor shows an Ω(n²) lower bound
on the total number of steps for all processors
with multi-writer multi-reader registers (implying
Ω(n) steps per process) [3]. They also show
a matching upper bound of O(n²) on the total
number of steps. However, closing the gap on the 
per-process number of steps is still open. 

Another open problem is whether there is 
a randomized implementation of id consensus 
using multi-reader multi-writer registers that is 
robust to the intermediate adversary and whose 
expected number of steps per processor is better 
than O(log² n). In particular, is a constant run
time possible? Aumann, in follow-up work to this
result, was able to improve the expected run time
per process to O(log n) [4]. However, to the best
of the reviewer’s knowledge, there have been no
further improvements.

A third open problem is to close the gap on the
time required to solve binary consensus against 
the strong adversary with a single writer multiple 
reader register. The fastest known randomized 
algorithm in this scenario requires O(n log² n)
steps per processor [2]. A trivial lower bound on
the number of steps per processor when single-
writer registers are used is Ω(n). However, to the
best of this reviewer’s knowledge, an O(log² n) gap
still remains open.

A final open problem is to close the gap on 
the total work required to solve consensus with 
single-reader single-writer registers against an 
oblivious adversary. Aumann and Kapah-Levy 
describe algorithms for this scenario that require 
O(n log n · exp(2√(ln n · ln(c log n log* n)))) expected
total work for some constant c [5]. In particular,
the total work is less than O(n^(1+ε)) for any ε > 0.
A trivial lower bound on total work is Ω(n), but
a gap remains open. 


Problem Definition


This entry investigates deterministic and randomized
protocols for achieving broadcast (distributing
a message from a source to all other
nodes) in arbitrary multi-hop synchronous radio 
networks. 

The model consists of an arbitrary (undi- 
rected) network, with processors communicating 
in synchronous time-slots subject to the following 
rules. In each time-slot, each processor acts either 
as a transmitter or as a receiver. A processor 
acting as a receiver is said to receive a message 
in time-slot t if exactly one of its neighbors
transmits in that time-slot. The message received 
is the one transmitted. If more than one neighbor 
transmits in that time-slot, a conflict occurs. In 
this case, the receiver may either get a message 
from one of the transmitting neighbors or get 
no message. It is assumed that conflicts (or 
“collisions”) are not detected, hence a processor 
cannot distinguish the case in which no neighbor 
transmits from the case in which two or more 
of its neighbors transmit during that time-slot.
The processors are not required to have IDs 
nor do they know their neighbors; in particular, 
the processors do not know the topology of the 
network. 

The only inputs required by the protocol are
n, the number of processors in the network;
Δ, an a priori known upper bound on the
maximum degree in the network; and ε, the error
bound. (All bounds are a priori known to the
algorithm.)

Broadcast is a task initiated by a single pro- 
cessor, called the source, transmitting a single 
message. The goal is to have the message reach 
all processors in the network. 


Key Results 


The main result is a randomized protocol that 
achieves broadcast in time which is optimal up to 
a logarithmic factor. In particular, with probability
1 − ε, the protocol achieves broadcast within
O((D + log(n/ε)) · log n) time-slots, where D is
the diameter of the network.

On the other hand, a linear lower bound on 
the deterministic time-complexity of broadcast 
is proved. Namely, any deterministic broadcast 
protocol requires Ω(n) time-slots, even if the
network has diameter 3, and n is known to all
processors. These two results demonstrate an
exponential gap in complexity between random- 
ization and determinism. 


Randomized Protocols 


The Procedure Decay 

The basic idea used in the protocol is to resolve 
potential conflicts by randomly eliminating half 
of the transmitters. This process of “cutting by 
half” is repeated each time-slot with the hope 
that there will exist a time-slot with a single ac- 
tive transmitter. The “cutting by half” process is 
easily implemented distributively by letting each 
processor decide randomly whether to eliminate 
itself. It will be shown that if all neighbors of a 
receiver follow the elimination procedure, then, 
with positive probability, there exists a time slot 
in which exactly one neighbor transmits. 

What follows is a description of the procedure 
for sending a message m, that is executed by each 
processor after receiving m: 

procedure Decay(k, m);
    repeat at most k times (but at least once!)
        send m to all neighbors;
        set coin ← 0 or 1 with equal probability
    until coin = 0.

By using elementary probabilistic arguments, 
one can prove: 


Theorem 1 Let y be a vertex of G. Also let
d ≥ 2 neighbors of y execute Decay during the
time interval [0, k) and assume that they all start
the execution at Time = 0. Then P(k, d), the
probability that y receives a message by Time =
k, satisfies:


1. lim_{k→∞} P(k, d) ≥ 2/3;
2. For k ≥ 2⌈log d⌉, P(k, d) ≥ 1/2.


(All logarithms are to base 2.) 

The expected termination time of the algo- 
rithm depends on the probability that coin = 0. 
Here, this probability is set to be one half. An 
analysis of the merits of using other probabilities 
was carried out by Hofri [4]. 
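
The behaviour of Decay can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' analysis: it assumes the pessimistic collision rule in which a receiver hears nothing whenever two or more neighbors transmit, and the function names `decay_delivers` and `estimate_p` are hypothetical.

```python
import random

def decay_delivers(k, d, rng):
    """Simulate d neighbors of y that all start Decay(k, m) at Time = 0.

    Returns True if some time-slot in [0, k) has exactly one transmitter,
    i.e., y receives the message (collisions are assumed to deliver nothing).
    """
    active = d  # every neighbor sends in the first slot
    for _ in range(k):
        if active == 1:
            return True   # exactly one transmitter: y receives m
        if active == 0:
            return False  # everybody stopped without a clean slot
        # each still-active transmitter flips a fair coin and stops on 0
        active = sum(1 for _ in range(active) if rng.random() < 0.5)
    return False

def estimate_p(k, d, trials=20000, seed=0):
    """Monte Carlo estimate of P(k, d)."""
    rng = random.Random(seed)
    hits = sum(decay_delivers(k, d, rng) for _ in range(trials))
    return hits / trials
```

For d = 2 the quantities can also be computed by hand under the same assumption: P(2, 2) = 1/2 and the limit for large k is 2/3, so the estimates from `estimate_p(2, 2)` and `estimate_p(60, 2)` should land close to those values.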

The Broadcast Protocol

The broadcast protocol makes several calls to
Decay(k, m). By Theorem 1 (2), to ensure that
the probability of a processor y receiving the
message is at least 1/2, the parameter k should
be at least 2 log d (where d is the number of
neighbors sending a message to y). Since d is
not known, the parameter was chosen as k =
2⌈log Δ⌉ (recall that Δ was defined to be an
upper bound on the in-degree). Theorem 1 also
requires that all participants start executing Decay
at the same time-slot. Therefore, Decay is
initiated only at integer multiples of 2⌈log Δ⌉.

procedure Broadcast;
    k = 2⌈log Δ⌉;
    t = 2⌈log(N/ε)⌉;
    Wait until receiving a message, say m;
    do t times {
        Wait until (Time mod k) = 0;
        Decay(k, m);
    }


A network is said to execute the Broad- 
cast_scheme if some processor, denoted s, 
transmits an initial message and each processor 
executes the abovementioned Broadcast proce- 
dure. 
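
To make the interplay between Broadcast and Decay concrete, here is a small simulation sketch of the scheme on an explicit graph. It is an illustration under stated assumptions rather than the authors' implementation: collisions are assumed to deliver nothing, the function name `broadcast_scheme` and its parameters are hypothetical, and k is forced to be at least 2 so that the degenerate case Δ = 1 does not give k = 0.

```python
import math
import random

def broadcast_scheme(adj, source, max_degree, n_bound, eps, seed=0):
    """Simulate the broadcast scheme; return {node: time-slot it became informed}.

    adj maps each node to the list of its neighbors (undirected graph).
    """
    rng = random.Random(seed)
    k = 2 * max(1, math.ceil(math.log2(max_degree)))  # length of one Decay call
    t = 2 * math.ceil(math.log2(n_bound / eps))       # Decay calls per node
    informed_at = {source: 0}
    calls_left = {source: t}
    sending = set()  # nodes still transmitting inside the current Decay call
    time = 0
    while sending or any(calls_left.values()):
        if time % k == 0:
            # Decay is started only at integer multiples of k
            sending = {v for v, left in calls_left.items() if left > 0}
            for v in sending:
                calls_left[v] -= 1
        # an uninformed node receives iff exactly one neighbor transmits
        newly_informed = [
            v for v in adj
            if v not in informed_at
            and sum(1 for u in adj[v] if u in sending) == 1
        ]
        # every transmitter flips a fair coin and drops out on 0
        sending = {v for v in sending if rng.random() < 0.5}
        time += 1
        for v in newly_informed:
            informed_at[v] = time
            calls_left[v] = t
    return informed_at
```

On a path, every uninformed node has at most one informed neighbor, so no conflicts occur and the message advances by one hop per Decay call; on denser graphs the outcome depends on the coin flips.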


Theorem 2 Let T = 2D + 5 max{√D, √(log(n/ε))} · √(log(n/ε)).
Assume that Broadcast_scheme starts at Time = 0. Then, with
probability ≥ 1 − 2ε, by time 2⌈log Δ⌉ · T all
nodes will receive the message. Furthermore,
with probability ≥ 1 − 2ε, all the nodes will
terminate by time 2⌈log Δ⌉ · (T + ⌈log(N/ε)⌉).

The bound provided by Theorem 2 contains 
two additive terms: the first represents the diam- 
eter of the network, and the second represents 
delays caused by conflicts (which are rare, yet 
they exist). 


Additional Properties of the Broadcast Protocol

• Processor IDs: The protocol does not use
processor IDs, and thus does not require
that the processors have distinct IDs (or that
they know the identity of their neighbors).
Furthermore, a processor is not even required
to know the number of its neighbors. This
property makes the protocol adaptive to
changes in topology which occur throughout
the execution and resilient to non-malicious
faults.

• Knowing the size of the network: The
protocol performs almost as well when given,
instead of the actual number of processors
(i.e., n), a “good” upper bound on this number
(denoted N). An upper bound polynomial in n
yields the same time-complexity, up to a constant
factor (since complexity is logarithmic
in N).

• Conflict detection: The algorithm and its
complexity remain valid even if no messages
can be received when a conflict occurs.

• Simplicity and fast local computation: In
each time slot, each processor performs a
constant amount of local computation.

• Message complexity: Each processor is active
for ⌈log(N/ε)⌉ consecutive phases, and
the average number of transmissions per phase
is at most 2. Thus, the expected number of
transmissions of the entire network is bounded
by 2n · ⌈log(N/ε)⌉.

• Adaptiveness to changing topology and
fault resilience: The protocol is resilient to
some changes in the topology of the network.
For example, edges may be added or deleted
at any time, provided that the network of
unchanged edges remains connected. This
corresponds to fail/stop failure of edges, thus
demonstrating the resilience to some non-
malicious failures.

• Directed networks: The protocol does not
use acknowledgments. Thus it may be applied
even when the communication links are not
symmetric, i.e., the fact that processor v can
transmit to u does not imply that u can transmit
to v. (The appropriate network model is,
therefore, a directed graph.) In real life this
situation occurs, for instance, when v has a
stronger transmitter than u.


A Lower Bound on Deterministic Algorithms

For deterministic algorithms, one can show
a lower bound: for every n, there exists a
family of n-node networks such that every
deterministic broadcast scheme requires Ω(n)
time. For every non-empty subset S ⊆
{1, 2, ..., n}, consider the following network G_S
(Fig. 1).


Randomized Broadcasting in Radio Networks, Fig. 1
The network used for the lower bound


Node 0 is the source and node n + 1 the
sink. The source initiates the message and the
problem of broadcast in G_S is to reach the sink.
The difficulty stems from the fact that the par- 
tition of the middle layer (i.e., S) is not known 
a priori. The following theorem can be proved 
by a series of reductions to a certain “hitting 
game”: 


Theorem 3 Every deterministic broadcast pro- 
tocol that is correct for all n-node networks 
requires time Ω(n).
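
The difficulty of reaching the sink in G_S can be illustrated with a short sketch. Since the figure is not reproduced here, the sketch assumes the usual three-layer construction: the source 0 is adjacent to every middle node 1, ..., n, and the sink n + 1 is adjacent exactly to the nodes of S. The helper names `build_gs`, `sink_hears`, and `slots_until_hit` are hypothetical, and a schedule is simply a list of sets of middle nodes that transmit in successive time-slots (after the source has informed the middle layer).

```python
def build_gs(n, S):
    """Adjacency sets of G_S: source 0, middle layer 1..n, sink n + 1."""
    adj = {v: set() for v in range(n + 2)}
    for v in range(1, n + 1):   # assumed: source adjacent to every middle node
        adj[0].add(v)
        adj[v].add(0)
    for v in S:                 # assumed: sink adjacent exactly to the nodes in S
        adj[v].add(n + 1)
        adj[n + 1].add(v)
    return adj

def sink_hears(adj, n, transmitters):
    """The sink receives iff exactly one of its neighbors transmits."""
    return len(adj[n + 1] & set(transmitters)) == 1

def slots_until_hit(n, S, schedule):
    """First slot (1-based) in which the schedule reaches the sink, else None."""
    adj = build_gs(n, S)
    for slot, transmitters in enumerate(schedule, start=1):
        if sink_hears(adj, n, transmitters):
            return slot
    return None
```

A round-robin schedule `[{1}, {2}, ..., {n}]` reaches the sink for every S but may need up to n slots, whereas letting all middle nodes transmit at once fails whenever S has two or more elements. The schedule has to "hit" the unknown set S in exactly one element, which is the flavor of the hitting game mentioned above.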


The result of [2] depends crucially on the as- 
sumption that the nodes do not know the number 
and IDs of their neighbors. If this restriction is 
lifted, Kowalski and Pelc [5] showed how to 
broadcast in logarithmic time on all networks of 
type G_S. Moreover, they show how to broadcast
in sublinear time on all n-node graphs of diameter
o(log log n).

Kowalski and Pelc also constructed a class of
graphs of diameter 4, such that every broadcasting
algorithm requires time Ω(n^(1/4)) on one of
these graphs. Thus they showed an exponential 
gap for their model too. 


Applications 


The procedure Decay has been used to resolve 
contention in radio and cellular phone networks. 


Recommended Reading


Subsequent papers showed the optimality of the 
randomized algorithm: 


• Alon et al. [1] showed the existence of a
family of radius-2 networks on n vertices for
which any broadcast schedule requires at least
Ω(log² n) time slots.

• Kushilevitz and Mansour [7] showed that for
any randomized broadcast protocol, there exists
a network in which the expected time to
broadcast a message is Ω(D log(N/D)).

• Bruschi and Del Pinto [3] showed that for any
deterministic distributed broadcast algorithm,
any n and D ≤ n/2, there exists a network
with n nodes and diameter D such that the
time needed for broadcast is Ω(D log n).

• Kowalski and Pelc [6] discussed networks in
which collisions are indistinguishable from
the absence of transmission. They showed
an Ω(n log n / log(n/D)) lower bound and an
O(n log n) upper bound. For this model, they
also showed an O(D log n + log² n) randomized
algorithm, thus matching the lower bound
of [1] and improving the bound of [2] for
graphs for which D = Θ(n / log n).


Problem Definition


The randomized contractions framework is often 
useful when designing fixed-parameter-tractable 
(FPT) algorithms for graph cut problems. Let us 
assume that we are given an undirected graph 
G with n vertices and m edges together with an 
integer k. The goal is to remove at most k edges 
or at most k vertices, in the edge- and vertex- 
deletion variants of a problem, respectively, to 
satisfy some problem-specific constraints. In this 
entry, for the sake of simplicity, we restrict our 
attention to edge-deletion variants only. 

Examples of problems that fit in the above 
graph cut problem class include: 


Randomized Contraction 


Multiway Cut
Input: an undirected graph G, a set of terminals
T ⊆ V(G), and an integer k.
Question: is there a set X ⊆ E(G) of at most
k edges of G, so that in G \ X, no connected
component contains more than one terminal
from T?

Steiner Cut
Input: an undirected graph G, a set of terminals
T ⊆ V(G), and integers k, s.
Question: is there a set X ⊆ E(G) of at
most k edges of G, so that in G \ X, at least
s connected components contain at least one
terminal from T?

Multiway Cut-Uncut
Input: an undirected graph G, a set of terminals
T ⊆ V(G), an equivalence relation R on
the set T, and an integer k.
Question: is there a set X ⊆ E(G) of at most
k edges of G, so that for any u, v ∈ T, vertices
u, v are in the same connected component of
G \ X iff R(u, v)?

Unique Label Cover
Input: an undirected graph G, a finite alphabet
Σ of size s, an integer k, for each vertex
v ∈ V(G) a set φ_v ⊆ Σ, and for each edge
e ∈ E(G) and each of its endpoints v, a partial
permutation ψ_{e,v} of Σ, such that if e = uv
then ψ_{e,u} = ψ_{e,v}^{-1}.
Question: is there a set X ⊆ E(G) of at most
k edges of G and a function Ψ : V(G) → Σ
such that for any v ∈ V(G) we have Ψ(v) ∈
φ_v and for any uv ∈ E(G) \ X, we have
(Ψ(u), Ψ(v)) ∈ ψ_{uv,u}?
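
As a concrete reading of the simplest of these definitions, the following sketch checks whether a candidate edge set X is a feasible solution to Multiway Cut. It only verifies a solution and does not search for one. The function name `is_multiway_cut` is hypothetical, vertices are assumed to be numbered 0, ..., n − 1, and the graph is assumed to be simple (no parallel edges).

```python
def is_multiway_cut(n, edges, terminals, removed):
    """Return True iff deleting `removed` leaves at most one terminal per component."""
    parent = list(range(n))

    def find(x):
        # union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    removed_set = {frozenset(e) for e in removed}  # edges are unordered pairs
    for u, v in edges:
        if frozenset((u, v)) not in removed_set:
            parent[find(u)] = find(v)              # keep this edge: merge components

    roots = [find(t) for t in terminals]
    return len(roots) == len(set(roots))           # all terminals separated
```

For example, on the path 0-1-2-3 with terminals {0, 3}, removing the single edge (1, 2) is a valid solution for k = 1, while removing nothing is not.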


Key Results 


The randomized contractions framework was ob- 
tained by Chitnis et al. [2]; however, it was 
inspired by an earlier work of Kawarabayashi and 
Thorup [4], who have shown that the k-way cut 
problem is fixed parameter tractable. Random- 
ized contractions were used to obtain the first FPT 
algorithm for unique label cover parameterized 
by both the cut size and the alphabet size, as well 
as to improve the dependency on k in the FPT al- 
gorithms for Steiner cut and multiway cut-uncut. 


To exemplify usage of randomized con- 
tractions, we use the multiway cut problem. 
Multiway cut has long been known to be FPT
[5], and it admits efficient FPT algorithms
with f(k) = 4^k dependency on k by using
important separators [1] as well as f(k) = 2^k
by LP-branching [3]. We use multiway cut as an
illustration of usage of randomized contractions 
to simplify the description and magnify the most 
important parts of the technique. 


High-Level Intuition 
From now on, we assume that the given undi- 
rected graph G is connected, as otherwise one 
can solve the problem independently for each 
connected component of G. Observe that this 
guarantees that after removing k edges, the graph 
contains at most k + 1 connected components. 
On a high level, the technique works in two 
phases. In the first phase, as long as the graph 
admits a certain type of a good edge separation, 
we proceed recursively and simplify the instance. 
On the other hand, if the graph is well con- 
nected and does not contain a cut we are looking 
for, then in the second phase, we solve the prob- 
lem directly, by exploiting the high connectivity 
of G. 


Recursive Understanding 

Assume that we have a set of vertices V_1 ⊆
V(G), such that G[V_1] is connected, V_1 contains
at least k·k! + 2 vertices, and there are at most k
edges between V_1 and V_2 = V(G) \ V_1 in G. Let
B ⊆ V_1 be the set of vertices in V_1 having at least
one neighbor in V_2. In such a setting, one can
show that by looking at G[V_1] only (in particular
without looking at G[V_2]), one can find an edge
of G[V_1] which can be safely contracted, i.e.,
which is not part of some solution for the whole
graph G. The reason is that any solution X ⊆
E(G) gives some partition of B by looking at the
set of connected components of G[V_2 ∪ B] \ X.
There are at most k! partitions of B as |B| ≤ k.
Imagine that for any such partition, we mark a
set of at most k edges, which would extend the
partial solution under consideration, i.e., extend
X ∩ E(G[B ∪ V_2]). In total, this marking procedure
would select at most k·k! edges, leaving at least
one edge unmarked, as |E(G[V_1])| ≥ |V_1| − 1 ≥
k·k! + 1. Such an unmarked edge can be safely
contracted. The intuition behind this reasoning 
leads to the following definition: 


Definition 1 Let G be a connected graph. A 
partition (V;, V2) of V(G) is called a (q, k)-good 
edge separation, if 


• |V_1|, |V_2| > q;
• |E(V_1, V_2)| ≤ k;
• G[V_1] and G[V_2] are connected.


For the multiway cut problem, we would set 
q = k·k! + 1. The following lemma states that
we can find a (q,k)-good edge separation, if it 
exists: 


Lemma 1 There exists a deterministic algorithm 
that, given an undirected, connected graph G 
on n vertices along with integers q and k, in 
time O(2^(O(min(q,k) log(q+k))) n³ log n) either finds
a (q,k)-good edge separation or correctly con- 
cludes that no such separation exists. 


A rough sketch of the proof follows. Assume 
that a (q, k)-good edge separation (V_1, V_2) exists.
Let E_1 be the set of edges of some subtree of
G[V_1] with exactly q edges; similarly let E_2 be
the set of edges of some subtree of G[V_2] with
exactly q edges. By the definition of a (q, k)-good
separation, such sets E_1, E_2 exist. Contract
each edge of the graph with probability 1/2
independently from other edges. With probability
at least 1/f(k, q) = 2^(−(2q+k)), the following
event happens (see Fig. 1):


(i) No edge between V_1 and V_2 is contracted,
(ii) All edges of E_1 ∪ E_2 are contracted.


If we are lucky and such an event occurs, then
by looking for a minimum cut between each two
vertices onto which at least q + 1 vertices of
G were contracted, we can find a (q, k)-good
separation. By a better choice of contraction 
probability, we can improve the probability of 
success, whereas by using splitters [6], we can 
derandomize the procedure. 
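
The random contraction step and the bound on its success probability can be sketched as follows. This is an illustrative sketch only: the names `success_probability` and `random_contraction` are hypothetical, and the biased probability p = 2q/(2q + k) used in the usage note is an assumption (it maximizes the product p^(2q) · (1 − p)^k); the source does not state which contraction probability is meant by "a better choice".

```python
import random

def success_probability(q, k, p=0.5):
    """Probability that 2q fixed edges are all contracted and k fixed edges are not,
    when every edge is contracted independently with probability p."""
    return p ** (2 * q) * (1 - p) ** k

def random_contraction(n, edges, p, rng):
    """Contract each edge independently with probability p.

    Returns a list mapping every vertex 0..n-1 to the representative of the
    super-vertex it was contracted onto.
    """
    parent = list(range(n))

    def find(x):
        # union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:           # this edge is contracted
            parent[find(u)] = find(v)
    return [find(v) for v in range(n)]
```

With p = 1/2 the function `success_probability(q, k)` evaluates to 2^(−(2q+k)), matching the bound in the text. For q = 3 and k = 2, the assumed biased choice p = 0.75 gives roughly 0.011 instead of 2^(−8) ≈ 0.004.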
Randomized Contraction, Fig. 1 A (q, k)-good separation. In the randomized routine, we hope the thick edges to
be contracted and the thin edges not to be contracted


Randomized Contraction, Fig. 2 Structure of a connected
graph G that does not admit a (q, k)-good separation. After removing
at most k edges, only one big connected component,
C_0, remains


Summarizing this phase of the algorithm,
we look for a (q, k)-good edge separation. If it
does not exist, then we proceed to the second
(high-connectivity) phase of the algorithm.
However, if a (q, k)-good edge separation
exists, then we proceed recursively. Clearly,
we are omitting some important details in this
description. The most important of them is that
when recursing, some vertices play a special role,
as they are border terminals, i.e., vertices which
have neighbors outside of the part of the graph
under consideration. For this reason, to make the
induction work, we need a stronger definition of
a problem, called its border version, which for
multiway cut is as follows:


Border Multiway Cut
Input: a connected, undirected graph G, a set
of terminals T ⊆ V(G), an integer k, and a set
T_b ⊆ V(G) of at most 2k terminals.
Output: for each partition P of T_b, output a
set X_P of size at most k (if it exists), such that
in the graph G_P \ X_P no two terminals from T
are in the same connected component, where
G_P = (V(G), E(G) ∪ E_P) and E_P contains
pairs of vertices which are in the same block
of P.


High-Connectivity Phase 

The second phase of the approach is usually 
problem specific; however, its main idea is the 
following. Since we know that G does not admit 
a (q, k)-good edge separation, if we remove any 
set X of at most k edges, there is at most one 
connected component of G \ X containing more 
than q vertices (see Fig. 2). Therefore, if we
independently contract each edge at random, then
with good enough probability, no solution edge
will be contracted and all connected components
of G \ X except possibly one will be contracted
onto single vertices (again, see Fig. 2). In such
a case, one can show that we can solve a cut 
problem under consideration either greedily or by 
dynamic programming. 


Related Work 


The currently best-known parameterized 
algorithm for unique label cover is due to 
Wahlström [7] and works in time |Σ|^(2k) · n^(O(1)).


Problem Definition


Recent developments in wireless commu- 
nications and digital electronics have led 
to the development of extremely small in 
size, low-power, low-cost sensor devices 
(often called smart dust). Such tiny de- 
vices integrate sensing, data processing and 
wireless communication capabilities. Examined
individually, each such resource-constrained
device might appear to have small
utility; however, the distributed self-collaboration
of large numbers of such devices into an ad hoc
network may lead to the efficient accomplishment
of large sensing tasks, i.e., reporting data about
the realization of a local event happening in the 
network area to a faraway control center. 

The problem considered is the development of 
a randomized algorithm to balance energy among 
sensors whose aim is to detect events in the net- 
work area and report them to a sink. The network 
is sliced by the algorithm into layers composed of 
sensors at approximately equal distances from the 
sink [1, 2, 8] (Fig. 1). The slicing of the network
depends on the communication distance. The sink
initiates the process by sending a control message
containing a counter, the value of which is
initially 1. Sensors receiving the message assign
themselves to a slice number corresponding to
the counter, increment the counter and propagate
the message in the network. A sensor already
assigned to a slice ignores subsequent received
control messages.


Randomized Energy Balance Algorithms in Sensor
Networks, Fig. 1 The sink and five slices S_1, ..., S_5
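
The slicing procedure amounts to a breadth-first flood from the sink. The sketch below is one possible rendering under simplifying assumptions: control messages are assumed to propagate in synchronous rounds (so a sensor first hears the smallest possible counter), `adj` is a hypothetical map from each node to the nodes within its communication range, and the function name `assign_slices` is not from the source.

```python
from collections import deque

def assign_slices(adj, sink):
    """Return {sensor: slice number} obtained by flooding a counter from the sink."""
    slice_of = {}
    queue = deque()
    for v in adj[sink]:              # the sink sends the counter with value 1
        if v not in slice_of:
            slice_of[v] = 1
            queue.append(v)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # a sensor already assigned to a slice ignores later control messages
            if v != sink and v not in slice_of:
                slice_of[v] = slice_of[u] + 1   # counter was incremented by u
                queue.append(v)
    return slice_of
```

In this sketch the slice number of a sensor equals its hop distance to the sink, which matches the description of layers at approximately equal distances.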

The strategy suggested to balance the energy 
among sensors consists in allowing a sensor to 
probabilistically choose between either sending 
data to a sensor in the next layer towards the 
sink or sending the data directly to the sink. 
The difference between the two choices is the 
energy consumption, which is much higher if the 
sensor decides to report to the sink directly. The 
energy consumption is modeled as a function of
the transmission distance by assuming that the
energy necessary to send data up to a distance d is
proportional to d². Actually, more accurate models
can be considered, in which the dependence
is of the form d^α, with 2 ≤ α ≤ 5 depending
on the particular environmental conditions. Although
the model chosen determines the parameters
of the algorithm, the particular shape of the
function describing the relationship between the
distance of transmission and energy consumption
is not relevant except that it should increase with
distance. The distance between two successive
slices is normalized to be 1. Hence, a sensor
sending data to one of its neighbors consumes
one unit of energy and a sensor located in slice
i consumes i² units of energy to report to the sink
directly. Small hop transmissions are cheap (with 
respect to energy consumption) but pass through 
the critical region around the sink and might 
strain sensors in that region, while expensive 
direct transmissions bypass that critical area. 
Energy balance is defined as follows: 


Definition 1 The network is energy-balanced if
the average per sensor energy dissipation is the
same for all sectors, i.e., when

\[
\frac{E[\mathcal{E}_i]}{S_i} = \frac{E[\mathcal{E}_j]}{S_j},
\]

where ℰ_i is the total energy available and S_i is the
number of nodes in slice number i.


The dynamics of the network is modeled
by assigning probabilities λ_i, i = 1, ..., N,
Σ λ_i = 1, of the occurrence of an event in
slice i. The protocol consists in transmitting the
data to a neighbor slice with probability p_i and
with probability 1 − p_i to the sink, for a sensor
belonging to slice i. Hence, the mean energy
consumption per data unit is p_i + (1 − p_i) i².
A central assumption in the following is that the
events are evenly generated in a given slice. Then,
denoting by e_i the energy available per node in
slice i (i.e., e_i = ℰ_i/S_i), the problem of energy-
balanced data propagation can be formally stated
as follows:

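Given λ_i, e_i, S_i, i = 1, ..., N, find p_i, λ such
that

\[
\underbrace{\left(\lambda_i + \lambda_{i+1} p_{i+1} + \cdots + \lambda_N p_N p_{N-1} \cdots p_{i+1}\right)}_{=:\, x_i}
\cdot \frac{p_i + (1 - p_i)\, i^2}{S_i} = \lambda e_i, \qquad i = 1, \ldots, N. \tag{2}
\]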
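
The left-hand side of the balance condition can be evaluated directly for given parameters. The sketch below assumes the reading of (2) in which x_i = λ_i + λ_{i+1} p_{i+1} + ... + λ_N p_N ··· p_{i+1} is the traffic handled by slice i, which satisfies the recursion x_N = λ_N and x_i = λ_i + p_{i+1} x_{i+1}. The function names are hypothetical, and the lists are 1-indexed by placing an unused dummy value at index 0.

```python
def handled_traffic(lam, p):
    """Return [x_1, ..., x_N] using x_N = lam_N and x_i = lam_i + p_{i+1} * x_{i+1}."""
    N = len(lam) - 1                 # index 0 is an unused dummy entry
    x = [0.0] * (N + 2)
    for i in range(N, 0, -1):
        # traffic generated in slice i plus traffic forwarded by slice i + 1
        forwarded = p[i + 1] * x[i + 1] if i < N else 0.0
        x[i] = lam[i] + forwarded
    return x[1:N + 1]

def mean_energy_per_node(lam, p, S):
    """Mean energy spent per node and per event in each slice i = 1..N."""
    N = len(lam) - 1
    x = handled_traffic(lam, p)
    return [
        x[i - 1] * (p[i] + (1 - p[i]) * i ** 2) / S[i]
        for i in range(1, N + 1)
    ]
```

The network is energy-balanced in the sense of (2) when the values returned by `mean_energy_per_node` are proportional to the available per-node energies e_i with a common factor λ.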
Initialize x̃_0 = λ, ..., x̃_n
Initialize NbrLoop = 1
repeat forever
    Send x̃_i and λ values to the stations which compute
    their p_i probability
    wait for a data
    for i = 0 to n
        if the data passed through slice i then
            X ← 1
        else
            X ← 0
        end if
        Generate R a x̃_i-Bernoulli random variable
        x̃_i ← x̃_i + (1/NbrLoop)(X − R)
        Increment NbrLoop by one.
    end for
end repeat


Randomized Energy Balance Algorithms in Sensor
Networks, Fig. 2 Pseudo-code for estimation of the x_i
value by the sink


Equation (2) amounts to ensuring that the mean 
energy dissipation for all sensors is proportional 
to the available energy. In turn, this ensures that 
sensors might, on average, run out of energy all 
at the same time. Notice that (2) contains the
definitions of the x_i. They are the ones estimated
in the pseudo-code in Fig. 2, the successive estimations
being denoted as x̃_i. These variables are
proportional to the number of messages handled 
by slice i. 
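
A single-slice version of the recursive estimation can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the update rule of Fig. 2 reads x̃ ← x̃ + (X − R)/NbrLoop, where X indicates whether the data passed through the slice and R is a Bernoulli variable with parameter x̃. The function name `estimate_x`, the starting value 0.5, and the clamping of the estimate to [0, 1] (so that it remains a valid Bernoulli parameter) are assumptions of this sketch.

```python
import random

def estimate_x(true_x, steps=20000, seed=0):
    """Estimate the probability true_x that a data item passes through a slice."""
    rng = random.Random(seed)
    x_est = 0.5                                    # assumed starting value
    for loop in range(1, steps + 1):
        X = 1 if rng.random() < true_x else 0      # observed: did the data pass?
        R = 1 if rng.random() < x_est else 0       # Bernoulli(x_est) draw
        x_est += (X - R) / loop                    # step size 1 / NbrLoop
        x_est = min(1.0, max(0.0, x_est))          # assumption: keep in [0, 1]
    return x_est
```

Because the expected value of X − R is the difference between the true value and the current estimate, each update moves the estimate toward the true value on average, with steps that shrink like 1/NbrLoop.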


Key Results 


In [1, 2] recursive equations similar to (2) were 
suggested and solved in closed form under 
adequate hypotheses. The need for a priori 
knowledge of the probability of occurrence of 
the events, the λ_i parameters, was considered
in [7], in which these parameters were estimated 
by the sink on the basis of the observations of 
the various paths the data follow. The algorithm 
suggested is based on recursive estimation, is 
computationally not expensive and converges 
with rate O(1/√n). One might argue that the rate
of convergence is slow; however, it is numerically
observed that relatively quickly compared with
the convergence time, the algorithm finds an
estimation close enough to the final value.
The estimation algorithm run by the sink 
(which has no energy constraints) is given in 
Fig. 2. 

Results taken from [1, 2, 7] all assume the 
existence of an energy-balance solution. How- 
ever, particular distributions of the events might 
prevent the existence of such a solution and the 
relevant question is no longer the computation 
of an energy-balance algorithm. For instance,
assuming that λ_N = 0, sensors in slice N have
no way of balancing energy. In [9] the problem
was reformulated as finding the probability
distribution {p_i}_{i=1,...,N} which leads to the
maximal functional lifetime of the networks. It
was proved that if an energy-balance strategy 
exists, then it maximizes the lifetime of the net- 
work establishing formally the intuitive reasoning 
which was the motivation to consider energy- 
balance strategies. A centralized algorithm was 
presented to compute the optimal parameters. 
Moreover, it was observed numerically that the 
interslice energy consumption is prone to be 
uneven and a spreading technique was suggested 
and numerically validated as being efficient to 
overcome this limitation of the probabilistic al- 
gorithm. 

The communication graph considered is a re- 
strictive subset of the complete communication 
graph and it is legitimate to wonder whether one 
can improve the situation by extending it. For 
instance, by allowing data to be sent two hops 
or more away. In [3, 6] it was proved that the 
topology in which sensors communicate only to 
neighbor slices and the sink is the one which 
maximizes the flow of data in the network. More- 
over, the communication graph in which sensors 
send data only to their neighbors and the sink 
leads to a completely distributed algorithm bal- 
ancing energy [6]. Indeed, as a sensor sends data 
to a neighbor slice, the neighbor must in turn send 
the data and can attach information concerning 
its own energy level. This information might be 
captured by the initial sensor since it belongs to 
the communication range of its neighbor (this 
does not hold any longer if multiple hops are 
allowed). Hence, a distributed strategy consists in 
sending data to a particular neighbor only if its
energy consumption level is lower; otherwise the
data are sent directly to the sink. 


Applications 


Among the several constraints sensor networks 
designers have to face, energy management is 
central since sensors are usually battery pow- 
ered, making the lifetime of the networks highly 
sensitive to the energy management. Besides the 
traditional strategy consisting in minimizing the 
energy consumption at sensor nodes, energy- 
balance schemes aim at balancing the energy con- 
sumption among sensors. The intuitive function 
of such schemes is to avoid energy depletion 
holes appearing as some sensors that run out 
of their available energy resources and are no 
longer able to participate in the global function 
of the networks. For instance, routing might be 
no longer possible if a small number of sensors 
run out of energy, leading to a disconnected 
network. This was pointed out in [5] as well as the 
need to develop application-specific protocols. 
Energy balancing is suggested as a solution in 
order to make the global functional lifetime of 
the network longer. The earliest development of 
dedicated protocols ensuring energy balance can 
be found in [4, 10, 11]. 

A key application is to maximize the lifetime 
of the network while gathering data to a sink. 
Besides increasing the lifetime of the networks,
other criteria have to be taken into account.
Indeed, the distributed algorithm should be as
simple as possible owing to limited computational
resources, should avoid collisions or limit the
total number of transmissions, and should ensure
a large enough flow of data from the sensors
toward the sink. In fact, maximizing the flow of
data is equivalent to maximizing the lifetime of
sensor networks if some particular realizable
conditions are fulfilled. Besides the simplicity
of the distributed algorithm, the network
deployment and the self-realization of the
network structure should be possible in realistic
conditions.


Problem Definition


The two classical problems of disseminating 
information in computer networks are broad- 
casting and gossiping. In broadcasting, the goal 
is to distribute a message from a distinguished 
source node to all other nodes in the networks. 
In gossiping, each node v in the network initially
contains a message m_v, and the task is to
distribute each message m_v to all nodes in the
network.

The radio network abstraction captures the
features of distributed communication networks
with multi-access channels, with minimal
assumptions on the channel model and processors’
knowledge. Directed edges model unidirectional 
links, including situations in which one of two 
adjacent transmitters is more powerful than the 
other. In particular, there is no feedback mech- 
anism (see, for example, [6]). In some applica- 
tions, collisions may be difficult to distinguish 
from the noise that is normally present in the 
channel, justifying the need for protocols that 
do not depend on the reliability of the collision 
detection mechanism (see [3, 4]). Some network 
configurations are subject to frequent changes. 
In other networks, a network topology could be 
unstable or dynamic, for example, when mobile 
users are present. In such situations, algorithms 
that do not assume any specific topology are more 
desirable. 

More formally, a radio network is a directed
graph G = (V, E), where |V| = n denotes
the number of nodes in this graph. Individual
nodes in V are denoted by letters u, v, .... If
there is an edge from u to v, i.e., (u, v) ∈ E,
then we say that v is an out-neighbor of u and
u is an in-neighbor of v. Messages are denoted
by the letter m, possibly with indices. In particular,
the message originating from node v is denoted
by m_v. The whole set of initial messages is
M = {m_v : v ∈ V}. During the computation,
each node v holds a set of messages M_v that
have been received by v so far. Initially, each
node v does not possess any information apart
from M_v = {m_v}. Without loss of generality,
whenever a node is in the transmitting mode, one
can assume that it transmits the whole content
of M_v.

The time is divided into discrete time steps.
All nodes start simultaneously, have access to
a common clock, and work synchronously. A
gossiping algorithm is a protocol that for each
node u, given all past messages received by u,
specifies, for each time step t, whether u will
transmit a message at time t, and if so, it also
specifies the message. A message M transmitted
at time t from a node u is sent instantly to all its
out-neighbors. An out-neighbor v of u receives
M at time step t only if no collision occurred, that
is, if the other in-neighbors of v do not transmit
at time t at all. Further, collisions cannot be
distinguished from background noise. If v does
not receive any message at time t, it knows that
either none of its in-neighbors transmitted at time
t or that at least two did, but it does not know
which of these two events occurred. The running
time of a gossiping algorithm is the smallest
t such that for any network topology, and any
assignment of identifiers to the nodes, all nodes
receive messages originating in every other node
no later than at step t.
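
One synchronous step of this model can be simulated as in the sketch below. The function name and the dictionary-based graph representation are illustrative assumptions, and so is the convention that a node which transmits in a step does not listen in that step; the text above does not state it explicitly.

```python
def radio_round(in_neighbors, transmitting, knowledge):
    """Simulate one synchronous step of the radio model.

    in_neighbors: dict mapping each node v to the set of its in-neighbors.
    transmitting: set of nodes that transmit in this step.
    knowledge: dict mapping each node v to its message set M_v
        before the step.
    Returns a new dict with the message sets after the step.
    """
    updated = {v: set(msgs) for v, msgs in knowledge.items()}
    for v, senders in in_neighbors.items():
        if v in transmitting:
            continue  # assumption: a transmitting node does not listen
        active = [u for u in senders if u in transmitting]
        # v receives only if exactly one in-neighbor transmits; silence
        # and collision are indistinguishable, so both change nothing.
        if len(active) == 1:
            updated[v] |= knowledge[active[0]]
    return updated
```

With in-neighbors `{"a": set(), "b": {"a", "c"}, "c": set()}`, node `b` learns the message of `a` when only `a` transmits, and learns nothing when `a` and `c` transmit together.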

Limited Broadcast_v(k) Given an integer k
and a node v, the goal of limited broadcasting is
to deliver the message m_v (originating in v) to at
least k other nodes in the network.

Distributed Coupon Collection The set of 
network nodes V can be interpreted as a set of 
n bins and the set of messages M as a set of n 
coupons. Each coupon has at least k copies, each 
copy belonging to a different bin. M_v is the set of
coupons in bin v. Consider the following process.
At each step, one opens every bin at random,
independently, with probability 1/n. If no bin
is opened, or if two or more bins are opened, a
failure occurs and no coupons are collected. If
exactly one bin, say v, is opened, all coupons
from M_v are collected. The task is to establish
how many steps are needed to collect (a copy of) 
each coupon. 
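
The process is easy to simulate, as in the sketch below. The function names are hypothetical, and `step_bound` assumes the step count s = (4n/k) ln(n/δ), which is the form obtained from a union bound over the n coupons.

```python
import math
import random


def step_bound(n, k, delta):
    """Step count s = (4n/k) ln(n/delta), rounded up."""
    return math.ceil((4 * n / k) * math.log(n / delta))


def collect_coupons(bins, max_steps, rng=None):
    """Run the process; return the step at which all coupons are
    collected, or None if that does not happen within max_steps.

    bins: list of sets, bins[v] being the coupon set M_v.
    """
    rng = rng or random.Random()
    n = len(bins)
    wanted = set().union(*bins)
    collected = set()
    for step in range(1, max_steps + 1):
        # Every bin is opened independently with probability 1/n.
        opened = [v for v in range(n) if rng.random() < 1.0 / n]
        # Only a step with exactly one open bin collects anything.
        if len(opened) == 1:
            collected |= bins[opened[0]]
        if collected == wanted:
            return step
    return None
```

Running `collect_coupons` many times with `max_steps=step_bound(n, k, delta)` gives an empirical estimate of the failure probability to compare against δ.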


Key Results 


Theorem 1 ([1]) There exists a deterministic
O(k log^2 n)-time algorithm for limited
broadcasting from any node in radio networks with an
arbitrary topology.


Theorem 2 ([1]) Let δ be a given constant, 0 <
δ < 1, and s = (4n/k) ln(n/δ). After s steps
of the distributed coupon collection process, with
probability at least 1 − δ, all coupons will be
collected.


Theorem 3 ([1]) Let ε be a given constant,
where 0 < ε < 1. There exists a randomized
O(n log^3 n log(n/ε))-time Monte Carlo-type
algorithm that completes radio gossiping with
probability at least 1 − ε.


Theorem 4 ([1]) There exists a randomized Las
Vegas-type algorithm that completes radio
gossiping with expected running time O(n log^4 n).


Applications 


Further work on efficient randomized radio
gossiping includes the O(n log^3 n)-time algorithm by
Liu and Prabhakaran (see [5]), where the
deterministic procedure for limited broadcasting
is replaced by its O(k log n)-time randomized
counterpart. This bound was later reduced to
O(n log^2 n) by Czumaj and Rytter in [2], where
a new randomized limited broadcasting
procedure with an expected running time O(k) is
proposed.


Open Problems 


The exact complexity of randomized radio gos- 
siping remains an open problem. All three gossip- 
ing algorithms [1,2,5] are based on the concepts 
of limited broadcast and distributed coupon col- 
lection. The two improvements [2, 5] refer solely 
to limited broadcasting. Thus, very likely further 
reduction of the time complexity must coincide 
with more accurate analysis of the distributed 
coupon collection process or with development of 
a new gossiping procedure.


Problem Definition


The input to the problem is a connected
undirected graph G = (V, E) with a weight w(e) on
each edge e ∈ E. The goal is to find a spanning
tree of minimum weight, where for any subset of
edges E' ⊆ E, the weight of E' is defined to be

$$w(E') = \sum_{e \in E'} w(e).$$


If the graph G is not connected, the goal
of the problem is to find a minimum spanning 
forest, which is defined to be a minimum span- 
ning tree in each connected component of G. 
Both problems will be referred to as the MST 
problem. 

The randomized MST algorithm by Karger,
Klein, and Tarjan [9], which is considered here,
will be called the KKT algorithm. It will also be
assumed that the input graph G = (V, E) has n
vertices and m edges and that the edge weights
are distinct.

The MST problem has been studied
extensively prior to the KKT result, and several very
efficient algorithms are available from these
studies. All of these are deterministic
and are based on a method that greedily adds an
edge to a forest that is a subgraph of the minimum
spanning tree at all times. The early algorithms
in this class are already efficient, with a running
time of O(m log n). These include the algorithms
of Borůvka [1], Jarník [8] (later rediscovered by
Dijkstra and Prim [5]), and Kruskal [5].

The fastest algorithm known for MST
prior to the KKT algorithm runs in time
O(m log β(m, n)) [7], where β(m, n) =
min{i | log^(i) n ≤ m/n}; here log^(i) n is
defined as log n if i = 1 and as log log^(i−1) n
if i > 1. Although this running time is close to
linear, it is not linear time if the graph is very
sparse.
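
To get a feeling for how slowly β grows, the definition can be evaluated directly. The sketch below assumes logarithms to base 2 and m ≥ 1, n ≥ 2; the function name is hypothetical.

```python
import math


def beta(m, n):
    """Smallest i >= 1 such that the i-fold iterated log of n is <= m/n.

    Assumes base-2 logarithms, m >= 1 and n >= 2.
    """
    i = 1
    value = math.log2(n)
    # While the loop runs, value > m/n > 0, so the next log is defined.
    while value > m / n:
        value = math.log2(value)
        i += 1
    return i
```

For n = 1024 this gives β = 1 once m ≥ 10n, β = 2 for m = 4n, and β = 4 for m = n, which illustrates why the bound is not linear only for very sparse graphs.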

The problem of finding the minimum spanning 
tree efficiently is an important and fundamental 
problem in graph algorithms and combinatorial 
optimization. 


Background 
Some relevant background is summarized here. 


• The basic step in Borůvka’s algorithm [1] is
the Borůvka step, which picks the minimum-weight
edge incident on each vertex, adds
it to the minimum spanning tree, and then
contracts these edges. This step runs in linear
time and also very efficiently in parallel. It
is the backbone of the most efficient parallel
algorithms for minimum spanning tree and is
also used in the KKT algorithm.

• A related and simpler problem is that of
minimum spanning tree verification. Here, given
a spanning tree T of the input edge-weighted
graph, one needs to determine if T is its
minimum spanning tree. An algorithm that
solves this problem with a linear number of
edge-weight comparisons was shown by Komlós
[13], and later a deterministic linear-time
algorithm was given in [6] (see also [12] for a
simpler algorithm).
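
A single Borůvka step, as described in the first item above, can be sketched as follows. The edge representation as (weight, u, v) tuples and the use of a union-find structure for the contraction are choices made for this illustration; the sketch assumes distinct edge weights and does not attempt to achieve the linear-time bound.

```python
def boruvka_step(vertices, edges):
    """One Borůvka step on a graph with distinct edge weights.

    vertices: iterable of hashable vertex ids.
    edges: list of (weight, u, v) tuples with u != v.
    Returns (chosen, new_vertices, new_edges): the selected edges and
    the graph obtained by contracting them.
    """
    # Cheapest edge incident on each vertex.
    best = {}
    for e in edges:
        for x in (e[1], e[2]):
            if x not in best or e[0] < best[x][0]:
                best[x] = e
    chosen = set(best.values())

    # Contract the chosen edges with a small union-find structure.
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in chosen:
        parent[find(u)] = find(v)

    new_vertices = {find(x) for x in parent}
    # Relabel endpoints by component and drop self-loops.
    new_edges = [(w, find(u), find(v)) for w, u, v in edges
                 if find(u) != find(v)]
    return chosen, new_vertices, new_edges
```

On the path a-b-c-d with weights 1, 3, 2, the step selects the edges of weight 1 and 2 and leaves two contracted vertices joined by the edge of weight 3.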


Key Results 


The main result in [9] is a randomized algorithm 
for the minimum spanning tree problem that 
runs in expected linear time. The only opera- 
tions performed on the edge weights are pairwise 
comparisons. The algorithm does not assume any 
particular representation of the edge weights (i.e., 
integer or real values) and only assumes that any 
comparison between a pair of edge weights can 
be performed in unit time. The paper also shows
that the algorithm runs in O(m + n) time with the
exponentially high probability 1 − exp(−Ω(m))
and that its worst-case running time is O(n +
m log n).

The simple and elegant MST sampling lemma 
given in Lemma 1 below is the key tool used
to derive and analyze the KKT algorithm. This 
lemma needs a couple of definitions and facts: 


1. The well-known cycle property for the min- 
imum spanning tree states that the heaviest 
edge in any cycle in the input graph G cannot 
be in the minimum spanning tree. 

2. Let F be a forest of G (i.e., an acyclic 
subgraph of G). An edge e ∈ E is F-light
if F ∪ {e} either continues to be a forest of
G, or the heaviest edge in the cycle containing 
e is not e. An edge in G that is not F-light 
is F-heavy. Note that by the cycle property, 
an F-heavy edge cannot be in the minimum 
spanning tree of G, no matter what forest F 
is used. Given a forest F of G, the set of F- 
heavy edges can be determined in linear time 
by a simple modification to existing linear- 
time minimum spanning tree verification 
algorithms [6, 12]. 


Lemma 1 (MST Sampling Lemma) Let H =
(V, E_H) be formed from the input edge-weighted
graph G = (V, E) by including each edge with
probability p independent of the other edges. Let
F be the minimum spanning forest of H. Then,
the expected number of F-light edges in G is at
most n/p.


The KKT algorithm identifies edges in the 
minimum spanning tree of G only using Borůvka
steps. However, after every two Borůvka steps,
it removes F-heavy edges using the minimum 
spanning forest F of a subgraph obtained through 
sampling edges with probability p = 1/2. As 
mentioned earlier, these F-heavy edges can be 
identified in linear time. The minimum spanning 
forest of the sampled graph is computed recur- 
sively. 

The correctness of the KKT algorithm is
immediate: every F-heavy edge it removes
cannot be in the MST of G, since F is a forest
of G, and every edge it adds to the minimum
spanning tree is in the MST, since it is added
through a Borůvka step.
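
The overall structure of the algorithm can be sketched as below. This is an illustration under stated assumptions rather than the implementation of [9]: the function names are hypothetical, edge weights are assumed distinct, and the F-heavy edges are found by a simple quadratic-time path search instead of the linear-time verification algorithms, so the sketch does not achieve the linear running time.

```python
import random
from collections import defaultdict


def _boruvka_step(vertices, edges):
    """Select the cheapest edge at each vertex and contract the selection."""
    best = {}
    for e in edges:
        for x in (e[1], e[2]):
            if x not in best or e[0] < best[x][0]:
                best[x] = e
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = set(best.values())
    for e in chosen:
        parent[find(e[1])] = find(e[2])
    contracted = [(w, find(u), find(v), tag) for w, u, v, tag in edges
                  if find(u) != find(v)]
    return {e[3] for e in chosen}, {find(x) for x in parent}, contracted


def _f_heavy(forest, edges):
    """Edges strictly heavier than every edge on the forest path between
    their endpoints (quadratic time, for illustration only)."""
    adj = defaultdict(list)
    for w, u, v, _ in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def heaviest_on_path(src, dst):
        stack = [(src, None, float("-inf"))]
        while stack:
            node, prev, top = stack.pop()
            if node == dst:
                return top
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, max(top, w)))
        return None  # endpoints lie in different trees of the forest

    heavy = set()
    for e in edges:
        top = heaviest_on_path(e[1], e[2])
        if top is not None and e[0] > top:
            heavy.add(e)
    return heavy


def _kkt(vertices, edges, rng):
    msf = set()
    for _ in range(2):  # two Borůvka steps
        if not edges:
            return msf
        chosen, vertices, edges = _boruvka_step(vertices, edges)
        msf |= chosen
    if not edges:
        return msf
    # Sample each remaining edge with probability 1/2 and recurse.
    sample = [e for e in edges if rng.random() < 0.5]
    forest_tags = _kkt(vertices, sample, rng)
    forest = [e for e in edges if e[3] in forest_tags]
    # Discard F-heavy edges, then recurse on what is left.
    heavy = _f_heavy(forest, edges)
    light = [e for e in edges if e not in heavy]
    return msf | _kkt(vertices, light, rng)


def kkt_msf(edges, rng=None):
    """Minimum spanning forest of a list of (weight, u, v) edges."""
    rng = rng or random.Random()
    # Tag every edge with its original form so it survives contraction.
    tagged = [(w, u, v, (w, u, v)) for w, u, v in edges if u != v]
    vertices = {x for _, u, v, _ in tagged for x in (u, v)}
    return _kkt(vertices, tagged, rng)
```

The random choices affect only the running time: for any seed, `kkt_msf` returns the same edge set as a deterministic algorithm such as Kruskal's.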

The expected running time analysis as well 
as the exponentially high probability bound for 
the running time are surprisingly simple to derive 
using the MST Sampling Lemma (Lemma 1). 

In summary, the paper [9] proves the following
results.


Theorem 1 The KKT algorithm is a randomized 
algorithm that finds a minimum spanning tree of 
an edge-weighted undirected graph on n nodes 
and m edges in O(n + m) time with probability
at least 1 − exp(−Ω(m)). The expected running
time is O(n + m) and the worst-case running time
is O(n + m log n).

The model of computation used in [9] is the 
unit-cost RAM model since the known MST 
verification algorithms were for this model and 
not the more restrictive pointer machine model. 
More recently the MST verification result and 
hence the KKT algorithm have been shown to 
work on the pointer machine as well [2]. 

Lemma 1 is proved in [9] through a simulation
of Kruskal’s algorithm along with an analysis of 
the probability with which an F-light edge is 
not sampled. Another proof that uses a backward 
analysis is given in [3]. 


Randomized Minimum Spanning Tree 
Further Comments 


• Recently (and since the appearance of the
KKT algorithm in 1995), two new deterministic
algorithms for MST have appeared, due
to Chazelle [4] and Pettie and Ramachandran
[14]. The former [4] runs in O(n +
m α(m, n)) time, where α is an inverse of
Ackermann’s function, whose growth rate is
even smaller than that of the β function mentioned
earlier for the best result that was known prior
to the KKT algorithm [7]. The latter algorithm
[14] provably runs in time that is within a
constant factor of the decision-tree complexity
of the MST problem and hence is optimal;
its time bound is O(m + m α(m, n)) and
Ω(n + m), and the exact bound remains to be
determined.

• Although the KKT algorithm runs in expected
linear time (and with exponentially high
probability), it is not the last word on randomized
MST algorithms. A randomized MST algorithm
that runs in expected linear time and
uses only O(log* n) random bits is given in
[16, 17]. In contrast, the KKT algorithm uses
a linear number of random bits.


Applications 


The minimum spanning tree problem has a large
number of applications, which are discussed in
the entry on minimum spanning trees.


Open Problems 


Some open problems that remain are the follow- 
ing: 


1. Can randomness be removed in the KKT algo- 
rithm? A hybrid algorithm that uses the KKT 
algorithm within a modified version of the 
Pettie-Ramachandran algorithm [14] is given 
in [16, 17] that achieves expected linear time 
while reducing the number of random bits
used to only O(log* n). Can this tiny amount
of randomness be removed as well? If all
randomness can be removed from the KKT
algorithm, that will establish a linear time bound
for the Pettie-Ramachandran algorithm [14]
and also provide another optimal deterministic
MST algorithm, this one based on the KKT
approach.

2. Can randomness be removed from the work- 
optimal parallel algorithms [10] for MST? 
A linear-work, expected logarithmic-time
parallel MST algorithm for the EREW PRAM 
is given in [15]. This parallel algorithm 
is both work and time optimal. However, 
it uses a linear number of random bits. 
Another work-optimal parallel algorithm 
is given in [16, 17] that runs in expected 
polylog time using only polylog random bits. 
This leads to the following open questions 
regarding parallel algorithms for the MST 
problem: 

◦ To what extent can dependence on random
bits be reduced (from the current linear
bound) in a time- and work-optimal parallel
algorithm for MST?

◦ To what extent can the dependence on
random bits be reduced (from the current
polylog bound) in a work-optimal parallel
algorithm with reasonable parallelism (say
polylog parallel time)?


Experimental Results 
Katriel, Sanders, and Träff [11] performed an
experimental evaluation of the KKT algorithm 
and showed that it has good performance on 
moderately dense graphs.


Problem Definition


The work of Serna and Spirakis provides a
parallel approximation scheme for the Maximum
Flow problem. An approximation algorithm
provides a solution whose cost is within a given
factor of the optimal cost. The notation and definitions
are the standard ones for networks and flows (see 
for example [2, 7]). 

A network N = (G, s, t, c) is a structure
consisting of a directed graph G = (V, E), two
distinguished vertices s, t ∈ V (called the source
and the sink), and c : E → Z^+, an assignment
of an integer capacity to each edge in E. A flow
function f is an assignment of a non-negative
number to each edge of G (called the flow into
the edge) such that, first, at no edge does the
flow exceed the capacity and, second, for every
vertex except s and t, the sum of the flows on its
incoming edges equals the sum of the flows on
its outgoing edges. The total flow of a given flow
function f is defined as the net sum of flow into
the sink t. The Maximum Flow problem can be
stated as follows:


Name    Maximum Flow
Input   A network N = (G, s, t, c)
Output  Find a flow f for N for which the total
        flow is maximum.
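
The two conditions in the definition of a flow function translate directly into a checker. The sketch below is illustrative only; the function name and the dictionary-based representation of capacities and flows are assumptions.

```python
def total_flow(edges, capacity, flow, s, t):
    """Validate a flow function and return the net flow into the sink t.

    edges: list of (u, v) pairs; capacity and flow: dicts keyed by (u, v).
    Raises ValueError if a capacity or conservation constraint fails.
    """
    balance = {}
    for u, v in edges:
        f = flow.get((u, v), 0)
        # First condition: 0 <= f(e) <= c(e) on every edge.
        if f < 0 or f > capacity[(u, v)]:
            raise ValueError(f"capacity violated on edge {(u, v)}")
        balance[u] = balance.get(u, 0) - f
        balance[v] = balance.get(v, 0) + f
    # Second condition: inflow equals outflow everywhere except s and t.
    for x, b in balance.items():
        if x not in (s, t) and b != 0:
            raise ValueError(f"conservation violated at vertex {x}")
    return balance.get(t, 0)
```

For the network with edges s→a (capacity 2), a→t (capacity 1), and s→t (capacity 1), sending one unit along each edge is a legal flow of total value 2.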
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Maximum Flows and Matchings 

The Maximum Flow problem is closely related 
to the Maximum Matching problem on bipartite 
graphs. 

Given a graph G = (V, E), a set of edges
M ⊆ E is a matching if in the subgraph (V, M)
all vertices have degree at most one. A maximum 
matching for G is a matching with a maximum 
number of edges. For a graph G = (V, E) with 
weight w(e), the weight of a matching M is the 
sum of the weights of the edges in M. The 
problem can be stated as follows: 


Name    Maximum Weight Matching
Input   A graph G = (V, E) and a weight w(e)
        for each edge e ∈ E
Output  Find a matching of G with the
        maximum possible weight.


There is a standard reduction from the Maximum 
Matching problem for bipartite graphs to the 
Maximum Flow problem [7, 8]. In the general 
weighted case one has just to look at each edge
with capacity c > 1 as c edges joining the same
points, each with capacity one, and transform
the multigraph obtained as shown before. Notice
that to perform this transformation, the value c
must be polynomially bounded. The
whole procedure was introduced by Karp, Upfal,
and Wigderson [5], providing the following
result.


Theorem 1 The Maximum Matching problem 
for bipartite graphs is NC-equivalent to the
Maximum Flow problem on networks with polynomial
capacities. Therefore, the Maximum Flow with 
polynomial capacities problem belongs to the 
class RNC. 


Key Results 


The first contribution is an extension of
Theorem 1 to a generalization of the problem, namely
the Maximum Flow on networks with polynomially
bounded maximum flow. The proof is based
on the construction (in NC) of a second network
which has the same maximum flow but for which
the maximum flow and the maximum capacity in
the network are polynomially related.


Lemma 2 Let N = (G, s, t, c). Given any
integer k, there is an NC algorithm that decides
whether f(N) ≥ k or f(N) < km.


Since Lemma 2 applies even to numbers that are 
exponential in size, they get 


Lemma 3 Let N = (G, s, t, c) be a network.
There is an NC algorithm that computes an integer
value k such that 2^k ≤ f(N) < m 2^(k+1).


The following lemma establishes the NC- 
reduction from the Maximum Flow problem 
with polynomial maximum flow to the Maximum 
Flow problem with polynomial capacities. 


Lemma 4 Let N = (G, s, t, c) be a network.
There is an NC algorithm that constructs a second
network N_1 = (G, s, t, c_1) such that

$$\log(\mathrm{Max}(N_1)) \le \log(f(N_1)) + O(\log n)$$

and f(N) = f(N_1).


Lemma 4 shows that the Maximum Flow
problem restricted to networks with polynomially
bounded maximum flow is NC-reducible to the
Maximum Flow problem restricted to polynomially
bounded capacities. Since the latter problem is
a special case of the former one, the
following result follows.


Theorem 5 For each polynomial p, the problem 
of constructing a maximum flow in a network N 
such that f(N) ≤ p(n) is NC-equivalent to the
problem of constructing a maximum matching in 
a bipartite graph, and thus it is in RNC. 


Recall that [5] gave us an O(log^2 n) randomized
parallel time algorithm to compute a maximum 
matching. The combination of this with the re- 
duction from the Maximum Flow problem to the 
Maximum Matching leads to the following result. 


Theorem 6 There is a randomized parallel
algorithm to construct a maximum flow in a directed
network, such that the number of processors is
bounded by a polynomial in the number of
vertices and the time used is O((log n)^α log f(N))
for some constant α > 0.


The previous theorem is the first step towards
finding an approximate maximum flow in
a network N by an RNC algorithm. The
algorithm, given N and an ε > 0, outputs
a solution f' such that f(N)/f' ≤ 1 + 1/ε.
The algorithm uses a polynomial number of
processors (independent of ε) and parallel
time O(log^α n (log n + log ε)), where α is
independent of ε. Thus, the algorithm is an
RNC one as long as ε is at most polynomial
in n. (Actually ε can be O(n^(log^β n)) for some β.)
Thus, it is a fully RNC approximation scheme
(FRNCAS).

The second ingredient is a rough NC approxi- 
mation to the Maximum Flow problem. 


Lemma 7 Let N = (G, s, t, c) be a network. Let
k > 1 be an integer. Then there is an NC
algorithm to construct a network M = (G, s, t, c_1)
such that k f(M) ≤ f(N) ≤ k f(M) + km.


Putting all together and allowing randomization,
the algorithm can be sketched as follows:

FAST-FLOW(N = (G, s, t, c), ε)

1. Compute k such that 2^k ≤ F(N) < 2^(k+1) m.
2. Construct a network N_1 such that
   log(Max(N_1)) ≤ log(F(N_1)) + O(log n).
3. If 2^k ≤ (1 + ε)m, then F(N) < 2(1 + ε)m^2, so
   use the algorithm given in Theorem 6 to solve
   the Maximum Flow problem in N as a Maximum
   Matching and return.
4. Let β = ⌊2^k / ((1 + ε)m)⌋. Construct N_2
   from N_1 and β using the construction in
   Lemma 7.
5. Solve the Maximum Flow problem in N_2 as
   a Maximum Matching.
6. Output F' = β F(N_2) and, for all e ∈ E,
   f'(e) = β f(e), where f is the flow found in N_2.
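
The scaling idea behind FAST-FLOW can be tried out sequentially, as in the sketch below. It is not the RNC algorithm: a plain Edmonds-Karp routine stands in for the matching-based solver, step 2 is omitted, and two ingredients are assumptions made for the illustration. First, the estimate k of step 1 is taken from the bottleneck b of a widest s-t path, which satisfies b ≤ F(N) ≤ mb. Second, the construction of Lemma 7 is assumed to replace each capacity c(e) by ⌊c(e)/β⌋. Integer capacities and s ≠ t are assumed.

```python
import math
from collections import defaultdict, deque


def max_flow(edges, s, t):
    """Edmonds-Karp on a list of (u, v, capacity) edges; requires s != t."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, c in edges:
        residual[u][v] += c
        residual[v][u] += 0
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, r in residual[x].items():
                if r > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return value
        path = []
        y = t
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        push = min(residual[x][y] for x, y in path)
        for x, y in path:
            residual[x][y] -= push
            residual[y][x] += push
        value += push


def widest_path(edges, s, t):
    """Largest b such that t is reachable using edges of capacity >= b."""
    for b in sorted({c for _, _, c in edges if c > 0}, reverse=True):
        adj = defaultdict(list)
        for u, v, c in edges:
            if c >= b:
                adj[u].append(v)
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if t in seen:
            return b
    return 0


def fast_flow(edges, s, t, eps):
    """Value F' of an approximate flow with F(N)/F' <= 1 + 1/eps."""
    m = len(edges)
    b = widest_path(edges, s, t)
    if b == 0:
        return 0                      # t is unreachable from s
    k = b.bit_length() - 1            # 2^k <= b <= F(N) <= m*b
    beta = math.floor(2 ** k / ((1 + eps) * m))
    if beta < 1:
        return max_flow(edges, s, t)  # small flow: solve exactly
    scaled = [(u, v, c // beta) for u, v, c in edges]
    return beta * max_flow(scaled, s, t)
```

The guarantee follows from Lemma 7: F' ≥ F(N) − βm and βm ≤ 2^k/(1 + ε) ≤ F(N)/(1 + ε), hence F(N)/F' ≤ 1 + 1/ε.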


Theorem 8 Let N = (G, s, t, c) be a network.
Then, algorithm FAST-FLOW is an RNC
algorithm such that for all ε > 0 at most polynomial
in the number of network vertices, the algorithm
computes a legal flow of value f' such that

$$\frac{f(N)}{f'} \le 1 + \frac{1}{\varepsilon}.$$

Furthermore, the algorithm uses a polynomial
number of processors and runs in expected parallel
time O(log^α n (log n + log ε)), for some
constant α independent of ε.


Applications 


The rounding/scaling technique is used in general 
to deal with problems that are hard due to the 
presence of large weights in the problem instance. 
The technique modifies the problem instance in 
order to produce a second instance that has no 
large weights, and thus can be solved efficiently. 
The way in which a new instance is obtained 
consists of computing first an estimate of the 
optimal value (when needed) in order to discard 
unnecessarily high weights. Then the weights are
modified, scaling them down by an appropriate 
factor that depends on the estimation and the 
allowed error. The rounding factor is determined 
in such a way that the so-obtained instance can be 
solved efficiently. Finally, a last step consisting 
of scaling up the value of the “easy” instance 
solution is performed in order to meet the cor- 
responding accuracy requirements. 

It is known that in the sequential case, the
only way to construct an FPTAS uses
rounding/scaling and interval partition [6]. In general,
both techniques can be parallelized, although
sometimes the details of the parallelization are
non-trivial [1].

The Maximum Flow problem has a long 
history in Computer Science. Here are recorded 
some results about its parallel complexity. 
Goldschlager, Shaw, and Staples showed that the 
Maximum Flow problem is P-complete [3]. The 
P-completeness proof for Maximum Flow uses 
large capacities on the edges; in fact the values of 
some capacities are exponential in the number of 
network vertices. If the capacities are constrained
to be no greater than some polynomial in the
number of network vertices, the problem is in
RNC. In the case of planar networks it is known
that the Maximum Flow problem is in NC, even 
if arbitrary capacities are allowed [4]. 


Open Problems 


The parallel complexity of the Maximum Weight
Matching problem when the weights of the edges
are given in binary is still an open problem.
However, as mentioned earlier, there is a randomized
NC algorithm to solve the problem in O(log^2 n)
parallel steps when the weights of the edges
are given in unary. The scaling technique has
been used to obtain fully randomized NC
approximation schemes for the Maximum Flow and
Maximum Weight Matching problems (see [10]).
The result appears to be the best possible in
regard to full approximation, in the sense that the
existence of an FNCAS for any of the problems
considered is equivalent to the existence of an NC
algorithm for perfect matching, which is also still
an open problem.


Problem Definition


Randomized rounding is a technique for design- 
ing approximation algorithms for NP-hard op- 
timization problems. Many combinatorial opti- 
mization problems can be represented as 0-1 
integer linear programs; that is, integer linear 
programs in which variables take values in {0, 1}. 
While 0-1 integer linear programming is NP- 
hard, the rational relaxations (also referred to as 
fractional relaxations) of these linear programs 
are solvable in polynomial time [12, 13]. Ran- 
domized rounding is a technique to construct 
a provably good solution to a 0-1 integer lin- 
ear program from an optimum solution to its 
rational relaxation by means of a randomized 
algorithm.

Let Π be a 0-1 integer linear program with
variables x_i ∈ {0, 1}, 1 ≤ i ≤ n. Let Π_R be the
rational relaxation of Π obtained by replacing the
x_i ∈ {0, 1} constraints by x_i ∈ [0, 1], 1 ≤ i ≤ n.
The randomized rounding approach consists of
two phases:


1. Solve Π_R using an efficient linear program
solver. Let the variable x_i take on value
x_i* ∈ [0, 1], 1 ≤ i ≤ n.

2. Compute a solution to Π by setting the
variables x_i randomly to one or zero according to
the following rule:

Pr[x_i = 1] = x_i*  and  Pr[x_i = 0] = 1 − x_i*.
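
The second phase is short enough to write out. The sketch below assumes the basic rule in which each x_i is set to one independently with probability x_i*; the function name is hypothetical and the fractional solution is taken as given rather than computed by an LP solver.

```python
import random


def randomized_round(fractional, rng=None):
    """Round a fractional solution x* in [0, 1]^n to a 0-1 vector.

    Each coordinate becomes 1 independently with probability x_i*.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the test succeeds with
    # probability exactly x for every x in [0, 1].
    return [1 if rng.random() < x else 0 for x in fractional]
```

By linearity of expectation, every linear function of the rounded vector has the same expected value as under x*, which is what the Chernoff-Hoeffding analyses then sharpen into high-probability bounds.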


For several fundamental combinatorial
optimization problems, the randomized rounding
technique yields simple randomized approximation
algorithms whose solutions are provably close
to optimal. Variants of the basic approach
outlined above, in which the rounding of variable
x_i in the second phase is done with a
probability that is some appropriate function of x_i*,
have also been studied. The analyses of
algorithms based on randomized rounding often rely
on Chernoff—Hoeffding bounds from probability 
theory [5, 11]. 

The work of Raghavan and Thompson [14] 
introduced the technique of randomized round- 
ing for designing approximation algorithms for 
NP-hard optimization problems. The randomized 
rounding approach also implicitly proves the ex- 
istence of a solution with certain desirable prop- 
erties. In this sense, randomized rounding can be 
viewed as a variant of the probabilistic method, 
due to Erdős [1], which is widely used for various
existence proofs in combinatorics. 

Raghavan and Thompson illustrate the
randomized rounding approach using three
optimization problems: VLSI routing,
multicommodity flow, and k-matching in hypergraphs.


Definition 1 In the VLSI Routing problem, we
are given a two-dimensional rectilinear lattice
L_n over n nodes and a collection of m nets
{a_i : 1 ≤ i ≤ m}, where net a_i is a set of nodes
to be connected by means of a Steiner tree in
L_n. For each net a_i, we are also given a set A_i
of allowed trees that can be used for connecting
the nodes in that set. A solution to the problem is
a set T of trees {T_i ∈ A_i : 1 ≤ i ≤ m}. The width
of solution T is the maximum, over all edges
e, of the number of trees in T that contain the
edge. The goal of the VLSI routing problem is to
determine a solution with minimum width.


Definition 2 In the Multicommodity Flow
Congestion Minimization problem (or simply,
the Congestion Minimization problem), we are
given a graph G = (V, E) and a set of
source-destination pairs {(s_i, t_i) : 1 ≤ i ≤ k}. For each
pair (s_i, t_i), we would like to route one unit of
demand from s_i to t_i. A solution to the problem is
a set P = {P_i : 1 ≤ i ≤ k} such that P_i is a path
from s_i to t_i in G. We define the congestion of P to
be the maximum, over all edges e, of the number
of paths containing e. The goal of the Congestion
Minimization problem is to determine
a path set P with minimum congestion.


In their original work [14], Raghavan and 
Thompson studied the above problem for the 
case of undirected graphs and referred to it as 
the Undirected Multicommodity Flow problem. 
Here, we adopt the more commonly-used term 
of Congestion Minimization and consider both 
undirected and directed graphs since the results 
of [14] apply to both classes of graphs. Re- 
searchers have studied a number of variants of the 
multicommodity flow problem, which differ in 
various aspects of the problem such as the nature 
of demands (e.g., uniform vs. non-uniform), the 
objective function (e.g., the total flow vs. the 
maximum fraction of each demand), and edge 
capacities (e.g., uniform vs. non-uniform). 


Definition 3 In the Hypergraph Simple
k-Matching problem, we are given a hypergraph
H over an n-element vertex set V. A k-matching
of H is a set M of edges such that each vertex
in V belongs to at most k of the edges in M. A
k-matching M is simple if no edge in H occurs more
than once in M. The goal of the problem is to
determine a maximum-size simple k-matching of
a given hypergraph H.


Key Results


Raghavan and Thompson present approximation 
algorithms for the above three problems using 
randomized rounding. In each case, the algorithm 
is easy to present: write a 0-1 integer linear 
program for the problem, solve the rational relax- 
ation of this program, and then apply randomized 
rounding. They establish bounds on the quality of 
the solutions (i.e., the approximation ratios of the 
algorithm) using Chernoff—Hoeffding bounds on 
the tail of the sums of bounded and independent 
random variables [5, 11]. 

The VLSI Routing problem can be easily
expressed as a 0-1 integer linear program, say Π_1.
Let W* denote the width of the optimum solution
to the rational relaxation of Π_1.


Theorem 1 For any ε such that 0 < ε < 1, the
width of the solution produced by randomized
rounding does not exceed

$$W^* + \left(3 W^* \ln\frac{2n(n-1)}{\varepsilon}\right)^{1/2}$$

with probability at least 1 − ε, provided W* ≥
3 ln(2n(n − 1)/ε).


Since W* is a lower bound on the width of
an optimum solution to Π_1, it follows that the
randomized rounding algorithm has an approximation
ratio of 1 + o(1) with high probability as
long as W* is sufficiently large.
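
To see how quickly this guarantee approaches the optimum, the bound of Theorem 1 can be evaluated numerically. The following is a minimal sketch, assuming the bound has the form W* + [3 W* ln(2n(n − 1)/ε)]^{1/2}; the function names `width_bound` and `approximation_ratio` are illustrative and do not come from the source.

```python
import math


def width_bound(w_star: float, n: int, eps: float) -> float:
    """Upper bound on the rounded width, in the form stated by Theorem 1.

    w_star is the optimum width W* of the rational relaxation;
    n and eps (0 < eps < 1) are the quantities appearing in the theorem.
    """
    log_term = math.log(2 * n * (n - 1) / eps)
    if w_star <= 3 * log_term:
        # The theorem only applies when W* exceeds 3 ln(2n(n-1)/eps).
        raise ValueError("W* is too small for the bound to apply")
    return w_star + math.sqrt(3 * w_star * log_term)


def approximation_ratio(w_star: float, n: int, eps: float) -> float:
    """Bound divided by W*, which is a lower bound on the integral optimum."""
    return width_bound(w_star, n, eps) / w_star
```

For fixed n and ε, the ratio returned by `approximation_ratio` decreases toward 1 as W* grows, which is the 1 + o(1) behaviour described above.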

The Congestion Minimization problem can be
easily expressed as a 0-1 integer linear program,
say Π_2. Let C* denote the congestion of the
optimum solution to the linear relaxation of Π_2.
This optimum solution yields a set of flows, one
for each commodity i. The flow for commodity
i can be decomposed into a set Γ_i of at most
|E| paths from s_i to t_i. The randomized rounding
algorithm selects, for each commodity i, one path
P_i at random from Γ_i according to the flow values
determined by the flow decomposition.
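
The path-selection step can be sketched as follows. This is an illustration only: it assumes the LP has already been solved and decomposed, that each path is given as a tuple of vertices, and that all edges have unit capacity. The names `round_paths`, `congestion`, and the toy `decomposition` are hypothetical.

```python
import random
from collections import Counter


def round_paths(decomposition, rng=None):
    """Pick one path per commodity, with probability proportional to its flow."""
    rng = rng or random.Random()
    chosen = {}
    for commodity, weighted_paths in decomposition.items():
        paths = [path for path, _ in weighted_paths]
        flows = [flow for _, flow in weighted_paths]
        chosen[commodity] = rng.choices(paths, weights=flows, k=1)[0]
    return chosen


def congestion(chosen):
    """Largest number of chosen paths that share a (directed) edge."""
    load = Counter()
    for path in chosen.values():
        for edge in zip(path, path[1:]):
            load[edge] += 1
    return max(load.values()) if load else 0


# Toy fractional solution: two commodities from s to t, each split over two paths.
decomposition = {
    "c1": [(("s", "a", "t"), 0.5), (("s", "b", "t"), 0.5)],
    "c2": [(("s", "a", "t"), 0.25), (("s", "b", "t"), 0.75)],
}
chosen = round_paths(decomposition, random.Random(0))
print(chosen, congestion(chosen))
```

In this toy instance the rounded congestion is either 1 or 2, depending on whether the two commodities happen to pick the same path.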


Theorem 2 For any ε such that 0 < ε < 1, the
capacity of the solution produced by randomized
rounding does not exceed

$$C^* + \left[ 3 C^* \ln \frac{|E|}{\varepsilon} \right]^{1/2}$$

with probability at least 1 − ε, provided C* >
2 ln|E|.


Since C* is a lower bound on the congestion of an
optimum solution to Π_2, it follows that the
randomized rounding algorithm achieves a constant
approximation ratio with probability 1 − 1/n when
C* is Ω(log n).

For both the VLSI Routing and the Congestion
Minimization problems, slightly worse
approximation ratios are achieved if the lower
bound condition on W* and C*, respectively, is
removed. In particular, the approximation ratio
achieved is O(log n / log log n) with probability
at least 1 − n^{−c} for a constant c > 0 whose value
depends on the constant hidden in the big-Oh
notation.

The hypergraph k-matching problem is
different from the above two problems in that
it is a packing problem with a maximization
objective, while the other two are covering problems
with a minimization objective. Raghavan and
Thompson show that randomized rounding, in
conjunction with a scaling technique, yields good
approximation algorithms for the hypergraph
k-matching problem. They first express the
matching problem as a 0-1 integer linear
program, solve its rational relaxation Π_3, and
then round the optimum rational solution by
using appropriately scaled values of the variables
as probabilities. Let S* denote the value of the
optimum solution to Π_3.


Theorem 3 Let δ_1 and δ_2 be positive constants
such that δ_2 > n · e^{−k/6} and δ_1 + δ_2 < 1. Let
α = 3 ln(n/δ_2)/k and

$$S' = S^* \left( 1 + \frac{\alpha}{2} - \left( \alpha + \frac{\alpha^2}{4} \right)^{1/2} \right).$$

Then, there exists a simple k-matching for the
given hypergraph with size at least

$$S' - \left( 2 S' \ln \frac{1}{\delta_1} \right)^{1/2}.$$

Note that the above result is stated as an existence
result. It can be modified to yield a randomized
algorithm that achieves essentially the same
bound with probability 1 − ε for a given failure
probability ε.
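
The scaling step can be illustrated with a short sketch. It assumes that the fractional values are multiplied by the factor 1 + α/2 − (α + α²/4)^{1/2}, with α = 3 ln(n/δ_2)/k, before being used as rounding probabilities; this reading of the scaling, the dictionary representation of the fractional solution, and the names `scaling_factor` and `scaled_rounding` are assumptions made for illustration.

```python
import math
import random


def scaling_factor(n: int, k: int, delta2: float) -> float:
    """Factor by which fractional values are shrunk before rounding."""
    if not delta2 > n * math.exp(-k / 6):
        raise ValueError("requires delta2 > n * exp(-k/6)")
    alpha = 3 * math.log(n / delta2) / k
    return 1 + alpha / 2 - math.sqrt(alpha + alpha ** 2 / 4)


def scaled_rounding(x, scale, rng=None):
    """Keep each hyperedge independently with probability scale * x[edge]."""
    rng = rng or random.Random()
    return [edge for edge, value in x.items() if rng.random() < scale * value]
```

Writing the factor as 1 − ε, the value ε satisfies ε²/(1 − ε) = α, which is the relation one obtains when a Chernoff bound is used to keep every vertex within its limit of k edges.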


Applications 


Randomized rounding has found applications 
for a wide range of combinatorial optimization 
problems. Following the work of Raghavan 
and Thompson [14], Goemans and Williamson 
showed that randomized rounding yields 
an e/(e − 1)-approximation algorithm for
MAXSAT, the problem of finding an assignment 
that satisfies the maximum number of clauses 
of a given Boolean formula [7]. For the set 
cover problem, randomized rounding yields 
an algorithm with an asymptotically optimal 
approximation ratio of O(logn), where n is 
the number of elements in the given set cover 
instance [10]. Srinivasan has developed more 
sophisticated randomized rounding approaches 
for set cover and more general covering and 
packing problems [15]. Randomized rounding 
also yields good approximation algorithms for 
several flow and cut problems, including variants 
of undirected multicommodity flow [9] and the 
multiway cut problem [4]. 

While randomized rounding provides a unify- 
ing approach to obtain approximation algorithms 
for hard optimization problems, better approxi- 
mation algorithms have been designed for spe- 
cific problems. In some cases, randomized round- 
ing has been combined with other algorithms to 
yield better approximation ratios than previously 
known. For instance, Goemans and Williamson 
showed that the better of two solutions, one 
obtained by randomized rounding and the other 
obtained by an earlier algorithm due to Johnson, 
yields a 4/3 approximation for MAXSAT [7]. 

The work of Raghavan and Thompson applied 
randomized rounding to a solution obtained for 
the relaxation of a 0-1 integer program for a given 
problem. In recent years, more sophisticated
approximation algorithms have been obtained by
applying randomized rounding to semidefinite
program relaxations of the given problem.
Examples include the 0.87856-approximation
algorithm for MAXCUT due to Goemans and
Williamson [8] and an O(√(log n))-approximation
algorithm for the sparsest cut problem, due to
Arora, Rao, and Vazirani [3].

An excellent reference for the above and other 
applications of randomized rounding in approxi- 
mation algorithms is the text by Vazirani [16]. 


Open Problems 


While randomized rounding has yielded
improved approximation algorithms for a number 
of NP-hard optimization problems, the best 
approximation achievable by a polynomial-time 
algorithm is still open for most of the problems 
discussed in this article, including MAXSAT, 
MAXCUT, the sparsest cut, the multiway 
cut, and several variants of the congestion 
minimization problem. For directed graphs,
it has been shown that the best approximation
ratio achievable for congestion minimization
in polynomial time is Ω(log n / log log n), unless
NP ⊆ ZPTIME(n^{O(log log n)}), matching the upper
bound mentioned in section “Key Results” up
to constant factors [6]. For undirected graphs,
the best known inapproximability lower bound is
Ω(log log n / log log log n) [2].


Problem Definition


This problem deals with finding a point at an 
unknown position on one of a set of w rays which 
extend from a common point (the origin). In 
this problem there is a searcher, who starts at the 
origin, and follows a sequence of commands such 
as “explore to distance d on ray i.” The searcher
detects immediately when the target point is 
crossed, but there is no other information pro- 
vided from the search environment. The goal of 
the searcher is to minimize the distance traveled. 

There are several different ways this problem 
has been formulated in the literature, including 
one called the “cow-path problem” that involves 
a cow searching for a pasture down a set of paths. 
When w = 2, this problem is to search for a point 
on the line, which has also been described as a 
robot searching for a door in an infinite wall or 
a shipwreck survivor searching for a stream after 
washing ashore on a beach. 


Notation 

The problem is as described above, with w rays. 
The position of the target point (or goal) is
denoted (g, i) if it is at distance g on ray i ∈
{0, 1, ..., w − 1}. The standard notion of competitive
ratio is used when analyzing algorithms for
this problem: An algorithm that knows which ray 
the goal is on will simply travel distance g down 
that ray before stopping, so search algorithms are 
compared to this optimal, omniscient strategy. 

In particular, if R is a randomized algorithm, 
then the distance traveled to find a particular 
goal position is a random variable denoted 
distance (R,(g,i)), with expected value 
E [distance (R, (g,i))]. Algorithm R has 
competitive ratio c if there is a constant a such 
that, for all goal positions (g, i),

$$E[\mathrm{distance}(R, (g, i))] \le c \cdot g + a. \tag{1}$$


Key Results


This problem is solved optimally using a random- 
ized geometric sweep strategy: Search through 
the rays in a random (but fixed) order, with 
each search distance a constant factor longer than 
the preceding one. The initial search distance 
is picked from a carefully selected probability 
distribution, giving the following algorithm: 


RAYSEARCH_{r,w}

σ ← a random permutation of {0, 1, 2, ..., w − 1};
ε ← a random real uniformly chosen from [0, 1);
d ← r^ε;
p ← 0;
repeat
    Explore path σ(p) up to distance d;
    if goal not found then return to origin;
    d ← d · r;
    p ← (p + 1) mod w;
until goal found;
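
The sweep can be simulated directly. The sketch below is an assumption-laden illustration rather than the authors' code: the function name `ray_search` and its parameters are hypothetical, the goal position is passed in only so that the simulation can tell when the target is crossed, and an unsuccessful exploration to distance d is charged 2d (out and back).

```python
import random


def ray_search(r, w, goal_dist, goal_ray, rng=None):
    """Simulate the randomized geometric sweep; return the distance traveled."""
    rng = rng or random.Random()
    order = list(range(w))
    rng.shuffle(order)            # sigma: random but fixed order of the rays
    d = r ** rng.random()         # initial distance r^eps, eps uniform in [0, 1)
    p = 0
    traveled = 0.0
    while True:
        if order[p] == goal_ray and d >= goal_dist:
            return traveled + goal_dist   # target crossed on this ray
        traveled += 2 * d                 # explore to d, then return to origin
        d *= r
        p = (p + 1) % w
```

Averaging `ray_search(r, w, g, i) / g` over many runs for a distant goal gives an empirical estimate of the competitive ratio for the chosen r.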


The following theorems give the competitive 
ratio of this algorithm, show how to pick the best 
r, and establish the optimality of the algorithm. 
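
Before stating the theorems, it may help to see how the best r can be found numerically. The sketch below assumes the competitive ratio has the form R(r, w) = 1 + (2/w)(1 + r + ... + r^{w−1})/ln r and locates the stationary point of R by bisection; the names `competitive_ratio` and `optimal_r` and the search interval (1, 10] are choices made for this illustration.

```python
import math


def competitive_ratio(r: float, w: int) -> float:
    """R(r, w) = 1 + (2/w) * (1 + r + ... + r^(w-1)) / ln r."""
    geometric_sum = sum(r ** j for j in range(w))
    return 1 + (2 / w) * geometric_sum / math.log(r)


def optimal_r(w: int, tol: float = 1e-12) -> float:
    """Root of ln r * (r + 2r^2 + ... + (w-1) r^(w-1)) = 1 + r + ... + r^(w-1)."""
    if w < 2:
        # For w = 1 the ratio 1 + 2/ln r keeps decreasing, so no minimizer exists.
        raise ValueError("w must be at least 2")

    def g(r: float) -> float:
        s = sum(r ** j for j in range(w))
        t = sum(j * r ** j for j in range(1, w))
        return math.log(r) * t - s

    lo, hi = 1.0 + 1e-9, 10.0     # g(lo) < 0 < g(hi) for every w >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for w in range(2, 8):
    r_star = optimal_r(w)
    print(w, round(r_star, 5), round(competitive_ratio(r_star, w), 5))
```

The loop prints one row per value of w, which can be compared against tabulated values of the optimal expansion factor and competitive ratio.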


Theorem 1 ([9]) For any fixed r > 1, Algorithm
RAYSEARCH_{r,w} has competitive ratio

$$R(r, w) = 1 + \frac{2}{w} \cdot \frac{1 + r + r^2 + \cdots + r^{w-1}}{\ln r}.$$


Theorem 2 ([9]) The unique solution of the
equation

$$\ln r = \frac{1 + r + r^2 + \cdots + r^{w-1}}{r + 2r^2 + 3r^3 + \cdots + (w-1) r^{w-1}} \tag{2}$$

for r > 1, denoted by r*_w, gives the minimum
value for R(r, w).


Theorem 3 ([8, 9, 12]) The optimal competitive
ratio for any randomized algorithm for searching
on w rays is

$$\min_{r > 1} \left\{ 1 + \frac{2}{w} \cdot \frac{1 + r + r^2 + \cdots + r^{w-1}}{\ln r} \right\}.$$


Corollary 1 Algorithm RAYSEARCH_{r,w} with r = r*_w
is optimally competitive.


Using Theorem 2 and standard numerical
techniques, r*_w can be computed to any required
degree of precision. The following table shows,
for small values of w, approximate values for
r*_w and the corresponding optimal competitive
ratio (achieved by RAYSEARCH_{r,w} with r = r*_w);
the optimal deterministic competitive ratio
(see [1]) is also shown for comparison (Table 1):


Randomized Searching on Rays or the Line, Table 1

w    r*_w       Optimal randomized ratio    Optimal deterministic ratio
2    3.59112    4.59112                     9
3    2.01092    7.73232                     14.5
4    1.62193    10.84181                    19.96296
5    1.44827    13.94159                    25.41406
6    1.35020    17.03709                    30.85984
7    1.28726    20.13033                    36.30277


The asymptotic growth of the competitive ratio
with w is established in the following theorem.


Theorem 4 ([9]) The competitive ratio for algorithm
RAYSEARCH_{r,w} (with r = r*_w) is κw + o(w), where

$$\kappa = \min_{s > 0} \left[ 2 \, \frac{e^s - 1}{s^2} \right] \approx 3.088.$$


Applications


The most direct applications of this problem are
in geometric searching, such as robot navigation
problems. For example, when a robot is traveling
in an unknown area and encounters an obstacle, a
typical first step is to find the nearest corner to go
around [2, 3], which is just an instance of the ray
searching problem (with w = 2).

In addition, any abstract search problem with
a cost function that is linear in the distance to
the goal reduces to ray searching. This includes
applications in artificial intelligence that search
for a goal in a largely unknown search space
[11] and the construction of hybrid algorithms
[8]. In hybrid algorithms, a set of algorithms 
A_1, A_2, ..., A_w for solving a problem is considered:
algorithm A_1 is run for a certain amount of
time, and if the algorithm is not successful,
algorithm A_1 is stopped and algorithm A_2 is started,
repeating through all algorithms as many times 
as is necessary to find a solution. This notion of 
hybrid algorithms has been used successfully for 
several problems (such as the first competitive 
algorithm for the online k-server problem [4]), 
and the ray search algorithm gives the optimal 
strategy for selecting the trial running times of 
each algorithm. 


Open Problems 


Several natural extensions of this problem have 
been studied in both deterministic and random- 
ized settings, including ray searching when an 
upper bound on the distance to the goal is known 
(i.e., the rays are not infinite but are line seg- 
ments) [5, 10, 12], or when a probability distribu- 
tion of goal positions is known [7]. Other varia- 
tions of this basic searching problem have been 
studied for deterministic algorithms only, such 
as when the searcher’s control is imperfect (so 
distances cannot be specified precisely) [6] and 
for more general search spaces like points in the 
plane [1]. A thorough study of these variants with 
randomized algorithms remains an open problem.


Problem Definition


We use the abstract tile assembly model of 
Winfree [6], which models the aggregation of 
monomers called tiles that attach one at a time
to a growing structure, starting from a single 
seed tile, in which bonds (“glues”) on the tile 
are specific (glues only stick to glues of the 
same type on other tiles) and cooperative (so 
that multiple weak glues are necessary to attach 
a tile). The general idea of randomized self- 
assembly is to use the inherent randomness of 
self-assembly to help the assembly process. If 
multiple types of tiles are able to bind to a 
single binding site, then we assume that their 
relative concentrations determine the probability 
that each succeeds. With careful design, we 
can use the same tile set to create different 
structures, by changing the concentrations to 
affect what is likely to assemble. Another 
use of randomness is in reducing the number 
of different tile types required to assemble a 
shape. 


Definitions 

A shape is a finite, connected subset of Z². A tile
type is a unit square with four sides, each side
consisting of a glue label (finite string) and a
nonnegative integer strength. We assume a finite set
T of tile types, but an infinite number of copies of
each tile type, each copy referred to as a tile. An
assembly is a positioning of tiles on the integer
lattice Z², i.e., a partial function α : Z² ⇢ T.
Write α ⊑ β to denote that α is a subassembly
of β, which means that dom α ⊆ dom β and
α(p) = β(p) for all points p ∈ dom α. In
this case, say that β is a superassembly of α.
Two adjacent tiles in an assembly interact if the
glue labels on their abutting sides are equal and
have positive strength. Each assembly induces
a binding graph, a grid graph whose vertices
are tiles, with an edge between two tiles if they
interact. The assembly is τ-stable if every cut of
its binding graph has strength at least τ, where
the weight of an edge is the strength of the glue
it represents (energy τ is required to separate the
assembly). The τ-frontier ∂^τ α ⊆ Z² \ dom α of
α (or frontier ∂α when τ is clear from context) is
the set of empty locations adjacent to α at which
a single tile could bind stably.

A tile system is a triple 𝒯 = (T, σ, τ), where
T is a finite set of tile types, σ : Z² ⇢ T is
a seed assembly consisting of a single tile (i.e.,
|dom σ| = 1), and τ ∈ N is the temperature.
An assembly α is producible if either α = σ
or if β is a producible assembly and α can be
obtained from β by the stable binding of a single
tile. In this case, write β →_1 α (α is producible
from β by the attachment of one tile), and write
β → α if β →*_1 α (α is producible from β
by the attachment of zero or more tiles). If α is
producible, then there is an assembly sequence
ᾱ = (α_i | 1 ≤ i ≤ k) such that α_1 = σ, α_k = α,
and, for each i ∈ {1, ..., k − 1}, α_i →_1 α_{i+1}.
An assembly is terminal if no tile can be τ-stably
attached to it. Write A[𝒯] to denote the set of all
producible assemblies of 𝒯, and write A_□[𝒯] to
denote the set of all producible, terminal assemblies
of 𝒯. We also speak of shapes assembled by
tile assembly systems, by which we mean dom α
if α ∈ A_□[𝒯], and we consider shapes to be
equivalent up to translation.

We now define the semantics of incorporating
randomization into self-assembly. Intuitively,
there are two sources of nondeterminism in the
model as defined: (1) if |∂α| > 1, then there are
multiple binding sites, one of which is
nondeterministically selected as the next site to receive
a tile, and (2) if multiple tile types could bind
to a single binding site, then one of them is
nondeterministically selected. Both concepts are
handled by assigning positive real-valued
concentrations to each tile type; Ref. [3] gives a
full definition that accounts for both of these.
However, in the results we discuss, only the
latter source of nondeterminism will actually
affect the probabilities of various terminal
assemblies being produced; the binding sites
themselves can be picked in an arbitrary order
without affecting these probabilities. Thus we state
here a simpler definition based on this assumption.

A tile concentration assignment on T is a
function ρ : T → [0, ∞). If ρ(t) is not specified
explicitly for some t ∈ T, then ρ(t) = 1. If
α is a τ-stable assembly such that t_1, ..., t_j ∈
T are the tile types capable of binding to the same
position m ∈ ∂α, then for 1 ≤ i ≤ j, t_i binds
at position m with probability
ρ(t_i) / (ρ(t_1) + ... + ρ(t_j)). ρ
induces a probability measure on A_□[𝒯] in a
straightforward way. Formally, let α ∈ A_□[𝒯]
be a producible terminal assembly. Let A(α) be
the set of all assembly sequences ᾱ = (α_i |
1 ≤ i ≤ k) such that α_k = α, with p_{ᾱ,i}
denoting the probability of attachment of the tile
added to α_{i−1} to produce α_i (noting that p_{ᾱ,i} =
1 if the ith tile attached without contention).
Then

$$\Pr[\alpha] = \sum_{\bar{\alpha} \in A(\alpha)} \prod_{i=2}^{k} p_{\bar{\alpha}, i}.$$

Write 𝒯(ρ)
to denote the random variable representing the
producible, terminal assembly produced by 𝒯
when using tile concentration assignment ρ.
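
The competition rule at a single binding site can be written out as a short sketch. It assumes tile types are identified by string names and that the concentration assignment is a dictionary in which missing entries default to 1, as in the definition above; the function name `binding_probabilities` is hypothetical.

```python
def binding_probabilities(competitors, concentrations):
    """Probability that each competing tile type wins a contested binding site.

    competitors: names of the tile types able to bind at the site.
    concentrations: mapping from tile type name to concentration;
    unspecified tile types get concentration 1.
    """
    weights = {t: concentrations.get(t, 1.0) for t in competitors}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one competitor needs positive concentration")
    return {t: weight / total for t, weight in weights.items()}


# Two tile types competing for the same site with concentrations 3 and 1.
print(binding_probabilities(["G", "S"], {"G": 3.0, "S": 1.0}))
```

The probability of a whole assembly sequence is then the product of these per-step values, with uncontested attachments contributing a factor of 1.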


Problems 
The general problem is this: given a shape X ⊆
Z² (a connected, finite set), set the concentrations
of tile types in some tile system 𝒯 so that 𝒯 is
likely to create a terminal assembly with shape
X or “close to it.” We now state formal problems
that are variations on this theme. The first
four problems use “concentration programming”:
varying the concentrations of tile types in a single
tile system 𝒯 to get it to assemble different
shapes. The last two problems concern a tile 
system that only does one thing — assemble a line 
of a desired expected length — because in this 
setting we will require all concentrations to be 
equal. However, the tile system uses randomized 
self-assembly to do this with far fewer tile types 
than are needed to accomplish the same task in a 
deterministic tile system. 

The first three problems concern the self- 
assembly of squares, and the problems are listed
in order of increasing difficulty. The first asks
for a square with a desired expected width, the 
second for a guarantee that the actual width is 
likely to be close to the expected width, and 
finally, for a guarantee that the actual width is 
likely to be exactly the expected width. 

Formally, design a tile system 𝒯 = (T, σ, τ)
such that, for any n ∈ Z⁺, there exists a tile
concentration assignment ρ : T → [0, ∞) such
that...


Problem 1 ... dom 𝒯(ρ) is a square with expected
width n.


Problem 2 ... with probability at least 1 − δ,
dom 𝒯(ρ) is a square whose width is between
(1 − ε)n and (1 + ε)n.


Problem 3 ... with probability at least 1 − δ,
dom 𝒯(ρ) is a square of width n.


The next problem generalizes the previous
problems to arbitrary shapes, while making
one relaxation: allowing a scaled-up version
of a shape to be assembled instead of the
exact shape. Formally, for c ∈ Z⁺ and shape
S ⊆ Z² (finite and connected), define S^c =
{ (x, y) ∈ Z² | (⌊x/c⌋, ⌊y/c⌋) ∈ S } to be S
scaled by factor c.
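
As a concrete illustration of this definition, the sketch below builds the scaled shape by replacing each point of S with a c × c block. The representation of a shape as a set of integer pairs and the function name `scale_shape` are assumptions of this example.

```python
def scale_shape(shape, c):
    """Return the set of (x, y) with (floor(x / c), floor(y / c)) in shape."""
    if c < 1:
        raise ValueError("scale factor must be a positive integer")
    return {
        (c * x + dx, c * y + dy)
        for (x, y) in shape
        for dx in range(c)
        for dy in range(c)
    }


# An L-shaped tromino scaled by factor 2 becomes three 2 x 2 blocks.
tromino = {(0, 0), (1, 0), (0, 1)}
print(sorted(scale_shape(tromino, 2)))
```

Each point (x, y) of the original shape contributes exactly the points whose coordinates floor-divide back to (x, y), so the scaled shape has c² times as many points.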


Problem 4 Let δ > 0. Design a tile system 𝒯 =
(T, σ, τ) such that, for any shape S ⊆ Z², there
exists a tile concentration assignment ρ : T →
[0, ∞) and c ∈ Z⁺ so that, with probability at
least 1 − δ, dom 𝒯(ρ) is S^c.


It is easy to see that for a deterministic tile 
system to assemble a length n, height 1 line 
requires n tile types. The next problem concerns 
using randomization to reduce the number of 
tile types required, subject to the constraint that 
all tile type concentrations are equal. (Without 
this constraint, a solution to Problem 1 would
trivially be a solution to the next problem, with
optimal O(1) tile types, but since the solution to
Problem 1 uses different tile type concentrations
to achieve its goal, it cannot be used directly for 
this purpose.) 


Problem 5 Let n ∈ Z⁺. Design a tile system
𝒯 = (T, σ, τ) such that, with tile concentration
assignment ρ : T → [0, ∞) defined by ρ(t) = 1
for all t ∈ T, dom 𝒯(ρ) is a height 1 line of
expected length n.


As with the case of concentration program- 
ming, it is desirable for the line to have length 
likely to be close to its expected length. 


Problem 6 Let n ∈ Z⁺ and δ, ε > 0. Design
a tile system 𝒯 = (T, σ, τ) such that, with
tile concentration assignment ρ : T → [0, ∞)
defined by ρ(t) = 1 for all t ∈ T, dom 𝒯(ρ) is
a height 1 line whose length is between (1 − ε)n
and (1 + ε)n with probability at least 1 − δ.


Key Results 


The solutions to Problems 1-4 use temperature 2
tile systems. The solutions to Problems 5 and 6
use a temperature 1 tile system (there is no need
for cooperative binding in one dimension).
Figure 1 shows a simple tile system with three
tile types that can grow a line of any desired
expected length to the right of the seed tile; this is
the basis for the solutions to Problems 1-4. The
length of the line has a geometric distribution,
with expected value controlled by the ratio of the
concentrations of G and S. Figure 2 shows the
solution to Problem 1, due to Becker, Remila, and
Rapaport [1]. It is essentially the tile system from
Fig. 1 (tile types A and B are analogous to G and
S in Fig. 1) augmented with a constant number
of extra tiles that can assemble the square to be as
high as the line is long.


Randomized Self-Assembly, Fig. 1 A randomized
temperature τ = 2 tile system that can grow a line of
any desired expected length l by setting p = 1/l. Two
tiles compete nondeterministically to bind to the right of
the line (using strength 2 glues, indicated by double black
lines), one of which stops the growth, while the other
continues, giving the length of the line (not counting the
seed) a geometric distribution with expected value l.
(Figure labels: seed tile; tile G with concentration 1 − p;
tile S with concentration p; expected length l = 1/p.)

Kao and Schweller [4] showed a solution to 
Problem 2, and Doty [3] improved their construc- 
tion to show a solution to Problem 3. Here, we 
describe only the latter construction, since the 
two share similar ideas, and the latter construc- 
tion solves both problems. 

Figure 3 shows an improvement to the tile sys- 
tem of Fig. 1, which will be the starting point for 
the solution. It also can grow a line of any desired 
expected length. However, by using multiple in- 
dependent “stages” of growth, each stage having 
a geometric distribution, the resulting assembly 
is more likely to have a length that is close to its 
expected length. More tile types are needed for 
more stages, but only a constant number of stages 
are required. 

In particular, if the expected length is chosen
to be midway between any two consecutive powers
of two, i.e., midway in the interval [2^{a−1}, 2^a)
for arbitrary a ∈ N, with r = 113 stages, the
probability is at most 0.0025 that the actual length
is outside the interval [2^{a−1}, 2^a). So although the
length is not controlled with exact precision, the
number of bits needed to represent the length is
controlled with exact precision (with high
probability), using a constant number of tile types.
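
The effect of using several stages can be explored with a small simulation. This is a sketch under simplifying assumptions: each stage is modeled as an independent geometric random variable with mean 1/p (the stop tile is counted, the seed is not), and the names `line_length`, `relative_std`, and `fraction_outside` are hypothetical.

```python
import math
import random


def line_length(p, stages, rng):
    """Total length of `stages` consecutive geometric stages with stop probability p."""
    total = 0
    for _ in range(stages):
        total += 1                    # the tile that ends this stage
        while rng.random() >= p:      # each earlier tile is a "continue" tile
            total += 1
    return total


def relative_std(p, stages):
    """Standard deviation divided by the mean stages/p of the total length."""
    return math.sqrt((1 - p) / stages)


def fraction_outside(a, stages, trials, rng):
    """Empirical probability that the length falls outside [2^(a-1), 2^a)."""
    low, high = 2 ** (a - 1), 2 ** a
    p = stages / ((low + high) / 2)   # makes the expected length the midpoint
    misses = 0
    for _ in range(trials):
        if not (low <= line_length(p, stages, rng) < high):
            misses += 1
    return misses / trials


print(relative_std(0.1, 1), relative_std(0.1, 113))
print(fraction_outside(10, 113, 300, random.Random(3)))
```

With a single stage the relative standard deviation is close to 1, while with 113 stages it drops below 0.1, which is why the total length rarely leaves the target interval.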

Figure 4 shows a tile system 𝒯 with the following
property: for any bit string s (equivalently,
any natural number m if we assume the most
significant bit of s is 1), there is a tile concentration
assignment that causes 𝒯 to grow an assembly of
height O(log m), width O(m²), such that the tile
types in the upper-right corner of the assembly 
encode s. The bottom row is the tile system from 
Fig. 3, with identical strength 2 glues on the north 
of the tiles (other than the final stop tile on the 
right). 

Randomized Self-Assembly, Fig. 2 A tile system that
grows a square of any desired expected width (Figure
taken from [4]); strength 2 glues are indicated by two lines
between the tiles. The seed is labeled S, and C_A and C_B
respectively represent the concentrations of A and B. p
is used the same way as in Fig. 1, and c represents total
concentration of all other tile types, since [4] assumed that
concentrations of all tile types must sum to 1.
(Figure labels: C_A = (1 − c)(1 − p); C_B = (1 − c)p.)


Randomized Self-Assembly, Fig. 3 A tile system that
grows a line of a given length with greater precision
than in Fig. 1. r stages each have expected length 1/p,
making the expected total length r/p, but more tightly
concentrated about that expected length than in the case
of one stage. (Figure labels: in each stage, one tile with
concentration 1 − p and one with concentration p;
expected length r/p.)


Randomized Self-Assembly, Fig. 4 Computing the binary
string 10 (equivalently, the natural number m =
2) from tile concentrations. For brevity, glue strengths
and labels are not shown. Each column increments the
primary counter, represented by the bits on the left of each
tile, and each gray tile increments the sampling counter,
represented by the bits on the right of each tile. The
number of bits at the end is l + k, where c is a constant
coded into the tile set and k depends on m, and l = k + c.
The most significant k bits of the sampling counter encode
m. In this example, k = 2 and c = 1. (Figure labels: seed;
sampling row; sampling tiles; m in binary; most significant
k bits; signal to stop growth at next power of 2;
concentrations of G's and S's ensure the stop tile almost
certainly is placed within the indicated interval.)


Randomized Self-Assembly, Fig. 5 High-level
overview of the entire construction solving Problem 3,
not at all to scale. For brevity, glue strengths and labels
are not shown. The double counter number estimator of
Fig. 4 is embedded with two additional counters to create
a quadruple counter estimating m_1, m_2, and m_3, shown
as a box labeled as “Fig. 4” in the above figure. In this
example, m_1 = 4, m_2 = 3, and m_3 = 15, represented
vertically in binary in the most significant 4 tiles at the
end of the quadruple counter. Concatenating the bits of
the tiles results in the string 001101011011, the binary
representation of 859, which equals n − 2k − 4 for
n = 871, so this example builds an 871 × 871 square.
Once the counter ends, c tiles (c = 3 in this example)
are shifted off the bottom, and the top half of the tiles
are isolated (k = 4 in this example). Each remaining
tile represents 3 bits of n, which are converted into octal
digits, rotated to face upwards, and then used to initialize
a base-8 counter that builds the east wall of the square.
Filler tiles cover the remaining area of the square.
(Figure labels: quadruple counter estimating m_1, m_2, and
m_3; shift off c least significant bits; isolate most
significant half of bits by sliding markers in from each
side until they meet; convert n to octal; base-8 counter
counts down from n to 0, alternating left-moving
“decrement by two” rows and right-moving “copy” rows to
detect underflow and stop at 0; filler tiles fill in square.
Since the width of the structure to the right is O(n^{2/3}),
space is left over for sufficiently large n, and it is
filled in by the filler tiles emanating from the east wall
of the square.)


Figure 5 shows a high-level overview of
the entire tile system that assembles an n × n
square, solving Problem 3. Using similar ideas
to Fig. 4, one can encode three different numbers
m_1, m_2, m_3 ∈ N into the tile concentrations.
We choose these numbers to be such that
each m_i = O(n^{1/3}), and each of their binary
expansions, interwoven into a single bit string,
is the binary expansion of n. Then each tile at
the upper right of Fig. 4 encodes not one but
3 bits of n, or equivalently each encodes an octal
digit of n. These bits are then used to assemble
a counter that counts from n down to 0 as it
grows north, and a constant set of tiles (similar to
Fig. 2) expand this counter to grow about as far
east as the counter grows north, creating an n × n
square that surrounds the assembly of Fig. 4.
Since m_i = O(n^{1/3}), and the tiles of Fig. 4
create a structure of height O(log m_i) and width
O(m_i²) = O(n^{2/3}), the square is sufficiently
large to contain the tiles of Fig. 4.

Finally, the tiles of Fig. 4 are used in a different
way to solve Problem 4, shown in Fig. 6. Given a
finite shape S, Soloveichik and Winfree [5] use
an intricate construction of a “seed block” that
“unpacks,” from a set of tile types that depend
on S, a single-tape Turing machine program
π ∈ {0,1}* that outputs a binary string bin(S)
representing a list of the coordinates of S.

The width of the seed block is then c, chosen
to be large enough to do the unpacking and also
large enough to accommodate the simulation of
π by a tile set that simulates single-tape Turing
machines. Once this seed block is in place, a tile
set then assembles the scaled shape by carrying
bin(S) through each block. The order in which
blocks are assembled is determined by a spanning
tree of S, so that any blocks with an ancestor
relationship have a dependency, in that the
ancestor must be (mostly) assembled before the
descendant, whereas blocks without an ancestor
relationship can potentially assemble in parallel.


Randomized Self-Assembly, Fig. 6 On the left is the
seed block used to replace the seed block of [5], from
which the construction of [5] can assemble a scaled
version of the shape S (encoded by a binary string
representing the list of coordinates, also labeled “S” in the figure).
S is output by the single-tape Turing machine program
π. π is estimated from tile concentrations as in Fig. 4,
then four copies of it are propagated to each side of the
block, where it is executed in four rotated, but otherwise
identical, computation regions. When completed, four
copies of the binary representation of S border the seed
block, which is sufficient for the construction of [5] to
assemble a scaled version of S using a spanning tree of
S as shown on the right. (Figure labels: double counter
estimator; S.)

We replace the seed block tiles of [5], which
depend on S, with a single tile system that
produces the program π from tile concentrations,
and use the remainder of the tile set of [5]
unchanged. This is illustrated in Fig. 6. Choose
c to be sufficiently large that π can be simulated
within the trapezoidal region of the c × c
block of Fig. 6 and also sufficiently large
that the construction of Fig. 4 has sufficient
room to estimate the binary string π from
tile concentrations in the center region (the
“double counter estimator”) of Fig. 6. Once this
is done, the construction of [5] can take over
and assemble the entire scaled shape S^c. The
portion of the construction of [5] that achieves
this is a constant-size tile set, so the total number
of tile types, combined with the presented
construction, remains constant. This
solves Problem 4.


Randomized Self-Assembly, Fig. 7 Example of solution to Problem 5 for the case of expected length 92

Finally, Problems 5 and 6 have solutions due
to Chandran, Gopalkrishnan, and Reif [2], which
we now explain intuitively (the actual analysis
is a bit trickier but is close to the following
intuitive argument). Figure 7 shows an example
of a solution to Problem 5 for the case of expected
length n = 92. Each T_{iB} tile type has an east
glue, g_i, that matches two tile types, T_{(i-1)A}
and R_{(i-1)A}. There are O(log n) “stages” (five
stages in this case). In each stage, the assembly
either decrements the stage or resets back to
the highest stage, each with probability 1/2. The number n is programmed
into the system by choosing each stage to have
either 1 or 2 tiles. Given that we are in stage
i, to make it from stage i to stage 1 without
resetting means that i consecutive unbiased coin
flips must come up “heads,” which we expect
to take 2^i flips before happening. Thus we expect
stage i to appear 2^i times; this means that
stage i’s expected contribution to the total length
is either 2^i or 2 · 2^i, depending on whether
it has 1 or 2 tiles. The reason this works to
encode arbitrary natural numbers n is that every
natural number can be expressed as n =
Σ_i b_i 2^i, where b_i ∈ {1, 2}. Since there are
a constant number of tile types per stage, this
implies that the number of tile types required is
O(log n).
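The digit expansion used in this argument can be computed greedily. The following is a minimal sketch, not the construction of [2] itself: the function names are hypothetical, and indexing the digits from i = 0 is an assumption made for illustration (the stage numbering in the actual tile system may differ). It covers every positive integer n.

```python
def bijective_binary_digits(n: int) -> list[int]:
    """Return digits b_0, b_1, ... with each b_i in {1, 2} and n == sum(b_i * 2**i)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    digits = []
    while n > 0:
        # Pick the digit with the same parity as n, so that n - digit is even.
        digit = 2 if n % 2 == 0 else 1
        digits.append(digit)
        n = (n - digit) // 2
    return digits


def encoded_value(digits: list[int]) -> int:
    """Sum of b_i * 2**i, i.e., the total expected length contributed by all stages."""
    return sum(b * 2**i for i, b in enumerate(digits))
```

The number of digits grows logarithmically in n, which is consistent with the O(log n) bound on the number of stages.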


This solves Problem 5. To solve Problem 6, it
suffices to concatenate k independent assemblies
of the kind shown in Fig. 7, where k is a constant
chosen sufficiently large based on δ (the
desired error probability). Because k is a constant,
this increases the number of tile types required
by only a constant factor. In
addition to proving that this works, Chandran,
Gopalkrishnan, and Reif [2] also show a more
complex construction with even sharper bounds
on the probability that the length differs very
much from its expected value.


Open Problems 


The construction resolving Problem 3 shows that
for every δ, ε > 0, a tile set exists such that, for
every n ∈ N, appropriately programming the tile
concentrations results in the self-assembly of a
structure of size O(n^ε) × O(log n) whose rightmost
tiles represent the value n with probability at
least 1 − δ. (In the tile system described, ε = 2/3,
and it could be made arbitrarily close to 0 by
estimating more than 3 numbers at once.) Is this
optimal?

Formally, say that a tile assembly system 𝒯 =
(T, σ, 2) is δ-concentration programmable (for
δ > 0) if there is a (total) computable function
r : 𝒜_□[T] → N (the representation function)
such that, for each n ∈ N, there is a
tile concentration assignment ρ : T → [0, ∞)
such that Pr[r(𝒯(ρ)) = n] ≥ 1 − δ. In other
words, 𝒯, programmed with concentrations ρ,
almost certainly self-assembles a structure that
“represents” n, according to the representation
function r, and such a ρ can be found to create
a high-probability representation of any natural
number.


Question 1 Is the following statement true? For
each δ > 0, there is a tile assembly system 𝒯 and
a representation function r : 𝒜_□[T] → N such
that 𝒯 is δ-concentration programmable and, for
each ε > 0 and all but finitely many n ∈ N,
Pr[|dom 𝒯(ρ)| < n^ε] ≥ 1 − δ. If so, what is the
smallest bound that can be written in place of n^ε?


Problem Definition


Generally speaking, data structures come in two 
types: those that represent data and those that 
allow efficient searching. The collection of results 
in the area of range searching belongs to the latter 
type. We distinguish two computation phases: 
during the preprocessing phase, data is stored in 
some suitable structure, so that during the query 
phase, all data that lies inside a query range can 
be found and reported efficiently. 

In the most basic form of range searching, 
the data consists of points in a one-, two-, or 
higher-dimensional space, and the query range 
is a simple shape like a rectangle, triangle, or 
circle. Even for this basic form, there are many 
different data structures and corresponding query 
algorithms. 


Problem 1 (Range Searching) 


INPUT: Set P of n points in R^d.

OUTPUT: Description of a data structure
storing P and query algorithm that will
report, for any given query d-rectangle
(d-simplex, d-sphere) q, all points of P that lie
inside q.


When the data is not a set of points but 
a set of more complex objects, such as line 
segments, triangles, circles, or other geometric 
shapes, we may not be interested in only the 
objects that lie completely within a query 
range but also all objects that intersect the 
range. 


Problem 1 (Intersection Searching) 


INPUT: Set S of n non-crossing line segments in
R^2 (triangles in R^3).

OUTPUT: Description of a data structure storing
S and query algorithm that will report,
for any given query line segment q, all line
segments (triangles) of S that intersect q.


For both range searching and intersection 
searching, we may be interested in different types 
of queries. In a counting query, we report the 
number of objects in P or S that lie in the range 
or intersect the query object. A reporting query 
must spend time at least linear in the number 
of objects reported, whereas a counting query 
returns a single value. Usually, small variations
of a data structure for reporting can be used for
the counting version.

Ray shooting is closely related to intersection 
searching. We are not interested in all objects 
intersected by a line segment but only the first 
along a directed ray. 


Problem 2 (Ray Shooting) 


INPUT: Set S of n non-crossing line segments in
R^2 (triangles in R^3).

OUTPUT: Description of a data structure storing
S and query algorithm that will report,
for any given query point q and direction in
R^2 (R^3), the first line segment (triangle) of
S that is reached when q moves in the query
direction.


A combination of a data structure and a query 
algorithm forms a solution to a range-searching 
problem. The most important aspects of effi- 
ciency are the storage requirements of the data 
structure and the query time. Sometimes, prepro- 
cessing time and update time are also important. 
If the data structure is so large that it must be 
stored on background storage, I/O complexity 
becomes relevant. 

We can distinguish solutions with guaranteed 
efficiency and heuristics. The heuristic solutions 
used in practice nearly always have linear size but 
often have no guaranteed worst-case query time 
bounds. For example, R-trees [14] are among the 
most used data structures for range searching in 
practice. 

One of the most interesting practical
approaches for range searching with provable
bounds is approximate range searching.


Problem 3 (Approximate Range Searching)


INPUT: Set P of n points in R^d.

OUTPUT: Description of a data structure storing
P and query algorithm that will report, for
any given query d-polyhedron q of constant
complexity, all points of P that lie inside q,
possibly but not necessarily some points that
lie within distance ε from q, and no points that
lie farther than ε from q.


In this entry we concentrate on algorithmic 
results that have provable worst-case bounds for 
both the storage requirements and the query time. 


Key Results 


Orthogonal Range Searching 

Range searching in one dimension is just search- 
ing in a sorted sequence of values. Standard 
binary search trees for one-dimensional searching 
can be extended in several ways to allow rect- 
angular range-searching queries. For example, a 
kd-tree [4] is a balanced binary tree on a set of
points in R^d that splits the point set on different
coordinates in different nodes: the root splits on
x_1-coordinate, its two children on x_2-coordinate,
their four children on x_3-coordinate, and so on;
after the splitting on the x_d-coordinate, the tree
starts over by splitting on x_1-coordinate again. As
soon as there is a single point left, it is stored in a
leaf.
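The kd-tree described above, together with a rectangular range query, can be sketched as follows. This is an illustrative sketch rather than the implementation of [4]: the class and function names are hypothetical, points are assumed to be tuples of equal length d, and the build step simply re-sorts at every level (so it is not the fastest possible construction).

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KdNode:
    """Internal nodes hold a splitting axis and value; leaves hold one point."""

    def __init__(self, point: Optional[Point] = None, axis: int = 0,
                 split: float = 0.0, left: "Optional[KdNode]" = None,
                 right: "Optional[KdNode]" = None) -> None:
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    if not points:
        return None
    if len(points) == 1:
        return KdNode(point=points[0])          # a single point is stored in a leaf
    axis = depth % len(points[0])               # cycle through x_1, ..., x_d
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2                   # median split keeps the tree balanced
    return KdNode(axis=axis, split=pts[mid][axis],
                  left=build_kdtree(pts[:mid + 1], depth + 1),
                  right=build_kdtree(pts[mid + 1:], depth + 1))


def range_query(node: Optional[KdNode], lo: Point, hi: Point,
                out: List[Point]) -> None:
    """Append to `out` all stored points p with lo[j] <= p[j] <= hi[j] for all j."""
    if node is None:
        return
    if node.point is not None:                  # leaf: test the point itself
        if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
            out.append(node.point)
        return
    # The left subtree holds coordinates <= split, the right subtree >= split.
    if lo[node.axis] <= node.split:
        range_query(node.left, lo, hi, out)
    if hi[node.axis] >= node.split:
        range_query(node.right, lo, hi, out)
```

A query only descends into a subtree whose side of the splitting value overlaps the query box, which is what keeps the number of visited nodes sublinear on balanced trees.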

An (orthogonal) range tree [5, 10, 15] uses
associated structures, a technique that has proved
to be very powerful for solving various kinds
of query problems. It refers to the fact that the
structure has a main tree, and each internal node
v of the tree stores, besides two pointers to
children, an extra pointer to a different data
structure. Suppose that in the main tree, node v
is root of a subtree storing a subset S_v of the
whole set S. Then the associated structure of v
also stores S_v, but in a different manner.


A range tree for a set S of points in d-space
consists of a main tree that is a balanced binary
search tree on x_d-coordinate. The leaves of the
main tree store the points of S sorted on
x_d-coordinate. If d > 1, then each
internal node v stores a pointer to a (d − 1)-dimensional
range tree that stores the points of S_v, restricted to
their first d − 1 coordinates.

The performance of kd-trees and range trees
is given in Table 1. To achieve the stated query
time for range trees, an additional technique
called fractional cascading is needed [7]. The
table also shows that in special cases, like
2-dimensional range queries in which one side
of the query rectangle is unbounded (a 3-sided
range), better results can be obtained using
priority search trees. Other small improvements
can be obtained, also depending on the machine
model.


Simplex Range Searching 

The range-searching problem with d-simplices is 
considerably harder than when the query shape 
is an axis-aligned d-box. There are two types 
of solutions: solutions with near-linear-size data 
structures and solutions with near-logarithmic 
query time. The results for d-simplex and d-half-space
searching are given in Table 2.

Between the extremes of space-efficient data
structures and query-efficient data structures,
many other results “in between” can be obtained.
For example, if for a problem in the plane
one knows that a linear number of triangle
range queries are needed, then one can use a
data structure of size O(n^{4/3}) and query time
close to O(n^{1/3} + k), because this balances
the preprocessing time (roughly the same as
the size) and total query time (without the
time for reporting) to something close to
O(n^{4/3}).


A (1/r)-cutting of a set H of n hyperplanes is
a set Ξ of (relatively open) disjoint simplices
covering R^d so that each simplex intersects at
most n/r hyperplanes of H. Cutting trees are
based on this concept. We state the main result on
cuttings as a theorem, because it has implications
for multidimensional divide-and-conquer schemes
as well.


Theorem 1 ([6]) Let H be a set of n hyperplanes
and r ≤ n a parameter. Set k = ⌈log_2 r⌉.
There exist k cuttings Ξ_1, ..., Ξ_k so that
Ξ_i is a (1/2^i)-cutting of size O(2^{id}), each
simplex of Ξ_i is contained in a simplex of
Ξ_{i−1}, and each simplex of Ξ_{i−1} contains a
constant number of simplices of Ξ_i. Moreover,
Ξ_1, ..., Ξ_k can be computed in time
O(n r^{d−1}).


Data structures for range searching with
curved boundaries can be obtained by linearization
techniques. For example, range searching
with a d-ball can be done by mapping each
point (x_1, ..., x_d) from the set to a point
(x_1, ..., x_d, x_1^2 + ··· + x_d^2) in R^{d+1} and storing
these points in a (d + 1)-dimensional half-space
range query structure. A d-ball with center
(b_1, ..., b_d) and radius r is mapped to the half-space
x_{d+1} ≤ b_1(2x_1 − b_1) + ··· + b_d(2x_d − b_d) + r^2,
and now the mapped points inside
the mapped half-space correspond exactly to the
original points inside the d-ball.
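The correspondence follows by expanding the squared distance: the condition (x_1 − b_1)^2 + ··· + (x_d − b_d)^2 ≤ r^2 rearranges to x_1^2 + ··· + x_d^2 ≤ b_1(2x_1 − b_1) + ··· + b_d(2x_d − b_d) + r^2. A small sketch that checks the two membership tests against each other (the function names are hypothetical and chosen only for illustration):

```python
def lift(p):
    """Map (x_1, ..., x_d) to (x_1, ..., x_d, x_1^2 + ... + x_d^2)."""
    return tuple(p) + (sum(x * x for x in p),)


def in_ball(p, center, r):
    """Direct test: is p within distance r of center?"""
    return sum((x - b) ** 2 for x, b in zip(p, center)) <= r * r


def in_lifted_halfspace(q, center, r):
    """Half-space test on a lifted point q = lift(p) in d + 1 dimensions."""
    *x, last = q
    return last <= sum(b * (2 * xi - b) for xi, b in zip(x, center)) + r * r
```

With integer coordinates the two tests agree exactly; with floating-point coordinates, points very close to the boundary may be classified differently because of rounding.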

Intersection searching and ray shooting data 
structures are often based on the technique of as- 
sociated structures mentioned before. Depending 
on the type of stored objects and the type of query 
objects (or query rays), different main trees and 
associated structures are combined into efficient 
solutions. 


Approximate Range Searching 

Many of the given data structures are not very
useful in practice, especially in higher dimensions.
One of the more interesting approaches toward
a practical data structure for range searching
that has performance guarantees is the approximate
approach. The idea is that the query range is
considered a shape with an inner boundary and a
buffer zone around it. All points inside the inner
boundary must be reported, all points outside the
inner boundary but inside the buffer zone may
but need not be reported, and all points outside
the buffer zone must not be reported. A query
will specify the inner boundary and a distance to
the inner boundary that is the width of the buffer
zone.

Assuming that the inner boundary has constant
complexity and the buffer zone has width ε · D,
where D is the diameter of the inner boundary
(ε is any positive constant), an approximate range
query can be answered in O(log n + 1/ε^d) time,
where d is the dimension of the space [3]. When
the inner boundary is convex, the query time can
be improved slightly.


Range Searching, Table 1 Results on orthogonal range searching. n is the number of points stored and k is the number
of points reported

Query range         Storage               Query time               Reference
d-box               O(n)                  O(n^{1−1/d} + k)         kd-tree [4]
d-box               O(n log^{d−1} n)      O(log^{d−1} n + k)       Range tree [10]
3-sided rectangle   O(n)                  O(log n + k)             Priority search tree [13]


Range Searching, Table 2 Results on simplex and half-space range searching. n is the number of points stored, k is
the number of points reported, c is some constant, and ε > 0 is an arbitrarily small constant

Query range         Storage                    Query time                           Reference
d-simplex           O(n)                       O(n^{1−1/d} + k)                     Partition trees [11]
d-simplex           O(n^{d+ε})                 O(log n + k)                         Cutting trees [8]
d-half-space        O(n log log n)             O(n^{1−1/⌊d/2⌋} log^c n + k)         [12]
d-half-space        O(n^{⌊d/2⌋} log^c n)       O(log n + k)                         [2, 12]


Cross-References


I/O-Model
Point Location
R-Trees


Recommended Reading


1. Agarwal PK (2004) Range searching. In: Goodman
JE, O’Rourke J (eds) Handbook of discrete and computational
geometry, chapter 36, 2nd edn. Chapman
& Hall/CRC, Boca Raton

2. Agarwal PK, Erickson J (1998) Geometric range
searching and its relatives. In: Chazelle B, Goodman
J, Pollack R (eds) Advances in discrete and computational
geometry. American Mathematical Society,
Providence, pp 1-56

3. Arya S, Mount DM (2000) Approximate range
searching. Comput Geom 17(3-4):135-152

4. Bentley JL (1975) Multidimensional binary search
trees used for associative searching. Commun ACM
18(9):509-517

5. Bentley JL (1980) Multidimensional divide-and-conquer.
Commun ACM 23(4):214-229

6. Chazelle B (1993) Cutting hyperplanes for divide-and-conquer.
Discret Comput Geom 9:145-158

7. Chazelle B, Guibas LJ (1986) Fractional cascading: I
and II. Algorithmica 1(2):133-191

8. Chazelle B, Sharir M, Welzl E (1992) Quasi-optimal
upper bounds for simplex range searching
and new zone theorems. Algorithmica 8(5&6):407-429

9. de Berg M, Cheong O, van Kreveld M, Overmars M
(2008) Computational geometry: algorithms and
applications, 3rd edn. Springer, Berlin

10. Lueker GS (1978) A data structure for orthogonal
range queries. In: The annual symposium of the
foundations of computer science (FOCS), Ann Arbor.
IEEE Computer Society, pp 28-34

11. Matousek J (1992) Efficient partition trees. Discret
Comput Geom 8:315-334

12. Matousek J (1992) Reporting points in halfspaces.
Comput Geom 2:169-186

13. McCreight EM (1985) Priority search trees. SIAM J
Comput 14(2):257-276

14. Samet H (2006) Foundations of multidimensional
and metric data structures. Morgan Kaufmann, San
Francisco

15. Willard DE (1979) The super-b-tree algorithm. Report
TR-03-79, Aiken Computer Laboratory, Harvard
University, Cambridge




Rank and Select Operations on Bit 
Strings 


Rajeev Raman 
Department of Computer Science, University of 
Leicester, Leicester, UK 


Keywords 


Bit vectors; Predecessor search; Sets; Succinct 
data structures 


Years and Authors of Summarized 
Original Work 


1974; Elias 

1989; Jacobson 

1998; Clark 

2007; Raman, Raman, Rao 

2008; Patrascu 

2014; Golynski, Orlandi, Raman, Rao 


Problem Definition 


Given a static bit string b = b_1 ... b_m, the
objective is to preprocess b and to create a
space-efficient data structure that supports the following
operations rapidly:


rank_1(i) takes an index i as input, 1 ≤ i ≤ m,
and returns the number of 1s among b_1 ... b_i.

select_1(i) takes an index i ≥ 1 as input and
returns the position of the i-th 1 in b, and −1
if i is greater than the number of 1s in b.


A data structure that supports the operations
above will be called a bit vector. The operations
rank_0 and select_0 are defined analogously for
the 0s in b. As rank_0(i) = i − rank_1(i), one
considers just rank_1 (abbreviated to rank) and
refers to select_0 and select_1 collectively as
select. In what follows, |x| denotes the length
of a bit string x and w(x) denotes the number of
1s in it. b is always used to denote the input bit
string, m to denote |b| and n to denote w(b).


Memory Usage Models

In terms of space usage, we aim not only to 
store b in the minimum amount of space but 
also to minimize any additional space (called the 
redundancy) needed to support rank and select. 
The notion of redundancy can be formalized in 
two different ways. 

In the succinct index model (also known as the 
systematic model), the bit vector does not have 
direct access to b, but can obtain O(log m) con- 
secutive bits of b in O(1) time. During prepro- 
cessing, one can create additional data structures 
(called succinct indices) to allow rapid rank and 
select queries. Indices allow the representation 
of b to be decoupled from the auxiliary data 
structure, e.g., b can be stored (in a potentially 
highly compressed form) in a data structure such 
as that of [6]. The redundancy in the succinct 
index model is the space usage of the index. 

In the unrestricted model, we give a “space 
budget” for storing b, based upon some com- 
pressibility measure (the data structure is usually 
designed to target a particular measure). We now 
give some examples of space budgets: 


• The obvious space budget for b is m bits, and
this is used if b is believed to be incompressible.

• Recalling that n = w(b), we define the
space budget B(m, n) = ⌈log_2 (m choose n)⌉, which is
the information-theoretic minimum number of
bits to store a bit string of length m with n 1s.
Using standard approximations of the factorial
function, one can show [17] that B(m, n) =
n log_2(m/n) + n log_2 e + O(n^2/m). In particular,
if n = o(m), then B(m, n) = o(m).

• Yet another space budget is obtained from
the k-th-order empirical entropy, denoted by
H_k(b). For any bit string s, define #(s) as the
number of (possibly overlapping) contiguous
occurrences of the bit string s in b. Then, for
any k ≥ 0,

    H_k(b) = -\frac{1}{m} \sum_{s \in \{0,1\}^k} \left( \#(s0) \log_2 \frac{\#(s0)}{\#(s)} + \#(s1) \log_2 \frac{\#(s1)}{\#(s)} \right)    (1)

(take log_2(0/0) = 0 log_2 0 = 0 and #(s) = m
when s is the empty string). H_k(b) gives the
information content per bit in b, when conditioned
upon the previous k bits as context.
The space budget is therefore mH_k(b). Note
that mH_0(b) ≈ B(m, n), but even H_1(b) can
be much smaller than H_0(b), and in general
H_{k+1} ≤ H_k. For example, if b = (01)^{m/2},
then mH_0(b) ≈ m but mH_1(b) vanishes.
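To make the definition of the k-th-order empirical entropy concrete, the following sketch evaluates it directly from occurrence counts: H_k(b) is −1/m times the sum, over all contexts s of length k, of #(s0) log_2(#(s0)/#(s)) + #(s1) log_2(#(s1)/#(s)). The function names are hypothetical, bit strings are assumed to be Python strings over the characters 0 and 1, and the code is a brute-force illustration that is only sensible for small k.

```python
from itertools import product
from math import log2


def occurrences(b: str, s: str) -> int:
    """Number of (possibly overlapping) occurrences of s in b; |b| for empty s."""
    if s == "":
        return len(b)
    return sum(1 for i in range(len(b) - len(s) + 1) if b[i:i + len(s)] == s)


def empirical_entropy(b: str, k: int) -> float:
    """k-th-order empirical entropy of b, in bits per symbol."""
    total = 0.0
    for context in product("01", repeat=k):
        s = "".join(context)
        count_s = occurrences(b, s)
        for bit in "01":
            count_sb = occurrences(b, s + bit)
            if count_sb > 0:          # terms with a zero count contribute 0
                total += count_sb * log2(count_sb / count_s)
    return -total / len(b)
```

For b = (01)^{m/2}, this gives H_0(b) = 1, so mH_0(b) = m, while mH_1(b) stays bounded by a small constant as m grows.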


The redundancy in the unrestricted model is the 
difference between the space usage of the data 
structure and the space budget. 


Models of Computation 

Three models of computation are commonly con- 
sidered. One is the word RAM model with word 
size O(log m) bits [13]. The other models, which 
are particularly useful for proving lower bounds, 
are the cell probe and bit probe models. In the cell 
probe model, the time complexity of answering 
a query is the worst-case number of words of 
O(logm) consecutive bits of the data structure 
that are read by the algorithm to answer that 
query. All other computation is “free.” The bit 
probe model is similar, except that we only count 
the number of bits of the data structure that are 
read when answering a query. Clearly, O(log m) 
bit probes can be more useful than reading O(1) 
consecutive words, so O(logm) bit probes are 
more powerful than O(1) cell probes. Also, O(1) 
cell probes are more powerful than O(1) time on 
the word RAM, since computation on values read 
into registers is for free in the cell probe model. 
Thus, an O(t) upper bound in the word RAM is
stronger than an O(t) upper bound in the cell probe
model, which is stronger than an O(t log m)
upper bound in the bit probe model. For lower
bounds, the situation is of course reversed, with 
cell probe lower bounds being stronger than 
equivalent word RAM lower bounds. 


Key Results 


Relation to Predecessor Search 
Given a static set S ⊆ {0, ..., m − 1}, |S| = n,
the predecessor search problem is to preprocess
S to answer the query pred(x, S) = max{y ∈
S | y < x}. The predecessor search can easily be
solved using a bit vector: we simply create a bit
string b that is the characteristic vector of S, and
note that (i) |b| = m, (ii) w(b) = n = |S|, and
(iii) pred(x, S) = select_1(rank_1(x)).
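The identity pred(x, S) = select_1(rank_1(x)) can be checked with naive linear-time versions of rank and select. This is a sketch under an assumed indexing convention, not a succinct data structure: bit b_j (1-indexed) is set exactly when the element j − 1 belongs to S, so the position returned by select_1 is shifted by one to recover the element. All function names are hypothetical.

```python
def rank1(b: str, i: int) -> int:
    """Number of 1s among b_1 ... b_i (positions are 1-indexed)."""
    return b[:i].count("1")


def select1(b: str, i: int) -> int:
    """1-indexed position of the i-th 1 in b, or -1 if b has fewer than i 1s."""
    seen = 0
    for pos, bit in enumerate(b, start=1):
        if bit == "1":
            seen += 1
            if seen == i:
                return pos
    return -1


def characteristic_vector(S, m: int) -> str:
    """Bit string of length m whose j-th bit (1-indexed) is 1 iff j - 1 is in S."""
    return "".join("1" if y in S else "0" for y in range(m))


def pred(x: int, b: str):
    """Largest y in S with y < x, or None if there is no such element."""
    r = rank1(b, x)            # counts the elements 0, ..., x - 1 that lie in S
    if r == 0:
        return None
    return select1(b, r) - 1   # convert the 1-indexed position back to an element
```

Replacing the two naive routines with constant-time rank and select structures leaves the body of pred unchanged.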

Clearly, if we are interested in highly space- 
efficient solutions, space usages of significantly 
more than O(nlogm) bits are not of interest, 
since any bit string b can be represented as a 
set using O(nlogm) bits by enumerating the 
positions of its 1s. However, this close connection 
of the bit vector problem to the predecessor 
search problem means that lower bounds for the 
predecessor search problem also apply to the 
bit vector problem. In particular, if rank should
take O(1) time and the space should be at most
O(n log m) bits, then this is only possible if n =
m/(log m)^{O(1)} [19]. Since constant-time rank
(and select) is taken by the succinct data structure
community to be a “standard” expectation,
this lower bound means that we only consider
moderately sparse bit strings b in this entry.


Reductions 

It has been already noted that rank_0 and rank_1
reduce to each other and that operations on sets
reduce to select operations on a bit string. Some
other reductions, whereby one can support operations
on b by performing operations on bit strings
derived from b, are:


Theorem 1 (a) rank reduces to select_0 on a bit
string c such that |c| = m + n and w(c) = n.
(b) If b has no consecutive 1s, then select_1 on b
can be reduced to rank on a bit string c such
that |c| = m − n and w(c) is either n − 1 or n.
(c) From b, one can derive two bit strings b_0 and
b_1 such that |b_0| = m − n, |b_1| = n,
w(b_0), w(b_1) ≤ min{m − n, n}, and select_0
and select_1 on b can be supported by supporting
select_1 and rank on b_0 and b_1.


Parts (a) and (b) follow from Elias’s observations
on multiset representations, specialized to
sets. For part (a), create c from b by adding a 0
after every 1. For example, if b = 01100100, then
c = 01010001000. Then, rank_1(i) on b equals
select_0(i) − i on c. For part (b), essentially invert
the mapping of part (a). Part (c) is shown in [3].
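Part (a) can be verified directly: every position of b contributes exactly one 0 to c, so the i-th 0 of c sits at position i plus the number of 1s among b_1 ... b_i. A minimal sketch with naive rank and select (hypothetical function names, 1-indexed positions):

```python
def rank1(b: str, i: int) -> int:
    """Number of 1s among b_1 ... b_i."""
    return b[:i].count("1")


def select0(c: str, i: int) -> int:
    """1-indexed position of the i-th 0 in c, or -1 if c has fewer than i 0s."""
    seen = 0
    for pos, bit in enumerate(c, start=1):
        if bit == "0":
            seen += 1
            if seen == i:
                return pos
    return -1


def add_zero_after_every_one(b: str) -> str:
    """Build c from b by writing 10 for every 1 and 0 for every 0."""
    return "".join("10" if bit == "1" else "0" for bit in b)


def rank1_via_select0(c: str, i: int) -> int:
    """rank_1(i) on b, computed only from select_0 on c."""
    return select0(c, i) - i
```

On the example from the text, `add_zero_after_every_one("01100100")` yields `01010001000`, which has length m + n = 11 and contains n = 3 ones.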


Succinct Indices for Bit Vectors 


The following is known about the sizes of succinct
indices for bit vectors:


Theorem 2 ([11]) Given a bit string b with
|b| = m, w(b) = n, and m/n = (log m)^{O(1)},
there is an index of size

    O(m' log(n/m'))                 if n = ω(m'),
    O(n(1 + max{0, log(m'/n)}))     if n = O(m')

bits, where m' = m/log m, that supports rank,
select_0, and select_1 in O(1) time. This index
size is optimal for any data structure that makes
O(log m) bit probes to b.


This result generalizes an earlier result by
Golynski, who showed that the index size must
be Θ(m log log m / log m) bits for O(1) time
operations [9]. The bound of Theorem 2 is
asymptotically the same when n is relatively
close to m, e.g., when n = Ω(m/(log m)^{1/2}),
but is smaller thereafter, e.g., for n =
O(m/(log m)^2) the index size implied by
Theorem 2 is O(m log log m/(log m)^2) bits, which
is an O(log m) factor better than that given by [9].

Elias [5] previously gave an o(m)-bit index
that supported select in O(log m) bit probes on
average (where the average was computed across
all select queries). Jacobson [14] gave o(m)-bit
indices that supported rank and select in
O(log m) bit probes in the worst case. Clark and
Munro [2] gave the first o(m)-bit indices that
support both rank and select in O(1) time on the
RAM.


Bit Vectors in the Unrestricted Model

In the unrestricted model, the best redundancy,
if one is targeting the B(m, n) space budget, is
given by the following result due to Pătraşcu:


Theorem 3 ([18]) A bit string b with |b| =
m and w(b) = n can be represented using
B(m, n) + m/((log m)/t)^t + m^{3/4}(log m)^{O(1)}
bits of memory, supporting rank and select
queries in O(t) time.


Earlier results, with a significantly higher redundancy,
were given by [17, 21]. Thus, for t =
O(1), the redundancy is m/(log m)^{O(1)}. There is
an almost matching lower bound:


Theorem 4 ([20]) Any representation of a bit
string b with |b| = m and w(b) = n that answers
rank or select queries in O(t) time on the cell
probe model must use B(m, n) + m/(log m)^{O(t)} bits
of memory.

The case where we aim for higher-order entropy 
appears to be less well studied. The best-known 
result is as follows: 


Theorem 5 ([10]) A bit string b with |b| =
m can be represented using mH_k(b) +
O(mk/log m) bits of memory, supporting rank
and select queries in O(1) time, for any k ≥ 1.


Applications 


Bit vectors are fundamental building blocks in 
a huge number of space-efficient data structures, 
in real-world and theoretical applications such as 
XML document representation [1, 4,7], text re- 
trieval [16], bioinformatics [15], and data mining 
[22], to name but a few. In the Cross-References, 
we list the various succinct data structures that 
build on or are related to bit vectors. 


Experimental Results 


Bit vectors have been extensively experimentally 
evaluated. Mature implementations are available 
in the libraries SDSL [8] and Succinct [12]. 
Other libraries of note are Vigna’s Sux4J
(http://sux.di.unimi.it) and Claude’s libcds
(https://github.com/fclaude/libcds).


Problem Definition


The query S.rank_a(i) on a sequence S is defined
to return the number of occurrences of the distinct
character a among the first i characters of S,
and the query S.select_a(j) is defined to return
the position of the jth occurrence of a in S (if
it exists). Since rank and select queries are fundamental
to the field of succinct and compressed
data structures, researchers have proposed several
data structures that answer them quickly while
using little space. Most of these data structures
also support fast random access to S, and a
few of them support fast insertions and deletions
of characters in S. Some of them return
S.rank_a(i) more quickly when the ith character
of S is itself an a; the query is then called partial
rank.


Key Results


While considering how to store trees and graphs
in small space while supporting fast navigation,
Jacobson [16] considered the problem of supporting
rank and select on binary sequences. He
showed how to store an n-bit binary sequence
using o(n) bits in addition to the sequence itself,
such that we can answer rank and select
using O(log n) bit probes. Later authors have
considered the problem in the word-RAM model
with Ω(log n)-bit words, in which Jacobson’s
implementation of rank takes O(1) time; they
showed how to answer also select in this model
in O(1) time while still using o(n) extra bits.
Pătraşcu [20] showed how we can store an n-bit
binary sequence containing m 1s in a total
of lg (n choose m) + O(n / log^c n) bits, where c is any
constant, and still answer rank and select in O(1)
time.

Grossi, Gupta, and Vitter [12] described a
data structure, called a wavelet tree, that uses
rank and select on several binary sequences to
answer access, rank, and select on sequences over
larger alphabets. If S is a sequence of length
n over an alphabet of size σ and a wavelet
tree for S is implemented with uncompressed
data structures for rank and select on the binary
sequences, then it takes n log σ + o(n log σ) bits
and answers access, rank, and select in O(log σ)
time. With instances of Pătraşcu’s data structure,
the space becomes nH_0(S) + o(n) bits, where
H_0(S) is the 0th-order empirical entropy of S.
To simplify, we assume throughout that σ =
o(n / log n).
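The way a wavelet tree reduces access and rank over a larger alphabet to rank on binary sequences can be sketched as follows. This is a simplified illustration, not the structure of [12]: the class name is hypothetical, symbols are assumed to be integers, the sequence is assumed to be non-empty, positions passed to access are 0-indexed, and each node stores plain prefix counts instead of a succinct bit vector, so the space bound from the text does not apply.

```python
class WaveletTree:
    """Each node splits its symbol range [lo, hi] in half and records, for every
    prefix of its subsequence, how many symbols went to the lower half."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.prefix = None            # None marks a leaf (or an empty node)
        self.left = self.right = None
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        self.prefix = [0]             # prefix[i] = symbols <= mid among the first i
        for c in seq:
            self.prefix.append(self.prefix[-1] + (c <= mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def access(self, i):
        """Return the symbol at 0-indexed position i."""
        node = self
        while node.prefix is not None:
            before = node.prefix[i]
            if node.prefix[i + 1] - before == 1:   # symbol was routed left
                i, node = before, node.left
            else:
                i, node = i - before, node.right
        return node.lo

    def rank(self, a, i):
        """Return the number of occurrences of a among the first i symbols."""
        if a < self.lo or a > self.hi:
            return 0
        node = self
        while node.prefix is not None:
            before = node.prefix[i]
            if a <= (node.lo + node.hi) // 2:
                i, node = before, node.left
            else:
                i, node = i - before, node.right
        return i
```

Both queries follow one root-to-leaf path and perform one binary rank per level, which is where the O(log σ) query time comes from; select walks the same path upward using binary select and is omitted here for brevity.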

Ferragina, Manzini, Mäkinen, and Navarro [7]
described a multiary version of the wavelet tree
that takes only O(log σ / log log n + 1) time for access,
rank, and select, which is O(1) when σ =
lg^{O(1)} n. Their implementation takes nH_0(S) +
o(n) bits when σ = lg^{O(1)} n and nH_0(S) +
o(n log σ) bits otherwise. Golynski, Raman, and
Rao [11] reduced the space to nH_0(S) + o(n) bits
in the general case.

Golynski, Munro, and Rao [10] described a
data structure that takes n lg σ + o(n lg σ) bits
and either answers select in O(1) time and access
and rank in O(log log σ) time, or answers access
in O(1) time, rank in O(log log σ log log log σ)
time, and select in O(log log σ) time. If the space
is increased to (1 + ε)n lg σ bits, where ε is
any positive constant, then both access and select
take O(1) time and rank takes O(log log σ) time.
Golynski [9] showed that the product of the query
times for access and select and the per-character
redundancy in bits must be Ω((log^2 σ)/w) in general,
where w is the length of a machine word.

Barbay, He, Munro, and Rao [1] described a
data structure that takes nH_k(S) + o(n log σ)
bits, where H_k(S) is the kth-order empirical
entropy of S, and answers access in O(1) time,
rank in O(log log σ (log log log σ)^2) time, and select
in O(log log σ log log log σ) time. We assume
throughout that k = o(log_σ n). They also
reduced to nH_0(S) + o(n log σ) bits the space for
the version of Golynski, Munro, and Rao’s data
structure with O(1)-time select and O(log log σ)-time
access and rank. Grossi, Orlandi, and Raman [13]
reduced the space of the version with
O(1)-time access and O(log log σ)-time select to
nH_k(S) + o(n log σ) bits and reduced the time
for rank to O(log log σ).

Barbay, Claude, Gagie, Navarro, and
Nekrich [2] combined multiary wavelet trees
with the versions of Golynski, Munro, and Rao’s
data structure, to obtain a data structure that takes
nH_0(S) + o(n)(H_0(S) + 1) bits and answers
one of access and select in O(1) time and the
other in O(log log σ) time, and rank also in
O(log log σ) time. If the space is increased to
(1 + ε)nH_0(S) + o(n) bits, then both access and
select take O(1) time. They partition the alphabet
into sub-alphabets such that all the characters
in each sub-alphabet have roughly the same
frequency, and then store a data structure that
answers access, rank, and select queries on the
subsequence of characters in S from that sub-alphabet.

Belazzougui and Navarro [3, 4] showed that
any data structure that takes n · w^{O(1)} space must
use Ω(log(log σ / log w)) time for rank. They also
gave the following upper bounds:

• We can store S in nH_0(S) + o(n) bits and
  answer access, rank, and select in
  O(log σ / log w + 1) time, which is O(1) when
  σ = lg^{O(1)} n.

• We can store S in nH_0(S) + o(n)(H_0(S) + 1)
  bits and answer access in O(1) time and
  select in O(f(n,σ)) time or vice versa, where
  f(n,σ) is any function in ω(1), and answer
  rank in O(log(log σ / log w)) time.

• We can store S in nH_k(S) + o(n log σ) bits
  and answer access in O(1) time, select in
  O(f(n,σ)) time, and rank in
  O(log(log σ / log w)) time in general and in
  O(f(n,σ)) time when σ = w^{O(1)}.



These and the other bounds described above are
summarized in Table 1. In another paper [5],
Belazzougui and Navarro showed how we can
add o(n)(H_0(S) + 1) bits to any of these
representations and answer partial rank queries in the
same time as access.
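
To pin down what the three queries and a partial rank query return, the following is a deliberately naive, linear-time sketch. It is not any of the data structures cited above. It assumes 1-based positions and the usual reading of partial rank as the rank of the character S[i] at its own position i; the function names are hypothetical.

```python
def access(S, i):
    """Return the character S[i] (positions are 1-based by assumption)."""
    return S[i - 1]


def rank(S, c, i):
    """Return the number of occurrences of c in the prefix S[1..i]."""
    return S[:i].count(c)


def select(S, c, j):
    """Return the position of the j-th occurrence of c in S, or None."""
    seen = 0
    for pos, ch in enumerate(S, start=1):
        if ch == c:
            seen += 1
            if seen == j:
                return pos
    return None


def partial_rank(S, i):
    """Rank of the character stored at position i, up to position i."""
    return rank(S, access(S, i), i)
```

The succinct structures surveyed here answer the same queries, but within the space and time bounds of Table 1 rather than by scanning S.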


Dynamic Sequences 

Several authors have described data structures
that store binary sequences in succinct or
compressed space and support fast rank, select,
and update operations, typically insertions and
deletions of bits. In particular, Navarro and
Sadakane [19] described data structures that store
a binary sequence B in |B|H_0(B) + o(|B|) bits
and support rank, select, insert, and delete in
O(log |B| / log log |B|) time, which is optimal [8].
These can be used in wavelet trees to obtain data
structures that support rank and select on
dynamic sequences over larger alphabets. Navarro
and Sadakane [19] and He and Munro [15]
described data structures that store a sequence
S in nH_0(S) + o(n log σ) bits, where n is the
current length of S, and support access, rank,
and select queries and insertions and deletions
of characters in
O((log n / log log n)(log σ / log log n + 1)) time,
which is O(log n / log log n) when σ = lg^{O(1)} n.
Navarro and Nekrich [17, 18] recently
described a data structure that stores S in
nH_0(S) + o(n log σ) bits and supports access,
rank, select, insert, and delete in
O(log n / log log n) time. This time bound is
worst-case for the queries and amortized for the
updates; the update times can be made worst-case
as well at the cost of increasing the times for
rank, insert, and delete from O(log n / log log n)
to O(log n). Their structure is essentially a
multiary wavelet tree built using rank and select
data structures for dynamic sequences over
sublogarithmic alphabets, much like He and Munro's
or Navarro and Sadakane's, but they divide those
component sequences into polylogarithmic-sized
blocks and augment them with pointers such that
they can ascend and descend the tree using only
the pointers and rank and select on individual
blocks.

Grossi, Raman, Rao, and Venturini [14] later
reduced the time for access to O(1) while using
nH_k(S) + o(n log σ) bits but at the cost of
being able only to replace characters instead of
inserting and deleting them. The time for rank
and select is the same.

Rank and Select Operations on Sequences, Table 1
A summary of previous and current upper bounds for rank
and select on a sequence S of length n over an alphabet
of size σ = o(n / log n), with ε a positive constant,
k = o(log_σ n), and f(n,σ) = ω(1). The bounds in
the second row hold when σ = lg^{O(1)} n and those in the
last row hold when σ = w^{O(1)}, with w the word length

Source  Space (bits)                  Access                Rank                   Select
[12]    nH_0(S) + o(n)                O(log σ)              O(log σ)               O(log σ)
[7]     nH_0(S) + o(n)                O(1)                  O(1)                   O(1)
[7]     nH_0(S) + o(n log σ)          O(log σ / log log n)  O(log σ / log log n)   O(log σ / log log n)
[11]    nH_0(S) + o(n)                O(log σ / log log n)  O(log σ / log log n)   O(log σ / log log n)
[10]    n lg σ + o(n log σ)           O(log log σ)          O(log log σ)           O(1)
[10]    n lg σ + o(n log σ)           O(1)                  O((log log σ)^{1+ε})   O(log log σ)
[10]    (1 + ε)n lg σ                 O(1)                  O(log log σ)           O(1)
[1]     nH_k(S) + o(n log σ)          O(1)                  O((log log σ)^{1+ε})   O((log log σ)^{1+ε})
[1]     nH_0(S) + o(n log σ)          O(log log σ)          O(log log σ)           O(1)
[13]    nH_k(S) + o(n log σ)          O(1)                  O(log log σ)           O(log log σ)
[2]     nH_0(S) + o(n)(H_0(S) + 1)    O(1)                  O(log log σ)           O(log log σ)
[2]     nH_0(S) + o(n)(H_0(S) + 1)    O(log log σ)          O(log log σ)           O(1)
[2]     (1 + ε)nH_0(S) + o(n)         O(1)                  O(log log σ)           O(1)
[3, 4]  nH_0(S) + o(n)                O(log σ / log w + 1)  O(log σ / log w + 1)   O(log σ / log w + 1)
[3, 4]  nH_0(S) + o(n)(H_0(S) + 1)    O(1)                  O(log(log σ / log w))  O(f(n,σ))
[3, 4]  nH_0(S) + o(n)(H_0(S) + 1)    O(f(n,σ))             O(log(log σ / log w))  O(1)
[3, 4]  nH_k(S) + o(n log σ)          O(1)                  O(log(log σ / log w))  O(f(n,σ))
[3, 4]  nH_k(S) + o(n log σ)          O(1)                  O(f(n,σ))              O(f(n,σ))


Applications

Jacobson [16] first studied rank and select
for representing unlabeled trees succinctly
and planar graphs almost succinctly, while
supporting fast navigation queries. Since
then, rank and select on binary sequences
have been used in succinct and compressed
representations of several other combinatorial
objects, such as binary relations and general
graphs. Rank and select on sequences over
larger alphabets have been used in succinct
and compressed representations of labeled
trees and permutations and in compressed
full-text indexes such as compressed suffix
arrays. Notice that with a data structure for
rank and select that achieves compression
in terms of 0th-order empirical entropy, we
can build a full-text index that achieves
compression in terms of kth-order empirical
entropy.


Open Problems


The current main open problems regarding rank
and select on static sequences are to answer
access and select in O(1) time while storing
S in nH_0(S) + o(n log σ) bits when lg σ =
o(w), to answer select in constant time and
access in almost constant time while storing S in
nH_k(S) + o(n log σ) bits when k > 0, and to
answer access, rank, and select queries in O(1)
time while storing S in nH_k(S) + o(n) bits when
σ = lg^{O(1)} w.

The current main open problems regarding
rank and select on dynamic sequences are to
achieve O(log n / log log n) worst-case time for all
operations while still using compressed space, to
achieve a similar space bound in terms of H_k(S)
instead of H_0(S) while supporting the same
operations, and to support a wider range of updates.



Experimental Results 


The most recent experimental results for rank and
select on static sequences are by Barbay et al. [2]
and Claude, Navarro, and Ordóñez [6]. These
results show that rank and select data structures
can be implemented in a time- and space-efficient
way in practice, even when the alphabet size is
large. There are no current experimental results
for rank and select on succinct or compressed
dynamic sequences.


Problem Definition


This problem is concerned with matching a set
of applicants to a set of posts, where each
applicant has a preference list, ranking a non-empty
subset of posts in order of preference, possibly
involving ties. Say that a matching M is popular
if there is no matching M' such that the number
of applicants preferring M' to M exceeds the
number of applicants preferring M to M'. The
ranked matching problem is to determine if the
given instance admits a popular matching and
if so, to compute one. There are many practical
situations that give rise to such large-scale
matching problems involving two sets of participants
(for example, pupils and schools, or doctors and
hospitals) where participants of one set express
preferences over the participants of the other set;
an allocation determined by a popular matching
can be regarded as an optimal allocation in these
applications.


Notations and Definitions 

An instance of the ranked matching problem is
a bipartite graph G = (A ∪ P, E) and a partition
E = E_1 ∪ E_2 ∪ ... ∪ E_r of the edge set. Call the
nodes in A applicants, the nodes in P posts, and
the edges in E_i the edges of rank i. If (a, p) ∈ E_i
and (a, p') ∈ E_j with i < j, say that a prefers p
to p'. If i = j, say that a is indifferent between p
and p'. An instance is strict if the degree of every
applicant in every E_i is at most one.

A matching M is a set of edges, no two of
which share an endpoint. In a matching M, a node
u ∈ A ∪ P is either unmatched, or matched to
some node, denoted by M(u). Say that an
applicant a prefers matching M' to M if (i) a is matched
in M' and unmatched in M, or (ii) a is matched in
both M' and M, and a prefers M'(a) to M(a).


Definition 1 M' is more popular than M, denoted
by M' ≻ M, if the number of applicants preferring
M' to M exceeds the number of applicants
preferring M to M'. A matching M is popular if
and only if there is no matching M' that is more
popular than M.


Figure 1 shows an instance with A = {a_1, a_2, a_3},
P = {p_1, p_2, p_3}, in which each applicant
prefers p_1 to p_2, and p_2 to p_3 (assume throughout
that preferences are transitive). Consider the
three symmetrical matchings
M_1 = {(a_1, p_1), (a_2, p_2), (a_3, p_3)},
M_2 = {(a_1, p_3), (a_2, p_1), (a_3, p_2)}, and
M_3 = {(a_1, p_2), (a_2, p_3), (a_3, p_1)}.
It is easy to verify that none of
these matchings is popular, since M_1 ≺ M_2,
M_2 ≺ M_3, and M_3 ≺ M_1. In fact, this instance
admits no popular matching; the problem is,
of course, that the more popular than relation is
not acyclic, and so there need not be a maximal
element.
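
The cycle among M_1, M_2, and M_3 can be checked mechanically. The following is a minimal brute-force sketch, assuming each preference list is encoded as a map from post to rank (smaller is better) and each matching as a map from applicant to post. The function names are hypothetical, and the exhaustive enumeration is only meant for tiny instances such as the one in Figure 1.

```python
from itertools import product


def prefers(pref, a, M_new, M_old):
    """True if applicant a prefers matching M_new to M_old."""
    new, old = M_new.get(a), M_old.get(a)
    if new is None:          # unmatched in M_new: cannot prefer it
        return False
    if old is None:          # matched in M_new, unmatched in M_old
        return True
    return pref[a][new] < pref[a][old]


def more_popular(pref, M_new, M_old):
    """True if more applicants prefer M_new to M_old than the reverse."""
    votes_new = sum(prefers(pref, a, M_new, M_old) for a in pref)
    votes_old = sum(prefers(pref, a, M_old, M_new) for a in pref)
    return votes_new > votes_old


def all_matchings(pref):
    """Enumerate every matching, including partial ones (exponential)."""
    applicants = list(pref)
    options = [[None] + list(pref[a]) for a in applicants]
    for choice in product(*options):
        used = [p for p in choice if p is not None]
        if len(used) == len(set(used)):      # no post used twice
            yield {a: p for a, p in zip(applicants, choice) if p is not None}


def is_popular(pref, M):
    return not any(more_popular(pref, N, M) for N in all_matchings(pref))


# The instance of Figure 1: everyone ranks p1 first, p2 second, p3 third.
pref = {a: {"p1": 1, "p2": 2, "p3": 3} for a in ("a1", "a2", "a3")}
M1 = {"a1": "p1", "a2": "p2", "a3": "p3"}
M2 = {"a1": "p3", "a2": "p1", "a3": "p2"}
M3 = {"a1": "p2", "a2": "p3", "a3": "p1"}
```

Calling more_popular on the pairs (M2, M1), (M3, M2), and (M1, M3) exhibits the cycle, and is_popular fails for every matching of this instance.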

The ranked matching problem is to determine
if a given instance admits a popular matching,
and to find such a matching, if one exists. Popular
matchings may have different sizes, and a largest
such matching may be smaller than a maximum-
cardinality matching. The maximum-cardinality
popular matching problem then is to determine if
a given instance admits a popular matching, and
to find a largest such matching, if one exists.

a_1: p_1 p_2 p_3
a_2: p_1 p_2 p_3
a_3: p_1 p_2 p_3

Ranked Matching, Fig. 1 An instance for which there is
no popular matching


Key Results 


First consider strict instances, that is, instances
(A ∪ P, E) where there are no ties in the
preference lists of the applicants. Let n be the number
of vertices and m be the number of edges in G.


Theorem 1 For a strict instance G = (A ∪ P, E),
it is possible to determine in O(m + n) time if G
admits a popular matching and compute one, if it
exists.


Theorem 2 Find a maximum-cardinality popular
matching of a strict instance G = (A ∪ P, E),
or determine that no such matching exists, in
O(m + n) time.


Next consider the general problem, where
preference lists may have ties.


Theorem 3 Find a popular matching of G =
(A ∪ P, E), or determine that no such matching
exists, in O(√n m) time.


Theorem 4 Find a maximum-cardinality popular
matching of G = (A ∪ P, E), or determine
that no such matching exists, in O(√n m) time.


Techniques 

Our results are based on a novel characterization
of popular matchings. For exposition purposes,
create a unique last resort post l(a) for each
applicant a and assign the edge (a, l(a)) a rank
higher (that is, worse) than that of any other edge
incident on a. In this way,
assume that every applicant is matched, since any
unmatched applicant can be allocated to his/her
last resort. From now on then, matchings are
applicant-complete, and the size of a matching
is just the number of applicants not matched to
their last resort. Also assume that instances have
no gaps, i.e., if an applicant has a rank i edge
incident to it then it has edges of all smaller ranks
incident to it. First outline the characterization
in strict instances and then extend it to general
instances.


Strict Instances 

For each applicant a, let f(a) denote the most
preferred post on a's preference list. That is,
(a, f(a)) ∈ E_1. Call any such post p an f-post,
and denote by f(p) the set of applicants a for
which f(a) = p.

For each applicant a, let s(a) denote the most
preferred non-f-post on a's preference list; note
that s(a) must exist, due to the introduction of
l(a). Call any such post p an s-post, and remark
that f-posts are disjoint from s-posts.

Using the definitions of f-posts and s-posts,
show three conditions that a popular matching
must satisfy.


Lemma 1 Let M be a popular matching.

1. For every f-post p, (i) p is matched in M, and
   (ii) M(p) ∈ f(p).
2. For every applicant a, M(a) can never be
   strictly between f(a) and s(a) on a's preference
   list.
3. For every applicant a, M(a) is never worse
   than s(a) on a's preference list.


It is then shown that these three necessary con- 
ditions are also sufficient. This forms the basis 
of the following preliminary characterization of 
popular matchings. 


Lemma 2 A matching M is popular if and only if
(i) every f-post is matched in M, and (ii) for each
applicant a, M(a) ∈ {f(a), s(a)}.

Given an instance graph G = (A ∪ P, E),
define the reduced graph G' = (A ∪ P, E')
as the subgraph of G containing two edges for
each applicant a: one to f(a), the other to s(a).
The authors remark that G' need not admit an
applicant-complete matching, since l(a) is now
isolated whenever s(a) ≠ l(a). Lemma 2 shows
that a matching is popular if and only if it belongs
to the graph G' and it matches every f-post. Recall
that all popular matchings are applicant-complete
through the introduction of last resorts. Hence,
the following characterization is immediate.


Theorem 5 M is a popular matching of G if and 
only if (i) every f-post is matched in M, and (ii) M 
is an applicant-complete matching of the reduced 
graph G’. 


The characterization in Theorem 5 immediately 
suggests the following algorithm for solving the 
popular matching problem. Construct the reduced 
graph G’. If G’ does not admit an applicant- 
complete matching, then G admits no popular 
matching. If G’ admits an applicant-complete 
matching M, then modify M so that every f-post 
is matched. So for each f-post p that is unmatched 
in M, let a be any applicant in f(p); remove the 
edge (a, M(a)) from M and instead match a to p. 
This algorithm can be implemented in O(m + n) 
time. This shows Theorem 1. 
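
The steps above can be sketched as follows for strict instances. This is an illustrative sketch under stated assumptions, not the linear-time implementation behind Theorem 1: the applicant-complete matching of G' is found here with a simple augmenting-path search, which is slower than O(m + n). The input encoding (a map from each applicant to a non-empty ordered list of posts), the tuple ("last", a) standing in for the last resort l(a), and the function name are all hypothetical.

```python
def popular_matching_strict(prefs):
    """Return a popular matching {applicant: post}, or None if none exists.

    prefs maps each applicant to a strictly ordered, non-empty list of posts.
    Applicants left out of the result are matched to their last resort.
    """
    # f(a): first choice of a; the f-posts are all first choices.
    f = {a: posts[0] for a, posts in prefs.items()}
    f_posts = set(f.values())
    # s(a): most preferred non-f-post, falling back on the last resort l(a).
    s = {a: next((p for p in posts if p not in f_posts), ("last", a))
         for a, posts in prefs.items()}
    # Reduced graph G': each applicant keeps only the edges to f(a) and s(a).
    reduced = {a: [f[a], s[a]] for a in prefs}

    owner = {}  # post -> applicant currently matched to it

    def augment(a, seen):
        """Try to match a, re-routing earlier applicants if necessary."""
        for p in reduced[a]:
            if p in seen:
                continue
            seen.add(p)
            if p not in owner or augment(owner[p], seen):
                owner[p] = a
                return True
        return False

    # G' must admit an applicant-complete matching.
    for a in prefs:
        if not augment(a, set()):
            return None
    match = {a: p for p, a in owner.items()}

    # Make sure every f-post is matched by promoting one of its applicants.
    # (With the search order used above this loop may never fire.)
    for a in prefs:
        p = f[a]
        if p not in owner:
            del owner[match[a]]
            owner[p] = a
            match[a] = p

    return {a: p for a, p in match.items() if p != ("last", a)}
```

On the instance of Figure 1 the reduced graph has three applicants competing for p1 and p2 only, so the sketch reports that no popular matching exists.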

Now, consider the maximum-cardinality popular
matching problem. Let A_1 be the set of all
applicants a with s(a) = l(a). Let A_2 be the
set of all applicants with s(a) ≠ l(a). Our target
matching must satisfy conditions (i) and (ii) of
Theorem 5, and among all such matchings,
allocate the fewest A_1-applicants to their last resort.
This scheme can be implemented in O(m + n)
time. This proves Theorem 2.


General Instances 

For each applicant a, let f(a) denote the set of
first-ranked posts on a's preference list. Again,
refer to all such posts p as f-posts, and denote by
f(p) the set of applicants a for which p ∈ f(a). It
may no longer be possible to match every f-post p
with an applicant in f(p) (as in Lemma 1), since,
for example, there may now be more f-posts than
applicants. Let M be a popular matching of some
instance graph G = (A ∪ P, E). Define the first-
choice graph of G as G_1 = (A ∪ P, E_1), where
E_1 is the set of all rank one edges. Next the
authors show the following lemma.


Lemma 3 Let M be a popular matching. Then
M ∩ E_1 is a maximum matching of G_1.


Next, work towards a generalized definition of
s(a). Restrict attention to rank-one edges, that is,
to the graph G_1, and using a maximum matching
M_1 of G_1, partition A ∪ P
into three disjoint sets. A node v is even
(respectively odd) if there is an even (respectively odd)
length alternating path (with respect to M_1) from
an unmatched node to v. Similarly, a node v is
unreachable if there is no alternating path (w.r.t.
M_1) from an unmatched node to v. Denote by ℰ,
𝒪, and 𝒰 the sets of even, odd, and unreachable
nodes, respectively. Conclude the following facts
about ℰ, 𝒪, and 𝒰 by using the well-known
Gallai-Edmonds decomposition theorem.

(a) ℰ, 𝒪, and 𝒰 are pairwise disjoint. Every
    maximum matching in G_1 partitions the
    vertex set into the same partition of even,
    odd, and unreachable nodes.

(b) In any maximum-cardinality matching
    of G_1, every node in 𝒪 is matched with
    some node in ℰ, and every node in 𝒰
    is matched with another node in 𝒰. The
    size of a maximum-cardinality matching is
    |𝒪| + |𝒰|/2.

(c) No maximum-cardinality matching of G_1
    contains an edge between two nodes in 𝒪,
    or a node in 𝒪 and a node in 𝒰. And there is
    no edge in G_1 connecting a node in ℰ with
    a node in 𝒰.
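
The even/odd/unreachable labelling described above can be computed by an alternating breadth-first search from the unmatched nodes. The sketch below assumes the rank-one graph is given as an adjacency map listing every node (isolated ones included) and that the supplied matching is a maximum matching, stored symmetrically as node-to-partner. The function name and this encoding are hypothetical.

```python
from collections import deque


def eou_labels(adj, matching):
    """Split the nodes of a bipartite graph into (even, odd, unreachable).

    adj: node -> list of neighbours (each edge listed in both directions).
    matching: node -> partner, for every matched node; must be maximum.
    """
    # Unmatched nodes are even: they are reached by a path of length zero.
    even = {v for v in adj if v not in matching}
    odd = set()
    queue = deque(even)
    while queue:
        u = queue.popleft()                 # u is an even node
        for v in adj[u]:
            if matching.get(u) == v:        # must leave u by a non-matching edge
                continue
            if v in odd or v in even:
                continue
            odd.add(v)                      # reached after an odd number of edges
            w = matching[v]                 # v is matched, since matching is maximum
            if w not in even:
                even.add(w)                 # continue along the matching edge
                queue.append(w)
    unreachable = set(adj) - even - odd
    return even, odd, unreachable
```

On a small example the output can be checked against fact (b): the size of the maximum matching equals the number of odd nodes plus half the number of unreachable nodes.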


The above facts motivate the following definition
of s(a): let s(a) be the set of most preferred posts
in a's preference list that are even in G_1 (note that
s(a) ≠ ∅, since l(a) is always even in G_1). Recall
that our original definition of s(a) led to parts (2)
and (3) of Lemma 1, which restrict the set of posts
to which an applicant can be matched in a popular
matching. The generalized definition leads to
analogous results here.


Lemma 4 Let M be a popular matching. Then
for every applicant a, M(a) can never be strictly
between f(a) and s(a) on a's preference list and
M(a) can never be worse than s(a) in a's
preference list.


The following characterization of popular match- 
ings is formed. 


Lemma 5 A matching M is popular in G
if and only if (i) M ∩ E_1 is a maximum
matching of G_1, and (ii) for each applicant a,
M(a) ∈ f(a) ∪ s(a).


Given an instance graph G = (A ∪ P, E),
define the reduced graph G' = (A ∪ P, E') as
the subgraph of G containing edges from each
applicant a to posts in f(a) ∪ s(a). The authors
remark that G' need not admit an applicant-
complete matching, since l(a) is now isolated
whenever s(a) ≠ {l(a)}. Lemma 5 shows that
a matching is popular if and only if it belongs to
the graph G' and it is a maximum matching on
rank one edges. Recall that all popular matchings
are applicant-complete through the introduction
of last resorts. Hence, the following
characterization is immediate.


Theorem 6 M is a popular matching of G if and
only if (i) M ∩ E_1 is a maximum matching of
G_1, and (ii) M is an applicant-complete matching
of G'.


Using the characterization in Theorem 6, the 
authors now present an efficient algorithm for 
solving the ranked matching problem. 


Popular-Matching (G = (A ∪ P, E))

1. Construct the graph G' = (A ∪ P, E'),
   where E' = {(a, p) | p ∈ f(a) ∪ s(a), a ∈ A}.
2. Compute a maximum matching M_1 on rank
   one edges, i.e., M_1 is a maximum matching in
   G_1 = (A ∪ P, E_1). (M_1 is also a matching in
   G' because E' ⊇ E_1.)
3. Delete all edges in G' connecting two nodes
   in the set 𝒪 or a node in 𝒪 with a node in
   𝒰, where 𝒪 and 𝒰 are the sets of odd and
   unreachable nodes of G_1 = (A ∪ P, E_1).
   Determine a maximum matching M in the
   modified graph G' by augmenting M_1.
4. If M is not applicant-complete, then declare
   that there is no popular matching in G. Else
   return M.


The matching returned by the algorithm Popular-
Matching is an applicant-complete matching in
G' and it is a maximum matching on rank one
edges. So the correctness of the algorithm follows
from Theorem 6. It is easy to see that the running
time of this algorithm is O(√n m). The algorithm
of Hopcroft and Karp [7] is used to compute
a maximum matching in G_1, and the set
of edges E' is identified and G' constructed, in
O(√n m) time. Repeatedly augment M_1 (by the
Hopcroft-Karp algorithm) to obtain M. This
proves Theorem 3.

It is now a simple matter to solve the
maximum-cardinality popular matching problem.
Assume that the instance G = (A ∪ P, E)
admits a popular matching. (Otherwise, the
process is done.) In order to compute an
applicant-complete matching in G' that is
a maximum matching on rank one edges and
which maximizes the number of applicants not
matched to their last resort, first compute an
arbitrary popular matching M' and remove all
edges of the form (a, l(a)) from M' and from
the graph G'. Call the resulting subgraph of G'
H. Determine a maximum matching N in H
by augmenting M'. N need not be a popular
matching, since it need not be a maximum
matching in the graph G'. However, this is easy
to mend. Determine a maximum matching M
in G' by augmenting N. It is easy to show that
M is a popular matching which maximizes the
number of applicants not matched to their last
resort. Since the algorithm takes O(√n m) time,
Theorem 4 is shown.


Applications 


The bipartite matching problem with a graded
edge set is well-studied in the economics
literature; see, for example, [1, 10, 12]. It models some
important real-world problems, including the
allocation of graduates to training positions [8],
and families to government-owned housing [11].
The concept of a popular matching was first
introduced by Gärdenfors [5] under the name
majority assignment in the context of the stable
marriage problem [4, 6].

Various other definitions of optimality have 
been considered. For example, a matching is 
Pareto-optimal [1, 2, 10] if no applicant can 
improve his/her allocation (say by exchanging 
posts with another applicant) without requiring 
some other applicant to be worse off. Stronger 
definitions exist: a matching is rank-maximal [9]
if it allocates the maximum number of applicants
to their first choice, and then subject to this, the
maximum number to their second choice, and so
on. A matching is maximum-utility if it
maximizes Σ_{(a,p) ∈ M} u_{a,p}, where u_{a,p} is the utility
of allocating post p to applicant a. Neither rank-
maximal nor maximum-utility matchings are
necessarily popular.


Problem Definition


Liu and Layland [11] introduced rate-monotonic
scheduling in the context of the scheduling of
recurrent real-time processes upon a computing
platform comprising a single preemptive
processor.


Rate-Monotonic Scheduling 


The Periodic Task Model 

The periodic task abstraction models real-time
processes that make repeated requests for
computation. As defined by Liu and Layland [11], each
periodic task τ_i is characterized by an ordered
pair of positive real-valued parameters (C_i, T_i),
where C_i is the worst-case execution requirement
and T_i the period of the task. The requests for
computation that are made by task τ_i
(subsequently referred to as jobs that are generated by
τ_i) satisfy the following assumptions:


A1: τ_i's first job arrives at system start time
    (assumed to equal time zero), and subsequent
    jobs arrive every T_i time units, i.e., one job
    arrives at time instant k × T_i for all integer
    k ≥ 0.

A2: Each job needs to execute for at most C_i
    time units, i.e., C_i is the maximum amount
    of time that a processor would require to
    execute each job of τ_i, without interruption.

A3: Each job of τ_i must complete before the next
    job arrives. That is, each job of task τ_i must
    complete execution by a deadline that is T_i
    time units after its arrival time.

A4: Each task is independent of all other tasks:
    the execution of any job of task τ_i is not
    contingent on the arrival or completion of
    jobs of any other task τ_j.

A5: A job of τ_i may be preempted on the
    processor without additional execution cost. In
    other words, if a job of τ_i is currently
    executing, then it is permitted that this execution be
    halted and a job of a different task τ_j begins
    execution immediately.



A periodic task system τ ≝ {τ_1, τ_2, ..., τ_n} is
a collection of n periodic tasks. The utilization
U(τ) is defined as follows:


U(τ) = Σ_{i=1}^{n} C_i / T_i.        (1)


Intuitively, this denotes the fraction of time that
may be spent by the processor executing jobs of
tasks in τ, in the worst case.


The Rate-Monotonic Scheduling Algorithm

A (uniprocessor) scheduling algorithm
determines which task executes on the shared
processor at each time instant. If a scheduling
algorithm is guaranteed to always meet all
deadlines when scheduling a task system τ, then
τ is said to be schedulable with respect to that
scheduling algorithm.

Many scheduling algorithms work as follows:
at each time instant, they assign a priority to each
job and select for execution the greatest-priority
job with remaining execution. A static-priority
(often called fixed-priority) scheduling algorithm
for scheduling periodic tasks is one in which it is
required that all the jobs of each periodic task be
assigned the same priority.

Liu and Layland [11] proposed the rate-
monotonic (RM) static-priority scheduling
algorithm, which assigns priority to jobs
according to the period parameter of the task that
generates them: the smaller the period, the higher
the priority. Hence, if T_i < T_j for two tasks τ_i
and τ_j, then each job of τ_i has higher priority
than all jobs of τ_j and hence any executing job
of τ_j will be preempted by the arrival of one
of τ_i's jobs. Ties may be broken arbitrarily, but
consistently: if T_i = T_j, then either all jobs of
τ_i are assigned higher priority than all jobs of τ_j
or all jobs of τ_j are assigned higher priority than
all jobs of τ_i.


Key Results 


First, key results from the original paper by 
Liu and Layland [11] are presented. Following 
this, results extending the work of Liu and 
Layland [11] are summarized. 


Results from [11] 


Optimality. Liu and Layland were concerned
with designing “good” static-priority scheduling
algorithms. They defined a notion of optimality
for such algorithms: a static-priority algorithm
A is optimal if any periodic task system that is
schedulable with respect to some static-priority
algorithm is also schedulable with respect to A.

Liu and Layland obtained the following re- 
sult for the rate-monotonic scheduling algorithm 
(RM): 


Theorem 1 For periodic task systems, RM is an 
optimal static-priority scheduling algorithm. 


Schedulability testing. A schedulability test for
a particular scheduling algorithm determines, for
any periodic task system τ, whether τ is
schedulable with respect to that scheduling algorithm.
A schedulability test is said to be exact if it
correctly identifies all schedulable
task systems and sufficient if it identifies some,
but not necessarily all, schedulable task systems.

In order to derive good schedulability tests
for the rate-monotonic scheduling algorithm, Liu
and Layland considered the concept of response
time. The response time of a job is defined as
the elapsed time between the arrival of a job and
its completion time in a schedule; the response
time of a task is defined to be the largest response
time that may be experienced by one of its jobs.
For static-priority scheduling, Liu and Layland
obtained the following result on the response
time:


Theorem 2 The maximum response time for a
periodic task τ_i occurs when a job of τ_i arrives
simultaneously with jobs of all higher-priority
tasks. Such a time instant is known as the critical
instant for task τ_i.


Observe that the critical instant of the lowest-
priority task in a periodic task system is also a
critical instant for all tasks of higher priority. An
immediate consequence of the previous theorem
is that the response time of each task in the
periodic task system can be obtained by simulating
the scheduling of the periodic task system
starting at the critical instant of the lowest-priority
task. If the response time for each task τ_i
obtained from such simulation does not exceed
T_i, then the task system will always meet all
deadlines when scheduled according to the given
priority assignment. This argument immediately
gives rise to a schedulability analysis test [9] for
any static-priority scheduling algorithm. Since
the simulation may need to be carried out until
max_{i=1}^{n} {T_i}, this schedulability test has run-time
pseudo-polynomial in the representation of the
task system:


Theorem 3 (Lehoczky, Sha, and Ding [9])
Exact rate-monotonic schedulability testing of a
periodic task system may be done in time pseudo-
polynomial in the representation of the task
system.
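

One way to carry out such an exact test is the fixed-point (time-demand) iteration sketched below. It is a standard reformulation that is swapped in here for the explicit simulation described above, and it is not taken from the text: it computes, for each task, the response time of the job released at the critical instant. The sketch assumes integer parameters and rate-monotonic priorities with ties broken by input order; the function name is hypothetical.

```python
def rm_response_times(tasks):
    """Worst-case response times under rate-monotonic priorities.

    tasks: list of (C, T) pairs of positive integers.
    Returns a list with one entry per task: the response time of the job
    released at the critical instant, or None if that job misses its
    deadline T.
    """
    # Rate-monotonic order: shorter period means higher priority.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][1])
    result = [None] * len(tasks)
    for position, i in enumerate(order):
        C_i, T_i = tasks[i]
        higher = [tasks[j] for j in order[:position]]
        R = C_i
        while R <= T_i:
            # Demand in [0, R): own work plus every higher-priority job
            # released in that window; -(-R // T) is the integer ceiling.
            demand = C_i + sum(-(-R // T) * C for C, T in higher)
            if demand == R:          # fixed point: the job finishes at R
                break
            R = demand               # demand only grows, so this terminates
        result[i] = R if R <= T_i else None
    return result
```

For example, the system {(1, 4), (2, 6), (3, 12)} yields response times 1, 3, and 10, all within the respective periods, whereas for {(2, 4), (3, 6)} the second task misses its deadline.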


Liu and Layland also derived a polynomial- 
time sufficient (albeit not exact) schedulability 
test for RM, based upon the utilization of the task 
system: 


Theorem 4 Let n denote the number of tasks in
periodic task system τ. If U(τ) ≤ n(2^{1/n} − 1),
then τ is schedulable with respect to the RM
scheduling algorithm.
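

As a small illustration of Theorem 4, the sketch below computes the utilization of Eq. (1) and compares it with the bound n(2^{1/n} − 1). Tasks are assumed to be given as (C, T) pairs; the function names are hypothetical.

```python
def utilization(tasks):
    """U(tau): the sum of C_i / T_i over all tasks (C, T)."""
    return sum(C / T for C, T in tasks)


def liu_layland_bound(n):
    """The utilization bound n * (2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_utilization_test(tasks):
    """Sufficient test: True guarantees RM schedulability.

    False is inconclusive, because the test is not exact.
    """
    return utilization(tasks) <= liu_layland_bound(len(tasks))
```

The bound equals 1 for a single task and about 0.828 for two tasks. A system such as {(1, 4), (2, 6), (3, 12)}, with utilization about 0.833, fails this test for n = 3 (bound about 0.780), which by itself says nothing about whether the system is schedulable.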


Results Since [11] 

The utilization-bound sufficient schedulability
test (Theorem 4) was shown to be tight in the
sense that for all n, there are unschedulable
task systems comprising n tasks with utilization
exceeding n(2^{1/n} − 1) by an arbitrarily small
amount. However, tests have been devised that
exploit more knowledge about tasks' period
parameters. For instance, Kuo and Mok [8]
provide a potentially superior utilization bound
for task systems in which the task period
parameters tend to be harmonically related, that is,
exact multiples of one another. Say that
a collection of numbers comprises
a harmonic chain if for every two numbers
in the set, it is the case that one is an exact
multiple of the other. Let ñ denote the minimum
number of harmonic chains into which the
period parameters {T_i}_{i=1}^{n} of tasks in τ may be
partitioned; a sufficient condition for task system
τ to be RM schedulable is that


U(τ) ≤ ñ(2^{1/ñ} − 1).


Since ñ ≤ n for all task systems τ, this utilization
bound is never inferior to the one in
Theorem 4 and is superior for all τ for which ñ < n.

A different polynomial-time schedulability
test was proposed by Bini, Buttazzo, and
Buttazzo [4]: they showed that


∏_{i=1}^{n} ((C_i / T_i) + 1) ≤ 2


is sufficient to guarantee that the periodic task
system {τ_1, τ_2, ..., τ_n} is rate-monotonic
schedulable. This test is commonly referred to as the
hyperbolic schedulability test for rate-monotonic
schedulability. The hyperbolic test is in general
known to be superior to the utilization-based test
of Theorem 4; see [4] for details.
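
The hyperbolic condition is straightforward to evaluate. The sketch below uses exact rational arithmetic so that boundary cases (a product equal to 2) are not disturbed by floating-point rounding; that choice, the (C, T) input encoding, and the function name are assumptions of this illustration.

```python
from fractions import Fraction
from math import prod


def passes_hyperbolic_test(tasks):
    """Sufficient RM test: the product of (C_i / T_i + 1) is at most 2.

    tasks: list of (C, T) pairs. False is inconclusive, since the test
    is sufficient but not exact.
    """
    return prod(Fraction(C) / Fraction(T) + 1 for C, T in tasks) <= 2
```

For instance, the two-task system {(1, 10), (3, 4)} has utilization 0.85, which exceeds the two-task bound of Theorem 4 (about 0.828), yet its product is 1.1 × 1.75 = 1.925, so the hyperbolic test accepts it.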

Other work done since the seminal paper of
Liu and Layland has focused on relaxing the
assumptions of the periodic task model. The
(implicit-deadline) sporadic task model relaxes
assumption A1 by allowing T_i to be the
minimum (rather than exact) separation between
arrivals of successive jobs of task τ_i. It turns out that
Theorems 1-4 continue to hold for systems of
such tasks as well.

A more general sporadic task model has also
been studied that relaxes assumption A3 in
addition to assumption A1, by allowing for the
explicit specification of a deadline parameter for
each task (which may differ from the task's
period). The deadline-monotonic scheduling
algorithm [10] generalizes rate-monotonic scheduling
to such task systems.

Work has also been done [2, 12] in removing 
the independence assumption of A4, by allowing 
for different tasks to use critical sections to access 
non-preemptable serially reusable resources. 


Applications 


The periodic task model has been invaluable 
for modeling several different types of systems. 
For control systems, the periodic task model is 
well suited for modeling the periodic requests 
and computations of sensors and actuators. 
Multimedia and network applications also 
typically involve computation of periodically 
arriving packets and data. 

Many of the results described in section “Key
Results” above have been integrated into
powerful tools, techniques, and methodologies
for the design and analysis of real-time
application systems [1, 7]. The general methodology
framework is commonly referred to as the
rate-monotonic analysis (RMA) methodology.
Furthermore, most operating systems
provide standard primitives for supporting rate-
monotonic scheduling.


Open Problems 


There are plenty of interesting and challenging 
open problems in real-time scheduling theory; 
however, most of these are concerned with 
extensions to the basic task and scheduling 
model considered in the original Liu and Layland 
paper [11]. Perhaps the most interesting open 
problem with respect to the task model in [11] 
is regarding the computational complexity 
of schedulability analysis of static-priority
scheduling. Recent research by Eisenbrand and
Rothvoß [5] has shown that determining the
maximum response time of any periodic task
is NP-hard. This result shows that any exact
schedulability test that utilizes response time
cannot run in polynomial time (unless P = NP);
however, it does not settle the open question of 
whether there are polynomial-time schedulability 
tests for static-priority periodic task systems that 
do not (implicitly or explicitly) calculate task 
response time. 


URLs to Code and Data Sets 


Research efforts have been made to develop 
a standardized methodology for evaluating 
the efficacy and efficiency of algorithms and 
analysis proposed for rate-monotonic scheduling 
problems. Bini and Buttazzo [3] derived an 
unbiased method for synthetically generating 
random periodic task systems
(http://retis.sssup.it/~bini/publications/2005BinBut.html).

Additionally, researchers have proposed suites 
of benchmarks as representative of embedded 
and real-time applications in practice. Notably, 
the Mälardalen WCET benchmarks [6]
(http://www.mrtc.mdh.se/projects/wcet/benchmarks.html)
maintain a collection of programs that
are typical for real-time applications.


Problem Definition


Given a set of n points in a plane, a spanning tree 
is a set of edges that connects all the points and 
contains no cycles. When each edge is weighted 
using some distance metric of the incident points, 
the metric minimum spanning tree is a tree whose 
sum of edge weights is minimum. If the Eu- 
clidean distance (L2) is used, it is called the Eu- 
clidean minimum spanning tree; if the rectilinear 
distance (L1) is used, it is called the rectilinear 
minimum spanning tree. 

Since the minimum spanning tree problem 
on a weighted graph is well studied, the usual 
approach for metric minimum spanning tree is to 
first define a weighted graph on the set of points 
and then to construct a spanning tree on it. 

Much like a connection graph is defined for 
the maze search [4], a spanning graph can be 
defined for the minimum spanning tree construc- 
tion. 

Definition 1 Given a set of points V in a plane,
an undirected graph G = (V,E) is called a 
spanning graph if it contains a minimum span- 
ning tree of V in the plane. 


Since spanning graphs with fewer edges give 
more efficient minimum spanning tree construc- 
tion, the cardinality of a spanning graph is de- 
fined as its number of edges. It is easy to see 
that a complete graph on a set of points contains 
all spanning trees, thus is a spanning graph. 
However, such a graph has a cardinality of $O(n^2)$.
A rectilinear spanning graph of cardinality O(n) 
can be constructed within O(n log n) time [6] and 
will be described here. 

Minimum spanning tree algorithms usually 
use two properties to infer the inclusion and 
exclusion of edges in a minimum spanning tree. 
The first property is known as the cut property. 
It states that an edge of smallest weight crossing 
any partition of the vertex set into two parts 
belongs to a minimum spanning tree. The second 
property is known as the cycle property. It says 
that an edge with largest weight in any cycle 
in the graph can be safely deleted. Since the 
two properties are stated in connection with the 
construction of a minimum spanning tree, they 
are useful for a spanning graph. 


Key Results 


Using the terminology given in [3], the unique- 
ness property is defined as follows. 


Rectilinear Spanning Tree, Fig. 1 Octal partition and the uniqueness property

Definition 2 Given a point s, a region R has the
uniqueness property with respect to s if for every
pair of points $p, q \in R$, $\|pq\| < \max(\|sp\|, \|sq\|)$.
A partition of space into a finite set of
disjoint regions is said to have the uniqueness
property with respect to s if each of its regions
has the uniqueness property with respect to s.


The notation $\|sp\|$ is used to represent the
distance between s and p under the $L_1$ metric.
Define the octal partition of the plane with respect
to s as the partition induced by the two
rectilinear lines and the two 45° lines through s,
as shown in Fig. 1a. Here, each of the regions $R_1$
through $R_8$ includes only one of its two bounding
half lines, as shown in Fig. 1b. It can be shown that
the octal partition has the uniqueness property.


Lemma 1 Given a point s in the plane, the octal
partition with respect to s has the uniqueness
property.


Proof To show that a partition has the uniqueness
property, it suffices to prove that each region of the
partition has the uniqueness property. Since the
regions $R_1$ through $R_8$ are similar to each other,
a proof for $R_1$ will be sufficient.

The points in $R_1$ can be characterized by the
following inequalities:

$$x \ge x_s, \qquad x - y < x_s - y_s.$$

Suppose there are two points p and q in $R_1$.
Without loss of generality, it can be assumed that
$x_p \le x_q$. If $y_p \le y_q$, then
$\|sq\| = \|sp\| + \|pq\| > \|pq\|$. Therefore, only the
case $y_p > y_q$ needs to be considered. In this case,

$$\begin{aligned}
\|pq\| &= |x_p - x_q| + |y_p - y_q| \\
       &= x_q - x_p + y_p - y_q \\
       &= (x_q - y_q) + y_p - x_p \\
       &< (x_s - y_s) + y_p - x_s \\
       &= y_p - y_s \\
       &\le x_p - x_s + y_p - y_s \\
       &= \|sp\|.
\end{aligned}$$
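
The inequality of Lemma 1 can also be checked numerically for the region $R_1$. The following sketch is only an illustration under stated assumptions: the helper names are hypothetical, points are integer pairs, and membership in $R_1$ is decided by the two inequalities used in the proof above.

```python
import random


def l1(a, b):
    """Rectilinear (L1) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def in_r1(s, p):
    """R1 of s as characterized in the proof: x >= x_s and x - y < x_s - y_s."""
    return p[0] >= s[0] and p[0] - p[1] < s[0] - s[1]


def has_uniqueness_property(s, points):
    """Check ||pq|| < max(||sp||, ||sq||) for every pair of points in R1 of s."""
    region = [p for p in points if in_r1(s, p)]
    return all(
        l1(p, q) < max(l1(s, p), l1(s, q))
        for i, p in enumerate(region)
        for q in region[i + 1:]
    )


rng = random.Random(0)
pts = [(rng.randint(-50, 50), rng.randint(-50, 50)) for _ in range(200)]
print(has_uniqueness_property((0, 0), pts))
```

A passing check on random data is of course not a proof; it only exercises the case analysis above on concrete inputs.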


Given two points p, q in the same octal region
of point s, the uniqueness property says that
$\|pq\| < \max(\|sp\|, \|sq\|)$. Consider the cycle
on points s, p, and q. Based on the cycle property,
only one point with the minimum distance from
s needs to be connected to s. An interesting
property of the octal partition is that the contour
of equidistant points from s forms a line segment
in each region. In regions $R_1$, $R_2$, $R_5$, and $R_6$,
these segments are captured by an equation of the
form $x + y = c$; in regions $R_3$, $R_4$, $R_7$, and $R_8$,
they are described by the form $x - y = c$.

For each point s, the closest neighbor in each
octant needs to be found. The following describes how
to efficiently compute the neighbors in $R_1$ for all
points; the case for the other octants is symmetric.
For the $R_1$ octant, a sweep line algorithm runs
on all points in order of nondecreasing $x + y$.
During the sweep, an active set is maintained,
consisting of the points whose nearest neighbors
in $R_1$ are yet to be discovered. When a point p
is processed, all points in the active set that have
p in their $R_1$ regions are found. If s is such a
point in the active set, then, since points are scanned in
order of nondecreasing $x + y$, p must be the nearest
point in $R_1$ for s. Therefore, the edge sp is
added and s is deleted from the active
set. After processing those active points, the point
p is added to the active set. Each point
is added to and deleted from the active set at most
once.

A fundamental operation in the sweep line
algorithm is to find the subset of active points
such that a given point p is in their $R_1$ regions.
Based on the observation that point p is in the
$R_1$ region of point s if and only if s is in the
$R_5$ region of p, the task is to find the subset of
active points in the $R_5$ region of p. Since $R_5$
can be represented as a two-dimensional range
$(-\infty, x_p] \times (x_p - y_p, +\infty)$ on $(x, x - y)$, a
priority search tree [1] can be used to maintain the
active point set. Since each insertion and
deletion operation takes $O(\log n)$ time, and the
query operation takes $O(\log n + k)$ time, where
k is the number of objects within the range, the
total time for the sweep is $O(n \log n)$. Since the other
regions can be processed in a similar way to
$R_1$, the algorithm runs in $O(n \log n)$ time.
A priority search tree is a data structure that relies
on maintaining a balanced structure for fast
query time. This works well for static input sets.
When the input set is dynamic, rebalancing the
tree can be quite challenging. Fortunately, the
active set has a structure that can be exploited
for an alternate representation. Since a point is
deleted from the active set if a point in its $R_1$
region is found, no point in the active set can be
in the $R_1$ region of another point in the set.


Lemma 2 For any two points p, q in the active
set, it must be that $x_p \ne x_q$, and if $x_p < x_q$, then
$x_p - y_p \le x_q - y_q$.


Based on this property, the active set can be
ordered in increasing order of x. This implies a
nondecreasing order on $x - y$. Given a point s,
the points which have s in their $R_1$ region must
obey the following inequalities:

$$x \le x_s, \qquad x - y > x_s - y_s.$$

To find the subset of active points which have
s in their $R_1$ regions, one can first find the largest x
such that $x \le x_s$ and then proceed in decreasing
order of x until $x - y > x_s - y_s$ no longer holds. Since the
ordering is kept on only one dimension, using
any binary search tree with $O(\log n)$ insertion,
deletion, and query time will also give an
$O(n \log n)$ time algorithm. Binary search trees
also need to be balanced. An alternative is to use
skip lists [2], which use randomization to avoid
the problem of explicit balancing but provide
$O(\log n)$ expected behavior.
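
The sweep for $R_1$ with a one-dimensional ordering of the active set can be sketched as follows. This is a minimal illustration, not the implementation of [6]: the names are hypothetical, input points are assumed to be distinct integer pairs (duplicates are dropped), and a plain sorted Python list with `bisect` stands in for the balanced tree or skip list, so list insertions and deletions cost O(n) here rather than O(log n).

```python
from bisect import bisect_right, insort


def l1(a, b):
    """Rectilinear (L1) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def r1_sweep(points):
    """Map each point s to a nearest point in its R1 region, if any.

    R1 of s is taken as {(x, y): x >= x_s and x - y < x_s - y_s}.
    """
    nearest = {}
    active = []  # points still waiting for a neighbor, sorted by x
    for p in sorted(set(points), key=lambda t: t[0] + t[1]):
        # Active points with x <= x_p occupy the prefix active[:hi].
        hi = bisect_right(active, (p[0], float("inf")))
        lo = hi
        # By Lemma 2, x - y is nondecreasing in x, so the points that have
        # p in their R1 region form a contiguous run ending at hi.
        while lo > 0 and active[lo - 1][0] - active[lo - 1][1] > p[0] - p[1]:
            lo -= 1
            nearest[active[lo]] = p
        del active[lo:hi]
        insort(active, p)
    return nearest


pts = [(0, 0), (1, 3), (2, 1), (0, 2)]
print(r1_sweep(pts))
```

In this small example only (0, 0) has points in its $R_1$ region, namely (0, 2) and (1, 3), and the sweep reports the closer one. Replacing the list by a balanced search tree or a skip list would be needed to obtain the $O(n \log n)$ bound discussed above.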

A careful study also shows that after the sweep
process for $R_1$, there is no need to do the sweep
for $R_5$, since all edges needed in that phase
are either connected or implied. Moreover, based
on the information in $R_5$, the number of edge
connections can be further reduced. When the
sweep step processes point s, it finds a subset of
active points which have s in their $R_1$ regions.
Without loss of generality, suppose p and q are
two of them. Then p and q are in the $R_5$ region
of s, which means $\|pq\| < \max(\|sp\|, \|sq\|)$.
Therefore, s needs to be connected only with the
nearest active point.

Since $R_1$ and $R_2$ have the same sweep sequence,
they can be processed together in one
pass. Similarly, $R_3$ and $R_4$ can be processed
together in another pass. Based on the above
discussion, the pseudo-code of the algorithm is
presented in Fig. 2.

The correctness of the algorithm is stated in 
the following theorem. 


Theorem 1 Given n points in the plane, the 
rectilinear spanning graph algorithm constructs 
a spanning graph in O(nlogn) time, and the 
number of edges in the graph is O(n). 


Proof The algorithm can be considered as delet- 
ing edges from the complete graph. As described, 
all deleted edges are redundant based on the cycle 
property. Thus, the output graph of the algorithm 
will contain at least one rectilinear minimum 
spanning tree. 


Rectilinear Spanning Tree, Fig. 2 The rectilinear spanning graph algorithm

In the algorithm, each given point will be
inserted and deleted at most once from the active
set for each of the four regions $R_1$ through $R_4$.
For each insertion or deletion, the algorithm requires
$O(\log n)$ time. Thus, the total time is upper
bounded by $O(n \log n)$. The storage is needed
only for active sets, which is at most $O(n)$.


Applications 


The rectilinear minimum spanning tree problem has
wide applications in VLSI CAD. It is frequently
used as a metric of wire length estimation during
placement. It is often constructed to approximate
a minimum Steiner tree and is also a key step
in many Steiner tree heuristics. It is also used
in an approximation to the traveling salesperson
problem, which can be used to generate scan
chains in testing. It is important to emphasize that
for real-world applications, the input sizes are
usually very large. Since the problem will
be solved hundreds of thousands of times and
many instances will have very large input sizes, the
rectilinear minimum spanning tree problem needs
a very efficient algorithm.


Experimental Results 


The experimental results using the rectilinear
spanning graph (RSG) followed by Kruskal’s algorithm
for a rectilinear minimum spanning tree
were reported in Zhou et al. [5]. Two other approaches
were compared. The first approach used
the complete graph on the point set as the input
to Kruskal’s algorithm. The second approach is
an implementation of concepts described in [3];
namely, for each point, scan all other points but
only connect the nearest one in each quadrant
region. Randomly generated point sets with sizes
ranging from 1,000 to 20,000 were used in the
experiments. The results are reproduced here in
Table 1. The first column gives the number of
generated points; the second column gives the
number of distinct points. For each approach, the
number of edges in the given graph and the total
running time are reported. For input sizes larger
than 10,000, the complete graph approach simply
runs out of memory.


Rectilinear Spanning Graph Algorithm
for (i = 0; i < 2; i++) {
    if (i == 0) sort points according to x + y;
    else sort points according to x − y;
    A[1] = A[2] = ∅;
    for each point p in the order {
        find points in A[1], A[2] such that p is in their
            R_{2i+1} and R_{2i+2} regions, respectively;
        connect p with the nearest point in each subset;
        delete the subsets from A[1], A[2], respectively;
        add p to A[1], A[2];
    }
}


Rectilinear Spanning Tree, Table 1 Experimental results

Input              Complete                       Bound degree             RSG
Orig     Distinct  #edge        Time              #edge    Time            #edge    Time
1,000    999       498,501      5.095 s           3,878    0.299 s         2,571    0.112 s
2,000    1,996     1,991,010    24.096 s          7,825    0.996 s         5,158    0.218 s
4,000    3,995     7,978,015    2 min 7.233 s     15,761   3.452 s         10,416   0.337 s
6,000    5,991     17,943,045   5 min 54.697 s    23,704   7.515 s         15,730   0.503 s
8,000    7,981     31,844,190   13 min 7.682 s    31,624   13.141 s        21,149   0.672 s
10,000   9,962     49,615,741   -                 39,510   20.135 s        26,332   0.934 s
12,000   11,948    -            -                 47,424   32.300 s        31,586   1.052 s
14,000   13,914    -            -                 55,251   46.842 s        36,853   1.322 s
16,000   15,883    -            -                 63,089   1 min 3.759 s   42,251   1.486 s
18,000   17,837    -            -                 70,876   1 min 19.812 s  47,511   1.701 s
20,000   19,805    -            -                 78,723   1 min 45.792 s  52,732   1.907 s


Problem Definition


Given n points on a plane, a Steiner minimal tree 
connects these points through some extra points 
(called Steiner points) to achieve a minimal total 
length. When the length between two points is
measured by the rectilinear distance, the tree is 
called a rectilinear Steiner minimal tree. 

Because of its importance, there is much previous
work on solving the Steiner minimal tree (SMT) problem.
These algorithms can be grouped into two classes: exact
algorithms and heuristic algorithms. Since SMT
is NP-hard, any exact algorithm is expected to
have an exponential worst-case running time.
However, two prominent achievements must be
noted in this direction. One is the GeoSteiner
algorithm and implementation by Warme, Winter,
and Zachariasen [14, 15], which is currently the
fastest exact solution to the problem. The other
is a Polynomial Time Approximation Scheme
(PTAS) by Arora [1], which is mainly of theoretical
importance. Since exact algorithms have
long running times, especially on large inputs,
much more effort has been put into heuristic
algorithms. Many of them generate a Steiner
tree by improving on a minimal spanning tree
topology [7], since it was proved that a minimal
spanning tree is a 3/2 approximation of an
SMT [8]. However, since the backbones are restricted
to the minimal spanning tree topology in
these approaches, there is a reported limit on the
improvement ratios over the minimal spanning
trees. The iterated 1-Steiner algorithm by Kahng
and Robins [10] is an early approach to deviate
from that restriction, and an improved implementation
[6] is a champion among such programs
in the public domain. However, the implementation
in [10] has a running time of $O(n^4 \log n)$, and
the implementation in [6] has a running time
of $O(n^3)$. A much more efficient approach was
later proposed by Borah et al. [2]. In their approach,
a spanning tree is iteratively improved
by connecting a point to an edge and deleting
the longest edge on the created circuit. Their
algorithm and implementation had a worst-case
running time of $O(n^2)$, even though an alternative
$O(n \log n)$ implementation was also proposed.
Since the backbone is no longer restricted
to the minimal spanning tree topology, its performance
was reported to be similar to that of the iterated
1-Steiner algorithm [2]. A recent effort in this
direction is a new heuristic by Mandoiu et al.
[11], which is based on a 3/2 approximation algorithm
for the metric Steiner tree problem on
quasi-bipartite graphs [12]. It performs slightly
better than the iterated 1-Steiner algorithm, but
its running time is also slightly longer than that of the
iterated 1-Steiner algorithm (with the empty rectangle
test [11] used). More recently, Chu [3] and
Chu and Wong [4] proposed an efficient
lookup-table-based approach for rectilinear Steiner tree
construction.


Key Results 


The presented algorithm is based on the edge 
substitution heuristic of Borah et al. [2]. The 
heuristic works as follows. It starts with a min- 
imal spanning tree and then iteratively considers 
connecting a point (e.g., p in Fig. 1) to a nearby 
edge (e.g., (a, b)) and deleting the longest edge 
((b, c)) on the circuit thus formed. The algorithm 
employs the spanning graph [17] as a backbone 
of the computation: it is first used to generate the 
initial minimal spanning tree and then to gener- 
ate point-edge pairs for tree improvements. This 
kind of unification happens also in the spanning 
tree computation and the longest edge compu- 
tation for each point-edge pair: using Kruskal’s 
algorithm with disjoint set operations (instead 
of Prim’s algorithm) [5] will unify these two 
computations. 

Rectilinear Steiner Tree, Fig. 1 Edge substitution by Borah et al.

Rectilinear Steiner Tree, Fig. 2 A minimal spanning tree and its merging binary tree

In order to reduce the number of point-edge
pair candidates from $O(n^2)$ to $O(n)$, Borah et al.
suggested using the visibility of a point from an
edge; that is, only a point visible from an edge
can be considered for connection to that edge. This
requires a sweep line algorithm to find visibility
relations between points and edges. In order to
skip this complex step, the geometrical proximity
information embedded within the spanning graph
is leveraged. Since a point has eight nearest points
connected around it, it is observed that if a point
is visible to an edge, then the point is usually
connected in the graph to at least one end point. In the
algorithm, the spanning graph is used to generate
point-edge pair candidates. For each edge in the
current tree, all points that are neighbors of either
of the end points will be considered to form
point-edge pairs with the edge. Since the cardinality
of the spanning graph is $O(n)$, the number of
possible point-edge pairs generated in this way
is also $O(n)$.
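
The candidate generation described above can be sketched as follows. This is an illustration only: the function name is hypothetical, the spanning graph is assumed to be given as a dictionary mapping each point to the set of its neighbors, and the tree as a list of edges.

```python
def point_edge_candidates(adj, tree_edges):
    """Pair every tree edge (u, v) with each spanning-graph neighbor of u or v.

    adj: dict mapping a point to the set of its neighbors in the spanning graph.
    tree_edges: iterable of (u, v) edges of the current tree.
    """
    pairs = set()
    for u, v in tree_edges:
        for w in adj[u] | adj[v]:
            if w != u and w != v:
                pairs.add((w, (u, v)))
    return pairs


# Hypothetical spanning graph on four points and a tree with two edges.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(sorted(point_edge_candidates(adj, [(0, 1), (1, 3)])))
```

Each tree edge contributes at most the combined degree of its two end points, which is why the number of candidates stays proportional to the number of edges in the spanning graph.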

When connecting a point to an edge, the
longest edge on the formed circuit needs to
be deleted. In order to find the corresponding
longest edge for each point-edge pair efficiently,
the algorithm exploits how the spanning tree is formed
by Kruskal’s algorithm. This algorithm
first sorts the edges into nondecreasing lengths,
and each edge is considered in turn. If the end
points of the edge have been connected, then the
edge will be excluded from the spanning tree;
otherwise, it will be included. The structure of
these connecting operations can be represented
by a binary tree, where the leaves represent the
points and the internal nodes represent the edges.
When an edge is included in the spanning tree,
a node is created representing the edge and has
as its two children the trees representing the two
components connected by this edge. To illustrate
this, a spanning tree with its representing binary
tree is shown in Fig. 2. As can be seen, the longest
edge between two points is the least common
ancestor of the two points in the binary tree.
For example, the longest edge between p and
b in Fig. 2 is (b, c), which is the least common
ancestor of p and b in the binary tree. To find the
longest edge on the circuit formed by connecting
a point to an edge, one needs to find the longest
edge between the point and the end point of
the edge that is in the same component as the point before
the edge is connected. For example, consider the
pair p and (a, b); since p and b are in the same
component before connecting (a, b), the edge
that needs to be deleted is the longest between p
and b.
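
The merging binary tree can be built alongside Kruskal’s algorithm. The sketch below is a simplified illustration under stated assumptions: the names are hypothetical, points are numbered 0 to n - 1, edges are (length, u, v) triples with distinct lengths, and the least common ancestor is found by naively walking parent pointers rather than by Tarjan’s off-line algorithm.

```python
def build_merge_tree(n, edges):
    """Run Kruskal's algorithm and record the merging binary tree.

    Leaves 0..n-1 are the points; each internal node stands for one
    spanning tree edge. Returns (tree_parent, label), where label maps
    an internal node to its (length, u, v) edge.
    """
    uf = list(range(n))            # union-find parent pointers
    comp_node = list(range(n))     # merge-tree root of each component
    tree_parent = {i: None for i in range(n)}
    label = {}

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]      # path halving
            x = uf[x]
        return x

    next_id = n
    for length, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue               # end points already connected: skip edge
        node, next_id = next_id, next_id + 1
        label[node] = (length, u, v)
        tree_parent[node] = None
        tree_parent[comp_node[ru]] = node
        tree_parent[comp_node[rv]] = node
        uf[ru] = rv
        comp_node[rv] = node
    return tree_parent, label


def longest_edge_between(tree_parent, label, a, b):
    """Longest tree edge on the path between distinct connected points a, b."""
    ancestors = set()
    x = a
    while x is not None:
        ancestors.add(x)
        x = tree_parent[x]
    x = b
    while x not in ancestors:
        x = tree_parent[x]
    return label[x]


edges = [(1, 0, 1), (5, 1, 2), (2, 2, 3), (7, 0, 2)]
parent, label = build_merge_tree(4, edges)
print(longest_edge_between(parent, label, 0, 3))
```

In this example the spanning tree is the path 0, 1, 2, 3, and the longest edge between points 0 and 3 is the edge of length 5 joining 1 and 2, which is the last merge performed and hence the root of the binary tree.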

Based on the above discussion, the pseudo-code
of the algorithm is given in Fig. 3.
At the beginning of the algorithm, Zhou et al.’s
rectilinear spanning graph algorithm [17] is used
to generate the spanning graph G for the given
set of points. Then, Kruskal’s algorithm is used
on the graph to generate a minimal spanning tree.
The data structure of disjoint sets [5] is used to
merge components and check whether two points
are in the same component (the first for loop).
During this process, the merging binary tree and
the queries for least common ancestors of all
point-edge pairs are also generated. Here, s, s1,
and s2 represent disjoint sets, and each records
the root of the component in the merging binary
tree. For each edge (u, v) added to T, each
neighbor w of either u or v will be considered for
connection to (u, v). The longest edge for this pair
is the least common ancestor of w, u or w, v,
depending on which point is in the same component
as w. The procedure lca_add_query is used to
add this query. Connecting the two components
by (u, v) will also be recorded in the merging
binary tree by the procedure lca_tree_edge. After
the minimal spanning tree has been generated, the
corresponding merging binary tree and the
least common ancestor queries are also ready. Using
Tarjan’s off-line least common ancestor algorithm
[5] (represented by lca_answer_queries), all longest
edges for the pairs can be generated. With
the longest edge for each point-edge pair, the
gain of connecting the point to the edge can
be calculated. Then, each of the point-to-edge
connections will be realized in nonincreasing
order of their gains. A connection can only be
realized if both the connection edge and the deletion
edge have not been deleted yet.


Rectilinear Steiner Tree, Fig. 3 The rectilinear Steiner tree algorithm

Rectilinear Steiner Tree (RST) Algorithm
T = ∅;
Generate the spanning graph G by RSG algorithm;
for (each edge (u, v) ∈ G in non-decreasing length) {
    s1 = find_set(u); s2 = find_set(v);
    if (s1 != s2) {
        add (u, v) in tree T;
        for (each neighbor w of u, v in G)
            if (s1 == find_set(w))
                lca_add_query(w, u, (u, v));
            else lca_add_query(w, v, (u, v));
        lca_tree_edge((u, v), s1.edge);
        lca_tree_edge((u, v), s2.edge);
        s = union_set(s1, s2); s.edge = (u, v);
    }
}
generate point-edge pairs by lca_answer_queries;
for (each pair (p, (a, b), (c, d)) in non-increasing positive gains)
    if ((a, b), (c, d) has not been deleted from T) {
        connect p to (a, b) by adding three edges to T;
        delete (a, b), (c, d) from T;
    }

The running time of the algorithm is dominated
by the spanning graph generation and edge
sorting, which take $O(n \log n)$ time. Since the
number of edges in the spanning graph is $O(n)$,
both Kruskal’s algorithm and Tarjan’s off-line
least common ancestor algorithm take $O(n\,\alpha(n))$
time, where $\alpha(n)$ is the inverse of Ackermann’s
function, which grows extremely slowly.


Applications 


The Steiner minimal tree (SMT) problem has 
wide applications in VLSI CAD. A SMT is gen- 
erally used in initial topology creation for non- 
critical nets in physical synthesis. For timing 
critical nets, minimization of wire length is generally
not enough. However, since most nets are
noncritical in a design and a SMT gives the 
most desirable route of such a net, it is often 
used as an accurate estimation of congestion and 
wire length during floor planning and placement. 
This implies that a Steiner tree algorithm will 
be invoked millions of times. On the other hand, 
there exist many large pre-routes in modern VLSI 
design. The pre-routes are generally modeled as 
large sets of points, thus increasing the input sizes 
of the Steiner tree problem. Since the SMT is a 
problem that will be computed millions of times 
and many of them will have very large input sizes, 
highly efficient solutions with good performance 
are desired. 


Experimental Results 


As reported in [16], the first set of experiments
was conducted on a Linux system with a
928 MHz Intel Pentium II processor and 512 MB of
memory. The RST algorithm was compared with
other publicly available programs: the exact
algorithm GeoSteiner (version 3.1) by Warme,
Winter, and Zachariasen [14]; the Batched Iterated
1-Steiner (BIS) by Robins; and Borah et al.’s
algorithm implemented by Madden (BOI).

Table 1 gives the results of the first set of
experiments. For each input size ranging from
100 to 5,000, 30 different test cases were randomly
generated through the rand_points program
in GeoSteiner. The improvement ratio of
a Steiner tree ST over its corresponding minimal
spanning tree MST is defined as
100 × (MST − ST)/MST. For each input size, the
average of the improvement ratios and the average
running time (in seconds) of each of the
programs are reported. As can be seen, RST
always gives better improvements than BOI with
shorter running times.

The second set of experiments compared
RST with Borah’s implementation of Borah
et al.’s algorithm (Borah), Rohe’s Prim-based
algorithm (Rohe) [13], and Kahng et al.’s Batched
Greedy Algorithm (BGA) [9]. They were run on
a different Linux system with a 2.4 GHz Intel
Xeon processor and 2 GB of memory. Besides the
randomly generated test cases, the VLSI industry
test cases used in [9] were also used. The results
are reported in Table 2.

Rectilinear Steiner Tree, Table 1

Input    GeoSteiner          BIS                 BOI                 RST
size     Improve  Time       Improve  Time       Improve  Time       Improve  Time
100      11.440   0.487      10.907   0.633      9.300    0.0267     10.218   0.004
200      11.492   3.557      10.897   4.810      9.192    0.1287     10.869   0.020
300      11.492   12.685     10.931   18.770     9.253    0.2993     10.255   0.041
500      11.525   72.192     -        -          9.274    0.877      10.381   0.084
800      11.343   536.173    -        -          9.284    2.399      10.719   0.156
1,000    -        -          -        -          9.367    4.084      10.433   0.186
2,000    -        -          -        -          9.326    31.098     10.523   0.381
3,000    -        -          -        -          9.390    104.919    10.449   0.771
5,000    -        -          -        -          9.356    307.977    10.499   1.330

Rectilinear Steiner Tree, Table 2 Comparison with other algorithms II

Input     BGA                 Borah               Rohe                RST
size      Improve  Time       Improve  Time       Improve  Time       Improve  Time
Randomly generated testcases
100       10.272   0.006      10.341   0.004      9.617    0.000      10.218   0.002
500       10.976   0.068      10.778   0.178      10.028   0.010      10.381   0.041
1,000     10.979   0.162      10.829   0.689      9.768    0.020      10.433   0.121
5,000     11.012   1.695      11.015   25.518     10.139   0.130      10.499   0.980
10,000    11.108   4.135      11.101   249.924    10.111   0.310      10.559   2.098
50,000    11.120   59.147     -        -          10.109   1.890      10.561   13.029
100,000   11.098   161.896    -        -          10.079   4.410      10.514   28.527
500,000   -        -          -        -          10.059   27.210     10.527   175.725
VLSI testcases
337       6.434    0.035      6.503    0.037      5.958    0.010      5.870    0.016
830       3.202    0.070      3.185    0.213      3.102    0.020      2.966    0.033
1,944     7.850    0.342      7.772    2.424      6.857    0.040      7.533    0.238
2,437     7.965    0.549      7.956    4.502      7.094    0.050      7.595    0.408


Problem Definition


Graph decompositions are the basis for many 
divide-and-conquer algorithms. Two main prop- 
erties make a decomposition useful. The first is 
balance, namely, that the parts of the decompo- 
sition have roughly the same size. Balanced de- 
compositions lead to logarithmic depth recursion. 
The second is small overlap between the parts of 
the decomposition. The overlap affects the time it 
takes to combine solutions of different parts into 
a solution for the union of the parts. 

A decomposition of a graph G is a collection
of subgraphs of G, called regions, whose union is
G. A decomposition tree of G is a tree T whose
nodes correspond to subgraphs of G. The root of
T corresponds to the entire graph G. For a node v of
T that corresponds to a subgraph R, the children
$v_1, v_2, \ldots, v_k$ of v correspond to subgraphs of R
whose union is R. Every maximal set D of nodes
of T, such that no node in D is an ancestor of
another (i.e., every maximal antichain in T with
respect to the ancestry partial order), corresponds
to a decomposition of G.

A vertex v of G that belongs to a unique region
R in a decomposition is called an interior vertex
(of R). A vertex v that belongs to more than one
region is called a boundary vertex.

Let G be a graph with n vertices. Given
a parameter r < n, an r-division of G is a
decomposition of G into $O(n/r)$ regions, each
with at most r vertices and $O(\sqrt{r})$ boundary
vertices. The bounds on the number of regions
and on the number of vertices in each region
imply that an r-division is a balanced decomposition
of G. The $O(\sqrt{r})$ bound on the number
of boundary vertices immediately implies the
same bound for the overlap between different
regions in the decomposition. For an increasing
sequence $\mathbf{r} = r_1, r_2, \ldots$, a recursive r-division
is a decomposition tree T in which the nodes at
height i form an $r_i$-division.

For various applications it is useful to impose 
additional requirements, such as requiring re- 
gions to be connected, requiring that each region 
share vertices with a constant number of other 
regions, etc. One particularly useful requirement 
that is relevant to planar graphs is that the bound- 
ary vertices of each region lie on a small number 
of faces. Formally, every region R inherits its 
embedding from that of the planar graph G. A 
hole of R is a face of R that is not a face of 
G. An r-division with few holes is an r-division 
in which each region has a constant number of 
holes. 


Key Results 


Balanced graph decompositions with small overlap
are based on small balanced separators. An
n-vertex graph G has an f(n)-separator if there
exists a partition A, B, S of the vertices of G,
such that the size of S is at most f(n), the sizes
of A and B are at most 2n/3, and no edge exists
between A and B. The set S is called a separator.
The subgraphs induced on A ∪ S and B ∪ S form a
balanced decomposition of G into 2 regions with
f(n) boundary vertices.
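
The three conditions in this definition are easy to verify for a given candidate partition. The following sketch only checks a proposed separator and does not compute one; the function name and the graph encoding (vertices numbered 0 to n - 1, edges as pairs) are assumptions made for the illustration.

```python
def is_balanced_separator(n, edges, A, B, S, bound):
    """Check that (A, B, S) is a partition of range(n) with |S| <= bound,
    |A| and |B| at most 2n/3, and no edge between A and B."""
    A, B, S = set(A), set(B), set(S)
    is_partition = (A | B | S == set(range(n))
                    and len(A) + len(B) + len(S) == n)
    small = len(S) <= bound
    balanced = 3 * len(A) <= 2 * n and 3 * len(B) <= 2 * n
    no_crossing = all(
        not ((u in A and v in B) or (u in B and v in A)) for u, v in edges
    )
    return is_partition and small and balanced and no_crossing


# Hypothetical example: a path on 7 vertices separated at its middle vertex.
path_edges = [(i, i + 1) for i in range(6)]
print(is_balanced_separator(7, path_edges, {0, 1, 2}, {4, 5, 6}, {3}, bound=1))
```

The two regions of the resulting decomposition are the subgraphs induced on A ∪ S and B ∪ S, here the vertex sets {0, 1, 2, 3} and {3, 4, 5, 6}, which share the single boundary vertex 3.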

The best-known separator result for planar
graphs is the $O(\sqrt{n})$ vertex separator of Lipton
and Tarjan [16]. Consider a breadth-first-search
tree T of a planar graph G. Each BFS level
(i.e., the set of vertices at a specific distance
from the root) is a separator of G, albeit not
necessarily a small or a balanced one. Lipton and
Tarjan’s separator is based on the observation that
it is possible to construct an $O(\sqrt{n})$ balanced
separator by combining two appropriately chosen
BFS levels with a fundamental cycle with respect
to the BFS tree T.
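
The claim that every BFS level is a separator follows because an edge can only join vertices in the same level or in adjacent levels. A small sketch of this observation, assuming a connected graph stored as an adjacency dictionary (the helper names and the 3 x 3 grid are illustrative assumptions):

```python
from collections import deque


def bfs_levels(adj, root):
    """Return the BFS levels of a connected graph as a list of vertex sets."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    levels = {}
    for v, d in dist.items():
        levels.setdefault(d, set()).add(v)
    return [levels[d] for d in range(len(levels))]


def level_separates(adj, levels, k):
    """True if no edge joins a level below k to a level above k."""
    index = {v: d for d, level in enumerate(levels) for v in level}
    return all(not (index[u] < k < index[v]) for u in adj for v in adj[u])


# 3 x 3 grid graph with vertices (row, col), rooted at a corner.
grid = {(i, j): [(i + di, j + dj)
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < 3 and 0 <= j + dj < 3]
        for i in range(3) for j in range(3)}
levels = bfs_levels(grid, (0, 0))
print([len(level) for level in levels])
```

This only illustrates the level structure; it does not select balanced levels or add the fundamental cycle used in the actual construction.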


Theorem 1 (Lipton-Tarjan separator) Let G 
be an n-vertex planar graph, equipped with non- 
negative vertex weights summing to one. There 
exists a linear-time algorithm that returns a separation
A, B, S of G such that S consists of at
most $2\sqrt{2n}$ vertices, and neither A nor B has
total weight exceeding 2/3.


Note that the formulation of the theorem allows 
for a balanced separation with respect to a general 
weight function, rather than just with respect to 
the number of vertices. 

Miller gave an $O(\sqrt{n})$ simple cycle separator
[19] for planar graphs. Miller’s result can
be viewed as a version of Lipton and Tarjan’s 
separator applied to the planar dual of G. 


Theorem 2 (Miller’s cycle separator) Let G be 
an n-vertex 2-connected planar graph, equipped 
with nonnegative face weights summing to 1, such 
that no face weighs more than 2/3. Let d denote 
the maximum over all face sizes in G. There exists 
a linear-time algorithm that returns a simple 
cycle C with at most $2\sqrt{2\lfloor d/2 \rfloor n}$ vertices, such
that neither the interior of C nor the exterior of 
C has total weight exceeding 2/3. 


Similar formulations exist for vertex and edge 
weights. 

$O(\sqrt{n})$ separators are known for other families
of sparse graphs, such as graphs excluding a
fixed minor [1, 11]. However, some sparse graphs 
(e.g., expanders) do not have small separators. 

By applying a separator recursively, Frederickson
[6] showed that r-divisions exist for
graphs with $O(\sqrt{n})$ separators. Frederickson’s
construction generates a decomposition tree
whose leaves correspond to the regions of an
r-division. It consists of two phases. In the first
phase, the separator theorem is applied to each
region consisting of more than r vertices. This
results in O(n/r) regions, each with at most r
vertices and $\sqrt{r}$ boundary vertices on average. In
the second phase, the separator theorem is applied
to each region with more than $\sqrt{r}$ boundary
vertices, assigning weight only to boundary
vertices. Frederickson proves that this two-phase
process results in an r-division. A naive
implementation of Frederickson’s approach to
construct an r-division takes $O(n \log n)$ time. By
applying this approach to a contracted graph, and
then further subdividing some of the resulting
regions, an r-division can be constructed in
$O(n \log r + (n/\sqrt{r}) \log n)$ time [6].

Goodrich [7] showed that for planar graphs
an entire binary decomposition tree whose leaves
correspond to regions with a constant number
of vertices can be computed in O(n) time. This
is achieved by showing that, after linear time
preprocessing, each invocation of Lipton and Tarjan’s
separator theorem can be implemented in
sublinear time in the number of vertices of a
region. The key components of Goodrich’s algorithm
are the use of a tree-cotree pair of spanning
trees of G and of its planar dual to facilitate
the search for balanced fundamental cycles,
representing these trees using dynamic trees [22],
and the use of balanced binary search trees to
maintain BFS levels. At each recursive iteration
a separator is found in a region with $n'$ vertices
in $O(\sqrt{n'} \log n')$ time. This leads to a total linear
running time for computing a complete decomposition
tree.
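
The recursive use of a separator can be sketched on a toy example. The code below is only an illustration of the first phase (split every region with more than r vertices); it assumes a rectangular grid graph and replaces the planar separator theorem with a hypothetical "middle row or column" separator, so it is not Frederickson’s or Goodrich’s actual algorithm.

```python
def grid_graph(height, width):
    """Adjacency sets of a height x width grid; vertices are (row, col) pairs."""
    adj = {(i, j): set() for i in range(height) for j in range(width)}
    for (i, j) in adj:
        for u in ((i + 1, j), (i, j + 1)):
            if u in adj:
                adj[(i, j)].add(u)
                adj[u].add((i, j))
    return adj


def middle_line_separator(region):
    """Toy separator for a rectangular grid region: its middle row or column."""
    rows = {i for i, _ in region}
    cols = {j for _, j in region}
    axis = 0 if len(rows) >= len(cols) else 1  # split the longer side
    coords = [v[axis] for v in region]
    mid = (min(coords) + max(coords)) // 2
    A = {v for v in region if v[axis] < mid}
    B = {v for v in region if v[axis] > mid}
    S = {v for v in region if v[axis] == mid}
    return A, B, S


def divide(region, r):
    """Recursively split a region until every region has at most r vertices."""
    assert r >= 4, "the toy separator needs r >= 4 to make progress"
    if len(region) <= r:
        return [region]
    A, B, S = middle_line_separator(region)
    # Separator vertices belong to both child regions (they become boundary).
    return divide(A | S, r) + divide(B | S, r)


adj = grid_graph(8, 8)
regions = divide(set(adj), 16)
boundary = {v for v in adj if sum(v in R for R in regions) > 1}
print(len(regions), len(boundary))
```

The leaves of the recursion are the regions, and vertices that appear in more than one region are the boundary vertices. Bounding the number of boundary vertices per region requires the second phase described above, which this sketch omits.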

Subramanian and Klein [12] were the first 
to suggest r-divisions with few holes in planar 
graphs. The idea for achieving a constant number 
of holes is to use Miller’s simple cycle separa- 
tor instead of Lipton and Tarjan’s. To keep the 
number of holes constant, one needs to alternate 
the separation criterion between balance with 
respect to the number of vertices and balance 
with respect to the number of holes [5]. Since a 
cycle separator introduces at most one new hole 
into each of the resulting two regions, reducing 
the number of holes by a constant factor every 
constant number of iterations ensures that the 
number of holes in each region is bounded by a 
(small) constant. Using Frederickson’s approach,
an r-division with few holes can be constructed
in $O(n \log r + (n/\sqrt{r}) \log n)$ time [9].

Klein, Mozes, and Sommer [14] presented 
a modified version of Miller’s cycle separator 
and used it to obtain a linear-time algorithm for 
computing r-divisions with few holes in planar 
graphs (see also [2] for a similar result). Follow- 
ing Miller, their cycle separator algorithm uses 
BFS levels in the planar dual of G. They show a 
choice of a spanning tree T of G that makes their 
cycle separator algorithm very similar to Lipton 
and Tarjan’s vertex separator. Using a technique 
similar to that of Goodrich, they use this cycle 
separator algorithm to generate an entire decomposition
tree of G in linear time. They show that
alternating between three balance criteria (num- 
ber of vertices, number of boundary vertices, and 
number of holes) results in a decomposition tree 
that contains an r-division for any value of r. 
This is in contrast with Frederickson’s two-phase 
construction which targets a specific value of r. 
As a consequence, the resulting decomposition 
tree also contains a recursive r-division, for prac- 
tically any choice of r. 


Theorem 3 There exists a linear-time algorithm
that, given a planar graph G and an increasing
sequence r, computes a recursive r-division with
few holes of G.


Applications 


Separator-based decompositions are widely used
in divide-and-conquer algorithms. Lipton and
Tarjan [17] used their separator theorem to
show a variety of approximation algorithms
and subexponential-time algorithms for NP-hard
problems such as the maximum independent
set, as well as $O(n^{3/2})$-time algorithms for
problems such as maximum matching and
Gaussian elimination [18]. These recursive
algorithms implicitly generate a complete binary 
decomposition tree of the input graph. Typically, 
such algorithms only use the existence of small 
balanced separators and do not rely on planarity. 
Hence, they are applicable to families of graphs 
other than planar graphs. 

Frederickson introduced r-divisions for
computing shortest paths in a planar graph with
nonnegative arc lengths in $O(n \sqrt{\log n})$ time and
for finding a minimum st-cut or a maximum st-flow
in undirected planar graphs in $O(n \log n)$
time. Since then r-divisions were used, along
with Goodrich’s linear-time construction, in
many algorithms and in different settings
(sequential, parallel, dynamic graph problems).
A very partial list includes dynamic planar graph
algorithms [4], Laplacian solvers and electrical
flow algorithms [15, 20], and parallel algorithms
in computational geometry [7]. Henzinger
et al. [8] used a recursive r-division with roughly
$\log^* n$ levels to compute shortest paths with
nonnegative arc lengths in linear time.

Decompositions based on simple cycle separators
are also widely used in efficient algorithms
for planar graphs. Examples include maximum
flow [3, 10], shortest paths [13], and many others.
These algorithms typically rely on additional
structural properties specific to planar graphs,
such as non-crossing of shortest paths (also
known as the Monge property). Decompositions
with few holes were introduced by Klein and
Subramanian [12] to construct approximate
dynamic distance oracles for planar graphs.
Fakcharoenphol and Rao [5] used a complete
decomposition with few holes for computing
shortest paths with negative lengths in planar
graphs in $O(n \log^3 n)$ time. The currently fastest
algorithm for this problem uses r-divisions with
few holes and runs in $O(n \log^2 n / \log\log n)$
time [21]. Italiano et al. [9] used an r-division
with few holes to find a min st-cut and max
st-flow in $O(n \log\log n)$ time.


Cross-References 


Fully Dynamic Higher Connectivity for Planar 
Graphs 

Separators in Graphs 

Shortest Paths in Planar Graphs with Negative 
Weight Edges 


Recommended Reading 


1. Alon N, Seymour PD, Thomas R (1990) A separator 
theorem for graphs with an excluded minor and 
its applications. In: Proceedings of the 22nd annual 
ACM symposium on theory of computing (STOC), 
Baltimore, pp 293-299 

2. Arge L, van Walderveen F, Zeh N (2013) Multiway 
simple cycle separators and I/O-efficient algorithms 
for planar graphs. In: Proceedings of the 24th an- 
nual ACM-SIAM symposium on discrete algorithms 
(SODA), New Orleans, pp 901-918 

3. Borradaile G, Klein PN, Mozes S, Nussbaum Y, 
Wulff-Nilsen C (2011) Multiple-source multiple-sink 
maximum flow in directed planar graphs in near- 
linear time. In: Proceedings of the 52nd annual sym- 
posium on foundations of computer science (FOCS), 
Palm Springs, pp 170-179 


4. Eppstein D, Galil Z, Italiano GF, Spencer TH (1993)
Separator based sparsification for dynamic planar
graph algorithms. In: Proceedings of the 25th symposium
theory of computing, San Diego. ACM, pp 208-217.
http://www.acm.org/pubs/citations/proceedings/stoc/167088/p208-eppstein/

5. Fakcharoenphol J, Rao S (2006) Planar graphs, negative
weight edges, shortest paths, and near linear
time. J Comput Syst Sci 72(5):868-889.
http://dx.doi.org/10.1016/j.jcss.2005.05.007, preliminary version
in FOCS 2001

6. Frederickson GN (1987) Fast algorithms for shortest
paths in planar graphs with applications. SIAM J
Comput 16:1004-1022

7. Goodrich MT (1995) Planar separators and parallel
polygon triangulation. J Comput Syst Sci 51(3):374-389

8. Henzinger MR, Klein PN, Rao S, Subramanian S
(1997) Faster shortest-path algorithms for
planar graphs. J Comput Syst Sci 55(1):3-23.
doi:10.1145/195058.195092

9. Italiano GF, Nussbaum Y, Sankowski P, Wulff-Nilsen
C (2011) Improved algorithms for min cut and max
flow in undirected planar graphs. In: Proceedings
of the 43rd annual ACM symposium on theory of
computing (STOC). ACM, New York, pp 313-322.
http://doi.acm.org/10.1145/1993636.1993679

10. Johnson DB, Venkatesan S (1982) Using divide and
conquer to find flows in directed planar networks in
$O(n^{3/2} \log n)$ time. In: Proceedings of the 20th annual
Allerton conference on communication, control,
and computing, Monticello, pp 898-905

11. Kawarabayashi K, Reed BA (2010) A separator
theorem in minor-closed classes. In: 51st annual
IEEE symposium on foundations of computer science
(FOCS), Las Vegas, pp 153-162

12. Klein PN, Subramanian S (1998) A fully dynamic
approximation scheme for shortest paths in planar
graphs. Algorithmica 22(3):235-249

13. Klein PN, Mozes S, Weimann O (2010) Shortest
paths in directed planar graphs with negative
lengths: a linear-space $O(n \log^2 n)$-time algorithm.
ACM Trans Algorithms 6(2):1-18.
http://doi.acm.org/10.1145/1721837.1721846, preliminary version
in SODA 2009

14. Klein PN, Mozes S, Sommer C (2013) Structured
recursive separator decompositions for planar graphs
in linear time. In: Symposium on theory of computing
conference (STOC), Palo Alto, pp 505-514

15. Koutis I, Miller GL (2007) A linear work, $O(n^{1/6})$
time, parallel algorithm for solving planar Laplacians.
In: Proceedings of the eighteenth annual ACM-SIAM
symposium on discrete algorithms, Society for Industrial
and Applied Mathematics (SODA ’07), Philadelphia,
pp 1002-1011. http://dl.acm.org/citation.cfm?id=1283383.1283491

16. Lipton RJ, Tarjan RE (1979) A separator theorem for
planar graphs. SIAM J Appl Math 36(2):177-189

17. Lipton RJ, Tarjan RE (1980) Applications of a planar 
separator theorem. SIAM J Comput 9(3):615-627 

18. Lipton RJ, Rose DJ, Tarjan RE (1979) Generalized 
nested dissection. SIAM J Numer Anal 16:346-358 

19. Miller GL (1986) Finding small simple cycle separators
for 2-connected planar graphs. J Comput Syst Sci
32(3):265-279. doi:10.1016/0022-0000(86)90030-9

20. Miller GL, Peng R (2013) Approximate maximum 
flow on separable undirected graphs. In: Proceed- 
ings of the twenty-fourth annual ACM-SIAM sympo- 
sium on discrete algorithms (SODA), New Orleans, 
pp 1151-1170 

21. Mozes S, Wulff-Nilsen C (2010) Shortest
paths in planar graphs with real lengths in
$O(n \log^2 n / \log\log n)$ time. In: Proceedings of
the 18th European symposium on algorithms (ESA),
Liverpool, pp 206-217

22. Sleator D, Tarjan R (1983) A data structure for 
dynamic trees. J Comput Syst Sci 26(3):362-391. 
doi:10.1016/0022-0000(83)90006-5 


Reducing Bayesian Mechanism 
Design to Algorithm Design 


Yang Cai¹, Constantinos Daskalakis², and
Matthew Weinberg³

¹Computer Science, McGill University,
Montreal, QC, Canada

²EECS, Massachusetts Institute of Technology,
Cambridge, MA, USA

³Computer Science, Princeton University,
Princeton, NJ, USA


Keywords 


Equivalence of separation and optimization; Fair 
allocation; Job scheduling; Mechanism design; 
Revenue maximization 


Years and Authors of Summarized 
Original Work 


STOC2012; Cai, Daskalakis, Weinberg 
FOCS2012; Cai, Daskalakis, Weinberg 
SODA2013; Cai, Daskalakis, Weinberg 
FOCS2013; Cai, Daskalakis, Weinberg 
SODA2015; Daskalakis, Weinberg 




Problem Definition 


The goal is to design algorithms that succeed 
in models where input is reported by strategic 
agents (henceforth referred to as strategic input), 
as opposed to standard models where the input 
is directly given (henceforth referred to as honest 
input). For example, consider a resource allocation
problem where a single user has m jobs
to process on n self-interested machines. Each
machine i can process job j in time $t_{ij}$, and
this is privately known only to the machine. Each
machine reports some processing times $\hat{t}_{ij}$ to the
user, who then runs some algorithm to determine
where to process the jobs. Good approximation
algorithms are known when machines are honest
(i.e., $\hat{t}_{ij} = t_{ij}$ for all i, j) if the user’s goal is to
minimize the makespan, the time elapsed until all
jobs are completed, going back to seminal work
of Lenstra, Shmoys, and Tardos [13]. However,
such algorithms do not account for the strate- 
gic nature of the machines, which may want to 
minimize their own work: why would they report 
honestly their processing time for each job if they 
can elicit a more favorable schedule by lying? To 
accommodate such challenges, new algorithmic 
tools must be developed that draw inspiration 
from Game Theory. 

Requiring solutions to be robust against
strategic manipulation can increase
the computational difficulty of whatever
problem is at hand. The discussed works provide
a framework with which to design such solutions 
(henceforth called mechanisms) and address the 
following important question. 


Question 1 How much (computationally) more 
difficult is mechanism design than algorithm de- 
sign? 


Using this framework, we resolve this ques- 
tion with an answer of “not at all” for several 
important problems including job scheduling and 
fair allocation. Another application of our frame- 
work provides efficient algorithms and structural 
characterization results for multi-item revenue-optimal
auction design, a central open problem
in mathematical economics.


Model


Environment 

1. Set F of feasible outcomes. Interpret F as 
the set of all (feasible) allocations of jobs to 
machines, allocations of items to bidders, etc. 

2. n agents who all care about which outcome is 
chosen. 


Strategic Agents 

1. Each agent i has a value $t_i(x)$ for each outcome
$x \in F$. $t_i$ induces a function from F to
$\mathbb{R}$ and is called the agent’s type.

2. Each $t_i$ is drawn independently from some
distribution $D_i$ of finite support.

3. Agent i knows $t_i$; all other agents and the
designer know only $D_i$.

4. Agents are quasi-linear and risk neutral. That
is, the utility of an agent of type t for a
randomized outcome (distribution over outcomes)
$X \in \Delta(F)$, when he is charged price
p, is $E_{x \leftarrow X}[t(x)] - p$.

5. Agents behave in a way that maximizes util- 
ity, taking into consideration beliefs about the 
behavior of other agents. 


Designer 

1. Designs an allocation rule A and price rule 
P. A takes as input a type profile $(t_1, \ldots, t_n)$
and outputs (possibly randomly) an outcome
$A(t) \in F$. P takes as input a type profile
and outputs (possibly randomly) a price vector
P(t). The pair (A, P) is called a (direct)
mechanism. Note that it is without loss of
generality to consider only the design of direct
mechanisms by the revelation principle [14].

2. Announces A and P to agents. Invites agents
to report a type. When t is reported, selects the
outcome A(t) and charges agent i price $P_i(t)$.

3. Has some objective function O to optimize. O
may depend on the agents’ types, the outcome
selected, and the prices charged, so we write
O(t, x, P). Examples include:
* Social welfare: $O(t, x, P) = \sum_i t_i(x)$.
* Revenue: $O(t, x, P) = \sum_i P_i(t)$.
* Makespan: $O(t, x, P) = \max_i \{-t_i(x)\}$ (In
job scheduling, agents’ values from allocations
are nonpositive, since they have cost
for processing jobs. An agent’s cost for
allocation x is then $-t_i(x)$.).
* Fairness: $O(t, x, P) = \min_i \{t_i(x)\}$.
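
The four example objectives are easy to state in code. This is a minimal sketch, assuming types are represented as Python callables from outcomes to values and prices as a list of numbers; the scheduling instance and the helper `machine_type` are made up for illustration.

```python
def social_welfare(types, x, prices):
    return sum(t(x) for t in types)


def revenue(types, x, prices):
    return sum(prices)


def makespan(types, x, prices):
    # Values are nonpositive in scheduling, so -t(x) is a machine's cost.
    return max(-t(x) for t in types)


def fairness(types, x, prices):
    return min(t(x) for t in types)


# Toy scheduling instance: times[i][j] is machine i's time for job j.
times = [[2, 3, 1], [4, 1, 2]]


def machine_type(i):
    """Type of machine i: minus the total time of the jobs assigned to it."""
    return lambda x: -sum(times[i][j] for j, owner in enumerate(x) if owner == i)


types = [machine_type(0), machine_type(1)]
x = (0, 1, 0)  # job j is assigned to machine x[j]
print(makespan(types, x, [0, 0]), social_welfare(types, x, [0, 0]))
```

Here machine 0 processes jobs 0 and 2 (total time 3) and machine 1 processes job 1 (time 1), so the makespan is 3.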


Game Theoretic Definitions 

1. The interim rule of a mechanism is a function 
that takes as input an agent i and type $t_i$
and outputs the distribution of allocations and
prices that agent i sees when reporting type
$t_i$ over the randomness of the mechanism and
the other agents’ types, assuming they tell the
truth. So the interim allocation rule $(\pi, p)$ of
the mechanism (A, P) satisfies:


$\Pr[x \leftarrow \pi_i(t_i)] = E_{t_{-i} \leftarrow D_{-i}}[\Pr[A(t_i; t_{-i}) = x]]$,


$\Pr[p \leftarrow p_i(t_i)] = E_{t_{-i} \leftarrow D_{-i}}[\Pr[P_i(t_i; t_{-i}) = p]]$.


2. A mechanism is Bayesian Incentive Compatible
(BIC) if every agent receives at
least as much utility by reporting their
true type as any other type (assuming
other agents report truthfully). Formally,
$t_i(\pi_i(t_i)) - p_i(t_i) \geq t_i(\pi_i(t_i')) - p_i(t_i')$ for
all $i, t_i, t_i'$ (We use the shorthand $t_i(\pi_i(t_i'))$
to denote the expected value of $t_i$ for
the random allocation drawn from $\pi_i(t_i')$.
Formally, $t_i(\pi_i(t_i')) = E_{x \leftarrow \pi_i(t_i')}[t_i(x)]$.). A
commonly used relaxation of BIC is called
$\epsilon$-Bayesian Incentive Compatible ($\epsilon$-BIC). A
mechanism is $\epsilon$-BIC if every agent derives
at most $\epsilon$ less utility by reporting their true
type compared with any other type (assuming
other agents report truthfully). Formally,
$t_i(\pi_i(t_i)) - p_i(t_i) \geq t_i(\pi_i(t_i')) - p_i(t_i') - \epsilon$
for all $i, t_i, t_i'$.

3. A mechanism is individually rational (IR) if 
every agent has nonnegative expected utility 
by participating in the mechanism (assuming 
other agents report truthfully). Formally, 
$t_i(\pi_i(t_i)) - p_i(t_i) \geq 0$ for all $i, t_i$.
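
The interim rule and the BIC, $\epsilon$-BIC, and IR conditions can be checked by direct enumeration when supports are finite. The sketch below assumes a single-item auction with two symmetric bidders, ties broken by a fair coin, and a hypothetical `pay_rule(mine, other)` giving the winner's payment; none of these modeling choices come from the original entry.

```python
from fractions import Fraction


def interim(report, support, prob, pay_rule):
    """Interim win probability and expected payment of a bidder who reports
    `report` while the opponent reports a truthful draw from (support, prob)."""
    win = pay = Fraction(0)
    for other, q in zip(support, prob):
        if report > other:
            w = Fraction(1)
        elif report == other:
            w = Fraction(1, 2)  # ties are broken by a fair coin
        else:
            w = Fraction(0)
        win += q * w
        pay += q * w * pay_rule(report, other)
    return win, pay


def is_bic_and_ir(support, prob, pay_rule, eps=0):
    """Check IR and eps-BIC (plain BIC when eps == 0) for every type v."""
    for v in support:
        win_v, pay_v = interim(v, support, prob, pay_rule)
        truthful = v * win_v - pay_v
        if truthful < 0:
            return False  # IR is violated
        for r in support:
            win_r, pay_r = interim(r, support, prob, pay_rule)
            if truthful < v * win_r - pay_r - eps:
                return False  # misreporting r gains more than eps
    return True


support = [1, 2]
prob = [Fraction(1, 2), Fraction(1, 2)]
second_price = lambda mine, other: other  # winner pays the other report
first_price = lambda mine, other: mine    # winner pays their own report
print(is_bic_and_ir(support, prob, second_price),
      is_bic_and_ir(support, prob, first_price))
```

In this toy instance the second-price rule passes both checks, while the first-price rule with truthful reports fails BIC because a bidder of value 2 gains 1/4 by reporting 1.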


Bayesian Mechanism Design (BMeD) 

Here we describe formally the mechanism design
problem we study. BMeD is parameterized by a
set of feasible outcomes F, objective function O,
and set of possible types V. Both V and F can
be discrete or continuous. We assume that every
element $v \in V$ and $x \in F$ can be represented
by a finite bit string (v) and (x). V and F also
specify how those bit strings are interpreted. For
instance, V might be the class of all submodular
functions, and the bit strings used to represent
them may be interpreted as indexing a black-box
value oracle. Or V might be the class of all
subadditive functions, and the bit strings used to
represent them may be interpreted as an explicit
circuit. Or V could be the class of all additive
functions, and the bit strings used to represent
them may be interpreted as a vector containing
values for each item. So we are parameterizing
our problems both by the actual classes V and
F and by how elements of these classes
are represented. Now, we are ready to formally
discuss the problem BMeD(F, V, O).

BMeD(F, V, O):


INPUT: For each agent $i \in [n]$, a discrete
distribution $D_i$ over types in V, described
explicitly by listing the support of $D_i$ and the
corresponding probabilities.

OUTPUT: A BIC, IR mechanism.

GOAL: Find the mechanism that optimizes O
in expectation, with respect to all BIC, IR
mechanisms (when n bidders with types drawn
from $\times_i D_i$ report truthfully).

APPROXIMATION: An algorithm is said to be an
$(\epsilon, \alpha)$-approximation to BMeD if it outputs an
$\epsilon$-BIC mechanism whose expected value of O
(when n bidders with types drawn from $\times_i D_i$
report truthfully) is at least $\alpha \cdot \mathrm{OPT} - \epsilon$ (or at
most $\alpha \cdot \mathrm{OPT} + \epsilon$ for minimization problems).


Generalized Objective Optimization Problem (GOOP)

Here we describe formally the algorithmic problem
we show has strong connections to BMeD.
GOOP is parameterized by a set of feasible outcomes
F, objective function O, and set of possible
types V. We therefore formally discuss the
problem GOOP(F, V, O). Below, $V^*$ denotes the
closure of V under linear combinations. Functions
in $V^*$ are represented by a finite list of
elements of V, along with (possibly negative)
scalar multipliers.

GOOP(F, V, O):


INPUT: For each agent $i \in [n]$, a type $g_i \in V$,
multiplier $m_i \in \mathbb{R}$, and cost function $f_i \in V^*$.
Additionally, an indicator bit b (The indicator
bit b is included so that the optimization of just
$\sum_i f_i(x)$ (without price multipliers or O) is
formally a special case of GOOP(F, V, O).).

OUTPUT: An allocation $x \in F$, and price vector
$p \in \mathbb{R}^n$.

GOAL: Find $\arg\max_{x \in F, p} \{ b \cdot O(g, x, p) + \sum_i m_i p_i + \sum_i f_i(x) \}$
(or arg min, if O is a
minimization objective like makespan).

APPROXIMATION: (x, p) is said to be an $(\alpha, \beta)$-approximation
to GOOP if $\beta \cdot b \cdot O(g, x, p) + \sum_i m_i p_i + \sum_i f_i(x)$
is at least/most $\alpha \cdot \mathrm{OPT}$.
Note that an $(\alpha, 1)$-approximation is the standard
notion of an $\alpha$-approximation. Allowing
$\beta \neq 1$ boosts/discounts the value of O (the
objective) before comparing to $\alpha \cdot \mathrm{OPT}$. Note
also that allowing $\beta \neq 1$ provides no benefit
if b = 0.


Key Results 


We provide a poly-time black-box reduction from
BMeD(F, V, O) to GOOP(F, V, O). That is, we
provide a reduction from Bayesian mechanism
design to traditional algorithm design.


Theorem 1 Let G be an $(\alpha, \beta)$-approximation
algorithm for GOOP(F, V, O). Then for all $\epsilon > 0$,
there is an $(\epsilon, \alpha/\beta)$-approximation algorithm for
BMeD(F, V, O). If $\ell$ is the length of the input to a
BMeD(F, V, O) instance, the algorithm succeeds
with probability $1 - \exp(-\mathrm{poly}(\ell, 1/\epsilon))$, makes
$\mathrm{poly}(\ell, 1/\epsilon)$ black-box calls to G on inputs of size
$\mathrm{poly}(\ell, 1/\epsilon)$, and terminates in time $\mathrm{poly}(\ell, 1/\epsilon)$
(times the running time of each oracle call to G).


This reduction is developed in a recent series of
papers by the authors [4-7, 9]. The possibility of
failure and additive error is due to a sampling
procedure in the reduction. In addition to the
computational aspect provided in Theorem 1, our
reduction also has a structural aspect. Namely, we
provide a characterization of the optimal mechanism
in Bayesian settings.


Theorem 2 For all objectives O, feasibility
constraints F, sets of possible types V, and
inputs D to BMeD(F, V, O), the optimal
mechanism is a distribution over generalized
objective maximizers. Formally, there exists a
joint distribution $\Delta$ over an indicator bit $b^*$ and
mappings $(f_1^*, \ldots, f_n^*)$, where each $f_i^*$ maps
types $t_i$ to multipliers $m_i^*(t_i) \in \mathbb{R}$ and cost
functions $\phi_i^*(t_i) \in V^*$, such that the optimal
mechanism first samples $(b^*, f^*)$ from $\Delta$ then
maps the type profile t to the allocation and
price vector

$(x(t), p(t)) = \arg\max_{x \in F, p} \{ b^* \cdot O(t, x, p) + \sum_i m_i^*(t_i) p_i + \sum_i \phi_i^*(t_i)(x) \}$.


Perhaps the most interesting case of Theorem 2 
is when the objective is revenue. In this case, 
we may interpret the cost functions $\phi_i^* \in V^*$
as the virtual valuation function of bidder i. By
virtual valuations, we do not mean Myerson’s 
specific virtual valuation functions [14], which 
aren’t even defined for multi-item instances. In- 
stead we simply mean some virtual valuation 
functions that may or may not be the same as 
the types/valuations reported by the agents. We 
include this and other applications of Theorems 1 
and 2 below. 


Applications 


In this section, we apply Theorem 1 to the objectives
of revenue, makespan, and fairness.


Revenue Maximization 

We apply Theorem 1 to reduce the BMeD problem
of optimizing revenue in multi-item settings
to GOOP. In [7], it is shown that for this case,
one need only consider instances of GOOP with
$b = m_1 = \cdots = m_n = 0$, so the GOOP
instances that must be solved require just
optimization of the cost function (which we call
virtual welfare for this application). We obtain
the following computational and structural results
on optimal auction design in general multi-item
settings, addressing a long-standing open
question following Myerson’s seminal work on
single-item auctions [14].


Theorem 3 (Revenue Maximization, Computational)
Let G be an $\alpha$-approximation
algorithm for maximizing virtual welfare over F
when all virtual types are from $V^*$. Then for all
$\epsilon > 0$, there is an $(\epsilon, \alpha)$-approximation algorithm
for the problem BMeD(F, V, REVENUE)
that makes polynomially many black-box
calls to G. If $\ell$ is the length of the input
to a BMeD(F, V, REVENUE) instance, the
algorithm succeeds with probability
$1 - \exp(-\mathrm{poly}(\ell, 1/\epsilon))$, makes $\mathrm{poly}(\ell, 1/\epsilon)$ black-box
calls to G on inputs of size $\mathrm{poly}(\ell, 1/\epsilon)$,
and terminates in time $\mathrm{poly}(\ell, 1/\epsilon)$ (times the
running time of each oracle call to G).


Theorem 4 (Revenue Maximization, Struc- 
tural) In any multi-item setting with arbitrary 
feasibility constraints and possible agent types, 
the allocation rule of the revenue-optimal auction 
is a distribution over virtual welfare maximizers. 
Formally, there exists a distribution $\Delta$ over
mappings $(\phi_1, \ldots, \phi_n)$, where each $\phi_i$ maps
types $t_i$ to cost functions $f_i \in V^*$, such that the
allocation rule for the optimal mechanism first
samples $\phi$ from $\Delta$ then maps type profile t to the
allocation $\arg\max_{x \in F} \{\sum_i \phi_i(t_i)(x)\}$.


We further consider the following important spe- 
cial case: There are m items for sale to n buyers. 
Any allocation of items to buyers is feasible 
(that is, each item can be awarded to at most 
one buyer), so we can denote the set of feasible
allocations as $F = [n+1]^m$. Furthermore, each
buyer i has a value $v_{ij}$ for item j and is additive
across items, meaning that their value for a set S
of items is $\sum_{j \in S} v_{ij}$. So we can denote the set of
possible types as $\mathbb{R}^m$ (and have types represented
as such).


Theorem 5 (Revenue Maximization for
Additive Buyers, Computational) There is
a poly-time algorithm for GOOP($[n+1]^m$,
$\mathbb{R}^m$, REVENUE). Therefore, there is a poly-time
algorithm for BMeD($[n+1]^m$, $\mathbb{R}^m$, REVENUE)
(In this special case, no sampling is required
in the reduction, so the theorem holds even for
$\epsilon = 0$. Formally, this is a (0, 1)-approximation
(an exact algorithm). See [4] for details.).


Theorem 6 (Revenue Maximization for Additive
Buyers, Structural) In any multi-item setting
with n additive buyers and m items for sale,
the allocation rule of the revenue-optimal auction
is a distribution over virtual welfare maximizers.
Formally, there exists a distribution $\Delta$ over mappings
$(\phi_1, \ldots, \phi_n)$, where each $\phi_i$ maps types
$t_i$ to cost functions $f_i \in \mathbb{R}^m$, such that the
allocation rule for the optimal mechanism first
samples $\phi$ from $\Delta$ then awards every item j to
a buyer in $\arg\max_i \{\phi_{ij}(v_i)\}$ if their virtual value
for item j is nonnegative and does not allocate
the item otherwise.
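
Once the virtual values have been sampled, the per-item allocation step in Theorem 6 is simple. A minimal sketch follows, assuming the virtual values are already given as a matrix; computing the distribution over mappings is the hard part and is not shown. The function name and the tie-breaking rule (lowest buyer index) are illustrative assumptions.

```python
def allocate_by_virtual_values(virtual):
    """virtual[i][j] is buyer i's virtual value for item j.
    Returns a list whose j-th entry is the winning buyer's index,
    or None if item j stays unallocated."""
    n, m = len(virtual), len(virtual[0])
    allocation = []
    for j in range(m):
        # Buyer with the highest virtual value for item j (ties: lowest index).
        best = max(range(n), key=lambda i: virtual[i][j])
        # The item is awarded only if that virtual value is nonnegative.
        allocation.append(best if virtual[best][j] >= 0 else None)
    return allocation


virtual = [[3, -1, 0],
           [2, -2, 5]]
print(allocate_by_virtual_values(virtual))
```

Because buyers are additive and any allocation is feasible, virtual welfare decomposes across items, which is why each item can be handled independently.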


Job Scheduling on Unrelated Machines 

The problem of job scheduling on unrelated
machines consists of m jobs and n machines,
with machine i able to process job j in time
$t_{ij}$. The goal is to find a schedule (that assigns
each job to exactly one machine) minimizing the
makespan. Specifically, if $S_i$ are the jobs assigned
to machine i, the makespan is $\max_i \{\sum_{j \in S_i} t_{ij}\}$.
As a mechanism design problem, one considers
the machines to be strategic agents who know
their processing time for each job (but the
designer and other machines do not). In the
language of BMeD, we can denote the feasibility
constraints as $[n]^m$, the set of possible types as
$\mathbb{R}^m$, and the objective as MAKESPAN. Theorem 1
reduces BMeD($[n]^m$, $\mathbb{R}^m$, MAKESPAN) to
GOOP($[n]^m$, $\mathbb{R}^m$, MAKESPAN). It is shown in [7]
that for objectives that don’t depend on the prices
charged at all (called “allocation-only”), only
instances of GOOP with $m_i = 0$ for all i need
be considered. It is further shown in [9] that
GOOP($[n]^m$, $\mathbb{R}^m$, MAKESPAN) can be interpreted
as a job scheduling problem with costs. Specifically,
GOOP($[n]^m$, $\mathbb{R}^m$, MAKESPAN) takes as input
a processing time $t_{ij} \geq 0$ and monetary cost
$c_{ij} \in \mathbb{R}$ for all machines i and jobs j. The goal is
to find a schedule that minimizes the makespan
plus cost. Formally, partition the jobs into
disjoint sets $S_i$ to minimize
$\max_i \{\sum_{j \in S_i} t_{ij}\} + \sum_i \sum_{j \in S_i} c_{ij}$.
While it is NP-hard to approximate
GOOP($[n]^m$, $\mathbb{R}^m$, MAKESPAN) within any finite
factor, a result of Shmoys and Tardos from the
early 1990s obtains a polynomial time (1, 1/2)-approximation
algorithm [15]. In combination
with Theorem 1, this yields the following
theorem:


Theorem 7 (Job Scheduling on Unrelated
Machines) For all $\epsilon > 0$, there is a
poly-time $(\epsilon, 2)$-approximation algorithm for
BMeD($[n]^m$, $\mathbb{R}^m$, MAKESPAN). If $\ell$ is the length
of the input to a BMeD($[n]^m$, $\mathbb{R}^m$, MAKESPAN)
instance, the algorithm succeeds with probability
$1 - \exp(-\mathrm{poly}(\ell, 1/\epsilon))$ and terminates in time
$\mathrm{poly}(\ell, 1/\epsilon)$.
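
The "job scheduling with costs" objective described above can be made concrete with a brute-force solver. This sketch enumerates all $n^m$ assignments, so it is exponential and only meant to illustrate the objective on tiny instances; it is not the Shmoys-Tardos approximation algorithm, and the instance is made up.

```python
from itertools import product


def makespan_plus_cost(assignment, t, c):
    """assignment[j] is the machine that runs job j; t and c are n x m
    matrices of processing times and (possibly negative) monetary costs."""
    loads = [0] * len(t)
    cost = 0
    for j, i in enumerate(assignment):
        loads[i] += t[i][j]
        cost += c[i][j]
    return max(loads) + cost


def brute_force_schedule(t, c):
    """Minimize makespan plus cost over all assignments in [n]^m."""
    n, m = len(t), len(t[0])
    return min(product(range(n), repeat=m),
               key=lambda a: makespan_plus_cost(a, t, c))


t = [[2, 3], [4, 1]]
no_cost = [[0, 0], [0, 0]]
c = [[0, 5], [-4, 0]]
print(brute_force_schedule(t, no_cost), brute_force_schedule(t, c))
```

Without costs the best schedule puts job 0 on machine 0 and job 1 on machine 1 (makespan 2). With the costs shown, putting both jobs on machine 1 gives makespan 5 and cost -4, for a total of 1, so the optimum changes.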


Fair Allocation of Indivisible Goods 

The problem of fairly allocating indivisible goods
consists of m indivisible goods and n children,
with child i receiving value $v_{ij}$ for good j. The
goal is to find an allocation of goods (that assigns
each good to at most one child) maximizing the
fairness. Specifically, if $S_i$ are the goods allocated
to child i, the fairness is $\min_i \{\sum_{j \in S_i} v_{ij}\}$.
As a mechanism design problem, one considers
the children to be strategic agents who know
their own value for each good (but the designer
and other children do not). In the language of
BMeD, we can denote the feasibility constraints
as $[n+1]^m$, the set of possible types as $\mathbb{R}^m$, and
the objective as FAIRNESS. Theorem 1 reduces
BMeD($[n+1]^m$, $\mathbb{R}^m$, FAIRNESS) to GOOP($[n+1]^m$,
$\mathbb{R}^m$, FAIRNESS), which can be interpreted as
a fair allocation problem with costs (again, because
FAIRNESS is allocation-only) [7, 9]. Specifically,
GOOP($[n+1]^m$, $\mathbb{R}^m$, FAIRNESS) takes as
input a value $v_{ij} \geq 0$ and monetary cost $c_{ij} \in \mathbb{R}$
for all children i and goods j. The goal is to
find an allocation that maximizes the fairness
minus cost. Formally, allocate the goods into
disjoint sets $S_i$ to maximize
$\min_i \{\sum_{j \in S_i} v_{ij}\} - \sum_i \sum_{j \in S_i} c_{ij}$.
While it is NP-hard to approximate
GOOP($[n+1]^m$, $\mathbb{R}^m$, FAIRNESS) within any finite
factor, we develop poly-time $(1, m - n + 1)$-
and $(1/2, O(\sqrt{n}))$-approximation algorithms for
fair allocation with costs, based on algorithms
of Bezáková and Dani [2] and Asadpour and
Saberi [1] for fair allocation (without costs).


Theorem 8 (Fair Allocation of Indivisible
Goods) There are poly-time $(1, m - n + 1)$-
and $(1/2, O(\sqrt{n}))$-approximation algorithms for
GOOP($[n+1]^m$, $\mathbb{R}^m$, FAIRNESS). Therefore, for
all $\epsilon > 0$, there is an $(\epsilon, \min\{O(\sqrt{n}), m - n + 1\})$-approximation
algorithm for BMeD($[n+1]^m$,
$\mathbb{R}^m$, FAIRNESS). If $\ell$ is the length of the
input to a BMeD($[n+1]^m$, $\mathbb{R}^m$, FAIRNESS)
instance, the algorithm succeeds with probability
$1 - \exp(-\mathrm{poly}(\ell, 1/\epsilon))$ and terminates in time
$\mathrm{poly}(\ell, 1/\epsilon)$.


Tools for Convex Optimization 


We prove Theorems 1 and 2 by solving a
linear program over the space of possible
interim allocation rules and generalizations of
interim allocation rules that we do not discuss
here. In doing so, we also develop new tools
applicable for general convex optimization that
we discuss here. We omit full details of the
approach and refer the reader to a series of
papers by the authors [5-7, 9] for specifics of
the linear program solved and why it addresses
BMeD. Seminal works of Khachiyan [12],
Grötschel, Lovász, and Schrijver [10], and Karp
and Papadimitriou [11] study the problems
of optimization and separation over a closed,
convex region $P \subseteq \mathbb{R}^d$ (Below, we denote by
$\alpha P = \{\alpha x \mid x \in P\}$. Also, for simplicity of
exposition, we only consider P that contain the
origin, so that $\alpha P \subseteq P$ for all $\alpha \le 1$, but our
results extend to all closed, convex P. See [9]
for our most general results.). Formally, these
problems are:


Optimize(P):

INPUT: A direction $c \in \mathbb{R}^d$.
OUTPUT: A point $x \in P$.
GOAL: Find $x^* \in \arg\max_{x \in P} \{c \cdot x\}$.


Separate(P):

INPUT: A point $x \in \mathbb{R}^d$.
OUTPUT: “Yes,” or a direction $c \in \mathbb{R}^d$.
GOAL: If $x \in P$, output “yes.” Otherwise,
output any c such that $c \cdot x > \max_{y \in P} \{c \cdot y\}$.
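
As a small illustration of the two problems defined above, the sketch below implements both oracles for one concrete region, the unit $\ell_1$ ball $P = \{x : \sum_i |x_i| \le 1\}$. The choice of P and the function names are assumptions made for this example only; they are not taken from the cited works.

```python
def optimize_l1(c):
    """Optimize(P) for the unit L1 ball: a linear function is maximized
    at a signed unit vector along a coordinate of largest |c_i|."""
    i = max(range(len(c)), key=lambda k: abs(c[k]))
    x = [0.0] * len(c)
    x[i] = 1.0 if c[i] >= 0 else -1.0
    return x

def separate_l1(x):
    """Separate(P) for the unit L1 ball: answer "yes" if x is in P,
    otherwise return a direction c with c.x > max_{y in P} c.y."""
    if sum(abs(t) for t in x) <= 1.0:
        return "yes"
    # c = sign(x) gives c.x = ||x||_1 > 1, while max_{y in P} c.y = 1.
    return [1.0 if t >= 0 else -1.0 for t in x]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

x = [1.0, -1.0]                      # lies outside the ball
c = separate_l1(x)
print(dot(c, x), dot(c, optimize_l1(c)))
```

The final line compares $c \cdot x$ with the optimum of c over P, which is exactly the inequality a separating direction must satisfy.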


Khachiyan’s Ellipsoid algorithm shows that if 
one can solve the problem Separate(P) in time 
poly(d), then one can also solve Optimize(P) in 
time poly(d). Grötschel, Lovász, and Schrijver
and independently Karp and Papadimitriou show 
that the other direction holds as well: if one can 
solve Optimize(P) in time poly(d), then one can 
also solve Separate(P) in time poly(d). This is 
colloquially called “the equivalence of separation 
and optimization.” While separation as a means 
for optimization has obvious uses, optimization 
as a means for separation is more subtle. Still, nu- 
merous applications exist (including our results) 
and we refer the reader to [10, 11] for several 
others, including the first poly-time algorithm for 
submodular minimization. 

In order to provide our guarantees with respect
to approximation, we develop further the
equivalence of separation and optimization to
accommodate approximation. Specifically, consider
the following problems, further parameterized by
some $\alpha \le 1$:

$\alpha$-Optimize(P):

INPUT: A direction $c \in \mathbb{R}^d$.
OUTPUT: A point $x \in P$.
GOAL: Find x satisfying $c \cdot x \ge \alpha \max_{y \in P} \{c \cdot y\}$.


$\alpha$-Separate(P):

INPUT: A point $x \in \mathbb{R}^d$.
OUTPUT: “Yes” and a proof that $x \in P$,
or a direction $c \in \mathbb{R}^d$ (For formal details
on exactly what constitutes a proof, we refer
the reader to [6, 7, 9]. Roughly speaking, x
is written as a convex combination of points
known to be in P.).
GOAL: If $x \in \alpha P$, output “yes” and a proof that
$x \in P$. If $x \notin P$, output a direction c such that
$c \cdot x \ge \alpha \max_{y \in P} \{c \cdot y\}$. If $x \in P \setminus \alpha P$, either
is acceptable.


Theorem 9 (Approximate Equivalence of Separation
and Optimization) For all $\alpha \le 1$, the
problems $\alpha$-Optimize(P) and $\alpha$-Separate(P) are
computationally equivalent. That is, if one can
solve one in time poly(d), one can solve the other
in time poly(d) as well.


We also extend these results to accommodate
bi-criterion approximation, via the problems below,
further parameterized by some $\beta \ge 1$ and subset
$S \subseteq [d]$ of coordinates (Below, when we write
$(\beta x_S, x_{-S})$, we mean to take x and multiply each
$x_i$, $i \in S$ by $\beta$.).

$(\alpha, \beta, S)$-Optimize(P):

INPUT: A direction $c \in \mathbb{R}^d$.
OUTPUT: A point $x \in P$.
GOAL: Find x satisfying
$c \cdot (\beta x_S, x_{-S}) \ge \alpha \max_{y \in P} \{c \cdot y\}$.


$(\alpha, \beta, S)$-Separate(P):

INPUT: A point $x \in \mathbb{R}^d$.
OUTPUT: “Yes” and a proof that $x \in P$, or a
direction $c \in \mathbb{R}^d$.
GOAL: If $(\beta x_S, x_{-S}) \in \alpha P$, output “yes” and a
proof that $x \in P$. If $x \notin P$, output a direction
c such that $c \cdot (\beta x_S, x_{-S}) \ge \alpha \max_{y \in P} \{c \cdot y\}$.
If $(\beta x_S, x_{-S}) \notin \alpha P$ and $x \in P$, either is
acceptable (An astute reader might worry that
for some $\alpha, \beta, S, P$, the problem $(\alpha, \beta, S)$-Separate(P)
is impossible, due to the existence
of an $x \notin P$ such that $(\beta x_S, x_{-S}) \in \alpha P$.
For some $\alpha, \beta, S, P$, this is indeed the
case, but we show that $(\alpha, \beta, S)$-Optimize(P)
is impossible in these cases as well.).


Theorem 10 (Bi-Criterion Approximate
Equivalence of Separation and Optimization)
For all $\alpha \le 1$, $\beta \ge 1$, $S \subseteq [d]$, the
problems $(\alpha, \beta, S)$-Optimize(P) and
$(\alpha, \beta, S)$-Separate(P) are computationally equivalent.
That is, if one can solve one in time poly(d),
one can solve the other in time poly(d) as well.


More formal statements and how we apply these 
theorems to yield our main result can be found 
in [9]. Finally, the theorems hold for minimiza- 
tion as well as maximization and without the 
restriction that P contains the origin (but the 
theorem statements are more technical). 


Open Problems 


Our work provides a novel computational 
framework for solving Bayesian mechanism 
design problems. We have applied our framework 
to solve several specific important problems, such 
as computing revenue-optimal auctions in multi- 
item settings and approximately optimal BIC 
mechanisms for job scheduling, but numerous 
important settings and objectives remain
unresolved. Theorem 1 provides a concrete
approach for tackling such problems, via the
design of $(\alpha, \beta)$-approximations for the purely
algorithmic Generalized Objective Optimization 
Problem. Therefore, one important direction 
following our work is to apply our framework 
to novel settings and design algorithms for the 
resulting GOOP instances. 
Problem Definition


Consider a system of asynchronous processes 
that communicate among themselves by only 
executing read and write operations on a set of 
shared variables (also known as shared registers). 
The system has no global clock or other syn- 
chronization primitives. Every shared variable is 
associated with a process (called owner) which 
writes it and the other processes may read it. An 
execution of a write (read) operation on a shared 
variable will be referred to as a Write (Read) on 
that variable. A Write on a shared variable puts 
a value from a pre-determined finite domain into 
the variable, and a Read reports a value from the 
domain. A process that writes (reads) a variable
is called a writer (reader) of the variable. 

The goal is to construct shared variables in 
which the following two properties hold. (1) Op- 
eration executions are not necessarily atomic, that 
is, they are not indivisible but rather consist of 
atomic sub-operations, and (2) every operation 
finishes its execution within a bounded number 
of its own steps, irrespective of the presence 
of other operation executions and their relative 
speeds. That is, operation executions are wait- 
free. These two properties give rise to a classi- 
fication of shared variables, depending on their 
output characteristics. Lamport [8] distinguishes
three categories for 1-writer shared variables,
using a precedence relation on operation executions
defined as follows: for operation executions
A and B, A precedes B, denoted A → B,
if A finishes before B starts; A and B overlap
if neither A precedes B nor B precedes A.
In 1-writer variables, all the Writes are totally
ordered by “→”. The three categories of
1-writer shared variables defined by Lamport are
the following.


1. A safe variable is one in which a Read not 
overlapping any Write returns the most re- 
cently written value. A Read that overlaps 
a Write may return any value from the domain 
of the variable. 

2. A regular variable is a safe variable in which 
a Read that overlaps one or more Writes re- 
turns either the value of the most recent Write 
preceding the Read or of one of the overlap- 
ping Writes. 

3. An atomic variable is a regular variable in 
which the Reads and Writes behave as if they 
occur in some total order which is an extension 
of the precedence relation. 
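
The precedence relation and the definition of a regular variable can be illustrated with a short Python sketch. It models each operation execution as a time interval and computes which values a Read of a regular 1-writer variable is allowed to return. The `Op` class, the use of numeric timestamps, and the `initial` parameter are assumptions made for this example; they do not come from Lamport's paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    start: float          # time at which the operation execution starts
    end: float            # time at which it finishes
    value: object = None  # value written (unused for Reads)

def precedes(a, b):
    """A -> B: A finishes before B starts."""
    return a.end < b.start

def overlaps(a, b):
    """A and B overlap if neither precedes the other."""
    return not precedes(a, b) and not precedes(b, a)

def regular_read_values(read, writes, initial=None):
    """Values a Read may return on a regular 1-writer variable:
    the most recent preceding Write, or any overlapping Write."""
    before = [w for w in writes if precedes(w, read)]
    last = max(before, key=lambda w: w.end).value if before else initial
    return {last} | {w.value for w in writes if overlaps(w, read)}

writes = [Op(0, 1, "a"), Op(2, 5, "b"), Op(6, 7, "c")]
print(regular_read_values(Op(3, 4), writes))   # Read overlapping the Write of "b"
print(regular_read_values(Op(8, 9), writes))   # Read overlapping no Write
```

A safe variable would allow any domain value for the first Read, while an atomic variable would additionally constrain how the answers of successive Reads relate to each other; the sketch covers only the regular case.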


A shared variable is boolean or multivalued
depending upon whether it can hold only one
out of two or one out of more than two values
(boolean variables are referred to as bits).
A multiwriter shared variable is one that
can be written and read (concurrently) by many
processes. If there is only one writer and more
than one reader it is called a multireader variable.
Key Results


In a series of papers starting in 1974 (for details
see [4]), Lamport explored various notions of concurrent
reading and writing of shared variables,
culminating in the seminal 1986 paper [8]. That paper
formulates the notion of wait-free implementation of
an atomic multivalued shared variable, written
by a single writer and read by (another) single
reader, from safe 1-writer 1-reader 2-valued
shared variables, which are mathematical versions
of physical flip-flops; the construction was later optimized in [13].
Lamport did not consider constructions of shared 
variables with more than one writer or reader. 
Predating the Lamport paper, in 1983
Peterson [10] published an ingenious wait-free
construction of an atomic 1-writer n-reader
m-valued shared variable from
n + 2 safe 1-writer n-reader m-valued registers,
2n 1-writer 1-reader 2-valued atomic shared
variables, and 2 1-writer n-reader 2-valued
atomic shared variables. He also presented
a proper notion of the wait-freedom property.
In his paper, Peterson did not say how to construct
the n-reader boolean atomic variables from
flip-flops, while Lamport mentioned the open
problem of doing so and, incidentally, used
a version of Peterson’s construction to bridge
the algorithmically demanding step from atomic
shared bits to atomic shared multivalues. On
the basis of this work, N. Lynch, motivated by 
concurrency control of multi-user data-bases, 
posed around 1985 the question of how to 
construct wait-free multiwriter atomic variables 
from 1-writer multireader atomic variables. Her 
student Bloom [1] found in 1985 an elegant 
2-writer construction, which, however, has 
resisted generalization to multiwriter. Vitányi and
Awerbuch [14] were the first to define and explore
the complicated notion of wait-free constructions 
of general multiwriter atomic variables, in 
1986. They presented a proof method, an 
unbounded solution from 1-writer 1-reader 
atomic variables, and a bounded solution from 
1-writer n-reader atomic variables. The bounded 
solution turned out not to be atomic, but only 
achieved regularity (“Errata” in [14]). The paper 
introduced important notions and techniques
in the area, like (bounded) vector clocks, and
identified open problems like the construction 
of atomic wait-free bounded multireader shared 
variables from flip-flops, and atomic wait-free 
bounded multiwriter shared variables from 
the multireader ones. Peterson who had been 
working on the multiwriter problem for a decade, 
together with Burns, tried in 1987 to eliminate 
the error in the unbounded construction of [14] 
retaining the idea of vector clocks, but replacing 
the obsolete-information tracking technique by 
repeated scanning as in [10]. The result [11] 
was found to be erroneous in the technical 
report (R. Schaffer, On the correctness of atomic 
multiwriter registers, Report MIT/LCS/TM-364, 
1988). Neither the re-correction in Schaffer’s 
Technical Report, nor the claimed re-correction 
by the authors of [11] has appeared in print. Also 
in 1987 there appeared at least five purported 
solutions for the implementation of 1-writer n- 
reader atomic shared variable from 1-writer 1- 
reader ones: [2, 7, 12] (for the others see [4]) 
of which [2] was shown to be incorrect (S. 
Haldar, K. Vidyasankar, ACM Oper. Syst. Rev, 
26:1(1992), 87-88) and only [12] appeared in 
journal version. The paper [9], initially a 1987
Harvard Tech Report, resolved all multiuser
constructions in one stroke: it constructs
a bounded n-writer n-reader (multiwriter) atomic
variable from $O(n^2)$ 1-writer 1-reader safe bits,
which is optimal, using $O(n^2)$ bit accesses per
Read/Write operation, which is optimal as well. It
works by making the unbounded solution of [14] 
bounded, using a new technique, achieving 
a robust proof of correctness. “Projections” of 
the construction give specialized constructions 
for the implementation of 1-writer n-reader 
(multireader) atomic variables from O(n) 1- 
writer 1-reader ones using O(n) bit accesses per 
Read/Write operation, and for the implementa- 
tion of n-writer n-reader (multiwriter) atomic 
variables from n 1-writer n-reader (multireader) 
ones. The first “projection” is optimal, while the
last “projection” may not be optimal since it uses
O(n) control bits per writer while only a lower
bound of $\Omega(\log n)$ was established. Taking up
this challenge, the construction in [6] claims to
achieve this lower bound.
Timestamp System

In a multiwriter shared variable it is only required 
that every process keeps track of which pro- 
cess wrote last. There arises the general question 
whether every process can keep track of the order 
of the last Writes by all processes. A. Israeli and 
M. Li were attracted to the area by the work 
in [14], and, in an important paper [5], they raised 
and solved the question of the more general and 
universally useful notion of a bounded timestamp 
system to track the order of events in a concurrent 
system. In a timestamp system every process 
owns an object, an abstraction of a set of shared 
variables. One of the requirements of the system 
is to determine the temporal order in which the 
objects are written. For this purpose, each object 
is given a label (also referred to as a timestamp) 
which indicates the latest (relative) time when 
it has been written by its owner process. The 
processes assign labels to their respective objects 
in such a way that the labels reflect the real-time 
order in which they are written to. These systems 
must support two operations, namely labeling 
and scan. A labeling operation execution (Label- 
ing, in short) assigns a new label to an object, and 
a scan operation execution (Scan, in short) en- 
ables a process to determine the ordering in which 
all the objects are written, that is, it returns a set of 
labeled-objects ordered temporally. The concern 
is with those systems where operations can be 
executed concurrently, in an overlapped fashion. 
Moreover, operation executions must be wait- 
free, that is, each operation execution will take 
a bounded number of its own steps (the number of 
accesses to the shared space), irrespective of the 
presence of other operation executions and their 
relative speeds. Israeli and Li [5] constructed 
a bit-optimal bounded timestamp system for se- 
quential operation executions. Their sequential 
timestamp system was published in the above 
journal reference, but the preliminary concurrent 
timestamp system in the conference proceedings, 
of which a more detailed version has been cir- 
culated in manuscript form, has not been pub- 
lished in final form. The first generally accepted 
solution of the concurrent case of the bounded 
timestamp system was from Dolev and Shavit [3]. 
Their construction is of the type presented in [5] 
and uses shared variables of size O(n), where
n is the number of processes in the system.
Each Labeling requires O(n) steps, and each Scan
$O(n^2 \log n)$ steps. (A “step” accesses an O(n)-bit
variable.) In [4] the unbounded construction
of [14] is corrected and extended to obtain an 
efficient version of the more general notion of a 
bounded concurrent timestamp system. 
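
To fix the meaning of the Labeling and Scan operations, here is a deliberately simplified Python sketch of an unbounded, sequential timestamp system. It is not the bounded concurrent construction of [3], [4], or [5]: labels are unbounded integers, and operations are assumed never to overlap. The class and method names are illustrative assumptions.

```python
class UnboundedTimestamps:
    """Sequential timestamp system with unbounded integer labels.

    label[i] is the label of the object owned by process i.
    """

    def __init__(self, n):
        self.label = [0] * n

    def labeling(self, i):
        """Process i takes a label larger than every label currently in use."""
        self.label[i] = max(self.label) + 1

    def scan(self):
        """Return the process indices ordered from least to most recently labeled."""
        return sorted(range(len(self.label)), key=lambda i: self.label[i])

ts = UnboundedTimestamps(3)
for owner in (2, 0, 2, 1):
    ts.labeling(owner)
print(ts.label, ts.scan())
```

The difficulty addressed by the bounded constructions is that the labels here grow without limit and that `labeling` and `scan` are not safe to run concurrently.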


Applications 


Wait-free registers are, together with message- 
passing systems, the primary interprocess com- 
munication method in distributed computing the- 
ory. They form the basis of all constructions 
and protocols, as can be seen in the textbooks. 
Wait-free constructions of concurrent timestamp 
systems (CTSs, in short) have been shown to 
be a powerful tool for solving concurrency con- 
trol problems such as various types of mutual 
exclusion, multiwriter multireader shared vari- 
ables [14], and probabilistic consensus, by syn- 
thesizing a “wait-free clock” to sequence the 
actions in a concurrent system. For more details 
see [4]. 


Open Problems 


There is a great deal of work in the direction
of register constructions that use fewer constituent
parts, or simpler parts, or parts that can tolerate
more complex failures, than the constructions
referred to above (only, of course, where the
latter constructions were not yet optimal in the
parameter concerned). Further directions are work
on wait-free higher-typed objects, as mentioned 
above, hierarchies of such objects, and proba- 
bilistic constructions. This literature is too vast 
and diverse to be surveyed here. 


Experimental Results


Register constructions, or related constructions
for asynchronous interprocess communication,
are used in current hardware and software.
Problem Definition


Given a text string T of length n and a regular 
expression R, the regular expression matching 
problem (REM) is to find all text positions at 
which an occurrence of a string in L(R) ends 
(see below for definitions). 


Regular Expression Matching 


For an alphabet $\Sigma$, a regular expression R
over $\Sigma$ consists of elements of $\Sigma \cup \{\epsilon\}$ ($\epsilon$ denotes
the empty string) and operators · (concatenation),
| (union), and * (iteration, i.e., repeated concatenation);
the set of strings L(R) represented by
R is defined accordingly; see [7]. It is important
to distinguish two measures for the size of a
regular expression: the size, m, which is the total
number of characters from $\Sigma \cup \{\cdot, |, *\}$, and the
$\Sigma$-size, $m_\Sigma$, which counts only the characters in $\Sigma$.
As an example, for R = (A|T)((C|CG)*),
the set L(R) contains all strings that start with
an A or a T followed by zero or more strings
in the set {C, CG}; the size of R is m = 8 and
the $\Sigma$-size is $m_\Sigma = 5$. Any regular expression
can be processed in linear time so that $m = O(m_\Sigma)$
(with a small constant); the difference
becomes important when the two sizes appear as
exponents.
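
The example above can be checked with a short Python sketch. It counts the two size measures for the written form of R (in which concatenation is implicit, so only the symbols | and * are counted as operators, matching m = 8) and solves REM by brute force using Python's standard `re` module. The quadratic number of substring tests is an assumption of convenience, not one of the algorithms discussed in this entry.

```python
import re

R = "(A|T)((C|CG)*)"

# Sigma-size: characters of the alphabet; size: those plus the operators | and *.
m_sigma = sum(ch.isalpha() for ch in R)
m = m_sigma + sum(ch in "|*" for ch in R)

def end_positions(pattern, text):
    """1-based positions j such that some non-empty substring ending at j is in L(R)."""
    rx = re.compile(pattern)
    return [j for j in range(1, len(text) + 1)
            if any(rx.fullmatch(text, i, j) for i in range(j))]

print(m, m_sigma)
print(end_positions(R, "GATCCG"))
```

For the text GATCCG the occurrences found are A, T, TC, TCC, and TCCG, which end at positions 2 through 6.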


Key Results 


Finite Automata 

The classical solutions for the REM problem
involve finite automata, which are directed
graphs with the edges labelled by symbols
from $\Sigma \cup \{\epsilon\}$; their nodes are called states;
see [7] for details. Unrestricted automata are
called nondeterministic finite automata (NFA).
Deterministic finite automata (DFA) have no
$\epsilon$-labels and require that no two outgoing edges
of the same state have the same label. Regular
expressions and DFAs are equivalent, that is, the
sets of strings represented are the same, as shown
by Kleene [11]. There are two classical ways of
computing an NFA from a regular expression.
Thompson’s construction [17] builds an NFA
with up to 2m states and up to 4m edges, whereas
Glushkov-McNaughton-Yamada’s automaton
[5, 12] has the minimum number of states,
$m_\Sigma + 1$, and $O(m_\Sigma^2)$ edges; see Fig. 1. Any
NFA can be converted into an equivalent DFA
by the subset construction: each subset of the
set of states of the NFA becomes a state of the
DFA. The problem is that the DFA can have
exponentially more states than the NFA.
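
The subset construction can be sketched in a few lines of Python. The automaton below is a hand-built $\epsilon$-free NFA with $m_\Sigma + 1 = 6$ states for the example (A|T)((C|CG)*); its state numbering, the dictionary encoding, and the treatment of the initial self-loop (state 0 is re-added after every step so that matches may start anywhere) are assumptions made for illustration. Only subsets reachable from the initial state are generated.

```python
from collections import deque

# epsilon-free NFA for (A|T)((C|CG)*): state 0 is initial,
# states 1..5 correspond to the letters A, T, C, C, G in order.
DELTA = {
    (0, "A"): {1}, (0, "T"): {2},
    (1, "C"): {3, 4}, (2, "C"): {3, 4},
    (3, "C"): {3, 4}, (4, "G"): {5}, (5, "C"): {3, 4},
}
SIGMA = "ACGT"

def reachable_dfa(delta, sigma, start=0):
    """Subset construction restricted to subsets reachable from {start}."""
    init = frozenset({start})
    seen, trans, queue = {init}, {}, deque([init])
    while queue:
        subset = queue.popleft()
        for ch in sigma:
            # The initial state stays active (self-loop on every letter).
            target = frozenset({start}).union(
                *(delta.get((q, ch), ()) for q in subset))
            trans[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, trans

states, trans = reachable_dfa(DELTA, SIGMA)
print(len(states), "reachable DFA states out of", 2 ** 6, "subsets")
```

For this small expression only a handful of the 64 possible subsets are reachable, but in general the number of reachable subsets can be exponential.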
Classical Solutions

A regular expression is first converted into an
NFA or DFA which is then simulated on the
text. In order to be able to search for a match
starting anywhere in the text, a loop labelled by
all elements of $\Sigma$ is added to the initial state; see
Fig. 1.

Searching with an NFA requires linear space,
but many states can be active at the same time,
and to update them all we need, for Thompson’s
NFA, O(m) time for each letter of the text; this
gives Theorem 1. On the other hand, DFAs allow
searching time that is linear in n but require more
space for the automaton. Theorem 2 uses the DFA
obtained from Glushkov-McNaughton-Yamada’s
NFA.


Theorem 1 (Thompson [17]) The REM prob- 
lem can be solved with an NFA in O(mn) time 
and O(m) space. 
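
The NFA-based search can be sketched as follows. For brevity the sketch simulates a hand-built $\epsilon$-free NFA for (A|T)((C|CG)*) rather than Thompson’s automaton, so it illustrates the idea of maintaining the set of active states per text character, not the exact O(m)-per-letter update of Theorem 1. The state numbering and dictionary encoding are assumptions for illustration.

```python
# epsilon-free NFA for (A|T)((C|CG)*): state 0 is initial,
# states 1..5 correspond to the letters A, T, C, C, G in order.
DELTA = {
    (0, "A"): {1}, (0, "T"): {2},
    (1, "C"): {3, 4}, (2, "C"): {3, 4},
    (3, "C"): {3, 4}, (4, "G"): {5}, (5, "C"): {3, 4},
}
FINAL = {1, 2, 3, 5}

def nfa_search(delta, final, text, start=0):
    """Report 1-based text positions at which an occurrence ends."""
    active, ends = {start}, []
    for pos, ch in enumerate(text, 1):
        nxt = {start}                      # initial self-loop: a match may start anywhere
        for q in active:
            nxt |= delta.get((q, ch), set())
        active = nxt
        if active & final:
            ends.append(pos)
    return ends

print(nfa_search(DELTA, FINAL, "GATCCG"))
```

Each text character costs time proportional to the number of active states and their outgoing edges, and only the current set of active states is stored.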


Theorem 2 (Kleene [11]) The REM problem
can be solved with a DFA in $O(n + 2^{m_\Sigma})$ time
and $O(2^{m_\Sigma})$ space.


Lazy Construction and Modules 

One heuristic to alleviate the exponential increase
in the size of the DFA is to build only the states
reached while scanning the text, as implemented
in Gnu Grep. Still, the space needed for the DFA
remains a problem. A four-Russians approach
was presented by Myers [13] where a trade-off
between the NFA and DFA approaches is proposed.
The syntax tree of the regular expression
is divided into modules which are implemented
as DFAs and are thereafter treated as leaf nodes
in the syntax tree. The process continues until a
single module is obtained. An O(mn/ log n) time
and space algorithm is obtained. This bound was
recently improved by Bille and Thorup [2].

Regular Expression Matching, Fig. 1 Thompson’s NFA (left) and
Glushkov-McNaughton-Yamada’s NFA (right) for the regular expression
(A|T)((C|CG)*); the initial loops labelled A, T, C, G are not part of
the construction; they are needed for REM


Theorem 3 (Bille and Thorup [2]) The REM
problem can be solved in linear space and
$O(mn/(\log n)^{3/2})$ time.


The same authors showed in [3] that the length 
m of the regular expression can be essentially re- 
placed in the complexity bounds by the number of 
strings (concatenations of characters) that appear 
in the regular expression. 


Bit Parallelism 

The simulation of the above-mentioned modules
is done by encoding all states as bits of a single
computer word (called a bit mask) so that all can
be updated in a single operation. The method
can be used without modules to simulate an NFA
directly, as done in [20] and implemented in the
Agrep software [19]. Note that, in fact, the DFA
is also simulated: a whole bit mask corresponds
to a subset of states of the NFA, that is, one state
of the DFA.

The bit-implementation of Wu and Manber [20]
uses the property of Thompson’s
automaton that all $\Sigma$-labelled edges connect
consecutive states, that is, they carry a bit 1 from
position i to position i + 1. This makes it easy to
deal with the $\Sigma$-labelled edges, but the $\epsilon$-labelled
ones are more difficult. A table of size linear
in the number of states of the DFA needs to be
precomputed to account for the $\epsilon$-closures (sets of
states reachable from a given state by $\epsilon$-paths).

Note that in Theorems 1, 2, and 3, the
space complexity is given in words. In
Theorems 4 and 5 below, for a more practical
analysis, the space is given in bits and the
alphabet size is also taken into consideration.
For comparison, the space in Theorem 2, given
in bits, is $O(|\Sigma| m_\Sigma 2^{m_\Sigma})$.


Theorem 4 (Wu and Manber [20]) Thompson’s
automaton can be implemented using
$2m(2^{2m+1} + |\Sigma|)$ bits.


Glushkov-McNaughton-Yamada’s automaton
has different structural properties. First, it is
$\epsilon$-free, that is, there are no $\epsilon$-labels on edges.
Second, all edges incoming to a given state are
labelled the same. These properties are exploited
by Navarro and Raffinot [16] to construct a
bit-parallel implementation that requires less space.
The result is a simple algorithm for regular expression
searching which uses less space and usually
performs faster than any existing algorithm.


Theorem 5 (Navarro and Raffinot [16])
Glushkov-McNaughton-Yamada’s automaton can
be implemented using $(m_\Sigma + 1)(2^{m_\Sigma + 1} + |\Sigma|)$
bits.
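
The two structural properties above suggest a compact bit-parallel simulation, sketched here in Python for the example (A|T)((C|CG)*). State i is bit i of an integer; a table T with $2^{m_\Sigma + 1}$ entries maps a set of active states to the set of states reachable by one edge, and a mask B[c] selects the states whose incoming edges are labelled c. This is a sketch in the spirit of the Glushkov-based approach, with hand-written masks and names chosen for illustration; it is not claimed to reproduce the implementation of [16].

```python
# Bit i stands for state i; state 0 is initial, states 1..5 are A, T, C, C, G.
FOLLOW = [0b000110, 0b011000, 0b011000, 0b011000, 0b100000, 0b011000]
B = {"A": 0b000010, "T": 0b000100, "C": 0b011000, "G": 0b100000}
FINAL = 0b101110   # states 1, 2, 3, 5

def build_table(follow):
    """T[D] = union of follow[i] over all states i in the bit mask D."""
    table = [0] * (1 << len(follow))
    for mask in range(1, len(table)):
        low = (mask & -mask).bit_length() - 1        # index of the lowest set bit
        table[mask] = table[mask & (mask - 1)] | follow[low]
    return table

def bitparallel_search(text):
    """Report 1-based text positions at which an occurrence ends."""
    table, active, ends = build_table(FOLLOW), 1, []
    for pos, ch in enumerate(text, 1):
        # One table lookup and one AND update all states; bit 0 is the initial loop.
        active = (table[active] & B.get(ch, 0)) | 1
        if active & FINAL:
            ends.append(pos)
    return ends

print(bitparallel_search("GATCCG"))
```

Because every edge entering a state carries the same letter, the letter test reduces to a single AND with B[c], which is why no per-letter transition table is needed.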


All algorithms in this category run in O(n)
time, but a smaller DFA representation implies
more locality of reference and thus faster algorithms
in practice. Any algorithm using
Glushkov-McNaughton-Yamada’s automaton
can be improved by first reducing the automaton
by merging some of its states, as done by
Ilie et al. [8, 9]. The reduction can be performed
in such a way that all useful properties of the
automaton are preserved. The search becomes
faster due to the reduction in size.


Filtration 

The above approaches examine every character in 
the text. In [18] a multipattern search algorithm is 
used to search for strings that must appear inside 
any occurrence of the regular expression. Another 
technique is used in Gnu Grep; it extracts the 
longest string that must appear in any match (it 
can be used only when such a string exists). In 
[16], bit-parallel techniques are combined with a 
reverse factor search approach to obtain a very 
fast character-skipping algorithm for regular ex- 
pression searching. 


Related Problems 

Regular expressions with backreference have
a feature that helps remembering what was
matched to be used later; the matching problem
becomes NP-complete; see [1]. Extended regular
expressions involve adding two extra operators,
intersection and complement, which do not
change the expressive power. The corresponding
matching problem can be solved in $O((n + m)^4)$
time using dynamic programming; see [7,
Exercise 3.23].

Concerning finite automata construction,
recall that Thompson’s NFA has O(m) edges,
whereas the $\epsilon$-free Glushkov-McNaughton-Yamada’s
NFA can have a quadratic number
of edges. It has been shown in [4] that one can
always build an $\epsilon$-free NFA with O(m log m)
edges (for fixed alphabets). However, it is the
number of states which is more important in the
searching algorithms.


Applications 


Regular expression matching is a powerful tool in 
text-based applications, such as text retrieval and 
text editing, and in computational biology to find 
various motifs in DNA and protein sequences. 
See [6] for more details. 
Open Problems


The most important theoretical problem is 
whether linear time and linear space can be 
achieved simultaneously. Characterizing the 
regular expressions that can be searched for using 
a linear-size equivalent DFA is also of interest. 
The expressions consisting of a single string are
included here; the algorithm of Knuth, Morris,
and Pratt is based on this. Also, it is not clear
how much we can reduce an NFA efficiently (as 
done by [8,9]); the problem of finding a minimal 
NFA is PSPACE-complete; see [10]. Finally, 
for testing, it is not clear how to define random 
regular expressions. 


Experimental Results 


A disadvantage of the bit-parallel technique com- 
pared with the classical implementation of a DFA 
is that the former builds all possible subsets of 
states whereas the latter builds only the states that 
can be reached from the initial one (the other ones 
are useless). On the other hand, bit-parallel algo- 
rithms are simpler to code and more flexible (they 
allow also approximate matching), and there 
are techniques for reducing the space required. 
Among the bit-parallel versions,
Glushkov-McNaughton-Yamada-based algorithms are better
than Thompson-based ones. Modules obtain
essentially the same complexity as bit-parallel 
ones but are more complicated to implement and 
slower in practice. As the number of computer 
words increases, bit-parallel algorithms slow 
down and modules may become attractive. Note 
also that technological progress has more impact 
on the bit-parallel algorithms, as opposed to clas- 
sical ones, since the former depend very much on 
the machine word size. For details on comparison 
among various algorithms (including filtration 
based), see [15]; more recent comparisons are in 
[16], including the fastest algorithms to date. 


URLs to Code and Data Sets 


Many text editors and programming languages 
include regular expression search features. They 
are, as well, among the tools used in protein
databases, such as PROSITE and SWISS-PROT,
which can be found at www.expasy.org. The
package agrep [20] can be downloaded from
webglimpse.net and nrgrep [14] from
www.dcc.uchile.cl/gnavarro/software.
Approximate Regular Expression Matching is
a more general problem where errors are al-
Problem Definition


Many sequential decision problems ranging from 
dynamic resource allocation to robotics can be 
formulated in terms of stochastic control and 
solved by methods of reinforcement learning. 


Reinforcement Learning 


Therefore, reinforcement learning (a.k.a neuro- 
dynamic programming) has become one of the 
major approaches to tackling real-life problems. 

In reinforcement learning, an agent wanders 
in an unknown environment and tries to max- 
imize its long-term return by performing ac- 
tions and receiving rewards. The most popular 
mathematical models to describe reinforcement 
learning problems are the Markov Decision Pro- 
cess (MDP) and its generalization, the partially 
observable MDP. In contrast to supervised learn- 
ing, in reinforcement learning, the agent is learn- 
ing through interaction with the environment and 
thus influences the “future.” One of the chal- 
lenges that arises in such cases is the exploration- 
exploitation dilemma. The agent can choose ei- 
ther to exploit its current knowledge and perhaps 
not learn anything new or to explore and risk 
missing considerable gains. 

While reinforcement learning contains many 
problems, due to lack of space, this entry focuses 
on the basic ones. For a detailed history of the 
development of reinforcement learning, see [1, 
Chapter 1]. The focus of this entry is on Q- 
learning and Rmax. 


Notation 


Markov Decision Process 

A Markov decision process (MDP) formalizes 
the following problem. An agent is in an envi- 
ronment, which is composed of different states. 
In each time step, the agent performs an action 
and as a result observes a signal. The signal 
is composed from the reward to the agent and 
the state it reaches in the next time step. More 
formally the MDP is defined as follows: 


Definition 1 A Markov decision process (MDP)
M is a 4-tuple (S, A, P, R), where S is a set
of states, A is a set of actions, $P^a_{s,s'}$ is the
transition probability from state s to state s' when
performing action $a \in A$ in state s, and R(s, a) is
the reward distribution when performing action a
in state s.


A strategy for an MDP assigns, at each time
t, for each state s a probability for performing
action $a \in A$, given a history
$F_{t-1} = \{s_1, a_1, r_1, \ldots, s_{t-1}, a_{t-1}, r_{t-1}\}$ which includes
the states, actions, and rewards observed until
time t - 1. While executing a strategy $\pi$, an
agent performs at time t action $a_t$ in state $s_t$
and observes a reward $r_t$ (distributed according
to $R(s_t, a_t)$), and a next state $s_{t+1}$ (distributed
according to $P^{a_t}_{s_t, \cdot}$). The sequence of rewards is
combined into a single value called the return.
The agent’s goal is to maximize the return. There
are several natural ways to define the return.


• Finite horizon: The return of policy $\pi$ for a
given horizon H is $\sum_{t=0}^{H} r_t$.
• Discounted return: For a discount parameter
$\gamma \in (0, 1)$, the discounted return of policy $\pi$ is
$\sum_{t=0}^{\infty} \gamma^t r_t$.
• Undiscounted return: The return of policy $\pi$
is $\lim_{t \to \infty} \frac{1}{t+1} \sum_{i=0}^{t} r_i$.
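
The three definitions of return can be compared on a finite reward sequence with a short Python sketch. Evaluating the infinite sums and the limit on a finite prefix is an assumption made here purely for illustration, as are the function names.

```python
def finite_horizon_return(rewards, horizon):
    """Sum of r_0, ..., r_H."""
    return sum(rewards[: horizon + 1])

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over the observed prefix of the reward sequence."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def average_return(rewards):
    """Average reward over the observed prefix (finite-t term of the limit)."""
    return sum(rewards) / len(rewards)

rewards = [1, 0, 2]
print(finite_horizon_return(rewards, 1))
print(discounted_return(rewards, 0.5))
print(average_return(rewards))
```

With $\gamma = 0.5$ the reward of 2 received at time 2 contributes only $2 \cdot 0.25 = 0.5$, which shows how discounting favours early rewards.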


Due to lack of space, only discounted return,
which is the most popular approach mainly due
to its mathematical simplicity, is considered. The
value function for each state s, under policy $\pi$,
is defined as $V^\pi(s) = E^\pi[\sum_{i=0}^{\infty} r_i \gamma^i]$, where the
expectation is over a run of policy $\pi$ starting at
state s. The state-action value function for using
action a in state s and then following $\pi$ is defined
as $Q^\pi(s, a) = R(s, a) + \gamma \sum_{s'} P^a_{s,s'} V^\pi(s')$.


There exists a stationary deterministic optimal
policy, $\pi^*$, which maximizes the return from any
start state [11]. This implies that for any policy $\pi$
and any state $s$, $V^{\pi^*}(s) \ge V^{\pi}(s)$, and
$\pi^*(s) = \arg\max_a Q^{\pi^*}(s,a)$. A policy $\pi$ is
$\epsilon$-optimal if $\|V^{\pi^*} - V^{\pi}\|_\infty \le \epsilon$.


Problems Formulation 
The reinforcement learning problems are divided 
into two categories, planning and learning. 


Planning 

Given an MDP in its tabular form, compute the
optimal policy. An MDP is given in its tabular
form if the 4-tuple $(S, A, P, R)$ is given explicitly.
The standard methods for the planning problem
in MDPs are given below.


Value Iteration 
Value iteration is defined as follows. Start
with some initial value function, $C_s$, and then
iterate using the Bellman operator,

$TV(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s' \in S} P^a_{s,s'} V(s') \right]:$

$V_0(s) = C_s$
$V_{t+1}(s) = TV_t(s).$

This method relies on the fact that the Bellman
operator is a contraction. Therefore, the distance
between the optimal value function and the current
value function contracts by a factor of $\gamma$ with
respect to the max norm ($\ell_\infty$) in each iteration.
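The iteration above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original entry: the nested-list layout `P[a][s][s2]` and `R[s][a]` (with `R` holding expected rewards), the zero initial value function, and the stopping tolerance `tol` are all assumptions made for the example.

```python
def value_iteration(P, R, gamma, tol=1e-8):
    """Approximate the optimal value function of a tabular MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a
    (assumed layout); R[s][a] is the expected reward (assumed layout).
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states  # initial value function C_s = 0 (an arbitrary choice)
    while True:
        # One application of the Bellman operator T.
        new_V = [
            max(
                R[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        # Stop once successive iterates are close in the max norm.
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
```

Because the operator contracts by $\gamma$ in the max norm, the loop terminates for any $\gamma < 1$; the tolerance only controls how close the returned vector is to the fixed point.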


Policy Iteration 

This algorithm starts with an initial policy $\pi_0$ and
iterates over policies. The algorithm has two phases
in each iteration. In the first phase, the value
evaluation step, a value function for $\pi_t$ is calculated
by finding the fixed point of $T_{\pi_t} V_{\pi_t} = V_{\pi_t}$,
where

$T_{\pi_t} V(s) = R(s, \pi_t(s)) + \gamma \sum_{s' \in S} P^{\pi_t(s)}_{s,s'} V(s').$

The second phase, the policy improvement step,
takes the next policy $\pi_{t+1}$ to be a greedy policy
with respect to $V_{\pi_t}$. It is known that policy
iteration converges in no more iterations than value
iteration. In practice the convergence of policy
iteration is very fast.
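The two phases can be sketched as follows. This is an illustrative sketch under assumptions not stated in the entry: the tabular layout `P[a][s][s2]` and `R[s][a]` is hypothetical, and the evaluation step finds the fixed point of $T_{\pi}$ by repeated application rather than by solving the linear system exactly.

```python
def evaluate_policy(P, R, gamma, policy, tol=1e-10):
    """Value evaluation step: iterate T_pi until it (numerically) reaches its fixed point."""
    n_states = len(R)
    V = [0.0] * n_states
    while True:
        new_V = [
            R[s][policy[s]]
            + gamma * sum(P[policy[s]][s][s2] * V[s2] for s2 in range(n_states))
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def policy_iteration(P, R, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    n_states, n_actions = len(R), len(R[0])
    policy = [0] * n_states  # arbitrary initial policy pi_0
    while True:
        V = evaluate_policy(P, R, gamma, policy)
        # Policy improvement step: act greedily with respect to V.
        greedy = [
            max(
                range(n_actions),
                key=lambda a: R[s][a]
                + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states)),
            )
            for s in range(n_states)
        ]
        if greedy == policy:
            return policy, V
        policy = greedy
```

Ties between actions are broken toward the lowest action index. With an approximate evaluation step, near-ties could in principle make the stability test fragile, so a production implementation would likely compare values with a tolerance.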


Linear Programming 

This approach formulates and solves an MDP 
as a linear program (LP). The LP variables are 
$V_1, \ldots, V_n$, where $V_i = V(s_i)$. The definition is:

Variables:  $V_1, \ldots, V_n$
Minimize:   $\sum_i V_i$
Subject to: $V_i \ge R(s_i, a) + \gamma \sum_j P^a_{s_i,s_j} V_j \quad \forall a \in A,\ s_i \in S.$


Learning 


Given the state and action identities, learn an
(almost) optimal policy through interaction with
the environment. The methods are divided into
two categories: model-free learning and
model-based learning.

The widely used Q-learning [16] is a model-free
algorithm. This algorithm belongs to the
class of temporal difference algorithms [12].
Q-learning is an off-policy method, i.e., its updates
depend only on the observed trajectory and not on
the policy that generated the trajectory.


Q-Learning 
The algorithm estimates the state-action value
function (for discounted return) as follows:

$Q_0(s,a) = 0$
$Q_{t+1}(s,a) = (1 - \alpha_t(s,a))\, Q_t(s,a) + \alpha_t(s,a)\,(r_t(s,a) + \gamma V_t(s')),$

where $s'$ is the state reached from state $s$ when
performing action $a$ at time $t$, and
$V_t(s) = \max_a Q_t(s,a)$. Assume that $\alpha_t(s',a') = 0$ if at
time $t$ action $a'$ was not performed at state $s'$.
A learning rate $\alpha_t$ is well behaved if for every
state-action pair $(s,a)$: (1) $\sum_{t=1}^{\infty} \alpha_t(s,a) = \infty$ and
(2) $\sum_{t=1}^{\infty} \alpha_t^2(s,a) < \infty$. As will be seen, this is
necessary for the convergence of the algorithm.
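A minimal tabular sketch of this update rule is given below. Several choices are assumptions made for illustration and do not come from the entry: the environment is a caller-supplied function `step(s, a, rng)` returning a `(reward, next_state)` pair, actions are chosen uniformly at random (any policy that keeps visiting every state-action pair would do, since the method is off-policy), and the learning rate is $1/\#(s,a,t)^{0.7}$, which is well behaved in the sense above.

```python
import random


def q_learning(step, n_states, n_actions, gamma, n_steps, seed=0):
    """Tabular Q-learning along a single trajectory.

    `step(s, a, rng)` is a hypothetical environment interface returning
    (reward, next_state).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]  # Q_0(s, a) = 0
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0  # arbitrary start state
    for _ in range(n_steps):
        a = rng.randrange(n_actions)  # uniform exploration policy (an assumption)
        r, s_next = step(s, a, rng)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.7  # polynomial learning rate
        # Only the visited pair is updated; alpha is implicitly 0 elsewhere.
        Q[s][a] = (1.0 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_next]))
        s = s_next
    return Q
```

The greedy policy is then read off as the maximizing action in each row of `Q`.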

The model-based algorithms are very simple 
to describe; they simply build an empirical model 
and use any of the standard methods to find the 
optimal policy in the empirical (approximate) 
model. The main challenge in these methods 
is in balancing exploration and exploitation and 
having an appropriate stopping condition. Several 
algorithms give a nice solution for this [3,7]. A 
version of these algorithms appearing in [6] is 
described below. 

On an intuitive level, a state becomes
known once it has been visited "enough" times, so
that its parameters can be estimated with good
accuracy with high probability. Let K denote the
set of known states. The modified empirical
model is defined as follows. All states that are
not in K are represented by a single absorbing
state in which the reward is maximal (which
causes exploration). The probability of moving to
the absorbing state from a state s ∈ K is the
empirical probability of moving out of K from s,
and the probability of moving between states in K
is the empirical probability.

Algorithm 1: A model-based algorithm (Rmax)

Set K = ∅;
if s ∈ K then
    Execute π(s)
else
    Execute a random action;
    if s becomes known then
        K = K ∪ {s};
        Compute optimal policy, π, for
        the modified empirical model
    end
end

Sample complexity [6] measures how many 
samples an algorithm needs in order to learn. 
Note that the sample complexity translates into 
the time needed for the agent to wander in the 
MDP. 


Key Results 


The first theorem shows that the planning problem
is easy as long as the MDP is given in its
tabular form, and one can use the algorithms
presented in the previous section.


Theorem 1 ([10]) Given an MDP, the planning
problem is P-complete.


The learning problem can also be solved efficiently
using the Rmax algorithm, as shown below.


Theorem 2 ([3,7]) Rmax computes an $\epsilon$-optimal
policy from state s with probability at least $1-\delta$
with sample complexity polynomial in |A|, |S|, $1/\epsilon$,
and $\log(1/\delta)$, where s is the state in which the
algorithm halts. Also, the algorithm's computational
complexity is polynomial in |A| and |S|.


The fact that Q-learning converges in the limit
to the optimal Q function (which guarantees that
the greedy policy with respect to the Q function
will be optimal) is now shown.

Theorem 3 ([17]) If every state-action pair is visited
infinitely often and the learning rate is well
behaved, then $Q_t$ converges to $Q^*$ with probability
one.


The last statement is regarding the convergence 
rate of Q-learning. This statement must take into 
consideration some properties of the underlying 
policy, and assume that this policy covers the 
entire state space in reasonable time. The next 
theorem shows that the convergence rate of Q- 
learning can vary according to the tuning of the 
algorithm parameters. 


Theorem 4 ([4]) Let L be the time needed for
the underlying policy to visit every state-action
pair with probability 1/2. Let T be the time until
$\|Q^* - Q_T\| \le \epsilon$ with probability at least $1-\delta$,
and let #(s,a,t) be the number of times action a
was performed at state s until time t. Then if
$\alpha_t(s,a) = 1/\#(s,a,t)$, then T is polynomial in
L, $1/\epsilon$, $\log(1/\delta)$ and exponential in $1/(1-\gamma)$.
If $\alpha_t(s,a) = 1/\#(s,a,t)^{\omega}$ for $\omega \in (1/2, 1)$,
then T is polynomial in L, $1/\epsilon$, $\log(1/\delta)$, and $1/(1-\gamma)$.


Applications 


The biggest successes of reinforcement learning
so far are mentioned here. For a list of
successful applications of reinforcement learning, see
http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/SuccessesOfRL.


Backgammon Tesauro [14] used temporal dif- 
ference learning combined with neural networks 
to design a player that learned to play backgam- 
mon by playing itself and resulted in a player at 
the level of the world’s top players. 


Helicopter control Ng et al. [9] used inverse re- 
inforcement learning for autonomous helicopter 
flight. 


Open Problems 


While in this entry only MDPs given in their tabular
form were discussed, much current research
is dedicated to two major directions: large state
space and partially observable environments.

In many real-world applications, such as 
robotics, the agent cannot observe the state 
she is in and can only observe a signal which 
is correlated with it. In such scenarios, the 
MDP framework is no longer suitable, and 
another model is in order. The most popular
reinforcement learning model for such environments
is the partially observable MDP (POMDP). Unfortunately,
for POMDP even the planning problems are 
intractable (and not only for the optimal policy 
which is not stationary but even for the optimal 
stationary policy); the learning contains even 
more obstacles as the agent cannot repeat the 
same state twice with certainty, and thus, it is 
not obvious how she can learn. An interesting 
open problem is trying to characterize when a 
POMDP is “solvable” and when it is hard to 
solve according to some structure. 

In most applications, the assumption that the 
MDP can be represented in its tabular form is not 
realistic and approximate methods are in order. 
Unfortunately, not much is known theoretically
under such conditions. Here are a few of the
prominent directions for tackling large state spaces.


Function Approximation 

The term “function approximation” is due to 
the fact that this approach takes examples from 
a desired function (e.g., a value function) and 
constructs an approximation of the entire func- 
tion. Function approximation is an instance of 
supervised learning, which is studied in machine 
learning and other fields. In contrast to the tabular
representation, this time a parameter vector $\Theta$
represents the value function. The challenge is
to learn the optimal parameter vector in the
sense of minimum square error, i.e.,

$\min_{\Theta} \sum_{s \in S} \left(V^{\pi}(s) - V(s,\Theta)\right)^2,$

where $V(s,\Theta)$ is the approximation function.
One of the most important function approximations
is the linear function approximation,

$V_t(s,\Theta) = \sum_{i=1}^{T} \phi_s(i)\, \Theta_t(i),$

where each state has a vector of features,
$\phi_s$. A feature-based function approximation
was analyzed and demonstrated in [2, 15]. The
main goal here is designing algorithms which
converge to almost optimal policies under realistic
assumptions.


Factored Markov Decision Process 

In an FMDP, the set of states is described via
a set of random variables $X = \{X_1, \ldots, X_n\}$,
where each $X_i$ takes values in some finite domain
$\mathrm{Dom}(X_i)$. A state s defines a value $x_i \in \mathrm{Dom}(X_i)$
for each variable $X_i$. The transition model is
encoded using a dynamic Bayesian network.
Although the representation is efficient, not only is
finding an $\epsilon$-optimal policy intractable [8], but it
cannot be represented succinctly [1]. However,
under assumptions on the FMDP structure, there
exist algorithms such as [5] that have both
theoretical guarantees and nice empirical results.
Problem Definition


Consider a system in which n + 1 processes
$P_0, \ldots, P_n$ communicate either by message
passing or by reading and writing a shared
memory. Processes are asynchronous: there are
no upper or lower bounds on their speeds, and up
to t of them may fail undetectably by halting. In
the renaming task proposed by Attiya, Bar-Noy,
Dolev, Peleg, and Reischuk [1], each process is
given a unique input name taken from a range
0, ..., N, and chooses a unique output name
taken from a strictly smaller range 0, ..., K. To
rule out trivial solutions, a process's decision
function must depend only on input names, not
its preassigned identifier (so that $P_i$ cannot simply
choose output name i). Attiya et al. showed that
the task has no solution when K = n, but does
have a solution when K = N + t. In 1993,
Herlihy and Shavit [2] showed that the task has
no solution when K < N + t.

Vertexes, simplexes, and complexes model decision
tasks. (See the companion article entitled
Topology Approach in Distributed Computing.)
A process's state at the start or end of a task
is represented as a vertex $\vec{v}$ labeled with that
process's identifier and a value, either input or
output: $\vec{v} = (P_i, v_i)$. Two such vertexes are
compatible if (1) they have distinct process
identifiers, and (2) those processes can be assigned those
values together. For example, in the renaming
task, input values are required to be distinct, so
two input vertexes are compatible only if they
are labeled with distinct process identifiers and
distinct input values.

Figure 1 shows the output complex for the
three-process renaming task using four names. 
Notice that the two edges marked A are identical, 
as are the two edges marked B. By identifying 
these edges, this task defines a simplicial complex 
that is topologically equivalent to a torus. Of 
course, after changing the number of processes or 
the number of names, this complex is no longer 
a torus. 
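As a small sanity check on the torus claim, the output complex can be enumerated directly and its Euler characteristic computed. This is an illustrative sketch, not taken from the entry: vertexes are modeled as hypothetical `(process, name)` pairs, and a triangle is any assignment of distinct names to the three processes. An Euler characteristic of 0 is consistent with a torus but does not by itself prove it.

```python
from itertools import combinations, permutations


def renaming_output_complex(n_names):
    """Enumerate the 2-dimensional output complex for three processes.

    A vertex is a (process, name) pair; a triangle assigns distinct
    names to processes 0, 1, and 2.
    """
    triangles = [
        tuple(enumerate(names)) for names in permutations(range(n_names), 3)
    ]
    vertices = {v for t in triangles for v in t}
    edges = {frozenset(e) for t in triangles for e in combinations(t, 2)}
    return vertices, edges, triangles


def euler_characteristic(n_names):
    """Return V - E + F for the three-process output complex."""
    vertices, edges, triangles = renaming_output_complex(n_names)
    return len(vertices) - len(edges) + len(triangles)
```

For four names this gives 12 vertexes, 36 edges, and 24 triangles, so the characteristic is 0. With five names the count changes, in line with the remark that the complex is then no longer a torus.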


Key Results 


Theorem 1 Let $S^n$ be an n-simplex, and $S^m$
a face of $S^n$. Let $S$ be the complex consisting of
all faces of $S^m$, and $\dot{S}$ the complex consisting of all
proper faces of $S^m$ (the boundary complex of $S$). If
$\sigma(\dot{S})$ is a subdivision of $\dot{S}$, and $\phi: \sigma(\dot{S}) \to \mathcal{F}(S)$
a simplicial map, then there exists a subdivision
$\tau(S)$ and a simplicial map $\psi: \tau(S) \to \mathcal{F}(S)$
such that $\tau(\dot{S}) = \sigma(\dot{S})$, and $\phi$ and $\psi$ agree on
$\sigma(\dot{S})$.


Informally, any simplicial map of an m-sphere to
$\mathcal{F}$ can be "filled in" to a simplicial map of the
(m+1)-disk. A span for $\mathcal{F}(S^n)$ is a subdivision $\sigma$
of the input simplex $S^n$ together with a simplicial
map $\phi: \sigma(S^n) \to \mathcal{F}(S^n)$ such that for every
face $S^m$ of $S^n$, $\phi: \sigma(S^m) \to \mathcal{F}(S^m)$. Spans
are constructed one dimension at a time. For
each $\vec{s} = (P_i, v_i) \in S^n$, $\phi$ carries $\vec{s}$ to the
solo execution by $P_i$ with input $v_i$. For each
$S^1 = (\vec{s}_0, \vec{s}_1)$, Theorem 1 implies that $\phi(\vec{s}_0)$
and $\phi(\vec{s}_1)$ can be joined by a path in $\mathcal{F}(S^1)$.
For each $S^2 = (\vec{s}_0, \vec{s}_1, \vec{s}_2)$, the inductively
constructed spans define each face of the
boundary complex $\phi: \sigma(S^1_{ij}) \to \mathcal{F}(S^1_{ij})$, for
$i, j \in \{0, 1, 2\}$. Theorem 1 implies that one
can "fill in" this map, extending the subdivision
from the boundary complex to the entire
complex.


Theorem 2 If a decision task has a protocol
in asynchronous read/write memory, then each 
input simplex has a span. 


One can restrict attention to protocols that have 
the property that any process chooses the same 
name in a solo execution. 


Definition 1 A protocol is comparison-based if
the only operations a process can perform on
processor identifiers are tests for equality and
order; that is, given two identifiers P and Q, a process can
test for P = Q, P < Q, and P > Q, but
cannot examine the structure of the identifiers in
any more detail.


Lemma 1 If a wait-free renaming protocol for K
names exists, then a comparison-based protocol
exists.


Proof Attiya et al. [1] give a simple comparison-based
wait-free renaming protocol that uses 2n + 1
output names. Use this algorithm to assign each
process an intermediate name, and use that
intermediate name as input to the K-name protocol. □


Renaming, Fig. 1 Output complex for 3-process renaming with 4 names


Comparison-based algorithms are symmetric on
the boundary of the span. Let $S^n$ be an input
simplex, $\phi: \sigma(S^n) \to \mathcal{F}(S^n)$ a span, and $\mathcal{R}$ the
output complex for 2n names. Composing the
span map $\phi$ and the decision map $\delta$ yields a map
$\sigma(S^n) \to \mathcal{R}$. This map can be simplified by
replacing each output name by its parity, replacing
the complex $\mathcal{R}$ with the binary n-sphere $B^n$:


$\mu: \sigma(S^n) \to B^n. \quad (1)$


Denote the simplex of $B^n$ whose values are all
zero by $0^n$, and all one by $1^n$.


Lemma 2 $\mu^{-1}(0^n) = \mu^{-1}(1^n) = \emptyset$.

Proof The range 0, ..., 2n − 1 does not contain
n + 1 distinct even names or n + 1 distinct odd
names. □
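The counting argument in this proof can be checked by brute force for small n. The sketch below is purely illustrative (the function name is hypothetical): it enumerates every assignment of n + 1 distinct names from 0, ..., 2n − 1 and collects those whose names all share one parity.

```python
from itertools import permutations


def single_parity_assignments(n):
    """Return all assignments of n + 1 distinct names from range(2n)
    in which every chosen name has the same parity."""
    return [
        names
        for names in permutations(range(2 * n), n + 1)
        if len({name % 2 for name in names}) == 1
    ]
```

The list is empty for every n, because the range contains only n even and n odd names; this is the pigeonhole fact the lemma relies on.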


The n-cylinder $C^n$ is the binary n-sphere without
$0^n$ and $1^n$. Informally, the rest of the argument
proceeds by showing that the boundary of the
span is "wrapped around" the hole in $C^n$ a
nonzero number of times.

The span $\sigma(S^n)$ (indeed any subdivided
n-simplex) is a (combinatorial) manifold with
boundary: each (n − 1)-simplex is a face of either
one or two n-simplexes. If it is a face of two,
then the simplex is an internal simplex, and
otherwise it is a boundary simplex. An orientation
of $S^n$ induces an orientation on each n-simplex of
$\sigma(S^n)$ so that each internal (n − 1)-simplex
inherits opposite orientations. Summing these oriented
simplexes yields a chain, denoted $\sigma_*(S^n)$, such
that


$\partial \sigma_*(S^n) = \sum_{i=0}^{n} (-1)^i \sigma_*(\mathrm{face}_i(S^n)).$


The following is a standard result about the ho- 
mology of spheres. 


Theorem 3 Let the chain $0^n$ be the simplex $0^n$
oriented like $S^n$. (1) For 0 < m < n, any two
m-cycles are homologous, and (2) every n-cycle $C^n$
is homologous to $k \cdot \partial 0^n$, for some integer k. $C^n$
is a boundary if and only if k = 0.


Let $S^m$ be the face of $S^n$ spanned by solo
executions of $P_0, \ldots, P_m$. Let $0^m$ denote some
m-simplex of $C^n$ whose values are all zero. Which
one will be clear from context.


Lemma 3 For every proper face $S^{m-1}$ of $S^n$,
there is an m-chain $\alpha(S^{m-1})$ such that


$\mu_*(\sigma_*(S^m)) - 0^m - \sum_{i=0}^{m} (-1)^i \alpha(\mathrm{face}_i(S^m))$


is a cycle.

Proof By induction on m. When m = 1,
$ids(S^1) = \{i, j\}$. $0^1$ and $\mu_*(\sigma_*(S^1))$ are
1-chains with a common boundary $(P_i, 0) - (P_j, 0)$,
so $\mu_*(\sigma_*(S^1)) - 0^1$ is a cycle, and
$\alpha((P_i, 0)) = \emptyset$.

Assume the claim for m, $1 \le m < n - 1$. By
Theorem 3, every m-cycle is a boundary (for m <
n − 1), so there exists an (m + 1)-chain $\alpha(S^m)$
such that


$\mu_*(\sigma_*(S^m)) - 0^m - \sum_{i=0}^{m} (-1)^i \alpha(\mathrm{face}_i(S^m)) = \partial \alpha(S^m).$


Taking the alternating sum over the faces of
$S^{m+1}$, the $\alpha(\mathrm{face}_i(S^m))$ cancel out, yielding


$\mu_*(\partial \sigma_*(S^{m+1})) - \partial 0^{m+1} = \sum_{i=0}^{m+1} (-1)^i \partial \alpha(\mathrm{face}_i(S^{m+1})).$


Rearranging terms yields


$\partial \left( \mu_*(\sigma_*(S^{m+1})) - 0^{m+1} - \sum_{i=0}^{m+1} (-1)^i \alpha(\mathrm{face}_i(S^{m+1})) \right) = 0,$


implying that


$\mu_*(\sigma_*(S^{m+1})) - 0^{m+1} - \sum_{i=0}^{m+1} (-1)^i \alpha(\mathrm{face}_i(S^{m+1}))$


is an (m + 1)-cycle. □


Theorem 4 There is no wait-free renaming protocol
for (n + 1) processes using 2n output
names.


Proof Because


$\mu_*(\sigma_*(S^{n-1})) - 0^{n-1} - \sum_{i=0}^{n} (-1)^i \alpha(\mathrm{face}_i(S^{n-1}))$


is a cycle, Theorem 3 implies that it is homologous
to $k \cdot \partial 0^n$, for some integer k. Because
$\mu$ is symmetric on the boundary of $\sigma(S^n)$, the
alternating sum over the (n − 1)-dimensional
faces of $S^n$ yields:


$\mu_*(\partial \sigma_*(S^n)) - \partial 0^n \sim (n + 1) k \cdot \partial 0^n$

or

$\mu_*(\partial \sigma_*(S^n)) \sim (1 + (n + 1) k) \cdot \partial 0^n.$

Since there is no value of k for which
(1 + (n + 1)k) is zero, the cycle $\mu_*(\partial \sigma_*(S^n))$ is not
a boundary, a contradiction. □


Applications 


The renaming problem is a key tool for un- 
derstanding the power of various asynchronous 
models of computation. 


Open Problems 


Characterizing the full power of the topological 
approach to proving lower bounds remains an 
open problem. 
Introduction


Fueled by the growth of the Internet and advancements
in online advertising techniques, today
more and more online firms rely on advertising
revenue for their business. These firms
include news agencies, media outlets, search
engines, and social and professional networks.
Much of this online advertising business
is moving to what is called programmatic
buying, where an advertiser bids for each single
impression, sometimes in real time, depending
on how he values the ad opportunity. This work
is motivated by the need for a desirable property
in the auction mechanisms that are used in these
bid-based advertising systems.

A standard mechanism for most auction 
scenarios is the famous Vickrey-Clarke-Groves 
(VCG) mechanism. VCG is incentive compatible 
(IC) and maximizes social welfare. Incentive 
compatibility guarantees that the best response 
for each advertiser is to report its true valuation. 
This makes the mechanism transparent and 
removes the load from the advertisers to calculate 
the best response. Social welfare is the sum of the 
valuations of the winners. This value is treated 
as a proxy for how much all the participants 
gain from the transaction. What makes the VCG
mechanism versatile is that it reduces the
mechanism design problem to an optimization
problem for any scenario.

Although this versatility makes VCG a popular
choice of mechanism, it does not satisfy an
important property, namely, revenue monotonicity.
Revenue monotonicity says that if one increases the bid
values or adds new bidders, the total revenue
should not go down. To see that VCG is not
revenue monotone, consider a simple example of
two items and three bidders (A, B, and C). Say,
bidder A wants only the first item and has a bid
of 2. Similarly bidder B wants only the second
item and has a bid of 2. Bidder C wants both the
items or nothing and has a bid of 2. Now if only
bidders A and C participate in the auction, then
VCG gives a revenue of 2 (whichever of the two
wins pays 2, the welfare the other would have
obtained alone); however, if all the
three bidders participate, then A and B win and
each pays 0, so the revenue goes down to 0.
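The payments in this example can be reproduced with a brute-force VCG computation for single-minded bidders. The sketch is illustrative only: the dictionary layout mapping a bidder name to a `(bundle, bid)` pair is an assumption, ties are broken toward the first allocation found, and the exhaustive search is only sensible for tiny instances.

```python
from itertools import combinations


def best_allocation(bidders):
    """Return (welfare, winners) maximizing total bid over disjoint bundles.

    `bidders` maps a name to (set_of_items, bid); this layout is hypothetical.
    """
    names = sorted(bidders)
    best = (0, ())
    for size in range(1, len(names) + 1):
        for group in combinations(names, size):
            items = [item for name in group for item in bidders[name][0]]
            if len(items) == len(set(items)):  # bundles must not overlap
                welfare = sum(bidders[name][1] for name in group)
                if welfare > best[0]:
                    best = (welfare, group)
    return best


def vcg_revenue(bidders):
    """Sum of VCG payments: each winner pays the harm it causes the others."""
    welfare, winners = best_allocation(bidders)
    revenue = 0
    for winner in winners:
        others = {n: b for n, b in bidders.items() if n != winner}
        welfare_without_winner, _ = best_allocation(others)
        welfare_of_others_now = welfare - bidders[winner][1]
        revenue += welfare_without_winner - welfare_of_others_now
    return revenue


# The three bidders of the example: A wants item 1, B wants item 2, C wants both.
EXAMPLE = {"A": ({1}, 2), "B": ({2}, 2), "C": ({1, 2}, 2)}
```

In this sketch the revenue is 2 when the bundle bidder C competes against just one of the single-item bidders, and it drops to 0 once all three bidders participate.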

This lack of revenue monotonicity (which has 
been noted several times in the literature) is one 
of the serious practical drawbacks of the cele- 
brated VCG mechanism. To think of it, an online 
firm that depends on advertising revenue puts 
significant resources in its sales efforts to attract 
more bidders as the general belief is that more 
bidders imply more competition which should 
lead to higher prices. Now to tell this firm that 
their revenue can go down if they get more 
bidders can be strategically very confusing for 
them. To see this from another perspective, say, in 
a search engine firm, there is a team which makes 
a UI change that increases the click-through prob- 
ability (CTR) of the search ads. These changes 
are thought of as good changes in the firm as 
they increase the effective bid of the bidders 
(the effective bid of a bidder in search adver- 
tising is a function of its cost-per-click bid and 
the CTR of its ad). Now if after making the 
change, the revenue goes down, what was sup- 
posed to be a good change may seem like a bad 
change. The point we are trying to make is that 
there are many teams in a firm, and for these 
teams to function properly, it is important that the 
auction mechanisms satisfy revenue monotonic- 
ity. 

In this entry, with a focus on auctions arising
in advertising scenarios, we seek to understand
mechanisms that satisfy this additional property
of revenue monotonicity (RM). It is well
known that for various settings (including ours),
no mechanism can satisfy both IC and RM 
properties while attaining optimal social welfare. 
In fact it is known that one cannot even hope 
to get Pareto-optimality in social welfare while 
attaining both IC and RM [10]. Thus to overcome 
this bottleneck and develop an understanding 
of RM mechanisms, we relax the requirement 
of attaining full social welfare and define the 
notion of price of revenue monotonicity (PORM). 
Price of revenue monotonicity of an IC and RM 
mechanism M is the ratio of optimal social 
welfare to the social welfare attained by the 
mechanism M. The goal is to design mechanisms 
that satisfy IC and RM properties and at the same 
time achieve low price of revenue monotonicity. 
To the best of our knowledge, this is the first 
work that defines and studies this notion of price 
of revenue monotonicity. 

We study two different advertising settings 
in this entry. The first setting we study is the 
image-text auction. In image-text auction there 
is a special box designated for advertising in a 
publisher’s website which can be filled by either 
k text ads or a single image ad. The second setting 
is the video-pod auction where an advertising 
break of a certain duration in a video content 
can be filled with multiple video ads of possibly 
different durations. 

We note that revenue monotonicity is an 
across-instance constraint as it requires total 
revenue to behave in a certain manner across 
different instances, where a single instance is 
defined by fixing the type of the buyers. Note 
that incentive compatibility is also an across- 
instance constraint. A lot of research effort has 
gone into understanding incentive compatibility, 
which has resulted in useful tools for designing 
incentive-compatible mechanisms. Surprisingly, 
hardly any work has gone into understanding 
and building tools for designing mechanisms 
which satisfy the desired property of revenue 
monotonicity. We believe that understanding 
revenue monotonicity will shed new fundamental 
insights into the design of mechanisms for many 
practical scenarios. 
Related Work

Ausubel and Milgrom [1] show that VCG satisfies
RM if bidders' valuations satisfy bidder
submodularity. Bidders' valuations satisfy bidder
submodularity if and only if for any bidder i and
any two sets of bidders $S, S'$ with $S \subseteq S'$ we
have $WF(S \cup \{i\}) - WF(S) \ge WF(S' \cup \{i\}) - WF(S')$,
where WF(S) is the maximum social
welfare achievable using only S. Note that this
is a general tool one can use to design
revenue-monotone mechanisms: restrict the range of
the possible allocations such that we get bidder
submodularity when we run VCG on this range.
However, we can show that this general tool is
not so powerful by showing that for our auction
scenarios, it is not possible to get a mechanism
with PORM better than $\Omega(k)$ by using the above
tool.

Ausubel and Milgrom [1] also show that bidder
submodularity is guaranteed when the goods
are substitutes, i.e., the valuation function of each
bidder is submodular over the goods. However,
for many practical scenarios, including ours, the
valuation function of the bidders is not submodular.
Ausubel and Milgrom [1] design mechanisms
which select allocations that are in the
core of the exchange economy for combinatorial
auctions. Here an allocation is in the core if
no coalition of bidders and the seller can
trade among themselves in a way that all
members of the coalition prefer to
the allocation. Day and Milgrom [3] show that
core-selecting mechanisms that choose a core
allocation which minimizes the seller's revenue
satisfy RM, given that bidders follow a so-called
best-response truncation strategy. Therefore the
core-selecting mechanism designed by [1] satisfies
RM if the participants play such a best-response
strategy, although this mechanism is not incentive
compatible.

Rastegari et al. [10] prove that no mechanism
for general combinatorial auctions which satisfies
IC and RM can achieve weakly maximal social
welfare. An allocation is weakly maximal if it
cannot be modified to make at least one participant
better off without hurting anyone else. In
another work [9] they design a randomized
mechanism for combinatorial auctions which achieves
weak maximality and expected revenue
monotonicity.

Another related work is around the charac- 
terization of mechanisms that achieve the IC 
property. The classic result of Roberts [11] 
states that affine maximizers are the only social 
choice functions that can be implemented using 
IC mechanisms when bidders have unrestricted 
quasi-linear valuations. Subsequent works study 
the restricted cases [2, 6, 12, 13]. 

There is also an extensive body of research 
around designing mechanisms with good bounds 
on the revenue. Myerson [7] designs a mecha- 
nism which achieves the optimal expected rev- 
enue in the single parameter Bayesian setting. 
Goldberg et al. consider optimizing revenue in
prior-free settings (see [8] for a survey on this).


Our Results 

As mentioned earlier, we study two settings: (1)
image-text auction and (2) video-pod auction.
Both these settings can be described using the
following abstract model. Say, there is a seller
selling k identical items to n participants/buyers.
Participant i wants either $d_i$ items or nothing
and has a valuation of $v_i$ if it gets $d_i$ items or
0 otherwise. Demand $d_i$ is assumed to be public
knowledge, and valuation $v_i$ is assumed to be the
private information of participant i. We want
to design a mechanism that is incentive
compatible, individually rational (IR), and revenue
monotone and maximizes social welfare.

For the image-text auction, the demand $d_i \in \{1, k\}$,
i.e., each participant wants either 1 item
(text ad) or k items (image ad). For the video-pod
auction, an item corresponds to a unit time
interval (say, one second), and the demand $d_i$
could be any number between 1 and k, i.e., $d_i \in [k]$.

The first result of this entry is the following 
theorem. 


Theorem 1 We design a deterministic mechanism
for image-text auction (MITA) which
satisfies individual rationality (IR), IC, and RM
with PORM of at most $\sum_{i=1}^{k} \frac{1}{i} \approx \ln(k)$, i.e., the
ratio of the optimal welfare over MITA's welfare
is at most about $\ln(k)$.
The proof of Theorem 1 appears in section
"Image-Text Auctions." We outline our mechanism
here: Let $v_1 \ge \ldots \ge v_n$ be the valuations
of text participants and $V_1$ be the maximum
valuation of the image participants. If $\max_{j \in [k]} j \cdot v_j$
is less than $V_1$, MITA gives all the items to the
image participant who has valuation $V_1$; otherwise
MITA picks the highest $j^*$ text participants
as the winners, where $j^*$ is the maximum number
in [k] such that $j^* \cdot v_{j^*} \ge V_1$. Note that the j that
maximizes $j \cdot v_j$ might be less than $j^*$ (which
is the largest j such that $j \cdot v_j \ge V_1$). Also note
that MITA sometimes picks fewer than k text ads
as the winners (even if there are k or more text
ads). VCG always picks the maximum number
of text ads (if it decides to allocate the slot to
text ads); this is one of the reasons why VCG
fails to satisfy RM. When we allow a smaller number
of text ads to be declared as winners, intuitively,
this increases the competition, which boosts the
revenue and thus helps in achieving RM, although
this comes with a loss in social welfare.
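The allocation rule outlined above can be sketched as follows. This is a hedged illustration of the winner-selection step only: the function name and the return convention (a kind label plus a winner count) are hypothetical, payments are not modeled, and the handling of an auction with no bidders at all is an added assumption.

```python
def mita_allocation(text_values, image_values, k):
    """Decide who wins the slot under the outlined MITA rule.

    Returns ("image", 1) if the top image ad wins all k items,
    ("text", j_star) if the j_star highest text ads win, or
    ("none", 0) if there are no bidders (an assumption of this sketch).
    """
    texts = sorted(text_values, reverse=True)[:k]  # at most k text winners
    top_image = max(image_values, default=0.0)
    # All j with j * v_j >= V_1; the rule takes the largest such j.
    feasible = [
        j for j in range(1, len(texts) + 1) if j * texts[j - 1] >= top_image
    ]
    if feasible:
        return ("text", max(feasible))
    if image_values:
        return ("image", 1)
    return ("none", 0)
```

For example, with text valuations 10, 4, 3, 1, an image valuation of 9, and k = 4, the products $j \cdot v_j$ are 10, 8, 9, 4. The maximizer is j = 1, but the rule selects $j^* = 3$, the largest j whose product still reaches 9.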
Surprisingly, we can also show that the above
mechanism achieves the optimal PORM for the
image-text auction by proving a matching lower
bound. We show that a mechanism that satisfies
IR, IC, RM, and two additional mild
assumptions of anonymity (AM) and independence
of irrelevant alternatives (IIA) cannot achieve a
PORM better than $\sum_{i=1}^{k} \frac{1}{i}$. Anonymity means
that the auction mechanism doesn't depend on the
identities of the participants (a formal definition
appears in section "Image-Text Auctions"). IIA
means that decreasing the bid of a losing
participant shouldn't hurt any winner. Note that our
mechanism satisfies both AM and IIA as well.
Formally, we prove the following theorem, whose
proof appears in section "Image-Text Auctions."


Theorem 2 There is no deterministic mechanism
which satisfies IR, IC, RM, AM, and IIA and
has PORM less than $\sum_{i=1}^{k} \frac{1}{i}$.


Finally we prove the following theorem for 
video-pod auctions. 


Theorem 3 We design a mechanism for video-pod
auction (MVPA) which satisfies IR, IC, and
RM with PORM of at most $(\lfloor \log k \rfloor + 1) \cdot (2 + \ln k)$.
We give the formal proof of Theorem 3 in
section "Video-Pod Auctions" and outline the
mechanism here. MVPA partitions the participants
into $(\lfloor \log k \rfloor + 1)$ groups, where each group
$g \in [\lfloor \log k \rfloor + 1]$ contains only the participants whose
demands are in the range $[2^{g-1}, 2^g)$. MVPA selects
winners only from one group. We round up the
size of each participant in group g to $2^g$; thus
we can have at most $k/2^g$ winners from
group g. Let $v^{(g)}_1 \ge v^{(g)}_2 \ge \ldots$ be the
sorted valuations of all the participants in group
g. We define the max possible revenue of group
g (MPRG(g)) to be


$\mathrm{MPRG}(g) = \max_{j \in [k/2^g]} j \cdot v^{(g)}_j.$


As the name of MPRG(g) suggests, its value
captures the maximum revenue we can truthfully
obtain from group g without violating revenue
monotonicity. Let $g^*$ be the group with the highest
MPRG value and group $g'$ be the group whose
MPRG is the second highest. The set of winners
is the first j participants from group $g^*$, where
j is the largest number in $[k/2^{g^*}]$ such that
$j \cdot v^{(g^*)}_j$ is greater than or equal to MPRG($g'$). We
show that the PORM of MVPA is at most
$(\lfloor \log k \rfloor + 1) \cdot (2 + \ln k)$.


Preliminaries 


Let $N = \{1, \ldots, n\}$ be the set of all participants
and $k$ be the number of identical items. We denote
the type of participant $i$ by $\theta_i = (d_i, v_i) \in [k] \times \mathbb{R}^+$,
where $d_i$ is the number of items participant
$i$ demands and $v_i$ is her valuation for getting
$d_i$ items. Note that the valuation of participant $i$ for
getting fewer than $d_i$ items is 0. In the image-text
auction, participants have demand of either
1 or $k$. In the video-pod auction, participants
can have arbitrary demands in $\{1, \ldots, k\}$. We
denote the set of all possible types $[k] \times \mathbb{R}^+$ by $\Theta$
and the set of all type profiles of $n$ participants by
$\Theta^n = \Theta \times \cdots \times \Theta$.


A deterministic mechanism $M$ consists of an
allocation rule $x : \Theta^n \to 2^N$, which maps each
type profile to a subset of participants as the
winners, and a payment rule $p : \Theta^n \to (\mathbb{R}^+)^n$,
which maps each type profile to the payments of
each participant.

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_n) \in \Theta^n$ be a specific
type profile. Also let $\mathcal{A}_\theta$ be the set of all feasible
solutions, i.e.,

$$\mathcal{A}_\theta = \Big\{ S \subseteq [n] \;\Big|\; \sum_{i \in S} d_i \le k \Big\}.$$

For each feasible solution $A \in \mathcal{A}_\theta$, the social
welfare of $A$ (denoted by WF($A$)) is equal to
$\sum_{i \in A} v_i$. To evaluate the social welfare of a
mechanism on a type profile $\theta$, we compare
the welfare of its solution to the optimal solution.


Definition 1 The welfare ratio of mechanism
$M = (x, p)$ on type profile $\theta \in \Theta^n$ (denoted by
WFR($M, \theta$)) is the following:

$$\mathrm{WFR}(M, \theta) = \frac{\max_{A \in \mathcal{A}_\theta} \mathrm{WF}(A)}{\mathrm{WF}(x(\theta))}$$

To capture the worst-case loss in social welfare
across all type profiles, we define the notion
of price of revenue monotonicity.


Definition 2 The Price of Revenue Monotonicity
of a mechanism $M$ (denoted by PORM($M$))
is defined as follows:

$$\mathrm{PORM}(M) = \max_{\theta \in \Theta^n} \mathrm{WFR}(M, \theta)$$


The desired goal is to design mechanisms 
which have low PORM value, where the best 
possible value is 1. 
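
Definition 1 can be checked by brute force on small instances. The following is a minimal sketch, assuming a type profile is given as a list of (demand, valuation) pairs and the mechanism's winners as a tuple of indices; the function names are hypothetical and the enumeration is exponential, so it is only meant for toy examples.

```python
from itertools import combinations

def feasible_sets(types, k):
    """Yield every index subset whose total demand is at most k."""
    n = len(types)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(types[i][0] for i in subset) <= k:
                yield subset

def welfare(types, winners):
    """Social welfare WF(A): sum of the winners' valuations."""
    return sum(types[i][1] for i in winners)

def welfare_ratio(types, k, winners):
    """WFR: optimal feasible welfare divided by the welfare of `winners`."""
    best = max(welfare(types, s) for s in feasible_sets(types, k))
    return best / welfare(types, winners)

# One image participant (demand k = 3) and two text participants.
profile = [(3, 1), (1, 1), (1, 1)]
print(welfare_ratio(profile, 3, (0,)))    # image participant wins
print(welfare_ratio(profile, 3, (1, 2)))  # both text participants win
```

PORM is the maximum of this ratio over all type profiles, which cannot be enumerated; the sketch only evaluates a single profile.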

Note that since we are interested in mechanisms
with bounded PORM, we restrict ourselves
to mechanisms that satisfy consumer sovereignty.
Consumer sovereignty says that any participant
can be a winner as long as she bids high enough.

Now we define a weakly monotone allocation
rule, which is used in the characterization
of deterministic IC mechanisms. Let function
$x_i : \Theta^n \to \{0, 1\}$ be the restriction of function $x$ to
participant $i$. Here $x_i(\cdot)$ is one if participant $i$ is a
winner and zero otherwise.


Definition 3 We call allocation function $x$
weakly monotone if for any type profile $\theta \in \Theta^n$
and any participant $i \in [n]$ with demand
$d_i$, function $x_i((d_i, v_i), \theta_{-i})$ is a non-decreasing
function in $v_i$.


Note that if a deterministic mechanism $M$
satisfies consumer sovereignty and has a weakly
monotone allocation function, then function
$x_i((d_i, v_i), \theta_{-i})$ is a single-step function. The
value at which the function $x_i((d_i, v_i), \theta_{-i})$
jumps from zero to one, i.e., the smallest value
at which participant $i$ becomes a winner, is
called the critical value.


Definition 4 Let $M = (x, p)$ be a deterministic
mechanism that satisfies consumer sovereignty
and has a weakly monotone allocation function;
the critical value of participant $i$ in type profile $\theta$
is $v_i^* = \sup\{ v_i \mid x_i((d_i, v_i), \theta_{-i}) = 0 \}$.


The following lemma characterizes determin- 
istic IC mechanisms (first given by [7]). We pro- 
vide a proof sketch for the sake of completeness 
(for a complete proof, see, e.g., [8]). 


Lemma 1 Let $M = (x, p)$ be a mechanism
which satisfies IR. Mechanism $M$ is truthful (IC)
if and only if the following hold:


1. $x$ is weakly monotone.
2. If participant $i$ is a winner, then its payment is
its critical value ($v_i^*$).


Proof First we prove that if $M$ is truthful, then
it satisfies both conditions 1 and 2. We prove
the first condition by contradiction. If $x$ is not
monotone, then there exist participant $i$, type
profile $\theta$, and two values $v_i^{(1)} > v_i^{(2)}$ such
that $i$ wins in type profile $((d_i, v_i^{(2)}), \theta_{-i})$ but
loses in type profile $((d_i, v_i^{(1)}), \theta_{-i})$. This gives
participant $i$ an incentive to lie in type profile
$((d_i, v_i^{(1)}), \theta_{-i})$ and announce its valuation as
$v_i^{(2)}$.

Consider an arbitrary participant $i$ who is
a winner; now we prove that the payment of
participant $i$ is its critical value. Assume for
contradiction that mechanism $M$ charges participant
$i$ amount $c_i$ where $c_i < v_i^*$ in a type profile
$((d_i, v_i), \theta_{-i})$. In this case, if participant $i$ had
type $(d_i, v'_i)$ where $c_i < v'_i < v_i^*$, then $i$ is not
a winner in $((d_i, v'_i), \theta_{-i})$ as $v_i^*$ is the critical
value. Therefore, if the real type of participant
$i$ is $(d_i, v'_i)$, she has an incentive to report her type as
$(d_i, v_i)$, become a winner, and pay $c_i$. Hence, the
payment cannot be less than $v_i^*$. Now suppose
that there exists value $v_i$ for which mechanism
$M$ charges $i$ amount $c_i$ which is more than $v_i^*$. In
this case, if participant $i$ had type $(d_i, v'_i)$ where
$v_i^* < v'_i < c_i$, then $i$ is still a winner (as $v_i^*$ is the
critical value) and pays at most $v'_i$ (as $M$ satisfies
IR). Therefore, she has an incentive to report her type
as $(d_i, v'_i)$, become a winner, and pay at most $v'_i$.
Hence, the payment cannot be more than $v_i^*$ for
any winning valuation $v_i$.

For the other direction, it is easy to check
that any IR mechanism that satisfies conditions 1
and 2 is truthful. □


Image-Text Auctions 


In this section we give our mechanism for image-text
auction (MITA) which satisfies IR, IC, RM,
and PORM(MITA) $\le \ln k$. Recall that in the
image-text auction we have $k$ identical items to
sell and there are two groups of participants: the
ones who want all the $k$ items, which we call
image participants, and the ones who want only
one item, which we refer to as text participants.
As a result there are also two possible types of
outcome: MITA gives all the items to an image
participant, or it gives an item to each member of
a subset of the text participants.

We start with explaining why VCG fails to 
satisfy RM and how we address this issue in 
MITA. Consider the type profile where we have 
one image participant with type (k,1) and one 
text participant with type (1, 1). In this case either 
of the participants can be the winner. The pay- 
ment of the winner in VCG is her critical value 
which is one. However if we add one more text 
participant with the same type (1, 1), the two text 
participants win and each of them pays zero. The 
reason for the payment drop is that VCG always 
selects k winners from the text participants. This 
decreases the critical value of each text partici- 
pant as the valuation of the other text participants 
helps her to win against image participants. In 
our mechanism we overcome this issue by not
guaranteeing that the maximal number of text
participants can win an item. In other words, in
our mechanism it is possible that fewer than $k$ text
participants win an item even if there are more
than $k$ text participants. This way, intuitively,
even if the number of text participants increases,
it potentially creates more competition and hence
increases the payments.
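
The revenue drop in the VCG example above can be reproduced numerically. The following is a minimal sketch, assuming $k = 2$ (the text does not fix $k$), types given as (demand, valuation) pairs, and ties broken in favor of the first welfare-maximizing set found; `best_allocation` and `vcg_revenue` are hypothetical helper names.

```python
from itertools import combinations

def best_allocation(types, k, excluded=None):
    """Return (welfare, winners) of a welfare-maximizing feasible set.

    Participant `excluded`, if given, is left out of the auction.
    """
    ids = [i for i in range(len(types)) if i != excluded]
    feasible = [
        s
        for r in range(len(ids) + 1)
        for s in combinations(ids, r)
        if sum(types[i][0] for i in s) <= k
    ]
    winners = max(feasible, key=lambda s: sum(types[i][1] for i in s))
    return sum(types[i][1] for i in winners), winners

def vcg_revenue(types, k):
    """Sum of VCG payments: each winner pays the externality it imposes."""
    welfare, winners = best_allocation(types, k)
    revenue = 0
    for i in winners:
        welfare_without_i, _ = best_allocation(types, k, excluded=i)
        others_welfare = welfare - types[i][1]
        revenue += welfare_without_i - others_welfare
    return revenue

print(vcg_revenue([(2, 1), (1, 1)], 2))          # one image, one text participant
print(vcg_revenue([(2, 1), (1, 1), (1, 1)], 2))  # one more text participant added
```

Adding the second text participant lowers the total VCG revenue, which is exactly the violation of revenue monotonicity that MITA is designed to avoid.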

Let $\theta$ be an arbitrary type profile where there
are $n_1$ text participants with types $(1, v_1), \ldots,
(1, v_{n_1})$ and $n_2$ image participants with types
$(k, V_1), \ldots, (k, V_{n_2})$. We define mechanism
MITA $= (x^{\mathrm{MITA}}, p^{\mathrm{MITA}})$ by giving allocation
function $x^{\mathrm{MITA}}$ which is weakly monotone.
Given the allocation function, we obtain payment
function $p^{\mathrm{MITA}}$ using the critical values defined
in Lemma 1, which makes the mechanism
truthful.


Allocation rule of MITA. Without loss of generality,
we assume that $v_1 \ge v_2 \ge \cdots \ge v_{n_1}$
and $V_1 \ge V_2 \ge \cdots \ge V_{n_2}$. Also, we assume
that $n_1 \ge k$; if not, we add fake text participants
with value 0. For each $j \in [k]$, we consider
value $j \cdot v_j$. Let candidate set $C_\theta$ contain all the
values $j \in [k]$ such that $j \cdot v_j$ is greater than or
equal to $V_1$, i.e., $C_\theta = \{ j \in [k] \mid j \cdot v_j \ge V_1 \}$.
If $C_\theta$ is empty, the image participant with type
$(k, V_1)$ wins. If $C_\theta$ is nonempty, then let $j^*$ be the
maximum member of $C_\theta$, i.e., $j^* = \max_{j \in C_\theta} j$.
In this case the first $j^*$ text participants win.
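
The allocation rule above can be sketched in a few lines. This is only an illustration under stated assumptions: valuations are passed as plain lists, a missing image participant is treated as $V_1 = 0$, and the function name `mita_allocation` and its return convention are hypothetical.

```python
def mita_allocation(text_values, image_values, k):
    """Return ("image", 1) or ("text", j_star) following the MITA rule.

    text_values / image_values: valuations of text and image participants.
    """
    v = sorted(text_values, reverse=True)
    v += [0] * max(0, k - len(v))          # pad with fake text participants
    V1 = max(image_values, default=0)      # highest image valuation
    # Candidate set C_theta: all j in [k] with j * v_j >= V_1 (v is 0-indexed).
    candidates = [j for j in range(1, k + 1) if j * v[j - 1] >= V1]
    if not candidates:
        return ("image", 1)                # the top image participant wins
    return ("text", max(candidates))       # the first j* text participants win

print(mita_allocation([5, 3, 1], [6], 3))   # candidates are {2}
print(mita_allocation([1, 1], [10], 3))     # no candidate, image wins
```

In the first call only $j = 2$ satisfies $j \cdot v_j \ge V_1$ (since $2 \cdot 3 \ge 6$), so two text participants win even though three items are available.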


Observation 1 Allocation function $x^{\mathrm{MITA}}$ is
weakly monotone.


Proof Recall from Definition 3 that in order to prove
that $x^{\mathrm{MITA}}$ is weakly monotone, we have to show
that for any participant $i \in [n]$ with demand
$d_i$, function $x_i((d_i, v_i), \theta_{-i})$ is a non-decreasing
function in $v_i$.

If $i$ is an image participant, then $i$ wins if its
valuation is larger than $\max(W, \max_{j \in [k]} j \cdot v_j)$,
where $W$ is the largest valuation of the
image participants in $\theta_{-i}$. Moreover, bidder
$i$ loses for any value smaller than or equal to
$\max(W, \max_{j \in [k]} j \cdot v_j)$. Therefore $x_i$ is weakly
monotone.

If $i$ is a text participant, then let $v'_1 \ge v'_2 \ge \cdots$
be the sorted valuations of the text participants
and $V_1$ be the largest valuation of image
participants in $\theta_{-i}$. Let $t$ be the smallest value
such that there exists $j \in [k-1]$ where $v'_{j+1} \le
t \le v'_j$ and $(j+1) \cdot t$ is greater than or equal to
$V_1$. If the valuation of bidder $i$ is larger than or
equal to $t$, then she wins since $(j+1) \cdot t \ge V_1$;
otherwise she does not win since $t$ is the smallest
value for which there exists $j \in [k-1]$ such
that $(j+1) \cdot t \ge V_1$. Therefore $x_i$ is weakly
monotone. □


In the following lemma we obtain the critical
value (or truthful payments) of the winners in
$x^{\mathrm{MITA}}$ using Lemma 1. The lemma also gives
an intuition for why we select $j^*$ text participants
to win, where $j^*$ is the maximum $j$ such that $j \cdot v_j \ge V_1$.


Lemma 2 If $C_\theta$, where $C_\theta = \{ j \in [k] \mid j \cdot v_j \ge V_1 \}$,
is empty, then the first image participant
wins all the items with critical value
$\max(V_2, \max_{j \in [k]} j \cdot v_j)$. If $C_\theta$ is not empty, the
first $j^*$ text participants win the items, where
$j^* = \max_{j \in C_\theta} j$, and all of them have critical
value $\max(v_{k+1}, V_1 / j^*)$.
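
The critical values of Lemma 2 translate directly into payments. The following is a minimal sketch under the same assumptions as the allocation rule (text participants padded with value 0, missing image valuations treated as 0); `mita_outcome` and `mita_revenue` are hypothetical names, and the return value is (winning side, number of winners, payment per winner).

```python
def mita_outcome(text_values, image_values, k):
    """Return (side, number_of_winners, payment_per_winner) for MITA."""
    v = sorted(text_values, reverse=True)
    v += [0] * max(0, k + 1 - len(v))            # pad so that v_{k+1} exists
    V = sorted(image_values, reverse=True) + [0, 0]
    V1, V2 = V[0], V[1]
    best_text = max(j * v[j - 1] for j in range(1, k + 1))
    candidates = [j for j in range(1, k + 1) if j * v[j - 1] >= V1]
    if not candidates:
        # Image winner pays max(V_2, max_j j * v_j).
        return ("image", 1, max(V2, best_text))
    j_star = max(candidates)
    # Each text winner pays max(v_{k+1}, V_1 / j*); v[k] is v_{k+1}.
    return ("text", j_star, max(v[k], V1 / j_star))

def mita_revenue(text_values, image_values, k):
    _, count, payment = mita_outcome(text_values, image_values, k)
    return count * payment

# The VCG counterexample, assuming k = 2: adding a text participant
# no longer lowers the revenue.
print(mita_revenue([1], [1], 2))
print(mita_revenue([1, 1], [1], 2))
```

With one text participant the single winner pays 1; with two text participants each of the two winners pays 1/2, so the total revenue stays at 1.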


Proof We find the critical value (Definition 4) of
a winner by showing that if she has any valuation
larger than the critical value she wins and for any
valuation less than the critical value she doesn’t.

If $C_\theta$ is empty, then the first image participant
(with type $(k, V_1)$) wins all the items. As
long as $V_1$ is larger than $\max(V_2, \max_{j \in [k]} j \cdot v_j)$,
participant $(k, V_1)$ wins. If $V_1$ is less than
$\max(V_2, \max_{j \in [k]} j \cdot v_j)$, then she loses to the
image participant $(k, V_2)$ if $\max(V_2, \max_{j \in [k]} j \cdot v_j) = V_2$
or loses to the text participants if
$\max(V_2, \max_{j \in [k]} j \cdot v_j) = \max_{j \in [k]} j \cdot v_j$. This
means that the critical value of the first image
participant is $\max(V_2, \max_{j \in [k]} j \cdot v_j)$ if she is
the winner.

If $C_\theta$ is nonempty, then the first $j^*$ text participants
win. Let $i \in [j^*]$ be an arbitrary
winner. First we observe that for any valuation
$v'_i$ greater than or equal to $\max(v_{k+1}, V_1/j^*)$,
participant $i$ remains a winner in type profile
$\theta' = ((1, v'_i), \theta_{-i})$. This is because for any such
change in the valuation of participant $i$, number $j^*$
remains in set $C_\theta$. Moreover, this change does
not add any new number $j'$ to $C_\theta$ such that $j' >
j^*$, because the valuations of the text participants
with index greater than $j^*$ are not changed in $\theta'$.
In order to prove that for any valuation $v'_i$ less
than the critical value $\max(v_{k+1}, V_1/j^*)$ participant $i$
is not a winner, we consider two cases: (A) when
the critical value is equal to $V_1/j^*$ and (B) when the
critical value is equal to $v_{k+1}$.


Case (A): We prove this case by contradiction.
Let $v'_i$ be a valuation less than $V_1/j^*$ for which
participant $i$ is in the set of winners in type
profile $\theta' = ((1, v'_i), \theta_{-i})$. Because $v'_i$ is less
than $V_1/j^*$, the number of winners in a set that
contains participant $i$ cannot be less than or equal
to $j^*$ in type profile $\theta'$. Let $j' \in [k]$, which
is greater than $j^*$, be the number of winners
in $\theta'$. This means that there are at least $j'$
participants whose valuation is at least $V_1/j'$
in $\theta'$. Note that all the valuations in $\theta$ are the
same as in $\theta'$ except $v_i$, which is decreased to $v'_i$;
therefore, there are also at least $j'$ participants
whose valuation is at least $V_1/j'$ in $\theta$ and
hence $j'$ is in set $C_\theta$. This contradicts the
fact that $j^*$ is the largest member of $C_\theta$.

Case (B): In case (B) we have $\max(v_{k+1}, V_1/j^*) =
v_{k+1}$, which implies that $k \cdot v_{k+1}$ is larger
than $V_1$ as $j^* \in [k]$. Therefore Case (B) can
only happen when $j^* = k$. Now suppose
participant $i$ decreases its valuation to a value
$v'_i$ that is less than $v_{k+1}$; then it cannot be a
winner, as there are $k$ other participants whose
valuations are more than $v'_i$ while we have
only $k$ items. □


The payment function of MITA is set to the
critical values of the winners as specified in
Lemma 2, which by Observation 1 and
Lemma 1 implies that MITA satisfies IC. Moreover,
as the payments are never more than the participants’
bids, the IR property of MITA follows. Finally,
in the following lemma, we show that MITA is
revenue monotone.


Lemma 3 Let $\theta'$ be the type profile obtained
by either increasing the valuation of a participant
or adding a new participant to the type
profile $\theta$; then we have REVENUE(MITA, $\theta'$) $\ge$
REVENUE(MITA, $\theta$).


Proof Let $v_1 \ge v_2 \ge \cdots$ be the valuations of
text participants and $V_1 \ge V_2 \ge \cdots$ be the
valuations of image participants in $\theta$. Similarly
let $v'_1 \ge v'_2 \ge \cdots$ be the valuations of text
participants and $V'_1 \ge V'_2 \ge \cdots$ be the valuations
of image participants in $\theta'$. Note that for any $i$ we
have $v_i \le v'_i$ and $V_i \le V'_i$, as we have one more
participant or a higher valuation in $\theta'$. Let $x$
be the newly added participant or the participant
which has a higher valuation in $\theta'$.

We prove this lemma by considering the value
of REVENUE(MITA, $\theta$) for the case when text
participants win and the case when an image
participant wins. If an image participant wins,
then it means that $V_1 > \max_{j \in [k]} j \cdot v_j$ and she
pays $\max(V_2, \max_{j \in [k]} j \cdot v_j)$, which is the total
revenue.

If text participants win, then it means
$V_1 \le \max_{j \in [k]} j \cdot v_j$ and there are $j^*$ winners,
where each of them pays $\max(v_{k+1}, V_1/j^*)$. If
$\max(v_{k+1}, V_1/j^*) = V_1/j^*$, then the total revenue
is $V_1$. If $\max(v_{k+1}, V_1/j^*) = v_{k+1}$, it implies
that $k \cdot v_{k+1}$ is larger than $V_1$. Remember
that $C_\theta = \{ j \in [k] \mid j \cdot v_j \ge V_1 \}$ and
$j^* = \max_{j \in C_\theta} j$; therefore $j^* = k$ and hence
the total payment of the winners is $k \cdot v_{k+1}$.

In summary the total revenue for type profile
$\theta$ is the following:

$$\mathrm{REVENUE}(\mathrm{MITA}, \theta) =
\begin{cases}
\max(V_2, \max_{j \in [k]} j \cdot v_j) & V_1 > \max_{j \in [k]} j \cdot v_j \quad \text{(A)} \\
\max(V_1, k \cdot v_{k+1}) & V_1 \le \max_{j \in [k]} j \cdot v_j \quad \text{(B)}
\end{cases}$$

Similarly the total revenue for type profile $\theta'$ is
the following:

$$\mathrm{REVENUE}(\mathrm{MITA}, \theta') =
\begin{cases}
\max(V'_2, \max_{j \in [k]} j \cdot v'_j) & V'_1 > \max_{j \in [k]} j \cdot v'_j \quad \text{(A)} \\
\max(V'_1, k \cdot v'_{k+1}) & V'_1 \le \max_{j \in [k]} j \cdot v'_j \quad \text{(B)}
\end{cases}$$

Note that because for any $i$ we have $v_i \le v'_i$
and $V_i \le V'_i$, the following inequalities are
straightforward:

$$V_1 \le V'_1 \qquad (1)$$
$$V_2 \le V'_2 \qquad (2)$$
$$\max_{j \in [k]} j \cdot v_j \le \max_{j \in [k]} j \cdot v'_j \qquad (3)$$
$$k \cdot v_{k+1} \le k \cdot v'_{k+1} \qquad (4)$$

If both REVENUE(MITA, $\theta$) and REVENUE(MITA, $\theta'$)
take their value from Case (A), then
the proof of the lemma follows from Eqs. (2)
and (3). Similarly if both REVENUE(MITA, $\theta$)
and REVENUE(MITA, $\theta'$) take their value from
Case (B), then the proof of the lemma follows
from Eqs. (1) and (4).

If REVENUE(MITA, $\theta$) takes its value from
Case (A) and REVENUE(MITA, $\theta'$) takes its value from
Case (B), then it means that participant $x$ is a
text participant which causes $\max_{j \in [k]} j \cdot v'_j$ to be
at least $V'_1$. The following proves the lemma
for this case:

$$\begin{aligned}
\mathrm{REVENUE}(\mathrm{MITA}, \theta)
&= \max(V_2, \max_{j \in [k]} j \cdot v_j) \\
&\le V_1 && \text{REVENUE(MITA, } \theta \text{) takes its value from Case (A)} \\
&= V'_1 && \text{participant } x \text{ is a text participant} \\
&\le \max(V'_1, k \cdot v'_{k+1}) \\
&= \mathrm{REVENUE}(\mathrm{MITA}, \theta')
\end{aligned}$$

If REVENUE(MITA, $\theta$) takes its value from
Case (B) and REVENUE(MITA, $\theta'$) takes its value from
Case (A), then it means that participant $x$ is
an image participant. The following proves the
lemma for this case:

$$\begin{aligned}
\mathrm{REVENUE}(\mathrm{MITA}, \theta)
&= \max(V_1, k \cdot v_{k+1}) \\
&\le \max_{j \in [k]} j \cdot v_j && \text{REVENUE(MITA, } \theta \text{) takes its value from Case (B) and } v_k \ge v_{k+1} \\
&= \max_{j \in [k]} j \cdot v'_j && x \text{ is an image participant} \\
&\le \max(V'_2, \max_{j \in [k]} j \cdot v'_j) \\
&= \mathrm{REVENUE}(\mathrm{MITA}, \theta') \qquad \square
\end{aligned}$$

In the above we proved that MITA satisfies
IR, IC, and RM. In the following theorem
we bound the PORM of MITA and finish this
section.


Theorem 4 PORM(MITA) $\le \ln k$.


Proof Let $A$ be the set of winner(s) which realizes
the maximum social welfare in type profile $\theta$.
If $A$ contains only one image participant with valuation
$V_1$, then we also have $V_1 \ge \max_{j \in [k]} j \cdot v_j$.
Mechanism MITA also selects an image participant
with the same valuation if $V_1 > \max_{j \in [k]} j \cdot v_j$,
and hence the welfare ratio is 1. Otherwise we
have $V_1 = \max_{j \in [k]} j \cdot v_j$, where MITA selects a
set of text participants which overall gives social
welfare $V_1$, and hence again the welfare ratio
is 1.

Now we consider the case when $A$ contains
text participants. By adding enough dummy participants
with value zero, and without loss of
generality, we assume that set $A$ contains the
first $k$ text participants with highest valuations
$v_1 \ge v_2 \ge \cdots \ge v_k$. Mechanism MITA selects
either the first $j^*$ text participants with highest
valuations ($v_1 \ge v_2 \ge \cdots \ge v_{j^*}$) or selects an
image participant with valuation $V_1$. Remember
that $j^*$ is the greatest number in set $C_\theta =
\{ j \mid j \in [k] \wedge j \cdot v_j \ge V_1 \}$, which implies the
following:

$$\forall j > j^* : \quad v_j < \frac{V_1}{j} \qquad (5)$$

Note that if MITA selects an image participant,
then Eq. (5) holds for $j^* = 0$.

Now we consider the following two cases to
prove the theorem:

If MITA selects an image participant, then we
have the following:

$$\begin{aligned}
\mathrm{PORM}(\mathrm{MITA})
&= \frac{\sum_{j \in [k]} v_j}{V_1} \\
&\le \frac{\sum_{j \in [k]} \frac{1}{j} V_1}{V_1} && \text{Eq. (5)} \\
&\le \ln k
\end{aligned}$$

If MITA selects the first $j^*$ text participants, then
we have the following:

$$\begin{aligned}
\mathrm{PORM}(\mathrm{MITA})
&= \frac{\sum_{j \in [k]} v_j}{\sum_{j \in [j^*]} v_j} \\
&= \frac{\sum_{j \in [j^*]} v_j + \sum_{j = j^*+1}^{k} v_j}{\sum_{j \in [j^*]} v_j} \\
&\le \frac{\sum_{j \in [j^*]} v_j + \sum_{j = j^*+1}^{k} \frac{1}{j} V_1}{\sum_{j \in [j^*]} v_j} && \text{Eq. (5)} \\
&\le \frac{\sum_{j \in [j^*]} v_j + \sum_{j = j^*+1}^{k} \frac{1}{j} \big( \sum_{j' \in [j^*]} v_{j'} \big)}{\sum_{j \in [j^*]} v_j} && \text{because } V_1 \le \sum_{j \in [j^*]} v_j \\
&\le \ln k
\end{aligned}$$

Video-Pod Auctions


In this section we design a mechanism for video-pod
auction (MVPA) which satisfies IR, IC, and
RM and whose PORM is at most $(\lfloor \log k \rfloor + 1) \cdot (2 +
\ln k)$. Note that all the log functions are in base
2. Let $\theta = ((d_1, v_1), \ldots, (d_n, v_n)) \in \Theta^n$ be an
arbitrary type profile of $n$ participants. We define
the allocation and payment function of MVPA for
this type profile.

Mechanism MVPA partitions the participants
into $\lfloor \log k \rfloor + 1$ groups $G^{(1)}, \ldots, G^{(\lfloor \log k \rfloor + 1)}$,
where group $G^{(g)}$ contains all the participants
whose demand is in the range $[2^{g-1}, 2^g)$. Mechanism
MVPA selects winners only from one group
$G^{(g)}$.


Definition 5 Let $M^{(g)}$ be equal to $\max(\lfloor k/2^g \rfloor, 1)$,
which is the maximum number of winners MVPA
selects from group $G^{(g)}$.


Note that we can select at least $\lfloor k/2^g \rfloor$ winners
from $G^{(g)}$ since there are $k$ items and the demand
of each participant is at most $2^g$. Moreover, from
the last group $G^{(\lfloor \log k \rfloor + 1)}$ we can select at least
one winner although $\lfloor k / 2^{\lfloor \log k \rfloor + 1} \rfloor = 0$, since we
assume the demand of all the participants is from
set $[k]$.

Let $(d_1^{(g)}, v_1^{(g)}), \ldots, (d_p^{(g)}, v_p^{(g)})$ be the
types of all the participants in group $g$, where
$p = |G^{(g)}|$. Here by adding enough dummy
participants, we assume $p$ is always larger than
$M^{(g)}$. Also, without loss of generality we assume
$v_1^{(g)} \ge v_2^{(g)} \ge \cdots \ge v_p^{(g)}$. We define the max
possible revenue of group $g$ (MPRG($g$)) to be the
following:

$$\mathrm{MPRG}(g) = \max_{j \in [M^{(g)}]} j \cdot v_j^{(g)}$$


As the name MPRG suggests, we will see that
its value captures the maximum revenue that can be
truthfully obtained from group $g$. Let $G^{(g^*)}$ be a
group with the maximum MPRG and $G^{(g')}$ be a
group with the second maximum MPRG, breaking
the ties arbitrarily.

The set of winners selected by MVPA is

$$(d_1^{(g^*)}, v_1^{(g^*)}), \ldots, (d_j^{(g^*)}, v_j^{(g^*)}),$$

where the number of winners $j$ is the largest number
in $[M^{(g^*)}]$ for which $j \cdot v_j^{(g^*)}$ is larger than or equal to
MPRG($g'$).

Now we use Lemma 1 to show that MVPA is
truthful and obtain the payments of winners.


Observation 2 Allocation function $x^{\mathrm{MVPA}}$ is
weakly monotone.


Proof Note that MVPA sorts the participants
according to their valuation and selects the
first $j$ participants. Therefore if any participant
$i$ increases its valuation, it only helps her to
enter the winning set. Hence, the observation
follows. □


In the rest of this section, we drop the group
identifier of $M^{(g^*)}$ and simply use $M$ unless it is
about another group.

In the following lemma we find the critical
value of each winner $i$, which is equal to
its payment ($p_i^{\mathrm{MVPA}}$).


Lemma 4 Let the set of winners $x^{\mathrm{MVPA}}(\theta)$ contain
the first $j$ participants with highest valuations
from $G^{(g^*)}$ and $v_{M+1}^{(g^*)}$ be the $(M+1)$th highest
valuation in group $G^{(g^*)}$, which is zero if it does
not exist. Then, the payment of participant $i$ is the
following:

$$p_i^{\mathrm{MVPA}}(\theta) =
\begin{cases}
\max\Big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \Big) & i \in x^{\mathrm{MVPA}}(\theta) \\
0 & i \notin x^{\mathrm{MVPA}}(\theta)
\end{cases}$$
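
The grouping, the MPRG values, the winner selection, and the payment of Lemma 4 can be put together in one short sketch. It is only an illustration under stated assumptions: types are (demand, valuation) pairs with integer demands in $[k]$, empty groups are padded with dummy valuations 0, ties between groups are broken by group index, and the function name `mvpa` is hypothetical.

```python
def mvpa(types, k):
    """Return (winner indices, payment per winner) for the MVPA sketch."""
    num_groups = k.bit_length()                 # floor(log2 k) + 1
    groups = {g: [] for g in range(1, num_groups + 1)}
    for idx, (d, v) in enumerate(types):
        groups[d.bit_length()].append((v, idx))  # 2^(g-1) <= d < 2^g

    def cap(g):
        # M^(g): maximum number of winners taken from group g.
        return max(k // 2 ** g, 1)

    def padded_values(g):
        vals = sorted((v for v, _ in groups[g]), reverse=True)
        return vals + [0] * (cap(g) + 1 - len(vals))   # dummy participants

    def mprg(g):
        vals = padded_values(g)
        return max(j * vals[j - 1] for j in range(1, cap(g) + 1))

    ranked = sorted(groups, key=mprg, reverse=True)
    g_star = ranked[0]
    second = mprg(ranked[1]) if len(ranked) > 1 else 0   # MPRG(g')
    vals, m = padded_values(g_star), cap(g_star)
    # Largest j in [M] with j * v_j >= MPRG(g').
    j = max(j for j in range(1, m + 1) if j * vals[j - 1] >= second)
    payment = max(second / j, vals[m])           # vals[m] is v_{M+1}
    members = sorted(groups[g_star], key=lambda t: t[0], reverse=True)
    winners = [idx for _, idx in members[:j]]    # dummies are never returned
    return winners, payment

print(mvpa([(1, 4), (1, 3), (1, 1), (2, 5)], 4))
```

In this example the demand-1 group has MPRG $= 2 \cdot 3 = 6$, the group with demands in $[2, 4)$ has MPRG $= 5$, so the two highest demand-1 participants win and each pays $\max(5/2, 1) = 2.5$.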
Proof If participant $i$ is not a winner, then its
payment is zero. When participant $i$ is a winner,
then we prove that its payment is equal to its
critical value (Definition 4). In order to prove
that value $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ is the critical
value of participant $i$, we show that for any value
larger than $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ participant $i$
still wins and for any value less than it she loses.

Remember that $v_1^{(g^*)} \ge v_2^{(g^*)} \ge \cdots \ge v_p^{(g^*)}$
are the valuations of participants in group $G^{(g^*)}$
and $v_1^{(g^*)}, v_2^{(g^*)}, \ldots, v_j^{(g^*)}$ are the valuations of
the winners. Because group $G^{(g^*)}$ is the group
with the maximum MPRG, we have $v_j^{(g^*)} \ge
\frac{\mathrm{MPRG}(g')}{j}$. As there can be at most $M$ winners
from group $G^{(g^*)}$, we have $v_j^{(g^*)} \ge v_{M+1}^{(g^*)}$.
Therefore we have

$$v_j^{(g^*)} \ge \max\Big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \Big). \qquad (6)$$


Let participant $i$ with type $(d_i^{(g^*)}, v_i^{(g^*)})$
be the $i$th winner in group $g^*$, where
$i \in [j]$. We show that for any valuation
greater than or equal to $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$
participant $i$ remains in the winning set.
Equation (6) implies that there are $j$ participants
in group $G^{(g^*)}$ whose valuations are at least
$\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$. If we decrease the valuation
of participant $i$ to $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$,
we still have $j$ participants in group $G^{(g^*)}$
with valuations at least $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$.
Therefore, the value MPRG($g^*$) will be at
least MPRG($g'$) and group $G^{(g^*)}$ remains the
winning group; hence participant $i$ remains in the
winning set.

Now we prove that if the valuation of participant
$i$ is less than $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$,
she cannot be in the winning set. In order to
prove this, we consider two cases: (A) when
$\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ is equal to $v_{M+1}^{(g^*)}$ and
(B) when $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ is equal to
$\frac{\mathrm{MPRG}(g')}{j}$.

Case (A): If $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big) = v_{M+1}^{(g^*)}$
and the valuation of participant $i$ is less
than $v_{M+1}^{(g^*)}$, then it means that there are $M$
participants who have valuations greater than
the valuation of participant $i$. As there can
be at most $M$ winners from group $G^{(g^*)}$,
participant $i$ cannot be a winner.

Case (B): We prove this case by contradiction.
Suppose $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big) =
\frac{\mathrm{MPRG}(g')}{j}$, and let $\theta' = ((d_i^{(g^*)}, v_i'^{(g^*)}), \theta_{-i})$
be a type profile in which the valuation of
participant $i$ is less than $\frac{\mathrm{MPRG}(g')}{j}$ while she
is still a winner. Because the valuation of participant
$i$ ($v_i'^{(g^*)}$) is less than $\frac{\mathrm{MPRG}(g')}{j}$ and
$i$ is in the winning set, in order for MPRG($g^*$)
to be larger than MPRG($g'$), there have to be
more than $j$ winners. Let $j' > j$ be the
number of winners in $\theta'$. Having $j'$ winners
in $\theta'$ and in order for $G^{(g^*)}$ to be the group
with the highest MPRG, we conclude that
there are $j'$ participants with valuation greater
than $\frac{\mathrm{MPRG}(g')}{j'}$. Note that the only difference
between $\theta$ and $\theta'$ is that the valuation of
participant $i$ is higher in $\theta$. Therefore, there
are also at least $j'$ participants with valuation
greater than $\frac{\mathrm{MPRG}(g')}{j'}$ in $\theta$. This contradicts
the way we select the number of winners
($j$) in $\theta$, which is the maximum number for
which $j \cdot v_j^{(g^*)}$ is larger than MPRG($g'$). □


The allocation function $x^{\mathrm{MVPA}}$ is weakly
monotone (Observation 2) and the payments of
the winners are their critical values (Lemma 4);
therefore by Lemma 1 we conclude that MVPA
satisfies IC.

In the rest of this section, first we prove that
MVPA satisfies RM and then bound its PORM.


Proposition 1 The total revenue of mechanism
MVPA for type profile $\theta$ (REVENUE(MVPA, $\theta$)) is
the following:

$$\mathrm{REVENUE}(\mathrm{MVPA}, \theta) = \max\Big( \mathrm{MPRG}(g'), M \cdot v_{M+1}^{(g^*)} \Big)$$

where $g'$ is a group with the second highest
MPRG.


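Proof From Lemma 4 we know that there are $j$
winners and each of them pays $\max\big( \frac{\mathrm{MPRG}(g')}{j},
v_{M+1}^{(g^*)} \big)$. Therefore the sum of payments, or the
revenue of MVPA, is $j \cdot \max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$.
The proof of the proposition follows if we show
that when $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ is equal to
$v_{M+1}^{(g^*)}$, then the number of winners ($j$) is equal
to $M$.

If $\max\big( \frac{\mathrm{MPRG}(g')}{j}, v_{M+1}^{(g^*)} \big)$ is equal to $v_{M+1}^{(g^*)}$,
then as $v_{M+1}^{(g^*)} \le v_M^{(g^*)}$ and $j \le M$, we have $M \cdot v_M^{(g^*)} \ge
\mathrm{MPRG}(g')$. Remember that $j$ is the maximum
number in the set $[M]$ for which $j \cdot v_j^{(g^*)}$ is larger
than or equal to MPRG($g'$). Therefore $j$ is equal to $M$. □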


Lemma 5 Let $\theta'$ be the type profile obtained by
either adding a new participant or increasing the
valuation of a participant in $\theta$. Then,

$$\mathrm{REVENUE}(\mathrm{MVPA}, \theta') \ge \mathrm{REVENUE}(\mathrm{MVPA}, \theta).$$


Proof Let $x$ be the newly added participant or
the participant which has the increased valuation
in $\theta'$. Throughout the proof we denote the MPRG of
each group $g$ in type profile $\theta$ by $\mathrm{MPRG}_\theta(g)$ and
in type profile $\theta'$ by $\mathrm{MPRG}_{\theta'}(g)$. Similarly, we
denote the $l$th highest valuation of the participants
of group $g$ by $v_l^{(g,\theta)}$ in type profile $\theta$ and by
$v_l^{(g,\theta')}$ in type profile $\theta'$.

As the $l$th highest valuation of the participants
of each group can only increase by adding participant
$x$, we conclude

$$\forall g, \forall l : \quad v_l^{(g,\theta')} \ge v_l^{(g,\theta)}. \qquad (7)$$

Remember that $\mathrm{MPRG}_\theta$ of each group $g$ is
$\max_{j \in [M^{(g)}]} j \cdot v_j^{(g,\theta)}$, and using Eq. (7) we get

$$\forall g : \quad \mathrm{MPRG}_{\theta'}(g) \ge \mathrm{MPRG}_\theta(g). \qquad (8)$$

In order to prove this lemma we consider
two cases: (A) adding participant $x$ does not
change the winning group $G^{(g^*)}$ and (B) adding
$x$ changes the winning group.


Case (A): Let $g''$ be a group with the second
highest MPRG in $\theta'$; it is possible that $g'$ is
equal to $g''$.

$$\begin{aligned}
\mathrm{REVENUE}(\mathrm{MVPA}, \theta')
&= \max\Big( \mathrm{MPRG}_{\theta'}(g''), M \cdot v_{M+1}^{(g^*,\theta')} \Big) && \text{Proposition 1} \\
&\ge \max\Big( \mathrm{MPRG}_{\theta'}(g'), M \cdot v_{M+1}^{(g^*,\theta')} \Big) && \text{definition of } g'' \\
&\ge \max\Big( \mathrm{MPRG}_{\theta}(g'), M \cdot v_{M+1}^{(g^*,\theta)} \Big) && \text{Eqs. (7) and (8)} \\
&= \mathrm{REVENUE}(\mathrm{MVPA}, \theta)
\end{aligned}$$


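Case (B): Let $G^{(g'')}$ be a group with the highest
MPRG in $\theta'$. We have

$$\mathrm{MPRG}_\theta(g^*) \ge \mathrm{MPRG}_\theta(g') \qquad (9)$$

as $g^*$ has the highest and $g'$ has the second
highest MPRG in $\theta$. Moreover,

$$\begin{aligned}
\mathrm{MPRG}_\theta(g^*) &\ge M \cdot v_M^{(g^*,\theta)} && \text{as } \mathrm{MPRG}_\theta(g^*) = \max_{j \in [M]} j \cdot v_j^{(g^*,\theta)} \\
&\ge M \cdot v_{M+1}^{(g^*,\theta)} && \text{as } v_M^{(g^*,\theta)} \ge v_{M+1}^{(g^*,\theta)}
\end{aligned} \qquad (10)$$

Let $\hat{g}$ be the group with the second highest MPRG
in $\theta'$. Because $g^*$ is no longer the winning
group in $\theta'$, it can be a candidate for the group
with the second highest MPRG in $\theta'$ and hence
we have the following:

$$\mathrm{MPRG}_\theta(g^*) \le \mathrm{MPRG}_{\theta'}(g^*) \le \mathrm{MPRG}_{\theta'}(\hat{g}) \qquad (11)$$

The following equations conclude the proof of
this case:

$$\begin{aligned}
\mathrm{REVENUE}(\mathrm{MVPA}, \theta)
&= \max\Big( \mathrm{MPRG}_\theta(g'), M \cdot v_{M+1}^{(g^*,\theta)} \Big) \\
&\le \mathrm{MPRG}_\theta(g^*) && \text{by Eqs. (9) and (10)} \\
&\le \mathrm{MPRG}_{\theta'}(\hat{g}) && \text{by Eq. (11)} \\
&\le \max\Big( \mathrm{MPRG}_{\theta'}(\hat{g}), M^{(g'')} \cdot v_{M^{(g'')}+1}^{(g'',\theta')} \Big) \\
&= \mathrm{REVENUE}(\mathrm{MVPA}, \theta') \qquad \square
\end{aligned}$$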


The following theorem, which bounds the PORM of
MVPA, finishes this section:


Theorem 5 PORM(MVPA) $\le (\lfloor \log k \rfloor + 1) \cdot (2 + \ln k)$


Proof Let WF($g$) be the maximum social welfare
achievable if we select the winners only from
group $G^{(g)}$. Let $A$ be a set of winner(s) which
realizes the maximum welfare in type profile $\theta$.
Note that as there are $\lfloor \log k \rfloor + 1$ groups, one
group ($\hat{g}$) has a subset of participants from $A$
whose social welfare is at least $\frac{\mathrm{WF}(A)}{\lfloor \log k \rfloor + 1}$ and
hence the following:

$$\mathrm{WF}(\hat{g}) \ge \frac{\mathrm{WF}(A)}{\lfloor \log k \rfloor + 1} \qquad (12)$$

Now we prove the following claim about
MPRG($\hat{g}$):


Claim 1 $\mathrm{MPRG}(\hat{g}) \ge \frac{\mathrm{WF}(\hat{g})}{2 + \ln k}$


Proof Let $B$ be the set of participants from group
$G^{(\hat{g})}$ which give the maximum social welfare.
Because the demands of all the participants of
$G^{(\hat{g})}$ are in range $[2^{\hat{g}-1}, 2^{\hat{g}})$, the size of $B$ is at
most $\lfloor k / 2^{\hat{g}-1} \rfloor$. Remember from Definition 5
that $M^{(\hat{g})} = \max(\lfloor k / 2^{\hat{g}} \rfloor, 1)$ is the maximum
number of winners that MVPA potentially selects
from group $G^{(\hat{g})}$. Therefore, we have $|B| \le 2 \cdot
M^{(\hat{g})} + 1$.

Throughout the proof, we drop the superscript
from $M^{(\hat{g})}$ and simply refer to it as $M$.

Let $v_1 \ge v_2 \ge \cdots \ge v_{2M+1}$ be the valuations
of the participants in $B$; if $B$ has fewer than $2 \cdot M + 1$
participants, we add enough dummy participants
with valuations zero. Remember that $\mathrm{MPRG}(\hat{g}) =
\max_{j \in [M]} j \cdot v_j^{(\hat{g})}$, where $M$ is at least 1 (see
Definition 5), which implies

$$v_i \le \frac{\mathrm{MPRG}(\hat{g})}{i} \qquad \forall i \in [M] \qquad (13)$$
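The following equations conclude the proof of
the claim:

$$\begin{aligned}
\mathrm{WF}(\hat{g}) &= \sum_{i=1}^{2M+1} v_i \\
&= \sum_{i=1}^{M} v_i + \sum_{i=M+1}^{2M+1} v_i \\
&\le \sum_{i=1}^{M} v_i + \sum_{i=M+1}^{2M+1} v_M && \text{replacing } v_i \text{ with } v_M \text{ for } i > M \\
&\le \sum_{i=1}^{M} \frac{\mathrm{MPRG}(\hat{g})}{i} + \sum_{i=M+1}^{2M+1} \frac{\mathrm{MPRG}(\hat{g})}{M} && \text{by Eq. (13)} \\
&\le (2 + \ln k) \, \mathrm{MPRG}(\hat{g}) \qquad \square
\end{aligned}$$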


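Remember $G^{(g^*)}$ is the group with maximum
MPRG value. Let $j$ be the number for which
MPRG($g^*$) is equal to $j \cdot v_j^{(g^*)}$. Allocation function
$x^{\mathrm{MVPA}}$ selects the first $j^*$ participants from
group $G^{(g^*)}$, where $j^*$ is the maximum number
for which $j^* \cdot v_{j^*}^{(g^*)}$ is larger than or equal to MPRG($g'$).
Therefore we can conclude that $j \le j^*$ and
hence

$$\mathrm{WF}(x^{\mathrm{MVPA}}(\theta)) \ge \mathrm{MPRG}(g^*). \qquad (14)$$

The following equations conclude the proof of
the theorem:

$$\begin{aligned}
\mathrm{WF}(x^{\mathrm{MVPA}}(\theta)) &\ge \mathrm{MPRG}(g^*) && \text{by Eq. (14)} \\
&\ge \mathrm{MPRG}(\hat{g}) && G^{(g^*)} \text{ has the highest MPRG} \\
&\ge \frac{\mathrm{WF}(\hat{g})}{2 + \ln k} && \text{Claim 1} \\
&\ge \frac{\mathrm{WF}(A)}{(\lfloor \log k \rfloor + 1) \cdot (2 + \ln k)} && \text{by Eq. (12)} \qquad \square
\end{aligned}$$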


Lower Bound 


In this section we prove Theorem 2. As men- 
tioned earlier we need two additional mild as- 
sumptions of anonymity and independence of 
irrelevant alternatives (which we define below) 
on the class of mechanisms for which we prove 
our lower bound. 


Definition 6 A mechanism ($M = (x, p)$) is
anonymous (AM) if the following holds: Suppose
$\theta_1, \theta_2 \in \Theta^n$ are two type profiles which are
permutations of each other (i.e., the sets of types
are the same, but the identities of the participants
to whom those types belong are different),
say, $\theta_2 = \pi(\theta_1)$. Also say $x(\theta_1) = S_1$ and
$x(\theta_2) = S_2$. Then $S_2 = \pi(S_1)$.


Definition 7 Let $\theta \in \Theta^n$ be an arbitrary type
profile and $i \in N$ be an arbitrary participant with
type $\theta_i = (d_i, v_i)$. A mechanism ($M = (x, p)$)
satisfies independence of irrelevant alternatives
(IIA) if, when we decrease the bid of a losing
participant, say, participant $i$, to $v'_i < v_i$, the
new set of winners is a superset of the previous
one, i.e., $x(\theta) \subseteq x((d_i, v'_i), \theta_{-i})$. In other words,
decreasing the bid of a losing participant does not
hurt any winner.


The proof outline of Theorem 2 is the
following. Let $M^* = (x^*, p^*)$ be a mechanism
which satisfies all the five properties and has the
optimal PORM OPT (i.e., OPT = PORM($M^*$)).
We study the behavior of $M^*$ in a few type
profiles. Let $\epsilon$ be an arbitrarily small positive real
value. First we show that when there are only two
participants with types $(k, 1)$ and $(k, 1 + \epsilon)$,
$M^*$ gives all the $k$ items to the participant
with type $(k, 1 + \epsilon)$. The revenue of $M^*$ from
these two participants is 1. Then, we add $k$
more participants to create type profile $\theta =
((1, 1 - \epsilon), (1, \frac{1}{2} - \epsilon), \ldots, (1, \frac{1}{k} - \epsilon), (k, 1), (k,
1 + \epsilon))$. The RM property requires $M^*$ to
make at least the same revenue for $\theta$. From
this constraint we are able to show that $M^*$
assigns all the items to participant $k + 2$ with type
$(k, 1 + \epsilon)$ and hence gets social welfare $1 + \epsilon$. Note
that the maximum social welfare happens when
the set of winners is $\{1, \ldots, k\}$, which implies
$\mathrm{WFR}(M^*, \theta) \ge \frac{\sum_{i=1}^{k} \frac{1}{i} - k \cdot \epsilon}{1 + \epsilon}$ (see Definition 1).
Because PORM($M^*$) $\ge$ WFR($M^*, \theta$) for any
$\theta \in \Theta^n$ and $\epsilon$ can be taken arbitrarily small,
we conclude that OPT $\ge \sum_{i=1}^{k} \frac{1}{i}$.

First we study the behavior of $M^*$ when we
have only two participants with types $(k, 1)$ and
$(k, 1 + \epsilon)$.


Lemma 6 Mechanism $M^*$ in type profile
$((k, 1), (k, 1 + \epsilon))$ gives all $k$ items to the
second participant and makes one unit of
revenue, i.e., $x^*((k, 1), (k, 1 + \epsilon)) = \{2\}$ and
$p^*((k, 1), (k, 1 + \epsilon)) = (0, 1)$.

Proof First we study type profile $((k, v_1), (k, v_2))$
for general values $v_1, v_2 \in \mathbb{R}^+$ where $v_1 < v_2$.
We prove that $M^*$ gives all the items to the
second participant.


Claim 2 $x^*((k, v_1), (k, v_2)) = \{2\}$ for any
$v_1, v_2 \in \mathbb{R}^+$ where $v_1 < v_2$.


Proof First note that $M^*$ has to have a winner
for this type profile because otherwise its social
welfare will be zero while the maximum social
welfare is $v_2$. This would make the social welfare
ratio of $M^*$ undefined.

Now we prove that if $x^*((k, v_1), (k, v_2)) = \{1\}$,
then $M^*$ violates either IC or AM. Call
type profile $((k, v_1), (k, v_2))$ $\theta^{(1)}$ and suppose
for the sake of contradiction that $x^*(\theta^{(1)}) = \{1\}$.
From Lemma 1 we know that if participant 1
increases her bid to $v_2$, she still wins; hence
$x^*(\theta^{(2)}) = \{1\}$ where $\theta^{(2)} = ((k, v_2), (k, v_2))$.
Now if in type profile $\theta^{(2)}$ participant 2 decreases
her bid to $v_1$, again from Lemma 1 we conclude
that she cannot win, i.e., $x^*(\theta^{(3)}) = \{1\}$ where
$\theta^{(3)} = ((k, v_2), (k, v_1))$. Type profile $\theta^{(3)}$ is $\theta^{(1)}$
with participant 1 swapped with participant 2, but
in both of them the first participant wins, which
contradicts AM. □

Claim 2 directly proves that the winner in type
profile $((k, 1), (k, 1 + \epsilon))$ is the second participant.
The only thing that remains is to show that
her payment ($p_2$) is 1. Note that payment $p_2$
cannot be less than one because otherwise by
Lemma 1 participant 2 wins all the items in type
profile $((k, 1), (k, p_2))$, which contradicts
Claim 2. Payment $p_2$ cannot be larger than one
because otherwise for any value $1 < v_2 < p_2$
participant 2 wins all the items in type profile
$((k, 1), (k, v_2))$ by Claim 2. This contradicts Lemma 1,
which states that the payment $p_2$ is the smallest
value for which participant 2 wins the items. □


Now we add k more participants, each of 
which wants only one item. In the following 
lemma we prove that RM forces M* to assign all 
of the items to one of the participants who want 
all the items. 


Lemma 7 For the set of $k + 2$ participants with
type profile $\theta = ((1, 1 - \epsilon), (1, \frac{1}{2} - \epsilon), \ldots,
(1, \frac{1}{k} - \epsilon), (k, 1), (k, 1 + \epsilon))$, mechanism $M^*$
assigns all the $k$ items to either participant $k + 1$
or participant $k + 2$, i.e., $x^*(\theta) = \{k + 1\}$ or
$x^*(\theta) = \{k + 2\}$.


Proof We prove the lemma by contradiction: we show that
if $M^*$ assigns the items to a subset of the first $k$
participants, it violates RM. We consider a class
of $k$ type profiles $(\theta^{(1)}, \ldots, \theta^{(k)})$ where $\theta^{(i)}$ is
built from $\theta^{(i-1)}$, starting from $\theta^{(0)} = \theta$. The only
possible difference between $\theta^{(i)}$ and $\theta^{(i-1)}$ is in the
valuation of participant $i$. If participant $i$ is a winner in $\theta^{(i-1)}$,
then we obtain $\theta^{(i)}$ by increasing the valuation
of the $i$th participant from $\frac{1}{i} - \epsilon$ to $1 - \epsilon$. Note
that the payment of participant $i$ in $\theta^{(i-1)}$ is at
most her valuation, which is $\frac{1}{i} - \epsilon$, and in $\theta^{(i)}$ it
remains the same by Lemma 1. If participant $i$
is not a winner in $\theta^{(i-1)}$, then we obtain $\theta^{(i)}$ by
decreasing her valuation to zero. Note that by IIA,
no winner turns into a loser in $\theta^{(i)}$.

Let $j \in \{1, \ldots, k\}$ be the largest number for
which participant $j$ is a winner in $\theta^{(j-1)}$ and we
increase her valuation to $1 - \epsilon$ in $\theta^{(j)}$. Note that at
the start, in type profile $\theta^{(0)}$, the set of winners is
a nonempty subset of $\{1, \ldots, k\}$. Therefore there
is at least one such $j$ for which participant $j$ is a
winner in $\theta^{(j-1)}$, since decreasing the valuations of
non-winners does not remove any winner.

Now we prove that there is no winner in the set
of participants $\{j + 1, \ldots, k\}$ in type profile $\theta^{(j)}$.
Assume otherwise and let $p \in \{j + 1, \ldots, k\}$ be
the smallest number for which participant $p$ is a
winner in $\theta^{(j)}$. Note that when we decrease the
valuation of each participant $p'$ with $j < p' < p$ to zero
to obtain $\theta^{(p')}$, participant $p$ remains a winner
in all of them by IIA. Therefore, participant $p$ is a
winner in type profile $\theta^{(p-1)}$ and we increase her
valuation in $\theta^{(p)}$, which contradicts the fact
that $j$ is the largest number for which participant
$j$ is a winner in $\theta^{(j-1)}$.

The payment of participant $j$ in $\theta^{(j-1)}$ is at
most her valuation, which is $\frac{1}{j} - \epsilon$. When we
increase her bid to $1 - \epsilon$ in type profile $\theta^{(j)}$, her
payment remains the same by Lemma 1. Note
that by construction of $\theta^{(j)}$, the valuation of every
participant in $\{1, \ldots, j\}$ is either zero or $1 - \epsilon$. If
such a participant has valuation $1 - \epsilon$ and is a winner,
then by AM her payment equals that of participant $j$,
which is at most $\frac{1}{j} - \epsilon$. Therefore the total
payment, i.e., the revenue of $M^*$ in $\theta^{(j)}$, is at most
$j \cdot (\frac{1}{j} - \epsilon) = 1 - j \cdot \epsilon$, since there is no
winner in the set of participants $\{j + 1, \ldots, k\}$ in type
profile $\theta^{(j)}$.

Note that type profile $\theta^{(j)}$ is obtained from
type profile $((k, 1), (k, 1 + \epsilon))$ by adding $k$ more
participants. However, the revenue of $\theta^{(j)}$ is at most
$1 - j \cdot \epsilon$, which is strictly less than 1, the revenue of
$((k, 1), (k, 1 + \epsilon))$ by Lemma 6. This contradicts
the RM property of $M^*$; hence $M^*$ has to
assign the items to either participant $k + 1$ or
$k + 2$. □


Now we show how Theorem 2 can be derived
from Lemma 7. Note that the maximum welfare
for type profile $\theta = ((1, 1 - \epsilon), (1, \frac{1}{2} - \epsilon), \ldots,
(1, \frac{1}{k} - \epsilon), (k, 1), (k, 1 + \epsilon))$ is realized when we
give one item to each of the first $k$ participants,
for which we get the total social welfare
$\sum_{i=1}^{k} \frac{1}{i} - k \cdot \epsilon$, i.e., the numerator of Definition 1
for this type profile is $\sum_{i=1}^{k} \frac{1}{i} - k \cdot \epsilon$. The
denominator of Definition 1 is at most $1 + \epsilon$ by
Lemma 7. Therefore the ratio of the welfare for
this type profile is at least $\frac{\sum_{i=1}^{k} 1/i - k \cdot \epsilon}{1 + \epsilon}$. Because
OPT is the maximum ratio over all type profiles
(see Definition 2), we have

$$\mathrm{OPT} \ge \frac{\sum_{i=1}^{k} 1/i - k \cdot \epsilon}{1 + \epsilon},$$

which results in $\mathrm{OPT} \ge \sum_{i=1}^{k} \frac{1}{i} - \epsilon'$ where

$$\epsilon' = \frac{\epsilon \left(k + \sum_{i=1}^{k} 1/i\right)}{1 + \epsilon}.$$
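The algebra behind this bound can be checked with exact rational arithmetic. The following is a minimal sketch; the function names `harmonic`, `ratio_lower_bound`, and `eps_prime` are illustrative choices and do not appear in the original analysis.

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """Return the k-th harmonic number sum_{i=1}^{k} 1/i as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def ratio_lower_bound(k: int, eps: Fraction) -> Fraction:
    """Welfare ratio bound (H_k - k*eps) / (1 + eps) for the profile theta."""
    return (harmonic(k) - k * eps) / (1 + eps)


def eps_prime(k: int, eps: Fraction) -> Fraction:
    """Slack eps' = eps * (k + H_k) / (1 + eps) between H_k and the bound."""
    return eps * (k + harmonic(k)) / (1 + eps)


if __name__ == "__main__":
    k = 5
    for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
        # The bound equals H_k - eps' exactly, and eps' shrinks with eps.
        assert ratio_lower_bound(k, eps) == harmonic(k) - eps_prime(k, eps)
        print(float(eps), float(ratio_lower_bound(k, eps)), float(eps_prime(k, eps)))
```

Using `Fraction` rather than floating point makes the identity between the ratio and $\sum_{i=1}^{k} 1/i - \epsilon'$ hold exactly, so the check is not affected by rounding.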


Note that the value $\epsilon'$ can be made arbitrarily
small by selecting a sufficiently small value for
$\epsilon$. Therefore we have proved that for any small positive
real value $\epsilon'$, we have $\mathrm{OPT} \ge \sum_{i=1}^{k} \frac{1}{i} - \epsilon'$, which
implies Theorem 2.


Problem Definition


We will consider enumeration problems, i.e., we
want to list all the objects that satisfy given
conditions (e.g., vertices of a polytope $\{x \mid Ax \ge b\}$
or maximal cliques in a given graph). No
object should be listed more than once.


Introduction 


In this entry, we consider an enumeration scheme 
called reverse search developed by Avis and 
Fukuda [1]. The scheme was originally developed 
to enumerate all the vertices of a given polytope 
represented by the intersection of half spaces [1]. 
The scheme is very powerful, and many
kinds of objects, such as cells in hyperplane
arrangements, triangulations of a polygon, bases of a
matroid, spanning trees, trees, maximal cliques
in a graph, and plane graphs with a given number of
vertices, can be enumerated with it [1-4, 6].

Think of the problem of enumerating (or visiting)
all the vertices of a given connected graph $G$.
Most readers would use the depth-first search
or breadth-first search algorithm. The two algorithms
dynamically find tree structures in $G$ and
traverse them. Given an enumeration problem,
the reverse search scheme also dynamically finds
a tree structure on the objects to enumerate
and traverses it. To execute depth-first search
or breadth-first search, the graph $G$ must be given
explicitly, and we have to remember which vertices
have been visited. However, in most
enumeration problems, the objects that we want to
enumerate are, of course, not given explicitly, nor
can we remember all the objects that we have
already output during the execution of an algorithm.
For example, if we want to enumerate all the vertices
of a polytope represented by the intersection
of half spaces, the vertices are not given explicitly.
If we want to enumerate plane graphs with 100
vertices, their number is very large, and we do
not want to remember every graph obtained. So
the scheme is designed to handle implicitly given
objects and to run with a small amount of memory.


Key Results 


When we develop an algorithm for enumerating
some objects with the reverse search scheme, we
first think of an implicit connected graph $G_{rs}$ on
the objects. For example, when enumerating all
the vertices of a given polytope, we think of a
graph whose vertices correspond to the vertices
of the polytope and whose edges correspond to
the edges of the polytope. When enumerating
spanning trees in a graph $G$, we think of a graph
$G_{rs}$ whose vertices correspond to the spanning
trees of $G$, and $\{i, j\} \in E(G_{rs})$ if and only if
spanning tree $T_j$ of $G$ corresponding to vertex $j$
of $G_{rs}$ can be obtained from spanning tree $T_i$ of
$G$ corresponding to vertex $i$ of $G_{rs}$ by removing
an edge and adding an edge. Of course, we cannot
construct such a graph $G_{rs}$ explicitly without
enumerating the vertices of the polytope or the spanning
trees of $G$. However, given an object $x$, we can
easily generate every object whose corresponding
vertex is adjacent to $x$'s corresponding vertex in
$G_{rs}$. To put it the other way around, we require
$G_{rs}$ to have this property. In the above examples,
$G_{rs}$ is undirected. However, $G_{rs}$ is sometimes
directed. When enumerating (not necessarily maximal)
cliques in a graph, $(i, j) \in E(G_{rs})$ if and
only if the clique corresponding to $j$ is a proper
subset of the clique corresponding to $i$.

Now we have a connected graph $G_{rs}$ of objects
to enumerate. Then for every vertex $v \in V(G_{rs})$
except for a special vertex $r$ called the root, we define
a parent vertex $u$ of $v$ such that $u$ is adjacent to $v$,
and no vertex of $G_{rs}$ is a proper ancestor of itself,
i.e., by iteratively moving from a vertex $v$ to the
parent of $v$, to the parent of the parent of $v$, and
so on, we never come to the start vertex $v$ again.
Then we easily come to the following lemma.


Lemma 1 For every vertex $v$ of $G_{rs}$, $v$ is a
descendant of $r$.


Proof Since no vertex of $G_{rs}$ is a proper
ancestor of itself and the number of vertices in
$G_{rs}$ is finite, every vertex of $G_{rs}$ has an oldest
ancestor. Since every vertex of $G_{rs}$ except for $r$ has a
parent, it cannot be an oldest ancestor. Therefore,
$r$ is the oldest ancestor of every vertex of $G_{rs}$. □


By the lemma above, the edges in $G_{rs}$ corresponding
to the "parent-child" relations clearly
induce a spanning tree (or arborescence) $T_{rs}$ of
$G_{rs}$. Therefore, we can enumerate every object by
traversing $T_{rs}$ from $r$ in the depth-first manner.
The whole scheme is shown below.


Procedure ENUMERATE_SUBTREE(v)
    output v
    for all w satisfying $(w, v) \in E(G_{rs})$ do
        if v is the parent of w then
            ENUMERATE_SUBTREE(w)
        end if
    end for
end procedure


Procedure ENUMERATE
    find r
    ENUMERATE_SUBTREE(r)
end procedure
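The scheme can be sketched in Python as a generator that takes the adjacency oracle and the parent function as arguments. This is a minimal illustration, not the implementation of [1]. The toy instance (subsets of {0, 1, 2} with parent "remove the smallest element") and the names `enumerate_subtree`, `neighbors`, and `parent` are assumptions made for the example, as is the convention that `parent` returns `None` for the root.

```python
from typing import Callable, Iterator, List, Optional

Subset = frozenset


def enumerate_subtree(v, neighbors: Callable, parent: Callable) -> Iterator:
    """Output v, then recurse into every neighbor w whose parent is v."""
    yield v
    for w in neighbors(v):
        if parent(w) == v:  # w is a child of v in the tree T_rs
            yield from enumerate_subtree(w, neighbors, parent)


# Toy instance (assumption): objects are the subsets of {0, ..., N-1}.
N = 3


def neighbors(s: Subset) -> List[Subset]:
    """All w with (w, s) in E(G_rs): s plus one additional element."""
    return [s | {x} for x in range(N) if x not in s]


def parent(s: Subset) -> Optional[Subset]:
    """Remove the smallest element; the empty set is the root."""
    return s - {min(s)} if s else None


if __name__ == "__main__":
    for subset in enumerate_subtree(frozenset(), neighbors, parent):
        print(sorted(subset))
```

No visited set is kept: each object is output exactly once because it has exactly one parent.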


If $T_{rs}$ is very deep, recursion
needs a large amount of memory. However,
since we can find the parent of each vertex of $G_{rs}$,
we actually do not need to use recursion. Even
if we do not remember the previously visited
vertices in $G_{rs}$, we can go back in the tree search
by finding the parents. If the time complexity of
finding the parent is relatively high, the total time
complexity becomes high. Therefore, there is a
time-space trade-off.
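A recursion-free traversal can be sketched as follows. It keeps only the current vertex and an index into its neighbor list, and it recomputes the neighbor list and the parent when backtracking, which is the time-space trade-off mentioned above. The sketch assumes that `neighbors` returns a list in a fixed order and that `parent` returns `None` for the root. The integer graph used as a toy instance is hypothetical.

```python
from typing import Callable, Iterator, List, Optional


def reverse_search_iterative(root, neighbors: Callable, parent: Callable) -> Iterator:
    """Depth-first traversal of T_rs using O(1) objects of working memory."""
    v, idx = root, 0
    yield v
    while True:
        nbrs = neighbors(v)  # recomputed on every step instead of stored
        if idx < len(nbrs):
            w = nbrs[idx]
            idx += 1
            if parent(w) == v:  # descend to child w
                v, idx = w, 0
                yield v
        else:
            u = parent(v)  # all neighbors examined: go back to the parent
            if u is None:
                return
            idx = neighbors(u).index(v) + 1  # resume after v in u's list
            v = u


# Toy instance (assumption): vertices 1..LIMIT, edges {v, v+1} and {v, 2v}.
LIMIT = 10


def neighbors(v: int) -> List[int]:
    cand = {v - 1, v + 1, 2 * v}
    if v % 2 == 0:
        cand.add(v // 2)
    return sorted(c for c in cand if 1 <= c <= LIMIT)


def parent(v: int) -> Optional[int]:
    """Even v has parent v/2, odd v > 1 has parent v-1; vertex 1 is the root."""
    if v == 1:
        return None
    return v // 2 if v % 2 == 0 else v - 1


if __name__ == "__main__":
    print(list(reverse_search_iterative(1, neighbors, parent)))
```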


Examples 


For enumerating all the vertices of a given polytope
$P = \{x \mid Ax \ge b\}$, we use $G_{rs}$ described in the
previous section. For the sake of simplicity, we
assume that $P$ is nondegenerate. First, we find a
vertex $x^*$ of $P$ by the simplex method or the interior
point method. Then, we find an objective vector
$c$ such that the unique optimal solution of the linear
programming problem $\min c^T x$, s.t. $Ax \ge b$
is $x^*$. We define the parent of a vertex
$x$ as the vertex corresponding to the basis of $P$
obtained from the basis corresponding to $x$ by
a single pivot in the simplex method minimizing
$c^T x$ with Bland's pivot rule. The root vertex is
$x^*$. We can easily find every vertex $x'$ satisfying
$\{x', x\} \in E(G_{rs})$ by swapping a basic variable
and a nonbasic variable in the basis corresponding
to $x$. Of course, we can easily check
if vertex $x$ is the parent of vertex $x'$ by running
the simplex method for one step from $x'$.

For enumerating (not necessarily maximal)
cliques in a graph $G$, we also use (the directed
graph) $G_{rs}$ in the previous section. We define the
parent of vertex $v$ corresponding to clique $C_v$
in $G$ as the vertex $u$ corresponding to clique $C_u$
such that $C_u$ is obtained from $C_v$ by removing the
vertex of the smallest index. The root is the empty
set. Since a vertex $w$ satisfying $(w, v) \in E(G_{rs})$
corresponds to a clique obtained by adding a
vertex to $C_v$, we can find it easily. Clearly we can
check if $v$ is the parent of $w$ easily, too.
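The clique example can be sketched directly. The graph representation (a dictionary from integer vertices to neighbor sets) and the function name `enumerate_cliques` are assumptions for illustration. The code follows the simple scheme of this section, not a tuned algorithm.

```python
from typing import Dict, Iterator, Set


def enumerate_cliques(adj: Dict[int, Set[int]]) -> Iterator[frozenset]:
    """Yield every clique of the graph, including the empty one, exactly once."""

    def parent(clique: frozenset) -> frozenset:
        # Parent: remove the vertex of the smallest index.
        return clique - {min(clique)}

    def subtree(clique: frozenset) -> Iterator[frozenset]:
        yield clique
        for x in sorted(adj):
            if x in clique:
                continue
            # Adding x gives a neighbor w of the current clique in G_rs
            # only if x is adjacent to every vertex already in the clique.
            if all(x in adj[y] for y in clique):
                w = clique | {x}
                if parent(w) == clique:
                    yield from subtree(w)

    yield from subtree(frozenset())


if __name__ == "__main__":
    # Hypothetical input: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    for c in enumerate_cliques(graph):
        print(sorted(c))
```

The parent test succeeds only when the added vertex has a smaller index than every vertex of the current clique, so a faster variant could generate just those candidates.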

Note that the algorithms introduced in this
section were chosen for ease of explanation. One can
develop faster algorithms for these problems.


Avoiding Long Delays 


A naive implementation of the reverse search
scheme sometimes causes a long delay between
successive outputs of two objects. Consider the
case that $T_{rs}$ is very deep and one
has to return from a leaf to the root. In order to
avoid this kind of long delay, a smart method is
known [5,7]. At the odd levels of the recursion, we
output the object before making the recursive
calls, and at the even levels of the recursion, we
output it after the termination of the recursive calls.
In this way, at least one of three iterations outputs
an object when the algorithm ascends or descends
the search tree $T_{rs}$. The algorithm is shown
below.


Procedure ENUMERATE_SUBTREE(v, parity)
    if parity = 0 then
        output v
    end if
    for all w satisfying $(w, v) \in E(G_{rs})$ do
        if v is the parent of w then
            ENUMERATE_SUBTREE(w, parity ⊕ 1)
        end if
    end for
    if parity = 1 then
        output v
    end if
end procedure


Procedure ENUMERATE
    find r
    ENUMERATE_SUBTREE(r, 0)
end procedure
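The alternating output rule can be illustrated on an explicit tree. In this sketch the tree is given by a hypothetical `children` function rather than by the adjacency and parent tests of $G_{rs}$, which keeps the example short. The vertex names are arbitrary.

```python
from typing import Callable, Dict, Iterator, List


def enumerate_alternating(v, children: Callable, parity: int = 0) -> Iterator:
    """Output v before its subtree when parity is 0 and after it when parity is 1."""
    if parity == 0:
        yield v
    for w in children(v):
        yield from enumerate_alternating(w, children, parity ^ 1)
    if parity == 1:
        yield v


if __name__ == "__main__":
    # Hypothetical search tree: r has children a and b; a has children c and d.
    tree: Dict[str, List[str]] = {"r": ["a", "b"], "a": ["c", "d"],
                                  "b": [], "c": [], "d": []}
    print(list(enumerate_alternating("r", tree.__getitem__)))
```

Every vertex is still output exactly once; only the position of the output relative to the recursive calls changes with the depth.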


Note 


For ease of understanding, we introduced
$G_{rs}$. However, most results using
reverse-search-type algorithms do not treat $G_{rs}$ explicitly.
A reverse search algorithm can be developed
from just a good definition
of the parent-child relation and a fast algorithm to
enumerate the children of a given object. If one can
develop a fast children enumeration algorithm
which enumerates all the children of an object
in time $T$ and if the degrees of some vertices
in $G_{rs}$ are quite large compared with $T$, the
resulting enumeration algorithm is faster than
the naive implementation described in this entry.
Such examples will appear in other entries in this
book.


Problem Definition


This problem is concerned with computing features
of the Boltzmann distribution over RNA
secondary structures in the context of the standard
Gibbs free energy model used for RNA Secondary
Structure Prediction by Minimum Free
Energy (cf. corresponding entry). Thermodynamics
states that for a system with configuration
space $\Omega$ and free energy given by $E : \Omega \to \mathbb{R}$, the
probability of the system being in state $\omega \in \Omega$ is
proportional to $e^{-E(\omega)/RT}$, where $R$ is the universal
gas constant and $T$ the absolute temperature of
the system. The normalizing factor

$$Z = \sum_{\omega \in \Omega} e^{-E(\omega)/RT} \qquad (1)$$

is called the full partition function of the system.
Over the past several decades, a model
approximating the free energy of a structured
RNA molecule by independent contributions of
its secondary structure components has been
developed and refined. The main purpose of
this work has been to assess the stability of
individual secondary structures. However, it
immediately translates into a distribution over
all secondary structures. Early work focused on
computing the pairing probability for all pairs
of bases, i.e., the sum of the probabilities of all
secondary structures containing that base pair.
Recent work has extended methods to compute
base pairing probabilities for
RNA heterodimers [2], i.e., interacting RNA
molecules, and expectation, variance, and higher
moments of the Boltzmann distribution.


Notation 

Let $s \in \{A, C, G, U\}^*$ denote the sequence of
bases of an RNA molecule. Use $X \cdot Y$ where
$X, Y \in \{A, C, G, U\}$ to denote a base pair
between bases of type $X$ and $Y$, and $i \cdot j$ where
$1 \le i < j \le |s|$ to denote a base pair between
bases $s[i]$ and $s[j]$.


Definition 1 (RNA Secondary Structure)
A secondary structure for an RNA sequence $s$
is a set of base pairs $S \subseteq \{i \cdot j \mid 1 \le i < j \le |s|
\wedge i < j - 3\}$ such that for all $i \cdot j, i' \cdot j' \in S$ with
$i \cdot j \ne i' \cdot j'$:


• $\{i, j\} \cap \{i', j'\} = \emptyset$ (each base pairs with at
most one other base)

• $\{s[i], s[j]\} \in \{\{A, U\}, \{C, G\}, \{G, U\}\}$ (only
Watson-Crick and G, U wobble base pairs)

• $i < i' < j \Rightarrow j' < j$ (base pairs are either
nested or juxtaposed but not
overlapping)


The second requirement, that only canonical base 
pairs are allowed, is standard but not consequen- 
tial in solutions to the problem. The third require- 
ment states that the structure does not contain 
pseudoknots. This restriction is crucial for the 
results listed in this entry. 


Energy Model 

The model of Gibbs free energy applied, usually 
referred to as the nearest-neighbor model, was 
originally proposed by Tinoco et al. [10, 11]. It 
approximates the free energy by postulating that 
the energy of the full three dimensional structure 
only depends on the secondary structure, and 
that this in turn can be broken into a sum of 
independent contributions from each loop in the 
secondary structure. 


Definition 2 (Loops) For $i \cdot j \in S$, base $k$ is
accessible from $i \cdot j$ iff $i < k < j$ and $\neg \exists\, i' \cdot j' \in
S : i < i' < k < j' < j$. The loop closed
by $i \cdot j$, $\mathcal{L}_{i \cdot j}$, consists of $i \cdot j$ and all the bases
accessible from $i \cdot j$. If $i' \cdot j' \in S$ and $i'$ and $j'$
are accessible from $i \cdot j$, then $i' \cdot j'$ is an interior
base pair in the loop closed by $i \cdot j$.


Loops are classified by the number of interior 
base pairs they contain: 


• hairpin loops have no interior base pairs

• stacked pairs, bulges, and internal loops have
one interior base pair that is separated from the
closing base pair on neither side, on one side,
or on both sides, respectively

• multibranched loops have two or more interior
base pairs


Bases not accessible from any base pair are called 
external. This is illustrated in Fig. 1. The free 
energy of structure S is 


$$\Delta G(S) = \sum_{i \cdot j \in S} \Delta G(\mathcal{L}_{i \cdot j}) \qquad (2)$$


RNA Secondary Structure Boltzmann Distribution, Fig. 1
A hypothetical RNA structure illustrating the
different loop types. Bases are represented by circles,
the RNA backbone by straight lines, and base
pairs by zigzagged lines


where $\Delta G(\mathcal{L}_{i \cdot j})$ is the free energy contribution
from the loop closed by $i \cdot j$. The contribution of
$S$ to the full partition function is


$$e^{-\Delta G(S)/RT} = e^{-\sum_{i \cdot j \in S} \Delta G(\mathcal{L}_{i \cdot j})/RT}
= \prod_{i \cdot j \in S} e^{-\Delta G(\mathcal{L}_{i \cdot j})/RT}. \qquad (3)$$

Problem 1 (RNA Secondary Structure Distribution)

INPUT: RNA sequence $s$, absolute temperature $T$,
and specification of $\Delta G$ at $T$ for all loops.
OUTPUT: $\sum_S e^{-\Delta G(S)/RT}$, where the sum is over
all secondary structures for $s$.


Key Results 


Solutions are based on recursions similar to 
those for RNA Secondary Structure Prediction 
by Minimum Free Energy, replacing sum and 
minimization with multiplication and sum (or 
more generally with a merge function and 
a choice function [8]). The key difference 
is that recursions are required to be non- 
redundant, i.e., any particular secondary structure 
only contributes through one path through the 
recursions. 
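The role of non-redundancy can be illustrated with a deliberately simplified energy model. The sketch below assumes that every base pair contributes a fixed energy `pair_energy` (this is not the nearest-neighbor loop model) and uses a default `rt` of about 0.6163 kcal/mol, roughly $RT$ at 37 °C. The function name and both parameters are hypothetical. Each structure is counted exactly once because the last base of an interval is either unpaired or paired with exactly one partner $k$.

```python
import math

# Watson-Crick and G-U wobble pairs, in both orientations.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def partition_function(seq: str, pair_energy: float = -1.0, rt: float = 0.6163) -> float:
    """Sum of exp(-E(S)/RT) over all pseudoknot-free structures S of seq,
    where E(S) = pair_energy * (number of base pairs in S)  [simplified model]."""
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of a single pair
    # z[i][j] is the partition function of seq[i:j]; empty intervals give 1.
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = z[i][last]  # case 1: the last base is unpaired
            for k in range(i, last - 3):  # case 2: last pairs with k, k < last - 3
                if (seq[k], seq[last]) in CANONICAL:
                    # bases left of k and bases strictly between k and last
                    total += z[i][k] * weight * z[k + 1][last]
            z[i][j] = total
    return z[0][n]


if __name__ == "__main__":
    # With pair_energy = 0 every structure has weight 1, so Z counts structures.
    print(partition_function("GGAAACC", pair_energy=0.0))
```

The three nested loops give time cubic and space quadratic in $|s|$ for this simplified model. Replacing sum with minimization and product with sum in the same recursion yields the corresponding minimum energy computation.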


Theorem 1 Using the standard thermodynamic
model for RNA secondary structures, the
partition function can be computed in time $O(|s|^3)$
and space $O(|s|^2)$. Moreover, the computation can
build data structures that allow $O(1)$ queries
of the pairing probability of $i \cdot j$ for any
$1 \le i < j \le |s|$ [5, 6, 7].


Theorem 2 Using the standard thermodynamic
model for RNA secondary structures, the
expectation and variance of free energy over the
Boltzmann distribution can be computed in time
$O(|s|^3)$ and space $O(|s|^2)$. More generally, the $k$th
moment


$$E_{\mathrm{Boltzmann}}[\Delta G^k] = \frac{1}{Z} \sum_S e^{-\Delta G(S)/RT} \Delta G^k(S), \qquad (4)$$


where $Z = \sum_S e^{-\Delta G(S)/RT}$ is the full partition
function and the sums are over all secondary
structures for $s$, can be computed in time $O(k^2 |s|^3)$
and space $O(k |s|^2)$ [8].


In Theorem 2 the free energy does not hold a special
place. The theorem holds for any function
defined by an independent contribution from each
loop,


$$\Phi(S) = \sum_{i \cdot j \in S} \phi(\mathcal{L}_{i \cdot j}), \qquad (5)$$


provided each loop contribution can be handled
with the same efficiency as the free energy
contributions. Hence, moments over the Boltzmann
distribution of, e.g., the number of base pairs,
unpaired bases, or loops can also be efficiently
computed by applying appropriately chosen indicator
functions.


Applications


The original use of partition function computations
was for discriminating between well-defined
and less well-defined regions of a secondary
structure. Minimum free energy predictions will
always return a structure. Base pairing probabilities
help identify regions where the prediction
is uncertain, either due to the approximations
of the model or because the real structure
does indeed fluctuate between several low-energy
alternatives. Moments of Boltzmann distributions
are used in identifying how biological
RNA molecules deviate from random RNA
sequences.

The data structures computed in Theorem | 
can also be used to efficiently sample secondary 
structures from the Boltzmann distribution. This 
has been used for probabilistic methods for sec- 
ondary structure prediction, where the centroid 
of the most likely cluster of sampled structures 
is returned rather than the most likely, i.e., min- 
imum free energy, structure [3]. This approach 
better accounts for the entropic effects of large 
neighborhoods of structurally and energetically 
very similar structures. As a simple illustration
of this effect, consider twice flipping a coin with
probability $p > 0.5$ for heads. The probability $p^2$
of heads in both flips is larger than the probability
$p(1 - p)$ of heads followed by tails or
tails followed by heads (which again is larger
than the probability $(1 - p)^2$ of tails in both
flips). However, if the order of the flips is ignored,
the probability of one heads and one tails is
$2p(1 - p)$. The probability of two heads remains
$p^2$, which is smaller than $2p(1 - p)$ when $p < \frac{2}{3}$,
since $p^2 < 2p(1 - p)$ is equivalent to $p < 2 - 2p$.
Similarly, a large set of structures with fairly low
free energy may be more likely, when viewed as
a set, than a small set of structures with very low
free energy.


Open Problems 


As for RNA Secondary Structure Prediction by
Minimum Free Energy, improvements in time
and space complexity are always relevant. This
may be more difficult for computing distributions,
as the more efficient dynamic programming
techniques of [9] cannot be applied. In the context
of genome scans, the fact that the start and end
positions of encoded RNA molecules are unknown
has recently been considered [1].

Also the problem of including structures with 
pseudoknots, i.e., structures violating the last 
requirement in Definition 1, in the configuration 
space is an active area of research. It can be 
expected that all the methods of Theorems 3 
through 6 in the entry on RNA Secondary 
Structure Prediction Including Pseudoknots can 
be modified to computation of distributions 
without affecting complexities. This may require 
some further bookkeeping to ensure non- 
redundancy of recursions, and only in [4] has 
this actively been considered. 

Though the moments of functions that are 
defined as sums over independent loop contribu- 
tions can be computed efficiently, it is unknown 
whether the same holds for functions with more 
complex definitions. One such function that has 
traditionally been used for statistics on RNA sec- 
ondary structure [12] is the order of a secondary 
structure which refers to the nesting depth of 
multibranched loops. 


URL to Code 


Software for partition function computation
and a range of related problems is available
from www.bioinfo.rpi.edu/applications/hybrid/download.php
and www.tbi.univie.ac.at/~ivo/RNA/.
Software including a restricted class of
structures with pseudoknots [4] is available at
www.nupack.org.


RNA Secondary Structure Prediction by Minimum Free Energy


Problem Definition


This problem is concerned with predicting the 
set of base pairs formed in the native structure 
of an RNA molecule. The main motivation stems 
from structure being crucial for function and the 
growing appreciation of the importance of RNA 
molecules in biological processes. Base pairing 
is the single most important factor determining 
structure formation. Knowledge of the secondary 
structure alone also provides information about 
stretches of unpaired bases that are likely can- 
didates for active sites. Early work [7] focused 
on finding structures maximizing the number of 
base pairs. With the work of Zuker and Stiegler 
[17], focus shifted to energy minimization in a 
model approximating the Gibbs free energy of 
structures. 


Notation 

Let $s \in \{A, C, G, U\}^*$ denote the sequence of
bases of an RNA molecule. Use $X \cdot Y$ where
$X, Y \in \{A, C, G, U\}$ to denote a base pair
between bases of type $X$ and $Y$ and $i \cdot j$ where
$1 \le i < j \le |s|$ to denote a base pair between
bases $s[i]$ and $s[j]$.


Definition 1 (RNA Secondary Structure) A
secondary structure for an RNA sequence $s$ is a
set of base pairs $S \subseteq \{i \cdot j \mid 1 \le i < j \le |s| \wedge i <
j - 3\}$ such that for all $i \cdot j, i' \cdot j' \in S$ with $i \cdot j \ne i' \cdot j'$:


• $\{i, j\} \cap \{i', j'\} = \emptyset$ (each base pairs with at
most one other base)

• $\{s[i], s[j]\} \in \{\{A, U\}, \{C, G\}, \{G, U\}\}$ (only
Watson-Crick and G, U wobble base pairs)

• $i < i' < j \Rightarrow j' < j$ (base pairs are either
nested or juxtaposed but not overlapping)


The second requirement, that only canonical base
pairs are allowed, is standard but not consequential
in solutions to the problem. The third requirement
states that the structure does not contain
pseudoknots. This restriction is crucial for the
results listed in this entry.


RNA Secondary Structure Prediction by Minimum Free Energy, Fig. 1
A hypothetical RNA structure illustrating
the different loop types (labeled in the figure: bulge,
hairpin loop, multibranched loop, stacked pair,
internal loop, external base). Bases are represented by
circles, the RNA backbone by straight lines, and base
pairs by zigzagged lines


Energy Model 

The model of Gibbs free energy applied, usually 
referred to as the nearest-neighbor model, was 
originally proposed by Tinoco et al. [10, 11]. It 
approximates the free energy by postulating that 
the energy of the full three-dimensional structure 
only depends on the secondary structure and 
that this in turn can be broken into a sum of 
independent contributions from each loop in the 
secondary structure. 


Definition 2 (Loops) For $i \cdot j \in S$, base $k$ is
accessible from $i \cdot j$ iff $i < k < j$ and $\neg \exists\, i' \cdot j' \in
S : i < i' < k < j' < j$. The loop closed
by $i \cdot j$, $\mathcal{L}_{i \cdot j}$, consists of $i \cdot j$ and all the bases
accessible from $i \cdot j$. If $i' \cdot j' \in S$ and $i'$ and $j'$
are accessible from $i \cdot j$, then $i' \cdot j'$ is an interior
base pair in the loop closed by $i \cdot j$.

Loops are classified by the number of interior 
base pairs they contain: 


• Hairpin loops have no interior base pairs.

• Stacked pairs, bulges, and internal loops have
one interior base pair that is separated from the
closing base pair on neither side, on one side,
or on both sides, respectively.

• Multibranched loops have two or more interior
base pairs.


Bases not accessible from any base pair are called 
external. This is illustrated in Fig. 1. The free 
energy of structure S is 


$$\Delta G(S) = \sum_{i \cdot j \in S} \Delta G(\mathcal{L}_{i \cdot j}), \qquad (1)$$


where $\Delta G(\mathcal{L}_{i \cdot j})$ is the free energy contribution
from the loop closed by $i \cdot j$.


Problem 1 (Minimum Free Energy Structure)


INPUT: RNA sequence $s$ and specification of
$\Delta G$ for all loops
OUTPUT: A secondary structure achieving the
minimum of free energies, taken over all possible
secondary structures, i.e.,
$\arg\min \{\Delta G(S) \mid S \text{ secondary structure for } s\}$


Key Results 


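Solutions are based on using dynamic programming
to solve the general recursion

$$V[i, j] = \min_{k \ge 0;\; i < i_1 < j_1 < \cdots < i_k < j_k < j}
\Big\{ \Delta G(\mathcal{L}_{i \cdot j;\, i_1 \cdot j_1, \ldots, i_k \cdot j_k}) + \sum_{l=1}^{k} V[i_l, j_l] \Big\}$$

$$W[i] = \min \Big\{ W[i - 1],\ \min_{0 < k < i} \big\{ W[k - 1] + V[k, i] \big\} \Big\}$$

where $\Delta G(\mathcal{L}_{i \cdot j;\, i_1 \cdot j_1, \ldots, i_k \cdot j_k})$ is the free energy
of the loop closed by $i \cdot j$ and interior base
pairs $i_1 \cdot j_1, \ldots, i_k \cdot j_k$ and with initial condition
$W[0] = 0$. In the following, it is assumed that all
loop energies can be computed in time $O(1)$.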
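The structure of such a dynamic program can be sketched for a deliberately simplified energy model. The code below assumes that every base pair contributes a fixed `pair_energy` and that loops carry no other energy terms, so it is closer to base pair maximization than to the full loop-based recursion. The function name and parameter are hypothetical.

```python
# Watson-Crick and G-U wobble pairs, in both orientations.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def min_free_energy(seq: str, pair_energy: float = -1.0) -> float:
    """Minimum of E(S) = pair_energy * |S| over pseudoknot-free structures S
    in which paired positions i < j satisfy i < j - 3  [simplified model]."""
    n = len(seq)
    # e[i][j] is the minimum energy of seq[i:j]; empty intervals have energy 0.
    e = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            best = e[i][last]  # the last base is unpaired
            for k in range(i, last - 3):  # the last base pairs with k
                if (seq[k], seq[last]) in CANONICAL:
                    best = min(best, e[i][k] + pair_energy + e[k + 1][last])
            e[i][j] = best
    return e[0][n]


if __name__ == "__main__":
    print(min_free_energy("GGAAACC"))
```

A structure achieving the minimum can be recovered by the usual traceback through the table, which is omitted here for brevity.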


Theorem 1 If the free energy of multibranched
loops is a sum of:


• An affine function of the number of interior
base pairs and unpaired bases

• Contributions for each base pair from stacking
with either neighboring unpaired bases in the
loop or with a neighboring base pair in the
loop, whichever is more favorable


a minimum free energy structure can be computed
in time $O(|s|^4)$ and space $O(|s|^2)$ [17].


With these assumptions, the time required to
handle the multibranched loop parts of the
recursion reduces to $O(|s|^3)$. Hence, handling the
$O(|s|^4)$ possible internal loops becomes the
bottleneck.


Theorem 2 If furthermore the free energy of
internal loops is a sum of:


• A function of the total size of the loop,
i.e., the number of unpaired bases in the
loop

• A function of the asymmetry of the loop, i.e.,
the difference in number of unpaired bases on
the two sides of the loop

• Contributions from the closing and interior
base pairs stacking with the neighboring unpaired
bases in the loop


a minimum free energy structure can be computed
in time $O(|s|^3)$ and space $O(|s|^2)$ [5].


Under these assumptions, the time required to
handle internal loops reduces to $O(|s|^3)$.
With further assumptions on the free energy
contributions of internal loops, this can be
reduced even further, again making the handling
of multibranched loops the bottleneck of the
computation.


Theorem 3 If furthermore the size dependency
is concave and the asymmetry dependency is
constant for all but $O(1)$ values, a
multibranched-loop-free minimum free energy structure can
be computed in time $O(|s|^2 \log^2 |s|)$ and space
$O(|s|^2)$ [8].


The above assumptions are all based on the 
nature of current loop energies [6]. These ener- 
gies have to a large part been developed without 
consideration of computational expediency and 
parameters determined experimentally, although 
understanding of the precise behavior of larger 
loops is limited. For multibranched loops, some 
theoretical considerations [4] would suggest that 
a logarithmic dependency would be more appro- 
priate. 


Theorem 4 If the restriction on the dependency
on number of interior base pairs and unpaired
bases in Theorem 1 is weakened to any function
that depends only on the number of interior base
pairs, the number of unpaired bases, or the total
number of bases in the loop, a minimum free
energy structure can be computed in time O(n*) 


and space O(n?) [13]. 


Theorem 5 All the above theorems can be
modified to compute a data structure that for any
$1 \le i < j \le |s|$ allows us to compute the
minimum free energy of any structure containing
$i \cdot j$ in time $O(1)$ [15].


Applications 


Naturally, the key application of these algorithms 
is for predicting the secondary structure of 
RNA molecules. This holds in particular for
sequences with no homologues with common
structure, e.g., functional analysis based on
mutational effects and to some extent analysis
of RNA aptamers. With access to structurally
conserved homologues, prediction accuracy
is significantly improved by incorporating
comparative information [2].

Incorporating comparative information seems 
to be crucial when using secondary structure 
prediction as the basis of RNA gene finding. As 
it turns out, the minimum free energy of known 
RNA genes is not sufficiently different from the 
minimum free energy of comparable random se- 
quences to reliably separate the two [9, 14]. How- 
ever, minimum free energy calculations are at the 
core of one successful comparative RNA gene 
finder [12]. 


Open Problems 


Most current research is focused on refinement 
of the energy parametrization. The limiting factor 
of sequence lengths for which secondary struc- 
ture prediction by the methods described here is 
still feasible is adequacy of the nearest-neighbor 
approximation rather than computation time and 
space. Still, improvements on time and space 
complexities are useful as biosequence analyses 
are invariably used in genome scans. In par- 
ticular, improvements on Theorem 4, possibly 
for dependencies restricted to be logarithmic or 
concave, would allow for more advanced scoring 
of multibranched loops. A more esoteric open 
problem is to establish the complexity of comput- 
ing the minimum free energy under the general 
formulation of (1), with no restrictions on loop 
energies except that they are computable in time 
polynomial in |s|. 


Experimental Results 


With the release of the most recent energy
parameters [6], secondary structure prediction by
finding a minimum free energy structure was found
to recover approximately 73% of the base pairs
in a benchmark data set of RNA sequences with
known secondary structure. Another independent
assessment [1] put the recovery percentage
somewhat lower at around 56%. This discrepancy is
discussed and explained in [1].


Data Sets 


Families of homologous RNA sequences aligned
and annotated with secondary structure are
available from the Rfam database at
www.sanger.ac.uk/Software/Rfam/. Three-dimensional
structures are available from the Nucleic Acid
Database at ndbserver.rutgers.edu/. An extensive
list of this and other databases is available at
www.imb-jena.de/RNA.html.


URL to Code 


Software for RNA folding and a range of
related problems is available at
www.bioinfo.rpi.edu/applications/hybrid/download.php
and www.tbi.univie.ac.at/~ivo/RNA/. Software
implementing the efficient handling of internal
loops of [8] is available at
ftp.ncbi.nlm.nih.gov/pub/ogurtsov/Afold.


Problem Definition


This problem is concerned with predicting the set 
of base pairs formed in the native structure of an 
RNA molecule, including overlapping base pairs 
also known as pseudoknots. Standard approaches 
to RNA secondary structure prediction only allow 
sets of base pairs that are hierarchically nested. 
Though few known real structures require the 
removal of more than a small percentage of their 
base pairs to meet these criteria, a significant 
percentage of known real structures contain at 
least a few base pairs overlapping other base 
pairs. Pseudoknot substructures are known to be 
crucial for biological function in several contexts. 
One of the more complex known pseudoknot 
structures is illustrated in Fig. 1. 


Notation 

Let $s \in \{A, C, G, U\}^*$ denote the sequence of
bases of an RNA molecule. Use $X \cdot Y$ where
$X, Y \in \{A, C, G, U\}$ to denote a base pair
between bases of type X and Y and $i \cdot j$ where
$1 \le i < j \le |s|$ to denote a base pair between
bases s[i] and s[j].


Definition 1 (RNA Secondary Structure) A
secondary structure for an RNA sequence s is a
set of base pairs
$S = \{i \cdot j \mid 1 \le i < j \le |s| \wedge i < j - 3\}$.
For $i \cdot j, i' \cdot j' \in S$ with $i \cdot j \ne i' \cdot j'$:


• $\{i, j\} \cap \{i', j'\} = \emptyset$ (each base pairs with at
most one other base)

• $\{s[i], s[j]\} \in \{\{A, U\}, \{C, G\}, \{G, U\}\}$ (only
Watson-Crick and G, U wobble base pairs)


The second requirement that only canonical base 
pairs are allowed is standard but not consequen- 
tial in solutions to the problem. 
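
The conditions of Definition 1 can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as 1-based tuples (i, j); the function names `is_secondary_structure` and `has_pseudoknot` are illustrative and not taken from the cited literature. The second function tests for overlapping base pairs, i.e., pairs with i < i' < j < j'.

```python
from itertools import combinations

# Canonical pairs from Definition 1: Watson-Crick plus the G,U wobble pair.
CANONICAL = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def is_secondary_structure(s, pairs):
    """Check the conditions of Definition 1 for 1-based base pairs (i, j)."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(s) and i < j - 3):
            return False  # index range or minimum separation violated
        if i in used or j in used:
            return False  # a base may pair with at most one other base
        if frozenset((s[i - 1], s[j - 1])) not in CANONICAL:
            return False  # only canonical base pairs are allowed
        used.update((i, j))
    return True


def has_pseudoknot(pairs):
    """Return True if two base pairs overlap, i.e., i < i' < j < j'."""
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return True
    return False
```

For instance, the pairs (1, 10) and (5, 15) overlap, whereas (1, 10) and (2, 9) are nested.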


Scoring Schemes 

Structures are usually assessed by extending the
model of Gibbs free energy used for RNA
Secondary Structure Prediction by Minimum Free
Energy (cf. corresponding entry) with ad hoc
extrapolation of multibranched loop energies to
pseudoknot substructures [11] or by summing
independent contributions, e.g., obtained from
base pair restricted minimum free energy
structures from each base pair [13]. To investigate the
complexity of pseudoknot prediction, the
following three simple scoring schemes will also be
considered:


Number of base pairs:
$\#\mathrm{BP}(S) = |S|$

Number of stacking base pairs:
$\#\mathrm{SBP}(S) = |\{i \cdot j \in S \mid i+1 \cdot j-1 \in S \vee i-1 \cdot j+1 \in S\}|$

Number of base pair stackings:
$\#\mathrm{BPS}(S) = |\{i \cdot j \in S \mid i+1 \cdot j-1 \in S\}|$


These scoring schemes are inspired by the fact 
that stacked pairs are essentially the only loops 
having a stabilizing contribution in the Gibbs free 
energy model. 
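
As an illustration, the three simple scores can be computed directly from a set of base pairs. This is a minimal sketch, assuming that #SBP counts base pairs stacking with a neighboring pair on either side and #BPS counts pairs i, j for which i+1, j-1 is also in S; the function names `bp`, `sbp`, and `bps` are hypothetical.

```python
def bp(S):
    """#BP: number of base pairs."""
    return len(set(S))


def sbp(S):
    """#SBP: base pairs that stack with a neighboring pair on either side."""
    S = set(S)
    return sum(1 for (i, j) in S
               if (i + 1, j - 1) in S or (i - 1, j + 1) in S)


def bps(S):
    """#BPS: stackings, i.e., pairs (i, j) with (i+1, j-1) also in S."""
    S = set(S)
    return sum(1 for (i, j) in S if (i + 1, j - 1) in S)


# A helix of three stacked pairs plus one isolated pair.
example = {(1, 20), (2, 19), (3, 18), (7, 14)}
```

On the example set, the helix of three pairs contributes three stacking base pairs but only two stackings, and the isolated pair contributes to #BP alone.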


Problem 2 (Pseudoknot Prediction) 


INPUT: RNA sequence s and an appropriately 
specified scoring scheme 

OUTPUT: A secondary structure S for s that is 
optimal under the scoring scheme specified 


Key Results 


Theorem 1 The complexities of pseudoknot
prediction under the three simplified scoring
schemes can be classified as follows, where $\Sigma$
denotes the alphabet.


Theorem 2 If structures are restricted to be
planar, i.e., the graph with the bases of the sequence
as nodes and base pairs and backbone links of
consecutive bases as edges is required to be
planar, pseudoknot prediction under the #BPS
scoring scheme is NP-hard for an alphabet of size
4. Conversely, a 1/2-approximation can be found
in time $O(|s|^3)$ and space $O(|s|^2)$ by observing
that an optimal pseudoknot free structure is a
1/2-approximation [6].


There are no steric reasons that RNA
secondary structures should be planar, and the
structure in Fig. 1 is actually nonplanar. Nevertheless,
known real structures have relatively simple
overlapping base pair patterns with very few
nonplanar structures known. Hence, planarity has been
used as a defining restriction on pseudoknotted
structures [2, 15]. Similar reasoning has led to
the development of several algorithms for finding
an optimal structure from restricted classes of
structures. These algorithms tend to use more
realistic scoring schemes, e.g., extensions of the
Gibbs free energy model, than the three simple
scoring schemes considered above.


Theorem 3 Pseudoknot prediction for a
restricted class of structures including Fig. 2a-e,
but not Fig. 2f, can be done in time $O(|s|^6)$ and
space $O(|s|^4)$ [11].


Theorem 4 Pseudoknot prediction for a
restricted class of planar structures including
Fig. 2a-c, but not Fig. 2d-f, can be done in time
$O(|s|^5)$ and space $O(|s|^4)$ [14].


Theorem 5 Pseudoknot prediction for a
restricted class of planar structures including
Fig. 2a, b, but not Fig. 2c-f, can be done in time
$O(|s|^5)$ and space $O(|s|^4)$ or $O(|s|^3)$ [1, 4]
(methods differ in generality of scoring schemes
that can be used).
Complexity classification of Theorem 1, fixed alphabet:
#BP [13]: time $O(|s|^3)$, space $O(|s|^2)$
#SBP [7]: time $O(|s|^{1+|\Sigma|^2+|\Sigma|^3})$, space $O(|s|^{|\Sigma|^2+|\Sigma|^3})$
#BPS: NP hard for $|\Sigma| = 2$, PTAS [7], 1/3-approximation
in time $O(|s|)$ [6]


Fig. 2 RNA secondary structures illustrating
restrictions of pseudoknot prediction algorithms.
Backbone is drawn as a straight line, while base
pairings are shown with zigzagged arcs (figure
annotation: any number of base pairs)


Theorem 6 Pseudoknot prediction for a 
restricted class of planar structures including 
Fig. 2a, but not Fig. 2b-f, can be done in time 
O(|s|*) and space O(|s|*) [1, 8]. 


Theorem 7 Recognition of structures belonging
to the restricted classes of Theorems 3, 5, and 6
and enumeration of all irreducible cycles (i.e.,
loops) in such structures can be done in time
$O(|s|)$ [3, 9].


Applications 


As for the prediction of RNA secondary struc- 
tures without pseudoknots, the key application of 
these algorithms is for predicting the secondary 
structure of individual RNA molecules. Due to 
the steep complexities of the algorithms of The- 
orems 3-6, these are less well suited for genome 
scans than prediction without pseudoknots. 

Enumerating all loops of a structure in linear 
time also allows scoring a structure in linear time, 
as long as the scoring scheme allows the score 
of a loop to be computed in time proportional 
to its size. This has practical applications in 
heuristic searches for good structures containing 
pseudoknots. 


For a fixed alphabet, #BPS is NP hard for
$|\Sigma| = 2$ and admits a PTAS [7] and a
1/3-approximation.


Complexity classification of Theorem 1, unbounded alphabet:
#BP: time $O(|s|^3)$, space $O(|s|^2)$
#SBP: NP hard
#BPS: NP hard [7], 1/3-approximation in time and space
$O(|s|^2)$ [6]


In Fig. 2, the backbone is drawn as a straight line,
while base pairings are shown with zigzagged arcs.


Open Problems 


Efficient algorithms for prediction based on
restricted classes of structures with pseudoknots
that still contain a significant fraction of all
known structures are an active area of research.
Even using the more theoretical simple #SBP
scoring scheme, developing, e.g., an $O(|s|^{|\Sigma|})$
algorithm for this problem would be of practical
significance. From a theoretical point of view,
the complexity of planar structures is the least
well understood, with results for only the #BPS
scoring scheme.

Classification of realistic energy models for 
RNA secondary structures with pseudoknots is 
much less developed than for RNA secondary 
structures without pseudoknots. Several recent
papers have addressed this gap [3, 9, 12].


Data Sets 


PseudoBase at
http://biology.leidenuniv.nl/~batenburg/PKB.html
is a repository of representatives of most known
RNA structures with pseudoknots.


URL to Code


The method of Theorem 3 is available at
http://selab.janelia.org/software.html#pknots and
one of the methods of Theorem 5 at
http://www.nupack.org, and an implementation
applying a slight heuristic reduction of the class
of structures considered by the method of
Theorem 6 is available at
http://bibiserv.techfak.uni-bielefeld.de/pknotsrg/ [10].


Problem Definition

Since ancient history mankind has been
fascinated by the problem of orienting itself in
unknown environments. Problems like escaping
from a labyrinth or searching for target objects
have been considered and discussed intensively.
Such problems can be easily modeled in a
geometric setting.


Robotics, Fig. 1 Any reasonable strategy for searching
for a point on a line can be expressed as a sequence
$X = f_1, f_2, f_3, \ldots$ of the search depths of the strategy

For example, let us assume that an agent is 
searching for an unknown door along a wall. 
We assume that the door is detected, if the 
agent exactly hits the door. Geometrically one 
is searching for an unknown point along a line 
as depicted in Fig. 1. Although the location of 
the point is not known and is only detected by a 
visit, it should be found without too much detour. 
This classical problem in online navigation was 
discussed by [2] in the early 1990s. 

Since the distance and the location of the goal
are not known, obviously any reasonable
(deterministic) strategy should move in the two
directions alternatingly and with increasing depths,
$f_i$, until the goal is detected.

A classical result of [13] shows that it is
optimal to use a search strategy that doubles the
search distance in every step, i.e., $f_i = 2^i$. It can
be shown that the resulting path to an arbitrary
target point t is never greater than 9 times the
shortest path to the target t, regardless of the
position of t. There is no deterministic strategy
that attains a smaller factor; see also [2].
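
The behavior of the doubling strategy can be explored with a short simulation. This is a minimal sketch under stated assumptions: the searcher starts at the origin, the first excursion has depth 1, excursions go right in even steps and left in odd steps, and the target lies at distance at least 1 from the start (otherwise an additive constant is needed). The name `doubling_search_cost` is hypothetical.

```python
def doubling_search_cost(target):
    """Distance traveled by the doubling strategy until `target` is reached.

    In step i = 0, 1, 2, ... the searcher walks to depth 2**i on
    alternating sides (right for even i, left for odd i) and returns to
    the origin after each unsuccessful excursion.
    """
    if target == 0:
        return 0.0
    traveled = 0.0
    i = 0
    while True:
        depth = 2.0 ** i
        side = 1 if i % 2 == 0 else -1
        if target * side > 0 and abs(target) <= depth:
            # The target lies on the current side within the current depth.
            return traveled + abs(target)
        traveled += 2 * depth  # walk out and back
        i += 1
```

The worst case arises for targets just beyond a turning point: a target slightly farther than 2^k on the side visited in step k is found only two excursions later, and the ratio of traveled distance to target distance approaches 9 from below as k grows.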


Navigation and Exploration 

We categorize three fundamental tasks:
navigation, exploration, and localization. Navigation (or
search) means to find a way to a prescribed
location in an unknown environment as shown
above. Exploration means to draw a complete
map of an unknown environment or to detect or
visit all possible targets. Localization means to
determine the currently unknown position on a
known and given map. In many settings, the
environment is modeled geometrically as a simple
polygon with or without holes. To distinguish
the underlying combinatorial problems from the
geometric problems, an environment may also be
modeled as a graph. For an overview of online
searching and exploration problems, see [25] or
the more recent survey of Gal [14].


Performance Measure, Competitive Analysis


A general concept for evaluating the efficiency
of an online strategy is the so-called competitive
analysis. Formally, for a class of problems $\Pi$
and any instance $P \in \Pi$, the cost, OnlAlg(P),
of the online algorithm is compared to the cost,
OfflOpt(P), of the optimal offline algorithm. If
there are constants C and A, so that


$\mathrm{OnlAlg}(P) \le C \cdot \mathrm{OfflOpt}(P) + A$


holds for any $P \in \Pi$, the online algorithm is
called C-competitive. In the case of exploration
and navigation, the robot should minimize its
travel distance. Therefore, the competitive ratio
C measures the length of the detour compared
to the optimal shortest tour computed under full
information. An overview of efficient
computations of optimal offline solutions for
shortest paths problems can be found in the survey
of Mitchell [23]. Many online motion planning
problems were classified by the competitive
analysis; see the surveys [4, 10, 25].

A randomized online algorithm against an
oblivious adversary uses randomization on a
fixed predetermined input (which is unknown
to the online algorithm). In this case, the
competitive ratio is a random variable, and
it is maximized over all possible inputs. For
example, an optimal randomized strategy for the
introductory point-on-a-line problem given by
Kao et al. [17] achieves an optimal competitive
ratio of 4.5911...


Different Models 

The robot can be equipped with a vision system 
or with a local touch sensor, only. The impact 
of a compass is of some interest. One can con- 
sider continuous geometric settings such as (a 
collection of) simple polygons or a concatena- 
tion of corridors. On the other hand, the ge- 
ometric environment might be given by a dis- 
crete concatenation of single cells (i.e., a grid 
graph environment) or is modeled by a general 
graph. Furthermore, we can consider a single 
robot or a set of k agents which are working 
together and exchange information to some ex- 
tent. Additionally the size of the memory of the 
agents can be limited. Tasks for a huge set of 
agents with very limited abilities are related to 
swarm behavior which is not the topic of this 
overview. 


Key Results 


Navigation 

Blum et al. [6] studied the problem of a blind
robot trying to reach a goal t from a start
position s (point-to-point navigation) in a
two-dimensional scene of m non-overlapping
axis-parallel rectangles of width at least one.
In the wall problem, t is an infinite vertical
line. In the room problem, the obstacles are
within a square room with entry door s
and the target t lies on the outer boundary.
$O(\sqrt{n})$ competitive online algorithms have been
developed. A lower bound on the competitive
ratio of $\Omega(\sqrt{n})$ for the wall problem was given
by [24], and for the room problem optimal
$O(\log n)$ competitive algorithms have been
presented by [3]. For randomized strategies
and point-to-point navigation, there is an
$\Omega(\log \log n)$ lower bound for the model of an
oblivious adversary from [18], and [5] presented
randomized $O(\log n)$ competitive algorithms
for the same problem and also for the wall
problem.

The introductory search problem for the
line (or 2-rays) was extended to m concurrent
rays where an optimal competitive ratio of
$1 + 2m^m/(m-1)^{m-1}$ was shown; see [2, 13]. An
optimal strategy visits the m rays alternatingly
with search depth $f_i = (m/(m-1))^i$. For
p agents on m rays working in parallel and
exchanging information, an optimal ratio of
$1 + 2(m/p - 1)(m/(m-p))^{m/p}$ can be achieved;
see [21]. In a natural extension in dimension 2,
the robot scans the area with a radar connected to
the starting point. Gal [13] introduced this
two-dimensional search problem and conjectures that
a logarithmic spiral (i.e., the natural continuous
extension of the doubling heuristic) gives an
optimal strategy. The best logarithmic spiral
attains a competitive ratio of 17.289...; finally a
proof for the optimality of spiral search is given
in [20].
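
As a quick numerical check, the ratios for search on m rays can be evaluated directly. This sketch assumes the ratios read 1 + 2m^m/(m-1)^(m-1) for a single agent and 1 + 2(m/p - 1)(m/(m-p))^(m/p) for p agents with p < m; the function names are hypothetical.

```python
def ray_search_ratio(m):
    """Optimal competitive ratio for one agent on m >= 2 concurrent rays."""
    return 1 + 2 * m**m / (m - 1) ** (m - 1)


def parallel_ray_search_ratio(m, p):
    """Optimal ratio for p < m communicating agents on m rays."""
    return 1 + 2 * (m / p - 1) * (m / (m - p)) ** (m / p)
```

For m = 2 the first formula gives 9, matching the doubling strategy on the line, and for p = 1 the second formula reduces to the first.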


Exploration 
Deng et al. [7] introduced the online gallery
route problem. We consider a simple room
modeled by a simple polygon and an agent equipped
with a visibility system. The task of computing
the shortest roundtrip so that the agent sees all
points in the polygon is denoted as the shortest
watchman route (SWR) problem. In the case of
a rectilinear simple polygon and with $L_1$-metric,
there is an optimal (i.e., 1-competitive) online
algorithm which gives a $\sqrt{2}$-approximation of the
SWR for the $L_2$-metric in the rectilinear problem.
For general simple polygons, the problem was
first solved by Icking et al. [16] with a proven
competitive ratio of 26.5, whereas the greatest
known lower bound is given by 1.28; see [15].
For the exploration of a geometric
environment with k rectilinear obstacles, there is an
$\Omega(\sqrt{k})$ lower bound on the competitive ratio for
deterministic and randomized strategies; see [1].
Online graph exploration by a set of k agents
means that every vertex of an unknown graph
has to be visited. In some configurations,
additionally all edges have to be traversed. Assume that
full communication among the agents is given.
Finding the optimal makespan (finishing time)
algorithm for k agents is an NP-hard problem
even for a given tree, and there is an $O(k/\log k)$
competitive algorithm for the online exploration
(edges and vertices) version; see [12]. On the
other hand, a special tree construction in [9]
gives an $\Omega(\log k/\log \log k)$ lower bound on the
competitive ratio. For cell environments (grid
graph with or without holes), optimal competitive
strategies exist.


Dependency Between Searching and 
Exploration 

From the m-concurrent rays result above, one can
easily deduce that there is no constant
competitive online strategy for searching a point in simple
polygons with n edges where n can grow.
Nevertheless in a fixed polygon, any search strategy
that sees all points defines a ratio for any point
and has a worst case ratio; see Fig. 2. So there
has to be some optimal search path for any
fixed polygon. We want to find a general strategy
that approximates the best search path for any
polygon within a constant factor, i.e., within a
constant search ratio.

The search ratio definition was first given by
Koutsoupias et al. [19]; they studied graphs with
unit edge length. The result of Koutsoupias et
al. is restricted to the offline case where the
graph is completely known a priori. Only the goal
remains hidden. The above concept goes beyond
competitive analysis, although the definitions of
the search ratio and the competitive factor are
quite similar. In the competitive framework, we
compare the online path from the start to the goal
to the shortest s-to-t path for any possible goal.
For an approximation of the optimal search ratio,
we compare the online path to the best possible
offline path for any goal, which, in turn, may
already have a very bad competitive ratio.


Robotics, Fig. 2 In a fixed polygon, a search path $\pi$ sees
all points and attains a ratio for any single point. At $p_\pi$
the target point p is detected for the first time and defines
a ratio

The key idea for solving this problem also 
indicates the general dependency between 
searching and exploration. We make use 
of efficient (probably constant competitive) 
exploration strategies for the given environments. 
If they can be restricted to a bounded distance in a 
somewhat greater environment, we successively 
increase the exploration depth by a doubling 
factor. 

Fleischer et al. [11] showed that it is possible 
to approximate an optimal search path for search- 
ing a point in a simple polygon by a factor of 
roughly 4 if the goal has to be visited directly. 
If a vision system is used, a factor of roughly 8 
can be guaranteed. The result even holds when 
the environment is not given in advance. And the 
result also holds even though the optimal search 
path in a given simple polygon is not known. 


Applications 


In practice a robot can efficiently arrive at a
target point (given by coordinates) in an unknown
environment with obstacles by Lumelsky's BUG
strategy [22]. Many such BUG-variants were
developed and successfully applied, for example, in
some of the Mars rover expeditions. If
Lumelsky's BUG algorithm is assumed to navigate
between convex obstacles, in the worst case it
moves at most once around every significant
obstacle, which is optimal in this case. Additionally
a robot with a compass can sometimes find the
goal exponentially faster than a robot without a
compass.

Theoretical paradigms have practical
relevance in robotics; see Dudek and Jenkin [8].
The doubling heuristic is widely accepted as
an approximation scheme in practice. The
general concept for the approximation of the
optimal search path has some influence. If
somebody is searching for a target in an unknown
environment, it seems to be unavoidable that the
environment has to be explored efficiently with
increasing depth.

The concept also shows that there is some- 
times no significant difference between a known 
environment with unknown targets and a fully un- 
known environment. Roughly speaking, if some- 
body is searching for a goal in unknown position, 
it is not important whether the corresponding 
environment is fully known in advance or not 
known at all. 


Open Problems 


For many settings, the precise competitive
complexity of an online motion planning
problem is not known. Tight lower bounds are
much harder to achieve. Some examples are
given below. The lower bound constructions for
the navigation among obstacles usually make use
of arbitrarily thin obstacles. Is it possible to get rid
of such a restriction?


• Exploration of a simple room with visibility:
Upper bound 26.5 vs. lower bound 1.28

• Exploration of graphs by k agents: Upper
bound $O(k/\log k)$ vs. lower bound
$\Omega(\log k/\log \log k)$

• Optimality of spiral search? Upper bounds given
by spiral search.

  - Searching for a line in the plane: Upper
  bound 13.81113... vs. lower bound $\sqrt{2} \cdot 9$

  - Searching for a ray in the plane: Upper bound
  22.513... vs. lower bound 17.289...

• Navigation among k obstacles: Does the lower
bound of $\Omega(\sqrt{k})$ hold for a fixed aspect ratio
of the obstacles?

• Optimal search path: How to compute the
optimal search path for a given polygon or graph?


Problem Definition


Consider the classical online bin packing prob- 
lem, where items of sizes in (0, 1] arrive over 
time. At the arrival of each item, it has to be 
assigned to a bin of capacity 1 such that the total 
size of all items in the bin does not exceed its 
capacity. The objective is to minimize the number 
of used bins. 

Online bin packing was introduced by 
Ullman [10] and has seen enormous research 
since then (see the survey of Seiden [9] for an 
overview). The quality of an online algorithm 
is typically measured by the asymptotic 
performance guarantee of the algorithm divided 
by the optimal offline solution and is called the 
(asymptotic) competitive ratio. In the case of 
online bin packing, the best known algorithm has 
an asymptotic competitive ratio of 1.58889 (see 
[9]). On the other hand, it was shown that no 
algorithm can achieve a ratio better than 1.54037 
(see [1]). 

To obtain algorithms with improved
competitive ratio for online bin packing, one can allow
already packed items to be rearranged as soon as a
new item arrives. The notion of robustness allows
repacking a set of already packed items with limited
total size whenever a new item arrives. On the one
hand, we want to guarantee that we use as few
bins as possible, and on the other hand, when a
new item arrives, we want to minimize the total
size of repacked items.

A modern way to measure the repacking costs
is the notion of the migration factor, developed
by Sanders, Sivadasan, and Skutella [8]. It is
defined as the total size of all moved items
divided by the size of the arriving item. Following
the notation of Sanders et al., an online algorithm
with (asymptotic) approximation ratio $1 + \epsilon$ is
called robust if its migration factor is of the size
$f(1/\epsilon)$, where f is an arbitrary function that only
depends on $1/\epsilon$.
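
As a small worked example of this definition, the migration factor of a single repacking step is the total size of the moved items divided by the size of the arriving item. The sketch below uses a hypothetical helper name and made-up item sizes purely for illustration.

```python
def migration_factor(moved_sizes, new_item_size):
    """Total size of repacked items divided by the size of the new item."""
    return sum(moved_sizes) / new_item_size


# An item of size 0.25 arrives and two items of sizes 0.5 and 0.25 are
# moved to other bins to make room for it.
factor = migration_factor([0.5, 0.25], 0.25)
```

Here the repacking step moves three times the size of the arriving item.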


Key Results 


In the case of robust bin packing, Epstein and 
Levin [3] proved that the asymptotic competitive 
ratio of the robust bin packing problem can be
arbitrarily close to the optimum. They developed 
an asymptotic PTAS for the problem using a 
migration factor of rs ve 2), They also proved 
that there is no online algorithm for this prob- 
lem which has a constant migration factor and 
that maintains an optimal solution. The asymp- 
totic PTAS by Epstein and Levin was later im- 
proved by Jansen and Klein [7], who developed 
an asymptotic FPTAS for the problem with a 
migration factor of O( +). 


Techniques 

Most robust algorithms rely on a sensitivity
result for integer linear programs (ILPs) by
Cook et al. [2]. It was first used by Sanders
et al. [8] to develop a robust PTAS for the
scheduling problem on identical machines with
the objective of minimizing the makespan.
The theorem of Cook et al. roughly states that
for every optimal integral solution $y'$ of the
ILP $\min\{cy \mid Ay \le b'\}$, there exists an optimal
integral solution $y''$ of the ILP $\min\{cy \mid Ay \le b''\}$
with changed right-hand side $b''$ such that the
distance between $y'$ and $y''$ can be bounded by
$\|y'' - y'\|_\infty \le n\Delta(\|b'' - b'\|_\infty + 2)$, where n
is the number of variables and $\Delta$ is the absolute
value of the largest subdeterminant of A.

The major contribution by Epstein and Levin
for the robust bin packing problem was to
develop a dynamic rounding technique. Based
on a classical rounding by Fernandez de La
Vega and Lueker [5], the dynamic rounding
technique shows how item sizes
can be rounded in a setting where new items
arrive over time. This makes it possible to
formulate an ILP of fixed dimension. As a new item
arrives online, the formulated ILP changes accordingly.
The changed ILP has additional columns and
the right-hand side of the ILP is increased.
Using the theorem of Cook et al. [2] then makes
it possible to find a solution for the changed ILP
that is close to the existing solution. This way
a new packing is constructed for the bin packing
instance containing the newly arrived item.

Since the number of variables n and the largest
subdeterminant $\Delta$ in the ILP formulation can
only be bounded by an exponential term in $1/\epsilon$,
the use of the theorem of Cook et al. leads to an
exponential migration factor. Jansen and Klein
developed new LP and ILP techniques which
are based on approximate solutions of the
corresponding LP. Their central idea is to show that
for any approximate solution x’ with objective 
value ||x’||, < (+ 4)LIN, there is an approx- 
imate solution x” with improved objective value 
\|x’ ||, < G+ 6)LIN — 1 such that || y” — y’||, = 
O(F). Based on this observation, they can avoid 
the use of Cooks et al. theorem to obtain an 
asymptotic PTAS for the bin packing problem 
with polynomial migration. 


Open Problems 


• There is the obvious open question on how
much the migration factor of O(5) from [7] 
can be improved and whether there are lower 
bounds for the migration factor. The existence 
of a robust algorithm with constant or sublin- 
ear migration is still open. 

• Is there a robust approximation scheme for
the case when items not only arrive but also
depart? In the literature this problem is called
fully dynamic bin packing and was
considered by Ivkovic and Lloyd [6]. They
developed an algorithm which achieves an
asymptotic competitive ratio of 5/4 using amortized
$O(\log n)$ shifting moves, where n is the
number of packed items. A shifting move repacks
one large item or a bundle of small items of
bounded size.

• Epstein and Levin developed a robust asymptotic
PTAS for the generalized bin packing
problem, where d-dimensional cubes have to 
be packed into unit-sized cubes [4]. It would 
be interesting to find other robust approxima- 
tion schemes for other packing problems like 
online strip packing or online bin packing with 
bins of different capacities. 


Cross-References 


Problem Definition


Algorithms in computational geometry are usu- 
ally designed under the Real RAM model. In 
implementing these algorithms, however, fixed- 
precision arithmetic is used in place of exact 
arithmetic. This substitution introduces numeri- 
cal errors in the computations that may lead to 
nonrobust behavior in the implementation, such 
as infinite loops or segmentation faults. 

There are various approaches in the
literature addressing the problem of nonrobustness
in geometric computations; see [9] for a survey.
These approaches can be classified along two
lines: the arithmetic approach and the
geometric approach.

The arithmetic approach tries to address non- 
robustness in geometric algorithms by handling 
the numerical errors arising because of fixed- 
precision arithmetic; this can be done, for in- 
stance, by using multi-precision arithmetic [6], or 
by using rational arithmetic whenever possible. In 
general, all the arithmetic operations, including 
exact comparison, can be performed on algebraic 
quantities. The drawback of such a general ap- 
proach is its inefficiency. 

The geometric approaches guarantee that cer- 
tain geometric properties are maintained by the 
algorithm. For example, if the Voronoi diagram 
of a planar point set is being computed then it 
is desirable to ensure that the output is a planar 
graph as well. Other geometric approaches are 
finite resolution geometry [7], approximate pred- 
icates and fat geometry [8], consistency and topo- 
logical approaches [4], and topology oriented 
approach [13]. The common drawback of these 
approaches is that they are problem or algorithm 
specific. 

In the past decade, a general approach called 
the Exact Geometric Computation (EGC) [15] 
has become very successful in handling the issue 
of nonrobustness in geometric computations; 
strictly speaking, this approach is subsumed in 
the arithmetic approaches. To understand the 
EGC approach, it helps to understand the two 
parts common to all geometric computations: 
a combinatorial structure characterizing the 
discrete relations between geometric objects, e.g.,
whether a point is on a hyperplane or not; and
a numerical part that consists of the numerical 
representation of the geometric objects, e.g., the 
coordinates of a point expressed as rational or 
floating-point numbers. Geometric algorithms 
characterize the combinatorial structure by 
numerically computing the discrete relations 
(that are embodied in geometric predicates) 
between geometric objects. Nonrobustness arises 
when numerical errors in the computations 
yield an incorrect characterization. The EGC 
approach ensures that all the geometric predicates 
are evaluated correctly thereby ensuring the 
correctness of the computed combinatorial 
structure and hence the robustness of the 
algorithm. 


Notation 

An expression E refers to a syntactic object
constructed from a given set of operators over the
reals $\mathbb{R}$. For example, the set of expressions on
the set of operators $\{\mathbb{Z}, +, -, \times, \sqrt{\cdot}\}$ is the set of
division-free radical expressions on the integers.
More concretely, expressions can be viewed as
directed acyclic graphs (DAGs) where the internal
nodes are operators with arity at least one, and
the leaves are constants, i.e., operators with arity
zero. The value of an expression is naturally
defined using induction; note that the value may
be undefined. Let E represent both the value of
the expression and the expression itself.


Key Results 


Following are the key results that have led to the 
feasibility and success of the EGC approach. 


Constructive Zero Bounds 

The possibility of the EGC approach hinges on the
computability of the sign of an expression. For
determining the sign of algebraic expressions,
EGC libraries currently use a numerical approach
based upon zero bounds. A zero bound b > 0 for
an expression E is such that the absolute value |E| of
E is greater than b if the value of E is valid and
nonzero. To determine the sign of the expression
E, compute an approximation $\tilde{E}$ to E such that
$|\tilde{E} - E| < b/2$ if E is valid; otherwise $\tilde{E}$ is also
invalid. Then the sign of E is the same as the sign of
$\tilde{E}$ if $|\tilde{E}| > b/2$; otherwise it is zero. A constructive
zero bound is an effectively computable function
B from the set of expressions to the real numbers $\mathbb{R}$
such that B(E) is a zero bound for any expression
E. For examples of constructive zero bounds,
see [2, 11].
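
The sign test described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: expressions are restricted to integer combinations of square roots of integers, the helper names (`sqrt_approx`, `make_approx`, `sign_via_zero_bound`) are hypothetical, and the zero bound b is simply supplied by the caller rather than computed by one of the constructive bounds of [2, 11]. The approximation is required to be within b/2 of the true value.

```python
from fractions import Fraction
from math import isqrt


def sqrt_approx(n: int, k: int) -> Fraction:
    """Return floor(sqrt(n) * 2^k) / 2^k, so 0 <= sqrt(n) - result < 2^-k."""
    return Fraction(isqrt(n << (2 * k)), 1 << k)


def make_approx(terms):
    """Build an approximator for E = sum(c * sqrt(n)) over (c, n) in terms.

    The returned function approx(eps) yields a rational within eps of E.
    """
    weight = sum(abs(c) for c, _ in terms)

    def approx(eps: Fraction) -> Fraction:
        k = 0
        # Each term is off by less than |c| * 2^-k, so the total error is
        # below weight * 2^-k; increase k until that is at most eps.
        while Fraction(weight, 1 << k) > eps:
            k += 1
        return sum(c * sqrt_approx(n, k) for c, n in terms)

    return approx


def sign_via_zero_bound(approx, b: Fraction) -> int:
    """Sign of E, given approx(eps) and an ASSUMED valid zero bound b for E."""
    e_tilde = approx(b / 2)          # |e_tilde - E| < b/2
    if abs(e_tilde) > b / 2:         # error cannot flip the sign
        return 1 if e_tilde > 0 else -1
    return 0                         # |E| < b, hence E must be zero


# sqrt(8) - 2*sqrt(2) is exactly zero; sqrt(3) - sqrt(2) is positive.
b = Fraction(1, 2**20)  # assumed zero bound, chosen by hand for this demo
zero_sign = sign_via_zero_bound(make_approx([(1, 8), (-2, 2)]), b)
pos_sign = sign_via_zero_bound(make_approx([(1, 3), (-1, 2)]), b)
```

Note that the correctness of the returned sign rests entirely on b being a genuine zero bound for the expression; a bound that is too large would misclassify a small nonzero value as zero.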


Approximate Expression Evaluation 

Another crucial feature in developing the EGC 
approach is developing algorithms for approxi- 
mate expression evaluation, i.e., given an expres- 
sion E and a relative or absolute precision p, 
compute an approximation to the value of the 
expression within precision p. The main com- 
putational paradigm for such algorithms is the 
precision-driven approach [15]. Intuitively, this 
is a downward-upward process on the input ex- 
pression DAG; propagate precision values down 
to the leaves in the downward direction; at the 
leaves of the DAG, assume the ability to approx- 
imate the value associated with the leaf to any 
desired precision; finally, propagate the approxi- 
mations in the upward direction towards the root. 
Ouchi [10] has given detailed algorithms for the 
propagation of “composite precision”, a general- 
ization of relative and absolute precision. 


Numerical Filters 

Implementing approximate expression evalu- 
ation requires multi-precision arithmetic. But 
efficiency can be gained by exploiting machine 
floating-point arithmetic, which is fast and 
optimized on current hardware. The basic idea 
is to check the output of machine evaluation
of predicates, and fall back on multi-precision
methods if the check fails. These checks are 
called numerical filters; they certify certain 
properties of computed numerical values, such 
as their sign. There are two main classifications 
of numerical filters: static filters are those that 
can be mostly computed at compile time, but they 
yield overly pessimistic error bounds and thus are 
less effective; dynamic filters are evaluated
at run time and, even though they have higher
costs, they are much more effective than static
filters, i.e., they give better estimates of the error bounds.
See Fortune and van Wyk [5].
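
As an illustration of the filtering idea, the following Python sketch evaluates the planar orientation predicate in machine floating point and falls back to exact rational arithmetic only when a run-time error bound cannot certify the sign. It is a sketch under assumptions, not the filter of any particular library: the function names are hypothetical, the inputs are assumed to be IEEE double-precision values with no overflow or underflow in the products, and the error constant `(3 + 16*eps) * eps` is taken from the first-stage test in Shewchuk's well-known robust predicates code.

```python
from fractions import Fraction

EPS = 2.0 ** -53                       # half the unit in the last place
ERR_A = (3.0 + 16.0 * EPS) * EPS       # assumed forward error constant


def orient2d_exact(a, b, c) -> int:
    """Exact sign of the orientation determinant using rationals."""
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (*a, *b, *c))
    det = (ax - cx) * (by - cy) - (ay - cy) * (bx - cx)
    return (det > 0) - (det < 0)


def orient2d_filtered(a, b, c):
    """Return (sign, path) where path records which evaluation decided."""
    detleft = (a[0] - c[0]) * (b[1] - c[1])
    detright = (a[1] - c[1]) * (b[0] - c[0])
    det = detleft - detright
    # Dynamic filter: the bound is computed from the actual operands.
    bound = ERR_A * (abs(detleft) + abs(detright))
    if abs(det) > bound:
        return (1 if det > 0 else -1), "float"
    # The filter failed to certify the sign; use exact arithmetic.
    return orient2d_exact(a, b, c), "exact"


ccw = orient2d_filtered((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
collinear = orient2d_filtered((0.1, 0.1), (0.2, 0.2), (0.3, 0.3))
```

In the first call the floating-point result is far from zero and the filter accepts it; in the second the three points lie exactly on the line y = x, the filter cannot certify a sign, and the exact path reports zero.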


Applications 


The EGC approach has led to the development 
of libraries, such as LEDA Real and CORE, 
that provide EGC number types, i.e., a class of
expressions whose signs are guaranteed. CGAL, 
another major EGC Library that provides robust 
implementation of algorithms in computational 
geometry, offers various specialized EGC num- 
ber types, but for general algebraic numbers it can 
also use LEDA Real or CORE. 


Open Problems 


1. An important challenge from the perspective 
of efficiency for EGC approach is high degree 
algebraic computation, such as those found 
in Computer Aided Design. These issues are 
beginning to be addressed, for instance [1]. 

2. The fundamental problem of EGC is the zero 
problem: given any set of real algebraic op- 
erators, decide whether any expression over 
this set is zero or not. The main focus here 
is on the decidability of the zero problem for 
non-algebraic expressions. The importance of 
this problem has been highlighted by Richard- 
son [12]; recently some progress has been 
made for special non-algebraic problems [3]. 

3. When algorithms in EGC approach are em- 
bedded in larger application systems (such as 
mesh generation systems), the output of one 
algorithm needs to be cascaded as input to 
another; the output of such algorithms may be 
in high precision, so it is desirable to reduce 
the precision in the cascade. The geometric 
version of this problem is called the geometric 
rounding problem: given a consistent geo- 
metric object in high precision, “round” it 
to a consistent geometric object at a lower 
precision. 

4. Recently a computational model for the 
EGC approach has been proposed [14]. The 
corresponding complexity model needs to 
be developed. Standard complexity analysis
based on input size is inadequate for
evaluating the complexity of real computation; 
the complexity should be expressed in terms 
of the output precision. 


URL to Code 


Core Library: http://www.cs.nyu.edu/exact

LEDA: http://www.mpi-sb.mpg.de/LEDA 
CGAL: http://www.cgal.org


Problem Definition


In the classic online scheduling model, jobs arrive 
one after another. At the arrival of a new job, 
the scheduler must immediately and irrevocably 
assign it to a machine. In the parallel machine 
case, we have m identical machines to process 
the jobs. Each job j has a processing time $p_j$
that is revealed at the moment of its appearance.
The load of a machine is the sum of processing 
times of jobs assigned to it. The objective is to 
minimize the makespan, that is, the maximum 
machine load. 

The fact that decisions are irrevocable imposes 
a hard constraint on the scheduler. However, 
many applications allow some amount of flexibil- 
ity. Robust scheduling algorithms take this flexi- 
bility into account: whenever a job arrives, some 
reassignment of jobs can be performed. More
precisely, given a parameter $\beta > 0$, the arrival of
job j allows the scheduler to migrate a set of jobs with a total
processing time of at most $\beta \cdot p_j$. The factor $\beta$ is
called the migration factor of the algorithm and
it is a measure of its robustness. In this context,
the quality of solutions is assessed by competitive
analysis: an algorithm is $\alpha$-competitive if for
any sequence of job arrivals the makespan of
the algorithm is at most $\alpha$ times the (offline)
optimum cost for the set of available jobs. An
important goal in this area is to understand the 
trade-off between the migration and competitive 
factors. 


Key Results 


Greedy Approaches 

In a setting where no migration is allowed, i.e., if
$\beta = 0$, a competitive ratio of $2 - 1/m$ is achievable
by a greedy list-scheduling algorithm [7].
Although more sophisticated algorithms have
smaller competitive ratios (see, e.g., [6]), no
algorithm can achieve a performance guarantee
smaller than $e/(e - 1) \approx 1.58$ [1, 11], even if
randomization is allowed.

Sanders et al. [10] give algorithms with im-
proved competitive ratios for small values of $\beta$.
A simplified version of their most basic algorithm
is as follows. Let j be an arriving job and denote
by OPT the minimum makespan of the instance
including j.


1. If $p_j \le \mathrm{OPT}/2$, assign job j to the machine
with the smallest load.

2. Otherwise, consider a machine i in which
all jobs are of size at most OPT/2. Greedily
remove jobs from i until their total processing
time is at least $p_j$. Add job j to i and greedily
reassign each removed job to the least loaded
machine.


The existence of the claimed machine in Step 
(2) follows since there can be at most m jobs of 
size larger than OPT/2 in the instance. The proof 
that the algorithm is 3/2-competitive is a simple 
exercise that follows since each greedily assigned 
job has size at most OPT/2. By construction the 
algorithm has migration factor 2. The fact that 
the algorithm needs the value OPT as input can 
be avoided by trying out a handful of different 
solutions. 
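
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of Sanders et al. [10]: the function name `insert_job` and the list-of-lists schedule representation are hypothetical, and the value OPT is assumed to be handed in by the caller instead of being guessed. A job of size exactly OPT/2 is treated as small.

```python
def insert_job(machines, p_j, opt):
    """Insert a job of size p_j into the schedule `machines` (modified in place).

    machines: list of lists of processing times, one inner list per machine.
    opt:      ASSUMED known optimal makespan of the instance including the job.
    Returns the total processing time of the jobs that were migrated.
    """
    def least_loaded():
        return min(range(len(machines)), key=lambda i: sum(machines[i]))

    # Step 1: small jobs go greedily to the least loaded machine.
    if p_j <= opt / 2:
        machines[least_loaded()].append(p_j)
        return 0

    # Step 2: pick a machine that holds only small jobs. One exists because
    # at most m jobs, including the new one, can exceed opt / 2.
    i = next(idx for idx, jobs in enumerate(machines)
             if all(p <= opt / 2 for p in jobs))

    removed = []
    while machines[i] and sum(removed) < p_j:
        removed.append(machines[i].pop())

    machines[i].append(p_j)
    for p in removed:                       # greedy reassignment
        machines[least_loaded()].append(p)
    return sum(removed)


schedule = [[2, 2], [3]]
migrated = insert_job(schedule, 4, opt=6)   # jobs {2, 2, 3, 4}, OPT = 6
```

Each removed job has size at most OPT/2 < $p_j$, so the removed volume stays below $2 p_j$, which matches the migration factor of 2 stated above.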

With this simple approach the competitive 
guarantee is already below the lower bound of 
1.58 for $\beta = 0$. Sanders et al. [10] show that
a refinement of this algorithm gives the same
competitive guarantee and reduces $\beta$ to 4/3.
They also provide more sophisticated algorithms 
with smaller competitive factors, for example, a 
4/3-competitive algorithm with migration factor 
5/2. 


Robust Approximation Schemes 

The algorithms above show that already small 
migration factors can help to significantly im- 
prove the quality of solutions. However, they tell 
little about the trade-off between the competitive 
and migration ratios. Sanders et al. [10] study 
this trade-off by giving a robust polynomial time 
approximation scheme (robust PTAS), that is, a 
family of algorithms $\{A_\varepsilon\}_{\varepsilon > 0}$ such that for any
constant $\varepsilon > 0$ the algorithm $A_\varepsilon$ is $(1 + \varepsilon)$-competitive
and uses a migration factor of $\beta(\varepsilon)$.
We remark that $\beta(\varepsilon)$ is a constant that depends only
on $\varepsilon$ and not on the specific input data.

The robust PTAS borrows ideas from the
known PTAS for the offline problem [8]. At
the arrival of a job j, the algorithm takes the
given $(1 + \varepsilon)$-approximate solution and updates
it to a schedule with the same approximation
guarantee. The algorithm behaves differently
depending on the size of j. If $p_j$ is in $O(\varepsilon \mathrm{OPT})$,
where OPT denotes the optimal makespan for
the current instance including j, then we can
safely assign this job to the least loaded machine
and maintain the approximation guarantee.
Otherwise, the processing times are rounded
to simplify the instance and add symmetry to the
solution. The corresponding offline minimum
makespan problem for this instance can be
posed as an integer program (IP) of the form
$\min\{c^T x : A x = b,\ x \in \mathbb{N}^d\}$. A component $x_C$
of x corresponds to the number of machines
with a given configuration C, where each
configuration is a compact description of a
one-machine schedule. Crucially, the number
of different configurations, that is, the number
of variables d in the IP, is a constant $2^{\mathrm{poly}(1/\varepsilon)}$.
Moreover, the complete instance is encoded in
the right-hand side b. After a new job arrives, the
corresponding IP can be updated by increasing
one coordinate of b by one, obtaining a new
vector b'. A sensitivity analysis result by Cook
et al. [2] implies that for any optimal solution
x of the IP, there exists an optimal solution x'
with the right-hand side changed to b' such that
$\|x - x'\|_1 \le d^2 \cdot \Delta(A) \cdot (\|b - b'\|_\infty + 2)$. Here
$\Delta(A)$ is the maximum $|\det(B)|$ over all square
submatrices B of A, which in this case can be
bounded by $2^{\mathrm{poly}(1/\varepsilon)}$. Therefore, the number of
machines that need to be modified in order to go
from schedule x to x' is $\|x - x'\|_1 \le 2^{\mathrm{poly}(1/\varepsilon)}$.
Since we are assuming that the new job has
processing time in $\Omega(\varepsilon \mathrm{OPT})$, and each machine
has a load of O(OPT), we obtain an algorithm
with migration factor $2^{\mathrm{poly}(1/\varepsilon)}$.


Theorem 1 (Sanders et al. [10]) The problem
of minimum makespan on identical machines ad-
mits a robust PTAS with migration factor $\beta = 2^{\mathrm{poly}(1/\varepsilon)}$.
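
To make the quantities in the sensitivity bound concrete, the following Python sketch computes $\Delta(A)$ by brute force for a tiny integer matrix and evaluates the bound, read here as $d^2 \cdot \Delta(A) \cdot (\|b - b'\|_\infty + 2)$. The matrix is a made-up toy example rather than a configuration matrix from [10], the function names are hypothetical, and the exhaustive enumeration is only feasible for very small inputs.

```python
from itertools import combinations


def det(M):
    """Determinant of a small square integer matrix by Laplace expansion."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def max_subdeterminant(A):
    """Delta(A): the largest |det(B)| over all square submatrices B of A."""
    rows, cols = len(A), len(A[0])
    best = 0
    for size in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                sub = [[A[i][j] for j in c] for i in r]
                best = max(best, abs(det(sub)))
    return best


def sensitivity_bound(A, b_change_inf_norm):
    """Evaluate d^2 * Delta(A) * (||b - b'||_inf + 2), d = number of columns."""
    d = len(A[0])
    return d * d * max_subdeterminant(A) * (b_change_inf_norm + 2)


A = [[1, 1, 0],
     [0, 1, 2]]                  # toy matrix, 3 variables (hypothetical)
delta = max_subdeterminant(A)
bound = sensitivity_bound(A, 1)  # one coordinate of b increased by one
```

For this toy matrix the bound is a small constant that does not depend on the right-hand side b, which is the property the robust PTAS exploits.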


Applications 


The basic technique for constructing a robust
PTAS has been adapted to different related prob-
lems. Most results are based on the sensitivity
analysis result mentioned above, but differ in
other parts of the algorithm and analysis. In par-
ticular, robust PTASs have been developed for bin
packing [3] and cube packing [4]. Other objective
functions for identical machine scheduling have
also been considered, for example, minimizing
the $\ell_p$-norm of the vector of loads or maximizing
the minimum machine load. These problems do
not admit robust PTASs; however, it is possi-
ble to design such algorithms for an amortized
analogue of the migration factor [12]. Epstein
and Levin [5] consider preemptive scheduling
problems on parallel machines, obtaining a
1-competitive algorithm with migration factor $1 - 1/m$
for the minimum makespan and minimum
$\ell_p$-norm objectives. As opposed to the previous
results, this algorithm does not rely on sensitivity
analysis results for IPs.

The bin packing problem was considered by 
Jansen and Klein [9]. They improve the result
in [3] to a robust PTAS with $\beta = \mathrm{poly}(1/\varepsilon)$.
To obtain this migration factor, they develop new 
sensitivity analysis results aimed specifically at 
approximation algorithms. 


Open Problems 


An interesting question is to determine the pre- 
cise trade-off between the competitive and migra- 
tion factors for the minimum makespan problem 
on identical machines. In particular, determine if
the migration factor can be made to depend polynomi-
ally on $1/\varepsilon$ for a $(1 + \varepsilon)$-competitive algorithm.
Another natural question is to extend these results
to related machines. In this setting, each machine
i runs at a speed $s_i$, and thus the time it takes to
process job j on machine i is $p_j / s_i$.

Other natural objective functions on parallel 
machine scheduling are not fully understood. The 
machine covering version, where we seek to max- 
imize the minimum machine load, does not admit 
a robust PTAS. More specifically, a competitive 
ratio smaller than 20/19 is not possible with 
constant migration factor [12]. It is open if this 
competitive ratio is indeed achievable or if the 
lower bound can be improved. A similar situation 
holds for minimizing the $\ell_p$-norm for any $p > 1$ [4].


The abstract tile assembly model (aTAM),
originally proposed by Winfree [1], provides 
a useful framework to study algorithmic tile self- 
assembly. As described in other sections, many 
theoretical studies have shown the efficiency and 
computational power of aTAM. 

The aTAM, although widely accepted and 
experimentally verified, is an overly simplified 
combinatorial model in describing the self- 
assembly of DNA tiles. In reality, several effects 
are observed which lead to a loss of robustness 
compared to the aTAM. The assembly tends 
to be reversible, i.e., tiles can fall off from an 
existing assembly, even when the total binding 
strength exceeds the temperature threshold $\tau$.
Also, tiles sometimes attach with a weak strength 
but then quickly get incorporated and locked 
into a growing assembly, much like defects in a 
crystal. However, for sophisticated combinatorial 
assemblies like counters, which form the basis 
for controlling the size of a structure, a single 
error can lead to assemblies drastically larger 
or smaller (or different in other ways) than 
the intended structure. Error rates of 0.5% to
10% have been observed in previous experimental
studies.


kTAM

A more sophisticated and accurate stochastic
model called the kinetic tile assembly model
(kTAM) was introduced by Winfree [1]. The
kTAM calculates rates for various types of at- 
tachments and removals based on thermodynamic 
constants. It has the following assumptions: 


1. Tile concentrations are held constant through- 
out the self-assembly process. 

2. Supertiles do not interact with each other. The 
only two reactions allowed are addition of a
tile to a supertile and the dissociation of a tile
from a supertile. 

3. The forward rate constants for all tiles only 
depend on concentrations. 

4. The reverse rate depends exponentially on the 
number of base-pair bonds which must be 
broken, and the mismatched sticky ends make 
no base-pair bonds. 


There are two free parameters in this model,
both of which are dimensionless free energies:
$G_{mc} > 0$ measures the entropic cost of putting a
tile at a binding site and depends on the tile con-
centration, and $G_{se} > 0$ measures the free energy
cost of breaking a single strength-1 bond. Under
this model, we can approximate the forward and
reverse rates for each of the tile-supertile reac-
tions in the process of self-assembly of DNA tiles
as follows:

The rate of addition of a tile to a supertile,
$f$, is $p e^{-G_{mc}}$. The rate of dissociation of a tile
from a supertile, $r_b$, is $p e^{-b G_{se}}$, where b is the
strength with which the tile is attached to the
supertile. The parameter p simply gives us the
time scale for the self-assembly. Winfree showed
that by setting appropriate tile concentrations and
binding strengths such that $G_{mc} = 2G_{se} - \varepsilon$,
the behavior predicted by kTAM approaches the
behavior described by aTAM as $\varepsilon \to 0$. However,
the growth speed also goes to 0 (attachment and
dissociation form an unbiased random walk) as
$\varepsilon \to 0$.
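
The two rate formulas can be evaluated numerically with a short Python sketch. The function names and the sample values of the free energies are assumptions chosen for illustration only; the time-scale parameter p is set to 1.

```python
import math


def forward_rate(g_mc: float, p: float = 1.0) -> float:
    """Rate f at which a tile is added to a supertile: p * exp(-G_mc)."""
    return p * math.exp(-g_mc)


def reverse_rate(b: int, g_se: float, p: float = 1.0) -> float:
    """Rate r_b at which a tile bound with strength b dissociates."""
    return p * math.exp(-b * g_se)


g_se = 10.0                 # hypothetical sample value
eps = 0.1                   # small bias towards growth
g_mc = 2 * g_se - eps       # the regime G_mc = 2 * G_se - eps

f = forward_rate(g_mc)
r1 = reverse_rate(1, g_se)  # tile held by a single strength-1 bond
r2 = reverse_rate(2, g_se)  # tile held with total strength 2
```

With these values a tile attached by strength 2 is added slightly faster than it falls off (the ratio f / r2 equals $e^{\varepsilon}$), while a tile held by a single bond dissociates much faster than new tiles arrive. As $\varepsilon$ shrinks, f / r2 tends to 1, which reflects the loss of growth speed noted above.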


Problem Definition 


Self-assembly processes in nature are often 
equipped with explicit mechanisms for both error 
prevention and error correction. For artificial 
self-assembly, these problems are even more 
important since we are interested in assembling 
large systems with great precision. Previously, a 
phenomenon called insufficient attachments has 
been identified to be a main source of error [3]. 
An insufficient attachment is the process in which
a tile t first attaches with total strength less than the
temperature. However, before t falls off, adjacent
tiles attach and secure t in place. An insufficient
attachment happening at a location at which
another correct tile can attach via cooperative
bindings is called a growth error. One example
of growth error caused by insufficient attachment
is illustrated in Fig. 1. An insufficient attachment
happening at a location at which no tiles can
attach according to aTAM is called a facet
error. The rate at which any specific insufficient
attachment happens is $e^{-G_{mc}} / (e^{-G_{mc}} + e^{-G_{se}})$,
which is roughly $e^{-G_{se}}$ times the rate of tile
attachments when $G_{mc} \approx 2G_{se}$. The main goal
of this section is to introduce error-correction
systems that deal with insufficient attachments.

Robustness in Self-Assembly, Fig. 1 An example of growth error caused by an insufficient attachment. The red tile
first attaches with a weaker strength. Then the yellow tile attaches and secures the red tile in place

Robustness in Self-Assembly, Fig. 2 (a) An example of a 2 × 2 proofreading block. (b) An example of a 2 × 2
snaked proofreading block


Key Results 


Proofreading Tilesets 

The first error-correction scheme for the tile as-
sembly model was the proofreading scheme pro-
posed by Winfree and Bekbolatov [2]. The proof-
reading scheme turns any tile system with unidi-
rectional growth into a new system that produces
the same pattern (with scaling). The scheme re-
places each tile type t by $k^2$ distinct tile types.
These $k^2$ tile types are designed to form a $k \times k$
block. All internal glues have strength 1 and are
unique to this block. The glues on the boundary
of the block are duplicates of glues on the original
tile t. A 2 × 2 block is illustrated in Fig. 2a. This
construction enforces that whenever a growth
error happens, at least k growth errors must
happen locally in order for the assembly process
to proceed. Thus when a growth error happens, the
erroneous tiles are more likely to detach than to
stay and wait for another k − 1 growth errors to
happen. Since the probability that a growth error
happens at any given location is roughly $e^{-G_{se}}$,
the probability that a growth error happens at any
proofreading block and proceeds to produce the
final incorrect assembly is $O(e^{-k G_{se}})$.
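
The effect of the block size k on the error probability can be illustrated numerically. The Python sketch below is a rough estimate under assumptions: the per-location error probability is taken to be f / (f + r_1) in the kTAM with the time scale set to 1, the block-level figure simply drops the constant hidden in the O-notation, and the function names and sample energies are hypothetical.

```python
import math


def lock_in_probability(g_mc: float, g_se: float) -> float:
    """Chance that a neighbour attaches (rate f) before a tile held by one
    strength-1 bond falls off (rate r_1): f / (f + r_1)."""
    f = math.exp(-g_mc)
    r1 = math.exp(-g_se)
    return f / (f + r1)


def block_error_estimate(g_se: float, k: int) -> float:
    """Order-of-magnitude estimate exp(-k * G_se) for a k x k block."""
    return math.exp(-k * g_se)


g_se = 10.0                                      # hypothetical sample value
per_site = lock_in_probability(2 * g_se, g_se)   # regime G_mc = 2 * G_se
estimates = {k: block_error_estimate(g_se, k) for k in (1, 2, 3)}
```

For these sample values the per-location probability is close to $e^{-G_{se}}$, and each increment of k lowers the estimate by a further factor of $e^{-G_{se}}$, at the price of a $k^2$-fold increase in the number of tile types.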


Snaked Proofreading Tilesets 
The abovementioned proofreading system only
handles growth errors but not facet errors, which
are a major source of error both in theoreti-
cal analysis and computer simulations [3]. Chen
and Goel [3] proposed the snaked proofreading
scheme, which handles all errors caused by insuf-
ficient attachments. Similar to the proofreading
scheme, the snaked proofreading scheme replaces
each tile type by a block of $2k \times 2k$ tile types.
The only difference is that some internal glues
have strength 0 or 2 instead of 1. A 2 × 2 block
is illustrated in Fig. 2b. Under correct growth,
the snaked proofreading block checks its two
input sides alternately. Therefore, k insufficient
attachments must happen before an erroneous
block "thinks" it has attached correctly and propa-
gates in any single direction. As a result, the
snaked proofreading scheme with block size $2k \times 2k$
ensures that without k insufficient attachments
happening locally, all erroneous structures have
$O(k^2)$ tiles and are expected to fall off in time
polynomial in k. Assuming that the thermody-
namic parameters $G_{mc}$ and $G_{se}$ can be set arbi-
trarily, the following theorem characterizes the
performance of the snaked proofreading system.


Theorem 1 With a $2k \times 2k$ snaked tile system,
$k = O(\log n)$, an $n \times n$ square of blocks can be
assembled in expected time O(n) and with high
probability, while remaining stable for $\Omega(n)$ time
after being assembled.


One main drawback for the proofreading and 
the snaked proofreading scheme is the resolution 
loss. Since each tile in the original system is 
replaced by a $k \times k$ block, the size of the original
pattern is increased by a factor of k. Chen, Goel, 
and Luhrs [4] showed that if the third dimension 
can be used, two-dimensional tile systems can be 
proofread with no resolution loss. Their system 
replaces each tile by a column in the third dimen- 
sion and thus maintains the original scale on the 
two-dimensional plane.


Problem Definition


One of the most often used techniques in modern 
computer networks is routing. Routing means 
selecting paths in a network along which to 
send data. Demands usually randomly appear on 
the nodes of a network, and routing algorithms 
should be able to send data to their destination. 
The transfer is done through intermediate nodes, 
using the connecting links, based on the topology 
of the network. The user waits for the network to 
guarantee that it has the required capacity during 
data transfer, meaning that the network behaves 
like its nodes would be connected directly by 
a physical line. Such service is usually called 
the permanent virtual circuit (PVC) service. To 
model real-life situations, assume that demands
arrive online, given by source and destination
points, and capacity (bandwidth) requirements.

Similar routing problems may occur in other
environments, for example in parallel computa- 
tion. In this case there are several processors 
connected together by wires. During an opera- 
tion some data appear at given processors which 
should be sent to specific destinations. Thus, this 
also defines a routing problem. However, this 
paper mainly considers the network model, not 
the parallel computer one. 

For any given situation there are several rout- 
ing possibilities. A natural question is to ask 
which is the best possible algorithm. To find 
the best algorithm one must define an objective 
function, which expresses the effectiveness of the 
algorithm. For example, the aim may be to reduce 
the load of the network. Load can be measured 
in different ways, but to measure the utilization 
percent of the nodes or the links of the network 
is the most natural. In the online setting, it is 
interesting to compare the behavior of a routing 
algorithm designed for a specific instance to the 
best possible routing. 

There are two fundamental approaches to- 
wards routing algorithms. The first approach is 
to route adaptively, i.e., depending on the actual 
loads of the nodes or the links. The second 
approach is to route obliviously, without using
any information about the current state of the 
network. Here the authors survey only results on 
oblivious routing algorithms. 


Notations and Definitions 
A mathematical model of the network routing 
problem is now presented. 

Let G(V, E, c) be a capacitated network,
where V is the set of nodes and E is the set
of edges with a capacity function $c : E \to \mathbb{R}^+$.
Let |V| = n, |E| = m. It can be assumed that
G is directed, because if G is undirected then
for each undirected edge e = (u, v) two new
nodes x, y and four new directed edges $e_1 =
(u, x)$, $e_2 = (v, x)$, $e_3 = (y, u)$, $e_4 = (y, v)$ with
infinite capacity may be added to the graph. If e
is then replaced by a directed edge from x to y with the same
capacity, a directed network equivalent to the
original one is obtained.
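
The reduction from undirected to directed networks can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name `to_directed`, the dictionary representation, and the labels of the auxiliary nodes are hypothetical, and the original edge is assumed to be kept as a single directed edge from x to y that carries the original capacity.

```python
import math


def to_directed(undirected_edges):
    """Convert {(u, v): capacity} into a directed network {(tail, head): capacity}.

    Every undirected edge gets two fresh nodes x and y, four uncapacitated
    helper edges, and one capacitated edge x -> y.
    """
    directed = {}
    for idx, ((u, v), cap) in enumerate(undirected_edges.items()):
        x, y = ("x", idx), ("y", idx)    # fresh auxiliary nodes
        directed[(u, x)] = math.inf      # enter the gadget from either endpoint
        directed[(v, x)] = math.inf
        directed[(y, u)] = math.inf      # leave the gadget towards either endpoint
        directed[(y, v)] = math.inf
        directed[(x, y)] = cap           # shared capacity of the original edge
    return directed


network = to_directed({("a", "b"): 3})
```

Traffic from a to b and from b to a both pass through the single edge from x to y, so the two directions share the original capacity, as they do on an undirected edge.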
Definition 1 A set of functions $f := \{ f_{ij} \mid i, j \in V,\ f_{ij} : E(G) \to \mathbb{R}^+ \}$ is called a multi-
commodity flow if

$$\sum_{e \in E_k^+} f_{ij}(e) = \sum_{e \in E_k^-} f_{ij}(e)$$

holds for all $k \ne i$, $k \ne j$, where $k \in V$ and
$E_k^+$, $E_k^-$ are the sets of edges coming out from k
and coming into k, respectively. Each function $f_{ij}$ defines
a single-commodity flow from i to j.


Definition 2 The value of a multi-commodity
flow is an $n \times n$ matrix $T_f = (t^f_{ij})$, where

$$t^f_{ij} = \sum_{e \in E_i^+} f_{ij}(e) - \sum_{e \in E_i^-} f_{ij}(e)$$

if $i \ne j$, and $t^f_{ii} = 0$, for all $i, j \in V$.


Definition 3 Let D be a nonnegative $n \times n$ ma-
trix where the diagonal entries are 0. D is called
a demand matrix. The flow on an edge $e \in E$
routing the demand matrix D by routing r is
defined by

$$\mathrm{flow}(e, r, D) = \sum_{i, j \in V} d_{ij}\, r_{ij}(e),$$

while the edge congestion is

$$\mathrm{con}(e, r, D) = \frac{\mathrm{flow}(e, r, D)}{c(e)}.$$

The congestion of demand D using routing r is

$$\mathrm{con}(r, D) = \max_{e \in E} \mathrm{con}(e, r, D).$$
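
The quantities in Definition 3 can be computed directly, as the following Python sketch shows. The dictionary-based representation and the function name `congestion` are assumptions made for illustration; a routing is given as the fraction of each unit of demand that crosses each edge.

```python
def congestion(capacity, routing, demand):
    """Return (con(r, D), per-edge congestion) for a routing and a demand.

    capacity: {edge: c(e)}
    routing:  {(i, j): {edge: r_ij(e)}}  fraction of i -> j traffic on each edge
    demand:   {(i, j): d_ij}
    """
    flow = {e: 0.0 for e in capacity}
    for pair, d in demand.items():
        for e, fraction in routing[pair].items():
            flow[e] += d * fraction          # flow(e, r, D)
    edge_con = {e: flow[e] / capacity[e] for e in capacity}
    return max(edge_con.values()), edge_con


# Toy network: a -> b directly, or a -> c -> b, all with capacity 1.
capacity = {("a", "b"): 1.0, ("a", "c"): 1.0, ("c", "b"): 1.0}
# The routing splits each unit of a -> b demand evenly over both paths.
routing = {("a", "b"): {("a", "b"): 0.5, ("a", "c"): 0.5, ("c", "b"): 0.5}}
demand = {("a", "b"): 2.0}

worst, per_edge = congestion(capacity, routing, demand)
```

In this example two units of demand are split evenly over two paths, so every edge carries one unit and the congestion is 1. Routing everything on the direct edge would double it.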


Definition 4 A multi-commodity flow r is called a
routing if $t^r_{ij} = 1$ whenever $i \ne j$, for all $i, j \in V$.


Routing represents a way of sending information 
over a network. The real load of the edges can be 
represented by scaling the edge congestions with 
the demands. 


Definition 5 The oblivious performance ratio
$P_r$ of routing r is

$$P_r = \sup_D \left\{ \frac{\mathrm{con}(r, D)}{\mathrm{opt}(D)} \right\}$$

where opt(D) is the optimal congestion which can
be achieved on D. The optimal oblivious routing
ratio for a network G is denoted by opt(G), where

$$\mathrm{opt}(G) = \min_r P_r$$


Problem 

Input: A capacitated network G(V, E,c). 
Output: An oblivious routing r, where P,. is mini- 
mal. 


Key Results 


Theorem 1 There is a polynomial time algo- 
rithm that for any input network G (directed or 
undirected) finds the optimal oblivious routing 
ratio and the corresponding routing r. 


Theorem 2 There is a directed graph G of n
vertices such that opt(G) is at least $\Omega(\sqrt{n})$.


Applications 


Most importantly, with these results one can ef- 
ficiently calculate the best routing strategy for 
a network topology with capacity constraints. 
This is a good tool for network planning. The 
effectiveness of a given topology can be tested 
without any knowledge of the network traffic
using this analysis. 

Many researchers have investigated the 
variants of routing problems. For surveys on 
the most important models and results, see [10] 
and [11]. Oblivious routing algorithms were 
first analyzed by Valiant and Brebner [15]. 
Here, they considered the parallel computer 
model and investigated specific architectures, 
like hypercube, square grids, etc. Borodin and 
Hopcroft investigated general networks [6]. 
They showed that such simple deterministic
strategies like oblivious routing cannot be very
efficient for online routing and proved a lower
bound on the competitive ratio of oblivious
algorithms. This lower bound was later improved
by Kaklamanis et al. [9], and they also gave an
optimal oblivious deterministic algorithm for the
hypercube.

In 2002, Räcke constructed a polylog-competitive
randomized algorithm for general undi-
rected networks. More precisely, he proved that
for any demand there is a routing such that the
maximum edge congestion is at most polylog(n)
times the optimal congestion for this demand
[12]. The work of Azar et al. extends this re-
sult by giving a polynomial-time method for calcu-
lating the optimal oblivious routing for a net-
work. They also prove that for directed net-
works no logarithmic oblivious performance ra-
tio exists. Recently, Hajiaghayi et al. presented an
oblivious routing algorithm which is $O(\log^2 n)$-competitive
with high probability in directed net-
works [8].

A special online model has been investigated
in [5], where the authors define the so-called
"repeated game" setting, where the algorithm is
allowed to choose a new routing each day. This
means that it is oblivious to the demands that
will occur the next day. They present a $(1 + \varepsilon)$-competitive
algorithm for this model.

There are better algorithms for the adaptive
case, for example in [2]. For the offline case,
Raghavan and Thompson gave an efficient algo-
rithm in [13].


Open Problems 


The authors investigated edge congestion in this 
paper, but in practice, node congestion may be 
interesting as well. Node congestion means the 
ratio of the total traffic traversing a node to 
its capacity. Some results can be found for this 
problem in [7] and in [3]. It is an open problem 
whether this method used for edge congestion 
analysis can be applied for such a model. Another 
interesting open question may be whether there is
a more efficient algorithm to compute the optimal
oblivious performance ratio of a network [1, 14].


Experimental Results


The authors applied their method on ISP network 
topologies and found that the calculated optimal 
oblivious ratios are surprisingly low, between 1.4 
and 2. Other research dealing with this question 
found similar results [1, 14].


Problem Definition


Wireless networks are often modelled using ge- 
ometric graphs. Using only local geometric in- 
formation to compute a sequence of distributed 
forwarding decisions that send a message to its 
destination, routing algorithms can succeed on 
several common classes of geometric graphs. 
These graphs’ geometric properties provide navi- 
gational cues that allow routing to succeed using 
only limited local information at each node. 


Network Model 
A common geometric graph model for wireless
networks is to represent each node by a point
in the Euclidean plane, R², and to add an edge
(u, v) for each pair of nodes that can communicate
by direct wireless transmission. The absence
of the edge (u, v) signifies that u cannot transmit
directly to v, requiring a multi-hop transmission
via a sequence of intermediate nodes that forms
a route from u to v. The cost c(e) of sending
a message over an edge e = (u, v) has been
modeled in different ways; the most common
measures include the hop (link) metric (c(e) = 1),
the Euclidean metric (c(e) = |e|, where |e| =
dist(u, v) is the Euclidean length of the edge e),
and the energy metric (c(e) = |e|^α for α > 2).
In some models, transmission is assumed to
be uniform in all directions and of equal range,
say r, for all nodes. Under this assumption,
the undirected edge (u, v) exists if and only if
dist(u, v) < r. Thus, for each node v there is
an edge from v to every node u that lies within
a disk of radius r centered at v. This is the
unit disk graph model for wireless networks.
Common classes of geometric graphs that are
used to model wireless networks include:


Unit Disk Graph. Vertices are points in R²
and each edge (u, v) exists if and only if
dist(u, v) < r, for a given fixed r > 0.

Plane Graph. Vertices are points in R² and no
two edges cross.

Triangulation. Vertices are points in R² and
every interior face is a triangle.

Quasi-unit Disk Graph. Vertices are points in
R² and each edge (u, v) exists if dist(u, v) ≤ r_1,
may exist if r_1 < dist(u, v) ≤ r_2, but does
not exist if dist(u, v) > r_2, for given fixed
r_2 > r_1 > 0.

Unit Ball Graph. Vertices are points in R³
and each edge (u, v) exists if and only if
dist(u, v) < r, for a given fixed r > 0.

Gabriel Graph. Vertices are points in R² and
each edge (u, v) exists if and only if the disk
with diameter (u, v) does not contain any other
vertices.
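
As an illustration of two of the definitions above, the following minimal sketch builds the unit disk graph and the Gabriel graph of a small point set by brute force. The function names, the sample points, and the choice to treat a point on the boundary of the diameter disk as not contained are assumptions made for this example only.

```python
import math
from itertools import combinations

def unit_disk_edges(points, r):
    """Edges (i, j) whose endpoints are at Euclidean distance less than r."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) < r}

def gabriel_edges(points):
    """Edges (i, j) whose diameter disk contains no other point."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        center = ((points[i][0] + points[j][0]) / 2,
                  (points[i][1] + points[j][1]) / 2)
        radius = math.dist(points[i], points[j]) / 2
        # Assumption: a point exactly on the circle does not block the edge.
        if all(math.dist(center, points[k]) > radius
               for k in range(len(points)) if k not in (i, j)):
            edges.add((i, j))
    return edges

points = [(0, 0), (1, 0), (0.5, 0.2), (3, 0)]
udg = unit_disk_edges(points, r=1.1)
gg = gabriel_edges(points)
planar_subgraph = udg & gg  # the intersection used later for face routing
```

On this point set the edge (0, 1) belongs to the unit disk graph but not to the Gabriel graph, because point 2 lies inside its diameter disk.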


Other classes of geometric graphs used to 
model wireless networks include relative neigh- 
borhood graphs, Delaunay triangulations, Yao 
graphs, convex subdivisions, monotone subdivi- 
sions, edge-augmented plane graphs, and physi- 
cally based models such as SINR. 

A geometric graph G is civilized with λ-precision
if for every pair of nodes u and v in G,
dist(u, v) > λ for a given fixed λ > 0, where λ is
independent of n, the number of nodes in G.


Communication Protocol 

In several wireless network protocols, e.g., ad 
hoc or wireless sensor networks, there is no 
fixed infrastructure for routing nor any central 
servers. All nodes act as hosts as well as routers. 
Apart from a node’s immediate neighborhood, 
the topology of the network is unknown, i.e.,
each node is aware of its own location (its (x, y) 
coordinates) as well as the coordinates of its 
neighbors. Nodes must discover and maintain 
routes in a distributed manner without knowledge 
of precomputed routing tables, any particular 
vertex labeling (other than spatial coordinates), 
nor the support of a central server. Additionally, 
some models incorporate constraints for limited 
memory and power. Depending on the particular 
model, a limited amount of information can be 
stored in message headers to assist with rout- 
ing. When a node receives a message, it reads 
the header (possibly modifying the header in- 
formation) before selecting one of its neighbors 
to which to forward the message. A stateless 
algorithm does not modify the header. Network 
nodes have no memory themselves; any dynamic 
state information is stored in the message header.
Furthermore, no precomputed information about
the network is known to the nodes. 


Geometric Routing 

Given the coordinates of a target node t in a
(wireless) geometric network G, a source node 
s in G is tasked with sending a message via a 
multi-hop route through G from s to t. Routing 
proceeds by computing a sequence of distributed 
forwarding decisions, where each node along 
the route selects one of its neighbors to which 
to forward the message. Geometric routing is 
uniform in that all nodes execute the same pro- 
tocol. Each node makes a forwarding decision 
as a function of its coordinates, the coordinates 
of its neighbors, the coordinates of t, and any 
available state bits stored in the message header. 
The number of state bits available is critical to 
guaranteeing delivery in some classes of geomet- 
ric graphs by enabling the route to avoid looping 
and reach t. A node may modify the state bits 
before forwarding the message. In some models, 
this state information corresponds to storing data 
about O(1) nodes, e.g., storing the coordinates of 
O(1) nodes. 

The primary objective is to guarantee mes- 
sage delivery to the target node t. Secondary 
objectives include minimizing the total cost of 
communication (the sum of c(e) for all edges e 
on the route) and minimizing the worst-case or 
average dilation (the ratio of the cost of the route 
followed relative to that of the route of lowest 
cost). These secondary objectives are motivated 
by the need for nodes to conserve power in many 
wireless networking settings. 


Key Results 


Local geometric routing assumes only limited 
control information stored in message headers 
and local information available at each node 
along the route. This locality provides network 
independence that results in natural scalability 
to larger networks and continued functionality 
after arbitrary changes to the network. A routing 
algorithm is said to succeed on a particular class 
of geometric graphs if it guarantees delivery
from any source node s to any target node t on
any graph in the class; otherwise, the algorithm 
fails on that class of graphs. 

Below we summarize key local geometric 
routing algorithms and their properties. 


Greedy Routing. Upon receiving a message, a 
node forwards it to its neighbor closest to the 
target node t. Greedy routing is stateless. This 
strategy succeeds on Delaunay triangulations, but 
fails on more general classes of geometric graphs 
such as non-Delaunay triangulations, convex sub- 
divisions, plane graphs, and unit disk graphs. 


Compass Routing [7]. Upon receiving a mes- 
sage, a node u forwards it to its neighbor v that 
minimizes the angle ∠vut with the target node
t. Compass routing is stateless. This strategy 
succeeds on regular triangulations but fails on 
more general classes of geometric graphs such as 
non-regular triangulations, convex subdivisions, 
plane graphs, and unit disk graphs. 


Greedy-Compass Routing [2]. Upon receiving 
a message, a node u considers its two neighbors 
on either side of the line segment ut (node u’s
compass neighbors) and forwards the message 
to the one closest to t. Greedy-compass routing 
is stateless. This strategy succeeds on all trian- 
gulations but fails on more general classes of 
geometric graphs such as convex subdivisions, 
plane graphs, and unit disk graphs. 
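
The three stateless forwarding rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: nodes are represented by coordinate tuples, the helper names are hypothetical, and a neighbor lying exactly on the line through u and t is arbitrarily assigned to the counter-clockwise side.

```python
import math

def angle_at(u, v, t):
    """Unsigned angle at u between the rays u->v and u->t, in [0, pi]."""
    a = math.atan2(v[1] - u[1], v[0] - u[0])
    b = math.atan2(t[1] - u[1], t[0] - u[0])
    d = abs(a - b)
    return min(d, 2 * math.pi - d)

def side(u, v, t):
    """Positive if v is counter-clockwise of the directed line u->t."""
    return (t[0] - u[0]) * (v[1] - u[1]) - (t[1] - u[1]) * (v[0] - u[0])

def greedy_next(u, neighbors, t):
    """Greedy routing: the neighbor closest to t."""
    return min(neighbors, key=lambda v: math.dist(v, t))

def compass_next(u, neighbors, t):
    """Compass routing: the neighbor v minimizing the angle at u towards t."""
    return min(neighbors, key=lambda v: angle_at(u, v, t))

def greedy_compass_next(u, neighbors, t):
    """Greedy-compass: of the two compass neighbors, the one closest to t."""
    ccw = [v for v in neighbors if side(u, v, t) >= 0]
    cw = [v for v in neighbors if side(u, v, t) < 0]
    candidates = [min(group, key=lambda v: angle_at(u, v, t))
                  for group in (ccw, cw) if group]
    return min(candidates, key=lambda v: math.dist(v, t))

u, t = (0, 0), (10, 0)
neighbors = [(2, 1), (5, -4), (-1, 0)]
choice_greedy = greedy_next(u, neighbors, t)
choice_compass = compass_next(u, neighbors, t)
choice_greedy_compass = greedy_compass_next(u, neighbors, t)
```

In this example greedy routing and compass routing disagree: (5, -4) is closer to t, while (2, 1) forms the smaller angle with the segment ut.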

Bose et al. [2] show that no stateless algorithm 
can succeed on convex subdivisions (including 
plane graphs and unit disk graphs). Therefore, to 
succeed on classes of geometric graphs beyond 
triangulations, local routing algorithms require 
storing one or more state bits in the message 
header or predecessor information, i.e., the co- 
ordinates of the node that last forwarded the 
message. 


One State Bit [4]. Upon receiving a message, a 
node u chooses between forwarding the message 
to its clockwise or counter-clockwise compass 
neighbor, depending on the value of a state bit. 
If the compass neighbor lies opposite the vertical
line through t, the state bit is flipped. This
algorithm uses a single state bit. This strategy
succeeds on all triangulations and convex sub- 
divisions, but fails on more general classes of 
geometric graphs such as plane graphs and unit 
disk graphs. 


Predecessor Awareness and Monotonicity [4]. 
Each node locally identifies its topmost left 
neighbor as its parent and its right neighbors as its 
children. With knowledge of the predecessor, the 
node forwards the message to its (i + 1)st child 
after receiving it from its ith child and eventually
back to its parent after receiving it from its last 
child. The resulting route contains a depth-first 
traversal of a spanning tree of the network. This 
algorithm is stateless, but each node requires 
knowledge of its predecessor, i.e., the coordinates 
of the node that last forwarded the message. 
This strategy succeeds on triangulations, convex 
subdivisions, monotone subdivisions, and edge- 
augmented graphs from these classes but fails on 
more general classes of geometric graphs such as 
non-monotone plane graphs and unit disk graphs. 


Face Routing [1,7]. The message is forwarded 
along the perimeters of faces in the sequence 
of faces that intersect the line segment from the 
source node s to the target node t. This strategy
applies the right-hand principle, in which each 
face in the sequence is traversed in a counter- 
clockwise direction, as if one were walking while 
sliding the right hand along the wall. To avoid 
cycling indefinitely, the algorithm must store the 
coordinates of O(1) nodes that act as progress 
markers. Furthermore, each node requires knowl- 
edge of its predecessor. This strategy succeeds 
on plane graphs, including triangulations, convex 
subdivisions, and Gabriel graphs. The intersec- 
tion of a unit disk graph with the Gabriel graph 
of a set of points is planar and remains connected 
if the original unit disk graph is connected. Fur- 
thermore, this subgraph can be computed locally; 
this property allows face routing to succeed on 
unit disk graphs [1], as well as quasi-unit disk
graphs with bounded ratio r_2/r_1 < 2 and unit
ball graphs contained within slabs of thickness
less than 1/√2 [6]. Although unit disk graphs
are nonplanar in general, the nonplanarity is
localized; face routing fails on more general classes
of nonplanar geometric graphs such as quasi-unit
disk graphs and unit ball graphs [6] and
edge-augmented plane graphs. Face routing can have
dilation Θ(n), where n is the number of network
nodes.


Adaptive Face Routing (AFR) [8]. Adaptive 
face routing is a variant of face routing that 
achieves optimality on civilized unit disk graphs 
and civilized planar graphs with the Gabriel prop- 
erty. Like face routing, O(1) state data are stored 
in the message header and each node requires 
knowledge of its predecessor. The algorithm
attempts to estimate the length c of the shortest
path from s to t by ĉ (starting with ĉ = 2|st|
and doubling it in every consecutive round). In
each round, the face traversal is restricted to the
region formed by the ellipse with the major axis
ĉ centered on st. Each edge is traversed at most
four times, and the dilation achieved is O(c).
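
The doubling schedule and the bounding region can be sketched as follows. This is a minimal illustration under a stated assumption: the ellipse is taken to have foci s and t and major-axis length equal to the current estimate, so a point p lies inside exactly when |sp| + |pt| is at most the estimate. The function names are hypothetical, and the face traversal itself is not shown.

```python
import math

def in_search_region(p, s, t, c_hat):
    """True if p lies in the ellipse with foci s, t and major-axis length c_hat."""
    return math.dist(s, p) + math.dist(p, t) <= c_hat

def estimate_schedule(s, t, rounds):
    """Estimates used in successive rounds: 2|st|, 4|st|, 8|st|, ..."""
    c_hat = 2 * math.dist(s, t)
    schedule = []
    for _ in range(rounds):
        schedule.append(c_hat)
        c_hat *= 2
    return schedule

def first_round_reaching(p, s, t, max_rounds=32):
    """Index of the first round whose region contains p, or None."""
    for i, c_hat in enumerate(estimate_schedule(s, t, max_rounds)):
        if in_search_region(p, s, t, c_hat):
            return i
    return None

s, t = (0, 0), (4, 0)
schedule = estimate_schedule(s, t, 3)
```

Because the estimate doubles each round, a node that any s-t path of length c must visit falls inside the region after O(log c) rounds.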


Geometric Ad-hoc Routing (GOAFR+) [9]. 
Combining methods from greedy routing, face 
routing, and adaptive face routing allows this 
hybrid algorithm to meet the bounds of adaptive 
routing on any unit disk graphs and planar graphs 
with the Gabriel property (not necessarily civi- 
lized). The algorithm first applies greedy routing 
and switches to face routing when the routed 
message enters a local minimum (a dead end), 
before again resuming greedy routing as early as 
possible by applying an early fallback technique. 


General (Non-geometric) Networks 

Is geometry necessary for local routing to suc- 
ceed? Even with knowledge of the predecessor, 
stateless routing algorithms require knowledge 
of the induced subgraph of nodes up to dis- 
tance n/3 away in the worst case [3]. That is, 
stateless routing using only local information is 
impossible. With Θ(log n) state bits, local routing
on arbitrary (not necessarily geometric) graphs 
is possible by deterministically recomputing a 
polynomial-length universal traversal sequence at 
each node along the route, where O(logn) bits 
store an index into the sequence [5].


Open Problems


If a node’s coordinates can be stored using 
O(logn) bits (e.g., if network nodes are 
positioned on an n^c × n^c grid), then face routing
can be applied using O(logn) state bits. It 
remains open whether any local geometric 
routing algorithm can succeed on plane graphs 
using o(logn) state bits. Similarly, it would 
be interesting to characterize broad classes of 
geometric graphs on which local geometric 
routing is possible using O(1) state bits. In 
addition to guaranteeing delivery, bounding 
dilation is of interest. For example, can O(1) 
dilation be guaranteed on convex subdivisions 
using O(1) state bits? Finally, the problem 
of traversing a graph (visiting all nodes) by 
a sequence of local forwarding decisions is 
interesting. Stateless algorithms are impossible 
for any non-Hamiltonian network. How many 
state bits are necessary for a local algorithm to 
traverse a triangulation?


Problem Definition


For a given directed graph G = (V, E) with nonnegative
edge weights, the problem is to compute
a shortest path in G from a source node s to
a target node t for given s and t. Under the
assumption that G does not change and that many
source-target queries have to be answered, it
pays to invest some time in a preprocessing step
that allows for very fast queries. As output, either
a full description of the shortest path or only
its length d(s, t) is expected, depending on the
application.

Dijkstra’s classical algorithm for this prob- 
lem [4] iteratively visits all nodes in the order of 
their distance from the source until the target is 
reached. When dealing with very large graphs, 
this general algorithm gets too slow for many 
applications so that more specific techniques are 
needed that exploit special properties of the par- 
ticular graph. One practically very relevant case 
is routing in road networks where junctions are 
represented by nodes and road segments by edges 
whose weight is determined by some weighting 
of, for example, expected travel time, distance, 
and fuel consumption. Road networks are typically
sparse (i.e., |E| = O(|V|)), almost planar
(i.e., there are only a few overpasses), and hi- 
erarchical (i.e., more or less ‘important’ roads 
can be distinguished). An overview on various 
speedup techniques for this specific problem is 
given in [7]. 
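
For reference, the baseline that the speedup techniques are measured against can be sketched as follows. This is a minimal version of Dijkstra's algorithm with early termination at the target; the adjacency-list representation and the function name are assumptions made for this example.

```python
import heapq

def dijkstra_distance(graph, s, t):
    """Length of a shortest s-t path; graph maps a node to (neighbor, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        if u == t:
            return d  # nodes are settled in order of distance, so d is final
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # t is not reachable from s

road = {
    "a": [("b", 2), ("c", 5)],
    "b": [("c", 1), ("d", 7)],
    "c": [("d", 2)],
    "d": [],
}
length = dijkstra_distance(road, "a", "d")
```

The number of nodes settled before the target is reached is the quantity that the experiments later refer to as the Dijkstra rank.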


Key Results 


Transit-node routing [2, 3] is based on a simple
observation intuitively used by humans: When
you start from a source node s and drive to
somewhere ‘far away’, you will leave your current
location via one of only a few ‘important’
traffic junctions, called (forward) access nodes
A(s). An analogous argument applies to the
target t, i.e., the target is reached from one of
only a few backward access nodes A(t). Moreover,
the union of all forward and backward
access nodes of all nodes, called the transit-node set
T, is rather small. The two observations imply
that for each node the distances to/from its
forward/backward access nodes and for each
transit-node pair (u, v), the distance between u and v
can be stored. For given source and target nodes
s and t, the length of the shortest path that passes
at least one transit node is given by

\[ d_T(s,t) = \min\{\, d(s,u) + d(u,v) + d(v,t) \mid u \in A(s),\ v \in A(t) \,\}. \]


Note that all involved distances d(s, u), d(u, v),
and d(v, t) can be directly looked up in
the precomputed data structures. As a final
ingredient, a locality filter L : V × V →
{true, false} is needed that decides whether
given nodes s and t are too close to travel
via a transit node. L has to fulfill the property
that ¬L(s, t) implies that d(s, t) = d_T(s, t).
Note that in general the converse need not hold
since this might hinder an efficient realization
of the locality filter. Thus, false positives, i.e.,
“L(s, t) ∧ d(s, t) = d_T(s, t)”, may occur.

The following algorithm can be used to compute
d(s, t):

If ¬L(s, t), then compute and return d_T(s, t);

else, use any other routing algorithm.
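
The two-line query above can be sketched as follows. The data layout is an assumption made for this example: access-node distances are stored as dictionaries per node, the transit-node distance table is a dictionary keyed by node pairs, and the locality filter and the fallback algorithm are passed in as callables. None of these names come from the original implementation.

```python
def transit_distance(s, t, access_fwd, access_bwd, table):
    """d_T(s, t): minimum of d(s,u) + d(u,v) + d(v,t) over all access-node pairs."""
    return min(d_su + table[u, v] + d_vt
               for u, d_su in access_fwd[s].items()
               for v, d_vt in access_bwd[t].items())

def query(s, t, is_local, access_fwd, access_bwd, table, fallback):
    """Line 1: table lookups if the filter says 'not local'; line 2: fallback."""
    if not is_local(s, t):
        return transit_distance(s, t, access_fwd, access_bwd, table)
    return fallback(s, t)

# Toy data: two forward access nodes for s, two backward access nodes for t.
access_fwd = {"s": {"u1": 3, "u2": 5}}
access_bwd = {"t": {"v1": 4, "v2": 1}}
table = {("u1", "v1"): 10, ("u1", "v2"): 20,
         ("u2", "v1"): 6, ("u2", "v2"): 8}

far = query("s", "t", lambda a, b: False, access_fwd, access_bwd, table,
            fallback=lambda a, b: None)
near = query("s", "t", lambda a, b: True, access_fwd, access_bwd, table,
             fallback=lambda a, b: 42)
```

With |A(s)| and |A(t)| both small, the minimum is taken over only a handful of table entries, which is why non-local queries need just a few lookups.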


Figure 1 gives an example. Knowing the
length of the shortest path, a complete description
of it can be efficiently derived using iterative
table lookups and precomputed representations
of paths between transit nodes. Provided that the
above observations hold and that the percentage
of false positives is low, the above algorithm
is very efficient since a large fraction of all
queries can be handled in line 1, d_T(s, t)
can be computed using only a few table
lookups, and source and target of the remaining
queries in line 2 are quite close. Indeed, the
remaining queries can be further accelerated by
introducing a secondary layer of transit-node
routing, based on a set of secondary transit
nodes T_2 ⊃ T. Here, it is not necessary to
compute and store a complete T_2 × T_2 distance
table, but it is sufficient to store only distances
{d(u, v) | u, v ∈ T_2 ∧ d(u, v) ≠ d_T(u, v)}, i.e.,
distances that cannot be obtained using the
primary layer. Analogously, further layers can
be added.

There are two different implementations: one
is based on a simple geometric grid and one
on highway hierarchies, the fastest previous
approach [5, 6]. A highway hierarchy consists of
a sequence of levels (Fig. 1), where level i + 1 is
constructed from level i by bypassing low-degree
nodes and removing edges that never appear
far away from the source or target of a shortest
path. Interestingly, these levels are geometrically
decreasing in size and otherwise similar to each
other. The highest level contains the most
‘important’ nodes and becomes the primary transit-node
set. The nodes of lower levels are used to form
the transit-node sets of subordinated layers.

Routing in Road Networks with Transit Nodes, Fig. 1
Finding the optimal travel time between two points (flags)
somewhere between Saarbrücken and Karlsruhe amounts
to retrieving the 2 × 4 access nodes (diamonds), performing
16 table lookups between all pairs of access nodes,
and checking that the two disks defining the locality filter
do not overlap. Transit nodes that do not belong to the
access node sets of the selected source and target nodes
are drawn as small squares. The figure draws the levels
of the highway hierarchy using colors gray, red, blue, and
green for levels 0-1, 2, 3, and 4, respectively


Applications


Apart from the most obvious applications in car
navigation systems and server-based route planning
systems, transit-node routing can be applied
to several other fields, for instance to massive
traffic simulations and to various optimization
problems in logistics.


Open Problems


It is an open question whether one can find better
transit-node sets or a better locality filter so that
the performance can be further improved. It is
also not clear if transit-node routing can be suc- 
cessfully applied to other graph types than road 
networks. In this context, it would be desirable 
to derive some theoretical guarantees that apply 
to any graph that fulfills certain properties. For 
some practical applications, a dynamic version of 
transit-node routing would be required in order 
to deal with time-dependent networks or unex- 
pected edge weight changes caused, for example, 
by traffic jams. The latter scenario can be handled 
by a related approach [8], which is, however, 
considerably slower than transit-node routing. 


Experimental Results 


Experiments were performed on road networks
of Western Europe and the USA using a cost
function that solely takes expected travel time
into account. The results exhibit various trade-offs
between average query time (5-63 µs for
the USA), preprocessing time (59 min to 1,200
min), and storage overhead (21 bytes/node to
244 bytes/node). For the variant that uses three
layers and is tuned for best query times, Tables 1
and 2 show statistics on the preprocessing and
the query performance, respectively. The average
query times of about 5 µs are six orders of magnitude
faster than Dijkstra’s algorithm. In addition,
Fig. 2 gives for each rank r on the x-axis a
distribution for 1,000 queries with random starting
point s and the target node t for which Dijkstra’s
algorithm would need r iterations to find it. The
three layers of transit-node routing with small
transition zones in between can be recognized:
for large ranks, it is sufficient to access only the
primary layer yielding query times of about 5 µs,
for smaller ranks, additional layers have to be
accessed resulting in median query times of up
to 20 µs.

Routing in Road Networks with Transit Nodes, Table 1
Statistics on preprocessing. The size of transit-node sets,
the number of entries in distance tables, and the average
number of access nodes to the respective layer are given;
furthermore, the space overhead and the preprocessing time

         Layer 1            Layer 2                                   Layer 3
         |T|      |A| avg.  |T_2|     |Table_2| [×10^6]  |A_2| avg.  |T_3|       |Table_3| [×10^6]  Space [B/node]  Time [h]
Europe   11,293   9.9       323,356   130                4.1         2,954,721   119                251             2:44
USA      10,674   5.7       485,410   204                4.2         3,855,407   173                244             3:25

Routing in Road Networks with Transit Nodes, Table 2
Performance of transit-node routing with respect to
10,000,000 random queries. The column for layer i specifies
which fraction of the queries is correctly answered using
only information available at layers ≤ i

         #nodes       #edges       Layer 1   Layer 2     Layer 3      Query
Europe   18,029,721   42,199,587   99.74 %   99.9984 %   99.99981 %   5.6 µs
USA      24,278,285   58,213,192   99.89 %   99.9986 %   99.99986 %   4.9 µs

Routing in Road Networks with Transit Nodes, Fig. 2
Query time distribution as a function of Dijkstra rank, the
number of iterations Dijkstra’s algorithm would need to
solve this instance. The distributions are represented as
box-and-whisker plots: each box spreads from the lower to
the upper quartile and contains the median, the whiskers
extend to the minimum and maximum value omitting outliers,
which are plotted individually


Data Sets 


The European road network has been provided by 
the company PTV AG; the US network has been
obtained from the TIGER/Line Files [9]. Both 
graphs have also been used in the 9th DIMACS 
Implementation Challenge on Shortest Paths [1]. 


URL to Code 


The source code might be published at some point
in the future at http://algo2.iti.uka.de/schultes/hwy/.


Problem Definition


A connected dominating set (CDS) is typically
adopted in wireless multihop networks such as
wireless sensor and ad hoc networks. To
achieve routing efficiency, a virtual backbone,
inspired by the backbone in wired
networks, is often used because it can reduce
the path search space
and the routing table size [3]. According to
[8, 10], there are many methods to construct a
virtual backbone, and a competitive approach
is the connected dominating set (CDS). The detailed
definition of a CDS is as follows.

Let a connected graph G = (V, E) represent
a wireless sensor network, where V is the set of
sensor nodes and E is the set of edges connecting
sensor nodes in V. A subset D ⊆ V is called
a dominating set (DS) if each sensor node in V
either belongs to D or is adjacent to a sensor
node in D. If the subgraph induced
by D is a connected graph, then D is called a
connected dominating set (CDS).
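
The definition can be checked directly. The following minimal sketch tests domination and connectivity of the induced subgraph separately; the adjacency-set representation and the function names are assumptions made for this example.

```python
from collections import deque

def is_dominating(adj, D):
    """Every node is in D or has at least one neighbor in D."""
    return all(v in D or adj[v] & D for v in adj)

def induces_connected_subgraph(adj, D):
    """BFS inside D only: all of D must be reachable from one of its nodes."""
    if not D:
        return False
    start = next(iter(D))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & D:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == D

def is_cds(adj, D):
    D = set(D)
    return is_dominating(adj, D) and induces_connected_subgraph(adj, D)

# Path 1 - 2 - 3 - 4 - 5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
```

On this path, {2, 4} dominates every node but is not connected, so it is a DS and not a CDS, whereas {2, 3, 4} is a CDS.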

Intuitively, the smaller the CDS, the greater the
role the virtual backbone can play in routing.
Many studies such as [1, 13, 14] and [9, 12]
therefore aimed to construct a virtual backbone based
on a CDS of minimum size, called a
minimum connected dominating set (MCDS), i.e.,
a CDS that has the
minimum number of nodes. For example, the
gray nodes in Fig. 1a form an MCDS of the sample
graph, while the black nodes in Fig. 1b form
a CDS. However, these studies ignored that
if the size of the CDS is too small, some sensor
nodes cannot find a shortest routing path to their
destination. Du et al. [10] analyze how a virtual
backbone based on an MCDS makes some routing
paths much longer than the shortest paths. Thus,
a disadvantage of the MCDS is
the unavailability of shortest routing paths. If we
only aim to construct a virtual backbone based on
an MCDS, we cannot guarantee a routing
constraint in data delivery. The routing constraint
in a CDS therefore becomes much more important when
constructing a virtual backbone.

Routing-Cost Constrained Connected Dominating
Set, Fig. 1 (a) An example MCDS. (b) An example CDS

In this section, the problem of the routing
constraint in connected dominating set (R-CDS)
is stated formally for the wireless
multihop network environment. The problem of
R-CDS is defined as follows:

Given a graph G = (V, E) where V represents
a node set and E denotes an edge set, we would
like to find a CDS D in polynomial time so that,
for every pair of nodes u and v, there exists a path
between u and v with intermediate nodes in D
and path length at most α · d(u, v), where α is a
constant and d(u, v) is the length of the shortest
path between u and v. In addition, the size of the
resulting CDS |D| is bounded by β · opt_MCDS,
where β is a constant and opt_MCDS is the size of
the MCDS.
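
The routing constraint can be evaluated by brute force on small instances. The sketch below is an illustration under stated assumptions: path length is measured in hops, the graph is connected, D is already a CDS, and the function names are hypothetical. It computes the largest ratio between the shortest path whose intermediate nodes all lie in D and the unrestricted shortest path.

```python
from collections import deque
from itertools import combinations

def hop_distances(adj, src, relays=None):
    """BFS hop distances from src; if relays is given, only src and nodes
    in relays may forward, so every intermediate node lies in relays."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if relays is not None and u != src and u not in relays:
            continue  # u can be an endpoint but not an intermediate node
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def max_stretch(adj, D):
    """Largest ratio d_D(u, v) / d(u, v) over all node pairs."""
    worst = 1.0
    for u, v in combinations(adj, 2):
        d = hop_distances(adj, u)[v]
        d_D = hop_distances(adj, u, relays=D).get(v, float("inf"))
        worst = max(worst, d_D / d)
    return worst

# Cycle on six nodes; the backbone is the path 0 - 1 - 2 - 3.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
stretch = max_stretch(cycle, {0, 1, 2, 3})
```

Here node 4 reaches node 0 in two hops through node 5, but the backbone forces the four-hop detour through 3, 2, and 1, so this D meets the constraint for α = 2 and violates it for any smaller α.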

The problem specified in wireless multihop
networks has been considered under both the general
graph model and the unit disk graph (UDG) model and will
be further discussed in the following section.
With the UDG model, all nodes in the network
have the same transmission range, and there are
no obstacles. As a result, as long as a
receiving node is within the transmission range
of a sending node, the receiving node will be
able to receive the data successfully. With the general
graph model, the nodes in the network could
have different transmission ranges, and obstacles
might interfere with normal data communication.
As a result, being in the transmission range of
a sending node does not guarantee successful
transmission.


Key Results 


Many studies focus on the routing
constraint in CDS; they can be classified into
two categories: the general graph and the UDG.

In the general graph category, Ding et al. [2]
studied a special connected dominating set (CDS)
problem named minimum routing cost CDS
(MOC-CDS). They proved that constructing a
minimum MOC-CDS in a general graph is NP-hard
and proposed a distributed heuristic algorithm
(called FlagContest) for constructing an
MOC-CDS with performance ratio (1 − ln 2) + 2 ln δ,
where δ is the maximum node degree.
Du et al. [8] presented a constant-approximation
scheme which produces a connected dominating
set D whose size |D| is within a factor α of
that of the minimum connected dominating set,
and in which for each node pair there exists a routing path with
all intermediate nodes in D and with length at
most 5d(u, v), where d(u, v) is the length of a
shortest path of this node pair. Ding et al. [3]
developed an exact algorithm for minimum CDS
with shortest path constraint called SPCDS and
proved that finding such a minimum SPCDS can
be achieved in polynomial time. Ding et al. [4]
showed that under the general graph model,
α-MOC-CDS is NP-hard for any α > 1. Ding et al. [5, 6]
studied virtual backbones with guaranteed routing
costs, named α minimum routing cost directional
virtual backbone (α-MOC-DVB). They proved
that the construction of a minimum α-MOC-DVB
is an NP-hard problem in a general directed
graph. Du et al. [10] proved that there is no
polynomial-time constant approximation for
α-MOC-CDS unless P = NP when α > 2.

In the UDG category, Wu et al. [15] studied
the relationship between minimum connected
dominating sets and maximal independent sets
in unit disk graphs. Kim et al. [11] proposed a
distributed algorithm under the UDG model,
CDS-BD-D, which constructs a CDS whose size and
maximum path length are bounded. Du et al. [8]
proposed a centralized algorithm and a
distributed algorithm that achieve a
constant-approximation performance ratio on
MCDS and routing cost. Du et al. [7] studied
the problem of minimizing the size |D| of a connected
dominating set under the constraint that for
any two nodes u and v, m_D(u, v) ≤ α · m(u, v),
where α is a constant, m_D(u, v) is the number of
intermediate nodes on a shortest path connecting
u and v through D, and m(u, v) is the number of
intermediate nodes in a shortest path between u
and v in a given unit disk graph.

In this chapter, we recall that, under the general
graph model, there is no polynomial-time
constant-approximation solution in terms of CDS
size unless P = NP, as shown in [10]. Under the
UDG model, we present the polynomial-time
constant-approximation algorithm GOC-MCDS-C
of [10], which produces a CDS D with size |D| ≤
176 · opt_MCDS + 64 and with the property that for
any pair of nodes u and v, d_D(u, v) ≤ 7 · d(u, v).
The distributed version of the algorithm,
GOC-MCDS-D, is thoroughly analyzed.

Algorithm 1: Centralized algorithm GOC-MCDS-C

Initially: Set D ← ∅ and C ← ∅.

Step 1. Construct a maximal independent set I.

Step 2. For every pair of nodes u, v in I with
d(u, v) ≤ 3, compute a shortest path p(u, v) and
put all intermediate nodes of p(u, v) into C.

Output: D = C ∪ I.


Algorithm 2: Construct an MIS I (Stage 1)

Initially: Every node is colored in white and is
assigned a positive integer ID; different nodes
have different IDs.

Step 1. Every white node sends its ID to its
neighbors and then compares its ID with the
IDs received from neighbors. If its ID is smaller than every
received ID from neighbors, then it turns its color
from white to black.

Step 2. Every black node sends the message “black” to
its neighbors. If a white node receives a message
“black,” then it turns its color from white to gray.

Step 3. Go back to Step 1 until no white node exists.

Output: All black nodes form a maximal
independent set I.
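
Algorithms 1 and 2 can be combined into a short centralized sketch. It is an illustration under stated assumptions, not the authors' implementation: node identifiers double as the IDs compared in Algorithm 2, the rounds of Algorithm 2 are simulated sequentially, distances are hop counts, and the hop bound used to connect pairs of independent nodes is a parameter set to 3 here.

```python
from collections import deque
from itertools import combinations

def build_mis(adj):
    """Simulate Algorithm 2: in each round, a white node whose ID is smaller
    than that of every white neighbor turns black; its white neighbors turn gray."""
    color = {v: "white" for v in adj}
    while any(c == "white" for c in color.values()):
        winners = [v for v in adj if color[v] == "white"
                   and all(v < w for w in adj[v] if color[w] == "white")]
        for v in winners:
            color[v] = "black"
        for v in winners:
            for w in adj[v]:
                if color[w] == "white":
                    color[w] = "gray"
    return {v for v in adj if color[v] == "black"}

def bounded_shortest_path(adj, src, dst, max_hops):
    """BFS path from src to dst, or None if dst is more than max_hops away."""
    parent, depth = {src: None}, {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if depth[u] == max_hops:
            continue
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                queue.append(w)
    return None

def goc_mcds_c(adj, max_hops=3):
    """Algorithm 1: MIS plus the intermediate nodes of short connecting paths."""
    mis = build_mis(adj)
    connectors = set()
    for u, v in combinations(sorted(mis), 2):
        path = bounded_shortest_path(adj, u, v, max_hops)
        if path:
            connectors.update(path[1:-1])
    return mis | connectors

# Path 2 - 1 - 3 - 5 - 4 - 6 (labels are the node IDs).
g = {1: {2, 3}, 2: {1}, 3: {1, 5}, 4: {5, 6}, 5: {3, 4}, 6: {4}}
backbone = goc_mcds_c(g)
```

In this example nodes 1 and 4 form the independent set, and nodes 3 and 5 are added as connectors because they lie on the three-hop path between them.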


GOC-MCDS-C: The Centralized Algorithm 

Algorithm 3: Connect the MIS I (Stage 2)

Step 1. Every black node sends its ID to its neighbors.

Step 2. Every node adds its own ID id_2 to each
received ID id_1 and then sends those pairs of IDs
(id_1, id_2) to all its neighbors.

Step 3. Each node does the following. Suppose its
ID is id*:

  1. For each pair of IDs id_1 and id_1* received in
     Step 1, if id_1 < id_1*, then send a message
     (id_1*, id*, id_1) to the neighbor with ID id_1.

  2. For each pair of messages (id_1, id_2) and
     (id_1*, id_2*) received at Step 2, if
     id_1 < id_1*, then send a message
     (id_1*, id_2*, id*, id_2, id_1) to the neighbor
     with ID id_2.

  3. For each message (id_1, id_2) received at
     Step 2 and ID id_1* received at Step 1, if
     id_1 < id_1*, then send a message
     (id_1*, id*, id_2, id_1) to the neighbor
     with ID id_2; otherwise, send a message
     (id_1, id_2, id*, id_1*) to the neighbor
     with ID id_1*.

Step 4. When a node with ID id_2 receives a
message (id_1*, id_2*, id*, id_2, id_1) or
(id_1*, id*, id_2, id_1), it sends this message to its
neighbor with ID id_1.

Step 5. Each black node with ID id_1 collects all
messages of the form (id_3, id_2, id_1) or
(id_4, id_3, id_2, id_1) or (id_5, id_4, id_3, id_2, id_1)
received in Step 3 and Step 4. Suppose those
messages form a set M. Then it performs the
following computation:

  while M ≠ ∅ do begin
    choose (id_k, ..., id_2, id_1) ∈ M;
    send message (id_k, ..., id_2, id_1) to
      node with ID id_2;
    delete all messages starting with id_k
      from M;
  end-while

Step 6. When a node with ID id_j receives a
message (..., id_{j-1}, id_j, ...), it turns black. In
addition, if id_j is not the leftmost ID in the message,
then it passes this message to the node with ID
id_{j-1}; if id_j is the leftmost ID in the message, it does
nothing.

Step 7. If no message is passed in Step 6, then stop.
Otherwise, go back to Step 6.


Under the general graph model, it has been
proved that there is no polynomial-time
constant approximation for the problem under
investigation unless NP = P [10].
However, under the UDG model, polynomial-time
constant-approximation algorithms do exist. Kim
et al. [11] proposed a distributed algorithm,
CDS-BD-D, that constructs a CDS whose size
and maximum path length are bounded. In this
section, we advance Kim et al.’s results by
presenting the details of a polynomial-time
constant-approximation algorithm,
GOC-MCDS-C. The proposed algorithm produces a
CDS D with size |D| ≤ 176 · opt_MCDS + 64
and with the property that for any pair of nodes
u and v, d_D(u, v) ≤ 7d(u, v) [10]. Note that
GOC-MCDS-C is a centralized algorithm.
GOC-MCDS-C follows the steps of regular MCDS


Routing-Cost Constrained Connected Dominating Set 


construction algorithms. Namely, there are two 
steps in total. During the first step, an MIS is 
constructed. In the second step, the nodes in the 
MIS are connected in order to form a CDS. 


GOC-MCDS-D: The Distributed Algorithm 

In this section, the distributed algorithm 
GOC-MCDS-D is described in detail. The performance 
of GOC-MCDS-D is the same as that of 
GOC-MCDS-C, as shown in [10]. Similar to the 
centralized algorithm GOC-MCDS-C, GOC-MCDS-D 
consists of two stages. In the first stage, an MIS 
is constructed using Algorithm 2. In the second 
stage, the MIS is connected using Algorithm 3. 


Open Problems 


Coverage problems in wireless sensor networks 
that are related to routing-cost constrained 
CDSs remain open. 


Problem Definition 


Problem statement and the I/O model. Let S 
be a set of N axis-parallel hypercubes in $\mathbb{R}^d$. A 
very basic operation in a spatial database is to 
answer window queries on the set S. A window 
query Q is also an axis-parallel hypercube in $\mathbb{R}^d$ 
that asks us to return all hypercubes in S that 
intersect Q. Since the set S is typically huge 
in a large spatial database, the goal is to design 
a disk-based or external memory data structure 
(often called an index in the database literature) 
such that these window queries can be answered 
efficiently. In addition, given S, the data 
structure should be constructed efficiently and should 
be able to support insertions and deletions of 
objects. 

When external memory data structures 
are concerned, the standard external memory 
model [2], a.k.a. the I/O model, is often used 
as the model of computation. In this model, 
the machine consists of an infinite-size external 
memory (disk) and a main memory of size M. 
A block of B consecutive elements can be 
transferred between main memory and disk in 
one I/O operation (or simply I/O). An external 
memory data structure is a structure that is stored 
on disk in blocks, but computation can only occur 
on elements in main memory, so any operation 
(e.g., query, update, and construction) on the data 
structure must be performed using a number of I/Os, 
which is the measure of the complexity of the 
operation. 


R-trees. The R-tree, first proposed by 
Guttman [9], is a multi-way tree T, very similar 
to a B-tree, that is used to store the set S such 
that a window query can be answered efficiently. 
Each node of T fits in one disk block. The 
hypercubes of S are stored only in the leaves 
of T. All leaves of T are on the same level, 
and each stores $\Theta(B)$ hypercubes from S, while 
each internal node, except the root, has a 
fan-out of $\Theta(B)$. The root of T may have a fan-out 
as small as 2. For any node $u \in T$, let R(u) 
be the smallest axis-parallel hypercube, called 
the minimal bounding box, that encloses all the 
hypercubes stored below u. At each internal node 
$v \in T$, whose children are denoted $v_1, \ldots, v_k$, 
the bounding box $R(v_i)$ is stored along with the 
pointer to $v_i$ for $i = 1, \ldots, k$. Note that these 
bounding boxes may overlap. See Fig. 1 
for an example of an R-tree in two dimensions. 

For a window query Q, the query answering 
process starts from the root of T and visits all 
nodes u for which R(u) intersects Q. When 
reaching a leaf v, it checks each hypercube stored 
at v to decide if it should be reported. The 
correctness of the algorithm is obvious, and the 
efficiency (the number of I/Os) is determined by 
the number of nodes visited. 
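
The traversal just described can be sketched as follows. This is an in-memory illustration only: disk blocks are not modeled, boxes are represented as (lower corner, upper corner) tuples, and the names `Node`, `make_leaf`, `make_internal`, and `window_query` are hypothetical. The function also returns the number of nodes visited, which stands in for the I/O count.

```python
from dataclasses import dataclass, field

def intersects(a, b):
    """True if two axis-parallel boxes (lower, upper) share a point."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def bounding_box(boxes):
    """Minimal bounding box of a non-empty list of boxes."""
    lows = tuple(min(c) for c in zip(*(b[0] for b in boxes)))
    highs = tuple(max(c) for c in zip(*(b[1] for b in boxes)))
    return (lows, highs)

@dataclass
class Node:
    box: tuple                                      # R(u)
    children: list = field(default_factory=list)    # internal node
    entries: list = field(default_factory=list)     # leaf node

def make_leaf(entries):
    return Node(bounding_box(entries), entries=list(entries))

def make_internal(children):
    return Node(bounding_box([c.box for c in children]), children=list(children))

def window_query(root, q):
    """Return (hypercubes intersecting q, number of nodes visited)."""
    result, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.children:
            # Descend only into children whose bounding box meets q.
            stack.extend(c for c in node.children if intersects(c.box, q))
        else:
            result.extend(e for e in node.entries if intersects(e, q))
    return result, visited
```

A query that falls between the hypercubes of a leaf still visits that leaf when it intersects the leaf's bounding box, which is the source of the wasted work discussed in the worst-case results.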

Any R-tree occupies a linear number, $O(N/B)$, of 
disk blocks, but different R-trees might have 
different query, update, and construction costs. 
When analyzing the query complexity of window 
queries, the output size T is also used, in addition 
to N, M, and B. 


Key Results 


Although the structure of an R-tree is restricted, 
there is much freedom in grouping the 
hypercubes into leaves and grouping subtrees 
into bigger subtrees. Different grouping strategies 
result in different variants of R-trees. Most of the 
existing R-trees use various heuristics to group 
together hypercubes that are “close” spatially, 
so that a window query will not visit too many 
unnecessary nodes. Generally speaking, there are 
two ways to build an R-tree: repeated insertion 
and bulk loading. The former type of algorithm 
includes the original R-tree [9], the R+-tree [15], 
the R*-tree [6], etc. These algorithms use 
$O(\log_B N)$ I/Os to insert an object and hence 
$O(N \log_B N)$ I/Os to build the R-tree on S, 
which is not scalable for large N. When the set 
S is known in advance, it is much more efficient 
to bulk load the entire R-tree at once. Many 
bulk-loading algorithms have been proposed, 
e.g., [7, 8, 11, 13]. Most of these algorithms 
build the R-tree with $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os (the 
number of I/Os needed to sort N elements), 
and they typically result in better R-trees than 
those obtained by repeated insertion. During the 
past decades, a large number 
of works have been devoted to R-trees in the database 
community, and the list here is by no means 
complete. The reader is referred to the book 
by Manolopoulos et al. [14] for an excellent 
survey on this subject in the database literature. 
However, no R-tree variant mentioned above has 
a guarantee on the query complexity; in fact, 
Arge et al. [3] constructed an example showing 
that some of the most popular R-trees may have 
to visit all the nodes without reporting a single 
result. 

From the theoretical perspective, the following 
are the two main results concerning the worst- 
case query complexity of R-trees. 


Theorem 1 ([1, 12]) There is a set of N points 
in $\mathbb{R}^d$ such that, for any R-tree T built on 
these points, there exists an empty window 
query for which the query algorithm has to visit 
$\Omega((N/B)^{1-1/d})$ nodes of T. 


The priority R-tree, proposed by Arge 
et al. [3], matches the above lower bound. 


Theorem 2 ([3]) For any set S of N axis-parallel 
hypercubes in $\mathbb{R}^d$, the priority R-tree 
answers a window query with $O((N/B)^{1-1/d} + T/B)$ 
I/Os. It can be constructed with 
$O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os. 


It is also reported that the priority R-tree 
performs well in practice [3]. However, it 
is not known how to update it efficiently while 
preserving the worst-case bound. The logarithmic 
method was used to support insertions and 
deletions [3], but the resulting structure is no longer an 
R-tree. 

Note that the lower bound in Theorem 1 only 
holds for R-trees. If the data structure is not 
restricted to R-trees, better query bounds can 
be obtained for the window-query problem; see, 
e.g., [4]. 

R-Trees, Fig. 1 An R-tree example in two dimensions 


Applications 


R-trees have been used widely in practice due to 
their simplicity and their ability to store spatial 
objects of various shapes and to answer various queries. 
The areas of application include geographical 
information systems (GIS), computer-aided design, 
computer vision, and robotics. When the objects 
are not axis-parallel hypercubes, they are often 
approximated by their minimal bounding boxes, 
and the R-tree is then built on these bounding 
boxes. To answer a window query, first the 
R-tree is used to locate all the intersecting bounding 
boxes, followed by a filtering step that checks 
the objects exactly. The R-tree can also be used 
to support other kinds of queries, for example, 
aggregation queries, nearest neighbors, etc. In 
aggregation queries, each object o in S is 
associated with a weight $w(o) \in \mathbb{R}$, and the goal is 
to compute $\sum w(o)$, where the sum is taken over 
all objects that intersect the query range Q. The 
query algorithm is the same as before, except that 
it also keeps a running sum while traversing 
the R-tree and may skip an entire subtree rooted 
at some u if R(u) is completely contained in Q. 
To find the nearest neighbor of a query point q, 
a priority queue is maintained, which stores all 
the nodes u that might contain an object that is 
closer than the current nearest neighbor found so 
far. The priority of u in the queue is the distance 
between q and R(u). The search terminates when 
the current nearest neighbor is closer than the top 
element in the priority queue. However, no 
worst-case guarantees are known for R-trees answering 
these other types of queries, although they tend to 
perform well in practice. 
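
The nearest-neighbor search with a priority queue can be sketched as below. The representation is an assumption made for illustration: nodes are nested dictionaries with a "box" key and either "children" or "entries", objects are boxes given as (lower corner, upper corner), and the helper names `min_dist` and `nearest_neighbor` are hypothetical.

```python
import heapq
import itertools
import math

def min_dist(q, box):
    """Euclidean distance from point q to the nearest point of box."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def nearest_neighbor(root, q):
    """Best-first search; returns (nearest object, its distance to q)."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(min_dist(q, root["box"]), next(tie), root)]
    best, best_dist = None, math.inf
    # Stop once the closest unexplored node cannot beat the current best.
    while heap and heap[0][0] < best_dist:
        _, _, node = heapq.heappop(heap)
        for child in node.get("children", []):
            d = min_dist(q, child["box"])
            if d < best_dist:
                heapq.heappush(heap, (d, next(tie), child))
        for obj in node.get("entries", []):
            d = min_dist(q, obj)
            if d < best_dist:
                best, best_dist = obj, d
    return best, best_dist
```

The priority of a node is the distance between q and its bounding box, so a subtree is opened only while it could still contain something closer than the best object found so far.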


Open Problems 


Several interesting problems remain open with 
respect to R-trees. Some of them are listed 
here: 


• Is it possible to design an R-tree with the 
optimal query bound $O((N/B)^{1-1/d} + T/B)$ 
that can also be efficiently updated? Or prove 
a lower bound on the update cost for such an 
R-tree. 

• Is there an R-tree that supports aggregation 
queries for axis-parallel hypercubes in 
$O((N/B)^{1-1/d})$ I/Os? This would be optimal 
because the lower bound of Theorem 1 also 
holds for aggregation queries on R-trees. Note, 
however, that no sublinear worst-case bound 
exists for nearest-neighbor queries, since it is 
not difficult to design a worst-case example 
for which the distance between the query 
point q and any bounding box is smaller than 
the distance between q and its true nearest 
neighbor. 

• When the window query Q shrinks to a point, 
that is, the query asks for all hypercubes in 
S that contain the query point, the problem 
is often referred to as stabbing queries or 
point enclosure queries. The lower bound of 
Theorem 1 does not hold for this special case, 
while a lower bound of $\Omega(\log_2 N + T/B)$ 
was proven in [5], which holds in the strong 
indexability model. It is intriguing to find out 
the true complexity of stabbing queries using 
R-trees, which is between $\Omega(\log_2 N + T/B)$ 
and $O((N/B)^{1-1/d} + T/B)$. 


Experimental Results 


Nearly all studies on R-trees include experimen- 
tal evaluations, mostly in two dimensions. Re- 
portedly the Hilbert R-tree [10, 11] has been 
shown to have good query performance while 
being easy to construct. The R*-tree’s insertion 
algorithm [6] has often been used for updating the 
R-tree. Please refer to the book by Manolopoulos 
et al. [14] for more discussions on the practical 
performance of R-trees. 


Data Sets 


Besides some synthetic data sets, the TIGER/Line 
data (http://www.census.gov/geo/www/tiger/) 
from the US Census Bureau has been frequently 
used as real-world data to test R-trees. The R-tree 
portal (http://www.rtreeportal.org/) also contains 
many interesting data sets. 


URL to Code 


Code for many R-tree variants is available at the 
R-tree portal (http://www.rtreeportal.org/). The 
code for the priority R-tree is available at 
http://www.cs.duke.edu/~yike/prtree/. 


Problem Definition 


The aim is to design effective algorithms for 
controlling rumor propagation in social networks. 
Here, a rumor is viewed as undesirable. 
Social networks are represented by undirected or 
directed graphs, depending on the context. 
In these graphs, nodes denote individuals and 
edges denote the influence between individuals. 
A number of strategies have been proposed to limit 
the spread of a rumor in a network. We group 
some of the existing research into two 
categories. The first includes works that 
launch an opposing cascade, the protector, to spread 
in the network, such that the number of nodes that 
adopt the rumor at the end of the diffusion of both 
cascades is limited. The second contains works 
that are concerned with the network structure; 
that is, they control rumor propagation by 
blocking some edges, some nodes, or both 
in the network. 

In this work, we mainly introduce two specific 
works belonging to the two different categories. 
For each of them, we briefly introduce some 
related works. 


Problem 1 [2] 

Two cascades, rumor (bad campaign) and protec- 
tor (limiting campaign), diffuse simultaneously in 
a network. Influence diffusion models are used 
to capture their propagation processes. The ob- 
jective is to limit the rumor propagation through 
protector diffusion. 

Given a directed graph G = (V, E), original 
rumor sources $R \subseteq V$, an integer k > 0, and the 
time delay d (a nonnegative integer) for detecting 
rumor sources, the objective is to identify k 
nodes as initial protectors, such that the expected 
number of nodes adopting the rumor at the end of 
both rumor and protector propagation processes 
is minimized, or equivalently, the reduction in the 
expected number of nodes adopting the rumor is 
maximized. 


Two Influence Diffusion Models 
Two influence diffusion models are adopted in [2]. 


Multi-campaign Independent Cascade Model 
(MCICM) In this model, a network is viewed as 
a directed graph G = (V, E). The initial set of 
rumor sources is denoted by R, and the initial set 
of protectors is denoted by P. Each node must 
be in one of three statuses: infected (by the 
rumor), protected (by the protector), or inactive 
(neither infected nor protected). Each edge $e_{u,v}$ is 
associated with two values $0 \le p_r(u,v) \le 1$ and 
$0 \le p_p(u,v) \le 1$. Once a node becomes infected 
or protected, it remains so forever. 

The diffusion process unfolds in discrete time 
steps. In any step $t \ge 1$, when a node u first 
becomes infected (protected), it has a single chance 
to activate each currently inactive neighbor v, and 
it succeeds with probability $p_r(u,v)$ ($p_p(u,v)$) 
provided no other neighbor of v tries activating v 
at the same step. In other words, at step t + 1, node v 
will become infected (protected) with probability 
$p_r(u,v)$ ($p_p(u,v)$) provided no other neighbor of v 
tries activating v at the same step. If 
two or more nodes try to activate v at the 
same step, at most one of them can succeed. 
If infected node(s) and protected node(s) try to 
activate a node at the same step, protected nodes 
have priority over infected nodes. The process 
continues until no newly infected or protected 
node appears. 


Campaign-Oblivious Independent Cascade Model 
(COICM) This model is similar to the MCICM; 
the only difference is that, instead of two 
probabilities, only one probability 
$0 \le p(u,v) \le 1$ is associated 
with each edge $e_{u,v}$. That is, each node forwards 
the two kinds of information with the same 
probability, so the rumor and protector 
cascades pass through the same edge with the 
same probability. 


Problem 2 [8] 

A single cascade, rumor, diffuses through net- 
works. Influence diffusion models are used to 
capture rumor propagation process. The objective 
is to limit the propagation of rumors through 
blocking links in networks. The aim of [8] is to 
minimize contamination degree by appropriately 
removing a fixed number of links. Here, the 
contamination degree of a network is used to 
measure how badly the rumor will contaminate 
the network; see its definition later. 

Given a directed graph G = (V, E) and a positive 
integer k with k < |E|, find a subset $B^* \subseteq E$ 
with $|B^*| = k$ such that $c(G(B^*)) \le c(G(B))$ 
for any $B \subseteq E$ with |B| = k. Here c(G) denotes 
the contamination degree of G. For any link $e \in E$, 
let G(e) denote the graph $(V, E \setminus \{e\})$, that is, 
the graph constructed by blocking e in 
G. Similarly, for any $B \subseteq E$, let G(B) denote the 
graph $(V, E \setminus B)$, that is, the 
graph constructed by blocking B in G. 


Independent Cascade Model 
In this model, a network is considered as a 
directed graph G = (V, E). Each edge $e_{u,v} \in E$ 
is assigned an influence probability p(u, v), 
representing the probability that node u influences 
node v successfully. For $e_{u,v} \notin E$, let p(u, v) = 0. 
Each node is in one of the following 
two statuses: inactive or infected. Once a node 
becomes infected, it stays infected forever. 

The diffusion process unfolds in discrete time 
steps. Starting with an initial set of infected nodes 
$A_0$, at any step $t \ge 1$, when node u first becomes 
infected in step t, it has a single chance to 
activate each of its currently inactive neighbors. 
For a neighbor v, it succeeds with probability 
p(u, v). If u succeeds in activating v, then v 
becomes infected in step t + 1; if u fails, 
then v stays inactive, and u does not get a 
second chance to activate v in any subsequent step. The 
process continues until no more activations are 
possible. If multiple newly activated nodes are 
in-neighbors of the same inactive node, then their 
activation attempts are sequenced in an arbitrary 
order. 
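
A single run of this process can be simulated as follows. The sketch makes a few assumptions that are not in the source: the graph is given as a dictionary from directed edges (u, v) to p(u, v), the random source is a `random.Random` instance passed in by the caller, and the names `simulate_ic` and `estimate_influence` are hypothetical. Marking a node infected as soon as one attempt succeeds corresponds to sequencing simultaneous attempts in an arbitrary order.

```python
import random

def simulate_ic(prob, seeds, rng):
    """One run of the independent cascade; returns the final infected set.
    prob maps a directed edge (u, v) to its influence probability p(u, v)."""
    out = {}
    for (u, v), p in prob.items():
        out.setdefault(u, []).append((v, p))
    infected = set(seeds)
    frontier = list(seeds)          # nodes that became infected in the last step
    while frontier:
        newly = []
        for u in frontier:
            for v, p in out.get(u, []):
                # u gets exactly one attempt on each inactive out-neighbor.
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly.append(v)
        frontier = newly
    return infected

def estimate_influence(prob, seeds, runs, seed=0):
    """Monte Carlo average of the number of infected nodes over several runs."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(prob, seeds, rng)) for _ in range(runs)) / runs
```

Averaging the final infected count over many runs gives a simple, if slow, estimate of the expected spread from the chosen seeds.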


Key Results 


For Problem 1 

Given the set of rumor sources R, a set of initial 
protectors P, and rumor detection delay d, the set 
function $f_{R,d}(P)$ represents the number of nodes 
that are prevented by P, with diffusion delay 
d, from adopting R. In other words, 
$f_{R,d}(P)$ counts the nodes that would be infected 
by R if the empty set were selected as the set of 
protectors instead of P, but that are not infected 
when P is selected. Therefore, the problem is to 
select P such that the expectation of $f_{R,d}(P)$ is 
maximized. 

This problem is proved to be NP-hard. 
For the MCICM, the high-effectiveness 
property, where $p_p(u,v) = 1$ for every 
edge $e_{u,v} \in E$, is adopted. The objective 
function is then proved to be submodular and 
monotone under both the MCICM with 
the high-effectiveness property and the COICM. 
Therefore, Algorithm 1 can be applied to 
obtain a $(1 - 1/e)$-approximate solution to the 
problem. However, the objective function is not 
submodular under the MCICM without 
the high-effectiveness property. 


Algorithm 1: The greedy algorithm 


Input: Graph G = (V, E), the set of initial rumor 
sources R, rumor detection delay d, a positive 
integer k, a positive number n representing the 
number of simulations 

Output: Set P 


$P = \emptyset$ 
for i = 1 to k do 
    for each $u \in V \setminus (R \cup P)$ do 
        $N_u = 0$ 
        for j = 1 to n do 
            $N_u = N_u + f_{R,d}(P \cup \{u\}) - f_{R,d}(P)$ 
        end 
        $N_u = N_u / n$ 
    end 
    $Loc = \arg\max_{u \in V \setminus (R \cup P)} \{N_u\}$ 
    $P = P \cup \{Loc\}$ 
end 

Output P 
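
The greedy selection of Algorithm 1 can be sketched in Python as follows. The diffusion simulation itself is left abstract: `sample_saved` is a hypothetical callback, assumed to return one simulated value of $f_{R,d}(P)$ for a given protector set P, and `greedy_protectors` is likewise an illustrative name rather than code from [2].

```python
def greedy_protectors(nodes, rumor_sources, k, n, sample_saved):
    """Pick up to k protectors greedily by estimated marginal gain.
    sample_saved(P) is assumed to return one simulated value of f_{R,d}(P)."""
    protectors = set()
    for _ in range(k):
        candidates = [u for u in nodes
                      if u not in rumor_sources and u not in protectors]
        if not candidates:
            break
        gain = {}
        for u in candidates:
            total = 0.0
            for _run in range(n):
                # Marginal gain of adding u, averaged over n simulations.
                total += sample_saved(protectors | {u}) - sample_saved(protectors)
            gain[u] = total / n
        protectors.add(max(candidates, key=lambda u: gain[u]))
    return protectors
```

With a monotone submodular objective, this is the standard greedy scheme behind the $(1 - 1/e)$ guarantee; the guarantee in practice also depends on how accurately the simulations estimate the expectation.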


Variants of Problem 1 

Several works in different contexts also consider 
using the diffusion of protectors to contain the 
spread of rumors. In comparison to Problem 1, 
they have adopted different influence diffusion 
models, as well as formulated different optimiza- 
tion problems. 


Selection of Fixed Number of Protectors 

The work of He et al. [5] studies rumor blocking 
maximization under an extension of the classical 
Linear Threshold (LT) model [6], in which they 
incorporate two cascades, rumor and protector. 
Each node in this model can be in one of 
three states: infected, protected, or inactive. For 
each node, its currently infected neighbors and 
protected neighbors determine whether it 
becomes infected, becomes protected, or stays 
inactive. When a node is activated by its 
infected neighbors and protected neighbors at the 
same time, the infected neighbors have priority 
over the protected neighbors. Each edge e = (u, v) 
has two weights, $w^{r}_{uv}$ (rumor propagation) and 
$w^{p}_{uv}$ (protector propagation). Each node u picks 
two independent thresholds from [0, 1]; one is 
for rumor diffusion and the other is for protector 
diffusion. 

Then they develop the objective function 
$S_R(X)$ for this problem, which represents 
the expected number of nodes that are saved 
(from being infected by rumors R) by X. 
This problem is shown to be NP-hard, and the 
objective function is proved to be submodular 
and monotone; the greedy algorithm with 
performance ratio $1 - 1/e$ is then applied. To efficiently 
compute the values of $S_R(X)$, the authors 
propose the CLDAG algorithm. 

Instead of choosing initial protectors from 
nodes not in rumor sources, the authors in [10] se- 
lect a fixed number of nodes from initial infected 
nodes (rumor sources) and the rest of the nodes 
in a network as initial protectors, such that the 
number of nodes protected during T time steps 
is maximized. They study this problem under the 
LT model and the IC model. Two approximation 
algorithms are proposed. 


Protection of a Subset of Nodes 

Instead of limiting rumor diffusion through 
launching a fixed number of protectors, the work 
of [12] explores the problem of selecting 
the smallest set of influential people as protectors, 
such that the diffusion process starting from these 
protectors limits the propagation of rumors R to 
a fraction $0 < \alpha < 1$ of the whole network in 
T time steps. They study four variants of this 
problem, which are the combinations of the two 
parameters: R (can be unknown or known) and 
T (can be constrained or unconstrained). These 
problems are studied under extensions of 
the IC model and the LT model, in which two 
cascades, rumors and protectors, are considered. 
For each edge, both cascades have the same 
influence probability (IC) or influence weight 
(LT). For each node, they have the same threshold 
(LT). The key point is that when the two cascades 
try to activate a node at the same time, protectors 
have priority over rumors. 

The authors prove the NP-hardness of the 
four problems under the proposed models. For 
the variant in which R is unknown and T is 
unconstrained, the Greedy Viral Stopper (GVS) 
algorithm is adopted to select the protectors, and the 
solution obtained is within a constant factor (in 
terms of the number of nodes in the network) 
of the optimal solution. The GVS algorithm 
can also be used in the variants in which R is known and 
T is either constrained or unconstrained. These 
variants are shown to be hard to approximate to 
within a logarithmic factor in terms of the number of 
nodes in the network. To get a good solution 
in a short time, the Community-Based Heuristic 
algorithm is proposed. 

Exploiting the community structure of social 
networks, the work of Fan et al. [3] contains 
rumor propagation by selecting a minimal set of 
initial protectors to protect a special kind of 
vertex set, whose members play the role of “gates” of 
the rumors’ neighborhood communities. Two variants 
of the problem are studied under two different 
models; both variants are shown to be NP-hard, 
and approximation algorithms are developed to 
obtain good solutions. 


Game Theory Aspect 

Rumor blocking is also studied from a 
game-theoretic perspective in [13], which uses graphs with 
nodes representing tribal leaders and edges 
representing possible transmission of influence. 
In this context, rumor blocking is viewed 
as a two-player game, in which one player, the 
rumor, attempts to maximize the number of 
nodes accepting it, while the second player, the 
protector, attempts to minimize the rumor’s 
influence. Both the rumor and the protector 
choose their action sources (initial rumor sources 
and initial protector sources). In the zero-sum 
game context, the rumor’s payoff is equal to 
the expected number of nodes infected, and the 
protector’s payoff is the negative of the rumor’s 
payoff. The authors propose a double oracle 
algorithm for this game. 


For Problem 2 

Under the IC model, given an initial active set 
X, define the number of active nodes at the 
end of the influence diffusion process on G as 
f(X; G). Let $\sigma(X; G)$ denote the expected value 
of f(X; G); $\sigma(X; G)$ is called the influence 
degree of node set X on graph G. Two 
notions of contamination degree are defined. One 
is the average contamination degree, the 
average of the influence degrees of all the 
nodes in G, denoted $c_0(G)$ and defined as 
$c_0(G) = \frac{1}{|V|} \sum_{v \in V} \sigma(v; G)$. The other is the 
worst contamination degree, the 
maximum of the influence degrees of all the nodes in 
G, denoted $c_+(G)$ and defined as 
$c_+(G) = \max_{v \in V} \sigma(v; G)$. Approximation algorithms are 
proposed to find good solutions for the problem. 


Algorithm 2: The greedy algorithm - IC 

Input: Graph $G_0 = (V_0, E_0)$, a positive integer 
$k < |E_0|$ 
Output: The set of links blocked 
Initialize a subset $L \subseteq E_0$ as $L = \emptyset$ 
Initialize a graph G = (V, E) as $V = V_0$, $E = E_0$ 
while |L| < k do 
    select a link $e^*$ such that 
    $e^* = \arg\min_{e \in E} c(G(e))$ 
    $L = L \cup \{e^*\}$ 
    $E = E \setminus \{e^*\}$ 
end 
Output L 

For a given graph G = (V, E), exactly 
computing the contamination degree c(G(e)), $e \in E$, in 
Algorithm 2 is an open problem. Therefore, 
heuristic strategies are proposed to estimate 
c(G(e)), $e \in E$. These estimates are based 
on the bond percolation method proposed by 
Kimura et al. [7], which we describe below. 


Bond Percolation Method [7] 
Assume there are propagation probabilities 
$\{p_e;\ e \in E\}$ on a graph G = (V, E). In terms of 
information diffusion on a network, the occupied 
links represent the links along which the information 
propagates, and the unoccupied links represent 
the links along which the information does not propagate. 
The bond percolation process with occupation 
probabilities $\{p_e;\ e \in E\}$ on a graph G = (V, E) 
is a stochastic process in which each link 
$e \in E$ independently becomes occupied with probability $p_e$. 
Construct N graphs through the bond 
percolation process, that is, $\{G_n = (V, E_n);\ n = 1, \ldots, N\}$. 
For any $u \in V'$ on a graph $G' = (V', E')$, 
let $F(u; G')$ denote the set of 
all the nodes that are reachable from u on 
$G'$. A node v is said to be reachable from 
u if there is a path from u to v through the 
links of $G'$. Define the function 
$g(u; G, N) = \frac{1}{N} \sum_{n=1}^{N} |F(u; G_n)|$. Then $g(u; G, N)$ can be 
used to estimate $\sigma(u; G)$ for $u \in V$ 
if N is sufficiently large. To compute it, decompose each 
$G_n$ into its strongly connected components 
(SCCs) as $V = \bigcup_{i=1}^{I_n} SCC(u_i^n; G_n)$, where 
$u_i^n \in V$, $SCC(u_i^n; G_n)$ denotes the SCC of 
graph $G_n$ that contains $u_i^n$, and $I_n$ is the number 
of SCCs of graph $G_n$. Using the fact that 
$|F(u; G_n)| = |F(u_i^n; G_n)|$ for all $u \in SCC(u_i^n; G_n)$, 
calculate $\{|F(u; G_n)|;\ u \in V,\ n = 1, \ldots, N\}$, 
then compute $g(u; G, N)$, and finally obtain the 
estimate of $\sigma(u; G)$ for each $u \in V$. 
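
The estimator g(u; G, N) can be sketched directly in Python. This is a simplified illustration, not the method of [7] in full: it runs a plain graph search per sample instead of the SCC decomposition, it counts u itself as reachable from u (an assumption about the definition of F), and the names `sample_occupied`, `reachable_count`, and `estimate_sigma` are hypothetical.

```python
import random

def sample_occupied(prob, rng):
    """One bond percolation sample: keep each link e with probability p_e.
    prob maps a directed link (a, b) to its occupation probability."""
    return {e for e, p in prob.items() if rng.random() < p}

def reachable_count(edges, u):
    """|F(u; G')| for the graph formed by the given directed links
    (u is counted as reachable from itself, by assumption)."""
    out = {}
    for a, b in edges:
        out.setdefault(a, []).append(b)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in out.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen)

def estimate_sigma(prob, u, num_samples, seed=0):
    """g(u; G, N): average reachable-set size over N sampled graphs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        total += reachable_count(sample_occupied(prob, rng), u)
    return total / num_samples
```

The SCC decomposition described above serves the same purpose more efficiently when the estimate is needed for every node, since all nodes in one SCC share the same reachable set.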


Estimation Method 

We now describe how c(G(e)), $e \in E$, is estimated 
in [8]. For a graph G = (V, E), first construct 
N sample graphs through the bond percolation 
process as $\{G_n = (V, E_n);\ n = 1, \ldots, N\}$. 
Next, for each $e \in E$, identify the subset of indices 
$S_N(e) = \{n \in \{1, \ldots, N\};\ e \notin E_n\}$, that is, 
the samples in which e is not occupied. Now suppose the 
bond percolation process is applied to the graph 
$G(e) = (V, E \setminus \{e\})$ $|S_N(e)|$ times, so that $|S_N(e)|$ 
graphs of occupied links are obtained; denote 
them as $\{G(e)^n;\ n = 1, \ldots, |S_N(e)|\}$. Provided that 
N is large enough to make $|S_N(e)|$ sufficiently 
large, the function 
$g(u; G(e), |S_N(e)|) = \frac{1}{|S_N(e)|} \sum_{n=1}^{|S_N(e)|} |F(u; G(e)^n)|$, where 
$u \in V$, can be used to estimate $\sigma(u; G(e))$. 
Since each link of the graph G is independently 
declared occupied in the bond percolation 
process, the alternative 
$g'(u, e) = \frac{1}{|S_N(e)|} \sum_{n \in S_N(e)} |F(u; G_n)|$ is used instead to estimate 
$\sigma(u; G(e))$. 
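
The reuse of samples in g'(u, e) can be illustrated with a short Python sketch. It assumes the samples are already available as sets of occupied directed links, treats u as reachable from itself, and returns `None` when no sample omits e; the function names are hypothetical.

```python
def reachable_count(edges, u):
    """Number of nodes reachable from u (u included) over directed links."""
    out = {}
    for a, b in edges:
        out.setdefault(a, []).append(b)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in out.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen)

def estimate_after_blocking(samples, u, e):
    """g'(u, e): average of |F(u; G_n)| over the samples whose occupied
    link set does not contain e. Returns None if there is no such sample."""
    kept = [edges for edges in samples if e not in edges]
    if not kept:
        return None
    return sum(reachable_count(edges, u) for edges in kept) / len(kept)
```

Because links are occupied independently, a sample of G in which e happens to be unoccupied is distributed like a sample of G(e), so one batch of N samples can serve every candidate link e.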


Variants of Problem 2 

The authors in [9] adapt the method they used 
for the IC model to study Problem 2 under the 
LT model [6]. In [1], the authors incorporate the 
trust among users into the information propagation 
process, and they propose a measure to compute 
the trust between a pair of users. Then a Weighted 
Trust Network (WTN) is built, and the objective 
of the problem is to find the Maximum Spanning 
Tree (MST) in the WTN and, finally, immunize 
all the edges in the MST of the WTN. Another 
method, which controls rumor spread by blocking 
nodes and links simultaneously, can be found 
in [4]. 

Nguyen et al. [11] study the rumor blocking 
problem under a dynamic social network 
structure. They propose to distribute patches to 
the most influential nodes in the social network, 
such that the number of nodes influenced by 
rumors is limited. In their work, they first take 
into account the network community structure 
and adaptively keep it updated as the social 
network evolves, and then select the most 
influential individuals from each community to be 
patched. 

The work of Zhu [14] focuses on the rumor 
blocking problem in cellular networks. First, a 
social relationship graph between mobile phones 
is obtained based on network traffic; the 
authors develop two graph-partitioning algorithms 
to partition the graph into as many separate parts as 
possible and contain the rumor diffusion within 
each part. Then a minimum set of key nodes, 
which separate these different parts, is selected 
to be patched. The intuition is that the infected 
nodes in a part need to go through some of 
these key nodes to influence nodes in another 
part. Once these nodes are patched, it is 
impossible for the influence to propagate among different 
parts. 


Applications 


Practical applications can be seen in controlling 
the propagation of computer viruses and worms 
over computer networks, the spread of 
malicious rumors through social networks, the 
diffusion of infections or epidemics (such as swine flu) 
among groups of people, the propagation of mobile 
worms in cellular networks, and so on. 


Open Problems 


There are many interesting directions that deserve 
further exploration. One direction is to improve 
existing influence diffusion models by considering 
continuous-time influence diffusion, users’ 
preferences for different kinds of information, 
factors influencing users’ thresholds in adopting a 
kind of information, etc. Another direction is to 
design efficient strategies to control the spread of 
rumors when only part of a network structure 
is observable. Another research issue is 
incorporating the detection of rumor sources into rumor 
blocking and continuous time delay of protectors. 


Cross-References 
Influence and Profit 


Influence Maximization 


Recommended Reading 


1. Bao Y, Niu Y, Yi C, Xue Y (2014) Effective
immunization strategy for rumor propagation based
on maximum spanning tree. In: Computing,
Networking and Communications (ICNC),
2014 International Conference on, pp 11-15,
DOI:10.1109/ICCNC.2014.6785296

2. Budak C, Agrawal D, El Abbadi A (2011) Limiting
the spread of misinformation in social networks. In:
Srinivasan S, Ramamritham K, Kumar A, Ravindra
MP, Bertino E, Kumar R (eds) WWW, ACM, pp 665-674

3. Fan L, Lu Z, Wu W, Thuraisingham BM, Ma H, Bi Y
(2013) Least cost rumor blocking in social networks.
In: ICDCS, IEEE, pp 540-549

4. He J, Liang H, Yuan H (2011) Controlling infection
by blocking nodes and links simultaneously.
In: Chen N, Elkind E, Koutsoupias E (eds) WINE,
Springer, Lecture Notes in Computer Science, vol
7090, pp 206-217

5. He X, Song G, Chen W, Jiang Q (2012) Influence
blocking maximization in social networks under the
competitive linear threshold model. In: SDM, SIAM
/ Omnipress, pp 463-474

6. Kempe D, Kleinberg JM, Tardos E (2003) Maximizing
the spread of influence through a social network.
In: Getoor L, Senator TE, Domingos P, Faloutsos C
(eds) KDD, ACM, pp 137-146

7. Kimura M, Saito K, Nakano R (2007) Extracting
influential nodes for information diffusion
on a social network. In: AAAI, AAAI Press,
pp 1371-1376

8. Kimura M, Saito K, Motoda H (2008) Minimizing
the spread of contamination by blocking links in a
network. In: Fox D, Gomes CP (eds) AAAI, AAAI
Press, pp 1175-1180

9. Kimura M, Saito K, Motoda H (2008) Solving the
contamination minimization problem on networks for
the linear threshold model. In: Ho TB, Zhou ZH
(eds) PRICAI, Springer, Lecture Notes in Computer
Science, vol 5351, pp 977-984

10. Li S, Zhu Y, Li D, Kim D, Huang H (2013) Rumor
restriction in online social networks. In: IPCCC, IEEE,
pp 1-10

11. Nguyen N, Xuan Y, Thai M (2010) A novel method
for worm containment on dynamic social networks.
In: MILITARY COMMUNICATIONS CONFERENCE,
2010 - MILCOM 2010, pp 2180-2185,
DOI:10.1109/MILCOM.2010.5680488

12. Nguyen NP, Yan G, Thai MT, Eidenbenz S (2012)
Containment of misinformation spread in online social
networks. In: Contractor NS, Uzzi B, Macy MW,
Nejdl W (eds) WebSci, ACM, pp 213-222

13. Tsai J, Nguyen TH, Tambe M (2012) Security games
for controlling contagion. In: Hoffmann J, Selman B
(eds) AAAI, AAAI Press

14. Zhu Z, Cao G, Zhu S, Ranjan S, Nucci A (2009)
A social network based patching scheme for worm
containment in cellular networks. In: INFOCOM,
IEEE, pp 1476-1484


Schedulers for Optimistic Rate Based 
Flow Control 


Panagiota Fatourou 
Department of Computer Science, University of
Ioannina, Ioannina, Greece


Keywords 


Bandwidth allocation; Rate adjustment; Rate al- 
location 


Years and Authors of Summarized 
Original Work 


2005; Fatourou, Mavronicolas, Spirakis 


Problem Definition 


The problem concerns the design of efficient 
rate-based flow control algorithms for virtual- 
circuit communication networks where a con- 
nection is established by allocating a fixed path, 
called session, between the source and the des- 
tination. Rate-based flow-control algorithms re- 
peatedly adjust the transmission rates of different 
sessions in an end-to-end manner with primary 
objectives to optimize the network utilization and 
achieve some kind of fairness in sharing band- 
width between different sessions. 

© Springer Science+Business Media New York 2016
M.-Y. Kao (ed.), Encyclopedia of Algorithms,
DOI 10.1007/978-1-4939-2864-4

A widely-accepted fairness criterion for flow
control is max-min fairness which requires that
the rate of a session can be increased only if this
increase does not cause a decrease to any other
session with smaller or equal rate. Once max-min
fairness has been achieved, no session rate
can be increased any further without violating
the above condition or exceeding the bandwidth
capacity of some link. Call max-min rates the
session rates when max-min fairness has been
reached.
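
To make the definition concrete, the following minimal sketch computes max-min rates with the classical progressive-filling (water-filling) idea. It is an illustration only and is not one of the schedulers analyzed in this entry. The names max_min_rates, paths, and capacity are hypothetical, and every session is assumed to traverse at least one link.

```python
def max_min_rates(paths, capacity):
    """Progressive filling: repeatedly saturate the most constrained link.

    paths:    dict mapping session -> collection of links it traverses (assumed non-empty)
    capacity: dict mapping link -> bandwidth capacity
    Returns a dict mapping session -> max-min fair rate.
    """
    rates = {}
    remaining = dict(capacity)      # capacity not yet assigned on each link
    active = set(paths)             # sessions whose rate is not fixed yet

    while active:
        # Equal share each link could still give to its active sessions.
        share = {}
        for link, cap in remaining.items():
            users = [s for s in active if link in paths[s]]
            if users:
                share[link] = cap / len(users)
        if not share:
            break  # only reachable if some session traverses no link

        # The link with the smallest share is the current bottleneck.
        bottleneck = min(share, key=share.get)
        level = share[bottleneck]

        # Fix the rate of every active session crossing the bottleneck.
        for s in [s for s in active if bottleneck in paths[s]]:
            rates[s] = level
            active.remove(s)
            for link in paths[s]:
                remaining[link] -= level
    return rates


example_paths = {"a": ["e1", "e2"], "b": ["e1"], "c": ["e2"]}
example_capacity = {"e1": 10, "e2": 4}
example_rates = max_min_rates(example_paths, example_capacity)
```

In the example, link e2 is the first bottleneck, so sessions a and c are fixed at the same rate and session b takes what is left on e1.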

Rate-based flow control algorithms perform 
rate adjustments through a sequence of opera- 
tions in a way that the capacities of network links 
are never exceeded. Some of these algorithms, 
called conservative [3, 6, 10, 11, 12], employ 
operations that gradually increase session rates 
until they converge to the max-min rates without 
ever performing any rate decreases. On the other
hand, optimistic algorithms, introduced more recently
by Afek, Mansour, and Ostfeld [1], allow
for decreases, so that a session’s rate may
temporarily be larger than its final max-min
rate.

Optimistic algorithms [1, 7] employ a specific
rate adjustment operation, called update operation
(introduced in [1]). The goal of an update
operation is to achieve fairness among a set of
neighboring sessions and optimize the network
utilization on a local basis. More specifically, an
update operation calculates an increase for the
rate of a particular session (the updated session)
for each link the session traverses. The calculated
increase on a particular link is the maximum
possible that respects the max-min fairness condition
between the sessions traversing the link; that
is, this increase should not cause a decrease to
the rate of any other session traversing the link
with smaller rate than the rate of the updated
session after the increase. Once the maximum
increase on each link has been calculated, the
minimum among them is applied to the session’s
rate (let e be the link for which the minimum
increase has been calculated). This causes the
rates of those sessions traversing e which had
larger rates than the increased rate of the updated
session to decrease to the new rate. Moreover,
the update operation guarantees that all the
capacity of link e is allocated to the sessions
traversing it (so the bandwidth of this link is fully
utilized).
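
The rate calculation described above can be sketched as follows. This is one possible reading of the description, not the authors' implementation: it only computes the new rate of the updated session and the link e that determines it, and leaves the subsequent rate decreases on e to the caller. The names fair_rate_on_link, updated_rate, paths, capacity, and rates are hypothetical, and the current rates are assumed not to exceed any link capacity.

```python
def fair_rate_on_link(capacity, other_rates):
    """Largest x with x + sum(min(r, x) for r in other_rates) <= capacity.

    Sessions with rate below x keep their rate; sessions with a larger
    rate would be cut down to x, so the link ends up fully used.
    """
    remaining = capacity
    contenders = len(other_rates) + 1   # updated session plus the others
    for r in sorted(other_rates):
        if r <= remaining / contenders:
            remaining -= r              # this session keeps its smaller rate
            contenders -= 1
        else:
            break                       # all remaining sessions share equally
    return remaining / contenders


def updated_rate(session, paths, capacity, rates):
    """Return (new_rate, link_e): the minimum per-link rate over the path."""
    best = None
    for link in paths[session]:
        others = [rates[s] for s in paths if s != session and link in paths[s]]
        x = fair_rate_on_link(capacity[link], others)
        if best is None or x < best[0]:
            best = (x, link)
    return best


demo_paths = {"a": ["e1", "e2"], "b": ["e1"], "c": ["e2"]}
demo_capacity = {"e1": 10, "e2": 4}
first = updated_rate("a", demo_paths, demo_capacity, {"a": 0, "b": 0, "c": 0})
later = updated_rate("c", demo_paths, demo_capacity, {"a": 4, "b": 6, "c": 0})
```

In the second call, session c can only reach half of e2's capacity, which implies that session a would have to be decreased to that rate on e2.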

One important performance parameter of 
a rate-based flow control algorithm is its 
locality which is characterized by the amount 
of knowledge the algorithm requires to decide 
which session’s rate to update next. Oblivious 
algorithms do not assume any knowledge of the 
network topology or the current session rates. 
Partially oblivious algorithms have access to 
session rates but they are unaware of the network 
topology, while non-oblivious algorithms require 
full knowledge of both the network topology and 
the session rates. Another crucial performance 
parameter of rate-based flow control algorithms 
is the convergence complexity measured as the 
maximum number of rate-adjustment operations 
performed in any execution until max-min 
fairness is achieved. 


Key Results 


Fatourou, Mavronicolas and Spirakis [7] have 
studied the convergence complexity of optimistic 
rate-based flow control algorithms under 
varying degrees of locality. More specifically, 
they have proved lower and upper bounds 
on the convergence complexity of oblivious, 
partially-oblivious and non-oblivious algorithms. 
These bounds are expressed in terms of
n, the number of sessions laid out on the
network.




Theorem 1 (Lower Bound for Oblivious
Algorithms, Fatourou, Mavronicolas and Spirakis
[7]) Any optimistic, oblivious, rate-based
flow control algorithm requires Ω(n²)
update operations to compute the max-min
rates.


Fatourou, Mavronicolas and Spirakis [7] have
presented algorithm RoundRobin, which applies
update operations to sessions in a round
robin order. Obviously, RoundRobin is oblivious.
It has been proved [7] that the convergence
complexity of RoundRobin is O(n²).
This shows that the lower bound for oblivious
algorithms is tight.


Theorem 2 (Upper Bound for Oblivious
Algorithms, Fatourou, Mavronicolas and Spirakis
[7]) RoundRobin computes the max-min
rates after performing O(n²) update
operations.


RoundRobin belongs to a class of oblivious
algorithms, called Epoch [7]. Each algorithm of
this class repeatedly chooses some permutation
of all session indices and applies update operations
on the sessions in the order determined
by this permutation. This is performed n times.
Clearly, Epoch is a class of oblivious algorithms.
It has been proved [7] that each of the algorithms
in this class has convergence complexity
O(n²).
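
As a small illustration of why these schedulers are oblivious, the sketch below generates only the order in which sessions are updated, using nothing but n. The function names are hypothetical, and picking a random permutation per epoch is just one admissible choice for an Epoch-style scheduler, not something prescribed by [7].

```python
import random


def round_robin_schedule(n):
    """Yield session indices 0..n-1 in round-robin order, for n rounds."""
    for _ in range(n):
        for i in range(n):
            yield i


def epoch_schedule(n, rng=None):
    """Yield n epochs; each epoch is some permutation of all session indices.

    A seeded random permutation is used here purely for illustration.
    """
    rng = rng or random.Random(0)
    for _ in range(n):
        order = list(range(n))
        rng.shuffle(order)
        yield from order


rr = list(round_robin_schedule(3))
ep = list(epoch_schedule(4))
```

Both generators emit exactly n * n session indices, which matches the quadratic number of update operations stated above.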

Another oblivious algorithm, called 
Arbitrary, has been presented in [1]. The 
algorithm works in a very simple way by 
choosing the next session to be updated in an 
arbitrary way, but it requires an exponential 
number of update operations to compute the 
max-min rates. 

Fatourou, Mavronicolas and Spirakis [7] have 
proved that partially-oblivious algorithms do 
not achieve better convergence complexity than 
oblivious algorithms despite the knowledge they 
employ. 


Theorem 3 (Lower Bound for Partially Oblivious
Algorithms, Fatourou, Mavronicolas
and Spirakis [7]) Any optimistic, partially
oblivious, rate-based flow control algorithm
requires Ω(n²) update operations to compute
the max-min rates.


Afek, Mansour and Ostfeld [1] have presented
a partially oblivious algorithm, called
GlobalMin. The algorithm chooses as the
session to update next the one with the minimum
rate among all sessions. The convergence
complexity of GlobalMin is O(n²) [1]. This
shows that the lower bound for partially-oblivious
algorithms is tight.


Theorem 4 (Upper Bound for Partially
Oblivious Algorithms, Afek, Mansour and Ostfeld
[1]) GlobalMin computes the max-min
rates after performing O(n²) update
operations.


Another partially-oblivious algorithm, called
LocalMin, is also presented in [1]. The
algorithm chooses to schedule next a session
which has a minimum rate among all the sessions
that share a link with it. LocalMin has time
complexity O(n²).

Fatourou, Mavronicolas and Spirakis [7] 
have presented a non-oblivious algorithm, 
called Linear, that exhibits linear convergence 
complexity. Linear follows the classical 
idea [3, 12] of selecting as the next updated 
session one of the sessions that traverse the 
most congested link in the network. To discover 
such a session, Linear requires knowledge 
of the network topology and the session 
rates. 


Theorem 5 (Upper Bound for Non-Oblivious
Algorithms, Fatourou, Mavronicolas and Spirakis
[7]) Linear computes the max-min
rates after performing O(n) update operations.


The convergence complexity of Linear is optimal,
since n rate adjustments must be performed
in any execution of an optimistic rate-based flow
control algorithm (assuming that the initial session
rates are zero). However, this comes at a
remarkable cost in locality which makes Linear
impractical.


Applications


Flow control is the dominant technique used 
in most communication networks for prevent- 
ing data traffic congestion when the externally 
injected transmission load is larger than what 
can be handled even with optimal routing. Flow 
control is also used to ensure high network uti- 
lization and fairness among the different con- 
nections. Examples of networking technologies 
where flow control techniques have been ex- 
tensively employed to achieve these goals are 
TCP streams [5] and ATM networks [4]. An 
overview of flow control in practice is provided 
in [3]. 

The idea of controlling the rate of a traffic
source dates back to the data networking
protocols of the ANSI Frame Relay Standard.
Rate-based flow control is considered attractive 
due to its simplicity and its low hardware require- 
ments. It has been chosen by the ATM Forum on 
Traffic Management as the best suited technique 
for the goals of ABR service [4]. 

A substantial amount of research work has
been devoted in the past to conservative flow control
algorithms [3, 6, 10, 11, 12]. The optimistic
framework has been introduced much later by 
Afek et al. [1] as a more suitable approach for real 
dynamic networks where decreases of session 
rates may be necessary (e.g., for accommodat- 
ing the arrival of new sessions). The algorithms 
presented in [7] improve upon the original algo- 
rithms proposed in [1] in terms of either con- 
vergence complexity, or locality, or both. More- 
over, they identify that certain classical schedul- 
ing techniques, such as round-robin [11], or ad- 
justing the rates of sessions traversing one of 
the most congested links [3, 12] can be efficient 
under the optimistic framework. The first general 
lower bounds on the convergence complexity of 
rate-based flow control algorithms are also pre- 
sented in [7]. 

The performance of optimistic algorithms has 
been theoretically analyzed in terms of an ab- 
straction, namely the update operation, which 
has been designed to address most of the intricacies
encountered by rate-based flow control
algorithms. However, the update operation
masks low-level implementation details, while it
may incur non-trivial, local computations on the
switches of the network. Fatourou, Mavronicolas
and Spirakis [9] have studied the impact on
the efficiency of optimistic algorithms of local 
computations required at network switches in 
order to implement the update operation, and 
proposed a distributed scheme that implements 
a broad class of such algorithms. On a different 
avenue, Afek, Mansour and Ostfeld [2] have 
proposed a simple flow control scheme, called 
Phantom, which employs the idea of consider- 
ing an imaginary session on each link [10, 12], 
and they have discussed how Phantom can be 
applied to ATM networks and networks of TCP 
routers. 

A broad class of modern distributed appli- 
cations (e.g., remote video, multimedia confer- 
encing, data visualization, virtual reality, etc.) 
exhibit highly differing bandwidth requirements 
and need some kind of quality of service guar- 
antees. To efficiently support a wide diversity of 
applications sharing available bandwidth, a lot
of research work has been devoted to incorporating
priority schemes into current networking
technologies. Priorities offer a basis for modeling
the diverse resource requirements of modern
distributed applications, and they have been used 
to accommodate the needs of network manage- 
ment policies, traffic levels, or pricing. The first 
efforts for embedding priority issues into max- 
min fair, rate-based flow control were performed 
in [10, 12]. An extension of the classical the- 
ory of max-min fair, rate-based flow control to 
accommodate priorities of different sessions has 
been presented in [8]. (A number of other pa- 
pers addressing similar generalizations of max- 
min fairness to account for priorities and utility 
have been presented after the original publication 
of [8].) 

Many modern applications are not based 
solely on point-to-point communication but 
they rather require multipoint-to-multipoint 
transmissions. A max-min fair rate-based flow 
control algorithm for multicast networks is 
presented in [14]. Max-min fair allocation of 
bandwidth in wireless ad hoc networks is studied
in [15].


Open Problems


The research work on optimistic, rate-based flow 
control algorithms leaves open several interesting 
questions. The convergence complexity of the 
proposed optimistic algorithms has been ana- 
lyzed only for a static set of sessions laid out 
on the network. It would be interesting to eval- 
uate these algorithms under a dynamic network 
setting, and possibly extend the techniques they 
employ to efficiently accommodate arriving and 
departing sessions. 

Although max-min fairness has emerged as 
the most frequently praised fairness criterion for 
flow control algorithms, achieving it might be 
expensive in highly dynamic situations. Afek 
et al. [1] have proposed a modified version 
of the update operation, called approximate 
update, which applies an increase to some 
session only if it is larger than some quantity
δ > 0. An approximate optimistic algorithm
uses the approximate update operation
and terminates if no session rate can be
increased by more than δ. Obviously such an
algorithm does not necessarily reach max-min 
fairness. It has been proved [1] that for some 
network topologies every approximate optimistic 
algorithm may converge to session rates that 
are away from their max-min counterparts 
by an exponential factor. The consideration 
of other versions of update operation or 
different termination conditions might lead to 
better max-min fairness approximations and 
deserves more study; different choices may also 
significantly impact the convergence complexity 
of approximate optimistic algorithms. It would be 
also interesting to derive trade-off results between 
the convergence complexity of such algorithms 
and the distance of the terminating rates they 
achieve to the max-min rates. 

Fairness formulations that naturally approx- 
imate the max-min condition have been pro- 
posed by Kleinberg et al. [13] as suitable fairness 
criteria for certain routing and load balancing ap- 
plications. Studying these formulations under the 
rate-based flow control setting is an interesting 
open problem.


Scheduling in Data Broadcasting


Problem Definition


Wireless data broadcasting means that a set of data
is repeatedly broadcast from a base station to
a large number of wireless and mobile clients.
If a client wants a specific datum, it tunes in to
the broadcasting channel, gets the location
(appearance time) of the datum with the help of
indices, and waits until the datum has been broadcast.
The scheduling problem in data broadcasting
deals with the design of an efficient permutation
strategy for a client to download a required
subset of data from a multichannel broadcasting
system, under both time and energy constraints.
Here the time constraint means that the client wants the
minimum downloading time from when it starts
the query until the moment it has successfully
downloaded every datum, while the energy
constraint means that the client wants the minimum
number of switches among channels to reduce
extra battery consumption. Correspondingly, we
can define the scheduling problem formally as
follows:

A client wants to download a group of $k$
data items $D = \{d_1, d_2, \ldots, d_k\}$, which may
have different sizes. Those data items are broadcast
on $n$ different channels $C = \{c_1, c_2, \ldots, c_n\}$
repeatedly together with many other data items.
Each channel may have a different bandwidth and
broadcast cycle length. Let the time to download
the smallest transmission packet be a unit time;
the length of $d_i$ can then be represented as $l_i$ (also
referred to as its downloading time).

[Figure 1: two channels with broadcast cycles 9 and 6, shown over time slots 1 to 18. Option 1: access latency 7, 3 switches. Option 2: access latency 12, 1 switch.]

Scheduling in Data Broadcasting, Fig. 1 Example of possible objective contradiction

Assume the client knows the locations (channel
id and time offsets) of the required data set
beforehand at the starting time $t = 0$ (with the
help of indices, which is beyond the scope of
our problem). The goal is to download the
$k$ known data items from $n$ channels efficiently,
with minimum downloading time (which we also refer
to as access latency) and a minimum number of
switches.

Unfortunately, the two objectives in this problem
conflict with each other. Figure 1 is an
example to illustrate this phenomenon. In Fig. 1,
there are two channels broadcasting 15 data items
repeatedly. Suppose the gray data items {1, 2, 3, 4}
are requested. The starting point of the client’s
retrieving process is at $t = 0$. If we want to
minimize the access latency, the request should
be retrieved in the order “3 → 1 → 4 →
2”, which takes only 7 time units but needs 3
switches (shown as Option 1 in Fig. 1). However,
if we want to minimize the switches, the best
retrieving order should be “3 → 4 → 2 → 1”,
which needs only 1 switch but takes 12 time units
(shown as Option 2 in Fig. 1). This example shows
that access latency and number of switches
cannot be minimized at the same time; they are
contradictory factors.


As a consequence, we want to fix one factor
and minimize the other, and thus have the
following objective:


Objective

We hope to design a data downloading order for
a client to download $k$ data items from $n$ broadcasting
channels, such that the access latency $t$
is minimized if we restrict the number of switches
among channels (denoted as $h$); otherwise, we
minimize the number of switches $h$ once the
access latency $t$ is bounded.


Constraints 


1. Switch Constraint: Note that if a client is
downloading a data item from channel $c_i$ at time
$t_0$, then it cannot switch to channel $c_j$, where
$j \neq i$, to download another data item at time
$t_0 + 1$ due to connection protocols. Thus, we
assume that if a client wants to download data from
another channel, it needs at least one time
unit for channel switching. Figure 2 gives a
typical process of data retrieval in multichannel
broadcast environments. The query data
set is $\{d_1, d_3, d_5\}$, and a user can download
data objects $d_1$ and $d_3$ from channel $c_1$ and
then switch to channel $c_3$ at time $t = 6$
to download data object $d_5$ at time $t = 7$.
However, after time $t = 5$, the user cannot
switch from channel $c_1$ to $c_2$ to download data
$d_5$ at time $t = 6$. From Fig. 2, we also get that
the bandwidths of different channels are not
necessarily the same. Actually, the bandwidth
of channel $c_2$ is twice that of $c_1$ or $c_3$; thus
$d_3$ or $d_5$, which take two time slots on $c_1$ or
$c_3$, can be broadcast in one time unit by $c_2$.

2. Objective Constraint: We have to set up a
reasonable threshold for the latency constraint $t$
and the switch constraint $h$, such that we can
achieve a feasible solution for the corresponding
minimized switches and shortest access
latency.


Problem 1 (Scheduling in Data 
Broadcasting) 


INPUT: The required data subset
$D = \{d_1, d_2, \ldots, d_k\}$ broadcast on $n$ different
channels with their locations and downloading
times $l_i$, and a switch constraint $h$ or latency
constraint $t$.

OUTPUT: A permutation of D such that if start- 
ing from time slot zero, a client would achieve 
the shortest access latency (with switch thresh- 
old h) or the minimum switch numbers (with la- 
tency threshold t) if it follows this permutation 
to download each data item sequentially with 
switch constraint. 
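
To illustrate how a candidate permutation is judged, the sketch below computes the access latency and the number of switches of a given download order. The time model is an assumption made for this example only: each item occupies the half-open slot range [offset, offset + length) in every cycle of its channel, the first tune-in is not counted as a switch, and changing channels costs one idle time unit. The names next_start, evaluate_order, and items are hypothetical.

```python
def next_start(offset, cycle, earliest):
    """Smallest start time offset + m * cycle (m >= 0) that is >= earliest."""
    if earliest <= offset:
        return offset
    rounds = -(-(earliest - offset) // cycle)   # ceiling division
    return offset + rounds * cycle


def evaluate_order(order, items):
    """Return (access_latency, switches) for downloading items in this order.

    items: dict item -> (channel, offset, length, cycle_length)
    """
    time, channel, switches = 0, None, 0
    for d in order:
        ch, offset, length, cycle = items[d]
        earliest = time
        if channel is not None and ch != channel:
            switches += 1
            earliest = time + 1      # one time unit is lost while switching
        start = next_start(offset, cycle, earliest)
        time = start + length        # the client is busy until the item ends
        channel = ch
    return time, switches


toy_items = {
    "a": (1, 0, 2, 6),   # channel 1, starts at slot 0, length 2, cycle 6
    "b": (2, 2, 1, 4),   # channel 2, starts at slot 2, length 1, cycle 4
    "c": (1, 3, 1, 6),   # channel 1, starts at slot 3, length 1, cycle 6
}
order_abc = evaluate_order(["a", "b", "c"], toy_items)
order_acb = evaluate_order(["a", "c", "b"], toy_items)
```

On this toy instance the second order is better on both counts, but in general the two measures pull in different directions, which is why one of them is bounded while the other is minimized.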


Key Results 


Scheduling is an important part of the wireless
data broadcast system. Researchers tend to divide
the scheduling problem into two subproblems.
The first one is the data allocation problem on
the server side, while the other one is the data
retrieval problem on the client side.

With respect to server side scheduling, several 
works have been proposed to improve the sys- 
tem performance [1-5]. Acharya et al. [1] first 
dealt with the data allocation problem for the single-channel
environment. They proposed a scheduling
algorithm that considers data access frequencies
and allows frequently accessed data to be broadcast
more often. Most later works concern the multichannel
environment. For data sets with uniform
length, Yee et al. [2] proposed an O(t?m) time- 
complexity dynamic programming algorithm to 
find the optimal schedule and also a near optimal 
greedy algorithm to reduce the time complexity.
For the nonuniform-length case, Ardizzoni et
al. [3] proved that this problem is strongly NP-hard.
Ardizzoni et al. [3], Anticaglia et al. [4],
and Kenyon et al. [5] designed algorithms based
on greedy and heuristic strategies.

Although most of the literature discusses the data
allocation problem from the server’s point of view,
several works [6-10] considered the data retrieval
scheduling problem from the client’s point of
view. Shi et al. [6] defined the data retrieval
problem in MIMO environment as parallel
data retrieval scheduling with MIMO antennae
(PADRS-MIMO) and proposed two greedy
heuristics to guarantee minimum switchings
among channels or reduce the downloading
time when the number of antennae in the
mobile devices is limited. Lu et al. [7, 8]
defined the largest number data retrieval (LNDR)
and maximum cost data retrieval (MCDR)
problems and considered the hopping cost.
They also proved that when the hopping cost
cannot be ignored, LNDR is NP-hard and
designed a 1/2-approximation algorithm. Gao
et al. [9, 10] designed a randomized algebraic
algorithm that takes both energy cost and access
time into consideration to schedule the data
retrieval process in multichannel environments.
The proposed algorithm can detect whether a
given data retrieval problem has a solution with
access time $t$ and number of switchings $h$ in
$O(2^k (hnt)^{O(1)})$ time, where $n$ is the number of
channels and $k$ is the number of requested data
items.


Hardness Analysis


Define a tuple $s = (i_s, j_s, t_s, t'_s)$ to denote the
datum $d_{i_s}$, which can be downloaded from
channel $c_{j_s}$ during the time span $[t_s, t'_s]$. Then
it is clear that a valid data retrieval schedule
is a sequence of $k$ intervals $s_1, s_2, \ldots, s_k$, where
each tuple corresponds to a distinct data item
in $D$, and there are no conflicts between
any two of the $k$ tuples. To analyze the NP-hardness,
we then define the decision problem of
MCDR.


Definition 1 (Decision MCDR) Given a data
set $D$, a channel set $C$, a time threshold $t$,
and a switching threshold $h$, find a valid data
retrieval schedule to download all the data
in $D$ from $C$ before time $t$ with at most $h$
switchings.


Theorem 1 MCDR problem is NP-hard. 


Proof We use $VC \le_p MCDR$ to prove this
theorem. Here VC is the decision problem of
vertex cover: given a graph $G = (V, E)$,
we want to find a minimum size vertex subset
$VC \subseteq V$ such that for any edge $(v_i, v_j) \in E$,
either $v_i \in VC$ or $v_j \in VC$. An instance of
vertex cover is: given a graph $G = (V, E)$ and an
integer $k$, does it have a vertex cover $VC$ of
size $k$? Then we construct an instance of MCDR
from $G$ and $k$ as follows:


1. For each vertex $v_i \in V$, define a channel $v_i$.
Define another $k$ channels $b_1, \ldots, b_k$. Then
the channel set is $C = \{v_1, \ldots, v_{|V|}, b_1, \ldots, b_k\}$,
for a total of $|V| + k$ channels. Let $\delta$ be the
maximum vertex degree in $G$; then each
channel has broadcast cycle length $\delta + 3$.

2. For each edge $(v_i, v_j) \in E$, define a unit
length data item $e_{ij}$ in data set $D_e$ and append
it on channels $v_i$ and $v_j$ (the order can
be arbitrary, starting from the third time
unit).

3. For each channel $b_i$, define a unit length data
item $d_i$ in data set $D_b$ and allocate it on the
first time unit of channel $b_i$.

4. The data set is $D = D_e \cup D_b$.


Figure 3 is an example to show how to construct
the broadcast system. In this figure, $\delta = 3$,
$k = 2$, $|V| = 4$; thus, the channel set should
be $\{v_1, v_2, v_3, v_4, b_1, b_2\}$, each having broadcast
cycle length $\delta + 3 = 6$. Each $e_{ij}$ represents an edge
$(v_i, v_j)$, and it is clear that if we download all
data items from channel $v_i$, then we
cover the edges incident to node $v_i$.

Next, we prove that $G$ has a vertex cover of
size $k$ if and only if there is a valid data retrieval
schedule $S$ such that $t = k(\delta + 3)$ and $h = 2k - 1$.


$\Rightarrow$: If $G$ has a vertex cover $VC$ of size
$k$, then we can select the corresponding $k$
channels in $\{v_j \mid v_j \in VC\}$ to receive all the
data in $k$ cycles. At the beginning of the $i$th cycle
(iteration), the client will visit $b_i$ at $t = 1$,
hop to some channel $v_j \in VC$, stay on this
channel till the last time unit of the broadcast
cycle, and then hop to $b_{i+1}$. There are $k$ channels
$b_i$, so in each iteration the client will download one
of the items $d_i$. $VC$ is a vertex cover, so by following all
$v_j \in VC$ we must download every $e_{ij}$. The
length of each broadcast cycle is $\delta + 3$, so
the total access latency is $k(\delta + 3)$. In each
broadcast cycle, the client will switch twice
(except the last cycle), so $h = 2k - 1$.

$\Leftarrow$: Assume MCDR has a valid schedule $S$
with $t = k(\delta + 3)$ and $h = 2k - 1$. Let us
consider $D_b$ first. There are $k$ items $d_i$ located at
the first time unit on $k$ different channels. It
means we need at least $k - 1$ hops to
download $D_b$, and then we only have another
$k$ hops for $D_e$, which means we can visit
at most $k$ channels in $\{v_i\}$. At the beginning
of each broadcast cycle, we always stay at
some channel $b_i$ to download $d_i$, and then
we switch to some $v_j$, and at the end of this
cycle, we have to switch to channel $b_{i+1}$ for
$d_{i+1}$. This means we cannot switch to two
vertex channels within one broadcast cycle,
otherwise we cannot download $D = D_e \cup D_b$
in $k$ iterations. Since $S$ is valid, we visit $k$
vertex channels and download all $D_e$ data
items, which means these $k$ vertices form a vertex
cover of size $k$.


This reduction can be done in polynomial 
time, and we can conclude that MCDR is NP- 
hard. 
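
The construction used in the proof can be sketched in a few lines. This is only an illustration of the four construction steps and the two thresholds; the function name build_mcdr_instance and the dictionary-based representation of channels (slot number to item) are hypothetical choices made for this example.

```python
def build_mcdr_instance(vertices, edges, k):
    """Build the broadcast system of the reduction from vertex cover.

    Returns (channels, cycle_length, t, h), where channels maps a channel
    name to a dict {time slot within the cycle: data item}.
    """
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    delta = max(degree.values(), default=0)
    cycle = delta + 3                       # broadcast cycle length

    # Step 1: one channel per vertex (the k channels b_i are added below).
    channels = {("v", v): {} for v in vertices}

    # Step 2: each edge item is appended on both endpoint channels,
    # starting from the third time unit.
    next_slot = {v: 3 for v in vertices}
    for u, v in edges:
        for w in (u, v):
            channels[("v", w)][next_slot[w]] = ("e", u, v)
            next_slot[w] += 1

    # Step 3: each channel b_i carries item d_i in its first time unit.
    for i in range(1, k + 1):
        channels[("b", i)] = {1: ("d", i)}

    # Thresholds used in the proof: t = k(delta + 3), h = 2k - 1.
    return channels, cycle, k * cycle, 2 * k - 1


demo = build_mcdr_instance([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4), (2, 3)], 2)
```

For the demo graph the maximum degree is 3, so every channel has cycle length 6, and the thresholds are t = 12 and h = 3.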


Randomized Algebraic Algorithm 


To solve the above decision problem, we developed
a randomized algebraic algorithm. It can
detect if a given problem has a schedule to download
all the requested data before time $t$ and with
at most $h$ channel switchings in $O(2^k (nht)^{O(1)})$
time, where $n$ is the number of channels and $k$
is the number of required data items. We also
provide a fixed parameter tractable (FPT) algorithm
with computational time $O(2^l (nht)^{O(1)})$.
It can determine whether there is a schedule
to download $l$ data items from $D$ in at most $n$
time slots and at most $h$ channel switches. Service
providers can adjust $n$ and $h$ freely to fit their own
requirements. We first give some preliminaries
and then present our algorithms in detail.

Preliminaries

Here we introduce some notions about group 
algebra which are not often used in algorithm 
design. 


Definition 2 Assume that $x_1, \ldots, x_k$ are variables
in group algebra. Then,

1. A monomial has format $x_1^{a_1} x_2^{a_2} \cdots x_k^{a_k}$.

2. A multilinear monomial is a monomial such
that each variable has degree exactly one. For
example, $x_3 x_5 x_6$ is a multilinear monomial,
but a monomial containing a squared variable,
such as $x_3 x_5^2 x_6$, is not.

3. For a polynomial $p(x_1, \ldots, x_k)$, its sum of
product expansion is $\sum_j p_j(x_1, \ldots, x_k)$,
where each $p_j$ is a monomial, which has the
format $c_j x_1^{a_1} \cdots x_k^{a_k}$ with $c_j$ as its
coefficient.

4. $Z_2 = (\{0, 1\}, +, \cdot)$ is a field with two elements
$\{0, 1\}$ and two operations $+$ and $\cdot$. The
addition operation is modulo 2.

5. $Z_2^k$ is the group of binary $k$-vectors. Let $w_0$
denote the all-zero vector, which is the identity
of $Z_2^k$; then for every $v \in Z_2^k$, $v^2 = w_0$ and
$v \cdot w_0 = v$.


The operations between elements in the group
algebra are standard.


Algorithm Description 

The basic idea of our algebraic algorithm is that
for each item $d_i \in D$, where $D$ is the query
data set, we create a variable $x_i$ to represent it.
Therefore, given $D = \{d_1, d_2, \ldots, d_k\}$, we construct
a variable set $X = \{x_1, x_2, \ldots, x_k\}$. We
then design a circuit $H$ such that a schedule
without conflict will be generated by a multilinear
monomial in the sum of product expansion of the
circuit. The existence of schedules to download
all the data items in $D$ from the multiple channel
set $C$ is converted into the existence of multilinear
monomials of $H$. We then replace each variable
by a specified binary vector, which removes all
of the non-multilinear monomials by converting
them to zero. Thus, the data retrieval problem is
transformed into testing if a multivariate polynomial
is zero. It is well known that randomized
algorithms can be used to check if a circuit is
identical to zero in polynomial time. Thus, we
have the following statements.


Lemma 1 There is a polynomial time algorithm
such that given a channel $c_i$, a time interval
$[t_1, t_2]$, and an integer $m$, it constructs a circuit
of a polynomial $P_{i,t_1,t_2,m}$ such that for any subset
$D' = \{d_{i_1}, \ldots, d_{i_m}\} \subseteq D$ which has size
$m$ and is downloadable in the time interval
$[t_1, t_2]$ from channel $c_i$, the product expansion
of $P_{i,t_1,t_2,m}$ contains a multilinear monomial
$x_{i_1} x_{i_2} \cdots x_{i_m}$.

Proof We can compute the circuit $P_{i,t_1,t_2,m}$
recursively in polynomial time.

1. $P_{i,t_1,t_2,0} = 0$.

2. $P_{i,t_1,t_2,1} = \sum_j x_j$, where $x_j \in X$ and the
corresponding data item $d_j$ lies entirely in the time
interval $[t_1, t_2]$ of channel $c_i$.

3. $P_{i,t_1,t_2,l+1} = \sum_j x_j \cdot P_{i,t_1,t',l} + P_{i,t_1,t_2-1,l+1}$,
where $d_j$ starts at time $t' + 1$ and ends before time $t_2$
on channel $c_i$.

When computing $P_{i,t_1,t_2,l+1}$, the term in which $x_j$
multiplies $P_{i,t_1,t',l}$ covers the case that $d_j$ is
downloadable from time $t' + 1$ to $t_2$ in the final
phase and the other $l$ data items are downloadable
before time $t'$. The term $P_{i,t_1,t_2-1,l+1}$ covers the
case that $l + 1$ items are downloaded before
time $t_2$. Note that the parameter $m$ in $P_{i,t_1,t_2,m}$
controls the total number of data items to be
downloaded.


Definition 3 A subset of data items $D' =
\{d_{i_1}, \ldots, d_{i_m}\} \subseteq D$ is $(i, t, h)$-downloadable
if we can download all data items in $D'$ before
time $t$, the total number of channel switches is
at most $h$, and the last downloaded item is from
channel $c_i$.


Lemma 2 Given two integers $t$ and $h$, there is a
polynomial time algorithm to construct a circuit
of a polynomial $F_{i,t,h,m}$ such that for any $(i, t, h)$-downloadable
subset $D' = \{d_{i_1}, \ldots, d_{i_m}\} \subseteq D$,
the product expansion of $F_{i,t,h,m}$ contains a
multilinear monomial $(x_{i_1} \cdots x_{i_m}) Y$, where $Y$
is a multilinear monomial which does not include
any variable in $X$.


Proof We again construct the circuit recursively.
Some additional variables are
used as needed. Without loss of generality,
we assume the data retrieval process starts at
time 0.

1. $F_{i,t,0,0} = 0$.

2. $F_{i,t,0,1} = P_{i,1,t,1} \cdot y_{i,t,0,1}$.

3. $F_{i,t,h'+1,m'+1} = y_{i,t,h'+1,m'+1,0} \cdot \bigl( \sum_{t' < t} F_{i,t',h'+1,m'} \cdot P_{i,t'+1,t,1} \bigr)
+ y_{i,t,h'+1,m'+1,1} \cdot \bigl( \sum_{j \ne i} \sum_{t' < t} F_{j,t'-1,h',m'} \cdot P_{i,t'+1,t,1} \bigr)$.

Then we can get Lemma 2 immediately.

The computation of $F_{i,t,h'+1,m'+1}$ is based
on two cases, and we use two variables,
$y_{i,t,h'+1,m'+1,0}$ and $y_{i,t,h'+1,m'+1,1}$, to mark
them respectively. We now present an algorithm
that involves one layer of randomization to
determine if there is a schedule to download
all the data items in $D$ before time $t$ and with at
most $h$ channel switchings.


Theorem 2 There is an $O(2^k (hnt)^{O(1)})$ time
randomized algorithm to determine if there is a
schedule to download $|D| = k$ data items
before time $t$ such that the number of channel switches
is at most $h$, where $n$ is the total number of
channels.


Proof By Lemma 2, we can construct a circuit
$H_{t,h,k} = \sum_i F_{i,t,h,k}$ in polynomial time. It is
easy to see that there is a schedule for downloading
the $k$ data items before time $t$ and with $h$
channel switches if and only if the sum of product
expansion of $H_{t,h,k}$ has a multilinear monomial
$(x_1 \cdots x_k) Y$.

We can replace each $x_i$ by a vector $w_i = w_0 + v_i$,
where $w_0$ is the all-zero vector of dimension
$k$ and $v_i$ is a binary vector of dimension $k$ with
its $i$th element being 1 and all other elements
being 0. Assume $k = 3$; we define the following
operations:

$$v_a \cdot v_b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \cdot \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} (a_1 + b_1) \bmod 2 \\ (a_2 + b_2) \bmod 2 \\ (a_3 + b_3) \bmod 2 \end{pmatrix} \qquad (1)$$

$$(v_a + v_b) \cdot v_c = v_a \cdot v_c + v_b \cdot v_c \qquad (2)$$

By Eqs. 1 and 2, for any $k$-dimensional binary
vector $v$ and $w = w_0 + v$, we have $w^2 = w_0^2 +
2 w_0 \cdot v + v^2 = w_0 + 2(w_0 \cdot v) + w_0 =
2(w_0 \cdot v) + 2 w_0 = 0$, because the coefficients
are in the field $G_2$. The replacement
$x_i = w_i$ $(i = 1, \ldots, k)$ makes all the non-multilinear
monomials become zero. Meanwhile,
all the multilinear monomials remain nonzero.
Hence, it is clear that there is a schedule to
download all the data items in $D$ before time $t$
and with at most $h$ channel switchings if and only
if $H_{t,h,k}|_{x_i = w_i (i = 1, \ldots, k)}$ is a nonzero polynomial.
The variables in $Y$ make it impossible to have
cancellation when adding two identical multilinear
monomials, which can be generated from
different paths with variables in $\{x_1, \ldots, x_k\}$. It
is well known that randomized algorithms can be
used to check if a circuit is identical to zero in
polynomial time [11, 12].

The algorithm generates fewer than $2^k$
terms during the computing process since
there are at most $2^k$ distinct binary vectors.
Therefore, the computational time is
$O(2^k (nht)^{O(1)})$.


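Example Let $H_1 = x_1 x_2 y_1 + x_2^2 y_2$ and $H_2 =
x_1^2 y_1 + x_2^2 y_2$. Consider the replacement $x_1 =
w_1 = (0,0)^T + (1,0)^T$ and $x_2 = w_2 = (0,0)^T + (0,1)^T$. We
have the following steps of operations.

$$H_1|_{x_1 = w_1, x_2 = w_2}
= \bigl((0,0)^T + (1,0)^T\bigr) \bigl((0,0)^T + (0,1)^T\bigr) y_1 + \bigl((0,0)^T + (0,1)^T\bigr)^2 y_2$$
$$= \bigl((0,0)^T + (0,1)^T + (1,0)^T + (1,1)^T\bigr) y_1 + \bigl(2 (0,0)^T + 2 (0,1)^T\bigr) y_2$$
$$= \bigl((0,0)^T + (0,1)^T + (1,0)^T + (1,1)^T\bigr) y_1 \ne 0$$

$$H_2|_{x_1 = w_1, x_2 = w_2}
= \bigl((0,0)^T + (1,0)^T\bigr)^2 y_1 + \bigl((0,0)^T + (0,1)^T\bigr)^2 y_2$$
$$= \bigl(2 (0,0)^T + 2 (1,0)^T\bigr) y_1 + \bigl(2 (0,0)^T + 2 (0,1)^T\bigr) y_2$$
$$= 0$$

$H_1$ is a polynomial that contains a multilinear
monomial. It becomes nonzero after the replacement.
$H_2$ is a polynomial without multilinear
monomials. It becomes zero after the replacement.
If we just download a subset of $l$ data items
in set $D$, we have the following theorem that
involves two layers of randomization.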
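The example can be replayed in code. This is a sketch under stated assumptions: group-algebra elements are frozensets of bitmask vectors with coefficients modulo 2, a polynomial is a list of terms, each term pairing a tuple of $x$-indices with the name of its $y$ variable, and the $y$ variables are kept symbolic. The helpers `gmul` and `evaluate` are hypothetical names introduced only for this illustration.

```python
from itertools import product

def gmul(a, b):
    """Product in the group algebra over Z_2^k with coefficients modulo 2."""
    out = set()
    for u, v in product(a, b):
        out ^= {u ^ v}  # XOR is the group operation; duplicates cancel
    return frozenset(out)

def evaluate(poly, subst):
    """Substitute x_i = subst[i] and collect the coefficient of each y variable.

    poly  : list of (tuple_of_x_indices, y_name) terms
    subst : dict mapping x index -> group-algebra element
    Returns only the y variables whose coefficient is nonzero.
    """
    coeffs = {}
    for xs, y in poly:
        term = frozenset({0})  # start from the identity w_0
        for i in xs:
            term = gmul(term, subst[i])
        coeffs[y] = coeffs.get(y, frozenset()) ^ term  # add modulo 2
    return {y: c for y, c in coeffs.items() if c}

w1 = frozenset({0b00, 0b10})  # w_0 + (1,0)^T
w2 = frozenset({0b00, 0b01})  # w_0 + (0,1)^T
subst = {1: w1, 2: w2}

H1 = [((1, 2), "y1"), ((2, 2), "y2")]  # x1*x2*y1 + x2^2*y2
H2 = [((1, 1), "y1"), ((2, 2), "y2")]  # x1^2*y1 + x2^2*y2

print(evaluate(H1, subst))  # only the y1 coefficient survives
print(evaluate(H2, subst))  # every coefficient vanishes
```

Keeping one coefficient per $y$ monomial mirrors the role of $Y$ in the proof: terms that differ in their $y$ part can never cancel each other.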


Theorem 3 There is an $O(2^l (hnt)^{O(1)})$ time
randomized algorithm to determine if there is
a schedule to download $l$ data items from $D$
in at most $t$ time units and with at most $h$ channel
switches.
Proof By Lemma 2, we can construct a polynomial
$H_{t,h,l} = \sum_i F_{i,t,h,l}$ in polynomial time.
Replace each $x_i$ with a vector $w_i = w_0 + v_i$,
where $w_0$ is the all-zero vector of dimension $l$ and
$v_i$ is a random distinct vector of dimension $l$. The
replacement $x_i = w_i$ $(i = 1, \ldots, k)$ makes all
monomials whose $x$ part is non-multilinear
become zero.

Therefore, there is a schedule to download
$l$ data items of $D$ before time $t$ and with the
number of switches no more than $h$ if and only if
$H_{t,h,l}|_{x_i = w_i (i = 1, \ldots, k)}$ is not a zero polynomial
in the field $G_2$. Assume that the product
expansion of $H_{t,h,l}$ has a multilinear monomial
$(x_{i_1} \cdots x_{i_l}) Y$, where $Y$ is a multilinear
monomial with variables not in $x_1, \ldots, x_k$.
For a series of randomly assigned vectors with
dimension $l$: $v_{i_1}, \ldots, v_{i_l}$, the probability that
$v_{i_j}$ is a linear combination of $v_{i_1}, \ldots, v_{i_{j-1}}$
is at most $2^{j-1} / 2^l$, since $j - 1$ vectors span at most
$2^{j-1}$ of the $2^l$ vectors. Hence the vectors are
linearly independent with probability
$\prod_{j=1}^{l} (1 - 2^{j-1}/2^l) > 1/4$; that is, with
probability at most $3/4$, $v_{i_j}$ is a linear
combination of $v_{i_1}, \ldots, v_{i_{j-1}}$ for some $j \le l$.

When $v_{i_1}, \ldots, v_{i_l}$ are linearly independent,
the product of $v_{i_1}, \ldots, v_{i_l}$ is nonzero. Every
multilinear monomial in the product expansion
has different variables to form $Y$ since it is
determined by a unique path to generate the
polynomial. Therefore, for those random vectors
$v_i$, every multilinear monomial has a chance of at
least $1 - 3/4 = 1/4$ to be nonzero. Therefore, if there
is a solution, $H_{t,h,l}|_{x_i = w_i (i = 1, \ldots, k)}$ with a random
assignment of $Y$ is not zero in the field $G_2$ with
probability at least $1/4$.

After the replacements, it generates fewer than
$2^l$ terms since there are at most $2^l$ different
vectors in the group $Z_2^l$. The coefficient
of each vector is kept as a polynomial size
circuit. Therefore, the computational time of
our algorithm is $O(2^l (hnt)^{O(1)})$, and if we run
it 30 times, the error rate is $(3/4)^{30} < 0.0002$.
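The two numerical claims in this proof can be checked with exact arithmetic. The sketch below assumes the $l$ vectors are drawn independently and uniformly from $Z_2^l$, which is a modeling assumption made here for illustration. The function names are hypothetical.

```python
from fractions import Fraction

def independence_probability(l):
    """Probability that l independent uniform vectors in Z_2^l are linearly independent.

    The j-th vector must avoid the span of the previous j-1 vectors,
    which contains 2^(j-1) of the 2^l possible vectors.
    """
    p = Fraction(1)
    for j in range(1, l + 1):
        p *= 1 - Fraction(2 ** (j - 1), 2 ** l)
    return p

def failure_after(runs, success=Fraction(1, 4)):
    """Probability that every one of `runs` independent trials fails."""
    return (1 - success) ** runs

for l in (1, 2, 3, 10, 30):
    print(l, float(independence_probability(l)))

print(float(failure_after(30)))  # error rate after 30 repetitions
```

The product decreases with $l$ but stays above $1/4$, which is why a per-run success probability of $1/4$ is a safe bound, and 30 independent repetitions push the failure probability below 0.0002.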


Applications 


The scheduling problem is one of the most fundamental
problems in combinatorial optimization
and can model various real-world
applications. In particular, scheduling at the
client side is useful for the data
retrieval problem in wireless data broadcasting
or data streaming environments, where it reduces energy
consumption and improves query efficiency. It
is also helpful for parallel query
applications in distributed storage systems.


Open Problems 


Downloading data items efficiently in a
wireless data broadcast environment can usually
be formulated as an NP-hard problem with
different constraints. These problems fall
into two categories: single channel processes and
multiple channel processes. The best known
results for the former are constant-factor
approximations, while currently there is no
polynomial time approximation scheme (PTAS)
for either of them. Results for this problem are
also helpful for the parallel data retrieval problem in
distributed data storage systems and cloud systems.


Experimental Results 


Many papers report experimental results for
the scheduling problem in data broadcasting. Shi et
al. [6] simulated a base station with $n$ broadcast
channels and 10,000 items, each of size 1 KB, and
multiple clients with various requests of data. The
access probability of the database follows the Zipf
distribution, $n$ varies from 5 to 30, the number
of antennae varies from 1 to 10, and the size of
a request varies from 10 to 1,000. For each experiment,
they generated 100 requests to get the
average access latency and number of switchings
during data retrieval. Lu et al. [7, 8] constructed
two types of broadcast programs: special data
broadcast without channel switching time (SDB)
and general data broadcast with channel switching
time (GDB). In both types of programs, they
simulated a base station with $n$ broadcast channels;
the bandwidth of each channel is 1 Mbit/sec.
The database to be broadcast has $N$ data items,
each of size 512 bytes. The time duration is
denoted by $t$. The data items of the query data set $D$
are generated with access probabilities following
the Zipf distribution.


URLs to Code and Data Sets 


Shi et al. [6] provided a program for users
to test parameter settings for their own data sets
and available channels
(http://theory.utdallas.edu/dataengineering).


Cross-References 


Efficient Polynomial Time Approximation 
Scheme for Scheduling Jobs on Uniform 
Processors 


Recommended Reading 


1. Acharya S, Alonso R, Franklin M, Zdonik S (1995) 
Broadcast disks: data management for asymmetric 
communication environments. In: The ACM special 
interest group on management of data conference 
(SIGMOD), San Jose, 22-25 May 1995, pp 199-210 

2. Yee W, Navathe S, Omiecinski E, Jermaine C 
(2002) Efficient data allocation over multiple chan- 
nels at broadcast servers. IEEE Trans Comput 
51(10):1231-1236 

3. Ardizzoni E, Bertossi A, Pinotti M, Ramaprasad S, 
Rizzi R, Shashanka M (2005) Optimal skewed data 
allocation on multiple channels with flat broadcast 
per channel. IEEE Trans Comput 54(5):558-572 

4. Anticaglia S, Barsi F, Bertossi A, Iamele L, Pinotti M 
(2008) Efficient heuristics for data broadcasting on 
multiple channels. Wirel Netw 14(2):219-231 

5. Kenyon C, Schabanel N (1999) The data broadcast 
problem with non-uniform transmission times. In: 
Proceedings of the tenth annual ACM-SIAM sym- 
posium on discrete algorithms (SODA), Baltimore, 
17-19 Jan 1999, pp 547-556 

6. Shi Y, Gao X, Zhong J, Wu W (2010) Efficient 
parallel data retrieval protocols with MIMO antennae 
for data broadcast in 4G wireless communications. 
In: The 21st international conference on database 
and expert systems applications (DEXA), Bilbao, 30
Aug-3 Sept 2010, pp 80-95

7. Lu Z, Shi Y, Wu W, Fu B (2012) Efficient data 
retrieval scheduling for multi-channel wireless data 
broadcast. In: International conference on computer 
communications (INFOCOM), Orlando, 25-30 Mar 
2012, pp 891-899 

8. Lu Z, Shi Y, Wu W, Fu B (2014) Data retrieval
scheduling for multi-item requests in multi-channel
wireless broadcast environments. IEEE Trans Mobile
Comput 13(4):752-765




9. Gao X, Lu Z, Wu W, Fu B (2011) Algebraic algorithm 
for scheduling data retrieval in multi-channel wireless 
data broadcast environments. In: The 6th annual in- 
ternational conference on combinatorial optimization 
and applications (COCOA), Zhangjiajie, 4-6 Aug 
2011, pp 74-81 

10. Gao X, Lu Z, Wu W, Fu B (2013) Algebraic data 
retrieval algorithms for multi-channel wireless data 
broadcast. Theor Comput Sci 497:123-130 

11. Williams R (2009) Finding paths of length k in
O*(2^k) time. Inf Process Lett 109(6):315-318

12. Koutis I (2008) Faster algebraic algorithms for path 
and packing problems. In: The 2008 international col- 
loquium on automata, languages and programming 
(ICALP), Reykjavik, 6-13 July 2008, pp 575-586


Scheduling with a Reordering Buffer 


Matthias Englert¹ and Matthias Westermann²
¹Department of Computer Science, University of
Warwick, Coventry, UK
²Department of Computer Science, TU
Dortmund University, Dortmund, Germany


Keywords

Machine scheduling; Minimum makespan
scheduling; Online algorithms; Reordering
buffer; Sorting buffer


Years and Authors of Summarized 
Original Work 


2002; Räcke, Sohler, Westermann
2009; Gamzu, Segev
2010; Englert, Räcke, Westermann
2011; Adamaszek, Czumaj, Englert, Räcke
2011; Dósa, Epstein
2013; Avigdor-Elgrabli, Rabani
2014; Englert, Özmen, Westermann


Problem Definition

The problem known as the reordering buffer
problem or as the sorting buffer problem is
concerned with sorting a sequence of colored
items according to their color using a limited
size buffer. More precisely, the items are to be
processed and arrive one by one. Arriving items
must first be placed into a buffer which can hold
up to $k$ items. Once the buffer is completely
filled, an algorithm has to free space by selecting
one of the items in the buffer for processing and
removing that item from the buffer. After items
stop arriving, the remaining items in the buffer
may be processed in any order. Whenever an item
is processed that has a different color than the
item processed in the step before, this generates
a cost of 1. The objective is to minimize the total
cost.
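To make the cost model concrete, here is a small sketch that scores a processing order and checks whether the order is achievable with a buffer of size $k$. It assumes 0-based positions and that the first processed item incurs no cost, which is one of the conventions found in the literature. The function names are hypothetical.

```python
def switching_cost(colors):
    """Number of color changes between consecutively processed items."""
    return sum(1 for a, b in zip(colors, colors[1:]) if a != b)

def is_feasible(order, k):
    """Check that a processing order respects a buffer of size k.

    order[s] is the input position (0-based) of the item processed at step s.
    At step s exactly s items have left the buffer, so only the first
    s + k input items have arrived and can be chosen.
    """
    n = len(order)
    if sorted(order) != list(range(n)):
        return False  # every item must be processed exactly once
    return all(pos < step + k for step, pos in enumerate(order))

items = list("abab")
order = [0, 2, 1, 3]  # process both a's first, then both b's
reordered = [items[p] for p in order]

print(switching_cost(items))      # cost without reordering
print(is_feasible(order, 2))      # achievable with a buffer of size 2
print(switching_cost(reordered))  # cost after reordering
print(is_feasible(order, 1))      # a buffer of size 1 cannot reorder
```

With a buffer of size 1 the only feasible order is the input order, so the buffer size directly bounds how far an item can be pulled forward.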


Metric Space Generalization 


This problem can be further generalized. Items,
instead of having a color, correspond to points in
a metric space. A single server must process all
items. In order to process an item, the server has
to move to the corresponding point in the metric
space. At every point in time, the server has to choose
one of the first $k$ as of yet unprocessed items for
processing and move accordingly. The
goal is to minimize the total distance the server
travels.

The uniform metric, in which any two points
have either distance 0 or distance 1 from one another,
corresponds to the original "color sorting"
setting. Other metrics studied include line metrics
and "star" metrics, which are the distance metrics
over weighted undirected trees of diameter 2.


Block Operation Setting 


Another variant is the so-called block operation 
setting. Once again, the input consists of a se- 
quence of colored items. The first k items are 
placed in a buffer. In each step, an algorithm 
selects one of the colors and processes all items of 
that color currently stored in the buffer, incurring 
a cost of 1. This is called a block operation. 
The processed items are removed from the buffer 
and replaced with the next items from the input 
sequence (if there are any). The goal is once again
to minimize the total cost.

The difference between this block operation
setting and the original setting is most
pronounced for an input sequence consisting
of $\ell$ items of a single color. While in the original
setting such a sequence would not produce any
cost, the cost in the block operation setting would
be $\ell/k$ since only $k$ items can be processed per
block operation.
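The single-color example can be reproduced with a short simulation of the block operation setting. This is a sketch: the color-selection rule `most_frequent` is an arbitrary illustrative choice and not an algorithm from the literature, and both function names are hypothetical.

```python
from collections import Counter, deque

def block_operation_cost(items, k, choose_color):
    """Simulate the block operation setting and return the total cost.

    items        : input sequence of colors
    k            : buffer capacity
    choose_color : function mapping the buffer (a Counter) to a color in it
    """
    pending = deque(items)
    buffer = Counter()

    def refill():
        while pending and sum(buffer.values()) < k:
            buffer[pending.popleft()] += 1

    refill()
    cost = 0
    while buffer:
        color = choose_color(buffer)
        del buffer[color]  # one block operation removes all items of that color
        cost += 1
        refill()
    return cost

def most_frequent(buffer):
    """Illustrative rule: pick the color with the most items in the buffer."""
    return max(buffer, key=buffer.get)

print(block_operation_cost(["a"] * 10, 3, most_frequent))  # 10 items, 3 per operation
```

For 10 items of one color and $k = 3$ the simulation performs four block operations, that is, $\ell/k$ rounded up, whereas the original setting would charge nothing for the same input.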


Minor Variants Found in the 
Literature 


In some cases, there are slight differences in
how these problems are defined in the literature.
Is a cost incurred for the first processed item?
Similarly, is the first position of the server in
the metric space part of the input, or does the algorithm
get to choose that position (without incurring
any cost)? Does an item first have to be placed in
the buffer, or can the algorithm process an arriving
item directly, thereby bypassing the buffer? Do
we need to remove the remaining items in the
buffer once new items have stopped arriving? It turns
out, however, that these details are inconsequential
for most of the results we are interested in.


Key Results 


The main focus of study in the area of schedul- 
ing with a reordering buffer has been on online 
algorithms. In the online setting, the algorithm’s 
decisions have to be based solely on the items 
that arrived in the past and must not depend on 
items arriving in the future. An online algorithm 
is called c-competitive if the cost of the algorithm 
is at most c times that of an optimal off-line 
solution. 


The Online Problem 


Deterministic Algorithms 
Räcke, Sohler, and Westermann [29] first
introduced the problem for the uniform metric
and gave an $O(\log^2 k)$-competitive online algorithm.
After further improvements to $O(\log k)$
[18] and $O(\log k / \log\log k)$ [6], eventually an
$O(\sqrt{\log k})$-competitive online algorithm was
designed [2]. This is almost optimal since a
lower bound of $\Omega(\sqrt{\log k / \log\log k})$ is known
[2]. Many of these upper bounds also generalize
from the uniform metric to star metrics.

While the proof techniques for some of these 
results differ significantly, the basic idea behind 
all the algorithms is the same. As long as the 
buffer contains an item of the same color as 
the previously processed item, such an item is 
processed next. Otherwise, the algorithm has to 
pick a different color and performs a color switch 
which incurs a cost of 1. In order to decide 
which color to switch to, each color is assigned 
a “penalty” counter which is initially set to 0 
and is reset to 0 whenever the color is selected 
for processing. If there is a color with penalty at 
least k, then an item with that color is selected 
next. Otherwise, an arbitrary color is selected and 
the penalty counters for each color are increased 
proportional to the number of items of that color 
that are stored in the buffer. For the $O(\sqrt{\log k})$-competitive
algorithm, instead of picking an arbitrary
color, a more sophisticated rule is used.
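The penalty-counter idea described above can be sketched as follows. This is only an illustration of the general scheme, not the exact algorithm of any cited paper: the increment is taken to be the number of buffered items of each color, the "arbitrary" color is the oldest one in the buffer, and the first processed item is charged a cost of 1. All of these are assumptions, and the function name is hypothetical.

```python
from collections import Counter, deque

def penalty_counter_cost(items, k):
    """Simulate a penalty-counter strategy on a buffer of size k; return its cost."""
    pending = deque(items)
    buffer = Counter()   # color -> number of buffered items
    penalty = Counter()  # color -> penalty counter, implicitly 0
    current = None       # color processed in the previous step
    cost = 0

    def refill():
        while pending and sum(buffer.values()) < k:
            buffer[pending.popleft()] += 1

    refill()
    while buffer:
        if current not in buffer:
            # A color switch is unavoidable.
            overdue = [c for c in buffer if penalty[c] >= k]
            if overdue:
                current = overdue[0]
            else:
                current = next(iter(buffer))  # assumed "arbitrary" choice
                for color, count in buffer.items():
                    penalty[color] += count   # assumed proportionality constant 1
            penalty[current] = 0              # reset on selection
            cost += 1
        # Process one item of the current color and admit the next input item.
        buffer[current] -= 1
        if buffer[current] == 0:
            del buffer[current]
        refill()
    return cost

print(penalty_counter_cost(list("abab"), 4))  # both colors fit in the buffer
print(penalty_counter_cost(list("abab"), 1))  # no reordering possible
```

The counters prevent a color from being postponed indefinitely: each time it is passed over, its penalty grows with the number of its items waiting in the buffer.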


Randomized Algorithms 

Randomized algorithms can achieve much 
smaller competitive ratios. The first random- 
ized algorithm with a competitive ratio of 
$O(\log\log k)$ was given for the block operation
model [3]. Shortly afterward, a randomized 
algorithm with the same competitive ratio was 
presented for the original model [8]. This is best 
possible since a matching lower bound is known 
[2]. These randomized algorithms are based on 
online primal-dual LP schemes [11]. 


Other Metric Spaces 

Apart from the uniform metric, line metrics have
received some attention. After a randomized
$O(\log^2 n)$-competitive online algorithm for
$n$ equally spaced points on a line [27], an
improved deterministic $O(\log n)$-competitive
algorithm was given [24]. A deterministic
$O(\log N \log\log N)$-competitive algorithm for
a line metric with not necessarily equally spaced
points was also given, but here $N$ refers to the
number of items in the input sequence [24].
An easy observation, however, shaves off the
$\log\log N$ factor and improves the analysis to
show $O(\log N)$ competitiveness (Cygan, Mucha,
Private communication, 2011). There is still a
significant gap between this upper bound and the
best known lower bound of about 2.154 [24].

For general metric spaces, a randomized
$O(\log^2 k \log n)$-competitive online algorithm
is known, where $n$ is the number of points in
the metric space [19]. This result is based on a
deterministic algorithm for trees that is turned
into an algorithm for general metrics by using a
metric embedding [23].


Stochastic Inputs 

In a setting where the input is not adversarially
constructed but where the colors of the items
are drawn i.i.d. from an unknown distribution, 
a constant competitive ratio is achievable [22]. 
This result also holds when the colors of the 
items are fixed by an adversary but the order 
in which the items arrive is random. The proof 
is based on the fact that a constant competitive 
online algorithm is known for adversarial inputs, 
if the online algorithm can use a buffer that is 
four times as large as the one used by the optimal 
off-line algorithm. In the stochastic input setting, 
this difference in buffer size does not lead to 
significantly different cost, i.e., the cost of an 
optimal algorithm with buffer size k is only by a 
constant factor larger than the cost of an optimal 
algorithm with buffer size 4k. This is not true for 
adversarial inputs [1]. 


The Off-Line Problem 


The reordering buffer problem is NP-hard [5, 12] 
for the uniform metric, and the complexity for 
line metrics is unknown. Therefore, several pa- 
pers focus on approximating the off-line scenario. 

A constant factor approximation is known for
the uniform metric [7]. For star metrics, the best
known approximation factor of $O(\log\log k\gamma)$ is
achieved by a randomized algorithm, where $\gamma$
denotes the ratio of the maximum to the minimum
weight [25]. Both results are based on the intricate
rounding of the solution to an LP relaxation
of the corresponding problem.


Bicriteria Approximations 

For more general metric spaces, the best ap- 
proximation ratios are achieved by bicriteria ap- 
proximations, i.e., the approximation algorithm 
can make use of more buffer capacity than an 
optimal algorithm. For metric spaces given by the 
distance metric over a weighted undirected tree, 
a bicriteria approximation with approximation 
factor 9 to cost and 4 + 1/k to buffer size is 
known [10]. Using metric embeddings [23], this 
implies a randomized bicriteria approximation
with approximation factor $O(\log n)$ to cost and
$O(1)$ to buffer size, where $n$ denotes the number
of points in the metric space.


The Maximization Problem 

In the maximization version of the problem, the 
goal is to maximize the total cost savings that 
result from reordering the input sequence. In 
terms of an optimal solution, the minimization 
and maximization scenario are identical. How- 
ever, in terms of approximation, they behave 
quite differently in the sense that a c-approximate 
solution for the maximization problem usually 
has very different cost from a c-approximate 
solution for the minimization problem. For the 
uniform metric, the first result was an approxima- 
tion algorithm with an approximation factor of 20 
[28]. This was later improved to a factor of 9 [9]. 


Online Minimum Makespan 
Scheduling 


Reordering buffers have also been studied in 
connection with other scheduling problems, in 
particular online minimum makespan scheduling. 
As in the classic problem without reordering, 
the input consists of a sequence of jobs with 
processing times, and a scheduling algorithm has 
to assign the jobs to m parallel machines, with the 
objective to minimize the makespan, which is the 
time it takes until all jobs are processed. However,
it is not required that each arriving job has to 
be assigned immediately to one of the machines. 
A reordering buffer can be used to reorder the 
input sequence of jobs. At each point in time, 
the reordering buffer contains the first k jobs of 
the input sequence that have not been assigned so 
far. An online scheduling algorithm has to decide 
which job to assign to which machine next. Upon 
its decision, the corresponding job is removed 
from the buffer and assigned to the corresponding 
machine, and thereafter the next job in the input 
sequence takes its place. 
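The buffer mechanics for makespan scheduling can be illustrated with a simple simulation. The assignment rule used here, which always moves the largest buffered job to the least loaded machine, is a plain heuristic chosen for illustration. It is not the algorithm from the cited papers and carries none of their guarantees. The function name is hypothetical.

```python
import heapq
from collections import deque

def buffered_greedy_makespan(jobs, m, k):
    """Schedule jobs on m identical machines using a reordering buffer of size k.

    The buffer always holds the first k not yet assigned jobs of the input.
    Illustrative rule: assign the largest buffered job to the least loaded machine.
    """
    pending = deque(jobs)
    buffer = []
    loads = [0] * m  # a list of equal values is already a valid min-heap

    def refill():
        while pending and len(buffer) < k:
            buffer.append(pending.popleft())

    refill()
    while buffer:
        job = max(buffer)
        buffer.remove(job)
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + job)
        refill()
    return max(loads)

jobs = [1, 1, 2]
print(buffered_greedy_makespan(jobs, m=2, k=1))  # no reordering: plain list scheduling
print(buffered_greedy_makespan(jobs, m=2, k=3))  # the buffer sees the long job first
```

On this input a buffer of size 1 yields makespan 3, while a buffer that holds all three jobs lets the heuristic place the long job first and reach the optimal makespan 2.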


Non-preemptive Scheduling 


For non-preemptive scheduling, Englert, Özmen,
and Westermann [20] give, for $m$ identical machines,
a tight bound on the competitive ratio.
Depending on $m$, the achieved competitive ratio
lies between 4/3 and 1.4659. This optimal ratio is
achieved with a buffer of size at most
$\lceil 2.5 \cdot m \rceil + 2$. They show that larger buffer sizes do
not result in an additional advantage and that a
buffer of size $\Omega(m)$ is necessary to achieve this
competitive ratio. This improves upon an optimal
algorithm for two identical machines [26].

Further, they present several algorithms for 
different buffer sizes. In addition, for m uni- 
formly related machines, they give a scheduling 
algorithm that achieves a competitive ratio of 2 
with a reordering buffer of size m. 

Subsequently to [20], a variety of related pa- 
pers appeared (compare, e.g., [4, 14-16, 21]). For 
2 uniformly related machines with speed ratio
$s > 1$, it is shown that, for any $s > 1$, a
buffer of size 3 is sufficient to achieve an optimal
competitive ratio, and in the case $s > 2$, a buffer
of size 2 already suffices to achieve an optimal
ratio [15].


Job Migrations 

The results of [20] can be generalized to the 
problem of online minimum makespan schedul- 
ing with job migrations, i.e., where no reordering 
buffer is available, but a limited number of job 
reassignments may be performed. For m identical 
machines, the same competitive ratio as in [20]
can be achieved [4]. The algorithm uses, for
$m > 11$, at most $7m$ migration operations and,
for smaller $m$, $8m$ to $10m$ migration operations.
A number of papers consider similar models
(compare, e.g., [13, 17, 30, 31]).


Preemptive Scheduling 


For preemptive scheduling on m identical ma- 
chines, tight bounds on the competitive ratio can 
be achieved for any m. This bound is 4/3 for even 
values of m and slightly lower for odd values 
of m [16]. A buffer of size O(m) is sufficient 
to achieve this bound, but a buffer of size o(m) 
does not reduce the best overall competitive ratio 
e/(e — 1) that is known for the case without 
reordering [16]. 
Problem Definition


The classic secretary problem, a prime example 
of stopping theory, has been studied extensively 
in the computer science literature. Consider the 
scenario where an employer is interested in hiring 
one secretary out of a pool of candidates. The 
difficulty is that, although the employer does not 
know the utility of a candidate before she is 
interviewed, the irrevocable hiring decision for 
each candidate has to be made right after the in- 
terview and prior to interviewing the subsequent 
candidates. The goal is nonetheless to pick the 
best candidate or maximize the probability of 
achieving this. 


Optimization Angle 

The above scenario is hopeless from an algo- 
rithmic point of view since an adversarial input 
makes it impossible to hire the best candidate. We 
can take either of two paths to make the problem 
tractable: restrict the set of utilities or the arrival 
order of candidates. The former path yields, for 
instance, the stochastic variant of the problem. 
However, we follow the second idea here, which
leads to the classic secretary problem. The extra
assumption, then, is that the candidates arrive
in a random order; i.e., although each candidate
may have an arbitrary adversarial utility, every
permutation of the candidates is equally likely to
be the arrival order.

A folklore solution to the problem, often attributed
to [3], is to look into the first $1/e$ fraction of
the candidates (called the "tuning set"), without
giving them any offers, and then hire the first
candidate with utility more than everyone in the
tuning set. It is not difficult to show that this
approach hires the best candidate with probability
at least $1/e$. Indeed, it is known that this is the best
possible performance.
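The performance of the tuning-set rule can be verified exactly for small instances. The sketch below assumes distinct utilities and a tuning set consisting of the first $r$ candidates. It compares a brute-force count over all arrival orders with the standard closed form $\frac{r}{n} \sum_{j=r}^{n-1} \frac{1}{j}$ for the probability of hiring the best candidate. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def classic_rule(utilities, r):
    """Skip the first r candidates, then hire the first one who beats all of them.

    Returns the index of the hired candidate, or None if nobody is hired.
    """
    best_seen = max(utilities[:r], default=float("-inf"))
    for i in range(r, len(utilities)):
        if utilities[i] > best_seen:
            return i
    return None

def success_probability(n, r):
    """Exact probability of hiring the best of n candidates, by enumeration."""
    wins = total = 0
    for perm in permutations(range(n)):
        total += 1
        hired = classic_rule(perm, r)
        if hired is not None and perm[hired] == n - 1:
            wins += 1
    return Fraction(wins, total)

def success_probability_formula(n, r):
    """Closed form: (r/n) * sum_{j=r}^{n-1} 1/j, with the r = 0 case equal to 1/n."""
    if r == 0:
        return Fraction(1, n)
    return Fraction(r, n) * sum(Fraction(1, j) for j in range(r, n))

print(success_probability(4, 1))                      # brute force for n = 4
print(float(success_probability_formula(100, 37)))    # close to 1/e
```

For $n = 100$, a tuning set of 37 candidates (roughly $n/e$) gives a success probability of about 0.371, which is just above $1/e \approx 0.368$.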

There are two questions to be answered, once 
we extend the problem to multiple secretaries. 


1. What subsets of secretaries can be hired to- 
gether? The simplest answer is to allow at 
most k secretaries to be hired. Alternately, we 
can place (several) knapsack and/or matroid 
constraints on the feasible set. The former 
assigns a cost to each hire — say, the requested 
salary — that is to be paid out of a given budget. 
The latter permits only those combinations 
that form an independent set according to 
a given matroid. It is easy to see that both 
generalize the cardinality constraint. 

2. How do we compute the utility of a set? The 
utility of a set can be defined as the sum of 
the utilities of individual secretaries in the set. 
More generally, a submodular or subadditive 
function may be employed to describe the 
utility of a set. 


We then attempt to hire a feasible set of secre- 
taries of maximum expected utility. 


Mechanism Design Angle 

Mechanism design literature has looked at this 
problem from a slightly different angle. In this 
setting, the players that arrive in a random order 
declare a bid — i.e., how much they value the 
item being sold — and then the seller decides who 
should get the item (or items) and how much 
they should be charged. Such decisions are to be 
taken irrevocably as in the optimization problem 
discussed above. 

The players can play strategically, though, 
by declaring higher or lower bids in order to 
increase their chances of winning the item or to 
reduce the price they pay. In addition, they may 
declare their arrival/departure time untruthfully 
to achieve a better result. We want to design a 
“truthful” auction that precludes such undesirable 
outcomes. Although we allow the player to de- 
clare any nonnegative bid (if it is in her favor), 
we do not let them state an arrival time that is 
earlier than their actual one. (Presence intervals 
may be overlapping and/or nested.) We say that
a mechanism is value-strategyproof if no player 
can benefit from declaring a bid different from 
her real value. Similarly the mechanism is called 
time-strategyproof if there is no benefit in stating 
the arrival/departure times untruthfully. We look 
for mechanisms that are both time- and value- 
strategyproof. 


Key Results 


Optimization 

Kleinberg [6] studies the multiple-choice generalization
where the goal is to hire $k$ candidates
whose total utility (defined as the sum of the
individual utilities) is maximized. He presents
a tight performance guarantee of $1 + O(1/\sqrt{k})$
for the problem. In the case of $k = 1$, this is
equivalent to the classic secretary problem. (The
nontrivial direction follows from a construction
where the utilities are hugely different.) Kleinberg's
algorithm partitions the set of candidates
into two (almost) equal pieces, recursively hires
$k/2$ secretaries in the first, sets the threshold for the
second piece by looking at the solution to the first
piece, and picks as many as $k/2$ secretaries in the
second piece who are better than the threshold.

Babaioff et al. [1] look at the generalization 
where there is a restriction on the set of candi- 
dates that can be hired together; the restriction 
is in the form of a matroid. They present an 
O(log n) competitive ratio in this case along with 
improved bounds when the matroid has a special 
form. Their general matroid algorithm partitions 
the items into logarithmically many sets of almost 
equal utility and focuses (randomly) on one such 
set, which reduces the problem into that of maxi- 
mizing the cardinality of the solution (solved via 
the greedy method). 

The case of submodular utilities is discussed 
in Bateni et al. [2]: several matroid or knapsack 
constraints can be placed on the set of feasible 
candidates, and the total utility of a set is com- 
puted by a submodular function of the participat- 
ing candidates. They provide constant competi- 
tive ratios as long as a fixed number of knapsack 
constraints are present. When (a constant number
of) matroid constraints are involved, too, their
performance guarantees grow to $O(\log^2 k)$, where
k is the rank of the matroid. They divide the input
into different pieces where at most one secretary 
should be picked from each, not losing too much 
utility in the process. As a result, the submodular 
function collapses to an additive one within each 
piece (by taking the marginal values of secretaries 
with respect to the current solution). The classic 
algorithm is then used inside each piece. The 
main idea behind the matroid algorithm is that 
we only need to show that, whatever choices 
we have already committed to, there are enough 
options left that can appropriately augment the 
current solution. The argument goes by proving 
the existence of a magical solution with k’ secre- 
taries any of whose K’ size subsets has significant 
contribution (say, at least a ot fraction of the 
optimum) in the submodular function. Had we 
known k’, a simple greedy algorithm would have 
sufficed to find a solution similar to the magical 
set. At the cost of another factor O(log k), we can 
guess k’. 

Furthermore, Bateni et al. show that subad- 
ditive utility functions make the problem much 
more difficult. In particular, they provide match-
ing $O(\sqrt{k})$ competitive ratios.


Mechanism Design 

Dynkin’s algorithm for the classic secretary
problem can be readily turned into an auction: set
the price after observing the tuning set, and then
sell to anyone with a higher bid. This mechanism
is not truthful, though, since high-bid players
spanning across the time threshold have an in-
centive to declare a later arrival time (i.e., after the
threshold); this way, they will win the item but do
not set the price.
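
The untruthful baseline auction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of [5]: bids are assumed to arrive one at a time in list order, the tuning set is assumed to be the first ⌊n/e⌋ bids, and the posted price is assumed to equal the highest bid in the tuning set. The name `threshold_auction` is hypothetical.

```python
import math


def threshold_auction(bids):
    """Sell one item online using a tuning prefix to set the price.

    Assumptions (for illustration only): bids are listed in arrival order,
    the tuning set is the first floor(n / e) bids, and the price is the
    highest bid seen in the tuning set.

    Returns (winner_index, price), or (None, None) if no later bid
    exceeds the price.
    """
    n = len(bids)
    sample_size = math.floor(n / math.e)
    # Price set by the tuning set; 0.0 if the tuning set is empty.
    price = max(bids[:sample_size], default=0.0)
    for i in range(sample_size, n):
        if bids[i] > price:
            return i, price
    return None, None
```

Note that a high bidder who would fall inside the tuning prefix gains by arriving later in this sketch, which is exactly the incentive problem described in the text.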

Nevertheless, Hajiaghayi et al. [5] show how 
one can modify the mechanism slightly to make 
it truthful: after the threshold, consider the option 
of selling the item to the agent with the highest 
bid so far — if she is still present — and charge 
her the second-highest bid so far. Their method 
achieves constant competitiveness for both effi-
ciency and revenue. Their 1/e competitiveness
for efficiency is best possible since it generalizes
the optimization problem; however, when com-
paring the revenue to that achieved by the Vickrey
auction, their upper bound of $1/e^2$ for competi-
tiveness fares against a lower bound of 1/e. (It
is possible, they show, to modify the mechanism
slightly to trade efficiency loss for revenue gain;
for instance, simultaneous 1/4 competitiveness for
both objectives is possible.)

The general idea for the transformation is to 
define a “tuning period” where the price is set 
for everyone. Then, not only is a simple auction-like
mechanism employed in the “hiring phase” to
obtain a strategyproof mechanism, but extra
care must also be given to the “transition phase”
(from tuning to hiring) so as not to incentivize 
untruthful declaration of arrival time for those 
whose presence spans the transition. The same 
approach can be applied to the multiple-choice 
secretary problem to obtain constant-factor com- 
petitive mechanisms (for efficiency and revenue), 
but this bound is far from the one achieved in the 
optimization setting by Kleinberg [6]. 


Open Problems 


Though there have been some improvements on
the matroid case, we still do not know which 
cases are hard and admit no constant-factor com-
petitive ratio. For submodular utilities (and sim-
ple cardinality constraints), in particular, there is
a gap between the $(1 - 1/e)/(e + 1)$ algorithmic result
[4] and the $1 - 1/e$ (or $1 - 1/\sqrt{k}$) target known for
linear utilities.


Problem Definition


Temperature 1 (also called noncooperative) self-
assembly is a model of the formation of struc-
tures by growing and branching tips. Despite its 
ubiquity in nature (in systems such as plants and 
mycelium or percolation processes) and apparent 
dynamic simplicity, it is one of the least under- 
stood models of self-assembly. 
This model was introduced in a broader frame-
work called the abstract Tile Assembly Model
(aTAM) [10]. In the aTAM, we consider tile
assembly systems, which are defined by a finite
set T of square or cubic tile types, an initial seed
assembly σ (one or more tiles stuck together), and
an integer temperature τ = 1, 2, 3, .... All tiles,
on each of their sides, have glues with an integer 
color and an integer strength. 

The dynamics of tile self-assembly starts from 
the seed assembly and proceeds one tile at a time, 
asynchronously and nondeterministically. A tile 
can stick to an existing assembly if it can be 
placed so that the sum of the strengths on its 
sides matching the existing assembly is at least 
the temperature. In the case of temperature 1, this 
means that tiles can be placed as soon as one 
of their sides matches the existing assembly. At 
higher temperatures, we can require that newly 
placed tiles match several of their neighbors to 
attach. 
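
The attachment rule above can be sketched in code. This is a minimal two-dimensional illustration under assumed data structures (none of these names come from the source): a tile is a dict mapping the sides "N", "E", "S", "W" to a `(color, strength)` glue or `None`, and an assembly is a dict mapping grid positions to tiles.

```python
# Offsets to the neighbor on each side, and the side of the neighbor that faces back.
OFFSETS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def binding_strength(tile, pos, assembly):
    """Sum the strengths of the glues of `tile` that match abutting neighbors."""
    x, y = pos
    total = 0
    for side, (dx, dy) in OFFSETS.items():
        neighbor = assembly.get((x + dx, y + dy))
        if neighbor is None:
            continue
        mine = tile.get(side)
        theirs = neighbor.get(OPPOSITE[side])
        # Mismatching glues contribute nothing but do not forbid attachment.
        if mine is not None and mine == theirs:
            total += mine[1]
    return total


def can_attach(tile, pos, assembly, temperature):
    """A tile may attach at an empty position if its binding strength reaches the temperature."""
    return pos not in assembly and binding_strength(tile, pos, assembly) >= temperature
```

With strength-1 glues, a single matching side suffices at temperature 1, whereas temperature 2 requires two matching sides, which is the cooperation discussed in the rest of this entry.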

Ultimately, after a countable (potentially infi- 
nite) number of steps, no tile can be added to the 
assembly, in which case we call it terminal. As
in Wang tilings, tiles cannot overlap, be rotated,
or be flipped. However, tiles can have mismatches 
with their neighbors (Fig. 1). 


Key Results 


The first comparison between temperatures 1 and
2 was shown by Rothemund and Winfree [9],
with the motivation of computing and efficiently
building arbitrary shapes at the nanoscale. In
this context, the generally accepted definition of
“efficient” is with significantly fewer tile types than
the size of the output.

Self-Assembly at Temperature 1, Fig. 1 In the non-
cooperative model, tiles can attach as soon as one side
matches the neighborhood. Panels: Non-cooperative (τ = 1); Cooperative (τ ≥ 2)


Assembling Simple Shapes Efficiently 

The first step toward these goals is the program-
ming of simple shapes like squares or trees. At
temperature ≥ 2, constructions with Turing ma-
chines can be used to show the following bound:


Theorem 1 (from [9]) The smallest two-
dimensional tileset $T_n$ producing only squares
of size n × n from a single-tile seed is of size
$\Theta\left(\frac{\log n}{\log \log n}\right)$.


The smallest number of tile types that can 
assemble exactly a set of shapes is called the 
tile complexity of that set. In the noncooperative 
model, the following upper bound is known: 


Theorem 2 (from [9]) For every integer n, there is
a tileset $T_n$ of size 2n − 1 that produces only
squares of size n × n from a single-tile seed.


Whether this upper bound is optimal is still 
one of the major open problems of the model, 
and little progress has been made since its identi- 
fication. The real motivation behind this question 
is whether we (or natural systems) can perform 
useful computations with this model. 

Finding the smallest tileset for assembling an 
input shape can also be treated as an optimization 
problem: see Adleman et al. [1] for the case of 
tree shapes. 


The Role of Geometry 

A partial answer to this question was found by 
Cook, Fu, and Schweller [2], who tried to “fake” 
cooperation by blocking the growth of some parts 
of the assembly. They introduced two different 
ways to do this: removing the planarity constraint 
and allowing errors. 

In both cases, “faking” cooperation means
producing the same assemblies as a temperature
2 tile assembly system, up to rescaling by a
constant factor.


Three-Dimensional Noncooperative
Self-Assembly

In three dimensions, temperature 1 self-assembly
is able to simulate Turing computations:


Theorem 3 (from [2]) There is a three-
dimensional tileset T such that for every Turing
machine M and input $x \in \mathbb{N}$, there is a
computable seed assembly $\sigma_{M,x}$ and a tile $t \in T$
such that all terminal assemblies of $(T, \sigma_{M,x}, 1)$
contain t if and only if M accepts input x.


The construction simulates a Turing-universal 
cooperative tile assembly system called a zigzag 
system, in which rows grow on top of each other, 
alternately to the left and to the right, using
cooperation to copy and update the previous row 
(Fig. 2). 

The idea is pictured in Fig. 3. A “main”
path grows on each row, building “bridges”
and “blockers” (in blue in Fig. 3) that encode
bits. These bits can be read by the next row:
before reading a bit, the main path (in orange
in Fig. 3) of the row forks into two branches,
respectively probing for a bridge (encoding a 1) 
and a blocker (encoding a 0). Exactly one branch 
passes through and can accumulate successive 
bits in its state, until a full tile has been read. 
Then, it rewrites bits encoding the next tile for 
the row above. 


Allowing Erroneous Blocking 

Adapting the mechanism used in the 3D con-
struction to the planar case is widely conjectured
to be impossible [9], because allowing the “wrong”
branch to grow and collide against a previous
part of the assembly, as in Fig. 3, encloses the other
“correct” branch inside a finite portion of the
plane.

Self-Assembly at Temperature 1, Fig. 2 An example
zigzag system (Figure from [2])

Self-Assembly at Temperature 1, Fig. 3 Bit selection in 3D

However, it becomes possible if we consider 
a stochastic assembly schedule, where at each 
time step, exactly one tile attaches, and all tiles 
that can attach do so with equal probability. If 
we repeat the above construction k times con- 
secutively, only one needs to succeed. We can 
therefore lower the probability of failure of each
bit selection to $2^{-k}$:


Theorem 4 (from [2]) For all ε > 0 and
all zigzag tile systems $\mathcal{T} = (T, s, 2)$ whose
producible assemblies have size at most some
constant r, there is a planar temperature 1
probabilistic tile assembly system S that sim-
ulates $\mathcal{T}$ without error with probability at least
1 − ε.


Of course, this construction means that the
number of tile types and scaling factor will in-
crease by a factor depending on ε and r.
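
To give a feel for how the repetition count might depend on ε, the following sketch computes a sufficient number of repetitions k. It rests on assumptions that are not taken from [2]: each bit selection is assumed to fail with probability $2^{-k}$ after k repetitions, and a plain union bound over an assumed number m of bit selections is used, so that $m \cdot 2^{-k} \le \varepsilon$. The function name is hypothetical.

```python
import math


def repetitions_needed(num_bit_selections, epsilon):
    """Smallest k with num_bit_selections * 2**(-k) <= epsilon (union-bound assumption)."""
    if num_bit_selections < 1 or not (0 < epsilon < 1):
        raise ValueError("need at least one bit selection and 0 < epsilon < 1")
    return max(1, math.ceil(math.log2(num_bit_selections / epsilon)))
```

Under these assumptions k grows only logarithmically in m/ε, which is consistent with the blow-up in tile types and scale depending on ε and r.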


Simulation up to Rescaling 

One of the latest developments of tile assembly 
is the notion of intrinsic universality [3,4, 11], a 
notion of simulation by rescaling only between 
tile assembly systems. 

This idea is useful in particular to compare 
different models, because it provides qualitative 
properties to check, as opposed to quantitative 
properties such as tile complexity. The general 
argument is: 


• At temperature 2 in two dimensions, there is
a tileset known from [4] to be able to simu-
late any other tile assembly system, modulo
rescaling.

• However, there is a tile assembly system $\mathcal{T}$
that no tileset in model X (in our case, tem-
perature 1) can simulate without errors.

• Therefore, model X is not as powerful as
planar temperature 2.

Self-Assembly at Temperature 1, Fig. 4 Tile assembly system $\mathcal{T}$ (Figure from [7])


This argument was used, for instance, to prove 
the first separation result between temperature 
2 and the fully general model of temperature 
1 [7]: 


Theorem 5 (from [7]) There is a planar (tem-
perature 2) tile assembly system $\mathcal{T}$ (whose pro-
ductions are pictured in Fig. 4) that no (two- or
three-dimensional) tile assembly system $(A, \alpha, 1)$
can simulate up to rescaling.


The proof uses a combinatorial argument 
(called the window movie lemma) to show that if 
there were a tile assembly system simulating all
productions of $\mathcal{T}$, then it would also be able to
produce other “illegal” assemblies (see Fig. 5)
that do not represent any of $\mathcal{T}$’s producible
assemblies.


Important Particular Cases 

Noncooperative self-assembly, when restricted to 
dimension one, is similar to nondeterministic 
finite automata. It is therefore natural to look for 
a pumping lemma. 

The first result in this direction was proven
by Doty, Patitz, and Summers [5], who intro-
duced the notion of pumpable paths: a path P is
pumpable if it contains a subsegment $P_{i+1,\ldots,j}$
that can be repeated arbitrarily many (consec-
utive) times along $\overrightarrow{P_i P_j}$ while remaining self-
avoiding.


Theorem 6 (from [5]) Let $\mathcal{T}$ be a tile assembly
system that assembles exactly one (potentially
infinite) terminal assembly α. If any path in
α longer than a constant c is pumpable, then
there are finite families of vectors $b_1, \ldots, b_n$,
$u_1, \ldots, u_n$, and $v_1, \ldots, v_n \in \mathbb{Z}^2$ such that:

$$\mathrm{dom}(\alpha) = \bigcup_{1 \le i \le n} \{\, b_i + j u_i + k v_i \mid j, k \in \mathbb{N} \,\}$$


In [5], examples were identified of paths with
segments that could be repeated, but only finitely
many times due to collisions. The formalization
of these examples was later done by Manuch,
Stacho, and Stoll [6], proving lower bounds
under the hypothesis that no mismatches can
occur.

Self-Assembly at Temperature 1, Fig. 5 Illegal productions that a temperature 1 system that can simulate all
productions of $\mathcal{T}$ must also be able to simulate (Figure modified from [7])

This approach was then extended by Reif and 
Song [8], to show that tile assembly systems 
without mismatches have a recursive set of pro- 
ductions. However, the decidability of the “no 
mismatches” hypothesis is still an open problem. 


Applications 


Given the successful experimental applications
of tile self-assembly, particularly in the field of
DNA nanotechnologies, it seems natural to try to
implement noncooperative systems: indeed, intuition
suggests that they would avoid the errors that occur
when tiles must attach cooperatively.
However, no successful noncoop-
erative experiment has been reported; the reason
might be that ensuring uniqueness of the seed is
impossible, as any two tiles in solution together
might bind without either of them being bound to
the seed.


Open Problems 


Aside from understanding the exact geometric 
requirements for Turing universality, a number 
of open problems have been identified in this 
model: 


1. From [9]: What is the tile complexity of
squares of size n × n in the planar, temperature
1 model?
A related problem, which has been in the 
folklore for some time, is the existence of 
a shape of tile complexity arbitrarily smaller 
than its Manhattan diameter. 


2. From [5]: If $\mathcal{T}$ is a tile assembly system
with exactly one terminal assembly, is there a
constant c such that any path longer than c is
pumpable?

3. From [7]: Is there a temperature 1 tile as-
sembly system with a non-recursive set of
productions?

4. From [7]: Is there a single tileset able to sim-
ulate any temperature 1 tile assembly system
up to rescaling, using only noncooperative
bindings?


Problem Definition


This problem is concerned with the self-assembly
of fractal patterns and structures. More specifically,
it deals with discrete self-similar fractals and dif-
ferent notions of them self-assembling from tiles
in the abstract Tile Assembly Model (aTAM) and
derivative models. The self-assembly of fractals
and fractal-like structures is particularly interest-
ing due to their pervasiveness in nature, as well
as their complex aperiodic structures, which result in
them occupying less dimensional space than the
space they are embedded within.

Using the terminology from [1], we define $\mathbb{N}_g$
as the subset {0, 1, ..., g − 1} of $\mathbb{N}$, and if $A, B \subseteq \mathbb{N}^2$
and $k \in \mathbb{N}$, then $A + kB = \{ m + kn \mid m \in A$
and $n \in B \}$. We then define discrete self-similar
fractals as follows.

We say that $X \subset \mathbb{N}^2$ is a discrete self-similar
fractal (or dssf for short) if there exist $1 < g \in \mathbb{N}$
and a set $\{(0,0)\} \subset G \subset \mathbb{N}_g^2$ with at least one
point in every row and column, such that
$X = \bigcup_{i=1}^{\infty} X_i$, where $X_i$, the ith stage of X, is defined
by $X_1 = G$ and $X_{i+1} = X_i + g^i G$. We say that
G is the generator of X.

Figure 1 shows, as an example, the first 5
stages of the discrete self-similar fractal known 
as the Sierpinski triangle. In this example, G = 
{(0,0), (1,0), (0, 1)}. 
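
The stage recurrence $X_1 = G$, $X_{i+1} = X_i + g^i G$ can be computed directly. The sketch below is only an illustration of the definition; the function names are hypothetical, and points are represented as `(x, y)` tuples.

```python
def scaled_sum(a, b, k):
    """Return A + kB = {m + k*n : m in A, n in B} for sets of 2D integer points."""
    return {(ax + k * bx, ay + k * by) for (ax, ay) in a for (bx, by) in b}


def dssf_stage(generator, g, i):
    """Return stage X_i of the dssf with the given generator and scale g (i >= 1)."""
    stage = set(generator)                 # X_1 = G
    for j in range(1, i):
        stage = scaled_sum(stage, generator, g ** j)   # X_{j+1} = X_j + g^j G
    return stage


# Sierpinski triangle generator from the text, with g = 2.
SIERPINSKI = {(0, 0), (1, 0), (0, 1)}
```

For this generator, stage i contains exactly the points of the 2^i × 2^i square whose coordinates share no binary digit equal to 1, so each stage has 3^i points.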

In general, we ask whether or not a given dssf 
X can self-assemble within a given model. 


Variants 

The general problem of determining whether or 
not a discrete self-similar fractal self-assembles 
within a given model has several variants, which 
determine the way in which the fractal shape is 
represented within a resulting assembly. 


Self-Assembly of Fractals, Fig. 1 Example discrete self-similar fractal: the first 5 stages of the Sierpinski triangle


1. Weak self-assembly. If a dssf X weakly self-
assembles using a tile set T, then there exists
a subset of tile types B ⊆ T such that, in
the terminal assembly α, for every point p ∈
dom α such that p ∈ X, the tile type at location
p in α is a type within B, and for every point
p ∈ dom α such that p ∉ X, the tile type at
location p in α is not within B. That is, the
tile types in the subset B precisely “paint a
picture” of X, while tiles of types not in B may
appear in locations outside of X.

2. Strict self-assembly. If a dssf X strictly self- 
assembles, then it weakly self-assembles with 
B = T, i.e., the locations that exist in the
domain of the terminal assembly are exactly 
those of X. 

3. Approximate self-assembly. A dssf X, and 
thus a strictly self-assembled version of X, has 
fractal dimension (i.e., zeta-dimension [3]) 
<2, and a weakly self-assembled version has 
dimension 2. Since it appears to be difficult if 
not impossible to strictly self-assemble many 
(or all) dssf’s, it is interesting to consider if an 
approximation of a dssf X which retains the 
same fractal dimension as X can strictly self- 
assemble. 


Key Results 


Self-assembly of dssf’s has been studied in all of 
the above variants and within the aTAM, 2HAM, 
and STAM [9]. As previously mentioned, the 
complexity of dssf’s makes them interesting to 
study since they are infinite, aperiodic structures. 


This requires any system in which they self- 
assemble to rely on algorithmic self-assembly 
(rather than unique tile types hard coded to 
each position of the shape), and for this reason 
early experimental results even included the 
weak self-assembly of the initial few stages 
of the Sierpinski triangle [11] as a proof of 
concept that DNA-based tile implementations 
of the aTAM are capable of algorithmic self- 
assembly. Nonetheless, as infinite structures, 
dssf’s are more often the focus of theoretical 
studies. 


Weak Self-Assembly 

As seen in [11], it is possible for a very sim- 
ple tile set of only 7 tile types to weakly self- 
assemble the Sierpinski triangle. This tile set can 
essentially be thought of as computing the xor 
function on two inputs (i.e., 00 → 0, 01 → 1,
10 → 1, and 11 → 0), with the glues with which
a tile initially binds to an assembly encoding the 
input bits and those to which tiles later attach 
encoding the output bits. 
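
The XOR rule can be simulated on a grid to see the pattern it paints. This is a sketch of the rule only, not of the tile set of [11]: it assumes that the bottom row and the left column (standing in for the seed and boundary tiles) all carry the bit 1, and that every other cell holds the XOR of its south and west neighbors. The function name is hypothetical.

```python
def xor_pattern(size):
    """Return a size x size grid (indexed grid[y][x]) filled by the XOR rule."""
    grid = [[0] * size for _ in range(size)]
    for i in range(size):
        grid[0][i] = 1   # bottom boundary row
        grid[i][0] = 1   # left boundary column
    for y in range(1, size):
        for x in range(1, size):
            # Output bit = XOR of the two input bits from the south and west neighbors.
            grid[y][x] = grid[y - 1][x] ^ grid[y][x - 1]
    return grid
```

The cells holding 1 are exactly those where the binomial coefficient C(x + y, x) is odd, which is the Pascal's-triangle characterization of the Sierpinski triangle.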

In [4] it was noted that another characteriza- 
tion of the Sierpinski triangle is as the nonzero 
residues modulo 2 of Pascal’s triangle. They then 
provided a characterization of an infinite class of 
dssf’s, known as generalized Sierpinski carpets, 
which can be defined as the residues, modulo a 
prime number, of the entries in a two-dimensional 
matrix generated by a simple recursive equa- 
tion. (A well-known example among this class of 
dssf’s is the Sierpinski carpet.) They then proved
that all generalized Sierpinski carpets weakly 
self-assemble in the aTAM. 


Strict Self-Assembly 

Although weak self-assembly of many dssf’s can 
be achieved with very simple tile sets, it turns out 
that strict self-assembly is an entirely different, 
and much more difficult, problem. In fact, in [6] 
they proved that it is impossible for the Sierpinski 
triangle to strictly self-assemble in the aTAM. 
Furthermore, their proof showed that to be the 
case regardless of the temperature parameter. 
This result was extended in [10] to a proof that 
an infinite class of “pinch-point” dssf’s, which 
includes the Sierpinski triangle, cannot strictly 
self-assemble in the aTAM at any temperature. 
Pinch-point fractals are those whose generators 
have exactly one point in their topmost row, the 
leftmost, and one in their eastmost column, the 
bottommost. Yet another extension was provided 
in [1], where the authors defined “tree” fractals, 
which again include the Sierpinski triangle, as 
those with generators which are trees and which 
have a single point in their topmost row and a 
single point in their rightmost column. They then 
proved that, regardless of the temperature or even 
of the scale factor, no tree fractal strictly self- 
assembles in the aTAM. 

Additional results related to strict self- 
assembly of dssf’s include the proof in [10] 
that in the aTAM at temperature 1 (i.e., systems
with τ = 1), it is impossible for any dssf
to self-assemble within a locally deterministic 
system (see [12] for a definition of local 
determinism), and in [2] it was proven that the 
Sierpinski triangle also cannot self-assemble 
in the 2-Handed Assembly Model, at any 
temperature. 

To date, the single positive result related to the 
strict self-assembly of a dssf is for an “active” 
model of self-assembly, where tiles are allowed to 
change the states of their glues during assembly, 
called the Signal-passing Tile Assembly Model 
(STAM). In [9] they gave a construction prov- 
ing that the Sierpinski triangle can self-assemble 
within the STAM at temperature 1 and scale 
factor 2. By the result of [1], this is impossible 
in the aTAM and demonstrates the power of the
active nature of the STAM, as that construction 
essentially builds stages of the Sierpinski triangle 
in a manner analogous to weak self-assembly, 
but then causes the unwanted interior portions to 
dissociate and then break apart. 


Approximate Self-Assembly 

It has been shown that an infinite subset of dssf’s 
can weakly self-assemble in the aTAM, while an- 
other infinite subset cannot strictly self-assemble. 
Recall also that dssf’s have fractal dimension < 2,
and that their strictly self-assembled versions
retain this fractal dimension.
However, their weakly self-assembled ver-
sions have dimension 2. Therefore, the question
arises about whether or not some transformation 
of a dssf (especially, a dssf which cannot strictly 
self-assemble), which visually approximates the 
original dssf while retaining its fractal dimension, 
can strictly self-assemble in the aTAM. 

This question was first answered positively
in [6], where they defined a transformation for
the Sierpinski triangle which they called “fiber-
ing,” and they then gave a construction proving
that the so-called fibered Sierpinski triangle does
strictly self-assemble in the aTAM while main-
taining the Sierpinski triangle’s fractal dimension
of ≈1.585. An example can be seen in Fig. 2b,
showing how the fibering consists of an additional
row of tiles added to the south and west borders of
each copy of each subsequent stage of the fractal.
In [10] they extended the technique of fibering 
to include an infinite subclass of dssf’s (which 
again includes the Sierpinski triangle) which they 
called “nice” dssf’s. Nice dssf’s are those whose 
generators are connected and contain all points on 
the west and south boundaries. 

While the fibering technique creates visual 
approximations of fractals, it results in subse- 
quent stages being further and further separated 
from each other. To counter this drawback, in 
[8] they introduced a technique for fibering the 
Sierpinski triangle “in place.” An example can
be seen in Fig. 2c, showing how this version of 
fibering only uses space on the interior of each 
stage of the fractal, thus allowing the stages to re- 
main in the same positions relative to each other. 

Self-Assembly of Fractals, Fig. 2 Various patterns cor-
responding to the Sierpinski triangle. (a) A portion of the
discrete Sierpinski triangle. (b) A portion of the fibered


Furthermore, this technique retains the same frac- 
tal dimension as the Sierpinski triangle, and they 
showed that it is impossible to use asymptot- 
ically less space than their construction while 
strictly self-assembling a shape which contains 
the Sierpinski triangle as a subset. In [5] this 
technique was extended to strictly self-assemble 
approximations for every generalized Sierpinski 
carpet. 


Open Problems 


1. Does there exist a discrete self-similar fractal 
which can strictly self-assemble in the aTAM, 
or conversely, can it be shown that none does? 

2. What is the class of discrete self-similar 
fractals for which an approximation, such as 
fibering or in-place fibering, which maintains 
the original fractal dimension, strictly self- 
assembles in the aTAM? 


URLs to Code and Data Sets 


ISU TAS simulation software for the aTAM,
kTAM, and 2HAM
(http://self-assembly.net/wiki/index.php?title=ISU_TAS)
and the Fibered Fractal Tiler for defining discrete
self-similar fractals which can be fibered and generating
the corresponding aTAM tile sets
(http://self-assembly.net/wiki/index.php?title=Fibered_Fractal_Tiler).

Self-Assembly of Fractals, Fig. 2 (continued) Sierpinski triangle of [7] (Figure from [7]). (c) A portion
of the in-place fibered Sierpinski triangle of [8] (Figure
from [8])


Problem Definition


Abstract Tile Assembly Model 

The abstract Tile Assembly Model (aTAM) [3] is 
a mathematical model of self-assembly in which 
system components are four-sided Wang tiles 
with glue types assigned to each tile edge. Any
pair of glue types is assigned some nonnega-
tive interaction strength denoting how strongly
the pair of glues bind. An aTAM system is an
ordered triplet (T, τ, σ) consisting of a set of
tiles T, a positive integer threshold parameter τ
called the system’s temperature, and a special
tile σ ∈ T denoted as the seed tile. Assembly
proceeds by attaching copies of tiles from T to a
growing seed assembly whenever the placement
of a tile on the 2D grid achieves a total strength
of attachment from abutting edges, determined
by the sum of pairwise glue interactions, that
meets or exceeds the temperature parameter τ.
The pairwise strength assignment between glues 
on tile edges is often restricted to be “linear” in 
that identical glue pairs may be assigned arbi- 
trary positive values, while non-equal pairs are 
required to have interaction strengths of 0. We 
denote this restricted version of the model as 
the standard aTAM. When this restriction is not 
applied, i.e., any pair of glues may be assigned 
any positive integer strength, we call the model 
the flexible glue aTAM. 

Given the aTAM’s model of growth, we may 
consider the problem of designing an aTAM sys- 
tem which is guaranteed to grow into a target 
shape S, given by a set of 2D integer coordinates, 
and stop growing. Such systems are guaranteed
to exist for any finite shape S, but solutions will
typically vary in the number of tiles |T| used.
For a given shape S, an interesting problem is
to design a system that assembles S while using
the fewest, or close to the fewest, number of tiles
|T| possible. This fewest possible number of tiles
required for the assembly of a given shape S is 
termed the program-size complexity of S. 


Problem 1 Let $K_{SA}(n)$ and $K_{FG}(n)$ denote the
program-size complexity of an n × n square for
the standard aTAM and the flexible glue aTAM,
respectively. What are $K_{SA}(n)$ and $K_{FG}(n)$?


Problem 2 Let $K_{SA}(n, k)$ and $K_{FG}(n, k)$ de-
note the program-size complexity of a k × n
rectangle for the standard aTAM and the flexible
glue aTAM, respectively. What are $K_{SA}(n, k)$
and $K_{FG}(n, k)$?


Problem 3 For an arbitrary given shape S, what
is the program-size complexity of S? Let the
scale-free program size of S be the size of the smallest tile
system that uniquely builds some scaled-up
version of S. Let $K_{SA}(S)$ and $K_{FG}(S)$ denote the
scale-free program size of S for the standard
aTAM and the flexible glue aTAM, respectively.
What are $K_{SA}(S)$ and $K_{FG}(S)$?


Key Results 


The best known bounds for program-size com- 
plexity for squares, rectangles, and general scaled 
shapes are presented in this section. 


n x n Squares 

The efficient self-assembly of n x n squares has 
served as a benchmark for self-assembly algo- 
rithms within the aTAM and more general tile 
assembly models. Within the aTAM, the prob- 
lem is well understood up to constant factors. 
The first result states a general upper bound for 
the program size of self-assembled squares for 
general n, which is matched by an information- 
theoretic lower bound that holds for almost all 
integers n. The precise bounds differ between the 
standard and flexible glue models but are tight in 
both cases. The lower bound of inequality (1) is 
proven in [3] and is based on the Kolmogorov 
complexity of the integer n. The lower bound of 
(2) is proven in [2] by the same approach. The 
upper bound of (1) is proven in [1] and offers
an improvement over the initial upper bound of
O(log n) from [3]. The O(log n) result of [3]
is achieved by implementing a key primitive in
tile self-assembly: a binary counter of log n tiles
that grows to length n. The improvement of [1]
is achieved by modifying the counter concept to
work with an optimal, variable base. The upper
bound of (2) is proven in [2] and is obtained
by combining the aTAM counter primitives with
a scheme for efficiently seeding the counter by
extracting bits from the values of the flexible glue
interactions.


Theorem 1 There exist positive constants $c_1$ and
$c_2$ such that for almost all integers $n \in \mathbb{N}$, the
following inequalities hold. Moreover, the upper
bounds hold for all $n \in \mathbb{N}$.

$$c_1 \frac{\log n}{\log \log n} \le K_{SA}(n) \le c_2 \frac{\log n}{\log \log n}. \qquad (1)$$

$$c_1 \sqrt{\log n} \le K_{FG}(n) \le c_2 \sqrt{\log n}. \qquad (2)$$


While the above theorem presents a tight un- 
derstanding of the program-size complexity for 
most self-assembled squares, the information- 
theoretic lower bound allows for special values of 
n to be assembled with a much smaller program 
size. The program size is in fact as small as one 
could reasonably hope for. In [3], a tile system 
is presented that simulates a Busy Beaver Turing 
Machine and assembles correspondingly large 
squares for each tile set size. This construction 
yields the following theorem implying that the 
largest self-assembled square for a given number 
of tiles grows faster than any computable func- 
tion! 


Theorem 2 There exists a positive constant c
such that for infinitely many n, $K_{SA}(n) \le c f(n)$
for f(n) any nondecreasing unbounded com-
putable function.


Thin Rectangles 

The program size of self-assembled squares and 
other thick rectangles is dictated by information- 
theoretic bounds which stem from the aTAM’s 
ability to simulate arbitrary Turing machines 
given enough geometric space to work within. 
When this space is cut down, such as in the case
of building a thin k x n rectangle, the program size
is limited by geometric factors. The following
upper and lower bounds are shown in [2] and
represent the best known bounds for thin k x n
rectangles in which k = O(log n / log log n).
The lower bound is achieved by a pigeon-hole
pumping argument on the types of tiles placed,
along with their order of placement, along a
width-k column of the target rectangle. The upper
bound is based on the construction of a
general-base, general-width counter, which generalizes
the binary counter concept of [3].


Theorem 3 There exist positive constants c_1 and
c_2 such that for any n, k ∈ N, the following
inequalities hold.


\[ c_1 \frac{n^{1/k}}{k} \le K_{GA}(n,k) \le K_{SA}(n,k) \le c_2 \left( n^{1/k} + k \right) \]


Scaled Shapes 

The program size of general shapes is difficult 
to analyze as it is highly dependent on geomet- 
ric features of the target shape. However, if we 
consider the assembly of an arbitrarily scaled- 
up version of a target shape, these geometric 
difficulties can be eliminated and a very general 
result can be achieved. The next result from [4] 
shows that the scale-free program size of S is 
closely related to the Kolmogorov complexity 
of S. In particular, the scale-free program-size
complexity of S is a log factor less than the
Kolmogorov complexity of S for the standard model,
and the scale-free program-size complexity of S
is the square root of the Kolmogorov complexity
of S for the flexible glue model. The standard
model result is shown in [4] and is achieved
by encoding a compressed description of S in
a small tile set, together with a set of
tiles simulating a Turing machine that extracts the
pixels of S from this compressed representation.
The need for the scale factor increase of S is to 
allow room for the Turing machine simulation. In 
fact, the required scale factor is the run time of the 
Turing machine that decompresses the optimal 
encoding of S. The flexible glue result is achieved 
by combining portions of the flexible glue con- 
struction for squares [2] with the construction 
of [4]. In the following theorem, K(S) denotes 
the Kolmogorov complexity of S with respect to 
some fixed universal Turing machine. 


Theorem 4 For any shape S, there exist positive
constants c_1 and c_2 such that


\[ c_1 \frac{K(S)}{\log K(S)} \le K_{SA}(S) \le c_2 \frac{K(S)}{\log K(S)} \tag{3} \]
\[ c_1 \sqrt{K(S)} \le K_{GA}(S) \le c_2 \sqrt{K(S)} \tag{4} \]


Open Problems 


A few important open problems in this area are as 
follows. In the case of squares, the program size 
is well understood as long as the temperature of 
the system is at least two. A long-standing open 
problem has been to determine the program-size 
complexity of n x n squares for temperature-1 
self-assembly in which each positive glue force 
alone is sufficient to cause a tile attachment. To 
date, no known method is able to achieve o(n)
tile complexity at temperature-1 for an n x n
square, but no proof exists that this cannot be
done. With respect to thin k x n rectangles, the
best upper and lower bounds have a gap with
respect to the variable k. Does there exist a more
efficient rectangle construction, or can a higher 
lower bound be derived? Finally, while the scale- 
free program-size complexity of general shapes 
is well understood, little is known about the 
(unscaled) program size of general shapes. What 
new tools and geometric classifications can be 
developed to analyze and bound this complexity 
for general shapes? 

Problem Definition


Self-assembly is an asynchronous, decentralized 
process in which particles aggregate to form 
superstructures according to localized interac- 
tions. The most well-studied models of these 
particle systems, e.g., the abstract Tile Assembly 
Model of Winfree [11], utilize square-shaped par- 
ticles arranged on a lattice by attaching edgewise. 
Particles attach to form larger assemblies, and a
pair of assemblies or tiles can attach if they can
translate to a nonoverlapping configuration with
a set of k coincident edges, where k ≥ τ, a
parameter of the system called the temperature.

In seeded assembly, individual particles at- 
tach to a growing seed assembly. This assem- 
bly may begin as a single-tile or a multi-tile 
assembly. In unseeded assembly (also called hi- 
erarchical [3], two-handed [2], or polyomino [7] 
assembly), there is no such restriction. The set 
of assemblies to which a single tile cannot attach 
(in seeded assembly) or that cannot attach to any 
other assembly (in unseeded assembly) are the 
terminal assemblies of the system. 


Objectives In general, the goal is to design a 
system of minimal complexity that assembles 
into a unique terminal assembly with a desired 
shape or property. In models using square tiles, 
this is equivalent to designing a system using the 
fewest tile types. When tiles are allowed to be 
more general shapes, then the option of trading 
tile types for tile complexity becomes available. 
The motivation for this work is to understand 
how more complex tile shapes can be used to 
reduce the number of tile types in a system, and 
two benchmark problems regarding the compu- 
tational power and efficiency of tile systems are 
considered in the context of systems of non- 
square tiles: 


Problem 1 (Square Assembly) 


INPUT: A natural number n. 

OUTPUT: A self-assembly system with a unique
terminal assembly consisting of n^2 tiles in an
n x n square shape.


Problem 2 (Computational Power) What sys- 
tems of non-square tiles are capable of simulating 
computation, and to what extent? 


Key Results 

Allowing non-square tiles generally permits an
asymptotic reduction in the number of tile types,
and systems of very few non-square tiles are
capable of universal computation. At a high level,
such reductions are achieved by simulating many
tiles via translations and rotations of a single
tile type.


Models 
Fu, Patitz, Schweller, and Sheline [6] introduce
two models of general shaped tiles. The first,
called the geometric Tile Assembly Model 
(gTAM), is a model of seeded, translation-only 
assembly where tiles are polyomino-shaped — 
equivalent to prebuilt assemblies of square tiles. 
The second is an unseeded version they call the 
Two-Handed Planar Geometric Tile Assembly 
Model (2GAM), which has the added restriction 
that assemblies can only attach if there exists a 
continuous motion bringing the two assemblies 
together during which they remain disjoint. 
This can be thought of as a restriction that the 
assemblies live in the plane and do not make use 
of the third dimension to maneuver into place. 
Demaine et al. [4] introduce the polygonal
free-body Tile Assembly Model (pfbTAM) in
which tiles may have arbitrary simple polygonal
shapes, attaching edgewise along equal-length
edges. Systems with and without rotation
are both permitted; we note that rotation is
forbidden in the gTAM and 2GAM (as well as
the aTAM).


Efficient Construction 
Fu, Patitz, Schweller, and Sheline [6] prove that
both the gTAM and 2GAM allow an asymptotic
reduction in the number of tile types needed to
assemble an n x n square of tiles. For the gTAM,
they prove that such a square can be assembled
using a temperature-1 system of O(√(log n)) tile
types, beating the optimal (temperature-2) system
of Θ(log n / log log n) square tiles by Adleman
et al. [1]. This is a reduction in both the number
of tile types (by a quadratic factor) and
temperature. The temperature reduction is
especially significant, as a lower bound of Ω(n)
for temperature-1 aTAM systems is a widely
believed conjecture [8-10].

For the 2GAM, they reduce the number of tile
types even further, using a temperature-2 system
of O(log log n) tile types to assemble an n x n
square. However, this system comes with the
caveat that it makes use of either a disconnected
tile shape or a slightly three-dimensional
shape.


Computational Power 

Positive results on the computational power 
of general shaped tile systems fall into two 
categories: Turing universality and bounded- 
time computation. Fu, Patitz, Schweller, and 
Sheline [6] prove that any Turing machine 
computation can be carried out by a temperature- 
1 gTAM system. As with the temperature-1 
construction of squares, this result is surprising 
due to the open conjecture regarding the 
computational power of square tile systems at 
temperature 1. 

Demaine et al. [4] prove that any Turing 
machine computation can be carried out by a 
temperature-2 pfbTAM system (with rotation) 
consisting of a single tile. Their result actually 
proves that any aTAM system can be simulated 
by such a system, and thus Turing universality is 
achieved by simulating aTAM systems carrying 
out computation. Combined with the intrinsic 
universality result of Doty et al. [5], this result can 
be extended to prove that a single temperature- 
2 pfbTAM system (with rotation) consisting of 
a single tile can carry out any Turing machine 
computation, given an appropriate seed assembly 
consisting of copies of this tile. 

Finally, Demaine et al. also prove that
temperature-3 pfbTAM systems (without
rotation) consisting of a single tile can carry out
simulation of computationally universal cellular 
automata for a number of steps limited by the size 
of the seed assembly. Specifically, they prove that 
n steps can be carried out using a seed assembly 
of O(n) tiles. A loose lower bound is also proved, 
namely, that more than three tiles are needed to 
carry out any computation. 


Applications 
The generic ability to reduce the number of tile
types in a system by increasing the geometric
complexity of these tiles extends many other
constructions in theoretical tile assembly.
Additionally, there may be practical barriers to
systems of many tile types, e.g., additional cost
of manufacturing or longer assembly time due
to heterogeneous combinations of many particle
types, that can be reduced or eliminated by
replacing these systems with systems of fewer,
more complex tiles.


Open Problems 


Obtaining an upper bound on the number of 
steps of a cellular automaton simulable by single- 
tile translation-only systems remains open. It is 
conjectured that a seed assembly of size n can
only carry out O(n^2) steps.

Problem Definition

In bin packing games with selfish items, n items
are to be packed into (at most) n bins, where each
item chooses a bin that it wishes to be packed
into. The cost of an item i of size 0 < s_i ≤
1 is defined based on its size and the contents
of its bin. Nash equilibria (NE) are defined as
solutions where there is no item that can change 
its choice unilaterally and gain from this change. 
Bin packing games were inspired by the well- 
known bin packing problem [2]. In this problem, 
a set of items, each of size in (0, 1], is given. 
The goal is to partition (or pack) the items into 
a minimum number of subsets that are called 
bins. Each bin has unit capacity, and the load 
of a bin is defined to be the total size of items 
packed into it (where the load cannot exceed 1). 
The problem is NP-hard in the strong sense, and 
thus theoretical research has focused on studying 
and developing approximation algorithms, which
produce nearly optimal solutions, and on
online algorithms, which receive the items one by 
one and must assign each item to a bin immedi- 
ately and irrevocably (without any information on 
further items). 

In a bin packing game, every item is operated
by a selfish player. There are n bins, and the
strategy of a player is the bin that it selects. If
the resulting packing is valid (i.e., the load of
no bin exceeds 1), then the items sharing
a bin share its cost proportionally: if B is
a bin (a subset of items), the cost of item i ∈ B is

\[ s_i \Big/ \sum_{k \in B} s_k . \]

If the resulting packing is invalid,
any item packed into an invalid bin has infinite
cost. We are interested in pure Nash equilibria,
and by the term NE, we refer to such an
equilibrium. The problem was presented by Bilò [1].
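
The proportional cost sharing rule can be sketched in a few lines. The function name `item_costs` and the list-based encoding of a packing (item i selects bin `choice[i]`) are illustrative assumptions; exact rationals are used so that the capacity test is not affected by rounding.

```python
from fractions import Fraction
from math import inf


def item_costs(sizes, choice):
    """Cost of each item when item i, of size sizes[i], selects bin choice[i].

    Items in a valid bin (load <= 1) split the unit cost of the bin in
    proportion to their sizes; items in an overfull bin pay infinite cost.
    """
    load = {}
    for s, b in zip(sizes, choice):
        load[b] = load.get(b, 0) + s
    return [s / load[b] if load[b] <= 1 else inf for s, b in zip(sizes, choice)]


if __name__ == "__main__":
    sizes = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)]
    print(item_costs(sizes, [0, 0, 1]))  # items 0 and 1 share bin 0
    print(item_costs(sizes, [0, 0, 0]))  # overfull bin, all costs infinite
```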
There are several directions which can be 
explored. First, one would like to find out if any 
bin packing game has an NE. If this is the case, 
other kinds of equilibria might be of interest as 
well. For a class of games (such that each of 
them has an NE), a process of convergence is 
defined as follows. The process starts with an 
arbitrary configuration, and at each time, an item 
that can reduce its cost is selected and moved to 
another bin (where the cost of this item will be 
smaller than its cost before it is moved). Such a 
process can also be seen as local search. Items 
are moved one at a time; a single move (for one 
item) is called a step. Note that one item can
participate in multiple steps. The questions which
can be asked are whether the process converges
for any initial packing (i.e., reaches a state in
which no further step can be applied) and how large
the number of steps can be. As it turns out, any
bin packing game has at least one NE, and the
processes described here always converge [1, 8].
Since it is possible that the process converges
in exponential time, it is of interest to develop
a polynomial time algorithm that computes NE
packings. Such an algorithm for the problem
defined above was designed by Yu and Zhang
[9]. Finally, once the existence of NE packings
has been established, the primary goal becomes 
the study of the quality of worst-case equilibria. 
This concept is called price of anarchy. For a 
given game G (i.e., a set of items which is an 
input for bin packing), the price of anarchy of this 
game, denoted by POA(G), is the ratio between 
the maximum number of nonempty bins in any 
NE packing and the minimum number of bins in 
any packing (the number of bins in an optimal 
packing, also called the social optimum, denoted 
by OPT(G)). The price of stability is similar,
but best-case equilibria are studied; since Bilò
[1] proved that any game has a social optimum
that is an NE, the price of stability is 1 for any
game.
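
The convergence process described above can be sketched as a simple local search. This is a minimal illustration and not the polynomial time algorithm of [9]; the starting configuration (every item in its own bin), the order in which improving moves are tried, and the function names are all assumptions made for the example.

```python
from fractions import Fraction


def improving_move_exists(s, cur_load, target_load):
    """Moving an item of size s strictly lowers its cost s/load iff the new
    bin stays valid and ends up with a larger load than the current bin."""
    return cur_load < target_load + s <= 1


def improvement_dynamics(sizes):
    """Start with one item per bin and apply improving steps until none remain.

    Returns (choice, steps), where choice[i] is the final bin of item i.
    """
    n = len(sizes)
    choice = list(range(n))
    load = list(sizes)          # load[b] is the current load of bin b
    steps = 0
    moved = True
    while moved:
        moved = False
        for i, s in enumerate(sizes):
            cur = choice[i]
            for b in range(n):
                if b != cur and improving_move_exists(s, load[cur], load[b]):
                    load[cur] -= s
                    load[b] += s
                    choice[i] = b
                    steps += 1
                    moved = True
                    break
    return choice, steps


def is_nash(sizes, choice):
    """Check that no single item can strictly reduce its cost by moving."""
    n = len(sizes)
    load = [0] * n
    for s, b in zip(sizes, choice):
        load[b] += s
    return not any(
        improving_move_exists(s, load[choice[i]], load[b])
        for i, s in enumerate(sizes) for b in range(n) if b != choice[i]
    )


if __name__ == "__main__":
    sizes = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    choice, steps = improvement_dynamics(sizes)
    print(choice, steps, is_nash(sizes, choice))
```

The number of nonempty bins of the returned packing, `len(set(choice))`, is the quantity that the price of anarchy compares against OPT(G).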

The price of anarchy (POA) of a class 
of games (here, the class of all bin packing 
games) is defined to be the supremum POA 
over all games in the class. However, as bin 
packing is typically studied with respect to 
the asymptotic approximation ratio, the POA 
for the bin packing class of games is defined, 
similarly to the asymptotic approximation ratio,
as

\[ \lim_{M \to \infty} \; \sup_{\{G \,:\, OPT(G) \ge M\}} POA(G). \]


Key Results 


The POA was studied already in [1], where Bilò
provided the first bounds on it, a lower bound of
8/5 and an upper bound of 5/3. The quality of NE
solutions was further investigated in [4], where
nearly tight bounds for the POA were given, an
upper bound of 1.6428 and a lower bound of
1.6416 (see also [9]). The parametric POA, which
is the POA for subclasses of games where the size 
of no item exceeds a given value, was considered 
as well [5]. 

NE packings are related to outputs of the 
algorithm First Fit (FF) for bin packing [7]. FF is 
in fact an online algorithm that packs each item, 
in turn, into a minimum index bin where it fits 
(using an empty bin if there is no other option). 
It is not difficult to see that every NE is an output 
of FF; sort the bins of the NE by non-increasing 
loads, and create a list of items according to 
the ordering of bins. FF will create exactly the 
bins of the original packing. Interestingly, the 
POA is significantly smaller than the asymptotic 
approximation ratio of FF (which is equal to 
1.7 [7]). Note that the POA is not equal to the
approximation ratio of any natural algorithm for 
bin packing. 

Some intuition regarding the difference between
the asymptotic approximation ratio of First
Fit and the POA of this class of games can be
shown using a small example. Consider items of
the following sizes (for a sufficiently small ε >
0): 1/6 − 2ε (small items), 1/3 + ε (medium items),
and 1/2 + ε (large items). The worst-case examples
for FF are similar to this example, though the
items of the first two types have a number of
different sizes; small items can be slightly smaller
or slightly larger than 1/6, and medium items can
be slightly smaller or slightly larger than 1/3. Given
the item types defined above, assume that there
are 6N items of each type (for some positive
integer N). When FF receives this input (sorted by
non-decreasing size), it creates N bins with six
small items packed into each bin, 3N bins with
two medium items packed into each bin, and the
remaining items are packed into dedicated bins.
This packing is not an NE, as a medium item
reduces its cost from 1/2 to approximately 2/5 if it
joins a large item. Indeed, roughly speaking, if
an NE packing consists of a large number of bins
(compared to an optimal solution), a bin of this
NE packing either has an item whose size exceeds
1/2 or its load cannot be as small as approximately
2/3. This allows a tighter analysis. Interestingly, in
worst-case examples for the POA, medium items
have sizes that are close to 1/4 instead of 1/3.
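
The small example above can be checked directly. The sketch below implements First Fit as described in the text and runs it on the three item types; the concrete values N = 2 and ε = 1/100 and the function name `first_fit` are illustrative assumptions.

```python
from fractions import Fraction


def first_fit(items):
    """Pack each item, in turn, into the lowest-index bin where it fits,
    opening a new bin when no existing bin has room."""
    bins = []
    for s in items:
        for b in bins:
            if sum(b) + s <= 1:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins


if __name__ == "__main__":
    N, eps = 2, Fraction(1, 100)
    small = Fraction(1, 6) - 2 * eps
    medium = Fraction(1, 3) + eps
    large = Fraction(1, 2) + eps
    items = [small] * (6 * N) + [medium] * (6 * N) + [large] * (6 * N)

    bins = first_fit(items)
    print("FF bins:", len(bins), "optimal bins:", 6 * N)

    # A medium item sharing a bin with another medium item pays 1/2;
    # joining a dedicated large item instead would lower its cost.
    cost_before = medium / (2 * medium)
    cost_after = medium / (medium + large)
    print("medium item cost:", cost_before, "->", cost_after)
```

An optimal packing places one item of each type in every bin, since the three sizes sum to exactly 1, so First Fit uses 10N bins against 6N.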


Related Results 


Bin packing games, where the cost of an item 
is defined differently, were studied. One option 
is to assign equal costs to all players (which 
are packed together into a valid bin) [3, 6]. A 
generalized version where each item has a pos- 
itive weight, and costs are based on cost sharing 
proportional to the weights of items that share a 
bin [3] was studied as well. The weights of items 
in the games described above (those of [1, 4]) 
are equal to their sizes. These are two classes 
of games, for which the POA turns out to be of 
interest. The POA for the class of games with 
equal weights is slightly (strictly) below 1.7, and 
in the case of general weights, the POA is equal 
to 1.7 [3]. Another topic of interest is the quality 
of other kinds of equilibria. Those are strong 
equilibria, which are solutions that are also re- 
silient to deviations of subsets of items reducing 
their costs, and Pareto optimal equilibria, where 
the solution is required to be weakly (or strictly) 
Pareto optimal, that is, there is no alternative 
packing where all items reduce their costs (or a 
packing where no item increases its cost and at 
least one item reduces it) [3]. For these last kinds 
of equilibria, the POA is still above 1.6 (but at 
most 1.7). 

Problem Definition


Consider having a set of resources E in a system.
For each e ∈ E, let d_e(·) be the delay per user
that requests its service, as a function of the total
usage of this resource by all the users. Each such
function is considered to be non-decreasing in
the total usage of the corresponding resource.
Each resource may be represented by a pair of 
points: an entry point to the resource and an exit 
point from it. So, each resource is represented by 
an arc from its entry point to its exit point and the 
model associates with this arc the cost (e.g., the 
delay as a function of the load of this resource) 
that each user has to pay if she is served by this 
resource. The entry/exit points of the resources 
need not be unique; they may coincide in order 
to express the possibility of offering joint service 
to users, that consists of a sequence of resources. 
Here, denote by V the set of all entry/exit points of 
the resources in the system. Any nonempty col- 
lection of resources corresponding to a directed 
path in G = (V, E) comprises an action in the 
system. 

Let N = [n] be a set of users, each willing
to adopt some action in the system. ∀i ∈ N, let
w_i denote user i's demand (e.g., the flow rate
from a source node to a destination node), while
Π_i ⊆ 2^E \ ∅ is the collection of actions, any of
which would satisfy user i (e.g., alternative routes
from a source to a destination node, if G represents
a communication network). The collection
Π_i is called the action set of user i and each
of its elements contains at least one resource.
Any vector r = (r_1, ..., r_n) ∈ Π ≡ ×_{i=1}^{n} Π_i
is a pure strategies profile, or a configuration
of the users. Any vector of real functions
p = (p_1, p_2, ..., p_n) s.t. ∀i ∈ [n], p_i : Π_i → [0, 1]
is a probability distribution over the set of allowable
actions for user i (i.e., Σ_{r_i ∈ Π_i} p_i(r_i) = 1),
and is called a mixed strategies profile for the n
users.

A congestion model typically deals with users
of identical demands, and thus the user cost functions
depend on the number of users adopting each
action [1, 4, 6]. In this work the more general
case is considered, where a weighted congestion
model is the tuple ((w_i)_{i∈N}, (Π_i)_{i∈N}, (d_e)_{e∈E}).
That is, the users are allowed to have different
demands for service from the whole system,
and thus affect the resource delay functions in
a different way, depending on their own weights.


Selfish Unsplittable Flows: Algorithms for Pure Equilibria 


A weighted congestion game associated with this
model is a game in strategic form with the set
of users N and user demands (w_i)_{i∈N}, the action
sets (Π_i)_{i∈N} and cost functions (λ^i_{r_i})_{i∈N, r_i∈Π_i}
defined as follows: For any configuration r ∈ Π
and ∀e ∈ E, let Λ_e(r) = {i ∈ N : e ∈ r_i} be the
set of users exploiting resource e according to r
(called the view of resource e wrt configuration r).
The cost λ^i(r) of user i for adopting
strategy r_i ∈ Π_i in a given configuration r is
equal to the cumulative delay λ_{r_i}(r) along this
path:

\[ \lambda^i(r) = \lambda_{r_i}(r) = \sum_{e \in r_i} d_e(\theta_e(r)) \tag{1} \]

where, ∀e ∈ E, θ_e(r) ≡ Σ_{i∈Λ_e(r)} w_i is the load
on resource e wrt the configuration r.
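
The load and cost definitions in (1) can be illustrated with a minimal sketch. The representation (a configuration as a list of edge tuples, delays as a dictionary of Python callables) and the names `loads` and `user_cost` are assumptions made for this example.

```python
def loads(config, weights):
    """theta_e(r): total weight of the users whose chosen path contains e."""
    theta = {}
    for path, w in zip(config, weights):
        for e in path:
            theta[e] = theta.get(e, 0) + w
    return theta


def user_cost(i, config, weights, delay):
    """lambda^i(r): sum of the delays d_e(theta_e(r)) over the edges of r_i."""
    theta = loads(config, weights)
    return sum(delay[e](theta[e]) for e in config[i])


if __name__ == "__main__":
    # Three resources with non-decreasing delays (illustrative values).
    delay = {"a": lambda x: x, "b": lambda x: 2 * x + 1, "c": lambda x: 3}
    weights = [1, 2]
    config = [("a", "b"), ("b", "c")]   # r_1 and r_2
    print(loads(config, weights))
    print([user_cost(i, config, weights, delay) for i in range(2)])
```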

On the other hand, for a mixed strategies
profile p, the expected cost of user i for adopting
strategy r_i ∈ Π_i is

\[ \lambda^i_{r_i}(p) = \sum_{r^{-i} \in \Pi^{-i}} P(p^{-i}, r^{-i}) \cdot \sum_{e \in r_i} d_e\left(\theta_e(r^{-i} \oplus r_i)\right) \tag{2} \]

where r^{-i} is a configuration of all the users
except for user i, p^{-i} is the mixed strategies
profile of all users except for i, r^{-i} ⊕ r_i is the
new configuration with user i choosing strategy
r_i, and P(p^{-i}, r^{-i}) ≡ ∏_{j ∈ N \ {i}} p_j(r_j) is the
occurrence probability of r^{-i}.


Remark 1 Here notation is abused a little bit and
the model considers the user costs λ^i as functions
whose exact definition depends on the other
users' strategies: In the general case of a mixed
strategies profile p, (2) is valid and expresses the
expected cost of user i wrt p, conditioned on the
event that i chooses path r_i. If the other users
adopt a pure strategies profile r^{-i}, we get the
special form of (1) that expresses the exact cost
of user i choosing action r_i.


A congestion game in which all users are
indistinguishable (i.e., they have the same user
cost functions) and have the same action set, is
called symmetric. When each user's action set Π_i
consists of sets of resources that comprise (simple)
paths between a unique origin-destination
pair of nodes (s_i, t_i) in a network G = (V, E),
the model refers to a network congestion game.
If additionally all origin-destination pairs of the
users coincide with a unique pair (s, t), one gets
a single-commodity network congestion game
and then all users share exactly the same action
set. Observe that a single-commodity network 
congestion game is not necessarily symmetric 
because the users may have different demands 
and thus their cost functions will also differ. 


Selfish Behavior 

Fix an arbitrary (mixed in general) strategies profile
p for a congestion game ((w_i)_{i∈N}, (Π_i)_{i∈N},
(d_e)_{e∈E}). We say that p is a Nash Equilibrium
(NE) if and only if ∀i ∈ N, ∀r_i, π_i ∈
Π_i, p_i(r_i) > 0 ⇒ λ^i_{r_i}(p) ≤ λ^i_{π_i}(p). A configuration
r ∈ Π is a Pure Nash Equilibrium (PNE)
if and only if ∀i ∈ N, ∀π_i ∈ Π_i, λ_{r_i}(r) ≤ λ_{π_i}(r^{-i} ⊕ π_i),
where r^{-i} ⊕ π_i is the same configuration
as r except for user i, who now chooses
action π_i.


Key Results 


In this section the article deals with the existence 
and tractability of PNE in weighted network 
congestion games. First, it is shown that it is 
not always the case that a PNE exists, even for 
a weighted single-commodity network conges- 
tion game with only linear and 2-wise linear (e.g., 
the maximum of two linear functions) resource 
delays. In contrast, it is well known [1, 6] that any 
unweighted (not necessarily single-commodity, 
or even network) congestion game has a PNE, for 
any kind of nondecreasing delays. It should be 
mentioned that the same result has been indepen- 
dently proved also by [3]. 


Lemma 1 There exist instances of weighted 
single-commodity network congestion games 
with resource delays being either linear or 2- 
wise linear functions of the loads, for which there 
is no PNE. 


Theorem 2 For any weighted multi-commodity
network congestion game with linear resource 
delays, at least one PNE exists and can be com- 
puted in pseudo-polynomial time. 


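Proof Fix an arbitrary network G = (V, E) with
linear resource/edge delays d_e(x) = a_e x + b_e,
e ∈ E, a_e, b_e ≥ 0. Let r ∈ Π be an arbitrary
configuration for the corresponding weighted
multi-commodity congestion game on G. For
the configuration r consider the potential
Φ(r) = C(r) + W(r), where

\[ C(r) = \sum_{e \in E} d_e(\theta_e(r))\,\theta_e(r) = \sum_{e \in E} \left[ a_e \theta_e^2(r) + b_e \theta_e(r) \right] \]

and

\[ W(r) = \sum_{i=1}^{n} \sum_{e \in r_i} d_e(w_i)\,w_i = \sum_{e \in E} \sum_{i \in \Lambda_e(r)} d_e(w_i)\,w_i = \sum_{e \in E} \sum_{i \in \Lambda_e(r)} \left( a_e w_i^2 + b_e w_i \right). \]

For a configuration r' that differs from r only in
the strategy of a single user i, one concludes that

\[ \Phi(r') - \Phi(r) = 2 w_i \left[ \lambda^i(r') - \lambda^i(r) \right]. \]
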

Note that the potential is a global system 
function whose changes are proportional to self- 
ish cost improvements of any user. The global 
minima of the potential then correspond to con- 
figurations in which no user can improve her 
cost acting unilaterally. Therefore, any weighted 
multi-commodity network congestion game with 
linear resource delays admits a PNE. □
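
The potential argument can be checked numerically on a small instance. The sketch below assumes the potential Φ = C + W for linear delays d_e(x) = a_e x + b_e, with a unilateral deviation of user i changing Φ by 2 w_i times the change in that user's cost. The instance, the edge names, and the function names are hypothetical, and the simple improvement loop is only an illustration, not the pseudo-polynomial algorithm of Theorem 2.

```python
from itertools import product


def loads(config, weights):
    """theta_e(r) for every edge used by at least one user."""
    theta = {}
    for path, w in zip(config, weights):
        for e in path:
            theta[e] = theta.get(e, 0) + w
    return theta


def cost(i, config, weights, coeff):
    """lambda^i(r) with linear delays d_e(x) = a_e * x + b_e, coeff[e] = (a_e, b_e)."""
    theta = loads(config, weights)
    return sum(coeff[e][0] * theta[e] + coeff[e][1] for e in config[i])


def potential(config, weights, coeff):
    """Phi(r) = C(r) + W(r) as defined in the proof."""
    theta = loads(config, weights)
    c = sum(coeff[e][0] * t * t + coeff[e][1] * t for e, t in theta.items())
    w = sum(coeff[e][0] * wi * wi + coeff[e][1] * wi
            for path, wi in zip(config, weights) for e in path)
    return c + w


def find_pne(actions, weights, coeff):
    """Apply improving unilateral moves until none is left (potential decreases)."""
    config = [acts[0] for acts in actions]
    improved = True
    while improved:
        improved = False
        for i, acts in enumerate(actions):
            for alt in acts:
                trial = config[:i] + [alt] + config[i + 1:]
                if cost(i, trial, weights, coeff) < cost(i, config, weights, coeff):
                    config, improved = trial, True
    return config


if __name__ == "__main__":
    coeff = {"e1": (1, 0), "e2": (2, 1), "e3": (1, 3)}
    weights = [1, 2, 3]
    actions = [[("e1", "e2"), ("e3",)],
               [("e1",), ("e2", "e3")],
               [("e1", "e3"), ("e2",)]]
    # Check the potential identity for every unilateral deviation.
    for r in product(*actions):
        r = list(r)
        for i, acts in enumerate(actions):
            for alt in acts:
                r2 = r[:i] + [alt] + r[i + 1:]
                lhs = potential(r2, weights, coeff) - potential(r, weights, coeff)
                rhs = 2 * weights[i] * (cost(i, r2, weights, coeff) - cost(i, r, weights, coeff))
                assert lhs == rhs
    print("PNE:", find_pne(actions, weights, coeff))
```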


Applications 


In [5] many experiments have been conducted for 
several classes of pragmatic networks. The ex- 
periments show even faster convergence to pure 
Nash Equilibria. 


Open Problems


The potential function reported here is polynomial
in the loads of the users. It is open whether
one can find a purely combinatorial potential,
which would allow strongly polynomial time for
finding pure Nash equilibria.

Problem Definition


An algorithm is self-stabilizing if it eventually 
manifests correct behavior regardless of initial 
state. The general problem is to devise self- 
stabilizing solutions for a specified task. The 
property of self-stabilization is now known to 
be feasible for a variety of tasks in distributed 
computing. Self-stabilization is important for dis- 
tributed systems and network protocols subject to 
transient faults. Self-stabilizing systems automat- 
ically recover from faults that corrupt state. 

The operational interpretation of  self- 
stabilization is depicted in Fig. 1. Part (a) of the 
figure is an informal presentation of the behavior 
of a self-stabilizing system, with time on the x- 
axis and some informal measure of correctness 
on the y-axis. The curve illustrates a system 
trajectory, through a sequence of states, during 
execution. At the initial state, the system state 
is incorrect; later, the system enters a correct 
state, then returns to an incorrect state, and 
subsequently stabilizes to an indefinite period 
where all states are correct. This period of 
stability is disrupted by a transient fault that 
moves the system to an incorrect state, after 
which the scenario above repeats. Part (b) of the 
figure illustrates the scenario in terms of state 
predicates. The box represents the predicate
true, which characterizes all possible states.
Predicate C characterizes the correct states of the
system, and L ⊆ C depicts the closed legitimacy
predicate. Reaching a state in L corresponds to
entering a period of stability in part (a). Given an
algorithm A with this type of behavior, it is said
that A self-stabilizes to L; when L is implicitly
understood, the statement is simplified to: A is
self-stabilizing.

Problem [3]. The first setting for self-stabilization
posed by Dijkstra is a ring of n
processes numbered 0 through n − 1. Let the
state of process i be denoted by x[i]. Communication
is unidirectional in the ring using a shared
state model. An atomic step of process i can be
expressed by a guarded assignment of the form
g(x[i ⊖ 1], x[i]) → x[i] := f(x[i ⊖ 1], x[i]).
Here, ⊖ is subtraction modulo n, so that x[i ⊖ 1]
is the state of the previous process in the ring with
respect to process i. The guard g is a boolean
expression; if g(x[i ⊖ 1], x[i]) is true, then
process i is said to be privileged (or enabled).
Thus in one atomic step, privileged process 
i reads the state of the previous process and 
computes a new state. Execution scheduling is 
controlled by a central daemon, which fairly 
chooses one among all enabled processes to 
take the next step. The problem is to devise g 
and f so that, regardless of initial states of x[i],
0 ≤ i < n, eventually there is one privilege and
every process enjoys a privilege infinitely often.


Complexity Metrics 

The complexity of self-stabilization is evaluated
by measuring the resources needed for convergence
from an arbitrary initial state. Most prominent
in the literature of self-stabilization are metrics
for worst-case time of convergence and space
required by an algorithm solving the given task.
Additionally, for reactive self-stabilizing algo- 
rithms, metrics are evaluated for the stable behav- 
ior of the algorithm, that is, starting from a le- 
gitimate state, and compared to non-stabilizing 
algorithms, to measure costs of self-stabilization. 


Key Results 


Composition 

Many self-stabilizing protocols have a layered
construction. Let {A_i}_{i=0}^{m−1} be a set of programs
with the property that for every state variable x,
if program A_i writes x, then no program A_j, for
j > i, writes x. Programs in {A_j}_{j=i+1}^{m−1} may
read variables written by A_i, that is, they use
the output of A_i as input. Fair composition of
programs B and C, written B[]C, assumes fair
scheduling of steps of B and C. Let X_i be the set
of variables read by A_i and possibly written by
{A_j}_{j=0}^{i−1}.

[Self-Stabilization, Fig. 1: Self-stabilization. Part (a) plots correctness over time, including a transient fault.]


Theorem 1 (Fair Composition [4]) Suppose A_i
is self-stabilizing to L_i under the assumption that
all variables in X_i remain constant throughout
any execution; then A_0 [] A_1 [] ··· [] A_{m−1}
self-stabilizes to {L_i}_{i=0}^{m−1}.


Fair composition with a layered set {A_i}_{i=0}^{m−1}


corresponds to sequential composition of phases 
in a distributed algorithm. For instance, let 
B be a self-stabilizing algorithm for mutual 
exclusion in a network that assumes the existence 
of a rooted, spanning tree and let algorithm 
C be a self-stabilizing algorithm to construct 
a rooted spanning tree in a connected network; 
then B []C is a self-stabilizing mutual exclusion 
algorithm for a connected network. 


Synchronization Tasks 

One question related to the problem posed in sec- 
tion “Problem Definition” is whether or not there 
can be a uniform solution, where all processes 
have identical algorithms. Dijkstra’s result for the 
unidirectional ring is a semi-uniform solution (all 
but one process have the same algorithm), using 
n states per process. The state of each process is 
a counter: process 0 increments the counter mod- 
ulo k, where k > n suffices for convergence; the 
other processes copy the counter of the preceding 
process in the ring. At a legitimate state, each 
time process 0 increments the counter, the resulting
value is different from all other counters in
the ring. This ring algorithm turns out to be self-
stabilizing for the distributed daemon (any subset
of privileged processes may execute in parallel)
when k > n. Subsequent results have established
that mutual exclusion on a unidirectional ring
requires only $\Theta(1)$ space per process with a
non-uniform solution. Deterministic uniform solutions
to this task are generally impossible, with the
exception of the case where n is prime. Randomized uniform
solutions are known for arbitrary n, using O(lg a)
space where a is the smallest number that does 
not divide n. Some lower bounds on space for 
uniform solutions are derived in [7]. The time complexity
of Dijkstra’s algorithm is $O(n^2)$ rounds,
and some randomized solutions have been shown 
to have expected O(n) convergence time. 
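
The counter scheme described above can be simulated in a few lines. The following is a minimal sketch, not the presentation of [3]: it assumes the usual privilege rules for this ring (process 0 is privileged when its counter equals that of process n − 1, and any other process is privileged when its counter differs from its predecessor's) and a central daemon that fires one privileged process chosen at random. The function names are illustrative.

```python
import random

def privileged(state, i):
    """Process 0 holds a privilege when it equals its predecessor (process n-1);
    every other process holds one when it differs from its predecessor."""
    if i == 0:
        return state[0] == state[-1]
    return state[i] != state[i - 1]

def move(state, i, k):
    """Execute the move of process i: process 0 increments modulo k,
    the other processes copy the counter of the preceding process."""
    if i == 0:
        state[0] = (state[0] + 1) % k
    else:
        state[i] = state[i - 1]

def run(state, k, steps, rng):
    """Central daemon: at each step one privileged process moves.
    Returns the number of privileges present after each step."""
    history = []
    for _ in range(steps):
        enabled = [i for i in range(len(state)) if privileged(state, i)]
        move(state, rng.choice(enabled), k)
        history.append(sum(privileged(state, i) for i in range(len(state))))
    return history

if __name__ == "__main__":
    ring = [3, 1, 4, 1, 5]          # arbitrary (possibly illegitimate) start, n = 5
    counts = run(ring, k=6, steps=200, rng=random.Random(0))
    print(counts[:10], "...", counts[-1])
```

At least one process is always privileged (if no copying process is privileged, all counters are equal and process 0 is), so the daemon never stalls; once a single privilege remains, it circulates around the ring.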

Dijkstra also presented a solution to mutual 
exclusion for a linear array of processes, using 
O(1) space per process [3]. This result was later 
generalized to a rooted tree of processes, but 
with mutual exclusion relaxed to having one 
privilege along any path from root to leaf. Sub- 
sequent research built on this theme, showing 
how tasks for distributed wave computations have 
self-stabilizing solutions. Tasks of phase syn- 
chronization and clock synchronization have also 
been solved. See reference [9] for an example of 
self-stabilizing mutual exclusion in a multipro- 
cessor shared memory model. 


Graph Algorithms 

Communication networks are commonly
represented with graph models, and the need
for distributed graph algorithms that tolerate
transient faults motivates the study of such tasks.
Specific results in this area include self- 
stabilizing algorithms for spanning trees, center- 
finding, matching, planarity testing, coloring, 
finding independent sets, and so forth. Generally, 
all graph tasks can be solved by self-stabilizing 
algorithms: tasks that have network topology and 
possibly related factors, such as edge weights, 
for input, and define outputs to be a function of 
the inputs, can be solved by general methods for 
self-stabilization. These general methods require
considerable space and time resources, and may
also use stronger model assumptions than needed 
for specific tasks, for instance unique process 
identifiers and an assumed bound on network 
diameter. Therefore research continues on graph 
algorithms. 

One discovery emerging from research on 
self-stabilizing graph algorithms is the difference 
between algorithms that terminate and those that 
continuously change state, even after outputs 
are stable. Consider the task of constructing 
a spanning tree rooted at process r. Some
algorithms self-stabilize to the property that,
for every $p \neq r$, the variable $u_p$ refers to p’s
parent in the spanning tree and the state remains
unchanged. Other algorithms are self-stabilizing
protocols for token circulation with the side
effect that the circulation route of the token
establishes a spanning tree. The former type
of algorithm has $O(\lg n)$ space per process,
whereas the latter has $O(\lg \delta)$, where $\delta$ is the
degree (number of neighbors) of a process. This
difference was formalized in the notion of silent
algorithms, which eventually stop changing any
communication value; it was shown in [5] for
the link register model that silent algorithms for
many graph tasks have $\Omega(\lg n)$ space.


Transformation 

The simple presentation of [3] is enabled by the 
abstract computation model, which hides details 
of communication, program control, and atom- 
icity. Self-stabilization becomes more compli- 
cated when considering conventional architec- 
tures that have messages, buffers, and program 
counters. A natural question is how to transform 
or refine self-stabilizing algorithms expressed in
abstract models to concrete models closer to
practice. As an example, consider the problem 
of transforming algorithms written for the cen- 
tral daemon to the distributed daemon model. 
This transformation can be reduced to finding 
a self-stabilizing token-passing algorithm for the 
distributed daemon model such that, eventually, 
no two neighboring processes concurrently have 
a token; multiple tokens can increase the effi- 
ciency of the transformation. 


General Methods 

The general problem of constructing a self- 
stabilizing algorithm for an input nonreactive task 
can be solved using standard tools of distributed 
computing: snapshot, broadcast, system reset, 
and synchronization tasks are building blocks 
so that the global state can be continuously 
validated (in some fortunate cases $\mathcal{L}$ can be
locally checked and corrected). These building 
blocks have self-stabilizing solutions, enabling 
the general approach. 


Fault Tolerance 

The connection between self-stabilization and 
transient faults is implicit in the definition. Self- 
stabilization is also applicable in executions that 
asynchronously change inputs, silently crash and 
restart, and perturb communication [10]. One 
objection to the mechanism of self-stabilization, 
particularly when general methods are applied, is 
that a small transient fault can lead to a system- 
wide correction. This problem has been inves- 
tigated, for example in [8], where it is shown 
how convergence can be optimized for a limited 
number of faults. Self-stabilization has also been 
combined with other types of failure tolerance, 
though this is not always possible: the task of 
counting the number of processes in a ring has 
no self-stabilizing solution in the shared state 
model if a process may crash [1], unless a failure 
detector is provided. 


Applications 


Many network protocols are self-stabilizing 
by the following simple strategy: periodically,
they discard current data and regenerate it
from trusted information sources. This idea 
does not work in purely asynchronous systems; 
the availability of real-time clocks enables 
the simple strategy. Similarly, watchdogs with 
hardware clocks can provide an effective basis for 
self-stabilization [6].


Problem Definition


Semi-supervised learning [1, 4, 5, 8, 12] refers to
the problem of using a large unlabeled data set U 
together with a given labeled data set L in order 
to generate prediction rules that are more accurate 
on new data than would have been achieved using 
just L alone. Semi-supervised learning is moti- 
vated by the fact that in many settings (e.g., doc- 
ument classification, image classification, speech 
recognition), unlabeled data is plentiful but la- 
beled data is more limited or expensive, e.g., due 
to the need for human labelers. Therefore, one 
would like to make use of the unlabeled data if 
possible. 

The general idea behind semi-supervised 
learning is that unlabeled data, while missing 
the labels, nonetheless often contains useful 
information. As an example, suppose one 
believes the correct decision boundary for 
some classification problem should be a linear 
separator that separates most of the data by a 
large margin. By observing enough unlabeled 
data to estimate the probability mass near to any 
given linear separator, one could in principle then 
discard separators in advance that slice through 
dense regions and instead focus attention on just
those that indeed separate most of the distribution
by a large margin. This is the high-level idea 
behind semi-supervised SVMs. Alternatively, 
suppose data objects can be described by two 
different “kinds” of features, and one believes 
that each kind should be sufficient to produce an 
accurate classifier. Then one might want to train 
a pair of classifiers and use unlabeled data for 
which one classifier is confident but the other 
is not to bootstrap, labeling such examples with 
the confident classifier and then feeding them as 
training data to the less-confident classifier. This 
is the high-level idea behind Co-Training. Or, if 
one believes “similar examples should generally
have the same label,” one might construct a
graph with an edge between examples that are 
sufficiently similar and aim for a classifier that 
is correct on the labeled data and has a small cut 
value on the unlabeled data; this is the high-level 
idea behind graph-based methods. (These will 
all be discussed in more detail later.) General 
surveys of semi-supervised learning appear in 
[5, 12]. 


A Formal Framework 

We now present a formal model for analyzing 
semi-supervised learning due to Balcan and 
Blum [1]. This model was developed to provide 
a unified explanation for a wide range of 
semi-supervised learning algorithms including 
the semi-supervised SVMs, Co-Training, and 
graph-based methods mentioned above. Before 
describing it, however, we first describe the 
classic PAC and agnostic learning models for 
supervised learning that this model builds on. 

In the PAC and agnostic learning models, data
is assumed to be drawn iid from some fixed but
initially unknown distribution D over an instance
space $\mathcal{X}$ and labeled by some unknown target
function $c^* : \mathcal{X} \to \{0, 1\}$. The error of some
hypothesis function h is defined as $err(h) =
\Pr_{x \sim D}[h(x) \neq c^*(x)]$. In the PAC model (also
known as the realizable case), we assume that $c^*$
is a member of some known class of functions C,
and we say that an algorithm PAC-learns C if for
any given $\epsilon, \delta > 0$, with probability $\ge 1 - \delta$, it
produces a hypothesis h such that $err(h) \le \epsilon$.
In the agnostic case, we do not assume that
$c^* \in C$ and instead aim to achieve error close
to $\inf_{f \in C}[err(f)]$.

The PAC and agnostic learning models in
essence assume that one’s prior beliefs about the
target can be described in terms of a class of
functions C. In order to capture the reasoning used
in semi-supervised learning, however, we need to
also describe beliefs about the relation between
the target function and the data distribution. This
is done in the model of Balcan and Blum [1] via
a notion of compatibility $\chi$ between a hypothesis
h and a distribution D. Formally, $\chi$ maps pairs
(h, D) to [0, 1] with $\chi(h, D) = 1$ meaning that
h is highly compatible with D and $\chi(h, D) = 0$
meaning that h is very incompatible with D. The
quantity $1 - \chi(h, D)$ is called the unlabeled error
rate of h and denoted $err_{unl}(h)$. Note that for $\chi$
to be useful, it must be estimatable from a finite
sample; to this end, $\chi$ is further required to be
an expectation over individual examples. That is,
overloading notation for convenience, we require
$\chi(h, D) = E_{x \sim D}[\chi(h, x)]$, where $\chi : C \times \mathcal{X} \to
[0, 1]$. As with the class C, one can either assume
that the target is fully compatible ($err_{unl}(c^*) =
0$) or instead aim to do well as a function of
how compatible the target is. The case that we
assume $c^* \in C$ and $err_{unl}(c^*) = 0$ is termed the
“doubly realizable case.” The concept class C and
compatibility notion $\chi$ are both viewed as known.


Examples 

Suppose we believe the target should separate
most data by a large margin $\gamma$. We can represent
this belief by defining $\chi(h, x) = 0$ if x is within
distance $\gamma$ of the decision boundary of h and
$\chi(h, x) = 1$ otherwise. In this case, $err_{unl}(h)$ will
denote the probability mass of D within distance
$\gamma$ of h’s decision boundary. Alternatively, if we
do not wish to commit to a specific value of $\gamma$,
we could define $\chi(h, x)$ to be a smooth function
of the distance of x to the separator defined
by h. As a very different example, in co-training
(described in more detail below), we assume each
example can be described using two “views”
that each are sufficient for classification, that is,
there exist $c_1^*, c_2^*$ such that for each example
$x = (x_1, x_2)$, we have $c_1^*(x_1) = c_2^*(x_2)$. We
can represent this belief by defining a hypothesis
$h = (h_1, h_2)$ to be compatible with an example
$(x_1, x_2)$ if $h_1(x_1) = h_2(x_2)$ and incompatible
otherwise; $err_{unl}(h)$ is then the probability mass
of examples, under D, where the two halves of h
disagree.
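
To make the margin-based compatibility notion concrete, the following is a minimal sketch that estimates the unlabeled error rate of a linear separator from a finite unlabeled sample. It assumes hypotheses of the form sign(w·x + b) and Euclidean distance to the boundary; the function names and the sample points are illustrative only.

```python
def margin_compatibility(w, b, x, gamma):
    """chi(h, x) for the separator h(x) = sign(w.x + b):
    0 if x lies within distance gamma of the decision boundary, else 1."""
    norm = sum(wi * wi for wi in w) ** 0.5
    distance = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return 0.0 if distance < gamma else 1.0

def empirical_unlabeled_error(w, b, unlabeled, gamma):
    """One minus the average compatibility of h over the unlabeled sample U."""
    average = sum(margin_compatibility(w, b, x, gamma) for x in unlabeled) / len(unlabeled)
    return 1.0 - average

if __name__ == "__main__":
    U = [(2.0, 0.0), (-2.0, 0.0), (0.1, 0.0), (3.0, 1.0)]
    # A vertical separator passes close to only one sample point ...
    print(empirical_unlabeled_error((1.0, 0.0), 0.0, U, gamma=0.5))
    # ... while a horizontal one slices through three of the four.
    print(empirical_unlabeled_error((0.0, 1.0), 0.0, U, gamma=0.5))
```

A separator with a small estimated unlabeled error rate is one that avoids the dense regions of the sample, which is the sense in which unlabeled data narrows the search.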


Intuition 

In this framework, the way that unlabeled data 
helps in learning can be intuitively described as 
follows. Suppose one is given a concept class C
(such as linear separators) and a compatibility
notion $\chi$ (such as penalizing h for points within
distance $\gamma$ of the decision boundary). Suppose
also that one believes $c^* \in C$ (or at least is close)
and that $err_{unl}(c^*) = 0$ (or at least is small).
Then, unlabeled data can help by allowing one to
estimate the unlabeled error rate of all $h \in C$,
thereby in principle reducing the search space 
from C (all linear separators) down to just the 
subset of C that is highly compatible with D. The 
key challenge is how this can be done efficiently 
(in theory, in practice, or both) for natural notions 
of compatibility, as well as identifying types of 
compatibility that data in important problems can 
be expected to satisfy. 


Key Results 


The following results, from [1], illustrate formally how
unlabeled data can help in this model. Fix some
concept class C and compatibility notion $\chi$. Given
a labeled sample L, define $\widehat{err}(h)$ to be the
fraction of mistakes of h on L. Given an unlabeled
sample U, define $\chi(h, U) = E_{x \sim U}[\chi(h, x)]$ and
define $\widehat{err}_{unl}(h) = 1 - \chi(h, U)$. That is, $\widehat{err}(h)$
and $\widehat{err}_{unl}(h)$ are the empirical error rate and
empirical unlabeled error rate of h, respectively. Finally,
given $\alpha > 0$, define $C_{D,\chi}(\alpha)$ to be the set of
functions $f \in C$ such that $err_{unl}(f) \le \alpha$.


Theorem 1 ([1]) If $c^* \in C$ then with probability
at least $1 - \delta$, for a random labeled set L and
unlabeled set U, the $h \in C$ that optimizes $\widehat{err}_{unl}(h)$
subject to $\widehat{err}(h) = 0$ will have $err(h) \le \epsilon$ for

$$|U| \ge \frac{2}{\epsilon^2}\left[\ln|C| + \ln\frac{4}{\delta}\right], \qquad
|L| \ge \frac{1}{\epsilon}\left[\ln|C_{D,\chi}(err_{unl}(c^*) + 2\epsilon)| + \ln\frac{2}{\delta}\right].$$

Equivalently, for |U| satisfying the above bound,
for any |L|, with probability at least $1 - \delta$, the
$h \in C$ that optimizes $\widehat{err}_{unl}(h)$ subject to $\widehat{err}(h) =
0$ has

$$err(h) \le \frac{1}{|L|}\left[\ln|C_{D,\chi}(err_{unl}(c^*) + 2\epsilon)| + \ln\frac{2}{\delta}\right].$$


One can view Theorem 1 as bounding the number
of labeled examples needed to learn well as a
function of the “helpfulness” of the distribution
D with respect to $\chi$, for sufficiently large U.
Namely, a helpful distribution is one in which
$C_{D,\chi}(\alpha)$ is small for $\alpha$ slightly larger than the
unlabeled error rate of the true target function, so we
do not need much labeled data to identify a good
function among those in $C_{D,\chi}(\alpha)$.

For infinite hypothesis classes, one needs to
consider both the complexity of the class C and
the complexity of the compatibility notion $\chi$.
Specifically, given $h \in C$, define $\chi_h(x) =
\chi(h, x)$ and let $VCdim(\chi(C))$ denote the VC-
dimension of the set $\{\chi_h \mid h \in C\}$. A sample
complexity bound from [1] based on $\epsilon$-cover size
is the following.


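Theorem 2 ([1]) Assume $c^* \in C$ and let p be
the size of the smallest set of functions H such
that every function in $C_{D,\chi}(err_{unl}(c^*) + \epsilon/3)$ is
$\epsilon/6$-close to some function in H. Then

$$|U| = O\left(\frac{VCdim(\chi(C))}{\epsilon^2}\ln\frac{1}{\epsilon} + \frac{1}{\epsilon^2}\ln\frac{2}{\delta}\right) \quad \text{and} \quad
|L| = O\left(\frac{1}{\epsilon}\ln\frac{p}{\delta}\right)$$

is sufficient to identify a
function $f \in C$ of error at most $\epsilon$ with probability
at least $1 - \delta$.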


Finally, for the general (agnostic) case that $c^* \notin
C$, we can define a regularizer based on empirical
unlabeled error rates, and then get good bounds
for optimizing a combination of the empirical
labeled error and the regularization term. Specifically,
for a hypothesis h, define N(h) to be the
number of ways of partitioning the first |L| points
in U using $\{f \in C : \widehat{err}_{unl}(f) \le \widehat{err}_{unl}(h)\}$. Then
we have


Theorem 3 ([1]) With probability at least $1 - \delta$,
the hypothesis

$$h = \arg\min_{h' \in C}\left[\widehat{err}(h') + R(h')\right], \quad \text{where} \quad
R(h') = \sqrt{\frac{24 \ln N(h')}{|L|}},$$

satisfies

$$err(h) \le \min_{h' \in C}\left[err(h') + R(h')\right] + 5\sqrt{\frac{\ln(8/\delta)}{|L|}}.$$


Co-Training

Co-Training is a semi-supervised learning
method due to [4] for settings in which examples
can be thought of as having two “views,” that is, 
two distinct types of information. For example, 
in classifying webpages (e.g., into student home 
page, faculty member home page, course home 
page, etc.), one could use the words on the page 
itself, but one could also use information from 
links pointing to that page [4]. Or, in classifying
visual images, one might have two cameras or 
even two different filters or preprocessing steps 
on images from the same camera [9]. Or, in 
understanding video, one can use visual images 
and spoken dialogue [7]. In such settings, one 
can think of an example x as a pair $x = (x_1, x_2)$.
The idea of Co-Training is that if each view is in 
principle enough to achieve a good classification 
by itself, but each provides somewhat different 
information, then one can hope to improve 
performance using unlabeled data. Specifically, 
in Co-Training, one maintains two hypotheses, 
one for each view (e.g., a hypothesis that 
classifies webpages based on the text on the page 
itself and one that classifies webpages based on 
information from links pointing to the page). A 
hypothesis pair $h = (h_1, h_2)$ is compatible with
an example $(x_1, x_2)$ if $h_1(x_1) = h_2(x_2)$ and is
incompatible otherwise. So the unlabeled error
rate of a hypothesis (pair) $h = (h_1, h_2)$ is the
probability mass of examples $(x_1, x_2)$ on which
the two parts of h disagree.

In practice, there are two primary ways that 
this notion of compatibility is used to learn from 
a small amount of labeled data and a large 
amount of unlabeled data. The first is iterative
co-training, introduced in [4]. In iterative co-
training, a small labeled sample L is used
to produce predictors for each view that are 
confident in some part of their respective input 
spaces and not confident in other parts. Then, the 
algorithm searches through the (large) unlabeled 
set U to find examples $x = (x_1, x_2)$ for which
one classifier is confident and the other is not. 
These examples are labeled by the confident 
classifier and handed to the less-confident 
classifier to improve its predictor. The other 
primary method is to optimize a global objective 
that combines accuracy over the labeled sample 
L with agreement over the unlabeled sample 
U. That is, one searches for the hypothesis
pair h that minimizes $\widehat{err}(h) + \lambda\,\widehat{err}_{unl}(h)$ for
some regularization parameter $\lambda$ [6, 10]. This is
generally a non-convex optimization problem, 
and so various heuristics are typically applied to 
perform the optimization. 
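
The iterative variant can be sketched in code. The following is a toy illustration, not the procedure of [4]: it assumes each view is a single real number and uses a hypothetical per-view predictor that returns the label of the nearest training point when that point lies within a fixed radius, and abstains otherwise. All names are illustrative.

```python
def predict(train, x, radius):
    """Toy per-view predictor: label of the nearest training point if it is
    within `radius` of x, otherwise None (meaning "not confident")."""
    if not train:
        return None
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1] if abs(nearest[0] - x) <= radius else None

def co_train(labeled, unlabeled, radius, max_rounds=10):
    """labeled: list of ((x1, x2), y); unlabeled: list of (x1, x2).
    Returns the grown training sets for view 1 and view 2."""
    train1 = [(x1, y) for (x1, _), y in labeled]
    train2 = [(x2, y) for (_, x2), y in labeled]
    for _ in range(max_rounds):
        changed = False
        for x1, x2 in unlabeled:
            y1 = predict(train1, x1, radius)
            y2 = predict(train2, x2, radius)
            if y1 is not None and y2 is None:
                train2.append((x2, y1))   # view 1 teaches view 2
                changed = True
            elif y2 is not None and y1 is None:
                train1.append((x1, y2))   # view 2 teaches view 1
                changed = True
        if not changed:
            break
    return train1, train2

if __name__ == "__main__":
    L = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
    U = [(0.5, 3.0), (4.0, 3.5), (9.5, 7.0), (6.0, 6.5)]
    t1, t2 = co_train(L, U, radius=1.0)
    print(predict(t1, 4.2, 1.0), predict(t1, 6.3, 1.0))
```

In this toy run, the view-1 predictor trained on L alone abstains at 4.2 and 6.3, while after co-training it is confident at both points because the other view supplied labels for nearby examples.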

Theoretically, the guarantees for Co-Training
are strongest when the data satisfies independence
given the label (with some probability p, a
random positive example $(x_1, x_2)$ is drawn from
$D_1^+ \times D_2^+$, and with probability $1 - p$, a random
negative example is drawn from $D_1^- \times D_2^-$) and in
the realizable case (there exist targets $c_1^*, c_2^* \in C$
such that all examples $(x_1, x_2)$ in the support
of the distribution satisfy $c_1^*(x_1) = c_2^*(x_2)$).
Specifically, two key results are


Theorem 4 ([4]) Any class C that is efficiently 
PAC-learnable from random classification noise 
is efficiently learnable from unlabeled data alone 
in the realizable Co-Training setting, if data satis- 
fies independence given the label and one is given 
an initial weakly useful predictor $h_1(x_1)$.


Here, h is a weakly useful predictor of a function
f if for some $\epsilon > 1/poly(n)$ we have both (a)
$\Pr_{x \sim D}[h(x) = 1] \ge \epsilon$ and (b) $\Pr_{x \sim D}[f(x) = 1 \mid
h(x) = 1] \ge \Pr_{x \sim D}[f(x) = 1] + \epsilon$. Theorem 4
implies that if one is able to use a small labeled 
sample to produce an initial hypothesis that gives 
a slight “edge” in predicting the target beyond 
just the overall class probabilities, then under 
independence given the label one can boost that 
to a high-accuracy predictor from just unlabeled 
data.

Furthermore, ignoring computation time,
under independence given the label, any class 
of finite VC-dimension is learnable from a 
single labeled example. In the case of linear 
separators, this can be done computationally 
efficiently. 


Theorem 5 ([1]) Any class C of finite VC-
dimension is learnable from polynomially
many unlabeled examples and a single
labeled example if D satisfies independence
given the label. Furthermore, for linear
separators this can be done in polynomial
time.


Semi-supervised SVMs 

Semi-supervised SVMs (also called transductive 
SVMs) [8, 11] aim to find a linear separator 
that separates both the labeled sample L and 
the unlabeled sample U by the largest possible 
margin. That is, one wants to find a separator
such that for $\gamma$ as large as possible, all labeled
examples are on the correct side of the separator
by distance at least $\gamma$ and all unlabeled examples
are on some side of the separator by distance
at least $\gamma$. In practice, one combines a large-
margin objective with a hinge-loss penalty for
labeled examples that fail to satisfy the condition,
and a “hat-loss” penalty for unlabeled examples
that fail to satisfy the condition. Formally, the
goal is to minimize $c_1 w^T w + c_2 \sum_{x_i \in L} \alpha_i +
c_3 \sum_{x_j \in U} \beta_j$ subject to $(w^T x_i) y_i \ge 1 - \alpha_i$ for
all $(x_i, y_i) \in L$ and $(w^T x_j) \hat{y}_j \ge 1 - \beta_j$ for all
$x_j \in U$ (and $\alpha_i, \beta_j \ge 0$), where $y_i \in \{-1, 1\}$
is the (known) label of $x_i \in L$ and $\hat{y}_j \in \{-1, 1\}$
is a variable representing the algorithm’s guess
of the label of $x_j \in U$. While the optimization
problem is NP-hard, a number of heuristics have 
been developed. For example, Joachims [8] uses 
an iterative labeling heuristic to approximately 
optimize the objective. Semi-supervised SVMs 
have been shown to achieve high accuracy in 
a number of text classification domains where 
unlabeled data is plentiful [8]. 
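
For a fixed weight vector w, the objective above can be evaluated directly, which is what local-search heuristics need. The following is a minimal sketch under two stated assumptions: the separator passes through the origin (no bias term), and each unlabeled point receives its best guess sign(w·x), so that its slack is the hat loss max(0, 1 − |w·x|). The function names and default constants are illustrative, not taken from [8].

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def s3vm_objective(w, labeled, unlabeled, c1=1.0, c2=1.0, c3=1.0):
    """labeled: list of (x, y) with y in {-1, +1}; unlabeled: list of x.
    Returns c1 * w.w + c2 * (sum of hinge losses) + c3 * (sum of hat losses)."""
    regularizer = c1 * dot(w, w)
    # Hinge loss: the smallest slack with (w.x) * y >= 1 - slack and slack >= 0.
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in labeled)
    # Hat loss: with the guessed label sign(w.x), the smallest slack
    # is max(0, 1 - |w.x|); it peaks on the decision boundary.
    hat = sum(max(0.0, 1.0 - abs(dot(w, x))) for x in unlabeled)
    return regularizer + c2 * hinge + c3 * hat

if __name__ == "__main__":
    L = [((2.0, 0.0), 1), ((-0.5, 0.0), -1)]
    U = [(0.25, 0.0), (-3.0, 0.0)]
    print(s3vm_objective((1.0, 0.0), L, U))
```

The hat loss is what makes the problem non-convex: it is largest for unlabeled points on the boundary and decreases in both directions away from it.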


Graph-Based Methods

Graph-based methods [3, 13] can be viewed
as a (transductive) semi-supervised version of
nearest-neighbor learning. In these methods, one
creates a graph with a vertex for each example in 
L UU and an edge between two examples x, x’ 
if they are deemed to be sufficiently “similar” (or 
with edge weights based on how similar they are 
deemed to be). Similarity can be directly based 
on distance between the examples in the input 
space or given by some provided kernel function 
k(x, x’). Given the labels for the examples in L, 
one then finds a “most compatible” labeling for 
the examples in U, based on the belief that similar 
examples will typically have the same label. 
Specifically, in the mincut approach of [3], the
labeling h produced is the cut of least total weight
subject to agreeing with the known labels on
examples in L, or equivalently the cut that agrees
with L minimizing $\sum_{e=(x,x')} w_e |h(x) - h(x')|$.
In the algorithm of [13], in order to produce
a smoother solution, the algorithm instead
views the graph as an electrical network,
finding the cut agreeing with L that minimizes
$\sum_{e=(x,x')} w_e (h(x) - h(x'))^2$.
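
The quadratic (electrical network) objective can be illustrated with a short sketch. It rests on assumptions that go beyond the text: labels on unlabeled vertices are allowed to be real values in [0, 1], the minimizer is found by repeatedly replacing each unlabeled value with the weighted average of its neighbors (Gauss-Seidel sweeps), and a hard labeling is obtained by thresholding at 1/2. The graph and function names are illustrative.

```python
def harmonic_labels(weights, known, sweeps=1000):
    """weights: {(u, v): w} for undirected edges; known: {vertex: 0 or 1}.
    Returns real-valued labels that (approximately) minimize
    sum over edges of w * (h(u) - h(v))**2 with the known labels held fixed."""
    neighbors = {}
    for (u, v), w in weights.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    h = {v: float(known.get(v, 0.5)) for v in neighbors}
    for _ in range(sweeps):
        for v in neighbors:
            if v not in known:
                total = sum(w for _, w in neighbors[v])
                h[v] = sum(w * h[u] for u, w in neighbors[v]) / total
    return h

def quadratic_energy(weights, h):
    """The smoothness objective sum_e w_e (h(x) - h(x'))^2."""
    return sum(w * (h[u] - h[v]) ** 2 for (u, v), w in weights.items())

if __name__ == "__main__":
    # A path a - b - c - d with unit weights; only the endpoints are labeled.
    path = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
    h = harmonic_labels(path, {"a": 0, "d": 1})
    print(h, quadratic_energy(path, h))
    print({v: int(value > 0.5) for v, value in h.items()})
```

On the path, the values interpolate linearly between the two labeled endpoints, so each unlabeled vertex takes the label of the labeled vertex it is closer to.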


Open Problems 


There are a number of open problems in
developing computationally efficient semi-
supervised learning algorithms. For example,
can one extend the algorithm of Theorem 5
for Co-Training with linear separators to
weaker conditions than independence given
the label, while maintaining computational
efficiency? (Note: A number of weaker
conditions are known to produce good sample
bounds if computational considerations
are ignored [2].) More broadly, can one
develop efficient algorithms for other classes
or notions of compatibility that meet the
cover-based sample complexity bounds of
Theorem 2? Additional open problems are given
in [1].


Problem Definition


The (balanced) separator problem asks for a cut 
of minimum (edge)-weight in a graph, such that 
the two shores of the cut have approximately 
equal (node)-weight. 

Formally, given an undirected graph G =
(V, E), with a nonnegative edge-weight function
$c : E \to \mathbb{R}_+$, a nonnegative node-weight function
$\pi : V \to \mathbb{R}_+$, and a constant $b \le 1/2$, a cut
$(S : V \setminus S)$ is said to be b-balanced, or a $(b, 1-b)$-
separator, if $b\pi(V) \le \pi(S) \le (1 - b)\pi(V)$
(where $\pi(S)$ stands for $\sum_{v \in S} \pi(v)$).


Problem 1 (b-balanced separator) 

INPUT: Edge- and node-weighted graph $G =
(V, E, c, \pi)$, constant $b \le 1/2$.

OUTPUT: A b-balanced cut $(S : V \setminus S)$. Goal:
minimize the edge weight $c(\delta(S))$.


Closely related is the product sparsest cut prob- 
lem. 


Problem 2 ((Product) Sparsest cut) 

INPUT: Edge- and node-weighted graph $G =
(V, E, c, \pi)$.

OUTPUT: A cut $(S : V \setminus S)$ minimizing the
ratio-cost $(c(\delta(S)))/(\pi(S)\pi(V \setminus S))$.


Problem 2 is the most general version of sparsest
cut solved by Leighton and Rao. Setting all
node weights equal to 1 leads to the uniform
version, Problem 3.


Problem 3 ((Uniform) Sparsest cut)

INPUT: Edge-weighted graph $G = (V, E, c)$.

OUTPUT: A cut $(S : V \setminus S)$ minimizing the
ratio-cost $(c(\delta(S)))/(|S||V \setminus S|)$.
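
The ratio-cost of the uniform problem is easy to compute for a given cut, and on very small graphs the optimum can be found by exhaustive search. The following is a minimal sketch for illustration only: it enumerates all candidate shores and therefore takes exponential time, so it is not a substitute for the approximation algorithms discussed in this entry. The function names and the example graph are assumptions.

```python
from itertools import combinations

def cut_weight(edges, side):
    """c(delta(S)): total weight of edges with exactly one endpoint in S."""
    return sum(c for (u, v), c in edges.items() if (u in side) != (v in side))

def uniform_sparsest_cut(nodes, edges):
    """Return (ratio_cost, S) minimizing c(delta(S)) / (|S| * |V \\ S|).
    Only shores of size at most |V|/2 are enumerated, by symmetry."""
    nodes = sorted(nodes)
    best_ratio, best_side = None, None
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            side = set(subset)
            ratio = cut_weight(edges, side) / (len(side) * (len(nodes) - len(side)))
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_side = ratio, side
    return best_ratio, best_side

if __name__ == "__main__":
    # Two triangles joined by the single edge c - d, all weights 1.
    edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
             ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
             ("c", "d"): 1}
    print(uniform_sparsest_cut("abcdef", edges))
```

In the example, separating the two triangles cuts one edge and splits the nodes three against three, giving ratio-cost 1/9, which beats every cut that isolates one or two nodes.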


Sparsest cut arises as the (integral version of 
the) linear programming dual of concurrent mul- 
ticommodity flow (Problem 4). An instance of 
a multicommodity flow problem is defined on 
an edge-weighted graph by specifying for each
of k commodities a source $s_i \in V$, a sink $t_i \in
V$, and a demand $D_i$. A feasible solution to the
multicommodity flow problem defines for each
commodity a flow function on E, thus routing
a certain amount of flow from $s_i$ to $t_i$. The edge
weights represent capacities, and for each edge 
e, a capacity constraint is enforced: the sum of 
all commodities’ flows through e is at most the 
capacity c(e). 


Problem 4 (Concurrent multicommodity flow)

INPUT: Edge-weighted graph $G = (V, E, c)$,
commodities $(s_1, t_1, D_1), \ldots, (s_k, t_k, D_k)$.

OUTPUT: A multicommodity flow that routes
$f D_i$ units of commodity i from $s_i$ to $t_i$ for each i
simultaneously, without violating the capacity of
any edge. Goal: maximize f.


Problem 4 can be solved in polynomial
time by linear programming, and approximated
arbitrarily well by several more
efficient combinatorial algorithms (section
“Implementation”). The maximum value f
for which there exists a multicommodity
flow is called the max-flow of the instance.
The min-cut is the minimum ratio
$(c(\delta(S)))/(D(S, V \setminus S))$, where $D(S, V \setminus S) =
\sum_{i : |\{s_i, t_i\} \cap S| = 1} D_i$. This dual interpretation
motivates the most general version of the
problem, the nonuniform sparsest cut (Problem 5).


Problem 5 ((Nonuniform) Sparsest cut)

INPUT: Edge-weighted graph $G = (V, E, c)$,
commodities $(s_1, t_1, D_1), \ldots, (s_k, t_k, D_k)$.

OUTPUT: A min-cut $(S : V \setminus S)$, that is, a cut of
minimum ratio-cost $(c(\delta(S)))/(D(S, V \setminus S))$.


(Most literature focuses on either the uniform or 
the general nonuniform version, and both of these 
two versions are sometimes referred to as just the 
“sparsest cut” problem.) 


Key Results 


Even when all (edge- and node-) weights are
equal to 1, finding a minimum-weight b-balanced
cut is NP-hard (for b = 1/2, the problem
becomes graph bisection). Leighton and
Rao [23, 24] give a pseudo-approximation
algorithm for the general problem.


Theorem 1 There is a polynomial-time
algorithm that, given a weighted graph $G =
(V, E, c, \pi)$, $b \le 1/2$ and $b' < \min\{b, 1/3\}$,
finds a $b'$-balanced cut of weight $O((\log n)/(b -
b'))$ times the weight of the minimum b-balanced
cut.


The algorithm solves the sparsest cut problem
on the given graph, puts aside the smaller-weight
shore of the cut, and recurses on the larger-weight
shore until both shores of the sparsest cut found
have weight at most $(1 - b')\pi(G)$. Now the larger-
weight shore of the last iteration’s sparsest cut is
returned as one shore of the balanced cut, and
everything else as the other shore. Since the sparsest
cut problem is itself NP-hard, Leighton and Rao
first required an approximation algorithm for this
problem.
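
The recursion just described can be sketched as follows. This is a toy illustration under stated assumptions: all node weights are 1, the graph has at least two nodes, and the sparsest cut subroutine is an exhaustive search standing in for the approximation algorithm (so the sketch only runs on tiny graphs). The function names are hypothetical.

```python
from itertools import combinations

def cut_weight(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(c for (u, v), c in edges.items() if (u in side) != (v in side))

def sparsest_cut(nodes, edges):
    """Exhaustive uniform sparsest cut; returns one shore of the best cut."""
    nodes = sorted(nodes)
    best = None
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            side = set(subset)
            ratio = cut_weight(edges, side) / (len(side) * (len(nodes) - len(side)))
            if best is None or ratio < best[0]:
                best = (ratio, side)
    return best[1]

def balanced_cut(nodes, edges, b_prime):
    """Repeatedly cut the remaining larger shore until both shores of the
    latest sparsest cut have at most (1 - b_prime) * |V| nodes."""
    total = len(nodes)
    current = set(nodes)
    while True:
        inside = {e: c for e, c in edges.items()
                  if e[0] in current and e[1] in current}
        side = sparsest_cut(current, inside)
        larger = max(side, current - side, key=len)
        if len(larger) <= (1 - b_prime) * total:
            # Larger shore of the last cut versus everything set aside so far.
            return larger, set(nodes) - larger
        current = larger

if __name__ == "__main__":
    # Node 0 hangs off a 4-clique by a light edge.
    edges = {(0, 1): 0.1, (1, 2): 1, (1, 3): 1, (1, 4): 1,
             (2, 3): 1, (2, 4): 1, (3, 4): 1}
    print(balanced_cut([0, 1, 2, 3, 4], edges, b_prime=1 / 3))
```

In the example, the first sparsest cut merely peels off the pendant node, which is too unbalanced, so the procedure recurses on the clique; the final cut has shores of sizes 3 and 2, which is 1/3-balanced.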


Theorem 2 There is a polynomial-time
algorithm with approximation ratio $O(\log p)$
for product sparsest cut (Problem 2), where
p denotes the number of nonzero-weight nodes in
the graph.


This algorithm follows immediately from Theo- 
rem 3. 


Theorem 3 There is a polynomial-time algorithm
that finds a cut $(S : V \setminus S)$ with ratio-cost
$(c(\delta(S)))/(\pi(S)\pi(V \setminus S)) \in O(f \log p)$, where
f is the max-flow for the product multicommodity
flow and p the number of nodes with nonzero
weight.


The proof of Theorem 3 is based on solving 
a linear programming formulation of the multi- 
commodity flow problem and using the solution 
to construct a sparse cut. 


Related Results 

Shahrokhi and Matula [27] gave a max-flow min- 
cut theorem for a special case of the multicom- 
modity flow problem and used a similar LP- 
based approach to prove their result. An O(log n) 
upper bound for arbitrary demands was proved by 
Aumann and Rabani [6] and Linial et al. [26]. In 
both cases, the solution to the dual of the mul- 
ticommodity flow linear program is interpreted 
as a finite metric and embedded into $\ell_1$ with
distortion $O(\log n)$, using an embedding due to
Bourgain [10]. The resulting $\ell_1$ metric is a convex
combination of cut metrics, from which a cut can 
be extracted with sparsity ratio at least as good as 
that of the combination. 

Arora et al. [5] gave an $O(\sqrt{\log n})$ pseudo-
approximation algorithm for (uniform or product-
weight) balanced separators, based on a semidefinite
programming relaxation. For the nonuniform
version, the best bound is $O(\sqrt{\log n} \log\log n)$
due to Arora et al. [4]. Khot and Vishnoi [18]
showed that, for the nonuniform version of the
problem, the semidefinite relaxation of [5] has
an integrality gap of at least $(\log\log n)^{1/6-\delta}$
for any $\delta > 0$, and further, assuming their
Unique Games Conjecture, that it is NP-hard
to (pseudo)-approximate the balanced separator
problem to within any constant factor. The SDP
integrality gap was strengthened to $\Omega(\log\log n)$
by Krauthgamer and Rabani [20]. Devanur
et al. [11] show an $\Omega(\log\log n)$ integrality gap
for the SDP formulation even in the uniform
case.


Implementation 

The bottleneck in the balanced separator algo- 
rithm is solving the multicommodity flow linear 
program. There exists a substantial amount of 
work on fast approximate solutions to such linear
programs [19, 22, 25]. In most of the following
results, the algorithm produces a $(1 + \epsilon)$-
approximation, and its hidden constant depends
on $\epsilon^{-2}$. Garg and Könemann [15], Fleischer [14]
and Karakostas [16] gave efficient approximation
schemes for multicommodity flow and related
problems, with running times $O((k + m)m)$ [15]
and $O(m^2)$ [14, 16]. Benczúr and Karger [7] gave
an $O(\log n)$ approximation to sparsest cut based
on randomized minimum cut and running in time
$O(n^2)$. The current fastest $O(\log n)$ sparsest cut
(balanced separator) approximation is based on
a primal-dual approach to semidefinite programming
due to Arora and Kale [3], and runs in time
$O(m + n^{3/2})$ ($O(m + n^{3/2})$, respectively). The
same paper gives an $O(\sqrt{\log n})$ approximation in
time $O(n^2)$ ($O(n^2)$, respectively), improving on
a previous $O(n^2)$ algorithm of Arora et al. [2].
If an $O(\log^2 n)$ approximation is sufficient, then
sparsest cut can be solved in time $O(n^{3/2})$, and
balanced separator in time $O(m + n^{3/2})$ [17].


Applications 


Many problems can be solved by using a bal- 
anced separator or sparsest cut algorithm as a sub- 
routine. The approximation ratio of the resulting 
algorithm typically depends directly on the ra- 
tio of the underlying subroutine. In most cases, 
the graph is recursively split into pieces of balanced
size. In addition to the $O(\log n)$ approximation
factor required by the balanced separator
algorithm, this leads to another $O(\log n)$ factor
due to the recursion depth. Even et al. [12] improved
many results based on balanced separators
by using spreading metrics, reducing the approximation
guarantee to $O(\log n \log\log n)$ from
$O(\log^2 n)$.

Some applications are listed here; where no 
reference is given, and for further examples, 
see [24]. 


• Minimum cut linear arrangement and minimum
  feedback arc set. One single algorithm
  provides an $O(\log^2 n)$ approximation for both
  of these problems.

• Minimum chordal graph completion and elimination
  orderings [1]. Elimination orderings
  are useful for solving sparse symmetric linear
  systems. The $O(\log^2 n)$ approximation algorithm
  of [1] for chordal graph completion has
  been improved to $O(\log n \log\log n)$ by Even
  et al. [12].

• Balanced node cuts. The cost of a balanced
  cut may be measured in terms of the weight of
  nodes removed from the graph. The balanced
  separator algorithm can be easily extended to
  this node-weighted case.

• VLSI layout. Bhatt and Leighton [8] studied
  several optimization problems in VLSI
  layout. Recursive partitioning by a balanced
  separator algorithm leads to polylogarithmic
  approximation algorithms for crossing number,
  minimum layout area, and other problems.

• Treewidth and pathwidth. Bodlaender et al. [9]
  showed how to approximate treewidth within
  $O(\log n)$ and pathwidth within $O(\log^2 n)$ by
  using balanced node separators.

• Bisection. Feige and Krauthgamer [13] gave
  an $O(\alpha \log n)$ approximation for the minimum
  bisection, using any $\alpha$-approximation algorithm
  for sparsest cut.


Experimental Results 


Lang and Rao [21] compared a variant of the 
sparsest cut algorithm from [24] to methods used 
in graph decomposition for VLSI design. 


Cross-References


Problem Definition


The problem is to detect specific patterns in
strings and in string pairs for the discovery
of sequence and spatial motifs in membrane
proteins. A spatial interaction motif of residue
pair X-Y is defined as a pattern in which a
character or residue of type X is found interacting
with a residue of type Y on two strings or
sequences (Fig. 1a). We define a sequence pair
XYk as a pattern in which a residue of type Y
is found at the k-th position from a residue of
type X along a single sequence (Fig. 1b). The
propensity $P(X,Y)$ of residue pairing XY is
$P(X,Y) = f_{obs}(X,Y) / E[f(X,Y)]$, where $f_{obs}(X,Y)$ is the
observed count of XY patterns and $E[f(X,Y)]$
is the expected count of XY patterns according
to some random null model. We define a motif
as a residue pair whose propensity is >1.0 (or greater
than some other predefined limit) and statistically
significant.


Sequence and Spatial Motif Discovery in Short Sequence
Fragments, Fig. 1 Examples of spatial and
sequence patterns. (a) Two X-Y spatial patterns on
interacting sequences. (b) An XY3 sequence pattern


The null model for calculating $E[f(X,Y)]$ is
critical for motif detection. For short sequence
fragments, the null model for spatial motif detection
cannot be the $\chi^2$ distribution as was used
in [13], since the assumption of Gaussian distribution
is not valid for short sequences. The null
model for sequence motif detection cannot be the
binomial distribution as was used in [4, 10], since
the assumption of drawing from a universal population
with replacement is unrealistic for short
sequence fragments. Instead, we use a combinatorial
model called the permutation model, which is more
effective for the discovery of motifs [5-7]. This null
model is similar for both pair types: the residues
within each sequence are exhaustively and independently
permuted without replacement, and
each permutation occurs with equal probability.
This model has been called the internally
random model [6]. This permutation model is
further extended to a positional null model to correct
position-specific bias in residue distributions [6].


Objective. Our task is to determine explicit for- 
mulas to calculate E[ f(X, Y)] for each pair type 
under different conditions. Explicit probability 
distributions for f(X,Y) can also be found for 
many special cases, which will allow for the 
calculation of statistical significance p-values. 
These formulas can also be used to study whole 
datasets of short sequences.


Key Results
Spatial Motifs by Permutation Model 


Expectation for Interacting Residues of the 

Same Type 

For cases in which X is the same as Y (i.e.,
X-X pairs), let $x_1$ be the number of residues
of type X in the first sequence, $x_2$ the number
of residues of type X in the second sequence,
and $l$ the common length of the sequence pair.
The probability $P_{XX}(i)$ of exactly $i = f(X,X)$
X-X contacts follows a hypergeometric distribution:

$$P_{XX}(i) = \frac{\binom{x_1}{i}\binom{l-x_1}{x_2-i}}{\binom{l}{x_2}}.$$

Its expectation $E[f(X,X)]$ is then

$$E[f(X,X)] = \frac{x_1 x_2}{l}.$$


Expectation for Interacting Residues of 

Different Types 

When $X \ne Y$, the number of X-Y contacts in the
permutation model for one sequence pair is the
sum of two dependent hypergeometric variables,
one variable for type X residues in the first
sequence $s_1$ and type Y in the second sequence
$s_2$, and another variable for type Y residues in
$s_1$ and type X in $s_2$. The expected number of
X-Y contacts $E[f(X,Y)]$ is the sum of the two
expected values $E[f(X,Y \mid X \in s_1, Y \in s_2)] + E[f(X,Y \mid Y \in s_1, X \in s_2)]$:

$$E[f(X,Y)] = \frac{x_1 y_2 + y_1 x_2}{l},$$

where $x_1$ and $x_2$ are the numbers of residues of
type X in the first and second sequence, respectively,
$y_1$ and $y_2$ are the numbers of residues of
type Y in the first and second sequence, respectively,
and $l$ is the length of the sequence pair.
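
The two expectations above, together with the hypergeometric distribution for X-X contacts, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pmf_xx`, `expected_xx`, and `expected_xy` are hypothetical and do not come from the original work.

```python
from math import comb

def pmf_xx(i, x1, x2, l):
    """Hypergeometric probability of exactly i X-X contacts
    between two sequences of common length l (illustrative helper)."""
    if i < 0 or i > min(x1, x2):
        return 0.0
    # comb(n, r) returns 0 when r > n, which matches the usual convention.
    return comb(x1, i) * comb(l - x1, x2 - i) / comb(l, x2)

def expected_xx(x1, x2, l):
    """Expected number of X-X contacts under the permutation model."""
    return x1 * x2 / l

def expected_xy(x1, y1, x2, y2, l):
    """Expected number of X-Y contacts (X != Y) under the permutation model."""
    return (x1 * y2 + y1 * x2) / l
```

For example, with l = 5, x1 = 2, and x2 = 3, the probabilities `pmf_xx(i, 2, 3, 5)` for i = 0, 1, 2 sum to 1 and their mean agrees with `expected_xx(2, 3, 5)`.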


Significance of Spatial Motifs 
To calculate the statistical significance in the 
form of p-value of interacting residues of the 
same type, two-tailed p-values can be calcu- 
lated using the hypergeometric distribution for a 
dataset of sequence pairs. 

For interacting residues of different types, the
formula to determine the p-value for a specific
observed number of X-Y contacts is more complex
because of the dependency. We define a
3-element multinomial function

$$M(a,b,c) = \frac{a!}{b!\,c!\,(a-b-c)!},$$

where $M(a,b,c) = 0$ if $a-b-c < 0$. This represents
the number of distinct permutations,
without replacement, in a multiset of size $a$
containing three different types of elements, with
counts $b$, $c$, and $a-b-c$ of each of the
three element types.

The probability $P(h,i,j,k)$ of inter-sequence
matches, namely, the probability of $h$ X-X contacts,
$i$ X-Y contacts, $j$ Y-X contacts, and $k$ Y-Y
contacts occurring in a random permutation, is
(Fig. 2)

$$P(h,i,j,k) = \frac{M(x_1,h,i)\cdot M(y_1,j,k)\cdot M(l-x_1-y_1,\; x_2-h-j,\; y_2-i-k)}{M(l,x_2,y_2)}.$$


The marginal probability $P_{XY}(m)$ that there are
a total of $i + j = m$ X-Y contacts is

$$P_{XY}(m) = \sum_{h=0}^{x_1}\;\sum_{i=0}^{x_1-h}\;\sum_{k=0}^{y_1-(m-i)} P(h,i,m-i,k).$$

The index $h$ can be as large as $x_1$, one for each
residue of type X on sequence 1; $i$ can be as large as
$x_1 - h$, once $h$ has been determined; and
$k$ can be as large as $y_1 - j = y_1 - (m-i)$,
once $i$ has been determined. The $i$ X-Y contacts
plus the $m-i$ Y-X contacts
sum to the $m$ contacts desired.

This closed-form formula allows calculation
of p-values analytically. The running time is
$O(l^4)$, due to the presence of 3 summations
and $l!$ in the summand. For short sequences, the
computing cost is not prohibitive.
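
As a hedged illustration of the triple summation, the following Python sketch evaluates the joint probability P(h, i, j, k) and the marginal probability of m X-Y contacts directly from the multinomial definition. The function names and argument order are assumptions made for this sketch, and valid inputs (x1 + y1 ≤ l and x2 + y2 ≤ l) are assumed.

```python
from math import factorial

def multinomial3(a, b, c):
    """M(a, b, c) = a! / (b! c! (a-b-c)!), taken as 0 for invalid arguments."""
    if min(a, b, c) < 0 or a - b - c < 0:
        return 0
    return factorial(a) // (factorial(b) * factorial(c) * factorial(a - b - c))

def p_joint(h, i, j, k, x1, y1, x2, y2, l):
    """Probability of h X-X, i X-Y, j Y-X, and k Y-Y contacts."""
    num = (multinomial3(x1, h, i)
           * multinomial3(y1, j, k)
           * multinomial3(l - x1 - y1, x2 - h - j, y2 - i - k))
    return num / multinomial3(l, x2, y2)

def p_xy(m, x1, y1, x2, y2, l):
    """Marginal probability of a total of m X-Y contacts (i + j = m)."""
    total = 0.0
    for h in range(x1 + 1):
        for i in range(x1 - h + 1):
            j = m - i  # negative j contributes 0 through multinomial3
            for k in range(y1 - j + 1):
                total += p_joint(h, i, j, k, x1, y1, x2, y2, l)
    return total
```

A quick sanity check is that the values `p_xy(m, ...)` over all feasible m sum to 1 and that their mean equals (x1·y2 + y1·x2)/l.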


Sequences of Different Lengths 

The requirement for interacting sequences to be 
of the same length may be relaxed by introducing 
a 21st “dummy” amino acid type. All unpaired 
residues in the longer member of a sequence 
pair will be paired to this extra amino acid type, 
and our standard method can be applied to de- 
termine the propensity of unpaired amino acids 
(i.e., residues paired with the “dummy” amino 
acid type). 


Sequence Motifs by Permutation Model 
The propensity $P(X,Y|k)$ for the XYk pattern
of two ordered intrasequence residues of type
X and type Y that are $k$ positions away on
the same sequence (Fig. 1b) is $P(X,Y|k) = f_{obs}(X,Y|k) / E[f(X,Y|k)]$,
where $f_{obs}(X,Y|k)$ is the observed
count of XYk patterns, and $E[f(X,Y|k)]$ is the
expected count of XYk patterns.


Expectation of XYk and XXk Two-Residue
Motifs

We can regard $f(X,Y|k)$ as the sum of identical
Bernoulli variables $f_t(X,Y|k)$, each of which
equals 1 if one of the $x$ residues of
type X occurs at position $t$ in the sequence and
one of the $y$ residues of type Y occurs
at position $t + k$, and equals 0 otherwise. Since an
XYk pattern cannot occur if $t > l - k$, we concern
ourselves only with the first $l - k$ positions.
We have $E[f_t(X,Y|k)] = P[f_t(X,Y|k) = 1] = \frac{x}{l}\cdot\frac{y}{l-1}$
if $t \le l - k$. There are $l - k$ such
identical variables, and their expectations may be
summed as

$$E[f(X,Y|k)] = (l-k)\,\frac{xy}{l(l-1)}, \qquad (1)$$

where $l$ is the length of the sequence, $x$ is the
number of residues of type X, and $y$ is the
number of residues of type Y.

Sequence and Spatial Motif Discovery in Short Sequence
Fragments, Fig. 2 Division of residues in spatial
motif analysis when $X \ne Y$. White = X, black = Y,
gray = "neither" X or Y

For XXk patterns, the expectation is calculated as

$$E[f(X,X|k)] = (l-k)\,\frac{x(x-1)}{l(l-1)}, \qquad (2)$$

as there will be $x - 1$ residues available to place
the second X residue at position $t + k$ after the
first X residue is placed at $t$. Although these
Bernoulli random variables are dependent (i.e.,
the placement of one XYk pattern will affect the
probability of another XYk pattern), their expectations
may be summed, because expectation is a
linear operator.


Significance of XYk and XXk Two-Residue
Sequence Motifs

To calculate statistical significance p-values, several
formulas have been derived to determine
$P_{XYk}(i)$, the probability of the occurrence of $i =
f(X,Y|k)$ XYk patterns for different $k$ values.


1. Sequence motifs when k = 1. We have

$$P_{XY1}(i) = \frac{\binom{x}{i}\binom{l-x}{y-i}}{\binom{l}{y}},$$

with the convention that $\binom{n}{r} = 0$ if $n < r$.

2. Sequence motifs with residues of different
types and if $x \le 2$ or $y \le 2$.

• If either x = 1 or y = 1, we have

$$P_{XYk}(1) = (l-k)\,\frac{xy}{l(l-1)}.$$

For i = 0, we have simply $P_{XYk}(0) = 1 - P_{XYk}(1)$.

• If x = 2 or y = 2, the probability of two
XYk patterns is

$$P_{XYk}(2) = \left[\binom{l-k}{2} - (l-2k)\right]\frac{x(x-1)\,y(y-1)}{l(l-1)(l-2)(l-3)}.$$

We also have for the probabilities of exactly
one XYk pattern or zero patterns:
$P_{XYk}(1) = E[f(X,Y|k)] - 2P_{XYk}(2)$ and
$P_{XYk}(0) = 1 - [P_{XYk}(1) + P_{XYk}(2)]$.


3. Sequence motifs with residues of the same
type if $x \le 3$.

• If x = 2, the probability of one XXk
pattern is

$$P_{XXk}(1) = E[f(X,X|k)] = (l-k)\,\frac{x(x-1)}{l(l-1)}.$$

The probability of no XXk pattern is
$P_{XXk}(0) = 1 - P_{XXk}(1)$.

• If x = 3, the probability of exactly two
XXk patterns is

$$P_{XXk}(2) = \frac{l-2k}{\binom{l}{3}}.$$

4. Sequence motifs with k > 1, x > 2, and
y > 2. When k > 1, x > 2, and y > 2, the
analytical formulas for $P_{XYk}(i)$ become very
complicated. However, when the sequences in
the dataset used are short, it is possible to
fully enumerate all permutations of a sequence
and calculate $P_{XYk}(i)$ and p-values exactly,
as shown by Senes et al. [11]. Because x and
y are usually small in short sequences, the
computation time needed for motif analysis of
short sequences is not prohibitive.
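
The full enumeration mentioned in item 4 can be sketched as follows. This is a minimal brute-force illustration in Python, not the procedure of Senes et al.; the helper names `count_pattern` and `exact_distribution` are hypothetical. It treats all l! orderings of the residues as equally likely, which gives the same distribution as the permutation model.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def count_pattern(seq, a, b, k):
    """Number of positions t where seq[t] == a and seq[t + k] == b."""
    return sum(1 for t in range(len(seq) - k)
               if seq[t] == a and seq[t + k] == b)

def exact_distribution(seq, a, b, k):
    """Exact distribution of the pattern count over all permutations of seq.

    Returns a dict mapping a count i to its probability as a Fraction.
    Only practical for short sequences, since it visits len(seq)! orderings.
    """
    counts = Counter(count_pattern(p, a, b, k) for p in permutations(seq))
    total = sum(counts.values())
    return {i: Fraction(c, total) for i, c in counts.items()}
```

For a short sequence such as "XXYYO", `exact_distribution("XXYYO", "X", "Y", 2)` can be compared against the closed-form expressions for small x and y given in the list above, and its mean can be compared against Eq. (1).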
Sequence and Spatial Motif Discovery in Short Sequence
Fragments, Fig. 3 Example of a multi-residue
sequence pattern as described in the text. This pattern
contains five specified residues in a span of ten residues.
Here, $X_0$, $X_1$, $X_2$, $X_3$, and $X_4$ are specified amino acid
types, and the corresponding $k$ values are counted as the
distance from the first position of the sequence (i.e., the
position occupied by $X_0$). Thus, $k_1 = 2$, $k_2 = 3$,
$k_3 = 6$, and $k_4 = 9$. All other residues (in white) are
unspecified and may be any amino acid type. This pattern
is written as $(X_0, X_1, X_2, X_3, X_4 \mid 2, 3, 6, 9)$


Propensity of Multi-residue Sequence Motifs
We now discuss the expected number
$E[f(X_0, X_1, X_2, \ldots, X_n \mid k_1, k_2, \ldots, k_n)]$ of a specific pattern
containing $n + 1$ residues placed in a contiguous
subsequence of $k_n + 1$ residues ($k_n \ge n$).
Here $X_i$ is the residue type of the $i$-th fixed
residue in the pattern and $k_i$ is the position of this
residue from the 0-th residue ($k_0 = 0$). Positions
not specified by $k_i$ can be any residue type.
For example, the pattern (A, L, Y | 2, 4) is written
as AL2Y4 and represents AxLxY. A graphic
example is shown in Fig. 3. Many examples of
these multi-residue sequence motifs in proteins
have been discovered, including the GxGxxG
NADH binding motif [1] and the RSxSxP 14-3-3
binding motif [14].

The expected value can be calculated as:

$$E[f(X_0, X_1, X_2, \ldots, X_n \mid k_1, k_2, \ldots, k_n)] = (l - k_n)\,\frac{\prod_{i=0}^{n}\left[x_i - \#_i(X_i)\right]}{l!/(l-n-1)!}, \qquad (3)$$

where $x_i$ is the number of residues of type $X_i$, $l$
is the length of the sequence, and $\#_i(X_i)$ is the
number of times residue type $X_i$ appears in the
"subpattern" $\{X_0, X_1, X_2, \ldots, X_{i-1}\}$.
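
To make Eq. (3) concrete, here is a small Python sketch. The function name `expected_multi` and its argument layout (a list of residue types, a list of offsets, and a dictionary of residue counts) are assumptions chosen for illustration only.

```python
def expected_multi(pattern, offsets, counts, l):
    """Expected count of a multi-residue pattern, following Eq. (3).

    pattern: residue types [X_0, ..., X_n]
    offsets: positions [k_1, ..., k_n] relative to X_0
    counts:  dict mapping residue type to its count in the sequence
    l:       sequence length
    """
    n = len(pattern) - 1
    numerator = 1
    seen = {}  # how often each type already occurred in the subpattern
    for res in pattern:
        used = seen.get(res, 0)
        numerator *= counts.get(res, 0) - used
        seen[res] = used + 1
    denominator = 1  # l! / (l - n - 1)! = l (l-1) ... (l-n)
    for m in range(n + 1):
        denominator *= l - m
    k_n = offsets[-1] if offsets else 0
    return (l - k_n) * numerator / denominator
```

With a two-residue pattern this reduces to Eqs. (1) and (2): for example, `expected_multi(["X", "Y"], [3], {"X": 2, "Y": 3}, 10)` equals (10 − 3)·2·3/(10·9).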


Remark 

The above discussions are for determining motifs
in a single short sequence or sequence pair. This
can be extended so that analysis can be performed
on a dataset of multiple short sequences to attain
sufficient statistical significance. This has the
advantage of capturing within-sequence relationships
on a scale large enough to obtain reliable
p-values. Details can be found in [6].

Sequence and Spatial Motif Discovery in Short Sequence
Fragments, Fig. 4 Difference between (a) a
permutation null model for sequence motif analysis and
(b) a position-dependent null model. In both cases, only
residues of the same shade are permuted with each other.
In (a), residues are permuted only within each sequence
individually, while in (b), residues are permuted across
sequences but only within their specified position $t$


Spatial Motifs by Positional Null Model 

When there are significant biases in residue pref- 
erences for certain positions in a sequence known 
a priori, e.g., the enrichment of aromatic residues 
at either end of a transmembrane α-helix or β-strand
[12], these single-residue biases may confound
two-residue propensities. The positional
null model should be used for motif detection 
in such cases [6]. Instead of permuting residues 
across all positions within individual sequences, 
we permute residues across all sequences in a 
dataset within specific positions (Fig. 4). 


Expectation and Significance of Interacting 
Residue Pairs 

We allocate residues into regions, which do not
overlap. Regions may have different lengths
along the sequences. Interacting regions within a
sequence pair are assumed to have equal length.
If a residue in region $r$ interacts with a residue
in region $s$ on a spatially adjacent sequence
fragment, all residues in region $r$ in the dataset
must only interact with residues in region $s$. For
example, for interacting antiparallel β-strands,
we divide each strand into three regions, the N-terminal,
central core, and C-terminal regions,
and all interacting strand pairs into two spatial
pair types, N-terminal with C-terminal and core
with core. We require that no core residue interact
with an N-terminal or C-terminal residue.

The null model for position-dependent spatial
motifs differs depending on whether paired
residues are from the same region ($r = s$) or
different regions ($r \ne s$), and whether the residue
types in the pair are the same ($X = Y$) or
different ($X \ne Y$).


1. When $r = s$ and $X = Y$. The expected value
of X-X pairs in region $r$ is

$$E[f(X,X)] = \frac{\binom{x_r}{2}}{\binom{n_r}{2}}\cdot\frac{n_r}{2} = \frac{x_r(x_r-1)}{2(n_r-1)},$$

where $n_r$ is the number of residues in region $r$.
The probability $P_{XX|rr}(i)$ of $i$ X-X interacting
pairs in region $r$ in the dataset, used for p-value
calculation, is

$$P_{XX|rr}(i) = \frac{M\!\left(\frac{n_r}{2},\, i,\, x_r - 2i\right)\cdot 2^{\,x_r - 2i}}{\binom{n_r}{x_r}},$$

where the 3-element multinomial function
$M(a,b,c)$ is as defined before.

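2. When $r = s$ and $X \ne Y$. The expected value
when $X \ne Y$ is

$$E[f(X,Y)] = \frac{x_r y_r}{\binom{n_r}{2}}\cdot\frac{n_r}{2} = \frac{x_r y_r}{n_r - 1}.$$

The probability $P(i,j,k)$ of each combination
of $i$, $j$, and $k$ pairs of type X-Y, X-X, and
Y-Y interactions, respectively, is

$$P(i,j,k) = \frac{M\!\left(\frac{n_r}{2},\, i,\, j,\, k,\, x_r - i - 2j,\, y_r - i - 2k\right)\cdot 2^{\,x_r + y_r - i - 2j - 2k}}{M(n_r, x_r, y_r)},$$

where the 6-variable multinomial function is

$$M(a,b,c,d,e,f) = \frac{a!}{b!\,c!\,d!\,e!\,f!\,(a-b-c-d-e-f)!},$$

taken to be 0 if any argument is negative, in line
with the convention for $M(a,b,c)$.
The probability $P_{XY|rr}(i)$ of $i$ X-Y pairs in
the dataset is then

$$P_{XY|rr}(i) = \sum_{j=0}^{x_r - i}\;\sum_{k=0}^{y_r - i} P(i,j,k).$$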

3. When $r \ne s$. We distinguish $X_r$, a residue of
type X occurring in region $r$ in one sequence,
and $X_s$, a residue of type X occurring in
region $s$ in the other sequence. Thus, an X-Y
pair, which we define as an $X_r$-$Y_s$ pair, is
different from a Y-X pair, which is $Y_r$-$X_s$.
Because there is a one-to-one correspondence
between residues in region $r$ and region $s$,
$n_r = n_s$ is the total number of $r$-$s$ pairs.

In order for exactly $i$ X-Y pairs to occur,
$i$ $X_r$ residues must be drawn from a possible
$x_r$ residues of type X to match $i$ $Y_s$ residues
drawn from a possible $y_s$ residues of type
Y. This can be modeled with a simple
hypergeometric distribution. The expected
value can be calculated as

$$E[f(X,Y)] = \frac{x_r\, y_s}{n_r}.$$

The probability $P_{XY|rs}(i)$ of $i$ X-Y pairs is

$$P_{XY|rs}(i) = \frac{\binom{x_r}{i}\binom{n_r - x_r}{y_s - i}}{\binom{n_r}{y_s}}.$$
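
The same-region, same-type case (item 1) can be checked numerically with a short Python sketch. It assumes that the $n_r$ residues of region $r$ form $n_r/2$ interacting pairs (so $n_r$ is even) and that the $x_r$ residues of type X are placed uniformly at random among them; the function names are hypothetical.

```python
from math import comb, factorial

def multinomial3(a, b, c):
    """M(a, b, c) = a! / (b! c! (a-b-c)!), taken as 0 for invalid arguments."""
    if min(a, b, c) < 0 or a - b - c < 0:
        return 0
    return factorial(a) // (factorial(b) * factorial(c) * factorial(a - b - c))

def p_xx_same_region(i, x_r, n_r):
    """Probability of exactly i X-X pairs among n_r / 2 pairs (n_r even)."""
    singles = x_r - 2 * i  # pairs containing exactly one X
    if i < 0 or singles < 0:
        return 0.0
    pairs = n_r // 2
    # Choose which pairs are X-X and which hold a single X; each single-X
    # pair has 2 possible orientations.
    return multinomial3(pairs, i, singles) * 2 ** singles / comb(n_r, x_r)

def expected_xx_same_region(x_r, n_r):
    """Expected number of X-X pairs in region r."""
    return x_r * (x_r - 1) / (2 * (n_r - 1))
```

Summing `i * p_xx_same_region(i, x_r, n_r)` over all i reproduces `expected_xx_same_region(x_r, n_r)`, which is a useful consistency check between the distribution and its stated expectation.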


Expectation and Significance of Sequence 

Motifs 

We define the positional residue frequency $x_t$ as
the number of residues of type X occupying the
$t$-th position of all sequences in the dataset. If
sequences of different lengths are represented in
the dataset, it is necessary to normalize $t$ to be
within an appropriate range $[1, l]$, to approximate
an average or predetermined sequence length of
$l$:

$$t = \left\lceil \frac{l\,(t_{obs} - 0.5)}{l_{obs}} \right\rceil,$$

where $t_{obs} \in \{1, 2, 3, \ldots, l_{obs}\}$ is the actual position
of the residue within its sequence, $l_{obs}$ is
the actual length of the sequence, $\lceil x \rceil$ represents
the ceiling function, equal to the lowest integer
greater than or equal to $x$, and the 0.5 term is
a correction for continuity to round to the next
integer. This ensures that $1 \le t \le l$, no residues
are removed from the model by truncation, and
each position $t$ will be represented by nearly the
same number of residues.
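
The normalization can be written directly in Python. The function name `normalize_position` is an illustrative choice and is not taken from the original.

```python
from math import ceil

def normalize_position(t_obs, l_obs, l):
    """Map an observed 1-based position t_obs in a sequence of length l_obs
    to a normalized position in [1, l] using the ceiling formula above."""
    return ceil(l * (t_obs - 0.5) / l_obs)
```

For a sequence of observed length 10 normalized to l = 5, positions 1 through 10 map to 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, so each normalized position receives the same number of residues.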

For sequence motifs, we use the model of
permutation within each position in a sequence 
with replacement across all sequences. Although 
all other null models in this study rely on permu- 
tation without replacement, this model is based 
on datasets of multiple sequences instead of indi- 
vidual sequences, and the approximation of sam- 
pling without replacement will not be problem- 
atic once a sufficiently large sample of sequences 
is assembled. 


1. XYk motif at position t. When $t \le l - k$, the
probability of an XYk pattern at position $t$ is

$$P(X,Y|k,t) = \frac{x_t}{n_t}\cdot\frac{y_{t+k}}{n_{t+k}},$$

where $x_t$ is the number of residues of type X
in position $t$ on all sequences, $y_t$ is the number
of residues of type Y in position $t$, and $n_t$
is the number of all residues of all types in
position $t$. This null model can be represented
as a binomial distribution.

The expected frequency of XYk patterns at
position $t$ is

$$E[f(X,Y|k,t)] = n_t \cdot P(X,Y|k,t).$$

The probability of $i$ XYk patterns at position
$t$ in the dataset is

$$P_{XYk|t}(i) = \binom{n_t}{i}\, P(X,Y|k,t)^{i}\,\left[1 - P(X,Y|k,t)\right]^{n_t - i}.$$

Note that the probability that an XYk pattern
appears at position $t$ is 0 if $t > l - k$, as
an XYk pattern would span across the end of
a sequence of length $l$.

2. XYk motif at any arbitrary position. To
calculate the dataset-wide probability of an
XYk pattern at any arbitrary position of the
sequence, we average $P(X,Y|k,t)$ over all
$l - k$ possible positions:

$$P(X,Y|k) = \frac{1}{l-k}\sum_{t=1}^{l-k} P(X,Y|k,t).$$

This can similarly be represented as a binomial
distribution with probability distribution
function $P_{XYk}(i) = \binom{n_k}{i}\, P(X,Y|k)^{i}\,[1 - P(X,Y|k)]^{n_k - i}$,
where $n_k$ is the number of all
pairs of all residue types $k$ residues apart in the
dataset. The expected value is

$$E[f(X,Y|k)] = n_k\, P(X,Y|k).$$

Unlike the situation where only one position
$t$ is concerned, this distribution represents the
sum of dependent Bernoulli variables. Methods
of accounting for this dependence can be
found in Robin et al. [10].
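
A minimal Python sketch of the positional model follows. It assumes that the positional counts are supplied as lists `x`, `y`, and `n` of length l (index 0 holding position t = 1); these names and the list layout are assumptions for illustration.

```python
from math import comb

def p_pattern_at(x, y, n, t, k):
    """P(X, Y | k, t): probability of an XYk pattern at 1-based position t."""
    l = len(n)
    if t > l - k:
        return 0.0  # the pattern would run past the end of the sequence
    return (x[t - 1] / n[t - 1]) * (y[t + k - 1] / n[t + k - 1])

def p_pattern_any(x, y, n, k):
    """P(X, Y | k): average of P(X, Y | k, t) over the l - k valid positions."""
    l = len(n)
    return sum(p_pattern_at(x, y, n, t, k) for t in range(1, l - k + 1)) / (l - k)

def binom_pmf(i, n_trials, p):
    """Binomial probability of exactly i patterns among n_trials candidates."""
    return comb(n_trials, i) * p ** i * (1 - p) ** (n_trials - i)
```

For instance, with `x = [2, 0, 1]`, `y = [0, 2, 1]`, `n = [4, 4, 4]`, and k = 1, the pattern probability is 0.25 at position 1 and 0 at position 2, giving an average of 0.125; the expected count is then `n_k * 0.125` for whatever number of candidate pairs `n_k` the dataset contains.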


Applications 


Several spatial and sequence motifs have been
discovered using the approach discussed here
[5-7]. The estimated propensities have also been
used to develop empirical potential functions for
prediction of oligomerization states [8] and protein-protein
interaction interfaces [3, 8], for engineering
of thermal resistance [2], and for predicting structures
of β-barrel membrane proteins [9].


Open Problems 


General analytical formulas for calculating prob- 
abilities of two-residue and multi-residue motifs 
under the permutation model are unknown.


Problem Definition


One of the key steps in a VLSI design flow 
is technology mapping that converts a Boolean 
network of technology-independent logic gates 
and D-flipflops (FFs) into an equivalent one com- 
prised of cells from a technology library [1, 4]. 
Technology mapping can be formulated as a 
covering problem where logic gates are covered 
by cells from the technology library. For ease 
of discussion, it is assumed that the cell library 
contains only one cell, a K-input lookup table 
(K-LUT) with one unit of delay. A K-LUT can 
implement any Boolean function with up to K 
inputs as is the case in field-programmable gate 
arrays (FPGAs) [1,3]. 

Figure 1 shows an example of technology 
mapping. The original network in (1) with three 
FFs and four gates is covered by three 3-input 
cones as indicated in (2). The corresponding
mapping solution using 3-LUTs is shown in (3).
Note that gate i is covered by two cones so it will
be replicated. The mapping solution has a cycle
time (or clock period) of two units, which is the
maximum number of LUTs on all paths without
FFs.

Retiming relocates FFs in a network by moving
FFs across logic nodes backward or forward
[5]. Retiming does not alter the functionality of a
network. Figure 2 (1) shows the network obtained
from the one in Fig. 1 (1) by moving the FFs
at the output of gates y and i to their inputs. It
can now be covered with just one 3-input cone
as indicated in (1). The corresponding mapping
solution shown in (2) is better in both cycle
time and area than the one in Fig. 1 (3) obtained
without retiming.

A K-bounded network is one in which each
gate has at most K inputs. The sequential
circuit technology mapping problem can
be defined as follows: Given a K-bounded
Boolean network N and a target cycle time
$\phi$, find a mapping solution with a cycle time
of $\phi$, assuming FFs can be relocated using
retiming.


Key Results 


The first polynomial time algorithm for the prob- 
lem was proposed in [9, 10]. An improved algo- 
rithm was proposed in [2] to reduce runtime. Both 
algorithms are based on min-cost flow computa- 
tion. 

In [8], an efficient algorithm was proposed to
take advantage of the fact that K is a small integer
usually between 3 and 6. The algorithm is based
on enumerating all K-input cones for each gate
and will be described next.

FindAllCuts(N, K)
    foreach node v in N do C(v) = {{v^0}}
    while (new cuts discovered) do
        foreach node v in N do C(v) ← merge(C(u_1), ..., C(u_t))


Cut Enumeration 


A Boolean network can be represented as an 
edge-weighted directed graph where the nodes 
denote logic gates and primary inputs/outputs. 
There is a directed edge (u,v) with weight d if 
u, after going through d FFs, drives v. 

A cone for a node can be captured by a cut 
consisting of inputs to the cone. An element 
in a cut for v consists of the driving node u 
and the total weight d on the paths from u to 
v, denoted by $u^d$. If u can reach v on several
paths with different FF counts, u will appear 
in the cut multiple times with different ds. For 
the cone for z in Fig.2 (2), the corresponding 
cut is tz) b'\. A cut of size K is called a 
K-cut. 

Let $(u_i, v)$, where $i = 1, \ldots, t$, be all input
edges to $v$ in N. Further assume the weight
of $(u_i, v)$ is $d_i$ and $C(u_i)$ is a set of K-cuts
for $u_i$. Let $merge(C(u_1), \ldots, C(u_t))$ denote the
following set operation:

$$\{\{v^0\}\} \cup \{\, c_1^{d_1} \cup \cdots \cup c_t^{d_t} \mid c_1 \in C(u_1), \ldots, c_t \in C(u_t),\; |c_1^{d_1} \cup \cdots \cup c_t^{d_t}| \le K \,\},$$

where $c_i^{d_i} = \{u^{d+d_i} \mid u^d \in c_i\}$ for $i = 1, \ldots, t$. It
is obvious that $merge(C(u_1), \ldots, C(u_t))$ is a set
of K-cuts for $v$.

If the network N does not contain cycles, 
the K-cuts of all nodes can be determined us- 
ing the merge operation in a topological order 
starting from the PIs. For general networks, Fig. 3 
outlines the iterative cut computation procedure 
proposed in [8]. 
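
The merge operation and the iterative procedure can be sketched in Python as follows. This is an illustrative sketch, not the implementation of [8]: the network is assumed to be a dictionary mapping each node to a list of `(driver, weight)` fanin edges, a cut is a frozenset of `(node, d)` elements, and the bound `max_d` on FF counts is an extra assumption added here only to guarantee termination on cyclic networks.

```python
from itertools import product

def merge(v, fanins, cuts, K, max_d):
    """Merge the cut sets of v's fanins into a set of K-cuts for v."""
    result = {frozenset({(v, 0)})}  # the trivial cut {v^0}
    if not fanins:
        return result  # primary inputs only have the trivial cut
    for combo in product(*(cuts[u] for u, _ in fanins)):
        merged = set()
        for (_, d_i), c in zip(fanins, combo):
            # Shift every element of the fanin cut by the edge weight d_i.
            merged.update((w, d + d_i) for w, d in c)
        if len(merged) <= K and all(d <= max_d for _, d in merged):
            result.add(frozenset(merged))
    return result

def find_all_cuts(network, K, max_d=4):
    """Iterate merge over all nodes until no new cuts are discovered."""
    cuts = {v: {frozenset({(v, 0)})} for v in network}
    changed = True
    while changed:
        changed = False
        for v, fanins in network.items():
            new = cuts[v] | merge(v, fanins, cuts, K, max_d)
            if new != cuts[v]:
                cuts[v] = new
                changed = True
    return cuts

# Hypothetical example: primary inputs a, b; gate x = f(a, b);
# gate y driven by x through one FF and by a directly.
example = {"a": [], "b": [], "x": [("a", 0), ("b", 0)],
           "y": [("x", 1), ("a", 0)]}
example_cuts = find_all_cuts(example, K=3)
```

In this example, the cuts found for `y` are the trivial cut, the cut consisting of x with one FF and a with none, and the cut obtained by expanding x into its inputs a and b, each with one FF, together with a with none.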


Figure 4 depicts the iterations in enumerating
the 3-cuts for the network in Fig. 1 (1) where
cuts are merged in the order i, x, y, z, and o.
At the beginning, every node has a trivial cut
formed by itself (Row 0). Row 1 shows the new
cuts discovered in the first iteration. In the second
iteration, two more cuts are discovered (for x).
After that, further merging does not yield any new
cut and the procedure stops.


Lemma 1 After at most Kn iterations, the cut 
enumeration procedure will find all the K-cuts 
for every node in N. 


Techniques have been proposed to speed up 
the procedure [8]. For practical networks, the 
cut enumeration procedure typically converges in 
just a few iterations. 


Label Computation 


After obtaining all K-cuts, the cuts are evaluated
based on sequential arrival times (or l-values),
which are an extension of traditional arrival times
that considers the effect of retiming [7, 9].

The labeling procedure tries to find a label for
each node as outlined in Fig. 5, where $w_v$ denotes
the weight of the shortest paths from PIs to
node v.

Figure 6 shows the iterations for label computation
for the network in Fig. 1 (1), assuming that
the target cycle time $\phi = 1$ and the nodes are
evaluated in the order of i, x, y, z, and o. In the
table, the current label as well as a corresponding
cut for each node is listed. In this example, after
the first iteration, none of the labels will change
and the procedure stops.

It can be shown that the labeling procedure
will stop after at most $n(n-1)$ iterations [10].
The following lemma relates labels to mapping:

Lemma 2 N has a mapping solution with cycle
time $\phi$ iff the labeling procedure returns "success."


Mapping Solution Generation 


Once the labels for all nodes are computed suc- 
cessfully, a mapping solution can be constructed 
starting from primary outputs. At each node v, the 
procedure selects the cut that realizes the label of 


the node and then moves on to select a cut for
u if $u^d$ is in the cut selected for v. On the edge
from u to v, d FFs are inserted. For the network in
Fig. 1 (1), the mapping solution generated based
on the labels found in Fig. 6 is exactly the one in
Fig. 2 (2).

To obtain a mapping solution with the target
cycle time $\phi$, v will be retimed by a value of
$\lceil l(v)/\phi \rceil - 1$. For the network in Fig. 1 (1), the
final mapping solution after retiming is shown in
Fig. 2 (3) which has a cycle time of 1.


Applications


The algorithm can be used to map a technology- 
independent Boolean network to a network 
consisting of cells from a target technology 
library. The concepts and framework are 
general enough to be adapted to study other
circuit optimizations such as sequential circuit
clustering and sequential circuit restructuring [6].


Problem Definition


Short History 

The k-set agreement problem is a paradigm of 
coordination problems. Defined in the setting 
of systems made up of processes prone to fail- 
ures, it is a simple generalization of the con- 
sensus problem (that corresponds to the case 
k = 1). That problem was introduced in 1993 
by Chaudhuri [2] to investigate how the num- 
ber of choices (k) allowed for the processes is 
related to the maximum number of processes 
that can crash. (After it has crashed, a process 
executes no more steps: a crash is a premature 
halting.) 


Definition 

Let S be a system made up of n processes where 
up to $t$ can crash and where each process has an
input value (called a proposed value). The prob- 
lem is defined by the three following properties 
(i.e., any algorithm that solves that problem has 
to satisfy these properties): 


1. Termination. Every nonfaulty process decides 
a value. 

2. Validity. A decided value is a proposed value. 

3. Agreement. At most k different values are 
decided. 


Set Agreement 


The Trivial Case 

It is easy to see that this problem can be trivially
solved if the upper bound on the number of
process failures $t$ is smaller than the allowed
number of choices $k$, also called the coordination
degree. (The trivial solution consists in
having $t + 1$ predetermined processes that send
their proposed values to all the processes, and
a process deciding the first value it ever receives.)
So, $k \le t$ is implicitly assumed in the
following.


Key Results 
Key Results in Synchronous Systems 


The Synchronous Model 
In this computation model, each execution 
consists of a sequence of rounds. These are 
identified by the successive integers 1,2, etc. For 
the processes, the current round number appears 
as a global variable whose global progress entails 
their own local progress. 

During a round, a process first broadcasts 
a message, then receives messages, and finally 
executes local computation. The fundamental 
synchrony property that a synchronous system
provides the processes with is the following: 
a message sent during a round r is received 
by its destination process during the very same 
round r. If during a round, a process crashes 
while sending a message, an arbitrary subset (not 
known in advance) of the processes receive that 
message. 


Main Results
The k-set agreement problem can always be
solved in a synchronous system. The main result
concerns the minimal number of rounds ($R_t$) that are
needed for the nonfaulty processes to decide in
the worst-case scenario (this scenario is when
exactly k processes crash in each round). It
was shown in [3] that $R_t = \lfloor t/k \rfloor + 1$. A very
simple algorithm that meets this lower bound is
described in Fig. 1.
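
The round-based idea behind such an algorithm (every process floods its current estimate and keeps the minimum it receives, for $\lfloor t/k \rfloor + 1$ rounds) can be sketched as a small simulation. This is only an illustrative sketch: the function name, the `crash_schedule` parameter, and the coin-flip choice of which processes receive the message of a crashing sender are assumptions made for the example, not part of the original algorithm description.

```python
import random

def k_set_agreement(proposals, t, k, crash_schedule=None, rng=None):
    """Simulate min-flooding for floor(t/k) + 1 synchronous rounds.

    proposals:      list of proposed values, one per process (ids 0..n-1)
    crash_schedule: hypothetical dict {round: [ids crashing while sending]}
    Returns {process id: decided value} for processes that never crashed.
    """
    rng = rng or random.Random(0)
    crash_schedule = crash_schedule or {}
    est = list(proposals)
    alive = set(range(len(proposals)))
    for r in range(1, t // k + 2):          # rounds 1 .. floor(t/k) + 1
        crashing = set(crash_schedule.get(r, [])) & alive
        inbox = {p: [] for p in alive}
        for sender in alive:
            if sender in crashing:
                # A crashing sender reaches an arbitrary subset of processes.
                receivers = {p for p in alive if rng.random() < 0.5}
            else:
                receivers = alive           # includes the sender itself
            for p in receivers:
                inbox[p].append(est[sender])
        alive -= crashing
        for p in alive:
            est[p] = min(inbox[p])          # never empty: p hears itself
    return {p: est[p] for p in alive}

if __name__ == "__main__":
    decided = k_set_agreement([5, 3, 8, 1, 9, 7], t=4, k=2,
                              crash_schedule={1: [3, 1], 2: [0, 5]})
    print(decided)
```

With at most t crashes in total, at least one of the $\lfloor t/k \rfloor + 1$ rounds sees fewer than k crashes, which is what limits the number of distinct surviving estimates to k.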

Although failures do occur, they are rare in
practice. Let f denote the number of processes
that crash in a given run, $0 \le f \le t$. We
are interested in synchronous algorithms that
terminate in at most $R_t$ rounds when f processes
crash in the current run, but that allow the
nonfaulty processes to decide in far fewer rounds
when there are few failures. Such algorithms are
called early-deciding algorithms. It was shown
in [4] that, in the presence of f process crashes,
any early-deciding k-set agreement algorithm
has runs in which no process decides before
the round $R_f = \min(\lfloor f/k \rfloor + 2, \lfloor t/k \rfloor + 1)$. This
lower bound shows an inherent tradeoff linking
the coordination degree k, the maximum number
of process failures t, the actual number of process
failures f, and the best time complexity that can
be achieved. Early-deciding k-set agreement
algorithms for the synchronous model can be
found in [4, 12].
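
As a small worked illustration of the tradeoff, the two round bounds can be evaluated directly. The sketch assumes the bounds read $R_t = \lfloor t/k \rfloor + 1$ and $R_f = \min(\lfloor f/k \rfloor + 2, \lfloor t/k \rfloor + 1)$; the function names are hypothetical.

```python
def worst_case_rounds(t, k):
    """R_t = floor(t/k) + 1: rounds needed in the worst case."""
    return t // k + 1

def early_deciding_rounds(f, t, k):
    """R_f = min(floor(f/k) + 2, floor(t/k) + 1) for f actual crashes."""
    return min(f // k + 2, t // k + 1)

if __name__ == "__main__":
    t, k = 10, 3
    for f in (0, 3, 10):
        print(f, early_deciding_rounds(f, t, k), worst_case_rounds(t, k))
```

For t = 10 and k = 3 the worst case needs 4 rounds, while a failure-free run (f = 0) can decide in 2.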


Other Failure Models 

In the send omission failure model, a process is 
faulty if it crashes or forgets to send messages. 
In the general omission failure model, a process 
is faulty if it crashes, forgets to send messages, 
or forgets to receive messages. (A send omission 
failure models the failure of an output buffer, 
while a receive omission failure models the
failure of an input buffer.) These failure models were
introduced in [11]. 

The notion of strong termination for set 
agreement problems was introduced in [13]. 
Intuitively, that property requires that as many 
processes as possible decide. Let a good process 
be a process that neither crashes nor commits 
receive omission failures. A set agreement 
algorithm is strongly terminating if it forces all 
the good processes to decide. (Only the processes 
that crash during the execution of the algorithm, 
or that do not receive enough messages, can be 
prevented from deciding.) 

An early-deciding k-set agreement algorithm
for the general omission failure model was
described in [13]. That algorithm, which requires
$t < n/2$, directs a good process to decide and
stop in at most
$R_f = \min(\lfloor f/k \rfloor + 2, \lfloor t/k \rfloor + 1)$
rounds. Moreover, a process that is not a good
process executes at most
$R_f(\text{not good}) = \min(\lceil f/k \rceil + 2, \lfloor t/k \rfloor + 1)$
rounds.

Fig. 1 A simple k-set agreement synchronous algorithm (code for p_i)

    Function k-set_agreement (v_i)
    (1) est_i ← v_i;
    (2) when r = 1, 2, ..., ⌊t/k⌋ + 1 do % r: round number %
    (3) begin_round
    (4)   send (est_i) to all; % including p_i itself %
    (5)   est_i ← min({est_j values received during the current round r});
    (6) end_round;
    (7) return (est_i)

As $R_f$ is a lower bound for the number of
rounds in the crash failure model, the previous
algorithm shows that $R_f$ is also a lower bound
for the nonfaulty processes to decide in the more
severe general omission failure model. Proving
that $R_f(\text{not good})$ is an upper bound for the
number of rounds that a nongood process has to
execute remains an open problem.

It was shown in [13] that, for a given
coordination degree k, $t < \frac{k}{k+1} n$ is an upper
bound on the number of process failures when
one wants to solve the k-set agreement problem
in a synchronous system prone to process general
omission failures. A k-set agreement algorithm
that meets this bound was described in [13].
That algorithm requires the processes to execute
$R = t + 2 - k$ rounds to decide. Proving (or
disproving) that R is a lower bound when
$t < \frac{k}{k+1} n$ is an open problem. Designing an
early-deciding k-set agreement algorithm for
$t < \frac{k}{k+1} n$ and $k > 1$ is another problem that
remains open.


Key Results in Asynchronous Systems 


Impossibility 

A fundamental result of distributed computing
is the impossibility of designing a deterministic
algorithm that solves the k-set agreement problem
in asynchronous systems when $k \le t$ [1, 7, 15].
Compared with the impossibility of solving
asynchronous consensus despite one process crash,
that impossibility is based on deep combinatorial
arguments. This impossibility has opened
new research directions for the connection
between distributed computing and topology. This
topology approach has allowed the discovery of
links relating asynchronous k-set agreement with
other distributed computing problems such as the
renaming problem [5].


Circumventing the Impossibility 

Several approaches have been investigated to
circumvent the previous impossibility. These
approaches are the same as those that have been
used to circumvent the impossibility of
asynchronous consensus despite process crashes.

One approach consists in replacing the
“deterministic algorithm” by a “randomized
algorithm.” In that case, the termination property
becomes “the probability for a correct process
to decide tends to 1 when the number of rounds
tends to $+\infty$.” That approach was investigated
in [9].

Another approach that has been proposed is
based on failure detectors. Roughly speaking,
a failure detector provides each process with a list
of processes suspected to have crashed. As an
example, the class of failure detectors denoted $\Diamond S_x$
includes all the failure detectors such that, after
some finite (but unknown) time, (1) any list
contains the crashed processes and (2) there is a set
Q of x processes such that Q contains one correct
process and that correct process is no longer
suspected by the processes of Q (let us observe
that correct processes can be suspected
intermittently or even forever). Tight bounds for the
k-set agreement problem in asynchronous systems
equipped with such failure detectors, conjectured
in [9], were proved in [6]. More precisely, such
a failure detector class allows the k-set agreement
problem to be solved for $k \ge t - x + 2$ [9], and
cannot solve it when $k < t - x + 2$ [6].

Another approach that has been investigated is
the combination of failure detectors and
conditions [8]. A condition is a set of input vectors, and
each input vector has one entry per process. The 
entries of the input vector associated with a run 
contain the values proposed by the processes in 
that run. Basically, such an approach guarantees 
that the nonfaulty processes always decide when 
the actual input vector belongs to the condition 
the k-set algorithm has been instantiated with. 


Applications 


The set agreement problem was introduced 
to study how the number of failures and 
the synchronization degree are related in an 
asynchronous system; hence, it is mainly 
a theoretical problem. That problem is used as 
a canonical problem when one is interested in 
asynchronous computability in the presence of 
failures. Nevertheless, one can imagine practical
problems the solutions of which are based on
the set agreement problem (e.g., allocating
a small number of shareable resources, such as
broadcast frequencies, in a network).
Problem Definition


The SET COVER problem has as input a set R of
m items, a set C of n subsets of R and a weight
function $w : C \to \mathbb{Q}$. The task is to choose a
subset $C' \subseteq C$ of minimum weight whose union
contains all items of R.

The sets R and C can be represented by an
$m \times n$ binary matrix A that consists of a row for
every item in R and a column for every subset of R
in C, where an entry $a_{i,j}$ is 1 iff the ith item in R
is part of the jth subset in C. Therefore, the SET
COVER problem can be formulated as follows.


Input: An $m \times n$ binary matrix A and a weight
function w on the columns of A.

Task: Select some columns of A with minimum
weight such that the submatrix A’ of A that is
induced by these columns has at least one 1 in
every row.


While SET COVER is NP-hard in general [4], it
can be solved in polynomial time on instances
whose columns can be permuted in such a way
that in every row the ones appear consecutively,
that is, on instances that have the consecutive
ones property (C1P). (The C1P can be defined
symmetrically for columns; this article focuses
on rows.) SET COVER on instances with the C1P
can be solved in polynomial time, e.g., with
a linear programming approach, because the
corresponding coefficient matrices are totally
unimodular (see [9]).

Motivated by problems arising from railway
optimization, Mecke and Wagner [7] consider the
case of SET COVER instances that have “almost
the C1P”. Having almost the C1P means that the
corresponding matrices are similar to matrices
that have been generated by starting with a matrix
that has the C1P and randomly replacing a certain
percentage of the 1’s by 0’s [7]. For Ruf and
Schöbel [8], in contrast, having almost the C1P
means that the average number of blocks of
consecutive 1’s per row is much smaller than the
number of columns of the matrix. This entry will
also mention some of their results.


Notation 

Given an instance (A, w) of SET COVER, let R
denote the row set of A and C its column set.
A column $c_j$ covers a row $r_i$, denoted by $r_i \in c_j$,
if $a_{i,j} = 1$.

A binary matrix has the strong C1P if (without
any column permutation) the 1’s appear
consecutively in every row. A block of consecutive 1’s is
a maximal sequence of consecutive 1’s in a row. It
is possible to determine in linear time if a matrix
has the C1P, and if so, to compute a column
permutation that yields the strong C1P [2, 3, 6].
However, note that it is NP-hard to permute the
columns of a binary matrix such that the number
of blocks of consecutive 1’s in the resulting
matrix is minimized [1, 4, 5].
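
The notions of block and strong C1P translate directly into code. The following is a minimal sketch that counts blocks of consecutive 1's per row and tests the strong C1P for the given column order; it does not search for a column permutation, and the function names are hypothetical.

```python
def count_blocks(row):
    """Number of maximal runs of consecutive 1's in a 0/1 row."""
    blocks = 0
    previous = 0
    for entry in row:
        if entry == 1 and previous == 0:
            blocks += 1          # a new block starts here
        previous = entry
    return blocks

def has_strong_c1p(matrix):
    """True iff every row has at most one block (no permutation tried)."""
    return all(count_blocks(row) <= 1 for row in matrix)

def average_blocks_per_row(matrix):
    """Average number of blocks per row of a nonempty matrix."""
    return sum(count_blocks(row) for row in matrix) / len(matrix)

if __name__ == "__main__":
    A = [[0, 1, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 1, 1]]
    print(has_strong_c1p(A), average_blocks_per_row(A))
```

The average number of blocks per row is the quantity that the second notion of "almost the C1P" compares against the number of columns.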

A data reduction rule transforms in polynomial
time a given instance I of an optimization
problem into an instance I’ of the same problem
such that $|I'| < |I|$ and the optimal solution for I’
has the same value (e.g., weight) as the optimal
solution for I. Given a set of data reduction rules,
to reduce a problem instance means to repeatedly
apply the rules until no rule is applicable; the
resulting instance is called reduced.


Key Results 
Data Reduction Rules 


For SET COVER there exist well-known data
reduction rules:

Row domination rule: If there are two rows
$r_{i_1}, r_{i_2} \in R$ with $\forall c \in C : r_{i_1} \in c$ implies $r_{i_2} \in c$,
then $r_{i_2}$ is dominated by $r_{i_1}$. Remove row $r_{i_2}$
from A.

Column domination rule: If there are two
columns $c_{j_1}, c_{j_2} \in C$ with $w(c_{j_1}) \ge w(c_{j_2})$ and
$\forall r \in R : r \in c_{j_1}$ implies $r \in c_{j_2}$, then $c_{j_1}$ is
dominated by $c_{j_2}$. Remove $c_{j_1}$ from A.

In addition to these two rules, a column
$c_{j_1} \in C$ can also be dominated by a subset
$C' \subseteq C$ of the columns instead of a single
column: If there is a subset $C' \subseteq C$ with
$w(c_{j_1}) \ge \sum_{c \in C'} w(c)$ and $\forall r \in R : r \in c_{j_1}$
implies $(\exists c \in C' : r \in c)$, then remove $c_{j_1}$
from A. Unfortunately, it is NP-hard to find
a dominating subset C’ for a given set $c_{j_1}$. Mecke
and Wagner [7], therefore, present a restricted
variant of this generalized column domination
rule.
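
The row and column domination rules can be sketched as follows. This is a deliberately naive illustration that rescans the instance after every removal, so it does not attempt to match the running times stated for [7]; the representation (a dictionary from column id to a weight and a set of covered rows) and the function name are assumptions made for the example.

```python
def reduce_instance(rows, columns):
    """Apply row and column domination until neither rule applies.

    rows:    iterable of row ids
    columns: dict {column id: (weight, set of row ids covered)}
    Returns (remaining rows, remaining columns restricted to those rows).
    """
    rows = set(rows)
    cols = {c: (w, frozenset(cov) & rows) for c, (w, cov) in columns.items()}
    while True:
        covering = {r: {c for c, (_, cov) in cols.items() if r in cov}
                    for r in rows}
        # Row domination: every column covering r1 also covers r2.
        dominated_row = next(
            (r2 for r1 in sorted(rows) for r2 in sorted(rows)
             if r1 != r2 and covering[r1] <= covering[r2]), None)
        if dominated_row is not None:
            rows.remove(dominated_row)
            cols = {c: (w, cov - {dominated_row})
                    for c, (w, cov) in cols.items()}
            continue
        # Column domination: c1 is at least as heavy and covers no more.
        dominated_col = next(
            (c1 for c1 in sorted(cols) for c2 in sorted(cols)
             if c1 != c2 and cols[c1][0] >= cols[c2][0]
             and cols[c1][1] <= cols[c2][1]), None)
        if dominated_col is not None:
            del cols[dominated_col]
            continue
        return rows, cols

if __name__ == "__main__":
    columns = {"a": (1, {1, 2}), "b": (2, {1}), "c": (1, {2, 3})}
    print(reduce_instance({1, 2, 3}, columns))
```

In the example, row 2 is removed because every column covering row 3 also covers row 2, and column b is then removed because column a covers the same remaining row at lower weight.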

For every row $r \in R$, let $c_{\min}(r)$ be a column
in C that covers r and has minimum weight under
this property. For two columns $c_{j_1}, c_{j_2} \in C$, define
$X(c_{j_1}, c_{j_2}) = \{c_{\min}(r) \mid r \in c_{j_1} \wedge r \notin c_{j_2}\}$.
The new data reduction rule then reads as follows.

Advanced column domination rule: If
there are two columns $c_{j_1}, c_{j_2} \in C$ and a row
that is covered by both $c_{j_1}$ and $c_{j_2}$, and if
$w(c_{j_1}) \ge w(c_{j_2}) + \sum_{c \in X(c_{j_1}, c_{j_2})} w(c)$, then $c_{j_1}$
is dominated by $\{c_{j_2}\} \cup X(c_{j_1}, c_{j_2})$. Remove $c_{j_1}$
from A.


Theorem 1 ([7]) A matrix A can be reduced in
O(Nn) time with respect to the column domination
rule, in O(Nm) time with respect to the row
domination rule, and in O(Nmn) time with respect
to all three data reduction rules described above,
where N is the number of 1’s in A.


In the databases used by Ruf and Schöbel [8],
matrices are represented by the column indices
of the first and last 1’s of their blocks of
consecutive 1’s. For such matrix representations, a fast
data reduction rule is presented [8], which
eliminates “unnecessary” columns and which, in the
implementations, replaces the column domination
rule. The new rule is faster than the column
domination rule (a matrix can be reduced in
O(mn) time with respect to the new rule), but not
as powerful: Reducing a matrix A with the new
rule can result in a matrix that has more columns
than the matrix resulting from reducing A with
the column domination rule.


Algorithms 

Mecke and Wagner [7] present an algorithm that 
solves SET COVER by enumerating all feasible 
solutions. 

Given a row $r_i$ of A, a partial solution for the
rows $r_1, \ldots, r_i$ is a subset $C' \subseteq C$ of the columns
of A such that for each row $r_j$ with $j \in \{1, \ldots, i\}$
there is a column in C’ that covers row $r_j$.

The main idea of the algorithm is to find an 
optimal solution by iterating over the rows of A 
and updating in every step a data structure S that 
keeps all partial solutions for the rows considered 
so far. More exactly, in every iteration step the
algorithm considers the first row of A and updates
the data structure S accordingly. Thereafter, the
first row of A is deleted. The following code
shows the algorithm.


1 Repeat m times: {
2   for every partial solution C’ in S that does not cover the first row of A: {
3     for every column c of A that covers the first row of A: {
4       Add {c} ∪ C’ to S; }
5     Delete C’ from S; }
6   Delete the first row of A; }
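
The enumeration loop above can be sketched in runnable form as follows. The sketch omits the data reduction step, so S can grow exponentially, and it assumes that S initially contains only the empty partial solution, which the pseudocode does not state explicitly; the function name and the list-of-lists matrix representation are likewise assumptions.

```python
def enumerate_covers(matrix, weights):
    """Row-by-row enumeration of partial solutions, without reduction.

    matrix:  list of m rows, each a list of n entries in {0, 1}
    weights: list of n column weights
    Returns (sorted column indices of a minimum-weight cover, its weight).
    """
    n = len(weights)
    partial_solutions = {frozenset()}       # assumed initial content of S
    for row in matrix:                      # "first row of A" in each step
        covering = [j for j in range(n) if row[j] == 1]
        updated = set()
        for partial in partial_solutions:
            if any(j in partial for j in covering):
                updated.add(partial)        # row already covered: keep C'
            else:
                for j in covering:          # extend C' by each covering column
                    updated.add(partial | {j})
        partial_solutions = updated
    if not partial_solutions:
        raise ValueError("some row cannot be covered by any column")
    best = min(partial_solutions,
               key=lambda sol: sum(weights[j] for j in sol))
    return sorted(best), sum(weights[j] for j in best)

if __name__ == "__main__":
    A = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]
    print(enumerate_covers(A, [1, 3, 1]))
```

Using a set of frozensets merges identical partial solutions, which is the simplest form of pruning; the reduction rules described next are what keep S small in practice.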


This straightforward enumerative algorithm
could create a set S of exponential size.
Therefore, the data reduction rules presented
above are used to delete after each iteration
step partial solutions that are not needed any
more. To this end, a matrix B is associated
with the set S, where every row corresponds
to a row of A and every column corresponds to
a partial solution in S. An entry $b_{i,j}$ of B is 1 iff
the jth partial solution of B contains a column
of A that covers the row $r_i$. The algorithm
uses the matrix

$$C := \begin{pmatrix} A & B \\ 0 \cdots 0 & 1 \cdots 1 \end{pmatrix},$$

which is updated together with S in every iteration
step. (The last row of C makes it possible to distinguish the
columns belonging to A from those belonging
to B.) Line 6 of the code shown above is replaced
by the following two lines:


6 Delete the first row of the matrix C; 
7 Reduce the matrix C and update S accordingly;} 


At the end of the algorithm, S contains exactly
one solution, and this solution is optimal. Moreover,
if the SET COVER instance is nicely structured,
the algorithm has polynomial running time:


Theorem 2 ([7]) If A has the strong C1P, is
reduced, and its rows are sorted in lexicographic
order, then the algorithm has a running time
of O(M?") where M is the maximum number of 1’s
per row and per column.


Theorem 3 ([7]) If the distance between the first
and the last 1 in every column is at most k, then
at any time throughout the algorithm the number 
of columns in the matrix B is O( 2k") and the 
running time is O( 27k kmn?). 


Ruf and Schöbel [8] present a branch and bound
algorithm for SET COVER instances that have
a small average number of blocks of consecutive
1’s per row.

The algorithm considers in each step a row $r_i$
of the current matrix (which has been reduced
with data reduction rules before) and branches
into $bl_i$ cases, where $bl_i$ is the number of blocks
of consecutive 1’s in $r_i$. In each case, one block
of consecutive 1’s in row $r_i$ is selected, and
the 1’s of all other blocks in this row are replaced
by 0’s. Thereafter, a lower and an upper bound
on the weight of the solution for each resulting
instance is computed. If a lower bound differs by
a factor of more than $1 + \epsilon$, for a given constant $\epsilon$,
from the best upper bound achieved so far, the
corresponding instance is subjected to further
branchings. Finally, the best upper bound that
was found is returned.

In each branching step, the $bl_i$ instances that
are newly generated are “closer” to having the
(strong) C1P than the instance from which they
descend. If an instance has the C1P, the lower
and upper bound can easily be computed by
exactly solving the problem. Otherwise, standard
heuristics are used.
Applications


SET COVER instances occur, e.g., in railway
optimization, where the task is to determine where
new railway stations should be built. Each row
then corresponds to an existing settlement, and
each column corresponds to a location on the
existing trackage where a railway station could
be built. A column c covers a row r if the
settlement corresponding to r lies within a given
radius around the location corresponding to c.

If the railway network consisted of one
straight line rail track only, the corresponding
SET COVER instance would have the C1P;
instances arising from real-world data are close
to having the C1P [7, 8].


Experimental Results 


Mecke and Wagner [7] perform experiments on
real-world instances as described in the Applications
section and on instances that have been generated
by starting with a matrix that has the C1P and
randomly replacing a certain percentage of the 1’s
by 0’s. The real-world data consists of a railway
graph with 8,200 nodes and 8,700 edges, and
30,000 settlements. The generated instances
consist of 50-50,000 rows with 10-200 1’s per row.
Up to 20 % of the 1’s are replaced by 0’s.

In the real-world instances, the data reduction
rules decrease the number of 1’s to between 1 %
and 25 % of the original number of 1’s without
and to between 0.2 % and 2.5 % with the
advanced column reduction rule. In the case of
generated instances that have the C1P, the number
of 1’s is decreased to about 2 % without and
to 0.5 % with the advanced column reduction
rule. In instances with 20 % perturbation, the
number of 1’s is decreased to 67 % without and
to 20 % with the advanced column reduction rule.

The enumerative algorithm has a running time
that is almost linear for real-world instances and
most generated instances. Only in the case of
generated instances with 20 % perturbation is the
running time quadratic.

Ruf and Schöbel [8] consider three instance
types: real-world instances, instances arising
from Steiner triple systems, and randomly
generated instances. The latter have a size of 100 x 100
and contain either 1-5 blocks of consecutive 1’s
in each row, each one consisting of between
one and nine 1’s, or they are generated with
a probability of 3 % or 5 % for any entry to be 1.

The data reduction rules used by Ruf and
Schöbel turn out to be powerful for the real-world
instances (reducing the matrix size from about
1,100 x 3,100 to 100 x 800 on average), whereas
for all other instance types the sizes could not be
reduced noticeably.

The branch and bound algorithm could solve
almost all real-world instances to optimality
within a time ranging from less than a second to one hour.
In all cases where an optimal solution has been
found, the first generated subproblem had already
provided a lower bound equal to the weight of the
optimal solution.
Problem Definition


The study of the parameterized complexity of
problems on directed graphs has been hitherto
relatively unexplored. Usually the directed
versions of the problems require significantly
different and more involved ideas than the ones for
the undirected versions. Furthermore, for directed
graphs there are no known algorithmic
meta-techniques: for example, there is no known
algorithmic analogue of the Graph Minor Theory
of Robertson and Seymour for directed graphs.
As a result, the fixed-parameter tractability status
of the directed versions of several fundamental
problems such as Multiway Cut, Multicut,
Feedback Vertex Set, etc., was open for a long time.
The problem of Feedback Vertex Set best
illustrates this gulf between undirected and directed
graphs with respect to parameterized complexity.
In this problem, we are given a graph and the
question is whether there exists a set of size at
most k whose deletion makes the graph acyclic.
The undirected version was known to be FPT
since 1984 [10]. However, the directed version
was a long-standing open problem until it was
shown to be FPT in 2008 [1].

The framework of shadowless solutions aims
to bridge this gap by providing an important first
step in designing FPT algorithms for a general
class of transversal problems on directed graphs.
In undirected graphs, the framework of
shadowless solutions was introduced in [9] and has
since been used in [4, 6, 7]. It was adapted and
generalized to directed graphs in [2, 3] for the
following general class of problems:


Finding an $\mathcal{F}$-transversal for some T-connected $\mathcal{F}$

Input: A directed graph $G = (V, E)$, a
positive integer k, a set $T \subseteq V$, and a set
$\mathcal{F} = \{F_1, F_2, \ldots, F_q\}$ of subgraphs such
that $\mathcal{F}$ is T-connected, i.e., $\forall i \in [q]$ each
vertex of $F_i$ can reach some vertex of T by
a walk completely contained in $G[F_i]$ and is
reachable from some vertex of T by a walk
completely contained in $G[F_i]$.

Parameter: k

Question: Is there an $\mathcal{F}$-transversal $W \subseteq V$
with $|W| \le k$, i.e., a set W such that
$F_i \cap W \ne \emptyset$ for every $i \in [q]$?


The collection $\mathcal{F}$ is implicitly defined in a
problem-specific way and need not be given
explicitly in the input. In fact, it is possible that $\mathcal{F}$ is
exponentially large. The shadow of a solution X
is the set of vertices that are disconnected from
T (in either direction) after the removal of X.
More formally, the reverse shadow of X is given
by $r(X) = \{v : X \text{ is a } v - T \text{ separator}\}$.
Similarly, the forward shadow of X is given by
$f(X) = \{v : X \text{ is a } T - v \text{ separator}\}$. The
shadow of X is given by the union of its reverse
and forward shadows, i.e., $\mathrm{shadow}(X) = r(X) \cup f(X)$.
A set X is said to be shadowless if its
shadow is empty.
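
The definition of the shadow can be made concrete with two graph searches in G \ X. The sketch below assumes that X is disjoint from T and that vertices of X themselves are not counted in the shadow; the function names and the edge-list representation are hypothetical.

```python
from collections import deque

def reachable(adjacency, sources, removed):
    """Vertices reachable from `sources` without entering `removed`."""
    seen = {s for s in sources if s not in removed}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def shadow(vertices, edges, terminals, solution):
    """Return (reverse shadow, forward shadow) of `solution` w.r.t. T."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    removed = set(solution)
    from_t = reachable(forward, terminals, removed)    # T can reach these
    to_t = reachable(backward, terminals, removed)     # these can reach T
    remaining = set(vertices) - removed
    return remaining - to_t, remaining - from_t

if __name__ == "__main__":
    V = ["t", "a", "b", "c"]
    E = [("t", "a"), ("a", "b"), ("b", "t"), ("a", "c")]
    print(shadow(V, E, {"t"}, {"a"}))
```

In the example, removing a cuts both b and c off from t in the forward direction, while only c can no longer reach t, so the reverse shadow is {c} and the forward shadow is {b, c}.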

The aim is to ensure first that there is a
solution whose shadow is empty, as finding such a
shadowless solution can be a significantly easier
task.
Key Results


For the $\mathcal{F}$-transversal problem defined above, [2]
shows how to invoke the technique of random
sampling of important separators and obtain a set
Z which is disjoint from a minimum solution X
and covers its shadow.


Theorem 1 (randomized covering of the
shadow) Let $T \subseteq V(G)$. There is an algorithm
RandomSet(G,T,k) that runs in $4^k \cdot n^{O(1)}$
time and returns a set $Z \subseteq V(G)$ such that
for any set $\mathcal{F}$ of T-connected subgraphs, if
there exists an $\mathcal{F}$-transversal of size $\le k$, then
the following holds with probability $2^{-O(k^2)}$:
there is an $\mathcal{F}$-transversal X of size $\le k$ such
that


1. $X \cap Z = \emptyset$ and
2. Z covers the shadow of X, i.e., $r(X) \cup f(X) \subseteq Z$.


The set $\mathcal{F}$ is not an input of the algorithm
described by Theorem 1: the set Z constructed in
the above theorem works for every T-connected
set $\mathcal{F}$ of subgraphs. Therefore, issues related to
the representation of $\mathcal{F}$ do not arise. Theorem 1
can be derandomized using the theory of
splitters [11]:


Theorem 2 (deterministic covering of the
shadow) Let $T \subseteq V(G)$. We can construct a
set $\{Z_1, Z_2, \ldots, Z_t\}$ with $t = 2^{O(k^2)} \cdot \log^2 n$
in time $2^{O(k^2)} \cdot n^{O(1)}$ such that for any set $\mathcal{F}$ of
T-connected subgraphs, if there exists an $\mathcal{F}$-transversal of
size $\le k$, then there is an $\mathcal{F}$-transversal X of
size $\le k$ such that for at least one $1 \le i \le t$ we
have


1. $X \cap Z_i = \emptyset$ and
2. $Z_i$ covers the shadow of X, i.e., $r(X) \cup f(X) \subseteq Z_i$.


Consider one such set $Z_i$ for some
$1 \le i \le 2^{O(k^2)} \cdot \log^2 n$. Since this set $Z_i$ is disjoint from
a minimum solution X, it can be removed from
the graph. However, we need to remember the
structure that the set $Z_i$ imposed on the
problem. This structure is problem specific, and the
reduced (equivalent) instance is obtained on a
supergraph of $G \setminus Z_i$ via the torso operation. It
can be shown that the original instance G has
a solution if and only if the reduced instance
has a shadowless solution. Therefore, one can
focus on the simpler task of finding a shadowless
solution or, more precisely, finding any solution
under the guarantee that a shadowless solution
exists.


Applications 


The first FPT algorithms for the Directed
Multiway Cut problem [3] and the Directed Subset
Feedback Vertex Set problem [2] were obtained
via the framework of shadowless solutions.


Directed Multiway Cut 

In the Directed Multiway Cut problem, given
a directed graph $G = (V, E)$, an integer k,
and a set of terminals $T = \{t_1, t_2, \ldots, t_p\}$, the
objective is to find whether there exists a set
$X \subseteq V(G)$ of size at most k such that $G \setminus X$
has no $t_i \to t_j$ path for any $1 \le i \ne j \le p$.
Let $\mathcal{F}$ be the set of all paths between pairs of
(distinct) terminals. Then it is easy to show that
$\mathcal{F}$ is T-connected, and the problem of finding an
$\mathcal{F}$-transversal is exactly the same as the Directed
Multiway Cut problem. It is shown in [3] that
a shadowless solution of Directed Multiway Cut
is also a solution of the underlying undirected
instance of Multiway Cut, which is known to
be FPT [8] parameterized by k. Combining with
Theorem 2, this gives an FPT algorithm for the
Directed Multiway Cut problem.


Directed Subset Feedback Vertex Set 

In the Directed Subset Feedback Vertex Set
problem, given a directed graph $G = (V, E)$, an
integer k, and a set $S \subseteq V(G)$, the objective
is to find whether there exists a set $X \subseteq V(G)$
of size at most k such that $G \setminus X$ has no
S-cycles, i.e., cycles containing at least one vertex
of S. The special case when $S = V(G)$ is the
Directed Feedback Vertex Set problem. Let $\mathcal{F}$ be
the set of all S-cycles and T be a solution of
size k + 1 (which can be obtained via iterative
compression). Then it is easy to show that $\mathcal{F}$ is
T-connected, and the problem of finding an
$\mathcal{F}$-transversal is exactly the same as the Directed
Subset Feedback Vertex Set problem. It is shown
in [2] that a shadowless solution of Directed
Subset Feedback Vertex Set can be found in FPT
time. Combining with Theorem 2, this gives an
FPT algorithm for the Directed Subset Feedback
Vertex Set problem. This generalizes the FPT
algorithm for Directed Feedback Vertex Set [1].


Open Problems 


The two main open problems which fit within
the framework of “Finding an $\mathcal{F}$-transversal for
some T-connected $\mathcal{F}$” are Directed Multicut and
Directed Odd Cycle Transversal. Unfortunately,
the structure of shadowless solutions is not yet
understood well enough to be able to find them in
FPT time.


Directed Multicut 

In the Directed Multicut problem, given
a directed graph $G = (V, E)$, an integer k,
and a set of terminal pairs
$T = \{(s_1, t_1), (s_2, t_2), \ldots, (s_p, t_p)\}$, the objective is
to find whether there exists a set $X \subseteq V(G)$ of
size at most k such that $G \setminus X$ has no $s_i \to t_i$
path for any $1 \le i \le p$. Let $\mathcal{F}$ be the union of the
sets of all $s_i \to t_i$ paths for $1 \le i \le p$. Then it
is easy to show that $\mathcal{F}$ is T-connected, and the
problem of finding an $\mathcal{F}$-transversal is exactly
the same as the Directed Multicut problem. It is
known [9] that Directed Multicut parameterized
by k is W[1]-hard. However, for the special case
of p = 2 terminal pairs, the problem can be
reduced to Directed Multiway Cut and is hence
FPT parameterized by k [3]. The complexity for
p = 3 parameterized by k is an important open
problem. With respect to the bigger parameter
p + k, the problem is known [5] to be FPT on
directed acyclic graphs. However, this algorithm
heavily uses the properties of a topological
ordering, and the complexity parameterized by
p + k on general graphs is another important
open problem.


Directed Odd Cycle Transversal 

In the Directed Odd Cycle Transversal problem,
given a directed graph $G = (V, E)$ and an
integer k, the objective is to find whether there
exists a set $X \subseteq V(G)$ of size at most k
such that $G \setminus X$ has no cycle of odd length.
Let $\mathcal{F}$ be the set of all odd cycles in G and
T be a solution of size k + 1 (which can be
obtained via iterative compression [12]). Then it
is easy to show that $\mathcal{F}$ is T-connected, and the
problem of finding an $\mathcal{F}$-transversal is exactly
the same as the Directed Odd Cycle Transversal
problem. The complexity parameterized by k is
open. Moreover, it is known that the Directed Odd
Cycle Transversal problem generalizes the
Directed Feedback Vertex Set problem [1] and the
Undirected Odd Cycle Transversal problem [12].
Hence, an FPT algorithm for Directed Odd Cycle
Transversal would have to generalize the ideas
used to obtain FPT algorithms for these two
problems.
Problem Definition


The problem is concerned with scheduling
dynamically arriving jobs in the scenario when the
processing requirements of jobs are unknown to
the scheduler. The lack of knowledge of how
long a job will take to execute is a particularly
attractive assumption in real systems where such
information might be difficult or impossible to
obtain. The goal is to schedule jobs to provide
good quality of service to the users. In particular
the goal is to design algorithms that have good
average performance and are also fair in the sense
that no subset of users experiences substantially
worse performance than others.


Notations 


Let J = {1,2,...,n} denote the set of jobs in
the input instance. Each job j is characterized by
its release time $r_j$ and its processing requirement
$p_j$. In the online setting, job j is revealed to the
scheduler only at time $r_j$. A further restriction
is the non-clairvoyant setting, where only the
existence of job j is revealed at $r_j$; in particular
the scheduler does not know $p_j$ until the job
meets its processing requirement and leaves the
system. Given a schedule, the completion time
$c_j$ of a job is the earliest time at which job j
receives $p_j$ amount of service. The flow time $f_j$
of j is defined as $c_j - r_j$. The stretch of a job
is defined as the ratio of its flow time to
its size. Stretch is also referred to as normalized
flow time or slowdown and is a natural measure
of fairness as it measures the waiting time of a job
per unit of service received. A schedule is said to
be preemptive if a job can be interrupted arbitrarily,
and its execution can be resumed later from
the point of interruption without any penalty. It
is well known that preemption is necessary to
obtain reasonable guarantees for flow time even
in the offline setting [6].

Recall that the online shortest remaining pro- 
cessing time (SRPT) algorithm that at any time 
works on the job with the least remaining pro- 
cessing is optimum for minimizing average flow 
time. However, a common critique of SRPT is 
that it may lead to starvation of jobs, where some 
jobs may be delayed indefinitely. For example, 
consider the sequence where a job of size 3 
arrives at time ¢ = 0 and one job of size | arrives 
every unit of time starting ¢ = | for a long time. 
Under SRPT, the size 3 job will be delayed until 
the size 1 jobs stop arriving. On the other hand, 
if the goal is to minimize the maximum flow 
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time, then it is easily seen that first in first out 
(FIFO) is the optimum algorithm. However, FIFO 
can perform very poorly with respect to average 
flow time (e.g., many small jobs could be stuck 
behind a very large job that arrived just earlier). A 
natural way to balance both the average and worst
case performance is to consider the $\ell_p$ norms of
flow time and stretch, where the $\ell_p$ norm of the
sequence $x_1, \ldots, x_n$ is defined as

$$\left(\sum_i x_i^p\right)^{1/p}.$$
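
The starvation example and the norm-based measures above can be made concrete with a small simulation. The following Python sketch is illustrative only: it assumes integer release times and sizes, advances time in unit steps, and uses the hypothetical names `simulate` and `lp_norm`, which do not come from the cited papers.

```python
def simulate(jobs, policy):
    """Unit-step simulation of a single machine.

    jobs: list of (release_time, size) pairs with integer entries.
    policy: "SRPT" or "FIFO".
    Returns the list of flow times c_j - r_j.
    """
    remaining = [size for _, size in jobs]
    flow = [None] * len(jobs)
    t = 0
    while any(f is None for f in flow):
        # Jobs that have been released and are not yet finished.
        ready = [j for j, (r, _) in enumerate(jobs) if r <= t and flow[j] is None]
        if ready:
            if policy == "SRPT":
                # Least remaining processing first (ties: earlier release).
                j = min(ready, key=lambda k: (remaining[k], jobs[k][0]))
            else:
                # FIFO: earliest release first.
                j = min(ready, key=lambda k: jobs[k][0])
            remaining[j] -= 1
            if remaining[j] == 0:
                flow[j] = t + 1 - jobs[j][0]
        t += 1
    return flow


def lp_norm(xs, p):
    """The l_p norm (sum_i x_i^p)^(1/p) of a sequence of non-negative numbers."""
    return sum(x ** p for x in xs) ** (1.0 / p)


# One job of size 3 at time 0, then one unit job at each of the times 1..10.
jobs = [(0, 3)] + [(t, 1) for t in range(1, 11)]
srpt = simulate(jobs, "SRPT")
fifo = simulate(jobs, "FIFO")
srpt_stretch = [f / size for f, (_, size) in zip(srpt, jobs)]
```

On this instance SRPT has the smaller total flow time, while FIFO has the smaller maximum flow time and the smaller $\ell_2$ norm, which is the trade-off the $\ell_p$ norms are meant to capture.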


The shortest elapsed time first (SETF) is a 
non-clairvoyant algorithm that at any time works 
on the job that has received the least amount of 
service thus far. This is a natural way to favor 
short jobs given the lack of knowledge of job 
sizes. In fact, SETF is the continuous version of 
the multilevel feedback (MLF) algorithm. Unfor- 
tunately, SETF (or any other deterministic non- 
clairvoyant algorithm) performs poorly in the 
framework of competitive analysis, where an al- 
gorithm is called c-competitive if for every input 
instance, its performance is no worse than c times 
that of the optimum offline (clairvoyant) solution 
for that instance [7]. However, competitive anal- 
ysis can be overly pessimistic in its guarantee. 
A way around this problem was proposed by 
Kalyanasundaram and Pruhs [5] who allowed 
the online scheduler a slightly faster processor 
to make up for its lack of knowledge of future 
arrivals and job sizes. Formally, an algorithm Alg
is said to be $s$-speed, $c$-competitive, where
$c$ is the worst case ratio over all instances $I$
of $\mathrm{Alg}_s(I)/\mathrm{Opt}_1(I)$, where $\mathrm{Alg}_s$ is the value of the
solution produced by Alg when given an $s$-speed
processor, and $\mathrm{Opt}_1$ is the optimum value using a
speed 1 processor. Typically the most interesting
results are those where $c$ is small and $s = 1 + \epsilon$
for an arbitrary $\epsilon > 0$.
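
As a rough illustration of SETF, the sketch below discretizes time into small quanta and always serves the released, unfinished job with the least attained service. This is an approximation of the continuous algorithm under stated assumptions: the quantum size, the tie-breaking rule, and the name `setf_flow_times` are choices made here for illustration. Job sizes are consulted only to detect completion, which mirrors the non-clairvoyant setting.

```python
from fractions import Fraction


def setf_flow_times(jobs, quantum=Fraction(1, 100)):
    """Discretized shortest-elapsed-time-first on a speed-1 machine.

    jobs: list of (release_time, size) pairs.
    Returns the list of flow times (as Fractions).
    """
    attained = [Fraction(0)] * len(jobs)
    flow = [None] * len(jobs)
    t = Fraction(0)
    while any(f is None for f in flow):
        ready = [j for j, (r, _) in enumerate(jobs) if r <= t and flow[j] is None]
        if ready:
            # Serve the job with the least service so far (ties: lower index).
            j = min(ready, key=lambda k: (attained[k], k))
            attained[j] += quantum
            # The size is inspected only to notice that the job has finished.
            if attained[j] >= jobs[j][1]:
                flow[j] = t + quantum - jobs[j][0]
        t += quantum
    return flow


# Two jobs released together: SETF shares the processor until the short one ends.
flows = setf_flow_times([(0, 1), (0, 2)])
```

With two jobs of sizes 1 and 2 released at time 0, the short job finishes close to time 2 (both have received one unit of service by then) and the long job finishes at time 3.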


Key Results 


In their seminal paper [5], Kalyanasundaram and 
Pruhs showed the following. 


Theorem 1 ([5]) SETF is a $(1 + \epsilon)$-speed,
$(1 + 1/\epsilon)$-competitive non-clairvoyant algorithm for
minimizing the average flow time on a single
machine with preemptions.

For minimizing the average stretch, Muthukrishnan,
Rajaraman, Shaheen, and Gehrke [6]
considered the clairvoyant setting and showed
that SRPT is 2-competitive for a single machine
and 14-competitive for multiple machines. The
non-clairvoyant setting was considered by Bansal,
Dhamdhere, Könemann, and Sinha [1]. They
showed the following.


Theorem 2 ([1]) SETF is $(1 + \epsilon)$-speed,
$O(\log^2 P)$-competitive for minimizing average
stretch, where $P$ is the ratio of the maximum to
minimum job size. On the other hand, even with
$O(1)$-speed, any non-clairvoyant algorithm is
at least $\Omega(\log P)$-competitive. Interestingly, in
terms of $n$, any non-clairvoyant algorithm must
be $\Omega(n)$-competitive even with $O(1)$-speedup.
Moreover, SETF is $O(n)$-competitive (even
without extra speedup). For the special case
when all jobs arrive at time 0, SETF is optimum
up to constant factors. It is $O(\log P)$-competitive
(without any extra speedup). Moreover, any non-clairvoyant
algorithm must be $\Omega(\log P)$-competitive even
with $O(1)$-speedup.

The key idea of the above result was a connection
between SETF and SRPT. First, at the
expense of $(1 + \epsilon)$-speedup, it can be seen that
SETF is no worse than MLF where the thresholds
are powers of $(1 + \epsilon)$. Second, the behavior of
MLF on an instance $I$ can be related to the
behavior of the shortest job first (SJF) algorithm
on another instance $I'$ that is obtained from $I$ by
dividing each job into logarithmically many jobs
with geometrically increasing sizes. Finally, the
performance of SJF is related to SRPT using
another $(1 + \epsilon)$ factor speedup.

Bansal and Pruhs [2] considered the problem
of minimizing the $\ell_p$ norms of flow time and
stretch on a single machine. They showed the
following.


Theorem 3 ([2]) In the clairvoyant setting,
SRPT and SJF are $(1 + \epsilon)$-speed, $O(1/\epsilon)$-competitive
for minimizing the $\ell_p$ norms of
both flow time and stretch. On the other hand,
for $1 < p < \infty$, no online algorithm
(possibly clairvoyant) can be $O(1)$-competitive
for minimizing $\ell_p$ norms of stretch or flow time
without speedup. In particular, any randomized
online algorithm is at least $\Omega(n^{(p-1)/(3p^2)})$-competitive
for $\ell_p$ norms of stretch and is at least
$\Omega(n^{(p-1)/(p(3p-1))})$-competitive for $\ell_p$ norms of
flow time.


The above lower bounds are somewhat surprising,
since SRPT and FIFO are optimum for
the cases $p = 1$ and $p = \infty$, respectively, for flow time.

Bansal and Pruhs [2] also consider the non- 
clairvoyant case. 


Theorem 4 ([2]) In the non-clairvoyant setting,
SETF is $(1 + \epsilon)$-speed, $O(1/\epsilon^{2+2/p})$-competitive
for minimizing the $\ell_p$ norms of flow time.
For minimizing $\ell_p$ norms of stretch, SETF
is $(1 + \epsilon)$-speed, $O(1/\epsilon^{3+1/p} \cdot \log^{1+1/p} P)$-competitive.


Finally, Bansal and Pruhs also consider round 
robin (RR) or processor sharing that at any time 
splits the processor equally among the unfinished 
jobs. RR is considered to be an ideal fair strategy 
since it treats all unfinished jobs equally. How- 
ever, they show that 


Theorem 5 For any $p \geq 1$, there is an $\epsilon > 0$
such that even with a $(1 + \epsilon)$ times faster processor,
RR is not $n^{o(1)}$-competitive for minimizing
the $\ell_p$ norms of flow time. In particular, for
$\epsilon < 1/(2p)$, RR is $(1 + \epsilon)$-speed, $\Omega(n^{(1-2\epsilon p)/p})$-competitive.
For $\ell_p$ norms of stretch, RR is $\Omega(n)$-competitive,
as is in fact any randomized non-clairvoyant
algorithm.


The results above have been extended in a 
couple of directions. Bansal and Pruhs [3] extend 
these results to weighted $\ell_p$ norms of flow time
and stretch. Chekuri, Khanna, Kumar, and Goel 
[4] have extended these results to the multiple 
machines case. Their algorithms are particularly 
elegant: Each job is assigned to some machine at 
random, and all jobs at a particular machine are 
processed using SRPT or SETF (as applicable). 


Applications 


SETF and its variants such as MLF are widely 
used in operating systems [9, 10]. Note that SETF 
is not really practical since each job could be 
preempted infinitely often. However, variants
of SETF with fewer preemptions are quite 
popular. 


Open Problems 


It would be interesting to explore other notions 
of fairness in the dynamic scheduling setting. 
In particular, it would be interesting to consider 
algorithms that are both fair and have a good 
average performance. 

An immediate open problem is whether the 
gap between $O(\log^2 P)$ and $\Omega(\log P)$ can be
closed for minimizing the average stretch in the 
non-clairvoyant setting. 
Problem Definition


Consider the route-planning task for passengers 
of scheduled public transportation. Here, the run- 
ning example is that of a train system, but the 
discussion applies equally to bus, light-rail and 
similar systems. More precisely, the task is to 
construct a timetable information system that, 
based upon the detailed schedules of all trains, 
provides passengers with good itineraries, includ- 
ing the transfer between different trains. 
Solutions to this problem consist of a model 
of the situation (e.g., can queries specify a limit 
on the number of transfers?), an algorithmic 
approach, its mathematical analysis (does it 
always return the best solution? Is it guaranteed 
to work fast in all settings?), and an evaluation 
in the real world (Can travelers actually use the 
produced itineraries? Is an implementation fast 
enough on current computers and real data?). 




Key Results 


The problem is discussed in detail in a recent 
survey article [6]. 


Modeling 

In a simplistic model, it is assumed that a transfer 
between trains does not take time. A more real- 
istic model specifies a certain minimum transfer 
time per station. Furthermore, the objective of the 
optimization problem needs to be defined. Should 
the itinerary be as fast as possible, or as cheap 
as possible, or induce the least possible trans- 
fers? There are different ways to resolve this as 
surveyed in [6], all originating in multi-objective 
optimization, like resource constraints or Pareto- 
optimal solutions. From a practical point of view, 
the preferences of a traveler are usually difficult 
to model mathematically, and one might want to
let the user choose the best option among a set of
reasonable itineraries. For example, one
can compute all itineraries that are not inferior to 
some other itinerary in all considered aspects. As 
it turns out, in real timetables the number of such 
itineraries is not too big, such that this approach 
is computationally feasible and useful for the 
traveler [5]. Additionally, the fare structure of 
most railways is fairly complicated [4], mainly 
because fares usually are not additive, i.e., are not 
the sum of fares of the parts of a trip. 
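
The idea of keeping only itineraries that are not inferior to another one can be sketched as a simple dominance filter. The snippet below is a minimal illustration, assuming each itinerary is a tuple of criteria where smaller is better (here travel time in minutes, fare, and number of transfers); the names `dominates` and `pareto_optimal` and the sample values are invented for the example.

```python
def dominates(a, b):
    """True if a is at least as good as b in every criterion and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_optimal(itineraries):
    """Keep the itineraries that no other itinerary dominates."""
    return [a for a in itineraries if not any(dominates(b, a) for b in itineraries)]


# (travel time in minutes, fare, number of transfers); smaller is better.
itineraries = [
    (120, 35.0, 1),
    (150, 20.0, 0),
    (125, 40.0, 2),  # worse than (120, 35.0, 1) in every criterion
    (180, 20.0, 0),  # slower than (150, 20.0, 0) at the same fare and transfers
]
choices = pareto_optimal(itineraries)
```

This quadratic filter is adequate for the small candidate sets mentioned above; it is not meant as the method used in the cited work.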


Algorithmic Models 

The current literature establishes two main ideas
for transforming the situation into a shortest path
problem on a graph. As an example, consider 
the simplistic modeling where transfer takes no 
time, and where queries specify starting time and 
station to ask for an itinerary that achieves the 
earliest arrival time at the destination. 

In the time-expanded model [11], every arrival 
and departure event of the timetable is a vertex of 
the directed graph. The arcs of the graph repre- 
sent consecutive events at one station, and direct 
train connections. The length of an arc is given 
by the time difference of its end vertices. Let s 
be the vertex at the source station whose time is 
directly after the starting time. Now, a shortest 
path from s to any vertex of the destination station
is an optimal itinerary. 
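
A minimal sketch of the time-expanded model for the simplistic setting (zero transfer time, earliest-arrival queries) might look as follows. It assumes a timetable given as tuples (departure station, departure time, arrival station, arrival time) with times in minutes, and it merges events that share a station and a time into one vertex; the function names and the sample timetable are hypothetical.

```python
import heapq
from collections import defaultdict


def build_time_expanded(connections):
    """Vertices are (station, time) events; arcs are waiting arcs and train arcs."""
    events = defaultdict(set)
    for dep_st, dep_t, arr_st, arr_t in connections:
        events[dep_st].add(dep_t)
        events[arr_st].add(arr_t)
    graph = defaultdict(list)
    for station, times in events.items():
        ordered = sorted(times)
        # Consecutive events at one station (waiting).
        for a, b in zip(ordered, ordered[1:]):
            graph[(station, a)].append(((station, b), b - a))
    # Direct train connections; arc length is the time difference.
    for dep_st, dep_t, arr_st, arr_t in connections:
        graph[(dep_st, dep_t)].append(((arr_st, arr_t), arr_t - dep_t))
    return events, graph


def earliest_arrival(connections, source, start_time, target):
    """Dijkstra from the first event at the source not before start_time."""
    events, graph = build_time_expanded(connections)
    later = [t for t in events[source] if t >= start_time]
    if not later:
        return None
    s = (source, min(later))
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        if v[0] == target:
            return v[1]  # first settled vertex of the destination station
        for w, length in graph[v]:
            if d + length < dist.get(w, float("inf")):
                dist[w] = d + length
                heapq.heappush(heap, (d + length, w))
    return None


connections = [
    ("A", 480, "B", 540),  # 08:00 -> 09:00
    ("B", 550, "C", 600),  # 09:10 -> 10:00
    ("A", 510, "C", 630),  # 08:30 -> 10:30
]
```

For a start at 07:50 the route via B arrives at 10:00; for a start at 08:10 only the direct train remains, arriving at 10:30.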

In the time-dependent model [3, 7, 9, 10], 
the vertices model stations, and the arcs stand 
for the existence of a direct (non-stop) train 
connection. Instead of edge length, the arcs are 
labeled with edge-traversal functions that give the 
arrival time at the end of the arc in dependence 
on the time a passenger starts at the beginning of 
the arc, reflecting the times when trains actually 
run. To solve this time-dependent shortest path 
problem, a modification of Dijkstra’s algorithm 
can be used. Further exploiting the structure of 
this situation, the graph can be represented in 
a way that allows constant time evaluation of the 
link traversal functions [3]. To cope with more re- 
alistic transfer models, a more complicated graph 
can be used. 
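
The time-dependent model can be sketched in a similar way. In the illustration below, each arc stores its list of (departure, arrival) pairs, the traversal function is evaluated by a plain linear scan rather than the constant-time representation mentioned above, and the labels in the modified Dijkstra search are arrival times. The timetable format (times in minutes) and all names are assumptions made for the example.

```python
import heapq
from collections import defaultdict


def build_time_dependent(connections):
    """Vertices are stations; each arc carries the trains running on it."""
    graph = defaultdict(lambda: defaultdict(list))
    for dep_st, dep_t, arr_st, arr_t in connections:
        graph[dep_st][arr_st].append((dep_t, arr_t))
    return graph


def traverse(trains, t):
    """Arrival time at the end of the arc when starting at time t (None if no train)."""
    arrivals = [arr for dep, arr in trains if dep >= t]
    return min(arrivals) if arrivals else None


def earliest_arrival_td(connections, source, start_time, target):
    """Dijkstra variant whose labels are earliest arrival times at stations."""
    graph = build_time_dependent(connections)
    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best[u]:
            continue
        if u == target:
            return t
        for v, trains in graph[u].items():
            arr = traverse(trains, t)
            if arr is not None and arr < best.get(v, float("inf")):
                best[v] = arr
                heapq.heappush(heap, (arr, v))
    return None


connections = [
    ("A", 480, "B", 540),  # 08:00 -> 09:00
    ("B", 550, "C", 600),  # 09:10 -> 10:00
    ("A", 510, "C", 630),  # 08:30 -> 10:30
]
```

The graph here has one vertex per station instead of one per event, which is the source of the smaller search space in this model.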

Additionally, many of the speed-up techniques 
for shortest path computations can be applied to 
the resulting graph queries. 


Applications 


The main applications are timetable information
systems for scheduled transit (buses, trains, etc.). 
This extends to route planning where trips in 
such systems are allowed, as for example in 
the setting of fine-grained traffic simulation to 
compute fastest itineraries [2]. 


Open Problems 


Improve computation speed, in particular for
fully integrated timetables and the multi-criteria
case. Extend the problem to the dynamic case,
where the current real situation is reflected,
i.e., delayed or canceled trains and other
temporary changes to the timetable.


Experimental Results


In the cited literature, experimental results usually
are part of the contribution [2, 4, 5, 6, 7,
8, 9, 10, 11]. The time-dependent approach can
be significantly faster than the time-expanded
approach. In particular, for the simplistic models,
speed-ups by factors in the range 10-45 are observed [8,
10]. For more detailed models, the performance
of the two approaches becomes comparable [6].
Problem Definition


This problem is to find shortest paths in planar 
graphs with general edge weights. It is known that 
shortest paths exist only in graphs that contain 
no negative weight cycles. Therefore, algorithms 
that work in this case must deal with the presence 
of negative cycles, i.e., they must be able to detect 
negative cycles. 

In general graphs, the best known algorithm, 
the Bellman-Ford algorithm, runs in time O(mn) 
on graphs with n nodes and m edges, while algo- 
rithms on graphs with no negative weight edges 
run much faster. For example, Dijkstra's algorithm
implemented with a Fibonacci heap runs
in time $O(m + n \log n)$, and, in the case of integer
weights, Thorup's algorithm runs in linear time.
Goldberg [5] also presented an $O(m\sqrt{n} \log L)$-time
algorithm, where $L$ denotes the absolute
value of the most negative edge weight. Note
that his algorithm is weakly polynomial.


Notations 

Given a directed graph $G = (V, E)$ and a weight
function $w: E \to \mathbb{R}$ on its directed edges, a distance
labeling for a source node $s$ is a function
$d: V \to \mathbb{R}$ such that $d(v)$ is the minimum length
over all $s$-to-$v$ paths, where the length of a path $P$
is $\sum_{e \in P} w(e)$.


Problem 1 (Single-Source-Shortest-Path) 
INPUT: A directed graph $G = (V, E)$, weight
function $w: E \to \mathbb{R}$, source node $s \in V$.
OUTPUT: If $G$ does not contain negative length
cycles, output a distance labeling $d$ for source
node $s$. Otherwise, report that the graph contains
some negative length cycle.
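
For reference, the input/output behavior of Problem 1 can be illustrated with the general-graph Bellman-Ford algorithm mentioned earlier. This is the $O(mn)$ baseline, not the planar algorithm discussed in this entry, and the sketch reports only negative cycles reachable from the source; the function name and the edge-list format are choices made here.

```python
def bellman_ford(nodes, edges, source):
    """Return a distance labeling from source, or None if a negative cycle
    reachable from source exists.

    edges: list of (u, v, weight) triples for directed edges u -> v.
    """
    dist = {v: float("inf") for v in nodes}
    dist[source] = 0
    # At most n - 1 rounds of relaxing every edge.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # Any edge that can still be relaxed witnesses a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


nodes = ["s", "a", "b", "c"]
edges = [("s", "a", 4), ("s", "b", 5), ("a", "b", -3), ("b", "c", 2)]
labels = bellman_ford(nodes, edges, "s")
# Adding c -> a with weight -1 closes the cycle a -> b -> c -> a of length -2.
with_cycle = bellman_ford(nodes, edges + [("c", "a", -1)], "s")
```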


The algorithm by Fakcharoenphol and Rao [4]
deals with the case when $G$ is planar. They gave
an $O(n \log^3 n)$-time algorithm, improving on an
$O(n^{3/2})$-time algorithm by Lipton, Rose, and
Tarjan [9] and an $O(n^{4/3} \log(nL))$-time algorithm
by Henzinger, Klein, Rao, and Subramanian [6].

Their algorithm, like all previous algorithms,
uses a recursive decomposition and constructs 
a data structure called a dense distance graph, 
which shall be defined next. 


Shortest Paths in Planar Graphs with Negative Weight Edges 


A decomposition of a graph is a set of subsets
$P_1, P_2, \ldots, P_k$ (not necessarily disjoint) such
that the union of all the sets is $V$ and for all
$e = (u, v) \in E$, there is a unique $P_i$ that contains
$e$. A node $v$ is a border node of a set $P_i$ if $v \in P_i$
and there exists an edge $e = (v, x)$ where $x \notin P_i$.
The subgraph induced on a subset $P_i$ is referred to
as a piece of the decomposition.
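
The definition of border nodes can be checked on a toy example. The sketch below assumes that adjacency is considered regardless of edge direction (an interpretation made here, since separators are usually taken in the underlying undirected graph); `border_nodes` is a hypothetical helper name.

```python
def border_nodes(piece, edges):
    """Nodes of the piece that have an incident edge leaving the piece."""
    piece = set(piece)
    border = set()
    for u, v in edges:
        if u in piece and v not in piece:
            border.add(u)
        if v in piece and u not in piece:
            border.add(v)
    return border


# A path 1 - 2 - 3 - 4 - 5 split at node 3 into two overlapping pieces.
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
p1 = {1, 2, 3}  # contains the edges (1, 2) and (2, 3)
p2 = {3, 4, 5}  # contains the edges (3, 4) and (4, 5)
```

Here node 3 is the only border node of both pieces, and every edge lies in exactly one piece, as the definition requires.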

The algorithm works with a recursive decom- 
position where at each level, a piece with n nodes 
and r border nodes is divided into two subpieces 
such that each subpiece has no more than $2n/3$
nodes and at most $2r/3 + c\sqrt{n}$ border nodes,
for some constant $c$. In this recursive context,
a border node of a subpiece is defined to be any 
border node of the original piece or any new 
border node introduced by the decomposition of 
the current piece. 

With this recursive decomposition, the level 
of a decomposition can be defined in the nat- 
ural way, with the entire graph being the only 
piece in the level 0 decomposition, the pieces 
of the decomposition of the entire graph being 
the level 1 pieces in the decomposition, and 
so on. 

For each piece of the decomposition, the all- 
pair shortest path distances between all its bor- 
der nodes along paths that lie entirely inside 
the piece are recursively computed. These all- 
pair distances form the edge set of a non-planar 
graph representing shortest paths between border 
nodes. The dense distance graph of the planar 
graph is the union of these graphs over all the 
levels. 

Using the dense distance graph, the shortest 
distance queries between pairs of nodes can be 
answered. 


Problem 2 (Shortest-Path-Distance-Data-Structure)
INPUT: A directed graph $G = (V, E)$, weight
function $w: E \to \mathbb{R}$, source node $s \in V$.
OUTPUT: If $G$ does not contain negative length
cycles, output a data structure that supports distance
queries between pairs of nodes. Otherwise,
report that the graph contains some negative
length cycle.


The algorithm of Fakcharoenphol and Rao relies
heavily on planarity, i.e., it exploits properties
regarding how shortest paths on each piece
intersect. Therefore, unlike previous algorithms 
that require only that the graph can be recur- 
sively decomposed with small numbers of border 
nodes [10], their algorithm also requires that each 
piece has a nice embedding. 

Given an embedding of the piece, a hole is 
a bounded face where all adjacent nodes are bor- 
der nodes. Ideally, one would hope that there is 
a planar embedding of any piece in the recursive 
decomposition where all the border nodes are 
on a single face and are circularly ordered, i.e., 
there are no holes in any piece. Although this is
not always true, the algorithm works with any 
decomposition with a constant number of holes 
in each piece. This decomposition can be found in 
O(n log n) time using the simple cycle separator 
algorithm by Miller [12]. 


Key Results 


Theorem 1 Given a recursive decomposition of
a planar graph such that each piece of the decomposition
contains at most a constant number
of holes, there is an algorithm that constructs the
dense distance graph in $O(n \log^3 n)$ time.


Given the procedure that constructs the dense dis- 
tance graph, the shortest paths from a source s can 
be computed by first adding s as a border node 
in every piece of the decomposition, computing 
the dense distance graph, and then extending the 
distances into all internal nodes on every piece. 
This can be done in time $O(n \log^3 n)$.


Theorem 2 The single-source shortest path
problem for an $n$-node planar graph with
negative weight edges can be solved in time
$O(n \log^3 n)$.


The dense distance graph can be used to answer 
distance queries between pairs of nodes. 


Theorem 3 Given the dense distance graph, the
shortest distance between any pair of nodes can
be found in $O(\sqrt{n} \log^2 n)$ time.


It can also be used as a dynamic data structure
that answers shortest path queries and allows 
edge cost updates. 


Theorem 4 For planar graphs with only non-negative
weight edges, there is a dynamic data
structure that supports distance queries and
update operations that change edge weights in
amortized $O(n^{2/3} \log^{7/3} n)$ time per operation.
For planar graphs with negative weight edges,
there is a dynamic data structure that supports
the same set of operations in amortized
$O(n^{4/5} \log^{13/5} n)$ time per operation.


Note that the dynamic data structure does not 
support edge insertions and deletions, since these 
operations might destroy the recursive decompo- 
sition. 


Applications 


The shortest path problem has long been studied 
and continues to find applications in diverse areas.
There are many problems that reduce to
the shortest path problem where negative weight 
edges are required, for example the minimum- 
mean length directed circuit. For planar graphs, 
the problem has wide application even when the 
underlying graph is a grid. For example, there 
are recent image segmentation approaches that 
use negative cycle detection [2, 3]. Other
applications for planar graphs include separator
algorithms [13] and multi-source multi-sink flow 
algorithms [11]. 


Open Problems 


Klein [8] gives a technique that improves the
running time of the construction of the dense distance
graph to $O(n \log^2 n)$ when all edge weights
are non-negative; this also reduces the amortized
running time for the dynamic case down to
$O(n^{2/3} \log^{5/3} n)$. Also, for planar graphs with no
negative weight edges, Cabello [1] gives a faster
algorithm for computing the shortest distances
between $k$ pairs of nodes. However, the problem
of improving the bound of $O(n \log^3 n)$ for finding
shortest paths in planar graphs with general
edge weights remains open.

It is not known how to handle edge inser- 
tions and deletions in the dynamic data structure. 
A new data structure might be needed instead of 
the dense distance graph, because the dense dis- 
tance graph is determined by the decomposition. 
Problem Definition


A point lattice is the set of all integer linear
combinations

$$\mathcal{L}(b_1, \ldots, b_n) = \left\{ \sum_{i=1}^{n} x_i b_i : x_1, \ldots, x_n \in \mathbb{Z} \right\}$$

of $n$ linearly independent vectors $b_1, \ldots, b_n \in \mathbb{R}^m$
in $m$-dimensional Euclidean space. For computational
purposes, the lattice vectors $b_1, \ldots, b_n$
are often assumed to have integer (or rational)
entries, so that the lattice can be represented
by an integer matrix $B = [b_1, \ldots, b_n] \in \mathbb{Z}^{m \times n}$
(called basis) having the generating vectors as
columns. Using matrix notation, lattice points
in $\mathcal{L}(B)$ can be conveniently represented as $Bx$
where $x$ is an integer vector. The integers $m$ and
n are called the dimension and rank of the lattice 
respectively. Notice that any lattice admits mul- 
tiple bases, but they all have the same rank and 
dimension. 

The main computational problems on lattices 
are the Shortest Vector Problem, which asks 
to find the shortest nonzero vector in a given 
lattice, and the Closest Vector Problem, which 
asks to find the lattice point closest to a given 
target. Both problems can be defined with
respect to any norm, but the Euclidean norm
$\|v\| = \sqrt{\sum_i v_i^2}$ is the most common. Other
norms typically found in computer science
applications are the $\ell_1$ norm $\|v\|_1 = \sum_i |v_i|$
and the max norm $\|v\|_\infty = \max_i |v_i|$. This entry
focuses on the Euclidean norm.

Since no efficient algorithm is known to solve 
SVP and CVP exactly in arbitrary high dimen- 
sion, the problems are usually defined in their 
approximation version, where the approximation 
factor $\gamma \geq 1$ can be a function of the dimension
or rank of the lattice. 


Definition 1 (Shortest Vector Problem, $\mathrm{SVP}_\gamma$)
Given a lattice $\mathcal{L}(B)$, find a nonzero lattice
vector $Bx$ (where $x \in \mathbb{Z}^n \setminus \{0\}$) such that
$\|Bx\| \leq \gamma \cdot \|By\|$ for any $y \in \mathbb{Z}^n \setminus \{0\}$.


Definition 2 (Closest Vector Problem, $\mathrm{CVP}_\gamma$)
Given a lattice $\mathcal{L}(B)$ and a target point $t$, find
a lattice vector $Bx$ (where $x \in \mathbb{Z}^n$) such that
$\|Bx - t\| \leq \gamma \cdot \|By - t\|$ for any $y \in \mathbb{Z}^n$.


Lattices have been investigated by mathemati- 
cians for centuries in the equivalent language of 
quadratic forms, and are the main object of study 
in the geometry of numbers, a field initiated by 
Minkowski as a bridge between geometry and 
number theory. For a mathematical introduction 
to lattices see [3]. The reader is referred to [6, 12] 
for an introduction to lattices with an emphasis 
on computational and algorithmic issues. 
Key Results


The problem of finding an efficient (polynomial
time) solution to $\mathrm{SVP}_\gamma$ for lattices in arbitrary
dimension was first solved by the celebrated
lattice reduction algorithm of Lenstra, Lenstra
and Lovász [11], commonly known as the LLL
algorithm.


Theorem 1 There is a polynomial time algorithm
to solve $\mathrm{SVP}_\gamma$ for $\gamma = (2/\sqrt{3})^n$, where $n$
is the rank of the input lattice.


The LLL algorithm achieves more than just find- 
ing a relatively short lattice vector: it finds a so- 
called reduced basis for the input lattice, i.e., 
an entire basis of relatively short lattice vectors. 
Shortly after the discovery of the LLL algorithm, 
Babai [2] showed that reduced bases can be used 
to efficiently solve $\mathrm{CVP}_\gamma$ as well within similar
approximation factors. 


Corollary 1 There is a polynomial time algorithm
to solve $\mathrm{CVP}_\gamma$ for $\gamma = O((2/\sqrt{3})^n)$, where
$n$ is the rank of the input lattice.
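
To give a feel for lattice reduction, the following is a textbook-style sketch of LLL with exact rational arithmetic. It is a simplified illustration, not the optimized procedure of [11]: it stores basis vectors as rows, recomputes the Gram-Schmidt data from scratch after every change, and uses the common parameter choice $\delta = 3/4$, for which the first output vector is within a factor $2^{(n-1)/2}$ of the shortest one. All function names are chosen here for illustration.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(B):
    """Gram-Schmidt vectors b*_i and coefficients mu[i][j] of the rows of B."""
    Bs, mu = [], [[Fraction(0)] * len(B) for _ in B]
    for i, b in enumerate(B):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = dot(b, Bs[j]) / dot(Bs[j], Bs[j])
            v = [x - mu[i][j] * y for x, y in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu


def lll(B, delta=Fraction(3, 4)):
    """Return an LLL-reduced basis of the lattice spanned by the rows of B."""
    B = [list(map(int, b)) for b in B]
    k = 1
    while k < len(B):
        Bs, mu = gram_schmidt(B)
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                B[k] = [a - q * b for a, b in zip(B[k], B[j])]
                Bs, mu = gram_schmidt(B)
        # Lovasz condition between b*_{k-1} and b*_k.
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]
            k = max(k - 1, 1)
    return B


def is_lll_reduced(B, delta=Fraction(3, 4)):
    """Check size reduction and the Lovasz condition for the rows of B."""
    Bs, mu = gram_schmidt(B)
    size_ok = all(abs(mu[i][j]) <= Fraction(1, 2)
                  for i in range(len(B)) for j in range(i))
    lovasz_ok = all(
        dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1])
        for k in range(1, len(B)))
    return size_ok and lovasz_ok


basis = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
reduced = lll(basis)
```

On this small example the reduced basis starts with the vector (0, 1, 0), which is a shortest nonzero vector of the lattice.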


The reader is referred to the original pa- 
pers [2,11] and [12, chap. 2] for details. 
Introductory presentations of the LLL algorithm 
can also be found in many other texts, e.g., [5, 
chap. 16] and [15, chap. 27]. It is interesting to 
note that CVP is at least as hard as SVP (see
[12, chap. 2]) in the sense that any algorithm that
solves $\mathrm{CVP}_\gamma$ can be efficiently adapted to solve
$\mathrm{SVP}_\gamma$ within the same approximation factor.
Both $\mathrm{SVP}_\gamma$ and $\mathrm{CVP}_\gamma$ are known to be NP-hard
in their exact ($\gamma = 1$) or even approximate
versions for small values of $\gamma$, e.g., constant $\gamma$
independent of the dimension. (See [13, chaps. 3
and 4] and [4, 10] for the most recent results.) So,
no efficient algorithm is likely to exist to solve
the problems exactly in arbitrary dimension. For
any fixed dimension $n$, both SVP and CVP can be
solved exactly in polynomial time using an algorithm
of Kannan [9]. However, the dependency
of the running time on the lattice dimension is
$n^{O(n)}$. Using randomization, exact SVP can be
solved probabilistically in $2^{O(n)}$ time and space
using the sieving algorithm of Ajtai, Kumar and
Sivakumar [1].

As for approximate solutions, the LLL lattice 
reduction algorithm has been improved both in 
terms of running time and approximation guaran- 
tee. (See [14] and references therein.) Currently, 
the best (randomized) polynomial time approximation
algorithm achieves approximation factor
$\gamma = 2^{O(n \log\log n/\log n)}$.


Applications 


Despite the large (exponential in $n$) approximation
factor, the LLL algorithm has found numerous
applications and led to the solution of many
algorithmic problems in computer science. The
number and variety of applications is too large 
to give a comprehensive list. Some of the most 
representative applications in different areas of 
computer science are mentioned below. 

The first motivating applications of lattice 
basis reduction were the solution of integer 
programs with a fixed number of variables 
and the factorization of polynomials with
rational coefficients. (See [11, 8], and [15,
chap. 16].) Other classic applications are the
solution of random instances of low-density
subset-sum problems, breaking (truncated)
linear congruential pseudorandom generators,
simultaneous Diophantine approximation, and
the disproof of Mertens' conjecture. (See [8] and
[5, chap. 17].)

More recently, lattice basis reduction has been 
extensively used to solve many problems in crypt- 
analysis and coding theory, including breaking 
several variants of the RSA cryptosystem and the 
DSA digital signature algorithm, finding small 
solutions to modular equations, and list decoding 
of CRT (Chinese Remainder Theorem) codes. The
reader is referred to [7, 13] for a survey of recent 
applications, mostly in the area of cryptanalysis. 

One last class of applications of lattice prob- 
lems is the design of cryptographic functions 
(e.g., collision resistant hash functions, public 
key encryption schemes, etc.) based on the apparent
intractability of solving $\mathrm{SVP}_\gamma$ within small
approximation factors. The reader is referred to 
[12, chap. 8] and [13] for a survey of such appli- 
cations, and further pointers to relevant literature. 
One distinguishing feature of many such lattice 
based cryptographic functions is that they can be 
proved to be hard to break on the average, based 
on a worst-case intractability assumption about 
the underlying lattice problem. 


Open Problems 


The main open problem in the computational
study of lattices is to determine the complexity of
approximate $\mathrm{SVP}_\gamma$ and $\mathrm{CVP}_\gamma$ for approximation
factors $\gamma = n^c$ polynomial in the rank of the
lattice. Specifically,


• Are there polynomial time algorithms that
  solve $\mathrm{SVP}_\gamma$ or $\mathrm{CVP}_\gamma$ for polynomial factors
  $\gamma = n^c$? (Finding such algorithms even for
  a very large exponent $c$ would be a major
  breakthrough in computer science.)

• Is there an $\epsilon > 0$ such that approximating
  $\mathrm{SVP}_\gamma$ or $\mathrm{CVP}_\gamma$ to within $\gamma = n^\epsilon$ is NP-hard?
  (The strongest known inapproximability
  results [4] are for factors of the form
  $n^{O(1/\log\log n)}$, which grow faster than any
  poly-logarithmic function, but slower than
  any polynomial.)


There is theoretical evidence that for large
polynomial factors $\gamma = n^c$, $\mathrm{SVP}_\gamma$ and $\mathrm{CVP}_\gamma$ are
not NP-hard. Specifically, both problems belong
to the complexity class coAM for approximation factor
$\gamma = O(\sqrt{n/\log n})$. (See [12, chap. 9].) So,
the problems cannot be NP-hard within such
factors unless the polynomial hierarchy PH collapses.


URL to Code 


The LLL lattice reduction algorithm is implemented
in most libraries and packages for computational
algebra, e.g.,

• GAP (http://www.gap-system.org)
• LiDIA (http://www.cdc.informatik.tu-darmstadt.de/TI/LiDIA/)
• Magma (http://magma.maths.usyd.edu.au/magma/)
• Maple (http://www.maplesoft.com/)
• Mathematica (http://www.wolfram.com/products/mathematica/index.html)
• NTL (http://shoup.net/ntl/).


NTL also includes an implementation of Block 
Korkine-Zolotarev reduction that has been exten- 
sively used for cryptanalysis applications. 
Similarity Between Compressed Strings, Table 1 Various scoring metrics

Metric                      Match  Mismatch  Indel    Indel of k characters
Longest common subsequence  1      0         0        0
Levenshtein distance        0      1         1        k
Weighted edit distance      0      δ         μ        kμ
Affine gap penalty          1      −δ        −γ − μ   −γ − kμ


Problem Definition


The problem of computing similarity between 
two strings is concerned with comparing two 
strings using some scoring metric. There exist 
various scoring metrics and a popular one is 
the Levenshtein distance (or edit distance) met- 
ric. The standard solution for the Levenshtein 
distance metric was proposed by Wagner and 
Fischer [13], which is based on dynamic pro- 
gramming. Other widely used scoring metrics 
are the longest common subsequence metric, the 
weighted edit distance metric, and the affine gap 
penalty metric. The affine gap penalty metric is 
the most general, and it is a quite complicated 
metric to deal with. Table 1 shows the differences 
between the four metrics. 
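
For orientation, the standard dynamic programming solution for the Levenshtein distance on uncompressed strings can be sketched as follows. This is the classic quadratic-time recurrence in a row-by-row formulation, written here as an illustration; the function name is chosen for the example and the code does not handle compressed input.

```python
def levenshtein(a, b):
    """Levenshtein distance between strings a and b using O(|b|) memory."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1          # match costs 0, mismatch 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


distance = levenshtein("kitten", "sitting")
```

The full table has (n + 1)(m + 1) entries, which is the cost that the algorithms for compressed strings aim to avoid.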

The problem considered in this entry is the 
similarity between two compressed strings. This 
problem is concerned with efficiently comput- 
ing similarity without decompressing two strings. 
The compressions used for this problem in the 
literature are run-length encoding and Lempel- 
Ziv (LZ) compression [14]. 


Run-Length Encoding 

A string S is run-length encoded if it is described
as an ordered sequence of pairs (σ, i), often denoted
"σ^i", each consisting of an alphabet symbol,
σ, and an integer, i. Each pair corresponds to
a run in S, consisting of i consecutive occurrences
of σ. For example, the string aaabbbbaccccbb
can be encoded a^3 b^4 a^1 c^4 b^2 or, equivalently,
(a, 3)(b, 4)(a, 1)(c, 4)(b, 2). Let A and B be two
strings with lengths n and m, respectively. Let A’ 
and B’ be the run-length encoded strings of A and 
B, and n’ and m’ be the lengths of A’ and B’, 
respectively. 
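
The run-length encoding itself is easy to sketch. The helper names below are illustrative; the example string is the one used in the text.

```python
from itertools import groupby


def rle_encode(s):
    """Encode a string as a list of (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


def rle_decode(pairs):
    """Expand a list of (symbol, run length) pairs back into a string."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("aaabbbbaccccbb")
```

In the notation above, the string has length n = 14 and its encoding has length n' = 5.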


Problem 1

INPUT: Two run-length encoded strings A' and B',
a scoring metric d.

OUTPUT: The similarity between A’ and B’ 
using d. 


LZ Compression 

Let X and Y be two strings with length O(n). 
Let X’ and Y’ be the LZ compressed strings of X 
and Y, respectively. Then the lengths of X’ and Y’ 
are O(hn/log n), where h ≤ 1 is the entropy of
strings X and Y. 


Problem 2 

INPUT: Two LZ compressed strings X’ and Y’, 
a scoring metric d. 

OUTPUT: The similarity between X’ and Y’ 
using d. 


Block Computation 

To compute similarity between compressed
strings efficiently, one can use a block
computation method. Dynamic programming
tables are divided into submatrices, which are
called "blocks". For run-length encoded strings,
a block is a submatrix made up of two runs, one
of A and one of B. For LZ compressed strings,
a block is a submatrix made up of two phrases,
one phrase from each string. See [5] for more
details. Then, blocks are computed from left to
right and from top to bottom. For each block,
only the bottom row and the rightmost column
are computed. Figure 1 shows an example of
block computation.


Key Results 


Similarity Between Compressed Strings, Fig. 1 (figure not reproduced) Dynamic programming table for two run-length encoded strings with three runs each, divided into 9 blocks. For one of the blocks, e.g., B, only the bottom row C and the rightmost column D are computed from E and F


The problem of computing similarity of two
run-length encoded strings, A' and B', has been
studied for various scoring metrics. Bunke and
Csirik [4] presented the first solution to Problem 
1 using the longest common subsequence metric. 
The algorithm is based on block computation of 
the dynamic programming table. 


Theorem 1 (Bunke and Csirik [4]) A longest 
common subsequence of run-length encoded 
strings A' and B' can be computed in 
O(nm' + n'm) time. 


For the Levenshtein distance metric, Arbell, 
Landau, and Mitchell [2] and Mäkinen, Navarro, 
and Ukkonen [10] presented O(nm' + n'm) time 
algorithms, independently. These algorithms are 
extensions of the algorithm of Bunke and Csirik. 


Theorem 2 (Arbell, Landau, and Mitchell 
[2], Mäkinen, Navarro, and Ukkonen [10]) 
The Levenshtein distance between run-length 
encoded strings A' and B' can be computed in 
O(nm' + n'm) time. 


For the weighted edit distance metric, Crochemore, 
Landau, and Ziv-Ukelson [6] and Mäkinen, 
Navarro, and Ukkonen [11] gave O(nm' + n'm) 
time algorithms using techniques completely 
different from each other. The algorithm of 
Crochemore, Landau, and Ziv-Ukelson [6] is 
based on the technique which is used in the 
LZ compressed pattern matching algorithm [6], 
and the algorithm of Mäkinen, Navarro, and 
Ukkonen [11] is an extension of the algorithm for 
the Levenshtein distance metric. 


Theorem 3 (Crochemore, Landau, and 
Ziv-Ukelson [6], Mäkinen, Navarro, and 
Ukkonen [11]) The weighted edit distance between 
run-length encoded strings A' and B' can be 
computed in O(nm' + n'm) time. 


For the affine gap penalty metric, Kim, Amir, 
Landau, and Park [8] gave an O(nm' + n'm) 
time algorithm. To compute similarity in this 
metric efficiently, the problem is converted into 
a path problem on a directed acyclic graph and 
some properties of maximum paths in this graph 
are used. It is not necessary to build the graph 
explicitly, since the authors derive new recurrences 
from the properties of the graph. 


Theorem 4 (Kim, Amir, Landau, and Park 
[8]) The similarity between run-length encoded 
strings A' and B' in the affine gap penalty 
metric can be computed in O(nm' + n'm) 
time. 


The above results show that comparison of run- 
length encoded strings using the longest common 
subsequence metric is successfully extended to 
more general scoring metrics. 

For the longest common subsequence 
metric, there exist improved algorithms. 
Apostolico, Landau, and Skiena [1] gave an 
O(n'm' log(n'm')) time algorithm. This algorithm 
is based on tracing specific optimal paths. 


Theorem 5 (Apostolico, Landau, and Skiena 
[1]) A longest common subsequence of 
run-length encoded strings A' and B' can be 
computed in O(n'm' log(n' + m')) time. 


Mitchell [12] obtained an O((d + n' + m') 
log(d + n' + m')) time algorithm, where d is 
the number of matches of compressed characters. 
This algorithm is based on computing geometric 
shortest paths using special convex distance 
functions. 


Theorem 6 (Mitchell [12]) A longest common 
subsequence of run-length encoded strings A' 
and B' can be computed in O((d + n' + m') 
log(d + n' + m')) time, where d is the number 
of matches of compressed characters. 


Mäkinen, Navarro, and Ukkonen [11] conjectured 
an O(n'm') time algorithm on average 
under the assumption that the lengths of 
the runs are equally distributed in both 
strings. 


Conjecture 1 (Mäkinen, Navarro, and 
Ukkonen [11]) A longest common subsequence of 
run-length encoded strings A' and B' can be 
computed in O(n'm') time on average. 


For Problem 2, Crochemore, Landau, and 
Ziv-Ukelson [6] presented a solution using the 
additive gap penalty metric. The additive gap 
penalty metric consists of 1 for match, −δ for 
mismatch, and −μ for indel, which is almost the 
same as the weighted edit distance metric. 


Theorem 7 (Crochemore, Landau, and 
Ziv-Ukelson [6]) The similarity between LZ 
compressed strings X' and Y' in the additive gap 
penalty metric can be computed in O(hn^2 / log n) 
time, where h ≤ 1 is the entropy of strings X 
and Y. 


Applications 


Run-length encoding serves as a popular image 
compression technique, since many classes 
of images (e.g., binary images in facsimile 
transmission or for use in optical character 
recognition) typically contain large patches 
of identically-valued pixels. Approximate 
matching on images can be a useful tool to 
handle distortions. Even a one-dimensional 
compressed approximate matching algorithm 
would be useful to speed up two-dimensional 
approximate matching allowing mismatches and 
even rotations [3, 7, 9]. 




Open Problems 


The worst-case complexity of the problem is not 
fully understood. For the longest common 
subsequence metric, there exist some results whose 
time complexities are better than O(nm' + n'm) 
to compute the similarity of two run-length 
encoded strings [1, 11, 12]. It remains open to 
extend these results to the Levenshtein distance 
metric, the weighted edit distance metric, and the 
affine gap penalty metric. 

In addition, for the longest common 
subsequence metric, it is an open problem to prove 
Conjecture 1. 


Problem Definition 


A spanner is a sparse subgraph of a given 
undirected graph that preserves approximate 
distance between each pair of vertices. More 
precisely, a t-spanner of a graph G = (V, E) is a 
subgraph (V, E_S), E_S ⊆ E, such that, for any pair 
of vertices, their distance in the subgraph is at 
most t times their distance in the original graph, 
where t is called the stretch factor. Spanners 
were defined formally by Peleg and Schäffer [15], 
though the associated notion was used implicitly 
by Awerbuch [3] in the context of network 
synchronizers. 

Computing a t-spanner of smallest size for a 
given graph is a well-motivated combinatorial 
problem with many applications. However, 
computing a t-spanner of smallest size for a graph 
is NP-hard. In fact, for t > 2, it is NP-hard 
[11] even to approximate the smallest size of a 
t-spanner of a graph with ratio O(2^((1−μ) ln n)) 
for any μ > 0. Having realized this fact, researchers 
have pursued another direction which is quite 
interesting and useful. Let S_G^t be the size of the 
sparsest t-spanner of a graph G, and let S_n^t be the 
maximum value of S_G^t over all possible graphs on 
n vertices. Does there exist a polynomial time 
algorithm which computes, for any weighted graph 
and parameter t, its t-spanner of size O(S_n^t)? 
Such an algorithm would be the best one can hope 
for given the hardness of the original t-spanner 
problem. Naturally, the question arises as to how 
large S_n^t can be. A 43-year-old girth lower 
bound conjecture by Erdős [13] implies that there 
are graphs on n vertices whose 2k-spanner as well 
as (2k − 1)-spanner will require Ω(n^(1+1/k)) edges. 
This conjecture has been proved for k = 1, 2, 3, 
and 5. Note that a (2k − 1)-spanner is also a 
2k-spanner, and the lower bound on the size is the 
same for both 2k-spanner and (2k − 1)-spanner. 
So the objective is to design an algorithm that, 
for any weighted graph on n vertices, computes a 
(2k − 1)-spanner of O(n^(1+1/k)) size. Needless to 
say, one would like to design the fastest algorithm 
for this problem, and the most ambitious aim 
would be to achieve linear time complexity. 


Key Results 
The key results of this entry are two very simple 
algorithms which compute a (2k − 1)-spanner of 
a given weighted graph G = (V, E). Let n and m 
denote, respectively, the number of vertices and 
edges of G. The first algorithm, due to Althöfer 
et al. [2], is based on a greedy strategy and runs 
in O(mn^(1+1/k)) time. The second algorithm [6] 
is based on a very local approach and runs in 
an expected O(m) time. To start with, consider 
the following simple observation. Suppose there 
is a subset E_S ⊆ E that ensures the following 
proposition for every edge (x, y) ∈ E \ E_S. 


P_t(x, y): the vertices x and y are connected 
in the subgraph (V, E_S) by a path consisting 
of at most t edges, and the weight of each 
edge on this path is not more than that of the 
edge (x, y). 


It follows easily that the subgraph (V, E_S) will 
be a t-spanner of G. The two algorithms for 
computing a (2k − 1)-spanner eventually compute 
such a set E_S based on two completely different 
approaches. 


Algorithm I 

This algorithm selects edges for the spanner in a 
greedy fashion and is similar to Kruskal's 
algorithm for computing a minimum spanning tree. 
The edges of the graph are processed in the 
increasing order of their weights. To begin with, 
the spanner E_S = ∅, and the algorithm adds 
edges to it gradually. The decision as to whether 
an edge, say (u, v), has to be added (or not) to E_S 
is made as follows: 


If the distance between u and v in the subgraph 
induced by the current spanner edges E_S is more 
than t · weight(u, v), then add the edge (u, v) to 
E_S; otherwise, discard the edge. 


It follows that P_t(x, y) would hold for each 
edge of E missing in E_S, and so at the end, the 
subgraph (V, E_S) will be a t-spanner. A 
well-known result in elementary graph theory states 
that a graph with more than n^(1+1/k) edges must 
have a cycle of length at most 2k. It follows 
from the above algorithm that the length of any 
cycle in the subgraph (V, E_S) has to be at least 
t + 2: the heaviest edge of a cycle is added last, 
and it is added only if the rest of the cycle has 
more than t edges. Hence, for t = 2k − 1, the 
number of edges in the subgraph (V, E_S) will be 
at most n^(1+1/k). Thus, algorithm I described above 
computes a (2k − 1)-spanner of size O(n^(1+1/k)), 
which is indeed optimal based on the lower bound 
mentioned earlier. 
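
The greedy rule can be sketched directly in code. The following is a minimal illustration, not the authors' implementation: the names greedy_spanner and bounded_distance are hypothetical, the graph is assumed to be given as a list of (u, v, weight) triples with positive weights, and each distance query is answered by a Dijkstra search that stops exploring beyond the threshold t · weight(u, v).

```python
import heapq


def bounded_distance(adj, source, target, limit):
    """Dijkstra from source; returns dist(source, target) if <= limit, else inf."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            # Paths longer than the limit can never certify the stretch bound.
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_spanner(edges, t):
    """Process edges by increasing weight; keep (u, v) only if the current
    spanner distance between u and v exceeds t * weight(u, v)."""
    adj = {}
    spanner = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if bounded_distance(adj, u, v, t * w) > t * w:
            spanner.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return spanner


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
    print(greedy_spanner(triangle, 3))  # the third edge closes a short cycle
```

One distance query per edge is what makes this simple version far from linear time.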

A simple O(mn^(1+1/k)) implementation of 
algorithm I follows based on Dijkstra's algorithm. 
Cohen [10] and later Thorup and Zwick [19] 
designed algorithms for (2k − 1)-spanner with an 
improved running time of O(kmn^(1/k)). These 
algorithms relied on several calls to Dijkstra's 
single-source shortest path algorithm for 
distance computation and therefore were far from 
achieving linear time. On the other hand, since 
a spanner must approximate all-pairs distances in 
a graph, it appears difficult to compute a spanner 
by avoiding explicit distance information. 
Somewhat surprisingly, algorithm II, described in 
the following section, avoids any sort of distance 
computation and achieves expected linear time. 


Algorithm II 

This algorithm employs a novel clustering based 
on a very local approach and establishes the 
following result for the spanner problem: 


Given a weighted graph G = (V, E) and an 
integer k > 1, a spanner of (2k − 1) stretch and 
O(kn^(1+1/k)) size can be computed in expected 
O(km) time. 


The algorithm executes in O(k) rounds, and 
in each round it essentially explores adjacency 
list of each vertex to prune dispensable edges. As 
a testimony of its simplicity, we will present the 
entire algorithm for 3-spanner and its analysis in 
the following section. The algorithm can be easily 
adapted in other computational models (parallel, 
external memory, distributed) with nearly optimal 
performance (see [6] for more details). 


Computing a 3-Spanner in Linear Time 

To meet the size constraint of a 3-spanner, a 
vertex, on average, contributes √n edges to 
the spanner. So the vertices with degree O(√n) 
are easy to handle since all their edges can be 
selected in the spanner. For vertices with higher 
degree, a clustering (grouping) scheme is 
employed to tackle this problem, which has its 
basis in dominating sets. 

To begin with, there is a set of edges E' 
initialized to E and an empty spanner E_S. The 
algorithm processes the edges E', moves some of 
them to the spanner E_S, and discards the remaining 
ones. It does so in the following two phases: 


1. Forming the clusters 

   A sample R ⊆ V is chosen by picking each 
   vertex independently with probability 1/√n. 
   The clusters will be formed around these 
   sampled vertices. Initially, the clusters are 
   {{u} | u ∈ R}. Each u ∈ R is called the 
   center of its cluster. Each unsampled vertex 
   v ∈ V \ R is processed as follows: 

   (a) If v is not adjacent to any sampled vertex, 
       then every edge incident on v is moved to 
       E_S. 

   (b) If v is adjacent to one or more sampled 
       vertices, let N(v, R) be the sampled 
       neighbor that is nearest to v. (Ties can 
       be broken arbitrarily. However, it helps 
       conceptually to assume that all weights 
       are distinct.) The edge (v, N(v, R)), 
       along with every edge that is incident on 
       v with weight less than this edge, is moved 
       to E_S. The vertex v is added to the cluster 
       centered at N(v, R). 

   As a last step of the first phase, all those edges 
   (u, v) from E' where u and v are not sampled 
   and belong to the same cluster are discarded. 

   Let V' be the set of vertices corresponding 
   to the endpoints of the edges E' left after the 
   first phase. It follows that each vertex from V' 
   is either a sampled vertex or adjacent to some 
   sampled vertex, and step 1(b) has partitioned 
   V' into disjoint clusters each centered around 
   some sampled vertex. Also note that, as a 
   consequence of the last step, each edge of 
   the set E' is an intercluster edge. The graph 
   (V', E'), and the corresponding clustering of 
   V', is passed on to the following (second) 
   phase. 

2. Joining vertices with their neighboring clusters 

   Each vertex v of graph (V', E') is processed 
   as follows. Let E'(v, c) be the edges from the 
   set E' incident on v from a cluster c. For each 
   cluster c neighboring v, the least-weight 
   edge from E'(v, c) is moved to E_S, and the 
   remaining edges are discarded. 
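
The two phases can be sketched as follows. This is an illustrative sketch only, not the linear-time implementation of [6]: it sorts adjacency lists instead of using bucket sorting, it assumes vertices are numbered 0..n−1 and edge weights are distinct and positive, and the names three_spanner, spanner_distances, and has_stretch are hypothetical. In phase 2 an edge is kept if it is the lightest edge between either endpoint and the other endpoint's cluster, which can only keep more edges than the per-vertex discarding described above.

```python
import heapq
import math
import random
from collections import defaultdict


def three_spanner(n, edges, seed=None):
    """Return a set of (u, v) pairs with u < v forming a 3-spanner of the input."""
    rng = random.Random(seed)
    sampled = {v for v in range(n) if rng.random() < 1 / math.sqrt(n)}
    weight = {(min(u, v), max(u, v)): w for u, v, w in edges}
    adj = defaultdict(list)
    for (u, v), w in weight.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    remaining, spanner = set(weight), set()   # E' and E_S
    center = {x: x for x in sampled}          # cluster center of each clustered vertex

    # Phase 1: form clusters around sampled vertices.
    for v in range(n):
        if v in sampled:
            continue
        nbrs = sorted(adj[v])
        nearest = next(((w, y) for w, y in nbrs if y in sampled), None)
        cut = math.inf if nearest is None else nearest[0]
        if nearest is not None:
            center[v] = nearest[1]
        for w, y in nbrs:
            if w <= cut:                      # case (a): all edges; case (b): up to N(v, R)
                e = (min(v, y), max(v, y))
                spanner.add(e)
                remaining.discard(e)
    # Discard intracluster edges.
    remaining = {e for e in remaining if center[e[0]] != center[e[1]]}

    # Phase 2: keep the lightest edge between a vertex and each neighboring cluster.
    best = {}
    for e in remaining:
        for v, other in (e, e[::-1]):
            key = (v, center[other])
            if key not in best or weight[e] < weight[best[key]]:
                best[key] = e
    spanner.update(best.values())
    return spanner


def spanner_distances(n, edges, spanner, source):
    """Single-source shortest-path distances inside the spanner (Dijkstra)."""
    weight = {(min(u, v), max(u, v)): w for u, v, w in edges}
    adj = defaultdict(list)
    for u, v in spanner:
        adj[u].append((v, weight[(u, v)]))
        adj[v].append((u, weight[(u, v)]))
    dist = [math.inf] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def has_stretch(n, edges, spanner, t):
    """Check that every input edge (u, v) has spanner distance <= t * weight."""
    dist = [spanner_distances(n, edges, spanner, s) for s in range(n)]
    return all(dist[u][v] <= t * w for u, v, w in edges)
```

The stretch check is only a verification aid; the construction itself never computes distances, which is the point of the algorithm.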


The number of edges added to the spanner 
E_S during the algorithm described above can be 
bounded as follows. Note that the sample set R 
is formed by picking each vertex randomly and 
independently with probability 1/√n. It thus follows 
from elementary probability that for each vertex 
v ∈ V, the expected number of incident edges 
with weight less than that of (v, N(v, R)) is at 
most √n. Thus, the expected number of edges 
contributed to the spanner by each vertex in the 
first phase of the algorithm is at most √n. The 
number of edges added to the spanner in the 
second phase is O(n|R|). Since the expected size 
of the sample R is √n, the expected 
number of edges added to the spanner in the 
second phase is at most n^(3/2). Hence, the expected 
size of the spanner E_S at the end of the algorithm 
described above is at most 2n^(3/2). The algorithm 
is repeated if the size of the spanner exceeds 
3n^(3/2). It follows using Markov's inequality that 
the expected number of such repetitions will be 
O(1). 
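
The Markov step can be spelled out as follows. This is a short worked derivation under the bounds stated above, assuming independent repetitions; the constant 3 in the last line is a consequence of those bounds and is not a figure quoted in the entry.

```latex
\Pr\left[\, |E_S| > 3n^{3/2} \,\right]
  \le \frac{\mathbb{E}\left[|E_S|\right]}{3n^{3/2}}
  \le \frac{2n^{3/2}}{3n^{3/2}}
  = \frac{2}{3},
\qquad
\mathbb{E}\left[\text{number of runs}\right]
  \le \frac{1}{1 - 2/3} = 3 .
```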

We now establish that E_S is a 3-spanner. Note 
that for every edge (u, v) ∉ E_S, the vertices u, v 
belong to some cluster in the first phase. There 
are two cases now. 


Case 1: (u and v belong to the same cluster) 


Let u and v belong to the cluster centered at 
x ∈ R. It follows from the first phase of the 
algorithm that there is a 2-edge path u − x − v 
in the spanner with each edge not heavier than 
the edge (u, v). (This provides a justification for 
discarding all intracluster edges at the end of the 
first phase.) 


Case 2: (u and v belong to different clusters) 


Clearly, the edge (u, v) was removed from E' 
during phase 2, and suppose it was removed 
while processing the vertex u. Let v belong to the 
cluster centered at x ∈ R. 

In the beginning of the second phase, 
let (u, v') ∈ E' be the least-weight edge 
among all the edges incident on u from the 
vertices of the cluster centered at x. So it 
must be that weight(u, v') ≤ weight(u, v). 
The processing of vertex u during the second 
phase of our algorithm ensures that the edge 
(u, v') gets added to E_S. Hence, there is a 
path Π_uv = u − v' − x − v between u and 
v in the spanner E_S, and its weight can be 
bounded as weight(Π_uv) = weight(u, v') + 
weight(v', x) + weight(x, v). Since (v', x) 
and (v, x) were chosen in the first phase, it 
follows that weight(v', x) ≤ weight(u, v') 
and weight(x, v) ≤ weight(u, v). Combining 
these with weight(u, v') ≤ weight(u, v) gives 
weight(Π_uv) ≤ 3 · weight(u, v). It follows 
that the spanner (V, E_S) has stretch 3. Moreover, 
both phases of the algorithm can be executed in 
O(m) time using elementary data structures and 
bucket sorting. 

The algorithm for computing a (2k — 1)- 
spanner executes k iterations where each iteration 
is similar to the first phase of the 3-spanner algo- 
rithm. For details and formal proofs, the reader 
may refer to [6]. 


Other Related Works 
The notion of a spanner has been generalized in 
the past by many researchers. 


Additive Spanners 

A t-spanner as defined above approximates 
pairwise distances with multiplicative error and can 
be called a multiplicative spanner. In an 
analogous manner, one can define spanners that 
approximate pairwise distances with additive error. 
Such a spanner is called an additive spanner, and 
the corresponding error is called surplus. 
Aingworth et al. [1] presented the first additive 
spanner of size O(n^(3/2) log n) with surplus 2. 
Baswana et al. [7] presented a construction of an 
O(n^(4/3))-size additive spanner with surplus 6. 
Recently, Chechik [9] presented a construction of 
an O(n^(7/5))-size additive spanner with surplus 4. 
It is a major open problem if there exists any 
sparser additive spanner. 


(α, β)-Spanner 

Elkin and Peleg [12] introduced the notion of 
(α, β)-spanner for unweighted graphs, which can 
be viewed as a hybrid of multiplicative and 
additive spanners. An (α, β)-spanner is a subgraph 
such that the distance between any pair of 
vertices u, v ∈ V in this subgraph is bounded 
by α·δ(u, v) + β, where δ(u, v) is the distance 
between u and v in the original graph. Elkin 
and Peleg showed that a (1 + ε, β)-spanner of 
size O(βn^(1+δ)), for arbitrarily small ε, δ > 0, 
can be computed at the expense of sufficiently 
large surplus β. Recently, Thorup and Zwick [20] 
introduced a spanner where the additive error is 
sublinear in terms of the distance being 
approximated. 

Other interesting variants of spanner include 
distance preserver proposed by Bollobás et al. [8] 
and lightweight spanner proposed by Awerbuch 
et al. [4]. A subgraph is said to be a d-preserver 
if it preserves exact distances for each pair of 
vertices which are separated by distance at least 
d. A lightweight spanner tries to minimize the 
number of edges as well as the total edge weight. 
A lightness parameter is defined for a subgraph 
as the ratio of total weight of all its edges and 
the weight of the minimum spanning tree of the 
graph. Awerbuch et al. [4] showed that for any 
weighted graph and integer k > 1, there exists 
a polynomially constructible O(k)-spanner with 
O(kρn^(1+1/k)) edges and O(kρn^(1/k)) lightness, 
where ρ = log(diameter). 

In addition to the above work on the 
generalization of spanners, a lot of work has also 
been done on computing spanners for special classes 
of graphs, e.g., chordal graphs, unweighted graphs, 
and Euclidean graphs. For chordal graphs, Peleg 
and Schäffer [15] designed an algorithm that 
computes a 2-spanner of size O(n^(3/2)) and a 
3-spanner of size O(n log n). For unweighted 
graphs, Halperin and Zwick [14] gave an O(m) 
time algorithm for this problem. Salowe [18] 
presented an algorithm for computing a (1 + ε)- 
spanner of a d-dimensional complete Euclidean 
graph in O(n log n + n/ε^d) time. However, none of 
the algorithms for these special classes of graphs 
seem to extend to general weighted undirected 
graphs. 


Applications 


Spanners are quite useful in various 
applications in the area of distributed systems and 
communication networks. In these applications, 
spanners appear as the underlying graph 
structure. In order to build compact routing 
tables [17], many existing routing schemes 
use the edges of a sparse spanner for routing 
messages. In distributed systems, spanners play 
an important role in designing synchronizers. 
Awerbuch [3] and Peleg and Ullman [16] showed 
that the quality of a spanner (in terms of stretch 
factor and the number of spanner edges) is very 
closely related to the time and communication 
complexity of any synchronizer for the network. 
Spanners have also been used implicitly in 
a number of algorithms for computing all-pairs 
approximate shortest paths [5, 10, 19, 21]. For a 
number of other applications, please refer to the 
papers [2, 3, 15, 17]. 


Open Problems 


The running time as well as the size of the 
(2k − 1)-spanner computed by the algorithm described 
above are away from their respective worst-case 
lower bounds by a factor of k. For any constant 
value of k, both these parameters are optimal. 
However, for the extreme value of k, that is, 
for k = log n, there is deviation by a factor of 
log n. Is it possible to get rid of this multiplicative 
factor of k from the running time of the algorithm 
and/or the size of the (2k − 1)-spanner computed? 
It seems that a more careful analysis coupled with 
advanced probabilistic tools might be useful in 
this direction. 


Introduction 


We have a two-sided market: one side is a set U 
of men, the other side is a set V of women. The 
first part of the input also contains the set E of 
mutually acceptable man-woman pairs. This makes up 
a bipartite graph G(U ∪ V, E). The second part 
of the input contains the preference list of each 
person, that is, a weak order (which may contain 
ties) on his/her acceptable pairs. 

A matching is a set of mutually disjoint 
acceptable man-woman pairs. Given a matching M, 
a man m and a woman w form a blocking pair if 
they are an acceptable pair but are not partners 
in M, and each of them either prefers the other to 
his/her partner or has no partner in M. That is, either 
w is unmatched in M or w prefers m to her 
M-partner, and either m is unmatched in M or m 
prefers w to his M-partner. A matching M is 
stable if there are no blocking pairs. 
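
The definition of a blocking pair translates into a direct check. The following is a minimal sketch, not taken from the entry: the name blocking_pairs and the dictionary-based representation are assumptions made for illustration. Each preference list maps an acceptable partner to an integer priority, a larger priority means more preferred, and equal priorities model ties.

```python
def blocking_pairs(men_prefs, women_prefs, matching):
    """List all blocking pairs (m, w) of a matching given as a dict man -> woman."""
    wife = dict(matching)
    husband = {w: m for m, w in matching.items()}
    pairs = []
    for m, prefs in men_prefs.items():
        for w, priority in prefs.items():
            if wife.get(m) == w:
                continue  # partners in M cannot block
            if m not in women_prefs.get(w, {}):
                continue  # not a mutually acceptable pair
            # Strict preference is required on both sides; ties never block.
            m_wants = m not in wife or priority > prefs[wife[m]]
            w_wants = (w not in husband
                       or women_prefs[w][m] > women_prefs[w][husband[w]])
            if m_wants and w_wants:
                pairs.append((m, w))
    return pairs


if __name__ == "__main__":
    men = {"m1": {"w1": 1, "w2": 1}, "m2": {"w1": 1}}
    women = {"w1": {"m1": 1, "m2": 1}, "w2": {"m1": 1}}
    print(blocking_pairs(men, women, {"m1": "w1"}))              # stable, size 1
    print(blocking_pairs(men, women, {"m1": "w2", "m2": "w1"}))  # stable, size 2
```

The small instance in the usage example has stable matchings of two different sizes, which is why maximizing the size is a meaningful problem once ties are allowed.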

We consider a two-sided market under 
incomplete preference lists with ties (SMTI), where 
the goal is to find a maximum size stable matching 
(MAX-SMTI). 


Problem Definition 


Problem 1 (MAX-SMTI) 


INPUT: Set U of men, set V of women, and 
each person's preference list. 

OUTPUT: A stable matching of maximum 
size. 


Input format A list of an agent a consists of 
pairs (a_1, p_1), (a_2, p_2), ..., (a_d, p_d), where 
the a_i are the acceptable persons from the other 
gender and 1 ≤ p_i ≤ max(|U|, |V|) are integers with 
ordering p_1 ≥ p_2 ≥ ... ≥ p_d. Agent a strictly 
prefers a_i to a_j if p_i > p_j and is indifferent 
between a_i and a_j if p_i = p_j. Moreover, each 
woman needs a black-box procedure which, on input 
a_i, outputs p_i in constant time (we assume that this 
procedure is also a part of the input). The size of 
the input is the number of agents plus the total 
length of the lists. 


Definition of approximation ratios A goodness 
measure of an approximation algorithm A for a 
maximization problem is defined as follows: the 
approximation ratio of A is max{opt(I)/A(I)} 
over all instances I, where opt(I) and A(I) 
are the size of the optimal and the algorithm's 
solution on instance I, respectively. 


Short history It was shown in [4] that finding 
the optimal solution is NP-hard; moreover, it 
is APX-hard [3]. The original Deferred 
Acceptance Algorithm of Gale and Shapley gives a 
2-approximation; the first approximation algorithm 
with a strictly better ratio was presented in [5], 
where the approximation ratio was 15/8. This was 
improved in [6] to a 5/3-approximation and later 
in [9] to a 3/2-approximation; this latter algorithm 
had nonlinear running time. Recently in [10] and 
in [7], linear time 3/2-approximation algorithms 
were given. 


Key Results 


A simple variation of the famous Deferred 
Acceptance Algorithm of Gale and Shapley 
is presented, which also runs in linear time 
and gives a 3/2-approximation for the problem 
MAX-SMTI. This algorithm is local; no central 
agent or knowledge about the global input is 
needed. 


Algorithm 


Preliminary Definitions and Concepts for the 
Algorithm 

During the algorithm, the agents may have 
different statuses, some Boolean properties 
described below, and also varying actual 
preferences. 

The status of a man can be either a lad, a 
bachelor, or an old bachelor. A man can be 
active or inactive. A man is active if he is not 
an old bachelor and he is not engaged (i.e., 
he currently has no partner). A man can also be 
uncertain, as described later. Initially every man 
is an active lad. 

The status of a woman can be either maiden 
or engaged. An engaged woman is flighty if 
her fiancé is uncertain. Initially every woman is 
maiden. 

The actual preferences of a man m are described 
as follows. If women w_1 and w_2 are tied 
on m's list, and w_1 is maiden but w_2 is engaged, 
then m prefers maiden w_1 to engaged w_2. An 
engaged lad is uncertain if his list contains 
a woman he prefers to his actual fiancée (this 
can happen if there were two maidens with the 
same highest priority on m's list, and m became 
engaged to one of them). 

The actual preferences of a woman w are described 
as follows. If there are two men, m_1 and m_2, 
with the same priority in w's list, and m_1 is 
a lad but m_2 is a bachelor, then w prefers 
bachelor m_2 to lad m_1. If w is flighty, then 
she prefers a man who is not uncertain to a 
man who is uncertain (regardless of her original 
preferences). 


The Algorithm 


While there exists an active man m, he 
proposes to his favorite woman w. If w accepts 
his proposal, they become engaged. If w 
rejects him, m deletes w from his list and 
remains active. 

When a woman w gets a new proposal from 
man m, she accepts this proposal if she 
(actually) prefers m to her current fiancé. 
Otherwise she rejects m. 

If w accepted m, then she rejects her 
previous fiancé, if there was one (breaks off her 
engagement), and becomes engaged to m. 

If m was engaged to a woman w and later 
w rejects him, then m becomes active again 
and deletes w from his list, except if m is 
uncertain; in this case m keeps w on the list. 
If the list of m becomes empty for the first 
time, he turns into a bachelor, his original list 
is recovered, and he reactivates himself. If 
the list of m becomes empty for the second 
time, he will turn into an old bachelor and 
will remain inactive forever. 


After the algorithm finishes, the engaged pairs get 
married and form matching M. 
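
The rules above can be simulated directly. The following is an illustrative sketch, not the linear-time implementation of [7]: the name kiraly_matching, the dictionary representation (partner -> integer priority, larger means more preferred), the last-in-first-out order in which active men are served, and the alphabetical tie-breaking among equally attractive women are all assumptions made here. It recomputes the uncertain property on demand, so it does not achieve the linear running time.

```python
def kiraly_matching(men_prefs, women_prefs):
    """Simulate the proposal rules described above; returns a dict man -> woman."""
    fiancee = {m: None for m in men_prefs}
    fiance = {w: None for w in women_prefs}
    status = {m: "lad" for m in men_prefs}
    lists = {m: set(prefs) for m, prefs in men_prefs.items()}

    def uncertain(m):
        # An engaged lad whose list holds a woman he actually prefers to his fiancee.
        w = fiancee[m]
        if w is None or status[m] != "lad":
            return False
        pw = men_prefs[m][w]
        return any(men_prefs[m][x] > pw
                   or (men_prefs[m][x] == pw and fiance[x] is None)
                   for x in lists[m] if x != w)

    def delete(m, w):
        # Remove w from m's list; handle the first and second time it empties.
        lists[m].discard(w)
        if not lists[m]:
            if status[m] == "lad":
                status[m] = "bachelor"
                lists[m] = set(men_prefs[m])
            else:
                status[m] = "old bachelor"

    def accepts(w, m):
        cur = fiance[w]
        if cur is None or uncertain(cur):   # maiden, or flighty woman
            return True
        pm, pc = women_prefs[w][m], women_prefs[w][cur]
        return pm > pc or (pm == pc and status[m] == "bachelor"
                           and status[cur] == "lad")

    active = [m for m in men_prefs if lists[m]]
    while active:
        m = active.pop()
        if fiancee[m] is not None or status[m] == "old bachelor":
            continue
        # Favorite woman: highest priority, maidens first among ties.
        w = max(sorted(lists[m]),
                key=lambda x: (men_prefs[m][x], fiance[x] is None))
        if accepts(w, m):
            prev = fiance[w]
            if prev is not None:
                keeps_w = uncertain(prev)   # evaluated before the break-up
                fiancee[prev] = None
                if not keeps_w:
                    delete(prev, w)
                active.append(prev)
            fiance[w], fiancee[m] = m, w
        else:
            delete(m, w)
            active.append(m)
    return {m: w for m, w in fiancee.items() if w is not None}


if __name__ == "__main__":
    men = {"m2": {"w1": 1}, "m1": {"w1": 1, "w2": 1}}
    women = {"w1": {"m1": 1, "m2": 1}, "w2": {"m1": 1}}
    print(kiraly_matching(men, women))
```

In the usage example, m1 first becomes engaged to w1 while w2 is still a maiden, so he is uncertain and w1 is flighty; when m2 proposes, w1 switches to m2 and m1 moves on to w2, giving a stable matching of size 2 rather than 1.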


Theorem 1 ([7]) The algorithm always gives a 
stable matching M and it is 3/2-approximating, 
i.e., the stable matching given has size at least 
2/3 of the maximum size stable matching. 


Running Time, Locality 

This algorithm runs in linear time using the as- 
sumptions on the input format. Though it is clear 
that along every edge at most three proposals 
happen, the technical details must be worked out; 
see [7] for details. 


Local algorithm Each agent (a man or woman) 
always makes a greedy decision based only on 
local information (his/her preference list and 
information provided by communication with his/her 
acceptable partners). A local algorithm is linear if 
every agent communicates with each acceptable 
partner only a constant number of times during the 
algorithm. 

The algorithm presented is a linear time local 
algorithm (using the appropriate data structures); 
see [7] for details. 


Problem Definition 


Buffer management policies are online 
algorithms that control a limited buffer of 
packets with homogeneous or heterogeneous 
characteristics, deciding whether to accept new 
packets when they arrive, which packets to 
process and transmit, and possibly whether 
to push out packets already residing in the 
buffer. Although settings differ, the problem is 
always to achieve the best possible competitive 
ratio, i.e., find a policy with good worst-case 
guarantees in comparison with an optimal offline 
clairvoyant algorithm. The policies themselves 
are often simple, simplicity being an important 
advantage for implementation in switches; the 
hard problem is to find proofs of lower and 
especially upper bounds for their competitive 
ratios. Thus, this problem is more theoretical 
in nature, although the resulting throughput 
guarantees are important tools in the design of 
network elements. Comprehensive surveys of this 
field have been given in the past by Goldwasser 
[9] and Epstein and van Stee [7]. 


General Model Description 

We assume discrete slotted time. A packet is fully 
processed if the processing unit has scheduled 
the packet for processing for at least its required 
number of cycles. Each packet may have the 
following characteristics: (i) required processing, 
i.e., how many processing cycles the packet has 
to go through before it can be transmitted; (ii) 
value, i.e., how much the packet contributes to 
the objective function when it is transmitted; (iii) 
output port, i.e., where the packet is headed (in 
settings with multiple output ports, it is usually 
assumed that processing occurs independently at 
each port, so it becomes advantageous to have 
more busy output ports at a time); and (iv) size, 
i.e., how many slots (bytes) a packet occupies in 
the buffer. The objective of a buffer management 
policy is to maximize the total value of 
transmitted packets. Different settings may assume 
that some characteristics are uniform. 


Competitive Analysis 

Competitive analysis provides a uniform 
throughput guarantee for online algorithms across 
all traffic patterns. An online algorithm ALG is 
said to be α-competitive with respect to some 
objective function f (for some α ≥ 1 which is 
called the competitive ratio) if for any arrival 
sequence σ the objective function value on the 
result of ALG is at least 1/α times the objective 
function value on the solution obtained by an 
offline clairvoyant algorithm, denoted OPT. 


Problem 1 (Competitive Ratio) For a given 
switch architecture, packet characteristics, and 
an online algorithm ALG in a given setting, 
prove lower and upper bounds on its competitive 
ratio with respect to weighted throughput (total 
value of packets transmitted by an algorithm). 


Key Results 


Policies and lower and upper bounds on their 
competitive ratios are outlined according to prob- 
lem settings; the latter differ in which packet 
characteristics they assume to be uniform and 
which are allowed to vary, and additional restric- 
tions may be imposed on admission, processing 
and/or transmission order, and admissible packet 
characteristics. 


Uniform Processing, Uniform Value, Shared Memory Switch 

Since all packets are identical, the problem for a 
single queue with one output port is trivial. We 
consider an M × N shared memory switch that 
can hold B packets, with a separate processor 
on each output port. All packets require a single 
processing cycle and have equal value; the goal is 
to maximize the number of transmitted packets. 
Each packet is labeled with an output port where 
it has to be processed and transmitted. 


Non-Push-Out Policies 

Kesselman and Mansour [14] show an adversarial 
logarithmic lower bound: no non-push-out policy 
can achieve competitive ratio better than d/2 for 
d = log_d N. On the positive side, they present 
the Harmonic policy that allocates approximately 
1/i of the buffer to the ith largest queue and, for 
its variant, the Parametric Harmonic policy, show 
an upper bound of c log_c N + 1. 


Push-Out Policies 

The best known policy is Longest Queue Drop 
(LQD): accept packets greedily if the buffer is not 
full; if it is, accept the new packet and then drop 
a packet from the longest queue (destined to the 
output port with the most packets assigned to it). 
Aiello et al. [1, 10] show that the competitive ratio
of LQD is between $\sqrt{2}$ and 2; they also provide
nonconstant lower bounds for other popular policies
and a general adversarial lower bound of $4/3$
on the competitive ratio of any online algorithm.
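
As an illustration only, the LQD rule described above can be sketched in Python. The function name `simulate_lqd`, the slot-by-slot arrival format, the tie-breaking toward the lowest port index, and the draining of the buffer after the last arrival are assumptions of this sketch; they are not taken from [1, 10].

```python
def simulate_lqd(arrivals, num_ports, buffer_size):
    """Count packets transmitted by Longest Queue Drop (LQD).

    arrivals: list of time slots; each slot is a list of output-port
              indices, one entry per arriving unit-value packet.
    Each slot has an arrival phase followed by a transmission phase in
    which every nonempty output queue sends one packet.
    """
    queues = [0] * num_ports      # packets are identical, so lengths suffice
    transmitted = 0
    for slot in arrivals:
        for port in slot:
            queues[port] += 1     # accept the new packet greedily
            if sum(queues) > buffer_size:
                # Buffer overflowed: drop one packet from the longest queue
                # (ties go to the lowest port index, an assumption here).
                longest = max(range(num_ports), key=lambda p: queues[p])
                queues[longest] -= 1
        for port in range(num_ports):
            if queues[port] > 0:
                queues[port] -= 1
                transmitted += 1
    # Assumption: once arrivals stop, everything still buffered is sent.
    return transmitted + sum(queues)


if __name__ == "__main__":
    # Two ports, shared buffer of 4: the packet for port 1 is kept and
    # one packet of the long queue for port 0 is pushed out instead.
    print(simulate_lqd([[0, 0, 0, 0, 1], [1, 1]], num_ports=2, buffer_size=4))
```

Because packets are identical in this setting, the sketch tracks only queue lengths; a variant with packet values would need to store the packets themselves.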
Uniform Processing, Uniform Value,
Multiple Separated Queues

In an $N \times 1$ switch where each of N input queues
has a separate independent buffer of size B, a
policy must select which input queue to take a
packet from and set admission policies for input
queues. For uniform values, the problem was
closed by Azar and Litichevskey [3] with a
deterministic policy with competitive ratio converging
to $\frac{e}{e-1} \approx 1.582$ for arbitrary B; a matching lower
bound was shown by Azar and Richter [4].


Uniform Processing, Variable Values,
Single Queue

Here, there is only one output port (a single 
queue), and each packet is fully processed in 
one cycle; however, packets have different values, 
making it desirable to drop packets with smaller 
value and process packets of larger value. It 
is easy to show that the Priority Queue (PQ) 
policy that sorts packets with respect to values 
and pushes out smaller values for larger ones is 
optimal. Research has concentrated on models 
with additional constraints: non-push-out policies 
that are not allowed to push admitted packets out 
and the FIFO model where packets have to be 
transmitted in order of arrival. Another important 
special case considers two possible values: 1 and
$V > 1$.
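
The PQ policy mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the name `simulate_pq`, the slot-based arrival format, and the draining of the buffer after the last slot are hypothetical choices of this sketch rather than part of the cited results.

```python
import bisect


def simulate_pq(arrivals, buffer_size):
    """Total value transmitted by a push-out Priority Queue (PQ) buffer.

    arrivals: list of time slots; each slot is a list of packet values.
    In every slot all arrivals are admitted (pushing out the smallest
    value on overflow), then the most valuable packet is transmitted.
    """
    buffer = []                      # kept sorted in ascending order
    total = 0
    for slot in arrivals:
        for value in slot:
            bisect.insort(buffer, value)
            if len(buffer) > buffer_size:
                buffer.pop(0)        # push out the smallest value
        if buffer:
            total += buffer.pop()    # transmit the largest value
    # Assumption: packets left after the last slot are eventually sent.
    return total + sum(buffer)


if __name__ == "__main__":
    # Buffer of size 2: values 1 and 2 are pushed out, 9 and 5 are sent.
    print(simulate_pq([[1, 5, 2, 9]], buffer_size=2))
```

A sorted list keeps the sketch short; a production buffer would more likely use a double-ended priority queue to avoid the linear-time `pop(0)`.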


Non-Push-Out Policies 

Aiello et al. [2] consider five online policies
for the two-valued case, considering the specific
cases of $V = 1$, $V = 2$, and $V = \infty$. Andelman,
Mansour, and Zhu provide a deterministic policy
(Ratio Partition) that achieves optimal
$(2 - 1/V)$-competitiveness [26]. In the case of arbitrary
values between 1 and $V > 1$, they show that the
optimal competitive ratio is $\ln V$ up to lower-order
additive terms, proving nearly matching bounds of
$1 + \ln V$ and $2 + \ln V + O(\ln^2 V / B)$ [2, 26].


Push-Out Policies 

In the FIFO model, there has been a line of ad- 
versarial lower bounds culminating in the lower 
bound of 1.419 shown by Kesselman, Mansour, 
and van Stee [18] that applies to all algorithms, 
with a stronger bound of 1.434 for B = 2 
[2,26]. As for upper bounds, in this simple model
the FIFO greedy push-out policy (accept ev- 
ery packet to end of queue, then push out the 
packet with smallest value if buffer has over- 
flown) has been shown by Kesselman et al. to 
be 2-competitive [17]; in the two-valued case, 
they provide an adversarial lower bound of 1.282, 
and a long line of improvements for the upper 
bound has led to the optimal Account Strategy 
policy of Englert and Westermann [6]. They show
an adversarial lower bound of
$r = \frac{1}{2}(\sqrt{13} - 1) \approx 1.303$ for any $B \ge 2$ and
$r_\infty = \sqrt{2} - \frac{1}{2}(\sqrt{5 + 4\sqrt{2}} - 3) \approx 1.282$ for $B \to \infty$ and
show that Account Strategy achieves competitive
ratio $r$ for arbitrary B and $r_\infty$ for $B \to \infty$.
Thus, in the push-out two-valued case, the gap 
between lower and upper bounds has been closed 
completely. 
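
The FIFO greedy push-out policy described above (append every arrival, then push out a smallest-value packet on overflow, transmit from the head) can be sketched as follows. The name `simulate_fifo_greedy`, the slot-based input format, the choice of the earliest packet among equal minima, and the final draining step are assumptions of this sketch.

```python
from collections import deque


def simulate_fifo_greedy(arrivals, buffer_size):
    """Total value transmitted by the greedy push-out policy on a FIFO queue.

    arrivals: list of time slots; each slot is a list of packet values.
    Transmission always takes the packet at the head of the queue, so the
    surviving packets leave in order of arrival.
    """
    queue = deque()
    total = 0
    for slot in arrivals:
        for value in slot:
            queue.append(value)              # accept to the end of the queue
            if len(queue) > buffer_size:
                # Overflow: push out a packet of smallest value
                # (the earliest such packet, an assumption of this sketch).
                queue.remove(min(queue))
        if queue:
            total += queue.popleft()         # transmit the head of the queue
    # Assumption: packets left after the last slot are eventually sent.
    return total + sum(queue)


if __name__ == "__main__":
    print(simulate_fifo_greedy([[3, 1], [4, 4]], buffer_size=2))
```

Unlike the PQ sketch for the unconstrained model, the head of the queue is transmitted even when a more valuable packet is waiting behind it, which is where the FIFO constraint costs throughput.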


Uniform Processing, Variable Values, 
Multiple Separated Queues 
Kawahara et al. [11] consider an $N \times 1$
switch with N separated queues, each of
which has a distinct buffer of size B and has
a value $\alpha_i$ associated with it,
$1 = \alpha_1 \le \dots \le \alpha_N = \alpha$. A policy selects one of N
queues, maximizing total transmitted value; [11]
provides matching lower and upper bounds
for the PQ policy as
$1 + \frac{\sum_{j=1}^{n'} \alpha_j}{\sum_{j=1}^{n'+1} \alpha_j}$, where
$n' = \arg\max_{n} \frac{\sum_{j=1}^{n} \alpha_j}{\sum_{j=1}^{n+1} \alpha_j}$, and an adversarial
lower bound of
$1 + \frac{\alpha^3 + \alpha^2 + \alpha}{\alpha^4 + 4\alpha^3 + 5\alpha^2 + 4\alpha + 1}$ for any
online algorithm. Azar and Richter [4] show
that any r-competitive policy for a FIFO queue
with variable values yields a 2r-competitive
policy for multiple queues. Kobayashi et al.
[21] show that an r-competitive policy for
unit values and multiple queues yields a
$\min\left\{Vr, \frac{Vr(2-r) + r^2 - 2r + 2}{V(2-r) + r - 1}\right\}$-competitive policy
for the two-valued case.


Uniform Processing, Variable Values,
Shared Memory Switch

Several output queues, each with a processor, 
share a buffer of size B, and each unit- 
sized packet is labeled with an output port 
and an intrinsic value from 1 to V. Eugster,
Kogan, Nikolenko, and Sirotkin [8] show a
$(\sqrt[3]{V} - o(\sqrt[3]{V}))$ lower bound for the LQD
(Longest Queue Drop) policy, a
$\frac{1}{2}(\min\{V, B\} - 1)$ lower bound for the MVD (Minimal Value
Drop) policy, and a $4/3$ lower bound for the MRD
(Maximal Ratio Drop) policy.


Uniform Processing, CIOQ Switches

In CIOQ (Combined Input-Output Queued) 
switches, one maintains at each input a separate 
queue for each output (also called Virtual Output 
Queuing, VOQ). To get delay guarantees of an 
input queuing (IQ) switch closer to those of an 
output queuing switch (OQ), one usually assumes 
increased speedup S: the switching fabric runs S 
times faster than each of the input or the output 
ports. Hence, an OQ switch has a speedup of N 
(where N is the number of input/output ports), 
whereas an IQ switch has a speedup of 1; for 
1 < S < N, packets need to be buffered at 
the inputs before switching as well as at the 
outputs after switching. This architecture is called 
a CIOQ switch. 


Uniform Values 

Consider an N x N CIOQ switch with speedup 
S. Packets of equal size arrive at input ports, 
each labeled with the output port where it has 
to leave the switch. Each packet is placed in 
the input queue corresponding to its output port; 
when it crosses the switch fabric, it is placed 
in the output queue and resides there until it is 
sent on the output link. For unit-valued packets, 
Kesselman and Rosén [15] proposed a non-push- 
out policy which is 3-competitive for any S and 
2-competitive for S = 1. Kesselman, Kogan, 
and Segal [13] show an upper bound of 4 on the 
competitiveness of a simple greedy policy. 


Variable Values 

For up to m packet values in [1, V], Kesselman
and Rosén [15] show two push-out policies to
be $4S$- and $8 \min\{m, 2 \log V\}$-competitive. Azar
and Richter [5] propose a push-out policy $\beta$-PG
with parameter $\beta$; Kesselman et al. [20] show that
the competitive ratio of $\beta$-PG is at most 7.5 for
$\beta = 3$ and at most 7.47 for $\beta = 2.8$. Kesselman
and Rosén [16] consider CIOQ switches with PQ
buffers (transmit the highest value packet) and
show that this policy is 6-competitive for any S.


Uniform Processing, Crossbar Switches 

In the buffered crossbar switch architecture, a 
small buffer is placed on each crosspoint in ad- 
dition to input and output queues, which greatly 
simplifies the scheduling process. For packets 
with unit length and value, Kesselman et al. [20] 
introduce a greedy switch policy with competitive
ratio between 2 and 4 and show a general
lower bound of 2 for unit-size buffers. For variable
values and PQ buffers, they propose a push-out
greedy switch policy with preemption factor
$\beta$ with competitive ratio between $(2\beta - 1)/(\beta - 1)$
(3.87 for $\beta = 1.53$) and $(\beta + 2)^2 + 2/(\beta - 1)$
(16.24 for $\beta = 1.53$). For variable values and
FIFO buffers, they propose a $\beta$-push-out greedy
switching policy with competitive ratio
$6 + 4\beta + \beta^2 + 3/(\beta - 1)$ (19.95 for $\beta = 1.67$) [19].


Uniform Values, Variable Processing,
Single Queue

In this setting, each packet contributes one unit to 
the objective function, but different packets have 
different processing requirements, i.e., they spend 
a different number of time slots at the processor. 
We denote maximal possible required processing 
by k. 


Non-Push-Out Policies 

For a single queue and packets with heterogeneous
processing, non-push-out policies have not
been considered in any detail. Kogan, López-Ortiz,
Nikolenko, and Sirotkin [23] have shown
that any greedy non-push-out policy is at least
$\frac{1}{2}(k + 1)$-competitive. It remains an open problem
to find non-push-out policies with sublinear
competitive ratios or show that none exists.


Push-Out Policies 

Keslassy et al. [12] showed that again, for a 
single queue, PQ (Priority Queue) that sorts pack- 
ets with respect to required processing (smallest 
first) is optimal; research has concentrated on the 
FIFO case, where packets have to be transmitted 
in order of arrival. Kogan et al. [24] introduced 
lazy policies that process packets down to a
single cycle but then delay their transmission until
the entire queue consists of such packets; then
all packets are transmitted out in as many time
slots as there are packets in the queue. In [24],
LPO (Lazy Push-Out) was proven to be at most
$(\max\{1, \ln k\} + 2 + o(1))$-competitive; [24] also
provides a lower bound of $\lfloor \log_B k \rfloor + 1 - O(1/B)$
for both PO (push-out FIFO) and LPO; for large
k this bound matches the upper bound up to a
factor of $\log B$. Proving a matching upper bound
for the PO policy remains an important open
problem. In the two-valued case, when packets 
may have required processing only 1 or k, LPO 
has a lower bound of 2 — t and a matching 
upper bound of 2 + z [24]. Kogan, Lépez- 
Ortiz, Nikolenko, and Sirotkin [23] introduce 
semi-FIFO policies, separating processing order 
from transmission order so that transmission can 
conform to FIFO constraints while processing or- 
der remains arbitrary. Lazy policies thus become 
a special case of semi-FIFO policies. The authors
show a general upper bound of + gzlog_p k+3 
on the competitive ratio of any lazy policy and
a matching lower bound of 4 z log_e_ k + 1 for 
several processing orders. In the two-valued case, 
when processing is only 1 or k, this upper bound 
improves to 2+ z , so any lazy policy has constant 
competitiveness. LPQ (Lazy Priority Queue) also 
falls in the semi-FIFO class; its competitiveness 
is between (2— 4[#]) and 2 even for arbi- 
trary processing requirements. Kogan et al. [22] 
consider a generalization with packets of varying
size, considering several natural policies and
showing an upper bound of 4L for one of the PO
policies, where L is the maximal packet size.


Copying Cost 

An important generalization of the heterogeneous
processing model was introduced by Keslassy
et al. [12]. They attach a penalty $\alpha$ called copying
cost to admitting a packet in the queue; thus,
the objective function is now $T - \alpha A$, where
T is the number of transmitted packets and A
is the number of accepted ones, and it becomes
less advantageous to push packets out. To deal
with copying cost, the authors propose to use
$\beta$-push-out policies that push a packet out only



if the required processing of the arriving packet
is at least $\beta > 1$ times less than that of the
packet which is being pushed out. Keslassy et al.
[12] consider the $PQ_\beta$ policy (Priority Queue
with $\beta$-push-out) and show that it is at most
eae at + log g_ wor 2 K+ 2 logs k) (1 —a@)- 

competitive. Kogan, López-Ortiz, Nikolenko, and
Sirotkin [23] show that for any processing order,
a $\beta$-push-out lazy policy $LA_\beta$ has competitive


ratio at most (3 + § 108 of BB _k) . They 
$1 - \alpha \log_\beta k$


show a lower bound 3k On the competitive 


1 aitg 

ratio of any $\beta$-push-out policy, which matches
the additional factor in the upper bound. In 
the two-valued case, the upper bound becomes 
(2+ =) a and the authors also show a 


: 2B—2)(1— 
matching lower bound of ( mat st . 


Uniform Values, Variable Processing,
Multiple Separated Queues

Consider k separate queues of size B each;
packets with required processing i fall into the ith
queue, and the processor chooses which queue to
process on a given time slot. Push-out is irrelevant
since queues are independent and packets in
a queue are identical. Kogan, López-Ortiz,
Nikolenko, and Sirotkin [25] show linear lower 
bounds for several seemingly attractive policies: 
5 min{k, B} for LQF (Longest Queue First), 
k for SQF (Shortest Queue First), Se?) for 
PRR (Packet Round Robin), and an almost linear 
lower bound of he: where H(k) = yy + ow 
Ink + y, for CRR (Cycle Round Robin). They 
introduce a policy called MQF (Minimal Queue 
First) that processes packets from a nonempty 
queue with minimal processing requirement. 
They show that MQF is at least (1 + i} - 
competitive and prove a constant upper bound of 
2. For the two-valued case with two queues, 
1 and k, Kogan et al. [25] show exactly 
matching lower and upper bounds for MQF of 


I++ | ))/(B+[2 Ol] +)))- 


Uniform Values, Variable Processing,
Shared Memory Switch

In this setting, multiple queues with shared mem- 
ory are implemented in the same way as for 
uniform processing and heterogeneous values:
there are N output ports, each output port manages
a single output queue $Q_i$, and each output
queue collects packets with the same processing
requirement (so packets in a given queue are
identical).


Non-Push-Out Policies 

Eugster, Kogan, Nikolenko, and Sirotkin [8] con- 
sider non-push-out policies and show that NHST 
(Non-Push-Out Harmonic with Static Threshold: 
|Q;| is bounded by ) is (kZ + o(kZ))- 
competitive, NEST (Non-Push-Out with Equal 
Static Threshold: |Q;| is bounded by B/n) is 
(N + 0(N))-competitive, NHDT (Non-Push-Out 
with Harmonic Dynamic Threshold: accept into 
OF if S103) <— gost gts FS), 
where j,...jm = i are queues for which 
Qi] = |Qil) is GVkInk — o(Vk Ink))- 
competitive; finding better non-push-out policies 
is an open problem. 


Push-Out Policies 

The work [8] also shows lower bounds on the
competitive ratio of well-known policies:
$(\sqrt{k} - o(\sqrt{k}))$ for LQD (Longest Queue Drop),
$(\ln k + \gamma)$ for BPD (Biggest Packet Drop), and
$(\frac{4}{3} - \frac{6}{B})$ for LWD (Largest Work Drop). The main
result of [8] is that LWD is at most 2-competitive.


Open Problems 


1. Close the gap between competitive ratios $4/3$
(lower bound for any policy) and 2 (upper
bound for LQD) in the uniform processing,
uniform value case.

2. Do there exist policies with constant compet- 
itive ratio in the uniform processing, variable 
values, shared memory multiple output queues 
setting? 

3. Do there exist non-push-out policies with sub- 
linear competitive ratio in the case of a single 
queue with packets with variable processing 
and uniform values? 

4. Prove an upper bound on the competitiveness 
of PO (push-out) policy in the single-queue 
FIFO model with heterogeneous required processing
and uniform values.

5. Do there exist non-push-out policies with log- 
arithmic competitive ratio in the case of mul- 
tiple output ports with shared memory that 
contain packets with variable processing and 
uniform values? 

6. Design efficient policies for CIOQ and cross- 
bar switches with packets with heterogeneous 
processing and uniform values; prove bounds 
on their competitive ratios. 

7. Design efficient policies and prove bounds on 
their competitive ratios for the case of packets 
with both variable values and heterogeneous 
processing requirements in all of the above 
settings. 
Problem Definition


A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property P 
quickly, and perform update operations faster 
than recomputing from scratch, as carried out by 
the fastest static algorithm. An algorithm is fully 
dynamic if it can handle both edge insertions and 
edge deletions and partially dynamic if it can 
handle either edge insertions or edge deletions, 
but not both. 

Given a graph with n vertices and m edges, 
the transitive closure (or reachability) problem 


Single-Source Fully Dynamic Reachability 


consists of building an n x n Boolean matrix M 
such that M[x, y] = 1 if and only if there is 
a directed path from vertex x to vertex y in the 
graph. The fully dynamic version of this problem 
can be defined as follows:


Definition 1 (Fully dynamic reachability 
problem) The fully dynamic reachability 
problem consists of maintaining a directed graph 
under an intermixed sequence of the following 
operations: 


• insert(u,v): insert edge (u,v) into the graph.

• delete(u,v): delete edge (u,v) from the
graph.

• reachable(x,y): return true if there is a directed
path from vertex x to vertex y, and false
otherwise.


This entry addresses the single-source version of 
the fully-dynamic reachability problem, where 
one is only interested in queries with a fixed 
source vertex s. The problem is defined as 
follows: 


Definition 2 (Single-source fully dynamic 
reachability problem) The fully dynamic 
single-source reachability problem consists of 
maintaining a directed graph under an intermixed 
sequence of the following operations: 


• insert(u,v): insert edge (u,v) into the graph.

• delete(u,v): delete edge (u,v) from the
graph.

• reachable(y): return true if there is a directed
path from the source vertex s to vertex
y, and false otherwise.


Approaches 

A simple-minded solution to the problem of
Definition 2 would be to keep explicit reachability
information from the source to all other vertices
and update it by running any graph traversal
algorithm from the source s after each insert or
delete. This takes O(m + n) time per operation,
and then reachability queries can be answered in
constant time.
Another simple-minded solution would be to
answer queries by running a point-to-point
reachability computation, without the need to keep
explicit reachability information up to date after
each insertion or deletion. This can be done in
O(m + n) time using any graph traversal
algorithm. With this approach, queries are answered
in O(m + n) time and updates require constant
time. Notice that the time required by the slowest
operation is O(m + n) for both approaches, which
can be as high as $O(n^2)$ in the case of dense
graphs.
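
The first of these two basic solutions can be sketched in Python as follows. This is an illustration only; the class name `NaiveSingleSourceReachability` and its internal layout are assumptions of this sketch, while the operation names follow Definition 2.

```python
from collections import defaultdict, deque


class NaiveSingleSourceReachability:
    """Recompute reachability from the source after every update.

    Each insert or delete costs one O(m + n) traversal; queries are O(1).
    """

    def __init__(self, source):
        self.source = source
        self.adj = defaultdict(set)      # adjacency sets of the directed graph
        self.reach = {source}            # vertices currently reachable from s

    def _recompute(self):
        # Plain breadth-first search from the source.
        seen = {self.source}
        frontier = deque([self.source])
        while frontier:
            u = frontier.popleft()
            for v in self.adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        self.reach = seen

    def insert(self, u, v):
        self.adj[u].add(v)
        self._recompute()

    def delete(self, u, v):
        self.adj[u].discard(v)
        self._recompute()

    def reachable(self, y):
        return y in self.reach


if __name__ == "__main__":
    g = NaiveSingleSourceReachability(source=0)
    g.insert(0, 1)
    g.insert(1, 2)
    print(g.reachable(2))    # True
    g.delete(0, 1)
    print(g.reachable(2))    # False
```

The second basic solution would drop `_recompute` from the updates and run the same traversal inside `reachable` instead, trading constant-time updates for O(m + n) queries.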

The first improvement upon these two basic
solutions is due to Demetrescu and Italiano,
who showed how to support update operations in
$O(n^{1.575})$ time and reachability queries in O(1)
time [1] in a directed acyclic graph. The result is
based on a simple reduction of the single-source
problem of Definition 2 to the all-pairs problem
of Definition 1. Using a result by Sankowski [2],
the bounds above can be extended to the case of
general directed graphs.


Key Results 


This section presents a simple reduction
from [1] that makes it possible to keep explicit
single-source reachability information up to date
in subquadratic time per operation in a directed
graph subject to an intermixed sequence of
edge insertions and edge deletions. The bounds
reported in this entry were originally presented
for the case of directed acyclic graphs, but can
be extended to general directed graphs using the
following theorem from [2]:


Theorem 1 Given a general directed graph with
n vertices, there is a data structure for the fully
dynamic reachability problem that supports each
insertion/deletion in $O(n^{1.575})$ time and each
reachability query in $O(n^{0.575})$ time. The
algorithm is randomized with one-sided error.


The idea described in [1] is to maintain reach- 
ability information from the source vertex s to 
all other vertices explicitly by keeping a Boolean 
array R of size n such that R[y] = 1 if and 
only if there is a directed path from s to y. An
instance D of the data structure for fully dynamic
reachability of Theorem 1 is also maintained. After
each insertion or deletion, it is possible to update
D in $O(n^{1.575})$ time and then rebuild R in
$O(n \cdot n^{0.575}) = O(n^{1.575})$ time by letting
$R[y] \leftarrow D.\mathrm{reachable}(s, y)$ for each vertex y. This yields
the following bounds for the single-source fully
dynamic reachability problem:


Theorem 2 Given a general directed graph with
n vertices, there is a data structure for the
single-source fully dynamic reachability problem that
supports each insertion/deletion in $O(n^{1.575})$
time and each reachability query in O(1)
time.
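
The reduction behind Theorem 2 can be sketched as a thin wrapper around any all-pairs structure. In the sketch below, `TraversalAllPairs` is only a hypothetical stand-in that answers queries by graph traversal; it is not the data structure of Theorem 1 and does not achieve its bounds. The class names and the assumption that vertices are numbered 0 to n - 1 are choices of this sketch.

```python
class TraversalAllPairs:
    """Stand-in for the all-pairs structure D (NOT the one of Theorem 1)."""

    def __init__(self, n):
        self.adj = [set() for _ in range(n)]

    def insert(self, u, v):
        self.adj[u].add(v)

    def delete(self, u, v):
        self.adj[u].discard(v)

    def reachable(self, x, y):
        # Depth-first traversal from x, stopping as soon as y is found.
        seen, stack = {x}, [x]
        while stack:
            u = stack.pop()
            if u == y:
                return True
            for w in self.adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False


class SingleSourceReachability:
    """Keep the Boolean array R up to date by querying D after each update."""

    def __init__(self, n, source, all_pairs):
        self.n, self.s, self.D = n, source, all_pairs
        self._rebuild()

    def _rebuild(self):
        # R[y] <- D.reachable(s, y) for each vertex y: n queries to D.
        self.R = [self.D.reachable(self.s, y) for y in range(self.n)]

    def insert(self, u, v):
        self.D.insert(u, v)
        self._rebuild()

    def delete(self, u, v):
        self.D.delete(u, v)
        self._rebuild()

    def reachable(self, y):
        return self.R[y]             # constant-time lookup


if __name__ == "__main__":
    ss = SingleSourceReachability(4, 0, TraversalAllPairs(4))
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        ss.insert(u, v)
    ss.delete(1, 2)
    print(ss.R)
```

The wrapper only relies on the three operations of Definition 1, so the stand-in can be replaced by any structure offering `insert`, `delete`, and `reachable(x, y)`; the update cost is then one update of D plus n queries to D.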


Applications 


The graph reachability problem is particularly 
relevant to the field of databases for support- 
ing transitivity queries on dynamic graphs of 
relations [3]. The problem also arises in many 
other areas such as compilers, interactive verifi- 
cation systems, garbage collection, and industrial 
robotics. 


Open Problems 


An important open problem is whether one can 
extend the result described in this entry to main- 
tain fully dynamic single-source shortest paths in 
subquadratic time per operation. 
Problem Definition


The single-source shortest path problem (SSSP)
is, given a graph $G = (V, E, \ell)$ and a source
vertex $s \in V$, to find the shortest path from s
to every $v \in V$. The difficulty of the problem
depends on whether the graph is directed or
undirected and the assumptions placed on
the length function $\ell$. In the most general
situation, $\ell : E \to \mathbb{R}$ assigns arbitrary (positive
and negative) real lengths. The algorithms of
Bellman-Ford and Edmonds [1, 4] may be
applied in this situation and have running
times of roughly O(mn), where $m = |E|$ and
$n = |V|$ are the number of edges and vertices.
(Edmonds's algorithm works for undirected
graphs and presumes that there are no
negative-length simple cycles.) If $\ell$ assigns
only nonnegative real edge lengths, then the


Single-Source Shortest Paths 


algorithms of Dijkstra and Pettie-Ramachandran
[4, 13] may be applied on directed and undirected
graphs, respectively. These algorithms include
a sorting bottleneck and, in the worst case,
take $\Omega(m + n \log n)$ time. (The [13] algorithm
actually runs in $O(m + n \log\log n)$ time if the
ratio of any two edge lengths is polynomial
in n.)
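
For nonnegative edge lengths, Dijkstra's algorithm can be sketched as below. This is a minimal illustration that uses a binary heap from Python's standard library, so it runs in O((m + n) log n) time rather than the O(m + n log n) obtainable with more sophisticated heaps; the function name `dijkstra` and the adjacency-dictionary input format are assumptions of this sketch.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path distances from `source` for nonnegative edge lengths.

    adj: dict mapping each vertex to a list of (neighbor, length) pairs.
    Returns a dict of distances for every vertex reachable from `source`.
    """
    dist = {source: 0}
    heap = [(0, source)]                 # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry, already improved
        for v, length in adj.get(u, ()):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    graph = {"s": [("a", 2), ("b", 5)], "a": [("b", 1)], "b": []}
    print(dijkstra(graph, "s"))
```

Vertices leave the heap in nondecreasing order of distance, which is the sorting behavior referred to above as the bottleneck of this approach.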

A common assumption is that $\ell$ assigns integer
edge lengths in the range $\{0, \ldots, 2^w - 1\}$ or
$\{-2^{w-1}, \ldots, 2^{w-1} - 1\}$ and that the machine is
a w-bit word RAM; that is, each edge length fits
in one register. For general integer edge lengths,
the best SSSP algorithms improve on Bellman-Ford
and Edmonds by a factor of roughly $\sqrt{n}$
[6]. For nonnegative integer edge lengths, the
best SSSP algorithms are faster than Dijkstra and
Pettie-Ramachandran by up to a logarithmic factor.
They are frequently based on integer priority
queues [9].


Key Results 


Thorup’s primary result [16] is an optimal linear 
time SSSP algorithm for undirected graphs with 
integer edge lengths. This is the first and only 
linear time shortest path algorithm that does not 
make serious assumptions on the class of input 
graphs. 


Theorem 1 There is a SSSP algorithm for 
integer-weighted undirected graphs that runs 
in O(m) time. 


Thorup avoids the sorting bottleneck inherent in 
Dijkstra’s algorithm by precomputing (in linear 
time) a component hierarchy. The algorithm of 
[16] operates in a manner similar to Dijkstra’s 
algorithm [4] but uses the component hierarchy 
to identify groups of vertices that can be visited 
in any order. In later work, Thorup [17] extended 
this approach to work when the edge lengths are 
floating-point numbers. (There is some flexibility
in the definition of shortest path since
floating-point addition is not associative.)
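
The remark about floating-point path lengths can be checked directly. The short example below is an illustration with arbitrarily chosen edge lengths: it adds the same three lengths in two different groupings, as two traversal orders of one path might, and obtains two different double-precision sums.

```python
# Three edge lengths along one path, stored as IEEE 754 doubles.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c     # lengths accumulated from the start of the path
right = a + (b + c)    # lengths accumulated from the end of the path

# The two groupings differ in the last bit, so "the" length of the path
# depends on the order in which the additions are performed.
print(left, right, left == right)
```

An algorithm that accumulates lengths in a different order than Dijkstra's algorithm may therefore report slightly different distances for the same path.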

Thorup’s hierarchy-based approach has since 
been extended to directed and/or real-weighted 
graphs and to solve the all pairs shortest path
(APSP) problem [11-13]. The generalizations
to related SSSP problems are summarized below.
See [11, 12] for hierarchy-based APSP
algorithms.


Theorem 2 (Hagerup [8], 2000) A component
hierarchy for a directed graph $G = (V, E, \ell)$,
where $\ell : E \to \{0, \ldots, 2^w - 1\}$, can be
constructed in $O(m \log w)$ time. Thereafter, SSSP
from any source can be computed in
$O(m + n \log\log n)$ time.


Theorem 3 (Pettie and Ramachandran [13],
2005) A component hierarchy for an undirected
graph $G = (V, E, \ell)$, where $\ell : E \to \mathbb{R}^+$,
can be constructed in
$O(m\,\alpha(m, n) + \min\{n \log\log r, n \log n\})$ time, where r is the
ratio of the maximum-to-minimum edge length.
Thereafter, SSSP from any source can be
computed in $O(m \log \alpha(m, n))$ time.


The algorithms of Hagerup [8] and Pettie- 
Ramachandran [13] take the same basic approach 
as Thorup’s algorithm: use some kind of 
component hierarchy to identify groups of 
vertices that can safely be visited in any 
order. However, the assumption of directed 
graphs [8] and real edge lengths [13] renders 
Thorup’s hierarchy inapplicable or inefficient. 
Hagerup’s component hierarchy is based on a 
directed analogue of the minimum spanning tree. 
The Pettie-Ramachandran algorithm enforces 
a certain degree of balance in its component 
hierarchy and, when computing SSSP, uses a 
specialized priority queue to take advantage of 
this balance. 


Applications 


Shortest path algorithms are frequently used as a 
subroutine in other optimization problems, such 
as flow and matching problems [1] and facility 
location [18]. A widely used commercial ap- 
plication of shortest path algorithms is finding 
efficient routes on road networks, e.g., as pro- 
vided by Google Maps, MapQuest, or Yahoo 
Maps. 
Open Problems


Thorup’s SSSP algorithm [16] runs in linear time 
and is therefore optimal. The main open prob- 
lem is to find a linear time SSSP algorithm 
that works on real-weighted directed graphs. For 
real-weighted undirected graphs, the best running
time is given in Theorem 3. For
integer-weighted directed graphs, the fastest algorithms
are based on Dijkstra’s algorithm (not Theorem 2)
and run in $O(m \sqrt{\log\log n})$ time (randomized)
and deterministically in $O(m + n \log\log n)$
time.


Problem 1 Is there an O(m) time SSSP algo- 
rithm for integer-weighted directed graphs? 


Problem 2 Is there an O(m) + o(n logn) time 
SSSP algorithm for real-weighted graphs, either 
directed or undirected? 


The complexity of SSSP on graphs with positive 
and negative edge lengths is also open. 


Experimental Results 


Asano and Imai [2] and Pettie et al. [14] evaluated 
the performance of the hierarchy-based SSSP 
algorithms [13, 16]. There have been a number of 
studies of SSSP algorithms on integer-weighted 
directed graphs; see [7] for the latest and refer- 
ences to many others. The trend in recent years 
is to find practical preprocessing schemes that 
allow for very quick point-to-point shortest path 
queries. See [3, 10, 15] for recent work in this 
area. 


Data Sets 


See [5] for a number of US and European road 
networks. 


URL to Code 


See [5]. 
Problem Definition


The ski rental problem was developed as a peda- 
gogical tool for understanding the basic concepts 
in some early results in online algorithms. (In the 
interest of full disclosure, the earliest presenta- 
tions of these results described the problem as the 
wedding-tuxedo-rental problem. Objections were 
presented that this was a gender-biased name 
for the problem, since while groomsmen can 
rent their wedding apparel, bridesmaids usually 
cannot. A further complication, owing to the 
difficulty of instantaneously producing fitted gar- 
ments or ski equipment outlined below, suggests 
that some complications could have been avoided 
by focusing on the dilemma of choosing between 
daily lift passes or season passes, although this
leads to the pricing complexities of purchasing 
season passes well in advance of the season, as 
opposed to the higher cost of purchasing them 
at the mountain during the ski season. A simi- 
lar problem could be derived from the question 
as to whether to purchase the daily newspaper 
at a newsstand or to take a subscription, after 
adding the challenge that one’s peers will treat 
one contemptuously if one has not read the news 
on days on which they have.) The ski rental 
problem considers the plight of one consumer 
who, in order to socialize with peers, is forced to 
engage in a variety of athletic activities, such as 
skiing, bicycling, windsurfing, rollerblading, sky 
diving, scuba-diving, tennis, soccer, and ultimate 
Frisbee, each of which has a set of associated 
apparatus, clothing, or protective gear. 

In all of these, it is possible either to purchase 
the accoutrements needed or to rent them. For the 
purpose of this problem, it is assumed that one- 
time rental is less expensive than purchasing. It 
is also assumed that purchased items are durable, 
and suitable for reuse for future activities of the 
same type without further expense, until the items 
wear out (which occurs at the same rate for all 
users), are outgrown, become unfashionable, or 
are disposed of to make room for other purchased 
items. The social consumer must make the de- 
cision to rent or buy for each event, although 
it is assumed that the consumer is sufficiently 
parsimonious as to abjure rental if already in 
possession of serviceable purchased equipment. 
Whether purchases are as easy to arrange as 
rentals, or whether some advance planning is 
required (e.g., to mount bindings on a ski) is a 
further detail considered in this problem. It is 
assumed that the social consumer has no particular
independent interest in these activities, and
engages in these activities only to socialize with 
peers who choose to engage in these activities 
disregarding the consumer’s desires. 

These putative peers are more interested in 
demonstrating the superiority of their financial 
acumen to that of the social consumer in question 
than they are in any particular activity. To that 
end, the social consumer is taunted mercilessly 
based on the ratio of his/her total expenses on 
rentals and purchases to theirs. Consequently, the
peers endeavor to invite the social consumer to 
engage in events while they are costly to him/her, 
and once the activities are free to the social 
consumer, if continued activity would be costly 
to them, cease. But, to present an illusion of 
fairness, skis, both rented and purchased, have the 
same cost for the peers as they do for the social 
consumer in question. The ski rental problem 
takes a very restricted setting. It assumes that 
purchased ski equipment never needs replace- 
ment, and that there are no costs to a ski trip 
other than the skis (thus, no cost for the gaso- 
line, for the lift and/or speeding tickets, for the 
hot chocolates during skiing, or for the aprés- 
ski liqueurs and meals). It is assumed that the 
social consumer experiences no physical disabil- 
ities preventing him/her from skiing and has no 
impending restrictions to his/her participation in 
ski trips (obviously, a near-term-fatal illness or 
an anticipated conviction leading to confinement 
for life in a penitentiary would eliminate any 
potential interest in purchasing alpine equipment 
— when the ratio of purchase to rental exceeds the 
maximum need for equipment, one should always 
rent). It is assumed that the social consumer’s 
peers have disavowed any interest in activities 
other than skiing, and that the closet, basement, 
attic, garage, or storage locker included in the so- 
cial consumer’s rent or mortgage (or necessitated 
by other storage needs) has sufficient capacity 
to hold purchased ski equipment without entail- 
ing the disposal of any potentially useful items. 
Bringing these complexities into consideration 
brings one closer to the hardware-based problems 
which initially inspired this work. 

The impact of invitations issued with sufficient 
time allowed for purchasing skis, as well as those 
without, will be considered. 

Given all of that, what ratio of expenses can 
the social consumer hope to attain? What ratio 
can the social consumer not expect to beat? These 
are the basic questions of competitive analysis. 

The impact of keeping secrets from one’s 
peers is further considered. Rather than a fixed 
strategy for when to purchase skis, the social 
consumer may introduce an element of chance 
into the process. If the peers are able to observe 
his/her ski equipment and notice when it changes
from rented skis to purchased skis, and change 
their schedule for alpine recreation in light of 
this observation, randomness provides no advan- 
tages. If, on the other hand, the social consumer 
announces to the peers, in advance of the first 
trip, how he/she will decide when the time is 
right for purchasing skis, including any use of 
probabilistic techniques, and they then decide on 
the schedule for ski trips for the coming win- 
ter, a deterministic decision procedure generally 
produces a larger competitive ratio than does a 
randomized procedure. 


Key Results 


Given an unbounded sequence of skiing trips, 
one should eventually purchase skis if the cost of 
renting skis, r, is positive. In particular, let the 
cost of purchasing skis be some number p > r. 
If one never intends to make a purchase, one’s 
cost for the season will be rn, where n is the 
number of ski trips in which one participates. If 
n exceeds p/r, one’s cost will exceed the price 
of purchasing skis; as n continues to increase,
the ratio of one’s costs to those of one’s peers
increases to nr/p, which grows unboundedly
with n, since one’s peers, knowing that n exceeds
p/r, will have purchased skis prior to the first 
trip. 

On the other hand, if one rushes out to pur- 
chase skis upon being told that the ski season is 
approaching, one’s peers will decide that this sea- 
son looks inopportune, and that skiing is passé, 
leaving their costs at zero, and one’s costs at p, 
leaving an infinite ratio between one’s costs and 
theirs; if one chooses to defer the purchase until 
after one’s first ski trip, this produces the less 
unfavorable ratio p/r or 1 + p/r, depending on 
whether the invitation left one time to purchase 
skis before the first trip or not. 

Suppose one chooses, instead, to defer one’s 
purchase until after one has made k rentals, but 
before ski trip k +1. One’s costs are then bounded 
by kr + p. After k ski trips, the cost to one’s 
peers will be the lesser of kr and p (as one’s 
peers will have decided whether to rent or buy for 
the season upon knowing one’s plans, which in
this case amounts to knowing k), for a ratio equal
to the larger of 1 + kr/p and 1 + p/(kr). Were
they to choose to terminate the activity earlier
(so n < k), the ratio would be only the greater
of kr/p and 1, which is guaranteed to be less 
than the sum of the two — one’s peers would be 
shirking their opportunity to make one’s behavior 
look foolish were they to allow one to stop skiing 
prior to one’s purchase of a pair of skis! 

It is certain, since kr/p and p/kr are recip- 
rocals, that one of them is at least equal to 1, 
ensuring that one will be compelled to spend at 
least twice as much as one’s peers. 
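The deterministic bounds above can be checked numerically. The following is a minimal sketch, assuming integer prices p and r and an adversary who stops inviting immediately after the purchase; the function names `deterministic_ratio` and `best_k` are illustrative and do not come from the original analysis.

```python
from fractions import Fraction

def deterministic_ratio(k, p, r, buy_on_demand=False):
    """Cost ratio for the strategy "rent k times, then buy".

    Assumes integer prices p > r > 0. Without on-demand purchases the
    peers pay min(k*r, p), so k must be at least 1; with on-demand
    purchases they pay min((k+1)*r, p), and k = 0 is allowed.
    """
    own_cost = k * r + p
    peer_cost = min((k + 1) * r if buy_on_demand else k * r, p)
    return Fraction(own_cost, peer_cost)

def best_k(p, r, buy_on_demand=False):
    """Number of rentals k that minimizes the worst-case ratio."""
    first = 0 if buy_on_demand else 1
    candidates = range(first, p // r + 2)  # larger k only raises the ratio
    return min(candidates, key=lambda k: deterministic_ratio(k, p, r, buy_on_demand))

if __name__ == "__main__":
    for on_demand in (False, True):
        k = best_k(2, 1, on_demand)
        print(on_demand, k, deterministic_ratio(k, 2, 1, on_demand))
```

With p = 2r the sketch reproduces the two deterministic ratios quoted later in this section, namely 2 without warning and 1.5 when purchases can be made on demand.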

The analysis above applies to the case where 
ski trips are announced without enough warning 
to leave one time to buy skis. Purchases in that 
case are not instantaneous; in contrast, if one is 
able to purchase skis on demand, the cost to one’s 
peers changes to the lesser of (k + 1)r and p. The 
overall results are not much different; the ratio 
choices become the larger of 1 + kr/p and
1 + (p − r)/((k + 1)r).

When probabilistic algorithms are considered 
with oblivious frenemies (those who know the 
way in which random choices will affect one’s 
purchasing decisions, but who do not take time to 
notice that one’s skis are no longer marked with 
the name and phone number of a rental agency), 
one can appear more thrifty. 

A randomized algorithm can be viewed as 
a distribution over deterministic algorithms. No 
good algorithm can purchase skis prior to the 
first invitation, lest it exhibit infinite regrettability 
(some positive cost compared to zero). A good 
algorithm must purchase skis by the time one’s 
peers will have; otherwise, one’s cost ratio con- 
tinues to increase with the number of ski trips. 
Moreover, the ratio should be the same after every 
ski trip; if not, then there is an earliest ratio 
not equal to the largest, and probabilities can be 
adjusted to change this earliest ratio to be closer 
to the largest while decreasing all larger ratios. 

Consider, for example, the case of p = 2r, 
with purchases allowed at the time of an invita- 
tion. The best deterministic ratio in this case is 
1.5. It is only necessary to choose a probability 
q, the probability of purchasing at the time of 
the first invitation. The expected cost after one trip is then
(1 − q)r + 2qr = r(1 + q), for a ratio of 1 + q, and
after two trips the cost is q(2r) + (1 − q)(3r) =
(3 − q)r, producing a ratio of (3 − q)/2. Setting
these to be equal yields q = 1/3, for a ratio of
4/3.

If insufficient time is allowed for purchases 
before skiing, the best deterministic ratio is 2. 
Purchasing after the first ski trip with probability 
q (and after the second with probability 1 — q) 
leads to expected costs of (1 − q)r + 3qr = r(1 + 2q)
after the first trip, and (1 − q)(2 + 2)r + 3qr =
r(4 − q) after the second, leading to a ratio of 2 − q/2. Setting
1 + 2q = 2 − q/2 yields q = 2/5, for a ratio of
9/5. 
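Both two-point calculations above follow the same pattern: the expected ratio after each trip is linear in q, and the two ratios are set equal. A small sketch using exact fractions is given below; the helper `equalizing_q` and its argument layout are assumptions made for illustration, with r = 1 and p = 2 as in the examples.

```python
from fractions import Fraction

def equalizing_q(early, late, peers):
    """Probability q of the "early" purchase that equalizes the two ratios.

    early[t], late[t]: one's total cost if the trips stop after trip t+1,
    under the early and the late purchase rule; peers[t]: the peers' cost.
    The expected ratio at stop t is a[t] + b[t] * q.
    """
    a = [Fraction(late[t], peers[t]) for t in (0, 1)]
    b = [Fraction(early[t] - late[t], peers[t]) for t in (0, 1)]
    q = (a[1] - a[0]) / (b[0] - b[1])
    return q, a[0] + b[0] * q

# Purchases possible at invitation time: buy at the first or second invitation.
print(equalizing_q(early=[2, 2], late=[1, 3], peers=[1, 2]))
# No time to purchase before skiing: buy after the first or second trip.
print(equalizing_q(early=[3, 3], late=[1, 4], peers=[1, 2]))
```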

More careful analysis, for which readers are 
referred to the references and the remainder of 
this volume, shows that the best achievable ratio
approaches e/(e − 1) ≈ 1.58197 as p/r
increases, approaching the limit from below if 
sufficient warning time is offered, and from above 
otherwise. 


Applications 


The primary initial results were directed towards 
problems of computer architecture; in particu- 
lar, design questions for capacity conflicts in 
caches, and shared memory design in the pres- 
ence of a shared communication channel. The 
motivation for these analyses was to find designs 
which would perform reasonably well on as- 
yet-unknown workloads, including those to be 
designed by competitors who may have chosen 
alternative designs which favor certain cases. 
While it is probably unrealistic to assume that 
precisely the least-desirable workloads will occur 
in ordinary practice, it is not unreasonable to 
assume that extremal workloads favoring either 
end of a decision will occur. 


History and Further Reading


This technique of finding algorithms with
bounded worst-case performance ratios is
common in analyzing approximation algorithms.
The initial proof techniques used for such
analyses (the method of amortized analysis) were 
first presented by Sleator and Tarjan. 

The reader is advised to consult the remainder 
of this volume for further extensions and appli- 
cations of the principles of competitive online 
algorithms. 
Problem Definition


This problem is about finding the optimal orienta- 
tions of the cells in a slicing floorplan to minimize 
the total area. In a floorplan, cells represent basic 
pieces of the circuit which are regarded as in- 
divisible. After performing an initial placement, 
for example, by repeated application of a min- 
cut partitioning algorithm, the relative positions 
between the cells on a chip are fixed. Various 
optimizations can then be done on this initial 
layout to optimize different cost measures such 
as chip area, interconnect length, routability, etc. 
One such optimization, as mentioned in Lauther 
[3], Otten [4], and Zibert and Saal [13], is to 
determine the best orientation of each cell to 
minimize the total chip area. This work by Stock- 
meyer [8] gives a polynomial time algorithm to 
solve the problem optimally in a special type 
of floorplans called slicing floorplans and shows 
that this orientation optimization problem in gen- 
eral non-slicing floorplans is NP-complete. 


Slicing Floorplan 

A floorplan consists of an enclosing rectangle 
subdivided by horizontal and vertical line seg- 
ments into a set of non-overlapping basic rect- 
angles. Two different line segments can meet but 
not cross. A floorplan F is characterized by a
pair of planar acyclic directed graphs $A_F$ and $L_F$
defined as follows. Each graph has one source
and one sink. The graph $A_F$ captures the “above”
relationships and has a vertex for each horizontal
line segment, including the top and the bottom of
the enclosing rectangle. For each basic rectangle
R, there is an edge $e_R$ directed from segment $\sigma$
to segment $\sigma'$ if and only if $\sigma$ (or part of $\sigma$) is
the top of R and $\sigma'$ (or part of $\sigma'$) is the bottom
of R. There is a one-to-one correspondence
between the basic rectangles and the edges in
$A_F$. The graph $L_F$ is defined similarly for the
“left” relationships of the vertical segments. An
example is shown in Fig. 1. Two floorplans F and
G are equivalent if and only if $A_F = A_G$ and
$L_F = L_G$. A floorplan F is slicing if and only if
both its $A_F$ and $L_F$ are series parallel.

Slicing Floorplan Orientation, Fig. 1 A floorplan F and its $A_F$ and $L_F$, representing the above and left relationships

Slicing Floorplan Orientation, Fig. 2 A slicing floorplan F and its slicing tree representation


Slicing Tree 

A slicing floorplan can also be described naturally
by a rooted binary tree called a slicing tree.
In a slicing tree, each internal node is labeled by
either an h or a v, indicating a horizontal or a vertical
slice respectively. Each leaf corresponds to
a basic rectangle. An example is shown in Fig. 2. 
There can be several slicing trees describing the 
same slicing floorplan, but this redundancy can 
be removed by requiring the label of an internal 
node to differ from that of its right child [12]. 
For the algorithm presented in this work, a tree of 
smallest depth should be chosen, and this depth 
minimization process can be done in O(n log n)
time using the algorithm by Golumbic [2]. 


Orientation Optimization 

In optimization of a floorplan layout, some freedom
in moving the line segments and in choosing
the dimensions of the rectangles is allowed. In
the input, each basic rectangle R has two positive
integers $a_R$ and $b_R$, representing the dimensions
of the cell that will be fit into R. Each cell has
two possible orientations, resulting in either the
side of length $a_R$ or $b_R$ being horizontal. Given a
floorplan F and an orientation $\rho$, each edge e in
$A_F$ and $L_F$ is given a label $l(e)$ representing the
height or the width of the cell corresponding to
e, depending on its orientation. Define an $(F, \rho)$-placement
to be a labeling $l$ of the vertices in
$A_F$ and $L_F$ such that (i) the sources are labeled
by zero and (ii) if e is an edge from vertex
$\sigma$ to $\sigma'$, then $l(\sigma') \ge l(\sigma) + l(e)$. Intuitively, if $\sigma$
is a horizontal segment, $l(\sigma)$ is the distance of
$\sigma$ from the top of the enclosing rectangle, and
the inequality constraint ensures that the basic
rectangle corresponding to e is tall enough for
the cell contained in it, and similarly for the
vertical segments. Now, $h_F(\rho)$ (resp. $w_F(\rho)$) is
defined to be the minimum label of the sink in
$A_F(\rho)$ (resp. $L_F(\rho)$) over all $(F, \rho)$-placements,
where $A_F(\rho)$ (resp. $L_F(\rho)$) is obtained from $A_F$
(resp. $L_F$) by labeling the edges and vertices as
described above. Intuitively, $h_F(\rho)$ and $w_F(\rho)$
give the minimum height and width of a floorplan
F given an orientation $\rho$ of all the cells such
that each cell fits well into its associated basic
rectangle.
The orientation optimization problem can be
defined formally as follows:


Problem 1 (Orientation Optimization Problem
for Slicing Floorplan)

INPUT: A slicing floorplan F of n cells described
by a slicing tree T, the widths and heights of the
cells $a_i$ and $b_i$ for i = 1…n, and a cost function
$\psi(h, w)$.

OUTPUT: An orientation $\rho$ of all the cells that
minimizes the objective function $\psi(h_F(\rho), w_F(\rho))$
over all orientations $\rho$.


For this problem, Lauther [3] has suggested 
a greedy heuristic. Zibert and Saal [13] use 
integer programming methods to do rotation 
optimization and several other optimizations
simultaneously for general floorplans. In the
following sections, an efficient algorithm will be 
given to solve the problem optimally in O(nd) 
time where n is the number of cells and d is the 
depth of the given slicing tree. 


Key Results 


In the following algorithm, F(u) denotes the
floorplan described by the subtree rooted at u in
the given slicing tree T, and L(u) denotes the set of
leaves in that subtree. For each node u of T, the
algorithm constructs recursively a list of pairs:


$\{(h_1, w_1), (h_2, w_2), \ldots, (h_m, w_m)\}$


where (1) $m \le |L(u)| + 1$, (2) $h_i > h_{i+1}$ and
$w_i < w_{i+1}$ for i = 1…m − 1, (3) there is
an orientation $\rho$ of the cells in L(u) such that
$(h_i, w_i) = (h_{F(u)}(\rho), w_{F(u)}(\rho))$ for each i =
1…m, and (4) for each orientation $\rho$ of the cells
in L(u), there is a pair $(h_i, w_i)$ in the list such that
$h_i \le h_{F(u)}(\rho)$ and $w_i \le w_{F(u)}(\rho)$.

The list at u is thus a non-redundant list of all possible
dimensions of the floorplan described by the
subtree rooted at u. Since the cost function $\psi$
is non-decreasing, it can be minimized over all
orientations by finding the minimum $\psi(h_i, w_i)$
over all the pairs $(h_i, w_i)$ in the list constructed
at the root of T. At the beginning, a list is
constructed at each leaf node of T representing
the possible dimensions of the cell. If a leaf cell
has dimensions a and b with a > b, the list
is {(a,b), (b,a)}. If a = b, there will just be
one pair (a,b) in the list. (If the cell has a fixed
orientation, there will also be just one pair as
defined by the fixed orientation.) Notice that the
condition (1) above is satisfied in these leaf node
lists. The algorithm then works its way up the
tree and constructs the list at each node recursively.
In general, assume that u is an internal
node with children v and v′ and u represents
a vertical slice. Let $\{(h_1, w_1), \ldots, (h_k, w_k)\}$ and
$\{(h'_1, w'_1), \ldots, (h'_m, w'_m)\}$ be the lists at v and v′,
respectively, where $k \le |L(v)| + 1$ and
$m \le |L(v')| + 1$. A pair $(h_i, w_i)$ from v can be put
together by a vertical slice with a pair $(h'_j, w'_j)$
from v′ to give a pair:


$\mathrm{join}((h_i, w_i), (h'_j, w'_j)) = (\max(h_i, h'_j),\; w_i + w'_j)$


in the list of u (see Fig. 3). The key fact is that
most of the km pairs are sub-optimal and do not
need to be considered. For example, if $h_i > h'_j$,
there is no need to join $(h_i, w_i)$ with $(h'_z, w'_z)$ for
any z > j since

$\max(h_i, h'_z) = \max(h_i, h'_j) = h_i$,
$w_i + w'_z > w_i + w'_j$.


Similarly, if node u represents a horizontal slice,
the join operation will be


$\mathrm{join}((h_i, w_i), (h'_j, w'_j)) = (h_i + h'_j,\; \max(w_i, w'_j))$


The algorithm also keeps two pointers for each 
element in the lists in order to construct back the 
optimal orientation at the end. The algorithm is 
summarized by the following pseudocode: 


Pseudocode Stockmeyer() 

1. Initialize the list at each leaf node. 

2. Traverse the tree in postorder. At each inter- 
nal node u with children v and v’, construct a 
list at node u as follows: 

3. Let $\{(h_1, w_1), \ldots, (h_k, w_k)\}$ and
$\{(h'_1, w'_1), \ldots, (h'_m, w'_m)\}$ be the lists at v and v′,
respectively.

4. Initialize i and j to one.

5. If i > k or j > m, the whole list at u is
constructed.

6. Add $\mathrm{join}((h_i, w_i), (h'_j, w'_j))$ to the list with
pointers pointing to $(h_i, w_i)$ and $(h'_j, w'_j)$ in
the lists at v and v′, respectively.

7. If $h_i > h'_j$, increment i by 1.

8. Otherwise, if $h_i < h'_j$, increment j by 1.

9. Otherwise ($h_i = h'_j$), increment both i and j by 1.

10. Go to step 5.

11. Compute $\psi(h_i, w_i)$ for each pair $(h_i, w_i)$ in
the list $L_r$ at the root r of T.

12. Return the minimum $\psi(h_i, w_i)$ for all
$(h_i, w_i)$ in $L_r$ and construct back the optimal
orientation by following the pointers.

Slicing Floorplan Orientation, Fig. 3 An illustration of the merging step
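The list-merging steps above can be sketched in Python as follows. This is an illustrative sketch rather than the original implementation: the `Node` class, the function names, and the default area cost are assumptions, and the back pointers used to recover the orientation are omitted for brevity. The horizontal case reuses the vertical merge by swapping the roles of height and width.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    # Leaf: dims = (a, b). Internal node: cut is "h" or "v" with two children.
    cut: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    dims: Optional[Tuple[int, int]] = None

def leaf_list(a, b):
    """(height, width) pairs of one cell: heights decreasing, widths increasing."""
    hi, lo = max(a, b), min(a, b)
    return [(hi, lo)] if hi == lo else [(hi, lo), (lo, hi)]

def merge_vertical(A, B):
    """Combine two lists across a vertical slice in O(len(A) + len(B)) time."""
    out, i, j = [], 0, 0
    while i < len(A) and j < len(B):
        (h1, w1), (h2, w2) = A[i], B[j]
        out.append((max(h1, h2), w1 + w2))
        if h1 > h2:
            i += 1          # only a shorter pair from A can lower the height
        elif h1 < h2:
            j += 1
        else:
            i += 1
            j += 1
    return out

def merge_horizontal(A, B):
    """Horizontal slice: swap height and width, merge, and swap back."""
    def flip(pairs):
        return [(w, h) for (h, w) in reversed(pairs)]
    return flip(merge_vertical(flip(A), flip(B)))

def shape_list(node):
    """Non-redundant (height, width) list for the subtree rooted at node."""
    if node.dims is not None:
        return leaf_list(*node.dims)
    A, B = shape_list(node.left), shape_list(node.right)
    return merge_vertical(A, B) if node.cut == "v" else merge_horizontal(A, B)

def best_shape(root, cost=lambda h, w: h * w):
    """Pair at the root minimizing a cost that is non-decreasing in h and w."""
    return min(shape_list(root), key=lambda hw: cost(*hw))
```

For two cells of dimensions 3 × 1 and 2 × 1 placed side by side, `shape_list` yields the three non-redundant pairs (3, 2), (2, 4), and (1, 5), and the area cost selects (1, 5).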


Correctness 

The algorithm is correct since at each node u, 
a list is constructed that records all the possible 
non-redundant dimensions of the floorplan de- 
scribed by the subtree rooted at u. This can be 
proved easily by induction starting from the leaf 
nodes and working up the tree recursively. Since
the cost function $\psi$ is non-decreasing, it can be
minimized over all orientations of the cells by
finding the minimum $\psi(h_i, w_i)$ over all the pairs
$(h_i, w_i)$ in the list $L_r$ constructed at the root r
of T.


Runtime 

Consider an internal node u with children v and v′.
If the lengths of the lists at v and v′ are k and m,
respectively, the time spent at u to combine the
two lists is O(k + m). Each possible dimension
of a cell will thus invoke one unit of execution
time at each node on its path up to the root in
the postorder traversal. The total runtime is thus
O(d × N), where N is the total number of realizations
of all the n cells, which is equal to 2n in
the orientation optimization problem. Therefore,
the runtime of this algorithm is O(nd).


Theorem 1 Let $\psi(h, w)$ be non-decreasing in
both arguments, i.e., if $h \le h'$ and $w \le w'$, then
$\psi(h, w) \le \psi(h', w')$, and computable in constant
time. For a slicing floorplan F described by a
binary slicing tree T, the problem of minimizing
$\psi(h_F(\rho), w_F(\rho))$ over all orientations $\rho$ can be
solved in O(nd) time, where n is the number
of leaves of T (equivalently, the number of cells
of F) and d is the depth of T.


Applications 


Floorplan design is an important step in the phys- 
ical design of VLSI circuits. Stockmeyer’s opti- 
mal orientation algorithm [8] has been general- 
ized to solve the area minimization problem in 
slicing floorplans [7], in hierarchical non-slicing 
floorplans of order five [6,9], and in general floor- 
plans [5]. The floorplan area minimization prob- 
lem is similar except that each soft cell now has 
a number of possible realizations, instead of just 
two different orientations. The same technique 
can be applied immediately to solve optimally the
area minimization problem for slicing floorplans
in O(nd) time, where n is the total number of
realizations of all the cells in a given floorplan
F and d is the depth of the slicing tree of F. Shi
[7] has further improved this result to O(n log n)
time. This is done by storing the list of non-redundant
pairs at each node in a balanced binary
search tree structure called a realization tree and
using a new merging algorithm to combine two
such trees to create a new one. It is also proved
in [7] that this O(n log n) time complexity is the
lower bound for this area minimization problem
in slicing floorplans.

For hierarchical non-slicing floorplans, Pan 
et al. [6] prove that the problem is NP-complete. 
Branch-and-bound algorithms are developed by 
Wang and Wong [9], and pseudopolynomial time 
algorithms are developed by Wang and Wong 
[10] and Pan et al. [6]. For general floorplans, 
Stockmeyer [8] has shown that the problem is 
strongly NP-complete. A pseudopolynomial time
algorithm is therefore unlikely to exist.
Wimer et al. [11] and Chong and Sahni [1] 
propose branch-and-bound algorithms. Pan et al. 
[5] develop algorithms for general floorplans that 
are approximately slicing. 
Problem Definition


In the last decade, the theoretical study of the slid- 
ing window model was developed to advance ap- 
plications with very large input and time-sensitive 
output. In some practical situations, input might 
be seen as an ordered sequence, and it is use- 
ful to restrict computations to recent portions 
of the input. Examples include the analysis of 
recent tweets and time series of the stock market. 


Sliding Window Algorithms 


To address the aforementioned practical situa- 
tions, Datar et al. [20] introduced the sliding 
window model that assumes that the input is a 
stream (i.e., the ordered sequence) of data el- 
ements and divides the data elements into two 
categories: active elements and expired elements. 
Typically, a recent portion (i.e., a suffix) of the 
stream defines the window of active elements, 
and the remainder (i.e., the complementary prefix)
of the stream defines the set of expired elements. 
When a new data element arrives, the set of active 
elements expands to include the new element, 
but the set might also shrink by discarding some
of the oldest active elements. This process
of additions and expirations resembles the
movement of an interval (or a window) along
a line and explains the name of the model. The
number of active elements N is often called the
size of the sliding window. There are two popular
variants of the sliding window model. The variant 
of a sequence-based window fixes the number of 
active elements N, and every insertion (or arrival) 
of a new element corresponds to a deletion (or 
expiration) of the oldest active element (after the 
size of the stream becomes larger than N). For
example, a sequence-based window on a stream
of IP packets is the set of the last N packets. The
variant of a timestamp-based window associates 
each element with a nondecreasing timestamp, 
and the window contains all elements with times- 
tamps larger than a certain value. Thus, there is 
no obvious dependence between the number of 
elements that arrive and expire. In the previous 
example, the timestamp-based window might be 
defined as a set of all packets that arrived within 
the last t seconds. 
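The two window variants can be illustrated with a short sketch. The classes below are hypothetical and store every active element explicitly, so they use space linear in the window size; they only show which elements are active, not how to summarize them in sublinear space. The timestamp variant keeps elements whose timestamp is at least the newest timestamp minus the span, matching the formal definition that follows.

```python
from collections import deque

class SequenceWindow:
    """Sequence-based window: always the last `size` elements."""

    def __init__(self, size):
        self.items = deque(maxlen=size)  # the deque drops the oldest element itself

    def add(self, element):
        self.items.append(element)

    def active(self):
        return list(self.items)

class TimestampWindow:
    """Timestamp-based window: elements from the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, element), timestamps nondecreasing

    def add(self, timestamp, element):
        self.items.append((timestamp, element))
        # Expire elements that are older than the newest timestamp minus span.
        while self.items[0][0] < timestamp - self.span:
            self.items.popleft()

    def active(self):
        return [element for _, element in self.items]
```

In the sequence-based variant each arrival causes at most one expiration, while in the timestamp-based variant a single arrival may expire many elements or none.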


Formal Definition 

We denote the stream D by a sequence of elements
$\{p_i\}_{i=1}^{m}$, where $p_i \in [n]$. It is important
to note that m is incremented for each new
arrival. A bucket $B(x, y) = \{p_i : i \in [x, y]\}$
is the set of all stream elements between $p_x$
and $p_y$, inclusive. A sequence-based window is
defined as $W = B(m - N + 1, m)$, where N is a
predefined parameter. Consider a nondecreasing
timestamp function $T : [m] \to \mathbb{R}$ and let t be
a parameter. Given T and t, a timestamp-based
window is defined as $W = B(l(t), m)$, where
$l(t) = \min\{i : T(i) \ge T(m) - t\}$. Consider a
function f that is defined on buckets. An algorithm
maintains a $(1 \pm \epsilon)$-approximation of f on
W if, at any moment, the algorithm outputs X such that
$|f(W) - X| \le \epsilon f(W)$. Similarly, a randomized
algorithm maintains a $(1 \pm \epsilon, \delta)$-approximation
if $P(|f(W) - X| > \epsilon f(W)) \le \delta$. It is often
the case that f can be computed precisely if the
entire window is available, but sublinear-space
approximations, i.e., computation when the size
of the available memory is $o(N + n)$, might be
challenging. For example, Datar et al. [20] show that
linear space is required to maintain a $(1 \pm \epsilon, \delta)$-approximation
of a sum of active elements if
$p_i \in \{1, 0, -1\}$. A typical question in the sliding
window model is the following: given a function
f, what are the upper and lower bounds on
the space complexity of maintaining a $(1 \pm \epsilon, \delta)$-approximation
of f?


History 

In their pioneering papers, Datar et al. [20, 21] 
and Babcock et al. [3] gave the first formal defi- 
nition of the sliding window model. The model 
arose in the context of relational databases as 
a special case of time-sensitive queries in tem- 
poral databases [3]. Below we give a short sur- 
vey of a subset of known results. A survey of 
Datar and Motwani [1] provides additional de- 
tails. Datar et al. [20] gave the first algorithms 
for estimating the count and sum of positive 
integers, average, $L_p$ for $p \in [1, 2]$, and a wide
class of weakly additive functions. Gibbons and 
Tirthapura [24] provided further improvements 
to count and sum and gave the first methods 
for distributed computations. Lee and Ting [29] 
provided an optimal solution for a relaxed version 
of the counting problem, where the correct an- 
swer is provided only if it is comparable with the 
window’s size. Braverman and Ostrovsky [6, 7] 
extended the results in [20] to a wider class of 
smooth functions. Chi et al. [15] considered a 
problem of frequent itemsets. Arasu and Manku 
[2], Lee and Ting [30], and Golab et al. [26] 
considered the problem of finding frequent el- 
ements, frequency counts, and quantiles. Bab- 
cock, Datar, Motwani, and O’Callaghan [5] provided
the first algorithms for variance and k-medians
problems. Feigenbaum, Kannan and Zhang [22] 
presented an efficient solution for the diameter 
of a data set in multidimensional space. Later, 
Chan and Sadjad [23] presented optimal solu- 
tions for this and other geometric problems. Bab- 
cock, Datar and Motwani [4] presented algo- 
rithms for uniform random sampling from sliding 
windows. 

Recently, Crouch et al. [17] presented the first 
approximation algorithms for important graph 
problems such as combinatorial sparsifiers and 
spanners, graph matching, and minimum span- 
ning tree. Among other results, the methods in
[17] allow non-smooth statistics to be computed
using a modified smooth histogram. McGregor
provided a detailed survey of these and other 
graph algorithms [32]. Datar and Muthukrishnan 
[19] solved problems of rarity and similarity. 
Braverman et al. [11] gave improved algorithms 
for rarity, similarity, and L2-heavy hitters. Cor- 
mode and Yi developed several first algorithms 
for sliding windows in distributed streams [16]. 
Babcock et al. [4] gave the first method of sam- 
pling an element with constant expected space 
complexity. Braverman et al. [9, 10] gave a solu- 
tion with a space complexity that is a constant in 
the worst case. Tatbul and Zdonik [35] considered 
the problem of load shedding for aggregation 
queries. Golab and Ozsu [25] gave the first al- 
gorithm for approximating multi-joins. Recently, 
Braverman et al. [13] extended the zero-one law 
for increasing frequency-based functions [8] to 
sliding windows. 


Key Results 


Smooth Histogram 

Extending the results in [20], Braverman and 
Ostrovsky [6,7] introduced a notion of a smooth 
function and presented techniques for approxi- 
mating smooth functions over sliding windows. 
Denote by $B \subseteq_r A$ the event when bucket B is
a suffix of A; i.e., if $A = \{p_{n_1}, \ldots, p_{n_2}\}$ (for
some $n_1 < n_2$), then $B = \{p_{n_3}, \ldots, p_{n_2}\}$, where
$n_1 \le n_3 \le n_2$. Denote by $A \cup C$ the union of
adjacent buckets A and C.
Definition 1 Function f is $(\alpha, \beta)$-smooth if it
has the following properties:


1. $f(A) \ge 0$.
2. $f(A) \ge f(B)$ for $B \subseteq_r A$.
3. $f(A) \le \mathrm{poly}(|A|)$.
4. For any $0 < \epsilon < 1$, there exist $\alpha = \alpha(\epsilon, f)$
and $\beta = \beta(\epsilon, f)$ such that
• $0 < \beta \le \alpha < 1$.
• If $B \subseteq_r A$ and $(1 - \beta) f(A) \le f(B)$, then
$(1 - \alpha) f(A \cup C) \le f(B \cup C)$ for any
adjacent C.
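
As a worked illustration of the last property (a sketch, assuming f is the sum of non-negative elements, so that f is additive over adjacent buckets), suppose $B \subseteq_r A$ and $(1 - \beta) f(A) \le f(B)$. Then for any adjacent bucket C:

```latex
\begin{align*}
f(B \cup C) &= f(B) + f(C) \\
            &\ge (1 - \beta) f(A) + f(C) \\
            &\ge (1 - \beta) \bigl( f(A) + f(C) \bigr) \\
            &= (1 - \beta) f(A \cup C).
\end{align*}
```

Under these assumptions the sum satisfies the property with $\alpha = \beta$; the second inequality uses $f(C) \ge 0$.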


In other words, a nonnegative, nondecreasing,
and polynomially bounded function f is $(\alpha, \beta)$-smooth
if the following is true. If f(B) is a
$(1 \pm \beta)$-approximation of f(A), then $f(B \cup C)$
is a $(1 \pm \alpha)$-approximation of $f(A \cup C)$ for
any $B \subseteq_r A$ and C. The main technical
result of [7] is a new data structure called
“smooth histogram” that allows algorithms
for insertion-only streams to be extended
to sliding windows with space complexity
increased by a polylogarithmic factor. If there
exists an algorithm that computes f precisely
using g space and h time per element, then
a smooth histogram can be used to maintain
a $(1 \pm \alpha)$-approximation of f over sliding
windows, using $O\left(\frac{1}{\beta} \log n \,(g + \log n)\right)$ bits
and $O\left(\frac{1}{\beta} h \log n\right)$ time. Further, a $(1 \pm p)$-approximation
of f on D results in a $(1 \pm (\alpha + p))$-approximation
of f over sliding
windows. Examples of smooth functions include
sum, count, min, diameter, weakly additive
functions, $L_p$ norms, frequency moments,
length of longest subsequence, and geometric
mean.

Let f be $(\alpha, \beta)$-smooth, and suppose there exists
an algorithm $\mathcal{A}$ that calculates f on D
using g space and h operations per element. To
maintain f on sliding windows, we construct a
data structure that we call a smooth histogram. It
consists of a set of indexes $x_1 < x_2 < \cdots < x_s = N$
and instances of $\mathcal{A}$ for each bucket
$B(x_i, N)$. Informally, the smooth histogram ensures
the following properties of the sequence.
The first two elements of the sequence always
“sandwich” the window, i.e., $x_1 < N - n < x_2$.
This requirement and the monotonicity of
f give us useful bounds for the sliding window
W: $f(x_2, N) \le f(W) \le f(x_1, N)$. Also,
f should slowly but constantly decrease with
i, i.e., $f(x_{i+2}, N) < (1 - \beta) f(x_i, N)$. This
gradual decrease, together with the fact that f is
polynomially bounded, ensures that the sequence
is short, i.e., $s = O\left(\frac{1}{\beta} \log n\right)$. Finally, the values
of f on successive buckets were close in the
past, i.e., $f(x_{i+1}, N') \ge (1 - \beta) f(x_i, N')$ for
some $N' \le N$. This represents our key idea and
exploits the properties of smoothness. Indeed,
$f(x_2, N') \ge (1 - \beta) f(x_1, N')$ for some $N' \le N$;
thus, by the $(\alpha, \beta)$-smoothness of f, we have
$f(x_2, N) \ge (1 - \alpha) f(x_1, N) \ge (1 - \alpha) f(W)$.
We refer the reader to [7] for further technical
details.
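
The properties above can be illustrated with a simplified sketch for one concrete smooth function, the sum of non-negative integers over a sequence-based window of size n. The class name, the bucket layout, and the pruning loop are assumptions made for illustration and are not taken from [7]. Each bucket stores its exact running sum in place of an instance of a general streaming algorithm, and the pruning threshold uses the factor (1 − β) stated in the text.

```python
class SmoothHistogramSum:
    """Approximate sum of the last n non-negative integers (illustrative sketch)."""

    def __init__(self, n, beta):
        self.n = n            # window size
        self.beta = beta      # smoothness parameter, 0 < beta < 1
        self.N = 0            # number of elements seen so far
        self.buckets = []     # [x_i, sum of p_{x_i..N}], with x_i increasing

    def add(self, value):
        self.N += 1
        for bucket in self.buckets:
            bucket[1] += value
        self.buckets.append([self.N, value])
        # Pruning: for each kept index i, find the largest j whose sum is still
        # within a (1 - beta) factor of bucket i, and drop everything between.
        i = 0
        while i < len(self.buckets) - 2:
            j = i + 1
            while (j + 1 < len(self.buckets) and
                   self.buckets[j + 1][1] >= (1 - self.beta) * self.buckets[i][1]):
                j += 1
            del self.buckets[i + 1:j]
            i += 1
        # Expiration: keep only the last bucket that starts at or before the
        # window start, so the first two buckets sandwich the window.
        start = max(1, self.N - self.n + 1)
        while len(self.buckets) >= 2 and self.buckets[1][0] <= start:
            self.buckets.pop(0)

    def estimate(self):
        start = max(1, self.N - self.n + 1)
        if self.buckets[0][0] == start:
            return self.buckets[0][1]   # the bucket coincides with the window
        return self.buckets[1][1]       # largest stored bucket inside the window
```

Because the sum is smooth with α = β, the value returned by `estimate` should lie between (1 − β) f(W) and f(W) under the assumptions above. A production version would replace the exact per-bucket sums with instances of the insertion-only algorithm.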


Applications 


There are several applications of the theoretical 
methods for the sliding window model, for ex- 
ample, [15, 18,31, 33, 36]. 


Open Problems 


We list several interesting open problems. It 
would be important to understand the difference 
between the sliding window model and other 
streaming models such as the insertion-only 
model, the turnstile, and decay models. This is 
perhaps one of the most important unresolved 
open problems; see, e.g., Sohler [34]. In 
particular, it would be nice to understand 
the exact space complexity of the frequency 
moments that are well understood in the other 
streaming models [12, 27, 28]. Also, it would 
be interesting to extend the coreset methods 
[14] to sliding windows, obtain polylogarithmic 
solutions for clustering, and improve the first 
clustering algorithm in [5]. Also, it would 
be nice to further develop graph methods 
[17]. Improving the approximation ratio of the 
maximum matching and obtaining the $O(n^{1+1/t})$
space bound for $(2t - 1)$-spanners are important
open problems.
Problem Definition


Given a smooth surface $S \subset \mathbb{R}^3$, we are required
to compute a set of points $P \subset S$ and connect
them with edges and triangles so that the resulting
triangulation T is geometrically close and
topologically equivalent to S.

The output triangulation T is a simplicial 2- 
complex whose vertices are the points in P. Its 
underlying space, which is the pointwise union 
of the simplices (vertices, edges, triangles), is 
denoted by |T|. Geometric proximity is often
characterized by the Hausdorff distance between S
and the underlying space |T| of T. It is also
desired that the triangle normals in T closely 
approximate the surface normals at its vertices. 
Topological equivalence is characterized by the 
existence of a homeomorphism between S and 
|T |. In some cases, the topological guarantee can 
be given in terms of isotopy which is stronger 
than homeomorphism. It is important to notice 
that, unlike polyhedral surfaces, a smooth surface 
cannot be represented exactly and hence needs to 
be approximated with a finite triangulation. This 
approximation requires that the mesh generation 
algorithms guarantee topological fidelity in addi- 
tion to the geometric proximity. 
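
As a rough illustration of the Hausdorff distance used above, the following sketch computes it for two finite point sets. This is only an approximation of the definition in the text, which concerns the continuous sets S and |T|; the function names and the sample coordinates are hypothetical.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical data: points sampled from a surface and from a mesh.
surface_samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
mesh_samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
d = hausdorff(surface_samples, mesh_samples)
```

In practice one would sample both sets densely; the quality of the estimate depends on the sampling density.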

In volume mesh generation, the space bounded 
by a smooth surface S is required to be tes- 
sellated with tetrahedra which form a simplicial 
3-complex T. Similar to the surface case, it is 
required that the underlying space |T| is geo- 
metrically close and topologically equivalent to 
the space bounded by S. It turns out that if the 
underlying space of the boundary 2-complex of 
T is geometrically close and has an isotopy to S, 
then so is |T|.

In both surface and volume meshes, it is 
desirable that the triangles and tetrahedra have 
good aspect ratio. This is often achieved by 
bounding the circumradius to shortest edge 
length ratios for triangles. Unfortunately, for 
tetrahedra, a bounded radius-edge ratio does not 
necessarily imply a bounded aspect ratio though 
most poor quality tetrahedra except slivers [4] 
are eliminated by bounded radius-edge ratio. 
Figure 1 shows an example of a surface and a
volume mesh. 
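
The radius-edge ratio of a triangle mentioned above can be computed directly from its vertices. The sketch below is a minimal illustration, not taken from the cited works; it uses Heron's formula for the area and the identity R = abc / (4 · area) for the circumradius. The function name is hypothetical.

```python
import math

def radius_edge_ratio(p, q, r):
    """Circumradius divided by shortest edge length of triangle pqr."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    s = (a + b + c) / 2.0
    # Heron's formula; clamp at 0 to absorb rounding for near-degenerate input.
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    if area == 0.0:
        return math.inf  # degenerate triangle: treat as arbitrarily bad
    circumradius = a * b * c / (4.0 * area)
    return circumradius / min(a, b, c)

equilateral = radius_edge_ratio((0, 0, 0), (1, 0, 0), (0.5, math.sqrt(3) / 2, 0))
skinny = radius_edge_ratio((0, 0, 0), (1, 0, 0), (0.5, 0.01, 0))
```

An equilateral triangle attains the smallest possible ratio, 1/sqrt(3), while a flat triangle with a large angle yields a large ratio.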


Key Results 


Theoretically sound algorithms for surface mesh- 
ing use the technique of Delaunay refinement 
originally proposed by Chew [8]. For a point 
set $P \subset \mathbb{R}^3$, let Vor P and Del P denote the
Voronoi diagram and Delaunay triangulation of 
P, respectively. A typical Delaunay refinement 
algorithm iteratively samples the space to be 
meshed with a locally furthest point strategy that 
inserts points where a Voronoi face of appropriate 
dimension intersects the space. The decision of 
which points to be inserted is guided by certain
desirable properties of the output such as
topological equivalence, simplex radius-edge ratios,
geometric proximity, and so on.

Smooth Surface and Volume Meshing, Fig. 1 A knotted torus, its surface mesh, and its volume mesh

In both surface and volume meshing, the fea- 
tures of the surface S play an important role 
because regions of small features need to be sam- 
pled relatively densely to capture the geometry 
and topology of S. The definition of local feature 
size and $\varepsilon$-sample given by Amenta, Bern, and
Eppstein [2] captures this idea. 

Let S be a smooth, closed surface, that is, S is 
compact, $C^2$-smooth, and has no boundary. The
medial axis M(S) of S is defined as the closure
of the set of points $x \in \mathbb{R}^3$ so that the distance
d(x, S) is realized by two or more points in S.
The local feature size at a point $x \in S$ is defined as


$f(x) = d(x, M(S))$.


A set of points $P \subset S$ is called an $\varepsilon$-sample of
S if every point $x \in S$ has a sample point in P
within distance $\varepsilon f(x)$.
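
To make the sampling condition concrete, here is a small sketch for the special case of the unit sphere, where the medial axis is the center and hence f(x) = 1 everywhere. It tests the condition only at finitely many probe points, so it can refute but never prove that P is a sample of the required density. The function name and the choice of probes are assumptions made for illustration.

```python
import math

def is_eps_sample_at_probes(P, probes, eps, lfs=lambda x: 1.0):
    """Check that every probe x has a sample point within eps * lfs(x).

    The default lfs is the local feature size of the unit sphere (f = 1).
    """
    return all(min(math.dist(x, p) for p in P) <= eps * lfs(x) for x in probes)

# Sample: the six vertices of a regular octahedron on the unit sphere.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Probes: the eight points of the sphere farthest from the sample.
c = 1 / math.sqrt(3)
probes = [(sx * c, sy * c, sz * c)
          for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]

too_sparse = is_eps_sample_at_probes(octahedron, probes, 0.5)
dense_enough = is_eps_sample_at_probes(octahedron, probes, 0.95)
```

Each probe lies at distance sqrt(2 - 2/sqrt(3)), roughly 0.92, from its nearest sample point, so the octahedron fails the test for 0.5 and passes it for 0.95.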

It turns out that if P is an $\varepsilon$-sample of S for a
sufficiently small value of $\varepsilon$, a subcomplex of the
Delaunay triangulation of this sample captures 
the topology of S. We define this subcomplex in 
generality and then specialize it to S. 

Let $V_\xi$ denote the dual Voronoi face of a Delaunay
simplex $\xi$ in Del P. The restricted Voronoi
face of $V_\xi$ with respect to $X \subset \mathbb{R}^3$ is the intersection
$V_\xi|_X = V_\xi \cap X$. The restricted Voronoi
diagram and restricted Delaunay triangulation of
P with respect to X are


$\mathrm{Vor}\,P|_X = \{ V_\xi|_X \mid V_\xi|_X \neq \emptyset \}$ and
$\mathrm{Del}\,P|_X = \{ \xi \mid V_\xi|_X \neq \emptyset \}$, respectively.


In words, Del P|x consists of those Delaunay 
simplices in Del P whose dual Voronoi face in- 
tersects X. We call these simplices restricted. 
Now consider a sample P on the surface S. 
The restricted Delaunay triangulation of P with 
respect to S is $\mathrm{Del}\,P|_S$. It is known that if P is
an $\varepsilon$-sample of S for $\varepsilon < 0.09$, then $\mathrm{Del}\,P|_S$ has
its underlying space homeomorphic to S [1, 9].
To use this result, one must compute an $\varepsilon$-sample
of S. A computation of local feature size
or its approximation is necessary to determine if
a sample is an $\varepsilon$-sample for a predetermined $\varepsilon$.
Even if one is allowed to assume the availability
of the local feature size at any given point, it is
not immediately obvious how to place points on
S so that they become an $\varepsilon$-sample for a given $\varepsilon > 0$.


Surface Meshing 

The following theorem about the fidelity of the 
restricted Delaunay triangulation of a dense sam- 
ple on a smooth closed surface is the basis of 
provable surface meshing algorithms. It has been 
proved in various versions in [1,5,7,9]. 


Theorem 1 Let P be an $\varepsilon$-sample of a smooth,
compact, boundary-less surface $S \subset \mathbb{R}^3$. The restricted
Delaunay complex $T = \mathrm{Del}\,P|_S$ satisfies
the following properties for $\varepsilon < 0.09$:


1. The underlying space |T| is homeomorphic to
S (actually, there is an ambient isotopy taking
|T| to S).

2. Every point p in |T| has a point $x \in S$ so that
$d(p,x) \le O(\varepsilon) f(x)$. Similarly, every point x
in S has a point p in |T| so that $d(p,x) \le O(\varepsilon) f(x)$.

3. Each triangle $t \in T$ has a normal making an
angle $O(\varepsilon)$ with the normal to the surface S
at any of its vertices.


Cheng, Dey, Edelsbrunner, and Sullivan [5] 
applied Chew’s furthest point placement strat- 
egy [8] to maintain a dynamic surface mesh 
of a special type of surface called skin surface 
for which they computed the local feature size 
explicitly. The above theorem then allowed them 
to argue the geometric and topological fidelity 
of the output. Boissonnat and Oudot [3] used
a similar point placement strategy assuming that
the local feature sizes are available, but they
suggested how to initialize the meshing procedure
for general surfaces. For a restricted triangle
$t \in \mathrm{Del}\,P|_S$, the dual Voronoi edge intersects S
possibly at multiple points. Each ball centered at
such an intersection point and circumscribing the
vertices of t is called a surface Delaunay ball of
t. Boissonnat and Oudot observed that if every
surface Delaunay ball of each restricted triangle
has small radius, say at most 0.05 times the
local feature size at the center, then P is a 0.09-sample
of S. It follows that $\mathrm{Del}\,P|_S$ at this point
satisfies the properties stated in Theorem 1. The
deduction of this conclusion also requires that
every component of S has at least one Voronoi
edge intersecting it, which Boissonnat and Oudot
ensure with persistent triangles.

When local feature sizes are not known, 
we cannot use the method of Boissonnat and 
Oudot [3]. Instead, we fall back upon a different 
strategy to drive the Delaunay refinement. A 
result of Edelsbrunner and Shah [10] says 
that if Voronoi faces intersect S in a closed 
topological ball of appropriate dimension, then 
the underlying space of the restricted Delaunay 
triangulation becomes homeomorphic to S. In 
fact, this is the basis of the proof of Theorem 1. 
Therefore, a Delaunay refinement driven by 
the violation of the topological ball conditions 
provides a viable strategy for meshing with 
topological guarantees. This strategy is followed 
by Cheng, Dey, Ramos, and Ray [6]. 

The algorithm of Cheng et al. avoids com- 
puting local feature sizes or their approximation; 
however, it needs to compute critical points of 
certain functions on the surface, which may not
be easily computable. In a recent book on Delaunay
mesh generation [7], Cheng, Dey, and
Shewchuk have suggested a more practical strategy
that leverages both the algorithm of
Boissonnat and Oudot [3] and that of Cheng et al. [6].
It operates with an input parameter $\lambda > 0$.
As long as the surface Delaunay balls of the
restricted triangles are not all smaller than a ball
of radius $\lambda$, the algorithm refines. It also refines
if the restricted triangles around each vertex do 
not form a topological disk. The algorithm can 
be shown to terminate and has the following 
guarantees. 


Theorem 2 ([7]) There is a Delaunay refinement
algorithm that runs with a parameter $\lambda > 0$
on an input smooth, compact, boundary-less
surface S with the following guarantees:


1. The output mesh is a Delaunay subcomplex
and is a 2-manifold for all values of $\lambda$.

2. If $\lambda$ is sufficiently small, then the output mesh
has similar guarantees with respect to the
input surface S as in Theorem 1 (replace $\varepsilon$
with $\lambda$).


It should be noted that in any of the above 
algorithms, one may introduce the condition that 
the output triangles have radius-edge ratio of at
most 1 without losing any of the geometric
or topological guarantees. Even a graded mesh
can be guaranteed by supplying an appropriate 
grading function as input. For details see [7]. 


Volume Meshing 

Let O denote the volume enclosed by a smooth 
surface S. Consider the surface mesh of S pro- 
duced by one of the algorithms mentioned above. 
The volume enclosed by this surface mesh is 
already triangulated with Delaunay tetrahedra. 
We can further refine them for quality using the 
radius-edge ratio condition. The circumcenters of 
skinny tetrahedra can be added as long as they do 
not disturb the surface triangulation. One easy approach
is to skip adding those circumcenters that
encroach the surface Delaunay balls, meaning that
they lie inside these balls. This ensures that all
surface triangles remain intact. The trade-off of
this easy fix is that the tetrahedra near the bound- 
ary may not have bounded radius-edge ratios. To 
ensure the quality for all tetrahedra, additional 
effort is required to maintain the surface. Oudot, 
Rineau, and Yvinec [11] proposed an algorithm 
for guaranteed quality volume meshing. 

The algorithm first runs the algorithm of [3] 
to obtain a surface triangulation with a vertex set 
P on the surface. It uses two parameters $\varepsilon$ and
$\rho$, where $\varepsilon$ controls the level of refinement and
$\rho$ controls the aspect ratios of the tetrahedra and
triangles. It ensures that all restricted triangles on
the surface have vertices from S. It refines surface
triangles as in the surface meshing algorithm. Then,
it refines the tetrahedra. Refinement of surface tri- 
angles is given priority over the tetrahedra. Oudot 
et al. [11] prove that their algorithm terminates 
and has the following geometric and topological 
guarantees. 


Theorem 3 ([11]) Given a volume O bounded
by a smooth surface S, for $\varepsilon < 0.05$ and $\rho > 1$,
there is an algorithm that produces $T = \mathrm{Del}\,P|_O$
where each tetrahedron in T has radius-edge
ratio at most $\rho$ and |T| is homeomorphic (isotopic)
to O and the boundary of T is $\mathrm{Del}\,P|_S$.
Furthermore, the isotopy moves a point $x \in S$ by
at most $O(\varepsilon^2) f(x)$ distance.


An improved version of the algorithm and its 
analysis is presented in the book [7].


URLs to Code and Data Sets 


CGAL (http://cgal.org), a library of geometric
algorithms, contains software for surface and
volume mesh generation. The DelPSC software
that implements the surface and volume meshing
algorithms as described in [7] is also available
from http://web.cse.ohio-state.edu/~tamaldey/delpsc.html.


Smoothed Analysis


Problem Definition 


Smoothed analysis was originally introduced
by Spielman and Teng [22] in 2001 to explain
why the simplex method is usually fast in practice
despite its exponential worst-case running
time. Since then it has been applied to a wide
range of algorithms and optimization problems.
In smoothed analysis, inputs are generated in 
two steps: first, an adversary chooses an arbi- 
trary instance, and then this instance is slightly 
perturbed at random. The smoothed performance 
of an algorithm is defined to be the worst ex- 
pected performance the adversary can achieve. 
This model can be viewed as a less pessimistic 
worst-case analysis, in which the randomness 
rules out pathological worst-case instances that 
are rarely observed in practice but dominate the 
worst-case analysis. If the smoothed running time 
of an algorithm is low (i.e., the algorithm is ef- 
ficient in expectation on any perturbed instance) 
and inputs are subject to a small amount of 
random noise, then it is unlikely to encounter an 
instance on which the algorithm performs poorly. 
In practice, random noise can stem, for example, 
from measurement errors, numerical imprecision, 
or rounding errors. It can also model arbitrary 
influences, which we cannot quantify exactly, but 
for which there is also no reason to believe that 
they are adversarial. Since its invention, smoothed
analysis has been applied in a variety of different
contexts, e.g., linear programming [8, 19, 21, 23],
multi-objective optimization [5, 10, 17, 18], online
and approximation algorithms [4, 7, 20], searching
and sorting [3, 12, 15, 16], game theory [9, 11], and
local search [1, 2, 13, 14].


Key Results 


Simplex Method 
Spielman and Teng [22] considered linear pro- 
grams of the form 


maximize $c^T x$


subject to $(A + G)x \le (b + h)$,


where $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$ are chosen
arbitrarily by an adversary and the entries of the
matrix $G \in \mathbb{R}^{n \times d}$ and the vector $h \in \mathbb{R}^n$ are
independent Gaussian random variables that represent
the perturbation. These Gaussian random
variables have mean 0 and standard deviation
$\sigma \cdot (\max_i \|(b_i, a_i)\|)$, where the vector $(b_i, a_i) \in \mathbb{R}^{d+1}$
consists of the i-th component of b and the
i-th row of A and $\|\cdot\|$ denotes the Euclidean norm.
Without loss of generality, we can scale the linear
program specified by the adversary and assume
that $\max_i \|(b_i, a_i)\| = 1$. Then the perturbation
consists of adding an independent Gaussian
random variable with standard deviation $\sigma$ to
each entry of A and b. The smaller $\sigma$ is chosen,
the more concentrated are the random variables,
and hence, the better worst-case instances can be
approximated by the adversary. Intuitively, $\sigma$ can
be seen as a measure specifying how close the
analysis is to a worst-case analysis.
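
The two-step input model just described can be sketched as follows. This is a minimal illustration under the stated normalization (largest row norm scaled to 1), not code from [22]; the function name and its signature are hypothetical, and the objective vector c is left untouched, as in the text.

```python
import math
import random

def perturb_lp(A, b, sigma, rng):
    """Scale (A, b) so that max_i ||(b_i, a_i)|| = 1, then add independent
    Gaussian noise with mean 0 and standard deviation sigma to every entry."""
    scale = max(math.sqrt(bi * bi + sum(x * x for x in row))
                for row, bi in zip(A, b))
    if scale == 0.0:
        raise ValueError("all constraints are zero; nothing to scale")
    A_pert = [[x / scale + rng.gauss(0.0, sigma) for x in row] for row in A]
    b_pert = [bi / scale + rng.gauss(0.0, sigma) for bi in b]
    return A_pert, b_pert

# Adversarial instance (hypothetical numbers), then a slight perturbation.
A = [[3.0, 0.0], [0.0, 1.0]]
b = [4.0, 0.0]
A_noisy, b_noisy = perturb_lp(A, b, sigma=0.01, rng=random.Random(0))
```

With sigma = 0 the function returns the scaled adversarial instance itself, which corresponds to plain worst-case analysis.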

Spielman and Teng analyzed the smoothed 
running time of the simplex algorithm using the 
shadow vertex pivot rule. This pivot rule has a 
simple and intuitive geometric description which 
makes probabilistic analyses feasible. Let $x_0$ denote
the given initial vertex of the polytope P
of feasible solutions. Since $x_0$ is a vertex of
the polytope, there exists an objective function
$u^T x$ which is maximized by $x_0$ subject to the
constraint $x \in P$. In the first step, the shadow
vertex pivot rule computes an objective function
$u^T x$ with this property. If $x_0$ is not an optimal
solution of the linear program, then the vectors
c and u are linearly independent and span a
plane. The shadow vertex method projects the
polytope P onto this plane. The shadow, that is,
the projection of P onto this plane, is a possibly
open polygon. One can show that both $x_0$ and the
optimal solution $x^*$ are projected onto vertices of
the polygon and that each path between the projections
of $x_0$ and $x^*$ in the polygon corresponds
to a path between $x_0$ and $x^*$ in the polytope.
Hence, one only needs to follow the edges of the
polygon starting from the projection of $x_0$ to (the
projection of) $x^*$.

The number of steps performed by the simplex 
method with shadow vertex pivot rule is upper 
bounded by the number of vertices of the two- 
dimensional projection of the polytope. Hence, 
bounding the expected number of vertices on the 
polygon is the crucial step for bounding the expected
running time of the simplex method with
shadow vertex pivot rule. Spielman and Teng first
consider the case that the polytope P is projected
onto a fixed plane specified by two fixed vectors
c and u. They show that the expected number of
vertices of the polygon is polynomially bounded
in d, n, and $1/\sigma$. Though this result is the main
ingredient of the analysis, alone it does not yield a
polynomial bound on the smoothed running time
of the simplex method. We have, for example,
not yet described how the initial solution $x_0$ is
found. It is also problematic that the vector u is
not independent of the constraints because it is
determined by $x_0$, which in turn is determined by
a subset of the constraints. Spielman and Teng 
showed in a very involved analysis the following 
theorem. 


Theorem 1 The smoothed running time of the 
shadow vertex simplex method is bounded polynomially
in d, n, and $1/\sigma$.


Later, this analysis was substantially improved 
and simplified by Vershynin [23], who proved 
that the smoothed running time is even polynomially
bounded in d, log n, and $1/\sigma$.


Binary Optimization Problems 

Beier and Vöcking [6] studied the question of which
linear binary optimization problems have polynomial
smoothed complexity. Intuitively these are
the problems that can be solved efficiently on
perturbed inputs. An instance I of such an optimization
problem $\Pi$ consists of a set of feasible
solutions $S \subseteq \{0,1\}^n$ and a linear objective
function $f : \{0,1\}^n \to \mathbb{R}$ of the form maximize
(or minimize) $f(x) = c^T x$ for some $c \in \mathbb{R}^n$.
Many well-known optimization problems can be 
formulated this way, e.g., the problem of finding a 
Minimum Spanning Tree, the Knapsack Problem, 
and the Traveling Salesman Problem. 

It is assumed that an adversary is allowed 
to choose the coefficients of the objective 
function from the interval [—1, 1]. In the second 
step, these coefficients are perturbed by adding 
independent Gaussian random variables with 
mean 0 and standard deviation $\sigma$ to them.
Naturally one might define that a problem $\Pi$ has
polynomial smoothed complexity if there exists
an algorithm A for $\Pi$ whose expected running
time $E[T_A(I)]$ is bounded polynomially in the
input size |I| and $1/\sigma$. This definition, however,
is not sufficiently robust as it depends on the
machine model. An algorithm with expected
polynomial running time on one machine model
might have expected exponential running time on
another machine model even if the former can
be simulated by the latter in polynomial time. In
contrast, the definition from [6] yields a notion
of polynomial smoothed complexity that does
not vary among classes of machines admitting
polynomial time simulations among each other. It
states that a problem $\Pi$ has polynomial smoothed
complexity if there exists an algorithm A for $\Pi$
and some $\alpha > 0$ such that $E[T_A(I)^\alpha]$ is bounded
polynomially in the input size |I| and $1/\sigma$.
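
The lack of robustness of the naive definition can be seen in a small worked example. The running-time distribution below is an assumption chosen for illustration and is not taken from [6]: on inputs of size n, the time T equals 2^n with probability 2^{-n} and n otherwise, and a second machine model simulates the first with quadratic slowdown.

```latex
\begin{align*}
E[T]   &= 2^{-n}\cdot 2^{n} + (1-2^{-n})\cdot n \;\le\; n+1
        && \text{(polynomial on the first machine)}\\
E[T^2] &\ge 2^{-n}\cdot \bigl(2^{n}\bigr)^{2} \;=\; 2^{n}
        && \text{(exponential after quadratic slowdown)}\\
E\bigl[(T^2)^{1/2}\bigr] &= E[T] \;\le\; n+1
        && \text{(polynomial again with } \alpha = \tfrac12\text{)}
\end{align*}
```

Choosing the exponent $\alpha$ per machine model absorbs the polynomial slowdown, which is why the second definition is invariant under polynomial-time simulations.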

Beier and Vöcking proved the following
theorem that characterizes the class of linear 
binary optimization problems with polynomial 
smoothed complexity. 


Theorem 2 A linear binary optimization problem
$\Pi$ has polynomial smoothed complexity if
and only if there exists a randomized algorithm
for solving $\Pi$ whose expected worst-case running
time is pseudo-polynomial with respect to the 
coefficients in the objective function. 


For example, the knapsack problem, which 
can be solved by dynamic programming 
in pseudo-polynomial time, has polynomial 
smoothed complexity even if the weights are 
fixed and only the profits are randomly perturbed. 
Moreover, the traveling salesman problem does 
not have polynomial smoothed complexity when 
only the distances are randomly perturbed, unless 
P=NP, since a simple reduction from Hamilto- 
nian cycle shows that it is strongly NP-hard. 
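
For the knapsack example, the pseudo-polynomial algorithm that Theorem 2 refers to can be sketched as a dynamic program indexed by profit, so that its running time, O(n · sum of profits), depends on the magnitude of the objective coefficients. The sketch assumes nonnegative integer profits and nonnegative weights; the function name is hypothetical, and the code shows only the pseudo-polynomial side of the theorem, not the smoothed analysis itself.

```python
def knapsack_max_profit(profits, weights, capacity):
    """Maximum total profit of a subset with total weight <= capacity.

    Assumes nonnegative integer profits and nonnegative weights.
    """
    total = sum(profits)
    inf = float("inf")
    # min_weight[p] = least total weight of a subset with profit exactly p.
    min_weight = [0] + [inf] * total
    for p_i, w_i in zip(profits, weights):
        # Go downward so that each item is used at most once.
        for p in range(total, p_i - 1, -1):
            candidate = min_weight[p - p_i] + w_i
            if candidate < min_weight[p]:
                min_weight[p] = candidate
    return max(p for p in range(total + 1) if min_weight[p] <= capacity)

best = knapsack_max_profit([60, 100, 120], [10, 20, 30], 50)
```

Here the weights play the role of the fixed constraint and the profits are the objective coefficients, matching the setting in which only the profits are perturbed.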


Open Problems 


An interesting open question is whether or not 
other pivot rules for the simplex method also have 
polynomial smoothed running time. It would also 
be interesting to see whether the insights gained 
from smoothed analysis can be used to improve 
existing algorithms. 




Cross-References 


Knapsack 


Recommended Reading 


1. Arthur D, Vassilvitskii S (2009) Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J Comput 39(2):766-782
2. Arthur D, Manthey B, Röglin H (2011) Smoothed analysis of the k-means method. J ACM 58(5)
3. Banderier C, Beier R, Mehlhorn K (2003) Smoothed analysis of three combinatorial problems. In: Proceedings of the 28th international symposium on mathematical foundations of computer science (MFCS), Bratislava. Lecture notes in computer science, vol 2747. Springer, pp 198-207
4. Becchetti L, Leonardi S, Marchetti-Spaccamela A, Schäfer G, Vredeveld T (2006) Average case and smoothed competitive analysis of the multilevel feedback algorithm. Math Oper Res 31(1):85-108
5. Beier R, Vöcking B (2004) Random knapsack in expected polynomial time. J Comput Syst Sci 69(3):306-329
6. Beier R, Vöcking B (2006) Typical properties of winners and losers in discrete optimization. SIAM J Comput 35(4):855-881
7. Bläser M, Manthey B, Rao BVR (2011) Smoothed analysis of partitioning algorithms for Euclidean functionals. In: Proceedings of the 12th workshop on algorithms and data structures (WADS), New York. Lecture notes in computer science. Springer, pp 110-121
8. Blum AL, Dunagan JD (2002) Smoothed analysis of the perceptron algorithm for linear programming. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco. SIAM, pp 905-914
9. Boros E, Elbassioni K, Fouz M, Gurvich V, Makino K, Manthey B (2011) Stochastic mean payoff games: smoothed analysis and approximation schemes. In: Proceedings of the 38th international colloquium on automata, languages and programming (ICALP), Zurich, Part I. Lecture notes in computer science, vol 6755. Springer, pp 147-158
10. Brunsch T, Röglin H (2012) Improved smoothed analysis of multiobjective optimization. In: Proceedings of the 44th annual ACM symposium on theory of computing (STOC), New York, pp 407-426
11. Chen X, Deng X, Teng SH (2009) Settling the complexity of computing two-player Nash equilibria. J ACM 56(3)
12. Damerow V, Manthey B, auf der Heide FM, Räcke H, Scheideler C, Sohler C, Tantau T (2012) Smoothed analysis of left-to-right maxima with applications. ACM Trans Algorithms 8(3): Article no 30
13. Englert M, Röglin H, Vöcking B (2007) Worst case and probabilistic analysis of the 2-Opt algorithm for the TSP. In: Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms (SODA), New Orleans. SIAM, pp 1295-1304
14. Etscheid M, Röglin H (2014) Smoothed analysis of local search for the maximum-cut problem. In: Proceedings of the 25th annual ACM-SIAM symposium on discrete algorithms (SODA), Portland, pp 882-889
15. Fouz M, Kufleitner M, Manthey B, Zeini Jahromi N (2012) On smoothed analysis of quicksort and Hoare's find. Algorithmica 62(3-4):879-905
16. Manthey B, Reischuk R (2007) Smoothed analysis of binary search trees. Theor Comput Sci 378(3):292-315
17. Moitra A, O'Donnell R (2012) Pareto optimal solutions for smoothed analysts. SIAM J Comput 41(5):1266-1284
18. Röglin H, Teng SH (2009) Smoothed analysis of multiobjective optimization. In: Proceedings of the 50th annual IEEE symposium on foundations of computer science (FOCS), Atlanta. IEEE, pp 681-690
19. Sankar A, Spielman DA, Teng SH (2006) Smoothed analysis of the condition numbers and growth factors of matrices. SIAM J Matrix Anal Appl 28(2):446-476
20. Schäfer G, Sivadasan N (2005) Topology matters: smoothed competitiveness of metrical task systems. Theor Comput Sci 241(1-3):216-246
21. Spielman DA, Teng SH (2003) Smoothed analysis of termination of linear programming algorithms. Math Program 97(1-2):375-404
22. Spielman DA, Teng SH (2004) Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J ACM 51(3):385-463
23. Vershynin R (2009) Beyond Hirsch conjecture: walks on random polytopes and smoothed complexity of the simplex method. SIAM J Comput 39(2):646-678


Snapshots in Shared Memory 


Eric Ruppert 

Department of Computer Science and 
Engineering, York University, Toronto, 
ON, Canada 


Keywords 


Atomic scan 


Years and Authors of Summarized 
Original Work 


1993; Afek, Attiya, Dolev, Gafni, Merritt, Shavit 




Problem Definition 


Implementing a snapshot object is an abstraction 
of the problem of obtaining a consistent view of 
several shared variables while other processes are 
concurrently updating those variables. 

In an asynchronous shared-memory dis- 
tributed system, a collection of n processes 
communicate by accessing shared data structures, 
called objects. The system provides basic types 
of shared objects; other needed types must be 
built from them. One approach uses locks to 
guarantee exclusive access to the basic objects, 
but this approach is not fault-tolerant, risks 
deadlock or livelock, and causes delays when 
a process holding a lock runs slowly. Lock-free 
algorithms avoid these problems but introduce 
new challenges. For example, if a process reads 
two shared objects, the values it reads may not 
be consistent if the objects were updated between 
the two reads. 

A snapshot object stores a vector of m values, 
each from some domain D. It provides two operations:
scan and update(i, v), where $1 \le i \le m$ and
$v \in D$. If the operations are invoked sequentially,
an update(i, v) operation changes the value of the 
ith component of the stored vector to v, and a scan 
operation returns the stored vector. 

Correctness when snapshot operations by 
different processes overlap in time is described 
by the linearizability condition, which says op- 
erations should appear to occur instantaneously. 
More formally, for every execution, one can 
choose an instant of time for each operation 
(called its linearization point) between the 
invocation and the completion of the operation. 
(An incomplete operation may either be assigned 
no linearization point or given a linearization 
point at any time after its invocation.) The 
responses returned by all completed operations 
in the execution must return the same result 
as they would if all operations were executed 
sequentially in the order of their linearization 
points. 

An implementation must also satisfy
a progress property. Wait-freedom requires that 
each process completes each scan or update in 
a finite number of its own steps. The weaker 
non-blocking progress condition says the system
cannot run forever without some operation 
completing. 

This article describes implementations of 
snapshots from more basic types, which are 
also linearizable, without locks. Two types 
of snapshots have been studied. In a single- 
writer snapshot, each component is owned by 
a process, and only that process may update it. 
(Thus, for single-writer snapshots, m = n.) In
a multi-writer snapshot, any process may update 
any component. There also exist algorithms 
for single-scanner snapshots, where only one 
process may scan at a time [10, 13, 14, 16]. 
Snapshots were introduced by Afek et al. [1], 
Anderson [2] and Aspnes and Herlihy [4]. 

Space complexity is measured by the number 
of basic objects used and their size (in bits). 
Time complexity is measured by the maximum 
number of steps a process must do to finish 
a scan or update, where a step is an access to 
a basic shared object. (Local computation and 
local memory accesses are usually not counted.) 
Complexity bounds will be stated in terms of 
n, m, d = log |D|, and k, the number of operations
invoked in an execution. Ordinarily, there
is no bound on k. 

Most of the algorithms below use read-write 
registers, the most elementary shared object type. 
A single-writer register may only be written 
by one process. A multi-writer register may be 
written by any process. Some algorithms using 
stronger types of basic objects are discussed in 
section “Wait-Free Implementations from Small, 
Stronger Objects”. 


Key Results 


A Simple Non-blocking Implementation from Small Registers

Suppose each component of a single-writer snap- 
shot object is represented by a single-writer reg- 
ister. Process i does an update(i, v) by writing v 
and a sequence number into register i, and incre- 
menting its sequence number. Performing a scan 
operation is more difficult than merely reading 
each of the m registers, since some registers 
might change while these reads are done. To
scan, a process repeatedly reads all the regis- 
ters. A sequence of reads of all the registers is 
called a collect. If two collects return the same 
vector, the scan returns that vector (with the 
sequence numbers stripped away). The sequence 
numbers ensure that, if the same value is read 
in a register twice, the register had that value 
during the entire interval between the two reads. 
The scan can be assigned a linearization point 
between the two identical collects, and updates 
are linearized at the write. This algorithm is non- 
blocking, since a scan continues running only if 
at least one update operation is completed during 
each collect. A similar algorithm, with process 
identifiers appended to the sequence numbers, 
implements a non-blocking multi-writer snapshot 
from m multi-writer registers. 
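
The double-collect idea described above can be sketched as follows. This is a simplified model, not a faithful concurrent implementation: registers are entries of a Python list holding (value, sequence number) pairs, each list access stands for one atomic register operation, and the class and method names are hypothetical. For brevity the updater reads its own register to obtain the last sequence number instead of keeping a local counter.

```python
class DoubleCollectSnapshot:
    """Non-blocking single-writer snapshot; register i belongs to process i."""

    def __init__(self, n, initial=None):
        self.reg = [(initial, 0)] * n

    def update(self, i, v):
        # Only process i writes register i, so its own read cannot be stale.
        _, seq = self.reg[i]
        self.reg[i] = (v, seq + 1)  # one write: value plus sequence number

    def collect(self):
        # Read all registers one after another.
        return [self.reg[j] for j in range(len(self.reg))]

    def scan(self):
        prev = self.collect()
        while True:
            cur = self.collect()
            if cur == prev:
                # Two identical collects: no register changed in between.
                return [value for value, _ in cur]
            prev = cur

snap = DoubleCollectSnapshot(3)
snap.update(1, "a")
snap.update(1, "b")
snap.update(2, "c")
view = snap.scan()
```

The loop in scan can be starved by a steady stream of updates, which is why this version is non-blocking but not wait-free.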


Wait-Free Implementations from Large Registers

Afek et al. [1] described how to modify the 
non-blocking single-writer snapshot algorithm to 
make it wait-free using scans embedded within 
the updates. An update(i, v) first does a scan and 
then writes a triple containing the scan’s result, 
v and a sequence number into register i. While 
a process P is repeatedly performing collects to 
do a scan, either two collects return the same 
vector (which P can return) or P will eventually 
have seen three different triples in the register of 
some other process. In the latter case, the third 
triple that P saw must contain a vector that is 
the result of a scan that started after P’s scan, so 
P’s scan outputs that vector. Updates and scans 
that terminate after seeing two identical collects 
are assigned linearization points as before. If one 
scan obtains its output from an embedded scan, 
the two scans are given the same linearization 
point. This is a wait-free single-writer snapshot 
implementation from n single-writer registers of 
$(n + 1)d + \log k$ bits each. Operations complete
within $O(n^2)$ steps. Afek et al. [1] also describe
how to replace the unbounded sequence numbers
with handshaking bits. This requires n $O(nd)$-bit
registers and n 1-bit registers. Operations still
complete in $O(n^2)$ steps.
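
The embedded-scan technique with unbounded sequence numbers can be sketched in the same simplified style: registers are modeled as list entries holding (value, sequence number, view) triples, each list access stands for one atomic register operation, and all names are hypothetical. The sketch follows the description above and is not the code of [1].

```python
class WaitFreeSnapshot:
    """Single-writer snapshot with scans embedded in updates."""

    def __init__(self, n, initial=None):
        self.n = n
        self.reg = [(initial, 0, [initial] * n) for _ in range(n)]

    def collect(self):
        return [self.reg[j] for j in range(self.n)]

    def scan(self):
        changes = [0] * self.n  # how often each register was seen to change
        prev = self.collect()
        while True:
            cur = self.collect()
            moved = [j for j in range(self.n) if cur[j][1] != prev[j][1]]
            if not moved:
                return [value for value, _, _ in cur]  # identical collects
            for j in moved:
                changes[j] += 1
                if changes[j] == 2:
                    # Third distinct triple of process j: its embedded scan
                    # started after this scan did, so its view can be reused.
                    return list(cur[j][2])
            prev = cur

    def update(self, i, v):
        view = self.scan()  # embedded scan
        _, seq, _ = self.reg[i]
        self.reg[i] = (v, seq + 1, view)

snap = WaitFreeSnapshot(3)
snap.update(0, "x")
snap.update(2, "z")
view = snap.scan()
```

Every failed comparison increases some counter, so a scan returns after O(n) collects, that is, O(n^2) register reads, in line with the bound stated above.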

The same idea can be used to build multi-writer
snapshots from multi-writer registers.
Using unbounded sequence numbers yields 
a wait-free algorithm that uses m registers 
storing $O(nd + \log k)$ bits each, in which
each operation completes within O(mn) steps.
(This algorithm is given explicitly in [9].) No
algorithm can use fewer than m registers if
n > m [9]. If handshaking bits are used instead,
the multi-writer snapshot algorithm uses $n^2$
1-bit registers, m $(d + \log n)$-bit registers and
n $(md)$-bit registers, and each operation uses
$O(nm + n^2)$ steps [1].

Guerraoui and Ruppert [12] gave a similar 
wait-free multi-writer snapshot implementation 
that is anonymous, i.e., it does not use pro- 
cess identifiers and all processes are programmed 
identically. 

Anderson [3] gave an implementation of 
a multi-writer snapshot from a single-writer
snapshot. Each process stores its latest update 
to each component of the multi-writer snapshot 
in the single-writer snapshot, with associated 
timestamp information computed by scanning the 
single-writer snapshot. A scan is done using just 
one scan of the single-writer snapshot. An update 
requires scanning and updating the single-writer 
snapshot twice. The implementation involves 
some blow-up in the size of the components, 
i.e., to implement a multi-writer snapshot with 
domain D requires a single-writer snapshot 
with a much larger domain D’. If the goal 
is to implement multi-writer snapshots from 
single-writer registers (rather than multi-writer 
registers), Anderson’s construction gives a more 
efficient solution than that of Afek et al. 

Attiya, Herlihy and Rachman [7] defined the 
lattice agreement object, which is very closely 
linked to the problem of implementing a single- 
writer snapshot when there is a known upper 
bound on k. Then, they showed how to construct 
a single-writer snapshot (with no bound on k) 
from an infinite sequence of lattice agreement 
objects. Each snapshot operation accesses the 
lattice agreement object twice and does O(n) 
additional steps. Their implementations of lattice
agreement are discussed in the section "Wait-Free
Implementations from Small, Stronger Objects".

Attiya and Rachman [8] used a similar approach
to give a single-writer snapshot implementation
from large single-writer registers using
O(n log n) steps per operation. Each update has
an associated sequence number. A scanner traverses
a binary tree of height log k from root
to leaf (here, a bound on k is required). Each
node has an array of n single-writer registers.
A process arriving at a node writes its current 
vector into a single-writer register associated with 
the node and then gets a new vector by combining 
information read from all n registers. It proceeds 
to the left or right child depending on the sum 
of the sequence numbers in this vector. Thus, 
all scanners can be linearized in the order of 
the leaves they reach. Updates are performed 
by doing a similar traversal of the tree. The 
bound on k can be removed as in [7]. Attiya and 
Rachman also give a more direct implementation 
that achieves this by recycling the snapshot object 
that assumes a bound on k. Their algorithm has 
also been adapted to solve condition-based con- 
sensus [15]. 

Attiya, Fouren and Gafni [6] described how to 
adapt the algorithm of Attiya and Rachman [8] 
so that the number of steps required to perform 
an operation depends on the number of processes 
that actually access the object, rather than the 
number of processes in the system. 

Attiya and Fouren [5] solve lattice agreement 
in O(n) steps. (Here, instead of using the ter- 
minology of lattice agreement, the algorithm is 
described in terms of implementing a snapshot 
in which each process does at most one snapshot 
operation.) The algorithm uses, as a data structure,
a two-dimensional array of O(n^2) reflectors.
A reflector is an object that can be used by two 
processes to exchange information. Each reflector 
is built from two large single-writer registers. 
Each process chooses a path through the array 
of reflectors, so that at most two processes visit 
each reflector. Each reflector in column i is used 
by process i to exchange information with one 
process j < i. If process i reaches the reflector
first, process j learns about i’s update (if any).
If process j reaches it first, then process i learns
all the information that j has already gathered.
(If both reach it at about the same time, both
processes learn the information described above.)
As the processes move from column i - 1 to
column i, a process that enters column i at some
row r will have gathered all the information that
has been gathered by any process that enters
column i below row r (and possibly more). This
invariant is maintained by ensuring that if process
i passes information to any process j < i in row
r of column i, it also passes that information to
all processes that entered column i above row r.
Furthermore, process i exits column i at a row that
matches the amount of information it learns while 
traveling through the column. When processes 
have reached the rightmost column of the array, 
the ones in higher rows know strictly more than 
the ones in lower rows. Thus, the linearization 
order of their scans is the order in which they exit 
the rightmost column, from bottom to top. The 
techniques of Attiya, Herlihy and Rachman [7, 
8], mentioned above, can be used to remove the 
restriction that each process performs at most one 
operation. The number of steps per operation is 
still O(n). 


Wait-Free Implementations from Small, 
Stronger Objects 

All of the wait-free implementations described
above use registers that can store Ω(m) bits
each, and are therefore not practical when m is
large. Some implementations from smaller objects
equipped with stronger synchronization operations,
rather than just reads and writes, are
described in this section. An object is considered
to be small if it can store O(d + log n + log k)
bits. This means that it can store a constant 
number of component values, process identifiers 
and sequence numbers. 

Attiya, Herlihy and Rachman [7] gave an elegant
divide-and-conquer recursive solution to
the lattice agreement problem. The division of
processes into groups for the recursion can be
done dynamically using test&set objects. This
provides a snapshot algorithm that runs in O(n)
time per operation, and uses O(kn^2 log n) small
single-writer registers and O(kn log^2 n) test&set
objects. (This requires modifying their implementation
to replace those registers that are large,
which are written only once, by many small
registers.) Using randomization, each test&set
object can be replaced by single-writer registers
to give a snapshot implementation from registers
only with O(n) expected steps per operation.

Jayanti [13] gave a multi-writer snapshot implementation
from O(mn^2) small compare &
swap objects where updates take O(1) steps and
scans take O(m) steps. He began with a very
simple single-scanner, single-writer snapshot im- 
plementation from registers that uses a secondary 
array to store a copy of recent updates. A scan 
clears that array, collects the main array, and 
then collects the secondary array to find any 
overlooked updates. Several additional mecha- 
nisms are introduced for the general, multi-writer, 
multi-scanner snapshot. In particular, compare & 
swap operations are used instead of writes to 
coordinate writers updating the same component 
and multiple scanners coordinate with one an- 
other to simulate a single scanner. Jayanti’s algo- 
rithm builds on an earlier paper by Riany, Shavit 
and Touitou [16], which gave an implementation 
that achieved similar complexity, but only for 
a single-writer snapshot. 


Applications 


Applications of snapshots include distributed 
databases, storing checkpoints or backups for 
error recovery, garbage collection, deadlock 
detection, debugging distributed programmes 
and obtaining a consistent view of the values 
reported by several sensors. Snapshots have been 
used as building blocks for distributed solutions 
to randomized consensus and approximate 
agreement. They are also helpful as a primitive 
for building other data structures. For example, 
consider implementing a counter that stores 
an integer and provides increment, decrement 
and read operations. Each process can store 
the number of increments it has performed 
minus the number of its decrements in its own 
component of a single-writer snapshot object, 
and the counter may be read by summing the 
values from a scan. See [10] for references on 
many of the applications mentioned here. 
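
The counter construction described above can be sketched as follows. This is only an illustration under stated assumptions: the class names SingleWriterSnapshot and SnapshotCounter are hypothetical, and the snapshot object is a lock-based stand-in rather than one of the wait-free register-based algorithms surveyed in this entry.

```python
import threading


class SingleWriterSnapshot:
    """Lock-based stand-in for a single-writer snapshot object (assumption:
    a real implementation would be wait-free and built from registers)."""

    def __init__(self, n):
        self._values = [0] * n
        self._lock = threading.Lock()

    def update(self, i, value):
        # Only process i is supposed to call update(i, ...).
        with self._lock:
            self._values[i] = value

    def scan(self):
        # Returns an atomic view of all n components.
        with self._lock:
            return list(self._values)


class SnapshotCounter:
    """Counter in which component i holds (increments - decrements) of process i."""

    def __init__(self, n):
        self._snap = SingleWriterSnapshot(n)
        self._net = [0] * n  # entry i is private to process i

    def increment(self, i):
        self._net[i] += 1
        self._snap.update(i, self._net[i])

    def decrement(self, i):
        self._net[i] -= 1
        self._snap.update(i, self._net[i])

    def read(self):
        # The counter value is the sum of the components seen by one scan.
        return sum(self._snap.scan())
```

Because each process writes only its own component, no two updates conflict, and a read needs just one scan.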


Open Problems


Some complexity lower bounds are known for 
implementations from registers [9], but there re- 
main gaps between the best known algorithms 
and the best lower bounds. In particular, it is not 
known whether there is an efficient wait-free im- 
plementation of snapshots from small registers. 


Experimental Results 


Riany, Shavit and Touitou gave performance evaluation
results for several implementations [16].


Problem Definition


One of the most promising ways to determine 
evolutionary distance between two organisms is 
to compare the order of appearance of identical 
(e.g., orthologous) genes in their genomes. The 
resulting genome rearrangement problem calls 
for finding a shortest sequence of rearrangement 
operations that sorts one genome into the other. 


Sorting by Transpositions and Reversals (Approximate Ratio 1.5) 


In this work [8], Hartman and Sharan provide 
a 1.5-approximation algorithm for the problem 
of sorting by transpositions, transreversals, and 
revrevs, improving on a previous 1.75 ratio for 
this problem. Their algorithm is also faster than
current approaches and requires O(n^(3/2) sqrt(log n))
time for n genes.


Notations and Definition 

A signed permutation π = [π_1, π_2, ..., π_n] on
n(π) ≡ n elements is a permutation in which
each element is labeled by a sign of plus or minus.
A segment of π is a sequence of consecutive
elements π_i, π_{i+1}, ..., π_k, where 1 ≤ i ≤ k ≤
n. A reversal ρ is an operation that reverses the
order of the elements in a segment and also flips
their signs. Two segments π_i, π_{i+1}, ..., π_k and
π_j, π_{j+1}, ..., π_l are said to be contiguous if
j = k + 1 or i = l + 1. A transposition τ is an operation
that exchanges two contiguous (disjoint)
segments. A transreversal τρ_{A,B} (respectively,
τρ_{B,A}) is a transposition that exchanges two
segments A and B and also reverses A (respectively,
B). A revrev operation ρρ reverses each
of the two contiguous segments (without transposing
them). The problem of finding a shortest
sequence of transposition, transreversal, and
revrev operations that transforms a permutation
into the identity permutation is called sorting by
transpositions, transreversals, and revrevs. The
distance of a permutation π, denoted by d(π), is
the length of the shortest sorting sequence. 
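
The operations defined above can be sketched on Python lists as follows. This is an illustrative sketch only: the function names and the 0-based slice convention (A = pi[i:j], B = pi[j:k]) are assumptions made here, not notation from the entry.

```python
def _flip(segment):
    """Reverse the order of a segment and flip the sign of each element."""
    return [-x for x in reversed(segment)]


def reversal(pi, i, k):
    """Reversal on the segment pi[i:k]."""
    return pi[:i] + _flip(pi[i:k]) + pi[k:]


def transposition(pi, i, j, k):
    """Exchange the contiguous segments A = pi[i:j] and B = pi[j:k]."""
    return pi[:i] + pi[j:k] + pi[i:j] + pi[k:]


def transreversal_a(pi, i, j, k):
    """Exchange A and B and also reverse A."""
    return pi[:i] + pi[j:k] + _flip(pi[i:j]) + pi[k:]


def transreversal_b(pi, i, j, k):
    """Exchange A and B and also reverse B."""
    return pi[:i] + _flip(pi[j:k]) + pi[i:j] + pi[k:]


def revrev(pi, i, j, k):
    """Reverse A and B in place, without transposing them."""
    return pi[:i] + _flip(pi[i:j]) + _flip(pi[j:k]) + pi[k:]


pi = [1, -4, 6, -5, 2, -7, -3]
print(transposition(pi, 1, 3, 5))    # [1, -5, 2, -4, 6, -7, -3]
print(transreversal_a(pi, 1, 3, 5))  # [1, -5, 2, -6, 4, -7, -3]
print(revrev(pi, 1, 3, 5))           # [1, -6, 4, -2, 5, -7, -3]
```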


Key Results 


Linear vs. Circular Permutations 

An operation is said to operate on the affected
segments as well as on the elements in those segments.
Two operations μ and μ' are equivalent if
they have the same rearrangement result, i.e.,
μ · π = μ' · π for all π. In this work [8], Hartman and
Sharan showed that for an element x of a circular
permutation π, if μ is an operation that operates
on x, then there exists an equivalent operation μ'
that does not operate on x. Based on this property,
they further proved that the problem of sorting
by transpositions, transreversals, and revrevs is
equivalent for linear and circular permutations.
Moreover, they observed that revrevs and transreversals
are equivalent operations for circular
permutations (as illustrated in Fig. 1a), implying
that the problem of sorting a linear/circular
permutation by transpositions, transreversals, and
revrevs can be reduced to that of sorting a circular
permutation by transpositions and transreversals
only.

Sorting by Transpositions and Reversals (Approximate
Ratio 1.5), Fig. 1 (a) The equivalence
of transreversal and revrev on circular permutations.
(b) The breakpoint graph G(π) of the permutation
π = [1, -4, 6, -5, 2, -7, -3], for which f(π) =
[1, 2, 8, 7, 11, 12, 10, 9, 3, 4, 14, 13, 6, 5]. It is
convenient to draw G(π) on a circle such that black edges
(i.e., thick lines) are on the circumference and gray edges
(i.e., thin lines) are chords


The Breakpoint Graph

Given a signed permutation π on {1, 2, ..., n} of
n elements, it is transformed into an unsigned
permutation f(π) = π' = [π'_1, π'_2, ..., π'_{2n}] on
{1, 2, ..., 2n} of 2n elements by replacing each
positive element i with two elements 2i - 1, 2i
(in this order) and each negative element -i with
2i, 2i - 1. The extended f(π) is considered here
as a circular permutation by identifying 2n + 1
and 1 in both indices and elements. To ensure
that every operation on f(π) can be mimicked
by an operation on π, only operations that cut
before odd positions are allowed for f(π). The
breakpoint graph G(π) is an edge-colored graph
on 2n vertices {1, 2, ..., 2n}, in which for every
1 ≤ i ≤ n, π'_{2i} is joined to π'_{2i+1} by a black
edge and 2i is joined to 2i + 1 by a gray edge
(e.g., see Fig. 1b). Since the degree of each vertex
in G(π) is exactly 2, G(π) uniquely decomposes
into cycles. A k-cycle (i.e., a cycle of length k) is a
cycle with k black edges, and it is odd if k is odd.
The number of odd cycles in G(π) is denoted by
c_odd(π). It is not hard to verify that G(π) consists
of n 1-cycles, and hence, c_odd(π) = n, if π is
an identity permutation [1, 2, ..., n]. Gu et al.
[5] have shown that c_odd(μ · π) ≤ c_odd(π) + 2
for all linear permutations π and operations μ.
In this work [8], Hartman and Sharan further
noted that the above result holds also for circular
permutations and proved that the lower bound of
d(π) is (n(π) - c_odd(π))/2.
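
The construction of the breakpoint graph and the resulting lower bound can be sketched as follows. This is a minimal illustration written for this passage, not code from [8]; the function names unsigned_image, cycle_lengths, and lower_bound are hypothetical.

```python
def unsigned_image(pi):
    """Map a signed permutation to f(pi): +i -> 2i-1, 2i and -i -> 2i, 2i-1."""
    out = []
    for x in pi:
        i = abs(x)
        out.extend([2 * i - 1, 2 * i] if x > 0 else [2 * i, 2 * i - 1])
    return out


def cycle_lengths(pi):
    """Sorted cycle lengths (number of black edges) of the circular breakpoint graph."""
    n = len(pi)
    f = unsigned_image(pi)
    black, gray = {}, {}
    for i in range(n):
        # Black edge joins pi'_{2i} and pi'_{2i+1} (1-based), wrapping around the circle.
        a, b = f[2 * i + 1], f[(2 * i + 2) % (2 * n)]
        black[a], black[b] = b, a
    for i in range(1, n + 1):
        # Gray edge joins values 2i and 2i+1, where 2n+1 is identified with 1.
        a, b = 2 * i, (2 * i) % (2 * n) + 1
        gray[a], gray[b] = b, a
    seen, lengths = set(), []
    for start in range(1, 2 * n + 1):
        if start in seen:
            continue
        v, k = start, 0
        while v not in seen:
            seen.add(v)
            w = black[v]      # follow a black edge ...
            seen.add(w)
            k += 1
            v = gray[w]       # ... then a gray edge
        lengths.append(k)
    return sorted(lengths)


def lower_bound(pi):
    """(n - c_odd) / 2, the lower bound on d(pi) stated in the text."""
    c_odd = sum(1 for k in cycle_lengths(pi) if k % 2 == 1)
    return (len(pi) - c_odd) // 2


pi = [1, -4, 6, -5, 2, -7, -3]
print(unsigned_image(pi))  # [1, 2, 8, 7, 11, 12, 10, 9, 3, 4, 14, 13, 6, 5]
print(cycle_lengths(pi))   # [2, 2, 3]
print(lower_bound(pi))     # 3
```

For the permutation of Fig. 1b the graph has two 2-cycles and one 3-cycle, so c_odd = 1 and at least (7 - 1)/2 = 3 operations are needed.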


Transformation into 3-Permutations 

A permutation is called simple if its breakpoint
graph contains only k-cycles, where k ≤ 3. A simple
permutation is also called a 3-permutation if
it contains no 2-cycles. A transformation from π
to π̂ is said to be safe if n(π) - c_odd(π) = n(π̂) -
c_odd(π̂). It has been shown that every permutation
π can be transformed into a simple one π' by safe
transformations and, moreover, every sorting of
π' mimics a sorting of π with the same number
of operations [6, 11]. Here, Hartman and Sharan
[8] further showed that every simple permutation
π' can be transformed into a 3-permutation π̂
by safe paddings (of transforming those 2-cycles
into 1-twisted 3-cycles) and, moreover, every
sorting of π̂ mimics a sorting of π' with the same
number of operations. Hence, based on these
two properties, an arbitrary permutation π can
be transformed into a 3-permutation π̂ such that
every sorting of π̂ mimics a sorting of π with the
same number of operations, suggesting that one
can restrict attention to circular 3-permutations
only.

Sorting by Transpositions and Reversals (Approximate
Ratio 1.5), Fig. 2 Configurations of 3-cycles. (a)
Unoriented, 0-twisted 3-cycle. (b) Unoriented, 1-twisted
3-cycle. (c) Oriented, 2-twisted 3-cycle. (d) Oriented,
3-twisted 3-cycle. (e) A pair of intersecting 3-cycles. (f) A
pair of interleaving 3-cycles


Cycle Types

An operation that cuts some black edges is said
to act on these edges. An operation is further
called a k-operation if it increases the number
of odd cycles by k. A (0,2,2)-sequence is a
sequence of three operations, of which the first is
a 0-operation and the next two are 2-operations.
An odd cycle is called oriented if there is a
2-operation that acts on three of its black edges;
otherwise, it is unoriented. A configuration of
cycles is a subgraph of the breakpoint graph
that contains one or more cycles. As shown in
Fig. 2a-d, there are four possible configurations
of single 3-cycles. A black edge is called twisted
if its two adjacent gray edges cross each other
in the circular breakpoint graph. A cycle is
k-twisted if k of its black edges are twisted. For
example, the 3-cycles in Fig. 2a-d are 0-, 1-, 2-,
and 3-twisted, respectively. Hartman and Sharan
observed that a 3-cycle is oriented if and only if
it is 2- or 3-twisted.


Cycle Configurations

Two pairs of black edges are called intersecting
if they alternate in the order of their occurrence
along the circle. A pair of black edges intersects
with cycle C, if it intersects with a pair of black
edges that belong to C. Cycles C and D intersect
if there is a pair of black edges in C that intersects
with D (see Fig. 2e). Two intersecting cycles are
called interleaving if their black edges alternate
in their order of occurrence along the circle (see
Fig. 2f). Clearly, the relation between two cycles
is one of (1) nonintersecting, (2) intersecting but
non-interleaving, and (3) interleaving. A pair of
black edges is coupled if they are connected by
a gray edge and when reading the edges along
the cycle, they are read in the same direction. For
example, all pairs of black edges in Fig. 2a are
coupled. Gu et al. [5] have shown that given a
pair of coupled black edges (b_1, b_2), there exists a
cycle C that intersects with (b_1, b_2). A 1-twisted
pair is a pair of 1-twisted cycles, whose twists
are consecutive on the circle in a configuration
that consists of these two cycles only. A 1-twisted
cycle is called closed in a configuration if its two
coupled edges intersect with some other cycle
in the configuration. A configuration is closed
if at least one of its 1-twisted cycles is closed;
otherwise, it is called open.


The Algorithm 

The basic ideas of Hartman and Sharan’s
1.5-approximation algorithm [8] for the problem
of sorting by transpositions, transreversals, and
revrevs are as follows. Hartman and Sharan reduced
the problem to that of sorting a circular
3-permutation by transpositions and transreversals
only and then focused on transforming the
3-cycles into 1-cycles in the breakpoint graph of
this 3-permutation. By definition, an oriented
(i.e., 2- or 3-twisted) 3-cycle admits a 2-operation
and, therefore, they continued to consider unoriented
(i.e., 0- or 1-twisted) 3-cycles only. Since
configurations involving only 0-twisted 3-cycles
were handled with (0,2,2)-sequences in [7], Hartman
and Sharan restricted their attention to those
configurations that consist of 0- and 1-twisted
3-cycles. They showed that these configurations are
all closed and that each of the following five
possible closed configurations can be sorted by a
(0,2,2)-sequence of operations: (1) a closed
configuration with two unoriented, interleaving
3-cycles that do not form a 1-twisted pair; (2)
a closed configuration with two intersecting,
0-twisted 3-cycles; (3) a closed configuration with
two intersecting, 1-twisted 3-cycles; (4) a closed
configuration with a 0-twisted 3-cycle that
intersects with the coupled edges of a 1-twisted
3-cycle; and (5) a closed configuration that contains
k ≥ 2 mutually interleaving 1-twisted 3-cycles
such that all their twists are consecutive on the
circle and k is maximal with this property. As
a result, the sequence of operations used by
Hartman and Sharan in their algorithm contains
only 2-operations and (0,2,2)-sequences. Since
every sequence of three operations increases the
number of odd cycles by at least 4 out of 6
possible in 3 steps, the ratio of their approximation
algorithm is 1.5. Furthermore, Hartman
and Sharan showed that their algorithm can be
implemented in O(n^(3/2) sqrt(log n)) time using the
data structure of Kaplan and Verbin [10], where
n is the number of elements in the permutation. 
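
The arithmetic behind the ratio can be spelled out as follows. This is a sketch of the counting argument only, using the lower bound on d(π) stated earlier in this entry; the symbols Δ and A(π) (the number of operations the algorithm applies) are introduced here for illustration.

```latex
\begin{align*}
\Delta &= n(\pi) - c_{\mathrm{odd}}(\pi)
  && \text{(odd cycles still to be created)} \\
d(\pi) &\ge \frac{\Delta}{2}
  && \text{(each operation adds at most 2 odd cycles)} \\
A(\pi) &\le \frac{3}{4}\,\Delta
  && \text{(every 3 operations add at least 4 odd cycles)} \\
\frac{A(\pi)}{d(\pi)} &\le \frac{3\Delta/4}{\Delta/2} = \frac{3}{2}
\end{align*}
```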


Theorem 1 The problem of sorting linear permutations
by transpositions, transreversals, and
revrevs is linearly equivalent to the problem of
sorting circular permutations by transpositions,
transreversals, and revrevs.


Theorem 2 There is a 1.5-approximation algorithm
for sorting by transpositions, transreversals,
and revrevs, which runs in O(n^(3/2) sqrt(log n))
time.


Applications 


When trying to determine evolutionary distance 
between two organisms using genomic data, bi- 
ologists may wish to reconstruct the sequence of 
evolutionary events that have occurred to trans- 
form one genome into the other. One of the most 
promising ways to do this phylogenetic study 
is to compare the order of appearance of iden- 
tical (e.g., orthologous) genes in two different 
genomes [9, 12]. This comparison of computing 
global rearrangement events (such as reversals, 
transpositions, and transreversals of genome seg- 
ments) may provide more accurate and robust 
clues to the evolutionary process than the anal- 
ysis of local point mutations (i.e., substitutions, 
insertions, and deletions of nucleotides/amino 
acids). Usually, the two genomes being compared
are represented by signed permutations,
with each element standing for a gene and its
sign representing the (transcriptional) direction of
the corresponding gene on a chromosome. Then
the goal of the resulting genome rearrangement
problem is to find a shortest sequence of rearrangement
operations that transforms (or, equivalently,
sorts) one permutation into the other.
Previous work focused on the problem of sorting
a permutation by reversals. This problem has
been shown by Caprara [2] to be NP-hard, if the
considered permutation is unsigned. However,
for signed permutations, this problem becomes
tractable and Hannenhalli and Pevzner [6] gave the
first polynomial-time algorithm for it. On the
other hand, there has been less progress on the 
problem of sorting by transpositions. Thus far, 
the complexity of this problem is still open, 
although several 1.5-approximation algorithms 
[1, 3, 7] have been proposed for it. Recently, 
the approximation ratio of sorting by transpo- 
sitions was further improved to 1.375 by Elias 
and Hartman [4]. Gu et al. [5] and Lin and Xue 
[11] gave quadratic-time 2-approximation algo- 
rithms for sorting signed, linear permutations by 
transpositions and transreversals. In [11], Lin and 
Xue considered the problem of sorting signed, 
linear permutations by transpositions, transrever- 
sals, and revrevs and proposed a quadratic-time 
1.75-approximation algorithm for it. In this work 
[8], Hartman and Sharan further showed that 
this problem is equivalent for linear and circular 
permutations and can be reduced to that of sorting 
signed, circular permutations by transpositions 
and transreversals only. In addition, they provided
a 1.5-approximation algorithm that can be implemented
in O(n^(3/2) sqrt(log n)) time.


Problem Definition


This entry describes algorithms for finding the
minimum number of steps needed to sort a signed
permutation (also known as inversion distance or
reversal distance). The problem arises in practice,
for example in computational biology.

Inversion distance is a difficult computational 
problem that has been studied intensively in re- 
cent years [1, 4, 6-10]. Finding the inversion 
distance between unsigned permutations is NP- 
hard [7], but with signed ones, it can be done in 
linear time [1]. 


Key Results 


Bader et al. [1] present the first worst-case linear- 
time algorithm for computing the reversal dis- 
tance that is simple and practical and runs faster 
than previous methods. Their key innovation is a 
new technique to compute connected components 
of the overlap graph using only a stack, which 
results in the simple linear-time algorithm for 
computing the inversion distance between two 
signed permutations. Bader et al. provide am- 
ple experimental evidence that their linear-time 
algorithm is efficient in practice as well as in 
theory: they coded it as well as the algorithm of 
Berman and Hannenhalli, using the best princi- 
ples of algorithm engineering to ensure that both 
implementations would be as efficient as possible 
and compared their running times on a large 
range of instances generated through simulated 
evolution. 

Bafna and Pevzner introduced the cycle 
graph of a permutation [3], thereby providing 
the basic data structure for inversion distance 
computations. Hannenhalli and Pevzner then
developed the basic theory for expressing the
inversion distance in easily computable terms
(number of breakpoints minus number of cycles
plus number of hurdles plus a correction factor
for a fortress [3, 15]; hurdles and fortresses are
easily detectable from a connected component
analysis). They also gave the first polynomial-time
algorithm for sorting signed permutations
by reversals [9] and proposed an O(n^4)
implementation of their algorithm, which runs
in quadratic time when restricted to distance
computation. Their algorithm requires the
computation of the connected components of
the overlap graph, which is the bottleneck
for the distance computation. Berman and
Hannenhalli later exploited some combinatorial
properties of the cycle graph to give an O(n α(n))
algorithm to compute the connected components,
leading to an O(n^2 α(n)) implementation of
the sorting algorithm [6], where α is the
inverse Ackermann function. (The later Kaplan-Shamir-Tarjan
(KST) algorithm [10] reduces
the time needed to compute the shortest 
sequence of inversions, but uses the same 
algorithm for computing the length of that 
sequence.) 
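
The first two terms of the distance expression (breakpoints minus cycles) can be computed with a short sketch like the one below. It assumes the usual construction, which is not spelled out in this entry: the permutation is framed by 0 and n + 1, each signed element is doubled, and n + 1 minus the total number of cycles equals breakpoints minus nontrivial cycles. The function names are hypothetical, and hurdles and fortresses are deliberately ignored, so the result is only a lower bound on the inversion distance.

```python
def cycle_count(pi):
    """Number of cycles in the cycle graph of a signed permutation (linear case)."""
    n = len(pi)
    framed = [0]
    for x in pi:
        # +x becomes 2x-1, 2x and -x becomes 2|x|, 2|x|-1.
        framed += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    framed.append(2 * n + 1)
    # Black edges join the values at positions (0,1), (2,3), ..., (2n, 2n+1).
    partner = {}
    for p in range(0, 2 * n + 2, 2):
        a, b = framed[p], framed[p + 1]
        partner[a], partner[b] = b, a
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = partner[v]   # black edge
            seen.add(w)
            v = w ^ 1        # gray edge joins values 2i and 2i+1
    return cycles


def cycle_lower_bound(pi):
    """n + 1 - c: the distance when there are no hurdles, a lower bound otherwise."""
    return len(pi) + 1 - cycle_count(pi)


print(cycle_lower_bound([1, 2, 3]))  # 0
print(cycle_lower_bound([-1]))       # 1
print(cycle_lower_bound([2, 1]))     # 2 (the true distance is 3 because of a hurdle)
```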

No algorithm that actually builds the overlap 
graph can run in linear time, since that graph 
can be of quadratic size. Thus, Bader’s key 
innovation is to construct an overlap forest 
such that two vertices belong to the same 
tree in the forest exactly when they belong to 
the same connected component in the overlap 
graph. An overlap forest (the composition of its 
trees is unique, but their structure is arbitrary) 
has exactly one tree per connected component 
of the overlap graph and is thus of linear 
size. The linear-time step for computing the 
connected components scans the permutation 
twice. The first scan sets up a trivial forest in 
which each node is its own tree, labeled with 
the beginning of its cycle. The second scan 
carries out an iterative refinement of this first 
forest, by adding edges and so merging trees 
in the forest; unlike a Union-Find, however, 
this algorithm does not attempt to maintain the 
trees within certain shape parameters. This step 
is the key to Bader’s linear-time algorithm for 
computing the reversal distance between signed 
permutations.


Applications


Some organisms have a single chromosome or 
contain single-chromosome organelles (such as 
mitochondria or chloroplasts), the evolution of 
which is largely independent of the evolution of 
the nuclear genome. Given a particular strand 
from a single chromosome, whether linear or cir- 
cular, we can infer the ordering and directionality 
of the genes, thus representing each chromosome 
by an ordering of oriented genes. In many cases, 
the evolutionary process that operates on such 
single-chromosome organisms consists mostly of 
inversions of portions of the chromosome; this 
finding has led many biologists to reconstruct 
phylogenies based on gene orders, using as a 
measure of evolutionary distance between two 
genomes the inversion distance, i.e., the smallest 
number of inversions needed to transform one 
signed permutation into the other [11, 12, 14]. 
The linear-time algorithm is in wide use (it was
cited nearly 200 times within the
first several years of its publication). Examples
include the handling of multichromosomal genome
rearrangements [16], genome comparison [5],
parsing RNA secondary structure [13], and
phylogenetic study of the HIV-1 virus [2].


Open Problems 

Efficient algorithms for computing minimum dis- 
tances with weighted inversions, transpositions, 
and inverted transpositions are open.


Experimental Results


Bader et al. give experimental results in [1]. 


URL to Code 


An implementation of the linear-time algorithm
is available as C code from www.cc.gatech.edu/~bader.
Two other dominated implementations are
available that are designed to compute the shortest
sequence of inversions as well as its length.
One, due to Hannenhalli, implements his first
algorithm [9], which runs in quadratic time when
computing distances. The other, a Java applet
written by Mantin (http://www.math.tau.ac.il/~rshamir/GR/),
implements the KST algorithm [10] but uses an
explicit representation of the overlap graph and
thus also takes quadratic time. The implementation
due to Hannenhalli is very slow and implements the
original method of Hannenhalli and Pevzner and not
the faster one of Berman and Hannenhalli. The KST
applet is very slow as well since it explicitly
constructs the overlap graph.


Cross-References 


For finding the actual sorting sequence, see the 
entry:


Problem Definition


A signed permutation π of size n is a permutation
over {-n, ..., -1, 1, ..., n}, where π_{-i} = -π_i
for all i. We note π = (π_1, ..., π_n).
The reversal ρ = ρ_{i,j} (1 ≤ i ≤ j ≤ n) is
an operation that reverses the order and flips the
signs of the elements π_i, ..., π_j in a permutation π:

π · ρ = (π_1, ..., π_{i-1}, -π_j, ..., -π_i, π_{j+1}, ..., π_n).

If ρ_1, ..., ρ_k is a sequence of reversals, it is
said to sort a permutation π if π · ρ_1 ··· ρ_k =
Id, where Id = (1, 2, ..., n) is the identity
permutation. The length of a shortest sequence of
reversals sorting π is called the reversal distance
of π and is denoted by d(π).

While the computation of d(π) is solved in linear
time [3] (see the entry "Sorting Signed Permutations
by Reversal (Reversal Distance)"), the
computation of a sequence ρ_1, ..., ρ_k of size
k = d(π) that sorts π is more complicated,
and no linear-time algorithm is known so far.
The best complexity is currently achieved by
the subquadratic solution of Tannier and Sagot
[17], which was later improved by Tannier,
Bergeron and Sagot [18], and Han [9].


Key Results 


The O(n^4) Self-Reduction

Recall there is a linear algorithm to compute the
reversal distance thanks to the formula d(π) =
n + 1 - c(π) + h(π) + f(π), where
c(π) is the number of cycles in the breakpoint
graph and h(π) + f(π) is computed from the
unoriented components of the permutation (see
the entry "Sorting Signed Permutations by Reversal
(Reversal Distance)"). Once this is known,
the self-reduction technique trivially computes a
sequence of size d(π): try every possible reversal
ρ at one step, until you find one such that d(π ·
ρ) = d(π) - 1. Such a reversal is called a sorting
reversal. This necessitates O(n) computations for
every possible reversal. There are at most n(n +
1)/2 = O(n^2) reversals to try at each step and at
most d(π) ≤ n + 1 steps, so iterating this to
find a sequence yields an O(n^4) algorithm.

The first polynomial algorithm by Hannenhalli
and Pevzner [10] did not achieve a better complexity,
and the algorithmic study of finding the
shortest sequences of reversals began its history. 
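
The self-reduction described in this section can be sketched as follows. The sketch is for illustration on small permutations only: as a stand-in for the linear-time distance formula it assumes a distance oracle built by breadth-first search from the identity (valid because every reversal is its own inverse), and the function names are hypothetical.

```python
from collections import deque


def apply_reversal(pi, i, j):
    """Apply the reversal rho_{i,j} (1-based, inclusive) to the tuple pi."""
    return pi[:i - 1] + tuple(-x for x in reversed(pi[i - 1:j])) + pi[j:]


def distance_table(n):
    """Reversal distance of every signed permutation of size n (exponential size;
    a stand-in for the linear-time distance computation)."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


def sort_by_self_reduction(pi):
    """Return a shortest list of reversals (i, j) that sorts pi."""
    pi = tuple(pi)
    n = len(pi)
    dist = distance_table(n)
    scenario = []
    while dist[pi] > 0:
        # Try every reversal until a sorting reversal is found.
        for i, j in ((i, j) for i in range(1, n + 1) for j in range(i, n + 1)):
            nxt = apply_reversal(pi, i, j)
            if dist[nxt] == dist[pi] - 1:
                scenario.append((i, j))
                pi = nxt
                break
    return scenario


print(len(sort_by_self_reduction((2, 1))))  # 3
```

With a linear-time distance oracle in place of the table, each of the at most n + 1 steps tries O(n^2) reversals at O(n) cost each, which gives the bound stated above.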


The Quadratic Roof 


All the published solutions for the computations 
of a sorting sequence are divided into two, fol- 
lowing the division of the distance formula into 
its parameters: a first part computes a sequence 
of reversals so that the resulting permutation has 
no unoriented component, and a second part sorts 
all oriented components. 

The first part was given its best solution by 
Kaplan, Shamir, and Tarjan [12], whose algo- 
rithm runs in linear time when coupled with the 
linear distance computation [3], and it is based on 
Hannenhalli and Pevzner’s [10] early results. 

The second part is the bottleneck of the whole 
procedure. At this point, if there is no unoriented 
component, the distance is d(π) = n + 1 - c(π),
so a sorting reversal is one that increases c(π) and
does not create unoriented components. 

A reversal that increases c(π) is called oriented.
Finding an oriented reversal is the easy part:
any two consecutive numbers that have different
signs in the permutation define one. This can
easily be done in linear time, or in sublinear time with ad
hoc data structures that maintain the permutation
during the scenario. The hard part is to make sure
it does not create unoriented components.
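
As a hedged illustration of the easy part, the sketch below lists the reversals defined by pairs of consecutive numbers with different signs, in one linear pass. It assumes the permutation is framed by 0 and n + 1, and the index convention for turning a pair into a reversal follows a common textbook formulation rather than anything stated in this entry; `oriented_reversals` is a hypothetical name. It does not check whether the reversal creates unoriented components.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the signs of its elements."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_reversals(perm):
    """Return the reversals (i, j) defined by consecutive values of opposite sign.

    Assumption: perm is framed, i.e. perm[0] == 0 and perm[-1] == n + 1.
    """
    pos = {abs(x): k for k, x in enumerate(perm)}  # position of each value
    found = []
    for v in range(len(perm) - 1):
        i, j = sorted((pos[v], pos[v + 1]))
        a, b = perm[i], perm[j]
        if (a < 0) != (b < 0):  # consecutive values, different signs
            # The chosen reversal makes the two values adjacent with equal signs.
            found.append((i, j - 1) if a + b == 1 else (i + 1, j))
    return found
```

For example, on the framed permutation [0, 3, 1, 6, 5, -2, 4, 7] the pair (1, -2) yields the reversal of positions 3 to 5, which brings 1 and 2 next to each other.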

The quadratic solutions (see, e.g., the one of
Kaplan, Shamir, and Tarjan [12]) are based on
the linear-time recognition of sorting reversals. No
better algorithm is known so far to recognize
sorting reversals, and it seemed that a lower
bound had been reached, as witnessed by a survey
of Ozery-Flato and Shamir [15] in which they
wrote that “a central question in the study of
genome rearrangements is whether one can obtain
a subquadratic algorithm for sorting by
reversals.” This was obtained by Tannier and Sagot
[17], who proved that the recognition of sorting
reversals at each step is not necessary, but only the
recognition of oriented reversals.


A Promising New but Still Quadratic 
Method 


The algorithm is based on the following theorem,
taken from [18]. A sequence of oriented reversals
ρ₁, ..., ρ_k is said to be maximal if there is no
oriented reversal in π · ρ₁ ⋯ ρ_k. In particular a
sorting sequence is maximal, but the converse is
not true.


Theorem 1 If S is a maximal but not a sorting
sequence of oriented reversals for a permutation,
then there exists a nonempty sequence S′ of
oriented reversals such that S may be split into two
parts S = S₁, S₂, and S₁, S′, S₂ is a sequence of
oriented reversals.


This allows one to construct sequences of oriented
reversals instead of sorting reversals, increase
their size by adding reversals inside the sequence
instead of at the end, and obtain a sorting
sequence.

This algorithm, with a classical data structure
to represent permutations (e.g., as an array),
still has O(n²) complexity, because at each step
it has to test the presence of an oriented reversal
and apply it to the permutation.


Composing with Data Structures 

A slight modification of a data structure invented
by Kaplan and Verbin [11] allows one to pick
and apply an oriented reversal in O(√(n log n)) time,
and using this, Tannier and Sagot’s algorithm
achieves O(n^{3/2} √(log n)) time complexity.

Han [9] announced another data structure that
allows one to pick and apply an oriented reversal in
O(√n) time, and integrating this into the algorithm
can plausibly decrease the complexity of the
overall method to O(n^{3/2}). Swenson et al. [16]
gave an O(n log n) solution for picking oriented
reversals, but their attempts to integrate it into the
overall procedure seem to fail on worst cases.


Extensions 


Once sorting by reversals has reached its best 
solutions, there are natural extensions guided by 
the main motivation for the problem in computa- 
tional biology: sample among optimal solutions, 
and handle several permutations and more opera- 
tions than just the reversal. 

Counting optimal solutions is conjectured to 
be #P-complete [14], but sampling almost uni- 
formly from the solution space is still open, and 
has been given a heuristic solution [14], including 
suboptimal solutions in the sample. 

Algorithms to enumerate all sorting reversals 
at one step have also been worked out [4], which 
provides a way for enumeration. A structure of 
the solution space was proposed, but with a pos- 
sibly exponential number of objects to enumer- 
ate [5]. 

The median problem consists in handling 
more than one permutation and is a particular 
case of the so-called small parsimony problem, 
which consists in reconstructing ancestral states 
in a phylogenetic context. Additional operations 
can be transpositions, duplications, or many 
others. Many generalizations and variants have 
been listed in a book on Combinatorics of 
Genome Rearrangements [8]. Almost all are 
NP-hard. 


Applications 


The motivation as well as the main application of 
this problem is in computational biology. Signed 
permutations are an adequate object to model the 
relative position and orientation of homologous 
segments of DNA in two species. 

Reversal scenarios were used to test some
evolutionary properties, like the propensity of
rearrangements to cut around the replication origin
[1] or the fragility of certain genomic regions
[2]. But evolutionary hypotheses can hardly be
tested from a single optimal solution; this would
necessitate a better view of the solution space.

The gain of complexity for sorting by reversals 
inspired many other algorithmic works, and 
several problems in genome rearrangement found 
a better solution thanks to the subquadratic gain 
described here. But the computational difficulties 
of the problem (parameters h(π) and f(π),
additional complexity for generating a scenario 
compared to the distance calculation, NP- 
completeness of every generalization with more 
operations, more permutations, more realistic 
models) lead most computational biologists to 
progressively abandon the reversal model for 
simpler ones (DCJ [19], SCJ [7]). 

Sometimes heroic gains in complexity are
worthwhile for computer science but seem just like
going a bit further in a dead end for applications.
Research consists in breaking walls without 
always knowing if behind there is a space for 
a community to work in or another thicker wall. 


Open Problems 


Still there are a couple of questions that remain 
unsolved before closing (or reopening?) this en- 
try: 


• I conjecture that the “real” complexity of giving
a reversal scenario is O(n log n). It is more
or less what Swenson et al. [16] also claim, but
without giving a full proof.

• Counting and sampling, even approximately,
are open. I learned this interesting conjecture
from Istvan Miklos: is it possible to walk in
the entire space of sequences of sorting reversals
by small transformations of scenarios,
each step changing at most 4
reversals? This would be a first step toward designing
an almost uniform sampler.


Experimental Results 


To my knowledge the data structure that allows
the subquadratic complexity described in this
entry has never been implemented. The size of
the data, as well as the limited possibilities of
applications of handling only two genomes and
a single optimal solution, makes the subquadratic
version, while a good piece of algorithmics, not
really worthwhile for applications.


URL to Code 


• There are a few old programs still able to
give a sorting sequence of reversals: in San
Diego http://grimm.ucsd.edu/GRIMM/, New
Mexico www.cs.unm.edu/~moret/GRAPPA/,
or Tel Aviv www.math.tau.ac.il/~rshamir/GR/
and more recent ones in Lyon
http://doua.prabi.fr/software/luna or Bielefeld
http://bibiserv.techfak.uni-bielefeld.de/dcj/welcome.html.

• The standard software for Bayesian sampling
in the space of sorting sequences (including
nonoptimal ones) is Badger
http://bibiserv.techfak.uni-bielefeld.de/dcj/welcome.html,
and there is also one biased to optimal
solutions called DCJ2HP
http://www.renyi.hu/~miklosi/DCJ2HP/ that uses
parallel tempering between DCJ solutions (easier to
sample) and reversal solutions.


Problem Definition


Let G = (V, E) be an undirected graph, with
nonnegative weights on the edges w : E → R₊.
Let d_G be the shortest-path metric on G, with
respect to the weights. For a spanning tree T of G,
define the stretch of an edge {u, v} ∈ E in T as

\[ \mathrm{stretch}_T(u,v) = \frac{d_T(u,v)}{d_G(u,v)} \]

and the average stretch as

\[ \mathrm{avg\text{-}stretch}_T(G) = \frac{1}{|E|} \sum_{e \in E} \mathrm{stretch}_T(e). \]


We shall consider the problem of finding a tree T
whose average stretch is small. We also study the
problem of finding a distribution over spanning
trees, such that for all e ∈ E, E_T[stretch_T(e)] is
small.
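
To make the definition concrete, here is a small sketch that evaluates the average stretch of a given spanning tree by running Dijkstra from every vertex in both the graph and the tree. The function names and the edge-list input format are assumptions made for the illustration, and edge weights are assumed strictly positive so that the ratio is defined.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances from src in a graph given as adjacency lists."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def adjacency(vertices, edges):
    """Undirected adjacency lists from a list of (u, v, weight) triples."""
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def average_stretch(vertices, edges, tree_edges):
    """Average over all graph edges {u, v} of d_T(u, v) / d_G(u, v)."""
    g = adjacency(vertices, edges)
    t = adjacency(vertices, tree_edges)
    d_g = {v: dijkstra(g, v) for v in vertices}
    d_t = {v: dijkstra(t, v) for v in vertices}
    return sum(d_t[u][v] / d_g[u][v] for u, v, _ in edges) / len(edges)
```

On the unit-weight 4-cycle with the tree obtained by deleting one edge, three edges have stretch 1 and the deleted edge has stretch 3, so the average stretch is 1.5.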


Key Results 


Low-stretch spanning trees were first studied
by [3], who showed that any graph on n
vertices has a spanning tree with average stretch
2^{O(√(log n log log n))} and showed a family of graphs
that requires Ω(log n) average stretch. Their
result was substantially improved by [7], who
showed an upper bound of O(log² n log log n),
and later [1] improved this to a near optimal
Õ(log n).

The main result discussed here is from [2]: 


Theorem 1 For any graph G with n vertices
and m edges, there is a deterministic
algorithm that constructs a spanning
tree T, such that avg-stretch_T(G) ≤
O(log n log log n). The running time of the
algorithm is O(m log n log log n).


We also show an efficient algorithm to sample 
from a distribution over spanning trees, such that 
the expected stretch of any edge is bounded by 
O(log n log logn log log logn). 


Applications 


An important problem in algorithm design is ob- 
taining fast algorithms for solving linear systems. 
For many applications, the matrix is sparse, and 
while little is known for general sparse matri- 
ces, the case of symmetric diagonally dominant 
(SDD) matrices has received a lot of attention 
recently. In a seminal sequence of results, Spiel- 
man and Teng [12] showed a near-linear time 
solver for this important case. This solver has
proven a powerful algorithmic tool and is used
to calculate eigenvalues, obtain spectral graph
sparsifiers [11], approximate maximum flow
[6], and in many other applications. A basic step in
solving these systems Ax = b is combinatorial 
preconditioning. If one uses the Laplacian matrix 
corresponding to a spanning tree (and a few extra 
edges) of the graph whose Laplacian matrix is 
A, then the condition number depends on the 
total stretch of the tree. This will improve the 
run-time of iterative methods, such as conjugate 
gradient or Chebyshev iterations. See [9, 10] for 
the latest progress on this direction. In this work 
we show that one can construct such a spanning 
tree with both run-time and total stretch bounded 
by O(m logn log logn). 

Probabilistic embedding into trees, introduced
by [4], has been a successful paradigm in algorithm
design. Many hard optimization problems
on graphs can be reduced, via embedding, to a
similar problem on a tree, which is often
considerably easier. This framework can be applied
to approximation algorithms, online algorithms,
network design, and other settings. Some of the
notable examples are metrical task systems,
buy-at-bulk network design, the k-server problem,
and group Steiner tree. An asymptotically
optimal result of expected O(log n) distortion for
probabilistic embedding into trees was given by
[8]. The trees in the support of the FRT
distribution are not subgraphs of the input graph
and may contain Steiner nodes and new edges.
While this is fine for most applications, there
are some that must have trees which are
subgraphs, such as the minimum cost communication
spanning tree problem: Given a weighted graph G =
(V, E) and a requirement matrix R = (r_{uv}), the
objective is to find a spanning tree T that
minimizes Σ_{u,v ∈ V} r_{uv} · d_T(u, v). Our result implies an
Õ(log n) approximation.


Petal Decomposition 


A basic tool that is often used in constructing 
tree metrics and spanning trees with low stretch 
is sparse graph decomposition. The idea is to 
partition the graph into small diameter pieces, 
such that few edges are cut. Each cluster of the 
decomposition is partitioned recursively, which 
yields a hierarchical decomposition. Creating a 
tree recursively on each cluster of the decompo- 
sition, and connecting these in a tree structure, 
will yield a spanning tree of the graph. The edges 
cut by the decomposition are potentially stretched 
by a factor proportional to the diameter of the 
created tree. The construction has to balance 
between these two goals: cut a small number of 
edges and maintain small diameter in the created 
tree. 

One of the main difficulties in such a spanning
tree construction is that the radius (the maximal
distance from a designated center) may increase
by a small factor
at every application of the decomposition, which
translates to increased stretch. If we drop the
requirement that the tree is a spanning tree of
the graph and just require a tree metric, then this
difficulty does not appear, and indeed, an optimal
Θ(log n) bound is known on the average stretch
[5, 8]. Our petal decomposition allows essentially
optimal control on the radius increase of the
spanning tree; it increases by at most a factor of
4 over all the recursion levels.


Highways 

One of the components in the decomposition
scheme is highways. Each cluster X ⊆ V in our
decomposition scheme has a designated center
x₀ ∈ X and a “target” t ∈ X. It is guaranteed
that the shortest path from x₀ to t will be fully
contained in the final spanning tree T. This path
is called the petal’s highway. Intuitively, the highway
will provide short paths from the center x₀ to
many of the points in the cluster.


Cones and Petals 

A cone is a generalization of a ball; the notion
of cones was introduced in [7] and was used also
in [1] for low-stretch spanning trees. Informally, a
cone C(t, r) of radius r centered at t (with respect
to the cluster center x₀) contains all the points
z ∈ X such that d(z, t) + d(t, x₀) ≤ d(z, x₀) +
r (here d is the shortest-path metric on X). In
other words, the cone contains all the points for
which the path to x₀ through t is not much longer
than the direct shortest path to x₀. The parameter
r is a bound on the radius increase in the current
decomposition.
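
The cone condition can be checked directly from two shortest-path computations. The following is a minimal sketch of the definition only (not the petal construction); the adjacency-list format and the name `cone` are assumptions, and the cluster is assumed connected.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps u to a list of (v, weight)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def cone(adj, x0, t, r):
    """All z with d(z, t) + d(t, x0) <= d(z, x0) + r."""
    d_x0 = dijkstra(adj, x0)
    d_t = dijkstra(adj, t)
    return {z for z in adj if d_t[z] + d_t[x0] <= d_x0[z] + r}
```

On the unit-weight path 0, 1, 2, 3 with center x₀ = 0 and t = 2, the cone of radius 0 is {2, 3} (the points whose shortest path to x₀ already passes through t), and it grows to {1, 2, 3} at radius 2.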

One way to define a petal is as a union
of cones. The petal P(t, r) around a target t
with radius r is defined as ⋃_{0 ≤ k ≤ r} C(p_k, k/2),
where p_k is the point of distance r − k from
t on the shortest path from t to x₀. The
center of the petal is defined as x = p₀, and
the path from x to t is the petal’s highway.
The petal-decomposition algorithm
iteratively picks an arbitrary target of distance
at least 3Δ/4 (where Δ is the radius of X) away
from x₀, generates a petal for it, and removes the
petal from the graph. When there are no longer
such points, the remaining points will form
the central cluster (the stigma). The first petal
requires extra care in its target choice, as it may
contain the designated target of the cluster, which
implies we cannot allow the shortest path to this
target to be cut by this or subsequent petals. The
radii of the petals are chosen by a region-growing
argument that cuts few edges, where the length of
the possible range for the radius is ≈ Δ. This is in
contrast with the previous work, where in order to
give an appropriate bound on the radius increase,
the range was much smaller than Δ, which
immediately translates to a loss in the stretch.
The precise method for choosing r is essentially
given in [7], and we also give a randomized
version similar in spirit to the one in [1].


Fast Petal Construction 

An alternative way to define petals and cones is
as balls in an appropriately defined directed graph
created from G. This suggests that we can use a
variant of Dijkstra to compute a petal in nearly
linear time in the sum of degrees of its vertices.
Let G̃ = (V, A, w̃) be the weighted directed
graph induced by adding the two directed edges
(u, v), (v, u) ∈ A for each {u, v} ∈ E and setting
w̃(u, v) = d(u, v) − (d(v, x₀) − d(u, x₀)). The
cone C(t, r) is simply the ball around t of radius
r in G̃. The petal P(t, r) is the ball around t
of radius r/2 in G̃ with one change: the weight
of each edge on the path from t to x = p₀ is
changed to be 1/2 of its original weight (i.e., 1/2
of its weight in G).
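
The equivalence between the two descriptions of a cone can be checked on small examples. The sketch below covers cones only (the halved highway weights used for petals are omitted), and the function names are hypothetical. The reweighted arc lengths are nonnegative by the triangle inequality, so ordinary Dijkstra applies.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps u to a list of (v, weight)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def cone_by_definition(adj, x0, t, r):
    """All z with d(z, t) + d(t, x0) <= d(z, x0) + r."""
    d_x0, d_t = dijkstra(adj, x0), dijkstra(adj, t)
    return {z for z in adj if d_t[z] + d_t[x0] <= d_x0[z] + r}


def cone_as_ball(adj, x0, t, r):
    """Ball of radius r around t in the directed graph with arc weights
    w(u, v) - (d(v, x0) - d(u, x0))."""
    d0 = dijkstra(adj, x0)
    directed = {u: [(v, w - (d0[v] - d0[u])) for v, w in adj[u]] for u in adj}
    return {z for z, dz in dijkstra(directed, t).items() if dz <= r}
```

Along any directed path from t to z the correction terms telescope to d(t, x₀) − d(z, x₀), which is why the two functions return the same set.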


Ideas in the Analysis 

Informally, the crucial property of a petal and its
highway is the following: Assume z ∈ P(t, r),
and P_{x₀z} is the shortest path from the original
center x₀ to z. By forming the petal, we remove
all edges between P(t, r) and X \ P(t, r) except
for the edge from the petal center x toward x₀.
Hence, any path from x₀ to z must go through the
petal center x. If the new shortest path P′_{x₀z} (after
forming the petal) is (additively) k/2 longer than
the length of P_{x₀z}, then z ∈ C(p_k, k/2) and so
P′_{x₀z} will contain part of the new petal’s highway
of length at least k. Such a property could allow
the following wishful thinking: Suppose that in
each iteration we increase the distance of a point
to the center by at most α but also mark a new
portion of the path of length 2α as edges that
are guaranteed to appear in the final tree (part
of a highway). In such a case, it is easy to see
that the final path will have stretch at most 2:
If the original distance was b, once the total
increase is b, we have marked 2b (all of the
path) as a highway that will appear in the tree.
Unfortunately, the path from x to z in the final
tree may not use the prescribed highway of the
parent cluster so the above “wishful thinking”
argument does not work.

The key algorithmic idea to alleviate this problem
is to decrease the weight of an edge by half
when it becomes part of a highway (we ensure
that this happens at most once for every edge).
This reweighting signals later iterations to use
the prescribed highway, as this must remain the
shortest path. We maintain the invariant that in
every cluster, the highway edges are the only
cluster edges which have been reweighted. Now,
in every petal (except for maybe the first), we
create a new petal highway when we form P(t, r).
For any z ∈ P(t, r), the length of the path from
x₀ to z does not increase at all (after reweighting
the highway): For some k ≤ r, it increased by at
most k/2, but a highway length of at least k was
reduced by 1/2.

We have to take care of the radius increase
generated by the very first petal as well, where it
could be that no new highway is created (this
petal’s highway may be a part of the highway
of the original cluster). In this case, we use the
fact that the path from x₀ to x₁ (the center of
the first petal) must also be on the highway of
the original cluster and that its length is at least
Δ/2. This implies that even though we may have
increased the radius, at least half of the path is
guaranteed not to increase ever again. We use a
subtle inductive argument to make this intuition
precise, and in fact we lose a factor of 2 for each
of these cases, so the maximal increase is by a
factor of 4.


Problem Definition


Suppose that we have access to a vector x ∈
Cⁿ. How much time does it take to compute its
Fourier transform x̂? One can do this with the
Fast Fourier Transform (FFT) in O(n log n) time.
But can we do better?

We do not know the answer in general, but 
some classes of algorithms cannot do better [1, 
20] and certainly one cannot do better than O(n) 
time for arbitrary signals x. But the Fourier trans- 
form is ubiquitous in signal processing, appearing 
in compression of audio, images, and video, in 
manipulation of audio, and in recovery of radio or 
MRI signals, so we would really like to do better. 
If we cannot improve on the FFT in general, then 
perhaps we can for the signals commonly seen 
in these applications. To do this, we need some 
notion for how the signals we typically see are 
“easier” than arbitrary ones. 

One such notion is sparsity. The main reason 
to use the Fourier transform in compression is 
because it concentrates the energy of the signal 
into a few large (or “heavy”) coordinates and 
many small ones; signals with such concentrated 
coordinates are called sparse. One can then throw 
out the small coordinates and only store the 
heavy ones; this is the main principle behind 
lossy compression such as MP3 or JPEG. In fact, 
in all the applications discussed in the previous
paragraph, the signals typically have an approximately
sparse Fourier transform. This brings us
to the problem described in this entry: can we 
speed up the Fourier transform for signals when 
the result is approximately sparse? 

Moreover, as with lossy compression, we of- 
ten only care about the heavy coordinates and are 
willing to tolerate an error proportional to the en- 
ergy in the small coordinates. This relaxation will 
allow us to compute the sparse Fourier transform 
in sublinear time. 


Formal Definition 
The discrete Fourier transform x̂ ∈ Cⁿ of a vector
x ∈ Cⁿ is given by

\[ \hat{x}_i = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \omega^{ij} x_j \quad \text{for } \omega = e^{2\pi \mathbf{i}/n}. \]


We say that x̂ is exactly k-sparse if it has at most
k nonzero coordinates, i.e., |supp(x̂)| ≤ k. We
say that x̂ is approximately k-sparse if most of the
energy is contained in the heaviest k coordinates,
in particular if

\[ \mathrm{Err}_k(\hat{x}) := \min_{k\text{-sparse } y} \|\hat{x} - y\|_2 \]

is small relative to ‖x̂‖₂. A sparse Fourier transform
algorithm can access x ∈ Cⁿ in arbitrary
positions and outputs a vector x̂′ such that

\[ \|\hat{x}' - \hat{x}\|_2 \le C \, \mathrm{Err}_k(\hat{x}) + \delta \|\hat{x}\|_2 \qquad (1) \]


for some approximation factor C > 1 and δ < 1.
An algorithm for the exactly sparse case would
do this for C = ∞, while robust algorithms
can achieve C = O(1) or even C = 1 + ε.
The algorithms we will discuss will feature a
logarithmic dependence on 1/δ, so one typically
sets δ = 1/poly(n), and for typical signals, the
right-hand side of (1) will be dominated by the
C Err_k(x̂) term; we will assume this for the rest
of the entry.
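
As a small illustration of these definitions, Err_k(x̂) is attained by keeping the k coordinates of largest magnitude, so it equals the ℓ2 norm of the remaining ones. The sketch below computes it and tests guarantee (1) for a candidate output; the function names are hypothetical.

```python
import math


def l2_norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def err_k(x_hat, k):
    """l2 norm of everything except the k largest-magnitude coordinates."""
    magnitudes = sorted((abs(c) for c in x_hat), reverse=True)
    return math.sqrt(sum(m * m for m in magnitudes[k:]))


def satisfies_guarantee(x_hat, estimate, k, C, delta):
    """Check ||estimate - x_hat||_2 <= C * Err_k(x_hat) + delta * ||x_hat||_2."""
    diff = [a - b for a, b in zip(estimate, x_hat)]
    return l2_norm(diff) <= C * err_k(x_hat, k) + delta * l2_norm(x_hat)
```

For an exactly k-sparse x̂ the error term Err_k(x̂) is zero, so only the δ‖x̂‖₂ slack remains on the right-hand side.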

We would like to optimize both the sample 
complexity — the number of positions of x 
that are accessed by the algorithm — and the 
running time. Optimizing sample complexity
is important for applications such as spectrum 
sensing or MRIs, which do not have the input 
x in memory but must sample it at some 
expense. 

We also allow the algorithm to be randomized 
and to fail with some small probability p. For 
simplicity we set p to a small constant; for
any algorithm one can amplify this probability
with an O(log(1/p)) overhead in sample complexity
and time. It is an open question whether the
algorithms that achieve the best known time and 
sample complexities can be modified to avoid this 
overhead. 


Related Work 

The modern research on sparse Fourier
transforms is closely related to work on sparse
recovery from general linear measurements. In 
this problem, one would like to (approximately) 
recover an (approximately) sparse vector 
x from linear measurements Ax for some 
“measurement” matrix A with fewer rows 
than columns. The sparse Fourier transform is 
precisely this where A is a subset of rows of the 
inverse Fourier matrix. 

Broadly speaking, there are two conceptual
classes of algorithms and results for the general
linear measurement setting. The first class, often
called compressed sensing and first studied
in [2, 3, 7], generally (1) involves independent
random linear measurements; (2) shows that, with
high probability, the measurement matrix gives
good recovery for all vectors x; (3) optimizes
the sample complexity but not the running
time, which is superlinear or polynomial in
n; and (4) gives algorithms that work for
general classes of measurements and work for
both random Gaussian and random Fourier
matrices at the same time. These papers often
refer to properties like the restricted isometry
property that measurement matrices may have
and use either convex optimization (e.g., L1
minimization or the LASSO) or iterative greedy
methods (e.g., IHT or CoSaMP) to perform the
recovery.

The second class, more often called sparse
recovery, is largely an outgrowth of the streaming
algorithms literature [4, 5]. These results
generally (1) involve more structured linear
measurements that use randomness and also have
dependencies among the samples; (2) show for
each vector x that, with high probability, the
measurement matrix gives good recovery; (3)
optimize both the sample complexity and the
running time, so both may be sublinear in n; and
(4) give algorithms that are closely connected to
the measurement matrix and would not work
for matrices with different structure. These
papers often construct the matrix to emulate
hash tables and use medians to perform robust
recovery.

These statements are generalizations, and 
not every algorithm matches the trend in all 
four ways, but they hold more often than 
not. Our algorithm falls in the second class,
which for Fourier measurements can achieve
both better sample complexity and better
running time than algorithms in the first
class.

There is a much older collection of algorithms
that can do sparse Fourier transforms in the exact
setting when |supp(x̂)| ≤ k. These include
Prony’s method from 1795, the matrix pencil
method, and Berlekamp-Massey syndrome decoding.
These can achieve the optimal sample
complexity of 2k and recovery time poly(k)
(down to O(k² + k log^c log n) [8]). Additionally,
they use a deterministic set of samples and
work for all vectors x. However, it is not known
how to make the techniques in these algorithms
robust to approximately sparse signals, so they
do not apply to the signals appearing in typical
applications.

Noise-tolerant sparse Fourier transforms were
first studied over the Boolean cube, also known
as the Hadamard transform. In this setting,
Goldreich and Levin [12, 18] showed how to get
O(k log(n/k)) samples and O(k log^c n) time,
which is essentially optimal. Mansour [19]
extended this to the Cⁿ setting that we consider in
this entry but with more than k² sample
complexity. Over the next couple of decades, a number
of subsequent works, including [9, 10, 13, 14,
16], have improved our understanding of the Cⁿ
setting.




Key Results 


At present, the two best sparse Fourier transform
algorithms are [13], which is fastest at
O(k log(n/k) log n) time and sample complexity,
and [14], which has nearly optimal O(k log n)
sample complexity at the cost of Õ(n) running
time. These works build on [9, 10, 17].

We know that the optimal nonadaptive sample
complexity (that is, among algorithms that
choose the sample set Ω independently of
the vector x) is Ω(k log(n/k)) [6], which
matches [14] for k < n^{0.99}. One could imagine
constructing an algorithm that uses adaptive 
samples, where one uses the first few samples 
to decide where to look in future samples. In the 
general sparse recovery setting, this adaptivity 
can lead to significant improvements [15], 
but we know that Ω(k log(n/k)/log log n)
Fourier samples remain necessary in the adaptive
setting [13].


Algorithm Overview 

At a high level, sparse recovery algorithms are 
built in three stages: one-sparse recovery, where 
we solve the problem for k = 1; partial k- 
sparse recovery, where we find and estimate most 
(say, 90%) of the heavy coordinates or of the 
energy; and full k-sparse recovery, where we 
get a good approximation to the entire signal 
and achieve (1). Each stage uses the previous as 
(nearly) a black box. This architecture generally 
holds for the class of “sparse recovery” algorithms;
in the sparse Fourier transform setting, the
pieces change, but the architecture does not. We
will go through each in turn.


One-Sparse Recovery

Let us consider the one-sparse setting for C =
O(1). We have access to x_j = v ω^{i* j} + g_j for
some “signal” (v, i*) ∈ C × [n] and “noise” g ∈
Cⁿ with ‖g‖₂ ≤ c|v|√n for a sufficiently small
constant c. To satisfy (1), we would like to find
i* exactly and find v to within O(‖g‖/√n).

The tricky bit is to find i*; once we know
i*, then x_j ω^{−i* j} is a good estimator of v.
In particular, for a random j ∈ [n], we have
E_j |x_j ω^{−i* j} − v|² = ‖g‖²/n, so taking the
median of several such estimates will have
O(‖g‖/√n) error with large probability. So that
just leaves us to find i*.

As a first step, consider for a fixed a ∈ [n]
looking at the random variable

\[ y_a = x_{a+j} / x_j \approx \omega^{i^* a} \]

as a distribution over random j ∈ [n], where
addition of indices is taken modulo n. This allows
us to remove the influence of v and focus on i*.
We can show that |y_a − ω^{i* a}| ≤ O(c) with large
(say, 3/4) probability. Suppose this were instead
true with probability 1.

By knowing ω^{i* a} to within O(c), we know
i* a mod n to within ±O(cn). For small enough
c, this is within ±n/4. Then we could look at
y₁ to learn i* to within ±n/4, y₂ to refine the
estimate to ±n/8, and y₄ to refine to ±n/16,
until we identify i* using log n different y_a. This
is illustrated in Fig. 1.

Sparse Fourier Transform, Fig. 1 The first two steps of
estimating i* using y₁ and y₂. Using y₁ we can identify
i* to an O(cn) size region. With y₂ we learn 2i* mod n
to within O(cn), which tells us that i* is within one of
two antipodal regions of half the size. Based on y₁, we
can throw out the spurious region and narrow our estimate
of i*. (a) Error in y₁. (b) Error in y₂. (c) Set of ω^{i*}
consistent with y₂
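
The idealized refinement (every y_a accurate) can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of [13]: n is assumed to be a power of two, the noise is assumed small in every coordinate (stronger than the ℓ2 bound above) so that each ratio is accurate, and the value v is estimated with a coordinate-wise median. The names `locate_frequency` and `estimate_value` are hypothetical.

```python
import cmath
import math
import statistics


def locate_frequency(x, rng):
    """Find i* from x_j = v * w**(i* j) + g_j using y_1, y_2, y_4, ..., y_{n/2}."""
    n = len(x)  # assumed to be a power of two
    estimate = 0.0  # running estimate of i*, as a real number modulo n
    a = 1
    while a < n:
        j = rng.randrange(n)
        y = x[(a + j) % n] / x[j]  # approximately w**(i* a)
        # The phase of y gives (a * i*) mod n up to a small error.
        observed = (cmath.phase(y) % (2 * math.pi)) * n / (2 * math.pi)
        predicted = (a * estimate) % n
        # Among the a candidates consistent with y, keep the one closest
        # to the current estimate (this discards the spurious regions).
        delta = (observed - predicted + n / 2) % n - n / 2
        estimate = (estimate + delta / a) % n
        a *= 2
    return round(estimate) % n


def estimate_value(x, i_star, rng, repetitions=15):
    """Median of several estimates x_j * w**(-i* j) of v (coordinate-wise)."""
    n = len(x)
    samples = []
    for _ in range(repetitions):
        j = rng.randrange(n)
        samples.append(x[j] * cmath.exp(-2j * math.pi * i_star * j / n))
    return complex(statistics.median(s.real for s in samples),
                   statistics.median(s.imag for s in samples))
```

Each doubling of a halves the uncertainty on i*, so log n ratios suffice in this idealized setting.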
In reality y_a has a small constant chance of
failure at each stage. One could fix this by taking
O(log log n) different samples of y_a at each stage
and using the median, which would give an algorithm
with O(log n log log n) sample complexity
and time. An alternative, as used in [13], is
to learn i* in chunks of O(log log n) bits at a
time, which gets the optimal O(log n) sample
complexity using O(log^{1.1} n) running time.


Partial k-Sparse Recovery

The goal of partial k-sparse recovery is to find
most of the heavy coordinates of x̂. The general
idea is to “hash” the coordinates randomly into
B = O(k) bins in a way that lets us take measurements
of the signal restricted to frequencies
within each bin. By taking the measurements
corresponding to the one-sparse recovery algorithm,
we recover frequencies that are alone in
their bin. This will happen with a large constant
(say, 90%) probability for each heavy frequency,
so we recover most of the heavy frequencies
well.

To see how this is done, we start with a
deterministic way of hashing the frequencies into
bins and then show how to randomize it. Hashing
is based on filters that are sparse in both time
and frequency domain. The filter F is designed
to be as close as possible to a rectangular filter
in frequency domain while still being sparse in
time domain. Figure 2 demonstrates the
filter used in [13], where F is a sinc function
times a (truncated) Gaussian with support size
O(k log n). In frequency domain, F̂ approximates
a rectangle of width O(n/k), matching
it up to a small transition region between the
passband and the stopband and with 1/n^c error
inside the passband and stopband.

Sparse Fourier Transform, Fig. 2 Filters used in [13].
(a) Filter (time): Gaussian · sinc. In time domain:
O(k log n) sparse. (b) Filter (frequency): Gaussian *
rectangle. In frequency domain: width O(n/k) rectangle

Using these filters, Fig. 3 demonstrates a
method for learning information about the
signal. Given the signal x, we compute the
O(k log n)-size vector F · x. We then “alias”
it down to B = O(k) elements (adding up
terms 1, B + 1, 2B + 1, ...) and take the
B-dimensional DFT. This lets us compute the red
points in Fig. 3f in O(k log n + B log B) =
O(k log n) time. The red points are B evenly
spaced samples of x̂ * F̂.
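
The aliasing step rests on a standard identity: the B-dimensional DFT of a vector folded down to B terms equals every (n/B)th coefficient of its n-dimensional DFT. The sketch below checks this with a naive DFT; it assumes B divides n, uses 0-based indices, and uses an unnormalized transform with ω = e^{2πi/n}, so constant factors may differ from the convention of this entry. The names `dft` and `alias` are hypothetical.

```python
import cmath


def dft(z):
    """Naive unnormalized DFT: Z_i = sum_t z_t * exp(2*pi*1j*i*t/n)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(2j * cmath.pi * i * t / n) for t in range(n))
            for i in range(n)]


def alias(z, B):
    """Fold z down to B terms: u_j = z_j + z_{j+B} + z_{j+2B} + ..."""
    return [sum(z[j::B]) for j in range(B)]


def subsampled_spectrum(z, B):
    """B evenly spaced DFT coefficients of z, via a B-dimensional DFT."""
    return dft(alias(z, B))
```

In the algorithm the folded vector is F · x, which has only O(k log n) nonzero terms, so the folding costs O(k log n) and the small DFT costs O(B log B) with an FFT in place of the naive transform used here.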

We can think of the $i$th red point in a different
way. The $i$th red point is the sum of all the
entries of $\hat{x} \cdot \mathrm{shift}(\hat{F}, in/B)$, where
$\mathrm{shift}(\hat{F}, in/B)$ denotes shifting $\hat{F}$ to the right
by $in/B$. This equals the zeroth time domain coefficient of the
vector with Fourier coefficients given by
$\hat{x} \cdot \mathrm{shift}(\hat{F}, in/B)$. And if our algorithm looks not
at $y_j = F_j x_j$ but at $y^{(a)}_j = F_j x_{j+a}$ when
computing the red points, then the $i$th red point
will equal the $a$th time domain coefficient of
the vector with Fourier coefficients given by
$\hat{x} \cdot \mathrm{shift}(\hat{F}, in/B)$. This lets us sample from the time
domain representation of the vectors with Fourier
coefficients given by $\hat{x} \cdot \mathrm{shift}(\hat{F}, in/B)$ for $i \in [B]$.
It takes $O(k \log n)$ time to get these samples,
for $O(\log n)$ overhead (in time and samples) per
“effective” sample.
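The filter, alias, and small-DFT steps above can be checked numerically. The following is a minimal sketch, assuming a toy filter with four non-zero taps (not the sinc-times-Gaussian filter of [13]) and a naive DFT; the names `dft` and `bucket_samples` are illustrative and do not come from the source. It confirms that the $B$-point DFT of the aliased vector equals every $(n/B)$-th coefficient of the $n$-point DFT of $F \cdot x$.

```python
import cmath

def dft(v):
    """Naive O(n^2) DFT, kept simple for clarity (an FFT would be used in practice)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def bucket_samples(x, F, B, a=0):
    """Filter x by F, alias down to B terms, and take the B-point DFT.

    Uses y_j = F_j * x_{j+a}; only positions where F is non-zero are read.
    """
    n = len(x)
    aliased = [0j] * B
    for j, Fj in enumerate(F):
        if Fj != 0:
            aliased[j % B] += Fj * x[(j + a) % n]  # adds up terms r, B+r, 2B+r, ...
    return dft(aliased)

# Toy instance (assumed values): n = 16, B = 4, filter supported on 4 samples.
n, B = 16, 4
x = [complex((3 * t) % 7, t % 5) for t in range(n)]
F = [1.0, 0.75, 0.5, 0.25] + [0.0] * (n - 4)

red_points = bucket_samples(x, F, B)
full = dft([F[j] * x[j] for j in range(n)])      # n-point DFT of F . x
expected = [full[i * n // B] for i in range(B)]  # every (n/B)-th coefficient
```

The identity requires $B$ to divide $n$; the cost of `bucket_samples` is governed by the support of `F` and by `B`, not by `n`.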

Now, we simply choose our samples $a$ from
the distribution requested by the one-sparse
recovery algorithm. In every bucket for which
$\hat{x} \cdot \mathrm{shift}(\hat{F}, in/B)$ is one-sparse, this procedure
will let us recover the heavy frequency. Because
the different shifts of $\hat{F}$ give $B$ different buckets,
if the frequencies were randomly distributed, this
technique would get us partial sparse recovery.
The one-sparse recovery algorithm only takes
$O(\log(n/k))$ samples because each frequency
is known to lie within an $n/B = O(n/k)$
size region; hence the overall method takes
$O(k \log n \log(n/k))$ time and samples.

[Sparse Fourier Transform, Fig. 3: The algorithm for hashing used in [13]. For simplicity, the illustrations do not include noise. (a) The signal in time domain. (b) Corresponds to this signal in frequency domain. (c) We observe $F \cdot x$ for a sparse $F$. (d) Which has the dashed $n$-dimensional DFT. (e) We alias from $O(k \log n)$ terms to $O(k)$. (f) And compute the $O(k)$-dimensional DFT (dots).]

We would like the algorithm to work for arbitrary
input signals, so we need a way of randomizing
the frequencies. To do this, we further
refine the algorithm to choose a random $\sigma, b \in [n]$
with $\sigma$ relatively prime to $n$. Then we have
the algorithm look at the signal with its time index
scaled by $\sigma$ and its samples modulated by a phase
that depends on $b$.
The effect of $\sigma, b$ is to apply a hash function
$j \mapsto \sigma^{-1} j + b$ in frequency domain;
this is approximately pairwise independent, so
the frequencies become effectively randomly distributed.
Each frequency then has a good chance
of landing alone in its bucket, so we can recover
most frequencies in $O(k \log(n/k) \log n)$ time
and samples.
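One concrete way to realize such a frequency-domain hash is sketched below. The exact formula is an assumption chosen for illustration (the convention $x'_t = x_{\sigma t}\,\omega^{\sigma b t}$ with $\omega = e^{-2\pi i/n}$), not necessarily the one used in [13]; with this convention the new coefficient at frequency $f$ equals the old coefficient at $\sigma^{-1} f + b \pmod n$.

```python
import cmath
from math import gcd

def dft(v):
    """Naive O(n^2) DFT with kernel exp(-2*pi*i*f*t/n)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def permute_spectrum(x, sigma, b):
    """Return x'_t = x_{sigma*t mod n} * w^(sigma*b*t), where w = exp(-2*pi*i/n)."""
    n = len(x)
    if gcd(sigma, n) != 1:
        raise ValueError("sigma must be relatively prime to n")
    w = cmath.exp(-2j * cmath.pi / n)
    return [x[(sigma * t) % n] * w ** ((sigma * b * t) % n) for t in range(n)]

# Assumed toy values.
n, sigma, b = 16, 5, 3
x = [complex((3 * t) % 7, t % 5) for t in range(n)]

x_hat = dft(x)
xp_hat = dft(permute_spectrum(x, sigma, b))
sigma_inv = pow(sigma, -1, n)  # modular inverse (Python 3.8+)
# Claim checked in the test: xp_hat[f] == x_hat[(sigma_inv * f + b) % n].
```

Only time-domain samples of `x` are touched, so the permutation costs nothing extra beyond reading the samples at the scaled positions.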


Full k-Sparse Recovery 

Once we have partial $k$-sparse recovery, one
can naively achieve full $k$-sparse recovery by
repeating the algorithm $O(\log k)$ times. Since
each heavy frequency is recovered with 90%
probability in each stage, the median of all the
estimates will recover all the heavy frequencies,
and in fact achieve (1), with high probability.
This method is simple but loses a $\log k$ factor in
running time and sample complexity, which more
intricate techniques can avoid.

One such technique, used in [13] and based
on [11], is to use smaller and smaller $k$ in successive
iterations. Once we have performed partial
sparse recovery on $\hat{x}$ to get $\hat{x}'$ that contains
90% of the heavy hitters, we can then perform
sparse recovery on the residual $\hat{x} - \hat{x}'$. The
residual is then roughly $k/10$-sparse, so we run
a partial $k/10$-sparse recovery algorithm in the
second stage that is much faster than in the first
stage. Similar geometric decay happens in later
stages, so the total time spent will be dominated
by the first stage. This gives $O(k \log(n/k) \log n)$
time and sample complexity for the problem.
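The claim that the first stage dominates can be illustrated with a small calculation. This is a sketch under an assumed cost model of $k \log(n/k) \log n$ per stage with constants dropped and the sparsity shrinking by exactly 10 per stage; the function names are illustrative.

```python
from math import log2

def stage_cost(k, n):
    """Cost model k * log(n/k) * log(n) for one partial-recovery stage."""
    return k * log2(n / k) * log2(n)

def total_cost(k, n, shrink=10):
    """Sum the stage costs while the sparsity shrinks by `shrink` per stage."""
    total = 0.0
    while k >= 1:
        total += stage_cost(k, n)
        k /= shrink
    return total

n, k = 2 ** 20, 1000
ratio = total_cost(k, n) / stage_cost(k, n)  # total work relative to the first stage
```

For these values the ratio stays close to 1: the later stages together add only a small constant fraction to the cost of the first stage.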
Problem Definition


For a pair of numbers $\alpha, \beta$, $\alpha \ge 1$, $\beta \ge 0$,
a subgraph $G' = (V, H)$ of an unweighted
undirected graph $G = (V, E)$, $H \subseteq E$, is an
$(\alpha, \beta)$-spanner of $G$ if for every pair of vertices
$u, w \in V$, $\mathrm{dist}_{G'}(u, w) \le \alpha \cdot \mathrm{dist}_G(u, w) + \beta$,
where $\mathrm{dist}_G(u, w)$ stands for the distance between
$u$ and $w$ in $G$. It is desirable to show that for
every $n$-vertex graph there exists a sparse
$(\alpha, \beta)$-spanner with as small values of $\alpha$ and $\beta$ as
possible. The problem is to determine asymptotic
tradeoffs between $\alpha$ and $\beta$ on one hand, and the
sparsity of the spanner on the other.
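The definition can be checked directly on small graphs with breadth-first search. The sketch below is illustrative only; the helper names (`adjacency`, `bfs_dist`, `is_spanner`) and the example graph are assumptions, not part of the source.

```python
from collections import deque

def adjacency(vertices, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_dist(adj, s):
    """Unweighted shortest-path distances from s to every reachable vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_spanner(vertices, E, H, alpha, beta):
    """Check dist_H(u, w) <= alpha * dist_G(u, w) + beta for all connected pairs."""
    g, h = adjacency(vertices, E), adjacency(vertices, H)
    for u in vertices:
        dg, dh = bfs_dist(g, u), bfs_dist(h, u)
        for w, d in dg.items():
            if w not in dh or dh[w] > alpha * d + beta:
                return False
    return True

# Example: a 6-cycle, and the subgraph obtained by dropping the edge (5, 0).
V = list(range(6))
E = [(i, (i + 1) % 6) for i in range(6)]
H = E[:-1]
```

In this example the dropped edge forces a detour of length 5, so `H` is a (1, 4)-spanner and a (5, 0)-spanner of the cycle, but not a (1, 3)-spanner.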


Key Results 


The main result of Elkin and Peleg [8] establishes
the existence and efficient constructibility of
$(1+\epsilon, \beta)$-spanners of size $O(\beta n^{1+1/\kappa})$ for
every $n$-vertex graph $G$, where $\beta = \beta(\epsilon, \kappa)$
is constant whenever $\kappa$ and $\epsilon$ are. The
specific dependence of $\beta$ on $\kappa$ and $\epsilon$ is
$\beta(\kappa, \epsilon) = \kappa^{\log\log\kappa - \log\epsilon}$.

An important ingredient of the construction
of [8] is a partition of the graph $G$ into regions
of small diameter in such a way that the supergraph
induced by these regions is sparse. The
study of such partitions was initiated by Awerbuch [3],
who used them for network synchronization.
Peleg and Schäffer [10] were the first to
employ such partitions for constructing spanners.
Specifically, they constructed $(O(\kappa), 1)$-spanners
with $O(n^{1+1/\kappa})$ edges. Althöfer et al. [2] provided
an alternative proof of the result of Peleg
and Schäffer that uses an elegant greedy argument.
This argument also enabled Althöfer et
al. to extend the result to weighted graphs, to
improve the constant hidden by the $O$-notation
in the result of Peleg and Schäffer, and to obtain
related results for planar graphs.
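The source does not spell out the greedy argument. As it is commonly presented for unweighted graphs, the rule is to scan the edges and keep an edge only if its endpoints are currently more than $2\kappa - 1$ apart in the subgraph built so far. The following is a minimal sketch of that rule; the threshold $2\kappa - 1$ and all names are assumptions taken from the standard presentation, not from this entry.

```python
from collections import deque

def dist_within(adj, s, t, limit):
    """BFS distance from s to t in adj, or None if it exceeds `limit`."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        if dist[u] == limit:
            continue  # do not explore beyond the allowed stretch
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def greedy_spanner(vertices, edges, kappa):
    """Keep an edge only if its endpoints are more than 2*kappa - 1 apart so far."""
    stretch = 2 * kappa - 1
    adj = {v: set() for v in vertices}
    kept = []
    for u, v in edges:
        if dist_within(adj, u, v, stretch) is None:
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept

# Example: the complete graph on 4 vertices with kappa = 2 (stretch 3).
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
kept = greedy_spanner(range(4), K4, kappa=2)
```

Every discarded edge already has a path of length at most $2\kappa - 1$ between its endpoints, which is what bounds the stretch of the kept subgraph.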


Applications 

Efficient algorithms for computing sparse
$(1+\epsilon, \beta)$-spanners were devised in [7] and [13].
The algorithm of [7] was used in [7, 9, 12] for
computing almost shortest paths in centralized,
distributed, streaming, and dynamic centralized
models of computation. The basic approach used
in these results is to construct a sparse spanner,
and then to compute exact shortest paths on the
constructed spanner. The sparsity of the latter
guarantees that the computation of shortest paths
in the spanner is far more efficient than in the
original graph.


Open Problems 


The main open question is whether it is possible
to achieve similar results with $\epsilon = 0$. More
formally, the question is: Is it true that for any
$\kappa \ge 1$ and any $n$-vertex graph $G$ there exists
a $(1, \beta(\kappa))$-spanner of $G$ with $O(n^{1+1/\kappa})$ edges?
This question was answered in the affirmative for $\kappa$
equal to 2, 5/2, and 3 [1, 4-6, 8]. Some lower
bounds were recently proved by Woodruff [14].
A less challenging problem is to improve the
dependence of $\beta$ on $\epsilon$ and $\kappa$. Some progress
in this direction was achieved by Thorup and
Zwick [13], and very recently by Pettie [11].


Problem Definition


In the Sparsest Cut problem, informally, the goal 
is to partition a given graph into two or more large 
pieces while removing as few edges as possible. 
Graph partitioning problems such as this one oc- 
cupy a central place in the theory of network flow, 
geometric embeddings, and Markov chains, and 
form a crucial component of divide-and-conquer 
approaches in applications such as packet rout- 
ing, VLSI layout, and clustering. 

Formally, given a graph $G = (V, E)$, the sparsity
or edge expansion of a non-empty set $S \subset V$,
$|S| \le \frac{1}{2}|V|$, is defined as follows:

$$\alpha(S) = \frac{|E(S, V \setminus S)|}{|S|}.$$

The sparsity of the graph, $\alpha(G)$, is then defined as
follows:

$$\alpha(G) = \min_{S \subset V,\ |S| \le \frac{1}{2}|V|} \alpha(S).$$

The goal in the Sparsest Cut problem is to find
a subset $S \subset V$ with the minimum sparsity, and
to determine the sparsity of the graph.
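For very small graphs the definition can be evaluated by exhaustive search, which helps make the quantities concrete. The sketch below enumerates all candidate sets and is exponential in the number of vertices; it is an illustration of the definition, not one of the approximation algorithms discussed in this entry, and the function names are assumptions.

```python
from itertools import combinations

def sparsity(S, edges):
    """alpha(S) = |E(S, V \\ S)| / |S| for a non-empty vertex set S."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    return cut / len(S)

def sparsest_cut_bruteforce(vertices, edges):
    """Return (alpha(G), S) by trying every S with 1 <= |S| <= |V|/2."""
    best = None
    for size in range(1, len(vertices) // 2 + 1):
        for S in combinations(vertices, size):
            val = sparsity(S, edges)
            if best is None or val < best[0]:
                best = (val, set(S))
    return best

# Example: two triangles joined by the single edge (2, 3).
V = [0, 1, 2, 3, 4, 5]
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
alpha_G, S_best = sparsest_cut_bruteforce(V, E)
```

Here the sparsest cut separates the two triangles: one edge is cut and the smaller side has three vertices, giving sparsity 1/3.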

The first approximation algorithm for the
Sparsest Cut problem was developed by Leighton
and Rao in 1988 [13]. Employing a linear
programming relaxation of the problem, they
obtained an $O(\log n)$ approximation, where $n$ is
the size of the input graph. Subsequently Arora,
Rao and Vazirani [4] obtained an improvement
over Leighton and Rao’s algorithm using
a semi-definite programming relaxation, approximating
the problem to within an $O(\sqrt{\log n})$
factor.

In addition to the Sparsest Cut problem, Arora
et al. also consider the closely related Balanced
Separator problem. A partition $(S, V \setminus S)$ of the
graph $G$ is called a $c$-balanced separator for
$0 < c \le \frac{1}{2}$, if both $S$ and $V \setminus S$ have at least
$c|V|$ vertices. The goal in the Balanced Separator
problem is to find a $c$-balanced partition with
the minimum sparsity. This sparsity is denoted
$\alpha_c(G)$.


Key Results 


Arora et al. provide an $O(\sqrt{\log n})$ pseudo-approximation
to the balanced-separator problem
using semi-definite programming. In particular,
given a constant $c \in (0, \frac{1}{2}]$ they produce
a separator with balance $c'$ that is slightly worse
than $c$ (that is, $c' < c$), but sparsity within an
$O(\sqrt{\log n})$ factor of the sparsity of the optimal
$c$-balanced separator.


Theorem 1 Given a graph $G = (V, E)$, let
$\alpha_c(G)$ be the minimum edge expansion of
a $c$-balanced separator in this graph. Then
for every fixed constant $a < 1$, there exists
a polynomial-time algorithm for finding a
$c'$-balanced separator in $G$, with $c' \ge ac$, that has
edge expansion at most $O(\sqrt{\log n}\,\alpha_c(G))$.


Extending this theorem to include unbalanced 
partitions, Arora et al. obtain the following: 


Theorem 2 Let $G = (V, E)$ be a graph with
sparsity $\alpha(G)$. Then there exists a polynomial-time
algorithm for finding a partition $(S, V \setminus S)$,
with $S \subset V$, $S \ne \emptyset$, having sparsity at most
$O(\sqrt{\log n}\,\alpha(G))$.


An important contribution of Arora et al. is
a new geometric characterization of vectors
in $n$-dimensional space endowed with the
squared-Euclidean metric. This result is of
independent significance and has led to or
inspired improved approximation factors for
several other partitioning problems (see, for
example, [1, 5, 6, 7, 11]).

Informally, the result says that if a set of points
in $n$-dimensional space is randomly projected onto
a line, a good separator on the line is, with
high probability, a good separator (in terms of 
squared-Euclidean distance) in the original high- 
dimensional space. Separation on the line is re- 
lated to separation in the original space via the 
following definition of stretch. 


Definition 1 (Def. 4 in [4]) Let $x_1, x_2, \ldots, x_n$
be a set of $n$ points in $\mathbb{R}^n$, equipped with the
squared-Euclidean metric $d(x, y) = \|x - y\|_2^2$.
The set of points is said to be $(t, \gamma, \beta)$-stretched
at scale $\ell$, if for at least a $\gamma$ fraction of all the
$n$-dimensional unit vectors $u$, there is a partial
matching $M_u = \{(x_i, y_i)\}_i$ among these points,
with $|M_u| \ge \beta n$, such that for all $(x, y) \in M_u$,
$d(x, y) \le \ell^2$ and $\langle u, x - y \rangle \ge t\ell/\sqrt{n}$. Here $\langle \cdot, \cdot \rangle$
denotes the dot product of two vectors.


Theorem 3 For any $\gamma, \beta > 0$, there is a constant
$C = C(\gamma, \beta)$ such that if $t > C \log^{1/3} n$, then no
set of $n$ points in $\mathbb{R}^n$ can be $(t, \gamma, \beta)$-stretched for
any scale $\ell$.


In addition to the SDP-rounding algorithm, Arora
et al. provide an alternative algorithm for finding
approximate sparsest cuts, using the notion of expander
flows. This result leads to fast (quadratic
time) implementations of their approximation
algorithm [3].


Applications 


One of the main applications of balanced separators
is in improving the performance of divide
and conquer algorithms for a variety of optimization
problems.

One example is the Minimum Cut Linear
Arrangement problem. In this problem, the
goal is to order the vertices of a given $n$
vertex graph $G$ from 1 through $n$ in such
a way that the capacity of the largest of the
cuts $(\{1, 2, \ldots, i\}, \{i+1, \ldots, n\})$, $i \in [1, n]$,
is minimized. Given a $\rho$-approximation to the
balanced separator problem, the following divide
and conquer algorithm gives an $O(\rho \log n)$-approximation
to the Minimum Cut Linear
Arrangement problem: find a balanced separator
in the graph, then recursively order the two
parts, and concatenate the orderings. The
approximation follows by noting that if the graph
has a balanced separator with expansion $\alpha_c(G)$,
only $O(\rho n \alpha_c(G))$ edges are cut at every level,
and given that a balanced separator is found at
every step, the number of levels of recursion is at
most $O(\log n)$.
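The recursion described above can be sketched as follows. The balanced-separator step here is an exhaustive search used as a stand-in for an approximation algorithm, and the balance requirement of at least a third of the vertices on each side is an assumed choice; the names `cut_width`, `balanced_separator`, and `arrange` are illustrative.

```python
from itertools import combinations

def cut_width(order, edges):
    """Largest number of edges crossing any prefix/suffix cut of the ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((sum(1 for u, v in edges
                    if min(pos[u], pos[v]) <= i < max(pos[u], pos[v]))
                for i in range(len(order) - 1)), default=0)

def balanced_separator(vertices, edges):
    """Brute-force stand-in: sparsest S with ceil(n/3) <= |S| <= n/2."""
    n = len(vertices)
    best = None
    for size in range(max(1, -(-n // 3)), n // 2 + 1):
        for S in combinations(vertices, size):
            Sset = set(S)
            cut = sum(1 for u, v in edges if (u in Sset) != (v in Sset))
            if best is None or cut / size < best[0]:
                best = (cut / size, Sset)
    return best[1]

def arrange(vertices, edges):
    """Split by a balanced separator, order both parts recursively, concatenate."""
    if len(vertices) <= 1:
        return list(vertices)
    S = balanced_separator(vertices, edges)
    left = [v for v in vertices if v in S]
    right = [v for v in vertices if v not in S]
    inside = lambda part: [(u, v) for u, v in edges if u in part and v in part]
    return arrange(left, inside(set(left))) + arrange(right, inside(set(right)))

# Example: two triangles joined by the single edge (2, 3).
V = [0, 1, 2, 3, 4, 5]
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
order = arrange(V, E)
```

On this example the top-level separator splits the two triangles, so only the joining edge crosses the middle cut, and the largest cut of the resulting ordering has two edges.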

Similar approaches can be used for problems 
such as VLSI layout and Gaussian elimination. 
(See the survey by Shmoys [14] for more details 
on these topics.) 

The Sparsest Cut problem is also closely
related to the problem of embedding squared-Euclidean
metrics into the Manhattan ($\ell_1$)
metric with low distortion. In particular, the
integrality gap of Arora et al.’s semi-definite 
programming relaxation for Sparsest Cut 
(generalized to include weights on vertices and 
capacities on edges) is exactly equal to the 
worst-case distortion for embedding a squared- 
Euclidean metric into the Manhattan metric. 
Using the technology introduced by Arora et 
al., improved embeddings from the squared- 
Euclidean metric into the Manhattan metric have 
been obtained [5, 7]. 


Open Problems 


Hardness of approximation results for the
Sparsest Cut problem are fairly weak. Recently
Chuzhoy and Khanna [9] showed that this
problem is APX-hard, that is, there exists a constant
$\epsilon > 0$, such that a $(1+\epsilon)$-approximation
algorithm for Sparsest Cut would imply P = NP.
It is conjectured that the weighted version
of the problem is NP-hard to approximate
better than $O((\log\log n)^c)$ for some constant
$c$, but this is only known to hold true
assuming a version of the so-called Unique
Games conjecture [8, 12]. On the other hand,
the semi-definite programming relaxation of
Arora et al. is known to have an integrality
gap of $\Omega(\log\log n)$ even in the unweighted
case [10]. Proving an unconditional super-constant
hardness result for weighted or unweighted
Sparsest Cut, or obtaining $o(\sqrt{\log n})$-approximations
for these problems, remains
open.

The directed version of the Sparsest Cut problem
has also been studied, and is known to be
hard to approximate within a $2^{\Omega(\log^{1-\epsilon} n)}$ factor [9].
On the other hand, the best approximation
known for this problem only achieves
a polynomial factor of approximation, a factor of
$O(n^{11/23} \log^{O(1)} n)$ due to Agarwal, Alon and
Charikar [2].


Recommended Reading 


1. Agarwal A, Charikar M, Makarychev K, Makarychev 
Y (2005) Proceedings of the 37th ACM symposium 
on theory of computing (STOC), Baltimore, May 
2005, pp 573-581 

2. Agarwal A, Alon N, Charikar M (2007) Improved
approximations for directed cut problems. In: Proceedings
of the 39th ACM symposium on theory
of computing (STOC), San Diego, June 2007,
pp 671-680

3. Arora S, Hazan E, Kale S (2004) Proceedings of the 
45th IEEE symposium on foundations of computer 
science (FOCS), Rome, 17-19 Oct 2004, pp 238-247 

4. Arora S, Rao S, Vazirani U (2004) Expander flows, 
geometric embeddings, and graph partitionings. In: 
Proceedings of the 36th ACM symposium on the- 
ory of computing (STOC), Chicago, June 2004, 
pp 222-231 

5. Arora S, Lee J, Naor A (2005) Euclidean distortion 
and the sparsest cut. In: Proceedings of the 37th 
ACM Symposium on Theory of Computing (STOC), 
Baltimore, May 2005, pp 553-562 

6. Arora S, Chlamtac E, Charikar M (2006) New 
approximation guarantees for chromatic number. 
In: Proceedings of the 38th ACM symposium on 
theory of computing (STOC), Seattle, May 2006, 
pp 215-224 




7. Chawla S, Gupta A, Räcke H (2005) Embeddings
of negative-type metrics and an improved approximation
to generalized sparsest cut. In: Proceedings
of the ACM-SIAM symposium on discrete algorithms
(SODA), Vancouver, Jan 2005, pp 102-111

8. Chawla S, Krauthgamer R, Kumar R, Rabani Y, 
Sivakumar D (2005) On the hardness of approximat- 
ing sparsest cut and multicut. In: Proceedings of the 
20th IEEE conference on computational complexity 
(CCC), San Jose, June 2005, pp 144-153 

9. Chuzhoy J, Khanna S (2007) Polynomial flow-cut 
gaps and hardness of directed cut problems. In: 
Proceedings of the 39th ACM symposium on the- 
ory of computing (STOC), San Diego, June 2007, 
pp 179-188 


10. Devanur N, Khot S, Saket R, Vishnoi N
(2006) Integrality gaps for sparsest cut and
minimum linear arrangement problems. In:
Proceedings of the 38th ACM symposium on
theory of computing (STOC), Seattle, May 2006,
pp 537-546

11. Feige U, Hajiaghayi M, Lee J (2005) Improved ap- 
proximation algorithms for minimum-weight vertex 
separators. In: Proceedings of the 37th ACM sym- 
posium on theory of computing (STOC), Baltimore, 
May 2005, pp 563-572 

12. Khot S, Vishnoi N (2005) Proceedings of the 46th 
IEEE symposium on foundations of computer science 
(FOCS), Pittsburgh, Oct 2005, pp 53-62 

13. Leighton FT, Rao SB (1988) An approximate max- 
flow min-cut theorem for uniform multicommodity 
flow problems with applications to approximation 
algorithms. In: Proceedings of the 29th IEEE sym- 
posium on foundations of computer science (FOCS), 
White Plains, Oct 1988, pp 422-431 

14. Shmoys DB (1997) Cut problems and their
application to divide-and-conquer. In: Hochbaum
DS (ed) Approximation algorithms for NP-hard
problems. PWS Publishing, Boston,
pp 192-235

Speed Scaling 

Kirk Pruhs 


Department of Computer Science, University of 
Pittsburgh, Pittsburgh, PA, USA 


Keywords 


Frequency scaling; Speed scaling; Voltage scaling


Years and Authors of Summarized 
Original Work 


1995; Yao, Demers, Shenker 


Problem Definition 


Speed scaling is a power management technique
in modern processors that allows the processor to
run at different speeds. There is a power function
$P(s)$ that specifies the power, which is energy
used per unit of time, as a function of the speed.
In CMOS-based processors, the cube-root rule
states that $P(s) \approx s^3$. This is usually generalized
to assume that $P(s) = s^\alpha$ for some constant $\alpha$.
The goals of power management are to reduce
temperature and/or to save energy. Energy is
power integrated over time. Theoretical investigations
to date have assumed that there is a fixed
ambient temperature and that the processor cools
according to Newton’s law, that is, the rate of
cooling is proportional to the temperature difference
between the processor and the environment.

In the resulting scheduling problems, the 
scheduler must not only have a job-selection 
policy to determine the job to run at each time, 
but also a speed scaling policy to determine 
the speed at which to run that job. The 
resulting problems are generally dual objective 
optimization problems. One objective is some 
quality of service measure for the schedule, and 
the other objective is temperature or energy. 

We will consider problems where jobs arrive at
the processor over time. Each job $i$ has a release
time $r_i$ when it arrives at the processor, and
a work requirement $w_i$. A job $i$ run at speed $s$ takes
$w_i/s$ units of time to complete.
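Combining the power function with the running time gives the energy used by a job run at constant speed: power $s^\alpha$ for $w/s$ time units, that is, $w \cdot s^{\alpha-1}$. A small sketch, assuming $P(s) = s^\alpha$ exactly and using illustrative function names:

```python
def run_time(w, s):
    """Time to finish w units of work at constant speed s."""
    return w / s

def energy(w, s, alpha=3.0):
    """Energy = power * time = s**alpha * (w / s) = w * s**(alpha - 1)."""
    return (s ** alpha) * run_time(w, s)

# Example (assumed values): 8 units of work at speed 2 versus speed 1.
fast = (run_time(8, 2), energy(8, 2))  # finishes sooner, uses more energy
slow = (run_time(8, 1), energy(8, 1))  # takes twice as long, uses less energy
```

With $\alpha = 3$, halving the speed doubles the running time but cuts the energy by a factor of four, which is the trade-off the scheduling problems below balance.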


Key Results 


Yao et al. [5] initiated the theoretical algorithmic
investigation of speed scaling problems. Yao et
al. [5] assumed that each job $i$ had a deadline
$d_i$, and that the quality of service measure was
deadline feasibility (each job completes by its
deadline). Yao et al. [5] give a greedy algorithm
YDS to find the minimum energy feasible schedule.
The job selection policy for YDS is to run the
job with the earliest deadline. To understand the
speed scaling policy for YDS, define the intensity
of a time interval to be the work that must be
completed in this time interval divided by the
length of the time interval. YDS then finds the
maximum intensity interval, runs the jobs that
must be run in this interval at constant speed,
eliminates these jobs and this time interval from
the instance, and proceeds recursively. Yao et al.
[5] give two online algorithms: OA and AVR. In
OA the speed scaling policy is the speed that YDS
would run at, given the current state and given
that no more jobs will be released in the future.
In AVR, the rate at which each job is completed
is constant between the time that a job is released
and the deadline for that job. Yao et al. [5] showed
that AVR is $2^{\alpha-1}\alpha^\alpha$-competitive with respect to
energy.
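The speed policies of YDS and AVR described above can be sketched in a few lines. This is an unoptimized illustration under the assumption that every job has release time strictly before its deadline; the representation of jobs as `name -> (release, deadline, work)` and the function names are assumptions, and the sketch returns only speeds, not the full earliest-deadline schedule.

```python
def yds_speeds(jobs):
    """Speed assigned to each job by repeatedly removing a maximum-intensity interval."""
    jobs = dict(jobs)
    speeds = {}
    while jobs:
        best = None  # (intensity, t1, t2, names of jobs inside [t1, t2])
        for t1 in {r for r, _, _ in jobs.values()}:
            for t2 in {d for _, d, _ in jobs.values()}:
                if t2 <= t1:
                    continue
                inside = [j for j, (r, d, _) in jobs.items() if t1 <= r and d <= t2]
                g = sum(jobs[j][2] for j in inside) / (t2 - t1)
                if inside and (best is None or g > best[0]):
                    best = (g, t1, t2, inside)
        g, t1, t2, inside = best
        for j in inside:
            speeds[j] = g
            del jobs[j]

        def squeeze(t):
            """Remove the interval [t1, t2] from the time line."""
            if t <= t1:
                return t
            return t1 if t < t2 else t - (t2 - t1)

        jobs = {j: (squeeze(r), squeeze(d), w) for j, (r, d, w) in jobs.items()}
    return speeds

def avr_speed(jobs, t):
    """AVR: sum of the densities w/(d - r) of the jobs whose window contains t."""
    return sum(w / (d - r) for r, d, w in jobs.values() if r <= t < d)

def total_energy(jobs, speeds, alpha=3.0):
    """Energy of running each job at its assigned constant speed, P(s) = s**alpha."""
    return sum(w * speeds[j] ** (alpha - 1) for j, (_, _, w) in jobs.items())

# Example (assumed values): a long loose job A and a short urgent job B.
jobs = {"A": (0, 10, 5), "B": (2, 4, 4)}
speeds = yds_speeds(jobs)
```

In the example, the interval [2, 4] has the highest intensity, so B runs at speed 2; after that interval is removed, A has 8 time units left for 5 units of work.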

The results in [5] were extended in [2]. Bansal
et al. [2] showed that OA is $\alpha^\alpha$-competitive with
respect to energy. Bansal et al. [2] proposed
another online algorithm, BKP. BKP runs at the
speed of the maximum intensity interval containing
the current time, taking into account only the
work that has been released by the current time.
They show that the competitiveness of BKP with
respect to energy is at most $2(\alpha/(\alpha-1))^\alpha e^\alpha$.
They also show that BKP is $e$-competitive with
respect to the maximum speed.

Bansal et al. [2] initiated the theoretical
algorithmic investigation of speed scaling to
manage temperature. Bansal et al. [2] showed
that the deadline feasible schedule that minimizes
maximum temperature can in principle be
computed in polynomial time. Bansal et al.
[2] showed that the competitiveness of BKP
with respect to maximum temperature is at most
$2^{\alpha+1} e^\alpha (6(\alpha/(\alpha-1))^\alpha + 1)$.

Pruhs et al. [4] initiated the theoretical algorithmic
investigation into speed scaling when
the quality-of-service objective is average/total
flow time. The flow time of a job is the delay
from when a job is released until it is completed.
Pruhs et al. [4] give a rather complicated
polynomial-time algorithm to find the optimal
flow time schedule for unit work jobs, given
a bound on the energy available. It is easy to see
that no O(1)-competitive algorithm exists for this
problem.

Albers and Fujiwara [1] introduce the objective
of minimizing a linear combination of energy
used and total flow time. This has a natural
interpretation if one imagines the user specifying
how much energy he is willing to use to increase
the flow time of a job by a unit amount. Albers
and Fujiwara [1] give an O(1)-competitive online
algorithm for the case of unit work jobs. Bansal
et al. [3] improve upon this result and give a
4-competitive online algorithm. The speed scaling
policies of the online algorithms in [1] and [3]
essentially run at power equal to the number
of unfinished jobs (in each case modified in
a particular way to facilitate analysis of the algorithm).
Bansal et al. [3] extend these results
to apply to jobs with arbitrary work, and even
arbitrary weight. The speed scaling policy is
essentially to run at power equal to the weight
of the unfinished work. The expression for the
resulting competitive ratio is a bit complicated
but is approximately 8 when the cube-root rule
holds.

The analysis of the online algorithms in [2] 
and [3] heavily relied on amortized local com- 
petitiveness. An online algorithm is locally com- 
petitive for a particular objective if for all times 
the rate of increase of that objective for the 
online algorithm, plus the rate of change of some 
potential function, is at most the competitive ratio 
times the rate of increase of the objective in any 
other schedule. 
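Integrating the local condition over time shows why it yields a global competitive bound. The following is a sketch, where $A(t)$ and $\mathrm{OPT}(t)$ denote the objective accumulated up to time $t$ by the online algorithm and by the comparison schedule, $\Phi$ is the potential function, and $c$ is the competitive ratio; these symbol names are assumptions rather than the notation of [2] or [3], and $\Phi$ is assumed to have no upward jumps at discrete events.

```latex
\frac{dA}{dt} + \frac{d\Phi}{dt} \;\le\; c \cdot \frac{d\,\mathrm{OPT}}{dt}
\quad \text{for all } t
\qquad\Longrightarrow\qquad
A(T) + \Phi(T) - \Phi(0) \;\le\; c \cdot \mathrm{OPT}(T).
```

If in addition $\Phi(0) = 0$ and $\Phi(T) \ge 0$ at the end of the schedule, this gives $A(T) \le c \cdot \mathrm{OPT}(T)$.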


Applications 


None 


Open Problems 


The outstanding open problem is probably to
determine if there is an efficient algorithm to
compute the optimal flow time schedule given
a fixed energy bound.


Problem Definition


The sphere packing problem seeks to pack
spheres into a given geometric domain. The
problem is an instance of geometric packing.
Geometric packing is a venerable topic in
mathematics. Various versions of geometric
packing problems have been studied, depending
on the shapes of packing domains, the types
of packing objects, the position restrictions
on the objects, the optimization criteria, the
dimensions, etc. It also arises in numerous
applied areas. The sphere packing problem
under consideration here finds applications in
radiation cancer treatment using Gamma Knife
systems. Unfortunately, even very restricted
versions of geometric packing problems (e.g.,
regular-shaped objects and domains in lower
dimensional spaces) have been proved to be
NP-hard. For example, for congruent packing
(i.e., packing copies of the same object), it is
known that the 2-D cases of packing fixed-sized
congruent squares or disks in a simple polygon
are NP-hard [7]. Baur and Fekete [2] considered
a closely related dispersion problem of packing
$k$ congruent disks in a polygon of $n$ vertices
such that the radius of the disks is maximized;
they proved that the dispersion problem cannot
be approximated arbitrarily well in polynomial
time unless P = NP, and gave a 2-approximation
algorithm for the $L_\infty$ disk case with a time bound
of $O(n^{38})$.

Chen et al. [4] proposed a practically efficient 
heuristic scheme, called pack-and-shake, for the 
congruent sphere packing problem, based on 
computational geometry techniques. The prob- 
lem is defined as follows. 


The Congruent Sphere Packing Problem
Given a d-D polyhedral region $R$ ($d = 2, 3$) of $n$
vertices and a value $r > 0$, find a packing $SP$ of $R$
using spheres of radius $r$, such that (i) each sphere
is contained in $R$, (ii) no two distinct spheres
intersect each other in their interior, and (iii) the
ratio (called the packing density) of the covered
volume in $R$ by $SP$ over the total volume of $R$ is
maximized.

In the above problem, one can view the
spheres as “solid” objects. The region $R$ is also
called the domain or container. Without loss of
generality, let $r = 1$.
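Conditions (i) and (ii) and the packing density are easy to state in code for the simplest 2-D case. The sketch below assumes the domain is an axis-aligned rectangle `[0, width] x [0, height]` (the problem itself allows general polyhedral regions); the function names and the small tolerance `EPS` are illustrative choices.

```python
from math import pi, hypot

EPS = 1e-9  # tolerance for floating-point comparisons

def is_valid_packing(centers, width, height, r=1.0):
    """Check containment in the rectangle and interior-disjointness of the disks."""
    for i, (x, y) in enumerate(centers):
        inside = (r - EPS <= x <= width - r + EPS) and (r - EPS <= y <= height - r + EPS)
        if not inside:
            return False
        for x2, y2 in centers[i + 1:]:
            if hypot(x - x2, y - y2) < 2 * r - EPS:  # interiors overlap
                return False
    return True

def packing_density(centers, width, height, r=1.0):
    """Covered area divided by the area of the rectangle (for a valid packing)."""
    return len(centers) * pi * r * r / (width * height)

# Example: four unit disks in a 4 x 4 square, touching but not overlapping.
square_packing = [(1, 1), (3, 1), (1, 3), (3, 3)]
density = packing_density(square_packing, 4, 4)
```

Touching disks are allowed because only the interiors must be disjoint; the example achieves density $\pi/4$.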

Much work on congruent sphere packing studied
the case of packing spheres into an unbounded
domain or even the whole space [5]. There are
also results on packing congruent spheres into
a bounded region. Hochbaum and Maass [8] presented
a unified and powerful shifting technique
for designing pseudo-polynomial time approximation
schemes for packing congruent squares
into a rectilinear polygon. But the high time complexities
associated with the resulting algorithms
restrict their applicability in practice. Another
approach is to formulate a packing problem as
a non-linear optimization problem, and resort to
available optimization software to generate
packings; however, this approach works well only
for small problem sizes and regular-shaped domains.

To reduce the running time yet achieve a dense 
packing, a common idea is to consider objects 
that form a certain lattice or double-lattice. 
A number of results were given on lattice packing 
of congruent objects in the whole (especially 
high dimensional) space [5]. For a bounded 
rectangular 2-D domain, Milenkovic [10] 
adopted a method that first finds the densest 
translational lattice packing for a set of polygonal 
objects in the whole plane, and then uses 
some heuristics to extract the actual bounded 
packing. 


Key Results 


The pack-and-shake scheme of Chen et al. [4]
for packing congruent spheres in an irregular-shaped
2-D or 3-D bounded domain $R$ consists
of three phases. In the first phase, the d-D
domain $R$ is partitioned into a set of convex
subregions (called cells). The resulting set of
cells defines a dual graph $G_D$, such that each
vertex $v$ of $G_D$ corresponds to a cell $C(v)$ and
an edge connects two vertices if and only if
their corresponding cells share a $(d-1)$-D face.
In the second phase, the algorithm repeats the
following trimming and packing process until
$G_D = \emptyset$: Remove the lowest degree vertex
$v$ from $G_D$ and pack the cell $C(v)$. In the
third phase, a shake procedure is applied to
globally adjust the packing to obtain a denser
one.
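The trimming order of the second phase can be sketched on an abstract dual graph. The code below only computes the order in which cells would be packed; the packing of each cell is not modeled, and the tie-breaking rule (smallest label among vertices of equal degree) is an assumption, since the source does not specify one.

```python
def trimming_order(adj):
    """Repeatedly remove a lowest-degree vertex from the dual graph.

    adj maps each cell to the set of cells sharing a face with it.
    """
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))  # lowest degree, then label
        order.append(v)
        for u in adj.pop(v):
            adj[u].discard(v)  # the removed cell no longer counts as a neighbor
    return order

# Example dual graph (assumed): cell 1 is adjacent to cells 0, 2, and 3.
dual = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
order = trimming_order(dual)
```

In the example, the leaf cells 0 and 2 are trimmed before the central cell 1, so the cells that remain to be packed stay connected through cell 1 for as long as possible.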

The objective of the trimming and packing
procedure is that after each cell is packed, the
remaining “packable” subdomain $R'$ of $R$ is always
kept as a connected region. The rationale for
maintaining the connectivity of $R'$ is as follows.
To pack spheres in a bounded domain $R$, two
typical approaches have been used: (a) packing
spheres layer by layer going from the boundary
of $R$ towards its interior [9], and (b) packing
spheres starting from the “center” of $R$, such
as its medial axis, towards its boundary [3, 13,
14]. Due to the shape irregularity of $R$, both
approaches may fragment the remaining “packable”
subdomain $R'$ into more and more disconnected
regions; at the end of packing
each such region, a small “unpackable” area
may eventually remain that allows no further
packing. More spheres could fit if the “packable”
subdomain $R'$ were kept together instead
of being divided into fragments, which is what
the trimming and packing procedure aims to
achieve.

Because adjacent cells may already have been
packed by the trimming and
packing procedure, the boundary of a cell
$C(v)$ that is to be packed may consist of both
line segments and arcs (from packed spheres).
Hence, a key problem is to pack spheres in
a cell bounded by curves of low degrees. Chen
et al.’s algorithms [4] for packing each cell are
based on certain lattice structures and allow
the cell to both translate and rotate. Their
algorithms have fairly low time bounds. In
certain cases, they even run in nearly linear
time.

An interesting feature of the cell packings
generated by the trimming and packing
procedure is that the resulting spheres
cluster together in the middle of the cells
of the domain $R$, leaving some small unpackable
areas scattered along the boundary
of $R$. The “shake” procedure in [4]
thus seeks to collect these small areas together
by “pushing” the spheres towards
the boundary of $R$, in the hope of obtaining
some “packable” region in the middle
of $R$.

The approach in [4] is to first obtain a densest
lattice unit sphere packing $LSP(C)$ for each cell
$C$ of $R$, and then use a “shake” procedure to
globally adjust the resulting packing of $R$ to
generate a denser packing $SP$ in $R$. Suppose
the plane $P$ is already packed by infinitely
many unit spheres whose center points form
a lattice (e.g., the hexagonal lattice). To obtain
a densest packing $LSP(C)$ for a cell $C$ from the
lattice packing of the plane $P$, a position and
orientation of $C$ on $P$ need to be computed
such that $C$ contains the maximum number
of spheres from the lattice packing of $P$.
There are two types of algorithms in [4] for
computing an optimal placement of $C$ on $P$:
translational algorithms that allow $C$ to be
translated only, and translational/rotational
algorithms that allow $C$ to be both translated
and rotated.
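The translational placement problem can be illustrated with a coarse grid search. This is only a stand-in for intuition: the algorithms of [4] are exact and arrangement-based, whereas the sketch below assumes a rectangular cell, a hexagonal lattice of disks, and a fixed grid of candidate translations, and may therefore miss the true optimum. All names and parameters are illustrative.

```python
from math import sqrt, floor, ceil

EPS = 1e-9  # tolerance for floating-point comparisons

def count_disks(width, height, dx, dy, r=1.0):
    """Count hexagonal-lattice disks of radius r, lattice translated by (dx, dy),
    that lie inside the rectangle [0, width] x [0, height]."""
    row_gap = sqrt(3.0) * r
    count = 0
    for j in range(floor(-dy / row_gap) - 1, ceil((height - dy) / row_gap) + 2):
        y = dy + j * row_gap
        if y < r - EPS or y > height - r + EPS:
            continue
        offset = r if j % 2 else 0.0  # odd rows are shifted by r
        start = floor(-(dx + offset) / (2 * r)) - 1
        stop = ceil((width - dx - offset) / (2 * r)) + 2
        for i in range(start, stop):
            x = dx + offset + 2 * r * i
            if r - EPS <= x <= width - r + EPS:
                count += 1
    return count

def best_translation(width, height, steps=20, r=1.0):
    """Try a steps x steps grid of translations over one lattice period."""
    best = (-1, 0.0, 0.0)
    for a in range(steps):
        for b in range(steps):
            dx = 2 * r * a / steps
            dy = 2 * sqrt(3.0) * r * b / steps
            c = count_disks(width, height, dx, dy, r)
            if c > best[0]:
                best = (c, dx, dy)
    return best

best_count, best_dx, best_dy = best_translation(4.0, 4.0)
```

For a 4 x 4 cell the hexagonal lattice fits at most three unit disks under translation alone, fewer than the four of a square arrangement, which hints at why [4] also considers rotation and a final shake step.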

Let n = |C|, the number of bounding curves 
of C, and m be the number of spheres along 
the boundary of C in a sought optimal packing 
of C. 


Theorem 1 Given a polygonal region $C$
bounded by $n$ algebraic curves of constant
degrees, a densest lattice unit sphere packing
of $C$ based only on translational motion can
be computed in $O(N \log N + K)$ time, where
$N = f(n, m)$ is a function of $n$ and $m$, and $K$
is the number of intersections between $N$ planar
algebraic curves of constant degrees that are
derived from the packing instance.


Note: In the worst case, $N = f(n, m) = n \times m$.
But in practice, $N$ may be much smaller. The
$N$ planar algebraic curves in Theorem 1 form
a structure called an arrangement. Since all these
curves are of a constant degree, any two such
curves can intersect each other at most a constant
number of times. In the worst case, the number
$K$ of intersections between the $N$ algebraic
curves, which is also the size of the arrangement,
is $O(N^2)$. The arrangement of these curves
can be computed by the algorithms of [1, 6] in
$O(N \log N + K)$ time.


Theorem 2 Given a polygonal region $C$
bounded by $n$ algebraic curves of constant
degrees, a densest lattice unit sphere
packing of $C$ based on both translational
and rotational motions can be computed
in $O(T(n) + (N + K') \log N)$ time, where
$N = f(n, m)$ is a function of $n$ and $m$,
$K'$ is the size of the arrangement of $N$
pseudo-plane surfaces in 3-D that are derived
from the packing instance, and $T(n)$ is the
time for solving $O(n^2)$ quadratic optimization
problem instances associated with the packing
instance.


In Theorem 2, K' = O(N^3) in the worst case. In 
practice, K' can be much smaller. 

The results on 2-D sphere packing in [4] 
can be extended to d-D for any constant integer 
d ≥ 3, so long as a good d-D lattice packing of 
the d-D space is available. 


Applications 


Recent interest in the considered congruent 
sphere packing problem was motivated by 
medical applications in Gamma Knife radiosurgery 
[4, 11, 12]. Radiosurgery is a minimally 
invasive surgical procedure that uses radiation 
to destroy tumors inside the human body while 
sparing the normal tissues. The Gamma Knife 
is a radiosurgical system that consists of 201 
Cobalt-60 sources [3, 14]; the gamma-rays from 
these sources are all focused on a common 
center point, thus creating a spherical volume 
of radiation field. The Gamma Knife treatment 
normally applies a high radiation dose. In this 
setting, overlapping spheres may result in 
overdose regions (called hot spots) in the target 
treatment domain, while a low packing density 
may cause underdose regions (called cold spots) 
and a non-uniform dose distribution. Hence, one 
may view the spheres used in Gamma Knife 
packing as “solid” spheres. Therefore, a key 
geometric problem in Gamma Knife treatment 
planning is to fit multiple spheres into a 3-D 
irregular-shaped tumor [3, 13, 14]. The total 
treatment time crucially depends on the number 
of spheres used. Subject to a given packing 
density, the minimum number of spheres used 
in the packing (i.e., treatment) is desired. The 
Gamma Knife currently produces spheres of 
four different radii (4, 8, 14, and 18 mm), and 
hence the Gamma Knife sphere packing is in 
general not congruent. In practice, a commonly 
used approach is to pack larger spheres first, 
and then fit smaller spheres into the remaining 
subdomains, in the hope of reducing the total 
number of spheres involved and thus shortening 
the treatment time. Therefore, congruent sphere 
packing can be used as a key subroutine for such 
a common approach. 


Open Problems 


An open problem is to analyze the quality bounds 
of the resulting packing for the algorithms in [4]; 
such packing quality bounds are currently not yet 
known. Another open problem is to reduce the 
running time of the packing algorithms in [4], 
since these algorithms, especially for sphere 
packing problems in higher dimensions, are still 
very time-consuming. In general, it is highly 
desirable to develop efficient sphere packing 
algorithms in d-D (d ≥ 2) with guaranteed good 
packing quality. 


Experimental Results 


Some experimental results of the 2-D pack- 
and-shake sphere packing algorithms were 
given in [4]. The planar hexagonal lattice 
was used for the lattice packing. On packings 
whose sizes are in the hundreds, the C++ 
programs of the algorithms in [4] based only 
on translational motion run very fast (a few 
minutes), while those of the algorithms based 
on both translation and rotation take much 
longer (hours), reflecting their respective 
theoretical time bounds, as expected. On the 
other hand, the packing quality of the translation- 
and-rotation based algorithms is a little better 
than that of the translation based algorithms. The 
packing densities of all the algorithms in the 
experiments are well above 70% and some are 
even close to or above 80%. Compared with the 
nonconvex programming methods, the packing 
algorithms in [4] seemed to run faster based on 
the experiments. 


Problem Definition 


Introduced by Cunningham and Edmonds [11], 
the split decomposition, also known as the join 
(or 1-join) decomposition, ranks among the 
classical graph decomposition schemes. Given a 
graph G = (V, E), a bipartition (A, B) of the 
vertex set V (with |A| ≥ 2 and |B| ≥ 2) is a split 
if there are subsets A' ⊆ A and B' ⊆ B, called 
frontiers, such that there is an edge between a 
vertex u ∈ A and v ∈ B if and only if u ∈ A' 
and v ∈ B' (see Fig. 1). A graph is prime if 
it does not contain any split. Observe that an 
induced cycle of length at least 5 is a prime 
graph. A graph is degenerate if every bipartition 
(A, B) with |A| ≥ 2 and |B| ≥ 2 is a split. 
It can be shown that degenerate graphs are 
either cliques or stars. The split decomposition 
consists in recursively decomposing a graph into 
a set of disjoint graphs {G_1, ..., G_k}, called split 
components, each of which is either prime or 
degenerate. There are two cases: 


1. If G is prime or degenerate, then return the set 
{G}; 

2. If G is neither prime nor degenerate, it contains 
a split (A, B), with frontiers A' and B'. 
The split components of G are then the union 
of the split components of the graphs G[A] + 
(a, A') and G[B] + (b, B'), where a and b are 
new vertices, called markers. 


Split Decomposition via Graph-Labelled Trees, Fig. 
1 A circle graph G with a chord diagram on the right 
and its split decomposition tree ST(G) on the left. The 
nodes v and y are prime nodes, whereas u is a star node 
and w a clique node. The bipartition ({f, g, h, i}, V \ 
{f, g, h, i}) forms a split of G and corresponds to a tree 
edge of ST(G). The frontiers are {f, i} on one side and 
{e, j, k, l} on the other. Observe that ({k, l}, V \ {k, l}) 
is also a split, but it is not represented by the tree edge 
between nodes Y and Z in ST(G). Because G is not 
a prime graph, it can be represented by several chord 
diagrams. For example, exchanging the chord of y with the 
chord of z yields an alternative chord diagram 


Observe that the split decomposition process naturally 
defines a decomposition tree whose nodes 
represent the split components. This decomposition 
tree can be represented by a graph-labeled 
tree (GLT) (see [16, 18]) defined as a pair (T, F), 
where T is a tree and F a set of graphs, such 
that each node u of T is labeled by the graph 
G(u) ∈ F, and there exists a bijection ρ_u between 
the edges of T incident to u and the vertices 
of G(u), called marker vertices. We say that 
two leaves ℓ_a and ℓ_b of T are accessible if 
for every pair of consecutive tree edges uv and 
vw on the path from ℓ_a to ℓ_b in T, ρ_v(uv) 
and ρ_v(vw) are adjacent in G(v). From a GLT 
(T, F), we define an accessibility graph G(T, F) 
whose vertex set is the leaf set of T and two 
vertices a and b are adjacent if the corresponding 
leaves ℓ_a and ℓ_b are accessible. It is easy to 
observe that every tree edge e of a GLT (T, F) 
defines a split (A, B) of G(T, F) where A and 
B respectively contain the vertices corresponding 
to the leaves of the two connected components of 
T − e. Cunningham and Edmonds [11] formalized 
the family of splits as an example of partite 
family of bipartitions thereby implying that every 
graph admits a canonical split decomposition tree 
(see Fig. 1). In terms of GLTs, this translates as 
follows: 


Theorem 1 ([11, 16, 18]) Let G be a connected 
graph. There exists a unique GLT (T, F) whose 
labels are either prime or degenerate, having 
a minimal number of nodes and such that 
G = G(T, F). This GLT is called the split tree of 
G and denoted ST(G). 


The problem we are interested in is to efficiently 
compute the split tree ST(G) of a graph 
G = (V, E). The first polynomial-time algorithm 
runs in time O(nm), where n = |V| and 
m = |E|. Ma and Spinrad [23] later developed an 
O(n^2) algorithm. Finally, Dahlhaus [12] designed 
the first linear-time algorithm, which was recently 
revisited by Charbit et al. [5]. 


Key Results 


As mentioned above, the split tree of a graph can 
be computed in linear time. The algorithm we 
describe here is nearly optimal, that is, runs in 
time O(n + m) · α(n + m), where α is the inverse 
Ackermann function. The fact that this algorithm 
incrementally builds the split tree is responsible 
for the small additional complexity cost. More 
precisely, updating the tree structure of the GLT 
representing the split tree relies on the union-find 
data structure [15]. However, having an incremental 
split decomposition algorithm allows an 
extension of the algorithm, within the same time 
complexity, to circle graph recognition [17], a 
problem for which computing the split decomposition 
is a cornerstone. So far, no subquadratic 
circle graph recognition algorithm has been obtained 
from the previous linear (or quadratic) split 
decomposition algorithms. 


Theorem 2 ([18]) The split tree ST(G) of 
a graph G = (V, E), with |V| = n and 
|E| = m, can be built incrementally according 
to an LBFS ordering in time O(n + m) · 
α(n + m), where α is the inverse Ackermann 
function. 


It is important to observe that to reach the 
expected complexity, the algorithm inserts the 
vertices according to a LexBFS ordering [25]. 
These orderings, resulting from a lexicographic 
breadth-first search, appear in a number of recog- 
nition algorithms, such as chordal graphs [25], 
comparability graphs [20], interval graphs [22], 
and cographs [3]. The idea is that structural 
properties can be shown on the last vertex visited 
by a LexBFS. For example, in chordal graphs 
the last vertex is simplicial; in comparability 
graphs it is a source of some transitive orienta- 
tion. LexBFS, introduced in [25], works as fol- 
lows: it numbers the vertices decreasingly from 
n = |V| down to 1; initially every vertex re- 
ceives an empty label; then iteratively, an ar- 
bitrary unnumbered vertex x with lexicograph- 
ically largest label is selected and numbered i, 
and i is appended to the label of every unnumbered 
neighbor of x. On the graph of Fig. 1, 
σ = b, a, e, d, c, f, i, j, k, l, h, g is a LexBFS 
ordering. 
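
The labeling procedure just described can be sketched directly. The version below is a simple quadratic-time illustration of the label-based definition, not the linear-time partition-refinement implementation; the function name `lex_bfs`, the dictionary-of-neighbors input format, and the optional `start` argument are assumptions made for this sketch.

```python
def lex_bfs(adj, start=None):
    """Return the vertices of the graph in LexBFS order.

    adj maps each vertex to an iterable of its neighbors.
    Vertices are numbered from n down to 1; the label of a vertex is the
    list of numbers of its already numbered neighbors (in decreasing order).
    """
    n = len(adj)
    label = {v: [] for v in adj}
    number = {}
    order = []
    for i in range(n, 0, -1):
        unnumbered = [v for v in adj if v not in number]
        if i == n and start is not None:
            x = start
        else:
            # Python compares lists lexicographically, which is exactly the
            # order needed here (a proper prefix is smaller).
            x = max(unnumbered, key=lambda v: label[v])
        number[x] = i
        order.append(x)
        for y in adj[x]:
            if y not in number:
                label[y].append(i)
    return order
```

For instance, on the path a, b, c, d started at `a`, the sketch visits the vertices in the order a, b, c, d. Ties between equal labels are broken arbitrarily (here, by dictionary order), so a graph generally has several LexBFS orderings.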


Applications 


Many graph classes can be characterized by 
means of the split decomposition. Below, 
we review the most important of these 
classes. Finally, we discuss the links between 
split decomposition and other decomposition 
approaches. 


Graph Classes 


Distance Hereditary Graphs 

The family of graphs for which the split tree 
does not contain any prime node is called to- 
tally decomposable (or totally separable). This 
terminology follows from the observation that for 
every subgraph of size at least 4, every nontrivial 
bipartition of the vertex set forms a split. A graph 
G is distance hereditary [1] if for every induced 
connected subgraph H of G and every pair of 
vertices x and y of H, the distance between x 
and y is the same in H and G. It turns out that 
a graph G is totally decomposable if and only 
if it is distance hereditary [1]. In other words, 
a graph G is distance hereditary if and only if 
every node of ST(G) is either a star or a clique 
node. The first linear-time recognition algorithm 
for distance hereditary graphs, due to [21], relies 
on a breadth-first search characterization (see 
also [13]). More recently, a linear-time algorithm 
has been designed to update the split tree of a 
distance hereditary graph under vertex and edge 
insertion, leading to an alternative (vertex in- 
cremental) linear-time recognition algorithm for 
distance hereditary graphs. 


Theorem 3 ([16]) Let ST(G) be the split tree of 
a distance hereditary graph G = (V, E), S ⊆ V 
be a subset of vertices of G and e = (x, y) ∉ E 
be a non-edge of G. Then: 


• In O(1)-time, we can compute ST(G + e) 
where G + e = (V, E ∪ {e}) if G + e is distance 
hereditary; 

• In O(|S|)-time, we can compute ST(G + 
(x, S)) where G + (x, S) = (V ∪ {x}, E ∪ 
{(x, y) | y ∈ S}) if G + (x, S) is distance 
hereditary. 


Subclasses of Totally Decomposable Graphs 

A GLT is called clique-star tree if its nodes are 
labeled either with cliques or stars. As a consequence 
of the discussion in the previous paragraph, 
distance hereditary graphs are the graphs 
corresponding to clique-star trees. Imposing any 
constraint on a clique-star tree thereby immedi- 
ately defines a subclass of distance hereditary 
graphs. It turns out that many important graph 
subclasses of distance hereditary graphs can be 
characterized with the split decomposition. 

The cographs, also known as complement- 
reducible graphs [8] or P4-free graphs, are prob- 
ably the most studied subclass of distance heredi- 
tary graphs. Cographs are also known as the class 
of graphs totally decomposable with respect to 
the modular decomposition [19], and their com- 
binatorial structure is captured by the so-called 
cotree. As noticed in [16], it is easy to observe 
that a graph G is a cograph if and only if its split 
tree ST(G) is a clique-star tree that can be rooted 
either at a node or at a tree edge such that every 
star node is “oriented” toward that root (that is, the 
marker vertex corresponding to the center of every 
star node is oriented toward the root). 

The classes of ptolemaic graphs and 3-leaf powers 
are also interesting. The class of ptolemaic graphs 
is defined as the intersection of distance hereditary 
graphs and chordal graphs. Chordal graphs 
are the graphs without induced chordless cycles 
of length four or more. It follows that a graph 
G is ptolemaic if and only if ST(G) is a clique-star 
tree such that for every pair of star nodes u 
and v, not both extremities of the path from u 
to v in ST(G) are attached to the center marker 
vertex of u and v (otherwise this would generate 
a chordless 4-cycle). As a subclass of chordal 
graphs, 3-leaf powers inherit the restrictions of 
ptolemaic graphs on the split tree with the additional 
constraint that no clique node lies on the path 
between two star nodes (see [16] for details). 


Circle Graphs 

The split decomposition plays an important role 
in the context of circle graphs defined as inter- 
section graphs of a set of chords in a circle. The 
main reason is that a graph G is a circle graph 
if and only if every split component of G is a 
circle graph. In other words, as cliques and stars 
are circle graphs, G is a circle graph if and only if 
the prime nodes of ST(G) are labeled with circle 
graphs. Observe that this characterization shows 
that distance hereditary graphs form a subclass 
of circle graphs. Incidentally, the first quadratic-time 
circle graph recognition algorithm was obtained 
by computing the split decomposition of 
the input graph and reducing the problem to the 
recognition of prime circle graphs [23, 26]. The 
key property is that a prime circle graph has a 
unique (up to mirror) chord diagram [2, 14] (see 
Fig. 2). 


Split Decomposition via Graph-Labelled Trees, Fig. 2 
On the left, two distinct chord diagrams of the graph G 
depicted in Fig. 1 result from symmetric insertion of the 
chords representing the vertices {f, g, h, i} (recall that 
({f, g, h, i}, V \ {f, g, h, i}) forms a split). On the right, 
the chord diagram on {1, 2, 3, 4, 5} is the unique (up to 
rotation and mirror) chord diagram of the 5-cycle, which 
is a prime graph 


The linear-time split decomposition algorithm 
[12], proposed in the mid-1990s, 
did not lead to a linear-time circle graph 
recognition algorithm. For almost two decades, 
the quadratic time complexity [27] remained the 
best known complexity. The quadratic barrier 
has been broken using the almost linear-time 
split decomposition algorithm of [17]. The key 
ingredient was to insert the vertices according 
to a LexBFS ordering. Indeed, in the unique 
chord diagram of a prime circle graph G, the 
neighborhood of the last vertex x of a LexBFS 
ordering satisfies a sort of consecutiveness 
property. More precisely, the chord diagram of 
G contains a set of consecutive chord extremities 
starting and ending with the extremities of x’s 
chord and containing one and only one chord 
extremity per neighbor of x and no chord 
extremity of non-neighbors of x. This property 
is used to incrementally build the split tree of a 
circle graph using chord diagrams to represent 
prime nodes. It is worth observing that the split 
tree of a circle graph G together with the chord 
diagrams of each of its prime nodes provides a 
canonical (linear space) representation of the set 
of (exponentially many) chord diagrams of G. 


Theorem 4 ([17]) Let G = (V, E) be a graph 
such that |V| = n and |E| = m. There exists 
an O(n + m) · α(n + m)-time algorithm, where 
α is the inverse Ackermann function, deciding 
whether G is a circle graph. Moreover, if G is 
a circle graph, the algorithm outputs a split-tree 
representation of G from which any chord diagram 
of G can be extracted in linear time. 


Perfect Graphs 

The recent proof [6] of the famous conjecture of 
Berge on perfect graphs states that a graph is perfect 
if and only if it does not contain an odd cycle 
of length at least 5 nor its complement as an induced 
subgraph. It is easy to observe that a graph is 
perfect if and only if its prime components are 
perfect graphs. The split decomposition does not 
formally appear in the structural decomposition 
theorem of perfect graphs [6, 28] as it is sub- 
sumed by the so-called balanced skew partition. 
In the context of perfect graphs, parity graphs [4] 
form a nice example of a class of graphs simply 
characterized through their split decomposition. 
A graph is a parity graph if for every pair x, 
y of vertices, all chordless paths 
between x and y have lengths of the same parity. This 
constraint can be translated into a condition on 
odd cycles or into a condition on their split tree. 
Indeed it can be proved that a graph is a parity 
graph if and only if the prime nodes of its split 
tree are labeled with bipartite graphs [7]. 


Related Graph Decompositions 


Modular Decomposition 

The split decomposition is often introduced as 
a generalization of the modular decomposition 
(also known as homogeneous decomposition) 
[19]. A module in a graph G = (V, E) is a 
subset M of vertices such that every vertex not in 
M is either fully adjacent or fully nonadjacent to 
the vertices of M. Clearly, if M is a module of 
size at least 2 and |V \ M| ≥ 2, then (M, V \ M) 
defines a split. 
Indeed the split decomposition is sometimes used 
to further decompose graphs that are prime with 
respect to the modular decomposition. 
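
The relation between modules and splits can be checked on small graphs with a direct sketch of the two definitions. The helper names `is_split`, `is_module`, and `has_split`, and the representation of a graph as a dictionary mapping each vertex to its set of neighbors, are assumptions for illustration; `has_split` is an exponential brute-force test, unrelated to the efficient algorithms cited in this entry.

```python
from itertools import combinations


def is_split(adj, A):
    """True if (A, V \\ A) is a split of the graph adj (vertex -> set of neighbors)."""
    A = set(A)
    B = set(adj) - A
    if len(A) < 2 or len(B) < 2:
        return False
    # Frontiers: vertices with at least one neighbor on the other side.
    frontier_A = {u for u in A if adj[u] & B}
    frontier_B = {v for v in B if adj[v] & A}
    # Every frontier vertex of A must see exactly the frontier of B.
    return all(adj[u] & B == frontier_B for u in frontier_A)


def is_module(adj, M):
    """True if every vertex outside M sees all of M or none of M."""
    M = set(M)
    return all(not (adj[v] & M) or (adj[v] & M) == M for v in set(adj) - M)


def has_split(adj):
    """Brute-force test over all bipartitions (exponential; small graphs only)."""
    vertices = list(adj)
    for size in range(2, len(vertices) - 1):
        for A in combinations(vertices, size):
            if is_split(adj, A):
                return True
    return False
```

On the 4-cycle 0, 1, 2, 3, the set {0, 2} is a module and ({0, 2}, {1, 3}) is a split, whereas the 5-cycle has no split, in line with the remark that induced cycles of length at least 5 are prime.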


Width Parameters 

Rank-width [24] and clique-width [10] are two 
important width parameters both sharing some 
connections with the split decomposition. As the 
rank-width of a graph is small if its clique-width 
is small and vice versa, we only briefly describe 
the former parameter. A rank-decomposition of a 
graph G is defined as a ternary tree whose leaves 
are in one-to-one correspondence with the ver- 
tices of G. It follows that every internal tree edge 
defines a bipartition, say (A, B) of the vertices of 
G. The rank-width of a bipartition (A, B) is de- 
fined as the rank of the incidence matrix between 
A and B, and the width of a rank-decomposition 
is the maximum width over its bipartitions. The 
rank-width of a graph G is then the minimum 
width over its rank-decompositions. Observe that 
every split is a rank-width 1 bipartition. It 
follows that the rank-width of a graph is the maximum 
rank-width of its prime components. As a 
consequence rank-width one graphs are exactly 
distance hereditary graphs. To conclude, let us 
mention that computing the split decomposition 
of a graph is a key step in the polynomial-time 
recognition algorithm of clique-width three 
graphs [9]. 
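
The observation that a split is a bipartition of rank at most 1 can be verified numerically. The sketch below computes the rank over GF(2) of the A-by-B adjacency matrix, which is the usual setting for rank-width; the name `cut_rank` and the bitmask encoding of rows are choices made for this illustration only.

```python
def cut_rank(adj, A):
    """Rank over GF(2) of the adjacency matrix between A and V \\ A.

    adj maps each vertex to the set of its neighbors.
    """
    A = sorted(set(A))
    B = sorted(set(adj) - set(A))
    # Encode each row (a vertex of A) as an integer bitmask over B.
    rows = []
    for u in A:
        mask = 0
        for k, v in enumerate(B):
            if v in adj[u]:
                mask |= 1 << k
        rows.append(mask)
    # Gaussian elimination over GF(2): keep one pivot row per leading bit.
    pivots = {}
    for r in rows:
        while r:
            high = r.bit_length() - 1
            if high not in pivots:
                pivots[high] = r
                break
            r ^= pivots[high]
    return len(pivots)
```

On the 4-cycle 0, 1, 2, 3 the split ({0, 2}, {1, 3}) has rank 1, while the bipartition ({0, 1}, {2, 3, 4}) of the 5-cycle, which is not a split, has rank 2.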


Squares and Repetitions, Fig. 1 The structure of RUNS(x) where x = baababaababbabaababaab = 
b z^2 (z^R)^2 b, for z = aabab. The operation ^R is reversing the string 


Problem Definition 


Periodicities and repetitions in strings have been 
extensively studied and are important both in 
theory and practice (combinatorics of words, 
pattern-matching, computational biology). The 
words of the type ww and www, where w is 
a nonempty primitive (not of the form u^k for 
an integer k > 1) word, are called squares and 
cubes, respectively. They are well-investigated 
objects in combinatorics on words [16] and in 
string-matching with small memory [5]. 

A string w is said to be periodic iff 
period(w) ≤ |w|/2, where period(w) is the smallest 
positive integer p for which w[i] = w[i + p] 
whenever both sides of the equality are defined. 
In particular each square and cube is periodic. 

A repetition in a string x = x_1 x_2 ... x_n is an 
interval [i .. j] ⊆ [1 .. n] for which the associated 
factor x[i .. j] is periodic. It is an occurrence of 
a periodic word x[i .. j], also called a positioned 
repetition. A word can be associated with several 
repetitions, see Fig. 1. 

Initially people investigated mostly positioned 
squares, but their number can be Ω(n log n) [2], hence 
algorithms computing all of them cannot run in 
linear time, due to the potential size of the output. 
The optimal algorithms reporting all positioned 
squares or just a single square were designed 
in [1, 2, 3, 19]. In contrast, it is known that 
only O(n) (un-positioned) squares can appear in 
a string of length n [8]. 

The concept of maximal repetitions, called 
runs (equivalent terminology) in [14], has been 
introduced to represent all repetitions in a succinct 
manner. The crucial property of runs is that 
there are only O(n) runs in a word of length 
n [15, 21]. 

A run in a string x is an interval [i .. j] 
such that the associated string x[i .. j] has 
period p ≤ (j − i + 1)/2, and the periodicity 
cannot be extended to the right nor to the left: 
x[i − 1] ≠ x[i + p − 1] and x[j − p + 1] ≠ 
x[j + 1] when the elements are defined. The set 
of runs of x is denoted by RUNS(x). An example 
is displayed in Fig. 1. 
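
To make the definition concrete, here is a naive sketch that enumerates RUNS(x) directly from the definition. It is far from the linear-time algorithms discussed in this entry; the names `smallest_period` and `runs`, the 0-based indexing, and the output format (i, j, p) are assumptions for illustration.

```python
def smallest_period(w):
    """Smallest p > 0 with w[i] == w[i + p] whenever both sides are defined."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty string


def runs(x):
    """Return all runs of x as sorted triples (i, j, p), 0-based and inclusive,
    where p is the smallest period of x[i..j] and j - i + 1 >= 2p."""
    n = len(x)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions k with x[k] == x[k + p].
            while k + p < n and x[k] == x[k + p]:
                k += 1
            i, j = start, k + p - 1  # maximal interval on which p is a period
            if j - i + 1 >= 2 * p and smallest_period(x[i:j + 1]) == p:
                found.add((i, j, p))
    return sorted(found)
```

For example, `runs("aaaa")` yields the single run (0, 3, 1) and `runs("abab")` yields (0, 3, 2). The check on `smallest_period` discards intervals found with a non-minimal period, which are reported once with their minimal period.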


Key Results 


The main results concern fast algorithms for 
computing positioned squares and runs, as well 
as combinatorial estimation on the number of 
corresponding objects. 


Theorem 1 (Crochemore [1], Apostolico- 
Preparata [2], Main-Lorentz [19]) There exists 
an O(n log n) worst-case time algorithm for 
computing all the occurrences of squares in 
a string of length n. 


Techniques used to design the algorithms are 
based on partitioning, suffix trees, and naming 
segments. A similar result has been obtained 
by Franek, Smyth, and Tang using suffix ar- 
rays [11]. The key component in the next algo- 
rithm is the function described in the following 
lemma. 


Lemma 2 (Main-Lorentz [19]) Given two 
square-free strings u and v, reporting if uv 
contains a square centered in u can be done 
in worst-case time O(|u|). 


Squares and Repetitions, Fig. 2 The f-factorization of the example string x = baababaababbabaababaab and 
the set of its internal runs; all other runs overlap factorization points 


Squares and Repetitions, Fig. 3 If an overlapping run with period p starts in u, ends in v, and its part in v is of size 
at least p then it is easily detectable by computing continuations of the periodicity p in two directions: left and right 


Using suffix trees or suffix automata together 
with the function derived from the lemma, the 
following fact has been shown. 


Theorem 3 (Crochemore [3], Main-Lorentz 
[19]) Testing the square-freeness of a string 
of length n can be done in worst-case time 
O(n log a), where a is the size of the alphabet 
of the string. 


As a consequence of the algorithms and of the 
estimation on the number of squares, the most 
important result related to repetitions can be for- 
mulated as follows. 


Theorem 4 (Kolpakov-Kucherov [15], Rytter 
[21], Crochemore-Ilie [4]) 


(1) All runs in a string can be computed in linear 
time (on a fixed-size alphabet). 

(2) The number of all runs is linear in the length 
of the string. 


Point (2) is very intricate; it is of purely 
combinatorial nature and has nothing to do with 
the algorithm. We briefly sketch the basic components 
in the constructive proof of point 
(1). The main idea is to use, as for the previous 
theorem, the f-factorization (see [3]): a string x 
is decomposed into factors u_1, u_2, ..., u_k, where 
u_i is the longest segment which appears before 
(possibly with overlap) or is a single letter if the 
segment is empty. 
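
The f-factorization can be illustrated by a naive quadratic sketch that follows the definition literally. The linear-time construction relies on a suffix tree or suffix automaton and is not shown here; the function name `f_factorization` is an assumption for illustration.

```python
def f_factorization(x):
    """Naive f-factorization: each factor is the longest prefix of the
    remaining suffix that also starts at an earlier position (overlap
    allowed), or a single letter if no such nonempty prefix exists."""
    factors = []
    n = len(x)
    pos = 0
    while pos < n:
        best = 0
        for q in range(pos):
            length = 0
            # Compare the occurrence starting at q with the one starting at pos.
            while pos + length < n and x[q + length] == x[pos + length]:
                length += 1
            best = max(best, length)
        step = best if best > 0 else 1
        factors.append(x[pos:pos + step])
        pos += step
    return factors
```

For example, `f_factorization("abaabab")` returns the factors a, b, a, aba, b, and `f_factorization("aaaa")` returns a, aaa, the second factor overlapping its earlier occurrence.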


The runs which fit in a single factor are called 
internal runs, other runs are called here overlap- 
ping runs. There are three crucial facts: 


• all overlapping runs can be computed in linear 
time, 

• each internal run is a copy of an earlier overlapping 
run, 

• the f-factorization can be computed in linear 
time (on a fixed-size alphabet) if we have the 
suffix tree or suffix automaton of the string. 

Figure 2 shows the f-factorization and internal 
runs of an example string. 


It follows easily from the definition of the f-factorization 
that if a run overlaps two (consecutive) 
factors u_{k−1} and u_k then its size is at most 
twice the total size of these two factors. 

Figure 3 shows the basic idea for computing 
runs that overlap uv in time O(|u| + |v|). Using 
tables similar to those of the Morris-Pratt algorithm 
(border and prefix tables), see [6], we can test the 
continuation of a period p from position p in v 
to the left and to the right. The corresponding 
tables can be constructed in linear time in 
a preprocessing phase. After computing all 
overlapping runs the internal runs can be copied 
from their earlier occurrences by processing the 
string from left to right. 

Another interesting result concerning periodicities 
is the following lemma and its fairly 
immediate corollary. 


Lemma 5 (Three Prefix Squares, Crochemore- 
Rytter [5]) If u, v, and w are three primitive 
words satisfying: |u| < |v| < |w|, uu is a prefix of 
vv, and vv is a prefix of ww, then |u| + |v| ≤ |w|. 


Corollary 1 Any nonempty string x possesses 
fewer than log_Φ |x| prefixes that are squares (Φ denotes the golden ratio). 


In the configuration of the lemma, a second 
consequence is that uu is a prefix of w. Therefore, 
a position in a string x cannot be the largest 
position of more than two squares, which yields 
the next corollary. A simple direct proof of it is 
by Ilie [13], see also [17]. 


Corollary 2 (Fraenkel and Simpson [8]) Any 
string x contains at most 2|x| (different) squares, 
that is: card{u | u primitive and u^2 factor of x} ≤ 
2|x|. 
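
The quantity bounded in Corollary 2 can be computed by brute force on small strings. The sketch below is only an illustration of the statement; the helper names `is_primitive` and `primitive_square_roots` are assumptions.

```python
def is_primitive(u):
    """True if the nonempty word u is not of the form v^k with k > 1."""
    # u is a proper power iff it occurs in uu at a position strictly
    # between 0 and len(u).
    return (u + u).find(u, 1) == len(u)


def primitive_square_roots(x):
    """Set of primitive words u such that uu is a factor of x."""
    n = len(x)
    roots = set()
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            u = x[i:i + half]
            if u == x[i + half:i + 2 * half] and is_primitive(u):
                roots.add(u)
    return roots
```

For instance, `primitive_square_roots("aaaa")` is {"a"} (the square aaaa has the non-primitive root aa), and for any string x the size of the returned set stays within the 2|x| bound of the corollary.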


The structure of all squares and of un-positioned 
runs has also been computed within the same time 
complexities as above in [18] and [12]. 


Applications 


Detecting repetitions in strings is an important 
element of several questions: pattern matching, 
text compression, and computational biology 
to quote a few. Pattern-matching algorithms 
have to cope with repetitions to be efficient 
as these are likely to slow down the process; 
the large family of dictionary-based text 
compression methods use a weaker notion of 
repeats (like the software gzip); repetitions 
in genomes, called satellites, are intensively 
studied because, for example, some over-repeated 
short segments are related to genetic diseases; 
some satellites are also used in forensic crime 
investigations. 


Open Problems 


The most intriguing question remains the 
asymptotically tight bound for the maximum 
number ρ(n) of runs in a string of size n. The 
first proof (by painful induction) was quite 
difficult and has not produced any concrete 
constant coefficient in the O(n) notation. This 
subject has been studied in [9, 10, 22, 23]. 
The best-known lower bound of approximately 
0.927n is from [10]. The exact number of 
runs has been considered for special strings: 
Fibonacci words and (more generally) Sturmian 
words [7, 14, 20]. It is proved in a structural 
and intricate manner in the full version of [21] 
that ρ(n) < 3.44n, by introducing a sparse- 
neighbors technique. The neighbors are runs 
for which both the distance between their 
starting positions is small and the difference 
between their periods is also proportionally 
small (according to some fixed coefficient of 
proportionality). The occurrences of neighbors 
satisfy certain sparsity properties which imply 
the linear upper bound. Several variations 
for the definitions of neighbors and sparsity 
are possible. Considering runs having close 
centers, the bound has been lowered to 1.6n 
in [4]. 

As a conclusion, we believe that the following 
fact is valid. 


Conjecture: A string x of length n contains fewer 
than n runs, i.e., |RUNS(x)| < n. 


Problem Definition 


The objective in stable matching problems is to 
match together pairs of elements of a set of par- 
ticipants, taking into account the preferences of 
those involved and focusing on a stability require- 
ment. The stability property ensures that no pair 
of participants would both prefer to be matched 
together rather than to accept their allocation in 
the matching. Such problems have widespread 
application, for example, in the allocation of 
medical students to hospital posts, students to 
schools or colleges, etc. 

An instance of the classical stable marriage 
problem (SM), introduced by Gale and Shapley 
[2], involves a set of 2n participants comprising n 
men {m_1, ..., m_n} and n women {w_1, ..., w_n}. 
Associated with each participant is a preference 
list, which is a total order over the participants 
of the opposite sex. A man m_i prefers woman 
w_j to woman w_k if w_j precedes w_k on the 
preference list of m_i and similarly for the women. 
A matching M is a bijection between the sets of 
men and women, in other words a set of man-woman 
pairs so that each man and each woman 
belongs to exactly one pair of M. For a man m_i, 
M(m_i) denotes the partner of m_i in M, i.e., the 
unique woman w_j such that (m_i, w_j) is in M. 
Similarly, M(w_j) denotes the partner of woman 
w_j in M. A matching M is stable if there is no 
blocking pair, namely, a pair (m_i, w_j) such that 
m_i prefers w_j to M(m_i) and w_j prefers m_i to 
M(w_j). 

Relaxing the requirements that the numbers 
of men and women are equal and that each 
participant should rank all of the members of the 
opposite sex gives the stable marriage problem 
with incomplete lists (SMI). So an instance of 
SMI comprises a set of n_1 men {m_1, ..., m_{n_1}} 
and a set of n_2 women {w_1, ..., w_{n_2}}, and each 
participant’s preference list is a total order over 
a subset of the participants of the opposite sex. 
The implication is that if woman w_j does not 
appear on the list of man m_i, then she is not an 
acceptable partner for m_i and vice versa. A man-woman 
pair is acceptable if each member of the 
pair is on the preference list of the other, and a 
matching M is now a set of acceptable pairs such 
that each man and each woman is in at most one 
pair of M. In this context, a blocking pair for 
matching M is an acceptable pair (m_i, w_j) such 
that m_i either is unmatched in M or prefers w_j 
to M(m_i) and, likewise, w_j either is unmatched 
or prefers m_i to M(w_j). A matching is stable 
if it has no blocking pair. So in an instance of 
SMI, a stable matching need not match all of the 
participants. 
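
The SMI stability condition translates into a short check. The following sketch lists the blocking pairs of a given matching; the function name `blocking_pairs` and the representation of preference lists as dictionaries of ordered lists (and of the matching as a man-to-woman dictionary) are assumptions made for illustration.

```python
def blocking_pairs(men_prefs, women_prefs, matching):
    """Return the blocking pairs (m, w) of `matching` in an SMI instance.

    men_prefs / women_prefs: dict person -> list of acceptable partners,
    most preferred first. matching: dict man -> woman (unmatched men absent).
    """
    partner_of_woman = {w: m for m, w in matching.items()}

    def prefers(pref_list, candidate, current):
        # An unmatched person prefers any acceptable partner to being alone.
        if current is None:
            return True
        return pref_list.index(candidate) < pref_list.index(current)

    result = []
    for m, m_list in men_prefs.items():
        for w in m_list:
            if m not in women_prefs.get(w, []):
                continue  # the pair is not acceptable
            if matching.get(m) == w:
                continue  # already partners
            if (prefers(m_list, w, matching.get(m))
                    and prefers(women_prefs[w], m, partner_of_woman.get(w))):
                result.append((m, w))
    return result
```

A matching is stable exactly when this function returns an empty list. The sketch assumes that `matching` contains only acceptable pairs.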

Gale and Shapley also introduced a many- 
one version of stable marriage, which they 
called the college admissions problem, but 
which is now more usually referred to as the
Hospitals/Residents Problem (HR) because
of its well-known applications in the medical 
employment field. This problem is covered in 
detail in Entry 150 of this volume. 

A comprehensive treatment of many aspects of 
the stable marriage problem, as of 1989, appears 
in the monograph of Gusfield and Irving [5]. 
A more recent detailed exposition is given by 
Manlove [14]. 
Key Results


Theorem 1 For every instance of SM or SMI, 
there is at least one stable matching. 


Theorem 1 was proved constructively by Gale
and Shapley [2] as a consequence of the algo- 
rithm that they gave to find a stable matching. 


Theorem 2 1. For a given instance of SM involving
n men and n women, there is an O(n^2)
time algorithm that finds a stable matching.

2. For a given instance of SMI in which the 
combined length of all the preference lists is 
a, there is a O(a) time algorithm that finds a 
stable matching. 


The algorithm for SMI is a simple extension 
of that for SM. Each can be formulated in a 
variety of ways, but is most usually expressed 
in terms of a sequence of “proposals” from the 
members of one sex to the members of the other. 
A pseudocode version of the SMI algorithm ap- 
pears in Fig. 1, in which the traditional approach 
of allowing men to make proposals is adopted. 

The complexity bound of Theorem 2(1) first 
appeared in Knuth’s monograph on stable mar- 
riage [12]. The fact that this algorithm is asymp- 
totically optimal was subsequently established by 
Ng and Hirschberg [17] via an adversary argu- 
ment. On the other hand, Wilson [21] proved that 
the average running time, taken over all possible 
instances of SM, is O(n log n). 

The algorithm of Fig. 1, in its various guises, 
has come to be known as the Gale-Shapley algo- 
rithm. The variant of the algorithm given here is 
called man oriented, because men have the ad- 
vantage of proposing. Reversing the roles of men 
and women gives the woman-oriented variant. 
The “advantage” of proposing is remarkable, as 
spelled out in the next theorem. 


Theorem 3 The man-oriented version of the 
Gale-Shapley algorithm for SM or SMI yields the 
man-optimal stable matching in which each man 
has the best partner that he can have in any stable 
matching, but in which each woman has her worst 
possible partner. The woman-oriented version 
yields the woman-optimal stable matching, which 
has analogous properties favoring the women. 

M = ∅;
assign each person to be free;   /* i.e., not a member of a pair in M */
while (some man m is free and has not proposed to every woman on his list)
    m proposes to w, the first woman on his list to whom he has not proposed;
    if (w is free)
        add (m, w) to M;   /* w accepts m */
    else if (w prefers m to her current partner m')
        remove (m', w) from M;   /* w rejects m', setting m' free */
        add (m, w) to M;   /* w accepts m */
    else
        M remains unchanged;   /* w rejects m */
return M;


Stable Marriage, Fig. 1 The Gale-Shapley algorithm
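
For concreteness, the proposal scheme of Fig. 1 can be sketched in Python as follows. This is a minimal illustration under assumptions, not a reference implementation: the function name `gale_shapley`, the dictionary encoding of the lists, and the example instance are all hypothetical. Ranks are precomputed so that each proposal costs constant time, which is consistent with the O(a) bound of Theorem 2.

```python
from collections import deque


def gale_shapley(men_prefs, women_prefs):
    """Man-oriented proposal algorithm for SM/SMI; returns a set of (man, woman)."""
    # rank[w][m] = position of m on w's list (smaller is better).
    rank = {w: {m: i for i, m in enumerate(lst)}
            for w, lst in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # index of next woman to propose to
    fiance = {}                               # woman -> current partner
    free_men = deque(men_prefs)

    while free_men:
        m = free_men.popleft()
        lst = men_prefs[m]
        while next_choice[m] < len(lst):
            w = lst[next_choice[m]]
            next_choice[m] += 1
            if m not in rank.get(w, {}):
                continue                      # m is unacceptable to w
            current = fiance.get(w)
            if current is None:
                fiance[w] = m                 # w accepts m
                break
            if rank[w][m] < rank[w][current]:
                fiance[w] = m                 # w rejects current partner
                free_men.append(current)
                break
            # otherwise w rejects m and he moves on to his next choice
        # if the list is exhausted, m stays unmatched
    return {(m, w) for w, m in fiance.items()}


# Hypothetical instance in which the two orientations give different results.
men = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}

print(gale_shapley(men, women))   # men propose
print(gale_shapley(women, men))   # roles reversed: women propose
```

Swapping the two arguments gives the woman-oriented variant; in the instance above the two runs return different stable matchings, each favoring the proposing side.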


The optimality property of Theorem 3 was 
established by Gale and Shapley [2], and the 
corresponding “pessimality” property was first 
observed by McVitie and Wilson [16]. 

As observed earlier, a stable matching for 
an instance of SMI need not match all of the 
participants. But the following striking result was 
established by Gale and Sotomayor [3] and Roth
[19] (in the context of the more general HR
problem).


Theorem 4 In an instance of SMI, all stable 
matchings have the same size and match exactly 
the same subsets of the men and women. 


For a given instance of SM or SMI, there may 
be many different stable matchings. Indeed Knuth 
[12] showed that the maximum possible number 
of stable matchings grows exponentially with the 
number of participants. He also pointed out that 
the set of stable matchings forms a distributive 
lattice under a natural dominance relation, a result 
attributed to Conway. This powerful algebraic 
structure that underlies the set of stable matchings 
can be exploited algorithmically in a number of 
ways. For example, Gusfield [4] showed how all 
k stable matchings for an instance of SM can be 
generated in O(n^2 + kn) time (see ▶ Optimal Stable
Marriage).

Extensions of these problems that are important
in practice, so-called SMT and SMTI (extensions
of SM and SMI, respectively), allow the
presence of ties in the preference lists. In this context,
three different notions of stability have been
defined [7]: weak, strong, and super-stability,
depending on whether the definition of a blocking 
pair requires that both members should improve, 
or at least one member improves and the other 
is no worse off, or merely that neither member is 
worse off. The following theorem summarizes the 
basic algorithmic results for these three varieties 
of stable matchings. 


Theorem 5 For a given instance of SMT or 
SMTI: 


1. A weakly stable matching is guaranteed to
exist and can be found in O(n^2) or O(a) time,
respectively.

2. A super-stable matching may or may not exist;
if one does exist, it can be found in O(n^2) or
O(a) time, respectively.

3. A strongly stable matching may or may not
exist; if one does exist, it can be found in
O(n^3) or O(na) time, respectively.


Theorem 5 parts (1) and (2) are due to Irving [7] 
(for SMT) and Manlove [13] (for SMTI). Part (3) 
is due to Kavitha et al. [11], who improved earlier 
algorithms of Irving and Manlove. 

It turns out that, in contrast to the situation
described by Theorem 4, weakly stable matchings
in SMTI can have different sizes. The natural
problem of finding a maximum cardinality
weakly stable matching, even under severe
restrictions on the ties, is NP-hard [15]. ▶ Stable
Marriage with Ties and Incomplete Lists explores
this problem further.

Interesting special cases of SM and its variants 
arise when the preference lists on one or both 
sides are derived from a “master” list that ranks 
participants (e.g., according to some objective
criterion). Such problems are explored by Irving 
et al. [10]. 

The stable marriage problem is an example 
of a bipartite matching problem. The extension 
in which the bipartite requirement is dropped 
is the so-called stable roommates (SR)
problem. 

Gale and Shapley had observed that, unlike the 
case of SM, an instance of SR may or may not ad- 
mit a stable matching, and Knuth [12] posed the 
problem of finding an efficient algorithm for SR 
or proving it NP-complete. Irving [6] established 
the following theorem via a nontrivial extension 
of the Gale-Shapley algorithm. 

Theorem 6 For a given instance of SR, there exists
an O(n^2) time algorithm to determine whether
a stable matching exists and if so to find such a
matching.
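
That an SR instance may admit no stable matching can be confirmed by exhaustive search on a four-person example. The sketch below is a brute-force check, not the O(n^2) algorithm of Theorem 6; the function names and both instances are assumptions made for illustration, and complete preference lists over an even number of people are assumed.

```python
def perfect_matchings(people):
    """Yield every way of splitting the list `people` into unordered pairs."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for i, other in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, other)] + tail


def is_stable(prefs, matching):
    """True if no two people prefer each other to their assigned partners."""
    partner = {}
    for a, b in matching:
        partner[a], partner[b] = b, a
    for a in prefs:
        for b in prefs[a]:
            if b == partner[a]:
                break  # everyone listed after the partner is less preferred
            if prefs[b].index(a) < prefs[b].index(partner[b]):
                return False  # (a, b) is a blocking pair
    return True


def stable_matchings(prefs):
    return [m for m in perfect_matchings(sorted(prefs)) if is_stable(prefs, m)]


# Hypothetical instance with no stable matching: 1, 2, 3 prefer each other
# cyclically and all rank person 4 last.
no_solution = {1: [2, 3, 4], 2: [3, 1, 4], 3: [1, 2, 4], 4: [1, 2, 3]}
# Hypothetical instance with exactly one stable matching.
solvable = {1: [2, 3, 4], 2: [1, 3, 4], 3: [4, 1, 2], 4: [3, 1, 2]}

print(stable_matchings(no_solution))
print(stable_matchings(solvable))
```

In the first instance, whoever is paired with person 4 forms a blocking pair with one of the other two, so the search returns an empty list.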


Variants of SR may be defined, as for SM, in 
which preference lists may be incomplete and/or 
contain ties — these are denoted by SRI, SRT, 
and SRTI — and in the presence of ties, the three 
flavors of stability, weak, strong, and super, are 
again relevant. 


Theorem 7 For a given instance of SRT or 
SRTI: 


1. A weakly stable matching may or may not
exist, and it is an NP-complete problem
to determine whether such a matching
exists.

2. A super-stable matching may or may not exist;
if one does exist, it can be found in O(n^2) or
O(a) time, respectively.

3. A strongly stable matching may or may not
exist; if one does exist, it can be found in
O(n^4) or O(a^2) time, respectively.


Theorem 7 part (1) is due to Ronn [18], part (2)
is due to Irving and Manlove [9], and part (3) is 
due to Scott [20]. 


Applications 


Undoubtedly the best known and most important 
applications of stable matching algorithms are 
in centralized matching schemes in the medical 
and educational domains. ▶ Hospitals/Residents
Problem includes a summary of some of these 
applications. 


Open Problems 


The parallel complexity of stable marriage 
remains open. The best known parallel algorithm 
for SMI is due to Feder et al. [1] and has 
O(√a log^3 a) running time using a polynomially
bounded number of processors. It is not known 
whether the problem is in NC, but nor is there a 
proof of P-completeness. 

One of the open problems posed by Knuth in
his early monograph on stable marriage [12] was
that of determining the maximum possible number
x_n of stable matchings for any SM instance
involving n men and n women. This problem
remains open, although Knuth himself showed
that x_n grows exponentially with n. Irving and
Leather [8] conjecture that, when n is a power of
2, this function satisfies the recurrence


\[ x_n = 3x_{n/2}^2 - 2x_{n/4}^4. \]
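
As a small worked illustration, the conjectured recurrence, read here as x_n = 3·x_{n/2}^2 − 2·x_{n/4}^4, can be evaluated for the first few powers of 2. The base values x_1 = 1 and x_2 = 2 and the function name `conjectured_max` are assumptions of this sketch; the printed numbers are values of the recurrence, not proven maxima.

```python
def conjectured_max(n):
    """Evaluate the recurrence x_n = 3*x_{n/2}^2 - 2*x_{n/4}^4 for n a power of 2.

    Base values x_1 = 1 and x_2 = 2 are assumed for this illustration.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of 2")
    if n == 1:
        return 1
    if n == 2:
        return 2
    return 3 * conjectured_max(n // 2) ** 2 - 2 * conjectured_max(n // 4) ** 4


for n in (1, 2, 4, 8, 16):
    print(n, conjectured_max(n))
```

Under these base values the recurrence gives 10 for n = 4, 268 for n = 8, and 195472 for n = 16.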


Many open problems remain in the setting of
weak stability, such as finding a good approximation
algorithm for a maximum cardinality weakly
stable matching (see ▶ Stable Marriage with
Ties and Incomplete Lists) and enumerating all
weakly stable matchings efficiently.
Problem Definition


In the stable marriage problem first defined by 
Gale and Shapley [7], there is one set each of 
men and women having the same size, and each 
person has a strict preference order on persons 
of the opposite gender. The problem is to find a 
matching such that there is no pair of a man and a
woman who prefer each other to their partners in 
the matching. Such a matching is called a stable 
marriage (or stable matching). Gale and Shap- 
ley showed the existence of a stable marriage 
and gave an algorithm for finding one. Fleiner 
[4] extended the stable marriage problem to the 
framework of matroids, and Eguchi, Fujishige, 
and Tamura [3] extended this formulation to a 
more general one in terms of discrete convex 
analysis, which was developed by Murota [8, 9]. 
Their formulation is described as follows. 

Let M and W be sets of men and women who
attend a dance party at which each person dances
a waltz T times and the number of times that
he/she can dance with the same person of the
opposite gender is unlimited. The problem is to find
an “agreeable” allocation of dance partners, in
which each person is assigned at most T persons
of the opposite gender with possible repetition.
Let E = M × W, i.e., the set of all man-woman
pairs. Also define E_(i) = {i} × W for all i ∈ M
and E_(j) = M × {j} for all j ∈ W. Denoting by
x(i, j) the number of dances between man i and
woman j, an allocation of dance partners can be
described by a vector x = (x(i, j) : i ∈ M, j ∈ W)
∈ Z^E, where Z denotes the set of all integers.
For each y ∈ Z^E and k ∈ M ∪ W, denote by y_(k)
the restriction of y on E_(k). For example, for an
allocation x ∈ Z^E, x_(k) represents the allocation
of person k with respect to x. Each person k
describes his/her preferences on allocations by
using a value function f_k : Z^{E_(k)} → R ∪ {−∞},
where R denotes the set of all reals and
f_k(y) = −∞ means that allocation y ∈ Z^{E_(k)}
is unacceptable for k. Note that the valuation of
each person on allocations is determined only by
his/her allocations. Let dom f_k = {y | f_k(y) ∈
R}. Assume that each value function f_k satisfies
the following assumption:

(A) dom f_k is bounded and hereditary and has
0 as the minimum point, where 0 is the vector of
all zeros and heredity means that for any y, y' ∈
Z^{E_(k)}, 0 ≤ y' ≤ y ∈ dom f_k implies y' ∈
dom f_k.

For example, the following value functions
with M = {1} and W = {2, 3},

\[
f_1(x(1,2), x(1,3)) =
\begin{cases}
10\,(x(1,2) + x(1,3)) - x(1,2)^2 - x(1,3)^2 & \text{if } x(1,2), x(1,3) \ge 0 \text{ and } x(1,2) + x(1,3) \le 3, \\
-\infty & \text{otherwise,}
\end{cases}
\]

\[
f_j(x(1,j)) =
\begin{cases}
x(1,j) & \text{if } x(1,j) \in \{0, 1, 2, 3\}, \\
-\infty & \text{otherwise}
\end{cases}
\qquad (j = 2, 3),
\]

represent the case where (1) everyone wants to
dance as many times, up to three, as possible and
(2) man 1 wants to divide his dances between
women 2 and 3 as equally as possible. Allocations
(x(1,2), x(1,3)) = (1, 2) and (2, 1) are
stable in the sense below.
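
A quick numerical check of this example: the sketch below enumerates the allocations acceptable to all three participants and lists those that maximize man 1's value. It assumes the reading of the value functions in which man 1 accepts nonnegative allocations with at most three dances in total and each woman accepts 0 to 3 dances; the names `f_man` and `f_woman` are hypothetical. It is not a full stability test, only a consistency check of the claim about (1, 2) and (2, 1).

```python
import math
from itertools import product


def f_man(a, b):
    """Value of man 1 for a dances with woman 2 and b dances with woman 3."""
    if a >= 0 and b >= 0 and a + b <= 3:
        return 10 * (a + b) - a ** 2 - b ** 2
    return -math.inf


def f_woman(t):
    """Value of woman 2 or woman 3 for t dances with man 1."""
    return t if t in (0, 1, 2, 3) else -math.inf


# Allocations that every participant finds acceptable (finite value).
feasible = [(a, b) for a, b in product(range(4), repeat=2)
            if f_man(a, b) > -math.inf
            and f_woman(a) > -math.inf
            and f_woman(b) > -math.inf]

best = max(f_man(a, b) for a, b in feasible)
maximizers = [(a, b) for a, b in feasible if f_man(a, b) == best]
print(best, maximizers)
```

The even splits (1, 2) and (2, 1) both reach the value 25, whereas the lopsided allocations (3, 0) and (0, 3) reach only 21.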

A vector x ∈ Z^E is called a feasible allocation
if x_(k) ∈ dom f_k for all k ∈ M ∪ W. An
allocation x is said to satisfy incentive constraints
if each person has no incentive to unilaterally
decrease the current units of x, that is, if it
satisfies

\[ f_k(x_{(k)}) = \max\{ f_k(y) \mid y \le x_{(k)} \} \quad (\forall k \in M \cup W). \tag{1} \]

An allocation x is called unstable if it does not
satisfy incentive constraints or there exist i ∈ M,
j ∈ W, y' ∈ Z^{E_(i)}, and y'' ∈ Z^{E_(j)} such that

\[ f_i(x_{(i)}) < f_i(y'), \tag{2} \]
\[ y'(i, j') \le x(i, j') \quad (\forall j' \in W \setminus \{j\}), \tag{3} \]
\[ f_j(x_{(j)}) < f_j(y''), \tag{4} \]
\[ y''(i', j) \le x(i', j) \quad (\forall i' \in M \setminus \{i\}), \tag{5} \]
\[ y'(i, j) = y''(i, j). \tag{6} \]

Conditions (2) and (3) say that man i can strictly
increase his valuation by changing the current
number of dances with j without increasing the
numbers of dances with other women, and (4)
and (5) describe a similar situation for women.
Condition (6) requires that i and j agree on the
number of dances between them. An allocation x
is called stable if it is not unstable.


Problem 1 Given disjoint sets M and W and
value functions f_k : Z^{E_(k)} → R ∪ {−∞} for
k ∈ M ∪ W satisfying assumption (A), find a
stable allocation x.

Remark 1 A time schedule for a given feasible
allocation can be given by a famous result on 
graph coloring, namely, “any bipartite graph can 
be edge-colorable with the maximum degree 
colors.” 


Key Results 


The work of Eguchi, Fujishige, and Tamura [3] 
gave a solution to Problem 1 in the case where
each value function f_k is M♮-concave.


Discrete Convex Analysis:
M♮-Concave Functions


Let V be a finite set. For each S ⊆ V, e_S
denotes the characteristic vector of S defined by
e_S(v) = 1 if v ∈ S and e_S(v) = 0 otherwise.
Also define e_0 as the zero vector in Z^V. For a
vector x ∈ Z^V, its positive support supp⁺(x)
and negative support supp⁻(x) are defined by
supp⁺(x) = {u ∈ V | x(u) > 0} and supp⁻(x) =
{u ∈ V | x(u) < 0}. A function f : Z^V →
R ∪ {−∞} is called M♮-concave if it satisfies
the following condition: for all x, y ∈ dom f and
all u ∈ supp⁺(x − y), there exists
v ∈ supp⁻(x − y) ∪ {0} such that

\[ f(x) + f(y) \le f(x - e_u + e_v) + f(y + e_u - e_v). \]


The above condition says that the sum of the 
function values at two points does not decrease as 
the points symmetrically move one or two steps 
closer to each other on the set of integral lattice 
points of Z^V. This is a discrete analogue of the
fact that for an ordinary concave function, the 
sum of the function values at two points does not 
decrease as the points symmetrically move closer 
to each other on the straight line segment between 
the two points. 
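
On a small finite domain the exchange condition can be tested exhaustively. The following Python sketch does so by brute force; the function name `is_m_natural_concave`, the use of `None` to stand for the index 0 (whose unit vector e_0 is the zero vector), and both test functions are assumptions made for this illustration. The function f must return −∞ outside its domain.

```python
import math
from itertools import product


def is_m_natural_concave(f, points):
    """Brute-force test of the exchange condition on the finite set `points`."""
    dom = [p for p in points if f(p) > -math.inf]
    n = len(dom[0]) if dom else 0

    def move(p, minus, plus):
        # Return p - e_minus + e_plus, where index None contributes nothing.
        q = list(p)
        if minus is not None:
            q[minus] -= 1
        if plus is not None:
            q[plus] += 1
        return tuple(q)

    for x, y in product(dom, repeat=2):
        plus_supp = [u for u in range(n) if x[u] > y[u]]
        minus_supp = [v for v in range(n) if x[v] < y[v]]
        for u in plus_supp:
            # Look for some v in supp-(x - y) or the index 0 (None here).
            if not any(
                f(x) + f(y) <= f(move(x, u, v)) + f(move(y, v, u))
                for v in minus_supp + [None]
            ):
                return False
    return True


def f_even_split(p):
    # Quadratic value function on the domain a, b >= 0, a + b <= 3.
    a, b = p
    if a >= 0 and b >= 0 and a + b <= 3:
        return 10 * (a + b) - a * a - b * b
    return -math.inf


def f_product(p):
    # A function on {0,1}^2 that violates the exchange condition.
    a, b = p
    return a * b if a in (0, 1) and b in (0, 1) else -math.inf


grid = list(product(range(-1, 5), repeat=2))
print(is_m_natural_concave(f_even_split, grid))
print(is_m_natural_concave(f_product, grid))
```

The product function fails at x = (1, 1), y = (0, 0): moving the two points one step toward each other lowers the sum of values from 1 to 0.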


Example 1 A nonempty family T of subsets of
V is called a laminar family if X ∩ Y = ∅,
X ⊆ Y, or Y ⊆ X holds for every X, Y ∈ T.
For a laminar family T and a family of univariate
concave functions f_Y : R → R ∪ {−∞} indexed
by Y ∈ T, the function f : Z^V → R ∪ {−∞}
defined by

\[ f(x) = \sum_{Y \in T} f_Y\Big( \sum_{v \in Y} x(v) \Big) \quad (\forall x \in \mathbf{Z}^V) \]

is M♮-concave. The stable marriage problem can
be formulated as Problem 1 by using value functions
of this type.


Example 2 For the independence family I ⊆ 2^V
of a matroid on V and w ∈ R^V, the function f :
Z^V → R ∪ {−∞} defined by

\[
f(x) =
\begin{cases}
\sum_{u \in X} w(u) & \text{if } x = e_X \text{ for some } X \in I, \\
-\infty & \text{otherwise}
\end{cases}
\quad (\forall x \in \mathbf{Z}^V)
\]

is M♮-concave. Fleiner [4] showed that there always
exists a stable allocation for value functions
of this type.


Theorem 1 ([6]) Assume that the value functions
f_k (k ∈ M ∪ W) are M♮-concave satisfying
(A). Then, a feasible allocation x is stable if
and only if there exist z_M = (z_(i) | i ∈ M) ∈
(Z ∪ {+∞})^E and z_W = (z_(j) | j ∈ W) ∈
(Z ∪ {+∞})^E such that

\[ x_{(i)} \in \arg\max\{ f_i(y) \mid y \le z_{(i)} \} \quad (\forall i \in M), \tag{7} \]
\[ x_{(j)} \in \arg\max\{ f_j(y) \mid y \le z_{(j)} \} \quad (\forall j \in W), \tag{8} \]
\[ z_M(e) = +\infty \ \text{ or } \ z_W(e) = +\infty \quad (\forall e \in E), \tag{9} \]

where arg max{f_i(y) | y ≤ z_(i)} denotes the set of
all maximizers of f_i under the constraints y ≤ z_(i).


Theorem 2 ([3]) Assume that the value functions
f_k (k ∈ M ∪ W) are M♮-concave satisfying
(A). Then, there always exists a stable allocation.


Eguchi, Fujishige, and Tamura [3] proved Theorem 2
by showing that the following algorithm
finds a feasible allocation x and z_M, z_W satisfying
(7), (8), and (9).

Here, z_W ∨ x_M is defined by (z_W ∨ x_M)(e) =
max{z_W(e), x_M(e)} for all e ∈ E.

Algorithm EXTENDED-GS

Input: M♮-concave functions f_M, f_W with f_M(x) =
    Σ_{i∈M} f_i(x_(i)) and f_W(x) = Σ_{j∈W} f_j(x_(j));
Output: (x, z_M, z_W) satisfying (7), (8), and (9);
z_M := (+∞, ..., +∞), z_W := x_W := 0;
repeat {
    let x_M be any element in
        arg max{ f_M(y) | x_W ≤ y ≤ z_M };
    let x_W be any element in
        arg max{ f_W(y) | y ≤ x_M };
    for each e ∈ E with x_M(e) > x_W(e) {
        z_M(e) := x_W(e);
        z_W(e) := +∞;
    };
} until x_M = x_W;
return (x_M, z_M, z_W ∨ x_M).
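
The loop structure of EXTENDED-GS can be traced on a tiny instance by replacing each arg max with exhaustive search over a bounded grid. The sketch below is illustrative only and makes several assumptions: the function name `extended_gs`, the grid bound, the iteration cap, and the instance itself, in which woman 3 is assumed to accept at most one dance so that a second iteration is needed. Exhaustive search is exponential in |E| and stands in for the polynomial-time maximization discussed later in the entry.

```python
import math
from itertools import product

INF = math.inf


def extended_gs(num_edges, f_M, f_W, bound, max_iters=1000):
    """Brute-force sketch of EXTENDED-GS over the grid {0, ..., bound}^E."""
    grid = list(product(range(bound + 1), repeat=num_edges))

    def arg_max(f, lower, upper):
        # Some maximizer of f over grid points y with lower <= y <= upper.
        feasible = [y for y in grid
                    if all(lo <= t <= up for lo, t, up in zip(lower, y, upper))]
        return max(feasible, key=f)

    zero = (0,) * num_edges
    z_M = [INF] * num_edges
    z_W = [0] * num_edges
    x_W = zero
    for _ in range(max_iters):
        x_M = arg_max(f_M, x_W, z_M)
        x_W = arg_max(f_W, zero, x_M)   # the domain has 0 as its minimum point
        for e in range(num_edges):
            if x_M[e] > x_W[e]:
                z_M[e] = x_W[e]
                z_W[e] = INF
        if x_M == x_W:
            return x_M, z_M, [max(a, b) for a, b in zip(z_W, x_M)]
    raise RuntimeError("no convergence within max_iters")


# Hypothetical instance: M = {1}, W = {2, 3}; edges ordered as (1,2), (1,3).
def f_man(y):
    a, b = y
    if a >= 0 and b >= 0 and a + b <= 3:
        return 10 * (a + b) - a * a - b * b
    return -INF


def f_women(y):
    # Assumption: woman 2 accepts up to three dances, woman 3 at most one.
    a, b = y
    return a + b if 0 <= a <= 3 and 0 <= b <= 1 else -INF


x, z_M, z_W = extended_gs(2, f_man, f_women, bound=3)
print(x, z_M, z_W)
```

In this run man 1 first asks for (1, 2), woman 3 cuts her share to one dance, the bound z_M on edge (1, 3) drops to 1, and the second iteration settles on (2, 1).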


Applications 


Abraham, Irving, and Manlove [1] dealt with 
a student-project allocation problem which is a 
concrete example of models in [4] and [3] and 
discussed the structure of stable allocations. 

Fleiner [5] generalized the stable marriage 
problem and its extension in [4] to a wide frame- 
work and showed the existence of a stable alloca- 
tion by using a fixed point theorem. 

Fujishige and Tamura [6] proposed a common 
generalization of the stable marriage problem and 
the assignment game defined by Shapley and 
Shubik [10] by utilizing M♮-concave functions
and gave a constructive proof of the existence of 
a stable allocation. 


Open Problems 


Algorithm EXTENDED-GS solves the maximization
problem of an M♮-concave function in each
iteration. A maximization problem of an
M♮-concave function f on E can be solved in
polynomial time in |E| and log L, where L =
max{||x − y||_∞ | x, y ∈ dom f}, provided that
the function value f(x) can be calculated in
constant time for each x [11, 12]. Eguchi, Fujishige,
and Tamura [3] showed that EXTENDED-GS
terminates after at most L iterations, where
L is defined by max{||x||_∞ | x ∈ dom f_M} in this
case, and there exists a series of instances in which
EXTENDED-GS requires a number of iterations
proportional to L. On the other hand, Baïou and
Balinski [2] gave a polynomial time algorithm
in |E| for the special case where f_M and f_W
are linear on rectangular domains. Whether a
stable allocation for the general case can be found
in polynomial time in |E| and log L or not
is open.
Problem Definition


Over the last 50 years, the stable marriage 
problem has been extensively studied for many 
problem settings (see, e.g., [11]), and one of 
the most intensively studied problem settings is 
MAX SMTI (MAXimum Stable Marriage with 
Ties and Incomplete lists). An input for the stable 
marriage problem consists of n men, n women,
and each person’s preference list for the people of 
the opposite sex. In MAX SMTI, the preference 
list of each person can be incomplete, which 
means that each person is allowed to exclude 
unacceptable people from the preference list, and 
the preference list of each person is allowed to 
include ties to show indifference between two or 
more people. 


Stable Marriage with One-Sided Ties 


The objective of MAX SMTI is to find the 
largest matching that satisfies a stability condi- 
tion. Before describing the stability condition, we 
review some notation. A matching M is defined 
as a set of pairs of man m and woman w such 
that m and w are acceptable to each other. The 
size of a matching M is defined as the number of 
pairs in M. We say that a person p is single if p 
is not matched in M. When man m and woman 
w are matched in M, we write M(m) = w and 
M(w) = m. We say that matching M is stable if 
it does not contain any pair of man and woman, 
each of whom prefers the other to the partner 
in M (if any). More precisely, a matching M is
stable if there is no pair of man m’ and woman w’ 
that satisfy all three conditions (i)—(iii): (i) m’ and 
w’ are acceptable to each other but not matched 
in M, (ii) m’ is single in M or m’ strictly prefers 
w’ to M(m’), and (iii) w’ is single in M or w’ 
strictly prefers m’ to M(w’). MAX SMTI asks 
us to find a stable matching of the largest size, 
and this problem is known to be NP-hard [12]. 
Therefore, the approximability of this problem 
has been intensively studied. 

In this entry, we show recent results for two 
major variants of MAX SMTI. One of the vari- 
ants is MAX SMOTI (MAXimum Stable Mar- 
riage with One-Sided Ties and Incomplete lists), 
in which only women are allowed to include ties 
in their preference lists and the preference lists 
of men are strictly ordered. The other variant 
is MAX SSMTI (Special SMTI), which is an 
even more restricted variant of MAX SMOTI 
where the ties are only allowed at the ends of 
the women’s preference lists. Note that these two 
variants are still known to be NP-hard [12]. 


Problem 1 (MAX SMOTI) 

INPUT: n men, n women, and each person’s
preference list, where only women have ties 

OUTPUT: A stable matching of maximum size 


Problem 2 (MAX SSMTI) 

INPUT: n men, n women, and each person’s
preference list, where ties are at the ends of 
the women’s preference lists 

OUTPUT: A stable matching of maximum size 


Stable Marriage with One-Sided Ties, Table 1 Examples of instances for MAX SMOTI and MAX SSMTI

MAX SMOTI
m_1: w_2 w_1        w_1: m_1 m_2
m_2: w_2 w_3 w_1    w_2: (m_1 m_2) m_3
m_3: w_3 w_2        w_3: m_2 m_3

MAX SSMTI
m_1: w_1 w_3        w_1: (m_1 m_2 m_3)
m_2: w_2 w_3 w_1    w_2: m_2
m_3: w_3 w_1        w_3: m_2 (m_1 m_3)


Examples

Table 1 shows examples of instances for
MAX SMOTI and MAX SSMTI. The instance
for MAX SMOTI contains a set of men
{m_1, m_2, m_3} and a set of women {w_1, w_2, w_3}.
The preference list of each person is described in
decreasing order of preference, and tied people
are enclosed in a pair of parentheses. For example,
woman w_2 is indifferent between m_1 and m_2 but
prefers m_1 or m_2 over m_3. A matching M =
{(m_2, w_1), (m_3, w_2)} is not stable for this MAX
SMOTI instance, because m_2 strictly prefers w_2
to w_1 (= M(m_2)) and w_2 strictly prefers m_2 to
m_3 (= M(w_2)). An example of a stable matching
for this instance is M' = {(m_1, w_2), (m_2, w_3)},
and we can find another larger stable matching
M* = {(m_1, w_1), (m_2, w_2), (m_3, w_3)} of size 3.


Key Results


Here we review past research on MAX SMOTI
and MAX SSMTI. We start by describing a simple
proposal-based algorithm (often referred to
as the Gale-Shapley algorithm or the deferred
acceptance algorithm), which is guaranteed to
find a stable matching. In this algorithm, all
of the men and women are initially set to be
single. We pick an arbitrary man m who is single,
and let man m propose to woman w at the top
of his preference list. When man m proposes
to w, he deletes woman w from his preference
list. Woman w always accepts any proposal if
she is single, which makes a matching pair of
m and w. We repeat this proposal procedure to
find more and more matching pairs. When a
woman w, who is already matched to a man m,
receives another proposal from man m’, woman
w chooses the more highly ranked man based on
her preference list. (That is, the matching partner
of w is unchanged if w prefers m to m’, and the
matching partner of w is changed from m to m’
and m becomes unmatched if w prefers m’ to
m.) If m and m’ are tied in w’s preference list,
then w chooses an arbitrary man. The proposal
procedure continues until we cannot find any
man who can propose. (That is, this algorithm
terminates when all of the men become matched
or the preference lists of all single men become
empty.) Any matching obtained by this algorithm
can be proven to be stable. The size of the
obtained stable matching depends mostly on the
decisions by women when a woman receives two
proposals from men who are tied in her preference
list. In the worst case, the size of an obtained
matching can be half the size of an optimum matching,
and hence, the approximation ratio of this
algorithm is 2. It was an open problem whether
or not there exists an approximation algorithm
whose approximation ratio is strictly better than
2. Iwama, Miyazaki, and Yamauchi [8] provided
an affirmative answer for this open problem with
a 1.875-approximation algorithm.

After this breakthrough, Király [10] developed
a new simple 1.5-approximation algorithm
for MAX SMOTI (which also applies to MAX
SSMTI) by improving the decision strategy of the
proposal-based algorithm when women receive
multiple proposals from tied men. His algorithm
proceeds in the same way as the proposal-based
algorithm until one of the men’s preference lists
becomes empty. When the preference list of a man
becomes empty, he enters into his second round.
Specifically, he recovers his original preference
list so that he can propose to the women in
his original preference list again, but his status
is changed to “promoted.” A promoted man is
not allowed to recover his original preference
list when his preference list becomes empty the
second time, and hence, no man can enter a
third round in Király’s algorithm. The decision
strategy of the women is changed so that a woman
is forced to choose a promoted man (if one exists)
when she receives two proposals from men who
are tied in her preference list. This improvement
of the decision strategy is the key to achieving the
1.5-approximation.
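
The two-round proposal scheme with promoted men can be sketched as follows. This Python code follows the informal description above rather than the exact formulation in [10]; the function name `two_round_proposals`, the encoding of women's lists as tiers of tied men, and both instances are assumptions of this illustration.

```python
from collections import deque


def two_round_proposals(men_prefs, women_tiers):
    """Proposal algorithm with promotion, for ties on the women's side only.

    men_prefs: man -> strictly ordered list of women.
    women_tiers: woman -> list of tiers, each tier a list of tied men.
    """
    rank = {w: {m: i for i, tier in enumerate(tiers) for m in tier}
            for w, tiers in women_tiers.items()}
    remaining = {m: deque(lst) for m, lst in men_prefs.items()}
    promoted = {m: False for m in men_prefs}
    partner = {}                          # woman -> man
    free = deque(men_prefs)

    while free:
        m = free.popleft()
        if not remaining[m]:
            if promoted[m] or not men_prefs[m]:
                continue                  # no third round: m stays single
            promoted[m] = True            # second round: recover original list
            remaining[m] = deque(men_prefs[m])
        w = remaining[m].popleft()
        if m not in rank.get(w, {}):
            free.append(m)                # m is unacceptable to w
            continue
        cur = partner.get(w)
        if cur is None:
            partner[w] = m
        elif (rank[w][m], not promoted[m]) < (rank[w][cur], not promoted[cur]):
            # w strictly prefers m, or they are tied and only m is promoted.
            partner[w] = m
            free.append(cur)
        else:
            free.append(m)
    return {(m, w) for w, m in partner.items()}


# Hypothetical instance where promotion enlarges the matching.
men_small = {"m1": ["w1", "w2"], "m2": ["w1"]}
women_small = {"w1": [["m1", "m2"]], "w2": [["m1"]]}
print(two_round_proposals(men_small, women_small))

# The MAX SMOTI instance of Table 1.
men_t1 = {"m1": ["w2", "w1"], "m2": ["w2", "w3", "w1"], "m3": ["w3", "w2"]}
women_t1 = {"w1": [["m1"], ["m2"]], "w2": [["m1", "m2"], ["m3"]],
            "w3": [["m2"], ["m3"]]}
print(two_round_proposals(men_t1, women_t1))
```

In the small instance, m2 is first rejected by w1 in favor of the tied m1, returns as a promoted man, and displaces m1, who then moves on to w2. On the Table 1 instance, with men processed in the order shown, this sketch returns a stable matching of size 2 although a stable matching of size 3 exists, which is within the factor of 1.5.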

Iwama, Miyazaki, and Yanagisawa [9] further 
improved the approximation ratio to 25/17 (< 
1.4706) for MAX SMOTI with a new algorithm 
GSA-LP, which uses a more complex proposal 
sequence of the men and a more sophisticated 
decision strategy for the women. In GSA-LP, 
we compute an optimum solution for a linear 
programming relaxation of a natural integer pro- 
gramming formulation of the problem in advance 
and use it for the decision strategy of the women. 
In addition, the proposal sequence is changed 
so that a man can propose to a woman many 
times, and a man is allowed to recover his original 
preference list at most twice (in other words, a 
man is allowed to go into a third round). These 
changes yield an improved approximation ratio 
for MAX SMOTI. Very recently, GSA-LP was 
shown to achieve a 1.25-approximation for MAX
SSMTI [5]. 

For MAX SMOTI, there are successive im- 
provements over GSA-LP. Huang and Kavitha [4] 
developed another new algorithm that achieves 
a 22/15 (<1.4667)-approximation by using 
a maximum matching algorithm. Radnai [14]
showed a 41/28 (<1.4643)-approximation by
using a more detailed analysis of this new
algorithm and also showed that a lower bound 
of the approximation ratio of this algorithm is 
at least 13/9 (>1.4444). Dean and Jalasutram 
improved the analysis of GSA-LP and showed 
that the approximation ratio of GSA-LP is 
at most 19/13 (<1.4616) if we increase the 
number of rounds from three to four [1]. 
We also note that if the lengths of ties are 
restricted to two, then the approximation ratio 
of this restricted MAX SMOTI variant can be 
further improved. A randomized algorithm [2]
achieves a 10/7 (<1.4286)-approximation and
Huang and Kavitha devised another deterministic 
algorithm [4] with the same approximation 
ratio. 
On the negative side, both MAX SMOTI and
MAX SSMTI are NP-hard to approximate within 
any constant factor better than 21/19 (>1.1052) 
and hard to approximate within any constant 
factor better than 5/4 (=1.25) under the unique 
games conjecture [3, 15]. These lower bounds 
hold even if we restrict the lengths of the ties 
to two. Note that the approximation ratio of the 
GSA-LP algorithm for MAX SSMTI is 1.25, 
which matches the lower bound under the unique 
games conjecture. 


Applications 


MAX SSMTI was introduced by Irving and 
Manlove [6] based on an actual application of 
the Scottish Foundation Allocation Scheme, 
which allocates residents (medical students) to 
hospitals. In this scheme, each resident submits 
a strictly ordered preference list, while each 
hospital submits a preference list that may 
contain one tie of arbitrary length at the end 
of the list. The objective of this allocation 
scheme is to maximize the number of allocated 
residents, and it is easy to reformulate this 
many-to-one allocation scheme as a one-to- 
one matching problem (MAX SSMTI) using a 
cloning technique [11]. 


Open Problems 


An obvious future goal is to narrow the gap 
between the upper and lower bounds of the 
approximability of MAX SMOTI. Assuming 
the unique games conjecture is true, we now 
know that the best possible approximation 
ratio is between 1.25 and 1.4616. Even if we
restrict the lengths of ties to two, all we can do 
now is reduce the upper bound slightly down 
to 1.4286. Thus, there is still much room for 
improvement. 

As for MAX SSMTI, the 1.25-approximation 
of the GSA-LP algorithm is the best possible 
if the unique games conjecture is true. A fu- 
ture project could investigate if we can construct 
a faster approximation algorithm, because the 
GSA-LP algorithm uses a linear programming
relaxation technique, which takes superlinear time
in the worst case. 


Experimental Results 


Irving and Manlove [7] reported on experimental
evaluations of some heuristic algorithms
including Király’s algorithm on real-world
and random instances for MAX SMOTI.
Subsequently, Podhradsky [13] conducted 
experimental evaluations on random instances 
for MAX SMOTI and MAX SSMTI using 
some other heuristic algorithms including 
GSA-LP. 
Problem Definition


In the original setting of the stable marriage 
problem introduced by Gale and Shapley [2], 
each preference list has to include all members of 
the other party, and furthermore, each preference 
list must be totally ordered (see also the entry ▶ Stable
Marriage).

One natural extension of the problem is then 
to allow persons to include ties in preference 
lists. In this extension, there are three variants 
of the stability definition, super-stability, strong 
stability, and weak stability (see below for defini- 
tions). In the first two stability definitions, there 
are instances that admit no stable matching, but 
there is a polynomial-time algorithm in each case 
that determines if a given instance admits a stable 
matching and finds one if one exists [9]. On the 
other hand, in the case of weak stability, there 
always exists a stable matching, and one can be 
found in polynomial time. 

Another possible extension is to allow persons 
to declare unacceptable partners, so that prefer- 
ence lists may be incomplete. In this case, every 
instance admits at least one stable matching, but 
a stable matching may not be a perfect matching. 
However, if there are two or more stable match- 
ings for one instance, then all of them have the 
same size [3]. 

The problem treated in this entry allows both 
extensions simultaneously, which is denoted as 
SMTI (stable marriage with ties and incomplete 
lists). 


Notations 

An instance I of SMTI comprises n men, n
women, and each person’s preference list that
may be incomplete and may include ties. If a man
m includes a woman w in his list, w is acceptable
to m. w_i ≻_m w_j means that m strictly prefers
w_i to w_j in I. w_i =_m w_j means that w_i and w_j
are tied in m’s list (including the case w_i = w_j).
The statement w_i ⪰_m w_j is true if and only if
w_i ≻_m w_j or w_i =_m w_j. Similar notations are
used for women’s preference lists. A matching M
is a set of pairs (m, w) such that m is acceptable
to w, and vice versa, and each person appears at
most once in M. If a man m is matched with a
woman w in M, it is written as M(m) = w and
M(w) = m.

A man $m$ and a woman $w$ are said to form
a blocking pair for weak stability for $M$ if
they are not matched together in $M$, but by
matching them, both become better off, namely,
(i) $M(m) \neq w$ but $m$ and $w$ are acceptable to
each other, (ii) $w \succ_m M(m)$ or $m$ is single in $M$,
and (iii) $m \succ_w M(w)$ or $w$ is single in $M$.

Two persons $x$ and $y$ are said to form a
blocking pair for strong stability for $M$ if they
are not matched together in $M$, but by matching
them, one becomes better off, and the other does
not become worse off, namely, (i) $M(x) \neq y$ but
$x$ and $y$ are acceptable to each other, (ii) $y \succ_x M(x)$
or $x$ is single in $M$, and (iii) $x \succeq_y M(y)$
or $y$ is single in $M$.

A man $m$ and a woman $w$ are said to form
a blocking pair for super-stability for $M$ if they
are not matched together in $M$, but by matching
them, neither becomes worse off, namely,
(i) $M(m) \neq w$ but $m$ and $w$ are acceptable to
each other, (ii) $w \succeq_m M(m)$ or $m$ is single in $M$,
and (iii) $m \succeq_w M(w)$ or $w$ is single in $M$.

A matching M is called weakly stable 
(strongly stable and super-stable, respectively) 
if there is no blocking pair for weak (strong and 
super, respectively) stability for M. 
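
The weak-stability condition above can be checked mechanically. The following is a minimal sketch, not taken from the cited papers: it assumes a hypothetical representation in which each preference list is a Python list of sets (one set per tie, most preferred first), and a matching is a dict from men to women. The names `rank` and `weak_blocking_pairs` are illustrative only.

```python
def rank(pref, person, candidate):
    """Position of the tie containing `candidate` in `person`'s list
    (0 = most preferred), or None if `candidate` is unacceptable."""
    for level, tie in enumerate(pref[person]):
        if candidate in tie:
            return level
    return None


def weak_blocking_pairs(men_pref, women_pref, matching):
    """Return all blocking pairs for weak stability.

    `matching` maps each matched man to his partner; it is assumed to be
    a valid matching (mutually acceptable pairs, each person at most once).
    """
    partner_of_woman = {w: m for m, w in matching.items()}
    blocking = []
    for m in men_pref:
        for w in women_pref:
            rm, rw = rank(men_pref, m, w), rank(women_pref, w, m)
            # Condition (i): not matched together, mutually acceptable.
            if rm is None or rw is None or matching.get(m) == w:
                continue
            cur_w = matching.get(m)
            cur_m = partner_of_woman.get(w)
            # Conditions (ii) and (iii): both strictly better off, or single.
            m_better = cur_w is None or rm < rank(men_pref, m, cur_w)
            w_better = cur_m is None or rw < rank(women_pref, w, cur_m)
            if m_better and w_better:
                blocking.append((m, w))
    return blocking


# Illustrative instance: m1 is indifferent between w1 and w2.
men_pref = {"m1": [{"w1", "w2"}], "m2": [{"w1"}]}
women_pref = {"w1": [{"m1"}, {"m2"}], "w2": [{"m1"}]}

print(weak_blocking_pairs(men_pref, women_pref, {"m1": "w1"}))
print(weak_blocking_pairs(men_pref, women_pref, {"m1": "w2", "m2": "w1"}))
```

On this instance both matchings are weakly stable although they have sizes 1 and 2, which illustrates why the maximum-size version of the problem is of interest. Replacing the strict comparisons `<` by `<=` in the appropriate places would give the analogous checks for strong and super-stability.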


Problem 1 (SMTI)


INPUT: $n$ men, $n$ women, and each person’s
preference list
OUTPUT: A stable matching


Problem 2 (MAX SMTI)


INPUT: $n$ men, $n$ women, and each person’s
preference list
OUTPUT: A stable matching of maximum size


The following problem is a restriction of MAX
SMTI in terms of the length of preference lists:


Problem 3 (($p$,$q$)-MAX SMTI)


INPUT: $n$ men, $n$ women, and each person’s
preference list, where each man’s preference list
includes at most $p$ women and each woman’s
preference list includes at most $q$ men

OUTPUT: A stable matching of maximum size


Stable Marriage with Ties and Incomplete Lists 


Definition of the Approximation Ratio 

A goodness measure of an approximation al- 
gorithm T for a maximization problem is de- 
fined as follows: the approximation ratio of T 
is max{opt(x)/T(x)} over all instances x of 
size N, where opt(x) and T(x) are the sizes of 
the optimal and the algorithm’s solutions, respec- 
tively. 


Key Results 


SMTI and MAX SMTI in Super-Stability and
Strong Stability

Theorem 1 ([20]) There is an $O(n^2)$-time algorithm
that determines if a given SMTI instance
admits a super-stable matching and finds one if 
one exists. 


Theorem 2 ([17]) There is an $O(n^3)$-time algorithm
that determines if a given SMTI instance
admits a strongly stable matching and finds one 
if one exists. 


It is shown that all stable matchings for a fixed 
instance are of the same size [20]. Therefore, the 
above theorems imply that MAX SMTI can also 
be solved in the same time complexity. 


SMTI and MAX SMTI in Weak Stability 

In the case of weak stability, every instance ad- 
mits at least one stable matching, but one instance 
can have stable matchings of different sizes. If the 
size is not important, a stable matching can be 
found in polynomial time by breaking ties arbi- 
trarily and applying the Gale-Shapley algorithm. 
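
The tie-breaking approach can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors’ code: preference lists are assumed to be lists of sets (one set per tie), ties are broken by sorted name (any order would do), and `break_ties` and `gale_shapley` are hypothetical helper names. The second function is the usual man-proposing deferred-acceptance procedure adapted to incomplete lists.

```python
def break_ties(pref, reverse=False):
    """Flatten each list of ties into a strict order.
    Ties are broken by sorted name; `reverse` flips that arbitrary choice."""
    return {p: [c for tie in groups for c in sorted(tie, reverse=reverse)]
            for p, groups in pref.items()}


def gale_shapley(men_strict, women_strict):
    """Man-proposing deferred acceptance with strict, possibly incomplete
    lists. Returns a dict mapping each matched man to his partner."""
    women_rank = {w: {m: i for i, m in enumerate(lst)}
                  for w, lst in women_strict.items()}
    next_choice = {m: 0 for m in men_strict}
    fiance = {}                      # woman -> man currently holding her
    free = list(men_strict)
    while free:
        m = free.pop()
        lst = men_strict[m]
        while next_choice[m] < len(lst):
            w = lst[next_choice[m]]
            next_choice[m] += 1
            if m not in women_rank.get(w, {}):
                continue             # w does not find m acceptable
            cur = fiance.get(w)
            if cur is None:
                fiance[w] = m
                break
            if women_rank[w][m] < women_rank[w][cur]:
                fiance[w] = m
                free.append(cur)     # the displaced man proposes again later
                break
        # If m exhausts his list, he stays single.
    return {m: w for w, m in fiance.items()}


men_pref = {"m1": [{"w1", "w2"}], "m2": [{"w1"}]}
women_pref = {"w1": [{"m1"}, {"m2"}], "w2": [{"m1"}]}

small = gale_shapley(break_ties(men_pref), break_ties(women_pref))
large = gale_shapley(break_ties(men_pref, reverse=True), break_ties(women_pref))
print(small)   # {'m1': 'w1'}
print(large)
```

Both outputs are weakly stable for the original instance with ties, but the first has size 1 and the second size 2: the way ties are broken determines the size of the matching found.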


Theorem 3 There is an $O(n^2)$-time algorithm
that finds a weakly stable matching for a given 
SMTI instance. 


However, if larger stable matchings are re- 
quired, the problem becomes hard. 


Theorem 4 ([5, 13, 21, 24]) MAX SMTI is
NP-hard and cannot be approximated within
$33/29 - \epsilon$ for any positive constant $\epsilon$ unless
P=NP. ($33/29 > 1.137$)


The following approximation ratio is achieved 
by a local search type algorithm. 

Theorem 5 ([14]) There is a polynomial-time
approximation algorithm for MAX SMTI whose 
approximation ratio is at most 15/8 (=1.875). 


There are a couple of approximation algo- 
rithms for restricted inputs. 


Theorem 6 ([6]) There is a polynomial-time
randomized approximation algorithm for MAX
SMTI whose expected approximation ratio is at
most 10/7 (< 1.429) if, in a given instance, ties
appear in one side only and the length of each tie
is two.


Theorem 7 ([6]) There is a polynomial-time
randomized approximation algorithm for MAX
SMTI whose expected approximation ratio is at
most 7/4 (= 1.75) if, in a given instance, the
length of each tie is two.


Theorem 8 ([7]) There is a polynomial-time
approximation algorithm for MAX SMTI whose
approximation ratio is at most $2/(1 + L^{-2})$ if, in
a given instance, ties appear in one side only and
the length of each tie is at most $L$.


Theorem 9 ([7]) There is a polynomial-time
approximation algorithm for MAX SMTI whose
approximation ratio is at most 13/7 (< 1.858) if, in
a given instance, the length of each tie is two.


($p$, $q$)-MAX SMTI in Weak Stability

Irving et al. [12] show the boundary between 
P and NP-hardness in terms of the length of 
preference lists. 


Theorem 10 ([12]) (2,$\infty$)-MAX SMTI is solvable
in time $O(n^{3/2} \log n)$.

Theorem 11 ([12]) (3,3)-MAX SMTI is NP-hard.


Theorem 12 ([12]) (3,4)-MAX SMTI is NP-hard
and cannot be approximated within some constant
$\delta$ (> 1) unless P=NP.


Applications


One of the most famous applications of the stable
marriage problem is a centralized assignment
system between medical students (residents) and
hospitals. This is an extension of the stable marriage
problem to a many-one variant: each hospital
declares the number of residents it can accept,
which may be more than one, while each resident
has to be assigned to at most one hospital. There
are several such systems in the world,
known as NRMP in the USA [4], CaRMS in
Canada [1], SFAS (previously known as SPA) in
Scotland [10, 11], and JRMP in Japan [16]. One
of the optimization criteria is clearly the number
of matched residents. In a real-world application
such as the above hospitals-residents matching,
hospitals and residents tend to submit short
preference lists that may include ties, in which case
the problem can be naturally considered as MAX
SMTI.


Open Problems 


An apparent open problem is to narrow the gap of 
approximability and inapproximability of MAX 
SMTI in weak stability. 

Since the publication of the key result of
this chapter (Theorem 5), there have been many
improvements. Király [18] presented a linear-time
5/3-approximation algorithm (see Simpler
Approximation for Stable Marriage). McDermid [22]
then presented a 1.5-approximation
algorithm (see Simpler Approximation for
Stable Marriage), and Király [19] and Paluch [23]
presented simpler algorithms with the same
approximation ratio, which is the current best
upper bound. The lower bound was improved by
Yanagisawa [24], who showed that MAX SMTI
is inapproximable to within a ratio smaller than
33/29 (> 1.137) unless P=NP. He also showed
that MAX SMTI is inapproximable within a
ratio smaller than 4/3 (> 1.333) under the Unique
Games Conjecture (UGC).

As for the special case where ties can appear
in one side only (see Stable Marriage with
One-Sided Ties), Király [18] presented a
1.5-approximation algorithm. It was then improved to
25/17 (< 1.471) [15] and to 22/15 (< 1.467) [8],
which is the current best upper bound. The current
best lower bounds are 21/19 (> 1.105) under
P≠NP and 1.25 under UGC [7].


Problem Definition


Let $N$ be a finite set of players; a nonempty
subset of $N$ is called a coalition. Each player
$i \in N$ has a preference relation $\succeq_i$ (complete,
reflexive, and transitive) over all the coalitions
that contain $i$. Notation $S \succeq_i T$ means that player
$i$ weakly prefers coalition $S$ to coalition $T$; if
$S \succeq_i T$ and not $T \succeq_i S$, then player $i$ strictly
prefers $S$ to $T$, denoted by $S \succ_i T$. If $S \succeq_i T$
and $T \succeq_i S$, then player $i$ is indifferent between
coalitions $S$ and $T$ (there is a tie in her preference
list). Player $i$ has strict preferences if her
preference list contains no ties. There are several
possible ways of representing preferences, but it
is usually supposed that preference relations can
be evaluated in polynomial time.

An instance $I$ of the stable partition problem
(or coalition formation game, or hedonic game) is
given by the set of players and their preferences.

A partition $\Pi$ is a collection of disjoint
coalitions whose union equals $N$. It is supposed
that each participant’s appreciation of a coalition
structure depends only on the coalition $\Pi(i)$ of
which she is a member and not on the composition
of other coalitions. Of interest are partitions that
fulfill some kind of stability requirements.

We say that a coalition $S \subseteq N$ strongly blocks
a partition $\Pi$ if each player $i \in S$ strictly prefers
$S$ to $\Pi(i)$, and a coalition $S \subseteq N$ weakly blocks
a partition $\Pi$ if each player $i \in S$ weakly prefers
$S$ to $\Pi(i)$ and there exists at least one player
$j \in S$ who strictly prefers $S$ to $\Pi(j)$. Partition
$\Pi$ is:


• Individually rational if each player $i$ weakly
prefers $\Pi(i)$ to $\{i\}$;

• Nash stable (NS) if each player $i$ weakly
prefers $\Pi(i)$ to $X \cup \{i\}$ for each $X \in \Pi \cup \{\emptyset\}$;

• Individually stable (IS) if whenever a player $i$
strictly prefers $X \cup \{i\}$ to $\Pi(i)$ for some $X \in \Pi$,
then $X \succ_j X \cup \{i\}$ for at least one player
$j \in X$;

• Contractually individually stable (CIS) if
whenever a player $i$ strictly prefers $X \cup \{i\}$ to
$\Pi(i)$ for some $X \in \Pi$, then $X \succ_j X \cup \{i\}$
for at least one player $j \in X$ or $\Pi(i) \succ_j \Pi(i) \setminus \{i\}$
for at least one player $j \in \Pi(i)$;

• Core stable if it admits no strongly blocking coalition;

• Strictly core stable if it admits no weakly
blocking coalition.


Most of these definitions were introduced in 
[3] and [4] where also some sufficient conditions 
for the existence of stable partitions were formu- 
lated. An overview of the implications between 
these definitions can be found in [1]. The follow- 
ing problems have been studied algorithmically 
for various stability notions S: 


• S-STABILITY-VERIFICATION: Given $I$ and a
partition $\Pi$, is $\Pi$ an S-stable partition?

• S-STABILITY-EXISTENCE: Given $I$, does an
S-stable partition exist?

• S-STABILITY-CONSTRUCTION: Given $I$,
construct an S-stable partition.

• S-STABILITY-STRUCTURE: Describe the
structure of S-stable partitions for a
given $I$.


The computational complexity of these prob- 
lems depends on the specification of the prefer- 
ence relation in the input. 

An important special case of the stable partition
problem arises when each coalition can
contain at most two players. This is known under 
the name the Stable Matching Problem and is 
treated in detail in [14]; see also references in the 
entry Stable Marriage. 


Key Results 


Trivial Encoding 

In the trivial encoding, each player lists all her
individually rational coalitions (i.e., those that
player $i$ weakly prefers to coalition $\{i\}$).


Stable Partition Problem 


Theorem 1 Under the trivial encoding, the
STABILITY-VERIFICATION problem is polynomially
solvable for any stability definition.
STABILITY-EXISTENCE is NP-complete for IR,
NS, core, and strict core [2]. CORE-STABILITY-EXISTENCE
is NP-complete [2], even in the case
when each player $i$ has her preference list of the
form $C_1(i) \succ_i C_2(i) \succ_i \{i\}$ and all acceptable
coalitions have size three [11].


As the trivial encoding may be of exponential 
size in the number of players, more succinct 
preference representations have been studied. 


Anonymous Preferences 

Players have anonymous preferences if all coali- 
tions of the same size are tied, i.e., players do 
not care about the actual content of the coalitions, 
only about their sizes. 


Theorem 2 Under anonymous preferences, the 
CORE-STABILITY-VERIFICATION problem is 
polynomially solvable and CORE-STABILITY- 
EXISTENCE is NP-complete [2]. 


Additive Preferences 
In an additive hedonic game, each player $i$ has a
real-valued function $v_i : N \to \mathbb{R}$, and $S \succeq_i T$ if
and only if $\sum_{j \in S} v_i(j) \ge \sum_{j \in T} v_i(j)$.
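
As an illustration of how additive preferences are evaluated, the sketch below checks Nash stability of a given partition. It is a minimal example under assumptions that are not part of the entry: valuations are stored in a hypothetical nested dict `v[i][j]`, a player’s value for herself is taken to be 0, and a partition is a list of frozensets. The names `utility` and `is_nash_stable` are illustrative.

```python
def utility(v, i, coalition):
    """Additive value of `coalition` for player i.
    Assumption: v[i][i] is treated as 0, so i herself is skipped."""
    return sum(v[i][j] for j in coalition if j != i)


def is_nash_stable(v, partition):
    """True if no player strictly prefers X ∪ {i} to her own coalition,
    for X ranging over the other coalitions and the empty set."""
    for S in partition:
        for i in S:
            current = utility(v, i, S)
            targets = [T for T in partition if T is not S] + [frozenset()]
            if any(utility(v, i, T | {i}) > current for T in targets):
                return False
    return True


# Players 0 and 1 like each other and dislike 2; player 2 is indifferent.
v = {0: {1: 1, 2: -2}, 1: {0: 1, 2: -2}, 2: {0: 0, 1: 0}}

together = [frozenset({0, 1}), frozenset({2})]
singletons = [frozenset({0}), frozenset({1}), frozenset({2})]
print(is_nash_stable(v, together))
print(is_nash_stable(v, singletons))
```

Verification of this kind takes polynomial time for Nash stability because only single-player deviations are examined; the hardness results stated next concern core stability, where arbitrary coalitions may deviate, and the existence questions.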


Theorem 3 In additive hedonic games,
STABILITY-VERIFICATION is co-NP-complete in the
strong sense for core and strict core [1, 17].
CORE-STABILITY-EXISTENCE and
STRICT-CORE-STABILITY-EXISTENCE are strongly
NP-hard [18] even in the symmetric case
[1]. INDIVIDUAL-STABILITY-EXISTENCE and
NASH-STABILITY-EXISTENCE are strongly
NP-complete [1, 18]. Moreover,
CORE-STABILITY-EXISTENCE is $\Sigma_2^p$-complete [19].


Special cases of additive preferences arise if
$v_i(j) \in \{-1, |N|\}$ for each $i, j \in N$
(friend-oriented case) or $v_i(j) \in \{1, -|N|\}$ for each
$i, j \in N$ (enemy-oriented case). Under
friend-oriented as well as under enemy-oriented
preferences, a core-stable partition always exists [12];
however, the following assertion holds.




Theorem 4 ([12]) Under enemy-oriented preferences,
CORE-STABILITY-VERIFICATION and
CORE-STABILITY-CONSTRUCTION are strongly 
NP-complete and NP-hard, respectively. 


Preferences Derived from the Best and/or 
Worst Player 
Suppose that each player $i$ linearly orders only
individual players or, more precisely, a subset of 
them — these are acceptable for i. 

Preferences over players are extended to pref- 
erences over coalitions on the basis of the best or 
the worst player in the coalition as follows: 


B-preferences — a player orders coalitions first 
on the basis of the most preferred member of 
the coalition, and if those are equal or tied, the 
coalition with smaller cardinality is preferred; 

W-preferences — a player orders coalitions on 
the basis of the least preferred member of the 
coalition; 

BW-preferences — a player orders coalitions 
first on the basis of the best member of the 
coalition, and if those are equal or tied, the 
coalition with a more preferred worst member 
is preferred. 


In this case, preferences are considered strict, 
if the preferences over individuals are strict, and 
they are called dichotomous if all acceptable 
participants are tied in each preference list. 


Theorem 5 Under B-preferences, STABILITY- 
VERIFICATION is polynomial for core and strict 
core. A strict core and a core stable partition 
always exist if preferences over players are strict 
[9]. However, if preferences over players contain 
ties, STABILITY-EXISTENCE for core and strict 
core is NP-complete [6]. In the dichotomous 
case, a core stable partition can be constructed in 
polynomial time, but STRICT-CORE-STABILITY- 
EXISTENCE is NP-complete [5]. 


Let us remark here that in the case of strict 
preferences, a strict core stable partition can be 
found by the famous Top Trading Cycles algo- 
rithm [9, 20]. 

The stable partition problem under
W-preferences was studied in [7] and many
features similar to the Stable Roommates
Problem [14] were described. First, if a blocking
coalition exists, then there is a blocking
coalition of size at most 2. Hence,
CORE-STABILITY-VERIFICATION is polynomial.
CORE-STABILITY-EXISTENCE and
CORE-STABILITY-CONSTRUCTION are polynomial in
the strict preferences case, which can be shown
using an extension of Irving’s Stable Roommates
Algorithm (discussed in detail in [14]). This
algorithm can also be used to derive some results
for CORE-STABILITY-STRUCTURE. In the case
of ties, CORE-STABILITY-EXISTENCE is
NP-complete.

Under BW-preferences, in the strict preferences
case, a core partition always exists
and one can be obtained by the Top Trading
Cycles algorithm, but
STRICT-CORE-STABILITY-EXISTENCE is NP-hard.
If preferences contain ties,
CORE-STABILITY-EXISTENCE is NP-hard
too [8]. CORE-STABILITY-VERIFICATION
remains open.


Applications 


Stable partitions arise in various economic and
game-theoretical models. They appear in the
study of country formation [10] and in
multi-agent coordination scenarios and social
networking services [13]. Stability is also desired in barter
exchange economies with discrete commodities
[20, 21], including exchange of kidneys for
transplantations [5, 16]. Note that when the
cooperation of players consists in the exchange of
some items within one partition set, the exchange
cycle also has to be specified.


Open Problems 


Due to the great number of variants, many open
problems exist. In almost all cases,
STABILITY-STRUCTURE is not satisfactorily solved. For
instances with no stable partition, one may seek one
that minimizes the number of players who have
an incentive to deviate. Parallel algorithms have
also not been studied.

Experimental Results


Stochastic local search algorithms for CORE- 
STABILITY-VERIFICATION in the additive 
preferences case were reported in [15]. 

Problem Definition


Stackelberg Games: The Price of Optimum


Stackelberg games [15] may model the interplay
among an authority and rational individuals
that selfishly demand resources on a large-scale
network. In such a game, the authority (Leader)
of the network is modeled by a distinguished
player. The selfish users (Followers) are modeled
by the remaining players.

It is well known that selfish behavior 
may yield a Nash Equilibrium with cost 
arbitrarily higher than the optimum one, yielding 
unbounded Coordination Ratio or Price of 
Anarchy (PoA) [7,13]. Leader plays his strategy 
first assigning a portion of the total demand 
to some resources of the network. Followers 
observe and react selfishly assigning their 
demand to the most appealing resources. Leader 
aims to drive the system to an a posteriori 
Nash equilibrium with cost close to the overall 
optimum one [4, 6, 8, 10]. Leader may also 
be eager for his own rather than system’s 
performance [2, 3]. 

A Stackelberg game can be seen as a special,
and easy [6] to implement, case of Mechanism
Design. It avoids the complexities of either
computing taxes or assigning prices, or even
designing the network at hand [9]. However, a central
authority capable of controlling the overall demand
on the resources of a network may be unrealistic
in networks that evolve and operate under the
influence of many diverse economic entities. A
realistic way [4] to act centrally even in large nets
could be via Virtual Private Networks (VPNs)
[1]. Another flexible way is to combine such
strategies with Tolls [5, 14].

A dictator controlling the entire demand optimally
on the resources surely yields PoA = 1. On
the other hand, rational users prefer a liberal
world to live in. Thus, it is important to compute
the optimal Leader strategy which controls the
minimum of the resources (Price of Optimum)
and yields PoA = 1. What is the complexity
of computing the Price of Optimum? This is not
trivial to answer, since the Price of Optimum
depends crucially on computing an optimal Leader
strategy. In particular, [6] proved that computing
the optimal Leader strategy is hard.

The central result of this entry is Theorem 5.
It says that on nonatomic flows and arbitrary $s - t$
networks and latencies, computing the minimum
portion of flow and Leader’s optimal strategy
sufficient to induce PoA = 1 is easy [10].

Problem ($G(V, E)$, $s, t \in V$, $r$)
INPUT: Graph $G$, a latency $\ell_e$ for every edge
$e \in E$, flow $r$, a source-destination pair $(s, t)$
of vertices in $V$.
OUTPUT: (i) The minimum portion $\alpha_G$ of the
total flow $r$ sufficient for an optimal Stackelberg
strategy to induce the optimum on $G$. (ii) The
optimal Stackelberg strategy.


Models and Notations 


Consider a graph $G(V, E)$ with parallel edges
allowed. A number of rational and selfish users
wish to route from a given source $s$ to a destination
node $t$ an amount of flow $r$. Alternatively,
consider a partition of users in $k$ commodities,
where user(s) in commodity $i$ wish to route flow
$r_i$ through a source-destination pair $(s_i, t_i)$, for
each $i = 1, \ldots, k$. Each edge $e \in E$ is associated
with a latency function $\ell_e(\cdot)$, positive, differentiable,
and strictly increasing on the flow traversing it.


Nonatomic Flows 


There are infinitely many users, each routing
his/her infinitesimally small amount of the total
flow $r_i$ from a given source $s_i$ to a destination
vertex $t_i$ in graph $G(V, E)$. A flow $f$ is an
assignment of jobs $f_e$ on each edge $e \in E$. The
cost of the injected flow $f_e$ (satisfying the standard
constraints of the corresponding network-flow
problem) that traverses edge $e \in E$ equals
$c_e(f_e) = f_e \times \ell_e(f_e)$. It is assumed that on
each edge $e$ the cost is convex with respect to
the injected flow $f_e$. The overall system’s cost is
the sum $\sum_{e \in E} f_e \times \ell_e(f_e)$ of all edge costs in $G$.
Let $f_P$ be the amount of flow traversing the $s_i - t_i$
path $P$. The latency $\ell_P(f)$ of $s_i - t_i$ path $P$ is
the sum $\sum_{e \in P} \ell_e(f_e)$ of latencies per edge $e \in P$.
The cost $C_P(f)$ of $s_i - t_i$ path $P$ equals the
flow $f_P$ traversing it multiplied by path latency
$\ell_P(f)$. That is, $C_P(f) = f_P \times \sum_{e \in P} \ell_e(f_e)$. In
a Nash equilibrium, all $s_i - t_i$ paths traversed by
nonatomic users in part $i$ have a common latency,
which is at most the latency of any untraversed
$s_i - t_i$ path. More formally, for any part $i$ and
any pair $P_1, P_2$ of $s_i - t_i$ paths, if $f_{P_1} > 0$
then $\ell_{P_1}(f) \le \ell_{P_2}(f)$. By the convexity of
edge costs, the Nash equilibrium is unique and
computable in polynomial time given a floating-point
precision. Also computable is the unique
Optimum assignment $O$ of flow, assigning flow
$o_e$ on each $e \in E$ and minimizing the overall
cost $\sum_{e \in E} o_e \ell_e(o_e)$. However, not all optimally
traversed $s_i - t_i$ paths experience the same latency.
In particular, users traversing paths with high
latency have incentive to reroute toward more
speedy paths. Therefore, the optimal assignment
is unstable on selfish behavior.
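
The gap between the Nash equilibrium and the Optimum can be seen on a small instance. The sketch below is an illustration only, under assumptions that are not part of the entry: two parallel links with linear latencies $\ell_j(x) = a_j x + b_j$, $a_j > 0$. The Nash flow equalizes the latencies of the used links, while the Optimum equalizes their marginal costs $2 a_j x + b_j$; the function names `split` and `total_cost` are hypothetical.

```python
def split(a1, b1, a2, b2, r, marginal=False):
    """Flow on link 1 (the rest, r - x, goes on link 2).

    marginal=False: equalize latencies a*x + b        (Nash equilibrium).
    marginal=True : equalize marginal costs 2*a*x + b (Optimum).
    The result is clipped to [0, r] when one link is left unused.
    """
    k = 2.0 if marginal else 1.0
    x = (k * a2 * r + b2 - b1) / (k * (a1 + a2))
    return min(max(x, 0.0), r)


def total_cost(a1, b1, a2, b2, r, x):
    """System cost: sum over links of flow times latency."""
    return x * (a1 * x + b1) + (r - x) * (a2 * (r - x) + b2)


# Latencies l1(x) = x and l2(x) = x + 1, total flow r = 1.
params = (1.0, 0.0, 1.0, 1.0, 1.0)
x_nash = split(*params)
x_opt = split(*params, marginal=True)
nash_cost = total_cost(*params, x_nash)
opt_cost = total_cost(*params, x_opt)
print(x_nash, nash_cost)              # 1.0 1.0
print(x_opt, opt_cost)                # 0.75 0.875
print(nash_cost / opt_cost)           # ratio 8/7
```

In this instance the Optimum places flow 0.25 on the second link, where the latency is 1.25, while the first link has latency 0.75; the users on the second link are exactly the ones with an incentive to reroute.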

A Leader dictates a weak Stackelberg strategy
if on each commodity $i = 1, \ldots, k$ he controls
a fixed $\alpha$ portion of flow $r_i$, $\alpha \in [0, 1]$. A
strong Stackelberg strategy is more flexible, since
Leader may control $\alpha_i r_i$ flow in commodity $i$
such that $\sum_{i=1}^{k} \alpha_i = \alpha$. Let a Leader dictate flow
$s_e$ on edge $e \in E$. The a posteriori latency $\tilde{\ell}_e(n_e)$
of edge $e$, with respect to the flow $n_e$ induced by
the selfish users, equals $\tilde{\ell}_e(n_e) = \ell_e(n_e + s_e)$. In
the a posteriori Nash equilibrium, all $s_i - t_i$ paths
traversed by the free selfish users in commodity
$i$ have a common latency, which is at most the
latency of any selfishly untraversed path, and its
cost is $\sum_{e \in E} (n_e + s_e) \times \tilde{\ell}_e(n_e)$.


Atomic Splittable Flows 


There is a finite number of atomic users $1, \ldots, k$.
Each user $i$ is responsible for routing a
non-negligible flow-amount $r_i$ from a given source $s_i$
to a destination vertex $t_i$ in graph $G$. In turn, each
flow-amount $r_i$ consists of infinitesimally small
jobs.

Let flow $f$ assign jobs $f_e$ on each edge
$e \in E$. Each edge flow $f_e$ is the sum of partial
flows $f_e^1, \ldots, f_e^k$ injected by the corresponding
users $1, \ldots, k$. That is, $f_e = f_e^1 + \cdots + f_e^k$.
As in the model above, the latency on a given
$s_i - t_i$ path $P$ is the sum $\sum_{e \in P} \ell_e(f_e)$ of latencies
per edge $e \in P$. Let $f_P^i$ be the flow that user $i$
ships through an $s_i - t_i$ path $P$. The cost of user
$i$ on a given $s_i - t_i$ path $P$ is her
path flow $f_P^i$ routed via $P$ times the total path
latency $\sum_{e \in P} \ell_e(f_e)$. That is, the path cost equals
$f_P^i \times \sum_{e \in P} \ell_e(f_e)$. The overall cost $C_i(f)$ of user
$i$ is the sum of the corresponding path costs of all
$s_i - t_i$ paths.

In a Nash equilibrium no user $i$ can improve
his cost $C_i(f)$ by rerouting, given that any user
$j \neq i$ keeps his routing fixed. Since each atomic
user minimizes its cost, if the game consists of
only one user, then the cost of the Nash equilibrium
coincides with the optimal one.

In a Stackelberg game, a distinguished atomic
Leader player controls flow $r_0$ and plays first,
assigning flow $s_e$ on edge $e \in E$. The a posteriori
latency $\tilde{\ell}_e(x)$ of edge $e$ on induced flow
$x$ equals $\tilde{\ell}_e(x) = \ell_e(x + s_e)$. Intuitively, after
Leader’s move, the induced selfish play of the
$k$ atomic users is equivalent to atomic splittable
flows on a graph where each initial edge latency
$\ell_e$ has been mapped to $\tilde{\ell}_e$. In game parlance,
each atomic user $i \in \{1, \ldots, k\}$, having fixed
Leader’s strategy, computes his best reply against
all other atomic users $\{1, \ldots, k\} \setminus \{i\}$. If $n_e$ is the
induced Nash flow on edge $e$, this yields total cost
$\sum_{e \in E} (n_e + s_e) \times \tilde{\ell}_e(n_e)$.


Atomic Unsplittable Flows 


There are finitely many users $1, \ldots, k$, and user $i$ is
allowed to send his non-negligible job $r_i$ only on a single
path. Despite this restriction, all definitions given
in the atomic splittable model remain the same.


Key Results 


Let us see first the case of atomic splittable flows, 
on parallel M/M/1 links with different speeds 
connecting a given source-destination pair of ver- 
tices. 


Theorem 1 (Korilis, Lazar, Orda [6]) The
Leader can enforce in polynomial time the
network optimum if he/she controls flow $r_0$
exceeding a critical value $r^0$.




In the sequel, we focus on nonatomic flows on
$s - t$ graphs with parallel links. In [6], the cases
studied were primarily those in which Leader’s flow
cannot induce the network’s optimum, and it was shown
that an optimal Stackelberg strategy is easy to compute.
In this vein, if $s - t$ parallel link instances are
restricted to ones with linear latencies of equal
slope, then an optimal strategy is easy [4].


Theorem 2 (Kaporis, Spirakis [4]) The optimal
Leader strategy can be computed in polynomial
time on any instance $(G, r, \alpha)$ where $G$ is an
$s - t$ graph with parallel links and linear latencies
of equal slope.


Another positive result is that the optimal 
strategy can be approximated within (1 + €) in 
polynomial time, given that link latencies are 
polynomials with nonnegative coefficients. 


Theorem 3 (Kumar, Marathe [8]) There is
a fully polynomial approximate Stackelberg
scheme that runs in poly$(m, 1/\epsilon)$ time and outputs
a strategy with cost within $(1 + \epsilon)$ of the optimum
strategy.


For parallel link $s - t$ graphs with arbitrary
latencies more can be achieved: in polynomial
time a “threshold” value $\alpha_G$ is computed,
sufficient for the Leader’s portion to induce the
optimum. The complexity of computing optimal
strategies changes in a dramatic way around the
critical value $\alpha_G$ from “hard” to “easy” $(G, r, \alpha)$
Stackelberg scheduling instances. Call $\alpha_G$ the
Price of Optimum for graph $G$.


Theorem 4 (Kaporis, Spirakis [4]) On an input
$s - t$ parallel link graph $G$ with arbitrary
strictly increasing latencies, the minimum portion
$\alpha_G$ sufficient for a Leader to induce the optimum,
as well as his/her optimal strategy, can be computed
in polynomial time.


In conclusion, the Price of Optimum $\alpha_G$
essentially captures the hardness of instances
$(G, r, \alpha)$: for Stackelberg scheduling instances
$(G, r, \alpha \ge \alpha_G)$, the optimal Leader
strategy yields PoA = 1 and computing it is
as hard as P, while for $(G, r, \alpha < \alpha_G)$ the
optimal strategy yields PoA > 1 and computing it
is as easy as NP [10].

The results above are limited to parallel links
connecting a given $s - t$ pair of vertices. Is it
possible to efficiently compute the Price of Optimum
for nonatomic flows on arbitrary graphs? This is
not trivial to settle, not only because it relies
on computing an optimal Stackelberg strategy,
which is hard to tackle [10], but also because
Proposition B.3.1 in [11] ruled out previously
known performance guarantees for Stackelberg
strategies on general nets.

The central result of this entry is presented
below and completely resolves this question
(extending Theorem 4).


Theorem 5 (Kaporis, Spirakis [4]) On arbitrary
$s - t$ graphs $G$ with arbitrary latencies,
the minimum portion $\alpha_G$ sufficient for a Leader
to induce the optimum, as well as her optimal
strategy, can be computed in polynomial time.


Example 


Consider the optimum assignment $O$ of flow $r$
that wishes to travel from source vertex $s$ to sink
$t$. $O$ assigns flow $o_e$ incurring latency $\ell_e(o_e)$
per edge $e \in G$. Let $\mathcal{P}_{s-t}$ be the set of all $s - t$
paths. The shortest paths in $\mathcal{P}_{s-t}$ with respect
to costs $\ell_e(o_e)$ per edge $e \in G$ can be computed
in polynomial time. That is, the paths that,
given flow assignment $O$, achieve path latency
$\min_{P \in \mathcal{P}_{s-t}} \left\{ \sum_{e \in P} \ell_e(o_e) \right\}$, i.e., minimize their path
latency. It is crucial to observe that if we want
the induced Nash assignment by the Stackelberg
strategy to attain the optimum cost, then these
shortest paths are the only choice for selfish users
that are eager to travel from $s$ to $t$. Furthermore,
the uniqueness of the optimum assignment $O$
determines the minimum part of flow which can
be selfishly scheduled on these shortest paths.
Observe that any flow assigned by $O$ on a
non-shortest $s - t$ path has incentive to opt for a
shortest one. Then a Stackelberg strategy must
freeze the flow on all non-shortest $s - t$ paths.
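
The idea of freezing the flow on non-shortest paths can be sketched in a few lines. The sketch below is a simplification under stated assumptions, not the polynomial-time algorithm of Theorem 5: it assumes that the optimum $O$ is already given as a hypothetical dict `path_flows` listing every $s - t$ path (with flow 0 for unused ones), together with the edge latencies $\ell_e(o_e)$ in `edge_latency`. Explicit path enumeration can be exponential in general, so this is only meant to make the definition concrete; the name `price_of_optimum` is illustrative.

```python
def price_of_optimum(path_flows, edge_latency, r, tol=1e-9):
    """Portion of the flow r that the Leader must control, and her strategy.

    path_flows  : dict mapping a path (tuple of edge names) to the flow
                  that the optimum O routes on it (0 for unused paths).
    edge_latency: dict mapping each edge to its latency under O.
    Returns (alpha, leader), where `leader` holds the frozen path flows.
    """
    latency = {P: sum(edge_latency[e] for e in P) for P in path_flows}
    shortest = min(latency.values())
    # The Leader freezes the optimal flow on every non-shortest path.
    leader = {P: f for P, f in path_flows.items()
              if latency[P] > shortest + tol and f > 0}
    return sum(leader.values()) / r, leader


# Two parallel links e1, e2 with latencies x and x + 1 and r = 1:
# the optimum routes 0.75 on e1 (latency 0.75) and 0.25 on e2 (latency 1.25).
path_flows = {("e1",): 0.75, ("e2",): 0.25}
edge_latency = {"e1": 0.75, "e2": 1.25}

alpha, leader = price_of_optimum(path_flows, edge_latency, r=1.0)
print(alpha, leader)   # 0.25 {('e2',): 0.25}
```

With the Leader holding 0.25 units on `e2`, the a posteriori latency of `e2` is at least 1.25, so the remaining 0.75 units of selfish flow all choose `e1` at latency 0.75, and the induced assignment coincides with the optimum.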

In particular, the idea sketched above achieves
coordination ratio 1 on the graph in Fig. 1. On this
graph Roughgarden proved that a $\frac{1}{\alpha} \times$ (optimum
cost) guarantee is not possible for general $(s, t)$-networks
(Appendix B.3 in [11]).

Stackelberg Games: The Price of Optimum, Fig. 1 A bad example for Stackelberg routing

The optimal edge flows are ($r = 1$):

$o_{s \to v} = \frac{3}{4} - \epsilon$, $o_{s \to w} = \frac{1}{4} + \epsilon$, $o_{v \to w} = \frac{1}{2} - 2\epsilon$,
$o_{v \to t} = \frac{1}{4} + \epsilon$, $o_{w \to t} = \frac{3}{4} - \epsilon$.

The shortest path $P_0 \in \mathcal{P}_{s-t}$ with respect to the
optimum $O$ is $P_0 = s \to v \to w \to t$ (see
[11] p. 143, 5th to 3rd lines before the end) and its
flow is $f_{P_0} = \frac{1}{2} - 2\epsilon$. The non-shortest paths are
$P_1 = s \to v \to t$ and $P_2 = s \to w \to t$ with
corresponding optimal flows $f_{P_1} = \frac{1}{4} + \epsilon$ and
$f_{P_2} = \frac{1}{4} + \epsilon$. Thus, the Price of Optimum is

$\alpha_G = f_{P_1} + f_{P_2} = \frac{1}{2} + 2\epsilon$.

Applications 


Stackelberg strategies are widely applicable in 
networking [6], see also Section 6.7 in [12]. 


Open Problems 


It is important to extend the above results to
atomic unsplittable flows.

Problem Definition


Algorithmic self-assembly is concerned with 
hands-off assembly of complex structures by 
mixing collections of simple particles that 
aggregate according to local rules. Staged 
self-assembly utilizes sequences of mixings to 
reduce the number of particle types used. The 
standard model of staged self-assembly builds 
on the abstract Tile Assembly Model (aTAM) 
of Winfree [7], where each particle is a non- 
rotatable unit square tile with a labeled glue on 
each side. Tiles attach to other tiles edgewise 
via glues of the same label, forming polyomino- 
shaped aggregates called assemblies. 

In the simplest model, a pair of assemblies
(of which single tiles are a special case) can
attach via a single matching glue. In a more
general model, a pair of assemblies can attach
if they share a total of $\tau \in \mathbb{N}$ glues. The
parameter $\tau$ is called the temperature of the
system.

The self-assembly process is carried out by
combining an infinite number of copies of a
collection of reagent assemblies in a bin, where
they attach in every possible way. The subset of
the resulting assemblies that cannot attach to any
other assemblies defines the product assemblies of
the mixing, i.e., the set of assemblies that remain
once the assembly process is complete. A system
consisting of a single bin with single-tile reagent
assemblies is a hierarchical [2], two-handed [1],
or polyomino [5] self-assembly system.

In a staged self-assembly system [3], the products
of one bin can be used as the reagents of
other bins (see Fig. 1). The directed acyclic graph
describing the relationship between mixings is
called the mix graph of the system. An initial set
of mixings each have a single tile as the only
product assembly and no reagent assemblies.


Objectives In general, the goal is to design a
system with a mixing containing a single product
assembly of a desired polyomino shape while
minimizing the size of some aspect of the system.
Several aspects are considered, including the
number of distinct tiles (tile complexity), number
of edges of the mix graph (mix graph complexity),
width of the mix graph (bin complexity), height
of the mix graph (stage complexity), and temperature
of the system. The computational complexity
of finding an optimal system for an input shape
under some measure of system complexity is also
considered. In some cases, the desired polyomino
shape also has each cell labeled, and the goal is
to construct a given labeled shape using labeled
tiles.


Staged Assembly, Fig. 1 A staged self-assembly system.
Each bin (blue box) contains the product assemblies
of the bin. The reagent assemblies of a bin are the products
of other bins (incoming arrows)


Problem 1 (Smallest Staged Self-Assembly System)

INPUT: A labeled polyomino P.

OUTPUT: A staged self-assembly system con- 
taining a bin with a single product assembly 
with labeled shape P that is minimum in some 
measure. 


Key Results 


We describe the results by increasing generality 
of the shapes assembled. 


Lines 
In the most constrained case, the input polyomino
is an unlabeled 1 × n polyomino (a line). Lines can
be assembled by τ = 1 systems using O(1) tile
types, O(1) bins, and O(log n) stages and mix
graph edges. The idea, due to Demaine et al. [3],
is to repeatedly double the length of a line
assembly.
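
The doubling idea can be illustrated with a minimal sketch that tracks only the lengths of the line assemblies. It deliberately ignores glues, bins, and the details needed to guarantee a unique product in each mixing, so it is not the construction of [3]; the function name doubling_plan and the step format are assumptions made for this illustration.

```python
def doubling_plan(n):
    """Return mixing steps that reach a line of length n.

    Each step is a tuple (a, b, a + b): two existing line lengths are
    mixed and the product has length a + b. Only lengths are modeled.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = []
    power = 1
    powers = [1]  # lengths 1, 2, 4, ... obtained by repeated doubling
    while power * 2 <= n:
        steps.append((power, power, power * 2))
        power *= 2
        powers.append(power)
    # Join the powers of two named by the binary expansion of n.
    total = 0
    for p in reversed(powers):
        if n & p:
            if total:
                steps.append((total, p, total + p))
            total += p
    return steps
```

The number of steps returned is at most 2 log2(n), which matches the O(log n) bound on stages and mix graph edges stated above, although the sketch says nothing about how few bins or tile types suffice.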

Demaine, Eisenstat, Ishaque, and Winslow [4]
prove that the case of labeled lines is roughly
equivalent to the problem of finding the smallest
context-free grammar that derives a single string
consisting of the labels of the line read from left
to right. In particular, any context-free grammar
G with |G| rules and deriving a single string σ
can be converted into a τ = 1 staged self-assembly
system S assembling a line with left-to-right
label string σ, where the number of edges in
the mix graph of S is O(|G|). The problem of
finding the smallest staged self-assembly system
where the input polyomino is a labeled line and
the system has an upper limit on the number of
glue types appearing on the tiles was proven to
be NP-hard [4].


Squares 
Demaine et al. [3] prove that unlabeled n × n
squares can be assembled by τ = 1 staged systems
containing O(1) tile types and bins, O(log n)
stages, and thus a mix graph with O(log n)
edges. The system uses an idea similar to
that for assembling lines but in two steps: first
assemble n × 1 columns and then combine them
to form n × 2, then n × 4, etc., rectangles. This
construction uses a jigsaw technique to ensure
attaching rectangles cannot assemble askew.

Demaine et al. also prove that unlabeled
n × n squares can be assembled using τ = 2
staged systems using O(1) tile types, O(√(log n))
bins, and O(log log n) stages. The approach is
to simulate known τ = 2 single-bin systems
that efficiently assemble squares by constructing
macrotiles: large assemblies that simulate the
behavior of distinct tile types by encoding
glue types in geometry on their surfaces. Such
macrotiles allow staged systems to trade off
tile types for stages by replacing many distinct
tile types with an initial sequence of stages that
assemble macrotile versions of the tiles.


General Shapes 

For general shapes, several different results high- 
light the trade-offs in complexity enabled by 
staged assembly. Demaine et al. [3] prove that any 
unlabeled shape can be assembled by a τ = 1
staged system using O(1) tile types, O(log n)
bins, and a number of stages proportional to the
diameter of the dual grid graph of the shape. If
the shape is monotone, then a similar system with
O(n) bins and O(log n) stages (increased bins
but decreased stages) exists. 

If the system is permitted to assemble a scaled 
version of the input shape, then the system of 
Soloveichik and Winfree [6] can be simulated 
with macrotiles, resulting in a τ = 2 staged system
with O(1) tile types, O(K / log K) bins, and
O(log log K) stages, where K is the Kolmogorov
complexity of the shape. For labeled shapes, 
Winslow [8] proves that any polyomino context- 
free grammar G (a generalization of context-free 
grammars to two dimensions) with |G| rules de- 
riving a single labeled polyomino P can be con- 
verted into a staged system S assembling a scaled 
version of P consisting of labeled macrotiles 
where the number of edges in the mix graph of
S is O(|G|).


Applications 


The theory of algorithmic self-assembly is rooted 
in the design of nanoscale particle systems, par- 
ticularly DNA-based systems. For staged self- 
assembly in particular, the capability of assem- 
bling complex shapes using only O(1) tile types 
is highly desirable in practice, as engineering 
many tile types with desired glues is often far 
more challenging than carrying out a sequence of 
mixings. 


Open Problems 


The complexity of the smallest staged self- 
assembly problem where the number of glue 
types used is unconstrained remains open, both 
for the case of lines and general shapes. For
lines, the problem is known to be in NP and,
when the number of glues is constrained, is
NP-complete (both proved in [4]). For general shapes,
the problem is only known to be in PSPACE 
(proved in [9]) and NP-hard when the number of 
glues is constrained, following from the special 
case of lines. The complexity of verifying that a 
staged assembly system produces a given shape 
also remains open and is only known to lie in 
PSPACE. 
Problem Definition


The three main types of mutations modifying 
biological sequences are insertions, deletions, 
and substitutions. The simplest model involving 
these three types of mutations is the so-called 
Thorne-Kishino-Felsenstein model [16]. In this 
model, the characters of a sequence evolve 
independently. Each character in the sequence 
can be substituted with another character 
according to a prescribed reversible time- 
continuous Markov model on the possible 
characters. Insertion-deletions are modeled as 
a birth-death process. Insertions can happen at 
the beginning of the sequence, at the end of the 
sequence, and between any two characters. It 
is possible to insert a character into the empty 
sequence. The time span between two insertions
is exponentially distributed with parameter λ,
and this parameter does not depend on the
context of the position. The newborn character
is drawn from the equilibrium distribution of the
substitution process. Each character is deleted
after an exponentially distributed waiting time
with parameter μ, and its two positions where
insertions can happen are joined.

The multiple statistical alignment problem is
to calculate the likelihood of a set of sequences,
namely, the probability of observing a set
of sequences, given all the necessary parameters
that describe the evolution of sequences. Hein,
Jensen, and Pedersen were the first to give an
algorithm to calculate this probability [5]. Their
algorithm has O(5^n L^n) running time, where n is
the number of sequences, and L is the geometric
mean of the sequence lengths. The running time has
been improved to O(2^n L^n) by Lunter et al. [9].


Notations 


Substitutions 

A time-continuous Markov model for a substitution
process on an alphabet Σ is given by a k × k
rate matrix Q, with constraints

\[ q_{i,j} \ge 0 \quad \forall i \ne j \tag{1} \]

\[ \sum_{j} q_{i,j} = 0 \quad \forall i \tag{2} \]

where k is the size of the alphabet. The probability
that a character a_i will be character a_j after
time t can be calculated with the exponentiation
of the rate matrix:

\[ P_t(a_j \mid a_i) = p_{i,j}, \quad \text{where} \tag{3} \]
\[ P = e^{Qt} \tag{4} \]

The exponentiated matrix can be easily calculated
if the rate matrix is diagonalized, namely, if Q =
W Λ W^{-1}, where Λ is a diagonal matrix, then

\[ e^{Qt} = W e^{\Lambda t} W^{-1} \tag{5} \]

e^{Λt} can be easily calculated, since it is a diagonal
matrix containing e^{λ_i t} in the ith position of the
diagonal, where λ_i is the ith diagonal entry of Λ.
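
As a small worked instance of the diagonalization route, the following sketch computes the transition matrix of a two-letter alphabet in closed form. The two-state model, the rate names a and b, and the function name are assumptions chosen for illustration; they do not come from the text above.

```python
import math

def two_state_transition_matrix(a, b, t):
    """P = exp(Qt) for Q = [[-a, a], [b, -b]], using Q = W L W^{-1}.

    The eigenvalues of Q are 0 and -(a + b); the columns of W are the
    corresponding eigenvectors (1, 1) and (a, -b).
    """
    s = a + b
    W = [[1.0, a], [1.0, -b]]
    W_inv = [[b / s, a / s], [1.0 / s, -1.0 / s]]
    exp_diag = [1.0, math.exp(-s * t)]  # diagonal of exp(L t)
    # P[i][j] = sum_k W[i][k] * exp_diag[k] * W_inv[k][j]
    return [[sum(W[i][k] * exp_diag[k] * W_inv[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]
```

Each row of the result sums to 1, the matrix is the identity at t = 0, and for large t every row approaches the equilibrium distribution (b/(a+b), a/(a+b)).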


Insertions and Deletions 

A Galton-Watson tree is a rooted, edge-weighted
binary tree that describes a birth-death process
for a time span t. The process starts at the root
of the tree, and a split represents a birth. Edge
weights represent times, and leaves having a
distance from the root smaller than t represent
death events. Leaves at distance t from the
root are the individuals that live at time point t.

Insertion-deletion events transforming one
sequence into another can be described
with Galton-Watson forests: births represent
insertions, and deaths represent deletions. Each
character of the ancestral sequence has a tree, and
there is an additional tree at the beginning of the
sequence associated to an imaginary character.
This imaginary character cannot die. Roots of the
trees are the characters of the ancestral sequence,
and each character of the descendant sequence
is a leaf of one of the trees, at distance t
from the root. There might be additional leaves
that are not associated with characters of the
descendant sequence; these are the extinct
lineages. The forest is aligned such that edges
do not cross each other while the characters of 
the two sequences keep their original order. Each 
Galton-Watson forest indicates an alignment of 
the two sequences; see Fig. 1. Given a birth and 
death process, the probability density of a Galton- 
Watson tree can be calculated easily. Assuming 
independence, the probability of a Galton-Watson 
forest is the product of the probabilities of its 
trees. The probability of an alignment is the 
integral of the probabilities of the forests that 
represent it. Due to independence, it is enough 
to tell the probability of alignment patterns that 
might arise as an image of a Galton-Watson tree 
(see Fig. 1b); the probability of an alignment is 
the product of the probabilities of its patterns. 

In the Thorne-Kishino-Felsenstein model 
(TKF91 model) [16], both the birth and the death 
processes are Poisson processes with parameters
λ and μ, respectively. The probabilities of the
possible patterns can be found in Fig. 2.


Evolutionary Trees 

An evolutionary tree is a leaf-labeled, edge-weighted,
rooted binary tree. Labels are the
species related by the evolutionary tree, and
weights are evolutionary distances. It might
happen that the evolutionary changes had
different speeds in different lineages, and hence
the tree is not necessarily ultrametric, namely,
the root does not necessarily have the same
distance to all leaves. The nodes of an evolutionary
tree can be partially ordered such that two nodes are
comparable if there is a path from the root to
any of the leaves containing the two nodes in
question, and in this case the smaller node is the
one that is closer to the root on the path. Each
node v of an evolutionary tree indicates a subtree
that contains v and all the nodes that are greater
than v. Hereafter we consider only these subtrees.

Statistical Multiple Alignment, Fig. 1 (a) A Galton-Watson
forest representing insertion-deletion events. The
first tree starts with an immortal element that is
responsible for the insertions at the beginning of the sequence.
(b) The alignment indicated by the Galton-Watson forest
above. Each tree makes a pattern of the alignment; patterns
are separated with dashed lines

Let a set S of l-long sequences over alphabet Σ,
a substitution model M on Σ, and an
evolutionary tree T labeled by the sequences be given.
The likelihood of the tree is the probability of
observing the sequences at the leaves of the tree,
given that the substitution process starts at the
root of the tree with the equilibrium distribution.
This likelihood is denoted by P(S|T, M). The
substitution likelihood problem is to calculate the
likelihood of the tree.

Let Σ be a finite alphabet and let S_1 =
s_{1,1}s_{1,2}...s_{1,L_1}, S_2 = s_{2,1}s_{2,2}...s_{2,L_2}, ...,
S_n = s_{n,1}s_{n,2}...s_{n,L_n} be sequences over
this alphabet. Let a TKF91 model be
given with its parameters: substitution model
M, insertion rate λ, and deletion rate μ. Let
T be an evolutionary tree labeled by S_1,
S_2, ..., S_n. The multiple statistical alignment
problem is to calculate the likelihood of the
tree, P(S_1, S_2, ..., S_n | T, TKF91), given that
the TKF91 process starts at the root with the
equilibrium distribution.

Statistical Multiple Alignment, Fig. 2 The probabilities
of alignment patterns. From left to right: k insertions
at the beginning of the alignment, a match followed by
k − 1 insertions, a deletion followed by k insertions, and
a deletion not followed by insertions.
β = (1 − e^{(λ−μ)t}) / (μ − λ e^{(λ−μ)t})


Multiple Hidden Markov Models

It will turn out that the TKF91 model can be
transformed to a multiple Hidden Markov Model;
therefore we formally define it here. A multiple
Hidden Markov Model (multiple HMM) is a
directed graph with distinguished start and end
states (the in-degree of the start state and the
out-degree of the end state are both 0), together with
the transition and emission distributions described
below. Each vertex has a transition distribution
over its out-edges. The vertices can be
divided into two classes, the emitting and the silent
states. Each emitting state emits one random
character to each sequence in a prescribed set of
sequences; it is possible that a state emits only one
character to one sequence. For each state, an emission
distribution over the alphabet and the set of
sequences gives the probabilities of which characters
will be emitted to which sequences. The Markov
process is a random walk from the start to the
end state, following the transition distribution on the
out-edges. When the walk is in an emitting
state, characters are emitted according to the
emission distribution of the state. The process is
hidden since the observer sees only the emitted
sequences; the observer does not see
which character is emitted by which state, nor
even which characters are
co-emitted. The multiple HMM problem is to
calculate the emission probability of a set of
sequences for a multiple HMM. This probability
can be calculated with the forward algorithm,
which has O(V L^n) running time, where V is the
number of emitting states in the multiple HMM,
L is the geometric mean of the sequence lengths, and n
is the number of sequences [3].


Key Results 


Substitutions have been modeled with time- 
continuous Markov models since the late 1960s 
[8], and an efficient algorithm for likelihood 
calculations was published in 1980 [4]. The 
running time of this efficient algorithm grows 
linearly both with the number of sequences 
and with the length of the sequences being 
analyzed, and it grows quadratically with the size of
the alphabet. The algorithm belongs to the class 
of dynamic programming algorithms. For each 
character, subtree, and position x, the algorithm 
calculates what would be the likelihood of 
the characters in position x in the sequences 
belonging to the subtree if the substitution 
process started in the root of the subtree with 
the given character. These probabilities are called 
conditional likelihoods. It is easy to show that 


\[ L_p(a, x) = \Bigl( \sum_{a_1} P_{t_1}(a_1 \mid a)\, L_{d_1}(a_1, x) \Bigr) \times \Bigl( \sum_{a_2} P_{t_2}(a_2 \mid a)\, L_{d_2}(a_2, x) \Bigr) \tag{6} \]


where d_1 and d_2 are the descendant nodes of the
parent node p and t_1 and t_2 are the lengths of the
edges connecting p with d_1 and d_2, respectively.
The likelihood of the tree can be calculated from
the conditional likelihoods of the tree. Recall that
P(S|T, M) is the likelihood of observing a set of
sequences S on the leaves of an evolutionary tree
T under the substitution model M:


\[ P(S \mid T, M) = \prod_{x=1}^{l} \sum_{a} \pi(a)\, L_{\mathrm{root}}(a, x) \tag{7} \]

where π is the equilibrium distribution of the
substitution process and l is the length of the sequences.
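
The recursion of Eq. (6) and the product of Eq. (7) can be sketched in a few lines. The sketch below assumes, purely for illustration, a DNA alphabet, the Jukes-Cantor substitution model with uniform equilibrium frequencies, and a nested-tuple tree encoding; none of these choices is prescribed by the text, and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def jc69(t):
    """Jukes-Cantor transition probabilities P[a][b] = P_t(b | a)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {a: {b: (same if a == b else diff) for b in ALPHABET}
            for a in ALPHABET}

def conditional_likelihoods(node, x):
    """Return {a: L_node(a, x)} following Eq. (6).

    A leaf is a string (its sequence); an internal node is a tuple
    (left_child, t_left, right_child, t_right).
    """
    if isinstance(node, str):
        return {a: 1.0 if node[x] == a else 0.0 for a in ALPHABET}
    left, t1, right, t2 = node
    L1 = conditional_likelihoods(left, x)
    L2 = conditional_likelihoods(right, x)
    P1, P2 = jc69(t1), jc69(t2)
    return {a: sum(P1[a][b] * L1[b] for b in ALPHABET)
               * sum(P2[a][b] * L2[b] for b in ALPHABET)
            for a in ALPHABET}

def tree_likelihood(tree, length):
    """P(S | T, M) as in Eq. (7), with equilibrium frequencies 1/4."""
    result = 1.0
    for x in range(length):
        L = conditional_likelihoods(tree, x)
        result *= sum(0.25 * L[a] for a in ALPHABET)
    return result
```

For example, tree_likelihood((("AC", 0.1, "AC", 0.2), 0.05, "AG", 0.3), 2) evaluates a three-leaf tree on sequences of length 2. A production implementation would cache the transition matrices per edge rather than recomputing them for every position.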


Thorne, Kishino, and Felsenstein gave an
O(nm) running time algorithm for calculating
the likelihood of an n-long and an m-long
sequence under their model [16]. It was not
clear for a long time how to extend this algorithm
to more than two sequences. In 2001, several
researchers [7, 12] realized that the TKF91 model
for two sequences is equivalent to a pair
Hidden Markov Model (pair HMM) in the sense
that the transition and emission probabilities of
the pair HMM can be parameterized with λ, μ,
and the transition and equilibrium probabilities
of the substitution model; moreover, there is
a bijection between the paths emitting the
two sequences and alignments such that the
probability of a path in the pair HMM equals
the probability of the corresponding alignment
of the two sequences. Hence the likelihood of
two sequences can be calculated with the forward
algorithm of the pair HMM.

After this discovery, it was relatively easy
to develop an algorithm for multiple statistical
alignment [5]. The key observation is that a multiple
HMM can be created as a composition of pair
HMMs along the evolutionary tree. This technique
was already known in the speech recognition
literature [14] and was also rediscovered
by Ian Holmes [6], who named it
transducer composition. The number of states in
the so-created multiple HMM is O(5^{n/2}), where n
is the number of leaves of the tree. The emission
probabilities are the substitution likelihoods on
the tree, which can be efficiently calculated as
shown above. The running time of the forward
algorithm is O(5^n L^n), where L is the geometric
mean of the sequence lengths.

Lunter et al. [9] introduced an algorithm that
does not need a multiple HMM description of
the TKF91 model to calculate the likelihood of
a tree. Using a logical sieve algorithm, they were
able to reduce the running time to O(2^n L^n). They
called their algorithm the “one-state recursion”
since their dynamic programming algorithm does
not need the different states of a multiple HMM to
calculate the likelihood correctly.


Applications 


Since the running time of the best known al- 
gorithm for multiple statistical alignment grows 
exponentially with the number of sequences, on 
its own it is not useful in practice. However, 
Lunter et al. also showed that there is a one- 
state recursion to calculate the likelihood of the 
tree given an alignment [10]. The running time of 
this algorithm grows only linearly with both the 
alignment length and the number of sequences. 
Since the number of states in a multiple HMM 
that can emit the same multiple alignment column 
might grow exponentially, this version of the 
one-state recursion is a significant improvement. 
The one-state recursion for multiple alignments 
is used in a Bayesian Markov chain Monte Carlo
method where the state space is the Cartesian product of
the possible multiple alignments and evolutionary
trees. The one-state recursion provides an effi- 
cient likelihood calculation for a point in the state 
space [11]. 

Csűrös and Miklós introduced a model for
gene content evolution that is equivalent to the
multiple statistical alignment problem for alphabet
size 1 [2]. They gave a polynomial running
time algorithm that calculates the likelihood of
the tree. The running time is O(n + hL^2), where
n is the number of sequences, h is the height of
the evolutionary tree, and L is the sum of the
sequence lengths.

Thorne, Kishino, and Felsenstein also introduced
a fragment model, also called the TKF92
model, in which multiple insertions and deletions
are allowed [17]. The birth process is still a
Poisson process, but instead of single characters,
fragments of characters are inserted with a
geometrically distributed length. The fragments
are unbreakable, and the death process acts
on the fragments. The TKF92 model for a pair
of sequences can also be described as a pair
HMM, and the TKF92 model on a tree can be
transformed to a multiple HMM. Such a multiple
HMM is used in the StatAlign software package
[13]. The software package has been extended
to predict the common structure of sequences
(e.g., slowly or quickly evolving regions, RNA
secondary structures) by combining this multiple
HMM with other stochastic models describing
the structure of sequences [1, 15].


Open Problems 


It is conjectured that the multiple statistical align- 
ment problem cannot be solved in polynomial 
time for any nontrivial alphabet size. One also 
can ask what the most likely multiple alignment 
is or, equivalently, what the most probable path 
in the multiple HMM is that emits the given 
sequences. For a set of sequences, a TKF91 
model, and an evolutionary tree, the decision 
problem “Is there a multiple alignment that is 
more probable than p” is conjectured to be NP- 
complete. 

It is conjectured that there is no one-state 
recursion for the TKF92 model. 
Problem Definition


The problem deals with learning to classify 
from random labeled examples in Valiant’s PAC 
model [30]. In the random classification noise 
model of Angluin and Laird [1], the label of each 
example given to the learning algorithm is flipped 
randomly and independently with some fixed 
probability η called the noise rate. Robustness
to such a benign form of noise is an important
goal in the design of learning algorithms. Kearns 
defined a powerful and convenient framework for 
constructing noise-tolerant algorithms based on 
statistical queries. Statistical query (SQ) learning 
is a natural restriction of PAC learning that 
models algorithms that use statistical properties 
of a data set, rather than individual examples. 
Kearns demonstrated that any learning algorithm 
that is based on statistical queries can be 
automatically converted to a learning algorithm 
in the presence of random classification noise 
of arbitrary rate smaller than the information- 
theoretic barrier of 1/2. This result was used 
to give the first noise-tolerant algorithm for a 
number of important learning problems. In fact, 
virtually all known noise-tolerant PAC algorithms 
were either obtained from SQ algorithms or can 
be easily cast into the SQ model. 

In subsequent work, the model of Kearns has 
been extended to other settings and found a num- 
ber of additional applications in machine learning 
and theoretical computer science. 


Definitions and Notation 

Let C be a class of {−1,+1}-valued functions
(also called concepts) over an input space X.
In the basic PAC model, a learning algorithm
is given examples of an unknown function f
from C on points randomly chosen from some
unknown distribution D over X and should produce
a hypothesis h that approximates f. More
formally, an example oracle EX(f, D) is an oracle
that upon being invoked returns an example
(x, f(x)), where x is chosen randomly with respect
to D, independently of any previous examples.
A learning algorithm for C is an algorithm
that for every ε > 0, δ > 0, f ∈ C, and
distribution D over X, given ε, δ, and access
to EX(f, D) outputs, with probability at least
1 − δ, a hypothesis h that ε-approximates f with
respect to D (i.e., Pr_D[f(x) ≠ h(x)] ≤ ε).
Efficient learning algorithms are algorithms that
run in time polynomial in 1/ε, 1/δ, and the size
s of the learning problem. The size of a learning
problem is determined by the description length
of f under some fixed representation scheme for
functions in C and the description length of an
element in X (often proportional to the dimension
n of the input space).

A number of variants of this basic framework 
are commonly considered. The basic PAC model 
is also referred to as distribution-independent 
learning to distinguish it from distribution- 
specific PAC learning in which the learning 
algorithm is required to learn with respect to 
a single distribution D known in advance. A 
weak learning algorithm is a learning algorithm 
that can produce a hypothesis whose error on the 
target concept is noticeably less than 1/2 (and 
not necessarily any ε > 0). More precisely, a
weak learning algorithm produces a hypothesis
h such that Pr_D[f(x) ≠ h(x)] ≤ 1/2 − 1/p(s)
for some fixed polynomial p. The basic PAC 
model is often referred to as strong learning in 
this context. 

In the random classification noise model
EX(f, D) is replaced by a faulty oracle
EX^η(f, D), where η is the noise rate. When
queried, this oracle returns a noisy example
(x, b) where b = f(x) with probability 1 − η
and b = −f(x) with probability η independently of
previous examples. When η approaches 1/2 the
label of the corrupted example approaches the
result of a random coin flip, and therefore, the
running time of learning algorithms in this
model is allowed to depend on 1/(1 − 2η) (the
dependence must be polynomial for the algorithm
to be considered efficient). For simplicity, one
usually assumes that η is known to the learning
algorithm. This assumption can be removed using
a simple technique due to Laird [26].
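
The noisy oracle is easy to model in code. The following is a minimal sketch, assuming the target function f and a sampler draw_x for the distribution D are supplied by the caller; the names make_noisy_oracle, draw_x, and rng are hypothetical and introduced only for this illustration.

```python
import random

def make_noisy_oracle(f, draw_x, eta, rng=random):
    """Return a function playing the role of EX^eta(f, D).

    f maps a point to a label in {-1, +1}; draw_x() samples a point
    from D; each label is flipped independently with probability eta.
    """
    def oracle():
        x = draw_x()
        label = f(x)
        if rng.random() < eta:
            label = -label
        return x, label
    return oracle
```

With eta = 0 the oracle coincides with the noise-free oracle EX(f, D).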

To formalize the idea of learning from statistical
properties of a large number of examples,
Kearns introduced a new oracle STAT(f, D) that
replaces EX(f, D). The oracle STAT(f, D) takes
as input a statistical query (SQ) of the form
(χ, τ), where χ is a {−1,+1}-valued function on
labeled examples and τ ∈ [0, 1] is the tolerance
parameter. Given such a query, the oracle responds
with an estimate v of Pr_D[χ(x, f(x)) = 1]
that is accurate to within an additive ±τ.

Note that the oracle does not guarantee
anything else on the value v beyond
|v − Pr_D[χ(x, f(x)) = 1]| ≤ τ, and an SQ learning
algorithm needs to work with any possible
implementation of the oracle. Yang proposed
a stronger, honest version of the oracle which,
on a call with function χ, returns the value of
χ(x, f(x)), where x is chosen randomly and
independently according to D [32]. This version
was shown to be equivalent to the original model
up to polynomial factors [17].

Chernoff bounds easily imply that STAT(f, D)
can, with high probability, be simulated using
EX(f, D) by estimating Pr_D[χ(x, f(x)) = 1]
on O(τ^{-2}) examples. Therefore, the SQ model
is a restriction of the PAC model. Efficient SQ
algorithms allow only efficiently evaluatable χ's
and impose an inverse polynomial lower bound
on the tolerance parameter over all oracle calls.
Kearns also observes that in order to simulate
all the statistical queries used by an algorithm,
one does not necessarily need new examples
for each estimation. Instead, assuming that
the set of possible queries of the algorithm
has Vapnik-Chervonenkis dimension d, all
its statistical queries can be simulated using
O(d τ^{-2} (1 − 2η)^{-2} log(1/δ)) examples [24].


Key Results 


Statistical Queries and Noise-Tolerance 
The main result given by Kearns is a way to 
simulate statistical queries using noisy examples. 


Lemma 1 ([24]) Let (χ, τ) be a statistical query
such that χ can be evaluated on any input in time
T and let EX^η(f, D) be a noisy oracle. The value
Pr_D[χ(x, f(x)) = 1] can, with probability at
least 1 − δ, be estimated within τ using
O(τ^{-2} (1 − 2η)^{-2} log(1/δ)) examples from EX^η(f, D) and
time O(τ^{-2} (1 − 2η)^{-2} log(1/δ) · T).


Statistical Query Learning 


This simulation is based on estimating several
probabilities using examples from the noisy oracle
and then offsetting the effect of noise. The
lemma implies that any efficient SQ algorithm for
a concept class C can be converted to an efficient
learning algorithm for C tolerating random
classification noise of any rate η < 1/2.


Theorem 1 ([24]) Let C be a concept class efficiently
PAC learnable from statistical queries.
Then C is efficiently PAC learnable in the presence
of random classification noise of rate η for
any η < 1/2.


Balcan and Feldman describe more general 
conditions on noise under which a specific SQ 
algorithm can be simulated in the presence of 
noise [3]. 


Statistical Query Algorithms 

Kearns showed that, despite the major restriction 
on the way an SQ algorithm accesses the exam- 
ples, many PAC learning algorithms known at the 
time can be modified to use statistical queries 
instead of random examples [24]. Examples of 
learning algorithms for which he described an SQ 
analogue and thereby obtained a noise-tolerant 
learning algorithm include: 


• Learning decision trees of constant rank.

• Attribute-efficient algorithms for learning conjunctions.

• Learning axis-aligned rectangles over R^n.

• Learning AC^0 (constant-depth unbounded
fan-in) Boolean circuits over {0,1}^n with
respect to the uniform distribution in
quasipolynomial time.


Subsequent works have provided numerous 
additional examples of algorithms used in theory 
and practice of machine learning that can either 
be implemented using statistical queries or can 
be replaced by an alternative SQ-based algorithm 
of similar complexity. Examples include the Perceptron
algorithm and learning of linear threshold
functions [6, 12], boosting [2], attribute-efficient
learning via the Winnow algorithm (cf. [16]),
k-means clustering [5], and convex optimization-based
methods [20]. We note that many learning
algorithms rely only on evaluations of functions 
on random examples and therefore can be seen as 
using access to the honest statistical query oracle. 
In such cases the SQ implementation follows 
immediately from the equivalence of Kearns'
SQ oracle and the honest one [17].

The only known example of a technique for 
which there is no SQ analogue is Gaussian elim- 
ination for solving linear equations over a finite 
field. This technique can be used to learn parity 
functions that are not learnable using SQs (as we 
discuss below). As a result, with the exception of 
the parity learning problem, known bounds on the 
complexity of learning from random examples 
are, up to polynomial factors, the same as known 
bounds for learning with statistical queries.


Statistical Query Dimension 

The restricted way in which SQ algorithms use 
examples makes it simpler to understand the 
limitations of efficient learning in this model. A 
long-standing open problem in learning theory 
is learning of the concept class of all parity 
functions over {0,1}^n with noise (a parity function
is an XOR of some subset of n Boolean inputs).
Kearns has demonstrated that parities cannot
be efficiently learned using statistical queries
even under the uniform distribution over {0,1}^n
[24]. This hardness result is unconditional in 
the sense that it does not rely on any unproven 
complexity assumptions. 

The technique of Kearns was generalized by 
Blum et al. who proved that efficient SQ learn- 
ability of a concept class C is characterized by 
a relatively simple combinatorial parameter of C 
called the statistical query dimension [7]. The 
quantity they defined measures the maximum
number of “nearly uncorrelated” functions in a 
concept class. (The definition and the results were 
simplified and strengthened in subsequent works 
[17, 29] and we use the improved statements 
here.) More formally, 


Definition 1 For a concept class C and distri-
bution D, the statistical query dimension of C
with respect to D, denoted SQ-DIM(C, D), is
the largest number d such that C contains d
functions $f_1, f_2, \ldots, f_d$ such that for all $i \neq j$,
$|\mathbf{E}_D[f_i f_j]| \le 1/d$.


Blum et al. relate the SQ dimension to learning
in the SQ model as follows.


Theorem 2 ([7, 17]) Let C be a concept class
and D be a distribution such that
SQ-DIM(C, D) = d.


• If all queries are made with tolerance of at
least $1/d^{1/3}$, then at least $d^{1/3} - 2$ queries are
required to learn C with error $1/2 - 1/(2d^3)$
in the SQ model.

• There exists an algorithm for learning C with
respect to D that makes d fixed queries, each
of tolerance $1/(4d)$, and finds a hypothesis
with error at most $1/2 - 1/(2d)$.


Thus SQ-DIM characterizes weak SQ learn- 
ability relative to a fixed distribution D up to a 
polynomial factor. Parity functions are uncorre- 
lated with respect to the uniform distribution and 
therefore, any concept class that contains a super- 
polynomial number of parity functions cannot be 
learned by statistical queries with respect to the 
uniform distribution. This, for example, includes
such important concept classes as k-juntas over
$\{0,1\}^n$ (functions that depend on at most k
input variables) for $k = \omega(1)$ and decision trees
of superconstant size.
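
The claim that parity functions are pairwise uncorrelated under the uniform distribution can be checked by brute force for small n. The sketch below is only an illustration: the function names, the bitmask encoding of subsets, and the choice n = 4 are assumptions made for this example, and parities are taken as {-1, +1}-valued functions.

```python
from itertools import product


def parity(subset_mask, x_bits):
    """Parity of the bits of x selected by subset_mask, as a value in {-1, +1}."""
    ones = sum(b for i, b in enumerate(x_bits) if (subset_mask >> i) & 1)
    return -1 if ones % 2 else 1


def correlation(mask_a, mask_b, n):
    """E[chi_a(x) * chi_b(x)] for x uniform over {0,1}^n (exact enumeration)."""
    total = sum(parity(mask_a, x) * parity(mask_b, x)
                for x in product((0, 1), repeat=n))
    return total / 2 ** n


n = 4
masks = range(1, 2 ** n)  # all nonempty subsets of the n inputs
max_abs_corr = max(abs(correlation(a, b, n))
                   for a in masks for b in masks if a != b)
```

Since every pair of distinct parities has correlation 0, the 2^n - 1 nonempty parities satisfy the condition of Definition 1 trivially, which is why the SQ dimension of parities grows exponentially in n.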

Simon showed that (strong) PAC learning rel- 
ative to a fixed distribution D using SQs can also 
be characterized by a more general and involved 
dimension [28]. Simpler and tighter characteriza- 
tions of distribution-specific PAC learning using 
SQs have been demonstrated by Feldman [15] 
and Szörényi [29]. Feldman also extended
the characterization to the agnostic learning 
model. 

Despite characterizing the number of queries 
of certain tolerance, the SQ-DIM and its gen- 
eralizations capture surprisingly well the com- 
putational complexity of SQ learning of most 
concept classes. One reason for this is that if a 
concept class has polynomial SQ-DIM then it 
can be learned by a polynomial-time algorithm
with advice, also referred to as a “non-uniform”
algorithm (cf. [18]). However, it was shown by
Feldman and Kanade that for strong PAC learning 
there exist artificial problems whose computa- 
tional complexity is larger than their statistical 
query complexity [18]. 

Applications of these characterizations to 
proving lower bounds on SQ algorithms can 
be found in [11, 15, 19, 25]. Relationships of SQ-
DIM to other notions of complexity of concept
classes were investigated in [22, 27].


Applications 


The ideas behind the use of statistical queries to 
produce noise-tolerant algorithms were adapted 
to learning using membership queries (or ability 
to ask for the value of the unknown function 
at any point) and used to give a noise-tolerant 
algorithm for learning DNF with respect to the 
uniform distribution [9, 21]. The SQ model of 
learning was generalized to active learning (or 
learning where labels are requested only for some 
of the points) and used to obtain new efficient 
noise-tolerant active learning algorithms [3]. 

The restricted way in which an SQ algorithm 
uses data implies it can be used to obtain learn- 
ing algorithms with additional useful properties. 
Blum et al. [5] show that an SQ algorithm can 
be used to obtain a differentially-private [13] 
algorithm for the problem. In fact, SQ algo- 
rithms are equivalent to local (or randomized- 
response) differentially-private algorithms [23]. 
Chu et al. [10] show that SQ algorithms can 
be automatically parallelized on multicore ar- 
chitectures and give many examples of popular 
machine learning algorithms that can be sped up 
using this approach. 

The SQ learning model has also been instru- 
mental in understanding Valiant’s model of evo- 
lution as learning [31]. Feldman showed that the 
model is equivalent to learning with a restricted 
form of SQs referred to as correlational SQs 
[14]. A correlational SQ is a query of the form
$\phi(x, \ell) = g(x) \cdot \ell$ for some $g : X \to [-1, 1]$.
Such queries were first studied by Ben-David 
et al. [4] (remarkably, before the introduction 
of the SQ model itself) and distribution-specific
learning with such queries is equivalent to learn-
ing with (unrestricted) SQs. 

Statistical query-based access can naturally be 
defined for any problem where the input is a set of 
i.i.d. samples from a distribution. Feldman et al. 
show that lower bounds based on SQ-DIM can 
be extended to this more general setting and give 
examples of applications [17,20]. 


Open Problems 


The main questions related to learning with ran- 
dom classification noise are still open. Is every 
concept class efficiently learnable in the PAC 
model also learnable in the presence of random 
classification noise? Is every concept class effi- 
ciently learnable in the presence of random clas- 
sification noise of arbitrarily high rate (less than 
1/2) also efficiently learnable using statistical 
queries? A partial answer to this question was 
provided by Blum et al. who show that Gaussian 
elimination can be used in low dimension to ob- 
tain a class learnable with random classification 
noise of constant rate η < 1/2 but not learnable
using SQs [8]. For both questions a central issue 
seems to be obtaining a better understanding of 
the complexity of learning parities with noise. 

The complexity of learning from statistical 
queries remains an active area of research with 
many open problems. For example, there is 
currently an exponential gap between known 
lower and upper bounds on the complexity 
of distribution-independent SQ learning of
polynomial-size DNF formulae and $AC^0$ circuits
(cf. [27]). Several additional open problems 
on complexity of SQ learning can be found in 
[16, 19, 22]. 

Problem Definition


The timing behavior of integrated systems is 
strongly affected by the characteristics of tran- 
sistors and wires in the system. Variations in 
the manufacturing process can cause drifts in 
these characteristics from one manufactured part 
to another. The traditional approach to address- 
ing these variations was to choose a worst-case 
value for each process parameter, but this has 
become unsustainable in the face of current-day 
variations. Statistical timing analysis provides 
a computationally efficient way to translate the 
probability density function of the underlying 
process parameter spread to the distribution of 
circuit timing. 

A key underlying structure for timing analysis 
is a graph G(V, E) of a combinational circuit,
where the vertex set V corresponds to the gates,
primary inputs, and primary outputs of the cir-
cuit, and each connection between these gates
corresponds to an edge in E. The delay of each
gate corresponds to a probability distribution that 
is a function of the distributions of the under- 
lying (possibly correlated) process parameters, 
and the task of combinational statistical timing 
analysis is to obtain the distribution of the maxi- 
mum (or minimum) delay of the circuit, over all 
primary outputs. The extension of this problem 
to general edge-triggered sequential circuits is 
straightforward. Such circuits can be decomposed 
into independent combinational blocks, and the 
maximum (or minimum) operator acts on the 
delay distribution at all primary outputs of all 
combinational blocks of the sequential circuit. 


Key Results 


The framework that is used for statistical tim- 
ing analysis is based on graph-based topological 
traversals that maintain a closed-form structure 
for the delay from the primary inputs of the
circuit to the output of each vertex (referred to 
as the arrival time). The computation under this 
paradigm scales linearly with |E|. While it is 
certainly possible to perform statistical timing 
analysis through Monte Carlo simulations based 
on samples of the process parameter space, such 
an approach is uncompetitive compared to graph 
traversal algorithms. The traversal approach con- 
sists of three key steps [1,2]: 


¢ Translating the underlying process parameter 
variations to an orthogonal set of random 
variables 

¢ Representing gate delay variations in terms of 
this orthogonal set 

¢ Performing a topological traversal of G and 
computing the arrival time at each node and 
maximum delay of the circuit 


Orthogonalizing Process Parameter 
Distributions 

A common assumption is that the underlying 
process parameters, such as the transistor width 
W and effective length $L_{eff}$ of devices, gate oxide
thickness ($T_{ox}$), and device threshold voltage
($V_t$) due to random dopant fluctuations, show
a Gaussian distribution. Each individual device
is separately represented by such a parameter.
The distributions of $T_{ox}$ and $V_t$ are largely
uncorrelated across devices. In contrast, the
dimension-based parameters, W and $L_{eff}$, show
strong spatial correlations, whereby the distribu- 
tions of nearby devices are strongly correlated, 
and this correlation falls off as a function of 
distance. 

The existence of correlations can significantly 
complicate the task of statistical timing analysis, 
since all pairwise combinations of random vari- 
ables must be considered during the optimization, 
potentially leading to quadratic complexity in 
|V|. To overcome this, an initial principal compo- 
nent analysis (PCA) [7] step is carried out that or- 
thogonalizes the underlying Gaussians, enabling 
linear-time analysis. PCA is a one-time operation 
for a given process (which is used for numerous 
designs). Therefore, although its worst-case com-
plexity is cubic in |V|, the expense is practically
manageable as it is amortized over numerous 
designs. Furthermore, sparsity properties of the 
correlation matrix realistically imply that in prac- 
tice, the cost of PCA scales considerably slower 
than this cubic rate. 

For cases where the underlying process 
parameters may be a mix of Gaussians and
non-Gaussians, it is possible to orthogonalize
the Gaussian parameters using PCA and
non-Gaussian parameters using independent
component analysis (ICA) [4]. The approach in
[8] extends the graph-based approach presented
here and shows how statistical timing analysis
can be performed for cases where some or all
process parameters are non-Gaussian. 


Gate Delay Distribution 
To build a model for the gate delay that captures
the underlying variations in process parameters,
we observe that the delay function d = f(P),
where P is a set of process parameters, can be
approximated linearly using a first-order Taylor
expansion:

$$d = d_0 + \sum_{\forall \text{ parameters } p_i} \left[\frac{\partial f}{\partial p_i}\right]_0 \Delta p_i \qquad (1)$$

where $d_0$ is the nominal value of d, calculated
at the nominal values of the parameters in the set
P; $\left[\frac{\partial f}{\partial p_i}\right]_0$ is computed at the nominal values of
$p_i$; and $\Delta p_i = p_i - \mu_{p_i}$ is a normally distributed
random variable with $\Delta p_i \sim N(0, \sigma_{p_i})$. The
delay function here can be arbitrarily complex.

If all parameters in P can be modeled by Gaus-
sian distributions, this approximation implies that
d is a linear combination of Gaussians, which is
therefore Gaussian. Its mean $\mu_d$ and variance $\sigma_d^2$
are

$$\mu_d = d_0 \qquad (2)$$

$$\sigma_d^2 = \sum_{\forall i} \left[\frac{\partial f}{\partial p_i}\right]_0^2 \sigma_{p_i}^2 + 2 \sum_{\forall i < j} \left[\frac{\partial f}{\partial p_i}\right]_0 \left[\frac{\partial f}{\partial p_j}\right]_0 \mathrm{cov}(p_i, p_j) \qquad (3)$$

where $\mathrm{cov}(p_i, p_j)$ is the covariance of $p_i$ and
$p_j$, and each unordered pair i, j is counted once.

This approximation is valid when Ap; has 
relatively small variations, in which domain the 
first-order Taylor expansion is adequate and the 
approximation is acceptable with little loss of 
accuracy. This is generally true of the impact 
of within-die variations on delay, where the pro- 
cess parameter variations are relatively small in 
comparison with the nominal values, and the 
function changes by a small amount under this 
perturbation. Hence, the delays, as functions of 
the process parameters, can be approximated as 
normal distributions when the parameter varia-
tions are assumed to be normal. Higher-order
expansions based on quadratics have also been
explored to cover cases where the variations are
larger [6, 11].
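
As a worked illustration of the first-order model, the mean of the delay is its nominal value, and the variance is the sum of squared sensitivities times parameter variances plus twice the sensitivity-weighted covariances over unordered parameter pairs. The sketch below evaluates exactly that; the function name and the numbers are hypothetical and chosen only for the example.

```python
def delay_mean_and_variance(d0, sens, cov):
    """Mean and variance of d = d0 + sum_i sens[i] * delta_p_i.

    sens[i]   : partial derivative of the delay w.r.t. parameter i at nominal
    cov[i][j] : covariance of parameters i and j (cov[i][i] is the variance)
    """
    n = len(sens)
    # Diagonal (variance) terms.
    var = sum(sens[i] ** 2 * cov[i][i] for i in range(n))
    # Off-diagonal terms, each unordered pair counted once and doubled.
    var += 2 * sum(sens[i] * sens[j] * cov[i][j]
                   for i in range(n) for j in range(i + 1, n))
    return d0, var


# Hypothetical gate: nominal delay 100, two correlated parameters.
mean, var = delay_mean_and_variance(
    100.0, [2.0, -1.0], [[4.0, 1.0], [1.0, 9.0]])
```

Here the variance is 2²·4 + (-1)²·9 + 2·(2)(-1)(1) = 21. The double loop over pairs is the quadratic cost that orthogonalization is meant to remove.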


Circuit Delay Distribution 

A PCA-based approach maintains the invariant
that the output arrival time at each gate is a
Gaussian variable represented as

$$a_i(p_1, \ldots, p_n) = a_i^0 + \sum_{j=1}^{n} k_j p'_j + k_{n+1} p'_{n+1} \qquad (4)$$

Here, the primed variables correspond to the
principal components of the unprimed variables
and maintain the form of the arrival time after
each sum and max operation. Gate delays, as
represented in Eq. 1, can be translated into a
similar representation based on principal com-
ponents as a one-time step during gate library
characterization. Under orthogonalization, many
operations become much simpler since the co-
variance terms disappear: for example, Eq. 3 can
be evaluated in linear time instead of quadratic
time.
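
A minimal sketch of why the principal-component form is convenient is given below. It assumes, for illustration only, that the principal components are independent with unit variance and it omits the separate (n+1)-th term; the class name `Canonical` and its fields are hypothetical and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass
class Canonical:
    """Delay or arrival time as mean + sum_j coeffs[j] * p'_j.

    Assumption: the p'_j are independent, zero-mean, unit-variance
    principal components shared by all gates.
    """
    mean: float
    coeffs: tuple

    def variance(self):
        # No covariance terms: a single linear pass over the coefficients.
        return sum(k * k for k in self.coeffs)

    def __add__(self, other):
        # The "sum" operation keeps the same form: add means and coefficients.
        return Canonical(self.mean + other.mean,
                         tuple(a + b for a, b in zip(self.coeffs, other.coeffs)))


arrival = Canonical(10.0, (1.0, 2.0))
gate = Canonical(5.0, (0.5, -1.0))
out = arrival + gate
```

Because both operands are expressed over the same components, correlation between the input arrival time and the gate delay is handled automatically by adding coefficients.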

The task of statistical timing analysis is
to translate these gate delay distributions
to circuit delay probabilities while per-
forming a topological traversal. The opera-
tions performed at each node encountered
during this traversal in STA are of two
types [5]:


• A gate (vertex) is being processed in STA
when the arrival times of all inputs are
known, at which time the candidate delay
values at the output are computed using
the “sum” operation that adds the delay
at each input with the input-to-output pin
delay.

• Once these candidate delays have been
found, the “max” operation is applied to
determine the maximum arrival time at the
output.
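
The two operations above can be sketched on the deterministic skeleton of the traversal, where every delay is a plain number; in the statistical setting the numbers would be replaced by delay distributions. The graph encoding, the function name, and the assumption of non-negative delays are choices made for this example.

```python
from collections import deque


def arrival_times(num_nodes, edges):
    """Latest arrival time at each node of a DAG.

    edges: list of (u, v, delay) with delay >= 0; nodes without
    predecessors are treated as primary inputs arriving at time 0.
    """
    succ = [[] for _ in range(num_nodes)]
    indeg = [0] * num_nodes
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1

    arrival = [0.0] * num_nodes
    ready = deque(v for v in range(num_nodes) if indeg[v] == 0)
    processed = 0
    while ready:
        u = ready.popleft()          # all inputs of u are already known
        processed += 1
        for v, d in succ[u]:
            candidate = arrival[u] + d               # "sum" operation
            arrival[v] = max(arrival[v], candidate)  # "max" operation
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if processed != num_nodes:
        raise ValueError("graph contains a cycle")
    return arrival


times = arrival_times(4, [(0, 2, 3), (1, 2, 1), (2, 3, 2), (1, 3, 7)])
```

Each edge is visited once, which matches the linear dependence on |E| of the traversal.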


Since the gate delays are Gaussian, the “sum”
operation is merely an addition of Gaussians, 
which is well known to be a Gaussian. The 
computation of the max function, however, 
poses greater problems. The set of candidate 
delays are all Gaussian, so that this function 
must find the maximum of Gaussians. Such 
a maximum may be reasonably approximated 
using a Gaussian [3]. A detailed description of 
how the invariant representation is maintained 
under the max operation is presented in 
[12].
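
One widely used way to approximate the maximum of two Gaussians by a Gaussian is moment matching with Clark's formulas for the first two moments of the maximum. The sketch below uses that technique as an assumption; it is not claimed to be the exact procedure of the cited works, and the function and parameter names are made up for the example.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_max(mu1, var1, mu2, var2, cov12=0.0):
    """Mean and variance of max(X1, X2) for jointly Gaussian X1, X2."""
    theta_sq = var1 + var2 - 2.0 * cov12   # variance of X1 - X2
    if theta_sq <= 0.0:
        # X1 - X2 is constant, so the larger-mean variable is always the max.
        return (mu1, var1) if mu1 >= mu2 else (mu2, var2)
    theta = math.sqrt(theta_sq)
    alpha = (mu1 - mu2) / theta
    first = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + theta * _pdf(alpha)
    second = ((mu1 ** 2 + var1) * _cdf(alpha)
              + (mu2 ** 2 + var2) * _cdf(-alpha)
              + (mu1 + mu2) * theta * _pdf(alpha))
    return first, second - first ** 2


m, v = gaussian_max(0.0, 1.0, 0.0, 1.0)
```

For two independent standard normals this gives mean 1/√π and variance 1 − 1/π. The true maximum is not Gaussian, so treating the result as a Gaussian with these moments is an approximation.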

The cost of this method corresponds to run- 
ning a bounded number of deterministic STAs, 
and it is demonstrated to be accurate, given the 
statistics of P.


Applications

Statistical timing analysis has been ex-
tensively used in industry [10] and has
seeded a large amount of academic research.
Integrated circuit manufacturing foundries
have promoted the use of statistical timing
analysis by providing PCA-like information
with their process parameter models, thus
enabling design flows that are statistically
based.

The ideas of statistical analysis have also mo- 
tivated simpler and more approximate methods, 
used in industry today, based on on-chip vari- 
ation (OCV) derating factors. In its most ele- 
mentary form, OCV adds margins to each timing 
path to account for possible variation. More in- 
volved versions of OCV, such as advanced OCV 
(AOCV), capture the essence of spatial corre- 
lation by using derating factors that depend on 
factors such as spatial distance and logical depth 
of a path [9]. 


Experimental Results 


Statistical timing analysis based on orthogonal- 
ization brings down the computational cost from 
quadratic to linear in the number of variables 
and can be applied to large circuit instances. The 
method is capable of considering both spatial 
correlations and structural correlations, i.e., cor- 
relations between paths that share gates, since 
such correlations are embedded into the invariant 
representation. This makes the approach accu- 
rate and computationally practical, as described 
in [1, 2, 10] and the large body of follow-on 
work. 


URLs to Code and Data Sets 


The MinnSSTA statistical static timing analyzer 
is available at http://www.ece.umn.edu/~sachin/ 
software/MinnSSTA/index.html. 

Problem Definition


The Steiner forest problem is a fundamental prob- 
lem in network design. Informally, the goal is to 
establish connections between pairs of vertices in 
a given network at minimum cost. The problem 
generalizes the well-known Steiner tree problem. 
As an example, assume that a telecommunica- 
tion company receives communication requests 
from their customers. Each customer asks for 
a connection between two vertices in a given 
network. The company’s goal is to build a min- 
imum cost network infrastructure such that all 
communication requests are satisfied. 


Formal Definition and Notation 

More formally, an instance I = (G, c, R) of
the Steiner forest problem is given by an
undirected graph G = (V, E) with vertex set
V and edge set E, a non-negative cost function
$c : E \to \mathbb{Q}^+$, and a set of vertex pairs
$R = \{(s_1, t_1), \ldots, (s_k, t_k)\} \subseteq V \times V$. The pairs in
R are called terminal pairs. A feasible solution
is a subset $F \subseteq E$ of the edges of G such that
for every terminal pair $(s_i, t_i) \in R$ there is a path
between $s_i$ and $t_i$ in the subgraph G[F] induced
by F. Let the cost c(F) of F be defined as the total
cost of all edges in F, i.e., $c(F) = \sum_{e \in F} c(e)$.
The goal is to find a feasible solution F of
minimum cost c(F). It is easy to see that there
exists an optimum solution which is a forest.
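
Feasibility and cost of a candidate edge set can be checked directly from the definition. The following sketch uses a small union-find structure; the encoding of vertices as integers 0..n-1 and of edges as (u, v, cost) triples is an assumption made for the example.

```python
def is_feasible(num_vertices, chosen_edges, terminal_pairs):
    """True if every terminal pair is connected using only chosen_edges."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _cost in chosen_edges:
        parent[find(u)] = find(v)          # merge the two components
    return all(find(s) == find(t) for s, t in terminal_pairs)


def total_cost(chosen_edges):
    """c(F): the sum of the costs of the chosen edges."""
    return sum(cost for _u, _v, cost in chosen_edges)


F = [(0, 1, 2), (2, 3, 5)]
```

With this F, the pairs (0, 1) and (2, 3) are connected at total cost 7, whereas the pair (0, 3) is not.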

The Steiner forest problem may alternatively 
be defined by a set of terminal groups 
$R = \{g_1, \ldots, g_k\}$ with $g_i \subseteq V$ instead of
terminal pairs. The objective is to compute 
a minimum cost subgraph such that all terminals 
belonging to the same group are connected. This 
definition is equivalent to the one given above. 
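
One direction of this equivalence can be made concrete: a group is connected exactly when consecutive members (in any fixed order) are pairwise connected, so each group of size m can be replaced by m − 1 terminal pairs. The helper below is an illustrative sketch with a hypothetical name; it assumes that group members are sortable vertex identifiers.

```python
def groups_to_pairs(groups):
    """Replace each terminal group by a chain of terminal pairs."""
    pairs = []
    for group in groups:
        members = sorted(group)
        # Connecting consecutive members connects the whole group.
        pairs.extend(zip(members, members[1:]))
    return pairs


pairs = groups_to_pairs([{1, 2, 3}, {4, 5}, {6}])
```

For the other direction, the connected components of the graph formed by the terminal pairs can serve as the groups.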


Related Problems 

A special case of the Steiner forest problem is the
Steiner tree problem (see also the entry ▶ Steiner
Trees). Here, all terminal pairs share a common
root vertex $r \in V$, i.e., $r \in \{s_i, t_i\}$ for all terminal
pairs $(s_i, t_i) \in R$. In other words, the problem
consists of a set of terminal vertices $R \subseteq V$ and
a root vertex $r \in V$, and the goal is to connect the
terminals in R to r in the cheapest possible way.
A minimum cost solution is a tree.

The generalized Steiner network problem (see
the entry ▶ Generalized Steiner Network), also
known as the survivable network design problem,
is a generalization of the Steiner forest prob-
lem. Here, a connectivity requirement function
$r : V \times V \to \mathbb{N}$ specifies the number of edge-
disjoint paths that need to be established between
every pair of vertices. That is, the goal is to find
a minimum cost multiset H of the edges of G
(H may contain the same edge several times) such
that for every pair of vertices $(x, y) \in V \times V$ there are
r(x, y) edge-disjoint paths from x to y in G[H].
Clearly, if $r(x, y) \in \{0, 1\}$ for all $(x, y) \in V \times V$, this
problem reduces to the Steiner forest problem.


Key Results 


Agrawal, Klein and Ravi [1, 2] give an approx- 
imation algorithm for the Steiner forest prob- 
lem that achieves an approximation ratio of 2. 
More precisely, the authors prove the following 
theorem. 


Theorem 1 There exists an approximation algo-
rithm that, for every instance I = (G, c, R) of the
Steiner forest problem, computes a feasible forest
F such that

$$c(F) \le \left(2 - \frac{1}{k}\right) \cdot \mathrm{OPT}(I),$$

where k is the number of terminal pairs in R and
OPT(I) is the cost of an optimal Steiner forest
for I.


Related Work 

The Steiner tree problem is NP-hard [10] and 
APX-complete [4, 8]. The current best lower 
bound on the achievable approximation ratio for 
the Steiner tree problem is 1.0074 [21]. Goemans 
and Williamson [11] generalized the results ob- 
tained by Agrawal, Klein and Ravi to a larger 
class of connectivity problems, which they term 
constrained forest problems. For the Steiner for- 
est problem, their algorithm achieves the same 
approximation ratio of (2 − 1/k). The algorithms
of Agrawal, Klein and Ravi [2] and Goemans and 
Williamson [11] are both based on the classical 
undirected cut formulation for the Steiner forest 
problem [3]. The integrality gap of this relax-
ation is known to be (2 − 1/k) and the results
in [2, 11] are therefore tight. Jain [15] presents 
a 2-approximation algorithm for the generalized 
Steiner network problem. 


Primal-Dual Algorithm 

The main ideas of the algorithm by Agrawal, 
Klein and Ravi [2] are sketched below; subse- 
quently, AKR is used to refer to this algorithm. 
The description given here differs from the one 
in [2]; the interested reader is referred to [2] for 
more details. 

The algorithm is based on the following inte-
ger programming formulation for the Steiner for-
est problem. Let I = (G, c, R) be an instance of
the Steiner forest problem. Associate an indicator
variable $x_e \in \{0, 1\}$ with every edge $e \in E$. The
value of $x_e$ is 1 if e is part of the forest F and 0 oth-
erwise. A subset $S \subseteq V$ of the vertices is called
a Steiner cut if there exists at least one terminal
pair $(s_i, t_i) \in R$ such that $|\{s_i, t_i\} \cap S| = 1$; S is
said to separate terminal pair $(s_i, t_i)$. Let $\mathcal{S}$ be the
set of all Steiner cuts. For a subset $S \subseteq V$, define
$\delta(S)$ as the set of all edges in E that have
exactly one endpoint in S. Given a Steiner cut
$S \in \mathcal{S}$, any feasible solution F of I must contain
at least one edge that crosses the cut S, i.e.,
$\sum_{e \in \delta(S)} x_e \ge 1$. This gives rise to the following
undirected cut formulation:

$$\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c(e)\, x_e & & \text{(IP)} \\
\text{subject to} \quad & \sum_{e \in \delta(S)} x_e \ge 1 & \forall S \in \mathcal{S} \quad & (1) \\
& x_e \in \{0, 1\} & \forall e \in E. \quad & (2)
\end{aligned}$$

The dual of the linear programming relaxation
of (IP) has a variable $y_S$ for every Steiner cut
$S \in \mathcal{S}$. There is a constraint for every edge $e \in E$
that requires that the total dual assigned to sets
$S \in \mathcal{S}$ that contain exactly one endpoint of e is at
most the cost c(e) of the edge:

$$\begin{aligned}
\text{maximize} \quad & \sum_{S \in \mathcal{S}} y_S & & \text{(D)} \\
\text{subject to} \quad & \sum_{S \in \mathcal{S} :\, e \in \delta(S)} y_S \le c(e) & \forall e \in E \quad & (3) \\
& y_S \ge 0 & \forall S \in \mathcal{S}. \quad & (4)
\end{aligned}$$

Algorithm AKR is based on the primal-dual
schema (see, e.g., [22]). That is, the algorithm 
constructs both a feasible primal solution 
for (IP) and a feasible dual solution for (D). 
The algorithm starts with an infeasible primal 
solution and reduces its degree of infeasibility 
as it progresses. At the same time, it creates 
a feasible dual packing of subsets of large total 
value by raising dual variables of Steiner cuts. 

One can think of an execution of AKR as 
a process over time. Let $x^\tau$ and $y^\tau$, respectively,
be the primal incidence vector and feasible dual
solution at time $\tau$. Initially, let $x^0_e = 0$ for all
$e \in E$ and $y^0_S = 0$ for all $S \in \mathcal{S}$. Let $F^\tau$ denote
the forest corresponding to the set of edges with
$x^\tau_e = 1$. A tree T in $F^\tau$ is called active at time
$\tau$ if it contains a terminal that is separated from
its mate; otherwise it is inactive. Intuitively, AKR
grows trees in $F^\tau$ that are active. At the same
time, the algorithm raises dual values of Steiner 
cuts that correspond to active trees. If two active 
trees collide, they are merged. The process termi- 
nates if all trees are inactive and thus there are 
no unconnected terminal pairs. The interplay of 
the primal (growing trees) and the dual process 
(raising duals) is somewhat subtle and outlined 
next. 

An edge $e \in E$ is tight if the corresponding
constraint (3) holds with equality; a path is tight if
all its edges are tight. Let $H^\tau$ be the subgraph of G
that is induced by the tight edges for dual $y^\tau$. The
connected components of $H^\tau$ induce a partition
$\mathcal{C}^\tau$ on the vertex set V. Let $\mathcal{S}^\tau$ be the set of all
Steiner cuts contained in $\mathcal{C}^\tau$, i.e., $\mathcal{S}^\tau = \mathcal{C}^\tau \cap \mathcal{S}$.
AKR raises the dual values $y_S$ for all sets $S \in \mathcal{S}^\tau$
uniformly at all times $\tau \ge 0$. Note that $y^\tau$ is dual
feasible. The algorithm maintains the invariant
that $F^\tau$ is a subgraph of $H^\tau$ at all times. Consider
the event that a path P between two trees $T_1$ and
$T_2$ of $F^\tau$ becomes tight. The missing edges of P
are then added to $F^\tau$ and the process continues.
Eventually, all trees in $F^\tau$ are inactive and the
process halts.
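
The primal-dual process can be sketched in code. The version below is not a faithful transcription of AKR: it is an edge-based variant in the style of Goemans and Williamson that grows the components of the current forest (rather than the components of the tight-edge graph), adds one edge each time an edge between two components becomes tight, and finally deletes edges that are not needed for feasibility. The function names and the input encoding are assumptions for the example, and exact arithmetic with `Fraction` is a convenience choice.

```python
from fractions import Fraction


def _make_find(parent):
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    return find


def connects_all(n, edge_list, pairs):
    """True if edge_list connects every terminal pair."""
    parent = list(range(n))
    find = _make_find(parent)
    for u, v, _c in edge_list:
        parent[find(u)] = find(v)
    return all(find(s) == find(t) for s, t in pairs)


def primal_dual_steiner_forest(n, edges, pairs):
    """edges: list of (u, v, cost); pairs: list of (s, t).
    Returns (forest_edges, total_dual_value)."""
    parent = list(range(n))
    find = _make_find(parent)

    def active_roots():
        # A component is active if it separates some terminal pair.
        roots = set()
        for s, t in pairs:
            if find(s) != find(t):
                roots.update((find(s), find(t)))
        return roots

    d = [Fraction(0)] * n        # d[v]: total dual of grown sets containing v
    dual_value = Fraction(0)
    chosen = []
    active = active_roots()
    while active:
        best = None
        for idx, (u, v, c) in enumerate(edges):
            ru, rv = find(u), find(v)
            rate = (ru in active) + (rv in active)
            if ru == rv or rate == 0:
                continue
            # Time until constraint (3) of this edge becomes tight.
            eps = (Fraction(c) - d[u] - d[v]) / rate
            if best is None or eps < best[0]:
                best = (eps, idx)
        if best is None:
            raise ValueError("some terminal pair cannot be connected")
        eps, idx = best
        for w in range(n):       # raise duals of all active components
            if find(w) in active:
                d[w] += eps
        dual_value += eps * len(active)
        chosen.append(idx)
        u, v, _c = edges[idx]
        parent[find(u)] = find(v)
        active = active_roots()

    kept = list(chosen)          # reverse delete: drop unnecessary edges
    for idx in reversed(chosen):
        trial = [i for i in kept if i != idx]
        if connects_all(n, [edges[i] for i in trial], pairs):
            kept = trial
    return [edges[i] for i in kept], dual_value


forest, dual = primal_dual_steiner_forest(
    4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5)], [(0, 3)])
```

On this small instance the forest is the path 0-1-2-3 of cost 3 and the dual value is also 3, so the solution is optimal. In general, the standard analysis of this style of algorithm bounds the cost of the pruned forest by twice the dual value.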


Applications 


The computation of (approximate) solutions for 
the Steiner forest problem has various applica- 
tions both in theory and practice; only a few 
recent developments are mentioned here. 

Algorithms for more complex network design 
problems often rely on good approximation 
algorithms for the Steiner forest problem. For 
example, the recent approximation algorithms 
[6, 9, 12] for the multi-commodity rent-or-buy
problem (MRoB) are based on the random 
sampling framework by Gupta et al. [12, 13]. The 
framework uses a Steiner forest approximation 
algorithm that satisfies a certain strictness 
property as a subroutine. Fleischer et al. [9] 
show that AKR meets this strictness requirement, 
which leads to the current best 5-approximation 
algorithm for MRoB. The strictness property 
also plays a crucial role in the boosted sampling 
framework by Gupta et al. [14] for two-stage 
stochastic optimization problems with recourse. 

Online versions of Steiner tree and forest prob-
lems have been studied by Awerbuch et al. [5]
and Berman and Coulston [7]. In the area of algo- 
rithmic game theory, the development of group- 
strategyproof cost sharing mechanisms for net- 
work design problems such as the Steiner tree 
problem has recently received a lot of atten- 
tion; see e.g., [16, 17, 19, 20]. An adaptation of 
AKR yields such a cost sharing mechanism for 
the Steiner forest problem [18]. 


Cross-References 


Generalized Steiner Network 
Steiner Trees 


Recommended Reading 


The interested reader is referred in particular to 
the articles [2, 11] for a more detailed description 
of primal-dual approximation algorithms for gen- 
eral network design problems. 

Definition


Given a set of points, called terminals, in a 
metric space, the problem is to find the shortest 
tree interconnecting all terminals. There are three 
important metric spaces for Steiner trees, the Eu- 
clidean plane, the rectilinear plane, and the edge- 
weighted network. The Steiner tree problems 
in those metric spaces are called the Euclidean 
Steiner tree (EST), the rectilinear Steiner tree 
(RST), and the network Steiner tree (NST), re- 
spectively. EST and RST have been found to have 
polynomial-time approximation schemes (PTAS) 
by using adaptive partition. However, for NST,
there exists a positive number r such that com-
puting an r-approximation is NP-hard. So far, the
best performance ratio of polynomial-time ap- 
proximation for NST is achieved by k-restricted 
Steiner trees. However, in practice, the iterated 
1-Steiner tree is used very often. Actually, the 
iterated 1-Steiner was proposed as a candidate of 
good approximation for Steiner minimum trees 
a long time ago. It has a very good record in 
computer experiments, but no correct analysis 
was given showing the iterated 1-Steiner tree 
having a performance ratio better than that of 
the minimum spanning tree until the recent work 
by Du et al. [9]. There is minimal difference in 
construction of the 3-restricted Steiner tree and 
the iterated 1-Steiner tree, which makes a big 
difference in analysis of those two types of trees. 
Why does the difficulty of analysis make so much 
difference? This will be explained in this article. 


History and Background 

The Steiner tree problem was proposed by
Gauss in 1836 as a generalization of the Fermat
problem. Given three points A, B, and C 
in the Euclidean plane, Fermat studied the 
problem of finding a point S to minimize 
|SA| + |SB| + |SC|. He determined that when all 
three inner angles of triangle ABC are less than 
120°, the optimal S should be at the position where
∠ASB = ∠BSC = ∠CSA = 120°.

The generalization of the Fermat problem has 
two directions: 


1. Given n points in the Euclidean plane, find a 
point S to minimize the total distance from S 
to the n given points. This is still called the Fermat
problem. 

2. Given n points in the Euclidean plane, find 
the shortest network interconnecting all given 
points. 


Gauss found the second generalization through 
communication with Schumacher. On March 19, 
1836, Schumacher wrote a letter to Gauss and 
mentioned a paradox about Fermat’s problem: 
Consider a convex quadrilateral ABCD. It is 
known that the solution of Fermat’s problem for 
four points A, B, C, and D is the intersection E 
of diagonals AC and BD. Suppose extending DA 
and CB can obtain an intersection F. Now, move 
A and B to F. Then E will also be moved to F. 
However, when the angle at F is less than 120°, 
the point F cannot be the solution of Fermat’s 
problem for three given points F, D, and C. 
What happens? (Fig. 1.) 

On March 21, 1836, Gauss wrote a letter 
replying to Schumacher in which he explained 
that the mistake of Schumacher’s paradox occurs 
at the place where Fermat’s problem for four 
points A, B, C, and D is changed to Fermat’s 
problem for three points F, C, and D. When A 
and B are identical to F, the total distance from
E to four points A, B, C, and D equals 2|EF| +
|EC| + |ED|, not |EF| + |EC| + |ED|. Thus,
the point E may not be the solution of Fermat’s
problem for F, C, and D. More importantly,
Gauss proposed a new problem. He said that it
is more interesting to find the shortest network
rather than a point. Gauss also presented several
possible connections of the shortest network for
four given points.

Steiner Trees, Fig. 1 Convex quadrilateral ABCD, Fer-
mat’s problem

It was unfortunate that Gauss’ letter was not
seen by researchers of Steiner trees at an earlier
stage. In particular, R. Courant and H. Robbins,
in their popular book What is mathematics?
(published in 1941) [6], called Gauss’ problem
the Steiner tree problem, so that “Steiner tree”
became a popular name for the problem.

The Steiner tree became an important research 
topic in mathematics and computer science due to 
its applications in telecommunication and com- 
puter networks. Starting with Gilbert and Pollak’s 
work published in 1968, many publications on 
Steiner trees have been generated to solve various 
problems concerning it. 

One well-known problem is the Gilbert-Pollak 
conjecture on the Steiner ratio, which is the least 
ratio of lengths between the Steiner minimum 
tree and the minimum spanning tree on the same 
set of given points. Gilbert and Pollak in 1968 
conjectured that the Steiner ratio in the Euclidean
plane is $\sqrt{3}/2$, which is achieved by the three vertices
of an equilateral triangle. A great deal of research
effort has been put into the conjecture and it was 
finally proved by Du and Hwang [7]. 
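
The value √3/2 for the equilateral triangle can be verified by a short calculation. The sketch below assumes a unit-side triangle and uses the fact that, for an equilateral triangle, the point seeing all three sides under 120° is the centroid; the variable names are chosen for the example.

```python
import math


def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Vertices of an equilateral triangle with side length 1.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)

# Minimum spanning tree: any two of the three equal sides.
mst_length = dist(A, B) + dist(B, C)

# Steiner minimum tree: connect all three vertices to the centroid.
S = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)
smt_length = dist(S, A) + dist(S, B) + dist(S, C)

ratio = smt_length / mst_length
```

The Steiner tree has length √3 and the spanning tree has length 2, giving the ratio √3/2 ≈ 0.866.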

Another important problem is called the bet- 
ter approximation. For a long time no approxi- 
mation could be proved to have a performance 
ratio smaller than the inverse of the Steiner ra- 
tio. Zelikovsky [14] made the first breakthrough. 
He found a polynomial-time 11/6-approximation
for NST, which beats 2, the inverse of the
Steiner ratio 1/2 in the edge-weighted network. Later,
Berman and Ramaiyer [2] gave a polynomial-time
92/72-approximation for RST, and Du, Zhang, 
and Feng [8] closed the story by showing that in 
any metric space, there exists a polynomial-time 
approximation with a performance ratio better 
than the inverse of the Steiner ratio provided 
that for any set of a fixed number of points, 
the Steiner minimum tree is polynomial-time 
computable. 
All the above better approximations came 
from the family of k-restricted Steiner trees. 
As details of the construction were improved, the 
constant performance ratio kept decreasing, but 
the improvements were also becoming smaller. 
In 1996, Arora [1] made significant progress for 
EST and RST. He showed the existence of a PTAS 
for EST and RST. Therefore, theoretical 
researchers now pay more attention to NST. Bern 
and Plassmann [3] showed that NST is MAX SNP-complete. 
This means that there exists a constant r > 1 such 
that computing an r-approximation for NST is 
NP-hard. The best-known performance ratio for NST was 
given by Robins and Zelikovsky [12]. They also 
gave a very simple analysis of a well-known 
heuristic, the iterated 1-Steiner tree, for 
pseudo-bipartite graphs. 

Analysis of the iterated 1-Steiner tree was 
another long-standing open problem. Since Chang 
[4,5] proposed in 1972 that the iterated 1-Steiner 
tree approximates the Steiner minimum tree, 
its performance has been claimed to be very good 
on the basis of computer experiments [10, 13], but no 
theoretical analysis supported this claim. Both 
the k-restricted Steiner tree and the 
iterated 1-Steiner tree are obtained by greedy 
algorithms, but with different types of potential 
functions. For the iterated 1-Steiner tree, the 
potential function is non-submodular, whereas for the 
k-restricted Steiner tree it is submodular; hence a 
property that holds for k-restricted Steiner trees may 
not hold for iterated 1-Steiner trees. The 
submodularity of the potential function is very 
important in the analysis of greedy approximations 
[11]. Du et al. [9] gave a correct analysis for the 
iterated 1-Steiner tree with a general technique for 
dealing with non-submodular potential functions. 


Key Results 


Consider an input edge-weighted graph G = (V, E) 
of NST. Assume that G is a complete graph and 
the edge weight satisfies the triangle inequality; 
otherwise, consider the complete graph on V with 
each edge (u, v) having a weight equal to the 
length of the shortest path between u and v in 
G. Given a set P of terminals, a Steiner tree is a 
tree interconnecting all given terminals such that 
every leaf is a terminal. 

In a Steiner tree, a terminal may have degree 
more than one. The Steiner tree can be decom- 
posed, at those terminals with degree more than 
one, into smaller trees in which every terminal is 
a leaf. In such a decomposition, each resulting 
small tree is called a full component. The size 
of a full component is the number of terminals 
in it. A Steiner tree is k-restricted if every full 
component of it has size at most k. The shortest 
k-restricted Steiner tree is also called the 
k-restricted Steiner minimum tree. Its length is 
denoted by $\mathrm{smt}_k(P)$. Clearly, $\mathrm{smt}_2(P)$ is the 
length of the minimum spanning tree on P, which 
is also denoted by mst(P). Let smt(P) denote 
the length of the Steiner minimum tree on P. If 
$\mathrm{smt}_3(P)$ could be computed in polynomial time, 
then it would be better than mst(P) as an 
approximation of smt(P). However, so far no 
polynomial-time algorithm has been found for 
computing $\mathrm{smt}_3(P)$. Therefore, Zelikovsky [14] 
used a greedy approximation of $\mathrm{smt}_3(P)$ to 
approximate smt(P). Actually, Chang [4,5] used a 
similar greedy algorithm to compute an iterated 
1-Steiner tree. 

Let $\mathcal{F}$ be a family of subgraphs of the input 
edge-weighted graph G. For any connected subgraph 
H, denote by mst(H) the length of the minimum 
spanning tree of H, and for any subgraph H, 
denote by mst(H) the sum of mst(H') over all 
connected components H' of H. Define 


$$\mathrm{gain}(H) = \mathrm{mst}(P) - \mathrm{mst}(P : H) - \mathrm{mst}(H),$$


where mst(P : H) is the length of the minimum 
spanning tree interconnecting all unconnected 
terminals in P after every edge of H shrinks into 
a point. 
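
As an illustration of this definition, the following Python sketch evaluates gain(H) with Kruskal's algorithm and a union-find structure. It is only a sketch under stated assumptions: the graph is complete and metric, `weight(u, v)` is a symmetric weight function, and H is given as a list of edges. The names `gain`, `_kruskal`, and `_find` are illustrative and do not come from the original text.

```python
from itertools import combinations


def _find(parent, v):
    """Return the representative of v, compressing the path on the way."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def _kruskal(edges, weight, parent):
    """Scan edges by nondecreasing weight, merge components, return added weight."""
    total = 0
    for u, v in sorted(edges, key=lambda e: weight(*e)):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += weight(u, v)
    return total


def gain(vertices, terminals, weight, H):
    """gain(H) = mst(P) - mst(P : H) - mst(H) for an edge list H (assumed input format)."""
    terminal_pairs = list(combinations(terminals, 2))

    # mst(P): minimum spanning tree on the terminals alone.
    mst_P = _kruskal(terminal_pairs, weight, {v: v for v in vertices})

    # mst(P : H): first shrink every edge of H into a point (zero cost),
    # then connect the terminals that are still unconnected.
    parent = {v: v for v in vertices}
    _kruskal(H, lambda u, v: 0, parent)
    mst_P_given_H = _kruskal(terminal_pairs, weight, parent)

    # mst(H): sum of minimum spanning trees over the components of H.
    mst_H = _kruskal(H, weight, {v: v for v in vertices})

    return mst_P - mst_P_given_H - mst_H
```

For example, with three terminals at mutual distance 20 and one extra vertex at distance 11 from each terminal, the 3-star through the extra vertex has gain 40 - 0 - 33 = 7.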


Greedy Algorithm 

$H \leftarrow \emptyset$; 
while P has not been interconnected by H do 
    choose $F \in \mathcal{F}$ to maximize $\mathrm{gain}(H \cup F)$ 
    and set $H \leftarrow H \cup F$; 
output mst(H). 
When $\mathcal{F}$ consists of all full components of size 
at most three, this greedy algorithm gives the 
3-restricted Steiner tree of Zelikovsky [14]. When 
$\mathcal{F}$ consists of all 3-stars and all edges, where a 
3-star is a tree with three leaves and a central vertex, 
this greedy algorithm produces the iterated 
1-Steiner tree. An interesting fact pointed out by Du 
et al. [9] is that the function gain(·) is submodular 
over all full components of size at most three, but 
not submodular over all 3-stars and edges. 

Let us consider a base set E and a function 
f from all subsets of E to real numbers. f is 
submodular if for any two subsets A, B of E, 


$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B).$$


For $x \in E$ and $A \subseteq E$, denote 
$\Delta_x f(A) = f(A \cup \{x\}) - f(A)$. 

Lemma 1 f is submodular if and only if for any 
$A \subset E$ and distinct $x, y \in E - A$, 


$$\Delta_x \Delta_y f(A) \le 0. \qquad (1)$$


Proof Suppose f is submodular. Set $B = A \cup \{x\}$ 
and $C = A \cup \{y\}$. Then $B \cup C = A \cup \{x, y\}$ 
and $B \cap C = A$. Therefore, one has 


$$f(A \cup \{x, y\}) - f(A \cup \{x\}) - f(A \cup \{y\}) + f(A) \le 0,$$


that is, (1) holds. 

Conversely, suppose (1) holds for any $A \subset E$ 
and distinct $x, y \in E - A$. Consider two subsets 
A, B of E. If $A \subseteq B$ or $B \subseteq A$, it is trivial to 
have 


$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B).$$


Therefore, one may assume that $A \setminus B \ne \emptyset$ and 
$B \setminus A \ne \emptyset$. Write $A \setminus B = \{x_1, \ldots, x_k\}$ and 
$B \setminus A = \{y_1, \ldots, y_h\}$. Then 


$$f(A \cup B) - f(A) - f(B) + f(A \cap B)
= \sum_{i=1}^{k} \sum_{j=1}^{h} \Delta_{x_i} \Delta_{y_j}
f\bigl((A \cap B) \cup \{x_1, \ldots, x_{i-1}\} \cup \{y_1, \ldots, y_{j-1}\}\bigr)
\le 0,$$


where $\{x_1, \ldots, x_{i-1}\} = \emptyset$ for $i = 1$ and 
$\{y_1, \ldots, y_{j-1}\} = \emptyset$ for $j = 1$. □ 
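
The equivalence in Lemma 1 can be checked by brute force on small ground sets. The Python sketch below is illustrative only: it assumes that f accepts a `frozenset`, and the helper names `subsets`, `is_submodular`, and `has_nonpositive_second_differences` are hypothetical. It tests the defining inequality and condition (1) separately, so both tests should agree on any function.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of the ground set as a frozenset."""
    items = list(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for all subsets A, B."""
    all_subsets = list(subsets(ground))
    return all(
        f(A) + f(B) >= f(A | B) + f(A & B)
        for A in all_subsets
        for B in all_subsets
    )


def has_nonpositive_second_differences(f, ground):
    """Check condition (1): the second difference in x and y is <= 0."""
    ground = frozenset(ground)
    for A in subsets(ground):
        for x, y in combinations(ground - A, 2):
            second_diff = f(A | {x, y}) - f(A | {x}) - f(A | {y}) + f(A)
            if second_diff > 0:
                return False
    return True
```

A coverage function (the size of a union of sets) passes both tests, whereas the squared cardinality of A fails both, as the lemma predicts.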


Lemma 2 Define $f(H) = -\mathrm{mst}(P : H)$. Then 
f is submodular over the edge set E. 


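Proof Note that for any two distinct edges x and 
y not in subgraph H, 


$$\begin{aligned}
\Delta_x \Delta_y f(H)
&= -\mathrm{mst}(P : H \cup x \cup y) + \mathrm{mst}(P : H \cup x)
 + \mathrm{mst}(P : H \cup y) - \mathrm{mst}(P : H) \\
&= \bigl(\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup x \cup y)\bigr)
 - \bigl(\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup x)\bigr) \\
&\quad - \bigl(\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup y)\bigr).
\end{aligned}$$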


Let T be a minimum spanning tree for unconnected 
terminals after every edge of H shrinks 
into a point. T contains a path $P_x$ connecting the two 
endpoints of x and also a path $P_y$ connecting the two 
endpoints of y. Let $e_x$ ($e_y$) be a longest edge in 
$P_x$ ($P_y$). Then 


$$\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup x) = \mathrm{length}(e_x),$$
$$\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup y) = \mathrm{length}(e_y).$$


$\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup x \cup y)$ can be 
computed as follows: Choose a longest edge $e'$ 
from $P_x \cup P_y$. Note that $T \cup x \cup y - e'$ contains 
a unique cycle Q. Choose a longest edge $e''$ from 
$(P_x \cup P_y) \cap Q$. Then 


$$\mathrm{mst}(P : H) - \mathrm{mst}(P : H \cup x \cup y) = \mathrm{length}(e') + \mathrm{length}(e'').$$


Now, to show the submodularity of f, it suffices 
to prove 


$$\mathrm{length}(e_x) + \mathrm{length}(e_y) \ge \mathrm{length}(e') + \mathrm{length}(e''). \qquad (2)$$


Case 1. $e_x \notin P_x \cap P_y$ and $e_y \notin P_x \cap P_y$. Without 
loss of generality, assume $\mathrm{length}(e_x) \ge \mathrm{length}(e_y)$. 
Then one may choose $e' = e_x$ 
so that $(P_x \cup P_y) \cap Q = P_y$. Hence, one can 
choose $e'' = e_y$. Therefore, the equality holds 
for (2). 

Case 2. $e_x \notin P_x \cap P_y$ and $e_y \in P_x \cap P_y$. Clearly, 
$\mathrm{length}(e_x) \ge \mathrm{length}(e_y)$. Hence, one may 
choose $e' = e_x$ so that $(P_x \cup P_y) \cap Q = P_y$. 
Hence, one can choose $e'' = e_y$. Therefore, 
the equality holds for (2). 

Case 3. $e_x \in P_x \cap P_y$ and $e_y \notin P_x \cap P_y$. Similar 
to Case 2. 

Case 4. $e_x \in P_x \cap P_y$ and $e_y \in P_x \cap P_y$. In this 
case, $\mathrm{length}(e_x) = \mathrm{length}(e_y) = \mathrm{length}(e')$. 
Hence, (2) holds. □ 


The following explains that the submodularity 
of gain(·) holds for a k-restricted Steiner tree. 


Theorem 1 Let $\mathcal{E}$ be the set of all full 
components of a Steiner tree. Then gain(·) as a function 
on the power set of $\mathcal{E}$ is submodular. 


Proof Note that for any $\mathcal{H} \subset \mathcal{E}$ and $x, y \in \mathcal{E} - \mathcal{H}$, 


$$\Delta_x \Delta_y \mathrm{mst}(H) = 0,$$


where $H = \bigcup_{z \in \mathcal{H}} z$. Thus, this theorem follows 
from Lemma 2. 

Let $\mathcal{F}$ be the set of 3-stars and edges chosen 
in the greedy algorithm to produce an iterated 
1-Steiner tree. Then gain(·) may not be submodular 
on $\mathcal{F}$. To see this fact, consider the two 3-stars 
x and y in Fig. 2. Note that $\mathrm{gain}(x \cup y) > \mathrm{gain}(x)$, 
$\mathrm{gain}(y) < 0$, and $\mathrm{gain}(\emptyset) = 0$. One has 


$$\mathrm{gain}(x \cup y) - \mathrm{gain}(x) - \mathrm{gain}(y) + \mathrm{gain}(\emptyset) > 0.$$


Steiner Trees, Fig. 2. An example for the proof of Theorem 1 
Applications 


The Steiner tree problem is a classic NP-hard 
problem with many applications in the design of 
computer circuits, long-distance telephone lines, 
multicast routing in communication networks, 
etc. There exist many heuristics of the greedy 
type for Steiner trees in the literature. Most of 
them have a good performance in computer ex- 
periments, without support from theoretical anal- 
ysis. The approach given in this work may apply 
to them. 


Open Problems 


It is still open whether computing the 3-restricted 
Steiner minimum tree is NP-hard or not. For 
$k \ge 4$, it is known that computing the k-restricted 
Steiner minimum tree is NP-hard. 
Problem Definition 


This problem deals with packing a maximum 
reward set of items into a knapsack of given 
capacity, when the item sizes are random. The 
input is a collection of n items, where each item 
$i \in [n] := \{1, \ldots, n\}$ has reward $r_i \ge 0$ and 
size $S_i \ge 0$, and a knapsack capacity $B \ge 0$. In 
the stochastic knapsack problem, all rewards are 
deterministic but the sizes are random. The random 
variables $S_i$ are independent with known, 
arbitrary distributions. The actual size of an item 
is known only when it is placed into the knapsack. 
The objective is to add items sequentially (one 
by one) into the knapsack so as to maximize 
the expected reward of the items that fit into the 
knapsack. As usual, a subset T of items is said to 
fit into the knapsack if the total size $\sum_{i \in T} S_i$ is 
at most the knapsack capacity B. 

A feasible solution (or policy) to the stochastic 
knapsack problem is represented by a decision 
tree. Nodes in this decision tree denote the current 
“state” of the solution (i.e., previously added 
items and the residual knapsack capacity) as well 
as the new item to place into the knapsack at this 
state. Branches in the decision tree denote the 
random size instantiations of items placed into 
the knapsack. Such solutions are called adaptive 
policies, to emphasize the fact that the items 
being placed may depend on previously observed 
outcomes. More formally, an adaptive policy is 
given by a mapping $\pi : 2^{[n]} \times [0, B] \to [n]$, where 
$\pi(T, C)$ denotes the next item to place into the 
knapsack when some subset $T \subseteq [n]$ of items has 
already been added, and $C = B - \sum_{i \in T} S_i$ is 
the residual knapsack capacity. The policy ends 
when the knapsack overflows (i.e., the total size 
of items added exceeds the knapsack capacity); 
we use the convention that no reward is obtained 
from the last overflowing item. 

Notice that an arbitrary adaptive policy may 
require exponential space to even store. This 
motivates a special class of solutions, called non- 
adaptive policies. A nonadaptive policy is just 
specified by a fixed ordering of the n items, 
and the solution adds items into the knapsack 
in that order (irrespective of the actual size in- 
stantiations) until the knapsack overflows. Again, 
there is no reward obtained from the last over- 
flowing item. While it may be easier to obtain a 
good nonadaptive policy, the obvious drawback 
is that nonadaptive policies may perform much 
worse than adaptive policies. The benefit of be- 
ing adaptive is quantified by a measure called 
the adaptivity gap, which is the maximum ratio 
(over all instances) of the expected reward of an 
optimal adaptive policy to the expected reward of 
an optimal nonadaptive policy. 

In both the adaptive and nonadaptive settings, 
the stochastic knapsack problem is at least NP- 
hard, since it generalizes the deterministic knap- 
sack problem. Moreover, certain questions re- 
garding adaptive policies are PSPACE-hard [4]. 
Notation We assume that the item size 
distributions are given explicitly. For any item 
$i \in [n]$, define its effective reward $w_i = r_i \cdot \Pr[S_i \le B]$ 
and its mean truncated size $\mu_i = E[\min\{S_i, B\}]$. 
Note that the expected reward obtained by 
placing the single item i into the knapsack is 
exactly $w_i$. 


Key Results 


Dean, Goemans, and Vondrak introduced the 
stochastic knapsack problem and the notion of 
adaptivity gaps. They proved the following. 


Theorem 1 ([4]) There is a polynomial time 
algorithm for the stochastic knapsack problem that 
computes a nonadaptive policy having expected 
reward at least 1/4 that of an optimal adaptive 
policy. 


As a consequence, the adaptivity gap of 
the stochastic knapsack problem is also upper 
bounded by four. Dean, Goemans, and Vondrák 
[4] also showed an instance of stochastic 
knapsack that lower bounds the adaptivity gap 
by 5/4. 

The algorithm in Theorem 1 uses a natural 
greedy approach. It outputs the better of the 
following two nonadaptive policies: 


• Place the single item $i^* = \arg\max_{i \in [n]} w_i$. 
• Place items in nonincreasing order of $w_i / \mu_i$. 
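
The two candidate policies can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that each size distribution is given explicitly as a list of `(size, probability)` pairs with small discrete support, and the function names are hypothetical. The expected reward of a fixed ordering is computed exactly by tracking the distribution of the space used so far.

```python
def effective_reward(reward, dist, capacity):
    """w_i = r_i * Pr[S_i <= B]; dist is a list of (size, probability) pairs."""
    return reward * sum(p for s, p in dist if s <= capacity)


def mean_truncated_size(dist, capacity):
    """mu_i = E[min(S_i, B)]."""
    return sum(p * min(s, capacity) for s, p in dist)


def expected_reward(order, rewards, dists, capacity):
    """Exact expected reward of inserting items in the given fixed order."""
    reached = {0: 1.0}  # space used so far -> probability (no overflow yet)
    total = 0.0
    for i in order:
        nxt = {}
        for used, p in reached.items():
            for s, q in dists[i]:
                if used + s <= capacity:  # item i still fits
                    nxt[used + s] = nxt.get(used + s, 0.0) + p * q
        total += rewards[i] * sum(nxt.values())  # reward only when i fits
        reached = nxt  # mass that overflowed is dropped: the policy ends
    return total


def greedy_nonadaptive(rewards, dists, capacity):
    """Return the better of the single-item policy and the density ordering."""
    n = len(rewards)
    w = [effective_reward(rewards[i], dists[i], capacity) for i in range(n)]
    mu = [mean_truncated_size(dists[i], capacity) for i in range(n)]
    best_single = [max(range(n), key=lambda i: w[i])]
    # Nonincreasing w_i / mu_i; items with mu_i = 0 are placed first.
    by_density = sorted(
        range(n),
        key=lambda i: (mu[i] > 0, -w[i] / mu[i] if mu[i] > 0 else 0.0),
    )
    return max(
        [best_single, by_density],
        key=lambda order: expected_reward(order, rewards, dists, capacity),
    )
```

The exact evaluation may take time exponential in n when many distinct partial sums arise; sampling would be a natural substitute in that case.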


In terms of adaptive policies, Bhalgat, Goel, 
and Khanna proved the following. 


Theorem 2 ([2, 3]) For any constant $\epsilon > 0$, 
there is a polynomial time algorithm for the 
stochastic knapsack problem that computes an 
adaptive policy having expected reward at least 
$\frac{1}{2+\epsilon}$ that of an optimal adaptive policy. 


The algorithm in Theorem 2 relies on an intricate 
transformation of general size distributions to 
certain canonical distributions and an algorithm 
for computing a near-optimal adaptive policy 
under canonical size distributions. 


Extensions 


Several variants of the stochastic knapsack prob- 
lem have been studied, and good algorithms have 
been obtained for them. 


Correlated Stochastic Knapsack 

This is a generalization of the stochastic knap- 
sack problem, where each item’s reward is also 
random and possibly correlated with its size. The 
distributions across items are still independent: 
so the correlations are only between the size and 
reward of a single item. Gupta, Krishnaswamy, 
Molinaro, and Ravi [6] gave an algorithm for 
this problem that computes a nonadaptive policy 
having expected reward within factor 8 of the 
optimal adaptive policy. Recently, Ma [8] gave an 
algorithm that for any constant $\epsilon > 0$ computes 
an adaptive policy having expected reward within 
factor $2 + \epsilon$ of the optimal adaptive policy; this 
algorithm requires item sizes and the capacity B 
to be specified in unary. 


Budgeted Multi-armed Bandit 

The input to this problem consists of a bound 
B and n “arms” (each arm is a Markov chain 
with rewards at its states and a specified starting 
state). A feasible policy consists of B steps. In 
each step, the policy can select one arm $i \in [n]$: 
upon selecting arm i, it gets the reward at the 
current state of arm i and the arm transitions 
to its next state according to its Markov chain. 
The objective is to maximize the expected total 
reward over B steps of the policy. Again, we 
are interested in adaptive policies, whose actions 
may depend on past outcomes. Guha and 
Munagala [5] introduced this problem and gave 
a $(2 + \epsilon)$-approximation algorithm, under the 
assumption that the rewards of each arm satisfy 
a “Martingale” condition (which is natural in 
many settings). Gupta, Krishnaswamy, Molinaro, 
and Ravi [6] gave the first constant-factor ap- 
proximation algorithm for this problem without 
the Martingale reward assumption. The constant 
factor in the latter result was improved to 6.75 by 
Ma [8]. 


Stochastic Orienteering 

This problem is defined on a finite metric space 
(V, d) with vertex set V and distance function 
$d : V \times V \to \mathbb{R}_+$ that satisfies (i) symmetry 
$d(u, v) = d(v, u)$ for all $u, v \in V$ and (ii) triangle 
inequality $d(u, w) \le d(u, v) + d(v, w)$ for all 
$u, v, w \in V$. The distances between vertices denote 
travel times. Each vertex $i \in V$ corresponds 
to a job having deterministic reward $r_i \ge 0$ and 
random processing time $S_i \ge 0$. The random 
variables $S_i$ are independent with known, arbitrary 
distributions. Given a start-vertex $p \in V$ and 
bound B, the goal is to compute a policy, which 
describes a (possibly adaptive) path originating 
from p that visits vertices and runs the respective 
jobs. The actual processing time of a job is known 
only when it completes. The policy ends when 
the total time (for travel plus processing) exceeds 
B. The objective is to maximize the expected 
total reward; there is no reward obtained from 
a partially completed job (which may occur at 
the end of the policy). As before, an optimal 
policy may be adaptive and choose the next job 
to run based on previously observed outcomes. 
Gupta, Krishnaswamy, Nagarajan, and Ravi [7] 
gave an $O(\log \log B)$-approximation algorithm 
for the stochastic orienteering problem; this result 
requires the bound B, distances, and processing 
times to be integer valued. As a corollary, 
[7] also upper bounded the adaptivity gap 
by $O(\log \log B)$. Recently, Bansal and Nagarajan 
[1] gave an $\Omega(\sqrt{\log \log B})$ lower bound on 
the adaptivity gap. 


Applications 


The stochastic knapsack problem and its variants 
model various applications in advertising, logistics, 
medical diagnosis, and robotics. 
Open Problems 


It is not known if the stochastic knapsack prob- 
lem is any harder to approximate than the usual 
(deterministic) knapsack problem. In particular, 
is there a PTAS for stochastic knapsack? Deter- 
mining a tight bound on its adaptivity gap is also 
an interesting open question. 
Problem Definition 


Scheduling is concerned with the allocation of 
scarce resources (such as machines or servers) to 
competing activities (such as jobs or customers) 
over time. The distinguishing feature of a 
stochastic scheduling problem is that some of the 
relevant data are modeled as random variables, 
whose distributions are known, but whose 
actual realizations are not. Stochastic scheduling 
problems inherit several characteristics of their 
deterministic counterparts. In particular, there is 
a virtually unlimited number of problem types 
depending on the machine environment (single 
machine, parallel machines, job shops, flow 
shops), processing characteristics (preemptive 
versus nonpreemptive, batch scheduling versus 
allowing jobs to arrive “over time,” due dates, 
deadlines), and objectives (makespan, weighted 
completion time, weighted flow time, weighted 
tardiness). Furthermore, stochastic scheduling 
models have some new, interesting features (or 
difficulties!): 


• The scheduler may be able to make inferences 
  about the remaining processing time of a job 
  by using information about its elapsed processing 
  time; whether the scheduler is allowed 
  to make use of this information or not is a 
  question for the modeler. 

• Many scheduling algorithms make decisions 
  by comparing the processing times of jobs. If 
  jobs have deterministic processing times, this 
  poses no problems as there is only one way 
  to compare them. If the processing times are 
  random variables, comparing processing times 
  is a subtle issue. There are many ways to compare 
  pairs of random variables, and some are 
  only partial orders. Thus, any algorithm that 
  operates by comparing processing times must 
  now specify the particular ordering used to 
  compare random variables (and to determine 
  what to do if two random variables are not 
  comparable under the specified ordering). 


These considerations lead to the notion of a 
scheduling policy, which specifies how the scarce 
resources have to be allocated to the competing 
activities as a function of the state of the system at 
any point in time. The state of the system includes 
information such as prior job completions, the 
elapsed time of jobs currently in service, the 
realizations of the random release dates and due 
dates (if any), and any other information that 
can be inferred based on the history observed so 
far. A policy that is allowed to make use of all 
this information is said to be dynamic, whereas 
a policy that is not allowed to use any state 
information is static. 

Given any policy, the objective function 
for a stochastic scheduling model operating 
under that policy is typically a random variable. 
Thus, comparison of two policies entails the 
comparison of the associated random variables, 
so the sense in which these random variables 
are compared must be specified. A common 
approach is to find a solution that optimizes the 
expected value of the objective function (which 
has the advantage that it is a total ordering); less 
commonly, other orderings such as the stochastic 
ordering or the likelihood ratio ordering are 
used. 


Key Results 


Consider a single machine that processes n 
jobs, with the (random) processing time of job 
i given by a distribution $F_i(\cdot)$ whose mean is 
$p_i$. The Weighted Shortest Expected Processing 
Time first (WSEPT) rule sequences the jobs in 
decreasing order of $w_i / p_i$, where $w_i$ denotes the 
weight of job i. Smith [13] proved that 
the WSEPT rule minimizes the sum of weighted 
completion times when the processing times are 
deterministic. Rothkopf [11] generalized this 
result and proved the following: 


Theorem 1 The WSEPT rule minimizes the ex- 
pected sum of the weighted completion times in 
the class of all nonpreemptive dynamic policies 
(and hence also in the class of all nonpreemptive 
static policies). 
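
The WSEPT rule amounts to a single sort. The Python sketch below is an illustration only, with hypothetical function names, and it assumes that all mean processing times are positive. For a fixed nonpreemptive sequence on one machine, linearity of expectation gives the expected completion time of a job as the sum of the means of the jobs sequenced up to and including it, so the objective can be evaluated from the means alone.

```python
def wsept_order(weights, mean_times):
    """Job indices in nonincreasing order of w_i / p_i (ties keep input order)."""
    return sorted(
        range(len(weights)),
        key=lambda i: -weights[i] / mean_times[i],
    )


def expected_weighted_completion(order, weights, mean_times):
    """E[sum_i w_i C_i] for a fixed nonpreemptive sequence on one machine."""
    elapsed = 0.0
    total = 0.0
    for i in order:
        elapsed += mean_times[i]  # E[C_i]: means of all jobs up to job i
        total += weights[i] * elapsed
    return total
```

For weights (1, 3, 2) and mean processing times (2, 1, 4), the rule sequences the second job first, and the resulting expected objective is 20, compared with 25 for the input order.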


If preemption is allowed, the WSEPT rule is 
not optimal. Nevertheless, Sevcik [12] showed 
how to assign an “index” to each job at each 
point in time such that scheduling a job with 
the largest index at each point in time is op- 
timal. Such policies are called index policies 
and have been investigated extensively because 
they are (relatively) simple to implement and 
analyze. Often, the optimality of index policies 
can be proved under some assumptions on the 
processing time distributions. For instance, We- 
ber, Varaiya, and Walrand [14] proved the follow- 
ing result for scheduling n jobs on m identical 
parallel machines: 


Theorem 2 The Shortest Expected Processing Time 
first (SEPT) rule minimizes the expected sum of 
completion times in the class of all 
nonpreemptive dynamic policies, if the processing 
time distributions of the jobs are stochastically 
ordered. 


For the same problem but with the makespan 
objective, Bruno, Downey, and Frederickson [3] 
proved the optimality of the Longest Expected 
Processing Time first rule provided all the jobs 
have exponentially distributed processing times. 

One of the most significant achievements in 
stochastic scheduling is the proof of optimality of 
index policies for the multiarmed bandit problem 
and its many variants, due originally to Gittins 
and Jones [5, 6]. In an instance of the bandit 
problem, there are N projects, each of which is 
in any one of a possibly finite number of states. 
At each (discrete) time, any one of the projects 
can be attempted, resulting in a random reward; 
the attempted project undergoes a (Markovian) 
state transition, whereas the other projects remain 
frozen and do not change state. The goal of the 
decision maker is to determine an optimal way to 
attempt the projects so as to maximize the total 
discounted reward. Of course one can solve this 
problem as a large, stochastic dynamic program, 
but such an approach does not reveal any struc- 
ture and is moreover computationally impractical 
except for very small problems. (Also, if the state 
space of any project is countable or infinite, it 
is not clear how one can solve the resulting DP 
exactly!) The remarkable result of Gittins and 
Jones [6] is the optimality of index policies: to 
each state of each project, one can associate an 
index so that attempting a project with the largest 
index at any point in time is optimal. The original 
proof of Gittins and Jones [6] has subsequently 
been simplified by many authors; moreover, sev- 
eral alternative proofs based on different tech- 
niques have appeared, leading to a much better 
understanding of the class of problems for which 
index policies are optimal [2,4,5, 10, 17]. 

While index policies are easy to implement 
and analyze, they are not optimal in many 
problems. It is therefore natural to investigate the 
gap between an optimal index policy (or a natural 
heuristic) and an optimal policy. For example, the 
WSEPT rule is a natural heuristic for the problem 
of scheduling jobs on identical parallel machines 
to minimize the expected sum of the weighted 
completion times. However, the WSEPT rule 
is not necessarily optimal. Weiss [16] showed 
that, under mild and reasonable assumptions, the 
expected number of times that the WSEPT rule 
differs from the optimal decision is bounded 
above by a constant, independent of the number 
of jobs. Thus, the WSEPT rule is asymptotically 
optimal. As another example of a similar result, 
Whittle [18] generalized the multiarmed bandit 
model to allow for state transitions in projects that 
are not activated, giving rise to the “restless ban- 
dit” model. For this model, Whittle [18] proposed 
an index policy whose asymptotic optimality was 
established by Weber and Weiss [15]. 

A number of stochastic scheduling models 
allow for jobs to arrive over time according to 
a stochastic process. A commonly used model 
in this setting is that of a multiclass queueing 
network. Multiclass queueing networks serve as 
useful models for problems in which several 
types of activities compete for a limited number 
of shared resources. They generalize determinis- 
tic job-shop problems in two ways: jobs arrive 
over time and each job has a random processing 
time at each stage. The optimal control prob- 
lem in a multiclass queueing network is to find 
an optimal allocation of the available resources 
to activities over time. Not surprisingly, index 
policies are optimal only for restricted versions 
of this general model. An important example is 
scheduling a multiclass single-server system with 
feedback: there are N types of jobs; type i jobs 
arrive according to a Poisson process with rate 
$\lambda_i$, require service according to a service-time 
distribution $F_i(\cdot)$ with mean processing time $s_i$, 
and incur holding costs at rate $c_i$ per unit time. A 
type i job after undergoing processing becomes a 
type j job with probability $p_{ij}$ or exits the system 
with probability $1 - \sum_j p_{ij}$. The 
objective is to find a scheduling policy that minimizes 
the expected holding cost rate in steady 
state. Klimov [9] proved the optimality of index 
policies for this model, as well as for the objective 
in which the total discounted holding cost is to 
be minimized. While the optimality result does 
not hold when there are many parallel machines, 
Glazebrook and Niño-Mora [7] showed that this 
rule is asymptotically optimal. For more general 
models, the prevailing approach is to use ap- 
proximations such as fluid approximations [1] or 
diffusion approximations [8]. 


Applications 


Stochastic scheduling models are applicable in 
many settings, most prominently in computer and 
communication networks, call centers, logistics 
and transportation, and manufacturing systems 
[4, 10]. 
Problem Definition 


Given a pattern string $P = p_1 p_2 \ldots p_m$ and a 
text string $T = t_1 t_2 \ldots t_n$, both being sequences 
over an alphabet $\Sigma$ of size $\sigma$, the exact 
string-matching (ESM) problem is to find one or, more 
generally, all the text positions where P occurs in 
T, that is, compute the set 
$\{\, j \mid 1 \le j \le n - m + 1 \text{ and } P = t_j t_{j+1} \ldots t_{j+m-1} \,\}$. 

Both worst- and average-case complexities are 
considered. For the latter one assumes that pattern 
and text are randomly generated by choosing 
each character uniformly and independently from 
$\Sigma$. For simplicity and practicality the assumption 
m = o(n) is made in this entry. 


Key Results 


Most algorithms that solve the ESM problem 
proceed in two steps: a preprocessing phase of the 
pattern P followed by a searching phase over the 
text T. The preprocessing phase serves to collect 
information on the pattern in order to speed up 
the searching phase. 

The searching phase of string-matching algo- 
rithms works as follows: they first align the left 
ends of the pattern and the text, then compare 
the aligned symbols of the text and the pattern — 
this specific work is called an attempt or a scan — 
and after a whole match of the pattern or after 
a mismatch, they shift the pattern to the right. 
They repeat the same procedure again until the 
right end of the pattern goes beyond the right end 
of the text. The scanning part can be viewed as 
operating on the text through a window, whose 
size is most often the length of the pattern. This 
processing manner is called the scan and shift 
mechanism. Different scanning strategies of the 
window lead to algorithms having specific prop- 
erties and advantages. 

The brute force algorithm for the ESM problem 
consists in checking if P occurs at each 
position j on T, with $1 \le j \le n - m + 1$. It 
does not need any preprocessing phase. It runs 
in quadratic time O(mn) with constant extra 
space and performs O(n) character comparisons 
on average. This is to be compared with the 
following bounds. 
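
A minimal Python sketch of the brute force algorithm is given below. The function name is illustrative, and positions are reported with 0-based indexing, so they are one less than the positions j used in the text.

```python
def brute_force_search(pattern, text):
    """Return all 0-based positions at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    occurrences = []
    for j in range(n - m + 1):  # every candidate window position
        i = 0
        # Scan the window from left to right until a mismatch or a full match.
        while i < m and pattern[i] == text[j + i]:
            i += 1
        if i == m:
            occurrences.append(j)
    return occurrences
```

No preprocessing is performed and only the two loop counters are stored, which matches the constant extra space stated above.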


Theorem 1 (Cole et al. [6]) The minimum number 
of character comparisons to solve the ESM 
problem in the worst case is at least $n + \frac{9}{4m}(n - m)$, 
and there exists an algorithm performing at most 
$n + \frac{8}{3(m+1)}(n - m)$ character comparisons in the 
worst case. 


Theorem 2 (Yao [26]) The ESM problem needs 
$\Omega\left(\frac{\log_\sigma m}{m} \times n\right)$ time in expectation. 


Online Text Parsing 

The first linear ESM algorithm appeared in the 
1970s. The preprocessing phase consists in computing 
the periods of the pattern prefixes, or 
equivalently the length of the longest border for 
all the prefixes of the pattern. A border of a 
string is both a prefix and a suffix of it distinct 
from the string itself. Let next[i] be the length 
of the longest border of $p_1 \ldots p_{i-1}$. Consider an 
attempt at position j, when the pattern $p_1 \ldots p_m$ 
is aligned with the segment $t_j \ldots t_{j+m-1}$ of the 
text. Assume that the first mismatch (during a 
left to right scan) occurs between symbols $p_i$ 
and $t_{j+i-1}$ for $1 \le i \le m$. Then, 
$p_1 \ldots p_{i-1} = t_j \ldots t_{j+i-2} = u$ and $a = p_i \ne t_{j+i-1} = b$. 
A prefix v of the pattern may match a suffix of 
the portion u of the text. By the definition of 
table next, a shift that aligns $p_{\mathit{next}[i]+1}$ with $t_{j+i-1}$ 
cannot miss any occurrence of P in T, and thus 
backtracking in the text is not necessary. There 
exist two variants [18, 19], depending on whether 
$p_{\mathit{next}[i]+1}$ has to be different from $p_i$ or not. The 
second is slightly more efficient. 


Theorem 3 (Knuth, Morris, and Pratt [18]) 
The text searching can be done in time O(n) and 
space O(m). Preprocessing the pattern can be 
done in time O(m). 
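
The border table and the search without backtracking can be sketched in Python as follows. This is a hedged illustration of the simpler variant, in which the character following the border is not required to differ from the mismatched one. The names `border_table` and `border_search` are hypothetical. Indexing is 0-based: `border[i]` is the length of the longest border of the first i characters of the pattern, so it corresponds to next[i + 1] in the notation of the text.

```python
def border_table(pattern):
    """border[i] = length of the longest border of pattern[:i]; border[0] = -1."""
    m = len(pattern)
    border = [0] * (m + 1)
    border[0] = -1
    k = -1
    for i in range(m):
        # Fall back through shorter borders until one can be extended.
        while k >= 0 and pattern[k] != pattern[i]:
            k = border[k]
        k += 1
        border[i + 1] = k
    return border


def border_search(pattern, text):
    """Return all 0-based occurrences; the text is read once, left to right."""
    m = len(pattern)
    if m == 0:
        return []
    border = border_table(pattern)
    occurrences = []
    k = 0  # number of pattern characters currently matched
    for pos, c in enumerate(text):
        # After a full match or a mismatch, shift the pattern using the table.
        while k >= 0 and (k == m or pattern[k] != c):
            k = border[k]
        k += 1
        if k == m:
            occurrences.append(pos - m + 1)
    return occurrences
```

Each text character is examined without ever moving backwards in the text, and the total number of fallback steps is bounded by the number of forward steps, which gives the O(n) search time of Theorem 3.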


The search can also be realized using an im- 
plementation with successor by default of the 
deterministic automaton D(P) recognizing the 
language $\Sigma^* P$. The size of the implementation 
is O(m) independent of the alphabet size, due 
to the fact that D(P) possesses m + 1 states, 
m forward arcs, and at most m backward arcs. 
Using the automaton for searching a text leads to 
an algorithm having an efficient delay (maximum 
time for processing a character of the text). 


Theorem 4 (Hancart [15]) Searching for 
the pattern P can be done with a delay of 
$O(\min\{\sigma, \log_2 m\})$ letter comparisons. 


Note that for most algorithms the pattern pre- 
processing is not necessarily done before the text 
parsing, as it can be performed on the fly during 
the parsing. 


Algorithms Sublinear on the Average 

The Boyer-Moore algorithm [3] is among the 
most efficient ESM algorithms. A simplified ver- 
sion of it, or the entire algorithm, is often imple- 
mented in text editors for the search and substi- 
tute commands. 

The algorithm scans the characters of the window
from right to left beginning with its rightmost
symbol. In case of a mismatch (or a complete
match of the pattern), it uses two precomputed
functions to shift the pattern to the right.
These two shift functions are called the bad-character
shift and the good-suffix shift. They
are based on the following observations. Assume
that a mismatch occurs between character p_i =
a of the pattern and character t_{i+j} = b of
the text during an attempt at position j. Then,
p_{i+1} ... p_m = t_{i+j+1} ... t_{j+m} = u and p_i ≠
t_{i+j}. The good-suffix shift consists in aligning
the segment t_{i+j+1} ... t_{j+m} with its rightmost
occurrence in P that is preceded by a character
different from p_i. Another variant called the
best-suffix shift consists in aligning the segment
t_{i+j} ... t_{j+m} with its rightmost occurrence in
P. Both variants can be computed in time and
space O(m) independent of the alphabet size. If
there exists no such segment, the shift consists in
aligning the longest suffix v of t_{i+j+1} ... t_{j+m}
with a matching prefix of P. The bad-character
shift consists in aligning the text character t_{i+j}
with its rightmost occurrence in p_1 ... p_{m−1}. If
t_{i+j} does not appear in the pattern, no occurrence
of P in T can overlap the symbol t_{i+j}, and the
left end of the pattern is aligned with the character
at position i + j + 1. The search can then be done
in O(n/m) in the best case.


Theorem 5 (Cole [5]) During the search for a 
nonperiodic pattern P of length m (such that the 
length of the longest border of P is less than 
m/2) in a text T of length n, the Boyer-Moore 
algorithm performs at most 3n comparisons be- 
tween letters of P and of T. 


In practice, when scanning the window 
from right to left during an attempt, it is 
sometimes more efficient to only use the bad- 
character shift. This was first done by the 
Horspool algorithm [16]. Other practical efficient 
algorithms are the Quick Search by Sunday [24] 
and the Tuned Boyer-Moore by Hume and 
Sunday [17]. 
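
As an illustration of a search that relies on the bad-character shift alone, here is a minimal sketch in the spirit of the Horspool algorithm. It uses 0-based positions and shifts according to the text character aligned with the last pattern position. The function name horspool_search is hypothetical, and the code is a simplified sketch rather than the tuned implementation of [16].

```python
def horspool_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # shift[c] = distance from the rightmost occurrence of c in
    # pattern[:m-1] to the last pattern position; characters that do
    # not occur there get the full shift m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    occurrences = []
    j = 0  # current window is text[j:j+m]
    while j <= n - m:
        i = m - 1
        # Scan the window from right to left.
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1
        if i < 0:
            occurrences.append(j)
        # Shift according to the text character under the window's last position.
        j += shift.get(text[j + m - 1], m)
    return occurrences
```

When the last character of the window does not occur in the pattern, the window jumps by m positions, which is how such algorithms can inspect only about n/m text characters in the best case.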

Yao’s bound can be reached using an indexing 
structure giving access to all the factors of the 
reverse pattern. This is done by the Reverse 
Factor algorithm also called BDM (for Backward 
Dawg Matching). 


Theorem 6 (Crochemore et al. [9]) The
search can be done in optimal expected time
O((log_σ m / m) × n) using the suffix automaton or the
suffix tree of the reverse pattern.


A factor oracle can be used instead of an index 
structure. A factor oracle is an automaton simpler 
than the suffix automaton that may recognize 
some additional strings of length smaller than m. 
The only string of length m accepted by the factor 
oracle of a string w of length m is w itself. Then it 
can be used for solving the ESM problem. This is 
done by the Backward Oracle Matching (BOM) 
algorithm of Allauzen, Crochemore, and Raffinot 
[1]. Its behavior in practice is similar to the one 
of the BDM algorithm. 


Time-Space Optimal Algorithms 

Algorithms of this type run in linear time (for 
both preprocessing and searching) and need only 
constant space in addition to the inputs. 


Theorem 7 (Galil and Seiferas [13]) The
search can be done optimally in time O(n) and
constant extra space.


After Galil and Seiferas’ first solution, other 
solutions are by Crochemore-Perrin [8] and Ryt- 
ter [22]. These algorithms rely on a partition of 
the pattern in two parts; they first search for the 
right part of the pattern from left to right, and 
then, if no mismatch occurs, they search for the 
left part. The partition can be the perfect factor- 
ization [13], the critical factorization [8], or based 
on the lexicographically maximum suffix of the 
pattern [22]. Another solution by Crochemore [7] 
is a variant of KMP [18]: it computes lower 
bounds of pattern prefixes periods on the fly and 
requires no preprocessing. 


Bit-Parallel Solution 
It is possible to use the bit-parallelism technique 
for ESM. 


Theorem 8 (Baeza-Yates and Gonnet [2]; Wu
and Manber [25]) If the length m of the string
P is smaller than the number of bits of a machine
word, the preprocessing phase can be done in
time and space O(σ) and the searching phase in
time O(n).

It is even possible to use this bit-parallelism
technique to simulate the BDM algorithm. This
is realized by the BNDM (Backward Nondeterministic
Dawg Matching) algorithm [20].
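
The bit-parallel search of Theorem 8 can be sketched in the Shift-Or style as follows. Bit i of the state is 0 exactly when the pattern prefix of length i + 1 matches the text ending at the current position. Python integers have arbitrary precision, so the machine-word restriction of the theorem is only modelled here by masking to m bits. The function name shift_or_search is hypothetical.

```python
def shift_or_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)
    accept = 1 << (m - 1)
    state = all_ones
    occurrences = []
    for pos, c in enumerate(text):
        # One shift and one OR per text character.
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if state & accept == 0:
            occurrences.append(pos - m + 1)
    return occurrences
```

In this sketch the mask table is a dictionary holding only the characters of the pattern; an array indexed by the alphabet, as in the O(σ) preprocessing bound, would serve the same purpose.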

There exists another method that uses the 
bit-parallelism technique that is optimal on the 
average. It considers sparse q-grams and thus
avoids scanning many text positions. It is due to
Fredriksson and Grabowski [12].


Applications 


The methods described here apply to
the processing of natural language, of genetic
and musical sequences, to security problems
related to data flows such as virus detection, and to the
management of textual databases, to name
only some immediate applications.


Open Problems 


There remain only a few open problems on this 
question. It is still unknown if it is possible to 
design an average optimal time constant space 
string-matching algorithm. The exact size of the 
Boyer-Moore automaton is still unknown [3]. The 
Boyer-Moore automaton was first introduced by 
Knuth [18]. Its states encode all the possible 
situations when searching the pattern with the 
Boyer-Moore algorithm and remember every text 
character already matched in the window. 


Experimental Results 


The book of G. Navarro and M. Raffinot [21] is a 
good introduction and presents an experimental 
map of ESM algorithms for different alphabet 
sizes and pattern lengths. Basically, the Shift- 
Or algorithm is efficient for small alphabets and 
short patterns, the BNDM algorithm is efficient 
for medium-sized alphabets and medium-length 
patterns, the Horspool algorithm is efficient for 
large alphabets, and the BOM algorithm is efficient
for long patterns. The article of S. Faro
and T. Lecroq [11] updates the experimental map
with the most recent results.


URLs to Code and Data Sets 


The site monge.univ-mlv.fr/~lecroq/string 
presents a large number of ESM algorithms 
(see also [4]). Each algorithm is implemented 
in C code and a Java applet is given. The site 
www.dmi.unict.it/~faro/smart presents SMART, 
a string-matching research tool, which contains 
the C code of a great number of exact string- 
matching algorithms and some corpora (natural 
language, musical, biological, and random texts). 
Users can easily plug in their own algorithm to
compare it against some selected algorithms.


Problem Definition


The problem is to sort a set of strings into lexicographical
order. More formally: A string over
an alphabet Σ is a finite sequence x_1x_2x_3...x_k
where x_i ∈ Σ for i = 1,...,k. The x_i's
are called the characters of the string, and k
is the length of the string. If the alphabet Σ is
ordered, the lexicographical order on the set of
strings over Σ is defined by declaring a string
x = x_1x_2x_3...x_k smaller than a string y =
y_1y_2y_3...y_l if either there exists a j ≥ 1 such
that x_i = y_i for 1 ≤ i < j and x_j < y_j or
if k < l and x_i = y_i for 1 ≤ i ≤ k. Given a
set S of strings over some ordered alphabet, the
problem is to sort S according to lexicographical
order.

The input to the string sorting problem con- 
sists of an array of pointers to the strings to be 
sorted. The output is a permutation of the array of 
pointers, such that traversing the array will point 
to the strings in nondecreasing lexicographical 
order. 

The complexity of string sorting depends on
the alphabet as well as the machine model. The
main solution [15] described in this entry works
for alphabets of unbounded size (i.e., comparisons
are the only operations on characters
of Σ) and can be implemented on a pointer
machine. See below for more information on the
asymptotic complexity of string sorting in various
settings.


Key Results 


This section is structured as follows: first, the 
key result appearing in the title of this entry [15] 
is described; then an overview of other relevant 
results in the area of string sorting is given. 

The string sorting algorithm proposed by 
Bentley and Sedgewick in 1997 [15] is called 
three-way radix quicksort [5]. It works for 
unbounded alphabets, for which it achieves 
optimal performance. 


Theorem 1 The algorithm three-way radix
quicksort sorts K strings of total length N in
time O(K log K + N).


This time complexity is optimal, which 
follows by considering strings of the form 
bbb...bx, where all xs are different: Sorting 
the strings can be no faster than sorting the 
xs, and all bs must be read (else an adversary 
could change one unread b to a or c, making 
the returned order incorrect). A more precise
version of the bounds above (upper as well as
lower) is K log K + D, where D is the sum of
the lengths of the distinguishing prefixes of the
strings. The distinguishing prefix d_s of a string
s in a set S is the shortest prefix of s which
is not a prefix of another string in S (or is s
itself, if s is a prefix of another string). Clearly,
K ≤ D ≤ N.
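
To make the quantity D concrete, the following sketch computes the sum of the lengths of the distinguishing prefixes for a set of distinct strings. It relies on the fact that, once the strings are sorted, the longest common prefix of a string with any other string is attained at one of its two neighbors. The function names are hypothetical, and the use of Python's built-in sort is purely for illustration (it is not part of the algorithms discussed in this entry).

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def distinguishing_prefix_total(strings):
    """Return D, the sum of the distinguishing prefix lengths (duplicates are dropped)."""
    s = sorted(set(strings))
    total = 0
    for i, cur in enumerate(s):
        best = 0
        if i > 0:
            best = max(best, lcp(cur, s[i - 1]))
        if i + 1 < len(s):
            best = max(best, lcp(cur, s[i + 1]))
        # One character beyond the longest shared prefix, or the whole
        # string if it is a prefix of another string.
        total += min(len(cur), best + 1)
    return total
```

For the set {abc, abd, x}, the distinguishing prefixes are abc, abd, and x, so D = 7 = N; for {ab, abc} they are ab and abc, so D = 5.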

The three-way radix quicksort of Bentley and 
Sedgewick is not the first algorithm to achieve 
this complexity; however, it is a very simple 
and elegant way of doing it. As demonstrated in 
[3, 15], it is also very fast in practice. Although 
various elements of the algorithm had been noted 
earlier, their practical usefulness for string sorting 
was overlooked until the work in [15]. 

Three-way radix quicksort is shown in pseudocode
in Fig. 1 (adapted from [5]), where S is a
list of strings to be sorted and d is an integer. To
sort S, an initial call SORT(S, 1) is made. The
value s_d denotes the dth character of the string
s, and + denotes concatenation. The presentation
in Fig. 1 assumes that all strings end in a special
end-of-string (EOS) character (such as the null
character in C). In an actual implementation, S
will be an array of pointers to strings, and the sort
will be in-place (using an in-place method from
standard quicksort for three-way partitioning of
the array into segments holding S<, S=, and S>),
rendering concatenation superfluous.

SORT(S, d)
    IF |S| ≤ 1:
        RETURN
    Choose a partitioning character v ∈ {s_d | s ∈ S}
    S< = {s ∈ S | s_d < v}
    S= = {s ∈ S | s_d = v}
    S> = {s ∈ S | s_d > v}
    SORT(S<, d)
    IF v ≠ EOS:
        SORT(S=, d + 1)
    SORT(S>, d)
    S = S< + S= + S>

String Sorting, Fig. 1 Three-way radix quicksort (assuming
each string ends in a special EOS character)
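
A runnable in-place version of the pseudocode of Fig. 1 might look as follows. This is a sketch under several assumptions: positions are 0-based (so the initial call uses d = 0 rather than 1), the EOS character is modelled by the value −1 returned past the end of a string, and the partitioning character is taken from a randomly chosen string. The names char_at and three_way_radix_quicksort are hypothetical.

```python
import random


def char_at(s, d):
    """Code of the character of s at 0-based position d, or -1 as the EOS marker."""
    return ord(s[d]) if d < len(s) else -1


def three_way_radix_quicksort(a, lo=0, hi=None, d=0):
    """Sort the list a[lo..hi] of strings in place; all of them agree on their first d characters."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return
    v = char_at(a[random.randint(lo, hi)], d)  # partitioning character
    lt, gt, i = lo, hi, lo
    # Three-way partition: a[lo..lt-1] < v, a[lt..gt] = v, a[gt+1..hi] > v.
    while i <= gt:
        c = char_at(a[i], d)
        if c < v:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif c > v:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    three_way_radix_quicksort(a, lo, lt - 1, d)
    if v != -1:  # strings that ended at position d are already in place
        three_way_radix_quicksort(a, lt, gt, d + 1)
    three_way_radix_quicksort(a, gt + 1, hi, d)
```

The recursion depth grows with the length of the longest common prefix, so for very long shared prefixes a practical Python version would need an explicit stack; the sketch keeps the recursive form to stay close to Fig. 1.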

Correctness follows from the following invari- 
ant being maintained by the algorithm: At the 
start of a call SORT(S, d), all strings in S agree 
on the first d — 1 characters. 

Time complexity depends on how the partitioning
character v is chosen. One particular
choice is the median of all the dth characters (including
duplicates) of the strings in S. Partitioning
and median finding can be done in time O(|S|),
which is O(1) time per string partitioned. Hence,
the total running time of the algorithm is the sum
over all strings of the number of partitionings
they take part in. For each string, let a partitioning
be of type I if the string ends up in S< or S>
and of type II if it ends up in S=. For a string s,
type II can only occur |d_s| times and type I can
only occur log K times, since with the median as
partitioning character S< and S> each contain at most
half of the strings. Hence, the running time
is O(K log K + D).

As for standard quicksort, median finding
impairs the constant factors of the algorithm, and 
more practical choices of partitioning character 
include selecting a random element among all the 
dth characters of the strings in S and selecting 
the median of three elements in this set. The 
worst-case bound is lost, but the result is a fast, 
randomized algorithm. 

Note that the ternary recursion tree of three- 
way radix quicksort is equivalent to a trie over 
the input strings where each trie node is im- 
plemented by a binary search tree whose node 
elements are the child edges (in the trie) of the 
trie node. In more detail, a node in a binary tree 
contains the character of a trie edge and a pointer 
to the root of the binary tree implementing the 
corresponding trie child. The search keys in a 
binary tree are the characters in its nodes. This 
trie implementation is named ternary search trees 
in [15]. In the recursion tree of three-way radix
quicksort, an edge representing a recursive call
on S< or S> corresponds to a tree edge inside a
binary tree implementing a trie node, and an edge
representing a recursive call on S= corresponds
to a trie edge.

For the version of the algorithm where the 
partitioning character v is chosen as the median 
of all the dth characters, it is not hard to see 
that the binary trees representing the trie nodes 
become weighted trees. These are binary trees in
which each element x has an associated weight
w_x, and searches for x take O(log(W/w_x)) time, where
W = Σ_x w_x is the sum of all weights in the
binary tree. Here, the weight of a binary tree
node storing character x is the number of strings 
which in the trie reside below the corresponding 
trie edge. As shown in [13], in such a trie im- 
plementation, searching for a string P among K 
stored strings takes time O(log K+|P|), which is 
optimal for unbounded (i.e., comparison-based) 
alphabets. Hence, by the correspondence between 
the recursion trees of three-way radix quicksort 
and ternary search trees, three-way radix quick- 
sort may additionally be viewed as a construction 
algorithm for an efficient dictionary structure for 
strings.

Other key results in the area of string
sorting are now described. The classic string
sorting algorithm is radixsort, which assumes
a constant-sized alphabet. The least-significant-digit-first
variant is easy to implement and runs
in O(N + l|Σ|) time, where l is the length of
the longest string. The most-significant-digit-first
variant is more complicated to implement but
has a better running time of O(D + d|Σ|),
where D is the sum of the lengths of the
distinguishing prefixes and d is the length of the longest
distinguishing prefix. McIlroy et al. [12]
discuss in depth efficient implementations of
radixsort.
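
The least-significant-digit-first idea can be sketched as below for byte strings. This simple version pads shorter strings with an implicit EOS key and makes one stable distribution pass per character position, so it runs in O(l(K + |Σ|)) time rather than the refined bound quoted above; it is meant only to show the mechanism. The function name lsd_radix_sort and the default alphabet size of 256 are assumptions of the sketch.

```python
def lsd_radix_sort(strings, alphabet_size=256):
    """Return a sorted copy of a list of bytes objects using LSD radix sort."""
    if not strings:
        return []
    longest = max(len(s) for s in strings)
    out = list(strings)
    # Process positions from the last to the first; each pass is stable.
    for d in range(longest - 1, -1, -1):
        # Bucket 0 is the EOS key for strings shorter than d + 1.
        buckets = [[] for _ in range(alphabet_size + 1)]
        for s in out:
            key = s[d] + 1 if d < len(s) else 0
            buckets[key].append(s)
        out = [s for bucket in buckets for s in bucket]
    return out
```

Indexing a bytes object yields an integer in the range 0 to 255, which is why the sketch works on bytes rather than on text strings.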

If the alphabet consists of integers, then on 
a word-RAM the complexity of string sorting 
is essentially determined by the complexity of 
integer sorting. More precisely, the time (when
allowing randomization) for sorting strings is
O(Sort_int(K) + N), where Sort_int(K) is the time
to sort K integers [2], which currently is known
to be O(K √(log log K)) [11].

Returning to the comparison-based model, the papers
[8, 10] give generic methods for turning any
data structure over one-dimensional keys into a 
data structure over strings. Using finger search 
trees, this gives an adaptive sorting method for 
strings which uses O(N + K log(F/K)) time, 
where F is the number of inversions among the 
strings to be sorted. 

Concerning space complexity, it has been 
shown [9] that string sorting can still be 
done in O(K log K + N) time using only
O(1) space besides the strings themselves. 
However, this assumes that all strings have equal 
lengths. 

All algorithms so far are designed to work in 
internal memory, where CPU time is assumed to 
be the dominating factor. For external memory 
computation, a more relevant cost measure is 
the number of I/Os performed, as captured by 
the I/O model [1], which models a two-level 
memory hierarchy with an infinite outer memory, 
an inner memory of size M, and transfer (I/Os) 
between the two levels taking place in blocks 
of size B. For external memory, upper bounds
were first given in [4], along with matching
lower bounds in restricted I/O models. For a
comparison-based model where strings may
only be moved in blocks of size B (hence,
characters may not be moved individually),
it is shown in [4] that string sorting takes
O(N_1/B log_{M/B}(N_1/B) + K_2 log_{M/B} K_2 +
N/B) I/Os, where N_1 is the total length of
strings shorter than B characters, K_2 is the
number of strings of at least B characters, and
N is the total number of characters. This bound
is equal to the sum of the I/O costs of sorting
the characters of the short strings, sorting B
characters from each of the long strings, and
scanning all strings. In the same paper, slightly
better bounds in a model where characters may
be moved individually in internal memory are
given, as well as some upper bounds for non-comparison-based
string sorting. Further bounds
(using randomization) for non-comparison-based
string sorting have been given, with I/O bounds of
O(K/B log_{M/B}(K/M) log log_{M/B}(K/M) +
N/B) [7] and O(K/B (log_{M/B}(N/M))^2 log_2 K +
N/B) (Ferragina, personal communication).

Returning to internal memory, it may also
be the case there that memory hierarchy effects
are the determining factor for the running time
of algorithms, but now due to cache faults rather
than disk I/Os. Heuristic algorithms (i.e., algo- 
rithms without good worst-case bounds), aiming 
at minimizing cache faults for internal memory 
string sorting, have been developed. Of these, 
the burstsort line of algorithms [16] performs 
particularly well in experiments. 


Applications 


Data sets consisting partly or entirely of string 
data are very common: Most database applica- 
tions have strings as one of the data types used, 
and in some areas, such as bioinformatics, Web 
retrieval, and word processing, string data is pre- 
dominant. Additionally, strings form a general 
and fundamental data model, containing, e.g., 
integers and multidimensional data as special 
cases. Since sorting is arguably among the most
important data processing tasks in any domain,
string sorting is a general and important problem 
with wide practical applications. 


Open Problems 


As appears from the bounds discussed above, 
the asymptotic complexity of the string sorting 
problem is known for comparison-based alpha- 
bets. For integer alphabets on the word-RAM, the 
problem is almost closed in the sense that it is 
equivalent to integer sorting, for which the gap 
left between the known bounds and the trivial 
linear lower bound is small. 

In external memory, the situation is less 
settled. As noted in [4], a natural upper bound to 
hope for in a comparison-based setting is to meet 
the lower bound of O(K/B log_{M/B}(K/M) +
N/B) I/Os, which is the sorting bound for
K single characters plus the complexity of
scanning the input. The currently known upper 
bounds only get close to this when leaving 
the comparison-based setting and allowing 
randomization. 


Experimental Results 


In [15], experimental comparison of two imple- 
mentations (one simple and one tuned) of three- 
way radix quicksort with a tuned quicksort [6] 
and a tuned radixsort [12] showed the simple im- 
plementation to always outperform the quicksort 
implementation and the tuned implementation to 
be competitive with the radixsort implementa- 
tion. 

In [3], experimental comparison among ex- 
isting and new radixsort implementations (in- 
cluding the one used in [15]), as well as tuned 
quicksort and tuned three-way radix quicksort, 
was performed. This study confirms the picture 
of three-way radix quicksort as very competitive, 
always being one of the fastest algorithms, and 
arguably the most robust across various input 
distributions.


Data Sets


The data sets used in [15]: http://www.cs.princeton.edu/~rs/strings/.
The data sets used in [3]: http://dl.acm.org/citation.cfm?id=297136.


URL to Code 


Code in C from [15]: http://www.cs.princeton.edu/~rs/strings/.

Code in C from [3]: http://dl.acm.org/citation.cfm?id=297136.

Code in Java from [14]: http://www.cs.princeton.edu/~rs/Algs3.java1-4/code.txt.


Problem Definition


Let G = (V, E) be a directed graph. For an
arc (u, v) ∈ E, u is said to dominate v, and
v is said to absorb u. Vertex u is also called a
dominator of v, and vertex v is called an absorber
of u. A vertex set D ⊆ V is a dominating
set (DS) of G if every vertex in V \ D has a
dominator in D; it is an absorbing set (AS) of
G if every vertex in V \ D has an absorber in D.
A directed graph G is strongly connected if for
any ordered pair of vertices u, v ∈ V, there is a
directed path in G from u to v. The “Minimum
Strongly Connected Dominating and Absorbing
Set” problem (MSCDAS) is to find a minimum-size vertex set
D such that D is both a dominating set and an
absorbing set of G and the subgraph of G induced
by D is strongly connected.
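
The three conditions in this definition can be checked directly. The sketch below tests whether a given vertex set D is a strongly connected dominating and absorbing set; it does not attempt to minimize |D|. The function name is_scdas and the representation of the digraph as a vertex list plus a list of arcs are assumptions of this illustration.

```python
def is_scdas(vertices, arcs, d):
    """Return True if d is a strongly connected dominating and absorbing set."""
    d = set(d)
    if not d:
        return False
    out_nb = {v: set() for v in vertices}
    in_nb = {v: set() for v in vertices}
    for u, v in arcs:
        out_nb[u].add(v)
        in_nb[v].add(u)
    for v in vertices:
        if v in d:
            continue
        if not (in_nb[v] & d):   # v has no dominator in D
            return False
        if not (out_nb[v] & d):  # v has no absorber in D
            return False

    def reaches_all(start, neighbors):
        # Depth-first search restricted to the subgraph induced by D.
        seen = {start}
        stack = [start]
        while stack:
            x = stack.pop()
            for y in neighbors[x]:
                if y in d and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen == d

    root = next(iter(d))
    # G[D] is strongly connected iff every vertex of D is reachable from
    # root and root is reachable from every vertex of D.
    return reaches_all(root, out_nb) and reaches_all(root, in_nb)
```

For example, on the directed cycle 0 → 1 → 2 → 0 with an extra vertex 3 and arcs 0 → 3 and 3 → 1, the set {0, 1, 2} passes all three checks, while {0, 1} fails because the induced subgraph has no arc from 1 to 0.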

A disk graph is a geometric graph which is
of particular interest in the study of MSCDAS,
since it is a model of a heterogeneous
wireless sensor network, and as one can see in
the Applications section, MSCDAS plays an important
role in wireless sensor networks. In a disk graph,
every vertex u corresponds to a sensor on the
plane equipped with an omnidirectional antenna
of transmission radius r(u). Another sensor v
can correctly decode the message sent by u if
and only if v is in the disk centered at u with
radius r(u). Hence, there is an arc (u, v) in the
disk graph if and only if ||uv|| ≤ r(u), where
||uv|| is the Euclidean distance between u and
v. In particular, if all sensors are equipped with
the same transmission radius, then the disk graph
degenerates to an undirected graph called a unit
disk graph.


Key Results


Hardness Results 

In a general digraph, the MSCDAS problem cannot
be approximated within a factor of (1 − ε) ln n
for any real number ε > 0, where n is the
number of vertices in the digraph. Even in disk
graphs, MSCDAS is still NP-hard. These hardness
results follow from the fact that their undirected
counterparts have these hardness results [1, 5].


MSCDAS in General Digraph 

Li et al. [8] gave a (3H(n − 1) − 1)-approximation
for MSCDAS, where H(γ) = Σ_{i=1}^{γ} 1/i is the
harmonic number.

The algorithm is based on the following obser- 
vation. For a vertex u in a digraph G, a spanning 
in-arborescence (resp. out-arborescence) rooted 
at u is a spanning sub-digraph of G in which 
every vertex except u has in-degree (resp. out- 
degree) exactly one and vertex u has in-degree 
(resp. out-degree) zero. For a spanning arborescence
T of G, denote by int(T) the set of internal
vertices of T. For any vertex u, suppose T^i and
T^o are a spanning in-arborescence and a spanning
out-arborescence of G rooted at u, respectively.
Then int(T^i) ∪ int(T^o) is an SCDAS of G.

Define the problem “Spanning Arborescence
with Fewest Internal Vertices” (SAFIV) as follows:
given a digraph G and a vertex u, find a
spanning arborescence T rooted at u such that
|int(T)| is as small as possible. By the above
observation, if SAFIV has a ρ-approximation,
then MSCDAS has a 2ρ-approximation. Li et al.
gave a (1.5H(n − 1) − 0.5)-approximation for
SAFIV, and thus the approximation ratio (3H(n −
1) − 1) for MSCDAS follows.

The approximation algorithm for SAFIV uses
the idea in [6, 7], which study the problem of
“Minimum Node-Weighted Steiner Tree” (MNWST).
The idea is to iteratively merge smaller
arborescences greedily (a vertex is a trivial arborescence)
until finally one gets one arborescence
including all vertices which is rooted at
the given vertex. It was pointed out in [8] that
using the method in [6], the approximation ratio
for SAFIV can be further reduced to 1.35 ln n.
Since SAFIV is at least as hard as the minimum
connected dominating set problem, it cannot be
approximated within factor (1 − ε) ln n. Any
progress narrowing the gap between ln n and
1.35 ln n would be interesting.


MSCDAS in Disk Graph 

Making use of geometric properties, can the ap- 
proximation ratio for MSCDAS be better in a 
disk graph? The answer is yes. Du et al. [2] were 
the first to give a constant approximation in this 
setting. Their idea was further explored by Park 
et al. [11] to output an SCDAS with size at most
9.6(k + 1/2)^2 opt + 14.8(k + 1/2)^2, where opt is
the size of an optimal solution and k = r_max/r_min,
the ratio between the maximum radius and the
minimum radius. The core in their work is an
algorithm for SAFIV, which first colors all vertices
white and then, by growing a search tree step
by step, turns the colors to either black, blue, or
gray. The set of black vertices forms a dominating
set, and the set of blue vertices connects these
black vertices into an out-arborescence. In fact,
black vertices are mutually independent, where
two vertices u and v are said to be independent
if either uv or vu is not an arc. Two independent
vertices have distance greater than r_min. Such a
property guarantees an upper bound for the number
of black vertices. Furthermore, the structure
of a search tree guarantees that the number of
blue vertices is no larger than that of black
vertices. Then, the desired approximation ratio
follows. It should be noted that if r_max/r_min is
unbounded, then the approximation ratio is not a
constant.

Without a boundedness assumption on r_max/r_min,
Xu and Li [12] showed that a (2 + ε)-approximation
exists for MDAS, which is a
combination of a PTAS for MDS and a PTAS for
MAS. In fact, the PTAS for MAS is a special case
of the “Geometric Hitting Set” problem studied
in [10], and the PTAS for MDS is a variation for
the MDS problem in an undirected graph studied
in [4]. Both PTASs are obtained through a local
search method. The analysis is based on the
separator theorem for planar graphs [3, 9]. Zhang
et al. [13] also obtained approximation ratio
(2 + ε) using the same method. Based on such
a DAS, adding Steiner nodes to connect, Zhang
et al. showed that a (4 + 3 ln(2 + ε)opt + ε)-approximation
exists for MSCDAS. When the
optimal value opt is substantially smaller than n,
this is an improvement on ratio 3H(n − 1) − 1
for disk graphs.


Applications 


One application of MSCDAS is communication
in wireless sensor networks (WSNs). In a
WSN, information is distributed among sensors
by multi-hop transmissions. If all sensors transmit
messages in a flooding manner, then a lot
of energy is wasted, and a large amount of interference
is created. To alleviate such problems, it
is desirable that only a small fraction of sensors 
participate in the transmission, while information 
can still be successfully shared. An SCDAS can 
serve for this purpose. Suppose D is an SCDAS 
of directed graph G (the topology of the WSN). 
If there is a message at source sensor u to be sent 
to destination sensor v, then the message can be 
first sent from u to its absorber; since G[D] is 
strongly connected, it can be successfully relayed 
to the dominator of v and then sent to v. 


Open Problems 


It is still open whether there exists a constant 
approximation algorithm for MSCDAS in disk
graphs.


Subexponential Parameterized Algorithms


Problem Definition 


A parameterized problem is a language L ⊆
Σ* × ℕ, where Σ is a fixed, finite alphabet.
The second component is called the parameter
of the problem. The central notion in parameterized
complexity is the notion of fixed-parameter
tractability (FPT). A parameterized problem L
is called FPT if it can be determined in time
f(k) · n^c whether or not (x, k) ∈ L, where n =
|(x, k)|, f is a computable function depending
only on k, and c is a constant independent of n
and k. The complexity class containing all fixed-parameter
tractable problems is called FPT.
While in the definition of class FPT, we are 
happy with any computable function f, from 
application perspective it is often desirable to 
have the asymptotic growth of f as slow as 
possible. Take as an example an FPT problem 
VERTEX COVER which has been subjected 
to intense scrutiny with progressively faster 
algorithms designed for it. Recall
that in the VERTEX COVER problem, we
are asked if an n-vertex graph G contains a
vertex cover of size k, or in other words a
set of vertices S such that every edge of G
has at least one endpoint in S. Starting from
a k^k algorithm of Buss and Goldsmith in
1993, there have been algorithms with f(k) ∈
{2^k, 1.324718^k, 1.29175^k, 1.2906^k, 1.271^k,
1.2738^k}. The current fastest algorithm for
VERTEX COVER runs in time 1.2738^k n^{O(1)}
(see the entry » Vertex Cover Search Trees
in this book). The ever-decreasing running
time leads to the following natural question: can
VERTEX COVER admit a subexponential time
algorithm? That is, can it have an algorithm
with running time 2^{o(k)} n^{O(1)}? A negative
answer to this question would imply that
P ≠ NP. However, using a stronger assumption
in complexity theory, namely, exponential time 
hypothesis (ETH) (see the entry » Exponential 
Lower Bounds for k-SAT Algorithms in this 
book), one can show that if ETH holds, then 
the answer to our question is NO. Moreover, 
subject to ETH, there are no subexponential 
algorithms for many other natural NP-complete 
problems. Thus, another natural question arises:
is it true that no NP-complete problem can
be solved in subexponential time? Interestingly,
the answer to this question is again NO, and there
are examples in the literature of such problems.
Coming back to our example of the VERTEX
COVER problem, if we restrict the input graph to
be planar, the problem remains NP-complete, but
the brute-force algorithm can be sped up
even more. That is, VERTEX COVER on planar
graphs can be solved in time 2^{O(√k)} n^{O(1)},
that is, by a subexponential algorithm. For
more parameterized subexponential algorithms
on planar graphs, we refer to the entry » Bidimensionality in
this book.
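
To make the notion of an f(k) · n^{O(1)} running time concrete, here is a sketch of the classical bounded search tree for VERTEX COVER with f(k) = 2^k, the first value in the list above. It branches on the two endpoints of an arbitrary uncovered edge. It is not one of the faster algorithms cited in this entry, and the function name has_vertex_cover and the edge-list representation are assumptions of the sketch.

```python
def has_vertex_cover(edges, k):
    """Return True if the graph given by the edge list has a vertex cover of size at most k."""
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # an edge remains but no budget is left
    u, v = edges[0]
    # Any vertex cover must contain u or v, so try both choices.
    for pick in (u, v):
        rest = [e for e in edges if pick not in e]
        if has_vertex_cover(rest, k - 1):
            return True
    return False
```

The recursion tree has depth at most k and branching factor 2, so at most 2^k leaves are explored, each after polynomial work in the size of the graph.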

Until recently, the only subexponential 
algorithms were known for “geometric” graph 
problems, that is, problems on planar graphs or 
graphs excluding some fixed graph as minors. 
In 2009, Alon, Lokshtanov, and Saurabh [1] 
obtained the first parameterized subexponential 
algorithm for a natural “nongeometric” problem. 
This result has acted as catalyst for the discovery 
of new subexponential time algorithms. In 
this article, we give a short overview of these 
algorithms. 


Key Results 


FAST 

In the FEEDBACK ARC SET IN TOURNAMENTS
(FAST) problem, we are given an n-vertex
tournament T and a positive integer k;
the question is whether one can make T into a
directed acyclic graph by deleting at most k arcs.


FAST

Input: A tournament T = (V, E) and a nonnegative
integer k.

Parameter: k.

Question: Is there F ⊆ E, |F| ≤ k, such that
digraph H = (V, E \ F) is acyclic?


Alon, Lokshtanov, and Saurabh in [1] ob- 
tained a parameterized subexponential algorithm 
for FAST. 


Theorem 1 ([1]) FAST is solvable in time
2^{O(√k log k)} n^{O(1)}.


The theorem is proved by making use of a
novel randomized technique called Chromatic
Coding. It turned out that subexponential algorithms
exist for several other problems on tournaments
(see entry » Computing Cutwidth and
Pathwidth of Semi-complete Digraphs in this
book).


Fill-In 

The next “nongeometric” problem for which a 
subexponential algorithm was found happened to 
be the classical MINIMUM FILL-IN problem. 

A graph is chordal (or triangulated) if every 
cycle of length at least four contains a chord, 
i.e., an edge between nonadjacent vertices of the 
cycle. The MINIMUM FILL-IN problem (also 
known as MINIMUM TRIANGULATION and 
CHORDAL GRAPH COMPLETION) is to de- 
cide if a given graph G can be transformed into a 
chordal graph by adding at most k edges. 


MINIMUM FILL-IN 

Input: A graph G = (V, E) and a nonnegative 
integer k. 

Parameter: k. 

Question: Is there $F \subseteq [V]^2$, $|F| \le k$, such that the
graph $H = (V, E \cup F)$ is chordal?


Theorem 2 ([6]) MINIMUM FILL-IN is solvable
in time $2^{O(\sqrt{k} \log k)} n^{O(1)}$.


The proof of the theorem is based on a combi- 
natorial bound estimating the number of specific 
objects in the graph, namely, potential maximal 
cliques. 


Completion to Graph Classes 


Since the discovery of subexponential algorithms
for FAST and MINIMUM FILL-IN, it has turned out
that several other graph modification problems
admit subexponential algorithms. In particular,
it was shown that the problems of completion to
certain subclasses of chordal graphs, such as trivially
perfect, threshold [4], split [7], proper interval
[2], and interval graphs [3], admit parameterized
subexponential algorithms.

On the other hand, it has been shown that for
a number of other graph classes, like cographs,
completion to these classes of graphs cannot
be done in parameterized subexponential time
unless the exponential time hypothesis (ETH)
fails [4].


Open Problems 


The most natural open question about the
given subexponential algorithms concerns
lower bounds. As a concrete example, since $k < n^2$, an
algorithm for FAST with running time
$2^{o(\sqrt{k})} n^{O(1)}$ would actually be a $2^{o(n)}$ time algorithm,
which inclines us to suspect that $2^{O(\sqrt{k})}$ is
the best possible dependency on k in the running
time for this problem. Unfortunately, there is a
big gap here between what we suspect and what
we can prove, even assuming ETH. The only
tight bound on parameterized subexponential
algorithms for graph modification problems
we are aware of is for the p-CLUSTERING
problem [5].


Problem Definition


Bin packing is a classical problem in combinatorial
optimization. Given a collection of n
items with different sizes, the objective is to pack
the items into a minimum number of uniform
capacity bins. More formally, the input of the
bin packing problem is described by a set of
n items $I = \{1, \ldots, n\}$ and a size function
$s : I \to [0,1]$. The output is a packing of
the items into bins $B_1, \ldots, B_k \subseteq I$ such that
$s(B_j) \le 1$ for $j = 1, \ldots, k$, where the notation
$s(B)$ denotes $\sum_{i \in B} s_i$ for any $B \subseteq I$. The
objective is to minimize the number of bins used in
the packing.

The SUBSET-SUM algorithm is an
intuitively appealing greedy heuristic for the bin
packing problem: Starting from the empty
packing, the algorithm repeatedly finds a
subset B of yet-unpacked items maximizing
$s(B)$ subject to $s(B) \le 1$, adds B to
the packing, and iterates. Each iteration
requires that we solve an instance of the
knapsack problem. In practice, instead of
finding the optimal solution, one can use a
fully polynomial time approximation scheme
(FPTAS) to compute a $(1 - \epsilon)$-approximate
solution [6].
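
The greedy loop described above can be sketched as follows. This is only an illustration under stated assumptions: item sizes are scaled to positive integers with an integer bin capacity (so that an exact subset-sum dynamic program applies), and the function name `subset_sum_packing` is hypothetical, not taken from the cited works.

```python
def subset_sum_packing(sizes, capacity):
    """Greedy SUBSET-SUM heuristic for positive integer sizes (assumed scaling)."""
    if any(s <= 0 or s > capacity for s in sizes):
        raise ValueError("every size must lie in (0, capacity]")
    remaining = list(sizes)
    bins = []
    while remaining:
        # reach[total] = indices (into `remaining`) of one subset with that total
        reach = {0: []}
        for idx, s in enumerate(remaining):
            # Iterate over a snapshot so each item is used at most once per subset.
            for total, chosen in list(reach.items()):
                new_total = total + s
                if new_total <= capacity and new_total not in reach:
                    reach[new_total] = chosen + [idx]
        # Open a new bin with a subset of maximum total size.
        best = set(reach[max(reach)])
        bins.append([remaining[i] for i in sorted(best)])
        remaining = [s for i, s in enumerate(remaining) if i not in best]
    return bins


if __name__ == "__main__":
    print(subset_sum_packing([6, 5, 5, 4, 4, 3, 3], 10))
```

Each round solves its subset-sum instance exactly in time proportional to the number of remaining items times the capacity; an approximation scheme could be substituted for that inner step.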

This note is concerned with the worst-case
asymptotic performance of the SUBSET-SUM
algorithm. For a given instance $s : I \to [0,1]$,
we use OPT(s) to denote the number of bins
used in an optimal packing of s and SS(s)
to denote the number of bins used by the
SUBSET-SUM algorithm. Then for a given
class C of instances, we define the worst-case
asymptotic approximation ratio of SUBSET-SUM
as

$$R^\infty_{SS}(C) = \lim_{n \to \infty} \sup_{s \in C} \left\{ \frac{SS(s)}{OPT(s)} : OPT(s) = n \right\}. \qquad (1)$$

Finally, we use $R^\infty_{SS}$ to denote the ratio for general
instances of the problem.


Key Results


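Lower Bound on $R^\infty_{SS}$

Graham [4] provided a family of instances
exhibiting an approximation ratio that tends to
$\sum_{i=1}^{\infty} \frac{1}{2^i - 1} \approx 1.6067$.


Theorem 1 (Graham [4]) $R^\infty_{SS} \ge \sum_{i=1}^{\infty} \frac{1}{2^i - 1} \approx 1.6067$.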


Proof Consider the following instance parameterized
by two positive integers r and N. For
each $i = 1, \ldots, r$, we create N items of size
$2^{-i} + \delta$, where $\delta = 2^{-2r}$. Let us denote this
instance with s. Provided that $2^i - 1$ divides N for
all $i = 1, \ldots, r$, it is not hard to see that SUBSET-SUM
first packs the smallest items into $N/(2^r - 1)$
bins, then it packs the second-smallest items into
$N/(2^{r-1} - 1)$ bins, and so on, until it packs the
largest items into N bins. On the other hand, the
optimal solution uses just N bins by packing one
item of each size class per bin. Therefore,

$$\frac{SS(s)}{OPT(s)} = \sum_{i=1}^{r} \frac{1}{2^i - 1}, \qquad (2)$$

which quickly approaches 1.6067 as r grows. □
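
To see how quickly the partial sums of the series $\sum_{i \ge 1} 1/(2^i - 1)$ settle, the following small sketch evaluates them exactly with rational arithmetic. The helper name `graham_ratio` is an illustrative choice, not notation from the source.

```python
from fractions import Fraction


def graham_ratio(r):
    """Exact value of sum_{i=1}^{r} 1 / (2^i - 1)."""
    return sum(Fraction(1, 2**i - 1) for i in range(1, r + 1))


if __name__ == "__main__":
    for r in (1, 2, 3, 5, 10, 20):
        print(r, float(graham_ratio(r)))
```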


Upper Bound on $R^\infty_{SS}$

A trivial upper bound on $R^\infty_{SS}$ is 2. This follows
from the fact that only the last bin can be less than
half full. Caprara and Pferschy [1] gave the first
nontrivial upper bound, by showing that $R^\infty_{SS}$ is
at most $4/3 + \ln(4/3) \approx 1.6210$. Interestingly, Graham
[4] had conjectured that the true value of $R^\infty_{SS}$
should match his lower bound. This conjecture
was finally proven by Epstein et al. [3].


Theorem 2 (Epstein et al. [3]) $R^\infty_{SS} = \sum_{i=1}^{\infty} \frac{1}{2^i - 1} \approx 1.6067$.


The proof of this result uses weighting functions
and a factor-revealing mathematical program.
Here we only sketch the high-level idea of
the approach. Let B be one of the bins opened by
SUBSET-SUM. For every item $i \in B$ we define

$$w_i = \begin{cases} \dfrac{s_i}{s(B)} & \text{if } 1 - s_{\min} < s(B), \\ s_i & \text{otherwise,} \end{cases} \qquad (3)$$

where $s_{\min}$ is the size of the smallest yet-unpacked
item just before opening B.

The weights are used to charge the cost of 
the packing computed by SUBSET-SUM to an 
optimal packing. The following lemma allows 
us to bound the performance of the algorithm 
provided we can show that the sum of the weights 
is comparable to the cost of the SUBSET-SUM 
packing and that no bin in the optimal solution 
is charged too much. 


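Lemma 1 Let $\mathcal{O}$ be an optimal solution and $\mathcal{B}$ be
the solution computed by SUBSET-SUM. If there is a
weighting function w such that $w(O) \le \rho$ for all
$O \in \mathcal{O}$ and $|\mathcal{B}| \le w(I) + \delta$, then $|\mathcal{B}| \le \rho |\mathcal{O}| + \delta$.


Proof Because $\mathcal{O}$ is a packing, $\sum_{O \in \mathcal{O}} w(O) = w(I)$; therefore,

$$|\mathcal{B}| \le w(I) + \delta = \sum_{O \in \mathcal{O}} w(O) + \delta \le \rho |\mathcal{O}| + \delta.$$

□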


The key contribution of Epstein et al. [3]
was bounding the parameters $\rho$ and $\delta$ associated
with the weighting function (3). Bounding $\delta$ is a
relatively straightforward exercise. Bounding $\rho$ is
more involved and requires analytically solving a
mathematical program. Here we only state their
bounds.


Lemma 2 (Epstein et al. [3]) Let $\mathcal{B}$ be the
SUBSET-SUM packing and let w be the weighting
function (3) for $\mathcal{B}$. Then


1. $|\mathcal{B}| \le w(I) + 1$,
2. $w(B) \le \sum_{i=1}^{\infty} \frac{1}{2^i - 1}$ for all $B \subseteq I$ such that
$s(B) \le 1$.


Theorem 2 follows immediately from Lemmas
1 and 2.


Parametric Case 

As is the case with most bin packing heuristics,
the performance of SUBSET-SUM improves
when the items are small relative to the capacity
of the bin. In a parametric analysis of a heuristic,
we restrict our attention to instances where the
maximum item size is bounded. More formally,
for every real $\alpha \in (0, 1]$, we define $C_\alpha$ to be the
class of instances s such that $\max_{i \in I} s_i \le \alpha$.


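Theorem 3 (Epstein et al. [3]) For every integer
$t \ge 1$ and $\alpha \in \left(\frac{1}{t+1}, \frac{1}{t}\right]$, we have
$R^\infty_{SS}(C_\alpha) = 1 + \sum_{i=1}^{\infty} \frac{1}{(t+1) 2^i - 1}$.


Notice that this is a strict generalization of
Theorems 1 and 2, which only cover the case
$\alpha = 1$.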
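
As a consistency check, and assuming the bound of Theorem 3 reads $1 + \sum_{i \ge 1} 1/((t+1) 2^i - 1)$, setting t = 1 and shifting the summation index recovers the constant of Theorems 1 and 2:

```latex
1 + \sum_{i=1}^{\infty} \frac{1}{2 \cdot 2^{i} - 1}
  = \frac{1}{2^{1} - 1} + \sum_{j=2}^{\infty} \frac{1}{2^{j} - 1}
  = \sum_{j=1}^{\infty} \frac{1}{2^{j} - 1}
  \approx 1.6067 .
```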


Applications 


There is an interesting connection between the
performance of the SUBSET-SUM algorithm and
the quality of equilibria of a game-theoretic version
of bin packing. Let us associate a game
with each instance $s : I \to [0,1]$ of the
bin packing problem. The set of players in this
game is I, the set of items. Each player can
decide in which bin it wants to be packed; this
is the player’s strategy space. For each bin B
chosen in this uncoordinated fashion, if $s(B) > 1$
then the players in B are charged $\infty$; otherwise,
player $i \in B$ is charged $\frac{s_i}{s(B)}$. These payments
enforce that a strategy profile is a valid packing
if and only if the payments are finite. Furthermore,
if the payments are finite, the sum of
these payments equals the number of bins in the
packing.
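
The payment rule can be checked on a toy strategy profile. The sketch below is an illustration only: it assumes each bin is given as a non-empty list of positive rational sizes, and the name `total_payment` is hypothetical.

```python
from fractions import Fraction


def total_payment(bins):
    """Sum of all players' payments; infinite if some bin overflows."""
    total = Fraction(0)
    for items in bins:
        load = sum(items)
        if load > 1:
            return float("inf")  # every player in an overfull bin is charged infinity
        # Player i in bin B pays s_i / s(B), so each valid bin collects exactly 1.
        total += sum(size / load for size in items)
    return total


if __name__ == "__main__":
    F = Fraction
    print(total_payment([[F(1, 2), F(1, 3)], [F(1, 4)], [F(2, 5), F(3, 5)]]))
```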

A strategy profile is said to be a Nash Equilib- 
rium (NE) if there is no player that can switch 
bins to decrease its payment. The price of an- 
archy of the bin packing game is the asymp- 
totic worst-case ratio between the number of 
bins used by an NE and the number of bins in 
an optimal packing. A packing is said to be a 
Strong Nash Equilibrium (SNE) if no coalition 
of players can switch bins to decrease the sum 
of their payments. The strong price of anarchy 
of the bin packing game is the asymptotic worst- 
case ratio between the number of bins used by 
an SNE and the number of bins in an optimal
packing.


Theorem 4 (Epstein and Kleiman [2]) The
strong price of anarchy for the bin packing game
is exactly $R^\infty_{SS}$.

Notice that every SNE is an NE, since we
can think of an NE as requiring that there are 
no “coalitions” of size 1. Therefore, Theorem 4 
establishes a lower bound on the price of anarchy 
for the bin packing game. However, not every 
NE is an SNE. In fact, it is known that the 
price of anarchy for the bin packing game is 
strictly worse than its strong price of anarchy 
[2,3]. 


Experimental Results 


Gupta and Ho [5] performed an experimental 
evaluation of SUBSET-SUM. (Gupta and Ho 
call the algorithm minimum bin slack because 
they formulate each iteration as trying to 
minimize the slack (unused space) of the 
bin, which is equivalent to maximizing the 
bin’s usage.) The instances used in the 
evaluation were randomly generated by selecting 
the item sizes uniformly at random from 
different numerical ranges. They compared 
the performance of SUBSET-SUM to two well- 
known heuristics: FIRST-FIT-DECREASING and 
BEST-FIT-DECREASING. They observed that 
SUBSET-SUM performed better on average 
without incurring a significant computational 
overhead.


Problem Definition


The Substring Parsimony Problem, introduced by 
Blanchette et al. [1] in the context of motif dis- 
covery in biological sequences, can be described 
in a more general framework: 

Input:


• A discrete space S on which an integral
distance d is defined (i.e., $d(x, y) \in \mathbb{N}$ for all
$x, y \in S$).

• A rooted binary tree T = (V, E) with n
leaves. Vertices are labeled $\{1, 2, \ldots, n, \ldots, |V|\}$,
where the leaves are vertices $\{1, 2, \ldots, n\}$.

• Finite sets $S_1, S_2, \ldots, S_n$, where set $S_i \subseteq S$ is
assigned to leaf i, for all $i = 1 \ldots n$.

• A non-negative integer t.


Output: All solutions of the form $(x_1, x_2, \ldots, x_n, \ldots, x_{|V|})$ such that:


• $x_i \in S$ for all $i = 1 \ldots |V|$
• $x_i \in S_i$ for all $i = 1 \ldots n$
• $\sum_{(u,v) \in E} d(x_u, x_v) \le t$

The problem thus consists of choosing one ele- 
ment x; from each set S; such that the Steiner 
distance of the set of points is at most t. This 
is done on a Steiner tree T of fixed topology. 
The case where |S;| = 1 for all i =1...n is 
a standard Steiner tree problem on a fixed tree 
topology (see [11]). It is known as the Maximum 
Parsimony Problem and its complexity depends 
on the space S. 


Key Results 


The substring parsimony problem can be solved
using a dynamic programming algorithm. Let
$u \in V$ and $s \in S$. Let $W_u[s]$ be the score of the
best solution that can be obtained for the subtree
rooted at node u, under the constraint that node u
is labeled with s, i.e.,

$$W_u[s] = \min_{\substack{x_1, \ldots, x_{|V|} \in S \\ x_u = s}} \; \sum_{\substack{(i,j) \in E \\ i, j \in \mathrm{subtree}(u)}} d(x_i, x_j).$$

Let v be a child of u, and let $X_{(u,v)}[s]$ be the
score of the best solution that can be obtained for
the subtree consisting of node u together with the
subtree rooted at its child v, under the constraint
that node u is labeled with s:

$$X_{(u,v)}[s] = \min_{\substack{x_1, \ldots, x_{|V|} \in S \\ x_u = s}} \; \sum_{\substack{(i,j) \in E \\ (i,j) \in \mathrm{subtree}(v) \cup \{(u,v)\}}} d(x_i, x_j).$$

Then, we have:

$$W_u[s] = \begin{cases} 0 & \text{if } u \text{ is a leaf and } s \in S_u, \\ +\infty & \text{if } u \text{ is a leaf and } s \notin S_u, \\ \sum_{v \in \mathrm{children}(u)} X_{(u,v)}[s] & \text{if } u \text{ is not a leaf,} \end{cases}$$

and

$$X_{(u,v)}[s] = \min_{s' \in S} \; W_v[s'] + d(s, s').$$

Tables W and X can thus be computed using
a dynamic programming algorithm, proceeding
in a post-order traversal of the tree. Solutions
can then be recovered by tracing the computation
back for all s such that $W_{\mathrm{root}}[s] \le t$. Note that the
same solution may be recovered more than once
in this process.
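
The recurrences for W and X can be sketched directly. The code below is a minimal illustration of the straightforward dynamic program, not the optimized algorithm of Blanchette et al.; the function and parameter names (`substring_parsimony`, `children`, `leaf_sets`, `space`) are assumptions made for this example, the tree is given as a child-list dictionary, and only the feasible root labels with their scores are returned (the traceback step is omitted).

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def substring_parsimony(children, root, leaf_sets, space, t, dist=hamming):
    """Return {s: W_root[s]} for all root labels s with W_root[s] <= t."""
    INF = float("inf")
    W = {}

    def solve(u):  # post-order traversal
        kids = children.get(u, [])
        if not kids:
            # Leaf: score 0 if the label is allowed, +infinity otherwise.
            W[u] = {s: (0 if s in leaf_sets[u] else INF) for s in space}
            return
        W[u] = {s: 0 for s in space}
        for v in kids:
            solve(v)
            for s in space:
                # X_(u,v)[s] = min over s' of W_v[s'] + d(s, s')
                W[u][s] += min(W[v][s2] + dist(s, s2) for s2 in space)

    solve(root)
    return {s: score for s, score in W[root].items() if score <= t}


if __name__ == "__main__":
    space = ["AA", "AC", "CA", "CC"]
    children = {5: [4, 3], 4: [1, 2]}
    leaf_sets = {1: {"AA"}, 2: {"AC"}, 3: {"AA", "CC"}}
    print(substring_parsimony(children, 5, leaf_sets, space, t=1))
```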

A straightforward implementation of this
dynamic programming algorithm would run in
time $O(n \cdot |S|^2 \cdot \gamma(S))$, where $\gamma(S)$ is the time
needed to compute the distance between any
two points in S. Let $N_a(S)$ be the maximum
number of a-neighbors a point in S can have,
i.e., $N_a(S) = \max_{x \in S} |\{y \in S : d(x, y) = a\}|$.
Blanchette et al. [3] showed how to use a modified
breadth-first search of the space S to compute
each table $X_{(u,v)}$ in time $O(|S| \cdot N_1(S))$,
thus reducing the total time complexity to
$O(n \cdot |S| \cdot N_1(S))$. Since only solutions with
a score of at most t are of interest, the complexity
can be further reduced by only computing those
table entries which will yield a score of at most
t. This results in an algorithm whose running
time is $O(n \cdot M \cdot N_{\lceil t/2 \rceil}(S) \cdot N_1(S))$, where
$M = \max_{i = 1 \ldots n} |S_i|$.

The problem has been mostly studied in
the context of biological sequence analysis,
where $S = \{A, C, G, T\}^k$ for some small k
($k = 5, \ldots, 20$ are typical values). The distance
d is the Hamming distance, and a phylogenetic
tree T is given. The case where $|S_i| = 1$ for all
$i = 1 \ldots n$ is known as the Maximum Parsimony
Problem and can be solved in time $O(n \cdot k)$ using
Fitch’s algorithm [9] or Sankoff’s algorithm [12].
In the more general version, a long DNA
sequence $P_u$ of length L is assigned to each leaf u.
The set $S_u$ is defined as the set of all k-substrings
of $P_u$. In this case, $M = L - k + 1 \in O(L)$, and
$N_a \in O(\min(4^k, (3k)^a))$, resulting in a complexity
of $O(n \cdot L \cdot 3k \cdot \min(4^k, (3k)^{\lceil t/2 \rceil}))$.
Notice that for fixed k and t, the algorithm
is linear over the whole sequence. The problem
was independently shown to be NP-hard by
Blanchette et al. [3] and by Elias [7].
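
The quantities M and $N_a$ for the DNA setting can be made concrete with a few lines. This sketch assumes the Hamming distance over a four-letter alphabet; the helper names `k_substrings` and `hamming_neighbors` are illustrative. The exact neighbor count $\binom{k}{a} 3^a$ is consistent with the $O((3k)^a)$ bound quoted above.

```python
from math import comb


def k_substrings(sequence, k):
    """Set of all length-k substrings of a sequence (at most L - k + 1 of them)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hamming_neighbors(k, a, alphabet_size=4):
    """Number of strings at Hamming distance exactly a from a fixed length-k string."""
    return comb(k, a) * (alphabet_size - 1) ** a


if __name__ == "__main__":
    print(sorted(k_substrings("ACGTAC", 3)))
    print(hamming_neighbors(10, 2), (3 * 10) ** 2)
```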


Applications 

Most applications are found in computational
biology, although the algorithm can be applied
to a wide variety of domains. The algorithm
for the substring parsimony problem has been
implemented in a software package called 
FootPrinter [5] and applied to the detection of 
transcription factor binding sites in orthologous 
DNA regulatory sequences through a method 
called phylogenetic footprinting [4]. Other 
applications include the search for conserved 
RNA secondary structure motifs in orthologous 
RNA sequences [2]. Variants of the problem 
have been defined to identify motifs regulating 
alternative splicing [13]. Blanchette et al. [3] 
study a relaxation of the problem where one 
does not require that a substring be chosen from 
each of the input sequences, but instead asks 
that substrings be chosen from a sufficiently 
large subset of the input sequence. Fang and 
Blanchette [8] formulate another variant of the 
problem where substring choices are constrained 
to respect a partial order relation defined by a set 
of local multiple sequence alignments. 


Open Problems 


Optimizations taking advantage of the specific 
structure of the space S may yield more effi- 
cient algorithms in certain cases. Many important 
variations could be considered. First, the case 
where the tree topology is not given needs to 
be considered, although the resulting problems 
would usually be NP-hard even when |S;| = 1. 
Another important variation is one where the 
phylogenetic relationships between trees is not 
given by a tree but rather by a phylogenetic net- 
work [10]. Finally, randomized algorithms sim- 
ilar to those proposed by Buhler et al. [6] may 
yield important and practical improvements. 


URL to Code 


http://bio.cs.washington.edu/software.html


Problem Definition


A basic building block for compressed data structures
for texts and functions is the representation
of a permutation of the integers $\{1, \ldots, n\}$,
denoted by $[1 \ldots n]$. A permutation $\pi$ is trivially
representable in $n \lceil \lg n \rceil$ bits, which is within
O(n) bits of the information theoretic bound of
$\lg(n!)$, but instances from restricted classes of
permutations can be represented using much less
space.

We are interested in encodings of permutations
that support efficient access to them. Given a
permutation $\pi$ over $[1 \ldots n]$, an integer k and an
integer $i \in [1 \ldots n]$, data structures on permutations
aim to support the following operators as
fast as possible, using as little additional space as
possible:


• $\pi(i)$: application of the permutation to i,

• $\pi^{-1}(i)$: application of the inverse permutation
to i,

• $\pi^{(k)}(i)$: $\pi()$ iteratively applied k times starting
with value i (e.g., $\pi^{(2)}(i) = \pi(\pi(i))$).


Key Results


We distinguish between two types of solutions: 
the succinct index and two succinct data 
structures for permutations introduced by Munro 
et al. [1], and the various compressed data 
structures proposed later [2-4]. 


Succinct Data Structures 

Munro et al. [1] studied the problem of succinctly 
representing a permutation to support operators 
on it quickly. They give several solutions, de- 
scribed below. 


“Shortcut” Index Supporting $\pi()$ and $\pi^{-1}()$
Given an integer parameter t, the operators $\pi()$
and $\pi^{-1}()$ can be supported by simply writing
down $\pi$ in an array of n words of $\lceil \lg n \rceil$ bits
each, plus an auxiliary array S of at most n/t
back pointers called shortcuts: in each cycle of
length at least t, every t-th element has a pointer
t steps back. Then, $\pi(i)$ is simply the i-th value
in the primary structure, and $\pi^{-1}(i)$ is found by
moving forward until a back pointer is found and
then continuing to follow the cycle to the location
that contains the value i.

The trick is in the encoding of the locations of
the back pointers: this is done with a simple bit
vector B of length n, in which a 1 indicates that
a back pointer is associated with a given location.
B is augmented using o(n) additional bits so that
the number of 1’s up to a given position and the
position of the r-th 1 can be found in constant
time (i.e., using the rank and select operators on
binary strings [5]). This gives the location of the
appropriate back pointer in the auxiliary array S.
As there are back pointers every t elements in
the cycle, finding the predecessor requires O(t)
memory accesses.
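
The shortcut idea can be sketched as follows. This is a simplified illustration, not the succinct structure itself: back pointers are kept in a Python dictionary instead of the bit vector B with rank and select, positions are 0-based, and each marked element points to the previous marked element of its cycle (t steps back, except for the pointer that wraps around a cycle, which may span up to 2t - 1 steps). The names `build_shortcuts` and `inverse` are hypothetical.

```python
import random


def build_shortcuts(pi, t):
    """Mark every t-th element of each cycle of length >= t with a back pointer."""
    n = len(pi)
    seen = [False] * n
    back = {}
    for start in range(n):
        if seen[start]:
            continue
        cycle, j = [], start
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = pi[j]
        if len(cycle) >= t:
            marks = cycle[0:(len(cycle) // t) * t:t]  # floor(len/t) marked elements
            for k, m in enumerate(marks):
                back[m] = marks[k - 1]  # previous mark; k = 0 wraps to the last one
    return back


def inverse(pi, back, i):
    """Return the j with pi[j] == i, following at most one shortcut."""
    j, jumped = i, False
    while pi[j] != i:
        if not jumped and j in back:
            j, jumped = back[j], True  # jump behind i, then walk forward again
        else:
            j = pi[j]
    return j


if __name__ == "__main__":
    pi = list(range(20))
    random.Random(0).shuffle(pi)
    back = build_shortcuts(pi, 4)
    print(all(pi[inverse(pi, back, i)] == i for i in range(20)))
```

With this marking, at most n/t pointers are stored and each inverse query walks O(t) positions of the cycle.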


Theorem 1 For any strictly positive integer n
and any permutation $\pi$ on $[1 \ldots n]$ which can
be decomposed into $\delta$ cycles of respective sizes
$c_1, \ldots, c_\delta$, there is a representation of $\pi$ using
within $\left(\sum_{i \in [1 \ldots \delta]} \lfloor c_i / t \rfloor\right) \lg n + 2n + o(n) \subseteq \frac{n \lg n}{t} + 2n + o(n)$
bits to support the operator $\pi()$ in
constant time and the operator $\pi^{-1}()$ in time
within O(t).

Interestingly enough, Munro et al. [1] did not
notice that their construction is actually an index
and that the raw encoding can be replaced by
any data structure supporting the operator $\pi()$,
including the compressed ones later described [4].


“Cycle” Data Structure Supporting $\pi^k()$

For arbitrary i and k, $\pi^k()$ is supported by
writing the cycles of $\pi$ together with a bit vector
B marking the beginning of each cycle. Observe
that the cycle representation itself is a permutation
in “standard form”; call it $\sigma$. The first task
is to find i in the representation: it is in position
$\sigma^{-1}(i)$. The segment of the representation
containing i is found through the rank and select
operators on B. Then $\pi^k(i)$ is determined by
taking k modulo the cycle length, moving that
number of steps around the cycle starting at the
position of i, and applying $\sigma()$ to obtain the value
to return.
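
The steps above can be sketched as follows, under simplifying assumptions: $\sigma$ and $\sigma^{-1}$ are stored as plain arrays (0-based), and a sorted list of cycle start positions with binary search stands in for the bit vector B with rank and select. The names `build_cycle_structure` and `power` are illustrative.

```python
from bisect import bisect_right


def build_cycle_structure(pi):
    """Write the cycles of pi one after another; record where each cycle starts."""
    n = len(pi)
    sigma, starts, seen = [], [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        starts.append(len(sigma))
        j = s
        while not seen[j]:
            seen[j] = True
            sigma.append(j)
            j = pi[j]
    sigma_inv = [0] * n
    for pos, value in enumerate(sigma):
        sigma_inv[value] = pos
    return sigma, sigma_inv, starts


def power(structure, i, k):
    """Return pi^k(i); negative k walks the cycle backwards."""
    sigma, sigma_inv, starts = structure
    pos = sigma_inv[i]                    # position of i in the cycle layout
    c = bisect_right(starts, pos) - 1     # index of the cycle containing pos
    begin = starts[c]
    end = starts[c + 1] if c + 1 < len(starts) else len(sigma)
    length = end - begin
    return sigma[begin + (pos - begin + k) % length]


if __name__ == "__main__":
    pi = [2, 0, 3, 1, 5, 4, 6]
    st = build_cycle_structure(pi)
    print([power(st, i, 2) for i in range(len(pi))])
```

Because k is reduced modulo the cycle length, the cost of a query does not depend on the magnitude of k.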

Other than the support of the operators on
$\sigma$, all operators are performed in constant time;
hence the asymptotic time to support $\pi^k()$
depends on the time in which the
data structure chosen to represent $\sigma$ supports
the operators $\sigma()$ and $\sigma^{-1}()$. Munro et al. [1]
proposed the following, using a raw encoding of
$\sigma$ with a shortcut index to support $\sigma^{-1}()$:


Theorem 2 For any strictly positive integer n
and any permutation $\pi$ on $[1 \ldots n]$, there is a
representation of $\pi$ using at most $(1 + \epsilon) n \lg n + O(n)$
bits to support the operator $\pi^k()$ in time
within $O(1/\epsilon)$, for any $\epsilon$ less than 1 and for any
arbitrary value of k.


Under a restricted model of pointer machine,
this technique is optimal: using O(n) extra bits
(i.e., $O(n / \log n)$ extra words), time within
$\Omega(\log n)$ is necessary to support both $\pi()$ and
$\pi^{-1}()$.


“Benes Network” Data Structure Supporting $\pi^k()$

Any permutation can be implemented by a
communication network composed of switches:
this is called a Benes Network and uses even less
space under the RAM model than the solutions
described in the previous sections. Sparsely
adding pointers accelerates the support of $\pi^k()$
to time within $O(\log n / \log \log n)$:


Theorem 3 For any strictly positive integer n
and any permutation $\pi$ on $[1 \ldots n]$, there is a
representation of $\pi$ using at most $\lceil \lg(n!) \rceil + O(n)$
bits to support the operator $\pi^k()$ in time within
$O(\log n / \log \log n)$.


This representation uses space within an additive
term within O(n) of the optimal, both on average
and in the worst case over all permutations
over $[1 \ldots n]$.


Compressed Data Structures 

Any comparison-based sorting algorithm yields
an encoding for permutations, and any adaptive
sorting algorithm in the comparison model yields
a compression scheme for permutations. Supporting
operators on such a compressed permutation in
less time than required to decompress the whole
of it requires some more work:


Runs 

Barbay and Navarro [2] described how to segment
a permutation into nRuns runs composed
of consecutive positions forming already sorted
blocks and how to merge those via a wavelet
tree. This yields a data structure compressing a
permutation within space optimal over all permutations
with nRuns runs of sizes given by the
vector vRuns. This data structure supports the
operators $\pi()$ and $\pi^{-1}()$ in sublinear time within
O(1 + log nRuns), with the average supporting
time within O(1 + H(vRuns)), which decreases
with the entropy of the partition of the permutation
into runs. Here, the entropy of a sequence of
positive integers $X = (n_1, n_2, \ldots, n_r)$ adding up
to n is $H(X) = \sum_{i=1}^{r} \frac{n_i}{n} \lg \frac{n}{n_i}$.
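
For illustration, the run decomposition and the entropy of its size vector can be computed as below. This sketch only measures the compressibility parameter; it does not build the wavelet-tree structure. The names `run_lengths` and `entropy` are illustrative.

```python
from math import log2


def run_lengths(pi):
    """Lengths of the maximal ascending runs of consecutive positions."""
    if not pi:
        return []
    lengths = [1]
    for prev, cur in zip(pi, pi[1:]):
        if cur > prev:
            lengths[-1] += 1   # the current run continues
        else:
            lengths.append(1)  # a descent starts a new run
    return lengths


def entropy(lengths):
    """H(X) = sum_i (n_i / n) * lg(n / n_i) for a vector of positive sizes."""
    n = sum(lengths)
    return sum(m / n * log2(n / m) for m in lengths)


if __name__ == "__main__":
    v_runs = run_lengths([3, 4, 5, 1, 2, 6, 7, 0])
    print(v_runs, entropy(v_runs))
```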

Theorem 4 For any strictly positive integer n
and any permutation $\pi$ on $[1 \ldots n]$ which can be
decomposed into nRuns runs of sizes vRuns =
$(n_1, \ldots, n_{\mathrm{nRuns}})$, there is a representation of $\pi$
using at most n H(vRuns) + O(nRuns log n) +
o(n) bits to support the computation of $\pi(i)$
and $\pi^{-1}(i)$ in time within O(1 + log nRuns)
in the worst case over $i \in [1 \ldots n]$ and in
time within O(1 + H(vRuns)) on average when
$i \in [1 \ldots n]$ is uniformly distributed. This compressed
data structure can be computed in time
within O(n(1 + H(vRuns))), which is worst-case
optimal in the comparison model over all
such permutations decomposed into nRuns runs
of sizes given by the vector vRuns.


The partitioning takes only n — 1 comparisons, 
and the construction of the compressed data 
structure itself is an adaptive sorting algorithm 
improving over previous results [6,7]. 


Heads of Strict Runs 

A two-level partition of the permutation yields
further compression [2]. The first level partitions
the permutation into strict ascending runs
(maximal ranges of positions satisfying
$\pi(i + k) = \pi(i) + k$). The second level partitions
the heads (first position) of those strict runs into
conventional ascending runs. This is analogous
to the notion of blocks described by Moffat and
Petersson [7] for multisets.


Theorem 5 For any strictly positive integer n
and any permutation $\pi$ on $[1 \ldots n]$ which can
be decomposed into nBlock strict runs and
into nRuns ≤ nBlock monotone runs, let
vHRuns be the vector formed by the nRuns
monotone run lengths in the permutation of
strict run heads. Then, there is a representation
of $\pi$ using at most nBlock H(vHRuns) +
$O(\mathrm{nBlock} \log \frac{n}{\mathrm{nBlock}}) + o(n)$ bits to support
the operators $\pi()$ and $\pi^{-1}()$ in time within
O(1 + log nBlock). This compressed data
structure can be computed in time within
O(n(1 + log nBlock)).


Shuffled Subsequences 

The preorder measures seen so far have considered
runs which group contiguous positions in
$\pi$: this does not always need to be the case. A
permutation $\pi$ over $[1 \ldots n]$ can be decomposed
in n comparisons into a minimal number nSUS
of Shuffled Up Sequences, defined as a set of
not necessarily consecutive subsequences of increasing
numbers that have to be removed from
$\pi$ in order to reduce it to the empty sequence [8].
Then those subsequences can be merged using
the same techniques as above, which yields a
new adaptive sorting algorithm and a new compressed
data structure [2]. An optimal partition
of a permutation $\pi$ over $[1 \ldots n]$ into a minimal
number nSMS of Shuffled Monotone Sequences,
i.e., not necessarily consecutive subsequences
of increasing or decreasing numbers, is
NP-hard to compute [9], but if such a partition
is given, the same technique applies [10].


LRM Subsequences 

LRM trees partition a sequence of values into 
consecutive sorted blocks and express the relative 
position of the first element of each block within 
a previous block. Such a tree can be computed 
in 2(n — 1) comparisons within the array and 
overall linear time, through an algorithm similar 
to that of Cartesian Trees [11]. The interest of 
LRM trees in the context of adaptive sorting 
and permutation compression is that the val- 
ues are increasing in each root-to-leaf branch: 
they form a partition of the array into subse- 
quences of increasing values. Barbay et al. [3]
described how to compute the partition of the
LRM tree of minimal size-vector entropy, which
yields a compressed data structure asymptotically
smaller than H(vRuns)-adaptive sorting,
smaller in practice than H(vSUS)-adaptive sorting,
as well as a faster adaptive sorting algorithm.


Number of Inversions 

The preorder measure nInv counts the number
of pairs (i, j) of positions $1 \le i < j \le n$
in a permutation $\pi$ over $[1 \ldots n]$ such that
$\pi(i) > \pi(j)$. Its value is exactly the number
of comparisons performed by the algorithm
Insertion Sort, between n and $n^2$ for a permutation
over $[1 \ldots n]$. A variant of Insertion
Sort, named Local Insertion Sort,
sorts $\pi$ in $n(1 + \lceil \lg(\mathrm{nInv}/n) \rceil)$ comparisons
[6,7].
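
As a small illustration of the measure, the sketch below counts inversions directly and compares the count with the number of element shifts performed by a textbook insertion sort; each shift follows one successful comparison, and there is at most one additional unsuccessful comparison per inserted element. The function names are illustrative, and the quadratic-time counting is chosen for clarity rather than efficiency.

```python
def count_inversions(pi):
    """Number of pairs i < j with pi[i] > pi[j] (quadratic time, for clarity)."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])


def insertion_sort_shifts(pi):
    """Sort a copy of pi by insertion; return (sorted list, number of shifts)."""
    a, shifts = list(pi), 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift a larger element one slot to the right
            j -= 1
            shifts += 1
        a[j + 1] = x
    return a, shifts


if __name__ == "__main__":
    pi = [2, 0, 3, 1]
    print(count_inversions(pi), insertion_sort_shifts(pi))
```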

Simply encoding the n values $(\pi(i) - i)_{i \in [1 \ldots n]}$
using the $\gamma'$ code from Elias [12], and indexing
the positions of the beginning of each code by a
compressed bit vector, yields a compressed data
structure supporting the operator $\pi()$ in constant
time. The resulting data structure uses space
within $n(1 + 2 \lg \frac{\mathrm{nInv}}{n}) + o(n)$ bits. Support for
the operator $\pi^{-1}()$ can be added in two distinct
ways, either encoding both $\pi$ and $\pi^{-1}$ using this
technique within $2n(1 + 2 \lg \frac{\mathrm{nInv}}{n}) + o(n)$ bits,
which supports both operators $\pi()$ and $\pi^{-1}()$ in
constant time, or adding support for the operator
$\pi^{-1}()$ using Munro et al.’s shortcut succinct index
for permutations [1] described previously.


Removing Elements 

The preorder measure nRem counts the minimum
number of elements that must be removed from
a permutation so that what remains is already
sorted. Its exact value is n minus the length
of the Longest Increasing Subsequence, which
can be computed in time within O(n log n). Alternatively,
the value of nRem can be approximated
within a constant factor of 2 in 2(n − 1)
comparisons. Partitioning $\pi$ into the removed
elements and the remaining ones through a bit
vector of n bits, representing the order of the
2nRem elements in a wavelet tree (using any
of the data structures described above), and representing
the merging of both into n bits yield
a compressed data structure using space within
2n + 2nRem lg(n/nRem) + o(n) bits and supporting
the operators $\pi()$ and $\pi^{-1}()$ in sublinear
time, within O(1 + log(nRem + 1)).


Applications 


Integer Functions 

Munro et al. [1] extended the results on permutations
to arbitrary functions from $[1 \ldots n]$
to $[1 \ldots n]$. Again $f^k(i)$ indicates the function
iterated k times starting at i: if k is nonnegative,
this is straightforward. The case in which k is
negative is more complicated as the image is a
(possibly empty) multiset over $[1 \ldots n]$.

Whereas $\pi$ is a set of cycles, f can be viewed
as a set of cycles in which each node is the root of
a tree. Starting at any node (element of $[1 \ldots n]$),
the evaluation moves one step along a branch of
the tree, or one step along a cycle. Moving k
steps in a positive direction is straightforward,
and one moves up a tree and perhaps around a
cycle. When k is negative, one must determine
all nodes at distance k from the starting location,
i, in the direction toward the leaves of the trees.
The key technical issue is to run across succinct
tree representations picking off all nodes at the
appropriate levels. Using a raw encoding of the
permutation mapping integers to the nodes, and
Munro et al.’s shortcut succinct index [1] to
support the operations on it, yields the following
result:


Theorem 6 For any fixed $\epsilon, n > 0$ and $f : [1 \ldots n] \to [1 \ldots n]$,
there is a representation of
f using $(1 + \epsilon) n \lg n + O(1)$ bits of space to
compute $f^k(i)$ in time within $O(1 + |f^k(i)|)$,
for any integer k and for any integer $i \in [1 \ldots n]$.
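
The semantics of $f^k(i)$ for both signs of k can be illustrated with a naive, non-succinct sketch: forward iteration for nonnegative k, and a level-by-level expansion of preimages for negative k. The function name `iterate_function`, the 0-based indexing, and the explicit preimage lists are assumptions of this example; it does not achieve the space or time bounds of Theorem 6.

```python
def iterate_function(f, i, k):
    """Return f^k(i) as a list: one value if k >= 0, all |k|-step preimages if k < 0."""
    if k >= 0:
        for _ in range(k):
            i = f[i]
        return [i]
    # Explicit preimage lists (this naive table costs O(n) words).
    preimages = [[] for _ in f]
    for x, y in enumerate(f):
        preimages[y].append(x)
    level = [i]
    for _ in range(-k):
        # Move one step toward the leaves from every node of the current level.
        level = [x for y in level for x in preimages[y]]
    return level


if __name__ == "__main__":
    f = [1, 2, 0, 0, 3]  # cycle 0 -> 1 -> 2 -> 0, with a path 4 -> 3 -> 0 attached
    print(iterate_function(f, 4, 2), iterate_function(f, 0, -2))
```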


Open Problems 


Other Measures of Disorder 

Moffat and Petersson [7] list many measures of
preorder and adaptive sorting techniques. Each
measure explored above yields a compressed data
structure for permutations supporting the operators
$\pi()$ and $\pi^{-1}()$ in sublinear time. Each adaptive
sorting algorithm in the comparison model
yields a compression scheme for permutations,
but the encoding thus defined does not necessarily
support the simple application of the permutation
to a single element without decompressing
the whole permutation nor the application of the
inverse permutation. More work is required in
order to decide whether there are compressed
data structures for permutations, supporting the
operators $\pi()$ and $\pi^{-1}()$ in sublinear time and
using space proportional to the other preorder
measures [6, 7] (e.g., Reg, Exc, Block, and
Enc).


Sorting and Encoding Multisets 

Munro and Spira [13] showed how to sort multisets
through MergeSort, Insertion Sort,
and Heap Sort, adapting them with counters to
sort in time within $O(n(1 + H((m_1, \ldots, m_r))))$
where $m_i$ is the number of occurrences of i in
the multiset (note that this is orthogonal to the
results described in this chapter that depend on
the distribution of the lengths of monotone runs).
It seems easy to combine both approaches (e.g.,
on MergeSort in a single algorithm using both
runs and counters), yet quite hard to analyze
the complexity of the resulting algorithm and
compressed data structure. The difficulty measure
must depend not only on both the entropy of the
partition into runs and the entropy of the partition
of the values of the elements but also on the
interaction of those partitions.


Compressed Data Structures Supporting π^k()

In Munro et al.’s “cycle” data structure [1] for
supporting the operator π^k() (Theorem 2), the
raw encoding of the permutation σ representing
the cycles of π can be replaced by any compressed
data structure such as those described
here, with the warning that the compressibility of
σ depends not only on π but also on the order
in which its cycles are placed in σ. Whether
there is a compressed data structure supporting
the operator π^k() that takes advantage of this
order is an open question.
Problem Definition


The problem is to design a succinct representation
of balanced parentheses in a manner in which
a number of “natural” queries can be supported
quickly, and to use it to represent trees and graphs
succinctly. The problem of succinctly representing
balanced parentheses was initially proposed
by Jacobson [6] in 1989, when he proposed
succinct data structures, i.e., data structures that
occupy space close to the information-theoretic
lower bound to represent them, while supporting
efficient navigational operations. Succinct data
structures provide solutions to manipulate large
data in modern applications. The work of Munro
and Raman [8] provides an optimal solution to the
problem of balanced parentheses representation
under the word RAM model, based on which they
design succinct trees and graphs.


Balanced Parentheses 

Given a balanced parenthesis sequence of length 
2n, where there are n opening parentheses and 
n closing parentheses, consider the following 
operations: 


• findclose(i) (findopen(i)), the matching
closing (opening) parenthesis for the
opening (closing) parenthesis at position i;

• excess(i), the number of opening parentheses
minus the number of closing parentheses
in the sequence up to (and including)
position i;

• enclose(i), the closest enclosing (matching
parenthesis) pair of a given matching
parenthesis pair whose opening parenthesis
is at position i.
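
The four operations can be illustrated with a naive linear-time sketch. This is not the constant-time structure of Munro and Raman; it only pins down the semantics. It assumes 0-based positions and a Python string of "(" and ")" characters, and the function names simply mirror the operation names above.

```python
def excess(bp: str, i: int) -> int:
    """Opening minus closing parentheses in bp[0..i], inclusive."""
    prefix = bp[: i + 1]
    return prefix.count("(") - prefix.count(")")


def findclose(bp: str, i: int) -> int:
    """Position of the closing parenthesis matching the opening one at i."""
    assert bp[i] == "("
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("sequence is not balanced")


def findopen(bp: str, i: int) -> int:
    """Position of the opening parenthesis matching the closing one at i."""
    assert bp[i] == ")"
    depth = 0
    for j in range(i, -1, -1):
        depth += 1 if bp[j] == ")" else -1
        if depth == 0:
            return j
    raise ValueError("sequence is not balanced")


def enclose(bp: str, i: int):
    """Opening position of the closest pair enclosing the pair opened at i,
    or None if that pair is outermost."""
    assert bp[i] == "("
    skipped = 0  # closing parentheses seen whose partners are still to the left
    for j in range(i - 1, -1, -1):
        if bp[j] == ")":
            skipped += 1
        elif skipped == 0:
            return j
        else:
            skipped -= 1
    return None
```

For the sequence "(()(()))", findclose(bp, 3) is 6 and enclose(bp, 4) is 3.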


Trees 
There are essentially two forms of trees. An
ordinal tree is a rooted tree in which the children
of a node are ordered and specified by their ranks,
while in a cardinal tree of degree k, each child
of a node is identified by a unique number from
the set {1, 2, ..., k}. A binary tree is a cardinal
tree of degree 2. The information-theoretic lower
bound of representing an ordinal tree or binary
tree of n nodes is 2n − o(n) bits, as there are
C(2n, n)/(n + 1) different ordinal trees or binary
trees, where C(2n, n) denotes the binomial coefficient.
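
As a quick numeric check of this bound, the following sketch evaluates the count stated above and the base-2 logarithm of that count. The helper names are illustrative only.

```python
from math import comb, log2


def tree_count(n: int) -> int:
    """The count C(2n, n) / (n + 1) quoted in the text."""
    return comb(2 * n, n) // (n + 1)


def lower_bound_bits(n: int) -> float:
    """Bits needed to distinguish all the counted trees."""
    return log2(tree_count(n))
```

For n = 1000 the bound is a little under 2n = 2000 bits, which is consistent with 2n − o(n).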

Consider the following operations on ordinal 
trees (a node is referred to by its preorder num- 
ber): 


• child(x, i), the ith child of node x for i ≥ 1;

• child_rank(x), the number of left siblings
of node x;

• depth(x), the depth of x, i.e., the number of
edges in the rooted path to node x;

• parent(x), the parent of node x;

• nbdesc(x), the number of descendants of
node x;

• height(x), the height of the subtree rooted at
node x;

• LCA(x, y), the lowest common ancestor of
node x and node y.


On binary trees, the operations parent, 
nbdesc and the following operations are 
considered: 


• leftchild(x) (rightchild(x)), the left
(right) child of node x.


Graphs 

Consider an undirected graph G of n vertices
and m edges. Bernhart and Kainen [1] introduced
the concept of book embedding. A k-book
embedding of a graph is a topological embedding
of it in a book of k pages that specifies the
ordering of the vertices along the spine, and carries
each edge into the interior of one page, such
that the edges on a given page do not intersect.
Thus, a graph with one page is an outerplanar
graph. The pagenumber or book thickness [1] of
a graph is the minimum number of pages that
the graph can be embedded in. Planar graphs are
a very common type of graph, and any
planar graph can be embedded in at most four
pages [15]. Consider the following operations on
graphs:


• adjacency(x, y), whether vertices x and y
are adjacent;

• degree(x), the degree of vertex x;

• neighbors(x), the neighbors of vertex x.


Key Results 


All the results cited are under the word RAM
model with word size Θ(lg n) bits (lg n denotes
⌈log_2 n⌉), where n is the size of the problem
considered.


Theorem 1 ([8]) A sequence of balanced parentheses
of length 2n can be represented using
2n + o(n) bits to support the operations findclose,
findopen, excess and enclose in constant
time.


Succinct Data Structures for Parentheses Matching,
Fig. 1 An example of the balanced parenthesis sequence
of a given ordinal tree


There is a one-to-one correspondence between
balanced parenthesis sequences and ordinal trees: when
performing a depth-first traversal of the tree,
output an opening parenthesis each time a node
is visited, and a closing parenthesis immediately
after all the descendants of a node are visited
(see Fig. 1 for an example). The work of Munro
and Raman proposes a succinct representation
of ordinal trees using 2n + o(n) bits to support
depth, parent and nbdesc in constant time,
and child(x, i) in O(i) time. Lu and Yeh have
further extended this representation to support
child, child_rank, height and LCA in
constant time.
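
The traversal just described, and the way tree operations reduce to parenthesis operations, can be sketched as follows. The sketch makes several assumptions that are not in the source: a tree is a nested Python list (a node is the list of its children), a node is named by the 0-based position of its opening parenthesis rather than by its preorder number, and every operation is computed naively in linear time rather than in constant time.

```python
def to_bp(children: list) -> str:
    """Depth-first traversal: '(' on entering a node, ')' on leaving it."""
    return "(" + "".join(to_bp(c) for c in children) + ")"


def match(bp: str) -> list:
    """match(bp)[i] is the position of the parenthesis matching position i."""
    partner, stack = [0] * len(bp), []
    for i, c in enumerate(bp):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def depth(bp: str, p: int) -> int:
    """Edges on the path from the root: excess(p) - 1."""
    prefix = bp[: p + 1]
    return prefix.count("(") - prefix.count(")") - 1


def nbdesc(bp: str, p: int) -> int:
    """Proper descendants: half the parentheses strictly inside the pair."""
    return (match(bp)[p] - p - 1) // 2


def parent(bp: str, p: int):
    """Opening position of the closest enclosing pair (None for the root)."""
    partner = match(bp)
    for q in range(p - 1, -1, -1):
        if bp[q] == "(" and partner[q] > partner[p]:
            return q
    return None
```

For a root with two children, the first of which has two leaf children, to_bp([[[], []], []]) gives "((()())())".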


Theorem 2 ([8, 7]) An ordinal tree of n nodes
can be represented using 2n + o(n) bits to
support the operations child, child_rank, parent,
depth, nbdesc, height and LCA in constant
time.


A similar approach can be used to represent 
binary trees: 


Theorem 3 ([8]) A binary tree of n nodes can
be represented using 2n + o(n) bits to support
the operations leftchild, rightchild, parent and
nbdesc in constant time.


Finally, balanced parentheses can be used to
represent graphs. To represent a one-page graph, the
work of Munro and Raman proposes to list the
vertices from left to right along the spine. Each
vertex is represented by a pair of parentheses,
followed by zero or more closing parentheses and
then zero or more opening parentheses, where
the number of closing (or opening) parentheses is
equal to the number of adjacent vertices to its left
(or right) along the spine (see Fig. 2 for an
example). This representation can be applied to each
page to represent a graph with pagenumber k.
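
A minimal sketch of this one-page encoding and a naive decoder is given below. It assumes vertices are numbered 0..n−1 in spine order and that the given edges do not cross on the page; the function names are hypothetical, and the decoder runs in linear time rather than supporting constant-time queries.

```python
def encode_one_page(n: int, edges: list) -> str:
    """'()' per vertex, then one ')' per left neighbor, then one '(' per
    right neighbor, for vertices 0..n-1 in spine order."""
    left, right = [0] * n, [0] * n
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        right[u] += 1
        left[v] += 1
    return "".join("()" + ")" * left[v] + "(" * right[v] for v in range(n))


def decode_one_page(bp: str) -> list:
    """Recover the edge list by matching parentheses with a stack."""
    edges, stack, v, i = [], [], -1, 0
    while i < len(bp):
        if bp[i] == "(" and i + 1 < len(bp) and bp[i + 1] == ")":
            v += 1            # an adjacent '()' is always a vertex marker
            i += 2
        elif bp[i] == "(":
            stack.append(v)   # edge from v to some vertex on its right
            i += 1
        else:
            edges.append((stack.pop(), v))  # edge closed at vertex v
            i += 1
    return sorted(edges)
```

The decoder relies on the fact that an edge-opening parenthesis is always followed by another opening parenthesis (either a further edge or the next vertex marker), so an adjacent "()" can only be a vertex marker. A triangle on three vertices encodes to a sequence of 2n + 2m = 12 parentheses.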


Theorem 4 ([8]) An outerplanar graph of n
vertices and m edges can be represented using
2n + 2m + o(n + m) bits to support operations
adjacency and degree in constant time, and
neighbors(x) in time proportional to the degree
of x.


Theorem 5 ([8]) A graph of n vertices and m
edges with pagenumber k can be represented
using 2kn + 2m + o(nk + m) bits to support
operations adjacency and degree in O(k) time,
and neighbors(x) in O(d(x) + k) time where d(x)
is the degree of x. In particular, a planar graph
of n vertices and m edges can be represented
using 8n + 2m + o(n) bits to support operations
adjacency and degree in constant time, and
neighbors(x) in O(d(x)) time where d(x) is the
degree of x.


Applications 


Succinct Representation of Suffix Trees 

As a result of the growth of the textual data in
databases and on the World Wide Web, and also
applications in bioinformatics, various indexing
techniques have been developed to facilitate
pattern searching. Suffix trees [14] are a popular
type of text index. A suffix tree is constructed
over the suffixes of the text as a tree-based data
structure, so that queries can be performed by
searching the suffixes of the text. It takes O(m)
time to use a suffix tree to check whether an
arbitrary pattern P of length m is a substring of
a given text T of length n, and to count the number
of occurrences, occ, of P in T. O(occ) additional
time is required to list all the occurrences
of P in T. However, a standard representation of
a suffix tree requires somewhere between 4n lg n
and 6n lg n bits, which is impractical for many
applications.


Succinct Data Structures for Parentheses Matching,
Fig. 2 An example of the balanced parenthesis sequence of
a graph with one page

By reducing the space cost of representing
the tree structure of a suffix tree (using the
work of Munro and Raman), Munro, Raman
and Rao [9] have designed space-efficient suffix
trees. Given a string of n characters over a fixed
alphabet, they can represent a suffix tree using
n lg n + O(n) bits to support the search of
a pattern of length m in O(m + occ) time. To achieve this
result, they have also extended the work of Munro
and Raman to support various operations to
retrieve the leaves of a given subtree in an ordinal
tree. Based on similar ideas and by applying
compressed suffix arrays [5], Sadakane [13] has
proposed a different trade-off; his compressed
suffix tree occupies O(n lg σ) bits, where σ is
the size of the alphabet, and can support any
algorithm on a suffix tree with a slight slowdown
of a factor of polylog(n).


Succinct Representation of Functions 

Munro and Rao [11] have considered the problem
of succinctly representing a given function,
f : [n] → [n], to support the computation
of f^k(i) for an arbitrary integer k. The
straightforward representation of a function is
to store the sequence f(i), for i = 0, 1, ..., n − 1.
This takes n lg n bits, which is optimal. However,
the computation of f^k(i) takes O(k) time even
in the easier case when k is positive. To address
this problem, Munro and Rao [11] first extend
the representation of balanced parentheses to
support the next_excess(i, k) operator, which
returns the minimum j such that j > i and
excess(j) = k. They further use this operator
to support the level_anc(x, i) operator on
succinct ordinal trees, which returns the ith
ancestor of node x for i ≥ 0 (given a node x
at depth d, its ith ancestor is the ancestor of x
at depth d − i). Then, using succinct ordinal
trees with the support for level_anc, they
propose a succinct representation of functions
using (1 + ε) n lg n + O(1) bits for any fixed
positive constant ε, to support f^k(i) in constant
time when k > 0, and f^k(i) in O(1 + |f^k(i)|)
time when k < 0.
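
To make the semantics of f^k(i) concrete, here is a naive sketch over the straightforward array representation. It assumes f is a Python list over [0..n−1], and it takes f^k(i) for negative k to be the set of all j with f^(−k)(j) = i, which is why the stated query time depends on |f^k(i)|. It is not the succinct structure of Munro and Rao; the positive case costs O(k) steps and the negative case scans the whole array at each step.

```python
def power(f: list, i: int, k: int):
    """f^k(i): a single value for k >= 0, a sorted list of preimages for k < 0."""
    if k >= 0:
        for _ in range(k):
            i = f[i]
        return i
    frontier = {i}
    for _ in range(-k):
        # One step backwards: everything that maps into the current frontier.
        frontier = {j for j in range(len(f)) if f[j] in frontier}
    return sorted(frontier)
```

With f = [1, 2, 0, 0], power(f, 3, 2) follows 3 → 0 → 1, while power(f, 0, -1) collects the two preimages of 0.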


Multiple Parentheses and Graphs 

Chuang et al. [3] have proposed to succinctly
represent multiple parentheses, which is a string
of O(1) types of parentheses that may be
unbalanced. They have extended the operations
on balanced parentheses to multiple parentheses
and designed a succinct representation. Based on
the properties of canonical orderings for planar
graphs, they have used multiple parentheses and
the succinct ordinal trees to represent planar
graphs. One of their main results is a succinct
representation of planar graphs of n vertices and
m edges in 2m + (5 + ε)n + o(m + n) bits, for
any constant ε > 0, to support the operations
supported on planar graphs in Theorem 5 in
asymptotically the same amount of time. Chiang
et al. [2] have further reduced the space cost
to 2m + 3n + o(m + n) bits. In their paper,
they have also shown how to support the
operation wrapped(i), which returns the number
of matching parenthesis pairs whose closest
enclosing (matching parenthesis) pair is the pair
whose opening parenthesis is at position i, in
constant time on balanced parentheses. They
have used it to show how to support the operation
degree(x), which returns the degree of node x
(i.e., the number of its children), in constant time
on succinct ordinal trees.


Open Problems 


One open research area is to support more op- 
erations on succinct trees. For example, it is not 
known how to support the operation to convert 
a given node’s rank in a preorder traversal into its 
rank in a level-order traversal. 

Another open research area is to further reduce 
the space cost of succinct planar graphs. It is not 
known whether it is possible to further improve 
the encoding of Chiang et al. [2]. 

A third direction for future work is to design 
succinct representations of dynamic trees and 
graphs. There have been some preliminary results 
by Munro et al. [10] on succinctly representing 
dynamic binary trees, which have been further 
improved by Raman and Rao [12]. It may be 
possible to further improve these results, and 
there are other related dynamic data structures 
that do not have succinct representations. 


Experimental Results 


Geary et al. [4] have engineered the implementation
of succinct ordinal trees based on balanced
parentheses. They have performed experiments
on large XML trees. Their implementation uses
orders of magnitude less space than the standard
pointer-based representation, while supporting
tree traversal operations with only a slight
slowdown.
Problem Definition


The suffix array [4, 15] is the lexicographically 
sorted array of all the suffixes of a string. It is a 
popular text index structure with many applica- 
tions. The subject of this entry is algorithms that 
construct the suffix array. 

More precisely, the input to a suffix array
construction algorithm is a text string $T =
T[0..n) = t_0 t_1 \ldots t_{n-1}$, i.e., a sequence
of n characters from an alphabet Σ. For
$i \in [0..n]$, let $S_i$ denote the suffix $T[i..n) =
t_i t_{i+1} \ldots t_{n-1}$. The output is the suffix array
SA[0..n] of T, a permutation of [0..n]
satisfying $S_{SA[0]} < S_{SA[1]} < \cdots < S_{SA[n]}$,
where < denotes the lexicographical order of
strings.
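
The definition can be written down directly as a naive sketch. It assumes T is a Python string and follows the convention above that the empty suffix S_n is included, so the result has n + 1 entries.

```python
def naive_suffix_array(text: str) -> list:
    """Sort the start positions 0..n by the suffix that begins there."""
    return sorted(range(len(text) + 1), key=lambda i: text[i:])
```

For "banana" this yields [6, 5, 3, 1, 0, 4, 2]: the empty suffix first, then a, ana, anana, banana, na, nana. Comparing whole suffixes makes this sketch quadratic or worse in the worst case.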

Two specific models for the alphabet Σ
are considered. An ordered alphabet is an
arbitrary ordered set with constant time character
comparisons. An integer alphabet is the integer
range [1..σ] for $\sigma = n^{O(1)}$.

Many applications require that the suffix array
is augmented with additional information, most
commonly with the longest common prefix array
LCP[1..n]. An entry LCP[i] of the LCP array
is the length of the longest common prefix of the
suffixes $S_{SA[i]}$ and $S_{SA[i-1]}$. The enhanced suffix
array [1] adds two more arrays to obtain a full
range of text index functionalities.

There are other important text indexes, most
notably suffix trees and compressed text indexes,
covered in separate entries. Each of these indexes
has its own construction algorithms, but they
can also be constructed efficiently from each
other. However, in this entry, the focus is on direct
suffix array construction algorithms that do not
rely on other text indexes.


Key Results 


The naive approach to suffix array construction is
to use a general sorting algorithm or an algorithm
for sorting strings. However, any such algorithm
has a worst-case time complexity Ω(n^2) because
the total length of the suffixes is Θ(n^2).

The first efficient algorithms were based on the
doubling technique of Karp, Miller, and Rosenberg
[10]. The idea is to assign a rank to all
substrings whose length is a power of two. The
rank tells the lexicographic order of the substring
among substrings of the same length. Given the
ranks for substrings of length h, the ranks for
substrings of length 2h can be computed using
a radix sort step in linear time (doubling). The
technique was first applied to suffix array
construction by Manber and Myers [15]. The best
practical algorithm based on the technique is by
Larsson and Sadakane [14].
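
The doubling idea can be sketched as follows. This is a simplified illustration, not the algorithm of Manber and Myers or of Larsson and Sadakane: it uses Python's comparison sort in each round instead of a radix sort, so it runs in O(n log^2 n) rather than O(n log n) time. The empty suffix is prepended at the end to match the SA[0..n] convention.

```python
def suffix_array_doubling(text: str) -> list:
    n = len(text)
    rank = [ord(c) for c in text]   # ranks of substrings of length 1
    sa = list(range(n))
    h = 1
    while True:
        # A substring of length 2h is identified by the ranks of its two
        # halves of length h; -1 stands for "past the end of the text".
        def key(i):
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for a, b in zip(sa, sa[1:]):
            new_rank[b] = new_rank[a] + (key(a) != key(b))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:   # all ranks are distinct
            break
        h *= 2
    return [n] + sa
```

Each round doubles the length of the compared prefixes, so at most about log n rounds are needed before all suffixes receive distinct ranks.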


Theorem 1 (Manber and Myers [15]; Larsson 
and Sadakane [14]) The suffix array can be 
constructed in O(n logn) time, which is optimal 
for the ordered alphabet. 


Faster algorithms for the integer alphabet are 
based on a different technique, recursion. The 
basic procedure is as follows. 


1. Sort a subset of the suffixes. This is done 
by constructing a shorter string, whose suffix 
array gives the order of the desired subset. The 
suffix array of the shorter string is constructed 
by recursion. 

2. Extend the subset order to full order. 
The technique first appeared in suffix tree
construction [3], but 2003 saw the independent and
simultaneous publication of three linear time suffix
array construction algorithms based on the
approach but not using suffix trees. Each of the
three algorithms uses a different subset of suffixes,
requiring a different implementation of the
second step.


Theorem 2 (Kärkkäinen, Sanders, and
Burkhardt [8]; Kim et al. [12]; Ko and
Aluru [13]) The suffix array can be constructed
in the optimal linear time for the integer alphabet.


We will describe the algorithm of Kärkkäinen,
Sanders, and Burkhardt [8], called DC3, in more
detail. For $k \in \{0, 1, 2\}$, let $R_k$ be the set of suffixes
$S_i$ such that i mod 3 = k. Let $R_{12} = R_1 \cup R_2$
and define $R_{01}$ and $R_{02}$ symmetrically. For
example, $R_{12} = \{S_1, S_2, S_4, S_5, S_7, S_8, \ldots\}$.
The set $R_{12}$ is the subset of suffixes sorted first.
For $S_i \in R_{12}$, let $\overline{S_i}$ be the lexicographical
rank of $S_i$ in $R_{12}$. Given those lexicographical
ranks, we can compare any two suffixes $S_i$ and
$S_j$ in constant time using one of the following
ways:


1. If $S_i, S_j \in R_{12}$, compare the ranks $\overline{S_i}$ and $\overline{S_j}$.
2. If $S_i, S_j \in R_{01}$, compare the pairs $(t_i, \overline{S_{i+1}})$
and $(t_j, \overline{S_{j+1}})$.
3. If $S_i, S_j \in R_{02}$, compare the triples
$(t_i, t_{i+1}, \overline{S_{i+2}})$ and $(t_j, t_{j+1}, \overline{S_{j+2}})$.


Furthermore, we can radix sort $R_0$ in linear time
by using $(t_i, \overline{S_{i+1}})$ to represent the suffix $S_i \in
R_0$. After this, we can merge $R_0$ and $R_{12}$, which
takes linear time since we can compare suffixes
in constant time.

We still need to describe how to sort $R_{12}$.
Let $\overline{t_i t_{i+1} t_{i+2}}$ be the lexicographical rank of
the substring $t_i t_{i+1} t_{i+2}$ among all substrings of
length three. Let

$$T_{12} = \overline{t_1 t_2 t_3}\;\overline{t_4 t_5 t_6}\;\overline{t_7 t_8 t_9} \cdots \overline{t_2 t_3 t_4}\;\overline{t_5 t_6 t_7}\;\overline{t_8 t_9 t_{10}} \cdots .$$

For example, if T = yabbadabbado, we have

$$T_{12} = \overline{\mathtt{abb}}\;\overline{\mathtt{ada}}\;\overline{\mathtt{bba}}\;\overline{\mathtt{do\$}}\;\overline{\mathtt{bba}}\;\overline{\mathtt{dab}}\;\overline{\mathtt{bad}}\;\overline{\mathtt{o\$\$}} = 1\,2\,5\,7\,5\,6\,4\,8,$$

where \$ is a special padding symbol that does not
appear in the text and is considered smaller than
any normal character. Clearly, sorting the suffixes
of $T_{12}$ is equivalent to sorting the set $R_{12}$. The
suffixes of $T_{12}$ are sorted by a recursive call to
the algorithm itself. Since the recursive call is for
a text of length at most $\lceil 2n/3 \rceil$ and everything
outside the recursive call can be done in linear
time, the total time complexity of DC3 is O(n).
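
The construction of the reduced string can be sketched as follows. This covers only that one step, not the whole of DC3. It assumes that T is a Python string, that "$" does not occur in T and compares smaller than every text character (true for ASCII letters), and that a comparison sort stands in for the radix sort the linear-time bound requires. It also omits the extra dummy triple that a complete implementation may need between the two halves so that suffixes of the reduced string never run across the boundary.

```python
def dc3_reduced_string(text: str, pad: str = "$") -> list:
    n = len(text)
    padded = text + pad * 3
    # All substrings of length three, padded at the end of the text.
    triples = [padded[i:i + 3] for i in range(n)]
    # Lexicographical rank of each distinct triple, starting from 1.
    rank = {t: r for r, t in enumerate(sorted(set(triples)), start=1)}
    # Positions i with i mod 3 = 1 first, then those with i mod 3 = 2.
    positions = list(range(1, n, 3)) + list(range(2, n, 3))
    return [rank[triples[i]] for i in positions]
```

For T = yabbadabbado this reproduces the sequence 1 2 5 7 5 6 4 8 from the example above.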

The above algorithms and many other suf- 
fix array construction algorithms are surveyed 
in [18]. Worth mentioning among the more recent 
results are the linear time algorithms of Nong, 
Zhang, and Chan [17]. 

The Ω(n log n) lower bound for the ordered
alphabet mentioned in Theorem 1 comes from
the sorting complexity of characters, since the
initial characters of the sorted suffixes are the text
characters in sorted order. Theorem 2 allows a
generalization of this result. For any alphabet, one
can first sort the characters of T, remove duplicates,
assign a rank to each character, and construct
a new string T' over the alphabet [1..n]
by replacing the characters of T with their ranks.
The suffix array of T' is exactly the same as
the suffix array of T. Optimal algorithms for the
integer alphabet then give the following result.


Theorem 3 For any alphabet, the complexity of 
suffix array construction is the same as the com- 
plexity of sorting the characters of the string. 
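
The reduction behind this theorem can be sketched in a few lines. The function name is illustrative, and Python's built-in sort stands in for whichever character sorting method the alphabet allows.

```python
def to_integer_alphabet(text) -> list:
    """Replace every character by its rank among the distinct characters."""
    rank = {c: r for r, c in enumerate(sorted(set(text)), start=1)}
    return [rank[c] for c in text]
```

Because the ranks preserve the order of the characters, the suffixes of the new string sort in the same order as the suffixes of the original text.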


The result extends to the related arrays. 


Theorem 4 (Kasai et al. [11]; Abouelhoda, 
Kurtz, and Ohlebusch [1]) The LCP array and 
the enhanced suffix array can be computed in 
linear time given the suffix array. 
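
For the LCP array, the linear-time computation can be sketched along the lines of the algorithm of Kasai et al. [11]. The sketch assumes the conventions of this entry: SA has n + 1 entries with the empty suffix at SA[0], and LCP[i] compares the suffixes at SA[i] and SA[i − 1], so index 0 of the returned list is unused.

```python
def lcp_array(text: str, sa: list) -> list:
    n = len(text)
    rank = [0] * (n + 1)            # inverse of the suffix array
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * (n + 1)             # lcp[0] is unused
    h = 0
    for i in range(n):              # suffixes in text order
        j = sa[rank[i] - 1]         # predecessor of suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1                  # suffix i + 1 shares at least h - 1 symbols
    return lcp
```

The counter h decreases by at most one per iteration and never exceeds n, which bounds the total number of character comparisons by O(n).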


One of the main advantages of suffix 
arrays over suffix trees is their smaller space 
requirement (by a constant factor), and a 
significant effort has been spent making 
construction algorithms space efficient, too. The 
best algorithms need very little extra space. 


Theorem 5 (Kärkkäinen, Sanders, and
Burkhardt [8]; Nong [16]) For any v =
O(n^{2/3}), the suffix array can be constructed in
O(n(v + log n)) time and O(n/√v) extra space
for the ordered alphabet and in O(nv) time and
O(n/√v) extra space or O(n) time and O(σ)
extra space for the integer alphabet, where the
extra space is the space needed in addition to
the input (the string T) and the output (the suffix
array).


In the algorithm DC3 described above, all 
steps can be performed by sorting, prefix sums 
(assigning lexicographical ranks) and localized 
computation. This makes it straightforward to 
adapt to several parallel and hierarchical memory 
models of computation [8] including the fol- 
lowing result for the standard external memory 
model. 


Theorem 6 (Kärkkäinen, Sanders, and
Burkhardt [8]) The suffix array can be
constructed in the optimal O(sort(n)) I/Os in the
standard external memory model, where sort(n)
is the I/O complexity of sorting n elements.


The above algorithm can be modified to com- 
pute the LCP array too in the same I/O complex- 
ity [2,7]. 


Applications 


The suffix array is a simple and powerful text
index structure with numerous applications; see [1]
and Cross-References. The practical construction
of many other text indexes usually starts with
the suffix array construction. In particular, the
Burrows-Wheeler transform, which is an important
technique for text compression and the
basis of many compressed text indexes, is easily
computed from the suffix array.
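
As an illustration of the last point, the following sketch derives the Burrows-Wheeler transform of the text followed by a sentinel from the suffix array. It assumes the SA[0..n] convention of this entry (the empty suffix plays the role of the sentinel suffix) and that "$" does not occur in the text; the function name is hypothetical.

```python
def bwt_from_suffix_array(text: str, sa: list, sentinel: str = "$") -> str:
    """Emit, for each suffix in sorted order, the character preceding it."""
    return "".join(text[i - 1] if i > 0 else sentinel for i in sa)
```

For "banana" with suffix array [6, 5, 3, 1, 0, 4, 2], the result is "annb$aa".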


Open Problems 


Theoretically, the suffix array construction problem
is essentially solved. The development of
ever more efficient practical algorithms is still
going on, particularly for external memory and
parallel computation. There is currently no
external memory algorithm for computing the LCP
array from the suffix array in O(sort(n)) I/Os
other than as a side effect of suffix array
construction [6].


Experimental Results 


Many papers on suffix array construction contain
experimental results, but they are usually either
out of date (e.g., [18]) or limited in scope
(e.g., [16]). The most comprehensive comparison
of algorithms is at
https://code.google.com/p/libdivsufsort/wiki/SACA_Benchmarks.
The best practical algorithms for large data are
divsufsort, which is an O(n log n) time
algorithm combining several techniques, and
SAIS, which is an implementation of the linear
time algorithm by Nong, Zhang, and Chan [17]
(see below for URLs to code). The comparison
and the fastest implementation are by the same
person, Yuta Mori, but the implementations are
widely used and there are no substantial claims
for other, faster algorithms.

There are also experiments for suffix array 
construction in external memory [2,5] and for 
LCP array construction [2, 6, 9]. 


URLs to Code and Data Sets 


The input to a suffix array construction algorithm
is simply a text, so an abundance of data exists.
Links to many text collections are provided at
https://code.google.com/p/libdivsufsort/wiki/SACA_Benchmarks.
Worth mentioning is also the Pizza&Chili site
with its standard text corpus
http://pizzachili.dcc.uchile.cl/texts.html and the
repetitive text corpus
http://pizzachili.dcc.uchile.cl/repcorpus.html.

Notable implementations of suffix array
construction algorithms are available at
https://code.google.com/p/libdivsufsort/, at
https://sites.google.com/site/yuta256/sais, at
http://panthema.net/2012/1119-eSAIS-Inducing-Suffix-and-LCP-Arrays-in-External-Memory/ [2],
and at https://www.cs.helsinki.fi/group/pads/SAscan.html [5].
The latter two work in external memory and
provide (links to) LCP array construction too.
Problem Definition

The suffix tree is perhaps the best-known and 
most-studied data structure for string indexing 
with applications in many fields of sequence 
analysis. After its invention in the early 1970s, 
several approaches for the efficient construction 
of the suffix tree of a string have been developed 
for various models of computation. The most 
prominent of those that construct the suffix tree 
in main memory are summarized in this entry. 


Notations 
Given an alphabet Σ, a trie over Σ is a rooted
tree whose edges are labeled with strings over
Σ such that no two labels of edges leaving the
same vertex start with the same symbol. A trie
is compacted if all its internal vertices, except
possibly the root, are branching. Given a finite
string S ∈ Σ^n, the suffix tree of S, T(S), is
the compacted trie over Σ such that the
concatenations of the edge labels along the paths from
the root to the leaves are the suffixes of S. An
example is given in Fig. 1.

The concatenation of the edge labels from the
root to a vertex v of T(S) is called the path-label
of v, P(v). For example, the path label of
the vertex indicated by the asterisk in Fig. 1 is
P(*) = MAM.


Suffix Tree Construction, Fig. 1 The suffix tree for
the string S = MAMMAMIA. Dashed arrows denote
suffix links that are employed by all efficient suffix tree
construction algorithms


Constraints 

The time complexity of constructing the suffix
tree of a string S of length n depends on the
size of the underlying alphabet Σ. It may be
constant, it may be the alphabet of integers Σ =
{1, 2, ..., n}, or it may be an arbitrary finite set
whose elements can be compared in constant
time. Note that the latter case reduces to the
previous one if one maps the symbols of the alphabet
to the set {1, ..., n}, though at the additional cost
of sorting Σ.


Problem 1 (suffix tree construction) 


INPUT: A finite string S of length n over an
alphabet Σ.
OUTPUT: The suffix tree T(S). 


If one assumes that the outgoing edges at 
each vertex are lexicographically sorted, which is 
usually the case, the suffix tree allows retrieving 
the sorted order of S’s characters in linear time. 
Therefore, suffix tree construction inherits the
lower bounds from the problem complexity of
sorting: Ω(n log n) in the general alphabet case
and Ω(n) for integer alphabets.


Key Results 


Theorem 1 The suffix tree of a string of length 
n can be represented in O(n logn) bits of space. 


This is easy to see since the number of leaves
of T(S) is at most n, and so is the number of
internal vertices that, by definition, are all
branching, as well as the number of edges. In order to
see that each edge label can be stored in O(log n)
bits of space, note that an edge label is always
a substring of S. Hence it can be represented by
a pair (l, r) consisting of left pointer l and right
pointer r, if the label is S[l, r].

Note that this space bound is not optimal since
there are |Σ|^n different strings and hence suffix
trees, while n log n bits would suffice to represent
n! different entities.


Theorem 2 Suffix trees can be constructed in
optimal time, in particular:

1. For constant-size alphabet, the suffix tree
T(S) of a string S of length n can be
constructed in O(n) time [11-13]. For general
alphabet, these algorithms require O(n log n)
time.

2. For integer alphabet, the suffix tree of S can
be constructed in O(n) time [4, 9].


Generally, there is a natural strategy to construct a 
suffix tree: Iteratively all suffixes are inserted into 
an initially empty structure. Such a strategy will 
immediately lead to a linear-time construction al- 
gorithm if each suffix can be inserted in constant 
time. Finding the correct position where to insert 
a suffix, however, is the main difficulty of suffix 
tree construction. 
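
The natural strategy can be sketched with a naive quadratic-time construction that searches for every insertion point from the root. None of the efficient algorithms discussed in this entry work this way; the sketch only illustrates the data structure. It assumes that S ends with a unique terminal symbol such as "$", so that no suffix is a prefix of another, and it stores each edge label as a pair of indices into S (here half-open, S[l:r]). The class and function names are illustrative.

```python
class Node:
    def __init__(self):
        # first symbol of the edge label -> (l, r, child), label is s[l:r]
        self.children = {}


def naive_suffix_tree(s: str) -> Node:
    root, n = Node(), len(s)
    for start in range(n):                  # insert suffixes, longest first
        node, i = root, start
        while i < n:
            if s[i] not in node.children:
                node.children[s[i]] = (i, n, Node())    # new leaf edge
                break
            l, r, child = node.children[s[i]]
            k = 0                           # symbols matched along this edge
            while l + k < r and i + k < n and s[l + k] == s[i + k]:
                k += 1
            if l + k < r:
                # Mismatch inside the edge: split it at a new branching vertex.
                mid = Node()
                mid.children[s[l + k]] = (l + k, r, child)
                node.children[s[i]] = (l, l + k, mid)
                child = mid
            node, i = child, i + k
    return root


def leaf_labels(node: Node, s: str, prefix: str = "") -> list:
    """Path labels of all leaves, visiting children in sorted order."""
    if not node.children:
        return [prefix]
    labels = []
    for c in sorted(node.children):
        l, r, child = node.children[c]
        labels += leaf_labels(child, s, prefix + s[l:r])
    return labels
```

Reading the leaves with the children sorted by first symbol yields the suffixes in lexicographical order, which is the property behind the sorting lower bound mentioned earlier.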

The first solution for this problem was given
by Weiner in his seminal 1973 paper [13]. His
algorithm inserts the suffixes from shortest to
longest, and the insertion point is found in
amortized constant time for constant-size alphabet,
using a rather complicated set of additional
data structures. A simplified version of the
algorithm was presented by Chen and Seiferas [3].
They give a cleaner presentation of the three
types of links that are required in order to find
the insertion points of suffixes efficiently, and
their complexity proof is easier to follow. Since
the suffix tree is constructed while reading the
text from right to left, these two algorithms are
sometimes called anti-online constructions.

A different algorithm was given in 1976 by
McCreight [11]. In this algorithm the suffixes
are inserted into the growing tree from longest
to shortest. This simplifies the update procedure,
and the additional data structure is limited to just
one type of link: an internal vertex v with path
label P(v) = aw for some symbol a ∈ Σ
and string w ∈ Σ* has a suffix link to the
vertex u with path label P(u) = w. In Fig. 1,
suffix links are shown as dashed arrows. They
often connect vertices above the insertion points
of consecutively inserted suffixes, like the vertex
with path-label “M” and the root, when inserting
suffixes “MAMIA” and “AMIA” in the example
of Fig. 1. This property allows reaching the next
insertion point without having to search for it
from the root of the tree, thus ensuring amortized
constant time per suffix insertion. Note that since
McCreight’s algorithm treats the suffixes from
longest to shortest and the intermediate structures
are not suffix trees, the algorithm is not an online
algorithm.

Another linear-time algorithm for constant- 
size alphabet is the online construction by Ukko- 
nen [12]. It reads the text from left to right and 
updates the suffix tree in amortized constant time 
per added symbol. Again, the algorithm uses 
suffix links in order to quickly find the insertion 
points for the suffixes to be inserted. Moreover, 
since during a single update the edge labels of 
all leaf edges need to be extended by the new 
symbol, it requires a trick to extend all these 
labels in constant time: all the right pointers of the 
leaf edges refer to the same end of string value, 
which is just incremented. 

An even stronger concept than online
construction is real-time construction, where
the worst-case (instead of amortized) time per
symbol is considered. Amir et al. [1] present
for general alphabet a suffix tree construction
algorithm that requires O(log n) worst-case
update time per input symbol when
the text is read from right to left, and thus requires
overall O(n log n) time, like the other algorithms
for general alphabet mentioned so far. They
achieve this goal using a binary search tree on
the suffixes of the text, enhanced by additional
pointers representing the lexicographic and the
textual order of the suffixes, called Balanced
Indexing Structure. This tree can be constructed
in O(log n) worst-case time per added symbol
and allows maintaining the suffix tree in the same
time bound.

The first linear-time suffix tree construction 
algorithm for integer alphabets was given 
by Farach-Colton [4]. It uses the so-called 
odd-even technique that proceeds in three 
steps: 


1. Recursively compute the compacted trie of all
suffixes of S beginning at odd positions, called
the odd tree T_o.

2. From T_o compute the even tree T_e, the
compacted trie of the suffixes beginning at even
positions in S.

3. Merge T_o and T_e into the whole suffix tree
T(S).


The basic idea of the first step is to encode pairs 
of characters as single characters. Since at most 
n/2 different such characters can occur, these can 
be radix-sorted and range-reduced to an alphabet 
of size n/2. Thus, the string S of length n over the
integer alphabet Σ = {1,...,n} is translated in
O(n) time into a string S’ of length n/2 over the
integer alphabet Σ’ = {1,...,n/2}. Applying
the algorithm recursively to this string yields 
the suffix tree of S’. After translating the edge 
labels from substrings of S’ back to substrings 
of S, some vertices may exist with outgoing 
edges whose labels start with the same symbol, 
because two distinct symbols from Σ’ may be
pairs with the same first symbol from Σ. In such
cases, by local modifications of edge labels or
adding additional vertices, the trie property can
be regained and the desired tree T_o is obtained.
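
The pair-encoding step can be sketched as follows. This is an illustration only: the function name `encode_pairs` is hypothetical, Python's comparison-based `sorted` stands in for the linear-time radix sort, indices are 0-based, and a sentinel 0 pads strings of odd length.

```python
def encode_pairs(s):
    """Map S (a list of positive ints) to S', where S'[j] is the rank of the
    pair (S[2j], S[2j+1]) among all pairs that occur (0-based indexing)."""
    padded = s + [0] if len(s) % 2 else list(s)
    pairs = [(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
    # sorted() replaces the radix sort used to stay within O(n) time.
    rank = {p: r for r, p in enumerate(sorted(set(pairs)), start=1)}
    return [rank[p] for p in pairs]


print(encode_pairs([1, 2, 1, 2, 2, 1]))
```

The result has half the length of the input, and its symbols are again integers bounded by its own length, so the same procedure can be applied recursively.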

In the second step, the odd tree T_o from the
first step is used to generate the lexicographically
sorted list (lex-ordering for short) of the suffixes
starting at odd positions. Radix-sorting these with
the characters at the preceding even positions as
keys yields a lex-ordering of the even suffixes
in linear time. Together with the longest common
prefixes (lcps) of consecutive positions that
can be computed in linear time from T_o using
constant-time lowest common ancestor queries 
and the identity 


\[
\mathrm{lcp}(l_{2i}, l_{2j}) =
\begin{cases}
\mathrm{lcp}(l_{2i+1}, l_{2j+1}) + 1 & \text{if } S[2i] = S[2j], \\
0 & \text{otherwise,}
\end{cases}
\]


this ordering allows reconstructing the even tree 
Te in linear time. 
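
The derivation of the even ordering from the odd one can be sketched as follows. This is a simplified illustration: the function name is hypothetical, positions are 1-based as in the text, and `sorted` with a two-component key stands in for the radix sort.

```python
def even_order_from_odd(s, odd_sorted):
    """s: the string (positions are 1-based). odd_sorted: the odd starting
    positions, in lexicographic order of their suffixes.
    Returns the even starting positions in lexicographic order."""
    n = len(s)
    rank = {n + 1: 0}                      # the empty suffix ranks lowest
    for r, p in enumerate(odd_sorted, start=1):
        rank[p] = r
    # An even suffix is one character followed by the next (odd) suffix,
    # so the pair (character, rank of that odd suffix) is a valid sort key.
    return sorted(range(2, n + 1, 2), key=lambda p: (s[p - 1], rank[p + 1]))


s = "mississippi"
odd = sorted(range(1, len(s) + 1, 2), key=lambda p: s[p - 1:])
print(even_order_from_odd(s, odd))
```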

In the third step, the two tries T_o and T_e are
merged into the suffix tree T(S). Conceptually,
this is a straightforward procedure: the two tries
are traversed in parallel, and every part that is
present in one or both of the two trees is inserted
in the common structure. However, this procedure
is simple only if edges are traversed character
by character such that common and differing
parts can be observed directly. Such a traversal
would, however, require O(n^2) time in the worst
case, impeding the desired overall linear running
time. Therefore, Farach-Colton suggests using an
oracle that tells for an edge of T_o and an edge of
T_e the length of their common prefix.

However, the suggested oracle may overes- 
timate this length, and that is why sometimes 
the tree generated must be corrected, called un- 
merging. The full details of the oracle and the 
unmerging procedure can be found in [4]. 

Overall, if T(n) is the time it takes to build the
suffix tree of a string S ∈ {1,...,n}^n, the first
step takes T(n/2) + O(n) time and the second
and third steps take O(n) time; thus the whole
procedure takes O(n) overall time on the RAM
model.
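
The linear bound follows by unrolling the recurrence. As a short worked derivation, write c for the constant hidden in the O(n) terms (c is introduced here only for illustration):

```latex
\begin{align*}
T(n) &\le T(n/2) + c\,n \\
     &\le c\,n\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \\
     &\le 2c\,n = O(n).
\end{align*}
```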

Another linear-time construction of suffix 
trees for integer alphabets can be achieved 
via linear-time construction of suffix arrays 
together with longest common prefix tabulation, 
as described by Kärkkäinen and Sanders in [9].

All previously mentioned algorithms construct 
the suffix tree in main memory. However, since 
the data structure may become very large in 
practice, also methods for building the suffix tree 
in secondary memory have been studied. Possibly 
the simplest way is to first construct the suffix 
array A and the LCP array on disk, as described in 
the entry “Suffix Array Construction”. When this
is done, it is only a small final step to construct
the suffix tree [4]. The idea is to construct the tree
in n phases from left to right, such that after phase
i the suffix tree of the strings A[1], ..., A[i] has
been constructed. Simultaneously, an external-
memory stack containing the nodes on the path
leading from the root to A[i] is maintained. In
phase i + 1, first, the leaf representing string
A[i + 1] is created, and then all nodes are popped
from the stack whose string depth is strictly
greater than LCP[i]. Next, a new node with string
depth LCP[i] is created (unless it already exists)
whose parent is the top element of the stack and 
whose children are the last popped element and 
the new leaf. This new node and the new leaf 
are finally pushed on the stack. Keeping the two 
top pages of the stack in internal memory, the 
algorithm executes a total of O(n) pop and push 
operations and therefore uses a total of O(n/B) 
time, where B is the external memory block size. 
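
The stack-based phase structure can be sketched in main memory as follows. This is an illustration under stated assumptions, not the external-memory procedure itself: an ordinary Python list replaces the external-memory stack, indices are 0-based, `lcp[k]` is the LCP of the suffixes at `sa[k-1]` and `sa[k]`, the text ends with a unique endmarker, and all names are hypothetical.

```python
class Node:
    def __init__(self, depth, leaf=None):
        self.depth = depth      # string depth of the node
        self.leaf = leaf        # suffix start for a leaf, None otherwise
        self.children = []


def suffix_tree_from_sa_lcp(text, sa, lcp):
    root = Node(0)
    stack = [root]              # nodes on the path from the root to the last leaf
    for k, start in enumerate(sa):
        h = lcp[k] if k > 0 else 0
        last = None
        while stack[-1].depth > h:          # pop nodes deeper than the LCP
            last = stack.pop()
        if stack[-1].depth < h:             # no node of depth h yet: split
            mid = Node(h)
            stack[-1].children[-1] = mid    # `last` was the rightmost child
            mid.children.append(last)
            stack.append(mid)
        leaf = Node(len(text) - start, leaf=start)
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def leaves(node):
    """Leaf labels in left-to-right order."""
    if node.leaf is not None:
        return [node.leaf]
    return [x for c in node.children for x in leaves(c)]


def internal_depths(node):
    """String depths of all internal nodes below `node`."""
    out = []
    for c in node.children:
        if c.leaf is None:
            out.append(c.depth)
            out.extend(internal_depths(c))
    return out
```

Each node is pushed and popped at most once, which is where the O(n) bound on stack operations comes from.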

Other more direct ways to construct the 
suffix tree on disk have also been developed, 
e.g., [14, 15]. 

In some applications the so-called generalized 
suffix tree of several strings is used, a dictionary 
obtained by constructing the suffix tree of the 
concatenation of the contained strings. An im- 
portant question that arises in this context is that 
of dynamically updating the tree upon insertion 
and deletion of strings from the dictionary. More 
specifically, since edge labels are stored as pairs 
of pointers into the original string, when deleting 
a string from the dictionary, the corresponding 
pointers may become invalid and need to be 
updated. An algorithm to solve this problem in 
amortized linear time was given by Fiala and 
Greene [6], and a linear worst-case (and hence 
real-time) algorithm was given by Ferragina et 
al. [5]. 


Applications 

The suffix tree supports many applications, most 
of them in optimal time and space, including 
exact string matching, set matching, longest com- 
mon substring of two or more sequences, all- 
pairs suffix-prefix matching, repeat finding, and 
text compression. These and several other appli- 
cations, many of them from bioinformatics, are 
given in [2] and [8]. 


Open Problems 

Some theoretical questions regarding the
expected size and branching structure of suffix
trees under sequence models more complicated
than i.i.d. are still open. Currently most
of the research has moved toward more space- 
efficient data structures like suffix arrays and 
compressed string indices or the Burrows- 
Wheeler Transform. 

Experimental Results

Suffix trees are infamous for their high memory 
requirements. The practical space consumption 
is between 9 and 11 times the size of the string 
to be indexed, even in the most space-efficient 
implementations known [7, 10]. Moreover, [7] 
also shows that suboptimal algorithms like the 
very simple quadratic-time write-only top-down 
(WOTD) algorithm can outperform optimal algo- 
rithms on many real-world instances in practice, 
if carefully engineered. 


URLs to Code and Data Sets 

Several sequence analysis libraries contain code
for suffix tree construction. For example, Strmat
(http://www.cs.ucdavis.edu/~gusfield/strmat.html)
by Gusfield et al. contains implementations
of Weiner’s and Ukkonen’s algorithms. An implementation
of the WOTD algorithm by Kurtz can
be found at http://bibiserv.techfak.uni-bielefeld.de/wotd.

Problem Definition


The suffix tree is the ubiquitous data structure of
combinatorial pattern matching, used in a myriad of
situations: just to cite a few, searching, data compression
and mining, and bioinformatics [7]. In these
applications, the large data sets now available in- 
volve the use of numerous memory levels which 
constitute the storage medium of modern PCs: L1 
and L2 caches, internal memory, multiple disks, 
and remote hosts over a network. The power of 
this memory organization is that it may be able 
to offer the expected access time of the fastest 
level (i.e., cache) while keeping the average cost 
per memory cell near the one of the cheapest 
level (i.e., disk), provided that data are properly 
cached and delivered to the requiring algorithms. 
Neglecting questions pertaining to the cost of 
memory references may even prevent the use 
of algorithms on large sets of input data. Engi- 
neering research is presently trying to improve 
the input/output subsystem to reduce the impact 
of these issues, but it is very well known [20] 
that the improvements achievable by means of 
a proper arrangement of data and a properly 
structured algorithmic computation abundantly 
surpass the best-expected technology advance- 
ments. 


The Model of Computation 

In order to reason about algorithms and data 
structures operating on hierarchical memories, it 
is necessary to introduce a model of computation 
that grasps the essence of real situations so that 
algorithms that are good in the model are also 
good in practice. The model considered here is 
the external-memory model [20], which received 
much attention because of its simplicity and rea- 
sonable accuracy. A computer is abstracted to 
consist of two memory levels: the internal mem- 
ory of size M and the (unbounded) disk memory 
which operates by reading/writing data in blocks 
of size B (called disk pages). The performance of 
algorithms is then evaluated by counting (a) the 
number of disk accesses (I/Os), (b) the internal 
running time (CPU time), and (c) the number 
of disk pages occupied by the data structure or 
used by the algorithm as its working space. This 
simple model suggests, correctly, that a good 
external-memory algorithm should exploit both 
spatial locality and temporal locality. Of course, 
“I/O” and “two-level view” refer to any two levels
of the memory hierarchy with their parameters M
and B properly set.


Notation

Let S[1,n] be a string drawn from alphabet Σ,
and consider the notation: S_i for the ith suffix of
string S, Lcp(α, β) for the longest common prefix
between the two strings α and β, and Lca(u, v)
for the lowest common ancestor between two
nodes u and v in a tree.

The suffix tree of S[1,n], denoted hereafter by
T_S, is a tree that stores all suffixes of S# in a
compact form, where # ∉ Σ is a special character
(see Fig. 1). T_S consists of n leaves, numbered
from 1 to n, and any root-to-leaf path spells out a
suffix of S#. The endmarker # guarantees that no
suffix is the prefix of another suffix in S#. Each
internal node has at least two children and each
edge is labeled with a nonempty substring of S.
No two edges out of a node can begin with the
same character, and sibling edges are ordered lexicographically
according to that character. Edge
labels are encoded with pairs of integers: say,
S[x, y] is represented by the pair (x, y). As a
result, all Θ(n^2) substrings of S can be represented
in O(n) optimal space by T_S’s structure
and edge encoding. Furthermore, the rightward
scan of the suffix-tree leaves gives the ordered
set of S’s suffixes, also known as the suffix array
of S [13]. Notice that the case of a large string
collection Δ = {S^1, S^2, ..., S^k} reduces to the
case of one long string S = S^1 #_1 S^2 #_2 ··· S^k #_k,
where #_i ∉ Σ are special symbols.

Suffix Tree Construction in Hierarchical Memory, Fig.
1 The suffix tree of S = ACACACCG on the left, and
its compact edge-encoding on the right. The endmarker #
is not shown. Node v spells out the string ACAC. Each
internal node stores the length of its associated string, and
each leaf stores the starting position of its corresponding
suffix

Numerous algorithms are known that build 
the suffix tree optimally in the RAM model (see 
[3] and references therein). However, most of 
them exhibit a marked absence of locality of 
references and thus elicit many I/Os when the 
size of the indexed string is too large to be 
fit into the internal memory of the computer. 
This is a serious problem because the slow 
performance of these algorithms can prevent 
the suffix tree being used even in medium-scale 
applications. This encyclopedia’s entry surveys 
algorithmic solutions that deal efficiently with 
the construction of suffix trees over large string 
collections by executing an optimal number 
of I/Os. Since it is assumed that the edges 
leaving a node in T_S are lexicographically
sorted, sorting is an obvious lower bound for 
building suffix trees (consider the suffix tree 
of a permutation!). The presented algorithms 
have sorting as their bottleneck, thus establishing 
that the complexity of sorting and suffix tree 
construction match. 


Key Results 


Designing a disk-efficient approach to suffix-tree
construction has found efficient solutions only in
the last few years [4]. The present section surveys
two theoretical approaches which achieve the
best (optimal!) I/O-bounds in the worst case;
the next section will discuss some practical
solutions.

DIVIDE-AND-CONQUER ALGORITHM
(1) Construct the string S’[j] = rank of (S[2j], S[2j + 1]), and recursively compute T_S’.
(2) Derive from T_S’ the compacted trie T_o of all suffixes of S beginning at odd positions.
(3) Derive from T_o the compacted trie T_e of all suffixes of S beginning at even positions.
(4) Merge T_o and T_e into the whole suffix tree T_S, as follows:
    (4.1) Overmerge T_o and T_e into the tree T_M.
    (4.2) Partially unmerge T_M to get T_S.

Suffix Tree Construction in Hierarchical Memory, Fig. 2 The algorithm that builds the suffix tree directly

Suffix Tree Construction in Hierarchical Memory, Fig. 3 The algorithm that builds the suffix tree passing through
the suffix array

The first algorithm is based on a Divide-and- 
Conquer approach that allows us to reduce the 
construction process to external-memory sorting 
and a few low-I/O primitives. It builds the suffix
tree T_S by executing four (macro)steps, detailed
in Fig. 2. It is not difficult to implement the first
three steps in Sort(n) = O((n/B) log_{M/B} (n/B)) I/Os
[20]. The last (merging) step is the most difficult
one and its I/O-complexity bounds the cost of the
overall approach. Farach-Colton et al. [3] propose
an elegant merge for T_o and T_e: substep (4.1)
temporarily relaxes the requirement of getting T_S
in one shot, and thus it blindly (over)merges the
paths of T_o and T_e by comparing edges only via
their first characters; then substep (4.2) refixes
T_M by detecting and undoing in an I/O-efficient
manner the (over)merged paths. Note that the
time and I/O-complexity of this algorithm follow 
a nice recursive relation: T(n) = T(n/2) + 
O(Sort(n)). 
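
To get a feeling for the magnitude of the sorting bound, the expression (n/B) log_{M/B} (n/B) can be evaluated numerically. The sketch below is only illustrative: the function name is hypothetical, constant factors are ignored, and the clamp to at least one pass over the data is an assumption added here.

```python
import math


def sort_io_bound(n, M, B):
    """Evaluate (n/B) * log_{M/B}(n/B), ignoring constant factors.
    The logarithm is clamped to 1 (assumption: at least one full pass)."""
    blocks = n / B
    return blocks * max(1.0, math.log(blocks, M / B))


# Example parameters (chosen for illustration): n = 2^30 items,
# M = 2^20 items of internal memory, B = 2^10 items per disk page.
print(sort_io_bound(2**30, 2**20, 2**10))
```

With these parameters the logarithmic factor is about 2, so the bound is only about twice the cost of scanning the data once.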


Theorem 1 (Farach-Colton et al. [5]) Given an 
arbitrary string S[1,n], its suffix tree can be 
constructed in O(Sort(n)) I/Os, O(n log n) time
and using O(n/B) disk pages. 


The second algorithm [10] is deceptively sim- 
ple, elegant, and I/O optimal and applies suc- 
cessfully to the construction of other indexing 
data structures, like the string B-tree [5]. The key
idea is to derive T_S from the suffix array A_S
and from the lcp array, which stores the longest-
common-prefix length of adjacent suffixes in A_S.
Its pseudocode is given in Fig. 3. Note that step
(1) may deploy any external-memory algorithm
for suffix array construction: used here is the
elegant and optimal Skew algorithm of [9] which
takes O(Sort(n)) I/Os. Step (2) takes a total of
O(n/B) I/Os by using a stack that stores the
nodes on the current rightmost path of T_S in
reversed order, i.e., leaf ℓ_i is on top. Walking
upward, splitting edges or attaching nodes in T_S
boils down to popping/pushing nodes from this 
stack. As a result, the time and I/O-complexity 
of this algorithm follow the recursive relation: 
T(n) = T(2n/3) + O(Sort(n)). 

Theorem 2 (Kärkkäinen and Sanders 2003,
see [10]) Given an arbitrary string S[1,n], its
suffix tree can be constructed in O(Sort(n)) I/Os,
O(n log n) time and using O(n/B) disk pages.


It is not evident which one of these two algo- 
rithms is better in practice [10]. The first one ex- 
ploits a recursion with parameter 1/2 but incurs a 
large space overhead because of the management 
of the tree topology; the second one is more space 
efficient and easier to implement, but exploits a 
recursion with parameter 2/3. 


Applications 


The reader is referred to [4] and [7] for a long list 
of applications of large suffix trees and to [6, 18] 
for practical implementations. 


Open Problems 


Recent theoretical and practical achievements
show that the idea that “suffix trees are not practical
except when the text size to handle is so small
that the suffix tree fits in internal memory” no
longer holds [15]. Given a suffix tree, it
is known now (see, e.g., [4, 11]) how to map
it onto a disk-memory system in order to allow
I/O-efficient traversals for subsequent pattern
searches. A fortiori, suffix-tree storage and
construction are challenging problems that need
further investigation.

Space optimization is closely related to time 
optimization in a disk-memory system, so the 
design of succinct suffix-tree implementations is 
a key issue in order to scale to gigabytes of 
data in reasonable time. This topic is an active 
area of theoretical research with many fascinating 
solutions (see, e.g., [16] and the many papers that 
followed it), which need further exploration in the 
practical setting. 

It is theoretically challenging to design a 
suffix-tree construction algorithm that takes 
optimal I/Os and space proportional to the 
entropy of the indexed string. The more
compressible the string is, the lighter the space
requirement of this algorithm should be. Some
results are known [8, 11, 12], but the issues of
compression and I/Os have been tackled jointly
only recently [6], and more results are foreseen.


Experimental Results 


The interest in building large suffix trees arose in 
the last few years because of the recent advances 
in sequencing technology, which have allowed 
the rapid accumulation of DNA and protein data. 
Some recent papers [1, 2, 9, 17, 18] proposed 
new practical algorithms that allow us to scale 
to Gbps/hours. Surprisingly enough, these algo- 
rithms are based on disk-inefficient schemes, but 
they properly select the insertion order of the 
suffixes and carefully exploit the internal memory
as a buffer, so that their performance does
not suffer significantly from the theoretical I/O
bottleneck.

In [9] the authors propose an incremental algorithm,
called PrePar, which performs multiple
passes over the string S and constructs the suffix
tree for a subrange of suffixes at each pass. For
a user-defined parameter q, a suffix subrange is
defined as the set of suffixes prefixed by the same
q-long string. Suffix subranges induce subtrees
of T_S which can thus be built independently
and evicted from internal memory as they are
completed. The experiments reported in [9] successfully
index 286 Mbps using 2 Gb internal
memory.
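
The notion of a suffix subrange can be illustrated with a few lines of code. This is only a sketch of the partitioning idea, not of PrePar itself: it groups all suffixes in a single in-memory pass, whereas the algorithm of [9] processes one subrange per pass over S; the function name is hypothetical.

```python
from collections import defaultdict


def suffix_subranges(s, q):
    """Group the starting positions (0-based) of the suffixes of s
    by their first q characters."""
    buckets = defaultdict(list)
    for i in range(len(s)):
        buckets[s[i:i + q]].append(i)
    return dict(buckets)


print(suffix_subranges("ACACACCG", 2))
```

Each bucket corresponds to one subtree of the suffix tree, which is why the subtrees can be built independently of one another.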

In [2] the authors propose an improved version
of PrePar, called DynaCluster, that deploys a
dynamic technique to identify suffix subranges.
Unlike PrePar, DynaCluster does not scan the
string S over and over, but starts from the q-
based subranges and then splits them recursively
in a DFS manner if their size is larger than a fixed
threshold t. Splitting is implemented by looking
at the next q characters of the suffixes in the subrange.
This clustering and lazy-DFS visit of T_S
significantly reduce the number of I/Os incurred
by the frequent edge-splitting operations that occur
during the suffix-tree construction process
and allow it to cope efficiently with skewed data.
As a result, DynaCluster constructs suffix trees
for 200 Mbps with only 16 Mb internal memory.

In [17] the authors improved the space requirement
and the buffering efficiency, thus being
able to construct a suffix tree of 3 Gbps in 30 h,
whereas [1] improved the I/O behavior of RAM- 
algorithms for online suffix-tree construction, by 
devising a novel low-overhead buffering policy. 
More recently [14] introduced a new technique, 
called Elastic Range (ERA), which partitions 
the tree construction process horizontally and 
vertically and minimizes I/Os by dynamically 
adjusting the horizontal partitions independently 
for each vertical partition, based on the evolving 
shape of the tree and the available internal mem- 
ory. This technique is specialized to work also 
for shared-memory and shared-disk multi-core
systems and for parallel shared-nothing architectures.
ERA indexes the entire human genome in
19 min on a commodity desktop PC. For comparison,
the fastest existing method needs 15 min
using 1024 CPUs on an IBM BlueGene supercomputer.

Finally [19] observed that increasing memory 
sizes of current commodity PCs and servers 
enhance the impact of in-memory tasks on 
performance. So it is imperative nowadays 
to reassess the performance of in-memory 
algorithms and to propose new algorithms 
that incorporate the characteristics of modern 
hardware architectures, such as multilevel 
memory hierarchy and chip multiprocessors 
(CMPs). Starting from these premises, the
authors proposed cache-conscious suffix-tree
construction algorithms that are tailored to CMP
architectures, using novel sample-based cache-
partitioning techniques that improved cache
performance and exploited on-chip parallelism
of CMPs, thus achieving satisfactory speedups
with increasing number of cores.

The suffix tree is one of the oldest full-text
inverted indexes and one of the most persistent 
subjects of study in the theory of algorithms. With 
extensions and refinements, including succinct 
and compressed variants that provide some of its 
expressive power in smaller space, it constitutes 
a fundamental conceptual tool in the design of 
string algorithms. The companion structure rep- 
resented by the suffix array is as powerful as the 
suffix tree in many applications, but it requires 
significantly less space. The uses of these data 
structures are so numerous that it is difficult to account
for all of them, while even more are being
discovered. Salient applications include search- 
ing for a pattern in a text in time proportional 
to the size of the pattern, various computations 
on regularities such as repeats and palindromes 
within a text, statistical tables of substring occur- 
rences, data compression by textual substitution, 
as well as ancillary yet fundamental tasks in 
string searching with errors, and more. 


Problem Definition 


It is well known that searching among n keys 
in an unsorted table takes optimal linear time. 
When multiple searches are expected, however, it 
becomes worthwhile to sort the table once and for all,
whereby each subsequent search will require only 
logarithmic time. It is similarly possible to build 
an inverted index on a long text so that the search 
for any query string will take time proportional to 
the length of the query rather than that of the text. 
It turns out that the data structures built for this 
purpose support many more applications, which 
are the topic of this entry. 

Formally, let T be a string of length n on
alphabet Σ = [1..σ], let T̄ be its reverse, and
let # ∉ Σ be a shorthand for zero. To simplify
the exposition, we assume throughout that σ is a
constant. The suffix tree STr = (1L,V, E) of T 
is a tree rooted at node Le V with set of nodes 
V and set of labeled edges F (Fig. 1, left). Edge 
labels are pointers to substrings of T#: we denote 
by £(e), and equivalently by €(u, v), the label of 
edge e = (u,v) € E, and we denote by €(v) 
the string €(L, v1) - €(v1, v2) + +++ + L(uK-4, v), 
where 1, v1,V2,...,Ux¢—1, U is a path in ST. 
We say that node v has string depth |ℓ(v)|. Let
v ∈ V be an internal node, and let w_1, w_2, ..., w_k
be its children: then, 2 ≤ k ≤ σ + 1, and
labels ℓ(v, w_1), ℓ(v, w_2), ..., ℓ(v, w_k) start with
distinct characters. The children of v are ordered
lexicographically according to the labels of edges
(v, w_1), (v, w_2), ..., (v, w_k). There is a bijection
between the leaves of ST_T and the suffixes of
T#, so every leaf is annotated with the starting
position of its corresponding suffix. Moreover,
if leaf v ∈ V is associated with the suffix that
starts at position i, then ℓ(v) = T[i...n]#. Since
ST_T has exactly n + 1 leaves and every internal
node has at least two children, there are at most
n internal nodes; thus, ST_T takes O(n) space.
We drop the subscript from ST whenever the
underlying string is clear from the context.

Suffix Trees and Arrays, Fig. 1 Relationship between
the suffix tree, the suffix array (left), and the suffix-link
tree (right) of string T = AGAGCGAGAGCGCGC#. Thin
black lines, edges of ST; thick gray lines, suffix links;
thin dashed lines, implicit Weiner links; thick black lines,
the subtree of ST_T induced by maximal repeats. Black

A substring W of T# is called right maximal
if both Wa and Wb occur in T, with {a, b} ⊆
Σ ∪ {#} and a ≠ b. Clearly a substring W is right
maximal iff W = ℓ(v) for some v ∈ V. Moreover,
assume that ℓ(v) = aW for some v ∈ V,
a ∈ Σ, and W ∈ Σ*. Since aW is right maximal,
string W is right maximal as well; therefore, there
is a node w ∈ V with ℓ(w) = W. Thus, the set
of labels {ℓ(v) : v ∈ V} enjoys the suffix closure
property, in the sense that if a string W belongs
to the set so does every one of its suffixes. We say
that there is a suffix link from v to w labeled by
a, and we write suffixLink(v) = w. Clearly,
if v is a leaf, then suffixLink(v) is either a
leaf or L. The graph induced by V and by suffix 
links is a trie rooted at L: such trie is called the 
suffix-link tree SLT_T of string T (Fig. 1, right).
Inverting the direction of all suffix links yields
the so-called explicit Weiner links. Given a node
v and a symbol a ∈ Σ, it might happen that
string aℓ(v) does occur in T but that it is not the
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dots, nodes of ST; large black dot, 1; white dots, 
destinations of implicit Weiner links. Squares, leaves of 
ST_T and cells of SA_T; numbers, starting position of
each suffix in T. For clarity, implicit Weiner links are not
overlaid on ST, and suffix links from the leaves of ST_T
are not drawn


label of any node in V: all such left extensions 
of nodes in V that end in the middle of an edge 
of ST are called implicit Weiner links. A node in 
V can have more than one outgoing Weiner link, 
and all such Weiner links have different labels. 
The number of suffix links (or, equivalently, of
explicit Weiner links) is upper-bounded by 2n − 2,
and the same bound holds for the number of
implicit Weiner links: in some applications, we
thus assume that ST is augmented with unary
nodes that correspond to all the destinations of
implicit Weiner links. A substring W of T# is
called left maximal if both aW and bW occur in
T#, with {a, b} ⊆ Σ ∪ {#} and a ≠ b, where
T# is interpreted as a circular string. A string that
is both left and right maximal is called maximal 
repeat. The set of all left-maximal strings enjoys 
the prefix closure property; therefore, there is a 
bijection between the maximal repeats and the 
nodes that lie in some paths of ST that start from 
the root (Fig. 1, left). 

The suffix array SA_T[1...n + 1] of string
T is the permutation of [1...n + 1] such that
SA_T[k] = i iff suffix T[i...n]# has position
k in the list of all suffixes of T# taken in lexicographic
order. In this case, we say that suffix
T[i...n]# has lexicographic rank k. Clearly
SA_T[1] = n + 1. The inverse suffix array of
string T is an array R_T[1...n + 1] such that
R_T[SA_T[i]] = i for all i ∈ [1...n + 1]. A
substring W of T# corresponds to a unique,
contiguous interval [i_W...j_W] of SA_T, which contains
all the suffixes of T# that are prefixed by
W. An additional structure that complements the
suffix array in many applications is the longest
common prefix array LCP_T[2...n + 1], which
stores at position i the length of the longest
prefix shared by suffix T[SA_T[i]...n]# and by
suffix T[SA_T[i − 1]...n]#. Clearly LCP_T[k] ≥
|W| for all k ∈ [i_W + 1...j_W]. Again, we
drop the subscript from SA, R, and LCP whenever
the underlying string is clear from the context.
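
These three definitions can be checked on a small example with a direct, quadratic-time construction. The sketch below is meant only to make the definitions concrete: the function name is hypothetical, positions are 1-based as in the text (index 0 of each list is unused padding), and it assumes that T consists of letters, so that # sorts before every character of T in ASCII order.

```python
def suffix_structures(t):
    """Return (SA, R, LCP) for T#, with 1-based positions.
    Naive construction by sorting suffixes; not a linear-time algorithm."""
    s = t + "#"                      # assumption: '#' < every character of t
    m = len(s)                       # m = n + 1
    sa = [0] + sorted(range(1, m + 1), key=lambda i: s[i - 1:])
    r = [0] * (m + 1)
    for k in range(1, m + 1):
        r[sa[k]] = k                 # inverse suffix array
    lcp = [0] * (m + 1)              # defined for positions 2..n+1
    for k in range(2, m + 1):
        a, b = s[sa[k - 1] - 1:], s[sa[k] - 1:]
        h = 0
        while h < min(len(a), len(b)) and a[h] == b[h]:
            h += 1
        lcp[k] = h
    return sa, r, lcp


sa, r, lcp = suffix_structures("banana")
print(sa[1:], lcp[2:])
```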

Suffix tree, suffix array, and LCP array are 
strongly intertwined, and they have connections 
to other substring recognizers, like the directed 
acyclic word graph (DAWG) and its compact 
variant (CDAWG). SA can be thought of as 
the ordered set of leaves of ST, and ST can 
be thought of as a search tree built on top of 
SA (Fig. 1, left). The full ST, including suffix 
links, can be built from SA and LCP with an
O(n)-time scan [1], and SA can be built from
ST with an O(n) traversal. LCP itself can be
built from SA in O(n) time [18]. A number
of ingenious algorithms have been proposed to 
build ST and SA in linear time directly from 
the string itself, even in the case of polynomial 
alphabets: see [10, 17, 19, 20, 25, 32, 33] for a 
sampler of such algorithms, and see [28] for a 
detailed taxonomy. Some applications require to 
maintain the suffix tree after edits to the under- 
lying string: see [12, 13,22] for a sampler of 
such algorithms. Finally, see [21] for a compar- 
ative study of space-efficient allocations of suffix 
trees. 
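
As an illustration of how LCP can be derived from SA in linear time, the following sketch uses a well-known scheme usually attributed to Kasai et al.; whether this is exactly the method of [18] is an assumption. Indices are 0-based here, `lcp[k]` refers to the suffixes at `sa[k-1]` and `sa[k]`, and the function name is hypothetical.

```python
def lcp_from_sa(s, sa):
    """Compute the LCP array of s from its suffix array in O(n) time."""
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):               # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]      # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1               # suffix i+1 shares at least h-1 characters
        else:
            h = 0
    return lcp


s = "banana#"
sa = sorted(range(len(s)), key=lambda i: s[i:])
print(lcp_from_sa(s, sa))
```

The counter h decreases by at most one per iteration and never exceeds n, which bounds the total number of character comparisons by O(n).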


Key Results 


Suffix trees are extremely versatile indexes 
that allow one to solve a variety of string 
matching and analysis problems [2, 9, 14]. 
We review a few such problems, classifying the
corresponding algorithmic solutions based on
the way they walk on the suffix tree and on
the information they store in each node. This 
classification exposes recurrent design patterns, 
it highlights which parts of the suffix tree 
are needed by each application, and it helps 
decide which algorithms can be implemented 
on top of more succinct but less powerful 
representations of the suffix tree. The emphasis 
of this section is on the power of different 
traversals of the suffix tree, not necessarily on 
the most efficient solution of each string analysis 
problem. 


Top-Down 

Exact searching inside a string S of length n is
the most natural example of top-down traversal of
ST_S. Given a query string W, we can just match
its characters from the root of ST in O(|W|)
time to determine whether W occurs in S or not.
Since edges are labeled by substrings of S, the
search for W can end in the middle of an edge
(u, v): we say that v is the locus of W in ST,
and we denote it by locus(W). This approach
generalizes to a set of patterns W_1, W_2, ..., W_k
of total length m, by building the suffix tree of
the concatenation W = W_1 #_1 W_2 #_2 ··· #_{k−1} W_k
and by traversing ST_S and ST_W synchronously,
where i ≠ j implies #_i ≠ #_j and #_i ≠ #.

The total number of (possibly overlapping)
occurrences of the label ℓ(v) of a node v of ST
equals the number of leaves in the subtree rooted
at v, which can be computed by a bottom-up
traversal of the tree. All strings that end in the
middle of edge (u, v) start exactly at the same positions
as ℓ(v) in S; therefore, ST with frequency
annotation allows one to return the frequency in
S of any string W in O(|W|) time. An important
consequence of this is the fact that the number
of distinct frequencies assumed by nonempty
substrings of S is at most |S|. It is also possible
to annotate every node of ST with the smallest
and largest leaf in its subtree, supporting O(|W|)-
time queries on the first and last occurrence in S
of any string W. More generally, traversing the
tree rooted at locus(W) in O(|W| + k) time
allows one to print all the k starting positions of
W in S.

Finding all the occurrences of W in S can also
be done in O(|W| log n) time, by binary searching
SA_S for strings W# and W$, where $ = σ + 1:
the results of these searches are, respectively,
the starting and ending position of the interval of
all suffixes prefixed by W. Knowing this interval
allows one to derive the number of occurrences of
W in S in constant time and to output the starting
positions of such occurrences in time linear in
the size of the output. Using simple properties of
LCPs, it is possible to reduce the time of binary
search to O(|W| + log n), by reusing information
during the search [24].
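
As an illustration of the suffix-array search just described, here is a
minimal Python sketch. It is not the O(|W| + log n) method of [24]: the
suffix array is built naively by sorting, positions are 0-based, and the
two binary searches compare W against the length-|W| prefix of each
suffix instead of searching for W# and W$. The function names
(suffix_array, sa_interval, occurrences) are hypothetical.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions by the suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def sa_interval(s, sa, w):
    """Return (lo, hi) such that sa[lo:hi] lists all suffixes prefixed by w."""
    m = len(w)
    # First binary search: leftmost suffix whose length-m prefix is >= w.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < w:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Second binary search: leftmost suffix whose length-m prefix is > w.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= w:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def occurrences(s, sa, w):
    """Starting positions (0-based, sorted) of w in s."""
    lo, hi = sa_interval(s, sa, w)
    return sorted(sa[lo:hi])
```

The number of occurrences is hi - lo, available as soon as the interval
is known, and the positions are read off the array in time linear in
the output, as stated above.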

The top-down navigation of a suitably anno- 
tated suffix tree of S allows one also to compute 
the Lempel-Ziv factorization of S [23]. Recall 
that this factorization scans the string from left 
to right, and it determines at every position i the 
longest prefix of S[i...n] that equals a prefix 
of S[j...n], where j < i. Let W be such
longest prefix: the factorization outputs the tuple
(j, |W|, S[i + |W|]). Clearly we can find all
this information by annotating every node v of
ST with the index j of the smallest leaf in the
subtree rooted at v. Then, we can just match
suffix S[i...n] from the root of ST until a
mismatch occurs or until we find a node whose
index is not smaller than i. More advanced solutions
embed the factorization in an online, one-pass 
construction of ST [29]. 
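
The factorization itself can be illustrated without a suffix tree. The
following Python sketch is a quadratic-time brute force, not the
annotated-tree method described above. It uses 0-based positions,
reports j = -1 when the copied part is empty, and (as an assumption of
this sketch) always keeps one character in reserve so that S[i + |W|]
exists. The names lz_factorize and lz_decode are hypothetical.

```python
def lz_factorize(s):
    """Return a list of tuples (j, length, next_char), scanning left to right."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_j, best_len = -1, 0
        for j in range(i):  # candidate earlier starting positions j < i
            length = 0
            # Stop one character early so the literal s[i + length] exists.
            while i + length < n - 1 and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_j, best_len = j, length
        factors.append((best_j, best_len, s[i + best_len]))
        i += best_len + 1
    return factors


def lz_decode(factors):
    """Rebuild the string; copies may overlap the part being written."""
    out = []
    for j, length, ch in factors:
        for t in range(length):
            out.append(out[j + t])
        out.append(ch)
    return "".join(out)
```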


Bottom-Up 

A square is a string WW where W ∈ Σ^+ is not
of the form Z^k with k > 1 for any Z ∈ Σ^+.
Clearly, if a square WW occurs at position i in S,
then there is a node v in ST_S such that |ℓ(v)| ≥
|W| and such that leaves i and i + |W| belong
to the subtree rooted at v. The converse is also
true [4]. Thus, we can output all the repeats of
S by using the following bottom-up traversal of
ST. Assume without loss of generality that all
nodes in ST have exactly two children. Every
node u of ST builds its list of occurrences, sorted
by position in S, using the lists of its children.
Then, it scans its list once to find all pairs of
positions at distance at most |ℓ(u)| in S that are
consecutive in the list: every such pair is a square,
and positions at distance at most |ℓ(u)| that are
not consecutive induce squares that are implied
by the consecutive positions.

Let v and w be the two children of u, and
assume without loss of generality that the list
of occurrences of v is smaller than the list of
occurrences of w. Then, the list of node u can be
built by extracting all elements from the list of
node v and by inserting them into the list of
node w. As a consequence of such insertions,
the occurrences in the list of v move to a list
that is at least twice the size of the original
list: it follows that an occurrence can be pushed
into at most O(log n) lists; therefore, the total
number of extractions and insertions is bounded
by O(n log n). If the lists of occurrences are
implemented with balanced trees, the total time
to extract all squares from S is O(n log^2 n).
More advanced approaches manage to shave a
logarithm, reaching optimal O(n log n) time [4],
and to reduce the complexity to O(n + τ), where
τ is the size of the output [15,31].

The algorithm for detecting squares can be
adapted to compute all the maximal palindromes
of S, by applying it to string T = S#S^R$, where
S^R is the reverse of S.
Note that a variant of the same algorithm can
be implemented using the suffix array. First, it is
easy to see that a bottom-up, in-order traversal of
the internal nodes of ST_S can be simulated by
a linear scan of SA_S and of LCP_S, maintaining
a stack [1]. It follows that, for every interval
(i_v, j_v) in SA of a node v in ST, we can just
check whether SA[k] + |ℓ(v)| ∈ [i_v..j_v] and
S[SA[k] + |ℓ(v)|] ≠ S[SA[k] + 2|ℓ(v)|], for every
k ∈ [i_v..j_v]: in this case, the occurrence of
square ℓ(v) at position SA[k] is called branching.
It is easy to see that all squares can be derived
from squares with branching occurrences [31].
Moreover, if the occurrence at position SA[k]
is branching, then suffixes SA[k] + |ℓ(v)| and
SA[k] + 2|ℓ(v)| belong to distinct children of
node v in ST: we can thus discard the child
w of v with the largest number of leaves and
check for every k ∈ [i_v..j_v] that does not belong
to the interval of w whether SA[k] − |ℓ(v)| ∈
[i_v..j_v] and S[SA[k]] ≠ S[SA[k] + |ℓ(v)|].
The child of v with largest interval can be
determined in constant time during the simulated
bottom-up traversal of ST, and since the largest
interval is always excluded, the algorithm runs in
O(n log n) time.

Given a collection of k strings of total length 
n, let S be the concatenation of all such strings, 
each terminated by a distinct symbol that does 
not belong to Σ. A bottom-up navigation of ST_S
(called also the generalized suffix tree of the
collection) allows one to compute the length of
a longest string that occurs in at least τ ≤ k strings.
To solve this problem, we can annotate each leaf
v of ST with a bitvector, named which, of length k,
such that which[i] = 1 iff the suffix associated
with v starts inside string i. Then, every node 
of ST can be annotated with the same bitvector 
via a bottom-up, O(nk) traversal, in which we 
compute the bitvector of a node by taking the 
logical or of the bitvectors of its children. More 
advanced algorithms solve this problem in O(n) 
time [8]. As a byproduct, this annotation allows 
one to answer queries on the number of strings 
in the collection that contain a given substring, 
a problem known as document counting. A ger- 
mane problem is that of document listing, in 
which we are given a pattern and we are asked 
to return the set of all documents that contain one 
or more copies of the pattern [26]. 


Top-Down and Suffix Links 
Given two strings S and T, of length n and
m, respectively, the matching statistics array
MS_{S,T}[1..n] is such that MS_{S,T}[i] stores the
length of the longest string that starts at position
i in S and that occurs in T [33]. We can compute
MS_{S,T} by scanning S from left to right, while
simultaneously issuing child and suffix-link
queries on ST_T. This results in a peculiar walk
on ST_T that consists of alternating sequences
of suffix-tree edges and of suffix links (we can
also compute MS_{S,T} symmetrically, by scanning
S from right to left and by simultaneously
issuing parent and Weiner-link queries on
ST_T [27]).

Specifically, assume that we are at position i in
S, and let W = S[i..i + MS_{S,T}[i] − 1]. Note
that W can end in the middle of an edge (u, v) of
ST_T: let W = aXY where a ∈ Σ, X ∈ Σ*,
aX = ℓ(u), and Y ∈ Σ*. Moreover, let u′ =
suffixLink(u) and v′ = suffixLink(v).
Note that suffix links can project edge (u, v) onto
a path u′, v_1, v_2, ..., v_k, v′, where v_j ∈ V for
j ∈ [1..k]. Since MS_{S,T}[i + 1] ≥ MS_{S,T}[i] − 1,
the first step to compute MS_{S,T}[i + 1] is to find
the position of XY in ST_T: we call this phase of
the algorithm the repositioning phase. To implement
the repositioning phase, it suffices to take
the suffix link from u, to follow the outgoing edge
from u′ whose label starts by the first character of
Y, and then to iteratively jump to the next internal
node of ST_T and to choose the next outgoing
edge according to the corresponding character of
Y. After repositioning, we start matching the new
characters of S on ST_T, i.e., we read characters
S[i + MS_{S,T}[i]], S[i + MS_{S,T}[i] + 1], ... until
such an extension becomes impossible in ST_T.
We call this phase of the algorithm the matching
phase. Note that no character of S that has been
read during the repositioning phase of MS_{S,T}[i +
1] will be read again during the repositioning
phase of MS_{S,T}[i + k] with k > 1: it follows that
every position j of S is consumed at most twice,
once in the matching phase of some MS_{S,T}[i]
with i ≤ j and once in the repositioning phase
of some MS_{S,T}[k] with i < k < j. Since
every mismatch can be charged to the position
of which it concludes the matching statistics, the
total number of mismatches encountered by the
algorithm is bounded by the length of S.
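
The definition of matching statistics, and the inequality
MS[i + 1] ≥ MS[i] − 1 that drives the repositioning phase, can be
checked with a small Python sketch. This is not the suffix-tree walk
described above: a plain substring test on T stands in for the
navigation of the suffix tree of T, positions are 0-based, and the
function name matching_statistics is hypothetical.

```python
def matching_statistics(s, t):
    """ms[i] = length of the longest prefix of s[i:] that occurs in t."""
    n = len(s)
    ms = [0] * n
    length = 0
    for i in range(n):
        # Reuse the previous match: dropping its first character leaves
        # a string of length (previous - 1) that still occurs in t.
        length = max(length - 1, 0)
        # "Matching phase": extend with new characters while possible.
        while i + length < n and s[i:i + length + 1] in t:
            length += 1
        ms[i] = length
    return ms
```

Starting each position from the previous length minus one, instead of
from zero, mirrors the way the tree-based algorithm avoids rereading
characters.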

These algorithms can be adapted to compute
the shortest unique substring array
SUS_S[1..n], which stores at index i the length
of the shortest substring of S that occurs only at
position i [33]. The average of the matching
statistics vector can be used to estimate the
cross-entropy of the probability distributions
of two stationary, ergodic, stochastic processes
with finite memory that generated S and T [11].
Moreover, a number of compositional similarity
measures between two strings S and T can be
computed by scanning S and by simultaneously
navigating ST_T as in matching statistics: this
has the advantage of building and annotating
the suffix tree of just the shorter string [30].
Matching statistics on a suitably annotated suffix
tree of T allows one also to approximate the
probability that S was generated by the same
variable-length Markov process that produced
T, another measure of similarity not based on
sequence alignment [3].


Top-Down in the Suffix-Link Tree 

A number of statistical applications require
annotating the nodes of ST_S with empirical
probabilities rather than with raw frequencies. The
empirical probability p_S(W) of a string W is
essentially the number of its occurrences f_S(W)
divided by the maximum number of occurrences
that W can have in a string of length |S| = n.
This number cannot exceed n − |W| + 1, but it also
depends on the number of overlaps that W has
with itself, i.e., on the number of proper borders
of W: thus, we set p_S(W) = f_S(W)/b(W),
where b(W) is the length of the shortest period
of W. Note that p_S can change inside an edge
of ST. However, if we are interested only in the
empirical probability of nodes of ST, we can
compute all such values in overall linear time, by
mapping the longest-border computation in the
KMP algorithm onto a depth-first navigation of
the suffix-link tree [5].

The exact computation of the variance of the 
frequency of a string W in S can be itself mapped 
onto the computation of the longest proper bor- 
der of W. Under suitable statistical assumptions, 
computing the expectation and variance of the 
frequency of all right-maximal substrings of S 
suffices to detect all substrings of S with anoma- 
lous frequency: it is thus possible to discover all 
statistically frequent and rare substrings of S in 
overall linear time [5]. 


Any Order 
A single pass over all nodes of ST in any order, 
coupled with a number of checks on the children 
and on the Weiner links of each node, suffices 
to solve a number of string analysis problems in 
linear time. 

A string W is a maximal unique match 
(MUM) between two strings S and T if it occurs 
exactly once in S and exactly once in T and if 
neither aW nor Wb occur in both S and T for any
{a,b} ⊆ Σ (for simplicity, we disregard cases in
which W occurs at the beginning or at the end of a
string) [14]. Clearly W must be a right-maximal
substring of U = S#T$, where # and $ are
separators not belonging to Σ. Therefore, we just
need to iterate over every node v of ST_U in any
order, checking the following conditions: (1) v 
has exactly two leaves as children; (2) the suffixes 
that correspond to such leaves start before and 
after position |S| + 1 in U, respectively; and 
(3) v has two Weiner links. A similar approach 
extends to MUMs of more than two strings, as 
well as to maximal (not necessarily unique) exact 
matches between two strings and to the maximal 
repeats [7] and the minimal absent words of a 
single string [16]. 

Symmetrically, it is easy to detect the MUMs 
of two strings S and T by a linear scan of the 
suffix array of U = S#T$ and of the correspond- 
ing LCP array. Indeed, a MUM corresponds to 
an interval (i, i + 1) of size two in SA_U such
that LCP_U[i] < LCP_U[i + 1], LCP_U[i + 2] <
LCP_U[i + 1], U[SA_U[i] − 1] ≠ U[SA_U[i + 1] − 1],
and SA_U[i] < |S| + 1 < SA_U[i + 1].
Similar criteria allow one to detect maximal
repeats, supermaximal repeats [14], and maximal 
exact matches [1]. 


String Depth Annotation 

Assume that every node v of ST_S is annotated
with |ℓ(v)|. Recall that the shortest unique substring
array SUS_S[1..n] is such that SUS_S[i]
is the length of the shortest substring of S that
occurs only at position i. Since S[i..i + SUS_S[i] −
1] = Wa where a ∈ Σ, since locus(Wa) is
a leaf v, and since locus(W) = parent(v),
traversing the nodes of ST in any order suffices
to compute SUS_S[i] for every i. String depth
annotations, coupled with a traversal of the nodes of
ST in any order, suffice also to compute measures 
of compositional complexity of S, like the total 
number of distinct substrings, possibly of a fixed 
length k. 


Frequency Annotation 

Recall that f_S(W) is the number of occurrences
of string W in S. Assume that we want to
compute p(a|W) = f_S(Wa)/f_S(W) for all substrings
W of S and for all characters a ∈ Σ such
that Wa is a substring of S. Such values are called
conditional probabilities. Clearly p(a|W) = 1 if
W ends in the middle of an edge of ST_S: it is thus
sufficient to compute conditional probabilities
for the nodes of ST, and this can be done by
traversing the nodes of ST in any order and by 
accessing their children. 


String Depth and Frequency Annotation 

Assume that every node v of ST_S is also annotated
with the number of leaves in the subtree
rooted at v. Then, traversing the nodes of ST
in any order allows one to compute the longest
substring of S that repeats at least τ times, or
the most frequent string of length at least τ, for
any user-specified threshold τ. String depth and
frequency annotations, coupled with a traversal
of the nodes of ST in any order, allow one also
to compute the number of distinct substrings that
occur τ times in S, for every frequency τ in a
user-specified range.

Given a substring W of S, let right(W)
be the set of characters that occur in S after
W. More formally, right(W) = {a ∈ Σ :
f_S(Wa) > 0}. The kth order empirical entropy
of S is defined as follows:

\[
H(S,k) = \frac{1}{|S|} \sum_{W \in \Sigma^k} \; \sum_{a \in \mathrm{right}(W)} f_S(Wa) \, \log\left( \frac{f_S(W)}{f_S(Wa)} \right)
\]


To compute H(S,k), it suffices again to traverse
the nodes of ST_S in any order, to check whether
|ℓ(v)| = k, and to cumulate the contribution
of v to H(S,k) by reading the frequency of
its children. Strings of length k that end in the
middle of an edge of ST do not contribute to
H(S,k).
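
As a cross-check of the quantity being computed, the following Python
sketch evaluates the kth order empirical entropy as (1/|S|) times the
sum, over all k-mers W and all characters a in right(W), of
f_S(Wa) log(f_S(W)/f_S(Wa)). It counts substrings directly instead of
traversing the suffix tree. The base-2 logarithm, the restriction to
k ≥ 1, and the name empirical_entropy are assumptions of this sketch.

```python
from collections import Counter
from math import log2


def empirical_entropy(s, k):
    """k-th order empirical entropy of s (k >= 1), from raw k-mer counts."""
    n = len(s)
    # f_S(W) for every substring W of length k.
    f_w = Counter(s[i:i + k] for i in range(n - k + 1))
    # f_S(Wa) for every substring Wa of length k + 1.
    f_wa = Counter(s[i:i + k + 1] for i in range(n - k))
    total = 0.0
    for wa, count in f_wa.items():
        # Terms with f_S(W) == f_S(Wa) add log(1) = 0: these are the
        # k-mers that end in the middle of a suffix-tree edge.
        total += count * log2(f_w[wa[:k]] / count)
    return total / n
```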

In a similar fashion, given a string S on
alphabet Σ, let 𝐒 be a vector indexed by all
strings in Σ^k for a fixed k > 0, such that 𝐒[W]
contains the frequency of string W in S. We
call 𝐒 the k-mer composition vector of string
S. Given two strings S and T, assume that
we want to compute a function κ(𝐒,𝐓) that
depends only on N = ∑_{W ∈ Σ^k} f(𝐒[W], 𝐓[W]),
D_S = ∑_{W ∈ Σ^k} g(𝐒[W]), and D_T =
∑_{W ∈ Σ^k} h(𝐓[W]), where f, g, and h are
user-specified functions. κ(𝐒,𝐓) is often called
a k-mer kernel in text classification. It is possible
to compute κ(𝐒,𝐓) in overall linear time by
traversing the nodes of the generalized suffix tree
of S and T in any order. A similar traversal
of ST allows one to compute κ(𝐒,𝐓) on
composition vectors that are indexed by all 
possible substrings, of any length. In practice 
the frequencies used in composition vectors are 
normalized by their expected values under IID 
or Markov probability distributions: a number of 
kernels based on such normalized counts can still 
be computed in overall linear time by traversing 
the nodes of ST in any order [6]. 


Positional Annotations 

Given two strings S and T, the longest string W
that occurs in both S and T is clearly a right-
maximal substring of the concatenation U =
S#T$, where # and $ are separators not belonging
to Σ. Consider thus ST_U, and assume that
every node v is annotated with |ℓ(v)| and with
a bit flag(v) set to one iff the subtree rooted
at v contains at least one leaf that starts before
position |S| + 1 in U and at least one leaf starting
after position |S| + 1 in U. Such annotation can
be carried out in a bottom-up traversal of ST.
We can compute W by iterating over the nodes
v ∈ ST with flag(v) = 1 and by cumulating
the maximum of the lengths of the encountered
labels. The set of all common substrings between
S and T is the set of all prefixes of the labels
of nodes v ∈ ST such that flag(v) = 1 and
flag(w) = 0 for every child w of v. This
approach generalizes immediately to more than
two strings, and it allows one to compute the
length of the longest substring common to at least
τ strings in a collection of k strings in O(k|U|)
time and space. More advanced approaches solve
this problem in O(|U|) time [8].


Applications 


The primitives discussed above find application
in a wide set of domains. A list of the most salient
ones includes exact and approximate string
searching, string compression, statistical pattern
discovery, alignment-free string comparison,
string kernels in learning theory, sequence
analysis, and assembly in bioinformatics.
Sugiyama Algorithm


Problem Definition 


Given a directed graph (digraph) G(V, E) with
a set of vertices V and a set of edges E, the
Sugiyama algorithm solves the problem of finding
a 2D hierarchical drawing of G subject to the
following readability requirements:

(a) Vertices are drawn on horizontal lines
without overlapping; each line represents 
a level in the hierarchy; all edges point 
downwards. 

(b) Short-span edges (i.e., edges between adja- 
cent levels) are drawn with straight lines. 

(c) Long-span edges (i.e., edges between nonad- 
jacent levels) are drawn as close to straight 
lines as possible. 

(d) The number of edge crossings is the mini- 
mum. 

(e) Vertices connected to each other are placed as 
close to each other as possible. 

(f) The layout of edges coming into (or going out 
of) a vertex is balanced, i.e., edges are evenly 
spaced around a common target (or source) 
vertex. 


Requirements (a) and (b) are easy to meet 
and they are imposed as mandatory basic 
drawing rules. Requirements (c)-(f) are much 
harder to satisfy and typically they are met 
approximately [1,4, 11]. 


Key Results 


Sugiyama et al. propose a four-step procedure 
for finding a hierarchical drawing of a digraph 
subject to the readability requirements listed 
above. It is known as the Sugiyama algorithm, 
the Sugiyama method, or the Sugiyama 
framework [19]. The steps of the Sugiyama 
framework are illustrated in Fig. 1. 


The Sugiyama Framework 
Step 1: Preparatory step for transforming the 
input digraph G into a proper hierarchy. 


Sugiyama Algorithm, Fig. 1 Illustration of the steps of the Sugiyama framework (panels: Step 1.1, Step 1.2, Step 1.3, Step 2, Step 3, Step 4)


Step 1.1: Transform the input digraph G into 
a directed acyclic graph (dag) by reversing 
the direction of some edges. 

Step 1.2: Transform the dag into a multilevel
digraph, called a hierarchy, by partitioning
V into l levels (or layers) V_1, V_2, ..., V_l
such that for each edge e = (v, w) ∈ E,
if v ∈ V_i and w ∈ V_j then i < j. Levels are
drawn on horizontal lines which determine
the y-coordinates of the vertices.

Step 1.3: Transform the hierarchy into a
proper hierarchy by introducing dummy
vertices along long-span edges: one
dummy vertex at each crossing of a long-span
edge with a level.


Step 2: For each level V_i, specify a linear
order σ_i of the vertices in V_i with the goal
of minimizing the total number of edge
crossings.

Step 3: Determine the x-coordinates of the
vertices subject to requirements (c), (e), and 
(f) while preserving the linear order in the 
levels. 


Step 4: Draw G in a 2D drawing area where 
dummy vertices are removed and the long- 
span edges are restored. 


Steps 1.3 and 4 are trivial as computational 
problems. Steps 1.1 and 1.2 can be solved easily 
if the only readability requirements are those 
listed above. However, some sensible additional 
requirements can turn Steps 1.1 and 1.2 into dif- 
ficult combinatorial optimization problems. For 
example, if we want to minimize the number 
of reversed edges at Step 1.1, then we need 
to solve the MINIMUM FEEDBACK ARC SET 
problem which is NP-hard [12]. Similarly, if we 
impose upper bounds on both the number of 
levels and the number of vertices per level, then 
the problem in Step 1.2, known as the layering 
problem, becomes NP-complete [4]. 

Following the work of Sugiyama et al., two
types of solutions to the layering problem have
been proposed in the research literature. The
first type consists of list-scheduling algorithms
(adapted from the area of static
precedence-constrained multiprocessor
scheduling), which produce layer assignments
with either the minimum number of levels
or a specified maximum number of vertices
per level [4]. These include the longest-path
algorithm [13] and the Coffman-Graham
algorithm [3], as well as the MinWidth and
StretchWidth heuristics proposed by Nikolov
et al. [15], which take into account the dummy
vertices. The second type of algorithm employs
network simplex and branch-and-cut techniques,
respectively, for minimizing the number of
dummy vertices with or without constraints on
the number of levels and the number of vertices
per level [9, 10].
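
To make Steps 1.2 and 1.3 concrete, here is a minimal Python sketch of
a longest-path layering followed by the insertion of dummy vertices.
It is one simple variant (each vertex is placed one level below its
deepest predecessor, with sources on level 1), not necessarily the
exact formulation of [13], and it assumes the input is already acyclic.
The function names and the tuple used to name dummy vertices are
hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path_layering(vertices, edges):
    """Map each vertex of a dag to a level so that edges point downwards."""
    preds = {v: set() for v in vertices}
    for v, w in edges:
        preds[w].add(v)
    level = {}
    # static_order() yields every vertex after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        level[v] = 1 + max((level[u] for u in preds[v]), default=0)
    return level


def make_proper(level, edges):
    """Step 1.3: one dummy vertex per level crossed by a long-span edge."""
    level = dict(level)
    proper_edges = []
    for v, w in edges:
        prev = v
        for lv in range(level[v] + 1, level[w]):
            dummy = ("dummy", v, w, lv)  # hypothetical naming scheme
            level[dummy] = lv
            proper_edges.append((prev, dummy))
            prev = dummy
        proper_edges.append((prev, w))
    return level, proper_edges
```

After make_proper, every edge connects two adjacent levels, which is
the precondition of the crossing-reduction step.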

Steps 2 and 3 are already hard to solve with 
the readability requirements listed above. It has 
also been suggested to precede Step 2 by an edge 
concentration or edge bundling step for achieving 
a more readable drawing [14, 16]. The other key 
results in the work of Sugiyama et al., besides 
defining the four-step framework, are efficient 
heuristics for Steps 2 and 3, respectively. 


Reduction of the Number of Edge Crossings

Consider a proper hierarchy G(V, E, L) with a
set of vertices V = {v_1, v_2, ..., v_n}, a set of
edges E = {e_1, e_2, ..., e_m}, and a partitioning
L = {V_1, V_2, ..., V_l} of the vertex set V into
l levels (the result of Step 1.3). Let σ_i : V_i →
{1, 2, ..., |V_i|} be a linear order of the vertices
in level V_i and let S_i be the set of all possible
orders σ_i. The problem at Step 2 of the Sugiyama
algorithm is to find a set of linear orders σ =
{σ_1, σ_2, ..., σ_l} ∈ S_1 × S_2 × ... × S_l such that the
total number of edge crossings is minimum.
Let K(G, σ) be the total number of edge crossings
for a hierarchy G and a set of linear orders
σ, and let K(V_i, V_{i+1}, σ_i, σ_{i+1}) be the number of
edge crossings between layers V_i and V_{i+1} with
linear orders σ_i and σ_{i+1}, respectively.

The algorithm proposed by Sugiyama et al.
for Step 2 is a heuristic which consists in
initially choosing a random order σ_1 for the
vertices in level V_1 and then repeatedly executing
the following five-step procedure, called Down-Up,
until either σ does not change or an initially
given maximum number of iterations is
reached.


The Down-Up Procedure

Step A: i ← 1.

Step B: With a fixed linear order σ_i, find
a linear order σ_{i+1} which minimizes
K(V_i, V_{i+1}, σ_i, σ_{i+1}).

Step C: If i < l − 1, then i ← i + 1 and go to
Step B. Otherwise, go to Step D.

Step D: With a fixed linear order σ_{i+1},
find a linear order σ_i which minimizes
K(V_i, V_{i+1}, σ_i, σ_{i+1}).

Step E: If i > 1, then i ← i − 1 and go to Step
D. Otherwise, stop.


Both Step B and Step D involve minimizing
the number of edge crossings between two
adjacent layers with the linear order in one
of them being fixed. This problem is known
as the ONE-SIDED CROSSING MINIMIZATION
(OSCM) problem, which has been shown to
be NP-hard [5]. Based on previous work by
Warfield [20], Sugiyama et al. show how OSCM
can be reduced to the MINIMUM FEEDBACK
ARC SET problem and propose a heuristic method,
called the barycentric method, for solving it.
Let A = (a_{kj}) be the adjacency matrix of G.
In essence, with a fixed linear order σ_i, the
barycentric method orders the vertices in level
V_{i+1} in the increasing order of their barycenters
B_j, defined with Eq. (1).

\[
B_j = \frac{\sum_{k=1}^{|V_i|} a_{kj} \, \sigma_i(v_k)}{\sum_{k=1}^{|V_i|} a_{kj}}, \qquad j \in \{1, 2, \ldots, |V_{i+1}|\} \tag{1}
\]


Sugiyama et al. evaluate the Down-Up procedure
experimentally with 800 randomly generated
hierarchies as well as with five hierarchies
from practical applications. Their conclusion is
that the proposed heuristic is effective. It was
observed that in most cases the Down-Up procedure
requires a single iteration. Reportedly, the
heuristic was successfully extended for the case
when vertices in each level are partitioned into
subsets where the vertices in each subset must be
arranged adjacently.
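
One down step of the barycentric method, together with a direct count
of the crossings between two adjacent levels, can be sketched in Python
as follows. This is an illustration under stated assumptions, not the
authors' implementation: levels are given as lists of vertex names,
edges as (upper, lower) pairs, a lower vertex with no upper neighbor
receives barycenter 0, ties keep their previous relative order, and the
crossing count is a plain quadratic scan. The function names are
hypothetical.

```python
def count_crossings(order_top, order_bottom, edges):
    """Number of pairs of edges that cross between two adjacent levels."""
    pos_t = {v: i for i, v in enumerate(order_top)}
    pos_b = {v: i for i, v in enumerate(order_bottom)}
    pts = [(pos_t[u], pos_b[v]) for u, v in edges]
    crossings = 0
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            # Two edges cross iff their endpoints are ordered oppositely.
            if (pts[a][0] - pts[b][0]) * (pts[a][1] - pts[b][1]) < 0:
                crossings += 1
    return crossings


def barycenter_order(order_top, bottom, edges):
    """Reorder the lower level by increasing barycenter of upper neighbors."""
    sigma = {v: i + 1 for i, v in enumerate(order_top)}  # positions 1..|V_i|
    nbrs = {v: [] for v in bottom}
    for u, v in edges:
        nbrs[v].append(sigma[u])

    def barycenter(v):
        return sum(nbrs[v]) / len(nbrs[v]) if nbrs[v] else 0.0

    return sorted(bottom, key=barycenter)  # sorted() is stable
```

A full Down-Up pass would apply barycenter_order level by level going
down and then, with the roles of the two levels exchanged, going up.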

Step 2 is probably the best studied part of the 
Sugiyama framework. Numerous improvements 
to the original technique as well as alternative 
algorithms for crossing minimization have been 
proposed since the introduction of the Sugiyama 
framework [1,4,5,7,9, 11]. Notable among them 
is the 3-approximation median method proposed
by Eades and Wormald [5] for solving the OSCM
problem. Having the order of the vertices in level
V_i fixed, the median method consists of placing
each vertex in level V_{i+1} at a position which
corresponds to the median of the positions of its
neighbors in level V_i. Since the median method
is an approximation algorithm, it is guaranteed to
find a solution without edge crossings if one
exists.
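
A simplified Python sketch of the median rule is given below. It only
illustrates the idea of sorting the free level by the median position
of each vertex's neighbors. The choice of the upper median for an even
number of neighbors, the value -1 for isolated vertices, and the
absence of any special tie-breaking are assumptions of this sketch and
may differ from the method of [5]. The name median_order is
hypothetical.

```python
def median_order(order_top, bottom, edges):
    """Reorder the lower level by the median position of upper neighbors."""
    pos = {v: i for i, v in enumerate(order_top)}
    nbrs = {v: [] for v in bottom}
    for u, v in edges:
        nbrs[v].append(pos[u])

    def median(v):
        p = sorted(nbrs[v])
        # Upper median for even degree; isolated vertices go first.
        return p[len(p) // 2] if p else -1

    return sorted(bottom, key=median)  # stable: ties keep their order
```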


Determination of x-Coordinates of Vertices

For Step 3 of their framework, Sugiyama et al.
propose a version of the Down-Up procedure
with the barycenter of a vertex based on the
x-coordinates of the vertices connected to it in an
adjacent level. Consider the down part of the
Down-Up procedure (the up part is symmetrical).
If the x-coordinates of the vertices in level V_i are
known, the barycenters B_j of the vertices in level
V_{i+1} are defined with Eq. (2).

\[
B_j = \frac{\sum_{k=1}^{|V_i|} a_{kj} \, x(v_k)}{\sum_{k=1}^{|V_i|} a_{kj}}, \qquad j \in \{1, 2, \ldots, |V_{i+1}|\} \tag{2}
\]

The x-coordinates of the vertices in level
V_{i+1} are determined according to their priority.
Dummy vertices (introduced in Step 1.3) have the
highest priority, and the priority of
each other vertex in level V_{i+1} is the number
of vertices in level V_i connected to it. The
x-coordinate of each vertex v_j ∈ V_{i+1} is the
available integer horizontal position closest to B_j
(without changing the linear
order from Step 2 and without displacing already
placed vertices with higher priority). In finding
this position, it is allowed to displace vertices
with a priority lower than the priority of v_j,
where this displacement should be as little as
possible.

Sugiyama et al. evaluate the effectiveness of 
this method for improving the readability re- 
quirements (c), (e), and (f) experimentally. Re- 
portedly, they have extended their heuristic for 
the case when the dimensions of the vertices 
are not insignificant. Both the Step 2 and the 
Step 3 heuristics were successfully applied to a 
hierarchy with more than 500 vertices. 

Alternative algorithms for Step 3 have been 
proposed by Gansner et al. [9], Eades et al. [6], 
and Sander [17]. Probably, the best solution for 
Step 3 to date is the O(|V|) algorithm of Brandes
and Köpf [2]. It assigns x-coordinates to vertices
by computing four extreme vertex alignments 
which are then combined into a final layout with 
at most two bends per edge. 


Applications 


Hierarchical graph drawings are useful for pro- 
viding insight into hierarchical structures in com- 
plex systems. In recent years, the Sugiyama al- 
gorithm has found an important application for 
visual analysis of large social and biological 
networks [8, 18]. 
Problem Definition


In the 1970s, sequence alignment was introduced 
to demonstrate the similarity of the sequences of 
genes and proteins [12]. A DNA sequence is a 
finite sequence over four nucleotides — adenine, 
guanine, cytosine, and thymine, whereas a pro- 
tein sequence is over 20 amino acids. Homolo- 
gous proteins have similar biological functions. 
Since they evolve from a common ancestral se- 
quence, the sequences of homologous proteins 
and their encoding genes are often highly similar. 
Therefore, the DNA or amino acid sequence of 
a protein is often aligned with the sequences 
of well-studied proteins to infer the biological 
functions of the protein. 

Formally, an alignment of two sequences, S 
and T, on an alphabet B is a two-row matrix with
the following properties: 


Superiority and Complexity of the Spaced Seeds 


1. The letters in S are listed in order, interspersed
with space symbols “−” in a row, where “−”
represents the fact that a letter is missing at a
position.

2. The letters in T are listed in the other row in
the same manner.

3. No column contains two “−” symbols.


An alignment of S and T poses a model of 
the evolution from their least common ancestral 
sequence to themselves. An alignment is scored 
using a scoring matrix that has a score for every 
pair of letters in B ∪ {−}. The score of an alignment
is defined to be the sum of the scores of the
pairs of letters appearing in the columns of the 
alignment. 

Proteins often have multiple functions. Two 
proteins having a common function often have 
one or several highly similar regions in their DNA 
and amino acid sequences. Such “conserved” 
regions are found by solving the local alignment 
problem: 


Input: Two sequences S = s_1 s_2 ... s_m and T =
t_1 t_2 ... t_n on an alphabet.

Find: Two subsequences S′ = s_i s_{i+1} ... s_j (i ≤
j) and T′ = t_k t_{k+1} ... t_l (k ≤ l)
such that the alignment score of S′ and T′ is
as large as possible.


The alignments between their subsequences are 
called local alignments of S and T. 

A dynamic programming approach takes 
quadratic time to solve the local alignment 
problem [13]. Unfortunately, it is not fast enough 
for homology search against a database with 
millions of DNA or protein sequences. Therefore, 
a filtration technique was adopted to design fast 
algorithms for homology search in the 1990s 
[1], by which good local alignments between 
two sequences are found by first identifying 
short consecutive matches of a specified length 
between the sequences, called seed hits, and then 
extending them to obtain good local alignments. 
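
For reference, the quadratic-time dynamic program mentioned above can
be sketched in Python as follows. This is a minimal
Smith-Waterman-style recurrence that returns only the best local
alignment score. The scoring values (match +1, mismatch −1, space −1)
are illustrative assumptions standing in for a real scoring matrix, and
the function name local_alignment_score is hypothetical.

```python
def local_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best score of an alignment between a substring of s and one of t."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align s[i-1] with t[j-1]
                prev[j] + gap,      # s[i-1] against a space
                cur[j - 1] + gap,   # t[j-1] against a space
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

The two nested loops make the |S| x |T| cost explicit, which is what
makes this approach too slow for database-scale homology search.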

The filtration technique has a dilemma over 
sensitivity and speed. Employing a long seed will 
miss some good local alignments between two 
sequences, decreasing sensitivity; on the other 
hand, using a short seed will waste time on 
extending many seed hits into local alignments
that are not biologically meaningful, resulting in 
low speed. 

In PatternHunter [10], Ma, Tromp, and Li 
introduced the idea of optimized spaced seeds to 
achieve good balance between the sensitivity and 
speed of the filtration approach. PatternHunter by
default looks for nucleotide matches at 11 positions
in every region 18 bases long, specified by the
string 111*1**1*1**11*111, to trigger
the process of local alignment. Such hit patterns,
called spaced seeds, led to surprisingly higher
sensitivity as well as speed than the consecutive
seed 11111111111 that has the same number
of match positions [10]. Moreover, sensitivity
can further be improved by employing multiple
spaced seeds that are longer than 18 bases [8, 14].
This motivates the study of how to find the
optimal spaced seeds of given length and weight
[2-5, 7].


Key Results 


A spaced seed Q can be represented by a string
of 1's and *'s, where the 1's give the match
positions in a seed hit. The number of 1's in Q
is called its weight, denoted by $w_Q$; the length
of the corresponding string is called its length,
denoted by $L_Q$. The relative positions in Q are
denoted by RP(Q). For example, for Q =
111*1**1*1**11*111, RP(Q) =
{0, 1, 2, 4, 7, 9, 12, 13, 15, 16, 17}.

An alignment containing no gap symbols is called
an ungapped alignment. A local ungapped alignment
can be modeled as a 0-1 sequence by translating
match columns (containing two identical letters)
into 1's and mismatch columns into 0's. Hence,
a hit of Q identifies an alignment if the relative
positions of Q match 1's in a region in the
corresponding 0-1 string of the alignment.
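The definitions above translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, seeds are strings over `1` and `*`, alignments are 0-1 strings, and hits are reported by their start offset (the renewal-theory convention used later in the entry indexes hits differently).

```python
def relative_positions(seed):
    """RP(Q): the 0-based offsets of the 1's in the seed string."""
    return [i for i, c in enumerate(seed) if c == "1"]


def seed_hits(seed, region):
    """Start offsets k at which the seed hits the 0-1 string `region`."""
    rp = relative_positions(seed)
    span = len(seed)                      # L_Q
    return [k for k in range(len(region) - span + 1)
            if all(region[k + i] == "1" for i in rp)]


PATTERNHUNTER_SEED = "111*1**1*1**11*111"
weight = len(relative_positions(PATTERNHUNTER_SEED))   # w_Q
length = len(PATTERNHUNTER_SEED)                       # L_Q
```

For the PatternHunter seed this gives weight 11 and length 18, matching the values quoted in the text.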

Assume that a match occurs independently with
probability p at each position of a local ungapped
alignment. The sensitivity of Q in detecting a
local alignment of n columns of two sequences
with identity p is then defined to be the
probability that Q hits a Bernoulli random
sequence of length n, called a uniform region, in
which 1 and 0 appear with probability p and 1 − p,
respectively. A spaced seed is optimal for
aligning sequences with identity p of length
n if it has the largest hit probability over a
uniform region of length n in which 1 appears
with probability p at a position.

A straightforward method for identifying
optimal spaced seeds is to exhaustively examine all
the spaced seeds of given length and weight,
keeping the one with the largest sensitivity (or hit
probability) over a uniform region. Unfortunately,
the sensitivity of a spaced seed is unlikely to be
computable in polynomial time.
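For very short regions the hit probability can be computed exactly by enumerating all 0-1 strings of length n. The sketch below does this by brute force; it takes time exponential in n and is meant only to make the definition of sensitivity concrete, not to stand in for the dynamic programming or recurrence methods cited in this entry. The function names are hypothetical.

```python
from itertools import product


def hits(seed, region):
    """True if the seed hits the 0-1 tuple `region` at some offset."""
    rp = [i for i, c in enumerate(seed) if c == "1"]
    return any(all(region[k + i] == 1 for i in rp)
               for k in range(len(region) - len(seed) + 1))


def exact_sensitivity(seed, n, p):
    """Hit probability over a uniform region of length n (exponential time)."""
    total = 0.0
    for region in product((0, 1), repeat=n):
        if hits(seed, region):
            ones = sum(region)
            total += p ** ones * (1 - p) ** (n - ones)
    return total
```

For example, the consecutive seed `11` hits exactly the strings 011, 110, and 111 among the eight strings of length 3, so its sensitivity at p = 0.5 is 3/8.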


Superiority and Complexity of the Spaced Seeds


Theorem 1 Computing the sensitivity of a
spaced seed over a uniform region is NP-hard.


The hit probability of a spaced seed over a
uniform region can be computed using a dynamic
programming approach [7] or using recurrence
relations [4,5]. Not surprisingly, these approaches
become impractical for identifying long spaced
seeds, because their complexity is exponential
in the difference between the length and the
weight of the spaced seeds under consideration. Here
a simple polynomial-time approximation scheme,
WISESAMPLE, is presented.


WISESAMPLE ALGORITHM

Input: A spaced seed Q, a positive integer n, $0 < p < 1$, and $\epsilon > 0$.
Find: An estimate of the hit probability of Q in a uniform region of length n in which
bit 1 appears at a position with probability p.

Initialize an array A: $A[i] \leftarrow 0$ for $i = 1, 2, \ldots, n - L_Q$;
$N \leftarrow \lceil 6 \epsilon^{-2} n^2 \log n \rceil$;
Repeat N times
    $R[i] \leftarrow 1$ for $i \in RP(Q)$;
    $R[i] \leftarrow 1$ with probability p for $i \in \{1, 2, \ldots, n\} - RP(Q)$;
    For $i = 1, 2, \ldots, n - L_Q$
        If Q does not hit the subregion $R[1, i + L_Q - 1]$
            $A[i] \leftarrow A[i] + 1$;
Output $p^{w_Q} \left(1 + N^{-1} \sum_{j=1}^{n - L_Q} A[j]\right)$.


Theorem 2 Let Q be a spaced seed and let its hit
probability be x on a uniform region with identity
p of length n. WISESAMPLE outputs an estimate
y of x on input Q, n, p, and $\epsilon > 0$ such that
$|y - x| < \epsilon x$ with high probability.
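For comparison, a plain Monte Carlo estimator of the hit probability is sketched below. It is not WISESAMPLE: it samples unconditioned uniform regions and reports the fraction that are hit, so for a fixed number of trials it controls additive rather than relative error. The function name and the explicit `rng` parameter are assumptions made for the example.

```python
import random


def estimate_sensitivity(seed, n, p, trials, rng):
    """Fraction of `trials` random uniform regions of length n hit by the seed."""
    rp = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    hit_count = 0
    for _ in range(trials):
        # Each position is a match (True) independently with probability p.
        region = [rng.random() < p for _ in range(n)]
        if any(all(region[k + i] for i in rp) for k in range(n - span + 1)):
            hit_count += 1
    return hit_count / trials
```

With a seeded generator such as `random.Random(0)`, the estimate for the seed `11` on regions of length 3 at p = 0.5 should land close to the exact value 3/8.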


Let Q be a spaced seed and R a uniform region
with identity p of length n. Following the convention
in renewal theory, Q hits R at position k if and
only if $R[k - L_Q + i_j + 1] = 1$ for all $1 \le j \le w_Q$,
where $i_j$ denotes the j-th relative position in RP(Q).
Let $A_k$ be the event that Q hits R at position
k and $\bar{A}_k$ be the complement of $A_k$. Then
the probability $f_k$ that Q first hits R at the k-th
position is

$$f_k = \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{k-2} A_{k-1}].$$

The hit probability $Q_n(p)$ of Q on R is equal to

$$Q_n(p) = \Pr[A_0 \cup A_1 \cup \cdots \cup A_{n-1}].$$


When seed hits are extended into local alignments,
two seed hits will give one local alignment
if they overlap. Therefore, the sensitivity of a
spaced seed is closely related to the number of
its nonoverlapping hits in a uniform region. A
nonoverlapping hit of a spaced seed is a recurrent
event with the following convention: If a hit at
position k is selected as a nonoverlapping hit,
then the next nonoverlapping hit is the first hit at
or after position $k + L_Q$.
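The selection convention for nonoverlapping hits is a simple greedy scan. The following sketch assumes the hit positions have already been computed and are given as integers; the function name is hypothetical.

```python
def nonoverlapping_hits(hit_positions, seed_length):
    """Greedily select nonoverlapping hits.

    After a hit at position k is selected, the next selected hit is the
    first hit at or after position k + seed_length.
    """
    selected = []
    next_allowed = None
    for k in sorted(hit_positions):
        if next_allowed is None or k >= next_allowed:
            selected.append(k)
            next_allowed = k + seed_length
    return selected
```

For hits at positions 0, 1, 2, 5, 6, 10 and a seed of length 3, the scan keeps 0, 5, and 10.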

The average distance, $\mu_Q$, between two
successive nonoverlapping hits of Q is defined to be

$$\mu_Q = \sum_{j \ge L_Q} j f_j.$$


A spaced seed is nonuniform if g.c.d. (RP(Q)) = 
1. 


Theorem 3 For any nonuniform spaced seed Q,

$$\mu_Q < \sum_{j=1}^{w_Q} p^{-j} + (L_Q - w_Q) - \frac{(1 - p)\,(p^{2 - w_Q} - 1)}{p}.$$


Buhler et al. [3] proved that for any spaced
seed Q, there are two constants $\alpha_Q$ and $\lambda_Q$
that are independent of n such that
$\lim_{n \to \infty} (1 - Q_n(p)) / (\alpha_Q \lambda_Q^n) = 1$,
where $\lambda_Q$ is the largest
eigenvalue of the transition matrix of a Markov
chain model constructed from Q.


Theorem 4 For the consecutive seed B of 


weight w, 


1 
pa ee 


1 
as (p + pi) —w 


Sig ei= 
For a spaced seed Q, 


he ee 
ko—Lo+l 


If $L_Q < (1 - p)[p^{2 - w_Q} - 1]/p + 1$, then by
Theorems 3 and 4, $\lambda_Q < \lambda_B$, where B is the
consecutive seed of the same weight. This implies that
Q has a larger hit probability than the consecutive
seed of the same weight in a long uniform region
with identity p.

The detailed proofs of these results can be 
found in [11,15]. 


Applications 


The spaced seed approach finds applications in
homology search and in the comparison of genome
sequences. PatternHunter was used to compare
the mouse and human genomes in the mouse
genome project [6]. MegaBLAST and BLASTZ
have adopted spaced seeds for homology search.
Recently, the approach has also been used in
mapping short reads to reference genome
sequences.

Interestingly, spaced seed design is found to
be closely related to optimal Golomb ruler design [9].


Open Problems 


It is proved to be NP-hard to identify the optimal 
spaced seeds over a nonuniform region [8]. 


Open problem 1 Is it NP-hard to find the optimal 
spaced seed of a given length and weight over a 
uniform region? 


It has been shown that a uniform spaced seed 
has a lower hit probability than the consecutive 
seed of the same weight over any uniform region 
[4,7]. But the following problem is open: 


Open problem 2 For any nonuniform spaced seed
Q and 0 < p < 1, is there an n(p, Q) such that
Q has a larger hit probability than the consecutive
seed of the same weight over a uniform region with
identity p of length n > n(p, Q)?


Problem Definition


In 1992 Vapnik and coworkers [1] proposed a
supervised algorithm for classification that has
since evolved into what are now known as support
vector machines (SVMs) [2]: a class of algorithms
for classification, regression, and other
applications that represent the current state of the
art in the field. Among the key innovations of
this method were the explicit use of convex
optimization, statistical learning theory, and kernel
functions.


Classification 

Given a training set $S = \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$
of data points $x_i$ from $X \subseteq \mathbb{R}^n$ with
corresponding labels $y_i$ from $Y = \{-1, +1\}$,
generated from an unknown distribution, the task
of classification is to learn a function $g: X \to Y$
that correctly classifies new examples $(x, y)$
(i.e., such that $g(x) = y$) generated from the
same underlying distribution as the training
data.

A good classifier should guarantee the 
best possible generalization performance (e.g., 
the smallest error on unseen examples). 
Statistical learning theory [3], from which 
SVMs originated, provides a link between 
the expected generalization error for a given 
training set and a property of the classifier known 
as its capacity. The SV algorithm effectively
regulates the capacity by choosing the function
corresponding to the hyperplane that separates
the given training data according to their labels
and is maximally distant from them
(the maximal margin hyperplane). When no linear
separation is possible, a nonlinear mapping into
a higher dimensional feature space is realized.
The hyperplane found in the feature space 
corresponds to a nonlinear decision boundary 
in the input space. 

Let $\phi : I \subseteq \mathbb{R}^n \to F \subseteq \mathbb{R}^N$ be a
mapping from the input space I to the feature
space F (Fig. 1a). In the learning phase, the
algorithm finds a hyperplane defined by the equation
$\langle w, \phi(x) \rangle = b$ such that the margin

$$\gamma = \min_{1 \le i \le \ell} y_i \left(\langle w, \phi(x_i) \rangle - b\right) = \min_{1 \le i \le \ell} y_i f(x_i) \qquad (1)$$

is maximized, where $\langle \cdot, \cdot \rangle$ denotes the inner
product, $f(x) = \langle w, \phi(x) \rangle - b$, w is an
N-dimensional vector of weights, and
b is a threshold.

Support Vector Machines, Fig. 1 (a) The feature map simplifies the classification task. (b) A maximal margin
hyperplane with its support vectors highlighted

The quantity $(\langle w, \phi(x_i) \rangle - b)/\|w\|$ is the
signed distance of the sample $x_i$ from the hyperplane.
When multiplied by the label $y_i$, it gives
a positive value for a correct classification and a
negative value for an incorrect one. Given a new
data point x, a label is assigned by evaluating the
decision function:

$$g(x) = \operatorname{sign}(\langle w, \phi(x) \rangle - b) \qquad (2)$$
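The margin (1) and the decision function (2) can be evaluated directly once a hyperplane is given. The sketch below assumes the identity feature map (so $\phi(x) = x$), represents vectors as tuples of numbers, and maps a value of exactly zero to the label +1; all of these are choices made for the illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision(w, b, x):
    """g(x) = sign(<w, x> - b), with sign(0) taken as +1 by convention here."""
    return 1 if dot(w, x) - b >= 0 else -1


def margin(w, b, samples):
    """Smallest value of y * (<w, x> - b) over the labeled samples (x, y)."""
    return min(y * (dot(w, x) - b) for x, y in samples)
```

A positive margin means every training point is on the correct side of the hyperplane; with $\|w\| = 1$ the margin is also the geometric distance of the closest point.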


Maximizing the Margin 
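For linearly separable classes, there exists a
hyperplane (w, b) such that

$$y_i \left(\langle w, \phi(x_i) \rangle - b\right) \ge \gamma, \quad i = 1, \ldots, \ell. \qquad (3)$$

Imposing $\|w\|^2 = 1$, the choice of the hyperplane
such that the margin is maximized is
equivalent to the following optimization problem:

$$\max_{w, b, \gamma} \gamma \quad \text{subject to} \quad y_i \left(\langle w, \phi(x_i) \rangle - b\right) \ge \gamma, \; i = 1, \ldots, \ell, \; \text{and} \; \|w\|^2 = 1. \qquad (4)$$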


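An efficient solution can be found in the dual
space by introducing the Lagrange multipliers $\alpha_i$,
$i = 1, \ldots, \ell$. The problem (4) can be recast in the
following dual form:

$$\max \; \sum_{i=1}^{\ell} \alpha_i - \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle \phi(x_i), \phi(x_j) \rangle \quad \text{subject to} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \; \alpha_i \ge 0. \qquad (5)$$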


This formulation shows how the problem reduces
to a convex (quadratic) optimization task. A key
property of the solutions $\alpha^*$ of problems of this kind
is that they must satisfy the Karush-Kuhn-Tucker
(KKT) conditions, which ensure that only a subset
of training examples needs to be associated with
a nonzero $\alpha_i$. This property is called sparseness
of the SVM solution and is crucial in practical
applications.

In the solution $\alpha^*$, often only a subset of
training examples is associated with nonzero $\alpha_i$. These
are called support vectors and correspond to the
points that lie closest to the separating hyperplane
(Fig. 1b). For the maximal margin hyperplane, the
weight vector $w^*$ is given by a linear function of
the training points:

$$w^* = \sum_{i=1}^{\ell} \alpha_i^* y_i \phi(x_i). \qquad (6)$$

Then the decision function (2) can equivalently
be expressed as

$$g(x) = \operatorname{sign}\left(\sum_{i=1}^{\ell} \alpha_i^* y_i \langle \phi(x_i), \phi(x) \rangle - b\right). \qquad (7)$$
For a support vector $x_i$, it holds that $\langle w^*, \phi(x_i) \rangle - b = y_i$,
from which the optimum bias $b^*$ can be
computed. However, it is better to average the
values obtained by considering all the support
vectors [2]. Both the quadratic programming
(QP) problem (5) and the decision function (7)
depend only on the dot product between
data points. The matrix of dot products with
elements $K_{ij} = K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$
is called the kernel matrix. In the case of linear
separation, we simply have $K(x_i, x_j) = \langle x_i, x_j \rangle$,
but in general, one can use functions that
provide nonlinear decision boundaries. Widely
used kernels are the polynomial kernel, with degree d,
and the Gaussian kernel, with width $\sigma$,
where d and $\sigma$ are user-defined
parameters.
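A kernel matrix can be built from any kernel function by evaluating it on all pairs of training points. In the sketch below the two kernel formulas are common textbook forms, $(\langle x, z \rangle + 1)^d$ and $\exp(-\|x - z\|^2 / (2\sigma^2))$; they are assumptions for illustration, since the exact expressions intended by the text are not reproduced above.

```python
import math


def polynomial_kernel(x, z, d=2):
    """Assumed form: (<x, z> + 1) ** d."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** d


def gaussian_kernel(x, z, sigma=1.0):
    """Assumed form: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_matrix(points, kernel):
    """K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(x, z) for z in points] for x in points]
```

Because both the dual problem (5) and the decision function (7) touch the data only through such inner products, swapping the kernel changes the decision boundary without changing the optimization procedure.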


Key Results 


In the framework of learning from examples, 
SVMs have shown several advantages compared 
to traditional neural network models (which rep- 
resented the state of the art in many classifica- 
tion tasks up to 1992). The statistical motivation 
for seeking the maximal margin solution is to 
minimize an upper bound on the test error that 
is independent of the number of dimensions and 
inversely proportional to the separation margin 
(and the sample size). This directly suggests 
embedding of the data in a high-dimensional 
space where a large separation margin can be 
achieved; this can be done efficiently with ker- 
nels using techniques from convex optimization. 
The sparseness of the solution, implied by the 
KKT conditions, adds to the efficiency of the 
result. 

The initial formulation of SVMs by Vapnik
and coworkers [1] has been extended by many
other researchers. Here we summarize some key
contributions.


Soft Margin 

In the presence of noise the SV algorithm can be
subject to overfitting. In this case one needs to
tolerate some training errors in order to obtain
better generalization power. This has led to the
development of the soft margin classifiers [4].
Introducing the slack variables $\xi_i \ge 0$, optimal
class separation can be obtained by

$$\min_{w, b, \gamma, \xi} \; -\gamma + C \sum_{i=1}^{\ell} \xi_i \quad \text{subject to} \quad y_i \left(\langle w, \phi(x_i) \rangle - b\right) \ge \gamma - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, \ell, \; \text{and} \; \|w\|^2 = 1. \qquad (8)$$


The constant C is user defined and controls the 
trade-off between the maximization of the margin 
and the number of classification errors. The dual
formulation is the same as (5) with the only
difference in the bound constraints ($0 \le \alpha_i \le C$,
$i = 1, \ldots, \ell$). The choice of soft margin
parameter is one of the two main design choices
(together with the kernel function) in applica- 
tions. It is an elegant result [5] that the entire 
set of solutions for all possible values of C can 
be found with essentially the same computational 
cost as finding a single solution: this set is often 
called the regularization path. 
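The role of the slack variables can be made concrete: for a fixed hyperplane and target margin, the smallest slack satisfying the constraints in (8) for a sample $(x_i, y_i)$ is $\max(0, \gamma - y_i(\langle w, x_i \rangle - b))$. The sketch below computes these values under the identity feature map; the function name is hypothetical, and no optimization over w, b, or $\gamma$ is attempted.

```python
def slack_values(w, b, gamma, samples):
    """Smallest feasible slack for each labeled sample (x, y).

    A slack of 0 means the sample meets the margin gamma; a positive
    slack measures by how much it falls short.
    """
    result = []
    for x, y in samples:
        functional_margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        result.append(max(0.0, gamma - functional_margin))
    return result
```

The constant C then weights the sum of these slacks against the margin, which is the trade-off described in the text.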


Regression 

An SV algorithm for regression, called support
vector regression (SVR), was proposed in 1996
[6]. A linear algorithm is used in the kernel-induced
feature space to construct a function
such that the training points are inside a tube of
given radius $\epsilon$. As for classification, the regression
function depends only on a subset of the training
data.
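The tube of radius $\epsilon$ corresponds to the $\epsilon$-insensitive loss commonly used with SVR: deviations smaller than $\epsilon$ cost nothing, and larger ones are charged only for the excess. A minimal sketch, with a hypothetical function name:

```python
def epsilon_insensitive_loss(prediction, target, epsilon):
    """Zero inside the tube |prediction - target| <= epsilon, linear outside."""
    return max(0.0, abs(prediction - target) - epsilon)
```

Training points with zero loss lie inside the tube and do not influence the regression function, which is the source of the sparseness noted above.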


Speeding Up the Quadratic Program 

Since the emergence of SVMs, many researchers
have developed techniques to solve
the problem (5) efficiently, a quite time-consuming task,
especially for large training sets. Most methods
decompose large-scale problems into a series of
smaller ones. The most widely used method is
that of Platt [7], known as sequential
minimal optimization.


Kernel Methods 

In SVMs, both the learning problem and the
decision function can be formulated only in terms
of dot products between data points. Other popular
methods (e.g., principal component analysis,
canonical correlation analysis, Fisher discriminant)
have the same property. This fact has led to
a huge number of algorithms that effectively use
kernels to deal with nonlinear functions while keeping
the same complexity as the linear case. They are
referred to as kernel methods [8,9].


Choosing the Kernel 

The main design choice when using SVMs is 
the selection of an appropriate kernel function, a 
problem of model selection that roughly relates 
to the choice of a topology for a neural network. 
It is a nontrivial result [10] that this key
task, too, can be translated into a convex optimization
problem (a semi-definite program) under general
conditions. A kernel can be optimally selected
from a kernel space resulting from all linear 
combinations of a basic set of kernels. 


Kernels for General Data 

Kernels are not just useful tools to allow us to 
deploy methods of linear statistics in a nonlinear 
setting. They also allow us to apply them to 
nonvectorial data: kernels have been designed to 
operate on sequences, graphs, text, images, and 
many other kinds of data [8]. 


Applications 


Since their emergence, SVMs have been widely 
used in a huge variety of applications. To give 
some examples, good results have been obtained 
in text categorization, handwritten character 
recognition, and biosequence analysis. 


Text Categorization 

In automatic text categorization, text documents 
are classified into a fixed number of predefined 
categories based on their content. In the works 
performed by Joachims [11] and Dumais et al. 
[12], documents are represented by vectors with 
the so-called bag-of-words approach used in the 
information retrieval field. The similarity between
two documents is given by the inner product
between the corresponding vectors. Experiments
on the collection of Reuters news stories showed 
good results for SVMs compared to other classi- 
fication methods. 
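The bag-of-words representation and the inner product between documents can be sketched in a few lines. The example below uses raw term counts and whitespace tokenization, which are simplifying assumptions; the cited works may use different weighting and preprocessing.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to its term-count vector (sparse, as a Counter)."""
    return Counter(text.lower().split())


def inner_product(a, b):
    """Inner product of two sparse term-count vectors."""
    # Counter returns 0 for terms absent from b.
    return sum(count * b[term] for term, count in a.items())
```

This inner product is a valid linear kernel on documents, so it can be plugged into the dual formulation without ever building dense vectors over the whole vocabulary.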


Handwritten Character Recognition 

This was the first real-world task on which SVMs
were tested. In particular, two publicly available
data sets (USPS and NIST) have been considered,
since they are commonly used for benchmarking
classifiers. Many experiments, mainly summarized
in [13], showed
that SVMs can perform as well as other complex
systems without incorporating any detailed prior
knowledge about the task.


Bioinformatics 

SVMs have been widely used also in bioinformat- 
ics. For example, Jaakkola and Haussler [14] ap- 
plied SVMs to the problem of protein homology 
detection, i.e., the task of relating new protein se- 
quences to proteins whose properties are already 
known. Brown et al. [15] describe a successful 
use of SVMs for the automatic categorization of 
gene expression data from DNA microarrays. 


URL to Code 


Many free software implementations of SVMs 
are available at the website 


* www.support-vector.net/software.html 


Two in particular deserve a special mention for
their efficiency:

SVMlight: Joachims T. Making large-scale
SVM learning practical. In: Schölkopf B,
Burges CJC, and Smola AJ (eds) Advances
in Kernel Methods: Support Vector Learning,
MIT Press, 1999. Software available at
http://svmlight.joachims.org

LIBSVM: Chang CC, and Lin CJ, LIBSVM:
a library for support vector machines, 2001.
Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm


Problem Definition


Surface reconstruction, here, is the problem of
producing a piecewise-linear representation of a
two-dimensional surface S in $\mathbb{R}^3$, given as input
a set P of point samples from the surface. Very
sparse sets of point samples clearly do not convey
much about S, so in order to prove correctness,
we need to assume that the sample P is somehow
sufficiently dense. The minimum required
density could vary across the surface, with more
detailed areas requiring denser sampling. This
idea is captured in the following definition [2].
Let S be a two-dimensional surface in $\mathbb{R}^3$. The
medial axis of S is the closure of the set of
points that have more than one nearest point on
S; a two-dimensional example is shown in Fig. 1,
top left.

Surface Reconstruction, Fig. 1 The medial axis of an
object; the Voronoi diagram of a set of samples from
the object boundary; the set of polar balls, with those
inside the object shaded; the corresponding cells of the
weighted Voronoi diagram, again with those inside the
object shaded


Definition 1 The local feature size f(x) at a
point x is the minimum distance from x to the
medial axis of S.


The distance from the medial axis to the surface is
zero at a sharp feature such as a corner or a crease,
so we usually assume that S is smooth. The
algorithms described here make the following
$\epsilon$-sampling assumption: the minimum distance, at
any surface point x, to the nearest sample point
is at most $\epsilon f(x)$, for some small constant $\epsilon$. This
leads to algorithms that are provably correct in
the following sense.

INPUT: A point set P that is an $\epsilon$-sample from
a smooth surface S without boundary.

OUTPUT: A piecewise-linear manifold without
boundary, homeomorphic to S, that everywhere
lies within distance $O(\epsilon f(x))$ of S.

The monograph [7] is an excellent reference for this line of
research.
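The sampling condition is easiest to check in a two-dimensional analogue. For a circle of radius r in the plane, the medial axis is the center, so the local feature size is r everywhere; the surface point farthest from the samples is the midpoint of the largest angular gap $\theta$, at chord distance $2 r \sin(\theta / 4)$ from the nearest sample. The sketch below uses this to test the $\epsilon$-sampling assumption; the circle setting and the function name are assumptions made for the illustration, and the list of sample angles must be nonempty.

```python
import math


def is_epsilon_sample_of_circle(angles, epsilon, radius=1.0):
    """Check the epsilon-sampling condition for samples on a circle.

    `angles` are the sample positions in radians (nonempty). The local
    feature size of a circle equals its radius at every point.
    """
    a = sorted(t % (2 * math.pi) for t in angles)
    gaps = [hi - lo for lo, hi in zip(a, a[1:])]
    gaps.append(a[0] + 2 * math.pi - a[-1])          # wrap-around gap
    worst_distance = 2 * radius * math.sin(max(gaps) / 4)
    return worst_distance <= epsilon * radius
```

With eight evenly spaced samples the worst-case distance is $2 \sin(\pi/16) \approx 0.39$ times the radius, so the samples form a 0.5-sample but not a 0.3-sample.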


Key Results 


One key idea is that in the neighborhood of any
point $p \in P$ sampled from S, the surface is
well approximated by a plane. Specifically, for
any surface point x closer to p than to any other
sample, the distance of x from the tangent plane
at p is $O(\epsilon f(x))$, as is the difference between the
surface normal at x and the surface normal at p
[2] (with the corrected proof in [3]). Another key
idea is that some subset of the Voronoi vertices of
P approximates the medial axis of S, as in Fig. 1,
top right.


Crust Algorithm 

The crust algorithm [2] approximates the me- 
dial axis with a subset of the three-dimensional 
Voronoi vertices, called the poles. Each sample
point $p \in P$ selects the vertex of its Voronoi
cell farthest from p as its first pole and the vertex
farthest in the opposite direction as its second. We
then eliminate any Delaunay triangle all of whose 
circumspheres contain a pole; this is easy to im- 
plement by computing the Delaunay triangulation 
of the set P augmented with the set of poles and 
eliminating any output triangle adjacent to a pole. 
A subset of the remaining surface triangles can 
then be selected as the piecewise-linear output 
surface. 


Cocone Algorithm 

The cocone algorithm [4] provides a simpler way 
of selecting a set of surface Delaunay triangles, 
requiring only one Voronoi diagram computation. 
It relies on the fact that the direction vector from
a sample $p \in P$ to its first pole is within $O(\epsilon)$
of the surface normal at p, under the $\epsilon$-sampling
assumption. We define the cocone at p as the
complement of a double cone, such that the angle
between the cone surface and this approximate
normal vector is at least $\pi/2 - \pi/8$. We consider the
intersection of the cocone at p with the Voronoi 
cell of p; the Delaunay triangles dual to any 
edge in this intersection are marked as potential 
surface triangles. Triangles marked by all three 
of their vertices are included in the set of surface 
triangles. 
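Membership of a point in the cocone reduces to an angle test against the approximate normal. The sketch below treats the normal as an unoriented axis and uses a threshold of $3\pi/8 = \pi/2 - \pi/8$ as the default; the threshold is exposed as a parameter because its exact value here is an assumed reading of the description above, and the function name is hypothetical.

```python
import math


def in_cocone(p, normal, y, half_angle=3 * math.pi / 8):
    """True if y lies in the cocone at p.

    The cocone is the complement of the double cone around the line
    through p in direction `normal` (nonzero) with the given half angle.
    """
    v = [yi - pi for yi, pi in zip(y, p)]
    norm_v = math.sqrt(sum(c * c for c in v))
    norm_n = math.sqrt(sum(c * c for c in normal))
    if norm_v == 0.0:
        return True                     # the apex itself
    # |cos| of the angle between y - p and the unoriented normal axis.
    cos_axis = abs(sum(a * b for a, b in zip(v, normal))) / (norm_v * norm_n)
    return cos_axis <= math.cos(half_angle)
```

Points nearly perpendicular to the approximate normal, that is, close to the estimated tangent plane, pass the test; points near the normal direction do not.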


Powercrust Algorithm 

While it is easy in theory to select a subset of 
the surface triangles to form a piecewise-linear 
output surface, it can be difficult in practice when 
the sampling density fails to meet the assumption, 
as is inevitable at sharp features. The power crust
algorithm [5] eliminates this issue by producing a
piecewise-linear output surface. The Voronoi ball
centered at a pole is the ball with its nearest input 
samples on the boundary; see Fig. 1, lower left. 
We begin by labeling the Voronoi balls of all of 
the poles either as inside or outside the object 
bounded by S, using an iterative algorithm. We 
then compute the weighted Voronoi diagram, also 
known as the power diagram, of these polar 
Voronoi balls. Any Voronoi face separating the 
cell of an inner pole from the cell of an outer 
pole is output as part of the surface (Fig. 1, lower 
right). The faces of the piecewise-linear output 
surface are convex polygons but not in general 
triangles. 
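The weighted Voronoi (power) diagram assigns each point to the ball with the smallest power distance $\|x - c\|^2 - r^2$. The sketch below shows this assignment for balls given as `(center, radius)` pairs; the representation and function names are assumptions for the example, and no diagram is actually constructed.

```python
def power_distance(x, center, radius):
    """Power distance of point x to the ball (center, radius)."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2


def power_cell(x, balls):
    """Index of the ball whose power cell contains x (ties go to the first)."""
    return min(range(len(balls)),
               key=lambda i: power_distance(x, balls[i][0], balls[i][1]))
```

Unlike the ordinary Voronoi diagram, a larger ball claims more space: the point midway between two centers belongs to the cell of the ball with the larger radius.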


Noisy Samples 

When the input sample points have noise, not
every pole will be near the medial axis. Nonetheless,
if the level of noise is everywhere small
relative to the local feature size f, some subset of
Voronoi vertices will still approximate the medial
axis. In [8], this idea is developed into a provably
correct algorithm. In addition to the $\epsilon$-sampling
assumption, we need to assume that the noise
level is $O(\epsilon^2 f(x))$ and that the distance from
any sample p to the kth nearest sample p' is
$O(\epsilon f(x))$. This allows us to recognize a Voronoi
vertex of p as a pole only when it is significantly
farther from p than the k-nearest neighbors of p.
These poles are then labeled as either inner or
outer. This algorithm produces a triangulation of
the boundary of the union of the inner polar balls
as the output surface.


Complexity 

The complexity of all of these algorithms
depends on the complexity of the Voronoi diagram.
While in general the Voronoi diagram of n points
in $\mathbb{R}^3$ might have complexity $O(n^2)$, Attali,
Boissonnat, and Lieutier [6] proved that the complexity
of the Voronoi diagram for points distributed
uniformly on a nondegenerate smooth surface
in $\mathbb{R}^3$ is $O(n \lg n)$. Another idea, employed by
Funke and Ramos [9] and advanced by Cheng
et al. [12], is to replace the Voronoi diagram with
a less computationally expensive structure to get
an $O(n \lg n)$ algorithm.


Applications


Interest in this problem was motivated by the 
advent of laser-range and LiDAR scanners [10], 
which produce depth maps sampled by point 
clouds. It is often reasonable to assume noise-free 
surface samples, since there are preprocessing 
methods, such as moving least squares (MLS) 
[1], that attract noisy point clouds onto nearby 
surfaces; there has also been theoretical work 
on MLS. MLS, or simply local plane-fitting, 
can be used to produce a normal vector at each 
sample point. Another common assumption is 
that the normal vectors can be consistently ori- 
ented. Poisson surface reconstruction [11] is an 
optimization technique that constructs manifold 
surfaces from possibly noisy points with nor- 
mals. Because of its very efficient implementa- 
tions, it is currently the most popular method in 
practice. 


Open Problems 


Subsequent work in surface reconstruction, both
in computer graphics and in computational geometry,
has focused on the identification and reconstruction
of sharp features and then using them to
construct surfaces that are non-manifold. Proving
that the complexity of the Voronoi diagram of
points distributed on a generic smooth surface
with noise or with boundary is $o(n^2)$ remains
open.


URLs to Code and Data Sets 


There is code available for the cocone algorithm
(http://web.cse.ohio-state.edu/~tamaldey/cocone.html),
with several subsequent variants.
There is also code for the power crust algorithm
(http://www.cs.ucdavis.edu/~amenta/powercrust.html).
There is a set of benchmark data sets for
surface reconstruction
(http://www.cs.utah.edu/~bergerm/recon_bench).


Cross-References 


Curve Reconstruction 
Manifold Reconstruction 


Recommended Reading 


1. Alexa M, Behr J, Cohen-Or D, Fleishman S, Levin
D, Silva CT (2003) Computing and rendering point
set surfaces. IEEE Trans Vis Comput Graph 9(1):3-15

2. Amenta N, Bern M (1999) Surface reconstruction by
Voronoi filtering. Discret Comput Geom 22(4):481-504

3. Amenta N, Dey TK (2007) Normal variation for
adaptive feature size, arXiv

4. Amenta N, Choi S, Dey TK, Leekha N (2000) A
simple algorithm for homeomorphic surface reconstruction.
In: Proceedings of the sixteenth annual
symposium on computational geometry, Hong Kong.
ACM, pp 213-222

5. Amenta N, Choi S, Kolluri RK (2001) The power
crust, unions of balls, and the medial axis transform.
Comput Geom Theory Appl 19(2):127-153

6. Attali D, Boissonnat JD, Lieutier A (2003) Complexity
of the Delaunay triangulation of points on
surfaces: the smooth case. In: Proceedings of the
nineteenth annual symposium on computational geometry,
San Diego. ACM, pp 201-210

7. Dey TK (2006) Curve and surface reconstruction:
algorithms with mathematical analysis. Cambridge
monographs on applied and computational mathematics.
Cambridge University Press, Leiden

8. Dey TK, Goswami S (2004) Provable surface
reconstruction from noisy samples. In: Proceedings of
the twentieth annual symposium on computational
geometry, Brooklyn. ACM, pp 330-339

9. Funke S, Ramos EA (2002) Smooth-surface
reconstruction in near-linear time. In: Proceedings of the
thirteenth annual ACM-SIAM symposium on discrete
algorithms, San Francisco. Society for Industrial and
Applied Mathematics, pp 781-790

10. Hoppe H, DeRose T, Duchamp T, McDonald J, Stuetzle
W (1992) Surface reconstruction from unorganized
points. ACM Trans Graph (TOG) 26(2):71-78

11. Kazhdan M, Bolitho M, Hoppe H (2006) Poisson
surface reconstruction. In: Proceedings of the fourth
eurographics symposium on geometry processing,
Cagliari, pp 61-70

12. Cheng S-W, Jin J, Lau M-K (2012) A fast and simple
surface reconstruction algorithm. In: Proceedings of
the 28th annual symposium on computational geometry,
Chapel Hill, pp 69-78


Symbolic Model Checking 


Adnan Aziz¹ and Amit Prakash²
¹Department of Electrical and Computer
Engineering, University of Texas, Austin,
TX, USA
²Microsoft, MSN, Redmond, WA, USA


Keywords 


Formal hardware verification 


Years and Authors of Summarized 
Original Work 


1990; Burch, Clarke, McMillan, Dill 


Problem Definition 


Design verification is the process of taking a 
design and checking that it works correctly. More 
specifically, every design verification paradigm 
has three components [6]: (1) a language for 
specifying the design in an unambiguous way, (2) 
a language for specifying properties that are to be 
checked of the design, and (3) a checking
procedure, which determines whether the properties
hold of the design.

The verification problem is very general: it 
arises in low-level designs, e.g., checking that a 
combinational circuit correctly implements arith- 
metic, as well as high-level designs, e.g., check- 
ing that a library written in high-level language 
correctly implements an abstract data type. 


Hardware Verification 


The verification of hardware designs is
particularly challenging. Verification is difficult
in part because the large number of concurrent
operations makes it very difficult to conceive of
and construct all possible corner cases, e.g., one
unit initiating a transaction in the same cycle
as another receives an exception. In addition,
software models used for simulation run several
orders of magnitude slower than the final chip.
Faulty hardware is usually impossible
to correct after fabrication, which means that the
cost of a defect is very high, since it takes several
months to go through the process of designing
and fabricating new hardware. Wile et al. [15]
provide a comprehensive account of hardware
verification.


State Explosion 


Since the number of state-holding elements in
digital hardware is bounded, the number of
possible states that the design can be in is finite, so
complete automated verification is, in principle,
possible. However, the number of states that a
hardware design can reach from the initial state
can be exponential in the size of the design; this
phenomenon is referred to as “state explosion.”
In particular, algorithms for verifying hardware
that explicitly record visited states, e.g., in a hash
table, have very high time complexity, making
them infeasible for all but the smallest designs.
The problem of complete hardware verification is
known to be PSPACE-hard, which means that any
practical approach must rely on heuristics.


Hardware Model 


A hardware design is formally described using
circuits [4, 8]. A combinational circuit consists
of Boolean combinational elements connected
by wires. The Boolean combinational elements
are gates and primary inputs. Gates come in
three types: NOT, AND, and OR. The NOT gate
functions as follows: it takes a single Boolean-valued
input and produces a single Boolean-valued
output, which takes value 0 if the input
is 1 and 1 if the input is 0. The AND gate takes
two Boolean-valued inputs and produces a single
output; the output is 1 if both inputs are 1 and
0 otherwise. The OR gate is similar to AND,
except that its output is 1 if one or both inputs
are 1.

A circuit can be represented by a directed 
graph where the nodes represent the gates and 
primary inputs, and edges represent wires in the 
direction of signal flow. Circuits are required 
to be acyclic, that is, there is no cycle of 
gates. The absence of cycles implies that a 
Boolean assignment to the primary inputs can 
be propagated through the gates in topological 
order. 
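
The propagation in topological order can be sketched as follows. This is a minimal illustration, not part of the original entry: the dictionary encoding of gates, the function name `evaluate`, and the example circuit `XOR` are all assumptions made for the sketch.

```python
from graphlib import TopologicalSorter

def evaluate(gates, inputs):
    """gates: name -> (op, [fan-in names]); inputs: primary input -> 0/1.

    Hypothetical encoding; returns the value of every node in the circuit.
    """
    # Map each gate to its predecessors; static_order() yields predecessors first
    # and raises CycleError if the circuit is not acyclic.
    order = TopologicalSorter({g: fanin for g, (_, fanin) in gates.items()})
    values = dict(inputs)
    for node in order.static_order():
        if node in values:          # primary input, already assigned
            continue
        op, fanin = gates[node]
        args = [values[f] for f in fanin]
        if op == "NOT":
            values[node] = 1 - args[0]
        elif op == "AND":
            values[node] = args[0] & args[1]
        elif op == "OR":
            values[node] = args[0] | args[1]
        else:
            raise ValueError(f"unknown gate type {op}")
    return values

# XOR built only from NOT, AND, and OR: (a AND NOT b) OR (NOT a AND b)
XOR = {
    "na": ("NOT", ["a"]),
    "nb": ("NOT", ["b"]),
    "t1": ("AND", ["a", "nb"]),
    "t2": ("AND", ["na", "b"]),
    "out": ("OR", ["t1", "t2"]),
}
```

For example, `evaluate(XOR, {"a": 1, "b": 0})["out"]` evaluates the output gate after all of its fan-in gates have been assigned.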

A sequential circuit extends the notion of 
circuit described above by adding stateful ele- 
ments. Specifically, a sequential circuit includes 
registers. Each register has a single input, which 
is referred to as its next-state input. 

A valuation on a set V is a function whose 
domain is V. A state in a sequential circuit is a 
Boolean-valued valuation on the set of registers. 
An input to a sequential circuit is a Boolean- 
valued valuation on the set of primary inputs. 
Given a state s and an input i, the logic gates
in the circuit uniquely define a Boolean-valued
valuation t on the set of register inputs; this is
referred to as the next state of the circuit at state s
under input i, and we say that s transitions to t on
input i. It is convenient to denote such a transition by
$s \xrightarrow{i} t$.

A sequential circuit can naturally be identified 
with a finite state machine (FSM), which is a 
graph defined over the set of all states; an edge 
(s,t) exists in the FSM graph if there exists an
input i such that state s transitions to t on input i.
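
As a small illustration of this identification, the sketch below enumerates the FSM edges of a sequential circuit by brute force. The 2-bit counter with an enable input, and the names `next_state` and `fsm_edges`, are hypothetical and chosen only for the example.

```python
from itertools import product

def next_state(state, inp):
    """Hypothetical 2-bit counter with enable: counts up when en = 1."""
    q1, q0 = state
    (en,) = inp
    n0 = q0 ^ en
    n1 = q1 ^ (q0 & en)
    return (n1, n0)

def fsm_edges(step, n_regs, n_inputs):
    """Return all pairs (s, t) such that s transitions to t on some input."""
    edges = set()
    for s in product((0, 1), repeat=n_regs):
        for i in product((0, 1), repeat=n_inputs):
            edges.add((s, step(s, i)))
    return edges
```

The enumeration visits all 2^n states, which is exactly the cost that symbolic methods try to avoid; it is useful only for very small designs.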


Invariant Checking 


An invariant is a set of states; informally, the term 
is used to refer to a set of states that are “good” 
in some sense. One common way to specify an 
invariant is to write a Boolean formula on the 
register variables — the states which satisfy the 
formula are precisely the states in the invariant. 
Given states r and s, define r to be reachable
from s if there is a sequence of inputs
$i_0, i_1, \ldots, i_{n-1}$ such that

$$s = s_0 \xrightarrow{i_0} s_1 \xrightarrow{i_1} \cdots \xrightarrow{i_{n-1}} s_n = r.$$

A fundamental problem in hardware
verification is the following: given an invariant A
and a state s, does there exist a state r reachable
from s which is not in A?
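
For very small designs the question can be answered by explicit search. The sketch below is an illustration only (the function `find_violation` and the modulo-3 counter are assumptions, not taken from the entry); it performs a breadth-first search and records visited states in a set, which is the explicit-state approach that suffers from state explosion.

```python
from collections import deque
from itertools import product

def find_violation(step, n_inputs, start, invariant):
    """Return a state reachable from start that violates the invariant, or None."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return s
        for i in product((0, 1), repeat=n_inputs):
            t = step(s, i)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None

def mod3_counter(state, inp):
    """Hypothetical design: counts 00 -> 01 -> 10 -> 00 when enabled."""
    (en,) = inp
    if not en:
        return state
    return {(0, 0): (0, 1), (0, 1): (1, 0), (1, 0): (0, 0), (1, 1): (0, 0)}[state]
```

With start state `(0, 0)`, the invariant "the state is never `(1, 1)`" holds for this counter, whereas the invariant "the state is never `(1, 0)`" is violated.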


Key Results 


Symbolic model checking (SMC) is a heuristic 
approach to hardware verification. It is based 
on the idea that rather than representing and 
manipulating states one at a time, it is more 
efficient to use symbolic expressions to represent 
and manipulate sets of states. 

A key idea in SMC is that given a set
$A \subseteq \{0,1\}^n$, a Boolean function
$f_A : \{0,1\}^n \to \{0,1\}$ can be constructed, given
by $f_A(a_1,\ldots,a_n) = 1$ iff $(a_1,\ldots,a_n) \in A$. Note
that given a characteristic function $f_A$, A can be
obtained and vice versa.

There are many ways in which a Boolean func- 
tion can be represented: formulas in DNF, general 
Boolean formulas, combinational circuits, etc. In 
addition to an efficient representation for state
sets, the ability to perform fast computations with
sets of states is also important; for example, to
determine whether an invariant holds, one must
compute the set of states reachable
from a given state. BDDs [2] are particularly
well suited to representing Boolean functions, as 
they combine succinct representation with effi- 
cient manipulation; they are the data structure 
underlying SMC. 


Image Computation 


A key computation that arises in verification is
determining the image of a set of states A in a
design D: the image of A is the set of all states
t for which there exists a state s in A and an input
i such that s transitions to t under input i.
The image of A is denoted by Img(A).

The transition relation of a design is the set
of (s,i,t) triples such that s transitions to t
under input i. Let the design have n registers and
m primary inputs; then the transition relation is
a subset of $\{0,1\}^n \times \{0,1\}^m \times \{0,1\}^n$.

Conceptually, the transition relation completely
captures the dynamics of the design:
given an initial state and an input sequence, the
evolution of the design is completely determined 
by the transition relation. 

Since the transition relation is a subset of
$\{0,1\}^{n+m+n}$, it has a characteristic function
$f_T : \{0,1\}^{n+m+n} \to \{0,1\}$. View $f_T$ as
being defined over the variables $x_0,\ldots,x_{n-1}$,
$i_0,\ldots,i_{m-1}$, $y_0,\ldots,y_{n-1}$. Let the set of states
A be represented by the function $f_A$ defined
over variables $x_0,\ldots,x_{n-1}$. Then the following
identity holds:


$$\mathrm{Img}(A) = (\exists x_0 \cdots \exists x_{n-1} \exists i_0 \cdots \exists i_{m-1})(f_A \wedge f_T).$$


The identity holds because $(\beta_0,\ldots,\beta_{n-1})$ satisfies
the right-hand side expression exactly when
there are values $\alpha_0,\ldots,\alpha_{n-1}$ and $\iota_0,\ldots,\iota_{m-1}$
such that $(\alpha_0,\ldots,\alpha_{n-1}) \in A$ and the state
$(\alpha_0,\ldots,\alpha_{n-1})$ transitions to $(\beta_0,\ldots,\beta_{n-1})$ on
input $(\iota_0,\ldots,\iota_{m-1})$.
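
The identity can be checked on tiny designs by evaluating the quantifiers through enumeration. The sketch below is illustrative only: the function `image`, the two one-register designs, and the tuple-based encoding of characteristic functions are assumptions; a real implementation would use BDDs rather than enumeration.

```python
from itertools import product

def image(f_A, f_T, n, m):
    """Img(A) as the set of y with: exists x, i such that f_A(x) and f_T(x, i, y)."""
    img = set()
    for y in product((0, 1), repeat=n):
        if any(f_A(x) and f_T(x, i, y)
               for x in product((0, 1), repeat=n)
               for i in product((0, 1), repeat=m)):
            img.add(y)
    return img

# Hypothetical one-register designs, given by their transition relations.
def toggle(x, i, y):
    return int(y[0] == (x[0] ^ i[0]))   # next state = x XOR input

def gated(x, i, y):
    return int(y[0] == (x[0] & i[0]))   # next state = x AND input

def only_zero(x):
    return int(x == (0,))               # characteristic function of A = {(0,)}
```

From A = {(0,)}, the toggle design can reach both states in one step, while the gated design stays in state (0,).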


Invariant Checking 


The set of all states reachable from a given set A
is the limit of the sequence of state sets
$R_0, R_1, \ldots$ defined below:


$$R_0 = A, \qquad R_{i+1} = R_i \cup \mathrm{Img}(R_i).$$


Since for all i, $R_i \subseteq R_{i+1}$ and the number of
distinct state sets is finite, the limit is reached in
some finite number of steps, i.e., for some n, it
must be that $R_{n+1} = R_n$. It is straightforward
to show that the limit is exactly equal to the set
of states reachable from A: the basic idea is to
inductively construct input sequences that lead
from states in A to $R_i$ and to show that if state
t is reachable from a state in A under an input
sequence of length l, then t must be in $R_l$.
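
The iteration can be sketched with ordinary Python sets standing in for symbolic state sets. The names `reachable`, `img`, and the hard-coded modulo-3 counter `STEP` are assumptions made for this illustration.

```python
def reachable(A, img):
    """Iterate R_{i+1} = R_i U Img(R_i) from R_0 = A until a fixed point is reached."""
    R = frozenset(A)
    while True:
        R_next = R | img(R)
        if R_next == R:          # fixed point: no new states were added
            return R
        R = R_next

# Hypothetical design: a modulo-3 counter with an enable input.
STEP = {(0, 0): (0, 1), (0, 1): (1, 0), (1, 0): (0, 0), (1, 1): (0, 0)}

def img(R):
    # enable = 0 keeps the state; enable = 1 advances it
    return frozenset(R) | frozenset(STEP[s] for s in R)
```

Starting from `{(0, 0)}` the iteration stabilizes on the three counter states and never adds `(1, 1)`.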

Given BDDs F and G representing functions
f and g, respectively, there is an algorithm
based on dynamic programming for performing
conjunction, i.e., for computing the BDD for
$f \cdot g$. The algorithm has polynomial complexity,
specifically $O(|F| \cdot |G|)$, where $|B|$ denotes the
number of nodes in the BDD B. There are similar
algorithms for performing disjunction ($f + g$)
and computing cofactors ($f_x$ and $f_{x'}$). Together
these yield an algorithm for the operation
of existential quantification, since
$(\exists x) f = f_x + f_{x'}$.
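
The sketch below shows one way these operations fit together in a tiny reduced ordered BDD manager. It is an illustration under stated assumptions, not the implementation of any particular package: the class `BDD`, its method names, and the integer node ids are invented here, and `restrict` is left unmemoized for brevity. The memo table in `apply` is what bounds the number of recursive calls by the number of node pairs.

```python
class BDD:
    """Tiny reduced ordered BDD manager; variables are integers ordered 0 < 1 < ..."""

    def __init__(self):
        self.unique = {}                   # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}    # ids 0 and 1 are the terminals

    def mk(self, var, low, high):
        if low == high:                    # redundant test: reuse the child
            return low
        key = (var, low, high)
        if key not in self.unique:         # share structurally equal nodes
            self.unique[key] = len(self.nodes)
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def var(self, v):
        return self.mk(v, 0, 1)

    def top(self, u):
        return self.nodes[u][0] if u > 1 else float("inf")

    def _cof(self, u, v, b):
        """Cofactor with respect to v, valid only when v <= top(u)."""
        if self.top(u) != v:
            return u
        return self.nodes[u][2] if b else self.nodes[u][1]

    def apply(self, op, f, g, memo=None):
        """Combine f and g with a binary Boolean operator op on 0/1 values."""
        memo = {} if memo is None else memo
        if f <= 1 and g <= 1:
            return op(f, g)
        if (f, g) not in memo:             # each node pair is expanded once
            v = min(self.top(f), self.top(g))
            low = self.apply(op, self._cof(f, v, 0), self._cof(g, v, 0), memo)
            high = self.apply(op, self._cof(f, v, 1), self._cof(g, v, 1), memo)
            memo[(f, g)] = self.mk(v, low, high)
        return memo[(f, g)]

    def restrict(self, u, v, b):
        """General cofactor: set variable v to b anywhere in u."""
        if u <= 1 or self.top(u) > v:
            return u
        var, low, high = self.nodes[u]
        if var == v:
            return high if b else low
        return self.mk(var, self.restrict(low, v, b), self.restrict(high, v, b))

    def exists(self, u, v):
        """(exists v) u = u restricted to v = 0, OR u restricted to v = 1."""
        return self.apply(lambda a, b: a | b,
                          self.restrict(u, v, 0), self.restrict(u, v, 1))
```

Because nodes are shared through the `unique` table, two equivalent functions built in different ways end up with the same node id, so equality of functions is an integer comparison.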

It is straightforward to build BDDs for $f_A$ and
$f_T$: A is typically given using a propositional
formula, and the BDD for $f_A$ can be built up
using functions for conjunction, disjunction, and
negation. The BDD for $f_T$ is built from the
BDDs for the next-state nodes, over the register
and primary input variables. Since the only gate
types are AND, OR, and NOT, the BDDs for the
next-state nodes can be built using the standard
BDD operators for conjunction, disjunction, and
negation. Let the next-state functions be
$f_0,\ldots,f_{n-1}$; then $f_T$ is
$(y_0 = f_0) \cdot (y_1 = f_1) \cdots (y_{n-1} = f_{n-1})$, and
so the BDD for $f_T$ can be constructed using the
usual BDD operators.
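
The product form of the transition relation can be illustrated without BDDs by treating functions as Python callables. In this sketch the helper `transition_relation` and the 2-bit counter next-state functions `f0` and `f1` are hypothetical.

```python
def transition_relation(next_fns):
    """Build f_T(x, i, y) as the conjunction of (y_k == f_k(x, i)) over all registers k."""
    def f_T(x, i, y):
        return int(all(y[k] == f(x, i) for k, f in enumerate(next_fns)))
    return f_T

# Hypothetical 2-bit counter with enable input i[0]; x[0] is the low bit.
def f0(x, i):
    return x[0] ^ i[0]

def f1(x, i):
    return x[1] ^ (x[0] & i[0])

f_T = transition_relation([f0, f1])
```

Since the design is deterministic, for every present state and input exactly one next state y satisfies `f_T`.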

Since the image computation operation can be
expressed in terms of $f_A$ and $f_T$, and conjunction
and existential quantification operations, it can be
performed using BDDs. The computation of each $R_i$
involves an image operation and a disjunction,
and since BDDs are canonical, the test for fixed
point is trivial.


Applications 


The primary application of the technique 
described above is for checking properties of 
hardware designs. These properties can be 
invariants described using propositional formulae 
over the register variables, in which case the 
approach above is directly applicable. More 
generally, properties can be expressed in a 
temporal logic [5], specifically through formulae 
which express acceptable sequences of outputs 
and transitions. 

CTL is one common temporal logic. A CTL
formula is given by the following grammar: if x
is a variable corresponding to a register, then x
is a CTL formula; otherwise, if $\varphi$ and $\psi$ are CTL
formulas, then so are $(\neg\varphi)$, $(\varphi \vee \psi)$, $(\varphi \wedge \psi)$,
$(\varphi \to \psi)$, and $EX\varphi$, $E\varphi U\psi$, and $EG\varphi$.

A CTL formula is interpreted as being true at
a state; a formula x is true at a state if that register
is 1 in that state. Propositional connectives are
handled in the standard way, e.g., a state satisfies
a formula $(\varphi \wedge \psi)$ if it satisfies both $\varphi$ and $\psi$.
A state s satisfies $EX\varphi$ if there exists a state t
such that s transitions to t, and t satisfies $\varphi$. A state
s satisfies $E\varphi U\psi$ if there exists a sequence of
inputs $i_0,\ldots,i_n$ leading through states $s_0 = s$,
$s_1, s_2, \ldots, s_{n+1}$ such that $s_{n+1}$ satisfies $\psi$, and all
states $s_i$, $i < n+1$, satisfy $\varphi$. A state s satisfies
$EG\varphi$ if there exists an infinite sequence of inputs
$i_0, i_1, \ldots$ leading through states $s_0 = s, s_1, s_2, \ldots$
such that all states $s_i$ satisfy $\varphi$.
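
These operators can be computed as fixed points over sets of states. The sketch below works on an explicit edge set rather than on BDDs, and its names (`pre`, `EX`, `EU`, `EG`) and the example graph are assumptions. It also follows the common convention that a state satisfying $\psi$ already satisfies $E\varphi U\psi$ (a path with zero transitions), which is slightly more permissive than the definition above, where at least one transition is taken.

```python
def pre(edges, S):
    """States with at least one successor in S."""
    return {s for (s, t) in edges if t in S}

def EX(edges, phi):
    return pre(edges, phi)

def EU(edges, phi, psi):
    """Least fixed point of Z = psi | (phi & pre(Z))."""
    Z = set(psi)
    while True:
        Z_next = Z | (set(phi) & pre(edges, Z))
        if Z_next == Z:
            return Z
        Z = Z_next

def EG(edges, phi):
    """Greatest fixed point of Z = phi & pre(Z)."""
    Z = set(phi)
    while True:
        Z_next = Z & pre(edges, Z)
        if Z_next == Z:
            return Z
        Z = Z_next

# Hypothetical FSM graph: a -> b -> c, with self-loops on c and d.
EDGES = {("a", "b"), ("b", "c"), ("c", "c"), ("d", "d")}
```

Here the arguments `phi` and `psi` are the sets of states already known to satisfy the corresponding subformulas.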

CTL formulas can be checked by a
straightforward extension of the technique
described above for invariant checking. One
approach is to compute the set of states
in the design satisfying subformulas of $\varphi$,
starting from the subformulas at the bottom
of the parse tree for $\varphi$. A minor difference
between invariant checking and this approach
is that the latter relies on pre-image computation;
the pre-image of A is the set of
all states t for which there exists an input
i such that t transitions under i to a state
in A.

Symbolic analysis can also be used to check 
the equivalence of two designs by forming a new 
design which operates the two initial designs in 
parallel and has a single output that is set to 
1 if the two initial designs differ [14]. In prac- 
tice this approach is too inefficient to be useful, 
and techniques which rely more on identifying 
common substructures across designs are more 
successful. 

The complement of the set of reachable states 
can be used to identify parts of the design which 
are redundant and to propagate don’t care con- 
ditions from the input of the design to internal 
nodes [12]. 

Many of the ideas in SMC can be applied to 
software verification — the basic idea is to “fini- 
tize” the problem, e.g., by considering integers 
to lie in a restricted range or setting an a priori 
bound on the size of arrays [7]. 


Experimental Results


Many enhancements have been made to the basic 
approach described above. For example, the BDD 
for the entire transition relation can grow large, 
so partitioned transition relations [11] are used 
instead; these are based on the observation that
$\exists x.(f \cdot g) = f \cdot \exists x.g$ in the special case that f
is independent of x. Another optimization is the
use of don’t cares; for example, when computing
the image of A, the BDD for $f_T$ can be simplified
with respect to transitions originating in
$A'$, the complement of A [13]. Techniques based on SAT have enjoyed
great success recently. These approaches cast the
verification problem in terms of satisfiability of a
CNF formula. They tend to be used for bounded
checks, i.e., determining that a given invariant
holds on all input sequences of length k [1].
Approaches based on transformation-based ver- 
ification complement symbolic model checking 
by simplifying the design prior to verification. 
These simplifications typically remove complex- 
ity that was added for performance rather than 
functionality, e.g., pipeline registers. 
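
The observation behind partitioned transition relations, $\exists x.(f \cdot g) = f \cdot \exists x.g$ when f is independent of x, follows from the cofactor form of existential quantification, $(\exists x) h = h_x + h_{x'}$. The short derivation below is added for illustration; it assumes only that f does not depend on x, so that $f_x = f_{x'} = f$.

```latex
\begin{align*}
\exists x.(f \cdot g)
  &= (f \cdot g)_x + (f \cdot g)_{x'} \\
  &= f_x \cdot g_x + f_{x'} \cdot g_{x'} \\
  &= f \cdot g_x + f \cdot g_{x'}
     && \text{since } f_x = f_{x'} = f \\
  &= f \cdot (g_x + g_{x'}) \\
  &= f \cdot \exists x.g
\end{align*}
```

Applied to a transition relation written as a product of factors, this allows each variable to be quantified as soon as the remaining factors no longer mention it, so the BDD for the full product never needs to be built.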

The original paper by Burch et al. [3] reported
results on a toy example, which could
be described in a few dozen lines of a high-level
language. Currently, the most sophisticated
model checking tool for which published results
are available is SixthSense, developed at IBM
[10].

A large number of papers have been published 
on applying SMC to academic and industrial 
designs. Many report success on designs with 
an astronomical number of states — these results 
become less impressive when taking into consideration
the fact that a design with n registers has
$2^n$ states.

It is very difficult to define the complexity of a 
design. One measure is the number of registers in 
the design. Realistically, a hundred registers is at 
the limit of design complexity that can be handled 
using symbolic model checking. There are cases 
of designs with many more registers that have 
been successfully verified with symbolic model 
checking, but these registers are invariably part 
of a very regular structure, such as a memory 
array. 


Data Sets


The SMV system described in [9] has been updated,
and its latest incarnation nuSMV
(http://nusmv.irst.itc.it/) includes a number of examples.

The VIS (http://embedded.eecs.berkeley.edu/pubs/downloads/vis)
system from UC Berkeley
and UC Boulder also includes a large collection
of verification problems, ranging from simple
hardware circuits to complex multiprocessor
cache systems.

The SIS (http://embedded.eecs.berkeley.edu/pubs/downloads/sis/)
system from UC Berkeley
is used for logic synthesis. It comes with a num- 
ber of sequential circuits that have been used for 
benchmarking symbolic reachability analysis. 


Cross-References 


Binary Decision Graph 


Recommended Reading 


1. Biere A, Cimatti A, Clarke E, Fujita M, Zhu Y
(1999) Symbolic model checking using SAT procedures
instead of BDDs. In: ACM design automation
conference, New Orleans

2. Bryant R (1986) Graph-based algorithms for Boolean 
function manipulation. IEEE Trans Comput C- 
35:677-691 

3. Burch JR, Clarke EM, McMillan KL, Dill DL (1992)
Symbolic model checking: $10^{20}$ states and beyond.
Inf Comput 98(2):142-170

4. Cormen TH, Leiserson CE, Rivest RL, Stein C
(2001) Introduction to algorithms. MIT, Cambridge

5. Emerson EA (1990) Temporal and modal logic. In: 
van Leeuwen J (ed) Formal models and semantics. 
Volume B of handbook of theoretical computer sci- 
ence. Elsevier Science, Amsterdam, pp 996-1072 

6. Gupta A (1993) Formal hardware verification methods:
a survey. Form Method Syst Des 1:151-238

7. Jackson D (2006) Software abstractions: logic, lan- 
guage, and analysis. MIT, Cambridge 

8. Katz R (1993) Contemporary logic design. Ben- 
jamin/Cummings Publishing Company, Redwood 
City 

9. McMillan KL (1993) Symbolic model checking. 
Kluwer Academic, Boston 

10. Mony H, Baumgartner J, Paruthi V, Kanzelman R, 
Kuehlmann A (2004) Scalable automated verification 
via expert-system guided transformations. In: Formal 
methods in CAD, Austin 


11. Ranjan R, Aziz A, Brayton R, Plessier B, Pixley C 
(1995) Efficient BDD algorithms for FSM synthe- 
sis and verification. In: Proceedings of the interna- 
tional workshop on logic synthesis, Tahoe City, May 
1995 

12. Savoj H (1992) Don’t cares in multi-level network 
optimization. Ph.D. thesis, Electronics Research Lab- 
oratory, College of Engineering, University of Cali- 
fornia, Berkeley 

13. Shiple TR, Hojati R, Sangiovanni-Vincentelli AL, 
Brayton RK (1994) Heuristic minimization of BDDs 
using don’t cares. In: ACM design automation con- 
ference, San Diego, June 1994 

14. Touati H, Savoj H, Lin B, Brayton RK, Sangiovanni- 
Vincentelli AL (1990) Implicit state enumeration of 
finite state machines using BDDs. In: IEEE interna- 
tional conference on computer-aided design, Santa 
Clara, pp 130-133, Nov 1990 

15. Wile B, Goss J, Roesner W (2005) Comprehensive 
functional verification. Morgan-Kaufmann 


Symmetric Graph Drawing 


Seokhee Hong 
School of Information Technologies, University 
of Sydney, Sydney, NSW, Australia 


Keywords 


Graph automorphism; Graph drawing; Planar 
graph; Symmetry 


Years and Authors of Summarized 
Original Work 


2006; Hong, McKay and Eades 


Problem Definition 


Symmetry is one of the most important aesthetic
criteria in graph drawing: a symmetric drawing clearly
reveals the structure and properties of a graph.
Many graph drawings in graph theory textbooks are
symmetric.

A symmetry of a drawing D of a graph G
induces an automorphism $\varphi$ of the graph G,
a permutation of the vertex set that preserves
adjacency. If an automorphism $\varphi$ can be displayed
as a symmetry in a drawing of the graph
G, then it is called a geometric automorphism [6].
A geometric automorphism $\varphi$ of a planar graph
G is a planar automorphism, if there is a planar
drawing of G which displays $\varphi$. Note that not
every automorphism is geometric, and not every
geometric automorphism is planar.
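
The definition of an automorphism can be checked mechanically. The sketch below is an illustration with assumed names (`is_automorphism`, the 4-cycle `C4`); it tests only whether a permutation preserves adjacency, not whether it is geometric or planar.

```python
def is_automorphism(vertices, edges, perm):
    """True if perm (a dict vertex -> vertex) is a bijection that preserves adjacency."""
    if sorted(perm) != sorted(vertices) or sorted(perm.values()) != sorted(vertices):
        return False
    E = {frozenset(e) for e in edges}
    # A bijection on the vertices is an automorphism iff it maps the edge set onto itself.
    return {frozenset((perm[u], perm[v])) for u, v in edges} == E

# Hypothetical example: the 4-cycle 0-1-2-3-0.
C4_VERTICES = [0, 1, 2, 3]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
rotation = {i: (i + 1) % 4 for i in range(4)}
swap01 = {0: 1, 1: 0, 2: 2, 3: 3}
```

The rotation of the cycle is an automorphism, whereas exchanging two adjacent vertices while fixing the others is not, because the edge (1, 2) would be mapped to the non-edge (0, 2).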


In general, algorithms for constructing 
symmetric drawings of graphs have two 
steps: 


1. Symmetry finding step: Find the geometric 
automorphisms of a graph 

2. Symmetry drawing step: Draw the graph dis- 
playing these automorphisms as symmetries. 


Note that the first step is more difficult than
the second step. For example, finding the automorphisms
of a graph is isomorphism-hard, and
finding the geometric automorphisms of a graph is
NP-hard in general [18]. For planar graphs, isomorphism
(and therefore automorphism)
can be computed in linear time [7, 17].
However, finding the plane embedding of a
planar graph that displays the maximum number
of symmetries
is challenging, because a planar graph can have
an exponential number of possible plane embeddings.

Furthermore, the product of two geometric 
automorphisms is not necessarily geometric, be- 
cause they may be displayed by different draw- 
ings. A subgroup A of the automorphism group 
of a graph is a geometric automorphism group, if 
there is a single drawing of the graph that displays 
every element of A. Therefore, to construct a
maximally symmetric drawing of a graph, one
needs to compute a maximum size geometric
automorphism group for the graph. Thus,
the main research problem for Symmetric Graph
Drawing can be defined as below.


Symmetric Graph Drawing Problem 

Input: A graph G. 

Output: A maximum size geometric automorphism
group A of G, and a symmetric drawing D of G that
displays all elements of A.


Key Results


There are two types of symmetry in two- 
dimensional drawings: rotational symmetry 
(i.e., a rotation about a point) and axial (or 
reflectional) symmetry (i.e., a reflection about 
an axis). The order of an automorphism $\alpha$ is the
smallest positive integer k such that $\alpha^k$ equals
the identity $I$. A group-theoretic characterization
of geometric automorphism groups was given by
Eades and Lin [6] as follows; such a group is one of:


• A group of order 2 generated by an axial
automorphism;

• A cyclic group of order k generated by a
rotational automorphism;

• A dihedral group of order 2k generated by
a rotational automorphism of order k and an
axial automorphism. In this case there are k
axial symmetries.
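
The three cases can be explored concretely with permutations. The sketch below is illustrative only (the helpers `compose`, `generate`, `order`, and the pentagon example are assumptions): it generates the group from a rotation and a reflection of a regular k-gon and counts the elements.

```python
def compose(p, q):
    """Permutation product: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Close a set of generators under multiplication (finite groups only)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def order(g):
    """Smallest positive k such that g^k is the identity."""
    identity, h, k = tuple(range(len(g))), g, 1
    while h != identity:
        h, k = compose(h, g), k + 1
    return k

# Hypothetical example: symmetries of a regular pentagon with vertices 0..4.
K = 5
rot = tuple((i + 1) % K for i in range(K))    # rotation by one step
ref = tuple((-i) % K for i in range(K))       # reflection fixing vertex 0
cyclic = generate([rot])
dihedral = generate([rot, ref])
axial = dihedral - cyclic
```

For k = 5 the rotation alone generates a cyclic group of order 5, and adding the reflection yields a dihedral group of order 10 whose 5 non-rotational elements all have order 2.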


In two dimensions, the problem of determining
whether a given graph can be drawn symmetrically
is NP-complete in general [18]. Exact
algorithms have been devised based on a branch-and-cut
approach by Buchheim and Jünger [3] and a
group-theoretic approach by Abelson et al. [1].
Linear-time algorithms are available for trees and
outerplanar graphs by Manning and Atallah [19,
20] and for series-parallel digraphs by Hong et
al. [14]. Linear-time algorithms are presented for
maximally symmetric drawings of triconnected
planar graphs by Hong et al. [15] and for biconnected,
one-connected, and disconnected planar
graphs by Hong and Eades [10, 12, 13]. Hong
and Nagamochi presented a linear-time algorithm
for constructing symmetric convex drawings of
internally triconnected planar graphs [16]. For a
survey on symmetric drawings of graphs in two
dimensions, see [5].

In three dimensions, the problem of determining
whether a graph can be drawn symmetrically
is NP-hard in general [8].
A group-theoretic characterization of symmetric
drawings in n dimensions and exact algorithms
based on a group-theoretic approach are given
by Abelson et al. [1]. Linear-time algorithms
are available for trees by Hong and Eades [9],
series-parallel digraphs by Hong et al. [11], and
biconnected and one-connected planar graphs [8].

In this article, we review a linear-time
algorithm for constructing maximally symmetric
straight-line drawings of triconnected planar
graphs by Hong, McKay, and Eades [15].
The following theorem summarizes their main
results.


Theorem 1 There is a linear-time algorithm that
constructs maximally symmetric planar
straight-line drawings of triconnected planar
graphs.


Computing a Planar Automorphism Group of Maximum Size

We first review the first step of the algorithm, i.e., 
symmetry finding step for triconnected planar 
graphs [15]. A geometric automorphism group A 
of a graph G is a planar automorphism group, 
if there is a planar drawing of the graph that 
displays every element of A. 

Suppose that A is a group acting on a set X.
The stabilizer of $x \in X$, denoted by $\mathrm{stab}_A(x)$,
is $\{g \in A \mid g(x) = x\}$, and the orbit of x,
denoted by $\mathrm{orbit}_A(x)$, is $\{g(x) \mid g \in A\}$. We
say that $g \in A$ fixes $x \in X$ if $g(x) = x$; if g fixes
x for every $g \in A$, then A fixes x. If $X' \subseteq X$
and $g(x') \in X'$ for all $x' \in X'$, then g fixes $X'$.
Automorphisms $g_1, g_2, \ldots, g_k$ are called generators
of $\langle g_1, g_2, \ldots, g_k \rangle$; the group consists of all
permutations formed from products of elements
of $\{g_1, g_2, \ldots, g_k\}$.

Hong et al. [15] characterize planar automor- 
phisms as below. 


Lemma 1 Let G be a triconnected planar graph.
An automorphism of G is a planar automorphism
if and only if it fixes a face of G.


To find the best plane embedding to compute 
a planar automorphism group with a maximum 
size, the algorithm uses the Stabilizer-Orbit theo- 
rem in group theory [2]. 


Theorem 2 (Stabilizer-Orbit theorem) Suppose
that A is a group acting on a set X and let
$x \in X$. Then $|A| = |\mathrm{orbit}_A(x)| \times |\mathrm{stab}_A(x)|$.
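
The theorem is easy to verify numerically on a small permutation group. In the sketch below the group is the dihedral group of a k-gon acting on its vertices; the function names `dihedral`, `orbit`, and `stabilizer` are assumptions made for the illustration.

```python
def dihedral(k):
    """All 2k symmetries of a regular k-gon as permutations of its vertices 0..k-1."""
    rotations = [tuple((i + j) % k for i in range(k)) for j in range(k)]
    reflections = [tuple((j - i) % k for i in range(k)) for j in range(k)]
    return rotations + reflections

def orbit(group, x):
    return {g[x] for g in group}

def stabilizer(group, x):
    return [g for g in group if g[x] == x]

D4 = dihedral(4)
```

For the square, every vertex has an orbit of size 4 and a stabilizer of size 2 (the identity and the reflection through that vertex), giving 4 × 2 = 8 = |A|.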


The overall algorithm computing a maximum
size planar automorphism group of a triconnected
planar graph can be described as follows:


Algorithm Compute_Max_PAG


1. Find a plane embedding which has a maxi- 
mum size planar automorphism group. 

2. Perform “star triangulation” for the given em- 
bedding. 

3. Compute the generators of the planar auto- 
morphism group of the new embedding. 


The first step of Compute_Max_PAG uses
two applications of an algorithm of Fontet [7],
which computes the orbits of the (full)
automorphism group of a triconnected planar
graph on its vertices in linear time.


Theorem 3 Fontet’s algorithm [7] can be used
to find, in linear time, a plane embedding of a
triconnected planar graph G whose planar
automorphism group has maximum size.

Proof Based on Lemma 1, we take the dual graph
$G^*$ of G and compute the orbits of the vertices
of $G^*$ (that is, of the faces of G) using
Fontet’s algorithm [7]. Choose an orbit O of
minimum size; by Theorem 2, the stabilizer of a
face in O has the maximum size. Taking a face
$f \in O$ as the outer face of the plane embedding of G, we
have an embedding that displays the maximum
number of symmetries.
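
The selection rule in the proof can be sketched as follows. This is not Fontet's algorithm: the face orbits and the group order are supplied by hand, and the function `best_outer_face` and the triangular prism example (two triangular faces, three quadrilateral faces, full automorphism group of order 12) are assumptions used only to show how Theorem 2 drives the choice.

```python
def best_outer_face(face_orbits, group_order):
    """Pick a face from a smallest orbit; its stabilizer has order |A| / |orbit|."""
    smallest = min(face_orbits, key=len)
    face = sorted(smallest)[0]
    return face, group_order // len(smallest)

# Hypothetical input: face orbits of the triangular prism under its automorphism group.
PRISM_FACE_ORBITS = [{"T1", "T2"}, {"Q1", "Q2", "Q3"}]
PRISM_GROUP_ORDER = 12
```

With a triangle as outer face the stabilizer has order 12 / 2 = 6, compared with 12 / 3 = 4 for a quadrilateral, so a triangular face is chosen.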


Once the outer face and thus the plane
embedding is chosen, the second step of
Compute_Max_PAG performs star triangulation,
i.e., it triangulates each internal face f
by inserting a new vertex v in the face and
joining v to each vertex of f. Clearly, this
step takes linear time and simplifies the drawing
algorithm.
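
Star triangulation is simple to express once the internal faces are available as vertex lists. In the sketch below the face lists are assumed to be given (for example, from a planar embedding routine), and the function name `star_triangulate` and the naming of the new vertices as `("star", k)` are hypothetical.

```python
def star_triangulate(edges, internal_faces):
    """Insert one new vertex per internal face and join it to every vertex of that face."""
    new_edges = {frozenset(e) for e in edges}
    centers = {}
    for k, face in enumerate(internal_faces):
        c = ("star", k)                      # hypothetical name for the inserted vertex
        centers[c] = tuple(face)
        new_edges.update(frozenset((c, v)) for v in face)
    return new_edges, centers

# Hypothetical example: a 4-cycle whose single internal face is the whole cycle.
SQUARE = [(0, 1), (1, 2), (2, 3), (3, 0)]
tri_edges, tri_centers = star_triangulate(SQUARE, [[0, 1, 2, 3]])
```

The work done is proportional to the total length of the face lists, which is linear in the size of a planar graph.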

The final step of Compute_Max_PAG is to
compute the planar automorphism group of the
star-triangulated plane graph. Since an explicit
representation of the planar automorphism group may
take more than linear space, for a more compact
representation, an algorithm for computing
minimal generators was devised. For details on
a linear-time algorithm for computing generators
of a planar automorphism group, see [15].

Symmetric Graph Drawing, Fig. 1 Example of (a) a wedge and (b) merging step


Overview of the Drawing Algorithm 

We now review a linear-time drawing algorithm
for constructing a straight-line drawing of a
triconnected planar graph that displays the maximum
number of symmetries. The main characteristic
of symmetric drawings is the repetition of
congruent drawings of isomorphic subgraphs. To 
exploit this property, the drawing algorithm uses 
a divide and conquer approach: (i) divide the 
graph into isomorphic subgraphs; (ii) compute a 
drawing for a subgraph; and (iii) merge multiple 
copies of drawings of subgraphs to construct a 
symmetric drawing of the whole graph. Overall, 
each step of the drawing algorithm runs in linear 
time. 

The input of the drawing algorithm is a
triconnected planar graph with a fixed plane
embedding and a specified outer face, chosen to
maximize the number of symmetries. The symmetric
drawing algorithm takes a different approach 
for each type of planar automorphism group: 
i.e., cyclic case, one axial case, and dihedral 
case. 


The Cyclic Case 
Here we describe how to display k rotational
symmetries. Note that after star triangulation,
there is a central vertex c, which is fixed by the
planar automorphism group for $k \geq 3$. If k = 2,
there exists either a central vertex or a central edge.
If there is a central edge, then we preprocess the
graph by inserting a dummy central vertex c into
the central edge with two dummy edges.

The rotational symmetric drawing algorithm 
consists of three steps: 

Algorithm Cyclic


1. Find_Wedge_Cyclic.
2. Draw_Wedge_Cyclic.
3. Merge_Wedges_Cyclic.


The first step is to find a subgraph wedge W,
which takes linear time:


Algorithm Find_Wedge_Cyclic

1. Find the central vertex c.

2. Find a shortest path $P_1$ from c to a vertex $v_1$
on the outer face, using breadth first search.

3. Find the path $P_2$ which is a mapping of $P_1$
under a minimal generator of the rotation.

4. Find the wedge W (see Fig. 1a), an induced
subgraph of G enclosed by the cycle formed
from $P_1$, $P_2$, and a path $P_0$ along the outer face
from $v_1$ to $v_2$.
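
Steps 2 and 3, together with the outer path of step 4, can be sketched as below. The adjacency-list encoding, the function names `shortest_path` and `outer_path`, and the wheel graph with a rotation of order 3 are all assumptions for illustration; extracting the induced subgraph enclosed by the cycle is omitted.

```python
from collections import deque

def shortest_path(adj, source, targets):
    """Breadth first search from source to the nearest vertex in targets."""
    parent, queue = {source: None}, deque([source])
    while queue:
        u = queue.popleft()
        if u in targets:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None

def outer_path(outer, v1, v2):
    """Walk along the outer face (listed in rotation direction) from v1 to v2."""
    i, path = outer.index(v1), [v1]
    while path[-1] != v2:
        i = (i + 1) % len(outer)
        path.append(outer[i])
    return path

# Hypothetical example: wheel with centre "c" and rim 0..5; rotation of order k = 3.
OUTER = list(range(6))
ADJ = {"c": list(range(6))}
ADJ.update({i: ["c", (i - 1) % 6, (i + 1) % 6] for i in range(6)})
RHO = {"c": "c", **{i: (i + 2) % 6 for i in range(6)}}

P1 = shortest_path(ADJ, "c", set(OUTER))
P2 = [RHO[v] for v in P1]               # image of P1 under the rotation
P0 = outer_path(OUTER, P1[-1], P2[-1])
```

In this example the wedge is bounded by the two spokes from the centre to vertices 0 and 2 and by the rim path 0, 1, 2.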


The second step, Draw_Wedge_Cyclic,
constructs a drawing D of the wedge W using
Algorithm CYN, the linear-time convex drawing
algorithm by Chiba et al. [4], such that $P_1$, $P_2$,
and $P_0$ are drawn as straight lines. The input
to Algorithm CYN is an internally triconnected
plane graph G with given outer face S and
a straight-line drawing $S^*$ of S as a weakly
convex polygon, i.e., not every vertex of the outer
face needs to be at an apex (i.e., the interior
angle is less than $\pi$) of the polygon. Algorithm
CYN chooses a vertex v and deletes it from G
together with incident edges and divides the
resulting graph $G' = G - v$ into the biconnected
components $B_1, B_2, \ldots, B_p$, $p \geq 1$. It defines a
convex polygon $S_i^*$ of the outer facial cycle $S_i$ of
each $B_i$ and recursively applies the algorithm to
draw $B_i$ with $S_i^*$ as outer boundary. For details,
see [4].

Symmetric Graph Drawing, Fig. 2 Example of (a) a fixed string of diamonds and (b) $\omega_\ell$

The last step, Merge_Wedges_Cyclic,
constructs a drawing of the whole graph G by
replicating the drawing D of W, k times. Note
that this merge step relies on the fact that $P_1$ and
$P_2$ are drawn as straight lines. See Fig. 1b.
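
The replication amounts to rotating the wedge drawing about the central vertex by multiples of 2π/k. The sketch below assumes the central vertex sits at the origin and labels each copy by its index; the function name `replicate` is hypothetical, and the identification of the boundary vertices shared by adjacent copies is omitted.

```python
import math

def replicate(wedge_pos, k):
    """Rotate a wedge drawing {vertex: (x, y)} about the origin k times."""
    drawing = {}
    for j in range(k):
        angle = 2 * math.pi * j / k
        c, s = math.cos(angle), math.sin(angle)
        for v, (x, y) in wedge_pos.items():
            drawing[(v, j)] = (c * x - s * y, s * x + c * y)
    return drawing

copies = replicate({"a": (1.0, 0.0)}, 4)
```

Because the two boundary paths are straight segments from the centre, the rotated copy of one boundary lands exactly on the other boundary of the neighbouring wedge.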

It is clear that Algorithm Cyclic constructs,
in linear time, a straight-line drawing of a
triconnected plane graph which displays k
rotational symmetries.


One Axial Symmetry 

Consider a drawing of a star-triangulated plane 
graph with one axial symmetry. There are fixed 
vertices, edges, and/or fixed faces on the axis; 
we need to characterize the subgraph formed by 
these. 

A diamond is either a triangle or the 4-vertex
graph. A string of diamonds is a graph formed
from a path $P = (v_1, v_2, \ldots, v_k)$, $k \geq 2$, by
a number (zero or greater) of “splitting” operations,
as follows. If $1 < i < k - 1$, then the
edge $(v_i, v_{i+1})$ may be replaced by a diamond.
Alternatively, each of the end edges $(v_1, v_2)$ and
$(v_{k-1}, v_k)$ may be replaced by a triangle. Note
that a string of diamonds is basically a path
consisting of edges and diamonds; each end of
the path may be a triangle; see Fig. 2a.

To display a single axial symmetry, we need
two steps. First we identify the fixed string of
diamonds; then use Algorithm Symmetric_CYN,
a modified version of Algorithm CYN. More formally,
the algorithm One_Axial is described
below.


Algorithm One_Axial


1. Find a fixed string of diamonds. Suppose that
$\omega_1, \omega_2, \ldots, \omega_k$ are the fixed edges and
vertices in the fixed string of diamonds, in order
from the outer face ($\omega_1$ is on the outer face).
For each $\ell$, $\omega_\ell$ may be a vertex or an edge (see
Fig. 2b).

2. Choose a symmetric convex polygon $S^*$ for
the outer face S of G.

3. Symmetric_CYN(1, $S^*$, G, $y_1$).


The main ingredient in Algorithm One_Axial
is Algorithm Symmetric_CYN. To modify
Algorithm CYN to display a single axial symmetry,
the following three conditions should be
satisfied:


• Choose the first vertex or edge $\omega_1$ on the fixed
string of diamonds (see Fig. 3).

• Let $D(B_i)$ be the drawing of $B_i$ and $\alpha$ be
the axial symmetry. Then, $D(B_j)$ should be a
reflection of $D(B_i)$, where $B_j = \alpha(B_i)$,
$i = 1, 2, \ldots, m$ and $m = \lfloor p/2 \rfloor$: To satisfy this
condition, define $S_j^*$ to be the reflection of $S_i^*$,
$i = 1, 2, \ldots, m$. Then we apply Algorithm
CYN for $B_i$, $i = 1, 2, \ldots, m$ and construct
$D(B_j)$ using a reflection of $D(B_i)$.

• If p is odd, then $D(B_{m+1})$ should display
axial symmetry: To satisfy this condition,
we recursively apply Algorithm
Symmetric_CYN to $B_{m+1}$.

Symmetric Graph Drawing, Fig. 3 Example of a symmetric version of CYN


Note that the position of $\omega_2$ in Fig. 3 can
be chosen arbitrarily along the axis of symmetry
of $S^*$ within $S^*$. This means that we can
specify the positions of the fixed vertices and
middle edges along the axis of symmetry a priori,
that is, as input to the algorithm. Algorithm
Symmetric_CYN can be described as
below:


Algorithm Symmetric_CYN


input: $\ell$: index of vertex or middle edge on the
fixed string of diamonds.

input: $S^*$: a weakly convex polygon of the
outer face S of G.

input: G: a triangulated planar graph.

input: $y_\ell$: a position on the axis of symmetry
for the fixed vertex or the fixed edge $\omega_\ell$.


1. Delete $\omega_\ell$ from G together with edges incident
to $\omega_\ell$. Divide the resulting graph $G' = G - \omega_\ell$
into the blocks $B_1, B_2, \ldots, B_p$, $p \geq 1$,
ordered anticlockwise around the outer face.
Let $m = \lfloor p/2 \rfloor$.

2. Determine a convex polygon $S_i^*$ of the outer
facial cycle $S_i$ of each $B_i$ such that $B_i$ with
$S_i^*$ satisfy the conditions for convex drawing
algorithm CYN and $S_{p-i+1}^*$ is a reflection of
$S_i^*$.

3. For each i = 1 to m,

(a) Construct a drawing $D(B_i)$ of $B_i$ using
Algorithm CYN.

(b) Construct $D(B_{p-i+1})$ as a reflection of
$D(B_i)$.


4. If p is odd, then construct a drawing
$D(B_{m+1})$ using Symmetric_CYN($\ell + 1$,
$S_{m+1}^*$, $B_{m+1}$, $y_{\ell+1}$).

5. Merge the $D(B_i)$ to form a drawing of G,
placing $\omega_\ell$ at $y_\ell$.
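
The index bookkeeping of steps 1, 3, and 4 can be made concrete with a few lines of code. The helper `pair_blocks` is hypothetical; it only shows which blocks are drawn by CYN, which are obtained by reflection, and which block (if any) is handled by the recursive call.

```python
def pair_blocks(p):
    """For blocks B_1..B_p, return the mirrored pairs (i, p - i + 1) and the middle index."""
    m = p // 2                                   # m = floor(p / 2)
    pairs = [(i, p - i + 1) for i in range(1, m + 1)]
    middle = m + 1 if p % 2 == 1 else None       # exists only when p is odd
    return pairs, middle
```

For p = 5, blocks 1 and 2 are drawn with CYN, blocks 5 and 4 are their reflections, and block 3 lies on the axis and is drawn recursively.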


Since Algorithm CYN [4] runs in linear time,
clearly Algorithm Symmetric_CYN and Algorithm
One_Axial take linear time.


The Dihedral Case 

We now review an algorithm for displaying a
dihedral group $\langle \rho, \alpha \rangle$, where $\rho$ is a rotation of
order k and $\alpha$ is an axial automorphism. As with
the cyclic case, we assume that there is a central
vertex.

The drawing algorithm adopts the same strat- 
egy as for the cyclic case: (i) divide the graph 
into “wedges”; (ii) draw each wedge; and (iii) 
merge the drawings of wedges to construct a 
symmetric drawing of the whole graph. However, 
the dihedral case is more difficult than the cyclic 
case, because an axial symmetry in the dihedral 
group can have fixed faces as well as fixed edges; 
i.e., the boundary of a wedge may be a fixed string 
of diamonds as in the one axial case. To achieve 
dihedral symmetry, the axis of symmetry must be 
the perpendicular bisector of the middle edge of 
each diamond. This makes the merging operation 
more difficult. 

Consider a drawing of a triconnected planar
graph with a dihedral symmetry group of size 2k.
There are k axial symmetries, with axes at angles
of $\pi i/k$, $0 \leq i \leq k-1$, to the x axis, as in
Fig. 4a. Roughly speaking, a wedge is the area
between two adjacent axes, as in Fig. 4b. Note
that in these wedges, the boundaries $P_1$ and $P_2$
may be strings of diamonds. These may terminate
in a triangle.

Symmetric Graph Drawing, Fig. 4 Wedge for the dihedral case

As with the cyclic case, Algorithm Dihedral 
has three steps: (i) Find_Wedge_Dihedral, 
(ii) Draw_Wedge_Dihedral, and (iii) 
Merge_Wedges_Dihedral. 

The first step is to define the “wedge” subgraph 
by finding two fixed strings of diamonds. 
Note that one can find the central vertex c and the 
two fixed strings of diamonds P_1 and P_2 in linear 
time using the generators of the group. 


Algorithm Find_Wedge_Dihedral 


1. Find the central vertex c. 

2. Find a string of diamonds P_1 that is fixed by 
α, from c to a vertex v or an edge e on the 
outer face. 

3. Traverse the outer face, clockwise from v (or 
e) to the vertex v’ or edge e’ that is fixed by 
ρ^{−1}αρ. Let P_0 denote the path so traversed. 

4. Find the string of diamonds P_2 for ρ^{−1}αρ, 
from c to v’ (or e’). 

5. Define the wedge W to be the subgraph 
enclosed by P_0, P_1, and P_2, including the 
vertices and edges of P_0, P_1, and P_2. 


The second step, Draw_Wedge_Dihedral, 
constructs a drawing of the wedge; it is the 
most complicated step of the drawing algorithm. 
This step must ensure that the middle edge of 
each diamond on the boundary is orthogonal to 
the axis of reflection. 

Roughly speaking, the algorithm 
Draw_Wedge_Dihedral runs as follows: (i) find 
all special diamonds of P_1 and P_2 that share 
fixed vertices or fixed edges, and draw them first 
using algorithm Draw_Special_Diamonds; 
(ii) choose the positions of all the fixed vertices 
of P_1 and P_2 that have not been drawn 
so far; (iii) subdivide the wedge in various 
ways to form “subwedges”; (iv) draw each of 
these subwedges using Algorithms CYN and 
Symmetric_CYN accordingly. For details, 
see [15]. 

The final step, Algorithm 
Merge_Wedges_Dihedral, simply constructs a drawing for 
the whole graph by merging the drawing D of 
the wedge W. Clearly each step of Algorithm 
Dihedral takes linear time. 
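
The geometric part of the merging step can be illustrated with a minimal sketch. It assumes the wedge drawing is given simply as a list of (x, y) points lying between the axes at angles 0 and π/k, and it only replicates those points under the 2k elements of the dihedral group; the function names are hypothetical, and the handling of shared boundary vertices and edges in the actual algorithm of [15] is not modeled.

```python
import math


def reflect(point, angle):
    """Reflect a point across the line through the origin at the given angle."""
    x, y = point
    c, s = math.cos(2 * angle), math.sin(2 * angle)
    return (c * x + s * y, s * x - c * y)


def rotate(point, angle):
    """Rotate a point anticlockwise about the origin by the given angle."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def merge_wedge_copies(wedge_points, k):
    """Return the 2k images of a wedge drawing under the dihedral group of size 2k.

    Assumption: the wedge lies between the axes at angles 0 and pi/k.
    """
    copies = []
    for i in range(k):
        turn = 2 * math.pi * i / k
        # The wedge itself, rotated by i * (2*pi/k).
        copies.append([rotate(p, turn) for p in wedge_points])
        # Its mirror image across the x axis, rotated by the same angle.
        copies.append([rotate(reflect(p, 0.0), turn) for p in wedge_points])
    return copies
```

For k = 3, a point in the interior of the wedge yields six distinct images at the same distance from the central vertex, while a point on a wedge boundary coincides with its own mirror image.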
Problem Definition 


Consider a communication network, modeled by 
an n-vertex undirected unweighted graph G = 
(V, E), for some positive integer n. Each vertex 
of G hosts a processor of unlimited computa- 
tional power; the vertices have unique identity 
numbers, and they communicate via the edges of 
G by sending messages of size O(log n) each. 

In the synchronous setting, the communica- 
tion occurs in discrete rounds, and a message 
sent in the beginning of a round R arrives at 
its destination before the round R ends. In the 
asynchronous setting, each vertex maintains its 
own clock, and clocks of distinct vertices may 
disagree. It is assumed that each message sent (in 
the asynchronous setting) arrives at its destination 
within a certain time t after it was sent, but the 
value of t is not known to the processors. 

It is generally much easier to devise 
algorithms that apply to the synchronous setting 
(henceforth, synchronous algorithms) rather 
than to the asynchronous one (henceforth, 
asynchronous algorithms). In [1] Awerbuch 
initiated the study of simulation techniques 
that translate synchronous algorithms to 
asynchronous ones. These simulation techniques 
are called synchronizers. 
To devise the first synchronizers, Awerbuch [1] 
constructed a certain graph partition 
which is of interest in its own right. In particular, 
Peleg and Schäffer noticed [8] that this graph 
partition induces a subgraph with certain 
interesting properties. They called this subgraph 
a graph spanner. Formally, for a positive integer 
parameter k, a k-spanner of a graph G = (V, E) 
is a subgraph G’ = (V, H), H ⊆ E, such that for 
every edge e = (v, u) ∈ E, the distance between 
the vertices v and u in H, dist_H(v, u), is at 
most k. 
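
The definition can be checked directly on small instances. The following is a minimal sketch, assuming the graph is given as a vertex list plus undirected edge lists for E and H (with H ⊆ E taken on trust); the function names are illustrative and do not come from [1] or [8].

```python
from collections import deque


def bfs_distances(adj, source, limit):
    """Distances from source in an unweighted graph, explored up to depth limit."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == limit:
            continue  # do not explore beyond distance `limit`
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def is_k_spanner(vertices, edges, sub_edges, k):
    """True if every edge (v, u) of E has dist_H(v, u) <= k in the subgraph (V, H)."""
    adj_h = {v: set() for v in vertices}
    for v, u in sub_edges:
        adj_h[v].add(u)
        adj_h[u].add(v)
    reach = {}
    for v, u in edges:
        if v not in reach:
            reach[v] = bfs_distances(adj_h, v, k)
        if u not in reach[v]:
            return False
    return True
```

For example, dropping one edge of a 4-cycle leaves a path that is a 3-spanner but not a 2-spanner, since the endpoints of the dropped edge are at distance 3 in the path.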


Key Results 


Awerbuch devised three basic synchronizers, 
called α, β, and γ. The synchronizer α is the 
simplest one; using it results in only a constant 
overhead in time, but in a very significant 
overhead in communication. Specifically, the 
latter overhead is linear in the number of edges of 
the underlying network. Unlike the synchronizer 
α, the synchronizer β requires a somewhat costly 
initialization stage. In addition, using it results in 
a significant time overhead (linear in the number 
of vertices n), but it is more communication 
efficient than α. Specifically, its communication 
overhead is linear in n. 

Finally, the synchronizer γ represents a trade-off 
between the synchronizers α and β. Specifically, 
this synchronizer is parametrized by a 
positive integer parameter k. When k is small, 
the synchronizer behaves similarly to the 
synchronizer α, and when k is large, it behaves 
similarly to the synchronizer β. A particularly 
important choice of k is k = log n. At this point 
on the trade-off curve, the synchronizer γ has a 
time overhead logarithmic in n and a 
communication overhead linear in n. The synchronizer γ 
has, however, a quite costly initialization stage. 

The main result of [1] concerning spanners 
is that for every k = 1, 2, ..., and every 
n-vertex unweighted undirected graph G = (V, E), 
there exists an O(k)-spanner with O(n^{1+1/k}) 
edges. (This result was explicated by Peleg and 
Schäffer [8].) 
Applications 


Synchronizers are extensively used for con- 
structing asynchronous algorithms. The first 
applications of synchronizers are constructing 
the breadth-first-search tree and computing 
the maximum flow. These applications were 
presented and analyzed by Awerbuch in [1]. Later, 
synchronizers were used for maximum matching 
[10], for computing shortest paths [7], and for 
other problems. 

Graph spanners were found useful for a vari- 
ety of applications in distributed computing. In 
particular, some constructions of synchronizers 
employ graph spanners [1,9]. In addition, span- 
ners were used for routing [4] and for computing 
almost shortest paths in graphs [5]. 


Open Problems 


Synchronizers with improved properties were de- 
vised by Awerbuch and Peleg [3] and Awerbuch 
et al. [2]. Both these synchronizers have poly- 
logarithmic time and communication overheads. 
However, the synchronizers of Awerbuch and 
Peleg [3] require a large initialization time. (The 
latter is at least linear in n.) On the other hand, 
the synchronizers of [2] are randomized. A major 
open problem is to obtain deterministic synchro- 
nizers with polylogarithmic time and communi- 
cation overheads and sublinear in n initialization 
time. In addition, the degrees of the logarithm 
in the polylogarithmic time and communication 
overheads in synchronizers of [2, 3] are quite 
large. Another important open problem is to con- 
struct synchronizers with improved parameters. 

In the area of spanners, spanners that distort 
large distances to a significantly smaller extent 
than they distort small distances were constructed 
by Elkin and Peleg in [6]. These spanners fall 
short of achieving a purely additive distortion. 
Constructing spanners with a purely additive dis- 
tortion is a major open problem. 
Problem Definition 


Table compression was introduced by Buchsbaum 
et al. [3] as a unique application of compression, 
based on several distinguishing characteristics. 
Tables are collections of fixed-length records and 
can grow to be terabytes in size. They are often 
generated by information systems and kept in 
data warehouses to facilitate ongoing operations. 
These data warehouses will typically manage 
many terabytes of data online, with significant 
capital and operational costs. In addition, the 
tables must be transmitted to different parts 
of an organization, incurring additional costs 
for transmission. Typical examples are tables 
of transaction activity, like phone calls and 
credit card usage, which are stored once but 
then shipped repeatedly to different parts of 
an organization: for fraud detection, billing, 
operations support, etc. The goals of table 
compression are to be fast, online, and effective: 
eventual compression ratios of 100:1 or better 
are desirable. Reductions in required storage and 
network bandwidth are obvious benefits. 

Tables are different from general databases 
[3]. Tables are written once and read many 
times, while databases are subject to dynamic 
updates. Fields in table records are fixed in 
length, and records tend to be homogeneous; 
database records often contain intermixed fixed- 
and variable-length fields. Finally, the goals 
of compression differ. Database compression 
stresses index preservation, the ability to retrieve 
an arbitrary record, under compression [7]. 
Tables are typically not indexed at the level of 
individual records; rather, they are scanned in 
toto by downstream applications. 

Consider each record in a table to be a row in 
a matrix. A naive method of table compression is 
to compress the string derived from scanning the 
table in row-major order. Buchsbaum et al. [3] 
observe experimentally that partitioning the 
table into contiguous intervals of columns and 
compressing each interval separately in this 
fashion can achieve significant compression 
improvement. The partition is generated by a 
one-time, off-line training procedure, and the 
resulting compression strategy is applied online 
to the table. In their application, tables are 
generated continuously, so off-line training time 
can be ignored. They also observe heuristically 
that certain rearrangements of the columns prior 
to partitioning further improve compression by 
grouping dependent columns more closely. For 
example, in a table of addresses and phone 
numbers, the area code can often be predicted 
by the zip code when both are defined geo- 
graphically. In information-theoretic terms, these 
dependencies are contexts, which can be used to 
predict parts of a table. Analogously to strings, 
where knowledge of context facilitates succinct 
codings of symbols, the existence of contexts in 
tables implies, in principle, the existence of a 
more succinct representation of the table. 

Three main avenues of research have fol- 
lowed, one based on the notion of combinatorial 
dependency [3, 4], another on the notion of 
column dependency [17, 18], and the third on 
the notion of motifs and templates [1]. The 
first formalizes dependencies analogously to 
the joint entropy of random variables, while 
the second does so analogously to conditional 
entropy [8]. The third finds inspiration in classic 
paradigms of data compression such as textual 
substitution [19, 20]. These approaches to table 
compression have deep connections to universal 
similarity metrics [12], based on Kolmogorov 
complexity and compression, and their later uses 
in classification [6]. The first two approaches are 
instances of a new emerging paradigm for data 
compression, referred to as boosting [9], where 
data are reorganized to improve the performance 
of a given compressor. A software platform to 
facilitate the investigation of such invertible data 
transformations is described by Vo [16]. 


Notations 

Let T be a table of n = |T| columns and m rows. 
Let T[i] denote the ith column of T. Given two 
tables T_1 and T_2, let T_1 T_2 be the table formed by 
their juxtaposition. That is, T = T_1 T_2 is defined 
so that T[i] = T_1[i] for 1 ≤ i ≤ |T_1| and T[i] = 
T_2[i − |T_1|] for |T_1| < i ≤ |T_1| + |T_2|. We use 
the shorthand T[i, j] to represent the projection 
T[i] ⋯ T[j] for any j ≥ i. Also, given a sequence 
P of column indices, we denote by T[P] the table 
obtained from T by projecting the columns with 
indices in P. 


Combinatorial Dependency and Joint 
Entropy of Random Variables 

Fix a compressor C: e.g., gzip, based on LZ77 
[19]; compress, based on LZ78 [20]; or bzip, 
based on Burrows-Wheeler [5]. Let H_C(T) be 
the size of the result of compressing table T 
as a string in row-major order using C. Let 
H_C(T_1, T_2) = H_C(T_1 T_2). H_C(·) is thus a cost 
function defined on the ordered power set of 
columns. Two tables T_1 and T_2, which might be 
projections of columns from a common table T, 
are combinatorially dependent if H_C(T_1, T_2) < 
H_C(T_1) + H_C(T_2), that is, if compressing them 
together is better than compressing them separately, 
and combinatorially independent otherwise. 
Buchsbaum et al. [3] show that combinatorial 
dependency is a compressive estimate of 
statistical dependency when formalized by the 
joint entropy of two random variables, i.e., 
the statistical relatedness of two objects is 
measured by the gain realized by compressing 
them together rather than separately. Indeed, 
combinatorial dependency becomes statistical 
dependency when H_C is replaced by the joint 
entropy function [8]. Analogous notions starting 
from Kolmogorov complexity are derived 
by Li et al. [12] and used for classification 
and clustering [6]. Figure 1 exemplifies why 
rearranging and partitioning columns may 
improve compression. 
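
The test for combinatorial dependency can be sketched as follows. This is an illustration only: it assumes that each column is a `bytes` object with one byte per row, and it uses Python's zlib (a DEFLATE implementation in the LZ77 family) as a stand-in for the compressor C; the function names are hypothetical.

```python
import zlib


def row_major(columns):
    """Serialize a table, given as a list of equal-length columns, row by row."""
    return b"".join(bytes(row) for row in zip(*columns))


def compressed_size(columns):
    """Stand-in for H_C(T): size of the zlib-compressed row-major string."""
    return len(zlib.compress(row_major(columns), 9))


def combinatorially_dependent(t1, t2):
    """True if H_C(T1 T2) < H_C(T1) + H_C(T2); list concatenation is juxtaposition."""
    return compressed_size(t1 + t2) < compressed_size(t1) + compressed_size(t2)
```

On short columns the fixed per-stream overhead of the compressor can dominate the comparison, so the outcome is only meaningful on tables with many rows.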
Table Compression, Fig. 1 The first three columns of 
the table, taken in row-major order, form a repetitive string 
that can be very easily compressed. Therefore, it may be 
advantageous to compress these columns separately. If the 
fifth column is swapped with the fourth, we get an even 
longer repetitive string that, again, can be compressed 
separately from the other two columns. 
Problem 1 Find a partition P of T into sets of 
contiguous columns that minimizes $\sum_{Y \in P} H_C(Y)$ 
over all such partitions. 


Problem 2 Find a partition P of T that minimizes 
$\sum_{Y \in P} H_C(Y)$ over all partitions. 


The difference between Problems 1 and 2 is 
that the latter does not require the parts of P to 
be sets of contiguous columns. 


Column Dependency and Conditional 
Entropy of Random Variables 


Definition 1 For any table T, a dependency 
relation is a pair (P, c) in which P is a sequence 
of distinct column indices (possibly empty) and 
c ∉ P is another column index. If the length of 
P is less than or equal to k, then (P, c) is called 
a k-relation. P is the predictor sequence and c is 
the predictee. 


Definition 2 Given a dependency relation 
(P, c), the dependency transform dt_P(c) of c 
is formed by permuting column T[c] based on 
the permutation induced by a stable sort of the 
rows of P. 


Definition 3 A collection D of dependency re- 
lations for table T is said to be a k-transform if 
and only if (a) each column of T appears exactly 
once as a predictee in some dependency relation 
(P,c), (b) the dependency hypergraph G(D) is 
acyclic, and (c) each dependency relation (P, c) 
is a k-relation. 
Let ω(P, c) be the cost of the dependency 
relation (P, c), and let δ(m) be an upper bound on the 
cost of computing ω(P, c). Intuitively, ω(P, c) 
gives an estimate of how well a rearrangement 
of column c will compress, using the rows of P 
as contexts for its symbols. We will provide an 
example after the formal definitions. 


Problem 3 Find a k-transform D of minimum 
cost $\omega(D) = \sum_{(P,c) \in D} \omega(P, c)$. 

Definition 1 extends to columns the notion of 
context that is well known for strings. Definition 3 
defines a microtransformation that reorganizes 
the column symbols by grouping together 
those that have similar contexts. The context of a 
column symbol is given by the corresponding row 
in T[P]. The fundamental ideas here are the same 
as in the Burrows and Wheeler transform [5]. 
Finally, Problem 3 asks for an optimal strategy to 
reorganize the data prior to compression. The cost 
function ω provides an estimate of how well c can 
be compressed using the knowledge of T[P]. 
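
The dependency transform of Definition 2 can be illustrated with a short sketch. It assumes the table is stored as a list of columns (each a list of m symbols) and reads "the rows of P" as the rows of T[P]; the function name and the sample data are hypothetical.

```python
def dependency_transform(table, predictor, c):
    """Permute column c by a stable sort of the rows of T[P].

    table     -- list of columns, each a list of m symbols
    predictor -- sequence P of column indices (possibly empty)
    c         -- index of the predictee column
    """
    m = len(table[c])
    # The context of row r is the tuple of predictor symbols in that row.
    contexts = [tuple(table[p][r] for p in predictor) for r in range(m)]
    # sorted() is stable, so rows with equal contexts keep their original order.
    order = sorted(range(m), key=lambda r: contexts[r])
    return [table[c][r] for r in order]


# Column 0 holds (shortened) zip codes, column 1 the matching area codes.
table = [
    ["07", "10", "07", "10", "07"],
    ["973", "212", "973", "212", "201"],
]
grouped = dependency_transform(table, [0], 1)
```

Here `grouped` lists the area codes of all rows with zip prefix "07" first, followed by those with prefix "10", so symbols that share a context become adjacent.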

Vo and Vo [18] connect these ideas to the 
conditional entropy of random variables. Let S 
be a sequence, A(S) its distinct elements, and 
f_a the frequency of each element a. The zeroth-order 
empirical entropy of S [15] is 

$$H_0(S) = -\frac{1}{|S|} \sum_{a \in A(S)} f_a \lg \frac{f_a}{|S|},$$

and the modified zeroth-order empirical entropy 
[15] is 

$$H_0^*(S) = \begin{cases} 0 & \text{if } |S| = 0, \\ (1 + \lg |S|)/|S| & \text{if } |S| \neq 0 \text{ and } H_0(S) = 0, \\ H_0(S) & \text{otherwise.} \end{cases}$$

For a dependency relation (P, c) with 
nonempty P, the modified conditional empirical 
entropy of c given P is then defined as 

$$H_P^*(c) = \frac{1}{m} \sum_{\rho \in A(T[P])} |\rho_c| \, H_0^*(\rho_c),$$

where ρ_c is the string formed by catenating the 
symbols in c corresponding to positions of ρ 
in T[P] [15]. A possible choice of ω(P, c) is 
given by H*_P(c). Vo and Vo also develop another 
notion of entropy, called run length entropy, to 
approximate more effectively the compressibility 
of low-entropy columns and define another cost 
function ω accordingly. 
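
These entropy definitions translate almost line by line into code. The sketch below assumes lg denotes the base-2 logarithm and that the table is stored as a list of columns; the function names are illustrative and not taken from [15] or [18].

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(f * math.log2(f / n) for f in Counter(s).values()) / n


def h0_star(s):
    """Modified zeroth-order empirical entropy."""
    n = len(s)
    if n == 0:
        return 0.0
    h = h0(s)
    if h == 0:
        return (1 + math.log2(n)) / n
    return h


def conditional_h_star(table, predictor, c):
    """Modified conditional empirical entropy of column c given the columns in P."""
    m = len(table[c])
    groups = defaultdict(list)
    for r in range(m):
        # Collect the symbols of c that share the same row of T[P].
        groups[tuple(table[p][r] for p in predictor)].append(table[c][r])
    return sum(len(g) * h0_star(g) for g in groups.values()) / m
```

For instance, the sequence "aabb" has zeroth-order empirical entropy of exactly 1 bit per symbol, while "aaaa" has entropy 0 and therefore modified entropy (1 + lg 4)/4 = 0.75.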


Key Results 


Combinatorial Dependency 

Problem 1 admits a polynomial-time algorithm, 
based on dynamic programming. Using the 
definition of combinatorial dependency, one can 
show: 


Theorem 1 ([3]) Let E[i] be the cost of an 
optimal, contiguous partition of T[1, i]. E[n] is thus 
the cost of a solution to Problem 1. Define E[0] = 
0; then, for 1 ≤ i ≤ n, 

$$E[i] = \min_{0 \le j < i} \bigl( E[j] + H_C(T[j+1, i]) \bigr). \qquad (1)$$

The actual partition with cost E[n] can be 
maintained by standard backtracking. 
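
The recurrence of Theorem 1 can be sketched as a short dynamic program. The sketch assumes columns are `bytes` objects with one byte per row and uses zlib as a stand-in for the compressor C; the function names and the pluggable `cost` parameter are illustrative choices rather than part of [3].

```python
import zlib


def row_major_cost(columns):
    """Stand-in for H_C: zlib-compressed size of the columns in row-major order."""
    rows = b"".join(bytes(row) for row in zip(*columns))
    return len(zlib.compress(rows, 9))


def optimal_contiguous_partition(columns, cost=row_major_cost):
    """Return (E[n], parts), with parts as 1-based inclusive column intervals."""
    n = len(columns)
    E = [0] * (n + 1)      # E[0] = 0 by definition
    back = [0] * (n + 1)   # back[i] = the j attaining the minimum for E[i]
    for i in range(1, n + 1):
        best = None
        for j in range(i):
            # columns[j:i] is T[j+1, i] in the 1-based notation of the text.
            candidate = E[j] + cost(columns[j:i])
            if best is None or candidate < best:
                best, back[i] = candidate, j
        E[i] = best
    # Backtrack from n to recover the intervals of an optimal partition.
    parts = []
    i = n
    while i > 0:
        parts.append((back[i] + 1, i))
        i = back[i]
    parts.reverse()
    return E[n], parts
```

The double loop evaluates the cost function on O(n^2) column intervals, so the running time is dominated by these compressor invocations.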


The only known algorithmic solution to 
Problem 2 is the trivial one based on enumerating 
all possible feasible solutions to choose an 
optimal one. Some efficient heuristics based 
on asymmetric TSP, however, have been 
devised and tested experimentally [4]. Define 
a weighted, complete, directed graph, G(T), 
with a vertex T_i for each column T[i] ∈ T; 
the weight of edge (T_i, T_j) is w(T_i, T_j) = 
min(H_C(T_i, T_j), H_C(T_i) + H_C(T_j)). One then 
generates a set of tours of various weights by 
iteratively applying standard optimizations (e.g., 
3-opt, 4-opt). Each tour induces an ordering of 
the columns, which are then optimally partitioned 
using the dynamic program (1). 

Buchsbaum et al. [4] also provide a general 
framework for studying the computational com- 
plexity of several variations of table compression 
problems based on notions analogous to combi- 
natorial dependence, and they give some initial 
MAX-SNP-hardness results. Particularly relevant 
is the set of abstract problems in which one is 
required to find an optimal arrangement of a set 
of strings to be compressed, which establishes a 
nontrivial connection between table compression 
and the classical shortest common superstring 
problem [2]. Giancarlo et al. [11] connect table 
compression to the Burrows and Wheeler trans- 
form [5] by deriving the latter as a solution to an 
analog of Problem 2. 


Column Dependency 


Theorem 2 ([17, 18]) For k ≥ 2, Problem 3 is 
NP-hard. 


Theorem 3 ([17, 18]) An optimum 1-transform 
for a table T can be found in O(n^2 δ(m)) time. 


Theorem 4 ([17, 18]) A 2-transform can be 
computed in O(n^2 δ(m)) time. 


Theorem 5 ([18]) For any dependency relation 
(P, c) and some constant ε, |C(dt_P(c))| ≤ 
5m H*_P(c) + ε. 


Motifs 

Apostolico et al. [1] propose improved versions 
of Table Compression based on Motifs, i.e., 
regular expressions characterizing a set of 
templates based on which the rows of a table 
are compressed by textual substitution. They 
also discuss applications of the technique in 
Computational Biology. 


Applications 


The main application is the storage and transmission 
of alphanumeric tables. Moreover, the articles citing 
the papers in [1, 3, 4, 17, 18], as listed in Google 
Scholar, cover a full range of applications and related work. 


Open Problems 


All the techniques discussed use the general 
paradigms of context-dependent data rearrange- 
ment for compression boosting. It remains open 
to apply these paradigms to other domains, e.g., 
XML data [13, 14], where high-level structures 
can be exploited, and to domains where pertinent 
structures are not known a priori. 


Experimental Results 


Buchsbaum et al. [3] showed that optimal parti- 
tioning alone (no column rearrangement) yielded 
about 55 % better compression compared to gzip 
on telephone usage data, with small training sets. 
Buchsbaum et al. [4] experimentally supported 
the hypothesis that good TSP heuristics can ef- 
fectively reorder the columns, yielding additional 
improvements of 5–20 % relative to partitioning 
alone. They extended the data sets used to include 
other tables from the telecom domain as well as 
biological data. Vo and Vo [17, 18] showed further 
10–35 % improvement over these combinatorial 
dependency methods on the same data sets. 


Data Sets 


Some of the data sets used for experimentation 
are public [4]. 


URL to Code 


The pzip package, based on combinatorial 
dependency, is available at 
http://www.research.att.com/~gsf/pzip/pzip.html. 
The Vcodex package, related to invertible transforms, 
is available at 
http://www.research.att.com/~gsf/download/ref/vcodex/vcodex.html. 
Although for the time being Vcodex does not include 
procedures to compress tabular data, it is a useful 
toolkit for their development. 


Cross-References 
Problem Definition 


Consider a random allocation of m balls to n 
bins where each ball is placed in a bin chosen 
uniformly and independently. The properties of 
the resulting distribution of balls among bins 
have been the subject of intensive study in the 
probability and statistics literature [3, 4]. In com- 
puter science, this process arises naturally in ran- 
domized algorithms and probabilistic analysis. 
Of particular interest is the occupancy problem 
where the random variable under consideration is 
the number of empty bins. 

In this entry a series of bounds are presented 
(reminiscent of the Chernoff bound for binomial 
distributions) on the tail of the distribution of the 
number of empty bins; the tail bounds are suc- 
cessively tighter, but each new bound has a more 
complex closed form. Such strong bounds do not 
seem to have appeared in the earlier literature. 


Key Results 


The following notation is used in presenting sharp 
bounds on the tails of distributions. The notation 
F ~ G will denote that F = (1 + o(1))G; 
further, F ≍ G will denote that ln F ~ ln G. 
The proof that f ~ g is used for the purposes 
of later claiming that 2^f ≍ 2^g. These asymptotic 
equalities will be treated like actual equalities 
and it will be clear that the results claimed are 
unaffected by this “approximation”. 

Consider now the probabilistic experiment of 
throwing m balls, independently and uniformly, 
into n bins. 


Definition 1 Let Z be the number of empty bins 
when m balls are placed randomly into n bins, and 
define r = m/n. Define the function H(m, n, z) 
as the probability that Z = z. The expectation of 
Z is given by 

$$\mu = E[Z] = n \left(1 - \frac{1}{n}\right)^m \sim n e^{-r}.$$
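
The expectation and its asymptotic form are easy to compare numerically. The following sketch computes both and also simulates one run of the experiment; the function names are illustrative, and the simulation takes a `random.Random` instance so that runs can be reproduced.

```python
import math
import random


def expected_empty_bins(m, n):
    """Exact expectation of Z: each bin is empty with probability (1 - 1/n)^m."""
    return n * (1 - 1 / n) ** m


def asymptotic_empty_bins(m, n):
    """Asymptotic form n * e^(-r), with r = m / n."""
    return n * math.exp(-m / n)


def simulate_empty_bins(m, n, rng):
    """Throw m balls uniformly and independently into n bins; count the empty bins."""
    occupied = {rng.randrange(n) for _ in range(m)}
    return n - len(occupied)


exact = expected_empty_bins(2000, 1000)
approx = asymptotic_empty_bins(2000, 1000)
sample = simulate_empty_bins(2000, 1000, random.Random(0))
```

With m = 2000 and n = 1000 (r = 2) the exact and asymptotic values differ by less than one bin, and a single simulated value of Z typically falls close to both.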


The following three theorems provide the bounds 
on the tail of the distribution of the random 
variable Z. The proof of the first bound is based 
on a martingale argument. 


Theorem 1 (Occupancy Bound 1) For any 
θ > 0, 

$$P[\,|Z - \mu| \ge \theta\mu\,] \le 2 \exp\left(-\frac{\theta^2 \mu^2 (n - \frac{1}{2})}{n^2 - \mu^2}\right).$$

Remark that for large r this bound is 
asymptotically equal to 

$$2 \exp\left(-\theta^2 e^{-2r} n\right).$$
The reader may wish to compare this with 
the following heuristic estimate of the tail 
probability, which assumes that the distribution of 
Z is well approximated by a normal 
distribution even far out in the tails 
[3, 4]: 


$$P[\,|Z - \mu| \ge \theta\mu\,] \le 2 \exp\left(-\frac{\theta^2 e^{-r} n}{2\,(1 - (1 + r) e^{-r})}\right).$$


The next two bounds are in terms of point 
probabilities rather than tail probabilities (as 
was the case in the Binomial Bound), but the 
unimodality of the distribution implies that 
the two differ by at most a small (linear) 
factor. These more general bounds on the point 
probability are essential for the application to 
the satisfiability problem. The next result is 
obtained via a generalization of the Binomial 
Bound to the case of dependent Bernoulli 
trials. 


Theorem 2 (Occupancy Bound 2) For θ > −1, 

$$H(m, n, (1 + \theta)\mu) \le \exp\bigl(-((1 + \theta) \ln[1 + \theta] - \theta)\,\mu\bigr).$$

In particular, for −1 < θ < 0, 

$$H(m, n, (1 + \theta)\mu) \le \exp\left(-\frac{\theta^2 \mu}{2}\right).$$
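
The two forms of Occupancy Bound 2 can be evaluated side by side. The sketch below takes μ and θ as plain floats; the function names are hypothetical. For −1 < θ < 0 the general form is never larger than the simplified one, because (1 + θ) ln(1 + θ) − θ ≥ θ²/2 on that range.

```python
import math


def occupancy_bound_2(mu, theta):
    """General form: exp(-((1 + theta) * ln(1 + theta) - theta) * mu), theta > -1."""
    if theta <= -1:
        raise ValueError("theta must be greater than -1")
    return math.exp(-((1 + theta) * math.log(1 + theta) - theta) * mu)


def occupancy_bound_2_lower(mu, theta):
    """Simplified form exp(-theta^2 * mu / 2), stated for -1 < theta < 0."""
    if not -1 < theta < 0:
        raise ValueError("theta must lie strictly between -1 and 0")
    return math.exp(-theta ** 2 * mu / 2)
```

Both functions bound the point probability H(m, n, (1 + θ)μ), not a tail probability.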


The last result is proved using ideas from large 
deviations theory (Weiss A (1993) Personal Com- 
munication). 


Theorem 3 (Occupancy Bound 3) For |z − μ| 
= Ω(n), 

$$H(m, n, z) \asymp \exp\left(-n \left[\int_0^{1 - z/n} \ln\frac{k - x}{1 - x}\,dx - r \ln k\right]\right),$$

where k is defined implicitly by the equation z = 
n(1 − k(1 − e^{−r/k})). 
Applications 


Random allocation of balls to bins is a basic 
model that arises naturally in many areas 
in computer science involving choice between 
a number of resources, such as communication 
links in a network of processors, actuator devices 
in a wireless sensor network, processing units 
in a multi-processor parallel machine, etc. For 
such situations, randomization can be used to 
“spread” the load evenly among the resources, an 
approach particularly useful in a parallel or dis- 
tributed environment where resource utilization 
decisions have to be made locally at a large num- 
ber of sites without reference to the global impact 
of these decisions. In the process of analyzing 
the performance of such algorithms, of partic- 
ular interest is the occupancy problem where 
the random variable under consideration is the 
number of empty bins (i.e., machines with no 
jobs, routes with no load, etc.). The properties 
of the resulting distribution of balls among bins 
and the corresponding tail bounds may help 
to analyze the performance of such 
algorithms. 
Problem Definition 


Technology mapping is the problem of 
implementing a sequential circuit using the gates 
of a particular technology library. It is an integral 
component of any automated VLSI circuit 
design flow. In the prototypical chip design flow, 
combinational logic gates and sequential memory 
elements are composed to form sequential 
circuits. These circuits are subject to various logic 
optimizations to minimize area, delay, power, 
and other performance metrics. The resulting 
optimized circuits still consist of primitive logic 
functions such as AND and OR gates. The next 
step is to efficiently realize these circuits in a 
specific VLSI technology using a library of gates 
available from the semiconductor vendor. Such a 
library would typically consist of gates of varying 
sizes and speeds for primitive logic functions 
(AND and OR) and more complex functions 
(exclusive-OR, multiplexer). However, a naive 
translation of generic logic elements to gates in 
the library will fall short of realistic performance 
goals. The challenge is to construct a mapping 
that maximally utilizes the gates in the library to 
implement the logic function of the circuit and 
achieve some performance goal, for example, 
minimum area with the critical path delay less 
than a target value. This is accomplished by 
technology mapping. For the sake of simplicity, 
in the following discussion, it is presumed that 
the sequential memory elements are stripped 
from the digital circuit and mapped directly into 
memory elements of the particular technology. 
Then, only Boolean circuits composed of 
combinational logic gates remain to be mapped. 
Further, each remaining Boolean circuit is 
necessarily a directed acyclic graph (DAG). 

The technology mapping problem can be re- 
stated in a more general graph-theoretic setting: 
find a minimum cost covering of the subject graph 
(Boolean circuit) by choosing from the collection 
of pattern graphs (gates) available in a library. 
The inputs to the problem are: 


(a) Subject graph: This is a directed acyclic graph 
representation of a Boolean circuit expressed 
using a set of primitive functions (e.g., 
2-input NAND gates and inverters). An example 
subject graph is shown in Fig. 1. 

(b) Library of pattern graphs: This is a collection 
of gates available in the technology library. 
The pattern graphs are also DAGs expressed 
using the same primitive functions used to 
construct the subject graph. Additionally, each 
gate is annotated with a number of values for 
different cost functions, such as area, delay, 
and power. An example library and associated 
cost model is shown in Fig. 2. 

A valid cover is a network of pattern graphs 
implementing the function of the subject graph 
such that (a) every vertex (i.e., gate) of the subject 
graph is contained in some pattern graph and (b) 
each input required by a pattern graph is actually 
an output of some other pattern graph (i.e., the 
inputs of a gate must exist as outputs of other 
gates). Technology mapping can then be viewed 
as an optimization problem to find a valid cover 
of minimum cost of the subject graph. 


Key Results 


To be viable in a realistic design flow, an 
algorithm for minimum cost graph-covering 
for technology mapping should ideally possess 
the following characteristics: (a) the algorithm 
should be easily adaptable to diverse libraries 
and cost models; if the library is expanded or 
replaced, the algorithm must be able to utilize the 
new gates effectively; (b) it should allow detailed 
cost models to accurately represent the 
performance of the gates in the library; and (c) it should 
be fast and robust on large subject graph instances 
and large libraries. One technique for solving 
the minimum cost graph-covering problem is 
to formulate it as a binate-covering problem, 
which is a specialized integer linear program 
[10]. However, binate-covering for a DAG is 
NP-Hard for any set of primitive functions and is 
typically unwieldy on large circuits. The DAGON 
algorithm suggested solving the technology 
mapping problem through DAG-covering and 
advanced an alternate approach for DAG-covering 
based on a tree-covering approximation 
that produced near-optimal solutions for practical 
circuits and was very fast even for large circuits 
and large libraries [7]. 

Technology Mapping, Fig. 1 Subject graph (DAG) of a Boolean circuit expressed using NAND2 and INVERTER gates 

Technology Mapping, Fig. 2 Library of pattern graphs (composed of NAND2 and INVERTER gates) and associated costs. Gates listed: INVERTER, NAND2, NAND3, AND-OR-INVERT-21, AND-OR-INVERT-22 

DAGON was inspired by prevalent techniques 
for pattern matching employed in the domain 
of code generation for programming language 
compilers [1]. The fundamental concept was 
to partition the subject graph (DAG) into 
a forest of trees and solve the minimum 
cost covering problem independently for 
each tree. The approach was motivated by 
the existence of efficient dynamic programming 
algorithms for optimum tree-covering [2]. 
The three salient components of the 
DAGON algorithm are (a) subject graph 
partitioning, (b) pattern matching, and (c) 
covering. 
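
The partitioning idea can be sketched in a few lines. The sketch assumes the subject graph is given as a dictionary mapping every vertex to the list of its fan-in vertices (an empty list for a primary input) and breaks the DAG at every vertex whose out-degree is greater than 1; the representation and function name are hypothetical and are not taken from the DAGON implementation.

```python
from collections import defaultdict


def partition_into_trees(fanins):
    """Split a DAG into trees by cutting at multiple fan-out points.

    fanins -- dict: vertex -> list of fan-in vertices ([] for a primary input);
              every vertex, including primary inputs, must appear as a key.
    Returns a dict: tree root -> (gates in the tree, leaves of the tree).
    """
    fanout = defaultdict(int)
    for ins in fanins.values():
        for u in ins:
            fanout[u] += 1
    # A gate is a tree root if it is an output (fan-out 0) or a multiple fan-out point.
    roots = [v for v, ins in fanins.items() if ins and fanout[v] != 1]
    trees = {}
    for root in roots:
        members, leaves = [], []
        stack = [root]
        while stack:
            v = stack.pop()
            members.append(v)
            for u in fanins[v]:
                if not fanins[u] or fanout[u] > 1:
                    leaves.append(u)   # primary input or root of another tree
                else:
                    stack.append(u)
        trees[root] = (members, leaves)
    return trees


# out = NAND(INV(g1), NAND(g1, c)) with g1 = NAND(a, b); g1 fans out twice.
circuit = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["g1"],
    "g3": ["g1", "c"],
    "out": ["g2", "g3"],
}
forest = partition_into_trees(circuit)
```

In this example the forest has two trees: one rooted at `out` that contains `g2` and `g3`, and one rooted at the multiple fan-out point `g1`, which appears as a leaf of the first tree.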
(a) Subject graph partitioning: To apply the 
tree-covering approximation, the subject graph is 
first partitioned into a forest of trees. One 
approach is to break the graph at each vertex 
which has an out-degree greater than 1 
(multiple fan-out point). The root of each tree is 
the primary output of the corresponding 
subcircuit and the leaves are the primary inputs. 
Other heuristic partitions of the subject graph 
that consider duplication of vertices can also 
be applied to improve the quality of the final 
cover. Alternate subject graph partitions can 
also be derived starting from different decom- 
positions of the original Boolean circuit in 
terms of the primitive functions. 

(b) Pattern matching: The optimum covering of a 
tree is determined by generating the complete 
set of matches for each vertex in the tree (i.e., 
the set of pattern graphs which are candidates 
for covering a particular vertex) and then 
selecting the optimum match from among 
the candidates. An efficient approach for 
structural pattern matching is to reduce the 
tree matching problem to a string matching 
problem [2]. Fast string matching algorithms, 
such as the Aho-Corasick and the Knuth- 
Morris-Pratt algorithms, can then be used to 
find all strings (pattern graphs) which match 
a given vertex in the subject graph in time 
proportional to the length of the longest string 
in the set of pattern graphs. Alternatively, 
Boolean matching techniques can be used to 
find matches based on logic functions [5]. 
Boolean matching is slower than structural 
string matching, but it can compute matches 
independent of the actual local decomposi- 
tions and under different input permutations. 

(c) Covering: The final step is to generate a 
valid cover of the subject tree using the 
pattern graph matches computed at each 
vertex. Consider the problem of finding a 
valid cover of minimum area for the subject 
tree. Every pattern graph in the library has an 
associated area and the area of a valid cover 
is the sum of the area of the pattern graphs 
in the cover. The key property that makes
minimum area tree-covering efficient is this:
the minimum area cover of a tree rooted at
some vertex v can be computed using only
the minimum area covers of vertices below
v. It follows that for every pattern graph
that matches the tree rooted at vertex v, the 
area of the minimum cover containing that 
match equals the sum of the area of the 
corresponding match at v and the sum of the 
areas of the optimal covers of the vertices 
which are inputs to that match. This property 
enables a dynamic programming algorithm to 
compute the minimum area cover of the tree 
rooted at each vertex of the subject tree. The 
base case is the minimum area cover of a leaf 
(primary input) of the subject tree. The area 
of a match at a leaf is set to 0. A recursive
formulation of this dynamic programming
concept is summarized in the Algorithm
minimum_area_tree_cover shown below. As an
example, the minimum area cover displayed
in Fig. 3 is a result of applying this algorithm
to the tree partitions of the subject graph from
Fig. 1 using the library from Fig. 2.
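
The partitioning step in (a) can be sketched as follows. This is a minimal illustration, not DAGON's actual implementation: the dictionary-based graph representation, the function name partition_into_trees, and the gate names in the usage note are all assumptions made for the example. The sketch cuts the subject graph at every vertex whose fan-out exceeds 1, so each such vertex (and each primary output) becomes the root of its own tree.

```python
from collections import defaultdict


def partition_into_trees(fanins, primary_outputs):
    """Split a DAG into trees by cutting at every multiple fan-out vertex.

    fanins: dict mapping each vertex to the list of its input vertices
            (primary inputs map to an empty list).
    primary_outputs: iterable of vertices that are circuit outputs.
    Returns a dict: tree root -> set of gate vertices covered by that tree.
    """
    # Count how many vertices consume each vertex as an input.
    fanout = defaultdict(int)
    for ins in fanins.values():
        for u in ins:
            fanout[u] += 1

    # Roots: primary outputs plus every gate that drives more than one vertex.
    roots = set(primary_outputs)
    roots |= {v for v, ins in fanins.items() if ins and fanout[v] > 1}

    trees = {}
    for r in roots:
        members, stack = set(), [r]
        while stack:
            v = stack.pop()
            members.add(v)
            for u in fanins[v]:
                # Stop at primary inputs and at the roots of other trees;
                # both act as leaves of the current tree.
                if fanins[u] and u not in roots:
                    stack.append(u)
        trees[r] = members
    return trees


# Hypothetical circuit: g1 feeds both g2 and g3, so the graph is cut at g1.
example = {"a": [], "b": [], "c": [],
           "g1": ["a", "b"], "g2": ["g1"], "g3": ["g1", "c"], "g4": ["g3"]}
forest = partition_into_trees(example, ["g2", "g4"])
```

In this hypothetical circuit the forest has three trees, rooted at g1, g2, and g4; the tree rooted at g4 also contains g3.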


Given a vertex v in the subject tree, let M(v) 
denote the set of candidate matches from the 
library of pattern graphs for the sub-tree rooted 
at v. 

In this algorithm, each vertex in the tree is 
visited exactly once. Hence, the complexity of 
the algorithm is proportional to the number of 
vertices in the subject tree times the maximum 
number of pattern matches at any vertex. The 
maximum number of matches is a function of the 
pattern graph library and is independent of the 
subject tree size. As a result, the complexity of 
computing the minimum cost valid cover of a tree 
is linear in the size of the subject tree, and the 
memory requirements are also linear in the size 
of the subject tree. The algorithm computes the 
optimum cover when the subject graph is a tree. 
In the general case of the subject graph being a 
DAG, empirical results have shown that the tree- 
covering approximation yields industrial-quality 
results achieving aggressive area and timing re- 
quirements on large real circuit design problems 
[6, 12]. 
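
The dynamic program described above can be sketched in Python as follows. This is an illustrative version under stated assumptions: the set of matches M(v) is supplied by the caller as a dictionary rather than computed by pattern matching, and the gate names and areas in the usage note are hypothetical. Each vertex is visited once, and the cost of a match is its own area plus the best areas at the vertices feeding its input pins.

```python
import math


def minimum_area_tree_cover(root, inputs, matches):
    """Minimum-area cover of the tree rooted at `root`.

    inputs:  dict vertex -> list of input vertices (empty list for a leaf).
    matches: dict vertex -> list of (gate_name, gate_area, pins), where
             `pins` are the subject-tree vertices feeding that match.
    Returns (best_area, best_match), two dicts keyed by vertex.
    """
    best_area, best_match = {}, {}

    def visit(v):
        if not inputs[v]:
            # Base case: a leaf (primary input) costs nothing.
            best_area[v], best_match[v] = 0, None
            return
        for u in inputs[v]:
            # Solve every sub-tree below v first.
            visit(u)
        best_area[v], best_match[v] = math.inf, None
        for gate, area, pins in matches[v]:
            # Area of the cover that selects this match at v.
            total = area + sum(best_area[p] for p in pins)
            if total < best_area[v]:
                best_area[v], best_match[v] = total, gate

    visit(root)
    return best_area, best_match


# Hypothetical subject tree: n1 = NAND2(a, b), n2 = INVERTER(n1).
tree_inputs = {"a": [], "b": [], "n1": ["a", "b"], "n2": ["n1"]}
tree_matches = {
    "n1": [("NAND2", 3, ["a", "b"])],
    "n2": [("INVERTER", 2, ["n1"]),   # covers n2 only
           ("AND2", 4, ["a", "b"])],  # hypothetical gate covering n2 and n1
}
area, match = minimum_area_tree_cover("n2", tree_inputs, tree_matches)
```

With these hypothetical areas, covering n2 with an INVERTER costs 2 + 3 = 5, while the single AND2 gate costs 4, so the AND2 match is selected at the root. The recursion depth equals the tree height, so a very deep tree would need an explicit stack instead.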
Technology mapping is the key link between
technology-independent logic synthesis and 
technology-dependent physical design of VLSI 
circuits. This motivates the need for efficient and 
robust algorithms to implement large Boolean 
circuits in a technology library. Early algorithms 
for technology mapping were founded on 
rule-based local transformations [4]. DAGON 
was the first in advancing an algorithmic 
foundation in terms of graph transformations 
that was practicable in the inner loop of iterative 
procedures in the VLSI design flow [7]. From 
a theoretical standpoint, the graph-covering 
formulation provided a formal description of 
the problem and specified optimality criteria 
for evaluating solutions. The algorithm was 
naturally adaptable to diverse libraries and cost 
models, and was relatively easy to implement 
and extend. The concept of partitioning the 
subject graph into trees and covering the trees 
optimally was effective for varied optimization 
objectives such as area, delay, and power. The 
DAGON approach has been incorporated in 


at Berkeley [11]) and industrial (Synopsys™ 
Design Compiler) tool offerings for logic 
synthesis and optimization. 

The graph-covering formulation has also 
served as a starting point for advancements in 
algorithms for technology mapping over the last 
decade. Decisions related to logic decomposition 
were integrated in the graph-covering algorithm, 
which in turn enabled technology independent 
logic optimizations in the technology mapping 
phase [9, 14]. Similarly, heuristics were proposed 
to impose placement constraints and make 
technology mapping more aware of the physical 
design and layout of the final circuit [8]. To 
combat the problem of high power dissipation 
in modern sub-micron technologies, the graph 
algorithms were enhanced to minimize power 
under area and delay constraints [13]. Special- 
izations of these graph algorithms for technology 
mapping have found successful application in 
design flows for Field Programmable Gate Array 
(FPGA) technologies [3, 15]. We recommend the
following works for a comprehensive treatment
of algorithms for technology mapping and a
survey of new developments and challenges in
the design of modern VLSI circuits: [5, 6, 12].

Algorithm minimum_area_tree_cover (Vertex v) {
    // The algorithm minimum_area_tree_cover finds an
    // optimal cover of the tree rooted at Vertex v.
    // It computes best_match(v) and area_of_best_match(v),
    // which denote the best pattern graph match at v and
    // the associated area of the optimal cover of the tree
    // rooted at v, respectively.

    // check if v is a leaf of the tree
    if (v is a leaf) {
        area_of_best_match(v) = 0;
        best_match(v) = leaf;
        return;
    }

    // compute optimal cover for each input of v
    foreach (input of Vertex v) {
        minimum_area_tree_cover(input);
    }
    // each tree rooted at each input of v is now
    // annotated with its optimal cover

    // find the optimal cover of the tree rooted at Vertex v
    area_of_best_match(v) = INFINITY;
    best_match(v) = NULL;
    foreach (Match m in the set of matches M(v)) {
        // compute the area of match m at Vertex v;
        // area_of_match(v,m) denotes the area of the cover
        // when Match m is selected for v
        area_of_match(v,m) = area(m);
        foreach (input pin vi of match m) {
            area_of_match(v,m) = area_of_match(v,m) +
                                 area_of_best_match(vi);
        }

        // update best pattern graph match and associated
        // area of the optimal cover at Vertex v
        if (area_of_match(v,m) < area_of_best_match(v)) {
            area_of_best_match(v) = area_of_match(v,m);
            best_match(v) = m;
        }
    }
}


Open Problems 


The enduring problem with DAGON-related 
technology mappers is handling non-tree pattern 
graphs that arise from modeling circuit elements 
such as multiplexors, Exclusive-Ors, or memory- 
elements (e.g., flip-flops) with associated logic 
(e.g., scan logic). On the other hand, approaches 
that do not use the tree-covering formulation 
face challenges in easily representing diverse 
technology libraries and in matching the subject 
graph in a computationally efficient manner. 
Problem Definition 


Suppose there are two spatially separated parties,
Alice and Bob, and Alice wants to send
a quantum state $\rho$ consisting of $n$ quantum bits
(qubits) to Bob. Since classical communication
is much more reliable, and possibly cheaper, than
quantum communication, it is desirable that this
task be achieved by communicating just classical
bits. Such a procedure is referred to as
teleportation.


Unfortunately, it is easy to argue that this is in
fact not possible if arbitrary quantum states need
to be communicated faithfully. However, Bennett,
Brassard, Crépeau, Jozsa, Peres, and Wootters [8]
presented an elegant solution by modifying the
assumptions about the resources that are available
to Alice and Bob.
Key Results 


Let $\{|0\rangle, |1\rangle\}$ be the standard basis for the state
space of one quantum bit (which is equal to
$\mathbb{C}^2$). For simplicity of notation, $|0\rangle \otimes |0\rangle$ is
represented as $|0\rangle|0\rangle$ or simply $|00\rangle$. An EPR pair
is a special two-qubit quantum state defined as
$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.

Alice and Bob are said to share an EPR 
pair if each holds one qubit of the pair. In this 
article a standard notation is followed in which 
classical bits are called “cbits” and shared EPR 
pairs are called “ebits.” Bennett et al. showed the 
following: 


Theorem 1 Teleportation of an arbitrary n- 
qubit state can be achieved with 2n cbits and n 
ebits. 


These shared EPR pairs are referred to as 
prior entanglement to the protocol since they are 
shared at the beginning of the protocol (before 
Alice gets her input state) and are independent 
of Alice’s input state. This solution is a good 
compromise since it is conceivable that Alice 
and Bob share several EPR pairs at the begin- 
ning, when they are possibly together, in which 
case they do not require a quantum channel. 
Later they can use these EPR pairs to transfer 
several quantum states when they are spatially 
separated. 


Let us now see how Bennett et al. [8] achieve
teleportation. Let us first note that in order to
show Theorem 1, it is enough to show that a
single qubit, which is possibly a part of a larger
state $\rho$, can be teleported, while preserving its
entanglement with the rest of the qubits of $\rho$,
using 2 cbits and 1 ebit. Let us also note that the
larger state $\rho$ can now be assumed to be a pure
state without loss of generality.


Theorem 2 Let $|\phi\rangle_{AB} = a_0|\phi_0\rangle_{AB}|0\rangle_A + a_1|\phi_1\rangle_{AB}|1\rangle_A$,
where $a_0, a_1$ are complex
numbers with $|a_0|^2 + |a_1|^2 = 1$. Subscripts
A, B (representing Alice and Bob, respectively)
on qubits signify their owner.

It is possible for Alice to send two classical
bits to Bob such that at the end of the protocol
the final state is $a_0|\phi_0\rangle_{AB}|0\rangle_B + a_1|\phi_1\rangle_{AB}|1\rangle_B$.
Proof For simplicity of notation, let us assume
below that $|\phi_0\rangle_{AB}$ and $|\phi_1\rangle_{AB}$ do not exist. The
proof is easily modified when they do exist by
tagging them along. Let an EPR pair
$|\psi\rangle_{AB} = \frac{1}{\sqrt{2}}(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B)$
be shared between Alice and Bob. Let us refer to the qubit under
concern that needs to be teleported as the input
qubit.

The combined starting state of all the qubits is

$$|\theta_0\rangle_{AB} = |\phi\rangle_{AB}|\psi\rangle_{AB} = (a_0|0\rangle_A + a_1|1\rangle_A)\,\frac{1}{\sqrt{2}}(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B)$$


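Let the CNOT (controlled-not) gate be a two-qubit
unitary operation described by the operator
$|00\rangle\langle 00| + |01\rangle\langle 01| + |11\rangle\langle 10| + |10\rangle\langle 11|$. Alice
now performs a CNOT gate on the input qubit
and her part of the shared EPR pair. The resulting
state is then

$$|\theta_1\rangle_{AB} = \frac{a_0}{\sqrt{2}}|0\rangle_A(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B) + \frac{a_1}{\sqrt{2}}|1\rangle_A(|1\rangle_A|0\rangle_B + |0\rangle_A|1\rangle_B)$$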


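Let the Hadamard transform be a single-qubit
unitary operation with operator
$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\langle 0| + \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\langle 1|$.
Alice next performs
a Hadamard transform on her input qubit. The
resulting state then is

$$|\theta_2\rangle_{AB} = \frac{a_0}{2}(|0\rangle_A + |1\rangle_A)(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B) + \frac{a_1}{2}(|0\rangle_A - |1\rangle_A)(|1\rangle_A|0\rangle_B + |0\rangle_A|1\rangle_B)$$

$$= \frac{1}{2}\bigl(|00\rangle_A(a_0|0\rangle_B + a_1|1\rangle_B) + |01\rangle_A(a_0|1\rangle_B + a_1|0\rangle_B)\bigr) + \frac{1}{2}\bigl(|10\rangle_A(a_0|0\rangle_B - a_1|1\rangle_B) + |11\rangle_A(a_0|1\rangle_B - a_1|0\rangle_B)\bigr)$$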


Alice next measures the two qubits in her possession
in the standard basis for $\mathbb{C}^4$ and sends the
result of the measurement to Bob.

Let the four Pauli gates be the single-qubit
unitary operations: identity, $P_{00} = |0\rangle\langle 0| + |1\rangle\langle 1|$;
bit flip, $P_{01} = |1\rangle\langle 0| + |0\rangle\langle 1|$; phase flip,
$P_{10} = |0\rangle\langle 0| - |1\rangle\langle 1|$; and bit flip together with
phase flip, $P_{11} = |1\rangle\langle 0| - |0\rangle\langle 1|$. On receiving
the two bits $c_0c_1$ from Alice, Bob performs the
Pauli gate $P_{c_0c_1}$ on his qubit. It is now easily
verified that the resulting state of the qubit with
Bob would be $a_0|0\rangle_B + a_1|1\rangle_B$ (up to a global
phase). The input qubit is
successfully teleported from Alice to Bob! Please
refer to Fig. 1 for the overall protocol. □
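
The steps of the proof can be checked numerically. The following pure-Python sketch (the function name teleport_branches and the bit-indexing convention are assumptions made for this illustration) builds the three-qubit state, applies the CNOT and Hadamard operations, and then, for each of Alice's four measurement outcomes, applies the corresponding Pauli gate to Bob's qubit. It reports the probability of each outcome and the fidelity of Bob's corrected qubit with the input state.

```python
import math

# Action of the Pauli gate P_{c0 c1} on amplitudes (b0, b1) of b0|0> + b1|1>.
PAULI = {
    (0, 0): lambda b0, b1: (b0, b1),    # identity
    (0, 1): lambda b0, b1: (b1, b0),    # bit flip
    (1, 0): lambda b0, b1: (b0, -b1),   # phase flip
    (1, 1): lambda b0, b1: (-b1, b0),   # bit flip together with phase flip
}


def teleport_branches(a0, a1):
    """Teleport a0|0> + a1|1> and return {(c0, c1): (probability, fidelity)}.

    Amplitudes are indexed by bits (q0, q1, q2): the input qubit,
    Alice's half of the EPR pair, and Bob's half.
    """
    s = 1 / math.sqrt(2)

    def idx(q0, q1, q2):
        return 4 * q0 + 2 * q1 + q2

    # Starting state: (a0|0> + a1|1>) (|00> + |11>) / sqrt(2).
    state = [0j] * 8
    for q0, amp in ((0, a0), (1, a1)):
        for e in (0, 1):
            state[idx(q0, e, e)] = amp * s

    # CNOT with control q0 and target q1.
    cnot = [0j] * 8
    for q0 in (0, 1):
        for q1 in (0, 1):
            for q2 in (0, 1):
                cnot[idx(q0, q1 ^ q0, q2)] = state[idx(q0, q1, q2)]

    # Hadamard transform on q0.
    had = [0j] * 8
    for q1 in (0, 1):
        for q2 in (0, 1):
            x0, x1 = cnot[idx(0, q1, q2)], cnot[idx(1, q1, q2)]
            had[idx(0, q1, q2)] = s * (x0 + x1)
            had[idx(1, q1, q2)] = s * (x0 - x1)

    result = {}
    for c0 in (0, 1):
        for c1 in (0, 1):
            # Bob's unnormalized state when Alice observes (c0, c1).
            b0, b1 = had[idx(c0, c1, 0)], had[idx(c0, c1, 1)]
            prob = abs(b0) ** 2 + abs(b1) ** 2
            norm = math.sqrt(prob)
            b0, b1 = PAULI[(c0, c1)](b0 / norm, b1 / norm)
            overlap = complex(a0).conjugate() * b0 + complex(a1).conjugate() * b1
            result[(c0, c1)] = (prob, abs(overlap) ** 2)
    return result


branches = teleport_branches(0.6, 0.8j)
```

For a normalized input, each of the four outcomes occurs with probability 1/4 and the fidelity is 1 in every branch; a fidelity is used instead of a direct amplitude comparison because the outcome 11 leaves a global phase of -1.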


Super-Dense Coding 

The super-dense coding protocol [22] is dual to
the teleportation protocol. In this protocol, Alice
transmits 2 cbits of information to Bob using 1
qubit of communication and 1 shared ebit. It is
discussed in more detail in another article in
the encyclopedia.


Lower Bounds on Resources 

The above implementation of teleportation requires
2 cbits and 1 ebit for teleporting 1 qubit.
It was argued in [8] that these resource requirements
are also independently optimal. That is, 2
cbits need to be communicated to teleport a qubit
independent of how many ebits are used. Also 1
ebit is required to teleport one qubit independent
of how much (possibly two-way) communication
is used.


Remote State Preparation 

Closely related to the problem of teleportation is
the problem of remote state preparation (RSP)
introduced by Lo [21]. In teleportation Alice is
just given the state to be teleported in some input
register and has no other information about it.
In contrast, in RSP, Alice knows a complete
description of the input state that needs to be
teleported. Also in RSP, Alice is not required to
maintain any correlation of the input state with
the other parts of a possibly larger state as is
achieved in teleportation. The extra knowledge
that Alice possesses about the input state can
be used to devise protocols for probabilistically
exact RSP with one cbit and one ebit per qubit
asymptotically [9]. In a probabilistically exact
RSP, Alice and Bob can abort the protocol with
a small probability; however, when they do not
abort, the state produced with Bob at the end of
the protocol is exactly the state that Alice intends
to send.

Teleportation of Quantum States, Fig. 1 Teleportation protocol. H represents the Hadamard transform and M
represents measurement in the standard basis for $\mathbb{C}^4$. (Figure labels: input qubit $|\phi\rangle$; Bell state $|\psi\rangle$.)


Teleportation as a Private Quantum Channel

The teleportation protocol also satisfies an in- 
teresting privacy property as follows. If there 
was a third party, say Eve, having access to the 
communication channel between Alice and Bob, 
then Eve learns nothing about the input state 
of Alice that she is teleporting to Bob. This is 
because the distribution of the classical messages 
of Alice is always uniform, independent of her 
input state. Such a channel is referred to as a 
private quantum channel [2, 11, 18]. 


Quantum State Redistribution 

The teleportation protocol is a part of a wide
range of information theoretical tasks. A more
general task, referred to as quantum state
redistribution, is as follows. Three parties, Alice,
Bob, and Referee, share a joint quantum state
$|\Psi\rangle_{ACBR}$, where registers A, C are with Alice,
B is with Bob, and R is with Referee. The
task is to transfer register C to Bob. In the
asymptotic setting (in the limit of infinite copies
of the input state), it was shown in [14, 26]
that the number of cbits to be transmitted is
$I(R : C \mid B)$ (quantum mutual information
between R and C conditioned on B). A sub-task
of quantum state redistribution in which register
A is not present is referred to as quantum state
merging, which was used in [17] to give an
operational interpretation to negative quantum
conditional entropy. Another sub-task in which
register B is not present is referred to as quantum
state splitting. These protocols have also been
well studied in the single-shot setting where a
single copy of the quantum state is available
[1, 4, 5, 13].


Applications 


Apart from the main application of transporting
quantum states over large distances using
only a classical channel, the teleportation protocol
finds other important uses as well. A generalization
of this protocol to implement unitary operations
[12] is used in fault-tolerant computation
in order to construct an infinite class of
fault-tolerant gates in a uniform fashion. In another
application, a form of teleportation called
error-correcting teleportation, introduced by
Knill [19], is used in devising quantum circuits
that are resistant to very high levels of noise.

Ideas from quantum teleportation form the
basis of measurement-based models of quantum
computation. Starting from an arbitrary state
$|\psi\rangle = a|0\rangle + b|1\rangle$ and an ancilla
$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, apply the controlled-Z gate on
the two qubits to obtain $a|0,+\rangle + b|1,-\rangle$ (here
$|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$). Measuring the first qubit
in the $\{|+\rangle, |-\rangle\}$ basis gives the state $X^m H|\psi\rangle$, where
$m \in \{0, 1\}$ is the measurement outcome and $H$ is the
Hadamard transform. In particular,
if a phase gate $Z_\theta$ acted after the controlled-Z
unitary, then the outcome would be $X^m H Z_\theta|\psi\rangle$.
Thus, the information about the state $|\psi\rangle$ is still
preserved after the measurement. By preparing
large arrays of standard entangled states and
performing single-qubit measurements, one
can hence simulate any quantum circuit, up to
unitaries that depend on measurement outcomes.
More details can be found in [23].
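
The single-qubit step described above can be verified with a short calculation. The sketch below (function names are assumptions made for this illustration) prepares the state after the controlled-Z gate, projects the first qubit onto $|+\rangle$ or $|-\rangle$, and compares the remaining qubit with $X^m H|\psi\rangle$; the Hadamard factor $H$ appears explicitly when the calculation is carried out.

```python
import math


def measure_first_qubit_pm(a, b):
    """Second-qubit state after CZ on (a|0> + b|1>)|+> and a +/- measurement.

    Returns {m: (c0, c1)}: normalized amplitudes of the second qubit when
    the first qubit is found in |+> (m = 0) or |-> (m = 1).
    """
    s = 1 / math.sqrt(2)
    # After controlled-Z the state is a|0>|+> + b|1>|->; amp[q1][q2].
    amp = [[a * s, a * s], [b * s, -b * s]]
    out = {}
    for m in (0, 1):
        sign = 1 if m == 0 else -1
        # Project qubit 1 onto (|0> + sign |1>) / sqrt(2).
        c = [s * (amp[0][q] + sign * amp[1][q]) for q in (0, 1)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in c))
        out[m] = tuple(x / norm for x in c)
    return out


def x_power_h(a, b, m):
    """Amplitudes of X^m H (a|0> + b|1>)."""
    s = 1 / math.sqrt(2)
    h = (s * (a + b), s * (a - b))
    return h if m == 0 else (h[1], h[0])


outcomes = measure_first_qubit_pm(0.6, 0.8j)
```

For any normalized input, the amplitudes returned for outcome m agree with x_power_h(a, b, m), so the unmeasured qubit carries the input state up to a known unitary.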

This protocol can in particular be used to perform
a blind quantum computation. Given a state
$|\psi\rangle$, Alice can apply a random Pauli operator on
it and obtain the state $|\psi'\rangle$. Then the state is sent
to Bob, who uses the idea in the previous paragraph
to realize a desired phase gate $Z_\theta$ on the state.
The input state to Bob is completely random, yet
a unitary up to the measurement outcome is realized
by Bob, who then sends the state back to Alice.
Since Alice knows which Pauli operation she
applied, she can recover the state $Z_\theta|\psi\rangle$.
More details can be found in [6, 16].


Experimental Results 


The teleportation protocol has been experimentally
realized in various forms, for example
by Boschi et al. [3] using optical techniques, by
Bouwmeester et al. [10] using photon polarization,
by Nielsen et al. [24] using nuclear magnetic
resonance (NMR), and by Ursin et al. [25]
using photons over long distances.

Krauter et al. [20] have achieved the teleporta- 
tion of a complicated quantum state: a continuous 
variable state stored in the collective spin of 
an atomic ensemble. Unlike qubits that can be 
measured only in the 0 or 1 state, the outcome 
of a continuous variable measurement is a real 
number, like the position and momentum. The 
quantum states prepared and teleported in the 
experiment are actually similar to the coherent 
states of a harmonic oscillator — describing a par- 
ticle in a harmonic potential well moving under 
the influence of a classical force. 

Majorana bound states are localized zero- 
energy excitations of a superconductor. An 
isolated Majorana bound state is an equal 
superposition of electron and hole excitations and 
therefore not a fermionic state. Instead, two spa- 
tially separated Majorana bound states together 
make one zero-energy fermion level which can be 
either occupied or empty. This defines a two-level 
system which can store quantum information 
nonlocally, as needed to realize topological 
quantum computation. In [15] a nonlocal electron 
transfer process due to Majorana bound states in 
a mesoscopic superconductor is predicted. An 
electron which is injected into one Majorana 
bound state can go out from another one far apart 
maintaining phase coherence. The transmission 
phase shift is independent of the distance 
“traveled.” In this sense this phenomenon can 
be called “electron teleportation.” 

In summary this work reveals a striking nonlo- 
cal electron transport phenomenon through Ma- 
jorana bound states in a finite-sized supercon- 
ductor with charging energy. Most interestingly, 
the transmission phase shift detects the state of a 
qubit made of two spatially separated Majorana 
bound states. 

Baur et al. [7] have benchmarked a teleportation
algorithm by tomographic reconstruction
of the three-qubit entangled state generated by 
the circuit up to the single-qubit measurements. 
Using an entanglement witness, they showed that 
this state has genuine tripartite entanglement. 
This technique presents an important step toward 
making use of teleportation in quantum proces- 
sors realized in superconducting circuits. 
Problem Definition 


Self-assembly is a process by which a small 
number of fundamental components automati- 
cally coalesce to form a target structure. In 1998, 
Winfree [10] introduced the abstract Tile As- 
sembly Model (aTAM) as a deliberately over- 
simplified, discrete mathematical model of the 
DNA tile self-assembly pioneered by Seeman [6]. 
The aTAM “effectivizes” classical Wang tiling 
[9] in the sense that the former augments the 
latter with a mechanism for sequential “growth” 
of a tile assembly. Very briefly, in the aTAM, the 
fundamental components are un-rotatable, trans- 
latable square “tile types” whose sides are labeled 
with (alpha-numeric) glue “colors” and (integer) 
“strengths.” Two tiles that are placed next to each 
other bind if the glues on their abutting sides 
match in both color and strength, and the com- 
mon strength is at least a certain (integer) “tem- 
perature.” Self-assembly starts from a “seed” tile 
type, typically assumed to be placed at the origin 
of the coordinate system, and proceeds nondeter- 
ministically and asynchronously as tiles bind to 
the seed-containing assembly one at a time. 

The multiple temperature model [2, 3, 8] is
a natural generalization of the aTAM, where
the temperature of a tile system is dynamically
adjusted by the experimenter as self-assembly
proceeds. In the multiple temperature model, a
tile assembly system (TAS) is defined as an
ordered triple $\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{k-1})$, where $T$ is
a tile set, $\sigma$ is a “seed assembly,” and the third
component $\langle\tau_i\rangle_{i=0}^{k-1}$ is a sequence of nonnegative
integer temperatures.

Intuitively, self-assembly in the multiple temperature
TAS $\mathcal{T}$ is carried out in $k$ phases. In
the first temperature phase, tiles are added to the
existing assembly as they normally would be in
the aTAM until a $\tau_0$-stable terminal assembly is
reached. In phase two, tiles can accrete to the
existing assembly if they can do so with at least
strength $\tau_1$. Also, at any time during the
second temperature phase, if there is ever a cut of
the assembly having a strength less than $\tau_1$, then
all of the tiles on the side of the cut not containing
the seed can be removed from the assembly.
When a $\tau_1$-stable terminal assembly is reached in
phase two, phase three begins and proceeds in a
similar fashion. This process continues through
the final temperature phase in which tiles are
added or removed with respect to the temperature
$\tau_{k-1}$ until reaching a $\tau_{k-1}$-stable terminal assembly.
See Fig. 1 for an example of this process.


Problem 1 (Reducing tile complexity for the
self-assembly of shapes through temperature
programming) Given a shape $X \subseteq \mathbb{Z}^2$, find a
TAS $\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{k-1})$ such that $X$ uniquely
self-assembles in $\mathcal{T}$ and $|T|$ and $k$ are minimal.

In some cases, it is sufficient to uniquely
self-assemble a scaled-up version of $X$, i.e.,
$X^c = \{(x, y) \in \mathbb{Z}^2 \mid (\lfloor x/c \rfloor, \lfloor y/c \rfloor) \in X\}$. Intuitively,
$X^c$ is the shape obtained by replacing each point
in $X$ with a $c \times c$ block of points. We refer to
the natural number $c$ as the scaling factor or
resolution loss.
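
The scaled-up shape $X^c$ can be made concrete with a small sketch. The representation of a shape as a set of integer coordinate pairs and the function name scale_shape are assumptions made for this illustration. Each point of the shape is replaced by a $c \times c$ block, which agrees with the floor-based definition of $X^c$.

```python
def scale_shape(shape, c):
    """Return X^c: every point of `shape` is replaced by a c x c block.

    shape: set of (x, y) integer pairs; c: positive integer scaling factor.
    """
    return {(c * x + dx, c * y + dy)
            for (x, y) in shape
            for dx in range(c)
            for dy in range(c)}


# A hypothetical L-shaped tromino scaled by a factor of 2.
tromino = {(0, 0), (1, 0), (0, 1)}
scaled = scale_shape(tromino, 2)
```

The scaled shape contains $c^2 |X|$ points, and a point $(x, y)$ belongs to it exactly when $(\lfloor x/c \rfloor, \lfloor y/c \rfloor)$ belongs to the original shape; Python's floor division `//` matches this definition for negative coordinates as well.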


Key Results 


Thin Rectangles 

Aggarwal, Cheng, Goldwasser, Kao, Moisset de
Espanés, and Schweller [2] proved that, in the
aTAM, $\Omega\left(\frac{N^{1/k}}{k}\right)$ unique tile types are required
to uniquely self-assemble a rectangle of size $k \times N$,
where $k < \frac{\log N}{\log\log N - \log\log\log N}$ (this restriction
makes the rectangle “thin”). In the same paper,
the authors reduced this bound to $O\left(\frac{\log N}{\log\log N}\right)$
in the 2-temperature model. Intuitively, their
construction builds a $j \times N$ rectangle for an optimal
value of $j \gg k$. Then, when the temperature
is raised, the top $j - k$ rows detach, leaving a
(stable) $k \times N$ rectangle.


Squares 
In the aTAM, the minimum number of unique
tile types required to uniquely self-assemble an
$N \times N$ square is $O\left(\frac{\log N}{\log\log N}\right)$ [1]. In 2006, Kao
and Schweller [3] reduced this bound to $O(1)$
using $O(\log N)$ temperature changes. Their
construction relies on a simple yet ingenious gadget
called the “bit-flip gadget.” Basically, a bit-flip
gadget is a constant set of tile types that can
be programmed, via a carefully chosen sequence
of temperature values, to build a rectangle of
constant size that encodes either 0 or 1. Figure 1
gives an example of a simple bit-flip gadget. In
their main construction, Kao and Schweller first
self-assemble a sequence of $O(\log N)$ bit-flip
gadgets, using $O(\log N)$ temperature changes.
The result is a rectangle of length $O(\log N)$ that
encodes $N$ in binary. Finally, the temperature is
lowered to 2, and a standard square-building tile
set (i.e., [5], but without the “seed row” tile types)
is used to fill in the rest of the square. In the
same paper, Kao and Schweller also prove that
there is no smooth tradeoff, i.e., for a TAS
$\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{k-1})$ that uniquely self-assembles an
$N \times N$ square, it cannot be the case that
$|T| = o\left(\frac{\log N}{\log\log N}\right)$ and $k = o(\log N)$.

Temperature Programming in Self-Assembly, Fig. 1 Thick notches are strength 5, and thin notches are strength 1.
Not all glue labels are shown


Scaled Finite Shapes 

In [3], Kao and Schweller posed the following 
question: Is it possible to have a tile set of size 
O(1) that can, via some sequence of temperature 
values, uniquely self-assemble into an arbitrary 
finite shape (as specified by the sequence of 
temperature values)? In 2012, Summers [8] 
investigated this question and discovered 
the following: 


1. The answer to the previous question is
“NO.” It turns out that a tile set of size
$O(1)$ cannot uniquely self-assemble an
arbitrary finite shape via (any number of)
temperature values. Technically speaking,
Summers proved that, for every tile set $T$,
there exists a finite shape $X \subseteq \mathbb{Z}^2$, such
that, for each temperature sequence $\langle\tau_i\rangle_{i=0}^{k-1}$,
$\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{k-1})$ does not uniquely
self-assemble $X$. In the proof, $X$ is always a line
of length $|T| + 1$.


2. Short temperature sequence, big scale
factor. On the positive side, there exists a
universal tile set that can be programmed via
temperature values to build a scaled version
of an arbitrary finite shape. For instance,
Summers exhibited a construction in which
the bit-flip gadget of Kao and Schweller
[3] is combined with the non-seed portion
of the optimal shape-building construction
by Soloveichik and Winfree [7] to get the
following result: there exists a tile set $T$
with $|T| = O(1)$, such that, for every
finite shape $X$, there exists $c \in \mathbb{N}$ and a
temperature sequence $\langle\tau_i\rangle_{i=0}^{m-1}$ with
$m = O(K(X))$, where $K(X)$ is the Kolmogorov
complexity of $X$ (see [4]), such that
$\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{m-1})$ uniquely self-assembles
$X^c$.

3. Long temperature sequence, small scale
factor. Note that, in the previously mentioned
construction, the scaling factor can be quite
large. Technically, the scaling factor $c$
depends on the running time of the program $\pi$
that describes $X$, whence
$c = \mathrm{poly}(\mathrm{time}(\pi))$. In a truly nanoscale
setting, it is necessary to have a construction
in which the scaling factor is always small or,
better yet, bounded by a constant independent
of the shape being assembled. Summers gave
such a construction: there exists a tile set $T$
with $|T| = O(1)$, such that, for every finite
shape $X$, there exists a temperature sequence
$\langle\tau_i\rangle_{i=0}^{m-1}$ with $m = O(|X|)$, such that
$\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{m-1})$ uniquely self-assembles
$X^2$. This construction utilizes a modified
bit-flip gadget, in the form of a bit-flip
“square,” which is essentially a bit-flip gadget
that can be programmed (via temperature
values) to follow the directions specified by a
Hamiltonian path through a shape (not every
shape has a Hamiltonian path, but every shape
scaled up by a factor of 2 does).


Open Problems 


Does there exist a tile set $T$, with $|T| = O(1)$, and
$c \in \mathbb{N}$, such that, for every finite shape $X$, there
exists a temperature sequence $\langle\tau_i\rangle_{i=0}^{m-1}$, with
$m = O(K(X))$, such that $X^c$ uniquely self-assembles
in $\mathcal{T} = (T, \sigma, \langle\tau_i\rangle_{i=0}^{m-1})$?
Problem Definition 


A graph is bipartite (or 2-colorable) if its ver- 
tices can be partitioned into two sets such that 
there are no edges between pairs of vertices that 
reside in the same set. 

Given a (simple) graph, the task is to 
determine whether it is bipartite or is “far” 
from being bipartite. Thus, the standard decision 
problem is relaxed by allowing any answer 
when the graph is not bipartite but is “close” 
to some bipartite graph. We focus on dense 
graphs (i.e., for which the number of edges is 
quadratic in the number of vertices) and wish to 
solve the aforementioned “approximate decision” 
problem in constant time, given access to a data 
structure that answers adjacency queries in unit 
time. 

To complete the formulation of the problem,
we need to define the distance between graphs
and describe how the graph is accessed. The
distance between the graphs $G_1 = (V, E_1)$
and $G_2 = (V, E_2)$ is determined by the
symmetric difference between their edge sets
(i.e., $E_1 \triangle E_2$), and we say that they are
$\epsilon$-close (resp., $\epsilon$-far) if $|E_1 \triangle E_2| \le \epsilon \cdot |V|^2$
(resp., $|E_1 \triangle E_2| > \epsilon \cdot |V|^2$). Note that this
definition is appropriate for dense graphs
(i.e., when the number of edges is $\Omega(|V|^2)$),
whereas any two sparse graphs are deemed to
be close by it. (An alternative model that is
more suitable for sparse graphs is presented in
this encyclopedia’s entry Testing Bipartiteness
of Graphs in Sublinear Time.) We say that
$G = (V, E)$ is $\epsilon$-far from a graph property $P$ (i.e.,
a set of graphs that is closed under isomorphism)
if for every $G' \in P$ it holds that $G$ is $\epsilon$-far
from $G'$.
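
The distance measure can be illustrated with a short sketch. The edge-list representation and the function names relative_distance and is_eps_close are assumptions made for this illustration. Edges are treated as unordered pairs, and the size of the symmetric difference of the two edge sets is compared with $\epsilon \cdot |V|^2$.

```python
def relative_distance(edges1, edges2, n):
    """Return |E1 symmetric-difference E2| / n^2 for two graphs on n vertices."""
    e1 = {frozenset(e) for e in edges1}   # undirected edges as unordered pairs
    e2 = {frozenset(e) for e in edges2}
    return len(e1 ^ e2) / n ** 2


def is_eps_close(edges1, edges2, n, eps):
    """True if the two graphs differ on at most eps * n^2 edges."""
    return relative_distance(edges1, edges2, n) <= eps


# A triangle versus a path on the same three vertices: they differ in one edge.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(1, 0), (2, 1)]
d = relative_distance(triangle, path, 3)
```

Here the two graphs differ in the single edge {0, 2}, so d equals 1/9. Because the path is bipartite, the triangle is within relative distance 1/9 of the set of bipartite graphs.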

We consider algorithms that make oracle
queries to the input graph, denoted $G = (V, E)$.
Specifically, the algorithm can perform
adjacency queries of the form $(u, v) \in V^2$,
which are answered by 1 if $\{u, v\} \in E$ and by 0
otherwise. We can now define property testing in
this model.


Definition 1 (testing graph properties in the
dense-graph model) A tester for a graph property
$P$ is a randomized algorithm that is given
as input a size parameter $N$ and a proximity
parameter $\epsilon$ as well as access to an adjacency
oracle for an $N$-vertex graph $G = ([N], E)$. The
tester should output a binary verdict that satisfies
the following two conditions.


1. If $G \in P$, then the tester accepts with
probability at least 2/3.

2. If $G$ is $\epsilon$-far from $P$, then the tester accepts
with probability at most 1/3.


A tester has one-sided error if it accepts every
graph in $P$ with probability 1. A tester is
nonadaptive if it determines all its queries based
solely on its internal coin tosses (and the
parameters $N$ and $\epsilon$); otherwise it is adaptive.


Here we are interested in the case that P is the set 
of all bipartite graphs, and we seek a tester of time 
complexity that only depends on the proximity 
parameter, denoted ε, and is independent of the
size of the graph. 


Key Results 


Goldreich, Goldwasser, and Ron [9] showed that 
there exists a tester for bipartiteness that, given a 
proximity parameter ε and access to an N-vertex
graph, runs in time poly(1/ε). Furthermore, the
tester is nonadaptive and has one-sided error; that 
is, it always accepts bipartite graphs. 

The fact that there exist properties that can be 
tested in time that only depends on the proximity 
parameter should not come as a surprise. It is 
well known that the average value of a (bounded) 
function defined over a huge domain can be 
approximated up to a factor of 1 + ε by taking
O(1/ε²) samples. This approximation problem
can be cast as a property testing problem (even
as one that refers to graphs in the current model, 
by considering the edge density of a graph). But 
the foregoing approximation problem is highly 
unstructured (i.e., it merely refers to the average 
of values regardless of anything else), whereas 
bipartite graphs are highly structured (and the 
edge density in them is almost arbitrary). 

Turning back to bipartiteness, this property is 
a special case of k-colorability, where k = 2. (A 
graph is k-colorable if its vertices can be parti- 
tioned into k sets such that there are no edges be- 
tween pairs of vertices that reside in the same set.) 
The tester for bipartiteness, and more generally 
for k-colorability (for any k ≥ 2), is very simple.
It merely selects a sample of poly(1/ε) random
vertices and accepts if and only if the subgraph
induced by the selected vertices is k-colorable.
In fact, as shown by Alon and Krivelevich [1]
(improving on the bounds obtained in [9]), in
the case of k = 2 (bipartiteness), a sample of
Õ(1/ε) vertices suffices, and in general a sample
of size Õ(k/ε²) is sufficient. (The notation Õ(q)
“hides” polylogarithmic factors in q.)
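
The sample-and-check tester described above can be sketched for k = 2 as follows. This is a minimal illustration, not the construction analyzed in [1] or [9]: the function names, the `adjacent(u, v)` oracle interface, and the caller-supplied `sample_size` are assumptions made for the example, and the sample size needed for the 2/3 rejection guarantee is not derived here.

```python
import random
from collections import deque


def induced_subgraph_is_bipartite(vertices, adjacent):
    """2-color the subgraph induced by `vertices` using BFS.

    `adjacent(u, v)` plays the role of the adjacency oracle.
    Returns False as soon as an edge joins two equally colored vertices.
    """
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v == u or not adjacent(u, v):
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle lies inside the sample
    return True


def dense_bipartiteness_tester(n, adjacent, sample_size, rng=random):
    """Accept iff the subgraph induced by a random vertex sample is bipartite."""
    sample = rng.sample(range(n), min(sample_size, n))
    return induced_subgraph_is_bipartite(sample, adjacent)
```

Because an induced subgraph of a bipartite graph is bipartite, this sketch never rejects a bipartite input, which matches the one-sided error property stated above. It makes a number of oracle queries that is quadratic in the sample size.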

Clearly, the algorithm always accepts k- 
colorable graphs, and its running time is 
poly(1/ε) if k = 2 and exponential in poly(1/ε)
(the sample size) otherwise. The analysis boils
down to proving that if the graph G is ε-far
from being k-colorable, then it is rejected with
probability at least 2/3. Below we shall sketch
the argument for the case of k = 2 and when one
uses a sample of O(1/ε²) random vertices.

We view the random sample (of vertices) as 
a union of two disjoint sets, denoted U and S, 
where t := |U| = O(1/ε) and m := |S| =
O(t/ε). We consider all possible (2-way) parti-
tions of U and associate a partial partition of V
with each such partition of U. Specifically, given
a partition of U, denoted (U₁, U₂), we place all
neighbors of U₁ (resp., of U₂) opposite to U₁
(resp., U₂). Indeed, such a placing is forced if
we seek a partition of V that is consistent with
the given partition (U₁, U₂) of U. One may show
that, with high probability, most high-degree ver-
tices in V have at least one neighbor in U, and so
the partition of these vertices is forced by the par-
tition of U. Since there are relatively few edges
incident to vertices that do not neighbor U, there
must be many edges that violate the partition in-
duced by (U₁, U₂) (i.e., their endpoints are forced
to be on the same side of the induced partition). It
follows that when we take the additional sample
S and perform all queries on pairs in U × S and
S × S, with high probability, we detect a violating
edge with respect to each of the 2^|U| induced
partitions, thus ruling out all potential partitions
of U. This implies that with high probability, the
subgraph induced by U ∪ S is not bipartite. Let
us stress the key observation: It suffices to rule out
relatively few (partial) partitions of V (i.e., the
partitions induced by partitions of U) rather
than all 2^|V| possible partitions of V.


Applications 


The procedure employed in the above analysis 
yields a randomized poly(1/ε)·N-time algorithm
for 2-partitioning a bipartite graph such that (with
high probability) at most εN² edges lie within
the same side. This is done by running the tester,
determining a partition of U (defined as in the
proof) that is consistent with the bipartite par-
tition of U ∪ S, and partitioning V as done in
the proof (with vertices that do not neighbor U,
or neighbor both U₁ and U₂, placed arbitrarily).
Thus, the placement of each vertex is determined
by inspecting at most O(1/ε) entries of the ad-
jacency matrix. Furthermore, the aforementioned
partition of U constitutes a succinct representa- 
tion of the 2-partition of the entire graph. All this 
is a typical consequence of the fact that the anal- 
ysis of the tester follows the “enforce-and-test” 
paradigm (see the survey of Ron [12, Sec. 4]). 


Open Problems 


As stated above, a more refined analysis 
yields a nonadaptive tester that inspects the 
subgraph induced by Õ(1/ε) random vertices.
One can easily show that this result is almost
optimal with respect to testers that inspect an
induced subgraph. Furthermore, as Bogdanov
and Trevisan show [3], a query complexity
of Õ(1/ε²) is optimal for any nonadaptive
tester for bipartiteness, whereas any adaptive
tester requires Ω(ε^(-3/2)) queries. This raises
the question of what is the query complexity of 
adaptive testers for bipartiteness. 

While Goldreich and Trevisan [8] have shown 
that the gap between adaptive and nonadaptive 
testers (in the dense-graph model) is at most 
quadratic, the above question falls within the
uncertainty left open by the quadratic upper bound.
Furthermore, Gonen and Ron [10] showed that
N-vertex graphs of maximum degree O(εN)
can be tested for bipartiteness in time Õ(ε^(-3/2)),
whereas the aforementioned lower bound of [3] 
holds also for such graphs. 

The general question of the gap between 
adaptive and nonadaptive testers was studied 
by Goldreich and Ron [7]. They showed that 
there exist graph properties that have an adaptive 
tester that runs in time O(1/ε), for which any
nonadaptive tester requires Ω(ε^(-3/2)) queries.
They conjectured that there exist graph properties
that have an adaptive tester that runs in time
O(1/ε), for which any nonadaptive tester
requires Ω(ε^(-2)) queries.


Cross-References 


Testing Bipartiteness of Graphs in Sublinear 
Time 


Comments for the Recommended 
Reading 


The current entry falls within the scope of prop- 
erty testing (see the surveys [5, 6, 11, 12]). A 
general definition of this setting was first put 
forward by Rubinfeld and Sudan [13]. This def- 
inition was further generalized and systemati- 
cally investigated by Goldreich, Goldwasser, and 
Ron [9], who focused on testing graph properties 
in the dense-graph model. Alternative models 
for testing graph properties are discussed in this 
encyclopedia’s entry cited above. 

As already noted, the tester for k-colorability 
described above was suggested and analyzed 
in [9]. A tighter analysis that yields the best
bounds known was subsequently provided
in [1]. Testers for any “graph partition property”
(including testing that a graph contains a clique 
of certain density or has a bisection of certain 
density) were also presented in [9]. 

The fact that all these (nonadaptive) testers 
operate by inspecting a random induced subgraph 
was shown to be no coincidence in [8]. We also 
mention that the class of graph properties that 
can be tested (in the dense-graph model) within 
complexity that is independent of the size of the 
graph was characterized by Alon et al. [2]. Their 
characterization is related to Szemerédi’s Regu- 
lar Partitions [14]. A different characterization, 
based on graph limits, was proved independently 
by Borgs et al. [4].


Problem Definition


A graph is bipartite (or 2-colorable) if its ver- 
tices can be partitioned into two sets such that 
there are no edges between pairs of vertices that 
reside in the same set. 


Given a (simple) graph, the task is to deter- 
mine whether it is bipartite or is “far” from being 
bipartite. Thus, the standard decision problem is 
relaxed by allowing any answer when the graph 
is not bipartite but is “close” to some bipartite 
graph. We wish to solve this “approximate de- 
cision” problem in sublinear time, given access 
to a data structure that answers adjacency and 
incidence queries in unit time.

To complete the formulation of the problem,
we need to define the distance between graphs
and describe how the graph is accessed. The
distance between two graphs G₁ = (V, E₁)
and G₂ = (V, E₂) is determined by the sym-
metric difference between their edge sets (i.e.,
E₁ Δ E₂), where they are ε-close (resp., ε-far) if
|E₁ Δ E₂| ≤ ε·(|E₁| + |E₂|) (resp., |E₁ Δ E₂| >
ε·(|E₁| + |E₂|)). A graph property is a set
of graphs that is closed under isomorphism, and
we say that G = (V, E) is ε-far from a graph
property P if for every G′ ∈ P it holds that G is
ε-far from G′.


We consider algorithms that make oracle 
queries to the input graph, denoted G = (V, E). 
Specifically, an algorithm can perform adja-
cency queries of the form (u,v) ∈ V², which
are answered by 1 if {u,v} ∈ E and by 0 other-
wise, and incidence queries of the form (u,i) ∈
V × [|V| − 1], which are answered by v if v is
the ith neighbor of u and by a special symbol
(e.g., ⊥) if u has fewer than
i neighbors. (Note that adjacency queries may be
quite useless when the graph is very sparse.) We 
now define property testing in this model. 


Definition 1 (testing graph properties with ad- 
jacency and incidence queries) A tester for a 
graph property P is a randomized algorithm that 
is given as input a size parameter N and a prox-
imity parameter ε as well as access to adjacency
and incidence oracles for an N-vertex graph G =
([N], E). The tester should output a binary ver-
dict that satisfies the following two conditions.


1. If G ∈ P, then the tester accepts with
probability at least 2/3.

2. If G is ε-far from P, then the tester accepts
with probability at most 1/3.


A tester has one-sided error if it accepts every 
graph in P with probability 1. 


Here we are interested in the case that P is the 
set of all bipartite graphs, and we seek a tester of 
time complexity that is sublinear in the size of the 
graph. The dependence of the running time on the 
proximity parameter, denoted ε, is of secondary
concern.


Key Results


Building on the work of Goldreich and Ron [6] on 
testing bipartiteness of bounded-degree graphs, 
Kaufman, Krivelevich, and Ron [9] presented a 
tester for bipartiteness that, given a proximity 
parameter ε and access to an N-vertex graph,
runs in time √N · poly(log N, 1/ε). This
algorithm uses only incidence queries. In case
the number of edges, denoted M, is larger
than N^(3/2), the running time can be reduced
to (N²/M) · poly(log N, 1/ε) by also using
adjacency queries. Furthermore, in both cases, 
the testers have one-sided error; that is, they 
always accept bipartite graphs. 

From now on, we focus on the special case that 
M = O(N) and further assume that the graph 
has constant maximum degree, denoted d. 

As a warm-up, we note that the case of d = 
2 is easy. In this case, we are guaranteed that 
the graph consists of a collection of paths and 
cycles, and we only need to check that it does not 
have short cycles of odd length. Note that such 
an N-vertex graph is ε-far from being bipartite
if and only if it contains more than εN cycles
of odd length, where most of these cycles must
have length at most 2/ε. Hence, in this case,
testing bipartiteness can be performed by select-
ing O(1/ε) random vertices and exploring their
neighborhoods up to distance 1/ε.

In contrast, in the case that d ≥ 3, any tester
for bipartiteness must perform Ω(√N) queries.
As shown by Goldreich and Ron [7], this can be
proved by considering the following two families
of N-vertex graphs (for any even N):


1. The first family, denoted G_1^N, consists of all
degree-3 graphs that are composed of the
union of a Hamiltonian cycle and a perfect
matching. That is, there are N edges connect-
ing the vertices in a cycle, and the other N/2
edges form a perfect matching.

2. The second family, denoted G_2^N, is the same
as the first, except that the perfect matchings
allowed are restricted as follows. The distance
on the cycle between every two vertices that
are connected by a perfect matching edge must
be odd.


Clearly, all graphs in G_2^N are bipartite. It can be
shown that almost all graphs in G_1^N are far from
being bipartite. On the other hand, one can prove
that an algorithm that performs o(√N) queries
cannot distinguish between a graph chosen ran-
domly from G_2^N (which is always bipartite) and
a graph chosen randomly from G_1^N (which with
high probability is far from bipartite). Loosely
speaking, this follows from the fact that in both
cases the algorithm is unlikely to encounter a
cycle (among the vertices that it has inspected).

The algorithm itself is based on taking
many (i.e., poly(1/ε) · Õ(N^(1/2))) random walks
from few (i.e., O(1/ε)) randomly selected
start vertices, where each walk has length
poly(ε⁻¹ log N). Specifically, given as input N,
d, ε as well as access to an incidence oracle
for an N-vertex graph, G = (V,E), of degree
bound d, the algorithm repeats the following
steps T = Θ(1/ε) times:


1. Uniformly select a vertex s in V. 
2. Try to find an odd-length cycle through s: 

(a) Perform K := poly((log N)/ε) · √N ran-
dom walks starting from s, each of length
L := poly((log N)/ε).

(b) Let R₀ (respectively, R₁) denote the set of
vertices reached from s in an even (respec-
tively, odd) number of steps in any of these
walks.

(c) If R₀ ∩ R₁ is not empty, then reject.


If the algorithm did not reject in any of the 
foregoing T iterations, then it accepts. 
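
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of [6] or [9] with its actual parameters: the text only says that K and L are polynomial in (log N)/ε, so the concrete choices below (a single factor of (log N)/ε, and T = ⌈1/ε⌉) are placeholders, and the `neighbors(v)` callback standing in for the incidence oracle is hypothetical. The start vertex is counted as reached in zero (an even number of) steps.

```python
import math
import random


def finds_odd_cycle_through(s, neighbors, num_walks, walk_length, rng):
    """Step 2: return True if some vertex is reached from s with both parities."""
    reached_even, reached_odd = {s}, set()
    for _ in range(num_walks):
        v = s
        for step in range(1, walk_length + 1):
            nbrs = neighbors(v)
            if not nbrs:
                break  # isolated vertex: the walk cannot move
            v = rng.choice(nbrs)
            if step % 2 == 0:
                reached_even.add(v)
            else:
                reached_odd.add(v)
    return bool(reached_even & reached_odd)


def sparse_bipartiteness_tester(n, neighbors, eps, rng=None):
    """Accept (True) unless some iteration finds evidence of an odd cycle."""
    rng = rng or random.Random()
    log_n = max(1.0, math.log2(n))
    iterations = math.ceil(1 / eps)                    # T; the constant is an assumption
    num_walks = math.ceil(log_n / eps * math.sqrt(n))  # K; placeholder polynomial
    walk_length = math.ceil(log_n / eps)               # L; placeholder polynomial
    for _ in range(iterations):
        s = rng.randrange(n)                           # Step 1
        if finds_odd_cycle_through(s, neighbors, num_walks, walk_length, rng):
            return False
    return True
```

In a bipartite graph the parity of the number of steps determines the side of the current vertex, so the two sets can never intersect and the sketch never rejects a bipartite input.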

Clearly, the algorithm always accepts bipartite 
graphs. Hence, the analysis boils down to proving 
that if the graph G is €-far from being bipartite, 
then it is rejected with probability at least 2/3. 

The analysis is quite involved. We confine 
ourselves to the special case where the graph 
has a “rapid mixing” feature. It is convenient to 
modify the random walks so that at each step 
each neighbor is selected with probability 1/(2d),
and otherwise (with probability at least 1/2) the 
walk remains at the present vertex. Furthermore, 
we will consider a single execution of Step (2) 
starting from an arbitrary vertex, s, which is fixed
in the rest of the discussion. The rapid mixing
feature we assume is that, for every vertex v, a
(modified) random walk of length L starting at
s reaches v with probability approximately 1/N
(say, up to a factor of 2). Note that if the graph is
an expander, then this is certainly the case (since
L = ω(log N)).

The key quantities in the analysis are the fol- 
lowing probabilities, referring to the parity of the 
length of a path obtained from the random walk 
by omitting the self-loops (transitions that remain 
at current vertex). Let p⁰(v) (respectively, p¹(v))
denote the probability that a (modified) random
walk of length L, starting at s, reaches v while
making an even (respectively, odd) number of
real (i.e., non-self-loop) steps. By the rapid mix-
ing assumption (for every v ∈ V), it holds that

\[
\frac{1}{2N} < p^{0}(v) + p^{1}(v) < \frac{2}{N}. \tag{1}
\]

We consider two cases regarding the sum 
Σ_{v∈V} p⁰(v)·p¹(v): If the sum is (relatively)
“small,” we show that V can be 2-partitioned
so that there are relatively few edges between
vertices that are placed in the same part, which
implies that G is close to being bipartite. Other-
wise (i.e., when the sum is not “small”), we show
that with significant probability, when Step (2) is 
started at vertex s, it is completed by rejecting G. 
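
For small graphs, the quantities p⁰(v) and p¹(v) can be computed exactly by propagating the distribution of the modified (lazy) walk one step at a time. The sketch below is only meant to make the definitions concrete; the function name and the `neighbors(u)` callback are assumptions of the example, and the computation takes time linear in the graph size per step, so it is not part of any sublinear-time tester.

```python
def parity_walk_distribution(n, neighbors, d, s, length):
    """Return (p0, p1) for the lazy walk of `length` steps started at s.

    p0[v] (resp. p1[v]) is the probability of being at v after having made
    an even (resp. odd) number of real, non-self-loop steps. Each neighbor
    is chosen with probability 1/(2d); otherwise the walk stays put.
    """
    p0 = [0.0] * n
    p1 = [0.0] * n
    p0[s] = 1.0
    for _ in range(length):
        q0 = [0.0] * n
        q1 = [0.0] * n
        for u in range(n):
            nbrs = neighbors(u)
            stay = 1.0 - len(nbrs) / (2 * d)
            q0[u] += stay * p0[u]          # self-loop keeps the parity
            q1[u] += stay * p1[u]
            for v in nbrs:
                q1[v] += p0[u] / (2 * d)   # a real step flips the parity
                q0[v] += p1[u] / (2 * d)
        p0, p1 = q0, q1
    return p0, p1


def parity_collision_sum(p0, p1):
    """The sum over v of p0(v) * p1(v) considered in the analysis."""
    return sum(x * y for x, y in zip(p0, p1))
```

On a bipartite graph every vertex is reachable with only one parity, so the sum is exactly zero; on a triangle it becomes positive after a few steps.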

In general, the input graph may not be “rapidly 
mixing,” and so the actual analysis, which
appears in [6], is far more complex. Another 
layer of complexity is added when we move 
from the case of constant degree bound (i.e., d) 
to the case where the vertex degrees may vary 
significantly; see [9]. 


Applications 


The foregoing algorithm can be used to find 
odd-length cycles (of polylogarithmic length) in 
graphs that are far from lacking such cycles. In 
general, any one-sided error tester for a property 
P finds subgraphs that are inconsistent with the 
property when invoked on a graph that is far 
from having property P. Thus, the fact that the
bipartite tester finds odd cycles (when invoked
on graphs that are far from lacking such cycles)
follows directly from its definition, but the fact
that these cycles are short is a feature of the 
specific tester presented above. 


Cross-References


Comments for the Recommended
Reading 


The current entry falls within the scope of prop- 
erty testing (see [4,5,11,12]). A general definition 
of this setting was first put forward by Rubin- 
feld and Sudan [13]. This definition was further 
generalized and systematically investigated by 
Goldreich, Goldwasser, and Ron [8], who fo- 
cused on testing graph properties in the dense- 
graph model (see this Encyclopedia’s entry cited 
above). The model considered in this entry was 
suggested by Kaufman, Krivelevich, and Ron [9]. 
It generalizes a model proposed by Parnas and 
Ron [10], which in turn generalizes the bounded- 
degree model of Goldreich and Ron [7]. The 
latter paper focuses on testers of complexity that 
only depends on the proximity parameter (i.e., 
independent of the size of the graph). Among 
these testers is a two-sided error tester of cycle- 
freeness; a one-sided error tester for this problem 
is presented in [3], but its complexity depends on 
the size of the graph (where this dependence is 
unavoidable). 

As already noted, the bipartiteness tester for 
the bounded-degree model is due to Goldreich 
and Ron [6], and it was extended to the general 
model by Kaufman, Krivelevich, and Ron [9]. 
In contrast to these results, Bogdanov, Obata, 
and Trevisan [2] proved that 3-colorability cannot 
be tested with sublinear query complexity, even 
in the bounded-degree model. The problem of 
testing colorability of general graphs was further 
studied by Ben-Eliezer et al. [1].


Problem Definition


Suppose we would like to check whether a given 
array of real numbers is sorted (say, in non- 
decreasing order). Performing this task exactly 
requires reading the entire array. Here we con- 
sider the approximate version of the problem: 
testing whether an array is sorted or “far” from 
sorted. We consider two natural definitions of the 
distance of a given array from a sorted array. 
Intuitively, we would like to measure how much 
the input array must change to become sorted. We 
could measure the change by: 


1. The number of entries changed 
2. The sum of the absolute values of changes in 
all entries 


It is not hard to see that looking at the number of 
entries that must be deleted in an array to make it 
sorted is equivalent to the measure in item 1. 

To define the two distance measures formally, 
let a = (a₁,…,aₙ) be the input array and S
be the set of all sorted arrays of length n. We
denote by [n] the set {1,2,…,n}. The Ham-
ming distance from a to S, denoted dist(a, S),
is min_{b∈S} |{i ∈ [n] : aᵢ ≠ bᵢ}|. The L₁
distance from a to S, denoted dist₁(a, S), is
min_{b∈S} Σ_{i∈[n]} |aᵢ − bᵢ|. Given a parameter ε ∈
(0, 1), an array is ε-far from sorted with respect
to the Hamming distance or, respectively, L₁
distance, if the corresponding distance from a to
S is at least εn.
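
The Hamming distance to sortedness can be computed exactly when the whole array is read. By the deletion remark above, the entries that may stay unchanged form a longest non-decreasing subsequence, so dist(a, S) equals n minus its length. The following sketch uses the standard O(n log n) patience-style method; the function names are illustrative and not taken from the cited works.

```python
from bisect import bisect_right


def hamming_distance_to_sorted(a):
    """Return n minus the length of a longest non-decreasing subsequence of a."""
    # tails[k] is the smallest possible last value of a non-decreasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in a:
        pos = bisect_right(tails, x)  # bisect_right allows equal values to extend
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(a) - len(tails)


def is_eps_far_hamming(a, eps):
    """True iff a is eps-far from sorted with respect to the Hamming distance."""
    return hamming_distance_to_sorted(a) >= eps * len(a)
```

For example, `hamming_distance_to_sorted([1, 5, 2, 3, 4])` is 1, since changing the single entry 5 sorts the array. Such an exact computation is useful for validating testers on small inputs, whereas the testers themselves read only a few entries.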

A tester for sortedness is a randomized algo-
rithm that is given parameters ε ∈ (0,1) and n
and direct access to an input array a. It is required
to accept with probability at least 2/3 if the array
is sorted and reject with probability at least 2/3
if the array is ε-far from sorted. We consider two
types of testers, Hamming and L₁, corresponding
to the two distance measures we defined. The
query complexity of a tester is the number of
array entries it reads. The goal is to design testers 
for sortedness with the smallest possible query 
complexity and running time. 

There are two special cases of testers we will 
discuss. A tester is nonadaptive if it makes all 
queries in advance, before receiving any query 
answers. A tester has 1-sided error if it always 
accepts all sorted arrays. 


Bibliographical Notes 

The Hamming testers for sortedness were first 
studied by Ergün et al. [7]. The L₁-testers (and,
more generally, L_p-testers, which use the L_p
distance for some p ≥ 1) were introduced by
Berman, Raskhodnikova, and Yaroslavtsev [2].
The two distance measures we discussed, dist and
dist₁, are identical for arrays with 0/1 entries,
which we call Boolean arrays. The L₁-tester in
[2] builds on the sortedness tester for Boolean
arrays by Dodis et al. [6].

Observe that an array (a₁,a₂,…,aₙ) of real
numbers can be represented by a function f :
[n] → ℝ defined by f(i) = aᵢ for all i ∈ [n].
The formulated problem is equivalent to testing
if a function f over an ordered finite domain is
monotone. In fact, the L₁-tester we will discuss
can be easily adapted to work for functions over
infinite domains (specifically, bounded intervals),
because its complexity is independent of the
domain size. The problem of Hamming testing
monotonicity of functions over domain [n]^d was
first investigated by Goldreich et al. [11]; general 
partially ordered domains were studied by Fis- 
cher et al. [10]. These problems are discussed in 
the encyclopedia entry “Monotonicity Testing.” 


Key Results 


Ergün et al. [7] designed two Hamming testers
for sortedness that run in time O((log n)/ε). Later,
Bhattacharyya et al. [3] and Chakrabarty and
Seshadhri [5] gave different testers with the same
complexity, with additional features that made
them useful as subroutines in testing monotonic-
ity of high-dimensional functions. Fischer [9]
proved that the running time of these testers is op-
timal. Berman, Raskhodnikova, and Yaroslavtsev
[2] gave an L₁-tester for sortedness with running
time O(1/ε), which is also optimal.

Here we present two Hamming testers from
[3, 7] and the L₁-tester from [2].


Hamming Testers for Sortedness 


A Tester Based on Binary Search [7] 

We present and analyze the first tester for sort-
edness (Algorithm 1) with the assumption that all
entries in the array a are distinct. This assumption
can be removed by treating element aᵢ as (aᵢ, i)
for all i ∈ [n].


Algorithm 1: Hamming Tester for Sortedness Based on Binary Search
input: parameters n and ε; direct access to array a.

1 repeat ⌈(ln 3)/ε⌉ times:
2     pick i ∈ [n] uniformly at random;
3     perform a binary search for the value aᵢ in the array a;
4     if aᵢ is not located by the binary search,   // it leads to another position
5         reject;
6 accept
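
Algorithm 1 can be sketched in Python as follows. The number of repetitions, ⌈(ln 3)/ε⌉, is the one used in the analysis of this tester; the function names and the 0-based indexing are choices made for this example. Ties are broken by comparing (value, index) pairs, following the remark on non-distinct entries.

```python
import math
import random


def binary_search_finds(a, i):
    """Binary-search for a[i] as if a were sorted; True iff the search ends at i."""
    key = (a[i], i)  # (value, index) pairs make all keys distinct
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probe = (a[mid], mid)
        if probe == key:
            return True
        if probe < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False  # the search was led to another position


def binary_search_tester(a, eps, rng=random):
    """Hamming tester for sortedness with one-sided error."""
    if not a:
        return True
    for _ in range(math.ceil(math.log(3) / eps)):
        i = rng.randrange(len(a))
        if not binary_search_finds(a, i):
            return False
    return True
```

Each repetition reads O(log n) entries, so the sketch makes O((log n)/ε) queries in total.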


Analysis of the First Tester 

The tester always accepts all sorted arrays. Now 
consider an array that is €-far from sorted (in 
Hamming distance). We say that a position i ∈
[n] is searchable if aᵢ can be found by a binary
search in Step 3 and not searchable otherwise.
If positions i and j such that i < j are both
searchable, then aᵢ < aⱼ, because both aᵢ and
aⱼ are in the correct position with respect to their
common ancestor in the binary search tree. Thus,
all numbers in searchable positions are sorted.
Since the array is ε-far from sorted, at least εn
positions must be unsearchable. If the tester picks
an unsearchable position in Step 2, it rejects. The
probability that it happens in one trial is at least
ε. Therefore, the probability that it fails to happen
in ⌈(ln 3)/ε⌉ trials is at most

\[
(1-\varepsilon)^{\lceil (\ln 3)/\varepsilon \rceil} \le e^{-\varepsilon \cdot (\ln 3)/\varepsilon} = \frac{1}{3}. \tag{1}
\]

Thus, the tester rejects an array that is ε-far from
sorted with probability at least 2/3.


A Tester Based on Graph Spanners [3] 
The next tester we discuss is based on graph 
spanners. We can represent the requirement that 
the array is sorted as a directed graph G, where 
nodes are positions in [n], and there is an edge
(i, j) for all i < j. That is, an edge (i, j)
represents that aᵢ ≤ aⱼ. A 2-spanner of G is a
subgraph H of G with vertex set [n] such that for
every edge (i, j) in G, there is a path of length
at most 2 from i to j in H. It is not hard to
construct a 2-spanner of G with at most n log n
edges [3, 12] (e.g., it can be done using divide-
and-conquer as follows: connect all nodes to the
one in the middle, orienting the edges towards
the nodes with larger indices; remove the middle
node; and recurse on the two resulting sublists).
The tester simply repeats the following step
Θ((log n)/ε) times (⌈(2 ln 3)(log n)/ε⌉ repetitions
suffice by the analysis below): pick a uniformly random
edge (i, j) of the 2-spanner H, and reject if this
edge is violated, namely, if aᵢ > aⱼ. If the tester
does not find a violated edge, it accepts.
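
The divide-and-conquer construction and the edge-sampling tester can be sketched as follows. Two caveats apply. First, the repetition count ⌈(2 ln 3)(log₂ n)/ε⌉ is an assumption derived from the bounds in the analysis (at most n log n spanner edges, at least εn/2 of them violated), not a value quoted from [3]. Second, for clarity the sketch materializes the whole spanner, which takes more than linear time; an efficient implementation would sample a spanner edge without building the edge list.

```python
import math
import random


def two_spanner_edges(lo, hi, edges=None):
    """Edges (i, j), i < j, of a 2-spanner on positions lo..hi (inclusive)."""
    if edges is None:
        edges = []
    if lo >= hi:
        return edges
    mid = (lo + hi) // 2
    for i in range(lo, mid):
        edges.append((i, mid))        # smaller index -> middle node
    for j in range(mid + 1, hi + 1):
        edges.append((mid, j))        # middle node -> larger index
    two_spanner_edges(lo, mid - 1, edges)
    two_spanner_edges(mid + 1, hi, edges)
    return edges


def spanner_tester(a, eps, rng=random):
    """Reject iff a sampled spanner edge (i, j) has a[i] > a[j]."""
    n = len(a)
    edges = two_spanner_edges(0, n - 1)
    if not edges:
        return True
    trials = math.ceil(2 * math.log(3) * max(1.0, math.log2(n)) / eps)
    for _ in range(trials):
        i, j = rng.choice(edges)
        if a[i] > a[j]:
            return False
    return True
```

Any pair i < j either shares a middle node at some recursion level, which gives a path of length at most 2, or falls into the same sublist and is handled recursively.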


Analysis of the Second Tester 

If the input array is sorted, it does not have any 
violated edges, and the tester always accepts. 
Now consider an array that is ε-far from sorted
(in Hamming distance). We call a position i ∈ [n]
bad if node i is an endpoint of a violated edge in
the 2-spanner H; otherwise, i is good. Note that
any two good positions i, j such that i < j are
connected by a path of length at most 2 of non-
violated edges in H. If this path is (i, j), it im-
plies that aᵢ ≤ aⱼ. If this path is (i, k, j) for some
node k, it implies that aᵢ ≤ aₖ ≤ aⱼ. Conse-
quently, for any two good positions i, j such that
i < j, the numbers aᵢ and aⱼ are in the correct
order. That is, all numbers in good positions are
sorted. As in the analysis of Algorithm 1, we can
conclude that there are at least εn bad positions.
But each bad position is adjacent to a violated
edge. Each violated edge can contribute at most
two new bad positions. Thus, there are at least
εn/2 violated edges. By a simple calculation
similar to (1), the second algorithm rejects an
array that is ε-far from sorted with probability at
least 2/3.


L₁-Tester for Sortedness

The L₁-tester for sortedness [2] requires only a
uniform sample from the input (as opposed to the
ability to query an arbitrary position). It picks
⌈(2 ln 6)/ε⌉ positions uniformly and independently
at random and accepts iff the numbers in these
positions are sorted.
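
A minimal sketch of this sampling tester is given below. The sample size ⌈(2 ln 6)/ε⌉ is an assumption chosen so that each of two index sets of size εn/2 is missed with probability at most 1/6, and the function name is illustrative. For simplicity the sketch orders the sampled positions with Python's built-in comparison sort instead of a bucket sort, so its running time carries an extra logarithmic factor in 1/ε.

```python
import math
import random


def l1_sortedness_tester(a, eps, rng=random):
    """Accept iff the values at uniformly sampled positions are non-decreasing."""
    if not a:
        return True
    sample_size = math.ceil(2 * math.log(6) / eps)  # assumed constant, see lead-in
    positions = sorted(rng.randrange(len(a)) for _ in range(sample_size))
    values = [a[p] for p in positions]
    return all(x <= y for x, y in zip(values, values[1:]))
```

The tester is nonadaptive, because the positions are chosen before any entry is read, and it never rejects a sorted array.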

The main ingredient in the analysis of the 
tester is a reduction to the case of Boolean arrays. 
It states that if the tester is nonadaptive and has 
1-sided error, it suffices to show that it works 
for Boolean arrays. We omit the proof of the 
reduction. 

Clearly, the L₁-tester is nonadaptive and
always accepts sorted arrays. Now consider a
Boolean array a which is ε-far from sorted. It
remains to show that it is rejected with probability
at least 2/3. Let X₀ be the set of the εn/2 largest
indices i for which aᵢ = 0. Similarly, let X₁ be
the set of the εn/2 smallest indices i for which
aᵢ = 1. It is easy to show that i < j for all
i ∈ X₁ and j ∈ X₀, because a is ε-far from
sorted. The L₁-tester samples no index from X₀
with probability at most 1/6. The same holds for
X₁. Thus, by a union bound, with probability at
least 2/3, it samples an index from X₀ and an
index from X₁ and detects a violation.


Running time 

We explained why the algorithm that samples 
⌈(2 ln 6)/ε⌉ positions uniformly and independently at
random is an L₁-tester for sortedness. Now we
analyze its running time for the case of general
arrays. The L₁-tester makes O(1/ε) queries. To
determine whether the elements in these positions 
are sorted, the tester can use bucket sort to sort the 
sampled positions and then simply check if the 
sequence of queried elements is nondecreasing. 
Since the positions are sampled uniformly at ran- 
dom, the bucket sort can be implemented to run in 
expected time O(1/e), where the expectation is 
taken over the choice of the samples. By standard 
methods, the algorithm can be modified to run in 
O(1/e) time in the worst case. Observe that the 
running time does not depend on the length of the 
input. This is impossible for Hamming testers for 
sortedness, which, as we mentioned, must query 
Ω(log n) positions [9].


Applications


Testers for sortedness are used as subroutines 
in other property testers, e.g., for monotonicity 
of high-dimensional functions [2, 5, 6] and for
the property that given points represent ordered 
vertices of a convex polygon [7]. They are also 
used to construct fast approximate probabilisti- 
cally checkable proofs for different optimization 
problems [8]. Ben-Moshe et al. [1] employed 
sortedness testers (with additional features) to 
speed up query evaluation in databases. 


Open Problem 


Consider the case when all numbers in the input 
array lie in some specified small set such as [r] 
for some integer r. As we discussed, for Boolean
arrays, testing sortedness can be done in O(1/ε)
time [2, 6]. It is not hard to see that for larger
ranges, it can be done in O(r/ε) time. When r <
n, can one test sortedness in time polylogarithmic
in r? Is O((log r)/ε) running time achievable?

Fischer’s lower bound for testing sorted-
ness [9] applies only to n ≤ r. The best
known lower bound that takes into account both
parameters is Ω(min(log r, log n)), due to [4],
but it applies only to nonadaptive testers.


Cross-References 


Monotonicity Testing 


Acknowledgments The author was supported in part by 
NSF CAREER award CCF-0845701 and Boston Uni- 
versity’s Hariri Institute for Computing and Center for 
Reliable Information Systems and Cyber Security. 


Recommended Reading 


1. Ben-Moshe S, Kanza Y, Fischer E, Matsliah
A, Fischer M, Staelin C (2011) Detecting
and exploiting near-sortedness for efficient
relational query evaluation. In: ICDT, Uppsala,
pp 256-267

2. Berman P, Raskhodnikova S, Yaroslavtsev G (2014)
L_p-testing. In: Shmoys DB (ed) STOC, New York.
ACM, pp 164-173

3. Bhattacharyya A, Grigorescu E, Jung K, Raskhod-
nikova S, Woodruff DP (2012) Transitive-closure
spanners. SIAM J Comput 41(6):1380-1425

4. Blais E, Raskhodnikova S, Yaroslavtsev G (2014) 
Lower bounds for testing properties of functions over 
hypergrid domains. In: IEEE 29th conference on 
computational complexity (CCC) 2014, Vancouver, 
11-13 June 2014, pp 309-320 

5. Chakrabarty D, Seshadhri C (2013) Optimal bounds 
for monotonicity and Lipschitz testing over hyper- 
cubes and hypergrids. In: STOC, Palo Alto, pp 419- 
428 

6. Dodis Y, Goldreich O, Lehman E, Raskhodnikova S, 
Ron D, Samorodnitsky A (1999) Improved testing al- 
gorithms for monotonicity. In: RANDOM, Berkeley, 
pp 97-108 

7. Ergün F, Kannan S, Kumar R, Rubinfeld R,
Viswanathan M (2000) Spot-checkers. J Comput Syst 
Sci 60(3):717-751 

8. Ergiin F, Kumar R, Rubinfeld R (2004) Fast approx- 
imate probabilistically checkable proofs. Inf Comput 
189(2):135-159 

9. Fischer E (2004) On the strength of comparisons in 
property testing. Inf Comput 189(1):107-116 

10. Fischer E, Lehman E, Newman I, Raskhodnikova S, 
Rubinfeld R, Samorodnitsky A (2002) Monotonicity 
testing over general poset domains. In: STOC, Mon- 
treal, pp 474-483 

11. Goldreich O, Goldwasser S, Lehman E, Ron D,
Samorodnitsky A (2000) Testing monotonicity.
Combinatorica 20(3):301-337

12. Raskhodnikova S (2010) Transitive-closure spanners: 
a survey. In: Goldreich O (ed) Property testing. Lec- 
ture notes in computer science, vol 6390. Springer, 
Berlin, pp 167-196 


Testing Juntas and Related 
Properties of Boolean Functions 


Eric Blais 
University of Waterloo, Waterloo, ON, Canada 


Keywords 


Dimension reduction; Juntas; Property testing; 
Sublinear-time algorithms 


Years and Authors of Summarized 
Original Work 


2004; Fischer, Kindler, Ron, Safra, Samorodnitsky
2009; Blais




Problem Definition 


Fix positive integers n and k with $n \ge k$. The
function $f : \{0,1\}^n \to \{0,1\}$ is a k-junta if it
depends on at most k of the input coordinates.
Formally, f is a k-junta if there exists a set
$J \subseteq \{1,2,\ldots,n\}$ of size $|J| \le k$ such that
for all inputs $x, y \in \{0,1\}^n$ that satisfy $x_i = y_i$
for each $i \in J$, we have $f(x) = f(y)$.
Juntas play an important role in different areas 
of computer science. In machine learning, juntas 
provide an elegant framework for studying the 
problem of learning with datasets that contain 
many irrelevant attributes [9, 10]. In the analysis 
of Boolean functions, they essentially capture the 
set of functions of low complexity under natural 
measures such as total influence [19] and noise 
sensitivity [12]. 
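
The formal definition can be checked directly on small examples. The following is a minimal brute-force sketch, not a testing algorithm: it reads all $2^n$ inputs and collects the coordinates whose flip changes the value of f on some input. The function names `relevant_coordinates` and `is_k_junta` are illustrative assumptions, and coordinates are 0-indexed here.

```python
from itertools import product

def relevant_coordinates(f, n):
    """Return the set of (0-indexed) coordinates on which f depends."""
    relevant = set()
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            if i in relevant:
                continue
            # Flip coordinate i; if the value changes, i is relevant.
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(y) != fx:
                relevant.add(i)
    return relevant

def is_k_junta(f, n, k):
    """f is a k-junta exactly when it has at most k relevant coordinates."""
    return len(relevant_coordinates(f, n)) <= k

# Example: a function on 5 bits that depends only on coordinates 0 and 3.
example = lambda x: x[0] ^ x[3]
print(relevant_coordinates(example, 5), is_k_junta(example, 5, 2))
```

The check is exact because the smallest valid set J is the set of relevant coordinates: two inputs that agree on every relevant coordinate can be transformed into each other by flipping irrelevant coordinates one at a time.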

How efficiently can we distinguish k-juntas 
from functions that are far from being k-juntas? 
We can formalize this question in the setting 
of property testing. Define the distance between 
two functions $f, g : \{0,1\}^n \to \{0,1\}$ to be
the fraction of inputs on which f and g take
different values: $\mathrm{dist}(f,g) := \frac{1}{2^n} |\{x \in \{0,1\}^n : f(x) \ne g(x)\}|$.
When $\mathrm{dist}(f,g) \ge \epsilon$ for every
k-junta g, we say that f is $\epsilon$-far from being
a k-junta; otherwise we say that f is $\epsilon$-close
to being a k-junta. An $\epsilon$-test for k-juntas is a
randomized algorithm that queries the value of
$f : \{0,1\}^n \to \{0,1\}$ on some of its inputs and
then, with probability at least 2/3,


1. accepts if f is a k-junta, and
2. rejects if f is $\epsilon$-far from being a k-junta.


(The algorithm is free to output anything when f
is not a k-junta but is $\epsilon$-close to being a k-junta.)
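
To make the distance notion concrete, here is a small exhaustive sketch that computes dist(f, g) and the distance from f to its closest k-junta. It is exponential in n and is meant only for tiny examples. The helper names are assumptions for illustration. For a fixed set J of k coordinates, the closest junta on J takes the majority value of f within each restriction to J.

```python
from itertools import combinations, product
from collections import Counter

def dist(f, g, n):
    """Fraction of inputs in {0,1}^n on which f and g differ."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def distance_to_k_juntas(f, n, k):
    """Minimum of dist(f, g) over all k-juntas g (brute force)."""
    points = list(product((0, 1), repeat=n))
    best = 1.0
    for J in combinations(range(n), k):
        ones = Counter()    # inputs with f = 1, grouped by their restriction to J
        total = Counter()   # all inputs, grouped the same way
        for x in points:
            key = tuple(x[i] for i in J)
            total[key] += 1
            ones[key] += f(x)
        # The best junta on J errs on the minority value of each group.
        errors = sum(min(ones[key], total[key] - ones[key]) for key in total)
        best = min(best, errors / len(points))
    return best

majority3 = lambda x: int(sum(x) >= 2)
print(distance_to_k_juntas(majority3, 3, 1))
```

In this example, the majority function on three bits is at distance 0.25 from the nearest 1-junta (any dictator function), so it is $\epsilon$-far from 1-juntas for every $\epsilon \le 0.25$.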


Problem 1 What is the minimum number of
queries to $f : \{0,1\}^n \to \{0,1\}$ required to $\epsilon$-test
if f is a k-junta?


Key Results 


Testing 1-Juntas 

One important class of functions related to junta
testing is dictator functions: the functions
$f : \{0,1\}^n \to \{0,1\}$ of the form $f(x) = x_i$
for some $i \in [n]$. Bellare, Goldreich, and Sudan [3],
in a work that was stated in terms of
testing the long code and was part of their analysis
of probabilistically checkable proofs (PCPs),
showed that dictator functions can be $\epsilon$-tested
with $O(1/\epsilon)$ queries. (See the Locally Testable
Codes entry for more details.) This result was
later extended by Parnas, Ron, and Samorodnitsky [21].
The class of 1-juntas includes dictator
functions, their negations (known as anti-dictator
functions), and the constant functions; using the
algorithms in [3, 21], we can test 1-juntas with
$O(1/\epsilon)$ queries.


Testing k-Juntas 

The first result on testing k-juntas for values
$k > 1$ followed from related work on the
problem of learning juntas. Blum, Hellerstein,
and Littlestone [11] introduced an algorithm
that queries a k-junta $f : \{0,1\}^n \to \{0,1\}$
on $O(k \log n + k/\epsilon + 2^k)$ inputs and with
probability at least 2/3 returns a k-junta
$h : \{0,1\}^n \to \{0,1\}$ such that $\mathrm{dist}(f,h) \le \epsilon$.
Shortly afterward, Goldreich, Goldwasser, and
Ron [20] gave a general reduction showing
that a proper learning algorithm with query
complexity q for a class C of functions can
be used to $\epsilon$-test the class C with $q + O(1/\epsilon)$
queries. This result, combined with the
Blum-Hellerstein-Littlestone algorithm, shows that
k-juntas can be tested with $O(k \log n + 2^k + 1/\epsilon)$
queries.

Fischer, Kindler, Ron, Safra, and Samorodnitsky [18]
showed that, remarkably, it is possible to
test k-juntas with a number of queries that is
independent of n. Specifically, they introduced $\epsilon$-tests
for k-juntas with query complexity $O(k^2/\epsilon)$.
This result was sharpened in [4, 5], leading to the
following theorem.


Theorem 1 ([5]) It is possible to $\epsilon$-test if
$f : \{0,1\}^n \to \{0,1\}$ is a k-junta with
$O(k \log k + k/\epsilon)$ queries.


Chockler and Gutfreund [16] showed that
$\Omega(k)$ queries are required to test k-juntas, so
the bound in Theorem 1 is nearly optimal. (See
also [4, 7, 13] for related lower bounds.)

Theorem 1 can be generalized to apply to the
setting where $X_1, \ldots, X_n$, and $Y$ are arbitrary
finite sets, and we wish to test whether a function
$f : X_1 \times \cdots \times X_n \to Y$ is a k-junta. Interestingly,
the query complexity of the k-junta test remains
unchanged in this general setting as well. See [5]
for the details.


Junta-Testing Algorithm

The proof of Theorem 1 contains two main ingredients.

The first ingredient is a simple modification
of the Blum-Hellerstein-Littlestone learning
algorithm. The original learning algorithm
proceeds in two stages: first, the algorithm learns
the k relevant coordinates of the junta; then,
it queries f for all $2^k$ different values of the
k relevant coordinates. When we test k-juntas,
the second stage is unnecessary and can be
replaced with a simpler test that checks whether
the (at most) k relevant coordinates that have
been identified completely determine the value
of f or not. With this modification, we obtain
an $\epsilon$-test for k-juntas with query complexity
$O(k \log n + k/\epsilon)$. Note that this result already
yields the desired bound in Theorem 1 when
n = poly(k).
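
The $\log n$ factor in this bound comes from locating each relevant coordinate by binary search. The sketch below illustrates that step under stated assumptions: it starts from two inputs x and y with $f(x) \ne f(y)$ and searches over hybrid inputs that interpolate between them. The function name and the query-counting wrapper are hypothetical, and the sketch is an illustration of the idea rather than the exact procedure from [11].

```python
def find_relevant_coordinate(f, x, y):
    """Given f(x) != f(y), return an index i with x[i] != y[i] on which f depends.

    Uses 2 + ceil(log2(d)) queries, where d is the number of coordinates
    on which x and y differ.
    """
    fx = f(x)
    if f(y) == fx:
        raise ValueError("x and y must have different values under f")
    diff = [i for i in range(len(x)) if x[i] != y[i]]

    def hybrid(t):
        # Agrees with y on the first t differing coordinates and with x elsewhere.
        z = list(x)
        for i in diff[:t]:
            z[i] = y[i]
        return tuple(z)

    # Invariant: f(hybrid(lo)) == fx and f(hybrid(hi)) != fx.
    lo, hi = 0, len(diff)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(hybrid(mid)) == fx:
            lo = mid
        else:
            hi = mid
    # hybrid(lo) and hybrid(lo + 1) differ only in coordinate diff[lo].
    return diff[lo]

queries = []
def and_of_2_and_5(z):
    queries.append(z)
    return z[2] & z[5]

found = find_relevant_coordinate(and_of_2_and_5, (0,) * 8, (1,) * 8)
print(found, len(queries))
```

Repeating this search, while keeping the coordinates found so far fixed, identifies the relevant coordinates one at a time.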

The second ingredient in the proof of Theorem 1
is a dimension reduction argument. Consider
a random partition of the n coordinates
into m = poly(k) parts $S_1, \ldots, S_m$. A function
$f : \{0,1\}^n \to \{0,1\}$ is isomorphic to a
function $f' : X_1 \times \cdots \times X_m \to \{0,1\}$ where
$X_i = \{0,1\}^{|S_i|}$. The function $f'$ is defined over
a domain with much smaller dimension, and it
satisfies two useful properties. First, when f is
a k-junta, then so is $f'$. Second, when f is
$\epsilon$-far from k-juntas and $m = \Omega(k^2)$, then with
high probability $f'$ is $\frac{\epsilon}{2}$-far from k-juntas as
well. The second fact is far from obvious. It
was established in [5] using Fourier analysis and
in [8] using a combinatorial argument. These
two properties let us complete the algorithm for
testing k-juntas by applying the modified
Blum-Hellerstein-Littlestone algorithm on the function
$f'$. More details on the algorithm itself can be
found in the original papers [5, 18] and the
survey [6].
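
The two ingredients can be combined into a short sketch. What follows is a simplified illustration, not the algorithm of [5] with its exact parameters: the number of parts `m` and the number of `rounds` are left as caller-supplied assumptions, and the function name is hypothetical. Each round draws a random pair of inputs that agree on the parts already found. When f distinguishes the pair, a binary search over parts locates a new part containing a relevant coordinate.

```python
import random

def junta_test_sketch(f, n, k, m, rounds, rng):
    """Return True (accept) or False (reject). Simplified and illustrative only."""
    part_of = [rng.randrange(m) for _ in range(n)]   # random partition into m parts
    found = []                                       # parts known to hold a relevant coordinate
    for _ in range(rounds):
        free = [p for p in range(m) if p not in found]
        x = tuple(rng.randint(0, 1) for _ in range(n))
        # y agrees with x on the found parts and is freshly random elsewhere.
        y = tuple(rng.randint(0, 1) if part_of[i] in free else x[i]
                  for i in range(n))
        fx = f(x)
        if f(y) == fx:
            continue
        # Binary search over hybrids: the first `mid` free parts are taken from y.
        lo, hi = 0, len(free)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            chosen = free[:mid]
            z = tuple(y[i] if part_of[i] in chosen else x[i] for i in range(n))
            if f(z) == fx:
                lo = mid
            else:
                hi = mid
        found.append(free[lo])       # this part contains a relevant coordinate
        if len(found) > k:
            return False             # more than k relevant parts: not a k-junta
    return True

two_junta = lambda x: x[3] ^ x[11]
parity = lambda x: sum(x) % 2
print(junta_test_sketch(two_junta, 20, 2, 16, 60, random.Random(0)))
```

A k-junta has relevant coordinates in at most k parts, so this sketch never rejects a k-junta. How large `m` and `rounds` must be to reject $\epsilon$-far functions with probability 2/3 is exactly what the analysis in [5] addresses.
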


Applications


Feature Selection 

Feature selection is the general machine learning
task of identifying the features (also known as
attributes or variables) in a dataset that suffice
to describe the model being studied. This task
is formalized within the junta framework as
follows: given a function $f : \{0,1\}^n \to \{0,1\}$, the
algorithm seeks to identify a set $J \subseteq [n]$ of size
$|J| = k$ where (i) k is as small as possible, and
(ii) there is a k-junta $h : \{0,1\}^n \to \{0,1\}$ on the
set J that is close to f.

The junta testing algorithm can be used to 
approximate the minimal value of k for which 
these two conditions can be satisfied. For exam- 
ple, by executing the junta testing algorithm with 
k = 1,2,4,8,... until it accepts, we obtain the 
following estimation result. 


Corollary 1 There is an algorithm that, given
query access to $f : \{0,1\}^n \to \{0,1\}$, outputs
an estimate k such that f is $\epsilon$-close to a k-junta
and such that f is not an $\ell$-junta for any
$\ell < k/2$. Furthermore, this algorithm makes
$O(k \log k + k/\epsilon)$ queries to f.
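
The doubling strategy behind Corollary 1 can be sketched as follows. The `tester` argument is an assumed callable that runs a junta test for a given k and returns True on acceptance. In practice it would be a randomized $\epsilon$-test whose success probability has been amplified by repetition, and that detail is omitted here. The cap at n reflects the fact that every function on n bits is an n-junta.

```python
def estimate_junta_size(tester, n):
    """Run the tester with k = 1, 2, 4, 8, ... and return the first accepted k."""
    k = 1
    while k < n and not tester(k):
        k *= 2
    return min(k, n)

# Stand-in tester for illustration: an idealized test for a function
# that depends on exactly 5 coordinates.
ideal_tester = lambda k: 5 <= k
print(estimate_junta_size(ideal_tester, 32))
```

With an idealized tester the estimate is the smallest power of two that is at least the true junta size, so the function is not an $\ell$-junta for any $\ell$ below half of the estimate.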


Testing by Implicit Learning 

Let C be any class (i.e., family) of Boolean
functions where every function in C is close
to being a k-junta. Many natural classes of
Boolean functions that have been studied in
learning theory and computational complexity fall
into this framework. For example, functions with
bounded decision tree complexity, DNF
complexity, circuit complexity, and sparse polynomial
representation all satisfy this condition.
Diakonikolas et al. [17] gave a general result
showing that for each of these classes C, we
can $\epsilon$-test the property of being in the class C
efficiently. This result has since been sharpened
by Chakraborty et al. [14], yielding the following
bounds.


Theorem 2 ([14]) Fix $s > 0$ and $\epsilon > 0$. We
can $\epsilon$-test whether $f : \{0,1\}^n \to \{0,1\}$ can be
represented by

1. a DNF with s terms,
2. a size-s Boolean formula,
3. an s-sparse polynomial over $\mathbb{F}_2$, or
4. a decision tree of size s

with $O(s/\epsilon^2 \cdot \mathrm{polylog}(s/\epsilon))$ queries.


The proof of Theorem 2 is remarkable in that
the $\epsilon$-test algorithm in [14, 17] learns the function
$f : \{0,1\}^n \to \{0,1\}$ when f is a k-junta, but
without identifying which of the k coordinates
of f are part of the junta. This technique is
called testing by implicit learning, and it is
obtained by using and building on the junta testing
algorithm.


Testing Function Isomorphism

Two functions $f, g : \{0,1\}^n \to \{0,1\}$ are
isomorphic to each other when they are identical up
to relabeling of the input variables. In the function
isomorphism testing problem, we are given query
access to (an unknown function) f and must
determine whether it is isomorphic to (the known
function) g or whether it is $\epsilon$-far from being so.
How many queries to f do we need to perform 
this task? The answer, it turns out, depends on 
the choice of the function g. The functions g 
for which we can test isomorphism to g with a 
constant number of queries are called efficiently 
isomorphism testable. 

Every symmetric function is efficiently iso- 
morphism testable. Using the junta testing algo- 
rithm, Fischer et al. [18] showed that for any 
constant k > 0, every k-junta is also efficiently 
isomorphism testable. An important open prob- 
lem in property testing is to characterize the 
set of functions that are efficiently isomorphism 
testable. The state of the art on this question
is a recent result, also building on the junta
testing algorithm, showing that every partially
symmetric function is also efficiently isomorphism
testable. A function $f : \{0,1\}^n \to \{0,1\}$
is k-partially symmetric if there is a function
$g : \{0,1\}^k \times \{0,1,2,\ldots,n\} \to \{0,1\}$ and a
mapping $\rho : [k] \to [n]$ such that
$f(x) = g(x_{\rho(1)}, \ldots, x_{\rho(k)}, \|x\|)$, where $\|x\| = \sum_i x_i$ is the
Hamming weight of x.


Theorem 3 ([8, 15]) For every constant $k > 0$,
every k-partially symmetric function is efficiently
isomorphism testable.


Open Problems 


There are two particularly appealing open prob- 
lems related to the junta testing problem that 
are motivated by its application to the feature 
selection problem. 


Distance Approximation 

Theorem 1 shows that we can distinguish
k-juntas from functions that are $\epsilon$-far from k-juntas
with few queries. Can we also approximate the
distance of a function to its closest k-junta with a
small number of queries?


Problem 2 What is the minimum number of
queries to $f : \{0,1\}^n \to \{0,1\}$ required to
approximate the distance of f to its closest
k-junta within an additive error of $\pm\epsilon$, where
$\epsilon \in (0, \frac{1}{2}]$ is a parameter given to the
algorithm?


In some cases, property testing algorithms
can also be used directly for the corresponding
distance approximation problem. This is the
case, for example, for the BLR linearity test in
the Linearity Testing/Testing Hadamard Codes
chapter. But it is currently not known whether
the junta testing algorithms in [18] or [5] can be
extended to yield distance approximators or not.


Testing with Random Samples 

The query model we have discussed throughout
this chapter, where the algorithm is free to query
the target function on any input of its choosing,
is known as the membership query model in
machine learning. In some applications, however,
we must consider weaker query models where we
restrict the queries that the algorithm can make in
some way. Can we also test k-juntas efficiently
in restricted query models?


Problem 3 In which restricted query models can
we test whether $f : \{0,1\}^n \to \{0,1\}$ is a k-junta
with a number of queries that is asymptotically
smaller than the number of queries required to
learn k-juntas in the same settings?


Two examples of restricted query models 
include the passive sampling model (where 
each query is drawn independently at random 
from some fixed distribution) and the active 
query model (where the algorithm can choose 
its queries from a larger set of inputs drawn from 
some distribution). Some initial results on this 
problem can be found in [1, 2]. 


Cross-References 


Linearity Testing/Testing Hadamard Codes 


Recommended Reading 


1. Alon N, Hod R, Weinstein A (2013) On active and
passive testing. arXiv preprint arXiv:1307.7364
2. Balcan MF, Blais E, Blum A, Yang L (2012) Active
property testing. In: IEEE 53rd annual symposium
on foundations of computer science (FOCS'12), New
Brunswick. IEEE, pp 21-30
3. Bellare M, Goldreich O, Sudan M (1998) Free
bits, PCPs, and nonapproximability: towards tight
results. SIAM J Comput 27(3):804-915
4. Blais E (2008) Improved bounds for testing jun- 
tas. In: Goel A, Jansen K, Rolim JDP, Rubinfeld 
R (eds) Approximation, randomization and com- 
binatorial optimization. Algorithms and techniques. 
Springer, Boston, pp 317-330 
5. Blais E (2009) Testing juntas nearly optimally. In: 
Proceedings of the 2009 ACM international sympo- 
sium on theory of computing (STOC’09). ACM, New 
York, pp 151-157 
6. Blais E (2010) Testing juntas: a brief survey. In:
Goldreich O (ed) Property testing: current research
and surveys. Springer, Berlin/Heidelberg, pp 32-40
7. Blais E, Brody J, Matulef K (2012) Property testing 
lower bounds via communication complexity. Com- 
put Complex 21(2):311-358 
8. Blais E, Weinstein A, Yoshida Y (2012) Partially 
symmetric functions are efficiently isomorphism- 
testable. In: IEEE 53rd annual symposium on 
foundations of computer science (FOCS’12), New 
Brunswick, pp 551-560 
9. Blum A (1994) Relevant examples and relevant fea- 
tures: thoughts from computational learning theory. 
In: AAAI fall symposium on ‘Relevance’, New Or- 
leans 
10. Blum A, Langley P (1997) Selection of relevant fea- 
tures and examples in machine learning. Artif Intell 
97(2):245-271 




11. Blum A, Hellerstein L, Littlestone N (1995) Learn- 
ing in the presence of finitely or infinitely many 
irrelevant attributes. J Comput Syst Sci 50(1): 
32-40 

12. Bourgain J (2002) On the distribution of the 
fourier spectrum of boolean functions. Isr J Math 
131(1):269-276 

13. Buhrman H, Garcia-Soriano D, Matsliah A, de Wolf 
R (2013) The non-adaptive query complexity of test- 
ing k-parities. Chic J Theor Comput Sci 2013: Article 
6, 11 

14. Chakraborty S, Garcia-Soriano D, Matsliah A (2011) 
Efficient sample extractors for juntas with appli- 
cations. In: Aceto L, Henzinger M, Sgall J (eds) 
Automata, languages and programming. Springer, 
Zurich, pp 545-556 

15. Chakraborty S, Fischer E, Garcia-Soriano D,
Matsliah A (2012) Junto-symmetric functions,
hypergraph isomorphism, and crunching. In: 2012
IEEE 27th conference on computational complexity
(CCC'12). IEEE Computer Society, Los Alamitos,
pp 148-158

16. Chockler H, Gutfreund D (2004) A lower bound for 
testing juntas. Inf Process Lett 90(6):301-305 

17. Diakonikolas I, Lee HK, Matulef K, Onak K, Ru- 
binfeld R, Servedio RA, Wan A (2007) Testing for 
concise representations. In: 48th annual IEEE sympo- 
sium on foundations of computer science (FOCS’07), 
Providence. IEEE, pp 549-558 

18. Fischer E, Kindler G, Ron D, Safra S, Samorodnit- 
sky A (2004) Testing juntas. J Comput System Sci 
68(4):753-787 

19. Friedgut E (1998) Boolean functions with low aver- 
age sensitivity depend on few coordinates. Combina- 
torica 18(1):27-35 

20. Goldreich O, Goldwasser S, Ron D (1998) Property 
testing and its connection to learning and approxima- 
tion. J ACM 45(4):653-750 

21. Parnas M, Ron D, Samorodnitsky A (2002) Test- 
ing basic boolean formulae. SIAM J Discret Math 
16(1):20-46 


Text Indexing


Srinivas Aluru
Department of Electrical and Computer
Engineering, Iowa State University, Ames, IA,
USA


Keywords 


String indexing 




Years and Authors of Summarized 
Original Work 


1993; Manber, Myers 


Problem Definition 


Text or string data naturally arises in many con- 
texts including document processing, information 
retrieval, natural and computer language pro- 
cessing, and describing molecular sequences. In 
broad terms, the goal of text indexing is to design 
methodologies to store text data so as to signif- 
icantly improve the speed and performance of 
answering queries. While text indexing has been 
studied for a long time, it shot into prominence 
during the last decade due to the ubiquity of web- 
based textual data and search engines to explore 
it, design of digital libraries for archiving human 
knowledge, and application of string techniques 
to further understanding of modern biology. Text 
indexing differs from the typical indexing of keys 
drawn from an underlying total order — text data 
can have varying lengths, and queries are of- 
ten more complex and involve substrings, partial 
matches, or approximate matches. 


Queries on text data are as varied as the di- 
verse array of applications they support. Con- 
sequently, numerous methods for text indexing 
have been developed and this continues to be an 
active area of research. Text indexing methods 
can be classified into two categories: (i) meth- 
ods that are generalizations or adaptations of 
indexing methods developed for an ordered set 
of one-dimensional keys, and (ii) methods that 
are specifically designed for indexing text data. 
The most classic query in text processing is to 
find all occurrences of a pattern P in a given 
text T (or equivalently, in a given collection of 
strings). Important and practically useful variants 
of this problem include finding all occurrences of 
P subject to at most k mismatches, or at most 
k insertions/deletions/mismatches. The focus in 
this entry is on these two basic problems and 
remarks on generalizations of one-dimensional 
data structures to handle text data. 


Key Results


Consider the problem of finding a given pattern P
in text T, both strings over alphabet $\Sigma$. The case
of a collection of strings can be trivially handled
by concatenating the strings using a unique end
of string symbol, not in $\Sigma$, to create text T.
It is worth mentioning the special case where
T is structured, i.e., T consists of a sequence
of words and the pattern P is a word. Consider
a total order of characters in $\Sigma$. A string
(or word) of length k can be viewed as a k-dimensional
key and the order on $\Sigma$ can be naturally
extended to lexicographic order between
multidimensional keys of variable length. Any
one-dimensional search data structure that supports
$O(\log n)$ search time can be used to index
a collection of strings using lexicographic order
such that a string of length k can be searched
in $O(k \log n)$ time. This can be considerably
improved as below [8]:


Theorem 1 Consider a data structure on one- 
dimensional keys that relies on constant-time 
comparisons among keys (e.g., binary search 
trees, red-black trees etc.) and the insertion 
of a key identifies either its predecessor or 
successor. Let O(F(n)) be the search time of 
the data structure storing n keys (e.g., O(logn) 
for red-black trees). The data structure can be 
converted to index n strings using O(n) additional 
space such that the query for a string s can be 
performed in O(F(n)) time if s is one of the 
strings indexed, and in O(F (n) + |s|) otherwise. 


A more practical technique that provides 
O(F(n) + |s|) search time for a string s 
under more restrictions on the underlying one- 
dimensional data structure is given in [9]. The 
technique is nevertheless applicable to several 
classic one-dimensional data structures, in 
particular binary search trees and their balanced
variants. For a collection of strings that share
long common prefixes such as IP addresses and 
XML path strings, a faster search method is 
described in [5]. 

When answering a sequence of queries,
significant savings can be obtained by promoting
frequently searched strings so that they are
among the first to be encountered in a search
path through the indexing data structure. Ciriani
et al. [4] use self-adjusting skip lists to derive
an expected bound for a sequence of queries
that matches the information-theoretic lower
bound.


Theorem 2 A collection of n strings of total
length N can be indexed in optimal $O(N)$ space
so that a sequence of m string queries, say
$s_1, \ldots, s_m$, can be performed in
$O\left(\sum_{j=1}^{m} |s_j| + \sum_{i=1}^{n} n_i \log(m/n_i)\right)$
expected time, where
$n_i$ is the number of times the ith string is
queried.


Notice that the first additive term is a lower bound 
for reading the input, and the second additive 
term is a standard information-theoretic lower 
bound denoting the entropy of the query se- 
quence. Ciriani et al. also extended the approach 
to the external memory model, and to the case 
of dynamic sets of strings. More recently, Ko 
and Aluru developed a self-adjusting tree layout 
for dynamic sets of strings in secondary storage 
that provides optimal number of disk accesses for 
a sequence of string or substring queries, thus 
providing a deterministic algorithm that matches 
the information-theoretic lower bound [4]. 

The next part of this entry deals with some
of the widely used data structures specifically
designed for string data: suffix trees and
suffix arrays. These are particularly suitable
for querying unstructured text data, such as
the genomic sequence of an organism. The
following notation is used: Let $s[i]$ denote the
ith character of string s, $s[i \ldots j]$ denote the
substring $s[i]s[i+1] \ldots s[j]$, and
$s_i = s[i]s[i+1] \ldots s[|s|]$ denote the suffix of s starting at the ith
position. The suffix $s_i$ can be uniquely described
by the integer i. In case of multiple strings, the
suffix of a string can be described by a tuple
consisting of the string number and the starting
position of the suffix within the string. Consider
a collection of strings over $\Sigma$, having total length
n, each extended by adding a unique termination
symbol $\$ \notin \Sigma$. The suffix tree of the strings is
a compacted trie of all suffixes of these extended
strings. The suffix array of the strings is the
lexicographic sorted order of all suffixes of these
extended strings. For convenience, we list '$', the
last suffix of each string, just once. The suffix tree
and suffix array of strings 'apple' and 'maple'
are shown in Fig. 1. Both these data structures
take $O(n)$ space and can be constructed in $O(n)$
time [11, 13], both directly and from each other.
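
As a concrete illustration of the definition, the sketch below builds the suffix array of the extended strings 'apple$' and 'maple$' by plain sorting. This is a naive construction that takes far more than linear time, not one of the $O(n)$ algorithms cited above. The function name and the representation of each suffix as a (text, string number, start position) triple are assumptions for illustration. Positions are 1-based to match the notation in the text, and '$' sorts before the letters under Python's default character order.

```python
def naive_suffix_array(strings, end="$"):
    """Sorted suffixes of the extended strings; the suffix '$' is listed once."""
    suffixes = []
    for number, s in enumerate(strings, start=1):
        extended = s + end
        for pos in range(1, len(extended) + 1):
            suffixes.append((extended[pos - 1:], number, pos))
    suffixes.sort()   # lexicographic order; ties broken by string number
    result = []
    seen_end = False
    for suffix, number, pos in suffixes:
        if suffix == end:
            if seen_end:
                continue          # list the lone terminator only once
            seen_end = True
        result.append((suffix, number, pos))
    return result

for suffix, number, pos in naive_suffix_array(["apple", "maple"]):
    print(number, pos, suffix)
```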

Without loss of generality, consider the
problem of searching for a pattern P as
a substring of a single string T. Assume the
suffix tree ST of T is available. If P occurs in
T starting from position i, then P is a prefix
of suffix $T_i = T[i]T[i+1] \ldots T[|T|]$ in T. It
follows that P matches the path from root to
leaf labeled i in ST. This property results in the
following simple algorithm: Start from the root 
of ST and follow the path matching characters in 
P, until P is completely matched or a mismatch 
occurs. If P is not fully matched, it does not occur 
in T. Otherwise, each leaf in the subtree below 
the matching position gives an occurrence of P. 
The positions can be enumerated by traversing 
the subtree in O(occ) time, where occ denotes 
the number of occurrences of P. If only one 
occurrence is desired, ST can be preprocessed 
in O(|T|) time such that each internal node 
contains the suffix at one of the leaves in its 
subtree. 


Theorem 3 Given a suffix tree for text T and 
a pattern P, whether P occurs in T can be an- 
swered in O(|P|) time. All occurrences of P in 
T can be found in O(|P| + occ) time, where occ 
denotes the number of occurrences. 


Now consider solving the same problem using the 
suffix array SA of T. All suffixes prefixed by P 
appear in consecutive positions in SA. These can 
be found using binary search in SA. Naively per- 
formed, this would take O(|P| * log|T|) time. 
It can be improved to O(|P| + log|T|) time as 
follows [15]: 

Let SA[L...R] denote the range in the
suffix array where the binary search is focused.
To begin with, $L = 1$ and $R = |T|$. Let $\prec$
denote "lexicographically smaller", $\preceq$ denote
"lexicographically smaller or equal", and
$\mathit{lcp}(\alpha, \beta)$ denote the length of the longest
common prefix between strings $\alpha$ and $\beta$.
At the beginning of an iteration, $T_{SA[L]} \preceq P \preceq T_{SA[R]}$.
Let $M = \lfloor (L+R)/2 \rfloor$. Let
$l = \mathit{lcp}(P, T_{SA[L]})$ and $r = \mathit{lcp}(P, T_{SA[R]})$.
Because SA is lexicographically ordered,
$\mathit{lcp}(P, T_{SA[M]}) \ge \min(l, r)$. If $l = r$, then
compare P and $T_{SA[M]}$ starting from the $(l+1)$th
character. If $l \ne r$, consider the case when $l > r$.


Case I: $l < \mathit{lcp}(T_{SA[L]}, T_{SA[M]})$. In this
case, $T_{SA[M]} \prec P$ and
$\mathit{lcp}(P, T_{SA[M]}) = \mathit{lcp}(P, T_{SA[L]})$. Continue search in
SA[M...R]. No character comparisons
required.

Case II: $l > \mathit{lcp}(T_{SA[L]}, T_{SA[M]})$. In this
case, $P \prec T_{SA[M]}$ and
$\mathit{lcp}(P, T_{SA[M]}) = \mathit{lcp}(T_{SA[L]}, T_{SA[M]})$. Continue search
in SA[L...M]. No character comparisons
required.

Case III: $l = \mathit{lcp}(T_{SA[L]}, T_{SA[M]})$. In this case,
$\mathit{lcp}(P, T_{SA[M]}) \ge l$. Compare P and $T_{SA[M]}$
beyond the lth character to determine their relative
order and lcp.


Similarly, the case when $r > l$ can be handled
such that comparisons between P and $T_{SA[M]}$,
if at all needed, start from the $(r+1)$th character.
To start the execution of the algorithm,
$\mathit{lcp}(P, T_{SA[1]})$ and $\mathit{lcp}(P, T_{SA[|T|]})$ are
computed directly using at most $2|P|$ character
comparisons. It remains to be described how the
$\mathit{lcp}(T_{SA[L]}, T_{SA[M]})$ and $\mathit{lcp}(T_{SA[R]}, T_{SA[M]})$
values required in each iteration are computed.
Let $Lcp[1 \ldots |T|-1]$ be an array such
that $Lcp[i] = \mathit{lcp}(T_{SA[i]}, T_{SA[i+1]})$. The
Lcp array can be computed from SA in
$O(|T|)$ time [12]. For any $1 \le i < j \le n$,
$\mathit{lcp}(T_{SA[i]}, T_{SA[j]}) = \min_{k=i}^{j-1} Lcp[k]$. In order
to find the lcp values required by the algorithm
in constant time, note that the binary search can
be viewed as traversing a path in the binary tree
corresponding to all possible search intervals
used by any execution of the binary search
algorithm [15]. The root of the tree denotes the
interval $[1 \ldots n]$. If $[i \ldots j]$ ($j - i \ge 2$) is the
interval at an internal node of the tree, its left
child is given by $[i \ldots \lfloor (i+j)/2 \rfloor]$ and its right
child is given by $[\lfloor (i+j)/2 \rfloor \ldots j]$. The lcp
value for each interval in the tree is precomputed
and recorded in $O(n)$ time and space.
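
The Lcp array and the range-minimum identity can be illustrated with a short sketch. The linear-time computation below follows the well-known Kasai-style idea of processing suffixes in text order and reusing the previous lcp value minus one. Whether this matches the exact procedure of [12] is an assumption. The function names are illustrative, the suffix array is built naively for brevity, and indices are 0-based, unlike the 1-based notation of the text.

```python
def build_suffix_array(t):
    """Naive construction by sorting suffixes (for illustration only)."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def lcp_array(t, sa):
    """lcp[r] = longest common prefix of the suffixes at ranks r and r + 1."""
    n = len(t)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * max(n - 1, 0)
    h = 0
    for i in range(n):                # suffixes in text order
        if rank[i] == n - 1:          # last suffix in SA has no successor
            h = 0
            continue
        j = sa[rank[i] + 1]
        while i + h < n and j + h < n and t[i + h] == t[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1                    # the next suffix shares at least h - 1 characters
    return lcp

def direct_lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k

text = "banana"
sa = build_suffix_array(text)
lcp = lcp_array(text, sa)
print(sa, lcp)
```

For any two ranks i < j, the lcp of the corresponding suffixes equals `min(lcp[i:j])`, which is the identity the precomputed interval tree relies on.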


Theorem 4 Given the suffix array SA of text 
T and a pattern P, the existence of P in T can 
be checked in O(|P| + log |T|) time. All occur- 
rences of P in T can be found in O(occ) additional 
time, where occ denotes their number. 


Proof The algorithm makes at most $2|P|$
comparisons in determining $\mathit{lcp}(P, T_{SA[1]})$
and $\mathit{lcp}(P, T_{SA[n]})$. A comparison made in
an iteration to determine $\mathit{lcp}(P, T_{SA[M]})$ is
categorized successful if it contributes to the
lcp, and categorized failed otherwise. There is
at most one failed comparison per iteration.
As for successful comparisons, note that
the comparisons start with the $(\max(l, r) + 1)$th
character of P, and each successful comparison
increases the value of $\max(l, r)$ for the next
iteration. Thus, each character of P is involved
only once in a successful comparison. The total
number of character comparisons is at most
$3|P| + \log |T| = O(|P| + \log |T|)$. □
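
For comparison with the bound just proved, the sketch below implements only the naive baseline: two plain binary searches over the suffix array, each comparing P against a suffix from the first character, for $O(|P| \log |T|)$ time overall. It does not use the lcp-based acceleration of the three cases above. The function names and 0-based indexing are assumptions for illustration.

```python
def build_suffix_array(t):
    """Naive construction by sorting suffixes (for illustration only)."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def find_occurrences(t, sa, p):
    """Return the sorted start positions (0-based) of p in t."""
    def prefix(r):
        # The first |p| characters of the suffix at rank r.
        return t[sa[r]:sa[r] + len(p)]

    # First rank whose suffix prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First rank whose suffix prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= p:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes prefixed by p occupy the consecutive ranks start .. lo - 1.
    return sorted(sa[start:lo])

text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))
```

The search works because truncating every suffix to its first |P| characters preserves the sorted order, so the suffixes prefixed by P form one contiguous range.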


Abouelhoda et al. [1] reduce this time further to 
O(|P|) by mimicking the suffix tree algorithm 
on a suffix array with some auxiliary information. 
The strategy is useful in other applications based 
on top-down traversal of suffix trees. At this 
stage, the distinction between suffix trees and suf- 
fix arrays is blurred as the auxiliary information 
stored makes the combined data structure equiva- 
lent to a suffix tree. Using clever implementation 
techniques, the space is reduced to approximately 
6n bytes. A major advantage of the suffix tree and 
suffix array based methods is that the text T is 
often large and relatively static, while it is queried 
with several short patterns. With suffix trees and 
enhanced suffix arrays [1], once the text is pre- 
processed in O(|T|) time, each pattern can be 
queried in O(| P|) time for constant size alphabet. 
For large alphabets, the query can be answered
in $O(|P| \log |\Sigma|)$ time using $O(n |\Sigma|)$ space
(by storing an ordered array of $|\Sigma|$ pointers to
potential children of a node), or in $O(|P| \cdot |\Sigma|)$
time using $O(n)$ space (by storing pointers to
first child and next sibling). (Recently, Cole et al.
(2006) showed how to further reduce the search
time to $O(|P| + \log |\Sigma|)$ while still keeping the
optimal $O(|T|)$ space.) For indexing in various
text-dynamic situations, see [3, 7] and references 
therein. The problem of compressing suffix trees 
and arrays is covered in more detail in other 
entries. 

While exact pattern matching has many useful
applications, the need for approximate pattern
matching arises in several contexts ranging
from information retrieval to finding
ary related biomolecular sequences. The clas- 
sic approximate pattern matching problem is to 
find substrings in the text T that have an edit 
distance of k or less to the pattern P, i.e., the 
substring can be converted to P with at most k 
insert/delete/substitute operations. This problem 
is covered in more detail in other entries. Also 
see [16], the references therein, and Chapter 36 
of [2]. 
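
To make the problem statement concrete, the following sketch solves it without any index, using the standard dynamic-programming recurrence for edit distance in which a match may begin at any text position. This is a plain $O(|P| \cdot |T|)$ scan and not one of the indexed methods covered in the other entries. The function name is an assumption, and reported positions are 1-based end positions of matching substrings.

```python
def approximate_end_positions(t, p, k):
    """End positions j (1-based) such that some substring of t ending at j
    has edit distance at most k to p."""
    m = len(p)
    # col[i] = least edit distance between p[:i] and a substring of t
    # ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(t, start=1):
        prev_diag, col[0] = col[0], 0          # a match may start anywhere in t
        for i in range(1, m + 1):
            cur = min(col[i] + 1,              # text character c left unmatched
                      col[i - 1] + 1,          # pattern character p[i-1] left unmatched
                      prev_diag + (p[i - 1] != c))   # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends

print(approximate_end_positions("abcdefg", "bxd", 1))
```

With k = 0 the function reports exactly the end positions of exact occurrences of the pattern.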


Applications 


Text indexing has many practical applications — 
finding words or phrases in documents under 
preparation, searching text for information re- 
trieval from digital libraries, searching distributed 
text resources such as the web, processing XML 
path strings, searching for longest matching pre- 
fixes among IP addresses for internet routing, to 
name just a few. The reader interested in further 
exploring text indexing is referred to the book by 
Crochemore and Rytter [6], and to other entries 
in this Encyclopedia. The last decade of explosive 
growth in computational biology is aided by the 
application of string processing techniques to 
DNA and protein sequence data. String indexing 
and aggregate queries to uncover mutual relation- 
ships between strings are at the heart of important 
scientific challenges such as sequencing genomes 
and inferring evolutionary relationships. For an 
in depth study of such techniques, the reader is 
referred to Parts I and II of [10] and Parts I and 
VIII of [2]. 


Open Problems 


Text indexing is a fertile research area, making it 
impossible to cover many of the research results 
or actively pursued open problems in a short 
amount of space. Providing better algorithms and 
data structures to answer a flow of string-search 
queries when caches or other query models are 
taken into account, is an interesting research 
issue [4]. 


Cross-References 


Compressed Suffix Array 

Compressed Text Indexing 

Indexed Approximate String Matching 
Indexed Two-Dimensional String Matching 


Suffix Array Construction 
Suffix Tree Construction in Hierarchical 
Memory 


Suffix Tree Construction 


Recommended Reading 


1. Abouelhoda M, Kurtz S, Ohlebusch E (2004) Replac- 
ing suffix trees with enhanced suffix arrays. J Discret 
Algorithms 2:53-86 

2. Aluru S (ed) (2005) Handbook of computational 
molecular biology, Computer and Information Sci- 
ence Series. Chapman and Hall/CRC, Boca Raton 

3. Amir A, Kopelowitz T, Lewenstein M, Lewenstein N 
(2005) Towards real-time suffix tree construction. In: 
Proceedings of the string processing and information 
retrieval symposium (SPIRE), pp 67-78 

4. Ciriani V, Ferragina P, Luccio F, Muthukrishnan S 
(2007) A data structure for a sequence of string ac- 
cesses in external memory. ACM Trans Algorithms 3 

5. Crescenzi P, Grossi R, Italiano G (2003) Search 
data structures for skewed strings. In: International 
workshop on experimental and efficient algorithms 
(WEA). Lecture notes in computer science, vol 2. 
Springer, Berlin, pp 81-96 

6. Crochemore M, Rytter W (2002) Jewels of stringol- 
ogy. World Scientific Publishing Company, Singa- 
pore 

7. Ferragina P, Grossi R (1998) Optimal on-line search 
and sublinear time update in string matching. SIAM 
J Comput 3:713-736 

8. Franceschini G, Grossi R (2004) A general tech- 
nique for managing strings in comparison-driven data 
structures. In: Annual international colloquium on 
automata, languages and programming (ICALP) 

9. Grossi R, Italiano G (1999) Efficient techniques for 
maintaining multidimensional keys in linked data 
structures. In: Annual international colloquium on 
automata, languages and programming (ICALP), pp 
372-381 

10. Gusfield D (1997) Algorithms on strings, trees and 
sequences: computer science and computational biol- 
ogy. Cambridge University Press, New York 

11. Karkkainen J, Sanders P, Burkhardt S (2006) Linear 
work suffix arrays construction. J ACM 53:918-936 

12. Kasai T, Lee G, Arimura H et al (2001) Linear-time 
longest-common-prefix computation in suffix arrays 
and its applications. In: Proceedings of the 12th 
annual symposium, combinatorial pattern matching 
(CPM), pp 181-192 


13. Ko P, Aluru S (2005) Space efficient linear time 
construction of suffix arrays. J Discret Algorithms 
3:143-156 

14. Ko P, Aluru S (2007) Optimal self-adjusting tree
for dynamic string data in secondary storage. In: 
Proceedings of the string processing and informa- 
tion retrieval symposium (SPIRE), Santiago. Lecture 
notes in computer science, vol 4726, pp 184-194 

15. Manber U, Myers G (1993) Suffix arrays: a new 
method for on-line search. SIAM J Comput 22:935-948

16. Navarro G (2001) A guided tour to approximate string 
matching. ACM Comput Surv 33:31-88 


Three-Dimensional Graph Drawing 


David R. Wood 
School of Mathematical Sciences, Monash 
University, Melbourne, VIC, Australia 


Keywords 


Track layout; Three-dimensional straight-line 
grid drawing; Treewidth 


Years and Authors of Summarized 
Original Work 


2005; Dujmović, Morin, Wood


Problem Definition 


A three-dimensional straight-line grid drawing 
of a graph, henceforth called a 3D drawing, 
represents the vertices by distinct grid-points in 
$\mathbb{Z}^3$ and represents each edge by the line segment
between its end vertices, such that no two edges
cross. In contrast to the case in the plane, it is
folklore that every graph has a 3D drawing. For
example, the “moment curve” algorithm places
the $i$th vertex at $(i, i^2, i^3)$. It is easily seen that no
four vertices are coplanar, and thus no two edges 
cross. Since every graph has a 3D drawing, we are 
interested in optimizing certain measures of their 
aesthetic quality. If a 3D drawing is contained in
an axis-aligned box with side lengths $X-1$, $Y-1$,
and $Z-1$, then we speak of an $X \times Y \times Z$ drawing
with volume $X \cdot Y \cdot Z$. This entry considers the
problem of producing a 3D drawing of a given 
graph with small volume. 
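
The moment curve placement and the no-four-coplanar property can be checked by brute force on small inputs. The following is a minimal sketch, not part of the original entry; the function names are illustrative, and the volume is computed in the entry's sense (number of grid points per axis, multiplied).

```python
from itertools import combinations

def moment_curve(n):
    """Place vertex i (1-based) at the grid point (i, i^2, i^3)."""
    return [(i, i * i, i ** 3) for i in range(1, n + 1)]

def det3(a, b, c):
    """Determinant of the 3x3 integer matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def coplanar(p, q, r, s):
    """Four points are coplanar iff q-p, r-p, s-p are linearly dependent."""
    u = tuple(q[k] - p[k] for k in range(3))
    v = tuple(r[k] - p[k] for k in range(3))
    w = tuple(s[k] - p[k] for k in range(3))
    return det3(u, v, w) == 0

def has_coplanar_quadruple(points):
    """Brute-force check over all 4-subsets (fine for small n only)."""
    return any(coplanar(*quad) for quad in combinations(points, 4))

def bounding_volume(points):
    """X * Y * Z, where each factor counts the grid points spanned on one axis."""
    dims = [max(p[k] for p in points) - min(p[k] for p in points) + 1
            for k in range(3)]
    return dims[0] * dims[1] * dims[2]
```

If no four vertices are coplanar, two edges with four distinct endpoints cannot intersect, which is why the check above suffices for a crossing-free drawing.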


Key Results 


Observe that the drawings produced by the moment
curve algorithm have $O(n^6)$ volume, where
$n$ is the number of vertices. Cohen et al. [2]
improved this bound by proving that if $p$ is a
prime with $n < p \le 2n$, and the $i$th vertex is
at $(i, i^2 \bmod p, i^3 \bmod p)$, then there is still no
crossing. The resulting $O(n^3)$ volume bound is
optimal for the complete graph $K_n$ since each
grid plane may contain at most four vertices. It
is therefore of interest to identify fixed graph 
parameters that allow for 3D drawings with small 
volume, as summarized in the following table. 


Graph family               Min. volume           Reference
Arbitrary                  $O(n^3)$              [2]
Bounded chromatic number   $\Theta(n^2)$         [19]
Bounded maximum degree     $O(n^{3/2})$          [7]
Bounded degeneracy         $O(n^{3/2})$          [9]
H-minor-free (H fixed)     $n \log^{O(1)} n$     [12]
Bounded genus              $O(n \log n)$         [12]
Apex-minor-free            $O(n \log n)$         [12]
Planar                     $O(n \log n)$         [6]
Bounded treewidth          $O(n)$                [11]
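
The modular variant of the moment curve attributed above to Cohen et al. [2] can be tried out on small inputs. This is a hedged sketch under the assumption that vertices are numbered 1..n and that p is taken to be the smallest prime above n; the helper names are illustrative and not from the cited paper.

```python
from itertools import combinations

def smallest_prime_above(n):
    """Smallest prime p with p > n (trial division; small n only)."""
    def is_prime(q):
        if q < 2:
            return False
        d = 2
        while d * d <= q:
            if q % d == 0:
                return False
            d += 1
        return True
    p = n + 1
    while not is_prime(p):
        p += 1
    return p

def modular_moment_curve(n):
    """Place vertex i at (i, i^2 mod p, i^3 mod p) for a prime p > n."""
    p = smallest_prime_above(n)
    return p, [(i, (i * i) % p, (i ** 3) % p) for i in range(1, n + 1)]

def coplanar(p0, p1, p2, p3):
    """Zero scalar triple product of the three difference vectors."""
    (a, b, c), (d, e, f), (g, h, k) = (
        tuple(q[j] - p0[j] for j in range(3)) for q in (p1, p2, p3))
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g) == 0

def has_coplanar_quadruple(points):
    return any(coplanar(*quad) for quad in combinations(points, 4))

def bounding_volume(points):
    dims = [max(q[j] for q in points) - min(q[j] for q in points) + 1
            for j in range(3)]
    return dims[0] * dims[1] * dims[2]
```

Every coordinate is smaller than p, so the bounding box has at most n * p * p grid points, which is cubic in n.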


The first such parameter to be studied was the
chromatic number. Pach et al. [19] proved that
graphs of bounded chromatic number have 3D
drawings with $O(n^2)$ volume. If $p$ is a suitably
chosen prime, the main step of their algorithm
represents the vertices in the $i$th color class by
grid-points in the set $\{(i, t, it) : t \equiv i^2 \pmod{p}\}$.
It follows that the volume bound is $O(k^2 n^2)$ for
$k$-colorable graphs.

Pach et al. [19] also proved an $\Omega(n^2)$ lower
bound for the volume of 3D drawings of the
complete bipartite graph $K_{n,n}$. This lower bound
was generalized for all graphs by Bose et al. [1],
who proved that every 3D drawing of an $n$-vertex
$m$-edge graph has volume at least $\frac{1}{8}(n + m)$.
In particular, the maximum number of edges in
an $X \times Y \times Z$ drawing is exactly
$(2X-1)(2Y-1)(2Z-1) - XYZ$.
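
The step from the exact edge count $(2X-1)(2Y-1)(2Z-1) - XYZ$ to a linear volume lower bound is short arithmetic. The following sketch uses only that edge count and the fact that an $X \times Y \times Z$ box contains $XYZ$ grid points, so $n \le XYZ$:

```latex
\begin{align*}
m &\le (2X-1)(2Y-1)(2Z-1) - XYZ \;<\; 8\,XYZ - XYZ \;=\; 7\,XYZ, \\
n &\le XYZ, \\
n + m &< 8\,XYZ
  \quad\Longrightarrow\quad
  XYZ \;>\; \tfrac{1}{8}\,(n + m).
\end{align*}
```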

Graphs with bounded maximum degree have
bounded chromatic number and, thus, by the
result of Pach et al. [19], have 3D drawings with
$O(n^2)$ volume. Pach et al. [19] conjectured that
such graphs have 3D drawings with $o(n^2)$ volume,
which was verified by Dujmović and Wood
[7], who proved an $O(n^{3/2})$ bound. The best lower
bound is $\Omega(n)$. Determining the optimal volume
for 3D drawings of bounded degree graphs is a
challenging open problem; see [13]. The $O(n^{3/2})$
upper bound for bounded degree graphs was generalized
for graphs with bounded degeneracy [9].

The first nontrivial O(n) volume bound was 
established by Felsner et al. [15] for outerplanar 
graphs. Their elegant algorithm “wraps” a 2D 
drawing around a triangular prism to obtain a 
3D drawing. This result naturally led to the fol- 
lowing open problem due to Felsner et al. [15], 
which motivated much subsequent research: does 
every planar graph have a 3D drawing with O(n) 
volume? 

For some time, the $O(n^2)$ bound for 2D drawings
was the best known bound in 3D. Then
Dujmović and Wood [7] proved that every planar
graph has a 3D drawing with $O(n^{3/2})$ volume. A
breakthrough came with the $O(n \log^{8} n)$ bound
of Di Battista et al. [4], which was improved to
$O(n \log n)$ by Dujmović [6] (with a much simpler
proof). The most recent work in this direction,
by Dujmović et al. [12], extended this $O(n \log n)$
bound to all graphs of bounded Euler genus and
more generally proved that every graph excluding
a fixed minor has a 3D drawing with $n \log^{O(1)} n$
volume.

The O(n) volume bound for outerplanar
graphs mentioned above was generalized by
Dujmović et al. [11] as follows:


Theorem 1 ([11]) Graphs with bounded
treewidth have 3D drawings with O(n) volume.


This result is the focus of the remainder of this 
entry. Treewidth is a measure of the similarity of 
a graph to a tree. It can be defined as follows. A 
graph is chordal if every induced cycle is a trian- 
gle. The treewidth of a graph G is the minimum 
integer k such that G is a spanning subgraph of
a chordal graph with no (k + 2)-clique. Many 
graphs arising in applications of graph drawing 
have small treewidth. Trees have treewidth 1, 
while outerplanar and series-parallel graphs have 
treewidth 2. Another example arises in software 
engineering applications. Thorup [20] proved that 
the control-flow graphs of go-to free programs 
in many programming languages have treewidth 
bounded by a small constant, in particular, 3 for 
Pascal and 6 for C. 

Reference [11] is also important because it 
discovered the connection between 3D draw- 
ings, track layouts, and queue layouts; also see 
[10, 16]. 


Track Layouts 

Track layouts are a combinatorial tool that effec- 
tively eliminates the geometry from 3D drawings 
and exposes the underlying combinatorial struc- 
ture. They were introduced in [11] although they 
are implicit in some previous work [15, 16]. 

Let $V_1, \dots, V_t$ be the color classes in a
(proper) vertex $t$-coloring of a graph $G$. Suppose
that each color class $V_i$ is equipped with a
total order, denoted by $\preceq$. Call $V_i$ a track and
$V_1, \dots, V_t$ a $t$-track assignment. An X-crossing
in $V_1, \dots, V_t$ consists of two edges $vw$ and $xy$
such that $v \prec x$ in some track $V_i$ and $y \prec w$ in
some other track $V_j$. A $t$-track assignment with
no X-crossing is called a $t$-track layout.
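
The definition of an X-crossing translates directly into a checker. The sketch below is illustrative only; it assumes tracks are given as Python lists in track order and edges as vertex pairs, and the function name is hypothetical.

```python
from itertools import combinations

def find_x_crossing(tracks, edges):
    """Return a pair of edges forming an X-crossing, or None.

    tracks: list of lists; each inner list is one track in its total order.
    edges:  iterable of (u, v) pairs whose endpoints lie on different tracks.
    """
    track_of, pos = {}, {}
    for t, track in enumerate(tracks):
        for k, v in enumerate(track):
            track_of[v] = t
            pos[v] = k

    # Group edges by the pair of tracks they join, oriented from the
    # lower-indexed track to the higher-indexed one.
    by_pair = {}
    for u, v in edges:
        if track_of[u] == track_of[v]:
            raise ValueError("edge inside a track: tracks are not color classes")
        if track_of[u] > track_of[v]:
            u, v = v, u
        by_pair.setdefault((track_of[u], track_of[v]), []).append((u, v))

    # Two edges between the same two tracks cross iff their endpoints
    # appear in opposite orders on the two tracks.
    for group in by_pair.values():
        for (a, b), (c, d) in combinations(group, 2):
            if (pos[a] - pos[c]) * (pos[b] - pos[d]) < 0:
                return (a, b), (c, d)
    return None

def is_track_layout(tracks, edges):
    return find_x_crossing(tracks, edges) is None
```

Edges that share an endpoint give a zero product and are therefore never reported, matching the strict orders in the definition.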

One can produce a track layout from an 
A x B x C drawing of a graph G as follows. 
Let $V_{x,y}$ be the set of vertices of $G$ with an
X-coordinate of $x$ and a Y-coordinate of $y$.
Order each set $V_{x,y}$ by the corresponding
Z-coordinates. We obtain an $AB$-track layout of $G$,
except that consecutive vertices in each track
might be adjacent. Doubling each track and
putting alternate vertices in $V_{x,y}$ on distinct
tracks gives a $2AB$-track layout of $G$. Most
interestingly, a converse result is also true. 


Theorem 2 ([11]) If an $n$-vertex graph $G$ has a
$t$-track layout, then $G$ has an $O(t) \times O(t) \times O(n)$
drawing with $O(t^2 n)$ volume.


The proof of Theorem 2 is inspired by the
generalizations of the moment curve algorithm
by Cohen et al. [2] and Pach et al. [19]. Loosely
speaking, Cohen et al. [2] allow three “free”
dimensions, whereas Pach et al. [19] use a coloring
to “fix” one dimension with two dimensions
free. Theorem 2 uses a track layout to fix two
dimensions with one dimension free; see Fig. 1.
In particular, say $(V_1, \dots, V_k)$ is the given track
layout, where $k$ denotes the number of tracks.
Let $p$ be the smallest prime such that $p > k$.
Then $p \le 2k$ by Bertrand’s postulate. For
$1 \le i \le k$, represent the vertices in $V_i$ by the grid-points
$\{(i, i^2 \bmod p, t) : 1 \le t \le p \cdot |V_i|,\ t \equiv i^3 \pmod{p}\}$,
such that the Z-coordinates respect
the given total order of $V_i$.
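
The coordinate assignment in this proof sketch can be written out as follows. This is a minimal illustration under the assumption that the tracks are supplied as ordered lists and that each track simply uses the smallest admissible Z-coordinates; it computes the coordinates only and does not itself verify that the drawing is crossing-free.

```python
def smallest_prime_above(k):
    """Smallest prime p with p > k (trial division; small k only)."""
    p = k + 1
    while True:
        d, prime = 2, p >= 2
        while prime and d * d <= p:
            if p % d == 0:
                prime = False
            d += 1
        if prime:
            return p
        p += 1

def track_layout_to_grid(tracks):
    """Map each vertex of track V_i (1-based i) to (i, i^2 mod p, z).

    tracks[i-1] lists the vertices of V_i in their total order.
    The z-values on track i are i^3 mod p, i^3 mod p + p, ... so that
    z is congruent to i^3 (mod p) and increases along the track.
    """
    k = len(tracks)
    p = smallest_prime_above(k)
    coords = {}
    for i, track in enumerate(tracks, start=1):
        z = pow(i, 3, p)          # nonzero, because 1 <= i <= k < p and p is prime
        for v in track:
            coords[v] = (i, (i * i) % p, z)
            z += p
    return p, coords
```

With this choice the largest Z-coordinate on track i is below p times the track size, so the bounding box is O(k) by O(k) by O(kn') where n' is the largest track size.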

Note that Dujmović and Wood [7] combined
the method of Pach et al. [19] with the proof of
Theorem 2 to conclude an O(tn) volume bound
of 3D drawings of t-track graphs with bounded 
chromatic number. 

As an example of how to construct a track 
layout, we now show that every tree T has a 3- 
track layout (which is implicitly proved in [15]). 
Let $r$ be a vertex of $T$. Let $V_i$ be the vertices
at distance $i$ from $r$. Note that $(V_0, V_1, \dots)$ is a
coloring of $T$. Clearly, each color class $V_i$ can be
ordered so that there is no X-crossing; see Fig. 2a.
Hence $(V_0, V_1, \dots)$ is a track layout. Note that,
working from the root down, the child nodes of
each node can be ordered arbitrarily. This will be
important later. Now, imagine wrapping this track
layout around a prism; see Fig. 2b. That is, for
$0 \le i \le 2$, group tracks $V_i \prec V_{3+i} \prec V_{6+i} \prec \cdots$
to obtain a 3-track layout of $T$.
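
The wrapping construction amounts to a breadth-first search that files each vertex under its depth modulo 3. The sketch below assumes the tree is given as an adjacency dictionary; the function name is illustrative.

```python
from collections import deque

def tree_three_track_layout(adj, root):
    """Return three ordered tracks for the tree given by `adj`.

    adj:  dict mapping each vertex to a list of its neighbours.
    root: the vertex r from which distances are measured.

    BFS dequeues vertices level by level, and within a level in the
    order of their parents, so appending on dequeue lists level i,
    then level i+3, then level i+6, ... on track i mod 3.
    """
    tracks = [[], [], []]
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        tracks[depth[v] % 3].append(v)
        for w in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return tracks
```

The order in which `adj[v]` lists the children of a node is free, which is the freedom the text says will be important later.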


An Algorithm for Graphs of Bounded 
Treewidth 

Theorem 1 is an immediate consequence of Theorem 2
and the following claim, which we prove
by induction on $k \ge 0$: for each integer $k \ge 0$,
there is an integer $t_k$ such that every $k$-tree has
a $t_k$-track layout. A 0-tree has no edges and thus
has a 1-track layout. A 1-tree is a tree, which has a
3-track layout. Thus the result holds with $t_0 = 1$
and $t_1 = 3$. Let $G$ be a $k$-tree. Various authors
have proved that $G$ can be decomposed as follows
[11, 18]. There is a tree $T$ rooted at some node
$r$ and a partition $\{B_x : x \in V(T)\}$ of $V(G)$
indexed by the nodes of $T$ with the following
properties:

• For each edge $vw$ of $G$, there is a node $x$ of $T$
such that $v, w \in B_x$, or there is an edge $xy$ of
$T$ such that $v \in B_x$ and $w \in B_y$.

• For each node $x$ of $T$, the induced subgraph
$G[B_x]$ is a $(k-1)$-tree.

• For each non-root node $y$ of $T$, if $x$ is the
parent node of $y$, and $C_y$ is the set of vertices
in $B_x$ adjacent to some vertex in $B_y$, then $C_y$
is a clique in $G$ called the parent clique of $y$.


By induction, for each node $x$ of $T$, there is
a $t_{k-1}$-track layout of $G[B_x]$. Each clique $C$ in
$G[B_x]$ has size at most $k$. Define the signature of
$C$ to be the set of (at most $k$) tracks that contain
$C$. Since there is no X-crossing, the set of cliques
of $G[B_x]$ with the same signature can be linearly
ordered $C_1 \prec \cdots \prec C_p$, such that if $v$ and $w$ are
vertices in the same track, and in distinct cliques
$C_i$ and $C_j$ with $i < j$, then $v \prec w$ in that track.
Call this a clique ordering.

Let $T_0, T_1, T_2$ be a 3-track layout of $T$ as described
above. Replace each track $T_i$ by $t_{k-1}$ sub-tracks,
and replace each node $x \in T_i$ by the $t_{k-1}$-track
layout of $G[B_x]$. This defines a $3t_{k-1}$-track
assignment for $G$. Clearly an edge in some $G[B_x]$
is in no X-crossing with any other edge. There
is no X-crossing between two edges between a
parent bag $B_x$ and the same child bag $B_y$,
since the end points in $B_x$ of such edges form
a clique (the parent clique of $y$) and therefore are
in distinct tracks. The only possible X-crossing is
between edges $ab$ and $cd$, where $a$ and $c$ are in
some parent bag $B_x$ and $b$ and $d$ are in distinct
child bags $B_y$ and $B_z$, respectively.

To solve this problem, when determining the
3-track layout of $T$, the child nodes of each node
$x$ are ordered in their track so that $y \prec z$ whenever
the parent cliques $C_y$ and $C_z$ have the same
signature and $C_y \prec C_z$ in the clique ordering.
Then group the child nodes of $x$ according to
the signatures of their parent cliques, and for
each signature $\sigma$, use a distinct set of $t_{k-1}$ tracks
for the child bags whose parent cliques have
signature $\sigma$. Now the ordering of the child bags
with the same signature agrees with the clique
ordering of their parent cliques and therefore
agrees with the ordering of any neighbors in the
parent bag. It follows that there is no X-crossing,
as illustrated in Fig. 3. The number of tracks is
at most $3t_{k-1}$ times the number of signatures,
which is at most $\sum_{i=1}^{k} \binom{t_{k-1}}{i} \le (t_{k-1})^k$. This
completes the proof with $t_k := 3(t_{k-1})^{k+1}$.


Three-Dimensional Graph Drawing, Fig. 1 A 3D drawing produced from a track layout

Three-Dimensional Graph Drawing, Fig. 2 A 3-track layout of a tree

Three-Dimensional Graph Drawing, Fig. 3 Final track layout with $3(t_{k-1})^k$ groups of $t_{k-1}$ tracks

This proof makes no effort to reduce the bound
on $t_k$. The recurrence roughly solves to $3^{(k+1)!}$.
The original proof by Dujmović et al. [11] reduces
this bound to a doubly exponential function
in $k$. Further improvements were made by Di
Giacomo et al. [5], but the bound is still doubly
exponential. The best lower bound, due to Dujmović
et al. [11], is $\Omega(k^2)$. For $k = 2$, the best
upper bound is 15, due to Di Giacomo et al. [5].
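
To get a feel for how quickly the bound from the simple proof grows, the recurrence can be evaluated directly. The sketch below assumes the recurrence in the form t_0 = 1 and t_k = 3 * (t_{k-1})^(k+1), which is consistent with t_1 = 3; it also tracks the base-3 logarithm, which satisfies a_k = 1 + (k+1) * a_{k-1}. The comparison with (k+1)! is an observation made here for illustration, not a statement from the cited papers.

```python
from math import factorial

def track_bound(k):
    """t_k from the recurrence t_0 = 1, t_j = 3 * t_{j-1}**(j+1)."""
    t = 1
    for j in range(1, k + 1):
        t = 3 * t ** (j + 1)
    return t

def log3_track_bound(k):
    """Exponent a_k with t_k = 3**a_k, via a_0 = 0, a_j = 1 + (j+1)*a_{j-1}."""
    a = 0
    for j in range(1, k + 1):
        a = 1 + (j + 1) * a
    return a

def exponent_table(max_k):
    """Rows (k, a_k, (k+1)!) for comparing the exponent with a factorial."""
    return [(k, log3_track_bound(k), factorial(k + 1)) for k in range(max_k + 1)]
```

For small k the exponent a_k stays below (k+1)!, so the bound behaves like 3 raised to a factorial-sized power, far above the doubly exponential bounds mentioned in the text.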


Other Models for 3D Graph Drawing 

• Polyline grid drawings, where bends in the
edges are allowed (at grid-points) [3, 8]

• Orthogonal 3D drawings, where the edges are
routed along the grid-lines [14, 21]

• Upward 3D drawings of directed acyclic
graphs [5, 9]

• Symmetrical 3D drawings with vertices in $\mathbb{R}^3$
[17]


Problem Definition


Consider $n$ Boolean variables $V = \{x_1, \dots, x_n\}$
and the corresponding set of $2n$ literals
$L = \{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. A $k$-clause is a disjunction
of $k$ literals of distinct underlying variables. A
random formula $\phi_{n,m}$ in $k$ conjunctive normal
form ($k$-CNF) is the conjunction of $m$ clauses,
each selected in a uniformly random and independent
way among the $2^k \binom{n}{k}$ possible $k$-clauses
on the $n$ variables in $V$. The density $r_k$ of
a $k$-CNF formula $\phi_{n,m}$ is the clauses-to-variables
ratio $m/n$.
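
A random formula in this model is easy to sample. The sketch below is illustrative; it assumes the common integer encoding in which literal +v stands for x_v and -v for its negation, and the function names are not from the cited works.

```python
import random

def random_k_cnf(n, m, k, rng=random):
    """Sample m clauses independently and uniformly among the
    2^k * C(n, k) k-clauses over variables 1..n.

    A clause is a tuple of k nonzero ints sorted by variable index.
    """
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)   # k distinct variables
        clause = tuple(sorted(
            (v if rng.random() < 0.5 else -v for v in variables), key=abs))
        formula.append(clause)
    return formula

def density(formula, n):
    """Clauses-to-variables ratio m / n."""
    return len(formula) / n
```

For example, `random_k_cnf(1000, 4270, 3, random.Random(1))` draws a 3-CNF formula at density 4.27, near the putative threshold discussed in this entry. Clauses may repeat, since they are drawn independently.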

It was conjectured that for each $k \ge 2$ there
exists a critical density $r_k^*$ such that asymptotically
almost all (a.a.a.) $k$-CNF formulas with
density $r < r_k^*$ ($r > r_k^*$) are satisfiable (unsatisfiable,
respectively). So far, the conjecture
has been proved only for $k = 2$ [3, 11]. For
$k \ge 3$, the conjecture still remains open but is
supported by experimental evidence [14] as well
as by theoretical, but non-rigorous, work based on
statistical physics [15]. The value of the putative
threshold $r_3^*$ is estimated to be around 4.27.
Approximate values of the putative threshold for
larger values of $k$ have also been computed.

As far as rigorous results are concerned,
Friedgut [10] proved that for each $k \ge 3$,
there exists a sequence $r_k^*(n)$ such that for any
$\epsilon > 0$, a.a.a. $k$-CNF formulas $\phi_{n, \lfloor (r_k^*(n) - \epsilon) n \rfloor}$
($\phi_{n, \lceil (r_k^*(n) + \epsilon) n \rceil}$) are satisfiable (unsatisfiable,
respectively). The convergence of the sequence
$r_k^*(n)$, $n = 0, 1, \dots$, for $k \ge 3$ remains open.
Let now

$$r_k^{*-} = \liminf_{n \to \infty} r_k^*(n) = \sup\{r_k : \Pr[\phi_{n, \lfloor r_k n \rfloor} \text{ is satisfiable}] \to 1\}$$

and

$$r_k^{*+} = \limsup_{n \to \infty} r_k^*(n) = \inf\{r_k : \Pr[\phi_{n, \lceil r_k n \rceil} \text{ is satisfiable}] \to 0\}.$$

Obviously, $r_k^{*-} \le r_k^{*+}$. Bounding $r_k^{*-}$ from below
(respectively, $r_k^{*+}$ from above) with as large
(respectively, as small) a bound as possible
has been the subject of intense
research work in the past decade.

Upper bounds to $r_k^{*+}$ are computed by
counting arguments. To be specific, the standard
technique is to compute the expected number
of satisfying truth assignments of a random
formula with density $r_k$ and find the smallest
possible value of $r_k$ for which this expected value
approaches zero. Then, by Markov’s inequality,
it follows that for such a value of $r_k$, a random
formula $\phi_{n, \lceil r_k n \rceil}$ is unsatisfiable asymptotically
almost always. This argument has been refined in
two directions: First, consider not all satisfying
truth assignments but a subclass of them with
the property that a satisfiable formula always has
a satisfying truth assignment in the subclass
considered. The restriction to a judiciously
chosen such subclass forces the expected value
of the number of satisfying truth assignments to
get closer to the probability of satisfiability and
thus leads to a better (smaller) upper bound $r_k$.
However, it is important that the subclass should
be such that the expected value of the number of
satisfying truth assignments can be computed
by the available probabilistic techniques.

Second, make use in the computation of the
expected number of satisfying truth assignments
of typical characteristics of the random formula,
i.e., characteristics shared by a.a.a. formulas.
Again this often leads to an expected number
of satisfying truth assignments that is closer
to the probability of satisfiability (nontypical
formulas may contribute to the increase of the
expected number). Increasingly better upper
bounds to $r_3^{*+}$ have been computed using
counting arguments as above (see the surveys
[6, 13]). Dubois, Boufkhad, and Mandler [7]
proved $r_3^{*+} < 4.506$. The latter remains the
best upper bound to date.
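
The unrefined version of the counting argument can be worked out numerically. A fixed truth assignment satisfies a uniformly random k-clause with probability 1 - 2^(-k), so with m = r n independent clauses the expected number of satisfying assignments is 2^n (1 - 2^(-k))^(r n). The sketch below assumes m = r n exactly (ignoring rounding) and uses illustrative function names.

```python
from math import log

def log_expected_satisfying(n, r, k):
    """Natural log of E[#satisfying assignments] = 2^n * (1 - 2^-k)^(r*n)."""
    return n * (log(2) + r * log(1 - 2.0 ** -k))

def first_moment_threshold(k):
    """Density above which the expectation tends to 0 as n grows:
    r > ln 2 / (-ln(1 - 2^-k))."""
    return log(2) / -log(1 - 2.0 ** -k)
```

For k = 3 this plain first-moment bound is about 5.19, noticeably weaker than the refined bound of 4.506 quoted above, which illustrates what the two refinements buy.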


On the other hand, for fixed and small values
of $k$ (especially for $k = 3$), lower bounds to $r_k^{*-}$
are usually computed by algorithmic methods.
To be specific, one designs an algorithm that, for
as large an $r_k$ as possible, returns a satisfying
truth assignment for a.a.a. formulas $\phi_{n, \lfloor r_k n \rfloor}$. Such
an $r_k$ is obviously a lower bound to $r_k^{*-}$. The
simpler the algorithm, the easier it is to perform the
probabilistic analysis of returning a satisfying
truth assignment for a given $r_k$, but the smaller
the $r_k$’s for which a satisfying truth assignment
is returned asymptotically almost always. In this
context, backtrack-free DPLL algorithms [4, 5]
of increasing sophistication were rigorously analyzed
(see the surveys [1, 9]). At each step of
such an algorithm, a literal is set to TRUE and
then a reduced formula is obtained by (i) deleting
clauses where this literal appears and by (ii) deleting
the negation of this literal from the clauses in
which it appears. At steps at which 1-clauses exist (known
as forced steps), the selection of the literal to be
set to TRUE is made so that a 1-clause becomes
satisfied. At the remaining steps (known as free
steps), the selection of the literal to be set to TRUE
is made according to a heuristic that characterizes
the particular DPLL algorithm. A free step is followed
by a round of consecutive forced steps. To
facilitate the probabilistic analysis of DPLL algorithms,
it is assumed that they never backtrack:
if the algorithm ever hits a contradiction, i.e., a
0-clause is generated, it stops and reports failure;
otherwise, it returns a satisfying truth assignment.
The previously best lower bound for the satisfiability
threshold obtained by such an analysis was
$3.26 < r_3^{*-}$ [2].

The previously analyzed such algorithms
(with the exception of the Pure Literal algorithm
[8]) at a free step take into account only the size
of the clause in which the selected literal appears. Due to this
limited information exploited in selecting the
literal to be set, the reduced formula in each step
remains random conditional only on the current
numbers of 3- and 2-clauses and the number
of yet unassigned variables. This retention
of “strong” randomness permits a successful
probabilistic analysis of the algorithm in a not
very complicated way. However, for $k = 3$, it
succeeds in showing satisfiability only for densities
up to a number slightly larger than 3.26. In
particular, in [2] it is shown that this is the optimal
value that can be attained by such algorithms.


Key Results 


In [12], a DPLL algorithm is described (and 
then probabilistically analyzed) such that each 
free step selects the literal to be set to TRUE, 
taking into account its degree (i.e., its number of 
occurrences) in the current formula. 


Algorithm Greedy 

The first variant of the algorithm is very simple: 
At each free step, a literal with the maximum 
number of occurrences is selected and set to 
TRUE (Section 4.A in [12]). Notice that in this
greedy variant, a literal is selected irrespective
of the number of occurrences of its negation. This
algorithm successfully returns a satisfying truth
assignment for a.a.a. formulas with density up to
a number slightly larger than 3.42, establishing
that $r_3^{*-} > 3.42$. Its simplicity, contrasted with
the improvement over the previously obtained 
lower bounds, suggests the importance of ana- 
lyzing heuristics that take into account degree 
information of the current formula. 
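
A backtrack-free run with the greedy free-step rule can be sketched as follows. This is an illustration of the forced-step and free-step structure described in this entry, not the analyzed algorithm of [12] itself: the integer literal encoding (+v for x_v, -v for its negation), the deterministic tie-breaking, and the function names are all assumptions made here.

```python
from collections import Counter

def set_true(formula, lit):
    """Reduce the formula after setting `lit` to TRUE."""
    reduced = []
    for clause in formula:
        if lit in clause:
            continue                                  # (i) satisfied clause is deleted
        reduced.append(tuple(l for l in clause if l != -lit))  # (ii) drop false literal
    return reduced

def greedy_dpll(formula, n):
    """Return a dict variable -> bool, or None if a 0-clause is generated."""
    formula = [tuple(c) for c in formula]
    assignment = {}
    while formula:
        if any(len(c) == 0 for c in formula):
            return None                               # contradiction: report failure
        units = [c[0] for c in formula if len(c) == 1]
        if units:
            lit = units[0]                            # forced step: satisfy a 1-clause
        else:
            counts = Counter(l for c in formula for l in c)
            # Free step: a literal with the maximum number of occurrences.
            # Ties are broken deterministically here (an arbitrary choice).
            lit = max(counts, key=lambda l: (counts[l], -abs(l), l))
        assignment[abs(lit)] = lit > 0
        formula = set_true(formula, lit)
    for v in range(1, n + 1):
        assignment.setdefault(v, True)                # unconstrained variables
    return assignment

def satisfies(formula, assignment):
    """True if every clause has at least one literal made TRUE."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in formula)
```

A returned assignment always satisfies the input formula, while a `None` result only means this single greedy run failed; it does not prove unsatisfiability.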


Algorithm CL 

In the second variant, at each free step $t$, the
degree of the negation $\bar{\tau}$ of the literal $\tau$ that is
set to TRUE is also taken into account (Section
5.A in [12]). Specifically, the literal to be set
to TRUE is selected so that, upon the completion
of the round of forced steps that follow the
free step $t$, the marginal expected increase of
the flow from 2-clauses to 1-clauses per unit of
expected decrease of the flow from 3-clauses to
2-clauses is minimized. The marginal expectation
corresponding to each literal can be computed
from the numbers of its positive and negative
occurrences. More specifically, if $m_i$, $i = 2, 3$,
equals the expected flow of $i$-clauses to $(i-1)$-clauses
at each step of a round, and $\tau$ is the literal
set to TRUE at the beginning of the round, then
$\tau$ is chosen so as to minimize the ratio
$\left| \frac{\Delta m_2}{\Delta m_3} \right|$
of the differences $\Delta m_2$ and $\Delta m_3$ between the
beginning and the end of the round. This has the
effect of bounding the rate of generation
of 1-clauses by the smallest possible number
throughout the algorithm. For the probabilistic
analysis to go through, we need to know for
each $i, j$ the number of literals with degree $i$
whose negation has degree $j$. This heuristic succeeds
in returning a satisfying truth assignment
for a.a.a. formulas with density up to a number
slightly larger than 3.52, establishing that
$r_3^{*-} > 3.52$.


Applications 


Some applications of SAT solvers include se- 
quential circuit verification, artificial intelligence, 
automated deduction and planning, VLSI, CAD, 
model-checking, and other types of formal ver- 
ification. Recently, automatic SAT-based model- 
checking techniques were used to effectively find 
attacks on security protocols. 


Open Problems 


The main open problem in the area is to formally 
show the existence of the threshold $r_k^*$ for all
(or at least some) $k \ge 3$. To rigorously compute
upper and lower bounds better than the ones men- 
tioned here still attracts some interest. Related 
results and problems arise in the framework of 
variants of the satisfiability problem and also the 
problem of colorability. 
Problem Definition


The application of techniques from Combinato- 
rial and Algebraic Topology has been successful 
at solving a number of problems in distributed 
computing. In 1993, three independent teams [3,
15, 17], using different ways of generalizing the
classical graph-theoretical model of distributed
computing, were able to solve set agreement,
a long-standing open problem that had eluded the
standard approaches. Later on, in 2004, journal
articles by Herlihy and Shavit [15] and by Saks
and Zaharoglou [17] were to win the prestigious
Gödel Prize. This paper describes the approach
taken by the Herlihy/Shavit paper, which was the
first to draw the connection between Algebraic and
Combinatorial Topology and Distributed Computing.

Pioneering work in this area, such as that by Biran,
Moran, and Zaks [2], used graph-theoretic notions
to model uncertainty and was able to express
certain lower bounds in terms of graph connectivity.
This approach, however, had limitations.
In particular, it proved difficult to capture the
effects of multiple failures or to analyze decision
problems other than consensus.

Combinatorial topology generalizes the no- 
tion of a graph to the notion of a simplicial 
complex, a structure that has been well-studied 
in mainstream mathematics for over a century. 
One property of central interest to topologists is 
whether a simplicial complex has no “holes” below
a certain dimension k, a property known as
k-connectivity. Lower bounds previously expressed
in terms of connectivity of graphs can be general- 
ized by recasting them in terms of k-connectivity 
of simplicial complexes. By exploiting this in- 
sight, it was possible to solve some open prob- 
lems (k-set agreement, renaming), to pose and 
solve some new problems ([13]), and to unify 
a number of disparate results and models [14]. 


Key Results 


A vertex $\vec{v}$ is a point in a high-dimensional
Euclidean space. Vertexes $\vec{v}_0, \dots, \vec{v}_n$ are affinely
independent if $\vec{v}_1 - \vec{v}_0, \dots, \vec{v}_n - \vec{v}_0$ are linearly
independent. An $n$-dimensional simplex (or $n$-simplex)
$S^n = (\vec{s}_0, \dots, \vec{s}_n)$ is the convex hull
of a set of $n + 1$ affinely-independent vertexes.
For example, a 0-simplex is a vertex, a 1-simplex
a line segment, a 2-simplex a solid triangle, and
a 3-simplex a solid tetrahedron. Where convenient,
superscripts indicate dimensions of simplexes.
The $\vec{s}_0, \dots, \vec{s}_n$ are said to span $S^n$. By
convention, a simplex of dimension $d < 0$ is an
empty simplex.

A simplicial complex (or complex) is a set 
of simplexes closed under containment and in- 
tersection. The dimension of a complex is the 
highest dimension of any of its simplexes. L is 
a subcomplex of K if every simplex of L is 
a simplex of K. A map $\mu : K \to L$ carrying
vertexes to vertexes is simplicial if it also induces 
a map of simplexes to simplexes. 
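
These notions have a direct combinatorial counterpart that is convenient to experiment with. The sketch below works with abstract complexes, that is, each simplex is represented only by its vertex set rather than as a convex hull; this representation and the function names are assumptions made for illustration.

```python
from itertools import combinations

def closure(facets):
    """All nonempty faces of the given simplexes.

    facets: iterable of vertex collections. The result is a set of
    frozensets closed under containment.
    """
    faces = set()
    for facet in facets:
        verts = tuple(facet)
        for r in range(1, len(verts) + 1):
            faces.update(frozenset(c) for c in combinations(verts, r))
    return faces

def dimension(cx):
    """Highest dimension of any simplex (a simplex on d+1 vertices has dimension d)."""
    return max(len(s) for s in cx) - 1

def is_subcomplex(sub, cx):
    """Every simplex of `sub` is a simplex of `cx`."""
    return sub <= cx

def is_simplicial(mu, source, target):
    """mu: dict vertex -> vertex. The vertex map is simplicial if the
    image of every simplex of `source` is a simplex of `target`."""
    return all(frozenset(mu[v] for v in s) in target for s in source)
```

For instance, the closure of one triangle has seven simplexes, and the identity vertex map from a solid triangle to its boundary is not simplicial because the 2-simplex has no image.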


Definition 1 A complex K is k-connected if
every continuous map of the k-sphere to K can be
extended to a continuous map of the (k + 1)-disk.
By convention, a complex is (-1)-connected if
and only if it is nonempty, and every complex is
k-connected for k < -1.


Topology Approach in Distributed Computing


A complex is 0-connected if it is connected 
in the graph-theoretic sense, and a complex is k- 
connected if it has no holes in dimensions k or 
less. The definition of k-connectivity may appear 
difficult to use, but fortunately reasoning about 
connectivity can be done in a combinatorial way, 
using the following elementary consequence of 
the Mayer-Vietoris sequence.


Theorem 2 If K and L are complexes such
that K and L are k-connected, and $K \cap L$
is (k - 1)-connected, then $K \cup L$ is k-connected.


This theorem, plus the observation that any 
non-empty simplex is k-connected for all k, al- 
lows reasoning about a complex’s connectivity 
inductively in terms of the connectivity of its 
components. 


A set of n + 1 sequential processes communicate
either by sending messages to one another
or by applying operations to shared objects. At 
any point, a process may crash: it stops and 
takes no more steps. There is a bound f on the 
number of processes that can fail. Models differ 
in their assumptions about timing. At one end of 
the spectrum is the synchronous model in which 
computation proceeds in a sequence of rounds. 
In each round, a process sends messages to the 
other processes, receives the messages sent to it 
by the other processes in that round, and changes 
state. (Or it applies operations to shared objects.) 
All processes take steps at exactly the same rate, 
and all messages are delivered with exactly the 
same message delivery time. At the other end 
is the asynchronous model in which there is no 
bound on the amount of time that can elapse 
between process steps, and there is no bound on 
the time it can take for a message to be delivered. 
Between these extremes is the semi-synchronous 
model in which process step times and message 
delivery times can vary, but are bounded between 
constant upper and lower bounds. Proving a lower 
bound in any of these models requires a deep
understanding of the global states that can arise in
the course of a protocol’s execution, and of how 
these global states are related. 

Each process starts with an input value taken 
from a set V, and then executes a deterministic 
protocol in which it repeatedly receives one or 
more messages, changes its local state, and sends 
one or more messages. After a finite number of 
steps, each process chooses a decision value and 
halts. 

In the k-set agreement task [5], processes are 
required to (1) choose a decision value after 
a finite number of steps, (2) choose as their 
decision values some process’s input value, and 
(3) collectively choose no more than k distinct 
decision values. When k = 1, this problem is 
usually called consensus [16]. 
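
Conditions (2) and (3) are properties of a finished execution and can be checked mechanically. The sketch below assumes inputs and decisions are recorded as dictionaries keyed by process id; condition (1), termination, cannot be checked from such a record and is not modeled. The function name is illustrative.

```python
def satisfies_k_set_agreement(inputs, decisions, k):
    """Check validity and k-agreement for one execution.

    inputs:    dict process id -> input value
    decisions: dict process id -> decision value (processes that decided)
    k:         maximum number of distinct decision values allowed
    """
    decided = set(decisions.values())
    valid = decided <= set(inputs.values())   # (2) every decision is some input
    agree = len(decided) <= k                 # (3) at most k distinct decisions
    return valid and agree
```

With k = 1 the same check expresses the validity and agreement conditions of consensus.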

Here is the connection between topological 
models and computation. An initial local state of 
process $P$ is modeled as a vertex $\vec{v} = (P, v)$
labeled with $P$’s process id and initial value $v$.
An initial global state is modeled as an $n$-simplex
$S^n = ((P_0, v_0), \dots, (P_n, v_n))$, where the $P_i$
are distinct. The term $ids(S^n)$ denotes the set of
process ids associated with $S^n$, and $vals(S^n)$ the
set of values. The set of all possible initial global
states forms a complex, called the input complex. 

Any protocol has an associated protocol complex 
$\mathcal{P}$, defined as follows. Each vertex is labeled 
with a process id and a possible local state for that 
process. A set of vertexes $\langle P_0, v_0 \rangle, \ldots, \langle P_d, v_d \rangle$ 
spans a simplex of $\mathcal{P}$ if and only if there is some 
protocol execution in which $P_0, \ldots, P_d$ finish the 
protocol with respective local states $v_0, \ldots, v_d$. 
Each simplex thus corresponds to an equivalence 
class of executions that “look the same” to the 
processes at its vertexes. The term $\mathcal{P}(S^m)$ 
denotes the subcomplex of $\mathcal{P}$ corresponding to 
executions in which only the processes in $ids(S^m)$ 
participate (the rest fail before sending any 
messages). If $m < n - f$, then there are no such 
executions, and $\mathcal{P}(S^m)$ is empty. The structure of the 
protocol complex $\mathcal{P}$ depends both on the protocol 
and on the timing and failure characteristics of the 
model. $\mathcal{P}$ often refers to both the protocol and its 
complex, relying on context to disambiguate. 

A protocol solves k-set agreement if there is 
a simplicial map $\delta$, called decision map, carrying 
vertexes of $\mathcal{P}$ to values in $V$ such that if $\vec{p} \in 
\mathcal{P}(S^n)$ then $\delta(\vec{p}) \in vals(S^n)$, and $\delta$ maps the 
vertexes of any given simplex in $\mathcal{P}(S^n)$ to at 
most $k$ distinct values. 


Applications 


The renaming problem is a key tool for un- 
derstanding the power of various asynchronous 
models of computation. 


Open Problems 


Characterizing the full power of the topological 
approach to proving lower bounds remains an 
open problem. 


Cross-References 


Asynchronous Consensus Impossibility 
Renaming 


Recommended Reading 


Perhaps the first paper to investigate the solv- 
ability of distributed tasks was the landmark 
1985 paper of Fischer, Lynch, and Paterson [6] 
which showed that consensus, then considered 
an abstraction of the database commitment prob- 
lem, had no 1-resilient message-passing solution. 
Other tasks that attracted attention include 
renaming [1, 12, 15] and set agreement [3, 5, 10, 
12, 15, 17]. 

In 1988, Biran, Moran, and Zaks [2] gave 
a graph-theoretic characterization of decision 
problems that can be solved in the presence of 
a single failure in a message-passing system. 
This result was not substantially improved until 
1993, when three independent research teams 
succeeded in applying combinatorial techniques 
to protocols that tolerate delays by more 
than one processor: Borowsky and Gafni [3], 
Saks and Zaharoglou [17], and Herlihy and 
Shavit [15]. 
Later, Herlihy and Rajsbaum used homology 
theory to derive further impossibility results for 
set agreement and to unify a variety of known 
impossibility results in terms of the theory of 
chain maps and chain complexes [12]. Using the 
same simplicial model. 

Biran, Moran, and Zaks [2] gave the first de- 
cidability result for decision tasks, showing that 
tasks are decidable in the 1-resilient message- 
passing model. Gafni and Koutsoupias [7] were 
the first to make the important observation that 
the contractibility problem can be used to prove 
that tasks are undecidable, and suggest a strategy 
to reduce a specific wait-free problem for three 
processes to a contractibility problem. Herlihy 
and Rajsbaum [11] provide a more extensive 
collection of decidability results. 

Borowsky and Gafni [3] define an iterated 
immediate snapshot model that has a recursive 
structure. Chaudhuri, Herlihy, Lynch, and 
Tuttle [4] give an inductive construction for 
the synchronous model, and while the resulting 
“Bermuda Triangle” is visually appealing and 
an elegant combination of proof techniques 
from the literature, there is a fair amount of 
machinery needed in the formal description 
of the construction. In this sense, the formal 
presentation of later constructions is substantially 
more succinct. 

More recent work in this area includes separa- 
tion results [8] and complexity lower bounds [9]. 


Problem Definition 


A dynamic graph algorithm maintains a given 
property P on a graph subject to dynamic 
changes, such as edge insertions, edge deletions 
and edge weight updates. A dynamic graph 
algorithm should process queries on property P 
quickly, and perform update operations faster 
than recomputing from scratch, as carried out by 
the fastest static algorithm. A typical definition is 
given below: 


Definition 1 (Dynamic graph algorithm) Given 
a graph and a graph property P, a dynamic graph 
algorithm is a data structure that supports any 
intermixed sequence of the following operations: 


insert(u, v): insert edge (u, v) into the 
graph. 

delete(u, v): delete edge (u, v) from the 
graph. 

query(...): answer a query about property P 
of the graph. 


A graph algorithm is fully dynamic if it can 
handle both edge insertions and edge deletions 
and partially dynamic if it can handle either edge 
insertions or edge deletions, but not both: it is 
incremental if it supports insertions only, and 
decremental if it supports deletions only. Some 
papers study variants of the problem where more 
than one edge can be deleted or inserted at the 
same time, or edge weights can be changed. In 
some cases, an update may be the insertion or 
deletion of a node along with all edges incident to 
it. Some other papers only deal with specific 
classes of graphs, e.g., planar graphs, directed 
acyclic graphs (DAGs), etc. 

There is a vast literature on dynamic graph 
algorithms. Graph problems for which efficient 
dynamic solutions are known include graph con- 
nectivity, minimum cut, minimum spanning tree, 
transitive closure, and shortest paths (see, e.g., [3] 
and the references therein). Many of them explicitly 
update the property P after each update in 
order to answer queries in optimal time. This may 
be a good choice in scenarios where there are 
few updates and many queries. In applications 
where the numbers of updates and queries are 
comparable, a better approach would be to try 
to reduce the update time, possibly at the price 
of increasing the query time. This is typically 
achieved by relaxing the assumption that the 
property P should be maintained explicitly. 

This entry focuses on algorithms for dynamic 
graph problems that maintain the graph property 
implicitly, and thus require non-constant query 
time while supporting faster updates. In particu- 
lar, it considers two problems: dynamic transitive 
closure (also known as dynamic reachability) and 
dynamic all-pairs shortest paths, defined below. 


Definition 2 (Fully dynamic transitive closure) 
The fully dynamic transitive closure problem con- 
sists of maintaining a directed graph under an 
intermixed sequence of the following operations: 


insert(u, v): insert edge (u, v) into the 
graph. 

delete(u, v): delete edge (u, v) from the 
graph. 

query(x, y): return true if there is a directed 
path from vertex x to vertex y, and false other- 
wise. 
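
To make the interface of Definition 2 concrete, here is a minimal sketch of the most implicit solution possible: updates only touch an adjacency structure, and each query runs a breadth-first search. It is a naive baseline under stated assumptions (the class name `LazyReachability` is hypothetical), not one of the algorithms surveyed in this entry; it merely sits at the "constant-time update, linear-time query" end of the tradeoff.

```python
from collections import defaultdict, deque


class LazyReachability:
    """Fully dynamic reachability with O(1) updates and O(n + m) queries."""

    def __init__(self):
        self.succ = defaultdict(set)  # u -> set of direct successors of u

    def insert(self, u, v):
        self.succ[u].add(v)

    def delete(self, u, v):
        self.succ[u].discard(v)

    def query(self, x, y):
        """Return True if there is a directed path from x to y."""
        seen = {x}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            if u == y:
                return True
            for w in self.succ.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False
```

The algorithms discussed in this entry aim for points between this baseline and the opposite extreme, in which the whole transitive closure matrix is maintained explicitly and queries take constant time.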


Definition 3 (Fully dynamic all-pairs shortest 
paths) The fully dynamic all-pairs shortest paths 
problem consists of maintaining a weighted 
directed graph under an intermixed sequence of the 
following operations: 


insert(u, v, w): insert edge (u, v) into the 
graph with weight w. 

delete(u, v): delete edge (u, v) from the 
graph. 

query(x, y): return the distance from x to y in 
the graph, or +∞ if there is no directed path 
from x to y. 


Recall that the distance from a vertex x to a vertex 
y is the weight of a minimum-weight path from x 
to y, where the weight of a path is defined as the 
sum of edge weights in the path. 
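
The distance query of Definition 3 can likewise be answered with no maintained information at all, by running Dijkstra's algorithm at query time. The sketch below is a naive baseline under stated assumptions: the class name is hypothetical, edge weights are assumed to be non-negative, and vertices are assumed to be mutually comparable (for example integers) so that heap entries can be ordered. It is not one of the tradeoff algorithms of this entry.

```python
import heapq
import math


class LazyShortestPaths:
    """Fully dynamic distances: O(1) updates, one Dijkstra run per query."""

    def __init__(self):
        self.weight = {}  # (u, v) -> weight of edge (u, v)
        self.succ = {}    # u -> set of direct successors of u

    def insert(self, u, v, w):
        self.weight[(u, v)] = w
        self.succ.setdefault(u, set()).add(v)

    def delete(self, u, v):
        if self.weight.pop((u, v), None) is not None:
            self.succ[u].discard(v)

    def query(self, x, y):
        """Return the distance from x to y, or +inf if y is unreachable."""
        dist = {x: 0}
        heap = [(0, x)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == y:
                return d
            if d > dist.get(u, math.inf):
                continue  # stale heap entry
            for v in self.succ.get(u, ()):
                nd = d + self.weight[(u, v)]
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return math.inf
```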


Key Results 


This section presents a survey of query/update 
tradeoffs for dynamic transitive closure and dy- 
namic all-pairs shortest paths. 


Dynamic Transitive Closure 

The first query/update tradeoff for this problem 
was devised by Henzinger and King [6], who 
proved the following result: 


Theorem 1 (Henzinger and King 1995 [6]) 
Given a general directed graph, there is a 
randomized algorithm with one-sided error for the 
fully dynamic transitive closure that supports 
a worst-case query time of $O(n/\log n)$ and an 
amortized update time of $O(m\sqrt{n}\log^2 n)$. 


The first subquadratic algorithm for this problem 
is due to Demetrescu and Italiano for the case of 
directed acyclic graphs [4, 5]: 


Theorem 2 (Demetrescu and Italiano 2000 
[4,5]) Given a directed acyclic graph with 
n vertices, there is a randomized algorithm 
with one-sided error for the fully dynamic 
transitive closure problem that supports each 
query in $O(n^{\epsilon})$ time and each insertion/deletion in 
$O(n^{\omega(1,\epsilon,1)-\epsilon} + n^{1+\epsilon})$, for any $\epsilon \in [0,1]$, where 
$\omega(1,\epsilon,1)$ is the exponent of the multiplication of 
an $n \times n^{\epsilon}$ matrix by an $n^{\epsilon} \times n$ matrix. 


Notice that the dependence of the bounds upon 
parameter $\epsilon$ leads to a full range of query/update 
tradeoffs. Balancing the two terms in the update 
bound of Theorem 2 yields that $\epsilon$ must satisfy the 
equation $\omega(1,\epsilon,1) = 1 + 2\epsilon$. The current best 
bounds on $\omega(1,\epsilon,1)$ [2, 7] imply that $\epsilon < 0.575$. 
Thus, the smallest update time is $O(n^{1.575})$, which 
gives a query time of $O(n^{0.575})$ (Table 1): 


Corollary 1 (Demetrescu and Italiano 2000 
[4,5]) Given a directed acyclic graph with n 
vertices, there is a randomized algorithm with 
one-sided error for the fully dynamic transitive 
closure problem that supports each query in 
$O(n^{0.575})$ time and each insertion/deletion in 
$O(n^{1.575})$ time. 

This result has been generalized to the case of 
general directed graphs by Sankowski [13]: 


Theorem 3 (Sankowski 2004 [13]) Given 
a general directed graph with n vertices, there 
is a randomized algorithm with one-sided error 
for the fully dynamic transitive closure problem 
that supports each query in $O(n^{\epsilon})$ time and each 
insertion/deletion in $O(n^{\omega(1,\epsilon,1)-\epsilon} + n^{1+\epsilon})$, for 
any $\epsilon \in [0,1]$, where $\omega(1,\epsilon,1)$ is the exponent 
of the multiplication of an $n \times n^{\epsilon}$ matrix by an 
$n^{\epsilon} \times n$ matrix. 


Corollary 2 (Sankowski 2004 [13]) Given 
a general directed graph with n vertices, there 
is a randomized algorithm with one-sided error 
for the fully dynamic transitive closure problem 
that supports each query in $O(n^{0.575})$ time and 
each insertion/deletion in $O(n^{1.575})$ time. 


Sankowski has also shown how to achieve an 
even faster update time of $O(n^{1.495})$ at the expense 
of a much higher $O(n^{1.495})$ query time: 


Theorem 4 (Sankowski 2004 [13]) Given 
a general directed graph with n vertices, there is 
a randomized algorithm with one-sided error for 
the fully dynamic transitive closure problem that 
supports each query and each insertion/deletion 
in $O(n^{1.495})$ time. 


Roditty and Zwick presented algorithms designed 
to achieve better bounds in the case of sparse 
graphs: 


Theorem 5 (Roditty and Zwick 2002 [10]) 
Given a general directed graph with n vertices 
and m edges, there is a deterministic algorithm 
for the fully dynamic transitive closure problem 
that supports each insertion/deletion in $O(m\sqrt{n})$ 
amortized time and each query in $O(\sqrt{n})$ 
worst-case time. 


Theorem 6 (Roditty and Zwick 2004 [11]) 
Given a general directed graph with n 
vertices and m edges, there is a deterministic 
algorithm for the fully dynamic transitive closure 
problem that supports each insertion/deletion in 
$O(m + n\log n)$ amortized time and each query 
in $O(n)$ worst-case time. 

Trade-Offs for Dynamic Graph Problems, Table 1 Fully dynamic transitive closure algorithms with implicit solution representation 

Type of graphs   Type of algorithm   Update time                          Query time       Reference 
General          Monte Carlo         $O(m\sqrt{n}\log^2 n)$ amortized     $O(n/\log n)$    HK [6] 
DAG              Monte Carlo         $O(n^{1.575})$                       $O(n^{0.575})$   DI [4] 
General          Monte Carlo         $O(n^{1.575})$                       $O(n^{0.575})$   Sank. [13] 
General          Monte Carlo         $O(n^{1.495})$                       $O(n^{1.495})$   Sank. [13] 
General          Deterministic       $O(m\sqrt{n})$ amortized             $O(\sqrt{n})$    RZ [10] 
General          Deterministic       $O(m + n\log n)$ amortized           $O(n)$           RZ [11] 


Observe that the results of Theorem 5 and 
Theorem 6 are subquadratic for $m = o(n^{1.5})$ and 
$m = o(n^2)$, respectively. Moreover, they are not 
based on fast matrix multiplication, which is 
theoretically efficient but impractical. 


Dynamic Shortest Paths 

The first effective tradeoff algorithm for dynamic 
shortest paths is due to Roditty and Zwick in 
the special case of sparse graphs with unit edge 
weights [12]: 


Theorem 7 (Roditty and Zwick 2004 [12]) 
Given a general directed graph with n vertices, m 
edges, and unit edge weights, there is a 
randomized algorithm with one-sided error for the fully 
dynamic all-pairs shortest paths problem that 
supports each distance query in $O\left(t + \frac{n\log n}{k}\right)$ 
worst-case time and each insertion/deletion in 
$O\left(\frac{mn^2\log n}{t^2} + km + \frac{mn\log n}{k}\right)$ amortized time. 


By choosing $k = (n\log n)^{1/2}$ and $(n\log n)^{1/2} \le 
t \le n^{3/4}(\log n)^{1/4}$ in Theorem 7, it is possible to 
obtain an amortized update time of $O\left(\frac{mn^2\log n}{t^2}\right)$ 
and a worst-case query time of $O(t)$. The fastest 
update time of $O(m\sqrt{n\log n})$ is obtained by 
choosing $t = n^{3/4}(\log n)^{1/4}$. 
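
As a quick sanity check of this parameter choice, assume that the bounds of Theorem 7 read as an amortized update time of $O(mn^2\log n/t^2 + km + mn\log n/k)$ and a worst-case query time of $O(t + n\log n/k)$ (this reading of the formulas is an assumption where the printed text is hard to decipher). Substituting the two choices then gives:

```latex
\begin{align*}
k = (n\log n)^{1/2}
  &\;\Rightarrow\;
  km = \frac{mn\log n}{k} = m\sqrt{n\log n},
  \qquad
  \frac{n\log n}{k} = \sqrt{n\log n} \le t, \\
t = n^{3/4}(\log n)^{1/4}
  &\;\Rightarrow\;
  \frac{mn^{2}\log n}{t^{2}}
  = \frac{mn^{2}\log n}{n^{3/2}(\log n)^{1/2}}
  = m\sqrt{n\log n}.
\end{align*}
```

So for the largest admissible $t$ all three update terms coincide at $m\sqrt{n\log n}$, while for smaller $t$ the first term dominates and the query time $O(t)$ improves accordingly.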


Later, Sankowski devised the first sub- 
quadratic algorithm for dense graphs based on 
fast matrix multiplication [14]: 


Theorem 8 (Sankowski 2005 [14]) Given 
a general directed graph with n vertices and 
unit edge weights, there is a randomized 
algorithm with one-sided error for the fully 
dynamic all-pairs shortest paths problem that 
supports each distance query in $O(n^{1.288})$ 
time and each insertion/deletion in $O(n^{1.932})$ 
time. 


Applications 


The transitive closure problem studied in this en- 
try is particularly relevant to the field of databases 
for supporting transitivity queries on dynamic 
graphs of relations [16]. The problem also arises 
in many other areas such as compilers, interac- 
tive verification systems, garbage collection, and 
industrial robotics. 

Application scenarios of dynamic shortest 
paths include network optimization [1], docu- 
ment formatting [8], routing in communication 
systems, robotics, incremental compilation, 
traffic information systems [15], and dataflow 
analysis. A comprehensive review of real-world 
applications of dynamic shortest path problems 
appears in [9]. 


Open Problems 


It is a fundamental open problem whether the 
fully dynamic all pairs shortest paths problem of 
Definition 3 can be solved in subquadratic time 
per operation in the case of graphs with real- 
valued edge weights. 


Problem Definition 


A transactional memory (TM) is a concurrency 
control mechanism for executing accesses 
to memory shared by multiple processes. A 
transaction, in this context, is a section of code 
that executes a series of reads and writes to the 
shared memory as one atomic indivisible unit. 
As a result, intermediate states of a transaction 
are hidden from other concurrent transactions, 
and it is only possible to see either all of the 
modifications of a transaction or none of them. 


Transactional Memory 


The goal of transactional memory is to provide 
an alternative to lock-based concurrency control. 
A programmer can replace the use of lock-based 
critical sections with transactions and rely on 
the TM system to execute these sections concur- 
rently while preserving their atomicity. During 
the execution, the TM system tracks the reads 
and writes to the shared memory by the different 
transactions and, in this way, is able to detect 
conflicts: situations in which transactions are ex- 
ecuting operations to the same memory location. 
Most TM systems are optimistic, executing with 
the expectation that there will be few conflicts or 
none. When a conflict is detected, the TM system 
may have to abort and restart the transaction. 
The modifications to memory performed by a 
transaction must thus be reversible. 

The concept of transactional memory and a 
pure hardware implementation of it (HTM) were 
proposed by Herlihy and Moss [9] in 1993. Two 
years later, Shavit and Touitou proposed a pure 
software implementation (STM) [16], and since 
then HTM and STM systems have been the 
focus of intensive research efforts to make them 
simple and practical for general use. Today’s TM 
systems are typically not pure hardware or pure 
software, but rather a hybrid of HTM and STM. 


Key Results 


TM C/C++ Specification and Compiler 
Support 

Transactional memory became an industry stan- 
dard with the addition of transactional language 
constructs into the C++ specification [1]. The lat- 
est GNU C/C++ compiler implements these TM 
constructs and provides runtime support for state- 
of-the-art TM algorithms. Figure 1 shows an 
example of a GCC TM transaction that is defined 
by using the new __transaction_atomic keyword. 


HTM in Mainstream Processors 

The latest commodity Intel and IBM processors 
provide support for hardware transactions 
by leveraging the processor’s hardware 
cache-coherence protocol to track transactional reads 
and writes and detect conflicts. Unfortunately, 
they provide no progress guarantee for hardware 
transactions: a transaction may fail for 
a hardware-related reason (such as an L1 cache 
capacity overflow or an interrupt), and this can 
happen repeatedly, so the transaction may never 
succeed. To overcome this limitation and provide 
a progress guarantee, researchers have developed 
hybrid TM systems [5, 10, 11] that execute failed 
hardware transactions in an all-software fallback 
path. 


    #include <stdbool.h>  /* true, false */
    #include <stddef.h>   /* NULL */

    /* Node layout assumed here; the original figure does not show it. */
    typedef struct node {
        int value;
        struct node *left, *right;
    } node;

    /* Build with GCC's -fgnu-tm option to enable __transaction_atomic. */
    int red_black_tree_contains(node *root, int value) {
        node *cur_node = root;

        __transaction_atomic
        {
            while (cur_node != NULL)
            {
                if (cur_node->value == value) {
                    return true;
                }
                /* Assumes the usual ordering: smaller keys on the left. */
                if (value < cur_node->value) {
                    cur_node = cur_node->left;
                } else {
                    cur_node = cur_node->right;
                }
            }
            return false;
        }
    }


Transactional Memory, Fig. 1 An example of using 
the GCC TM mechanism to define the red-black tree 
contains(...) operation as a transaction 


STM Implementations 

Software transactions have become much faster 
and more practical since their introduction 
by Shavit and Touitou. The state-of-the-art 
TL2/LSA style STM designs [6, 7] provide 
software transactions with a guarantee of 
opacity [8]: the transaction always executes 
on a consistent memory state. Opacity enables 
simple STM runtime implementations, since it 
effectively eliminates the need to detect and 
handle any runtime errors that could be generated 
by inconsistent executions. 

The TL2/LSA style STMs use a global clock 
and per-object metadata to coordinate 
transactions, which introduces high constant overheads 
for reads and writes compared to the pure 
execution of those reads and writes in hardware. 
As a result, the TL2/LSA STMs usually perform 
well at high concurrency levels, but exhibit poor 
results for low concurrency. Unlike hardware 
transactions, they provide a progress guarantee. 
An alternative design to TL2/LSA is the NORec 
STM [4] that has no per-object metadata and 
only uses a single global clock to coordinate 
transactions. The overheads of NORec are very 
low, and it performs well at low concurrency levels. 


Hardware Lock Elision 

Hardware lock elision (HLE) [14] is a mech- 
anism provided by the HTM systems of Intel 
and IBM and used to optimize lock-based crit- 
ical sections. The idea of HLE is simple: try 
to execute the lock-based critical sections con- 
currently, by using hardware transactions, and if 
there is a conflict, then fall back to the serial 
lock-based execution. In this way, the HLE can 
automatically introduce concurrency into non- 
conflicting lock-based critical sections, without 
the need to modify the existing application’s 
code. 


Hybrid TM 

In order to provide both the performance of 
hardware and the guarantee of progress of the 
software implementations, recent TM systems are 
a hybrid of HTM and STM. A typical hybrid 
TM first tries to execute transactions in hardware, 
and if the transaction fails to commit, then it 
falls back to execute it in software. The key 
feature of a good hybrid TM is that it provides 
concurrency between transactions, some of which 
are executing in hardware and some in software. 
Recent research shows that it is challenging to 
provide hardware-software coordination to make 
hybrid TMs work efficiently [2, 3, 12, 13, 15]. 


TM Applications 

It is still not clear how exactly TM will be used. 
The intention is that TM will replace the use 
of locks in application code. Replacing locks in 
existing code is proving to be a complex task. 
The main issues arise from the fact that trans- 
actions must be able to abort. This means that 
any side effect or update of a transaction must be 
reversible, and this constrains the programmer to 
use only functions that can be undone. The main 
problem now is that the standard libraries and the 
C++ STL do not provide full support for TM. 
Providing such support would be a major step 
forward toward simple applicability. 

The hope going forward is that new 
programming languages will include transac- 
tional memory mechanisms in the language itself 
and thus allow future code to be written a priori in 
a transactional fashion without the use of locks. 


Problem Definition 


In the traveling salesman problem (TSP), n cities 
1, 2, ..., n together with all the pairwise 
distances d(i, j) between cities i and j are given. 
The goal is to find the shortest tour that visits 
every city exactly once and in the end returns 
to its starting city. The TSP is one of the most 
famous problems in combinatorial optimization, 
and it is well-known to be NP-hard. For more 
information on the TSP, the reader is referred to 
the book by Lawler, Lenstra, Rinnooy Kan, and 
Shmoys [14]. 

A special case of the TSP is the so-called 
Euclidean TSP, where the cities are points in the 
Euclidean plane, and the distances are simply 
the Euclidean distances. A special case of the 
Euclidean TSP is the convex Euclidean TSP, 
where the cities are further restricted so that they 
lie in convex position. The Euclidean TSP is 
still NP-hard [4, 17], but the convex Euclidean 
TSP is quite easy to solve: Running along the 
boundary of the convex hull yields a shortest 
tour. Motivated by these two facts, the following 
natural question is posed: What is the influence 
of the number of inner points on the complexity 
of the problem? Here, an inner point of a finite 
point set P is a point from P which lies in the 
interior of the convex hull of P. Intuition says that 
“Fewer inner points make the problem easier to 
solve.” 

The result below answers this question and 
supports the intuition above by providing simple 
exact algorithms. 


Key Results 


Theorem 1 The special case of the Euclidean 
TSP with few inner points can be solved in the 
following time and space complexity. Here, n 
denotes the total number of cities and k denotes 
the number of cities in the interior of the convex 
hull. 

1. In time $O(k!\,kn)$ and space $O(k)$. 
2. In time $O(2^k k^2 n)$ and space $O(2^k kn)$ [1]. 


Here, assume that the convex hull of a given point 
set is already determined, which can be done in 
time $O(n \log n)$ and space $O(n)$. Further, note that 
the above space bounds do not count the space 
needed to store the input; they count only the 
space in working memory (as usual in theoretical 
computer science). 
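
The first bound can be illustrated by a short program. The sketch below relies on the well-known fact that an optimal Euclidean tour does not cross itself and therefore visits the hull vertices in their cyclic order; it then tries every order of the inner points and, for each, interleaves the two sequences by dynamic programming. It is an illustration under stated assumptions, not the implementation of [1]: the function names are hypothetical, points are assumed to be in general position (no point lies on a hull edge without being a hull vertex), and the table uses $O(nk)$ space rather than the $O(k)$ of Theorem 1.

```python
from itertools import permutations
from math import dist, inf


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def tsp_few_inner_points(points):
    """Length of a shortest tour; about k! * k * n steps for k inner points."""
    hull = convex_hull(points)
    on_hull = set(hull)
    inner = [p for p in set(points) if p not in on_hull]
    h, k = len(hull), len(inner)
    best = inf
    for order in permutations(inner):
        # H[i][j] / I[i][j]: shortest path from hull[0] that has visited
        # hull[0..i] and order[0..j-1], ending at hull[i] / at order[j-1].
        H = [[inf] * (k + 1) for _ in range(h)]
        I = [[inf] * (k + 1) for _ in range(h)]
        H[0][0] = 0.0
        for i in range(h):
            for j in range(k + 1):
                if i > 0:
                    H[i][j] = H[i - 1][j] + dist(hull[i - 1], hull[i])
                    if j > 0:
                        H[i][j] = min(H[i][j],
                                      I[i - 1][j] + dist(order[j - 1], hull[i]))
                if j > 0:
                    I[i][j] = H[i][j - 1] + dist(hull[i], order[j - 1])
                    if j > 1:
                        I[i][j] = min(I[i][j],
                                      I[i][j - 1] + dist(order[j - 2], order[j - 1]))
        total = H[h - 1][k] + dist(hull[h - 1], hull[0])
        if k > 0:
            total = min(total, I[h - 1][k] + dist(order[k - 1], hull[0]))
        best = min(best, total)
    return best
```

When there are no inner points, the loop runs once with an empty order and the result is simply the perimeter of the convex hull, matching the convex case described in the problem definition.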

Theorem 1 implies that, from the viewpoint 
of parameterized complexity [2, 3, 16], these 
algorithms are fixed-parameter algorithms, when 
the number k of inner points is taken as a 
parameter, and hence the problem is fixed-parameter 
tractable (FPT). (A fixed-parameter algorithm 
has running time $O(f(k)\,\mathrm{poly}(n))$, where n is the 
input size, k is a parameter and $f: \mathbb{N} \to \mathbb{N}$ is an 
arbitrary computable function. For example, an 
algorithm with running time $O(5^k n)$ is a 
fixed-parameter algorithm whereas one with $O(n^k)$ is 
not.) Observe that the second algorithm gives 
a polynomial-time exact solution to the problem 
when $k = O(\log n)$. 

The method can be extended to some general- 
ized versions of the TSP. For example, Deineko 
et al. [1] stated that the prize-collecting TSP and 
the partial TSP can be solved in a similar manner. 


Applications 


The theorem is motivated more from a theoretical 
side rather than an application side. No real-world 
application has been assumed. 

As for the theoretical application, the view- 
point (introduced in the problem definition sec- 
tion) has been applied to other geometric prob- 
lems. Some of them are listed below. 


The Minimum Weight Triangulation Problem: 
Given n points in the Euclidean plane, the 
problem asks to find a triangulation of the 
points which has minimum total length. The 
problem is now known to be NP-hard [15]. 
Hoffmann and Okamoto [10] proved that 
the problem is fixed-parameter tractable 
with respect to the number k of inner 
points. The time complexity they gave is 
$O(6^k n^5 \log n)$. This was subsequently improved 
by Grantson, Borgelt, and Levcopoulos [6] 
to $O(4^k k n^4)$ and by Spillner [18] to $O(2^k k n^3)$. 
Yet other fixed-parameter algorithms have 
also been proposed by Grantson, Borgelt, 
and Levcopoulos [7, 8]. The currently best 
time complexity was given by Knauer and 
Spillner [13] and it is $O(2^{c\sqrt{k}\log k} k^{3/2} n^3)$, 
where $c = (2 + \sqrt{2})/(\sqrt{3} - \sqrt{2}) < 11$. 

The Minimum Convex Partition Problem: 
Given n points in the Euclidean plane, the 
problem asks to find a partition of the convex 
hull of the points into the minimum number of 
convex regions having some of the points as 
vertices. Grantson and Levcopoulos [9] gave an 
algorithm running in $O(k^{6k-5} 2^{16k} n)$ time. Later, 
Spillner [19] improved the time complexity to 
$O(2^k k^3 n^3)$. 

The Minimum Weight Convex Partition Problem: 
Given n points in the Euclidean plane, 
the problem asks to find a convex partition of 
the points with minimum total length. 
Grantson [5] gave an algorithm running 
in $O(k^{6k-5} 2^{16k} n)$ time. Later, Spillner [19] 
improved the time complexity to $O(2^k k^3 n^3)$. 

The Crossing Free Spanning Tree Problem: 
Given an n-vertex geometric graph (i.e., 
a graph drawn on the Euclidean plane 
where every edge is a straight line segment 
connecting two distinct points), the problem 
asks to determine whether it has a spanning 
tree without any crossing of the edges. Jansen 
and Woeginger [11] proved this problem is 
NP-hard. Knauer and Spillner [12] gave 
algorithms running in $O(175^k k^2 n^3)$ time and 
$O(2^{33\sqrt{k}\log k} k^2 n^3)$ time. 

The method proposed by Knauer and Spillner 
[12] can be adapted to the TSP as well. 
According to their result, the currently best time 
complexity for the TSP is $2^{O(\sqrt{k}\log k)}\,\mathrm{poly}(n)$. 


Open Problems 


Currently, no lower bound result for the time 
complexity seems to be known. For example, is it 
possible to prove, under a reasonable 
complexity-theoretic assumption, the impossibility of the 
existence of an algorithm running in $2^{O(\sqrt{k})}\,\mathrm{poly}(n)$ 
time for the TSP? 


Problem Definition 


A tree is a connected graph with no cycle. A 
rooted tree is a tree with one designated vertex, 
called the root. For each vertex v except the root 
in a rooted tree, the parent of v is the neighbor 
vertex of v on the path between v and the root. If 
vertex p is the parent of vertex c, then c is a child 
of p. An ordered tree is a rooted tree in which 
the children of each vertex are ordered. The five 
ordered trees having four vertices are shown in 
Fig. 1. An unordered tree is a rooted tree in which 
the ordering of the children of each vertex does 
not matter. The four unordered trees having four 
vertices are shown in Fig. 2. 

Given an integer n the problem of tree 
enumeration asks for generating all ordered 
(or unordered) trees with n vertices. Several 
tree generation algorithms are explained in 
[3] and [2]. 


Key Results 


Tree counting began with Cayley in 1889 to 
enumerate the saturated hydrocarbons, $C_nH_{2n+2}$, 
which can be modeled as trees. 

The number of ordered trees with n vertices is 
$C_{n-1}$ [6], where $C_n$ is the nth Catalan number, 
defined as follows: 


$C_n = \frac{1}{n+1}\binom{2n}{n}$ 


The number of binary trees with n + 1 leaves is $C_n$. 
No formula for the number of unordered trees 
with n vertices is known, but the number for 
n < 40 is listed at [6, p. 624]. There is a 
natural one-to-one correspondence between ordered 
trees with n vertices and binary trees with n leaves 
[3]. (For each vertex v of an ordered tree, if we 
regard its first child and its next younger sibling 
as the left child and the right child of v, we obtain 
a binary tree in which the root has only one 
child.) So one can use an enumeration algorithm for 
ordered trees to enumerate binary trees. 
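
The counting formula is easy to evaluate exactly with integer arithmetic. The following minimal sketch (the function names are chosen here for illustration) computes the Catalan numbers and the resulting number of ordered trees:

```python
from math import comb


def catalan(n):
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_ordered_trees(n):
    """Number of ordered trees with n >= 1 vertices, i.e. C_{n-1}."""
    return catalan(n - 1)
```

For n = 4 this gives C_3 = 5, in agreement with the five ordered trees on four vertices mentioned in the problem definition.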


Enumeration of All Ordered Trees 

Using reverse search method [1], one can enu- 
merate all ordered trees with n vertices in O(1) 
time for each [4]. We sketch the method in [4]. 

Tree Enumeration, Fig. 1 The ordered trees with four 
vertices 

Tree Enumeration, Fig. 2 The unordered trees with four 
vertices 


Let $S_n$ be the set of all ordered trees with 
$n > 1$ vertices. Let $T$ be a tree in $S_n$ and 
$RP = (r_0, r_1, \ldots, r_k)$ be the “rightmost path” 
of $T$, which is the path from the root to the 
rightmost leaf (a leaf is a vertex having no child) 
such that $r_i$ is the rightmost child of $r_{i-1}$ for 
each $i = 1, 2, \ldots, k$. Removing the last vertex 
$r_k$ and the edge attached to it results in a tree 
with one fewer vertex. We repeat such removal 
of the last vertex of the rightmost path, until the 
resulting tree consists of exactly one vertex. An 
example of such repetitive removal is shown in 
Fig. 3. We call the sequence of ordered trees the 
removal sequence of $T$. The sequence has n trees 
and always ends with the tree with exactly one 
vertex. If we merge the removal sequences of all 
$T$ in $S_n$, then we have the (unordered) tree $T_n$, 
called the family tree of $S_n$. An example is shown 
in Fig. 4. Note that $T_n$ has all trees in $S_n$ at its 
leaves. 


The reverse search method [1] efficiently 
traverses the family tree (without storing the family 
tree in memory) and outputs each tree in $S_n$ 
at each leaf. Thus, we can efficiently enumerate 
all trees in $S_n$. The algorithm enumerates all 
ordered trees with n vertices in O(1) time for 
each [4]. 
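
The traversal of the family tree can be sketched in a few lines if each ordered tree is encoded by the sequence of depths of its vertices in preorder (an encoding assumed here for illustration; it is not necessarily the representation used in [4]). The last entry of the sequence is then the end of the rightmost path, so removing the last vertex of the rightmost path means dropping the last entry, and the children of a tree in the family tree are obtained by appending a new rightmost leaf at any depth from 1 to one more than the last depth. The sketch is a plain depth-first traversal and makes no claim of constant time per tree.

```python
def ordered_trees(n):
    """Yield every ordered tree with n >= 1 vertices as a tuple of preorder depths."""

    def grow(depths):
        if len(depths) == n:
            yield tuple(depths)  # a leaf of the family tree
            return
        # Attach a new rightmost leaf to any vertex of the rightmost path:
        # its depth ranges from 1 (child of the root) to depths[-1] + 1.
        for d in range(1, depths[-1] + 2):
            depths.append(d)
            yield from grow(depths)
            depths.pop()

    yield from grow([0])  # the single-vertex tree is the root of the family tree
```

For example, `(0, 1, 2, 3)` encodes the path on four vertices and `(0, 1, 1, 1)` encodes the star whose root has three children.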

With some additional ideas, given two integers 
n and k, one can also enumerate all ordered trees 
with n vertices including k leaves in O(1) time 
for each [7]. 


Enumeration of All Unordered Trees 

Using a generalized version of the algorithm 
above, one can also enumerate all unordered 
trees with n vertices in O(1) time for each [5]. 
The algorithm generates the next tree in O(1) 
time using the “prepostorder traversal” technique 
[3, p. 31]. Since the ordering of the children of 
each vertex is not fixed we define a “canoni- 
cal” ordered tree for each unordered tree and 
define the family tree of the canonical ordered 
trees. The structure of the family tree is not so 
simple, and this results in a more complicated 
algorithm. 

Tree Enumeration, Fig. 3 An example of the removal sequence 


Tree Enumeration, Fig. 4 The family tree $T_5$ 
Problem Definition


The treewidth of graphs is defined in terms of tree
decompositions. A tree decomposition of a graph
$G = (V, E)$ is a pair $(\{X_i \mid i \in I\}, T = (I, F))$ with
$\{X_i \mid i \in I\}$ a collection of subsets of V, called
bags, and T a tree, such that


* $\bigcup_{i \in I} X_i = V$.

* For all $\{v, w\} \in E$, there is an $i \in I$ with
$v, w \in X_i$.

* For all $v \in V$, the set $\{i \in I \mid v \in X_i\}$ induces
a connected subtree of T.


The width of a tree decomposition is $\max_{i \in I} |X_i| - 1$,
and the treewidth of a graph G is the minimum
width of a tree decomposition of G (Fig. 1).
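
The three conditions can be checked mechanically. The sketch below is a minimal Python illustration, assuming bags are given as a dictionary from tree nodes to vertex sets and the tree T as a list of node pairs; the function names `is_tree_decomposition` and `width` are hypothetical.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three conditions; bags maps each tree node i to the set X_i."""
    nodes = set(bags)
    adj = {i: set() for i in nodes}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)

    def connected(sub):
        """True if the node set sub is non-empty and connected in T."""
        if not sub:
            return False
        start = next(iter(sub))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j in sub and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen == sub

    # T must be a tree: connected, with one edge fewer than nodes.
    if len(tree_edges) != len(nodes) - 1 or not connected(nodes):
        return False
    # Every vertex lies in some bag, and its bags form a connected subtree.
    for v in vertices:
        if not connected({i for i in nodes if v in bags[i]}):
            return False
    # Every edge of G is contained in some bag.
    return all(any(v in bags[i] and w in bags[i] for i in nodes)
               for v, w in edges)


def width(bags):
    """Width of a decomposition: the largest bag size minus one."""
    return max(len(b) for b in bags.values()) - 1


# A cycle on four vertices and a decomposition with two bags.
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
cycle_bags = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
```

For the four-cycle above, the two bags joined by the single tree edge (1, 2) form a valid decomposition of width 2.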

An alternative definition is in terms of chordal 
graphs. A graph G = (V, E) is chordal, if and 
only if each cycle of length at least 4 has a chord, 
i.e., an edge between two vertices that are not
successive on the cycle. A graph G has treewidth at
most k if and only if G is a subgraph of a chordal
graph H that has maximum clique size at most k + 1.


A third alternative definition is in terms of
orderings of the vertices. Let $\pi$ be a permutation
(called elimination scheme in this context) of the
vertices of $G = (V, E)$. Repeat the following
step for $i = 1, \ldots, |V|$: take vertex $v = \pi(i)$, turn
the set of its neighbors into a clique, and then
remove v. The width of $\pi$ is the maximum over
all vertices of its degree when it was eliminated.
The treewidth of G equals the minimum width
over all elimination schemes.
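
The elimination-scheme definition translates directly into code. The following Python sketch assumes the graph is given as a dictionary mapping each vertex to the set of its neighbors; `elimination_width` and `treewidth_bruteforce` are hypothetical names, and the brute-force search over all permutations is only meant for very small graphs.

```python
from itertools import permutations


def elimination_width(adj, order):
    """Width of an elimination scheme: largest degree seen at elimination time."""
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    width = 0
    for v in order:
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # turn the neighborhood of v into a clique
    return width


def treewidth_bruteforce(adj):
    """Minimum width over all elimination schemes (exponential time)."""
    return min(elimination_width(adj, p) for p in permutations(adj))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

On the path, eliminating an endpoint first gives width 1, whereas starting with an inner vertex gives width 2; this shows why the minimum over all schemes is taken.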

In the treewidth problem, the given input is an
undirected graph $G = (V, E)$, assumed to be
given in its adjacency list representation, and a
positive integer $k < |V|$. The problem is to decide
if G has treewidth at most k and, if so, to give
a tree decomposition of G of width at most k.


Key Results 


Theorem 1 (Arnborg et al. [2]) The problem,
given a graph G and an integer k, to decide
if the treewidth of G is at most k is nondeterministic
polynomial-time (NP) complete.

For many applications of treewidth and tree 
decompositions, the case where k is assumed 
to be a fixed constant is very relevant. Arnborg 
et al. [2] gave in 1987 an algorithm that solves
this problem in $O(n^{k+2})$ time. A number of faster
algorithms for the problem with k fixed have been 
found; see, e.g., [6] for an overview. 


Theorem 2 (Bodlaender [5]) For each fixed 
k, there is an algorithm that, given a graph 
G =(V,E) and an integer k, decides if the 
treewidth of G is at most k and, if so, that finds 
a tree decomposition of width at most k in O(n) 
time. 


The result of Theorem 2 is of theoretical
importance only: in a practical setting, the
algorithm appears to be much too slow owing to
the large constant factor hidden in the O-notation.
For treewidth 1, the problem is equivalent to 
recognizing trees. Efficient algorithms based on 
a small set of reduction rules exist for treewidth 
2 and 3 [1]. 
Treewidth of Graphs, Fig. 1 A graph and a tree decomposition of width 2


Two often-used heuristics for treewidth are the 
minimum fill-in and minimum degree heuristic. 
In the minimum degree heuristic, a vertex v of 
minimum degree is chosen. The graph G’, ob- 
tained by making the neighborhood of v a clique 
and then removing v and its incident edges, is 
built. Recursively, a chordal supergraph H’ of 
G’ is made with the heuristic. Then, a chordal 
supergraph H of G is obtained, by adding v and 
its incident edges from G to H’. The minimum 
fill-in heuristic works similarly, but now a vertex v
is selected such that the number of edges that are
added to make the neighborhood of v a clique is
as small as possible.
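
Both heuristics can be sketched in a few lines of Python. The version below is iterative rather than recursive and returns the elimination order together with the resulting width, which is an upper bound on the treewidth. The graph encoding (a dictionary of neighbor sets) and the name `greedy_elimination` are assumptions made for this illustration, and ties are broken arbitrarily.

```python
def greedy_elimination(adj, rule="min_degree"):
    """Return (order, width) found by the minimum degree or minimum fill-in rule."""
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy

    def fill_in(v):
        # Number of edges missing between the neighbors of v.
        nbrs = list(g[v])
        return sum(1 for i, a in enumerate(nbrs)
                   for b in nbrs[i + 1:] if b not in g[a])

    order, width = [], 0
    while g:
        if rule == "min_degree":
            v = min(g, key=lambda u: len(g[u]))
        else:  # "min_fill"
            v = min(g, key=fill_in)
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # make the neighborhood of v a clique
        order.append(v)
    return order, width


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

The bags of a tree decomposition of the reported width can be read off from the order: each eliminated vertex together with its neighbors at elimination time forms one bag.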


Theorem 3 (Fomin and Villanger [11]) There
is an algorithm that, given a graph G = (V, E),
determines the treewidth of G and finds a tree
decomposition of G of minimum width that uses
$O(1.7549^n)$ time.


Bouchitté and Todinca [10] showed that the 
treewidth can be computed in polynomial time 
for graphs that have a polynomial number of min- 
imal separators. This implies polynomial-time 
algorithms for several classes of graphs, e.g., 
permutation graphs, weakly triangulated graphs. 


Applications 


One of the main applications of treewidth and 
tree decomposition is that many problems that 
are intractable (e.g., NP-hard) on arbitrary graphs 
become polynomial time or linear time solvable 
when restricted to graphs of bounded treewidth. 
The problems where this technique can be ap- 
plied include many of the classic graph and net- 
work problems, like Hamiltonian circuit, Steiner 
tree, vertex cover, independent set, and graph 
coloring, but it can also be applied to many other 
problems. The technique can sometimes be used
for directed graphs [12]. It is also used in the 
algorithm by Lauritzen and Spiegelhalter [14] 
to solve the inference problem on probabilistic 
(“Bayesian,” or “belief”) networks. Such algo- 
rithms typically have the following form. First, 
a tree decomposition of bounded width is found, 
and then a dynamic programming algorithm is 
run that uses this tree decomposition. Often, the 
running time of this dynamic programming al- 
gorithm is exponential in the width of the tree 
decomposition that is used, and thus one wants 
to have a tree decomposition whose width is as 
small as possible. 

There are also general characterizations of 
classes of problems that are solvable in linear 
time on graphs of bounded treewidth. Most no- 
table is the class of problems that can be formu- 
lated in monadic second-order logic and exten- 
sions of these. 

Treewidth has been used in the context of 
several applications or theoretical studies, includ- 
ing graph minor theory, data bases, constraint 
satisfaction, frequency assignment, compiler op- 
timization, and electrical networks. 


Open Problems 


There are polynomial-time approximation algorithms
for treewidth that guarantee a width of
$O(k \sqrt{\log k})$ for graphs of treewidth k. Austrin
et al. [3] show that there is no constant factor ap- 
proximation for treewidth under the small set ex- 
pansion conjecture. A long-standing open prob- 
lem is whether there is a polynomial-time algo- 
rithm to compute the treewidth of planar graphs. 
Also open is to find an algorithm for the case
where the bound on the treewidth k is fixed
and whose running time as a function of n is
polynomial and as a function of k improves
significantly on the algorithm of Theorem 2.

The base of the exponent of the running time 
of the algorithm of Theorem 3 can possibly be 
improved. 


Experimental Results 


Many algorithms (upper-bound heuristics, lower- 
bound heuristics, exact algorithms, and prepro- 
cessing methods) for treewidth have been pro- 
posed and experimentally evaluated. An overview 
of many of such results is given in [10]. A variant 
of the algorithm by Arnborg et al. [2] was imple- 
mented by Shoikhet and Geiger [18]. Rohrig [17] 
has experimentally evaluated the linear-time al- 
gorithm of Bodlaender [5] and established that it 
is not practical, even for small values of k. The 
minimum degree and minimum fill-in heuristics 
are frequently used [13]. 


Data Sets 


A collection of test graphs and results for many 
of the algorithms on these graphs can be found in 
the TreewidthLIB collection [7]. 
Problem Definition


This problem investigates the effect of the lack 
of input information on computational hardness. 
The central question under investigation is the 
following: 


How much extra difficulty is introduced due to the 
lack of input knowledge? 


We explore this question by studying search
problems. Suppose that on an input instance x,
there is a set S(x) of solutions. A search problem
is to find a solution $s \in S(x)$ for the input x.
More specifically, we consider the fairly broad
class of Constraint Satisfaction Problems (CSPs):
Suppose that there is an input space $\{0,1\}^n$ and
a space $\Omega = \{0,1\}^n$ of candidate solutions.
The problem is defined by a number of constraints
$C_1, C_2, \ldots, C_m(, \ldots)$, where each
$C_i : \{0,1\}^{2n} \to \{0,1\}$ is a 0-1 function on the input
and solution variables. The valid solutions for
input x are defined as those s that satisfy all
constraints $C_i$, i.e., those in $\{s : C_i(x, s) = 1, \forall i\}$.
Note that the number of constraints can range
from constant to polynomial, exponential, or even
infinite. CSPs are a subject of intensive research
in theoretical computer science, artificial
intelligence, and operations research, and they
provide a common basis for exploration of a large
number of problems with both theoretical and
practical importance.

The standard setting for CSP is to find a
solution s on a given input x. Now consider
the situation in which the input x is unknown.
For a search problem A, denote by $A_u$ the
same search problem with unknown inputs. For
example, in the StableMatching problem, the
input contains the preference lists of all men and
women; in StableMatching$_u$, these preference
lists are unknown to us. The constraints are that
all man-woman pairs (m, w) are not blocking
pairs, and the task is to find a solution that
satisfies all constraints, namely, a stable matching.

The method of searching for a solution of 
an unknown CSP follows a trial and error ap- 
proach. Trial and error is a basic methodology 
in problem solving and knowledge acquisition, 
and it has also been used extensively in product 
design and experiments. In our setting for CSPs, 
an algorithm can propose a candidate solution s. 
If s is not a valid solution, then we are told so 
by a verification oracle V, and furthermore, V 
also gives the index of one constraint that is not 
satisfied. If s is a valid solution, i.e., it satisfies all 
constraints, V returns an affirmative answer, and 
the problem is solved. Two remarks: 


e¢ If more than one constraint is violated, then 
(the index of) any one of them can be returned 
by V. 

¢ Note that V does not reveal the constraint 
itself, but only its index. 


Given the verification oracle V, an algorithm 
is an interactive process with V. The algorithm 
chooses candidate solutions (i.e., trials), and the 
oracle returns violations (i.e., errors). The process 
is adaptive, i.e., a newly proposed solution can be 
based on the historical information returned by 
the oracle. 

Because the focus is on how much extra difficulty
is introduced by the lack of input information
for a search problem A, we isolate this
effect by comparing the unknown-input and
known-input complexities. To this end, the algorithms are
equipped with another oracle, the computation
oracle, which can solve the known-input version
of the same problem A. Thus overall, trial-and-error
algorithms can access two oracles, the verification
oracle and the computation oracle.
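
The interaction with the two oracles can be made concrete for SAT with unknown clauses. The Python sketch below is an illustration under stated assumptions, not necessarily the algorithm of [4]: literals are encoded as nonzero integers (a hypothetical convention), the class `VerificationOracle` hides the formula and reveals only the index of a violated clause, and the brute-force `solve_known` stands in for the computation oracle. The solver keeps, for each clause index, the set of literals the hidden clause may still contain; a rejected trial shows that the clause contains no literal satisfied by that trial.

```python
from itertools import product


class VerificationOracle:
    """Hides a CNF formula; on a trial returns a violated clause index or None."""

    def __init__(self, clauses):
        self._clauses = clauses  # literal l > 0 means x_l, l < 0 means NOT x_|l|
        self.trials = 0

    def __call__(self, assignment):
        self.trials += 1
        for i, clause in enumerate(self._clauses):
            if not any(assignment[abs(l) - 1] == (l > 0) for l in clause):
                return i
        return None


def solve_known(n, clauses):
    """Stand-in for the computation oracle: brute-force known-input SAT."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return bits
    return None


def solve_unknown_sat(n, m, oracle):
    """Trial and error for SAT with n variables and m hidden clauses."""
    literals = set(range(1, n + 1)) | set(range(-n, 0))
    candidates = [set(literals) for _ in range(m)]  # supersets of hidden clauses
    while True:
        s = solve_known(n, candidates)
        if s is None:
            return None  # even the weakest consistent formula is unsatisfiable
        i = oracle(s)
        if i is None:
            return s
        # Clause i is violated by s, so it holds only literals that s falsifies.
        candidates[i] = {l for l in candidates[i] if s[abs(l) - 1] != (l > 0)}


hidden = [(1, 2), (-1, 3), (-2, -3)]
oracle = VerificationOracle(hidden)
solution = solve_unknown_sat(3, 3, oracle)
```

Every rejected trial removes at least one literal from one candidate clause, so this sketch makes at most 2nm rejected trials before it either finds a solution or certifies that none exists.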

The model is motivated by several
applications in practice; see [4] for more
discussion.


Time Complexity 

As is standard in complexity theory, a query
to either oracle has a unit time cost. The time
complexity of a problem with unknown inputs is
the minimum time needed for an algorithm to
solve it for all inputs and all verification oracles
consistent with the input. The standard notation
of computational complexity theory for complexity
classes such as P and NP and for oracles
is employed. For example, $A_u \in P^{V,A}$ means
that problem $A_u$ can be solved by a polynomial-time
algorithm with verification oracle V and the
computation oracle that can solve the known-input
version of A. If this occurs, then one considers
the extra complexity (resulting from the
unknown input) not to be very high. The central
question can therefore be translated to the following:
given a search problem A, is $A_u \in P^{V,A}$? If
the given known-input problem A is in P, then
the computation oracle can be omitted, and the
question becomes “Is $A_u \in P^V$?”


Trial Complexity 

The trial complexity of an unknown-input problem
$A_u$ is defined as the minimum number of
queries to the verification oracle that any algorithm
needs to make, regardless of its computational
power. As is standard in query complexity
theory, one can consider deterministic or (Las
Vegas) randomized algorithms. Denote by $D(A_u)$
and $R(A_u)$ the deterministic and randomized trial
complexities of $A_u$, respectively.


Key Results 


The trial and time complexities of a number of 
problems are investigated in the trial and error 
model. 
Theorem 1 ([4]) For the following problems A,
we have $A_u \in P^{V,A}$.


* Nash: Find a Nash equilibrium of a normal-form
game.

* Core: Find a core of a cooperative game.

* StableMatching: Find a stable matching of a
two-sided market with preference lists.

* SAT: Find a satisfying assignment of a CNF
formula.


Nash is a fundamental problem in game the- 
ory, and its complexity has been characterized 
as PPAD-complete [6,7]. Core is a fundamental 
problem in cooperative game theory [10]. Both 
problems are naturally defined as CSPs. Nash 
can be formulated as a CSP of finding a pair of 
mixed strategies, where the constraints are that 
for each player, for each strategy, adopting that 
strategy is not better than the current (mixed) 
strategy. StableMatching is a problem with in- 
teresting combinatorial structures and many ap- 
plications, such as the pairing of graduating med- 
ical students with hospital residencies [11, 12]. 
Formally, given are two sets of elements M and
W, each element having a preference list of
elements in the other set. The task is to find a
matching of the two sets such that no two
elements $(m_i, w_j)$ that are not matched to each other
both prefer each other to their currently assigned
partners. In the unknown-input
version, the preference list of each individual is
not known. The algorithm can propose a matching;
if it is not stable in the above sense, then a
pair $(m_i, w_j)$ violating the above property,
sometimes called a “blocking pair,” is returned.
SAT is a natural CSP, with each constraint being
the OR of some literals.

Considering the practical significance of 
StableMatching and SAT, the next theorem 
takes a closer look at their trial complexities. 


Theorem 2 ([4])


* $\Omega(n^2) \le R(\text{StableMatching}_u) \le D(\text{StableMatching}_u) \le O(n^2 \log n)$, where
n is the number of agents.

* Given a formula with n variables and m
clauses, $R(\text{SAT}_u) \le D(\text{SAT}_u) = O(mn)$.
Further, $R(\text{SAT}_u) = \Omega(mn)$ if $m = \Omega(n^2)$,
and $R(\text{SAT}_u) = \Omega(m^{3/2})$ if $m = o(n^2)$.


It is somewhat surprising that knowing only 
the indices of violated constraints is already suf- 
ficient to admit quite a number of efficient algo- 
rithms. It is therefore natural to wonder whether 
the lack of input information adds any extra 
difficulty at all in any problem. The answer turns 
out to be affirmative: there are problems whose 
unknown-input versions are considerably more 
difficult than their known-input versions. Two
representatives are GraphIso and GroupIso, the
problems of deciding whether two given graphs
or groups are isomorphic.


Theorem 3 ([4])


* If $\text{GraphIso}_u \in P^{V,\text{GraphIso}}$, then the polynomial
hierarchy (PH) collapses to the second
level.

* If $\text{GroupIso}(\cdot, \mathbb{Z}_p)_u \in P^V$, then we have
P = NP. (Here, $\text{GroupIso}(\cdot, \mathbb{Z}_p)$ is the group
isomorphism problem with the second group
known to be $\mathbb{Z}_p$ for a prime p.) If
$\text{GroupIso}_u \in P^{V,\text{GroupIso}}$, then we have NP $\subseteq$ POs”).


However, if SAT is given as the computation
oracle, then deterministic polynomial-time algorithms
exist for GraphIso and GroupIso, i.e.,
$\text{GraphIso}_u \in P^{V,\text{SAT}}$ and $\text{GroupIso}_u \in P^{V,\text{SAT}}$,
with O(n?) and O(n°) trials, respectively. 


Note that $\text{GroupIso}(\cdot, \mathbb{Z}_p)$ (with a known input)
admits a simple polynomial-time algorithm
by comparing the multiplication tables. Actually,
GroupIso is in P if the two groups are
Abelian. However, if the multiplication table of
the input group is unknown, then surprisingly,
the problem becomes NP-hard. Putting the
computational hardness and the low trial complexity
together, one can see that if more computational
time (enough to solve an NP problem) is given,
then fewer trials are needed. This interesting
trade-off between the two complexity measures is not
commonly seen in other query models.

Finally, beyond all of the foregoing problems
that can be solved in $P^{V,\text{SAT}}$, one can show via an
information-theoretic argument that the following
two problems have exponential lower bounds
on the randomized trial complexity.


* LinearProgramming: Find a feasible solution
of a linear program with n variables and
m constraints.

* SubsetSum: Decide whether a given set of n
integers can be partitioned into two parts with
equal sums.


Theorem 4 ([4, 5])


* $R(\text{LinearProgramming}_u) = \Omega(m^{\lfloor n/2 \rfloor})$.
* $R(\text{SubsetSum}_u) = \Omega(2^n)$.


The approaches for Nash and LP are actually 
similar, yet the running time differs significantly. 
The key property that guarantees the efficiency 
of the algorithm for Nash is the existence of 
Nash equilibrium for any finite game. The algo- 
rithm for Nash could thus serve as an interesting 
example to illustrate how the solution-existing 
property helps computational efficiency. 

Moreover, the following time complexity upper
bound for LinearProgramming$_u$ is established,
which is exponential in the number of
variables but not in the number of constraints.


Theorem 5 ([5]) The LinearProgramming$_u$
problem with m constraints, n variables, and
input size L can be deterministically solved in
time $(mnL)^{\mathrm{poly}(n)}$. In particular, the algorithm
runs in polynomial time for constant dimensional
linear programming (i.e., constant number of
variables n).


In summary, these results illustrate the variety
of time and trial complexities that arise from the
lack of input information for different problems
and show that input information is crucial to
different degrees for different problems.


Related Work 


The trial and error model bears a resemblance to
certain other problems and models, e.g., learning,
algorithm design in unknown environments, the
ellipsoid method, and query complexity. However,
there are fundamental distinctions between these
models and ours. (See [4] for further
discussion.)


Learning 

The trial and error model has apparent connections to
various learning theories (e.g., concept learning
with membership or equivalence queries [1],
decision tree learning, reinforcement learning [3],
and (semi-)supervised learning [2]), but fundamental
differences also exist. A common high-level
philosophy of various learning models is
to “sample and predict,” which is very different
from our “trial and search” (for a solution) in
the current setting. With its solution-oriented objective
and advantages in computational efficiency,
the trial and error model may serve
as a useful supplement to existing learning theories,
particularly in contexts in which the unknown
object itself is impossible or unaffordable
to learn and the only available access to
the unknown is through a solution-verification
process.


Ellipsoid Method 
The ellipsoid method is an elegant approach 
for proving the polynomial time solvability of 
a class of combinatorial optimization problems 
(see, e.g., [8]); it applies even when the explicit 
expressions of the constraints are unknown. The 
algorithm works as long as there exists an oracle 
that, on a proposed candidate solution, returns a 
violation in the form of a separating hyperplane. 
In general, the trial and error model is similar
to the ellipsoid method, in which a point is
proposed as a trial and a separating hyperplane is
returned as an error. Our LinearProgramming$_u$
problem studies how to solve linear programs
where the returned error is merely the index of
a violated hyperplane (with the actual hyperplane
still hidden). Moreover, the trial and error model
includes a much broader class of search problems:
not only convex optimization problems, but also
many with pure combinatorial structures (e.g.,
the SAT, GroupIso, and GraphIso problems
discussed here). From this perspective, the ellipsoid
method is only one possible approach for
the trial and error search problems in the current
model.
Problem Definition


The main problem consists in designing
space-efficient data structures that
represent the connectivity of triangle meshes
while supporting fast navigation and local
updates.


Mesh Structures: Definition 

Triangle meshes are among the most common 
representations of shapes. A triangle mesh is a 
collection of triangle faces that define a poly- 
hedral approximation of a surface. A mesh is 
manifold if every edge is bounding either one 
or two triangles and if the faces incident to a 
same vertex define a closed or open fan. Here 
we focus on manifold meshes. Assuming that 
the genus and the number of boundary edges are 
negligible when compared to the number n of 
vertices, the number m of faces is roughly equal 
to 2n. 


Triangulation Data Structures 


Data Structures: Classification 

Mesh data structures can be compared with respect
to several criteria. A basic requirement
(traversability) for mesh representations is
to provide fast navigational operators that allow
a mesh traversal to be performed (such as walking
around a vertex). Most representations are also
indexable, allowing constant-time access to
the description of a given vertex or triangle, given
its index. In order to support efficient processing
of large meshes, one needs to reduce memory
thrashing during navigation. An effective way of
doing so is to design compact data structures
requiring little storage. Many applications ask
for modifiability: the manipulation of meshes
requires performing updates such as vertex
insertions/deletions, edge collapses, and edge flips.
The choice of the data structure should also
depend on the simplicity of its implementation
and on its practical efficiency on common input
data.


Standard Mesh Representations 
Some common mesh representations are
implemented in explicit pointer-based form.
References are used to describe incidence
relations between mesh elements, and navigation
is performed through address indirection.
For example, a face-based representation [2]
provides operators vertex(Δ, i) (giving the ith
vertex of a triangle Δ) and neighbor(Δ, i)
(giving the ith neighbor of Δ), as well as
operator face(v) (returning a triangle incident
to vertex v). As illustrated in Fig. 1a, the
combination of these operators allows one to
implement operators faceIndex($\Delta_1$, $\Delta_2$)
(giving the index of $\Delta_1$ among the neighbors of
$\Delta_2$) and vertexIndex(v, Δ) (giving the index
of a vertex v in Δ). An alternative solution is given
by the Corner Table proposed by Rossignac and
colleagues, which uses integer indices into integer
tables and provides a triangulation interface
involving the corner operators defined in Fig. 2.
The two abstract data types above fully
support local navigation in the mesh: the face-based
as well as corner operators support efficient
mesh exploration (see Figs. 1 and 2). A simple
implementation stores explicitly all incidence
relations involving faces or corners, using 6
references per triangle plus one reference per
vertex (describing the map from vertices to
faces): according to Euler's formula, this leads to
a storage cost of 13 references per vertex (rpv).
The results of triangle(c) and next(c) are
not stored explicitly but calculated assuming that
the three corners of each triangle are assigned
consecutive indices.


Triangulation Data Structures, Fig. 1 (a) Triangle-based
data structure: each triangle stores references to
its 3 neighbors and to its 3 incident vertices, yielding
13 rpv. (b) Catalog-based representation: using a catalog
of size 3, one can guarantee that any quad is adjacent to at
most two other quads, leading to a cost of 8.5 rpv

(a)

    v = vertex(Δ, i)
    Δ = face(v)
    i = vertexIndex(v, Δ)
    g_0 = neighbor(Δ, i)
    g_1 = neighbor(Δ, ccw(i))
    g_2 = neighbor(Δ, cw(i))
    z = vertex(g_1, faceIndex(g_1, Δ))

    int ccw(int i) { return (i + 1) % 3; }
    int cw(int i) { return (i + 2) % 3; }

    // Degree of an interior vertex v: rotate around v,
    // visiting one incident face per step.
    int valence(int v) {
        int d = 1;
        int f = face(v);
        int g = neighbor(f, cw(vertexIndex(v, f)));
        while (g != f) {
            // cross the next edge incident to v, found via the index of v in g
            g = neighbor(g, cw(vertexIndex(v, g)));
            d++;
        }
        return d;
    }

(b)

    class Quad extends Patch {
        Patch p1, p2, p3, p4;
        Vertex v1, v2, v3, v4;
    }

    class Pentagon extends Patch {
        Patch p1, p2, p3, p4, p5;
        Vertex v1, v2, v3, v4, v5;
    }

    class Hexagon extends Patch {
        Patch p1, p2, p3, p4, p5, p6;
        Vertex v1, v2, v3, v4, v5, v6;
    }


Key Results


A Theoretically Optimal Representation

From the information theory point of view,
encoding a planar triangulation requires 3.24
bits per vertex (bpv), which is much less than the
13 log n bpv used by standard representations.
Succinct representations provide theoretically
optimal encodings for triangulations, which
match the optimal asymptotic bound of 3.24 bpv
(or equivalently 1.62m bits), while efficiently
supporting navigational operations [4, 5], as
stated below.


Theorem 1 Given a planar triangulation T of
m triangles, there exists a succinct representation
that uses $1.62m + O\left(m \frac{\log\log m}{\log m}\right)$ bits, supporting
navigation in worst case O(1) time.


This result is achieved with a multilevel
hierarchical structure. The initial triangulation of
size m is decomposed into small triangulations,
each having $\Theta(\log^2 m)$ triangles: such a
decomposition leads to a map F describing
adjacency relations between small triangulations.
Small triangulations are then decomposed
into tiny triangulations of size $O(\log m)$,
whose adjacency relations are described by
a map G. Map F has $O(m / \log^2 m)$ nodes and
arcs and can be stored in sublinear space
using $O(m / \log^2 m)$ references of size $O(\log m)$
(actually $O(\log(m / \log^2 m)) \le O(\log m)$). Map
G has $O(m / \log m)$ nodes and arcs: adjacencies
between two tiny triangulations within the
same small triangulation need references of
size $O(\log \log^2 m) = O(\log\log m)$, while
adjacencies crossing the small triangulation
boundaries are accessed by referring to F.
In that way the storage of both F and G is
sublinear. The structure of tiny triangulations
is optimally encoded through lookup into a
table storing all possible triangulations of size
$O(\log m)$. Such a framework can be extended
in order to support updates: vertex deletions and
edge flips are performed in $O(\log^2 m)$ amortized
time (vertex insertions require O(1) amortized
time). The optimality stated by Theorem 1 is
obtained by combining the two-level representation
with a careful decomposition of the mesh into
tiny regions, involving a bijection between
triangulations and a special class of vertex
spanning trees [13].

A different approach, based on small separators,
leads to compact representations [1] using
O(n) bits for more general classes of meshes
(storage performances are difficult to evaluate
precisely).


Triangulation Data Structures, Fig. 2 The Corner Table:
corner operators allow local navigation to be implemented, as
illustrated by the code of the Ring-Expander procedure [10]

    v = vertex(c)
    t = triangle(c)
    n = next(c)
    s = swing(c)
    p = prev(c)
    o = opposite(c)
    l = left(c)
    r = right(c)

    prev(c) = next(next(c))
    opposite(c) = prev(swing(prev(c)))

    int c = seed;
    visited[vertex(next(c))] = true;
    visited[vertex(prev(c))] = true;
    do {
        if (!visited[vertex(c)]) {
            visited[vertex(c)] = true;
            explored[triangle(c)] = true;
        }
        else if (!explored[triangle(c)]) c = opposite(c);
        c = right(c);
    } while (c != opposite(seed));


A More Practical Solution
Succinct representations run under the word-RAM
model and are mainly of theoretical
interest, since the amount of memory required
in practice is quite large even for very large
meshes. Some attempts to exploit the algorithmic
framework of succinct representations in practice
have led to a space-efficient dynamic data
structure [6]. The main idea is to gather
neighboring faces into small groups of triangles
(called patches). While references are still of
size O(log n), grouping triangles saves
some references (corresponding to edges internal
to a given patch). For example, using a catalog
consisting only of triangles and quadrangles, we
encode a triangulation with at most 10.6 rpv (a
19% improvement over the simple representations
mentioned earlier). More sophisticated choices of
patches lead to dynamic structures with smaller
storage (e.g., Fig. 1b), as stated below:


Theorem 2 Given a triangulation (possibly hav- 
ing handles and boundaries), there exists a data 
structure using 7.67 rpv, which allows O(1) time 
navigation and supports updates in O(1) amor- 
tized time. 


Reducing Redundancy Through Face
Reordering

The main idea used in the SOT data structure [8]
is to implicitly represent the map from triangles to
corners (triangle operator), and the map from
corners to vertices (vertex operator), through
face reordering. First, match each vertex to an
incident triangle (in such a way that a triangle is
matched with at most one vertex). Then permute
the triangles in such a way that the triangle associated
with the ith vertex $v_i$ has number i (thus, the
first n triangles appearing in this ordering are
the ones associated with a vertex). The corners
of a triangle are listed consecutively, and the
first one corresponds to the vertex matched to
the triangle. The incidence relations are stored
in an array O (of length 3m) having 3 entries
per triangle: O[i] stores the index of the corner
opposite to $c_i$ (which is matched to vertex $v_i$,
for i < m). Corner operators are supported in
O(1) time by performing arithmetic operations (see
Fig. 3a). Accessing a vertex $v_i$ requires walking
around its incident faces until $c_i$ is reached ($v_i$
being matched to $c_i$).


Triangulation Data Structures, Fig. 3 Illustrations of the SOT (a) and ESQ (b) data structures


Theorem 3 ([8]) Given a triangulation (possibly
having handles and boundaries), there exists
a data structure using 6 rpv which supports O(1)
time navigation (retrieving a vertex of degree d
requires O(d) time).
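
The arithmetic behind the corner operators can be sketched as follows, assuming the three corners of triangle t are numbered 3t, 3t + 1, and 3t + 2. This Python sketch is the plain Corner Table with an explicit vertex array V and opposite array O; SOT goes further and makes V implicit through the reordering described above. The class name `CornerTable` and the use of -1 for a missing neighbor on the boundary are assumptions of this illustration.

```python
class CornerTable:
    """Corner operators over arrays V (corner -> vertex) and O (corner -> opposite)."""

    def __init__(self, V, O):
        self.V = V
        self.O = O  # O[c] == -1 when corner c faces a boundary edge

    def triangle(self, c):
        return c // 3

    def next(self, c):
        return 3 * (c // 3) + (c + 1) % 3

    def prev(self, c):
        return self.next(self.next(c))

    def vertex(self, c):
        return self.V[c]

    def opposite(self, c):
        return self.O[c]

    def right(self, c):
        return self.opposite(self.next(c))

    def left(self, c):
        return self.opposite(self.prev(c))

    def swing(self, c):
        """Corner of the same vertex in the adjacent triangle, or -1 at a boundary."""
        r = self.right(c)
        return -1 if r == -1 else self.next(r)


# Two triangles (0, 1, 2) and (2, 1, 3) sharing the edge between vertices 1 and 2.
mesh = CornerTable(V=[0, 1, 2, 2, 1, 3], O=[5, -1, -1, -1, -1, 0])
```

In this two-triangle mesh, swinging from corner 2 reaches corner 3, which belongs to the second triangle and is attached to the same vertex.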


More Compact (Static) Representations 
Combining this reordering approach with a pairing
of adjacent triangles into quads, the SQUAD
data structure [9] reaches better storage, requiring
slightly more than 4 rpv according to experimental
results on common meshes (the worst case
upper bound is still 6 rpv). If one is allowed
to perform a reordering of the input vertices, it
is possible to guarantee a storage of 4 rpv in
the worst case (with the same time performances as
before): the edge-based representation described
in [3] matches this bound by exploiting Schnyder
woods decompositions [14]. Various heuristics
allow storage requirements to be reduced further in
practice [10-12].


A Dynamic Representation 

Combining the reordering approach described 
above with the decomposition into triangle 
patches, the ESQ data structure [7] exhibits the 
same navigation performances as in SOT, while 
supporting local updates. As in [6] the mesh is 
decomposed into a collection of patches, each 
consisting of one or more triangles, and vertices 
are matched to patches. The assumption that each 
vertex is matched to a different triangle is relaxed. 
The catalog thus consists of a collection of k 
patch types (possibly having one or more marked 
corners, describing how vertices are matched). 
Adjacency relations between faces are stored in 
k tables $T_1, \ldots, T_k$, one for each patch type (see 
Fig. 3b). Extending the approach introduced in 
SOT, a reordering of the input vertices allows one 
to represent the maps from vertices to triangles 
and from triangles to vertices. A table $T_S$ of 
type S = (c, b) (with b boundary edges and 
c matched vertices) contains b references for 
each entry; the entries in the associated table 
$G_S$ (containing geometric coordinates) are 
ordered accordingly. The decomposition into 
patches is maintained under local modifications 
with a constant number of memory updates in 
the tables $T_i$. 


Theorem 4 ([7]) Given a triangulation (possi- 
bly having handles and boundaries), there exists 
a dynamic data structure using 4.8rpv, which 
allows O(1) time navigation and O(d) time ac- 
cess to a vertex of degree d. Updates (vertex 
insertions/deletions and edge flips) are supported 
in O(1) amortized time. 


Experimental Results 


Timing comparisons of operators for the SOT, 
SQUAD, and Corner Table data structures are 
reported in [9]; the experimental evaluations concern 
adjacency and navigational operations. On the 
tested mesh (the 55-million-triangle David mesh), 
SQUAD requires 20 s and uses 2.2 GB of RAM 
for the construction (on a MacBook Pro equipped 
with a 2.66 GHz Intel Core i7 and 8 GB of RAM). When the 
whole mesh fits in main memory, the compact data 
structures (SQUAD and SOT) are slower 
than Corner Table. When the available memory is 
reduced, the performance of SQUAD is comparable to, 
and sometimes even better than, that of Corner Table 
for high-level tasks (e.g., valence 
computations). 
Problem Definition 


This problem is concerned with designing truth- 
ful (dominant strategy) mechanisms for domains 
where each agent’s private information is ex- 
pressed by a single positive real number. The 
goal of the mechanisms is to allocate loads placed 
on the agents, and an agent’s private information 
is the cost incurred per unit load. Archer and 
Tardos [4] give an exact characterization for the 
algorithms that can be used to design truthful 
mechanisms for such load balancing problems 
using appropriate payments. The characterization 
shows that the allocated load must be monotonic 
in the cost (decreasing when the cost on an agent 
increases, fixing the costs of the others). Thus, 
truthful mechanisms are characterized by a con- 
dition on the allocation rule, and payments that 
ensure voluntary participation can be calculated 
using the given characterization. 

The characterization is used to design 
polynomial time truthful mechanisms for several 
problems in combinatorial optimization to 
which the celebrated VCG mechanism does 
not apply. For scheduling related parallel 
machines to minimize makespan (Q||Cmax), 
Archer and Tardos [4] present a 3-approximation 
mechanism based on randomized rounding of the 
optimal fractional solution. This mechanism is 
truthful only in expectation (a weaker notion of 
truthfulness in which truthful bidding maximizes 
the agent’s expected utility). Archer [3] improves 
it to a randomized 2-approximation truthful 
mechanism. Andelman, Azar, and Sorani [2] 
provide a deterministic truthful mechanism 
that is 5-approximation. Kovacs improves 
it to 3-approximation in [12] and to 2.8- 
approximation in [13] (Kovacs also gives other 
results for two special cases). Andelman, Azar, 
and Sorani [2] also present a deterministic 
Fully Polynomial Time Approximation Scheme 
(FPTAS) for scheduling on a fixed number 
of machines, as well as a suitable payment 
scheme that yields a deterministic truthful 
mechanism. Dhangwatnotai et al. [8] present 
a randomized Polynomial Time Approximation 
Scheme (PTAS) that is truthful-in-expectation. 
Christodoulou and Kovacs [7] present a truthful 
deterministic Polynomial Time Approximation 
Scheme (PTAS); this matches the best possible 
result for the computational problem (without 
incentives) by Hochbaum and Shmoys [10] (it is 
known that this problem is strongly NP-hard [9]). 
This result shows that there is no “cost of truthful- 
ness” for this problem, as the best approximation 
with incentive constraints is as good as the best 
approximation without these constraints. 

Archer and Tardos [4] also present results 
for goals other than minimizing the makespan. 
They present a truthful mechanism for $Q\|\sum C_j$ 
(scheduling related machines to minimize the 
sum of completion times) and show that for 
$Q\|\sum w_j C_j$ (minimizing the weighted sum of 
completion times) $2/\sqrt{3}$ is the best approximation 
ratio achievable by a truthful mechanism. 

This family of problems belongs to the field of 
Algorithmic Mechanism Design, initiated in the 
seminal paper of Nisan and Ronen [15]. Nisan 
and Ronen consider makespan minimization for 
scheduling on unrelated machines and prove up- 
per and lower bounds (note that for unrelated 
machines agents have more than one parameter). 
Mu’alem and Schapira [14] present improved 
lower bounds. Other papers consider the problem 
of scheduling on related machines to minimize 
the makespan. Auletta et al. [5] and Ambro- 
sio and Auletta [1] present truthful mechanisms 
for several NP-hard restrictions of this problem. 
Nisan and Ronen [15] also introduce a model in 
which the mechanism is allowed to observe the 
machines’ actual processing time and compute 
the payments afterward (in such a model the 
machines essentially cannot claim to be faster 
than they are); Auletta et al. [6] present additional 
results for this model. In particular, they 
show that it is possible to overcome the lower 
bound of $2/\sqrt{3}$ for $Q\|\sum w_j C_j$ (minimizing the 
weighted sum of completion times) and provide 
a polynomial time $(1 + \epsilon)$-approximation truthful 
mechanism (with verification) when the number 
of machines (m) is constant. 


Truthful Mechanisms for One-Parameter Agents 


The Mechanism Design Framework 

Let I be the set of agents. Each agent $i \in I$ has 
some private value (type) consisting of a single 
parameter $t_i \in \mathbb{R}$ that describes the agent, and 
which only i knows. Everything else is public 
knowledge. Each agent will report a bid $b_i$ to the 
mechanism. Let t denote the vector of true values, 
and b the vector of bids. 

There is some set of outcomes O, and given 
the bids b the mechanism's output algorithm 
computes an outcome $o(b) \in O$. For any types 
t, the mechanism aims to choose an outcome 
$o \in O$ that minimizes some function g(o, t). 
Yet, given the bids b the mechanism can only 
choose the outcome as a function of the bids (o = 
o(b)) and has no knowledge of the true types 
t. To overcome the problem that the mechanism 
knows only the bids b, the mechanism is designed 
to be truthful (using payments), that is, in such 
a mechanism it is a dominant strategy for the 
agents to reveal their true types (b = t). For 
such mechanisms minimizing g(o, t) is done by 
assuming that the bids are the true types (and 
this is justified by the fact that truth telling is a 
dominant strategy). 

In the framework discussed here we assume 
that outcome o(b) will assign some amount of 
load or work $w_i(o(b))$ to each agent i, and given 
o(b) and $t_i$, agent i incurs some monetary cost, 
$\mathrm{cost}_i(t_i, o(b)) = t_i w_i(o(b))$. Thus, agent i's private 
data $t_i$ measures her cost per unit work. Each 
agent i attempts to maximize her utility (profit), 
$u_i(t_i, b) = P_i(b) - \mathrm{cost}_i(t_i, o(b))$, where $P_i(b)$ 
is the payment to agent i. 

Let $b_{-i}$ denote the vector of bids, not 
including agent i, and let $b = (b_{-i}, b_i)$. 
Truth telling is a dominant strategy for agent 
i if bidding $t_i$ always maximizes her utility, 
regardless of what the other agents bid. That 
is, $u_i(t_i, (b_{-i}, t_i)) \ge u_i(t_i, (b_{-i}, b_i))$ for all $b_{-i}$ 
and $b_i$. 

A mechanism M consists of the pair $M = 
(o(\cdot), P(\cdot))$, where $o(\cdot)$ is the output function 
and $P(\cdot)$ is the payment scheme, i.e., the vector 
of payment functions $P_i(\cdot)$. An output function 
admits a truthful payment scheme if there exist 
payments $P(\cdot)$ such that for the mechanism 
$M = (o(\cdot), P(\cdot))$, truth telling is a dominant 
strategy for each agent. A mechanism that admits 
a truthful payment scheme is truthful. 

Mechanism M satisfies the voluntary participation 
condition if agents who bid truthfully 
never incur a net loss, i.e., $u_i(t_i, (b_{-i}, t_i)) \ge 0$ 
for all agents i, true values $t_i$, and other agents' 
bids $b_{-i}$. 


Definition 1 With the other agents' bids $b_{-i}$ 
fixed, the work curve for agent i is $w_i(b_{-i}, b_i)$ 
considered as a single-variable function of $b_i$. 
The output function o is decreasing if each of 
the associated work curves is decreasing (i.e., 
$w_i(b_{-i}, b_i)$ is a decreasing function of $b_i$, for all 
i and $b_{-i}$). 


Scheduling on Related Machines 

There are n jobs and m machines. The jobs 
represent amounts of work $p_1 \ge p_2 \ge \ldots \ge 
p_n$, and let p denote the set of jobs. Machine i 
runs at some speed $s_i$, so it must spend $p_j/s_i$ 
units of time processing each job j assigned to 
it. The input to an algorithm is b, the (reported) 
speed of the machines, and the output is o(b), 
an assignment of jobs to machines. The load on 
machine i for outcome o(b) is $w_i(b) = \sum p_j$, 
where the sum runs over jobs j assigned to i. 
Each machine incurs a cost proportional to the 
time it spends processing its jobs. The cost of 
machine i is $\mathrm{cost}_i(t_i, o(b)) = t_i w_i(o(b))$, where 
$t_i = 1/s_i$ and $w_i(b)$ is the total load assigned 
to i when the speeds are b. Let $C_j$ denote the 
completion time of job j. One can consider the 
following goals for scheduling related parallel 
machines: 


• Minimizing the makespan ($Q\|C_{\max}$): the 
mechanism's goal is to minimize the 
completion time of the last job on the last 
machine, i.e., $g(o, t) = C_{\max} = \max_i t_i \cdot w_i(b)$. 

• Minimizing the sum of completion times 
($Q\|\sum C_j$), i.e., $g(o, t) = \sum_j C_j$. 

• Minimizing the weighted sum of completion 
times ($Q\|\sum w_j C_j$), i.e., $g(o, t) = \sum_j w_j C_j$, 
where $w_j$ is the weight of job j. 
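
To make the three objectives concrete, the following Python sketch evaluates them for a fixed assignment. The function names and the representation of an assignment (one ordered list of job indices per machine) are assumptions made for illustration; jobs on a machine are processed in the listed order.

```python
def loads(p, assignment):
    """w_i: total work assigned to each machine."""
    return [sum(p[j] for j in jobs) for jobs in assignment]


def makespan(p, t, assignment):
    """C_max = max_i t_i * w_i, with t_i = 1 / s_i the cost per unit of work."""
    return max(t_i * w_i for t_i, w_i in zip(t, loads(p, assignment)))


def sum_completion_times(p, t, assignment, weights=None):
    """Sum of C_j, or of w_j * C_j when job weights are supplied."""
    total = 0.0
    for t_i, jobs in zip(t, assignment):
        elapsed = 0.0
        for j in jobs:  # jobs run in the listed order on machine i
            elapsed += t_i * p[j]
            weight = 1.0 if weights is None else weights[j]
            total += weight * elapsed
    return total


# Three jobs, two machines with speeds 2 and 1 (so t = [0.5, 1.0]).
p = [4, 3, 2]
t = [0.5, 1.0]
assignment = [[2, 0], [1]]  # machine 0 runs job 2 then job 0; machine 1 runs job 1
```

For this instance both machines finish at time 3, so the makespan is 3, and the completion times are 1, 3, and 3.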

An algorithm is a c-approximation algorithm 
with respect to g if, for every instance (p, t), it 
outputs an outcome of cost at most $c \cdot g(o(t), t)$, 
where o(t) here stands for an optimal outcome for t. 
A c-approximation mechanism is a mechanism 
whose output algorithm is a c-approximation. 
Note that if the mechanism is truthful the approximation 
is with respect to the true speeds. A 
PTAS (Polynomial Time Approximation Scheme) 
is a family of polynomial time algorithms such that for every 
$\epsilon > 0$ there exists a $(1 + \epsilon)$-approximation 
algorithm. If the running time is also polynomial 
in $1/\epsilon$, the family of algorithms is an FPTAS (Fully 
Polynomial Time Approximation Scheme). 


Key Results 


The following two theorems hold for the mech- 
anism design framework as defined in section 
“Problem Definition.” 


Theorem 1 ([4]) The output function o(b) admits 
a truthful payment scheme if and only if it is 
decreasing. In this case, the mechanism is truthful 
if and only if the payments $P_i(b_{-i}, b_i)$ are of the 
form 

$$h_i(b_{-i}) + b_i w_i(b_{-i}, b_i) - \int_0^{b_i} w_i(b_{-i}, u)\,du$$

where the $h_i$ are arbitrary functions. 


Theorem 2 ([4]) A decreasing output function 
admits a truthful payment scheme satisfying 
voluntary participation if and only if 
$\int_0^{\infty} w_i(b_{-i}, u)\,du < \infty$ for all $i, b_{-i}$. In this 
case, the payments can be defined by 

$$P_i(b_{-i}, b_i) = b_i w_i(b_{-i}, b_i) + \int_{b_i}^{\infty} w_i(b_{-i}, u)\,du.$$


Theorem 3 ([4]) There is a truthful mechanism 
(not polynomial time) that outputs an optimal 
solution for $Q\|C_{\max}$ and satisfies voluntary 
participation. 
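
The payment formula of Theorem 2 can be checked numerically on a small example. The sketch below is illustrative only: it assumes a piecewise-constant decreasing work curve that drops to 0 after its last breakpoint, and the names `payment`, `utility`, and `example_curve` are hypothetical.

```python
def payment(bid, work_curve, breakpoints):
    """b * w(b) + integral from b to infinity of w(u) du (Theorem 2).

    Assumes work_curve is piecewise constant, decreasing, changes value only
    at the given breakpoints, and equals 0 beyond the last breakpoint.
    """
    total = bid * work_curve(bid)
    left = bid
    for right in sorted(x for x in breakpoints if x > bid):
        midpoint = (left + right) / 2.0
        total += work_curve(midpoint) * (right - left)  # exact on each step
        left = right
    return total


def utility(true_cost, bid, work_curve, breakpoints):
    """Payment received minus the true cost of the assigned work."""
    return payment(bid, work_curve, breakpoints) - true_cost * work_curve(bid)


def example_curve(b):
    """Hypothetical decreasing work curve with the other bids held fixed."""
    if b < 1.0:
        return 2.0
    if b < 3.0:
        return 1.0
    return 0.0


example_breakpoints = [1.0, 3.0]
```

With true cost 2, bidding truthfully yields payment 3 and utility 1; underbidding to 0.5 wins more work but yields utility 0, and overbidding to 4 yields no work and utility 0. When the work curve is a single step from 1 to 0 at a threshold r, the formula pays exactly r to a winning bidder, which is the familiar second-price payment.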


Theorem 4 ([2,7]) For the problem of minimizing 
the makespan ($Q\|C_{\max}$): 


• There exists a deterministic Polynomial Time 
Approximation Scheme (PTAS) for scheduling 
on related machines that admits a truthful 
payment scheme [7]. The mechanism created 
satisfies voluntary participation. 

• There exists a deterministic Fully Polynomial 
Time Approximation Scheme (FPTAS) for 
scheduling on a fixed number of machines 
that admits a truthful payment scheme [2]. 
The mechanism created satisfies voluntary 
participation. 


Theorem 5 ([4]) There is a truthful polynomial 
time mechanism that outputs an optimal solution 
for $Q\|\sum C_j$ and satisfies voluntary participation. 


Theorem 6 ([4]) No truthful mechanism for 
$Q\|\sum w_j C_j$ can achieve an approximation ratio 
better than $2/\sqrt{3}$, even on instances with just two 
jobs and two machines. 


Applications 


Archer and Tardos [4] apply the characterization 
of truthful mechanisms to problems other than 
scheduling. They present results for the unca- 
pacitated facility location problem as well as the 
maximum flow problem. 

Kis and Kapolnai [11] consider the problem of 
scheduling of groups of identical jobs on related 
machines with sequence independent setup times 
(Q\u;, Pik = pj ||Cmax). They provide a truthful, 
polynomial time, randomized mechanism for the 
batch scheduling problem with a deterministic 
approximation guarantee of 4 to the minimal 
makespan, based on the characterization of truth- 
ful mechanisms presented above. 


Open Problems 


The problem of designing truthful mechanisms 
for related machines to minimize the makespan 
has been completely resolved, as a deterministic 
PTAS [7] is the best one can hope for. For 
this problem there is no gap between the best 
approximation with and without incentives. The 
main open problem left is to find some natural 
single-parameter setting in which there is a gap 
between the approximation that is achievable by 
algorithms and by truthful mechanisms. 


Experimental Results 


None is reported. 


Data Sets 


None is reported. 


URL to Code 


None is reported. 
Problem Definition 


Several mechanisms [1, 3, 5, 9], which essentially 
all belong to the VCG mechanism family, have 
been proposed in the literature to prevent the 
selfish behavior of unicast routing in a wireless 
network. In these mechanisms, the least cost path, 
which maximizes the social efficiency, is used 
for routing. Wang, Li, and Wang [8] studied the 
truthful multicast routing protocol for a selfish 
wireless network, in which selfish wireless termi- 
nals will follow their own interests. The multicast 
routing protocol is composed of two components: 
(1) the tree structure that connects the sources and 
receivers, and (2) the payment to the relay nodes 
in this tree. Multicast poses a unique challenge 
in designing strategyproof mechanisms due to the 
reason that (1) a VCG mechanism uses an output 
that maximizes the social efficiency; (2) it is NP- 
hard to find the tree structure with the minimum 
cost, which in turn maximizes the social effi- 
ciency. A range of multicast structures, such as 
the least cost path tree (LCPT), the pruning min- 
imum spanning tree (PMST), virtual minimum 
spanning tree (VMST), and Steiner tree, were 
proposed to replace the optimal multicast tree. 
In [8], Wang et al. showed how payment schemes 
can be designed for existing multicast tree struc- 
tures so that rational selfish wireless terminals 
will follow the protocols for their own interests. 
Consider a communication network $G = 
(V, E, c)$, where $V = \{v_1, \ldots, v_n\}$ is the set of 
communication terminals, $E = \{e_1, e_2, \ldots, e_m\}$ 
is the set of links, and c is the cost vector of 
all agents. Here agents are terminals in a node 
weighted network and are links in a link weighted 
network. Given a set of sources and receivers 
$Q = \{q_0, q_1, q_2, \ldots, q_{r-1}\} \subset V$, the multicast 
problem is to find a tree $T \subset G$ spanning all 
terminals Q. For simplicity, assume that $s = q_0$ 
is the sender of a multicast session if it exists. 
All terminals or links are required to declare 
a cost of relaying the message. Let d be the 
declared costs of all nodes, i.e., agent i declared 
a cost $d_i$. On the basis of the declared cost profile 
d, a multicast tree needs to be constructed and 
the payment $p_k(d)$ for each agent k needs to be 
decided. The utility of an agent is its payment 
received, minus its cost if it is selected in the 
multicast tree. Instead of reinventing the wheel, 
Wang et al. used the previously proposed 
structures for multicast as the output of their 
mechanism. Given a multicast tree, they studied 
the design of strategyproof payment schemes 
based on this tree. 


Notations 

Given a network H, $\omega(H)$ denotes the total cost 
of all agents in this network. If the cost of any 
agent i (link $e_i$ or node $v_i$) is changed to $c'_i$, the 
new network is denoted as $G' = (V, E, c|^i c'_i)$, 
or simply $c|^i c'_i$. If one agent i is removed 
from the network, it is denoted as $c|^i \infty$. For 
simplicity of notation, the cost vector c is 
used to denote the network G = (V, E, c) if no 
confusion is caused. For a given source s and 
a given destination $q_i$, $LCP(s, q_i, c)$ represents 
the shortest path between s and $q_i$ when the 
cost of the network is represented by vector c. 
$|LCP(s, q_i, d)|$ denotes the total cost of the least 
cost path $LCP(s, q_i, d)$. The notation of several 
multicast trees is summarized as follows. 


1. Link Weighted Multicast Tree 

• LCPT: The union of all least cost paths 
from the source to receivers is called the 
least cost path tree, denoted by LCPT(d). 

• PMST: First construct the minimum spanning 
tree MST(G) on the graph G. Take 
the tree MST(G) rooted at sender s, and prune 
all subtrees that do not contain a receiver. 
The final structure is called the Pruning 
Minimum Spanning Tree (PMST). 

• LST: The Link Weighted Steiner Tree 
(LST) can be constructed by the algorithm 
proposed by Takahashi and 
Matsuyama [6]. 

2. Node Weighted Multicast Tree 

• VMST: First construct a virtual graph 
using all receivers plus the sources as 
the vertices and the cost of LCP as the 
link weight. Then compute the minimum 
spanning tree on the virtual graph, which 
is called virtual minimum spanning tree 
(VMST). Finally, choose all terminals on 
the VMST as the relay terminals. 

• NST: The node weighted Steiner tree 
(NST) can be constructed by the algorithm 
proposed by [4]. 


Key Results 


If the LCPT tree is used as the multicast tree, 
Wang et al. proved the following theorem. 


Theorem 1 The VCG mechanism combined with 
LCPT is not truthful. 


Because of the failure of the VCG mechanism, 
they designed their non-VCG mechanism for the 
LCPT-based multicast routing as follows. 


Algorithm 1 Non-VCG mechanism for LCPT 

1: For each receiver $q_i \ne s$, compute the least cost 
path from the source s to $q_i$, and compute a payment 
$p_k^i(d)$ to every link $e_k$ on $LCP(s, q_i, d)$ using 
the scheme for unicast 

$$p_k^i(d) = d_k + |LCP(s, q_i, d|^k \infty)| - |LCP(s, q_i, d)|.$$

2: The final payment to link $e_k \in LCPT$ is then 

$$p_k(d) = \max_{q_i \in Q} p_k^i(d). \qquad (1)$$

The payment to each link not on LCPT is simply 0. 


Theorem 2 Payment (defined in Eq. (1)) based 
on LCPT is truthful and it is minimum among all 
truthful payments based on LCPT. 
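
The LCPT payment rule (a unicast payment per receiver, followed by a maximum over receivers) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code of [8]: the network is link weighted and undirected, the declared costs are stored in a dictionary keyed by link, least cost paths are found with Dijkstra's algorithm, and the function names are hypothetical. If removing a link disconnects a receiver, the detour cost (and hence the payment) is infinite.

```python
import heapq


def least_cost_path(costs, source, target, removed=None):
    """Dijkstra on an undirected link-weighted graph.

    costs maps a link (u, v) to its declared cost; `removed` is a link
    treated as having infinite cost. Returns (total cost, links on the path).
    """
    adjacency = {}
    for (u, v), cost in costs.items():
        if (u, v) == removed:
            continue
        adjacency.setdefault(u, []).append((v, cost, (u, v)))
        adjacency.setdefault(v, []).append((u, cost, (u, v)))
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost, link in adjacency.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                parent[v] = (u, link)
                heapq.heappush(heap, (d + cost, v))
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:
        node, link = parent[node]
        path.append(link)
    return dist[target], path


def lcpt_payments(costs, source, receivers):
    """Payment to each link on the LCPT: maximum of its unicast payments."""
    payments = {}
    for q in receivers:
        if q == source:
            continue
        base, path = least_cost_path(costs, source, q)
        for link in path:
            detour, _ = least_cost_path(costs, source, q, removed=link)
            unicast_payment = costs[link] + detour - base
            payments[link] = max(payments.get(link, 0), unicast_payment)
    return payments


declared = {("s", "a"): 1, ("a", "q"): 1, ("s", "b"): 2, ("b", "q"): 2}
```

With receivers q and a, link (s, a) lies on both least cost paths; its unicast payments are 3 (for q) and 5 (for a), so it is paid 5, while link (a, q) is paid 3. Links not returned by `lcpt_payments` are not on the LCPT and receive 0.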


More generally, Wang et al. [8] proved the fol- 
lowing theorem. 


Theorem 3 The VCG mechanism combined with 
either one of the LCPT, PMST, LST, VMST, NST 
is not truthful. 


Because of this negative result, they designed 
their non-VCG mechanisms for all multicast 
structures they studied: LCPT, PMST, LST, 
VMST, NST. For example, Algorithm 2 is the 
algorithm for PMST. For other algorithms, please 
refer to [8]. 


Truthful Multicast 


Algorithm 2 Non-VCG mechanism for PMST 


1: Apply VCG mechanism on the MST. The payment 
for edge $e_k \in PMST(d)$ is 

$$p_k(d) = \omega(MST(d|^k \infty)) - \omega(MST(d)) + d_k. \qquad (2)$$

2: For every edge $e_k \notin PMST(d)$, its payment is 0. 
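
A small Python sketch of the PMST payment in Eq. (2) follows. It is an illustration only: the minimum spanning tree is computed with Kruskal's algorithm and a union-find structure, pruning repeatedly removes leaf links whose leaf is neither the source nor a receiver, and all function names are hypothetical. A link whose removal disconnects the graph gets an infinite payment.

```python
def minimum_spanning_tree(costs, removed=None):
    """Kruskal's algorithm; returns (tree links, total weight)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    nodes = {x for link in costs for x in link}
    tree, weight = [], 0
    for link in sorted(costs, key=costs.get):
        if link == removed:
            continue  # this link is treated as having infinite cost
        root_u, root_v = find(link[0]), find(link[1])
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append(link)
            weight += costs[link]
    if len(tree) != len(nodes) - 1:
        weight = float("inf")  # graph is disconnected without this link
    return tree, weight


def prune(tree, terminals):
    """Repeatedly drop leaf links whose leaf endpoint is not a terminal."""
    links = set(tree)
    while True:
        degree = {}
        for u, v in links:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        dead = {link for link in links
                if any(degree[x] == 1 and x not in terminals for x in link)}
        if not dead:
            return links
        links -= dead


def pmst_payments(costs, source, receivers):
    """Eq. (2) for links on the PMST, 0 for every other link."""
    tree, base = minimum_spanning_tree(costs)
    kept = prune(tree, {source, *receivers})
    return {link: (minimum_spanning_tree(costs, removed=link)[1] - base + costs[link])
            if link in kept else 0
            for link in costs}


declared = {("s", "a"): 1, ("a", "q"): 1, ("a", "x"): 1, ("s", "q"): 3}
```

In this instance the MST has weight 3, the link (a, x) is pruned because x is not a receiver, and each of the two PMST links is paid 5 - 3 + 1 = 3.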


Regarding all their non-VCG mechanisms, 
they proved the following theorem. 


Theorem 4 The non-VCG mechanisms designed 
for the multicast structures LCPT, PMST, LST, 
VMST, and NST are not only truthful, but also achieve 
the minimum payment among all truthful 
mechanisms. 


Applications 


In wireless ad hoc networks, it is commonly 
assumed that each terminal contributes its local 
resources to forward the data for other terminals 
to serve the common good, and benefits 
from resources contributed by other terminals to 
route its packets in return. On the basis of such 
a fundamental design philosophy, wireless ad 
hoc networks provide appealing features such as 
enhanced system robustness, high service availability, 
and scalability. However, the critical observation 
that individual users who own these 
wireless devices are generally selfish and noncooperative 
may severely undermine the expected 
performance of the wireless networks. Therefore, 
providing incentives to wireless terminals is 
a must to encourage contribution and thus maintain 
the robustness and availability of wireless 
networking systems. On the other hand, to support 
communication among a group of users, 
multicast is more efficient than unicast or broadcast, 
as it can transmit packets to destinations 
using fewer network resources, thus increasing 
the social efficiency. Thus, most results of the 
work of Wang et al. can be applied to multicast routing 
in wireless networks in which nodes are selfish. 
Their approach not only guarantees that multicast routing 
behaves normally but also achieves good social efficiency 
for both the receivers and relay terminals. 


Open Problems 


There are several unsolved challenges left as 
future work in [8]. Some of these challenges are 
listed below. 


• How to design algorithms that can compute 
these payments in asymptotically optimum 
time complexities is presently unknown. 

• Wang et al. [8] only studied the tree-based 
structures for multicast. In practice, mesh-based 
structures may be more suitable for 
wireless networks to improve the fault 
tolerance of the multicast. It is unknown 
whether a strategyproof multicast mechanism 
can be designed for some mesh-based 
structures used for multicast. 

• All of the tree construction and payment 
calculations in [8] are performed in a centralized 
way; it would be interesting to design some 
distributed algorithms for them. 

• In the work by Wang et al. [8] it was assumed 
that the receivers will always relay the data 
packets for other receivers for free, the source 
node of the multicast will pay the relay nodes 
to compensate their cost, and the source node 
will not charge the receivers for getting the 
data. As a possible future work, the budget 
balance of the source node needs to be considered 
if the receivers have to pay the source 
node for getting the data. 

• Fairness of payment sharing needs to be considered 
in a case where the receivers share 
the total payments to all relay nodes on the 
multicast structure. Notice that this is different 
from the cost-sharing studied in [2], in which 
they assumed a fixed multicast tree, and the 
link cost is publicly known; in that work 
they showed how to share the total link cost 
among receivers. 

• Another important task is to study how to 
implement the protocols proposed in [8] in 
a distributed manner. Notice that, in [3, 9], 
distributed methods have been developed for 
a truthful unicast using some cryptography 
primitives. 
Problem Definition 


An instance of the curve reconstruction problem 
is a finite set of sample points V in the plane, 
which are assumed to be taken from an unknown 
planar curve $\gamma$. The task is to construct a geometric 
graph G on V such that two points in V are 
connected by an edge in G if and only if the points 
are adjacent on $\gamma$. The curve $\gamma$ may consist of one 
or more connected components, and each of them 
may be closed or open (with endpoints), and may 
be smooth everywhere (tangent defined at every 
point) or not. 

Many heuristic approaches have been 
proposed to solve this problem. This work 
continues a line of reconstruction algorithms 
with guaranteed performance, i.e., algorithms 
which provably solve the reconstruction 
problem under certain assumptions on $\gamma$ and 
V. Previously proposed solutions with guaranteed 
performance were mostly local: a subgraph of 
the complete geometric graph defined by the 
points is considered (in most cases the Delaunay 
edges), and then filtered using a local criterion into 
a subgraph that will constitute the reconstruction. 
Thus, most of these algorithms fail to enforce 
that the solution has the global property of 
being a path/tour or collection of paths/tours 
and so usually require a dense sampling to work 
properly and have difficulty handling nonsmooth 
curves. See [6, 7, 8] for surveys of these 
algorithms. 


TSP-Based Curve Reconstruction 


This work concentrates on a solution approach 
based on the traveling salesman problem (TSP). 
Recall that a traveling salesman path (tour) for 
a set V of points is a path (cycle) passing through 
all points in V. An optimal traveling salesman 
path (tour) is a traveling salesman path (tour) 
of shortest length. The first question is under 
which conditions for $\gamma$ and V a traveling salesman 
path (tour) is a correct reconstruction. Since the 
construction of an optimal traveling salesman 
path (tour) is an NP-hard problem, a second 
question is whether for the specific instances 
under consideration, an efficient algorithm is 
possible. 

A previous work of Giesen [9] gave a first 
weak answer to the first question: For every 
benign semiregular closed curve $\gamma$, there exists 
an $\epsilon > 0$ with the following property: If V 
is a finite sample set from $\gamma$ so that for 
every $x \in \gamma$ there is a $p \in V$ with $\|px\| < \epsilon$, 
then the optimal traveling salesman tour is 
a polygonal reconstruction of $\gamma$. For a curve 
$\gamma : [0,1] \to \mathbb{R}^2$, its left and right tangents at 
$\gamma(t_0)$ are defined as the limits of the ratio 
$(\gamma(t_2) - \gamma(t_1)) / |t_2 - t_1|$ as $(t_1, t_2)$ converges 
to $(t_0, t_0)$ from the left ($t_1 < t_2 < t_0$) and 
from the right ($t_0 < t_1 < t_2$), respectively. 
A curve is semiregular if both tangents exist 
at every point and regular if the tangents exist 
and coincide at every point. The turning angle 
of $\gamma$ at p is the angle between the left and 
right tangents at the point p. A semiregular 
curve is benign if the turning angle is 
less than $\pi$. 

To investigate the TSP-based solution of the 
reconstruction problem, this work considers its 
integer linear programming (ILP) formulation 
and the corresponding linear programming (LP) 
relaxation. The motivation is that a successful 
method for solving the TSP is to use a branch-and-cut 
algorithm based on the LP-relaxation; 
see Chapter 7 in [5]. For a path with endpoints 
a and b, the formulation is based on variables 
$x_{uv} \in \{0, 1\}$ for each pair u, v in V (indicating 
whether the edge uv is in the path ($x_{uv} = 1$) or not 
($x_{uv} = 0$)) and consists of the following objective 
function and constraints ($x_{uu} = 0$ for all $u \in V$): 


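$$
\begin{array}{lll}
\text{minimize} & \sum_{u,v \in V} \|uv\| \cdot x_{uv} & \\
\text{subject to} & \sum_{v \in V} x_{uv} = 2 & \text{for all } u \in V \setminus \{a, b\} \\
 & \sum_{v \in V} x_{uv} = 1 & \text{for } u \in \{a, b\} \\
 & \sum_{u,v \in V'} x_{uv} \le |V'| - 1 & \text{for } V' \subseteq V,\ V' \ne \emptyset \\
 & x_{uv} \in \{0, 1\} & \text{for all } u, v \in V.
\end{array}
$$

Here $\|uv\|$ denotes the Euclidean distance 
between u and v and so the objective function 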
is the total length of the selected edges. This is 
called the subtour-ILP for the TSP with specified 
endpoints. The equality constraints are called 
the degree constraints, the inequality ones are 
called subtour elimination constraints and the 
last ones are called the integrality constraints. If 
the degree and integrality constraints hold, the 
corresponding graph could include disconnected 
cycles (subtours), hence the need for the subtour 
elimination constraints. The relaxed LP is 
obtained by replacing the integrality constraints 
by the constraints $0 \le x_{uv} \le 1$ and is called the 
subtour-LP for the TSP with specified endpoints. 
There is a polynomial time algorithm that given 
a candidate solution returns a violated constraint 
if it exists: the degree constraints are trivial to 
check and the subtour elimination constraints 
are checked using a min cut algorithm (if a,b 
are joined by an edge and all edge capacities 
are made equal to one, then a violated subtour 
constraint corresponds to a cut smaller than 
two). This means that the subtour-LP for the 
TSP with specified endpoints can potentially 
be solved in polynomial time in the bit size 
of the input description, using the ellipsoid 
method [10]. 
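
The role of the subtour elimination constraints can be illustrated on integral candidates. The Python sketch below is a simplification and not the min cut separation described above: for a 0/1 solution given as a list of undirected edges (each edge listed once, which is an assumption about how the double sum is counted), a violated constraint can be found from connected components. The function names are hypothetical, and vertices are assumed to be numbered 0 to n-1.

```python
from collections import defaultdict


def degree_constraints_hold(n, edges, a, b):
    """Degree 1 at the endpoints a and b, degree 2 at every other vertex."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[u] == (1 if u in (a, b) else 2) for u in range(n))


def violated_subtour(n, edges):
    """Return a vertex set V' carrying more than |V'| - 1 edges, or None.

    Valid for integral solutions only; fractional solutions need the
    min cut separation routine instead.
    """
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        inside = sum(1 for u, v in edges if u in component and v in component)
        if inside > len(component) - 1:
            return component  # this component contains a cycle (a subtour)
    return None


# A path 0-1-2 plus a disjoint triangle 3-4-5: degrees are fine, but the
# triangle violates the subtour elimination constraint for V' = {3, 4, 5}.
path_plus_cycle = [(0, 1), (1, 2), (3, 4), (4, 5), (5, 3)]
single_path = [(0, 1), (1, 3), (3, 4), (4, 5), (5, 2)]
```

The first candidate satisfies every degree constraint with a = 0 and b = 2 yet is not a path, which is exactly the situation the subtour elimination constraints rule out.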

TSP-Based Curve Reconstruction, Fig. 1 Sample data 
and its reconstruction 


Key Results 


The main results of this paper are that, given 
a sample set V with $a, b \in V$ from a benign 
semiregular open curve $\gamma$ with endpoints a, b and 
satisfying certain sampling condition [it], then 


• The optimal traveling salesman path on V with 
endpoints a, b is a polygonal reconstruction of 
$\gamma$ from V, 

• The subtour-LP for traveling salesman paths 
has an optimal integral solution which is 
unique. 


This means that, under the sampling conditions, 
the subtour-LP solution provides a TSP solution 
and also suggests a reconstruction algorithm: 
solve the subtour-LP and, if the solution is 
integral, output it. If the input satisfies the 
sampling condition, then the solution will be 
integral and the result is indeed a polygonal 
reconstruction. Two algorithms are proposed to 
solve the subtour-LP. First, using the simplex 
method and the cutting plane framework: it 
starts with an LP consisting of only the degree 
constraints and in each iteration solves the current 
LP and checks whether that solution satisfies 
all the subtour elimination constraints (using 
a min cut algorithm) and, if not, adds a violated 
constraint to the current LP. This algorithm has 
a potentially exponential running time. Second, 
using a similar approach but with the ellipsoid 
method. This can be implemented so that the 
running time is polynomial in the bit size of the 
input points. This requires justification for using 
approximate point coordinates and distances. 

The main tool in deriving these results is the 
connection between the subtour-LP and the so-called 
Held-Karp bound. The line of argument is 
as follows: 


• Let $c(u,v) = \|uv\|$ and let $\mu : V \to \mathbb{R}$ be
  a potential function. The corresponding
  modified distance function $c_\mu$ is defined by
  $c_\mu(u,v) = c(u,v) - \mu(u) - \mu(v)$.

• For any traveling salesman path $T$ with
  endpoints $a, b$,

  $$c_\mu(T) = c(T) - 2\sum_{v \in V} \mu(v) + \mu(a) + \mu(b),$$

  and so an optimal traveling salesman path with
  endpoints $a, b$ for $c_\mu$ is also optimal for $c$.

• Let $C_\mu$ be the cost of a minimum spanning
  tree $\mathrm{MST}_\mu$ under $c_\mu$. Since a traveling
  salesman path is a spanning tree, the optimal
  traveling salesman path $T_0$ satisfies
  $C_\mu \le c_\mu(T_0) = c(T_0) - 2\sum_{v \in V} \mu(v) + \mu(a) + \mu(b)$,
  and so

  $$\max_\mu \Big( C_\mu + 2\sum_{v \in V} \mu(v) - \mu(a) - \mu(b) \Big) \le c(T_0).$$

  The term on the left is the so-called Held-Karp
  bound.

• Now, if for a particular $\mu$, $\mathrm{MST}_\mu$ is a path with
  endpoints $a, b$, then $\mathrm{MST}_\mu$ is in fact an
  optimal traveling salesman path with endpoints
  $a, b$, and the Held-Karp bound matches $c(T_0)$.

• The Held-Karp bound is equal to the optimal
  objective value of the subtour-LP. This
  follows by relaxation of the degree constraints
  in a Lagrangian fashion (see [5]) and gives
  an effective way to compute the Held-Karp
  bound: solve the subtour-LP.

• Finally, a potential function $\mu$ is constructed
  for $\gamma$ so that, for an appropriately dense
  sample set $V$, $\mathrm{MST}_\mu$ is unique and is a polygonal
  reconstruction with endpoints $a, b$. This then
  implies that solving the subtour-LP will
  produce a correct polygonal reconstruction.
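
The path identity and the resulting lower bound in the argument above can be checked numerically on a tiny instance. The following is only an illustrative sketch: the sample points, the potential values in `mu`, and all function names are assumptions made for the example, and the optimal path is found by brute force rather than by the subtour-LP.

```python
import itertools
import math

def mst_cost(n, cost):
    """Prim's algorithm on the complete graph over vertices 0..n-1."""
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and cost(u, v) < best[v]:
                best[v] = cost(u, v)
    return total

def path_cost(path, cost):
    """Sum of edge costs along consecutive vertices of the path."""
    return sum(cost(u, v) for u, v in zip(path, path[1:]))

def optimal_path(n, a, b, cost):
    """Brute-force optimal traveling salesman path from a to b."""
    inner = [v for v in range(n) if v not in (a, b)]
    return min(([a, *perm, b] for perm in itertools.permutations(inner)),
               key=lambda p: path_cost(p, cost))

# Hypothetical sample points and an arbitrary potential (illustration only).
points = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.9), (3.0, 1.1), (4.0, 0.3)]
mu = [0.3, 0.1, 0.4, 0.2, 0.25]
a, b = 0, 4
n = len(points)

def c(u, v):
    return math.dist(points[u], points[v])

def c_mu(u, v):
    return c(u, v) - mu[u] - mu[v]

t0 = optimal_path(n, a, b, c)
shift = 2 * sum(mu) - mu[a] - mu[b]
# c_mu(T) = c(T) - shift for every path T with endpoints a, b.
identity_gap = abs(path_cost(t0, c_mu) - (path_cost(t0, c) - shift))
# C_mu + shift is a lower bound on c(T_0) for this particular mu.
lower_bound = mst_cost(n, c_mu) + shift
```

Maximizing `lower_bound` over all potentials would give the Held-Karp bound; the sketch evaluates it for one fixed potential only.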


Note that the potential function $\mu$ enters the
picture only as an analysis tool. It is not needed
by the algorithm. The authors extend this work
to the case of open curves without specified
endpoints and of closed curves using variations of the
ILP formulation and a more restricted sampling
condition. They also extend it to the case of
a collection of closed curves. The latter requires
preprocessing that partitions points into groups
that are expected to form individual curves. Each
group is then processed with the subtour-LP
approach, the quality of the result is assessed,
and the partition may be updated accordingly.


Finite Precision 

The above results are obtained assuming exact 
representation of point samples and the distances 
between them, so claiming a polynomial time 
algorithm is not immediate as the running time of 
the ellipsoid method is polynomial in the bit size 
of the input. The authors extend the results to the 
case in which points and the distances between 
them are known only approximately and from 
this they can conclude the polynomial running 
time. 


Relation to Local Feature Size 

The potential function $\mu$ defined in this work is related to
the so-called local feature size function $f$ used
in the theory of smooth curve reconstruction,
where $f(p)$ is defined as the distance from $p$
to the medial axis of the curve $\gamma$. In this paper,
$\mu(p)$ is defined as $d(p)/3$, where $d(p)$ is the
size of the largest neighborhood of $p$ so that $\gamma$ in
that neighborhood does not deviate significantly
from a flat segment of curve. This paper shows
$f(p) < 3d(p)$. In fact, $\mu(p)$ amounts to a
generalization of the local feature size to nonsmooth
curves (for a corner point $p$, $\mu(p)$ is proportional
to the size of the largest neighborhood of $p$
such that $\gamma$ inside does not deviate significantly
from a corner point with two nearly flat legs
incident to it, and for points near the corner, $\mu$ is
defined as an appropriate interpolation of the two
definitions), and is in fact similar to definitions
proposed elsewhere.


Applications 


The curve reconstruction problem appears in
applied areas such as cartography, for example, to
determine level sets, features, object contours,
etc. from samples. Admittedly, these applications
often require the ability to handle very
sparse sampling and noise. The 3D version of
the problem is very important in areas such as
industrial manufacturing, medical imaging, and
computer animation. The 2D problem is often
seen as a simpler (toy) problem to test algorithmic
approaches.


Open Problems 


A TSP-based solution when the curve $\gamma$ is
a collection of curves, not all closed, is not 
given in this paper. A solution similar to that for 
closed curves (partitioning and then application 
of subtour-LP for each) seems feasible for 
general collections, but some technicalities need 
to be solved. More interesting is the study of 
corresponding reconstruction approaches for 
surfaces in 3D. 


Experimental Results 


The companion paper [2] presents results of ex- 
periments comparing the TSP-based approach to 
several (local) Delaunay filtering algorithms. The 
TSP implementation uses the simplex method 
and the cutting plane framework (with a poten- 
tially exponential running time algorithm). The 
experiments show that the TSP-based approach 
has a better performance, allowing for much 
sparser samples than the others. This is to be ex- 
pected given the global nature of the TSP-based 
solution. On the other hand, the speed of the TSP- 
based solution is reported to be competitive when 
compared to the speed of the others, despite its 
potentially bad worst-case behavior. 
Data Sets


None reported. Experiments in [2] were
performed with a simple reproducible curve based
on a sinusoid with a varying number of periods
and samples.


URL to Code 


The code of the TSP-based solution, as well as
that of the other solutions considered in the companion
paper [2], is available from:
http://www.mpi-inf.mpg.de/~althaus/LEP:Curve-Reconstruction/curve.html
Problem Definition


Definition 1 Let $T$ be a two-dimensional $n \times n$
array over some alphabet $\Sigma$.

1. The unit pixels array for $T$ ($T^{1X}$) consists of
   $n^2$ unit squares, called pixels, in the real plane
   $\mathbb{R}^2$. The corners of the pixel $T[i,j]$ are
   $(i-1, j-1)$, $(i, j-1)$, $(i-1, j)$, and $(i, j)$. Hence
   the pixels of $T$ form a regular $n \times n$ array that
   covers the area between $(0,0)$, $(n,0)$, $(0,n)$,
   and $(n,n)$. Point $(0,0)$ is the origin of the unit
   pixel array. The center of each pixel is the
   geometric center point of its square location.
   Each pixel $T[i,j]$ is identified with the value
   from $\Sigma$ that the original array $T$ had in that
   position. Say that the pixel has a color or
   a character from $\Sigma$. See Fig. 1 for an
   example of the grid and pixel centers of a $7 \times 7$
   array.

2. Let $r \in \mathbb{R}$, $r \ge 1$. The $r$-ary pixels array
   for $T$ ($T^{rX}$) consists of $n^2$ $r$-squares, each of
   dimension $r \times r$, whose origin is $(0,0)$ and which
   covers the area between $(0,0)$, $(nr,0)$, $(0,nr)$,
   and $(nr,nr)$. The corners of the pixel $T[i,j]$
   are $((i-1)r, (j-1)r)$, $(ir, (j-1)r)$,
   $((i-1)r, jr)$, and $(ir, jr)$. The center of
   each pixel is the geometric center point of its
   square location.


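Notation: Let $r \in \mathbb{R}$. $[r]$ denotes the rounding
of $r$, i.e.,

$$[r] = \begin{cases} \lfloor r \rfloor & \text{if } r - \lfloor r \rfloor < 0.5; \\ \lceil r \rceil & \text{otherwise.} \end{cases}$$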


Definition 2 Let $T$ be an $n \times n$ text array,
$P$ be an $m \times m$ pattern array over alphabet
$\Sigma$, and let $r \in \mathbb{R}$, $1 \le r \le n/m$. Say that there
is an occurrence of $P$ scaled to $r$ at text
location $(i,j)$ if the following conditions
hold:

Let $T^{1X}$ be the unit pixels array of $T$ and $P^{rX}$
be the $r$-ary pixels array of $P$. Translate $P^{rX}$ onto
$T^{1X}$ in a manner that the origin of $P^{rX}$ coincides
with location $(i-1, j-1)$ of $T^{1X}$. Every center
of a pixel in $T^{1X}$ which is within the area covered
by $(i-1, j-1)$, $(i-1, j-1+mr)$, $(i-1+mr, j-1)$
and $(i-1+mr, j-1+mr)$ has the
same color as the $r$-square of $P^{rX}$ in which it falls.

The colors of the centers of the pixels in $T^{1X}$
which are within the area covered by $(i-1, j-1)$,
$(i-1, j-1+mr)$, $(i-1+mr, j-1)$ and
$(i-1+mr, j-1+mr)$ define a $[mr] \times [mr]$
array over $\Sigma$. This array is denoted by $P^r$ and
called $P$ scaled to $r$.
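
The construction of $P$ scaled to $r$ can be illustrated with a short sketch. It assumes the pattern is given as a list of rows, takes pixel centers at half-integer coordinates, and assigns a center lying exactly on the border between two $r$-squares to the lower-indexed square; the function names and that boundary convention are assumptions of the example, not part of the definition.

```python
import math

def round_half_up(x):
    """Rounding [x] from the Notation: floor if the fractional part is < 0.5, else ceiling."""
    return math.floor(x) if x - math.floor(x) < 0.5 else math.ceil(x)

def scale_geometric(pattern, r):
    """Return P scaled to r as a [mr] x [mr] list of rows (geometric model)."""
    m = len(pattern)
    size = round_half_up(m * r)

    def source_index(k):
        # Center of the k-th unit pixel (1-based) lies at coordinate k - 0.5;
        # it falls in the r-square whose 1-based index is ceil(center / r).
        center = k - 0.5
        return min(m, max(1, math.ceil(center / r)))

    return [[pattern[source_index(i) - 1][source_index(j) - 1]
             for j in range(1, size + 1)]
            for i in range(1, size + 1)]

pattern = [["a", "b"],
           ["c", "d"]]
scaled_2 = scale_geometric(pattern, 2)      # every symbol becomes a 2 x 2 block
scaled_1_5 = scale_geometric(pattern, 1.5)  # a 3 x 3 array
```

For integer $r$ the result coincides with block replication of each symbol, which is the natural-scales notion discussed next.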


The above definition is the one provided in 
the geometric model, pioneered by Landau and 
Vishkin [15], and Fredriksson and Ukkonen [14]. 
Prior to the advent of the geometric model, 
the only discrete definition of scaling was to 
natural scales, as defined by Amir, Landau and 
Vishkin [10]: 
Definition 3 Let $P[m \times m]$ be a two-dimensional
matrix over alphabet $\Sigma$ (not necessarily
bounded). Then $P$ scaled by $s$ ($P^s$) is the
$sm \times sm$ matrix where every symbol $P[i,j]$
of $P$ is replaced by an $s \times s$ matrix whose
elements all equal the symbol in $P[i,j]$. More
precisely,

$$P^s[i,j] = P\big[\lceil i/s \rceil, \lceil j/s \rceil\big].$$

Say that pattern $P[m \times m]$ occurs (or an
occurrence of $P$ starts) at location $(k,l)$ of the text
$T$ if for any $i \in \{1,\ldots,m\}$ and $j \in \{1,\ldots,m\}$,
$T[k+i-1, l+j-1] = P[i,j]$.

The two dimensional pattern matching 
problem with natural scales is defined as 
follows. 


INPUT: Pattern matrix $P[i,j]$, $i = 1,\ldots,m$;
$j = 1,\ldots,m$, and text matrix $T[i,j]$,
$i = 1,\ldots,n$; $j = 1,\ldots,n$, where $n > m$.


OUTPUT: all locations in $T$ where an
occurrence of $P$ scaled by $s$ (an $s$-occurrence) starts,
for any $s = 1,\ldots,\lfloor n/m \rfloor$.
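
As a point of reference for the problem statement, a brute-force search over all natural scales can be sketched as follows. This is not one of the algorithms cited in this entry; it only spells out the definition, and the function names are chosen for the example.

```python
def scale_natural(pattern, s):
    """P scaled by s: every symbol of P is replaced by an s x s block of itself."""
    m = len(pattern)
    return [[pattern[i // s][j // s] for j in range(s * m)] for i in range(s * m)]

def occurs_at(text, scaled, k, l):
    """True if `scaled` matches `text` with its top-left corner at 0-based (k, l)."""
    size = len(scaled)
    return all(text[k + i][l + j] == scaled[i][j]
               for i in range(size) for j in range(size))

def naive_scaled_matches(text, pattern):
    """All (row, column, s) with 1-based locations where an s-occurrence starts."""
    n, m = len(text), len(pattern)
    hits = []
    for s in range(1, n // m + 1):
        scaled = scale_natural(pattern, s)
        for k in range(n - s * m + 1):
            for l in range(n - s * m + 1):
                if occurs_at(text, scaled, k, l):
                    hits.append((k + 1, l + 1, s))
    return hits

text = [list("aabb"),
        list("aabb"),
        list("ccdd"),
        list("ccdd")]
pattern = [list("ab"),
           list("cd")]
matches = naive_scaled_matches(text, pattern)
```

The running time of this sketch is far from linear in $|T|$; it is meant only to make the input and output of the problem concrete.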


The natural scales definition cannot answer 
normal everyday occurrences such as an image 
scaled to, say, 1.3. This led to the geometric 
model. The geometric model is a discrete 
adaptation, without smoothing, of scaling as used 
in computer graphics. The definition is pleasing 
in a “real-world” sense. Figure 2 shows “lenna” 
scaled to non-discrete scales by the geometric 
model definition. The results look natural. 


It is possible, of course, to consider a
one-dimensional version of scaling, or scaling in
strings. Both definitions above apply to
one-dimensional scaling when the text and pattern
are taken to be matrices having a single row.
One-dimensional scaling is of interest for
two reasons: (1) There is a faster
algorithm for one-dimensional scaling in the
geometric model than the restriction of the
two-dimensional scaling algorithm to one dimension.
(2) Historically, before the geometric model was
defined, there was an attempt [3] to define real
scaling on strings as follows.
Two-Dimensional Scaled Pattern Matching, Fig. 1 The grid and pixel centers
of a unit pixel array for a $7 \times 7$ array


Two-Dimensional Scaled Pattern Matching, Fig. 2 An original image, scaled by 1.3 and scaled by 2, using the
geometric model definition of scaling


Definition 4 Denote the string $aa \cdots a$, where $a$
is repeated $r$ times, by $a^r$. The one dimensional
floor real scaled matching problem is the
following.
INPUT: A pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$ of length
$m$, and a text $T$ of length $n$.
OUTPUT: All locations in the text where the
substring $a_1^{c_1} a_2^{\lfloor r_2 k \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} k \rfloor} a_j^{c_j}$ appears, where
$c_1 \ge \lfloor r_1 k \rfloor$ and $c_j \ge \lfloor r_j k \rfloor$.


This definition indeed handles real scaling but has 
a significant weakness in that a string of length 
m scaled to r may be significantly shorter than 
mr. For this reason the definition could not be 
generalized to two dimensions. The geometric 
model does not suffer from these deficiencies. 
Key Results


The first results in scaled natural matching dealt 
with fixed finite-sized alphabets. 


Theorem 1 (Amir, Landau, and Vishkin [10])
There exists an $O(|T| \log |\Sigma|)$ worst-case time
solution to the two dimensional pattern
matching problem with natural scales, for fixed finite
alphabet $\Sigma$.


The main idea behind the algorithm is analyzing
the text with the aid of power columns. Those
are the text columns appearing $m - 1$ columns
apart, where $P$ is an $m \times m$ pattern. This
dependence on the pattern size makes the power
columns useless where a dictionary of
different-sized patterns is involved. A significantly
simpler algorithm with the additional advantage
of being alphabet-independent was presented
in [6].


Theorem 2 (Amir and Calinescu [6]) There 
exists an O(|T|) worst-case time solution to the 
two dimensional pattern matching problem with 
natural scales. 


The alphabet independent time complexity 
of this algorithm was achieved by developing 
a scaling-invariant “signature” of the pattern. 
This idea was further developed to scaled 
dictionary matching. 


Theorem 3 (Amir and Calinescu [6]) Given
a static dictionary of square pattern matrices, it
is possible in $O(|D| \log k)$ preprocessing time, where
$|D|$ is the total dictionary size and $k$ is the number
of patterns in the dictionary, and $O(|T| \log k)$
text scanning time, for input text $T$, to find all
occurrences of dictionary patterns in the text in
all natural scales.


This is identical to the time in [8], the best
non-scaled matching algorithm for a static dictionary
of square patterns. It is somewhat surprising that
scaling adds to the complexity of neither single
matching nor dictionary matching.

The first algorithm to solve the scaled
matching problem for real scales was a
one-dimensional real scaling algorithm using Definition 4.


Theorem 4 (Amir, Butman, and Lewen- 
stein [3]) There exists an O(|T|) worst-case time 
solution to the one dimensional floor real scaled 
matching problem. 


The first algorithm to solve the two dimen- 
sional scaled matching problem for real scales in 
the geometric model is the following. 


Theorem 5 (Amir, Butman, Lewenstein,
and Porat [4]) Given an $n \times n$ text and an
$m \times m$ pattern, it is possible to find all
pattern occurrences in all real scales in time
$O(nm^3 + n^2 m \log m)$ and space $O(nm^3 + n^2)$.


The above result was improved. 


Theorem 6 (Amir and Chencinski [7]) Given
an $n \times n$ text and an $m \times m$ pattern, it is possible to
find all pattern occurrences in all real scales in
time $O(n^2 m)$ and space $O(n)$.


This algorithm achieves its time by exploiting 
geometric characteristics of nested scales occur- 
rences and a sophisticated use of dueling [1, 16]. 

The assumption in both above algorithms is 
that the scaled occurrence of the pattern starts at 
the top left corner of some pixel. 

It turns out that one can achieve faster times in 
the one dimensional real scaled matching prob- 
lem, even in the geometric model. 


Theorem 7 (Amir, Butman, Lewenstein, Porat,
and Tsur [5]) Given a text string $T$ of length $n$
and a pattern string $P$ of length $m$, there exists
an $O(n \log m + m\sqrt{nm \log m})$ worst-case time
solution to the one dimensional pattern matching
problem with real scales in the geometric model.


Applications 


The problem of finding approximate occurrences 
of a template in an image is a central one in 
digital libraries and web searching. The current 
algorithms to solve this problem use methods
of computer vision and computational geometry. 
They model the image in another space and seek 
a solution there. A deterministic worst-case algo- 
rithm in pixel-level images does not yet exist. Yet, 
such an algorithm could be useful, especially in 
raw data that has not been modeled, e.g., movies. 
The work described here advances another step 
toward this goal from the scaling point of view. 


Open Problems 


Finding all scaled occurrences without fixing the 
scaled pattern start at the top left corner of the 
text pixel would be important from a practical 
point of view. The final goal is an integration of 
scaling with rotation [2, 11-13] and local errors 
(edit distance) [9]. 
Problem Definition


The problem is concerned with finding large 
constrained patterns in sets of 2-intervals. Given 
a single-stranded RNA molecule, a sequence of
contiguous bases of the molecule can be repre- 
sented as an interval on a single line, and a possi- 
ble pairing between two disjoint sequences can 
be represented as a 2-interval, which is merely 
the union of two disjoint intervals. Derived from 
arc-annotated sequences, 2-interval representa- 
tion considers thus only the bonds between the 
bases and the pattern of the bonds, such as hairpin 
structures, knots and pseudoknots. A maximum 
cardinality disjoint subset of a candidate set of 
2-intervals restricted to certain prespecified ge- 
ometrical constraints can provide a useful valid 
approximation for RNA secondary structure de- 
termination. 

The geometric properties of 2-intervals pro- 
vide a possible guide for understanding the com- 
putational complexity of finding structured pat- 
terns in RNA sequences. Using a model to rep- 
resent nonsequential information allows us to 
vary restrictions on the complexity of the pattern 
structure. Indeed, two disjoint 2-intervals, i.e., 
two 2-intervals that do not intersect in any point, 
can be in precedence order ($<$), be allowed to
nest ($\sqsubset$) or be allowed to cross ($\between$). Furthermore,
the set of 2-intervals and the pattern can have 
different restrictions, e.g., all intervals have the 
same length or all the intervals are disjoint. These 
different combinations of restrictions alter the 
computational complexity of the problems, and 
need to be examined separately. This examination 
produces efficient algorithms for more restrictive 
structured patterns, and hardness results for those 
that are less restrictive. 


Notations 

Let $I = [a,b]$ be an interval on the line. Write
$\mathrm{start}(I) = a$ and $\mathrm{end}(I) = b$. A 2-interval is the
union of two disjoint intervals defined over a
single line and is denoted by $D = (I, J)$; $I$ is
completely to the left of $J$. Write $\mathrm{left}(D) = I$ and
$\mathrm{right}(D) = J$. Two 2-intervals $D_1 = (I_1, J_1)$ and
$D_2 = (I_2, J_2)$ are said to be disjoint (or
nonintersecting) if both 2-intervals share no common
point, i.e., $(I_1 \cup J_1) \cap (I_2 \cup J_2) = \emptyset$. For such
disjoint pairs of 2-intervals, three natural binary
relations, denoted $<$, $\sqsubset$ and $\between$, are of special
interest:
• $D_1 < D_2$ ($D_1$ precedes $D_2$), if $I_1 < J_1 < I_2 < J_2$,

• $D_1 \sqsubset D_2$ ($D_1$ is nested in $D_2$), if $I_2 < I_1 < J_1 < J_2$, and

• $D_1 \between D_2$ ($D_1$ crosses $D_2$), if $I_1 < I_2 < J_1 < J_2$.
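
The three relations can be made concrete with a small sketch. It assumes a 2-interval is represented as a pair of closed intervals `((a, b), (c, d))` with the first interval to the left of the second, and reads $I < J$ as "$I$ ends before $J$ starts"; the representation and function names are assumptions of the example.

```python
def before(i, j):
    """Interval i lies completely to the left of interval j."""
    return i[1] < j[0]

def relation(d1, d2):
    """Name of the relation R with d1 R d2, or None if none of the three holds."""
    (i1, j1), (i2, j2) = d1, d2
    if before(i1, j1) and before(j1, i2) and before(i2, j2):
        return "precedes"
    if before(i2, i1) and before(i1, j1) and before(j1, j2):
        return "nested"
    if before(i1, i2) and before(i2, j1) and before(j1, j2):
        return "crosses"
    return None

def comparable(d1, d2):
    """Relation under which the pair is comparable, trying both orders."""
    return relation(d1, d2) or relation(d2, d1)

A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
C = ((1, 2), (9, 10))
D = ((4, 5), (6, 7))
E = ((1, 2), (5, 6))
F = ((3, 4), (7, 8))
```

Here `relation(A, B)` reports that `A` precedes `B`, `relation(D, C)` that `D` is nested in `C`, and `relation(E, F)` that `E` crosses `F`.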


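A pair of 2-intervals $D_1$ and $D_2$ is said to
be $R$-comparable for some $R \in \{<, \sqsubset, \between\}$, if
either $D_1 R D_2$ or $D_2 R D_1$. Note that any two
disjoint 2-intervals are $R$-comparable for some
$R \in \{<, \sqsubset, \between\}$. A set of disjoint 2-intervals
$\mathcal{D}$ is said to be $\mathcal{R}$-comparable for some
$\mathcal{R} \subseteq \{<, \sqsubset, \between\}$, $\mathcal{R} \ne \emptyset$, if any pair of distinct
2-intervals in $\mathcal{D}$ is $R$-comparable for some $R \in \mathcal{R}$.
The nonempty subset $\mathcal{R}$ is called a model for $\mathcal{D}$.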

The 2-interval-pattern problem asks one to 
find in a set of 2-intervals a largest subset of 
pairwise compatible 2-intervals. In the present 
context, compatibility denotes the fact that any 
two 2-intervals in the solution are (1) noninter- 
secting and (2) satisfy some prespecified geomet- 
rical constraints. The 2-interval-pattern problem 
is formally defined as follows: 


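Problem 1 (2-interval-pattern)

INPUT: A set of 2-intervals $\mathcal{D}$ and a model
$\mathcal{R} \subseteq \{<, \sqsubset, \between\}$.

SOLUTION: An $\mathcal{R}$-comparable subset $\mathcal{D}' \subseteq \mathcal{D}$.
MEASURE: The size of the solution, i.e., $|\mathcal{D}'|$.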


According to the above definition, any solution
for the 2-interval-pattern problem for some
model $\mathcal{R} \subseteq \{<, \sqsubset, \between\}$ corresponds to an RNA
structure constrained by $\mathcal{R}$. For example,
a solution for the 2-interval-pattern problem
for the $\mathcal{R} = \{<, \sqsubset\}$ model corresponds to
a pseudoknot-free structure (a pseudoknot in an
RNA sequence $S = s_1, s_2, \ldots, s_n$ is composed
of two interleaving nucleotide pairings $(s_i, s_j)$
and $(s_{i'}, s_{j'})$ such that $i < i' < j < j'$).

Some additional definitions are needed for
further algorithmic analysis. Let $\mathcal{D}$ be a set
of 2-intervals. The width (respectively height,
depth) is the size of a maximum cardinality
$\{<\}$-comparable (respectively $\{\sqsubset\}$-comparable,
$\{\between\}$-comparable) subset $\mathcal{D}' \subseteq \mathcal{D}$. The interleaving
distance of a 2-interval $D_i \in \mathcal{D}$ is defined to
be the distance between the two intervals of $D_i$,
i.e., $\mathrm{start}(\mathrm{right}(D_i)) - \mathrm{end}(\mathrm{left}(D_i))$. The total
interleaving distance of the set of 2-intervals $\mathcal{D}$,
written $\mathcal{L}(\mathcal{D})$, is the sum of all interleaving
distances, i.e.,
$\mathcal{L}(\mathcal{D}) = \sum_{D_i \in \mathcal{D}} \big( \mathrm{start}(\mathrm{right}(D_i)) - \mathrm{end}(\mathrm{left}(D_i)) \big)$.
The interesting coordinates
of $\mathcal{D}$ are defined to be the set
$X(\mathcal{D}) = \bigcup_{D_i \in \mathcal{D}} \{\mathrm{end}(\mathrm{left}(D_i)), \mathrm{start}(\mathrm{right}(D_i))\}$. The
density of $\mathcal{D}$, written $d(\mathcal{D})$, is the maximum
number of 2-intervals in $\mathcal{D}$ over a single point.
Formally,
$d(\mathcal{D}) = \max_{x \in X(\mathcal{D})} |\{D \in \mathcal{D} : \mathrm{end}(\mathrm{left}(D)) \le x < \mathrm{start}(\mathrm{right}(D))\}|$.
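
These parameters are straightforward to compute directly from their definitions. The sketch below assumes the same pair-of-pairs representation `((a, b), (c, d))` for a 2-interval and counts a 2-interval as lying over a point $x$ when $\mathrm{end}(\mathrm{left}(D)) \le x < \mathrm{start}(\mathrm{right}(D))$; both the representation and this half-open reading are assumptions of the example.

```python
def interleaving_distance(d):
    """start(right(D)) - end(left(D)) for a 2-interval D = (left, right)."""
    left, right = d
    return right[0] - left[1]

def total_interleaving_distance(ds):
    """Sum of the interleaving distances over the whole set."""
    return sum(interleaving_distance(d) for d in ds)

def interesting_coordinates(ds):
    """Set of all end(left(D)) and start(right(D)) values."""
    return {x for left, right in ds for x in (left[1], right[0])}

def density(ds):
    """Maximum number of 2-intervals whose gap covers a single interesting coordinate."""
    return max((sum(1 for left, right in ds if left[1] <= x < right[0])
                for x in interesting_coordinates(ds)), default=0)

ds = [((1, 2), (5, 6)), ((3, 4), (7, 8)), ((9, 10), (11, 12))]
total = total_interleaving_distance(ds)
coords = interesting_coordinates(ds)
dens = density(ds)
```

For this sample set the gaps have lengths 3, 3 and 1, and the point 4 lies in the gaps of the first two 2-intervals.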


Constraints 

The structure of the set of all (simple) intervals
involved in a set of 2-intervals $\mathcal{D}$ turns out to be
of particular importance for algorithmic analysis
of the 2-interval-pattern problem. The interval
ground set of $\mathcal{D}$, denoted $\mathcal{I}(\mathcal{D})$, is the set of all
intervals involved in $\mathcal{D}$, i.e.,
$\mathcal{I}(\mathcal{D}) = \{\mathrm{left}(D_i) : D_i \in \mathcal{D}\} \cup \{\mathrm{right}(D_i) : D_i \in \mathcal{D}\}$.
In [7, 20], four types of interval ground sets were
introduced.


1. Unlimited: no restriction on the structure. 

2. Balanced: each 2-interval $D_i \in \mathcal{D}$ is
   composed of two intervals having the same length,
   i.e., $|\mathrm{left}(D_i)| = |\mathrm{right}(D_i)|$.

3. Unit: the interval ground set $\mathcal{I}(\mathcal{D})$ is solely
   composed of unit length intervals.

4. Disjoint: no two distinct intervals in the
   interval ground set $\mathcal{I}(\mathcal{D})$ intersect.
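
The restricted ground-set types can be tested mechanically. The following sketch assumes integer positions and closed intervals, so that an interval `(a, b)` has length `b - a + 1` and a unit interval is a single position; this length convention and the function names are assumptions made for the example.

```python
def ground_set(ds):
    """All intervals involved in the 2-intervals, with duplicates removed."""
    return {iv for d in ds for iv in d}

def length(iv):
    # Assumption: closed interval over integer positions.
    return iv[1] - iv[0] + 1

def is_balanced(ds):
    """Both intervals of every 2-interval have the same length."""
    return all(length(left) == length(right) for left, right in ds)

def is_unit(ds):
    """Every interval of the ground set has unit length."""
    return all(length(iv) == 1 for iv in ground_set(ds))

def is_disjoint(ds):
    """No two distinct intervals of the ground set intersect."""
    ivs = sorted(ground_set(ds))
    # After sorting by start, any intersection shows up between neighbors.
    return all(a[1] < b[0] for a, b in zip(ivs, ivs[1:]))

unit_disjoint = [((1, 1), (4, 4)), ((2, 2), (3, 3))]
balanced_only = [((1, 3), (5, 7)), ((2, 4), (8, 10))]
```

The first sample set is unit, balanced and disjoint; the second is balanced but neither unit nor disjoint.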


Observe that a unit 2-interval set is balanced, 
while the converse is not necessarily true. Fur- 
thermore, for most applications, one may assume 
that a disjoint 2-interval set is unit. Observe that 
in this latter case, a set of 2-intervals reduces to 
a graph G = (V, E) equipped with a numbering 
of its vertices from 1 to |V|, and hence the 
2-interval-pattern problem for disjoint interval 
ground sets reduces to finding a constrained max- 
imum matching in a linear graph. Considering 
additional restrictions such as: 


• Bounding the width, the height or the depth
  of either the input set of 2-intervals or the
  solution subset
• Bounding the interleaving distances


is also of interest for practical applications.


Key Results 


The different combinations of the models and 
interval ground sets alter the computational com- 
plexity of the 2-interval-pattern problem. The 
main results are summarized in Tables 1 (time
complexity and hardness) and 2 (approximation 
for hard instances). 


Theorem 1 The 2-interval-pattern problem is
APX-hard for models $\mathcal{R} = \{<, \sqsubset, \between\}$ and
$\mathcal{R} = \{\sqsubset, \between\}$, and is NP-complete, in its natural
decision version, for model $\mathcal{R} = \{<, \between\}$, even
when restricted to unit interval ground sets.


Notice here that the 2-interval-pattern problem
for model $\mathcal{R} = \{<, \between\}$ is not APX-hard. Two
hard cases of the 2-interval-pattern problem turn out to
be polynomial-time-solvable when restricted to
disjoint-interval ground sets.


Theorem 2 The 2-interval-pattern problem for a
disjoint-interval ground set is solvable in


• $O(n\sqrt{n})$ time for model $\mathcal{R} = \{<, \sqsubset, \between\}$
  (trivial reduction to the standard maximum
  matching problem)

• $O(n \log n + \mathcal{L})$ time for model $\mathcal{R} = \{\sqsubset, \between\}$


Two-Interval Pattern Problems, Table 1 Complexity
of the 2-interval-pattern problem for all combinations of
models and interval ground sets. For the polynomial-time
cases, $n = |\mathcal{D}|$, $\mathcal{L} = \mathcal{L}(\mathcal{D})$ and $d = d(\mathcal{D})$

                                 Interval ground set $\mathcal{I}(\mathcal{D})$
Model $\mathcal{R}$                Unlimited, balanced, unit           Disjoint

$\{<, \sqsubset, \between\}$       APX-hard [1]                        $O(n\sqrt{n})$ [15]
$\{<, \between\}$                  NP-complete [3]                     unknown
$\{\sqsubset, \between\}$          APX-hard [19]                       $O(n \log n + \mathcal{L})$ [8]
$\{<, \sqsubset\}$                 $O(n \log n + nd)$ [8] (all interval ground sets)
$\{<\}$                            $O(n \log n)$ [19] (all interval ground sets)
$\{\sqsubset\}$                    $O(n \log n)$ [3] (all interval ground sets)
$\{\between\}$                     $O(n \log n + \mathcal{L})$ [8] (all interval ground sets)


The complexity of the 2-interval-pattern
problem for model $\mathcal{R} = \{<, \between\}$ and a disjoint-interval
ground set is still unknown. Three cases of the
2-interval-pattern problem are
polynomial-time-solvable, regardless of the structure of the interval
ground sets.


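Theorem 3 The 2-interval-pattern problem is
solvable in


• $O(n \log n + nd)$ time for model $\mathcal{R} = \{<, \sqsubset\}$

• $O(n \log n)$ time for models $\mathcal{R} = \{<\}$ and
  $\mathcal{R} = \{\sqsubset\}$

• $O(n \log n + \mathcal{L})$ time for model $\mathcal{R} = \{\between\}$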


One may now turn to approximating hard
instances of the 2-interval-pattern problem.
Surprisingly enough, no significant differences (in
terms of approximation guarantees) have yet been
found for the 2-interval-pattern problem
between the model $\mathcal{R} = \{<, \sqsubset, \between\}$ and the model
$\mathcal{R} = \{\sqsubset, \between\}$ (the approximation algorithms are,
however, different).


Theorem 4 The 2-interval-pattern problem for
model $\mathcal{R} = \{<, \sqsubset, \between\}$ or model $\mathcal{R} = \{\sqsubset, \between\}$ is
approximable within ratio


• 4 for unlimited-interval ground sets, and
• $2 + \epsilon$ for unit-interval ground sets.


The 2-interval-pattern problem for model
$\mathcal{R} = \{<, \between\}$ is approximable within ratio
$1 + 1/\epsilon$, $\epsilon \ge 2$ for all models.


A practical 3-approximation algorithm for
model $\mathcal{R} = \{<, \sqsubset, \between\}$ (resp. $\mathcal{R} = \{\sqsubset, \between\}$) and
unit interval ground set that runs in $O(n \lg n)$
(resp. $O(n^2 \lg n)$) time has been proposed
in [1] (resp. [7]). For model $\mathcal{R} = \{<, \between\}$, a
more practical 2-approximation algorithm that
runs in O(n? lgn) time has been proposed in 
[10]. Notice that Theorem 4 holds true for
the weighted version of the 2-interval-pattern
problem [7] except for models $\mathcal{R} = \{<, \sqsubset, \between\}$
and $\mathcal{R} = \{\sqsubset, \between\}$ and unit interval ground set
where the best approximation ratio is $2.5 + \epsilon$ [5].


Applications 


Sets of 2-intervals can be used for modeling 
stems in RNA structures [20, 21], determining 
DNA sequence similarities [13] or scheduling 
jobs that are given as groups of nonintersecting 
segments in the real line [1, 9]. In all these 
applications, one is concerned with finding 
a maximum cardinality subset of nonintersecting 
2-intervals. Some other classical combinatorial 
problems are also of interest [5]. Also, 
considering sets of t-intervals (each element is 
the union of at most ¢ disjoint intervals) and their 
corresponding intersection graph has proved to 
be useful. 

It is computationally challenging to predict 
RNA structures including pseudoknots [14]. 
Practical approaches to cope with intractability 
are either to restrict the class of pseudoknots 
under consideration [18] or to use heuris- 
tics [6, 17, 19]. The general problem of 
establishing a general representation of structured 
patterns, i.e., macroscopic describers of RNA 
structures, was considered in [20]. Sets of 
2-intervals provide such a natural geometric 
description. 

Constructing a relevant 2-interval set from
an RNA sequence is relatively easy: stable stems
are selected, usually according to a simplified
thermodynamic model without accounting for
loop energy [2, 16, 19-21]. Predicting a reliable
RNA structure then reduces to finding
a maximum subset of nonconflicting 2-intervals, i.e.,
a subset of disjoint 2-intervals. Considering in
addition a model $\mathcal{R} \subseteq \{<, \sqsubset, \between\}$ allows us to
vary restrictions on the complexity of the pattern
structure. In [21], the treewidth of the intersection
graph of the set of 2-intervals is considered for
speeding up the computation.

For sets of 2-intervals involved in practical
applications, restrictions on the interval ground
set are needed. Unit interval ground sets were
considered in [7]. Of particular importance in
the context of molecular biology (RNA structures
and DNA sequence similarities) are balanced
interval ground sets, where each 2-interval is
composed of two intervals of equal length.


Two-Interval Pattern Problems, Table 2 Performance ratios for hard instances of the 2-interval-pattern problem.
LP stands for Linear Programming and N/A stands for Not Applicable

                               Interval ground set $\mathcal{I}(\mathcal{D})$
Model $\mathcal{R}$              Unlimited     Balanced                    Unit                                                    Disjoint

$\{<, \sqsubset, \between\}$     4, LP [1]     4, $O(n \lg n)$ [7]         $2 + \epsilon$, $O(n^2 + n^{O(\log 1/\epsilon)})$ [13]   N/A
$\{\sqsubset, \between\}$        4, LP [7]     4, $O(n^2 \lg n)$ [7]       $2 + \epsilon$, $O(n^2 + n^{O(\log 1/\epsilon)})$ [13]   N/A
$\{<, \between\}$                $1 + 1/\epsilon$, $O(n^{2\epsilon+3})$, $\epsilon \ge 2$ [14] (all interval ground sets)


Open Problems 


A number of problems related to the
2-interval-pattern problem remain open. First, improving
the approximation ratios for the various flavors
of the 2-interval-pattern problem is of
particular importance. For example, the existence of
a fast approximation algorithm with good
performance guarantee for the 2-interval-pattern
problem for model $\mathcal{R} = \{<, \sqsubset, \between\}$ remains an
apparently challenging open problem. A related
open research area is concerned with
balanced-interval ground sets. In particular, no evidence
has shown yet that the 2-interval-pattern
problem becomes easier to approximate for
balanced-interval ground sets. This question is of special
importance in the context of RNA structures
where most 2-intervals are balanced.

A number of important questions are still
open for model $\mathcal{R} = \{<, \between\}$. First, it is still
unknown whether the 2-interval-pattern
problem for disjoint-interval ground sets and
model $\mathcal{R} = \{<, \between\}$ is polynomial-time-solvable.
Observe that this problem trivially reduces to
the following graph problem: Given a graph
$G = (V, E)$ with $V = \{1, 2, \ldots, n\}$, find
a maximum cardinality matching $M \subseteq E$ such
that for any two distinct edges $\{i, j\}$ and $\{k, l\}$
of $M$, $i < j$, $k < l$ and $i < k$, either $j < k$
or $j < l$. Another open question concerns the
approximation of the 2-interval-pattern problem
for balanced interval ground sets. Is this special
case better approximable than the general case?
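
The graph formulation of the open problem can be explored on small instances with an exhaustive search. The sketch below is exponential and only meant to make the constraint concrete; edges are assumed to be given as pairs of distinct integers, and the function names are chosen for the example.

```python
from itertools import combinations

def compatible(e, f):
    """Two edges may coexist in M: vertex-disjoint and not nested."""
    (i, j), (k, l) = sorted([tuple(sorted(e)), tuple(sorted(f))])
    if len({i, j, k, l}) < 4:
        return False          # the edges share a vertex, so M is no matching
    return j < k or j < l     # with i < k: precedence or crossing, no nesting

def max_constrained_matching(edges):
    """Largest pairwise compatible subset of edges, by brute force."""
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            if all(compatible(e, f) for e, f in combinations(subset, 2)):
                return list(subset)
    return []

edges = [(1, 4), (2, 3), (3, 6), (5, 7)]
best = max_constrained_matching(edges)
```

In this sample, the edge (2, 3) is nested inside (1, 4) and shares a vertex with (3, 6), so the best solution uses the other three edges.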


A last direction of research is concerned with
the parameterized complexity of the
2-interval-pattern problem. For example, it is not known
whether the 2-interval-pattern problem for
models $\mathcal{R} = \{<, \sqsubset, \between\}$, $\mathcal{R} = \{\sqsubset, \between\}$ or $\mathcal{R} = \{<, \between\}$
is fixed-parameter-tractable when parameterized
by the size of the solution. Also, investigating the
parameterized complexity for parameters such
as the maximum number of pairwise crossing
intervals in the input set or the treewidth of
the corresponding intersection 2-interval graph,
which are expected to be relatively small for most
practical applications, is of particular interest.
Problem Definition


The UNDIRECTED FEEDBACK VERTEX SET 
(UFVS) problem is defined as follows: 


Input: An undirected graph $G = (V, E)$ and an
integer $k \ge 0$.

Task: Find a feedback vertex set $F \subseteq V$
with $|F| \le k$ such that each cycle in $G$
contains at least one vertex from $F$.
(The removal of all vertices in $F$ from $G$
results in a forest.)


© Springer Science+Business Media New York 2016
M.-Y. Kao (ed.), Encyclopedia of Algorithms,
DOI 10.1007/978-1-4939-2864-4


Karp [11] showed that UFVS is NP-complete.
Lund and Yannakakis [12] proved that there
exists some constant $\epsilon > 0$ such that it is NP-hard
to approximate the optimization version of UFVS
to within a factor of $1 + \epsilon$. The best-known
polynomial-time approximation algorithm for
UFVS has a factor of 2 [1, 4]. There is a simple
and elegant randomized algorithm due to Becker
et al. [3] which solves UFVS in $O(c \cdot 4^k \cdot kn)$ time
on an $n$-vertex and $m$-edge graph by finding
a feedback vertex set of size $k$ with probability at
least $1 - (1 - 4^{-k})^{c 4^k}$ for an arbitrary constant $c$.
An exact algorithm for UFVS with a running
time of $O(1.7548^n)$ was recently found by
Fomin et al. [9]. In the context of parameterized
complexity [8, 13], Bodlaender [5] and Downey
and Fellows [7] were the first to show that the
problem is fixed-parameter tractable, i.e., that
the combinatorial explosion when solving it can
be confined to the parameter $k$. The currently
best fixed-parameter algorithm for UFVS runs
in $O(c^k \cdot mn)$ time for a constant $c$ [6, 10] (see [6] for
the so far best running time analysis leading to
a constant $c = 10.567$). This algorithm is the
subject of this entry.


Key Results 


The $O(c^k \cdot mn)$-time algorithm for the
UNDIRECTED FEEDBACK VERTEX SET is based on
the so-called "iterative compression" technique,
which was introduced by Reed et al. [14]. The
central observation of this technique is quite
simple but fruitful: To derive a fixed-parameter
algorithm for a minimization problem, it suffices
to give a fixed-parameter "compression routine"
that, given a size-$(k+1)$ solution, either proves
that there is no size-$k$ solution or constructs one.
Starting with a trivial instance and iteratively
applying this compression routine a linear
number of rounds to larger instances, one obtains
a fixed-parameter algorithm for the problem.
The main challenge of applying this technique
to UFVS lies in showing that there is a
fixed-parameter compression routine.

The compression routine from [6, 10] works
as follows:


1. Consider all possible partitions $(X, Y)$ of
   the size-$(k+1)$ feedback vertex set $F$
   with $|X| \le k$ under the assumption that
   set $X$ is entirely contained in the new
   size-$k$ feedback vertex set $F'$ and $Y \cap F' = \emptyset$.

2. For each partition $(X, Y)$, if the vertices in $Y$
   induce cycles, then answer "no" for this
   partition; otherwise, remove the vertices in $X$.
   Moreover, apply the following data reduction
   rules to the remaining graph:

   • Remove degree-1 vertices.

   • If there is a degree-2 vertex $v$ with
     two neighbors $v_1$ and $v_2$, where $v_1 \notin Y$
     or $v_2 \notin Y$, then remove $v$ and connect $v_1$
     and $v_2$. If this creates two parallel edges
     between $v_1$ and $v_2$, then remove the vertex
     of $v_1$ and $v_2$ that is not in $Y$ and add it to
     any feedback vertex set for the reduced
     instance.

3. Finally, exhaustively examine every vertex
   set $S$ with size at most $k - |X|$ of the reduced
   graph as to whether $S$ can be added to $X$ to
   form a feedback vertex set of the input graph.
   If there is one such vertex set, then output it
   together with $X$ as the new size-$k$ feedback
   vertex set.


The correctness of the compression routine
follows from its brute-force nature and the easily
proved correctness of the two data reduction
rules. The more involved part is to show that
the compression routine runs in O(c^k · m) time:
There are 2^(k+1) partitions of F into the above
sets (X, Y), and one can show that, for each
partition, the reduced graph after performing
the data reduction rules has at most d · k vertices
for a constant d; otherwise, there is no size-k
feedback vertex set for this partition. This
then gives the O(c^k · m) running time. For more
details on the proof of the d · k size bound,
see [6, 10].

Given as input a graph G with vertex set
{v_1, ..., v_n}, the fixed-parameter algorithm
from [6, 10] solves UFVS by iteratively considering
the subgraphs G_i := G[{v_1, ..., v_i}].
For i = 1, the optimal feedback vertex set
is empty. For i > 1, assume that an optimal
feedback vertex set X_i for G_i is known.
Obviously, X_i ∪ {v_{i+1}} is a solution set
for G_{i+1}. Using the compression routine,
the algorithm can in O(c^k · m) time either
determine that X_i ∪ {v_{i+1}} is an optimal
feedback vertex set for G_{i+1}, or, if not,
compute an optimal feedback vertex set
for G_{i+1}. For i = n, we thus have computed
an optimal feedback vertex set for G in O(c^k · mn)
time.
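
The iteration described above can be sketched in Python as follows. This is only an illustrative sketch, not the algorithm of [6, 10]: the compression step enumerates the partitions (X, Y) and then searches for S by plain brute force, without the data reduction rules, so it does not achieve the stated running time. The function names (is_forest, compress, min_feedback_vertex_set) and the edge-list representation are assumptions made for this example.

```python
from itertools import combinations

def is_forest(vertices, edges):
    """True if the subgraph induced on `vertices` is acyclic (union-find check)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        if u in parent and w in parent:
            ru, rw = find(u), find(w)
            if ru == rw:                   # this edge closes a cycle
                return False
            parent[ru] = rw
    return True

def compress(vertices, edges, fvs, k):
    """Given a feedback vertex set `fvs`, return one of size <= k, or None."""
    fvs = list(fvs)
    rest = [v for v in vertices if v not in fvs]
    for size_x in range(min(k, len(fvs)) + 1):
        for x in combinations(fvs, size_x):
            y = set(fvs) - set(x)
            if not is_forest(y, edges):    # Y induces a cycle: "no" for this partition
                continue
            # Brute force over S (assumption: no data reduction is applied here).
            for size_s in range(k - size_x + 1):
                for s in combinations(rest, size_s):
                    removed = set(x) | set(s)
                    if is_forest(set(vertices) - removed, edges):
                        return removed
    return None

def min_feedback_vertex_set(vertices, edges):
    """Iterative compression over the induced subgraphs G_1, G_2, ..., G_n."""
    solution, seen = set(), []
    for v in vertices:
        seen.append(v)
        candidate = solution | {v}         # X_i plus the new vertex
        smaller = compress(seen, edges, candidate, len(candidate) - 1)
        solution = candidate if smaller is None else smaller
    return solution
```

For a triangle, min_feedback_vertex_set([1, 2, 3], [(1, 2), (2, 3), (1, 3)]) returns a set containing a single vertex.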


Theorem 1 UNDIRECTED FEEDBACK VERTEX
SET can be solved in O(c^k · mn) time for a constant c.


Applications 


The UNDIRECTED FEEDBACK VERTEX SET 
is of fundamental importance in combinatorial 
optimization. One typical application, for 
example, appears in the context of combinatorial 
circuit design [1]. For applications in the
areas of constraint satisfaction problems
and Bayesian inference, see Bar-Yehuda
et al. [2].


Open Problems 


It is open to explore the practical performance 
of the described algorithm. Another research 
direction is to improve the running time 
bound given in Theorem 1. Finally, it remains 
a long-standing open problem whether the 
FEEDBACK VERTEX SET on directed graphs 
is fixed-parameter tractable. The answer to
this question would represent a significant
breakthrough in the field. 
Problem Definition


The notion of searching a graph, in particular 
visiting each vertex in a systematic preordained 
fashion, is as old as graph theory itself. Indeed, 
Euler’s paper in 1736 [13] presented conditions 
on the vertex degrees of a graph that would 
certify the presence or absence of a path (or 
circuit) of edges visiting each edge exactly once. 
Later it was shown by Fleury that an easy al- 
gorithm to find such a path (or circuit) can be 
achieved using depth-first search (DFS) [14]. In 
the late nineteenth century, C. P. Trémaux [22] 
and G. Tarry [28] presented DFS-based algo- 
rithms for maze traversal; similarly, breadth-first 
search (BFS) algorithms were used to find the 
shortest possible successful maze traversals. 
In the 1960s and 1970s, these searches were
used in many of the early graph algorithms for 
problems such as distance and diameter determi- 
nation, network flows, planar graph recognition, 
and connected and 2-connected components; see 
[4,27]. A “generic” (GENS) search, as defined 
by Tarjan [27], is one in which an unvisited 
vertex adjacent to a visited vertex must be chosen 
before an arbitrary unvisited vertex. Note that this 
criterion includes the standard graph searches, 
but does not include some useful vertex orderings 
such as nonincreasing vertex degree. We caution 
the reader that by BFS we follow Golumbic [16] 
and refer to the distance layering where unvis- 
ited neighbors of the currently visited vertex are 
placed at the end of a queue data structure. Note 
that in [4] BFS is defined as distance layering, but 
their implementation of distance layering uses a 
queue. Throughout this note, we assume that our 
graphs are connected. 

In a seminal 1976 paper, Rose, Tarjan, and 
Lueker [24] introduced a variation of BFS, called 
lexicographic breadth-first search (LBFS), and 
showed that an arbitrary LBFS search could be 
used to achieve a linear time algorithm to recognize
chordal graphs (graphs with no induced cycle of
size strictly greater than 3). After the appearance
of this paper, there were a few new applications of
LBFS, mostly to graph families
related to chordal graphs. In the 1990s there was
a marked increase in the application of LBFS
in which a previous vertex ordering (usually a
previous LBFS ordering) was used to break ties
when there was more than one vertex eligible to
be visited next. Such tiebreaking is referred to as
a “+-sweep,” as defined below.


Definition 1 Given a search S and a vertex ordering
σ of graph G, a plus S sweep with
respect to σ (denoted S+(G, σ)) is the vertex
ordering where the next vertex to be visited is the
rightmost (as ordered by σ) vertex of T (where T
denotes the set of tied vertices).
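

As an illustration of Definition 1, here is a minimal Python sketch of LBFS+ (the plus sweep with S = LBFS). It is a naive quadratic-time version written for readability, not the linear-time partition-refinement implementation; the function name lbfs_plus and the adjacency-dictionary representation are assumptions of this example. Labels are lists of decreasing integers compared lexicographically, and ties among the vertices with the largest label are broken in favor of the rightmost vertex in σ.

```python
def lbfs_plus(adj, sigma):
    """Naive LBFS+ sweep; `adj` maps each vertex to the set of its neighbours."""
    pos = {v: i for i, v in enumerate(sigma)}
    n = len(sigma)
    label = {v: [] for v in sigma}
    unvisited = set(sigma)
    order = []
    for step in range(n):
        # T = unvisited vertices with the lexicographically largest label;
        # the + rule selects the vertex of T that is rightmost in sigma.
        v = max(unvisited, key=lambda u: (label[u], pos[u]))
        order.append(v)
        unvisited.remove(v)
        for w in adj[v]:
            if w in unvisited:
                label[w].append(n - step)  # earlier visits contribute larger numbers
    return order
```

On the path a-b-c-d with σ = (a, b, c, d), every label is initially empty, so the sweep starts at d, the rightmost vertex of σ, and then visits c, b, and a.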


Such “multi-sweep” LBFS algorithms were 
used for the recognition of interval graphs (for 
definitions of, and basic results on, various graph 
classes mentioned in this paper, see [3]), unit 
interval graphs, and cographs as well as finding 
dominating pairs (a pair of vertices (x, y) such
that every x-y path P dominates G in the sense
that every vertex of G is either on P or has a 
neighbor on P) for asteroidal triple-free graphs. 
See [5] for an overview of these applications 
of LBFS. More recently, applications of LBFS 
have been found for graphs in general for such 
problems as modular decomposition [29] and 
split decomposition [15]. 

The proofs of correctness of multi-sweep 
LBFS algorithms typically are based on the 
following “4-vertex ordering characterization 
of LBFS”: 


Theorem 1 ([2, 16]) A total ordering σ of the vertices
could be produced by an LBFS if and only if for
every triple of vertices {a, b, c} where a <_σ b <_σ c,
ac ∈ E, and ab ∉ E, there exists a vertex d such
that d <_σ a, db ∈ E, and dc ∉ E.


Having seen the importance of Theorem 1
to the development of multi-sweep LBFS algo- 
rithms, a natural question and the question that is 
the basis of the Graph Searching Paper [6] is: 


Do other standard graph searches have a similar “4- 
vertex ordering characterization”? 


Key Results 


The first reaction to the question posed above is to 
try to understand exactly what is the structure of 
the search imposed by the {a,b,c} vertices. The 
relevant question is: 


In the presence of the ac edge, how could b have 
been visited before c? 


Since we are dealing with “generic” searches,
some vertex d which is adjacent to b must have
been visited before b, since otherwise c would
have to be chosen before b. If the search we are
considering does not impose any further conditions
on which unvisited vertices are eligible to be
chosen, then the existence of d with db ∈ E and
d <_σ b is a “4-vertex ordering characterization
of GENS search.” The full statement of the main
theorem proved in [6] is:


Unified View of Graph Searching and LDFS-Based Certifying Algorithms 


Theorem 2 ([6]) For S a graph search in
{GENS, BFS, DFS, MNS, LBFS, LDFS}, a total
ordering σ of the set of vertices of the given
graph could have been produced by S if and
only if for every triple a <_σ b <_σ c in σ
where ac ∈ E, ab ∉ E, there exists a vertex d
satisfying the requirements stated in the following
table:


Requirements on d

Search S   Location     Adjacencies
GENS       d < b        db ∈ E
BFS        d < a        db ∈ E
DFS        a < d < b    db ∈ E
MNS        d < b        db ∈ E, dc ∉ E
LBFS       d < a        db ∈ E, dc ∉ E
LDFS       a < d < b    db ∈ E, dc ∉ E


Note that the hierarchy among these differ- 
ent searches (and layered search) is shown in 
Fig. 1. 
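
The characterization in Theorem 2 translates directly into a cubic-time test of whether a given ordering could have been produced by one of the six searches. The following Python sketch is only an illustration under stated assumptions: the graph is connected (as assumed throughout this entry) and is given as an adjacency dictionary; the names RULES and could_be_produced_by are hypothetical and introduced for this example.

```python
from itertools import combinations

# For each search: where d must lie, and whether dc must be a non-edge.
RULES = {
    "GENS": ("before_b", False),
    "BFS":  ("before_a", False),
    "DFS":  ("between",  False),
    "MNS":  ("before_b", True),
    "LBFS": ("before_a", True),
    "LDFS": ("between",  True),
}

def could_be_produced_by(search, order, adj):
    """Check the 4-vertex condition of Theorem 2 for every triple a < b < c."""
    location, private = RULES[search]
    pos = {v: i for i, v in enumerate(order)}
    for a, b, c in combinations(order, 3):    # triples in increasing position
        if c not in adj[a] or b in adj[a]:
            continue                           # only triples with ac in E, ab not in E

        def ok(d):
            if location == "before_a":
                in_place = pos[d] < pos[a]
            elif location == "between":
                in_place = pos[a] < pos[d] < pos[b]
            else:
                in_place = pos[d] < pos[b]
            return in_place and d in adj[b] and not (private and d in adj[c])

        if not any(ok(d) for d in order):
            return False
    return True
```

For the 4-cycle with edges 12, 23, 34, 41, the ordering 1, 2, 4, 3 passes the BFS and LBFS tests but fails the DFS test, while 1, 2, 3, 4 passes the DFS and LDFS tests but fails the BFS test.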

In the case of BFS, the characterization states 
that b must have a neighbor d where d <_σ a.
In effect, this location for d reflects the role 
played by the queue in the BFS algorithm. 
Similarly, for DFS the location of d is between 
a and b reflecting the role played by the stack 
in the DFS algorithm. Note that the locations 
of d imposed by BFS and DFS capture the 
full range allowed by GENS search thereby 
exhibiting a type of duality between BFS and 
DFS. 

Now look at the difference between the 
characterizations of both BFS and LBFS. Both 
have the same location requirement for vertex d; 
however, BFS requires d to be a neighbor of b, 
whereas LBFS strengthens this condition so that 
d has to be a private neighbor of b with respect
to c. This means that b’s neighborhood in the set 
of visited vertices is maximal with respect to 
set inclusion. This raises the question of what 
happens to both DFS and GENS search if we 
add the “lexicographic” property — i.e., as with 
LBFS, that d has to be a private neighbor of b 
with respect to c, and thus that b’s neighborhood 
in the set of visited vertices is maximal with 
respect to set inclusion. In the case of the 
“lexical” version of GENS search, this search 
was already known as the maximal neighbor 
search (MNS) [25], in particular, a search that 
chooses any vertex that has a maximal (by set 
inclusion) neighborhood in the set of visited 
vertices. Interestingly, this vertex ordering was 
presented in [24] where they showed that any 
search that obeys this property would produce a 
perfect elimination ordering (PEO) if the given 
graph is chordal. Thus, they concluded that both 
maximum cardinality search and LBFS suffice. 
Turning to LDFS, we see that d has the same 
location requirement as DFS, and adding the 
“lexical” property shows that LDFS is also 
a restricted version of MNS, and thus, it too 
is guaranteed to produce a PEO on chordal 
graphs. Note that in [6] all of these conditions 
are shown to be characterizations of the specific 
searches. 

To illustrate the differences and relationships 
among these various searches, consider the graph 
in Fig. 2: 

Unified View of Graph Searching and LDFS-Based Certifying Algorithms, Fig. 2
Sample graph and illustrative searches:
abdce ∈ BFS \ {LBFS, DFS}
abcde ∈ {LBFS, DFS} \ LDFS
abecd ∈ DFS \ {LDFS, BFS}

[Fig. 1: hierarchy among the searches (labels recovered from the figure: GENS, Layered Search, MNS, MCS, DFS, LDFS); an arrow from search S to search S′ indicates that S′ is a restriction of S.]

Regarding complexity issues, all searches
mentioned in Theorem 2, except LDFS, have
a linear time implementation (see [17] for
LBFS); the current best LDFS implementation
for arbitrary graphs uses van Emde Boas
trees [26] and runs in time O(max(n^2, n +
m log log n)). The key question arising from
[6] is:

Are there any applications of LDFS? 


Problem Definition (cont.) 


The first few attempts to find such an application 
quickly failed. The first was to see if LDFS 
could enjoy the same success as LBFS in build- 
ing recognition algorithms for various restricted 
families of graphs (apart from chordal graphs); in 
all cases easily found counterexamples thwarted 
the various attempts. The second approach was to
determine if LDFS+ could be helpful in finding
Hamilton paths (HP) or more generally minimum
path covers (MPC), where the goal is to find a
minimum cardinality set of subpaths of the given
graph G such that each vertex belongs to exactly
one such path. Unfortunately, LDFS+ fails when
applied to an interval ordering (G is an interval
graph if and only if there is an interval ordering,
σ, of the vertices such that for all triples a <_σ
b <_σ c where ac ∈ E, ab must also belong
to E). To see this, consider a vertex universal
to two disjoint paths on three vertices. From
examples, it seems, however, that DFS+ will find
an HP, if one exists.

[Fig. 2, continued: abced ∈ LDFS \ BFS; cabed ∈ {MNS ∩ BFS} \ LBFS; baced ∈ {MNS ∩ DFS} \ LDFS]

In fact, [1] and [11] independently showed
that using the rightmost neighbor (RMN) sweep
on an interval ordering yields an MPC of the
given interval graph. Note that RMN, when
presented with an ordering σ, greedily builds
paths by starting at the rightmost unvisited vertex
of σ and proceeding to its rightmost unvisited
neighbor if such a vertex exists; if not, a new
path is started at the rightmost unvisited vertex.
(Note that this backtracking is different than the
DFS+ restarting.) Building off this algorithm,
Dalton [10] presented a simple algorithm that
certifies the correctness of the computed set
of paths by either finding a set of vertices S
(called a “scattering set”) where the number
of connected components of G \ S equals |S|
plus the number of paths in the path cover or
concluding that the given vertex ordering is not an
interval ordering.
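
The RMN sweep described above is short enough to state as code. The Python sketch below follows the prose description only; the function name rmn and the adjacency-dictionary input are assumptions of this example, and Dalton's certification step is not included.

```python
def rmn(adj, sigma):
    """Rightmost-neighbour sweep: returns the list of paths built greedily on sigma."""
    pos = {v: i for i, v in enumerate(sigma)}
    unvisited = set(sigma)
    paths = []
    while unvisited:
        v = max(unvisited, key=pos.get)       # start at the rightmost unvisited vertex
        path = [v]
        unvisited.remove(v)
        while True:
            candidates = [w for w in adj[v] if w in unvisited]
            if not candidates:
                break                         # dead end: a new path will be started
            v = max(candidates, key=pos.get)  # rightmost unvisited neighbour
            path.append(v)
            unvisited.remove(v)
        paths.append(path)
    return paths
```

For the claw with center u and leaves x, y, z, the ordering (x, u, y, z) is an interval ordering, and the sweep returns the two paths (z, u, y) and (x), which is a minimum path cover of that graph.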

The next step was to try to lift this simple MPC 
algorithm to the superclass of cocomparability 
graphs. Note that a graph G is a cocomparability
graph if and only if its complement Ḡ has a
transitive orientation of its edges. This orientation
condition in Ḡ immediately translates into a vertex
ordering characterization of cocomparability
graphs. In particular, G is a cocomparability
graph if and only if there is a cocomp ordering,
σ, of the vertices such that for all triples a <_σ
b <_σ c where ac ∈ E, at least one of ab and bc
must also belong to E.
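
Both vertex ordering characterizations used in this entry (interval orderings and cocomp orderings) can be checked by examining every ordered triple. The following Python sketch does exactly that in cubic time; the function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
from itertools import combinations

def is_interval_ordering(adj, sigma):
    """For all a < b < c in sigma: ac in E implies ab in E."""
    return all(b in adj[a]
               for a, b, c in combinations(sigma, 3) if c in adj[a])

def is_cocomp_ordering(adj, sigma):
    """For all a < b < c in sigma: ac in E implies ab in E or bc in E."""
    return all(b in adj[a] or c in adj[b]
               for a, b, c in combinations(sigma, 3) if c in adj[a])
```

For the 4-cycle with edges 12, 23, 34, 41, the ordering 1, 2, 3, 4 is a cocomp ordering but not an interval ordering, since 14 is an edge while 13 is not.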

Although there were polynomial time algo- 
rithms that solved the MPC problem on cocom- 
parability graphs, all of these algorithms solved 
the “bump number” problem on the poset asso- 
ciated with the given graph and used the fact 
that any linear extension that minimizes the bump 
number contains the set of paths in a minimum 
path cover. The goal of this research was to 
find an MPC cocomparability graph algorithm 
that is directly graph theoretical and hopefully 
extends the interval graph MPC algorithm mentioned
above. Examples immediately showed that
applying RMN to an arbitrary cocomp ordering
does not work, so many attempts were made to
use multi-sweep LBFS to yield such a cocomp
ordering. (Note that if σ is a cocomp ordering
and τ = LBFS+(σ), then τ is also a cocomp ordering.)
This approach continued until the graph
shown in Fig. 3 was discovered. On this graph
every LBFS cocomp ordering σ fails, in the
sense that RMN applied to σ does not produce
a Hamiltonian path! A sample LBFS and the
resulting RMN (which consists of two paths) are
included in Fig. 3.


Having seen the failure of LBFS, is there any 
chance that LDFS could work? 


Key Results (cont.) 


If there is such a role for LDFS, the counterex- 
ample for interval graphs mentioned previously 
shows that LDFS could not be expected to pro- 
duce a minimum path cover itself; possibly LDFS 
could be used as a preprocessing step. If so, the 
simplest possible algorithm would be: 


1. Let π be an arbitrary cocomp ordering.

2. Let σ be LDFS+(π).

3. Let τ be RMN(σ).

4. If τ is not a Hamiltonian path, then from τ, use
Dalton’s algorithm to construct a separator S
that certifies τ; otherwise, conclude π is not a
cocomp ordering.


First of all, as with LBFS, LDFS when applied 
as a +-sweep on a given cocomp ordering returns 
a cocomp ordering. In this algorithm the hope is 
that an LDFS cocomp ordering would capture the 
“interval structure of cocomparability graphs,” at
least from the perspective of the MPC problem. 

Somewhat surprisingly, this algorithm worked
on all attempted examples. In an attempt to understand
the structure exposed by an LDFS cocomp
ordering, there are two points. First of all, why
do we not see the LDFS structure in interval
graphs? From the vertex ordering characterization
of interval graphs, we see that there can never
be an ordered triple of vertices a < b < c
with edge ac and nonedge ab, and thus, every
interval ordering is simultaneously an example of
every search mentioned in Theorem 2. Secondly,
since an interval graph is chordal, every LBFS
and LDFS must be a perfect elimination ordering,
implying that every vertex is simplicial in the
graph formed on it and all vertices before it in
the ordering. As a C4 shows, this property
need not hold for LDFS cocomp orderings. There
is, however, a crucial observation about the structure
guaranteed by a nonsimplicial vertex in an LDFS
cocomp ordering.


Lemma 1 ([7]) Let σ be an LDFS cocomp ordering
of cocomparability graph G. If z is a
nonsimplicial vertex in σ as witnessed by x <_σ
y <_σ z where xz, yz ∈ E, xy ∉ E, then there
exists a vertex w, x <_σ w <_σ y, where xw, wy ∈
E, wz ∉ E.


Proof By the LDFS vertex ordering characterization
applied to the triple {x, y, z}, vertex w exists
and satisfies all conditions of the lemma, except
possibly xw ∈ E; if this is not the case, then
the triple {x, w, z} violates σ being a cocomp
ordering.


This lemma plays a critical role in the proof of 
correctness of the MPC algorithm stated above. 
Open Problems


We first list a number of recent results that have 
grown out of the work presented in [6, 7]: 


• Köhler and Mouatadid [20] have recently
shown that LDFS on a cocomparability graph
can be done in linear time, thereby avoiding
the log log n factor above linear time in the MPC paper
[7].

• Mertzios and Corneil [23] have “lifted” the
O(n^4) longest path algorithm on interval
graphs [18] to achieve the same result and
time bound for cocomparability graphs. As
with the MPC algorithm, an LDFS cocomp
ordering was required.

• A similar technique of using an LDFS cocomparability
ordering as a preprocessing step
for a simple linear time interval graph algorithm
has resulted in a linear time algorithm
for the maximum independent set (and minimum
vertex cover) problems on cocomparability
graphs [8]. Note that the algorithm
also produces a minimum cardinality clique
cover in order to certify the maximum independent
set produced by the algorithm. This
algorithm also uses the linear time LDFS
cocomp ordering algorithm presented in [20].
Very recently Köhler and Mouatadid [19] have
presented a linear time algorithm that computes
a maximum weighted independent set
of a cocomparability graph; this algorithm
works on any cocomp ordering and, in particular,
does not require an LDFS cocomp
ordering.

• In [8] the authors also characterized the
search orderings that are “cocomp ordering
preserving” in the sense that when used as
a +-sweep, the output is a cocomp ordering
when the input is a cocomp ordering. They
showed that dfgreedy is such a preserving
search and can be used to simplify the current
best recognition algorithm for permutation
graphs.

• In his PhD thesis, Dusart [12] studied the maximal
clique lattice of a cocomparability graph
and showed that a graph G is a cocomparability
graph if and only if the set of maximal
cliques of G satisfies specific lattice properties.
Furthermore, he defined a new cocomp
ordering preserving search called local MNS
to compute a maximal interval subgraph of G.
The new characterization together with MNS
yields linear time algorithms to compute the
simplicial vertices, the clique separators, and
associated components of a cocomparability
graph.

• Recently a new model of graph searching
called “tiebreaking label search” (TBLS) [9]
has been announced. This model builds off
the vertex ordering characterization model appearing
in [6] as well as the General Label
Search formalism of Krueger, Simonet, and
Berry [21]. The TBLS model incorporates
the +-sweep use of graph searches, restricts
labels to be sets of integers, and presents some
new vertex ordering characterizations.


We now turn to some new directions for fur- 
ther research. From a graph algorithm perspec- 
tive, the most interesting question is whether the 
results on cocomparability graphs can be easily 
extended to asteroidal triple-free graphs, an inclusive
family that has received considerable attention.
Further results, both structural and algorithmic,
are expected for cocomparability graphs
and their associated posets. We expect that graph
searching will continue to play a major role in 
these developments. 
Problem Definition
The Model 


A mobile robotic sensor (or simply sensor) is 
modeled as a computational unit with sensorial 
capabilities: it can perceive the spatial environ- 
ment within a fixed distance V > 0, called 
visibility range, it has its own local working 
memory, and it is capable of performing local 
computations [6, 7]. 
Each sensor is a point with its own local
coordinate system, which might not be consistent 
with the ones of the other sensors. The sensor 
can move in any direction, but it may be stopped 
before reaching its destination, e.g., because of 
limits to its motion energy; however, it is assumed 
that the distance traveled in a move by a sensor 
is not infinitesimally small (unless it brings the 
sensor to its destination). 

The sensors have no means of direct com- 
munication to other sensors. Thus, any commu- 
nication occurs in a totally implicit manner, by 
observing the other sensors’ positions. Moreover, 
they are autonomous (i.e., without a central
control), identical (i.e., they execute the same protocol),
and anonymous (i.e., without identifiers
that can be used during the computation). 

The sensors can be active or inactive. When 
active, a sensor performs a Look-Compute-Move 
cycle of operations: it first observes the por- 
tion of the space within its visibility range ob- 
taining a snapshot of the positions of the sen- 
sors in its range at that time (Look); using the 
snapshot as an input, the sensor then executes 
the algorithm to determine a destination point 
(Compute); finally, it moves toward the computed 
destination, if different from the current loca- 
tion (Move). After that, it becomes inactive and 
stays idle until the next activation. Sensors are 
oblivious: when a sensor becomes active, it does 
not remember any information from previous 
cycles. 

Depending on the degree of synchronization 
among the cycles of different sensors, three sub- 
models are traditionally identified: synchronous, 
semi-synchronous, and asynchronous. In the syn- 
chronous (FSYNC) and in the semi-synchronous 
(SSYNC) models, there is a global clock tick 
reaching all sensors simultaneously, and a sen- 
sor’s cycle is an instantaneous event that starts at 
a clock tick and ends by the next. In FSYNC, at 
each clock tick all sensors become active, while 
in SSYNC some sensors might not be active in 
each cycle. In the asynchronous model (ASYNC), 
there is no global clock and the sensors do not 
have a common notion of time. Furthermore, the 
duration of each activity (or inactivity) is finite 
but unpredictable. As a result, sensors can be seen 
while moving, and computations can be made
based on obsolete observations. 


The Problem 

The (distributed) uniform covering problem
refers to sensors, randomly dispersed in a
bounded region of space, that must scatter
themselves throughout the region so as to “cover”
it, satisfying some optimization criteria. Consider
the case of a circular rim R (i.e., a ring), and
let S = {s_0, ..., s_{n−1}} be the sensors initially
arbitrarily placed in different points on R, with
s_i preceding s_{i+1} clockwise (the index operations
are modulo n). We emphasize that these names
are used for presentation purposes only, and are
not known to the sensors. If the sensors agree on
the notion of clockwise, we say that they have
a common orientation. Let d = L_R/n, where
L_R is the length of the ring. In the following,
unless otherwise stated, the sensors are assumed
to have visibility range V > 2d. Let d_i(t)
be the distance between sensors s_i and s_{i+1}
at time t; when no ambiguity arises, we shall
omit the time and simply indicate the distance
as d_i. The sensors are said to have reached
an exact uniform covering (exact covering for
simplicity) at time t if d_i(t) = d for all
0 ≤ i ≤ n − 1. Given ε > 0, the sensors are
said to have reached an ε-approximate covering
at time t if d − ε ≤ d_i(t) ≤ d + ε for all
0 ≤ i ≤ n − 1.


Key Results 
The Ring 


Exact Uniform Covering 

There is a strong impossibility result that stresses 
the importance of having common orientation. If 
the sensors have only a local notion of left and 
right, but do not share a common orientation of 
the ring, the exact covering problem is unsolvable.
This result holds even if the sensors have
unbounded memory and visibility, and under an
SSYNC scheduler.


Theorem 1 ([5]) Let the sensors be on a ring R.
In absence of common orientation, there is no
deterministic exact covering algorithm even if the
sensors have unbounded persistent memory, their
visibility range is unlimited, and the scheduling is
SSYNC.


To see why this is the case, consider the
following setting. Let n be even; partition the
sensors in two sets, S_1 = {s_1, ..., s_{n/2}} and
S_2 = S \ S_1, and place the sensors of S_1 and
S_2 on the vertices of two regular (n/2)-gons
on R, rotated by an angle α < 360°/n. Furthermore,
all sensors have their local coordinate
axes rotated so that they all have the same view
of the world. In other words, the sensors in S_1
share the same orientation, while those in S_2
share the opposite orientation of R. If activating
only the sensors in S_1, an exact covering (resp.
no exact covering) on R is reached at time step
t_{i+1}, then the same is true also activating only
the ones in S_2. Clearly, in such a case, activating
both sets no exact covering would be reached
at time step t_{i+1}, and the system would be in an
analogous configuration as the one of time step
t_i, with different angles. Using this property, it
is easy to design an adversary that will force
any algorithm to never succeed in solving the
problem; its behavior would be as follows: (i) If
activating only the sensors in S_1 (resp. S_2) no
exact covering on R is reached, then activate all
sensors in S_1 (resp. S_2), while all sensors in S_2
(resp. S_1) are inactive; (ii) otherwise, activate all
sensors. Go to (i).

On the other hand, assuming common orien- 
tation and knowledge of the final inter-distance 
d among sensors, a simple algorithm that solves 
the exact covering in ASYNC is for each sensor 
to move toward the point at distance d from its 
clockwise successor (if visible). Recall that
V > 2d.


Protocol RINGCOVERINGEXACT (for sensor s_i)
Assumptions: Orientation, knowledge of d.


1. If s_{i+1} is not visible, move distance d
clockwise.

2. Else, if d_i > d, move toward point x at
distance d from s_{i+1}.


Theorem 2 ([5]) The exact covering of the ring
problem is solvable in ASYNC, with common
orientation and knowledge of the final inter-distance.


Approximate Covering 

Assuming common orientation but no knowl- 
edge of the final inter-distance among sensors, 
an ε-approximate covering is still possible for
any ε > 0, but no exact covering algorithm
is known. This algorithm is also very simple:
the sensors asynchronously and independently
Look in both directions, then they position themselves
in the middle between the closest observed
sensors (if any). Correctness is shown by proving
that the minimum distance between any two
neighboring sensors eventually grows, while the
maximum distance eventually shrinks in such a
way that there is a time when all sensors are
within d ± ε distance.


Theorem 3 ([5]) The approximate covering of 
the ring problem is solvable in ASYNC with 
common orientation. 


Algorithm RINGCOVERINGAPPROX (for sensor s_i)
Assumptions: Orientation


• If no sensor is visible clockwise (resp.
counterclockwise), let d_i = V (resp.
d_{i−1} = V).

• If d_i ≤ d_{i−1}, do not move.

• If d_i > d_{i−1}, move distance (d_i − d_{i−1})/2
clockwise.
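

The behavior of the midpoint rule can be explored with a small simulation. The Python sketch below is an illustration under simplifying assumptions that are not part of the model: rounds are fully synchronous, every sensor reaches its destination, and positions are measured clockwise along the ring in a global coordinate that the sensors themselves do not know. The function names gaps_of and ring_round are hypothetical.

```python
def gaps_of(positions, ring_length):
    """Clockwise distances d_i from s_i to s_{i+1} (indices modulo n)."""
    n = len(positions)
    return [(positions[(i + 1) % n] - positions[i]) % ring_length for i in range(n)]

def ring_round(positions, ring_length, visibility):
    """One synchronous round in which every sensor applies the midpoint rule."""
    gaps = gaps_of(positions, ring_length)
    new_positions = []
    for i, p in enumerate(positions):
        d_next = min(gaps[i], visibility)      # d_i, or V if the successor is out of range
        d_prev = min(gaps[i - 1], visibility)  # d_{i-1}, or V likewise
        # Move clockwise to the midpoint only when the clockwise gap is larger.
        move = (d_next - d_prev) / 2 if d_next > d_prev else 0.0
        new_positions.append((p + move) % ring_length)
    return new_positions
```

With full visibility, each new gap in this synchronous setting is either an old gap or the average of two old gaps, so the largest gap never grows and the smallest never shrinks. Starting from positions 0, 1, 1.5, 7 on a ring of length 10, the gaps 1, 0.5, 5.5, 3 become 1, 3, 3, 3 after the first round.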


Note that the covering problem has been also 
studied in discrete rings [4]. 


The Line 

The case of a line segment is quite different from 
the one of the ring, and perhaps surprisingly, it is 
not easier. Let S = {s_0, ..., s_{n−1}} be the sensors
initially arbitrarily placed in different points on
a line ℒ, with s_0 and s_{n−1} being two special
immobile sensors delimiting the segment to be
covered and with s_i preceding s_{i+1} (0 ≤ i ≤
n − 2). Let d = L_ℒ/(n − 1), where L_ℒ denotes
the length of the segment. Exact covering and
ε-approximate covering are defined analogously to
the case of the ring.


Exact Uniform Covering 

With common orientation and known final inter- 
distance, an algorithm has been recently shown 
for oriented sensors in ASYNC [3]. The algorithm 
works even if the visibility range is just enough to 
sense the final inter-distance (V = d). Let δ < d
be a fixed positive (arbitrarily small) constant the 
sensors agree upon. 


Protocol CORRIDORCOVERINGEXACT (for
sensor s_i)

Assumptions: Orientation, knowledge of d,
V = d


• If s_{i−1} is not visible, move distance d/2 to
the left.
• Else, let a := d − d_{i−1}.
If d_i > d and a > 0, move distance
min(d/2 − δ, a) to the right.


Theorem 4 ([3]) The exact covering of the line
problem is solvable in ASYNC with common 
orientation and knowledge of the final inter- 
distance. 


With fixed visibility, a distributed algorithm 
has been proposed for FSYNC in a discrete set- 
ting, to solve the slightly different problem of 
barrier coverage [2]. 


Approximate Covering 

Approximate covering has been studied in a
slightly different visibility model where each
sensor is able to perceive up to the next
sensor on the line [1]. In other words, in each
direction, a sensor sees the closest sensor (if
it exists), regardless of its distance, but its
visibility is blocked by it (neighbor visibility).
For presentation purposes, a global linear
coordinate system (not known to the sensors)
is used here with s_0(t) = 0 and s_{n−1}(t) = 1.
For the sensors to be spread uniformly, sensor
s_i should then occupy position i/(n − 1). The
following is a simple approximate covering
algorithm.


Protocol CORRIDORSPREAD (for sensor s_i)
Assumptions: SSYNC, neighbor visibility


• If no sensor is visible in either direction, do
nothing.

• Otherwise, move toward point x =
½(s_{i+1} + s_{i−1}).


The idea of the convergence proof in FSYNC
is sketched below. Let $\mu_i[t]$ be the shift of
$s_i$'s location at time t from its final position.
According to the protocol, the position of
sensor $s_i$ changes from $s_i(t)$ to
$s_i(t+1) = \frac{1}{2}(s_{i-1}(t) + s_{i+1}(t))$ for $1 \le i \le n-2$, while
sensors $s_0$ and $s_{n-1}$ never move. Therefore,
the shifts change with time as
$\mu_i[t+1] = \frac{1}{2}(\mu_{i+1}[t] + \mu_{i-1}[t])$. Considering the progress
measure $\mu[t] = \sum_{i=1}^{n-2} \mu_i^2[t]$, it can be shown
that $\mu[t]$ is a decreasing function of t unless
the sensors are already equally spread; more
precisely, it is shown that every $O(n^2)$ cycles,
$\mu[t]$ is at least halved, thus reaching approximate
covering. More complex but analogous reasoning
is followed for SSYNC.
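
The FSYNC dynamics sketched above can be simulated directly. The following is a minimal sketch, assuming that in every round each interior sensor actually reaches the midpoint of its two neighbors and that the end sensors sit at 0 and 1. The names `fsync_round` and `potential` and the starting positions are illustrative and are not taken from [1].

```python
def fsync_round(pos):
    """One FSYNC round: every interior sensor jumps to the midpoint of its neighbors."""
    new = pos[:]
    for i in range(1, len(pos) - 1):
        new[i] = 0.5 * (pos[i - 1] + pos[i + 1])
    return new


def potential(pos):
    """Progress measure: sum of squared shifts from the uniform targets i / (n - 1)."""
    n = len(pos)
    return sum((p - i / (n - 1)) ** 2 for i, p in enumerate(pos))


# Arbitrary (assumed) start; the two end sensors stay at 0 and 1.
positions = [0.0, 0.05, 0.1, 0.9, 1.0]
history = [potential(positions)]
for _ in range(200):
    positions = fsync_round(positions)
    history.append(potential(positions))
```

In this run the recorded potential never increases and the sensors approach the uniform positions 0, 1/4, 1/2, 3/4, 1, which is the behavior the proof sketch predicts for FSYNC.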


Theorem 5 ([1]) The approximate covering of
the line problem is solvable in SSYNC with neigh- 
bor visibility. 


With a simple modification of the algorithm, 
the result above can be extended to any fixed 
visibility V > d, provided that d is known, as 
described below [3]. 


Protocol CORRIDORSPREAD2 (for sensor $s_i$)
Assumptions: SSYNC, d known, V > d


• If only one sensor $s_j \in \{s_{i+1}, s_{i-1}\}$ is
  visible to $s_i$ and $d' = \mathrm{dist}(s_i, s_j) < d$:
  move distance ara + ree away from $s_j$.

• If both $s_{i+1}, s_{i-1}$ are visible and
  $d_l = \mathrm{dist}(s_{i-1}, s_i) < d_r = \mathrm{dist}(s_{i+1}, s_i)$
  (resp. $d_r = \mathrm{dist}(s_{i+1}, s_i) < d_l = \mathrm{dist}(s_{i-1}, s_i)$):
  move bra toward $s_{i+1}$ (resp. toward $s_{i-1}$).


Unique k-SAT and General k-SAT 


Applications 


Uniform covering problems are important in
many applications; covering of a circular rim
occurs, for example, when the sensors have
to surround a dangerous area and can only
move along its outer perimeter. On the other
hand, coverings of the line (often called barrier
coverings) guarantee that any intruder attempting
to cross the perimeter of a protected region (e.g.,
crossing an international border) is detected by
one or more of the sensors. These problems
are studied under a variety of assumptions; the
majority of the studies use sensors provided
with memory, explicit communication devices,
global localization capabilities (e.g., GPS),
and centralized approaches. The advantages
of memoryless sensors are self-stabilization
and tolerance to loss of sensors; the use of
local coordinate systems has clear advantages
over the full strength of a GPS; finally,
decentralized solutions offer better fault
tolerance.


Open Problems 


It is known that the exact covering of the ring is
impossible without orientation in SSYNC, but the
impossibility does not extend to FSYNC where,
however, no algorithm is known. Moreover, the
only existing exact covering algorithm in ASYNC
assumes orientation, which is needed, and knowledge
of the inter-distance d, which is possibly
not needed, so a tighter result might be possible.
Finally, approximate covering is achieved in the
ring in SSYNC assuming orientation, which has not
been shown to be necessary; furthermore, no solution
exists for ASYNC.

In the case of the line, the only impossibility 
result for exact covering [3] holds for fully disori- 
ented sensors (not even able to locally distinguish 
between their two directions) and with small visi- 
bility range V = d. As for approximate covering, 
the only known result in this model is for SSYNC, 
and it is not known whether an algorithm exists 
for the ASYNC model.


Problem Definition


A Boolean formula F is said to be in conjunctive
normal form (CNF) if it is a conjunction of
disjunctions of literals. If furthermore every
disjunction (called a clause) is over at most k literals,
F is said to be in k-CNF. k-SAT, the decision
problem of whether a k-CNF formula admits a
satisfying assignment, is one of the most prominent
NP-complete problems. A special case of k-SAT
is (promise) unique k-SAT, where the k-CNF formula is
additionally promised to have either a unique or
no satisfying assignment.

Suppose F has n variables. The trivial algorithm
tries all $2^n$ assignments. For k-SAT
and especially 3-SAT, there have been many
successive improvements [3-9, 11]. The best of
them are randomized in the sense that they always
correctly report unsatisfiability but might fail to
report satisfiability with probability 1/3, say.


Problem 1 (k-SAT) 


INPUT: A k-CNF formula F.
OUTPUT: “No” if F is not satisfiable. “Yes” with
probability at least 2/3 if F is satisfiable.


Problem 2 (Unique k-SAT)


INPUT: A k-CNF formula F with at most one
satisfying assignment.

OUTPUT: “No” if F is not satisfiable. “Yes”
with probability at least 2/3 if F is satisfiable.


It is conjectured that unique k-SAT and k-SAT
have the same exponential complexity; however,
this could only be shown for $k \to \infty$ [1].
Especially for PPSZ [9], the fastest known
(randomized) algorithm for unique k-SAT, the
analysis results in a gap between k-SAT and unique
k-SAT for k = 3, 4. Furthermore, the PPSZ
algorithm has been derandomized for unique
k-SAT [10] but not for general k-SAT.


Notation For a CNF formula F over a variable
set V, denote by sat(F) the set of satisfying
assignments of F on V. For x a variable and b a
Boolean value, define $F^{[x \mapsto b]}$ as the restriction of F
by $x \mapsto b$, i.e., the formula obtained by replacing
x by b in F.


Key Results


The bounds of the PPSZ algorithm for unique k- 
SAT hold for general k-SAT also if k = 3, 4 [2]. 
This makes PPSZ the fastest known k-SAT algo- 
rithm for all k. In the analysis of [2], the PPSZ 
algorithm is slightly modified. 


Theorem 1 There is a randomized algorithm for
3-SAT running in time $O(2^{0.387n})$.


Theorem 2 There is a randomized algorithm for
4-SAT running in time $O(2^{0.555n})$.


Algorithm 1 PPSZ(k-CNF formula F)


V $\leftarrow$ variables of F
Choose $\beta$ uniformly at random from all assignments on V
Choose $\pi$ uniformly at random from all permutations of V
Let $\alpha$ be a partial assignment over V, initially the empty assignment
for all $x \in V$ in the order prescribed by $\pi$ do
    while there is a log n-implied assignment $y \mapsto a$ of F do
        $F \leftarrow F^{[y \mapsto a]}$
        $\alpha(y) \leftarrow a$
    end while
    if $\alpha(x)$ not fixed yet then
        $F \leftarrow F^{[x \mapsto \beta(x)]}$
        $\alpha(x) \leftarrow \beta(x)$
    end if
end for
If $\alpha$ satisfies F, return 'satisfiable'; otherwise return 'failure'.


If F is not satisfiable, then PPSZ will never
find a satisfying assignment and thus is always
correct. Hence, let F be a satisfiable k-CNF
formula over n variables V. The PPSZ algorithm
tries to find a satisfying assignment of F
by iteratively setting variables as follows: Go
through the variables one by one, in random
order. If a variable x is not set at its step, then
its value will be guessed uniformly at random.
Between steps, we might infer the value of some
variables in subexponential time: Setting x to a
is called log n-implied (by F) if there is a set
of log n clauses G in F such that all satisfying
assignments of G set x to a. log n-implication
can be checked in subexponential time by brute
force; the algorithm fixes all log n-implications
accordingly between steps.
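
The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation analyzed in [2] or [9]: clauses are lists of nonzero integers (a negative integer is a negated variable), the implication bound is a small constant `s` standing in for log n, and the helper names `restrict`, `satisfies`, `implied_assignment`, `ppsz`, and `solve` are assumptions made for this example.

```python
import random
from itertools import combinations, product


def restrict(clauses, var, value):
    """Restriction of F by var -> value: drop satisfied clauses, delete falsified literals."""
    true_lit = var if value else -var
    return [c - {-true_lit} for c in clauses if true_lit not in c]


def satisfies(assignment, clauses):
    """True if every clause contains a literal made true by the assignment."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def implied_assignment(clauses, s):
    """Return (y, a) if some set of at most s clauses forces y = a, else None (brute force)."""
    for size in range(1, s + 1):
        for group in combinations(clauses, size):
            variables = sorted({abs(l) for c in group for l in c})
            models = [m for bits in product([False, True], repeat=len(variables))
                      for m in [dict(zip(variables, bits))] if satisfies(m, group)]
            for y in variables:
                values = {m[y] for m in models}
                if len(values) == 1:          # all models of the group agree on y
                    return y, values.pop()
    return None


def ppsz(clauses, variables, s=2, rng=random):
    """One PPSZ run with s-implication; returns a satisfying assignment or None."""
    original = [frozenset(c) for c in clauses]
    F = list(original)
    beta = {x: rng.random() < 0.5 for x in variables}   # random guesses
    order = list(variables)
    rng.shuffle(order)                                   # random permutation
    alpha = {}
    for x in order:
        while (forced := implied_assignment(F, s)) is not None:
            y, a = forced
            F = restrict(F, y, a)
            alpha[y] = a
        if x not in alpha:                               # x is guessed
            F = restrict(F, x, beta[x])
            alpha[x] = beta[x]
    return alpha if satisfies(alpha, original) else None


def solve(clauses, variables, tries=200, seed=0):
    """Repeat PPSZ; never returns an assignment for an unsatisfiable formula."""
    rng = random.Random(seed)
    for _ in range(tries):
        result = ppsz(clauses, variables, rng=rng)
        if result is not None:
            return result
    return None
```

For instance, `solve([[1, 2, 3], [-1, 2], [-2, 3], [-3, -1]], [1, 2, 3])` searches for one of the two satisfying assignments of that formula, while `solve([[1], [-1]], [1])` returns `None`, matching the one-sided error of the algorithm.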

If a variable is determined by log n-implication,
it is called forced; otherwise, it
is called guessed. The key result of [9] for
unique k-SAT is the following: If F is in k-CNF
and has a unique satisfying assignment
$\alpha$, then given that $\beta = \alpha$ (i.e., all guesses
are according to $\alpha$), a variable is forced with
a certain probability $R_k$. By Jensen's inequality
one can then show that $\alpha$ is found with probability
$2^{-(1-R_k)n}$. We have $R_3 = 2 - 2\ln 2 \approx 0.613$
and $R_4 \approx 0.445$. Repeating PPSZ a number of times
inversely proportional to its success probability will match
the above theorems for unique 3-SAT and unique
4-SAT.
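
The arithmetic behind the exponents can be checked quickly. The snippet below is a small sketch that uses only the values quoted in the text; the variable names and the helper `expected_repetitions` are illustrative.

```python
from math import log

R3 = 2 - 2 * log(2)      # forcing probability for k = 3 (natural logarithm)
R4 = 0.445               # value quoted in the text for k = 4

exponent_3 = 1 - R3      # success probability of one run is 2^(-exponent_3 * n)
exponent_4 = 1 - R4


def expected_repetitions(exponent, n):
    """Number of independent PPSZ runs of the order needed for n variables."""
    return 2 ** (exponent * n)
```

Here `exponent_3` is about 0.3863, just below the 0.387 of Theorem 1, and `exponent_4` is 0.555, the exponent of Theorem 2.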

If there are multiple satisfying assignments, 
there is no bound on the probability that a variable 
is forced. For example, the empty CNF formula 
that always evaluates to true will never have 
a forced variable, as being forced depends on 
certain assignments not being satisfying. However,
the following can be done: Given a satisfiable
CNF formula F, call a variable x frozen
if it has the same value in all satisfying
assignments of F, and call x non-frozen otherwise.
If x is frozen, the same bound on the proba- 
bility that it is forced holds by the arguments 
of [9]. If x is non-frozen, then it can be set 
both ways and the resulting formula remains 
satisfiable. The remaining problem is that the 
probability for frozen variables depends on a 
fixed satisfying assignment and a uniform per- 
mutation; however, depending on the permuta- 
tion, certain assignments will be more or less 
likely. This leads to a correlation issue that has 
to be solved by balancing the correlation and 
the benefit of non-frozen variables by careful 
bookkeeping. 

Let Vy be the frozen variables of F and 
V, be the non-frozen variables of F. The 
likelihood of an assignment $\alpha$ in F, $\mathrm{lkhd}(F, \alpha)$,
is recursively defined as follows: If $\alpha$ does
not satisfy F, then $\mathrm{lkhd}(F, \alpha) = 0$. If $\alpha$ is
the unique satisfying assignment of F, then
$\mathrm{lkhd}(F, \alpha) = 1$. Otherwise, let $\mathrm{lkhd}(F, \alpha)$ =
Waar (Lxev Ikhd(@, Fe). The
likelihood simulates how likely an assignment
would be returned by PPSZ in an ideal setting.
With this, the cost of F is defined as follows:
For a non-frozen variable x, $\mathrm{cost}(F, x) = 1 - R_k$.
For a frozen variable x, first define, for
a satisfying assignment $\alpha$, $\mathrm{cost}(F, x, \alpha)$ as the
probability that x is guessed if executing PPSZ
conditioned on $\beta = \alpha$. Then
$\mathrm{cost}(F, x) = \sum_{\alpha \in \mathrm{sat}(F)} \mathrm{lkhd}(F, \alpha)\,\mathrm{cost}(F, x, \alpha)$. Observe that
$\mathrm{cost}(F, x) \le 1 - R_k$, as frozen variables are
guessed with probability at most $1 - R_k$. In total
we define $\mathrm{cost}(F) = \sum_{x \in V} \mathrm{cost}(F, x) \le (1 - R_k)n$.
The following theorem relates the cost to
the probability that PPSZ finds an assignment:


Theorem 3


$\Pr(\text{PPSZ finds some satisfying assignment of } F) \ge 2^{-\mathrm{cost}(F)}$.


This theorem immediately implies Theorems 1
and 2. The theorem is proved by induction on the
number of variables of F. After a single PPSZ
step, the cost decreases in expectation, and the
decrease is larger the more frozen variables there are.
On the other hand, the more non-frozen variables there are,
the higher the probability is to retain a satisfiable
formula. Balancing these factors and applying
Jensen's inequality gives the theorem. It is
noteworthy that the proof relies on the inequality
$0.613 \approx R_3 < \frac{1}{2 \ln 2} \approx 0.721$, meaning that if
PPSZ were improved beyond this bound, the
unique case might indeed be better.


Open Problems 


• Is the exponential complexity of k-SAT and
  unique k-SAT the same? Here this has been
  shown for the specific case of the PPSZ
  algorithm.

• Does PPSZ perform even better on formulas
  with exponentially many satisfying
  assignments?

• Can PPSZ be derandomized for general
  k-SAT?


Problem Definition


Given a set of jobs $J = \{1, 2, \ldots, n\}$ with
processing times $p_j \in \mathbb{R}_+$ and weights
$w_j \in \mathbb{R}_+$, the task is to find a schedule for all jobs on
a single machine that minimizes $\sum_j w_j C_j$, where
$C_j$ is the completion time of job j.

Under the standard scheduling assumption of
an ideal machine that runs at constant speed, an
optimal schedule is obtained by sequencing the
jobs in nonincreasing order of the ratio $w_j/p_j$;
this is known as Smith's Rule [12]. Unfortunately,
as we shall see shortly, this sequence may
perform arbitrarily badly when the machine is not
ideal.

This note is concerned with the setting in 
which the machine may change its processing 
speed over time or it may fully break down 
and is unavailable until it is fixed. Given 
a sequence, the jobs are processed in this 
order no matter how the machine behaves. 
In case of a machine breakdown, the job 
that is currently running is preempted and 
resumes processing when the machine becomes 
available again at a later time. The aim is to 
compute a universal sequence that, for any given 
machine behavior, is a good approximation of 
an optimal schedule for that particular machine 
behavior. 


Definition 1 A sequence $\pi$ is a universal
c-approximation if for any machine behavior
the total weighted completion time of $\pi$ is
at most a factor c larger than the objective
value of an optimal solution for this machine
behavior.


To illustrate this definition, consider the
following toy instance with two jobs:
$p_1 = w_1 = 2$ and $p_2 = w_2 = N > 2$. There are
only two possible sequences: (1, 2) and (2, 1).
Both sequences are optimal on an ideal machine
and are consistent with Smith's Rule. Now
suppose our machine breaks down at $t = N + 1$
and stays offline for $T = N^2$ units of time.
The cost of (1, 2) on this faulty machine is
$4 + N(N + 2 + T) = \Theta(N^3)$, while the cost
of (2, 1) is $N^2 + 2(N + 2 + T) = \Theta(N^2)$.
This example shows that Smith's Rule can
produce a sequence that is not a universal
O(1)-approximation. In fact, it is not clear
that such a universal sequence should always
exist.
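
The two costs in the toy instance can be recomputed mechanically. The sketch below assumes a unit-speed machine with a single breakdown interval and picks N = 10 for concreteness; the function name `weighted_cost` and its parameters are illustrative.

```python
def weighted_cost(order, p, w, down_start, down_len):
    """Sum of w_j * C_j when jobs run in `order` on a unit-speed machine that is
    unavailable during the interval [down_start, down_start + down_len)."""
    work_done = 0   # total processing required up to and including the current job
    cost = 0
    for j in order:
        work_done += p[j]
        # The machine delivers down_start units of work before breaking down;
        # a job that needs more than that finishes only after the repair.
        finish = work_done if work_done <= down_start else work_done + down_len
        cost += w[j] * finish
    return cost


N = 10                                  # assumed value, any N > 2 works
p = {1: 2, 2: N}
w = {1: 2, 2: N}
cost_12 = weighted_cost((1, 2), p, w, down_start=N + 1, down_len=N ** 2)
cost_21 = weighted_cost((2, 1), p, w, down_start=N + 1, down_len=N ** 2)
```

With N = 10 this gives 1124 for the sequence (1, 2) and 324 for (2, 1), in agreement with the two closed-form expressions above.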


Key Results 


Epstein et al. [3] initiated the study of universal 
sequencing. They showed that universal O(1)- 
approximate sequences do indeed exist and 
established tight lower bounds on the universal 
approximation ratio that can be achieved. Their
study was subsequently furthered by Megow
and Mestre [10] who showed that the best 
universal schedule can be approximated in 
polynomial time up to any desired level of 
accuracy. 


Bounding the Performance of a Universal 
Sequence 

The key observation needed to bound the
performance of a universal sequence is that
approximating the min-sum objective value on
a machine with unknown processing behavior
is equivalent to approximating the total weight
of uncompleted jobs at any point in time on
an ideal machine. To that end, let $W^\pi(t)$
denote, for any $t \ge 0$, the total weight of
outstanding jobs at time t in the schedule
obtained for job sequence $\pi$ on an ideal
machine. Define $W^*(t) := \min_\pi W^\pi(t)$ for
all $t \ge 0$.


Lemma 1 Let $\pi$ be a sequence of jobs. Then,
the objective value of the corresponding schedule
is at most c times the value of an optimum
schedule for any machine behavior, if and
only if


$W^\pi(t) \le c \cdot W^*(t)$ for all $t \ge 0$.


A Universal Sequencing Algorithm 

The universal sequencing algorithm computes 
the job sequence iteratively backwards. In 
each iteration it solves the subproblem of 
finding a set of jobs that has maximum total 
processing time and total weight within a 
given bound. This bound is doubled in each 
iteration. 

This approach is related to, but not equiv- 
alent to, an algorithm of Hall et al. [6] for 
online scheduling on ideal machines — the 
doubling there happens in the time horizon. 
Indeed, doubling strategies have been applied 
successfully in the design of approximation 
and online algorithms for various problems; 
see, e.g., the survey by Chrobak and Kenyon- 
Mathieu [1].


Doubling Algorithm:


1. For $i \in \{0, 1, \ldots, \lceil \log w(J) \rceil\}$, find a
   subset $J_i^*$ of jobs of maximum total processing
   time $p(J_i^*)$, such that the total weight
   satisfies $w(J_i^*) \le 2^i$.

2. Construct a permutation $\pi$ as follows. Start
   with an empty sequence of jobs. For
   $i = \lceil \log w(J) \rceil$ down to 0, append the jobs
   in $J_i^* \setminus \bigcup_{k=0}^{i-1} J_k^*$ in any order at the end
   of the sequence.


Finding the subsets of jobs $J_i^*$ is a KNAPSACK
problem and, thus, NP-hard [8]. Using straightforward
dynamic programming, the algorithm
runs in pseudo-polynomial time and achieves a
performance guarantee of 4 as shown below.
However, FPTASes for the knapsack problem can
be adapted such that the Doubling Algorithm runs
in polynomial time, losing an arbitrarily small
constant in the performance guarantee.
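
The pseudo-polynomial variant can be sketched as follows. This is a minimal illustration, assuming positive integer weights and positive processing times given as dictionaries keyed by job, with the logarithm taken to base 2; the function names and the brute-force check of the condition of Lemma 1 are additions made for this example.

```python
from itertools import combinations
from math import ceil, log2


def max_processing_subset(jobs, p, w, budget):
    """0/1 knapsack by dynamic programming: a subset of jobs with total weight
    at most `budget` and maximum total processing time (integer weights)."""
    best = [(0, frozenset())] * (budget + 1)   # best[b] = (processing time, subset)
    for j in jobs:
        for b in range(budget, w[j] - 1, -1):
            cand = best[b - w[j]][0] + p[j]
            if cand > best[b][0]:
                best[b] = (cand, best[b - w[j]][1] | {j})
    return best[budget][1]


def doubling_sequence(p, w):
    """Build the sequence back to front with weight budgets 2^0, 2^1, ..."""
    jobs = sorted(p)
    top = ceil(log2(sum(w.values())))
    subsets = [max_processing_subset(jobs, p, w, 2 ** i) for i in range(top + 1)]
    sequence = []
    for i in range(top, -1, -1):
        earlier = set().union(*subsets[:i])    # jobs reserved for later positions
        sequence.extend(sorted(subsets[i] - earlier))
    return sequence


def outstanding_weight(sequence, p, w, t):
    """Total weight of jobs completing after time t on an ideal machine."""
    clock, total = 0, 0
    for j in sequence:
        clock += p[j]
        if clock > t:
            total += w[j]
    return total


def best_outstanding_weight(p, w, t):
    """Minimum outstanding weight at time t over all sequences (brute force)."""
    jobs = list(p)
    best_done = 0
    for r in range(len(jobs) + 1):
        for subset in combinations(jobs, r):
            if sum(p[j] for j in subset) <= t:
                best_done = max(best_done, sum(w[j] for j in subset))
    return sum(w.values()) - best_done


p = {1: 3, 2: 1, 3: 4, 4: 2}     # assumed example instance
w = {1: 1, 2: 5, 3: 2, 4: 3}
pi = doubling_sequence(p, w)
```

On the two-job toy instance with N = 10, `doubling_sequence({1: 2, 2: 10}, {1: 2, 2: 10})` yields the sequence (2, 1), the one that stays cheap under the breakdown considered earlier.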


Theorem 1 For every scheduling instance, the 
Doubling Algorithm produces a universal 4- 
approximation for all machine behaviors. 


Proof By Lemma 1 it is sufficient to show
that $W^\pi(t) \le 4W^*(t)$ for all $t \ge 0$. Let $t \ge 0$
and let i be minimal such that $p(J_i^*) \ge p(J) - t$.
By construction of $\pi$, only jobs j in $\bigcup_{k=0}^{i} J_k^*$
can have a completion time $C_j^\pi > t$. Thus,


$W^\pi(t) \le \sum_{k=0}^{i} w(J_k^*) \le \sum_{k=0}^{i} 2^k = 2^{i+1} - 1$.    (1)


In case i = 0, the claim is trivially true
since $w_j \ge 1$ for any $j \in J$, and thus, $W^*(t) = W^\pi(t)$.
Suppose $i \ge 1$; then by our choice of i,
it holds that $p(J_{i-1}^*) < p(J) - t$. Therefore,
in any sequence $\pi'$, the total weight of jobs
completing after time t is larger than $2^{i-1}$,
because otherwise we get a contradiction to the
maximality of $p(J_{i-1}^*)$. That is, $W^*(t) > 2^{i-1}$.
Together with (1) this concludes the proof. □


This result is best possible for universal
sequencing on a single machine.


Theorem 2 For any c < 4, there exists an
instance for which there is no universal c- 
approximation. 


This can be shown through a connection to 
the online bidding problem and the corresponding 
lower bounds shown by Chrobak et al. [2]. 


Randomized Universal Schedules 

It is possible to obtain a better approximation 
ratio if we select the sequence at random and 
slightly relax the universality requirement. 


Definition 2 A probability distribution over
sequences is a randomized universal
c-approximation if for any machine behavior the
expected total weighted completion time of a
sequence chosen according to the distribution is
at most a factor c larger than the objective value
of an optimal solution for this machine behavior.


By randomizing the “doubling parameter” 
in the Doubling Algorithm, the algorithm can 
achieve an approximation ratio of $e \approx 2.718$,
which is best possible for randomized strategies. 


Theorem 3 For every scheduling instance, 
a randomized variant of the Doubling Algorithm
produces a randomized universal
e-approximation for all machine behaviors. 
Furthermore, for any c < e, there exists an 
instance for which there is no randomized 
universal c-approximation. 


Generalizations 


Global Cost Functions 

The universality of the sequence constructed by
the Doubling Algorithm can be driven even further.
Consider the generalized min-sum objective
$\min \sum_j w_j f(C_j)$ for any nondecreasing, nonnegative,
differentiable cost function f.


Theorem 4 The Doubling Algorithm computes 
a universal 4-approximation (randomized e- 
approximation) for all machine behaviors and 
all considered cost functions f simultaneously.


Precedence Constraints

A natural generalization of the universal sequencing
problem requires that jobs must be sequenced
in compliance with given precedence constraints.
To a certain extent the Doubling Algorithm can
be adapted to this more general problem setting.
Essentially, the knapsack-related subroutine must
respect the precedence constraints, and it must
ensure that prepending the subsets found in different
iterations, starting at the end, does not
violate the precedence order.

This corresponds to solving a so-called par- 
tially ordered knapsack (POK) problem on the 
reverse of the given partial order. 


Theorem 5 The Doubling Algorithm computes 
a universal 4-approximation (randomized e- 
approximation) for the universal scheduling 
problem respecting given precedence constraints 
if the POK problem for the given partial order 
can be solved in polynomial time. 


In general, POK is strongly NP-hard [7] and 
hard to approximate [5]. However, FPTASes exist 
for special partial orders, including directed out- 
trees, two-dimensional orders, and the comple- 
ment of chordal bipartite orders [7,9]. 


Release Dates 

If jobs have release dates, we cannot hope for a 
universal sequence with bounded approximation 
ratio unless the scheduler is allowed to preempt 
jobs. We can think of a universal sequence as a 
priority order of the jobs guiding a preemptive 
list scheduling procedure: At any point in time, 
we work on the job of highest priority that has 
not been finished yet and that has already been 
released. Unfortunately, even with this flexibility, 
the problem is significantly harder. 


Theorem 6 There exists an instance with n jobs
with release dates and unit weights, where the
performance guarantee of any universal schedule
is $\Omega(\log n / \log\log n)$.


The proof relies on the classical theorem of
Erdős and Szekeres [4] on the existence of long
increasing/decreasing subsequences of a given
sequence of distinct real numbers.

Despite this negative result, there is a nontrivial
algorithm that produces a universal
5-approximate sequence for the class of instances
with release dates in which the processing time of
each job is proportional to its weight.


Instance-Sensitive Performance Guarantee

Theorem 1 says that the Doubling Algorithm
produces for every instance a universal
4-approximation. Theorem 2 proves that this is
best possible since there are particular instances
that do not admit a sequence with a smaller
approximation ratio. Many instances, however,
admit better-than-4-approximate universal
sequences, yet the Doubling Algorithm is only
guaranteed to find a 4-approximation. This
motivates the problem of finding the best possible
universal sequence on an instance-by-instance
basis.


Theorem 7 For any fixed $\epsilon > 0$ and $c > 1$,
there is a polynomial time algorithm that, given
an instance, either finds a $(c + \epsilon)$-approximate
universal sequence or determines that there is
no universal c-approximation for this particular
instance.


Applications 


The unreliable machine scheduling problem ad- 
dresses the demand for high-quality scheduling 
solutions in the dynamic real-world environments 
of manufacturing processes or in operating sys- 
tems. The machine could be, for example, a com- 
puter server that slows down due to unpredictable 
third-party usage or an aging production unit 
prone to unexpected breakdowns. Another setting 
where the model is applicable is where a higher 
authority may give priority to another batch of 
jobs, thus delaying the execution of our jobs. In 
general, universally good performance regardless 
of the actual machine behavior is desirable in 
highly automated systems in which changing the 
schedule at an arbitrary point in time is too costly 
or technically infeasible.


Open Problems


The worst-case performance of universal
sequences is quite well understood. While the
analysis in the model without release dates is
tight, it remains open in the setting with release
dates if it is possible to obtain a universal
o(n)-approximation for general instances. The best
known lower bound is $\Omega(\log n / \log\log n)$.

While the worst-case analysis assumes arbi- 
trary machine behaviors, it would be interesting 
to develop and analyze more realistic speed func- 
tions. For example, it is reasonable to assume 
that when a machine breaks down, then it will 
be repaired or replaced within a certain (possibly 
fixed) amount of time; or in a stochastic model, 
the availability periods between breakdowns may 
be assumed to be exponentially distributed. What 
improvements in the approximation guarantee do 
such restrictions allow? 

A different approach in aiming for more
practice-relevant guarantees is to relax the strict
universality requirement. In many situations,
changing the scheduling sequence is possible
to a certain extent at some extra cost. A very
interesting problem is to quantify the amount
of adaptivity an algorithm needs to achieve a
certain performance guarantee. Ideally, there is a
parameter describing the adaptivity that allows one to
scale between the nonadaptive 4-approximation
(Theorem 1) and a fully adaptive
$(1 + \epsilon)$-approximation, given by a PTAS that constructs
an individual scheduling solution for a specific
machine behavior [11].


Problem Definition


Upward graph drawing is concerned with
computing two-dimensional layouts of directed
graphs where all edges flow in the upward
direction. Namely, given a directed graph
G(V, E) (also called a digraph for short), an
upward drawing of G is a drawing such that: (i)
each vertex $v \in V$ is mapped to a distinct point
$p_v$ of the plane and (ii) each edge $(u, v) \in E$
is drawn as a simple curve from $p_u$ to $p_v$,
monotonically increasing in the upward direction.

Clearly, G admits an upward drawing only if
it does not contain directed cycles; if we allow
edge crossings, acyclicity is also a sufficient
condition for the existence of an upward drawing.
Instead, if G is planar and we require that the
upward drawing of G also be crossing-free, acyclicity
is only a necessary condition, and the upward
drawability of G becomes a much more intriguing
problem. An upward drawing with no edge
crossing is called an upward planar drawing;
deciding whether a planar digraph G admits such
a drawing is known as the upward planarity
testing problem. This problem can be studied in
two different settings:


• Variable embedding setting. The existence
  of an upward planar drawing of G is checked
  over all possible planar embeddings of G.

• Fixed embedding setting. The existence of an
  upward planar drawing of G is checked for a
  given planar embedding of G, i.e., the drawing
  must preserve the given embedding.


Both these settings have been widely studied
in the literature. In the next section we briefly
survey a few seminal results on the upward planarity
testing problem, and then we concentrate on the
first and most popular polynomial-time algorithm
for the fixed embedding setting. Figure 1 shows a
planar digraph G with a given planar embedding,
an embedding-preserving upward planar drawing
of G, and a planar digraph that does not admit
upward planar drawings.


Key Results 

Upward Graph Drawing, Fig. 1 (a) A planar digraph G with a given planar embedding. (b) An upward planar
drawing of G that preserves the embedding of G. (c) A planar digraph G that has no upward planar drawing

Let G(V, E) be a planar digraph. We will assume
that G is connected (indeed, a digraph admits
an upward planar drawing if and only if each
of its connected components admits an upward
planar drawing). A source (resp. a sink) of G is a
vertex with only outgoing (resp. incoming) edges.
An internal vertex of G is a vertex with both
outgoing and incoming edges. We denote by S,
T, and I the sets of sources, sinks, and internal
vertices of G, respectively. Digraph G is a planar
st-digraph if it has only one source s and one
sink t and a planar embedding where s and t
belong to the same face. Di Battista and Tamassia
proved different equivalent characterizations of
upward planar drawable digraphs, as stated in the
following result [7]:


Theorem 1 ([7]) Let G be a planar digraph. The 
following properties are equivalent: 


(a) G admits an upward planar drawing; 

(b) G admits an upward planar drawing with 
straight-line edges; 

(c) G is the spanning subgraph of a planar st- 
digraph. 


Using Theorem 1, Garg and Tamassia focused 
on straight-line drawings and showed that the 
upward planarity testing problem in the variable 
embedding setting is NP-hard [9]. As a consequence
of this hardness result, polynomial-time
algorithms in the variable embedding setting have 
been devised for restricted classes of planar di- 
graphs, like single-source digraphs [6] and series- 
parallel digraphs [8], while exponential-time al- 
gorithms have been proposed for more general 
planar digraphs (see, e.g., [1,8]). 

Conversely, Bertolazzi et al. showed that the 
upward planarity testing problem can be solved 
in polynomial time in the fixed embedding set- 
ting [2]. In the following we describe this break- 
through result, which inspired several subsequent 
papers on the subject. 


Polynomial-Time Upward Planarity Testing 
Let G(V, E) be an embedded planar digraph, and
still denote by S, T, and I the sets of sources,
sinks, and internal vertices of G, respectively.
The result in [2] is based on an elegant
combinatorial characterization of the planar embedded
digraphs that are upward planar drawable. We
first recall a few basic definitions.


Upward Graph Drawing 


Digraph G is bimodal if for every vertex $v \in I$,
the outgoing edges of v are consecutive in the
cyclic clockwise order around v (which implies
that the incoming edges of v are also consecutive
in the cyclic clockwise order around v). It is
immediate to see that if a digraph G admits an
embedding-preserving upward planar drawing, G
is necessarily bimodal.
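
Bimodality is easy to test once the embedding is given as a rotation system. The sketch below assumes a hypothetical input format in which every vertex maps to the list of its incident edges in clockwise order, each edge marked "in" or "out"; the function names are illustrative.

```python
def is_bimodal_at(directions):
    """`directions` lists the edges around one vertex in clockwise order, each
    marked "in" or "out". The outgoing edges form one contiguous cyclic block
    exactly when the direction changes at most twice around the vertex."""
    k = len(directions)
    changes = sum(directions[i] != directions[(i + 1) % k] for i in range(k))
    return changes <= 2


def is_bimodal(rotation):
    """`rotation` maps each vertex to its clockwise list of edge directions."""
    return all(is_bimodal_at(d) for d in rotation.values())


# A vertex with pattern in, out, in, out violates bimodality.
good = {"v": ["in", "in", "out", "out"], "s": ["out", "out"]}
bad = {"v": ["in", "out", "in", "out"]}
```

Sources and sinks pass the test trivially because all their edges have the same direction, so checking every vertex gives the same answer as checking only the internal ones.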

Let f be a face of G, and let $a = (e_1, v, e_2)$
be a triplet such that $v \in V$ is a vertex of the
boundary of f and $e_1$, $e_2$ are two edges incident
to v that are consecutive on the boundary of f
($e_1$ and $e_2$ may coincide if G is not biconnected).
Triplet a is called an angle at v in face f,
or simply an angle of f, or an angle at v. If
both $e_1$ and $e_2$ are outgoing edges of v, we call
a a source-switch angle of f; if both $e_1$ and
$e_2$ are incoming edges of v, we call a a sink-switch
angle of f. Denote by S(f) and T(f) the
number of source-switch angles and the number
of sink-switch angles of f, respectively. It can be
easily observed that S(f) = T(f). The capacity
of f is defined as $\mathrm{cap}(f) = \frac{S(f) + T(f)}{2} - 1$
if f is an internal face of G and as
$\mathrm{cap}(f) = \frac{S(f) + T(f)}{2} + 1$ if f is the external face of G.
The number of sources and sinks in the digraph
is nicely related to the face capacities, as stated
by the following theorem.


Theorem 2 ([2]) If G is a bimodal embedded
planar digraph and F is the set of faces of G,
then $\sum_{f \in F} \mathrm{cap}(f) = |S| + |T|$.


Now, given any upward planar drawing $\Gamma$,
denote by L(v) the number of geometric angles
larger than $\pi$ at vertex v in $\Gamma$ and by L(f)
the number of geometric angles larger than $\pi$
in face f in $\Gamma$. The following result establishes
which kinds of angles in $\Gamma$ can occur
around the vertices and inside the faces of the
digraph:


Theorem 3 ([2]) Let G be an embedded planar
digraph and let $\Gamma$ be an embedding-preserving
upward planar drawing of G. We have that:


(i) L(v) = 0 for each $v \in I$ and L(v) = 1 for
each $v \in S \cup T$;
(ii) L(f) = cap(f), for each $f \in F$.




Motivated by Theorem 3, for any given embedded
planar digraph G, one can look for an
assignment of the angles of G to the faces of G,
with these properties:


(a) For each source or sink v, exactly one angle
at v is assigned to a face incident to v.
(b) For each face f, the number of angles assigned
to f equals cap(f).


Such an assignment is called an upward- 
consistent assignment of G. The following result 
translates the upward planarity testing problem 
into the problem of deciding whether G admits 
an upward-consistent assignment. 


Theorem 4 ([2]) Let G be an acyclic bimodal
embedded planar digraph. G admits an 
embedding-preserving upward planar drawing 
if and only if G admits an upward-consistent 
assignment. 


In [2] it is proved that an upward-consistent
assignment can be used to construct in linear
time an upward planar drawing where each angle
assigned to a face corresponds to a geometric
angle larger than $\pi$. This is done by exploiting
Theorem 1; namely, G is first augmented to a
planar st-digraph G', then an upward drawing of
G' is computed, and finally the dummy edges are
removed from the drawing of G', thus obtaining
an upward planar drawing of G.

Deciding whether G admits an upward- 
consistent assignment, and in case finding one, 
can be done using a network flow model. Namely, 
construct a bipartite flow network N(G) having a 
node n(v) for each source or sink v of G, called 
a vertex-node, and a node n(f) for each face 
f of G, called a face node. Each vertex-node 
n(v) supplies flow 1, while each face-node n(f) 
demands a flow equal to cap(f). Also, N(G) 
has a directed arc (n(v), n(f)) if v is a source or
a sink that belongs to the boundary of f in G.
A unit of flow on an arc (n(v), n(f)) indicates
that an angle at v in f must be assigned to f.
Each feasible flow in N(G) defines an upward-
consistent assignment of G. Using standard flow
algorithms, testing whether N(G) has a feasible
flow, and in case computing one, can be done in
$O(n + r^2)$ time, where n is the number of vertices
of G and $r = |S| + |T|$. Figure 2 illustrates
the algorithmic approach described above for the 
upward planarity testing problem. 
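
Because every vertex-node supplies exactly one unit, the feasibility question for N(G) can be sketched as a capacitated bipartite matching. The Python sketch below is only an illustration under stated assumptions: the faces incident to each source or sink and the capacities cap(f) are taken as given inputs, the names `upward_consistent_assignment`, `incident_faces` and `cap` are hypothetical, each vertex is treated as having at most one angle per incident face, and the simple augmenting-path search makes no attempt to match the $O(n + r^2)$ bound.

```python
def upward_consistent_assignment(incident_faces, cap):
    """Assign every source/sink to one incident face so that each face f
    receives exactly cap[f] vertices; return None if this is impossible.

    incident_faces: dict vertex -> list of incident faces (assumed input)
    cap:            dict face   -> required number of assigned angles
    """
    # Total supply must equal total demand for a feasible flow to exist.
    if len(incident_faces) != sum(cap.values()):
        return None

    assigned = {}                       # vertex -> face
    load = {f: [] for f in cap}         # face   -> vertices assigned to it

    def try_assign(v, seen):
        # Augmenting-path search: place v, possibly relocating other vertices.
        for f in incident_faces[v]:
            if f in seen:
                continue
            seen.add(f)
            if len(load[f]) < cap[f]:
                load[f].append(v)
                assigned[v] = f
                return True
            for u in list(load[f]):
                if try_assign(u, seen):  # u moved to another face
                    load[f].remove(u)
                    load[f].append(v)
                    assigned[v] = f
                    return True
        return False

    for v in incident_faces:
        if not try_assign(v, set()):
            return None
    return assigned


if __name__ == "__main__":
    faces_of = {"s": ["f0", "f1"], "t": ["f0"]}
    print(upward_consistent_assignment(faces_of, {"f0": 1, "f1": 1}))
```

Since supply equals demand and no face receives more than its capacity, every face ends up with exactly cap(f) assigned vertices whenever the function returns an assignment.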

The next theorem summarizes the main result 
of [2]. 


Theorem 5 ([2]) Let G be an acyclic bimodal
embedded planar digraph with n vertices, and let
r be the total number of sources and sinks of G.
There exists an $O(n + r^2)$-time algorithm that
tests whether G admits an embedding-preserving
upward planar drawing and that computes
such a drawing if the test is positive.


Upward Graph Drawing, Fig. 2 (a) An embedded planar
digraph G; each face is represented by a small box
reporting its capacity. (b) An upward-consistent
assignment of G; a light gray arrow indicates the
assignment of an angle to a face. (c) An
embedding-preserving upward planar drawing constructed
from the upward-consistent assignment; the angles of G
assigned to a face correspond to geometric angles
larger than $\pi$ in the drawing


Applications 


Upward drawings can be effectively used to 
represent PERT networks, ISA hierarchies 
in knowledge-representation diagrams, and 
subroutine call charts. A generalized model, 
called quasi-upward drawing, strongly enlarges 
the range of application domains of upward graph 
drawing, making it possible to also represent 
cyclic digraphs [1] by allowing an edge to break 
its upward monotonicity in a finite number of 
points. Petri nets, which are widely used to describe
distributed systems, are examples of diagrams that
can be represented as quasi-upward drawings.

Efficient C++ graph drawing libraries, like
GDToolkit [5] and OGDF [3], implement advanced
upward graph drawing algorithms.


Experimental Results 


Extensive experimental studies on upward 
planarity testing are described in [1, 4].
Other references on experimental work about 
upward graph drawing algorithms can be found 
in [3,5]. 
Problem Definition


This problem deals with the design of efficiently
computable incentive compatible, or truthful,
mechanisms for combinatorial optimization
problems with selfish one-parameter agents and
a single seller. The focus is on approximation
algorithms for NP-hard mechanism design
problems. These algorithms need to satisfy
certain monotonicity properties to ensure
truthfulness.


A one-parameter agent is an agent who has as
her private data some resource as well as
a valuation, i.e., the maximum amount of money
she is willing to pay for this resource. Sometimes,
however, the resource is assumed to be known
to the mechanism. The scenario where a single
seller offers these resources to the agents is
primarily considered. Typically, the seller aims at
maximizing the social welfare or her revenue.
The work by Briest, Krysta and Vöcking [6]
will mostly be considered, but other existing
models and results will also be surveyed.


Utilitarian Mechanism Design 

A famous example of mechanism design problems
is given by combinatorial auctions (CAs),
in which a single seller, the auctioneer, wants to sell
a collection of goods to potential buyers. A wider
class of problems is encompassed by a utilitarian
mechanism design (maximization) problem $\Pi$
defined by a finite set of objects $A$, a set of feasible
outputs $O_\Pi \subseteq A^n$ and a set of $n$ agents. Each
agent declares a set of objects $S_i \subseteq A$ and a
valuation function $v_i : \mathcal{P}(A) \times A^n \to \mathbb{R}$ by which
she values all possible outputs. Given a vector
$S = (S_1, \ldots, S_n)$ of declarations one is
interested in an output $o^* \in O_\Pi$ maximizing the social
welfare, i.e., $o^* \in \operatorname{argmax}_{o \in O_\Pi} \sum_{i=1}^{n} v_i(S_i, o)$.
In CAs, an object $a$ corresponds to a subset of
goods. Each agent declares all the subsets she is
interested in and the prices she would be willing
to pay. An output specifies the sets to be allocated
to the agents.

Here, a limited type of agents called
single-minded is considered, introduced by Lehmann
et al. [10]. Let $R_\preceq \subseteq A^2$ be a reflexive and
transitive relation on $A$, such that there exists
a special object $\emptyset \in A$ with $\emptyset \preceq a$ for any
$a \in A$ to model the situation in which some
agent does not contribute to the solution at all. For
$a, b \in A$, $(a, b) \in R_\preceq$ will be denoted by $a \preceq b$.
The single-minded agent $i$ declares a single
object $a_i$ and is fully defined by her type $(a_i, v_i)$,
with $a_i \in A$ and $v_i > 0$. The valuation function
introduced earlier reduces to

$$v_i(a_i, o) = \begin{cases} v_i, & \text{if } a_i \preceq o_i, \\ 0, & \text{else.} \end{cases}$$
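
As a small illustration of this valuation function, the following Python sketch assumes the combinatorial-auction setting in which objects are sets of goods and $\preceq$ is set inclusion, with the empty set playing the role of the special object $\emptyset$. The function names `valuation` and `social_welfare` are illustrative only.

```python
def valuation(declared, value, allocated):
    """v_i(a_i, o): the declared value if a_i is contained in o_i, else 0.

    Objects are modelled as frozensets of goods, so `<=` is set inclusion
    (an assumption matching the combinatorial-auction case).
    """
    return value if declared <= allocated else 0.0


def social_welfare(declarations, output):
    """Sum of the agents' valuations for an output (one object per agent)."""
    return sum(valuation(a, v, o) for (a, v), o in zip(declarations, output))


if __name__ == "__main__":
    declarations = [(frozenset({"x", "y"}), 5.0), (frozenset({"y"}), 3.0)]
    # Agent 0 receives a superset of her request, agent 1 receives nothing.
    output = [frozenset({"x", "y", "z"}), frozenset()]
    print(social_welfare(declarations, output))
```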


Agent $i$ is called known if object $a_i$ is
known to the mechanism [11]. Here, mostly
unknown agents will be considered.
Intuitively, each $a_i$ corresponds to an object
agent $i$ offers to contribute to the solution,
and $v_i$ describes her valuation of any
output $o$ that indeed selects $a_i$. In CAs,
relation $R_\preceq$ is set inclusion: an agent
interested in set $S$ is also satisfied by
$S'$ with $S \subseteq S'$. For ease of notation let
$(a, v) = ((a_1, v_1), \ldots, (a_n, v_n))$,
$(a_{-i}, v_{-i}) = ((a_1, v_1), \ldots, (a_{i-1}, v_{i-1}), (a_{i+1}, v_{i+1}), \ldots, (a_n, v_n))$
and $((a_i, v_i), (a_{-i}, v_{-i})) = (a, v)$.


Mechanism 

A mechanism $M = (A, p)$ consists of an
algorithm $A$ computing a solution $A(a, v) \in O_\Pi$ and
an $n$-tuple $p(a, v) = (p_1(a, v), \ldots, p_n(a, v)) \in \mathbb{R}_+^n$
of payments collected from the agents.
If $a_i \preceq A(a, v)_i$, agent $i$ is selected, and let
$S(A(a, v)) = \{i \mid a_i \preceq A(a, v)_i\}$ be the set of
selected agents. Agent $i$'s type is her private
knowledge. Thus, the types declared by agents
may not match their true types. To reflect
this, let $(a_i^*, v_i^*)$ refer to agent $i$'s true type
and $(a_i, v_i)$ be the declared type. Given
an output $o \in O_\Pi$, the utility of agent $i$ is
$u_i(a, v) = v_i(a_i^*, o) - p_i(a, v)$. Each agent's
goal is to maximize her utility. To achieve this,
she will try to manipulate the mechanism by
declaring a false type if this could result in higher
utility. A mechanism is called truthful, or
incentive compatible, if no agent $i$ can gain by lying
about her type, i.e., given declarations $(a_{-i}, v_{-i})$,

$$u_i((a_i^*, v_i^*), (a_{-i}, v_{-i})) \ge u_i((a_i, v_i), (a_{-i}, v_{-i}))$$

for any $(a_i, v_i) \ne (a_i^*, v_i^*)$.


Monotonicity 
A sufficient condition for truthfulness of
approximate mechanisms for single-minded CAs was
first given by Lehmann et al. [10]. Their results
can be adopted for the considered scenario. An
algorithm $A$ is monotone with respect to $R_\preceq$ if

$$i \in S(A((a_i, v_i), (a_{-i}, v_{-i}))) \;\Rightarrow\; i \in S(A((a_i', v_i'), (a_{-i}, v_{-i})))$$

for any $a_i' \preceq a_i$ and $v_i' \ge v_i$. Intuitively, one
requires that a winning declaration $(a_i, v_i)$
remains winning if an object $a_i'$, smaller
according to $R_\preceq$, and a higher valuation $v_i'$
are declared. If declarations $(a_{-i}, v_{-i})$ are
fixed and object $a_i$ is declared by $i$, algorithm $A$
defines a critical value $\theta_i^A$, i.e., the minimum
valuation $v_i$ that makes $(a_i, v_i)$ winning:
$i \in S(A((a_i, v_i), (a_{-i}, v_{-i})))$ for any $v_i > \theta_i^A$
and $i \notin S(A((a_i, v_i), (a_{-i}, v_{-i})))$ for any
$v_i < \theta_i^A$. The critical value payment scheme $p^A$
associated with $A$ is defined by $p_i^A(a, v) = \theta_i^A$
if $i \in S(A(a, v))$, and $p_i^A(a, v) = 0$ otherwise.
The critical value for any fixed agent $i$ can be
computed, e.g., by performing binary search on
the interval $[0, v_i]$ and repeatedly running algorithm
$A$ to check if $i$ is selected. Also, mechanism
$M_A = (A, p^A)$ is normalized, i.e., agents that
are not selected pay 0. Algorithm $A$ is exact
if, for declarations $(a, v)$, $A(a, v)_i = a_i$ or
$A(a, v)_i = \emptyset$ for all $i$. In analogy to [10] one
obtains the following.
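
The binary search for the critical value mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `algorithm` is assumed to be any monotone allocation rule that maps a list of declared types to the set of selected agents, and the toy rule `highest_bid_wins` (a single-item auction with ties broken toward the lowest index) is a hypothetical stand-in used only to exercise the search.

```python
def critical_value(algorithm, declarations, i, tol=1e-6):
    """Approximate the smallest valuation for which agent i is still selected.

    `algorithm` maps a list of (object, valuation) pairs to the set of
    selected agent indices and is assumed monotone in agent i's valuation.
    Returns None if agent i is not selected under her current declaration.
    """
    a_i, v_i = declarations[i]

    def wins(v):
        trial = list(declarations)
        trial[i] = (a_i, v)             # only agent i's valuation changes
        return i in algorithm(trial)

    if not wins(v_i):
        return None
    lo, hi = 0.0, v_i                   # invariant: declaring hi is winning
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if wins(mid):
            hi = mid
        else:
            lo = mid
    return hi


def highest_bid_wins(declarations):
    """Toy monotone rule: the highest valuation wins, lowest index on ties."""
    best = max(range(len(declarations)), key=lambda j: declarations[j][1])
    return {best}


if __name__ == "__main__":
    bids = [("item", 5.0), ("item", 3.0)]
    print(critical_value(highest_bid_wins, bids, 0))
```

Each probe reruns the allocation algorithm once, so the number of runs grows with the logarithm of $v_i$ divided by the tolerance.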


Theorem 1 Let $A$ be a monotone and exact
algorithm for some utilitarian problem $\Pi$ and
single-minded agents. Then mechanism $M_A = (A, p^A)$
is truthful.


Algorithm $A_\Pi^k$:
1 $\alpha_k := n / (\epsilon \cdot 2^k)$;
2 for $i = 1, \ldots, n$ do
3   $v_i' := \min\{v_i, 2^{k+1}\}$;
4   $v_i'' := \lfloor \alpha_k \cdot v_i' \rfloor$;
5 return $A_\Pi(a, v'')$;
Additional Definitions

In the unsplittable flow problem (UFP),
an undirected graph $G = (V, E)$, $|E| = m$,
$|V| = n$, with edge capacities $b_e$, $e \in E$, and
a set $K$ of $k \ge 1$ commodities described by
terminal pairs $(s_i, t_i) \in V \times V$, a demand
$d_i$ and a value $c_i$ are given. One assumes
that $\max_i d_i \le \min_e b_e$, $d_i \in [0, 1]$ for each
$i \in K = \{1, \ldots, k\}$, and $b_e \ge 1$ for all $e \in E$.
Let $B = \min_e \{b_e\}$. A feasible solution is a subset
$K' \subseteq K$ and a single flow $s_i$-$t_i$-path for each
$i \in K'$, such that the demands of $K'$ can
simultaneously and unsplittably be routed along
the paths and the capacities are not exceeded.
The goal in UFP, called B-bounded UFP, is to
maximize the total value of the commodities in
$K'$. A generalization is allocating bandwidth for
multicast communication, where a commodity is
a set of terminals that should be connected by
a multicast tree.
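
To make the feasibility condition concrete, the following Python sketch checks whether a chosen set of paths respects the terminal pairs and the edge capacities, and computes the objective value. The data layout (paths as vertex lists, undirected edges as `frozenset` pairs) and the names `is_feasible` and `total_value` are assumptions made for illustration.

```python
from collections import defaultdict


def is_feasible(routing, terminals, demand, capacity, tol=1e-9):
    """Check a UFP solution.

    routing:   dict commodity -> path given as a list of vertices
    terminals: dict commodity -> (s_i, t_i)
    demand:    dict commodity -> d_i
    capacity:  dict frozenset({u, v}) -> b_e for each undirected edge
    """
    load = defaultdict(float)
    for i, path in routing.items():
        if (path[0], path[-1]) != terminals[i]:
            return False                      # path must join s_i and t_i
        for u, v in zip(path, path[1:]):
            edge = frozenset((u, v))
            if edge not in capacity:
                return False                  # path uses a non-existent edge
            load[edge] += demand[i]           # the whole demand uses one path
    return all(load[e] <= capacity[e] + tol for e in load)


def total_value(routing, value):
    """Objective: total value c_i of the routed commodities."""
    return sum(value[i] for i in routing)


if __name__ == "__main__":
    capacity = {frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 1.0}
    terminals = {1: ("a", "c"), 2: ("a", "b")}
    demand = {1: 0.6, 2: 0.5}
    print(is_feasible({1: ["a", "b", "c"]}, terminals, demand, capacity))
```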


Key Results 


Monotone Approximation Schemes 

Let $\Pi$ be a given utilitarian (maximization)
problem. Given declarations $(a, v)$, let $\mathrm{Opt}(a, v)$
denote an optimal solution to $\Pi$ on this instance and
$w(\mathrm{Opt}(a, v))$ the corresponding social welfare.
Assuming that $A_\Pi$ is a pseudopolynomial exact
algorithm for $\Pi$, an algorithm $A_\Pi^{FPTAS}$ that is a
monotone FPTAS for $\Pi$ is defined in Fig. 1.


Theorem 2 Let $\Pi$ be a utilitarian mechanism
design problem among single-minded agents,
$A_\Pi$ a monotone pseudopolynomial algorithm
for $\Pi$ with running time poly(n, V), where
$V = \max_i v_i$, and assume that $V \le w(\mathrm{Opt}(a, v))$
for declaration $(a, v)$. Then $A_\Pi^{FPTAS}$ is
a monotone FPTAS for $\Pi$.


Algorithm $A_\Pi^{FPTAS}$:
1 $V := \max_i v_i$, Best $:= (\emptyset, \ldots, \emptyset)$, best $:= 0$;
2 for $j = 0, \ldots, \lceil \log((1 - \epsilon)^{-1} n) \rceil + 1$ do
3   $k := \lfloor \log(V) \rfloor - j$;
4   if $w_k(A_\Pi^k(a, v)) >$ best then
5     Best $:= A_\Pi^k(a, v)$; best $:= w_k(A_\Pi^k(a, v))$;
6 return Best;


Utilitarian Mechanism Design for Single-Minded Agents, Fig. 1 A monotone FPTAS for utilitarian problem $\Pi$
and single-minded agents


Theorem 2 can also be applied to minimiza- 
tion problems. Section “Applications” describes 
how these approximation schemes can be used 
for forward multi-unit auctions and job schedul- 
ing with deadlines. 


Truthful Primal-Dual Mechanisms 

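For an instance $G = (V, E)$ of UFP defined
above, let $\mathcal{S}_i$ be the set of all $s_i$-$t_i$-paths in $G$,
and $\mathcal{S} = \bigcup_{i=1}^{k} \mathcal{S}_i$. Given $S \in \mathcal{S}_i$, let $q_S(e) = d_i$
if $e \in S$, and $q_S(e) = 0$ otherwise. UFP is the
following integer linear program (ILP):

$$\max \sum_{i=1}^{k} c_i \sum_{S \in \mathcal{S}_i} x_S \qquad (1)$$

$$\text{s.t.} \quad \sum_{S : S \in \mathcal{S},\, e \in S} q_S(e)\, x_S \le b_e \qquad \forall e \in E \qquad (2)$$

$$\sum_{S \in \mathcal{S}_i} x_S \le 1 \qquad \forall i \in \{1, \ldots, k\} \qquad (3)$$

$$x_S \in \{0, 1\} \qquad \forall S \in \mathcal{S}. \qquad (4)$$

The linear programming (LP) relaxation is the
same linear program with constraints (4) replaced
with $x_S \ge 0$ for all $S \in \mathcal{S}$. The corresponding
dual linear program is

$$\min \sum_{e \in E} b_e y_e + \sum_{i=1}^{k} z_i \qquad (5)$$

$$\text{s.t.} \quad z_i + \sum_{e \in S} q_S(e)\, y_e \ge c_i \qquad \forall i \in \{1, \ldots, k\} \;\; \forall S \in \mathcal{S}_i \qquad (6)$$

$$z_i, y_e \ge 0 \qquad \forall i \in \{1, \ldots, k\} \;\; \forall e \in E. \qquad (7)$$


Utilitarian Mechanism Design for Single-Minded Agents,
Fig. 2 Truthful mechanism for network (multicast) routing.
$e \approx 2.718$ is Euler's number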


Based on these LPs, Fig. 2 specifies a primal-dual
mechanism for routing, called Greedy-1. Greedy-1
ensures feasibility by using the $y_e$'s: if an added set
exceeded the capacity $b_e$ of some $e \in E$, then this
would imply the stopping condition already in the
previous iteration. Using the weak duality of LPs
the following result can be shown.


Theorem 3 Greedy-1 outputs a feasible solution,
and it is a $\left(\frac{e \gamma B}{B-1}\, m^{1/(B-1)}\right)$-approximation
algorithm if there is a polynomial time algorithm
that finds a $\gamma$-approximate set $S_i$ in line 4.


In case of UFP $\gamma = 1$, as the shortest $s_i$-$t_i$-path
computation finds set $S_i$ in line 4 of Greedy-1. For
multicast routing, this problem corresponds to the
NP-hard Steiner tree problem, for which one can
take $\gamma = 1.55$. Greedy-1 can easily be shown
to be monotone in demands and valuations as
required in Theorem 1. Thus it implies a truthful
mechanism for allocating network resources. The 
commodities correspond to bidders, the terminal 
nodes of bidders are known, but the bidders might 
lie about their demands and valuations. In the 
multicast routing the set of terminals for each 
bidder is known but the demands and valuations 
are unknown. 


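Algorithm Greedy-1:
1 $T := \emptyset$; $K := \{1, \ldots, k\}$;
2 forall $e \in E$ do $y_e := 1/b_e$;
3 repeat
4   forall $i \in K$ do $S_i := \operatorname{argmin}\{\sum_{e \in S} y_e \mid S \in \mathcal{S}_i\}$;
5   $j := \operatorname{argmax}\{ c_i / (d_i \sum_{e \in S_i} y_e) \mid i \in K \}$;
6   $T := T \cup \{S_j\}$; $K := K \setminus \{j\}$;
7   forall $e \in S_j$ do $y_e := y_e \cdot (e^{B-1} m)^{q_{S_j}(e)/(b_e - 1)}$;
8 until $\sum_{e \in E} b_e y_e \ge e^{B-1} m$ or $K = \emptyset$;
9 return $T$.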
Algorithm Greedy-2:
1 $T := \emptyset$;
2 forall $e \in U$ do $y_e := 1/b_e$;
3 repeat
4   $S := \operatorname{argmax}\{ c_S / \sum_{e \in U} q_S(e)\, y_e \mid S \in \mathcal{S} \setminus T \}$;
5   $T := T \cup \{S\}$;
6   forall $e \in S$ do $y_e := y_e \cdot (e^{B} m)^{q_S(e)/b_e}$;
7 until $\sum_{e \in U} b_e y_e \ge e^{B} m$;
8 return $T$.


Utilitarian Mechanism Design for Single-Minded Agents,
Fig. 3 Truthful mechanism for multi-unit CAs among unknown
single-minded bidders. For CAs without multisets:
$q_S(e) \in \{0, 1\}$ for each $e \in U$, $S \in \mathcal{S}$


Corollary 1 Given any $\epsilon > 0$, $B \ge 1 + \epsilon$,
Greedy-1 is a truthful $O(m^{1/(B-1)})$-approximation
mechanism for UFP (unicast routing) as
well as for the multicast routing problem, where
the demands and valuations of the bidders are
unknown.


When $B$ is large, $\Omega(\log m)$, then the approximation
factor in Corollary 1 becomes constant.
Azar et al. [4] presented further results in
case of large $B$. Awerbuch et al. [3] gave
randomized online truthful mechanisms for uni-
and multicast routing, obtaining an expected
$O(\log(\mu m))$-approximation if $B = \Omega(\log m)$,
where $\mu$ is the ratio of the largest to smallest
valuation. Their approximation holds in fact
with respect to the revenue of the auctioneer,
but they assume that the demands are known to
the mechanism. Bartal et al. [5] give a truthful
$O(B \cdot (m/\theta)^{1/(B-2)})$-approximation mechanism
for UFP with unknown valuations and demands,
where $\theta = \min_i \{d_i\}$.

Greedy-1 can be modified to give truthful
mechanisms for multi-unit CAs among unknown
single-minded bidders. (In the case of unknown
single-minded bidders, the bidders have as
private data not only their valuations (as in
the case of known single-minded bidders) but
also the sets they demand.) Archer et al. [2]
used randomized rounding to obtain a truthful
mechanism for multi-unit CAs, but only in
a probabilistic sense and only for known bidders.
Multi-unit CA among single-minded bidders is
a special case of ILP (1)-(4), where $|\mathcal{S}_i| = 1$ for
each $i \in K$, and $q_S(e) \in \{0, 1\}$ for each $e \in U$,
$S \in \mathcal{S}$ ($E$ is $U$ in CAs). A bid of bidder $i \in K$
is $(a_i, v_i) = (S, c_S)$, $S \in \mathcal{S}_i$, and $c_S = c_i$ is
the valuation. The relation $R_\preceq$ is $\subseteq$. Algorithm
Greedy-2 in Fig. 3 is exact and monotone for CAs
with unknown single-minded bidders, as needed
in Theorem 1.


Theorem 4 Algorithm Greedy-2 is a truthful
$O(m^{1/B})$-approximation mechanism for multi-unit
CAs among unknown single-minded bidders.


Bartal et al. [5] presented a truthful mechanism
for this problem among unknown single-minded
bidders which is $O(B \cdot m^{1/(B-2)})$-approximate.
(It works in fact for more general bidders.)


Applications 


This section presents applications of the techniques
described above and a short survey of other results.


Applications of Monotone Approximation 
Schemes 

In a forward multi-unit auction a single auction- 
eer wants to sell m identical items to n possi- 
ble buyers (bidders). Each single-minded bidder 
specifies the number of items she is interested in 
and a price she is willing to pay. Elements in the 
introduced notation correspond to the requested 
and allocated numbers of items. Relation $R_\preceq$
describes that bidder $i$ requesting $q_i$ items will
be satisfied also by any larger number of items.
Mu’alem and Nisan [11] give a 2-approximate
monotone algorithm for this problem. Theorem 2
gives a monotone FPTAS for multi-unit auctions
among unknown single-minded bidders. This
FPTAS is truthful with respect to agents where both
the number of items and price are private.

In job scheduling with deadlines (JSD), each
agent $i$ has a job with running time $t_i$, deadline
$d_i$ and a price $v_i$ she is willing to pay if her
job is processed by deadline $d_i$. Element $a_i$ is
defined as $a_i = (t_i, d_i)$. Output for agent $i$ is
a time slot for processing $i$'s job. For two
elements $a_i = (t_i, d_i)$ and $a_i' = (t_i', d_i')$ one has
$a_i \preceq a_i'$ if $t_i \le t_i'$ and $d_i \ge d_i'$. Theorem 2 leads to
a monotone FPTAS, which, however, is not exact 
(see Theorem 1) with respect to deadlines, and so 
it is a truthful mechanism only if the deadlines 
are known. The techniques of Theorem 2 apply 
also to minimization mechanism design problems 
with a single buyer, such as reverse multi-unit 
auctions, scheduling to minimize tardiness, con- 
strained shortest path and minimum spanning tree 
problems [6]. 


Applications of the Primal-Dual Algorithms

The applications of the primal-dual algorithms are
combinatorial auctions and auctions for unicast 
and multicast routing. As these applications are 
tied very much to the algorithms, they have al- 
ready been presented in section “Key Results”. 


Survey of Other Results 

The first truthful mechanisms for single-minded CAs
were designed by Lehmann et al. [10], where they 
introduced the concept of single-minded agents, 
identified the role of monotonicity, and used 
greedy algorithms to design truthful mechanisms. 
Better approximation ratios of these greedy 
mechanisms were proved by Krysta [9] with 
the help of LP duality. A tool-box of techniques 
for designing truthful mechanisms for CAs was 
given by Mu’alem and Nisan [11]. 

The previous section presented a monotone 
FPTAS for job scheduling with deadlines where 
jobs are selfish agents and the seller offers the 
agents the facilities to process their jobs. Such 
scenarios when jobs are selfish agents to be 
scheduled on (possibly selfish) machines have 
been investigated further by Andelman and 
Mansour [1], see also references therein. 

So far, social welfare was mostly assumed
as the objective, but for a seller it is probably more
important to maximize her revenue. This
objective turns out to be much harder to enforce
in mechanism design. Such truthful (in prob- 
abilistic sense) mechanisms were obtained for 
auctioning unlimited supply goods among one- 
parameter agents [7, 8]. Another approach to 
maximizing seller’s revenue is known as optimal 
auction design [12]. A seller wants to auction 
a single good among agents and each agent has 
a private value for winning the good. One as- 
sumes that the seller knows a joint distribution of 
those values and wants to maximize her expected 
revenue [13, 14]. 
Problem Definition


In the vector bin packing problem, we are
given an integral dimension $d \ge 1$ and a list
$L = (x_1, x_2, \ldots, x_n)$ of items, where
each item is a d-dimensional tuple
$x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ with rational entries
$x_{i,j} \in [0, 1]$. The goal is to assign the items to a
minimum number of multidimensional bins,
where if $X$ is the set of items assigned to a
bin, we must have, for each $j$, $1 \le j \le d$,

$$\sum_{x_i \in X} x_{i,j} \le 1.$$


Note that when d = 1, the vector bin packing 
problem reduces to the classic (one-dimensional) 
bin packing problem. 

One potential application of the vector bin 
packing problem is that of assigning jobs to 
servers in a shared hosting platform, where each 
job may require a specific number of cycles per 
second and specific amounts of memory, bandwidth,
and other resources [12]. Here the servers
would correspond to the bins, the dimension $d$ is
the number of resources, the items are the jobs,
and $x_{i,j}$ is the fraction of the total amount of a
server’s $j$th resource that job $x_i$ requires.

In the early literature, this problem was often 
called the multidimensional bin packing problem. 
That term, however, is now more typically re- 
served for the related problem where the items are 
d-dimensional rectangular parallelepipeds (rect- 
angles, when d = 2), the bins are d-dimensional 
unit cubes, and the items assigned must not only 
be assigned to bins but also to specific positions 
in the bins, in such a way that no point in any 
bin is in the interior of more than one item. With 
vector bin packing, in contrast, the dimensions 
are all independent and there is no geometric 
interpretation of the items. 


Key Results


As a generalization of bin packing, vector bin
packing is clearly NP-hard in the strong sense,
and so most of the research on this problem has
been directed toward the study of approximation
algorithms for it. This will be the primary topic
in this entry. Most of the theoretical results
concerning these algorithms can be expressed in
terms of asymptotic worst-case ratios. For a given
algorithm $A$ and a list of items $L$, let $A(L)$ denote
the number of bins used by $A$ for $L$. Let $\mathrm{OPT}(L)$
denote the optimal number of bins for list $L$.
We define the asymptotic worst-case ratio $R_A^\infty(d)$
for algorithm $A$ on d-dimensional instances as
follows.

$$R_A^N(d) = \max\left\{ \frac{A(L)}{\mathrm{OPT}(L)} : L \text{ is a list of } d\text{-dimensional items with } \mathrm{OPT}(L) = N \right\}$$

$$R_A^\infty(d) = \limsup_{N \to \infty} R_A^N(d)$$


Generalizations of Classical Bin Packing 
Algorithms 


Generalizing First Fit and First Fit Decreasing 
Several classic one-dimensional bin packing al- 
gorithms have been generalized to vector bin 
packing. Imagine we have a potentially infinite
sequence of empty bins $B_1, B_2, \ldots$, and let $X_{h,j}$
denote the total amount of resource $j$ used by the
items currently assigned to $B_h$. In the generalized
“First Fit” algorithm, the first item goes in bin
$B_1$, and thereafter each item goes into the
lowest-index bin into which it can be legally placed,
subject to the resource constraints. In generalized
“Best Fit,” each item is assigned to a bin with the
maximum value of $\sum_{j=1}^{d} X_{h,j}$ among those to
which it can legally be added, ties broken in favor
of the smallest index $h$.
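
The two rules can be sketched in a few lines of Python. This is a straightforward quadratic-time illustration rather than a tuned implementation; items are assumed to be tuples of floats in [0, 1], and the names `fits`, `first_fit` and `best_fit` as well as the tolerance `EPS` are choices made here for illustration.

```python
EPS = 1e-9  # tolerance for floating-point sums (an implementation choice)


def fits(load, item):
    """True if adding `item` keeps every resource total at most 1."""
    return all(l + x <= 1.0 + EPS for l, x in zip(load, item))


def _pack(items, choose):
    loads, packing = [], []            # per-bin resource totals and contents
    for item in items:
        candidates = [h for h, load in enumerate(loads) if fits(load, item)]
        if candidates:
            h = choose(candidates, loads)
        else:                          # no open bin can take the item
            loads.append([0.0] * len(item))
            packing.append([])
            h = len(loads) - 1
        loads[h] = [l + x for l, x in zip(loads[h], item)]
        packing[h].append(item)
    return packing


def first_fit(items):
    """Place each item in the lowest-index bin in which it fits."""
    return _pack(items, lambda cands, loads: cands[0])


def best_fit(items):
    """Place each item in a feasible bin with the largest total load;
    max() returns the first maximum, so ties go to the smallest index."""
    return _pack(items, lambda cands, loads: max(cands, key=lambda h: sum(loads[h])))


if __name__ == "__main__":
    L = [(0.6, 0.2), (0.5, 0.5), (0.3, 0.7), (0.4, 0.1)]
    print(first_fit(L))
    print(best_fit(L))
```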

As in the one-dimensional case, a plausible
way to improve the above two online algorithms
is to first reorder the list in decreasing order, and
then apply the packing algorithm. Now, however,
there are a variety of ways to define “decreasing
order,” each leading to different algorithms. For
example, in FFDmax items are ordered by
nonincreasing value of $\max_{j=1}^{d} x_{i,j}$ and then FF is
applied. Similarly, in FFDsum, the items are
ordered by nonincreasing value of $\sum_{j=1}^{d} x_{i,j}$ and,
in FFDprod, they are ordered by nonincreasing
value of $\prod_{j=1}^{d} x_{i,j}$. In FFDlex, they are ordered
so that $x_i$ precedes $x_{i'}$ only if either $x_{i,j} = x_{i',j}$,
$1 \le j \le d$, or there is a $j^* \le d$ such that
$x_{i,j} = x_{i',j}$, $1 \le j < j^*$, and $x_{i',j^*} < x_{i,j^*}$.
The algorithms BFDmax, BFDsum, BFDprod,
and BFDlex are defined analogously, with Best
Fit being used to pack the reordered list instead
of First Fit.
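
The four orderings differ only in the sort key, as the following Python sketch illustrates. The dictionary `ORDER_KEYS` and the function `decreasing_order` are illustrative names; the reordered list would then be handed to a First Fit or Best Fit routine.

```python
from math import prod

# Sort key for each notion of "decreasing order" described above.
ORDER_KEYS = {
    "max": max,      # FFDmax / BFDmax: largest coordinate
    "sum": sum,      # FFDsum / BFDsum: sum of coordinates
    "prod": prod,    # FFDprod / BFDprod: product of coordinates
    "lex": tuple,    # FFDlex / BFDlex: lexicographic comparison
}


def decreasing_order(items, rule):
    """Return the items sorted in nonincreasing order under `rule`.

    sorted() is stable, so items with equal keys keep their input order.
    """
    return sorted(items, key=ORDER_KEYS[rule], reverse=True)


if __name__ == "__main__":
    L = [(0.2, 0.9), (0.7, 0.6), (0.5, 0.5)]
    for rule in ORDER_KEYS:
        print(rule, decreasing_order(L, rule))
```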

Call an algorithm “reasonable” if it produces
packings in which no two bins can be combined,
that is, are such that all the items contained
in the two would collectively fit together in a
single bin [9]. All of the above algorithms are
easily seen to be reasonable, and, indeed, any
vector bin packing algorithm has a “reasonable”
counterpart that uses no more bins and spends at
most $O(n^2)$ additional time (in a final pass that
combines legally combinable pairs of bins as long
as such pairs exist). A general upper bound on
asymptotic worst-case behavior is the following.


Theorem 1 ([9]) If $A$ is a reasonable vector bin
packing algorithm, then for all $d \ge 1$,

$$R_A^\infty(d) \le d + 1.$$

Unfortunately, none of the above algorithms are
much better.

Theorem 2 ([9, 11]) For each of the 10
algorithms defined above and all $d \ge 1$,

$$R_A^\infty(d) \ge d.$$

Tighter bounds have been proved for two of the
algorithms.

Theorem 3 ([6]) For all $d \ge 1$,

$$R_{FF}^\infty(d) = d + \frac{7}{10}.$$

Theorem 4 ([6]) For all $d \ge 1$,

$$d + \frac{d-1}{d(d+1)} \le R_{FFDmax}^\infty(d) \le d + \frac{1}{3}.$$

Note that the classic one-dimensional bin
packing results of [8] yield $R_{FF}^\infty(1) = 17/10$
(the precise specialization of Theorem 3) and
$R_{FFD}^\infty(1) = 11/9$ (a tighter result than the
specialization of Theorem 4). Matching upper
and lower bounds are not known for $R_{FFDmax}^\infty(d)$
for any $d > 1$. In special cases, however, the
lower bounds can be improved. It was observed
in [6] that the lower bounds for $d \in \{2, 3\}$ could
be increased to $d + 11/60$ using ideas from [8].
And Csirik et al. [4] showed that for odd $d \ge 5$,
the lower bound of Theorem 2 could be increased
by $1/(d(d + 1)(d + 2))$.


Generalizing the de la Vega and Lueker 
Asymptotic Approximation Scheme 

In [5], de la Vega and Lueker devised an
“asymptotic polynomial-time approximation
scheme” (APTAS) for one-dimensional bin
packing, that is, a collection of polynomial-time
algorithms $A_\epsilon$ with $R_{A_\epsilon}^\infty(1) \le 1 + \epsilon$ for all $\epsilon > 0$.
In that same paper, they also showed how to
generalize the algorithms to provide a collection
of vector bin packing algorithms $B_\epsilon$ such that for
each $\epsilon$ and each integer $d \ge 1$, $R_{B_\epsilon}^\infty(d) \le d + \epsilon$.

The algorithm $B_\epsilon$ is quite simple. Divide $L$
into $d$ sublists, $L_1, L_2, \ldots, L_d$, where $L_j$
consists of all those items $x$ for which $j$ is the index
of the dimension with the largest entry in the
corresponding tuple, ties broken arbitrarily. Then
apply $A_{\epsilon/d}$ to each list $L_j$ separately, viewed
as an instance of one-dimensional bin packing
with the size of item $x_i$ being $x_{i,j}$, and output
the union of the $d$ packings. Unfortunately,
although the running times for the $B_\epsilon$’s are linear
in $dn$, they contain additive constants that are
potentially exponential in $(d/\epsilon)^2$, and so they
may not be practical for small $\epsilon$.
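
The reduction to one-dimensional packing can be sketched as follows. The partition step follows the description above; the one-dimensional packer, however, is a plain First Fit Decreasing stand-in (`ffd_1d`), swapped in here only so that the example runs, and not the approximation scheme $A_{\epsilon/d}$ of [5]. All function names are illustrative.

```python
def split_by_largest_dimension(items):
    """Sublist j holds the items whose largest entry is in dimension j
    (ties go to the smallest such index)."""
    d = len(items[0])
    sublists = [[] for _ in range(d)]
    for x in items:
        j = max(range(d), key=lambda c: x[c])
        sublists[j].append(x)
    return sublists


def ffd_1d(sizes):
    """Stand-in 1D packer (First Fit Decreasing); returns bins of indices."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    bins, loads = [], []
    for i in order:
        for b in range(len(bins)):
            if loads[b] + sizes[i] <= 1.0 + 1e-9:
                break
        else:                           # no existing bin has room
            bins.append([])
            loads.append(0.0)
            b = len(bins) - 1
        bins[b].append(i)
        loads[b] += sizes[i]
    return bins


def pack_by_dominant_dimension(items, pack_1d=ffd_1d):
    """Pack each sublist as a 1D instance using its dominant coordinate.

    Within sublist j no coordinate of an item exceeds its j-th entry, so a
    bin that is feasible in dimension j is feasible in every dimension.
    """
    bins = []
    for j, sub in enumerate(split_by_largest_dimension(items)):
        for group in pack_1d([x[j] for x in sub]):
            bins.append([sub[i] for i in group])
    return bins


if __name__ == "__main__":
    L = [(0.6, 0.1), (0.3, 0.2), (0.1, 0.8), (0.2, 0.5)]
    print(pack_by_dominant_dimension(L))
```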

In contrast, FF, FFDmax, and all their variants
mentioned in the previous section have
straightforward $O(dn^2)$ implementations, and, although
the data structures that allow them to be sped
up to $O(n \log n)$ when $d = 1$ do not extend to
higher dimensions, speedups should be possible
by using d-dimensional dynamic range searching 
procedures to identify the set of bins that can 
contain the next item to be packed [11]. 


Hardness of Approximation Results 

In [14], Yao observed that, under a standard
decision tree model of computation, any vector
bin packing algorithm $A$ that has $R_A^\infty(d) < d$ for
all $d$ cannot have $o(n \log n)$ running time. This is
not much of a constraint, however, since almost
all the algorithms that have been proposed for
vector bin packing are slower than this. For those
algorithms, a weaker bound applies. Assuming
$P \ne NP$, no polynomial-time vector bin packing
algorithm $A$ can have $R_A^\infty(d) < \sqrt{d} - \epsilon$ for all
$d$ and any $\epsilon > 0$. This follows from a
straightforward reduction of graph coloring to vector bin
packing and a result of Zuckerman [15] for the
former [3]. Under the same assumption, there can
be no APTAS for any fixed $d \ge 2$ [13].


Algorithms with $R_A^\infty(d) < d$

Chekuri and Khanna [3] devised the first
polynomial-time algorithms to guarantee
$R_A^\infty(d) < d$ for all sufficiently large $d$.

Theorem 5 ([3]) For any $\epsilon > 0$ there is a
polynomial-time algorithm $C_\epsilon$ such that

$$R_{C_\epsilon}^\infty(d) \le \epsilon \cdot d + \ln(\epsilon^{-1}) + 2.$$


The algorithm works in three phases. The first
considers the following linear program (LP). Let
$z_{i,k}$ be a decision variable with value 1 if item $x_i$
is packed in bin $k$. The LP’s constraints are

$$\sum_{k=1}^{m} z_{i,k} = 1, \qquad 1 \le i \le n \qquad (1)$$

$$\sum_{i=1}^{n} x_{i,j}\, z_{i,k} \le 1, \qquad 1 \le k \le m,\; 1 \le j \le d \qquad (2)$$

$$z_{i,k} \ge 0, \qquad 1 \le i \le n,\; 1 \le k \le m \qquad (3)$$

This is the LP relaxation of an integer program
(with $z_{i,k} \in \{0, 1\}$) which has a feasible solution
if and only if our list can be packed into $m$ bins.
The first set of constraints ensures that each item
is packed into exactly one bin. The second set
ensures that all resource constraints are satisfied
by the packing.

Let $M$ be the least value of $m$ such that
this LP is feasible. Then we clearly must have
$\mathrm{OPT}(L) \ge M$. Moreover we can in polynomial
time determine $M$ and a basic feasible solution
for the corresponding LP, by using binary search
and a polynomial time LP-solver. In this basic
feasible solution, there will be at most $n + dM$
positive variables (the number of nontrivial
constraints). Since each of the $n$ items $x_i$ by (1) must
be assigned to at least one bin, at least one of the
variables $z_{i,k}$ must be positive for each $i$,
meaning that at most $dM$ of the items can be assigned
to more than one bin. That leaves at least $n - dM$ items
assigned to exactly one bin, and consequently our
LP solution yields a feasible packing of these
items into $M \le \mathrm{OPT}(L)$ bins, which is the
output of our first phase. The remaining $dM$ or
fewer items will be packed into additional bins in
two additional phases as follows.

Let $k = \lceil 1/\epsilon \rceil$. While there are at least
$k$ unpacked items that will fit in a single bin,
find such a set and pack them all in a new bin
(Phase 2). Otherwise, find a maximum size set
of unpacked items that will fit in a bin, assign
them to a new bin, and repeat until all items
are packed (Phase 3). Note that in both of these
phases the next set of items to be packed can be
found in time $O(n^k d)$, which is polynomial for
fixed $\epsilon$. Thus the overall time for the algorithm
is itself polynomial for fixed $\epsilon$. Phase 2 creates
at most $dM/k \le \epsilon d \cdot \mathrm{OPT}(L)$ bins. Phase 3
can be interpreted as implementing the Greedy
algorithm for Set Covering, as applied to the
instance in which the elements to be covered are
the items left to be packed after Phase 2, the sets
are the collections of those items which will fit in
a bin, and no set has size exceeding $k - 1$. Thus,
by standard results about Greedy Set Covering
(see [7] for example), the number of bins added in
this phase is less than $(\ln(k - 1) + 1) \cdot \mathrm{OPT}(L) \le
(\ln(1/\epsilon) + 1) \cdot \mathrm{OPT}(L)$. Adding up the above
three terms yields the claimed theorem.
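
Phases 2 and 3 can be sketched by brute-force enumeration of item subsets, which is where the $O(n^k d)$ search cost comes from. The sketch below covers only these two phases and assumes the leftover items from the LP phase are already given; the names `fits`, `find_fitting_subset` and `pack_leftovers` are illustrative and not taken from [3].

```python
from itertools import combinations


def fits(group):
    """True if the items in `group` fit together in one bin."""
    d = len(group[0])
    return all(sum(x[j] for x in group) <= 1.0 + 1e-9 for j in range(d))


def find_fitting_subset(items, size):
    """Return indices of some `size` items that fit in one bin, or None."""
    for combo in combinations(range(len(items)), size):
        if fits([items[i] for i in combo]):
            return combo
    return None


def pack_leftovers(items, k):
    """Phases 2 and 3 applied to the items left over by the LP phase."""
    remaining, bins = list(items), []

    def open_bin(combo):
        nonlocal remaining
        bins.append([remaining[i] for i in combo])
        remaining = [x for i, x in enumerate(remaining) if i not in combo]

    # Phase 2: while some k unpacked items fit together, pack them.
    while len(remaining) >= k:
        combo = find_fitting_subset(remaining, k)
        if combo is None:
            break
        open_bin(combo)

    # Phase 3: greedily pack a maximum-size fitting set (size below k).
    while remaining:
        for size in range(min(max(k - 1, 1), len(remaining)), 0, -1):
            combo = find_fitting_subset(remaining, size)
            if combo is not None:      # size 1 always succeeds
                break
        open_bin(combo)
    return bins


if __name__ == "__main__":
    leftovers = [(0.6, 0.1), (0.5, 0.2), (0.3, 0.3), (0.1, 0.6)]
    print(pack_leftovers(leftovers, k=2))
```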

Note that if we set $\epsilon = 1/d$ in the above,
we get a series of algorithms $C_{1/d}$ with
$R_{C_{1/d}}^\infty(d) \le \ln(d) + 3$, where the running time
of each is polynomial in $n$, although exponential
in $d$. A slight improvement to this has recently
been obtained by Bansal et al. [1]. They devise
algorithms $D_{d,\epsilon}$ that run in polynomial time for
fixed $d$ and $\epsilon$ (although exponential in both)
that have $R_{D_{d,\epsilon}}^\infty \le \ln(d + \epsilon) + 1 + \epsilon$, which,
for $d \ge 2$, already beats $\ln(d) + 3$ when $\epsilon = 1$.


Experimental Results 


There have been several experimental studies of 
approximation algorithms for vector bin packing 
[2, 10-12]. These studies were for the most part 
limited to $d \le 10$ and $n \le 500$, which may
well make sense in the context of the proposed
applications, and used distinct sets of randomly
generated test instances. In two cases ([10] and
[12]), the algorithms were compared using
objective functions other than the number of bins
packed. Nevertheless, certain common conclu- 
sions emerge. The FFD algorithms in particular 
yielded substantially better packings than worst- 
case analysis suggests. Both [11] and [12] sug- 
gest, however, that a different class of algorithms, 
ones that attempt to keep the bins as “balanced” 
as possible, may perform even better. An example
of such an algorithm is the “norm-based greedy”
algorithm of [11], which packs the bins one-
by-one, at each step adding to the current bin
$B_h$ that item $x_i$ that fits and yields the smallest
weighted $L_2$ norm for the resulting “gap vector”
$(X_{h,1} - x_{i,1}, X_{h,2} - x_{i,2}, \ldots, X_{h,d} - x_{i,d})$. For
more details, see [11]. As for the algorithms
described above with $R^\infty_A(d) < d$ for large $d$, the
only ones with hopes of feasible running times
are the algorithms $C_{1/d}$ from [3] when $n^d$ is
of manageable size. Limited experiments from
[12] indicate that the packings these algorithms
produce are not competitive.
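
The norm-based greedy rule described above can be illustrated with a short sketch. This is not the implementation from [11]: it assumes unit bin capacities, interprets the gap vector as the residual capacity of the current bin minus the candidate item, and uses uniform weights unless others are supplied. The function name and the small tolerance are illustrative choices.

```python
from math import sqrt


def norm_based_greedy(items, weights=None, tol=1e-12):
    """Pack d-dimensional items into unit-capacity bins, one bin at a time.

    At each step the item that fits and leaves the smallest weighted L2 norm
    of the gap vector (residual capacity minus item) is added to the bin.
    Returns a list of bins, each a list of item indices.
    """
    d = len(items[0])
    if any(c > 1.0 or c < 0.0 for p in items for c in p):
        raise ValueError("item coordinates are assumed to lie in [0, 1]")
    w = weights if weights is not None else [1.0] * d
    remaining = list(range(len(items)))
    bins = []
    while remaining:
        residual = [1.0] * d          # capacity still free in the current bin
        current = []
        while True:
            best, best_norm = None, None
            for idx in remaining:
                gap = [residual[k] - items[idx][k] for k in range(d)]
                if min(gap) < -tol:   # item does not fit in some dimension
                    continue
                norm = sqrt(sum(w[k] * gap[k] ** 2 for k in range(d)))
                if best is None or norm < best_norm:
                    best, best_norm = idx, norm
            if best is None:          # nothing fits: close the bin
                break
            current.append(best)
            remaining.remove(best)
            residual = [residual[k] - items[best][k] for k in range(d)]
        bins.append(current)
    return bins
```

On the four items (0.6, 0.2), (0.4, 0.8), (0.5, 0.5), (0.5, 0.5), the rule pairs the two complementary items in one bin and the two balanced items in another.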
Problem Definition


Vector scheduling is a multidimensional exten- 
sion of traditional machine scheduling problems. 
Whereas in traditional machine scheduling a job 
only uses a single resource, normally time, in 
vector scheduling a job uses several resources. In 
traditional scheduling, the load of a machine is 
the total resource consumption by the jobs that it 
serves. In vector scheduling, we define the load of 
a machine as the maximum resource usage over 
all resources of the jobs that are served by this 
machine. In the setting that we consider here, the 
makespan, which is normally defined to be the
time by which all jobs are completed, is equal to 
the maximum machine load. 

To define the vector scheduling problem that
we consider more formally, we let $\|x\|_\infty$ denote
the standard $\ell_\infty$-norm of the vector $x$. In the
vector scheduling problem, the input consists of
a set J of n jobs, where each job j is associated
with a d-dimensional vector $p_j \in [0, 1]^d$, and
m identical machines. The goal is to find an
assignment of the jobs to the m machines such that
$\max_{1 \le i \le m} \| \sum_{j \in M_i} p_j \|_\infty$ is minimized, where
$M_i$ denotes the set of jobs that are assigned to
machine i.
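
As a small illustration of this objective, the following sketch evaluates the makespan of a given assignment. The representation (jobs as tuples, an assignment as a list of machine indices) and the function names are assumptions made for the example.

```python
def machine_load(job_vectors):
    """l_inf norm of the coordinate-wise sum of the vectors on one machine."""
    if not job_vectors:
        return 0.0
    d = len(job_vectors[0])
    totals = [sum(p[k] for p in job_vectors) for k in range(d)]
    return max(totals)


def makespan(jobs, assignment, m):
    """Maximum machine load, where assignment[j] is the machine of job j."""
    machines = [[] for _ in range(m)]
    for j, i in enumerate(assignment):
        machines[i].append(jobs[j])
    return max(machine_load(on_machine) for on_machine in machines)
```

For example, with jobs (0.5, 0.1), (0.2, 0.7), (0.3, 0.3) and the first and third job on machine 0, machine 0 has total usage (0.8, 0.4) and machine 1 has (0.2, 0.7), so the makespan is 0.8.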

The traditional machine scheduling problem
corresponds to the case d = 1, and this is known
to be strongly NP-hard [8]. For d = 1, Graham’s
well-known list scheduling algorithm has
a performance guarantee of 2 [9], and Hochbaum
and Shmoys developed a PTAS [10]. For general
vector scheduling, Graham’s list scheduling
algorithm can be extended to the d-dimensional
case, with a performance guarantee of d + 1.
In this entry we focus on the work of Chekuri and
Khanna [5], who developed a PTAS for fixed d
and gave a polylogarithmic approximation factor
for the case of general d.
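
One way to picture a d-dimensional list scheduling rule is sketched below. The entry does not spell out which greedy rule attains the d + 1 guarantee, so the rule used here (place each job on the machine whose resulting $\ell_\infty$ load is smallest) is an assumption for illustration only, and no guarantee is claimed for it.

```python
def list_schedule(jobs, m):
    """Greedy list scheduling for vector jobs (illustrative rule).

    Jobs are processed in the given order; each goes to the machine that
    minimizes the resulting l_inf load, ties broken by machine index.
    Returns (assignment, makespan).
    """
    d = len(jobs[0])
    loads = [[0.0] * d for _ in range(m)]
    assignment = []
    for p in jobs:
        best = min(
            range(m),
            key=lambda i: max(loads[i][k] + p[k] for k in range(d)),
        )
        for k in range(d):
            loads[best][k] += p[k]
        assignment.append(best)
    return assignment, max(max(load) for load in loads)
```

With jobs (1, 0), (0, 1), (1, 0), (0, 1) on two machines, complementary jobs end up sharing a machine and the makespan is 1.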


Key Results 


Constant Dimension d 

Chekuri and Khanna [5] designed an approximation
scheme that runs in polynomial time
whenever the dimension d of the job vectors is
constant.


Theorem 1 For any $\epsilon > 0$, there exists a
$(1 + \epsilon)$-approximation algorithm that has a
running time of $O((nd/\epsilon)^s)$, where $s$ is in
$O((\ln(d/\epsilon)/\epsilon)^d)$.


The proof of this theorem is a nontrivial
generalization of the ideas that Hochbaum and
Shmoys used for the 1-dimensional case [10].
In this primal-dual approach, the main idea
is to view the scheduling problem as a bin-packing
problem in which the jobs need to
be packed into a number of bins of a certain
capacity B. If all jobs fit into m bins, then
the makespan is bounded by B. Hochbaum
and Shmoys gave an algorithm that determines
whether the jobs fit into m bins of capacity
$(1 + \epsilon)B$ or the jobs need to be packed into
at least m + 1 bins of capacity B. Chekuri
and Khanna extended this idea by using d-dimensional
bins, where the jobs assigned to
one bin should have a total resource usage of at
most B in any of the d dimensions. By standard
scaling techniques, we assume w.l.o.g. that
B = 1.

As in the 1-dimensional case, Chekuri and
Khanna divide the jobs into small and large jobs,
where the size of a job is based on the $\ell_\infty$ norm.
They first do a preprocessing step in which each
coordinate of the vectors is set to 0 whenever it
is too small compared to the maximum value of
the coordinates in the same vector. To find a
$(1 + \epsilon)$-approximation, the algorithm performs
two stages. In the first stage all large jobs are
assigned to the machines, and in the second stage
all small jobs are assigned to the machines.
Whereas in the 1-dimensional case the assignment
of the small jobs can be done greedily on
top of the large jobs, for $d \ge 2$ the interaction
between the two stages needs to be taken into
account.

To accommodate this interaction, Chekuri and
Khanna define a capacity configuration as a d-tuple
$(c_1, \ldots, c_d)$ such that each $c_k$ is an integer
between 0 and $\lceil 1/\epsilon \rceil$. A set of jobs S can be
feasibly scheduled on one machine according to
a capacity configuration $(c_1, \ldots, c_d)$ when for
any dimension k, it holds that $(\sum_{j \in S} p_j)_k \le c_k \cdot \epsilon$,
i.e., in each dimension k the resource
usage is not more than $c_k \cdot \epsilon$. The number of
distinct capacity configurations is given by
$t = (1 + \lceil 1/\epsilon \rceil)^d$.
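
The definition can be made concrete with a short sketch that enumerates all capacity configurations and checks whether a set of job vectors respects one of them. The function names and the floating-point tolerance are assumptions for the example; enumeration is only sensible for small d and moderate $\epsilon$.

```python
from itertools import product
from math import ceil


def capacity_configurations(d, eps):
    """All d-tuples with integer entries between 0 and ceil(1/eps)."""
    top = ceil(1 / eps)
    return list(product(range(top + 1), repeat=d))


def fits_configuration(job_vectors, config, eps, tol=1e-12):
    """True if the summed usage in each dimension k is at most config[k] * eps."""
    d = len(config)
    return all(
        sum(p[k] for p in job_vectors) <= config[k] * eps + tol
        for k in range(d)
    )
```

For d = 2 and $\epsilon$ = 0.5 there are $(1 + 2)^2 = 9$ configurations, matching the count t given above.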

A capacity configuration describes approximately
how a machine is filled. As there are m
machines available to process the jobs, a machine
configuration can be described by a t-tuple
$(m_1, \ldots, m_t)$, satisfying $m_i \ge 0$ and $\sum_i m_i = m$,
where $m_i$ denotes the number of machines
of the ith capacity configuration. The number
of distinct machine configurations is certainly
bounded from above by $m^t$.

After the preprocessing of the vectors and 
the splitting of the jobs in small and large, it 
needs to be determined whether all large jobs can 
be scheduled according to a machine configura- 
tion M. As a first step, all nonzero elements of 
the large vectors are rounded (down) to the be- 
gin points of geometrically increasing intervals. 
Moreover, as the vectors are in some sense large, 
not too many vectors can be scheduled on one 
machine. Therefore, using a dynamic program- 
ming approach, one can approximately determine 
whether the set of large vectors can be scheduled 
according to machine configuration M. 

When the set of large jobs is scheduled such
that a certain machine i is scheduled according
to a capacity configuration $(c_1, \ldots, c_d)$, then the
small jobs on this machine need to be scheduled
according to the empty capacity configuration,
i.e., the capacity configuration
$(1 + \lceil 1/\epsilon \rceil) \cdot (1, 1, \ldots, 1) - (c_1, \ldots, c_d)$.
Given a machine configuration M, we let $\bar{M}$ denote the corresponding
machine configuration, i.e., the one obtained by
taking the empty capacity configurations for each
of the machines in M.

To see whether the small jobs can be scheduled
according to the corresponding machine configuration $\bar{M}$
(the empty capacity configurations of M),
Chekuri and Khanna present an integer linear programming
(ILP) formulation that assigns the vectors
to the machines. Moreover, they show that solving
the LP relaxation of this ILP formulation
and distributing the fractionally assigned vectors
equally over the machines results in a solution in
which each dimension of each machine is only
overloaded by a factor of $(1 + \epsilon)$.

Once they have found a machine configuration
M according to which the large jobs can be
scheduled and a corresponding machine configuration
$\bar{M}$ (the empty capacity configurations of M)
according to which the small jobs can
be scheduled, Chekuri and Khanna have shown
that all jobs can be scheduled such that the load
of any machine does not exceed $1 + \epsilon$. If for
all machine configurations M and corresponding
machine configurations $\bar{M}$ the large jobs cannot
be scheduled according to M or the small jobs
cannot be scheduled according to $\bar{M}$, then the
vectors cannot be scheduled such that the load of
each machine is at most 1.


General Dimension d 

For the general case in which the dimension d
of the vectors is not restricted to be a constant,
Chekuri and Khanna present several approximation
algorithms. For the general case, too,
they assume that all vectors can be scheduled
such that the makespan is bounded by 1. For two
algorithms, they use as a subroutine an approximation
algorithm for finding a set of vectors S
that maximizes the volume of these vectors, i.e.,
the sum of all coordinates of all these vectors
$\sum_{j \in S} \sum_{k=1}^{d} (p_j)_k$, restricted to
$\| \sum_{j \in S} p_j \|_\infty \le 1$. This yields the following results.


Theorem 2 There exists a polynomial-time
$O(\log^2 d)$-approximation algorithm for the
vector scheduling problem.


Theorem 3 There exists an $O(\log d)$-approximation
algorithm for the vector scheduling
problem that runs in time polynomial in $n^d$.


These approximation results are good when d 
is small compared to the number of machines m. 
On the other hand, Chekuri and Khanna also give 
a randomized algorithm, which just assigns each 
job uniformly at random to one of the machines, 
obtaining a performance guarantee that is better 
when d is large compared to m. 


Theorem 4 There exists a randomized algo- 
rithm that has a performance guarantee of 
O(log dm/ log log dm) with high probability. 
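
The randomized algorithm is simple enough to sketch directly. The code below only illustrates the assignment rule stated above (each job goes to a machine chosen uniformly at random); the function name and the optional seed parameter are assumptions for reproducibility, and the sketch makes no attempt to verify the bound of Theorem 4.

```python
import random


def random_assignment(jobs, m, seed=None):
    """Assign each job independently and uniformly at random to a machine.

    Returns (assignment, makespan), where the makespan is the largest
    l_inf load over all machines.
    """
    rng = random.Random(seed)
    d = len(jobs[0])
    loads = [[0.0] * d for _ in range(m)]
    assignment = []
    for p in jobs:
        i = rng.randrange(m)
        for k in range(d):
            loads[i][k] += p[k]
        assignment.append(i)
    return assignment, max(max(load) for load in loads)
```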


Finally, there is also a hardness result for the 
vector scheduling problem. 


Theorem 5 For any constant $\rho > 1$, there is no
polynomial-time approximation algorithm with a
performance guarantee of $\rho$, unless NP = ZPP.


Extensions 

Epstein and Tassa [6, 7] extended the vector
scheduling problem to deal with more general
objective functions. Instead of defining the
load of a machine as the maximum resource
usage over all resources of the jobs that are
served by this machine, they defined the load
as the sum of the vectors $p_j$ assigned to the
machine. That is, the load itself is now also a
d-dimensional vector. Letting $\ell_i = \sum_{j \in M_i} p_j$
denote the load of machine i, in [6] they
gave PTASes for several objective functions of
the form $F(S) = f(g(\ell_1), \ldots, g(\ell_m))$. Note that
the vector scheduling problem as discussed in
this entry is equal to the case that $f = g = \max$.
In [7], they extended their results to the more
general case where the function g may vary per
machine, i.e., $F(S) = f(g_1(\ell_1), \ldots, g_m(\ell_m))$.

Bonifaci and Wiese [4] extended the vector
scheduling problem to the $\ell_p$ norm and the case
of unrelated machines. That is, a job j has a
d-dimensional resource usage $p_{ij}$ on machine i.
They considered the case in which the number of
types of machines is constant: on the same type
of machine, a certain job has the same resource
usage. Moreover, they restricted themselves to
the case of having only a constant number of
resources, that is, the vectors $p_{ij}$ are d-dimensional
for a constant d. For this setting, they developed
a PTAS.

The PTAS of Chekuri and Khanna has a running
time that is doubly exponential in d. Bansal,
Vredeveld, and Van der Zwaan [2] showed that
this double exponential dependence on d is
necessary. For $\epsilon < 1$, they showed that unless
the exponential time hypothesis fails, there is no
$(1 + \epsilon)$-approximation algorithm with running
time $\exp(o(\lfloor 1/\epsilon \rfloor^{d/3}))$. Moreover, they showed
that unless NP has subexponential algorithms, no
$(1 + \epsilon)$-approximation algorithm exists with
running time $\exp(\lfloor 1/\epsilon \rfloor^{o(d)})$. These lower bounds
even hold for the case that $\epsilon m$ more machines are
allowed, for sufficiently small $\epsilon > 0$. Moreover,
they also gave a $(1 + \epsilon)$-approximation algorithm
with running time $\exp((1/\epsilon)^{O(d \log\log d)}) + nd$,
which is the first efficient polynomial-time approximation scheme
(EPTAS) for the problem with constant d.


Open Problems


The gap between the lower bounds and upper
bounds on the running time of $(1 + \epsilon)$-approximation
algorithms has almost been
closed. The question remains whether this is
also the case when instead of the $\ell_\infty$-norm, the
$\ell_p$-norm is minimized. Furthermore, it would
be interesting to know whether one can obtain
better running times when the vectors are highly
structured. These highly structured vectors may
occur, for example, in applications of real-time
scheduling; see, e.g., [1, 3].
Problem Definition


Let G be an undirected graph. A subset C of 
vertices in G is a vertex cover for G if every edge 
in G has at least one end in C. The (parametrized) 
VERTEX COVER problem is for each given in- 
stance (G, k), where G is a graph and k > 0 is an 
integer (the parameter), to determine whether the 
graph G has a vertex cover of at most k vertices. 

The VERTEX COVER problem is one of the 
six “basic” NP-complete problems according to 
Garey and Johnson [4]. Therefore, the problem 
cannot be solved in polynomial time unless P = 
NP. However, the NP-completeness of the prob- 
lem does not obviate the need for solving it 
because of its fundamental importance and wide 
applications. One approach was initiated based
on the observation that in many applications, the
parameter k is small. Therefore, by taking
advantage of this fact, one may be able to solve
this NP-complete problem effectively and practically
for instances with a small parameter. More
specifically, algorithms with running time of the
form f(k)p(n) have been studied for VERTEX
COVER, where p(n) is a low-degree polynomial
in the number n = |G| of vertices in G and f(k) is
a function independent of n.
There has been an impressive sequence of
improved algorithms for the VERTEX COVER 
problem. A number of new techniques have 
been developed during this research, including 
kernelization, folding, and refined branch-and- 
search. In particular, the kernelization method 
is the study of polynomial time algorithms 
that can significantly reduce the instance 
size for VERTEX COVER. The following are 
some concepts related to the kernelization 
method: 


Definition 1 Two instances (G, k) and (G', k') of
VERTEX COVER are equivalent if the graph G has
a vertex cover of size $\le k$ if and only if the graph
G' has a vertex cover of size $\le k'$.


Definition 2 A kernelization algorithm for the
VERTEX COVER problem takes an instance (G,
k) of VERTEX COVER as input and produces an
equivalent instance (G', k') for the problem, such
that $|G'| \le |G|$ and $k' \le k$.


The kernelization method has been used extensively
in conjunction with other techniques in
the development of algorithms for the VERTEX
COVER problem. Two major issues in the study
of the kernelization method are (1) effective
reductions of instance size; and (2) the efficiency of
kernelization algorithms.


Key Results 


A number of kernelization techniques are dis- 
cussed and studied in the current paper. 


Preprocessing Based on Vertex Degrees 

Let (G, k) be an instance of VERTEX COVER. 
Let v be a vertex of degree larger than k in G. 
If a vertex cover C does not include v, then C 
must contain all neighbors of v, which implies 
that C contains more than k vertices. Therefore, 
in order to find a vertex cover of no more than k 
vertices, one must include v in the vertex cover, 
and recursively look for a vertex cover of k — 1 
vertices in the remaining graph. 
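
The high-degree rule just described can be sketched as follows. The graph representation (a dictionary mapping each vertex to the set of its neighbors) and the function name are assumptions for the example; a returned budget below 0 signals that no vertex cover of the requested size exists.

```python
def high_degree_rule(adj, k):
    """Repeatedly force vertices of degree larger than k into the cover.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Returns (reduced_adj, remaining_k, forced_vertices).
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    forced = set()
    while k >= 0:
        v = next((u for u in adj if len(adj[u]) > k), None)
        if v is None:
            break
        for u in adj.pop(v):      # delete v and its incident edges
            adj[u].discard(v)
        forced.add(v)
        k -= 1                    # v takes one slot in the cover
    return adj, k, forced
```

On a star with four leaves and k = 2, the center has degree 4 > 2, so it is forced into the cover and the remaining instance has no edges and budget 1.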
The following fact was observed on vertices of
degree less than 3. 


Theorem 1 There is a linear time kernelization
algorithm that on each instance (G, k) of vertex
cover, where the graph G contains a vertex of
degree less than 3, produces an equivalent instance
(G', k') such that |G'| < |G| and/or k' < k.


Therefore, vertices of high degree (i.e., degree 
> k) and low degree (i.e., degree < 3) can always 
be handled efficiently before any more time- 
consuming process. 


Nemhauser-Trotter Theorem 

Let G be a graph with vertices $v_1, v_2, \ldots, v_n$.
Consider the following integer programming
problem:


(IP) Minimize $x_1 + x_2 + \cdots + x_n$

Subject to $x_i + x_j \ge 1$ for each edge $[v_i, v_j]$ in G

$x_i \in \{0, 1\}$, $1 \le i \le n$


It is easy to see that there is a one-to-one
correspondence between the set of feasible solutions
to (IP) and the set of vertex covers of the graph
G. A natural LP-relaxation (LP) of the problem
(IP) is to replace the restrictions $x_i \in \{0, 1\}$ with
$x_i \ge 0$ for all i. Note that the resulting linear
programming problem (LP) can now be solved in
polynomial time.

Let $\sigma = \{x_1^0, \ldots, x_n^0\}$ be an optimal solution
to the linear programming problem (LP). The
vertices in the graph G can be partitioned into
three disjoint parts according to $\sigma$:


$I_0 = \{v_i \mid x_i^0 < 0.5\}$,
$C_0 = \{v_i \mid x_i^0 > 0.5\}$, and
$V_0 = \{v_i \mid x_i^0 = 0.5\}$


The following nice property of the above vertex 
partition of the graph G was first observed by 
Nemhauser and Trotter [5]. 


Theorem 2 (Nemhauser-Trotter) Let $G[V_0]$ be
the subgraph of G induced by the vertex set $V_0$.
Then (1) every vertex cover of $G[V_0]$ contains
at least $|V_0|/2$ vertices; and (2) every minimum
vertex cover of $G[V_0]$ plus the vertex set $C_0$ makes
a minimum vertex cover of the graph G.


Let k be any integer, and let $G' = G[V_0]$ and
$k' = k - |C_0|$. As first noted in [3], by
Theorem 2, the instances (G, k) and (G', k') are
equivalent, and $|G'| \le 2k'$ is a necessary
condition for the graph G' to have a vertex cover
of size k'. This observation gives the following
kernelization result.


Theorem 3 There is a polynomial-time
algorithm that for a given instance (G, k)
for the vertex cover problem, constructs an
equivalent instance (G', k') such that $k' \le k$ and
$|G'| \le 2k'$.


A Faster Nemhauser-Trotter Construction 
Theorem 3 suggests a polynomial-time
kernelization algorithm for VERTEX COVER. The
algorithm involves solving the linear
programming problem (LP) and partitioning
the graph vertices into the sets $I_0$, $C_0$, and $V_0$.
Solving the linear programming problem (LP)
can be done in polynomial time but is rather
costly, in particular when the input graph G is
dense. Alternatively, Nemhauser and Trotter [5]
suggested the following algorithm without using
linear programming. Let G be the input graph
with vertex set $\{v_1, \ldots, v_n\}$.


1. construct a bipartite graph B with vertex set
$\{v_1^L, \ldots, v_n^L, v_1^R, \ldots, v_n^R\}$ such that $[v_i^L, v_j^R]$
is an edge in B if and only if $[v_i, v_j]$ is an edge
in G;

2. find a minimum vertex cover $C_B$ for B;

3. $I_0' = \{v_i \mid$ neither $v_i^L$ nor $v_i^R$ is in $C_B\}$;

$C_0' = \{v_i \mid$ both $v_i^L$ and $v_i^R$ are in $C_B\}$;

$V_0' = \{v_i \mid$ exactly one of $v_i^L$ and $v_i^R$ is in $C_B\}$


It can be proved [5] (see also [2]) that
Theorem 2 still holds true when the sets $C_0$ and
$V_0$ in the theorem are replaced by the sets $C_0'$
and $V_0'$, respectively, constructed in the above
algorithm.
The advantage of this approach is that the sets
$C_0'$ and $V_0'$ can be constructed in time $O(m\sqrt{n})$
because the minimum vertex cover $C_B$ for the
bipartite graph B can be constructed via a maximum
matching of B, which can be constructed
in time $O(m\sqrt{n})$ using Dinic’s maximum flow
algorithm, which is in general faster than solving
the linear programming problem (LP).
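
The three-step construction can be sketched as follows. For brevity the sketch finds the maximum matching of B with simple augmenting paths rather than Dinic's algorithm, so it does not achieve the $O(m\sqrt{n})$ bound; the minimum vertex cover is then derived from the matching by König's theorem. Vertices are assumed to be numbered 0 to n - 1, and the function name is illustrative.

```python
def nt_partition(n, edges):
    """Partition vertices 0..n-1 of G into (I, C, V) via the bipartite graph B.

    B has a left copy and a right copy of every vertex; left i is joined to
    right j exactly when {i, j} is an edge of G.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    match_l = [None] * n   # right partner of each left vertex
    match_r = [None] * n   # left partner of each right vertex

    def augment(i, seen):
        # Try to find an augmenting path starting at left vertex i.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_r[j] is None or augment(match_r[j], seen):
                match_l[i], match_r[j] = j, i
                return True
        return False

    for i in range(n):
        augment(i, set())

    # Koenig's theorem: explore alternating paths from unmatched left vertices.
    reached_l = {i for i in range(n) if match_l[i] is None}
    reached_r = set()
    stack = list(reached_l)
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in reached_r:
                reached_r.add(j)
                nxt = match_r[j]
                if nxt is not None and nxt not in reached_l:
                    reached_l.add(nxt)
                    stack.append(nxt)
    cover_l = set(range(n)) - reached_l   # left copies in the cover C_B
    cover_r = reached_r                   # right copies in the cover C_B

    part_i = {v for v in range(n) if v not in cover_l and v not in cover_r}
    part_c = {v for v in range(n) if v in cover_l and v in cover_r}
    part_v = set(range(n)) - part_i - part_c
    return part_i, part_c, part_v
```

On a star the center lands in the second set and the leaves in the first, so the kernel is empty; on a triangle every vertex lands in the third set, mirroring the all-0.5 optimal solution of (LP).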


Crown Reduction 

For a set S of vertices in a graph G, denote
by N(S) the set of vertices that are not in S
but adjacent to some vertices in S. A crown in
a graph G is a pair (I, H) of subsets of vertices
in G satisfying the following conditions:
(1) $I \ne \emptyset$ is an independent set, and H = N(I);
and (2) there is a matching M on the edges
connecting I and H such that all vertices in H are
matched in M.

It is quite easy to see that for a given crown
(I, H), there is a minimum vertex cover that
includes all vertices in H and excludes all vertices
in I. Let G' be the graph obtained by removing
all vertices in I and H from G. Then,
the instances (G, k) and (G', k') are equivalent,
where k' = k - |H|. Therefore, identification of
crowns in a graph provides an effective way for
kernelization.

Let G be the input graph. The following algo- 
rithm is proposed. 


1. construct a maximal matching $M_1$ in G; let O
be the set of vertices unmatched in $M_1$;

2. construct a maximum matching $M_2$ of the
edges between O and N(O); i = 0; let $I_0$ be
the set of vertices in O that are unmatched in
$M_2$;

3. repeat until $I_i = I_{i-1}$: {$H_i = N(I_i)$;
$I_{i+1} = I_i \cup N_{M_2}(H_i)$; i = i + 1;} (where
$N_{M_2}(H_i)$ is the set of vertices in O that match
the vertices in $H_i$ in the matching $M_2$)

4. $I = I_i$; $H = N(I_i)$; output (I, H).


Theorem 4 (1) if the set $I_0$ is not empty, then
the above algorithm constructs a crown (I, H);
(2) if both $|M_1|$ and $|M_2|$ are bounded by k, and
$I_0 = \emptyset$, then the graph G has at most 3k vertices.
According to Theorem 4, the above algorithm
on an instance (G, k) of VERTEX COVER either
(1) finds a matching of size larger than k, which
implies that there is no vertex cover of k vertices
in the graph G; or (2) constructs a crown (I,
H), which will reduce the size of the instance;
or (3) in case neither (1) nor (2) holds,
concludes that the graph G contains
at most 3k vertices. Therefore, repeatedly
applying the algorithm either derives a direct
solution to the given instance, or constructs an
equivalent instance (G', k') with $k' \le k$ and
$|G'| \le 3k'$.
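
The four steps of the crown construction can be sketched as below. The sketch uses a greedy maximal matching for step 1 and simple augmenting paths for the maximum matching in step 2; the graph representation (a dictionary of neighbor sets) and the function name are assumptions for the example, and the checks of the matching sizes against k are left to the caller.

```python
def find_crown(adj):
    """Return a pair (I, H) following the four steps above, or (set(), set()).

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    # Step 1: greedy maximal matching; O collects the unmatched vertices.
    matched = set()
    for u in adj:
        if u in matched:
            continue
        for v in adj[u]:
            if v not in matched:
                matched.update((u, v))
                break
    outsiders = [v for v in adj if v not in matched]

    # Step 2: maximum matching between O and N(O) by augmenting paths.
    partner = {}   # maps a vertex of N(O) to its partner in O

    def augment(o, seen):
        for h in adj[o]:
            if h in seen:
                continue
            seen.add(h)
            if h not in partner or augment(partner[h], seen):
                partner[h] = o
                return True
        return False

    # Vertices of O for which no augmenting path exists stay unmatched.
    crown_i = {o for o in outsiders if not augment(o, set())}
    if not crown_i:
        return set(), set()

    # Steps 3 and 4: grow I until it stabilizes, then output (I, N(I)).
    while True:
        head = set().union(*(adj[v] for v in crown_i))
        grown = crown_i | {partner[h] for h in head if h in partner}
        if grown == crown_i:
            return crown_i, head
        crown_i = grown
```

On a star with three leaves, one leaf is matched to the center in step 1; the other two leaves form I and the center forms H, so removing the crown leaves a single isolated vertex.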


Applications 


The research of the current paper was directly
motivated by the authors’ research in bioinformatics.
It is shown that for many computational
biological problems, such as the construction of 
phylogenetic trees, phenotype identification, and 
analysis of microarray data, preprocessing based 
on the kernelization techniques has been very 
effective. 


Experimental Results 


Experimental results are given for handling
graphs obtained from the study of phylogenetic
trees based on protein domains, and from the
analysis of microarray data. The results show that
in most cases the best way to kernelize is to start
by handling vertices of high and low degrees (i.e.,
vertices of degree larger than k or smaller than 3)
before attempting any of the other kernelization
techniques. Sometimes, kernelization based
on the Nemhauser-Trotter Theorem can solve the
problem without any further branching. It is
also observed that sometimes, particularly on
dense graphs, kernelization techniques based on
the Nemhauser-Trotter Theorem are rather
time-consuming but do not reduce the instance size by
much. On the other hand, the techniques based on
high-degree vertices and crown reduction seem
to work better.
Data Sets


The experiments were performed on graphs
obtained from data from NCBI and SWISS-PROT,
well-known open-source repositories of
biological data.
Problem Definition


The VERTEX COVER problem is one of the 
six “basic” NP-complete problems according 
to Garey and Johnson [7]. Therefore, the problem 
cannot be solved in polynomial time unless 
P = NP. However, the NP-completeness of the
problem does not obviate the need for solving it 
because of its fundamental importance and wide 
applications. 

One approach is to develop parameterized
algorithms for the problem, with the computational
complexity of the algorithms being measured in
terms of both input size and a parameter value.
This approach was initiated based on the observation
that in many applications, the instances of the
problem are associated with a small parameter.
Therefore, by taking advantage of the small
parameters, one may be able to solve this
NP-complete problem effectively and practically.

The problem is formally defined as follows.
Let G be an (undirected) graph. A subset C of
vertices in G is a vertex cover for G if every
edge in G has at least one end in C. An instance
of the (parameterized) VERTEX COVER problem
consists of a pair (G, k), where G is a graph and
k is an integer (the parameter), and the task is to
determine whether the graph G has a vertex cover of
k vertices. The goal is to develop parameterized
algorithms of running time O(f(k)p(n)) for the
VERTEX COVER problem, where p(n) is a
low-degree polynomial of the input size n, and f(k) is
the non-polynomial part that is a function of the
parameter k but independent of the input size n. It
is desirable that the non-polynomial function
f(k) be as small as possible. Such an algorithm
would become “practically effective” when the
parameter value k is small. It should be pointed
out that unless an unlikely consequence occurs in
complexity theory, the function f(k) is at least an
exponential function of the parameter k [8].


Key Results


A number of techniques have been proposed in
the development of parameterized algorithms for
the VERTEX COVER problem.
Kernelization

Suppose (G, k) is an instance for the VERTEX
COVER problem, where G is a graph and k is the
parameter. The kernelization operation applies
a polynomial time preprocessing on the instance
(G, k) to construct another instance (G', k'), where
G' is a smaller graph (the kernel) and $k' \le k$,
such that G' has a vertex cover of k' vertices if
and only if G has a vertex cover of k vertices.
Based on a classical result by Nemhauser and
Trotter [9], the following kernelization result was
derived.


Theorem 1 There is an algorithm of running
time $O(kn + k^3)$ that for a given instance (G, k)
for the VERTEX COVER problem, constructs
another instance (G', k') for the problem, where the
graph G' contains at most 2k vertices and $k' \le k$,
such that the graph G has a vertex cover of k
vertices if and only if the graph G' has a vertex
cover of k' vertices.


Therefore, kernelization provides an efficient 
preprocessing for the VERTEX COVER problem, 
which allows one to concentrate on graphs of 
small size (i.e., graphs whose size is only related 
to k). 


Folding 

Suppose v is a degree-2 vertex in a graph G with
two neighbors u and w such that u and w are not
adjacent to each other. Construct a new graph G'
as follows: remove the vertices v, u, and w and
introduce a new vertex $v_0$ that is adjacent to all
remaining neighbors of the vertices u and w in G.
The graph G' is said to be obtained from the
graph G by folding the vertex v. The following
result was derived.


Theorem 2 Let G’ be a graph obtained by fold- 
ing a degree-2 vertex v in a graph G, where the 
two neighbors of v are not adjacent to each other. 
Then the graph G has a vertex cover of k vertices 
if and only if the graph G' has a vertex cover of 
k — 1 vertices. 
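
The folding operation and the statement of Theorem 2 can be checked on small graphs with the following sketch. The graph representation (a dictionary of neighbor sets), the function names, and the brute-force cover routine are assumptions for illustration; the brute-force search is only meant for tiny test graphs.

```python
from itertools import combinations


def fold_degree2(adj, v, new_vertex):
    """Fold the degree-2 vertex v whose two neighbors are non-adjacent.

    Removes v and its neighbors u, w and adds new_vertex, adjacent to all
    remaining neighbors of u and w. Returns the new graph.
    """
    if len(adj[v]) != 2:
        raise ValueError("v must have degree 2")
    u, w = adj[v]
    if w in adj[u]:
        raise ValueError("the neighbors of v must not be adjacent")
    gone = {u, v, w}
    merged = (adj[u] | adj[w]) - gone
    folded = {x: set(ns) - gone for x, ns in adj.items() if x not in gone}
    folded[new_vertex] = set(merged)
    for x in merged:
        folded[x].add(new_vertex)
    return folded


def min_vertex_cover_size(adj):
    """Brute-force minimum vertex cover size (exponential time)."""
    vertices = list(adj)
    edges = {frozenset((a, b)) for a in adj for b in adj[a]}
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(edge & chosen for edge in edges):
                return size
```

Folding a vertex of the 5-cycle yields a triangle: the minimum vertex cover size drops from 3 to 2, in line with Theorem 2.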


A folding operation allows one to decrease
the value of the parameter k without branching.
Therefore, folding operations are regarded as
very efficient in the development of exponential
time algorithms for the VERTEX COVER
problem. Recently, the folding operation has
been generalized to apply to a set of more than one
vertex in a graph [6].


Branch and Search 

A main technique is the branch and search
method that has been extensively used in the
development of algorithms for the VERTEX
COVER problem (and for many other NP-hard
problems). The method can be described
as follows. Let (G, k) be an instance of the
VERTEX COVER problem. Suppose that somehow
a collection $\{C_1, \ldots, C_b\}$ of vertex subsets in
the graph G is identified, where for each i, the
subset $C_i$ has $c_i$ vertices, such that if the graph
G contains a vertex cover of k vertices, then
for at least one $C_i$ of the vertex subsets in the
collection, there is a vertex cover of k vertices
for G that contains all vertices in $C_i$. Then
a collection of (smaller) instances $(G_i, k_i)$ can be
constructed, where $1 \le i \le b$, $k_i = k - c_i$, and
$G_i$ is obtained from G by removing all vertices
in $C_i$. Note that the original graph G has a vertex
cover of k vertices if and only if for one $(G_i, k_i)$
of the smaller instances the graph $G_i$ has
a vertex cover of $k_i$ vertices. Therefore, the
process can now be branched into b sub-processes,
each of which recursively searches a smaller
instance $(G_i, k_i)$ for a vertex cover of $k_i$ vertices in the
graph $G_i$.

Let T(k) be the number of leaves in the
search tree for the above branch and search
process on the instance (G, k). Then the above
branch operation gives the following recurrence
relation:


$T(k) = T(k - c_1) + T(k - c_2) + \cdots + T(k - c_b)$


To solve this recurrence relation, let $T(k) = x^k$
so that the above recurrence relation becomes


$x^k = x^{k-c_1} + x^{k-c_2} + \cdots + x^{k-c_b}$


It can be proved [3] that the above polynomial
equation has a unique root $x_0$ larger than 1.
From this, one gets $T(k) = x_0^k$, which, up to
a polynomial factor, gives an upper bound on the
running time of the branch and search process on
the instance (G, k).
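
The root $x_0$ can be computed numerically. The sketch below divides the equation by $x^k$, which gives $1 = x^{-c_1} + \cdots + x^{-c_b}$, and locates the root by bisection; it assumes at least two branches with every $c_i \ge 1$, and the function name is illustrative.

```python
def branching_number(cs, tol=1e-12):
    """Root x0 >= 1 of x^k = sum_i x^(k - c_i), i.e. of sum_i x^(-c_i) = 1.

    cs lists the sizes c_1, ..., c_b of the vertex subsets; the left-hand
    side is decreasing in x, so bisection on [1, b] finds the root.
    """
    if len(cs) < 2 or min(cs) < 1:
        raise ValueError("need at least two branches, each with c_i >= 1")

    def excess(x):
        return sum(x ** (-c) for c in cs) - 1.0

    lo, hi = 1.0, float(len(cs))   # excess(lo) >= 0 and excess(hi) <= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For two branches that each remove one vertex the root is 2, and for subset sizes 1 and 2 it is the golden ratio, about 1.618.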

The simplest case is that a vertex v of degree
d > 0 in the graph G is picked. Let $w_1, \ldots, w_d$
be the neighbors of v. Then either v is contained
in a vertex cover C of k vertices, or, if v is
not contained in C, then all neighbors $w_1, \ldots, w_d$
of v must be contained in C. Therefore, one
obtains a collection of two subsets $C_1 = \{v\}$ and
$C_2 = \{w_1, \ldots, w_d\}$, on which the branch and
search process can be applied.
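
This simplest branching rule already gives a complete, if slow, algorithm. The sketch below is a direct illustration of the rule, not the refined algorithm of [3]; the graph representation (a dictionary of neighbor sets) and the function names are assumptions for the example.

```python
def remove_vertices(adj, removed):
    """Copy of the graph with the given vertices (and their edges) deleted."""
    return {u: ns - removed for u, ns in adj.items() if u not in removed}


def has_vertex_cover(adj, k):
    """Decide whether the graph has a vertex cover of at most k vertices."""
    if k < 0:
        return False
    v = next((u for u in adj if adj[u]), None)   # any vertex of degree > 0
    if v is None:
        return True                              # no edges left to cover
    # Branch 1: put v into the cover.
    if has_vertex_cover(remove_vertices(adj, {v}), k - 1):
        return True
    # Branch 2: v is not in the cover, so all its neighbors must be.
    neighbors = set(adj[v])
    return has_vertex_cover(remove_vertices(adj, neighbors), k - len(neighbors))
```

On the 5-cycle the procedure rejects k = 2 and accepts k = 3.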

The efficiency of a branch and search operation
depends on how effectively one can identify
the collection of the vertex subsets. Intuitively,
the larger the sizes of the vertex subsets, the more
efficient the operation is. Much effort has been
made in the development of VERTEX COVER
algorithms to achieve larger vertex subsets.
Improvements on the size of the vertex subsets
have involved very complicated and
tedious analysis and enumerations of combinatorial
structures of graphs. The current paper [3]
achieved a collection of two subsets $C_1$ and $C_2$
of sizes $c_1 = 1$ and $c_2 = 6$, respectively, and
other collections of vertex subsets that are at
least as good as this (the techniques of kernelization
and vertex folding played important
roles in achieving these collections). This gives
the following algorithm for the VERTEX COVER
problem.


Theorem 3 The VERTEX COVER problem can be
solved in time $O(kn + 1.2852^k)$.


Very recently, a further improvement over
Theorem 3 has been achieved that gives an algorithm
of running time $O(kn + 1.2738^k)$ for the
VERTEX COVER problem [4].


Applications 


The study of parameterized algorithms for the
VERTEX COVER problem was motivated by ETH
Zürich’s DARWIN project in computational
biology and computational biochemistry
(see, e.g., [10, 11]). A number of computational
problems in the project, such as multiple
sequence alignments [10] and biological conflict
resolving [11], can be formulated as the
VERTEX COVER problem in which the parameter
value is in general not larger than 100. Therefore,
an algorithm of running time $O(kn + 1.2852^k)$
for the problem becomes very effective and
practical in solving these problems.

The parameterized algorithm given in Theo- 
rem 3 has also induced a faster algorithm for 
another important NP-hard problem, the MAX- 
IMUM INDEPENDENT SET problem on sparse 
graphs [3]. 


Open Problems 


The main open problem in this line of research
is how far one can go along this direction. More
specifically, how small can the constant c > 1
be for the VERTEX COVER problem to have an
algorithm of running time $O(c^k n^{O(1)})$? With
further, more careful analysis of graph combinatorial
structures, it seems possible to slightly
improve the current best upper bound [4] for the
problem. Some new techniques developed more
recently [6] also seem very promising for improving
the upper bound. On the other hand, it is known
that the constant c cannot be arbitrarily close to
1 unless certain unlikely consequences occur in
complexity theory [8].


Experimental Results 


A number of research groups have implemented
some of the ideas of the algorithm in Theorem 3
or its variations, including the Parallel Bioinformatics
project at Carleton University [2], the
High Performance Computing project at the University
of Tennessee [1], and the DARWIN project
at ETH Zürich [10, 11]. As reported in [5], these
implementations showed that this algorithm and
the related techniques are “quite practical” for the
VERTEX COVER problem with parameter value k
up to around 400.
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Problem Definition 


The whole process of designing, analyzing, im- 
plementing, tuning, debugging and experimen- 
tally evaluating algorithms can be referred to as 
Algorithm Engineering. Algorithm Engineering 
views algorithmics also as an engineering dis- 
cipline rather than a purely mathematical disci- 
pline. Implementing algorithms and engineering 
algorithmic codes is a key step for the transfer 
of algorithmic technology, which often requires 
a high level of expertise, to different and broader 
communities, and for its effective deployment in 
industry and real applications. 

Experiments can help measure practical indi- 
cators, such as implementation constant factors, 
real-life bottlenecks, locality of references, cache 
effects and communication complexity, that may 
be extremely difficult to predict theoretically. 
Unfortunately, as in any empirical science, it 
may sometimes be difficult to draw general 
conclusions about algorithms from experiments. To 
this end, some researchers have proposed 
accurate and comprehensive guidelines on different 
aspects of the empirical evaluation of algorithms, 
drawn from their own experience in the field 
(see, for example, [1, 15, 16, 20]). The interested 
reader may find in [18] an annotated bibliography 
of experimental algorithmics sources addressing 
methodology, tools and techniques. 

The process of implementing, debugging, test- 
ing, engineering and experimentally analyzing 
algorithmic codes is a complex and delicate task, 
fraught with many difficulties and pitfalls. In this 
context, traditional low-level textual debuggers 
or industrial-strength development environments 
can be of little help for algorithm engineers, who 
are mainly interested in high-level algorithmic 
ideas rather than in the language-dependent and 
platform-dependent details of actual implementations. 
Algorithm visualization environments provide tools 
for abstracting irrelevant program details and 
for conveying into still or animated images the 
high-level algorithmic behavior of a piece of 
software. 

Among the tools useful in algorithm 
engineering, visualization systems exploit 
interactive graphics to enhance the development, 
presentation, and understanding of computer 
programs [27]. Thanks to the capability of 
conveying a large amount of information in 
a compact form that is easily perceivable by 
a human observer, visualization systems can 
help developers gain insight about algorithms, 
test implementation weaknesses, and tune 
suitable heuristics for improving the practical 
performance of algorithmic codes. Some 
examples of this kind of usage are described 
in [12]. 


Key Results 


Systems for algorithm visualization have ma- 
tured significantly since the rise of modern com- 
puter graphic interfaces and dozens of algorithm 
visualization systems have been developed in 
the last two decades [2, 3, 4, 5, 6, 8, 9, 10, 
13, 17, 25, 26, 29]. For a comprehensive survey, 
the interested reader is referred to [11, 27] and 
to the references therein. The remainder of this 
entry discusses the features of algorithm 
visualization systems that appear to be most 
appealing for their deployment in algorithm 
engineering. 


Visualization Techniques for Algorithm Engineering 


Critical Issues 

From the viewpoint of the algorithm developer, 
it is desirable to rely on systems that offer visu- 
alizations at a high level of abstraction. Namely, 
one would be more interested in visualizing the 
behavior of a complex data structure, such as 
a graph, than in obtaining a particular value of 
a given pointer. 

Fast prototyping of visualizations is another 
fundamental issue: algorithm designers should be 
allowed to create visualizations from the source 
code at hand with little effort and without heavy 
modifications. To this end, reusability of 
visualization code could substantially reduce the 
time required to produce a running animation. 

One of the most important aspects of algo- 
rithm engineering is the development of libraries. 
It is thus quite natural to try to interface visu- 
alization tools to algorithmic software libraries: 
libraries should offer default visualizations of 
algorithms and data structures that can be refined 
and customized by developers for specific pur- 
poses. 

Software visualization tools should be able 
to animate not just “toy programs”, but signif- 
icantly complex algorithmic codes, and to test 
their behavior on large data sets. Unfortunately, 
even those systems well suited for large infor- 
mation spaces often lack advanced navigation 
techniques and methods to alleviate the screen 
bottleneck. Overcoming limitations of this kind 
remains a challenge. 

Advanced debuggers take little advantage 
of sophisticated graphical displays, even in 
commercial software development environments. 
Nevertheless, software visualization tools may 
be very beneficial in addressing problems 
such as finding memory leaks, understanding 
anomalous program behavior, and studying 
performance. In particular, environments that 
provide interpreted execution may more easily 
integrate advanced facilities in support of 
debugging and performance monitoring, and 
many recent systems attempt to explore this 
research direction. 


Techniques 

One crucial aspect in visualizing the dynamic 
behavior of a running program is the way it is 
conveyed into graphic abstractions. There are two 
main approaches to bind visualizations to code: 
the event-driven and the state-mapping approach. 


Event-Driven Visualization 

A natural approach to algorithm animation con- 
sists of annotating the algorithmic code with calls 
to visualization routines. The first step consists 
of identifying the relevant actions performed by 
the algorithm that are interesting for visualization 
purposes. Such relevant actions are usually re- 
ferred to as interesting events. As an example, in 
a sorting algorithm the swap of two items can be 
considered an interesting event. The second step 
consists of associating each interesting event with 
a modification of a graphical scene. Animation 
scenes can be specified by setting up suitable vi- 
sualization procedures that drive the graphic sys- 
tem according to the actual parameters generated 
by the particular event. Alternatively, these visu- 
alization procedures may simply log the events in 
a file for a post-mortem visualization. The calls 
to the visualization routines are usually obtained 
by annotating the original algorithmic code at the 
points where the interesting events take place. 
This can be done either by hand or by means 
of specialized editors. Examples of toolkits based 
on the event-driven approach are Polka [28] and 
GeoWin, a C++ data type that can be easily 
interfaced with algorithmic software libraries of 
great importance in algorithm engineering such 
as CGAL [14] and LEDA [19]. 
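
The two steps described above can be illustrated with a minimal sketch. It does not use Polka or GeoWin; the event tuple ("swap", i, j), the on_event callback, and the replay function are hypothetical names chosen for this illustration. The algorithm is annotated at the point where the interesting event occurs, and the events are logged for a post-mortem reconstruction of the scenes.

```python
from typing import Callable, List, Tuple

# An "interesting event" is modeled as (name, i, j). The event name and the
# callback signature are illustrative assumptions, not a toolkit API.
Event = Tuple[str, int, int]


def bubble_sort(items: List[int], on_event: Callable[[Event], None]) -> None:
    """Sort items in place, reporting each swap as an interesting event."""
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                on_event(("swap", i, i + 1))  # annotation added to the code


def replay(initial: List[int], log: List[Event]) -> List[List[int]]:
    """Post-mortem step: rebuild every intermediate scene from the event log."""
    scene = list(initial)
    frames = [list(scene)]
    for name, i, j in log:
        if name == "swap":
            scene[i], scene[j] = scene[j], scene[i]
        frames.append(list(scene))
    return frames


data = [3, 1, 2]
log: List[Event] = []
bubble_sort(list(data), log.append)   # logging handler: events go to a list
frames = replay(data, log)            # one frame per event, plus the start
```

A graphical front end would replace replay with a procedure that redraws the scene for each event; the annotated algorithm itself would stay unchanged.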


State Mapping Visualization 

Algorithm visualization systems based on state 
mapping rely on the assumption that observing 
how the variables change provides clues to the ac- 
tions performed by the algorithm. The focus is on 
capturing and monitoring the data modifications 
rather than on processing the interesting events 
issued by the annotated algorithmic code. For this 
reason they are also referred to as “data driven” 
visualization systems. Conventional debuggers 
can be viewed as data driven systems, since they 
provide direct feedback of variable modifications. 
The main advantage of this approach over the 
event-driven technique is that far less knowledge 
of the code is required: indeed, only the 
interpretation of the variables has to be known 
to animate a program. On the other hand, 
focusing only on data modification may sometimes 
limit customization possibilities, making it 
difficult to realize animations that would be natural 
to express with interesting events. Examples of 
tools based on the state mapping approach are 
Pavane [23, 25], which marked the first paradigm 
shift in algorithm visualization since the 
introduction of interesting events, and Leonardo [10], 
an integrated environment for developing, 
visualizing, and executing C programs. 
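
The state mapping idea can be sketched as follows. This is not how Pavane or Leonardo work internally; WatchedList is a hypothetical helper introduced only for this illustration. The algorithm carries no visualization calls at all, and the observer learns about its behavior solely from the recorded data modifications.

```python
from typing import Any, List, Tuple


class WatchedList(list):
    """A list that records every element assignment as (index, old, new)."""

    def __init__(self, *args: Any) -> None:
        super().__init__(*args)
        self.changes: List[Tuple[Any, Any, Any]] = []

    def __setitem__(self, index, value):
        old = self[index]
        super().__setitem__(index, value)
        self.changes.append((index, old, value))


def insertion_sort(a) -> None:
    """Plain, unannotated algorithm: it knows nothing about visualization."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


watched = WatchedList([2, 3, 1])
insertion_sort(watched)
# watched.changes now holds the full history of writes to the array,
# which a renderer could map to graphical updates.
```

Note that the history contains raw writes (element shifts) rather than semantic events such as "swap", which illustrates the customization limits mentioned above.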

A comprehensive discussion of other tech- 
niques used in algorithm visualization appears 
in [7, 21, 22, 24, 27]. 


Applications 


There are several applications of visualization 
in algorithm engineering, such as testing and 
debugging of algorithm implementations, visual 
inspection of complex data structures, identifica- 
tion of performance bottlenecks, and code opti- 
mization. Some examples of uses of visualization 
in algorithm engineering are described in [12]. 


Open Problems 


There are many challenges that the area of al- 
gorithm visualization is currently facing. First of 
all, the real power of an algorithm visualization 
system should be in the hands of the final user, 
possibly inexperienced, rather than of a profes- 
sional programmer or of the developer of the tool. 
For instance, instructors may greatly benefit from 
fast and easy methods for tailoring animations 
to their specific educational needs, while they 
might be discouraged from using systems that are 
difficult to install or heavily dependent on 
particular software/hardware platforms. In addition to 
being easy to use, a software visualization tool 
should be able to animate significantly complex 
algorithmic codes without requiring a lot of ef- 
fort. This seems particularly important for fu- 
ture development of visual debuggers. Finally, 
visualizing the execution of algorithms on large 
data sets seems worthy of further investigation. 
Currently, even systems designed for large in- 
formation spaces often lack advanced navigation 
techniques and methods to alleviate the screen 
bottleneck, such as changes of resolution and 
scale, selectivity, and elision of information. 


Problem Definition 


This problem is concerned with scheduling jobs 
with as little energy as possible by adjusting the 
processor speed wisely. This problem is motivated 
by the dynamic voltage scaling (DVS) technique, 
also called speed scaling, which enables a processor 
to operate at a range of voltages and frequencies. 
Since energy consumption is at least a quadratic 
function of the supply voltage (hence CPU fre- 
quency/speed), it saves energy to execute jobs as 
slowly as possible while still satisfying all timing 
constraints. The associated scheduling problem 
is referred to as min-energy DVS scheduling. 
Previous work showed that the min-energy DVS 
schedule can be computed in cubic time. The 
work of Li and Yao [7] considers the discrete 
model where the processor can only choose its 
speed from a finite speed set. This work designs 
an O(dn log n) two-phase algorithm to compute 
the min-energy DVS schedule for the discrete 
model (d represents the number of speeds) and 
also proves a lower bound of Ω(n log n) for the 
computational complexity. 


Notations and Definitions 


In the variable voltage scheduling model, there 
are two important sets: 

1. Set J (job set) consists of n jobs: j_1, j_2, ..., j_n. 
Each job j_k has three parameters as its 
information: a_k representing the arrival time of 
j_k, b_k representing the deadline of j_k, and R_k 
representing the total CPU cycles required by 
j_k. The parameters satisfy 0 ≤ a_k < b_k ≤ 1. 

2. Set SD (speed set) consists of the possible 
speeds that can be used by the processor. 
According to the property of SD, the schedul- 
ing model is divided into the following two 
categories: 


Continuous model: The set SD is the set of 
positive real numbers. 

Discrete model: The set SD consists of d positive 
values: s_1 > s_2 > ... > s_d. 


A schedule S consists of the following two 
functions: s(t), which specifies the processor 
speed at time t, and job(t), which specifies the job 
executed at time t. Both functions are piecewise 
constant with finitely many discontinuities. 

A feasible schedule must give each job its 
required number of cycles between arrival time 
and deadline, therefore satisfying the property 

$$ \int_{a_k}^{b_k} s(t) \, \delta(k, \mathrm{job}(t)) \, dt = R_k, $$ 

where δ(i, j) = 1 if i = j and δ(i, j) = 0 otherwise. 

The EDF principle defines an ordering on the 
jobs according to their deadlines. At any time t, 
among jobs j_k that are available for execution, 
that is, j_k satisfying t ∈ [a_k, b_k) and j_k not yet 
finished by t, it is the job with minimum b_k that 
will be executed during [t, t + ε]. 

The power P, or energy consumed per unit 
of time, is a convex function of the processor 
speed. The energy consumption of a schedule 
S = (s(t), job(t)) is defined as 

$$ E(S) = \int_0^1 P(s(t)) \, dt. $$ 
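
Because s(t) and job(t) are piecewise constant, both integrals above reduce to finite sums. The following sketch makes this concrete under stated assumptions: the segment representation (start, end, speed, job) and the power function P(s) = s^3 are choices made for this illustration only; the text requires only that P be convex.

```python
from typing import Callable, List, Tuple

# A piecewise-constant schedule as a list of segments (start, end, speed, job).
# The representation and the default P(s) = s**3 are illustrative assumptions.
Segment = Tuple[float, float, float, int]
Job = Tuple[float, float, float]  # (a_k, b_k, R_k)


def cycles_delivered(schedule: List[Segment], job: int, a: float, b: float) -> float:
    """Integral of s(t) * delta(job, job(t)) over [a, b]."""
    total = 0.0
    for start, end, speed, j in schedule:
        if j == job:
            overlap = min(end, b) - max(start, a)
            if overlap > 0:
                total += speed * overlap
    return total


def is_feasible(schedule: List[Segment], jobs: List[Job], tol: float = 1e-9) -> bool:
    """Each job k must receive exactly R_k cycles inside [a_k, b_k]."""
    return all(
        abs(cycles_delivered(schedule, k, a, b) - r) <= tol
        for k, (a, b, r) in enumerate(jobs)
    )


def energy(schedule: List[Segment],
           power: Callable[[float], float] = lambda s: s ** 3) -> float:
    """E(S): sum of P(speed) * duration over all segments."""
    return sum(power(speed) * (end - start) for start, end, speed, _ in schedule)


jobs = [(0.0, 1.0, 1.0)]            # one job: window [0, 1], one unit of work
slow = [(0.0, 1.0, 1.0, 0)]         # speed 1 during the whole window
fast = [(0.0, 0.5, 2.0, 0)]         # speed 2 during half of the window
```

Both schedules are feasible, but with the assumed cubic power function the slow one uses a quarter of the energy of the fast one, which is the effect of convexity that the problem exploits.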


A schedule is called an optimal schedule if 
its energy consumption is the minimum possible 
among all the feasible schedules. Note that for the 
continuous model, the optimal schedule uses the 
same speed for the same job. 

The work of Li and Yao considers the problem 
of computing an optimal schedule for the discrete 
model under the following assumptions. 


Assumptions 


1. Single processor: At any time t, only one job 
can be executed. 

2. Preemptive: Any job can be interrupted dur- 
ing its execution. 

3. Non-precedence: There is no precedence re- 
lationship between any pair of jobs. 

4. Offline: The processor knows the information 
of all the jobs at time 0. 


This problem is called min-energy discrete dy- 
namic voltage scaling (MEDDVS). 


Problem 1 (MEDDVS_{J,SD}) 


INPUT: Integer n, set J = {j_1, j_2, ..., j_n} and 
SD = {s_1, s_2, ..., s_d}, where j_k = {a_k, b_k, R_k}. 
OUTPUT: Feasible schedule S = (s(t), job(t)) 
that minimizes E(S). 


Voltage Scheduling 


Kwon and Kim [6] proved that the optimal 
schedule for the discrete model can be obtained 
by first calculating the optimal schedule for the 
continuous model and then individually adjusting 
the speed of each job appropriately to adjacent 
levels in set SD. The time complexity is O(n^3). 
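
One natural reading of the adjustment step is the following: a job that runs at a continuous speed s for some duration is instead run partly at the next higher level and partly at the next lower level, so that the same number of cycles is delivered in the same time. The sketch below is an illustration of that reading, not the procedure of [6]; the function name is hypothetical, and speed 0 (idling) is treated as an extra lowest level.

```python
from typing import List, Tuple


def split_between_adjacent_levels(s: float, duration: float,
                                  levels: List[float]) -> List[Tuple[float, float]]:
    """Return [(speed, time), ...] using the discrete levels adjacent to s.

    levels must be sorted in decreasing order (s_1 > s_2 > ... > s_d).
    The speed 0 is appended as an extra lowest level, i.e., idling.
    """
    extended = list(levels) + [0.0]
    if s > extended[0]:
        raise ValueError("required speed exceeds the highest level")
    for hi, lo in zip(extended, extended[1:]):
        if lo <= s <= hi:
            if s == hi:
                return [(hi, duration)]
            if s == lo:
                return [(lo, duration)]
            # Solve hi * x + lo * (duration - x) = s * duration for x.
            time_hi = duration * (s - lo) / (hi - lo)
            return [(hi, time_hi), (lo, duration - time_hi)]
    raise ValueError("speed must be non-negative")


plan = split_between_adjacent_levels(3.0, 1.0, [4.0, 2.0, 1.0])
```

In this example the job needs 3 cycles in one time unit; running half of the time at level 4 and half at level 2 delivers exactly that amount.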


Key Results 


The work of Li and Yao finds a direct approach 
for solving the MEDDVS problem without first 
computing the optimal schedule for the continu- 
ous model. 


Definition 1 An s-schedule for J is a schedule 
which conforms to the EDF principle and uses 
constant speed s in executing any job of J. 


Lemma 1 The s-schedule for J can be computed 
in O(n log n) time. 
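
A straightforward way to obtain such a bound is an event-driven EDF simulation with a priority queue keyed by deadline. The sketch below is an assumption-laden illustration, not necessarily the procedure of Li and Yao: it assumes s > 0, represents each job as a tuple (a_k, b_k, R_k), and simply drops the work that a job has left at its deadline, following the availability rule t ∈ [a_k, b_k) stated earlier.

```python
import heapq
from fractions import Fraction


def s_schedule(jobs, s):
    """EDF simulation at constant speed s > 0.

    jobs[k] = (a_k, b_k, R_k). Returns segments (start, end, k).
    A job is available only during [a_k, b_k).
    """
    order = sorted(range(len(jobs)), key=lambda k: jobs[k][0])  # by arrival
    remaining = [r for _, _, r in jobs]
    ready = []        # min-heap of (deadline, job index)
    segments = []
    nxt = 0           # next position in `order` that has not arrived yet
    t = jobs[order[0]][0] if jobs else 0
    while nxt < len(order) or ready:
        # Admit every job that has arrived by time t.
        while nxt < len(order) and jobs[order[nxt]][0] <= t:
            k = order[nxt]
            heapq.heappush(ready, (jobs[k][1], k))
            nxt += 1
        # Discard jobs that are finished or whose deadline has passed.
        while ready and (ready[0][0] <= t or remaining[ready[0][1]] <= 0):
            heapq.heappop(ready)
        if not ready:
            if nxt == len(order):
                break
            t = jobs[order[nxt]][0]   # idle until the next arrival
            continue
        deadline, k = ready[0]
        finish = t + remaining[k] / s
        end = min(finish, deadline)
        if nxt < len(order):
            end = min(end, jobs[order[nxt]][0])   # stop at the next arrival
        if end == finish:
            remaining[k] = 0
        else:
            remaining[k] -= s * (end - t)
        if end > t:
            segments.append((t, end, k))
        t = end
    return segments


F = Fraction
example = s_schedule([(F(0), F(1), F(3, 10)), (F(1, 10), F(1, 2), F(1, 5))], 1)
```

Every iteration of the main loop ends at an arrival, a completion, or a deadline, so there are O(n) iterations with O(log n) heap work each, after an O(n log n) sort by arrival time.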


Definition 2 Given a job set J and any speed s, 
let J^{≥s} and J^{<s} denote the subsets of J consisting 
of jobs whose executing speeds are ≥ s and < s, 
respectively, in the optimal schedule for J in 
the continuous model. The partition (J^{≥s}, J^{<s}) is 
referred to as the s-partition of J. 


By extracting information from the s-schedule, 
a partition algorithm is designed to prove the 
following lemma: 


Lemma 2 The s-partition of J can be computed 
in O(n log n) time. 


By applying s-partition to J using all the d 
speeds in SD consecutively, one can obtain d 
subsets J_1, J_2, ..., J_d of J where jobs in the 
same subset J_i use the same two speeds s_i and 
s_{i+1} in the optimal schedule for the discrete 
model (s_{d+1} = 0). 


Lemma 3 The optimal schedule for job set J_i using 
speeds s_i and s_{i+1} can be computed in O(n log n) 
time. 


Combining the above three lemmas together, the 
main theorem follows: 


Theorem 1 The min-energy discrete DVS sched- 
ule can be computed in O(dn log n) time. 


A lower bound for computing the optimal schedule 
for the discrete model under the algebraic 
decision tree model is also shown by Li and Yao. 


Theorem 2 Any deterministic algorithm for 
computing a min-energy discrete DVS schedule 
with d ≥ 2 voltage levels requires Ω(n log n) 
time for n jobs. 


Applications 


Currently, the dynamic voltage scaling technique is 
used by the world’s largest chip companies, 
e.g., in Intel’s SpeedStep technology and AMD’s 
PowerNow technology. Although the scheduling 
algorithms being used are mostly online 
algorithms, offline algorithms can still find their 
place in real applications. Furthermore, the 
techniques developed in the work of Li and Yao for 
the computation of optimal schedules may have 
potential applications in other areas. 

People also study energy-efficient scheduling 
problems for other kinds of job sets. Yun and 
Kim [10] proved that it is NP-hard to compute the 
optimal schedule for jobs with priorities and gave 
an FPTAS for that problem. Aydin et al. [1] 
considered energy-efficient scheduling for real-time 
periodic jobs and gave an O(n^2 log n) scheduling 
algorithm. Chen et al. [4] studied the weakly 
discrete model for non-preemptive jobs where speed 
is not allowed to change during the execution of 
one job. They proved that computing the optimal 
schedule is NP-hard. 

Another important application for this work is 
to help investigate scheduling models with more 
hardware restrictions (Burd and Brodersen [3] 
explained various design issues that may arise 
in dynamic voltage scaling). Besides the 
single-processor model, people are also interested in the 
multiprocessor model [11]. 


Open Problems 


A number of problems related to the work 
of Li and Yao remain open. In the discrete 
model, Li and Yao’s algorithm for computing 
the optimal schedule requires time O(dn log n). 
There is a gap between this and the currently 
known lower bound Ω(n log n). Closing this gap 
when considering d as a variable is an open 
problem. 

Another open research area is the computation 
of the optimal schedule for the continuous 
model. Li, Yao, and Yao [8] obtained an 
O(n^2 log n) algorithm for computing the optimal 
schedule. The bottleneck for the log n factor 
is in the computation of s-schedules. Reducing 
the time complexity for computing s-schedules 
is an open problem. It is also possible to look 
for other methods to deal with the continuous 
model. 


Problem Definition 


Suppose there is some set of objects p called sites 
that exert influence over their surrounding space, 
M. For each site p, we consider the set of all 
points z in M for which the influence of p is 
strongest. 

Such decompositions have already been con- 
sidered by R. Descartes [5] for the fixed stars 
in solar space. In mathematics and computer 
science, they are called Voronoi diagrams, hon- 
oring work by G.F. Voronoi on quadratic forms. 
Other sciences know them as domains of ac- 
tion, Johnson-Mehl model, Thiessen polygons, 
Wigner-Seitz zones, or medial axis transform. 

In the case most frequently studied, the space 
M is the real plane, the sites are n points, and 
influence corresponds to proximity in the Eu- 
clidean metric, so that the points most strongly 
influenced by site p are those for which p is 
the nearest neighbor among all sites. They form 
a convex region called the Voronoi region of p. 
The common boundary of two adjacent regions 
of p and q is a segment of their bisector B(p, q), 
the locus of all points of equal distance to p and 
q. An example of 10 point sites is depicted in 
Fig. 1. 

Let us assume that the set S of point sites is 
in general position, so that no three points are 
situated on a line, and no four on a circle. Then 
the Voronoi diagram V(S) of S is a connected 
planar graph. Its vertices are those points in the 
plane which have three nearest neighbors in S, 
while the interior edge points have two. As a 
consequence of the Euler formula, V(S) has only 
O(n) many edges and vertices. 

If we connect by line segments those sites in 
S whose Voronoi regions share an edge in V(S), 
a triangulation D(S) of S results, called the 
Delaunay triangulation or Dirichlet tessellation; 
see Fig. 1. Each triangle with vertices p, q, r in 
S is dual to a vertex v of V(S) situated on the 
boundary of the Voronoi regions of p, q, and r. 
Because p, q, r are the nearest neighbors of v in 
S, the circle through p, q, r centered at v contains 
no other point of S. Thus, D(S) consists of 
triangles with vertices in S whose circumcircles 
are empty of points in S; see Fig. 2. Conversely, 
each triangle with empty circumcircle occurs in 
D(S). 

Voronoi Diagrams and Delaunay Triangulations, Fig. 1 
Voronoi diagram and Delaunay triangulation of 
10 point sites in the Euclidean plane 

Voronoi Diagrams and Delaunay Triangulations, Fig. 2 
The empty circle property 
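
The empty circle property can be tested with the standard in-circle determinant. The sketch below is an illustration only; the function names are hypothetical, points are assumed to be (x, y) tuples, and the brute-force check over all sites is meant for clarity rather than efficiency. With integer coordinates the arithmetic is exact.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b, c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of det depends on the orientation of (a, b, c); multiplying
    # by the orientation makes the test independent of the vertex order.
    return det * orientation(a, b, c) > 0


def is_delaunay_triangle(a, b, c, sites):
    """Brute-force empty circle test against all other sites."""
    return not any(in_circumcircle(a, b, c, p)
                   for p in sites if p not in (a, b, c))


sites = [(0, 0), (4, 0), (0, 4), (5, 5)]
```

For these four sites, the triangle (0,0), (4,0), (0,4) has an empty circumcircle, whereas the circumcircle of (0,0), (4,0), (5,5) contains the site (0,4).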

Given a set S of n point sites, the problem is 
to efficiently construct one of V(S) or D(S); the 
dual structure can then easily be obtained in linear 
time. 


Generalizations 

Voronoi diagrams can be generalized in sev- 
eral ways. Instead of point sites, other geomet- 
ric objects can be considered. One can replace 
the Euclidean distance with distance measures 
more suitable to model a given situation. Instead 
of forming regions of all points that have the 
same nearest site, one can consider higher-order 
Voronoi diagrams where all points share a region 
for which the nearest k sites are the same, for 
some k between 2 and n − 1. Many more variants 
can be found in [9] and [1]. Abstract Voronoi 
diagrams provide a unifying framework for some 
of the variants mentioned; see the corresponding 
chapter in this encyclopedia. 


Key Results 


Quite a few algorithms for constructing the 
Voronoi diagram or the Delaunay triangulation 
of n points in the Euclidean plane have been 
developed. 

Voronoi Diagrams and Delaunay Triangulations, Fig. 3 
The sweepline advancing to the right 


Divide and Conquer 

The first algorithm was presented in the sem- 
inal paper [11], which gave birth to the field 
of computational geometry. It applies the divide 
and conquer paradigm. Site set S is split by a 
line into subsets L and R of equal cardinality. 
After recursively computing V(L) and V(R), one 
needs to compute the bisector B(L, R), the locus 
of all points in the plane that have a nearest 
neighbor in L and in R. This bisector is an 
unbounded monotone polygonal chain. In time 
O(n) one can find a starting segment of B(L, R) 
at infinity, and trace the chain through V(L) and 
V(R) simultaneously. Thus, the algorithm runs 
in time O(n log n) and linear space, which is 
optimal. 
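
The running time follows from the usual divide and conquer recurrence. The short derivation below assumes that the sites have been presorted once by x-coordinate in O(n log n) time, so that each split takes linear time; T(n) denotes the time for the recursive part on n sites.

```latex
% Two subproblems of half the size, plus linear work for tracing B(L, R):
T(n) = 2\,T(n/2) + O(n), \qquad T(1) = O(1).
% The recursion tree has \log_2 n levels, each with O(n) total work:
T(n) = O(n \log n).
```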


Sweep 

How to design a left-to-right sweepline algorithm 
for constructing V(S) is not obvious. When the 
advancing sweepline H enters the Voronoi region 
of p before site p has been detected, it is not clear 
how to correctly maintain the Voronoi diagram 
along H. This difficulty has been overcome in [7] 
by applying a transformation that ensures that 
each site is the leftmost point of its Voronoi 
region. In [10] and [4], a more direct version of 
this approach was suggested. At each time during 
the sweep, one maintains the Voronoi diagram of 
all point sites to the left of sweepline H, and of 
H itself, which is considered a site of its own; 
see Fig. 3. Because the bisector of a point and 
a line is a parabola, the Voronoi region of H 
is bounded by a connected chain of parabolic 
segments, called the wavefront W. As H moves 
to the right, W follows at half the speed. Each 
point z to the left of W is closer to some point 
site p left of H than to H and, all the more, 
to all point sites to the right of H that are yet 
to be discovered. Thus, the Voronoi regions of 
the point sites to the left of W keep growing, 
as sweepline H proceeds, along the extensions 
of Voronoi edges beyond W; these spikes are 
depicted by dashed lines in Fig. 3. 
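
The claims that each wave is a parabola and that the wavefront follows at half the speed can be checked by a short calculation. The coordinates used here are assumptions of this sketch: the sweepline H is the vertical line x = h, and p = (p_x, p_y) is a site to its left.

```latex
% A point (x, y) on the wave of p is equidistant from p and from H:
(x - p_x)^2 + (y - p_y)^2 = (h - x)^2
\;\Longrightarrow\;
x = \frac{h + p_x}{2} - \frac{(y - p_y)^2}{2\,(h - p_x)} ,
% which is a parabola in y that opens to the left.
% Its apex (y = p_y) lies at x = (h + p_x)/2, so as h grows the apex moves with
\frac{d}{dh}\,\frac{h + p_x}{2} = \frac{1}{2} ,
% that is, at half the speed of the sweepline.
```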

There are two kinds of events one needs to 
handle during the sweep. When sweepline H 
hits a new point site (like point pe in Fig. 3), 
a new wave separating this point site from H 
must be added to W. When wavefront W hits 
the intersection of two neighboring spikes, the 
wave between them must be removed from W; 
this will first happen in Fig. 3 when W arrives 
at v. Intersections between neighboring spikes 
can be determined as in the standard line segment 
intersection algorithm [2]. There are only O(n) 
many events, one for each point site and one for 
each Voronoi vertex v of V(S). Since wavefront 
W is always of linear size, the sweepline 
algorithm runs in O(n log n) time using linear space. 


Reduction to Convex Hull 

A rather different approach [3] obtains the 
Delaunay triangulation in dimension 2 from the 
convex hull in dimension 3, which can itself 
be constructed in time O(n log n). As suggested 
in [6], one vertically lifts the point sites to the 
paraboloid Z = X^2 + Y^2 in 3-space. The lower 
convex hull of the lifted points, projected onto 
the XY-plane, equals the Delaunay triangulation, 
D(S). 
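
The reason why lifting works can be seen from a short calculation, given here as a sketch; the symbols a, b, r for the center and radius of a circle C are notation introduced for this illustration.

```latex
% Circle C with center (a, b) and radius r, under the lifting map
% (x, y) \mapsto (x, y, x^2 + y^2):
(x - a)^2 + (y - b)^2 = r^2
\iff x^2 + y^2 = 2ax + 2by - (a^2 + b^2 - r^2).
% Hence the lifted points of C all lie on the plane
E_C : \; Z = 2aX + 2bY - (a^2 + b^2 - r^2),
% and a point strictly inside C satisfies
(x - a)^2 + (y - b)^2 < r^2
\iff x^2 + y^2 < 2ax + 2by - (a^2 + b^2 - r^2),
% i.e., its lifted image lies strictly below E_C.
```

Consequently, the circumcircle of three sites is empty exactly if no lifted site lies below the plane through the three lifted vertices, which is the condition for the lifted triangle to be a face of the lower convex hull.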


Incremental Construction 

Another, very intuitive algorithm first suggested 
in [8] constructs the Delaunay triangulation 
incrementally. In order to insert a new point site p_i 
into an existing Delaunay triangulation D(S_{i-1}), 
one first finds the triangle containing p_i and 
connects p_i to its vertices by line segments. 
Should p_i be contained in the circumcircles of 
adjacent triangles, the Delaunay property must be 
restored by edge flips that replace the common 
edge of two adjacent triangles T, T’ by the other 
diagonal of the convex quadrilateral formed by 
T and T’. If the insertion sequence of the p_i is 
randomly chosen, a running time in O(n log n) 
can be expected. Details on all algorithms can be 
found in [1]. 


Applications 


Although of linear size, the Voronoi diagram and 
the Delaunay triangulation contain a lot of 
information on the point set S. Once V(S) or D(S) is 
available, quite a few distance problems can be 
solved very efficiently. We mention only the most 
basic applications here and refer to [1] and [9] for 
further reading. 

By definition, the Voronoi diagram reduces the 
post office or nearest neighbor problem to a point 
location problem: given an arbitrary query point 
z, the site in S nearest to z can be found by deter- 
mining the Voronoi region containing z. In order 
to find the largest empty circle whose center z lies 
inside a convex polygon C over m vertices, one 
needs to inspect only three types of candidates 
for z, the vertices of V(S), the intersections of 
the edges of V(S) with the boundary of C, and 
the vertices of C. All of these can be found in time 
O(n + m). 

If the site set S is split into subsets L and R, 
then the closest pair p ∈ L and q ∈ R forms an 
edge of the Delaunay triangulation D(S) (which 
crosses the Voronoi edge separating the regions of 
p and q). This fact has nice consequences. First, 
the nearest neighbor of a site p ∈ S must be one 
of its neighboring vertices in D(S). Hence, all 
nearest neighbors and the closest pair in S can 
be found in linear time once D(S) is available, 
because D(S) has only O(n) many edges. Second, 
D(S) contains the minimum spanning tree 
of S, which can be extracted from D(S) in linear 
time. 

Remarkable and useful is the equiangularity 
property of D(S). Of all (exponentially many) 
triangulations of S, the Delaunay triangulation 
maximizes the ascending sequence of angles oc- 
curring in the triangles, with respect to lexico- 
graphic order. In particular, the minimum angle 
is as large as possible. In fact, if a triangulation is 
not Delaunay, it must contain two adjacent trian- 
gles, such that the circumcircle of one contains 
the third vertex of the other. By flipping their 
common edge, a new triangulation with larger 
angles is obtained. 


Cross-References 


3D Conforming Delaunay Triangulation 
Abstract Voronoi Diagrams 
Delaunay Triangulation and Randomized Con- 


Problem Definition 


The traditional use of locking to maintain con- 
sistency of shared data in concurrent programs 
has a number of disadvantages related to soft- 
ware engineering, robustness, performance, and 
scalability. As a result, a great deal of research 
effort has gone into nonblocking synchronization 
mechanisms over the last few decades. 

Herlihy’s seminal paper Wait-Free Syn- 
chronization [12] studied the problem of 
implementing concurrent data structures in 
a wait-free manner, i.e., so that every operation 
on the data structure completes in a finite number 
of steps by the invoking thread, regardless of how 
fast or slow other threads run and even if some 
or all of them halt permanently. Implementations 
based on locks are not wait-free because, while 
one thread holds a lock, others can take an 
unbounded number of steps waiting to acquire 
the lock. Thus, by requiring implementations to 
be wait-free, some of the disadvantages of locks 
may potentially be eliminated. 

© Springer Science+Business Media New York 2016 
M.-Y. Kao (ed.), Encyclopedia of Algorithms, 
DOI 10.1007/978-1-4939-2864-4 

The first part of Herlihy’s paper examined the 
power of different synchronization primitives for 
wait-free computation. He defined the consensus 
number of a given primitive as the maximum 
number of threads for which we can solve wait- 
free consensus using that primitive (together with 
read-write registers). The consensus problem re- 
quires participating threads to agree on a value 
(e.g., true or false) amongst values proposed by 
the threads. The ability to solve this problem is 
a key indicator of the power of synchronization 
primitives because it is central to many natural 
problems in concurrent computing. For exam- 
ple, in a software transactional memory system, 
threads must agree that a particular transaction 
either committed or aborted. 

Herlihy established a hierarchy of synchronization 
primitives according to their consensus 
number. He showed (i) that the consensus 
number of read-write registers is 1 (so wait-free 
consensus cannot be solved for even two 
threads), (ii) that the consensus number of 
stacks and FIFO queues is 2, and (iii) that 
there are so-called universal primitives, which 
have consensus number ∞. Common examples 
include compare-and-swap (CAS) and 
the load-linked/store-conditional 
(LL/SC) pair. 
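
Why CAS can solve consensus for any number of threads is easy to see in a small sketch. Python exposes no hardware CAS instruction, so the CASRegister class below emulates its atomicity with a lock; this emulation, and all class and function names, are assumptions of the illustration and the code is therefore not itself wait-free. The point is the protocol: only the first CAS on the shared cell succeeds, and every thread then adopts the value it finds there.

```python
import threading


class CASRegister:
    """Simulated compare-and-swap register.

    Atomicity is emulated with a lock, purely as an assumption of this
    sketch; real hardware provides CAS as a single atomic instruction.
    """

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value


_EMPTY = object()  # sentinel meaning "no value decided yet"


class Consensus:
    def __init__(self):
        self._cell = CASRegister(_EMPTY)

    def decide(self, proposal):
        # Only the first CAS succeeds; all callers return the winner's value.
        self._cell.compare_and_swap(_EMPTY, proposal)
        return self._cell.read()


def run(num_threads=8):
    consensus = Consensus()
    decided = [None] * num_threads

    def worker(i):
        decided[i] = consensus.decide(i)   # thread i proposes its own index

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decided


decided = run(8)
```

Each call to decide takes a constant number of steps regardless of what other threads do, and all threads agree on one of the proposed values.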

There are a number of papers which examine 
Herlihy’s hierarchy in more detail. These show 
that seemingly minor variations in the model or 
in the semantics of primitives can have a 
surprising effect on results. Most of this work is 
primarily of theoretical interest. The key practi- 
cal point to take away from Herlihy’s hierarchy 
is that we need universal primitives to support 
effective wait-free synchronization in general. 
Recognizing this fact, all modern shared-memory 
multiprocessors provide some form of universal 
primitive. 

Herlihy additionally showed that a solution to 
consensus can be used to implement any shared 
object in a wait-free manner, and thus that any 
universal primitive suffices for this purpose. 
He demonstrated this idea using a so-called 
universal construction, which takes sequential 
code for an object and creates a wait-free 
implementation of the object using consensus 
to resolve races between concurrent operations. 
Despite the important practical ramifications of 
this result, the universal construction itself was 
quite impractical. The basic idea was to build 
a list of operations, using consensus to determine 
the order of operations, and to allow threads 
to iterate over the list applying the operations 
in order to determine the current state of the 
object. The construction required O(N^3) space to
ensure enough operations are retained to allow
the current state to be determined. It was also
very slow: many threads had to recompute
the same information, and the construction
prevented any parallelism between operations.
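
The basic idea can be sketched as follows, under strong simplifying assumptions. Each slot of a fixed-capacity log is a one-shot consensus object (emulated here with a lock in place of CAS), and every invocation replays the log from the start to compute its result. The names `Slot` and `UniversalObject` are hypothetical, and the sketch omits the helping mechanism that makes the real construction wait-free, so as written it is only lock-free.

```python
import threading


class Slot:
    """One-shot consensus on which operation occupies this log position."""

    _EMPTY = object()

    def __init__(self):
        self._value = Slot._EMPTY
        self._lock = threading.Lock()  # emulates a single atomic CAS

    def decide(self, proposal):
        with self._lock:
            if self._value is Slot._EMPTY:
                self._value = proposal
            return self._value


class UniversalObject:
    """Builds a shared object from sequential code apply_fn(state, op)."""

    def __init__(self, initial_state, apply_fn, capacity=1024):
        self._initial = initial_state      # must be immutable: it is replayed often
        self._apply = apply_fn             # returns (new_state, result)
        self._slots = [Slot() for _ in range(capacity)]

    def invoke(self, op):
        token = (object(), op)             # unique identity for this invocation
        state = self._initial
        for slot in self._slots:
            winner = slot.decide(token)    # agree on the operation at this position
            state, result = self._apply(state, winner[1])
            if winner is token:            # our operation has been linearized here
                return result
        raise RuntimeError("log capacity exhausted")
```

Replaying the whole log on every call is exactly the redundancy criticized above: all threads recompute the same prefix of states, and operations are applied strictly one after another.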

Later, Herlihy [13] presented a more concrete 
universal construction based on the LL/SC in- 
struction pair. This construction required N + 1 
copies of the object for N threads and still did 
not admit any parallelism; thus it was also not 
practical. Despite this, work following on from 
Herlihy’s has brought us to the point today that 
we can support practical programming models 
that provide nonblocking implementations of ar- 
bitrary shared objects. The remainder of this 
chapter discusses the state of nonblocking syn- 
chronization today, and mentions some history 
along the way. 


Weaker Nonblocking Progress Conditions 
Various researchers, including us, have had 
some success attempting to overcome the
disadvantages of Herlihy’s wait-free construc- 
tions. However, the results remain impractical 
due to excessive overhead and overly complicated 
algorithms. In fact, there are still no nontrivial 
wait-free shared objects in widespread practical 
use, either implemented directly or using 
universal constructions. 

The biggest advances towards practicality 
have come from considering weaker progress 
conditions. While theoreticians worked on 
wait-free implementations, more pragmatic 
researchers sought lock-free implementations 
of shared objects. A lock-free implementation 
guarantees that, after a finite number of steps 
of any operation, some operation completes. 
In contrast to wait-free algorithms, it is in 
principle possible for one operation of a lock- 
free data structure to be continually starved by 
others. However, this rarely occurs in practice, 
especially because contention control techniques 
such as exponential backoff [1] are often used to 
reduce contention when it occurs, which makes 
repeated interference even more unlikely. Thus, 
the lack of a strong progress guarantee like wait- 
freedom has often been found to be acceptable 
in practice. 
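
As an illustration of the lock-free style, the sketch below shows a stack whose operations retry a CAS on the top pointer and back off exponentially after a failure. It is a minimal sketch under stated assumptions: the hypothetical `AtomicRef` class emulates CAS with a lock because Python offers no hardware CAS, and the backoff constants are arbitrary.

```python
import random
import threading
import time


class AtomicRef:
    """Emulated CAS reference; the lock stands in for hardware atomicity."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, value, below):
        self.value = value
        self.below = below


class LockFreeStack:
    def __init__(self):
        self._top = AtomicRef(None)

    def _backoff(self, attempt):
        # Randomized exponential backoff, capped at 1 ms (arbitrary constants).
        time.sleep(random.uniform(0, min(1e-3, 1e-6 * 2 ** min(attempt, 10))))

    def push(self, value):
        attempt = 0
        while True:
            top = self._top.get()
            if self._top.compare_and_set(top, Node(value, top)):
                return
            self._backoff(attempt)  # a failed CAS means some other operation succeeded
            attempt += 1

    def pop(self):
        attempt = 0
        while True:
            top = self._top.get()
            if top is None:
                return None         # empty stack
            if self._top.compare_and_set(top, top.below):
                return top.value
            self._backoff(attempt)
            attempt += 1
```

A CAS fails only because another operation changed the top pointer, so some operation always completes; an individual operation, however, may in principle retry forever, which is the starvation scenario described above.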

The observation that weaker nonblocking 
progress conditions allow simpler and more prac- 
tical algorithms led Herlihy et al. [15] to define 
an even weaker condition: An obstruction-free 
algorithm does not guarantee that an operation 
completes unless it eventually encounters no 
more interference from other operations. In our 
experience, obstruction-free algorithms are easier 
to design, simpler, and faster in the common 
uncontended case than lock-free algorithms. The 
price paid for these benefits is that obstruction- 
free algorithms can “livelock”, with two or more 
operations repeatedly interfering with each other 
forever. This is not merely a theoretical concern: 
it has been observed to occur in practice [16]. 
Fortunately, it is usually straightforward to 
eliminate livelock in practice through contention 
control mechanisms that control and manipulate 
when operations are executed to avoid repeated 
interference. 

The obstruction-free approach to synchro- 
nization is thus to design simple and efficient 
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algorithms for the weak obstruction-free progress 
condition, and to integrate orthogonal contention 
control mechanisms to facilitate progress 
when necessary. By largely separating the 
difficult issues of correctness and progress, 
we significantly ease the task of designing 
effective nonblocking implementations: the 
algorithms are not complicated by tightly 
coupled mechanisms for achieving lock-freedom, 
and it is easy to modify and experiment with 
contention control mechanisms because they are 
separate from the algorithm and do not affect its 
correctness. We have found this approach to be 
very powerful. 


Transactional Memory 

The severe difficulty of designing and verifying 
correct nonblocking data structures has led re- 
searchers to investigate the use of tools to produce 
them, rather than designing them directly. In 
particular, transactional memory [5, 17, 23] has 
emerged as a promising direction. Transactional 
memory allows programmers to express sections 
of code that should be executed atomically, and 
the transactional memory system (implemented 
in hardware, software, or a combination of the 
two) is responsible for managing interactions 
between concurrent transactions to ensure this 
atomicity. Here we concentrate on software trans- 
actional memory (STM). 

The progress guarantee made by a concurrent 
data structure implemented using STM depends 
on the STM implementation. It is possible 
to characterize the progress conditions of 
transactional memory implementations in 
terms of a system of threads in which 
each operation on a shared data structure 
is executed by repeatedly attempting to 
apply it using a transaction until an attempt 
successfully commits. In this context, say 
the transactional memory implementation is 
obstruction-free if it guarantees that, if a thread 
repeatedly executes transactions and eventually 
encounters no more interference from other 
threads, then it eventually successfully commits 
a transaction.


Key Results


This section briefly discusses some of the most 
relevant results concerning nonblocking synchro- 
nization, and obstruction-free synchronization in 
particular. 


While progress towards practicality was made 
with lock-free implementations of shared objects 
as well as lock-free STM systems, this progress 
was slow because simultaneously ensuring cor- 
rectness and lock-freedom proved difficult. Be- 
fore the introduction of obstruction-freedom, the 
lock-free STMs still had some severe disadvan- 
tages such as the need to declare and initialize 
all memory to be accessed by transactions in 
advance, the need for transactions to know in ad- 
vance which memory locations they will access, 
unacceptable constraints on the layout of such 
memory, etc. 


In addition to the work on tools such as STM 
for building nonblocking data structures, there 
has been a considerable amount of work on direct 
implementations. While this work has not yielded 
any practical wait-free algorithms, a handful of 
practical lock-free implementations for simple 
data structures such as queues and stacks have 
been achieved [21, 24]. There are also a few 
slightly more ambitious implementations in the 
literature that are arguably practical, but the al- 
gorithms are complicated and subtle, many are 
incorrect, and almost none has a formal proof. 
Proofs for such algorithms are challenging, and 
minor changes to the algorithm require the proofs 
to be redone. 


The next section discusses some of the re-
sults that have been achieved by applying the
obstruction-free approach. The remainder of this
section briefly discusses a few results related to
the approach itself.


An important practical aspect of using an 
obstruction-free algorithm is how contention 
is managed when it arises. In introducing 
obstruction-freedom, Herlihy et al. [15] ex- 
plained that contention control is necessary to fa- 
cilitate progress in the face of contention because 
obstruction-free algorithms do not directly make 
any progress guarantee in this case. However,
they did not directly address how contention
control mechanisms could be used in practice.

Subsequently, Herlihy et al. [16] presented 
a dynamic STM system (see next section) that 
provides an interface for a modular contention 
manager, allowing for experimentation with 
alternative contention managers. Scherer and 
Scott [22] experimented with a number of 
alternatives, and found that the best contention 
manager depends on the workload. Guerraoui 
et al. [9] described an implementation that 
supports changing contention managers on 
the fly in response to changing workload 
conditions. 
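
The separation between the retry loop and the contention manager can be sketched as follows. This is not the interface of DSTM or of any published system: `VersionedBox`, `atomically`, and the two manager classes are hypothetical names, the "transaction" updates a single value, and a lock emulates the atomic commit step.

```python
import random
import threading
import time


class BackoffManager:
    """Hypothetical policy: randomized exponential backoff after each conflict."""

    def on_conflict(self, attempt):
        time.sleep(random.uniform(0, min(1e-3, 1e-6 * 2 ** min(attempt, 10))))


class ImmediateRetryManager:
    """Hypothetical policy: retry at once."""

    def on_conflict(self, attempt):
        pass


class VersionedBox:
    """A shared value whose updates commit only if nothing changed meanwhile."""

    def __init__(self, value):
        self._state = (0, value)       # (version, value)
        self._lock = threading.Lock()  # emulates an atomic CAS on _state

    def snapshot(self):
        return self._state

    def try_commit(self, seen, new_value):
        with self._lock:
            if self._state is seen:
                self._state = (seen[0] + 1, new_value)
                return True
            return False


def atomically(box, update, manager):
    """Retry update until it commits; return the number of aborted attempts."""
    attempt = 0
    while True:
        seen = box.snapshot()
        if box.try_commit(seen, update(seen[1])):
            return attempt
        manager.on_conflict(attempt)   # the policy decides how to react
        attempt += 1
```

Swapping `BackoffManager` for `ImmediateRetryManager` changes only when retries happen, never whether a committed update is correct, which is the property that makes experimentation with alternative policies easy.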

All of the contention managers discussed 
in the above-mentioned papers are ad hoc 
contention managers based on intuition; no 
analysis is given of what guarantees (if any) 
are made by the contention managers. Guerraoui 
et al. [10] made a first step towards a formal 
analysis of contention managers by showing that 
their Greedy contention manager guarantees 
that every transaction eventually completes. 
However, using the Greedy contention manager 
results in a blocking algorithm, so their proof 
necessarily assumes that threads do not fail while 
executing transactions. 

Fich et al. [7] showed that any obstruction- 
free algorithm can be automatically transformed 
into one that is practically wait-free in any real 
system. “Practically” is said because the wait-free 
progress guarantee depends on partial synchrony 
that exists in any real system, but the transformed 
algorithm is not technically wait-free, because 
this term is defined in the context of a fully 
asynchronous system. Nonetheless, an algorithm 
achieved by applying the transformation of Fich 
et al. to an obstruction-free algorithm does guar- 
antee progress to non-failed transactions, even if 
other transactions fail. 

Work on incorporating contention manage- 
ment techniques into obstruction-free algorithms 
has mostly been done in the context of STM, 
so the contention manager can be called 
directly from the STM implementation. Thus, 
the programmer using the STM need not be 
concerned with how contention management 
is integrated, but this does not address how
contention management can be integrated into
direct implementations of obstruction-free data
structures.

One option is for the programmer to manually 
insert calls to a contention manager, but this 
approach is tedious and error prone. Guerraoui 
et al. [11] suggested a version of this approach 
in which the contention manager is abstracted 
out as a failure detector. They also explored what 
progress guarantees can be made by what failure 
detectors. 

Attiya et al. [4] and Aguilera et al. [2] sug- 
gested changing the semantics of the data struc- 
ture’s operations so that they can return a special 
value in case of contention, thus allowing con- 
tention management to be done outside the data 
structure implementation. These approaches still 
leave a burden on the programmer to ensure that 
these special values are always returned by an 
operation that cannot complete due to contention, 
and that the correct special value is returned 
according to the prescribed semantics. 

Another option is to use system support to en- 
sure that contention management calls are made 
frequently enough to ensure progress. This sup- 
port could be in the form of compiled-in calls, 
runtime support, signals sent upon expiration of 
a timer, etc. But all of these approaches have dis- 
advantages such as not being applicable in gen- 
eral purpose environments, not being portable, 
etc. 

Given that it remains challenging to design 
and verify direct obstruction-free implementa- 
tions of shared data structures, and that there are 
disadvantages to the various proposals for inte- 
grating contention control mechanisms into them, 
using tools such as STMs with built-in contention 
management interfaces is the most convenient 
way to build nonblocking data structures. 


Applications 


The obstruction-free approach to nonblocking 
synchronization was introduced by Herlihy 
et al. [15], who used it to design a double-ended 
queue (deque) based on the widely available CAS 
instruction. All previous nonblocking deques
either required exotic synchronization instructions
such as double-compare-and-swap
(DCAS), or had the disadvantage that operations
at opposite ends of the queue always interfere
with each other.

Herlihy et al. [16] introduced Dynamic STM 
(DSTM), the first STM that is dynamic in the fol- 
lowing two senses: new objects can be allocated 
on the fly and subsequently accessed by transac- 
tions, and transactions do not need to know in 
advance what objects will be accessed. These two 
advantages made DSTM much more useful than 
previous STMs for programming dynamic data 
structures. As a result, nonblocking implementa- 
tions of sophisticated shared data structures such 
as balanced search trees, skip lists, dynamic hash 
tables, etc. were suddenly possible. 

The obstruction-free approach played a key 
role in the development of both of the results 
mentioned above: Herlihy et al. [16] could 
concentrate on the functionality and correctness 
of DSTM without worrying about how to achieve 
stronger progress guarantees such as lock-freedom.

The introduction of DSTM and of the 
obstruction-free approach have led to numerous 
improvements and variations by a number of 
research groups, and most of these have similarly 
followed the obstruction-free approach. However, 
Harris and Fraser [8] presented a dynamic STM 
called OSTM with similar advantages to DSTM, 
but it is lock-free. Experiments conducted at 
the University of Rochester [20] showed that 
DSTM outperformed OSTM by an order of 
magnitude on some workloads, but that OSTM 
outperformed DSTM by a factor of 2 on others. 
These differences are probably due to various 
design decisions that are (mostly) orthogonal to 
the progress condition, so it is not clear what we 
can conclude about how the choice of progress 
condition affects performance in this case. 

Perhaps a more direct comparison can be 
made between another pair of algorithms, again 
an obstruction-free one by Herlihy et al. [14] 
and a similar but lock-free one by Harris and 
Fraser [8]. These algorithms, invented indepen- 
dently of each other, implement MCAS (CAS 
generalized to access M independently chosen
memory locations). The two algorithms are very
similar, and a close comparison revealed that the 
only real differences between them were due to 
Harris and Fraser’s desire to have a lock-free im- 
plementation. As a result of this, their algorithm 
is somewhat more complicated, and also requires 
a minimum of 3M + 1 CAS operations, whereas 
the algorithm of Herlihy et al. [14] requires only 
2M + 1. The authors are unaware of any direct 
performance comparison of these algorithms, 
but they believe the obstruction-free one would 
outperform the lock-free one, particularly in the 
absence of conflicting MCAS operations. 


Open Questions 


Because transactional memory research has 
grown out of research into nonblocking data 
structures, it was long considered mandatory 
for STM implementations to support the 
development of nonblocking data structures. 
Recently, however, a number of researchers have 
observed that at least the software engineering 
benefits of transactional memory can be delivered 
even by a blocking STM. There are ongoing 
debates whether STM needs to be nonblocking 
and whether there is a fundamental cost to being 
nonblocking. 

While we agree that blocking STMs are con- 
siderably easier to design, and that in many cases 
a blocking STM is acceptable, this is not always 
true. Consider, for example, an interrupt handler 
that shares data with the interrupted thread. The 
interrupted thread will not run again until the 
interrupt handler completes, so it is critical that 
the interrupted thread does not block the interrupt 
handler. Thus, if using STM is desired to simplify 
the code for accessing this shared data, the STM 
must be nonblocking. The authors are therefore 
motivated to continue research aimed at improv- 
ing nonblocking STMs and to understand what 
fundamental gap, if any, exists between blocking 
and nonblocking STMs. 

Progress in improving the common-case per- 
formance of nonblocking STMs continues [19], 
and the authors see no reason to believe that
nonblocking STMs should not be very competitive
with blocking STMs in the common case, i.e., until
the system decides that one transaction should
not wait for another that is delayed (an option that 
is not available with blocking STMs). 

It is conjectured that a separation
between blocking and nonblocking STMs can
indeed be proved according to some measure, but that
this will not imply significant performance
differences in the common case. Indeed,
results of Attiya et al. [3] show a separation
between obstruction-free and blocking algo-
rithms according to a measure that counts
the number of distinct base objects accessed
by the implementation plus the number of
“memory stalls”, which measure how often
the implementation can encounter contention
for a variable from another thread. While this
result is interesting, it is not clear that it is useful
for deciding whether to implement blocking or
obstruction-free objects, because the measure
does not account for the time spent waiting by
blocking implementations, and thus is biased in
their favor. For now, the authors remain optimistic that STMs
can be made to be nonblocking without paying
a severe performance price in the common case.

Another interesting question, which is
open as far as the authors know, is whether
there is a fundamental cost to implementing
stronger nonblocking progress conditions versus
obstruction-freedom. Again, they conjecture that
there is. It is known that there is a fundamental
difference between obstruction-freedom and
lock-freedom in systems that support only
reads and writes: it is possible to solve
obstruction-free consensus but not lock-free
consensus in this model [15]. While this is
a fascinating observation, it is mostly irrelevant
from a practical standpoint, as all modern
shared-memory multiprocessors support stronger
synchronization primitives such as CAS, with
which it is easy to solve consensus, even wait-
free. The interesting question therefore is whether
there is a fundamental cost to being lock-free as
opposed to obstruction-free in real systems.

To have a real impact on design directions,
such results need to address common-case
performance, or some other measure (perhaps space)
that is relevant to everyday use. Many lower 
bound results establish a separation in worst- 
case time complexity, which does not necessar- 
ily have a direct impact on design decisions, 
because the worst case may be very rare. So 
far, efforts to establish a separation according 
to potentially useful measures have only led to 
stronger results than the authors had conjectured were
possible. In the authors’ first attempt [18], they
tried to establish a separation in the number
of CAS instructions needed in the absence of
contention to solve consensus, but found that this
was not a very useful measure, as they were able to
come up with a wait-free implementation that
avoids CAS in the absence of contention. The
second attempt [6] was to establish a separation
according to the obstruction-free step complexity
measure, which counts the maximum number of
steps to complete an operation once the opera-
tion encounters no more contention. They knew
they could implement obstruction-free DCAS with
constant obstruction-free step complexity, and
attempted to prove this impossible for lock-free
DCAS, but instead achieved such an algorithm. These
experiences suggest that, in addition to their di- 
rect advantages, obstruction-free algorithms may 
provide a useful stepping stone to algorithms with 
stronger progress properties. 

Finally, while a number of contention 
managers have proved effective for various 
workloads, it is an open question whether 
a single contention manager can adapt to be 
competitive with the best on all workloads, 
and how close it can come to making optimal 
contention management decisions. Experience to 
date suggests that this will be very challenging 
to achieve. Therefore, as in any system, the 
first priority should be avoiding contention 
in the first place. Fortunately, transactional 
memory has the potential to make this much 
easier than in lock-based programming models, 
because it offers the benefits of fine-grained 
synchronization without the programming 
complexity that accompanies fine-grained 
locking schemes.


Problem Definition


A radio network is modeled as a directed, 
strongly connected graph G with n nodes. 
The nodes of a network G correspond to 
transmitting/receiving wireless devices, and 
directed edges represent their immediately 
reached neighbors: if a node w is within the 
transmission range of a node v, then G contains 
an edge (v,w). We call w an out-neighbor of v 
and v an in-neighbor of w. 

Each node v has a unique label $\ell_v$ from the set
[N] = {1,...,N}, where N = O(n). Initially,
each node knows only its label and the values of
n and N.

The time is divided into discrete time steps. It 
is assumed that nodes have unlimited computing 
power and can perform arbitrary computations 
within one time step. However, only one trans- 
mission or message receipt is allowed in one time 
step. Each node has its own local clock, whose
initial value at the time of its activation is 0. All
local clocks run at the same speed.

A message M transmitted in time step t by a
node v is sent instantly to all its out-neighbors.
However, an out-neighbor w of v successfully
receives M in time step t only if no collision
occurred in this time step, that is, if no other in-
neighbor of w transmits in step t. Collision cannot
be distinguished from the background noise: if w
does not receive any message in time step t, it
knows that either none of its in-neighbors trans-
mitted in step t, or that at least two did, but it does
not know which of these two events occurred.
It is assumed that nodes only transmit wake-up 
signals (to their neighbors); no other messages 
are used. 
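
The reception rule can be made concrete with a short simulation of a single time step. This is a minimal sketch: the function name `receive_step` and the adjacency-dictionary representation are assumptions, as is the choice that a node transmitting in a step receives nothing in that step (suggested by the rule that only one transmission or receipt is allowed per step).

```python
def receive_step(out_neighbors, transmitters):
    """Simulate one time step of the radio model.

    out_neighbors maps each node to the list of its out-neighbors;
    transmitters is the set of nodes transmitting in this step.
    Returns a dict {receiver: sender} for every node that hears exactly
    one in-neighbor. Nodes that hear nobody and nodes that suffer a
    collision are both absent, mirroring the fact that the two cases
    cannot be told apart.
    """
    transmitters = set(transmitters)
    senders_heard = {}
    for v in transmitters:
        for w in out_neighbors.get(v, ()):
            senders_heard.setdefault(w, []).append(v)
    return {
        w: senders[0]
        for w, senders in senders_heard.items()
        if len(senders) == 1 and w not in transmitters
    }
```

For instance, with edges 1→3, 2→3, and 2→4, simultaneous transmissions by nodes 1 and 2 collide at node 3, so only node 4 receives the signal.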

A wake-up schedule is a vector $\omega = (\omega_x)_{x \in V}$,
where $\omega_x$ denotes the time step in which x wakes
up spontaneously. For any set $X \subseteq V$, $\omega_X$
denotes the earliest wake-up time step in X, i.e.,
$\omega_X = \min_{x \in X} \omega_x$. Without loss of generality,
one can assume that $\omega_V = \min_{x \in V} \omega_x = 0$. A
wake-up network is the pair $(G, \omega)$, where G is a
radio network and $\omega$ is a wake-up schedule.

A deterministic wake-up protocol W is a func-
tion that, for each label $\ell$ and for each $\tau$ =
1,2,3,..., given all past messages received by
the node v with label $\ell_v = \ell$, specifies whether v
will transmit the wake-up signal in time step $\tau$
since its activation. A randomized wake-up pro-
tocol is defined for each node as a probability dis-
tribution over the class of deterministic protocols
for that node.

The running time of a wake-up protocol W is
the smallest T such that, for any wake-up net-
work $(G, \omega)$, all nodes are activated by time T.


Synchronizers 

All efficient deterministic wake-up algorithms are 
based on the combinatorial notion of a radio 
synchronizer, also called a synchronizer for short
(see also a simpler notion of related structures 
called selectors [5], efficiently exploited in the 
context of broadcasting in radio networks). 

Let $S = \{S^x\}_{x \in [N]}$, where each
$S^x = S^x_1 S^x_2 \ldots S^x_m$ is a 0-1 sequence of length m. The
set S is an (N, k, m)-synchronizer if it satisfies the
following property:

(*) For any nonempty set $X \subseteq [N]$ of cardinal-
ity at most k, and for any wake-up schedule $\omega$,
there exists t, where $\omega_X < t \le \omega_X + m$, such
that

$$\sum_{x \in X} S^x_{t - \omega_x} = 1.$$

It is assumed here that $S^x_i = 0$ for $i \le 0$.

The set S as above can be interpreted as a
transmission protocol, where $S^x_i = 1$ indicates
that node x transmits in time step $\omega_x + i$. Thus
the condition (*) states that, in at most m time
steps after the first node in X wakes up, there
will be a time step when exactly one node in X
transmits.

More details about radio synchronizers and 
synchronization protocols can be found in the 
survey [8]. 
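
For very small parameters, condition (*) can be checked by brute force, which may help in reading the definition. The sketch below is an illustration only: the function names are assumptions, the window is taken to be $\omega_X < t \le \omega_X + m$, and wake-up times are normalized so that the earliest one is 0 and the others are offsets of at most `max_offset`.

```python
from itertools import combinations, product


def transmits(seq, wake, t):
    """True if a node with sequence seq, woken at step wake, transmits at step t."""
    i = t - wake                      # 1-based position in the node's sequence
    return 1 <= i <= len(seq) and seq[i - 1] == 1


def first_isolated_step(S, wake):
    """Earliest step in the window with exactly one transmitter, or None.

    S maps labels to 0-1 lists of equal length m; wake maps the labels
    of the set X to their wake-up steps.
    """
    start = min(wake.values())
    m = len(next(iter(S.values())))
    for t in range(start + 1, start + m + 1):
        if sum(transmits(S[x], wake[x], t) for x in wake) == 1:
            return t
    return None


def is_synchronizer(S, k, max_offset):
    """Exhaustively test (*) for all X with |X| <= k and offsets in 0..max_offset."""
    labels = sorted(S)
    for size in range(1, k + 1):
        for X in combinations(labels, size):
            for offsets in product(range(max_offset + 1), repeat=size):
                if min(offsets) != 0:
                    continue          # normalize: the first node wakes at step 0
                if first_isolated_step(S, dict(zip(X, offsets))) is None:
                    return False
    return True
```

The search is exponential in k and serves only to make the definition concrete; it says nothing about how synchronizers of the lengths stated in the lemmas might be constructed.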


Key Results 


A Deterministic Wake-Up Protocol 


Lemma 1 ([3, 4]) Let $C \ge 31$ be an integer
constant. For each N and $k \le N$, there exists
an (N,k,m)-synchronizer with $m = Ck^2 \log N$.


The key ingredient of the wake-up algorithm
in [3, 4] ([3] is a conference version of [4]) is
an application of an (N,k,m)-synchronizer with
$k = N^{1/3}$ and $m = CN^{2/3} \log N$, where
C = 31. The analysis of this algorithm relies
on the fact that it is sufficient to prove that
the algorithm satisfies the claimed time bounds
for path graphs. A directed graph H is
called a path graph if the nodes of H can
be partitioned into sets $L_i$, i = 0,...,D,
each with a distinguished node $v_i \in L_i$, and
the edges of H are of the form $(v, v_{i+1})$,
where $0 \le i < D$ and $v \in L_i$. Moreover,
$L_D = \{v_D\}$.


Theorem 1 ([3, 4]) There exists a deterministic
protocol that completes the wake-up process in
each n-node strongly connected directed graph in
time $O(n^{5/3} \log n)$.


A Randomized Wake-Up Protocol

In [7], the authors presented a randomized wake-
up protocol Probability Increase for complete
networks working in time $O(\log n \log(1/\epsilon))$ with
probability $1 - \epsilon$ (see [6] for deterministic wake-
up algorithms for complete graphs). Using this
protocol along with appropriate formal analysis, 
one can obtain a randomized Monte Carlo wake- 
up protocol for general multi-hop radio networks. 


Theorem 2 ([3, 4]) One can build a random-
ized protocol which completes wake-up in time
$O(D \log n \log(n/\epsilon))$ in each wake-up network
with n nodes and diameter D with probability at
least $1 - \epsilon$.


The Monte Carlo protocol from Theorem 2 
can be modified to obtain a Las Vegas protocol with
low expected running time. 


Theorem 3 ([3, 4]) One can build a randomized
protocol which completes wake-up in expected
time $O(D \log^2 n)$ in each wake-up network with
n nodes and diameter D.


None of the above randomized protocols
requires labels.


Applications 


Universal Synchronizers and Faster Wake-Up Protocols

The notion of a synchronizer has been general- 
ized to a universal synchronizer [1]. 


Let $g : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a nondecreasing
function. Let $S = \{S^x\}_{x \in [N]}$, where each
$S^x = S^x_1 S^x_2 \ldots S^x_m$ is a 0-1 sequence of
length m = g(N, N). The set S is an (N, g)-universal
synchronizer if it satisfies the following property:

(*) For any nonempty set $X \subseteq [N]$ and for
any wake-up schedule $\omega$, there exists t, where
$\omega_X < t \le \omega_X + g(N, |X|)$, such that

$$\sum_{x \in X} S^x_{t - \omega_x} = 1.$$


Chlebus and Kowalski proved in [1] that there
exist (N, g)-universal synchronizers for g(k) =
$O(k \min\{k, \sqrt{n}\} \log n)$. Using this result they
showed that there exists a wake-up protocol run-
ning in time $O(n^{3/2} \log n)$. In [2], Chlebus et
al. provided an existential proof of the fact that 
much shorter universal synchronizers exist and 
obtained a corresponding faster wake-up proto- 
col. 


Lemma 2 ([2]) For each N there exists an
(N, g)-universal synchronizer for g(N,k) =
ck log k log N, where c is a fixed constant.


Theorem 4 ([2]) There exists a deterministic
protocol that completes the wake-up process in
each n-node strongly connected directed graph
in time $O(n \log^2 n)$.


Leader Election and Clock Synchronization 

In [3,4], applications of wake-up protocols for the 
problems of leader election and clock synchro- 
nization were considered. 

In the leader election problem, the goal 
is to designate one node as the leader, and 
to announce its identity to all nodes in the 
network. In the clock synchronization problem, 
upon the completion of the protocol, all nodes 
must agree on a common global time. For 
clock synchronization, messages may include 
numerical values representing the global 
time. 

It has been shown in [3, 4] that any wake- 
up protocol W (deterministic or randomized) 
can be transformed into a leader election 
protocol or a clock synchronization protocol 
with only a logarithmic overhead. The leader 
election protocol is obtained by appropriately
composing O(log n) executions of
a wake-up protocol, in which nodes gradually
learn consecutive bits of the largest label
in the network. In the clock synchronization
protocol, the leader is elected first and then 
it broadcasts its clock state over the whole 
network. 
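
One way to picture how repeated wake-up executions can reveal the largest label bit by bit is sketched below. This is an illustrative abstraction, not the protocol of [3, 4]: each wake-up execution is collapsed into the single question "did any remaining candidate start it?", the radio model is ignored entirely, and the function name `elect_leader` is an assumption.

```python
def elect_leader(labels, N):
    """Return the largest label, learned one bit per abstract wake-up round.

    labels are the distinct node labels, all taken from {1, ..., N}.
    """
    bits = max(1, N.bit_length())
    candidates = set(labels)
    prefix = 0                        # bits of the winning label learned so far
    for b in range(bits - 1, -1, -1):
        # Candidates whose label has bit b set try to wake the network.
        initiators = {x for x in candidates if (x >> b) & 1}
        woke = bool(initiators)       # stands in for one wake-up execution
        prefix = (prefix << 1) | int(woke)
        if woke:
            candidates = initiators   # nodes with a 0 in this bit drop out
    return prefix
```

The loop runs once per label bit, which matches the logarithmic number of wake-up executions mentioned above; since N = O(n), this is O(log n) rounds.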


Open Problems 


The exact complexity of the wake-up problem 
is not known: there is a logarithmic gap
between the complexities of the best known
protocols and lower bounds. No efficient
algorithms for the construction of the (universal)
synchronizers described in Lemmata 1 and 2
are known (i.e., polynomial-time constructions
with a polylogarithmic overhead in the
length), and thus the results from Theorems 1
and 4 are nonconstructive as well. It is not
known whether the logarithmic overhead in
the complexity of leader election and clock
synchronization with respect to wake-up is
necessary.


Problem Definition


The wavelet tree is a data structure that represents 
a recursive partition of a sequence S of length n 
according to its symbols. Letting $\Sigma = \{1, \ldots, \sigma\}$
be the alphabet of symbols of S, the wavelet tree
for S has the root representing S itself and $\sigma$
leaves representing the positions of the symbols:
leaf $c \in \Sigma$ represents all the positions i such
that S[i] = c and $1 \le i \le n$. The internal
nodes describe how the symbols are grouped. In 
the original wavelet tree, nodes are binary and 
thus there are two groups, called the 0-group and 
the 1-group, which form an alphabet partition. In 
the multi-ary wavelet tree, the nodes are obtained 
by forming more than two groups each time. We 
focus on binary wavelet trees in the following. 
For example, consider the sequence S =
SENSELESSNESS# in Fig. 1. Here we divide
the symbols into two groups {E, L} and {N, S, #},
giving rise to the two children of the root: the 
left child contains the subsequence of S ob- 
tained by copying the symbols in {E, L}; the right 
child contains the subsequence of S obtained 
by copying the rest of the symbols (which are 
in {N,S,#}). The partition of {E,L} into {E}
and {L} produces two leaves. The partition of
{N, S, #} into {N} and {S, #} gives rise to a leaf 
and an internal node. The latter represents the 
partition of {S, #} giving rise to two leaves. 

In general each internal node of the wavelet 
tree represents a subsequence S’ of the input se- 
quence S, obtained by selecting certain symbols 
from S. More precisely, if $\Sigma'$ is the alphabet for
the symbols in S’, then S’ is formed by selecting
all the symbols of S belonging to $\Sigma'$. Note that
$\Sigma' = \{c\}$ if and only if the node storing S’ is
the leaf whose associated symbol is c. For an
internal node, its two children are determined by
the choice of the 0-group $\Sigma'_0$ and of the 1-group
$\Sigma'_1$ partitioning $\Sigma' = \Sigma'_0 \cup \Sigma'_1$. To this end, a
bitvector $B_{S'}$ is associated with S’, where the 0’s
mark which positions of S’ contain symbols from
the 0-group $\Sigma'_0$ and the 1’s mark which positions contain
symbols from the 1-group $\Sigma'_1$.

It is worth noting that any choice for the 
recursive alphabet partitioning in 0- and 1-groups 
can be translated into a simple dichotomy test at 
each node by suitably reordering the alphabet $\Sigma$:
without loss of generality, we assume that given
a node representing S’ over alphabet $\Sigma'$, there
exists a symbol $c' \in \Sigma'$ such that c belongs to
the 0-group $\Sigma'_0$ if and only if $c \in \Sigma'$ and $c < c'$.

Finally, the sequences S and S’ can be actually 
dropped from the nodes of the wavelet tree: just 
knowing the symbols from $\Sigma$ associated with 
its leaves allows us to reconstruct the dropped 
sequences in its internal nodes as we discuss next. 
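
The recursive partitioning can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build`, the tuple layout of a node, and the rule of splitting the ordered alphabet at its midpoint are assumptions made here, chosen so that the running example reproduces the groups {E, L} and {N, S, #} described above.

```python
def build(seq, alphabet):
    """Return a leaf (the symbol itself) or an internal node
    (bitvector, left child, right child).

    Assumption of this sketch: the 0-group is the first half of the
    ordered alphabet and the 1-group is the rest.
    """
    if len(alphabet) == 1:
        return alphabet[0]
    mid = len(alphabet) // 2
    zero_group = set(alphabet[:mid])
    # 0 marks symbols of the 0-group, 1 marks symbols of the 1-group
    bits = [0 if c in zero_group else 1 for c in seq]
    left = build([c for c in seq if c in zero_group], alphabet[:mid])
    right = build([c for c in seq if c not in zero_group], alphabet[mid:])
    return (bits, left, right)


root = build("SENSELESSNESS#", ["E", "L", "N", "S", "#"])
root_bits = "".join(map(str, root[0]))
left_bits = "".join(map(str, root[1][0]))
```

The subsequences are used only during construction; the returned nodes keep just the bitvectors and the leaf symbols.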


Key Results 


Despite its simplicity, the wavelet tree is a versa- 
tile data structure that offers solutions to a variety 
of situations using small additional space. 


• Compressed sequences. Sequence $S$ can be 
stored using a number of bits close to the 
0-order entropy and still supporting random 
access and other operations such as rank and 
select of individual symbols. 

• Geometric points and 2D data. The leaves 
represent the individual points in x-order, while 
the root represents the same points but in 
y-order. Range and percentile queries can be 
performed in this way. 

• Permutations, shufflings, and reorderings. The 
leaves represent the elements in a certain order 
and the root represents the same set in a 
permuted order. Mapping these two orders can 
be done efficiently. 

Wavelet Trees, Fig. 1 A wavelet tree for the 
sequence $S$ = SENSELESSNESS# with 
$\Sigma = \{E, L, N, S, \#\}$. Only the symbols in the 
leaves and the bitvectors are actually stored 


As for the construction of the wavelet tree, it 
can be easily done in $O(n \log \sigma)$ time and space. 
More sophisticated algorithms have been developed 
to lower the construction time and/or the 
additional working space. In the following, we 
focus on the usage of the original wavelet tree. 


Compressed Sequences 

The first natural question is how to access 
symbol $S[i]$ from the wavelet tree for $S$. We can 
only use the bitvectors $B_{S'}$ in the internal nodes 
and the mapping from the leaves to the symbols 
of $\Sigma$. We start out from the root, check bit $B_S[i]$, 
and count the number $i'$ of bits equal to $B_S[i]$ in 
the first $i$ positions of $B_S$. After that, we repeat 
the step on the left child (if $B_S[i] = 0$) or the 
right child (if $B_S[i] = 1$), setting the new value 
of $i = i'$ and using the bitvector $B_{S'}$ of that child. 
Eventually we end up in a leaf, and the symbol $c$ 
corresponding to that leaf gives the answer that 
$S[i] = c$. 
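
The access procedure just described can be sketched as follows. The helper `build`, the tuple node layout, and the midpoint split of the alphabet are assumptions of this sketch, and the count of equal bits is done by a plain linear scan rather than by a constant-time rank structure.

```python
def build(seq, alphabet):
    # Leaf: the symbol itself. Internal node: (bitvector, left, right).
    if len(alphabet) == 1:
        return alphabet[0]
    mid = len(alphabet) // 2
    zero = set(alphabet[:mid])
    bits = [0 if c in zero else 1 for c in seq]
    return (bits,
            build([c for c in seq if c in zero], alphabet[:mid]),
            build([c for c in seq if c not in zero], alphabet[mid:]))


def access(node, i):
    """Return S[i] (1-based) using only bitvectors and leaf symbols."""
    while isinstance(node, tuple):
        bits, left, right = node
        b = bits[i - 1]
        # i' = number of bits equal to b among the first i positions
        i = bits[:i].count(b)
        node = right if b else left
    return node


S = "SENSELESSNESS#"
root = build(S, ["E", "L", "N", "S", "#"])
decoded = "".join(access(root, i) for i in range(1, len(S) + 1))
```

Calling `access` on every position reconstructs the whole sequence, which illustrates why the sequences themselves need not be stored in the nodes.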

Regarding the time and space complexity, we 
need an operation to count how many 1's occur in 
the first $i$ positions of a bitvector (as the number 
of 0's can be obtained by subtracting this count 
from $i$). This operation is called rank in the 
literature and takes constant time by preprocessing 
the bitvector and adding a little-oh number of bits 
to it. In this way, the cost of the access operation is 
given by the height of the wavelet tree, which is 
$O(\log \sigma)$ in case of a balanced shape. 
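
To give a flavor of the preprocessing behind rank, the following sketch stores one cumulative count per block and scans only inside a block. It is a simplification under stated assumptions: the class name `RankBitvector` and the block size are illustrative, and the structures referred to above use multi-level directories and table lookups to reach constant time with a little-oh number of extra bits, which this single-level version does not achieve.

```python
class RankBitvector:
    def __init__(self, bits, block=8):
        self.bits = bits
        self.block = block
        # prefix[j] = number of 1s in bits[0 : j * block]
        self.prefix = [0]
        for start in range(0, len(bits), block):
            self.prefix.append(self.prefix[-1] + sum(bits[start:start + block]))

    def rank1(self, i):
        """Number of 1s among the first i positions."""
        j = i // self.block
        # stored count for the full blocks, short scan inside the last block
        return self.prefix[j] + sum(self.bits[j * self.block:i])

    def rank0(self, i):
        """Number of 0s among the first i positions, obtained by subtraction."""
        return i - self.rank1(i)


bv = RankBitvector([1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1], block=4)
```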

As for the space complexity, note that any 
binary tree shape with $\sigma$ leaves is feasible. Using 
a Huffman tree shape, less frequent symbols cor- 
respond to deeper leaves. Note that each symbol 
occurrence in a leaf can be charged a bit from 
each bitvector in its ancestors. Equivalently, the 
sum of the lengths of the bitvectors in all the in- 
ternal nodes of the wavelet tree is equal to the sum 
of the lengths of the Huffman encodings of the 
symbols of the input sequence S. In other words, 
the space required by the bitvectors is equal to 
the space achieved by the Huffman encoding. The 
additional rank data structures use little-oh of 
that space. Letting $H_0 \le \log \sigma$ (logarithms in 
base 2) denote the 0-order entropy of $S$, the total 
space to store a wavelet tree is therefore 
$nH_0 + o(nH_0)$ bits, which can be lowered to 
$nH_0 + o(n)$ with additional machinery. For compressible 
sequences $S$, this is better than storing them in 
$n \log \sigma$ bits with the standard format. In general, 
any prefix-free encoding of the symbols in $\Sigma$ can 
be used in place of the Huffman coding, giving 
the same number of bits as the chosen encoding: 
it suffices to choose as a shape the resulting prefix 
tree of the chosen encoding of the symbols in 
$\Sigma$. (Note that storing the tree shape and symbol 
mappings requires $O(\sigma \log \sigma)$ further bits.) 
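
As a small numerical check of these quantities, the sketch below computes the total Huffman code length, $nH_0$, and $n \log \sigma$ for the running example. The function names are illustrative; the Huffman total is obtained as the sum of the weights of the merged nodes, which equals the total length of the bitvectors in a Huffman-shaped wavelet tree.

```python
import heapq
from collections import Counter
from math import log2


def huffman_total_bits(seq):
    """Total length of the Huffman encoding of seq (sum of merged weights)."""
    heap = list(Counter(seq).values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def zero_order_entropy_bits(seq):
    """n * H_0(seq), with logarithms in base 2."""
    n = len(seq)
    return sum(f * log2(n / f) for f in Counter(seq).values())


S = "SENSELESSNESS#"
huff = huffman_total_bits(S)              # bits in a Huffman-shaped tree
nh0 = zero_order_entropy_bits(S)          # entropy bound n * H_0
plain = len(S) * log2(len(set(S)))        # n * log(sigma)
```

For SENSELESSNESS# the Huffman total is 28 bits, which lies between $nH_0$ (about 27.8) and $n \log \sigma$ (about 32.5).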

Interestingly, the above space bound can 
be obtained using any shape if the bitvectors 
are stored in compressed format. For example, 
one compressed bitvector representation stores 
a bitvector of length $m$ with $k$ 1's using the 
information-theoretic minimum of $\log\binom{m}{k} + o(m)$ 
bits and supports constant time rank and 
select operations. The latter operation returns 
the position of the $j$th 1 in the bitvector (same 
for the $j$th 0). It can be shown that for any shape 
of the wavelet tree, summing the $\log\binom{m}{k} + o(m)$ 
contribution of all the bitvectors in its nodes still 
gives a total space of $nH_0 + o(n)$ bits to store the 
wavelet tree. In other words, the 0-order entropy 
bound can be achieved independently of the tree 
shape. 

As a by-product of what we discussed above, 
the wavelet tree allows us to extend the rank 
and select operations from a bitvector to any 
sequence over an alphabet $\Sigma$. To see why, 
suppose we want to know how many occurrences of 
symbol $c$ appear in the first $i$ positions of $S$. We 
perform the same steps as described above for the 
access operation (where we initially set $i' = i$) 
except that now we already know that the path to 
follow is from the root to the leaf representing $c$. 
In the generic step, we can easily test if $c$ belongs 
to the 0- or 1-group of the current node, and 
branch according to the target leaf, updating the 
value of $i'$. When we reach the leaf for 
$c$, we return the corresponding value of 
$i'$ as the answer for rank of $c$, since it tells 
how many $c$'s occur up to position $i$. As for the 
select operation on $c$, suppose that we want 
to identify the $j$th occurrence of $c$ in $S$. This 
time we proceed from the leaf corresponding to 
$c$ backwards to the root. We initially set $i' = j$ 
and then reverse the branching process: at the 
generic step, we are in a node storing $S'$ and on 
position $i'$. We reach the parent $p$ of the current 
node, and select the $i'$th 0 (if arriving from the 
left child) or the $i'$th 1 (if arriving from the right 
child) in the bitvector stored in $p$. We set $i'$ to be 
the resulting position and iterate. Eventually we 
reach the root and return the current value of $i'$ as 
the answer for select of $c$. 
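
Both operations can be sketched as follows. As before, `build`, the tuple node layout, and the midpoint alphabet split are assumptions of this sketch; bitvector rank and select are done by linear scans, and `select` uses recursion so that the work on each bitvector happens on the way back from the leaf to the root.

```python
def build(seq, alphabet):
    # Leaf: the symbol itself. Internal node: (bitvector, left, right).
    if len(alphabet) == 1:
        return alphabet[0]
    mid = len(alphabet) // 2
    zero = set(alphabet[:mid])
    bits = [0 if c in zero else 1 for c in seq]
    return (bits,
            build([c for c in seq if c in zero], alphabet[:mid]),
            build([c for c in seq if c not in zero], alphabet[mid:]))


def rank(node, alphabet, c, i):
    """Number of occurrences of c among the first i positions of S."""
    while isinstance(node, tuple):
        bits, left, right = node
        mid = len(alphabet) // 2
        if c in alphabet[:mid]:
            i = bits[:i].count(0)
            node, alphabet = left, alphabet[:mid]
        else:
            i = bits[:i].count(1)
            node, alphabet = right, alphabet[mid:]
    return i


def select(node, alphabet, c, j):
    """Position (1-based) of the j-th occurrence of c in S."""
    if not isinstance(node, tuple):
        return j  # at the leaf, i' = j
    bits, left, right = node
    mid = len(alphabet) // 2
    if c in alphabet[:mid]:
        b, pos = 0, select(left, alphabet[:mid], c, j)
    else:
        b, pos = 1, select(right, alphabet[mid:], c, j)
    # locate the pos-th bit equal to b in this node's bitvector
    seen = 0
    for p, bit in enumerate(bits, start=1):
        seen += (bit == b)
        if seen == pos:
            return p
    raise ValueError("c occurs fewer than j times")


S = "SENSELESSNESS#"
A = ["E", "L", "N", "S", "#"]
root = build(S, A)
```

For instance, `rank(root, A, "S", 9)` counts the occurrences of S among the first nine symbols, and `select(root, A, "N", 2)` locates the second N.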

Time cost is proportional to the height of the 
wavelet tree, which is $O(\log \sigma)$ in case of a 
balanced shape. Using multi-ary wavelet trees, 
the height can be reduced and so can the cost, 
achieving $O(1 + \log \sigma / \log w)$ time with a word 
size $w = \Omega(\log n)$. 


Geometric Points and Two-Dimensional Data 

Given a set of $n$ points $(x_i, y_i)$ in the plane, or 
equivalently 2D data, where $x_1 < x_2 < \cdots < x_n$, 
we can use the wavelet tree as a space-efficient 
data structure for storing and querying them. We 
store these coordinates in two vectors $X$ and $Y$, 
such that $X[i] = x_i$ and $Y[i] = y_i$ for $1 \le i \le n$. 
We then build the wavelet tree where $S = Y$ and 
$\Sigma$ is the set of distinct values in $Y$. 


As a result, we obtain a compacted hierarchical 
space decomposition for the $n$ points. To 
see why, we can conceptually think of the $n$ 
points as belonging to an $n \times n$ grid stored in 
the root of the wavelet tree, where the actual 
coordinates are those stored in $X$ and $Y$. The 
geometric interpretation of alphabet partitioning 
in 0- and 1-groups is that of choosing a value 
$y'$ and splitting the points in two groups, those 
having coordinate $y_i \le y'$ (the 0-group) and 
those having $y_i > y'$ (the 1-group). Let $n_0$ be 
the size of the 0-group for the root and $n_1$ be the 
size of the 1-group, where $n = n_0 + n_1$. In each 
group, only the rows and the columns that still 
contain points survive. As a result, two grids of 
size $n_0 \times n_0$ and $n_1 \times n_1$ are produced from one 
of size $n \times n$. Here, the left child corresponds to a 
subsequence $Y'_0$ of $Y$ that represents the $n_0$ points 
(with $y_i \le y'$) in the grid of size $n_0 \times n_0$, and 
the right child represents the sequence $Y'_1$ of $n_1$ 
points (with $y_i > y'$) in the grid of size $n_1 \times n_1$. 
The leaves of the wavelet tree are in y-order, and 
each of them stores the points sharing the same 
y-coordinate. 

As for the storage, we observe that the values 
in $X$ can be represented compactly as they are in 
x-order. The values of $Y$ do not need to be stored 
separately as they are represented in the wavelet 
tree. Total space is $O(n \log n)$ bits but can be less 
if the sequences of coordinates are compressible. 

The supported operations exploit the aforementioned 
hierarchical space decomposition for 
the two-dimensional data as illustrated next. For 
example, consider the classical 2D range query, 
reporting (or counting) the points contained in the 
range $[a \ldots b] \times [c \ldots d]$. If the current cutting 
coordinate $y'$ is outside the range $[c \ldots d]$, we move 
to one of the two children; otherwise, we branch 
on both children, using respectively the ranges 
$[a \ldots b] \times [c \ldots y']$ and $[a \ldots b] \times (y' \ldots d]$. Note 
that the restriction $[a \ldots b]$ on the x-coordinates 
can be used to test if the grid represented by the 
reached child has a nonempty intersection with 
the range: the mechanism is the same as that of 
rank. This means that each reported occurrence 
potentially requires a traversal down to a leaf; 
thus the cost is proportional to the number of 
reported points times the height of the wavelet 
tree. A refined version of this idea allows for 
$O(\log n / \log\log n)$ time for a counting query. 

Another interesting use is quantile queries. For 
the range $[a \ldots b]$, consider the values in 
$V_{a,b} = \{y_i \mid a \le x_i \le b\}$ obtained as the y-coordinates 
of the points in that range. For any given $a$, $b$, 
and $k$, the query asks to find the $k$th element in 
$V_{a,b}$. Using the above wavelet tree, we can find the 
rank $i_a$ of $a$ and that $i_b$ of $b$. Then, using rank 
operations on the bitvector $B_S$ in the root, we can 
count how many 0's and 1's are in $B_S[i_a \ldots i_b]$. 
If there are at least $k$ 0's, we know that the $k$th 
value in $V_{a,b}$ is smaller than or equal to the 
cutting coordinate $y'$, and we iterate in the left 
child; otherwise, we subtract the number of 0's 
from $k$, and we iterate in the right child. When 
we reach a leaf, we return the associated value as 
the answer for the quantile query. Along the same 
lines, we can also report the topmost $k$ values in 
$V_{a,b}$. Once again, the cost is proportional to the 
wavelet tree height for each reported value. 
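
The quantile query can be sketched as follows, taking as input directly the ranks `lo` and `hi` of the two x-range endpoints (the values $i_a$ and $i_b$ in the text). The helper `build`, the node layout, the midpoint split of the sorted distinct y-values, and the sample data are assumptions of this sketch; rank on a bitvector is again a linear scan.

```python
def build(seq, alphabet):
    # Leaf: the value itself. Internal node: (bitvector, left, right).
    if len(alphabet) == 1:
        return alphabet[0]
    mid = len(alphabet) // 2
    zero = set(alphabet[:mid])
    bits = [0 if c in zero else 1 for c in seq]
    return (bits,
            build([c for c in seq if c in zero], alphabet[:mid]),
            build([c for c in seq if c not in zero], alphabet[mid:]))


def quantile(node, alphabet, lo, hi, k):
    """k-th smallest value among Y[lo..hi] (1-based, inclusive).

    Assumes 1 <= lo <= hi <= n and 1 <= k <= hi - lo + 1.
    """
    while isinstance(node, tuple):
        bits, left, right = node
        mid = len(alphabet) // 2
        zeros_before = bits[:lo - 1].count(0)
        zeros_upto = bits[:hi].count(0)
        zeros = zeros_upto - zeros_before      # 0s inside the range
        if k <= zeros:
            lo, hi = zeros_before + 1, zeros_upto
            node, alphabet = left, alphabet[:mid]
        else:
            k -= zeros
            # map the range to the 1s, i.e. to positions in the right child
            lo, hi = (lo - 1 - zeros_before) + 1, hi - zeros_upto
            node, alphabet = right, alphabet[mid:]
    return node


Y = [3, 1, 4, 1, 5, 9, 2, 6]          # y-coordinates listed in x-order
alphabet = sorted(set(Y))
root = build(Y, alphabet)
third = quantile(root, alphabet, 2, 6, 3)   # 3rd smallest of Y[2..6]
```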


Permutations, Shufflings, and Reorderings 

The above discussion brings the combinatorial 
structure of the wavelet tree to light as we can store 
two orders inside it: the former is the order in 
the sequence stored at the root, and the latter is 
obtained by a left-to-right traversal of the leaves. 
The bitvectors are internal routers that guide how 
elements are permuted and shuffled to produce 
the reordering. Mergesort can be modeled ac- 
cording to this view: the internal nodes merge 
the content of their children, and the bitvectors 
tell who goes where in the resulting merged 
reordering. 

An immediate application of the above 
observations is setting $S$ to be a permutation $\pi$ 
of the integers in $\{1, 2, \ldots, n\}$. Traversing the 
wavelet tree upward (as in the select) 
computes $\pi(i)$, while traversing it downward 
computes the inverse permutation $\pi^{-1}$. The best cost 
is $O(\log n / \log\log n)$ time. 

In general, we can store two orders using 
the wavelet tree, one being a permutation of 
the other. Inverted lists in information retrieval 
can store document IDs in increasing order of 
enumeration, but they also want to store these 
IDs in decreasing order of importance through 
some ranking function. As is now clear, these are 
two orders that can be simultaneously preserved 
inside the wavelet tree. 


Applications 


Looking back at previous work, some ideas behind 
the wavelet tree can be found in Kärkkäinen's 
PhD thesis and in Chazelle's functional 
approach to data structures for multidimensional 
searching. The wavelet tree in its explicit and 
fully functional form has been introduced by 
Grossi, Gupta, and Vitter to store the Burrows- 
Wheeler transform (BWT) for obtaining com- 
pressed text indexes (able to support fast pattern 
searching). Its natural application is supporting 
rank and select queries for the symbols of 
the resulting compressed BWT. Since then, many 
papers have explored the properties of the wavelet 
trees in several applications. Apart from com- 
pressed full-text indexes, researchers have em- 
ployed wavelet trees in inverted lists, graphs, bi- 
nary relations, numeric sequences, colored range 
queries, XPath queries, semi-structured data, and 
frequent item sets, to name a few. The wavelet 
trie extends the wavelet tree to store a sequence 
of strings, rather than a sequence of symbols, thus 
allowing the supported operations to operate also 
on the prefixes of the strings. The wavelet matrix 
is a variant of a balanced wavelet tree, in which all 
the bitvectors on the same level are concatenated, 
and is particularly efficient for large alphabet 
size $\sigma$. 


Problem Definition 


This problem is concerned with a weighted 
version of the classical minimum connected 
dominating set problem. This problem has 
numerous motivations including wireless 
networks and distributed systems. Previous 
work [1,2,4,5,6,14] in wireless networks 
focuses on designing efficient distributed 
algorithms to construct the connected dominating 
set which can be used as the virtual backbone for 
the network. Most of the proposed methods try to 
minimize the number of nodes in the backbone 
(i.e., the number of clusterheads). However, 
in many applications, minimizing the size of 
the backbone is not sufficient. For example, in 
wireless networks different wireless nodes may 
have different costs for serving as a clusterhead, 
due to device differences, power capacities, and 
information loads to be processed. Thus, by 
assuming each node has a cost of being in the 
backbone, there is a need to study distributed 
algorithms for weighted backbone formation. 
Centralized algorithms to construct a weighted 
connected dominating set with minimum weight 
have been studied [3, 7, 9]. Recently, the work 
of Wang, Wang, and Li [12, 13] proposes 
an efficient distributed method to construct 
a weighted backbone with low cost. They proved 
that the total cost of the constructed backbone 
is within a small constant factor of the optimum 
when either the nodes’ costs are smooth (i.e., 
the maximum ratio of costs of adjacent nodes 
is bounded) or the network maximum node 
degree is bounded. To the best knowledge of 
the entry authors, this work is the first to consider 
this weighted version of minimum connected 
dominating set problem and provide a distributed 
approximation algorithm. 


Notations 

A communication graph $G = (V, E)$ over a set 
$V$ of wireless nodes has an edge $uv$ between 
nodes $u$ and $v$ if and only if $u$ and $v$ can 
communicate directly with each other, i.e., each is inside 
the transmission region of the other. Let $d_G(u)$ 
be the degree of node $u$ in a graph $G$ and $\Delta$ 
be the maximum node degree of all wireless 
nodes (i.e., $\Delta = \max_{u \in V} d_G(u)$). Each wireless 
node $u$ has a cost $c(u)$ of being in the backbone. 
Let $\delta = \max_{ij \in E} c(i)/c(j)$, where $ij$ is the edge 
between nodes $i$ and $j$, $E$ is the set of 
communication links in the wireless network $G$, and 
the maximum operation is taken on all pairs of 
adjacent nodes $i$ and $j$ in $G$. In other words, $\delta$ 
is the maximum ratio of costs of two adjacent 
nodes and can be called the cost smoothness of 
the network. When $\delta$ is bounded by some small 
constant, the node costs are smooth. When the 
transmission region of every wireless node is 
modeled by a unit disk centered at itself, the 
communication graph is often called a unit disk 
graph, denoted by UDG(V). Such networks are 
also called homogeneous networks. 

A subset $S$ of $V$ is a dominating set if each node 
in $V$ is either in $S$ or is adjacent to some node 
in $S$. Nodes from $S$ are called dominators, while 
nodes not in $S$ are called dominatees. A subset 
$B$ of $V$ is a connected dominating set (CDS) 
if $B$ is a dominating set and $B$ induces a 
connected subgraph. Consequently, the nodes in $B$ 
can communicate with each other without using 
nodes in $V - B$. A dominating set with minimum 
cardinality is called a minimum dominating 
set (MDS). A CDS with minimum cardinality is 
the minimum connected dominating set (MCDS). 
In the weighted version, assume that each node $u$ 
has a cost $c(u)$. Then a CDS $B$ is called a weighted 
connected dominating set (WCDS). A subset $B$ of 
$V$ is a minimum weighted connected dominating 
set (MWCDS) if $B$ is a WCDS with minimum 
total cost. It is well-known that finding either 
the minimum connected dominating set or the 
minimum weighted connected dominating set is 
an NP-hard problem even when $G$ is a unit disk 
graph. The work of Wang et al. studies efficient 
approximation algorithms to construct a low-cost 
backbone which can approximate the MWCDS 
problem well. For a given communication graph 
$G = (V, E, C)$ where $V$ is the set of nodes, $E$ is 
the edge set, and $C$ is the set of node costs, 
the corresponding minimum weighted connected 
dominating set problem is as follows. 


Problem 1 (Minimum Weighted Connected 
Dominating Set) 

INPUT: The weighted communication graph 
$G = (V, E, C)$. 

OUTPUT: A subset $A$ of $V$ that is a minimum weighted 
connected dominating set, i.e., (1) $A$ is a 
dominating set; (2) $A$ induces a connected subgraph; 
(3) the total cost of $A$ is minimum. 
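
Conditions (1) and (2) are easy to verify for a candidate set, which the following sketch illustrates; condition (3) is the hard part and is not addressed here. The function names, the adjacency-dictionary representation, and the five-node path used as data are assumptions of this sketch.

```python
from collections import deque


def is_connected_dominating_set(adj, B):
    """adj: dict node -> set of neighbours (undirected); B: candidate subset."""
    B = set(B)
    if not B:
        return False
    # (1) domination: every node is in B or adjacent to some node of B
    if any(v not in B and not (adj[v] & B) for v in adj):
        return False
    # (2) connectivity of the subgraph induced by B (BFS restricted to B)
    start = next(iter(B))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & B:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == B


def total_cost(cost, B):
    """Sum of the node costs c(u) over the nodes of B."""
    return sum(cost[v] for v in B)


# Hypothetical example: the path a - b - c - d - e
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
       "d": {"c", "e"}, "e": {"d"}}
cost = {"a": 1, "b": 5, "c": 2, "d": 5, "e": 1}
```

On this path, {b, c, d} is a connected dominating set, whereas {b, d} dominates every node but does not induce a connected subgraph.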


Another related problem is the independent set 
problem. A subset of nodes in a graph $G$ is an 
independent set if for any pair of nodes, there 
is no edge between them. It is a maximal 
independent set if no more nodes can be added to 
it to generate a larger independent set. Clearly, 
any maximal independent set is a dominating set. 
It is a maximum independent set (MIS) if no 
other independent set has more nodes. The 
independence number, denoted as $\alpha(G)$, of a graph 
$G$ is the size of the MIS of $G$. The $k$-local 
independence number, denoted by $\alpha^{[k]}(G)$, is 
defined as $\alpha^{[k]}(G) = \max_{u \in V} \alpha(G_k(u))$. Here, 
$G_k(u)$ is the induced graph of $G$ on the $k$-hop 
neighbors of $u$ (denoted by $N_k(u)$), i.e., $G_k(u)$ is 
defined on $N_k(u)$ and contains all edges in $G$ with 
both end-points in $N_k(u)$. It is well-known that 
for a unit disk graph, $\alpha^{[1]}(\mathrm{UDG}) \le 5$ [2] and 
$\alpha^{[2]}(\mathrm{UDG}) \le 18$ [11]. 


Key Results 


Since finding the minimum weighted con- 
nected dominating set (MWCDS) is NP- 
hard, centralized approximation algorithms for 
MWCDS have been studied [3, 7, 9]. In [9], 
Klein and Ravi proposed an approximation 
algorithm for the node-weighted Steiner tree 
problem. Their algorithm can be generalized 
to compute an $O(\log \Delta)$ approximation for 
MWCDS. Guha and Khuller [7] also studied 
the approximation algorithms for node-weighted 
Steiner tree problem and MWCDS. They 
developed an algorithm for MWCDS with an 
approximation factor of $(1.35 + \epsilon)\log \Delta$ for 
any fixed $\epsilon > 0$. Recently, Ambühl et al. [3] 
provided a constant approximation algorithm 
for MWCDS under the UDG model. Their 
approximation ratio is bounded by 89. All 
these algorithms are centralized algorithms, 
while the applications in wireless ad hoc 
networks prefer distributed solutions for 
MWCDS. 

In [12, 13], Wang et al. proposed a distributed 
algorithm that constructs a weighted connected 
dominating set for a wireless ad hoc network G. 
Their method has two phases: the first phase 
(clustering phase, Algorithm 1 in [12, 13]) 
is to find a set of wireless nodes as the 
dominators (clusterheads) and the second 
phase (Algorithm 2 in [12, 13]) is to find 
a set of nodes, called connectors, to connect 
these dominators to form the final backbone. 
Wang et al. proved that the total cost of 
the constructed backbone is no more than 
$\min(\alpha^{[2]}(G)\log(\Delta + 1), (\alpha^{[1]}(G) - 1)\delta + 1) + 2\alpha^{[1]}(G)$ 
times the optimum solution. 

Algorithm 1 first constructs a maximal 
independent set (MIS) using the classical greedy method 
with the node cost as the selection criterion. For 
each node $v$ in the MIS, it then runs a local greedy set 
cover method on the local neighborhood $N_2(v)$ 
to find some nodes ($\mathrm{GRDY}_v$) to cover all one-hop 
neighbors of $v$. If $\mathrm{GRDY}_v$ has a total cost smaller 
than $v$, then it uses $\mathrm{GRDY}_v$ to replace $v$, which 
further reduces the cost of the MIS. The following 
theorem on the total cost of this selected set is 
proved in [12, 13]. 
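
The first step of the clustering phase can be illustrated by a centralized stand-in for the greedy selection. This is only a sketch under assumptions: the method in [12, 13] is distributed, whereas the code below processes nodes sequentially in order of increasing cost, breaks ties by node name (an assumption made here), and omits the subsequent local set-cover replacement step.

```python
def greedy_mis_by_cost(adj, cost):
    """Maximal independent set picked greedily, cheapest nodes first.

    adj: dict node -> set of neighbours; cost: dict node -> c(u).
    """
    selected = set()
    for v in sorted(adj, key=lambda u: (cost[u], u)):
        # v joins only if none of its neighbours was selected before it
        if not (adj[v] & selected):
            selected.add(v)
    return selected


# Hypothetical example: the path a - b - c - d - e
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
       "d": {"c", "e"}, "e": {"d"}}
cost = {"a": 1, "b": 5, "c": 2, "d": 5, "e": 1}
mis = greedy_mis_by_cost(adj, cost)
```

Because the result is a maximal independent set, it is also a dominating set, so its nodes can serve as the initial dominators.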


Theorem 1 For a network modeled by a graph 
$G$, Algorithm 1 (in [12, 13]) constructs a 
dominating set whose total cost is no more than 
$\min(\alpha^{[2]}(G)\log(\Delta + 1), (\alpha^{[1]}(G) - 1)\delta + 1)$ times 
the optimum. 


Algorithm 2 finds some connectors among all 
the dominatees to connect the dominators into 
a backbone (CDS). It forms a CDS by finding 
connectors to connect any pair of dominators u 
and v if they are connected in the original graph 
$G$ with at most 3 hops. A distributed algorithm 
to build an MST is then performed on the CDS. 
The following theorem of the total cost of these 
connectors is proved in [12, 13]. 


Theorem 2 The connectors selected by 
Algorithm 2 (in [12, 13]) have a total cost no more 
than $2\alpha^{[1]}(G)$ times the optimum for 
networks modeled by $G$. 


Combining Theorems 1 and 2, the following 
theorem is the main contribution of the work of 
Wang et al. 


Theorem 3 For any communication graph 
$G$, Algorithm 1 and Algorithm 2 construct 
a weighted connected dominating set whose total 
cost is no more than 

$\min(\alpha^{[2]}(G)\log(\Delta + 1), (\alpha^{[1]}(G) - 1)\delta + 1) + 2\alpha^{[1]}(G)$ 

times the optimum. 


Notice that, for homogeneous wireless net- 
works modeled by UDG, it implies that the 
constructed backbone has a cost no more than 
$\min(18\log(\Delta + 1), 4\delta + 1) + 10$ times the 
optimum. The advantage of the constructed 
backbone is that the total cost is small compared 
with the optimum when either the costs of 
wireless nodes are smooth, i.e., two neighboring 
nodes’ costs differ by a small constant factor, or 
the maximum node degree is low. 
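
The unit disk graph figures follow by substituting the bounds $\alpha^{[1]}(\mathrm{UDG}) \le 5$ and $\alpha^{[2]}(\mathrm{UDG}) \le 18$ quoted earlier into the bound of Theorem 3. Written out as a worked substitution:

```latex
\min\bigl(\alpha^{[2]}(G)\log(\Delta+1),\ (\alpha^{[1]}(G)-1)\,\delta+1\bigr) + 2\,\alpha^{[1]}(G)
\;\le\; \min\bigl(18\log(\Delta+1),\ (5-1)\,\delta+1\bigr) + 2\cdot 5
\;=\; \min\bigl(18\log(\Delta+1),\ 4\delta+1\bigr) + 10 .
```

The first term of the minimum is small when the maximum degree $\Delta$ is low, and the second when the cost smoothness $\delta$ is a small constant.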

In terms of time complexity, the most 
time-consuming step in the proposed distributed 
algorithm is building the MST. In [10], Kuhn 
et al. gave a lower bound on the distributed 
time complexity of any distributed algorithm 
that wants to compute a minimum dominating 
set in a graph. Essentially, they proved that even 
for the unconnected and unweighted case, any 
distributed approximation algorithm with 
poly-logarithmic approximation guarantee for the 
problem has to have a time complexity of at 
least $\Omega(\log \Delta / \log\log \Delta)$. 


Applications 


The proposed distributed algorithms for 
MWCDS can be used in ad hoc networks or 
distributed systems to form a low-cost network 
backbone for communication applications. The 
cost used as the input of the algorithms could 
be a generic cost, defined by various practical 
applications. It may represent the fitness or 
priority of each node to be a clusterhead. 
The lower cost means the higher priority. In 
practice, the cost could represent the power 
consumption rate of the node if a backbone 
with small power consumption is needed; the 
robustness of the node if fault-tolerant backbone 
is needed; or a function of its security level if 
a secure backbone is needed; or a combined 
weight function to integrate various metrics 
such as traffic load, signal overhead, battery 
level, and coverage. Therefore, by defining 
different costs, the proposed low-cost backbone 
formation algorithms can be used in various 
practical applications. Besides forming the 
backbone for routing, the weighted clustering 
algorithm (Algorithm 1) can also be used in 
other applications, such as selecting the mobile 
agents to perform intrusion detection in ad 
hoc networks [8] (to achieve more robust and 
power efficient agent selection), or selecting the 
rendezvous points to collect and store data in 
sensor networks [15] (to achieve the energy 
efficiency and storage balancing). 


Open Problems 


A number of problems related to the work of 
Wang, Wang, and Li [12, 13] remain open. The 
proposed method assumes that the nodes are 
almost-static in a reasonable period of time. How- 
ever, in some network applications, the network 
could be highly dynamic (both the topology and 
the cost could change). Therefore, after the 
generation of the weighted backbone, the dynamic 
maintenance of the backbone is also an important 
issue. It is still unknown how to update the 
topology efficiently while preserving the approx- 
imation quality. 

In [12, 13], the following assumptions on the 
wireless network model are used: omni-directional 
antenna, single transmission received by all 
nodes within the vicinity of the transmitter. 
The MWCDS problem will become much more 
complicated if some of these assumptions are 
relaxed. 


Experimental Results 


In [12, 13], simulations on random networks are 
conducted to evaluate the performance of the 
proposed weighted backbone and several back- 
bones built by previous methods. The simulation 
results confirm the theoretical results. 


Cross-References 


Connected Dominating Set 


Recommended Reading 


1. Alzoubi K, Wan P-J, Frieder O (2002) New distributed 
   algorithm for connected dominating set in wireless ad 
   hoc networks. In: Proceedings of IEEE 35th Hawaii 
   international conference on system sciences (HICSS-35), 
   Hawaii, 7-10 Jan 2002 

2. Alzoubi K, Li X-Y, Wang Y, Wan P-J, Frieder O (2003) 
   Geometric spanners for wireless ad hoc networks. IEEE 
   Trans Parallel Distrib Process 14:408-421 

3. Ambühl C, Erlebach T, Mihalak M, Nunkesser M (2006) 
   Constant factor approximation for minimum-weight 
   (connected) dominating sets in unit disk graphs. In: 
   Proceedings of the 9th international workshop on 
   approximation algorithms for combinatorial optimization 
   problems (APPROX 2006), Barcelona, 28-30 Aug 2006. 
   LNCS, vol 4110. Springer, Berlin/Heidelberg, pp 3-14 

4. Bao L, Garcia-Aceves JJ (2003) Topology management in 
   ad hoc networks. In: Proceedings of the 4th ACM 
   international symposium on mobile ad hoc networking & 
   computing, Annapolis, 1-3 June 2003. ACM Press, New 
   York, pp 129-140 

5. Chatterjee M, Das S, Turgut D (2002) WCA: a weighted 
   clustering algorithm for mobile ad hoc networks. J 
   Clust Comput 5:193-204 

6. Das B, Bharghavan V (1997) Routing in ad-hoc networks 
   using minimum connected dominating sets. In: 
   Proceedings of IEEE international conference on 
   communications (ICC'97), Montreal, 8-12 June 1997, 
   vol 1, pp 376-380 

7. Guha S, Khuller S (1999) Improved methods for 
   approximating node weighted Steiner trees and connected 
   dominating sets. Inf Comput 150:57-74 

8. Kachirski O, Guha R (2002) Intrusion detection using 
   mobile agents in wireless ad hoc networks. In: 
   Proceedings of IEEE workshop on knowledge media 
   networking, Kyoto, 10-12 July 2002 

9. Klein P, Ravi R (1995) A nearly best-possible 
   approximation algorithm for node-weighted Steiner 
   trees. J Algorithms 19:104-115 

10. Kuhn F, Moscibroda T, Wattenhofer R (2004) What cannot 
    be computed locally! In: Proceedings of the 23rd ACM 
    symposium on the principles of distributed computing 
    (PODC), St. John's, July 2004 

11. Li X-Y, Wan P-J (2005) Theoretically good distributed 
    CDMA/OVSF code assignment for wireless ad hoc networks. 
    In: Proceedings of 11th international computing and 
    combinatorics conference (COCOON), Kunming, 16-19 Aug 
    2005 

12. Wang Y, Wang W, Li X-Y (2005) Efficient distributed 
    low-cost backbone formation for wireless networks. In: 
    Proceedings of 6th ACM international symposium on 
    mobile ad hoc networking and computing (MobiHoc 2005), 
    Urbana-Champaign, 25-27 May 2005 

13. Wang Y, Wang W, Li X-Y (2006) Efficient distributed low 
    cost backbone formation for wireless networks. IEEE 
    Trans Parallel Distrib Syst 17:681-693 

14. Wu J, Li H (2001) A dominating-set-based routing scheme 
    in ad hoc wireless networks. Spec Iss Wirel Netw 
    Telecommun Syst J 3:63-84 

15. Zheng R, He G, Gupta I, Sha L (2004) Time indexing in 
    sensor networks. In: Proceedings of 1st IEEE 
    international conference on mobile ad-hoc and sensor 
    systems (MASS), Fort Lauderdale, 24-27 Oct 2004 


Weighted Popular Matchings 


Julian Mestre 

Department of Computer Science, University of 
Maryland, College Park, MD, USA 

School of Information Technologies, The 
University of Sydney, Sydney, NSW, Australia 


Years and Authors of Summarized 
Original Work 


2006; Mestre 


Problem Definition 


Consider the problem of matching a set of indi- 
viduals X to a set of items Y where each individual 
has a weight and a personal preference over the 
items. The objective is to construct a matching 
M that is stable in the sense that there is no 
matching M’ such that the weighted majority vote 
will choose M’ over M. 

More formally, a bipartite graph $(X, Y, E)$, 
a weight $w(x) \in \mathbb{R}^+$ for each individual $x \in X$, 
and a rank function $r: E \to \{1, \ldots, |Y|\}$ 
encoding the individual preferences are 
given. For every applicant $x$ and items 
$y_1, y_2 \in Y$, say applicant $x$ prefers $y_1$ over $y_2$ if 
$r(x, y_1) < r(x, y_2)$, and $x$ is indifferent between 
$y_1$ and $y_2$ if $r(x, y_1) = r(x, y_2)$. The preference 
lists are said to be strictly ordered if applicants are 
never indifferent between two items, otherwise 
the preference lists are said to contain ties. 

Let M and M’ be two matchings. An appli- 
cant x prefers M over M’ if x prefers the item 
he/she gets in M over the item he/she gets in M’. 
A matching M is more popular than M' if the 
applicants that prefer M over M’ outweigh those 
that prefer M’ over M. Finally, a matching M is 
weighted popular if there is no matching M’ more 
popular than M. 
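
The "more popular than" relation can be sketched as a weighted vote between two given matchings. The function name, the dictionary encodings, and the convention that an unmatched applicant ranks worse than any item are assumptions of this sketch, not part of the definition above.

```python
def more_popular(M, M2, rank, weight):
    """True if matching M is more popular than matching M2.

    M, M2: dict applicant -> item (unmatched applicants are absent).
    rank: dict (applicant, item) -> int, smaller means more preferred.
    weight: dict applicant -> positive weight.
    """
    def r(x, matching):
        # Assumption: being unmatched is worse than any ranked item.
        return rank[(x, matching[x])] if x in matching else float("inf")

    prefer_M = sum(w for x, w in weight.items() if r(x, M) < r(x, M2))
    prefer_M2 = sum(w for x, w in weight.items() if r(x, M2) < r(x, M))
    return prefer_M > prefer_M2


# Hypothetical instance: both applicants rank y1 first and y2 second.
rank = {("x1", "y1"): 1, ("x1", "y2"): 2,
        ("x2", "y1"): 1, ("x2", "y2"): 2}
M = {"x1": "y1", "x2": "y2"}
M2 = {"x1": "y2", "x2": "y1"}
heavy_x1 = more_popular(M, M2, rank, {"x1": 3, "x2": 1})
uniform = more_popular(M, M2, rank, {"x1": 1, "x2": 1})
```

With weights 3 and 1 the heavier applicant decides the vote in favor of M; with uniform weights neither matching is more popular than the other.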

In the weighted popular matching problem 
it is necessary to determine if a given instance 
admits a popular matching, and if so, to produce 
one. In the maximum weighted popular matching 
problem it is necessary to find a popular matching 
of maximum cardinality, provided one exists. 

Abraham et al. [2] gave the first polynomial 
time algorithms for the special case of these 
problems where the weights are uniform. Later, 
Mestre [8] introduced the weighted variant and 
developed polynomial time algorithms for it. 


Key Results 


Theorem 1 The weighted popular matching and 
maximum weighted popular matching problems 
on instances with strictly ordered preferences can 
be solved in $O(|X| + |E|)$ time. 


Theorem 2 The weighted popular matching and 
maximum weighted popular matching problems 
on instances with arbitrary preferences can be 
solved in $O(\min\{k\sqrt{|X|}, |X|\}\,|E|)$ time. 


Both results rely on an alternative easy-to- 
compute characterization of weighted popular 
matchings called well-formed matchings. It can 
be shown that every popular matching is well- 
formed. While in unweighted instances every 
well-formed matching is popular [2], in weighted 
instances there may be well-formed matchings 
that are not popular. These non-popular well- 
formed matchings can be weeded out by pruning 
certain bad edges that cannot be part of any 
popular matching. In other words, the instance 
can be pruned so that a matching is popular if and 
only if it is well-formed and is contained in the 
pruned instance [8]. 


Applications 


Many real-life problems can be modeled us- 
ing one-sided preferences. For example, the as- 
signment of graduates to training positions [5], 
families to government-subsidized housing [10], 
students to projects [9], and Internet rental mar- 
kets [1] such as Netflix where subscribers are 
assigned DVDs. 

Furthermore, the weighted framework allows 
one to model the naturally occurring situation 
in which some subset of users has priority over 
the rest. For example, an Internet rental site may 
offer a “premium” subscription plan and promise 
priority over “regular” subscribers. 


Problem Definition


The problem of random sampling without replacement
(RS) calls for the selection of m distinct
random items out of a population of size n. If
all items have the same probability of being selected,
the problem is known as uniform RS. Uniform
random sampling in one pass is discussed in
[1, 6, 11]. Reservoir-type uniform sampling algorithms
over data streams are discussed in [12].
A parallel uniform random sampling algorithm
is given in [10]. In weighted random sampling
(WRS) the items are weighted and the probability
of each item being selected is determined by its
relative weight. WRS can be defined with the
following algorithm D:


Algorithm D, a definition of WRS


Input: A population V of n weighted items

Output: A set S with a WRS of size m

1: For k = 1 to m do

2: Let $p_i(k) = w_i / \sum_{s_j \in V - S} w_j$ be the probability
of item $v_i$ to be selected in round k

3: Randomly select an item $v_i \in V - S$ and insert it into
S

4: End-For
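
Algorithm D can be read almost literally as code. The following is a minimal Python sketch, not a reference implementation: the function name `wrs_definition` and the optional `rng` argument are illustrative choices, and each round recomputes the total remaining weight, so the sketch costs O(nm) time and needs the whole population in memory.

```python
import random


def wrs_definition(items, weights, m, rng=random):
    """Draw up to m distinct items; in every round an unselected item is
    chosen with probability proportional to its weight (Algorithm D)."""
    remaining = list(zip(items, weights))
    sample = []
    for _ in range(min(m, len(remaining))):
        total = sum(w for _, w in remaining)
        target = rng.random() * total  # a point in [0, total)
        acc = 0.0
        for idx, (_, w) in enumerate(remaining):
            acc += w
            if target < acc:
                break
        # If rounding leaves target >= acc, idx is the last remaining item.
        sample.append(remaining.pop(idx)[0])
    return sample
```

Because the selected item is removed before the next round, the relative weights of the remaining items are renormalized implicitly, which is exactly the role of the sum over V - S in Step 2.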


Problem 1 (WRS) 

INPUT: A population V of n weighted items.
OUTPUT: A set S with a weighted random
sample.


The most important algorithms for WRS are the 
Alias Method, Partial Sum Trees and the Accep- 
tance/Rejection method (see [9] for a summary 
of WRS algorithms). None of these algorithms is 
appropriate for one-pass WRS. In this work, an 
algorithm for WRS is presented. The algorithm 
is simple, very flexible, and solves the WRS 
problem over data streams. Furthermore, the al- 
gorithm admits parallel or distributed implemen- 
tation. To the best knowledge of the entry authors, 
this is the first algorithm for WRS over data 
streams and for WRS in parallel or distributed 
settings. 


Definitions 

One-pass WRS is the problem of generating
a weighted random sample in one pass over
a population. If additionally the population size
is initially unknown (e.g., a data stream), the
random sample can be generated with reservoir
sampling algorithms. These algorithms keep an
auxiliary storage, the reservoir, with all items that
are candidates for the final sample.


Notation and Assumptions

The item weights are initially unknown, strictly
positive reals. The population size is n, the size
of the random sample is m and the weight of
item $v_i$ is $w_i$. The function random(L, H) generates
a uniform random number in (L, H). X denotes
a random variable. Infinite precision arithmetic
is assumed. Unless otherwise specified,
all sampling problems are without replacement. 
Depending on the context, WRS is used to denote 
a weighted random sample or the operation of 
weighted random sampling. 


Key Results 


All the results with their proofs can be found 
in [4]. 

The crux of the WRS approach of this work is 
given with the following algorithm A: 


Algorithm A


Input: A population V of n weighted items

Output: A WRS of size m

1: For each $v_i \in V$, $u_i = \mathrm{random}(0, 1)$ and
$k_i = u_i^{(1/w_i)}$

2: Select the m items with the largest keys $k_i$ as a
WRS


Theorem 1 Algorithm A generates a WRS. 
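
A minimal Python sketch of algorithm A follows. The function name `wrs_algorithm_a` and the `rng` parameter are illustrative assumptions; floating-point keys stand in for the infinite-precision arithmetic assumed in this entry, so extremely small or large weights may lose resolution.

```python
import heapq
import random


def wrs_algorithm_a(items, weights, m, rng=random):
    """Assign each item the key u ** (1 / w) with u uniform in [0, 1)
    and return the m items with the largest keys."""
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    top = heapq.nlargest(m, keyed)  # ties on the key fall back to the index
    return [items[i] for _, i in top]
```

For example, with weights 1 and 1000 and m = 1, the heavier item should be returned in roughly 1000 out of 1001 runs, matching its relative weight.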


A reservoir-type adaptation of algorithm A is the 
following algorithm A-Res: 


Algorithm A with a Reservoir (A-Res)


Input: A population V of n weighted items

Output: A reservoir R with a WRS of size m

1: The first m items of V are inserted into R

2: For each item $v_i \in R$: Calculate a key $k_i = u_i^{(1/w_i)}$,
where $u_i = \mathrm{random}(0, 1)$

3: Repeat Steps 4-7 for i = m + 1, m + 2, ..., n

4: The smallest key in R is the current threshold T

5: For item $v_i$: Calculate a key $k_i = u_i^{(1/w_i)}$, where
$u_i = \mathrm{random}(0, 1)$

6: If the key $k_i$ is larger than T, then:

7: The item with the minimum key in R is replaced
by item $v_i$

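
The reservoir variant maps naturally onto a binary min-heap keyed by $k_i$. The sketch below is one possible Python rendering, assuming the stream yields (item, weight) pairs; the name `a_res` and the arrival-index tie-breaker are illustrative details, not part of the original description.

```python
import heapq
import random


def a_res(stream, m, rng=random):
    """One-pass weighted reservoir sampling (A-Res) over (item, weight) pairs."""
    reservoir = []  # min-heap of (key, arrival index, item)
    for idx, (item, w) in enumerate(stream):
        key = rng.random() ** (1.0 / w)
        if len(reservoir) < m:
            heapq.heappush(reservoir, (key, idx, item))
        elif key > reservoir[0][0]:  # reservoir[0] holds the threshold T
            heapq.heapreplace(reservoir, (key, idx, item))
    return [item for _, _, item in reservoir]
```

The heap root always holds the current threshold T, so each arriving item costs O(1) to test and O(log m) to insert.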


Algorithm A-Res performs the calculations required
by algorithm A and hence by Theorem 1
A-Res generates a WRS. The number of reservoir
operations for algorithm A-Res is given by the
following theorem:


Theorem 2 If A-Res is applied on n weighted
items, where the weights $w_i > 0$ are independent
random variables with a common continuous distribution,
then the expected number of reservoir
insertions (without the initial m insertions) is:


$$\sum_{i=m+1}^{n} P[\text{item } i \text{ is inserted into } S] = \sum_{i=m+1}^{n} \frac{m}{i} = O\left(m \cdot \log\left(\frac{n}{m}\right)\right)$$
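
The logarithmic bound can be checked with a standard harmonic-sum estimate. The sketch below assumes that item i is inserted with probability m/i, which holds here because independent, identically distributed weights give independent, identically distributed keys, so the key of item i is among the m largest of the first i keys with probability m/i. The symbol $H_j$ (the j-th harmonic number) is introduced only for this illustration.

```latex
\sum_{i=m+1}^{n} \frac{m}{i}
  \;=\; m\,(H_n - H_m)
  \;\le\; m \int_{m}^{n} \frac{dx}{x}
  \;=\; m \ln\frac{n}{m}
```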


Let $S_w$ be the sum of the weights of the items
that will be skipped by A-Res until a new item
enters the reservoir. If $T_w$ is the current threshold
to enter the reservoir, then $S_w$ is a continuous
random variable that follows an exponential distribution.
Instead of generating a key for every
item, it is possible to generate random jumps
that correspond to the sum $S_w$. Similar techniques
have been applied for uniform random sampling
(see for example [3]). The following algorithm A-ExpJ
is an exponential jumps-type adaptation of
algorithm A:


Theorem 3 Algorithm A-ExpJ generates a WRS. 


The number of exponential jumps of A-ExpJ
is given by Theorem 2. Hence algorithm A-ExpJ
reduces the number of random variates
that have to be generated from O(n) (for A-Res)
to O(m log(n/m)). Since generating
high-quality random variates can be a costly
operation, this is a significant improvement
for the complexity of the sampling algorithm.
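
A possible Python rendering of A-ExpJ, following the ten steps of the algorithm box in this entry, is sketched below. The names `a_expj` and `_open_unit` are hypothetical, the stream is assumed to yield (item, weight) pairs, and double-precision floats replace the infinite-precision arithmetic assumed by the entry, so the sketch is only meant for weights of moderate magnitude (keys must stay strictly between 0 and 1).

```python
import heapq
import math
import random


def _open_unit(rng):
    """Uniform variate in the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def a_expj(stream, m, rng=random):
    """Weighted reservoir sampling with exponential jumps (A-ExpJ)."""
    if m <= 0:
        return []
    it = iter(stream)
    reservoir = []  # min-heap of (key, arrival index, item)
    idx = 0
    for item, w in it:  # Steps 1-2: fill the reservoir
        heapq.heappush(reservoir, (_open_unit(rng) ** (1.0 / w), idx, item))
        idx += 1
        if len(reservoir) == m:
            break
    if len(reservoir) < m:  # fewer than m items in total
        return [item for _, _, item in reservoir]
    # Step 5: total weight to skip, X_w = log(r) / log(T_w)
    x = math.log(_open_unit(rng)) / math.log(reservoir[0][0])
    for item, w in it:  # Steps 4-10
        x -= w
        if x <= 0.0:  # the accumulated weight has reached X_w
            t = reservoir[0][0] ** w  # Step 9: t_w = T_w ** w_i
            key = rng.uniform(t, 1.0) ** (1.0 / w)
            heapq.heapreplace(reservoir, (key, idx, item))
            x = math.log(_open_unit(rng)) / math.log(reservoir[0][0])
        idx += 1
    return [item for _, _, item in reservoir]
```

Only inserted items consume random variates (two per insertion), which is where the reduction from O(n) to O(m log(n/m)) variates comes from.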


Algorithm A with exponential jumps (A-ExpJ)

Input: A population V of n weighted items

Output: A reservoir R with a WRS of size m

1: The first m items of V are inserted into R

2: For each item $v_i \in R$: Calculate a key
$k_i = u_i^{(1/w_i)}$, where $u_i = \mathrm{random}(0, 1)$

3: The threshold $T_w$ is the minimum key of R

4: Repeat Steps 5-10 until the population is
exhausted

5: Let $r = \mathrm{random}(0, 1)$ and $X_w = \log(r)/\log(T_w)$

6: From the current item $v_c$ skip items until item $v_i$,
such that:

7: $w_c + w_{c+1} + \cdots + w_{i-1} < X_w \le w_c + w_{c+1} + \cdots + w_{i-1} + w_i$

8: The item in R with the minimum key is replaced
by item $v_i$

9: Let $t_w = T_w^{w_i}$, $r_2 = \mathrm{random}(t_w, 1)$ and $v_i$'s key:
$k_i = r_2^{(1/w_i)}$

10: The new threshold $T_w$ is the new minimum key
of R


Applications


Random sampling is a fundamental problem in
computer science with applications in many fields
including databases (see [5, 9] and the references
therein), data mining, and approximation
algorithms and randomized algorithms [7]. Consequently,
algorithm A for WRS is a general
tool that can find applications in the design of
randomized algorithms. For example, algorithm
A can be used within approximation algorithms
for the k-Median [7].

The reservoir based versions of algorithm 
A, A-Res and A-ExpJ, have very small 
requirements for auxiliary storage space (m keys 
organized as a heap) and during the sampling 
process their reservoir continuously contains 
a weighted random sample that is valid for 
the already processed data. This makes the 
algorithms applicable to the emerging area 
of algorithms for processing data streams 
[2, 8]. 

Algorithms A-Res and A-ExpJ can be used 
for weighted random sampling with replacement 
from data streams. In particular, it is possible 
to generate a weighted random sample with 
replacement of size k with A-Res or A- 
ExpJ, by running concurrently, in one pass, 
k instances of A-Res or A-ExpJ respectively. 
Each algorithm instance must be executed
with a trivial reservoir of size 1. At the end,
the union of all reservoirs is a WRS with
replacement.


URL to Code 


The algorithms presented in this work are easy 
to implement. An experimental implementation 
in Java can be found at:
http://utopia.duth.gr/~pefraimi/projects/WRS/index.html


Problem Definition


Well-separated pair decomposition, introduced 
by Callahan and Kosaraju [4], has found numer- 
ous applications in solving proximity problems 
for points in the Euclidean space. A pair of point 
sets (A, B) is c well separated if the distance 
between A and B is at least c times the diam- 
eters of both A and B. A well-separated pair 
decomposition of a point set consists of a set of 
well-separated pairs that “cover” all the pairs of 
distinct points, i.e., any two distinct points belong 
to the different sets of some pair. Callahan and 
Kosaraju [4] showed that for any point set in a 
Euclidean space and for any constant c > 1, 
there always exists a c-well-separated pair de- 
composition (c-WSPD) with linearly many pairs. 
This fact has been very useful for obtaining 
nearly linear-time algorithms for many problems, 
such as computing k-nearest neighbors, N-body 
potential fields, geometric spanners, approximate 
minimum spanning trees, etc. Well-separated pair 
decomposition has also been shown to be very 
useful for obtaining efficient dynamic, parallel, 
and external memory algorithms. 

The definition of well-separated pair decom- 
position can be naturally extended to any metric 
space. However, a general metric space may not 
admit a well-separated pair decomposition with 
a subquadratic size. Indeed, even for the metric
induced by the shortest path distance in a star 
tree with unit weight on each edge, any well- 
separated pair decomposition requires quadrati- 
cally many pairs. This makes the well-separated 
pair decomposition useless for such a metric. 
However, it has been shown that for the unit- 
disk graph metric, there do exist well-separated 
pair decompositions with almost linear size, and 
therefore many proximity problems under the 
unit-disk graph metric can be solved efficiently. 


Unit-Disk Graphs 

Denote by d(·, ·) the Euclidean metric. For a set of
points S in the plane, the unit-disk graph I(S) =
(S, E) is defined to be the weighted graph where
an edge e = (p, q) is in the graph if d(p, q) ≤
1, and the weight of e is d(p, q). Likewise, one
can define the unit-ball graph for points in higher
dimensions [5].

Unit-disk graphs have been used extensively to 
model the communication or influence between 
objects [9, 12] and have been studied in many 
different contexts [5, 10]. For example, wireless
ad hoc networks can be modeled by unit-disk
graphs [8], as two wireless nodes can directly 
communicate with each other only if they are 
within a certain distance. In unsupervised learn- 
ing, for a dense sampling of points from some 
unknown manifold, the length of the shortest 
path on the unit-ball graph is a good approxi- 
mation of the geodesic distance on the underly- 
ing (unknown) manifold if the radius is chosen 
appropriately [6, 14]. By using well-separated 
pair decomposition, one can encode the all-pair 
distances approximately by a compact data struc- 
ture that supports approximate distance queries in 
O(1) time. 


Metric Space 

Suppose that (S, π) is a metric space where S
is a set of elements and π the distance function
defined on S × S. For any subset $S_1 \subseteq S$,
the diameter $D_\pi(S_1)$ (or $D(S_1)$ when π is
clear from the context) of $S_1$ is defined to be
$\max_{s_1, s_2 \in S_1} \pi(s_1, s_2)$. The distance $\pi(S_1, S_2)$
between two sets $S_1, S_2 \subseteq S$ is defined to be
$\min_{s_1 \in S_1, s_2 \in S_2} \pi(s_1, s_2)$.


Well-Separated Pair Decomposition

For a metric space (S, π), two nonempty subsets
$S_1, S_2 \subseteq S$ are called c well separated if
$\pi(S_1, S_2) \ge c \cdot \max(D_\pi(S_1), D_\pi(S_2))$.
Following the definition in [4], for any two sets
A and B, a set of pairs $P = \{P_1, P_2, \ldots, P_m\}$,
where $P_i = (A_i, B_i)$, is called a pair decomposition
of (A, B) (or of A if A = B) if:


• For all the i's, $A_i \subseteq A$, and $B_i \subseteq B$.

• $A_i \cap B_i = \emptyset$.

• For any two elements $a \in A$ and $b \in B$, there
exists a unique i such that $a \in A_i$ and $b \in B_i$.
We say that (a, b) is covered by the pair $(A_i, B_i)$.


If in addition, every pair in P is c well separated, 
P is called a c-well-separated pair decomposition 
(or c-WSPD for short). Clearly, any metric space 
admits a c-WSPD with quadratic size by using 
the trivial family that contains all the pairwise 
elements. 
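
The definitions above translate directly into a brute-force check. The following Python sketch is for illustration only: the helper names are hypothetical, `dist` is any user-supplied metric on the elements, and the trivial decomposition is returned as unordered singleton pairs, one per pair of distinct elements.

```python
from itertools import combinations


def diameter(points, dist):
    """Largest pairwise distance inside a set (0 for a singleton)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def set_distance(a, b, dist):
    """Smallest distance between a point of a and a point of b."""
    return min(dist(p, q) for p in a for q in b)


def is_well_separated(a, b, c, dist):
    """Check pi(A, B) >= c * max(D(A), D(B))."""
    return set_distance(a, b, dist) >= c * max(diameter(a, dist), diameter(b, dist))


def trivial_wspd(points):
    """All singleton pairs: quadratic size, well separated for every c."""
    return [([p], [q]) for p, q in combinations(points, 2)]
```

Singletons have diameter 0, so every singleton pair of distinct elements passes the test for any c, which is why the trivial family is always a valid (if quadratic) c-WSPD.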


Key Results 


In [7], it was shown that for the metric induced 
by the unit-disk graph on n points and for any 
constant c ≥ 1, there does exist a c-WSPD
with O(n log n) pairs, and such a decomposition
can be computed in O(n log n) time. It was also
shown that the bounds can be extended to higher 
dimensions. The following theorems state the key 
results for two and higher dimensions: 


Theorem 1 For any set S of n points in the plane
and any c ≥ 1, there exists a c-WSPD P of S
under the unit-disk graph metric where P contains
$O(c^4 n \log n)$ pairs and can be computed in
$O(c^4 n \log n)$ time.


Theorem 2 For any set S of n points in $\mathbb{R}^k$, for
k ≥ 3, and for any constant c ≥ 1, there exists a
c-WSPD P of S under the unit-ball graph metric
where P contains $O(n^{2-2/k})$ pairs and can be
constructed in $O(n^{4/3}\,\mathrm{polylog}\, n)$ time for k = 3
and in $O(n^{2-2/k})$ time for k ≥ 4.


The difficulty in obtaining a well-separated 
pair decomposition for the unit-disk graph metric 
is that two points that are close in space are
not necessarily close under the graph metric. 
The above bounds are first shown for the point 
set with constant-bounded density, i.e., a point 
set where any unit disk covers only a constant 
number of points in the set. The upper bound 
on the number of pairs is obtained by using a 
packing argument similar to the one used in [1]. 
For a point set with unbounded density, one 
applies a clustering technique similar to the one 
used in [8] to the point set and obtains a set 
of “clusterheads” with a bounded density. Then 
the result for bounded density is applied to those 
clusterheads. Finally, the well-separated pair de- 
composition is obtained by combining the well- 
separated pair decomposition for the bounded 
density point sets and for the Euclidean metric. 
The number of pairs is dominated by the number 
of pairs constructed for a constant density set, 
which is in turn dominated by the bound given by 
the packing argument. It has been shown that the
bound on the number of pairs is tight for k ≥ 3.


Applications 


For a pair of well-separated sets, the distance 
between two points from different sets can be 
approximated by the “distance” between the two 
sets or the distance between any pair of points 
in different sets. In other words, a well-separated 
pair decomposition can be thought of as a compressed
representation to approximate the $O(n^2)$
pairwise distances. Many problems that require
the pairwise distances to be checked can there- 
fore be approximately solved by examining those 
distances between the well-separated pairs of 
sets. When the size of the well-separated pair 
decomposition is subquadratic, it often results in 
more efficient algorithms than examining all the 
pairwise distances. Indeed, this is the intuition 
behind many applications of the geometric well- 
separated pair decomposition. By using the same 
intuition, one can apply the well-separated pair 
decomposition in several proximity problems un- 
der the unit-disk graph metric. 
Suppose that (S, d) is a metric space. Let
$S_1 \subseteq S$. Consider the following natural proximity
problems:


• Furthest neighbor, diameter, center. The
furthest neighbor of $p \in S_1$ is the point in
$S_1$ that maximizes the distance to p. Related
problems include computing the diameter,
the maximum pairwise shortest distance for
points in $S_1$, and the center, the point that
minimizes the maximum distance to all the
other points.

• Nearest neighbor, closest pair. The nearest
neighbor of $p \in S_1$ is the point in $S_1$ with
the minimum distance to p. Related problems
include computing the closest pair, the pair
with the minimum shortest distance, and the
bichromatic closest pair, the pair that minimizes
the distance between points from two
different sets.

• Median. The median of S is the point in S
that minimizes the average (or total) distance
to all the other points.

• Stretch factor. For a graph G defined on S,
its stretch factor with respect to the unit-disk
graph metric is defined to be the maximum
ratio $\pi_G(p, q)/\pi(p, q)$, where $\pi_G$, π are the
distances induced by G and by the unit-disk
graph, respectively.


All the above problems can be solved or
approximated efficiently for points in the
Euclidean space. However, for the metric induced
by a graph, even for planar graphs, very little
is known besides solving the expensive all-pair
shortest-path problem. For computing the
diameter, there is a simple linear-time method
that achieves a 2-approximation (select an
arbitrary node v and compute the shortest-path
tree rooted at v; if the furthest node
from v is at distance D, then the diameter
of the graph is at most 2D by the triangle
inequality) and a 4/3-approximate algorithm
with running time $O(m\sqrt{n \log n} + n^2 \log n)$,
for a graph with n vertices and m edges, by
Aingworth et al. [2].
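
The simple 2-approximation can be sketched in a few lines of Python. This version is an illustration under stated assumptions: the graph is a connected, undirected adjacency dictionary mapping each node to a list of (neighbor, weight) pairs, node labels are mutually comparable, and Dijkstra's algorithm is used for the shortest-path tree, which costs O(m log n) rather than the linear time achievable with breadth-first search on unweighted graphs.

```python
import heapq


def diameter_2_approx(graph, source):
    """Return D, the largest shortest-path distance from source.
    By the triangle inequality the true diameter lies in [D, 2 * D]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return max(dist.values())
```

On the unit-weight path 0-1-2-3 with source 1, the function returns D = 2, while the true diameter 3 indeed lies between D and 2D.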
By using the well-separated pair decomposition,
Gao and Zhang [7] showed that one
can obtain better approximation algorithms
for the above proximity problems for the unit-disk
graph metric. Specifically, one can obtain
almost linear-time algorithms for computing
the 2.42-approximation and $O(n\sqrt{n \log n}/\varepsilon^3)$
time algorithms for computing the (1 + ε)-approximation
for any ε > 0. In addition, the
well-separated pair decomposition can be used to
obtain an $O(n \log n/\varepsilon^4)$ space distance oracle so
that any (1 + ε) distance query in the unit-disk
graph can be answered in O(1) time.

The bottleneck of the above algorithms turns 
out to be computing the approximation of 
the shortest-path distances between O(n logn) 
pairs. The algorithm in [7] only constructs 
well-separated pair decompositions without 
computing a good approximation of the 
distances. The approximation ratio and the 
running time are dominated by that of the 
approximation algorithms used to estimate the 
distance between each pair in the well-separated 
pair decomposition. Once the distance estimation 
has been made, the rest of the computation only 
takes almost linear time. 

For a general graph, it is unknown whether 
O(nlogn) pairs shortest-path distances can 
be computed significantly faster than all-pair 
shortest-path distances. For a planar graph,
one can compute the O(n log n) pairs shortest-path
distances in $O(n\sqrt{n} \log n)$ time by using
separators with $O(\sqrt{n})$ size [3]. This method
extends to the unit-disk graph with constant- 
bounded density since such graphs enjoy a 
separator property similar to that of planar 
graphs [13]. As for approximation, Thorup [15] 
recently discovered an algorithm for planar 
graphs that can answer any (1 + €)-shortest- 
distance query in O(1/e) time after almost linear- 
time preprocessing. Unfortunately, Thorup’s 
algorithm uses balanced shortest-path separators 
in planar graphs which do not obviously extend 
to the unit-disk graphs. On the other hand, it is 
known that there does exist a planar 2.42-spanner 
for a unit-disk graph [11]. By applying Thorup’s 
algorithm to that planar spanner, one can compute 
the 2.42-approximate shortest-path distance for
O(n log n) pairs in almost linear time.


Open Problems 


The most notable open problem is the gap between
Ω(n) and O(n log n) on the number of
pairs needed in the plane. Also, the time bound
for (1 + ε)-approximation is still about $O(n\sqrt{n})$
due to the lack of efficient methods for computing
the (1 + ε)-approximate shortest-path distances
between O(n) pairs of points. Any improvement
to the algorithm for that problem will immediately
lead to improvement to all the (1 + ε)-approximate
algorithms presented in this entry.


Problem Definition


Notations

Given a finite point set A in $\mathbb{R}^d$, its bounding
box R(A) is the d-dimensional hyperrectangle
$[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]$ that contains
A and has minimum extension in each dimension.

Two point sets A, B are said to be well separated
with respect to a separation parameter s > 0
if there exist a real number r > 0 and two d-dimensional
spheres $C_A$ and $C_B$ of radius r each,
such that the following properties are fulfilled:


1. $C_A \cap C_B = \emptyset$
2. $C_A$ contains the bounding box R(A) of A
3. $C_B$ contains the bounding box R(B) of B
4. $|C_A C_B| \ge s \cdot r$.


Here $|C_A C_B|$ denotes the smallest Euclidean distance
between two points of $C_A$ and $C_B$, respectively.
An example is depicted in Fig. 1. Given the
bounding boxes R(A), R(B), it takes time only
O(d) to test if A and B are well separated with
respect to s.

Two points of the same set, A or B, have a
Euclidean distance at most 2/s times the distance
any pair $(a, b) \in A \times B$ can have. Also, any
two such pairs (a, b), (a', b') differ in their distances
|a − b|, |a' − b'| by a factor of at most
1 + 4/s.

Given a set S of n points in $\mathbb{R}^d$, a well-separated
pair decomposition of S with
respect to separation parameter s is a sequence
$(A_1, B_1), (A_2, B_2), \ldots, (A_m, B_m)$ where


1. $A_i, B_i \subseteq S$, for i = 1 ... m.

2. $A_i$ and $B_i$ are well separated with respect to s,
for i = 1 ... m.

3. For all points $a, b \in S$, $a \ne b$, there exists
a unique index i in 1 ... m such that $a \in A_i$
and $b \in B_i$, or $b \in A_i$ and $a \in B_i$
hold.


Obviously, each set $S = \{s_1, \ldots, s_n\}$ possesses
a well-separated pair decomposition. One can
simply use all singleton pairs $(\{s_i\}, \{s_j\})$ where
i < j. The question is whether decompositions consisting
of fewer than $O(n^2)$ many pairs exist and
how to construct them efficiently.


Key Results 


In fact, the following result has been shown by 
Callahan and Kosaraju [1, 2]. 


Theorem 1 Given a set S of n points in $\mathbb{R}^d$
and a separation parameter s, there exists a
well-separated pair decomposition of S with respect
to s that consists of $O(s^d d^{d/2} n)$ many
pairs $(A_i, B_i)$. It can be constructed in time
$O(d n \log n + s^d d^{d/2+1} n)$.

Thus, if dimension d and separation parameter
s are fixed (which is the case in many
applications), then the number of pairs is in
O(n), and the decomposition can be computed in
time O(n log n).


Well Separated Pair Decomposition for Unit-Disk Graph, Fig. 1 The sets A, B are well-separated with respect

The main tool in constructing the well-separated
pair decomposition is the split tree
T(S) of S. The root, r, of T(S) contains the
bounding box R(S) of S. Its two child nodes are
obtained by cutting through the middle of the
longest dimension of R(S), using an orthogonal
hyperplane. It splits S into two subsets $S_a$, $S_b$,
whose bounding boxes $R(S_a)$ and $R(S_b)$ are
stored at the two children a and b of root r.
This process continues until only one point of
S remains in each subset. These singleton sets
form the leaves of T(S). Clearly, the split tree
T(S) contains O(n) many nodes. It need not
be balanced, but it can be constructed in time
O(dn log n).

A well-separated pair decomposition of S,
with respect to a given separation parameter s,
can now be obtained from T(S) in the following
way. For each internal node of T(S) with children
v and w, the following recursive procedure
FindPairs(v, w) is called. If $S_v$ and $S_w$ are well
separated, then the pair $(S_v, S_w)$ is reported.
Otherwise, one may assume that the longest dimension
of $R(S_v)$ exceeds in length the longest
dimension of $R(S_w)$ and that $v_l$, $v_r$ are the child
nodes of v in T(S). Then, FindPairs($v_l$, w) and
FindPairs($v_r$, w) are invoked.

The total number of procedure calls is
bounded by the number of well-separated
pairs reported, which can be shown to be
in $O(s^d d^{d/2} n)$ by a packing argument.
However, the total size of all sets $A_i$, $B_i$ in
the decomposition is in general quadratic
in n.
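
The split tree and the FindPairs recursion can be prototyped compactly. The Python sketch below is illustrative rather than efficient: it assumes distinct points given as coordinate tuples, builds the tree naively (quadratic time in the worst case instead of O(dn log n)), stores the full point list at every node, and tests separation with two equal spheres centered at the bounding-box centers. The names `Node`, `well_separated`, `find_pairs`, and `wspd` are hypothetical.

```python
import math


class Node:
    """Split-tree node: stores its points and their bounding box."""

    def __init__(self, pts):
        self.pts = pts
        dim = len(pts[0])
        self.lo = [min(p[k] for p in pts) for k in range(dim)]
        self.hi = [max(p[k] for p in pts) for k in range(dim)]
        self.left = self.right = None
        if len(pts) > 1:  # cut the longest box side through its middle
            k = max(range(dim), key=lambda j: self.hi[j] - self.lo[j])
            mid = (self.lo[k] + self.hi[k]) / 2.0
            self.left = Node([p for p in pts if p[k] <= mid])
            self.right = Node([p for p in pts if p[k] > mid])

    def longest_side(self):
        return max(h - l for l, h in zip(self.lo, self.hi))


def well_separated(u, v, s):
    """Enclose both boxes in spheres of a common radius r and test gap >= s * r."""
    cu = [(l + h) / 2.0 for l, h in zip(u.lo, u.hi)]
    cv = [(l + h) / 2.0 for l, h in zip(v.lo, v.hi)]
    r = max(math.dist(u.lo, u.hi), math.dist(v.lo, v.hi)) / 2.0
    return math.dist(cu, cv) - 2.0 * r >= s * r


def find_pairs(v, w, s, out):
    if well_separated(v, w, s):
        out.append((v.pts, w.pts))
        return
    if v.longest_side() < w.longest_side():
        v, w = w, v  # always split the node with the longer box
    find_pairs(v.left, w, s, out)
    find_pairs(v.right, w, s, out)


def wspd(points, s):
    """Return a list of well-separated pairs (A_i, B_i) covering all point pairs."""
    out = []

    def visit(node):
        if node.left is None:
            return
        find_pairs(node.left, node.right, s, out)
        visit(node.left)
        visit(node.right)

    visit(Node(list(points)))
    return out
```

Two distinct singletons are always well separated (r = 0), so whenever the test fails the node with the longer box has children and the recursion terminates.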


Applications 


From now on the dimension d is assumed to be a
constant. The well-separated pair decomposition
can be used in efficiently solving proximity problems
for points in $\mathbb{R}^d$.


Theorem 2 Let S be a set of n points in $\mathbb{R}^d$.
Then a closest pair in S can be found in optimal
time O(n log n).
Indeed, let $q \in S$ be a nearest neighbor of
$p \in S$. One can construct a well-separated pair
decomposition with separation parameter s > 2
in time O(n log n), and let $(A_i, B_i)$ be the pair
where $p \in A_i$ and $q \in B_i$. If there were another
point p' of S in $A_i$, one would obtain
$|pp'| \le (2/s) \cdot |pq| < |pq|$, which is impossible. Hence,
$A_i$ is a singleton set. If (p, q) is a closest pair
in S, then $B_i$ must be a singleton, too. Therefore,
a closest pair can be found by inspecting all
singleton pairs among the O(n) many pairs of the
well-separated pair decomposition.

With more effort, the following generalization 
can be shown. 


Theorem 3 Let S be a set of n points in $\mathbb{R}^d$, and
let k < n. Then for each $p \in S$, its k nearest
neighbors in S can be computed in total time
O(n log n + nk). In particular, for each point in
S, a nearest neighbor in S can be computed in
optimal time O(n log n).

In dimension d = 2, one would typically use
the Voronoi diagram for solving these problems.
But as the complexity of the Voronoi diagram
of n points can be as large as $n^{\lceil d/2 \rceil}$, the well-separated
pair decomposition is much more convenient
to use in higher dimensions.

A major application of the well-separated pair 
decomposition is the construction of good span- 
ners for a given point set S. A spanner of S of 
dilation t is a geometric network N with vertex 
set S such that for any two vertices $p, q \in S$, the
Euclidean length of a shortest path connecting
p and q in N is at most t times the Euclidean
distance |pq|.


Theorem 4 Let S be a set of n points in $\mathbb{R}^d$,
and let t > 1. Then a spanner of S of dilation
t containing $O(s^d n)$ edges can be constructed in
time $O(s^d n + n \log n)$, where s = 4(t + 1)/(t − 1).

Indeed, if one edge $(a_i, b_i)$ is chosen from
each pair $(A_i, B_i)$ of a well-separated pair decomposition
of S with respect to s, these edges
form a t-spanner of S, as can be shown by
induction on the rank of each pair $(p, q) \in S^2$
in the list of all such pairs, sorted by distance.

Since spanners have many interesting applica- 
tions of their own, several articles of this encyclo- 
pedia are devoted to this topic. 


Open Problems


An important open question is which metric
spaces admit well-separated pair decompositions.
It is easy to see that the packing arguments used
in the Euclidean case carry over to the case of
convex distance functions in $\mathbb{R}^d$. More generally,
Talwar [6] has shown how to compute well-separated
pair decompositions for point sets of
bounded aspect ratio in metric spaces of bounded
doubling dimension.

On the other hand, for the metric induced
by a disk graph in $\mathbb{R}^2$, a quadratic number of
pairs may be necessary in the well-separated pair
decomposition. (In a disk graph, each point $p \in S$
is the center of a disk $D_p$ of radius $r_p$. Two points
p, q are connected by an edge if and only if
$D_p \cap D_q \ne \emptyset$. The metric is defined by Euclidean
shortest path length in the resulting graph. If this
graph is a star with rays of identical length, a
well-separated pair decomposition with respect to
s > 4 must consist of singleton pairs.) Even for
a unit disk graph, $\Omega(n^{2-2/d})$ many pairs may be
necessary for points in $\mathbb{R}^d$, as Gao and Zhang [4]
have shown.


Problem Definition


The problem is about minimizing the delay of an
interconnect wire in a very-large-scale integration
(VLSI) circuit by changing the width (i.e., sizing)
of the wire. The delay of interconnect wire
has become a dominant factor in determining
VLSI circuit performance for advanced VLSI
technology. Wire sizing has been shown to be an
effective technique to minimize the interconnect
delay. The work of Chu and Wong [1] shows
that the wire sizing problem can be transformed
into a convex quadratic program. This quadratic
programming approach is very efficient and can
be naturally extended to simultaneously consider
buffer insertion, which is another popular interconnect
delay minimization technique. Previous
approaches apply either a dynamic programming
approach [2], which is computationally more expensive,
or an iterative greedy approach [3, 4],
which is hard to combine with buffer insertion.

The wire sizing problem is formulated as follows
and is illustrated in Fig. 1. Consider a wire
of length L. The wire is connecting a driver
with driver resistance $R_D$ to a load with load
capacitance $C_L$. In addition, there is a set H =
$\{h_1, \ldots, h_n\}$ of n wire widths allowed by the
fabrication technology. Assume $h_1 > \cdots > h_n$.
The wire sizing problem is to determine the wire
width function $f(x) : [0, L] \to H$ so that the
delay for a signal to travel from the driver through
the wire to the load is minimized.

As in most previous works on wire sizing, the
work of Chu and Wong uses the Elmore delay
model to compute the delay. The Elmore delay
model is a delay model for RC circuits (i.e.,
circuits consisting of resistors and capacitors).
The Elmore delay for a signal path is equal to
the sum of the delays associated with all resistors
along the path, where the delay associated with
each resistor is equal to its resistance times its
total downstream capacitance. For a wire segment
of length l and width h, its resistance is $r_0 l / h$
and its capacitance is $c(h)\, l$, where $r_0$ is the wire
sheet resistance and c(h) is the unit length wire
capacitance. c(h) is an increasing function in
practice. The wire segment can be modeled as a
π-type RC circuit as shown in Fig. 2.


Key Results

Lemma 1 The optimal wire width function
f(x) is a monotonically decreasing
function.


Lemma 1 above can be used to greatly simplify
the wire sizing problem. It implies that
an optimally sized wire can be divided into n
segments such that the width of the i-th segment is
$h_i$. The length of each segment is to be determined.
The simplified problem is illustrated in
Fig. 3.


Lemma 2 For the wire in Fig. 3, the Elmore
delay is

$$D = \frac{1}{2}\, l^T \Phi\, l + \rho^T l + R_D C_L$$

where

$$\Phi = \begin{pmatrix}
c(h_1) r_0/h_1 & c(h_2) r_0/h_1 & c(h_3) r_0/h_1 & \cdots & c(h_n) r_0/h_1 \\
c(h_2) r_0/h_1 & c(h_2) r_0/h_2 & c(h_3) r_0/h_2 & \cdots & c(h_n) r_0/h_2 \\
c(h_3) r_0/h_1 & c(h_3) r_0/h_2 & c(h_3) r_0/h_3 & \cdots & c(h_n) r_0/h_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c(h_n) r_0/h_1 & c(h_n) r_0/h_2 & c(h_n) r_0/h_3 & \cdots & c(h_n) r_0/h_n
\end{pmatrix},$$

$$\rho = \begin{pmatrix}
R_D\, c(h_1) + C_L r_0/h_1 \\
R_D\, c(h_2) + C_L r_0/h_2 \\
R_D\, c(h_3) + C_L r_0/h_3 \\
\vdots \\
R_D\, c(h_n) + C_L r_0/h_n
\end{pmatrix}
\quad \text{and} \quad
l = \begin{pmatrix} l_1 \\ l_2 \\ l_3 \\ \vdots \\ l_n \end{pmatrix}.$$

So the wire sizing problem can be written in
the following quadratic program:


WS : minimize $\frac{1}{2}\, l^T \Phi\, l + \rho^T l$
subject to $l_1 + \cdots + l_n = L$
$l_i \ge 0$ for $1 \le i \le n$


Quadratic programming is NP-hard in
general. In order to solve WS efficiently,
some properties of the Hessian matrix Φ are
explored.
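
As a sanity check on the quadratic form of Lemma 2, the Elmore delay can be computed in two ways: directly from the π-model (each segment's resistance times half of its own capacitance plus everything downstream) and through the matrix expression with entries $\Phi_{ij} = c(h_{\max(i,j)})\, r_0 / h_{\min(i,j)}$. The Python sketch below is illustrative only; the function names and the linear capacitance model used in the example are assumptions, not part of the original work.

```python
import math


def elmore_direct(l, h, c, r0, RD, CL):
    """Sum over resistors of resistance times downstream capacitance."""
    n = len(l)
    total_cap = CL + sum(c(h[j]) * l[j] for j in range(n))
    delay = RD * total_cap  # the driver sees every capacitor
    downstream = total_cap
    for i in range(n):
        cap_i = c(h[i]) * l[i]
        downstream -= cap_i  # capacitance strictly after segment i
        delay += (r0 * l[i] / h[i]) * (cap_i / 2.0 + downstream)
    return delay


def elmore_quadratic(l, h, c, r0, RD, CL):
    """Evaluate D = 1/2 l^T Phi l + rho^T l + RD * CL."""
    n = len(l)
    phi = [[c(h[max(i, j)]) * r0 / h[min(i, j)] for j in range(n)]
           for i in range(n)]
    rho = [RD * c(h[i]) + CL * r0 / h[i] for i in range(n)]
    quad = sum(l[i] * phi[i][j] * l[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(rho[i] * l[i] for i in range(n)) + RD * CL


# Hypothetical example: capacitance grows linearly with the width.
cap = lambda width: 0.1 * width + 0.05
example = ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], cap, 0.5, 10.0, 2.0)
assert math.isclose(elmore_direct(*example), elmore_quadratic(*example), rel_tol=1e-9)
```

The agreement of the two values reflects that the off-diagonal terms of Φ pair the resistance of an upstream segment i with the capacitance of a downstream segment j.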


Wire Sizing, Fig. 1 The wire sizing problem

Wire Sizing, Fig. 2 The model of a wire segment by a π-type RC circuit

Wire Sizing, Fig. 3 The simplified wire sizing problem


Definition 1 (Symmetric Decomposable Matrix) Let $Q = (q_{ij})$ be
an $n \times n$ symmetric matrix. If for some
$\alpha = (\alpha_1, \ldots, \alpha_n)^T$ and
$v = (v_1, \ldots, v_n)^T$ such that
$0 < \alpha_1 < \cdots < \alpha_n$,
$q_{ij} = q_{ji} = \alpha_i v_i v_j$ for $i \le j$, then $Q$ is
called a symmetric decomposable matrix. Let $Q$ be denoted as
$SDM(\alpha, v)$.


Lemma 3 If $Q$ is symmetric decomposable, then $Q$ is positive
definite.


Lemma 4 $\Phi$ in WS is symmetric decomposable.


Lemma 3 together with Lemma 4 implies that the Hessian matrix
$\Phi$ of WS is positive definite. Hence, the problem WS is a
convex quadratic program and is solvable in polynomial time [5].

Chu and Wong propose solving WS by the active set method. The
active set method transforms a problem with some inequality
constraints into a sequence of problems with only equality
constraints. The method stops when the solution of the
transformed problem satisfies both the feasibility and optimality
conditions of the original problem. For the problem WS, the
active set method keeps track of an active set $A$ in each
iteration. The method sets $l_j = 0$ for all $j \in A$ and ignores
the constraints $l_j \ge 0$ for all $j \notin A$. Let
$\{j_1, \ldots, j_r\} = \{1, \ldots, n\} - A$. Then WS is
transformed into the following equality-constrained wire sizing
problem:


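$$ \begin{array}{lll}
\mathcal{ECWS}: & \text{minimize} & \frac{1}{2} l_A^T \Phi_A l_A + \rho_A^T l_A \\
 & \text{subject to} & \mathbf{1}_A^T l_A = L
\end{array} $$

where $l_A = (l_{j_1}, \ldots, l_{j_r})^T$,
$\mathbf{1}_A = (1\ 1 \cdots 1)^T$,
$\rho_A = (R_D c(h_{j_1}) + C_L r_0/h_{j_1}, \ldots, R_D c(h_{j_r}) + C_L r_0/h_{j_r})^T$,
and $\Phi_A$ is the symmetric decomposable matrix corresponding to
$A$ (i.e., $\Phi_A = SDM(\alpha_A, v_A)$ with
$\alpha_A = \left( \frac{r_0}{c(h_{j_1}) h_{j_1}}, \ldots, \frac{r_0}{c(h_{j_r}) h_{j_r}} \right)^T$
and $v_A = (c(h_{j_1}), \ldots, c(h_{j_r}))^T$).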


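Lemma 5 The solution of ECWS is

$$ \lambda_A = -\frac{\mathbf{1}_A^T \Phi_A^{-1} \rho_A + L}{\mathbf{1}_A^T \Phi_A^{-1} \mathbf{1}_A},
\qquad
l_A = -\Phi_A^{-1} (\lambda_A \mathbf{1}_A + \rho_A), $$

where $\lambda_A$ is the Lagrange multiplier of the constraint
$\mathbf{1}_A^T l_A = L$ and $\mathbf{1}_A$ is the all-ones vector.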
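
The closed form of Lemma 5 can be checked numerically. The Python sketch below evaluates it with a small dense linear solver (an illustrative choice; it does not exploit the tridiagonal structure of Lemma 6), and then verifies the two conditions that characterize the solution: the lengths sum to L, and the gradient satisfies Φ l + ρ + λ 1 = 0. The helper names and the 3×3 instance are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def solve_ecws(phi, rho, L):
    """Return (lambda, l) for: min 1/2 l^T phi l + rho^T l, sum(l) = L."""
    n = len(rho)
    y = solve(phi, [1.0] * n)      # Phi^{-1} 1
    z = solve(phi, rho)            # Phi^{-1} rho
    lam = -(sum(z) + L) / sum(y)
    l = [-(lam * yi + zi) for yi, zi in zip(y, z)]
    return lam, l


# Hypothetical instance: widths (4, 2, 1), c(h) = h, r0 = 1, R_D = 2, C_L = 4.
phi = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
rho = [9.0, 6.0, 6.0]
lam, l = solve_ecws(phi, rho, L=6.0)
residual = [sum(phi[i][j] * l[j] for j in range(3)) + rho[i] + lam
            for i in range(3)]
print(lam, l, residual)
```

On this instance the first length comes out negative, which illustrates why the inequality constraints of WS cannot simply be dropped and an active set has to be maintained.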


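Lemma 6 If $Q$ is symmetric decomposable, then $Q^{-1}$ is
tridiagonal. In particular, if $Q = SDM(\alpha, v)$, then
$Q^{-1} = (\theta_{ij})$ where

$$ \theta_{ii} = \frac{\alpha_{i+1} - \alpha_{i-1}}{(\alpha_i - \alpha_{i-1})(\alpha_{i+1} - \alpha_i) v_i^2}
\quad \text{for } 1 \le i \le n-1, $$

$$ \theta_{nn} = \frac{1}{(\alpha_n - \alpha_{n-1}) v_n^2}, $$

$$ \theta_{i,i+1} = \theta_{i+1,i} = -\frac{1}{(\alpha_{i+1} - \alpha_i) v_i v_{i+1}}
\quad \text{for } 1 \le i \le n-1, $$

and $\theta_{ij} = 0$ otherwise, with the convention $\alpha_0 = 0$.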
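
The tridiagonal formula of Lemma 6 can be verified exactly on a small instance. The following Python sketch builds Q = SDM(α, v), builds the claimed inverse entry by entry (using the convention α_0 = 0), and checks that their product is the identity. Exact rational arithmetic from the standard `fractions` module is used so that the comparison involves no rounding; the function names and the sample vectors are illustrative assumptions.

```python
from fractions import Fraction


def sdm(alpha, v):
    """Q[i][j] = alpha_min(i,j) * v_i * v_j."""
    n = len(alpha)
    return [[alpha[min(i, j)] * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def sdm_inverse(alpha, v):
    """Tridiagonal inverse of SDM(alpha, v) as stated in Lemma 6."""
    n = len(alpha)
    a = [Fraction(0)] + list(alpha)      # a[i] = alpha_i, a[0] = alpha_0 = 0
    theta = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):            # 1-based index as in the lemma
        vi = v[i - 1]
        if i < n:
            theta[i - 1][i - 1] = (a[i + 1] - a[i - 1]) / (
                (a[i] - a[i - 1]) * (a[i + 1] - a[i]) * vi ** 2)
            off = -1 / ((a[i + 1] - a[i]) * vi * v[i])
            theta[i - 1][i] = theta[i][i - 1] = off
        else:
            theta[i - 1][i - 1] = 1 / ((a[i] - a[i - 1]) * vi ** 2)
    return theta


alpha = [Fraction(1), Fraction(2), Fraction(4)]   # 0 < alpha_1 < alpha_2 < alpha_3
v = [Fraction(3), Fraction(1), Fraction(2)]
Q = sdm(alpha, v)
T = sdm_inverse(alpha, v)
product = [[sum(Q[i][k] * T[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(product == identity)
```

Because the inverse has only O(n) nonzero entries, the products with Φ_A^{-1} needed in Lemma 5 cost O(n) operations each.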

By Lemmas 5 and 6, ECWS can be solved in $O(n)$ time. To solve
WS, in practice, the active set method takes fewer than $n$
iterations, and hence the total runtime is $O(n^2)$. Note that,
unlike previous works, the runtime of this convex quadratic
programming approach is independent of the wire length $L$.
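
A minimal Python sketch of an active set iteration for WS is given below. It follows the generic textbook primal active set scheme (solve the equality-constrained subproblem, step toward its solution until a length would become negative, then add or release constraints based on multiplier signs); it is not claimed to match the exact variant of Chu and Wong. For brevity it solves each ECWS with dense Gaussian elimination rather than the O(n) tridiagonal solve, and it has no anti-cycling safeguard. All names and the sample instance are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x


def solve_ecws(phi, rho, L, free):
    """Lemma 5 restricted to the free indices; other lengths are 0."""
    sub = [[phi[i][j] for j in free] for i in free]
    y = solve(sub, [1.0] * len(free))
    z = solve(sub, [rho[i] for i in free])
    lam = -(sum(z) + L) / sum(y)
    l = [0.0] * len(rho)
    for k, i in enumerate(free):
        l[i] = -(lam * y[k] + z[k])
    return lam, l


def active_set_ws(phi, rho, L, tol=1e-9, max_iter=1000):
    n = len(rho)
    x = [L / n] * n                 # feasible start, empty active set
    active = set()
    for _ in range(max_iter):
        free = [i for i in range(n) if i not in active]
        lam, target = solve_ecws(phi, rho, L, free)
        p = [target[i] - x[i] for i in range(n)]
        if max(abs(pi) for pi in p) <= tol:
            # Multipliers of the bounds l_j >= 0 held at zero.
            grad = [sum(phi[j][i] * x[i] for i in range(n)) + rho[j]
                    for j in range(n)]
            mu = {j: grad[j] + lam for j in active}
            if not mu or min(mu.values()) >= -tol:
                return x            # feasible and optimal
            active.remove(min(mu, key=mu.get))
        else:
            step, block = 1.0, None
            for i in free:          # largest step keeping all l_i >= 0
                if p[i] < 0 and -x[i] / p[i] < step:
                    step, block = -x[i] / p[i], i
            x = [x[i] + step * p[i] for i in range(n)]
            if block is not None:
                x[block] = 0.0
                active.add(block)
    raise RuntimeError("active set iteration did not converge")


# Hypothetical instance: widths (4, 2, 1), c(h) = h, r0 = 1, R_D = 2, C_L = 4.
phi = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
rho = [9.0, 6.0, 6.0]
lengths = active_set_ws(phi, rho, L=6.0)
print(lengths)
```

On this instance the widest width is not used at all: the method ends with the first segment in the active set and splits the wire evenly between the two narrower widths.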


Applications 


The wire sizing technique is commonly applied to minimize the
wire delay and hence to improve the performance of VLSI circuits.
As there are typically millions of wires in modern VLSI circuits,
and each wire may be sized many times in order to explore
different architectures, logic designs, and layouts during the
design process, it is important for wire sizing algorithms to be
very efficient.

Another popular technique for delay minimization of slow signals
is to insert buffers (also called repeaters) to strengthen and
accelerate the signals. The work of Chu and Wong can be naturally
extended to simultaneously handle buffer insertion. It is shown
in [1] that the delay minimization problem for a wire by
simultaneous buffer insertion and wire sizing can also be
formulated as a convex quadratic program and solved by the active
set method. The runtime is only m times more than that of wire
sizing, where m is the number of buffers inserted. In practice,
m is typically 5 or less.

About one third of all nets in a typical VLSI 
circuit are multi-pin nets (i.e., nets with a tree 
structure to deliver a signal from a source to 
several sinks). It is important to minimize the 
delay of multi-pin nets. The work of Chu and 
Wong can also be applied to optimize multi-pin 
nets. The extension is described in Mo and Chu [6]. The idea is
to integrate the quadratic programming approach into a dynamic
programming
framework. Each branch of the net is solved as a 
convex quadratic program, while the overall tree 
structure is handled by dynamic programming. 


Open Problems 


After two decades of active research, the wire 
sizing problem by itself is now considered a well- 
solved problem. Some important solutions are 
[1-4, 6-15]. The major remaining challenge is 
to simultaneously apply wire sizing with other 
interconnect optimization techniques to improve 
circuit performance. Wire sizing, buffer insertion, and gate
sizing are the three most commonly used interconnect optimization
techniques. It has been demonstrated that better performance can
be achieved by applying these three techniques simultaneously
rather than sequentially. One very practical problem is to
perform simultaneous wire sizing, buffer insertion, and gate
sizing on a combinational circuit such that the total resource
usage (e.g., wire/buffer/gate area, power consumption) is
minimized while the delays of all input-to-output paths are less
than a given target.


Problem Definition


In the k-Server Problem, the task is to schedule the movement of
$k$ servers in a metric space $M$ in response to a sequence
$\varrho = r_1, r_2, \ldots, r_n$ of requests, where $r_i \in M$
for all $i$. The servers initially occupy some configuration
$X_0 \subseteq M$. After each request $r_i$ is issued, one of the
$k$ servers must move to $r_i$. A schedule $S$ specifies which
server moves to each request. The task is to compute a schedule
with minimum cost, where the cost of a schedule is defined as the
total distance traveled by the servers. The example below shows a
schedule for 2 servers on a sequence of requests (Fig. 1).

In the offline case, if the complete request sequence $\varrho$ is
known, the optimal schedule can be computed in polynomial time [9].
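
To make the cost of a schedule concrete, the Python sketch below evaluates a given schedule and, for comparison, finds the offline optimum by exhaustive search over all k^n assignments of servers to requests. The exhaustive search is exponential and serves only as an illustration on tiny inputs; it is not the polynomial-time method referred to above. The function names, the line metric, and the sample instance are assumptions made for the example.

```python
from itertools import product


def schedule_cost(start, requests, schedule, d):
    """Total distance when request i is served by server schedule[i]."""
    pos = list(start)
    total = 0
    for r, s in zip(requests, schedule):
        total += d(pos[s], r)   # server s travels to the request
        pos[s] = r
    return total


def brute_force_opt(start, requests, d):
    """Minimum cost over all k^n schedules (illustration only)."""
    k = len(start)
    return min(schedule_cost(start, requests, s, d)
               for s in product(range(k), repeat=len(requests)))


# Hypothetical instance: 2 servers on the real line, d(x, y) = |x - y|.
dist = lambda x, y: abs(x - y)
start = (0, 10)
requests = [2, 9, 3, 8]
cost = schedule_cost(start, requests, (0, 1, 0, 1), dist)
opt = brute_force_opt(start, requests, dist)
print(cost, opt)
```

Here the schedule that keeps server 0 near the left requests and server 1 near the right requests has cost 5, which is also the optimum for this instance.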

Most of the research on the k-Server Problem has focused on the
online variant, where the requests are issued one at a time.
After the $i$-th request $r_i$ is issued, an online algorithm must
decide, irrevocably, which server to move to $r_i$ before the
next request $r_{i+1}$ is issued. It is quite easy to see that in
this online scenario it is not possible to guarantee an optimal
schedule for all request sequences. The accuracy of solutions
produced by such online algorithms is often evaluated within the
framework of competitive analysis. Denote by
$cost_A(\varrho)$ the cost of the schedule produced by an online
k-server algorithm $A$ on a request sequence $\varrho$, and let
$opt(\varrho)$ be the cost of an optimal schedule on $\varrho$.
$A$ is called $R$-competitive if
$cost_A(\varrho) \le R \cdot opt(\varrho) + B$ for all request
sequences $\varrho$, where $B$ is a constant that may depend on
$M$ and $X_0$. The smallest such $R$ is called the competitive
ratio of $A$. Of course, the smaller the ratio $R$ the better.

Work-Function Algorithm for k-Servers, Fig. 1 A schedule for 2
servers on a request sequence $\varrho = r_1, r_2, \ldots, r_7$.
The initial configuration is $X_0 = \{x_1, x_2\}$. Server 1 serves
$r_1, r_2, r_5, r_6$, while server 2 serves $r_3, r_4, r_7$. The
cost of this schedule is
$d(x_1, r_1) + d(r_1, r_2) + d(r_2, r_5) + d(r_5, r_6) + d(x_2, r_3) + d(r_3, r_4) + d(r_4, r_7)$,
where $d(x, y)$ denotes the distance between points $x, y$

The k-Server Problem was introduced by 
Manasse, McGeoch, and Sleator [14, 15], who 
proved that no (deterministic) online algorithm 
can achieve a competitive ratio smaller than k, 
in any metric space with at least k + 1 points. 
They also gave a 2-competitive algorithm for 
k = 2 and stated what is now known as the 
k-Server Conjecture, which postulates that there 
exists a k-competitive online algorithm for all 
k. Koutsoupias and Papadimitriou [11, 12] (see 
also [3, 8, 10]) proved that the Work-Function 
Algorithm, presented in the next section, has
competitive ratio at most $2k - 1$, which to date
remains the best upper bound on the competitive
ratio.


Key Results 


The idea of the Work-Function Algorithm is to 
balance two greedy strategies when a new re- 
quest is issued. The first one is to simply serve 
the request with the closest server. The second 
strategy attempts to follow the optimum schedule. 
Roughly, from among the k possible new con- 
figurations, this strategy chooses the one where 
the optimum schedule would be at this time, if no 
more requests remained to be issued. 

To formalize this idea, for each request sequence $\varrho$ and a
k-server configuration $X$, let $w_\varrho(X)$ be the minimum cost
of serving $\varrho$ under the constraint that at the end the
server configuration is $X$. (Assume, for simplicity, that the
initial configuration $X_0$ is fixed.) The function
$w_\varrho(\cdot)$ is called the work function after the request
sequence $\varrho$.

Algorithm WFA. Denote by $\varrho$ the sequence of past requests,
and suppose that the current server configuration is
$S = \{s_1, s_2, \ldots, s_k\}$, where $s_j$ is the location of
the $j$-th server. Let $r$ be the new request. Choose $s_j \in S$
that minimizes the quantity
$w_{\varrho r}(S - \{s_j\} \cup \{r\}) + d(s_j, r)$, and move
server $j$ to $r$.
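
The rule above can be sketched in Python for a small finite metric space. The sketch maintains the work function over all configurations by the standard recurrence w_{ϱr}(X) = w_ϱ(X) if r ∈ X, and otherwise the minimum over x ∈ X of w_ϱ(X − {x} ∪ {r}) + d(x, r). Several simplifications are assumptions of this example and not part of the source: configurations are sets of k distinct points, the initial work function is computed by brute-force matching, ties are broken by the smallest point label, and the table has one entry per k-subset of points, so the code is only practical for tiny instances.

```python
from itertools import combinations, permutations


def matching_cost(X, Y, d):
    """Cheapest way to move servers from X to Y (brute force)."""
    return min(sum(d(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))


def initial_work_function(points, X0, d):
    k = len(X0)
    return {frozenset(X): matching_cost(tuple(X0), X, d)
            for X in combinations(points, k)}


def update_work_function(w, r, d):
    """Work function after appending request r to the sequence."""
    new = {}
    for X in w:
        if r in X:
            new[X] = w[X]
        else:
            new[X] = min(w[(X - {x}) | {r}] + d(x, r) for x in X)
    return new


def wfa(points, X0, requests, d):
    """Run WFA; return (online cost, offline optimum for the sequence)."""
    w = initial_work_function(points, X0, d)
    S = frozenset(X0)
    cost = 0
    for r in requests:
        w = update_work_function(w, r, d)
        if r not in S:
            # Choose the server minimizing w(S - s + r) + d(s, r).
            s = min(sorted(S), key=lambda x: w[(S - {x}) | {r}] + d(x, r))
            cost += d(s, r)
            S = (S - {s}) | {r}
    # The optimum is the smallest work-function value at the end.
    return cost, min(w.values())


# Hypothetical instance: three points on a line, 2 servers.
dist = lambda x, y: abs(x - y)
wfa_cost, opt_cost = wfa(points=(0, 1, 4), X0=(0, 4),
                         requests=[1, 0, 1, 0], d=dist)
print(wfa_cost, opt_cost)
```

On this instance WFA pays 4 while the optimum is 3 (move the far server from 4 to 1 once, after which every request is served for free).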


Theorem 1 ([11, 12]) Algorithm WFA is $(2k - 1)$-competitive.


As observed in [6], Algorithm WFA can be 
interpreted as a primal-dual algorithm. 


Applications 


The k-Server Problem can be viewed as an abstraction of online
problems that arise in emergency crew scheduling, caching (or
paging) in two-level memory systems, scheduling of disk heads,
and others. Nevertheless, in its pure abstract form, it is mostly
of theoretical interest.

Algorithm WFA can be applied to some generalizations of the
k-Server Problem. In particular, it is $(2n - 1)$-competitive for
$n$-state metrical task systems, matching the lower bound
[3, 4, 8]. See [1, 3, 5] for other applications and extensions.


Open Problems 


Theorem 1 comes tantalizingly close to settling the k-Server
Conjecture described earlier in this section. In fact, it has
even been conjectured that Algorithm WFA itself is
$k$-competitive for $k$ servers, but the proof of this conjecture,
so far, remains elusive.

For $k \ge 3$, $k$-competitive online k-server algorithms are
known only for some restricted metric spaces, including trees,
metric spaces with up to $k + 2$ points, and the Manhattan plane
for $k = 3$ (see [2, 7, 9, 13]). As the analysis of Algorithm WFA
in the general case appears difficult, it would be of interest to
prove its $k$-competitiveness for some natural special cases, for
example in the plane (with any reasonable metric) for $k \ge 4$
servers.

Very little is known about the competitive ratio of the k-Server
Problem in the randomized case. In fact, it is not even known
whether a ratio better than 2 can be achieved for $k = 2$.